
Orthogonalization:

Text provided under a Creative Commons Attribution license, CC-BY. All code is made available under the FSF-approved MIT license. (c) Kyle T. Mandli

Note: This material largely follows the text “Numerical Linear Algebra” by Trefethen and Bau (SIAM, 1997) and is meant as a guide and supplement to the material presented there.

from __future__ import print_function

%matplotlib inline
import matplotlib.pyplot as plt
import numpy

The QR Factorization, Projections, Least Squares Problems, and Applications

Projections

A projector is a square matrix $P$ that satisfies

$$P^2 = P.$$

Why does this definition make sense? Why do we require it to be square?

A projector comes from the idea that we want to project a vector $\mathbf{v}$ onto a lower-dimensional subspace. For example, suppose that $\mathbf{v}$ lies completely within the subspace, i.e. $\mathbf{v} \in \text{range}(P)$. In that case $P \mathbf{v}$ should not change $\mathbf{v}$, i.e. $P\mathbf{v} = \mathbf{v}$. This motivates the definition above.

i.e. if

$$P\mathbf{v} = \mathbf{v}$$

then

$$P( P \mathbf{v} ) = P\mathbf{v} = \mathbf{v},$$

or

$$P^2 = P.$$

As another example, take a vector $\mathbf{x} \notin \text{range}(P)$ and project it onto the subspace, $P\mathbf{x} = \mathbf{v}$. If we apply the projection again to $\mathbf{v}$ we now have

$$\begin{aligned} P\mathbf{x} &= \mathbf{v}, \\ P^2 \mathbf{x} &= P \mathbf{v} = \mathbf{v} = P\mathbf{x} \\ \Rightarrow P^2 &= P. \end{aligned}$$

It is also important to keep in mind the following: given again $\mathbf{x} \notin \text{range}(P)$, if we look at the difference between the projection and the original vector, $P\mathbf{x} - \mathbf{x}$, and apply the projection again we have

$$P(P\mathbf{x} - \mathbf{x}) = P^2 \mathbf{x} - P\mathbf{x} = 0,$$

which means the difference between $\mathbf{x}$ and the projected vector $P\mathbf{x} = \mathbf{v}$ lies in the null space of $P$, i.e.

$$(\mathbf{x}-\mathbf{v}) \in \text{null}(P).$$

But

$$\mathbf{x} - \mathbf{v} = \mathbf{x} - P\mathbf{x} = (I - P)\mathbf{x}.$$

Thus $I-P$ is a matrix that projects onto $\text{null}(P)$.
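These identities are easy to check numerically. The sketch below uses an illustrative oblique (non-orthogonal) projector (the matrix is a choice for this example, not from the text) onto the $x$-axis along the direction $(1, 1)^T$:

```python
import numpy

# Illustrative oblique projector onto the x-axis along the direction (1, 1)
P = numpy.array([[1.0, -1.0],
                 [0.0,  0.0]])

# Idempotency: P^2 = P
print(numpy.allclose(numpy.dot(P, P), P))

x = numpy.array([3.0, 4.0])
v = numpy.dot(P, x)    # P x lies in range(P)
r = x - v              # (I - P) x lies in null(P)
print(numpy.dot(P, r)) # applying P again annihilates the remainder
print(v + r)           # the two pieces reconstruct x
```
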

Complementary Projectors

A projector also has a complement, defined to be $I - P$.

Show that this complement is also a projector.

We can show that this is a projector by examining a repeated application of $(I-P)$:

$$\begin{aligned} (I - P)^2 &= I - P - P + P^2 \\ &= I - 2P + P^2 \\ &= I - 2P + P \\ &= I - P. \end{aligned}$$

It turns out that the complement projects exactly onto $\text{null}(P)$.

Take

$$\mathbf{x} \in \text{null}(P),$$

then

$$(I - P) \mathbf{x} = \mathbf{x} - P \mathbf{x} = \mathbf{x}$$

since $P \mathbf{x} = 0$, implying that $\mathbf{x} \in \text{range}(I - P)$.

We also know that

$$(I - P) \mathbf{x} \in \text{null}(P)$$

as well.

This shows that

$$\text{range}(I - P) \subseteq \text{null}(P)$$

and

$$\text{range}(I - P) \supseteq \text{null}(P),$$

implying that

$$\text{range}(I - P) = \text{null}(P)$$

exactly.

Reflect on these subspaces and convince yourself that this all makes sense.

This result provides an important property of a projector and its complement, namely that they divide a space into two subspaces whose intersection is trivial:

$$\text{range}(I - P) \cap \text{range}(P) = \{0\}$$

or

$$\text{null}(P) \cap \text{range}(P) = \{0\}.$$

These two spaces are called complementary subspaces.

Given this property, any $P \in \mathbb C^{m \times m}$ splits $\mathbb C^{m}$ into two subspaces $S$ and $V$; assume that $\mathbf{s}\in S = \text{range}(P)$ and $\mathbf{v} \in V = \text{null}(P)$. If we have $\mathbf{x} \in \mathbb C^{m}$, then we can split the vector $\mathbf{x}$ into components in $S$ and $V$ by using the projections

$$\begin{aligned} P \mathbf{x} &= \mathbf{x}_S, & \mathbf{x}_S \in S \\ (I - P) \mathbf{x} &= \mathbf{x}_V, & \mathbf{x}_V \in V \end{aligned}$$

which we can also observe adds to the original vector as

$$\mathbf{x}_S + \mathbf{x}_V = P \mathbf{x} + (I - P) \mathbf{x} = \mathbf{x}.$$

Try constructing a projection matrix $P \in \mathbb R^{3\times 3}$ that projects a vector onto one of the coordinate directions (a one-dimensional subspace of $\mathbb R^3$).

  • What is the complementary projector?

  • What is the complementary subspace?

Example: A non-orthogonal non-linear projector

Given a vector of moles of $N$ chemical components

$$\mathbf{n} = \begin{bmatrix} n_1 \\ n_2 \\ \vdots \\ n_N\end{bmatrix}$$

where, e.g., $n_1$ is the number of moles of component 1, and $n_i\geq 0$.

Then we can define the mole fraction of component $i$ as

$$x_i = \frac{n_i}{\mathbf{n}^T\mathbf{1}}$$

and the "vector" of mole fractions

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N\end{bmatrix}$$

In the homework you will show that

  • $\mathbf{x}^T\mathbf{1} = 1$ (the mole fractions sum to 1)

  • mole fractions do not form a vector space or subspace

  • there exists a non-orthogonal projector $f$ such that $f(\mathbf{n})=\mathbf{x}$, $f^2=f$

  • $f$ is singular (like all projections) in the sense that if you know the mole fractions you don't know how many moles you have

  • find $N(f)$, the null space of $f$
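As a quick (hypothetical) illustration of the first bullets, the map $f(\mathbf{n}) = \mathbf{n}/(\mathbf{n}^T\mathbf{1})$ can be coded directly; note that $f$ is nonlinear (it divides by the sum) but still idempotent:

```python
import numpy

def mole_fractions(n):
    # f(n) = n / (n^T 1): nonlinear, but f(f(n)) = f(n)
    return n / numpy.sum(n)

n = numpy.array([2.0, 3.0, 5.0])   # illustrative mole counts
x = mole_fractions(n)
print(x)                           # [0.2 0.3 0.5]
print(numpy.sum(x))                # mole fractions sum to 1
print(numpy.allclose(mole_fractions(x), x))  # idempotent: f^2 = f
```
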

Orthogonal Projectors

An orthogonal projector is one that projects onto a subspace $S$ that is orthogonal to the complementary subspace $V$ (this is also phrased as $P$ projecting along the space $V$). Note that we are only talking about the subspaces (and their bases), not the projectors! i.e. for orthogonal subspaces all the vectors in $S$ are orthogonal to all the vectors in $V$.

A hermitian matrix is one whose conjugate transpose is itself, i.e.

$$P = P^\ast.$$

With this definition we can then say: a projector $P$ is orthogonal if and only if $P$ is hermitian.

Quick Proof

Show that if $P^2 = P$ and $P^\ast = P$, then

$$\langle P\mathbf{x}, (I-P)\mathbf{x}\rangle = 0.$$
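A quick numerical check of this orthogonality, using an illustrative rank-one hermitian projector $P = \mathbf{q}\mathbf{q}^T$ with $||\mathbf{q}||=1$ (the vectors here are choices for the example):

```python
import numpy

q = numpy.array([1.0, 2.0, 2.0]) / 3.0     # unit vector
P = numpy.outer(q, q)                      # hermitian rank-one projector

print(numpy.allclose(P, P.T))              # hermitian (symmetric, real case)
print(numpy.allclose(numpy.dot(P, P), P))  # idempotent

x = numpy.array([1.0, -4.0, 2.0])
# <P x, (I - P) x> should vanish (up to rounding)
print(numpy.dot(numpy.dot(P, x), x - numpy.dot(P, x)))
```
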

Projection with an Orthonormal Basis

We can also directly construct a projector that uses an orthonormal basis for the subspace $S$. If we define a matrix $Q \in \mathbb C^{m \times n}$ whose columns are orthonormal, we can construct an orthogonal projector as

$$P = Q Q^*.$$

Check that $P^* = P$ and $P^2 = P$ (remember $Q^*Q = I$).

Note that the resulting matrix $P$ is in $\mathbb C^{m \times m}$ as we require. This also means that the dimension of the subspace $S$ is $n$.

Example: Orthonormal projection and Least-Squares Problems... A review

Consider the overdetermined problem $A\mathbf{x}=\mathbf{b}$ where $A\in\mathbb{R}^{3\times2}$ and $\mathbf{b}\in\mathbb{R}^3$, i.e.

$$\begin{bmatrix} | & | \\ \mathbf{a}_1 & \mathbf{a}_2\\ | & | \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} | \\ \mathbf{b} \\ | \end{bmatrix}$$

where $\mathbf{a}_1$, $\mathbf{a}_2$ are linearly independent vectors that span a two-dimensional subspace of $\mathbb{R}^3$.

Geometry

Geometrically this problem looks like

Drawing

If $\mathbf{b}\notin C(A)$, then there is clearly no solution to $A\mathbf{x}=\mathbf{b}$. However, we can find the point $\mathbf{p}=A\hat{\mathbf{x}}\in C(A)$ that minimizes the length of the error $\mathbf{e}=\mathbf{b}-\mathbf{p}$. While we could resort to calculus to find the value of $\hat{\mathbf{x}}$ that minimizes $||\mathbf{e}||_2$, it should be clear from the figure that the shortest error (in the $\ell_2$ norm) is the one that is perpendicular to every vector in $C(A)$.

But the subspace of vectors orthogonal to $C(A)$ is the left null space $N(A^T)$, and therefore we simply seek solutions of

$$\begin{aligned} 0 &= A^T\mathbf{e} \\ &= A^T(\mathbf{b}-\mathbf{p})\\ &= A^T(\mathbf{b} - A\hat{\mathbf{x}}) \end{aligned}$$

or we just need to solve the "normal equations" $A^T A\hat{\mathbf{x}} = A^T\mathbf{b}$ for the least-squares solution

$$\hat{\mathbf{x}} = \left(A^T A\right)^{-1}A^T\mathbf{b}.$$

If we're actually interested in $\mathbf{p}$, which is the orthogonal projection of $\mathbf{b}$ onto $C(A)$, we get

$$\mathbf{p} = A\hat{\mathbf{x}} = A \left(A^T A\right)^{-1}A^T\mathbf{b} = P\mathbf{b}$$

where

$$P = A \left(A^T A\right)^{-1}A^T$$

is an orthogonal projection matrix (verify that $P^2 = P$ and $(I - P)\mathbf{b}\perp P\mathbf{b}$).
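We can verify these properties numerically; the matrix $A$ below is an illustrative choice, and `numpy.linalg.solve` is used rather than forming the inverse explicitly:

```python
import numpy

A = numpy.array([[1.0, 0.0],
                 [1.0, 1.0],
                 [1.0, 2.0]])
# P = A (A^T A)^{-1} A^T, computed via a solve instead of an explicit inverse
P = numpy.dot(A, numpy.linalg.solve(numpy.dot(A.T, A), A.T))

print(numpy.allclose(numpy.dot(P, P), P))  # P^2 = P
print(numpy.allclose(P, P.T))              # symmetric: orthogonal projector

b = numpy.array([6.0, 0.0, 0.0])
p = numpy.dot(P, b)                        # projection of b onto C(A)
e = b - p                                  # residual
print(numpy.dot(p, e))                     # residual orthogonal to C(A): ~0
```
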

For a general matrix $A$ this form of the projection matrix is rather horrid to find; however, if the columns of $A$ form an orthonormal basis for $C(A)$, i.e. $A=Q$, then the form of the projection matrix is much simpler, since $Q^T Q = I$:

$$P = QQ^T$$

This is actually quite general. Given any orthonormal basis for a vector space $S=\mathrm{span}\langle \mathbf{q}_1,\mathbf{q}_2,\ldots,\mathbf{q}_N\rangle$, if these vectors form the columns of $Q$, then the orthogonal projector onto $S$ is always $QQ^T$ and the complement is always $I-QQ^T$.

Example: Construction of an orthonormal projector

Take R3\mathbb R^3 and derive a projector that projects onto the x-y plane and is an orthogonal projector.

The simplest orthonormal basis for the $x$-$y$ plane is given by the columns of

$$Q = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$$

and an orthogonal projector onto the $x$-$y$ plane is simply

$$Q Q^\ast = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
Q = numpy.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
P = numpy.dot(Q, Q.T)
I = numpy.identity(3)

x = numpy.array([3.0, 4.0, 5.0])

print("x = {}".format(x))
print("\nQ = \n{}".format(Q))
print("\nP = \n{}".format(P))
x_S = numpy.dot(P, x)
x_V = numpy.dot(I - P, x)
print("x_S = {}".format(x_S))
print("x_V = {}\n".format(x_V))
print("x_S + x_V = {}".format(x_S + x_V))

A numerically more sensible approach

The previous problem calculated the projection matrix first, $P=QQ^T$, then calculated the projection and its complement by matrix-vector multiplication, i.e.

$$\mathbf{x}_S = P\mathbf{x} = (QQ^T)\mathbf{x}, \quad \mathbf{x}_V = (I - P)\mathbf{x}$$

A mathematically equivalent, but numerically more efficient, method is to calculate the following:

$$\mathbf{x}_S = Q(Q^T\mathbf{x}),\quad \mathbf{x}_V = \mathbf{x} - \mathbf{x}_S$$

How much more efficient is the latter than the former?

Check

Q = numpy.array([[1, 0], [0, 1], [0, 0]])
x = numpy.array([3.0, 4.0, 5.0])

print("x = {}".format(x))
x_S = Q.dot(Q.T.dot(x))
x_V = x - x_S
print("x_S = {}".format(x_S))
print("x_V = {}\n".format(x_V))
print("x_S + x_V = {}".format(x_S + x_V))

Example: Construction of a projector that eliminates a direction

Goal: Eliminate the component of a vector in the direction q\mathbf{q}.

Drawing

e.g. we can calculate the projection $\mathbf{p}$ in two equivalent ways:

  1. find the matrix $P=QQ^T$ that projects $\mathbf{b}$ onto $C(A)$

  2. find the matrix $P'=\mathbf{q}\mathbf{q}^T$ that projects onto the unit vector $\mathbf{q}$ normal to the plane (i.e. parallel to $\mathbf{e}$), and then take its complementary projection $\mathbf{p} = (I - P')\mathbf{b} = \mathbf{b} - \mathbf{q}(\mathbf{q}^T\mathbf{b})$

Form the projector $P = \mathbf{q} \mathbf{q}^\ast \in \mathbb C^{m \times m}$. The complement $I - P$ will then include everything but that direction.

If $||\mathbf{q}|| = 1$ we can then simply use $I - \mathbf{q} \mathbf{q}^\ast$. If not, we can write the projector in terms of an arbitrary vector $\mathbf{a}$ as

$$I - \frac{\mathbf{a} \mathbf{a}^\ast}{||\mathbf{a}||^2} = I - \frac{\mathbf{a} \mathbf{a}^\ast}{\mathbf{a}^\ast \mathbf{a}}$$

Note the difference in dimensions between the numerator and denominator of the fraction: the numerator is an $m \times m$ matrix while the denominator is a scalar. Also note that, as we saw with the outer product, $\text{rank}(\mathbf{a} \mathbf{a}^\ast) = 1$.

Now again try to construct a projector in R3\mathbb R^3 that projects onto the xx-yy plane.

q = numpy.array([0, 0, 1])
P = numpy.outer(q, q.conjugate())
P_comp = numpy.identity(3) - P
print("P = \n{}\n".format(P))
print("I - P = \n{}".format(P_comp))
x = numpy.array([3, 4, 5])
print(numpy.dot(P, x), q * (q.dot(x)))
print(numpy.dot(P_comp, x), x - q * (q.dot(x)))
a = numpy.array([0, 0, 3])
P = numpy.outer(a, a.conjugate()) / (numpy.dot(a, a.conjugate()))
P_comp = numpy.identity(3) - P
print(numpy.dot(P, x))
print(numpy.dot(P_comp, x))

Quick Review

  • A projection matrix is any square matrix $P$ such that $P^2=P$

  • $P$ projects onto a subspace $S=\mathrm{range}(P)$

  • The complementary projection matrix $I-P$ projects onto $V=\mathrm{null}(P)$

  • An orthogonal projector can always be constructed as $P=QQ^T$ where the columns of $Q$ form an orthonormal basis for $S$ and $S\perp V$

  • Least-squares problems are essentially projection problems where we seek to solve $A\mathbf{x}=\mathbf{b}$ with $\mathbf{b}\notin C(A)$

Solution of Least Squares problems by the Normal Equations

Given $A\mathbf{x}=\mathbf{b}$ where $\mathbf{b}\notin C(A)$, we can always solve the least-squares problem using the normal equations

$$A^T A\hat{\mathbf{x}} = A^T\mathbf{b}$$

which actually solves $A\hat{\mathbf{x}} = \mathbf{p}$ where $\mathbf{p}$ is the orthogonal projection of $\mathbf{b}$ onto $C(A)$.

However, as you will show in the homework, this can be numerically inaccurate because the condition number satisfies

$$\kappa(A^T A) = \kappa(A)^2,$$

but there is a better way.
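The squaring of the condition number is easy to observe with `numpy.linalg.cond`; the nearly rank-deficient matrix below is an illustrative choice:

```python
import numpy

# Illustrative matrix with two nearly parallel columns
A = numpy.array([[1.0, 1.0],
                 [1.0, 1.0001],
                 [1.0, 1.0002]])

kappa_A = numpy.linalg.cond(A)
kappa_AtA = numpy.linalg.cond(numpy.dot(A.T, A))
print(kappa_A, kappa_AtA)
# kappa(A^T A) = kappa(A)^2 (up to rounding in forming A^T A)
print(numpy.isclose(kappa_AtA, kappa_A**2, rtol=1e-3))
```
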

Solution of Least Squares problems by the QR factorization

Given any matrix $A$ that is full column rank, we will show that we can always factor it as

$$A = QR$$

where $Q$ is a matrix with orthonormal columns (unitary when square) that form an orthonormal basis for $C(A)$, and $R$ is an upper triangular matrix that says how to reconstruct the columns of $A$ from the columns of $Q$:

[a1an]=[q1qn][r11r12r1nr22rnn].\begin{bmatrix} & & \\ & & \\ \mathbf{a}_1 & \cdots & \mathbf{a}_n \\ & & \\ & & \end{bmatrix} = \begin{bmatrix} & & \\ & & \\ \mathbf{q}_1 & \cdots & \mathbf{q}_n \\ & & \\ & & \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ & r_{22} & & \\ & & \ddots & \vdots \\ & & & r_{nn} \end{bmatrix}.

If we write this out as a matrix multiplication we have

a1=r11q1a2=r22q2+r12q1a3=r33q3+r23q2+r13q1\begin{aligned} \mathbf{a}_1 &= r_{11} \mathbf{q}_1 \\ \mathbf{a}_2 &= r_{22} \mathbf{q}_2 + r_{12} \mathbf{q}_1 \\ \mathbf{a}_3 &= r_{33} \mathbf{q}_3 + r_{23} \mathbf{q}_2 + r_{13} \mathbf{q}_1 \\ &\vdots \end{aligned}

i.e. we can construct the columns of $A$ from linear combinations of the columns of $Q$ (and those specific combinations come from $R=Q^TA$).

Given $A=QR$, then solving

$$A\mathbf{x} = \mathbf{b}$$

becomes

$$QR\mathbf{x} = \mathbf{b}$$

or, since $Q^T Q=I$,

$$R\mathbf{x} = Q^T\mathbf{b}$$

which can be solved quickly by back-substitution since $R$ is upper triangular.

Moreover, this problem is much better conditioned, as it can be shown that $\kappa(R)=\kappa(A)$, not $\kappa(A)^2$.

Multiplying both sides by $Q$ again shows that this problem is equivalent to

$$QR\mathbf{x} = QQ^T\mathbf{b}$$

or $A\mathbf{x} = QQ^T\mathbf{b}$, i.e. we are just solving $A\mathbf{x}=\mathbf{p}$ where $\mathbf{p}$ is the orthogonal projection of $\mathbf{b}$ onto $C(A)$.
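A sketch of the two solution routes side by side (the small system is an illustrative choice; `numpy.linalg.qr` computes the reduced QR factorization):

```python
import numpy

A = numpy.array([[1.0, 0.0],
                 [1.0, 1.0],
                 [1.0, 2.0]])
b = numpy.array([6.0, 0.0, 0.0])

# QR route: solve R x = Q^T b
Q, R = numpy.linalg.qr(A)
x_qr = numpy.linalg.solve(R, numpy.dot(Q.T, b))

# Normal equations route: solve (A^T A) x = A^T b
x_ne = numpy.linalg.solve(numpy.dot(A.T, A), numpy.dot(A.T, b))

print(x_qr)                        # least-squares solution
print(numpy.allclose(x_qr, x_ne))  # both routes agree on this small problem
```
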

Question... how do we find $Q$ and $R$?

QR Factorization

One of the most important ideas in linear algebra is the concept of factorizing an original matrix into different constituents that may have useful properties. These properties can help us understand the matrix better and lead to numerical methods. In numerical linear algebra one of the most important factorizations is the QR factorization.

There are actually multiple algorithms that accomplish the same factorization but with very different methods and numerical stability. Here we will discuss three of the algorithms:

  1. Classical Gram-Schmidt orthogonalization: transform $A\rightarrow Q$ by a succession of projections and calculate $R$ as a by-product of the algorithm (unfortunately, prone to numerical floating point error).

  2. Modified Gram-Schmidt orthogonalization: transform $A\rightarrow Q$ by a different set of projections. Yields the same $QR$ but is more numerically stable.

  3. Householder triangularization: transform $A\rightarrow R$ by a series of unitary transformations; can solve least squares problems directly without accumulating $Q$, or can build $Q$ on the fly.


Classical Gram-Schmidt

We begin with a matrix ACm×nA\in\mathbb C^{m\times n} with nn linearly independent columns.

A=[a1an]A = \begin{bmatrix} & & \\ & & \\ \mathbf{a}_1 & \cdots & \mathbf{a}_n \\ & & \\ & & \end{bmatrix}

And want to find a sequence of orthonormal vectors qj\mathbf{q}_j that span the sequence of subspaces

$$\text{span}(\mathbf{a}_1) \subseteq \text{span}(\mathbf{a}_1, \mathbf{a}_2) \subseteq \text{span}(\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3) \subseteq \cdots \subseteq \text{span}(\mathbf{a}_1, \mathbf{a}_2, \ldots , \mathbf{a}_n)$$

where $\text{span}(\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_m)$ indicates the subspace spanned by the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$.

The individual $\mathbf{q}_j$ will form the columns of the matrix $Q$ such that $C(A_k)=C(Q_k)$, where $A_k$ is the matrix formed by the first $k$ columns of $A$ (and similarly for $Q_k$).

Starting with the first vector $\mathbf{a}_1$: this forms the basis for a one-dimensional subspace of $\mathbb{R}^m$ (i.e. a line). Thus $\mathbf{q}_1$ is a unit vector along that line, or

$$\mathbf{q}_1 = \frac{\mathbf{a}_1}{||\mathbf{a}_1||}.$$

For $\text{span}(\mathbf{a}_1, \mathbf{a}_2)$ (which is a plane in $\mathbb{R}^m$) we already have $\mathbf{q}_1$, so we need to find a vector $\mathbf{q}_2$ that is orthogonal to $\mathbf{q}_1$ but still in the plane.

An obvious option is to find the component of $\mathbf{a}_2$ that is orthogonal to $\mathbf{q}_1$, but this is just

$$\begin{aligned} \mathbf{v}_2 &= (I - \mathbf{q}_1\mathbf{q}_1^T)\mathbf{a}_2 \\ &= \mathbf{a}_2 - \mathbf{q}_1(\mathbf{q}_1^T\mathbf{a}_2) \end{aligned}$$

By construction it is easy to show that $\mathbf{v}_2$ is orthogonal to $\mathbf{q}_1$ but not necessarily a unit vector, so we find $\mathbf{q}_2$ by normalizing:

$$\mathbf{q}_2 = \frac{\mathbf{v}_2}{||\mathbf{v}_2||}.$$
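A minimal numerical sketch of these first two steps (the vectors $\mathbf{a}_1$, $\mathbf{a}_2$ are illustrative choices):

```python
import numpy

a1 = numpy.array([3.0, 4.0, 0.0])
a2 = numpy.array([1.0, 1.0, 1.0])

q1 = a1 / numpy.linalg.norm(a1)
v2 = a2 - q1 * numpy.dot(q1, a2)   # component of a2 orthogonal to q1
q2 = v2 / numpy.linalg.norm(v2)    # normalize

print(numpy.dot(q1, q2))           # ~0: orthogonal
print(numpy.linalg.norm(q2))       # 1.0: unit length
```
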

Classical Gram-Schmidt as a series of projections

If we define the unitary matrix

Qk=[q1qk]Q_k = \begin{bmatrix} & & \\ & & \\ \mathbf{q}_1 & \cdots & \mathbf{q}_k \\ & & \\ & & \end{bmatrix}

as the first $k$ columns of $Q$, then we could also rewrite the first two steps of classical Gram-Schmidt as: set $\mathbf{v}_1 = \mathbf{a}_1$, then

$$\begin{aligned} \mathbf{q}_1 &= \frac{\mathbf{v}_1}{||\mathbf{v}_1||}\\ \mathbf{v}_2 &= (I - Q_1Q_1^T)\mathbf{a}_2 = \mathbf{a}_2 - Q_1(Q_1^T\mathbf{a}_2)\\ \mathbf{q}_2 &= \frac{\mathbf{v}_2}{||\mathbf{v}_2||} \end{aligned}$$

and we can continue

$$\begin{aligned} \mathbf{v}_3 &= (I - Q_2Q_2^T)\mathbf{a}_3 = \mathbf{a}_3 - Q_2(Q_2^T\mathbf{a}_3)\\ \mathbf{q}_3 &= \frac{\mathbf{v}_3}{||\mathbf{v}_3||}\\ & \vdots \\ \mathbf{v}_k &= (I - Q_{k-1}Q_{k-1}^T)\mathbf{a}_k = \mathbf{a}_k - Q_{k-1}(Q_{k-1}^T\mathbf{a}_k)\\ \mathbf{q}_k &= \frac{\mathbf{v}_k}{||\mathbf{v}_k||} \end{aligned}$$

With each step we find the component of $\mathbf{a}_k$ that is orthogonal to all the vectors before it, and normalize it.

A picture is probably useful here...

But what about RR?

The above algorithm appears to transform $A$ directly to $Q$ without calculating $R$ (which we need for least-squares problems).

But actually that's not true, as $R$ is hiding in the algorithm (much like $L$ hides in the $LU$ factorization $A\rightarrow U$).

If we consider the full factorization A=QRA = QR

[a1an]=[q1qn][r11r12r1nr22rnn].\begin{bmatrix} & & \\ & & \\ \mathbf{a}_1 & \cdots & \mathbf{a}_n \\ & & \\ & & \end{bmatrix} = \begin{bmatrix} & & \\ & & \\ \mathbf{q}_1 & \cdots & \mathbf{q}_n \\ & & \\ & & \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ & r_{22} & & \\ & & \ddots & \vdots \\ & & & r_{nn} \end{bmatrix}.

it follows that $R=Q^T A$, or

$$\begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ & r_{22} & & \\ & & \ddots & \vdots \\ & & & r_{nn}\end{bmatrix} = \begin{bmatrix} -\mathbf{q}_1^T- \\ -\mathbf{q}_2^T- \\ \vdots \\ -\mathbf{q}_n^T- \end{bmatrix} \begin{bmatrix} & & \\ \mathbf{a}_1 & \cdots & \mathbf{a}_n \\ & & \end{bmatrix}.$$

With a little bit of work, you can show that the $j$th column of $R$ is

$$\mathbf{r}_j = Q_j^T\mathbf{a}_j.$$

So the full Classical Gram-Schmidt algorithm looks something like

Set $\mathbf{v}_1 = \mathbf{a}_1$, $R_{11}=||\mathbf{v}_1||$, $\mathbf{q}_1 = \mathbf{v}_1/R_{11}$.

Loop over columns $j=2,\ldots,n$:

  • Find $\mathbf{r}_j = Q^T_{j-1}\mathbf{a}_j$

  • $\mathbf{v}_j = \mathbf{a}_j - Q_{j-1}\mathbf{r}_j$

  • $R_{jj} = ||\mathbf{v}_j||_2$

  • $\mathbf{q}_j = \mathbf{v}_j/R_{jj}$

This builds up both $Q$ and $R$.

Or unrolled

q1=a1r11q2=a2r12q1r22q3=a3r13q1r23q2r33qn=ani=1n1rinqirnn\begin{aligned} \mathbf{q}_1 &= \frac{\mathbf{a}_1}{r_{11}} \\ \mathbf{q}_2 &= \frac{\mathbf{a}_2 - r_{12} \mathbf{q}_1}{r_{22}} \\ \mathbf{q}_3 &= \frac{\mathbf{a}_3 - r_{13} \mathbf{q}_1 - r_{23} \mathbf{q}_2}{r_{33}} \\ &\vdots \\ \mathbf{q}_n &= \frac{\mathbf{a}_n - \sum^{n-1}_{i=1} r_{in} \mathbf{q}_i}{r_{nn}} \end{aligned}

leading us to define

$$r_{ij} = \left \{ \begin{aligned} &\langle \mathbf{q}_i, \mathbf{a}_j \rangle & &i \neq j \\ &\left \Vert \mathbf{a}_j - \sum^{j-1}_{i=1} r_{ij} \mathbf{q}_i \right \Vert & &i = j \end{aligned} \right .$$

This is called the classical Gram-Schmidt iteration. It turns out that the procedure above is unstable because of the rounding errors introduced.

This can be easily coded up in Python as

# Implement Classical Gram-Schmidt Iteration
def classic_GS(A):
    m, n = A.shape
    Q = numpy.empty((m, n))
    R = numpy.zeros((n, n))
    # loop over columns
    for j in range(n):
        v = A[:, j]
        for i in range(j):
            R[i, j] = numpy.dot(Q[:, i].conjugate(), A[:, j])
            v = v - R[i, j] * Q[:, i]
        R[j, j] = numpy.linalg.norm(v, ord=2)
        Q[:, j] = v / R[j, j]
    return Q, R

And check

A = numpy.array([[12, -51, 4], [6, 167, -68], [-4, 24, -41]], dtype=float)
print(A)
Q, R = classic_GS(A)
print("Q=\n{}\n".format(Q))
print("Q^TQ=\n{}".format(numpy.dot(Q.transpose(), Q)))
print("R=\n{}\n".format(R))
print("QR - A=\n{}".format(numpy.dot(Q, R) - A))

Full vs. Reduced QR

If the original matrix $A \in \mathbb C^{m \times n}$ where $m \ge n$, then we can also define a full QR factorization, which appends $m - n$ additional orthonormal columns to $Q$ and corresponding rows of zeros to $R$ so that the dimensions match.

A=QR=[Q1Q2][R10]=Q1R1A = Q R = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} = Q_1 R_1

The factorization Q1R1Q_1 R_1 is called the reduced or thin QR factorization of AA.

We require that the additional columns $Q_2$ form an orthonormal set that is orthogonal to $\text{range}(A)$. If $A$ is full rank, then $Q_1$ and $Q_2$ provide bases for $\text{range}(A)$ and $\text{null}(A^\ast)$ respectively.
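With `numpy.linalg.qr` the two forms are selected via the `mode` argument; a small illustrative check:

```python
import numpy

A = numpy.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])

Q1, R1 = numpy.linalg.qr(A, mode='reduced')   # Q1: 3x2, R1: 2x2
Q, R = numpy.linalg.qr(A, mode='complete')    # Q: 3x3, R: 3x2

print(Q1.shape, R1.shape, Q.shape, R.shape)
print(numpy.allclose(numpy.dot(Q1, R1), A))   # reduced form reproduces A
print(numpy.allclose(numpy.dot(Q, R), A))     # so does the full form
print(numpy.allclose(R[2, :], 0.0))           # appended row of zeros in R
```
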

QR Existence and Uniqueness

Two important theorems exist regarding this algorithm which we state without proof:

Every ACm×nA \in \mathbb C^{m \times n} with mnm \geq n has a full QR factorization and therefore a reduced QR factorization.

Each ACm×nA \in \mathbb C^{m \times n} with mnm \geq n of full rank has a unique reduced QR factorization A=QRA = QR with rjj>0r_{jj} > 0.

Modified Gram-Schmidt

Unfortunately the classical Gram-Schmidt algorithm is not stable numerically. Instead we can derive a modified method that is more numerically stable but calculates the same $Q$ and $R$, with just a different order of projections.

Recall that the basic piece of the original algorithm was to take the inner product of $\mathbf{a}_j$ and all the relevant $\mathbf{q}_i$. Using the rewritten version of Gram-Schmidt in terms of projections we then have

$$\mathbf{v}_j = (I - Q_{j-1}Q^T_{j-1}) \mathbf{a}_j.$$

Each of these projections has a different rank, $m - (j - 1)$, although we know that the resulting $\mathbf{v}_j$ are linearly independent by construction. The modified version of Gram-Schmidt instead uses projections that are all of rank $m-1$. To construct this projection, remember that we can again construct the complement to a projection and perform the following sequence of projections

$$P_j = \hat{P}_{\mathbf{q}_{j-1}} \hat{P}_{\mathbf{q}_{j-2}} \cdots \hat{P}_{\mathbf{q}_{2}} \hat{P}_{\mathbf{q}_{1}}$$

where

$$\hat{P}_{\mathbf{q}_{i}} = I - \mathbf{q}_i\mathbf{q}_i^T$$

which projects onto the complementary space orthogonal to $\mathbf{q}_i$.

Note that this performs mathematically the same job as $(I - Q_{j-1}Q^T_{j-1})\mathbf{a}_j$; however, each of these projectors is of rank $m - 1$. The reason why this approach is more stable is that we are not projecting with a possibly arbitrarily low-rank projector; instead we only apply high-rank projectors.

again...a picture is probably worth a lot here.

This leads to the following set of calculations:

1.vi(1)=ai2.vi(2)= ⁣P^q1vi(1)=vi(1)q1q1vi(1)3.vi(3)= ⁣P^q2vi(2)=vi(2)q2q2vi(2) i.vi(i)= ⁣P^qi1vi(i1)=vi(i1)qi1qi1vi(i1)\begin{aligned} 1.\quad \mathbf{v}^{(1)}_i &= \mathbf{a}_i \\ 2.\quad \mathbf{v}^{(2)}_i &= \hat{\!P}_{\mathbf{q}_1} \mathbf{v}_i^{(1)} = \mathbf{v}^{(1)}_i - \mathbf{q}_1 q_1^\ast \mathbf{v}^{(1)}_i \\ 3.\quad \mathbf{v}^{(3)}_i &= \hat{\!P}_{\mathbf{q}_2} \mathbf{v}_i^{(2)} = \mathbf{v}^{(2)}_i - \mathbf{q}_2 \mathbf{q}_2^\ast \mathbf{v}^{(2)}_i \\ & \text{ } \vdots & &\\ i.\quad \mathbf{v}^{(i)}_i &= \hat{\!P}_{\mathbf{q}_{i-1}} \mathbf{v}_i^{(i-1)} = \mathbf{v}_i^{(i-1)} - \mathbf{q}_{i-1} \mathbf{q}_{i-1}^\ast \mathbf{v}^{(i-1)}_i \end{aligned}


Example: Implementation of modified Gram-Schmidt. Implement the modified Gram-Schmidt algorithm, checking to make sure the resulting factorization has the required properties.

# Implement Modified Gram-Schmidt Iteration
def mod_GS(A):
    m, n = A.shape
    Q = numpy.empty((m, n))
    R = numpy.zeros((n, n))
    v = A.copy()
    for i in range(n):
        R[i, i] = numpy.linalg.norm(v[:, i], ord=2)
        Q[:, i] = v[:, i] / R[i, i]
        for j in range(i + 1, n):
            R[i, j] = numpy.dot(Q[:, i].conjugate(), v[:, j])
            v[:, j] -= R[i, j] * Q[:, i]
    return Q, R

And check

A = numpy.array([[12, -51, 4], [6, 167, -68], [-4, 24, -41]], dtype=float)
print(A)
Q, R = mod_GS(A)
print("Q=\n{}\n".format(Q))
print("Q^TQ=\n{}".format(numpy.dot(Q.transpose(), Q)))
print("R=\n{}\n".format(R))
print("QR - A=\n{}".format(numpy.dot(Q, R) - A))

Householder Triangularization

One way to interpret Gram-Schmidt orthogonalization is as a series of right multiplications of the matrix $A$ by upper triangular matrices. For instance, the first step of the modified algorithm is to divide through by the norm $r_{11} = ||\mathbf{v}_1||$ to give $\mathbf{q}_1$:

[v1v2v3vn][1r1111]=[q1v2v3vn]\begin{bmatrix} & & & \\ & & & \\ \mathbf{v}_1 & \mathbf{v}_2 & \mathbf{v}_3 & \cdots & \mathbf{v}_n \\ & & & \\ & & & \end{bmatrix} \begin{bmatrix} \frac{1}{r_{11}} & &\cdots & \\ & 1 & \\ & & \ddots & \\ & & & 1 \end{bmatrix} = \begin{bmatrix} & & & \\ & & & \\ \mathbf{q}_1 & \mathbf{v}_2 & \mathbf{v}_3 & \cdots & \mathbf{v}_n \\ & & & \\ & & & \end{bmatrix}

We can also perform all the step (2) evaluations, combining the step that projects onto the complement of $\mathbf{q}_1$, by adding the appropriate values to the entire first row:

[v1v2v3vn][1r11r12r11r13r1111]=[q1v2(2)v3(2)vn(2)]\begin{bmatrix} & & & \\ & & & \\ \mathbf{v}_1 & \mathbf{v}_2 & \mathbf{v}_3 & \cdots & \mathbf{v}_n \\ & & & \\ & & & \end{bmatrix} \begin{bmatrix} \frac{1}{r_{11}} & -\frac{r_{12}}{r_{11}} & -\frac{r_{13}}{r_{11}} & \cdots \\ & 1 & \\ & & \ddots & \\ & & & 1 \end{bmatrix} = \begin{bmatrix} & & & \\ & & & \\ \mathbf{q}_1 & \mathbf{v}_2^{(2)} & \mathbf{v}_3^{(2)} & \cdots & \mathbf{v}_n^{(2)} \\ & & & \\ & & & \end{bmatrix}

The next step can then be placed into the second row:

$$\begin{bmatrix} & & & & \\ \mathbf{v}_1 & \mathbf{v}_2 & \mathbf{v}_3 & \cdots & \mathbf{v}_n \\ & & & & \end{bmatrix} \cdot R_1 \cdot \begin{bmatrix} 1 & & & & \\ & \frac{1}{r_{22}} & -\frac{r_{23}}{r_{22}} & -\frac{r_{24}}{r_{22}} & \cdots \\ & & 1 & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix} = \begin{bmatrix} & & & & \\ \mathbf{q}_1 & \mathbf{q}_2 & \mathbf{v}_3^{(3)} & \cdots & \mathbf{v}_n^{(3)} \\ & & & & \end{bmatrix}$$

If we identify the matrices as $R_1$ for the first case, $R_2$ for the second case, and so on, we can write the algorithm as

$$A \underbrace{R_1 R_2 \cdots R_n}_{\hat{R}^{-1}} = \hat{Q}.$$

This view of Gram-Schmidt is called Gram-Schmidt triangularization.

Householder triangularization is similar in spirit. Instead of multiplying $A$ on the right, Householder multiplies $A$ on the left by unitary matrices $Q_k$. Remember that a unitary matrix (or an orthogonal matrix when strictly real) has its adjoint as its inverse (transpose when real), $Q^* = Q^{-1}$, so that $Q^* Q = I$. We therefore have

$$Q_n Q_{n-1} \cdots Q_2 Q_1 A = R$$

which, if we identify $Q^* = Q_n Q_{n-1} \cdots Q_2 Q_1$, lets us define $Q = Q^\ast_1 Q^\ast_2 \cdots Q^\ast_{n-1} Q^\ast_n$, which is also unitary.

In contrast to Gram-Schmidt and its variants, which transform $A\rightarrow Q$ and calculate $R$ along the way, Householder triangularization transforms $A\rightarrow R$ by multiplying $A$ by a series of square unitary matrices that systematically put zeros in the subdiagonal (similar to the $LU$ factorization), so that

$$A = QR.$$

We can then write this as

$$\begin{aligned} Q_n Q_{n-1} \cdots Q_2 Q_1 A &= R \\ Q_{n-1} \cdots Q_2 Q_1 A &= Q^\ast_n R \\ &\;\vdots \\ A &= Q^\ast_1 Q^\ast_2 \cdots Q^\ast_{n-1} Q^\ast_n R \\ A &= Q R \end{aligned}$$

This way we can think of Householder triangularization as introducing zeros into $A$ via orthogonal matrices.

[xxxxxxxxxxxx]Q1[+++++++++]Q2[+++----]Q3[+++--*]=R\begin{bmatrix} \text{x} & \text{x} & \text{x} \\ \text{x} & \text{x} & \text{x} \\ \text{x} & \text{x} & \text{x} \\ \text{x} & \text{x} & \text{x} \\ \end{bmatrix} \overset{Q_1}{\rightarrow} \begin{bmatrix} \text{+} & \text{+} & \text{+} \\ & \text{+} & \text{+} \\ & \text{+} & \text{+} \\ & \text{+} & \text{+} \\ \end{bmatrix} \overset{Q_2}{\rightarrow} \begin{bmatrix} \text{+} & \text{+} & \text{+} \\ & \text{-} & \text{-} \\ & & \text{-} \\ & & \text{-} \\ \end{bmatrix} \overset{Q_3}{\rightarrow} \begin{bmatrix} \text{+} & \text{+} & \text{+} \\ & \text{-} & \text{-} \\ & & \text{*} \\ & & \\ \end{bmatrix} = R

Now the question is how do we construct the QkQ_k. The construction is usually broken down into a block matrix of the form

Qk=[I00H]Q_k = \begin{bmatrix} I & 0 \\ 0 & H \end{bmatrix}

where ICk1×k1I \in \mathbb C^{k-1 \times k-1} is the identity matrix that preserves the top k1k-1 rows

and HC(mk+1)×(mk+1)H \in \mathbb C^{(m - k + 1) \times (m - k + 1)} is a unitary matrix that modifies only the lower mk+1m-k+1 rows.

Note that this will leave the rows and columns we have already worked on alone and be unitary.
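As a quick numerical sanity check (a sketch with a made-up 2×2 reflector standing in for HH, not part of the algorithm itself), a block matrix of this form is indeed unitary and leaves the top components of a vector untouched:

```python
import numpy

# a 2x2 Householder reflector, H = I - 2 q q^T, is unitary by construction
q = numpy.array([3.0, 4.0]) / 5.0
H = numpy.eye(2) - 2.0 * numpy.outer(q, q)

# Q_k with a 1x1 identity block (the case m = 3, k = 2)
Q_k = numpy.eye(3)
Q_k[1:, 1:] = H

x = numpy.array([1.0, 2.0, 3.0])
print(numpy.allclose(Q_k.T @ Q_k, numpy.eye(3)))  # Q_k is unitary
print((Q_k @ x)[0] == x[0])                       # first component untouched
```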

We now turn to the task of constructing the matrix HH. Note that the definition of QkQ_k implies that

Qkx=[I00H]x=[I00H][xtopxbottom]=[xtopHxbottom].\begin{aligned} Q_k \mathbf{x} &= \begin{bmatrix} I & 0 \\ 0 & H \end{bmatrix} \mathbf{x} = \begin{bmatrix} I & 0 \\ 0 & H \end{bmatrix} \begin{bmatrix} \mathbf{x}_{top} \\ \mathbf{x}_{bottom} \end{bmatrix} = \begin{bmatrix} \mathbf{x}_{top} \\ H\mathbf{x}_{bottom} \end{bmatrix}. \end{aligned}

The vectors xtop\mathbf{x}_{top} and xbottom\mathbf{x}_{bottom} are defined to be

\begin{aligned} \mathbf{x}_{top} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{k-1}\end{bmatrix}, \quad \mathbf{x}_{bottom} = \begin{bmatrix} x_k \\ x_{k+1} \\ \vdots \\ x_m \end{bmatrix}. \end{aligned}

The task is to construct the matrix HH so that multiplying HxbottomH\mathbf{x}_{bottom} results in a vector that has some nonzero number in the top position and zeros everywhere else.

or

Hxbottom=[α00]H\mathbf{x}_{bottom} = \begin{bmatrix} \alpha \\ 0 \\ \vdots \\ 0 \end{bmatrix}

But if HH is unitary it cannot change the length of a vector, which implies

α=xbottom|\alpha| = ||\mathbf{x}_{bottom}||

One unitary operation that accomplishes this is the Householder reflection matrix, which reflects the vector x\mathbf{x} over the hyperplane HH that is normal to the vector q\mathbf{q}, so that Hx=xe^1H \mathbf{x} = ||\mathbf{x}|| \hat{\mathbf{e}}_1:

Drawing

or mathematically

\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{n} \end{bmatrix}, \quad H\mathbf{x} = \begin{bmatrix} ||\mathbf{x}|| \\ 0 \\ \vdots \\ 0 \end{bmatrix} = ||\mathbf{x}|| \mathbf{e}_1, \quad\text{where}\quad \mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}

is the first unit vector.

This is of course the effect on only one vector. Any other vector will be reflected across HH (technically a hyperplane) which is orthogonal to

v=xe^1x.\mathbf{v} = ||\mathbf{x}|| \hat{e}_1 - \mathbf{x}.

This has a similar construction as to the projector complements we were working with before. Consider the projector defined as

Px=(IqqT)x=xq(qTx)P x = \left (I - \mathbf{q} \mathbf{q}^T\right)\mathbf{x} = \mathbf{x} - \mathbf{q}(\mathbf{q}^T\mathbf{x})

where

q=vv.\mathbf{q} = \frac{\mathbf{v}}{||\mathbf{v}||}.

This vector PxP\mathbf{x} is now orthogonal to v\mathbf{v}. i.e. lies in the plane HH

Since we actually want to transform x\mathbf{x} to lie in the direction of e^1\hat{e}_1, we need to go twice as far, which allows us to identify the matrix HH as

H=I2qqT.H = I - 2 \mathbf{q} \mathbf{q}^T.

Drawing
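As a minimal numerical check of this construction (using the simple choice v=xe^1x\mathbf{v} = ||\mathbf{x}|| \hat{e}_1 - \mathbf{x} and a made-up vector x\mathbf{x}), the reflector sends x\mathbf{x} exactly to xe1||\mathbf{x}|| \mathbf{e}_1:

```python
import numpy

x = numpy.array([3.0, 4.0])
e1 = numpy.array([1.0, 0.0])

# reflect across the hyperplane orthogonal to v = ||x|| e_1 - x
v = numpy.linalg.norm(x) * e1 - x
q = v / numpy.linalg.norm(v)
H = numpy.eye(2) - 2.0 * numpy.outer(q, q)

print(H @ x)  # approximately [5, 0] = ||x|| e_1
```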

There is a non-uniqueness in which direction we reflect, since another choice of hyperplane, orthogonal to the one we originally chose, is also available. For numerical stability we choose the reflector whose image is most different from x\mathbf{x}. This avoids the numerical difficulties that arise when the vector x\mathbf{x} is nearly aligned with e^1\hat{e}_1 for one of the two choices of HH. By convention the v\mathbf{v} chosen is defined by

v=sign(x1)xe^1+x.\mathbf{v} = \text{sign}(x_1)||\mathbf{x}|| \hat{e}_1 + \mathbf{x}.
# Implementation of Householder QR Factorization
def householder_QR(A, verbose=False):
    """Compute A = QR via Householder reflections.

    Returns the reduced Q (m x n) and R (m x n, upper triangular in its
    first n rows).
    """
    R = A.copy()
    m, n = A.shape
    QT = numpy.eye(m)
    for k in range(n):
        x = numpy.zeros(m)
        e = numpy.zeros(m)
        x[k:] = R[k:, k]
        e[k] = 1.0
        # simplest version: v = ||x|| e - x
        # v = numpy.linalg.norm(x, ord=2) * e - x
        # numerically stable version (guard against numpy.sign(0) = 0)
        sign = numpy.sign(x[k]) if x[k] != 0.0 else 1.0
        v = sign * numpy.linalg.norm(x, ord=2) * e + x
        v = v / numpy.linalg.norm(v, ord=2)
        # apply the reflector H = I - 2 v v^T to R and to Q^T
        R -= 2.0 * numpy.outer(v, numpy.dot(v.T, R))
        QT -= 2.0 * numpy.outer(v, numpy.dot(v.T, QT))
    Q = QT.T[:, :n]
    return Q, R
A = numpy.array([[12, -51, 4], [6, 167, -68], [-4, 24, -41]], dtype=float)
print("Matrix A = ")
print(A)
%precision 6
Q, R = householder_QR(A, verbose=False)
print("Householder (reduced) Q =\n{}\n".format(Q))
print("Householder (full) R = ")
print(R)
m, n = A.shape
print(
    "Check to see if factorization worked...||A -QR|| = {}".format(
        numpy.linalg.norm(A - numpy.dot(Q, R[:n, :n]))
    )
)
print(A - numpy.dot(Q, R[:n, :n]))
print(
    "\nCheck if Q is unitary...||Q^TQ -I|| = {}".format(
        numpy.linalg.norm(Q.T.dot(Q) - numpy.eye(n))
    )
)
print(Q.T.dot(Q))

Comparison of accuracy of the different algorithms

As it turns out, not all QRQR algorithms produce the same quality of orthogonalization. Here we provide a few examples that compare the behavior of the 3 different algorithms and numpy.linalg.qr

Example 1: Random Matrix QR

Here we construct a large matrix AA with random singular vectors and widely varying singular values. The values along the diagonal of RR give us some idea of the size of the projections as we go, i.e. the larger the values the less effective we are at constructing orthogonal directions.

N = 80
# construct a random matrix with known singular values
U, X = numpy.linalg.qr(numpy.random.random((N, N)))
V, X = numpy.linalg.qr(numpy.random.random((N, N)))
S = numpy.diag(2.0 ** numpy.arange(-1.0, -(N + 1), -1.0))
A = numpy.dot(U, numpy.dot(S, V))

fig = plt.figure(figsize=(8, 6))
axes = fig.add_subplot(1, 1, 1)
Q, R = classic_GS(A)
axes.semilogy(numpy.diag(R), "bo", label="Classic")
Q, R = mod_GS(A)
axes.semilogy(numpy.diag(R), "ro", label="Modified")
Q, R = householder_QR(A)
axes.semilogy(numpy.diag(R), "ko", label="Householder")
Q, R = numpy.linalg.qr(A)
axes.semilogy(numpy.diag(R), "go", label="numpy")

axes.set_xlabel("Index", fontsize=16)
axes.set_ylabel("$R_{ii}$", fontsize=16)
axes.legend(loc=3, fontsize=14)
axes.plot(numpy.arange(0, N), numpy.ones(N) * numpy.sqrt(numpy.finfo(float).eps), "k--")
axes.plot(numpy.arange(0, N), numpy.ones(N) * numpy.finfo(float).eps, "k--")

plt.show()

Example 2: Comparing Orthogonality

Consider the QRQR factorization of the ill-conditioned matrix

A=[0.700000.707110.700010.70711].A = \begin{bmatrix} 0.70000 & 0.70711 \\ 0.70001 & 0.70711 \end{bmatrix}.
%precision 16

A = numpy.array([[0.7, 0.70711], [0.70001, 0.70711]])
print("\ncond(A) = {}".format(numpy.linalg.cond(A)))

To check that the matrix QQ is really unitary, we compute A=QRA=QR with the different algorithms and compare

QTQI||Q^TQ - I||
Q_c, R = classic_GS(A)
r_c = numpy.linalg.norm(numpy.dot(Q_c.transpose(), Q_c) - numpy.eye(2))
print("Classic: ", r_c)

Q, R = mod_GS(A)
print("Modified: ", numpy.linalg.norm(numpy.dot(Q.transpose(), Q) - numpy.eye(2)))

Q_h, R = householder_QR(A)
r_h = numpy.linalg.norm(numpy.dot(Q_h.transpose(), Q_h) - numpy.eye(2))
print("Householder:", r_h)

Q, R = numpy.linalg.qr(A)
r = numpy.linalg.norm(numpy.dot(Q.transpose(), Q) - numpy.eye(2))
print("Numpy: ", r)

print("\ncond(A) = {}".format(numpy.linalg.cond(A)))
print("r_classic/r_householder = {}".format(r_c / r_h))

Applications of the QR

The QRQR factorization, and unitary transformations such as Householder reflections, play important roles in a wide range of numerical algorithms

Least-Squares problems: Solving Ax=bA\mathbf{x} = \mathbf{b} with QR

We have already discussed solving overdetermined least-squares problems using the QRQR factorization. In general, if we seek least-squares solutions to Ax=bA\mathbf{x}=\mathbf{b} and A=QRA=QR, the problem reduces to solving the triangular problem

Rx=QTbR\mathbf{x} = Q^T\mathbf{b}

If we use Householder triangularization to transform ARA\rightarrow R directly, we do not need to explicitly form the matrix QQ and can save memory and computation. If we consider the augmented system [Ab]\begin{bmatrix} A & \mathbf{b}\end{bmatrix} and apply the same sequence of unitary transformations (rank-1 updates) we used in the Householder algorithm, the sequence becomes

QnQ2Q1[Ab]=[QTAQTb]=[Rc]Q_n\ldots Q_2Q_1\begin{bmatrix} A & \mathbf{b}\end{bmatrix} = \begin{bmatrix} Q^TA & Q^T\mathbf{b}\end{bmatrix} =\begin{bmatrix} R & \mathbf{c}\end{bmatrix}

Thus we just need to solve

Rx=cR\mathbf{x} = \mathbf{c}

Alternatively, during the QRQR factorization by Householder, we can just store the vectors q1qn\mathbf{q}_1\ldots\mathbf{q}_n and reconstruct QTbQ^T\mathbf{b} by rank-1 updates as necessary.
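As a sketch of this idea (with made-up data, and using numpy's built-in QR rather than the Householder routine above), solving the small triangular system Rx=QTbR\mathbf{x} = Q^T\mathbf{b} reproduces the least-squares solution:

```python
import numpy

# an overdetermined system (data values made up for illustration)
A = numpy.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = numpy.array([1.0, 2.9, 5.1, 7.0])

# reduced QR, then solve the triangular system R x = Q^T b
Q, R = numpy.linalg.qr(A)
x_qr = numpy.linalg.solve(R, Q.T @ b)

# agrees with numpy's least-squares solver
x_ls = numpy.linalg.lstsq(A, b, rcond=None)[0]
print(numpy.allclose(x_qr, x_ls))
```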

Finding Eigenvalues

As it turns out, the QRQR factorization and Householder transformation are also extremely useful for finding eigenvalues of a matrix AA. There is an entire notebook 13_LA_eigen.ipynb that develops these ideas in detail. Here we will just discuss the parts that relate to the QRQR and orthogonalization algorithms.

The basics

The eigenproblem

Ax=λxA \mathbf{x} = \lambda \mathbf{x}

can be rewritten as

(AλI)x=0( A - \lambda I)\mathbf{x} = \mathbf{0}

which implies that the eigenvectors lie in the null space of AλIA-\lambda I.

However, for this null space to be non-trivial, the matrix AλIA-\lambda I must be singular.

Characteristic Polynomial

If AλIA-\lambda I is singular, it follows that

det(AλI)=PA(λ)=0\det( A - \lambda I) = {\cal P}_A(\lambda) = 0

where PA(λ){\cal P}_A(\lambda) can be shown to be an mmth-order polynomial in λ\lambda, known as the characteristic polynomial of the matrix AA
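As a small illustration (using numpy.poly, which computes the characteristic polynomial coefficients of a matrix), the roots of PA(λ){\cal P}_A(\lambda) are exactly the eigenvalues:

```python
import numpy

A = numpy.array([[2.0, 1.0], [1.0, 3.0]])

# coefficients of the characteristic polynomial of A
coeffs = numpy.poly(A)
print(coeffs)  # approximately [1., -5., 5.], i.e. lambda^2 - 5 lambda + 5

# its roots are exactly the eigenvalues of A
print(numpy.sort(numpy.roots(coeffs)))
print(numpy.sort(numpy.linalg.eigvals(A)))
```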

Computing Eigenvalues

In basic linear algebra classes we usually find the eigenvalues by directly calculating the roots of PA(λ){\cal P}_A(\lambda) which can work for low-degree polynomials. Unfortunately the following theorem (due to Galois) suggests this is not a good way to compute eigenvalues:

Theorem: For any m5m \geq 5 there is a polynomial P(z)\mathcal{P}(z) of degree mm with rational coefficients that has a real root P(z0)=0\mathcal{P}(z_0) = 0 with the property that z0z_0 cannot be written using any expression involving rational numbers, addition, subtraction, multiplication, division, and kkth roots.

I.e., there is no way to find the roots of a polynomial of degree >4>4 in a deterministic, fixed number of steps.

Not all is lost however!

We just must use an iterative approach where we construct a sequence that converges to the eigenvalues.

Some Questions

  • How does this relate to how we found roots previously?

  • Why will it still be difficult to use our rootfinding routines to find Eigenvalues?

We will return to how we actually find eigenvalues (and roots of polynomials) after a bit more review.

Similarity Transformations

Generally, we say any two matrices AA and BB are similar if they can be related through an invertible matrix MM as

A=M1BMA = M^{-1} B M

Example

The general Eigen problem

Ax=λxA\mathbf{x}=\lambda\mathbf{x}

is really nn problems, one for each eigenvalue/eigenvector pair, i.e.

Axi=λixifori=1,,nA\mathbf{x}_i = \lambda_i\mathbf{x}_i\quad\text{for}\quad i=1,\ldots,n

which can be written concisely in matrix form as

AX=XΛAX =X\Lambda

where XX is a matrix whose columns contain the eigenvectors and Λ\Lambda is a diagonal matrix of the corresponding eigenvalues. This relation always holds.

Example Diagonalizable matrices

If a matrix ARn×nA\in\mathbb{R}^{n\times n} has nn linearly independent eigenvectors, then we say AA is diagonalizable and we can factor it as

A=XΛX1A = X\Lambda X^{-1}

Where XX is a matrix of eigenvectors, and Λ\Lambda is a diagonal matrix of corresponding Eigenvalues.

which says AA is similar to Λ\Lambda with similarity transform XX

Theorem:

If AA and BB are similar matrices, they have the same eigenvalues and their eigenvectors are related through an invertible matrix MM

Proof: Let

B=MAM1B = M A M^{-1}

or

BM=MABM = MA

if Ax=λxA\mathbf{x} = \lambda\mathbf{x} then

BMx=MAx=λMxBM\mathbf{x} = M A\mathbf{x} = \lambda M\mathbf{x}

or

By=λyB\mathbf{y} = \lambda\mathbf{y}

which shows that λ\lambda is also an eigenvalue of BB with corresponding eigenvector y=Mx\mathbf{y} = M\mathbf{x}
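A quick numerical illustration of this theorem (with a random, almost surely invertible MM):

```python
import numpy

numpy.random.seed(0)
A = numpy.random.rand(4, 4)
M = numpy.random.rand(4, 4)  # a random matrix is almost surely invertible

B = M @ A @ numpy.linalg.inv(M)

# similar matrices share the same spectrum (up to ordering and roundoff)
print(numpy.allclose(numpy.sort(numpy.linalg.eigvals(A)),
                     numpy.sort(numpy.linalg.eigvals(B))))
```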

Schur Factorization

A Schur factorization of a matrix AA is defined as

A=QTQA = Q T Q^\ast

where QQ is unitary and TT is upper triangular. Because Q=Q1Q^\ast=Q^{-1} for unitary matrices, it follows directly that AA and TT are similar.

  • Good News! TT is upper triangular so its eigenvalues can just be read off the diagonal

  • Bad News! There is no deterministic way to calculate TT as that would violate Galois theory of polynomials

Theorem: Every matrix ACm×mA \in \mathbb C^{m \times m} has a Schur factorization.

Partial Proof for a diagonalizable matrix. If AA is diagonalizable, A=XΛX1A=X\Lambda X^{-1}. But we know we can always factor X=QRX=QR and substitute to show

A=Q(RΛR1)QA = Q(R\Lambda R^{-1})Q^\ast

and it is not hard to show that the product RΛR1R\Lambda R^{-1} is also an upper triangular matrix (exercise left to the reader).
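A quick numerical check of this claim (not a proof) with a made-up upper triangular RR:

```python
import numpy

numpy.random.seed(1)
R = numpy.triu(numpy.random.rand(4, 4))  # upper triangular, nonzero diagonal
Lam = numpy.diag(numpy.arange(1.0, 5.0))

T = R @ Lam @ numpy.linalg.inv(R)

# the product is again upper triangular, with Lambda's entries on its diagonal
print(numpy.allclose(numpy.tril(T, -1), 0.0))
print(numpy.allclose(numpy.diag(T), numpy.diag(Lam)))
```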

(For a non-diagonalizable matrix the proof requires showing the existence of the Jordan form A=MJM1A=MJM^{-1})

Note that the above results imply the following

  • An eigen-decomposition A=XΛX1A = X \Lambda X^{-1} exists if and only if AA is non-defective (it has a complete set of eigenvectors)

  • A unitary diagonalization A=QΛQA = Q \Lambda Q^\ast exists if and only if AA is normal (AA=AAA^\ast A = A A^\ast); in particular every Hermitian matrix (A=AA^\ast = A) has one, with real eigenvalues

  • A Schur factorization always exists

Note that each of these lead to a means for isolating the eigenvalues of a matrix and will be useful when considering algorithms for finding them.

Hessenberg form

The first step to finding the Schur factorization is to try and get AA as close to triangular as possible without changing its eigenvalues. This requires a series of similarity transformations.

As it turns out, the closest we can get in a finite number of steps is to reduce AA to Hessenberg form, which is upper triangular with one extra subdiagonal. This can be done with m2m-2 explicit similarity transformations built from Householder reflections.

[xxxxxxxxxxxxxxxxxxxxxxxxx]H1A0H1[xxxxxxxxxx0xxxx0xxxx0xxxx]H2A1H2[xxxxxxxxxx0xxxx00xxx00xxx]H3A2H3[xxxxxxxxxx0xxxx00xxx000xx]\begin{bmatrix} \text{x} & \text{x} & \text{x} & \text{x} & \text{x} \\ \text{x} & \text{x} & \text{x} & \text{x} & \text{x} \\ \text{x} & \text{x} & \text{x} & \text{x} & \text{x} \\ \text{x} & \text{x} & \text{x} & \text{x} & \text{x} \\ \text{x} & \text{x} & \text{x} & \text{x} & \text{x} \end{bmatrix} \overset{H_1^\ast A_0 H_1}{\rightarrow} \begin{bmatrix} \text{x} & \text{x} & \text{x}& \text{x} & \text{x} \\ \text{x} & \text{x} & \text{x}& \text{x} & \text{x} \\ 0 & \text{x} & \text{x}& \text{x} & \text{x} \\ 0 & \text{x} & \text{x}& \text{x} & \text{x} \\ 0 & \text{x} & \text{x}& \text{x} & \text{x} \end{bmatrix} \overset{H_2^\ast A_1H_2}{\rightarrow} \begin{bmatrix} \text{x} & \text{x} & \text{x}& \text{x} & \text{x} \\ \text{x} & \text{x} & \text{x}& \text{x} & \text{x} \\ 0 & \text{x} & \text{x}& \text{x} & \text{x} \\ 0 & 0 & \text{x}& \text{x} & \text{x} \\ 0 & 0 & \text{x}& \text{x} & \text{x} \end{bmatrix} \overset{H_3^\ast A_2H_3}{\rightarrow} \begin{bmatrix} \text{x} & \text{x} & \text{x}& \text{x} & \text{x} \\ \text{x} & \text{x} & \text{x}& \text{x} & \text{x} \\ 0 & \text{x} & \text{x}& \text{x} & \text{x} \\ 0 & 0 & \text{x}& \text{x} & \text{x} \\ 0 & 0 & 0 & \text{x} & \text{x} \end{bmatrix}

so we have the sequence H=QAQH = Q^\ast A Q which has the same eigenvalues as the original matrix AA.
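As a sketch of this reduction (a minimal numpy-only version; in practice one would use a library routine such as scipy.linalg.hessenberg), each step mirrors the Householder QR code above but applies the reflector from both sides so the eigenvalues are preserved:

```python
import numpy

def hessenberg(A):
    """Reduce A to upper Hessenberg form H = Q^* A Q via Householder reflectors."""
    H = A.astype(float).copy()
    m = H.shape[0]
    for k in range(m - 2):
        x = H[k + 1:, k].copy()
        # reflector vector with the usual sign choice for stability
        x[0] += (1.0 if x[0] >= 0.0 else -1.0) * numpy.linalg.norm(x)
        v = x / numpy.linalg.norm(x)
        # similarity transform: apply the reflector from the left and the right
        H[k + 1:, k:] -= 2.0 * numpy.outer(v, v @ H[k + 1:, k:])
        H[:, k + 1:] -= 2.0 * numpy.outer(H[:, k + 1:] @ v, v)
    return H

numpy.random.seed(42)
A = numpy.random.rand(5, 5)
H = hessenberg(A)

# zeros below the first subdiagonal, and the same eigenvalues as A
print(numpy.allclose(numpy.tril(H, -2), 0.0))
print(numpy.allclose(numpy.sort(numpy.linalg.eigvals(A)),
                     numpy.sort(numpy.linalg.eigvals(H))))
```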

Question? Why can’t we just use Householder to take ATA\rightarrow T like we did for the QRQR?

QR/RQ Algorithm

Given a matrix in Hessenberg form, it turns out we can use repeated QRQR factorizations to reduce the size of the subdiagonal and iterate towards the Schur factorization to find all the eigenvalues simultaneously.

The simplest algorithm just iterates

    while not converged:
        Q, R = numpy.linalg.qr(A)
        A = R.dot(Q)        

calculating the QRQR factorization of AA, then forming a new A=RQA=RQ. This sequence will eventually converge to the Schur decomposition of the matrix AA.

Code this up and see what happens.

%precision 6
m = 3
A0 = numpy.array([[2, 1, 1], [1, 3, 1], [1, 1, 4]])
MAX_STEPS = 10

A = A0
print("A=")
print(A)
for i in range(MAX_STEPS):
    Q, R = numpy.linalg.qr(A)
    A = numpy.dot(R, Q)
    print()
    print("A({}) =".format(i))
    print(A)
print()
print("True eigenvalues: ")
print(numpy.sort(numpy.linalg.eigvals(A0)))
print()
print("Computed eigenvalues: ")
print(numpy.sort(numpy.diag(A)))

So why does this work? The first step is to find the QRQR factorization

A(k1)=Q(k)R(k)A^{(k-1)} = Q^{(k)}R^{(k)}

which is equivalent to finding

(Q(k))TA(k1)=R(k)(Q^{(k)})^T A^{(k-1)} = R^{(k)}

and multiplying on the right leads to

(Q(k))TA(k1)Q(k)=R(k)Q(k)=A(k).(Q^{(k)})^T A^{(k-1)} Q^{(k)} = R^{(k)} Q^{(k)} = A^{(k)}.

In this way we can see that this is a similarity transformation of the matrix A(k1)A^{(k-1)}, since Q(k)Q^{(k)} is an orthogonal matrix (Q1=QTQ^{-1} = Q^T). The iterates A(k)A^{(k)} therefore all share the same eigenvalues, and the sequence converges to an upper triangular matrix whose diagonal contains exactly those eigenvalues.

In practice this basic algorithm is modified to include a few additions:

  1. Before starting the iteration AA is reduced to tridiagonal or Hessenberg form.

  2. Motivated by the inverse power iteration, we instead factor a shifted matrix A(k)μ(k)IA^{(k)} - \mu^{(k)} I. The shift μ\mu picked is related to the estimate given by the Rayleigh quotient. Here we have

μ(k)=(qm(k))TAqm(k)(qm(k))Tqm(k)=(qm(k))TAqm(k).\mu^{(k)} = \frac{(q_m^{(k)})^T A q_m^{(k)}}{(q_m^{(k)})^T q_m^{(k)}} = (q_m^{(k)})^T A q_m^{(k)}.
  3. Deflation is used to reduce the matrix A(k)A^{(k)} into smaller matrices once we have found (or are close to finding) an eigenvalue, to simplify the problem.

This has been the standard approach until recently for finding eigenvalues of a matrix.
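A minimal sketch of the shifted iteration (addition 2 above, without the Hessenberg reduction or deflation), applied to the symmetric matrix from the earlier example:

```python
import numpy

A0 = numpy.array([[2.0, 1.0, 1.0], [1.0, 3.0, 1.0], [1.0, 1.0, 4.0]])
A = A0.copy()
I = numpy.eye(3)

for k in range(50):
    mu = A[-1, -1]              # simple Rayleigh-quotient shift
    Q, R = numpy.linalg.qr(A - mu * I)
    A = R @ Q + mu * I          # still a similarity transform of A0

print(numpy.sort(numpy.diag(A)))
print(numpy.sort(numpy.linalg.eigvals(A0)))
```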

Application: Finding the roots of a polynomial

Numpy has a nice function called roots which returns the nn roots of an nnth-degree polynomial

p(x)=p0xn+p1xn1+p2xn2++pnp(x) = p_0 x^n + p_1 x^{n-1} + p_2 x^{n-2} + \ldots + p_n

described by a n+1n+1 vector of coefficients p\mathbf{p}

p = numpy.array([1, 1, -1])
r = numpy.roots(p)
print(r)
p = numpy.random.rand(8)
r = numpy.roots(p)
print(r)

This routine does not actually try to find the roots of a high-order polynomial directly; instead it calculates the eigenvalues of a companion matrix CC whose characteristic polynomial PC(λ)P_C(\lambda) is the monic polynomial

c(x)=c0+c1x+c2x2++cn1xn1+xnc(x) = c_0 + c_1 x + c_2 x^2 + \ldots + c_{n-1} x^{n-1} + x^n

It can be shown that this matrix can be constructed as

C(p)=[000c0100c1010c2001cn1].C(p)=\begin{bmatrix} 0 & 0 & \dots & 0 & -c_0 \\ 1 & 0 & \dots & 0 & -c_1 \\ 0 & 1 & \dots & 0 & -c_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \dots & 1 & -c_{n-1} \end{bmatrix}.
def myroots(p, verbose=False):
    """Calculate the roots of a polynomial described by coefficient vector
    in numpy.roots order
    p(x) = p_0 x^n + p_1 x^{n-1} + p_2 x^{n-2} + \ldots + p_n
    by finding the eigenvalues of the companion matrix

    returns:
    --------
    eigenvalues sorted by |\lambda|

    """

    # construct the companion matrix of the coefficient vector c
    # make p monic and reverse the order for this definition of the companion matrix
    c = numpy.flip(p / p[0])
    if verbose:
        print(c)
    m = len(c) - 1
    C = numpy.zeros((m, m))
    C[:, -1] = -c[:-1]
    C[1:, :-1] = numpy.eye(m - 1)
    if verbose:
        print("C = \n{}".format(C))

    # calculate the eigenvalues of the companion matrix, then sort by |lambda|
    eigs = numpy.linalg.eigvals(C)
    index = numpy.flip(numpy.argsort(numpy.abs(eigs)))
    return eigs[index]
p = numpy.array([1, 1, -1])
r = numpy.roots(p)
print(r)
mr = myroots(p)
print()
print(mr)
%precision 4
p = numpy.random.rand(5)
r = numpy.roots(p)
print("nproots: {}".format(r))
mr = myroots(p)
print()
print("myroots: {}".format(mr))