Skip to article frontmatterSkip to article content

Eigenproblems

Text provided under a Creative Commons Attribution license, CC-BY. All code is made available under the FSF-approved MIT license. (c) Kyle T. Mandli

Note: This material largely follows the text “Numerical Linear Algebra” by Trefethen and Bau (SIAM, 1997) and is meant as a guide and supplement to the material presented there.

from __future__ import print_function

%matplotlib inline
import warnings

import matplotlib.pyplot as plt
import numpy

Eigenproblems

Overview

We will now consider eigenproblems of the form

Ax=λxA \mathbf{x} = \lambda \mathbf{x}

where ACm×mA \in \mathbb C^{m \times m}, xCm\mathbf{x} \in \mathbb C^m and λC\lambda \in \mathbb C. The vector x\mathbf{x} is known as the eigenvector and λ\lambda the eigenvalue. The set of all eigenvalues is called the spectrum of AA.

The basics

The eigenproblem

Ax=λxA \mathbf{x} = \lambda \mathbf{x}

can be rewritten as

(AλI)x=0( A - \lambda I)\mathbf{x} = \mathbf{0}

which implies that the eigenvectors are in the Null space of AλIA-\lambda I.

However for this matrix to have a non-trivial Null space, requires that AλIA-\lambda I is singular.

Characteristic Polynomial

If AλIA-\lambda I is singular, it follows that

det(AλI)=PA(λ)=0\det( A - \lambda I) = {\cal P}_A(\lambda) = 0

where PA(λ){\cal P}_A(\lambda) can be shown to be a mmth order polynomial in λ\lambda known as the characteristic polynomial of a matrix AA

We can then state the following theorem regarding the zeros of PA\mathcal{P}_A and the eigenvalues of AA:

Theorem: λ\lambda is an eigenvalue of AA if and only if PA(λ)=0\mathcal{P}_A(\lambda) = 0.

i.e. the eigenvalues are the roots of PA(λ){\cal P}_A(\lambda), and therefore there are exactly mm eigenvalues.

Proof:

If λ is an eigenvalue of A there is a non-zero vector x s.t. λxAx=0λIA is singular (since x is a non-trivial vector in the null space of λIA)det(λIA)=0\begin{aligned} \text{If } \lambda \text{ is an eigenvalue of } A &\Leftrightarrow \text{ there is a non-zero vector } x \text{ s.t. } \lambda x - A x = 0 \\ &\Leftrightarrow \lambda I A \text{ is singular (since }x\text{ is a non-trivial vector in the null space of } \lambda I - A) \\ &\Leftrightarrow \det(\lambda I - A) = 0 \end{aligned}

Note that this theorem implies that even though ARm×mA \in \mathbb R^{m \times m} that λC\lambda \in \mathbb C.

Computing Eigenvalues

In basic linear algebra classes we usually find the eigenvalues by directly calculating the roots of PA(λ){\cal P}_A(\lambda) which can work for low-degree polynomials. Unfortunately the following theorem (due to Galois) suggests this is not a good way to compute eigenvalues:

Theorem: For an m5m \geq 5 there is a polynomial P(z)\mathcal{P}(z) of degree mm with rational coefficients that has a real root P(z0)=0\mathcal{P}(z_0) = 0 with the property that z0z_0 cannot be written using any expression involving rational numbers, addition, subtraction, multiplication, division, and kkth roots.

I.e., there is no way to find the roots of a polynomial of degree >4>4 in a deterministic, fixed number of steps.

Not all is lost however!

We just must use an iterative approach where we construct a sequence that converges to the eigenvalues.

Some Questions

  • How does this relate to how we found roots previously?

  • Why will it still be difficult to use our rootfinding routines to find Eigenvalues?

We will return to how we actually find Eigenvalues (and roots of polynomials) after a bit more review

Eigenvalue Factorization and Diagonalization

Given that there are exactly mm (possibly repeated) eigenvalues for a m×mm\times m system, the eigenproblem is really more correctly written as

Axi=λixi,i=1,2,,mA\mathbf{x}_i = \lambda_i \mathbf{x}_i, \quad i=1,2,\ldots,m

Or in Matrix form as

AX=XΛAX = X\Lambda

where XX is the matrix formed by the eigenvectors xx as its columns and Λ\Lambda is a diagonal matrix with the eigenvalues along its diagonal.

Expanded, AX=XΛA X = X \Lambda looks like:

[A][x1x2xm]=[x1x2xm1xm][λ1λ2λm1λm]\begin{bmatrix} & & & & \\ & & & & \\ & & A & & \\ & & & & \\ & & & & \end{bmatrix} \begin{bmatrix} & & & & \\ & & & & \\ \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_m \\ & & & & \\ & & & & \end{bmatrix} = \begin{bmatrix} & & & & \\ & & & & \\ \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_{m-1} & \mathbf{x}_m \\ & & & & \\ & & & & \end{bmatrix} \begin{bmatrix} \lambda_1 & & & & \\ & \lambda_2 & & & \\ & & \ddots & & \\ & & & \lambda_{m-1} & \\ & & & & \lambda_m \end{bmatrix}

Here we note that the eigenpair (xj,λj)(\mathbf{x}_j, \lambda_j) are matched as the jjth column of XX and the jjth element of Λ\Lambda on the diagonal.

Diagonalization

Eigenproblems can always be written as

AX=XΛAX = X\Lambda

However, if there are a linearly independent set of eigenvectors then the matrix XX is invertible (why?). Under this condition we can transform AA into the diagonal matrix Λ\Lambda by

Λ=X1AX\Lambda = X^{-1}A X

Factorization

or more usefully, we can rewrite AA as a product of more useful matrices

A=XΛX1A = X\Lambda X^{-1}

And use it in similar ways to how we use other factorization such as A=QRA=QR and A=LUA=LU

The Rules for diagonalizability

Unfortunately, not all matrices can be diagonalized (i.e. don’t have a full linearly independent set of eigenvectors that span Rm\mathbb{R}^m or Cm\mathbb{C}^m). Here are the rules.

A matrix can be factored as A=XΛX1A = X\Lambda X^{-1} if

  1. all eigenvalues are distinct (never repeat). These are known as simple eigenvalues

  2. Eigenvalues repeat, but for every repeated eigenvalue there can be found a linearly independent eigenvector

  3. The matrix is Hermitian (A=AA^\ast = A, or if real, then symmetric AT=AA^T = A). In this case the eigenvalues are always real and the eigenvectors can always be chosen orthonormal. In this special case X=QX=Q and

A=QΛQA = Q\Lambda Q^\ast

The issue of repeated eigenvalues can be made a bit more precise by defining

  • Algebraic multiplicity: the number of times an eigenvalue is repeated

  • Geometric multiplicity: the number of linearly independent eigenvectors corresponding to each eigenvalue.

If the algebraic multiplicity is equal to the geometric multiplicity for all λ\lambda then we can say that there is a full eigenspace and the matrix is diagonalizable.

Example: Computing Multiplicities

Compute the geometric and algebraic multiplicities for the following matrices. What is the relationship between the algebraic and geometric multiplicities?

A=[222]A = \begin{bmatrix} 2 & & \\ & 2 & \\ & & 2 \end{bmatrix}
B=[21212]B = \begin{bmatrix} 2 & 1 & \\ & 2 & 1 \\ & & 2 \end{bmatrix}
  1. The characteristic polynomial of AA is

PA(z)=(2z)(2z)(2z)=(2z)3\mathcal{P}_A(z) = (2 - z)(2 - z)(2 - z) = (2 - z)^3

so the eigenvalues are all λ=2\lambda = 2 so we know the algebraic multiplicity is 3 of this eigenvalue. The geometric multiplicity is determined by the number of linearly independent eigenvectors. For this matrix we have three eigenvectors that are all linearly independent which happen to be the unit vectors in each direction (check!). This means that the geometric multiplicity is also 3.

  1. The characteristic polynomial of BB is the same as AA so again we know λ=2\lambda = 2 but now we need to be a bit careful about the eigenvectors. In this case the only eigenvector is a scalar multiple of e1e_1 so the geometric multiplicity is 1.

Interpretations of the Eigenspace

One way to interpret the eigenproblem is that of one that tries to find the subspaces of Cm\mathbb C^m which act like scalar multiplication by λ\lambda. The eigenvectors associated with one eigenvalue then form a subspace of SCmS \subseteq \mathbb C^m.

When an eigenvalue has algebraic multiplicity that equals its geometric then it is called non-defective and otherwise defective. This property is also inherited to the matrix so in the above example AA and BB are non-defective and defective matrices respectively.

Determinant and Trace

Two important properties of matrices have important relationships with their eigenvalues, namely the determinant and trace. The determinant we have seen, the trace is defined as the sum of the elements on the diagonal of a matrix, in other words

tr(A)=i=1mAii.\text{tr}(A) = \sum^m_{i=1} A_{ii}.

The relationship between the determinant and the eigenvalues is not difficult to guess due to the nature of the characteristic polynomial. The trace of a diagonal matrix is clear and provides another suggestion to the relationship.

Theorem: The determinant det(A)\det(A) and trace trace(A)\text{trace}(A) are equal to the product and sum of the eigenvalues of AA respectively counting algebraic multiplicity.

Similarity Transformations

Generally, we say any two matrices AA and BB are similar if they can be related through an invertible matrix MM as

A=M1BMA = M^{-1} B M

Example

a diagonalizable matrix AA is similar to the diagonal matrix Λ\Lambda through the invertible matrix XX

A=XΛX1A = X\Lambda X^{-1}

Theorem: If AA and BB are similar matrices, they have the same eigenvalues and their eigenvectors are related through an invertible matrix MM

Proof: Let

B=MAM1B = M A M^{-1}

or

BM=MABM = MA

if Ax=λxA\mathbf{x} = \lambda\mathbf{x} then

BMx=MAx=λMxBM\mathbf{x} = M A\mathbf{x} = \lambda M\mathbf{x}

or

By=λyB\mathbf{y} = \lambda\mathbf{y}

which shows that λ\lambda is also an eigenvalue of BB with corresponding eigenvector y=Mx\mathbf{y} = M\mathbf{x}

Schur Factorization

A Schur factorization of a matrix AA is defined as

A=QTQA = Q T Q^\ast

where QQ is unitary and TT is upper-triangular. Because Q=Q1Q^\ast=Q^{-1} (for square unitary matrices). It follows directly that AA and TT are similar.

  • Good News! TT is upper triangular so its eigenvalues can just be read of the diagonal

  • Bad News! There is no deterministic way to calculate TT as that would violate Galois theory of polynomials

Theorem: Every matrix ACm×mA \in \mathbb C^{m \times m} has a Schur factorization.

Note that the above results imply the following

  • An eigen-decomposition A=XΛX1A = X \Lambda X^{-1} exists if and only if AA is non-defective (it has a complete set of eigenvectors)

  • A unitary transformation A=QΛQA = Q \Lambda Q^\ast exists if and only if AA is normal (AA=AAA^\ast A = A A^\ast)

  • A Schur factorization always exists

Note that each of these lead to a means for isolating the eigenvalues of a matrix and will be useful when considering algorithms for finding them.

Condition Number of a Simple Eigenvalue

Before we discuss a number of approaches to computing eigenvalues it good to consider what the condition number of a given eigenproblem is.

Let

Ax=λxAx = \lambda x

define the eigenvalue problem in question. Here we will introduce a related problem

yA=λyy^\ast A = \lambda y^\ast

where yy is the left eigenvector and from before xx is the right eigenvector. These vectors also can be shown to have the relationship yx0y^\ast x \neq 0 for a simple eigenvalue.

Now consider the perturbed problem

(A+δA)(x+δx)=(λ+δλ)(x+δx).(A + \delta A) (x + \delta x) = (\lambda + \delta \lambda) (x + \delta x).

Expanding this and throwing out quadratic terms and removing the eigenproblem we have

δAx+Aδx=δλx+λδx.\delta A x + A \delta x = \delta \lambda x + \lambda \delta x.

Multiple both sides of the above by the left eigenvector and use yx0y^\ast x \neq 0 to find

yδAx+yAδx=yδλx+yλδxyδAx=yδλx\begin{aligned} y^\ast \delta A x + y^\ast A \delta x &= y^\ast \delta \lambda x + y^\ast \lambda \delta x \\ y^\ast \delta A x &= y^\ast \delta \lambda x \end{aligned}

where we again use the slightly different definition of the eigenproblem. We can then solve for δλ\delta \lambda to find

δλ=yδAxyx\delta \lambda = \frac{y^\ast \delta A x}{y^\ast x}

meaning that the ratio between the dot-product of the left and right eigenvectors and the conjugate dot-product of the matrix δA\delta A then form a form of bound on the expected error in the simple eigenvalue.

Computing Eigenvalues

Almost all useful approaches to computing eigenvalues do so through the computation of the Schur factorization. The Schur factorization, as we have seen, will preserve the eigenvalues. The steps to compute the Schur factorization are usually broken down into two steps

  1. Directly transform AA into a Hessenberg matrix, a matrix that contains zeros below its first sub-diagonal, directly using Householder reflections. This is as close to triangular that you can get by direct similarity transformations.

  2. Use an iterative method to change the sub-diagonal into all zeros

Hessenberg and Tridiagonal form

What we want to do is construct a sequence of similarity transformations matrices that turns AA into a Hessenberg matrix with the same eigenvalues as AA. We use Householder reflections to do this with the important distinction that we only want to remove zeros below the first sub-diagonal.

[xxxxxxxxxxxxxxxxxxxxxxxxx]H1A0H1[xxxxxxxxxx0xxxx0xxxx0xxxx]H2A1H2[xxxxxxxxxx0xxxx00xxx00xxx]H3A2H3[xxxxxxxxxx0xxxx00xxx000xx]\begin{bmatrix} \text{x} & \text{x} & \text{x} & \text{x} & \text{x} \\ \text{x} & \text{x} & \text{x} & \text{x} & \text{x} \\ \text{x} & \text{x} & \text{x} & \text{x} & \text{x} \\ \text{x} & \text{x} & \text{x} & \text{x} & \text{x} \\ \text{x} & \text{x} & \text{x} & \text{x} & \text{x} \end{bmatrix} \overset{H_1^\ast A_0 H_1}{\rightarrow} \begin{bmatrix} \text{x} & \text{x} & \text{x}& \text{x} & \text{x} \\ \text{x} & \text{x} & \text{x}& \text{x} & \text{x} \\ 0 & \text{x} & \text{x}& \text{x} & \text{x} \\ 0 & \text{x} & \text{x}& \text{x} & \text{x} \\ 0 & \text{x} & \text{x}& \text{x} & \text{x} \end{bmatrix} \overset{H_2^\ast A_1H_2}{\rightarrow} \begin{bmatrix} \text{x} & \text{x} & \text{x}& \text{x} & \text{x} \\ \text{x} & \text{x} & \text{x}& \text{x} & \text{x} \\ 0 & \text{x} & \text{x}& \text{x} & \text{x} \\ 0 & 0 & \text{x}& \text{x} & \text{x} \\ 0 & 0 & \text{x}& \text{x} & \text{x} \end{bmatrix} \overset{H_3^\ast A_2H_3}{\rightarrow} \begin{bmatrix} \text{x} & \text{x} & \text{x}& \text{x} & \text{x} \\ \text{x} & \text{x} & \text{x}& \text{x} & \text{x} \\ 0 & \text{x} & \text{x}& \text{x} & \text{x} \\ 0 & 0 & \text{x}& \text{x} & \text{x} \\ 0 & 0 & 0 & \text{x} & \text{x} \end{bmatrix}

so we have the sequence H=QAQH = Q^\ast A Q which has the same eigenvalues as the original matrix AA.

Question? Why can’t we just use Householder to take ATA\rightarrow T like we did for the QRQR?

One important special case of this sequence of transformations is that if the matrix AA is hermitian (the matrix is its own conjugate transpose, A=AA = A^\ast, or symmetric in the real case) then the Hessenberg matrix is tridiagonal.

We now will focus on how to formulate the iteration step of the eigenproblem. We will also restrict our attention to symmetric, real matrices. This implies that all eigenvalues will be real and have a complete set of orthogonal eigenvectors. Generalizations can be made of many of the following algorithms but is beyond the scope of this class.

Rayleigh Quotient and Inverse Iteration

There are a number of classical approaches to computing the iterative step above which we will review here. Inverse power iteration in particular is today still the dominant means of finding the eigenvectors once the eigenvalues are known.

Rayleigh Quotient

The Rayleigh quotient of a vector xRmx \in \mathbb R^m is the scalar

r(x)=xTAxxTx.r(x) = \frac{x^T A x}{x^T x}.

The importance of the Rayleigh quotient is made clear when we evaluate r(x)r(x) at an eigenvector. When this is the case the quotient evaluates to the corresponding eigenvalue.

The Rayleigh quotient can be motivated by asking the question, given an eigenvector xx, what value α\alpha acts most like an eigenvalue in an 2\ell_2 sense:

minαAxαx2.\min_\alpha ||A x - \alpha x||_2.

This can be reformulated as a least-squares problem noting that xx is the “matrix”, α\alpha is the unknown vector (scalar) and AxAx is the right-hand side so we have

(xTx)α=xT(Ax)(x^T x) \alpha = x^T (A x)

which can be solved so that

α=r(x)=xTAxxTx.\alpha = r(x) = \frac{x^T A x}{x^T x}.

Power Iteration

Power iteration is a straightforward approach to finding the eigenvector of the largest eigenvalue of AA. The basic idea is that the sequence

xx,AxAx,A2xA2x,A3xA3x,\frac{x}{||x||}, \frac{Ax}{||Ax||}, \frac{A^2x}{||A^2x||}, \frac{A^3x}{||A^3x||}, \ldots

will converge (although very slowly) to the desired eigenvector.

We implement this method by initializing the algorithm with some vector vv with v=1||v|| = 1. We then apply the sequence of multiplications.

# Demo
# generate a random symmetric matrix
A = numpy.random.rand(3, 3)
A = 0.5 * (A + A.T)
print("A=\n{}".format(A))

lams = numpy.linalg.eigvals(A)
print("\nLambda = {}".format(lams))
# the Rayleigh Quotient
def rayleighq(A, x):
    return numpy.dot(x.T, A.dot(x)) / numpy.dot(x.T, x)


def power_iteration(A, tol=1.0e-6):
    """power_iteration to find the largest eigenvector and corresponding eigenvalue

    parameters:
    -----------

    A: ndarray (square)
        m x m matrix

    tol: float
        stopping criteria for iteration.
        iteration will cease when ||x_{i+1} - x_{i}|| < tol or
        MAX_ITS exceeded

    returns:
    --------
    x: ndarray
        array of iterates of the eigenvector
    r: ndarray
    """

    MAX_ITS = 100
    m = A.shape[0]

    x = numpy.empty((MAX_ITS, m))
    lam = numpy.empty(MAX_ITS)
    res = numpy.empty(MAX_ITS)

    # generate a random unit vector
    x0 = numpy.random.rand(A.shape[0])
    x[0, :] = x0 / numpy.linalg.norm(x0, ord=2)
    lam[0] = rayleighq(A, x0)

    for i in range(1, MAX_ITS + 1):
        xi = A.dot(x[i - 1, :])
        x[i, :] = xi / numpy.linalg.norm(xi, ord=2)
        lam[i] = rayleighq(A, x[i, :])
        res[i - 1] = numpy.abs(lam[i] - lam[i - 1]) / numpy.abs(lam[i])
        if res[i - 1] < tol:
            break

    if i == MAX_ITS:
        warnings.warn("Maximum iterations exceeded")

    x.resize(i + 1, m)
    lam.resize(i + 1)
    res.resize(i)
    return x, lam, res
x, r, res = power_iteration(A, tol=1.0e-8)
print("{} Iterations".format(len(r)))
print("x = {}".format(x[-1]))
print("Eigenvalue = {}".format(r[-1]))
print("eigs(A) = {}".format(numpy.linalg.eigvals(A)))
fig = plt.figure(figsize=(16, 6))
axes = fig.add_subplot(1, 2, 1)
axes.plot(r, "o", markersize=8)
lam_max = numpy.max(numpy.linalg.eigvals(A))
axes.plot(lam_max * numpy.ones(r.shape), "k--")

axes.grid()
axes.set_xlabel("Iteration", fontsize=16)
axes.set_ylabel("Rayleigh Quotient", fontsize=16)

axes = fig.add_subplot(1, 2, 2)
axes.semilogy(res, "o-", markersize=8)

axes.grid()
axes.set_xlabel("Iteration", fontsize=16)
axes.set_ylabel("Residual", fontsize=16)
plt.show()

The reason why this works can be seen by considering the initial vector vv as a linear combination of the orthonormal eigenvectors (which we have assumed exist) such that

v(0)=a1q1+a2q2++amqm.v^{(0)} = a_1 q_1 + a_2 q_2 + \cdots + a_m q_m.

Multiplying v(0)v^{(0)} by AA then leads to

Av(0)=v(1)=a1Aq1+a2Aq2++amAqm=c1(a1λ1q1+a2λ2q2++amλmqm)\begin{aligned} Av^{(0)} = v^{(1)} &= a_1 A q_1 + a_2 A q_2 + \cdots + a_m A q_m \\ &= c_1 (a_1 \lambda_1 q_1 + a_2 \lambda_2 q_2 + \cdots + a_m \lambda_m q_m) \\ \end{aligned}

here c1c_1 is some constant due to the fact the eigenvectors are not uniquely specified.

wRepeating this kk times we have

Av(k1)=v(k)=a1Akq1+a2Akq2++amAkqm=ck(a1λ1kq1+a2λ2kq2++amλmkqm)=ckλ1k(a1q1+a2λ2kλ1kq2++amλmkλ1kqm)\begin{aligned} Av^{(k-1)} = v^{(k)} &= a_1 A^k q_1 + a_2 A^k q_2 + \cdots + a_m A^k q_m \\ &= c_k (a_1 \lambda_1^k q_1 + a_2 \lambda_2^k q_2 + \cdots + a_m \lambda_m^k q_m) \\ &= c_k \lambda_1^k \left(a_1 q_1 + a_2 \frac{\lambda_2^k}{\lambda_1^k} q_2 + \cdots + a_m \frac{\lambda_m^k}{\lambda_1^k} q_m \right) \end{aligned}

Since λ1>λi\lambda_1 > \lambda_i for all i1i \neq 1 then in the limit the terms λ2k/λ1k\lambda_2^k / \lambda_1^k will approach zero and on normalization v(k)/v(k)q1v^{(k)}/||v^{(k)}||\rightarrow \mathbf{q}_1.

Inverse Iteration with shifts

Inverse iteration with shifts uses a similar approach with the difference being that we can use it to find any of the eigenvectors for the matrix AA.

Some Preliminaries: inverse and shift rules of Eigenvalues

Show that if x\mathbf{x} is an eigenvector of AA with eigenvalue λ\lambda, then

  • x\mathbf{x} is an eigenvector of A1A^{-1} with eigenvalue 1/λ1/\lambda

  • x\mathbf{x} is an eigenvector of AσIA -\sigma I with eigenvalue λσ\lambda - \sigma

So...

If we want to find the smallest eigenvalue we can consider the power method on A1A^{-1},

But we really don’t want to find A1A^{-1} which is expensive, instead we can do the equivalent iteration

x[0] = x0
for i in range(MAX_ITS):
    solve A w[i] = x[i]
        x[i+1] = w[i]/norm(w[i]) 

as wi=A1xi\mathbf{w}_i = A^{-1}\mathbf{x}_i

and if we want to find the eigenvalue closest to some number μ\mu we can apply the power method to

(AμI)1,(A - \mu I)^{-1},

the eigenvectors of this matrix are

(λjμ)1(\lambda_j - \mu)^{-1}

where λj\lambda_j are the eigenvalues of AA.

If μ\mu is close to a particular λj\lambda_j, say λJ\lambda_J, then

(λJμ)1(\lambda_J - \mu)^{-1}

will be larger than any of the other (λjμ)1(\lambda_j - \mu)^{-1}. In this way we effectively have picked out the eigenvalue we want to consider in the power iteration!

Rayleigh Quotient Iteration

By themselves the above approaches are not particularly useful but combining them we can iterate back and forth to find the eigenvalue, eigenvector pair:

  1. Compute the Rayleigh quotient and find an estimate for λj\lambda_j

  2. Compute one step of inverse iteration to approximate xjx_j

  3. Repeat...

def rayleigh_quotient_iteration(A, tol=1.0e10):
    """rayleigh quotient iteration to find eigenvalues
    parameters:
    -----------

    A: ndarray (square)
        m x m matrix

    tol: float
        stopping criteria for iteration.
        iteration will cease when ||x_{i+1} - x_{i}|| < tol or
        MAX_ITS exceeded

    returns:
    --------
    x: ndarray
        array of iterates of the eigenvector
    r: ndarray
    """

    MAX_ITS = 100
    m = A.shape[0]

    x = numpy.empty((MAX_ITS, m))
    lam = numpy.empty(MAX_ITS)
    res = numpy.empty(MAX_ITS)

    I = numpy.eye(m)

    # generate a random unit vector
    x0 = numpy.random.rand(A.shape[0])
    x[0, :] = x0 / numpy.linalg.norm(x0, ord=2)
    lam[0] = rayleighq(A, x0)

    for i in range(1, MAX_ITS + 1):
        # this is the only different line
        w = numpy.linalg.solve(A - lam[i - 1] * I, x[i - 1, :])
        x[i, :] = w / numpy.linalg.norm(w, ord=2)
        lam[i] = rayleighq(A, x[i, :])
        res[i - 1] = numpy.abs(lam[i] - lam[i - 1]) / numpy.abs(lam[i])
        if res[i - 1] < tol:
            break
    if i == MAX_ITS:
        warnings.warn("Maximum iterations exceeded")

    x.resize(i + 1, m)
    lam.resize(i + 1)
    res.resize(i)
    return x, lam, res
x, r, res = rayleigh_quotient_iteration(A, tol=1.0e-10)
print("{} Iterations".format(len(r)))
print("x = {}".format(x[-1]))
print("Rayleighquotient = {}".format(r[-1]))
print("eigs(A) = {}".format(numpy.linalg.eigvals(A)))
fig = plt.figure(figsize=(16, 6))
axes = fig.add_subplot(1, 2, 1)
axes.plot(r, "o", markersize=8)
lam_max = numpy.max(numpy.linalg.eigvals(A))
axes.plot(lam_max * numpy.ones(r.shape), "k--")

axes.grid()
axes.set_xlabel("Iteration", fontsize=16)
axes.set_ylabel("Rayleigh Quotient", fontsize=16)

axes = fig.add_subplot(1, 2, 2)
axes.semilogy(res, "o-", markersize=8)

axes.grid()
axes.set_xlabel("Iteration", fontsize=16)
axes.set_ylabel("Residual", fontsize=16)
plt.show()

QR/RQ Algorithm

All of the above methods pick out at most a few eigenvalues at a time. However it turns out we can use the QRQR algorithm, to iterate towards the Schur factorization and find all the eigenvalues simultaneously.

The simplest algorithm just iterates

    while not converged:
        Q, R = numpy.linalg.qr(A)
        A = R.dot(Q)        

calculating the QRQR factorization of AA, then forming a new A=RQA=RQ, This sequence will eventually converge to the Schur decomposition of the matrix AA.

Code this up and see what happens.

%precision 6
m = 3
A = numpy.array([[2, 1, 1], [1, 3, 1], [1, 1, 4]])
print(numpy.linalg.eigvals(A))
from scipy.linalg import hessenberg

A = hessenberg(A)
print(A)
MAX_STEPS = 10
for i in range(MAX_STEPS):
    Q, R = numpy.linalg.qr(A)
    A = numpy.dot(R, Q)
    print()
    print("A(%s) =" % (i))
    print(A)
print()
print("True eigenvalues: ")
print(numpy.linalg.eigvals(A))
print()
print("Computed eigenvalues: ")
print(numpy.diag(A))

So why does this work? The first step is to find the QRQR factorization of A(k1)A^{(k-1)} or

A(k1)=Q(k)R(k)A^{(k-1)} = Q^{(k)}R^{(k)}

which is equivalent to finding

(Q(k))TA(k1)=R(k)(Q^{(k)})^T A^{(k-1)} = R^{(k)}

and multiplying on the right leads to

(Q(k))TA(k1)Q(k)=R(k)Q(k).(Q^{(k)})^T A^{(k-1)} Q^{(k)} = R^{(k)} Q^{(k)}.

In this way we can see that this is a similarity transformation of the matrix A(k1)A^{(k-1)} since the Q(k)Q^{(k)} is an orthogonal matrix (Q1=QTQ^{-1} = Q^T). This of course is not a great idea to do directly but works great in this case as we iterate to find the upper triangular matrix R(k)R^{(k)} which is exactly where the eigenvalues appear.

In practice this basic algorithm is modified to include a few additions:

  1. Before starting the iteration AA is reduced to Hessenberg form.

  2. Motivated by the inverse power iteration we observed we instead consider a shifted matrix A(k)μ(k)IA^{(k)} - \mu^{(k)} I for factoring. The μ\mu picked is related to the estimate given by the Rayleigh quotient. Here we have

    μ(k)=(qm(k))TAqm(k)(qm(k))Tqm(k)=(qm(k))TAqm(k).\mu^{(k)} = \frac{(q_m^{(k)})^T A q_m^{(k)}}{(q_m^{(k)})^T q_m^{(k)}} = (q_m^{(k)})^T A q_m^{(k)}.
  3. Deflation is used to reduce the matrix A(k)A^{(k)} into smaller matrices once (or when we are close to) finding an eigenvalue to simplify the problem.

This has been the standard approach until recently for finding eigenvalues of a matrix.

Application: Finding the roots of a polynomial

Numpy has a nice function called roots which returns the nn roots of a nnth degree polynomial

p(x)=c0+c1x+c2x2++cnxnp(x) = c_0 + c_1 x + c_2 x^2 + \ldots + c_n x^n

described by a n+1n+1 vector of coefficients c\mathbf{c}

c = numpy.array([-1, -1, 1])
r = numpy.roots(c)
print(r)
c = numpy.random.rand(6)
r = numpy.roots(c)
print(r)

This routine, does not try and actually find the roots of a high-order polynomial, instead it actually calculates the eigenvalues of a companion matrix CC whose characteristic polynomial PC(λ)P_C(\lambda) is the monic polynomial pm(x)=p(x)/cnp_m(x) = p(x)/c_n.

It can be shown that this matrix can be constructed as (see e.g.)

C(p)=[000c0100c1010c2001cn1].C(p)=\begin{bmatrix} 0 & 0 & \dots & 0 & -c_0 \\ 1 & 0 & \dots & 0 & -c_1 \\ 0 & 1 & \dots & 0 & -c_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \dots & 1 & -c_{n-1} \end{bmatrix}.
def myroots(p, verbose=True):
    """Calculate the roots of a polynomial described by coefficient vector
    p(x) = p_0 + p_1 x + p_2 x^2 + \ldots + p_n x^n
    by finding the eigenvalues of the companion matrix
    """

    # construct the companion matrix of the coefficient vector c
    # make p monic and drop the last coefficient
    c = p / p[-1]
    if verbose:
        print(c)
    m = len(c) - 1
    C = numpy.zeros((m, m))
    C[:, -1] = -c[:-1]
    C[1:, :-1] = numpy.eye(m - 1)
    if verbose:
        print("C = \n{}".format(C))

    return numpy.linalg.eigvals(C)
c = numpy.array([-1, -1, 1])
r = numpy.roots(c)
print(r)
mr = myroots(c)
print
print(mr)
c = numpy.random.rand(5)
r = numpy.roots(c)
print(r)
mr = myroots(c)
print
print(mr)

Alternatives

Jacobi

Jacobi iteration employs the idea that we know the eigenvalues of a matrix of size equal to or less than 4 (we know the roots of the characteristic polynomial directly). Jacobi iteration therefore attempts to break the matrix down into at most 4 by 4 matrices along the diagonal via a series of similarity transformations based on only diagonalizing sub-matrices 4 by 4 or smaller.

Bisection

It turns out if you do not want all of the eigenvalues of a matrix that using a bisection method to find some subset of the eigenvalues is often the most efficient way to get these. This avoids the pitfall of trying to find the eigenvalues via other root-finding approaches by only needing evaluations of the function and if a suitable initial guess is provided can find the eigenvalue quickly that is closest to the initial bracket provided.

Divide-and-conquer

This algorithm is actually the one used most often used if both eigenvalues and eigenvectors are needed and performs up to twice as fast as the QRQR approach. The basic idea is to split the matrix into two pieces at every iteration by introducing zeros on the appropriate off-diagonals which neatly divides the problem into two pieces.

Arnoldi and Lanczos Iteration

Krylov subspace methods (which we will unfortunately not cover) are another approach to finding eigenvalues of a matrix. These methods generally use some piece of the QRQR approach outlined above and are extremely effective at finding the “extreme” eigenvalues of the matrix.