
Gaussian Elimination

Text provided under a Creative Commons Attribution license, CC-BY. All code is made available under the FSF-approved MIT license. (c) Kyle T. Mandli

Note: This material largely follows the text “Numerical Linear Algebra” by Trefethen and Bau (SIAM, 1997) and is meant as a guide and supplement to the material presented there.

import matplotlib.pyplot as plt
import numpy

%matplotlib inline

Gaussian Elimination

Gaussian elimination is the process where one transforms a matrix or linear system through a series of operations into one that is at the very least upper triangular (it is often also presented as including the operations that transform $A$ completely to diagonal form). This series of operations can be written as a sequence of successive matrix-matrix multiplications by lower triangular matrices. Letting these "elimination matrices" be $E_j$ and the resulting upper triangular matrix be $U$ we can write this as

$$\overbrace{E_{m-1} \cdots E_2 E_1}^{L^{-1}} A = U.$$

Labeling the successive operations as $L^{-1}$ allows us to move $L$ to the other side of the equation, and we see that what we have in fact computed is another factorization of the matrix $A$, called the $LU$ factorization.

Example

As an example of this process let's consider the matrix

$$A = \begin{bmatrix} 2 & 1 & 1 & 0 \\ 4 & 3 & 3 & 1 \\ 8 & 7 & 9 & 5 \\ 6 & 7 & 9 & 8 \end{bmatrix}.$$

The first step is to eliminate the values in the first column below the diagonal. To do this we can multiply by the matrix

$$E_1 = \begin{bmatrix} 1 & & & \\ -2 & 1 & & \\ -4 & & 1 & \\ -3 & & & 1 \end{bmatrix} \quad\text{so that}\quad E_1 A = \begin{bmatrix} 1 & & & \\ -2 & 1 & & \\ -4 & & 1 & \\ -3 & & & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 & 1 & 0 \\ 4 & 3 & 3 & 1 \\ 8 & 7 & 9 & 5 \\ 6 & 7 & 9 & 8 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 1 & 0 \\ & 1 & 1 & 1 \\ & 3 & 5 & 5 \\ & 4 & 6 & 8 \end{bmatrix}.$$

To construct $E_1$ we first define the element $a_{11} = 2$ as the "pivot" and the $(i,1)$ component of $E_1$ as the negative of the "multiplier"

$$\ell_{i,1} = \frac{a_{i1}}{a_{11}},$$

which is the number that, when multiplied by the pivot and subtracted from the element $a_{i1}$, puts a zero in the $(i,1)$ position. Put another way, each row of $E_1$ subtracts some multiple of the first row from the corresponding row (except the first, which remains the same), putting a zero below the diagonal and changing the other elements.
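As a quick check, we can build $E_1$ directly from these multipliers with numpy (a small sketch, not part of the original text):

```python
import numpy

A = numpy.array([[2., 1., 1., 0.],
                 [4., 3., 3., 1.],
                 [8., 7., 9., 5.],
                 [6., 7., 9., 8.]])

# E_1 is the identity with the negated multipliers -a_{i1}/a_{11} in column 1
E1 = numpy.eye(4)
E1[1:, 0] = -A[1:, 0] / A[0, 0]

print(E1 @ A)  # first column below the diagonal is now zero
```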

Once the first column is zeroed out below the diagonal, we can repeat the process for the smaller $3\times3$ submatrix starting at the $(2,2)$ element of

$$E_1 A = \begin{bmatrix} 2 & 1 & 1 & 0 \\ & 1 & 1 & 1 \\ & 3 & 5 & 5 \\ & 4 & 6 & 8 \end{bmatrix}.$$

The next step is to remove the values below the diagonal of the second column. This can be done with

$$E_2 = \begin{bmatrix} 1 & & & \\ & 1 & & \\ & -3 & 1 & \\ & -4 & & 1 \end{bmatrix} \quad\text{so that}\quad E_2 E_1 A = \begin{bmatrix} 2 & 1 & 1 & 0 \\ & 1 & 1 & 1 \\ & & 2 & 2 \\ & & 2 & 4 \end{bmatrix}.$$

Finally we multiply by $E_3$, defined as

$$E_3 = \begin{bmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & & -1 & 1 \end{bmatrix} \quad\text{so that}\quad E_3 E_2 E_1 A = \begin{bmatrix} 2 & 1 & 1 & 0 \\ & 1 & 1 & 1 \\ & & 2 & 2 \\ & & & 2 \end{bmatrix},$$

completing the factorization with

$$L^{-1} = E_3 E_2 E_1 = \begin{bmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & & -1 & 1 \end{bmatrix} \begin{bmatrix} 1 & & & \\ & 1 & & \\ & -3 & 1 & \\ & -4 & & 1 \end{bmatrix} \begin{bmatrix} 1 & & & \\ -2 & 1 & & \\ -4 & & 1 & \\ -3 & & & 1 \end{bmatrix}$$

and

$$U = \begin{bmatrix} 2 & 1 & 1 & 0 \\ & 1 & 1 & 1 \\ & & 2 & 2 \\ & & & 2 \end{bmatrix}.$$

We can easily invert $L^{-1}$, but we don't really need to: $L$ is just a lower triangular matrix with 1s on the diagonal whose lower triangular elements are the multipliers in their respective positions, i.e.

$$L_{ij} = \ell_{ij},$$

and finally we can write $A$ as

$$A = LU = \begin{bmatrix} 1 & & & \\ 2 & 1 & & \\ 4 & 3 & 1 & \\ 3 & 4 & 1 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 & 1 & 0 \\ & 1 & 1 & 1 \\ & & 2 & 2 \\ & & & 2 \end{bmatrix}.$$

Put another way, the matrix $L$ simply records how the rows of $A$ can be constructed from the rows of $U$.
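We can verify the factorization of the worked example numerically (a small numpy check, with $L$ and $U$ typed in from above):

```python
import numpy

L = numpy.array([[1., 0., 0., 0.],
                 [2., 1., 0., 0.],
                 [4., 3., 1., 0.],
                 [3., 4., 1., 1.]])
U = numpy.array([[2., 1., 1., 0.],
                 [0., 1., 1., 1.],
                 [0., 0., 2., 2.],
                 [0., 0., 0., 2.]])
A = numpy.array([[2., 1., 1., 0.],
                 [4., 3., 3., 1.],
                 [8., 7., 9., 5.],
                 [6., 7., 9., 8.]])

# L U should reproduce A exactly
print(numpy.allclose(L @ U, A))  # → True
```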

Solving $A\mathbf{x} = \mathbf{b}$

Given our factorization $A = LU$ it is now straightforward to solve the linear system $A\mathbf{x} = \mathbf{b}$ in just a few steps:

  1. Substitute $A = LU$ in our original equation to get
     $$LU\mathbf{x} = \mathbf{b}.$$

  2. Define $\mathbf{c} = U\mathbf{x}$ and solve $L\mathbf{c} = \mathbf{b}$ using forward substitution.

  3. Solve $U\mathbf{x} = \mathbf{c}$ using backward substitution.
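These steps can be sketched with scipy's `solve_triangular` (using the $L$ and $U$ from the example above, and a right-hand side chosen so the exact solution is $\mathbf{x} = (1,1,1,1)^T$):

```python
import numpy
from scipy.linalg import solve_triangular

# L and U from the worked example
L = numpy.array([[1., 0., 0., 0.],
                 [2., 1., 0., 0.],
                 [4., 3., 1., 0.],
                 [3., 4., 1., 1.]])
U = numpy.array([[2., 1., 1., 0.],
                 [0., 1., 1., 1.],
                 [0., 0., 2., 2.],
                 [0., 0., 0., 2.]])

# b = A @ [1, 1, 1, 1]
b = numpy.array([4., 11., 29., 30.])

c = solve_triangular(L, b, lower=True, unit_diagonal=True)  # forward substitution
x = solve_triangular(U, c)                                  # backward substitution
print(x)  # → [1. 1. 1. 1.]
```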

Forward Substitution

For forward substitution we proceed from the first row and progress downwards through the matrix. Consider the general $i$th row,

$$L_{i,1} c_1 + L_{i,2} c_2 + \cdots + L_{i,i-1} c_{i-1} + c_i = b_i,$$

where we have used the fact that the matrix $L$ has 1s on its diagonal. We can now solve for $c_i$ as

$$c_i = b_i - \left( L_{i,1} c_1 + L_{i,2} c_2 + \cdots + L_{i,i-1} c_{i-1} \right).$$
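This formula translates directly into a short loop (a minimal sketch; `forward_substitution` is a hypothetical helper, not from the text):

```python
import numpy

def forward_substitution(L, b):
    """Solve L c = b for a unit lower triangular L by forward substitution."""
    n = len(b)
    c = numpy.zeros(n)
    for i in range(n):
        # the diagonal of L is 1, so no division is needed
        c[i] = b[i] - L[i, :i] @ c[:i]
    return c

L = numpy.array([[1., 0., 0., 0.],
                 [2., 1., 0., 0.],
                 [4., 3., 1., 0.],
                 [3., 4., 1., 1.]])
b = numpy.array([4., 11., 29., 30.])
print(forward_substitution(L, b))  # → [4. 3. 4. 2.]
```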

Backward Substitution

Backward substitution requires us to start from the last row of $U$ and move upwards. Consider again the general $i$th row,

$$U_{i,i} x_i + U_{i,i+1} x_{i+1} + \cdots + U_{i,m-1} x_{m-1} + U_{i,m} x_m = c_i.$$

We can now solve for $x_i$ as

$$x_i = \frac{1}{U_{i,i}} \left( c_i - \left( U_{i,i+1} x_{i+1} + \cdots + U_{i,m-1} x_{m-1} + U_{i,m} x_m \right) \right).$$
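The companion loop runs from the last row upwards (again a minimal sketch with a hypothetical helper name):

```python
import numpy

def backward_substitution(U, c):
    """Solve U x = c for an upper triangular U by backward substitution."""
    n = len(c)
    x = numpy.zeros(n)
    for i in range(n - 1, -1, -1):
        # subtract the already-known entries, then divide by the diagonal
        x[i] = (c[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

U = numpy.array([[2., 1., 1., 0.],
                 [0., 1., 1., 1.],
                 [0., 0., 2., 2.],
                 [0., 0., 0., 2.]])
c = numpy.array([4., 3., 4., 2.])
print(backward_substitution(U, c))  # → [1. 1. 1. 1.]
```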

Unfortunately, as written this procedure can be unstable and can fail even on well-conditioned matrices.

Pivoting

As you may recall from your linear algebra course, pivoting of the rows and columns of $A$ is often an important addition to Gaussian elimination to ensure that we can in fact factorize the matrix. As a simple example take the matrix

$$A = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}.$$

Without switching the rows, Gaussian elimination would fail at the first step!

A more insidious example is given by the matrix

$$A = \begin{bmatrix} \epsilon & 1 \\ 1 & 1 \end{bmatrix}$$

on a finite precision machine where $\epsilon < \epsilon_{\text{mach}}$.

Just consider the solution by Gaussian elimination of the two equivalent problems

$$\begin{bmatrix} \epsilon & 1 \\ 1 & 1 \end{bmatrix} \mathbf{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 & 1 \\ \epsilon & 1 \end{bmatrix} \mathbf{x} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$

on such a machine.
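A small numpy sketch of the arithmetic makes the difference concrete; the exact solution of both systems is approximately $\mathbf{x} = (1, 1)^T$ (this worked example is an illustration, not from the original text):

```python
import numpy

eps = 1.0e-20  # well below machine epsilon (~2.2e-16)

# Naive elimination on [[eps, 1], [1, 1]] x = [1, 2], pivoting on eps:
m = 1.0 / eps           # enormous multiplier
u22 = 1.0 - m           # the 1 is completely lost to rounding
c2 = 2.0 - m            # likewise
x2 = c2 / u22           # = 1.0
x1 = (1.0 - x2) / eps   # = 0.0, catastrophically wrong

# Swapping the rows first (so the pivot is 1) recovers the right answer;
# numpy.linalg.solve uses LAPACK's partially pivoted elimination:
A = numpy.array([[1.0, 1.0], [eps, 1.0]])
b = numpy.array([2.0, 1.0])
x = numpy.linalg.solve(A, b)
print(x1, x)
```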

In principle any row and column can be pivoted so that at each step we have the maximum value being used (on the diagonal) to perform the operations that compose the matrices $E_j$. In practice, however, we restrict ourselves to pivoting only the rows of the matrix (called partial pivoting).

The basic algorithm just adds an extra step: find the largest element in column $k$ among rows $m \geq k$, and swap row $m$ with row $k$. Row swaps can also be accomplished by multiplying by a permutation matrix $P_k$, which is just a reordering of the rows of the identity matrix.
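One step of this pivot search and swap can be sketched in a few lines of numpy (hypothetical variable names, shown here for the first column of the example matrix):

```python
import numpy

U = numpy.array([[2., 1., 1., 0.],
                 [4., 3., 3., 1.],
                 [8., 7., 9., 5.],
                 [6., 7., 9., 8.]])
p = numpy.arange(4)  # permutation vector, starts as the identity
k = 0

# find the row index (>= k) with the largest magnitude entry in column k
m = numpy.argmax(numpy.abs(U[k:, k])) + k
# swap rows k and m, and record the swap in p
U[[k, m], :] = U[[m, k], :]
p[[k, m]] = p[[m, k]]

print(U[0], p)  # row with the 8 pivot is now first
```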

Consider again the example from above and switch the 1st and 3rd rows, using the criterion that we always want to use the largest value to perform the reduction. Defining the first permutation matrix as

$$P_1 = \begin{bmatrix} & & 1 & \\ & 1 & & \\ 1 & & & \\ & & & 1 \end{bmatrix} \quad\text{so that}\quad P_1 A = \begin{bmatrix} & & 1 & \\ & 1 & & \\ 1 & & & \\ & & & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 & 1 & 0 \\ 4 & 3 & 3 & 1 \\ 8 & 7 & 9 & 5 \\ 6 & 7 & 9 & 8 \end{bmatrix} = \begin{bmatrix} 8 & 7 & 9 & 5 \\ 4 & 3 & 3 & 1 \\ 2 & 1 & 1 & 0 \\ 6 & 7 & 9 & 8 \end{bmatrix}$$

Now defining the first elimination matrix $E_1$ as

$$E_1 = \begin{bmatrix} 1 & & & \\ -\frac{1}{2} & 1 & & \\ -\frac{1}{4} & & 1 & \\ -\frac{3}{4} & & & 1 \end{bmatrix} \quad\text{so that}\quad E_1 P_1 A = \begin{bmatrix} 8 & 7 & 9 & 5 \\ & -\frac{1}{2} & -\frac{3}{2} & -\frac{3}{2} \\ & -\frac{3}{4} & -\frac{5}{4} & -\frac{5}{4} \\ & \frac{7}{4} & \frac{9}{4} & \frac{17}{4} \end{bmatrix}.$$

Again examining the remaining values in column 2, the maximum value lies in row 4, so we want to interchange it with the second row (note that we do not want to move the first row, as that would bring non-zero values into the first column below the diagonal):

$$P_2 = \begin{bmatrix} 1 & & & \\ & & & 1 \\ & & 1 & \\ & 1 & & \end{bmatrix} \quad\text{and}\quad E_2 = \begin{bmatrix} 1 & & & \\ & 1 & & \\ & \frac{3}{7} & 1 & \\ & \frac{2}{7} & & 1 \end{bmatrix} \quad\text{so that}\quad E_2 P_2 E_1 P_1 A = \begin{bmatrix} 8 & 7 & 9 & 5 \\ & \frac{7}{4} & \frac{9}{4} & \frac{17}{4} \\ & & -\frac{2}{7} & \frac{4}{7} \\ & & -\frac{6}{7} & -\frac{2}{7} \end{bmatrix}.$$

Finally

$$P_3 = \begin{bmatrix} 1 & & & \\ & 1 & & \\ & & & 1 \\ & & 1 & \end{bmatrix} \quad\text{and}\quad E_3 = \begin{bmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & & -\frac{1}{3} & 1 \end{bmatrix} \quad\text{so that}\quad E_3 P_3 E_2 P_2 E_1 P_1 A = \begin{bmatrix} 8 & 7 & 9 & 5 \\ & \frac{7}{4} & \frac{9}{4} & \frac{17}{4} \\ & & -\frac{6}{7} & -\frac{2}{7} \\ & & & \frac{2}{3} \end{bmatrix}.$$

LU Factorization with Partial Pivoting

Due to the nature of the pivot matrices we can disentangle them from the matrices $E_j$. Right now we have

$$E_3 P_3 E_2 P_2 E_1 P_1 A = U,$$

where what we want is

$$(E'_3 E'_2 E'_1) P_3 P_2 P_1 A = U.$$

It turns out we can easily do this by

$$E'_3 = E_3, \quad E'_2 = P_3 E_2 P_3^{-1}, \quad\text{and}\quad E'_1 = P_3 P_2 E_1 P_2^{-1} P_3^{-1}.$$

(Moreover, because permutation matrices are unitary, $P_k^{-1} = P_k^T$, so no inverses actually need be computed.) These new matrices $E'_j$ are easy to compute, and it turns out that they have the same structure as the $E_j$ with the multipliers permuted accordingly.
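We can check this identity numerically using the $P_j$ and $E_j$ from the example above (a small numpy sketch; the telescoping products $P_j^T P_j$ cancel exactly):

```python
import numpy

A = numpy.array([[2., 1., 1., 0.],
                 [4., 3., 3., 1.],
                 [8., 7., 9., 5.],
                 [6., 7., 9., 8.]])
I = numpy.eye(4)
# permutations from the example: swap rows (1,3), (2,4), (3,4)
P1, P2, P3 = I[[2, 1, 0, 3]], I[[0, 3, 2, 1]], I[[0, 1, 3, 2]]

E1 = numpy.eye(4); E1[1:, 0] = [-1/2, -1/4, -3/4]
E2 = numpy.eye(4); E2[2, 1], E2[3, 1] = 3/7, 2/7
E3 = numpy.eye(4); E3[3, 2] = -1/3

# pull the permutations out in front (P^{-1} = P^T for permutations)
E3p = E3
E2p = P3 @ E2 @ P3.T
E1p = P3 @ P2 @ E1 @ P2.T @ P3.T

U_interleaved = E3 @ P3 @ E2 @ P2 @ E1 @ P1 @ A
U_separated = E3p @ E2p @ E1p @ P3 @ P2 @ P1 @ A
print(numpy.allclose(U_interleaved, U_separated))  # → True
```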

In general, then, the $LU$ factorization with partial pivoting of the above example can be written as

$$\underbrace{\begin{bmatrix} & & 1 & \\ & & & 1 \\ & 1 & & \\ 1 & & & \end{bmatrix}}_{P = P_3 P_2 P_1} \underbrace{\begin{bmatrix} 2 & 1 & 1 & 0 \\ 4 & 3 & 3 & 1 \\ 8 & 7 & 9 & 5 \\ 6 & 7 & 9 & 8 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} 1 & & & \\ \frac{3}{4} & 1 & & \\ \frac{1}{2} & -\frac{2}{7} & 1 & \\ \frac{1}{4} & -\frac{3}{7} & \frac{1}{3} & 1 \end{bmatrix}}_{L} \underbrace{\begin{bmatrix} 8 & 7 & 9 & 5 \\ & \frac{7}{4} & \frac{9}{4} & \frac{17}{4} \\ & & -\frac{6}{7} & -\frac{2}{7} \\ & & & \frac{2}{3} \end{bmatrix}}_{U}$$

Some code for $LU$ with partial pivoting (translated from the MATLAB original, courtesy of Cleve Moler).

Here is an implementation of an "in-place" algorithm that transforms the matrix $A$ in place into a combined matrix $LU$, where the diagonal and upper elements are replaced with $U$ and the strictly lower triangular part of $A$ is replaced with the multipliers of $L$. The routine also returns a permuted index vector $p$ such that `b[p]` $= P\mathbf{b}$.

A little hand-drawn animation of this algorithm may go a long way toward understanding it.

def lutx(A):
    """LUTX Triangular factorization, textbook version

    translated to python from the original Matlab from Cleve Moler

    L,U,p = lutx(A) produces a unit lower triangular matrix L,
    an upper triangular matrix U, and a permutation vector p,
    so that L*U = A(p,:)

    Parameters:
    -----------
    A: ndarray
        square numpy array

    Returns:
    --------
    L: ndarray NxN
        Lower triangular matrix with 1's on the diagonal and multipliers in place
    U: ndarray NxN
        Upper triangular matrix with Pivots on the diagonal
    p: ndarray size(N)
        permutation array
    """

    n, m = A.shape
    assert n == m

    # make a deep copy of the original matrix into U
    U = A.copy()
    p = numpy.array(range(n))

    for k in range(n - 1):
        # Find index of the largest element on or below the diagonal in the k-th column
        i = numpy.argmax(numpy.abs(U[k:, k])) + k
        # Skip elimination if the column is already zero
        if U[i, k] != 0.0:
            # Swap the pivot row into place
            if i != k:
                U[[k, i], :] = U[[i, k], :]
                p[[k, i]] = p[[i, k]]
            # Compute multipliers in place
            U[k + 1 : n, k] /= U[k, k]
            # Update the remainder of the matrix
            U[k + 1 : n, k + 1 : n] -= numpy.outer(U[k + 1 : n, k], U[k, k + 1 : n])

    # Separate result and return
    L = numpy.tril(U, -1) + numpy.eye(n)
    U = numpy.triu(U)
    return L, U, p

Let’s apply this to our example problem

A = numpy.array(
    [
        [2.0, 1.0, 1.0, 0.0],
        [4.0, 3.0, 3.0, 1.0],
        [8.0, 7.0, 9.0, 5.0],
        [6.0, 7.0, 9.0, 8.0],
    ]
)
L, U, p = lutx(A)
print("L=\n{}".format(L))
print("U=\n{}".format(U))
print("p={}".format(p))
import scipy.linalg as la

P, LL, UU = la.lu(A)
print("L=\n{}".format(LL))
print("U=\n{}".format(UU))
print("P={}".format(P.T))
print("A=\n{}\n".format(A))
print("p = {}".format(p))
print("PA=\n{}".format(A[p, :]))
print("LU=\n{}".format(L.dot(U)))

And let's use this to solve $A\mathbf{x} = \mathbf{b}$.

Given our factorization $PA = LU$ it is now straightforward to solve the linear system $A\mathbf{x} = \mathbf{b}$ in just a few steps:

  1. Multiply both sides by $P$ to yield $PA\mathbf{x} = P\mathbf{b}$ and substitute $PA = LU$ to get
     $$LU\mathbf{x} = P\mathbf{b}.$$

  2. Define $\mathbf{c} = U\mathbf{x}$ and solve $L\mathbf{c} = P\mathbf{b}$ using forward substitution.

  3. Solve $U\mathbf{x} = \mathbf{c}$ using backward substitution.

Here we'll just use numpy.linalg.solve to solve the individual triangular systems, but we could use more efficient forward and backward substitution codes.

x = numpy.random.rand(4)
b = A.dot(x)

L, U, p = lutx(A)

c = numpy.linalg.solve(L, b[p])
xp = numpy.linalg.solve(U, c)

print(x)
print(xp)

numpy.testing.assert_allclose(x, xp)
print("success")
x = numpy.linalg.solve(A, b)
print(x)

Computational Costs

Gaussian elimination with partial pivoting is an extremely robust algorithm even for ill-conditioned matrices; however, it is not cheap. With a little work, it is easy to show that for a dense $n \times n$ matrix the $LU$ decomposition takes $\mathcal{O}(n^3/3)$ operations, while the forward and backward substitutions together take $\mathcal{O}(n^2)$ operations.

For the large (but sparse) matrices that often arise from numerical PDEs, there is a host of iterative methods that, under certain circumstances, can solve linear systems in many fewer flops. But that is another story.

LU vs the matrix inverse $A^{-1}$

Formally, one can solve $A\mathbf{x} = \mathbf{b}$ using the $PA = LU$ factorization, or as

$$\mathbf{x} = A^{-1}\mathbf{b},$$

which formally takes $\mathcal{O}(N^3 + N^2)$ operations (about 3 times slower than the $LU$ approach). In general, however, one rarely needs to calculate $A^{-1}$, and for many matrices arising from 2-pt BVPs it is a particularly bad idea: even if $A$ is sparse (e.g. tridiagonal), the inverse $A^{-1}$ can often be dense. The $LU$ decomposition, on the other hand, preserves the sparsity structure for banded matrices.

We will demonstrate this here using the simplest 2-pt boundary value problem, the 1-D Poisson problem

$$u_{xx} = f(x), \quad u(0) = u(L) = 0,$$

or in its discrete form

$$D\mathbf{u} = \mathbf{f},$$

where $D$ is a tridiagonal finite difference matrix, $\mathbf{u} = u(\mathbf{x})$ is the discrete solution, and $\mathbf{f} = f(\mathbf{x})$ is the discrete approximation to the right-hand side.

Reusing our sparse matrix code from the ODE BVP notebook

from scipy.sparse import identity, lil_matrix, spdiags
from scipy.sparse.linalg import inv, splu, spsolve

from fdcoeffV import fdcoeffV


def D2(x, bcs=["dirichlet", "dirichlet"]):
    """
    Assemble a general sparse second-order finite-difference approximation to d^2/dx^2 on a possibly irregular mesh
    First and last rows are set by string bcs

    parameters:
    -----------
    x: numpy.array
        mesh coordinates
    bcs: list of strings for boundary conditions e.g [left_string, right_string] where
        the strings can be either 'dirichlet' or 'neumann'
    """
    N = len(x)
    A = lil_matrix((N, N))
    if bcs[0] == "dirichlet":
        A[0, 0] = 1.0
    elif bcs[0] == "neumann":
        A[0, 0:3] = fdcoeffV(1, x[0], x[:3])
    else:
        raise ValueError("no known BC type for left boundary {}".format(bcs[0]))

    if bcs[1] == "dirichlet":
        A[-1, -1] = 1.0
    elif bcs[1] == "neumann":
        A[-1, -3:] = fdcoeffV(1, x[-1], x[-3:])
    else:
        raise ValueError("no known BC type for right boundary {}".format(bcs[1]))

    for i in range(1, N - 1):
        A[i, i - 1 : i + 2] = fdcoeffV(2, x[i], x[i - 1 : i + 2])
    return A.tocsc()


def RHS(x, f, bvalues):
    """Set the rhs vector

    parameters
    ----------
    x: numpy.array
        mesh coordinates
    f: callable
        rhs function for interior points called on f(x[1:-1])
    bvalues:  numpy.array (len 2)
        values for boundary conditions (either dirichlet or neumann)
    """

    N = len(x)
    rhs = numpy.empty(N)
    rhs[[0, N - 1]] = bvalues
    rhs[1:-1] = f(x[1:-1])

    return rhs

We'll set up a small $N$-point finite difference problem.

N = 101
x = numpy.linspace(0, 1.0, N)
A = D2(x)
func = lambda x: x
f = RHS(x, func, [0.0, 0.0])

u = spsolve(A, f)
fig = plt.figure(figsize=(16, 6))
axes = fig.add_subplot(1, 2, 1)
axes.spy(A)
axes.set_title("Sparsity of $A$")

axes = fig.add_subplot(1, 2, 2)
axes.plot(x, u, "bo-")
axes.grid()
axes.set_xlabel("$x$")
axes.set_ylabel("$u$")
axes.set_title("Solution $u$")
plt.show()

Visualizing $L$, $U$ and $A^{-1}$

scipy.sparse also includes routines for calculating the inverse and the permuted $L,U$ decomposition of a sparse matrix $A$ using the sparse direct solver SuperLU.

B = splu(A, permc_spec="NATURAL")
Ainv = inv(A)
fig = plt.figure(figsize=(18, 6))
axes = fig.add_subplot(1, 3, 1)
axes.spy(Ainv)
axes.set_title("Sparsity of $A^{{-1}}$, nnz={}".format(Ainv.nnz))

axes = fig.add_subplot(1, 3, 2)
axes.spy(B.L)
axes.set_title("Sparsity of $B.L$, nnz={}".format(B.L.nnz))

axes = fig.add_subplot(1, 3, 3)
axes.spy(B.U)
axes.set_title("Sparsity of $B.U$, nnz={}".format(B.U.nnz))
plt.show()

Check

And don’t forget to check that we get the same answer...

err = (Ainv.dot(f) - B.solve(f)) / numpy.finfo("float").eps
# print(err)
fig = plt.figure(figsize=(18, 6))
axes = fig.add_subplot(1, 1, 1)
axes.plot(range(len(err)), err, "o-")
axes.grid()
axes.set_xlabel("index")
axes.set_ylabel(r"err$/\epsilon_{mach}$")
axes.set_title("Error")
plt.show()

Timing

%timeit inv(A)
%timeit Ainv.dot(f)
%timeit B = splu(A)
%timeit B.solve(f)