
Conditioning and Stability

Text provided under a Creative Commons Attribution license, CC-BY. All code is made available under the FSF-approved MIT license. (c) Kyle T. Mandli

Note: This material largely follows the text “Numerical Linear Algebra” by Trefethen and Bau (SIAM, 1997) and is meant as a guide and supplement to the material presented there.

%matplotlib inline
import matplotlib.pyplot as plt
import numpy


Once an approximation to a linear system is constructed, the next question is how much trust we can put in the approximation. Since the true solution is not known, one of the few tools we have is to ask how well the approximation satisfies the original equation. In other words, we seek a solution to a system,

$$\vec{f}(\vec{x}) = \vec{b}.$$

We do not have $\vec{x}$ but instead have an approximation, $\hat{x}$, and we hope that

$$\vec{f}(\hat{x}) \approx \vec{b}.$$

In this section we explore how to determine a bound on the relative error $\frac{||\vec{x} - \hat{x}||}{||\vec{x}||}$ given the matrix $A$.

This leads to the notion of conditioning. Conditioning describes the behavior of a problem when its input is changed a small amount (perturbed), and it is a mathematical (analytic) property of the original system of equations. Stability, on the other hand, is concerned with how the algorithm used to obtain an approximation behaves when its input is perturbed.

Conditioning and Condition Numbers

A well-conditioned problem is one where a small perturbation to the original problem leads to only small changes in the solution.

Formally we can think of a function $f$ which maps $x$ to $y$,

$$f(x) = y \quad \text{or} \quad f: X \rightarrow Y.$$

Let $x \in X$, where we perturb $x$ with $\delta x$, and ask how the result $y$ changes:

$$||f(x) - f(x + \delta x)|| \leq C ||x - (x + \delta x)||$$

for some constant $C$, possibly dependent on $\delta x$, depending on the type of conditioning we are considering.

Absolute Condition Number

If we let $\delta x$ be the small perturbation to the input and $\delta f = f(x + \delta x) - f(x)$ be the resulting change in the output, the absolute condition number $\hat{\kappa}$ can be defined as

$$\hat{\kappa} = \sup_{\delta x} \frac{||\delta f||}{||\delta x||}$$

for most problems (assuming $\delta f$ and $\delta x$ are both infinitesimal).

When $f$ is differentiable we can evaluate the condition number via the Jacobian. Recall that the derivative of a vector-valued function can be written in the form of a Jacobian $J(x)$ where

$$[J(x)]_{ij} = \frac{\partial f_i}{\partial x_j}(x).$$

This allows us to write the infinitesimal $\delta f$ as

$$\delta f \approx J(x) \delta x$$

with equality as $||\delta x|| \rightarrow 0$. Then we can write the condition number as

$$\hat{\kappa} = ||J(x)||$$

where the norm is the one induced by the spaces $X$ and $Y$.

Relative Condition Number

The relative condition number is defined similarly, and is related to the relative, rather than absolute, error as defined previously. With the same caveats as before it can be defined as

$$\kappa = \sup_{\delta x} \left( \frac{||\delta f|| / ||f(x)||}{||\delta x|| / ||x||} \right).$$

Again, if $f$ is differentiable we can use the Jacobian $J(x)$ to evaluate the relative condition number as

$$\kappa = \frac{||J(x)||}{||f(x)|| / ||x||}.$$

Examples

Calculate the relative condition numbers of the following problems.

$\sqrt{x}$ for $x > 0$.

$$f(x) = \sqrt{x}, \quad J(x) = f'(x) = \frac{1}{2\sqrt{x}}$$

$$\kappa = \frac{||J(x)||}{||f(x)|| / ||x||} = \frac{1}{2\sqrt{x}} \cdot \frac{x}{\sqrt{x}} = \frac{1}{2}$$
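
The analytical result above can be checked numerically. The following sketch (the helper function is illustrative, not from the text) estimates the relative condition number with a small but finite perturbation standing in for the infinitesimal $\delta x$:

```python
import numpy

# Estimate the relative condition number of f at x by comparing the
# relative change in the output to the relative change in the input.
# The finite perturbation delta_x stands in for the infinitesimal.
def relative_condition(f, x, delta_x=1e-8):
    delta_f = f(x + delta_x) - f(x)
    return (abs(delta_f) / abs(f(x))) / (abs(delta_x) / abs(x))

# For f(x) = sqrt(x) the estimate should be close to 1/2 for any x > 0.
for x in [0.5, 2.0, 100.0]:
    print(x, relative_condition(numpy.sqrt, x))
```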

Calculate the relative condition number for the scalar function $f(\vec{x}) = x_1 - x_2$ with $\vec{x} = (x_1, x_2)^T \in \mathbb{R}^2$, using the $\ell_\infty$ norm.

$$f(\vec{x}) = x_1 - x_2, \quad J(x) = \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2} \right] = [1, -1]$$

$$\kappa = \frac{||J(x)||_\infty}{||f(\vec{x})||_\infty / ||\vec{x}||_\infty} = \frac{2 \max_{i=1,2} |x_i|}{|x_1 - x_2|}$$

where

$$||J||_\infty = 2.$$
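
The formula shows that subtraction is ill-conditioned when $x_1 \approx x_2$, which is the source of catastrophic cancellation. A minimal sketch of the formula (the function name is illustrative):

```python
# kappa = 2 max|x_i| / |x_1 - x_2| from the example above: the condition
# number blows up as x_1 approaches x_2.
def kappa_subtraction(x1, x2):
    return 2.0 * max(abs(x1), abs(x2)) / abs(x1 - x2)

print(kappa_subtraction(1.0, -1.0))   # well-separated inputs: kappa = 1
print(kappa_subtraction(1.0, 0.999))  # nearly equal inputs: kappa is large
```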

The condition number of a function was discussed in general terms above. Now we examine the more specific case of a linear function, a matrix-vector multiplication. Here we let $\vec{f}(\vec{x}) = A\vec{x}$ and determine the condition number by perturbing $\vec{x}$.

We begin with the definition above,

$$
\begin{aligned}
\kappa &= \sup_{\delta x} \left( \frac{||A(\vec{x} + \delta x) - A\vec{x}||}{||A\vec{x}||} \frac{||\vec{x}||}{||\delta x||} \right), \\
&= \sup_{\delta x} \frac{||A \delta x||}{||\delta x||} \frac{||\vec{x}||}{||A\vec{x}||}, \\
&= ||A|| \, \frac{||\vec{x}||}{||A\vec{x}||},
\end{aligned}
$$

where $\delta x$ is a vector.

If $A$ has an inverse, then we note that

$$
\begin{aligned}
\vec{x} &= A^{-1}A \vec{x}, \\
\Rightarrow ||\vec{x}|| &= ||A^{-1}A \vec{x}||, \\
&\leq ||A^{-1}|| \, ||A \vec{x}||,
\end{aligned}
$$

which implies that

$$\frac{||\vec{x}||}{||A \vec{x}||} \leq ||A^{-1}||.$$

We can now bound the condition number for a matrix by

$$\kappa \leq ||A|| \, ||A^{-1}||.$$

Condition Number of a Matrix

The condition number of a matrix is defined by the product

$$\kappa(A) = ||A|| \, ||A^{-1}||$$

where here we are thinking about the matrix itself rather than a particular problem. If $\kappa$ is small then $A$ is said to be well-conditioned. If $A$ is singular we assign $\kappa(A) = \infty$ as the matrix's condition number.

When we are considering the $\ell_2$ norm we can write the condition number as

$$\kappa(A) = \sqrt{\rho(A^\ast A)} \, \sqrt{\rho((A^\ast A)^{-1})} = \frac{\sqrt{\max |\lambda|}}{\sqrt{\min |\lambda|}}$$

where $\lambda$ ranges over the eigenvalues of $A^\ast A$.
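
As a sanity check, the eigenvalue formula can be compared against `numpy.linalg.cond` for a small example matrix (chosen arbitrarily here):

```python
import numpy

A = numpy.array([[1.0, 2.0],
                 [3.0, 4.0]])

# Eigenvalues of A* A are the squared singular values of A.
eigenvalues = numpy.linalg.eigvals(A.conj().T @ A)
kappa_formula = numpy.sqrt(eigenvalues.max() / eigenvalues.min())

print(kappa_formula, numpy.linalg.cond(A, 2))  # the two should agree
```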

Condition Number of a System of Equations

Another way to think about the conditioning of a problem is to treat the matrix $A$ itself as an input to the problem. Consider the system of equations $A\vec{x} = \vec{b}$ where we perturb both $A$ and $\vec{x}$, resulting in

$$(A + \delta A)(\vec{x} + \delta x) = \vec{b}.$$

Assuming we solve the problem exactly, we know that $A\vec{x} = \vec{b}$ and that the product of infinitesimals $\delta A \, \delta x$ is smaller than the other terms, so the above expression can be approximated by

$$
\begin{aligned}
(A + \delta A)(\vec{x} + \delta x) &= \vec{b}, \\
A\vec{x} + \delta A \vec{x} + A \delta x + \delta A \, \delta x &= \vec{b}, \\
\delta A \vec{x} + A \delta x &= 0.
\end{aligned}
$$

Solving for $\delta x$ leads to

$$\delta x = -A^{-1} \delta A \vec{x}$$

implying

$$||\delta x|| \leq ||A^{-1}|| \, ||\delta A|| \, ||\vec{x}||$$

and therefore

$$\frac{||\delta x|| / ||\vec{x}||}{||\delta A|| / ||A||} \leq ||A^{-1}|| \, ||A|| = \kappa(A).$$

This leads to the following statement regarding the condition number of a system of equations.

Theorem: Let $\vec{b}$ be fixed and consider the problem of computing $\vec{x}$ in $A\vec{x} = \vec{b}$, where $A$ is square and non-singular. The condition number of this problem with respect to perturbations in $A$ is the condition number of the matrix, $\kappa(A)$.
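
A quick numerical check of this theorem (the matrix, right-hand side, and perturbation below are arbitrary examples): perturbing $A$ by a small $\delta A$ should change the solution by a relative amount no larger than $\kappa(A)$ times the relative perturbation.

```python
import numpy

A = numpy.array([[4.0, 1.0],
                 [1.0, 3.0]])
b = numpy.array([1.0, 2.0])
x = numpy.linalg.solve(A, b)

# Small, fixed perturbation of the matrix entries.
delta_A = 1e-8 * numpy.array([[0.5, 0.2],
                              [0.1, 0.3]])
x_perturbed = numpy.linalg.solve(A + delta_A, b)

# Ratio of relative solution change to relative matrix perturbation
# (2-norms throughout), which the bound says cannot exceed kappa(A).
ratio = ((numpy.linalg.norm(x_perturbed - x) / numpy.linalg.norm(x))
         / (numpy.linalg.norm(delta_A, 2) / numpy.linalg.norm(A, 2)))
print(ratio, numpy.linalg.cond(A, 2))
```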

Stability

We now return to the fact that we are interested not only in the conditioning of a mathematical problem but also in how we might solve it on a finite-precision machine. In some sense conditioning describes how well we can solve a problem in exact arithmetic, and stability describes how well we can solve the problem in finite-precision arithmetic.

Accuracy and Stability

As we have defined before, we will consider the absolute error

$$||F(x) - f(x)||$$

where $F(x)$ is the approximation to the true solution $f(x)$. Similarly we can define the relative error as

$$\frac{||F(x) - f(x)||}{||f(x)||}.$$

In the ideal case we would like the relative error to be $\mathcal{O}(\epsilon_{\text{machine}})$.

Forward Stability

A forward stable algorithm for $x \in X$ has

$$\frac{||F(x) - f(x)||}{||f(x)||} = \mathcal{O}(\epsilon_{\text{machine}})$$

In other words

A forward stable algorithm gives almost the right answer to exactly the right question.

Backward Stability

A stronger notion of stability can also be defined, and it is satisfied by many approaches in numerical linear algebra. We say that an algorithm $F$ is backward stable if for $x \in X$ we have

$$F(x) = f(\hat{x})$$

for some $\hat{x}$ with

$$\frac{||\hat{x} - x||}{||x||} = \mathcal{O}(\epsilon_{\text{machine}}).$$

In other words

A backward stable algorithm gives exactly the right answer to nearly the right question.

Combining these ideas, along with the observation that we should not expect to accurately compute the solution to a poorly conditioned problem, we can form the mixed forward-backward sense of stability: for $x \in X$,

$$\frac{||F(x) - f(\hat{x})||}{||f(\hat{x})||} = \mathcal{O}(\epsilon_{\text{machine}})$$

for some $\hat{x}$ with

$$\frac{||\hat{x} - x||}{||x||} = \mathcal{O}(\epsilon_{\text{machine}}).$$

In other words

A stable algorithm gives nearly the right answer to nearly the right question.

An important aspect of the above statement is that we cannot necessarily guarantee an accurate result. If the condition number $\kappa(x)$ is small we would expect that a stable algorithm would give us an accurate result (by definition). This is reflected in the following theorem.

Theorem: Suppose a backward stable algorithm is applied to solve a problem $f: X \rightarrow Y$ with condition number $\kappa$ on a finite-precision machine; then the relative error satisfies

$$\frac{||F(x) - f(x)||}{||f(x)||} = \mathcal{O}(\kappa(x) \, \epsilon_{\text{machine}}).$$

Proof: By the definition of the condition number of the problem we can write

$$\frac{||F(x) - f(x)||}{||f(x)||} \leq (\kappa(x) + \mathcal{O}(\epsilon_{\text{machine}})) \frac{||\hat{x} - x||}{||x||}.$$

Combining this with the definition of backwards stability we can arrive at the statement of the theorem.

Backward Error Analysis - Process of using the condition number of the problem and stability of the algorithm to determine the error.

Forward Error Analysis - Considers the accrual of error at each step of an algorithm given slightly perturbed input.

Stability of $A\vec{x} = \vec{b}$ using Householder Triangularization

As an example let's consider the conditioning and algorithm for solving $A\vec{x} = \vec{b}$. Here we will use a $QR$ factorization approach given by Householder triangularization. First let's discuss the $QR$ factorization itself.

Theorem: Let the $QR$ factorization $A = QR$ of a matrix $A \in \mathbb{C}^{m \times n}$ be computed using Householder triangularization on a finite-precision machine; then

$$\hat{Q} \hat{R} = A + \delta A, \quad \frac{||\delta A||}{||A||} = \mathcal{O}(\epsilon_{\text{machine}})$$

for some $\delta A \in \mathbb{C}^{m \times n}$, where $\hat{Q}$ and $\hat{R}$ are the finite-arithmetic versions of $Q$ and $R$. Householder triangularization is therefore backward stable.

Solving $A\vec{x} = \vec{b}$ with $QR$ Factorization

So Householder triangularization is backward stable, but we also know that this does not guarantee accuracy if the problem itself is ill-conditioned. Is backward stability enough to guarantee accurate results if we use it to solve $A\vec{x} = \vec{b}$, for instance? It turns out that the accuracy of the product $QR$ is enough to guarantee the accuracy of the larger algorithm.

Consider the steps for solving $A\vec{x} = \vec{b}$ using $QR$ factorization:

  1. Compute the $QR$ factorization of $A$.

  2. Multiply the vector $\vec{b}$ by $Q^\ast$ so that $\vec{y} = Q^\ast \vec{b}$.

  3. Solve the triangular system $R\vec{x} = \vec{y}$ using backward substitution, i.e. $\vec{x} = R^{-1}\vec{y}$.
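
These three steps can be sketched with numpy, whose `numpy.linalg.qr` uses Householder reflections (the example system is arbitrary; a dedicated triangular solver such as `scipy.linalg.solve_triangular` could replace the generic solve in step 3):

```python
import numpy

A = numpy.array([[2.0, 1.0],
                 [1.0, 3.0]])
b = numpy.array([3.0, 5.0])

Q, R = numpy.linalg.qr(A)     # Step 1: Householder-based QR of A
y = Q.conj().T @ b            # Step 2: y = Q* b
x = numpy.linalg.solve(R, y)  # Step 3: solve the triangular system R x = y

print(x, numpy.linalg.solve(A, b))  # both should give the same solution
```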

We know that step (1) is backward stable; what about step (2), the matrix-vector multiplication? We can write the backward stability estimate as

$$(\hat{Q} + \delta Q) \hat{y} = \vec{b} \quad \text{with} \quad ||\delta Q|| = \mathcal{O}(\epsilon_{\text{machine}})$$

where we have inverted the matrix $\hat{Q}$ using the fact that it is unitary. Since this relation is exact, the matrix-vector multiplication is also backward stable, as this is an equivalent statement to multiplying $\vec{b}$ by a slightly perturbed matrix.

Step (3) is backward substitution (or the computation of $R^{-1}$). Writing the backward stability estimate we have

$$(\hat{R} + \delta R) \hat{x} = \hat{y} \quad \text{with} \quad \frac{||\delta R||}{||\hat{R}||} = \mathcal{O}(\epsilon_{\text{machine}})$$

demonstrating that the result $\hat{x}$ is the exact solution to a slight perturbation of the original problem.

These results lead to the following two theorems:

Theorem: Using $QR$ factorization to solve $A\vec{x} = \vec{b}$ as described above is backward stable, satisfying

$$(A + \Delta A) \hat{x} = \vec{b}, \quad \frac{||\Delta A||}{||A||} = \mathcal{O}(\epsilon_{\text{machine}})$$

for some $\Delta A \in \mathbb{C}^{m \times n}$.

Theorem: The solution $\hat{x}$ computed by the above algorithm satisfies

$$\frac{||\hat{x} - \vec{x}||}{||\vec{x}||} = \mathcal{O}(\kappa(A) \, \epsilon_{\text{machine}}).$$
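
This final bound can be illustrated with an ill-conditioned example; the Hilbert matrix is a standard choice (the experiment below is a sketch, not from the text):

```python
import numpy

# Build a 10 x 10 Hilbert matrix, a classic ill-conditioned example.
n = 10
A = numpy.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
x_true = numpy.ones(n)
b = A @ x_true

# Solve via Householder QR as described above.
Q, R = numpy.linalg.qr(A)
x = numpy.linalg.solve(R, Q.T @ b)

# The relative error is far above machine epsilon but is consistent
# with the kappa(A) * eps scaling predicted by the theorem.
relative_error = numpy.linalg.norm(x - x_true) / numpy.linalg.norm(x_true)
bound = numpy.linalg.cond(A, 2) * numpy.finfo(float).eps
print(relative_error, bound)
```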