
Conditioning and Stability

Text provided under a Creative Commons Attribution license, CC-BY. All code is made available under the FSF-approved MIT license. (c) Kyle T. Mandli

Note: This material largely follows the text “Numerical Linear Algebra” by Trefethen and Bau (SIAM, 1997) and is meant as a guide and supplement to the material presented there.

%matplotlib inline
import matplotlib.pyplot as plt
import numpy


Once an approximation to a linear system is constructed, the next question is how much trust we can put in the approximation. Since the true solution is not known, one of the few tools we have is to ask how well the approximation satisfies the original equation. In other words, we seek a solution to a system,

$$\vec{f}(\vec{x}) = \vec{b}.$$

We do not have $\vec{x}$ but instead have an approximation, $\hat{x}$, and we hope that

$$\vec{f}(\hat{x}) \approx \vec{b}.$$

In this section we explore how to determine a bound on the relative error $\frac{||\vec{x} - \hat{x}||}{||\vec{x}||}$ given the matrix $A$.

This leads to the notion of conditioning. Conditioning describes the behavior of a problem when its input is changed a small amount (perturbed), and it is a mathematical (analytic) property of the original system of equations. Stability, on the other hand, is concerned with how the algorithm used to obtain an approximation behaves when its input is perturbed.

Conditioning and Condition Numbers

A well-conditioned problem is one where a small perturbation to the original problem leads to only small changes in the solution.

Formally we can think of a function $f$ which maps $x$ to $y$,

$$f(x) = y \quad \text{or} \quad f: X \rightarrow Y.$$

Let $x \in X$, where we perturb $x$ with $\delta x$, and ask how the result $y$ changes:

$$||f(x) - f(x + \delta x)|| \leq C ||x - (x + \delta x)||$$

for some constant $C$, possibly dependent on $\delta x$, depending on the type of conditioning we are considering.

Absolute Condition Number

If we let $\delta x$ be the small perturbation to the input and $\delta f = f(x + \delta x) - f(x)$ be the resulting change in the output, the absolute condition number $\hat{\kappa}$ can be defined as

$$\hat{\kappa} = \sup_{\delta x} \frac{||\delta f||}{||\delta x||}$$

for most problems (assuming $\delta f$ and $\delta x$ are both infinitesimal).

When $f$ is differentiable we can evaluate the condition number via the Jacobian. Recall that the derivative of a vector-valued function can be written in the form of a Jacobian $J(x)$ where

$$[J(x)]_{ij} = \frac{\partial f_i}{\partial x_j}(x).$$

This allows us to write the infinitesimal $\delta f$ as

$$\delta f \approx J(x) \delta x$$

with equality as $||\delta x|| \rightarrow 0$. Then we can write the condition number as

$$\hat{\kappa} = ||J(x)||$$

where the norm is the one induced by the spaces $X$ and $Y$.

Relative Condition Number

The relative condition number is defined similarly, and is related to the relative, rather than absolute, error as defined previously. With the same caveats as before it can be defined as

$$\kappa = \sup_{\delta x} \left( \frac{||\delta f|| / ||f(x)||}{||\delta x|| / ||x||} \right).$$

Again, if $f$ is differentiable we can use the Jacobian $J(x)$ to evaluate the relative condition number as

$$\kappa = \frac{||J(x)||}{||f(x)|| / ||x||}.$$

Examples

Calculate the relative condition numbers of the following problems.

$\sqrt{x}$ for $x > 0$.

$$f(x) = \sqrt{x}, \quad J(x) = f'(x) = \frac{1}{2\sqrt{x}}$$

$$\kappa = \frac{||J(x)||}{||f(x)|| / ||x||} = \frac{1}{2\sqrt{x}} \cdot \frac{x}{\sqrt{x}} = \frac{1}{2}$$
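
The analytical result above can be checked numerically. The following sketch (the helper function is illustrative, not from the text) estimates the relative condition number with a small but finite perturbation standing in for the infinitesimal $\delta x$:

```python
import numpy

# Estimate the relative condition number of f at x by comparing the
# relative change in the output to the relative change in the input.
# The finite perturbation delta_x stands in for the infinitesimal.
def relative_condition(f, x, delta_x=1e-8):
    delta_f = f(x + delta_x) - f(x)
    return (abs(delta_f) / abs(f(x))) / (abs(delta_x) / abs(x))

# For f(x) = sqrt(x) the estimate should be close to 1/2 for any x > 0.
for x in [0.5, 2.0, 100.0]:
    print(x, relative_condition(numpy.sqrt, x))
```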

Calculate the relative condition number for the scalar function $f(\vec{x}) = x_1 - x_2$ with $\vec{x} = (x_1, x_2)^T \in \mathbb{R}^2$, using the $\ell_\infty$ norm.

$$f(\vec{x}) = x_1 - x_2, \quad J(x) = \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2} \right] = [1, -1]$$

$$\kappa = \frac{||J(x)||_\infty}{||f(\vec{x})||_\infty / ||\vec{x}||_\infty} = \frac{2 \max_{i=1,2} |x_i|}{|x_1 - x_2|}$$

where

$$||J||_\infty = 2.$$
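
The formula shows that subtraction is ill-conditioned when $x_1 \approx x_2$, which is the source of catastrophic cancellation. A minimal sketch of the formula (the function name is illustrative):

```python
# kappa = 2 max|x_i| / |x_1 - x_2| from the example above: the condition
# number blows up as x_1 approaches x_2.
def kappa_subtraction(x1, x2):
    return 2.0 * max(abs(x1), abs(x2)) / abs(x1 - x2)

print(kappa_subtraction(1.0, -1.0))   # well-separated inputs: kappa = 1
print(kappa_subtraction(1.0, 0.999))  # nearly equal inputs: kappa is large
```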

The condition number of a function was discussed in general terms above. Now we examine the more specific case of a linear function, a matrix-vector multiplication. Here we let $\vec{f}(\vec{x}) = A\vec{x}$ and determine the condition number by perturbing $\vec{x}$.

We begin with the definition above,

$$
\begin{aligned}
\kappa &= \sup_{\delta x} \left( \frac{||A(\vec{x} + \delta x) - A\vec{x}||}{||A\vec{x}||} \frac{||\vec{x}||}{||\delta x||} \right), \\
&= \sup_{\delta x} \frac{||A \delta x||}{||\delta x||} \frac{||\vec{x}||}{||A\vec{x}||}, \\
&= ||A|| \, \frac{||\vec{x}||}{||A\vec{x}||},
\end{aligned}
$$

where $\delta x$ is a vector.

If $A$ has an inverse, then we note that

$$
\begin{aligned}
\vec{x} &= A^{-1}A \vec{x}, \\
\Rightarrow ||\vec{x}|| &= ||A^{-1}A \vec{x}||, \\
&\leq ||A^{-1}|| \, ||A \vec{x}||,
\end{aligned}
$$

which implies that

$$\frac{||\vec{x}||}{||A \vec{x}||} \leq ||A^{-1}||.$$

We can now bound the condition number for a matrix by

$$\kappa \leq ||A|| \, ||A^{-1}||.$$

Condition Number of a Matrix

The condition number of a matrix is defined by the product

$$\kappa(A) = ||A|| \, ||A^{-1}||$$

where here we are thinking about the matrix itself rather than a particular problem. If $\kappa$ is small then $A$ is said to be well-conditioned. If $A$ is singular we assign $\kappa(A) = \infty$ as the matrix's condition number.

When we are considering the $\ell_2$ norm we can write the condition number as

$$\kappa(A) = \sqrt{\rho(A^\ast A)} \, \sqrt{\rho((A^\ast A)^{-1})} = \frac{\sqrt{\max |\lambda|}}{\sqrt{\min |\lambda|}}$$

where $\lambda$ ranges over the eigenvalues of $A^\ast A$.
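
As a sanity check, the eigenvalue formula can be compared against `numpy.linalg.cond` for a small example matrix (chosen arbitrarily here):

```python
import numpy

A = numpy.array([[1.0, 2.0],
                 [3.0, 4.0]])

# Eigenvalues of A* A are the squared singular values of A.
eigenvalues = numpy.linalg.eigvals(A.conj().T @ A)
kappa_formula = numpy.sqrt(eigenvalues.max() / eigenvalues.min())

print(kappa_formula, numpy.linalg.cond(A, 2))  # the two should agree
```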

Condition Number of a System of Equations

Another way to think about the conditioning of a problem is to treat the matrix $A$ itself as an input to the problem. Consider the system of equations $A\vec{x} = \vec{b}$ where we perturb both $A$ and $\vec{x}$, resulting in

$$(A + \delta A)(\vec{x} + \delta x) = \vec{b}.$$

Assuming we solve the problem exactly, we know that $A\vec{x} = \vec{b}$ and that the product of infinitesimals $\delta A \, \delta x$ is smaller than the other terms, so the above expression can be approximated by

$$
\begin{aligned}
(A + \delta A)(\vec{x} + \delta x) &= \vec{b}, \\
A\vec{x} + \delta A \vec{x} + A \delta x + \delta A \, \delta x &= \vec{b}, \\
\delta A \vec{x} + A \delta x &= 0.
\end{aligned}
$$

Solving for $\delta x$ leads to

$$\delta x = -A^{-1} \delta A \vec{x}$$

implying

$$||\delta x|| \leq ||A^{-1}|| \, ||\delta A|| \, ||\vec{x}||$$

and therefore

$$\frac{||\delta x|| / ||\vec{x}||}{||\delta A|| / ||A||} \leq ||A^{-1}|| \, ||A|| = \kappa(A).$$

This leads to the following statement regarding the condition number of a system of equations.

Theorem: Let $\vec{b}$ be fixed and consider the problem of computing $\vec{x}$ in $A\vec{x} = \vec{b}$, where $A$ is square and non-singular. The condition number of this problem with respect to perturbations in $A$ is the condition number of the matrix, $\kappa(A)$.
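
A quick numerical check of this theorem (the matrix, right-hand side, and perturbation below are arbitrary examples): perturbing $A$ by a small $\delta A$ should change the solution by a relative amount no larger than $\kappa(A)$ times the relative perturbation.

```python
import numpy

A = numpy.array([[4.0, 1.0],
                 [1.0, 3.0]])
b = numpy.array([1.0, 2.0])
x = numpy.linalg.solve(A, b)

# Small, fixed perturbation of the matrix entries.
delta_A = 1e-8 * numpy.array([[0.5, 0.2],
                              [0.1, 0.3]])
x_perturbed = numpy.linalg.solve(A + delta_A, b)

# Ratio of relative solution change to relative matrix perturbation
# (2-norms throughout), which the bound says cannot exceed kappa(A).
ratio = ((numpy.linalg.norm(x_perturbed - x) / numpy.linalg.norm(x))
         / (numpy.linalg.norm(delta_A, 2) / numpy.linalg.norm(A, 2)))
print(ratio, numpy.linalg.cond(A, 2))
```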

Stability

We now return to the fact that we are interested not only in the conditioning of a mathematical problem but also in how we might solve it on a finite-precision machine. In some sense conditioning describes how well we can solve a problem in exact arithmetic, and stability describes how well we can solve the problem in finite-precision arithmetic.

Accuracy and Stability

As we have defined before, we will consider the absolute error

$$||F(x) - f(x)||$$

where $F(x)$ is the approximation to the true solution $f(x)$. Similarly we can define the relative error as

$$\frac{||F(x) - f(x)||}{||f(x)||}.$$

In the ideal case we would like the relative error to be $\mathcal{O}(\epsilon_{\text{machine}})$.

Forward Stability

A forward stable algorithm for $x \in X$ has

$$\frac{||F(x) - f(x)||}{||f(x)||} = \mathcal{O}(\epsilon_{\text{machine}})$$

In other words

A forward stable algorithm gives almost the right answer to exactly the right question.

Backward Stability

A stronger notion of stability can also be defined, and it is satisfied by many approaches in numerical linear algebra. We say that an algorithm $F$ is backward stable if for $x \in X$ we have

$$F(x) = f(\hat{x})$$

for some $\hat{x}$ with

$$\frac{||\hat{x} - x||}{||x||} = \mathcal{O}(\epsilon_{\text{machine}}).$$

In other words

A backward stable algorithm gives exactly the right answer to nearly the right question.

Combining these ideas, along with the observation that we should not expect to accurately compute the solution to a poorly conditioned problem, we can form the mixed forward-backward sense of stability: for $x \in X$,

$$\frac{||F(x) - f(\hat{x})||}{||f(\hat{x})||} = \mathcal{O}(\epsilon_{\text{machine}})$$

for some $\hat{x}$ with

$$\frac{||\hat{x} - x||}{||x||} = \mathcal{O}(\epsilon_{\text{machine}}).$$

In other words

A stable algorithm gives nearly the right answer to nearly the right question.

An important aspect of the above statement is that we cannot necessarily guarantee an accurate result. If the condition number $\kappa(x)$ is small we would expect that a stable algorithm would give us an accurate result (by definition). This is reflected in the following theorem.

Theorem: Suppose a backward stable algorithm is applied to solve a problem $f: X \rightarrow Y$ with condition number $\kappa$ on a finite-precision machine; then the relative error satisfies

$$\frac{||F(x) - f(x)||}{||f(x)||} = \mathcal{O}(\kappa(x) \, \epsilon_{\text{machine}}).$$

Proof: By the definition of the condition number of the problem we can write

$$\frac{||F(x) - f(x)||}{||f(x)||} \leq (\kappa(x) + \mathcal{O}(\epsilon_{\text{machine}})) \frac{||\hat{x} - x||}{||x||}.$$

Combining this with the definition of backwards stability we can arrive at the statement of the theorem.

Backward Error Analysis - Process of using the condition number of the problem and stability of the algorithm to determine the error.

Forward Error Analysis - Considers the accrual of error at each step of an algorithm given slightly perturbed input.

Stability of $A\vec{x} = \vec{b}$ using Householder Triangularization

As an example let's consider the conditioning and algorithm for solving $A\vec{x} = \vec{b}$. Here we will use a $QR$ factorization approach given by Householder triangularization. First let's discuss the $QR$ factorization itself.

Theorem: Let the $QR$ factorization $A = QR$ of a matrix $A \in \mathbb{C}^{m \times n}$ be computed using Householder triangularization on a finite-precision machine; then

$$\hat{Q} \hat{R} = A + \delta A, \quad \frac{||\delta A||}{||A||} = \mathcal{O}(\epsilon_{\text{machine}})$$

for some $\delta A \in \mathbb{C}^{m \times n}$, where $\hat{Q}$ and $\hat{R}$ are the finite-arithmetic versions of $Q$ and $R$. Householder triangularization is therefore backward stable.

Solving $A\vec{x} = \vec{b}$ with $QR$ Factorization

So Householder triangularization is backward stable, but we also know that this does not guarantee accuracy if the problem itself is ill-conditioned. Is backward stability enough to guarantee accurate results if we use it to solve $A\vec{x} = \vec{b}$, for instance? It turns out that the accuracy of the product $QR$ is enough to guarantee the accuracy of the larger algorithm.

Consider the steps for solving $A\vec{x} = \vec{b}$ using $QR$ factorization:

  1. Compute the $QR$ factorization of $A$.

  2. Multiply the vector $\vec{b}$ by $Q^\ast$ so that $\vec{y} = Q^\ast \vec{b}$.

  3. Solve the triangular system $R\vec{x} = \vec{y}$ using backward substitution, i.e. $\vec{x} = R^{-1}\vec{y}$.
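
These three steps can be sketched with numpy, whose `numpy.linalg.qr` uses Householder reflections (the example system is arbitrary; a dedicated triangular solver such as `scipy.linalg.solve_triangular` could replace the generic solve in step 3):

```python
import numpy

A = numpy.array([[2.0, 1.0],
                 [1.0, 3.0]])
b = numpy.array([3.0, 5.0])

Q, R = numpy.linalg.qr(A)     # Step 1: Householder-based QR of A
y = Q.conj().T @ b            # Step 2: y = Q* b
x = numpy.linalg.solve(R, y)  # Step 3: solve the triangular system R x = y

print(x, numpy.linalg.solve(A, b))  # both should give the same solution
```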

We know that step (1) is backward stable; what about step (2), the matrix-vector multiplication? We can write the backward stability estimate as

$$(\hat{Q} + \delta Q) \hat{y} = \vec{b} \quad \text{with} \quad ||\delta Q|| = \mathcal{O}(\epsilon_{\text{machine}})$$

where we have inverted the matrix $\hat{Q}$ using the fact that it is unitary. Since this relation is exact, the matrix-vector multiplication is also backward stable, as this is an equivalent statement to multiplying $\vec{b}$ by a slightly perturbed matrix.

Step (3) is backward substitution (or the computation of $R^{-1}$). Writing the backward stability estimate we have

$$(\hat{R} + \delta R) \hat{x} = \hat{y} \quad \text{with} \quad \frac{||\delta R||}{||\hat{R}||} = \mathcal{O}(\epsilon_{\text{machine}})$$

demonstrating that the result $\hat{x}$ is the exact solution to a slight perturbation of the original problem.

These results lead to the following two theorems:

Theorem: Using $QR$ factorization to solve $A\vec{x} = \vec{b}$ as described above is backward stable, satisfying

$$(A + \Delta A) \hat{x} = \vec{b}, \quad \frac{||\Delta A||}{||A||} = \mathcal{O}(\epsilon_{\text{machine}})$$

for some $\Delta A \in \mathbb{C}^{m \times n}$.

Theorem: The solution $\hat{x}$ computed by the above algorithm satisfies

$$\frac{||\hat{x} - \vec{x}||}{||\vec{x}||} = \mathcal{O}(\kappa(A) \, \epsilon_{\text{machine}}).$$
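
This final bound can be illustrated with an ill-conditioned example; the Hilbert matrix is a standard choice (the experiment below is a sketch, not from the text):

```python
import numpy

# Build a 10 x 10 Hilbert matrix, a classic ill-conditioned example.
n = 10
A = numpy.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
x_true = numpy.ones(n)
b = A @ x_true

# Solve via Householder QR as described above.
Q, R = numpy.linalg.qr(A)
x = numpy.linalg.solve(R, Q.T @ b)

# The relative error is far above machine epsilon but is consistent
# with the kappa(A) * eps scaling predicted by the theorem.
relative_error = numpy.linalg.norm(x - x_true) / numpy.linalg.norm(x_true)
bound = numpy.linalg.cond(A, 2) * numpy.finfo(float).eps
print(relative_error, bound)
```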