![]() | Text provided under a Creative Commons Attribution license, CC-BY. All code is made available under the FSF-approved MIT license. (c) Kyle T. Mandli |
Note: This material largely follows the text “Numerical Linear Algebra” by Trefethen and Bau (SIAM, 1997) and is meant as a guide and supplement to the material presented there.
%matplotlib inline
%precision 3
import matplotlib.pyplot as plt
import numpyNumerical Linear Algebra¶
Numerical methods for linear algebra problems lies at the heart of many numerical approaches and is something we will spend some time on. Roughly we can break down problems that we would like to solve into three general problems, solving a system of equations
Projection problems or Linear Least Squares
and solving the eigenvalue problem
We examine each of these problems separately and will evaluate some of the fundamental properties and methods for solving these problems. We will be careful in deciding how to evaluate the results of our calculations and try to gain some understanding of when and how they fail.
Factorizations¶
For each fundamental problem, there are a range of algorithms to solve them, but all of these algorithms can often be succinctly described in terms of a matrix “factorization”, the ability to write an arbitrary matrix as a product of matrices with special properties (e.g. triangular, diagonal, orthogonal) that make solving the general problem straightforward. For example
| Problem | Algorithms | "Factorizations" |
|---|---|---|
| $$A \mathbf{x} = \mathbf{b}$$ | Gaussian-Elimination, Gauss-Jordan Elimination | $PA=LU$, $A=ER$ |
| $$ A^TA\mathbf{x} = A^T\mathbf{b}$$ | Orthogonalization algorithms (Gram-Schmidt, Modified GS, Householder, Givens) | $A=QR$ |
| $$A \mathbf{v} = \lambda \mathbf{v}$$ | Various Iterative methods (Power, inverse power, $RQ$ with shifts$\ldots$ | $A=S\Lambda S^{-1}$, $A=Q\Lambda Q^T$, $A=MJM^{-1}$ |
| All of the above | Singular Value Decomposition (SVD) | $A = U\Sigma V^T$ |
To develop all of these algorithms in depth is a course in itself, however, here we will highlight some of the key factorizations and algorithms used in modern computational linear algebra. In particular we will highlight issues of accuracy, stability and computational cost particularly in the limit of large systems.
Some Example Problems¶
The number and power of the different tools made available from the study of linear algebra makes it an invaluable field of study. Before we dive in to numerical approximations we first consider some of the pivotal problems that numerical methods for linear algebra are used to address.
For this discussion we will be using the common notation to denote the dimensions of a matrix . The refers to the number of rows and the number of columns. If a matrix is square, i.e. , then we will use the notation that is .
Systems of Equations¶
The first type of problem is to find the solution to a linear system of equations. If we have equations for unknowns it can be written in matrix/vector form,
For this example is an matrix, denoted as being in , and and are column vectors with entries, denoted as .
Example 1: Polynomial Fitting¶
In our previous work on interpolation we found that the unique interpolating polynomial of order through the points can be found by solving a Linear system of equations
where is a VanderMonde matrix whose columns are the basis functions (e.g. the monomials) and are the coefficients (weights) such that the interpolating polynomial is given uniquely by
Examples 2: Solution of Non-linear equations by Newton’s Method¶
Given a non-linear system of equations
Newton’s method provides an iterative method to find roots where at every step of the iteration a linear system of equations
Examples 3: Solution of Boundary Value problems by Finite Differences or Finite Elements¶
Given a general linear differential equation
where is a linear differential operator e.g.
Finite difference or finite element discretizations usually reduce the continuous, infinite dimensional problem to a discrete linear problem of form
Where and are discrete approximations to the solution function and right hand side and is a discrete approximation to . Moreover, for many discretizations, can be very sparse.
Linear least squares¶
In a similar case as above, say we want to fit a particular function (could be a polynomial) to a given number of data points except in this case we have more data points than free parameters. In the case of polynomials this could be the same as saying we have data points but only want to fit a order polynomial through the data where . One of the common approaches to this problem is to minimize the “least-squares” error between the data and the resulting function:
If our function can be written as a linear combination of basis functions
then the linear system through points becomes where is a generalized vandermonde matrix
Which will be over-determined for . However the Least-squares solution for the weights will satisfy
we can guarantee that the error is minimized in the least-squares sense1. (Although we will also show that this is not the most numerically stable way to solve this problem)
Define the data¶
N = 20
x = numpy.linspace(-1.0, 1.0, N)
y = x + numpy.random.random((N))Define basis functions and Vandermonde matrix¶
# define the basis functions
phi_1 = lambda x: numpy.ones(x.shape)
phi_2 = lambda x: x
phi_3 = lambda x: numpy.exp(x)
# Define various Vandermonde matrix based on our x-values
A = numpy.array([phi_1(x), phi_2(x), phi_3(x)]).T
A.shape# Determine the weights of our linear function
# result in the smallest sum of the squares of the residual.
w = numpy.linalg.solve(numpy.dot(A.T, A), numpy.dot(A.T, y))
error = y - A.dot(w)
print("w = {}, ||e|| = {}".format(w, numpy.linalg.norm(error)))# Do the same using numpy's lstsq routine (which solves a more numerically stable problem)
w = numpy.linalg.lstsq(A, y, rcond=None)[0]
error = y - A.dot(w)
print("w = {}, ||e|| = {}".format(w, numpy.linalg.norm(error)))# just repeat the calculations to draw the figure
# define data
N = 20
x = numpy.linspace(-1.0, 1.0, N)
y = x + numpy.random.random((N))
# define the basis functions
phi_1 = lambda x: numpy.ones(x.shape)
phi_2 = lambda x: x
phi_3 = lambda x: numpy.exp(x)
# Define various Vandermonde matrix based on our x-values
A = numpy.array([phi_1(x), phi_2(x), phi_3(x)]).T
w = numpy.linalg.lstsq(A, y, rcond=None)[0]
# Plot it out, cuz pictures are fun!
fig = plt.figure(figsize=(8, 6))
axes = fig.add_subplot(1, 1, 1)
f = A.dot(w)
axes.plot(x, y, "ko")
axes.plot(x, f, "r")
axes.set_title("Least Squares Fit to Data")
axes.set_xlabel("$x$")
axes.set_ylabel("$f(x)$ and $y_i$")
axes.grid()
plt.show()Eigenproblems¶
Eigenproblems come up in a variety of contexts and often are integral to many problem of scientific and engineering interest. It is such a powerful idea that it is not uncommon for us to take a problem and convert it into an eigenproblem.
One of my favorite examples The Tacoma Narrows Bridge Collapse
Or the original Google Page-Rank algorithm which essentially finds the dominant eigenvector of an enormous sparse Markov Matrix.
Here we introduce the idea and give some examples.
As a review, if (a square matrix with complex values), a non-zero vector is an eigenvector of with a corresponding eigenvalue if
One way to interpret the eigenproblem is that we are attempting to ascertain the “action” of the matrix on some subspace of where this action acts like scalar multiplication. This subspace is called an eigenspace.
General idea of EigenProblems¶
Rewriting the standard Eigen problem for , as
it becomes clear that for to be non-trivial (i.e. ), requires that the matrix be singular,
This is equivalent to finding all values of such that (the determinant of singular matrices is always zero). However, it can also be shown that
which is a th order polynomial in . Thus implies the eigenvalues are the roots of , and the eigenspace corresponding to is just
Solving EigenProblems¶
The temptation (and what we usually teach in introductory linear algebra classes) is to find the roots of to get the eigenvalues, then find the null-space of .
However that would be wrong. The best algorithms for finding Eigenvalues are completely unrelated to rootfinding as we shall see (and in fact, the way you find the roots of polynomials is to find the eigenvalues of a “companion matrix”)
The Singular Value Decomposition¶
One of the most beautiful, and useful, factorizations is the Singular Value Decomposition (or SVD),
where is a general matrix, and are unitary (orthogonal) matrices such that , and is a diagonal matrix with real, positive diagonal entries (the singular values)
where is the rank of
Applications of the SVD¶
The SVD combines all the aspects of basic linear algebra and is used in a large number of applications including
diagnosing ill-conditioned matrices
providing orthonormal bases for the 4-subspaces of
Solving linear systems and linear least-squares problems
Solving ill-conditioned and singular linear systems (Pseudo-inverse)
Dimensional reduction in Data analysis (PCA, EOF, POD analysis)
and more...
Because of its ubiquity in computational linear algebra and data science. we will spend a bit of time to understand the SVD and its applications (but not the specific algorithms). But first
Fundamentals¶
Objectives
Understand basic linear-algebraic operations and their computational costs
Understand Numpy implementation (and performance) for Linear Algebra
Understand Singular vs Invertible matrices
Understand Orthonormal vectors and matrices
Understand vector and Matrix norms
Understand the Condition Number
Matrix-Vector Multiplication¶
One of the most basic operations we can perform with matrices is to multiply them by a vector which maps a vector to a vector. There are multiple ways to interpret and compute the matrix-vector product .
index notation¶
row picture (dot products)¶
We can also consider matrix-vector multiplication as a sequence of inner products (dot-products between the rows of and the vector .
where is the th row of
column picture¶
An alternative (and entirely equivalent way) to write the matrix-vector product is as a linear combination of the columns of where each column’s weighting is .
This view will be useful later when we are trying to interpret various types of matrices.
Operation Counts¶
No matter how you compute , the total number of operations is the same (for a dense matrix ). The row view however is convenient for calculating the Operation counts required for .
If and . Then just counting the number of multiplications involved to compute is
One important property of the matrix-vector product is that is a linear operation, also known as a linear operator. This means that the for any and any we know that
Example: Numerical matrix-vector multiply¶
Write a matrix-vector multiply function and check it with the appropriate numpy routine. Also verify the linearity of the matrix-vector multiply.
# Mat-vec in index notation (the long way...don't do this)
def matrix_vector_product_index(A, x):
m, n = A.shape
b = numpy.zeros(m)
for i in range(m):
for j in range(n):
b[i] += A[i, j] * x[j]
return b
# Mat-vec by row picture (still don't do this)
def matrix_vector_product_row(A, x):
m, n = A.shape
b = numpy.zeros(m)
# loop on rows
for i in range(m):
b[i] = A[i, :].dot(x)
return b
# Mat-vec by column picture (still don't do this)
def matrix_vector_product_col(A, x):
m, n = A.shape
b = numpy.zeros(m)
# loop over columns
for j in range(n):
b += A[:, j] * x[j]
return bCheck equivalence¶
first set up some large random matrices and vectors and compare to numpy built in .dot function
m = 1000
n = 1000
A = numpy.random.uniform(size=(m, n))
x = numpy.random.uniform(size=(n))
y = numpy.random.uniform(size=(n))
c = numpy.random.uniform()funcs = [
matrix_vector_product_index,
matrix_vector_product_row,
matrix_vector_product_col,
]
for f in funcs:
b = f(A, x)
print("{}(A,x) = A.dot(x)? {}".format(f.__name__, numpy.allclose(b, A.dot(x))))Check Linearity¶
for f in funcs:
print(
"{}(A,x+y) = Ax + Ay is {}".format(
f.__name__, numpy.allclose(f(A, (x + y)), f(A, x) + f(A, y))
)
)
print(
"{}(A,cx) = cAx is {}\n".format(
f.__name__, numpy.allclose(f(A, c * x), c * f(A, x))
)
)Check Timing/performance¶
for f in funcs:
print(f.__name__, end="\t")
%timeit f(A,x)
print("numpy.dot(A,x)", end="\t\t\t")
%timeit A.dot(x)Matrix-Matrix Multiplication¶
The matrix product with another matrix is defined as
i.e. each component of is a dot-product between the th row of and the th column of
As with matrix-vector multiplication, Matrix-matrix multiplication can be thought of multiple ways
dot products (each with flops)
multiplying the columns of
Linear combinations of the rows of
Questions¶
What are the dimensions of and so that the multiplication works?
What are the Operations Counts for Matrix-Matrix Multiplication?
Comment on the product vs.
Example: Outer Product¶
The product of two vectors and is a matrix where the columns are the vector multiplied by the corresponding value of : $$
$$
It is useful to think of these as operations on the column vectors, and an equivalent way to express this relationship is $$
$$
Or each column is just a scalar multiple of
Alternatively you can think of this as the rows are all just scalar multiples of
rank 1 updates¶
We call any matrix of the form a “rank one matrix” ( because its rank r=?). These sort of matrix operations are very common in numerical algorithms for orthogonalization, eigenvalues and the original page-rank algorithm for google. Again, the order of operations is critical.
Check¶
Set up some random vectors
m = 10000
n = 10000
u = numpy.random.rand(m)
v = numpy.random.rand(n)
x = numpy.random.rand(n)b = numpy.outer(u, v).dot(x)
c = u.dot(v.dot(x))
d = u * v.dot(x)
print("Max Difference = {}".format(numpy.max(numpy.abs(b - c))))
print("Max Difference = {}".format(numpy.max(numpy.abs(b - d))))
print("Max Difference = {}".format(numpy.max(numpy.abs(c - d))))Check Timing/performance¶
# timing
print("(uv^T).dot(x)", end="\t\t")
%timeit numpy.outer(u,v).dot(x)
print("\nu*v.dot(x)", end="\t\t")
%timeit u*v.dot(x)Example: Upper Triangular Multiplication¶
Consider the multiplication of a matrix and the upper-triangular matrix defined as the matrix with entries for and for . The product can be written as
The columns of are then
so that is the sum of the first columns of .
Example: Write Matrix-Matrix Multiplication¶
Write a function that computes matrix-matrix multiplication and demonstrate the following properties:
(for square matrices))
where
in general
def matrix_matrix_product(A, B):
C = numpy.zeros((A.shape[0], B.shape[1]))
for i in range(A.shape[0]):
for j in range(B.shape[1]):
for k in range(A.shape[1]):
C[i, j] += A[i, k] * B[k, j]
return C
m = 4
n = 4
p = 4
A = numpy.random.uniform(size=(m, n))
B = numpy.random.uniform(size=(n, p))
C = numpy.random.uniform(size=(m, p))
c = numpy.random.uniform()
print(numpy.allclose(matrix_matrix_product(A, B), numpy.dot(A, B)))
print(
numpy.allclose(
matrix_matrix_product(A, (B + C)),
matrix_matrix_product(A, B) + matrix_matrix_product(A, C),
)
)
print(numpy.allclose(matrix_matrix_product(A, c * B), c * matrix_matrix_product(A, B)))
print(numpy.allclose(matrix_matrix_product(A, B), matrix_matrix_product(B, A)))NumPy Products¶
NumPy and SciPy contain routines that are optimized to perform matrix-vector and matrix-matrix multiplication. Given two ndarrays you can take their product by using the dot function.
n = 10
# Matrix vector with identity
I = numpy.identity(n)
x = numpy.random.random(n)
print(x)print("x = Ix is {}\n".format(numpy.allclose(x, numpy.dot(I, x))))print("x - I.dot(x) = {}\n".format(x - I.dot(x)))print("I*x = \n{}\n".format(I * x))print(I.dot(x))# Matrix vector product
m = 5
A = numpy.random.random((m, n))
print(numpy.dot(A, x))# Matrix matrix product
B = numpy.random.random((n, m))
print(numpy.dot(A, B))
print()
print(A.dot(B))Check non-commutative property of matrix-matrix multiplication¶
It’s easy to demonstrate the non-commutative nature of Matrix-Matrix multiplication with almost any two matrices and . For a clear demonstration, consider the two matrices a diagonal matrix and a matrix of all 1s
N = 5
D = numpy.diag(numpy.array(range(1, N + 1)))
A = numpy.ones((N, N))
print(D, "\n")
print(A)print(D.dot(A))print(A.dot(D))Range and Null-Space¶
Range¶
The range of a matrix (similar to any function), denoted as , is the set of vectors that can be expressed as for .
We can also then say that that is the space spanned by the columns of . In other words the linearly independent columns of provide a basis for , also called the column space of the matrix .
controls the existence of solutions to
Null-Space¶
Similarly the null-space of a matrix , denoted is the set of vectors that satisfy .
controls the uniqueness of solutions to
A similar concept is the rank of the matrix , denoted as , is the dimension of the column space. A matrix is said to have full-rank if . This property also implies that the matrix mapping is one-to-one.
Inverse¶
A non-singular or invertible matrix is characterized as a square matrix with full-rank. This is related to why we know that the matrix is one-to-one, we can use it to transform a vector and using the inverse, denoted , we can map it back to the original matrix. The familiar definition of this is
Since has full rank, its columns form a basis for and the vector must be in the column space of .
There are a number of important properties of an invertible matrix . Here we list them as the following equivalent statements
has a unique inverse such that
0 is not an eigenvalue of
Example: Properties of invertible matrices¶
Show that given an invertible matrix that the rest of the properties hold. Make sure to search the numpy packages for relevant functions.
m = 5
# generate a random m x m invertible matrix
for n in range(100):
A = numpy.random.uniform(size=(m, m))
if numpy.linalg.det(A) != 0:
break
print("A^{{-1}}*A = \n\n{}\n".format(numpy.dot(numpy.linalg.inv(A), A)))
print("rank(A) = {}\n".format(numpy.linalg.matrix_rank(A)))
print("N(A)= {}\n".format(numpy.linalg.solve(A, numpy.zeros(m))))
print("Eigenvalues = {}".format(numpy.linalg.eigvals(A)))Orthogonal Vectors and Matrices¶
Orthogonality is a very important concept in linear algebra that forms the basis of many of the modern methods used in numerical computations.
Two vectors are said to be orthogonal if their inner-product or dot-product defined as
Here we have shown the various notations you may run into (the inner-product is in-fact a general term for a similar operation for mathematical objects such as functions).
If then we say and are orthogonal. The reason we use this terminology is that the inner-product of two vectors can also be written in terms of the angle between them where
and is the Euclidean () norm of the vector , which we can interpret as the length of a vector.
The generalization of the inner-product to complex vector spaces is defined as
where is the complex-conjugate of the value .
Orthonormality¶
Taking this idea one step further we can say a set of vectors are orthogonal to if
If in addition
then they are also called orthonormal.
Note that we dropped the 2 as a subscript to the notation for the norm of a vector. Later we will explore other ways to define a norm of a vector other than the Euclidean norm defined above.
Another concept that is related to orthogonality is linear-independence. A set of vectors are linearly independent if that each cannot be written as a linear combination of the other vectors in the set .
An equivalent statement is that given a set of vectors , the only set of scalars that satisfies
is if for all
Show that if a set of vectors are orthonormal, they must be linearly independent.
This can be related directly through the idea of projection. If we have a set of vectors we can project another vector onto the vectors in by using the inner-product. This is especially powerful if we have a set of orthogonal vectors , which are said to span a space (or provide a basis for a space), s.t. any vector in the space spanned by can be expressed as a linear combination of the basis vectors
Note if that
Looping back to matrices, the column space of a matrix is spanned by its linearly independent columns. Any vector in the column space can therefore be expressed via the equation above.
Unitary Matrices¶
A special class of matrices are called unitary matrices when complex-valued and orthogonal when purely real-valued if the columns of the matrix are orthonormal to each other. Importantly this implies that for a unitary matrix we know the following
therefore if is square,
where is called the adjoint (or Hermitian) of . The adjoint is defined as the transpose of the original matrix with the entries being the complex conjugate of each entry as the notation implies. Note, if then
As an example if we have the matrix
The important part of being an unitary matrix is that the projection onto the column space of the matrix preserves geometry in an Euclidean sense, i.e. preserves the Cartesian distance.
Vector and Matrix Norms and the condition number¶
The following sections will lay out the definitions and computation required to calculate a range of vector and matrix norms which provide a measure of the “size” of an object or the distance in some vector space.
In the context of Numerical Linear algebra, norms are essential for defining a key property of a matrix, the “condition number”.
In infinite precision, a square matrix is either invertible or singular, however, in finite precision, even an invertible matrix can behave poorly if it is ‘ill-conditioned’, i.e. it is almost singular.
We need a quantitiative measure of how “near-singular” a matrix is
Why not the Determinant ?¶
We know that if a matrix is singular, . But what if is small?
Turns out even a perfectly invertible, well-conditioned matrix can have arbitrarily small determinant.
More generally if , .
Yet all these matrices are diagonal and easily inverted (i.e. ). Thus we need something better (the condition number). But to get there requires developing important ideas about vector and matrix norms.
Vector Norms¶
Norms (and also measures) provide a means for measure the “size” or distance in a space. In general a norm is a function, denoted by , that maps . In other words we stick in a multi-valued object and get a single, real-valued number out the other end.
All norms satisfy the properties:
only if
(triangle inequality)
where
There are a number of relevant norms that we can define beyond the Euclidean norm, also know as the 2-norm or norm:
norm:
norm:
norm:
norm:
weighted norm:
These are also related to other norms denoted by capital letters ( for instance). In this case we use the lower-case notation to denote finite or discrete versions of the infinite dimensional counterparts.
Example: Comparisons Between Norms¶
Compute the norms given some vector and compare their values. Verify the properties of the norm for one of the norms.
def pnorm(x, p):
"""return the vector p norm of a vector
parameters:
-----------
x: numpy array
vector
p: float or numpy.inf
value of p norm such that ||x||_p = (sum(|x_i|^p))^{1/p} for p< inf
for infinity norm return max(abs(x))
returns:
--------
pnorm: float
pnorm of x
"""
if p == numpy.inf:
norm = numpy.max(numpy.abs(x))
else:
norm = numpy.sum(numpy.abs(x) ** p) ** (1.0 / p)
return normm = 10
p = 4
x = numpy.random.uniform(size=m)
ell_1 = pnorm(x, 1)
ell_2 = pnorm(x, 2)
ell_p = pnorm(x, p)
ell_infty = pnorm(x, numpy.inf)
print("x = {}".format(x))
print()
print(
"L_1 = {}\nL_2 = {}\nL_{} = {}\nL_inf = {}".format(
ell_1, ell_2, p, ell_p, ell_infty
)
)y = numpy.random.uniform(size=m)
print()
print("Properties of norms:")
print("x = {}".format(x))
print("y = {}\n".format(y))
p = 2
print(
"||x+y||_{p} = {nxy}\n||x||_{p} + ||y||_{p} = {nxny}".format(
p=p, nxy=pnorm(x + y, p), nxny=pnorm(x, p) + pnorm(y, p)
)
)
c = -0.1
print("||c x||_{} = {}".format(p, pnorm(c * x, p)))
print("|c||x||_{} = {}".format(p, abs(c) * pnorm(x, p)))Geometric Interpretation¶
For every norm we can define a set of ‘unit vectors’ that is the set of all vectors in with .
For example in , the unit spheres in the 1-,2- and -norm look like
import matplotlib.patches as patches
# Note: that this code is a bit fragile to angles that go beyond pi
# due to the use of arccos.
head_width = 0.1
head_length = 1.5 * head_width
def draw_unit_vectors(axes, A, head_width=0.1):
head_length = 1.5 * head_width
image_e = numpy.empty(A.shape)
angle = numpy.empty(A.shape[0])
image_e[:, 0] = numpy.dot(A, numpy.array((1.0, 0.0)))
image_e[:, 1] = numpy.dot(A, numpy.array((0.0, 1.0)))
for i in range(A.shape[0]):
angle[i] = numpy.arccos(image_e[0, i] / numpy.linalg.norm(image_e[:, i], ord=2))
axes.arrow(
0.0,
0.0,
image_e[0, i] - head_length * numpy.cos(angle[i]),
image_e[1, i] - head_length * numpy.sin(angle[i]),
head_width=head_width,
color="b",
alpha=0.5,
)
# comparison of norms
# ============
# 1-norm
# Unit-ball
fig = plt.figure(figsize=(8, 6))
# fig.suptitle("1-Norm: $||A||_1 = {}$".format(numpy.linalg.norm(A,ord=1)), fontsize=16)
theta = numpy.linspace(0.0, 2.0 * numpy.pi, 100)
axes = fig.add_subplot(1, 1, 1, aspect="equal")
axes.plot(
(1.0, 0.0, -1.0, 0.0, 1.0),
(0.0, 1.0, 0.0, -1.0, 0.0),
"r",
label="$||\mathbf{x}||_1=1$",
)
axes.plot(numpy.cos(theta), numpy.sin(theta), "g", label="$||\mathbf{x}||_2=1$")
axes.plot(
(1.0, -1.0, -1.0, 1.0, 1.0),
(1.0, 1.0, -1.0, -1.0, 1.0),
"b",
label="$||\mathbf{x}||_\infty=1$",
)
draw_unit_vectors(axes, numpy.eye(2))
axes.arrow(
0.0,
0.0,
1.0 - head_length * numpy.cos(numpy.pi / 4.0),
1.0 - head_length * numpy.sin(numpy.pi / 4.0),
head_width=head_width,
color="k",
linestyle="--",
alpha=0.5,
)
axes.set_title("Unit Ball in 1-,2-,and $\infty$ norm")
axes.set_xlim((-1.1, 1.1))
axes.set_ylim((-1.1, 1.1))
axes.grid(True)
# axes.legend([ '$||\mathbf{x}||_1=1$', '$||\mathbf{x}||_2=1$', '$||\mathbf{x}||_\infty=1$'], loc='best')
axes.legend(loc="upper center", bbox_to_anchor=(0.5, 0.5))
plt.show()And if then and
Induced Matrix Norms¶
The most direct way to consider a matrix norm is one induced by a vector-norm. Given a vector norm, we can define a matrix p-norm as the smallest number that satisfies the inequality
or as the supremum of the ratios so that
with no loss of generality, we can restrict to the set of all unit vectors and interpret the matrix p-norm as the maximum distortion of the “unit sphere”
Properties of all Matrix Norms (induced and non-induced)¶
In general matrix-norms have the following properties whether they are induced from a vector-norm or not:
and only if
(Triangle Inequality)
In addition, all induced p-norms satisfy the product rules
Computation of induced matrix p-norms¶
With a little work it can be shown that
=
numpy.linalg.norm(A, ord=1)is the maximum 1-norm of the Columns of=
numpy.linalg.norm(A, ord=2) = max(numpy.linalg.svd(A)[1]is the maximum Singular Value of=
numpy.linalg.norm(A, ord=numpy.inf)is the maximum 1-norm of the Rows of (or )
I = numpy.identity(3)
for ord in [1, 2, numpy.inf]:
print("||I||_{} = {}".format(ord, numpy.linalg.norm(I, ord=ord)))D = numpy.diag([2, 1, -3])
for ord in [1, 2, numpy.inf]:
print("||D||_{} = {}".format(ord, numpy.linalg.norm(D, ord=ord)))Orthogonal Matrices where ¶
A fundamental property of matrices is that they do not change the Euclidean length of vectors i.e.
Therefore
Example: Induced Matrix Norms¶
Consider the matrix
Compute the induced-matrix norm of for the vector norms and .
: For both of the requested norms the unit-length vectors and can be used to give an idea of what the norm might be and provide a lower bound.
computing each of the norms we have
which translates into the norms and . This implies that the induced matrix norm of is at least .
The exact value of can be computed using the spectral radius defined as
where are the eigenvalues of . With this we can compute the norm of as
Computing the norm again here we find
which has eigenvalues
so .
The actual induced 2-norm of a matrix can be derived using the Singular Value Decomposition (SVD) and is simply the largest singular value .
Proof: Given that every Matrix can be factored into its SVD (see notebook 10.1):
where and are unitary matrices with the property and (of their respective sizes) and is a real diagonal matrix of singular values .
Then the 2-norm squared of a square matrix is
but so
but if (i.e. is a unit vector), then so is because unitary matrices don’t change the length of vectors. So it follows that
A = numpy.array([[1, 2], [0, 2]])
# calculate the SVD(A)
U, S, Vt = numpy.linalg.svd(A)
print("Singular_values = {}".format(S))
print("||A||_2 = {}".format(S.max()))
print("||A||_2 = {}".format(numpy.linalg.norm(A, ord=2)))
# more fun facts about the SVD
# print(U.T.dot(U))
# print(Vt.T.dot(Vt))
# print(A - numpy.dot(U,numpy.dot(numpy.diag(S),Vt)))The induced 2-norm is the maximum singular value (short version...slightly scrappy)
Given
where and is a diagonal matrix with diagonal entries
The induce -norm is the max of the 1-norm of rows of
because the largest unit vector on the unit sphere in the norm is a vector of 1’s.
# Calculate the 1-norm of A
A = numpy.array([[1, 2], [0, 2]])
normA_1 = numpy.max(numpy.sum(numpy.abs(A), axis=0))
print("||A||_1 = {}".format(normA_1))
print("||A||_1 = {}".format(numpy.linalg.norm(A, ord=1)))# calculate the 2 norm of A
normA_2 = numpy.max(numpy.linalg.svd(A, compute_uv=False))
print("||A||_2 = {}".format(normA_2))
print("||A||_2 = {}".format(numpy.linalg.norm(A, ord=2)))
print()
U, S, V = numpy.linalg.svd(A)
print(U.dot(numpy.diag(S).dot(V)))# calculate the infinity norm of A
normA_inf = numpy.max(numpy.sum(numpy.abs(A), axis=1))
print("||A||_inf = {}".format(normA_inf))
print("||A||_inf = {}".format(numpy.linalg.norm(A, ord=numpy.inf)))The Geometric Picture¶
One of the most useful ways to think about matrix norms is as a transformation of a unit-ball to an ellipse. Depending on the norm in question, the norm will be some combination of the resulting ellipse.
1-Norm: ¶
A = numpy.array([[1, 2], [0, 2]])
# ============
# 1-norm
# Unit-ball
fig = plt.figure()
fig.set_figwidth(fig.get_figwidth() * 2)
fig.suptitle("1-Norm: $||A||_1 = {}$".format(numpy.linalg.norm(A, ord=1)), fontsize=16)
axes = fig.add_subplot(1, 2, 1, aspect="equal")
axes.plot((1.0, 0.0, -1.0, 0.0, 1.0), (0.0, 1.0, 0.0, -1.0, 0.0), "r")
draw_unit_vectors(axes, numpy.eye(2))
axes.set_title("Unit Ball")
axes.set_xlim((-1.1, 1.1))
axes.set_ylim((-1.1, 1.1))
axes.grid(True)
# Image
axes = fig.add_subplot(1, 2, 2, aspect="equal")
axes.plot((1.0, 2.0, -1.0, -2.0, 1.0), (0.0, 2.0, 0.0, -2.0, 0.0), "r")
draw_unit_vectors(axes, A, head_width=0.2)
axes.set_title("Images Under A")
axes.grid(True)
plt.show()2-Norm: ¶
# ============
# 2-norm
# Unit-ball
fig = plt.figure()
fig.suptitle(
"2-Norm: $||A||_2 = ${:3.4f}".format(numpy.linalg.norm(A, ord=2)), fontsize=16
)
fig.set_figwidth(fig.get_figwidth() * 2)
axes = fig.add_subplot(1, 2, 1, aspect="equal")
axes.add_artist(plt.Circle((0.0, 0.0), 1.0, edgecolor="r", facecolor="none"))
draw_unit_vectors(axes, numpy.eye(2))
axes.set_title("Unit Ball")
axes.set_xlim((-1.1, 1.1))
axes.set_ylim((-1.1, 1.1))
axes.grid(True)
# Image
# Compute some geometry
u, s, v = numpy.linalg.svd(A)
theta = numpy.empty(A.shape[0])
ellipse_axes = numpy.empty(A.shape)
theta[0] = numpy.arccos(u[0][0]) / numpy.linalg.norm(u[0], ord=2)
theta[1] = theta[0] - numpy.pi / 2.0
for i in range(theta.shape[0]):
ellipse_axes[0, i] = s[i] * numpy.cos(theta[i])
ellipse_axes[1, i] = s[i] * numpy.sin(theta[i])
axes = fig.add_subplot(1, 2, 2, aspect="equal")
axes.add_artist(
patches.Ellipse(
(0.0, 0.0),
2 * s[0],
2 * s[1],
theta[0] * 180.0 / numpy.pi,
edgecolor="r",
facecolor="none",
)
)
for i in range(A.shape[0]):
axes.arrow(
0.0,
0.0,
ellipse_axes[0, i] - head_length * numpy.cos(theta[i]),
ellipse_axes[1, i] - head_length * numpy.sin(theta[i]),
head_width=head_width,
color="k",
)
draw_unit_vectors(axes, A, head_width=0.2)
axes.set_title("Images Under A")
axes.set_xlim((-s[0] + 0.1, s[0] + 0.1))
axes.set_ylim((-s[0] + 0.1, s[0] + 0.1))
axes.grid(True)
plt.show()-Norm: ¶
# ============
# infty-norm
# Unit-ball
fig = plt.figure()
fig.suptitle(
"$\infty$-Norm: $||A||_\infty = {}$".format(numpy.linalg.norm(A, ord=numpy.inf)),
fontsize=16,
)
fig.set_figwidth(fig.get_figwidth() * 2)
axes = fig.add_subplot(1, 2, 1, aspect="equal")
axes.plot((1.0, -1.0, -1.0, 1.0, 1.0), (1.0, 1.0, -1.0, -1.0, 1.0), "r")
draw_unit_vectors(axes, numpy.eye(2))
axes.set_title("Unit Ball")
axes.set_xlim((-1.1, 1.1))
axes.set_ylim((-1.1, 1.1))
axes.grid(True)
# Image
# Geometry - Corners are A * ((1, 1), (1, -1), (-1, 1), (-1, -1))
# Symmetry implies we only need two. Here we just plot two
u = numpy.empty(A.shape)
u[:, 0] = numpy.dot(A, numpy.array((1.0, 1.0)))
u[:, 1] = numpy.dot(A, numpy.array((-1.0, 1.0)))
theta[0] = numpy.arccos(u[0, 0] / numpy.linalg.norm(u[:, 0], ord=2))
theta[1] = numpy.arccos(u[0, 1] / numpy.linalg.norm(u[:, 1], ord=2))
axes = fig.add_subplot(1, 2, 2, aspect="equal")
axes.plot((3, 1, -3, -1, 3), (2, 2, -2, -2, 2), "r")
for i in range(A.shape[0]):
axes.arrow(
0.0,
0.0,
u[0, i] - head_length * numpy.cos(theta[i]),
u[1, i] - head_length * numpy.sin(theta[i]),
head_width=head_width,
color="k",
)
draw_unit_vectors(axes, A, head_width=0.2)
axes.set_title("Images Under A")
axes.set_xlim((-4.1, 4.1))
axes.set_ylim((-3.1, 3.1))
axes.grid(True)
plt.show()Cauchy-Schwarz and Hölder Inequalities¶
Computing matrix norms where or is more difficult unfortunately. We have a couple of tools that can be useful however.
Cauchy-Schwarz Inequality: For the special case where , for any vectors and
Hölder’s Inequality: Turns out this holds in general if given a and that satisfy with
Note: this is essentially what we used in the proof of the norm with and
The most widely used matrix norm not induced by a vector norm is the Frobenius norm defined by
Invariance under unitary multiplication¶
One important property of the matrix 2-norm (and Frobenius norm) is that multiplication by a unitary matrix does not change the product (kind of like multiplication by 1). In general for any and unitary matrix we have
The condition number¶
Finally we have enough machinery to define the condition number (or often ) which is simply
Diagonal Matrix
2-norm condition number of a matrix
if is singular,
More generally, it’s easy to show that scaling of a matrix does not change its condition number
The condition number and error analysis of ¶
The condition number is important in many parts of analysis of numerical linear algebra, but is easily illustrated in understanding the behavior of solutions of linear systems
Assume that is a solution to and we want to understand how a small change in the RHS propagates to errors in
Consider the perturbed problem
which by linearity of Matrix vector multiplication, and that implies that
or
since , it follows that
or the relative error in the solution depends on the relative error in the RHS times the condition number.
alpha = numpy.finfo(float).eps
A = numpy.array([[1, 2], [1 + alpha, 2]])
# first version
b = numpy.array([1.0, 1.0])
x = numpy.linalg.solve(A, b)
print("b = {}\nx = {}".format(b, x))# now perturb b by epsilon
bp = numpy.array([1, 1 - alpha])
xp = numpy.linalg.solve(A, bp)
print("b'={}\nx' = {}".format(bp, xp))# and calculate relative error and condition number
err_b = numpy.linalg.norm(bp - b) / numpy.linalg.norm(b)
err_x = numpy.linalg.norm(xp - x) / numpy.linalg.norm(x)
condA = numpy.linalg.cond(A)
print("k(A) = {}".format(condA))
print("err_b = {}, err_x = {}, err_x/err_b ={}".format(err_b, err_x, err_x / err_b))