Text provided under a Creative Commons Attribution license, CC-BY. All code is made available under the FSF-approved MIT license. (c) Kyle T. Mandli and Rajath Kumar Mysore Pradeep Kumar
Note: This material largely follows the text “Numerical Linear Algebra” by Trefethen and Bau (SIAM, 1997) and is meant as a guide and supplement to the material presented there.
from __future__ import print_function
%precision 6
%matplotlib inline
import matplotlib.pyplot as plt
import numpy
import pandas as pd
Understanding the Singular Value Decomposition¶
Factorizations: A review¶
Matrix Factorization is a fundamental concept in Linear Algebra and refers to the ability to write general matrices as the products of simpler matrices with special properties.
In particular, all of the great algorithms encountered in numerical linear algebra can be succinctly described by their factorizations. Examples include
The LU decomposition¶
Algorithms: Gaussian elimination with partial pivoting
Factorization: $PA = LU$, where $P$: permutation matrix (from pivoting), $L$, $U$: lower and upper triangular matrices
Application: direct solution of $A\mathbf{x} = \mathbf{b}$ for $\mathbf{x}$
The QR decomposition¶
Algorithms: Gram-Schmidt orthogonalization, Householder triangularization
Factorization: $A = QR$, where $Q$: orthonormal basis for the column space of $A$, $R$: upper triangular matrix
Applications: least-squares solutions, projection problems, eigenvalues
EigenProblems¶
Diagonalizable matrices
Hermitian/Symmetric Matrices
Schur Factorization
Algorithms: Power method, inverse power method with shifts
Factorization: $A = X \Lambda X^{-1}$, where $X$: matrix of eigenvectors, $\Lambda$: diagonal matrix of eigenvalues
Applications: solution of dynamical systems, iterative maps, Markov chains, vibrational analysis, quantum mechanics
But perhaps the most beautiful factorization, which contains aspects of all of these problems, is the
Singular Value Decomposition (SVD)¶
$$A = U \Sigma V^T$$ for any $A \in \mathbb{R}^{m \times n}$, where
$U \in \mathbb{R}^{m \times m}$ is the orthogonal matrix whose columns are the eigenvectors of $AA^T$
$V \in \mathbb{R}^{n \times n}$ is the orthogonal matrix whose columns are the eigenvectors of $A^T A$
$\Sigma \in \mathbb{R}^{m \times n}$ is a diagonal matrix with elements $\sigma_1, \sigma_2, \ldots, \sigma_r$ corresponding to the square roots of the eigenvalues of $A^T A$. They are called the singular values of $A$ and are positive, arranged in descending order ($\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$).
Or a picture is worth a thousand words...
Existence and Uniqueness¶
Every matrix has a singular value decomposition. Furthermore, the singular values are uniquely determined, and if is square and the are distinct, the left and right singular vectors and are uniquely determined up to complex signs (i.e., complex scalar factors of absolute value 1).
Eigenvalue Decomposition vs. SVD Decomposition¶
Let the matrix $X$ contain the eigenvectors of $A$, which are linearly independent; then we can write a decomposition of the matrix as
$$A = X \Lambda X^{-1}$$
How does this differ from the SVD?
The basis of the SVD representation differs from the eigenvalue decomposition
The basis vectors are not in general orthogonal for the eigenvalue decomposition, whereas they are for the SVD
The SVD effectively contains two basis sets.
All matrices have an SVD decomposition whereas not all have eigenvalue decompositions.
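These differences are easy to see numerically; a small sketch using a hypothetical non-symmetric matrix (the matrix and its values are illustrative only):

```python
import numpy

# a non-symmetric (but diagonalizable) example matrix
A = numpy.array([[2.0, 1.0],
                 [0.0, 3.0]])

# eigenvalue decomposition: A = X Lambda X^{-1}
lam, X = numpy.linalg.eig(A)
# the eigenvector basis is generally NOT orthogonal
print("X^T X =\n", X.T @ X)

# SVD: A = U Sigma V^T, with two orthonormal bases U and V
U, sigma, VT = numpy.linalg.svd(A)
print("U^T U =\n", U.T @ U)
print("V^T V =\n", VT @ VT.T)
```

Both factorizations reproduce $A$, but only the SVD's basis vectors are orthonormal.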
Applications¶
Matrix Pseudo-Inverse for ill-conditioned problems
Principal Component analysis
Total Least Squares
Image Compression
Data Analysis for interpretation of high-dimensional data
and more
A conceptual computation of the SVD (not how it’s really done)¶
Given $A \in \mathbb{R}^{m \times n}$ with SVD $A = U \Sigma V^T$, we can form the matrix
$$A^T A = (U \Sigma V^T)^T (U \Sigma V^T) = V \Sigma^T \Sigma V^T$$
where $V$ is unitary and $\Sigma^T \Sigma$ is square and diagonal.
However, because $A^T A$ is clearly symmetric it also has the eigen factorization
$$A^T A = Q \Lambda Q^T$$
where $Q$ is unitary and $\Lambda$ is real diagonal.
Moreover, we can show that $A^T A$ is at least positive semi-definite so all the eigenvalues are $\lambda_i \geq 0$.
In particular, if $\mathrm{rank}(A) = r$, we know that $r$ eigenvalues are $\lambda_i > 0$ and $n - r$ eigenvalues are equal to zero.
Given that
$$A^T A = V (\Sigma^T \Sigma) V^T = Q \Lambda Q^T$$
If we simply order the eigenvalues from largest to smallest (and rearrange the columns of $Q$ to match), then we can make the association
$$V = Q, \qquad \Sigma^T \Sigma = \Lambda$$
or
$$\sigma_i = \sqrt{\lambda_i}$$
A similar argument can be made about $U$ and the eigenvectors of $AA^T$; however, a better way to think about the relationship between $A$, $U$ and $V$ is to rearrange the SVD as
$$AV = U\Sigma$$
or we can look at this column-by-column
$$A\mathbf{v}_j = \sigma_j \mathbf{u}_j$$
(in the same way the eigen decomposition $A = X\Lambda X^{-1}$ is equivalent to $AX = X\Lambda$)
or rearranging
$$\mathbf{u}_j = \frac{A\mathbf{v}_j}{\sigma_j}, \quad j = 1, \ldots, r$$
lets us solve for the first $r$ columns of $U$ given $A$ and $V$.
As we'll show, we usually don't need to solve for the remaining $m - r$ columns of $U$ for reasons that relate to the "4 fundamental subspaces of $A$"
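The conceptual procedure above can be sketched directly in numpy (for illustration only; `numpy.linalg.svd` uses a more stable algorithm and never forms $A^TA$ explicitly; the example matrix is a hypothetical full-rank choice):

```python
import numpy

A = numpy.array([[2.0, 0.0], [1.0, 3.0], [0.0, 1.0]])  # m=3, n=2, rank 2

# eigen-decomposition of the symmetric matrix A^T A
lam, Q = numpy.linalg.eigh(A.T @ A)

# order eigenvalues largest to smallest and rearrange the columns of Q to match
order = numpy.argsort(lam)[::-1]
lam, Q = lam[order], Q[:, order]

# make the association V = Q, sigma_i = sqrt(lambda_i)
V = Q
sigma = numpy.sqrt(lam)

# solve for the first r columns of U via u_j = A v_j / sigma_j
U = A @ V / sigma

# check the reconstruction A = U Sigma V^T
print("max error:", numpy.abs(A - U @ numpy.diag(sigma) @ V.T).max())
```

The columns of $U$ obtained this way are automatically orthonormal because the $\mathbf{v}_j$ are eigenvectors of $A^TA$.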
Review: the Singular Value Decomposition (SVD)¶
$$A = U \Sigma V^T$$ for any $A \in \mathbb{R}^{m \times n}$, where
$U \in \mathbb{R}^{m \times m}$ is the orthogonal matrix whose columns are the eigenvectors of $AA^T$
$V \in \mathbb{R}^{n \times n}$ is the orthogonal matrix whose columns are the eigenvectors of $A^T A$
$\Sigma \in \mathbb{R}^{m \times n}$ is a diagonal matrix with elements $\sigma_1, \sigma_2, \ldots, \sigma_r$ corresponding to the square roots of the eigenvalues of $A^T A$. They are called the singular values of $A$ and are positive, arranged in descending order ($\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$).
But a better way to write and understand the SVD is as
$$A = U \Sigma V^T$$
where the columns of $U$ and $V$ contain orthonormal bases for the 4 fundamental subspaces of $A$
The Strangian view of the 4 Subspaces: a quick refresher¶

The Big idea: The columns of and contain orthonormal bases for the 4 fundamental subspaces¶
Returning to $AV = U\Sigma$, it follows that
$$A\mathbf{v}_j = \sigma_j \mathbf{u}_j = \mathbf{0}, \quad \text{for } j = r+1, \ldots, n \quad (\text{since } \sigma_j = 0)$$
Therefore the last $n - r$ columns of $V$ must be in the null space of $A$, and in fact must form an orthonormal basis for the null space $N(A)$.
Since the first $r$ columns of $V$ are all orthogonal to the last $n - r$ columns, they must form an orthonormal basis for the row space $R(A^T)$.
The first $r$ columns of $U$ (with $\mathbf{u}_j = A\mathbf{v}_j/\sigma_j$) form an orthonormal basis for the column space $R(A)$, leaving only the last $m - r$ columns of $U$ as an orthonormal basis for the left null space $N(A^T)$
Again a picture is worth a lot here
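These subspace bases can be checked numerically; a sketch using a hypothetical rank-1 matrix built as an outer product:

```python
import numpy

# a rank-1 matrix: m = n = 3, r = 1
A = numpy.outer([1.0, 2.0, 1.0], [2.0, -1.0, 1.0])

U, sigma, VT = numpy.linalg.svd(A)
r = int(numpy.sum(sigma > 1e-12 * sigma[0]))  # numerical rank
print("rank =", r)

# the last n - r columns of V span the null space: A v_j = 0
null_basis = VT[r:, :].T
print("A @ null basis:\n", A @ null_basis)

# the last m - r columns of U span the left null space: A^T u_j = 0
left_null_basis = U[:, r:]
print("A^T @ left null basis:\n", A.T @ left_null_basis)
```

Both products vanish to machine precision, confirming the basis claims.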
The Economy (or Skinny) SVD¶
As it turns out, because of all the 0's, all we actually need to reconstruct $A$ is the first $r$ columns of $U$ and $V$, and the $r \times r$ square sub-block of $\Sigma$ with just the singular values, i.e.
$$A = \begin{bmatrix} U_r & U_{m-r} \end{bmatrix} \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_r^T \\ V_{n-r}^T \end{bmatrix}$$
or
$$A = U_r \Sigma_r V_r^T$$
And it is this object (The Economy (or Skinny, or Compact) SVD), that helps us understand what the SVD does.
Spectral Theorem¶
The Economy SVD can also be written as
$$A = U_r \Sigma_r V_r^T = \sum_{i=1}^{r} \sigma_i \mathbf{u}_i \mathbf{v}_i^T$$
which says we can expand $A$ as a series of rank-1 matrices weighted by the singular values...
and it is this picture that leads to many of the important approximating properties of the SVD.
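A quick numerical check of this rank-1 expansion (a sketch; the matrix is an arbitrary example):

```python
import numpy

A = numpy.array([[2.0, 0.0, 3.0],
                 [5.0, 7.0, 1.0],
                 [0.0, 6.0, 2.0]])

U, sigma, VT = numpy.linalg.svd(A)

# sum the rank-1 matrices sigma_i * u_i v_i^T
A_sum = sum(sigma[i] * numpy.outer(U[:, i], VT[i, :])
            for i in range(len(sigma)))
print("max error:", numpy.abs(A - A_sum).max())
```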
Matrix Multiplication¶
Consider the matrix–vector product $\mathbf{b} = A\mathbf{x}$, which maps a vector $\mathbf{x} \in \mathbb{R}^n$ to its image $\mathbf{b}$ in the column space of $A$. In the context of the Economy SVD
$$\mathbf{b} = U_r \Sigma_r V_r^T \mathbf{x}$$
or (since $V_r^T V_r = I$)
$$\mathbf{b} = U_r \Sigma_r V_r^T (V_r V_r^T) \mathbf{x} = A \mathbf{x}^+$$
where $\mathbf{x}^+ = (V_r V_r^T)\mathbf{x}$ is the projection of $\mathbf{x}$ onto the row space $R(A^T)$
Question: If we wanted it, how would we find the projection of $\mathbf{x}$ onto the null space of $A$?
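The identity $A\mathbf{x} = A\mathbf{x}^+$ can be verified numerically; a sketch with a hypothetical rank-1 matrix:

```python
import numpy

# rank-1 matrix, so the row space is 1-dimensional
A = numpy.outer([1.0, 2.0, 1.0], [2.0, -1.0, 1.0])

U, sigma, VT = numpy.linalg.svd(A)
r = int(numpy.sum(sigma > 1e-12 * sigma[0]))
Vr = VT[:r, :].T  # first r right singular vectors

x = numpy.array([1.0, 1.0, 1.0])
x_plus = Vr @ (Vr.T @ x)  # projection of x onto the row space

# A x and A x^+ give the same image in the column space
print("A x   =", A @ x)
print("A x^+ =", A @ x_plus)
```

Only the row-space component of $\mathbf{x}$ contributes to the product; the remainder of $\mathbf{x}$ is annihilated by $A$.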
SVD and the Matrix Inverse¶
If a matrix $A$ is square and invertible, it implies that $r = m = n$, and $U$, $\Sigma$, and $V$ are all square invertible matrices, therefore if
$$A = U \Sigma V^T$$
then
$$A^{-1} = V \Sigma^{-1} U^T$$
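A sketch checking this identity for an invertible example matrix:

```python
import numpy

A = numpy.array([[2.0, 0.0, 3.0],
                 [5.0, 7.0, 1.0],
                 [0.0, 6.0, 2.0]])  # invertible: all sigma_i > 0

U, sigma, VT = numpy.linalg.svd(A)

# A^{-1} = V Sigma^{-1} U^T
A_inv = VT.T @ numpy.diag(1.0 / sigma) @ U.T
print("A_inv @ A =\n", A_inv @ A)
```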
SVD and the Matrix Pseudo-Inverse¶
But suppose $A \in \mathbb{R}^{m \times n}$ is not square. Then it cannot have an inverse; however, the SVD allows us to define a "Pseudo-Inverse"
Given the skinny SVD
$$A = U_r \Sigma_r V_r^T$$
we can define
$$A^+ = V_r \Sigma_r^{-1} U_r^T$$
because $\Sigma_r$ is square and invertible (but its size is set by the rank $r$)
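This definition agrees with `numpy.linalg.pinv`; a sketch with a hypothetical rank-1 matrix:

```python
import numpy

A = numpy.outer([1.0, 2.0, 1.0], [2.0, -1.0, 1.0])  # rank 1

U, sigma, VT = numpy.linalg.svd(A)
r = int(numpy.sum(sigma > 1e-12 * sigma[0]))

# A^+ = V_r Sigma_r^{-1} U_r^T using only the first r singular triplets
A_plus = VT[:r, :].T @ numpy.diag(1.0 / sigma[:r]) @ U[:, :r].T
print("max difference from numpy.linalg.pinv:",
      numpy.abs(A_plus - numpy.linalg.pinv(A)).max())
```

Note that `numpy.linalg.pinv` performs the same truncation internally, discarding singular values below its `rcond` cutoff.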
Action of the Pseudo-Inverse¶
In general, if $A$ is rank deficient ($r < n$), the problem $A\mathbf{x} = \mathbf{b}$ has either no solution or an infinite number of solutions. However, it always has a minimal least squares solution given by
$$\mathbf{x}^+ = A^+ \mathbf{b} = V_r \Sigma_r^{-1} U_r^T \mathbf{b}$$
which maps any vector $\mathbf{b} \in \mathbb{R}^m$ to a unique vector $\mathbf{x}^+ \in \mathbb{R}^n$
As we did with $A\mathbf{x}$, we can consider the action of $A^+$ on any vector $\mathbf{b}$ as a projection problem.
The minimal Least-Squares solution¶
It should be clear that
$$\mathbf{x}^+ = V_r \Sigma_r^{-1} U_r^T \mathbf{b}$$
lies entirely in the row space of $A$ (Why?)
Therefore it has no component in the null space $N(A)$ and is the shortest solution $\mathbf{x}$ to the least squares problem
$$\min_{\mathbf{x}} \| A\mathbf{x} - \mathbf{b} \|_2$$
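A sketch of the minimal-norm property, using a hypothetical rank-deficient system with a consistent right-hand side:

```python
import numpy

# rank-1 system with a consistent right-hand side
A = numpy.outer([1.0, 2.0, 1.0], [2.0, -1.0, 1.0])
b = A @ numpy.array([1.0, 0.0, 0.0])

# minimal-norm least-squares solution via the pseudo-inverse
x_plus = numpy.linalg.pinv(A) @ b

# adding any null-space component leaves A x unchanged but increases the norm
U, sigma, VT = numpy.linalg.svd(A)
x_other = x_plus + VT[2, :]  # VT[2, :] lies in the null space of A
print("norms:", numpy.linalg.norm(x_plus), numpy.linalg.norm(x_other))
```

Both vectors solve $A\mathbf{x} = \mathbf{b}$, but $\mathbf{x}^+$ is strictly shorter because it is orthogonal to the null space.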
The Pseudo-Inverse and ill-conditioned problems¶
Technically, a square matrix is either invertible or singular, however, in many real problems, a matrix can be close to singular or “ill-conditioned”.
Again, the condition number of a matrix defined by an induced norm is
$$\kappa(A) = \|A\| \, \|A^{-1}\|$$
In particular, for the 2-norm, it is easy to show (using the SVD) that for a square matrix
$$\kappa_2(A) = \frac{\sigma_1}{\sigma_n}$$
and thus for a singular matrix with $\sigma_n = 0$, $\kappa_2(A) = \infty$.
ill-conditioned matrices¶
However, it is possible that while $\sigma_n > 0$, it is significantly smaller than $\sigma_1$, leading to a very large condition number. In particular, if the last $n - k$ singular values are very small, it suggests that the matrix has a "near null space", with a basis defined by the last $n - k$ columns of $V$. Thus, while strictly invertible, if a direct solver is used to solve
$$A\mathbf{x} = \mathbf{b}$$
any component of $\mathbf{b}$ in the near null space will be amplified by $1/\sigma_n$ and completely pollute your answer.
However, a useful fix can be to approximate $A$ as a lower rank matrix
$$A_k = U_k \Sigma_k V_k^T$$
which just keeps the first $k$ singular values (i.e. pretends $\sigma_{k+1} = \cdots = \sigma_n = 0$). If chosen wisely, the approximate solution
$$\mathbf{x}_k = A_k^+ \mathbf{b} = V_k \Sigma_k^{-1} U_k^T \mathbf{b}$$
can be considerably more accurate as it does not allow the near-null space to contaminate the solution.
Examples¶
Full SVD example¶
Consider the matrix
Confirm the SVD representation using numpy functions as appropriate.
A = numpy.array([[2.0, 0.0, 3.0], [5.0, 7.0, 1.0], [0.0, 6.0, 2.0]])
U, sigma, V_T = numpy.linalg.svd(A, full_matrices=True)
print("U:\n{}\n".format(U))
print("Sigma:\n{}\n".format(numpy.diag(sigma)))
print("V:\n{}\n".format(V_T.T))
print(sigma)
Aprime = numpy.dot(U, numpy.dot(numpy.diag(sigma), V_T))
print("A - USV^T:\n{}".format(A - Aprime))
x = numpy.array([1.0, 2.0, 1.0])
y = numpy.array([2.0, -1.0, 1.0])
A = numpy.outer(x, y)
print("A:\n{}\n".format(A))
U, sigma, V_T = numpy.linalg.svd(A, full_matrices=False)
print("U:\n{}\n".format(U))
print("sigma:\n{}\n".format(sigma))
print("V:\n{}\n".format(V_T.T))
Aprime = numpy.dot(U, numpy.dot(numpy.diag(sigma), V_T))
print("A - USV^T:\n{}".format(A - Aprime))
# rank-1 reconstruction
k = 1
A1 = numpy.dot(U[:, :k], numpy.dot(numpy.diag(sigma[:k]), V_T[:k, :]))
print("A_k =\n{}\n".format(A1))
print("A - A_k:\n{}".format(A - A1))
An ill-conditioned example¶
Consider the matrix
$$A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 2 + \epsilon \\ 0 & 1 & 1 \end{bmatrix}$$
where the last column is almost the sum of the first two.
epsilon = 1.0e-5
A = numpy.array([[1.0, 0.0, 1.0], [1.0, 1.0, 2 + epsilon], [0.0, 1.0, 1.0]])
print("A:\n{}\n".format(A))
U, S, V_T = numpy.linalg.svd(A)
print("U:\n{}\n".format(U))
print("S:\n{}\n".format(S))
print("V:\n{}\n".format(V_T.T))
print(
    "sigma_1/sigma_n = {}, cond_2(A) = {}".format(
        S[0] / S[2], numpy.linalg.cond(A, p=2)
    )
)
Try to solve $A\mathbf{x} = \mathbf{b}$¶
epsilon = 5 * numpy.finfo(numpy.float64).eps
A = numpy.array([[1.0, 0.0, 1.0], [1.0, 1.0, 2 + epsilon], [0.0, 1.0, 1.0]])
x_true = numpy.array([1.0, 2.0, 3.0])
b = A.dot(x_true)
x = numpy.linalg.solve(A, b)
print("x = {}: cond(A)={}".format(x, numpy.linalg.cond(A, p=2)))
Rank 2 reconstruction¶
U, S, VT = numpy.linalg.svd(A)
print(S)
k = 2
A2 = numpy.dot(U[:, :k], numpy.dot(numpy.diag(S[:k]), VT[:k, :]))
# Pseudo Inverse
Ap2 = numpy.dot(VT.T[:, :k], numpy.dot(numpy.diag(1.0 / S[:k]), U.T[:k, :]))
xp = Ap2.dot(b)
print("x_true\t= {}".format(x_true))
print("x^+ \t= {}".format(xp))
print("x \t= {}: ".format(numpy.linalg.solve(A, b)))
print(
"rel_err\t= {} ".format(numpy.linalg.norm(x - x_true) / numpy.linalg.norm(x_true))
)
Review: the Singular Value Decomposition (SVD)¶
$$A = U \Sigma V^T$$ for any $A \in \mathbb{R}^{m \times n}$, where
$U \in \mathbb{R}^{m \times m}$ is the orthogonal matrix whose columns are the eigenvectors of $AA^T$
$V \in \mathbb{R}^{n \times n}$ is the orthogonal matrix whose columns are the eigenvectors of $A^T A$
$\Sigma \in \mathbb{R}^{m \times n}$ is a diagonal matrix with elements $\sigma_1, \sigma_2, \ldots, \sigma_r$ corresponding to the square roots of the eigenvalues of $A^T A$. They are called the singular values of $A$ and are positive, arranged in descending order ($\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$).
Again a picture is worth a lot here
$U$ and $V$ contain orthonormal bases for the 4 subspaces of $A$¶
The first $r$ columns of $U$ form an orthonormal basis for the column space $R(A)$
The first $r$ columns of $V$ form an orthonormal basis for the row space $R(A^T)$
The last $m - r$ columns of $U$ form an orthonormal basis for the left null space $N(A^T)$
The last $n - r$ columns of $V$ form an orthonormal basis for the null space $N(A)$
The Economy (or Skinny) SVD¶
As it turns out, because of all the 0's, all we actually need to reconstruct $A$ is the first $r$ columns of $U$ and $V$, and the $r \times r$ square sub-block of $\Sigma$ with just the singular values, i.e. $A = U_r \Sigma_r V_r^T$, where
$U_r U_r^T$ is an orthogonal projector onto $R(A)$
$U_r U_r^T \mathbf{b}$ projects $\mathbf{b}$ onto the column space of $A$
$V_r V_r^T$ is an orthogonal projector onto $R(A^T)$
$V_r V_r^T \mathbf{x}$ projects $\mathbf{x}$ onto the row space of $A$
The Pseudo-Inverse¶
Given the SVD, every matrix has a 'pseudo-inverse'
$$A^+ = V_r \Sigma_r^{-1} U_r^T$$
with the properties
if $m = n$ and $r = n$, then $A^+ = A^{-1}$
$\mathbf{x}^+ = A^+ \mathbf{b}$ is always the shortest least squares solution s.t. $\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|_2$
$A^+ A$ projects onto ______?
$A A^+$ projects onto ______?
Matrix Properties via the SVD¶
The $\mathrm{rank}(A) = r$, where $r$ is the number of non-zero singular values.
The $\mathrm{range}(A) = \mathrm{span}\{\mathbf{u}_1, \ldots, \mathbf{u}_r\}$ and $\mathrm{null}(A) = \mathrm{span}\{\mathbf{v}_{r+1}, \ldots, \mathbf{v}_n\}$.
The $\|A\|_2 = \sigma_1$ and $\|A\|_F = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_r^2}$.
The nonzero singular values of $A$ are the square roots of the nonzero eigenvalues of $A^T A$ or $A A^T$.
If $A = A^T$, then the singular values of $A$ are the absolute values of the eigenvalues of $A$.
For $A_k = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^T$ then $\|A - A_k\|_2 = \sigma_{k+1}$
It can then be shown (Eckart–Young–Mirsky theorem) that for all matrices $B$ with rank $k$, $\|A - B\|_2 \geq \sigma_{k+1}$
i.e. that $A_k$ is the best rank-$k$ approximation to $A$ in the 2-norm.
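The error formula $\|A - A_k\|_2 = \sigma_{k+1}$ is easy to check numerically; a sketch with a hypothetical random matrix:

```python
import numpy

numpy.random.seed(42)  # arbitrary seed for reproducibility
A = numpy.random.rand(6, 4)

U, sigma, VT = numpy.linalg.svd(A)

# best rank-k approximation from the truncated SVD
k = 2
A_k = U[:, :k] @ numpy.diag(sigma[:k]) @ VT[:k, :]

# the 2-norm error should equal the first discarded singular value
err = numpy.linalg.norm(A - A_k, ord=2)
print("||A - A_k||_2 =", err, " sigma_{k+1} =", sigma[k])
```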
Example: Image compression¶
How does this work in practice?
the original Matrix (From Durer’s Melancholia)¶

from matplotlib import image
data = image.imread("images/melancholia-magic-square.png")
m, n = data.shape
print("{}x{} pixel image".format(m, n))
plt.figure(figsize=(8, 6))
plt.imshow(data, cmap="gray")
plt.show()
u, s, vt = numpy.linalg.svd(data)
fig = plt.figure(figsize=(16, 6))
axes = fig.add_subplot(1, 2, 1)
axes.semilogy(s, "bo-")
axes.set_ylabel("$\sigma$", fontsize=16)
axes.grid()
axes.set_title("Spectrum of Singular Values")
axes = fig.add_subplot(1, 2, 2)
g = numpy.cumsum(s * s) / numpy.sum(s * s)
axes.plot(g, "b-")
axes.set_title("% cumulative percent variance explained", fontsize=18)
axes.grid()
plt.show()
First 4 Modes¶
where mode $i$ is the rank-1 matrix
$$\sigma_i \mathbf{u}_i \mathbf{v}_i^T$$
fig = plt.figure()
fig.set_figwidth(fig.get_figwidth() * 4)
fig.set_figheight(fig.get_figheight() * 1)
for i in range(4):
mode = s[i] * numpy.outer(u[:, i], vt[i, :])
axes = fig.add_subplot(1, 4, i + 1)
mappable = axes.imshow(mode, cmap="gray")
axes.set_title("Mode = {}, $\sigma$={:3.3f}".format(i + 1, s[i]), fontsize=18)
plt.show()
Compressed Reconstructions¶
We can now view compressed reconstructions that sum the first $k$ modes
$$A_k = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^T$$
The amount of memory required to store $A_k$ is $k(m + n + 1)$ whereas the storage of $A$ is $mn$, so the relative compression is $k(m + n + 1)/(mn)$
The question is how many modes are required to adequately reconstruct the original image?
fig = plt.figure()
fig.set_figwidth(fig.get_figwidth() * 3)
fig.set_figheight(fig.get_figheight() * 2)
for i, k in enumerate([1, 10, 20]):
Ak = u[:, :k].dot(numpy.diag(s[:k]).dot(vt[:k, :]))
storage = 100.0 * k * (m + n + 1) / (m * n)
axes = fig.add_subplot(1, 3, i + 1)
mappable = axes.imshow(Ak, vmin=0.0, vmax=1.0, cmap="gray")
axes.set_title(
"$A_{{{}}}$, $storage={:2.2f}\%, \% var ={:2.2f}$".format(
k, storage, 100 * g[k]
),
fontsize=16,
)
plt.show()
fig = plt.figure()
fig.set_figwidth(fig.get_figwidth() * 3)
fig.set_figheight(fig.get_figheight() * 2)
for i, k in enumerate([50, 100, 200]):
Ak = u[:, :k].dot(numpy.diag(s[:k]).dot(vt[:k, :]))
storage = 100.0 * k * (m + n + 1) / (m * n)
axes = fig.add_subplot(1, 3, i + 1)
mappable = axes.imshow(Ak, vmin=0.0, vmax=1.0, cmap="gray")
axes.set_title(
"$A_{{{}}}$, $storage={:2.2f}\%, \% var ={:2.2f}$".format(
k, storage, 100 * g[k]
),
fontsize=16,
)
plt.show()
Other Applications¶
Total Least-squares
PCA
Total Least Squares¶
GOAL: Demonstrate the use of the SVD to calculate total least squares regression and compare it to the classical least squares problem that assumes only errors in y.
Random data:¶
We start by constructing a random data set that approximates a straight line but has random errors in both x and y coordinates
# npoints uniformly randomly distributed points in the interval [0,3]
npnts = 100
x = numpy.random.uniform(0.0, 3.0, npnts)
# set y = mx + b plus random noise of size err
slope = 2.0
intercept = 1.0
err = 0.5
y = slope * x + intercept
y += numpy.random.normal(loc=0.0, scale=err, size=y.shape)
# add some random noise to x variable as well
x += numpy.random.normal(loc=0.0, scale=err, size=x.shape)
# And plot out the data
fig = plt.figure(figsize=(8, 6))
axes = fig.add_subplot(1, 1, 1)
axes.scatter(x, y)
axes.set_xlabel("x", fontsize=16)
axes.set_ylabel("y", fontsize=16)
axes.set_title("Data", fontsize=18)
axes.grid()
plt.show()
Classical Least Squares:¶
We first calculate the best fit straight line assuming all the error is in the $y$ variable, using a least-squares solution with the Vandermonde matrix
# Vandermonde matrix
A = numpy.array([numpy.ones(x.shape), x]).T
# solve the least squares problem Ac = y using numpy.linalg.lstsq
c_ls, res, rank, s = numpy.linalg.lstsq(A, y, rcond=None)
print("Best fit Linear Least Squares:")
print(" slope={}".format(c_ls[1]))
print(" intercept={}".format(c_ls[0]))
# dummy variables
t_ls = numpy.linspace(0, x.max())
# And plot out the data
fig = plt.figure(figsize=(10, 8))
axes = fig.add_subplot(1, 1, 1)
axes.scatter(x, y)
axes.set_xlabel("x")
axes.set_ylabel("y")
# plot the least squares solution
axes.plot(t_ls, c_ls[0] + t_ls * c_ls[1], "r-", label="Least Squares")
axes.legend()
axes.grid()
plt.show()
Total Least Squares:¶
Suppose we wanted to find the line defined by its unit normal vector $\mathbf{u}$ that instead minimizes the sum of squared orthogonal distances between the data and the line, i.e.
$$\min_{\|\mathbf{u}\|_2 = 1} \sum_{i} \left[ (\mathbf{x}_i - \bar{\mathbf{x}}) \cdot \mathbf{u} \right]^2$$
# make up a line through the center of the data
X = numpy.array([x, y]).T
X_mean = numpy.mean(X, 0)
v = numpy.array([1, 2])
v = v / numpy.linalg.norm(v, ord=2)
u = numpy.array([-v[1], v[0]])
# print(u.dot(v))
t = numpy.linspace(0, x.max())
t = 2 * (t - numpy.mean(t_ls))
fig = plt.figure(figsize=(8, 6))
axes = fig.add_subplot(1, 1, 1)
axes.scatter(x, y)
l = X_mean + numpy.outer(t, v)
lu = X_mean + u
axes.plot(l[:, 0], l[:, 1], "g")
axes.plot([X_mean[0], X_mean[0] + u[0]], [X_mean[1], X_mean[1] + u[1]], "r")
axes.text(X_mean[0] + 1.3 * u[0], X_mean[1] + 1.3 * u[1], "$\mathbf{u}$", fontsize=14)
axes.set_aspect("equal")
axes.set_xlabel("x", fontsize=16)
axes.set_ylabel("y", fontsize=16)
axes.set_title("Data", fontsize=18)
axes.grid()
plt.show()
Let $M = X - \bar{X}$ where
$$X = \begin{bmatrix} x_1 & y_1 \\ \vdots & \vdots \\ x_N & y_N \end{bmatrix}$$
is the data matrix and $\bar{X}$ is its column mean,
then we seek the unit vector $\mathbf{u}$ that minimizes
$$\| M \mathbf{u} \|_2^2$$
because $\| M \mathbf{u} \|_2^2 = \sum_i \left[ (\mathbf{x}_i - \bar{\mathbf{x}}) \cdot \mathbf{u} \right]^2$ is exactly the sum of squared orthogonal distances.
Substituting the SVD $M = U \Sigma V^T$ (and using the fact that multiplication by the orthogonal matrix $U$ preserves the 2-norm),
$$\| M \mathbf{u} \|_2^2 = \| \Sigma V^T \mathbf{u} \|_2^2$$
hopefully it's clear that the vector
$$\mathbf{u} = \mathbf{v}_2$$
provides the minimum value
$$\| M \mathbf{v}_2 \|_2^2 = \sigma_2^2$$
So, the best fit normal is $\mathbf{v}_2$, the singular vector corresponding to the smallest singular value. The line we seek is orthogonal to that, so must lie in the direction of $\mathbf{v}_1$
Total Least-Squares (a recipe)¶
form a matrix $M$ whose rows are the de-meaned data $\mathbf{x}_i - \bar{\mathbf{x}}$
find the SVD of $M$, $M = U \Sigma V^T$
set $\mathbf{u} = \mathbf{v}_n$, the last right singular vector (the normal to the best-fit line)
and in 2-D, the best fit line is parallel to $\mathbf{v}_1$ and goes through the mean of the data
# Prepare the data matrix
X = numpy.array([x, y]).T
print("Shape of data Matrix: {}".format(X.shape))
# and remove the mean
X_mean = numpy.mean(X, 0)
print("Mean of data matrix={}".format(X_mean))
M = X - X_mean
# now calculate the SVD of the de-meaned data matrix
U, S, VT = numpy.linalg.svd(M, full_matrices=False)
V = VT.T
print("Singular values = {}".format(S))
print("First Right singular vector V_1= {}".format(V[:, 0]))
Now plot and compare the two solutions
# dummy variables
t_ls = numpy.linspace(0, x.max())
t_svd = 2 * (t_ls - numpy.mean(t_ls))
# make figure
plt.figure(figsize=(10, 8))
# plot data
plt.scatter(x, y)
# plot the least squares solution
plt.plot(t_ls, c_ls[0] + t_ls * c_ls[1], "r-", label="Least Squares")
# plot the total least Squares solution
# plot the mean
plt.plot(X_mean[0], X_mean[1], "go", markersize=12, label="X_mean")
# calculate a line through the mean with the first principal component as a basis
L_tls = X_mean + numpy.outer(t_svd, V[:, 0])
plt.plot(L_tls[:, 0], L_tls[:, 1], "c-", label="Total Least Squares")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Comparison Least Squares vs Total Least Squares")
plt.legend(loc="best")
plt.grid()
plt.show()
Notes¶
This technique can be extended into higher dimensions. For example, we could find a best fit plane through a 3-D cloud of data.
For this problem again, we would find the SVD of the demeaned data matrix $M$, and the best fit plane would be normal to the last right singular vector $\mathbf{v}_3$.
In general, for a demeaned data matrix $M = U \Sigma V^T$, the columns of the matrix $V$ form an orthonormal basis for the row space of $M$ and describe the axes of the best fit ellipsoid describing the data.
Principal Component/EOF analysis¶
GOAL: Demonstrate the use of the SVD to calculate principal components or “Empirical Orthogonal Functions” in a geophysical data set. This example is modified from a paper by Chris Small (LDEO)
Small, C., 1994. A global analysis of mid-ocean ridge axial topography. Geophys J Int 116, 64–84. doi:10
The Data¶
Here we will consider a set of topography profiles taken across the global mid-ocean ridge system where the Earth’s tectonic plates are spreading apart.
![]()
The data consists of 156 profiles from a range of spreading rates. Each profile contains 79 samples, so each is in effect a vector in $\mathbb{R}^{79}$
# read the data from the csv file
data = pd.read_csv("data/m80_data.csv", header=None).values
data_mean = numpy.mean(data, 0)
# and plot out a few profiles and the mean depth.
plt.figure(figsize=(10, 8))
rows = [9, 59, 99]
labels = ["slow", "medium", "fast"]
for i, row in enumerate(rows):
plt.plot(data[row, :], label=labels[i])
plt.plot(data_mean, "k--", label="mean")
plt.xlabel("Distance across axis (km)")
plt.ylabel("Relative Elevation (m)")
plt.legend(loc="best")
plt.title("Example cross-axis topography of mid-ocean ridges")
plt.grid()
plt.show()
EOF analysis¶
While each profile lives in an 80 dimensional space, we would like to see if we can classify the variability in fewer components. To begin we form a de-meaned data matrix where each row is a profile.
X = data - data_mean
plt.figure(figsize=(6, 8))
plt.imshow(X, vmin=-1500, vmax=2000)
plt.xlabel("Distance across axis (Km)", fontsize=16)
plt.ylabel("Relative Spreading Rate", fontsize=16)
plt.title("De-meaned Data", fontsize=18)
cbar = plt.colorbar()
cbar.set_label("Relative Depth (m)", fontsize=16)
plt.show()
Applying the SVD¶
We now use the SVD to factor the data matrix as
$$X = U \Sigma V^T$$
# now calculate the SVD of the de-meaned data matrix
U, S, Vt = numpy.linalg.svd(X, full_matrices=False)
And begin by looking at the spectrum of singular values $\sigma_i$. Defining the variance as $\sigma_i^2$, we can also calculate the cumulative contribution to the total variance as
$$g_k = \frac{\sum_{i=1}^{k} \sigma_i^2}{\sum_{i} \sigma_i^2}$$
Plotting both $\sigma_i$ and $g_k$ shows that 80% of the total variance can be explained by the first 4-5 components
# plot the singular values
fig = plt.figure(figsize=(16, 6))
axes = fig.add_subplot(1, 2, 1)
axes.semilogy(S, "bo")
axes.grid()
axes.set_title("Singular Values", fontsize=18)
# and cumulative percent of variance
axes = fig.add_subplot(1, 2, 2)
g = numpy.cumsum(S * S) / numpy.sum(S * S)
axes.plot(g, "bx-")
axes.set_title("% cumulative percent variance explained", fontsize=18)
axes.grid()
plt.show()
Plotting the first 3 singular vectors in $V$ shows them to reflect some commonly occurring patterns in the data
num_EOFs = 3
fig = plt.figure(figsize=(10, 8))
axes = fig.add_subplot(1, 1, 1)
for row in range(num_EOFs):
axes.plot(Vt[row, :], label="EOF{}".format(row + 1))
axes.plot([0, X.shape[1]], [0.0, 0.0], "k--")
axes.grid()
axes.set_xlabel("Distance (km)")
axes.set_title("First {} EOFs ".format(num_EOFs))
axes.legend(loc="best")
plt.show()For example, the first EOF pattern is primarily a symmetric pattern with an axial high surrounded by two off axis troughs (or an axial low with two flanking highs, the EOF’s are just unit vector bases for the row-space and can be added with any positive or negative coefficient). The Second EOF is broader and all of one sign while the third EOF encodes assymetry.
Reconstruction¶
Using the SVD we can also decompose each profile into a weighted linear combination of EOF's, i.e.
$$X = U \Sigma V^T = C V^T$$
where $C = U \Sigma$ is a matrix of coefficients that describes how each data row is decomposed into the relevant basis vectors. We can then produce a rank-$k$ truncated representation of the data by
$$X_k = C_k V_k^T$$
where $C_k$ is the first $k$ columns of $C$ and $V_k^T$ is the first $k$ EOF's.
Here we show the original data and the reconstructed data using the first 5 EOF’s
# recontruct the data using the first 5 EOF's
k = 5
Ck = numpy.dot(U[:, :k], numpy.diag(S[:k]))
Vtk = Vt[:k, :]
data_k = data_mean + numpy.dot(Ck, Vtk)
fig = plt.figure(figsize=(12, 8))
axes = fig.add_subplot(1, 2, 1)
im = axes.imshow(data_k, vmin=-1500, vmax=2000)
fig.colorbar(im)
axes.set_title("reconstructed data")
axes = fig.add_subplot(1, 2, 2)
im = axes.imshow(data, vmin=-2000, vmax=2000)
axes.set_title("Original data")
fig.colorbar(im)
plt.show()
And we can consider a few reconstructed profiles compared with the original data
# show the original 3 profiles and their recontructed values using the first k EOF's
fig = plt.figure(figsize=(24, 8))
for i, row in enumerate(rows):
axes = fig.add_subplot(1, 3, i + 1)
h = axes.plot(data_k[row, :])
h += axes.plot(data[row, :])
axes.grid()
Cstring = ["{:3.0f}, ".format(Ck[row, i]) for i in range(k)]
axes.set_title(
"Reconstruction profile {}:\n C_{}=".format(row, k) + "".join(Cstring)
)
axes.legend(["k={}".format(k), "original data"], loc="best")
plt.show()
Projection of data onto a subspace¶
We can also use the Principal Components to look at the projection of the data onto a lower dimensional space, as the coefficients $C_{ik}$ are simply the coordinates of our data along each principal component. For example, we can view the data in the 2-dimensional space defined by the first 2 EOF's by simply plotting $C_1$ against $C_2$.
# plot the data in the plane defined by the first two principal components
fig = plt.figure(figsize=(16, 6))
axes = fig.add_subplot(1, 2, 1)
axes.scatter(Ck[:, 0], Ck[:, 1])
axes.set_xlabel("$C_1$")
axes.set_ylabel("$C_2$")
axes.grid()
axes.set_title("Projection onto the first two principal components")
# Or consider the degree of asymmetry (EOF 3) as a function of spreading rate
axes = fig.add_subplot(1, 2, 2)
axes.plot(Ck[:, 2], "o")
axes.set_xlabel("Relative Spreading rate")
axes.set_ylabel("$C_3$")
axes.grid()
axes.set_title("Degree of asymmetry")
plt.show()
