Skip to article frontmatterSkip to article content

Singular Value Decomposition

Text provided under a Creative Commons Attribution license, CC-BY. All code is made available under the FSF-approved MIT license. (c) Kyle T. Mandli and Rajath Kumar Mysore Pradeep Kumar

Note: This material largely follows the text “Numerical Linear Algebra” by Trefethen and Bau (SIAM, 1997) and is meant as a guide and supplement to the material presented there.

from __future__ import print_function

%matplotlib inline
import matplotlib.pyplot as plt
import numpy

Singular Value Decomposition

Definition: Let ARm×nA \in \mathbb R^{m \times n}, then AA can be factored as

A=UΣVTA = U\Sigma V^{T}

where,

  • URm×mU \in \mathbb R^{m \times m} and is the orthogonal matrix whose columns are the eigenvectors of AATAA^{T}

  • VRn×nV \in \mathbb R^{n \times n} and is the orthogonal matrix whose columns are the eigenvectors of ATAA^{T}A

  • ΣRm×n\Sigma \in \mathbb R^{m \times n} and is a diagonal matrix with elements σ1,σ2,σ3,...σr\sigma_{1}, \sigma_{2}, \sigma_{3}, ... \sigma_{r} where r=rank(A)r = rank(A) corresponding to the square roots of the eigenvalues of ATAA^{T}A. They are called the singular values of AA and are non negative arranged in descending order. (σ1σ2σ3...σr0\sigma_{1} \geq \sigma_{2} \geq \sigma_{3} \geq ... \sigma_{r} \geq 0).

The SVD has a number of applications mostly related to reducing the dimensionality of a matrix.

Full SVD example

Consider the matrix

A=[203571062].A = \begin{bmatrix} 2 & 0 & 3 \\ 5 & 7 & 1 \\ 0 & 6 & 2 \end{bmatrix}.

Confirm the SVD representation using numpy functions as appropriate.

A = numpy.array([[2.0, 0.0, 3.0], [5.0, 7.0, 1.0], [0.0, 6.0, 2.0]])

U, sigma, V_T = numpy.linalg.svd(A, full_matrices=True)
print(numpy.dot(U, numpy.dot(numpy.diag(sigma), V_T)))

Eigenvalue Decomposition vs. SVD Decomposition

Let the matrix XX contain the eigenvectors of AA which are linearly independent, then we can write a decomposition of the matrix AA as

A=XΛX1.A = X \Lambda X^{-1}.

How does this differ from the SVD?

  • The basis of the SVD representation differs from the eigenvalue decomposition

    • The basis vectors are not in general orthogonal for the eigenvalue decomposition where it is for the SVD

    • The SVD effectively contains two basis sets.

  • All matrices have an SVD decomposition whereas not all have eigenvalue decompositions.

Existence and Uniqueness

Every matrix ACm×nA \in \mathbb{C}^{m \times n} has a singular value decomposition. Furthermore, the singular values {σj}\{\sigma_{j}\} are uniquely determined, and if AA is square and the σj\sigma_{j} are distinct, the left and right singular vectors {uj}\{u_{j}\} and {vj}\{v_{j}\} are uniquely determined up to complex signs (i.e., complex scalar factors of absolute value 1).

Matrix Properties via the SVD

  • The rank(A)=r\text{rank}(A) = r where rr is the number of non-zero singular values.

  • The range(A)=[u1,...,ur]\text{range}(A) = [u_1, ... , u_r] and null(a)=[vr+1,...,vn]\text{null}(a) = [v_{r+1}, ... , v_n].

  • The A2=σ1|| A ||_2 = \sigma_1 and AF=σ12+σ22+...+σr2||A||_F = \sqrt{\sigma_{1}^{2}+\sigma_{2}^{2}+...+\sigma_{r}^{2}}.

  • The nonzero singular values of A are the square roots of the nonzero eigenvalues of ATAA^{T}A or AATAA^{T}.

  • If A=ATA = A^{T}, then the singular values of AA are the absolute values of the eigenvalues of AA.

  • For ACm×nA \in \mathbb{C}^{m \times n} then det(A)=Πi=1mσi|det(A)| = \Pi_{i=1}^{m} \sigma_{i}

Low-Rank Approximations

  • AA is the sum of the rr rank-one matrices:

    A=UΣVT=j=1rσjujvjTA = U \Sigma V^T = \sum_{j=1}^{r} \sigma_{j}u_{j}v_{j}^{T}
  • For any kk with 0kr0 \leq k \leq r, define

    A=j=1kσjujvjTA = \sum_{j=1}^{k} \sigma_{j}u_{j}v_{j}^{T}

    Let k=min(m,n)k = min(m,n), then

AAv2=infBCm×nrank(B)kAB2=σk+1||A - A_{v}||_{2} = \text{inf}_{B \in \mathbb{C}^{m \times n}} \text{rank}(B)\leq k|| A-B||_{2} = \sigma_{k+1}
  • For any kk with 0kr0 \leq k \leq r, the matrix AkA_{k} also satisfies

    AAvF=infBCm×nrank(B)vABF=σv+12+...+σr2||A - A_{v}||_{F} = \text{inf}_{B \in \mathbb{C}^{m \times n}} \text{rank}(B)\leq v ||A-B||_{F} = \sqrt{\sigma_{v+1}^{2} + ... + \sigma_{r}^{2}}

Example: “Hello World”

How does this work in practice?

data = numpy.zeros((15, 40))

# H
data[2:10, 2:4] = 1
data[5:7, 4:6] = 1
data[2:10, 6:8] = 1

# E
data[3:11, 10:12] = 1
data[3:5, 12:16] = 1
data[6:8, 12:16] = 1
data[9:11, 12:16] = 1

# L
data[4:12, 18:20] = 1
data[10:12, 20:24] = 1

# L
data[5:13, 26:28] = 1
data[11:13, 28:32] = 1

# 0
data[6:14, 34:36] = 1
data[6:8, 36:38] = 1
data[12:14, 36:38] = 1
data[6:14, 38:40] = 1

plt.imshow(data)
plt.show()
u, diag, vt = numpy.linalg.svd(data, full_matrices=True)
fig = plt.figure()
fig.set_figwidth(fig.get_figwidth() * 3)
fig.set_figheight(fig.get_figheight() * 4)
for i in range(1, 16):
    diag_matrix = numpy.concatenate(
        (
            numpy.zeros(
                (len(diag[:i]) - 1),
            ),
            diag[i - 1 : i],
            numpy.zeros(
                (40 - i),
            ),
        )
    )
    reconstruct = numpy.dot(numpy.dot(u, numpy.diag(diag_matrix)[:15,]), vt)

    axes = fig.add_subplot(5, 3, i)
    mappable = axes.imshow(reconstruct, vmin=0.0, vmax=1.0)
    axes.set_title("Component = %s" % i)

plt.show()
u, diag, vt = numpy.linalg.svd(data, full_matrices=True)
fig = plt.figure()
fig.set_figwidth(fig.get_figwidth() * 3)
fig.set_figheight(fig.get_figheight() * 4)
for i in range(1, 16):
    diag_matrix = numpy.concatenate(
        (
            diag[:i],
            numpy.zeros(
                (40 - i),
            ),
        )
    )
    reconstruct = numpy.dot(numpy.dot(u, numpy.diag(diag_matrix)[:15,]), vt)

    axes = fig.add_subplot(5, 3, i)
    mappable = axes.imshow(reconstruct, vmin=0.0, vmax=1.0)
    axes.set_title("Component = %s" % i)

plt.show()