Skip to main content

Worksheet 4.7 Worksheet: Singular Value Decomposition

For this worksheet, the reader is directed to Section 4.6. Further details may be found in Section 8.6 of Linear Algebra with Applications, by Keith Nicholson. (See also notebook by Dr. Juan H Klopper 1 .)
In Section 4.6 we saw that the singular_value_decomposition algorithm in SymPy does things a little bit differently than in Section 4.6. If we start with a square matrix \(A\text{,}\) the results are the same, but if \(A\) is not square, the decomposition \(A = P\Sigma_A Q^T\) looks a little different. In particular, if \(A\) is \(m\times n\text{,}\) the matrix \(\Sigma_A\) defined in Section 4.6 will also be \(m\times n\text{,}\) but it will contain some rows or columns of zeros that are added to get the desired size.
The matrix \(Q\) is an orthogonal \(n\times n\) matrix whose columns are an orthonormal basis of eigenvectors for \(A^TA\text{.}\) The matrix \(P\) is an orthogonal \(m\times m\) matrix whose columns are an orthonormal basis of \(\R^m\text{.}\) (The first \(r\) columns of \(P\) are given by \(A\vecq_i\text{,}\) where \(\vecq_i\) is the eigenvector of \(A^TA\) corresponding to the positive singular value \(\sigma_i\text{.}\))
The algorithm provided by SymPy replaces \(\Sigma_A\) by the \(r\times r\) diagonal matrix of nonzero singular values. (This is common in most algorithms, since we don’t want to bother storing data we don’t need.) The matrix \(Q\) is replaced by the \(n\times r\) matrix whose columns are the first \(r\) eigenvectors of \(A^TA\text{,}\) and the matrix \(P\) is replaced by the \(m\times r\) matrix whose columns are the orthonormal basis for the column space of \(A\text{.}\) (Note that the rank of \(A^TA\) is equal to the rank of \(A\text{,}\) which is equal to the number \(r\) of nonzero eigenvectors of \(A^TA\) (counted with multiplicity).)
The product \(P\Sigma_A Q^T\) will be the same in both cases, and the matrix \(P\) is the same as well.
This time, rather than using the SymPy algorithm, we will work through the process outlined in Section 4.6 step-by-step. Let’s revisit Example 4.6.4. Let \(A = \bbm 1\amp 1\amp 1\\1\amp 0\amp -1\ebm\text{.}\) First, we get the singular values:
Next, we get the eigenvalues and eigenvectors of \(A^TA\text{:}\)
Now we need to normalize the eigenvectors, in the correct order. Note that the eigenvectors were listed in increasing order of eigenvalue, so we need to reverse the order. Note that L1 is a list of lists. The eigenvector is the third entry (index 2) in the list (eigenvalue, multiplicity, eigenvector). We also need to turn list elements into matrices. So, for example the second eigenvector is Matrix(L1[1][2]).
Next, we can assemble these vectors into a matrix, and confirm that it’s orthogonal.
We’ve made the matrix \(Q\text{!}\) Next, we construct \(\Sigma_A\text{.}\) This we will do by hand. (Can you think of a way to do it automatically?)
Alternatively, you could do SigA = diag(L0[0],L0[1]).row_join(Matrix([0,0])). Finally, we need to make the matrix \(P\text{.}\) First, we find the vectors \(A\vecq_1, A\vecq_2\) and normalize. (Note that \(A\vecq_3=\zer\text{,}\) so this vector is unneeded, as expected.)
Note that the matrix \(P\) is already the correct size, because \(\rank(A)=2\dim(\R^2)\text{.}\) In general, for an \(m\times n\) matrix \(A\text{,}\) if \(\rank(A)=r\lt m\text{,}\) we would have to extend the set \(\{\vecp_1,\ldots, \vecp_r\}\) to a basis for \(\R^m\text{.}\) Finally, we check that our matrices work as advertised.
For convenience, here is all of the above code, with all print commands (except the last one) removed.
from sympy import Matrix,BlockMatrix,init_printing
init_printing()
A = Matrix([[1,1,1],[1,0,-1]])
B=(A.T)*A
L0=A.singular_values()
L1=B.eigenvects()
R1=Matrix(L1[2][2])
R2=Matrix(L1[1][2])
R3=Matrix(L1[0][2])
Q1 = (1/R1.norm())*R1
Q2 = (1/R2.norm())*R2
Q3 = (1/R3.norm())*R3
Q = Matrix(BlockMatrix([Q1,Q2,Q3]))
SigA = diag(L0[0],L0[1]).row_join(Matrix([0,0]))
S1 = A*Q1
S2 = A*Q2
P1 = (1/S1.norm())*S1
P2 = (1/S2.norm())*S2
P = Matrix(BlockMatrix([P1,P2]))
P,SigA,Q,P*SigA*Q.T

1.

Compute the SVD for the matrices
\begin{equation*} \bbm 2\amp -1 \amp 1\\1\amp 0 \amp -2\ebm \quad \quad \bbm 2\amp -1\\-1\amp 3\\1\amp -1\ebm \quad \quad \bbm 1\amp 1\amp 0\\ 0\amp 1\amp 2\\1\amp 0 \amp -2\ebm\text{.} \end{equation*}
Note that for these matrices, you may need to do some additional work to extend the \(\vecp_i\) vectors to an orthonormal basis. You can adapt the code above, but you will have to think about how to implement additional code to construct any extra vectors you need.

2.

By making some very minor changes in the matrices in Worksheet Exercise 4.7.1, convince yourself that (a) those matrices were chosen very carefully, and (b) there’s a reason why most people do SVD numerically.

3.

Recall from Worksheet 3.5 that for an inconsistent system \(A\xx = \mathbf{b}\text{,}\) we wish to find a vector \(\yy\) so that \(A\xx=\yy\) is consistent, with \(\yy\) as close to \(\mathbf{b}\) as possible.
In other words, we want to minimize \(\len{A\xx-\mathbf{b}}\text{,}\) or equivialently, \(\len{A\xx-\mathbf{b}}^2\text{.}\)

(a)

Let \(A = P\Sigma_A Q^T\) be the singular value decomposition of \(A\text{.}\) Show that
\begin{equation*} \len{A\xx-\mathbf{b}} = \len{\Sigma_A\yy-\zz}\text{,} \end{equation*}
where \(\yy = Q^T\xx\text{,}\) and \(\zz = P^T\mathbf{b}\text{.}\)

(b)

Show that setting \(\yy_i = \begin{cases}\zz_i/\sigma_i, \amp \text{ if } \sigma_i\neq 0\\ 0, \amp \text{ if } \sigma_i=0\end{cases}\) minimizes the value of \(\len{\Sigma_A\yy-\zz}\text{.}\)

(c)

Recall that we set \(\Sigma_A = \bbm D_A \amp 0\\ 0 \amp 0\ebm\text{,}\) where \(D_A\) is the diagonal matrix of nonzero singular values. Let us define the pseudo-inverse of \(\Sigma_A\) to be the matrix \(\Sigma_A^+ = \bbm D_A^{-1}\amp 0\\0\amp 0\ebm\text{.}\)
Show that the solution to the least squares problem is given by \(\xx = A^+\mathbf{b}\text{,}\) where \(A^+ = Q\Sigma_A^+ P^T\text{.}\)