Section 4.2 Diagonalization of symmetric matrices

Recall that an \(n\times n\) matrix \(A\) is symmetric if \(A^T=A\text{.}\) Symmetry of \(A\) is equivalent to the following: for any vectors \(\xx,\yy\in\R^n\text{,}\)
\begin{equation*} \xx\dotp (A\yy) = (A\xx)\dotp \yy\text{.} \end{equation*}
To see that this is implied by the symmetry of \(A\text{,}\) note that
\begin{equation*} \xx\dotp (A\yy) = \xx^T(A\yy)=(\xx^TA^T)\yy = (A\xx)^T\yy=(A\xx)\dotp\yy\text{.} \end{equation*}
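If you like, you can also check this identity with the computer. Here is a quick SymPy sanity check; the particular \(2\times 2\) symmetric matrix used below is an arbitrary choice.

from sympy import Matrix, symbols, expand

x1, x2, y1, y2 = symbols('x1 x2 y1 y2')
S = Matrix([[1, 2], [2, 5]])             # an arbitrarily chosen symmetric matrix
xv = Matrix([x1, x2])
yv = Matrix([y1, y2])
expand(xv.dot(S*yv) - (S*xv).dot(yv))    # returns 0: the two dot products agree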

Exercise 4.2.1.

Prove that if \(\xx\dotp(A\yy)=(A\xx)\dotp \yy\) for any \(\xx,\yy\in\R^n\text{,}\) then \(A\) is symmetric.
Hint.
If this condition is true for all \(\xx,\yy\in \R^n\text{,}\) then it is true in particular for the vectors in the standard basis for \(\R^n\text{.}\)
A useful property of symmetric matrices, mentioned earlier, is that eigenvectors corresponding to distinct eigenvalues are orthogonal.

Theorem 4.2.2.

If \(A\) is an \(n\times n\) symmetric matrix, then eigenvectors of \(A\) corresponding to distinct eigenvalues are orthogonal.

Strategy.

We want to show that if \(\xx_1,\xx_2\) are eigenvectors corresponding to distinct eigenvalues \(\lambda_1,\lambda_2\text{,}\) then \(\xx_1\dotp \xx_2=0\text{.}\) It was pointed out above that since \(A\) is symmetric, we know \((A\xx_1)\dotp \xx_2=\xx_1\dotp (A\xx_2)\text{.}\) Can you see how to use this, and the fact that \(\xx_1,\xx_2\) are eigenvectors, to prove the result?

Proof.

To see this, suppose \(A\) is symmetric, and that we have
\begin{equation*} A\xx_1=\lambda_1\xx_1\quad \text{ and } A\xx_2=\lambda_2\xx_2\text{,} \end{equation*}
with \(\xx_1\neq\zer,\xx_2\neq \zer\text{,}\) and \(\lambda_1\neq \lambda_2\text{.}\) We then have, since \(A\) is symmetric, and using the result above,
\begin{equation*} \lambda_1(\xx_1\dotp \xx_2) = (\lambda_1\xx_1)\dotp \xx_2 = (A\xx_1)\dotp \xx_2 = \xx_1\dotp(A\xx_2) = \xx_1\dotp(\lambda_2\xx_2) = \lambda_2(\xx_1\dotp\xx_2)\text{.} \end{equation*}
It follows that \((\lambda_1-\lambda_2)(\xx_1\dotp \xx_2)=0\text{,}\) and since \(\lambda_1\neq \lambda_2\text{,}\) we must have \(\xx_1\dotp \xx_2=0\text{.}\)
The procedure for diagonalizing a matrix is as follows: assuming that \(\dim E_\lambda(A)\) is equal to the multiplicity of \(\lambda\) for each distinct eigenvalue \(\lambda\text{,}\) we find a basis for \(E_\lambda(A)\text{.}\) The union of the bases for each eigenspace is then a basis of eigenvectors for \(\R^n\text{,}\) and the matrix \(P\) whose columns are those eigenvectors will satisfy \(P^{-1}AP = D\text{,}\) where \(D\) is a diagonal matrix whose diagonal entries are the eigenvalues of \(A\text{.}\)
If \(A\) is symmetric, we know that eigenvectors from different eigenspaces will be orthogonal to each other. If we further choose an orthogonal basis of eigenvectors for each eigenspace (which is possible via the Gram-Schmidt procedure), then we can construct an orthogonal basis of eigenvectors for \(\R^n\text{.}\) Furthermore, if we normalize each vector, then we’ll have an orthonormal basis. The matrix \(P\) whose columns consist of these orthonormal basis vectors has a name.

Definition 4.2.3.

A matrix \(P\) is called orthogonal if \(P^T = P^{-1}\text{.}\)

Theorem 4.2.4.

An \(n\times n\) matrix \(P\) is orthogonal if and only if the columns of \(P\) form an orthonormal basis of \(\R^n\text{.}\)

Strategy.

This more or less amounts to the fact that \(P^T=P^{-1}\) if and only if \(PP^T=I\text{,}\) and thinking about the matrix product in terms of dot products.
A fun fact is that if the columns of \(P\) are orthonormal, then so are the rows. But this is not true if we ask for the columns to be merely orthogonal. For example, the columns of \(A = \bbm 1\amp 0\amp 5\\-2\amp 1\amp 2\\1\amp 2\amp -1\ebm \) are orthogonal, but (as you can check) the rows are not. But if we normalize the columns, we get
\begin{equation*} P = \bbm 1/\sqrt{6}\amp 0 \amp 5/\sqrt{30}\\-2/\sqrt{6}\amp 1/\sqrt{5}\amp 2/\sqrt{30}\\1/\sqrt{6}\amp 2/\sqrt{5}\amp -1/\sqrt{30}\ebm\text{,} \end{equation*}
which, as you can confirm, is an orthogonal matrix.
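If you would rather have the computer do the checking, here is one possible SymPy verification, which normalizes the columns of \(A\) above and then compares both products with the identity.

from sympy import Matrix, simplify

A = Matrix([[1, 0, 5], [-2, 1, 2], [1, 2, -1]])
# Divide each column of A by its norm to build P.
P = Matrix.hstack(*[A.col(j) / A.col(j).norm() for j in range(3)])
simplify(P.T * P), simplify(P * P.T)     # both products are the 3x3 identity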

Definition 4.2.5.

An \(n\times n\) matrix \(A\) is said to be orthogonally diagonalizable if there exists an orthogonal matrix \(P\) such that \(P^TAP\) is diagonal.
The above definition leads to the following result, also known as the Principal Axes Theorem.

Theorem 4.2.6.

An \(n\times n\) matrix \(A\) is orthogonally diagonalizable if and only if \(A\) is symmetric.

A careful proof of this theorem is quite difficult, and omitted from this book. The hard part is showing that any symmetric matrix is orthogonally diagonalizable. There are a few ways to do this, most requiring induction on the size of the matrix. A common approach actually uses multivariable calculus! (Optimization via Lagrange multipliers, to be precise.) If you are reading this along with the book by Nicholson, note that there is a gap in his proof: in the induction step, he assumes the existence of a real eigenvalue of \(A\text{,}\) but this has to be proved!

Example 4.2.7.

Determine the eigenvalues of \(A=\bbm 5\amp -2\amp -4\\-2\amp 8\amp -2\\-4\amp -2\amp 5\ebm\text{,}\) and find an orthogonal matrix \(P\) such that \(P^TAP\) is diagonal.
Solution.
We’ll solve this problem with the help of the computer.
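The cells that follow use SymPy. The particular commands shown are one reasonable way to carry out each step, not the only one. We begin by defining \(A\) and factoring its characteristic polynomial.

from sympy import Matrix, symbols, factor

x = symbols('x')
A = Matrix([[5, -2, -4], [-2, 8, -2], [-4, -2, 5]])
factor(A.charpoly(x).as_expr())          # x*(x - 9)**2
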
We get \(c_A(x)=x(x-9)^2\text{,}\) so our eigenvalues are \(0\) and \(9\text{.}\) For \(0\) we have \(E_0(A) = \nll(A)\text{:}\)
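Continuing in the same SymPy session:

A.nullspace()                            # one basis vector, a scalar multiple of (2, 1, 2)
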
For \(9\) we have \(E_9(A) = \nll(A-9I)\text{.}\)
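This is another null space computation:

from sympy import eye

(A - 9*eye(3)).nullspace()               # two basis vectors, spanning E_9(A)
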
The approach above is useful as we’re trying to remind ourselves how eigenvalues and eigenvectors are defined and computed. Eventually we might want to be more efficient. Fortunately, there’s a command for that.
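In SymPy, the eigenvects method reports everything at once:

A.eigenvects()
# Each entry has the form (eigenvalue, algebraic multiplicity, [basis of the eigenspace]).
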
Note that the output above lists each eigenvalue, followed by its multiplicity, and then the associated eigenvectors.
This gives us a basis for \(\R^3\) consisting of eigenvectors of \(A\text{,}\) but we want an orthogonal basis. Note that the eigenvector corresponding to \(\lambda = 0\) is orthogonal to both of the eigenvectors corresponding to \(\lambda = 9\text{,}\) but the two eigenvectors for \(\lambda = 9\) are not orthogonal to each other. To get an orthogonal basis for \(E_9(A)\text{,}\) we apply the Gram-Schmidt algorithm.
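SymPy provides a GramSchmidt function; here we apply it to the two eigenvectors for \(\lambda = 9\) found above, reusing the session from the previous cells.

from sympy import GramSchmidt

evecs_9 = (A - 9*eye(3)).nullspace()     # the two eigenvectors for lambda = 9
GramSchmidt(evecs_9)                     # orthogonal vectors proportional to (-1, 2, 0) and (-4, -2, 5)
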
This gives us an orthogonal basis of eigenvectors. Scaling to clear fractions, we have
\begin{equation*} \left\{\bbm 2\\1\\2\ebm, \bbm -1\\2\\0\ebm, \bbm -4\\-2\\5\ebm\right\}\text{.} \end{equation*}
From here, we need to normalize each vector to get the matrix \(P\text{.}\) But we might not like that the last vector has norm \(\sqrt{45}\text{.}\) One option to consider is to apply Gram-Schmidt with the vectors in the other order.
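Reusing the vectors from the previous cell, but feeding them to GramSchmidt in the opposite order:

GramSchmidt(list(reversed(evecs_9)))     # orthogonal vectors proportional to (-1, 0, 1) and (1, -4, 1)
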
That gives us the (slightly nicer) basis
\begin{equation*} \left\{\bbm 2\\1\\2\ebm, \bbm -1\\0\\1\ebm, \bbm 1\\-4\\1\ebm\right\}\text{.} \end{equation*}
The corresponding orthonormal basis is
\begin{equation*} B = \left\{\frac{1}{3}\bbm 2\\1\\2\ebm, \frac{1}{\sqrt{2}}\bbm -1\\0\\1\ebm, \frac{1}{\sqrt{18}}\bbm 1\\-4\\1\ebm\right\}\text{.} \end{equation*}
This gives us the matrix \(P=\bbm 2/3\amp -1/\sqrt{2}\amp 1/\sqrt{18}\\1/3\amp 0 \amp -4/\sqrt{18}\\2/3\amp 1/\sqrt{2}\amp 1/\sqrt{18}\ebm\text{.}\) Let’s confirm that \(P\) is orthogonal.
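One way to do this check in SymPy, entering the columns of \(P\) exactly as above:

from sympy import Matrix, Rational, sqrt, simplify

P = Matrix([[Rational(2, 3), -1/sqrt(2),  1/sqrt(18)],
            [Rational(1, 3),  0,         -4/sqrt(18)],
            [Rational(2, 3),  1/sqrt(2),  1/sqrt(18)]])
simplify(P * P.T)                        # the 3x3 identity matrix
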
Since \(PP^T=I_3\text{,}\) we can conclude that \(P^T=P^{-1}\text{,}\) so \(P\) is orthogonal, as required. Finally, we diagonalize \(A\text{.}\)
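Continuing in SymPy:

simplify(P.T * A * P)                    # diagonal, with diagonal entries 0, 9, 9
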
Incidentally, the SymPy library for Python does have a diagonalization routine; however, it does not do orthogonal diagonalization by default. Here is what it provides for our matrix \(A\text{.}\)
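The routine in question is the diagonalize method:

P2, D = A.diagonalize()                  # A == P2 * D * P2**(-1)
P2                                       # columns are eigenvectors of A, but not orthonormal
D                                        # the diagonal matrix of eigenvalues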

Exercises

1.

Find the eigenvalues \(\lambda_1 \lt \lambda_2\) and associated unit eigenvectors \(\vec{u}_1, \vec{u}_2\) of the symmetric matrix \(A = {\left[\begin{array}{cc} 31 \amp -27\cr -27 \amp -41 \end{array}\right]}.\)
The smaller eigenvalue \(\lambda_1 =\) has associated unit eigenvector \(\vec{u}_1 =\) .
The larger eigenvalue \(\lambda_2 =\) has associated unit eigenvector \(\vec{u}_2 =\) .
Note: The eigenvectors above form an orthonormal eigenbasis for \(A\text{.}\)

2.

Find the eigenvalues \(\lambda_1 \lt \lambda_2\) and associated unit eigenvectors \(\vec{u}_1, \vec{u}_2\) of the symmetric matrix \(A = {\left[\begin{array}{cc} 27 \amp 9\cr 9 \amp 3 \end{array}\right]}.\)
The smaller eigenvalue \(\lambda_1 =\) has associated unit eigenvector \(\vec{u}_1 =\) .
The larger eigenvalue \(\lambda_2 =\) has associated unit eigenvector \(\vec{u}_2 =\) .
Note: The eigenvectors above form an orthonormal eigenbasis for \(A\text{.}\)

3.

Find the eigenvalues \(\lambda_1 \lt \lambda_2 \lt \lambda_3\) and associated unit eigenvectors \(\vec{u}_1, \vec{u}_2, \vec{u}_3\) of the symmetric matrix \(A = {\left[\begin{array}{ccc} 1 \amp 5 \amp 3\cr 5 \amp 1 \amp 3\cr 3 \amp 3 \amp 3 \end{array}\right]}.\)
The eigenvalue \(\lambda_1 =\) has associated unit eigenvector \(\vec{u}_1 =\) .
The eigenvalue \(\lambda_2 =\) has associated unit eigenvector \(\vec{u}_2 =\) .
The eigenvalue \(\lambda_3 =\) has associated unit eigenvector \(\vec{u}_3 =\) .
Note: The eigenvectors above form an orthonormal eigenbasis for \(A\text{.}\)