
Elementary Linear Algebra: For University of Lethbridge Math 1410

Section 7.3 Eigenvalues and Diagonalization

Up to now we have concentrated primarily on finding eigenvalues and eigenvectors, and while we may have claimed that this is a particularly useful endeavour, we haven’t offered much to back up this claim, aside from referring the reader to a Wikipedia page. While a complete treatment of the theory and applications of eigenvalues and eigenvectors goes well beyond the scope of this book, we can offer a brief taste of what’s to come (should the reader choose to further their studies in linear algebra).

Similar matrices.

We begin by defining what it means for two matrices to be similar. Shortly, we will see that similar matrices share a number of properties, and ultimately we will define diagonalizable matrices as those matrices that are similar to a diagonal matrix.

Definition 7.3.1. Similar Matrices.

Let \(A\) and \(B\) be \(n\times n\) matrices. We say that \(A\) is similar to \(B\text{,}\) and write \(A\sim B\text{,}\) if there exists an invertible \(n\times n\) matrix \(P\) such that
\begin{equation*} A = P^{-1}BP\text{.} \end{equation*}
Although our definition above is not symmetric (it defines what it means for \(A\) to be similar to \(B\text{,}\) but not vice-versa), there is in fact no ambiguity in using a statement such as “\(A\) and \(B\) are similar matrices,” as the next theorem demonstrates.

Theorem 7.3.2. Similarity is an equivalence relation.

Let \(A\text{,}\) \(B\text{,}\) and \(C\) be \(n\times n\) matrices. Then:
  1. \(A\sim A\text{.}\)
  2. If \(A\sim B\text{,}\) then \(B\sim A\text{.}\)
  3. If \(A\sim B\) and \(B\sim C\text{,}\) then \(A\sim C\text{.}\)
Let’s see why these results are true.
  1. Setting \(P=I\text{,}\) the \(n\times n\) identity matrix, we have \(A = I^{-1}AI\text{,}\) and thus \(A\sim A\text{.}\)
  2. Suppose \(A\sim B\text{.}\) Then we know that \(A = P^{-1}BP\) for some invertible matrix \(P\text{.}\) Setting \(Q=P^{-1}\text{,}\) (and noting \(Q^{-1} = P\)) we have
    \begin{equation*} B = (PP^{-1})B(PP^{-1}) = P(P^{-1}BP)P^{-1} = Q^{-1}AQ, \end{equation*}
    so \(B\sim A\text{.}\)
  3. Suppose that \(A\sim B\) and \(B\sim C\text{.}\) Then there exist invertible \(n\times n\) matrices \(P\) and \(Q\) such that \(A = P^{-1}BP\) and \(B=Q^{-1}CQ\text{.}\) But then
    \begin{equation*} A = P^{-1}BP = P^{-1}(Q^{-1}CQ)P = (P^{-1}Q^{-1})C(QP) = (QP)^{-1}C(QP), \end{equation*}
    which shows that \(A\sim C\text{.}\)
The fact that the definition of matrix similarity satisfies the three properties in the theorem above tells us that matrix similarity is an example of what is called an equivalence relation. This equivalence relation breaks the set of all \(n\times n\) matrices into sets of “equivalent” matrices. Similar matrices are so-called because, although they often have very different entries, they share many of the same properties. (At the end of this section, we’ll discuss the fact that from the point of view of linear transformations, similar matrices describe the same linear transformation, if we represent that linear transformation using different choices of basis.) The following theorem gives some of the properties similar matrices share.

Theorem 7.3.3. Properties of similar matrices.

Suppose \(A\) and \(B\) are similar \(n\times n\) matrices. Then:
  1. \(\tr(A) = \tr(B)\text{.}\)
  2. \(\det(A) = \det(B)\text{.}\)
  3. \(A\) and \(B\) have the same eigenvalues.
Let us again check the details on each of the points in the above theorem to see why they’re true.
  1. Recall that the trace satisfies \(\tr(AB) = \tr(BA)\) for any \(n\times n\) matrices \(A\) and \(B\text{.}\) Thus, if \(A = P^{-1}BP\text{,}\) we have
    \begin{equation*} \tr(A) = \tr(P^{-1}BP) = \tr(B(P^{-1}P)) = \tr(BI)=\tr(B)\text{.} \end{equation*}
  2. Recall that the determinant satisfies \(\det(AB) = \det(A)\det(B)\) for any \(n\times n\) matrices \(A\) and \(B\text{,}\) and \(\det(A^{-1}) = \dfrac{1}{\det(A)}\) for any invertible matrix \(A\text{.}\) Thus, if \(A = P^{-1}BP\text{,}\) we have
    \begin{align*} \det(A) \amp = \det(P^{-1}BP) = \det(P^{-1})\det(B)\det(P)\\ \amp = \frac{1}{\det(P)}\det(B)\det(P) = \det(B)\text{.} \end{align*}
  3. Suppose that \(\lambda\) is an eigenvalue of \(A\text{,}\) and that \(A\sim B\text{.}\) Since \(\lambda\) is an eigenvalue of \(A\text{,}\) we have
    \begin{equation*} A\vec{v} = \lambda\vec{v} \end{equation*}
    for some nonzero vector \(\vec{v}\text{.}\) But if \(A\sim B\text{,}\) then \(A = P^{-1}BP\) for some invertible matrix \(P\text{,}\) so
    \begin{equation*} P^{-1}BP\vec{v} = \lambda\vec{v}\text{.} \end{equation*}
    Multiplying both sides on the left by \(P\text{,}\) we have
    \begin{equation*} B(P\vec{v}) = P(\lambda\vec{v}) = \lambda(P\vec{v})\text{.} \end{equation*}
    Since \(P\) is invertible and \(\vec{v}\neq \vec{0}\text{,}\) we know that \(\vec{w} = P\vec{v}\) is nonzero, and thus \(\lambda\) is an eigenvalue of \(B\) with corresponding eigenvector \(P\vec{v}\text{.}\)
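These shared properties are easy to test numerically. The short sketch below is not part of the original text: it uses NumPy to build \(A = P^{-1}BP\) from an arbitrarily chosen matrix \(B\) and invertible matrix \(P\text{,}\) and then confirms that the trace, determinant, and eigenvalues of \(A\) and \(B\) agree.

import numpy as np

# Not part of the text: build B and an invertible P, form A = P^{-1} B P,
# and compare trace, determinant, and eigenvalues.  The matrices are arbitrary.
B = np.array([[2.0, 1.0],
              [0.0, 3.0]])
P = np.array([[1.0, 2.0],
              [1.0, 3.0]])          # det(P) = 1, so P is invertible
A = np.linalg.inv(P) @ B @ P

print(np.trace(A), np.trace(B))              # both 5.0
print(np.linalg.det(A), np.linalg.det(B))    # both 6.0 (up to rounding)
print(np.sort(np.linalg.eigvals(A)))         # [2. 3.]
print(np.sort(np.linalg.eigvals(B)))         # [2. 3.]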

Multiplicity of an eigenvalue.

To clarify the presentation in this section, we will shift our notation somewhat from what we’ve been using so far in the textbook. Previously, we used the definition
\begin{equation*} p_A(\lambda) = \det(A-\lambda I) \end{equation*}
for the characteristic polynomial of a matrix \(A\text{.}\) We will adjust this in two ways: first, we will use \(x\) as the variable in our polynomial; second, we will shift the sign of the characteristic polynomial and make the following definition:

Definition 7.3.4. Characteristic Polynomial (revised definition).

The characteristic polynomial of an \(n\times n\) matrix \(A\) is the degree-\(n\) polynomial \(p_A(x)\) defined by
\begin{equation*} p_A(x) = \det(xI-A)\text{.} \end{equation*}
Notice that we are using \(xI-A\) rather than \(A-xI\text{.}\) This guarantees that the highest-degree term in \(p_A(x)\) is always \(x^n\) (with coefficient 1), whereas before this term was \(\pm x^n\text{,}\) depending on whether \(n\) is even or odd. We are using \(x\) as our variable to make it easier to talk about factors of the characteristic polynomial: with this change, the statements “\(\lambda\) is an eigenvalue of \(A\)” and “\((x-\lambda)\) is a factor of \(p_A(x)\)” are equivalent.
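For readers who like to verify such computations by machine, NumPy's np.poly function returns the coefficients of exactly this monic polynomial \(\det(xI-A)\text{,}\) highest power first. The sketch below is not part of the original text, and the matrix in it is chosen only for illustration.

import numpy as np

# Not part of the text: np.poly(A) returns the coefficients of the monic
# polynomial det(xI - A), highest power first; np.roots recovers the eigenvalues.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
coeffs = np.poly(A)
print(coeffs)            # [ 1. -4.  3.]  i.e.  x^2 - 4x + 3 = (x - 1)(x - 3)
print(np.roots(coeffs))  # the eigenvalues 3 and 1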
In the previous sections, we saw examples of matrices whose characteristic polynomial had a repeated root. In such cases, it is sometimes possible (but not guaranteed) that there is more than one independent eigenvector associated with the repeated eigenvalue. Indeed, consider the matrices
\begin{equation*} A = \bbm 2 \amp 0 \amp 0\\0\amp 2\amp 0\\0\amp 0\amp 2\ebm, \quad B = \bbm 2\amp 0\amp 0\\0\amp 2\amp 1\\0\amp 0\amp 2\ebm, \quad C = \bbm 2\amp 1\amp 0\\0\amp 2\amp 1\\0\amp 0\amp 2\ebm\text{.} \end{equation*}
All three matrices are upper-triangular with the same diagonal entries, so they have the same characteristic polynomial, namely:
\begin{equation*} p(x) = (x-2)^3\text{.} \end{equation*}
Notice that the eigenvalue \(\lambda = 2\) is “repeated”, in the sense that it is a repeated root of the characteristic polynomial. This leads to the following definition:

Definition 7.3.5. Algebraic Multiplicity of an Eigenvalue.

We say that \(\lambda\) is an eigenvalue with algebraic multiplicity \(k\) if \((x-\lambda)^k\) is a factor of \(p_A(x)\text{,}\) but \((x-\lambda)^{k+1}\) is not.
In other words, the algebraic multiplicity of an eigenvalue \(\lambda \) is the (largest) power to which the corresponding factor \((x-\lambda)\) appears in the characteristic polynomial. In our example above, all three of the matrices \(A\text{,}\) \(B\text{,}\) and \(C\) have the eigenvalue \(2\) with algebraic multiplicity 3. However, consider the corresponding eigenvectors in each case:
For \(A = \bbm 2\amp 0\amp 0\\0\amp 2\amp 0\\0\amp 0\amp 2\ebm\text{,}\) we have \(A-2I = 0\text{,}\) so every non-zero vector \(\vec{x}\) is an eigenvector. Indeed, \(A=2I\text{,}\) so
\begin{equation*} A\vec{x} = (2I)\vec{x} = I(2\vec{x}) = 2\vec{x} \end{equation*}
for every vector. In particular, we can choose the standard unit basis vectors
\begin{equation*} \vec{e}_1 = \bbm 1\\0\\0\ebm, \vec{e}_2 = \bbm 0\\1\\0\ebm, \vec{e}_3 = \bbm 0\\0\\1\ebm \end{equation*}
as our basic eigenvectors, and we see that there are three independent eigenvectors corresponding to the eigenvalue \(\lambda=2\text{.}\)
For \(B = \bbm 2\amp 0\amp 0\\0\amp 2\amp 1\\0\amp 0\amp 2\ebm\text{,}\) we have \(B-2I = \bbm 0\amp 0\amp 0\\0\amp 0\amp 1\\0\amp 0\amp 0\ebm\text{,}\) which is row-equivalent to \(\bbm 0\amp 0\amp 1\\0\amp 0\amp 0\\0\amp 0\amp 0\ebm\text{.}\) The general solution to the equation \((B-2I)\vec{x}=\vec{0}\) is thus given by
\begin{equation*} \vec{x} = \bbm s\\t\\0\ebm = s\bbm 1\\0\\0\ebm + t\bbm 0\\1\\0\ebm, \end{equation*}
since the reduced row-echelon form of \(B-2I\) tells us that we must have \(z=0\text{,}\) while \(x\) and \(y\) are free. Thus, in this case we have two independent basic eigenvectors, given by the standard unit basis vectors \(\vec{e}_1\) and \(\vec{e}_2\text{.}\)
Finally, for \(C= \bbm 2\amp 1\amp 0\\0\amp 2\amp 1\\0\amp 0\amp 2\ebm\text{,}\) we have \(C-2I = \bbm 0\amp 1\amp 0\\0\amp 0\amp 1\\0\amp 0\amp 0\ebm\text{,}\) which is already in reduced row-echelon form. From this, we can see that the system \((C-2I)\vec{x}=\vec{0}\) has general solution
\begin{equation*} \vec{x} = \bbm t\\0\\0\ebm = t\vec{e}_1, \end{equation*}
so we have only one independent eigenvector associated to \(\lambda=2\text{;}\) namely, \(\vec{e}_1\text{.}\)
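As a quick numerical aside (not part of the original text): the number of independent basic eigenvectors for \(\lambda = 2\) is the number of free variables in \((M-2I)\vec{x}=\vec{0}\text{,}\) which is \(3 - \operatorname{rank}(M-2I)\text{.}\) A short NumPy computation reproduces the counts 3, 2, and 1 found above.

import numpy as np

# Not part of the text: count the basic eigenvectors for lambda = 2 as
# 3 - rank(M - 2I), the number of free variables in (M - 2I)x = 0.
A = np.array([[2, 0, 0], [0, 2, 0], [0, 0, 2]])
B = np.array([[2, 0, 0], [0, 2, 1], [0, 0, 2]])
C = np.array([[2, 1, 0], [0, 2, 1], [0, 0, 2]])

for name, M in [("A", A), ("B", B), ("C", C)]:
    count = 3 - np.linalg.matrix_rank(M - 2 * np.eye(3))
    print(name, count)    # prints A 3, B 2, C 1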
The above shows us that when the algebraic multiplicity of an eigenvalue is greater than one, we might have multiple independent eigenvectors associated to that eigenvalue, but this is not guaranteed. To quantify this situation, we introduce further terminology. First, we make a definition: the null space of an \(n\times n\) matrix \(A\) is the set defined by
\begin{equation*} \operatorname{null}(A) = \{\vec{x}\in\R^n \,|\, A\vec{x}=\vec{0}\}\text{.} \end{equation*}
In this language, we note that every eigenvector associated to an eigenvalue \(\lambda\) belongs to the null space of \(A-\lambda I\text{.}\) Indeed, \(\operatorname{null}(A-\lambda I)\) is equal to the set of all eigenvectors associated to the eigenvalue \(\lambda\text{,}\) along with the zero vector. This leads to another definition:

Definition 7.3.6. Eigenspace.

Let \(A\) be an \(n\times n\) matrix, and let \(\lambda\) be any real number. The \(\lambda\)-eigenspace of \(A\text{,}\) denoted by \(E(A,\lambda)\text{,}\) is defined by
\begin{equation*} E(A,\lambda) = \operatorname{null}(A-\lambda I)\text{.} \end{equation*}
A couple of remarks on this definition are in order. First, notice that \(\lambda\) is not assumed to be an eigenvalue in the definition above. However, for any real number \(\lambda\) which is not an eigenvalue, we have
\begin{equation*} E(A,\lambda) = \{\vec{0}\}\text{.} \end{equation*}
The eigenvalues of \(A\) can thus be described as those real numbers for which \(E(A,\lambda)\neq \{\vec{0}\}\text{.}\) When \(\lambda\) is an eigenvalue, the basic eigenvectors associated to \(\lambda\) (that is, the basic solutions to the system \((A-\lambda I)\vec{x}=\vec{0}\)) form a basis for \(E(A,\lambda)\text{.}\) This tells us that the dimension of each eigenspace is given by the number of independent basic eigenvectors associated to the corresponding eigenvalue, and gives us one more definition:

Definition 7.3.7. Geometric Multiplicity of an Eigenvalue.

Let \(A\) be an \(n\times n\) matrix. The geometric multiplicity of an eigenvalue \(\lambda\) of \(A\) is defined to be the dimension of the eigenspace \(E(A,\lambda)\text{.}\)
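Computationally, finding \(E(A,\lambda)\) amounts to computing a null space. The sketch below is not part of the original text: it uses the singular value decomposition, in which the right singular vectors attached to (numerically) zero singular values span the null space. The helper function eigenspace_basis is purely illustrative, and the matrix is the \(B\) from the earlier example.

import numpy as np

# Not part of the text: a basis for E(A, lam) = null(A - lam*I) via the SVD.
# Rows of V^T whose singular values are (numerically) zero span the null space.
def eigenspace_basis(A, lam, tol=1e-10):
    M = A - lam * np.eye(A.shape[0])
    _, s, vt = np.linalg.svd(M)
    return vt[s <= tol].T            # columns form an orthonormal basis of E(A, lam)

B = np.array([[2.0, 0.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 2.0]])
print(eigenspace_basis(B, 2.0).shape[1])   # 2: dim E(B, 2) = 2
print(eigenspace_basis(B, 5.0).shape[1])   # 0: E(B, 5) = {0}, so 5 is not an eigenvalue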

Example 7.3.8. Computing eigenvalues and their multiplicities.

Determine the eigenvalues of the matrix \(A = \bbm 0 \amp 1 \amp 1\\1\amp 0\amp 1\\1\amp 1\amp 0\ebm\text{,}\) along with their algebraic and geometric multiplicities.
Solution.
We first compute the eigenvalues of \(A\text{.}\) The characteristic polynomial of \(A\) is given by
\begin{align*} p_A(x) \amp = \begin{vmatrix}x \amp -1\amp -1\\-1\amp x\amp -1\\-1\amp -1\amp x\end{vmatrix}\\ \amp = x\begin{vmatrix}x \amp -1\\-1\amp x\end{vmatrix}+\begin{vmatrix}-1\amp -1\\-1\amp x\end{vmatrix}-\begin{vmatrix}-1\amp x\\-1\amp -1\end{vmatrix}\\ \amp = x(x^2-1)-(x+1)-(x+1)\\ \amp = x(x-1)(x+1)-2(x+1)\\ \amp = (x^2-x-2)(x+1)\\ \amp = (x-2)(x+1)^2\text{.} \end{align*}
We see that the roots of \(p_A(x)\) are \(\lambda_1 = 2\text{,}\) which has algebraic multiplicity 1, and \(\lambda_2 = -1\text{,}\) which has algebraic multiplicity 2.
To determine the geometric multiplicities, we compute the corresponding eigenvectors. For \(\lambda = 2\text{,}\) we have
\begin{equation*} A-2I = \bbm -2 \amp 1 \amp 1\\1\amp -2\amp 1\\1\amp 1\amp -2\ebm \xrightarrow{\text{RREF}} \bbm 1\amp 0\amp -1\\0\amp 1\amp -1\\0\amp 0\amp 0\ebm\text{.} \end{equation*}
Thus, \((A-2I)\vec{x} = \vec{0}\) for \(\vec{x} = \bbm t\\t\\t\ebm = t\bbm 1\\1\\1\ebm\text{,}\) so \(\vec{x}_1 = \bbm 1\\1\\1\ebm\) is the (single) basic eigenvector associated to \(\lambda = 2\text{.}\) This gives us
\begin{equation*} E(A,2) = \operatorname{span}\left\{\bbm 1\\1\\1\ebm\right\}, \end{equation*}
so the geometric multiplicity of \(\lambda = 2\) is \(\dim E(A,2) = 1\text{.}\)
For \(\lambda = -1\text{,}\) we find
\begin{equation*} A-(-1)I = \bbm 1 \amp 1 \amp 1\\1\amp 1\amp 1\\1\amp 1\amp 1\ebm \xrightarrow{\text{RREF}} \bbm 1\amp 1\amp 1\\0\amp 0\amp 0\\0\amp 0\amp 0\ebm\text{.} \end{equation*}
This tells us that the general solution to \((A+I)\vec{x}=\vec{0}\) is
\begin{equation*} \vec{x} = \bbm -s-t\\s\\t\ebm = s\bbm -1\\1\\0\ebm + t\bbm -1\\0\\1\ebm, \end{equation*}
so
\begin{equation*} E(A,-1) = \operatorname{span}\left\{\bbm -1\\1\\0\ebm, \bbm -1\\0\\1\ebm\right\}, \end{equation*}
and the geometric multiplicity of \(\lambda = -1\) is \(\dim E(A,-1) = 2\text{.}\)
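As a check on this example (the sketch below is not part of the original text), the same eigenvalues and multiplicities can be recovered numerically with NumPy.

import numpy as np

# Not part of the text: a numerical check of Example 7.3.8.
A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])

print(np.poly(A))            # approximately [ 1.  0. -3. -2.], i.e. x^3 - 3x - 2 = (x-2)(x+1)^2
print(np.linalg.eigvals(A))  # eigenvalues 2, -1, -1 (in some order)

# Geometric multiplicity of lambda: dim null(A - lambda*I) = 3 - rank(A - lambda*I).
for lam in (2.0, -1.0):
    print(lam, 3 - np.linalg.matrix_rank(A - lam * np.eye(3)))   # 1 for lambda = 2, 2 for lambda = -1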

Diagonalization.

One common application of matrix algebra is to a class of dynamical systems known as linear recurrences. A linear recurrence is a process that occurs in discrete steps, such that the state of the system at any step \(k\) depends linearly on the state of the system after the previous step. An example might be a population model for an ecosystem involving several species. For a simple two-species “predator-prey” model, we might have two populations \(P_1\) (of rabbits, perhaps?) and \(P_2\) (let’s say these are foxes) where the populations at a time \(t_k\) satisfy a relationship such as
\begin{align*} P_1(t_k) \amp = aP_1(t_{k-1}) + bP_2(t_{k-1})\\ P_2(t_k) \amp = cP_1(t_{k-1}) + dP_2(t_{k-1}). \end{align*}
A reasonable model would probably have positive values for the coefficients \(a\) and \(d\) (absent competition, predators, etc. we expect the populations to increase), while \(b\) would be negative (more foxes mean fewer rabbits) and \(c\) would be positive (more rabbits provide more fox food). If we introduce a “population vector” \(\vec{P}(t) = \bbm P_1(t)\\P_2(t)\ebm\) to describe the situation at a time \(t\text{,}\) then the above system can be written in the form \(\vec{P}(t_k) = A\vec{P}(t_{k-1})\text{,}\) where \(A = \bbm a\amp b\\c\amp d\ebm\text{.}\)
Now, given an initial population state \(\vec{P}(t_0)\text{,}\) we might want to know what to expect after several generations (or seasons, or some other reasonable unit of time). Notice that we have
\begin{align*} \vec{P}(t_1) \amp = A\vec{P}(t_0)\\ \vec{P}(t_2) \amp = A\vec{P}(t_1) = A(A\vec{P}(t_0)) = A^2\vec{P}(t_0)\\ \vec{P}(t_3) \amp = A\vec{P}(t_2) = A(A^2\vec{P}(t_0)) = A^3\vec{P}(t_0)\\ \vdots \quad \amp \quad \vdots\\ \vec{P}(t_k) \amp = A^k\vec{P}(t_0). \end{align*}
Thus, to model the time evolution of our system, it suffices to compute powers of our matrix \(A\text{.}\) If we want to study a system over long periods of time, we need to be able to compute large powers of our matrix, and for complex systems, being able to do so efficiently will become increasingly important. (For example, we may be interested in whether or not the system settles into an equilibrium state. This would be the case, for example, if we were able to show that \(A^k\) was approximately equal to the identity matrix for large values of \(k\text{.}\))
One case where this is easily done is when the initial state \(\vec{P}_0=\vec{P}(t_0)\) is an eigenvector for our matrix: if \(A\vec{P}_0 = \lambda\vec{P}_0\text{,}\) then we have
\begin{align*} A^2\vec{P}_0 \amp = A(A\vec{P}_0) = A(\lambda \vec{P}_0) = \lambda (A\vec{P}_0) = \lambda(\lambda \vec{P}_0) = \lambda^2\vec{P}_0\\ A^3\vec{P}_0 \amp = A(A^2\vec{P}_0) = A(\lambda^2\vec{P}_0) = \lambda^2 (A\vec{P}_0) = \lambda^2(\lambda\vec{P}_0) = \lambda^3\vec{P}_0, \end{align*}
and so on: in general, \(A^k\vec{P}_0 = \lambda^k\vec{P}_0\text{.}\) This is useful, since computing powers of a number is much simpler than computing powers of a matrix.
What if the initial state is not an eigenvector? It turns out that the same trick works as long as \(\vec{P}_0\) can be written as a linear combination of eigenvectors: suppose that
\begin{equation*} \vec{P}_0 = c_1\vec{x}_1 + c_2\vec{x}_2 + \cdots + c_k\vec{x}_k, \end{equation*}
where \(\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k\) are eigenvectors of \(A\) with eigenvalues \(\lambda_1, \lambda_2, \ldots, \lambda_k\text{,}\) respectively. (Note that at this point we’ve moved away from the original premise of a \(2\times 2\) matrix \(A\) to consider a more general situation.) In this case, for any natural number \(n\text{,}\) we have
\begin{align*} A^n\vec{P}_0 \amp = A^n(c_1\vec{x}_1 + c_2\vec{x}_2 + \cdots + c_k\vec{x}_k)\\ \amp = c_1(A^n\vec{x}_1) + c_2(A^n\vec{x}_2) + \cdots + c_k(A^n\vec{x}_k)\\ \amp = c_1(\lambda_1^n\vec{x}_1) + c_2(\lambda_2^n\vec{x}_2) + \cdots + c_k(\lambda_k^n\vec{x}_k), \end{align*}
using our previous result. Again, we only need to be able to compute powers of the eigenvalues, and not of the original matrix.
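Here is a small numerical sketch of this idea; it is not part of the original text, and the coefficient matrix and initial populations are invented for illustration. The point is that advancing the recurrence by scaling each eigenvector coefficient by \(\lambda_i^n\) agrees with computing \(A^n\vec{P}_0\) directly.

import numpy as np

# Not part of the text: the coefficients and initial populations are invented.
A = np.array([[1.2, -0.1],
              [0.1,  0.8]])          # a, b, c, d as in the predator-prey model above
P0 = np.array([100.0, 20.0])         # initial populations of the two species

evals, V = np.linalg.eig(A)          # columns of V are eigenvectors of A
c = np.linalg.solve(V, P0)           # coefficients of P0 in the eigenvector basis

n = 10
state_eig = V @ (c * evals**n)                    # c1*lambda1^n*x1 + c2*lambda2^n*x2
state_direct = np.linalg.matrix_power(A, n) @ P0  # A^n P0 computed directly
print(np.allclose(state_eig, state_direct))       # True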
With the above in mind, we will clearly be in an advantageous situation if we can guarantee that every vector in our space can be written as a linear combination of eigenvectors. We know from our earlier work that in order to guarantee this, we would need to be able to construct a basis of eigenvectors. We will see how such a basis can be used after we make a definition.

Definition 7.3.9. Diagonalizable Matrix.

We say that an \(n\times n\) matrix \(A\) is diagonalizable (can be diagonalized) if \(A\sim D\) for some diagonal matrix \(D\text{.}\) In other words, \(A\) is diagonalizable if there exists an invertible matrix \(P\) such that \(D=P^{-1}AP\) is a diagonal matrix.

Theorem 7.3.10. Diagonalization and bases of eigenvectors.

An \(n\times n\) matrix \(A\) is diagonalizable if and only if there exists a basis of \(\R^n\) consisting of eigenvectors of \(A\text{.}\)
We will now proceed with a proof of this theorem. We will see that in addition to the theoretical interest in providing a proof, the ideas in this proof will have practical use as well.
First, suppose that \(A\) can be diagonalized. Then there exists an invertible matrix \(P\) such that \(P^{-1}AP = D\text{,}\) where
\begin{equation*} D = \bbm \lambda_1 \amp 0 \amp \cdots \amp 0\\0\amp \lambda_2 \amp \cdots \amp 0\\\vdots \amp \vdots \amp \ddots \amp \vdots\\0\amp 0\amp \cdots \amp \lambda_n\ebm, \end{equation*}
and \(\lambda_1, \lambda_2, \ldots, \lambda_n\) are the (not necessarily distinct) eigenvalues of \(A\text{.}\) Letting \(\vec{e}_1,\ldots, \vec{e}_n\) denote the standard basis for \(\R^n\) we have, for each \(j=1,2,\ldots, n\text{:}\)
\begin{align*} D\vec{e}_j \amp = \lambda_j \vec{e}_j\\ (P^{-1}AP)\vec{e}_j \amp = \lambda_j\vec{e}_j\\ (AP)\vec{e}_j \amp = P(\lambda_j\vec{e}_j)\\ A(P\vec{e}_j) \amp = \lambda_j(P\vec{e}_j), \end{align*}
so \(\vec{x}_j = P\vec{e}_j\) is an eigenvector for \(A\) with eigenvalue \(\lambda_j\text{.}\) Moreover, since \(P\) is invertible, we know that the vectors \(\vec{x}_1, \ldots, \vec{x}_n\) are linearly independent, and thus form our desired basis of eigenvectors. (See the margin note for a proof of this fact.)
Conversely, suppose that we have a basis \(\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n\) of eigenvectors for \(A\text{,}\) with corresponding eigenvalues \(\lambda_1,\lambda_2, \ldots, \lambda_n\text{.}\) Let \(P\) be the matrix whose columns are given by these vectors. Then \(P\) is invertible, since its columns are linearly independent. Let \(D\) be defined as above. Then
\begin{equation*} PD = \bbm \vec{x}_1 \amp \vec{x}_2 \amp \cdots \amp \vec{x}_n\ebm \bbm \lambda_1 \amp 0 \amp \cdots \amp 0\\0\amp \lambda_2 \amp \cdots \amp 0\\\vdots \amp \vdots \amp \ddots \amp \vdots\\0\amp 0\amp \cdots \amp \lambda_n\ebm = \bbm \lambda_1\vec{x}_1 \amp \lambda_2\vec{x}_2 \amp \cdots \amp \lambda_n\vec{x}_n\ebm, \end{equation*}
while
\begin{equation*} AP = A\bbm \vec{x}_1 \amp \vec{x}_2 \amp \cdots \amp \vec{x}_n\ebm = \bbm A\vec{x}_1 \amp A\vec{x}_2 \amp \cdots \amp A\vec{x}_n\ebm = \bbm \lambda_1\vec{x}_1\amp \lambda_2\vec{x}_2\amp \cdots \amp \lambda_n\vec{x}_n\ebm. \end{equation*}
Thus, \(PD=AP\text{,}\) and since \(P\) is invertible, this gives us \(D=P^{-1}AP\text{,}\) as required.
As we discussed above, the diagonalizability of \(A\) makes it easy to compute powers of \(A\text{.}\) Indeed, suppose \(P^{-1}AP=D\) is diagonal, for some invertible matrix \(P\text{.}\) Then \(A = PDP^{-1}\text{,}\) and
\begin{equation*} A^n = (PDP^{-1})^n = (PDP^{-1})\cdots (PDP^{-1}) = PD(P^{-1}P)D\cdots (P^{-1}P)DP^{-1} = PD^nP^{-1}\text{.} \end{equation*}
This makes it easy to compute \(A^n\text{,}\) since it’s easy to compute \(D^n\text{:}\)
\begin{equation*} \text{If } D=\bbm \lambda_1 \amp 0 \amp \cdots \amp 0\\0\amp \lambda_2 \amp \cdots \amp 0\\\vdots \amp \vdots \amp \ddots \amp \vdots\\0\amp 0\amp \cdots \amp \lambda_n\ebm, \text{ then } D^n = \bbm \lambda_1^n \amp 0 \amp \cdots \amp 0\\0\amp \lambda_2^n \amp \cdots \amp 0\\\vdots \amp \vdots \amp \ddots \amp \vdots\\0\amp 0\amp \cdots \amp \lambda_n^n\ebm\text{.} \end{equation*}
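A quick numerical confirmation of this identity (not part of the original text), using an arbitrarily chosen diagonalizable matrix:

import numpy as np

# Not part of the text: the matrix A is arbitrary but diagonalizable (eigenvalues 2 and 3).
A = np.array([[4.0, -2.0],
              [1.0,  1.0]])
evals, P = np.linalg.eig(A)          # columns of P are eigenvectors, so A P = P D

n = 8
Dn = np.diag(evals**n)               # D^n: raise each diagonal entry to the n-th power
An = P @ Dn @ np.linalg.inv(P)       # A^n = P D^n P^{-1}
print(np.allclose(An, np.linalg.matrix_power(A, n)))   # True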
The proof of Theorem 7.3.10 is quite useful, since it tells us two things: first, we now know what it takes to diagonalize a matrix: for an \(n\times n\) matrix, we need to be able to find \(n\) independent eigenvectors. Second, the proof of the theorem tells us how to diagonalize: once we’ve found our eigenvectors, we construct the matrix \(P\) whose columns are given by the eigenvectors of \(A\text{;}\) the matrix \(P^{-1}AP\) will then be a diagonal matrix with the eigenvalues of \(A\) on the main diagonal. This leaves us with the question: when can we be sure we have enough eigenvectors? To answer this, we begin with a theorem:
The theorem in question states that eigenvectors corresponding to distinct eigenvalues are linearly independent: if \(\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k\) are eigenvectors of an \(n\times n\) matrix \(A\) corresponding to distinct eigenvalues \(\lambda_1, \lambda_2, \ldots, \lambda_k\text{,}\) then the vectors \(\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k\) are linearly independent. The above theorem leads to the following corollary: if an \(n\times n\) matrix \(A\) has \(n\) distinct eigenvalues, then the corresponding eigenvectors form a basis for \(\R^n\text{,}\) and therefore \(A\) is diagonalizable.
Let us first see how the above corollary follows from our theorem. We know that every eigenvalue has at least one associated eigenvector. If there are \(n\) distinct eigenvalues, then we have \(n\) associated eigenvectors which, by the theorem above, are linearly independent, and we know that any set of \(n\) linearly independent vectors in \(\R^n\) forms a basis for \(\R^n\text{.}\)
As for the proof of the theorem, we use an argument by contradiction: suppose, to the contrary, that the eigenvectors \(\vec{x}_1, \ldots, \vec{x}_k\) are linearly dependent. Then there is some \(m\text{,}\) \(1\leq m\leq k-1\text{,}\) such that the vectors \(\vec{x}_1, \ldots, \vec{x}_m\) are linearly independent, but \(\vec{x}_{m+1}\) can be written as a linear combination of the vectors \(\vec{x}_1, \ldots, \vec{x}_m\text{;}\) that is,
\begin{equation} \vec{x}_{m+1} = c_1\vec{x}_1+c_2\vec{x}_2+\cdots + c_m\vec{x}_m\text{,}\tag{7.3.1} \end{equation}
for some scalars \(c_1,\ldots, c_m\text{.}\) Then, multiplying both sides of (7.3.1) on the left by \(A\text{,}\) we must have
\begin{equation*} A\vec{x}_{m+1} = A(c_1\vec{x}_1+c_2\vec{x}_2+\cdots + c_m\vec{x}_m) = c_1(A\vec{x}_1)+c_2(A\vec{x}_2)+\cdots + c_m(A\vec{x}_m)\text{,} \end{equation*}
but each vector \(\vec{x}_i\) is an eigenvector of \(A\) with eigenvalue \(\lambda_i\text{.}\)
Thus,
\begin{equation*} \lambda_{m+1}\vec{x}_{m+1} = c_1\lambda_1\vec{x}_1+c_2\lambda_2\vec{x}_2+\cdots + c_m\lambda_m\vec{x}_m\text{.} \end{equation*}
On the other hand, if we multiply both sides of (7.3.1) by the scalar \(\lambda_{m+1}\text{,}\) we have
\begin{equation*} \lambda_{m+1}\vec{x}_{m+1} = c_1\lambda_{m+1}\vec{x}_1 + c_2\lambda_{m+1}\vec{x}_2 + \cdots + c_m\lambda_{m+1}\vec{x}_m\text{.} \end{equation*}
Subtracting these last two equations, we find:
\begin{equation*} \vec{0} = c_1(\lambda_{m+1}-\lambda_1)\vec{x}_1 + c_2(\lambda_{m+1}-\lambda_2)\vec{x}_2+\cdots + c_m(\lambda_{m+1}-\lambda_m)\vec{x}_m\text{.} \end{equation*}
Now, we are assuming that our eigenvalues are all distinct. Thus, \(\lambda_{m+1}-\lambda_i\neq 0\) for all \(i=1,2,\ldots, m\text{.}\) On the other hand, the vectors \(\vec{x}_1,\vec{x}_2, \ldots, \vec{x}_m\) are assumed to be linearly independent, so we must have
\begin{equation*} c_1(\lambda_{m+1}-\lambda_1) = 0, c_2(\lambda_{m+1}-\lambda_2)=0, \ldots, c_m(\lambda_{m+1}-\lambda_m)=0\text{.} \end{equation*}
Since none of the \(\lambda_{m+1}-\lambda_i\) terms vanish, it must be the case that \(c_1=c_2=\cdots =c_m=0\text{.}\) But if that is true, then
\begin{equation*} \vec{x}_{m+1} = 0\vec{x}_1+0\vec{x}_2+\cdots + 0\vec{x}_m = \vec{0}, \end{equation*}
which is impossible, since \(\vec{x}_{m+1}\) was assumed to be an eigenvector, and eigenvectors are nonzero. Thus, it must be the case that all of our eigenvectors are linearly independent.
The above tells us that any \(n\times n\) matrix \(A\) with \(n\) distinct eigenvalues is automatically diagonalizable. There are two other possibilities. One is that the characteristic polynomial of \(A\) cannot be completely factored over the real numbers. In this case, diagonalization is impossible, unless we are willing to consider complex eigenvalues. Recall that over the complex numbers, every polynomial can be completely factored. (This is the Fundamental Theorem of Algebra.) In this case, we may still be able to diagonalize, but the eigenvectors corresponding to the complex eigenvalues may well be complex themselves, so the matrix \(P\) used to diagonalize \(A\) could have complex entries.
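Before turning to the case of repeated eigenvalues, here is a brief numerical illustration (not part of the original text) of the distinct-eigenvalue case: the eigenvectors computed for a matrix with distinct eigenvalues are linearly independent, so the matrix \(P\) built from them diagonalizes \(A\text{.}\) The matrix used is arbitrary.

import numpy as np

# Not part of the text: A is an arbitrary matrix with distinct eigenvalues 1, 3, 5.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])
evals, P = np.linalg.eig(A)

print(np.linalg.matrix_rank(P))            # 3: the eigenvectors are linearly independent
D = np.linalg.inv(P) @ A @ P
print(np.allclose(D, np.diag(evals)))      # True: P^{-1} A P is diagonal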
The other possibility is that we have repeated eigenvalues. Here, it is possible that we encounter truly insurmountable obstacles. First, we have the following theorem, which we will state without proof. The interested reader can easily find the details in other linear algebra textbooks or online.
The theorem states that for each eigenvalue \(\lambda\) of an \(n\times n\) matrix \(A\text{,}\) the dimension of the eigenspace \(E(A,\lambda)\) is at most the algebraic multiplicity of \(\lambda\text{.}\) In other words, the geometric multiplicity of an eigenvalue is always less than or equal to the algebraic multiplicity. Note that the sum of the algebraic multiplicities is always equal to the degree of the characteristic polynomial, which, in turn, is equal to \(n\text{,}\) for an \(n\times n\) matrix \(A\text{.}\) On the other hand, the total number of independent eigenvectors for \(A\) is equal to the sum of the geometric multiplicities. (We know that eigenvectors for different eigenvalues are independent, and the geometric multiplicity tells us how many independent eigenvectors we’ll have for a single eigenvalue.) As a result, we have the following: \(A\) is diagonalizable if and only if the sum of the geometric multiplicities of its eigenvalues is equal to \(n\text{;}\) equivalently, if and only if the characteristic polynomial factors completely into linear factors and the geometric multiplicity of each eigenvalue equals its algebraic multiplicity.
This also tells us exactly when we can expect a matrix to fail to be diagonalizable: this will be the case if \(A\) has an eigenvalue of algebraic multiplicity greater than 1 for which the geometric multiplicity is less than the algebraic multiplicity. (That is, if we don’t get “enough” basic eigenvectors corresponding to a repeated eigenvalue.)
The reader may wonder if there is anything further that can be said in the case that a matrix \(A\) is not diagonalizable, and indeed, there is. As long as we are willing to work over the complex numbers (to avoid situations where the characteristic polynomial cannot be factored), one can prove that in the worst-case scenario, there exists an invertible matrix \(P\) such that the matrix \(P^{-1}AP\) is triangular. (If we can’t get a diagonal matrix, upper or lower-triangular is the next best thing.) The standard form of such a matrix is called the Jordan Canonical Form; it is obtained using what are called generalized eigenvectors. The Jordan Canonical Form of a matrix is studied in more advanced courses in linear algebra.
One last important result that we do not have time to study concerns the case of symmetric matrices. Recall that an \(n\times n\) matrix \(A\) is symmetric if \(A^T=A\text{.}\) An important result called the Spectral Theorem guarantees that every symmetric matrix \(A\) is diagonalizable. In fact, the spectral theorem goes one step further. Recall that two vectors \(\vec{u}\) and \(\vec{v}\) are orthogonal if \(\vec{u}\dotp\vec{v}=0\text{.}\) We say that a set of vectors \(\{\vec{x}_1,\vec{x}_2,\ldots, \vec{x}_k\}\) is orthogonal if \(\vec{x}_i\neq \vec{0}\) for each \(i\text{,}\) and \(\vec{x}_i\dotp \vec{x}_j = 0\) for all \(i\neq j\text{.}\) Orthogonality is a “stronger” condition than linear independence. Indeed, if the vectors \(\vec{x}_1,\ldots, \vec{x}_k\) are orthogonal and
\begin{equation*} c_1\vec{x}_1+c_2\vec{x}_2+\cdots + c_k\vec{x}_k = \vec{0} \end{equation*}
for some scalars \(c_1,c_2,\ldots, c_k\text{,}\) taking the dot product of both sides of the above equation with \(\vec{x}_1\) gives us
\begin{equation*} c_1(\vec{x}_1\dotp \vec{x}_1)+c_2(0)+\cdots + c_k(0)=0, \end{equation*}
and since \(\vec{x}_1\dotp \vec{x}_1 = \lVert \vec{x}_1\rVert^2 \neq 0\text{,}\) it must be that \(c_1=0\text{,}\) and similarly, all the other scalars must be zero as well.
For a symmetric matrix, eigenvectors corresponding to distinct eigenvalues are not just independent, they’re orthogonal. Indeed, suppose \(A\vec{x}_1 = \lambda_1\vec{x}_1\) and \(A\vec{x}_2 = \lambda_2\vec{x}_2\text{,}\) where \(\lambda_1\neq \lambda_2\text{.}\) Since \(\vec{x}_1\dotp \vec{x}_2\) can be written as the matrix product \(\vec{x}_1^T\vec{x}_2\text{,}\) we have (using the fact that \(A^T=A\)):
\begin{equation*} \lambda_1(\vec{x}_1\dotp \vec{x}_2) = (\lambda_1\vec{x}_1)^T\vec{x}_2 = (A\vec{x}_1)^T\vec{x}_2 = (\vec{x}_1^TA^T)\vec{x}_2 = \vec{x}_1^T(A\vec{x}_2) = \vec{x}_1^T(\lambda_2\vec{x}_2) = \lambda_2(\vec{x}_1\dotp \vec{x}_2)\text{.} \end{equation*}
Thus, \((\lambda_1-\lambda_2)(\vec{x}_1\dotp\vec{x}_2) = 0\text{,}\) and since \(\lambda_1\neq \lambda_2\text{,}\) we must have \(\vec{x}_1\dotp \vec{x}_2 = 0\text{.}\)
It is possible to prove that for a symmetric matrix, one can find an orthogonal basis of eigenvectors. This observation has a number of important applications. (Try an online search for “orthogonal diagonalization” and you should be able to find several examples.)
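For instance (the sketch below is not part of the original text), NumPy's eigh routine for symmetric matrices returns an orthonormal set of eigenvectors, so the diagonalizing matrix \(Q\) it produces satisfies \(Q^TQ = I\) and \(Q^TAQ\) is diagonal. The matrix used is the symmetric matrix from Example 7.3.8.

import numpy as np

# Not part of the text: np.linalg.eigh handles symmetric matrices and returns
# an orthonormal set of eigenvectors (even for the repeated eigenvalue -1 here).
A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
evals, Q = np.linalg.eigh(A)

print(np.allclose(Q.T @ Q, np.eye(3)))           # True: the columns of Q are orthonormal
print(np.allclose(Q.T @ A @ Q, np.diag(evals)))  # True: Q^T A Q is diagonal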

Matrix similarity and change of basis for transformations.

We end this section with the promised explanation of how similar matrices can be viewed as different representations of the same linear transformation. If we think of matrices in terms of the matrix transformations they define, then two matrix transformations obtained from similar matrices should be viewed as two descriptions of the same linear transformation using different “coordinate systems”. Let us explain what we mean here. Suppose
\begin{equation*} T:\R^n\to \R^n \end{equation*}
is a matrix transformation defined by \(T(\vec{x}) = A\vec{x}\) for some \(n\times n\) matrix \(A\text{.}\) Let us consider this to be a transformation defined in terms of the standard basis vectors \(\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n\) for \(\R^n\text{.}\)
We can then write
\begin{equation*} \vec{x} = \bbm x_1 \amp x_2 \amp \cdots \amp x_n\ebm^T = x_1\vec{e}_1+x_2\vec{e}_2+\cdots +x_n\vec{e}_n \end{equation*}
and think of \(T\) as a function of the variables \(x_1,x_2,\ldots, x_n\text{.}\) To put this another way, when we speak of a matrix transformation with standard matrix \(A\text{,}\) that matrix \(A\) is defined with respect to the standard basis. In general, let \(\mathcal{A}=\{\vec{e}_1,\ldots, \vec{e}_n\}\) denote our standard basis, and suppose
\begin{equation*} \mathcal{B} = \{\vec{b}_1, \vec{b}_2, \ldots, \vec{b}_n\} \end{equation*}
is another basis for \(\R^n\text{.}\) Recall that when we say that the set \(\mathcal{B}\) is a basis, we mean that \(\operatorname{span}\mathcal{B} = \R^n\text{,}\) so that every vector in \(\R^n\) can be written as a linear combination of the vectors in \(\mathcal{B}\text{,}\) and \(\mathcal{B}\) is linearly independent, which means that every vector in \(\R^n\) can be written uniquely as a linear combination of the vectors in \(\mathcal{B}\text{.}\)
Let \(\vec{v} = y_1\vec{b}_1+y_2\vec{b}_2+\cdots +y_n\vec{b}_n\) be any vector in \(\R^n\text{.}\) Using the same linear transformation \(T\) as above, we have
\begin{equation*} T(\vec{v}) = T(y_1\vec{b}_1+y_2\vec{b}_2+\cdots +y_n\vec{b}_n) = y_1T(\vec{b}_1)+y_2T(\vec{b}_2)+\cdots + y_nT(\vec{b}_n)\text{.} \end{equation*}
Now, each of the vectors \(T(\vec{b}_i)\) can, in turn, be written in terms of the basis \(\mathcal{B}\text{:}\) we have
\begin{align*} T(\vec{b}_1) \amp = b_{11}\vec{b}_1 + b_{21}\vec{b}_2 + \cdots + b_{n1}\vec{b}_n\\ T(\vec{b}_2) \amp = b_{12}\vec{b}_1 + b_{22}\vec{b}_2 + \cdots + b_{n2}\vec{b}_n\\ \vdots \quad \amp \quad \vdots \quad \vdots \quad \quad \vdots\\ T(\vec{b}_n) \amp = b_{1n}\vec{b}_1 + b_{2n}\vec{b}_2 + \cdots + b_{nn}\vec{b}_n\text{.} \end{align*}
The coefficients \(b_{ij}\) obtained above determine a new matrix \(B\text{,}\) which we define to be the matrix of \(T\) with respect to the basis \(\mathcal{B}\text{.}\) Notice that if we write \(\vec{y} = \bbm y_1 \amp y_2 \amp \cdots \amp y_n\ebm^T\text{,}\) we have
\begin{equation*} B\vec{y} = \bbm b_{11} \amp b_{12} \amp \cdots \amp b_{1n}\\b_{21} \amp b_{22} \amp \cdots \amp b_{2n}\\\vdots \amp \vdots \amp \ddots \amp \vdots\\b_{n1}\amp b_{n2}\amp \cdots \amp b_{nn}\ebm \bbm y_1\\y_2\\\vdots \\y_n\ebm = \bbm b_{11}y_1+b_{12}y_2+\cdots + b_{1n}y_n\\b_{21}y_1+b_{22}y_2 + \cdots + b_{2n}y_n\\ \vdots \\b_{n1}y_1+b_{n2}y_2+\cdots + b_{nn}y_n\ebm\text{.} \end{equation*}
Let’s introduce some notation: for any vector \(\vec{w}\) in \(\R^n\text{,}\) we will let \([\vec{w}]_{\mathcal{B}}\) denote the \(n\times 1\) column vector defined as follows:
\begin{equation*} \text{If } \, \vec{w} = w_1\vec{b}_1 + w_2\vec{b}_2 + \cdots + w_n\vec{b}_n, \, \text{ then } \, [\vec{w}]_{\mathcal{B}} = \bbm w_1\\w_2\\\vdots \\w_n\ebm\text{.} \end{equation*}
In this notation, we have \(\vec{y}=[\vec{v}]_{\mathcal{B}}\text{,}\) and
\begin{equation*} [T(\vec{v})]_{\mathcal{B}} = \bbm b_{11}y_1+b_{12}y_2+\cdots + b_{1n}y_n\\b_{21}y_1+b_{22}y_2 + \cdots + b_{2n}y_n\\ \vdots \\b_{n1}y_1+b_{n2}y_2+\cdots + b_{nn}y_n\ebm\text{,} \end{equation*}
since
\begin{align*} (b_{11}y_1+\cdots + b_{1n}y_n)\vec{b}_1 \amp + \cdots + (b_{n1}y_1+\cdots + b_{nn}y_n)\vec{b}_n\\ \amp = y_1(b_{11}\vec{b}_1+\cdots + b_{n1}\vec{b}_n) + \cdots + y_n(b_{1n}\vec{b}_1+\cdots + b_{nn}\vec{b}_n)\\ \amp = y_1T(\vec{b}_1) + \cdots + y_nT(\vec{b}_n)\\ \amp = T(y_1\vec{b}_1 + \cdots + y_n\vec{b}_n)\\ \amp = T(\vec{v})\text{.} \end{align*}
Let’s tie everything together. First, let \(P\) be the \(n\times n\) matrix whose columns are the vectors in \(\mathcal{B}\text{.}\) Then we know that
\begin{equation*} \vec{b}_1 = P\vec{e}_1, \vec{b}_2=P\vec{e}_2, \ldots, \vec{b}_n = P\vec{e}_n, \end{equation*}
and conversely,
\begin{equation*} \vec{e}_1 = P^{-1}\vec{b}_1, \vec{e}_2 = P^{-1}\vec{b}_2, \ldots, \vec{e}_n = P^{-1}\vec{b}_n\text{.} \end{equation*}
If \([\vec{v}]_{\mathcal{B}} = \bbm v_1\\v_2\\\vdots \\v_n\ebm\text{,}\) then we have
\begin{equation*} \vec{v} = v_1\vec{b}_1 +\cdots + v_n\vec{b}_n = v_1P\vec{e}_1 +\cdots +v_nP\vec{e}_n = P(v_1\vec{e}_1+\cdots + v_n\vec{e}_n) = P[\vec{v}]_{\mathcal{B}}\text{.} \end{equation*}
Note that we can also write \(\vec{v} = [\vec{v}]_{\mathcal{A}}\text{,}\) where \(\mathcal{A}\) is the standard basis, since by default, all of our vectors are written in terms of the standard basis. Thus, we can conclude that
\begin{equation} [\vec{v}]_{\mathcal{A}} = P[\vec{v}]_{\mathcal{B}}\tag{7.3.2} \end{equation}
defines the relationship between the column vector representations of our vector with respect to the two bases. Similarly, we can write
\begin{equation} [T(\vec{v})]_{\mathcal{A}} = P[T(\vec{v})]_{\mathcal{B}}\tag{7.3.3} \end{equation}
for the column representations of the output of our linear transformation \(T\text{.}\)
However, we also know that our transformation was originally defined using the matrix \(A\) according to
\begin{equation} [T(\vec{v})]_{\mathcal{A}} = A[\vec{v}]_{\mathcal{A}}\tag{7.3.4} \end{equation}
and that the matrix \(B\) above was defined such that
\begin{equation} [T(\vec{v})]_{\mathcal{B}} = B[\vec{v}]_{\mathcal{B}}\text{.}\tag{7.3.5} \end{equation}
Equating the two expressions above for \([T(\vec{v})]_{\mathcal{A}}\) in Equations (7.3.3) and (7.3.4), we have
\begin{equation*} P[T(\vec{v})]_{\mathcal{B}} = A[\vec{v}]_{\mathcal{A}}\text{.} \end{equation*}
Plugging in Equations (7.3.5) on the left and (7.3.2) on the right, we have
\begin{equation*} PB[\vec{v}]_{\mathcal{B}} = AP[\vec{v}]_{\mathcal{B}}\text{.} \end{equation*}
Since this must be true for any vector, we conclude that \(PB = AP\text{,}\) and thus
\begin{equation*} B = P^{-1}AP, \end{equation*}
so \(B\) is similar to \(A\text{.}\)
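The relationship \(B = P^{-1}AP\) is easy to verify numerically. In the sketch below (not part of the original text), the standard matrix \(A\) of \(T\) and the basis \(\mathcal{B}\) are chosen arbitrarily; the check confirms that applying \(T\) in standard coordinates agrees with applying the matrix \(B\) in \(\mathcal{B}\)-coordinates and converting back to standard coordinates.

import numpy as np

# Not part of the text: A (the standard matrix of T) and the basis B are arbitrary.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])           # standard matrix of T
P = np.array([[1.0, 1.0],
              [1.0, 2.0]])           # columns are the basis vectors b_1, b_2
B_matrix = np.linalg.inv(P) @ A @ P  # matrix of T with respect to the basis B

v_B = np.array([3.0, -1.0])          # coordinates [v]_B of some vector v
v = P @ v_B                          # the same vector in standard coordinates

# T(v) computed in standard coordinates, versus B_matrix @ [v]_B converted back:
print(np.allclose(A @ v, P @ (B_matrix @ v_B)))   # True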

Exercises

Exercise Group.

Compute the characteristic polynomial of the given matrix, and the eigenvalues of the matrix, along with their algebraic and geometric multiplicities. If possible, construct a matrix \(P\) such that \(P^{-1}AP\) is diagonal, and verify your result. If no such matrix \(P\) exists, explain why.
1.
\(\bbm 7\amp 0\amp 0\\10\amp -3\amp 0\\10\amp -8\amp 5\ebm\)
2.
\(\bbm 1\amp 1\amp 0\\3\amp 0\amp 3\\2\amp -1\amp 3\ebm\)
3.
\(\bbm 9\amp -7\\-6\amp 20\ebm\)
4.
\(\bbm 2\amp -1\\1\amp 0\ebm\)
5.
\(\bbm 0\amp 1\amp 1\\1\amp 0\amp 1\\1\amp 1\amp 0\ebm\)
6.
\(\bbm 3\amp -2\\1\amp 0\ebm\)
7.
\(\bbm 3\amp 2\amp -2\\2\amp 2\amp -2\\2\amp 2\amp -1\ebm\)
8.
\(\bbm 4 \amp -2\\1 \amp 3\ebm\)