Let \(A\) be an \(m\times n\) matrix. The transpose of \(A\text{,}\) denoted \(A^T\text{,}\) is the \(n\times m\) matrix whose columns are the respective rows of \(A\text{.}\)
If we write \(A=[a_{ij}]\) to emphasize the entries of \(A\text{,}\) then the transpose of \(A\) is the matrix \(A^T = [a^T_{ij}]\) where \(a^T_{ij} = a_{ji}\text{;}\) that is, the \((i,j)\)-entry of \(A^T\) is the \((j,i)\)-entry of \(A\text{.}\) Examples will make this definition clear.
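For instance, writing a \(2\times 3\) matrix with generic entries (the letters below are just placeholders), the rows become the columns of the \(3\times 2\) transpose:
\begin{equation*}
\begin{bmatrix} a &amp; b &amp; c \\ d &amp; e &amp; f \end{bmatrix}^T = \begin{bmatrix} a &amp; d \\ b &amp; e \\ c &amp; f \end{bmatrix}\text{.}
\end{equation*}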
Note that \(A\) is a \(2\times 3 \) matrix, so \(A^T\) will be a \(3 \times 2\) matrix. By the definition, the first column of \(A^T\) is the first row of \(A\text{;}\) the second column of \(A^T\) is the second row of \(A\text{.}\) Therefore,
We find each transpose using the definition without explanation. Make note of the dimensions of the original matrix and the dimensions of its transpose.
Notice that with matrix \(B\text{,}\) when we took the transpose, the diagonal did not change. We can see what the diagonal is below where we rewrite \(B\) and \(B^T\) with the diagonal in bold. We'll follow this by a definition of what we mean by "the diagonal of a matrix," along with a few other related definitions.
The diagonals of \(A\) and \(A^T\) are the same, consisting of the entries 1, 4 and 6. The diagonals of \(B\) and \(B^T\) are also the same, consisting of the entries 3, 7 and \(-1\text{.}\) Finally, the diagonals of \(C\) and \(C^T\) are the same, consisting of the entries 1, 4 and 6.
The matrix \(B\) is diagonal. By their definitions, we can also see that \(B\) is both upper and lower triangular. Likewise, \(I_4\) is diagonal, as well as upper and lower triangular.
Make note of the definitions of diagonal and triangular matrices. We specify that a diagonal matrix must be square, but triangular matrices don't have to be. ("Most" of the time, however, the ones we study are.) Also, as we mentioned before in the example, by definition a diagonal matrix is also both upper and lower triangular. Finally, notice that by definition, the transpose of an upper triangular matrix is a lower triangular matrix, and vice-versa.
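For instance, with entries chosen here only for illustration, the following matrices are, respectively, diagonal, upper triangular and lower triangular:
\begin{equation*}
\begin{bmatrix} 2 &amp; 0 &amp; 0 \\ 0 &amp; 5 &amp; 0 \\ 0 &amp; 0 &amp; 1 \end{bmatrix},\qquad
\begin{bmatrix} 2 &amp; 3 &amp; 4 \\ 0 &amp; 5 &amp; 7 \\ 0 &amp; 0 &amp; 1 \end{bmatrix},\qquad
\begin{bmatrix} 2 &amp; 0 &amp; 0 \\ 3 &amp; 5 &amp; 0 \\ 4 &amp; 7 &amp; 1 \end{bmatrix}\text{.}
\end{equation*}
The first matrix, being diagonal, is also both upper and lower triangular; notice too that the transpose of the second matrix is the third.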
There are many questions to probe concerning the transpose operation. The first set of questions we'll investigate involves the matrix arithmetic we learned in the last chapter. We do this investigation by way of examples, and then summarize what we have learned at the end.
It looks like "the sum of the transposes is the transpose of the sum." (This is kind of fun to say, especially when said fast. Regardless of how fast we say it, we should think about this statement. The "is" represents "equals." The stuff before "is" equals the stuff afterwards.) This should lead us to wonder how the transpose works with multiplication.
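Before we do, here is a quick entry-by-entry check of why the sum rule holds in general, not just in the example above. If \(A\) and \(B\) have the same size, then
\begin{equation*}
\big((A+B)^T\big)_{ij} = (A+B)_{ji} = a_{ji} + b_{ji} = \big(A^T\big)_{ij} + \big(B^T\big)_{ij}\text{,}
\end{equation*}
so the \((i,j)\)-entries of \((A+B)^T\) and \(A^T+B^T\) agree.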
We may have suspected that \((AB)^T = A^TB^T\text{.}\) We saw that this wasn't the case; not only were the two not equal, the second product wasn't even defined! Oddly enough, though, we saw that \((AB)^T = B^TA^T\text{.}\) (Then again, maybe this isn't all that "odd." It is reminiscent of the fact that, when invertible, \((AB)^{-1} = B^{-1}A^{-1}\text{.}\)) To help understand why this is true, look back at the work above and confirm the steps of each multiplication.
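Alternatively, we can track a single entry in general. If \(A\) is \(m\times n\) and \(B\) is \(n\times p\text{,}\) then
\begin{equation*}
\big((AB)^T\big)_{ij} = (AB)_{ji} = \sum_{k=1}^{n} a_{jk}b_{ki} = \sum_{k=1}^{n} \big(B^T\big)_{ik}\big(A^T\big)_{kj} = \big(B^TA^T\big)_{ij}\text{.}
\end{equation*}
Note that \(A^TB^T\) may not even be defined, since \(A^T\) is \(n\times m\) and \(B^T\) is \(p\times n\text{.}\)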
It seems that "the inverse of the transpose is the transpose of the inverse." (Again, we should think about this statement. The part before "is" states that we take the transpose of a matrix, then find the inverse. The part after "is" states that we find the inverse of the matrix, then take the transpose. Since these two statements are linked by an "is," they are equal.)
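Here is one way to see the connection, using the rule for the transpose of a product: if \(A\) is invertible, then
\begin{equation*}
A^T\big(A^{-1}\big)^T = \big(A^{-1}A\big)^T = I^T = I\text{,}
\end{equation*}
which says that \(\big(A^{-1}\big)^T\) is the inverse of \(A^T\text{;}\) that is, \(\big(A^T\big)^{-1} = \big(A^{-1}\big)^T\text{.}\)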
We have just looked at some examples of how the transpose operation interacts with matrix arithmetic operations. (These examples don't prove anything, other than that it worked in these specific examples.) We now give a theorem that tells us that what we saw wasn't a coincidence, but rather is always true.
We included in the theorem two ideas we haven't discussed yet. First, that \((kA)^T = kA^T\text{.}\) This is probably obvious: it doesn't matter whether you multiply by the scalar before or after taking the transpose; the result is the same.
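Entry by entry, this just says
\begin{equation*}
\big((kA)^T\big)_{ij} = (kA)_{ji} = k\,a_{ji} = k\big(A^T\big)_{ij}\text{.}
\end{equation*}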
The second "new" item is that \((A^T)^T = A\text{.}\) That is, if we take the transpose of a matrix, then take its transpose again, what do we have? The original matrix.
Now that we know some properties of the transpose operation, we are tempted to play around with it and see what happens. For instance, if \(A\) is an \(m\times n\) matrix, we know that \(A^T\) is an \(n\times m\) matrix. So no matter what matrix \(A\) we start with, we can always perform the multiplication \(AA^T\) (and also \(A^TA\)) and the result is a square matrix!
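A quick dimension count confirms this:
\begin{equation*}
\underbrace{A}_{m\times n}\,\underbrace{A^T}_{n\times m}\ \text{is}\ m\times m\text{,}\qquad \underbrace{A^T}_{n\times m}\,\underbrace{A}_{m\times n}\ \text{is}\ n\times n\text{.}
\end{equation*}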
Another thing to ask ourselves as we "play around" with the transpose: suppose \(A\) is a square matrix. Is there anything special about \(A+A^T\text{?}\) The following example has us try out these ideas.
Let's look at the matrices we've formed in this example. First, consider \(AA^T\text{.}\) Something seems to be nice about this matrix; look at the location of the 6's, the 5's and the 3's. More precisely, let's look at the transpose of \(AA^T\text{.}\) We should notice that if we take the transpose of this matrix, we have the very same matrix. That is,
Look at the next part of the example; what do we notice about \(A+A^T\text{?}\) We should see that it, too, is symmetric. Finally, consider the last part of the example: do we notice anything about \(A-A^T\text{?}\)
We should immediately notice that it is not symmetric, although it does seem "close." Instead of it being equal to its transpose, we notice that this matrix is the opposite of its transpose. We call this type of matrix skew symmetric. (Some mathematicians use the term antisymmetric.) We formally define these matrices here.
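For instance (the entries below are chosen only for illustration), the matrix
\begin{equation*}
\begin{bmatrix} 0 &amp; 2 &amp; -5 \\ -2 &amp; 0 &amp; 3 \\ 5 &amp; -3 &amp; 0 \end{bmatrix}
\end{equation*}
is skew symmetric: its transpose is the opposite of the matrix itself. Notice that its diagonal entries are 0; they must be, since each diagonal entry has to equal its own opposite.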
So why was \(AA^T\) symmetric in our previous example? Did we just luck out? (Of course not.) Let's take the transpose of \(AA^T\) and see what happens.
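Using the properties of the transpose from above,
\begin{equation*}
\big(AA^T\big)^T = \big(A^T\big)^T A^T = AA^T\text{,}
\end{equation*}
so \(AA^T\) is equal to its own transpose; it is symmetric.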
We have just proved that no matter what matrix \(A\) we start with, the matrix \(AA^T\) will be symmetric. Nothing in our string of equalities even demanded that \(A\) be a square matrix; it is always true.
We can do a similar proof to show that as long as \(A\) is square, \(A+A^T\) is a symmetric matrix. (Why do we say that \(A\) has to be square?) We'll instead show here that if \(A\) is a square matrix, then \(A-A^T\) is skew symmetric.
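Again using the properties of the transpose,
\begin{equation*}
\big(A-A^T\big)^T = A^T - \big(A^T\big)^T = A^T - A = -\big(A-A^T\big)\text{,}
\end{equation*}
so \(A-A^T\) is the opposite of its transpose; that is, it is skew symmetric.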
We'll take what we learned from Example 6.1.10 and put it in a box. (We've already proved most of this is true; the rest we leave for the Exercises.)
There are two answers, and each addresses both of these questions. First, we are interested in the transpose of a matrix and symmetric matrices because they are interesting. One particularly interesting thing about symmetric and skew symmetric matrices is this: consider the sum of \((A+A^T)\) and \((A-A^T)\text{:}\)
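\begin{equation*}
\big(A+A^T\big) + \big(A-A^T\big) = A + A^T + A - A^T = 2A\text{.}
\end{equation*}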
This gives us an idea: if we were to multiply both sides of this equation by \(\frac12\text{,}\) then the right hand side would just be \(A\text{.}\) This means that
\begin{equation*}
A = \underbrace{\frac12(A+A^T)}_{\text{symmetric}}\ +\ \underbrace{\frac12(A-A^T)}_{\text{skew symmetric}}\text{.}
\end{equation*}
That is, any square matrix \(A\) can be written as the sum of a symmetric matrix and a skew symmetric matrix. That's interesting.
The second reason we care about them is that they are very useful and important in various areas of mathematics. The transpose of a matrix turns out to be an important operation; symmetric matrices have many nice properties that make solving certain types of problems possible.
Most of this text focuses on the preliminaries of matrix algebra, and the actual uses are beyond our current scope. One easy-to-describe example is curve fitting. Suppose we are given a large set of data points that, when plotted, look roughly quadratic. How do we find the quadratic that "best fits" this data? The solution can be found using matrix algebra, specifically a matrix called the pseudoinverse. If \(A\) is a matrix, the pseudoinverse of \(A\) is the matrix \(A^\dagger = (A^TA)^{-1}A^T\) (assuming that the inverse exists). We aren't going to worry about what all the above means; just notice that it has a cool-sounding name and the transpose appears twice.
In the next section we'll learn about the trace, another matrix operation that is relatively simple to compute but can lead to some deep results.