We close this chapter with some discussion of a theoretical nature. First, we will attempt to gain some additional insight into the (initially mysterious) definition of matrix multiplication by revisiting it from the point of view of linear transformations. We’ll then introduce two fundamental subspaces associated with a matrix transformation.
Subsection 5.4.1 Matrix multiplication as function composition
Recall that one of the ways we can obtain new functions from old ones is via function composition. Given two functions \(f(x)\) and \(g(x)\) (where \(x\in\R\)), we can form the compositions
\begin{equation*}
(f\circ g)(x) = f(g(x)) \quad\text{and}\quad (g\circ f)(x) = g(f(x))\text{.}
\end{equation*}
Both functions here are defined from \(\R\) to \(\R\text{,}\) but in general, neither needs to be a linear transformation in the sense of this section. In fact, if \(f:\R\to\R\) satisfies Definition 5.2.3, then we must have \(f(x) = ax\) for some real number \(a\text{.}\) If \(g(x) = bx\) is another linear transformation from \(\R\) to \(\R\text{,}\) notice that we have
\begin{equation*}
(g\circ f)(x) = g(f(x)) = g(ax) = b(ax) = (ba)x\text{.}
\end{equation*}
Thus, to compose two linear transformations from \(\R\) to \(\R\text{,}\) we simply multiply the constants used to define the transformations.
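For example, if \(f(x) = 2x\) and \(g(x) = 3x\text{,}\) then
\begin{equation*}
(g\circ f)(x) = g(f(x)) = g(2x) = 3(2x) = 6x\text{,}
\end{equation*}
so the composition is again a linear transformation, and its constant \(6\) is the product of the constants \(3\) and \(2\text{.}\)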
Now, what about a general linear transformation \(S:\R^n\to \R^m\text{?}\) We know that any such transformation is a matrix transformation: we must have
\begin{equation*}
S(\vec x) = A\vec x
\end{equation*}
for any \(\vec x\in\R^n\text{,}\) where \(A\) is an \(m\times n\) matrix. Since we’re multiplying an \(m\times n\) matrix by an \(n\times 1\) matrix, the rules of matrix multiplication ensure that the output \(\vec y = A\vec x\) is an element of \(\R^m\text{.}\)
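Schematically, the sizes match up like this:
\begin{equation*}
\underbrace{A}_{m\times n}\;\underbrace{\vec x}_{n\times 1} = \underbrace{\vec y}_{m\times 1}\text{.}
\end{equation*}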
Suppose now that we want to define the composition \((S\circ T)(\vec x)\) for some other linear transformation \(T\text{.}\) Recall the following rule of function composition:
In order for the composition \(S\circ T\) to be defined, the range of \(T\) must be contained in the domain of \(S\).
That is, since \(S\circ T\) is defined by \((S\circ T)(\vec x) = S(T(\vec x))\text{,}\) the vector \(T(\vec x)\) (which by definition is in the range of \(T\)) must belong to the domain of \(S\text{.}\) This means that we must have \(T(\vec x)\in \R^n\text{,}\) so we have
\begin{equation*}
T:\R^k\to \R^n
\end{equation*}
for some natural number \(k\text{.}\) On the other hand, we know that if \(T\) is a linear transformation, then \(T\) is defined by matrix multiplication: \(T(\vec x) = B\vec x\) for some \(n\times k\) matrix \(B\text{.}\)
Let us now recall one of the rules of matrix multiplication:
For the matrix product \(AB\) to be defined, the number of columns of \(A\) must equal the number of rows of \(B\).
That is, if \(A\) is an \(m\times n\) matrix, then \(B\) must be an \(n\times k\) matrix for some \(k\text{.}\) But this is the same conclusion as above! What is the connection? Well, if we follow the rules for function composition, if \(T(\vec x) = B\vec x\) and \(S(\vec y) = A\vec y\text{,}\) we must have
\begin{equation*}
(S\circ T)(\vec x) = S(T(\vec x)) = A(B\vec x) = (AB)\vec x\text{,}
\end{equation*}
where the last equality is due to the associative property of matrix multiplication from Theorem 4.2.11. Thus, we see that:
Composition of linear transformations is the same as multiplication of the corresponding matrices!
Looking at things from the point of view of matrix transformations gives us two insights into the nature of matrix multiplication:
When \(A\) and \(B\) are both \(n\times n\) matrices, the transformations \(S(\vec x) = A\vec x\) and \(T(\vec x) = B\vec x\) are both maps from \(\mathbb{R}^n\) to \(\mathbb{R}^n\text{,}\) and we can define both
\begin{equation*}
(S\circ T)(\vec x) = (AB)\vec x \quad\text{and}\quad (T\circ S)(\vec x) = (BA)\vec x\text{.}
\end{equation*}
Our experience with functions teaches us that most of the time, \(S\circ T \neq T\circ S\text{,}\) so of course it makes sense that \(AB\neq BA\) in general!
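For example, taking
\begin{equation*}
A = \bbm 1\amp 1\\0\amp 1\ebm \quad\text{and}\quad B = \bbm 1\amp 0\\1\amp 1\ebm\text{,}
\end{equation*}
we find
\begin{equation*}
AB = \bbm 2\amp 1\\1\amp 1\ebm\text{,}\quad\text{while}\quad BA = \bbm 1\amp 1\\1\amp 2\ebm\text{,}
\end{equation*}
so \(AB\neq BA\text{,}\) and correspondingly \(S\circ T\neq T\circ S\text{.}\)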
The fact that \(AB\) is defined only when the number of columns of \(A\) matches the number of rows of \(B\) is simply a consequence of the fact that \(S\circ T\) is only defined if the range of \(T\) is a subset of the domain of \(S\text{.}\)
What about the “row times column” rule for determining the entries of \(AB\text{?}\) Let’s look at how things work in 2D. Suppose we’ve defined linear transformations
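\(S(\vec x) = A\vec x\) and \(T(\vec x) = B\vec x\text{,}\) where (in notation we choose here for illustration)
\begin{equation*}
A = \bbm a_{11}\amp a_{12}\\a_{21}\amp a_{22}\ebm \quad\text{and}\quad B = \bbm b_{11}\amp b_{12}\\b_{21}\amp b_{22}\ebm\text{.}
\end{equation*}
Computing the composition directly, we find
\begin{equation*}
S(T(\vec x)) = A\bbm b_{11}x_1+b_{12}x_2\\b_{21}x_1+b_{22}x_2\ebm = \bbm a_{11}b_{11}+a_{12}b_{21}\amp a_{11}b_{12}+a_{12}b_{22}\\a_{21}b_{11}+a_{22}b_{21}\amp a_{21}b_{12}+a_{22}b_{22}\ebm\vec x\text{.}
\end{equation*}
Notice that each entry of the resulting matrix is a row of \(A\) times a column of \(B\text{,}\)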
and this is exactly the rule for multiplying two \(2\times 2\) matrices! Of course, we can repeat the above argument for the general case where \(A\) is \(m\times n\) and \(B\) is \(n\times k\text{,}\) but you can probably guess that the algebra gets a bit messy on that one, so we’ll spare you the details.
Subsection 5.4.2 Column space
When we discussed the composition of linear transformations above, we briefly mentioned that this involves the consideration of the range. Recall that the range of a function is the set of all possible outputs when every input in the domain is considered. For example, with the function \(f(x)=x^2\text{,}\) where \(x\) can be any real number, the range is the set of all real numbers \(y\geq 0\text{.}\) (If \(y=x^2\) and \(x\) is real, then \(y\) can’t be negative.)
If we’re given a linear transformation \(T:\R^n\to \R^m\text{,}\) we might want to know what sort of vectors \(\vec y \in \R^m\) can be obtained from \(T\text{.}\) Consider Examples 5.1.3 and 5.1.5 from way back at the beginning of the section. In Example 5.1.3, the vectors \(A\vec x\) and \(A\vec y\) were non-parallel, and therefore independent. It follows that for any other vector \(\vec z\in\R^2\text{,}\) we can find scalars \(a\) and \(b\) such that
\begin{equation*}
\vec z = a(A\vec x) + b(A\vec y) = A(a\vec x + b\vec y)\text{,}
\end{equation*}
so every vector in \(\R^2\) can be written as the output of the transformation \(T(\vec x) = A\vec x\text{.}\) On the other hand, using the matrix \(A = \bbm 1\amp -1\\1\amp -1\ebm\) in Example 5.1.5, for any vector \(\bbm a\\b\ebm\in\R^2\text{,}\) we have
\begin{equation*}
A\bbm a\\b\ebm = \bbm 1\amp -1\\1\amp -1\ebm\bbm a\\b\ebm = \bbm a-b\\a-b\ebm = (a-b)\bbm 1\\1\ebm\text{,}
\end{equation*}
so the only vectors in the range of \(T(\vec x) = A\vec x\) are those parallel to the vector \(\bbm 1\\1\ebm\text{.}\)
Next, we’re going to consider a general matrix transformation \(T:\R^n\to \R^m\) given by \(T(\vec x) = A\vec x\text{,}\) but we’ll play around with the multiplication a little bit. Writing \(\vec{a}_1, \vec{a}_2, \ldots, \vec{a}_n\) for the columns of \(A\text{,}\) by definition (and a bit of manipulation), we have
\begin{equation*}
A\vec x = \bbm \vec{a}_1\amp \vec{a}_2\amp \cdots\amp \vec{a}_n\ebm\bbm x_1\\x_2\\\vdots\\x_n\ebm = x_1\vec{a}_1 + x_2\vec{a}_2 + \cdots + x_n\vec{a}_n\text{.}
\end{equation*}
Thus, whenever we multiply a vector by a matrix, the result is a linear combination of the columns of \(A\text{!}\) If we think of each column of \(A\) as a column vector in \(\R^m\text{,}\) we can make the following definition:
Definition 5.4.1. The column space of a matrix.
The column space of an \(m\times n\) matrix \(A\) is the subspace of \(\R^m\) spanned by the columns of \(A\text{:}\)
\begin{equation*}
\operatorname{col}(A) = \operatorname{span}\{\vec{a}_1, \vec{a}_2, \ldots, \vec{a}_n\}\text{,}
\end{equation*}
where \(\vec{a}_1, \vec{a}_2, \ldots, \vec{a}_n\) denote the columns of \(A\text{.}\)
This definition comes with two observations. First, the column space of \(A\) is precisely the range of the matrix transformation \(T(\vec x) = A\vec x\text{:}\) we saw above that every output \(A\vec x\) is a linear combination of the columns of \(A\text{,}\) and conversely, every such linear combination is an output of \(T\text{.}\) Second, since the range of \(T\) can be written as a span, it is automatically a subspace of \(\R^m\) according to Theorem 5.3.5. The range of a linear transformation is one of the more important examples of a subspace.
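For example,
\begin{equation*}
\bbm 1\amp 2\\3\amp 4\ebm\bbm 5\\6\ebm = 5\bbm 1\\3\ebm + 6\bbm 2\\4\ebm = \bbm 17\\39\ebm\text{,}
\end{equation*}
so the output \(\bbm 17\\39\ebm\) belongs to the span of the columns \(\bbm 1\\3\ebm\) and \(\bbm 2\\4\ebm\text{.}\)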
To give a more useful description of the column space, we rely on Theorem 5.4.2 below, whose proof is too technical for this text. To help with the statement of this theorem, we first introduce one more bit of terminology. We will call a column of a matrix \(A\) a pivot column if the corresponding column in the reduced row echelon form of \(A\) contains a leading 1.
Theorem 5.4.2. Basis for the column space of a matrix.
A basis for the column space of an \(m\times n\) matrix \(A\) is given by the set of pivot columns of \(A\text{.}\)
We will illustrate Theorem 5.4.2 with an example. It’s important to note that while we need to find the reduced row echelon form of \(A\) in order to find the pivot columns, the columns we want are those of the original matrix \(A\text{,}\) not its RREF.
Example 5.4.3. Finding a basis for the column space.
Determine a basis for the column space of the matrix
Row reducing \(A\) to its reduced row echelon form \(R\text{,}\) we note that \(R\) has leading 1s in columns 1, 2, and 3. It follows that
\begin{equation*}
B = \left\{\bbm 1\\2\\-1\ebm, \bbm 0\\-1\\1\ebm, \bbm 2\\0\\3\ebm\right\}
\end{equation*}
is a basis for \(\operatorname{col}(A)\text{.}\)
Let’s make a few observations about the previous example. Notice that we have three leading 1s, so \(\operatorname{rank}(A) = 3\text{.}\) In particular, there is a leading 1 in each row, so we’re guaranteed that the system \(\ttaxb\) is consistent, no matter what the vector \(\vec{b}\) is. Since the number of pivot columns of \(A\) is equal to the number of leading 1s, we obtain the following result.
Theorem 5.4.4. Dimension of the column space.
The dimension of the column space of a matrix \(A\) (or equivalently, the dimension of the range of the matrix transformation defined by \(A\)) is equal to the rank of \(A\text{.}\)
To see why this result can be useful, note that in the previous example, the matrix transformation \(T(\vx) = A\vx\) defines a linear transformation \(T:\R^4\to \R^3\text{.}\) There are three vectors in the basis for \(\operatorname{col}(A)\text{,}\) so the column space of \(A\) (and thus the range of \(T\)) is three-dimensional. The only three-dimensional subspace of \(\R^3\) is \(\R^3\) itself, so the range of \(T\) is all of \(\R^3\text{:}\) no matter what vector \(\vec b\in\R^3\) we choose, we’re guaranteed to be able to find a vector \(\vx \in \R^4\) such that \(A\vx = \vec b\text{.}\)
The key observation here is that the question “Does \(\ttaxb\) have a solution?” is equivalent to the question “Does the vector \(\vec{b}\) belong to \(\operatorname{col}(A)\text{?}\)” Unfortunately, while we may gain some insight from noticing that these questions are the same, we are no further ahead when it comes to answering them. Whatever version we prefer, the only way to get an answer is to compute the reduced row echelon form of \(\left[\begin{array}{c|c} A\amp \vec{b}\end{array}\right]\text{.}\)
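For instance (a small example chosen here for illustration), suppose \(A = \bbm 1\amp 2\\2\amp 4\ebm\) and \(\vec{b} = \bbm 1\\3\ebm\text{.}\) Row reducing the augmented matrix, we find
\begin{equation*}
\left[\begin{array}{cc|c} 1\amp 2\amp 1\\2\amp 4\amp 3\end{array}\right] \to \left[\begin{array}{cc|c} 1\amp 2\amp 1\\0\amp 0\amp 1\end{array}\right]\text{,}
\end{equation*}
which is inconsistent, so \(\vec{b}\) does not belong to \(\operatorname{col}(A) = \operatorname{span}\left\{\bbm 1\\2\ebm\right\}\text{.}\)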
Suppose we repeated Example 5.4.3 using the matrix \(A\) from Example 3.6.9. Both cases involved a matrix of size \(3\times 4\text{,}\) but the matrix from Example 3.6.9 had rank 2, so the column space of \(A\) is only two-dimensional. In this case, the system \(\ttaxb\) will be consistent if \(\vec{b}\) belongs to the span of the first two columns of \(A\text{,}\) and inconsistent otherwise.
Reading off the first two columns of \(A\text{,}\) we find that \(\operatorname{col}(A)\) is the span of those two columns. We know that this is a plane through the origin in \(\mathbb{R}^3\text{,}\) but how do we quickly determine what vectors belong to this plane? There’s an easy way and a hard way. The easy way is to compute the cross product of the two columns, as we did in many of the problems from Section 2.6. The cross product gives us a normal vector for the plane, and we find that a vector \(\bbm a\\b\\c\ebm\) belongs to the column space of \(A\) if and only if
\begin{equation*}
-4a+2b+c=0,
\end{equation*}
using the scalar form for the equation of a plane in \(\mathbb{R}^3\text{.}\) Having done it the easy way, let us do things once more the hard way. (Why do it the hard way if the easy way works? Because if we’re in any other case than a two-dimensional subspace of \(\R^3\text{,}\) the hard way is the only option we have!) The hard way is to solve the equation \(\ttaxb\) for an arbitrary vector \(\vb = \bbm a\\b\\c\ebm\text{.}\) As with the previous examples, we set up the augmented matrix and reduce:
We stopped before getting all the way to the reduced row echelon form, but we’re far enough along to realize that the only way our system can be consistent is if the last entry in the third row is equal to zero. This gives us the condition
\begin{equation*}
-\tfrac{2}{5}a+\tfrac{1}{5}b+\tfrac{1}{10}c=0\text{,}
\end{equation*}
which (after multiplying both sides by 10) is exactly the same as what we found using the cross product.
Subsection 5.4.3 Null space
The other important example of a subspace associated with a matrix is the null space. The null space of an \(m\times n\) matrix \(A\) is simply the set of all vectors \(\vec{x}\in\R^n\) such that \(A\vec x = \vec 0\text{.}\)
Definition 5.4.5. The null space of a matrix.
The null space of an \(m\times n\) matrix \(A\) is denoted by \(\operatorname{null}(A)\text{,}\) and defined by
\begin{equation*}
\operatorname{null}(A) = \{\vec x\in\R^n \,|\, A\vec x = \vec 0\}\text{.}
\end{equation*}
For example, we saw in Example 5.1.5 that the vector \(\vec x = \bbm 1\\1\ebm\) belongs to the null space of the matrix \(A = \bbm 1\amp -1\\1\amp -1\ebm\text{,}\) since
\begin{equation*}
A\vec x = \bbm 1\amp -1\\1\amp -1\ebm\bbm 1\\1\ebm = \bbm 1-1\\1-1\ebm = \bbm 0\\0\ebm\text{.}
\end{equation*}
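On the other hand, the vector \(\bbm 1\\0\ebm\) does not belong to \(\operatorname{null}(A)\text{,}\) since
\begin{equation*}
\bbm 1\amp -1\\1\amp -1\ebm\bbm 1\\0\ebm = \bbm 1\\1\ebm \neq \bbm 0\\0\ebm\text{.}
\end{equation*}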
Given a general \(m\times n\) matrix \(A\text{,}\) we know from Section 3.6 that determining the null space amounts to simply solving a homogeneous system of linear equations. Let us see how this works in an example.
Example 5.4.6. Determining the null space of a matrix.
Since the null space of \(A\) is equal to the set of all solutions \(\vx\) to the matrix equation \(A\vec x = \vec 0\text{,}\) we proceed by forming the proper augmented matrix and putting it into reduced row echelon form, which we do below.
We see that the null space of \(A\) contains infinitely many solutions to the equation \(A\vx = \vec 0\text{;}\) any choice of \(x_2\) gives us one of these solutions. For instance, picking \(x_2=2\) gives the solution
This is a particularly nice solution, since there are no fractions! In fact, since the parameter \(t\) can take on any real value, there is nothing preventing us from defining a new parameter \(s = t/2\text{,}\) and then
Our solutions are multiples of a vector, and hence we can graph this, as done in Figure 5.4.7.
In Example 5.4.6, we saw that the solution is a line through the origin, and thus, we can conclude that \(\operatorname{null}(A)\) is a subspace! In fact, this is no coincidence: it is guaranteed by our next theorem.
Theorem 5.4.8. The null space of a matrix is a subspace.
For any \(m\times n\) matrix \(A\text{,}\) \(\operatorname{null}(A)\) is a subspace of \(\R^n\text{.}\)
The proof of this theorem is simple. First, note that \(\operatorname{null}(A)\) is nonempty, since \(A\vec 0 = \vec 0\) gives \(\vec 0\in\operatorname{null}(A)\text{.}\) Now suppose \(\vec x, \vec y\in \operatorname{null}(A)\text{.}\) By definition, this means \(A\vec x = \vec 0\) and \(A\vec y = \vec 0\text{.}\) Using the properties of matrix multiplication, we have
\begin{equation*}
A(\vec x + \vec y) = A\vec x + A\vec y = \vec 0 + \vec 0 = \vec 0\text{,}
\end{equation*}
so \(\vec x + \vec y\in \operatorname{null}(A)\text{.}\) Similarly, for any scalar \(c\text{,}\)
\begin{equation*}
A(c\vec x) = c(A\vec x) = c\vec 0 = \vec 0\text{,}
\end{equation*}
so \(c\vec x\in \operatorname{null}(A)\text{.}\) It follows from the definition of a subspace that \(\operatorname{null}(A)\) is a subspace of \(\R^n\text{.}\)
In Section 2.7 we discussed the fact that whenever we have a subspace of \(\R^n\text{,}\) it can be useful to determine a basis for our subspace. Recall from Definition 3.6.8 that the general solution to a homogeneous system of linear equations can be written in terms of certain basic solutions. In the context of null spaces, these basic solutions are just such a basis.
Although we will not prove it here, the basic solutions to a homogeneous system are always linearly independent. Moreover, it follows from the definition of \(\operatorname{null}(A)\) that any \(\vx \in\operatorname{null}(A)\) can be written as a linear combination of the basic solutions. In the language of Section 2.7, the basic solutions to a homogeneous system \(A\vx = \vec 0\) form a basis for the null space of \(A\text{.}\) This is an important point to remember, so we emphasize it in the following Key Idea.
Key Idea 5.4.9. Basis for the null space of a matrix.
The basic solutions to the homogeneous system \(A\vec{x}=\vec{0}\) form a basis for the null space of \(A\text{.}\) That is, if \(\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\) are the basic solutions to \(A\vec{x}=\vec{0}\text{,}\) then
\begin{equation*}
\operatorname{null}(A) = \operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}\text{.}
\end{equation*}
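For instance (a small example we include for illustration), if \(A = \bbm 1\amp 2\ebm\text{,}\) then solving \(A\vx = \vec 0\) gives \(x_1 = -2x_2\text{,}\) so setting the free variable \(x_2 = t\) yields
\begin{equation*}
\vx = \bbm -2t\\t\ebm = t\bbm -2\\1\ebm\text{,}\quad\text{so}\quad \operatorname{null}(A) = \operatorname{span}\left\{\bbm -2\\1\ebm\right\}\text{,}
\end{equation*}
and the single basic solution \(\bbm -2\\1\ebm\) is a basis for \(\operatorname{null}(A)\text{.}\)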
Again, determining \(\operatorname{null}(A)\) is the same as solving the homogeneous system \(A\vx = \vec 0\text{,}\) and by Key Idea 5.4.9, a basis for \(\operatorname{null}(A)\) is given by the basic solutions to this system. As usual, to find the basic solutions, we set up the augmented matrix of the system and reduce:
are the basic solutions to \(A\vx = \vec 0\text{.}\) Since the null space of \(A\) is equal to the set of solutions to \(A\vx = \vec 0\text{,}\) and since every solution to \(A\vx = \vec 0\) can be written in terms of \(\vvv\) and \(\vec{w}\text{,}\) it follows that
and that \(\{\vvv, \vec{w}\}\) is a basis for \(\operatorname{null}(A)\text{.}\)
Another reason the null space is interesting is that it lets us determine whether or not a linear transformation is one-to-one. Suppose \(T:\R^n\to\R^m\) is a linear transformation defined by \(T(\vec x) = A\vec x\text{.}\) We know that \(T(\vec 0) = \vec 0\text{,}\) so \(\vec 0\in \operatorname{null}(A)\) (as it must be, since \(\operatorname{null}(A)\) is a subspace). If we have any non-zero vector \(\vec v\in\operatorname{null}(A)\text{,}\) then \(T\) cannot be one-to-one, since we’d have
\begin{equation*}
T(\vec v) = A\vec v = \vec 0 = A\vec 0 = T(\vec 0)\text{,}
\end{equation*}
even though \(\vec v\neq \vec 0\text{.}\)
Thus, if \(\operatorname{null}(A)\neq \{\vec 0\}\text{,}\) then \(T\) is not one-to-one. On the other hand, suppose \(\operatorname{null}(A)=\{\vec 0\}\text{,}\) and that \(T(\vec x) = T(\vec y)\) for vectors \(\vec x, \vec y\in \R^n\text{.}\) Then we have
\begin{equation*}
\vec 0 = T(\vec x) - T(\vec y) = A\vec x - A\vec y = A(\vec x - \vec y)\text{,}
\end{equation*}
so that \(\vec x - \vec y\in \operatorname{null}(A) = \{\vec 0\}\text{,}\) which means that \(\vec x - \vec y = \vec{0}\text{,}\) and thus \(\vec x = \vec y\text{.}\) We have proved the following:
Theorem 5.4.11. Null space and one-to-one transformations.
Let \(T:\R^n\to\R^m\) be defined by \(T(\vec x) = A\vec x\) for some \(m\times n\) matrix \(A\text{.}\) Then \(T\) is one-to-one if and only if \(\operatorname{null}(A) = \{\vec 0\}\text{.}\)
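For example, we saw above that the matrix \(A = \bbm 1\amp -1\\1\amp -1\ebm\) has the non-zero vector \(\bbm 1\\1\ebm\) in its null space, so the transformation \(T(\vec x) = A\vec x\) is not one-to-one: indeed,
\begin{equation*}
T\bbm 1\\1\ebm = \bbm 0\\0\ebm = T\bbm 0\\0\ebm\text{.}
\end{equation*}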
The final result we’ll state provides an interesting (and powerful) relationship between the null and column spaces.
Theorem 5.4.12. The Fundamental Theorem of Linear Transformations.
Let \(T:\R^n\to\R^m\) be a linear transformation defined by \(T(\vec x) = A\vec x\) for some \(m\times n\) matrix \(A\text{.}\) Then
\begin{equation*}
\dim \operatorname{col}(A) + \dim \operatorname{null}(A) = n\text{.}
\end{equation*}
This result is sometimes known as the “rank-nullity theorem”; it gives the relationship between the rank of a matrix \(A\text{,}\) which is equal to the dimension of its column space, and the nullity of \(A\text{,}\) which is defined to be the dimension of its null space.
A formal proof of this result is beyond the scope of this course, but the intuition we’ve gained from solving systems should make it plausible. Recall that Item 3 in Theorem 3.6.15 on the relationship between the rank of a matrix and types of solutions gives us the equation
\begin{equation*}
k + \operatorname{rank}(A) = n\text{,}
\end{equation*}
where \(k\) is the number of parameters in the general solution of \(A\vx = \vec b\text{.}\) Now, we know from Definition 3.6.8 that the number of parameters in the general solution to \(\ttaxb\) is equal to the number of basic solutions to the system \(A\vx = \vec 0\text{,}\) and that the basic solutions to \(\ttaxo\) form a basis for \(\operatorname{null}(A)\text{.}\) From this, we can conclude that
\begin{equation*}
k = \dim \operatorname{null}(A)\text{.}
\end{equation*}
We also claimed in Theorem 5.4.4 above that the rank of \(A\) is equal to the dimension of its column space. Putting these facts together, we can see why the rank-nullity theorem must hold.
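As a quick check, consider Example 5.4.3: there \(A\) was a \(3\times 4\) matrix (so \(n = 4\)) with \(\operatorname{rank}(A) = 3\text{.}\) The column space was three-dimensional, and the general solution to \(A\vx = \vec 0\) has \(4-3 = 1\) parameter, so the null space is one-dimensional. Sure enough,
\begin{equation*}
\dim \operatorname{col}(A) + \dim \operatorname{null}(A) = 3 + 1 = 4 = n\text{.}
\end{equation*}
Let’s confirm that the result holds in one more example.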
In the example below, we determine a basis for the null space of a matrix \(A\text{,}\) and whether or not the vector \(\vec{b}\) belongs to the column space of \(A\text{.}\)
Solution.
We’ll tackle the null space first. We form the augmented matrix for the system \(\ttaxo\text{,}\) put it into reduced row echelon form, and interpret the result.
It is easy to confirm that both \(\vu\) and \(\vvv\) are solutions to the linear system \(\tta\vx=\zero\text{.}\) (Just multiply \(\tta\vu\) and \(\tta\vvv\) and see that both are \(\zero\text{.}\)) Since both are solutions to a homogeneous system of linear equations, any linear combination of \(\vu\) and \(\vvv\) will be a solution, too. Moreover, \(\vu\) and \(\vvv\) are the basic solutions to this system, so by Key Idea 5.4.9 they form a basis for the null space of \(A\text{.}\)
Now let’s tackle the column space. Determining whether or not \(\vec b\) belongs to the column space is the same as solving the system \(\tta\vx = \vb\text{.}\) Once again we put the associated augmented matrix into reduced row echelon form and interpret the results.
Again, we pull apart this vector, but this time we break it into three vectors: one with “\(s\)” stuff, one with “\(t\)” stuff, and one with just constants.
Note that \(\tta\vec{x}_p = \vb\text{;}\) by itself, \(\vec{x}_p\) is a solution. The fact that we have at least one vector \(\vec{x}_p\) such that \(A\vec{x}_p=\vec{b}\) tells us that \(\vb\) belongs to the range of the transformation \(T(\vx) = A\vx\text{.}\) The fact that there is more than one solution corresponds to the fact that the null space of \(A\) is non-trivial.
Why don’t we graph this solution as we did in the past? Before we had only two variables, meaning the solution could be graphed in 2D. Here we have four variables, meaning that our solution “lives” in 4D. You can draw this on paper, but it is very confusing.
For further verification of Theorem 5.4.12, the reader is encouraged to revisit the examples of Section 3.6 and re-interpret them in the context of null space and column space.
Exercises 5.4.4 Exercises
Exercise Group.
Determine a basis for the null space and column space of the given matrix, and verify Theorem 5.4.12.