Section 2.2 Kernel and Image
Given any linear transformation \(T:V\to W\) we can associate two important sets: the kernel of \(T\) (also known as the nullspace), and the image of \(T\) (also known as the range).
Definition 2.2.1.
Let \(T:V\to W\) be a linear transformation. The kernel of \(T\text{,}\) denoted \(\ker T\text{,}\) is defined by
\begin{equation*}
\ker T = \{\vv\in V \,|\, T(\vv)=\zer\}\text{.}
\end{equation*}
The image of \(T\text{,}\) denoted \(\im T\text{,}\) is defined by
\begin{equation*}
\im T = \{T(\vv) \,|\, \vv\in V\}\text{.}
\end{equation*}
Theorem 2.2.4.
For any linear transformation \(T:V\to W\text{,}\)
\(\ker T\) is a subspace of \(V\text{.}\)
\(\im T\) is a subspace of \(W\text{.}\)
Strategy.
Both parts of the proof rely on the
Subspace Test. So for each set, we first need to explain why it contains the zero vector. Next comes closure under addition: assume you have to elements of the set, then use the definition to explain what that means.
Now you have to show that the sum of those elements belongs to the set as well. It’s fairly safe to assume that this is going to involve the addition property of a linear transformation!
Scalar multiplication is handled similarly, but using the scalar multiplication property of \(T\text{.}\)
Proof.
To show that \(\ker T\) is a subspace, first note that \(\zer\in \ker T\text{,}\) since \(T(\zer)=\zer\) for any linear transformation \(T\text{.}\) Now, suppose that \(\vv,\ww\in \ker T\text{;}\) this means that \(T(\vv)=\zer\) and \(T(\ww)=0\text{,}\) and therefore,
\begin{equation*}
T(\vv+\ww)=T(\vv)+T(\ww)=\zer+\zer=\zer\text{.}
\end{equation*}
Similarly, for any scalar \(c\text{,}\) if \(\vv\in \ker T\) then \(T(\vv)=\zer\text{,}\) so
\begin{equation*}
T(c\vv)=cT(\vv)=c\zer=\zer\text{.}
\end{equation*}
By the subspace test, \(\ker T\) is a subspace.
Again, since \(T(\zer)=\zer\text{,}\) we see that \(\zer\in \im T\text{.}\) (That is, \(T(\zer_V)=\zer_W\text{,}\) so \(\zer_W\in \im T\text{.}\)) Now suppose that \(\ww_1,\ww_2\in \im T\text{.}\) This means that there exist \(\vv_1,\vv_2\in V\) such that \(T(\vv_1)=\ww_1\) and \(T(\vv_2)=\ww_2\text{.}\) It follows that
\begin{equation*}
\ww_1+\ww_2 = T(\vv_1)+T(\vv_2) = T(\vv_1+\vv_2)\text{,}
\end{equation*}
so \(\ww_1+\ww_2\in \im T\text{,}\) since it’s the image of \(\vv_1+\vv_2\text{.}\) Similarly, if \(c\) is any scalar and \(\ww=T(\vv)\in\im T\text{,}\) then
\begin{equation*}
c\ww=cT(\vv)=T(c\vv)\text{,}
\end{equation*}
so \(c\ww\in \im T\text{.}\)
Example 2.2.5. Null space and column space.
A familiar setting that you may already have encountered in a previous linear algebra course (or
Example 2.1.4) is that of a matrix transformation. Let
\(A\) be an
\(m\times n\) matrix. Then we can define
\(T:\R^n\to \R^m\) by
\(T(\xx)=A\xx\text{,}\) where elements of
\(\R^n,\R^m\) are considered as column vectors. We then have
\begin{equation*}
\ker T = \nll(A) = \{\xx\in \R^n \,|\, A\xx=\zer\}
\end{equation*}
and
\begin{equation*}
\im T = \csp(A) = \{A\xx\,|\, \xx\in \R^n\}\text{,}
\end{equation*}
where \(\csp(A)\) denotes the column space of \(A\text{.}\) Recall further that if we write \(A\) in terms of its columns as
\begin{equation*}
A = \bbm C_1 \amp C_2 \amp \cdots \amp C_n\ebm
\end{equation*}
and a vector \(\xx\in \R^n\) as \(\xx=\bbm x_1\\x_2\\\vdots \\x_n\ebm\text{,}\) then
\begin{equation*}
A\xx = x_1C_1+x_2C_2+\cdots +x_nC_n\text{.}
\end{equation*}
Thus, any element of \(\csp(A)\) is a linear combination of its columns, explaining the name column space.
Determining \(\nll(A)\) and \(\csp(A)\) for a given matrix \(A\) is, unsurprisingly, a matter of reducing \(A\) to row-echelon form. Finding \(\nll(A)\) comes down to describing the set of all solutions to the homogeneous system \(A\xx=\zer\text{.}\) Finding \(\csp(A)\) relies on the following theorem.
Theorem 2.2.6.
Let \(A\) be an \(m\times n\) matrix with columns \(C_1,C_2,\ldots, C_n\text{.}\) If the reduced row-echelon form of \(A\) has leading ones in columns \(j_1,j_2,\ldots, j_k\text{,}\) then
\begin{equation*}
\{C_{j_1},C_{j_2},\ldots, C_{j_k}\}
\end{equation*}
is a basis for \(\csp(A)\text{.}\)
For a proof of this result, see Section 5.4 of Linear Algebra with Applications, by Keith Nicholson. The proof is fairly long and technical, so we omit it here.
Example 2.2.7.
Consider the linear transformation \(T:\R^4\to \R^3\) defined by the matrix
\begin{equation*}
A = \bbm 1 \amp 3 \amp 0 \amp -2\\
-2 \amp -1 \amp 2 \amp 0\\
1 \amp 8 \amp 2 \amp -6\ebm\text{.}
\end{equation*}
Let’s determine the
RREF of
\(A\text{:}\)
We see that there are leading ones in the first and second column. Therefore, \(\csp(A) = \im(T) = \spn\left\{\bbm 1\\-2\\1\ebm, \bbm 3\\-1\\8\ebm\right\}\text{.}\) Indeed, note that
\begin{equation*}
\bbm 0\\2\\2\ebm = -\frac65\bbm 1\\-2\\1\ebm + \frac25\bbm 3\\-1\\8\ebm
\end{equation*}
and
\begin{equation*}
\bbm -2\\0\\-6\ebm = \frac25\bbm 1\\-2\\1\ebm -\frac45\bbm 3\\-1\\8\ebm\text{,}
\end{equation*}
so that indeed, the third and fourth columns are in the span of the first and second.
Furthermore, we can determine the nullspace: if \(A\xx=\zer\) where \(\xx=\bbm x_1\\x_2\\x_3\\x_4\ebm\text{,}\) then we must have
\begin{align*}
x_1 \amp =\frac65 x_3-\frac25 x_4\\
x_2 \amp =-\frac25 x_3+\frac 45 x_4\text{,}
\end{align*}
so
\begin{equation*}
\xx = \bbm \frac65x_3-\frac25x_4\\ -\frac25x_3+\frac45x_4\\x_3\\x_4\ebm = \frac{x_3}{5}\bbm 6\\-2\\5\\0\ebm + \frac{x_4}{5}\bbm -2\\4\\0\\5\ebm\text{.}
\end{equation*}
It follows that a basis for \(\nll(A)=\ker T\) is \(\left\{\bbm 6\\-2\\5\\0\ebm, \bbm -2\\4\\0\\5\ebm\right\}\text{.}\)
An important result that comes out while trying to show that the “pivot columns” of a matrix (the ones that end up with leading ones in the
RREF) are a basis for the column space is that the column rank (defined as the dimension of
\(\csp(A)\)) and the row rank (the dimension of the space spanned by the rows of
\(A\)) are equal. One can therefore speak unambiguously about the
rank of a matrix
\(A\text{,}\) and it is just as it’s defined in a first course in linear algebra: the number of leading ones in the
RREF of
\(A\text{.}\)
For a general linear transformation, we can’t necessarily speak in terms of rows and columns, but if \(T:V\to W\) is linear, and either \(V\) or \(W\) is finite-dimensional, then we can define the rank of \(T\) as follows.
Definition 2.2.9.
Note that if \(W\) is finite-dimensional, then so is \(\im T\text{,}\) since it’s a subspace of \(W\text{.}\) On the other hand, if \(V\) is finite-dimensional, then we can find a basis \(\{\vv_1,\ldots, \vv_n\}\) of \(V\text{,}\) and the set \(\{T(\vv_1),\ldots, T(\vv_n)\}\) will span \(\im T\text{,}\) so again the image is finite-dimensional, so the rank of \(T\) is finite. It is possible for either the rank or the nullity of a transformation to be infinite.
Knowing that the kernel and image of an operator are subspaces gives us an easy way to define subspaces. From the textbook, we have the following nice example.
Exercise 2.2.10.
Let \(T:M_{nn}\to M_{nn}\) be defined by \(T(A)=A-A^T\text{.}\) Show that:
(a)
\(T\) is a linear map.
Hint.
You can use the fact that the transpose is linear: \((A+B)^T=A^T+B^T\) and \((cA)^T=cA^T\text{.}\)
(b)
\(\ker T\) is equal to the set of all symmetric matrices.
Hint.
A matrix is symmetric if \(A^T=A\text{,}\) or in other words, \(A-A^T=0\text{.}\)
(c)
\(\im T\) is equal to the set of all skew-symmetric matrices.
Hint.
A matrix is skew-symmetric if \(A^T=-A\text{.}\)
Recall that a function \(f:A\to B\) is injective (or one-to-one) if for any \(x_1,x_2\in A\text{,}\) \(f(x_1)=f(x_2)\) implies that \(x_1=x_2\text{.}\) (In other words, no two different inputs give the same output.) We say that \(f\) is surjective (or onto) if \(f(A)=B\text{.}\) (That is, if the range of \(f\) is the entire codomain \(B\text{.}\)) These properties are important considerations for the discussion of inverse functions.
For a linear transformation \(T\text{,}\) the property of surjectivity is tied to \(\im T\) by definition: \(T:V\to W\) is onto if \(\im T = W\text{.}\) What might not be immediately obvious is that the kernel tells us if a linear transformation is injective.
Theorem 2.2.11.
Let \(T:V\to W\) be a linear transformation. Then \(T\) is injective if and only if \(\ker T = \{\zer\}\text{.}\)
Strategy.
We have an “if and only if” statement, so we have to make sure to consider both directions. The basic idea is this: we know that \(\zer\) is always in the kernel, so if the kernel contains any other vector \(\vv\text{,}\) we would have \(T(\vv)=T(\zer)=\zer\text{,}\) and \(T\) would not be injective.
There is one trick to keep in mind: the statement \(T(\vv_1)=T(\vv_2)\) is equivalent to \(T(\vv_1)-T(\vv_2)=\zer\text{,}\) and since \(T\) is linear, \(T(\vv_1)-T(\vv_2)=T(\vv_1-\vv_2)\text{.}\)
Proof.
Suppose \(T\) is injective, and let \(\vv\in \ker T\text{.}\) Then \(T(\vv)=\zer\text{.}\) On the other hand, we know that \(T(\zer)=\zer=T(\vv)\text{.}\) Since \(T\) is injective, we must have \(\vv=\zer\text{.}\) Conversely, suppose that \(\ker T = \{0\}\) and that \(T(\vv_1)=T(\vv_2)\) for some \(\vv_1,\vv_2\in V\text{.}\) Then
\begin{equation*}
\zer = T(\vv_1)-T(\vv_2) = T(\vv_1-\vv_2)\text{,}
\end{equation*}
so \(\vv_1-\vv_2\in \ker T\text{.}\) Therefore, we must have \(\vv_1-\vv_2=\zer\text{,}\) so \(\vv_1=\vv_2\text{,}\) and it follows that \(T\) is injective.
Exercise 2.2.12.
Rearrange the blocks below to produce a valid proof of the following statement:
If \(T:V\to W\) is injective and \(\{\vv_1,\vv_2,\ldots, \vv_n\}\) is linearly independent in \(V\text{,}\) then \(\{T(\vv_1),T(\vv_2),\ldots, T(\vv_n)\}\) is linearly independent in \(W\text{.}\)
Suppose \(T:V\to W\) is injective and \(\{\vv_1,\ldots, \vv_n\}\subseteq V\) is independent.
---
Assume that \(c_1T(\vv_1)+\cdots + c_nT(\vv_n)=\zer\text{,}\) for some scalars \(c_1,c_2,\ldots, c_n\text{.}\)
---
Since \(T\) is linear,
\begin{align*}
\zer \amp = c_1T(\vv_1)+\cdots + c_nT(\vv_n)\\
\amp =T(c_1\vv_1+ \ldots + c_n\vv_n)\text{.}
\end{align*}
---
Therefore, \(c_1\vv_1+\cdots + c_n\vv_n\in \ker T\text{.}\)
---
Since \(T\) is injective, \(\ker T = \{\zer\}\text{.}\)
---
Therefore, \(c_1\vv_1+\cdots + c_n\vv_n=\zer\text{.}\)
---
Since \(\{\vv_1,\ldots, \vv_n\}\) is independent, we must have \(c_1=0,\ldots, c_n=0\text{.}\)
---
It follows that \(\{T(\vv_1),\ldots, T(\vv_n)\}\) is linearly independent.
Hint.
Remember that your goal is to show that \(\{T(\vv_1),\ldots, T(\vv_n)\}\) is linearly independent, so after you assume your hypotheses, you should begin by setting a linear combination of these vectors equal to zero.
Exercise 2.2.13.
Rearrange the blocks below to produce a valid proof of the following statement:
If \(T:V\to W\) is surjective and \(V=\spn\{\vv_1,\ldots, \vv_n\}\text{,}\) then \(W = \spn\{T(\vv_1),\ldots, T(\vv_n)\}\text{.}\)
Suppose \(T\) is surjective, and \(\{\vv_1,\ldots, \vv_n\}\) is independent.
---
Let \(\ww\in W\) be any vector.
---
Since \(T\) is a surjection, there is some \(\vv\in V\) such that \(T(\vv)=\ww\text{.}\)
---
Since \(V=\spn\{\vv_1,\ldots, \vv_n\}\) and \(\vv\in V\text{,}\) there are scalars \(c_1,\ldots, c_n\) such that \(\vv=c_1\vv_1+\cdots +c_n\vv_n\text{.}\)
---
Since \(T\) is linear,
\begin{align*}
\ww \amp = T(\vv) \\
\amp = T(c_1\vv_1+\cdots +c_n\vv_n)\\
\amp = c_1T(\vv_1)+\cdots +c_nT(\vv_n)\text{,}
\end{align*}
so \(\ww\in \spn\{T(\vv_1),\ldots, T(\vv_n)\}\text{.}\)
---
Therefore, \(W\subseteq \spn\{T(\vv_1),\ldots, T(\vv_n)\}\text{,}\) and since \(\spn\{T(\vv_1),\ldots, T(\vv_n)\}\subseteq W\text{,}\) we have \(W=\spn\{T(\vv_1),\ldots, T(\vv_n)\}\text{.}\)
Hint.
You need to show that any \(\ww\in W\) can be written as a linear combination of the \(T(\vv_i)\text{.}\) Because \(T\) is surjective, you know that \(\ww=T(\vv)\) for some \(\vv\in V\text{.}\)
Theorem 2.2.15. Dimension Theorem.
Let \(T:V\to W\) be any linear transformation such that \(\ker T\) and \(\im T\) are finite-dimensional. Then \(V\) is finite-dimensional, and
\begin{equation*}
\dim V = \dim \ker T + \dim \im T\text{.}
\end{equation*}
Proof.
The trick with this proof is that we aren’t assuming \(V\) is finite-dimensional, so we can’t start with a basis of \(V\text{.}\) But we do know that \(\im T\) is finite-dimensional, so we start with a basis \(\{\ww_1,\ldots, \ww_m\}\) of \(\im T\text{.}\) Of course, every vector in \(\im T\) is the image of some vector in \(V\text{,}\) so we can write \(\ww_i =T(\vv_i)\text{,}\) where \(\vv_i\in V\text{,}\) for \(i=1,2,\ldots, m\text{.}\)
Since
\(\{T(\vv_1),\ldots, T(\vv_m)\}\) is a basis, it is linearly independent. The results of
Exercise 2.1.1 tell us that the set
\(\{\vv_1,\ldots, \vv_m\}\) must therefore be independent.
We now introduce a basis \(\{\uu_1,\ldots, \uu_n\}\) of \(\ker T\text{,}\) which we also know to be finite-dimensional. If we can show that the set \(\{\uu_1,\ldots, \uu_n,\vv_1,\ldots, \vv_m\}\) is a basis for \(V\text{,}\) we’d be done, since the number of vectors in this basis is \(\dim\ker T + \dim \im T\text{.}\) We must therefore show that this set is independent, and that it spans \(V\text{.}\)
To see that it’s independent, suppose that
\begin{equation*}
a_1\uu_1+\cdots + a_n\uu_n+b_1\vv_1+\cdots +b_m\vv_m=\zer\text{.}
\end{equation*}
Applying \(T\) to this equation, and noting that \(T(\uu_i)=\zer\) for each \(i\text{,}\) by definition of the \(\uu_i\text{,}\) we get
\begin{equation*}
b_1T(\vv_1)+\cdots +b_mT(\vv_m)=\zer\text{.}
\end{equation*}
We assumed that the vectors \(T(\vv_i)\) were independent, so all the \(b_i\) must be zero. But then we get
\begin{equation*}
a_1\uu_1+\cdots +a_n\uu_n=\zer\text{,}
\end{equation*}
and since the \(\uu_i\) are independent, all the \(a_i\) must be zero.
To see that these vectors span, choose any \(\xx\in V\text{.}\) Since \(T(\xx)\in \im T\text{,}\) there exist scalars \(c_1,\ldots, c_m\) such that
\begin{equation}
T(\xx)=c_1T(\vv_1)+\cdots +c_mT(\vv_m)\text{.}\tag{2.2.1}
\end{equation}
We’d like to be able to conclude from this that \(\xx=c_1\vv_1+\cdots +c_m\vv_m\text{,}\) but this would be false, unless \(T\) was known to be injective (which it isn’t). Failure to be injective involves the kernel -- how do we bring that into the picture?
The trick is to realize that the reason we might have
\(\xx\neq c_1\vv_1+\cdots +c_m\vv_m\) is that we’re off by something in the kernel. Indeed,
(2.2.1) can be re-written as
\begin{equation*}
T(\xx-c_1\vv_1-\cdots -c_m\vv_m) = \zer\text{,}
\end{equation*}
so \(\xx-c_1\vv_1-\cdots -c_m\vv_m\in\ker T\text{.}\) But we have a basis for \(\ker T\text{,}\) so we can write
\begin{equation*}
\xx-c_1\vv_1-\cdots -c_m\vv_m=t_1\uu_1+\cdots +t_n\uu_n
\end{equation*}
for some scalars \(t_1,\ldots, t_n\text{,}\) and this can be rearanged to give
\begin{equation*}
\xx=t_1\uu_1+\cdots +t_n\uu_n+c_1\vv_1+\cdots + c_m\vv_m\text{,}
\end{equation*}
which completes the proof.
This is sometimes known as the Rank-Nullity Theorem, since it can be stated in the form
\begin{equation*}
\dim V = \rank T + \operatorname{nullity} T\text{.}
\end{equation*}
We will see that this result is frequently useful for providing simple arguments that establish otherwise difficult results. A basic situation where the theorem is useful is as follows: we are given
\(T:V\to W\text{,}\) where the dimensions of
\(V\) and
\(W\) are known. Since
\(\im T\) is a subspace of
\(W\text{,}\) we know from
Theorem 1.7.18 that
\(T\) is onto if and only if
\(\dim \im T = \dim W\text{.}\) In many cases it is easier to compute
\(\ker T\) than it is
\(\im T\text{,}\) and the Dimension Theorem lets us determine
\(\dim\im T\) if we know
\(\dim V\) and
\(\dim \ker T\text{.}\)
Exercise 2.2.16.
Select all statements below that are true:
If \(\vv\in \ker T\text{,}\) then \(\vv=\zer\text{.}\)
Remember that \(\vv\in \ker T\) implies \(T(\vv)=\zer\text{;}\) this does not necessarily mean \(\vv=\zer\text{.}\)
If \(T:\R^4\to\R^6\) is injective, then it is surjective.
By the dimension theorem, the dimension of \(\im T\) cannot be greater than 4, so \(T\) can never be surjective.
If \(T:\R^4\to P_3(\R)\) is injective, then it is surjective.
Correct! If \(T\) is injective, then \(\dim\ker T=0\text{,}\) so by the dimension theorem, \(\dim \im T=\dim \R^4=4\text{.}\) Since \(\dim P_3(\R)=4\) as well, \(\im T=P_3(\R)\text{.}\)
It is possible to have an injective linear transformation \(T:\R^4\to\R^3\text{.}\)
The maximum dimension of \(\im T\) is 3, so the minimum dimension of \(\ker T\) is 1.
If \(T:V\to W\) is surjective, then \(\dim V\geq \dim W\text{.}\)
Correct! If \(T\) is surjective, then \(\im T=W\text{,}\) so \(\dim V = \dim \ker T +\dim \im T = \dim \ker T+\dim W\geq \dim W\text{.}\)
A useful consequence of this result is that if we know \(V\) is finite-dimensional, we can order any basis such that the first vectors in the list are a basis of \(\ker T\text{,}\) and the images of the remaining vectors produce a basis of \(\im T\text{.}\)
Another consequence of the dimension theorem is that we must have
\begin{equation*}
\dim \ker T\leq \dim V \quad \text{ and } \quad \dim \im T\leq \dim V\text{.}
\end{equation*}
Of course, we must also have \(\dim\im T\leq \dim W\text{,}\) since \(\im T\) is a subspace of \(W\text{.}\) In the case of a matrix transformation \(T_A\text{,}\) this means that the rank of \(T_A\) is at most the minimum of \(\dim V\) and \(\dim W\text{.}\) This once again has consequences for the existence and uniqueness of solutions for linear systems with the coefficient matrix \(A\text{.}\)
We end with an exercise that is both challenging and useful. Do your best to come up with a proof before looking at the solution.
Exercise 2.2.17.
Let \(V\) and \(W\) be finite-dimensional vector spaces. Prove the following:
(a)
\(\dim V\leq \dim W\) if and only if there exists an injection \(T:V\to W\text{.}\)
Hint.
You’re dealing with an “if and only if” statement, so be sure to prove both directions. One direction follows immediately from the dimension theorem.
What makes the other direction harder is that you need to prove an existence statement. To show that a transformation with the required property exists, you’re going to need to construct it! To do so, try defining your transformation in terms of a basis.
(b)
\(\dim V\geq \dim W\) if and only if there exists a surjection \(T:V\to W\text{.}\)
Hint.
The hint from the previous part also applies here!
Exercises Exercises
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.