So far in this chapter, we’ve learned a systematic method for solving systems of linear equations. Some of the applied examples we considered in the previous section led naturally to systems of equations, and had solutions that were best interpreted in that context. Other examples, such as those involving span and linear independence, were stated in terms of vectors. In this section, we discuss how to write a system of linear equations in terms of vectors and matrices, and express solutions as vectors.
Expressing the solutions of linear systems in terms of vectors will give us additional insight into the behaviour of those systems, and provides a stepping-off point for the study of the algebra of matrices.
We have often relied on previous algebra experience to help us understand linear algebra concepts. We do that again here. Consider the equation \(ax=b\text{,}\) where \(a=3\) and \(b=6\text{.}\) If we asked one to “solve for \(x\text{,}\)” what exactly would we be asking? We would want to find a number, which we call \(x\text{,}\) where \(a\) times \(x\) gives \(b\text{;}\) in this case, we seek a number that, when multiplied by 3, returns 6. As long as \(a\neq 0\) (what if \(a\) does equal zero?), we know that we can multiply both sides of the equation by \(\frac{1}{a}\) to get \(x = \frac{1}{a}(b) = \frac{b}{a}\text{.}\)
Consider a general system of linear equations, of the form
\begin{equation}
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \amp = b_1\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \amp = b_2\\
\amp \;\;\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \amp = b_m\text{.}
\end{aligned}\tag{3.6.1}
\end{equation}
We would like to collect the coefficients \(a_{ij}\) into a matrix \(A\text{,}\) the variables into a vector \(\vx = \bbm x_1\\x_2\\\vdots\\x_n\ebm\text{,}\) and the constants into a vector \(\vb = \bbm b_1\\b_2\\\vdots\\b_m\ebm\text{,}\) and express our solution as a single vector rather than a list of numbers.
To create an analogy with the single variable equation \(ax=b\text{,}\) we ask: Is there a way to define the product \(A\vec{x}\) of a matrix and a column vector in such a way that the system (3.6.1) can be written in the form
\begin{equation*}
A\vec{x} = \vec{b}\text{?}
\end{equation*}
Fortunately for us, the answer is yes! Even better, we’ll see that the definition we give here turns out to be a special case (and motivating example) for the general definition of matrix multiplication given in Section 4.2.
The definition of the product \(A\vec{x}\) is straightforward. We want the result to be a column vector of size \(m\) (that is, with \(m\) entries), so that we can set it equal to the column vector \(\vec{b}\text{.}\) Furthermore, each entry in \(\vec{b}\) is the right-hand side of an equation in (3.6.1), so we want the corresponding entry in \(A\vec{x}\) to be the left-hand side. We are immediately forced to adopt the following rule.
Definition 3.6.1. The product \(A\vec{x}\).
The product of the matrix \(A = \bbm a_{11} \amp a_{12} \amp \cdots \amp a_{1n}\\
a_{21} \amp a_{22} \amp \cdots \amp a_{2n}\\
\vdots \amp \vdots \amp \ddots \amp \vdots\\
a_{m1} \amp a_{m2} \amp \cdots \amp a_{mn}\ebm\) and the vector \(\vec{x} = \bbm x_1\\x_2\\\vdots\\x_n\ebm\) is given by
\begin{equation*}
A\vec{x} = \bbm a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n\\
\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n\ebm\text{.}
\end{equation*}
A few remarks about Definition 3.6.1 are needed here. First, note that the number of columns in the matrix \(A\) matches the number of entries in the vector \(\vec{x}\text{,}\) and the number of rows in \(A\) matches the number of entries in the vector \(\vec{b}\text{.}\) Moreover,
The \(i\)th entry in the vector \(A\vec{x}\) is obtained by forming the dot product of row \(i\) in the matrix \(A\) (viewed as a vector in \(\R^n\)) with the vector \(\vec{x}\text{.}\)
That is, since each row of \(A\) has \(n\) entries, as does the vector \(\vec{x}\text{,}\) we can form the dot product of \(\vec{x}\) with any of the rows of \(A\text{.}\) Each such dot product forms the corresponding entry in the vector \(A\vec{x}\text{.}\)
Second, the right-hand side of the equation defining \(A\vx\) in Definition 3.6.1 can be rewritten as follows:
\begin{equation*}
A\vx = x_1\bbm a_{11}\\a_{21}\\\vdots\\a_{m1}\ebm + x_2\bbm a_{12}\\a_{22}\\\vdots\\a_{m2}\ebm + \cdots + x_n\bbm a_{1n}\\a_{2n}\\\vdots\\a_{mn}\ebm\text{.}
\end{equation*}
That is, the product \(A\vx\) is a linear combination of the columns of \(A\text{,}\) with the entries of \(\vx\) as coefficients.
Or equivalently, using the language of Section 2.7, the vector \(A\vx\) belongs to the span of the columns of \(A\text{.}\) This will be a useful observation when solving questions about span.
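If you have access to a computer, both viewpoints are easy to experiment with. The following is a small sketch using Python's NumPy library; the matrix and vector are our own illustrative choices, and `A @ x` is NumPy's built-in matrix-vector product.

```python
import numpy as np

# An illustrative matrix and vector (our own choice, not from the text).
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x = np.array([2.0, -1.0, 3.0])

# Row view: entry i of Ax is the dot product of row i of A with x.
row_view = np.array([A[i, :] @ x for i in range(A.shape[0])])

# Column view: Ax is a linear combination of the columns of A,
# with the entries of x as the coefficients.
col_view = sum(x[j] * A[:, j] for j in range(A.shape[1]))

assert np.allclose(row_view, A @ x)
assert np.allclose(col_view, A @ x)
```

Both loops recompute exactly what `A @ x` produces, confirming that the row (dot product) and column (linear combination) descriptions agree.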
To help understand what this is asking, we’ll consider an example. Let
(We don’t know what \(\vx\) is, so we have to represent its entries with the variables \(x_1\text{,}\) \(x_2\) and \(x_3\text{.}\)) Let’s “solve for \(\vx\text{,}\)” given the equation \(\ttaxb\text{.}\)
We multiply out the left hand side of this equation according to Definition 3.6.1. The first row of \(A\) is the row vector \(\vec{a}_1=\bbm 1\amp 1\amp 1\ebm\text{.}\) Treating this as a column vector, we form the dot product with \(\vec{x}\text{,}\) giving us
\begin{equation*}
\vec{a}_1\cdot\vec{x} = x_1 + x_2 + x_3\text{.}
\end{equation*}
Be sure to note that the product is just a vector; it has just one column.
If you prefer to think in terms of the columns of \(A\text{,}\) we can use the observation above, that \(A\vx\) is a linear combination of the columns of \(A\text{.}\) This gives us
This should look familiar; it is a system of linear equations! Given the matrix-vector equation \(\tta\vx=\vb\text{,}\) we can recognize \(\tta\) as the coefficient matrix from a linear system and \(\vb\) as the vector of constants from the linear system. Given a system of equations, rewriting it in matrix form is equally straightforward.
To solve a matrix–vector equation (and the corresponding linear system), we simply augment the matrix \(\tta\) with the vector \(\vb\text{,}\) put this matrix into reduced row echelon form, and interpret the results.
We convert the above linear system into an augmented matrix and find the reduced row echelon form:
The solution is rather straightforward, even though we did a lot of work before to find the answer. Form the augmented matrix \(\bbm \tta\amp \vb\ebm\) and interpret its reduced row echelon form.
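In a computer algebra system, the whole procedure is a few lines. Here is a sketch using Python's SymPy library, with an illustrative system of our own choosing (not the system from the example above):

```python
import sympy as sp

# An illustrative system A x = b (our own choice, not from the text).
A = sp.Matrix([[1, 1, 1],
               [1, 2, 4],
               [1, 3, 9]])
b = sp.Matrix([3, 7, 13])

# Form the augmented matrix [A | b] and put it in reduced row echelon form.
R, pivot_columns = A.row_join(b).rref()
print(R)
# Matrix([[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]])
# Interpreting the result: x1 = 1, x2 = 1, x3 = 1.
```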
This probably seems all well and good. While asking one to solve the equation \(\tta\vx=\vb\) for \(\vx\) seems like a new problem, in reality it is just asking that we solve a system of linear equations. Our variables \(x_1\text{,}\) etc., appear not individually but as the entries of our vector \(\vx\text{.}\) We are simply writing an old problem in a new way.
In line with this new way of writing the problem, we have a new way of writing the solution. Instead of listing, individually, the values of the unknowns, we simply list them as the elements of our vector \(\vx\text{.}\)
These are important ideas, so we state the basic principle once more: solving the equation \(\tta\vx=\vb\) for \(\vx\) is the same thing as solving a linear system of equations. Equivalently, any system of linear equations can be written in the form \(\tta\vx=\vb\) for some matrix \(\tta\) and vector \(\vb\text{.}\)
Since these ideas are equivalent, we’ll refer to \(\tta\vx=\vb\) both as a matrix–vector equation and as a system of linear equations: they are the same thing.
We’ve seen two examples illustrating this idea so far, and in both cases the linear system had exactly one solution. We know from Theorem 3.4.2 that any linear system has either one solution, infinitely many solutions, or no solution. So how does our new method of writing a solution work with infinitely many solutions and no solutions?
Certainly, if \(\tta\vx=\vb\) has no solution, we simply say that the linear system has no solution. There isn’t anything special to write. So the only other option to consider is the case where we have infinitely many solutions. We’ll learn how to handle these situations through examples.
Example 3.6.3. Finding the vector solution to a linear system.
Solve the linear system \(\tta\vx=\zero\) for \(\vx\) and write the solution in vector form, where
From above we know that \(x_1 = -2x_2\text{,}\) where \(x_2=t\text{,}\) so we replace the \(x_1\) in \(\vx\) with \(-2t\) and replace \(x_2\) by \(t\text{.}\) This gives our solution as
\begin{equation*}
\vx = \bbm x_1\\x_2\ebm = \bbm -2t\\t\ebm = t\bbm -2\\1\ebm\text{.}
\end{equation*}
Recall that since our system was consistent and had a free variable, we have infinitely many solutions. This form of the solution highlights this fact; pick any value for \(t\) and we get a different solution.
For instance, by setting \(t = -1\text{,}\) \(0\text{,}\) and \(5\text{,}\) we get the solutions
\begin{equation*}
\vx = \bbm 2\\-1\ebm,\quad \bbm 0\\0\ebm, \quad\text{and}\quad \bbm -10\\5\ebm\text{,}
\end{equation*}
respectively.
We should check our work; multiply each of the above vectors by \(\tta\) to see if we indeed get \(\zero\text{.}\) Or, we can save ourselves some time and check the general solution. We have
\begin{equation*}
\tta\left(t\bbm -2\\1\ebm\right) = t\left(\tta\bbm -2\\1\ebm\right) = t\,\zero = \zero
\end{equation*}
for every value of \(t\text{.}\)
We have officially solved this problem; we have found the solution to \(\tta\vx=\zero\) and written it properly. One final thing we will do here is graph the solution, using our skills learned in the previous section.
This means that any scalar multiple of the vector \(\vvv = \bbm -2\\1\ebm\) is a solution; we know how to sketch the scalar multiples of \(\vvv\text{.}\) This is done in Figure 3.6.4.
Here vector \(\vvv\) is drawn as well as the line that goes through the origin in the direction of \(\vvv\text{.}\) Any vector along this line is a solution. So in some sense, we can say that the solution to \(\tta\vx=\zero\) is a line.
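We can spot-check this picture numerically. The matrix below is a hypothetical choice of our own (not necessarily the matrix from the example), constructed so that the solutions to \(\tta\vx = \zero\) are exactly the multiples of \(\vvv\text{:}\)

```python
import numpy as np

# A hypothetical matrix whose solutions to Ax = 0 are the multiples of (-2, 1).
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
v = np.array([-2.0, 1.0])

# Every scalar multiple of v lands on the solution line through the origin.
for t in (-1.0, 0.0, 5.0):
    assert np.allclose(A @ (t * v), 0.0)
```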
A few comments are in order here. First, matrix equations (or the corresponding systems of linear equations) such as the one above, where the vector on the right-hand side is the zero vector, form a special case that is important enough to have its own name: these are known as homogeneous systems of equations. The formal definition is as follows.
Definition 3.6.5. Homogeneous Linear System of Equations.
A system of linear equations is homogeneous if the constant term in each equation is zero.
Note: a homogeneous system of equations can be written in vector form as \(\tta\vx = \zero\text{.}\)
The term homogeneous comes from two Greek words; homo meaning “same” and genus meaning “type.” A homogeneous system of equations is a system in which each equation is of the same type — all constants are 0. In mathematics, objects that are considered homogeneous often have some sort of scale invariance, and that is the case here: if \(\vx\) is a solution to \(\tta\vx=\zero\text{,}\) then so is \(c\vx\) for any scalar \(c\text{.}\) (Do you see why?)
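For the record, the verification is one line, using the linearity of the matrix-vector product (which follows directly from Definition 3.6.1):
\begin{equation*}
\tta(c\vx) = c(\tta\vx) = c\,\zero = \zero\text{.}
\end{equation*}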
Notice that the line \(\vx =t\vvv\) in the solution of Example 3.6.3 above passes through the origin. This is an important characteristic of homogeneous systems: since \(A\vec 0 = \vec 0\) for any matrix \(A\text{,}\) we always have (at least) the solution \(\vx = \vec 0\text{.}\) (We’ll have more to say about this below.)
Example 3.6.6. Determining the solution of a homogeneous system.
Determine the solution to the system \(A\vx = \vec 0\text{,}\) where
Again, we have infinitely many solutions to the equation \(A\vx = \vec 0\text{;}\) any choice of \(x_2\) gives us one of these solutions. For instance, picking \(x_2=2\) gives the solution
This is a particularly nice solution, since there are no fractions! In fact, since the parameter \(t\) can take on any real value, there is nothing preventing us from defining a new parameter \(s = t/2\text{,}\) and then rewriting the general solution in terms of \(s\text{;}\) this simply doubles the direction vector, clearing the fractions.
As in the previous example, our solutions are multiples of a vector, and hence we can graph this, as done in Figure 3.6.7.
In the last two examples, we saw that the general solution could be written in the form \(\vx = t\vvv\) for a vector \(\vvv\) such that \(A\vvv =\vec 0\text{.}\) Such vectors are known as the basic solutions to a homogeneous linear system.
Definition 3.6.8. Basic solution.
Let \(A\vx = \vec 0\) be a homogeneous linear system of equations with infinitely many solutions, described in terms of parameters (one for each free variable) \(t_1, t_2, \ldots, t_k\text{.}\)
The basic solutions to the system \(A\vx = \vec 0\) are the vectors \(\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\) such that the general solution to the system is given by
\begin{equation*}
\vx = t_1\vec{v}_1 + t_2\vec{v}_2 + \cdots + t_k\vec{v}_k\text{.}
\end{equation*}
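Computer algebra systems compute basic solutions directly. In SymPy, for instance, the `nullspace` method returns one basic solution per free variable; here is a sketch with an illustrative matrix of our own choosing:

```python
import sympy as sp

# An illustrative homogeneous system with two free variables (our own choice).
A = sp.Matrix([[1, 2, -1],
               [2, 4, -2]])

# nullspace() returns the basic solutions: every solution to Ax = 0
# is a linear combination of the vectors it returns.
basic = A.nullspace()
for v in basic:
    assert A * v == sp.zeros(2, 1)
print([list(v) for v in basic])   # [[-2, 1, 0], [1, 0, 1]]
```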
Let’s practice finding vector solutions again; this time, we won’t solve a system of the form \(\tta\vx=\zero\text{,}\) but instead \(\ttaxb\text{,}\) for some vector \(\vb\neq \vec 0\text{.}\) Such systems are known (unsurprisingly) as non-homogeneous systems.
This solution is different than what we’ve seen in the past two examples; we can’t simply pull out a \(t\) since there is a 3 in the first entry. Using the properties of matrix addition, we can “pull apart” this vector and write it as the sum of two vectors: one which contains only constants, and one that contains only terms involving the parameter \(t\text{.}\) We do this below.
\begin{equation*}
\vx = \bbm 3-2t\\t\ebm = \bbm 3\\0\ebm + \bbm -2t\\t\ebm = \bbm 3\\0\ebm + t\bbm -2\\1\ebm\text{.}
\end{equation*}
We still have infinitely many solutions; by picking a value for \(t\) we get one of these solutions. For instance, by letting \(t= -1\text{,}\) \(0\text{,}\) or \(2\text{,}\) we get the solutions
\begin{equation*}
\bbm 5\\-1\ebm,\quad \bbm 3\\0\ebm,\quad\text{and}\quad \bbm -1\\2\ebm\text{,}
\end{equation*}
respectively.
We have officially solved the problem; we have solved the equation \(\ttaxb\) for \(\vx\) and have written the solution in vector form. As an additional visual aid, we will graph this solution.
Each vector in the solution can be written as the sum of two vectors: \(\vec{x}_p\) and a multiple of \(\vvv\text{.}\) In Figure 3.6.11, \(\vec{x}_p\) is graphed and \(\vvv\) is graphed with its origin starting at the tip of \(\vec{x}_p\text{.}\) Finally, a line is drawn in the direction of \(\vvv\) from the tip of \(\vec{x}_p\text{;}\) any vector pointing to any point on this line is a solution to \(\ttaxb\text{.}\)
Notice that in this case our line does not pass through the origin, so the set of solutions is not a subspace. On the other hand, every solution to the system \(A\vec x = \vec b\) can be obtained by adding the vector \(\vec{x}_p\) to an element of the set of solutions to the homogeneous system \(A\vx = \vec 0\text{.}\) We’ll elaborate on this shortly.
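A small computation illustrates this structure. The system below is a hypothetical one of our own choosing, built so that its solution set is a line with the same direction vector as above:

```python
import sympy as sp

# Hypothetical non-homogeneous system (our own choice): its solution set
# should be a particular solution plus the null space of A.
A = sp.Matrix([[1, 2],
               [2, 4]])
b = sp.Matrix([3, 6])
x1, x2 = sp.symbols('x1 x2')

print(sp.linsolve((A, b), x1, x2))  # {(3 - 2*x2, x2)}: free variable x2 as parameter
print(A.nullspace())                # [Matrix([[-2], [1]])]: the homogeneous part
```

Setting the free variable \(x_2 = t\) in the first line of output gives exactly \(\vx = \bbm 3\\0\ebm + t\bbm -2\\1\ebm\text{.}\)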
The previous examples illustrate some important concepts. One is that (at least, when \(\vec x\in \R^2\) or \(\R^3\)) we can visualize the solution to a system of linear equations. Before, when we had infinitely many solutions, we knew we could arbitrarily pick values for our free variables and get different solutions. We knew this to be true, and we even practised it, but the result was not very “tangible.” Now, we can view our solution as a vector; by picking different values for our free variables, we see this as multiplying certain important vectors by a scalar which gives a different solution.
Another important concept that these examples demonstrate comes from the fact that Examples 3.6.3 and 3.6.10 were only “slightly different” and hence had only “slightly different” answers. Both solutions had
\begin{equation*}
t\bbm-2\\1\ebm
\end{equation*}
in them; in Example 3.6.10 the solution also had another vector added to this. The addition of the vector \(\vec{x}_p\) in Example 3.6.10 is needed to account for the fact that we were dealing with a non-homogeneous system of linear equations.
Recall that for a homogeneous system of linear equations, we know that \(\vx = \vec 0\) will be a solution, since no matter what the matrix \(A\) is, we can be certain that \(\tta\zero = \zero\text{.}\) This fact is important; the zero vector is always a solution to a homogeneous linear system. Therefore a homogeneous system is always consistent; we need only to determine whether we have exactly one solution (just \(\zero\)) or infinitely many solutions. This idea is important, so we give it its own box.
Key Idea 3.6.12. Homogeneous Systems and Consistency.
All homogeneous linear systems are consistent.
How do we determine if we have exactly one or infinitely many solutions? Recall Key Idea 3.4.7: if the solution has any free variables, then it will have infinitely many solutions. How can we tell if the system has free variables? Form the augmented matrix \(\bbm \tta \amp \zero \ebm\text{,}\) put it into reduced row echelon form, and interpret the result.
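In fact, since row operations never change a column of zeros, augmenting with \(\zero\) is optional: row-reducing \(\tta\) alone gives the same information. A quick SymPy check, with an illustrative matrix of our own choosing:

```python
import sympy as sp

A = sp.Matrix([[1, 2],
               [2, 4]])
aug = A.row_join(sp.zeros(2, 1))

# Row operations never touch the zero column, so the reduced row echelon
# form of [A | 0] is just [rref(A) | 0].
R_aug, _ = aug.rref()
R_A, _ = A.rref()
assert R_aug == R_A.row_join(sp.zeros(2, 1))
```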
It may seem that we’ve brought up a new question, “When does \(\tta\vx=\zero\) have exactly one or infinitely many solutions?” only to answer with “Look at the reduced row echelon form of \(\tta\) and interpret the results, just as always.” Why bring up a new question if the answer is an old one?
While the new question has an old solution, it does lead to a great idea. Let’s refresh our memory; earlier we solved two linear systems,
so \(\vec{x}_p\) is a solution. (The subscript \(p\) of “\(\vec{x}_p\)” is used to denote that this vector is a particular solution: see Definition 3.4.11.) What about the general solution \(\vec x = \vec{x}_p+t\vvv\text{?}\) Recalling that \(A\vvv=\zero\text{,}\) we have
\begin{equation*}
\tta\left(\vec{x}_p + t\vvv\right) = \tta\vec{x}_p + t\left(\tta\vvv\right) = \vb + t\,\zero = \vb
\end{equation*}
for any value of \(t\text{,}\) so there are infinitely many solutions to our system, one for each \(t\in \R\text{.}\) The whole point is that \(\vec{x}_p\) itself is a solution to \(\tta\vx = \vb\text{,}\) and we could find more solutions by adding vectors “that go to zero” when multiplied by \(\tta\text{.}\)
So we wonder: does this mean that \(\tta\vx = \vb\) will have infinitely many solutions? After all, if \(\vec{x}_p\) and \(\vec{x}_p+\vvv\) are both solutions, don’t we have infinitely many solutions?
No. If \(\tta\vx = \zero\) has exactly one solution, then \(\vvv = \zero\text{,}\) and \(\vec{x}_p = \vec{x}_p +\vvv\text{;}\) we only have one solution.
So here is the culmination of all of our fun that started a few pages back. If \(\vvv\) is a solution to \(\tta\vx=\zero\) and \(\vec{x}_p\) is a solution to \(\tta\vx=\vb\text{,}\) then \(\vec{x}_p+\vvv\) is also a solution to \(\tta\vx=\vb\text{.}\) If \(\tta\vx=\zero\) has infinitely many solutions, so does \(\tta\vx = \vb\text{;}\) if \(\tta\vx=\zero\) has only one solution, so does \(\tta\vx = \vb\text{.}\) This culminating idea is of course important enough to be stated again.
Key Idea 3.6.13. Solutions of Consistent Systems.
Let \(\tta\vx=\vb\) be a consistent system of linear equations.
If \(\tta\vx = \zero\) has exactly one solution \((\vx = \zero)\text{,}\) then \(\tta\vx = \vb\) has exactly one solution.
If \(\tta\vx =\zero\) has infinitely many solutions, then \(\tta\vx=\vb\) has infinitely many solutions.
A key word in the above statement is consistent. If \(\tta\vx = \vb\) is inconsistent (the linear system has no solution), then it doesn’t matter how many solutions \(\tta\vx = \zero\) has; \(\tta\vx=\vb\) has no solution.
We can elaborate on Key Idea 3.6.13 above, as well as Key Idea 3.4.7 from Section 3.4 by introducing one more piece of important terminology. By now it is probably clear that the leading 1s in the reduced row echelon form of a matrix play a key role in understanding the system. In fact, it turns out that we can describe all of the different possibilities for a linear system in terms of one number: the number of leading 1s in the reduced row echelon form of a matrix.
Definition 3.6.14. The rank of a matrix.
The rank of a matrix \(A\) is denoted by \(\operatorname{rank}(A)\) and defined as the number of leading 1s in the reduced row echelon form of \(A\text{.}\)
Although we do not prove it in this textbook, the reduced row echelon form of any matrix is unique; it follows from this fact that the rank of a matrix is a well-defined number. The importance of rank is outlined in the following result.
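Computationally, rank is a built-in. The following SymPy sketch, with an illustrative matrix of our own choosing, confirms that it agrees with counting leading 1s in the reduced row echelon form:

```python
import sympy as sp

A = sp.Matrix([[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]])

R, pivot_columns = A.rref()
# The rank is the number of leading 1s, i.e. the number of pivot columns.
assert A.rank() == len(pivot_columns) == 2
```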
Theorem 3.6.15. Rank and solution types.
Let \(A\) be an \(m\times n\) matrix. For any linear system \(\ttaxb\) in \(n\) variables, we have the following possibilities:
If \(\operatorname{rank}(A) \lt \operatorname{rank}\left[\begin{array}{c|c}A \amp \vec{b}\end{array}\right]\text{,}\) then the system \(\ttaxb\) is inconsistent.
If \(\operatorname{rank}(A)=\operatorname{rank}\left[\begin{array}{c|c} A \amp \vec b\end{array}\right] = n\) (where \(n\) is the number of variables), then the system \(\ttaxb\) has a unique solution.
If \(\operatorname{rank}(A)=\operatorname{rank}\left[\begin{array}{c|c} A \amp \vec b\end{array}\right] \lt n\text{,}\) then the system \(\ttaxb\) has infinitely many solutions. Moreover, the general solution to \(\ttaxb\) will involve \(k\) parameters, where
\begin{equation*}
k = n - \operatorname{rank}(A)\text{.}
\end{equation*}
To understand Item 1 above, note that if
\begin{equation*}
\operatorname{rank}(A)\lt \operatorname{rank}\left[\begin{array}{c|c} A \amp \vec b\end{array}\right]\text{,}
\end{equation*}
then there must be a leading 1 in the right-hand column of the reduced row echelon form of \(\bbm A \amp \vec b\ebm\text{,}\) meaning that we have a row of the form
\begin{equation*}
\left[\begin{array}{cccc|c} 0 \amp 0 \amp \cdots \amp 0 \amp 1\end{array}\right]\text{,}
\end{equation*}
which is exactly what we expect in a system with no solutions.
Items 2 and 3 in Theorem 3.6.15 simply give another way of stating the fact that the free variables are those variables that do not have a leading 1 in their column. This seems like an obvious fact, but it is very important. We will see in Section 5.4 that this observation leads to a major theorem, sometimes known as the Fundamental Theorem of Linear Transformations: see Theorem 5.4.12.
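Before turning to examples, here is a sketch of Theorem 3.6.15 in code, using SymPy with illustrative matrices of our own choosing:

```python
import sympy as sp

def classify(A, b):
    """Classify the system A x = b using the ranks in Theorem 3.6.15."""
    n = A.cols
    rank_A = A.rank()
    rank_Ab = A.row_join(b).rank()
    if rank_A < rank_Ab:
        return "inconsistent"
    if rank_A == n:
        return "unique solution"
    return f"infinitely many solutions with {n - rank_A} parameter(s)"

A = sp.Matrix([[1, 2], [2, 4]])
print(classify(A, sp.Matrix([3, 6])))       # infinitely many solutions with 1 parameter(s)
print(classify(A, sp.Matrix([3, 7])))       # inconsistent
print(classify(sp.eye(2), sp.Matrix([3, 7])))  # unique solution
```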
Let us explore this result with a series of examples.
Example 3.6.16. Using matrices and vectors to solve a system of equations.
\begin{align*}
x_1 \amp =-8-11r+2s+11t\\
x_2 \amp =5+7r-2s-9t\\
x_3 \amp =r \text{ is free}\\
x_4 \amp =s \text{ is free}\\
x_5 \amp =t \text{ is free}\text{.}
\end{align*}
We use this information to write \(\vx\text{,}\) again pulling it apart. Since we have three free variables and also constants, we’ll need to pull \(\vx\) apart into four separate vectors:
\begin{equation*}
\vx = \bbm x_1\\x_2\\x_3\\x_4\\x_5\ebm = \underbrace{\bbm -8\\5\\0\\0\\0\ebm}_{\vec{x}_p} + r\underbrace{\bbm -11\\7\\1\\0\\0\ebm}_{\vu} + s\underbrace{\bbm 2\\-2\\0\\1\\0\ebm}_{\vvv} + t\underbrace{\bbm 11\\-9\\0\\0\\1\ebm}_{\vec{w}}\text{.}
\end{equation*}
So \(\vec{x}_p\) is a particular solution; \(\tta\vxp = \vb\text{.}\) (Multiply it out to verify that this is true.) The other vectors, \(\vu\text{,}\) \(\vvv\) and \(\vec{w}\text{,}\) that are multiplied by our free variables \(x_3=r\text{,}\) \(x_4=s\) and \(x_5=t\text{,}\) are each solutions to the homogeneous equation \(\tta\vx=\zero\text{.}\) Any linear combination of these three vectors, i.e., any vector found by choosing values for \(r\text{,}\) \(s\) and \(t\) in \(r\vu+s\vvv+t\vec{w}\text{,}\) is a solution to \(\tta\vx=\zero\text{.}\)
This may strike us as a bit odd; we are used to having lots of different vectors in the solution. However, in this case, the linear system \(\tta\vx = \vb\) has exactly one solution, and we’ve found it. What is the solution to \(\tta\vx=\zero\text{?}\) Since we’ve only found one solution to \(\tta\vx=\vb\text{,}\) we can conclude from Key Idea 3.6.13 that the related homogeneous equation \(\tta\vx=\zero\) has only one solution, namely \(\vx = \zero\text{.}\) We can write our solution vector \(\vx\) in a form similar to our previous examples to highlight this:
Again, in light of Theorem 3.6.15, this should not be too surprising. The reduced row echelon form of \(A\) is \(\bbm 1\amp 0\\0\amp 1\ebm\text{,}\) so the rank of \(A\) is 2, and there are 2 variables in our system, so we expect \(2-2=0\) parameters in our general solution.
We immediately have a problem; we see that the second row tells us that \(0x_1+0x_2 = 1\text{,}\) a sure sign that our system does not have a solution. Thus \(\ttaxb\) has no solution. Of course, this does not mean that \(\ttaxo\) has no solution; it always has a solution.
To find the solution to \(\ttaxo\text{,}\) we interpret the reduced row echelon form of the appropriate augmented matrix.
We have no solution to \(\ttaxb\text{,}\) but infinitely many solutions to \(\ttaxo\text{.}\)
The previous example may seem to violate the principle of Key Idea 3.6.13. After all, it seems that having infinitely many solutions to \(\ttaxo\) should imply infinitely many solutions to \(\ttaxb\text{.}\) However, we remind ourselves of the key word in the idea that we observed before: consistent. If \(\ttaxb\) is consistent and \(\ttaxo\) has infinitely many solutions, then so will \(\ttaxb\text{.}\) But if \(\ttaxb\) is not consistent, it does not matter how many solutions \(\ttaxo\) has; \(\ttaxb\) is still inconsistent.
In this chapter, we developed a systematic method for solving systems of linear equations. A key tool in this method was the augmented matrix corresponding to a given system. In this final section, we’ve seen that further insight into the structure of solutions can be gained by considering our systems in terms of matrices and vectors.
In the next chapter, we will begin the study of matrices as objects unto themselves. We will see that they can be added and multiplied by scalars in exactly the same way as vectors, and in addition to this, matrices of the correct sizes can be multiplied in a way that reproduces Definition 3.6.1 above as a special case.
One question that may have occurred to you as you worked through this section is the following: in the one-variable linear equation \(ax=b\text{,}\) we know that as long as \(a\neq 0\text{,}\) we can divide both sides by \(a\text{,}\) giving us the solution \(x=b/a\text{.}\) Now, given the matrix equation \(A\vec{x}=\vec{b}\text{,}\) is there some equivalent means of “dividing by \(A\)” to obtain the solution \(\vec{x}\text{?}\) The short answer is no. Indeed, there is no such thing as matrix division; the algebraic rules for matrix multiplication are much more complicated than they are for numbers. (In particular, we’ll see that for matrices, \(AB\) is usually not the same thing as \(BA\text{!}\))
The slightly longer answer to our question might be phrased as “Sometimes. Well, sort of.” To obtain the correct (and much longer) answer, we will be led in the next chapter to the definition of the inverse of a matrix.
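As a computational footnote, numerical libraries likewise solve \(A\vx=\vb\) directly rather than “dividing by \(A\text{.}\)” Here is a NumPy sketch with an illustrative invertible matrix of our own choosing:

```python
import numpy as np

# An illustrative square, invertible system (our own choice).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

# np.linalg.solve factors A rather than computing an inverse; it
# requires A to be square and invertible.
x = np.linalg.solve(A, b)
assert np.allclose(A @ x, b)
print(x)  # [1. 3.]
```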
Exercises
Exercise Group.
A matrix \(\tta\) and vectors \(\vb\text{,}\) \(\vu\) and \(\vvv\) are given. Verify that \(\vu\) and \(\vvv\) are both solutions to the equation \(\ttaxb\text{;}\) that is, show that \(\tta\vu=\tta\vvv=\vb\text{.}\)
A matrix \(\tta\) and vectors \(\vb\text{,}\) \(\vu\) and \(\vvv\) are given. Verify that \(\tta\vu=\zero\text{,}\) \(\tta\vvv=\vb\) and \(\tta(\vu+\vvv)=\vb\text{.}\)
A matrix \(\tta\) and vector \(\vb\) are given. Solve the equation \(\ttaxb\text{,}\) write the solution in vector format, and sketch the solution as the appropriate line on the Cartesian plane.