So far in this chapter, we've learned a systematic method for solving systems of linear equations. Some of the applied examples we considered in the previous section led naturally to systems of equations, and had solutions that were best interpreted in that context. Other examples, such as those involving span and linear independence, were stated in terms of vectors. In this section, we discuss how to write a system of linear equations in terms of vectors and matrices, and express solutions as vectors.
Expressing the solutions of linear systems in terms of vectors will give us additional insight into the behaviour of those systems, and provides a stepping-off point for the study of the algebra of matrices.
We have often relied on previous algebra experience to help us understand linear algebra concepts. We do that again here. Consider the equation \(ax=b\text{,}\) where \(a=3\) and \(b=6\text{.}\) If we asked one to "solve for \(x\text{,}\)" what exactly would we be asking? We would want to find a number, which we call \(x\text{,}\) such that \(a\) times \(x\) gives \(b\text{;}\) in this case, it is the number that, when multiplied by 3, returns 6. As long as \(a\neq 0\) (what if \(a\) does equal zero?), we know that we can multiply both sides of the equation by \(\frac{1}{a}\) to get \(x = \frac{1}{a}(b) = \frac{b}{a}\text{.}\)
To create an analogy with the single variable equation \(ax=b\text{,}\) we ask: Is there a way to define the product \(A\vec{x}\) of a matrix and a column vector in such a way that the system (3.6.1) can be written in the form
Fortunately for us, the answer is yes! Even better, we'll see that the definition we give here turns out to be a special case of (and motivating example for) the general definition of matrix multiplication given in Section 4.2.
The definition of the product \(A\vec{x}\) is straightforward. We want the result to be a column vector of size \(m\) (that is, with \(m\) entries), so that we can set it equal to the column vector \(\vec{b}\text{.}\) Furthermore, each entry in \(\vec{b}\) is the right-hand side of an equation in (3.6.1), so we want the corresponding entry in \(A\vec{x}\) to be the left-hand side. We are immediately forced to adopt the following rule.
A few remarks about Definition 3.6.1 are needed here. First, note that the number of columns in the matrix \(A\) matches the number of entries in the vector \(\vec{x}\text{,}\) and the number of rows in \(A\) matches the number of entries in the vector \(\vec{b}\text{.}\) Moreover,
The \(i\)th entry in the vector \(A\vec{x}\) is obtained by forming the dot product of row \(i\) in the matrix \(A\) (viewed as a vector in \(\R^n\)) with the vector \(\vec{x}\text{.}\)
That is, since each row of \(A\) has \(n\) entries, as does the vector \(\vec{x}\text{,}\) we can form the dot product of \(\vec{x}\) with any of the rows of \(A\text{.}\) Each such dot product forms the corresponding entry in the vector \(A\vec{x}\text{.}\)
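To make this rule concrete, here is a quick computation with a small matrix and vector of our own choosing (not one of the examples in this section). If
\begin{equation*}
A = \bbm 1\amp 2\amp 0\\3\amp -1\amp 4\ebm \quad\text{and}\quad \vx = \bbm 2\\1\\-1\ebm\text{,}
\end{equation*}
then
\begin{equation*}
A\vx = \bbm 1(2)+2(1)+0(-1)\\3(2)+(-1)(1)+4(-1)\ebm = \bbm 4\\1\ebm\text{.}
\end{equation*}
The first entry is the dot product of the first row of \(A\) with \(\vx\text{,}\) and the second entry is the dot product of the second row with \(\vx\text{.}\) Notice also that the result equals \(2\) times the first column of \(A\text{,}\) plus \(1\) times the second column, plus \(-1\) times the third.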
Or equivalently, using the language of Section 2.7, the vector \(A\vx\) belongs to the span of the columns of \(A\text{.}\) This will be a useful observation when solving questions about span.
(We don't know what \(\vx\) is, so we have to represent its entries with the variables \(x_1\text{,}\) \(x_2\) and \(x_3\text{.}\)) Let's "solve for \(\vx\text{,}\)" given the equation \(A\vx = \vb\text{.}\)
We multiply out the left hand side of this equation according to Definition 3.6.1. The first row of \(A\) is the row vector \(\vec{a}_1=\bbm 1\amp 1\amp 1\ebm\text{.}\) Treating this as a column vector, we form the dot product with \(\vec{x}\text{,}\) giving us
If you prefer to think in terms of the columns of \(A\text{,}\) we can use the observation above, that \(A\vx\) is a linear combination of the columns of \(A\text{.}\) This gives us
This should look familiar; it is a system of linear equations! Given the matrix-vector equation \(A\vx=\vb\text{,}\) we can recognize \(A\) as the coefficient matrix from a linear system and \(\vb\) as the vector of constants from that system. Given a system of equations, rewriting it in matrix form is equally straightforward.
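To make the correspondence concrete, here is a small illustration with made-up coefficients (not one of the examples from this chapter). The system \(x_1 + 2x_2 = 5\text{,}\) \(3x_1 - x_2 = 1\) can be written as \(A\vx = \vb\) with
\begin{equation*}
A = \bbm 1\amp 2\\3\amp -1\ebm\text{,}\quad \vx = \bbm x_1\\x_2\ebm \quad\text{and}\quad \vb = \bbm 5\\1\ebm\text{.}
\end{equation*}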
To solve a matrix-vector equation (and the corresponding linear system), we simply augment the matrix \(A\) with the vector \(\vb\text{,}\) put this matrix into reduced row echelon form, and interpret the results.
The solution is rather straightforward, even though we did a lot of work before to find the answer. Form the augmented matrix \(\bbm A\amp \vb\ebm\) and interpret its reduced row echelon form.
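As a minimal sketch of the process (again with made-up numbers rather than the matrices from the text): for
\begin{equation*}
A = \bbm 1\amp 2\\3\amp -1\ebm \quad\text{and}\quad \vb = \bbm 5\\1\ebm\text{,}
\end{equation*}
the augmented matrix and its reduced row echelon form are
\begin{equation*}
\bbm 1\amp 2\amp 5\\3\amp -1\amp 1\ebm \quad\text{and}\quad \bbm 1\amp 0\amp 1\\0\amp 1\amp 2\ebm\text{,}
\end{equation*}
so \(x_1=1\text{,}\) \(x_2=2\text{,}\) and the solution vector is \(\vx = \bbm 1\\2\ebm\text{.}\)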
This probably seems all well and good. While asking one to solve the equation \(A\vx=\vb\) for \(\vx\) seems like a new problem, in reality it is just asking that we solve a system of linear equations. Our variables \(x_1\text{,}\) etc., appear not individually but as the entries of our vector \(\vx\text{.}\) We are simply writing an old problem in a new way.
In line with this new way of writing the problem, we have a new way of writing the solution. Instead of listing, individually, the values of the unknowns, we simply list them as the elements of our vector \(\vx\text{.}\)
These are important ideas, so we state the basic principle once more: solving the equation \(A\vx=\vb\) for \(\vx\) is the same thing as solving a linear system of equations. Equivalently, any system of linear equations can be written in the form \(A\vx=\vb\) for some matrix \(A\) and vector \(\vb\text{.}\)
Since these ideas are equivalent, we'll refer to \(A\vx=\vb\) both as a matrix-vector equation and as a system of linear equations: they are the same thing.
We've seen two examples illustrating this idea so far, and in both cases the linear system had exactly one solution. We know from Theorem 3.4.2 that any linear system has either one solution, infinitely many solutions, or no solution. So how does our new method of writing a solution work with infinitely many solutions and no solutions?
Certainly, if \(A\vx=\vb\) has no solution, we simply say that the linear system has no solution. There isn't anything special to write. So the only other option to consider is the case where we have infinitely many solutions. We'll learn how to handle these situations through examples.
From above we know that \(x_1 = -2x_2\text{,}\) where \(x_2=t\text{,}\) so we replace the \(x_1\) in \(\vx\) with \(-2t\) and replace \(x_2\) by \(t\text{.}\) This gives our solution as
Recall that since our system was consistent and had a free variable, we have infinitely many solutions. This form of the solution highlights this fact; pick any value for \(t\) and we get a different solution.
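For instance, the choices \(t=1\) and \(t=-3\) give the solutions
\begin{equation*}
\vx = \bbm -2\\1\ebm \quad\text{and}\quad \vx = \bbm 6\\-3\ebm\text{,}
\end{equation*}
respectively.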
We should check our work; multiply each of the above vectors by \(A\) to see if we indeed get \(\zero\text{.}\) Or, we can save ourselves some time and check the general solution. We have
We have officially solved this problem; we have found the solution to \(A\vx=\zero\) and written it properly. One final thing we will do here is graph the solution, using our skills learned in the previous section.
This means that any scalar multiple of the vector \(\vvv = \bbm -2\\1\ebm\) is a solution; we know how to sketch the scalar multiples of \(\vvv\text{.}\) This is done in Figure 3.6.4.
Here the vector \(\vvv\) is drawn, as well as the line that goes through the origin in the direction of \(\vvv\text{.}\) Any vector along this line is a solution. So in some sense, we can say that the solution to \(A\vx=\zero\) is a line.
A few comments are in order here. First, matrix equations (or the corresponding system of linear equations) such as the above where the vector on the right-hand side is the zero vector form a special case that is important enough to have its own name: these are known as homogeneous systems of equations. The formal definition is as follows.
The term homogeneous comes from two Greek words: homo, meaning "same," and genos, meaning "type." A homogeneous system of equations is a system in which each equation is of the same type: all constants are 0. In mathematics, objects that are considered homogeneous often have some sort of scale invariance, and that is the case here: if \(\vx\) is a solution to \(A\vx=\zero\text{,}\) then so is \(c\vx\) for any scalar \(c\text{.}\) (Do you see why?)
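To see why, note that multiplying \(\vx\) by a scalar multiplies each entry of \(A\vx\) by that same scalar (this follows from Definition 3.6.1), so if \(A\vx=\zero\text{,}\) then
\begin{equation*}
A(c\vx) = c(A\vx) = c\,\zero = \zero\text{.}
\end{equation*}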
Notice that the line \(\vx =t\vvv\) in the solution of Example 3.6.3 above passes through the origin. This is an important characteristic of homogeneous systems: since \(A\vec 0 = \vec 0\) for any matrix \(A\text{,}\) we always have (at least) the solution \(\vx = \vec 0\text{.}\) (We'll have more to say about this below.)
We proceed exactly as we did in Example 3.6.3, by forming the proper augmented matrix and putting it into reduced row echelon form, which we do below.
Again, we have infinitely many solutions to the equation \(A\vx = \vec 0\text{;}\) any choice of \(x_2\) gives us one of these solutions. For instance, picking \(x_2=2\) gives the solution
This is a particularly nice solution, since there are no fractions! In fact, since the parameter \(t\) can take on any real value, there is nothing preventing us from defining a new parameter \(s = t/2\text{,}\) and then
In the last two examples, we saw that the general solution could be written in the form \(\vx = t\vvv\) for a vector \(\vvv\) such that \(A\vvv =\vec 0\text{.}\) Such vectors are known as the basic solutions to a homogeneous linear system.
The basic solutions to the system \(A\vx = \vec 0\) are the vectors \(\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\) such that the general solution to the system is given by
Let's practice finding vector solutions again; this time, we won't solve a system of the form \(A\vx=\zero\text{,}\) but instead \(A\vx = \vb\text{,}\) for some vector \(\vb\neq \vec 0\text{.}\) Such systems are known (unsurprisingly) as non-homogeneous systems.
This solution is different from what we've seen in the past two examples; we can't simply pull out a \(t\) since there is a 3 in the first entry. Using the properties of matrix addition, we can "pull apart" this vector and write it as the sum of two vectors: one which contains only constants, and one that contains only terms involving the parameter \(t\text{.}\) We do this below.
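Schematically, the idea looks like this (with made-up entries, not those of the current example):
\begin{equation*}
\bbm 5+4t\\t\ebm = \bbm 5\\0\ebm + \bbm 4t\\t\ebm = \bbm 5\\0\ebm + t\bbm 4\\1\ebm\text{.}
\end{equation*}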
We still have infinitely many solutions; by picking a value for \(t\) we get one of these solutions. For instance, by letting \(t= -1\text{,}\) \(0\text{,}\) or \(2\text{,}\) we get the solutions
We have officially solved the problem; we have solved the equation \(A\vx = \vb\) for \(\vx\) and have written the solution in vector form. As an additional visual aid, we will graph this solution.
Each vector in the solution can be written as the sum of two vectors: \(\vec{x}_p\) and a multiple of \(\vvv\text{.}\) In Figure 3.6.11, \(\vec{x}_p\) is graphed, and \(\vvv\) is graphed with its initial point at the tip of \(\vec{x}_p\text{.}\) Finally, a line is drawn in the direction of \(\vvv\) from the tip of \(\vec{x}_p\text{;}\) any vector pointing to any point on this line is a solution to \(A\vx = \vb\text{.}\)
Notice that in this case our line does not pass through the origin, so the set of solutions is not a subspace. On the other hand, every solution to the system \(A\vec x = \vec b\) can be obtained by adding the vector \(\vec{x}_p\) to an element of the set of solutions to the homogeneous system \(A\vx = \vec 0\text{.}\) We'll elaborate on this shortly.
The previous examples illustrate some important concepts. One is that (at least, when \(\vec x\in \R^2\) or \(\R^3\)) we can visualize the solution to a system of linear equations. Before, when we had infinitely many solutions, we knew we could arbitrarily pick values for our free variables and get different solutions. We knew this to be true, and we even practised it, but the result was not very "tangible." Now, we can view our solution as a vector; by picking different values for our free variables, we see this as multiplying certain important vectors by a scalar, which gives a different solution.
Another important concept that these examples demonstrate comes from the fact that Examples 3.6.3 and 3.6.10 were only "slightly different" and hence had only "slightly different" answers. Both solutions had
\begin{equation*}
t\bbm-2\\1\ebm
\end{equation*}
in them; in Example 3.6.10 the solution also had another vector added to this. The addition of the vector \(\vec{x}_p\) in Example 3.6.10 is needed to account for the fact that we were dealing with a non-homogeneous system of linear equations.
Recall that for a homogeneous system of linear equations, we know that \(\vx = \vec 0\) will be a solution, since no matter what the matrix \(A\) is, we can be certain that \(A\zero = \zero\text{.}\) This fact is important; the zero vector is always a solution to a homogeneous linear system. Therefore a homogeneous system is always consistent; we need only to determine whether we have exactly one solution (just \(\zero\)) or infinitely many solutions. This idea is important, so we give it its own box.
How do we determine if we have exactly one or infinitely many solutions? Recall Key Idea 3.4.7: if the solution has any free variables, then it will have infinitely many solutions. How can we tell if the system has free variables? Form the augmented matrix \(\bbm A \amp \zero \ebm\text{,}\) put it into reduced row echelon form, and interpret the result.
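For instance (an illustrative case with a matrix of our own choosing), if
\begin{equation*}
\bbm A \amp \zero \ebm = \bbm 1\amp 0\amp 2\amp 0\\0\amp 1\amp -3\amp 0\ebm\text{,}
\end{equation*}
which is already in reduced row echelon form, then \(x_3\) is a free variable and the homogeneous system has infinitely many solutions. If instead every column of \(A\) contained a leading 1, the only solution would be \(\vx = \zero\text{.}\)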
It may seem that we've brought up a new question, "When does \(A\vx=\zero\) have exactly one or infinitely many solutions?" only to answer with "Look at the reduced row echelon form of \(A\) and interpret the results, just as always." Why bring up a new question if the answer is an old one?
so \(\vec{x}_p\) is a solution. (The subscript \(p\) of "\(\vec{x}_p\)" is used to denote that this vector is a particular solution: see Definition 3.4.11.) What about the general solution \(\vec x = \vec{x}_p+t\vvv\text{?}\) Recalling that \(A\vvv=\zero\text{,}\) we have
for any value of \(t\text{,}\) so there are infinitely many solutions to our system, one for each \(t\in \R\text{.}\) The whole point is that \(\vec{x}_p\) itself is a solution to \(A\vx = \vb\text{,}\) and we could find more solutions by adding vectors "that go to zero" when multiplied by \(A\text{.}\)
So we wonder: does this mean that \(A\vx = \vb\) will have infinitely many solutions? After all, if \(\vec{x}_p\) and \(\vec{x}_p+\vvv\) are both solutions, don't we have infinitely many solutions?
So here is the culmination of all of our fun that started a few pages back. If \(\vvv\) is a solution to \(A\vx=\zero\) and \(\vec{x}_p\) is a solution to \(A\vx=\vb\text{,}\) then \(\vec{x}_p+\vvv\) is also a solution to \(A\vx=\vb\text{.}\) If \(A\vx=\zero\) has infinitely many solutions, so does \(A\vx = \vb\text{;}\) if \(A\vx=\zero\) has only one solution, so does \(A\vx = \vb\text{.}\) This culminating idea is of course important enough to be stated again.
A key word in the above statement is consistent. If \(A\vx = \vb\) is inconsistent (the linear system has no solution), then it doesn't matter how many solutions \(A\vx = \zero\) has; \(A\vx=\vb\) has no solution.
We can elaborate on Key Idea 3.6.13 above, as well as Key Idea 3.4.7 from Section 3.4, by introducing one more piece of important terminology. By now it is probably clear that the leading 1s in the reduced row echelon form of a matrix play a key role in understanding the system. In fact, it turns out that we can describe all of the different possibilities for a linear system in terms of one number: the number of leading 1s in the reduced row echelon form of a matrix.
The rank of a matrix \(A\) is denoted by \(\operatorname{rank}(A)\) and defined as the number of leading 1s in the reduced row echelon form of \(A\text{.}\)
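For example, with a small matrix of our own choosing (not one appearing elsewhere in this section): the matrix
\begin{equation*}
A = \bbm 1\amp 2\amp 3\\2\amp 4\amp 6\\1\amp 1\amp 1\ebm
\end{equation*}
has reduced row echelon form
\begin{equation*}
\bbm 1\amp 0\amp -1\\0\amp 1\amp 2\\0\amp 0\amp 0\ebm\text{,}
\end{equation*}
which has two leading 1s, so \(\operatorname{rank}(A) = 2\text{.}\)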
Although we do not prove it in this textbook, the reduced row echelon form of any matrix is unique; it follows from this fact that the rank of a matrix is a well-defined number. The importance of rank is outlined in the following result.
Let \(A\) be an \(m\times n\) matrix. For any linear system \(A\vx = \vb\) in \(n\) variables, we have the following possibilities:
If \(\operatorname{rank}(A) \lt \operatorname{rank}\left[\begin{array}{c|c}A \amp \vec{b}\end{array}\right]\text{,}\) then the system \(A\vx = \vb\) is inconsistent.
If \(\operatorname{rank}(A)=\operatorname{rank}\left[\begin{array}{c|c} A \amp \vec b\end{array}\right] = n\) (where \(n\) is the number of variables), then the system \(A\vx = \vb\) has a unique solution.
If \(\operatorname{rank}(A)=\operatorname{rank}\left[\begin{array}{c|c} A \amp \vec b\end{array}\right] \lt n\text{,}\) then the system \(A\vx = \vb\) has infinitely many solutions. Moreover, the general solution to \(A\vx = \vb\) will involve \(k\) parameters, where
\begin{equation*}
k = n - \operatorname{rank}(A)\text{.}
\end{equation*}
To see why the first statement is true, note that if
\begin{equation*}
\operatorname{rank}(A)\lt \operatorname{rank}\left[\begin{array}{c|c} A \amp \vec b\end{array}\right]\text{,}
\end{equation*}
then there must be a leading 1 in the right-hand column of the reduced row echelon form of \(\bbm A \amp \vec b\ebm\text{,}\) meaning that we have a row of the form
Items 2 and 3 in Theorem 3.6.15 simply give another way of stating the fact that the free variables are those variables that do not have a leading 1 in their column. This seems like an obvious fact, but it is very important. We will see in Section 5.4 that this observation leads to a major theorem, sometimes known as the Fundamental Theorem of Linear Transformations: see Theorem 5.4.12.
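As a quick numerical illustration of the third statement (with made-up numbers): if \(A\vx = \vb\) is a consistent system in \(n = 5\) variables and \(\operatorname{rank}(A) = \operatorname{rank}\left[\begin{array}{c|c} A \amp \vec b\end{array}\right] = 2\text{,}\) then the general solution involves
\begin{equation*}
k = n - \operatorname{rank}(A) = 5 - 2 = 3
\end{equation*}
parameters.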
We use this information to write \(\vx\text{,}\) again pulling it apart. Since we have three free variables and also constants, we'll need to pull \(\vx\) apart into four separate vectors.
So \(\vec{x}_p\) is a particular solution; \(A\vxp = \vb\text{.}\) (Multiply it out to verify that this is true.) The other vectors, \(\vu\text{,}\) \(\vvv\) and \(\vec{w}\text{,}\) that are multiplied by our free variables \(x_3=r\text{,}\) \(x_4=s\) and \(x_5=t\text{,}\) are each solutions to the homogeneous equation \(A\vx=\zero\text{.}\) Any linear combination of these three vectors, i.e., any vector found by choosing values for \(r\text{,}\) \(s\) and \(t\) in \(r\vu+s\vvv+t\vec{w}\text{,}\) is a solution to \(A\vx=\zero\text{.}\)
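A short check, using the same properties of the matrix-vector product as before: since \(A\vxp = \vb\) and \(A\vu = A\vvv = A\vec{w} = \zero\text{,}\) we have
\begin{equation*}
A(\vxp + r\vu + s\vvv + t\vec{w}) = A\vxp + rA\vu + sA\vvv + tA\vec{w} = \vb + \zero + \zero + \zero = \vb\text{,}
\end{equation*}
so every vector of this form is indeed a solution to \(A\vx = \vb\text{.}\)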
This may strike us as a bit odd; we are used to having lots of different vectors in the solution. However, in this case, the linear system \(A\vx = \vb\) has exactly one solution, and we've found it. What is the solution to \(A\vx=\zero\text{?}\) Since we've only found one solution to \(A\vx=\vb\text{,}\) we can conclude from Key Idea 3.6.13 that the related homogeneous equation \(A\vx=\zero\) has only one solution, namely \(\vx = \zero\text{.}\) We can write our solution vector \(\vx\) in a form similar to our previous examples to highlight this:
Again, in light of Theorem 3.6.15, this should not be too surprising. The reduced row echelon form of \(A\) is \(\bbm 1\amp 0\\0\amp 1\ebm\text{,}\) so the rank of \(A\) is 2, and there are 2 variables in our system, so we expect \(2-2=0\) parameters in our general solution.
We immediately have a problem; we see that the second row tells us that \(0x_1+0x_2 = 1\text{,}\) a sure sign that our system does not have a solution. Thus \(A\vx = \vb\) has no solution. Of course, this does not mean that \(A\vx = \zero\) has no solution; it always has a solution.
The previous example may seem to violate the principle of Key Idea 3.6.13. After all, it seems that having infinitely many solutions to \(A\vx = \zero\) should imply infinitely many solutions to \(A\vx = \vb\text{.}\) However, we remind ourselves of the key word in the idea that we observed before: consistent. If \(A\vx = \vb\) is consistent and \(A\vx = \zero\) has infinitely many solutions, then so will \(A\vx = \vb\text{.}\) But if \(A\vx = \vb\) is not consistent, it does not matter how many solutions \(A\vx = \zero\) has; \(A\vx = \vb\) is still inconsistent.
In this chapter, we developed a systematic method for solving systems of linear equations. A key tool in this method was the augmented matrix corresponding to a given system. In this final section, we've seen that further insight into the structure of solutions can be gained by considering our systems in terms of matrices and vectors.
In the next chapter, we will begin the study of matrices as objects unto themselves. We will see that they can be added and multiplied by scalars in exactly the same way as vectors, and in addition to this, matrices of the correct sizes can be multiplied in a way that reproduces Definition 3.6.1 above as a special case.
One question that may have occurred to you as you worked through this section is the following: in the one-variable linear equation \(ax=b\text{,}\) we know that as long as \(a\neq 0\text{,}\) we can divide both sides by \(a\text{,}\) giving us the solution \(x=b/a\text{.}\) Now, given the matrix equation \(A\vec{x}=\vec{b}\text{,}\) is there some equivalent means of "dividing by \(A\)" to obtain the solution \(\vec{x}\text{?}\) The short answer is no. Indeed, there is no such thing as matrix division; the algebraic rules for matrix multiplication are much more complicated than they are for numbers. (In particular, we'll see that for matrices, \(AB\) is usually not the same thing as \(BA\text{!}\))
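To preview this last point (jumping slightly ahead to the matrix product defined in the next chapter, and using two small matrices of our own choosing), one can check that for
\begin{equation*}
A = \bbm 1\amp 1\\0\amp 1\ebm \quad\text{and}\quad B = \bbm 1\amp 0\\1\amp 1\ebm\text{,}
\end{equation*}
we get
\begin{equation*}
AB = \bbm 2\amp 1\\1\amp 1\ebm \quad\text{while}\quad BA = \bbm 1\amp 1\\1\amp 2\ebm\text{.}
\end{equation*}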
The slightly longer answer to our question might be phrased as "Sometimes. Well, sort of." To obtain the correct (and much longer) answer, we will be led in the next chapter to the definition of the inverse of a matrix.
A matrix \(A\) and vectors \(\vb\text{,}\) \(\vu\) and \(\vvv\) are given. Verify that \(\vu\) and \(\vvv\) are both solutions to the equation \(A\vx = \vb\text{;}\) that is, show that \(A\vu=A\vvv=\vb\text{.}\)
A matrix \(A\) and vectors \(\vb\text{,}\) \(\vu\) and \(\vvv\) are given. Verify that \(A\vu=\zero\text{,}\) \(A\vvv=\vb\) and \(A(\vu+\vvv)=\vb\text{.}\)
A matrix \(A\) and vector \(\vb\) are given. Solve the equation \(A\vx = \vb\text{,}\) write the solution in vector format, and sketch the solution as the appropriate line on the Cartesian plane.