Section 4.1 Matrix Addition and Scalar Multiplication
As mentioned above, a matrix is a construction that allows us to organize information in a tabular form. For example, we may be interested in the following (made up) data involving crop yields on several Southern Alberta farms, given in tabular form:
Table 4.1.1. 2014 crop yields, in metric tonnes

            Corn   Potatoes   Soybeans   Wheat
Farm A        48         18         92       0
Farm B         0          0         73     152
Farm C        34        203          0      88
Someone in charge of compiling data on farms and crops probably already has a program or spreadsheet set up with all of the farms and crops pre-defined; what they are interested in are the numbers giving the crop yields. Thus, when they enter their data into the computer, they are probably more interested in the array
\begin{equation}
A = \bbm 48\amp 18\amp 92\amp 0\\ 0\amp 0\amp 73\amp 152\\ 34\amp 203\amp 0\amp 88\ebm\text{.}\tag{4.1.1}
\end{equation}
As long as we’re consistent and always assign each farm to the same row, and each crop to the same column, we can dispense with the labels and work directly with the data.
The array above is our first example of a matrix, which we now define. The definition of matrix is remarkable only in how unremarkable it seems — it is simply a way of organizing information (usually numbers) into an array.
Definition 4.1.2. Matrix.
A matrix is a rectangular array of numbers.
The horizontal lines of numbers form rows and the vertical lines of numbers form columns. A matrix with \(m\) rows and \(n\) columns is said to be an \(m\times n\) matrix (“an \(m\) by \(n\) matrix”, or a matrix of size \(m\times n\)).
The entries of an \(m\times n\) matrix are indexed as follows:
\begin{equation*}
A = \bbm a_{11}\amp a_{12}\amp \cdots\amp a_{1n}\\ a_{21}\amp a_{22}\amp \cdots\amp a_{2n}\\ \vdots\amp \vdots\amp \ddots\amp \vdots\\ a_{m1}\amp a_{m2}\amp \cdots\amp a_{mn}\ebm\text{.}
\end{equation*}
That is, \(a_{32}\) means “the number in the third row and second column.” To save space, we will sometimes use the shorthand notation \(A = [a_{ij}]\) to denote a matrix \(A\) with entries \(a_{ij}\text{.}\) If we need to specify the size, we can also write \(A=[a_{ij}]_{m\times n}\text{.}\)
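For readers who like to check this kind of bookkeeping on a computer, here is a minimal sketch using Python's NumPy library (an illustration only; NumPy is not assumed anywhere in this text). Note that NumPy numbers rows and columns starting from 0 rather than 1, so the entry we write as \(a_{32}\) is accessed with the indices [2, 1].

    import numpy as np

    # The 2014 crop-yield matrix from Table 4.1.1.
    # Rows: Farms A, B, C; columns: corn, potatoes, soybeans, wheat.
    A = np.array([[48,  18, 92,   0],
                  [ 0,   0, 73, 152],
                  [34, 203,  0,  88]])

    print(A.shape)   # (3, 4), so A is a 3 x 4 matrix
    print(A[2, 1])   # 203, the entry a_32 (third row, second column)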
Two special types of matrix are worth noting: those with a single row or column. Such matrices are known as vectors.
Definition 4.1.3. Row and column vectors.
A row vector is a \(1\times n\) matrix of the form
\begin{equation*}
R = \bbm r_1\amp r_2\amp \cdots\amp r_n\ebm\text{,}
\end{equation*}
while a column vector is an \(m\times 1\) matrix of the form
\begin{equation*}
C = \bbm c_1\\c_2\\ \vdots \\ c_m\ebm\text{.}
\end{equation*}
In particular, we can obtain row or column vectors by isolating any row or column of a given \(m\times n\) matrix. For example, given our matrix of crop data in Equation (4.1.1), we might be interested only in Farm B, in which case we would want the row vector
\begin{equation*}
\bbm 0\amp 0\amp 73\amp 152\ebm\text{.}
\end{equation*}
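Continuing the optional NumPy illustration, a row or column of the crop matrix can be isolated by slicing. (NumPy returns these slices as one-dimensional arrays rather than literal \(1\times n\) or \(m\times 1\) matrices, but the idea is the same.)

    import numpy as np

    A = np.array([[48,  18, 92,   0],
                  [ 0,   0, 73, 152],
                  [34, 203,  0,  88]])

    farm_b   = A[1, :]   # the second row: Farm B's yields, [0, 0, 73, 152]
    soybeans = A[:, 2]   # the third column: soybean yields, [92, 73, 0]
    print(farm_b, soybeans)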
Continuing with our farming example, suppose that in addition to the matrix \(A\) of 2014 crop yields above we also have data for 2015 crop yields given by
We might be interested in quantities such as the total crop yields over two years. If we arrange these totals into a matrix \(T\text{,}\) it seems like it should be reasonable to define matrix addition in such a way that we can write
\begin{equation*}
T = A + B\text{.}
\end{equation*}
This leads to two questions. First, how do we define matrix addition in order to ensure this outcome? Second, and perhaps more fundamentally, what do we mean by “\(=\)” in the context of matrices? Let us tackle the second question first.
Definition 4.1.4. Matrix Equality.
Two \(m\times n\) matrices \(\tta\) and \(\ttb\) are equal if their corresponding entries are equal.
Notice that our more formal definition specifies that if matrices are equal, they have the same size. This should make sense.
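As a quick illustration (again a NumPy sketch, with made-up matrices), entrywise equality can be tested directly, and arrays of different sizes are never considered equal.

    import numpy as np

    A = np.array([[1, 2], [3, 4]])
    B = np.array([[1, 2], [3, 4]])
    C = np.array([[1, 2, 0], [3, 4, 0]])

    print(np.array_equal(A, B))   # True: same size, corresponding entries agree
    print(np.array_equal(A, C))   # False: different sizes, so never equal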
Now we move on to describing how to add two matrices together. To start off, take a wild stab: how do you think we should add our matrices of crop data above? Well, if we want the sum to represent the total yields for each crop, then it stands to reason that to add the two matrices, we add together each of the corresponding crop yields within them. This observation leads to the following definition.
Definition 4.1.5. Matrix Addition.
Let \(\tta = [a_{ij}]\) and \(\ttb = [b_{ij}]\) be \(m\times n\) matrices. The sum of \(\tta\) and \(\ttb\text{,}\) denoted \(\tta+\ttb\text{,}\) is the \(m\times n\) matrix obtained by adding corresponding entries; that is, \(\tta+\ttb = [a_{ij}+b_{ij}]\text{.}\)
For another example, suppose we wanted to know the total production for each farm in 2014, assuming that our matrices represent all of the crops each farm produces. This would be obtained by simply calculating the total for each row in the matrix \(A\text{.}\) Another way to accomplish the same task is as follows: let \(C_1, C_2, C_3,\) and \(C_4\) denote the columns of \(A\) (one for each crop), and form the sum
\begin{equation*}
Y = C_1+C_2+C_3+C_4 = \bbm 48\\0\\34\ebm + \bbm 18\\0\\203\ebm + \bbm 92\\73\\0\ebm + \bbm 0\\152\\88\ebm = \bbm 158\\225\\325\ebm\text{.}
\end{equation*}
The column vector \(Y\) then contains the total yields for each farm. Notice that although we only defined the sum of two matrices in Definition 4.1.5, it makes sense to add any number of matrices, and there is no need to add parentheses.
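Here is a sketch of both computations in NumPy: summing each row of \(A\) directly, and adding the four columns of \(A\) as vectors. Both give the same totals.

    import numpy as np

    A = np.array([[48,  18, 92,   0],
                  [ 0,   0, 73, 152],
                  [34, 203,  0,  88]])

    row_totals = A.sum(axis=1)                          # total for each farm
    column_sum = A[:, 0] + A[:, 1] + A[:, 2] + A[:, 3]  # sum of the four columns

    print(row_totals)   # [158 225 325]
    print(column_sum)   # [158 225 325]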
Recall from Chapter 2 that we defined the multiplication of a vector \(\vec{v} = \bbm a\\ b\\ c\ebm\) by a scalar \(t\in\mathbb{R}\) by
\begin{equation*}
t\vec{v} = \bbm ta\\ tb\\ tc\ebm\text{.}
\end{equation*}
It stands to reason that scalar multiplication should be defined in exactly the same way for row and column vectors; after all, in most cases these are just different ways of writing down the same mathematical object. Thus, in order to multiply a row or column vector by a scalar, we should multiply each entry in the vector by that scalar. From here, it’s not too much of a stretch to conclude that the same definition is reasonable for matrices in general.
Definition 4.1.6. Scalar Multiplication.
Let \(\tta = [a_{ij}]\) be an \(m\times n\) matrix and let \(k\) be a scalar. The scalar multiplication of \(\tta\) by \(k\text{,}\) denoted \(k\tta\text{,}\) is defined by
\begin{equation*}
k\tta = [ka_{ij}]\text{.}
\end{equation*}
Referring one last time to our farming data, we could imagine that a fertilizer company is advertising a new product they claim will increase crop yields by 30%. Increasing the yield of each crop by 30% amounts to multiplying each of the entries in the matrices \(A\) or \(B\) by a factor of 1.3; according to Definition 4.1.6, this is the same as forming the scalar multiples \(1.3\tta\) and \(1.3\ttb\text{.}\)
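In the optional NumPy sketches, this scalar multiple is computed by simply multiplying the array by 1.3.

    import numpy as np

    A = np.array([[48,  18, 92,   0],
                  [ 0,   0, 73, 152],
                  [34, 203,  0,  88]])

    # A 30% increase in every yield is the scalar multiple 1.3A.
    print(1.3 * A)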
Since we have two years’ worth of crop data, we could also ask for the average yield for each crop on each farm. For example, Farm A produced 48 tonnes of corn in 2014, and 41 tonnes of corn in 2015. The two-year average for corn on Farm A is thus
\begin{equation*}
\frac{48+41}{2} = 44.5 \text{ tonnes.}
\end{equation*}
Notice that we can obtain the average for each entry by dividing each entry in \(T=A+B\) by 2, which is the same thing as multiplying by \(\frac{1}{2}\text{.}\) Our matrix of averages is therefore the scalar multiple \(\frac{1}{2}T = \frac{1}{2}(A+B)\text{.}\) Since
\begin{equation*}
\frac{48+41}{2} = \frac{1}{2}\cdot 48+\frac{1}{2}\cdot 41\text{,}
\end{equation*}
with similar considerations for the other entries, we also could have obtained the average by first multiplying each of \(A\) and \(B\) by \(\frac{1}{2}\text{,}\) and then adding the results. That is,
\begin{equation*}
\frac{1}{2}(A+B) = \frac{1}{2}A+\frac{1}{2}B\text{.}
\end{equation*}
We’ll see shortly that this result is due to a general property of matrix arithmetic, called the distributive property.
Expressions such as \(\frac12 A+\frac12 B\) that use both addition and scalar multiplication together occur frequently in Linear Algebra, and are known as linear combinations. In general, a linear combination can be formed from any number of matrices, as long as they’re all of the same size.
Definition 4.1.7. Linear combination.
Given \(m\times n\) matrices \(A_1, A_2, \ldots, A_k\text{,}\) a linear combination of these matrices is any expression of the form
\begin{equation*}
B = c_1A_1+c_2A_2+\cdots +c_kA_k\text{,}
\end{equation*}
where \(c_1, c_2, \ldots, c_k\) are scalars.
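A linear combination is computed exactly as the formula suggests: scale each matrix, then add. Here is a small NumPy sketch with made-up \(2\times 2\) matrices and scalars.

    import numpy as np

    A1 = np.array([[1, 2], [3, 4]])
    A2 = np.array([[0, 1], [1, 0]])
    c1, c2 = 0.5, 2.0

    B = c1 * A1 + c2 * A2   # a linear combination of A1 and A2
    print(B)                # [[0.5 3. ]
                            #  [3.5 2. ]]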
It is time to forget our farm data and move into some abstract computational examples to make sure we have the hang of these new operations.
Example 4.1.8. Matrix addition and scalar multiplication.
\(\tta + \ttc\) is not defined. If we look at our definition of matrix addition, we see that the two matrices need to be the same size. Since \(\tta\) and \(\ttc\) have different dimensions, we don’t attempt to add them; we simply say that the sum is not defined.
To compute this linear combination we first carry out the scalar multiplication, followed by the addition:
Our example raised a few interesting points. Notice how \(\tta + \ttb = \ttb + \tta\text{.}\) We probably aren’t surprised by this, since we know that when dealing with numbers, \(a+b = b+a\text{.}\) Also, notice that \(5\tta+5\ttb = 5(\tta+\ttb)\text{.}\) In our example, we were careful to compute each of these expressions following the proper order of operations; knowing these are equal allows us to compute similar expressions in the most convenient way.
Another interesting thing that came from our previous example is that
It seems like this should be a special matrix; after all, every entry is 0 and 0 is a special number.
In fact, this is a special matrix. We define \(\tto\text{,}\) which we read as “the zero matrix,” to be the matrix of all zeros. We should be careful; this previous “definition” is a bit ambiguous, for we have not stated what size the zero matrix should be. Is \(\bbm 0\amp 0\\0\amp 0\ebm\) the zero matrix? How about \(\bbm 0\amp 0\ebm\text{?}\)
Let’s not get bogged down in semantics. If we ever see \(\tto\) in an expression, we will usually know right away what size \(\tto\) should be; it will be the size that allows the expression to make sense. If \(\tta\) is a \(3\times 5\) matrix, and we write \(\tta + \tto\text{,}\) we’ll simply assume that \(\tto\) is also a \(3\times 5\) matrix. If we are ever in doubt, we can add a subscript; for instance, \(\tto_{2\times 7}\) is the \(2\times7\) matrix of all zeros.
Since the zero matrix is an important concept, we give it its own definition box.
Definition 4.1.9. The Zero Matrix.
The \(m\times n\) matrix of all zeros, denoted \(\tto_{m\times n}\text{,}\) is the zero matrix.
When the dimensions of the zero matrix are clear from the context, the subscript is generally omitted.
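In the NumPy sketches, the zero matrix of a given size is produced by np.zeros, and adding it to a matrix of the same size changes nothing.

    import numpy as np

    A = np.array([[48,  18, 92,   0],
                  [ 0,   0, 73, 152],
                  [34, 203,  0,  88]])

    O = np.zeros((3, 4))              # the 3 x 4 zero matrix
    print(np.array_equal(A + O, A))   # True: A + O = A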
The following theorem presents some of the properties of matrix addition and scalar multiplication that we discovered above, plus a few more.
Theorem 4.1.10. Properties of Matrix Addition and Scalar Multiplication.
The following equalities hold for all \(m\times n\) matrices \(\tta\text{,}\) \(\ttb\text{,}\) and \(\ttc\text{,}\) and all scalars \(k\text{.}\)
1. \(\tta+\ttb = \ttb+\tta\) (Commutative Property)
2. \((\tta+\ttb)+\ttc = \tta+(\ttb+\ttc)\) (Associative Property)
3. \(k(\tta+\ttb) = k\tta+k\ttb\) (Distributive Property)
4. \(\tta+\tto = \tto+\tta = \tta\) (Additive Identity)
5. \(0\tta = \tto\)
Be sure that this last property makes sense; it says that if we multiply any matrix by the number 0, the result is the zero matrix, or \(\tto\text{.}\) (You now have more than one kind of zero to keep track of!)
It’s important to understand that since matrix addition and scalar multiplication are defined in terms of the entries of our matrices, the properties in Theorem 4.1.10 follow directly from the properties of real number arithmetic in Section 1.2. For example, to prove item 1 above, let \(A=[a_{ij}]\) and \(B=[b_{ij}]\) be \(m\times n\) matrices. We then have
\begin{equation*}
A+B = [a_{ij}]+[b_{ij}] = [a_{ij}+b_{ij}] = [b_{ij}+a_{ij}] = [b_{ij}]+[a_{ij}] = B+A\text{,}
\end{equation*}
since \(a_{ij}+b_{ij}=b_{ij}+a_{ij}\) for every \(i\) and \(j\text{.}\)
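A numerical spot check is no substitute for such a proof, but it can build confidence. The following sketch tests item 1 and the last property of Theorem 4.1.10 on randomly generated matrices.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(0, 10, size=(3, 4))
    B = rng.integers(0, 10, size=(3, 4))

    print(np.array_equal(A + B, B + A))             # True: A + B = B + A
    print(np.array_equal(0 * A, np.zeros((3, 4))))  # True: 0A = O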
Our matrix properties identified \(\tto\) as the Additive Identity; i.e., if you add \(\tto\) to any matrix \(\tta\text{,}\) you simply get \(\tta\text{.}\) This is similar in notion to the fact that for all numbers \(a\text{,}\)\(a+0 = a\text{.}\) A Multiplicative Identity would be a matrix \(\tti\) where \(\tti\times\tta = \tta\) for all matrices \(\tta\text{.}\) (What would such a matrix look like? A matrix of all 1s, perhaps?) However, in order for this to make sense, we’ll need to learn to multiply matrices together, which we’ll do in the next section.
Exercises
Exercise Group.
For the matrices \(\tta\) and \(\ttb\) given below, simplify the given expression.