You may recall from elementary linear algebra, or a calculus class, that vectors in \(\R^2\) or \(\R^3\) are considered to be quantities with both magnitude and direction. Interestingly enough, neither of these properties is inherent to a general vector space. The vector space axioms specify only algebra; they say nothing about geometry. (What, for example, should be the “angle” between two polynomials?)
Because vector algebra is often introduced as a consequence of geometry (like the “tip-to-tail” rule), you may not have thought all that carefully about what, exactly, is responsible for making the connection between algebra and geometry. It turns out that the missing link is the humble dot product.
You probably encountered the following result, perhaps as a consequence of the law of cosines: for any two vectors \(\uu,\vv\in\R^2\text{,}\)
\begin{equation*}
\uu\dotp\vv = \len{\uu}\,\len{\vv}\cos\theta\text{,}
\end{equation*}
where \(\theta\) is the angle between \(\uu\) and \(\vv\text{.}\) Here we see both magnitude and direction (encoded by the angle) defined in terms of the dot product.
While it is possible to generalize the idea of the dot product to something called an inner product, we will first focus on the basic dot product in \(\R^n\text{.}\) Once we have a good understanding of things in that setting, we can move on to consider the abstract counterpart.
Subsection 3.1.1 Basic definitions and properties
For most of this chapter (primarily for typographical reasons) we will denote elements of \(\R^n\) as ordered \(n\)-tuples \((x_1,\ldots, x_n)\) rather than as column vectors.
Definition 3.1.1.
Let \(\xx=(x_1,x_2,\ldots, x_n)\) and \(\yy=(y_1,y_2,\ldots, y_n)\) be vectors in \(\R^n\text{.}\) The dot product of \(\xx\) and \(\yy\text{,}\) denoted by \(\xx\dotp\yy\text{,}\) is the scalar defined by
\begin{equation*}
\xx\dotp \yy = x_1y_1+x_2y_2+\cdots + x_ny_n\text{.}
\end{equation*}
The norm of a vector \(\xx\) is denoted \(\len{\xx}\) and defined by
\begin{equation*}
\len{\xx} = \sqrt{x_1^2+x_2^2+\cdots + x_n^2}\text{.}
\end{equation*}
Note that both the dot product and the norm produce scalars. Through the Pythagorean Theorem, we recognize the norm as the length of \(\xx\text{.}\) The dot product can still be thought of as measuring the angle between vectors, although the simple geometric proof used in two dimensions is not that easily translated to \(n\) dimensions. At the very least, the dot product lets us extend the notion of right angles to higher dimensions.
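The two formulas in Definition 3.1.1 translate almost directly into code. The following is a minimal Python sketch, not a library routine: the function names \texttt{dot} and \texttt{norm} are our own choices, and vectors are assumed to be tuples of numbers of equal length.

```python
import math

def dot(x, y):
    """Dot product of two vectors given as equal-length tuples."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of components")
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Norm (length) of x, computed as the square root of x . x."""
    return math.sqrt(dot(x, x))

x = (1, 2, 2)
y = (3, 0, -4)
print(dot(x, y))   # 1*3 + 2*0 + 2*(-4) = -5
print(norm(x))     # sqrt(1 + 4 + 4) = 3.0
```

Writing the norm in terms of the dot product reflects the identity \(\len{\xx}^2=\xx\dotp\xx\text{,}\) which is used repeatedly in this section.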
Definition 3.1.2.
We say that two vectors \(\xx,\yy\in\R^n\) are orthogonal if \(\xx\dotp\yy = 0\text{.}\)
It should be no surprise that all the familiar properties of the dot product work just as well in any dimension. The following properties can be confirmed by direct computation, so the proof is left as an exercise.
Theorem 3.1.3.
For any vectors \(\xx,\yy,\zz\in\R^n\text{,}\)
\(\displaystyle \xx\dotp\yy = \yy\dotp\xx\)
\(\displaystyle \xx\dotp(\yy+\zz)=\xx\dotp\yy+\xx\dotp\zz\)
For any scalar \(c\text{,}\) \(\xx\dotp(c\yy) = (c\xx)\dotp\yy=c(\xx\dotp\yy)\)
\(\xx\dotp\xx\geq 0\text{,}\) and \(\xx\dotp\xx=0\) if and only if \(\xx=\zer\)
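A numerical spot check is no substitute for the proof, but it can help catch a misremembered property. The sketch below tests each item of the theorem on one arbitrarily chosen set of integer vectors; the helper names \texttt{dot}, \texttt{add}, and \texttt{scale} are assumptions made for this illustration.

```python
def dot(x, y):
    """Dot product of two equal-length tuples."""
    return sum(a * b for a, b in zip(x, y))

def add(x, y):
    """Componentwise sum of two vectors."""
    return tuple(a + b for a, b in zip(x, y))

def scale(c, x):
    """Scalar multiple c*x."""
    return tuple(c * a for a in x)

# Arbitrary sample data (integers, so all comparisons are exact).
x, y, z, c = (1, -2, 3), (4, 0, -1), (2, 5, 7), 3

assert dot(x, y) == dot(y, x)                                        # symmetry
assert dot(x, add(y, z)) == dot(x, y) + dot(x, z)                    # distributivity
assert dot(x, scale(c, y)) == dot(scale(c, x), y) == c * dot(x, y)   # scalars
assert dot(x, x) >= 0 and dot((0, 0, 0), (0, 0, 0)) == 0             # positivity
print("all four properties hold for this sample")
```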
Exercise 3.1.5.
Show that for any vectors \(\xx,\yy\in\R^n\text{,}\) we have
\begin{equation*}
\len{\xx+\yy}^2 = \len{\xx}^2+2\xx\dotp\yy+\len{\yy}^2\text{.}
\end{equation*}
Hint.
Use properties of the dot product to expand and simplify.
Exercise 3.1.6.
Suppose \(\mathbb{R}^n=\spn\{\vv_1,\vv_2,\ldots, \vv_k\}\text{.}\) Prove that \(\xx=\zer\) if and only if \(\xx\dotp \vv_i=0\) for each \(i=1,2,\ldots, k\text{.}\)
Hint.
Don’t forget to prove both directions! Note that the hypothesis allows you to write \(\xx\) as a linear combination of the \(\vv_i\text{.}\)
There are two important inequalities associated to the dot product and norm. We state them both in the following theorem, without proof.
Theorem 3.1.7.
Let \(\xx,\yy\) be any vectors in \(\R^n\text{.}\) Then
\(\displaystyle \lvert \xx\dotp \yy\rvert \leq \len{\xx}\len{\yy}\)
\(\displaystyle \len{\xx+\yy}\leq \len{\xx}+\len{\yy}\)
The first of the above inequalities is called the Cauchy-Schwarz inequality, which can be viewed as a manifestation of the formula
\begin{equation*}
\xx\dotp \yy = \len{\xx}\len{\yy}\cos\theta\text{,}
\end{equation*}
since after all, \(\lvert \cos\theta\rvert\leq 1\) for any angle \(\theta\text{.}\)
The usual proof involves some algebraic trickery; the interested reader is invited to search online for the Cauchy-Schwarz inequality, where they will find no shortage of websites offering proofs.
The second result, called the triangle inequality, follows from the Cauchy-Schwarz inequality and Exercise 3.1.5:
\begin{equation*}
\len{\xx+\yy}^2 = \len{\xx}^2+2\xx\dotp\yy+\len{\yy}^2\leq \len{\xx}^2+2\len{\xx}\len{\yy}+\len{\yy}^2=(\len{\xx}+\len{\yy})^2\text{.}
\end{equation*}
Taking square roots of both sides (both are nonnegative) gives the result.
The triangle inequality gets its name from the “tip-to-tail” picture for vector addition. Essentially, it tells us that the length of any side of a triangle cannot exceed the sum of the lengths of the other two sides. The importance of the triangle inequality is that it tells us that the norm can be used to define distance.
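Both inequalities are easy to test numerically. The following Python sketch checks them for a given pair of vectors; the function name \texttt{check\_inequalities} and the small tolerance \texttt{tol} (added to absorb floating-point rounding) are assumptions of this illustration, not part of the theorem.

```python
import math

def dot(x, y):
    """Dot product of two equal-length tuples."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Norm of x."""
    return math.sqrt(dot(x, x))

def add(x, y):
    """Componentwise sum of two vectors."""
    return tuple(a + b for a, b in zip(x, y))

def check_inequalities(x, y, tol=1e-12):
    """Return (Cauchy-Schwarz holds, triangle inequality holds) for x and y."""
    cauchy_schwarz = abs(dot(x, y)) <= norm(x) * norm(y) + tol
    triangle = norm(add(x, y)) <= norm(x) + norm(y) + tol
    return cauchy_schwarz, triangle

print(check_inequalities((1, 2, 2), (3, 0, -4)))  # a generic pair
print(check_inequalities((1, 2, 2), (2, 4, 4)))   # parallel vectors: equality case
```

The second pair consists of parallel vectors, where both inequalities hold with equality; this is the case \(\cos\theta = 1\) in the formula above.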
Definition 3.1.8.
For any vectors \(\xx,\yy\in \R^n\text{,}\) the distance from \(\xx\) to \(\yy\) is denoted \(d(\xx,\yy)\text{,}\) and defined as
\begin{equation*}
d(\xx,\yy) = \len{\xx-\yy}\text{.}
\end{equation*}
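As a small illustration of Definition 3.1.8, here is a Python sketch of the distance function. The name \texttt{distance} and the sample points are our own choices.

```python
import math

def distance(x, y):
    """Distance from x to y, defined as the norm of x - y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

p, q, r = (1, 2), (4, 6), (0, 0)
print(distance(p, q))                                      # sqrt(9 + 16) = 5.0
print(distance(p, q) == distance(q, p))                    # distance is symmetric
print(distance(p, q) <= distance(p, r) + distance(r, q))   # triangle inequality
```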
Exercise 3.1.10.
Select all vectors that are orthogonal to the vector \((2,1,-3)\text{.}\)
\((1,1,1)\)
Yes! \(2(1)+1(1)-3(1)=0\text{.}\)
\((3,1,2)\)
You should find that the dot product is \(1\text{,}\) not \(0\text{,}\) so these vectors are not orthogonal.
\((0,0)\)
You might be tempted to say that the zero vector is orthogonal to everything, but we can’t compare vectors from different vector spaces!
\((0,-3,-1)\)
Yes! We have to be careful of signs here: \(2(0)+1(-3)+(-3)(-1)=0-3+3=0\text{.}\)
Exercise 3.1.11.
If \(\uu\) is orthogonal to \(\vv\) and \(\vv\) is orthogonal to \(\ww\text{,}\) then \(\uu\) is orthogonal to \(\ww\text{.}\)
True.
Consider \(\uu=(1,0,0)\text{,}\) \(\vv=(0,1,0)\text{,}\) and \(\ww=(1,0,1)\text{.}\)
False.
Consider \(\uu=(1,0,0)\text{,}\) \(\vv=(0,1,0)\text{,}\) and \(\ww=(1,0,1)\text{.}\)
Subsection 3.1.2 Orthogonal sets of vectors
In Chapter 1, we learned that linear independence and span are important concepts associated to a set of vectors. In this chapter, we learn what it means for a set of vectors to be orthogonal, and try to understand why this concept is just as important as independence and span.
Definition 3.1.12.
A set of vectors \(\{\vv_1,\vv_2,\ldots, \vv_k\}\) in \(\R^n\) is called orthogonal if:
\(\vv_i\neq \zer\) for each \(i=1,2,\ldots, k\)
\(\vv_i\dotp\vv_j = 0\) for all \(i\neq j\)
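Both conditions in Definition 3.1.12 can be checked mechanically. The sketch below is one way to do so in Python; the function name \texttt{is\_orthogonal\_set} is hypothetical, and the test for zero is exact, so it assumes integer (or otherwise exact) entries.

```python
def dot(x, y):
    """Dot product of two equal-length tuples."""
    return sum(a * b for a, b in zip(x, y))

def is_orthogonal_set(vectors):
    """True if every vector is nonzero and distinct vectors have dot product 0."""
    # Condition 1: no vector in the set is the zero vector.
    if any(all(a == 0 for a in v) for v in vectors):
        return False
    # Condition 2: every pair of different vectors is orthogonal.
    return all(dot(vectors[i], vectors[j]) == 0
               for i in range(len(vectors))
               for j in range(i + 1, len(vectors)))

print(is_orthogonal_set([(1, 1, 0), (1, -1, 0), (0, 0, 2)]))  # True
print(is_orthogonal_set([(1, 1, 0), (1, -1, 0), (0, 0, 0)]))  # False: zero vector
print(is_orthogonal_set([(1, 1, 0), (1, 0, 0)]))              # False: dot product is 1
```

Only pairs with \(i \lt j\) are tested, since the dot product is symmetric.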
Exercise 3.1.13.
Show that the following is an orthogonal subset of \(\R^4\text{.}\)
\begin{equation*}
\{(1,0,1,0), (-1,0,1,1), (1,1,-1,2)\}
\end{equation*}
Can you find a fourth vector that is orthogonal to each vector in this set?
Hint.
The dot product of the fourth vector with each vector above must be zero. Can you turn this requirement into a system of equations?
Exercise 3.1.14.
If \(\{\vv,\ww\}\) and \(\{\xx,\yy\}\) are orthogonal sets of vectors in \(\R^n\text{,}\) then \(\{\vv,\ww,\xx,\yy\}\) is an orthogonal set of vectors.
True.
Try to construct an example. The vector \(\xx\) has to be orthogonal to \(\yy\text{,}\) but is there any reason it has to be orthogonal to \(\vv\) or \(\ww\text{?}\)
False.
Try to construct an example. The vector \(\xx\) has to be orthogonal to \(\yy\text{,}\) but is there any reason it has to be orthogonal to \(\vv\) or \(\ww\text{?}\)
The requirement that the vectors in an orthogonal set be nonzero is partly because the alternative would be boring, and partly because it lets us state the following theorem.
Theorem 3.1.15.
Any orthogonal set of vectors is linearly independent.
Strategy.
Any proof of linear independence should start by defining our set of vectors, and assuming that a linear combination of these vectors is equal to the zero vector, with the goal of showing that the scalars have to be zero.
Set up the equation (say, \(c_1\vv_1+\cdots +c_k\vv_k=\zer\)), with the assumption that your set of vectors is orthogonal. What happens if you take the dot product of both sides with one of these vectors?
Proof.
Suppose \(S=\{\vv_1,\vv_2,\ldots, \vv_k\}\) is orthogonal, and suppose
\begin{equation*}
c_1\vv_1+c_2\vv_2+\cdots + c_k\vv_k = \zer
\end{equation*}
for scalars \(c_1,c_2,\ldots, c_k\text{.}\) Taking the dot product of both sides of the above equation with \(\vv_1\) gives
\begin{align*}
c_1(\vv_1\dotp \vv_1)+c_2(\vv_1\dotp \vv_2)+\cdots +c_k(\vv_1\dotp \vv_k) \amp =\vv_1\dotp \zer\\
c_1\len{\vv_1}^2+0+\cdots + 0\amp = 0 \text{.}
\end{align*}
Since \(\len{\vv_1}^2\neq 0\text{,}\) we must have \(c_1=0\text{.}\) We similarly find that all the remaining scalars are zero by taking the dot product with \(\vv_2,\ldots, \vv_k\text{.}\)
Another useful consequence of orthogonality: in two dimensions, we have the Pythagorean Theorem for right-angled triangles. If the “legs” of the triangle are identified with vectors \(\xx\) and \(\yy\text{,}\) and the hypotenuse with \(\zz\text{,}\) then \(\len{\xx}^2+\len{\yy}^2=\len{\zz}^2\text{,}\) since \(\xx\dotp \yy=0\text{.}\)
In \(n\) dimensions, we have the following, which follows from the fact that all “cross terms” (dot products of different vectors) will vanish.
Theorem 3.1.16. Pythagorean Theorem.
For any orthogonal set of vectors \(\{\xx_1,\ldots, \xx_k\}\) we have
\begin{equation*}
\len{\xx_1+\cdots +\xx_k}^2 = \len{\xx_1}^2+\cdots + \len{\xx_k}^2\text{.}
\end{equation*}
Strategy.
Remember that
\begin{equation*}
\len{\xx_1+\cdots + \xx_k}^2 = (\xx_1+\cdots +\xx_k)\dotp (\xx_1+\cdots +\xx_k)\text{,}
\end{equation*}
and use the distributive property of the dot product, along with the fact that each pair of different vectors is orthogonal.
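To see the Pythagorean Theorem in action, the following Python sketch compares both sides of the identity for a small orthogonal set, and then for a pair of vectors that is not orthogonal. The sample vectors and the helper name \texttt{norm\_squared} are assumptions made for illustration; squared norms are used so that the integer arithmetic stays exact.

```python
def dot(x, y):
    """Dot product of two equal-length tuples."""
    return sum(a * b for a, b in zip(x, y))

def norm_squared(x):
    """Squared norm of x, which equals x . x."""
    return dot(x, x)

def vector_sum(vectors):
    """Componentwise sum of a list of vectors."""
    return tuple(sum(components) for components in zip(*vectors))

# An orthogonal set in R^3: all cross terms vanish.
orthogonal = [(1, 1, 0), (1, -1, 0), (0, 0, 2)]
lhs = norm_squared(vector_sum(orthogonal))
rhs = sum(norm_squared(v) for v in orthogonal)
print(lhs, rhs)   # both equal 8

# A pair that is not orthogonal: the cross term 2(a . b) = 2 spoils the identity.
skew = [(1, 0), (1, 1)]
print(norm_squared(vector_sum(skew)), sum(norm_squared(v) for v in skew))   # 5 and 3
```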
Our last basic result about orthogonal sets of vectors relates to span. In general, we know that if \(\yy\in\spn\{\xx_1,\ldots, \xx_k\}\text{,}\) then it is possible to solve for scalars \(c_1,\ldots, c_k\) such that \(\yy=c_1\xx_1+\cdots+ c_k\xx_k\text{.}\) The trouble is that finding these scalars generally involves setting up, and then solving, a system of linear equations. The great thing about orthogonal sets of vectors is that we can provide explicit formulas for the scalars.
Theorem 3.1.17. Fourier expansion theorem.
Let \(S=\{\vv_1,\vv_2,\ldots, \vv_k\}\) be an orthogonal set of vectors. For any \(\yy\in \spn S\text{,}\) we have
\begin{equation*}
\yy = \left(\frac{\yy\dotp\vv_1}{\vv_1\dotp\vv_1}\right)\vv_1+
\left(\frac{\yy\dotp\vv_2}{\vv_2\dotp\vv_2}\right)\vv_2+\cdots +
\left(\frac{\yy\dotp\vv_k}{\vv_k\dotp\vv_k}\right)\vv_k\text{.}
\end{equation*}
Strategy.
Take the same approach you used in the proof of
Theorem 3.1.15, but this time, with a nonzero vector on the right-hand side.
Proof.
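Since \(\yy\in\spn S\text{,}\) we can write \(\yy=c_1\vv_1+\cdots + c_k\vv_k\) for some scalars \(c_1,\ldots, c_k\text{.}\) Taking the dot product of both sides of this equation with \(\vv_i\) gives
\begin{equation*}
\vv_i\dotp\yy = c_i(\vv_i\dotp\vv_i)\text{,}
\end{equation*}
since the dot product of \(\vv_i\) with \(\vv_j\) for \(i\neq j\) is zero. Because \(\vv_i\neq\zer\text{,}\) we have \(\vv_i\dotp\vv_i\neq 0\text{,}\) so \(c_i = \dfrac{\yy\dotp\vv_i}{\vv_i\dotp\vv_i}\) for each \(i\text{,}\) as claimed.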
One use of Theorem 3.1.17 is determining whether or not a given vector is in the span of an orthogonal set. If it is in the span, then its coefficients must satisfy the Fourier expansion formula. Therefore, if we compute the right-hand side of the above formula and do not get our original vector, then that vector must not be in the span.
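The membership test just described can be sketched in Python as follows. The function names (\texttt{fourier\_coefficients}, \texttt{fourier\_sum}, \texttt{in\_span}) and the two test vectors are hypothetical choices for this illustration. The sketch assumes that the given set really is orthogonal and that all entries are integers, so that \texttt{Fraction} keeps the arithmetic exact.

```python
from fractions import Fraction

def dot(x, y):
    """Dot product of two equal-length tuples."""
    return sum(a * b for a, b in zip(x, y))

def fourier_coefficients(y, basis):
    """Coefficients (y . v_i)/(v_i . v_i) for an orthogonal set `basis`."""
    return [Fraction(dot(y, v), dot(v, v)) for v in basis]

def fourier_sum(y, basis):
    """Right-hand side of the Fourier expansion formula for y."""
    coeffs = fourier_coefficients(y, basis)
    return tuple(sum(c * v[i] for c, v in zip(coeffs, basis))
                 for i in range(len(y)))

def in_span(y, basis):
    """y lies in the span of the orthogonal set exactly when the formula returns y."""
    return fourier_sum(y, basis) == tuple(y)

S = [(1, 0, 1, 0), (-1, 0, 1, 1), (1, 1, -1, 2)]

y = (6, 3, -2, 5)
print(fourier_coefficients(y, S))   # coefficients 2, -1, 3 (as Fractions)
print(in_span(y, S))                # True

z = (1, 0, 0, 0)
print(fourier_sum(z, S))            # first component is 41/42, not 1
print(in_span(z, S))                # False
```

The test is conclusive in both directions: the right-hand side of the formula is always a linear combination of the vectors in the set, so if it reproduces the original vector, that vector is in the span.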
Exercise 3.1.18.
Determine whether or not the vectors \(\vv=(1,-4,3,-11), \ww=(3,1,-4,2)\) belong to the span of the vectors \(\xx_1=(1,0,1,0), \xx_2=(-1,0,1,1), \xx_3=(1,1,-1,2)\text{.}\)
(We confirmed that \(\{\xx_1,\xx_2,\xx_3\}\) is an orthogonal set in Exercise 3.1.13.)
The Fourier expansion is especially simple if our basis vectors have norm one, since the denominators in each coefficient disappear. Recall that a unit vector in \(\R^n\) is any vector \(\xx\) with \(\len{\xx}=1\text{.}\) For any nonzero vector \(\vv\text{,}\) a unit vector (that is, a vector of norm one) in the direction of \(\vv\) is given by
\begin{equation*}
\hat{\uu} = \frac{1}{\len{\vv}}\vv\text{.}
\end{equation*}
We often say that the vector \(\hat{\uu}\) is obtained by normalizing \(\vv\text{.}\) (The convention of using a “hat” for unit vectors is common but not universal.)
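Normalizing a vector takes only a few lines of Python. In the sketch below, the function name \texttt{normalize} is our own, and the zero vector is rejected explicitly, since it has no direction and cannot be normalized.

```python
import math

def normalize(v):
    """Return the unit vector in the direction of the nonzero vector v."""
    length = math.sqrt(sum(a * a for a in v))
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return tuple(a / length for a in v)

u_hat = normalize((3, 0, 4))
print(u_hat)                                               # (0.6, 0.0, 0.8)
print(math.isclose(sum(a * a for a in u_hat), 1.0))        # True: norm is one
```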
Exercise 3.1.19.
Definition 3.1.20.
A basis \(B\) of \(\R^n\) is called an orthonormal basis if \(B\) is orthogonal, and all the vectors in \(B\) are unit vectors.
Example 3.1.21.
The set
\begin{equation*}
\{(1,0,1,0), (-1,0,1,1), (1,1,-1,2),(1,-6,-1,2)\}
\end{equation*}
is orthogonal. Since it’s orthogonal, it must be independent, and since it’s a set of four independent vectors in \(\R^4\text{,}\) it must be a basis. To get an orthonormal basis, we normalize each vector:
\begin{align*}
\hat{\uu}_1 \amp = \frac{1}{\sqrt{1^2+0^2+1^2+0^2}}(1,0,1,0) = \frac{1}{\sqrt{2}}(1,0,1,0)\\
\hat{\uu}_2 \amp = \frac{1}{\sqrt{(-1)^2+0^2+1^2+1^2}}(-1,0,1,1) = \frac{1}{\sqrt{3}}(-1,0,1,1)\\
\hat{\uu}_3 \amp = \frac{1}{\sqrt{1^2+1^2+(-1)^2+2^2}}(1,1,-1,2) = \frac{1}{\sqrt{7}}(1,1,-1,2)\\
\hat{\uu}_4 \amp = \frac{1}{\sqrt{1^2+(-6)^2+(-1)^2+2^2}}(1,-6,-1,2) = \frac{1}{\sqrt{42}}(1,-6,-1,2)\text{.}
\end{align*}
The set \(\{\hat{\uu}_1,\hat{\uu}_2,\hat{\uu}_3,\hat{\uu}_4\}\) is then an orthonormal basis of \(\R^4\text{.}\)
The process of creating unit vectors does typically introduce square root coefficients in our vectors. This can seem undesirable, but there remains value in having an orthonormal basis. For example, suppose we wanted to write the vector \(\vv=(3,5,-1,2)\) in terms of our basis. We can quickly compute
\begin{align*}
\vv\dotp\hat{\uu}_1 \amp = \frac{3}{\sqrt{2}}-\frac{1}{\sqrt{2}}=\sqrt{2}\\
\vv\dotp\hat{\uu}_2 \amp = -\frac{3}{\sqrt{3}}-\frac{1}{\sqrt{3}}+\frac{2}{\sqrt{3}}=-\frac{2}{\sqrt{3}}\\
\vv\dotp\hat{\uu}_3 \amp = \frac{3}{\sqrt{7}}+\frac{5}{\sqrt{7}}+\frac{1}{\sqrt{7}}+\frac{4}{\sqrt{7}} = \frac{13}{\sqrt{7}}\\
\vv\dotp\hat{\uu}_4 \amp = \frac{3}{\sqrt{42}}-\frac{30}{\sqrt{42}}+\frac{1}{\sqrt{42}}+\frac{4}{\sqrt{42}} = -\frac{22}{\sqrt{42}}\text{,}
\end{align*}
and so
\begin{equation*}
\vv = \sqrt{2}\hat{\uu}_1-\frac{2}{\sqrt{3}}\hat{\uu}_2+\frac{13}{\sqrt{7}}\hat{\uu}_3-\frac{22}{\sqrt{42}}\hat{\uu}_4\text{.}
\end{equation*}
There’s still work to be done, but it is considerably simpler than solving the corresponding system of equations.
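A computation like this one is easy to double-check numerically. The Python sketch below normalizes the four basis vectors from this example, computes each coefficient as a single dot product, and then rebuilds \(\vv\) from those coefficients. The helper names are our own, and because the arithmetic is floating-point, the comparison uses a small tolerance instead of exact equality.

```python
import math

def dot(x, y):
    """Dot product of two equal-length tuples."""
    return sum(a * b for a, b in zip(x, y))

def normalize(v):
    """Unit vector in the direction of the nonzero vector v."""
    length = math.sqrt(dot(v, v))
    return tuple(a / length for a in v)

basis = [(1, 0, 1, 0), (-1, 0, 1, 1), (1, 1, -1, 2), (1, -6, -1, 2)]
unit_basis = [normalize(b) for b in basis]
v = (3, 5, -1, 2)

# For an orthonormal basis, each coefficient is just a dot product.
coeffs = [dot(v, u) for u in unit_basis]

# Rebuild v from its coefficients, one component at a time.
rebuilt = tuple(sum(c * u[i] for c, u in zip(coeffs, unit_basis))
                for i in range(len(v)))

print(coeffs)
print(all(math.isclose(r, a, abs_tol=1e-12) for r, a in zip(rebuilt, v)))
```

If the final line prints \texttt{True}, the coefficients reproduce \(\vv\) up to rounding error, which is a useful sanity check on a hand computation.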