Quadratic forms

Section 4.4 Quadratic forms

If you’ve done a couple of calculus courses, you’ve probably encountered conic sections, like the ellipse \(\frac{x^2}{a^2}+\frac{y^2}{b^2}=1\) or the parabola \(\frac{y}{b}=\frac{x^2}{a^2}\text{.}\) You might also recall that your instructor was careful to avoid conic sections with equations including “cross-terms” like \(xy\text{.}\) The reason for this is that sketching a conic section like \(x^2+4xy+y^2=1\) requires the techniques of the previous section.

A basic fact about orthogonal matrices is that they preserve length. Indeed, for any vector \(\xx\) in \(\R^n\) and any orthogonal matrix \(P\text{,}\)

\begin{equation*} \len{P\xx}^2 = (P\xx)\dotp (P\xx) = (P\xx)^T(P\xx) = (\xx^TP^T)(P\xx) = \xx^T\xx=\len{\xx}^2\text{,} \end{equation*}

since \(P^TP=I_n\text{.}\)

Note also that since \(P^TP=I_n\) and \(\det P^T=\det P\text{,}\) we have

\begin{equation*} \det(P)^2=\det(P^TP)=\det(I_n)=1\text{,} \end{equation*}

so \(\det(P)=\pm 1\text{.}\) If \(\det P=1\text{,}\) we have what is called a special orthogonal matrix. In \(\R^2\) or \(\R^3\text{,}\) multiplication by a special orthogonal matrix is simply a rotation. (If \(\det P=-1\text{,}\) there is also a reflection.)
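As a quick numerical sanity check of the two facts above, here is a minimal Python sketch using only the standard library. The helper names (`rotation`, `mat_vec`, `det2`, `length`), the angle, and the test vector are arbitrary illustrative choices, not notation from the text.

```python
import math

def rotation(theta):
    """Return the 2x2 rotation matrix for angle theta (special orthogonal)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def mat_vec(P, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]

def det2(P):
    """Determinant of a 2x2 matrix."""
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]

def length(x):
    """Euclidean length of a vector."""
    return math.sqrt(sum(xi * xi for xi in x))

P = rotation(0.7)                # arbitrary angle; determinant should be 1
R = [[1.0, 0.0], [0.0, -1.0]]    # reflection across the first axis; determinant -1
x = [3.0, -4.0]                  # arbitrary vector of length 5

# Both orthogonal matrices preserve the length of x (up to rounding).
print(length(x), length(mat_vec(P, x)), length(mat_vec(R, x)))
print(det2(P), det2(R))
```

Both `P` and `R` are orthogonal, so both preserve length; only the determinant distinguishes the rotation from the reflection.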

We mentioned in the previous section that the Real Spectral Theorem is also referred to as the principal axes theorem. The name comes from the fact that one way to interpret the orthogonal diagonalization of a symmetric matrix is that we are rotating our coordinate system. The original coordinate axes are rotated to new coordinate axes, with respect to which the matrix \(A\) is diagonal. This will become clearer once we apply these ideas to the problem of conic sections mentioned above. First, a definition.

Definition 4.4.1.

A quadratic form on variables \(x_1, x_2,\ldots, x_n\) is any expression of the form

\begin{equation*} q(x_1,\ldots, x_n) = \sum_{i\leq j}a_{ij}x_ix_j\text{.} \end{equation*}

For example, \(q_1(x,y)=4 x^2-4xy+4y^2\) and \(q_2(x,y,z)=9x^2+4 y^2-4xy-2xz+z^2\) are quadratic forms. Note that each term in a quadratic form is of degree two. We omit linear terms, since these can be absorbed by completing the square. The important observation is that every quadratic form can be associated to a symmetric matrix. The diagonal entries are the coefficients \(a_{ii}\) appearing in Definition 4.4.1, while the off-diagonal entries are half the corresponding coefficients \(a_{ij}\text{.}\)

For example, the two quadratic forms given above have the following associated matrices:

\begin{equation*} A_1 = \bbm 4 \amp -2\\-2\amp 4\ebm \text{ and } A_2 = \bbm 9 \amp -2 \amp -1\\-2\amp 4\amp 0\\-1\amp 0\amp 1\ebm\text{.} \end{equation*}

The reason for this is that we can then write

\begin{equation*} q_1(x,y)=\bbm x\amp y\ebm\bbm 4 \amp -2\\-2\amp 4\ebm\bbm x\\y\ebm \end{equation*}

and

\begin{equation*} q_2(x,y,z)=\bbm x\amp y\amp z\ebm\bbm 9 \amp -2 \amp -1\\-2\amp 4\amp 0\\-1\amp 0\amp 1\ebm\bbm x\\y\\z\ebm\text{.} \end{equation*}
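The recipe for building the associated matrix (diagonal coefficients kept, cross-term coefficients halved and placed symmetrically) can be sketched in Python. This is only an illustration: the function names `symmetric_matrix` and `evaluate`, and the choice to store coefficients in a dictionary keyed by 0-based index pairs, are assumptions made for this example.

```python
def symmetric_matrix(coeffs, n):
    """Build the symmetric matrix of q = sum_{i<=j} a_ij x_i x_j.

    coeffs maps 0-based index pairs (i, j) with i <= j to the coefficient a_ij.
    """
    A = [[0.0] * n for _ in range(n)]
    for (i, j), a in coeffs.items():
        if i == j:
            A[i][i] = float(a)           # diagonal entry: the coefficient itself
        else:
            A[i][j] = A[j][i] = a / 2    # off-diagonal: half the coefficient
    return A

def evaluate(A, x):
    """Compute the quadratic form x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

# q_1(x, y) = 4x^2 - 4xy + 4y^2
q1_coeffs = {(0, 0): 4, (0, 1): -4, (1, 1): 4}
A1 = symmetric_matrix(q1_coeffs, 2)
print(A1)                        # [[4.0, -2.0], [-2.0, 4.0]]
print(evaluate(A1, [1.0, 2.0]))  # 4 - 8 + 16 = 12.0
```

Splitting each cross-term coefficient evenly between the \((i,j)\) and \((j,i)\) entries is what makes the matrix symmetric while leaving the value of the form unchanged.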

Of course, the reason for wanting to associate a symmetric matrix to a quadratic form is that it can be orthogonally diagonalized. Consider the matrix \(A_1\text{.}\)

Writing \(A=A_1\) for brevity, we find distinct eigenvalues \(\lambda_1=2\) and \(\lambda_2=6\text{.}\) Since \(A\) is symmetric, we know the corresponding eigenvectors will be orthogonal.
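One way to fill in the computation: the characteristic polynomial of \(A_1\) is

\begin{equation*} \det(A_1-\lambda I_2) = (4-\lambda)^2-4 = \lambda^2-8\lambda+12 = (\lambda-2)(\lambda-6)\text{,} \end{equation*}

and solving \((A_1-\lambda I_2)\xx=\mathbf{0}\) for each root gives, for instance, the eigenvectors

\begin{equation*} \bbm 1\\1\ebm \text{ for } \lambda_1=2 \quad\text{ and }\quad \bbm -1\\1\ebm \text{ for } \lambda_2=6\text{.} \end{equation*}

Their dot product is \((1)(-1)+(1)(1)=0\text{,}\) as expected. Each has length \(\sqrt{2}\text{,}\) so dividing by \(\sqrt{2}\) produces unit eigenvectors, which we use as the columns of an orthogonal matrix.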

The resulting orthogonal matrix is \(P=\frac{1}{\sqrt{2}}\bbm 1\amp -1\\1\amp 1\ebm\text{,}\) and we find

\begin{equation*} P^TAP = \bbm 2\amp 0\\0\amp 6\ebm, \text{ or } A = PDP^T, \end{equation*}

where \(D = \bbm 2\amp 0\\0\amp 6\ebm\text{.}\) If we define new variables \(y_1,y_2\) by

\begin{equation*} \bbm y_1\\y_2\ebm = P^T\bbm x_1\\x_2\ebm\text{,} \end{equation*}

then we find that

\begin{align*} \bbm x_1\amp x_2\ebm A\bbm x_1\\x_2\ebm \amp = (\bbm x_1\amp x_2\ebm P)D\left(P^T\bbm x_1\\x_2\ebm\right) \\ \amp = \bbm y_1 \amp y_2\ebm\bbm 2\amp 0\\0\amp 6\ebm\bbm y_1\\y_2\ebm\\ \amp = 2y_1^2+6y_2^2\text{.} \end{align*}

Note that there is no longer any cross term.
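The identity above can be spot-checked numerically. The following is a minimal Python sketch under the assumptions of this example (the matrix \(P\) and the form \(4x_1^2-4x_1x_2+4x_2^2\)); the function names `q` and `to_new_coords` and the sample points are arbitrary.

```python
import math

s = 1 / math.sqrt(2)
P = [[s, -s], [s, s]]   # columns are the unit eigenvectors (1,1)/sqrt(2), (-1,1)/sqrt(2)

def q(x1, x2):
    """The original form 4*x1^2 - 4*x1*x2 + 4*x2^2."""
    return 4 * x1**2 - 4 * x1 * x2 + 4 * x2**2

def to_new_coords(x1, x2):
    """Compute (y1, y2) = P^T (x1, x2)."""
    y1 = P[0][0] * x1 + P[1][0] * x2
    y2 = P[0][1] * x1 + P[1][1] * x2
    return y1, y2

# The two printed values in each row should agree up to rounding.
for x1, x2 in [(1.0, 0.0), (2.0, -3.0), (0.5, 0.25)]:
    y1, y2 = to_new_coords(x1, x2)
    print(q(x1, x2), 2 * y1**2 + 6 * y2**2)
```

Agreement at a few sample points is not a proof, but it is a cheap way to catch a sign error or a transposed matrix.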

Now, suppose we want to graph the conic \(4x_1^2-4x_1x_2+4x_2^2=12\text{.}\) By changing to the variables \(y_1,y_2\) this becomes \(2y_1^2+6y_2^2=12\text{,}\) or \(\frac{y_1^2}{6}+\frac{y_2^2}{2}=1\text{.}\) This is the standard form of an ellipse, but in terms of new variables. How do we graph it? Returning to the definition of our new variables, we find \(y_1=\frac{1}{\sqrt{2}}(x_1+x_2)\) and \(y_2=\frac{1}{\sqrt{2}}(-x_1+x_2)\text{.}\) The \(y_1\) axis should be the line \(y_2=0\text{,}\) or \(x_1=x_2\text{.}\) (Note that this line points in the direction of the eigenvector \(\bbm 1\\1\ebm\text{.}\)) The \(y_2\) axis should be the line \(y_1=0\text{,}\) or \(x_1=-x_2\text{,}\) which is in the direction of the eigenvector \(\bbm -1\\1\ebm\text{.}\)

This lets us see that our new coordinate axes are simply a rotation (by \(\pi/4\)) of the old coordinate axes, and our conic section is, accordingly, an ellipse that has been rotated by the same angle.
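To place the ellipse on the original axes, it can help to locate its four vertices. In the new variables, the ellipse \(\frac{y_1^2}{6}+\frac{y_2^2}{2}=1\) meets the \(y_1\) axis at \(y_1=\pm\sqrt{6}\) and the \(y_2\) axis at \(y_2=\pm\sqrt{2}\text{.}\) Moving these distances along the unit eigenvectors gives, in the original variables,

\begin{equation*} \bbm x_1\\x_2\ebm = \pm\frac{\sqrt{6}}{\sqrt{2}}\bbm 1\\1\ebm = \pm\bbm \sqrt{3}\\\sqrt{3}\ebm \quad\text{ and }\quad \bbm x_1\\x_2\ebm = \pm\frac{\sqrt{2}}{\sqrt{2}}\bbm -1\\1\ebm = \pm\bbm -1\\1\ebm\text{.} \end{equation*}

As a check, both points satisfy the original equation: \(4(3)-4(3)+4(3)=12\) and \(4(1)-4(-1)+4(1)=12\text{.}\)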

Remark 4.4.2.

One reason to study quadratic forms is the classification of critical points in calculus. You may recall (if you took Calculus 1) that for a twice-differentiable function \(f(x)\text{,}\) if \(f'(c)=0\) and \(f''(c)\gt 0\) at some number \(c\text{,}\) then \(f\) has a local minimum at \(c\text{.}\) Similarly, if \(f'(c)=0\) and \(f''(c)\lt 0\text{,}\) then \(f\) has a local maximum at \(c\text{.}\)

For functions of two or more variables, determining whether a critical point is a maximum or minimum (or something else) is more complicated. Or rather, it is more complicated for those unfamiliar with linear algebra! The second-order partial derivatives of our function can be arranged into a matrix called the Hessian matrix. For example, a function \(f(x,y)\) of two variables has first-order partial derivatives \(f_x(x,y)\) and \(f_y(x,y)\) with respect to \(x\) and \(y\text{,}\) respectively, and second-order partial derivatives \(f_{xx}(x,y)\) (twice with respect to \(x\)), \(f_{xy}(x,y)\) (first \(x\text{,}\) then \(y\)), \(f_{yx}(x,y)\) (first \(y\text{,}\) then \(x\)), and \(f_{yy}(x,y)\) (twice with respect to \(y\)).

The Hessian matrix at a point \((a,b)\) is

\begin{equation*} H_f(a,b) = \bbm f_{xx}(a,b) \amp f_{xy}(a,b)\\f_{yx}(a,b) \amp f_{yy}(a,b)\ebm\text{.} \end{equation*}

As long as the second-order partial derivatives are continuous at \((a,b)\text{,}\) it is guaranteed that the Hessian matrix is symmetric! That means that there is a corresponding quadratic form, and when the first-order derivatives \(f_x(a,b)\) and \(f_y(a,b)\) are both zero (a critical point), it turns out that this quadratic form provides the best quadratic approximation to \(f(x,y)\) near the point \((a,b)\text{.}\) This is true for three or more variables as well.
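In symbols, this is the standard second-order Taylor approximation, stated here without proof. Writing \(h\) and \(k\) for small displacements from \((a,b)\) (these letters are introduced only for this illustration), at a critical point we have

\begin{equation*} f(a+h,b+k)\approx f(a,b)+\frac{1}{2}\bbm h\amp k\ebm H_f(a,b)\bbm h\\k\ebm\text{,} \end{equation*}

so the sign of the quadratic form in \(h\) and \(k\) indicates whether nearby values of \(f\) lie above or below \(f(a,b)\text{.}\)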

The eigenvalues of this matrix then give us some information about the behaviour of our function near the critical point. If all eigenvalues are positive at a point, we say that the corresponding quadratic form is positive-definite, and the function \(f\) has a local minimum at that point. If all eigenvalues are negative at a point, we say that the corresponding quadratic form is negative-definite, and the function \(f\) has a local maximum at that point. If all eigenvalues are nonzero at a point, with some positive and some negative, we say that \(f\) has a saddle point. The corresponding quadratic form is called indefinite, and this term applies even if some eigenvalues are zero.

If a quadratic form corresponds to a symmetric matrix whose eigenvalues are all positive or zero, we say that the quadratic form is positive-semidefinite. Similarly, a negative-semidefinite quadratic form corresponds to a symmetric matrix whose eigenvalues are all less than or equal to zero.
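The classification described in this remark can be sketched in Python for the two-variable case. This is a minimal illustration, not a general-purpose routine: the function names `eigenvalues_2x2` and `classify`, the tolerance `tol`, and the sample Hessians are assumptions made for this example. The eigenvalue formula used is the usual closed form for a symmetric \(2\times 2\) matrix.

```python
import math

def eigenvalues_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return mean - radius, mean + radius

def classify(eigs, tol=1e-12):
    """Classify a quadratic form from the eigenvalues of its symmetric matrix."""
    pos = sum(1 for e in eigs if e > tol)
    neg = sum(1 for e in eigs if e < -tol)
    if pos and neg:
        return "indefinite"
    if pos == len(eigs):
        return "positive-definite"
    if neg == len(eigs):
        return "negative-definite"
    # Some eigenvalue is (numerically) zero and the rest share a sign.
    # The zero form is both semidefinite types; it is reported as positive here.
    return "positive-semidefinite" if neg == 0 else "negative-semidefinite"

# Hessian of f(x, y) = x^2 + xy + y^2 (constant): [[2, 1], [1, 2]]
print(classify(eigenvalues_2x2(2, 1, 2)))    # positive-definite
# Hessian of f(x, y) = x^2 - y^2 (constant): [[2, 0], [0, -2]]
print(classify(eigenvalues_2x2(2, 0, -2)))   # indefinite
# Matrix of the form x^2 + 2xy + y^2 = (x + y)^2: [[1, 1], [1, 1]]
print(classify(eigenvalues_2x2(1, 1, 1)))    # positive-semidefinite
```

For the first two functions the origin is a critical point, so the output corresponds to a local minimum and a saddle point, respectively. In the semidefinite case the eigenvalues alone do not settle whether a critical point is a maximum, a minimum, or neither.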

Exercises

1.

Write the matrix of the quadratic form \(Q(x_1,x_2,x_3) = {7x_{1}^{2}-x_{2}^{2}-3x_{3}^{2}+9x_{1}x_{2}-6x_{1}x_{3}+4x_{2}x_{3}}.\)

2.

Determine the quadratic form \(Q(\vec{x}) = \vec{x}^TA\vec{x}\) associated to the matrix

\begin{equation*} A = {\left[\begin{array}{ccc} 9 \amp -8 \amp 5\cr -8 \amp -5 \amp 9\cr 5 \amp 9 \amp 4 \end{array}\right]}. \end{equation*}

3.

The matrix

\begin{equation*} A = {\left[\begin{array}{ccc} 2.1 \amp 0 \amp 0.3\cr 0 \amp 6 \amp 0\cr 0.3 \amp 0 \amp 2.9 \end{array}\right]} \end{equation*}

has three distinct eigenvalues, \(\lambda_1 \lt \lambda_2 \lt \lambda_3\text{.}\) Find the eigenvalues, and classify the quadratic form \(Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}\text{.}\)
