Section 14.7 Constrained Optimization and Lagrange Multipliers

Let us continue our discussion of constrained optimization begun in Section 14.5. Theorem 14.5.19 tells us that the Extreme Value Theorem extends to functions of two variables; in fact, this is true for a function of any number of variables: if a real-valued function \(f\) is continuous on a closed, bounded subset of \(\mathbb{R}^n\text{,}\) then it is guaranteed to have an absolute maximum and minimum.

However, as the number of variables increases, the job of finding these absolute extrema becomes more and more complicated. We saw one approach in Section 14.5: given a continuous function on a closed, bounded region \(D\text{,}\) we first consider critical values on the interior of \(D\text{.}\) We then restrict our function \(f\) to points on the boundary of \(D\text{,}\) and attempt to reduce the problem to optimization in one variable.

In many cases, this approach is best accomplished by parametrizing the boundary. We learned how to define parametric curves in the plane in Section 9.2.

Example 14.7.1. Constrained optimization by parametrization.

Find the absolute maximum and minimum values of \(f(x,y) = x^2-8x-3y^2\) on the disc \(x^2+y^2\leq 4\text{.}\)

We begin by looking for critical points of \(f\) in the interior of the disc. The gradient of \(f\) is \(\nabla f(x,y) = \la 2x-8,-6y\ra\text{,}\) which vanishes when \((x,y) = (4,0)\text{.}\) This critical point is outside our region, so we do not consider it.

Next, we look for extreme values on the boundary. The boundary of our region is the circle \(x^2+y^2=4\text{,}\) which we can parametrize using \(x=2\cos t\text{,}\) \(y=2\sin t\text{,}\) for \(t\in [0,2\pi]\text{.}\) For \((x,y)\) on the boundary, we have

\begin{equation*}
h(t) = f(2\cos t, 2\sin t) = 4\cos^2 t - 16\cos t - 12\sin^2 t\text{,}
\end{equation*}

a function of one variable, with domain \([0,2\pi]\text{.}\)

We learned how to find the extreme values of such a function back in our first course in calculus: see Section 3.1. We have \(h(0)=h(2\pi)=-12\text{,}\) and

\begin{equation*}
h'(t) = -8\cos t\sin t+16\sin t-24\sin t\cos t = 16\sin t (1-2\cos t)\text{.}
\end{equation*}

Thus, \(h'(t)=0\) if \(\sin t = 0\) (\(t=0,\pi,2\pi\)) or \(\cos t =\frac12\) (\(t=\pi/3, 5\pi/3\)). We have already checked that \(h(0)=h(2\pi)=-12\text{,}\) so we check the remaining points:

We see that the absolute maximum occurs when \(t=\pi\text{:}\) \(h(\pi) = f(-2,0)=20\text{.}\) The absolute minimum is \(-16\text{,}\) which occurs when \(t=\pi/3\) and \(t=5\pi/3\text{,}\) corresponding to the points \((1,\sqrt{3})\) and \((1,-\sqrt{3})\text{,}\) respectively.
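The computation above is easy to verify numerically. The following Python sketch (not part of the original solution) evaluates \(h\) at the endpoints and at the critical numbers found above:

```python
import math

# f restricted to the boundary circle x = 2cos(t), y = 2sin(t)
def f(x, y):
    return x**2 - 8*x - 3*y**2

def h(t):
    return f(2*math.cos(t), 2*math.sin(t))

# Candidate t-values: the endpoints and the critical numbers of h
candidates = [0.0, math.pi/3, math.pi, 5*math.pi/3, 2*math.pi]
values = {t: h(t) for t in candidates}

print(max(values.values()))  # approximately 20 (at t = pi)
print(min(values.values()))  # approximately -16 (at t = pi/3, 5pi/3)
```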

The above method works well when it is straightforward to set up. Its advantage is that it reduces optimization along the boundary to an optimization problem in one variable, which is something we mastered long ago.

One downside is that it is not always easy to come up with a parametrization for a curve. In the above example, the boundary \(x^2+y^2=4\) is a level curve: it's of the form \(g(x,y)=c\text{.}\) When we're trying to optimize subject to a constraint of this form, there is another approach, called the method of Lagrange multipliers.

Suppose we are trying to maximize a function \(f(x,y)\) subject to a constraint \(g(x,y)=c\text{.}\) We could follow the approach given above: find a function \(\vec{r}: [a,b]\to \mathbb{R}^2\) that parametrizes the curve \(g(x,y)=c\text{.}\) As we saw above, the maximum (or minimum) should occur at some point \(t_0\) that is a critical number of \(h(t)=f(\vec{r}(t))\text{;}\) that is, such that

\begin{equation*}
h'(t_0) = \nabla f(x(t_0),y(t_0))\cdot \vrp(t_0) = 0\text{.}
\end{equation*}

This tells us that the gradient \(\nabla f\) should be orthogonal to the constraint curve \(g(x,y)=c\) at the point \((x_0,y_0)=(x(t_0),y(t_0))\text{.}\) But we know another gradient that is orthogonal to this curve: \(\nabla g\text{!}\) Recall from Theorem 14.3.9 that \(\nabla g(x,y)\) is always orthogonal to the level curve \(g(x,y)=c\) at points along the curve.

Let's sum up: the vectors \(\nabla f(x_0,y_0)\) and \(\nabla g(x_0,y_0)\) are both orthogonal to the vector \(\vrp(t_0)\text{.}\) We assume that \(\nabla f(x_0,y_0)\neq \vec{0}\text{,}\) since critical points of \(f\) have already been checked. We also assume that \(c\) is a regular value of \(g\text{,}\) meaning that there are no critical points of \(g\) along the curve \(g(x,y)=c\text{,}\) so \(\nabla g(x_0,y_0)\neq\vec{0}\) as well.

But the only way that two non-zero vectors in the plane can both be orthogonal to a third vector is if they're parallel! This means that there must be some scalar \(\lambda\) such that

\begin{equation*}
\nabla f(x_0,y_0) = \lambda \nabla g(x_0,y_0)\text{.}
\end{equation*}

Theorem 14.7.3. Lagrange Multipliers.

Let \(f\) and \(g\) be continuously differentiable functions of two variables, and suppose \(c\) is a regular value for \(g\text{.}\) If the function \(f\text{,}\) when constrained to the level curve \(g(x,y)=c\text{,}\) has a local maximum or minimum value at a point \((x_0,y_0)\text{,}\) then there exists a scalar \(\lambda\) such that

\begin{equation*}
\nabla f(x_0,y_0) = \lambda \nabla g(x_0,y_0)\text{.}
\end{equation*}

Note that there are two possibilities: either \(\lambda=0\text{,}\) in which case \((x_0,y_0)\) is a critical point of \(f\text{,}\) or \(\lambda\neq 0\text{,}\) in which case the level curve of \(f\) that passes through \((x_0,y_0)\) must be tangent to the curve \(g(x,y)=c\) at that point. Putting Theorem 14.7.3 to use is a matter of solving a system of equations.

Key Idea 14.7.4. Method of Lagrange Multipliers.

To find the maximum and minimum values of a function \(f\) of two variables subject to a constraint \(g(x,y)=c\text{,}\) we must find the simultaneous solutions to the following equations, where \(\lambda\) is an unknown constant (called a Lagrange multiplier):

\begin{align*}
\nabla f(x,y) \amp = \lambda \nabla g(x,y)\\
g(x,y) \amp = c\text{.}
\end{align*}
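Key Idea 14.7.4 is easy to hand over to a computer algebra system. The sketch below uses the third-party SymPy library; the helper name `lagrange_candidates` is our own, not a SymPy function. We test it on the constraint from Example 14.7.1:

```python
import sympy as sp

def lagrange_candidates(f, g, c, variables):
    """Solve grad f = lam * grad g together with g = c.

    A sketch of Key Idea 14.7.4: returns (solution, value of f) pairs.
    """
    lam = sp.symbols('lam')
    eqs = [sp.Eq(sp.diff(f, v), lam * sp.diff(g, v)) for v in variables]
    eqs.append(sp.Eq(g, c))
    sols = sp.solve(eqs, list(variables) + [lam], dict=True)
    return [(s, f.subs(s)) for s in sols]

# f and the constraint g(x,y) = 4 from Example 14.7.1
x, y = sp.symbols('x y', real=True)
cands = lagrange_candidates(x**2 - 8*x - 3*y**2, x**2 + y**2, 4, (x, y))
for s, val in cands:
    print(s, val)
```

Comparing the values of \(f\) at the candidate points then finishes the problem, exactly as in the hand computation.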

This is the same problem as Example 14.7.1, but this time, we will attempt to solve it using the method of Lagrange multipliers. Again, since \(\nabla f(x,y) = \la 2x-8,-6y\ra\text{,}\) the only critical point for \(f\) is outside the given disc. It remains to find the maximum and minimum of \(f\) subject to the constraint \(x^2+y^2=4\text{,}\) so our constraint function is \(g(x,y)=x^2+y^2\text{.}\) We have

\begin{equation*}
\nabla g(x,y) = \la 2x, 2y\ra\text{,}
\end{equation*}

so the equation \(\nabla f = \lambda \nabla g\) becomes \(\la 2x-8,-6y\ra = \lambda\la 2x,2y\ra\text{.}\)

Together with the constraint, we have three equations:

\begin{align*}
2x-8 \amp = 2\lambda x \quad \Rightarrow\, (1-\lambda)x=4\\
-6y \amp = 2\lambda y \quad \Rightarrow\, y=0 \text{ or } \lambda = -3\\
x^2+y^2 \amp = 4\text{.}
\end{align*}

Now we encounter the primary difficulty with Lagrange multipliers. While the idea is simple, the equations it leads to frequently are not. The equations are rarely linear, so there is no systematic method for solving them: solving a Lagrange multiplier problem requires a certain amount of patience and creativity!

One of the possibilities we see above is \(y=0\text{.}\) If \(y=0\text{,}\) the constraint equation requires \(x=\pm 2\text{,}\) and in either case we can choose a value for \(\lambda\) (\(-1\) and \(3\text{,}\) respectively) that solves the equation \((1-\lambda)x=4\text{.}\)

We find \(f(2,0)=-12\text{,}\) and \(f(-2,0)=20\text{.}\) If \(y\neq 0\text{,}\) then we must have \(\lambda=-3\text{.}\) Putting this into the equation \((1-\lambda)x=4\) gives us \(4x=4\text{,}\) or \(x=1\text{.}\) If \(x=1\text{,}\) the constraint equation gives us \(1+y^2=4\text{,}\) so \(y=\pm \sqrt{3}\text{.}\)

We find \(f(1,\sqrt{3})=f(1,-\sqrt{3}) = -16\text{.}\) There are no other points that satisfy all three equations, so we compare values to complete the problem: the maximum is \(f(-2,0)=20\text{,}\) and the minimum is \(f(1,\pm\sqrt{3})=-16\text{,}\) as before.

The method of Lagrange multipliers seems rather arcane at first glance, but it's actually not hard to understand geometrically why it works.

Consider Figure 14.7.7. The constraint curve \(x^2+y^2=4\) is the dashed circle. We also see the three level curves (solid) that were obtained as solutions to the Lagrange multiplier equations:

\(f(x,y)=-12\text{:}\) passing through \((2,0)\)

\(f(x,y)=20\text{:}\) passing through \((-2,0)\)

\(f(x,y)=-16\text{:}\) this curve is actually a pair of lines, \(\sqrt{3}y=\pm (x-4)\text{,}\) passing through \((1,\pm\sqrt{3})\text{,}\) respectively.

We see that all three curves are tangent to the constraint curve, as we expect from the requirement that the gradients \(\nabla f\) and \(\nabla g\) be parallel where the curves intersect.

Additional level curves \(f(x,y)=c\) are plotted as well, with dashed-dotted lines. For values of \(c\) with \(c\gt20\) (greater than the maximum) or \(c\lt-16\) (less than the minimum), the level curve \(f(x,y)=c\) does not intersect the constraint curve at all.

For values of \(c\) with \(-16\lt c\lt 20\text{,}\) the curve \(f(x,y)=c\) intersects the constraint curve, but the intersection is what's called transversal: at these points of intersection, the two curves are not tangent, and the gradients are not parallel.

In Figure 14.7.7, you can imagine that increasing or decreasing the value of \(c\) has the effect of shifting the level curve one way or the other, until it just touches the circle. Any bigger than the maximum, or smaller than the minimum, and the curves no longer intersect. Of course, saying that the curves “just touch” amounts to saying that they are tangent at their point of intersection, just as Theorem 14.7.3 predicts.

so \(x+y=3\text{,}\) from our constraint, and \(4x=\lambda=2y\text{,}\) giving us \(y=3-x\) and \(y=2x\text{,}\) so \(2x=3-x\text{,}\) giving \(x=1\) and \(y=2\text{.}\)

We get only one solution: the value \(f(1,2)=6\text{.}\) But is this a maximum or a minimum? And shouldn't we get both?

Rather than blindly attacking the equations, perhaps it would do to take a step back and think about the problem. First, consider the constraint equation: \(x+y=3\text{.}\) This is a line; it certainly is not the boundary of a closed, bounded region in the plane. Thus, we haven't satisfied the conditions of the Extreme Value Theorem, and have no reason to expect both an absolute maximum and an absolute minimum.

Now, since the line \(x+y=3\) extends without bound, \(f\) takes arbitrarily large values along it, so there can be no maximum value \(c\) beyond which the ellipse \(2x^2+y^2=c\) fails to intersect the line. There is, however, a minimum value: when \(c=6\text{,}\) the ellipse \(2x^2+y^2=6\) passes through \((1,2)\text{,}\) where the gradient is \(\nabla f(1,2)=\la 4,4\ra\text{,}\) giving us the tangent line

\begin{equation*}
4(x-1)+4(y-2)=0, \text{ or } x+y=3\text{,}
\end{equation*}

the equation of our constraint. For values of \(c\) less than 6, the ellipse \(2x^2+y^2=c\) does not intersect the line \(x+y=3\text{.}\)
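A computer check (again a sketch, using SymPy) confirms that the Lagrange system for this example has only the one solution:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = 2*x**2 + y**2          # the function being optimized
g = x + y                  # constraint: g(x, y) = 3

# grad f = lam * grad g, together with the constraint
eqs = [sp.Eq(4*x, lam), sp.Eq(2*y, lam), sp.Eq(g, 3)]
sols = sp.solve(eqs, [x, y, lam], dict=True)
print(sols)                # a single solution: x = 1, y = 2, lam = 4
print(f.subs(sols[0]))     # the (minimum) value 6
```

The solver finds one candidate, and as the geometric argument shows, it is a minimum rather than a maximum.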

The method of Lagrange multipliers is not restricted to functions of two variables or to single constraints. We can similarly determine the extrema of a function \(f\) of three variables on a closed bounded subset of \(\mathbb{R}^3\text{.}\)

Example 14.7.10. Determining constrained extrema for a function of three variables.

Determine the maximum and minimum values of the function \(f(x,y,z)=x^4+y^4+z^4\text{,}\) subject to the constraint \(x^2+y^2+z^2=1\text{.}\)

With \(g(x,y,z)=x^2+y^2+z^2\text{,}\) the equation \(\nabla f = \lambda\nabla g\) becomes

\begin{equation*}
\la 4x^3, 4y^3, 4z^3\ra = \lambda\la 2x, 2y, 2z\ra\text{.}
\end{equation*}

Equating first components, we have \(2x^3=\lambda x\text{.}\) One possibility is \(x=0\text{;}\) the other is \(\lambda = 2x^2\text{.}\) Similar results hold for the other two variables, leaving us with several possibilities to consider.

First, we consider the solution \(x=0\text{,}\) \(y=0\text{,}\) \(z=0\) of the vector equation above. But this cannot satisfy our constraint, so we rule it out.

Next, suppose \(x=0\) and \(y=0\text{,}\) but \(z\neq 0\text{.}\) The constraint equation forces \(z=\pm 1\text{.}\) Similarly, we can have \(x=0\text{,}\) \(y=\pm 1\text{,}\) \(z=0\text{,}\) or \(x=\pm 1\text{,}\) \(y=0\text{,}\) \(z=0\text{.}\) This gives us six points, and they all give the same value for \(f\text{:}\)

\begin{equation*}
f(\pm 1,0,0) = f(0,\pm 1,0) = f(0,0,\pm 1) = 1\text{.}
\end{equation*}

One of the three variables is zero. If \(x=0\text{,}\) with \(y\) and \(z\) nonzero, then we have \(2y^2=\lambda =2z^2\text{,}\) and since \(x^2+y^2+z^2=1\text{,}\) we must have \(y^2=z^2=\frac12\text{.}\) This gives us \(f(x,y,z) = 0+\frac14+\frac14=\frac12\text{.}\)

There are twelve possibilities here: one variable zero, and the other two can be \(\pm \frac{1}{\sqrt{2}}\text{.}\) Each one gives a value of \(\frac12\) for \(f\text{.}\)

Finally, we could have all three variables nonzero. In this case the Lagrange multiplier equations give us

\begin{equation*}
\lambda = 2x^2 = 2y^2 = 2z^2\text{,}
\end{equation*}

and putting these into the constraint equation gives us \(x^2=y^2=z^2=\frac13\text{.}\) There are eight different points satisfying this requirement, but all of them give us a value of

\begin{equation*}
f(x,y,z) = \frac19+\frac19+\frac19 = \frac13\text{.}
\end{equation*}

Comparing values, we see that the maximum value for \(f\text{,}\) when constrained to the unit sphere, is 1, and there are six points on the sphere with this value. The minimum value is \(\frac13\text{,}\) which occurs at eight different points.
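The case analysis above can be double-checked by brute force: every candidate point has \(k\) nonzero coordinates, each equal to \(\pm 1/\sqrt{k}\text{.}\) A short Python sketch enumerating them:

```python
import itertools
import math

def f(x, y, z):
    return x**4 + y**4 + z**4

values = set()
for k in range(1, 4):                 # number of nonzero coordinates
    v = 1 / math.sqrt(k)              # each nonzero coordinate is +-1/sqrt(k)
    for positions in itertools.combinations(range(3), k):
        for signs in itertools.product([v, -v], repeat=k):
            p = [0.0, 0.0, 0.0]
            for i, s in zip(positions, signs):
                p[i] = s
            values.add(round(f(*p), 9))

print(sorted(values))   # approximately [1/3, 1/2, 1]
```

Only three distinct values of \(f\) occur among the 26 candidate points, matching the hand computation.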

As the above examples show, Lagrange multiplier problems are often easy to set up, but hard to solve by hand. So why is the method useful? One reason is that it can be used to establish useful theoretical results. But more practically, the method of Lagrange multipliers is useful because it is easy to program into a computer: we simply provide the function and the constraint(s), and the computer solves the resulting equations. There is no need for the same degree of problem-solving employed when we first tackled optimization problems in one variable, back in Chapter 4. To emphasize this, we consider one more example: a reprise of one of the optimization problems from Section 4.3.

Example14.7.11.Solving an optimization problem with Lagrange multipliers.

Find the dimensions of a cylindrical can of volume \(206 \text{ in}^3\) that minimize the can's surface area.

A closed cylindrical can of radius \(r\) and height \(h\) has surface area

\begin{equation*}
s(r,h) = 2\pi r^2 + 2\pi r h\text{.}
\end{equation*}

This is the function we wish to minimize, subject to the volume constraint \(\pi r^2 h = 206\text{.}\)

In Section 4.3, our next step would have been to solve the constraint equation for one of the two variables (likely \(h\)) in terms of the other, so we could rewrite \(s(r,h)\) as a function of one variable and apply the techniques of Section 3.1.

Instead, we introduce the constraint function \(v(r,h)= \pi r^2 h\text{.}\) The Lagrange multiplier equation \(\nabla s = \lambda \nabla v\) gives us

\begin{equation*}
\la 4\pi r + 2\pi h, 2\pi r\ra = \lambda \la 2\pi r h, \pi r^2\ra\text{.}
\end{equation*}

Equating the second components gives us \(2\pi r = \lambda\pi r^2\text{.}\) Since the constraint ensures that \(r\neq 0\text{,}\) we have \(\lambda r = 2\text{.}\)

Now, we equate the first components:

\begin{equation*}
4\pi r+2\pi h = \lambda \cdot 2\pi r h\text{,}
\end{equation*}

but \(\lambda r =2\text{,}\) so we have simply \(4\pi r+2\pi h = 4\pi h\text{,}\) or \(h = 2r\text{.}\) Putting this into the constraint equation gives us

\begin{equation*}
\pi r^2 h = 2\pi r^3 = 206\text{,}
\end{equation*}

so \(r=\sqrt[3]{103/\pi}\approx 3.201\text{,}\) and \(h=2\sqrt[3]{103/\pi} \approx 6.401\text{.}\) This is, of course, the same result you would have found if you did this exercise back in Section 4.3.
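Setting this problem up in a computer algebra system is straightforward. A sketch using SymPy (the variable names are ours):

```python
import sympy as sp

r, h, lam = sp.symbols('r h lam', positive=True)
s = 2*sp.pi*r**2 + 2*sp.pi*r*h   # surface area of a closed can
v = sp.pi*r**2*h                 # volume

# grad s = lam * grad v, together with the volume constraint
eqs = [sp.Eq(sp.diff(s, r), lam*sp.diff(v, r)),
       sp.Eq(sp.diff(s, h), lam*sp.diff(v, h)),
       sp.Eq(v, 206)]
sol = sp.solve(eqs, [r, h, lam], dict=True)[0]
print(float(sol[r]), float(sol[h]))   # h = 2r, as in the hand computation
```

Declaring the symbols `positive=True` discards the physically meaningless solutions, leaving the single candidate.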