A typical problem arising in signal processing is to minimize $x^TAx$ subject to the linear constraint $c^Tx=1$. In most problems, $A$ is a positive definite, symmetric matrix (a correlation matrix). Clearly, the minimum of the objective function occurs at $x=0$, but this solution cannot satisfy the constraint. The constraint $g(x)=c^Tx-1$ is scalar-valued; hence the theorem of Lagrange applies, as there are no multiple components in the constraint forcing a check of linear independence. The Lagrangian is $$L(x,\lambda)=x^TAx+\lambda(c^Tx-1)$$ Its gradient is $2Ax+\lambda c$, with solution $x^\star=-\frac{\lambda A^{-1}c}{2}$. To find the value of the Lagrange multiplier, this solution must satisfy the constraint. Imposing the constraint, $\lambda c^TA^{-1}c=-2$; thus, $\lambda=\frac{-2}{c^TA^{-1}c}$ and the total solution is $$x^\star=\frac{A^{-1}c}{c^TA^{-1}c}$$
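As a numerical sanity check of this closed-form result, the sketch below evaluates $x^\star=A^{-1}c/(c^TA^{-1}c)$ on a hand-picked $2\times 2$ example (the particular $A$ and $c$ are arbitrary illustrative choices, not from the text) and verifies that the constraint holds and that moving along the constraint surface cannot reduce the objective.

```python
# Check of x* = A^{-1} c / (c^T A^{-1} c) for min x^T A x s.t. c^T x = 1.
# A and c are arbitrary illustrative choices (not from the text).

A = [[2.0, 1.0],
     [1.0, 3.0]]          # positive definite, symmetric
c = [1.0, 1.0]

# inverse of the 2x2 matrix A
det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
Ainv = [[ A[1][1]/det, -A[0][1]/det],
        [-A[1][0]/det,  A[0][0]/det]]

Ainv_c = [Ainv[0][0]*c[0] + Ainv[0][1]*c[1],
          Ainv[1][0]*c[0] + Ainv[1][1]*c[1]]   # A^{-1} c
denom = c[0]*Ainv_c[0] + c[1]*Ainv_c[1]        # c^T A^{-1} c
x = [Ainv_c[0]/denom, Ainv_c[1]/denom]         # the claimed minimizer

def objective(v):
    return (v[0]*(A[0][0]*v[0] + A[0][1]*v[1])
          + v[1]*(A[1][0]*v[0] + A[1][1]*v[1]))

# the solution is feasible ...
assert abs(c[0]*x[0] + c[1]*x[1] - 1.0) < 1e-12
# ... and perturbations staying on the constraint (directions d with
# c^T d = 0; here d = (1, -1)) never do better:
for t in [-0.5, -0.1, 0.1, 0.5]:
    y = [x[0] + t, x[1] - t]
    assert objective(y) >= objective(x)
```

Because the objective is convex and the constraint is affine, checking a few on-constraint perturbations is enough to make the stationary point's optimality plausible.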
When the independent variable is complex-valued, the Lagrange multiplier technique can be used if care is taken to make the Lagrangian real. If it is not real, we cannot use the theorem that permits computation of stationary points by computing the gradient with respect to $\overline{z}$ alone. The Lagrangian may not be real-valued even when the constraint is real. Once the Lagrangian is ensured to be real, the gradient with respect to the conjugate of the independent vector can be evaluated and the minimization procedure remains as before.
Consider slight variations to the previous example: let the vector $z$ be complex, so that the objective function is $z^HAz$, where $A$ is a positive definite, Hermitian matrix, and let the constraint be linear, but vector-valued ($Cz=c$). The Lagrangian is formed from the objective function and the real part of the usual constraint term: $$L(z,\lambda)=z^HAz+\lambda^H(Cz-c)+\lambda^T(\overline{C}\overline{z}-\overline{c})$$ For the Lagrange multiplier theorem to hold, the gradients of each component of the constraint must be linearly independent. As these gradients are the columns of $C^H$, their mutual linear independence means that no constraint vector can be expressed as a linear combination of the others. We shall assume this portion of the problem statement to be true. Evaluating the gradient with respect to $\overline{z}$, keeping $z$ constant, and setting the result equal to zero yields $$Az^\star+C^H\lambda^\star=0$$ The solution is $z^\star=-A^{-1}C^H\lambda^\star$. Applying the constraint, we find that $CA^{-1}C^H\lambda^\star=-c$. Solving for the Lagrange multiplier and substituting the result into the solution, we find that the solution to the constrained optimization problem is $$z^\star=A^{-1}C^H\left(CA^{-1}C^H\right)^{-1}c$$ The indicated matrix inverses always exist: $A$ is assumed invertible, and $CA^{-1}C^H$ is invertible because of the linear independence of the constraints.
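The complex-valued solution can be checked the same way. The sketch below uses a single complex constraint ($C$ is $1\times 2$, so the inner inverse $CA^{-1}C^H$ reduces to a scalar); the particular $A$, $C$, and $c$ are arbitrary illustrative choices.

```python
# Check of z* = A^{-1} C^H (C A^{-1} C^H)^{-1} c for one complex linear
# constraint C z = c. A, C, c are arbitrary illustrative choices.

A = [[2+0j, 1j],
     [-1j, 2+0j]]         # Hermitian, positive definite (eigenvalues 1, 3)
C = [1+0j, 1j]            # constraint row
c = 1+0j

# inverse of the 2x2 matrix A
det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
Ainv = [[ A[1][1]/det, -A[0][1]/det],
        [-A[1][0]/det,  A[0][0]/det]]

CH = [C[0].conjugate(), C[1].conjugate()]        # C^H as a column
AinvCH = [Ainv[0][0]*CH[0] + Ainv[0][1]*CH[1],
          Ainv[1][0]*CH[0] + Ainv[1][1]*CH[1]]   # A^{-1} C^H
s = C[0]*AinvCH[0] + C[1]*AinvCH[1]              # C A^{-1} C^H (a scalar here)

z = [AinvCH[0]*c/s, AinvCH[1]*c/s]               # the claimed minimizer

# the solution is feasible ...
assert abs(C[0]*z[0] + C[1]*z[1] - c) < 1e-12
# ... and C A^{-1} C^H is real and positive, as positive definiteness of A
# and linear independence of the constraints guarantee invertibility
assert abs(s.imag) < 1e-12 and s.real > 0
```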
When some of the constraints are inequalities, the Lagrange multiplier technique can be used, but the solution must be checked carefully in its details. But first, the optimization problem with equality and inequality constraints is formulated as $$\min_x f(x)\quad\text{subject to}\quad g(x)=0\text{ and }h(x)\le 0$$ As before, $f(\cdot)$ is the scalar-valued objective function and $g(\cdot)$ is the equality constraint function; $h(\cdot)$ is the inequality constraint function.
The key result used to find the analytic solution to this problem is to first form the Lagrangian in the usual way as $$L(x,\lambda,\mu)=f(x)+\lambda^Tg(x)+\mu^Th(x)$$ The following theorem is the general statement of the Lagrange multiplier technique for constrained optimization problems.
Let $x^\star$ be a local minimum for the constrained optimization problem. If the gradients of $g$'s components and the gradients of those components of $h(\cdot)$ for which $h_i(x^\star)=0$ are linearly independent, then $$\frac{dL(x^\star,\lambda^\star,\mu^\star)}{dx}=0$$ where $\mu^\star\ge 0$ and $\mu_i^\star h_i(x^\star)=0$.
The portion of this result dealing with the inequality constraint differs substantially from that concerned with the equality constraint. Either a component of the constraint equals its maximum value (zero in this case) and the corresponding component of its Lagrange multiplier is non-negative (and is usually positive), or a component is strictly less than the constraint and its component of the Lagrange multiplier is zero. This latter result means that some components of the inequality constraint are not as stringent as others, and these lax ones do not affect the solution. The rationale behind this theorem is a technique for converting the inequality constraint into an equality constraint: $h_i(x)\le 0$ is equivalent to $h_i(x)+s_i^2=0$. Since the new term, called a slack variable, is non-negative, the constraint must be non-positive. With the inclusion of slack variables, the equality constraint theorem can be used, and the above theorem results. To prove the theorem, the gradient must be considered not only with respect to $x$, but also with respect to the vector $s$ of slack variables. The $i^{\mathrm{th}}$ component of the gradient of the Lagrangian with respect to $s$ at the stationary point is $2\mu_i^\star s_i^\star=0$. If, in solving the optimization problem, $s_i^\star=0$, the inequality constraint was in reality an equality constraint, and that component of the constraint behaves accordingly. As $s_i=\sqrt{-h_i(x)}$, $s_i=0$ implies that the corresponding component of the inequality constraint must equal zero. On the other hand, if $s_i\neq 0$, the corresponding Lagrange multiplier must be zero.
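The active/inactive dichotomy can be made concrete with a one-dimensional example of my own construction: minimize $f(x)=(x-2)^2$ subject to $x-b\le 0$. For $b=1$ the constraint binds, so $h(x^\star)=0$ and $\mu^\star>0$; for $b=3$ the unconstrained minimum is feasible, so $h(x^\star)<0$ and $\mu^\star=0$. Either way the product $\mu^\star h(x^\star)$ vanishes.

```python
# Complementary slackness in one dimension:
# minimize f(x) = (x-2)^2 subject to h(x) = x - b <= 0.

def solve(b, lo=-5.0, hi=5.0, n=100001):
    """Brute-force search over the feasible set {x <= b}."""
    best = None
    for i in range(n):
        x = lo + (hi - lo) * i / (n - 1)
        if x <= b:
            fx = (x - 2.0) ** 2
            if best is None or fx < best[1]:
                best = (x, fx)
    x_star = best[0]
    # stationarity: f'(x*) + mu * h'(x*) = 0, with h'(x) = 1
    mu = -2.0 * (x_star - 2.0)
    return x_star, mu

x1, mu1 = solve(b=1.0)   # constraint active: minimum pushed to the boundary
x3, mu3 = solve(b=3.0)   # constraint slack: unconstrained minimum feasible

assert abs(x1 - 1.0) < 1e-3 and mu1 > 0          # h(x*) = 0 and mu* > 0
assert abs(x3 - 2.0) < 1e-3 and abs(mu3) < 1e-3  # h(x*) < 0 and mu* = 0
assert abs(mu1 * (x1 - 1.0)) < 1e-2              # mu* h(x*) = 0 in both cases
```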
Consider the problem of minimizing a quadratic form subject to a linear equality constraint and an inequality constraint on the norm of the linear constraint vector's variation. $$\min_x x^TAx\quad\text{subject to}\quad (c+\delta)^Tx=1\text{ and }\|\delta\|^2\le\epsilon$$ This kind of problem arises in robust estimation. One seeks a solution where one of the "knowns" of the problem, $c$ in this case, is in reality only approximately specified. The independent variables are $x$ and $\delta$. The Lagrangian for this problem is $$L(\{x,\delta\},\lambda,\mu)=x^TAx+\lambda\left((c+\delta)^Tx-1\right)+\mu\left(\|\delta\|^2-\epsilon\right)$$ Evaluating the gradients with respect to the independent variables yields $$2Ax^\star+\lambda^\star(c+\delta^\star)=0$$ $$\lambda^\star x^\star+2\mu^\star\delta^\star=0$$ The latter equation is key. Recall that either $\mu^\star=0$ or the inequality constraint is satisfied with equality. If $\mu^\star$ is zero, that implies that $x^\star$ must be zero, which will not allow the equality constraint to be satisfied. The inescapable conclusion is that $\|\delta^\star\|^2=\epsilon$ and that $\delta^\star$ is parallel to $x^\star$: $\delta^\star=-\frac{\lambda^\star}{2\mu^\star}x^\star$. Using the first equation, $x^\star$ is found to be $$x^\star=-\frac{\lambda^\star}{2}\left(A-\frac{(\lambda^\star)^2}{4\mu^\star}I\right)^{-1}c$$ Imposing the constraints on this solution results in a pair of equations for the Lagrange multipliers. $$\left(\frac{(\lambda^\star)^2}{4\mu^\star}\right)^2c^T\left(A-\frac{(\lambda^\star)^2}{4\mu^\star}I\right)^{-2}c=\epsilon$$ $$c^T\left(A-\frac{(\lambda^\star)^2}{4\mu^\star}I\right)^{-1}c=-\frac{2}{\lambda^\star}-\frac{4\mu^\star\epsilon}{(\lambda^\star)^2}$$ Multiple solutions are possible, and each must be checked. The rather complicated completion of this example is left to the (numerically oriented) reader.
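One conclusion of the analysis, that the inequality constraint must be active ($\|\delta^\star\|^2=\epsilon$), can be verified numerically in a scalar version of the problem. Here $a$, $c$, and $\epsilon$ are arbitrary illustrative choices; the equality constraint fixes $x=1/(c+\delta)$, so a brute-force search over $\delta$ alone suffices.

```python
# Scalar robust-estimation example: minimize a*x^2 subject to
# (c + d)*x = 1 and d^2 <= eps. The equality constraint gives x = 1/(c + d),
# so we search the feasible interval for d and check that the minimizer
# lies on the boundary of the inequality constraint, (d*)^2 = eps.
# (a, c, eps are arbitrary illustrative choices.)

a, c, eps = 1.0, 2.0, 0.25
r = eps ** 0.5

best = None
n = 200001
for i in range(n):
    d = -r + 2 * r * i / (n - 1)        # grid over the feasible set [-r, r]
    x = 1.0 / (c + d)                   # enforce the equality constraint
    fx = a * x * x
    if best is None or fx < best[1]:
        best = (d, fx)

d_star = best[0]
assert abs(d_star * d_star - eps) < 1e-3   # constraint active: (d*)^2 = eps
```

This only illustrates the active-constraint conclusion; solving the coupled multiplier equations in the matrix case still requires the numerical root-finding the text alludes to.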