# 0.16 Appendix: optimization theory


Optimization theory is the branch of applied mathematics whose purpose is to consider a mathematical expression in order to find a set of parameters that either maximize or minimize it. As an applied discipline, it deals with problems that usually arise from real-life situations in areas such as science, engineering and finance (among many others). This section presents some basic concepts for completeness and is not meant to replace a treatise on the subject. The reader is encouraged to consult further references for more information.

## Solution of linear weighted least squares problems

Consider the linear least squares problem

$\min_{h}\;\|d-\mathbf{C}h\|_2$

which can be written as

$\min_{h}\;(d-\mathbf{C}h)^T(d-\mathbf{C}h)$

The square root can be omitted because it is monotonically increasing and therefore does not change the minimizer. The problem is strictly convex (assuming $\mathbf{C}$ has full column rank), so its unique (and thus global) solution is found at the point where the partial derivatives with respect to the optimization variable are equal to zero. That is,

$\begin{aligned}\frac{\partial}{\partial h}\left\{(d-\mathbf{C}h)^T(d-\mathbf{C}h)\right\} &= \frac{\partial}{\partial h}\left\{d^Td-2d^T\mathbf{C}h+(\mathbf{C}h)^T\mathbf{C}h\right\}\\ &= -2\mathbf{C}^Td+2\mathbf{C}^T\mathbf{C}h=0\\ &\Rightarrow\;\mathbf{C}^T\mathbf{C}h=\mathbf{C}^Td\end{aligned}$

The solution of [link] is given by

$h=(\mathbf{C}^T\mathbf{C})^{-1}\mathbf{C}^Td$

The matrix $(\mathbf{C}^T\mathbf{C})^{-1}\mathbf{C}^T$ in this expression is referred to [link], [link] as the Moore-Penrose pseudoinverse of $\mathbf{C}$.
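The normal equations and the pseudoinverse form can be checked numerically. The following is a minimal NumPy sketch on hypothetical random data. The helper name `ls_normal_equations` is illustrative, and the sketch assumes $\mathbf{C}$ has full column rank.

```python
import numpy as np

def ls_normal_equations(C, d):
    """Solve min_h ||d - C h||_2 through the normal equations C^T C h = C^T d.

    Assumes C has full column rank, so that C^T C is invertible.
    """
    return np.linalg.solve(C.T @ C, C.T @ d)

# Hypothetical example data: 20 equations, 4 unknowns.
rng = np.random.default_rng(0)
C = rng.standard_normal((20, 4))
d = rng.standard_normal(20)

h = ls_normal_equations(C, d)
# (C^T C)^{-1} C^T is the pseudoinverse of C, so both solutions should agree.
h_pinv = np.linalg.pinv(C) @ d
print(np.allclose(h, h_pinv))
```

Forming $\mathbf{C}^T\mathbf{C}$ squares the condition number of $\mathbf{C}$. For ill-conditioned problems, an orthogonal-factorization solver such as `np.linalg.lstsq` is generally the safer choice.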

In the case of a weighted version of [link] ,

$\min_{h}\;\|\sqrt{w}\,(d-\mathbf{C}h)\|_2^2=\sum_k w_k\,|d_k-C_kh|^2$

where $C_k$ is the $k$-th row of $\mathbf{C}$, one can write [link] as

$\min_{h}\;\left(\mathbf{W}(d-\mathbf{C}h)\right)^T\left(\mathbf{W}(d-\mathbf{C}h)\right)$

where $\mathbf{W}=\text{diag}(\sqrt{w})$ contains the weighting vector $w$. The solution is therefore given by

$h=(\mathbf{C}^T\mathbf{W}^T\mathbf{W}\mathbf{C})^{-1}\mathbf{C}^T\mathbf{W}^T\mathbf{W}d$
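Here is a minimal NumPy sketch of the weighted solution. It assumes strictly positive weights and a full-column-rank $\mathbf{C}$; the helper name `weighted_ls` and the data are hypothetical. The sketch applies $\mathbf{W}$ by scaling the rows of $\mathbf{C}$ and $d$, and compares the result with an ordinary least squares solve of the scaled system $\mathbf{W}\mathbf{C}h\approx\mathbf{W}d$.

```python
import numpy as np

def weighted_ls(C, d, w):
    """Solve min_h sum_k w_k |d_k - C_k h|^2 via h = (C^T W^T W C)^{-1} C^T W^T W d.

    W = diag(sqrt(w)); assumes w > 0 and C of full column rank.
    """
    sw = np.sqrt(w)
    WC = sw[:, None] * C      # W C, computed by row scaling instead of a dense diag(W)
    Wd = sw * d               # W d
    return np.linalg.solve(WC.T @ WC, WC.T @ Wd)

rng = np.random.default_rng(1)
C = rng.standard_normal((30, 5))
d = rng.standard_normal(30)
w = rng.uniform(0.1, 10.0, size=30)   # hypothetical positive weights

h = weighted_ls(C, d, w)
# Same answer from a least squares solve of the scaled system W C h ~ W d.
h_ref = np.linalg.lstsq(np.sqrt(w)[:, None] * C, np.sqrt(w) * d, rcond=None)[0]
```

Row scaling gives the same product as $\mathbf{W}\mathbf{C}$ without storing the dense diagonal matrix $\mathbf{W}$.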

## Newton's method and ${l}_{p}$ Linear phase systems

Consider the problem

$\min_{a}\;g(a)=\|A(\omega;a)-D(\omega)\|_p$

for $a\in\mathbb{R}^{M+1}$. Since $t\mapsto t^p$ is monotonically increasing for $t\ge 0$, problem [link] is equivalent to the better posed problem

$\begin{aligned}\min_{a}\;f(a)=g(a)^p &= \|A(\omega;a)-D(\omega)\|_p^p\\ &= \sum_{i=0}^{L}|C_ia-D_i|^p\end{aligned}$

where ${D}_{i}=D\left({\omega }_{i}\right)$ , ${\omega }_{i}\in \left[0,\pi \right]$ , ${C}_{i}=\left[{C}_{i,0},...,{C}_{i,M}\right]$ , and

$\mathbf{C}=\left[\begin{array}{c}{C}_{0}\\ ⋮\\ {C}_{L}\end{array}\right]$

The $ij$ -th element of $\mathbf{C}$ is given by ${C}_{i,j}=\text{cos}\phantom{\rule{1mm}{0ex}}{\omega }_{i}\left(M-j\right)$ , where $0\le i\le L$ and $0\le j\le M$ . From [link] we have that

$\nabla f\left(a\right)=\left[\begin{array}{c}\frac{\partial }{\partial {a}_{0}}f\left(a\right)\\ ⋮\\ \frac{\partial }{\partial {a}_{M}}f\left(a\right)\end{array}\right]$

where ${a}_{j}$ is the $j$ -th element of $a\in {\mathbb{R}}^{M+1}$ and

$\begin{array}{ccc}\hfill \frac{\partial }{\partial {a}_{j}}f\left(a\right)& =& \frac{\partial }{\partial {a}_{j}}\sum _{i=0}^{L}\mid {C}_{i}a-{D}_{i}{\mid }^{p}\hfill \\ & =& \sum _{i=0}^{L}\frac{\partial }{\partial {a}_{j}}\mid {C}_{i}a-{D}_{i}{\mid }^{p}\hfill \\ & =& p\sum _{i=0}^{L}\mid {C}_{i}a-{D}_{i}{\mid }^{p-1}·\frac{\partial }{\partial {a}_{j}}\mid {C}_{i}a-{D}_{i}\mid \hfill \end{array}$

Now,

$\frac{\partial }{\partial {a}_{j}}\mid {C}_{i}a-{D}_{i}\mid =\text{sign}\left({C}_{i}a-{D}_{i}\right)·\frac{\partial }{\partial {a}_{j}}\left({C}_{i}a-{D}_{i}\right)={C}_{i,j}\phantom{\rule{0.277778em}{0ex}}\text{sign}\left({C}_{i}a-{D}_{i}\right)$

where

$\text{sign}(x)=\begin{cases}1 & x>0\\ 0 & x=0\\ -1 & x<0\end{cases}$

Note that, for $p>1$,

$\lim_{u(a)\to 0^+}\frac{\partial}{\partial a_j}|u(a)|^p=\lim_{u(a)\to 0^-}\frac{\partial}{\partial a_j}|u(a)|^p=0$

Therefore the Jacobian of $f\left(a\right)$ is given by

$\nabla f(a)=\begin{bmatrix}p\sum_{i=0}^{L}C_{i,0}\,|C_ia-D_i|^{p-1}\,\text{sign}(C_ia-D_i)\\ \vdots\\ p\sum_{i=0}^{L}C_{i,M}\,|C_ia-D_i|^{p-1}\,\text{sign}(C_ia-D_i)\end{bmatrix}$

The Hessian of $f\left(a\right)$ is the matrix ${\nabla }^{2}f\left(a\right)$ whose $jm$ -th element ( $0\le j,m\le M$ ) is given by

$\begin{aligned}\nabla^2_{j,m}f(a)=\frac{\partial^2}{\partial a_j\partial a_m}f(a) &= \frac{\partial}{\partial a_m}\frac{\partial}{\partial a_j}f(a)\\ &= \sum_{i=0}^{L}p\,C_{i,j}\,\frac{\partial}{\partial a_m}\left[|C_ia-D_i|^{p-1}\,\text{sign}(C_ia-D_i)\right]\\ &= \sum_{i=0}^{L}\alpha\,\frac{\partial}{\partial a_m}\left[b(a)\,d(a)\right]\end{aligned}$

where $\alpha=p\,C_{i,j}$, $b(a)=|C_ia-D_i|^{p-1}$ and $d(a)=\text{sign}(C_ia-D_i)$ have been substituted for simplicity. We have

$\begin{aligned}\frac{\partial}{\partial a_m}b(a) &= \frac{\partial}{\partial a_m}|C_ia-D_i|^{p-1}\\ &= (p-1)\,C_{i,m}\,|C_ia-D_i|^{p-2}\,\text{sign}(C_ia-D_i)\\ \frac{\partial}{\partial a_m}d(a) &= \frac{\partial}{\partial a_m}\text{sign}(C_ia-D_i)=0\end{aligned}$

Note that the partial derivative of $d\left(a\right)$ at ${D}_{i}-{C}_{i}a=0$ is not defined. Therefore

$\begin{array}{ccc}\hfill \frac{\partial }{\partial {a}_{m}}b\left(a\right)d\left(a\right)& =& b\left(a\right)\frac{\partial }{\partial {a}_{m}}d\left(a\right)+d\left(a\right)\frac{\partial }{\partial {a}_{m}}b\left(a\right)\hfill \\ & =& \left(p-1\right){C}_{i,m}\mid {C}_{i}a-{D}_{i}{\mid }^{p-2}{\text{sign}}^{2}\left({C}_{i}a-{D}_{i}\right)\hfill \end{array}$

Note that $\text{sign}^2(C_ia-D_i)=1$ for all $C_ia-D_i\ne 0$. At $C_ia-D_i=0$ the derivative is not defined. Then

${\nabla }_{j,m}^{2}f\left(a\right)=p\left(p-1\right)\sum _{i=0}^{L}{C}_{i,j}{C}_{i,m}\phantom{\rule{0.277778em}{0ex}}\mid {C}_{i}a-{D}_{i}{\mid }^{p-2}$

except at ${D}_{i}-{C}_{i}a=0$ where it is not defined.

Newton's method for minimizing $f(a)$ can then be summarized as follows.

• Given $a_0\in\mathbb{R}^{M+1}$, $D\in\mathbb{R}^{L+1}$, $\mathbf{C}\in\mathbb{R}^{(L+1)\times(M+1)}$
• For $i=0,1,...$
1. Find $\nabla f\left({a}_{i}\right)$ .
2. Find ${\nabla }^{2}f\left({a}_{i}\right)$ .
3. Solve ${\nabla }^{2}f\left({a}_{i}\right)s=-\nabla f\left({a}_{i}\right)$ for $s$ .
4. Let $a_{i+1}=a_i+s$.
5. Check for convergence and iterate if necessary.
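The five steps above can be sketched in NumPy as follows. This is a hedged illustration, not the author's implementation. The function name, the $l_2$ starting point $a_0$, the stopping tolerance and the lowpass target are all assumptions. The step-halving loop is an added safeguard that is not part of the algorithm as stated. The sketch also assumes $p\ge 2$ so that $|\mathbf{C}a_i-D|^{p-2}$ stays finite.

```python
import numpy as np

def lp_linear_phase_newton(omega, D, M, p, max_iter=100, tol=1e-10):
    """Minimize f(a) = sum_i |C_i a - D_i|^p with Newton's method (assumes p >= 2)."""
    C = np.cos(np.outer(omega, M - np.arange(M + 1)))   # C[i, j] = cos(omega_i (M - j))
    f = lambda a: np.sum(np.abs(C @ a - D) ** p)
    a = np.linalg.lstsq(C, D, rcond=None)[0]            # a_0: l2 solution (assumed start)
    for _ in range(max_iter):
        r = C @ a - D                                   # residual C a_i - D
        z = np.abs(r) ** (p - 2)                        # diagonal of Z
        grad = p * C.T @ (z * r)                        # step 1: gradient p C^T y
        if np.linalg.norm(grad) <= tol:                 # step 5: convergence check
            break
        hess = (p**2 - p) * C.T @ (z[:, None] * C)      # step 2: Hessian (p^2 - p) C^T Z C
        s = np.linalg.solve(hess, -grad)                # step 3: solve H s = -grad
        t, f_old = 1.0, f(a)
        # Added safeguard (not in the steps above): halve the step if it would
        # increase f by more than round-off.
        while f(a + t * s) > f_old * (1 + 1e-12) and t > 1e-8:
            t /= 2
        a = a + t * s                                   # step 4: update
    return a, C

# Hypothetical lowpass target sampled on 41 frequencies in [0, pi].
omega = np.linspace(0, np.pi, 41)
D = (omega <= 0.4 * np.pi).astype(float)
a, C = lp_linear_phase_newton(omega, D, M=10, p=4)
```

For $p=2$, $f$ is quadratic and the $l_2$ starting point is already the minimizer, so the loop exits immediately.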

Note that for problem [link] the Jacobian of $f\left(a\right)$ can be written as

$\nabla f\left(a\right)=p{\mathbf{C}}^{T}y$

where

$y=\mid \mathbf{C}{a}_{i}-D{\mid }^{p-1}\text{sign}\left(\mathbf{C}{a}_{i}-D\right)=\mid \mathbf{C}{a}_{i}-D{\mid }^{p-2}\left(\mathbf{C}{a}_{i}-D\right)$

Also,

${\nabla }_{j,m}^{2}f\left(a\right)=p\left(p-1\right)\phantom{\rule{0.277778em}{0ex}}{C}_{j}^{T}\mathbf{Z}{C}_{m}$

where

$\mathbf{Z}=\text{diag}\left(|\mathbf{C}a_i-D|^{p-2}\right)$

and

${C}_{j}=\left[\begin{array}{c}{C}_{0,j}\\ ⋮\\ {C}_{L,j}\end{array}\right]$

Therefore

${\nabla }^{2}f\left(a\right)=\left({p}^{2}-p\right){\mathbf{C}}^{T}\mathbf{Z}\mathbf{C}$

From [link] , the Hessian ${\nabla }^{2}f\left(a\right)$ can be expressed as

${\nabla }^{2}f\left(a\right)=\left({p}^{2}-p\right){\mathbf{C}}^{T}{\mathbf{W}}^{T}\mathbf{W}\mathbf{C}$

where

$\mathbf{W}=\text{diag}\left(|\mathbf{C}a_i-D|^{\frac{p-2}{2}}\right)$
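The matrix forms of the gradient and Hessian can be checked against central finite differences. The sketch below uses NumPy; the helper `lp_grad_hess`, the choice $p=3$ and the test data are hypothetical. It applies $\mathbf{W}$ by row scaling rather than forming the diagonal matrix.

```python
import numpy as np

def lp_grad_hess(C, D, a, p):
    """Matrix forms: grad = p C^T y and Hess = (p^2 - p) C^T W^T W C."""
    r = C @ a - D
    y = np.abs(r) ** (p - 2) * r                      # |r|^{p-1} sign(r), element-wise
    WC = (np.abs(r) ** ((p - 2) / 2))[:, None] * C    # W C with W = diag(|r|^{(p-2)/2})
    return p * C.T @ y, (p**2 - p) * WC.T @ WC

# Hypothetical test problem.
M, p = 6, 3.0
omega = np.linspace(0, np.pi, 25)
C = np.cos(np.outer(omega, M - np.arange(M + 1)))
D = (omega < 0.5 * np.pi).astype(float)
a = 0.1 * np.random.default_rng(2).standard_normal(M + 1)

f = lambda a: np.sum(np.abs(C @ a - D) ** p)
grad, hess = lp_grad_hess(C, D, a, p)

# Central differences of f (for the gradient) and of grad (for the Hessian).
h, E = 1e-6, np.eye(M + 1)
fd_grad = np.array([(f(a + h * e) - f(a - h * e)) / (2 * h) for e in E])
fd_hess = np.column_stack([(lp_grad_hess(C, D, a + h * e, p)[0]
                            - lp_grad_hess(C, D, a - h * e, p)[0]) / (2 * h) for e in E])
print(np.allclose(grad, fd_grad, atol=1e-5), np.allclose(hess, fd_hess, atol=1e-5))
```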

The matrix $\mathbf{C}\in {\mathbb{R}}^{\left(L+1\right)×\left(M+1\right)}$ is given by

$\mathbf{C}=\left[\begin{array}{ccccccc}\text{cos}M{\omega }_{0}& \text{cos}\left(M-1\right){\omega }_{0}& \cdots & \text{cos}\left(M-j\right){\omega }_{0}& \cdots & \text{cos}{\omega }_{0}& 1\\ \text{cos}M{\omega }_{1}& \text{cos}\left(M-1\right){\omega }_{1}& \cdots & \text{cos}\left(M-j\right){\omega }_{1}& \cdots & \text{cos}{\omega }_{1}& 1\\ ⋮& ⋮& \ddots & ⋮& & ⋮& ⋮\\ \text{cos}M{\omega }_{i}& \text{cos}\left(M-1\right){\omega }_{i}& \cdots & \text{cos}\left(M-j\right){\omega }_{i}& \cdots & \text{cos}{\omega }_{i}& 1\\ ⋮& ⋮& & ⋮& \ddots & ⋮& ⋮\\ \text{cos}M{\omega }_{L-1}& \text{cos}\left(M-1\right){\omega }_{L-1}& \cdots & \text{cos}\left(M-j\right){\omega }_{L-1}& \cdots & \text{cos}{\omega }_{L-1}& 1\\ \text{cos}M{\omega }_{L}& \text{cos}\left(M-1\right){\omega }_{L}& \cdots & \text{cos}\left(M-j\right){\omega }_{L}& \cdots & \text{cos}{\omega }_{L}& 1\end{array}\right]$

The matrix $\mathbf{H}=\nabla^2 f(a)$ is positive definite (for $p>1$). To see this, write $\mathbf{H}=(p^2-p)\,\mathbf{K}^T\mathbf{K}$ where $\mathbf{K}=\mathbf{W}\mathbf{C}$, and note that $p^2-p>0$ for $p>1$. Let $z\in\mathbb{R}^{M+1}$, $z\ne 0$. Then

$z^T\mathbf{H}z=(p^2-p)\,z^T\mathbf{K}^T\mathbf{K}z=(p^2-p)\,\|\mathbf{K}z\|_2^2>0$

unless $z\in N(\mathbf{K})$. But $\mathbf{W}$ is diagonal with nonzero diagonal entries (that is, $\mathbf{C}a_i\ne D$ element-wise), and $\mathbf{C}$ has full column rank. Hence $N(\mathbf{K})=\{0\}$. Thus $z^T\mathbf{H}z\ge 0$, with equality only if $z=0$, and so $\mathbf{H}$ is positive definite.
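The positive definiteness argument can be illustrated numerically. When every residual is nonzero and $\mathbf{C}$ has full column rank, a Cholesky factorization of $\mathbf{H}$ exists. The sizes, the target $D$ and the evaluation point $a=0$ below are hypothetical choices made so that every residual is nonzero.

```python
import numpy as np

M, p = 8, 6.0                                    # hypothetical sizes
omega = np.linspace(0, np.pi, 40)
C = np.cos(np.outer(omega, M - np.arange(M + 1)))
D = 1.0 + 0.5 * np.sin(3 * omega)                # hypothetical target, values in [0.5, 1.5]
a = np.zeros(M + 1)                              # residuals C a - D = -D are all nonzero

r = C @ a - D
K = (np.abs(r) ** ((p - 2) / 2))[:, None] * C    # K = W C
H = (p**2 - p) * K.T @ K

rank = np.linalg.matrix_rank(C)                  # full column rank means rank == M + 1
eigs = np.linalg.eigvalsh(H)                     # H is symmetric, so eigvalsh applies
L_factor = np.linalg.cholesky(H)                 # raises LinAlgError unless H is positive definite
print(rank == M + 1, eigs.min() > 0)
```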

## Newton's method and ${l}_{p}$ Complex linear systems

Consider the problem

$\min_{x}\;e(x)=\|\mathbf{A}x-b\|_p^p$

where $\mathbf{A}\in {\mathbb{C}}^{m×n}$ , $x\in {\mathbb{R}}^{n}$ and $b\in {\mathbb{C}}^{m}$ . One can write [link] in terms of the real and imaginary parts of $\mathbf{A}$ and $b$ ,

$\begin{aligned}e(x) &= \sum_{i=1}^{m}|A_ix-b_i|^p\\ &= \sum_{i=1}^{m}\left|\text{Re}\{A_ix-b_i\}+j\,\text{Im}\{A_ix-b_i\}\right|^p\\ &= \sum_{i=1}^{m}\left|(R_ix-\alpha_i)+j(Z_ix-\gamma_i)\right|^p\\ &= \sum_{i=1}^{m}\left(\sqrt{(R_ix-\alpha_i)^2+(Z_ix-\gamma_i)^2}\right)^p\\ &= \sum_{i=1}^{m}g_i(x)^{p/2}\end{aligned}$

where $\mathbf{A}=\mathbf{R}+j\mathbf{Z}$, $b=\alpha+j\gamma$, and $g_i(x)=(R_ix-\alpha_i)^2+(Z_ix-\gamma_i)^2$. The gradient $\nabla e(x)$ is the vector whose $k$-th element is given by

$\frac{\partial}{\partial x_k}e(x)=\frac{p}{2}\sum_{i=1}^{m}\left[\frac{\partial}{\partial x_k}g_i(x)\right]g_i(x)^{\frac{p-2}{2}}=\frac{p}{2}\,q_k(x)\,\hat{g}(x)$

where $\hat{g}(x)$ is the column vector whose $i$-th element is $g_i(x)^{\frac{p-2}{2}}$, and $q_k$ is the row vector whose $i$-th element is

$\begin{aligned}q_{k,i}(x)=\frac{\partial}{\partial x_k}g_i(x) &= 2(R_ix-\alpha_i)R_{ik}+2(Z_ix-\gamma_i)Z_{ik}\\ &= 2R_{ik}R_ix+2Z_{ik}Z_ix-\left[2\alpha_iR_{ik}+2\gamma_iZ_{ik}\right]\end{aligned}$

Therefore one can express the gradient of $e(x)$ as $\nabla e(x)=\frac{p}{2}\mathbf{Q}\hat{g}$, where $\mathbf{Q}=[q_{k,i}]$ is the $n\times m$ matrix with entries as above. Note that one can also write the gradient in vector form as follows

$\nabla e(x)=p\left[\mathbf{R}^T\text{diag}(\mathbf{R}x-\alpha)+\mathbf{Z}^T\text{diag}(\mathbf{Z}x-\gamma)\right]\cdot\left[\left((\mathbf{R}x-\alpha)^2+(\mathbf{Z}x-\gamma)^2\right)^{\frac{p-2}{2}}\right]$

where the squares and the power are applied element-wise.
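Here is a NumPy sketch of this vector form of the gradient, compared against central finite differences of $e(x)$. The function name `complex_lp_grad` and the random data are hypothetical. For $p=2$ the expression reduces to $2\,\text{Re}\{\mathbf{A}^H(\mathbf{A}x-b)\}$, the gradient of the least squares problem in the real variable $x$.

```python
import numpy as np

def complex_lp_grad(A, b, x, p):
    """grad e(x) = p [R^T diag(Rx - alpha) + Z^T diag(Zx - gamma)] g^{(p-2)/2}."""
    R, Z = A.real, A.imag
    u, v = R @ x - b.real, Z @ x - b.imag       # Re and Im of A x - b (x is real)
    g_hat = (u**2 + v**2) ** ((p - 2) / 2)      # element-wise powers
    return p * (R.T @ (u * g_hat) + Z.T @ (v * g_hat))

rng = np.random.default_rng(4)
A = rng.standard_normal((12, 4)) + 1j * rng.standard_normal((12, 4))  # hypothetical data
b = rng.standard_normal(12) + 1j * rng.standard_normal(12)
x, p = rng.standard_normal(4), 3.0

e = lambda x: np.sum(np.abs(A @ x - b) ** p)
h = 1e-6
fd = np.array([(e(x + h * ek) - e(x - h * ek)) / (2 * h) for ek in np.eye(4)])
print(np.allclose(complex_lp_grad(A, b, x, p), fd))
```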

The Hessian $\mathbf{H}\left(x\right)$ is the matrix of second derivatives whose $kl$ -th entry is given by

$\begin{aligned}\mathbf{H}_{k,l}(x) &= \frac{\partial^2}{\partial x_k\partial x_l}e(x)\\ &= \frac{\partial}{\partial x_l}\frac{p}{2}\sum_{i=1}^{m}q_{k,i}(x)\,g_i(x)^{\frac{p-2}{2}}\\ &= \frac{p}{2}\sum_{i=1}^{m}\left[q_{k,i}(x)\frac{\partial}{\partial x_l}g_i(x)^{\frac{p-2}{2}}+g_i(x)^{\frac{p-2}{2}}\frac{\partial}{\partial x_l}q_{k,i}(x)\right]\end{aligned}$

Now,

$\begin{aligned}\frac{\partial}{\partial x_l}g_i(x)^{\frac{p-2}{2}} &= \frac{p-2}{2}\left[\frac{\partial}{\partial x_l}g_i(x)\right]g_i(x)^{\frac{p-4}{2}}\\ &= \frac{p-2}{2}\,q_{l,i}(x)\,g_i(x)^{\frac{p-4}{2}}\\ \frac{\partial}{\partial x_l}q_{k,i}(x) &= 2R_{ik}R_{il}+2Z_{ik}Z_{il}\end{aligned}$

Therefore

$\mathbf{H}_{k,l}(x)=\frac{p(p-2)}{4}\sum_{i=1}^{m}q_{k,i}(x)\,q_{l,i}(x)\,g_i(x)^{\frac{p-4}{2}}+p\sum_{i=1}^{m}(R_{ik}R_{il}+Z_{ik}Z_{il})\,g_i(x)^{\frac{p-2}{2}}$

Note that $\mathbf{H}\left(x\right)$ can be written in matrix form as

$\begin{aligned}\mathbf{H}(x)= &\;\frac{p(p-2)}{4}\left(\mathbf{Q}\,\text{diag}\left(g(x)^{\frac{p-4}{2}}\right)\mathbf{Q}^T\right)+\\ &\;p\left(\mathbf{R}^T\text{diag}\left(g(x)^{\frac{p-2}{2}}\right)\mathbf{R}+\mathbf{Z}^T\text{diag}\left(g(x)^{\frac{p-2}{2}}\right)\mathbf{Z}\right)\end{aligned}$
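The matrix form of $\mathbf{H}(x)$ can likewise be verified by differencing the gradient $\frac{p}{2}\mathbf{Q}\hat{g}$. This sketch uses a hypothetical helper `complex_lp_grad_hess` and random data. It computes $\mathbf{Q}\,\text{diag}(c)$ by broadcasting (`Q * c`) rather than forming the diagonal matrix, and uses $p=3$ so that the exponent $\frac{p-4}{2}$ is exercised.

```python
import numpy as np

def complex_lp_grad_hess(A, b, x, p):
    """Gradient (p/2) Q g_hat and Hessian H(x) of e(x) = sum_i |A_i x - b_i|^p, x real."""
    R, Z = A.real, A.imag
    u, v = R @ x - b.real, Z @ x - b.imag
    g = u**2 + v**2                          # g_i(x), assumed nonzero
    Q = 2 * (R.T * u + Z.T * v)              # Q[k, i] = q_{k,i}(x), shape (n, m)
    G = g ** ((p - 2) / 2)                   # g_hat
    grad = (p / 2) * Q @ G
    H = ((p * (p - 2) / 4) * (Q * g ** ((p - 4) / 2)) @ Q.T
         + p * ((R.T * G) @ R + (Z.T * G) @ Z))
    return grad, H

rng = np.random.default_rng(5)
A = rng.standard_normal((15, 3)) + 1j * rng.standard_normal((15, 3))  # hypothetical data
b = rng.standard_normal(15) + 1j * rng.standard_normal(15)
x, p = rng.standard_normal(3), 3.0

_, H = complex_lp_grad_hess(A, b, x, p)
h = 1e-6
fd_H = np.column_stack([(complex_lp_grad_hess(A, b, x + h * ek, p)[0]
                         - complex_lp_grad_hess(A, b, x - h * ek, p)[0]) / (2 * h)
                        for ek in np.eye(3)])
print(np.allclose(H, fd_H))
```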

Therefore to solve [link] one can use Newton's method as follows: given an initial point ${x}_{0}$ , each iteration gives a new estimate ${x}^{+}$ according to the formulas

$\begin{array}{ccc}\mathbf{H}\left({x}^{c}\right)s& =& -\nabla e\left({x}^{c}\right)\\ {x}^{+}& =& {x}^{c}+s\end{array}$

where $\mathbf{H}(x^c)$ and $\nabla e(x^c)$ correspond to the Hessian and gradient of $e(x)$ as defined previously, evaluated at the current point $x^c$. Since the $p$-norm is convex for $1<p<\infty$, problem [link] is convex. Therefore Newton's method will converge to the global minimizer $x^\star$ as long as $\mathbf{H}(x^c)$ is not ill-conditioned.
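The iteration above can be sketched in NumPy as follows. Several choices here are assumptions rather than part of the method: the starting point (the real least squares solution of the stacked system $[\mathbf{R};\mathbf{Z}]x\approx[\alpha;\gamma]$), the tolerance and the data. The step-halving safeguard is also an addition to the plain Newton update. The sketch further assumes $p\ge 2$ and nonzero residuals, so that $g_i(x)^{\frac{p-4}{2}}$ stays finite.

```python
import numpy as np

def complex_lp_newton(A, b, p, max_iter=100, tol=1e-10):
    """Newton's method for min_x ||A x - b||_p^p with A, b complex and x real.

    Assumes p >= 2 and that no residual A_i x - b_i becomes exactly zero.
    """
    R, Z, alpha, gamma = A.real, A.imag, b.real, b.imag
    e = lambda x: np.sum(np.abs(A @ x - b) ** p)
    # Assumed starting point: real least squares solution of [R; Z] x = [alpha; gamma].
    x = np.linalg.lstsq(np.vstack([R, Z]), np.concatenate([alpha, gamma]), rcond=None)[0]
    for _ in range(max_iter):
        u, v = R @ x - alpha, Z @ x - gamma
        g = u**2 + v**2
        Q = 2 * (R.T * u + Z.T * v)
        G = g ** ((p - 2) / 2)
        grad = (p / 2) * Q @ G                          # grad e(x^c)
        if np.linalg.norm(grad) <= tol:
            break
        H = ((p * (p - 2) / 4) * (Q * g ** ((p - 4) / 2)) @ Q.T
             + p * ((R.T * G) @ R + (Z.T * G) @ Z))     # H(x^c)
        s = np.linalg.solve(H, -grad)                   # H(x^c) s = -grad e(x^c)
        t, e_old = 1.0, e(x)
        # Added safeguard (not in the text): halve the step if e would increase beyond round-off.
        while e(x + t * s) > e_old * (1 + 1e-12) and t > 1e-8:
            t /= 2
        x = x + t * s                                   # x^+ = x^c + t s
    return x

rng = np.random.default_rng(6)
A = rng.standard_normal((30, 5)) + 1j * rng.standard_normal((30, 5))  # hypothetical data
b = rng.standard_normal(30) + 1j * rng.standard_normal(30)
x = complex_lp_newton(A, b, p=4)
```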
