0.16 Appendix: optimization theory


Optimization theory is the branch of applied mathematics concerned with finding the set of parameters that maximizes or minimizes a given mathematical expression. Being an applied discipline, its problems usually arise from real-life situations in areas such as science, engineering and finance (among many others). This section presents some basic concepts for completeness and is not meant to replace a treatise on the subject. The reader is encouraged to consult further references for more information.

Solution of linear weighted least squares problems

Consider the quadratic problem

min_{h} {∥ d - C h ∥}_{2}

which can be written as

min_{h} {(d - C h)}^{T} (d - C h)

omitting the square root, which does not change the minimizer. Provided $C$ has full column rank, this problem is strictly convex. Therefore its unique (and thus global) solution is found at the point where the partial derivatives with respect to the optimization variable are equal to zero. That is,

\begin{matrix} \frac{\partial}{\partial h} \{{(d - C h)}^{T} (d - C h)\} & = & \frac{\partial}{\partial h} \{d^{T} d - 2 d^{T} C h + {(C h)}^{T} C h\} \\ & = & - 2 C^{T} d + 2 C^{T} C h = 0 \\ & \Rightarrow & C^{T} C h = C^{T} d \end{matrix}

The solution of [link] is given by

h = {(C^{T} C)}^{- 1} C^{T} d

where the matrix ${(C^{T} C)}^{- 1} C^{T}$ is referred to [link] , [link] as the Moore-Penrose pseudoinverse of $C$ (for $C$ with full column rank).
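The normal equations above can be sketched numerically as follows. This is a minimal illustration using NumPy, not part of the original text; the function name `least_squares` and the line-fitting data are hypothetical, and $C$ is assumed to have full column rank.

```python
import numpy as np

def least_squares(C, d):
    """Solve min_h ||d - C h||_2 through the normal equations C^T C h = C^T d."""
    # Solving the linear system avoids forming the inverse of C^T C explicitly.
    return np.linalg.solve(C.T @ C, C.T @ d)

# Hypothetical data: fit the line d = h0 + h1 * t to four samples.
t = np.array([0.0, 1.0, 2.0, 3.0])
C = np.column_stack([np.ones_like(t), t])
d = np.array([1.0, 3.0, 5.0, 7.0])   # samples of 1 + 2 t

h = least_squares(C, d)

# The same result through the pseudoinverse of C.
h_pinv = np.linalg.pinv(C) @ d
```

For poorly conditioned $C$ a QR- or SVD-based solver such as `np.linalg.lstsq` is usually a safer choice than the normal equations.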

In the case of a weighted version of [link] ,

min_{h} {∥ \sqrt{w} (d - C h) ∥}_{2}^{2} = \sum_{k} w_{k} {| d_{k} - C_{k} h |}^{2}

where $C_{k}$ is the $k$ -th row of $C$ , one can write [link] as

min_{h} (W (d - C h))^{T} (W (d - C h))

where $W = diag (\sqrt{w})$ contains the weighting vector $w$ . The solution is therefore given by

h = {(C^{T} W^{T} W C)}^{- 1} C^{T} W^{T} W d
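The weighted solution can be sketched in the same way. The function name `weighted_least_squares` and the data below are hypothetical and serve only to illustrate the formula; the weighted matrix $W C$ is assumed to have full column rank.

```python
import numpy as np

def weighted_least_squares(C, d, w):
    """Solve min_h sum_k w_k |d_k - C_k h|^2 with W = diag(sqrt(w))."""
    W = np.diag(np.sqrt(w))
    WC = W @ C
    # Normal equations of the weighted problem: (WC)^T (WC) h = (WC)^T (W d).
    return np.linalg.solve(WC.T @ WC, WC.T @ (W @ d))

# Hypothetical data: the last sample is an outlier and receives zero weight.
t = np.array([0.0, 1.0, 2.0, 3.0])
C = np.column_stack([np.ones_like(t), t])
d = np.array([1.0, 3.0, 5.0, 100.0])
w = np.array([1.0, 1.0, 1.0, 0.0])

h_w = weighted_least_squares(C, d, w)

# With unit weights the weighted problem reduces to ordinary least squares.
h_unit = weighted_least_squares(C, d, np.ones_like(w))
```

In this example the zero weight removes the outlier from the fit, so the remaining three samples are matched exactly.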

Newton's method and the approximation of linear systems in an $l_{p}$ Sense

Newton's method and $l_{p}$ Linear phase systems

Consider the problem

min_{a} g (a) = {∥ A (ω; a) - D (ω) ∥}_{p}

for $a \in R^{M + 1}$ . Problem [link] is equivalent to the better posed problem

\begin{matrix} min_{a} f (a) = g {(a)}^{p} & = & {∥ A (ω; a) - D (ω) ∥}_{p}^{p} \\ & = & \sum_{i = 0}^{L} ∣ C_{i} a - D_{i} ∣^{p} \end{matrix}

where $D_{i} = D (ω_{i})$ , $ω_{i} \in [0, π]$ , $C_{i} = [C_{i, 0}, ..., C_{i, M}]$ , and

C = [\begin{matrix} C_{0} \\ ⋮ \\ C_{L} \end{matrix}]

The $i j$ -th element of $C$ is given by $C_{i, j} = cos ω_{i} (M - j)$ , where $0 \leq i \leq L$ and $0 \leq j \leq M$ . From [link] we have that

\nabla f (a) = [\begin{matrix} \frac{\partial}{\partial a_{0}} f (a) \\ ⋮ \\ \frac{\partial}{\partial a_{M}} f (a) \end{matrix}]

where $a_{j}$ is the $j$ -th element of $a \in R^{M + 1}$ and

\begin{matrix} \frac{\partial}{\partial a_{j}} f (a) & = & \frac{\partial}{\partial a_{j}} \sum_{i = 0}^{L} ∣ C_{i} a - D_{i} ∣^{p} \\ & = & \sum_{i = 0}^{L} \frac{\partial}{\partial a_{j}} ∣ C_{i} a - D_{i} ∣^{p} \\ & = & p \sum_{i = 0}^{L} ∣ C_{i} a - D_{i} ∣^{p - 1} \cdot \frac{\partial}{\partial a_{j}} ∣ C_{i} a - D_{i} ∣ \end{matrix}

Now,

\frac{\partial}{\partial a_{j}} ∣ C_{i} a - D_{i} ∣ = sign (C_{i} a - D_{i}) \cdot \frac{\partial}{\partial a_{j}} (C_{i} a - D_{i}) = C_{i, j} sign (C_{i} a - D_{i})

where

sign (x) = \left\{ \begin{matrix} 1 & x > 0 \\ 0 & x = 0 \\ - 1 & x < 0 \end{matrix} \right.

Note that

lim_{u (a) \to 0^{+}} \frac{\partial}{\partial a_{j}} ∣ u (a) ∣^{p} = lim_{u (a) \to 0^{-}} \frac{\partial}{\partial a_{j}} ∣ u (a) ∣^{p} = 0

Therefore the gradient of $f (a)$ is given by

\nabla f (a) = [\begin{matrix} p \sum_{i = 0}^{L} C_{i, 0} ∣ C_{i} a - D_{i} ∣^{p - 1} sign (C_{i} a - D_{i}) \\ ⋮ \\ p \sum_{i = 0}^{L} C_{i, M} ∣ C_{i} a - D_{i} ∣^{p - 1} sign (C_{i} a - D_{i}) \end{matrix}]

The Hessian of $f (a)$ is the matrix $\nabla^{2} f (a)$ whose $j m$ -th element ( $0 \leq j, m \leq M$ ) is given by

\begin{matrix} \nabla_{j, m}^{2} f (a) = \frac{\partial^{2}}{\partial a_{j} \partial a_{m}} f (a) & = & \frac{\partial}{\partial a_{m}} \frac{\partial}{\partial a_{j}} f (a) \\ & = & \sum_{i = 0}^{L} p C_{i, j} \frac{\partial}{\partial a_{m}} ∣ C_{i} a - D_{i} ∣^{p - 1} sign (C_{i} a - D_{i}) \\ & = & \sum_{i = 0}^{L} α \frac{\partial}{\partial a_{m}} b (a) d (a) \end{matrix}

where the substitutions $α = p C_{i, j}$ , $b (a) = ∣ C_{i} a - D_{i} ∣^{p - 1}$ and $d (a) = sign (C_{i} a - D_{i})$ have been made for the sake of simplicity. We have

\begin{matrix} \frac{\partial}{\partial a_{m}} b (a) & = & \frac{\partial}{\partial a_{m}} ∣ C_{i} a - D_{i} ∣^{p - 1} \\ & = & (p - 1) C_{i, m} ∣ C_{i} a - D_{i} ∣^{p - 2} sign (C_{i} a - D_{i}) \\ \frac{\partial}{\partial a_{m}} d (a) & = & \frac{\partial}{\partial a_{m}} sign (C_{i} a - D_{i}) = 0 \end{matrix}

Note that the partial derivative of $d (a)$ at $C_{i} a - D_{i} = 0$ is not defined. Therefore

\begin{matrix} \frac{\partial}{\partial a_{m}} b (a) d (a) & = & b (a) \frac{\partial}{\partial a_{m}} d (a) + d (a) \frac{\partial}{\partial a_{m}} b (a) \\ & = & (p - 1) C_{i, m} ∣ C_{i} a - D_{i} ∣^{p - 2} {sign}^{2} (C_{i} a - D_{i}) \end{matrix}

Note that ${sign}^{2} (C_{i} a - D_{i}) = 1$ for all $C_{i} a - D_{i} \neq 0$ . Then

\nabla_{j, m}^{2} f (a) = p (p - 1) \sum_{i = 0}^{L} C_{i, j} C_{i, m} ∣ C_{i} a - D_{i} ∣^{p - 2}

except at $C_{i} a - D_{i} = 0$ where it is not defined.

Based on [link] and [link] , one can apply Newton's method to problem [link] as follows,

Given $a_{0} \in R^{M + 1}$ , $D \in R^{L + 1}$ , $C \in R^{(L + 1) \times (M + 1)}$
For i = 0 , 1 , ...
1. Find $\nabla f (a_{i})$ .
2. Find $\nabla^{2} f (a_{i})$ .
3. Solve $\nabla^{2} f (a_{i}) s = - \nabla f (a_{i})$ for $s$ .
4. Let $a_{i + 1} = a_{i} + s$ .
5. Check for convergence and iterate if necessary.
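The steps above can be sketched as follows. This is a minimal illustration under stated assumptions rather than a definitive implementation: it uses a pure Newton step with no line search, assumes $p \geq 2$ so that $∣ r ∣^{p - 2}$ stays finite, evaluates the gradient and Hessian in the vectorized forms $p C^{T} (∣ r ∣^{p - 2} r)$ and $p (p - 1) C^{T} diag (∣ r ∣^{p - 2}) C$ with $r = C a - D$ (equivalent to the element-wise sums derived earlier), and starts from the least squares solution. The function name `lp_newton`, the stopping rule and the data are hypothetical.

```python
import numpy as np

def lp_newton(C, D, p, a0, max_iter=50, tol=1e-10):
    """Pure Newton iteration for min_a sum_i |C_i a - D_i|^p (assumes p >= 2)."""
    a = np.asarray(a0, dtype=float).copy()
    for _ in range(max_iter):
        r = C @ a - D                          # residual C a - D
        z = np.abs(r) ** (p - 2)               # |r|^(p-2)
        grad = p * C.T @ (z * r)               # step 1: gradient
        hess = p * (p - 1) * C.T @ (z[:, None] * C)   # step 2: Hessian
        s = np.linalg.solve(hess, -grad)       # step 3: Newton direction
        a = a + s                              # step 4: update
        # Step 5: stop once the update is negligible (illustrative criterion).
        if np.linalg.norm(s) <= tol * (1.0 + np.linalg.norm(a)):
            break
    return a

# Hypothetical symmetric data: fit a0 + a1 * t to three samples with p = 4.
t = np.array([-1.0, 0.0, 1.0])
C = np.column_stack([np.ones_like(t), t])
D = np.array([1.0, 0.0, 1.0])
a_start = np.linalg.lstsq(C, D, rcond=None)[0]   # least squares initial guess

a_hat = lp_newton(C, D, p=4, a0=a_start)

# By symmetry the slope is zero and the offset solves 2 (a - 1)^3 + a^3 = 0.
c = 2.0 ** (1.0 / 3.0)
a_expected = np.array([c / (1.0 + c), 0.0])
```

The symmetric data were chosen so that the minimizer has a closed form to compare against; for general data only the stopping criterion indicates convergence.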

Note that for problem [link] the gradient of $f (a)$ can be written as

\nabla f (a) = p C^{T} y

where

y = ∣ C a_{i} - D ∣^{p - 1} sign (C a_{i} - D) = ∣ C a_{i} - D ∣^{p - 2} (C a_{i} - D)

Also,

\nabla_{j, m}^{2} f (a) = p (p - 1) C_{j}^{T} Z C_{m}

where

Z = diag (∣ C a_{i} - D ∣^{p - 2})

and

C_{j} = [\begin{matrix} C_{0, j} \\ ⋮ \\ C_{L, j} \end{matrix}]

Therefore

\nabla^{2} f (a) = (p^{2} - p) C^{T} Z C

From [link] , the Hessian $\nabla^{2} f (a)$ can be expressed as

\nabla^{2} f (a) = (p^{2} - p) C^{T} W^{T} W C

where

W = diag (∣ C a_{i} - D ∣^{\frac{p - 2}{2}})

The matrix $C \in R^{(L + 1) \times (M + 1)}$ is given by

C = [\begin{matrix} cos M ω_{0} & cos (M - 1) ω_{0} & \dots & cos (M - j) ω_{0} & \dots & cos ω_{0} & 1 \\ cos M ω_{1} & cos (M - 1) ω_{1} & \dots & cos (M - j) ω_{1} & \dots & cos ω_{1} & 1 \\ ⋮ & ⋮ & & ⋮ & & ⋮ & ⋮ \\ cos M ω_{i} & cos (M - 1) ω_{i} & \dots & cos (M - j) ω_{i} & \dots & cos ω_{i} & 1 \\ ⋮ & ⋮ & & ⋮ & & ⋮ & ⋮ \\ cos M ω_{L - 1} & cos (M - 1) ω_{L - 1} & \dots & cos (M - j) ω_{L - 1} & \dots & cos ω_{L - 1} & 1 \\ cos M ω_{L} & cos (M - 1) ω_{L} & \dots & cos (M - j) ω_{L} & \dots & cos ω_{L} & 1 \end{matrix}]
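As an illustration, the matrix $C$ with entries $C_{i, j} = cos ω_{i} (M - j)$ can be built in a few lines. The function name `cosine_matrix` and the uniform frequency grid are assumptions made for this sketch; the text only requires $ω_{i} \in [0, π]$ .

```python
import numpy as np

def cosine_matrix(omega, M):
    """Return C with C[i, j] = cos(omega_i * (M - j)), 0 <= j <= M."""
    omega = np.asarray(omega, dtype=float)
    j = np.arange(M + 1)
    # The outer product gives omega_i * (M - j) for every pair (i, j).
    return np.cos(np.outer(omega, M - j))

# Hypothetical grid: L + 1 = 5 uniformly spaced frequencies in [0, pi], M = 2.
omega = np.linspace(0.0, np.pi, 5)
C = cosine_matrix(omega, M=2)
```

The last column is all ones because it corresponds to $j = M$ , and the first row is all ones in this example because $ω_{0} = 0$ .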

The matrix $H = \nabla^{2} f (a)$ is positive definite for $p > 1$ , provided that no residual $C_{i} a - D_{i}$ is zero. To see this, write $H = (p^{2} - p) K^{T} K$ where $K = W C$ and $p^{2} - p > 0$ . Let $z \in R^{M + 1}$ , $z \neq 0$ . Then

z^{T} H z = (p^{2} - p) z^{T} K^{T} K z = (p^{2} - p) {∥ K z ∥}_{2}^{2} > 0

unless $z \in N (K)$ . But since $W$ is diagonal with nonzero diagonal entries and $C$ has full column rank, $N (K) = \{0\}$ . Thus $z^{T} H z \geq 0$ (with equality only if $z = 0$ ) and so $H$ is positive definite.
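A quick numerical check of this argument is sketched below. The grid, the desired response `D` and the evaluation point `a` are hypothetical values chosen so that every residual is nonzero, and $p = 4$ is an arbitrary choice.

```python
import numpy as np

p = 4
M, L = 2, 4
omega = np.linspace(0.0, np.pi, L + 1)             # hypothetical frequency grid
C = np.cos(np.outer(omega, M - np.arange(M + 1)))  # C[i, j] = cos(omega_i (M - j))
D = np.array([1.0, 1.0, 0.5, 0.1, 0.2])            # hypothetical desired response
a = np.zeros(M + 1)                                # point where H is evaluated

r = C @ a - D                                      # all residuals are nonzero here
W = np.diag(np.abs(r) ** ((p - 2) / 2))
K = W @ C
H = (p**2 - p) * K.T @ K                           # Hessian in factored form

# H is symmetric, so eigvalsh applies; positive definiteness means all > 0.
eigenvalues = np.linalg.eigvalsh(H)
rank_K = np.linalg.matrix_rank(K)
```

If some residual were exactly zero, the corresponding row of $K$ would vanish and the check could fail, which is the caveat in the argument above.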

Newton's method and $l_{p}$ Complex linear systems

Consider the problem

min_{x} e (x) = {∥ A x - b ∥}_{p}^{p}

where $A \in C^{m \times n}$ , $x \in R^{n}$ and $b \in C^{m}$ . One can write [link] in terms of the real and imaginary parts of $A$ and $b$ ,

\begin{matrix} e (x) & = & \sum_{i = 1}^{m} {| A_{i} x - b_{i} |}^{p} \\ & = & \sum_{i = 1}^{m} {| Re \{A_{i} x - b_{i}\} + j Im \{A_{i} x - b_{i}\} |}^{p} \\ & = & \sum_{i = 1}^{m} {| (R_{i} x - α_{i}) + j (Z_{i} x - γ_{i}) |}^{p} \\ & = & \sum_{i = 1}^{m} {(\sqrt{{(R_{i} x - α_{i})}^{2} + {(Z_{i} x - γ_{i})}^{2}})}^{p} \\ & = & \sum_{i = 1}^{m} g_{i} {(x)}^{p / 2} \end{matrix}

where $A = R + j Z$ , $b = α + j γ$ and $g_{i} (x) = {(R_{i} x - α_{i})}^{2} + {(Z_{i} x - γ_{i})}^{2}$ . The gradient $\nabla e (x)$ is the vector whose $k$ -th element is given by

\frac{\partial}{\partial x_{k}} e (x) = \frac{p}{2} \sum_{i = 1}^{m} [\frac{\partial}{\partial x_{k}} g_{i} (x)] g_{i} {(x)}^{\frac{p - 2}{2}} = \frac{p}{2} q_{k} (x) \hat{g} (x)

where $\hat{g} (x)$ is the column vector whose $i$ -th element is $g_{i} {(x)}^{\frac{p - 2}{2}}$ and $q_{k}$ is the row vector whose $i$ -th element is

\begin{matrix} q_{k, i} (x) = \frac{\partial}{\partial x_{k}} g_{i} (x) & = & 2 (R_{i} x - α_{i}) R_{i k} + 2 (Z_{i} x - γ_{i}) Z_{i k} \\ & = & 2 R_{i k} R_{i} x + 2 Z_{i k} Z_{i} x - [2 α_{i} R_{i k} + 2 γ_{i} Z_{i k}] \end{matrix}

Therefore one can express the gradient of $e (x)$ by $\nabla e (x) = \frac{p}{2} Q \hat{g}$ , where $Q = [q_{k, i}]$ as above. Note that one can also write the gradient in vector form as follows

\nabla e (x) = p [R^{T} diag (R x - α) + Z^{T} diag (Z x - γ)] \cdot [{({(R x - α)}^{2} + {(Z x - γ)}^{2})}^{\frac{p - 2}{2}}]
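The vector form of the gradient can be checked against finite differences, as in the following sketch. The function names `lp_error` and `lp_gradient`, the data and the step size `h` are hypothetical; the powers are taken element-wise.

```python
import numpy as np

def lp_error(A, b, x, p):
    """e(x) = sum_i |A_i x - b_i|^p for complex A, b and real x."""
    return np.sum(np.abs(A @ x - b) ** p)

def lp_gradient(A, b, x, p):
    """Vector form of the gradient of e(x)."""
    R, Z = A.real, A.imag
    u = R @ x - b.real            # real part of the residual, R x - alpha
    v = Z @ x - b.imag            # imaginary part of the residual, Z x - gamma
    g_hat = (u**2 + v**2) ** ((p - 2) / 2)
    # R^T diag(u) g_hat equals R^T (u * g_hat), so no diagonal matrix is built.
    return p * (R.T @ (u * g_hat) + Z.T @ (v * g_hat))

# Hypothetical test data.
A = np.array([[1 + 2j, 0.5 - 1j], [0 - 1j, 2 + 0j], [0.3 + 0.3j, -1 + 1j]])
b = np.array([1 + 0j, 0 + 0.5j, -1 - 1j])
x = np.array([0.2, -0.4])
p = 4

grad = lp_gradient(A, b, x, p)

# Central finite differences as an independent check of the formula.
h = 1e-6
fd = np.array([
    (lp_error(A, b, x + h * e_k, p) - lp_error(A, b, x - h * e_k, p)) / (2 * h)
    for e_k in np.eye(x.size)
])
```

Agreement between `grad` and `fd` up to the finite-difference accuracy is a useful sanity check before using the gradient inside an iterative method.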

The Hessian $H (x)$ is the matrix of second derivatives whose $k l$ -th entry is given by

\begin{matrix} H_{k, l} (x) & = \frac{\partial^{2}}{\partial x_{k} \partial x_{l}} e (x) \\ = \frac{\partial}{\partial x_{l}} \frac{p}{2} \sum_{i = 1}^{m} q_{k, i} (x) g_{i} {(x)}^{\frac{p - 2}{2}} \\ = \frac{p}{2} \sum_{i = 1}^{m} [q_{k, i} (x) \frac{\partial}{\partial x_{l}} g_{i} {(x)}^{\frac{p - 2}{2}} + g_{i} {(x)}^{\frac{p - 2}{2}} \frac{\partial}{\partial x_{l}} q_{k, i} (x)] \end{matrix}

Now,

\begin{matrix} \frac{\partial}{\partial x_{l}} g_{i} {(x)}^{\frac{p - 2}{2}} & = & \frac{p - 2}{2} [\frac{\partial}{\partial x_{l}} g_{i} (x)] g_{i} {(x)}^{\frac{p - 4}{2}} \\ & = & \frac{p - 2}{2} q_{l, i} (x) g_{i} {(x)}^{\frac{p - 4}{2}} \\ \frac{\partial}{\partial x_{l}} q_{k, i} (x) & = & 2 R_{i k} R_{i l} + 2 Z_{i k} Z_{i l} \end{matrix}

Substituting [link] and [link] into [link] we obtain

H_{k, l} (x) = \frac{p (p - 2)}{4} \sum_{i = 1}^{m} q_{k, i} (x) q_{l, i} (x) g_{i} {(x)}^{\frac{p - 4}{2}} + p \sum_{i = 1}^{m} (R_{i k} R_{i l} + Z_{i k} Z_{i l}) g_{i} {(x)}^{\frac{p - 2}{2}}

Note that $H (x)$ can be written in matrix form as

\begin{matrix} H (x) & = & \frac{p (p - 2)}{4} (Q diag (g {(x)}^{\frac{p - 4}{2}}) Q^{T}) \\ & & + p (R^{T} diag (g {(x)}^{\frac{p - 2}{2}}) R + Z^{T} diag (g {(x)}^{\frac{p - 2}{2}}) Z) \end{matrix}

Therefore to solve [link] one can use Newton's method as follows: given an initial point $x_{0}$ , each iteration gives a new estimate $x^{+}$ according to the formulas

\begin{matrix} H (x^{c}) s & = & - \nabla e (x^{c}) \\ x^{+} & = & x^{c} + s \end{matrix}

where $H (x^{c})$ and $\nabla e (x^{c})$ correspond to the Hessian and gradient of $e (x)$ as defined previously, evaluated at the current point $x^{c}$ . Since the $p$ -norm is convex for $1 < p < \infty$ , problem [link] is convex, so any stationary point is a global minimizer. Therefore Newton's method, when it converges, reaches the global minimizer $x^{\star}$ ; in practice this requires that $H (x^{c})$ is not ill-conditioned.
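The iteration can be sketched as follows, using the matrix forms of the gradient and Hessian given above. This is a minimal illustration under stated assumptions: a pure Newton step without line search, $p \geq 2$ , and no residual exactly zero when $p < 4$ (so that $g^{(p - 4) / 2}$ stays finite). The function name `complex_lp_newton`, the stopping rule and the data are hypothetical; the data were built so that the minimizer is known to be $x = (1, 2)$ .

```python
import numpy as np

def complex_lp_newton(A, b, p, x0, max_iter=50, tol=1e-12):
    """Pure Newton iteration for min_x sum_i |A_i x - b_i|^p with real x."""
    R, Z = A.real, A.imag
    alpha, gamma = b.real, b.imag
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        u = R @ x - alpha
        v = Z @ x - gamma
        g = u**2 + v**2                        # g_i(x)
        Q = 2.0 * (R.T * u + Z.T * v)          # Q[k, i] = d g_i / d x_k
        w2 = g ** ((p - 2) / 2)
        w4 = g ** ((p - 4) / 2)
        grad = 0.5 * p * Q @ w2
        H = (0.25 * p * (p - 2) * (Q * w4) @ Q.T
             + p * ((R.T * w2) @ R + (Z.T * w2) @ Z))
        s = np.linalg.solve(H, -grad)          # H(x_c) s = -grad e(x_c)
        x = x + s                              # x_plus = x_c + s
        if np.linalg.norm(s) <= tol * (1.0 + np.linalg.norm(x)):
            break
    return x

# Hypothetical data: each row is rotated by a unit phase, which leaves
# |A_i x - b_i| unchanged, so the minimizer of the unrotated problem is kept.
A0 = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
b0 = np.array([1 + 1j, 1 - 1j, 2 + 1j, 2 - 1j])
phases = np.array([1, 1j, -1, -1j])
A = phases[:, None] * A0
b = phases * b0

x_hat = complex_lp_newton(A, b, p=4, x0=np.zeros(2))
e_min = np.sum(np.abs(A @ x_hat - b) ** 4)
```

For this data $e (x) = 2 {({(x_{1} - 1)}^{2} + 1)}^{2} + 2 {({(x_{2} - 2)}^{2} + 1)}^{2}$ , so the minimum value is 4 at $x = (1, 2)$ .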

Source: OpenStax, Iterative design of l_p digital filters. OpenStax CNX. Dec 07, 2011 Download for free at http://cnx.org/content/col11383/1.1
