Conjugate Gradient Part 2

29 Apr 2019

The source code for all algorithms and plots can be found here.

Ellipsoidal Level Curves of the Quadratic Form

We will show that the contours of the quadratic form from TP.2 are ellipsoidal. Let $\alpha\equiv\frac12$ and $\beta\equiv-1$. Then TP.2 becomes

$$\align{ f(x) &\equiv \alpha\innprd{Ax}{x}+\beta\innprd{b}{x}+c\tag{ELC.1} \\ &= \alpha x^TAx+\beta b^Tx+c }$$

We will initially work in $\wR^2$ and then generalize to an arbitrary dimension. So, for the moment, $A$ is a $2\times2$ matrix, $b$ is a $2\times1$ vector, and $f$ is a function of a $2\times1$ vector $x$.

Suppose $A$ is positive definite and hence symmetric. Let $\lambda_1$ and $\lambda_2$ be the eigenvalues of $A$ corresponding to the normalized eigenvectors $v_1$ and $v_2$, respectively. Also suppose that $\lambda_1\neq\lambda_2$ so that $v_1$ and $v_2$ are orthogonal (by Axler 7.22). Then $x=\innprd{x}{v_1}v_1+\innprd{x}{v_2}v_2$ and

$$\align{ Ax &= A\prn{\innprd{x}{v_1}v_1+\innprd{x}{v_2}v_2} \\ &= \innprd{x}{v_1}Av_1+\innprd{x}{v_2}Av_2 \\ &= \innprd{x}{v_1}\lambda_1v_1+\innprd{x}{v_2}\lambda_2v_2 }$$

Hence

$$\align{ \innprd{x}{Ax} &= \innprdbg{x}{\lambda_1\innprd{x}{v_1}v_1+\lambda_2\innprd{x}{v_2}v_2} \\ &= \innprdbg{x}{\lambda_1\innprd{x}{v_1}v_1}+\innprdbg{x}{\lambda_2\innprd{x}{v_2}v_2} \\ &= \lambda_1\innprd{x}{v_1}\innprd{x}{v_1}+\lambda_2\innprd{x}{v_2}\innprd{x}{v_2} \\ &= \lambda_1\innprd{x}{v_1}^2+\lambda_2\innprd{x}{v_2}^2\tag{ELC.2} }$$

Also note that $b=\innprd{b}{v_1}v_1+\innprd{b}{v_2}v_2$ and

$$\align{ \innprd{b}{x} &= \innprdbg{\innprd{b}{v_1}v_1+\innprd{b}{v_2}v_2}{x} \\ &= \innprdbg{\innprd{b}{v_1}v_1}{x}+\innprdbg{\innprd{b}{v_2}v_2}{x} \\ &= \innprd{b}{v_1}\innprd{v_1}{x}+\innprd{b}{v_2}\innprd{v_2}{x}\tag{ELC.3} }$$
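Before combining these identities, here is a minimal numerical sanity check of ELC.2 and ELC.3 (a sketch assuming Python with numpy; the particular $A$, $b$, and $x$ below are arbitrary illustrative choices, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((2, 2))
A = M @ M.T + 2 * np.eye(2)   # an arbitrary symmetric positive definite matrix
b = rng.standard_normal(2)
x = rng.standard_normal(2)

lam, V = np.linalg.eigh(A)    # ascending eigenvalues; columns of V are orthonormal eigenvectors
xv = V.T @ x                  # coordinates <x, v_j>
bv = V.T @ b                  # coordinates <b, v_j>

# ELC.2: <x, Ax> = lambda_1 <x, v_1>^2 + lambda_2 <x, v_2>^2
assert np.isclose(x @ A @ x, np.sum(lam * xv**2))

# ELC.3: <b, x> = <b, v_1><v_1, x> + <b, v_2><v_2, x>
assert np.isclose(b @ x, np.sum(bv * xv))
```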

Combining ELC.2 and ELC.3, we compute

$$\align{ \alpha\innprd{x}{Ax} + \beta\innprd{b}{x} &= \alpha\lambda_1\innprd{x}{v_1}^2+\alpha\lambda_2\innprd{x}{v_2}^2+\beta\innprd{b}{v_1}\innprd{x}{v_1}+\beta\innprd{b}{v_2}\innprd{x}{v_2} \\\\ &= \alpha\lambda_1\innprd{x}{v_1}^2+\beta\innprd{b}{v_1}\innprd{x}{v_1}+\alpha\lambda_2\innprd{x}{v_2}^2+\beta\innprd{b}{v_2}\innprd{x}{v_2} \\\\ &= \alpha\lambda_1\innprd{x}{v_1}^2+\alpha\lambda_1\frac{\beta\innprd{b}{v_1}}{\alpha\lambda_1}\innprd{x}{v_1}+\alpha\lambda_2\innprd{x}{v_2}^2+\alpha\lambda_2\frac{\beta\innprd{b}{v_2}}{\alpha\lambda_2}\innprd{x}{v_2} \\\\ &= \alpha\lambda_1\Prn{\innprd{x}{v_1}^2+\frac{\beta\innprd{b}{v_1}}{\alpha\lambda_1}\innprd{x}{v_1}}+\alpha\lambda_2\Prn{\innprd{x}{v_2}^2+\frac{\beta\innprd{b}{v_2}}{\alpha\lambda_2}\innprd{x}{v_2}}\tag{ELC.4} }$$

For any $k\gt f(x_{*})$, where $x_{*}$ denotes the unique minimum point of $f$ (the solution to TP.1), define $\phi_k\in\wR$ by

$$\align{ \phi_k &\equiv k-c+\alpha\lambda_1\Prn{\frac{\beta\innprd{b}{v_1}}{2\alpha\lambda_1}}^2+\alpha\lambda_2\Prn{\frac{\beta\innprd{b}{v_2}}{2\alpha\lambda_2}}^2\tag{ELC.5a} \\ &= k-c+\frac{\prn{\beta\innprd{b}{v_1}}^2}{4\alpha\lambda_1}+\frac{\prn{\beta\innprd{b}{v_2}}^2}{4\alpha\lambda_2}\tag{ELC.5b} }$$

Then, for all $x$ satisfying $k=\alpha x^TAx+\beta b^Tx+c$, we have

$$\align{ \phi_k &= k-c+\alpha\lambda_1\Prn{\frac{\beta\innprd{b}{v_1}}{2\alpha\lambda_1}}^2+\alpha\lambda_2\Prn{\frac{\beta\innprd{b}{v_2}}{2\alpha\lambda_2}}^2\tag{by ELC.5a} \\ &= \alpha x^TAx+\beta b^Tx+c-c+\alpha\lambda_1\Prn{\frac{\beta\innprd{b}{v_1}}{2\alpha\lambda_1}}^2+\alpha\lambda_2\Prn{\frac{\beta\innprd{b}{v_2}}{2\alpha\lambda_2}}^2 \\ &= \alpha x^TAx+\beta b^Tx+\alpha\lambda_1\Prn{\frac{\beta\innprd{b}{v_1}}{2\alpha\lambda_1}}^2+\alpha\lambda_2\Prn{\frac{\beta\innprd{b}{v_2}}{2\alpha\lambda_2}}^2 \\\\ &= \alpha\lambda_1\Prn{\innprd{x}{v_1}^2+\frac{\beta\innprd{b}{v_1}}{\alpha\lambda_1}\innprd{x}{v_1}+\Sbr{\frac{\beta\innprd{b}{v_1}}{2\alpha\lambda_1}}^2}+\alpha\lambda_2\Prn{\innprd{x}{v_2}^2+\frac{\beta\innprd{b}{v_2}}{\alpha\lambda_2}\innprd{x}{v_2}+\Sbr{\frac{\beta\innprd{b}{v_2}}{2\alpha\lambda_2}}^2}\tag{by ELC.4} \\\\ &= \alpha\lambda_1\Prn{\innprd{x}{v_1}+\frac{\beta\innprd{b}{v_1}}{2\alpha\lambda_1}}^2+\alpha\lambda_2\Prn{\innprd{x}{v_2}+\frac{\beta\innprd{b}{v_2}}{2\alpha\lambda_2}}^2 \\\\ &= \frac{\prn{\innprd{x}{v_1}+\frac{\beta\innprd{b}{v_1}}{2\alpha\lambda_1}}^2}{\frac{1}{\alpha\lambda_1}}+\frac{\prn{\innprd{x}{v_2}+\frac{\beta\innprd{b}{v_2}}{2\alpha\lambda_2}}^2}{\frac{1}{\alpha\lambda_2}} \\\\ }$$

Hence, on the level curve $f(x)=k$, we have

$$\align{ 1 &= \frac1{\phi_k}\frac{\prn{\innprd{x}{v_1}+\frac{\beta\innprd{b}{v_1}}{2\alpha\lambda_1}}^2}{\frac{1}{\alpha\lambda_1}}+\frac1{\phi_k}\frac{\prn{\innprd{x}{v_2}+\frac{\beta\innprd{b}{v_2}}{2\alpha\lambda_2}}^2}{\frac{1}{\alpha\lambda_2}} \\\\ &= \frac{\prn{\innprd{x}{v_1}+\frac{\beta\innprd{b}{v_1}}{2\alpha\lambda_1}}^2}{\frac{\phi_k}{\alpha\lambda_1}}+\frac{\prn{\innprd{x}{v_2}+\frac{\beta\innprd{b}{v_2}}{2\alpha\lambda_2}}^2}{\frac{\phi_k}{\alpha\lambda_2}} \\\\ &= \frac{\prn{\innprd{x}{v_1}-\frac{-\beta\innprd{b}{v_1}}{2\alpha\lambda_1}}^2}{\Prn{\sqrt{\frac{\phi_k}{\alpha\lambda_1}}}^2}+\frac{\prn{\innprd{x}{v_2}-\frac{-\beta\innprd{b}{v_2}}{2\alpha\lambda_2}}^2}{\Prn{\sqrt{\frac{\phi_k}{\alpha\lambda_2}}}^2}\tag{ELC.6} }$$

We have derived the familiar equation for an ellipse. First note that the two variables in this ellipse equation are $\innprd{x}{v_1}$ and $\innprd{x}{v_2}$. Geometrically, this means that the eigenvectors of $A$ define the basis in which this ellipsoid resides. That is, the eigenvectors of $A$ align with the axes of the ellipsoid. Also note that the center $m$ of the level ellipsoid $f(x)=k$ is

$$\align{ m=\innprd{m}{v_1}v_1+\innprd{m}{v_2}v_2=\frac{-\beta\innprd{b}{v_1}}{2\alpha\lambda_1}v_1+\frac{-\beta\innprd{b}{v_2}}{2\alpha\lambda_2}v_2\tag{ELC.7} }$$

And finally note that the length of the semi-major axis is the larger of $\sqrt{\frac{\phi_k}{\alpha\lambda_1}}$ and $\sqrt{\frac{\phi_k}{\alpha\lambda_2}}$. The length of the semi-minor axis is the smaller of the two.
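To make this concrete, the following sketch (again assuming numpy; the example $A$, $b$, $c$, and level $k$ are arbitrary) computes the center from ELC.7 and the semi-axis lengths, then verifies that points parameterized along the ellipse ELC.6 satisfy $f(x)=k$:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((2, 2))
A = M @ M.T + 2 * np.eye(2)          # arbitrary symmetric positive definite example
b = rng.standard_normal(2)
c = 0.0                              # arbitrary constant term
alpha, beta = 0.5, -1.0

f = lambda x: alpha * x @ A @ x + beta * b @ x + c

lam, V = np.linalg.eigh(A)           # ascending eigenvalues, orthonormal eigenvector columns
x_star = np.linalg.solve(A, b)       # minimum point of f (A x_star = b)
k = f(x_star) + 3.0                  # any level above the minimum

phi_k = k - c + np.sum((beta * (V.T @ b))**2 / (4 * alpha * lam))   # ELC.5b
m = V @ (-beta * (V.T @ b) / (2 * alpha * lam))                     # center, ELC.7
d = np.sqrt(phi_k / (alpha * lam))   # semi-axis lengths; d[0] is the semi-major axis

# points m + d1*cos(t)*v1 + d2*sin(t)*v2 trace the ellipse ELC.6, so f there equals k
for t in np.linspace(0.0, 2.0 * np.pi, 9):
    x = m + d[0] * np.cos(t) * V[:, 0] + d[1] * np.sin(t) * V[:, 1]
    assert np.isclose(f(x), k)
```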

We will now perform the same computations but for an arbitrary inner product space of dimension $n$. Let $A$ be an $n\times n$ matrix and let $b$ be an $n\times1$ vector. The quadratic form $f$ is now a function of an $n\times1$ vector $x$.

We continue to assume that $A$ is positive definite and hence symmetric. Let $\lambda_1,\dots,\lambda_n$ be the eigenvalues of $A$ corresponding to the normalized eigenvectors $v_1,\dots,v_n$, respectively. Also suppose that the eigenvalues are distinct so that $v_1,\dots,v_n$ are orthogonal (by Axler 7.22). Then $x=\sum_{j=1}^n\innprd{x}{v_j}v_j$ and

$$ Ax = A\Prngg{\sum_{j=1}^n\innprd{x}{v_j}v_j} = \sum_{j=1}^n\innprd{x}{v_j}Av_j = \sum_{j=1}^n\innprd{x}{v_j}\lambda_jv_j\tag{ELC.8} $$

Hence

$$ \innprd{Ax}{x} = \innprdBgg{\sum_{j=1}^n\innprd{x}{v_j}\lambda_jv_j}{x} = \sum_{j=1}^n\innprdbg{\innprd{x}{v_j}\lambda_jv_j}{x} = \sum_{j=1}^n\innprd{x}{v_j}\innprd{x}{v_j}\lambda_j = \sum_{j=1}^n\innprd{x}{v_j}^2\lambda_j\tag{ELC.9} $$

We also have

$$ \innprd{b}{x} = \innprdBgg{\sum_{j=1}^n\innprd{b}{v_j}v_j}{x} = \sum_{j=1}^n\innprdbg{\innprd{b}{v_j}v_j}{x} = \sum_{j=1}^n\innprd{b}{v_j}\innprd{x}{v_j}\tag{ELC.10} $$

Combining ELC.9 and ELC.10, we compute

$$\align{ \alpha\innprd{Ax}{x}+\beta\innprd{b}{x} &= \alpha\sum_{j=1}^n\innprd{x}{v_j}^2\lambda_j+\beta\sum_{j=1}^n\innprd{b}{v_j}\innprd{x}{v_j} \\\\ &= \alpha\sum_{j=1}^n\innprd{x}{v_j}^2\lambda_j+\alpha\sum_{j=1}^n\frac{\beta\innprd{b}{v_j}}{\alpha\lambda_j}\innprd{x}{v_j}\lambda_j \\\\ &= \alpha\sum_{j=1}^n\Prn{\innprd{x}{v_j}^2+\frac{\beta\innprd{b}{v_j}}{\alpha\lambda_j}\innprd{x}{v_j}}\lambda_j\tag{ELC.11} }$$

For any $k\gt f(x_{*})$, define $\phi_k\in\wR$ by

$$\align{ \phi_k &\equiv k-c+\alpha\sum_{j=1}^n\Prn{\frac{\beta\innprd{b}{v_j}}{2\alpha\lambda_j}}^2\lambda_j\tag{ELC.12a} \\ &= k-c+\sum_{j=1}^n\frac{\prn{\beta\innprd{b}{v_j}}^2}{4\alpha\lambda_j}\tag{ELC.12b} }$$

Then, for all $x$ satisfying $k=f(x)$, we have

$$\align{ \phi_k &= k-c+\alpha\sum_{j=1}^n\Prn{\frac{\beta\innprd{b}{v_j}}{2\alpha\lambda_j}}^2\lambda_j\tag{by ELC.12a} \\ &= \alpha\innprd{Ax}{x}+\beta\innprd{b}{x}+c-c+\alpha\sum_{j=1}^n\Prn{\frac{\beta\innprd{b}{v_j}}{2\alpha\lambda_j}}^2\lambda_j \\ &= \alpha\innprd{Ax}{x}+\beta\innprd{b}{x}+\alpha\sum_{j=1}^n\Prn{\frac{\beta\innprd{b}{v_j}}{2\alpha\lambda_j}}^2\lambda_j \\\\ &= \alpha\sum_{j=1}^n\Prn{\innprd{x}{v_j}^2+\frac{\beta\innprd{b}{v_j}}{\alpha\lambda_j}\innprd{x}{v_j}}\lambda_j+\alpha\sum_{j=1}^n\Prn{\frac{\beta\innprd{b}{v_j}}{2\alpha\lambda_j}}^2\lambda_j\tag{by ELC.11} \\\\ &= \alpha\sum_{j=1}^n\Prn{\innprd{x}{v_j}^2+\frac{\beta\innprd{b}{v_j}}{\alpha\lambda_j}\innprd{x}{v_j}+\Sbr{\frac{\beta\innprd{b}{v_j}}{2\alpha\lambda_j}}^2}\lambda_j \\\\ &= \alpha\sum_{j=1}^n\Prn{\innprd{x}{v_j}+\frac{\beta\innprd{b}{v_j}}{2\alpha\lambda_j}}^2\lambda_j \\\\ &= \alpha\sum_{j=1}^n\prn{\innprd{x}{v_j}-\innprd{m}{v_j}}^2\lambda_j \\\\ &= \alpha\sum_{j=1}^n\innprd{x-m}{v_j}^2\lambda_j\tag{ELC.13} \\\\ &= \sum_{j=1}^n\frac{\innprd{x-m}{v_j}^2}{\frac{1}{\alpha\lambda_j}} \\\\ &= \sum_{j=1}^n\frac{\innprd{x-m}{v_j}^2}{\Prn{\sqrt{\frac{1}{\alpha\lambda_j}}}^2}\tag{ELC.14} }$$

where the center point $m$ is defined by

$$\align{ m &\equiv \sum_{j=1}^n\frac{-\beta\innprd{b}{v_j}}{2\alpha\lambda_j}v_j\tag{ELC.15} \\ &= \sum_{j=1}^n\frac{\innprd{b}{v_j}}{\lambda_j}v_j }$$

Notice that the center point $m$ is the same regardless of the value for $k$. That is, every contour of $f$ has the same center point $m$. Dividing through by $\phi_k$ in ELC.14, we get the equation for the ellipsoid in $n$-dimensional space:

$$ 1 = \sum_{j=1}^n\frac{\innprd{x-m}{v_j}^2}{\Prn{\sqrt{\frac{\phi_k}{\alpha\lambda_j}}}^2}\tag{ELC.16} $$
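Here is a minimal numerical sketch of ELC.16 (assuming numpy; the dimension and the example $A$, $b$, $c$, and $k$ are arbitrary). Writing $d_j\equiv\sqrt{\phi_k/(\alpha\lambda_j)}$, any unit vector $u$ gives a point $x=m+\sum_{j=1}^nd_ju_jv_j$ satisfying ELC.16, hence $f(x)=k$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # arbitrary symmetric positive definite example
b = rng.standard_normal(n)
c = -2.0                             # arbitrary constant term
alpha, beta = 0.5, -1.0

f = lambda x: alpha * x @ A @ x + beta * b @ x + c

lam, V = np.linalg.eigh(A)
x_star = np.linalg.solve(A, b)
k = f(x_star) + 5.0                  # any level above the minimum

phi_k = k - c + np.sum((beta * (V.T @ b))**2 / (4 * alpha * lam))   # ELC.12b
m = V @ (-beta * (V.T @ b) / (2 * alpha * lam))                     # center, ELC.15
d = np.sqrt(phi_k / (alpha * lam))                                  # semi-axis lengths

# any unit vector u gives a point x = m + sum_j d_j u_j v_j on the level k ellipsoid
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
x = m + V @ (d * u)
assert np.isclose(f(x), k)           # ELC.16 holds exactly when f(x) = k
```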

For a given quadratic form $f$ on a given matrix $A$ and for a given scalar $k\gt f(x_{*})$, we define the level $k$ ellipsoid generated by the quadratic form $f$ on $A$ to be the set of all points $x$ satisfying ELC.13 or equivalently ELC.16. When $f$ and $A$ are clear, we just say level $k$ ellipsoid.

Notice that we derived ELC.13 and ELC.16 in such a way that $x$ is on the level $k$ ellipsoid if and only if $f(x)=k$. Of the three characterizations ($f(x)=k$, ELC.13, and ELC.16), equation ELC.13 is typically the most convenient for checking directly whether a given $x$ is on the level $k$ ellipsoid. Using ELC.13, we will soon prove an even easier test of whether a given $x$ is on the level $k$ ellipsoid. Equation ELC.13 is also very intuitive and serves, along with $f(x)=k$, as a primary definition of the level $k$ ellipsoid.

We generally assume that the eigenvalues are ordered smallest-to-largest so that $\lambda_1\lt\lambda_2\lt\dotsb\lt\lambda_n$. And we define $d_{k,1},d_{k,2},\dots,d_{k,n}$ by

$$\gather{ d_{k,1}\equiv\text{Length of largest semi-axis} \equiv \sqrt{\frac{\phi_k}{\alpha\lambda_1}} \\ d_{k,2}\equiv\text{Length of second largest semi-axis} \equiv \sqrt{\frac{\phi_k}{\alpha\lambda_2}}\tag{ELC.17} \\\\ \vdots \\\\ d_{k,n}\equiv\text{Length of smallest semi-axis} \equiv \sqrt{\frac{\phi_k}{\alpha\lambda_n}} }$$
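Note that, under this ascending ordering of the eigenvalues, the semi-axis lengths come out in descending order, since $\sqrt{\phi_k/(\alpha\lambda)}$ decreases as $\lambda$ grows. A tiny sketch with arbitrary example values:

```python
import numpy as np

lam = np.array([1.0, 3.0, 10.0])     # hypothetical eigenvalues, ascending
alpha, phi_k = 0.5, 4.0              # arbitrary positive example values
d = np.sqrt(phi_k / (alpha * lam))   # d_{k,1}, ..., d_{k,n} from ELC.17
assert np.all(np.diff(d) < 0)        # d_{k,1} > d_{k,2} > ... > d_{k,n}
```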

Since we divide by $\phi_k$ in ELC.16 and we take the square root of $\phi_k$ in ELC.17, we should prove that $\phi_k\gt0$. This fact will also be important later on (in the proofs of the critical propositions PLE.4 and CJGC.4).

Proposition ELC.18 For all $k\gt f(x_{*})$, we have $\phi_k\gt0$.

Proof Let $k\gt f(x_{*})$. Then

$$\align{ \phi_k &= k-c+\sum_{j=1}^n\frac{\prn{\beta\innprd{b}{v_j}}^2}{4\alpha\lambda_j}\tag{by ELC.12b} \\ &\gt f(x_{*})-c+\sum_{j=1}^n\frac{\prn{\beta\innprd{b}{v_j}}^2}{4\alpha\lambda_j}\tag{ELC.18a} \\ &= \alpha\innprd{Ax_{*}}{x_{*}}+\beta\innprd{b}{x_{*}}+c-c+\sum_{j=1}^n\frac{\prn{\beta\innprd{b}{v_j}}^2}{4\alpha\lambda_j} \\ &= \frac12\innprd{Ax_{*}}{x_{*}}-\innprd{b}{x_{*}}+\sum_{j=1}^n\frac{\prn{-1\cdot\innprd{b}{v_j}}^2}{4\cdot\frac12\cdot\lambda_j}\tag{since $\alpha=\frac12$ and $\beta=-1$} \\ &= \frac12\innprd{Ax_{*}}{x_{*}}-\innprd{Ax_{*}}{x_{*}}+\sum_{j=1}^n\frac{\innprd{Ax_{*}}{v_j}^2}{2\lambda_j}\tag{since $Ax_{*}=b$} \\ &= -\frac12\innprd{Ax_{*}}{x_{*}}+\frac12\sum_{j=1}^n\frac{\innprd{Ax_{*}}{v_j}^2}{\lambda_j} \\ &= \frac12\sum_{j=1}^n\frac{\innprd{Ax_{*}}{v_j}^2}{\lambda_j}-\frac12\innprd{Ax_{*}}{x_{*}} \\ &= \frac12\Prngg{\sum_{j=1}^n\frac{\innprd{Ax_{*}}{v_j}^2}{\lambda_j}-\innprd{Ax_{*}}{x_{*}}} \\ &= \frac12\Prngg{\sum_{j=1}^n\frac{\innprd{Ax_{*}}{v_j}^2}{\lambda_j}-\innprdBg{\sum_{j=1}^n\innprd{Ax_{*}}{v_j}v_j}{x_{*}}} \\ &= \frac12\Prngg{\sum_{j=1}^n\frac{\innprd{Ax_{*}}{v_j}^2}{\lambda_j}-\sum_{j=1}^n\innprd{Ax_{*}}{v_j}\innprd{v_j}{x_{*}}} \\ &= \frac12\Prngg{\sum_{j=1}^n\frac1{\lambda_j}\innprd{Ax_{*}}{v_j}^2-\sum_{j=1}^n\innprd{Ax_{*}}{v_j}\innprd{x_{*}}{v_j}} \\ &= \frac12\sum_{j=1}^n\Prn{\frac1{\lambda_j}\innprd{Ax_{*}}{v_j}^2-\innprd{Ax_{*}}{v_j}\innprd{x_{*}}{v_j}} \\ &= \frac12\sum_{j=1}^n\Prn{\frac1{\lambda_j}\innprd{x_{*}}{A^Tv_j}^2-\innprd{x_{*}}{A^Tv_j}\innprd{x_{*}}{v_j}} \\ &= \frac12\sum_{j=1}^n\Prn{\frac1{\lambda_j}\innprd{x_{*}}{Av_j}^2-\innprd{x_{*}}{Av_j}\innprd{x_{*}}{v_j}}\tag{since $A$ is symmetric} \\ &= \frac12\sum_{j=1}^n\Prn{\frac1{\lambda_j}\innprd{x_{*}}{\lambda_jv_j}^2-\innprd{x_{*}}{\lambda_jv_j}\innprd{x_{*}}{v_j}} \\ &= \frac12\sum_{j=1}^n\Prn{\frac{\lambda_j^2}{\lambda_j}\innprd{x_{*}}{v_j}^2-\lambda_j\innprd{x_{*}}{v_j}\innprd{x_{*}}{v_j}} \\ &= 0\tag{ELC.18b} }$$

$\wes$
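A numerical spot check of ELC.18 (a sketch assuming numpy, with an arbitrary example problem). Note from ELC.12b that $\phi_k$ is affine in $k$ with slope $1$; the proof above effectively shows that it vanishes at $k=f(x_{*})$, and hence is positive for every $k\gt f(x_{*})$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # arbitrary symmetric positive definite example
b = rng.standard_normal(n)
c = 4.0                              # arbitrary constant term
alpha, beta = 0.5, -1.0

f = lambda x: alpha * x @ A @ x + beta * b @ x + c
lam, V = np.linalg.eigh(A)
x_star = np.linalg.solve(A, b)       # minimum point: A x_star = b

phi = lambda k: k - c + np.sum((beta * (V.T @ b))**2 / (4 * alpha * lam))   # ELC.12b

assert np.isclose(phi(f(x_star)), 0.0)    # phi vanishes at the minimum level
for dk in (1e-6, 0.1, 10.0):
    assert phi(f(x_star) + dk) > 0.0      # and is positive for every k > f(x_star)
```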

If we slightly alter our restriction on $k$, then we can carefully use $\phi_k$ to prove this helpful result:

Proposition ELC.19 The center point of any contour of $f$ is the minimum point of $f$. That is

$$ m=x_{*} $$

Proof Let $k=f(x_{*})$. Then

$$\align{ \phi_k &= k-c+\sum_{j=1}^n\frac{\prn{\beta\innprd{b}{v_j}}^2}{4\alpha\lambda_j}\tag{by ELC.12b} \\ &= f(x_{*})-c+\sum_{j=1}^n\frac{\prn{\beta\innprd{b}{v_j}}^2}{4\alpha\lambda_j} \\ &= 0 }$$

We arrive at zero in the last equality by repeating the computations from ELC.18a to ELC.18b. Notice that with $\phi_k=0$, we can still perform the computations from ELC.12a to ELC.13 to get

$$ \phi_k = \alpha\sum_{j=1}^n\innprd{x_{*}-m}{v_j}^2\lambda_j $$

Hence

$$ \sum_{j=1}^n\innprd{x_{*}-m}{v_j}^2\lambda_j = \frac{\phi_k}{\alpha} = 0 $$

Recall that the eigenvalues $\lambda_1,\dots,\lambda_n$ of the positive definite $A$ are positive. Hence the left-hand side of this equation is a sum of nonnegative numbers. Since a sum of nonnegative numbers is zero if and only if every term in the sum is zero, it must be that $\innprd{x_{*}-m}{v_j}^2=0$ for $j=1,\dots,n$. Hence

$$ 0 = \sum_{j=1}^n\innprd{x_{*}-m}{v_j}^2 = \dnorm{\sum_{j=1}^n\innprd{x_{*}-m}{v_j}v_j}^2 = \dnorm{x_{*} - m}^2 $$

Hence $x_{*}-m=0$, that is, $m=x_{*}$.

Alternatively:

$$ Am = A\sum_{j=1}^n\frac{\innprd{b}{v_j}}{\lambda_j}v_j = \sum_{j=1}^n\frac{\innprd{b}{v_j}}{\lambda_j}Av_j = \sum_{j=1}^n\frac{\innprd{b}{v_j}}{\lambda_j}\lambda_jv_j = \sum_{j=1}^n\innprd{b}{v_j}v_j = b $$

In proposition TP.6, we showed that the solution $x_{*}$ to TP.1 is unique since $f$ has exactly one minimum point. Hence $x_{*}=m$.

$\wes$
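As a final numerical check (assuming numpy; the example $A$ and $b$ are arbitrary), the eigenvector formula ELC.15 for $m$ indeed solves $Am=b$, so it agrees with the directly computed $x_{*}$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # arbitrary symmetric positive definite example
b = rng.standard_normal(n)

lam, V = np.linalg.eigh(A)
m = V @ ((V.T @ b) / lam)            # ELC.15 with alpha = 1/2, beta = -1
assert np.allclose(A @ m, b)         # Am = b, so m is the minimum point x_star
assert np.allclose(m, np.linalg.solve(A, b))
```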