Conjugate Gradient Part 6

19 May 2019

The source code for all algorithms and plots can be found here.

Conjugate Directions

In the previous section, we described a method to compute $n$ steps from the starting point $x_{(0)}$ to the minimum point $x_{*}$. But that method requires the eigendecomposition of $A$, which is a very expensive computation. Fortunately, there is a way to circumvent this requirement and still achieve the same result.

We will work in an arbitrary real inner product space $V$ of dimension $n$. Our goal is to find search directions $d_{(0)}$, $d_{(1)}$,…, $d_{(n-1)}$ and step sizes $\alpha_{(0)}$, $\alpha_{(1)}$, …, $\alpha_{(n-1)}$ that get us to $x_{*}$ in $n$ iterations. That is, we want to iterate $n$ times through the computations

$$ x_{(i+1)} = x_{(i)}+\alpha_{(i)}d_{(i)} \tag{CJDR.2} $$

in such a way that we get

$$ x_{(n)} = x_{*} \dq\iff\dq e_{(n)} = 0 \tag{CJDR.3} $$

Note that when we subtract $x_{*}$ from each side of CJDR.2, we get

$$\align{ e_{(i+1)} &= e_{(i)}+\alpha_{(i)}d_{(i)} \tag{CJDR.4} \\ &= e_{(i-1)}+\alpha_{(i-1)}d_{(i-1)}+\alpha_{(i)}d_{(i)} \\ &= e_{(i-1)}+\sum_{k=i-1}^{i}\alpha_{(k)}d_{(k)} \\ &= e_{(i-2)}+\alpha_{(i-2)}d_{(i-2)}+\sum_{k=i-1}^{i}\alpha_{(k)}d_{(k)} \\ &= e_{(i-2)}+\sum_{k=i-2}^{i}\alpha_{(k)}d_{(k)} \\ &= \dq\vdots \\ &= e_{(0)}+\sum_{k=0}^{i}\alpha_{(k)}d_{(k)} \tag{CJDR.5} }$$

Motivated by the discussion and results from the previous section, we require that the desired search directions $d_{(0)}$, $d_{(1)}$,…, $d_{(n-1)}$ be mutually conjugate. In a moment, we will show how to derive such directions.

In the section on Steepest Descent, we performed a line search for the value $\alpha_{(i)}$ that minimizes $f(x_{(i+1)})=f(x_{(i)}+\alpha_{(i)}r_{(i)})$. We found that $\alpha_{(i)}$ minimizes $f(x_{(i+1)})=f(x_{(i)}+\alpha_{(i)}r_{(i)})$ if and only if $r_{(i+1)}$ and $r_{(i)}$ are orthogonal. We find an analogous equivalence here:

$$\align{ 0 &= \wderiv{}{\alpha_{(i)}}f(x_{(i+1)}) \\ &= \innprdBg{f'(x_{(i+1)})}{\wderiv{x_{(i+1)}}{\alpha_{(i)}}}\tag{by the Multivariable Chain Rule} \\ &= \innprdBg{-r_{(i+1)}}{\wderiv{(x_{(i)}+\alpha_{(i)}d_{(i)})}{\alpha_{(i)}}} \\ &= \innprd{Ae_{(i+1)}}{d_{(i)}}\tag{CJDR.6} }$$

Hence $\alpha_{(i)}$ minimizes $f(x_{(i+1)})=f(x_{(i)}+\alpha_{(i)}d_{(i)})$ if and only if $e_{(i+1)}$ and $d_{(i)}$ are $A$-orthogonal. Hence

$$\align{ 0 &= \innprd{Ae_{(i+1)}}{d_{(i)}} \\ &= \innprd{d_{(i)}}{Ae_{(i+1)}} \\ &= \innprd{d_{(i)}}{A(e_{(i)}+\alpha_{(i)}d_{(i)})}\tag{by CJDR.4} \\ &= \innprd{d_{(i)}}{Ae_{(i)}+\alpha_{(i)}Ad_{(i)}} \\ &= \innprd{d_{(i)}}{Ae_{(i)}}+\innprd{d_{(i)}}{\alpha_{(i)}Ad_{(i)}} \\ &= \innprd{d_{(i)}}{Ae_{(i)}}+\alpha_{(i)}\innprd{d_{(i)}}{Ad_{(i)}} }$$

or

$$ \alpha_{(i)} = -\frac{\innprd{d_{(i)}}{Ae_{(i)}}}{\innprd{d_{(i)}}{Ad_{(i)}}} = \frac{\innprd{d_{(i)}}{r_{(i)}}}{\innprd{d_{(i)}}{Ad_{(i)}}} \tag{CJDR.7} $$

Note that if the search vector $d_{(i)}$ were the residual, this formula would be identical to SD.2 from Steepest Descent.

To verify that the Conjugate Directions algorithm takes $n$ steps, note that the conjugacy of $d_{(0)}$, $d_{(1)}$,…, $d_{(n-1)}$ implies that it is a linearly independent list of $n$ vectors in the $n$-dimensional vector space $V$: if $0=\sum_{k=0}^{n-1}c_kd_{(k)}$, then taking the inner product of both sides with $Ad_{(i)}$ gives $0=c_i\innprd{d_{(i)}}{Ad_{(i)}}$, and since $d_{(i)}\neq0$ and $A$ is positive definite, we get $c_i=0$. Hence, by Axler 2.39, $d_{(0)}$, $d_{(1)}$,…, $d_{(n-1)}$ is a basis of $V$. Hence, by Axler 2.29, we can write any vector in $V$ as a linear combination of this basis. In particular, for some scalars $\delta_{(0)}$, $\delta_{(1)}$,…, $\delta_{(n-1)}$, we can write the initial error term $e_{(0)}$ as

$$ e_{(0)} = \sum_{j=0}^{n-1}\delta_{(j)}d_{(j)} \tag{CJDR.8} $$

Hence

$$\align{ \innprd{d_{(i)}}{Ae_{(0)}} &= \innprdBgg{d_{(i)}}{A\sum_{j=0}^{n-1}\delta_{(j)}d_{(j)}} \\ &= \innprdBgg{d_{(i)}}{\sum_{j=0}^{n-1}\delta_{(j)}Ad_{(j)}} \\ &= \sum_{j=0}^{n-1}\innprd{d_{(i)}}{\delta_{(j)}Ad_{(j)}} \\ &= \sum_{j=0}^{n-1}\delta_{(j)}\innprd{d_{(i)}}{Ad_{(j)}} \\ &= \delta_{(i)}\innprd{d_{(i)}}{Ad_{(i)}}\tag{by the $A$-orthogonality of $d_{(0)}$, $d_{(1)}$,..., $d_{(n-1)}$} }$$

Also note that

$$\align{ \innprd{d_{(i)}}{Ae_{(0)}} &= \innprd{d_{(i)}}{Ae_{(0)}}+\sum_{k=0}^{i-1}\alpha_{(k)}\innprd{d_{(i)}}{Ad_{(k)}} \\ &= \innprd{d_{(i)}}{Ae_{(0)}}+\innprdBgg{d_{(i)}}{\sum_{k=0}^{i-1}\alpha_{(k)}Ad_{(k)}} \\ &= \innprdBgg{d_{(i)}}{Ae_{(0)}+\sum_{k=0}^{i-1}\alpha_{(k)}Ad_{(k)}} \\ &= \innprdBgg{d_{(i)}}{Ae_{(0)}+A\sum_{k=0}^{i-1}\alpha_{(k)}d_{(k)}} \\ &= \innprdBgg{d_{(i)}}{A\Prngg{e_{(0)}+\sum_{k=0}^{i-1}\alpha_{(k)}d_{(k)}}} \\ &= \innprd{d_{(i)}}{Ae_{(i)}}\tag{CJDR.9} }$$

The first equality holds because each term in the sum is zero, by the $A$-orthogonality of the search directions; the last equality follows from CJDR.5. Hence

$$ \delta_{(i)} = \frac{\innprd{d_{(i)}}{Ae_{(0)}}}{\innprd{d_{(i)}}{Ad_{(i)}}} = \frac{\innprd{d_{(i)}}{Ae_{(i)}}}{\innprd{d_{(i)}}{Ad_{(i)}}} = -\alpha_{(i)} \tag{CJDR.10} $$

Hence

$$ e_{(i)} = e_{(0)}+\sum_{j=0}^{i-1}\alpha_{(j)}d_{(j)} = \sum_{j=0}^{n-1}\delta_{(j)}d_{(j)}-\sum_{j=0}^{i-1}\delta_{(j)}d_{(j)} = \sum_{j=i}^{n-1}\delta_{(j)}d_{(j)} \tag{CJDR.11} $$

And we’re done after $n$ iterations since CJDR.11 gives $e_{(n)}=0$. This equation is helpful because it shows that we have fulfilled our stated intention to go in any given direction at most once. That is, after $i$ iterations, we need only take $n-i$ more steps in exactly $n-i$ distinct directions.
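Before moving on, here is a small numerical sanity check of CJDR.7, CJDR.10, and CJDR.11. This is a minimal NumPy sketch; the matrix $A$, the vector $b$, and all variable names are illustrative choices of mine, not taken from the linked source code. Purely for illustration, it uses the eigenvectors of $A$ as the mutually conjugate directions, since we have not yet shown how to construct conjugate directions any other way.

```python
import numpy as np

# Illustrative values only; the matrix, vector, and names are my own choices.
A = np.array([[3.0, 2.0], [2.0, 6.0]])   # symmetric positive definite
b = np.array([2.0, -8.0])
x_star = np.linalg.solve(A, b)           # the minimum point x_*
x = np.zeros(2)                          # starting point x_(0)
n = len(b)

# Eigenvectors of A are mutually A-orthogonal, so they serve as
# conjugate directions here (expensive in general, but fine for a check).
_, Q = np.linalg.eigh(A)
D = [Q[:, j] for j in range(n)]

# delta_(j) from CJDR.10, computed from the initial error e_(0) = x_(0) - x_*
e0 = x - x_star
delta = [(d @ (A @ e0)) / (d @ (A @ d)) for d in D]

for i, d in enumerate(D):
    e = x - x_star
    # CJDR.11: e_(i) is a combination of the remaining directions only
    assert np.allclose(e, sum(delta[j] * D[j] for j in range(i, n)))
    alpha = (d @ (b - A @ x)) / (d @ (A @ d))   # step size, CJDR.7
    x = x + alpha * d                            # CJDR.2
print(np.allclose(x, x_star))                    # e_(n) = 0, i.e. x_(n) = x_*
```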

In the next section, we derive a formula for computing the search directions. To summarize, given some linearly independent list $u_0,\dots,u_{n-1}$, our $n$-step Conjugate Directions algorithm becomes

$$\gather{ d_{(0)} = u_0 \\\\ r_{(i)}=b-Ax_{(i)} \\\\ \alpha_{(i)} = \frac{\innprd{d_{(i)}}{r_{(i)}}}{\innprd{d_{(i)}}{Ad_{(i)}}} \\\\ x_{(i+1)}=x_{(i)}+\alpha_{(i)}d_{(i)} \\\\ d_{(i+1)} = u_{i+1} - \sum_{j=0}^{i}\frac{\innprd{u_{i+1}}{Ad_{(j)}}}{\innprd{d_{(j)}}{Ad_{(j)}}}d_{(j)} \\\\ }$$
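The summary above translates almost line by line into code. Here is a minimal NumPy sketch (the function name `conjugate_directions` and the small test system are my own illustrative choices, not the linked source code); it conjugates the $u_j$'s on the fly using the last formula above and reaches the minimum point in exactly $n$ steps.

```python
import numpy as np

def conjugate_directions(A, b, x0, U):
    """n-step Conjugate Directions for f(x) = (1/2)<x, Ax> - <b, x>,
    conjugating the columns of U on the fly (a sketch of the summary above)."""
    n = len(b)
    x = x0.astype(float)
    D, AD = [], []                           # d_(j) and the cached products A d_(j)
    for i in range(n):
        # d_(i): remove from u_i its components along the earlier directions
        d = U[:, i].astype(float)
        for dj, Adj in zip(D, AD):
            d -= (U[:, i] @ Adj) / (dj @ Adj) * dj
        Ad = A @ d
        r = b - A @ x                        # r_(i) = b - A x_(i)
        alpha = (d @ r) / (d @ Ad)           # CJDR.7
        x = x + alpha * d                    # CJDR.2
        D.append(d)
        AD.append(Ad)
    return x

# Usage with an illustrative SPD system; the standard basis vectors
# (columns of the identity) form a linearly independent list u_0, ..., u_{n-1}.
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
x = conjugate_directions(A, b, np.zeros(2), np.eye(2))
print(x, np.linalg.solve(A, b))              # both print [ 2. -2.]
```

Caching the products $Ad_{(j)}$ avoids recomputing them inside the conjugation loop; that is the only deviation from a literal transcription of the summary.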

Gram-Schmidt Conjugation

We want to derive a list $d_{(0)},\dots,d_{(n-1)}$ of $A$-orthogonal vectors from a linearly independent list $u_{0},\dots,u_{n-1}$. First we will prove and plot a simple result that gives us some intuition into this procedure:

Proposition GSC.1 Let $V$ be an inner product space, let $A$ be positive definite on $V$, and let $u,v\in V$ with $v\neq0$. Then there exists a vector $w\in V$ and a scalar $c$ such that

$$ \innprd{Aw}{v} = 0 \dq\text{and}\dq u = cv + w $$

Proof Note that

$$ u = cv + (u - cv) $$

Put $w\equiv u-cv$. We want to choose $c$ so that $v$ is orthogonal to $Aw=Au-cAv$. That is, we want

$$ 0 = \innprd{v}{Aw} = \innprd{v}{Au-cAv} = \innprd{v}{Au} - c\innprd{v}{Av} $$

Since $A$ is positive definite and $v\neq0$, then $\innprd{v}{Av}\gt0$ and we can define $c$ by

$$ c \equiv \frac{\innprd{v}{Au}}{\innprd{v}{Av}} $$

Then

$$ cv + w = cv + u-cv = u $$

and

$$ \innprd{Aw}{v} = \innprd{Au-cAv}{v} = \innprd{Au}{v}-c\innprd{Av}{v} = \innprd{Au}{v}-\frac{\innprd{v}{Au}}{\innprd{v}{Av}}\innprd{Av}{v} = 0 $$

$\wes$
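For concreteness, here is a tiny numerical check of Proposition GSC.1 (the matrix $A$ and the vectors $u$, $v$ below are arbitrary illustrative choices).

```python
import numpy as np

# A small numerical check of Proposition GSC.1; A, u, v are arbitrary choices.
A = np.array([[3.0, 2.0], [2.0, 6.0]])       # positive definite
u = np.array([1.0, 2.0])
v = np.array([2.0, 1.0])

c = (v @ (A @ u)) / (v @ (A @ v))            # c = <v, Au> / <v, Av>
w = u - c * v                                # so that u = c v + w
print(np.isclose((A @ w) @ v, 0.0))          # <Aw, v> = 0, as the proposition claims
print(np.allclose(c * v + w, u))             # and u = c v + w
```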

The plot below illustrates the above proposition and connects it to Gram-Schmidt Conjugation. Note that $u_1$ consists of two components: the green component is orthogonal to $Au_0$, and the red component is a scalar multiple of $u_0$. So to conjugate $u_0$ and $u_1$, we first put $d_{(0)}\equiv u_0$. Then we subtract the red component from $u_1$ to get $d_{(1)}$.

More formally, for positive definite $A$, we define

$$ d_{(j)} \equiv \cases{ u_j & j=0 \\u_j + \sum_{k=0}^{j-1}\beta_{j,k}d_{(k)} & j=1,\dots,n-1} \tag{GSC.2} $$

for some scalars $\beta_{j,0},\dots,\beta_{j,j-1}$ such that $0=\innprd{d_{(j)}}{Ad_{(i)}}$ for $i=0,\dots,j-1$. If we can find such scalars, then the symmetry of the inner product and the self-adjointness of $A$ give $0=\innprd{Ad_{(j)}}{d_{(i)}}=\innprd{d_{(i)}}{Ad_{(j)}}$ as well. Hence we get $0=\innprd{d_{(j)}}{Ad_{(i)}}$ for $i\neq j$. To find such scalars, assume that we have already found them, let $i$ and $j$ be integers such that $0\leq i\lt j\leq n-1$, and compute

$$\align{ 0 &= \innprd{d_{(j)}}{Ad_{(i)}} \\ &= \innprdBgg{u_j + \sum_{k=0}^{j-1}\beta_{j,k}d_{(k)}}{Ad_{(i)}} \\ &= \innprd{u_j}{Ad_{(i)}}+\innprdBgg{\sum_{k=0}^{j-1}\beta_{j,k}d_{(k)}}{Ad_{(i)}} \\ &= \innprd{u_j}{Ad_{(i)}}+\sum_{k=0}^{j-1}\innprd{\beta_{j,k}d_{(k)}}{Ad_{(i)}} \\ &= \innprd{u_j}{Ad_{(i)}}+\sum_{k=0}^{j-1}\beta_{j,k}\innprd{d_{(k)}}{Ad_{(i)}} \\ &= \innprd{u_j}{Ad_{(i)}}+\beta_{j,i}\innprd{d_{(i)}}{Ad_{(i)}}\tag{by assumption} }$$

So we define

$$ \beta_{j,i} \equiv -\frac{\innprd{u_j}{Ad_{(i)}}}{\innprd{d_{(i)}}{Ad_{(i)}}} \tag{GSC.3} $$

Suppose $u_{0},\dots,u_{n-1}$ is not linearly independent. Then it’s possible that $0=d_{(i)}$, in which case $0=\innprd{d_{(i)}}{Ad_{(i)}}$ and $\beta_{j,i}$ is undefined. The linear independence of $u_{0},\dots,u_{n-1}$ guarantees that all of the $\beta_{j,i}$’s are well defined.
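The breakdown is easy to see numerically. In the sketch below (arbitrary illustrative values of my own), $u_1$ is deliberately a multiple of $u_0$, so the conjugation step produces $d_{(1)}=0$ and the denominator $\innprd{d_{(1)}}{Ad_{(1)}}$ that any later $\beta_{j,1}$ would need is zero.

```python
import numpy as np

# Illustrative values: u_1 is deliberately a multiple of u_0.
A = np.array([[3.0, 2.0], [2.0, 6.0]])
u0 = np.array([1.0, 1.0])
u1 = 2.0 * u0

d0 = u0                                          # GSC.2 with j = 0
beta_10 = -(u1 @ (A @ d0)) / (d0 @ (A @ d0))     # GSC.3
d1 = u1 + beta_10 * d0                           # GSC.2 with j = 1
print(d1, d1 @ (A @ d1))                         # [0. 0.] 0.0 -- any beta_{j,1} is undefined
```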

Proposition GSC.4 Let $A$ be positive definite and let $u_{0},\dots,u_{n-1}$ be linearly independent. Define $d_{(0)},\dots,d_{(n-1)}$ as in GSC.2. Let $i$ and $j$ be integers between $0$ and $n-1$, inclusive, such that $i\neq j$. Then we have $0=\innprd{d_{(i)}}{Ad_{(j)}}$.

Proof It suffices to show that for integers $i$ and $j$ with $0\leq i\lt j\leq n-1$, we have $0=\innprd{d_{(i)}}{Ad_{(j)}}$. We will prove this by induction. For $i=0$ and $j=1$, we have

$$\align{ \innprd{d_{(0)}}{Ad_{(1)}} &= \innprd{u_0}{A(u_1+\beta_{1,0}d_{(0)})} \\ &= \innprd{u_0}{Au_1+\beta_{1,0}Ad_{(0)}} \\ &= \innprd{u_0}{Au_1+\beta_{1,0}Au_0} \\ &= \innprd{u_0}{Au_1}+\innprd{u_0}{\beta_{1,0}Au_0} \\ &= \innprd{u_0}{Au_1}+\beta_{1,0}\innprd{u_0}{Au_0} \\ &= \innprd{u_0}{Au_1}-\frac{\innprd{u_1}{Au_0}}{\innprd{u_0}{Au_0}}\innprd{u_0}{Au_0} \\ &= \innprd{u_0}{Au_1}-\innprd{u_1}{Au_0} \\ &= \innprd{u_0}{Au_1}-\innprd{Au_1}{u_0} \\ &= 0 }$$

Our induction assumption is that, for some integer $h$ with $2\leq h\leq n-1$, we have $0=\innprd{d_{(i)}}{Ad_{(j)}}$ for all $0\leq i\lt j\leq h-1$. Then for any $0\leq i\lt h$, we have

$$\align{ \innprd{d_{(i)}}{Ad_{(h)}} &= \innprdBgg{d_{(i)}}{A\Prngg{u_h + \sum_{k=0}^{h-1}\beta_{h,k}d_{(k)}}} \\ &= \innprdBgg{d_{(i)}}{Au_h + A\sum_{k=0}^{h-1}\beta_{h,k}d_{(k)}} \\ &= \innprdBgg{d_{(i)}}{Au_h + \sum_{k=0}^{h-1}\beta_{h,k}Ad_{(k)}} \\ &= \innprd{d_{(i)}}{Au_h} + \innprdBgg{d_{(i)}}{\sum_{k=0}^{h-1}\beta_{h,k}Ad_{(k)}} \\ &= \innprd{d_{(i)}}{Au_h} + \sum_{k=0}^{h-1}\innprd{d_{(i)}}{\beta_{h,k}Ad_{(k)}} \\ &= \innprd{d_{(i)}}{Au_h} + \sum_{k=0}^{h-1}\beta_{h,k}\innprd{d_{(i)}}{Ad_{(k)}} \\ &= \innprd{d_{(i)}}{Au_h} + \beta_{h,i}\innprd{d_{(i)}}{Ad_{(i)}}\tag{by the induction assumption} \\ &= \innprd{d_{(i)}}{Au_h} - \frac{\innprd{u_h}{Ad_{(i)}}}{\innprd{d_{(i)}}{Ad_{(i)}}}\innprd{d_{(i)}}{Ad_{(i)}} \\ &= \innprd{d_{(i)}}{Au_h} - \innprd{u_h}{Ad_{(i)}} \\ &= \innprd{d_{(i)}}{Au_h} - \innprd{Au_h}{d_{(i)}} \\ &= 0 }$$

$\wes$
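Proposition GSC.4 is also easy to verify numerically. The sketch below (with a random positive definite $A$ and random $u_j$'s of my own choosing) builds the $d_{(j)}$'s via GSC.2 and GSC.3 and checks that the off-diagonal entries of the matrix with entries $\innprd{d_{(i)}}{Ad_{(j)}}$ vanish up to roundoff.

```python
import numpy as np

# Numerical check of Proposition GSC.4 with arbitrary illustrative inputs.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                  # positive definite
U = rng.standard_normal((n, n))              # columns u_0, ..., u_{n-1} (independent a.s.)

D = np.zeros((n, n))
for j in range(n):
    d = U[:, j].copy()
    for i in range(j):
        beta = -(U[:, j] @ (A @ D[:, i])) / (D[:, i] @ (A @ D[:, i]))  # GSC.3
        d += beta * D[:, i]                  # GSC.2
    D[:, j] = d

G = D.T @ A @ D                              # G[i, j] = <d_(i), A d_(j)>
print(np.allclose(G - np.diag(np.diag(G)), 0.0))   # off-diagonal entries ~ 0
```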

Equivalence to Orthogonal Directions in Unskewed Space

In progress
