The source code for all algorithms and plots can be found here.
Conjugate Directions
In the previous section, we described a method that takes $n$ steps from the starting point $x_{(0)}$ to the minimum point $x_{*}$. But that method requires that we compute the eigendecomposition of $A$, which is a very expensive computation. Fortunately, there is a way to circumvent this requirement and still achieve the same result.
We will work in an arbitrary real inner product space $V$ of dimension $n$. Our goal is to find search directions $d_{(0)}$, $d_{(1)}$,…, $d_{(n-1)}$ and step sizes $\alpha_{(0)}$, $\alpha_{(1)}$, …, $\alpha_{(n-1)}$ that get us to $x_{*}$ in $n$ iterations. That is, we want to iterate $n$ times through the computations
$$ x_{(i+1)} = x_{(i)}+\alpha_{(i)}d_{(i)} \tag{CJDR.2} $$
in such a way that we get
$$ x_{(n)} = x_{*} \dq\iff\dq e_{(n)} = 0 \tag{CJDR.3} $$
Note that when we subtract $x_{*}$ from each side of CJDR.2, we get
$$\align{ e_{(i+1)} &= e_{(i)}+\alpha_{(i)}d_{(i)} \tag{CJDR.4} \\ &= e_{(i-1)}+\alpha_{(i-1)}d_{(i-1)}+\alpha_{(i)}d_{(i)} \\ &= e_{(i-1)}+\sum_{k=i-1}^{i}\alpha_{(k)}d_{(k)} \\ &= e_{(i-2)}+\alpha_{(i-2)}d_{(i-2)}+\sum_{k=i-1}^{i}\alpha_{(k)}d_{(k)} \\ &= e_{(i-2)}+\sum_{k=i-2}^{i}\alpha_{(k)}d_{(k)} \\ &= \dq\vdots \\ &= e_{(0)}+\sum_{k=0}^{i}\alpha_{(k)}d_{(k)} \tag{CJDR.5} }$$
Motivated by the discussion and results from the previous section, we require that the desired search directions $d_{(0)}$, $d_{(1)}$,…, $d_{(n-1)}$ be mutually conjugate. In a moment, we will show how to derive such directions.
In the section on Steepest Descent, we performed a line search for the value $\alpha_{(i)}$ that minimizes $f(x_{(i+1)})=f(x_{(i)}+\alpha_{(i)}r_{(i)})$, and we found that $\alpha_{(i)}$ minimizes this expression if and only if $r_{(i+1)}$ and $r_{(i)}$ are orthogonal. We find an analogous equivalence here:
$$\align{ 0 &= \wderiv{}{\alpha_{(i)}}f(x_{(i+1)}) \\ &= \innprdBg{f'(x_{(i+1)})}{\wderiv{x_{(i+1)}}{\alpha_{(i)}}}\tag{by the Multivariable Chain Rule} \\ &= \innprdBg{-r_{(i+1)}}{\wderiv{(x_{(i)}+\alpha_{(i)}d_{(i)})}{\alpha_{(i)}}} \\ &= \innprd{-r_{(i+1)}}{d_{(i)}} \\ &= \innprd{Ae_{(i+1)}}{d_{(i)}}\tag{CJDR.6} }$$
Hence $\alpha_{(i)}$ minimizes $f(x_{(i+1)})=f(x_{(i)}+\alpha_{(i)}d_{(i)})$ if and only if $e_{(i+1)}$ and $d_{(i)}$ are $A$-orthogonal. In that case
$$\align{ 0 &= \innprd{Ae_{(i+1)}}{d_{(i)}} \\ &= \innprd{d_{(i)}}{Ae_{(i+1)}} \\ &= \innprd{d_{(i)}}{A(e_{(i)}+\alpha_{(i)}d_{(i)})}\tag{by CJDR.4} \\ &= \innprd{d_{(i)}}{Ae_{(i)}+\alpha_{(i)}Ad_{(i)}} \\ &= \innprd{d_{(i)}}{Ae_{(i)}}+\innprd{d_{(i)}}{\alpha_{(i)}Ad_{(i)}} \\ &= \innprd{d_{(i)}}{Ae_{(i)}}+\alpha_{(i)}\innprd{d_{(i)}}{Ad_{(i)}} }$$
or
$$ \alpha_{(i)} = -\frac{\innprd{d_{(i)}}{Ae_{(i)}}}{\innprd{d_{(i)}}{Ad_{(i)}}} = \frac{\innprd{d_{(i)}}{r_{(i)}}}{\innprd{d_{(i)}}{Ad_{(i)}}} \tag{CJDR.7} $$
Note that if the search vector $d_{(i)}$ were the residual, this formula would be identical to SD.2 from Steepest Descent.
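As a quick sanity check, here is a small numeric verification of CJDR.6 and CJDR.7 (this snippet is not from the linked source code; the $2\times2$ example is made up). It confirms that the step size in CJDR.7 makes $e_{(i+1)}$ $A$-orthogonal to $d_{(i)}$:

```python
# A small numeric check of CJDR.6/CJDR.7 (the 2x2 example is made up).
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])   # symmetric positive definite
b = np.array([2.0, -8.0])
x_star = np.linalg.solve(A, b)           # the minimizer x_*

x = np.array([-2.0, -2.0])               # an arbitrary iterate x_(i)
d = np.array([1.0, 0.0])                 # an arbitrary nonzero direction d_(i)

r = b - A @ x                            # residual r_(i)
alpha = (d @ r) / (d @ (A @ d))          # CJDR.7

x_next = x + alpha * d                   # CJDR.2
e_next = x_next - x_star                 # error e_(i+1)

print(d @ (A @ e_next))                  # ~0: e_(i+1) is A-orthogonal to d_(i)
```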
To verify that the Conjugate Directions algorithm takes $n$ steps, note that the conjugacy of $d_{(0)}$, $d_{(1)}$,…, $d_{(n-1)}$ implies that they form a linearly independent list of $n$ vectors in the $n$-dimensional vector space $V$. Hence, by Axler 2.39, $d_{(0)}$, $d_{(1)}$,…, $d_{(n-1)}$ is a basis of $V$. Hence, by Axler 2.29, we can write any vector in $V$ as a linear combination of this basis. In particular, for some scalars $\delta_{(0)}$, $\delta_{(1)}$,…, $\delta_{(n-1)}$, we can write the initial error term $e_{(0)}$ as
$$ e_{(0)} = \sum_{j=0}^{n-1}\delta_{(j)}d_{(j)} \tag{CJDR.8} $$
Hence
$$\align{ \innprd{d_{(i)}}{Ae_{(0)}} &= \innprdBgg{d_{(i)}}{A\sum_{j=0}^{n-1}\delta_{(j)}d_{(j)}} \\ &= \innprdBgg{d_{(i)}}{\sum_{j=0}^{n-1}\delta_{(j)}Ad_{(j)}} \\ &= \sum_{j=0}^{n-1}\innprd{d_{(i)}}{\delta_{(j)}Ad_{(j)}} \\ &= \sum_{j=0}^{n-1}\delta_{(j)}\innprd{d_{(i)}}{Ad_{(j)}} \\ &= \delta_{(i)}\innprd{d_{(i)}}{Ad_{(i)}}\tag{by the $A$-orthogonality of $d_{(0)}$, $d_{(1)}$,..., $d_{(n-1)}$} }$$
Also note that
$$\align{ \innprd{d_{(i)}}{Ae_{(0)}} &= \innprd{d_{(i)}}{Ae_{(0)}}+\sum_{k=0}^{i-1}\alpha_{(k)}\innprd{d_{(i)}}{Ad_{(k)}} \\ &= \innprd{d_{(i)}}{Ae_{(0)}}+\innprdBgg{d_{(i)}}{\sum_{k=0}^{i-1}\alpha_{(k)}Ad_{(k)}} \\ &= \innprdBgg{d_{(i)}}{Ae_{(0)}+\sum_{k=0}^{i-1}\alpha_{(k)}Ad_{(k)}} \\ &= \innprdBgg{d_{(i)}}{Ae_{(0)}+A\sum_{k=0}^{i-1}\alpha_{(k)}d_{(k)}} \\ &= \innprdBgg{d_{(i)}}{A\Prngg{e_{(0)}+\sum_{k=0}^{i-1}\alpha_{(k)}d_{(k)}}} \\ &= \innprd{d_{(i)}}{Ae_{(i)}}\tag{CJDR.9} }$$
The first equality holds because $\innprd{d_{(i)}}{Ad_{(k)}}=0$ for $k\lt i$ by the $A$-orthogonality of the search directions, and the last equality follows from CJDR.5. Hence
$$ \delta_{(i)} = \frac{\innprd{d_{(i)}}{Ae_{(0)}}}{\innprd{d_{(i)}}{Ad_{(i)}}} = \frac{\innprd{d_{(i)}}{Ae_{(i)}}}{\innprd{d_{(i)}}{Ad_{(i)}}} = -\alpha_{(i)} \tag{CJDR.10} $$
Hence
$$ e_{(i)} = e_{(0)}+\sum_{j=0}^{i-1}\alpha_{(j)}d_{(j)} = \sum_{j=0}^{n-1}\delta_{(j)}d_{(j)}-\sum_{j=0}^{i-1}\delta_{(j)}d_{(j)} = \sum_{j=i}^{n-1}\delta_{(j)}d_{(j)} \tag{CJDR.11} $$
Hence $e_{(n)}=0$ and we’re done after $n$ iterations. Equation CJDR.11 is also helpful because it shows that we have fulfilled our stated intention to go in any given direction at most once: after $i$ iterations, we need only take $n-i$ more steps in exactly $n-i$ distinct directions.
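Here is a small numeric illustration of CJDR.10 and CJDR.11 (the data are made up; we use the eigenvectors of the symmetric matrix $A$ as one convenient $A$-orthogonal basis). It checks that the expansion coefficients $\delta_{(i)}$ of $e_{(0)}$ equal $-\alpha_{(i)}$ and that the error vanishes after $n$ steps:

```python
# A numeric illustration of CJDR.10 and CJDR.11 (made-up data; eigenvectors
# of the symmetric matrix A serve as one convenient A-orthogonal basis).
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
x_star = np.linalg.solve(A, b)

_, V = np.linalg.eigh(A)                 # columns V[:, i] are A-orthogonal d_(i)

x = np.array([-2.0, -2.0])               # starting point x_(0)
e0 = x - x_star                          # initial error e_(0)

# delta_(j): coefficients of e_(0) in the d-basis (CJDR.8/CJDR.10)
delta = [(V[:, j] @ (A @ e0)) / (V[:, j] @ (A @ V[:, j])) for j in range(2)]

for i in range(2):
    d = V[:, i]
    r = b - A @ x
    alpha = (d @ r) / (d @ (A @ d))      # CJDR.7
    print(alpha, -delta[i])              # CJDR.10: alpha_(i) = -delta_(i)
    x = x + alpha * d                    # CJDR.2

print(x - x_star)                        # e_(2) = 0 up to round-off
```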
In the next section, we derive a formula for computing the search directions (the last line of the summary below). To summarize, given some linearly independent list $u_0,\dots,u_{n-1}$, our $n$-step Conjugate Directions algorithm becomes
$$\gather{ d_{(0)} = u_0 \\\\ r_{(i)}=b-Ax_{(i)} \\\\ \alpha_{(i)} = \frac{\innprd{d_{(i)}}{r_{(i)}}}{\innprd{d_{(i)}}{Ad_{(i)}}} \\\\ x_{(i+1)}=x_{(i)}+\alpha_{(i)}d_{(i)} \\\\ d_{(i+1)} = u_{i+1} - \sum_{j=0}^{i}\frac{\innprd{u_{i+1}}{Ad_{(j)}}}{\innprd{d_{(j)}}{Ad_{(j)}}}d_{(j)} \\\\ }$$
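The following Python sketch implements the recurrences in the box above for a dense symmetric positive definite $A$; the function and variable names are ours, not taken from the linked source code:

```python
# A sketch of the n-step Conjugate Directions iteration summarized above,
# for a dense symmetric positive definite A (names are ours, not from the
# linked source code).
import numpy as np

def conjugate_directions(A, b, x0, U):
    """Minimize f(x) = (1/2) x^T A x - b^T x starting from x0, using the
    rows of U (any linearly independent list) to build the directions."""
    x = np.asarray(x0, dtype=float).copy()
    directions, A_directions = [], []     # d_(j) and cached A d_(j)
    for u in U:
        # Conjugate u against the earlier directions (last line of the box)
        d = np.asarray(u, dtype=float).copy()
        for d_j, Ad_j in zip(directions, A_directions):
            d -= (u @ Ad_j) / (d_j @ Ad_j) * d_j
        Ad = A @ d
        r = b - A @ x                     # r_(i) = b - A x_(i)
        alpha = (d @ r) / (d @ Ad)        # step size
        x = x + alpha * d                 # x_(i+1) = x_(i) + alpha_(i) d_(i)
        directions.append(d)
        A_directions.append(Ad)
    return x

# Usage on a small made-up example: converges in exactly n = 2 steps.
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
x = conjugate_directions(A, b, np.array([-2.0, -2.0]), np.eye(2))
print(x, np.linalg.solve(A, b))           # the two agree up to round-off
```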
Gram-Schmidt Conjugation
We want to derive a list $d_{(0)},\dots,d_{(n-1)}$ of $A$-orthogonal vectors from a linearly independent list $u_{0},\dots,u_{n-1}$. First we will prove and plot a simple result that gives us some intuition into this procedure:
Proposition GSC.1 Let $V$ be an inner product space, let $A$ be positive definite on $V$, and let $u,v\in V$ with $v\neq0$. Then there exists a vector $w\in V$ and a scalar $c$ such that
$$ \innprd{Aw}{v} = 0 \dq\text{and}\dq u = cv + w $$
Proof Note that
$$ u = cv + (u - cv) $$
Put $w\equiv u-cv$. We want to choose $c$ so that $v$ is orthogonal to $Aw=Au-cAv$. That is, we want
$$ 0 = \innprd{v}{Aw} = \innprd{v}{Au-cAv} = \innprd{v}{Au} - c\innprd{v}{Av} $$
Since $A$ is positive definite and $v\neq0$, then $\innprd{v}{Av}\gt0$ and we can define $c$ by
$$ c \equiv \frac{\innprd{v}{Au}}{\innprd{v}{Av}} $$
Then
$$ cv + w = cv + u-cv = u $$
and
$$ \innprd{Aw}{v} = \innprd{Au-cAv}{v} = \innprd{Au}{v}-c\innprd{Av}{v} = \innprd{Au}{v}-\frac{\innprd{v}{Au}}{\innprd{v}{Av}}\innprd{Av}{v} = 0 $$
$\wes$
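Here is a quick numeric check of Proposition GSC.1 with arbitrary made-up inputs (the values are illustrative only):

```python
# A quick numeric check of Proposition GSC.1 with arbitrary made-up inputs.
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])   # positive definite
u = np.array([1.0, 2.0])
v = np.array([2.0, -1.0])                # nonzero

c = (v @ (A @ u)) / (v @ (A @ v))        # the scalar chosen in the proof
w = u - c * v                            # so that u = c v + w

print(np.allclose(u, c * v + w))         # True: u = c v + w
print((A @ w) @ v)                       # <Aw, v> ~ 0
```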
The plot below illustrates the above proposition and connects it to Gram-Schmidt Conjugation. Note that $u_1$ consists of two components: the green component, which is orthogonal to $Au_0$ (equivalently, $A$-orthogonal to $u_0$), and the red component, which is a scalar multiple of $u_0$. So to conjugate $u_0$ and $u_1$, we first put $d_{(0)}\equiv u_0$. Then we subtract the red component from $u_1$ to get $d_{(1)}$.
More formally, for positive definite $A$, we define
$$ d_{(j)} \equiv \cases{ u_j & j=0 \\u_j + \sum_{k=0}^{j-1}\beta_{j,k}d_{(k)} & j=1,\dots,n-1} \tag{GSC.2} $$
for some scalars $\beta_{j,0},\dots,\beta_{j,j-1}$ such that $0=\innprd{d_{(j)}}{Ad_{(i)}}$ for $i=0,\dots,j-1$. If we can find such scalars, then the self-adjointness of $A$ gives $0=\innprd{d_{(j)}}{Ad_{(i)}}=\innprd{Ad_{(j)}}{d_{(i)}}=\innprd{d_{(i)}}{Ad_{(j)}}$ as well. Hence we get $0=\innprd{d_{(j)}}{Ad_{(i)}}$ for all $i\neq j$. To find such scalars, assume that we have already found them, let $i$ and $j$ be integers such that $0\leq i\lt j\leq n-1$, and compute
$$\align{ 0 &= \innprd{d_{(j)}}{Ad_{(i)}} \\ &= \innprdBgg{u_j + \sum_{k=0}^{j-1}\beta_{j,k}d_{(k)}}{Ad_{(i)}} \\ &= \innprd{u_j}{Ad_{(i)}}+\innprdBgg{\sum_{k=0}^{j-1}\beta_{j,k}d_{(k)}}{Ad_{(i)}} \\ &= \innprd{u_j}{Ad_{(i)}}+\sum_{k=0}^{j-1}\innprd{\beta_{j,k}d_{(k)}}{Ad_{(i)}} \\ &= \innprd{u_j}{Ad_{(i)}}+\sum_{k=0}^{j-1}\beta_{j,k}\innprd{d_{(k)}}{Ad_{(i)}} \\ &= \innprd{u_j}{Ad_{(i)}}+\beta_{j,i}\innprd{d_{(i)}}{Ad_{(i)}}\tag{by assumption} }$$
So we define
$$ \beta_{j,i} \equiv -\frac{\innprd{u_j}{Ad_{(i)}}}{\innprd{d_{(i)}}{Ad_{(i)}}} \tag{GSC.3} $$
Suppose $u_{0},\dots,u_{n-1}$ were not linearly independent. Then it would be possible that $d_{(i)}=0$ for some $i$, in which case $\innprd{d_{(i)}}{Ad_{(i)}}=0$ and $\beta_{j,i}$ would be undefined. The linear independence of $u_{0},\dots,u_{n-1}$ guarantees that all of the $\beta_{j,i}$’s are well defined, since each $d_{(i)}$ is $u_i$ plus a linear combination of $u_0,\dots,u_{i-1}$ and hence cannot be zero.
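A minimal Python sketch of GSC.2 and GSC.3 for a dense symmetric positive definite $A$ (the function name `gram_schmidt_conjugate` is ours, not from the linked source code):

```python
# A minimal sketch of GSC.2/GSC.3 for a dense symmetric positive definite A
# (the function name is ours).
import numpy as np

def gram_schmidt_conjugate(A, U):
    """Return A-orthogonal directions d_(0), ..., d_(n-1) built from the
    linearly independent rows of U."""
    D = []
    for u in U:
        d = np.asarray(u, dtype=float).copy()
        for d_i in D:
            beta = -(u @ (A @ d_i)) / (d_i @ (A @ d_i))   # GSC.3
            d += beta * d_i                                # GSC.2
        D.append(d)
    return np.array(D)
```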
Proposition GSC.4 Let $A$ be positive definite and let $u_{0},\dots,u_{n-1}$ be linearly independent. Define $d_{(0)},\dots,d_{(n-1)}$ as in GSC.2. Let $i$ and $j$ be integers between $0$ and $n-1$ inclusively such that $i\neq j$. Then we have $0=\innprd{d_{(i)}}{Ad_{(j)}}$.
Proof It suffices to show that for integers $i$ and $j$ with $0\leq i\lt j\leq n-1$, we have $0=\innprd{d_{(i)}}{Ad_{(j)}}$. We will prove this by induction. For $i=0$ and $j=1$, we have
$$\align{ \innprd{d_{(0)}}{Ad_{(1)}} &= \innprd{u_0}{A(u_1+\beta_{1,0}d_{(0)})} \\ &= \innprd{u_0}{Au_1+\beta_{1,0}Ad_{(0)}} \\ &= \innprd{u_0}{Au_1+\beta_{1,0}Au_0} \\ &= \innprd{u_0}{Au_1}+\innprd{u_0}{\beta_{1,0}Au_0} \\ &= \innprd{u_0}{Au_1}+\beta_{1,0}\innprd{u_0}{Au_0} \\ &= \innprd{u_0}{Au_1}-\frac{\innprd{u_1}{Au_0}}{\innprd{u_0}{Au_0}}\innprd{u_0}{Au_0} \\ &= \innprd{u_0}{Au_1}-\innprd{u_1}{Au_0} \\ &= \innprd{u_0}{Au_1}-\innprd{Au_1}{u_0} \\ &= 0 }$$
Our induction assumption is that, for some integer $h$ with $2\leq h\leq n-1$, we have $0=\innprd{d_{(i)}}{Ad_{(j)}}$ for all $0\leq i\lt j\leq h-1$. Then for any $0\leq i\lt h$, we have
$$\align{ \innprd{d_{(i)}}{Ad_{(h)}} &= \innprdBgg{d_{(i)}}{A\Prngg{u_h + \sum_{k=0}^{h-1}\beta_{h,k}d_{(k)}}} \\ &= \innprdBgg{d_{(i)}}{Au_h + A\sum_{k=0}^{h-1}\beta_{h,k}d_{(k)}} \\ &= \innprdBgg{d_{(i)}}{Au_h + \sum_{k=0}^{h-1}\beta_{h,k}Ad_{(k)}} \\ &= \innprd{d_{(i)}}{Au_h} + \innprdBgg{d_{(i)}}{\sum_{k=0}^{h-1}\beta_{h,k}Ad_{(k)}} \\ &= \innprd{d_{(i)}}{Au_h} + \sum_{k=0}^{h-1}\innprd{d_{(i)}}{\beta_{h,k}Ad_{(k)}} \\ &= \innprd{d_{(i)}}{Au_h} + \sum_{k=0}^{h-1}\beta_{h,k}\innprd{d_{(i)}}{Ad_{(k)}} \\ &= \innprd{d_{(i)}}{Au_h} + \beta_{h,i}\innprd{d_{(i)}}{Ad_{(i)}}\tag{by the induction assumption} \\ &= \innprd{d_{(i)}}{Au_h} - \frac{\innprd{u_h}{Ad_{(i)}}}{\innprd{d_{(i)}}{Ad_{(i)}}}\innprd{d_{(i)}}{Ad_{(i)}} \\ &= \innprd{d_{(i)}}{Au_h} - \innprd{u_h}{Ad_{(i)}} \\ &= \innprd{d_{(i)}}{Au_h} - \innprd{Au_h}{d_{(i)}} \\ &= 0 }$$
$\wes$
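As a numeric spot-check of Proposition GSC.4, we can feed a random symmetric positive definite matrix through the `gram_schmidt_conjugate` sketch above and confirm that the matrix of values $\innprd{d_{(i)}}{Ad_{(j)}}$ is (numerically) diagonal; the test matrix below is made up:

```python
# A numeric spot-check of Proposition GSC.4, reusing the gram_schmidt_conjugate
# sketch above (the random test matrix is made up).
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)               # a random symmetric positive definite A
U = np.eye(n)                             # linearly independent u_0, ..., u_{n-1}

D = gram_schmidt_conjugate(A, U)          # rows are d_(0), ..., d_(n-1)
G = D @ A @ D.T                           # G[i, j] = <d_(i), A d_(j)>
print(np.max(np.abs(G - np.diag(np.diag(G)))))   # off-diagonal entries ~ 0
```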
Equivalence to Orthogonal Directions in Unskewed Space
In progress
References
- Painless Conjugate Gradient by Jonathan Richard Shewchuk
- Painless Conjugate Gradient in Python by Andre van Schaik
- Applied Numerical Linear Algebra by James W. Demmel