The source code for all algorithms and plots can be found here.
Orthogonal Directions in Unskewed Space
The goal of the Conjugate Directions algorithm is to eliminate the repeated search directions that we saw in the Steepest Descent algorithm. That is, we take exactly one step in each search direction, and that step will be just the right length to line up evenly with $x_{*}$. After $n$ steps, we’ll be done.
Looking back at Steepest Descent, when we’re finished going in the direction of $r_{(i)}$, we proceed to go in the orthogonal direction of $r_{(i+1)}$. But the vector $-e_{(i+1)}$ gives us the direction we really want to go because it points directly at $x_{*}$. Of course we don’t know the direction of $-e_{(i+1)}$. But we do know that $r_{(i)}$ and $e_{(i+1)}$ are $A$-orthogonal:
$$\align{ 0 = \innprd{r_{(i)}}{r_{(i+1)}} = \innprd{r_{(i)}}{-Ae_{(i+1)}} = -\innprd{r_{(i)}}{Ae_{(i+1)}} \tag{CJDR.1} }$$
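Equation CJDR.1 is easy to check numerically. Here is a small sketch in NumPy (the matrix $A$, vector $b$, and starting point are invented for illustration): after one exact-line-search step of Steepest Descent, the old residual $r_{(0)}$ is $A$-orthogonal to the new error $e_{(1)}$.

```python
import numpy as np

# Hypothetical example: any symmetric positive definite A will do.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])
x_star = np.linalg.solve(A, b)   # the minimizer of f

x0 = np.array([4.0, -3.0])       # arbitrary starting point
r0 = b - A @ x0                  # residual r_(0) = -A e_(0)

# Exact line search along r_(0): alpha minimizes f(x0 + alpha * r0)
alpha = (r0 @ r0) / (r0 @ A @ r0)
x1 = x0 + alpha * r0
e1 = x1 - x_star                 # error e_(1) = x_(1) - x_star

# CJDR.1: r_(0) is A-orthogonal to e_(1)
print(r0 @ A @ e1)               # ~0, up to floating point error
```

Since exact line search makes $r_{(1)}$ orthogonal to $r_{(0)}$, and $r_{(1)}=-Ae_{(1)}$, the printed value vanishes up to rounding.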
This is the crucial piece of information that allows us to develop the methods of Conjugate Directions and Conjugate Gradient. Before we exploit it, note that there are multiple trajectories we could take that are $A$-orthogonal to the previous direction of $r_{(i)}$. But only one of these trajectories is in the direction of $-e_{(i+1)}$. For example, the next plot depicts the $A$-orthogonality of $r_{(0)}$ with both $e_{(1)}$ and $-e_{(1)}$ in $\wR^2$:
This problem only gets worse in higher dimensions. In $\wR^3$, there’s an entire plane of vectors that are $A$-orthogonal to $r_{(i)}$, that is, orthogonal to $Ar_{(i)}$:
Still, equation CJDR.1 suggests that, when we change directions, it’s better to choose a direction that is $A$-orthogonal to the previous direction than one that is merely orthogonal to it. In fact, to prevent duplicate search directions, we can go a step further and require that each search direction be $A$-orthogonal to all previous search directions.
Suppose we can find $n$ mutually conjugate vectors in an $n$-dimensional inner product space $V$. By proposition CJGC.3, these mutually conjugate vectors are linearly independent. Then by Axler 2.39, these $n$ vectors form a basis of $V$. And by Axler 2.29, we can write any vector in $V$ as a linear combination of these $n$ mutually $A$-orthogonal vectors. In particular, we can write the initial error vector $e_{(0)}=x_{(0)}-x_{*}$ as such a linear combination.
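As a quick numerical illustration (again with an invented $A$ and starting point): the eigenvectors of a symmetric positive definite $A$ are one concrete example of $n$ mutually $A$-orthogonal vectors, and the $A$-inner product lets us read off the coefficients of $e_{(0)}$ in that basis.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])
x_star = np.linalg.solve(A, b)
x0 = np.array([4.0, -3.0])
e0 = x0 - x_star

# The eigenvectors of a symmetric A are mutually A-orthogonal, so they
# serve as n mutually conjugate vectors d_0, ..., d_{n-1}.
_, Q = np.linalg.eigh(A)
d = [Q[:, j] for j in range(A.shape[0])]
assert abs(d[0] @ A @ d[1]) < 1e-12   # A-orthogonality

# Coefficients of e_(0) in the conjugate basis, via A-inner products:
# c_j = <d_j, A e_(0)> / <d_j, A d_j>
c = [(dj @ A @ e0) / (dj @ A @ dj) for dj in d]
recon = sum(cj * dj for cj, dj in zip(c, d))
print(np.allclose(recon, e0))         # True
```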
If you’re familiar with the Gram-Schmidt procedure, you might correctly guess that we can devise a similar procedure to produce $A$-orthogonal vectors. We will gradually and intuitively build up to that approach using the tools that we have developed so far.
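As a preview, here is a sketch of that Gram-Schmidt-like procedure (the function name is ours, and the starting vectors are arbitrary): replace the ordinary inner product in Gram-Schmidt with the $A$-inner product $u^TAv$ to turn any linearly independent set into a mutually $A$-orthogonal set.

```python
import numpy as np

def conjugate_gram_schmidt(A, vectors):
    """A-orthogonalize linearly independent vectors, Gram-Schmidt style,
    using the A-inner product <u, v>_A = u^T A v."""
    ds = []
    for v in vectors:
        d = v.copy()
        for dj in ds:
            # Subtract the component of v along dj, measured in <., .>_A
            d -= ((v @ A @ dj) / (dj @ A @ dj)) * dj
        ds.append(d)
    return ds

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
d0, d1 = conjugate_gram_schmidt(A, [np.array([1.0, 0.0]),
                                    np.array([0.0, 1.0])])
print(d0 @ A @ d1)   # ~0: the directions are A-orthogonal
```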
First we compute the Eigendecomposition $A=R^T\Lambda R$ and then we compute the diagonal matrix $D_1$ with entries $d_{1,1},\dots,d_{1,n}$ (see ELC.17). Then we use the map $R^T\inv{D_1}R$ and proposition PLE.4 to dilate the contours of $f$ into spheres. The Eigendecomposition also allows us to compute $x_{*}$ (by ELC.19):
Then for any orthonormal basis $u_{0},\dots,u_{n-1}$ of $V$, we compute the unique linear combination that traces a path from $x_{(0)}$ to $x_{*}$ in this spherical space:
$$\align{ R^T\inv{D_1}R(x_0-x_{*}) = R^T\inv{D_1}Re_{(0)} = \sum_{j=0}^{n-1}c_ju_j }$$
The $n$ steps in this path are mutually orthogonal. This procedure of finding the unique linear combination of an orthonormal basis is sometimes called Orthogonal Directions. In the plot below, we use the standard basis in $\wR^2$:
Then we dilate everything back to the original space with the map $R^TD_1R$. And finally we invoke proposition CJGC.4 to see that we have $n$ vectors $c_0R^TD_1Ru_0,\dots,c_{n-1}R^TD_1Ru_{n-1}$ that are $A$-orthogonal:
$$\align{ x_0-x_{*} = e_{(0)} = R^TD_1RR^T\inv{D_1}Re_{(0)} = R^TD_1R\sum_{j=0}^{n-1}c_ju_j = \sum_{j=0}^{n-1}c_jR^TD_1Ru_j }$$
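The whole round trip can be verified numerically. In the sketch below, $A$ and $x_{(0)}$ are invented, and we assume (per ELC.17 and the dilation described above) that $D_1=\inv{\Lambda^{1/2}}$, so that $R^T\inv{D_1}R$ spheres the contours; also note that `numpy.linalg.eigh` returns $A=Q\Lambda Q^T$, so $R=Q^T$. With the standard basis as $u_0,\dots,u_{n-1}$, the vectors $c_jR^TD_1Ru_j$ come out $A$-orthogonal and sum to $e_{(0)}$.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])
x_star = np.linalg.solve(A, b)
x0 = np.array([4.0, -3.0])
e0 = x0 - x_star

# Eigendecomposition A = R^T Lambda R (eigh returns A = Q Lambda Q^T, R = Q^T)
lam, Q = np.linalg.eigh(A)
R = Q.T

# Assumed from ELC.17: D1 = Lambda^{-1/2}, so R^T D1^{-1} R dilates the
# elliptical contours of f into spheres, and R^T D1 R dilates them back.
unskew = R.T @ np.diag(np.sqrt(lam)) @ R        # R^T D1^{-1} R
reskew = R.T @ np.diag(1 / np.sqrt(lam)) @ R    # R^T D1 R

# Coefficients of the unskewed error in the standard basis u_j:
c = unskew @ e0

# The vectors c_j * R^T D1 R u_j are A-orthogonal and sum to e_(0):
steps = [c[j] * reskew[:, j] for j in range(2)]
print(abs(steps[0] @ A @ steps[1]))              # ~0
print(np.allclose(steps[0] + steps[1], e0))      # True
```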
Let’s look at this approach in $\wR^3$. First we compute $R^T\inv{D_1}R$ and dilate the space such that the level $1$ ellipsoid becomes the unit ball around $x_{*}$:
Then we perform Orthogonal Directions:
And then we use $R^TD_1R$ to revert the earlier dilation so that the unit ball becomes the level $1$ ellipsoid. This yields three vectors that are $A$-orthogonal and trace a path from $x_{(0)}$ to $x_{*}$: