Conjugate Gradient Part 3a

The source code for all algorithms and plots can be found here.

Plot The Level $k$ Ellipsoid

In the previous section, we defined the level $k$ ellipsoid to be the set of all points $x$ that satisfy ELC.13 or equivalently ELC.16 or equivalently $f(x)=k$. But we didn’t show that such points exist for an arbitrary $k\gt f(x_{*})$. Nor did we show how to find them. In this section, we will do both by finding and proving an isomorphism between the level $k$ ellipsoid and the unit ball.

To maximize our understanding, we will do this visually and rigorously. Hence we will commingle our work between a concrete example in $\wR^3$ and an arbitrary inner product space of dimension $n$. All of the work in $\wR^3$ is easily generalized to the latter.

Our goal is to plot the level $k$ ellipsoid generated by the quadratic form. We start with the unit ball or sphere centered at the origin. The previous section gives us the center point and the semi-axes lengths of the level $k$ ellipsoid. We also learned that the eigenvectors of $A$ align with the axes of the origin-centered level $k$ ellipsoid. We must find an algorithm for rotating, dilating, and centering the unit ball to arrive at the level $k$ ellipsoid.

We will use the quadratic form given in equations ELC.1 and TP.2. Hence $\alpha=\frac12$ and $\beta=-1$ in equation ELC.1. We assume that the eigenvalues $\lambda_1, \lambda_2, \lambda_3$ of $A$ are ordered smallest to largest. First we plot the unit ball along with the standard basis $e_1,e_2,e_3$ and the ordered eigenvectors $v_1,v_2,v_3$:

Note the color scheme. Those points with the lowest $z$ component are colored red. Those points with the highest $z$ component are colored blue. The points in between are colored green. These colors will help us visualize the rotations and dilation at each step.

Again, the dilation occurs along the axes given by the eigenvectors. Formally, let $x$ be a point on the unit ball centered at the origin. We must find a matrix $M$ such that $Mx$ lies on the level $k$ ellipsoid of $f$ centered at the origin. But it’s not at all clear how to derive such an $M$.

If the eigenvectors happened to be the standard basis, it would be simple to derive such an $M$. We would construct $M$ as a diagonal matrix with the lengths of the semi-axes in the diagonal entries.

So, in order to simplify the dilation in our example, we will first rotate the unit ball and the vectors so that the eigenvectors are in the positions currently occupied by the standard basis. Then we will dilate the rotated unit ball according to the equations in ELC.17. Then we will rotate back.

The isolated eigenvectors and standard basis are plotted below. It’s not hard to see that the eigenvectors (blue) can be rotated to the standard basis vectors (green).

Direct derivation of a rotation matrix that maps the eigenvectors to the standard basis is a relatively straightforward exercise. In the source code, I have implemented this direct derivation in $\wR^3$. The gist of the direct derivation is this:

Derive a matrix $R_1$ that rotates $v_3$ around the $x$-axis such that the $y$ component of $R_1v_3$ is zero.
Derive a matrix $R_2$ that rotates $R_1v_3$ around the $y$-axis such that $R_2R_1v_3$ equals the third standard basis vector.
Derive a matrix $R_3$ that rotates $R_2R_1v_1$ around the $z$-axis such that $R_3R_2R_1v_1$ equals the first standard basis vector.

Note the following:

Step (1) rotates $v_3$ to a vector $R_1v_3$ that is orthogonal to the $y$-axis. Hence, since $\dnorm{R_1v_3}=\dnorm{v_3}=1$, when we rotate $R_1v_3$ around the $y$-axis in step (2), we are guaranteed to arrive at the third standard basis vector.
Step (2) rotates $R_1v_3$ to the third standard basis vector. Hence, when we rotate around the $z$-axis in step (3), $R_2R_1v_3$ stays put. That is $R_3R_2R_1v_3=R_2R_1v_3=e_3$.
Since rotations preserve angle, then $R_2R_1v_3$ and $R_2R_1v_1$ are orthogonal. Hence, since $R_2R_1v_3=e_3$, then $R_2R_1v_1$ is orthogonal to the $z$-axis. Hence, when we rotate $R_2R_1v_1$ around the $z$-axis in step (3), then we are guaranteed to arrive at the first standard basis vector.

Define $R\equiv R_3R_2R_1$. When we multiply $R$ by an eigenvector $v_j$, we get a standard basis vector $e_j$. That is

$$ Rv_j = e_j \dq\text{for }j=1,2,3\tag{PLE.1} $$

This is exactly what we wanted. From here, we just dilate, rotate back, and add $m$. But before we get to that, let’s look a little more closely at the rotation matrix $R$. We will find a very close connection to the Eigendecomposition of $A$. To see this, note that equation PLE.1 yields important information about the rows $r_1,r_2,r_3$ of $R$. For $r_1$, we have

$$ \innprd{r_1}{v_1}=1 \dq \innprd{r_1}{v_2}=0 \dq \innprd{r_1}{v_3}=0 $$

Since $v_1,v_2,v_3$ is an orthonormal basis of $\wR^3$, then

$$ r_1=\innprd{r_1}{v_1}v_1+\innprd{r_1}{v_2}v_2+\innprd{r_1}{v_3}v_3=v_1 $$

We can similarly show that $r_2=v_2$ and $r_3=v_3$. Hence the columns of $R^T$ are the eigenvectors of $A$. Hence the Eigendecomposition of $A$ is $R^T\Lambda R$ where $\Lambda$ is the diagonal (i.e. dilation) matrix of eigenvalues of $A$. The geometric interpretation of the Eigendecomposition $R^T\Lambda R$ is that $R$ rotates an arbitrary vector $x$ and $\Lambda$ dilates the vector $Rx$. Then $R^T$ rotates the dilated vector $\Lambda Rx$ back to the original basis:

$$\align{ R^T\Lambda Rx &= \inv{R}\Lambda R\sum_{j=1}^3\innprd{x}{v_j}v_j \\ &= \sum_{j=1}^3\innprd{x}{v_j}\inv{R}\Lambda Rv_j \\ &= \sum_{j=1}^3\innprd{x}{v_j}\inv{R}\Lambda e_j\tag{by PLE.1} \\ &= \sum_{j=1}^3\innprd{x}{v_j}\inv{R}\lambda_je_j \\ &= \sum_{j=1}^3\innprd{x}{v_j}\lambda_j\inv{R}Rv_j \\ &= \sum_{j=1}^3\innprd{x}{v_j}\lambda_jv_j\tag{PLE.2} }$$

The first equality follows because rotation matrices are orthogonal and invertible and we have $\inv{R}=R^T$. Note that PLE.2 agrees with ELC.8. Also notice that the vector $R^T\Lambda Rx = \sum_{j=1}^3\innprd{x}{v_j}\lambda_jv_j$ is very similar to the vector $x=\sum_{j=1}^3\innprd{x}{v_j}v_j$. The only difference is that $R^T\Lambda Rx$ is scaled by the factor of $\lambda_j$ in the direction of $v_j$.

Keeping this strong connection to the Eigendecomposition in mind, let’s return to the problem at hand. We define $d_{k,i}$ to be length of the $i^{th}$ largest semi-axis generated by the quadratic form $f$ at the level $k$ ellipsoid. From the equations in ELC.17, we have

$$ d_{k,1}\equiv\sqrt{\frac{\phi_k}{\alpha\lambda_1}} \dq d_{k,2}\equiv\sqrt{\frac{\phi_k}{\alpha\lambda_2}} \dq d_{k,3}\equiv\sqrt{\frac{\phi_k}{\alpha\lambda_3}} $$

Define $D_k$ to be the diagonal matrix of $d_{k,1},d_{k,2},d_{k,3}$. The geometric interpretation of $R^TD_kR$ is similar to that of the Eigendecomposition. And we can perform a similar computation to PLE.2 to get

$$ R^TD_kRx = \sum_{j=1}^3\innprd{x}{v_j}d_{k,j}v_j\tag{PLE.3} $$

Hence the level $k$ ellipsoid of a quadratic form on $A$ is just a differently dilated and differently centered version of the action of $A$ on the unit ball.

So to plot the level $k$ ellipsoid centered at $m$, we start with the unit ball centered at the origin. We apply the decomposition $R^TD_kR$ to every point on the unit ball. And then we add the center point $m$ to these points. It’s helpful to see each of these steps one at a time. First, $R$ times the unit ball, the eigenvectors, and the standard basis looks like this:

When we compare PLE.Fig.3 with PLE.Fig.1, we see that indeed the unit ball, the eigenvectors, and the standard basis have all been rotated in the direction we would expect. Next we dilate:

And now we rotate back to the original basis:

And finally in PLE.Fig.6, we add the center $m$ to all the points on the ellipsoid centered at the origin in PLE.Fig.5: