Conjugate Gradient Part 4

08 May 2019

The source code for all algorithms and plots can be found here.

Conjugacy

We define two vectors $u$ and $v$ to be $A$-orthogonal or conjugate if

$$ \innprd{Au}{v}=0 \tag{CJGC.1} $$

Note that this is equivalent to $\innprd{u}{Av}=0$ when $A$ is symmetric since

$$ \innprd{Au}{v}=\innprd{u}{A^Tv}=\innprd{u}{Av} \tag{CJGC.2} $$
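To make the definition concrete, here is a minimal NumPy sketch (not part of the linked source code) that builds a hypothetical symmetric positive definite $A$, constructs an $A$-orthogonal pair with one Gram-Schmidt step in the $A$-inner product, and checks CJGC.1 and CJGC.2 numerically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical symmetric positive definite matrix, for illustration only.
M = rng.standard_normal((3, 3))
A = M @ M.T + 3 * np.eye(3)

# Start from any two vectors and remove the A-component of w along u
# (one Gram-Schmidt step in the A-inner product), so u and v are conjugate.
u = rng.standard_normal(3)
w = rng.standard_normal(3)
v = w - (u @ A @ w) / (u @ A @ u) * u

print(np.isclose(u @ A @ v, 0.0))   # <Au, v> = 0, i.e. CJGC.1
print(np.isclose(v @ A @ u, 0.0))   # <u, Av> = 0 since A is symmetric, i.e. CJGC.2
```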

Proposition CJGC.3 Conjugacy of nonzero vectors implies linear independence when $A$ is positive definite.

Proof Let $u_1$,…,$u_n$ be nonzero, mutually $A$-orthogonal vectors and suppose, by way of contradiction, that there exist scalars $\gamma_1$,…,$\gamma_n$, not all zero, such that

$$ 0=\gamma_1u_1+\dotsb+\gamma_nu_n $$

Choose an index $i$ for which $\gamma_i$ is nonzero and take the inner product of both sides with $Au_i$. Then

$$ 0=\innprdBg{Au_i}{\sum_{j=1}^n\gamma_ju_j}=\sum_{j=1}^n\innprd{Au_i}{\gamma_ju_j}=\sum_{j=1}^n\gamma_j\innprd{Au_i}{u_j}=\gamma_i\innprd{Au_i}{u_i}\neq0 $$

The last equality holds because, by the $A$-orthogonality of $u_1,\dots,u_n$, we have $\innprd{Au_i}{u_j}=0$ for $j\neq i$. The inequality on the far right holds because $\gamma_i\neq0$ and, since $A$ is positive definite and $u_i\neq0$, $\innprd{Au_i}{u_i}\gt0$. This is a contradiction, so $u_1,\dots,u_n$ must be linearly independent.

$\wes$
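A quick numerical sanity check of this proposition, again only a sketch with a randomly generated positive definite $A$: build $n$ mutually conjugate directions by Gram-Schmidt in the $A$-inner product and confirm that the resulting matrix of columns has full rank.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# Hypothetical symmetric positive definite matrix.
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)

# Build n mutually A-orthogonal (conjugate) vectors by Gram-Schmidt
# in the A-inner product <u, v>_A = u^T A v.
U = []
for _ in range(n):
    w = rng.standard_normal(n)
    for u in U:
        w = w - (u @ A @ w) / (u @ A @ u) * u
    U.append(w)
U = np.column_stack(U)

# The off-diagonal entries of U^T A U vanish: the columns are conjugate.
G = U.T @ A @ U
print(np.allclose(G - np.diag(np.diag(G)), 0.0))

# Proposition CJGC.3: the conjugate vectors are linearly independent.
print(np.linalg.matrix_rank(U) == n)
```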

We will soon exploit conjugacy to find $x_{*}$ more efficiently than Steepest Descent does. The next proposition will be critical to that effort.

Proposition CJGC.4 Suppose $A$ is positive definite on an inner product space $V$ and let $x,y\in V$. Let $k$ and $q$ be arbitrary scalars that are both greater than $f(x_{*})$. Then the following are equivalent:

(a) $x$ and $y$ are orthogonal.
(b) $R^TD_kRx$ and $R^TD_qRy$ are $A$-orthogonal.
(c) $\inv{A}x$ and $y$ are $A$-orthogonal.
(d) $x$ and $\inv{A}y$ are $A$-orthogonal.

Proof Note that

$$\align{ R^TD_k\Lambda D_qRy &= R^TD_k\Lambda D_qR\sum_{j=1}^n\innprd{y}{v_j}v_j \\ &= \sum_{j=1}^n\innprd{y}{v_j}R^TD_k\Lambda D_qRv_j \\ &= \sum_{j=1}^n\innprd{y}{v_j}R^TD_k\Lambda D_qe_j \\ &= \sum_{j=1}^n\innprd{y}{v_j}R^TD_k\Lambda d_{q,j}e_j \\ &= \sum_{j=1}^n\innprd{y}{v_j}d_{q,j}R^TD_k\Lambda e_j \\ &= \sum_{j=1}^n\innprd{y}{v_j}d_{q,j}R^TD_k\lambda_je_j \\ &= \sum_{j=1}^n\innprd{y}{v_j}d_{q,j}\lambda_jR^TD_ke_j \\ &= \sum_{j=1}^n\innprd{y}{v_j}d_{q,j}\lambda_jR^Td_{k,j}e_j \\ &= \sum_{j=1}^n\innprd{y}{v_j}d_{q,j}d_{k,j}\lambda_jR^Te_j \\ &= \sum_{j=1}^n\innprd{y}{v_j}d_{q,j}d_{k,j}\lambda_jv_j \\ &= \sum_{j=1}^n\innprd{y}{v_j}\sqrt{\frac{\phi_q}{\alpha\lambda_j}}\sqrt{\frac{\phi_k}{\alpha\lambda_j}}\lambda_jv_j \\ &= \sum_{j=1}^n\innprd{y}{v_j}\sqrt{\frac{\phi_q\phi_k}{\alpha^2\lambda_j^2}}\lambda_jv_j \\ &= \sum_{j=1}^n\innprd{y}{v_j}\frac{\sqrt{\phi_q\phi_k}}{\alpha\lambda_j}\lambda_jv_j \\ &= \sum_{j=1}^n\innprd{y}{v_j}\frac{\sqrt{\phi_q\phi_k}}{\alpha}v_j \\ &= \frac{\sqrt{\phi_q\phi_k}}{\alpha}\sum_{j=1}^n\innprd{y}{v_j}v_j \\ &= \frac{\sqrt{\phi_q\phi_k}}{\alpha}y }$$

Hence

$$\align{ \innprd{AR^TD_kRx}{R^TD_qRy} &= \innprd{R^T\Lambda RR^TD_kRx}{R^TD_qRy} \\\\ &= \innprd{R^T\Lambda D_kRx}{R^TD_qRy} \\\\ &= \innprd{x}{(R^T\Lambda D_kR)^TR^TD_qRy} \\\\ &= \innprd{x}{R^TD_k^T\Lambda^T(R^T)^TR^TD_qRy} \\\\ &= \innprd{x}{R^TD_k^T\Lambda^TRR^TD_qRy} \\\\ &= \innprd{x}{R^TD_k^T\Lambda^TD_qRy} \\\\ &= \innprd{x}{R^TD_k\Lambda D_qRy}\tag{since $D_k$ and $\Lambda$ are diagonal} \\\\ &= \innprdBg{x}{\frac{\sqrt{\phi_q\phi_k}}{\alpha}y}\tag{by the above computation} \\\\ &= \frac{\sqrt{\phi_q\phi_k}}{\alpha}\innprd{x}{y} \\\\ }$$

Hence $0=\innprd{AR^TD_kRx}{R^TD_qRy}$ if and only if $0=\innprd{x}{y}$ since $\phi_q$ and $\phi_k$ are nonzero (by ELC.18). This proves (a)$\iff$(b).
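The following sketch checks this equivalence numerically. It relies only on the relations used above, $A=R^T\Lambda R$ and $d_{k,j}=\sqrt{\phi_k/(\alpha\lambda_j)}$; the scalars $\phi_k$, $\phi_q$, and $\alpha$ are defined in earlier parts of this series and are treated here as arbitrary positive placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# Hypothetical symmetric positive definite A with eigendecomposition
# A = R^T Lambda R, where the rows of R are the orthonormal eigenvectors v_j.
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
lam, V = np.linalg.eigh(A)      # columns of V are the eigenvectors v_j
R = V.T                         # so R v_j = e_j and R^T e_j = v_j
Lam = np.diag(lam)

# phi_k, phi_q, and alpha are defined in earlier parts of the series; the
# identity only uses d_{k,j} = sqrt(phi_k / (alpha * lambda_j)), so they are
# treated here as arbitrary positive placeholder scalars.
phi_k, phi_q, alpha = 1.7, 2.3, 0.9
D_k = np.diag(np.sqrt(phi_k / (alpha * lam)))
D_q = np.diag(np.sqrt(phi_q / (alpha * lam)))

# Key identity: R^T D_k Lambda D_q R y = (sqrt(phi_q * phi_k) / alpha) * y
y = rng.standard_normal(n)
lhs = R.T @ D_k @ Lam @ D_q @ R @ y
print(np.allclose(lhs, np.sqrt(phi_q * phi_k) / alpha * y))

# (a) <=> (b): if x is orthogonal to y, then R^T D_k R x and R^T D_q R y
# are A-orthogonal.
x = rng.standard_normal(n)
x = x - (x @ y) / (y @ y) * y   # make <x, y> = 0
print(np.isclose((R.T @ D_k @ R @ x) @ A @ (R.T @ D_q @ R @ y), 0.0))
```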

To show that (a)$\iff$(c), note that $0=\innprd{x}{y}$ if and only if $0=\innprd{\inv{A}x}{Ay}$ since

$$ \innprd{x}{y} = \innprd{A\inv{A}x}{y} = \innprd{\inv{A}x}{A^Ty} = \innprd{\inv{A}x}{Ay} $$

To show that (a)$\iff$(d), note that $0=\innprd{x}{y}$ if and only if $0=\innprd{Ax}{\inv{A}y}$ since

$$ \innprd{x}{y} = \innprd{\inv{A}Ax}{y} = \innprd{Ax}{(\inv{A})^Ty} = \innprd{Ax}{\inv{(A^T)}y} = \innprd{Ax}{\inv{A}y} $$

$\wes$
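As a last sanity check, here is a short sketch of parts (c) and (d), again with a randomly generated positive definite $A$: with $x$ constructed orthogonal to $y$, both $\innprd{A\inv{A}x}{y}$ and $\innprd{Ax}{\inv{A}y}$ should vanish.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

# Hypothetical symmetric positive definite A.
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
A_inv = np.linalg.inv(A)

# Construct x orthogonal to y.
y = rng.standard_normal(n)
x = rng.standard_normal(n)
x = x - (x @ y) / (y @ y) * y

# (c): A^{-1} x and y are A-orthogonal, i.e. <A A^{-1} x, y> = <x, y> = 0.
print(np.isclose((A @ (A_inv @ x)) @ y, 0.0))

# (d): x and A^{-1} y are A-orthogonal, i.e. <A x, A^{-1} y> = <x, y> = 0.
print(np.isclose((A @ x) @ (A_inv @ y), 0.0))
```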