
Steepest descent method and Newton’s method

Steepest descent method
Def 4.1. Normalized steepest descent direction (with respect to a given norm \(\|\cdot\|\)):
\[\begin{aligned}
\Delta x_{\text{nsd}} := &\arg \min\{\nabla f(x)^T v~|~\|v\|=1\} \\
= &\arg \min\{\nabla f(x)^T v~|~\|v\|\leq1\}
\end{aligned}
\]
Thus \(\nabla f(x)^T\Delta x_{\text{nsd}}=-\|\nabla f(x)\|_*\), where \(\|z\|_*=\sup\{z^T v~|~\|v\|\leq 1\}\) is the dual norm of \(\|\cdot\|\).
Def 4.2. Unnormalized steepest descent step:
\[\Delta x_{\text{sd}} := \|\nabla f(x)\|_*\Delta x_{\text{nsd}}
\]
Thus \(\nabla f(x)^T\Delta x_{\text{sd}}=-\|\nabla f(x)\|_*^2\) and \(\|\Delta x_{\text{sd}}\|= \|\nabla f(x)\|_*\).
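The two definitions can be made concrete for a few common norms. The following is a minimal sketch, not a reference implementation: the function name `steepest_descent_step` and the norm labels `"l2"`, `"l1"`, `"quad"` are hypothetical, the gradient is assumed to be nonzero, and `P` is assumed symmetric positive definite for the quadratic norm \(\|v\|_P=\sqrt{v^TPv}\).

```python
import numpy as np

def steepest_descent_step(grad, norm="l2", P=None):
    """Return (dx_nsd, dx_sd, dual_norm) for a nonzero gradient.

    norm="l2":   Euclidean norm, dual norm is Euclidean.
    norm="l1":   l1 norm, dual norm is l_inf (coordinate-wise step).
    norm="quad": ||v||_P = sqrt(v^T P v), dual norm is sqrt(g^T P^{-1} g).
    """
    g = np.asarray(grad, dtype=float)
    if norm == "l2":
        dual = np.linalg.norm(g)
        nsd = -g / dual
    elif norm == "l1":
        # Move along the coordinate with the largest |g_i|.
        i = int(np.argmax(np.abs(g)))
        dual = abs(g[i])
        nsd = np.zeros_like(g)
        nsd[i] = -np.sign(g[i])
    elif norm == "quad":
        Pinv_g = np.linalg.solve(P, g)
        dual = np.sqrt(g @ Pinv_g)
        nsd = -Pinv_g / dual
    else:
        raise ValueError(f"unknown norm: {norm}")
    # Unnormalized step: scale the unit-norm direction by the dual norm.
    return nsd, dual * nsd, dual

g = np.array([3.0, -4.0])
nsd_l2, sd_l2, dual_l2 = steepest_descent_step(g, "l2")
nsd_l1, sd_l1, dual_l1 = steepest_descent_step(g, "l1")
P = np.diag([1.0, 4.0])
nsd_q, sd_q, dual_q = steepest_descent_step(g, "quad", P)
```

In each case `g @ nsd` equals minus the dual norm and `g @ sd` equals minus its square, matching the identities stated after Def 4.1 and Def 4.2.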
Convergence analysis
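Assume \(f\) is \(L\)-smooth and \(\mu\)-strongly convex with respect to \(\|\cdot\|_2\), and iterate \(x^{(k+1)}=x^{(k)}+\eta \Delta x_{\text{sd}}\) with a fixed step size \(\eta>0\).
Lemma 1. There exist constants \(\gamma_1, \gamma_2 \in (0,1]\) such that, for all \(x\),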
\[\|x\|\geq \gamma_1\|x\|_2,\quad\|x\|_*\geq \gamma_2\|x\|_2
\]
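As a concrete instance of Lemma 1, take \(\|\cdot\|=\|\cdot\|_1\) on \(\mathbb{R}^n\), whose dual norm is \(\|\cdot\|_\infty\). Then \(\gamma_1=1\) and \(\gamma_2=1/\sqrt{n}\) work. The sketch below checks both inequalities on random vectors; the helper names are hypothetical and the check is a sanity test, not a proof.

```python
import numpy as np

def lemma1_constants_l1(n):
    """Constants (gamma1, gamma2) for the l1 norm on R^n (dual norm: l_inf)."""
    return 1.0, 1.0 / np.sqrt(n)

def check_lemma1_l1(n, trials=1000, seed=0):
    """Check ||x||_1 >= gamma1*||x||_2 and ||x||_inf >= gamma2*||x||_2 on random x."""
    rng = np.random.default_rng(seed)
    g1, g2 = lemma1_constants_l1(n)
    X = rng.standard_normal((trials, n))
    l2 = np.linalg.norm(X, axis=1)
    l1 = np.abs(X).sum(axis=1)
    linf = np.abs(X).max(axis=1)
    # Small tolerance guards against floating-point rounding.
    return bool(np.all(l1 >= g1 * l2 - 1e-12) and np.all(linf >= g2 * l2 - 1e-12))
```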
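Combining the descent lemma for \(L\)-smooth \(f\) with Lemma 1 and strong convexity gives
\[\begin{aligned}
f(x^{(k+1)})-f(x^*) \leq& f(x^{(k)})-f(x^*)+\eta\nabla f(x^{(k)})^T\Delta x_{\text{sd}}+\frac{\eta^2 L}{2}\|\Delta x_{\text{sd}}\|^2_2\\
\leq& f(x^{(k)})-f(x^*) + (-\eta+\frac{\eta^2 L}{2\gamma_1^2})\|\nabla f(x^{(k)})\|^2_*\\
\overset{(i)}{\leq}& f(x^{(k)})-f(x^*) + \gamma_2^2(-\eta+\frac{\eta^2 L}{2\gamma_1^2})\|\nabla f(x^{(k)})\|^2_2\\
\overset{(ii)}{\leq}& f(x^{(k)})-f(x^*) + 2\mu\gamma_2^2(-\eta+\frac{\eta^2 L}{2\gamma_1^2})(f(x^{(k)})-f(x^*))\\
=& (1+2\mu\gamma_2^2(-\eta+\frac{\eta^2 L}{2\gamma_1^2}))(f(x^{(k)})-f(x^*))
\end{aligned}
\]
The second line uses \(\|\Delta x_{\text{sd}}\|_2\leq \|\Delta x_{\text{sd}}\|/\gamma_1=\|\nabla f(x^{(k)})\|_*/\gamma_1\). Steps \((i)\) and \((ii)\) hold for \(\eta\in(0,\frac{2\gamma_1^2}{L})\), where the coefficient \(-\eta+\frac{\eta^2 L}{2\gamma_1^2}\) is negative: \((i)\) uses \(\|\nabla f(x^{(k)})\|_*\geq\gamma_2\|\nabla f(x^{(k)})\|_2\) from Lemma 1, and \((ii)\) uses the strong convexity bound \(\|\nabla f(x)\|_2^2\geq 2\mu(f(x)-f(x^*))\). For such \(\eta\) we have \((1+2\mu\gamma_2^2(-\eta+\frac{\eta^2 L}{2\gamma_1^2}))<1\), so \(f(x^{(k)})-f(x^*)\) converges linearly.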
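The linear rate can be checked numerically. The sketch below runs steepest descent in the \(\ell_1\) norm (a coordinate-wise update) on a small quadratic \(f(x)=\frac12 x^TAx-b^Tx\) and compares successive gaps with the contraction factor \(1+2\mu\gamma_2^2(-\eta+\frac{\eta^2L}{2\gamma_1^2})\), which follows from \(\|\nabla f(x)\|_2^2\geq 2\mu(f(x)-f(x^*))\). The choices here are assumptions for illustration: \(\gamma_1=1\) and \(\gamma_2=1/\sqrt{n}\) for the \(\ell_1\) norm, the step size \(\eta=\gamma_1^2/L\), and the hypothetical function name `l1_steepest_descent`.

```python
import numpy as np

def l1_steepest_descent(A, b, x0, iters):
    """Steepest descent in the l1 norm on f(x) = 0.5 x^T A x - b^T x (A SPD)."""
    eigs = np.linalg.eigvalsh(A)          # ascending eigenvalues
    mu, L = eigs[0], eigs[-1]             # strong convexity / smoothness constants
    n = len(b)
    gamma1, gamma2 = 1.0, 1.0 / np.sqrt(n)
    eta = gamma1**2 / L                   # assumed step size inside (0, 2*gamma1^2/L)
    rate = 1.0 + 2.0 * mu * gamma2**2 * (-eta + eta**2 * L / (2.0 * gamma1**2))

    def f(x):
        return 0.5 * x @ A @ x - b @ x

    f_star = f(np.linalg.solve(A, b))
    x = np.array(x0, dtype=float)
    gaps = [f(x) - f_star]
    for _ in range(iters):
        g = A @ x - b
        i = int(np.argmax(np.abs(g)))
        dx_sd = np.zeros(n)
        dx_sd[i] = -g[i]                  # ||g||_inf * (-sign(g_i) * e_i)
        x = x + eta * dx_sd
        gaps.append(f(x) - f_star)
    return np.array(gaps), rate

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
gaps, rate = l1_steepest_descent(A, b, x0=[5.0, 5.0], iters=50)
```

Each entry of `gaps[1:]` should be at most `rate` times the previous entry; in practice the observed decrease is usually faster than this worst-case factor.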
Newton’s method
Iterate \(x^{(k+1)}=x^{(k)}+\eta \Delta x_{\text{nt}}\), where the Newton step \(\Delta x_{\text{nt}}\) solves \(\nabla^2 f(x^{(k)})\Delta x_{\text{nt}}+\nabla f(x^{(k)})=0\).
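In code, the Newton step is usually obtained by solving the linear system rather than forming the inverse Hessian. A minimal sketch, assuming a hypothetical quadratic \(f(x)=\frac12 x^TAx-b^Tx\) as the test function (so the gradient is \(Ax-b\) and the Hessian is \(A\)); the name `newton_step` is illustrative:

```python
import numpy as np

def newton_step(grad, hess):
    """Solve hess @ dx = -grad for the Newton step dx_nt."""
    return np.linalg.solve(hess, -grad)

# Hypothetical quadratic test problem: grad f(x) = A x - b, hess f(x) = A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x = np.array([5.0, 5.0])

dx_nt = newton_step(A @ x - b, A)
x_next = x + 1.0 * dx_nt  # full step, eta = 1
```

For a quadratic, one full step with \(\eta=1\) lands on the minimizer \(A^{-1}b\), which is a quick way to sanity-check an implementation.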
Convergence analysis
Assume \(f\) is \(L_1\)-smooth and \(\mu\)-strongly convex, and that \(\nabla^2 f\) is Lipschitz continuous with constant \(L_2\). Then \(\Delta x_{\text{nt}}=-\nabla^2 f(x^{(k)})^{-1}\nabla f(x^{(k)})\). All norms in this analysis are Euclidean.
\(1^{\circ}.\) when \(\|\nabla f(x^{(k)})\| >\gamma\),
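\[\begin{aligned}
f(x^{(k+1)})-f(x^*) \leq& f(x^{(k)})-f(x^*)+\eta\nabla f(x^{(k)})^T\Delta x_{\text{nt}}+\frac{\eta^2 L_1}{2}\|\Delta x_{\text{nt}}\|^2_2\\
=& f(x^{(k)})-f(x^*) - \eta \Delta x_{\text{nt}}^T \nabla^2 f(x^{(k)}) \Delta x_{\text{nt}}+\frac{\eta^2 L_1}{2}\|\Delta x_{\text{nt}}\|^2_2\\
\overset{(i)}{\leq}& f(x^{(k)})-f(x^*) - (\mu\eta-\frac{\eta^2 L_1}{2})\|\Delta x_{\text{nt}}\|^2_2\\
\overset{(ii)}{\leq}& f(x^{(k)})-f(x^*)-\frac{1}{L_1^2}(\mu\eta-\frac{\eta^2 L_1}{2})\|\nabla f(x^{(k)})\|^2\\
\leq& f(x^{(k)})-f(x^*) - \frac{\gamma^2}{L_1^2}(\mu\eta-\frac{\eta^2 L_1}{2})
\end{aligned}
\]
\((i)\) uses strong convexity, \(\Delta x_{\text{nt}}^T \nabla^2 f(x^{(k)}) \Delta x_{\text{nt}}\geq\mu\|\Delta x_{\text{nt}}\|_2^2\). \((ii)\) uses \(\|\Delta x_{\text{nt}}\|_2\geq\|\nabla f(x^{(k)})\|/L_1\) and requires \(\mu\eta-\frac{\eta^2 L_1}{2}>0\), i.e. \(\eta\in(0,\frac{2\mu}{L_1})\). Summing over the first \(k\) iterations, as long as all of them satisfy \(\|\nabla f\|>\gamma\),
\[\frac{1}{2 L_1}\|\nabla f(x^{(k)})\|^2\leq f(x^{(k)})-f(x^*)\leq f(x^{(0)})-f(x^*)-\frac{\gamma^2}{L_1^2}(\mu\eta-\frac{\eta^2 L_1}{2})k
\]
Since the left-hand side is nonnegative, this phase can last at most \(\frac{L_1^2(f(x^{(0)})-f(x^*))}{\gamma^2(\mu\eta-\eta^2 L_1/2)}\) iterations.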
\(2^{\circ}.\) when \(\|\nabla f(x^{(k)})\| \leq \gamma\), take the pure Newton step \(\eta=1\), so that \(x^{(k+1)}=x^{(k)}+\Delta x_{\text{nt}}\):
\[\begin{aligned}
\|\nabla f(x^{(k+1)})\| =& \|\int_0^1 \nabla^2 f(x^{(k)}+t\Delta x_{\text{nt}})\Delta x_{\text{nt}}~dt+\nabla f(x^{(k)})\|\\
=& \|\int_0^1 \nabla^2 f(x^{(k)}+t\Delta x_{\text{nt}})\Delta x_{\text{nt}}+\nabla f(x^{(k)})~dt\|\\
=& \|\int_0^1 \nabla^2 f(x^{(k)}+t\Delta x_{\text{nt}})(-\nabla^2 f(x^{(k)})^{-1}\nabla f(x^{(k)}))+\nabla f(x^{(k)})~dt\|\\
\leq& \int_0^1 \|(\nabla^2 f(x^{(k)}+t\Delta x_{\text{nt}})-\nabla^2 f(x^{(k)}))\nabla^2 f(x^{(k)})^{-1}\nabla f(x^{(k)}) \|dt\\
\leq& \int_0^1 L_2 \|\Delta x_{\text{nt}}\|^2t~dt\\
\leq& \frac{L_2}{2\mu^2}\|\nabla f(x^{(k)})\|^2
\end{aligned}
\]
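Multiplying both sides by \(\frac{L_2}{2\mu^2}\) and unrolling the recursion, with \(x^{(0)}\) re-indexed to the first iterate of this phase, gives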
\[\frac{L_2}{2\mu^2}\|\nabla f(x^{(k)})\|\leq (\frac{L_2}{2\mu^2}\|\nabla f(x^{(0)})\|)^{2^k}
\]
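If \(\gamma<\frac{2\mu^2}{L_2}\), the base on the right-hand side is less than \(1\), so the gradient norm converges quadratically. By strong convexity,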
\[f(x^{(k)})-f(x^*)\leq \frac{1}{2\mu}\|\nabla f(x^{(k)})\|^2\leq \frac{2\mu^3}{L_2^2}(\frac{L_2}{2\mu^2}\|\nabla f(x^{(0)})\|)^{2^{k+1}}
\]
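Both phases can be illustrated on a small test problem. The sketch below is one possible setup under stated assumptions, not the only way to run the method: the objective \(f(x)=\frac12 x^TAx+\sum_i\log(1+e^{x_i})\) is a hypothetical example for which \(\mu=\lambda_{\min}(A)\), \(L_1=\lambda_{\max}(A)+\frac14\) and \(L_2=\frac{1}{6\sqrt3}\) are valid constants; the threshold \(\gamma=\mu^2/L_2\) and the damped step size \(\eta=\mu/L_1\) are illustrative choices satisfying \(\gamma<\frac{2\mu^2}{L_2}\) and \(\eta\in(0,\frac{2\mu}{L_1})\).

```python
import numpy as np

A = np.array([[2.0, 0.5], [0.5, 1.0]])  # hypothetical SPD matrix

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def grad(x):
    # Gradient of 0.5 x^T A x + sum(log(1 + exp(x_i))).
    return A @ x + sigmoid(x)

def hess(x):
    s = sigmoid(x)
    return A + np.diag(s * (1.0 - s))

eigs = np.linalg.eigvalsh(A)
mu = eigs[0]                         # strong convexity constant
L1 = eigs[-1] + 0.25                 # smoothness constant (sigmoid' <= 1/4)
L2 = 1.0 / (6.0 * np.sqrt(3.0))      # Hessian Lipschitz constant (max |sigmoid''|)
gamma = mu**2 / L2                   # assumed threshold, below 2*mu^2/L2

def two_phase_newton(x0, tol=1e-10, max_iter=50):
    """Damped steps while ||grad|| > gamma, pure Newton steps afterwards."""
    x = np.array(x0, dtype=float)
    history = []                     # (phase, ||grad|| before, ||grad|| after)
    for _ in range(max_iter):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm < tol:
            break
        dx_nt = -np.linalg.solve(hess(x), g)
        if gnorm > gamma:
            phase, eta = "damped", mu / L1
        else:
            phase, eta = "pure", 1.0
        x = x + eta * dx_nt
        history.append((phase, gnorm, np.linalg.norm(grad(x))))
    return x, history

x_final, history = two_phase_newton([10.0, -10.0])
```

On every pure step recorded in `history`, the gradient norm after the step should be at most \(\frac{L_2}{2\mu^2}\) times the square of the norm before it, which is the one-step bound derived in case \(2^{\circ}\).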