BoydC9pt3


Steepest descent method

Def 4.1. normalized steepest descent direction:

\[\begin{aligned} \Delta x_{\text{nsd}} := &\arg \min\{\nabla f(x)^T v~|~\|v\|=1\} \\ = &\arg \min\{\nabla f(x)^T v~|~\|v\|\leq1\} \end{aligned} \]

Thus \(\nabla f(x)^T\Delta x_{\text{nsd}}=-\|\nabla f(x)\|_*\), where \(\|z\|_*:=\sup\{z^Tv~|~\|v\|\leq 1\}\) is the dual norm of \(\|\cdot\|\).

Def 4.2. unnormalized steepest descent step:

\[\Delta x_{\text{sd}} := \|\nabla f(x)\|_*\Delta x_{\text{nsd}} \]

Thus \(\nabla f(x)^T\Delta x_{\text{sd}}=-\|\nabla f(x)\|_*^2\) and \(\|\Delta x_{\text{sd}}\|= \|\nabla f(x)\|_*\), since \(\|\Delta x_{\text{nsd}}\|=1\).
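To make the two definitions concrete, here is a minimal sketch for one particular choice of norm, the \(\ell_1\) norm, whose dual norm is \(\ell_\infty\). In that case the normalized direction is a signed coordinate vector along the largest gradient component. The function name `steepest_descent_l1`, the helper `dot`, and the sample gradient `g` are illustrative assumptions, not part of the original notes.

```python
import math

def steepest_descent_l1(grad):
    """Steepest descent quantities for the l1 norm (dual norm: l-infinity).

    Assumes grad is a nonzero gradient given as a list of floats.
    Returns (dx_nsd, dx_sd, dual_norm).
    """
    # Index of the largest gradient component in absolute value.
    i = max(range(len(grad)), key=lambda j: abs(grad[j]))
    dual_norm = abs(grad[i])          # ||grad||_* = ||grad||_inf
    # Normalized direction: a signed unit coordinate vector (l1 norm 1).
    dx_nsd = [0.0] * len(grad)
    dx_nsd[i] = -math.copysign(1.0, grad[i])
    # Unnormalized step: scale the direction by the dual norm.
    dx_sd = [dual_norm * v for v in dx_nsd]
    return dx_nsd, dx_sd, dual_norm

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

g = [1.0, -3.0, 2.0]                  # hypothetical gradient value
dx_nsd, dx_sd, dual = steepest_descent_l1(g)
print("grad^T dx_nsd =", dot(g, dx_nsd), " -||grad||_* =", -dual)
print("grad^T dx_sd  =", dot(g, dx_sd), " -||grad||_*^2 =", -dual ** 2)
```

Both identities stated after Def 4.1 and Def 4.2 can be read off from the two printed lines.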

Convergence analysis

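Assume \(f\) is \(L\)-smooth and \(\mu\)-strongly convex (both with respect to \(\|\cdot\|_2\)), and iterate \(x^{(k+1)}=x^{(k)}+\eta \Delta x_{\text{sd}}\) with a fixed step size \(\eta\).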

Lemma 1. There exist constants \(\gamma_1, \gamma_2 \in (0,1]\) such that, for all \(x\),

\[\|x\|\geq \gamma_1\|x\|_2,\quad\|x\|_*\geq \gamma_2\|x\|_2 \]
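As an illustration (an example chosen here, not taken from the original notes), let \(\|\cdot\|=\|\cdot\|_1\) on \(\mathbf{R}^n\), whose dual norm is \(\|\cdot\|_\infty\). Then

\[\|x\|_1\geq \|x\|_2,\qquad \|x\|_\infty\geq \frac{1}{\sqrt{n}}\|x\|_2, \]

so Lemma 1 holds with \(\gamma_1=1\) and \(\gamma_2=1/\sqrt{n}\).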

Hence, by \(L\)-smoothness and \(\|\Delta x_{\text{sd}}\|_2\leq \|\Delta x_{\text{sd}}\|/\gamma_1=\|\nabla f(x^{(k)})\|_*/\gamma_1\), we have

\[\begin{aligned} f(x^{(k+1)})-f(x^*) \leq& f(x^{(k)})-f(x^*)+\eta\nabla f(x^{(k)})^T\Delta x_{\text{sd}}+\frac{\eta^2 L}{2}\|\Delta x_{\text{sd}}\|^2_2\\ {\leq} & f(x^{(k)})-f(x^*) + (-\eta+\frac{\eta^2 L}{2\gamma_1^2})\|\nabla f(x^{(k)})\|^2_*\\ \overset{(i)}{\leq} & f(x^{(k)})-f(x^*) + \gamma_2^2(-\eta+\frac{\eta^2 L}{2\gamma_1^2})\|\nabla f(x^{(k)})\|^2_2\\ \leq& f(x^{(k)})-f(x^*) + 2\mu\gamma_2^2(-\eta+\frac{\eta^2 L}{2\gamma_1^2})(f(x^{(k)})-f(x^*))\\ =& (1+2\mu\gamma_2^2(-\eta+\frac{\eta^2 L}{2\gamma_1^2}))(f(x^{(k)})-f(x^*)) \end{aligned} \]

\((i)\) and the step after it hold for \(\eta\in(0,\frac{2\gamma_1^2}{L})\), where the coefficient \(-\eta+\frac{\eta^2 L}{2\gamma_1^2}\) is negative; the step after \((i)\) uses strong convexity in the form \(\|\nabla f(x)\|_2^2\geq 2\mu(f(x)-f(x^*))\). For such \(\eta\), \((1+2\mu\gamma_2^2(-\eta+\frac{\eta^2 L}{2\gamma_1^2}))<1\), so \(f(x^{(k)})-f(x^*)\) converges linearly.
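The linear rate can be checked numerically. The sketch below is an illustration under assumptions made here: a diagonal quadratic \(f(x)=\frac12\sum_i a_ix_i^2\) (so \(f(x^*)=0\), \(\mu=\min_i a_i\), \(L=\max_i a_i\)), the \(\ell_1\) norm (so \(\gamma_1=1\), \(\gamma_2=1/\sqrt{n}\)), and the step size \(\eta=\gamma_1^2/L\). The contraction factor `c` is computed from strong convexity in the form \(\|\nabla f(x)\|_2^2\geq 2\mu(f(x)-f(x^*))\).

```python
import math

a = [1.0, 4.0, 10.0]                  # f(x) = 0.5 * sum(a_i * x_i^2), f* = 0
mu, L, n = min(a), max(a), len(a)
gamma1, gamma2 = 1.0, 1.0 / math.sqrt(n)   # Lemma 1 constants for the l1 norm
eta = gamma1 ** 2 / L                 # inside (0, 2 * gamma1^2 / L)

def f(x):
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))

def grad(x):
    return [ai * xi for ai, xi in zip(a, x)]

# Per-step contraction factor, using ||grad||_2^2 >= 2 * mu * (f - f*).
c = 1.0 + 2.0 * mu * gamma2 ** 2 * (-eta + eta ** 2 * L / (2.0 * gamma1 ** 2))

x = [1.0, 1.0, 1.0]
ratios = []
for _ in range(50):
    g = grad(x)
    # l1 steepest descent: dx_sd = -g_i * e_i for the largest |g_i|.
    i = max(range(n), key=lambda j: abs(g[j]))
    f_old = f(x)
    x[i] -= eta * g[i]
    ratios.append(f(x) / f_old)

print("contraction factor c:", c)
print("largest observed ratio f(x_next) / f(x):", max(ratios))
```

With the \(\ell_1\) norm the iteration is a greedy coordinate descent, and every observed ratio should stay at or below `c`.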

Newton’s method

\(x^{(k+1)}=x^{(k)}+\eta \Delta x_{\text{nt}}\) , where \(\nabla^2 f(x^{(k)})\Delta x_{\text{nt}}+\nabla f(x^{(k)})=0\) .
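A minimal sketch of computing the Newton step by solving the linear system above rather than inverting the Hessian. The test function \(f(x)=e^{x_1+x_2}+x_1^2+x_2^2\), the function names, and the sample point are assumptions chosen for illustration; the \(2\times 2\) system is solved with Cramer's rule to keep the block dependency-free.

```python
import math

def grad(x):
    # Gradient of the illustrative f(x) = exp(x1 + x2) + x1^2 + x2^2.
    e = math.exp(x[0] + x[1])
    return [e + 2.0 * x[0], e + 2.0 * x[1]]

def hess(x):
    # Hessian of the same f; positive definite with smallest eigenvalue 2.
    e = math.exp(x[0] + x[1])
    return [[e + 2.0, e], [e, e + 2.0]]

def newton_step(x):
    """Solve hess(x) * dx = -grad(x) for a 2x2 system via Cramer's rule."""
    g, H = grad(x), hess(x)
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]   # > 0 for positive definite H
    dx0 = (-g[0] * H[1][1] + g[1] * H[0][1]) / det
    dx1 = (-g[1] * H[0][0] + g[0] * H[1][0]) / det
    return [dx0, dx1]

x = [1.0, -0.5]                        # hypothetical current iterate
dx = newton_step(x)
g, H = grad(x), hess(x)
# Residual of the defining equation: hess * dx + grad should be ~0.
residual = [H[r][0] * dx[0] + H[r][1] * dx[1] + g[r] for r in range(2)]
slope = g[0] * dx[0] + g[1] * dx[1]    # grad^T dx, negative for a descent direction
print("dx_nt =", dx)
print("residual =", residual, " grad^T dx_nt =", slope)
```

For larger problems one would typically factor the Hessian (for example with a Cholesky factorization) instead of using a closed-form solve.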

Convergence analysis

Assume \(f\) is \(L_1\)-smooth and \(\mu\)-strongly convex, and \(\nabla^2 f\) is Lipschitz continuous with constant \(L_2\). Then \(\Delta x_{\text{nt}}=-\nabla^2 f(x^{(k)})^{-1}\nabla f(x^{(k)})\). All norms below are Euclidean.

\(1^{\circ}.\) When \(\|\nabla f(x^{(k)})\| >\gamma\),

\[\begin{aligned} f(x^{(k+1)})-f(x^*) \leq& f(x^{(k)})-f(x^*)+\eta\nabla f(x^{(k)})^T\Delta x_{\text{nt}}+\frac{\eta^2 L_1}{2}\|\Delta x_{\text{nt}}\|^2_2\\ =& f(x^{(k)})-f(x^*) - \eta \Delta x_{\text{nt}}^T \nabla^2 f(x^{(k)}) \Delta x_{\text{nt}}+\frac{\eta^2 L_1}{2}\|\Delta x_{\text{nt}}\|^2_2\\ \leq& f(x^{(k)})-f(x^*) - (\mu\eta-\frac{\eta^2 L_1}{2})\|\Delta x_{\text{nt}}\|^2_2\\ \overset{(i)}{\leq}& f(x^{(k)})-f(x^*)-\frac{1}{L_1^2}(\mu\eta-\frac{\eta^2 L_1}{2})\|\nabla f(x^{(k)})\|^2\\ \leq& f(x^{(k)})-f(x^*) - \frac{\gamma^2}{L_1^2}(\mu\eta-\frac{\eta^2 L_1}{2}) \end{aligned} \]

\((i)\) and the last step hold when \(\eta\in(0,\frac{2\mu}{L_1})\), since then \(\mu\eta-\frac{\eta^2 L_1}{2}>0\) and \(\|\Delta x_{\text{nt}}\|_2\geq \frac{1}{L_1}\|\nabla f(x^{(k)})\|\). If the first \(k\) iterates all satisfy \(\|\nabla f\|>\gamma\), summing the per-step decrease and using \(L_1\)-smoothness for the lower bound gives

\[\frac{1}{2 L_1}\|\nabla f(x^{(k)})\|^2\leq f(x^{(k)})-f(x^*)\leq f(x^{(0)})-f(x^*)-\frac{\gamma^2}{L_1^2}(\mu\eta-\frac{\eta^2 L_1}{2})k \]

so this phase can last only finitely many iterations.
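As a worked instance (the step size is chosen here for illustration, with \(L_1\) the smoothness constant from the assumptions), take \(\eta=\mu/L_1\), which maximizes \(\mu\eta-\frac{\eta^2L_1}{2}\). Since the left-hand side above is nonnegative, the number \(k\) of iterations with \(\|\nabla f\|>\gamma\) satisfies

\[\mu\eta-\frac{\eta^2 L_1}{2}=\frac{\mu^2}{2L_1},\qquad k\leq \frac{2L_1^3\,(f(x^{(0)})-f(x^*))}{\gamma^2\mu^2}. \]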

\(2^{\circ}.\) When \(\|\nabla f(x^{(k)})\| \leq \gamma\), take the full step \(\eta=1\), so that \(x^{(k+1)}=x^{(k)}+\Delta x_{\text{nt}}\):

\[\begin{aligned} \|\nabla f(x^{(k+1)})\| =& \|\int_0^1 \nabla^2 f(x^{(k)}+t\Delta x_{\text{nt}})\Delta x_{\text{nt}}~dt+\nabla f(x^{(k)})\|\\ =& \|\int_0^1 \left[\nabla^2 f(x^{(k)}+t\Delta x_{\text{nt}})\Delta x_{\text{nt}}+\nabla f(x^{(k)})\right]dt\|\\ =& \|\int_0^1 \left[\nabla^2 f(x^{(k)}+t\Delta x_{\text{nt}})(-\nabla^2 f(x^{(k)})^{-1}\nabla f(x^{(k)}))+\nabla f(x^{(k)})\right]dt\|\\ \leq& \int_0^1 \|(\nabla^2 f(x^{(k)}+t\Delta x_{\text{nt}})-\nabla^2 f(x^{(k)}))\nabla^2 f(x^{(k)})^{-1}\nabla f(x^{(k)}) \|~dt\\ \leq& \int_0^1 L_2 \|\Delta x_{\text{nt}}\|^2t~dt = \frac{L_2}{2}\|\Delta x_{\text{nt}}\|^2\\ \leq& \frac{L_2}{2\mu^2}\|\nabla f(x^{(k)})\|^2 \end{aligned} \]

Multiplying both sides by \(\frac{L_2}{2\mu^2}\) and applying the bound recursively gives

\[\frac{L_2}{2\mu^2}\|\nabla f(x^{(k)})\|\leq (\frac{L_2}{2\mu^2}\|\nabla f(x^{(0)})\|)^{2^k} \]

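Here \(k\) is counted from the first iterate with \(\|\nabla f(x^{(0)})\|\leq\gamma\). If \(\gamma<\frac{2\mu^2}{L_2}\), the base on the right-hand side is less than 1, so the gradient norm keeps decreasing and converges quadratically. By strong convexity,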

\[f(x^{(k)})-f(x^*)\leq \frac{1}{2\mu}\|\nabla f(x^{(k)})\|^2\leq \frac{2\mu^3}{L_2^2}(\frac{L_2}{2\mu^2}\|\nabla f(x^{(0)})\|)^{2^{k+1}} \]
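The quadratic phase can be observed numerically. The sketch below is an illustration under assumptions made here: the test function \(f(x)=\log\cosh(x_1+x_2)+x_1^2+x_2^2-x_1\), for which \(\mu=2\) and, by bounding the derivative of \(\operatorname{sech}^2\), the Hessian is Lipschitz with \(L_2=\frac{8\sqrt2}{3\sqrt3}\). It runs pure Newton steps (\(\eta=1\)) and compares each gradient norm with the one-step bound \(\frac{L_2}{2\mu^2}\|\nabla f(x^{(k)})\|^2\).

```python
import math

MU = 2.0                                              # Hessian is 2*I + s*[[1,1],[1,1]], s in (0, 1]
L2 = 8.0 * math.sqrt(2.0) / (3.0 * math.sqrt(3.0))    # Lipschitz constant of the Hessian
C = L2 / (2.0 * MU ** 2)                              # constant in the one-step bound

def grad(x):
    t = math.tanh(x[0] + x[1])
    return [t + 2.0 * x[0] - 1.0, t + 2.0 * x[1]]

def hess(x):
    s = 1.0 / math.cosh(x[0] + x[1]) ** 2             # sech^2(x1 + x2)
    return [[s + 2.0, s], [s, s + 2.0]]

def norm2(v):
    return math.sqrt(v[0] ** 2 + v[1] ** 2)

x = [1.0, 1.0]                  # start with C * ||grad|| < 1, i.e. inside the quadratic phase
norms = [norm2(grad(x))]
for _ in range(6):
    g, H = grad(x), hess(x)
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    # Newton step: solve H * dx = -g (Cramer's rule for the 2x2 system).
    dx = [(-g[0] * H[1][1] + g[1] * H[0][1]) / det,
          (-g[1] * H[0][0] + g[0] * H[1][0]) / det]
    x = [x[0] + dx[0], x[1] + dx[1]]                  # full step, eta = 1
    norms.append(norm2(grad(x)))

for k in range(len(norms) - 1):
    print(k, "||grad|| =", norms[k], " bound on next =", C * norms[k] ** 2)
print(len(norms) - 1, "||grad|| =", norms[-1])
```

Each printed gradient norm should fall at or below the bound printed on the previous line, until floating-point precision becomes the limiting factor.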

posted @ 2025-09-01 13:30  p0q