BoydC9pt1

Unconstrained minimization problems
Terminology and assumptions
\(\mu\)-strongly convex (\(\mu\)-SC), where \(\mu>0\):
- \(f(\theta x+(1-\theta)y)\leq \theta f(x)+(1-\theta)f(y)-\frac{\mu}{2}\theta(1-\theta)\|x-y\|^2\), \(\theta\in[0,1]\)
- If \(f\) is differentiable, then \[f(y)\geq f(x)+\nabla f(x)^T(y-x)+\frac{\mu}{2}\|y-x\|^2 \]
- If \(f\) is differentiable, then \[f(y)\leq f(x)+\nabla f(x)^T(y-x)+\frac{1}{2\mu}\|\nabla f(y)-\nabla f(x)\|^2 \]In particular, replacing \((x,y)\) by \((x^*,x)\), where \(x^*\) is the minimizer of \(f\) and hence \(\nabla f(x^*)=0\), \[f(x)-f(x^*)\leq \frac{1}{2\mu} \|\nabla f(x)\|^2 \]
- If \(f\) is differentiable, then \[(\nabla f(y)-\nabla f(x))^T(y-x) \geq \mu\|y-x\|^2 \]Thus, by the Cauchy-Schwarz inequality, \(\|\nabla f(y)-\nabla f(x)\|\geq\mu \|y-x\|\).
- If \(f\) is twice differentiable, then\[\nabla^2 f(x) \succeq \mu I \]
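The inequalities listed above can be sanity-checked numerically. The following is a minimal sketch, assuming the test function \(f(x)=\frac{1}{2}\sum_i a_i x_i^2\) (not part of the original notes), whose Hessian is \(\mathrm{diag}(a)\), so that \(\mu=\min_i a_i\) and \(x^*=0\). The names `A_DIAG`, `MU` and `check_strong_convexity` are hypothetical helpers introduced only for this illustration.

```python
import random

# Assumed test function: f(x) = 0.5 * sum(a_i * x_i^2), so the Hessian is diag(a)
# and f is mu-strongly convex with mu = min(a). Its minimizer is x* = 0, f(x*) = 0.
A_DIAG = [1.0, 3.0, 10.0]
MU = min(A_DIAG)


def f(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(A_DIAG, x))


def grad(x):
    return [a * xi for a, xi in zip(A_DIAG, x)]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def sub(u, v):
    return [ui - vi for ui, vi in zip(u, v)]


def check_strong_convexity(trials=1000, seed=0, tol=1e-9):
    """Return True if every sampled pair (x, y) satisfies the listed inequalities."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-5, 5) for _ in A_DIAG]
        y = [rng.uniform(-5, 5) for _ in A_DIAG]
        d = sub(y, x)                # y - x
        gd = sub(grad(y), grad(x))   # grad f(y) - grad f(x)
        linear = f(x) + dot(grad(x), d)
        ok = (
            f(y) >= linear + 0.5 * MU * dot(d, d) - tol        # first-order lower bound
            and f(y) <= linear + dot(gd, gd) / (2 * MU) + tol  # gradient-difference upper bound
            and dot(gd, d) >= MU * dot(d, d) - tol             # strong monotonicity of the gradient
            and f(x) <= dot(grad(x), grad(x)) / (2 * MU) + tol # f(x) - f(x*) bound, with f(x*) = 0
        )
        if not ok:
            return False
    return True


print(check_strong_convexity())
```

A passing check on random samples is only evidence for this one function, not a proof; the proofs follow below.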
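Proof.
Lemma 3. Minimize both sides of the first-order lower bound (Lemma 2) over \(y\); the right-hand side is minimized at \(y=x-\frac{1}{\mu}\nabla f(x)\):
\[\begin{aligned} & f(y)\geq f(x)+\nabla f(x)^T(y-x)+\frac{\mu}{2}\|y-x\|^2\\ \implies{}& \inf_y f(y)\geq \inf_y \left\{f(x)+\nabla f(x)^T(y-x)+\frac{\mu}{2}\|y-x\|^2\right\}\\ \implies{}& f(x^*) \geq f(x)-\frac{1}{2\mu} \|\nabla f(x)\|^2 \end{aligned} \]
For fixed \(y\), let \(g_{y}(x)=f(x)-\nabla f(y)^T x\). Then \(g_y\) is also \(\mu\)-strongly convex, with \(\nabla g_y(x)=\nabla f(x)-\nabla f(y)\), so \(g_y\) is minimized at \(x=y\), where \(g_y(y)=f(y)-\nabla f(y)^T y\). Applying the inequality just proved to \(g_y\) gives \(g_y(x)-g_y(y)\leq\frac{1}{2\mu}\|\nabla f(x)-\nabla f(y)\|^2\), that is,
\[f(x)\leq f(y)+\nabla f(y)^T(x-y)+\frac{1}{2\mu}\|\nabla f(x)-\nabla f(y)\|^2 \]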
\(L\)-smooth: a differentiable \(f:\mathbb{R}^n\to\mathbb{R}\) is \(L\)-smooth if, for all \(x,y\in \mathbb{R}^n\),
\[\|\nabla f(y)-\nabla f(x)\|\leq L \|y-x\|, \]
where \(\|\cdot\|\) denotes the Euclidean norm \(\|\cdot\|_2\). The following lemmas then hold:
- Gradient \(L\)-Lipschitz continuity.
- Gradient inner product inequality:\[ \left<\nabla f(y)-\nabla f(x),y-x \right> \leq L\|y-x\|^2 \]
- Descent lemma:\[f(y)\leq f(x)+\nabla f(x)^T(y-x)+\frac{L}{2}\|y-x\|^2 \]
- \(f(\theta x + (1 - \theta)y) \geq \theta f(x) + (1 - \theta)f(y) - \frac{L}{2} \theta(1 - \theta)\|x - y\|^2\), where \(\theta\in [0,1]\).
- If \(f\) is twice differentiable, then\[\nabla^2 f(x) \preceq L I \]
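None of the lemmas above requires convexity. As a rough illustration, the sketch below checks the first four of them on \(f(x)=\sin x\), an assumed one-dimensional test function that is not convex but satisfies \(|f''(x)|\leq 1\), so \(L=1\). The function `check_smoothness` and the constant `L_CONST` are hypothetical names used only here.

```python
import math
import random

# Assumed test function: f(x) = sin(x). It is not convex, but |f''| <= 1,
# so its derivative cos(x) is Lipschitz with constant L = 1.
L_CONST = 1.0


def check_smoothness(trials=1000, seed=0, tol=1e-9):
    """Return True if every sampled (x, y, theta) satisfies the four inequalities."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(-10, 10), rng.uniform(-10, 10)
        theta = rng.random()
        d = y - x
        gd = math.cos(y) - math.cos(x)        # f'(y) - f'(x)
        mid = theta * x + (1 - theta) * y

        lipschitz = abs(gd) <= L_CONST * abs(d) + tol
        inner_product = gd * d <= L_CONST * d * d + tol
        descent = math.sin(y) <= (
            math.sin(x) + math.cos(x) * d + 0.5 * L_CONST * d * d + tol
        )
        interpolation = math.sin(mid) >= (
            theta * math.sin(x) + (1 - theta) * math.sin(y)
            - 0.5 * L_CONST * theta * (1 - theta) * d * d - tol
        )
        if not (lipschitz and inner_product and descent and interpolation):
            return False
    return True


print(check_smoothness())
```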
If \(f\) is also convex, then
- First-order lower bound:\[ f(y)\geq f(x)+\nabla f(x)^T(y-x)+\frac{1}{2L}\|\nabla f(y) - \nabla f(x)\|^2 \]
- Gradient inner product inequality (co-coercivity):\[\frac{1}{L}\|\nabla f(y)-\nabla f(x)\|^2 \leq \left<\nabla f(y)-\nabla f(x),y-x \right> \]
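The two inequalities for convex \(L\)-smooth functions can be checked in the same way. This sketch assumes the softplus function \(f(x)=\log(1+e^x)\) as a test case (it does not appear in the original notes); its derivative is the logistic sigmoid \(s(x)\) and \(f''(x)=s(x)(1-s(x))\leq\frac{1}{4}\), so \(L=\frac{1}{4}\). The name `check_convex_smooth` is hypothetical.

```python
import math
import random

# Assumed test function: softplus f(x) = log(1 + e^x), convex with f'' <= 1/4.
L_CONST = 0.25


def f(x):
    return math.log1p(math.exp(x))


def df(x):
    return 1.0 / (1.0 + math.exp(-x))  # logistic sigmoid, the derivative of softplus


def check_convex_smooth(trials=1000, seed=0, tol=1e-12):
    """Return True if every sampled pair satisfies both convex L-smooth inequalities."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(-8, 8), rng.uniform(-8, 8)
        d = y - x
        gd = df(y) - df(x)
        # (1/L) |f'(y) - f'(x)|^2 <= (f'(y) - f'(x)) (y - x)
        cocoercive = gd * gd / L_CONST <= gd * d + tol
        # f(y) >= f(x) + f'(x)(y - x) + (1/(2L)) |f'(y) - f'(x)|^2
        lower_bound = f(y) >= f(x) + df(x) * d + gd * gd / (2 * L_CONST) - tol
        if not (cocoercive and lower_bound):
            return False
    return True


print(check_convex_smooth())
```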
Proof.
Lemma 2.
\[\begin{aligned} \left<\nabla f(y)-\nabla f(x),y-x \right> \leq &\|\nabla f(y)-\nabla f(x)\|\cdot\|y-x\| && \text{(Cauchy-Schwarz)}\\ \leq & L \|y-x\|^2 && \text{(Lemma 1)} \end{aligned} \]
Lemma 3.
\[\begin{aligned} &f(y)-f(x)\\ =& \int_0^1 \left<\nabla f(x+t(y-x)),y-x\right> dt\\ =& \int_0^1 \left<\nabla f(x+t(y-x))-\nabla f(x),y-x\right> dt+\left<\nabla f(x),y-x\right>\\ \leq& \int_0^1 tL \|y-x\|^2 dt+\left<\nabla f(x),y-x\right>\\ =& \nabla f(x)^T(y-x)+\frac{L}{2}\|y-x\|^2 \end{aligned} \]
The inequality applies Lemma 2 to the points \(x+t(y-x)\) and \(x\), whose difference is \(t(y-x)\), and then divides by \(t\).
Lemma 4. For \(\theta \in[0,1]\), let \(\tilde{x} = \theta x+(1-\theta)y\). By the descent lemma,
\[\begin{align} f(y)\leq& f(\tilde{x})+\nabla f(\tilde{x})^T(y-\tilde{x})+\frac{L}{2}\|y-\tilde{x}\|^2\nonumber\\ =& f(\tilde{x})+\theta\nabla f(\tilde{x})^T(y-x)+\frac{\theta^2L}{2}\|y-x\|^2\\ f(x)\leq& f(\tilde{x})+\nabla f(\tilde{x})^T(x-\tilde{x})+\frac{L}{2}\|x-\tilde{x}\|^2\nonumber\\ =& f(\tilde{x})-(1-\theta)\nabla f(\tilde{x})^T(y-x)+\frac{(1-\theta)^2L}{2}\|y-x\|^2 \end{align} \]Taking \((1-\theta)\times(1)+\theta\times(2)\), the gradient terms cancel and \((1-\theta)\theta^2+\theta(1-\theta)^2=\theta(1-\theta)\), so
\[\theta f(x) + (1 - \theta)f(y) \leq f(\theta x + (1 - \theta)y)+ \frac{L}{2} \theta(1 - \theta)\|x - y\|^2 \]
Lemma 6.
Fix \(x_1,x_2\). For all \(x\),\[\begin{aligned} f(x) &\geq f(x_1)+\nabla f(x_1)^T(x-x_1)\triangleq g_1(x)\quad && \text{(convexity)}\\ f(x) &\leq f(x_2)+\nabla f(x_2)^T(x-x_2) + \frac{L}{2} \|x-x_2\|^2\triangleq g_2(x) && \text{(descent lemma)} \end{aligned} \]Thus \(g_2(x)\geq f(x)\geq g_1(x)\) for all \(x\), and
\[\begin{aligned} &\inf (g_2(x)-g_1(x))\\ =&\inf \{\frac{L}{2} \|x-x_2\|^2+(\nabla f(x_2)-\nabla f(x_1))^Tx\\ &+f(x_2)-f(x_1)-\nabla f(x_2)^Tx_2+\nabla f(x_1)^Tx_1\}\\ \geq &0 \end{aligned} \]The function \(g_2-g_1\) is a strongly convex quadratic in \(x\), so the infimum is attained where its gradient vanishes:
\[\begin{aligned} &\nabla (g_2(x)-g_1(x))=L(x-x_2)+\nabla f(x_2)-\nabla f(x_1)=0\\ \implies & x^* = x_2+\frac{\nabla f(x_1)-\nabla f(x_2)}{L} \end{aligned} \]Hence
\[\begin{aligned} &\inf (g_2(x)-g_1(x)) = g_2(x^*)-g_1(x^*)\\ =& -\frac{1}{2L}\|\nabla f(x_2)-\nabla f(x_1)\|^2-\nabla f(x_1)^T(x_2-x_1)+ f(x_2)-f(x_1) \\ \geq& 0 \end{aligned} \]which means
\[f(x_2)\geq f(x_1)+\nabla f(x_1)^T(x_2-x_1)+\frac{1}{2L}\|\nabla f(x_2)-\nabla f(x_1)\|^2 \]
Lemma 7. By Lemma 6, applied once as stated and once with \(x_1\) and \(x_2\) exchanged,
\[\begin{aligned} f(x_2)\geq& f(x_1)+\nabla f(x_1)^T(x_2-x_1)+\frac{1}{2L}\|\nabla f(x_2)-\nabla f(x_1)\|^2\\ f(x_1)\geq& f(x_2)+\nabla f(x_2)^T(x_1-x_2)+\frac{1}{2L}\|\nabla f(x_2)-\nabla f(x_1)\|^2 \end{aligned} \]Adding the two inequalities and rearranging gives
\[(\nabla f(x_2)-\nabla f(x_1))^T(x_2-x_1)\geq \frac{1}{L}\|\nabla f(x_2)-\nabla f(x_1)\|^2 \]
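The construction used in the proof of the first-order lower bound (minimizing \(g_2-g_1\) at \(x^*\)) can be traced numerically. The sketch below again assumes the softplus function \(f(x)=\log(1+e^x)\) with \(L=\frac{1}{4}\) as a one-dimensional test case; `gap_at_minimizer` is a hypothetical helper. It evaluates \(g_2(x^*)-g_1(x^*)\) directly and compares it with the closed form derived above.

```python
import math

L_CONST = 0.25  # smoothness constant of softplus, since f''(x) <= 1/4


def f(x):
    return math.log1p(math.exp(x))  # softplus, an assumed convex test function


def df(x):
    return 1.0 / (1.0 + math.exp(-x))  # its derivative, the logistic sigmoid


def gap_at_minimizer(x1, x2):
    """Return (x_star, g2(x_star) - g1(x_star), closed_form) for the proof's construction."""
    # Stationary point of g2 - g1: L (x - x2) + f'(x2) - f'(x1) = 0.
    x_star = x2 + (df(x1) - df(x2)) / L_CONST
    g1 = f(x1) + df(x1) * (x_star - x1)
    g2 = f(x2) + df(x2) * (x_star - x2) + 0.5 * L_CONST * (x_star - x2) ** 2
    closed_form = (
        f(x2) - f(x1) - df(x1) * (x2 - x1)
        - (df(x2) - df(x1)) ** 2 / (2 * L_CONST)
    )
    return x_star, g2 - g1, closed_form


x_star, gap, closed_form = gap_at_minimizer(-1.0, 2.0)
print(x_star, gap, closed_form)
```

The direct evaluation and the closed form should agree up to rounding, and both should be nonnegative, which is exactly the lower bound in rearranged form.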

1st order methods