[SI152 Notes] Part 4: Unconstrained Nonlinear Programming
SI152: Numerical Optimization
Lec 9: Nonlinear Programming, Line Search Method
Fundamentals for nonlinear unconstrained optimization
Theorem 1 (Mean Value Theorem)
Given \(f \in C^1\), \(x \in\mathbb{R}^n\), and \(d\in\mathbb{R}^n\), there exists \(\alpha\in(0, 1)\) such that $$f(x + d) = f(x) + \nabla f(x + \alpha d)^T d$$
Theorem 2 (Taylor’s Theorem)
Given \(f \in C^2\), \(x \in\mathbb{R}^n\), and \(d\in\mathbb{R}^n\), there exists \(\alpha\in(0, 1)\) such that $$f(x + d) = f(x) + \nabla f(x)^T d + \frac{1}{2}d^T \nabla^2 f(x + \alpha d) d$$
Definition 3 (Convex function)
A function \(f : \mathbb{R}^n \to \mathbb{R}\) is convex if for all \(\{x_1, x_2\} \subset \mathbb{R}^n\) and \(\alpha \in [0, 1]\) we have $$f(\alpha x_1 + (1 - \alpha) x_2) \leq \alpha f(x_1) + (1 - \alpha) f(x_2).$$
- \(f\) is concave if \(−f\) is convex.
- strictly convex: if for \(x_1 \neq x_2\) and \(\alpha \in (0, 1)\), the above inequality holds strictly.
- If \(f : \mathbb{R}^n \to \mathbb{R}\) is convex, then it is continuous.
- Addition, pointwise maximization, and composition (under suitable conditions, e.g. with an affine map) preserve convexity, as illustrated below.
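For example (a quick check added here, not from the lecture): if \(f_1, f_2\) are convex and \(h(x) := \max\{f_1(x), f_2(x)\}\), then for any \(\{x_1, x_2\}\subset\mathbb{R}^n\) and \(\alpha\in[0,1]\), $$h(\alpha x_1 + (1-\alpha)x_2) \leq \max_{i}\big\{\alpha f_i(x_1) + (1-\alpha) f_i(x_2)\big\} \leq \alpha h(x_1) + (1-\alpha) h(x_2),$$ so \(h\) is convex.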
Definition 4
The epigraph of \(f\) is \(\mathsf{epi}(f) := \{(x, z) : x \in \mathcal{X}, z \in \mathbb{R}, \text{and } f(x) \leq z\}\).
\(\mathsf{dom}(f) := \{x \mid x\in \mathcal{X} \text{ and } f(x) < \infty \}\).
Theorem 5
Let \(\mathcal{X}\) be a nonempty convex subset of \(\mathbb{R}^n\) and let \(f : \mathbb{R}^n \to \mathbb{R}\) be differentiable over an open set containing \(\mathcal{X}\). Then, the following hold:
- (a) \(f\) is convex over \(\mathcal{X}\) if and only if, for all \(\{x_1, x_2\} \subset \mathcal{X}\), we have $$f(x_2) \geq f(x_1) + \nabla f(x_1)^T (x_2 - x_1).$$
- (b) \(f\) is strictly convex over \(\mathcal{X}\) if and only if the above inequality is strict whenever \(x_1 \neq x_2\).
Theorem 6
Let \(\mathcal{X}\) be a nonempty convex subset of \(\mathbb{R}^n\) and let \(f : \mathbb{R}^n \to \mathbb{R}\) be twice continuously differentiable over an open set containing \(\mathcal{X}\). Then, the following hold:
- (a) If \(\nabla^2 f(x)\) is positive semi-definite for all \(x\in\mathcal{X}\), then \(f\) is convex over \(\mathcal{X}\).
- (b) If \(\nabla^2 f(x)\) is positive definite for all \(x\in\mathcal{X}\), then \(f\) is strictly convex over \(\mathcal{X}\).
- (c) If \(\mathcal{X}\) is open and \(f\) is convex over \(\mathcal{X}\), then \(\nabla^2 f(x)\) is positive semi-definite for all \(x\in\mathcal{X}\).
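A small worked example (added for illustration): \(f(x) = x_1^2 + x_1 x_2 + x_2^2\) has $$\nabla^2 f(x) = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},$$ whose eigenvalues are \(1\) and \(3\), so the Hessian is positive definite everywhere and \(f\) is strictly convex by (b).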
Definition 7 (Subgradient and Subdifferential)
A vector \(g \in \mathbb{R}^n\) is a subgradient of a proper convex \(f\) at \(x \in \mathsf{dom}(f)\) if $$f(y) \geq f(x) + g^T (y - x), \quad \forall y \in \mathbb{R}^n.$$
The set of all subgradients of \(f\) at \(x\), denoted \(∂f(x)\), is the subdifferential of \(f\) at \(x\).
- Let \(f : \mathbb{R}^n\to \bar{\mathbb{R}}\) be proper and convex.
- If \(x \in \mathsf{dom}(f)\), then \(g \in \partial f(x)\) if and only if
\[f'(d; x) \geq g^T d, \quad \forall d \in\mathbb{R}^n.\]
- If \(x \in \mathsf{int}\,\mathsf{dom}(f)\), then \(\partial f(x)\) is a nonempty, convex, and compact set, and
\[f'(d; x) = \max_{g\in\partial f(x)} g^T d, \quad \forall d \in\mathbb{R}^n.\]
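A standard one-dimensional example (added here for illustration): for \(f(x) = |x|\), $$\partial f(0) = \{g \in \mathbb{R} : |y| \geq g\,y,\ \forall y\in\mathbb{R}\} = [-1, 1],$$ and indeed \(f'(d; 0) = |d| = \max_{g\in[-1,1]} g\,d\) for every direction \(d\).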
Descent directions
At a point \(x \in \mathbb{R}^n\), a descent direction \(d\) is one for which \(\nabla f(x)^T d = f'(d; x) < 0\).
We can decrease \(f\) by moving (a small distance) along such a direction \(d\).
The steepest descent direction is \(d = -\nabla f(x)\).
Optimality conditions
Definition 10 (Global minimum)
A vector \(x^∗\) is a global minimum of \(f\) if \(f(x^∗) ≤ f(x) ,\forall x \in\mathbb{R}^n\).
Definition 11 (Local minimum)
A vector \(x^*\) is a local minimum of \(f\) if there exists \(\epsilon > 0\) such that \(f(x^*) \leq f(x), \forall x \in B(x^*, \epsilon) := \{x \in\mathbb{R}^n \mid \lVert x - x^* \rVert_2 \leq \epsilon \}\).
Theorem 12
If \(f : \mathbb{R}^n \to \mathbb{R}\) is convex, then a local minimum of \(f\) is a global minimum of \(f\). If \(f\) is strictly convex, then there exists at most one global minimum of \(f\).
Theorem 13 (First-order necessary condition)
If \(f \in C^1\) and \(x^*\) is a local minimizer of \(f\), then \(\nabla f(x^*) = 0\).
Definition 14
A point \(x \in \mathbb{R}^n\) is a stationary point for \(f \in C^1\) if \(\nabla f(x) = 0\).
Theorem 16 (Second-order necessary condition)
If \(f \in C^2\) and \(x^*\) is a local minimizer of \(f\), then \(\nabla f(x^*) = 0\) and \(\nabla^2 f(x^*) \succeq 0\).
Theorem 17 (Second-order sufficient condition)
If \(f \in C^2\), \(\nabla f(x^∗) = 0\) and \(\nabla^2f(x^∗) \succ 0\), then \(x^*\) is a strict local minimizer.
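These conditions are easy to check numerically; below is a minimal sketch (the test function and helper names are my own illustrative choices, not from the lecture):

```python
import numpy as np

def f(x):
    # Illustrative smooth test function
    return x[0]**2 + x[0]*x[1] + x[1]**2 - x[1]

def grad_f(x):
    return np.array([2*x[0] + x[1], x[0] + 2*x[1] - 1])

def hess_f(x):
    return np.array([[2.0, 1.0], [1.0, 2.0]])

# Stationary point: solve grad_f(x) = 0, which here is a 2x2 linear system
x_star = np.linalg.solve(hess_f(np.zeros(2)), np.array([0.0, 1.0]))

print("gradient at x*:", grad_f(x_star))                           # ~ [0, 0]: first-order condition
print("Hessian eigenvalues:", np.linalg.eigvalsh(hess_f(x_star)))  # all > 0: strict local minimizer
```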
Line search method
Line search philosophy: compute \(d_k\) and then compute \(α_k > 0\) so that \(x_{k+1} \gets x_k + α_k d_k\) is “better” than \(x_k\) in some way.
Common choices for \(d_k\):
- the steepest descent direction (gradient descent): \(d_k = -\nabla f(x_k)\).
- the Newton direction: \(d_k = -\nabla^2 f(x_k)^{-1} \nabla f(x_k)\).
- Some approximation of Newton direction (Quasi-Newton): \(d_k = -H_k^{-1} \nabla f(x_k)\).
Choice of \(\alpha_k\):
- At least ensure that for \(x_{k+1} \gets x_k + α_k d_k\), we have \(f(x_{k+1}) < f(x_k)\).
- At most solve the one-dimensional (nonlinear) minimization problem: \(\min_{\alpha\geq 0} f(x_k + \alpha d_k)\).
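For a quadratic \(f(x) = \frac{1}{2} x^T Q x + c^T x\) with \(Q \succ 0\) (a worked example added here, not from the lecture), the exact one-dimensional minimization has a closed form: $$\min_{\alpha \geq 0} f(x_k + \alpha d_k) \quad\Longrightarrow\quad \alpha_k = -\frac{\nabla f(x_k)^T d_k}{d_k^T Q d_k},$$ which is positive whenever \(d_k\) is a descent direction.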
Sufficient decrease condition
The Armijo (sufficient decrease) condition: $$f(x_k + \alpha d_k) \leq f(x_k) + c_1 \alpha \nabla f(x_k)^T d_k,$$
where \(c_1 \in (0, 1)\) is a user-specified constant:
- \(c_1 = 0\) is too loose of a requirement
- \(c_1 = 1\) is too strict, and may not be satisfiable if curvature is strictly positive.
Backtracking line search
Algorithmically, choose the largest value in the set \(\{ \gamma^0,\gamma^1,\gamma^2,\dots \}\), where \(\gamma\in(0, 1)\) is a given constant, satisfying the Armijo condition.
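A minimal sketch of backtracking line search (my own illustration; the parameter names `c1` and `gamma` follow the notation above, and the default values are typical choices, not prescribed by the lecture):

```python
import numpy as np

def backtracking(f, grad_f, x, d, c1=1e-4, gamma=0.5, max_iter=50):
    """Return the largest alpha in {gamma^0, gamma^1, ...} satisfying the Armijo condition."""
    alpha = 1.0
    fx = f(x)
    slope = grad_f(x) @ d               # directional derivative grad f(x)^T d (should be < 0)
    for _ in range(max_iter):
        if f(x + alpha * d) <= fx + c1 * alpha * slope:
            return alpha
        alpha *= gamma                  # shrink the step and try again
    return alpha

# Example: one steepest-descent step on f(x) = x1^2 + 10*x2^2
f = lambda x: x[0]**2 + 10 * x[1]**2
grad_f = lambda x: np.array([2 * x[0], 20 * x[1]])
x = np.array([1.0, 1.0])
d = -grad_f(x)
alpha = backtracking(f, grad_f, x, d)
print(alpha, f(x + alpha * d) < f(x))   # the accepted step gives a decrease
```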
Wolfe conditions
The curvature condition: $$\nabla f(x_k + \alpha d_k)^T d_k \geq c_2 \nabla f(x_k)^T d_k,$$
where \(c_2\in(c_1, 1)\) is a user-specified constant.
This condition keeps the accepted step length away from \(\alpha = 0\), ruling out unacceptably short steps.
The Armijo and curvature conditions together compose the Wolfe conditions.
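A small helper (again my own sketch, with typical default values for `c1` and `c2`) that checks whether a given step length satisfies both Wolfe conditions:

```python
import numpy as np

def satisfies_wolfe(f, grad_f, x, d, alpha, c1=1e-4, c2=0.9):
    """Check the Armijo and curvature conditions for step length alpha along direction d."""
    slope = grad_f(x) @ d                                     # grad f(x_k)^T d_k, assumed < 0
    armijo = f(x + alpha * d) <= f(x) + c1 * alpha * slope    # sufficient decrease
    curvature = grad_f(x + alpha * d) @ d >= c2 * slope       # curvature condition
    return armijo and curvature

# Example on a simple quadratic: the exact minimizer along d satisfies both conditions
f = lambda x: 0.5 * x @ x
grad_f = lambda x: x
x = np.array([2.0, -1.0])
d = -grad_f(x)
print(satisfies_wolfe(f, grad_f, x, d, alpha=1.0))            # True
```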
Theorem 18 (Zoutendijk’s Theorem)
Suppose that \(f\) is bounded below and continuously differentiable in an open set \(\mathcal{N}\) containing the sublevel set \(\mathcal{L} := \{x \mid f(x) \leq f(x_0)\}\). Suppose also that \(\nabla f\) is Lipschitz continuous on \(\mathcal{N}\) with constant \(L\). Consider any iteration of the form \(x_{k+1} \gets x_k + \alpha_k d_k\) for all \(k \in\mathbb{N}_+\), where, for all \(k\), \(d_k\) is a descent direction and \(\alpha_k\) satisfies the Wolfe conditions. Let \(\theta_k\) denote the angle between \(d_k\) and \(-\nabla f(x_k)\). Then, $$\sum_{k} \cos^2\theta_k \, \lVert \nabla f(x_k) \rVert_2^2 < \infty.$$
Theorem 19
For \(L\)-smooth and \(\mu\)-strongly convex \(f\), gradient descent (\(d_k = -\nabla f(x_k)\)) with fixed step size \(\alpha \leq \frac{2}{\mu + L}\) satisfies $$\lVert x_k - x^* \rVert_2^2 \leq \Big(1 - \frac{2\alpha\mu L}{\mu + L}\Big)^{k} \lVert x_0 - x^* \rVert_2^2,$$ where \(x^*\) is the unique minimizer.
So the convergence is linear: the distance to \(x^*\) contracts by a constant factor per iteration.
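A quick numerical illustration (added here, not from the notes): fixed-step gradient descent on a strongly convex quadratic, where the distance to the minimizer shrinks by a constant factor per iteration:

```python
import numpy as np

# Strongly convex quadratic f(x) = 0.5 * x^T Q x with mu = 1, L = 10 (eigenvalues of Q)
Q = np.diag([1.0, 10.0])
mu, L = 1.0, 10.0
grad = lambda x: Q @ x                  # gradient; the minimizer is x* = 0
alpha = 2.0 / (mu + L)                  # largest step size allowed by the theorem

x = np.array([1.0, 1.0])
for k in range(10):
    x = x - alpha * grad(x)
    print(k, np.linalg.norm(x))         # shrinks by a factor of (L - mu) / (L + mu) = 9/11 each step
```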
Lec 10: Quasi-Newton Method
Newton’s method is fast, but expensive, since it requires Hessians and the solution of a linear system to compute the search direction.
So, in quasi-Newton methods, rather than computing \(\nabla^2 f(x_k)\), we maintain an approximation \(H_k\) and update it in each iteration.
The model in each iteration:
$$m_k(d) := f(x_k) + \nabla f(x_k)^T d + \frac{1}{2} d^T H_k d$$
For the next iteration:
$$m_{k+1}(d) := f(x_{k+1}) + \nabla f(x_{k+1})^T d + \frac{1}{2} d^T H_{k+1} d$$
It should satisfy \(\nabla m_{k+1}(0) = \nabla f(x_{k+1})\) and \(\nabla m_{k+1}(-\alpha_k d_k) = \nabla f(x_k)\).
Then, with \(s_k := x_{k+1} - x_k = \alpha_k d_k\) and \(y_k := \nabla f(x_{k+1}) - \nabla f(x_k)\), we have the “secant equation”
$$H_{k+1} s_k = y_k$$
For \(n \geq 2\), the \(H_{k+1}\) satisfying the secant equation (and symmetry) is not unique.
Symmetric-rank-1 updates
The symmetric-rank-1 (SR1) method requires \(H_{k+1}\) to be symmetric and enforces:
- \(H_{k+1} s_k = y_k\) (secant equation);
- \(H_{k+1} = H_k + \sigma v v^T\) (rank-1 update).
Solving these two conditions for \(\sigma\) and \(v\) yields
$$H_{k+1} = H_k + \frac{(y_k - H_k s_k)(y_k - H_k s_k)^T}{(y_k - H_k s_k)^T s_k},$$
which is well defined only when \((y_k - H_k s_k)^T s_k \neq 0\).
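A minimal sketch of the SR1 update as a function (my own illustration; it includes the usual safeguard of skipping the update when the denominator is close to zero):

```python
import numpy as np

def sr1_update(H, s, y, eps=1e-8):
    """Symmetric rank-1 update of the Hessian approximation H so that the new H maps s to y."""
    r = y - H @ s                                       # residual of the secant equation
    denom = r @ s
    if abs(denom) <= eps * np.linalg.norm(r) * np.linalg.norm(s):
        return H                                        # skip when the update is numerically unsafe
    return H + np.outer(r, r) / denom

# Quick check that the secant equation holds after the update
H = np.eye(2)
s = np.array([1.0, 0.0])
y = np.array([2.0, 1.0])
print(np.allclose(sr1_update(H, s, y) @ s, y))          # True
```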
Symmetric-rank-2 updates
Davidon-Fletcher-Powell (DFP) update: with \(\rho_k := \frac{1}{y_k^T s_k}\),
$$H_{k+1} = (I - \rho_k y_k s_k^T)\, H_k\, (I - \rho_k s_k y_k^T) + \rho_k y_k y_k^T$$
Broyden-Fletcher-Goldfarb-Shanno (BFGS) update:
$$H_{k+1} = H_k - \frac{H_k s_k s_k^T H_k}{s_k^T H_k s_k} + \frac{y_k y_k^T}{y_k^T s_k}$$
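A compact sketch of a BFGS iteration (my own illustration combining the BFGS update above with a backtracking line search; not the lecture's code, and the test function is the standard Rosenbrock function):

```python
import numpy as np

def bfgs(f, grad_f, x0, tol=1e-8, max_iter=200):
    """Quasi-Newton method with the BFGS update of the Hessian approximation H."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))                             # initial Hessian approximation
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        d = -np.linalg.solve(H, g)                 # quasi-Newton direction d = -H^{-1} grad f(x)
        alpha, c1, gamma = 1.0, 1e-4, 0.5          # backtracking (Armijo) line search
        while f(x + alpha * d) > f(x) + c1 * alpha * (g @ d) and alpha > 1e-12:
            alpha *= gamma
        s = alpha * d
        y = grad_f(x + s) - g
        if y @ s > 1e-12:                          # curvature check keeps H positive definite
            H = H - np.outer(H @ s, H @ s) / (s @ H @ s) + np.outer(y, y) / (y @ s)
        x = x + s
    return x

# Test on the Rosenbrock function; iterates should approach the minimizer (1, 1)
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad_f = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                             200 * (x[1] - x[0]**2)])
print(bfgs(f, grad_f, np.array([-1.2, 1.0])))
```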
Convergence
Superlinear convergence of the BFGS method: under standard assumptions (in particular, when the Dennis-Moré condition on the Hessian approximations holds), BFGS converges locally at a superlinear rate.