[SI152 Notes] Part 1: Equations and Optimization

SI152: Numerical Optimization

Lec 1.

Optimization

Three elements of an optimization problem: Objective, Variables, Constraints.

\[\textbf{Objective} : \min_{x\in\mathbb{R}^n} f(x) \\ \textbf{Variables} : x \\ \textbf{Constraints} : \begin{cases} c_i(x) = 0, &i\in\mathcal{E} \\ c_i(x) \leq 0, &i\in\mathcal{I} \\ x\in\Omega \end{cases} \]

Classification:

  • Linear Optimization vs. Nonlinear Optimization
  • Constrained Optimization vs. Unconstrained Optimization
  • Continuous Optimization vs. Integer Optimization
  • Stochastic Optimization vs. Deterministic Optimization
  • Convex Optimization vs. Nonconvex Optimization
  • Single-objective Optimization vs. Multi-objective Optimization
  • Bilevel Optimization vs. Single-level Optimization
    • Bilevel Optimization: \(F(x, y(x))\) is the objective function of the upper-level problem, which depends on the solution \(y(x)\) of the lower-level problem.
  • Global Optimization vs. Local Optimization

Equation \(\iff\) Optimization

Iterative algorithms: \(x_{k+1} = \mathcal{M} (x_k)\)

Generally, the sequence generated by an iterative algorithm converges to an “optimal solution”.

  • Globally convergent algorithm vs. Locally convergent algorithm

Convergence rates

Consider a sequence \(\{x_k\}\) that converges to \(x^*\).

Q-linear convergence: If there exists a constant \(c \in [0,1)\) and \(\hat{k}\geq 0\) such that

\[|| x_{k+1} - x^* ||_2 \leq c || x_{k} - x^* ||_2 ~,~\forall k\geq \hat{k} \]

then \(\{x_k\}\) converges Q-linearly to \(x^*\).

  • i.e.

\[\limsup_{k\to\infty} \dfrac{|| x_{k+1} - x^* ||_2}{|| x_{k} - x^* ||_2 } < 1 \]

  • also called geometric convergence.

Q-superlinear convergence: If there exists a sequence \(\{c_k\} \to 0\) such that

\[|| x_{k+1} - x^* ||_2 \leq c_k || x_{k} - x^* ||_2 \]

then \(\{x_k\}\) converges Q-superlinearly to \(x^*\).

  • i.e.

\[\lim_{k\to\infty} \dfrac{|| x_{k+1} - x^* ||_2}{|| x_{k} - x^* ||_2 } = 0 \]

Q-quadratic convergence: If there exists a constant \(c \geq 0\) and \(\hat{k}\geq 0\) such that

\[|| x_{k+1} - x^* ||_2 \leq c || x_{k} - x^* ||_2^2 ~,~\forall k\geq \hat{k} \]

then \(\{x_k\}\) converges Q-quadratically to \(x^*\).

“R-” convergence: skipped. So we’ll drop the “Q-”.

Sublinear convergence (Arithmetic Convergence): If the sequence \(\{r_k\}\) converges to \(r^*\) in such a way that

\[|| r_{k+1} - r^* ||_2 \leq C \dfrac{|| r_{0} - r^* ||_2}{k^p} , k\geq 1, 0<p<\infty \]

where \(C\) is a fixed positive number, the sequence is said to converge arithmetically to \(r^*\) with order \(p\).

  • i.e.

\[\limsup_{k\to\infty} \dfrac{|| r_{k+1} - r^* ||_2}{|| r_{k} - r^* ||_2 } = 1 \]
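To make the rates concrete, here is a small sketch (my own illustration, not from the lecture) that prints model error sequences obeying each definition; the constant \(c=0.5\), the starting error \(e_0=0.5\), and \(p=1\) for the sublinear case are arbitrary choices.

```python
# Model error sequences for each convergence type (illustrative only).
e_lin = e_sup = e_quad = 0.5
for k in range(8):
    e_sub = 0.5 / (k + 1)                # sublinear: e_k = C / k^p with p = 1
    print(f"k={k}: linear={e_lin:.1e}  superlinear={e_sup:.1e}  "
          f"quadratic={e_quad:.1e}  sublinear={e_sub:.1e}")
    e_lin = 0.5 * e_lin                  # Q-linear:      e_{k+1} = c * e_k, c = 0.5
    e_sup = e_sup / (k + 2)              # Q-superlinear: c_k = 1/(k+2) -> 0
    e_quad = e_quad ** 2                 # Q-quadratic:   e_{k+1} = e_k^2 (c = 1)
```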

Lec 2.

Linear equations

Solution:

  1. Direct methods: Gaussian elimination
  2. Iterative methods
  3. Conjugate gradient method

The Jacobi Iteration Method

For an \(n \times n\) linear system \(Ax=b\), let the solution be \(x^*\):

Let \(A=L+D+U\)

\[L=\begin{bmatrix} 0 & \cdots & \cdots & 0 \\ a_{21} & \ddots & & \vdots \\ \vdots & & \ddots & \vdots \\ a_{n1} & \cdots & a_{n,n-1} & 0 \end{bmatrix}, \quad D=\begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & a_{nn} \end{bmatrix}, \quad U=\begin{bmatrix} 0 & a_{12} & \cdots & a_{1n} \\ \vdots & \ddots & \ddots & \vdots \\ \vdots & & \ddots & a_{n-1,n} \\ 0 & \cdots & \cdots & 0 \end{bmatrix} \]

Transform \((D+L+U)x=b\) into:

\[x = D^{-1} b - D^{-1}(L+U)x \]

So Jacobi iterative technique (Matrix form):

\[x^{(k+1)} = D^{-1} b - D^{-1}(L+U)x^{(k)} \stackrel{def}{=} Mx^{(k)} + c \]

(Elementwise form):

\[x^{(k+1)}_i = \dfrac{1}{a_{ii}}\left(b_i - \sum_{j\neq i} a_{ij} x^{(k)}_{j} \right) \]

Convergence:

\[x^{(k+1)} - x^* = M(x^{(k)} - x^*) \implies ||x^{(k+1)} - x^*|| \leq ||M||\cdot||(x^{(k)} - x^*) || \]

So \(||M||<1 \implies \text{linear convergence}\), where \(||M||\) is the spectral norm of \(M\). (More generally, the iteration converges from every starting point iff the spectral radius \(\rho(M) < 1\).)
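A minimal sketch of the elementwise Jacobi update, assuming numpy; the diagonally dominant test matrix is my own choice (such matrices guarantee convergence), and the last lines verify that the spectral radius of \(M\) is below 1.

```python
import numpy as np

def jacobi(A, b, x0, tol=1e-10, max_iter=500):
    D = np.diag(A)                       # diagonal entries a_ii
    R = A - np.diag(D)                   # off-diagonal part L + U
    x = x0.astype(float)
    for _ in range(max_iter):
        x_new = (b - R @ x) / D          # x_i = (b_i - sum_{j != i} a_ij x_j) / a_ii
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
x = jacobi(A, b, np.zeros(3))
print(x, np.allclose(A @ x, b, atol=1e-8))

D = np.diag(np.diag(A))
M = -np.linalg.inv(D) @ (A - D)          # iteration matrix M = -D^{-1}(L+U)
print(max(abs(np.linalg.eigvals(M))))    # spectral radius < 1, so the iteration converges
```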

Weighted Jacobi method:

\[x^{(k+1)} = (1-\omega)x^{(k)} + \omega D^{-1}(b - (L+U)x^{(k)}) \]

Equivalently, written as a correction of \(x^{(k)}\) by the weighted residual:

\[\begin{aligned} x^{(k+1)} &= (1-\omega)x^{(k)} + \omega D^{-1}(b - (L+U)x^{(k)}) \\ &= x^{(k)} + \omega D^{-1}(b - Ax^{(k)}) \end{aligned} \]

Gauss-Seidel Method

(Matrix form):

\[x^{(k+1)} = (L+D)^{-1} (b - U x^{(k)}) \]

  • Often converges faster than Jacobi.
  • Convergence still depends on the spectral radius of the iteration matrix, but it is often improved compared to Jacobi.
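A matching Gauss-Seidel sketch on the same made-up test problem as the Jacobi example; the inner loop reuses the components already updated in the current sweep.

```python
import numpy as np

def gauss_seidel(A, b, x0, tol=1e-10, max_iter=500):
    n = len(b)
    x = x0.astype(float)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]   # x[:i] already updated this sweep
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old) < tol:
            break
    return x

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
print(gauss_seidel(A, b, np.zeros(3)))
```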

Nonlinear equations

Consider a nonlinear equation \(f(x)=0\) with solution \(x_*\).

Definition

A function \(G\) is Lipschitz continuous with constant \(L \geq 0\) in \(\mathcal{X}\) if, for all \(x_1, x_2 \in \mathcal{X}\),

\[\lVert G(x_1) -G(x_2) \rVert \leq L \lVert x_1 - x_2 \rVert \]

Bisection Method

Based on the Intermediate Value Theorem:
If \(f(x)\) is continuous on \([a, b]\) and \(f(a) \cdot f(b) < 0\), then \(\exists\, c\in (a, b)\) such that \(f(c) = 0\).
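A minimal bisection sketch; the example equation \(x^3 - 2 = 0\) and the bracket \([0, 2]\) are my own choices.

```python
# Halve the bracketing interval [a, b] while keeping a sign change inside it.
def bisection(f, a, b, tol=1e-10):
    assert f(a) * f(b) < 0, "need a sign change on [a, b]"
    while b - a > tol:
        c = 0.5 * (a + b)
        if f(a) * f(c) <= 0:   # root lies in [a, c]
            b = c
        else:                  # root lies in [c, b]
            a = c
    return 0.5 * (a + b)

print(bisection(lambda x: x**3 - 2.0, 0.0, 2.0))   # ~ 2**(1/3) = 1.2599...
```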

Newton’s method

From:

\[f(x) = f(x_k) + f'(x_k)(x-x_k) + o(x-x_k) \]

Then:

\[x_{k+1} := x_{k} - \dfrac{f(x_{k})}{f'(x_{k})} \]
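A minimal sketch of the scalar Newton iteration; the test equation \(x^3-2=0\), its derivative, and the starting point are my own choices.

```python
# Newton iteration x_{k+1} = x_k - f(x_k) / f'(x_k) for a scalar equation.
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x = x - fx / fprime(x)
    return x

# Solve x^3 - 2 = 0; near the root the error roughly squares each iteration.
print(newton(lambda x: x**3 - 2.0, lambda x: 3.0 * x**2, 1.0))
```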

Quadratic convergence:
Let \(e_k = x_k - x_*\)

\[\begin{aligned} e_{k+1} &= e_{k} - \dfrac{0 + f'(x_*)e_k + \frac{f''(x_*)}{2}e_k^2 + O(e_k^3)}{f'(x_*)+f''(x_*)e_k + O(e_k^2)} \\ &= e_{k} - e_{k}\left(1 + \dfrac{f''(x_*)}{2f'(x_*)}e_k + O(e_k^2)\right) \left(1 - \dfrac{f''(x_*)}{f'(x_*)}e_k + O(e_k^2)\right) \\ &= \dfrac{f''(x_*)}{2f'(x_*)}e_k^2 + O(e_k^3) \end{aligned} \]

Problem of Newton’s method

  • May cause cycling and even divergence. (\(f(x) = \arctan(x)\))
  • May have \(f'(x_k) = 0\). (\(f(x) = x^3\))
  • \(f(x_k)\) may be undefined.

Secant method

Replace the tangent line with a secant line:

\[x_{k+1} := x_{k} - \dfrac{x_{k} - x_{k-1}}{f(x_{k}) - f(x_{k-1})} f(x_{k}) \]
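A minimal secant sketch on the same test equation as before; the derivative is replaced by the slope through the two most recent iterates, so two starting points are needed.

```python
# Secant update: x_{k+1} = x_k - (x_k - x_{k-1}) / (f(x_k) - f(x_{k-1})) * f(x_k).
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if abs(f1) < tol or f1 == f0:     # converged, or the secant is flat
            break
        x0, x1 = x1, x1 - (x1 - x0) / (f1 - f0) * f1
        f0, f1 = f1, f(x1)
    return x1

print(secant(lambda x: x**3 - 2.0, 1.0, 2.0))   # converges to 2**(1/3)
```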

Superlinear convergence:
Let \(e_k = x_k - x_*\), \(M=\dfrac{f''(x_*)}{2f'(x_*)}\)

\[\begin{aligned} f(x_*+e_k) &\approx e_k f'(x_*) (1+M e_k) \\ f(x_*+e_k) - f(x_*+e_{k-1}) &\approx f'(x_*) (e_k-e_{k-1}) (1+M(e_k+e_{k-1})) \\ e_{k+1} &\approx e_{k} - \dfrac{e_k (1+M e_k)}{1+M(e_k+e_{k-1})} \\ &= \dfrac{M e_{k-1}e_{k}}{1+M(e_k+e_{k-1})} \approx M e_{k-1}e_{k} \end{aligned} \]

If \(|e_{k+1}|\approx C|e_k|^p\), then

\[C|e_k|^p \approx |M||e_{k-1}||e_{k}| \\ |e_k| = C|e_{k-1}|^p \approx \left(\dfrac{|M|}{C}\right)^{\frac{1}{p-1}} |e_{k-1}|^{\frac{1}{p-1}} \]

Matching the exponents gives \(p = \frac{1}{p-1}\), i.e. \(p = \frac{\sqrt{5}+1}{2}\), and matching the constants gives \(C = |M|^{\frac{\sqrt{5}-1}{2}}\), so:

\[|e_{k+1}|\approx \left| \dfrac{f''(x_*)}{2f'(x_*)} \right|^{\frac{\sqrt{5}-1}{2}}|e_k|^{\frac{\sqrt{5}+1}{2}} \]

Newton’s method for multivariable equations

Suppose we have \(F: \mathbb{R}^n \to \mathbb{R}^n\) and want to solve \(F(x) = 0\).

Then

\[\begin{gathered} \nabla F(x_k)^T d_k = -F(x_k) \\ x_{k+1} = x_k + d_k \end{gathered} \]
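A minimal sketch for a square system, assuming numpy: each step solves the linear system \(J(x_k)\, d_k = -F(x_k)\) (with \(J = \nabla F^T\) the Jacobian). The 2×2 example system (a circle intersected with a parabola) and the starting point are my own choices.

```python
import numpy as np

def newton_system(F, J, x0, tol=1e-12, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        d = np.linalg.solve(J(x), -Fx)   # Newton step: J(x_k) d_k = -F(x_k)
        x = x + d
    return x

F = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0,   # unit circle
                        x[1] - x[0]**2])           # parabola y = x^2
J = lambda x: np.array([[ 2.0 * x[0], 2.0 * x[1]],
                        [-2.0 * x[0], 1.0]])
print(newton_system(F, J, [1.0, 1.0]))
```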

Quadratic convergence:

Newton's method converges quadratically under the following assumptions:

  • \(F\) is continuously differentiable in an open convex set \(\mathcal{X}\subset\mathbb{R}^n\) with $ x_* \in\mathcal{X} $.
  • The Jacobian of \(F\) at \(x_*\) is invertible, with its inverse bounded in norm by \(M > 0\), i.e.,

    \[\lVert \nabla F(x_*)^{-T} \rVert _2 \leq M \]

    • If \(M\) is large (the Jacobian is close to singular), then the Newton step may send us far away.
  • For some neighborhood of \(x_*\) with radius \(r > 0\) contained in \(\mathcal{X}\), i.e.,

\[\mathbb{B}(x_*, r) := \{ x\in\mathbb{R}^n \mid \lVert x-x_* \rVert _2 \leq r \}\subset \mathcal{X} \]

the Jacobian of \(F\) is Lipschitz continuous with constant \(L\) in \(\mathbb{B}(x_*, r)\).

  • If \(L\) is large, then the gradients of the functions change rapidly.

Equations and Optimization

Linear

\[Kx = y \implies \min_x \dfrac{1}{2}\lVert Kx-y \rVert _2^2 \iff K^T(Kx-y)=0 \]
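A quick numerical check of this equivalence, assuming numpy and arbitrary random data of my own: solving the normal equations \(K^TKx = K^Ty\) recovers the least-squares minimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
K = rng.standard_normal((10, 3))
y = rng.standard_normal(10)

x_normal = np.linalg.solve(K.T @ K, K.T @ y)       # normal equations K^T K x = K^T y
x_lstsq, *_ = np.linalg.lstsq(K, y, rcond=None)    # library least-squares solve
print(np.allclose(x_normal, x_lstsq))              # True
```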

Nonlinear

  • Nonlinear equation \(\implies\) Optimization
    • \(F(x) = 0 \implies \min_{x\in\mathbb{R}^n} f(x) := \frac{1}{2}\lVert F(x) \rVert _2^2\)
    • However, we generally prefer not to solve the latter, since

      \[ \nabla f(x) = \nabla F(x)\, F(x), \qquad \nabla^2 f(x) = \nabla F(x) \nabla F(x)^T + \sum_{i=1}^{n} F_i(x)\, \nabla^2 F_i(x), \]

      so stationary points of \(f\) need not be roots of \(F\) (whenever \(\nabla F(x)\) is singular), and squaring \(F\) worsens the conditioning. (A numerical check of the gradient formula is sketched at the end of this list.)

  • Optimization \(\implies\) Nonlinear equation
    • For any \(C^1\)-smooth \(f\), a local minimizer must satisfy the first-order condition

      \[\min_{x\in\mathbb{R}^n} f(x) \implies \nabla f(x) = 0, \]

      and the converse holds when \(f\) is convex.
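As referenced above, here is a finite-difference check (my own example, assuming numpy) that \(\nabla f(x) = \nabla F(x) F(x)\), i.e. \(J(x)^T F(x)\), for the merit function \(f(x)=\frac{1}{2}\lVert F(x)\rVert_2^2\); \(F\) is the same 2×2 system as in the Newton sketch.

```python
import numpy as np

F = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0,
                        x[1] - x[0]**2])
J = lambda x: np.array([[ 2.0 * x[0], 2.0 * x[1]],
                        [-2.0 * x[0], 1.0]])
f = lambda x: 0.5 * np.dot(F(x), F(x))

x = np.array([0.3, -0.7])
grad_analytic = J(x).T @ F(x)                       # gradient of 0.5 * ||F(x)||^2
eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)   # central differences
                    for e in np.eye(2)])
print(grad_analytic, grad_fd)                       # the two should agree closely
```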
