原始问题
假设\(f(x),c_i(x),h_j(x)\)是定义在\(R^n\)上的连续可微函数,考虑约束最优化问题
\[\begin{aligned}
\mathop{min}\limits_{x \in R^n}\ &f(x) \\
s.t.\ &c_i(x) \leq 0,i = 1,2,\cdots,k \\
&h_j(x) = 0,j = 1,2,\cdots,l
\end{aligned}
\]
称此约束最优化问题为原始最优化问题或原始问题,首先引入广义拉格朗日函数:
\[L(x,\alpha,\beta) = f(x) + \sum_{i = 1}^k\alpha_ic_i(x) + \sum_{j = 1}^l\beta_jh_j(x)
\]
这里,\(x = (x^{(1)},x^{(2)},\cdots,x^{(n)})^T \in R^n,\alpha_i,\beta_j\)是拉格朗日乘子,\(\alpha_i \geq 0\),考虑\(x\)的函数:
\[\theta_P(x) = \mathop{max}\limits_{\alpha,\beta,\alpha_i \geq 0}L(x,\alpha,\beta)
\]
这里的下标\(P\)表示原始问题,因为\(c_i(x) \leq 0,\alpha_i \geq 0\),所以\(\sum_{i = 1}^k\alpha_ic_i(x) \leq 0\),类似的\(\sum_{j = 1}^l\beta_jh_j(x) = 0\),故可以得到以下结论:
\[\theta_P(x) =
\begin{cases}
f(x) & x满足原始约束条件 \\
+\infty & otherwise
\end{cases}
\]
考虑极小化问题:
\[\mathop{min}\limits_x\theta_P(x) = \mathop{min}\limits_{x} \mathop{max}\limits_{\alpha,\beta,\alpha_i \geq 0} L(x,\alpha,\beta)
\]
上式和原始的最优化问题是等价的,也就是和原问题有相同的解,\(\mathop{min}\limits_{x} \mathop{max}\limits_{\alpha,\beta,\alpha_i \geq 0} L(x,\alpha,\beta)\)称为广义拉格朗日函数的极小极大问题,为了方便,定义原始问题的最优值\(p^* = \mathop{min}\limits_{x}\theta_P(x)\)称为原始问题的值
对偶问题
定义
\[\theta_D(\alpha,\beta) = \mathop{min}\limits_xL(x,\alpha,\beta)
\]
再考虑极大化\(\theta_D(\alpha,\beta) = \mathop{min}\limits_xL(x,\alpha,\beta)\),即:
\[\mathop{max}\limits_{\alpha,\beta;\alpha_i \geq 0}\theta_D(\alpha,\beta) = \mathop{max}\limits_{\alpha,\beta;\alpha_i \geq 0} \mathop{min}\limits_xL(x,\alpha,\beta)
\]
问题\(\mathop{max}\limits_{\alpha,\beta;\alpha_i \geq 0} \mathop{min}\limits_xL(x,\alpha,\beta)\)称为广义拉格朗日函数的极大极小问题,可以将极大极小问题表示为约束最优化问题:
\[\begin{aligned}
\mathop{max}\limits_{\alpha,\beta}\ &\theta_D(\alpha,\beta) = \mathop{max}\limits_{\alpha,\beta} \mathop{min}\limits_xL(x,\alpha,\beta) \\
s.t.\ &\alpha_i \geq 0,i = 1,2,\cdots,k
\end{aligned}
\]
对偶问题的最优值\(d^* = \mathop{max}\limits_{\alpha,\beta;\alpha_i \geq 0}\ \theta_D(\alpha,\beta)\)
原始问题和对偶问题的关系
定理1
若原始问题和对偶问题都有最优值,则:
\[d^* = \mathop{max}\limits_{\alpha,\beta;\alpha_i \geq 0} \mathop{min}\limits_xL(x,\alpha,\beta) \leq \mathop{min}\limits_{x} \mathop{max}\limits_{\alpha,\beta,\alpha_i \geq 0} L(x,\alpha,\beta) = p^*
\]
证明如下:
对任意的\(\alpha,\beta,x\)有
\[\theta_D(\alpha,\beta) = \mathop{min}\limits_xL(x,\alpha,\beta) \leq L(x,\alpha,\beta) \leq \mathop{max}\limits_{\alpha,\beta,\alpha_i \geq 0} L(x,\alpha,\beta) = \theta_P(x)
\]
所以:
\[\theta_D(\alpha,\beta) \leq \theta_P(x)
\]
由于原始问题和对偶问题均有最优值,所以:
\[ \mathop{max}\limits_{\alpha,\beta;\alpha_i \geq 0} \theta_D(\alpha,\beta) \leq \mathop{min}\limits_x \theta_P(x)
\]
得证。
定理2
考虑原始问题和对偶问题,假设函数\(f(x)\)和\(c_i(x)\)是凸函数,\(h_j(x)\)是仿射函数,假设不等式约束\(c_i(x)\)是严格可行的,及存在\(x\)对所有的\(i\)有\(c_i(x) \lt 0\),则存在\(x^*,\alpha^*,\beta^*\),使得\(x^*\)是原问题的解,\(\alpha^*,\beta^*\)是对偶问题的解,并且有:
\[p^* = d^* = L(x^*,\alpha^*,\beta^*)
\]
定理3
假设函数\(f(x)\)和\(c_i(x)\)是凸函数,\(h_j(x)\)是仿射函数,不等式约束\(c_i(x)\)是严格可行的,那么有\(x^*\)是原问题的解,\(\alpha^*,\beta^*\)是对偶问题的解的充分必要条件是\(x^*,\alpha^*,\beta^*\)满足下面的KKT条件(Karush-Kuhn-Tucker):
\[\nabla_xL(x^*,\alpha^*,\beta^*) = 0 \\
\alpha^*_ic_i(x^*) = 0,i = 1,2,\cdots,k \\
c_i(x^*) \leq 0,i = 1,2,\cdots,k \\
\alpha_i^* \geq 0,i = 1,2,\cdots,k \\
h_j(x^*) = 0,j = 1,2,\cdots,l
\]
\(\alpha^*_ic_i(x^*) = 0,i = 1,2,\cdots,k\)称为KKT的对偶互补条件,由此条件可知:若\(\alpha_i^* \gt 0\),则\(c_i(x^*) = 0\).