Optimization Theory
Linear Programming
What is LP?
An optimization problem can be written as
\[\begin{aligned}\min_{\boldsymbol{x}}\quad & f(\boldsymbol{x})\\ \text{s.t.}\quad & g_{i}(\boldsymbol{x})\le 0,\quad i=1,\dots,m,\\ & h_{j}(\boldsymbol{x})=0,\quad j=1,\dots,p,\end{aligned}\]
where \(\boldsymbol{x}\in\mathbb{R}^n\) is the variable to be determined, and \(f,g_i,h_j\) are given functions.
However, this general problem is hard to solve. A special case is linear programming (LP), where \(f,g_i,h_j\) are all linear (affine) functions.
This can be transformed to the standard form of LP,
\[\min\ \boldsymbol{c}^{\top}\boldsymbol{x}\quad\text{s.t.}\quad \boldsymbol{Ax}=\boldsymbol{b},\quad \boldsymbol{x}\ge\boldsymbol{0},\]
via the following steps.
- For the constraints in the form \(\boldsymbol{a}_{i}^{\top}\boldsymbol{x}\ge b_i\), we can add a slack variable \(s_i\) and transform it to \(\boldsymbol{a}_{i}^{\top}\boldsymbol{x}-s_i=b_i\) and \(s_i\ge 0\).
- For the constraints in the form \(\boldsymbol{a}_{i}^{\top}\boldsymbol{x}\le b_i\), we can multiply both sides by \(-1\) so that it reduces to the previous case.
- For the constraints in the form \(x_j\le 0\), we can replace \(x_j\) by \(-x_j\) so that it reduces to the case \(x_j\ge 0\).
Now our program is in the form
\[\min\ \boldsymbol{c}^{\top}\boldsymbol{x}\quad\text{s.t.}\quad \boldsymbol{Ax}=\boldsymbol{b},\quad x_{j}\ge 0\ \text{for } j\in N,\]
where \(N\) indexes the sign-constrained variables and the remaining variables are unconstrained.
The last step is to eliminate the unconstrained variables. For each unconstrained variable \(x_j\), we can replace it with \(x_j^+-x_j^-\), where \(x_j^+,x_j^-\ge 0\). Then the program is in standard form.
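The transformation above is mechanical enough to code up. The sketch below (the helper names `to_standard_form` and `embed` are just for this illustration) handles free variables via \(x=x^{+}-x^{-}\) and \(\le\) constraints via slack variables:

```python
import numpy as np

def to_standard_form(c, A_ub, b_ub):
    """Convert  min c@x  s.t. A_ub@x <= b_ub  (x free)  into standard form
    min c_s@z  s.t. A_s@z = b_s, z >= 0,  with z = (x_plus, x_minus, slacks)."""
    m, n = A_ub.shape
    A_s = np.hstack([A_ub, -A_ub, np.eye(m)])      # A x+ - A x- + s = b
    c_s = np.concatenate([c, -c, np.zeros(m)])     # cost unchanged on x = x+ - x-
    return c_s, A_s, b_ub.copy()

def embed(x, A_ub, b_ub):
    """Map a feasible point x of the original problem to a feasible z >= 0."""
    return np.concatenate([np.maximum(x, 0), np.maximum(-x, 0), b_ub - A_ub @ x])
```

For any \(x\) with \(A_{\mathrm{ub}}x\le b_{\mathrm{ub}}\), the embedded point satisfies \(A_{s}z=b_{s}\), \(z\ge 0\), and has the same cost.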
The Geometry of LP
In this section, we will discuss some geometric properties of linear programs.
Clearly, the feasible set of any linear programming problem is a polyhedron, i.e., a set of the form \(\{\boldsymbol{x}\in\mathbb{R}^{n}\mid \boldsymbol{A}\boldsymbol{x}\ge\boldsymbol{b}\}\).
We will see that polyhedra are convex.
- Let \(\lambda_{1},\dots,\lambda_{k}\) be nonnegative scalars that sum to \(1\). The vector \(\sum_{i=1}^{k}\lambda_{i}\boldsymbol{x}^{i}\) is said to be a convex combination of the vectors \(\boldsymbol{x}^{1},\dots,\boldsymbol{x}^{k}\).
- The convex hull of the vectors \(\boldsymbol{x}^{1},\dots,\boldsymbol{x}^{k}\) is the set of all convex combinations of these vectors.
The result that follows establishes some important facts related to convexity.
- The intersection of convex sets is convex.
- Every polyhedron is a convex set.
- A convex combination of a finite number of elements of a convex set also belongs to that set.
- The convex hull of a finite number of vectors is a convex set.
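These convexity facts are easy to check numerically. A small sketch, using an assumed toy polyhedron (the unit square \(\{x\mid Gx\le h\}\)) and random convex weights:

```python
import numpy as np

rng = np.random.default_rng(0)
# A toy polyhedron {x | Gx <= h}: the unit square in R^2.
G = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.]])
h = np.array([1., 0., 1., 0.])

pts = np.array([[0., 0.], [1., 0.], [0.3, 0.9]])   # three points of the polyhedron
lam = rng.random(3); lam /= lam.sum()              # random convex weights
combo = lam @ pts                                  # their convex combination

assert np.all(G @ pts.T <= h[:, None] + 1e-12)     # the points are in the polyhedron
assert np.all(G @ combo <= h + 1e-12)              # ...and so is the combination
```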
If we find all the "corners" of a bounded polyhedron, their convex hull is the polyhedron itself.
So how to define "corner"? We have three definitions: extreme point, vertex, and basic feasible solution.
- A point \(\boldsymbol{x}^{*}\in P\) is an extreme point if we cannot find \(\boldsymbol{y},\boldsymbol{z}\in P\), both different from \(\boldsymbol{x}^{*}\), and \(\lambda\in(0,1)\) such that \(\boldsymbol{x}^{*}=\lambda\boldsymbol{y}+(1-\lambda)\boldsymbol{z}\).
- A point \(\boldsymbol{x}^{*}\in P\) is a vertex if there exists some \(\boldsymbol{c}\) such that \(\boldsymbol{c}^{\top}\boldsymbol{x}^{*}<\boldsymbol{c}^{\top}\boldsymbol{y}\) for all \(\boldsymbol{y}\in P\) with \(\boldsymbol{y}\neq\boldsymbol{x}^{*}\).
Besides, we can define "corner" in a linear algebra way, as a point at which \(n\) linearly independent constraints are active: a basic feasible solution, defined below.
Linear algebra tells us that if we have \(n\) linearly independent constraints active at a point in \(\mathbb{R}^n\), then this point is uniquely determined. More precisely, let \(I=\{i\mid \boldsymbol{a}_{i}^{\top}\boldsymbol{x}^{*}=b_{i}\}\) be the index set of constraints active at a point \(\boldsymbol{x}^{*}\). Then the following are equivalent:
- There exist \(n\) vectors in the set \(\{\boldsymbol{a}_{i} \mid i \in I\}\), which are linearly independent.
- The span of the vectors \(\boldsymbol{a}_{i}, i \in I\), is all of \(\mathbb{R}^{n}\), that is, every element of \(\mathbb{R}^{n}\) can be expressed as a linear combination of the vectors \(\boldsymbol{a}_{i}, i \in I\).
- The system of equations \(\boldsymbol{a}_{i}^{\top}\boldsymbol{x} = b_{i}, i \in I\), has a unique solution.
- The vector \(\boldsymbol{x}^{*}\) is a basic solution if:
- All equality constraints are active;
- Out of the constraints that are active at \(\boldsymbol{x}^{*}\), there are \(n\) of them that are linearly independent.
- If \(\boldsymbol{x}^{*}\) is a basic solution that satisfies all of the constraints, we say that it is a basic feasible solution.
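As an illustration, a basic feasible solution can be verified directly from this definition: check feasibility, collect the active constraints, and test whether \(n\) of them are linearly independent. A sketch for an inequality-form polyhedron \(\{x\mid A_{\mathrm{ineq}}x\ge b_{\mathrm{ineq}}\}\) (the helper name `is_basic_feasible` is just for this example):

```python
import numpy as np

def is_basic_feasible(x, A_ineq, b_ineq):
    """Check whether x is a basic feasible solution of {x | A_ineq @ x >= b_ineq}:
    feasibility plus n linearly independent active constraints."""
    x = np.asarray(x, float)
    A_ineq = np.asarray(A_ineq, float)
    b_ineq = np.asarray(b_ineq, float)
    vals = A_ineq @ x
    if np.any(vals < b_ineq - 1e-9):                 # some constraint violated
        return False
    active = np.isclose(vals, b_ineq)                # active constraints at x
    return np.linalg.matrix_rank(A_ineq[active]) == x.size
```

On the unit square \(\{x\ge 0,\ x\le \boldsymbol{1}\}\), the corner \((0,0)\) passes the test, while an edge midpoint has only one active constraint and fails.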
Finally, we have the following important theorem that shows the equivalence of the three definitions of "corner": for a polyhedron \(P\) and a point \(\boldsymbol{x}^{*}\in P\), the following are equivalent:
- \(\boldsymbol{x}^{*}\) is a vertex;
- \(\boldsymbol{x}^{*}\) is an extreme point;
- \(\boldsymbol{x}^{*}\) is a basic feasible solution.
(Vertex \(\Rightarrow\) Extreme point) Suppose \(\boldsymbol{x}^{*}\) is a vertex but not an extreme point. Since \(\boldsymbol{x}^{*}\) is a vertex, there exists some \(\boldsymbol{c}\) such that \(\boldsymbol{c}^{\top}\boldsymbol{x}^{*} < \boldsymbol{c}^{\top}\boldsymbol{w}\) for every \(\boldsymbol{w} \in P\) with \(\boldsymbol{w} \neq \boldsymbol{x}^{*}\). Since \(\boldsymbol{x}^{*}\) is not an extreme point, there exist \(\boldsymbol{y}, \boldsymbol{z} \in P\) such that \(\boldsymbol{y} \neq \boldsymbol{x}^{*}\), \(\boldsymbol{z} \neq \boldsymbol{x}^{*}\), and \(\boldsymbol{x}^{*} = \lambda\boldsymbol{y} + (1 - \lambda)\boldsymbol{z}\) for some \(\lambda \in (0, 1)\). Thus,
\[\boldsymbol{c}^{\top}\boldsymbol{x}^{*} = \lambda\boldsymbol{c}^{\top}\boldsymbol{y} + (1-\lambda)\boldsymbol{c}^{\top}\boldsymbol{z} > \lambda\boldsymbol{c}^{\top}\boldsymbol{x}^{*} + (1-\lambda)\boldsymbol{c}^{\top}\boldsymbol{x}^{*} = \boldsymbol{c}^{\top}\boldsymbol{x}^{*},\]
which is a contradiction.
(Extreme point \(\Rightarrow\) Basic feasible solution) Suppose \(\boldsymbol{x}^{*}\) is an extreme point but not a basic feasible solution. Since \(\boldsymbol{x}^{*} \in P\), it satisfies all the constraints, and all equality constraints are active at \(\boldsymbol{x}^{*}\). Let \(I = \{i \mid \boldsymbol{a}_{i}^{\top}\boldsymbol{x}^{*} = b_{i}\}\) be the set of indices of constraints that are active at \(\boldsymbol{x}^{*}\). Since \(\boldsymbol{x}^{*}\) is not a basic feasible solution, the vectors \(\{\boldsymbol{a}_{i} \mid i \in I\}\) do not contain \(n\) linearly independent vectors. Thus, by the previous proposition, there exists a nonzero vector \(\boldsymbol{d}\) such that \(\boldsymbol{a}_{i}^{\top}\boldsymbol{d} = 0\) for all \(i \in I\). Let \(\epsilon > 0\) be sufficiently small so that \(\boldsymbol{y}=\boldsymbol{x}^{*} + \epsilon\boldsymbol{d}\) and \(\boldsymbol{z}=\boldsymbol{x}^{*} - \epsilon\boldsymbol{d}\) satisfy all the constraints. Then, we have \(\boldsymbol{x}^{*} = \frac{1}{2}(\boldsymbol{y}+\boldsymbol{z})\), which contradicts the assumption that \(\boldsymbol{x}^{*}\) is an extreme point.
(Basic feasible solution \(\Rightarrow\) Vertex) W.l.o.g., we assume there are no equality constraints (each can be written as a pair of inequalities). Suppose \(\boldsymbol{x}^{*}\) is a basic feasible solution. Let \(I=\{i \mid \boldsymbol{a}_{i}^{\top}\boldsymbol{x}^{*} = b_{i}\}\) and \(\boldsymbol{c}=\sum_{i \in I}\boldsymbol{a}_{i}\). Then,
\[\boldsymbol{c}^{\top}\boldsymbol{x}^{*}=\sum_{i\in I}\boldsymbol{a}_{i}^{\top}\boldsymbol{x}^{*}=\sum_{i\in I}b_{i}.\]
For any \(\boldsymbol{y} \in P\), we have
\[\boldsymbol{c}^{\top}\boldsymbol{y}=\sum_{i\in I}\boldsymbol{a}_{i}^{\top}\boldsymbol{y}\ge\sum_{i\in I}b_{i}=\boldsymbol{c}^{\top}\boldsymbol{x}^{*}.\]
Furthermore, the equality holds if and only if \(\boldsymbol{a}_{i}^{\top}\boldsymbol{y} = b_{i}\) for all \(i \in I\). Since \(\boldsymbol{x}^{*}\) is a basic feasible solution, this system of equations has a unique solution, which is \(\boldsymbol{y} = \boldsymbol{x}^{*}\). Therefore, \(\boldsymbol{x}^{*}\) is a vertex.
Clearly, the number of basic solutions is bounded above by the number of ways to choose \(n\) linearly independent constraints out of the \(m\) given constraints, which is at most \(\binom{m}{n}\); in particular, it is finite.
The next question is: how can we find all the basic feasible solutions of a polyhedron? First, let's see how to find all the basic solutions of a linear program in standard form, with feasible set \(P=\{\boldsymbol{x}\in\mathbb{R}^{n}\mid \boldsymbol{Ax}=\boldsymbol{b},\ \boldsymbol{x}\ge\boldsymbol{0}\}\) for an \(m\times n\) matrix \(\boldsymbol{A}\). W.l.o.g., we always assume that the rows of \(\boldsymbol{A}\) are linearly independent; otherwise, we can remove the dependent rows. A vector \(\boldsymbol{x}\in\mathbb{R}^{n}\) is a basic solution if and only if \(\boldsymbol{Ax}=\boldsymbol{b}\) and there exist indices \(B(1),\dots,B(m)\) such that:
- The columns \(\boldsymbol{A}_{B(1)}, \dots, \boldsymbol{A}_{B(m)}\) are linearly independent;
- If \(i \neq B(1), \dots, B(m)\), then \(x_{i} = 0\).
(\(\Leftarrow\)) Suppose \(\boldsymbol{x}\) satisfies the two conditions. Then we have
\[\boldsymbol{Ax}=\sum_{i=1}^{m}\boldsymbol{A}_{B(i)}x_{B(i)}=\boldsymbol{b}.\]
Since the columns \(\boldsymbol{A}_{B(1)},\dots,\boldsymbol{A}_{B(m)}\) are linearly independent, \(\boldsymbol{x}\) is uniquely determined. By the previous proposition, there are \(n\) linearly independent active constraints, and this implies that \(\boldsymbol{x}\) is a basic solution.
Now we can find all the basic solutions by enumerating all the ways to choose \(m\) linearly independent columns of \(\boldsymbol{A}\), setting the corresponding variables as unknowns and the rest as zero, and solving the linear system. Finally, we can check which basic solutions are feasible.
- Choose \(m\) linearly independent columns \(\boldsymbol{A}_{B(1)}, \ldots, \boldsymbol{A}_{B(m)}\).
- Let \(x_{i} = 0\) for all \(i \neq B(1), \ldots, B(m)\).
- Solve the system of \(m\) equations \(\boldsymbol{Ax} = \boldsymbol{b}\) for the unknowns \(x_{B(1)}, \cdots, x_{B(m)}\).
Remark: The variables \(x_{B(1)}, \ldots, x_{B(m)}\) are called basic variables, and the rest are called nonbasic variables. By arranging the \(m\) basic columns next to each other, we obtain an \(m\times m\) matrix \(\boldsymbol{B}=[\boldsymbol{A}_{B(1)}\cdots\boldsymbol{A}_{B(m)}]\), called a basis matrix. Clearly, \(\boldsymbol{B}\) is invertible.
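The enumeration procedure above translates directly into code. A brute-force sketch (exponential in general, purely for illustration):

```python
import itertools
import numpy as np

def basic_solutions(A, b):
    """Enumerate all basic solutions of {x | Ax = b, x >= 0} (A has full row rank)
    by trying every choice of m linearly independent columns."""
    m, n = A.shape
    basic, feasible = [], []
    for cols in itertools.combinations(range(n), m):
        B = A[:, cols]
        if np.linalg.matrix_rank(B) < m:       # columns not linearly independent
            continue
        x = np.zeros(n)
        x[list(cols)] = np.linalg.solve(B, b)  # basic variables; the rest stay 0
        basic.append(x)
        if np.all(x >= -1e-9):                 # keep only the feasible ones
            feasible.append(x)
    return basic, feasible
```

For example, with \(\boldsymbol{A}=\begin{bmatrix}1&1&1&0\\2&1&0&1\end{bmatrix}\) and \(\boldsymbol{b}=(4,6)\), all six column pairs are linearly independent, giving six basic solutions, four of which are feasible.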
However, some polyhedra do not have extreme points at all. It turns out that the existence of extreme points depends on whether the polyhedron contains an infinite line. More precisely, for a nonempty polyhedron \(P=\{\boldsymbol{x}\in\mathbb{R}^{n}\mid \boldsymbol{a}_{i}^{\top}\boldsymbol{x}\ge b_{i},\ i=1,\dots,m\}\), the following are equivalent:
- The polyhedron \(P\) has at least one extreme point.
- There exist \(n\) vectors out of the family \(\boldsymbol{a}_{1}, \ldots, \boldsymbol{a}_{m}\), which are linearly independent.
- The polyhedron \(P\) does not contain a line.
(1 \(\Rightarrow\) 2) Suppose that \(P\) has an extreme point \(\boldsymbol{x}^{*}\). Then \(\boldsymbol{x}^{*}\) is a basic feasible solution, so there exist \(n\) constraints that are active at \(\boldsymbol{x}^{*}\), with the corresponding vectors being linearly independent.
(2 \(\Rightarrow\) 3) Suppose that there exist \(n\) vectors \(\boldsymbol{a}_{i_1},\boldsymbol{a}_{i_2},\dots,\boldsymbol{a}_{i_n}\) that are linearly independent and, for the sake of contradiction, that \(P\) contains a line \(\boldsymbol{x}+\lambda\boldsymbol{d}\), where \(\boldsymbol{d}\neq \boldsymbol{0}\). Since these \(n\) vectors span \(\mathbb{R}^{n}\), we have \(\boldsymbol{a}_{i_j}^{\top}\boldsymbol{d}\neq 0\) for some \(j\in\{1,2,\dots,n\}\). But then \(\boldsymbol{a}_{i_j}^{\top}(\boldsymbol{x}+\lambda\boldsymbol{d})\ge b_{i_j}\) cannot hold for all \(\lambda\), contradicting the assumption that \(P\) contains the line.
(3 \(\Rightarrow\) 1) Let \(\boldsymbol{x}\) be an element of \(P\) and let \(I=\{i \mid \boldsymbol{a}_{i}^{\top}\boldsymbol{x} = b_{i}\}\). If \(n\) of the vectors \(\boldsymbol{a}_{i}, i \in I\), are linearly independent, then \(\boldsymbol{x}\) is a basic feasible solution. Otherwise, by the previous proposition, there exists a nonzero vector \(\boldsymbol{d}\) such that \(\boldsymbol{a}_{i}^{\top}\boldsymbol{d} = 0\) for all \(i \in I\). Since \(P\) does not contain a line, there exists some \(\lambda^*\) such that \(\boldsymbol{y}=\boldsymbol{x}+\lambda^*\boldsymbol{d}\in P\) and some \(j\notin I\) such that \(\boldsymbol{a}_{j}^{\top}\boldsymbol{y} = b_{j}\). By moving from \(\boldsymbol{x}\) to \(\boldsymbol{y}\), we have added at least one new active constraint. Repeating this process, we will eventually find a point with \(n\) linearly independent active constraints, which is a basic feasible solution and thus an extreme point.
Bounded polyhedra and polyhedra in standard form do not contain any line, so by the above theorem they have at least one extreme point whenever they are nonempty.
To end this section, we have the following important theorem that shows the importance of extreme points in linear programming: if a linear program \(\min_{\boldsymbol{x}\in P}\boldsymbol{c}^{\top}\boldsymbol{x}\) over a polyhedron \(P\) with at least one extreme point attains a finite optimal cost \(v\), then some extreme point of \(P\) is optimal.
Let \(Q=\{\boldsymbol{x}\in P\mid \boldsymbol{c}^{\top}\boldsymbol{x}=v\}\) be the set of optimal solutions; \(Q\) is itself a polyhedron. Since \(P\) has an extreme point, it contains no line, and hence neither does \(Q\), so \(Q\) has an extreme point \(\boldsymbol{x}^{*}\). Suppose \(\boldsymbol{x}^{*}\) is not an extreme point of \(P\). Then there exist \(\boldsymbol{y}, \boldsymbol{z} \in P\) such that \(\boldsymbol{y} \neq \boldsymbol{x}^{*}\), \(\boldsymbol{z} \neq \boldsymbol{x}^{*}\), and \(\boldsymbol{x}^{*} = \lambda\boldsymbol{y} + (1 - \lambda)\boldsymbol{z}\) for some \(\lambda \in (0, 1)\). It follows that \(v=\boldsymbol{c}^{\top}\boldsymbol{x}^{*} = \lambda\boldsymbol{c}^{\top}\boldsymbol{y} + (1 - \lambda)\boldsymbol{c}^{\top}\boldsymbol{z}\). Since \(v\) is optimal, we have \(\boldsymbol{c}^{\top}\boldsymbol{y} = v\) and \(\boldsymbol{c}^{\top}\boldsymbol{z} = v\). Thus, \(\boldsymbol{y}, \boldsymbol{z} \in Q\), which contradicts the assumption that \(\boldsymbol{x}^{*}\) is an extreme point of \(Q\). Therefore, \(\boldsymbol{x}^{*}\) is an extreme point of \(P\), and it is an optimal solution of the linear program.
Simplex Method
Back to the linear programming problem, our goal is to find an optimal solution. By the above theorem, we only need to check the extreme points.
The simplex method is an algorithm that iteratively moves along the "edges" of the polyhedron from one extreme point to another, until it reaches an optimal extreme point.
Initialization
First, we need to find an initial basic feasible solution. We can use the two-phase method.
Assume the original LP is in standard form:
\[\min\ \boldsymbol{c}^{\top}\boldsymbol{x}\quad\text{s.t.}\quad \boldsymbol{Ax}=\boldsymbol{b},\ \boldsymbol{x}\ge\boldsymbol{0},\]
where w.l.o.g. \(\boldsymbol{b}\ge\boldsymbol{0}\) (multiply rows by \(-1\) if necessary). Consider the auxiliary LP:
\[\min\ \sum_{i=1}^{m}y_{i}\quad\text{s.t.}\quad \boldsymbol{Ax}+\boldsymbol{y}=\boldsymbol{b},\ \boldsymbol{x}\ge\boldsymbol{0},\ \boldsymbol{y}\ge\boldsymbol{0}.\]
The original LP is feasible if and only if the auxiliary LP has the optimal objective value \(0\). So we can solve the auxiliary LP first. Moreover, the auxiliary LP has an obvious basic feasible solution: \(\boldsymbol{x}=\boldsymbol{0}\) and \(\boldsymbol{y}=\boldsymbol{b}\). Thus, we can use the simplex method to solve the auxiliary LP, obtain an initial basic feasible solution of the original LP, and then use the simplex method again to solve the original LP.
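Constructing the auxiliary LP is purely mechanical. A sketch (the helper name `auxiliary_lp` is assumed for this illustration); it first flips rows to make \(\boldsymbol{b}\ge\boldsymbol{0}\), so that \(\boldsymbol{y}=\boldsymbol{b}\) is feasible:

```python
import numpy as np

def auxiliary_lp(A, b):
    """Build the phase-one auxiliary LP  min sum(y)  s.t.  Ax + y = b, x, y >= 0,
    together with its obvious starting basis (the artificial variables y)."""
    A = np.asarray(A, float)
    b = np.asarray(b, float)
    sign = np.where(b < 0, -1.0, 1.0)      # make b >= 0 first
    A, b = sign[:, None] * A, sign * b
    m, n = A.shape
    A_aux = np.hstack([A, np.eye(m)])      # columns of y form an identity block
    c_aux = np.concatenate([np.zeros(n), np.ones(m)])
    basis = list(range(n, n + m))          # obvious BFS: x = 0, y = b
    return A_aux, b, c_aux, basis
```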
Iteration
Next, how to find a direction from the current vertex?
Let \(\boldsymbol{x}\) be a basic feasible solution with basis matrix \(\boldsymbol{B}=[\boldsymbol{A}_{B(1)}\cdots\boldsymbol{A}_{B(m)}]\). We have \(x_i = 0\) for every nonbasic variable, while the vector \(\boldsymbol{x}_B = (x_{B(1)},\dots,x_{B(m)})\) of basic variables is given by
\[\boldsymbol{x}_{B}=\boldsymbol{B}^{-1}\boldsymbol{b}.\]
Consider moving from \(\boldsymbol{x}\) to \(\boldsymbol{x}+\theta\boldsymbol{d}\) by changing exactly one nonbasic variable \(x_j\) from \(0\) to \(\theta\). That is, we set \(d_j=1\) and \(d_i=0\) for every nonbasic index \(i\neq j\). Since we are only interested in feasible solutions, we require \(\boldsymbol{A}(\boldsymbol{x} + \theta\boldsymbol{d}) = \boldsymbol{b}\); since \(\boldsymbol{Ax} = \boldsymbol{b}\), this forces \(\boldsymbol{Ad} = \boldsymbol{0}\). Thus
\[\boldsymbol{0}=\boldsymbol{Ad}=\sum_{i=1}^{m}\boldsymbol{A}_{B(i)}d_{B(i)}+\boldsymbol{A}_{j}=\boldsymbol{B}\boldsymbol{d}_{B}+\boldsymbol{A}_{j}.\]
Solving this equation gives
\[\boldsymbol{d}_{B}=-\boldsymbol{B}^{-1}\boldsymbol{A}_{j}.\]
The effect on the cost function if we move along this direction is given by
\[\boldsymbol{c}^{\top}\boldsymbol{d}=c_{j}+\boldsymbol{c}_{B}^{\top}\boldsymbol{d}_{B}=c_{j}-\boldsymbol{c}_{B}^{\top}\boldsymbol{B}^{-1}\boldsymbol{A}_{j}=:\overline{c}_{j},\]
the reduced cost of \(x_j\). Collecting the reduced costs of all variables gives the vector \(\overline{\boldsymbol{c}}^{\top}=\boldsymbol{c}^{\top}-\boldsymbol{c}_{B}^{\top}\boldsymbol{B}^{-1}\boldsymbol{A}\).
- If \(\overline{\boldsymbol{c}} \geq \mathbf{0}\), then \(\boldsymbol{x}\) is optimal.
- If \(\boldsymbol{x}\) is optimal and nondegenerate, then \(\overline{\boldsymbol{c}} \geq \mathbf{0}\).
Once we find a descent direction \(\boldsymbol{d}\) (one with \(\overline{c}_{j}<0\)), we can move along this direction as far as possible:
\[\theta^{*}=\max\{\theta\ge 0\mid \boldsymbol{x}+\theta\boldsymbol{d}\in P\}.\]
Simplifying for the standard-form polyhedron gives
\[\theta^{*}=\min_{\{i\,\mid\, d_{B(i)}<0\}}\left(-\frac{x_{B(i)}}{d_{B(i)}}\right),\]
with \(\theta^{*}=\infty\) (and an unbounded problem) if \(d_{B(i)}\ge 0\) for all \(i\). Let \(\ell\) be an index attaining this minimum, and let \(\overline{\boldsymbol{B}}\) be the matrix obtained from \(\boldsymbol{B}\) by replacing the column \(\boldsymbol{A}_{B(\ell)}\) with \(\boldsymbol{A}_{j}\). Now we have found a new basic feasible solution \(\boldsymbol{x}+\theta^{*}\boldsymbol{d}\):
- The columns \(\boldsymbol{A}_{B(i)}\), \(i \neq \ell\), and \(\boldsymbol{A}_{j}\) are linearly independent and, therefore, \(\overline{\boldsymbol{B}}\) is a basis matrix.
- The vector \(\boldsymbol{y} = \boldsymbol{x} + \theta^{*}\boldsymbol{d}\) is a basic feasible solution associated with the basis matrix \(\overline{\boldsymbol{B}}\).
Summary
Repeat until optimality is reached:
- If there is no descent direction, \(\boldsymbol{x}\) is optimal;
- Otherwise, choose a descent direction \(\boldsymbol{d}\) by selecting a nonbasic variable with negative reduced cost, and update \(\boldsymbol{x}\) by moving along \(\boldsymbol{d}\) as far as possible to get a new basic feasible solution.
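Putting the pieces together, the iteration can be sketched in a few lines of NumPy. This is a minimal illustration (dense linear algebra, naive pivot choice, no anti-cycling rule), assuming a feasible starting basis is supplied; the function name and interface are just for this sketch:

```python
import numpy as np

def simplex(A, b, c, basis):
    """Minimal simplex iteration for  min c@x  s.t. A@x = b, x >= 0,
    starting from a given feasible basis (a list of m column indices)."""
    A, b, c = (np.asarray(v, float) for v in (A, b, c))
    m, n = A.shape
    basis = list(basis)
    while True:
        B = A[:, basis]
        x_B = np.linalg.solve(B, b)            # values of the basic variables
        p = np.linalg.solve(B.T, c[basis])     # simplex multipliers p = B^-T c_B
        c_bar = c - A.T @ p                    # reduced costs
        entering = [j for j in range(n)
                    if j not in basis and c_bar[j] < -1e-9]
        if not entering:                       # no descent direction: optimal
            x = np.zeros(n)
            x[basis] = x_B
            return x, float(c @ x)
        j = entering[0]
        u = np.linalg.solve(B, A[:, j])        # u = B^-1 A_j = -d_B
        if np.all(u <= 1e-9):
            raise ValueError("problem is unbounded")
        # ratio test: theta* = min over {i | u_i > 0} of x_B[i] / u[i]
        theta, l = min((x_B[i] / u[i], i) for i in range(m) if u[i] > 1e-9)
        basis[l] = j                           # column j enters, B(l) leaves

# Example: min -x1 - x2  s.t.  x1 + 2*x2 + s1 = 4,  x1 + s2 = 3,  all vars >= 0.
x, v = simplex([[1, 2, 1, 0], [1, 0, 0, 1]], [4, 3], [-1, -1, 0, 0], [2, 3])
# optimal value -3.5 at x = (3, 0.5, 0, 0)
```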
The simplex method can take exponentially many iterations in the worst case (the Klee-Minty cube is the classic example). However, it works very well in practice.
Duality theory
Duality theory is an important part of linear programming. Every linear programming problem, called the primal problem, has a corresponding dual problem. The solutions of the two problems are closely related.
In standard form, the primal and dual problems are:
\[\begin{aligned}&\text{(P)}\quad \min\ \boldsymbol{c}^{\top}\boldsymbol{x}\quad\text{s.t.}\quad \boldsymbol{Ax}=\boldsymbol{b},\ \boldsymbol{x}\ge\boldsymbol{0};\\ &\text{(D)}\quad \max\ \boldsymbol{p}^{\top}\boldsymbol{b}\quad\text{s.t.}\quad \boldsymbol{p}^{\top}\boldsymbol{A}\le\boldsymbol{c}^{\top}.\end{aligned}\]
How would the dual problem help us understand the primal problem itself?
Duality Theorems
First, let's show that the cost of any feasible solution to the dual is a lower bound on the cost of any feasible solution to the primal (weak duality): if \(\boldsymbol{x}\) is primal feasible and \(\boldsymbol{p}\) is dual feasible, then
\[\boldsymbol{p}^{\top}\boldsymbol{b}=\boldsymbol{p}^{\top}\boldsymbol{Ax}\le\boldsymbol{c}^{\top}\boldsymbol{x}.\]
It has two simple corollaries.
- If the optimal cost in the primal is \(-\infty\), then the dual problem must be infeasible.
- If the optimal cost in the dual is \(+\infty\), then the primal problem must be infeasible.
In the proof of the central duality theorem below, we will use Farkas' lemma: exactly one of the following two alternatives holds.
- There exists some \(\boldsymbol{x} \geq \boldsymbol{0}\) such that \(\boldsymbol{Ax} = \boldsymbol{b}\).
- There exists some vector \(\boldsymbol{p}\) such that \(\boldsymbol{p}^{\top}\boldsymbol{A} \geq \boldsymbol{0}^{\top}\) and \(\boldsymbol{p}^{\top}\boldsymbol{b} < 0\).
The next theorem is the central result on linear programming duality: if a linear programming problem has an optimal solution, then so does its dual, and the respective optimal costs are equal (strong duality).
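A quick numerical sanity check of weak and strong duality on a small standard-form pair; the optimal \(\boldsymbol{x}\) and \(\boldsymbol{p}\) below were worked out by hand for this toy instance:

```python
import numpy as np

# Primal: min c@x  s.t. Ax = b, x >= 0.   Dual: max p@b  s.t. A.T @ p <= c.
A = np.array([[1., 2., 1., 0.],
              [1., 0., 0., 1.]])
b = np.array([4., 3.])
c = np.array([-1., -1., 0., 0.])

x_opt = np.array([3., 0.5, 0., 0.])   # primal optimal (hand-computed)
p_opt = np.array([-0.5, -0.5])        # dual optimal (hand-computed)

assert np.allclose(A @ x_opt, b) and np.all(x_opt >= 0)   # primal feasible
assert np.all(A.T @ p_opt <= c + 1e-12)                   # dual feasible
# Weak duality: p@b <= c@x for any feasible pair; here it holds with
# equality, certifying that both points are optimal (strong duality).
assert np.isclose(p_opt @ b, c @ x_opt)
```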
Cutting Plane Methods
Finally, how can we design faster algorithms?
Convexity
Surprisingly, the answer is to step back and look at the more general convex problems.
Center-of-Gravity Algorithm
For linear programming, the objective \(\boldsymbol{c}^{\top}\boldsymbol{x}\) is linear, so its subgradient at every point is simply the cost vector \(\boldsymbol{c}\).
Repeat for \(t=0,1,\ldots\) until convergence:
- Compute the center of gravity \(c_t\) of \(S_t\):\[c_t=\frac{1}{\text{vol}(S_t)}\int_{x\in S_t} x \,\mathrm{d}x. \]
- Find a subgradient \(g_t\in\partial f(c_t)\) and update:\[S_{t+1}=S_t\cap\{x\mid g_t^{\top}(x-c_t)\leq 0\}. \]
However, computing the center of gravity is hard. We need more practical algorithms.
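In one dimension, however, the method is tractable: \(S_t\) is an interval, its center of gravity is the midpoint, and the subgradient cut keeps the half containing the minimizer, so the algorithm reduces to bisection. A sketch on the assumed toy objective \(f(x)=(x-2)^2\):

```python
def center_of_gravity_1d(subgrad, lo, hi, iters=60):
    """1-D center-of-gravity method: the center of gravity of an interval is its
    midpoint, and the cut {x | g*(x - c) <= 0} keeps one half of the interval."""
    for _ in range(iters):
        c = (lo + hi) / 2           # center of gravity of [lo, hi]
        g = subgrad(c)
        if g > 0:                   # minimizer lies in {x | x <= c}
            hi = c
        elif g < 0:                 # minimizer lies in {x | x >= c}
            lo = c
        else:
            return c                # g = 0: c is already a minimizer
    return (lo + hi) / 2

# Minimize f(x) = (x - 2)^2 on [0, 10]; its subgradient is 2*(x - 2).
x_star = center_of_gravity_1d(lambda x: 2 * (x - 2), 0.0, 10.0)
```

The interval shrinks by a factor of 2 per iteration, mirroring the constant-factor volume reduction that makes the general method converge.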
