Optimization Theory

Linear Programming

What is LP?

An optimization problem can be written as

\[\begin{array}{rll} \text{minimize}& f(\boldsymbol{x}) \\ \text{subject to} & g_i(\boldsymbol{x})\le 0, & i=1,2,\cdots,m, \\ & h_j(\boldsymbol{x})=0, &j=1,2,\cdots,p, \end{array} \]

where \(\boldsymbol{x}\in\mathbb{R}^n\) is the variable to be determined, and \(f,g_i,h_j\) are given functions.

However, this problem is hard to solve in general. An important special case is linear programming (LP), in which \(f,g_i,h_j\) are all affine (linear) functions.

A linear program (in general form) is an optimization problem of the form $$ \begin{array}{rll} \text{minimize} & \boldsymbol{c^\top x} &\\ \text{subject to} & \boldsymbol{a_i^\top x}\ge b_i, & i\in M_1, \\ & \boldsymbol{a_i^\top x}\le b_i, & i\in M_2, \\ & \boldsymbol{a_i^\top x}= b_i, & i\in M_3, \\ & x_j\ge 0, & j\in N_1, \\ & x_j\le 0, & j\in N_2, \\ & x_j \text{ free}, & j\in N_3, \\ \end{array} $$ where $\boldsymbol{x}\in\mathbb{R}^n$ is the variable to be determined, $\boldsymbol{c,a_i}\in\mathbb{R}^n$ are given coefficient vectors, and $b_i\in\mathbb{R}$ are given scalars. Here $M_1,M_2,M_3$ are disjoint index sets with $M_1\cup M_2\cup M_3=\{1,2,\cdots,m\}$, and $N_1,N_2,N_3$ are disjoint index sets with $N_1\cup N_2\cup N_3=\{1,2,\cdots,n\}$; the variables $x_j$ with $j\in N_3$ are free (unrestricted in sign).

This can be transformed to the standard form of LP.

A linear program (in standard form) is an optimization problem of the form $$ \begin{align*} \text{minimize} \quad & \boldsymbol{c^\top x} \\ \text{subject to} \quad & \boldsymbol{Ax}=\boldsymbol{b}, \\ & \boldsymbol{x}\ge \boldsymbol{0}, \end{align*} $$ where $\boldsymbol{x}\in\mathbb{R}^n$ is the variable to be determined, $\boldsymbol{c}\in\mathbb{R}^n$ is a given coefficient vector, $\boldsymbol{A}\in\mathbb{R}^{m\times n}$ is a given coefficient matrix, and $\boldsymbol{b}\in\mathbb{R}^m$ is a given vector.
Any linear program in general form can be transformed into an equivalent linear program in standard form.
Each type of constraint can be transformed as follows:
  • For the constraints in the form \(\boldsymbol{a_i^\top x}\ge b_i\), we can add a slack variable \(s_i\) and transform it to \(\boldsymbol{a_i^\top x}-s_i=b_i\) and \(s_i\ge 0\).
  • For the constraints in the form \(\boldsymbol{a_i^\top x}\le b_i\), we can multiply both sides by \(-1\) so that it reduces to the previous case.
  • For the constraints in the form \(x_j\le 0\), we can multiply both sides by \(-1\) so that it reduces to the case \(x_j\ge 0\).

Now our program is in the form

\[\begin{align*} \text{minimize} \quad & \boldsymbol{c^\top x} \\ \text{subject to} \quad & \boldsymbol{Ax}=\boldsymbol{b}, \\ & x_j\ge 0, \quad j\in N, \end{align*} \]

where \(N\) is the index set of the sign-constrained variables (including the slack variables introduced above); the remaining variables are still free.

The last step is to eliminate the unconstrained variables. For each unconstrained variable \(x_j\), we can replace it with \(x_j^+-x_j^-\), where \(x_j^+,x_j^-\ge 0\). Then the program is in standard form.
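As a small illustration (a made-up two-variable instance), the general-form program on the left becomes the standard-form program on the right after introducing a slack variable \(s_1\) and splitting the free variable \(x_2=x_2^+-x_2^-\):

\[\begin{array}{rl} \text{minimize} & x_1 + 2x_2 \\ \text{subject to} & x_1 + x_2 \ge 3, \\ & x_1\ge 0,\ x_2 \text{ free}, \end{array} \qquad\Longrightarrow\qquad \begin{array}{rl} \text{minimize} & x_1 + 2x_2^+ - 2x_2^- \\ \text{subject to} & x_1 + x_2^+ - x_2^- - s_1 = 3, \\ & x_1,\,x_2^+,\,x_2^-,\,s_1\ge 0. \end{array} \]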

The Geometry of LP

In this section, we will discuss some geometric properties of linear programs.

A polyhedron is a set that can be described in the form $\{\boldsymbol{x}\in\mathbb{R}^n|\boldsymbol{Ax}\ge\boldsymbol{b}\}$, where $\boldsymbol{A}$ is an $m\times n$ matrix and $\boldsymbol{b}$ is a vector in $\mathbb{R}^m$.

Clearly, the feasible set of any linear programming problem is a polyhedron.

We will see that polyhedra are convex.

A set $S \subset \mathbb{R}^{n}$ is convex if for any $\boldsymbol{x}, \boldsymbol{y} \in S$, and any $\lambda \in [0,1]$, we have $\lambda\boldsymbol{x} + (1-\lambda)\boldsymbol{y} \in S$.
Let $\boldsymbol{x}^{1},\dots,\boldsymbol{x}^{k}$ be vectors in $\mathbb{R}^{n}$ and let $\lambda_{1},\dots,\lambda_{k}$ be nonnegative scalars whose sum is unity.
  • The vector \(\sum_{i=1}^{k}\lambda_{i}\boldsymbol{x}^{i}\) is said to be a convex combination of the vectors \(\boldsymbol{x}^{1},\dots,\boldsymbol{x}^{k}\).
  • The convex hull of the vectors \(\boldsymbol{x}^{1},\dots,\boldsymbol{x}^{k}\) is the set of all convex combinations of these vectors.

The result that follows establishes some important facts related to convexity.

  • The intersection of convex sets is convex.
  • Every polyhedron is a convex set.
  • A convex combination of a finite number of elements of a convex set also belongs to that set.
  • The convex hull of a finite number of vectors is a convex set.

For a bounded polyhedron, if we find all of its "corners", their convex hull is the polyhedron itself.

So how do we define a "corner"? There are three definitions: extreme point, vertex, and basic feasible solution.

Let $P$ be a polyhedron. A vector $\boldsymbol{x} \in P$ is an extreme point of $P$ if we cannot find two vectors $\boldsymbol{y}, \boldsymbol{z} \in P$, both different from $\boldsymbol{x}$, and a scalar $\lambda \in [0, 1]$, such that $\boldsymbol{x} = \lambda\boldsymbol{y} + (1 - \lambda)\boldsymbol{z}$.
Let $P$ be a polyhedron. A vector $\boldsymbol{x} \in P$ is a vertex of $P$ if there exists some $\boldsymbol{c}$ such that $\boldsymbol{c}^{\top}\boldsymbol{x} < \boldsymbol{c}^{\top}\boldsymbol{y}$ for all $\boldsymbol{y}$ satisfying $\boldsymbol{y} \in P$ and $\boldsymbol{y} \neq \boldsymbol{x}$.

We can also characterize a "corner" in linear-algebraic terms.

If a vector $\boldsymbol{x}^{*}$ satisfies $\boldsymbol{a}_{i}^{\top}\boldsymbol{x}^{*} = b_{i}$ for some $i$ in $M_{1}$, $M_{2}$, or $M_{3}$ (recall the definition of general form of LP), we say that the corresponding constraint is active or binding at $\boldsymbol{x}^{*}$.

Linear algebra tells us that if we have \(n\) linearly independent constraints active at a point in \(\mathbb{R}^n\), then this point is uniquely determined.

Let $\boldsymbol{x}^{*}$ be an element of $\mathbb{R}^{n}$ and let $I = \{i \mid \boldsymbol{a}_{i}^{\top}\boldsymbol{x}^{*} = b_{i}\}$ be the set of indices of constraints that are active at $\boldsymbol{x}^{*}$. Then, the following are equivalent:
  • There exist \(n\) vectors in the set \(\{\boldsymbol{a}_{i} \mid i \in I\}\), which are linearly independent.
  • The span of the vectors \(\boldsymbol{a}_{i}, i \in I\), is all of \(\mathbb{R}^{n}\), that is, every element of \(\mathbb{R}^{n}\) can be expressed as a linear combination of the vectors \(\boldsymbol{a}_{i}, i \in I\).
  • The system of equations \(\boldsymbol{a}_{i}^{\top}\boldsymbol{x} = b_{i}, i \in I\), has a unique solution.
Consider a polyhedron $P$ defined by linear equality and inequality constraints, and let $\boldsymbol{x}^{*}$ be an element of $\mathbb{R}^{n}$.
  • The vector \(\boldsymbol{x}^{*}\) is a basic solution if:
    • All equality constraints are active;
    • Out of the constraints that are active at \(\boldsymbol{x}^{*}\), there are \(n\) of them that are linearly independent.
  • If \(\boldsymbol{x}^{*}\) is a basic solution that satisfies all of the constraints, we say that it is a basic feasible solution.

Finally, we have the following important theorem that shows the equivalence of the three definitions of "corner".

Let $P$ be a nonempty polyhedron and let $\boldsymbol{x}^{*} \in P$. Then, the following are equivalent:
  • \(\boldsymbol{x}^{*}\) is a vertex;
  • \(\boldsymbol{x}^{*}\) is an extreme point;
  • \(\boldsymbol{x}^{*}\) is a basic feasible solution.

(Vertex \(\Rightarrow\) Extreme point) Suppose \(\boldsymbol{x}^{*}\) is a vertex but not an extreme point. Then there exist \(\boldsymbol{y}, \boldsymbol{z} \in P\) such that \(\boldsymbol{y} \neq \boldsymbol{x}^{*}\), \(\boldsymbol{z} \neq \boldsymbol{x}^{*}\), and \(\boldsymbol{x}^{*} = \lambda\boldsymbol{y} + (1 - \lambda)\boldsymbol{z}\) for some \(\lambda \in (0, 1)\). Let \(\boldsymbol{c}\) be the vector from the definition of a vertex, so that \(\boldsymbol{c}^{\top}\boldsymbol{x}^{*} < \boldsymbol{c}^{\top}\boldsymbol{y}\) and \(\boldsymbol{c}^{\top}\boldsymbol{x}^{*} < \boldsymbol{c}^{\top}\boldsymbol{z}\). Thus,

\[\boldsymbol{c}^{\top}\boldsymbol{x}^{*} = \lambda\boldsymbol{c}^{\top}\boldsymbol{y} + (1 - \lambda)\boldsymbol{c}^{\top}\boldsymbol{z} > \lambda\boldsymbol{c}^{\top}\boldsymbol{x}^{*} + (1 - \lambda)\boldsymbol{c}^{\top}\boldsymbol{x}^{*} = \boldsymbol{c}^{\top}\boldsymbol{x}^{*}, \]

which is a contradiction.

(Extreme point \(\Rightarrow\) Basic feasible solution) Suppose \(\boldsymbol{x}^{*}\) is an extreme point but not a basic feasible solution. Since \(\boldsymbol{x}^{*} \in P\), it satisfies all the constraints, and all equality constraints are active at \(\boldsymbol{x}^{*}\). Let \(I = \{i \mid \boldsymbol{a}_{i}^{\top}\boldsymbol{x}^{*} = b_{i}\}\) be the set of indices of constraints that are active at \(\boldsymbol{x}^{*}\). Since \(\boldsymbol{x}^{*}\) is not a basic feasible solution, the vectors \(\{\boldsymbol{a}_{i} \mid i \in I\}\) do not contain \(n\) linearly independent vectors. Thus, by the previous proposition, there exists a nonzero vector \(\boldsymbol{d}\) such that \(\boldsymbol{a}_{i}^{\top}\boldsymbol{d} = 0\) for all \(i \in I\). Let \(\epsilon > 0\) be small enough that \(\boldsymbol{y}=\boldsymbol{x}^{*} + \epsilon\boldsymbol{d}\) and \(\boldsymbol{z}=\boldsymbol{x}^{*} - \epsilon\boldsymbol{d}\) satisfy all the constraints; such an \(\epsilon\) exists because the active constraints are unaffected (\(\boldsymbol{a}_{i}^{\top}\boldsymbol{d} = 0\) for \(i\in I\)) and the inactive constraints hold with strict inequality at \(\boldsymbol{x}^{*}\). Then \(\boldsymbol{x}^{*} = \frac{1}{2}(\boldsymbol{y}+\boldsymbol{z})\) with \(\boldsymbol{y},\boldsymbol{z}\neq\boldsymbol{x}^{*}\), which contradicts the assumption that \(\boldsymbol{x}^{*}\) is an extreme point.

(Basic feasible solution \(\Rightarrow\) Vertex) W.l.o.g., we assume \(M_2=\emptyset\). Suppose \(\boldsymbol{x}^{*}\) is a basic feasible solution. Let \(I=\{i \mid \boldsymbol{a}_{i}^{\top}\boldsymbol{x}^{*} = b_{i}\}\) and \(\boldsymbol{c}=\sum_{i \in I}\boldsymbol{a}_{i}\). Then,

\[\boldsymbol{c}^{\top}\boldsymbol{x}^{*} = \sum_{i \in I}\boldsymbol{a}_{i}^{\top}\boldsymbol{x}^{*} = \sum_{i \in I}b_{i}. \]

For any \(\boldsymbol{y} \in P\), we have

\[\boldsymbol{c}^{\top}\boldsymbol{y} = \sum_{i \in I}\boldsymbol{a}_{i}^{\top}\boldsymbol{y} \geq \sum_{i \in I}b_{i} = \boldsymbol{c}^{\top}\boldsymbol{x}^{*}. \]

Furthermore, equality holds if and only if \(\boldsymbol{a}_{i}^{\top}\boldsymbol{y} = b_{i}\) for all \(i \in I\). Since \(\boldsymbol{x}^{*}\) is a basic feasible solution, the system \(\boldsymbol{a}_{i}^{\top}\boldsymbol{y} = b_{i},\ i\in I\), has a unique solution, namely \(\boldsymbol{y} = \boldsymbol{x}^{*}\). Hence \(\boldsymbol{c}^{\top}\boldsymbol{y} > \boldsymbol{c}^{\top}\boldsymbol{x}^{*}\) for every \(\boldsymbol{y}\in P\) with \(\boldsymbol{y}\neq\boldsymbol{x}^{*}\), and therefore \(\boldsymbol{x}^{*}\) is a vertex.

Clearly, the number of basic solutions is bounded above by the number of ways of choosing \(n\) linearly independent constraints out of the given constraints.

Given a finite number of linear inequality constraints, there can only be a finite number of basic or basic feasible solutions.

The next question is, how to find all the basic feasible solutions of a polyhedron? First, let's see how to find all the basic solutions of a linear program in standard form. W.l.o.g., we always assume that the rows of \(\boldsymbol{A}\) are linearly independent, otherwise we can remove the dependent rows.

Consider the constraints $\boldsymbol{Ax} = \boldsymbol{b}$ and $\boldsymbol{x} \geq \boldsymbol{0}$ and assume that the $m \times n$ matrix $\boldsymbol{A}$ has linearly independent rows. A vector $\boldsymbol{x} \in \mathbb{R}^{n}$ is a basic solution if and only if we have $\boldsymbol{Ax} = \boldsymbol{b}$, and there exist indices $B(1), \dots, B(m)$ such that:
  • The columns \(\boldsymbol{A}_{B(1)}, \dots, \boldsymbol{A}_{B(m)}\) are linearly independent;
  • If \(i \neq B(1), \dots, B(m)\), then \(x_{i} = 0\).
($\Rightarrow$) Suppose $\boldsymbol{x}$ is a basic solution. Let $B(1),\ldots, B(k)$ be the indices of the nonzero components of $\boldsymbol{x}$. Since $\boldsymbol{x}$ is a basic solution, the system formed by the active constraints $\boldsymbol{Ax}=\boldsymbol{b}$ and $x_i=0,\ i\neq B(1),\ldots,B(k)$, has a unique solution; equivalently, $\sum_{i=1}^k \boldsymbol{A}_{B(i)}x_{B(i)}=\boldsymbol{b}$ has a unique solution. Thus, the columns $\boldsymbol{A}_{B(1)},\ldots,\boldsymbol{A}_{B(k)}$ are linearly independent, and in particular $k\le m$. Since the rows of $\boldsymbol{A}$ are linearly independent, $\operatorname{rank}(\boldsymbol{A})=m$, so we can add $m-k$ further indices so that the columns $\boldsymbol{A}_{B(1)},\ldots,\boldsymbol{A}_{B(m)}$ are linearly independent. Now both conditions are satisfied.

(\(\Leftarrow\)) Suppose \(\boldsymbol{x}\) satisfies the two conditions. Then we have

\[\sum_{i=1}^m \boldsymbol{A}_{B(i)}x_{B(i)} = \sum_{i=1}^n \boldsymbol{A}_i x_i = \boldsymbol{Ax}=\boldsymbol{b}. \]

Thus, the active constraints \(\boldsymbol{Ax}=\boldsymbol{b}\) and \(x_i=0,\ i\neq B(1),\ldots,B(m)\), determine \(\boldsymbol{x}\) uniquely. By the earlier proposition, there are \(n\) linearly independent active constraints at \(\boldsymbol{x}\), and this implies that \(\boldsymbol{x}\) is a basic solution.

Now we can find all the basic solutions by enumerating all the ways to choose \(m\) linearly independent columns of \(\boldsymbol{A}\), setting the corresponding variables as unknowns and the rest as zero, and solving the linear system. Finally, we can check which basic solutions are feasible.

  1. Choose \(m\) linearly independent columns \(\boldsymbol{A}_{B(1)}, \ldots, \boldsymbol{A}_{B(m)}\).
  2. Let \(x_{i} = 0\) for all \(i \neq B(1), \ldots, B(m)\).
  3. Solve the system of \(m\) equations \(\boldsymbol{Ax} = \boldsymbol{b}\) for the unknowns \(x_{B(1)}, \cdots, x_{B(m)}\).

Remark: The variables \(x_{B(1)}, \ldots, x_{B(m)}\) are called basic variables, and the rest are called nonbasic variables. By arranging the \(m\) basic columns next to each other, we obtain an \(m\times m\) matrix \(\boldsymbol{B}=[\boldsymbol{A}_{B(1)}\cdots\boldsymbol{A}_{B(m)}]\), called a basis matrix. Clearly, \(\boldsymbol{B}\) is invertible.
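Below is a minimal sketch of this enumeration in Python (plain NumPy; the function name, the determinant-based independence check, and the tiny example instance are illustrative choices rather than anything prescribed by the text):

```python
import itertools
import numpy as np

def basic_solutions(A, b, tol=1e-9):
    """Enumerate all basic solutions of {x : Ax = b, x >= 0} in standard form.

    Assumes the rows of A are linearly independent. Returns a list of
    (x, is_feasible) pairs, one for each choice of m linearly independent columns.
    """
    m, n = A.shape
    solutions = []
    for cols in itertools.combinations(range(n), m):
        B = A[:, cols]                           # candidate basis matrix
        if abs(np.linalg.det(B)) < tol:          # columns linearly dependent: skip
            continue
        x = np.zeros(n)
        x[list(cols)] = np.linalg.solve(B, b)    # basic variables; nonbasic ones stay 0
        feasible = bool(np.all(x >= -tol))       # basic *feasible* solution if x >= 0
        solutions.append((x, feasible))
    return solutions

# Tiny example: {x in R^3 : x1 + x2 + x3 = 1, x >= 0} (a triangle).
# Its basic feasible solutions are exactly the three unit vectors.
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
for x, feasible in basic_solutions(A, b):
    print(x, "feasible" if feasible else "infeasible")
```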

However, some polyhedra do not have extreme points at all. It turns out that the existence of extreme points depends on whether the polyhedron contains an infinite line or not.

A polyhedron $P\subset \mathbb{R}^n$ contains a line if there exist a vector $\boldsymbol{x}\in P$ and a nonzero vector $\boldsymbol{d}\in\mathbb{R}^n$ such that $\boldsymbol{x}+\lambda\boldsymbol{d}\in P$ for all scalars $\lambda$.
Suppose that the polyhedron $P = \{\boldsymbol{x} \in \mathbb{R}^{n} \mid \boldsymbol{a}_{i}^{\top}\boldsymbol{x} \geq b_{i},\ i = 1, \ldots, m\}$ is nonempty. Then, the following are equivalent:
  1. The polyhedron \(P\) has at least one extreme point.
  2. There exist \(n\) vectors out of the family \(\boldsymbol{a}_{1}, \ldots, \boldsymbol{a}_{m}\), which are linearly independent.
  3. The polyhedron \(P\) does not contain a line.

(1 \(\Rightarrow\) 2) Suppose that \(P\) has an extreme point \(\boldsymbol{x}^{*}\). Then \(\boldsymbol{x}^{*}\) is a basic feasible solution, so there exist \(n\) constraints that are active at \(\boldsymbol{x}^{*}\) whose corresponding vectors are linearly independent.

(2 \(\Rightarrow\) 3) Suppose that there exist \(n\) linearly independent vectors \(\boldsymbol{a}_{i_1},\boldsymbol{a}_{i_2},\cdots,\boldsymbol{a}_{i_n}\), and suppose, for contradiction, that \(P\) contains a line \(\boldsymbol{x}+\lambda\boldsymbol{d}\) with \(\boldsymbol{d}\neq \boldsymbol{0}\). Since these \(n\) vectors span \(\mathbb{R}^n\) and \(\boldsymbol{d}\neq\boldsymbol{0}\), we have \(\boldsymbol{a}_{i_j}^{\top}\boldsymbol{d}\neq 0\) for some \(j\in\{1,2,\cdots,n\}\). But then \(\boldsymbol{a}_{i_j}^{\top}(\boldsymbol{x}+\lambda\boldsymbol{d})\ge b_{i_j}\) cannot hold for all \(\lambda\), contradicting the assumption that the whole line lies in \(P\).

(3 \(\Rightarrow\) 1) Let \(\boldsymbol{x}\) be an element of \(P\) and let \(I=\{i \mid \boldsymbol{a}_{i}^{\top}\boldsymbol{x} = b_{i}\}\). If \(n\) of the vectors \(\boldsymbol{a}_{i}, i \in I\), are linearly independent, then \(\boldsymbol{x}\) is a basic feasible solution. Otherwise, by the previous proposition, there exists a nonzero vector \(\boldsymbol{d}\) such that \(\boldsymbol{a}_{i}^{\top}\boldsymbol{d} = 0\) for all \(i \in I\). Since \(P\) does not contain a line, the point \(\boldsymbol{x}+\lambda\boldsymbol{d}\) must eventually leave \(P\) as \(\lambda\) varies, so there exist some \(\lambda^*\) with \(\boldsymbol{y}=\boldsymbol{x}+\lambda^*\boldsymbol{d}\in P\) and some \(j\notin I\) such that \(\boldsymbol{a}_{j}^{\top}\boldsymbol{y} = b_{j}\). Note that \(\boldsymbol{a}_{j}^{\top}\boldsymbol{d}\neq 0\) (otherwise constraint \(j\) could never become active along the line), so \(\boldsymbol{a}_{j}\) is not in the span of \(\{\boldsymbol{a}_{i}\mid i\in I\}\); hence, by moving from \(\boldsymbol{x}\) to \(\boldsymbol{y}\), the number of linearly independent active constraints increases by at least one. Repeating this process, we eventually reach a point with \(n\) linearly independent active constraints, which is a basic feasible solution and thus an extreme point.

Bounded polyhedra and polyhedra in standard form do not contain any line.

Every nonempty bounded polyhedron and every nonempty polyhedron in standard form has at least one extreme point.

To end this section, we have the following important theorem that shows the importance of extreme points in linear programming.

Suppose that the feasible polyhedron has at least one extreme point (equivalently, by the previous theorem, that it contains no line); this is the case, for example, for problems in standard form. If such a linear program has an optimal solution, then at least one of its optimal solutions is an extreme point.
Consider the linear programming problem of minimizing $\boldsymbol{c}^{\top}\boldsymbol{x}$ over a polyhedron $P=\{\boldsymbol{x} \in \mathbb{R}^{n} \mid \boldsymbol{Ax} \ge\boldsymbol{b}\}$ that contains no line, and let $v$ be the optimal value. Then $Q=\{\boldsymbol{x} \in \mathbb{R}^{n} \mid \boldsymbol{Ax} \ge\boldsymbol{b}, \boldsymbol{c}^{\top}\boldsymbol{x} = v\}$ is a nonempty polyhedron contained in $P$, so $Q$ contains no line either. By the above theorem, $Q$ has an extreme point $\boldsymbol{x}^{*}$.

Suppose \(\boldsymbol{x}^{*}\) is not an extreme point of \(P\). Then there exist \(\boldsymbol{y}, \boldsymbol{z} \in P\) such that \(\boldsymbol{y} \neq \boldsymbol{x}^{*}\), \(\boldsymbol{z} \neq \boldsymbol{x}^{*}\), and \(\boldsymbol{x}^{*} = \lambda\boldsymbol{y} + (1 - \lambda)\boldsymbol{z}\) for some \(\lambda \in (0, 1)\). It follows that \(v=\boldsymbol{c}^{\top}\boldsymbol{x}^{*} = \lambda\boldsymbol{c}^{\top}\boldsymbol{y} + (1 - \lambda)\boldsymbol{c}^{\top}\boldsymbol{z}\). Since \(v\) is optimal, we have \(\boldsymbol{c}^{\top}\boldsymbol{y} = v\) and \(\boldsymbol{c}^{\top}\boldsymbol{z} = v\). Thus, \(\boldsymbol{y}, \boldsymbol{z} \in Q\), which contradicts the assumption that \(\boldsymbol{x}^{*}\) is an extreme point of \(Q\). Therefore, \(\boldsymbol{x}^{*}\) is an extreme point of \(P\), and it is an optimal solution of the linear program.

Simplex Method

Back to the linear programming problem, our goal is to find an optimal solution. By the above theorem, we only need to check the extreme points.

The simplex method is an algorithm that iteratively moves along the "edges" of the polyhedron from one extreme point to another, until it reaches an optimal extreme point.

Initialization

First, we need to find an initial basic feasible solution. We can use the two-phase method.

Assume the original LP is in standard form:

\[\begin{align*} \text{minimize} \quad & \boldsymbol{c^\top x} \\ \text{subject to} \quad & \boldsymbol{Ax}=\boldsymbol{b}, \\ & \boldsymbol{x}\ge \boldsymbol{0}. \end{align*} \]

Consider the auxiliary LP:

\[\begin{align*} \text{minimize} \quad & \sum_{i=1}^m y_i \\ \text{subject to} \quad & \boldsymbol{Ax}+\boldsymbol{y}=\boldsymbol{b}, \\ & \boldsymbol{x}\ge \boldsymbol{0}, \\ & \boldsymbol{y}\ge \boldsymbol{0}. \end{align*} \]

The original LP is feasible if and only if the auxiliary LP has optimal objective value \(0\). So we can solve the auxiliary LP first. Moreover, the auxiliary LP has an obvious basic feasible solution: \(\boldsymbol{x}=\boldsymbol{0}\) and \(\boldsymbol{y}=\boldsymbol{b}\) (after multiplying rows by \(-1\) where necessary so that \(\boldsymbol{b}\ge\boldsymbol{0}\)). Thus, we can run the simplex method on the auxiliary LP to obtain an initial basic feasible solution of the original LP, and then run the simplex method again to solve the original LP.
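A minimal sketch of this phase-one construction (plain NumPy; the helper name phase_one_data and the returned layout are illustrative assumptions):

```python
import numpy as np

def phase_one_data(A, b):
    """Build the auxiliary (phase-one) LP for the standard-form constraints Ax = b, x >= 0.

    Returns (c_aux, A_aux, b_fixed, x0) for
        minimize sum(y)  subject to  Ax + y = b, x >= 0, y >= 0,
    together with the obvious initial basic feasible solution (x, y) = (0, b).
    """
    A = np.asarray(A, dtype=float).copy()
    b = np.asarray(b, dtype=float).copy()
    m, n = A.shape
    # Flip the sign of any row with b_i < 0 so that (x, y) = (0, b) is feasible.
    neg = b < 0
    A[neg] *= -1
    b[neg] *= -1
    A_aux = np.hstack([A, np.eye(m)])              # variables are (x, y)
    c_aux = np.concatenate([np.zeros(n), np.ones(m)])
    x0 = np.concatenate([np.zeros(n), b])          # initial BFS: basis = artificial columns
    return c_aux, A_aux, b, x0
```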

Iteration

Next, how do we find a direction to move in from the current vertex?

Let $\boldsymbol{x}$ be an element of a polyhedron $P$. A vector $\boldsymbol{d} \in \mathbb{R}^n$ is said to be a feasible direction at $\boldsymbol{x}$, if there exists a positive scalar $\theta$ for which $\boldsymbol{x} + \theta\boldsymbol{d} \in P$.

Let \(\boldsymbol{x}\) be a basic feasible solution with basis matrix \(\boldsymbol{B}=[\boldsymbol{A}_{B(1)}\cdots\boldsymbol{A}_{B(m)}]\). We have \(x_i = 0\) for every nonbasic variable, while the vector \(\boldsymbol{x}_B = (x_{B(1)},\cdots,x_{B(m)})\) of basic variables is given by

\[\boldsymbol{x}_B=\boldsymbol{B}^{-1}\boldsymbol{b}. \]

Consider moving from \(\boldsymbol{x}\) to \(\boldsymbol{x}+\theta\boldsymbol{d}\) by changing exactly one nonbasic variable \(x_j\) from \(0\) to \(\theta\). That is, we set \(d_j=1\) and \(d_i=0\) for every other nonbasic index \(i\neq j\). Since we are only interested in feasible solutions, we require \(\boldsymbol{A}(\boldsymbol{x} + \theta\boldsymbol{d}) = \boldsymbol{b}\); because \(\boldsymbol{Ax} = \boldsymbol{b}\), this forces \(\boldsymbol{Ad} = \boldsymbol{0}\). Thus

\[\boldsymbol{0}=\boldsymbol{Ad}=\sum_{i=1}^n \boldsymbol{A}_i d_i = \sum_{i=1}^m \boldsymbol{A}_{B(i)} d_{B(i)} + \boldsymbol{A}_j=\boldsymbol{B}\boldsymbol{d}_B + \boldsymbol{A}_j. \]

Solving this equation gives

\[\boldsymbol{d}_B = -\boldsymbol{B}^{-1}\boldsymbol{A}_j. \]

The effect on the cost of moving along this direction, the reduced cost of \(x_j\), is given by

\[\overline{c}_j=\boldsymbol{c}^\top\boldsymbol{d} = c_j-\boldsymbol{c}_B^\top\boldsymbol{B}^{-1}\boldsymbol{A}_j. \]
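In code, both quantities amount to one linear solve against the basis matrix; here is a short sketch (NumPy; the function name and the argument B_idx, a list of the basic column indices, are assumptions for illustration):

```python
import numpy as np

def basic_direction_and_reduced_cost(A, c, B_idx, j):
    """Given basic column indices B_idx and a nonbasic index j, return the j-th
    basic direction d (with d_j = 1) and the reduced cost c_bar_j."""
    m, n = A.shape
    B = A[:, B_idx]                                  # basis matrix
    w = np.linalg.solve(B, A[:, j])                  # B^{-1} A_j
    d = np.zeros(n)
    d[j] = 1.0
    d[B_idx] = -w                                    # d_B = -B^{-1} A_j
    c_bar_j = c[j] - c[B_idx] @ w                    # c_j - c_B^T B^{-1} A_j
    return d, c_bar_j
```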

Consider a basic feasible solution $\boldsymbol{x}$ associated with a basis matrix $\boldsymbol{B}$, and let $\overline{\boldsymbol{c}}$ be the corresponding vector of reduced costs.
  • If \(\overline{\boldsymbol{c}} \geq \mathbf{0}\), then \(\boldsymbol{x}\) is optimal.
  • If \(\boldsymbol{x}\) is optimal and nondegenerate, then \(\overline{\boldsymbol{c}} \geq \mathbf{0}\).
TBD

Once we find a descent direction \(\boldsymbol{d}\), we can move along this direction as far as possible:

\[\theta^*=\max\{\theta\ge 0|\boldsymbol{x}+\theta\boldsymbol{d}\in P\}. \]

For a polyhedron in standard form, and assuming \(d_{B(i)}<0\) for at least one \(i\) (if \(\boldsymbol{d}_B\ge\boldsymbol{0}\), then \(\theta^*=\infty\) and the optimal cost is \(-\infty\)), this simplifies to

\[\theta^*=\min_{i:d_{B(i)}<0}\left(-\frac{x_{B(i)}}{d_{B(i)}}\right)=-\frac{x_{B(\ell)}}{d_{B(\ell)}}. \]

We have thus found a new basic feasible solution \(\boldsymbol{x}+\theta^*\boldsymbol{d}\): the nonbasic variable \(x_j\) enters the basis and the basic variable \(x_{B(\ell)}\) leaves it. Let \(\overline{\boldsymbol{B}}\) denote the matrix obtained from \(\boldsymbol{B}\) by replacing the column \(\boldsymbol{A}_{B(\ell)}\) with \(\boldsymbol{A}_{j}\).

  • The columns \(\boldsymbol{A}_{B(i)}\), \(i \neq \ell\), and \(\boldsymbol{A}_{j}\) are linearly independent and, therefore, \(\overline{\boldsymbol{B}}\) is a basis matrix.
  • The vector \(\boldsymbol{y} = \boldsymbol{x} + \theta^{*}\boldsymbol{d}\) is a basic feasible solution associated with the basis matrix \(\overline{\boldsymbol{B}}\).
TBD

Summary

Find an initial basic feasible solution $\boldsymbol{x}$ by the two-phase method.

Repeat until optimality is reached:

  • If there is no descent direction, \(\boldsymbol{x}\) is optimal;
  • Otherwise, choose a descent direction \(\boldsymbol{d}\) by selecting a nonbasic variable with negative reduced cost, and update \(\boldsymbol{x}\) by moving along \(\boldsymbol{d}\) as far as possible to obtain a new basic feasible solution (see the sketch below).
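Below is a minimal, naive sketch of this loop (plain NumPy, dense linear solves, a smallest-index entering rule, and no anti-cycling safeguard; the function name, the tolerance, and the small example instance at the end are illustrative assumptions, and the initial basis is assumed to come from the two-phase method or from slack variables):

```python
import numpy as np

def simplex(A, b, c, basis, tol=1e-9, max_iter=1000):
    """Naive simplex method for  min c^T x  s.t.  Ax = b, x >= 0,
    starting from the basic feasible solution defined by `basis`
    (a list of m column indices). Returns (x, basis) at optimality,
    or raises if the problem is unbounded."""
    m, n = A.shape
    basis = list(basis)
    for _ in range(max_iter):
        B = A[:, basis]
        x = np.zeros(n)
        x[basis] = np.linalg.solve(B, b)                 # current BFS
        y = np.linalg.solve(B.T, c[basis])               # y = B^{-T} c_B
        reduced = c - A.T @ y                            # reduced costs c_bar
        j = next((k for k in range(n)
                  if k not in basis and reduced[k] < -tol), None)
        if j is None:
            return x, basis                              # all c_bar >= 0: optimal
        d_B = -np.linalg.solve(B, A[:, j])               # basic part of the direction
        if np.all(d_B >= -tol):
            raise ValueError("problem is unbounded (optimal cost is -infinity)")
        # Ratio test: how far can we move before a basic variable hits zero?
        ratios = [(-x[basis[i]] / d_B[i], i) for i in range(m) if d_B[i] < -tol]
        theta, ell = min(ratios)
        basis[ell] = j                                   # A_{B(ell)} leaves, A_j enters
    raise RuntimeError("maximum number of iterations reached")

# Example: minimize -x1 - x2 s.t. x1 + 2x2 + s1 = 4, 3x1 + x2 + s2 = 6, x >= 0.
A = np.array([[1.0, 2.0, 1.0, 0.0], [3.0, 1.0, 0.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([-1.0, -1.0, 0.0, 0.0])
x, basis = simplex(A, b, c, basis=[2, 3])   # the slack basis is an obvious initial BFS
print(x, c @ x)                             # optimum x = (1.6, 1.2, 0, 0), cost -2.8
```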

The simplex method can require exponentially many iterations in the worst case (for example, on the Klee–Minty examples), but it works very well in practice.

Duality Theory

Duality theory is an important part of linear programming. Every linear programming problem, called the primal problem, has a corresponding dual problem. The solutions of the two problems are closely related.

Let $\boldsymbol{A}$ be a matrix with rows $\boldsymbol{a}_{i}^{\top}$ and columns $\boldsymbol{A}_{j}$. Given a primal problem with the structure shown on the left, its dual is defined to be the maximization problem shown on the right:

\[\begin{array}{rllrll} \text{minimize} & \boldsymbol{c^\top x} & & \quad \text{maximize} &\mathbf{p}^{\top}\mathbf{b}\\ \text{subject to} & \boldsymbol{a_i^\top x}\ge b_i, & i\in M_1, & \quad\text{subject to} & p_{i} \geq 0, & i \in M_{1},\\ & \boldsymbol{a_i^\top x}\le b_i, & i\in M_2, & & p_{i} \leq 0, & i \in M_{2},\\ & \boldsymbol{a_i^\top x}= b_i, & i\in M_3, & & p_{i} \text{ free}, & i \in M_{3},\\ & x_j\ge 0, & j\in N_1, & & \mathbf{p}^{\top}\mathbf{A}_{j} \leq c_{j}, & j \in N_{1}, \\ & x_j\le 0, & j\in N_2, & & \mathbf{p}^{\top}\mathbf{A}_{j} \geq c_{j}, & j \in N_{2}, \\ & x_j \text{ free}, & j\in N_3, & & \mathbf{p}^{\top}\mathbf{A}_{j} = c_{j}, & j \in N_{3}. \end{array} \]

In standard form, the primal and dual problems are:

\[\begin{array}{rlrl} \text{minimize} & \boldsymbol{c^\top x} & \quad \text{maximize} &\mathbf{p}^{\top}\mathbf{b}\\ \text{subject to} & \boldsymbol{Ax}=\boldsymbol{b}, & \quad\text{subject to} & \mathbf{p}^{\top}\mathbf{A} \leq \mathbf{c}^{\top}.\\ & \boldsymbol{x}\ge \boldsymbol{0}, & & \end{array} \]

If we transform the dual into an equivalent minimization problem and then form its dual, we obtain a problem equivalent to the original problem.

How would the dual problem help us understand the primal problem itself?

Duality Theorems

First, let's show that the cost of any feasible solution to the dual is a lower bound on the cost of any feasible solution to the primal.

If $\boldsymbol{x}$ is a feasible solution to the primal problem and $\boldsymbol{p}$ is a feasible solution to the dual problem, then

\[\boldsymbol{p}^\top\boldsymbol{b} \leq \boldsymbol{c}^\top\boldsymbol{x}. \]

Suppose $\boldsymbol{x}$ and $\boldsymbol{p}$ are feasible solutions to the primal and dual problems, respectively. Let $u_i=p_i(\boldsymbol{a}_i^\top\boldsymbol{x}-b_i)$ and $v_j=(c_j-\boldsymbol{p}^\top\boldsymbol{A}_j)x_j$. By the sign conventions in the definition of the dual, $u_i\ge 0$ for all $i$ and $v_j\ge 0$ for all $j$ (for instance, if $i\in M_1$ then $p_i\ge 0$ and $\boldsymbol{a}_i^\top\boldsymbol{x}\ge b_i$). Thus, $$ 0\le \sum_i u_i + \sum_j v_j =\boldsymbol{c}^\top\boldsymbol{x}-\boldsymbol{p}^\top\boldsymbol{b}. $$

This has two simple corollaries.

  • If the optimal cost in the primal is \(-\infty\), then the dual problem must be infeasible.
  • If the optimal cost in the dual is \(+\infty\), then the primal problem must be infeasible.
Let $\boldsymbol{x}$ and $\boldsymbol{p}$ be feasible solutions to the primal and the dual, respectively, and suppose that $\boldsymbol{p}^{\top}\boldsymbol{b} = \boldsymbol{c}^{\top}\boldsymbol{x}$. Then, $\boldsymbol{x}$ and $\boldsymbol{p}$ are optimal solutions to the primal and the dual, respectively.
(Farkas' lemma) Let $\boldsymbol{A}$ be a matrix of dimensions $m \times n$ and let $\boldsymbol{b}$ be a vector in $\mathbb{R}^m$. Then, exactly one of the following two alternatives holds:
  • There exists some \(\boldsymbol{x} \geq \boldsymbol{0}\) such that \(\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}\).
  • There exists some vector \(\boldsymbol{p}\) such that \(\boldsymbol{p}^{\top}\boldsymbol{A} \geq \boldsymbol{0}^{\top}\) and \(\boldsymbol{p}^{\top}\boldsymbol{b} < 0\).

The next theorem is the central result on linear programming duality.

If a linear programming problem has an optimal solution, then so does its dual, and the optimal values of the objective functions in the two problems are equal.
TBD
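Although the proof is omitted here, strong duality is easy to observe numerically on a small standard-form instance. Below is a sketch using scipy.optimize.linprog (SciPy is assumed to be available, and the instance data are made up):

```python
import numpy as np
from scipy.optimize import linprog

# A small standard-form primal:  min c^T x  s.t.  Ax = b, x >= 0.
A = np.array([[1.0, 2.0, 1.0, 0.0],
              [3.0, 1.0, 0.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([-1.0, -1.0, 0.0, 0.0])

# Primal: linprog minimizes by default; x >= 0 is its default bound, made explicit here.
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 4)

# Dual:  max p^T b  s.t.  A^T p <= c, p free.  We minimize -b^T p instead.
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)] * 2)

print(primal.fun, -dual.fun)   # the two optimal values coincide (strong duality)
```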

Cutting Plane Methods

Finally, how do we design faster algorithms?

Convexity

Surprisingly, the answer is to step back and look at the more general convex problems.

The epigraph of a function $f:\mathbb{R}^n\to\mathbb{R}$ is defined as

\[\textbf{epi }f=\{(x,t)\mid x\in\textbf{dom } f,f(x)\le t\}. \]

A function is convex if its epigraph is a convex set.
A more commonly used definition is the following.
A function $ f:\mathbb{R}^{n}\rightarrow\mathbb{R} $ is convex if $ \textbf{dom }f $ is a convex set and if for all $ x,y\in\textbf{dom }f $, and $ \theta $ with $ 0\leq\theta\leq1 $, we have

\[f(\theta x+(1-\theta)y)\leq\theta f(x)+(1-\theta)f(y). \]

Suppose $C$, $D$ are nonempty disjoint convex sets, then there exists a vector $\boldsymbol{a} \neq \boldsymbol{0}$ and a scalar $b$ such that

\[\{\boldsymbol{x} \mid \boldsymbol{a}^{\top}\boldsymbol{x} \leq b\} \supseteq C, \quad \{\boldsymbol{x} \mid \boldsymbol{a}^{\top}\boldsymbol{x} \geq b\} \supseteq D. \]

For any nonempty convex set $C$, and any point $x_0$ on its boundary, there exists a supporting hyperplane to $C$ at $x_0$.

Center-of-Gravity Algorithm

We say a vector $g \in \mathbb{R}^{n}$ is a subgradient of $f: \mathbb{R}^{n} \to \mathbb{R}$ at $x \in \textbf{dom } f$ if for all $z \in \textbf{dom } f$,

\[f(z) \geq f(x) + g^{\top}(z - x). \]

For linear programming, the subgradient is simply the cost vector \(\boldsymbol{c}\).

Initialize $S_0$ to be a convex set containing an optimal solution.

Repeat for \(t=0,1,\ldots\) until convergence:

  1. Compute the center of gravity \(c_t\) of \(S_t\):

    \[c_t=\frac{1}{\text{vol}(S_t)}\int_{x\in S_t} x \text{d}x. \]

  2. Find a subgradient \(g_t\) of \(f\) at \(c_t\) and update:

    \[S_{t+1}=S_t\cap\{x\mid g_t^{\top}(x-c_t)\leq 0\}. \]

Suppose that $f:S_0\to [-B,B]$. The center-of-gravity algorithm satisfies $$ f(x_t)-\min_{x\in S_0}f(x)\le 2B\left(1-\frac{1}{e}\right)^{t/n}, $$ where $x_t\in \arg\min_{1\le i\le t} f(c_i)$.
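As a toy illustration, the following sketch runs the method on a linear cost over the box \([0,1]^2\), approximating each center of gravity by Monte Carlo rejection sampling (the instance, the sample count, and the stopping rule are all illustrative assumptions; the guarantee above assumes the exact centroid is computed):

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimize the linear cost f(x) = c^T x over S_0 = [0, 1]^2; its subgradient is just c.
c = np.array([1.0, 2.0])
halfspaces = []                      # accumulated cuts, each stored as (a, beta): a^T x <= beta

def in_S(points):
    """Membership in S_t = S_0 intersected with all cuts collected so far."""
    inside = np.all((points >= 0.0) & (points <= 1.0), axis=1)
    for a, beta in halfspaces:
        inside &= points @ a <= beta
    return inside

best = None
for t in range(30):
    samples = rng.uniform(0.0, 1.0, size=(20000, 2))     # sample the bounding box [0,1]^2
    samples = samples[in_S(samples)]
    if len(samples) == 0:
        break                                             # S_t has become too small to hit
    center = samples.mean(axis=0)                         # approximate center of gravity c_t
    if best is None or c @ center < c @ best:
        best = center
    g = c                                                 # subgradient of f at the center
    halfspaces.append((g, g @ center))                    # keep {x : g^T (x - c_t) <= 0}

print(best)   # should approach the minimizer (0, 0)
```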

However, computing the center of gravity is hard. We need more practical algorithms.

Ellipsoid Method
