Loading

Linear And Nonlinear Least-Squares With Math.NET

Linear And Nonlinear Least-Squares With Math.NET

by Libor Tinka

中国版本 (Chinese version of this article)

Linear And Nonlinear Least-Squares

CONTENTS

1. INTRODUCTION

2. LEAST-SQUARES IN GENERAL

3. LINEAR LEAST SQUARES

3.1 Solution with Normal Equations

3.2 Solution with QR Decomposition

3.3 Solution with Singular Value Decomposition (SVD)

3.4 Example

4. NONLINEAR LEAST-SQUARES

4.1 Steepest Descent Method

4.2 Gauss-Newton Method

4.3 Levenberg-Marquardt Method

5. IMPLEMENTATION

5.1 Demo Application

5.2 Source Code

6. ALGORITHM IMPROVEMENTS

7. FURTHER READING

1. INTRODUCTION

This article explains how to solve linear and nonlinear least-squares problems and provides C# implementation for it. All the numerical methods used in the demo project stand on top of vector and matrix operations provided by the Math.NET numerical library. You can, of course, adjust the presented algorithms to work with any other numerical library that can handle commonly used matrix operations (e.g. LAPACK).

The tutorial covers discrete least-squares problems (i.e. fitting a set of points with known model). We will care about over-determined problems (e.g. where the number of measurements are greater than number of unknowns) and assume the problems are well-conditioned. The tutorial goes from simple to more complex and more general methods.

This article was written for programmers with basic knowledge of matrix computations and calculus.

2. LEAST-SQUARES IN GENERAL

The least-squares problem arise in many applications (statistics, financial, physics...). It can be viewed as an optimization problem. We gathered some data and want to find model parameters for which the model best fits the data in some sense.

The following section is purely mathematical. It is a general description of the problem, without reference to any specific application (e.g. fitting parabola to a set of points...). However, it is crucial for the following derivation of solutions for both linear and nonlinear problems.

The least-squares problem can be written compactly as:

(1)

where

Here

As you can see, the function

The residual value can be either positive or negative and in both cases it means that the model deviates from measured values. We would like to have as small residuals as possible, but in absolute value. This is why we look for least squares of the residuals, rather than residual values itself. Note that there exist variants of the above optimization problem, where different norms (e.g. absolute values) are used instead of squares. But least squares problem is easy to solve and have a remarkable statistical property of being so called maximum likelihood estimator when errors between measured values and expectations are distributed normally.

The norm in equation (1) is a 2-norm, so we can re-write this equation in terms of the objective function in the well-known form:

(2)

Of course, we would like to find parameters

We place partial derivatives of

3. LINEAR LEAST-SQUARES

Now we can be more specific about the problems we solve, but linear least-squares can still be applied to a whole family of functions. For example, when we are fitting straight line to 2D points (a common example), our model function has a form:

(3)

But we are not limited to just straight lines. We can fit any fuction that is linear in the paramters

can be used in linear least-squares framework as well.

The general form of the model function is:

Here

As we can observe, derivatives of residuals with respect to model parameters are particularly simple:

We can put all the partial derivatives to a single matrix called design matrix:

The residuals can be written as:

or in matrix terms:

3.1 SOLUTION WITH NORMAL EQUATIONS

With the above knowledge, we can write

We know from (1) that we need to minimize

This is equivalent to a linear system called normal equations:

(4)

We first compute product

This can be solved in various ways, so let's play with the numerical library and try different approcaches...

Since the matrix

(5)

The parameters can be computed this way in a single line of code:

1
x = A.Transpose().Multiply(A).Inverse().Multiply(A.Transpose().Multiply(dataY));

Much less expensive and more stable approach would be the Cholesky decomposition of the matrix on left hand side:

The Cholesky decomposition produces an upper triangular matrix

Math.NET library provides means for computing both the Cholesky factorization and methods for solving linear systems of the form

1
x = A.Transpose().Multiply(A).Cholesky().Solve(A.Transpose().Multiply(dataY));

The Cholesky decomposition is guaranteed to exist in our well-behaved problems (which are overdetermined - there are more points than unknowns and the Jacobian has full rank).

The presented approach (use of normal equations) works well, but only when

The solution with normal equations and Cholesky decomposition is useful when

3.2 SOLUTION WITH QR DECOMPOSITION

This approach gets over the problem with squaring the condition number by decomposing the Jacobian into product of orthogonal (

We substitute

This is equivalent to linear system

1
x = A.QR().Solve(dataY);

If we are forced to use just basic matrix operations, we should be aware of the fact that the matrix

The matrix

1
2
3
4
5
6
7
8
9
// compute the QR decomposition
QR qr = A.QR();
 
// get orthogonal matrix with first n columns of Q
Matrix<double> Q1 = qr.Q.SubMatrix(0, m,0, n);
// get upper-triangular matrix of size n x n
Matrix<double> R = qr.R.SubMatrix(0, n,0, n);
 
x = R.Inverse().Multiply(Q1.Transpose().Multiply(dataY));

Of course, it is more effective to solve the linear system

The solution using QR decomposition is numerically more stable than normal equations.

3.3 SOLUTION WITH SINGULAR VALUE DECOMPOSITION (SVD)

The third (and the most advanced) way of solving linear least squares problem is using SVD. This decomposition splits our matrix

If the decomposed matrix is singular or near singular, we get

Following the same logic as in the QR approach, we substitute

The computation of system

1
x = A.Svd(true).Solve(dataY);

If we want to use just the basic matrix operations, we should be aware of the fact that

The matrix

1
2
3
4
5
6
7
8
9
10
11
// compute the SVD
Svd svd = A.Svd(true);
 
// get matrix of left singular vectors with first n columns of U
Matrix<double> U1 = svd.U().SubMatrix(0, m,0, n);
// get matrix of singular values
Matrix<double> S =newDiagonalMatrix(n, n, svd.S().ToArray());
// get matrix of right singular vectors
Matrix<double> V = svd.VT().Transpose();
 
x = V.Multiply(S.Inverse()).Multiply(U1.Transpose().Multiply(dataY));

And of course, it is more efficient to solve the linear system

3.4 EXAMPLE

The plot shows a line (

Linear least-squares method yields a new line (

Most of the data points are placed below the function with true model parameters (see the gray line). The estimated line is thus placed a little lower than the original one and the estimated parametes are slightly different. We chose strongly perturbed dataset to show how errors can affect the estimated parameters. Of course, this also depends on the number of points, their relative distance and magnitude of the error introduced.

4. NONLINEAR LEAST-SQUARES

Up to now, we could solve least-squares problems directly because the model function depended on its parameters linearly. In the purely nonlinear waters, however, the parameters cannot be separated from the model function. So instead of solving linear problem

The best we can do in nonlinear case is to guess some starting values for parameter vector and move toward solution, then update the parameters and repeat the process, hence use iterative method.

What we called the design matrix

but the difference is that it depends on model parameters

We can construct Jacobian directly from our model function (it follows from the definition):

In linear least squares, the partial derivatives would leave only basis functions and we would get the design matrix

The other difference from linear least squares it that we will need to examine not only values of the objective function, but also its first and second order derivatives. Since the objective function is

When solving nonlinear least squares problems, the second summation term in Hessian:

is usually negligible compared to the first term

This allows us to compute approximate Hessian without using second derivatives of

(6)

(7)

This convenient, because user to provide just prescription for the model function (as in the linear least squares) plus its partial derivatives (with respect to each parameter) and the algorithm constructs all the neccessary structures by itself:

Every iterative method starts with an initial estimate

Forthermore, value of the objective function should be smaller at each step:

By approximating left hand side of this inequality by first-order Taylor polynomial we have:

This gives us decrease condition in terms of objective function's gradient:

(8)

4.1 STEEPEST DESCENT METHOD

The steepest descent method is a general function minimization method. It is not a method specialized in solving least squares problems, but it is a good one to start with.

Each iteration consists of three steps:

  1. Select step direction

  2. Select step length

  3. Update parameters with the steepest descent step

Direction of the step (

To derive speepest descent direction, let's start with directional derivative of our objective function

We take the right side and continue computing the step:

Note that we assume

Since

The steepest descent step can thus be expressed as:

or in terms of Jacobian and residual:

Obtaining the step direction is straightforward, but computing step length

The best option would be to select

In general (nonlinear) case, however, we can only perform inexact line search. There are several strategies on how to find

To keep implementation understandable, the line search in demo implementation is reduced to just shortening the step if objective value function does not decrease, and lenghtening it oterwise. This is very ineffective, but the demo implementation of steepest descent is just a proof of concept.

4.2 GAUSS-NEWTON METHOD

To compute the iteration step

and set derivative of the right hand side (with respect to

From there we have formula for the so called Newton step:

(9)

We can conclude from the above that the Newton step will take us to a stationary point of the quadratic approximation of

In either case, the Newton step should agree with the decrease condition (8) to work. We substitute this step into condition to inspect when it actually decreases function value:

As we can see, this condition holds for any nonzero

Substituing expressions for gradient (6) and approximate Hessian (7) into (9) gives us definition of the Gauss-Newton step:

(10)

The step can as be written as:

Compare the last equation to (4). The Gauss-Newton method actually solves system similar to normal equations in each iteration. The method actually approximates nonlinear model by the linear one and solves it. Then updates the current estimate for better approximation.

Of course, any of the presented approaches for linear least squares (Cholesky, QR, SVD) can be used to solve for Gauss-Newton step.

Being more general, Gauss-Newton method can be used to solve linear problems as well. In such case, it converges in a single iteration.

4.3 LEVENBERG-MARQUARDT METHOD

The Gauss-Newton method works well unless the Hessian gets singular or near-singular. This can happen when dealing with "highly nonlinear" functions. Other drawback of the Gauss-Newton method is when the initial guess is far from minimizer. In such case, the method converges very slowly or may not converge at all.

These problems can be addressed by modifying the Hessian matrix to improve convergence of the method. As we said earlier, the step direction is a descent direction whenever the Hessian is positive definite. This may not be the case in every point of the objective function. The function can be convex in some direction and concave in other (i.e. the Hessian is indefinite). The simple aid to make the hessian approximation positive definite is by adding multiple of identity:

where

If we replace

(11)

As the

Since

where

This is the Levenberg-Marquardt step used in most implementations.

We can rewrite it in terms of Jacobian and residual as

Here is an outline for the Levenberg-Marquardt algorithm:

The algorithm takes initial guess (p_curr) as an input. It initializes

Next, a new parameter vector (p_new) and objective function value (v_new) are computed.

If the objective function value has decreased (v_new < v_curr), the algorithm decreases

If the objective function has not decreased, the

Finally, the algorithm checks termination conditions and either ends or continues with the next iteration.

4.3 EXAMPLE

We start with a model function

The first graph shows initial guess with

The data points are not perturbed so the estimated paramters match the true ones to four decimal places after ten iterations.

5. IMPLEMENTATION

5.1 DEMO APPLICATION

All the presented methods have been implemented in a C# application called LeastSquaresDemo:

The combo box in the top-left corner allows you to select preset. There are several presets with sample settings.

Below is the options for the current preset:

These allows you to adjust current model, dataset (set of points to fit) and the method to be used for data fitting.

Further options are available through the "cogwheel" buttons.

The only options for model are its parameters:

You can edit them in the dialog window. For example, the line model has two parameters that define the line.

There are two kinds of datasets: Exact and Perturbed. The first one generates data points that corresponds to function values exactly. The second one adds noise to simulate real-world data. Both the horizontal and vertical positioning of data points can be further adjusted:

Finally, the method to be used for data fitting can be adjusted. You can choose whether to use effective solvers from Math.NET that use matrix decompositions (applies to linear models) and stopping conditions (applies to nonlinear models):

Under the panel with options, there is an iteration list:

It contains initial parameters (the initial guess) which you can edit by clicking on the numbers.

When the fitting is performed, it contains estimated parameters from every iteration of the fitting method. The linear least squares method need jsut one iteration, so there can appear only two lines (the guess which is actually not used and the solution). When nonlinear models and methods are used, there can be many iterations.

The fitting process is started by the button on the bottom left part of the form:

The button contains context menu so you can only fit the current data, or generate new and then fit.

Finally, there is a graph showing the data points and all the currently selected iterations. Model function for initial guess is plotted in gray, while the computed iterations are in red:

5.2 SOURCE CODE

The code regarding least squares fitting is divided into three sets of classes:

The Dataset class serves for generating data points, either lying on the model curve, or being scattered around it.

The Model class represents model function (or expectation model, if you will). There are two linear model and two non-linear. The class provides value of the model function (

The Solver class represents a method used for solving the least-squares problem, either linear or nonlinear. There are three implementations for linear least-squares and three for nonlinear least-squares.

The actual fitting is done in the Fit() method of MainForm.

The method checks whether the problem can be solved (there is enough points and we are not trying to use linear solver for a nonlinear model).

I have not implemented any extra dummy-proofing, so beware of putting in values that are out of range.

6. ALGORITHM IMPROVEMENTS

All the presented methods can be improved on multiple levels.

One of the possible improvement is the computations of function gradient when the Jacobian is expected to be sparse. Large and sparse matrices often arise in real-world application, thus I would like to point reader to [Watkins] (linear problems) or [Nocedal] (nonlinear problems) where such efficient implementations are described in depth.

Another improvement lies in better conditioning of the matrix we invert during computation. Better conditioning brings more accurate and stable solution with less iterations needed. [Watkins] covers the topic of preconditioning well.

Least squares method works best if the data points are scattered around our expectation model with normal distribution. There can also be a systematic error and we would like to know precision and accuracy of our estimator. The noisy data can also have other than normal distribution (e.g. Poisson). [Bos] covers the topic of dealing with such cases, even estimating the precision and accuracy of the expectation model is covered here.

The data can also have error in both coordinates. Such problem is discussed in [Press] (as well as the measurement of confidence intervals and other statistical properties) and in [Nocedal] (orthogonal distance regression).

7. FURTHER READING

[Bos] Van den Box, A., "Parameter Estimation for Scientists and Engineers", Wiley Online Library, 2007

[Nocedal] J. Nocedal and S. J. Wright, "Numerical Optimization", Springer Verlag, 1999

[Press] Press, W.H. and Flannery, B.P. and Teukolsky, S.A. and Vetterling, W.T. and others, "Numerical Recipes", Cambridge Univ Press, 2007

[Watkins] Watkins, D.S., "Fundamentals of Matrix Computations", Wiley, 2005

Download Sample Binary (836 kB) Download Sample Source Code (1.92 MB)

posted @ 2018-01-03 14:29  Sam Xiao  阅读(3081)  评论(0)    收藏  举报