Ngô Quốc Anh

June 10, 2009

Linear Least Square Problem and Singular Value Decomposition (SVD)

Linear Least Square Problem (linear least squares), or ordinary least squares (OLS), is an important computational problem, that arises primarily in applications when it is desired to fit a linear mathematical model to measurements obtained from experiments. The goals of linear least squares are to extract predictions from the measurements and to reduce the effect of measurement errors. Mathematically, it can be stated as the problem of finding an approximate solution to an overdetermined system of linear equations. In statistics, it corresponds to the maximum likelihood estimate for a linear regression with normally distributed error.

Linear least square problems admit a closed-form solution, in contrast to non-linear least squares problems, which often have to be solved by an iterative procedure.

Motivational example

As a result of an experiment, four (x,y) data points were obtained, (1,6), (2,5), (3,7), and (4,10) (shown in red in the picture on the right). It is desired to find a line y = \beta_1 + \beta_2x that fits “best” these four points. In other words, we would like to find the numbers \beta_1 and \beta_2 that approximately solve the overdetermined linear system

\displaystyle {\beta _1} + 1{\beta _2} = 6, \quad {\beta _1} + 2{\beta _2} = 5, \quad {\beta _1} + 3{\beta _2} = 7, \quad {\beta _1} + 4{\beta _2} = 10.

of four equations in two unknowns in some “best” sense.

File:Linear least squares example2.png

The least squares approach to solving this problem is to try to make as small as possible the sum of squares of “errors” between the right- and left-hand sides of these equations, that is, to find the minimum of the function

\displaystyle S(\beta_1, \beta_2)= \left[6-(\beta_1+1\beta_2)\right]^2 +\left[5-(\beta_1+2\beta_2) \right]^2 +\left[7-(\beta_1 + 3\beta_2)\right]^2 +\left[10-(\beta_1 + 4\beta_2)\right]^2.

The minimum is determined by calculating the partial derivatives of S(\beta_1,\beta_2) in respect to \beta_1 and \beta_2 and setting them to zero. This results in a system of two equations in two unknowns, called the normal equations, which, when solved, gives the solution

\displaystyle \beta_1 = 3.5, \beta_2 = 1.4

and the equation y = 3.5 + 1.4x of the line of best fit. The residuals, that is, the discrepancies between the y values from the experiment and the y values calculated using the line of best fit are then found to be 1.1, -1.3, -0.7, and 0.9 (see the picture on the right). The minimum value of the sum of squares is

\displaystyle S(3.5,1.4) = 1.12 + (-1.3)2 + (-0.7)2 + 0.92 = 4.2.

The general problem

Consider an overdetermined system (there are more equations than unknowns)

\displaystyle\sum_{j=1}^{n} X_{ij}\beta_j = y_i,\ (i=1, 2, \dots, m),

of m linear equations in n unknown coefficients, \beta_1,\beta_2,…,\beta_n, with m > n, written in matrix form as \mathbf {X} \boldsymbol {\beta} = \mathbf {y},where

\displaystyle\mathbf {X}=\begin{pmatrix} X_{11} & X_{12} & \cdots & X_{1n} \\ X_{21} & X_{22} & \cdots & X_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{m1} & X_{m2} & \cdots & X_{mn} \end{pmatrix} , \qquad \boldsymbol \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_n \end{pmatrix} , \qquad \mathbf y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}.

Such a system usually has no solution, and the goal is then to find the coefficients \beta which fit the equations “best”, in the sense of solving the quadratic minimization problem.

\displaystyle\underset{\boldsymbol \beta} {\rm argmin} \sum_{i=1}^{m}\left|y_i - \sum_{j=1}^{n} X_{ij}\beta_j\right|^2 = \underset{\boldsymbol \beta} {\rm argmin}\big\|\mathbf y - X \boldsymbol \beta \big\|^2.

A justification for choosing this criterion is given in properties below. This minimization problem has a unique solution, provided that the n columns of the matrix X are linearly independent, given by solving the normal equations

\displaystyle (\mathbf{X}^\top \mathbf{X}) \widehat{\boldsymbol \beta} =\mathbf{X}^ \top \mathbf{y}.

Singular value decomposition

In linear algebra, the singular value decomposition (SVD) is an important factorization of a rectangular real or complex matrix, with many applications in signal processing and statistics. Applications which employ the SVD include computing the pseudo-inverse, least squares fitting of data, matrix approximation, and determining the rank, range and null space of a matrix.

Suppose M is an m-by-n matrix whose entries come from the field K, which is either the field of real numbers or the field of complex numbers. Then there exists a factorization of the form

\displaystyle M = U\Sigma V^*,

where U is an m-by-m unitary matrix over K, the matrix \Sigma is m-by-n diagonal matrix with nonnegative real numbers on the diagonal, and V^* denotes the conjugate transpose of V, an n-by-n unitary matrix over K. Such a factorization is called a singular-value decomposition of M. A common convention is to order the diagonal entries \Sigma_{i,i} in non-increasing fashion. In this case, the diagonal matrix \Sigma is uniquely determined by M (though the matrices U and V are not). The diagonal entries of \Sigma are known as the singular values of M.

In practice, matrix U consists of eigenvectors of matrix MM^T while matrix V consists of eigenvectors of matrix M^TM. Finally, matrix \Sigma is the square root of a diagonal matrix forming by eigenvalues of either matrix MM^T or matrix M^TM.


Let us consider the case when

\displaystyle A = \left( {\begin{array}{*{20}{c}} 2&4 \\ 1&3 \\ 0&0 \\ 0&0 \end{array}} \right).

Having A yields

\displaystyle A{A^T} = \left( {\begin{array}{*{20}{c}} {20}&{14}&0&0 \\ {14}&{10}&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{array}} \right)


\displaystyle {A^T}A = \left( {\begin{array}{*{20}{c}} 5&{11} \\ {11}&{25} \end{array}} \right).

By simple calculation, one has

\displaystyle\lambda_1=15+\sqrt{221}, \quad \lambda_2 =15-\sqrt{221}

being eigenvalues of matrix A^TA (in the descending form), which helps us to write down

\displaystyle\Sigma = \left( {\begin{array}{*{20}{c}} {\sqrt {15 + \sqrt {221} } }&0 \\ 0&{\sqrt {15 - \sqrt {221} } } \\ 0&0 \\ 0 & 0 \end{array}} \right)

To construct matrix V, one has to find eigenvectors corresponding to eigenvalues \lambda_1 and \lambda_2. For the first one, that is, \lambda_1, one has the following linear system

\displaystyle\left( {\begin{array}{*{20}{c}} { - 10 - \sqrt {221} }&{11} \\ {11}&{10 - \sqrt {221} } \end{array}} \right)

which implies, for example,

\displaystyle {x_1} = \left( {\begin{array}{*{20}{c}} {\frac{{11}}{{10 + \sqrt {221} }}} \\ 1 \end{array}} \right).

Similar to \lambda_2, one has its corresponding eigenvector as follows

\displaystyle {x_2} \left( {\begin{array}{*{20}{c}} {\frac{{11}}{{10 - \sqrt {221} }}} \\ 1 \end{array}} \right).


\displaystyle V = \left( {\begin{array}{*{20}{c}} {\frac{{11}}{{10 + \sqrt {221} }}}&{\frac{{11}}{{10 - \sqrt {221} }}} \\ 1&1 \end{array}} \right).

The numerical solution of V is

\displaystyle V = \left( {\begin{array}{*{20}{c}} {0.4423698861}&{ - 2.260551703} \\ 1&1 \end{array}} \right)

which gives us

\displaystyle {V^T}V = \left( {\begin{array}{*{20}{c}} {\frac{{2(221 + 10\sqrt {221} )}}{{{{(10 + \sqrt {221} )}^2}}}}&0 \\ 0&{ - \frac{{2( - 221 + 10\sqrt {221} )}}{{{{( - 10 + \sqrt {221} )}^2}}}} \end{array}} \right).

Having V we can compute U by the following formula U=AV^{-1}\Sigma^{-1} or you can perform the same routine as above. That is

\displaystyle U = \left( {\begin{array}{*{20}{c}} { - \frac{{11\sqrt {442} (\sqrt {17} + \sqrt {13} )}}{{442(15 + \sqrt {221} )}}}&{\frac{{\sqrt {442} ( - \sqrt {17} + \sqrt {13} )( - 10 + 3\sqrt {221} )}}{{442( - 15 + \sqrt {221} )}}}&0&0 \\ { - \frac{{11\sqrt {442} (\sqrt {17} + \sqrt {13} )}}{{442(15 + \sqrt {221} )}}}&{\frac{{\sqrt {442} ( - \sqrt {17} + \sqrt {13} )( - 10 + 3\sqrt {221} )}}{{442( - 15 + \sqrt {221} )}}}&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{array}} \right).

It is easy to see that

\displaystyle U\Sigma V^* = \left( {\begin{array}{*{20}{c}} {2.000000051}&{4.000000052} \\ {1.000000029}&{3.000000030} \\ 0&0 \\ 0&0 \end{array}} \right).

Having a SVD result, that is

\displaystyle A = U\left( {\begin{array}{*{20}{c}} {{\Sigma _1}} \\ 0 \\ \end{array} } \right){V^T}

one can see that the solution of Ax=b can be computed by

\displaystyle x = V\left( {\begin{array}{*{20}{c}} {\Sigma _1^{ - 1}} & 0 \\ \end{array} } \right){U^T}b.

Note that matrices U and V are not unique. Using Maple software, you can compute SVD as shown below.


Useful links


  1. It’s a well-known result by using Moore–Penrose pseudoinverse.

    Let A=(a_{ij})_n be a matrix and, b=(b_1,b_2,...,b_n)^{T} is vector. Find the vector x such that \|Ax-b\| has the smallest value.

    If denote A^{+} is Moore–Penrose pseudoinverse of matrix A, and x_0=A^{+}b then we have: \|Ax-b\|\ge \|Ax_0-b\|

    We can calculate with Maple


    Comment by Tuan Minh — June 15, 2009 @ 21:58

  2. Thanks Minh for this helpful information🙂.

    Comment by Ngô Quốc Anh — June 15, 2009 @ 22:07

  3. Thanks Anh and Minh.
    I have some SVD problems with my Search Engine Research.
    @Minh : Nice to see you, I’m Quoc Hoc high school ( 03-06)🙂
    Best Regards.

    Comment by Hoàng Cường — November 17, 2009 @ 1:03

  4. Reblogged this on Miah World and commented:
    Ahh this is fun

    Comment by ~MIAH~ — August 19, 2013 @ 13:17

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at

%d bloggers like this: