# Ngô Quốc Anh

## June 30, 2009

### A quick introduction to Simplex Method via an example

Filed under: Analysis 9 (MA5265), Miscellaneous, Scientific Research — Ngô Quốc Anh @ 14:36

The original post walked through one worked example via a sequence of annotated tableau images; the captions are collected below.

• The original problem: the objective function is in green.
• The red variables are called SLACK VARIABLES.
• The SIMPLEX TABLEAU: compare the red symbols with $Z = x_1 + 2x_2 - x_3$. The blue numbers are the “ISM”.
• The 1st SIMPLEX TABLEAU; note the missing $z$-column (correction by Steve Hulet). See steps 3, 4, 5 of the SIMPLEX METHOD as you handle INDICATORS, RATIOS, and PIVOTS.
• The 4 row operations needed to pivot on the number “5” encircled in red.
• The results of the row operations named above; the blue numbers are the new ISM.
• Since one INDICATOR (namely $-1/5$) remains negative, we must repeat steps 3-9.
• The result of changing our pivot to “1”.
• The 4 row operations needed to pivot on the number $16/5$ encircled in red.
• There was a tie for the least non-negative ratio: either row 1 or row 2 could have become the pivot row, and either choice leads to the final tableau after one additional pivoting. The result of the final 3 row operations follows.
• All indicators $\{0, 0, 49/16, 0, 1/16, 3/8\}$ are now zero or bigger (“13” is NOT an indicator). Thus, as in step 8 of the SIMPLEX METHOD, the last tableau is a FINAL TABLEAU and the row operations of the SIMPLEX METHOD are done.
• Thus, the basic solution of the final tableau is the solution to our original problem. [1st] Set equal to 0 all variables NOT associated with the blue ISM; each column of the final tableau has a label naming its variable. [2nd] Convert each row of the final tableau (except the bottom row) back into equation form to find the values of the remaining variables. The value of the objective function is in the lower right corner of the final tableau.
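The pivoting procedure described in these captions can be sketched in a few lines of Python. The constraint system was visible only in the original images, so the system used below (maximize $Z = x_1 + 2x_2 - x_3$ subject to $2x_1 + x_2 + x_3 \le 14$, $4x_1 + 2x_2 + 3x_3 \le 28$, $2x_1 + 5x_2 + 5x_3 \le 30$, $x \ge 0$) is an assumption: it is a standard worked example whose final indicators match the quoted values $\{0, 0, 49/16, 0, 1/16, 3/8\}$ (up to the tie-dependent pivot choice) and whose optimal value is $13$.

```python
# A minimal tableau simplex for "maximize c.x subject to A x <= b, x >= 0".
# The constraints below are an assumption (see the lead-in), not taken
# verbatim from the original images.
def simplex(c, A, b):
    m, n = len(A), len(c)
    # one row per constraint (with slack variables), plus the indicator row
    T = [[float(v) for v in A[i]]
         + [1.0 if j == i else 0.0 for j in range(m)] + [float(b[i])]
         for i in range(m)]
    T.append([-float(ci) for ci in c] + [0.0] * (m + 1))
    while True:
        piv_col = min(range(n + m), key=lambda j: T[-1][j])  # most negative indicator
        if T[-1][piv_col] >= -1e-9:
            break                                            # final tableau reached
        ratios = [(T[i][-1] / T[i][piv_col], i) for i in range(m) if T[i][piv_col] > 1e-9]
        _, piv_row = min(ratios)                             # least non-negative ratio
        p = T[piv_row][piv_col]
        T[piv_row] = [v / p for v in T[piv_row]]             # change the pivot to 1
        for i in range(m + 1):                               # clear the pivot column
            if i != piv_row:
                f = T[i][piv_col]
                T[i] = [T[i][j] - f * T[piv_row][j] for j in range(n + m + 1)]
    # read off the basic solution: unit columns whose indicator is zero
    x = [0.0] * n
    for j in range(n):
        col = [T[i][j] for i in range(m)]
        if abs(T[-1][j]) < 1e-9 and sorted(round(abs(v), 9) for v in col) == [0.0] * (m - 1) + [1.0]:
            x[j] = T[col.index(max(col))][-1]
    return x, T[-1][-1]

x, z = simplex([1, 2, -1], [[2, 1, 1], [4, 2, 3], [2, 5, 5]], [14, 28, 30])
print([round(v, 6) for v in x], round(z, 6))  # [5.0, 4.0, 0.0] 13.0
```

The tie in the ratio test (rows 1 and 2) is resolved here by taking the smallest row index; either choice reaches the same optimum.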

## June 29, 2009

### Karush–Kuhn–Tucker conditions, the 1st order optimality condition

In mathematics, the Karush–Kuhn–Tucker conditions (also known as the Kuhn–Tucker or KKT conditions) are necessary conditions for a solution in nonlinear programming to be optimal, provided some regularity conditions are satisfied. They generalize the method of Lagrange multipliers to inequality constraints. The conditions are named after William Karush, Harold W. Kuhn, and Albert W. Tucker.

Let us consider the following nonlinear optimization problem:

Minimize

$f(x)$

subject to

$g_i(x) \le 0, \quad h_j(x) = 0$

where $f(x)$ is the function to be minimized, $g_i (x)$ ($i = 1,...,m$) are the inequality constraints and $h_j (x)$ ($j = 1,...,l$) are the equality constraints, and $m$ and $l$ are the number of inequality and equality constraints, respectively.

The necessary conditions for this general equality–inequality constrained problem were first published in the master's thesis of William Karush, although they only became renowned after a seminal conference paper by Harold W. Kuhn and Albert W. Tucker.

Necessary conditions

Suppose that the objective function, i.e., the function to be minimized, is $f : \mathbb{R}^n \rightarrow \mathbb{R}$ and the constraint functions are $g_i : \mathbb{R}^n \rightarrow \mathbb{R}$ and $h_j : \mathbb{R}^n \rightarrow \mathbb{R}$. Further, suppose they are continuously differentiable at a point $x^\star$. If $x^\star$ is a local minimum that satisfies some regularity conditions, then there exist constants $\mu_i\ (i = 1,...,m)$ and $\lambda_j (j = 1,...,l)$ such that

Stationarity:

$\displaystyle\nabla f(x^\star) + \sum_{i=1}^m \mu_i \nabla g_i(x^\star) + \sum_{j=1}^l \lambda_j \nabla h_j(x^\star) = 0$,

Primal feasibility:

$\displaystyle g_i(x^\star) \le 0, \qquad\forall i = 1,..., m$
$\displaystyle h_j(x^\star) = 0, \qquad \forall j = 1,..., l$.

Dual feasibility:

$\displaystyle \mu_i \ge 0, \qquad \forall i = 1,..., m$.

Complementary slackness:

$\displaystyle \mu_i g_i (x^\star) = 0 \qquad \forall i = 1,...,m$.
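As a toy illustration of the four conditions (not from the original post), consider minimizing $f(x,y) = x^2 + y^2$ subject to the single constraint $g(x,y) = 1 - x - y \le 0$. Solving stationarity together with complementary slackness by hand gives $x = y = 1/2$ with multiplier $\mu = 1$; the sketch below checks all four conditions at that point.

```python
# Hypothetical example: minimize f(x, y) = x^2 + y^2
# subject to g(x, y) = 1 - x - y <= 0 (one inequality, no equalities).
# The KKT system gives x = y = 1/2 with multiplier mu = 1.

def grad_f(x, y):
    return (2 * x, 2 * y)

def grad_g(x, y):
    return (-1.0, -1.0)

x = y = 0.5
mu = 1.0

# Stationarity: grad f + mu * grad g = 0
station = [fi + mu * gi for fi, gi in zip(grad_f(x, y), grad_g(x, y))]
assert all(abs(v) < 1e-12 for v in station)

g = 1 - x - y
assert g <= 1e-12            # primal feasibility
assert mu >= 0               # dual feasibility
assert abs(mu * g) < 1e-12   # complementary slackness
print("KKT conditions hold at (1/2, 1/2) with mu = 1")
```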

Regularity conditions (or constraint qualifications)

In order for a minimum point $x^\star$ to satisfy the above KKT conditions, it should satisfy some regularity condition; the most used ones are listed below.

• Linear independence constraint qualification (LICQ): the gradients of the active inequality constraints and the gradients of the equality constraints are linearly independent at $x^\star$.
• Mangasarian–Fromowitz constraint qualification (MFCQ): the gradients of the active inequality constraints and the gradients of the equality constraints are positive-linearly independent at $x^\star$.
• Constant rank constraint qualification (CRCQ): for each subset of the gradients of the active inequality constraints and the gradients of the equality constraints, the rank in a vicinity of $x^\star$ is constant.
• Constant positive linear dependence constraint qualification (CPLD): for each subset of the gradients of the active inequality constraints and the gradients of the equality constraints, if it is positive-linearly dependent at $x^\star$ then it is positive-linearly dependent in a vicinity of $x^\star$. (Here $v_1,...,v_n$ are positive-linearly dependent if there exist $a_1\geq 0$,…,$a_n\geq 0$, not all zero, such that $a_1v_1+...+a_nv_n=0$.)
• Quasi-normality constraint qualification (QNCQ): if the gradients of the active inequality constraints and the gradients of the equality constraints are positive-linearly independent at $x^\star$ with associated multipliers $\lambda_i$ for the equalities and $\mu_j$ for the inequalities, then there is no sequence $x_k\to x^\star$ such that $\lambda_i \ne 0$ implies $\lambda_i h_i(x_k)>0$ and $\mu_j \ne 0$ implies $\mu_j g_j(x_k)>0$.
• Slater condition: for a convex problem, there exists a point $x$ such that $h_j(x) = 0$ for all $j$ and $g_i(x) < 0$ for all constraints $i$ active at $x^\star$.
• Linearity constraints: if the $g_i$ and $h_j$ are affine functions, then no other condition is needed to ensure that the minimum point satisfies the KKT conditions.

It can be shown that

LICQ⇒MFCQ⇒CPLD⇒QNCQ,

LICQ⇒CRCQ⇒CPLD⇒QNCQ

(and the converses are not true), although MFCQ is not equivalent to CRCQ. In practice weaker constraint qualifications are preferred since they provide stronger optimality conditions.

Sufficient conditions

In some cases, the necessary conditions are also sufficient for optimality. This is the case when the objective function $f$ and the inequality constraints $g_i$ are continuously differentiable convex functions and the equality constraints $h_j$ are affine functions. It was shown by Martin in 1985 that the broader class of functions for which the KKT conditions guarantee global optimality is the class of so-called invex functions. So if the equality constraints are affine functions and the inequality constraints and the objective function are continuously differentiable invex functions, then the KKT conditions are sufficient for global optimality.

Value function

If we reconsider the optimization problem as a maximization problem with constant inequality constraints

Maximize

$f(x)$

subject to

$g_i(x) \le a_i , \quad h_j(x) = 0$

The value function is defined as

$\displaystyle V(a_1, ..., a_m) = \sup\limits_{x} f(x)$

subject to

$\displaystyle g_i(x) \le a_i, \quad h_j(x) = 0, \qquad j \in \{1,...,l\}, \quad i\in\{1,...,m\}$.

(So the domain of $V$ is

$\displaystyle\{a \in \mathbb{R}^m \mid \mbox{for some }x\in X,\ g_i(x) \leq a_i,\ i \in \{1,...,m\}\}$.)

Given this definition, each coefficient, $\lambda_i$, is the rate at which the value function increases as $a_i$ increases. Thus if each $a_i$ is interpreted as a resource constraint, the coefficients tell you how much increasing a resource will increase the optimum value of our function $f$. This interpretation is especially important in economics and is used, for instance, in utility maximization problems.


## June 20, 2009

### How to apply the Runge–Kutta 4 method to the system of 1st order ODEs?

In numerical analysis, the Runge–Kutta methods are an important family of implicit and explicit iterative methods for the approximation of solutions of ordinary differential equations.

The common fourth-order Runge–Kutta method

One member of the family of Runge–Kutta methods is so commonly used that it is often referred to as “RK4” or simply as “the Runge–Kutta method”. Let an initial value problem be specified as follows.

$\displaystyle y' = f(t,y), \quad y(t_0) = y_0$, with $t \in [t_0, T]$.

We equally split the interval $[t_0, T]$ into $N$ subintervals by the following interior points

$\displaystyle t_n = t_0 + n\frac{T-t_0}{N}, \quad n = 0, 1, \dots, N$.

Of course, $t_N = T$. The number $\frac{T-t_0}{N}$, which will be denoted by $h$, is called the step size. The value of $y$ at step $n$ is usually denoted by $y_n$, that is, $y_n \approx y(t_n)$. The Runge–Kutta 4 method tells us that, having the value of $y$ at $t_n$, we can compute the value of $y$ at $t_{n+1} = t_n + h$ by using the following scheme

$\displaystyle\begin{gathered} {k_1} = f({t_n},{y_n}), \hfill \\ {k_2} = f\Big({t_n} + \frac{h}{2},{y_n} + \frac{h}{2}{k_1}\Big), \hfill \\ {k_3} = f\Big({t_n} + \frac{h}{2},{y_n} + \frac{h}{2}{k_2}\Big), \hfill \\ {k_4} = f({t_n} + h,{y_n} + h{k_3}), \hfill \\ {y_{n + 1}} = {y_n} + \frac{h}{6}\left( {{k_1} + 2{k_2} + 2{k_3} + {k_4}} \right). \hfill \\ \end{gathered}$

Thus, for each step, we need to compute all four constants $k_1$, $k_2$, $k_3$, $k_4$.

The Runge–Kutta 4 method for a system of ODEs

Now we investigate the question in the title of this entry. An $m$th-order system of first-order initial value problems has the form

$\displaystyle \frac{du_i}{dt} = f_i(t, u_1, u_2, \dots, u_m)$

for $a \leq t \leq b$ and $i = 1, \dots, m$, with the initial conditions

$\displaystyle u_1(a) = \alpha_1, \quad u_2(a) = \alpha_2, \quad \dots, \quad u_m(a) = \alpha_m$.

Let $h$ be $\frac{b-a}{N}$. We denote by $t_j$ the following number

$\displaystyle t_j = a + jh, \quad j = 0, 1, \dots, N,$

and write $w_{i,j}$ for the approximation of $u_i(t_j)$. At each step $j$, having all $w_{i,j}$ we can compute $w_{i,j+1}$ for all $i = 1, \dots, m$. To do this, for each $i$, we need to compute all $k_{1,i}$, $k_{2,i}$, $k_{3,i}$, $k_{4,i}$. More precisely, we need to compute

$\displaystyle\begin{gathered} {k_{1,i}} = hf_i\left(t_j, w_{1,j}, \dots, w_{m,j}\right), \hfill \\ {k_{2,i}} = hf_i\Big(t_j + \frac{h}{2}, w_{1,j} + \frac{1}{2}k_{1,1}, \dots, w_{m,j} + \frac{1}{2}k_{1,m}\Big), \hfill \\ {k_{3,i}} = hf_i\Big(t_j + \frac{h}{2}, w_{1,j} + \frac{1}{2}k_{2,1}, \dots, w_{m,j} + \frac{1}{2}k_{2,m}\Big), \hfill \\ {k_{4,i}} = hf_i\left(t_j + h, w_{1,j} + k_{3,1}, \dots, w_{m,j} + k_{3,m}\right), \hfill \\ {w_{i,j+1}} = w_{i,j} + \frac{1}{6}\left(k_{1,i} + 2k_{2,i} + 2k_{3,i} + k_{4,i}\right). \hfill \\ \end{gathered}$

As can be seen, at step $j$ and for each $n = 1, 2, 3$, all the values $k_{n,i}$ with $i = 1, \dots, m$ must be computed before any of the expressions $k_{n+1,i}$.

Example

The original post worked out a concrete two-equation system (shown in images). The pattern is the one described above: at step $j$, one computes $w_{1,j+1}$ and $w_{2,j+1}$ from $w_{1,j}$ and $w_{2,j}$ by evaluating $k_{1,1}, k_{1,2}$ first, then $k_{2,1}, k_{2,2}$, and so on, in exactly that order.
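The scheme above can be sketched directly in Python. The system used below (a harmonic oscillator $u_1' = u_2$, $u_2' = -u_1$, chosen because its exact solution $u_1 = \cos t$, $u_2 = -\sin t$ is known) is an illustration, not the example from the original post.

```python
import math

def rk4_system(fs, t0, u0, h, steps):
    """Classical RK4 for a first-order system u' = f(t, u).

    fs is a list of right-hand sides f_i(t, u); u0 the list of initial values.
    Within each step, all k1's are computed before any k2, and so on."""
    t, u = t0, list(u0)
    m = len(u)
    for _ in range(steps):
        k1 = [h * f(t, u) for f in fs]
        s1 = [u[i] + k1[i] / 2 for i in range(m)]
        k2 = [h * f(t + h / 2, s1) for f in fs]
        s2 = [u[i] + k2[i] / 2 for i in range(m)]
        k3 = [h * f(t + h / 2, s2) for f in fs]
        s3 = [u[i] + k3[i] for i in range(m)]
        k4 = [h * f(t + h, s3) for f in fs]
        u = [u[i] + (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6 for i in range(m)]
        t += h
    return t, u

# Harmonic oscillator: u1' = u2, u2' = -u1 with u1(0) = 1, u2(0) = 0;
# the exact solution is u1 = cos t, u2 = -sin t.
t, (u1, u2) = rk4_system([lambda t, u: u[1], lambda t, u: -u[0]],
                         0.0, [1.0, 0.0], 0.05, 20)
print(abs(u1 - math.cos(t)) < 1e-6, abs(u2 + math.sin(t)) < 1e-6)  # True True
```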

## June 17, 2009

### Linear Least Square Problem and QR Decomposition

In linear algebra, a QR decomposition (also called a QR factorization) of a matrix is a decomposition of the matrix into an orthogonal matrix and a right triangular matrix. QR decomposition is often used to solve the linear least squares problem, and is the basis for a particular eigenvalue algorithm, the QR algorithm.

Square matrix

A QR decomposition of a real square matrix $A$ is a decomposition of $A$ as $A = QR$, where $Q$ is an orthogonal matrix (meaning that $Q^TQ = I$) and $R$ is an upper triangular matrix (also called a right triangular matrix). This generalizes to a complex square matrix $A$ and a unitary matrix $Q$. If $A$ is nonsingular, then this factorization is unique if we require that the diagonal elements of $R$ are positive.

Rectangular matrix

More generally, we can factor a complex $m \times n$ matrix $A$, with $m \geq n$, as the product of an $m \times m$ unitary matrix $Q$ and an $m \times n$ upper triangular matrix $R$. As the bottom $(m-n)$ rows of an $m \times n$ upper triangular matrix consist entirely of zeroes, it is often useful to partition $R$, or both $R$ and $Q$:

$\displaystyle A = QR = Q\begin{pmatrix} R_1 \\ 0 \end{pmatrix} = \begin{pmatrix} Q_1 & Q_2 \end{pmatrix}\begin{pmatrix} R_1 \\ 0 \end{pmatrix} = Q_1R_1,$

where $R_1$ is an $n \times n$ upper triangular matrix, $Q_1$ is $m \times n$, $Q_2$ is $m \times (m-n)$, and $Q_1$ and $Q_2$ both have orthogonal columns.

Golub & Van Loan call $Q_1R_1$ the thin QR factorization of $A$. If $A$ is of full rank $n$ and we require that the diagonal elements of $R_1$ are positive, then $R_1$ and $Q_1$ are unique, but in general $Q_2$ is not. $R_1$ is then equal to the upper triangular factor of the Cholesky decomposition of $A^*A$ ($A^TA$ if $A$ is real).

QL, RQ and LQ decompositions

Analogously, we can define QL, RQ, and LQ decompositions, with $L$ being a lower triangular matrix.

How to solve the linear least squares problem

Having the thin QR decomposition, that is, $A = Q_1R_1$, one can solve the least squares problem $Ax = b$ where $A$ is an $m \times n$ matrix and $b \in \mathbb{R}^m$ with $m \geq n$. To be precise, $Q_1$ is an $m \times n$ matrix with orthonormal columns and $R_1$ is an $n \times n$ upper triangular matrix.

Now the least squares problem for $Ax = b$ is equivalent to the triangular system $R_1x = Q_1^*b$. Thus $x = R_1^{-1}Q_1^*b$.

Computing the decomposition

There are several methods for actually computing the QR decomposition, such as by means of the Gram–Schmidt process, Householder transformations, or Givens rotations. Each has a number of advantages and disadvantages. In this section, we mainly consider the Gram–Schmidt process. More precisely, consider the Gram–Schmidt process, with the vectors to be considered in the process as the columns of the matrix . Assume that after finishing the Gram–Schmidt process, we obtain . Therefore, we put

and thus,

.

Example

The original post then worked a small numerical example (the matrices appeared as images). The procedure is exactly the one described above: starting from the columns $a_1$ and $a_2$ of the given matrix $A$, one obtains $e_1$ by normalizing $a_1$; to obtain $e_2$, one firstly projects $a_2$ onto $e_1$, then subtracts this projection from $a_2$ and normalizes the result. The inner products $\langle e_i, a_j\rangle$ then fill in $R$, and one checks directly that $QR = A$. In Maple, you can practice these computations directly.
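The Gram–Schmidt construction of $Q$ and $R$ can also be sketched in Python. The $3 \times 2$ matrix used for the check below is an illustrative choice (the matrix of the original example appeared only in images).

```python
import math

# QR factorization via the classical Gram-Schmidt process (a sketch).
def gram_schmidt_qr(A):
    m, n = len(A), len(A[0])
    a = [[A[i][j] for i in range(m)] for j in range(n)]   # columns a_1..a_n
    e = []                                                # orthonormal e_1..e_n
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = a[j][:]
        for i in range(j):
            R[i][j] = sum(e[i][k] * a[j][k] for k in range(m))   # <e_i, a_j>
            v = [v[k] - R[i][j] * e[i][k] for k in range(m)]     # remove projection
        R[j][j] = math.sqrt(sum(x * x for x in v))
        e.append([x / R[j][j] for x in v])                       # normalize
    Q = [[e[j][i] for j in range(n)] for i in range(m)]          # thin Q, m x n
    return Q, R

A = [[12.0, -51.0], [6.0, 167.0], [-4.0, 24.0]]   # illustrative matrix
Q, R = gram_schmidt_qr(A)
# check that A = QR entrywise
for i in range(3):
    for j in range(2):
        assert abs(sum(Q[i][k] * R[k][j] for k in range(2)) - A[i][j]) < 1e-9
print([[round(v, 6) for v in row] for row in R])  # [[14.0, 21.0], [0.0, 175.0]]
```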

## June 10, 2009

### Linear Least Square Problem and Singular Value Decomposition (SVD)

Linear Least Square Problem (linear least squares), or ordinary least squares (OLS), is an important computational problem, that arises primarily in applications when it is desired to fit a linear mathematical model to measurements obtained from experiments. The goals of linear least squares are to extract predictions from the measurements and to reduce the effect of measurement errors. Mathematically, it can be stated as the problem of finding an approximate solution to an overdetermined system of linear equations. In statistics, it corresponds to the maximum likelihood estimate for a linear regression with normally distributed error.

Linear least square problems admit a closed-form solution, in contrast to non-linear least squares problems, which often have to be solved by an iterative procedure.

Motivational example

As a result of an experiment, four $(x,y)$ data points were obtained, $(1,6)$, $(2,5)$, $(3,7)$, and $(4,10)$ (shown in red in the picture on the right). It is desired to find a line $y = \beta_1 + \beta_2x$ that best fits these four points. In other words, we would like to find the numbers $\beta_1$ and $\beta_2$ that approximately solve the overdetermined linear system

$\displaystyle {\beta _1} + 1{\beta _2} = 6, \quad {\beta _1} + 2{\beta _2} = 5, \quad {\beta _1} + 3{\beta _2} = 7, \quad {\beta _1} + 4{\beta _2} = 10$

of four equations in two unknowns in some “best” sense.

The least squares approach to solving this problem is to try to make as small as possible the sum of squares of “errors” between the right- and left-hand sides of these equations, that is, to find the minimum of the function

$\displaystyle S(\beta_1, \beta_2)= \left[6-(\beta_1+1\beta_2)\right]^2 +\left[5-(\beta_1+2\beta_2) \right]^2 +\left[7-(\beta_1 + 3\beta_2)\right]^2 +\left[10-(\beta_1 + 4\beta_2)\right]^2$.

The minimum is determined by calculating the partial derivatives of $S(\beta_1,\beta_2)$ with respect to $\beta_1$ and $\beta_2$ and setting them to zero. This results in a system of two equations in two unknowns, called the normal equations, which, when solved, gives the solution

$\displaystyle \beta_1 = 3.5, \beta_2 = 1.4$

and the equation $y = 3.5 + 1.4x$ of the line of best fit. The residuals, that is, the discrepancies between the $y$ values from the experiment and the $y$ values calculated using the line of best fit are then found to be $1.1$, $-1.3$, $-0.7$, and $0.9$ (see the picture on the right). The minimum value of the sum of squares is

$\displaystyle S(3.5,1.4) = 1.1^2 + (-1.3)^2 + (-0.7)^2 + 0.9^2 = 4.2$.
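The computation above is easy to reproduce: for this small example, the normal equations form a $2 \times 2$ system that can be solved directly.

```python
# Fit y = beta1 + beta2 * x to the four data points via the normal equations.
xs = [1, 2, 3, 4]
ys = [6, 5, 7, 10]
n = len(xs)
sx, sxx = sum(xs), sum(x * x for x in xs)
sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
# normal equations:  n*b1 + sx*b2 = sy,  sx*b1 + sxx*b2 = sxy
det = n * sxx - sx * sx
beta1 = (sxx * sy - sx * sxy) / det
beta2 = (n * sxy - sx * sy) / det
print(beta1, beta2)  # 3.5 1.4
```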

The general problem

Consider an overdetermined system (there are more equations than unknowns)

$\displaystyle\sum_{j=1}^{n} X_{ij}\beta_j = y_i,\ (i=1, 2, \dots, m)$,

of $m$ linear equations in $n$ unknown coefficients, $\beta_1$, $\beta_2$,…, $\beta_n$, with $m > n$, written in matrix form as $\mathbf {X} \boldsymbol {\beta} = \mathbf {y}$, where

$\displaystyle\mathbf {X}=\begin{pmatrix} X_{11} & X_{12} & \cdots & X_{1n} \\ X_{21} & X_{22} & \cdots & X_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{m1} & X_{m2} & \cdots & X_{mn} \end{pmatrix} , \qquad \boldsymbol \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_n \end{pmatrix} , \qquad \mathbf y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}$.

Such a system usually has no solution, and the goal is then to find the coefficients $\beta$ which fit the equations “best”, in the sense of solving the quadratic minimization problem.

$\displaystyle\underset{\boldsymbol \beta} {\rm argmin} \sum_{i=1}^{m}\left|y_i - \sum_{j=1}^{n} X_{ij}\beta_j\right|^2 = \underset{\boldsymbol \beta} {\rm argmin}\big\|\mathbf y - X \boldsymbol \beta \big\|^2$.

A justification for choosing this criterion is given in properties below. This minimization problem has a unique solution, provided that the $n$ columns of the matrix $X$ are linearly independent, given by solving the normal equations

$\displaystyle (\mathbf{X}^\top \mathbf{X}) \widehat{\boldsymbol \beta} =\mathbf{X}^ \top \mathbf{y}$.

Singular value decomposition

In linear algebra, the singular value decomposition (SVD) is an important factorization of a rectangular real or complex matrix, with many applications in signal processing and statistics. Applications which employ the SVD include computing the pseudo-inverse, least squares fitting of data, matrix approximation, and determining the rank, range and null space of a matrix.

Suppose $M$ is an $m$-by-$n$ matrix whose entries come from the field $K$, which is either the field of real numbers or the field of complex numbers. Then there exists a factorization of the form

$\displaystyle M = U\Sigma V^*$,

where $U$ is an $m$-by-$m$ unitary matrix over $K$, the matrix $\Sigma$ is $m$-by-$n$ diagonal matrix with nonnegative real numbers on the diagonal, and $V^*$ denotes the conjugate transpose of $V$, an $n$-by-$n$ unitary matrix over $K$. Such a factorization is called a singular-value decomposition of $M$. A common convention is to order the diagonal entries $\Sigma_{i,i}$ in non-increasing fashion. In this case, the diagonal matrix $\Sigma$ is uniquely determined by $M$ (though the matrices $U$ and $V$ are not). The diagonal entries of $\Sigma$ are known as the singular values of $M$.

In practice, matrix $U$ consists of eigenvectors of matrix $MM^T$ while matrix $V$ consists of eigenvectors of matrix $M^TM$. Finally, matrix $\Sigma$ carries on its diagonal the square roots of the eigenvalues of either matrix $MM^T$ or matrix $M^TM$.

Example

Let us consider the case when

$\displaystyle A = \left( {\begin{array}{*{20}{c}} 2&4 \\ 1&3 \\ 0&0 \\ 0&0 \end{array}} \right)$.

Having $A$ yields

$\displaystyle A{A^T} = \left( {\begin{array}{*{20}{c}} {20}&{14}&0&0 \\ {14}&{10}&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{array}} \right)$

and

$\displaystyle {A^T}A = \left( {\begin{array}{*{20}{c}} 5&{11} \\ {11}&{25} \end{array}} \right)$.

By simple calculation, one has

$\displaystyle\lambda_1=15+\sqrt{221}, \quad \lambda_2 =15-\sqrt{221}$

being the eigenvalues of matrix $A^TA$ (in descending order), which helps us to write down

$\displaystyle\Sigma = \left( {\begin{array}{*{20}{c}} {\sqrt {15 + \sqrt {221} } }&0 \\ 0&{\sqrt {15 - \sqrt {221} } } \\ 0&0 \\ 0 & 0 \end{array}} \right)$

To construct matrix $V$, one has to find eigenvectors corresponding to eigenvalues $\lambda_1$ and $\lambda_2$. For the first one, that is, $\lambda_1$, one has the following linear system

$\displaystyle\left( {\begin{array}{*{20}{c}} { - 10 - \sqrt {221} }&{11} \\ {11}&{10 - \sqrt {221} } \end{array}} \right)x = 0$

which implies, for example,

$\displaystyle {x_1} = \left( {\begin{array}{*{20}{c}} {\frac{{11}}{{10 + \sqrt {221} }}} \\ 1 \end{array}} \right)$.

Similarly, for $\lambda_2$, one has its corresponding eigenvector as follows

$\displaystyle {x_2} = \left( {\begin{array}{*{20}{c}} {\frac{{11}}{{10 - \sqrt {221} }}} \\ 1 \end{array}} \right)$.

Thus

$\displaystyle V = \left( {\begin{array}{*{20}{c}} {\frac{{11}}{{10 + \sqrt {221} }}}&{\frac{{11}}{{10 - \sqrt {221} }}} \\ 1&1 \end{array}} \right)$.

Numerically,

$\displaystyle V = \left( {\begin{array}{*{20}{c}} {0.4423698861}&{ - 2.260551703} \\ 1&1 \end{array}} \right)$

which shows that the columns of $V$ are orthogonal (though not yet normalized), since

$\displaystyle {V^T}V = \left( {\begin{array}{*{20}{c}} {\frac{{2(221 + 10\sqrt {221} )}}{{{{(10 + \sqrt {221} )}^2}}}}&0 \\ 0&{ - \frac{{2( - 221 + 10\sqrt {221} )}}{{{{( - 10 + \sqrt {221} )}^2}}}} \end{array}} \right)$.

Having $V$, normalize its columns; then the first two columns of $U$ can be computed by $u_i = \frac{1}{\sigma_i}Av_i$, where $\sigma_i = \sqrt{\lambda_i}$ (or you can perform the same routine as above on $AA^T$), and the remaining two columns can be completed to an orthonormal basis of $\mathbb{R}^4$, for instance by $e_3$ and $e_4$, since the last two rows of $A$ vanish. Numerically,

$\displaystyle U \approx \left( {\begin{array}{*{20}{c}} {0.8174}&{ - 0.5760}&0&0 \\ {0.5760}&{0.8174}&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{array}} \right)$.

It is easy to see that

$\displaystyle U\Sigma V^* = \left( {\begin{array}{*{20}{c}} {2.000000051}&{4.000000052} \\ {1.000000029}&{3.000000030} \\ 0&0 \\ 0&0 \end{array}} \right)$.

Having an SVD result, that is

$\displaystyle A = U\left( {\begin{array}{*{20}{c}} {{\Sigma _1}} \\ 0 \\ \end{array} } \right){V^T}$

one can see that the solution of $Ax=b$ can be computed by

$\displaystyle x = V\left( {\begin{array}{*{20}{c}} {\Sigma _1^{ - 1}} & 0 \\ \end{array} } \right){U^T}b$.

Note that matrices $U$ and $V$ are not unique. Using Maple, you can also compute the SVD directly.
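The worked example above can be verified numerically in plain Python: form the normalized eigenvectors $v_i$ of $A^TA$, the singular values $\sigma_i = \sqrt{\lambda_i}$, and the columns $u_i = \frac{1}{\sigma_i}Av_i$, then reconstruct $A$ from $\sigma_1 u_1v_1^T + \sigma_2 u_2v_2^T$ (a sketch of the standard relation $Av_i = \sigma_i u_i$, not a general SVD routine).

```python
import math

# Numerical check of the example: A = [[2,4],[1,3],[0,0],[0,0]].
A = [[2.0, 4.0], [1.0, 3.0], [0.0, 0.0], [0.0, 0.0]]
s = math.sqrt(221)
sig = [math.sqrt(15 + s), math.sqrt(15 - s)]            # singular values

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

# normalized eigenvectors of A^T A, as computed in the example
V = [unit([11 / (10 + s), 1.0]), unit([11 / (10 - s), 1.0])]
# first two columns of U via u_i = A v_i / sigma_i
U = [[sum(A[i][j] * V[c][j] for j in range(2)) / sig[c] for i in range(4)]
     for c in range(2)]
# reconstruct A = sigma1 * u1 v1^T + sigma2 * u2 v2^T
recon = [[sum(sig[c] * U[c][i] * V[c][j] for c in range(2)) for j in range(2)]
         for i in range(4)]
assert all(abs(recon[i][j] - A[i][j]) < 1e-9 for i in range(4) for j in range(2))
print("A is reconstructed from its SVD")
```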

## June 7, 2009

### Schwarz Reflection Principle and several applications

Suppose that $f$ is an analytic function which is defined in the upper half-disk $\{|z| < 1, \operatorname{Im} z > 0\}$. Further suppose that $f$ extends to a continuous function on the real axis, and takes on real values on the real axis. Then $f$ can be extended to an analytic function on the whole disk by the formula

$\displaystyle f(\overline z)=\overline{f(z)}$

and the values for $z$ reflected across the real axis are the reflections of $f(z)$ across the real axis. It is easy to check that the above function is complex differentiable in the interior of the lower half-disk. What is remarkable is that the resulting function must be analytic along the real axis as well, despite no assumptions of differentiability.

This is called the Schwarz reflection principle, and is sometimes also known as Schwarz’s symmetric principle (Needham 2000, p. 257). The diagram above shows the reflection principle applied to a function f defined for the upper half-disk (left figure; red) and its image (right figure; red). The function is real on the real axis, so it is possible to extend the function to the reflected domain (left and right figures; pink).

For the reflected function to be continuous, it is necessary for the values at the boundary to be continuous and to fall on the line being reflected. The reflection principle also applies in the generality of reflecting along any line, not just the real axis, in which case the function $f$ has to take values along a line in the range. In fact, any arc which has a neighborhood biholomorphic to a straight line can be reflected across. The basic example is the boundary of the unit circle, which is mapped to the real axis by $z \mapsto i\frac{z-1}{z+1}$.
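As a quick numerical illustration (not from the original post), one can check that a Möbius map such as $w = i\frac{z-1}{z+1}$ really does send points of the unit circle to the real axis, so the circle can play the role of the line being reflected across.

```python
import cmath

# Check: w = i(z - 1)/(z + 1) maps points of the unit circle to the real axis.
for k in range(1, 8):
    z = cmath.exp(1j * k)            # a point with |z| = 1 (never equal to -1)
    w = 1j * (z - 1) / (z + 1)
    assert abs(w.imag) < 1e-12, w
print("the unit circle is mapped into the real axis")
```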

The reflection principle can also be used to reflect a harmonic function which extends continuously to the zero function on its boundary. In this case, for negative $y$, defining

$\displaystyle v(x,y)=-v(x,-y)$

extends $v$ to a harmonic function on the reflected domain. Again, note that it is necessary that $v(x,0)=0$. This result provides a way of extending a harmonic function from a given open set to a larger open set (Krantz 1999, p. 95).

### Perturbation in linear systems

In this small topic, I will show several traditional results regarding perturbation in matrix algebra. This comes from the study of the stability of solving systems of linear equations in numerical analysis.

Let us consider the following matrix-form linear system $Ax=b$. We split the discussion into three cases.

• Perturbation on $b$

If $\delta b$ is a small perturbation on $b$, then $\delta x$ is a small perturbation on $x$ in the sense that the following system $A(x + \delta x) = b + \delta b$ holds true. This together with the fact that $Ax =b$ gives $A \delta x = \delta b$. By some simple calculation, one has

$\displaystyle\begin{gathered}\frac{{\left\| {\delta x} \right\|}}{{\left\| x \right\|}} = \frac{{\left\| {{A^{ - 1}}\left( {\delta b} \right)} \right\|}}{{\left\| x \right\|}}\leq\frac{{\left\| {{A^{ - 1}}} \right\|\left\| {\delta b} \right\|}}{{\left\| x \right\|}} \hfill \\ \qquad= \frac{{\left\| {{A^{ - 1}}} \right\|\left\| A \right\|\left\| {\delta b} \right\|}}{{\left\| A \right\|\left\| x \right\|}}\leq\frac{{\left\| {{A^{ - 1}}} \right\|\left\| A \right\|\left\| {\delta b} \right\|}}{{\left\| {Ax} \right\|}} \hfill \\ \qquad= \frac{{\left\| {{A^{ - 1}}} \right\|\left\| A \right\|\left\| {\delta b} \right\|}}{{\left\| b \right\|}}. \hfill \\ \end{gathered}$

Thus

$\displaystyle\frac{{\left\| {\delta x} \right\|}}{{\left\| x \right\|}}\leq\left\| {{A^{ - 1}}} \right\|\left\| A \right\|\frac{{\left\| {\delta b} \right\|}}{{\left\| b \right\|}}.$

• Perturbation on $A$.

If $\delta A$ is a small perturbation on $A$, then $\delta x$ is a small perturbation on $x$ in the sense that the following system $(A + \delta A)(x+\delta x) = b$ holds true. This together with the fact that $Ax =b$ gives $\left( {\delta A} \right)x + A\left( {\delta x} \right) + \left( {\delta A} \right)\left( {\delta x} \right) = 0$. By some simple calculation, one has

$\displaystyle\begin{gathered}\left\| {\delta x} \right\| = \left\| {{A^{ - 1}}\left( {\left( {\delta A} \right)\left( x \right) + \left( {\delta A} \right)\left( {\delta x} \right)} \right)} \right\| \hfill \\ \qquad\leq\left\| {{A^{ - 1}}} \right\|\left\| {\left( {\delta A} \right)\left( x \right) + \left( {\delta A} \right)\left( {\delta x} \right)} \right\| \hfill \\ \qquad\leq \left\| {{A^{ - 1}}} \right\|\left( {\left\| {\delta A} \right\|\left\| x \right\| + \left\| {\delta A} \right\|\left\| {\delta x} \right\|} \right). \hfill \\ \end{gathered}$

Thus

$\displaystyle\left\| {\delta x} \right\|\left( {1 - \left\| {{A^{ - 1}}} \right\|\left\| {\delta A} \right\|} \right) \leqslant \left\| {{A^{ - 1}}} \right\|\left\| {\delta A} \right\|\left\| x \right\|$

which implies that

$\displaystyle\frac{{\left\| {\delta x} \right\|}}{{\left\| x \right\|}} \leqslant \frac{{\left\| {{A^{ - 1}}} \right\|\left\| {\delta A} \right\|}}{{1 - \left\| {{A^{ - 1}}} \right\|\left\| {\delta A} \right\|}} = \frac{{\left\| A \right\|\left\| {{A^{ - 1}}} \right\|\frac{{\left\| {\delta A} \right\|}}{{\left\| A \right\|}}}}{{1 - \left\| A \right\|\left\| {{A^{ - 1}}} \right\|\frac{{\left\| {\delta A} \right\|}}{{\left\| A \right\|}}}}.$

• Perturbation on $A$ and $b$.

If $\delta A$ is a small perturbation on $A$ and $\delta b$ is a small perturbation on $b$, then $\delta x$ is a small perturbation on $x$ in the sense that the following system $(A + \delta A)(x+\delta x) = b + \delta b$ holds true. This together with the fact that $Ax =b$ gives $A\left( {\delta x} \right) + \left( {\delta A} \right)x + \left( {\delta A} \right)\left( {\delta x} \right) = \delta b$. By some simple calculation, one has

$\displaystyle\begin{gathered}\left\| {\delta x} \right\| = \left\| {{A^{ - 1}}\left( {\delta b - \left( {\delta A} \right)\left( x \right) - \left( {\delta A} \right)\left( {\delta x} \right)} \right)} \right\| \hfill \\ \qquad\leqslant \left\| {{A^{ - 1}}} \right\|\left\| {\delta b - \left( {\delta A} \right)\left( x \right) - \left( {\delta A} \right)\left( {\delta x} \right)} \right\| \hfill \\ \qquad\leqslant \left\| {{A^{ - 1}}} \right\|\left( {\left\| {\delta b} \right\| + \left\| {\delta A} \right\|\left\| x \right\| + \left\| {\delta A} \right\|\left\| {\delta x} \right\|} \right) \hfill \\ \end{gathered}$

which implies that

$\displaystyle\left\|{\delta x}\right\|\left({1-\left\|{{A^{-1}}}\right\|\left\|{\delta A}\right\|}\right)\leqslant\left\|{{A^{-1}}}\right\|\left({\left\|{\delta b}\right\|+\left\|{\delta A}\right\|\left\| x\right\|}\right).$

Thus,

$\displaystyle\begin{gathered}\frac{{\left\| {\delta x} \right\|}}{{\left\| x \right\|}} \leqslant \frac{{\left\| {{A^{ - 1}}} \right\|\left( {\frac{{\left\| {\delta b} \right\|}}{{\left\| x \right\|}} + \left\| {\delta A} \right\|} \right)}}{{1 - \left\| {{A^{ - 1}}} \right\|\left\| {\delta A} \right\|}} \hfill \\ \qquad= \frac{{\left\| A \right\|\left\| {{A^{ - 1}}} \right\|\left( {\frac{{\left\| {\delta b} \right\|}}{{\left\| A \right\|\left\| x \right\|}} + \frac{{\left\| {\delta A} \right\|}}{{\left\| A \right\|}}} \right)}}{{1 - \left\| A \right\|\left\| {{A^{ - 1}}} \right\|\frac{{\left\| {\delta A} \right\|}}{{\left\| A \right\|}}}} \hfill \\ \qquad\leqslant \frac{{\left\| A \right\|\left\| {{A^{ - 1}}} \right\|\left( {\frac{{\left\| {\delta b} \right\|}}{{\left\| {Ax} \right\|}} + \frac{{\left\| {\delta A} \right\|}}{{\left\| A \right\|}}} \right)}}{{1 - \left\| A \right\|\left\| {{A^{ - 1}}} \right\|\frac{{\left\| {\delta A} \right\|}}{{\left\| A \right\|}}}} \hfill \\ \qquad\leqslant \frac{{\left\| A \right\|\left\| {{A^{ - 1}}} \right\|\left( {\frac{{\left\| {\delta b} \right\|}}{{\left\| b \right\|}} + \frac{{\left\| {\delta A} \right\|}}{{\left\| A \right\|}}} \right)}}{{1 - \left\| A \right\|\left\| {{A^{ - 1}}} \right\|\frac{{\left\| {\delta A} \right\|}}{{\left\| A \right\|}}}}. \hfill \\ \end{gathered}$

In the literature, the expression $\|A^{-1}\|\|A\|$ is called the condition number of the matrix $A$, denoted by $K(A)$. It is easy to see that $K(A) \geq 1$. If $K(A)$ is small then the system $Ax=b$ is said to be well-conditioned. Alternatively, in the case when $K(A) \gg 1$, the system $Ax=b$ is ill-conditioned.
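A small numerical illustration (not from the post): in an ill-conditioned $2 \times 2$ system, a tiny relative change in $b$ produces a much larger relative change in $x$, while the bound $\frac{\|\delta x\|}{\|x\|} \le K(A)\frac{\|\delta b\|}{\|b\|}$ from the first case still holds. The infinity norm (maximum absolute row sum for matrices) is used throughout.

```python
# Perturbing b in an ill-conditioned 2x2 system Ax = b.
A = [[1.0, 1.0], [1.0, 1.0001]]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
Ainv = [[A[1][1] / det, -A[0][1] / det],
        [-A[1][0] / det, A[0][0] / det]]

def inf_norm(M):                      # maximum absolute row sum
    return max(abs(row[0]) + abs(row[1]) for row in M)

K = inf_norm(A) * inf_norm(Ainv)      # condition number K(A), about 4e4

def solve(b):
    return [Ainv[0][0] * b[0] + Ainv[0][1] * b[1],
            Ainv[1][0] * b[0] + Ainv[1][1] * b[1]]

b = [2.0, 2.0001]                     # exact solution x = (1, 1)
db = [0.0, 0.0001]                    # a tiny perturbation of b
x = solve(b)
dx = [p - q for p, q in zip(solve([b[0] + db[0], b[1] + db[1]]), x)]
rel_x = max(abs(v) for v in dx) / max(abs(v) for v in x)
rel_b = max(abs(v) for v in db) / max(abs(v) for v in b)
print(rel_x / rel_b <= K)             # True: the amplification obeys the bound
```

Here the amplification factor is roughly $2 \cdot 10^4$, far larger than the relative perturbation of $b$ itself, yet still within $K(A)$.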

## June 2, 2009

### Finite difference method: Local truncation error and consistency

Truncation error, or local truncation error, is the error made by a numerical algorithm that arises from taking a finite number of steps in a computation. It is present even with infinite-precision arithmetic, because it is caused by truncating the infinite Taylor series to form the algorithm.

The use of arbitrarily small steps in numerical computation is prevented by round-off errors, which are the consequence of using finite-precision floating point numbers on computers.

Let $F_{i,j}(u)=0$ represent the difference equation approximating the PDE at the $(i,j)$-th mesh point, with exact solution $u$. For example,

$\displaystyle {F_{i,j}}\left( u \right) = \frac{{{u_{i,j + 1}} - {u_{i,j}}}}{k} - {\text{ }}\frac{{{u_{i - 1,j}} - 2{u_{i,j}} + {u_{i + 1,j}}}}{{{h^2}}}$

represent the difference equation approximating the following PDE

$\displaystyle\frac{{\partial u}} {{\partial t}} = \frac{{{\partial ^2}u}} {{\partial {x^2}}}$

at the $(i,j)$-th mesh point. If $u$ is replaced by $U$ at the mesh points of the difference equation, where $U$ is the exact solution of the PDE, the value of $F_{i,j}(U)$ is called the local truncation error $T_{i,j}$ at the $(i,j)$-th mesh point. $T_{i,j}$ clearly measures the amount by which the exact solution values of the PDE at the mesh points fail to satisfy the difference equation at the point $(ih, jk)$.

Continuing the above example gives us

$\displaystyle {T_{i,j}} = {F_{i,j}}\left( U \right) = \frac{{{U_{i,j + 1}} - {U_{i,j}}}}{k} - \frac{{{U_{i - 1,j}} - 2{U_{i,j}} + {U_{i + 1,j}}}}{{{h^2}}}$.

Using Taylor expansions, it is easy to express $T_{i,j}$ in terms of powers of $h$ and $k$ and partial derivatives of $U$ at $(ih, jk)$. This leads us to the computation of the local truncation error.

It is sometimes possible to approximate a parabolic or hyperbolic equation by a finite-difference scheme that is stable (i.e. limits the amplification of all the components of the initial conditions), but which has a solution that converges to the solution of a different differential equation as the mesh lengths tend to zero. Such a difference scheme is said to be inconsistent or incompatible with the PDE.

The real importance of the concept of consistency lies in a theorem by Lax which states that if a linear finite-difference equation is consistent with a properly posed linear initial-value problem then stability guarantees convergence of $u$ to $U$ as the mesh lengths tend to zero. Consistency can be defined in either of two equivalent but slightly different ways.

The more general definition is as follows. Let $L(U)=0$ represent the PDE in the independent variables $x$ and $t$, with exact solution $U$. Let $F(u)=0$ represent the approximating finite-difference equation with exact solution $u$. Let $v$ be a continuous function of $x$ and $t$ with a sufficient number of continuous derivatives to enable $L(v)$ to be evaluated at the point $(ih,jk)$.

Then the truncation error $T_{i,j}(v)$ at the point $(ih,jk)$ is defined by

$\displaystyle T_{i,j}(v) = F_{i,j}(v) - L(v_{i,j})$.

If $T_{i,j}(v) \to 0$ as $h \to 0$ and $k \to 0$, the difference equation is said to be consistent or compatible with the PDE. Most authors put $v = U$ because $L(U)=0$.

For example, let us consider the following parabolic equation

$\displaystyle u_t = \nu u_{xx} - \kappa u, \quad a < x < b, \quad 0 < t \leq T$

with

$\displaystyle u(x,0)=f(x), \quad a \leq x \leq b$

and

$\displaystyle u_x(a,t)=g_1(t), \quad u_x(b,t)=g_2(t), \quad 0 \leq t \leq T$,

where $\nu >0$ and $\kappa \geq 0$.

Backward Euler + second order central finite difference discretization.

We split equally the domain $a \leq x \leq b$ into $N$ parts and the domain $0 \leq t \leq T$ into $M$ parts, i.e. $(i,j)$-th mesh point is of the following form

$\displaystyle \left( {a + i\frac{{b - a}} {N},j\frac{T} {M}} \right)$.

We are now in a position to discretize the problem as follows

$\displaystyle\frac{{{u_{i,j + 1}} - {u_{i,j}}}}{k} = \nu \frac{{{u_{i - 1,j + 1}} - 2{u_{i,j + 1}} + {u_{i + 1,j + 1}}}}{{{h^2}}} - \kappa {u_{i,j}}$

where $1 \leq i \leq N - 1$. At $i=0$, that is $x=a$, one has

$\displaystyle\frac{{{u_{1,j + 1}} - {u_{ - 1,j + 1}}}} {{2h}} = g_1\left( t \right)$

and at $i=N$, that is $x=b$, one obtains

$\displaystyle\frac{{{u_{N + 1,j + 1}} - {u_{N - 1,j + 1}}}} {{2h}} = g_2\left( t \right)$.

The local truncation error and consistency

The above discretization can be rewritten as

$\displaystyle {F_{i,j}}\left( u \right) = \frac{{{u_{i,j + 1}} - {u_{i,j}}}}{k} - \nu \frac{{{u_{i - 1,j + 1}} - 2{u_{i,j + 1}} + {u_{i + 1,j + 1}}}}{{{h^2}}} + \kappa {u_{i,j}} = 0$

with the boundary conditions handled through the ghost values $u_{-1,j+1}$ and $u_{N+1,j+1}$ as above.

Now we have

$\displaystyle {T_{i,j}} = {F_{i,j}}\left( U \right) = \frac{{{U_{i,j + 1}} - {U_{i,j}}}}{k} - \nu \frac{{{U_{i - 1,j + 1}} - 2{U_{i,j + 1}} + {U_{i + 1,j + 1}}}}{{{h^2}}} + \kappa {U_{i,j}}$.

By Taylor’s expansion about the mesh point $(ih, jk)$,

$\displaystyle {U_{i,j + 1}} = {U_{i,j}} + k{U_t} + \frac{{{k^2}}}{2}{U_{tt}} + O({k^3})$

and

$\displaystyle {U_{i \pm 1,j + 1}} = {U_{i,j + 1}} \pm h{U_x} + \frac{{{h^2}}}{2}{U_{xx}} \pm \frac{{{h^3}}}{6}{U_{xxx}} + \frac{{{h^4}}}{{24}}{U_{xxxx}} + \cdots$

(the spatial derivatives on the right being evaluated at $(ih,(j+1)k)$), so that

$\displaystyle \frac{{{U_{i - 1,j + 1}} - 2{U_{i,j + 1}} + {U_{i + 1,j + 1}}}}{{{h^2}}} = {U_{xx}} + k{U_{xxt}} + \frac{{{h^2}}}{{12}}{U_{xxxx}} + \cdots$

with all derivatives now evaluated at $(ih, jk)$. Therefore,

$\displaystyle {T_{i,j}} = \left( {{U_t} - \nu {U_{xx}} + \kappa U} \right) + \frac{k}{2}{U_{tt}} - \nu k{U_{xxt}} - \frac{{\nu {h^2}}}{{12}}{U_{xxxx}} + \cdots$

Since ${U_t} = \nu {U_{xx}} - \kappa U$, the first bracket vanishes, and the principal part of the local truncation error is

$\displaystyle \frac{k}{2}{U_{tt}} - \nu k{U_{xxt}} - \frac{{\nu {h^2}}}{{12}}{U_{xxxx}}$.

Hence

$\displaystyle {T_{i,j}} = O\left( k \right) + O\left( {{h^2}} \right)$.

Consistency.

In view of the computation above, the local truncation error tends to zero as $h \to 0$ and $k \to 0$, which implies that the method is consistent.
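The estimate $T_{i,j} = O(k) + O(h^2)$ can also be observed numerically. As an illustration (not from the post), apply the difference operator to the exact solution $U(x,t) = e^{-t}\sin x$ of $u_t = u_{xx}$ (that is, $\nu = 1$, $\kappa = 0$): halving $h$ while keeping $k = h^2$ should divide the residual by about $4$.

```python
import math

# Residual of the difference operator applied to an exact PDE solution.
def U(x, t):
    return math.exp(-t) * math.sin(x)   # exact solution of u_t = u_xx

def residual(x, t, h, k):
    # (U_{i,j+1} - U_{i,j})/k - (U_{i-1,j+1} - 2U_{i,j+1} + U_{i+1,j+1})/h^2
    return ((U(x, t + k) - U(x, t)) / k
            - (U(x - h, t + k) - 2 * U(x, t + k) + U(x + h, t + k)) / h ** 2)

T1 = residual(1.0, 0.0, 0.1, 0.01)      # h = 0.1,  k = h^2
T2 = residual(1.0, 0.0, 0.05, 0.0025)   # h = 0.05, k = h^2
print(abs(T1 / T2))  # close to 4: first order in k, second order in h
```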