Ngô Quốc Anh

Tháng tám 16, 2009

On the stability of the Runge-Kutta 4 (RK4)

In the literature, the so-called RK4 is given as following: we first define the following coefficients

\displaystyle \begin{gathered}{K_{1}}= f\left({{t_{i}},{y_{i}}}\right),\hfill\\ {K_{2}}= f\left({{t_{i}}+\frac{1}{2}\Delta t,{y_{i}}+\frac{1}{2}\Delta t{K_{1}}}\right),\hfill\\ {K_{3}}= f\left({{t_{i}}+\frac{1}{2}\Delta t,{y_{i}}+\frac{1}{2}\Delta t{K_{2}}}\right),\hfill\\ {K_{4}}= f\left({{t_{i}}+\Delta t,{y_{i}}+\Delta t{K_{3}}}\right),\hfill\\ \end{gathered}

then

\displaystyle {y_{i + 1}} = {y_i} + \frac{{\Delta t}} {6}\left( {{K_1} + 2{K_2} + 2{K_3} + {K_4}} \right).

This is the most important iterative method for the approximation of solutions of ordinary differential equations

\displaystyle y' = f(t, y), \quad y(t_0) = y_0.

This technique was developed around 1900 by the German mathematicians C. Runge and M.W. Kutta. In order to study its stability, we use the model problem

\displaystyle y' = \lambda y, \quad \Re \lambda < 0.

In other words, we replace f(t,y) by \lambda y. Then the stability condition for time step \Delta comes from the following condition

\displaystyle \left| \frac{y_{i+1}}{y_i} \right| \leq 1.

Applying the above discussion to RK4 method, we see that

\displaystyle\begin{gathered}{K_{4}}=\lambda\left({{y_{i}}+\Delta t{K_{3}}}\right),\hfill\\ {K_{3}}=\lambda\left({{y_{i}}+\frac{1}{2}\Delta t{K_{2}}}\right),\hfill\\ {K_{2}}=\lambda\left({{y_{i}}+\frac{1}{2}\Delta t{K_{1}}}\right),\hfill\\ {K_{1}}=\lambda{y_{i}},\hfill\\ \end{gathered}

which implies

\displaystyle\begin{gathered}{K_{4}}=\lambda\left({{y_{i}}+\Delta t\lambda\left({{y_{i}}+\frac{1}{2}\Delta t\lambda\left({{y_{i}}+\frac{1}{2}\Delta t{y_{i}}}\right)}\right)}\right) =\lambda{y_{i}}\left({1+\Delta t\lambda\left({1+\frac{1}{2}\Delta t\lambda\left({1+\frac{1}{2}\Delta t}\right)}\right)}\right),\hfill\\ {K_{3}}=\lambda\left({{y_{i}}+\frac{1}{2}\Delta t\lambda\left({{y_{i}}+\frac{1}{2}\Delta t{y_{i}}}\right)}\right) =\lambda{y_{i}}\left({1+\frac{1}{2}\Delta t\lambda\left({1+\frac{1}{2}\Delta t}\right)}\right),\hfill\\ {K_{2}}=\lambda\left({{y_{i}}+\frac{1}{2}\Delta t\lambda{y_{i}}}\right) =\lambda{y_{i}}\left({1+\frac{1}{2}\Delta t\lambda }\right),\hfill\\ {K_{1}}=\lambda{y_{i}},\hfill\\ \end{gathered}

which yields

\displaystyle\begin{gathered}{K_{4}}=\lambda\left({{y_{i}}+\Delta t\lambda\left({{y_{i}}+\frac{1}{2}\Delta t\lambda\left({{y_{i}}+\frac{1}{2}\Delta t{y_{i}}}\right)}\right)}\right) =\lambda{y_{i}}\left({1+\Delta t\lambda\left({1+\frac{1}{2}\Delta t\lambda\left({1+\frac{1}{2}\Delta t}\right)}\right)}\right),\hfill\\ {K_{3}}=\lambda\left({{y_{i}}+\frac{1}{2}\Delta t\lambda\left({{y_{i}}+\frac{1}{2}\Delta t{y_{i}}}\right)}\right) =\lambda{y_{i}}\left({1+\frac{1}{2}\Delta t\lambda\left({1+\frac{1}{2}\Delta t}\right)}\right),\hfill\\ {K_{2}}=\lambda\left({{y_{i}}+\frac{1}{2}\Delta t\lambda{y_{i}}}\right) =\lambda{y_{i}}\left({1+\frac{1}{2}\Delta t\lambda }\right),\hfill\\ {K_{1}}=\lambda{y_{i}},\hfill\\ \end{gathered}

Thus

\displaystyle\frac{{{y_{i+1}}}}{{{y_{i}}}}= 1+\frac{{\Delta t}}{6}\left[{\lambda+2\lambda\left({1+\frac{1}{2}\Delta t\lambda }\right)+2\lambda\left({1+\frac{1}{2}\Delta t\lambda\left({1+\frac{1}{2}\Delta t}\right)}\right)+\lambda\left[{1+\Delta t\lambda\left({1+\frac{1}{2}\Delta t\lambda\left({1+\frac{1}{2}\Delta t}\right)}\right)}\right]}\right]

which is nothing but

\displaystyle\frac{{{y_{i+1}}}}{{{y_{i}}}}= 1+\Delta t\lambda+\frac{{{{\left({\Delta t\lambda }\right)}^{2}}}}{2}+\frac{{{{\left({\Delta t\lambda }\right)}^{3}}}}{6}+\frac{{{{\left({\Delta t\lambda }\right)}^{4}}}}{{24}}.

Therefore, the stability condition is given as follows

\displaystyle \left| {1 + z + \frac{{{z^2}}} {2} + \frac{{{z^3}}} {6} + \frac{{{z^4}}} {{24}}} \right| \leq 1, \quad \Re z < 0.

Tháng tám 15, 2009

Summation by parts (Abel sum formula)

Suppose \{f_k\} and \{g_k\} are two sequences. Then,

\displaystyle\sum_{k=m}^n f_k(g_{k+1}-g_k) = \left[f_{n+1}g_{n+1} - f_m g_m\right] - \sum_{k=m}^n g_{k+1}(f_{k+1}- f_k).

Using the forward difference operator \Delta, it can be stated more succinctly as

\displaystyle\sum_{k=m}^n f_k\Delta g_k = \left[f_{n+1} g_{n+1} - f_m g_m\right] - \sum_{k=m}^n g_{k+1}\Delta f_k,

Note that summation by parts is an analogue to the integration by parts formula,

\displaystyle\int f\,dg = f g - \int g df.

We also have the following identity

\displaystyle\sum_{k=m}^n f_k(g_k)-g_{k-1}) = \left[f_{n+1}g_n - f_m g_{m-1}\right] - \sum_{k=m}^n g_k(f_{k+1}- f_k).

Using the backward difference operator \Delta, it can be stated more succinctly as

\displaystyle\sum_{k=m}^n f_k\Delta g_{k-1} = \left[f_{n+1}g_n - f_m g_{m-1}\right] - \sum_{k=m}^n g_k \Delta f_k.

Tháng tám 1, 2009

Solutions of Equations in One Variable

A root-finding algorithm is a numerical method, or algorithm, for finding a value x such that f(x) = 0, for a given function f. Such an x is called a root of the function f.

  • The Bisection method

Suppose f is a continuous function defined on the interval [a, b], with f(a) and f(b) of opposite sign. By the Intermediate Value Theorem, there exists a number p in (a, b) with f(p) = 0. Although the procedure will work when there is more than one root in the interval (a,b), we assume for simplicity that the root in this interval is unique. The method calls for a repeated halving of subintervals of [a, b] and, at each step, locating the half containing p.

bisectionmethodTo begin, set a_1=a and b_1=b, and let p_1 be the midpoint of [a,b]; that is,

p_1 = a_1 + \frac{b_1-a_1}{2}=\frac{a_1+b_1}{2}.

If f(p_1)=0, then p=p_1, and we are done. If f(p_1)\ne 0, then f(p_1) has the same sign as either f(a_1) or f(b_1). When f(p_1) and f(a_1) have the same sign, p \in (p_1, b_1), and we set a_2=p_1 and b_2 = b_1. When f(p_1) and f(a_1) have opposite signs, p \in (a_1, p_1), and we set a_2 = a_1 and b_2 = p_1. We then reapply the process to the interval [a_2, b_2].

  • Fixed-point iteration

A number p is a fixed point for a given function g if g(p) = p. In this section we consider the problem of finding solutions to fixed-point problems and the connection between the fixed-point problems and the root-finding problems we wish to solve. Root-finding problems and fixed-point problems are equivalent classes in the following sense:

Given a root-finding problem f(p) = 0, we can define functions g with a fixed point at p in a number of ways, for example, as g(x) = x-f(x) or as g(x) = x + 3f(x). Conversely, if the function g has a fixed point at p, then the function defined byf(x) = x-g(x) has a zero at p.

fixedpointiteration

Although the problems we wish to solve are in the root-finding form, the fixed-point form is easier to analyze, and certain fixed-point choices lead to very powerful root-finding techniques.

fixedpointiteration

  • Newton’s method

Newton’s (or the Newton-Raphson) method is one of the most powerful and well-known numerical methods for solving aroot-finding problem. With an initial approximation p_0, the Newton’s method generates the sequence \{p_n\}_{n=1}^\infty by

{p_{n + 1}} = {p_n} - \frac{{f\left( {{p_n}} \right)}} {{f'\left( {{p_n}} \right)}}, \quad n \geqslant 0.

newtonmethod

  • The Secant method

To circumvent the problem of the derivative evaluation in Newton’s method, we  introduce a slight variation. By definition,

f'\left( {{p_n}} \right) = \mathop {\lim }\limits_{x \to {p_n}} \frac{{f\left( x \right) - f\left( {{p_n}} \right)}} {{x - {p...

Letting x=p_{n-1}, we have

f'\left( {{p_n}} \right) \approx \frac{{f\left( {{p_{n - 1}}} \right) - f\left( {{p_n}} \right)}} {{{p_{n - 1}} - {p_n}}}

which yields

{p_{n + 1}} = {p_n} - \frac{{f\left( {{p_n}} \right)\left( {{p_{n - 1}} - {p_n}} \right)}} {{f\left( {{p_{n - 1}}} \right) - ...

thesecandmethod

  • The method of False Position

The method of False Position (also called Regula Falsi) generates approximations in the same manner as the Secant method, but it includes a test to ensure that the root is bracketed between successive iterations. Although it is not a method we generally recommend, it illustrates how bracketing can be incorporated.

First choose initial approximations p_0 and p_1 with f(p_0) f(p_1) < 0. The approximation p_2 is chosen in the same manner as in the Secant method, as the x-intercept of the line joining (p_0, f(p_0)) and (p_1, f(p_1)). To decide which secant line to use to compute p_3, we check f(p_2) f(p_1). If this value is negative, then p_1 and p_2 bracket a root, and we choose p_3 as the x-intercept of the line joining (p_1, f(p_1)) and (p_2, f(p_2)). If not, we choose p_3 as the x-intercept of the line joining (p_0, f(p_0)) and (p_2, f(p_2)), and then interchange the indices on p_0 and p_1.

sedantandfalsepositionmethods

In a similar manner, once p_3 is found, the sign of f(p_3)f(p_2) determines whether we use p_2 and p_3 or p_3 and p_1 to compute p_4. In the latter case a relabeling of p_2 and p_1 is performed.

Source: Richard L. Burden and J. Douglas Faires, Numerical Analysis, 8th edition, Thomson/Brooks/Cole, 2005.

Tháng Bảy 24, 2009

Linear shooting method

In numerical analysis, the shooting method is a method for solving a boundary value problem by reducing it to the solution of an initial value problem. The following exposition may be clarified by this illustration of the shooting method.

For a boundary value problem of a second-order ordinary differential equation, the method is stated as follows. Let

y''(t) = f(t, y(t), y'(t)), \quad y(t_0) = y_0, \quad y(t_1) = y_1

be the boundary value problem. Let y(t, a) denote the solution of the initial value problem

y''(t) = f(t, y(t), y'(t)), \quad y(t_0) = y_0, \quad y'(t_0) = a

Define the function F(a) as the difference between y(t_1, a) and the specified boundary value y_1.

F(a) = y(t_1, a) - y_1 \,

If the boundary value problem has a solution, then F has a root, and that root is just the value of y'(t_0) which yields a solution y(t) of the boundary value problem. The usual methods for finding roots may be employed here, such as the bisection method or Newton’s method.

Linear shooting method

The boundary value problem is linear if f has the form

f(t, y(t), y'(t))=p(t)y'(t)+q(t)y(t)+r(t).

In this case, the solution to the boundary value problem is usually given by

y(t) = y_{(1)}(t)+\frac{y_1-y_{(1)}(t_1)}{y_{(2)}(t_1)}y_{(2)}(t)

where y_{(1)}(t) is the solution to the initial value problem

y''(t) = f(t, y(t), y'(t)),\quad y(t_0) = y_0, \quad y'(t_0) = 0,

and y_{(2)}(t) is the solution to the initial value problem

y''(t) = p(t)y'(t)+q(t)y(t),\quad y(t_0) = 0, \quad y'(t_0) = 1.

See the proof for the precise condition under which this result holds.

Remark

One can easily see that if y_{(2)}(t_1) = 0 then y_{(2)}(t) \equiv 0 for all t. Thus y(t) = y_{(1)}(t) for all t. Besides, the formula

y(t) = y_{(1)}(t)+\frac{y_1-y_{(1)}(t_1)}{y_{(2)}(t_1)}y_{(2)}(t)

comes from the fact that we need to find y(t) as a combination of y_{(1)}(t) and y_{(2)}(t). In this manner, y(t) should be of the form

y(t) = y_{(1)}(t) + Cy_{(2)}(t) for all t.

At x=t_0,

{y_0} = y({t_0}) = \underbrace {{y_{(1)}}({t_0})}_{{y_0}} + C{y_{(2)}}({t_0})

which implies that y_{(2)}(t_0)=0. This is self-satisfied.
At x=t_1,

{y_1} = y({t_1}) = {y_{(1)}}({t_1}) + C{y_{(2)}}({t_1})

which implies that

C = \frac{{{y_1} - {y_{(1)}}({t_1})}} {{{y_{(2)}}({t_1})}}.

Therefore,

y(t) = {y_{(1)}}(t) + \frac{{{y_1} - {y_{(1)}}({t_1})}} {{{y_{(2)}}({t_1})}}{y_{(2)}}(t).

Example

A boundary value problem is given as follows by Stoer and Bulirsch.

w''(t) = \frac{3}{2} w^2, \quad w(0) = 4, \quad w(1) = 1.

The initial value problem

w''(t) = \frac{3}{2} w^2, \quad w(0) = 4, \quad w'(0) = s

was solved for s = -1, -2, -3, ..., -100, and F(s) = w(1,s) -1 plotted in the first figure. Inspecting the plot of F, we see that there are roots near -8 and -36. Some trajectories of w(t,s) are shown in the second figure.

Solutions of the initial value problem were computed by using the LSODE algorithm, as implemented in the mathematics package GNU Octave. Stoer and Bulirsch state that there are two solutions, which can be found by algebraic methods. These correspond to the initial conditions w'(0) = -8 and w'(0) = -35.9 (approximately).

http://upload.wikimedia.org/wikipedia/commons/8/87/Shooting_method_error.png

The function F(s) = w(1,s)-1.

http://upload.wikimedia.org/wikipedia/commons/2/2d/Shooting_method_trajectories.png

Trajectories w(t,s) for s = w'(0) equal to -7, -8, -10, -36, and -40 (red, green, blue, cyan, and magenta, respectively). The point (1,1) is marked with a red diamond.

Source: http://en.wikipedia.org/wiki/Shooting_method

Tháng Sáu 30, 2009

A quick introduction to Simplex Method via an example

Chuyên mục: Giải tích 9 (MA5265), Linh Tinh, Nghiên Cứu Khoa Học — Ngô Quốc Anh @ 14:36

Below is the
original problem:
objective function
is in green.
See
step
1
Red variables below
are called
SLACK VARIABLES
See
step
2
Below is the
SIMPLEX TABLEAU.
Compare RED symbols
with Z = x1 + 2x2 - x3.
Blue numbers below
are the “ISM“.
The 1st SIMPLEX TABLEAU
is below. Note
missing z-column
(correction by Steve Hulet)
See steps 3,4,5 of
SIMPLEX METHOD
as you handle INDICATORS,
RATIOS, and PIVOTS.
Named below are the 4
row operations needed to
pivot on the number
“5″ encircled in red
Below are the
results of the
row operations
named above
blue
numbers
are the
new ISM
Since one INDICATOR
(namely -1/5) remains
negative, we must
repeat steps 3-9
Below is the result
of changing our pivot
to “1″
Named below are
4 row operations
needed to pivot
on the number(16/5)
encircled in red
Above there was a tie for least non-negative ratio:
either row 1 or row 2 could have become the pivot row, and either choice leads to the final tableau after one additional pivoting. At the right is the result of the final 3 row operations.
All indicators {0, 0, 49
16
, 0, 1
16
and 3
8
} are now zero or bigger (“13″ is NOT an indicator).
Thus, as in step 8 of the SIMPLEX METHOD, the last tableau is a FINAL TABLEAU.
Row operations of SIMPLEX METHOD are done.
Thus, the basic solution for the tableau above is the solution to our original problem.
[1st] set equal to 0 all variables NOT associated with the blue ISM, as at the right. Each column of the final tableau has a label naming it’s variable.
[2nd] convert each row of the final tableau (except the bottom row) back into equation form (as at the right) to find the values of the remaining variables. The value of the objective function is in the lower right corner of the final tableau.

Source: http://math.uww.edu/~mcfarlat/s-prob.htm

More: http://people.hofstra.edu/stefan_Waner/realworld/tutorialsf4/unit4_3.html

Tháng Sáu 29, 2009

Karush–Kuhn–Tucker conditions, the 1st order optimality condition

Chuyên mục: Giải tích 9 (MA5265), Linh Tinh, Nghiên Cứu Khoa Học — Ngô Quốc Anh @ 16:28

In mathematics, the Karush–Kuhn–Tucker conditions (also known as the Kuhn-Tucker or the KKT conditions) are necessary for a solution in nonlinear programming to be optimal, provided some regularity conditions are satisfied. It is a generalization of the method of Lagrange multipliers to inequality constraints. The conditions are named for William Karush, Harold W. Kuhn, and Albert W. Tucker.

Let us consider the following nonlinear optimization problem:

\mbox{Minimize } f(x)

\mbox{subject to: } g_i(x) \le 0 , h_j(x) = 0

where f(x) is the function to be minimized, g_i (x) (i = 1,...,m) are the inequality constraints and h_j (x) (j = 1,...,l) are the equality constraints, and m and l are the number of inequality and equality constraints, respectively.

The necessary conditions for this general equality-inequality constrained problem were first published in the Masters thesis of William Karush, although they only became renowned after a seminal conference paper by Harold W. Kuhn and Albert W. Tucker.

Necessary conditions

Suppose that the objective function, i.e., the function to be minimized, is f : \mathbb{R}^n \rightarrow \mathbb{R} and the constraint functions are g_i : \mathbb{R}^n \rightarrow \mathbb{R} and h_j :  \mathbb{R}^n \rightarrow \mathbb{R}. Further, suppose they are continuously differentiable at a point x^\star. If x^\star is a local minimum that satisfies some regularity conditions, then there exist constants \mu_i\ (i = 1,...,m) and \lambda_j (j = 1,...,l) such that

Stationarity:

\nabla f(x^\star) + \sum_{i=1}^m \mu_i \nabla g_i(x^\star) + \sum_{j=1}^l \lambda_j \nabla h_j(x^\star) = 0,

Primal feasibility:

g_i(x^\star) \le 0, \mbox{ for all } i = 1,..., m
h_j(x^\star) = 0, \mbox{ for all } j = 1,..., l .

Dual feasibility:

\mu_i \ge 0 (i = 1,...,m) .

Complementary slackness:

\mu_i g_i (x^\star) = 0 for all i = 1,...,m.

Regularity conditions (or constraint qualifications)

In order for a minimum point x^\star be KKT, it should satisfy some regularity condition, the most used ones are listed below:

  • Linear independence constraint qualification (LICQ): the gradients of the active inequality constraints and the gradients of the equality constraints are linearly independent at x^\star.
  • Mangasarian-Fromowitz constraint qualification (MFCQ): the gradients of the active inequality constraints and the gradients of the equality constraints are positive-linearly independent at x^\star.
  • Constant rank constraint qualification (CRCQ): for each subset of the gradients of the active inequality constraints and the gradients of the equality constraints the rank at a vicinity of x^\star is constant.
  • Constant positive linear dependence constraint qualification (CPLD): for each subset of the gradients of the active inequality constraints and the gradients of the equality constraints, if it is positive-linear dependent at x^\star then it is positive-linear dependent at a vicinity of x^\star. (v_1,...,v_n) is positive-linear dependent if there exists a_1\geq 0,...,a_n\geq 0 not all zero such that a_1v_1+...+a_nv_n=0)
  • Quasi-normality constraint qualification (QNCQ): if the gradients of the active inequality constraints and the gradients of the equality constraints are positive-linearly independent at x^\star with associated multipliers \lambda_i for equalities and \mu_j for inequalities than it doesn’t exist a sequence x_k\to x^\star such that: \lambda_i \ne 0 therefore \lambda_i h_i(x_k)>0 and \mu_j \ne 0 thus \mu_j g_j(x_k)>0.
  • Slater condition: for a convex problem, there exists a point x such that h(x) = 0 and g_i(x) < 0 for all i active in x^\star.
  • Linearity constraints: If f and g are affine functions, then no other condition is needed to assure that the minimum point is KKT.

It can be shown that

LICQ⇒MFCQ⇒CPLD⇒QNCQ,

LICQ⇒CRCQ⇒CPLD⇒QNCQ

(and the converses are not true), although MFCQ is not equivalent to CRCQ. In practice weaker constraint qualifications are preferred since they provide stronger optimality conditions.

Sufficient conditions

In some cases, the necessary conditions are also sufficient for optimality. This is the case when the objective function f and the inequality constraints g_j are continuously differentiable convex functions and the equality constraints hi are affine functions. It was shown by Martin in 1985 that the broader class of functions in which KKT conditions guarantees global optimality are the so called invex functions. So if equality constraints are affine functions, inequality constraints and the objective function are continuously differentiable invex functions then KKT conditions are sufficient for global optimality.

Value function

If we reconsider the optimization problem as a maximization problem with constant inequality constraints,

\mbox{Maximize }\; f(x)

\mbox{subject to: } g_i(x) \le a_i , h_j(x) = 0.

The value function is defined as:

V(a_1, ..., a_n) = \sup\limits_{x}\;\; f(x)

\mbox{subject to: } g_i(x) \le a_i , h_j(x) = 0 j\in\{1,...,l\}, i\in{1,...,m}.

(So the domain of V is \{a \in \mathbb{R}^m | \mbox{for some }x\in X, g_i(x) \leq a_i, i \in \{1,...,m\}.) Given this definition, each coefficient, \lambda_i, is the rate at which the value function increases as ai increases. Thus if each ai is interpreted as a resource constraint, the coefficients tell you how much increasing a resource will increase the optimum value of our function f. This interpretation is especially important in economics and is used, for instance, in utility maximization problems.

Source

Tháng Sáu 20, 2009

How to apply the Runge–Kutta 4 method to the system of 1st order ODEs?

In numerical analysis, the Runge–Kutta methods are an important family of implicit and explicit iterative methods for the approximation of solutions of ordinary differential equations.

The common fourth-order Runge–Kutta method

One member of the family of Runge–Kutta methods is so commonly used that it is often referred to as “RK4″ or simply as “the Runge–Kutta method”. Let an initial value problem be specified as follows.

y' = f(t, y), a \leq t \leq b with y(a) = y_0.

We equally split the interval [a,b] into N subintervals by the following interior points

a = t_0 <t_1<...<t_{N-1}<t_N = b.

Of course, t_i = a + i\frac{b-a}{N}. The number \frac{b-a}{N}, which will be denoted by h, is called the step size. The value of y at step i is usually denoted by y_i, that is, y_i = y(t_i). Therefore, the initial value problem can be rewritten as y' = f(t, y), where a \leq t \leq b and with y(t_0) = y_0. The Runge–Kutta 4 method tells us that, having the value of y at t_i, we can compute the value of y at t_{i+1} by using the following scheme

\begin{gathered}   {K^1} = hf\left( {{t_i},{y_i}} \right) \hfill \\   {K^2} = hf\left( {{t_i} + \frac{h} {2},{y_i} + \frac{1}...

Thus, for each step, we need to compute all four K^i, i =\overline{1,4}, constants.

The Runge–Kutta 4 method for a system of ODEs

Now we investigate the question in the title of this entry. An mth-order system of 1st initial value problems has the form

\begin{gathered}   {{y'}_1}\left( t \right) = {f_1}\left( {t,{y_1},...,{y_m}} \right), \hfill \\   {{y'}_2}\left( t \right) =...

for a \leq t \leq b, with the initial conditions

y_1(a)=\alpha_1, y_2(a)=\alpha_2,…,y_m(a)=\alpha_m.

Let t_i be a+i\frac{b-a}{N}. We denote by y_{i,j} the following number

{y_{i,j}} = {y_i}\left( {{t_j}} \right), \quad i = \overline {1,m} , \quad j = \overline {0,N} .

At each step j+1, having all y_{i,j} we can compute y_{i,j+1} for all i=\overline{1,m}. To do this, for each i, we need to compute all K^k, k=\overline{1,4}. More precisely, we need to compute

\begin{gathered}   K_{i,j}^1 = h{f_i}\left( {{t_j},{y_{1,j}},{y_{2,j}},...,{y_{m,j}}} \right), \hfill \\   K_{i,j}^2 = h{f_i}...

As can be seen, at step j and for each k=\overline{1,3}, all the values K_{i,j}^k, with i=\overline{1,m} must be computed before any of the expressions K_{i,j}^{k+1}.

Example

We now consider the following example

\begin{gathered}   u'\left( t \right) = f\left( {t,u,v} \right), \hfill \\   v'\left( t \right) = g\left( {t,u,v} \right). \h...

Therefore, at the step j+1, we can compute u(t_{j+1}) and v(t_{j+1}) by using the following scheme

\begin{gathered}   {K^1} = hf\left( {{t_j},{u_j},{v_j}} \right), \hfill \\   {\overline K ^1} = hg\left( {{t_j},{u_j},{v_j}} ...

Tháng Sáu 17, 2009

Linear Least Square Problem and QR Decomposition

In linear algebra, a QR decomposition (also called a QR factorization) of a matrix is a decomposition of the matrix into an orthogonal and a right triangular matrix. QR decomposition is often used to solve the linear least squares problem, and is the basis for a particular eigenvalue algorithm, the QR algorithm.

Square matrix

A QR decomposition of a real square matrix A is a decomposition of A as A = QR, where Q is an orthogonal matrix (meaning that Q^TQ = I ) and R is an upper triangular matrix (also called right triangular matrix). This generalizes to a complex square matrix A and a unitary matrix Q. If A is nonsingular, then this factorization is unique if we require that the diagonal elements of R are positive.

Rectangular matrix

More generally, we can factor a complex m \times n matrix A, with m > n, as the product of an m \times m unitary matrix Q and an m\times n upper triangular matrix R. As the bottom m-n rows of an m \times n upper triangular matrix consist entirely of zeroes, it is often useful to partition R, or both R and Q:

A = QR = Q \begin{pmatrix} R_1 \\ 0 \end{pmatrix} = \begin{pmatrix} Q_1, Q_2 \end{pmatrix} \begin{pmatrix} R_1 \\ 0 \end{pmat...

where R_1 is an n\times n upper triangular matrix, Q_1 is m\times n, Q_2 is m\times (m-n), and Q_1 and Q_2 both have orthogonal columns.

Golub & Van Loan call Q_1R_1 the thin QR factorization of A. If A is of full rank n and we require that the diagonal elements of R_1 are positive then R_1 and Q_1 are unique, but in general Q_2 is not. R_1 is then equal to the upper triangular factor of the Cholesky decomposition of A^\star A (= A^TA if A is real).

QL, RQ and LQ decompositions

Analogously, we can define QL, RQ, and LQ decompositions, with L being a lower triangular matrix.

How to solve the Linear Least Square problems

Having the QR decomposition, that is, A=QR, one can solve the Least Square Problem Ax=b where A a m \times n matrix, x \in \mathbb R^n and b \in \mathbb R^m with m >n. To be precise, by mean of A=QR, we see that Q is an m \times m matrix and R is an m \times n matrix. Therefore, we can assume both Q and R to be

Q = \left( {\begin{array}{*{20}{c}}    {{{\left[ {{Q_1}} \right]}_{m \times n}}} & {{{\left[ {{Q_2}} \right]}_{m \times \...

Now equation Ax=b is equivalent to Q_1R_1x =b. Thus x=R_1^{-1}Q_1^Tb.

Computing the QR decomposition

There are several methods for actually computing the QR decomposition, such as by means of the Gram–Schmidt process, Householder transformations, or Givens rotations. Each has a number of advantages and disadvantages. In this section, we mainly consider the Gram–Schmidt process. More precisely, consider the Gram–Schmidt process, with the vectors to be considered in the process as the columns of the matrix A=(\mathbf{a}_1| \cdots|\mathbf{a}_n). Assume that after finishing the Gram–Schmidt process, we obtain (\mathbf{e}_1| \cdots|\mathbf{e}_n). Therefore, we put

\begin{matrix}Q\end{matrix} = (\mathbf{e}_1|\cdots |\mathbf{e}_n)

and thus,

R=Q^TA.

Example

Let us consider the case when

A = \left( {\begin{array}{*{20}{c}}    2 & 4 \\    1 & 3  \\    0 & 0 \\    0 & 0  \\   \end{array} } \right)...

Then

\mathbf a_1 = \left( {\begin{array}{*{20}{c}}    2  \\    1  \\    0  \\    0  \\   \end{array} } \right), \quad \mathbf a_2 ...

From \mathbf a_1, we obtain \mathbf e_1 from \mathbf e_1 : = \frac{\mathbf a_1}{\|\mathbf a_1\|}, that is,

\mathbf e_1 = \left( {\begin{array}{*{20}{c}}    {\frac{2} {{\sqrt 5 }}}  \\    {\frac{1} {{\sqrt 5 }}}  \\    0  \\    0  \\...

To obtain \mathbf e_2, we firstly project \mathbf a_2 onto \mathbf e_1, then we subtract this result from \mathbf a_2. More precisely,

{\mathbf e_2} = \frac{\mathbf a_2 - {\rm proj}_{\mathbf e_1}{\mathbf a_2}} {{\left\| {{\mathbf a_2} - {\rm proj}_{\mathbf e_1...

where

{\rm proj}_{\mathbf e_1}{\mathbf a_2} = \frac{{\left\langle {{\mathbf e_1},{\mathbf a_2}} \right\rangle }} {{\left\langle {{\...

which yields

{\mathbf e_2} = \left( {\begin{array}{*{20}{c}}    {\frac{{ - 1}} {{\sqrt 5 }}}  \\    {\frac{2} {{\sqrt 5 }}}  \\    0  \\  ...

Thus

Q = \left( {\begin{array}{*{20}{c}}    {\frac{2} {{\sqrt 5 }}} & { - \frac{1} {{\sqrt 5 }}} & 0 & 0  \\    {\frac...

and

R = {Q^T}A = \left( {\begin{array}{*{20}{c}}    {\frac{2} {{\sqrt 5 }}} & {\frac{1} {{\sqrt 5 }}} & 0 & 0  \\    ...

Finally,

\underbrace {\left( {\begin{array}{*{20}{c}}    2 & 4  \\    1 & 3  \\    0 & 0  \\    0 & 0  \\   \end{array...

In Maple, you can practice as shown below.

QR-Maple

Tháng Sáu 10, 2009

Linear Least Square Problem and Singular Value Decomposition (SVD)

Linear Least Square Problem (linear least squares), or ordinary least squares (OLS), is an important computational problem, that arises primarily in applications when it is desired to fit a linear mathematical model to measurements obtained from experiments. The goals of linear least squares are to extract predictions from the measurements and to reduce the effect of measurement errors. Mathematically, it can be stated as the problem of finding an approximate solution to an overdetermined system of linear equations. In statistics, it corresponds to the maximum likelihood estimate for a linear regression with normally distributed error.

Linear least square problems admit a closed-form solution, in contrast to non-linear least squares problems, which often have to be solved by an iterative procedure.

Motivational example

As a result of an experiment, four (x,y) data points were obtained, (1,6), (2,5), (3,7), and (4,10) (shown in red in the picture on the right). It is desired to find a line y = \beta_1 + \beta_2x that fits “best” these four points. In other words, we would like to find the numbers \beta_1 and \beta_2 that approximately solve the overdetermined linear system

\begin{gathered}   {\beta _1} + 1{\beta _2} = 6, \hfill \\   {\beta _1} + 2{\beta _2} = 5, \hfill \\   {\beta _1} + 3{\beta _...

of four equations in two unknowns in some “best” sense.

File:Linear least squares example2.png

The least squares approach to solving this problem is to try to make as small as possible the sum of squares of “errors” between the right- and left-hand sides of these equations, that is, to find the minimum of the function

S(\beta_1, \beta_2)= \left[6-(\beta_1+1\beta_2)\right]^2 +\left[5-(\beta_1+2\beta_2) \right]^2 +\left[7-(\beta_1 + 3\beta_2)\...

The minimum is determined by calculating the partial derivatives of S(\beta_1,\beta_2) in respect to \beta_1 and \beta_2 and setting them to zero. This results in a system of two equations in two unknowns, called the normal equations, which, when solved, gives the solution

\beta_1 = 3.5, \beta_2 = 1.4

and the equation y = 3.5 + 1.4x of the line of best fit. The residuals, that is, the discrepancies between the y values from the experiment and the y values calculated using the line of best fit are then found to be 1.1, -1.3, -0.7, and 0.9 (see the picture on the right). The minimum value of the sum of squares is

S(3.5,1.4) = 1.12 + (-1.3)2 + (-0.7)2 + 0.92 = 4.2.

The general problem

Consider an overdetermined system (there are more equations than unknowns)

\sum_{j=1}^{n} X_{ij}\beta_j = y_i,\ (i=1, 2, \dots, m),

of m linear equations in n unknown coefficients, \beta_1,\beta_2,...,\beta_n, with m > n, written in matrix form as \mathbf {X} \boldsymbol {\beta} = \mathbf {y},where


\mathbf {X}=\begin{pmatrix} X_{11} & X_{12} & \cdots & X_{1n} \\ X_{21} & X_{22} & \cdots & X_{2n} \\...

Such a system usually has no solution, and the goal is then to find the coefficients \beta which fit the equations “best”, in the sense of solving the quadratic minimization problem

\mathop {\rm argmin}\limits_\beta  \sum\limits_{i = 1}^m {{{\left| {{y_i} - \sum\limits_{j = 1}^n {{X_{ij}}} {\beta _j}} \rig...

A justification for choosing this criterion is given in properties below. This minimization problem has a unique solution, provided that the n columns of the matrix X are linearly independent, given by solving the normal equations

(\mathbf X^\top \mathbf X )\hat \boldsymbol \beta= \mathbf X^\top \mathbf y.

Singular value decomposition

In linear algebra, the singular value decomposition (SVD) is an important factorization of a rectangular real or complex matrix, with many applications in signal processing and statistics. Applications which employ the SVD include computing the pseudoinverse, least squares fitting of data, matrix approximation, and determining the rank, range and null space of a matrix.

Suppose M is an m-by-n matrix whose entries come from the field K, which is either the field of real numbers or the field of complex numbers. Then there exists a factorization of the form

M = U\Sigma V^*,

where U is an m-by-m unitary matrix over K, the matrix \Sigma is m-by-n diagonal matrix with nonnegative real numbers on the diagonal, and V^* denotes the conjugate transpose of V, an n-by-n unitary matrix over K. Such a factorization is called a singular-value decomposition of M. A common convention is to order the diagonal entries \Sigma_{i,i} in non-increasing fashion. In this case, the diagonal matrix \Sigma is uniquely determined by M (though the matrices U and V are not). The diagonal entries of \Sigma are known as the singular values of M.

In practice, matrix U consists of eigenvectors of matrix AA^T while matrix V consists of eigenvectors of matrix A^TA. Finally, matrix \Sigma is the square root of a diagonal matrix forming by eigenvalues of either matrix AA^T or matrix A^TA.

Example

Let us consider the case when

A = \left( {\begin{array}{*{20}{c}}    2 & 4    \\    1 & 3    \\    0 & 0    \\    0 & 0   \\   \end{array} ...

Having A yields

A{A^T} = \left( {\begin{array}{*{20}{c}}    {20} & {14} & 0 & 0  \\    {14} & {10} & 0 & 0  \\    0 &...

By simple calculation, one has

\lambda_1=15+\sqrt{221}, \quad \lambda_2 =15-\sqrt{221}

being eigenvalues of matrix A^TA (in the descending form), which helps us to write down

To construct matrix V, one has to find eigenvectors corresponding to eigenvalues \lambda_1 and \lambda_2. For the first one, that is, \lambda_1, one has the following linear system

\left( {\begin{array}{*{20}{c}}    { - 10 - \sqrt {221} } & {11}  \\    {11} & {10 - \sqrt {221} }  \\   \end{array} ...

which implies, for example,

{x_1} = \frac{{11}} {{\sqrt {442 + 20\sqrt {221} } }}, \quad {x_2} = \frac{{10 + 221}} {{\sqrt {442 + 20\sqrt {221} } }}.

Similar to \lambda_2, one has its corresponding eigenvector as follows

{x_3} = \frac{{11}} {{\sqrt {442 - 20\sqrt {221} } }}, \quad {x_4} = \frac{{10 - 221}} {{\sqrt {442 - 20\sqrt {221} } }}.

Thus

V = \left( {\begin{array}{*{20}{c}}    {\frac{{11}} {{\sqrt {442 + 20\sqrt {221} } }}} & {\frac{{11}} {{\sqrt {442 - 20\s...

The numerical solution of V is

V = \left( {\begin{array}{*{20}{c}}    {{\text{0}}{\text{.4045535848}}} & {{\text{0}}{\text{.9145142955}}}  \\    {{\text...

which gives us

{V^T}V = \left( {\begin{array}{*{20}{c}}    {{\text{1}}{\text{.000000000}}} & {{\text{ - }}{\text{.4e - 9}}}  \\    {{\te...

Having V we can compute U by the following formula U=AV\Sigma^{-1} or you can perform the same routine as above. That is

U = \left( {\begin{array}{*{20}{c}}    {{\text{0}}{\text{.1495732296}}} & {{\text{1}}{\text{.574048236}}} & 0 & 0....

It is easy to see that

U\Sigma {V^T} = \left( {\begin{array}{*{20}{c}}    {{\text{1}}{\text{.999999783}}} & {{\text{4}}{\text{.000000099}}}  \\ ...

Having a SVD result, that is

A = U\left( {\begin{array}{*{20}{c}}    {{\Sigma _1}}  \\    0  \\   \end{array} } \right){V^T}

one can see that the solution of Ax=b can be computed by x = V\left( {\begin{array}{*{20}{c}}    {\Sigma _1^{ - 1}} & 0  \\   \end{array} } \right){U^T}b. Note that matrices U and V are not unique. Using Maple software, you can compute SVD as shown below.

SVD-Maple

Useful links

Tháng Sáu 7, 2009

Perturbation in linear systems

In this small topic, I will show several traditional results regarding to the perturbation in matrix algebra. This comes from the study of the stability of solving system of linear equations in Numerical Analysis.

Let us consider the following matrix-form linear system Ax=b. We split into three cases

  • Perturbation on b

If \delta b is a small perturbation on b, then \delta x is a small perturbation on x in the sense that the following system A(x + \delta x) = b + \delta b holds true. This together with the fact that Ax =b gives A \delta x = \delta b. By some simple calculation, one has

\displaystyle\begin{gathered}\frac{{\left\| {\delta x} \right\|}}{{\left\| x \right\|}} = \frac{{\left\| {{A^{ - 1}}\left( {\delta b} \right)} \right\|}}{{\left\| x \right\|}}\leq\frac{{\left\| {{A^{ - 1}}} \right\|\left\| {\delta b} \right\|}}{{\left\| x \right\|}} \hfill \\ \qquad= \frac{{\left\| {{A^{ - 1}}} \right\|\left\| A \right\|\left\| {\delta b} \right\|}}{{\left\| A \right\|\left\| x \right\|}}\leq\frac{{\left\| {{A^{ - 1}}} \right\|\left\| A \right\|\left\| {\delta b} \right\|}}{{\left\| {Ax} \right\|}} \hfill \\ \qquad= \frac{{\left\| {{A^{ - 1}}} \right\|\left\| A \right\|\left\| {\delta b} \right\|}}{{\left\| b \right\|}}. \hfill \\ \end{gathered}

Thus

\displaystyle\frac{{\left\| {\delta x} \right\|}}{{\left\| x \right\|}}\leq\left\| {{A^{ - 1}}} \right\|\left\| A \right\|\frac{{\left\| {\delta b} \right\|}}{{\left\| b \right\|}}.

  • Perturbation on A.

If \delta A is a small perturbation on A, then \delta x is a small perturbation on x in the sense that the following system (A + \delta A)(x+\delta x) = b holds true. This together with the fact that Ax =b gives \left( {\delta A} \right)x + A\left( {\delta x} \right) + \left( {\delta A} \right)\left( {\delta x} \right) = 0. By some simple calculation, one has

\displaystyle\begin{gathered}\left\| {\delta x} \right\| = \left\| {{A^{ - 1}}\left( {\left( {\delta A} \right)\left( x \right) + \left( {\delta A} \right)\left( {\delta x} \right)} \right)} \right\| \hfill \\ \qquad\leq\left\| {{A^{ - 1}}} \right\|\left\| {\left( {\delta A} \right)\left( x \right) + \left( {\delta A} \right)\left( {\delta x} \right)} \right\| \hfill \\ \qquad\leq \left\| {{A^{ - 1}}} \right\|\left( {\left\| {\delta A} \right\|\left\| x \right\| + \left\| {\delta A} \right\|\left\| {\delta x} \right\|} \right). \hfill \\ \end{gathered}

Thus

\displaystyle\left\| {\delta x} \right\|\left( {1 - \left\| {{A^{ - 1}}} \right\|\left\| {\delta A} \right\|} \right) \leqslant \left\| {{A^{ - 1}}} \right\|\left\| {\delta A} \right\|\left\| x \right\|

which implies that

\displaystyle\frac{{\left\| {\delta x} \right\|}}{{\left\| x \right\|}} \leqslant \frac{{\left\| {{A^{ - 1}}} \right\|\left\| {\delta A} \right\|}}{{1 - \left\| {{A^{ - 1}}} \right\|\left\| {\delta A} \right\|}} = \frac{{\left\| A \right\|\left\| {{A^{ - 1}}} \right\|\frac{{\left\| {\delta A} \right\|}}{{\left\| A \right\|}}}}{{1 - \left\| A \right\|\left\| {{A^{ - 1}}} \right\|\frac{{\left\| {\delta A} \right\|}}{{\left\| A \right\|}}}}.

  • Perturbation on A and b.

If \delta A is a small perturbation on A and \delta b is a small perturbation on b, then \delta x is a small perturbation on x in the sense that the following system (A + \delta A)(x+\delta x) = b + \delta b holds true. This together with the fact that Ax =b gives A\left( {\delta x} \right) + \left( {\delta A} \right)x + \left( {\delta A} \right)\left( {\delta x} \right) = \delta b. By some simple calculation, one has

\displaystyle\begin{gathered}\left\| {\delta x} \right\| = \left\| {{A^{ - 1}}\left( {\delta b - \left( {\delta A} \right)\left( x \right) - \left( {\delta A} \right)\left( {\delta x} \right)} \right)} \right\| \hfill \\ \qquad\leqslant \left\| {{A^{ - 1}}} \right\|\left\| {\delta b - \left( {\delta A} \right)\left( x \right) - \left( {\delta A} \right)\left( {\delta x} \right)} \right\| \hfill \\ \qquad\leqslant \left\| {{A^{ - 1}}} \right\|\left( {\left\| {\delta b} \right\| + \left\| {\delta A} \right\|\left\| x \right\| + \left\| {\delta A} \right\|\left\| {\delta x} \right\|} \right) \hfill \\ \end{gathered}

which implies that

\displaystyle\left\|{\delta x}\right\|\left({1-\left\|{{A^{-1}}}\right\|\left\|{\delta A}\right\|}\right)\leqslant\left\|{{A^{-1}}}\right\|\left({\left\|{\delta b}\right\|+\left\|{\delta A}\right\|\left\| x\right\|}\right).

Thus,

\displaystyle\begin{gathered}\frac{{\left\| {\delta x} \right\|}}{{\left\| x \right\|}} \leqslant \frac{{\left\| {{A^{ - 1}}} \right\|\left( {\frac{{\left\| {\delta b} \right\|}}{{\left\| x \right\|}} + \left\| {\delta A} \right\|} \right)}}{{1 - \left\| {{A^{ - 1}}} \right\|\left\| {\delta A} \right\|}} \hfill \\ \qquad= \frac{{\left\| A \right\|\left\| {{A^{ - 1}}} \right\|\left( {\frac{{\left\| {\delta b} \right\|}}{{\left\| A \right\|\left\| x \right\|}} + \frac{{\left\| {\delta A} \right\|}}{{\left\| A \right\|}}} \right)}}{{1 - \left\| A \right\|\left\| {{A^{ - 1}}} \right\|\frac{{\left\| {\delta A} \right\|}}{{\left\| A \right\|}}}} \hfill \\ \qquad\leqslant \frac{{\left\| A \right\|\left\| {{A^{ - 1}}} \right\|\left( {\frac{{\left\| {\delta b} \right\|}}{{\left\| {Ax} \right\|}} + \frac{{\left\| {\delta A} \right\|}}{{\left\| A \right\|}}} \right)}}{{1 - \left\| A \right\|\left\| {{A^{ - 1}}} \right\|\frac{{\left\| {\delta A} \right\|}}{{\left\| A \right\|}}}} \hfill \\ \qquad\leqslant \frac{{\left\| A \right\|\left\| {{A^{ - 1}}} \right\|\left( {\frac{{\left\| {\delta b} \right\|}}{{\left\| b \right\|}} + \frac{{\left\| {\delta A} \right\|}}{{\left\| A \right\|}}} \right)}}{{1 - \left\| A \right\|\left\| {{A^{ - 1}}} \right\|\frac{{\left\| {\delta A} \right\|}}{{\left\| A \right\|}}}}. \hfill \\ \end{gathered}

In the literature, the expression \|A^{-1}\|\|A\| is called the condition number of the matrix A, denoted by K(A). It is easy to see that K(A) \geq 1. If K(A) is small then the system Ax=b is called to be well-posed. Alternatively, in the case when 1 \ll K(A), the system Ax=b is ill-posed.

Bài viết cũ hơn »

Blog at WordPress.com.