
1. Example problem

GS534: Solving the 1D Poisson equation using finite differences

Peter van Keken, October 14, 2003.

Consider the 1D Poisson equation

   −d²u/dx² = 1                                                   (1a)

on Ω = [0, 1] with boundary conditions

   u(0) = 0   and   u′(1) = du/dx(1) = 0                          (1b)

which has the analytical solution

   u = x − ½x²                                                    (2)

2. Finite differences

We will use this specific example to investigate various approaches to solving partial differential equations with finite differences, in which we discretize the domain by defining N equally spaced points

   x_i = (i − 1)∆x                                                (3)

where ∆x = 1/(N − 1). The solution u at each point is given by the array u_i = u(x_i). At each point we can approximate the spatial derivatives of u by

   du/dx ≈ (u_i − u_{i−1})/∆x + O(∆x)                             (4)

or

   du/dx ≈ (u_{i+1} − u_i)/∆x + O(∆x)                             (5)

or

   du/dx ≈ (u_{i+1} − u_{i−1})/(2∆x) + O((∆x)²)                   (6)

(you can verify these approximations easily by a Taylor series expansion of u(x + ∆x) around u(x)). The second derivative is approximated by

   d²u/dx² ≈ (u_{i−1} − 2u_i + u_{i+1})/(∆x)² + O((∆x)²)          (7)

This leads to the discretization of (1a) as

   −(u_{i+1} − 2u_i + u_{i−1})/(∆x)² = 1                          (8)

so that at each internal point x_i

   u_i = ½ [u_{i−1} + u_{i+1} + (∆x)²]                            (9)



The boundary conditions require special care. For x = 0 we have a Dirichlet boundary condition, which allows us to fix the value u_1 = 0. For x = 1 we have a Neumann boundary condition du/dx = 0. This is a symmetry boundary condition, so in this case we can imagine a 'ghost' point u_{N+1} which is always equal to u_{N−1}. This leads to the expression for point x_N:

   u_N = u_{N−1} + ½ (∆x)²                                        (10)
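To make the setup concrete, here is a minimal free-form Fortran sketch (not part of the original notes; the program name, N = 11 and the output format are illustrative assumptions) that builds the grid of (3) and evaluates the analytical solution (2):

   ! Sketch: equidistant grid (3) and analytical solution (2). Illustrative only.
   program grid_setup
     implicit none
     integer, parameter :: n = 11           ! number of grid points
     double precision   :: dx, x(n), uex(n)
     integer            :: i

     dx = 1.0d0 / dble(n - 1)               ! grid spacing, dx = 1/(N-1)
     do i = 1, n
        x(i)   = dble(i - 1) * dx           ! x_i = (i-1)*dx
        uex(i) = x(i) - 0.5d0 * x(i)**2     ! analytical solution u = x - x^2/2
     end do
     print '(2f12.6)', (x(i), uex(i), i = 1, n)
   end program grid_setup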

3. Iterative methods

3.1. Jacobi and Gauss-Seidel

A simple approach to solving the differential equation is to start with a 'guess' and to use the stencil (9) on the internal points, taking care of the boundary conditions, to iteratively improve the estimated solution. The algorithm is as follows:

   initialize a vector u
   until converged
      copy u to a backup vector u_old
      update u by applying (9) to the internal points, using u_old in the right-hand side
      update boundary condition values

We can modify this 'strict' Jacobi method by not using a duplicate vector u_old but rather working on u directly (this is the Gauss-Seidel iteration). This saves memory and has the advantage that you use updated values whenever possible. Note that it may still be necessary to keep a duplicate vector around for checking convergence.

A second modification is reached by recognizing that the stencil only uses nearest neighbors. In order to optimally update the values in the internal points it makes sense to first do all even points, followed by all odd points; during this last step you use the updated values. This is called Red-Black Gauss-Seidel, where the 'Red-Black' term comes from imagining that each odd point is red and each even point is black. This concept is easily extended to 2D and 3D. The algorithm is then modified to:

   initialize a vector u
   until converged
      update u by applying (9) to the odd internal points
      update u by applying (9) to the even internal points
      update boundary condition values

where you need to make sure that the boundary conditions are treated at the appropriate step in the algorithm.
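As an illustration, a minimal free-form Fortran sketch of this Red-Black Gauss-Seidel iteration for our problem (not from the original notes; the program name, N = 21 and the fixed number of sweeps are illustrative assumptions):

   ! Sketch of Red-Black Gauss-Seidel for -u'' = 1, u(0) = 0, u'(1) = 0. Illustrative only.
   program rbgs
     implicit none
     integer, parameter :: n = 21, nsweep = 5000
     double precision   :: u(n), dx
     integer            :: i, k

     dx = 1.0d0 / dble(n - 1)
     u  = 1.0d0                         ! initial estimate u = 1
     u(1) = 0.0d0                       ! Dirichlet condition u(0) = 0
     do k = 1, nsweep
        do i = 3, n - 1, 2              ! odd ('red') internal points, stencil (9)
           u(i) = 0.5d0 * (u(i-1) + u(i+1) + dx*dx)
        end do
        do i = 2, n - 1, 2              ! even ('black') internal points
           u(i) = 0.5d0 * (u(i-1) + u(i+1) + dx*dx)
        end do
        u(n) = u(n-1) + 0.5d0*dx*dx     ! Neumann condition via the ghost point, eq. (10)
     end do
     print '(f12.6)', u(n)              ! analytical value at x = 1 is 0.5
   end program rbgs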

3.2. Iterative methods: successive overrelaxation

The methods above are stable and easy to implement but converge very slowly except for very small problems (N small). In order to speed up the convergence we can use overrelaxation, in which you make an overcorrection at each step of the Gauss-Seidel iteration. In our case this reads:

   u_i′ = ½ [u_{i−1} + u_{i+1} + (∆x)²]                           (11a)

   u_i := ω u_i′ + (1 − ω) u_i                                    (11b)

where I've used := to indicate that the left-hand side takes on the value of the right-hand side (rather than mathematical equality). ω is the relaxation parameter. When 1 < ω < 2 you use overrelaxation, but you can also choose 0 < ω < 1 for underrelaxation (which is sometimes handy when you have strongly nonlinear problems). It can be shown that this algorithm only converges for 0 < ω < 2 and that you can try to find an optimal choice for ω. See Chapter 19.5 of Press et al. (1992) for more details.
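A minimal free-form Fortran sketch of the corresponding SOR sweep (not from the original notes; the value of ω, the names and the fixed sweep count are illustrative assumptions):

   ! Sketch of SOR for -u'' = 1 on [0,1]. Illustrative only.
   program sor_poisson
     implicit none
     integer, parameter :: n = 21, nsweep = 1000
     double precision, parameter :: omega = 1.9d0
     double precision   :: u(n), unew, dx
     integer            :: i, k

     dx = 1.0d0 / dble(n - 1)
     u  = 1.0d0                                      ! initial estimate u = 1
     u(1) = 0.0d0                                    ! Dirichlet condition
     do k = 1, nsweep
        do i = 2, n - 1
           unew = 0.5d0 * (u(i-1) + u(i+1) + dx*dx)  ! Gauss-Seidel value, eq. (11a)
           u(i) = omega*unew + (1.0d0 - omega)*u(i)  ! overrelaxed update, eq. (11b)
        end do
        u(n) = u(n-1) + 0.5d0*dx*dx                  ! Neumann condition, eq. (10)
     end do
     print '(f12.6)', u(n)                           ! analytical value at x = 1 is 0.5
   end program sor_poisson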

An evaluation of these algorithms for our equation (1) shows the increase in convergence speed for SOR compared to Gauss-Seidel. The number of iterations is shown for convergence of the rms error to less than 10⁻⁶ compared to the analytical solution. All cases were started with an initial estimate u = 1.

   Number of iterations

    N       GS    ω = 1.5   ω = 1.9   ω = 2.0
   11      533        207        62        10
   21     2136        776       190        20
   41     8552       2983       621        40

Note that we're lucky that we find convergence (and pretty good convergence at that) for ω = 2! This is pretty much guaranteed not to work for any real applications.

3.3. Iterative methods: multigrid

The slow convergence of Jacobi can be understood if you consider how information is transferred between nodal points by the stencil. It is clear from the algorithm that it takes the stencil N steps to go from one end of the domain to the other. During the first iteration each nodal point makes itself known to its neighbors, but it takes N iterations to let the point at one end of the domain know about the existence of the point at the other end of the domain. Updates to the values at each point require further iteration, and it may be intuitive (and can be formally shown) that it takes N × N iterations to fully converge. The SOR approach allows a speed-up, but it is logical that it cannot be faster than N iterations. This means that the computational cost of these iterative methods is O(N²) to O(N³) flops (= floating point operations), since each iteration takes O(N) flops.

The main limiting factor is the width of the stencil. The three-point stencil is efficient at transferring short-wavelength information, but very inefficient for wavelengths that are longer than a few grid points. It would therefore make sense to try to transfer long-wavelength information with a stencil that has greater 'reach'. This can be accomplished by subsampling the grid by taking every other gridpoint, which halves the number of iterations that the stencil needs to go from one end of the domain to the other. At this once-coarser grid you don't resolve the shortest wavelengths very accurately, but you could improve on that by going back to the fine grid once you've finished on the coarser grid. Of course, you could see the iteration on the coarse grid in a similar light, where it is beneficial to subsample the grid points to improve the convergence of even longer wavelengths. This is the main philosophy behind multigrid, where you use multiple layers of subsampled grids (for an example see Figure 1).

[Figure 1. Geometry for 1D multigrid, showing the finest level at the top, the coarsest level at the bottom, and the smoothing, restriction and interpolation steps between levels.]

The main trick in multigrid is to compute corrections to the solution at fine grids using coarser grids. Let's write the equation (1a) in more general form to develop an appreciation for the method:

   Lu = f                                                         (12)

where L is a linear operator, u is the unknown and f is the known right-hand-side vector. If we have an approximate solution we can define the residual

   r = Lu − f                                                     (13)

If u is the correct solution the residual is zero. We can try to minimize the residual by computing the error v in u. Since L is linear, the error satisfies

   Lv = −r                                                        (14)

so that when we know the residual we can solve (14) to find the correction. The main trick in multigrid is that (14) is computed on a coarser grid and the coarse-grid solution is interpolated to the finer grid before adding it to the approximate fine-grid solution. The two-grid algorithm can be written as

   compute approximate solution on fine grid using (12)
   compute residual on fine grid using (13)
   restrict residual to coarse grid
   solve for error on coarse grid using (14)
   interpolate and add error to fine grid solution

Of course, if this works for going from one level to the next-coarser one, it will work for the next coarser level also. This means that we can apply the coarse-grid error correction recursively. Note that (14) and (12) have identical form, so we can define the right-hand side f of each coarse-grid equation to be equal to the negative residual −r and define the solution u at each coarser level to be the error v for the next finer level.
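For concreteness, a small free-form Fortran sketch of the residual computation (13) for the discrete operator (8), with the ghost-point treatment of the Neumann boundary; the routine name and interface are my own illustrative choices, not from the notes:

   ! Sketch: residual r = Lu - f for the discrete 1D operator (8). Illustrative only.
   subroutine residual(n, dx, u, f, r)
     implicit none
     integer, intent(in)           :: n
     double precision, intent(in)  :: dx, u(n), f(n)
     double precision, intent(out) :: r(n)
     integer :: i
     r(1) = 0.0d0                        ! Dirichlet point: value is prescribed
     do i = 2, n - 1
        r(i) = -(u(i+1) - 2.0d0*u(i) + u(i-1))/dx**2 - f(i)
     end do
     r(n) = -(2.0d0*u(n-1) - 2.0d0*u(n))/dx**2 - f(n)   ! ghost point u(n+1) = u(n-1)
   end subroutine residual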

Assume that we want to solve (12) on a grid with 2^N + 1 grid points. This is the finest level l = N. We can define coarse levels l = N − 1, N − 2, ..., 1 by taking out each even point from the next finer level (Figure 1). At the coarsest level we have three grid points left. At each level we define a solution u_l, a right-hand side f_l, a residual r_l and an error v_l.

   For each level l = N, N − 1, ..., 1
      relax (12) at level l to find u_l
      compute the negative residual on level l: −r_l
      restrict the negative residual to the next coarser level and store it in f_{l−1}



   For each level l = 1, 2, ..., N − 1
      relax (12) at level l to improve the estimate of u_l
      interpolate u_l to the next finer level to find the error v_{l+1}
      add the error to the solution: u_{l+1} := u_{l+1} + v_{l+1}

A graph of this algorithm, made with the finest level on top and the sequence of operations progressing from left to right, has the form of a "V", and the common name of the above algorithm is the multigrid V-cycle. A special form of multigrid starts by solving (12) on the coarsest level, interpolating to the next level up, performing a V-cycle, interpolating to the next level, performing a V-cycle, etc., until the finest level is reached. This is called full multigrid.

In order to develop the multigrid iteration one needs to define how one goes from a grid level down to the next coarser level ('restriction'), how one goes back up to finer levels ('interpolation') and how the iteration takes place at each level ('solution' or 'smoothing'). For an introduction to the use of multigrid for the Poisson equation see, e.g., www.cs.berkeley.edu/~demmel/cs267/lecture25/lecture25.html or sccm.stanford.edu/~livne/36. See also "An Introduction to Multigrid Methods" by Wesseling (Wiley and Sons, 1992), which is available online at www.mgnet.org/mgnet-books-wesseling.html.

Multigrid methods employ various choices for the restriction, interpolation and smoothing operators. For example, restriction can be done by injection (just taking the corresponding values) or by averaging three neighboring grid points on the fine mesh to yield the value in the corresponding grid point on the coarser grid. Interpolation is easily done by linear interpolation for points on the finer grid that are not represented in the coarse grid. Smoothing can be done by Gauss-Seidel. Interestingly, SOR is a bad choice for smoothing in multigrid.
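A minimal free-form Fortran sketch of these two transfer operators for 1D grids with nf = 2*nc − 1 fine points and nc coarse points (the three-point averaging is often called full weighting); the routine names and interfaces are my own illustrative choices, not from the notes:

   ! Sketch of restriction and linear interpolation for 1D multigrid. Illustrative only.
   subroutine restrict(nf, rf, nc, rc)
     implicit none
     integer, intent(in)           :: nf, nc       ! nf = 2*nc - 1
     double precision, intent(in)  :: rf(nf)
     double precision, intent(out) :: rc(nc)
     integer :: i
     rc(1)  = rf(1)
     rc(nc) = rf(nf)
     do i = 2, nc - 1                               ! average three fine-grid neighbours
        rc(i) = 0.25d0*rf(2*i-2) + 0.5d0*rf(2*i-1) + 0.25d0*rf(2*i)
     end do
   end subroutine restrict

   subroutine interpolate(nc, vc, nf, vf)
     implicit none
     integer, intent(in)           :: nc, nf       ! nf = 2*nc - 1
     double precision, intent(in)  :: vc(nc)
     double precision, intent(out) :: vf(nf)
     integer :: i
     do i = 1, nc
        vf(2*i-1) = vc(i)                           ! coincident points: copy
     end do
     do i = 1, nc - 1
        vf(2*i) = 0.5d0*(vc(i) + vc(i+1))           ! in-between points: linear interpolation
     end do
   end subroutine interpolate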

Boundary conditions require special care in multigrid methods. In general it is practical to rewrite the problem such that you have homogeneous essential boundary conditions (the solution is zero at the boundaries).

4. Direct methods

Note that we can use the stencil (8) to set up a system of equations that links the N unknowns through an N × N matrix-vector system

   Au = f                                                         (15)

In the case of a discretization with N = 5 the matrix and vectors become

       |  2  -1   0   0   0 |
       | -1   2  -1   0   0 |
   A = |  0  -1   2  -1   0 |                                     (16a)
       |  0   0  -1   2  -1 |
       |  0   0   0  -1   1 |

   u = [u_1, u_2, u_3, u_4, u_5]^T                                (16b)

   f = [(∆x)², (∆x)², (∆x)², (∆x)², ½(∆x)²]^T                     (16c)

Note that in this case N is the number of unknowns, so it doesn't include the first nodal point (for which the value is prescribed). N is therefore the number of nodal points minus one.

4.1. Solution of the discrete system: Gaussian elimination

The matrix-vector system can be solved using various methods. A brute-force method is to compute the inverse A⁻¹ of the matrix and find the solution by matrix-vector multiplication

   u = A⁻¹ f                                                      (17)

It is generally quite expensive to compute the exact inverse of a matrix. One exception is the inversion of a diagonal matrix, which is trivial. A more efficient method to solve (15) is to use Gaussian elimination, which generally requires O(N³) operations for a full matrix but is significantly more efficient for the sparse matrices that occur in finite difference methods. A general approach is to decompose the matrix A into an upper and a lower triangular system, A = LU, which can be solved efficiently by substitution. The efficiency and memory requirements of Gaussian elimination are much improved for special forms of the matrix A. For example, when A is symmetric it can be stored in approximately half the amount of memory and a more efficient LDL^T decomposition can be used, where D is a diagonal matrix. If the matrix is symmetric and positive definite one can compute the even more efficient Cholesky decomposition LL^T (see, e.g., Golub and Van Loan, 1989, or Chapter 2 of Press et al., 1992).

4.2. Matrix storage

An advantage of matrices arising from the discretization of differential equations by finite differences or finite elements is that they are generally sparse, with non-zero coefficients only within a certain band surrounding the diagonal. For the matrix (16a) we have bandwidth one and, because of symmetry, we can store the matrix by just the (main) diagonal ([2, 2, ..., 2, 1]) and the diagonal line of coefficients right above it ([−1, −1, ..., −1]). The storage requirement is then 2N, which compares quite favorably with N² for the full matrix. The algorithm for Gaussian elimination for a full matrix has to be modified to make use of the different storage, but the coefficients outside the band are not affected in any way. Coefficients inside the band may be zero after discretization, but they need to be stored explicitly since they may become non-zero ('the band gets filled') during the matrix decomposition. This is not relevant for the matrix (16a) but becomes important for matrices that result from 2D or 3D discretizations. For example, the matrix that results from the discretization of the 2D Poisson equation is depicted in Figure 2b. This follows from a 2D extension of the stencil (9) on an equidistant grid with 5 grid points along the horizontal axis (verify). The banded matrices resulting from the discretization in Figure 2 have bandwidth B = 1 (1D) or B = 5 (2D example), where the bandwidth is defined as that value B for which all coefficients a_ij are zero if j > i + B. For non-symmetric matrices we can make the distinction between upper and lower bandwidth. In general it is more efficient to minimize the bandwidth. If, for example, one has a 2D grid of 10x5 nodal points it is best to number the nodal points in the vertical direction first. That will give a bandwidth of 5, compared to a bandwidth of 10 for numbering in the horizontal direction first (verify).



[Figure 2. a) General form of the matrix after discretization of the 1D Poisson equation on an equidistant grid (tridiagonal, with diagonal entries 2, off-diagonal entries -1, and a final diagonal entry 1). b) The same for the 2D Poisson equation (diagonal entries 4, off-diagonal entries -1).]

For matrices originating from finite elements it is common to have a strongly variable number of non-zero entries in each row. This leads to a rather jagged outline if one traces the outermost non-zero element (Figure 3). In these cases it is more efficient to use the profile storage method (also called the skyline method, for obvious reasons).
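To check the bandwidth claim for the 10x5 grid, here is a small free-form Fortran sketch (my own illustration; the node-numbering functions are assumptions, not from the notes) that computes the bandwidth of the 2D 5-point stencil for both numbering directions:

   ! Sketch: bandwidth of the 5-point stencil on a 10 x 5 grid for two numberings. Illustrative only.
   program bandwidth_demo
     implicit none
     integer, parameter :: nx = 10, ny = 5
     integer :: i, j, b_vert, b_horz

     b_vert = 0; b_horz = 0
     do j = 1, ny
        do i = 1, nx
           if (i < nx) then   ! neighbour in the x-direction
              b_vert = max(b_vert, abs(numv(i+1,j) - numv(i,j)))
              b_horz = max(b_horz, abs(numh(i+1,j) - numh(i,j)))
           end if
           if (j < ny) then   ! neighbour in the y-direction
              b_vert = max(b_vert, abs(numv(i,j+1) - numv(i,j)))
              b_horz = max(b_horz, abs(numh(i,j+1) - numh(i,j)))
           end if
        end do
     end do
     print *, 'bandwidth, numbering vertically first:   ', b_vert   ! prints 5
     print *, 'bandwidth, numbering horizontally first: ', b_horz   ! prints 10

   contains
     integer function numv(i, j)   ! number nodes down each column first
       integer, intent(in) :: i, j
       numv = (i - 1)*ny + j
     end function numv
     integer function numh(i, j)   ! number nodes along each row first
       integer, intent(in) :: i, j
       numh = (j - 1)*nx + i
     end function numh
   end program bandwidth_demo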

4.3. Solution of band matrices: Gaussian elimination

A general algorithm for the solution of a matrix-vector equation uses LU decomposition, where the matrix A is decomposed into a lower triangular matrix L and an upper triangular matrix U. The matrix-vector equation can then be written as

   Au = LUu = f                                                   (18)

which can be solved by first finding the vector v such that

   Lv = f                                                         (19a)

followed by finding the solution u from

   Uu = v                                                         (19b)

[Figure 3. Illustration of band (a) and profile/skyline matrices.]

Once the decomposition is known it is easy to find v by solving (19a) with forward substitution:

   v_1 = f_1 / L_11 ,    v_i = [ f_i − Σ_{j=1}^{i−1} L_ij v_j ] / L_ii ,    i = 2, 3, ..., N            (20a)

and solving (19b) for u by backsubstitution:

   u_N = v_N / U_NN ,    u_i = [ v_i − Σ_{j=i+1}^{N} U_ij u_j ] / U_ii ,    i = N − 1, N − 2, ..., 1     (20b)

Example:

Consider the matrix-vector equation

   | 6   3 | | u_1 |   |   3 |
   | 2  13 | | u_2 | = | −11 |

We can show that the LU decomposition can be written as

   | 6   3 |   | 3  0 | | 2  1 |
   | 2  13 | = | 1  3 | | 0  4 |

so that, using the notation in (18) and (19),

   v = [1, −4]^T

and

   u = [1, −1]^T

(verify). The decomposition into LU follows from Crout's algorithm, which can replace the matrix A in memory with the upper and lower triangular matrices U and L (e.g., Press et al., 1992).
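As a check on this example, a small free-form Fortran sketch of forward and back substitution (20a)/(20b) applied to the 2x2 system above (my own illustration; names are assumptions):

   ! Sketch: forward/back substitution (20a)/(20b) for the 2x2 example. Illustrative only.
   program lu_example
     implicit none
     integer, parameter :: n = 2
     double precision :: L(n,n), U(n,n), f(n), v(n), u(n), s
     integer :: i, j

     L = reshape([3.d0, 1.d0, 0.d0, 3.d0], [n, n])   ! column-major: L = [3 0; 1 3]
     U = reshape([2.d0, 0.d0, 1.d0, 4.d0], [n, n])   ! U = [2 1; 0 4]
     f = [3.d0, -11.d0]

     do i = 1, n                                     ! forward substitution, eq. (20a)
        s = f(i)
        do j = 1, i - 1
           s = s - L(i,j)*v(j)
        end do
        v(i) = s / L(i,i)
     end do
     do i = n, 1, -1                                 ! back substitution, eq. (20b)
        s = v(i)
        do j = i + 1, n
           s = s - U(i,j)*u(j)
        end do
        u(i) = s / U(i,i)
     end do
     print *, 'v =', v                               ! expect  1.0  -4.0
     print *, 'u =', u                               ! expect  1.0  -1.0
   end program lu_example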

4.4. Practical implementation of routines for Gaussian elimination

The algorithms for Gaussian elimination are implemented in a variety of standard libraries and packages, and it is generally not necessary to write user programs for these algorithms. A particular class of subroutines for the solution of problems arising in linear algebra is LINPACK (www.netlib.org/LINPACK), which was developed in the 70s and 80s and is now largely superseded by LAPACK (www.netlib.org/LAPACK), but the name linpack still exists as a 'generic' term. Most algorithms make extensive use of vector algebra, including the inner (dot) product and the outer product. These are generally available to programmers through calls to the BLAS (Basic Linear Algebra Subprograms). Many finite difference and finite element codes spend most of their computational time in these BLAS routines (e.g., for one specific finite element code for mantle convection it was found that it spent nearly 80% of its time in the DDOT routine for computing the dot product of vectors in double precision). It is generally worthwhile to investigate whether optimized BLAS routines exist for your system. Locally we have had good luck with the ATLAS optimized BLAS routines for Linux PCs (math-atlas.sourceforge.net). In some cases the BLAS developed by Kazushige Goto for Pentium and AMD processors (www.cs.utexas.edu/users/flame/goto) worked quite well too. In general you will have to search for the right routines that solve your specific form of matrix (e.g., complex, single precision real, double precision real, band, profile, symmetric or nonsymmetric) and you may have to interface your program to generate the matrix/vector system in a layout that the subroutines will take.
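For instance, the tridiagonal system (15)-(16) could be handed to the LAPACK driver routine DGTSV. The free-form sketch below is my own illustration, not part of the original notes; it assumes LAPACK is available at link time (e.g., -llapack):

   ! Sketch: solving the tridiagonal Poisson system (15)-(16) with LAPACK's DGTSV. Illustrative only.
   program poisson_lapack
     implicit none
     integer, parameter :: n = 5                   ! number of unknowns, as in (16)
     double precision :: dl(n-1), d(n), du(n-1), b(n), dx
     integer :: info

     dx = 1.0d0 / dble(n)                          ! N unknowns exclude the Dirichlet node
     dl = -1.0d0                                   ! sub-diagonal
     du = -1.0d0                                   ! super-diagonal
     d  =  2.0d0                                   ! main diagonal
     d(n) = 1.0d0                                  ! last row from the Neumann condition
     b  = dx*dx                                    ! right-hand side (16c)
     b(n) = 0.5d0*dx*dx

     call dgtsv(n, 1, dl, d, du, b, n, info)       ! solution overwrites b
     if (info /= 0) stop 'dgtsv failed'
     print '(f12.6)', b                            ! agrees with u = x - x^2/2 at the nodes
   end program poisson_lapack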

If you are interested in more 'quick and dirty' implementations it is worthwhile to check out the routines from Numerical Recipes, which provide good quality but not necessarily optimized (or optimizable) subroutines for Gaussian elimination.

4.5. Back to our 1D problem: solution of tridiagonal system

The matrix (16a) is symmetric and tridiagonal, which makes the solution by Gaussian elimination particularly efficient. An algorithm for the decomposition and forward/back substitution is given in Press et al. (1992), assuming that the main diagonal is stored in vector b, the top diagonal is stored in c and the bottom diagonal is stored in a. Each vector is of length n. The index i indicates the row number (which means that the first non-zero component of vector a is at index 2). The right-hand side vector of the equations (15) is stored in r; the solution is returned in u. I've modified the subroutine a bit from Numerical Recipes to include a temporary workspace (gam) for the decomposition and to use double precision arithmetic (which is generally needed to avoid roundoff errors in big systems).

      subroutine tridag(a,b,c,r,u,n,gam)
      implicit none
      integer n
      double precision a(n),b(n),c(n),r(n),u(n),gam(n)
      integer j
      double precision bet
c
c     a: bottom diagonal (first used element is a(2)), b: main diagonal,
c     c: top diagonal, r: right-hand side, u: solution, gam: workspace
c
      bet = b(1)
      if (bet.eq.0.0d0) then
         write(6,*) "tridag fails: small pivot element"
         stop
      endif
      u(1) = r(1)/bet
c
c     *** decomposition and forward substitution
c
      do j=2,n
         gam(j) = c(j-1)/bet
         bet = b(j) - a(j)*gam(j)
         if (bet.eq.0.0d0) then
            write(6,*) "tridag fails: small pivot element"
            stop
         endif
         u(j) = ( r(j) - a(j)*u(j-1) )/bet
      enddo
c
c     *** back substitution
c
      do j=n-1,1,-1
         u(j) = u(j) - gam(j+1)*u(j+1)
      enddo
      return
      end
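A possible driver for this routine, setting up the arrays for the system (16a)-(16c), might look as follows (my own free-form sketch, compiled together with the tridag routine above; n and the printout are illustrative):

   ! Sketch: build the tridiagonal system (16a)-(16c) and call tridag. Illustrative only.
   program solve_poisson
     implicit none
     integer, parameter :: n = 5                 ! number of unknowns (nodal points minus one)
     double precision :: a(n), b(n), c(n), r(n), u(n), gam(n), dx
     integer :: i

     dx = 1.0d0 / dble(n)
     a = -1.0d0                                  ! bottom diagonal (a(1) is not used)
     c = -1.0d0                                  ! top diagonal (c(n) is not used)
     b =  2.0d0                                  ! main diagonal
     b(n) = 1.0d0                                ! Neumann row, cf. (16a)
     r = dx*dx                                   ! right-hand side (16c)
     r(n) = 0.5d0*dx*dx

     call tridag(a, b, c, r, u, n, gam)
     do i = 1, n
        print '(2f12.6)', dble(i)*dx, u(i)       ! compare with u = x - x^2/2
     end do
   end program solve_poisson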

References

Golub, G.H., and C.F. Van Loan, Matrix Computations, 2nd edition, Johns Hopkins University Press, 1989.

Press, W.H., S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes: The Art of Scientific Computing, 2nd edition, Cambridge University Press, 1992.

Wesseling, P., An Introduction to Multigrid Methods, Wiley and Sons, 1992.
