GS534: Solving the 1D Poisson equation using finite differences
Peter van Keken, October 14, 2003.

1. Example problem
Consider the 1D Poisson equation

$$ -\frac{d^2u}{dx^2} = 1 \qquad (1a) $$

on $\Omega = [0, 1]$ with boundary conditions

$$ u(0) = 0 \quad \text{and} \quad u'(1) = \frac{du}{dx}(1) = 0 \qquad (1b) $$

which has analytical solution

$$ u = x - \tfrac{1}{2}x^2 \qquad (2) $$
2. Finite differences
We will use this specific example to investigate various approaches to solving partial differential equations with finite differences, in which we discretize the domain by defining N equally spaced points

$$ x_i = (i-1)\,\Delta x, \qquad i = 1, 2, \ldots, N \qquad (3) $$

where $\Delta x = 1/(N-1)$. The solution $u$ at each point is given by the array $u_i = u(x_i)$. At each point we can approximate the spatial derivatives of $u$ as

$$ \frac{du}{dx} = \frac{u_i - u_{i-1}}{\Delta x} + O(\Delta x) \qquad (4) $$

or

$$ \frac{du}{dx} = \frac{u_{i+1} - u_i}{\Delta x} + O(\Delta x) \qquad (5) $$

or

$$ \frac{du}{dx} = \frac{u_{i+1} - u_{i-1}}{2\,\Delta x} + O\big((\Delta x)^2\big) \qquad (6) $$

(you can verify these approximations easily by Taylor series expansions of $u(x \pm \Delta x)$ around $x$: the one-sided differences (4) and (5) are first-order accurate, while the central difference (6) is second-order accurate because the second-derivative terms cancel). The second derivative is approximated by

$$ \frac{d^2u}{dx^2} = \frac{u_{i-1} - 2u_i + u_{i+1}}{(\Delta x)^2} + O\big((\Delta x)^2\big) \qquad (7) $$

This leads to the discretization of (1a) by

$$ -\frac{u_{i-1} - 2u_i + u_{i+1}}{(\Delta x)^2} = 1 \qquad (8) $$
so that at each internal point $x_i$

$$ u_i = \tfrac{1}{2}\left[u_{i-1} + u_{i+1} + (\Delta x)^2\right] \qquad (9) $$
The boundary conditions require special care. For $x = 0$ we have a Dirichlet boundary condition, which allows us to fix the value $u_1 = 0$. For $x = 1$ we have a Neumann boundary condition $du/dx = 0$. This is a symmetry boundary condition, so in this case we can imagine a 'ghost' point $u_{N+1}$ which is always equal to $u_{N-1}$. Substituting $u_{N+1} = u_{N-1}$ into (9) for $i = N$ leads to the expression for point $x_N$:

$$ u_N = u_{N-1} + \tfrac{1}{2}(\Delta x)^2 \qquad (10) $$
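The orders of accuracy quoted in (4) to (7) can be checked numerically. The sketch below is an illustration, not part of the original notes: it applies the four difference formulas to the assumed test function $u = \sin x$ at the assumed point $x = 0.3$, halves the step size, and estimates the order $p$ in error $\approx C h^p$ from $\log_2$ of the error ratio. The function names are hypothetical.

```python
import math


def difference_errors(h, x=0.3):
    """Absolute errors of the approximations (4)-(7) for the test function u = sin(x)."""
    u = math.sin
    exact_first = math.cos(x)     # du/dx of sin(x)
    exact_second = -math.sin(x)   # d2u/dx2 of sin(x)
    backward = (u(x) - u(x - h)) / h                      # (4)
    forward = (u(x + h) - u(x)) / h                       # (5)
    central = (u(x + h) - u(x - h)) / (2.0 * h)           # (6)
    second = (u(x - h) - 2.0 * u(x) + u(x + h)) / h**2    # (7)
    return {
        "backward": abs(backward - exact_first),
        "forward": abs(forward - exact_first),
        "central": abs(central - exact_first),
        "second": abs(second - exact_second),
    }


def observed_orders(h=1e-2):
    """Estimate p in error ~ C * h**p by comparing step sizes h and h/2."""
    coarse = difference_errors(h)
    fine = difference_errors(h / 2.0)
    return {name: math.log2(coarse[name] / fine[name]) for name in coarse}


if __name__ == "__main__":
    for name, p in observed_orders().items():
        print(f"{name:>8}: observed order {p:.2f}")
```

Expect values close to 1 for the one-sided differences and close to 2 for the central first and second differences. Much smaller step sizes would eventually be spoiled by round-off in the subtraction of nearly equal values.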
3. Iterative methods
3.1. Jacobi and Gauss-Seidel
A simple approach to solving the differential equation is to start with a 'guess' and to improve the estimated solution iteratively by applying the stencil (9) to the internal points and taking care of the boundary conditions. The algorithm is as follows:

    initialize a vector u
    until converged
        copy u to backup vector u_old
        update u by applying (9) to the internal points using u_old in the right-hand side
        update boundary condition values
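A minimal Python sketch of the strict Jacobi iteration above for problem (1), using the stencil (9) for the internal points and (10) for the Neumann point. It assumes 0-based array indices (so `u[0]` is the Dirichlet point at $x = 0$), the initial estimate $u = 1$ used for the iteration counts in Section 3.2, and convergence measured by the rms error against the analytical solution (2). The function names are illustrative.

```python
import math


def exact(x):
    """Analytical solution (2)."""
    return x - 0.5 * x * x


def rms_error(u, dx):
    """Root-mean-square difference between u and the analytical solution."""
    return math.sqrt(sum((ui - exact(i * dx)) ** 2 for i, ui in enumerate(u)) / len(u))


def jacobi(n, tol=1e-6, max_iter=100_000):
    """Strict Jacobi iteration on n equally spaced points; returns (u, iterations)."""
    dx = 1.0 / (n - 1)
    u = [1.0] * n
    u[0] = 0.0                                  # Dirichlet condition u(0) = 0
    for iteration in range(1, max_iter + 1):
        u_old = u[:]                            # backup vector for the right-hand side
        for i in range(1, n - 1):
            u[i] = 0.5 * (u_old[i - 1] + u_old[i + 1] + dx * dx)   # stencil (9)
        u[n - 1] = u_old[n - 2] + 0.5 * dx * dx                    # Neumann point, (10)
        if rms_error(u, dx) < tol:
            return u, iteration
    raise RuntimeError("Jacobi iteration did not converge")


if __name__ == "__main__":
    u, iterations = jacobi(11)
    print(f"converged after {iterations} iterations")
```

Because the discrete equations (9) and (10) are satisfied exactly by the quadratic solution (2), the rms error can be driven to any tolerance above round-off.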
We can modify this 'strict' Jacobi iteration by not using a duplicate vector u_old but rather working directly on u. This saves memory and has the advantage that you use updated values whenever possible; this in-place variant is the Gauss-Seidel method. Note that it may still be necessary to keep a duplicate vector around to check for convergence.

A second modification follows from recognizing that the stencil only uses nearest neighbors, so the odd points depend only on the even points and vice versa. To update the values optimally it therefore makes sense to first update all odd points and then all even points, using the freshly updated odd values in this second step. This is called Red-Black Gauss-Seidel, where the 'Red-Black' term comes from imagining that each odd point is red and each even point is black. This concept is easily extended to 2D and 3D. The algorithm is then modified to:

    initialize a vector u
    until converged
        update u by applying (9) to the odd internal points
        update u by applying (9) to the even internal points
        update boundary condition values

where you need to make sure that the boundary conditions are treated at the appropriate step in the algorithm.
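The Red-Black variant can be sketched in the same way. In this Python sketch (0-based indices, initial estimate $u = 1$, rms error against (2) as the convergence measure, illustrative names), the internal points are split by index parity into two colours that are updated in turn, and the Neumann point is updated with (10) after both colour sweeps so that it uses the newest value of its neighbour. With 0-based indexing the labels 'odd' and 'even' are shifted by one, so the code simply sweeps one parity and then the other.

```python
import math


def exact(x):
    """Analytical solution (2)."""
    return x - 0.5 * x * x


def rms_error(u, dx):
    """Root-mean-square difference between u and the analytical solution."""
    return math.sqrt(sum((ui - exact(i * dx)) ** 2 for i, ui in enumerate(u)) / len(u))


def red_black_gauss_seidel(n, tol=1e-6, max_iter=100_000):
    """Red-Black Gauss-Seidel for (9)-(10) on n points; returns (u, iterations)."""
    dx = 1.0 / (n - 1)
    u = [1.0] * n
    u[0] = 0.0                                   # Dirichlet condition u(0) = 0
    for iteration in range(1, max_iter + 1):
        for first in (1, 2):                     # one colour, then the other
            for i in range(first, n - 1, 2):     # internal points of this colour only
                # Both neighbours have the other colour, so the points of one
                # colour are independent and can be updated in any order.
                u[i] = 0.5 * (u[i - 1] + u[i + 1] + dx * dx)
        u[n - 1] = u[n - 2] + 0.5 * dx * dx      # Neumann point, (10), after both sweeps
        if rms_error(u, dx) < tol:
            return u, iteration
    raise RuntimeError("Red-Black Gauss-Seidel did not converge")


if __name__ == "__main__":
    u, iterations = red_black_gauss_seidel(11)
    print(f"converged after {iterations} iterations")
```

The independence of the points within one colour is what makes this ordering attractive for vectorized or parallel implementations.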
3.2. Iterative methods: successive overrelaxation
The methods above are stable and easy to implement but converge very slowly except for very small problems (small N). To speed up convergence we can use successive overrelaxation (SOR), in which you make an overcorrection at each step of the Gauss-Seidel iteration. In our case this reads:

$$ u_i' = \tfrac{1}{2}\left[u_{i-1} + u_{i+1} + (\Delta x)^2\right] \qquad (11a) $$

$$ u_i := \omega\, u_i' + (1 - \omega)\, u_i \qquad (11b) $$

where I've used := to indicate that the left-hand side becomes the value of the right-hand side (rather than mathematical equality). $\omega$ is the relaxation parameter. When $1 < \omega < 2$ you use overrelaxation, but you can also choose $0 < \omega < 1$ to use underrelaxation (which is sometimes handy when you have strongly nonlinear problems); $\omega = 1$ recovers Gauss-Seidel. It can be shown that this algorithm only converges for $0 < \omega < 2$ and that there is an optimal choice for $\omega$, which depends on the problem and the grid size. See Section 19.5 of Press et al. (1992) for more details.
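A Python sketch of SOR for problem (1), following (11a) and (11b) with a lexicographic Gauss-Seidel sweep, so that `omega = 1` reduces it to plain Gauss-Seidel. As in the experiments described below, it starts from $u = 1$ and stops when the rms error against (2) drops below $10^{-6}$. The Neumann point is simply reset with (10) after each sweep rather than overrelaxed; that choice, the 0-based indexing and the function names are assumptions of this sketch, so its iteration counts need not match the table exactly.

```python
import math


def exact(x):
    """Analytical solution (2)."""
    return x - 0.5 * x * x


def rms_error(u, dx):
    """Root-mean-square difference between u and the analytical solution."""
    return math.sqrt(sum((ui - exact(i * dx)) ** 2 for i, ui in enumerate(u)) / len(u))


def sor(n, omega, tol=1e-6, max_iter=100_000):
    """Successive overrelaxation for (9)-(10) on n points; returns (u, iterations)."""
    dx = 1.0 / (n - 1)
    u = [1.0] * n
    u[0] = 0.0                                            # Dirichlet condition u(0) = 0
    for iteration in range(1, max_iter + 1):
        for i in range(1, n - 1):
            u_gs = 0.5 * (u[i - 1] + u[i + 1] + dx * dx)  # (11a): Gauss-Seidel value
            u[i] = omega * u_gs + (1.0 - omega) * u[i]    # (11b): relaxed update
        u[n - 1] = u[n - 2] + 0.5 * dx * dx               # Neumann point, (10)
        if rms_error(u, dx) < tol:
            return u, iteration
    raise RuntimeError(f"SOR did not converge for omega = {omega}")


if __name__ == "__main__":
    for omega in (1.0, 1.5, 1.9):
        _, iterations = sor(11, omega)
        print(f"omega = {omega}: {iterations} iterations")
```

Only one line differs from Gauss-Seidel, which is why SOR is usually the first acceleration tried for this kind of stencil.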
An evaluation of these algorithms for our equation (1) shows the increase in convergence speed for SOR compared to Gauss-Seidel (GS). The table below lists the number of iterations needed for the rms error relative to the analytical solution to drop below $10^{-6}$. All cases were started with an initial estimate $u = 1$.

| N  | GS   | SOR, ω = 1.5 | SOR, ω = 1.9 | SOR, ω = 2.0 |
|----|------|--------------|--------------|--------------|
| 11 | 533  | 207          | 62           | 10           |
| 21 | 2136 | 776          | 190          | 20           |
| 41 | 8552 | 2983         | 621          | 40           |

Note that we're lucky to find convergence (and pretty good convergence at that) for ω = 2! This is pretty much guaranteed not to work for any real application.
3.3. Iterative methods: multigrid
The slow convergence of Jacobi can be understood if you consider how information is transferred between nodal points by the stencil. In each iteration every nodal point makes itself known only to its direct neighbors, so it takes N iterations before the point at one end of the domain learns about the existence of the point at the other end. Updates to the values at each point require further iteration, and it may be intuitive (and can be formally shown) that it takes of the order of N × N iterations to fully converge. The SOR approach allows a speedup, but it cannot be faster than N iterations. The iteration counts in the table above illustrate this: doubling N roughly quadruples the Gauss-Seidel count, while for ω = 2.0 the count only doubles. This means that the computational cost of these iterative methods is $O(N^2)$ to $O(N^3)$ flops (floating point operations), since each iteration takes $O(N)$ flops.
The main limiting factor is the width of the stencil. The three-point stencil is efficient at transferring short-wavelength information, but very inefficient for wavelengths that are longer than a few grid points. It would therefore make sense to try to transfer long-wavelength information with a stencil that has greater 'reach'. This can be accomplished by subsampling the grid, taking every other grid point, which halves the number of iterations that the stencil needs to go from one end of the domain to the other. On this coarser grid the shortest wavelengths are not represented accurately, but you can improve on that by going back to the fine grid once you have finished on the coarser grid. Of course, you could see the iteration on the coarse grid in a similar light, where it is beneficial to subsample the grid points again to improve the convergence of even longer wavelengths. This is the main philosophy behind multigrid, where you use multiple layers of subsampled grids (for an example see Figure 1).
[Figure 1. Geometry for 1D multigrid: a hierarchy of grids from the finest level to the coarsest level, with smoothing on each level, restriction to coarser levels and interpolation back to finer levels.]

The main trick in multigrid is to compute corrections to the solution on fine grids using coarser grids. Let's write equation (1a) in more general form to develop an appreciation for the method:

$$ Lu = f \qquad (12) $$
where $L$ is a linear operator, $u$ is the unknown and $f$ is the known right-hand-side vector. If we have an approximate solution $u$ we can define the residual

$$ r = Lu - f \qquad (13) $$

If $u$ is the correct solution, the residual is zero. We can try to reduce the residual by computing the error (correction) $v$ that must be added to $u$ to obtain the correct solution. Since $L$ is linear, the error satisfies

$$ Lv = -r \qquad (14) $$
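The sign in (14) follows in one line from linearity: if $u + v$ is the exact solution of (12), then

$$ L(u + v) = f \;\Longrightarrow\; Lu + Lv = f \;\Longrightarrow\; Lv = f - Lu = -r , $$

so the correction is found by solving an equation of the same form as (12), with the right-hand side replaced by the negative residual.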
so that when we know the residual we can solve (14) to find the correction. The main trick in multigrid is that (14) is solved on a coarser grid and the coarse-grid solution is interpolated to the finer grid before it is added to the approximate fine-grid solution. The two-grid algorithm can be written as

    compute an approximate solution of (12) on the fine grid
    compute the residual on the fine grid using (13)
    restrict the residual to the coarse grid
    solve for the error on the coarse grid using (14)
    interpolate the error to the fine grid and add it to the fine-grid solution
Of course, if this works for going from one level to the next-coarser one, it will work for the next coarser level also. This means that we can apply the coarse-grid error correction recursively. Note that (14) and (12) have identical form, so that we can define the right-hand side $f$ of each coarse-grid equation to be the negative residual $-r$, and define the solution $u$ at each coarser level to be the error $v$ for the next finer level.

Assume that we want to solve (12) on a grid with $2^N + 1$ grid points (in this section N counts grid levels rather than grid points). This is the finest level $l = N$. We can define coarse levels $l = N-1, N-2, \ldots, 1$ by taking out each even point from the next finer level (Figure 1). At the coarsest level we have three grid points left. At each level we define a solution $u_l$, a right-hand side $f_l$, a residual $r_l$ and an error $v_l$.
    for each level l = N, N-1, ..., 2
        relax (12) at level l to find u_l
        compute the negative residual on level l: -r_l
        restrict the residual to the next coarser level and store it in f_{l-1}
        set the starting value u_{l-1} = 0
    for each level l = 1, 2, ..., N-1
        relax (12) at level l to improve the estimate of u_l
        interpolate u_l to the next finer level to find the error v_{l+1}
        add the error to the solution: u_{l+1} := u_{l+1} + v_{l+1}
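The two loops above can be written compactly as a recursive function. The Python sketch below is one possible implementation under stated assumptions, not the code used for these notes. It solves the model problem $-u'' = f$ with homogeneous Dirichlet conditions $u(0) = u(1) = 0$ (in line with the advice later in this section to work with homogeneous essential boundary conditions), uses Gauss-Seidel relaxation, full weighting with weights 1/4, 1/2, 1/4 for restriction and linear interpolation, and also relaxes once more on the finest level after the last correction. Level l has $2^l + 1$ points, so the coarsest level has a single unknown that one relaxation step solves exactly. All names and the number of sweeps are illustrative choices.

```python
def relax(u, f, h, sweeps):
    """Gauss-Seidel sweeps for -u'' = f; the end values u[0], u[-1] stay fixed."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])


def residual(u, f, h):
    """r = Lu - f as in (13), with Lu = -(u[i-1] - 2u[i] + u[i+1]) / h^2."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = -(u[i - 1] - 2.0 * u[i] + u[i + 1]) / (h * h) - f[i]
    return r


def restrict(r):
    """Full weighting onto every other point (boundary values stay zero)."""
    coarse = [0.0] * ((len(r) - 1) // 2 + 1)
    for j in range(1, len(coarse) - 1):
        coarse[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    return coarse


def interpolate(coarse):
    """Linear interpolation from the coarse grid to the next finer grid."""
    fine = [0.0] * (2 * (len(coarse) - 1) + 1)
    for j, value in enumerate(coarse):
        fine[2 * j] = value
    for j in range(len(coarse) - 1):
        fine[2 * j + 1] = 0.5 * (coarse[j] + coarse[j + 1])
    return fine


def v_cycle(u, f, h, sweeps=2):
    """One multigrid V-cycle for Lu = f on a grid with 2**l + 1 points (in place)."""
    if len(u) == 3:
        relax(u, f, h, 1)               # coarsest level: one unknown, solved exactly
        return u
    relax(u, f, h, sweeps)              # relax (12) on this level
    f_coarse = [-ri for ri in restrict(residual(u, f, h))]         # coarse RHS = -r
    v = v_cycle([0.0] * len(f_coarse), f_coarse, 2.0 * h, sweeps)  # error, starts at 0
    for i, vi in enumerate(interpolate(v)):
        u[i] += vi                      # u := u + v
    relax(u, f, h, sweeps)              # relax again before returning to the finer level
    return u


if __name__ == "__main__":
    levels = 6
    n = 2**levels + 1
    h = 1.0 / (n - 1)
    u = [0.0] * n
    f = [1.0] * n                       # f = 1 as in (1a)
    for cycle in range(1, 6):
        v_cycle(u, f, h)
        err = max(abs(u[i] - 0.5 * i * h * (1.0 - i * h)) for i in range(n))
        print(f"cycle {cycle}: max error {err:.2e}")
```

With $f = 1$ the exact solution of this model problem is $u = \tfrac{1}{2}x(1 - x)$, which the discrete equations reproduce exactly, so the printed error shows the reduction per V-cycle directly.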
A graph of this algorithm, drawn with the finest level on top and the sequence of operations progressing from left to right, has the form of a "V", and the common name of the above algorithm is the multigrid V-cycle. A special form of multigrid starts by solving (12) on the coarsest level, interpolating to the next level up, performing a V-cycle, interpolating to the next level, performing a V-cycle, etc., until the finest level is reached. This is called full multigrid.
In order to develop the multigrid iteration one needs to define how one goes from a grid level down to the next coarser level ('restriction'), how one goes back up to finer levels ('interpolation') and how the iteration takes place at each level ('solution' or 'smoothing'). For an introduction to the use of multigrid for the Poisson equation see, e.g., www.cs.berkeley.edu/~demmel/cs267/lecture25/lecture25.html, or sccm.stanford.edu/~livne/36. See also "An Introduction to Multigrid Methods" by Wesseling (Wiley and Sons, 1992), which is available online at www.mgnet.org/mgnet-books-wesseling.html.
Multigrid methods employ various choices for the restriction, interpolation and smoothing operators. For example, restriction can be done by injection (just taking the corresponding values) or by averaging three neighboring grid points on the fine mesh to yield the value at the corresponding grid point on the coarser grid. Interpolation is easily done by linear interpolation for points on the finer grid that are not represented on the coarse grid. Smoothing can be done by Gauss-Seidel. Interestingly, SOR is a bad choice for smoothing in multigrid.
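The difference between the two restriction choices shows up most clearly for the shortest wavelength the fine grid can carry, the sawtooth $(-1)^i$. In the Python sketch below (illustrative names; 'averaging three neighboring grid points' is taken here as full weighting with the common weights 1/4, 1/2, 1/4, which is an assumption), injection maps this mode onto a constant on the coarse grid, so short-wavelength content is aliased into a long wavelength, while full weighting removes it at the interior coarse points. Linear interpolation back to the finer grid is included for completeness.

```python
def inject(r):
    """Restriction by injection: keep every other fine-grid value."""
    return r[::2]


def full_weighting(r):
    """Restriction by averaging three neighbours with weights 1/4, 1/2, 1/4.

    End points are copied unchanged, as they would carry boundary values.
    """
    coarse = r[::2]  # a new list; interior entries are overwritten below
    for j in range(1, len(coarse) - 1):
        coarse[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    return coarse


def linear_interpolation(coarse):
    """Interpolation to the finer grid: copy coarse values, average in between."""
    fine = []
    for left, right in zip(coarse, coarse[1:]):
        fine += [left, 0.5 * (left + right)]
    return fine + [coarse[-1]]


if __name__ == "__main__":
    sawtooth = [(-1.0) ** i for i in range(9)]  # shortest wavelength on 9 points
    print("injection:     ", inject(sawtooth))
    print("full weighting:", full_weighting(sawtooth))
    print("interpolated:  ", linear_interpolation([0.0, 1.0, 2.0]))
```

Both operators reproduce smooth (for example linear) data exactly; they differ only in how they treat components that the coarse grid cannot represent.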
Boundary conditions require special care in multigrid methods. In general it is practical to rewrite the problem such that you have homogeneous essential boundary conditions (solution is zero at the boundaries).
4. Direct methods
Note that we can use the stencil (8) to set up a system of equations that links the N unknowns through an N × N matrix-vector system

$$ Au = f \qquad (15) $$

In the case of a discretization with N = 5 the matrix and vectors become
$$
A = \begin{bmatrix}
 2 & -1 &  0 &  0 &  0 \\
-1 &  2 & -1 &  0 &  0 \\
 0 & -1 &  2 & -1 &  0 \\
 0 &  0 & -1 &  2 & -1 \\
 0 &  0 &  0 & -1 &  1
\end{bmatrix} \qquad (16a)
$$

$$
u = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{bmatrix} \qquad (16b)
$$

$$
f = \begin{bmatrix} (\Delta x)^2 \\ (\Delta x)^2 \\ (\Delta x)^2 \\ (\Delta x)^2 \\ \tfrac{1}{2}(\Delta x)^2 \end{bmatrix} \qquad (16c)
$$

The first four rows are (8) multiplied by $(\Delta x)^2$ (the prescribed value $u = 0$ at $x = 0$ drops out of the first row), and the last row is (10). Note that in this case N is the number of unknowns, so that it doesn't include the first nodal point (for which the value is prescribed); the unknowns are numbered starting from the first point to the right of $x = 0$. N is therefore the number of nodal points minus one.
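The system (15) and (16) can be assembled for any number of unknowns. The Python sketch below (dense lists for readability, hypothetical function names) builds A and f for N unknowns, with $\Delta x = 1/N$ and unknown k (counting from 1) located at $x = k\,\Delta x$, and checks that the analytical solution (2) satisfies the discrete system to round-off: the second difference in (8) is exact for a quadratic, and so is the ghost-point relation (10).

```python
def assemble(n):
    """Matrix A and right-hand side f of (15)-(16) for n unknowns (n + 1 nodal points)."""
    dx = 1.0 / n
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 2.0
        if i > 0:
            A[i][i - 1] = -1.0
        if i < n - 1:
            A[i][i + 1] = -1.0
    A[n - 1][n - 1] = 1.0                      # last row is the Neumann condition (10)
    f = [dx * dx] * (n - 1) + [0.5 * dx * dx]
    return A, f


def max_residual(A, u, f):
    """Largest |(Au - f)_i| over all rows."""
    return max(abs(sum(a_ij * u_j for a_ij, u_j in zip(row, u)) - f_i)
               for row, f_i in zip(A, f))


if __name__ == "__main__":
    n = 5
    A, f = assemble(n)
    for row in A:
        print(row)
    exact = [(k / n) - 0.5 * (k / n) ** 2 for k in range(1, n + 1)]
    print("max |Au - f| for the analytical solution:", max_residual(A, exact, f))
```

For N = 5 the printed rows reproduce (16a).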
4.1. Solution of the discrete system: Gaussian elimination
The matrix-vector system can be solved using various methods. A brute-force method is to compute the inverse $A^{-1}$ of the matrix and find the solution by matrix-vector multiplication

$$ u = A^{-1} f \qquad (17) $$

It is generally quite expensive to compute the exact inverse of a matrix. One exception is the inversion of a diagonal matrix, which is trivial. A more efficient method to solve (15) is to use Gaussian elimination, which generally requires $O(N^3)$ operations for a full matrix but is significantly more efficient for the sparse matrices that occur in finite difference methods. A general approach is to decompose the matrix A into a lower and an upper triangular matrix, A = LU, after which the system can be solved efficiently by substitution. The efficiency and memory requirements of Gaussian elimination are much improved for special forms of the matrix A. For example, when A is symmetric it can be stored in approximately half the amount of memory, and the more efficient decomposition $LDL^T$ can be used, where D is a diagonal matrix. If the matrix is symmetric and positive definite one can compute the even more efficient Cholesky decomposition $LL^T$ (see e.g., Golub and Van Loan, 1989, or Chapter 2 of Press et al., 1992).
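A dense Cholesky factorization takes only a few lines. The Python sketch below is an illustration with a hypothetical function name, not a library routine: it factors the matrix (16a) for N = 5 and shows that L is lower bidiagonal, so no fill-in appears outside the band of A, which is exactly what the band storage of the next section exploits.

```python
import math


def cholesky(A):
    """Return lower triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


A = [[2.0, -1.0, 0.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0, 0.0],
     [0.0, -1.0, 2.0, -1.0, 0.0],
     [0.0, 0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, 0.0, -1.0, 1.0]]    # matrix (16a) for N = 5

if __name__ == "__main__":
    for row in cholesky(A):
        print(["%.4f" % x for x in row])
```

The positive-definiteness check doubles as a cheap test: for a symmetric matrix that is not positive definite, a non-positive pivot appears and the factorization stops.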
4.2. Matrix storage
An advantage of matrices arising from the discretization of differential equations by finite differences or finite elements is that they are generally sparse, with non-zero coefficients only within a certain band surrounding the diagonal. For the matrix (16a) we have bandwidth one, and because of symmetry we can store the matrix by just the main diagonal ([2, 2, ..., 2, 1]) and the diagonal line of coefficients right above it ([−1, −1, ..., −1]). The storage requirements are then about 2N, which compares quite favorably with $N^2$ for the full matrix. The algorithm for Gaussian elimination for a full matrix has to be modified to make use of the different storage, but the coefficients outside the band are not affected in any way. Coefficients inside the band may be zero after discretization, but they need to be stored explicitly since they may become non-zero ('the band gets filled') during the matrix decomposition. This is not relevant for the matrix (16a) but becomes important for matrices that result from 2D or 3D discretization. For example, the matrix that results from the discretization of the 2D Poisson equation is depicted in Figure 2b. This follows from a 2D extension of the stencil (9) on an equidistant grid with 5 grid points along the horizontal axis (verify). The banded matrices resulting from the discretization in Figure 2 have bandwidth B = 1 (1D) or B = 5 (2D example), where the bandwidth is defined as the value B for which all coefficients $a_{ij}$ are zero if $j > i + B$. For non-symmetric matrices we can distinguish between the upper and lower bandwidth. In general it is more efficient to minimize the bandwidth. If, for example, one has a 2D grid of 10 × 5 nodal points, it is best to number the nodal points in the vertical direction first. That will give a bandwidth of 5, compared to a bandwidth of 10 for numbering in the horizontal direction first (verify).

[Figure 2. a) General form of the matrix after discretization of the 1D Poisson equation on an equidistant grid: 2 on the main diagonal (1 in the last row), −1 on the first off-diagonals, 0 elsewhere. b) Same for the 2D Poisson equation: 4 on the main diagonal and −1 on the off-diagonals that couple neighboring grid points.]

For matrices originating from finite elements it is common to have a strongly variable number of non-zero coefficients on each row. This leads to a rather jagged outline if one traces the outermost non-zero element (Figure 3). In these cases it is more efficient to use the profile storage method (also called the skyline method, for obvious reasons).
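The '(verify)' remarks about bandwidth can be checked by numbering the nodes of a 2D grid and recording how far the five-point stencil reaches above the diagonal. In this Python sketch (hypothetical function name), `nx` is the number of points along the horizontal axis and `ny` along the vertical axis, and all grid points are treated as unknowns.

```python
def five_point_bandwidth(nx, ny, horizontal_first=True):
    """Bandwidth B (largest j - i with a_ij != 0) of the 2D five-point Poisson matrix."""
    def number(ix, iy):
        # Node numbering: run along the horizontal axis first, or the vertical one.
        return ix + nx * iy if horizontal_first else iy + ny * ix

    bandwidth = 0
    for iy in range(ny):
        for ix in range(nx):
            i = number(ix, iy)
            for sx, sy in ((1, 0), (-1, 0), (0, 1), (0, -1)):   # stencil neighbours
                jx, jy = ix + sx, iy + sy
                if 0 <= jx < nx and 0 <= jy < ny:
                    bandwidth = max(bandwidth, number(jx, jy) - i)
    return bandwidth


if __name__ == "__main__":
    print("10 x 5 grid, horizontal first:", five_point_bandwidth(10, 5, True))
    print("10 x 5 grid, vertical first:  ", five_point_bandwidth(10, 5, False))
```

The bandwidth equals the number of points along the direction that is numbered first, because the neighbour in the other direction is exactly one full row (or column) of nodes away.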
[Figure 3. Illustration of band (a) and profile/skyline matrices. Coefficients further than the bandwidth B from the diagonal are zero.]

4.3. Solution of band matrices: Gaussian elimination
A general algorithm for the solution of a matrix-vector equation uses LU decomposition, where the matrix A is decomposed into a lower triangular matrix L and an upper triangular matrix U. The matrix-vector equation can then be written as

$$ Au = LUu = f \qquad (18) $$

which can be solved by first finding the vector v such that

$$ Lv = f \qquad (19a) $$

followed by finding the solution u from

$$ Uu = v \qquad (19b) $$
Once the decomposition is known it is easy to find v by solving (19a) with forward substitution:

$$ v_1 = \frac{f_1}{L_{11}} \qquad (20a) $$

$$ v_i = \frac{1}{L_{ii}}\left[f_i - \sum_{j=1}^{i-1} L_{ij} v_j\right], \qquad i = 2, 3, \ldots, N \qquad (20b) $$

and solving (19b) for u by backsubstitution:

$$ u_N = \frac{v_N}{U_{NN}} \qquad (21a) $$

$$ u_i = \frac{1}{U_{ii}}\left[v_i - \sum_{j=i+1}^{N} U_{ij} u_j\right], \qquad i = N-1, N-2, \ldots, 1 \qquad (21b) $$
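The forward substitution and backsubstitution formulas above translate directly into two short loops. The Python sketch below (0-based indices, hypothetical function names) applies them with the factors L and U of the 2 × 2 example that follows.

```python
def forward_substitution(L, f):
    """Solve L v = f for lower triangular L, row by row from the top."""
    n = len(f)
    v = [0.0] * n
    for i in range(n):
        v[i] = (f[i] - sum(L[i][j] * v[j] for j in range(i))) / L[i][i]
    return v


def back_substitution(U, v):
    """Solve U u = v for upper triangular U, row by row from the bottom."""
    n = len(v)
    u = [0.0] * n
    for i in range(n - 1, -1, -1):
        u[i] = (v[i] - sum(U[i][j] * u[j] for j in range(i + 1, n))) / U[i][i]
    return u


if __name__ == "__main__":
    L = [[3.0, 0.0], [1.0, 3.0]]
    U = [[2.0, 1.0], [0.0, 4.0]]
    v = forward_substitution(L, [3.0, -11.0])
    u = back_substitution(U, v)
    print("v =", v, " u =", u)
```

Running it gives v = [1.0, -4.0] and u = [1.0, -1.0]. Each solve costs $O(N^2)$ operations for full triangular factors, and only $O(N)$ when the factors are banded with a fixed bandwidth.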
Example:
Consider the matrix-vector equation

$$
\begin{bmatrix} 6 & 3 \\ 2 & 13 \end{bmatrix}
\begin{bmatrix} u_1 \\ u_2 \end{bmatrix}
=
\begin{bmatrix} 3 \\ -11 \end{bmatrix}
$$

We can show that the LU decomposition can be written as

$$
\begin{bmatrix} 6 & 3 \\ 2 & 13 \end{bmatrix}
=
\begin{bmatrix} 3 & 0 \\ 1 & 3 \end{bmatrix}
\begin{bmatrix} 2 & 1 \\ 0 & 4 \end{bmatrix}
$$

so that, using the notation in (18) and (19),

$$ v = [1, -4]^T \quad \text{and} \quad u = [1, -1]^T $$
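(verify). Note that the decomposition is not unique, since a diagonal scaling can be moved between L and U; practical algorithms remove this freedom by requiring unit diagonal entries in one of the two factors. An LU decomposition can be computed with Crout's algorithm, which can replace the matrix A in memory with the upper and lower triangular matrices U and L (e.g., Press et al., 1992).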
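A factorization can also be computed directly. The Python sketch below is a Doolittle-style LU decomposition without pivoting (unit diagonal in L; texts differ on whether this normalization or a unit diagonal in U goes by the name of Crout's algorithm). Applied to the example matrix it produces different factors than those shown above, which illustrates that the decomposition is not unique: multiplying L by D = diag(3, 3) on the right and U by $D^{-1}$ on the left recovers the factors used in the example. The function name is hypothetical, and because no pivoting is done a zero pivot simply raises an error.

```python
def lu_decompose(A):
    """Doolittle LU decomposition without pivoting: A = L U with L[i][i] = 1."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):            # row i of U
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        if U[i][i] == 0.0:
            raise ZeroDivisionError("zero pivot: this matrix needs pivoting")
        for j in range(i + 1, n):        # column i of L
            L[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U


if __name__ == "__main__":
    L, U = lu_decompose([[6.0, 3.0], [2.0, 13.0]])
    print("L =", L)
    print("U =", U)
```

Here L has 1/3 below the diagonal and U = [[6, 3], [0, 12]]; scaling as described above turns these into the factors [[3, 0], [1, 3]] and [[2, 1], [0, 4]] of the example.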
4.4. Practical implementation of routines for Gaussian elimination
The algorithms for Gaussian elimination are implemented in a variety of standard libraries and packages, and it's generally not necessary to write user programs for these algorithms. A particular class of subroutines for the solution of problems arising from linear algebra is LINPACK (www.netlib.org/LINPACK), which was developed in the 70s and 80s and is now largely superseded by LAPACK (www.netlib.org/LAPACK), but the name linpack still exists as a 'generic' term. Most algorithms make extensive use of vector algebra, including the inner (dot) product and outer product. These are generally available to programmers through calls to the BLAS (Basic Linear Algebra Subprograms). Many finite difference and finite element codes spend most of their computational time in these BLAS routines (e.g., one specific finite element code for mantle convection was found to spend nearly 80% of its time in the DDOT routine, which computes the dot product of vectors in double precision). It is generally worth investigating whether optimized BLAS routines exist for your system. Locally we have had good luck with the ATLAS optimized BLAS routines for the Linux PCs (math-atlas.sourceforge.net). In some cases the BLAS developed by Kazushige Goto for Pentium and AMD processors (www.cs.utexas.edu/users/flame/goto) worked quite well too. In general you will have to search for the right routines that solve your specific form of matrix (e.g., complex, single precision real, double precision real, band, profile, symmetric or nonsymmetric), and you may have to interface your program to generate the matrix/vector system in a layout that the subroutines will accept.
If you are interested in more 'quick and dirty' implementations it is worthwhile to check out the routines from Numerical Recipes, which provide good quality but not necessarily optimized (or optimizable) subroutines for Gaussian elimination.
4.5. Back to our 1D problem: solution of the tridiagonal system
The matrix (16a) is symmetric and tridiagonal, which makes the solution by Gaussian elimination particularly efficient. An algorithm for the decomposition and forward/back substitution is given in Press et al. (1992), assuming that the main diagonal is stored in vector b, the upper diagonal in c and the lower diagonal in a. Each vector is of length n. The index i indicates the row number (which means that the first non-zero component of vector a is at index 2, and the last component of c is not used). The right-hand-side vector of the equations (15) is stored in r; the solution is returned in u. I've modified the subroutine a bit from Numerical Recipes to include a temporary workspace (gam) for the decomposition and to use double precision arithmetic (which is generally needed to avoid roundoff errors in big systems).
```fortran
      subroutine tridag(a,b,c,r,u,n,gam)
      implicit none
      integer n
      double precision a(n),b(n),c(n),r(n),u(n),gam(n)
      integer j
      double precision bet
      bet = b(1)
      if (bet.eq.0.d0) then
         write(6,*) "tridag fails: small pivot element"
         stop
      endif
      u(1) = r(1)/bet
      do j=2,n
c        *** Decomposition
         gam(j) = c(j-1)/bet
         bet = b(j) - a(j)*gam(j)
         if (bet.eq.0.d0) then
            write(6,*) "tridag fails: small pivot element"
            stop
         endif
c        *** Forward substitution
         u(j) = ( r(j) - a(j)*u(j-1))/bet
      enddo
c     *** Backsubstitution
      do j=n-1,1,-1
         u(j) = u(j) - gam(j+1)*u(j+1)
      enddo
      return
      end
```
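For experimenting without a Fortran compiler, here is a direct Python transcription of tridag. It is a sketch that mirrors the subroutine above, not the Numerical Recipes code itself: indices are 0-based (so `a[0]` and `c[n-1]` are unused), and raising an exception replaces `stop`. The hypothetical helper `solve_poisson_1d` sets up the tridiagonal form of (16a) and (16c) for N unknowns, and the usage lines compare the result with the analytical solution (2) for an assumed N = 10.

```python
def tridag(a, b, c, r):
    """Solve a tridiagonal system: a = lower, b = main, c = upper diagonal."""
    n = len(b)
    u = [0.0] * n
    gam = [0.0] * n                          # workspace for the decomposition
    bet = b[0]
    if bet == 0.0:
        raise ZeroDivisionError("tridag fails: zero pivot element")
    u[0] = r[0] / bet
    for j in range(1, n):
        gam[j] = c[j - 1] / bet              # decomposition
        bet = b[j] - a[j] * gam[j]
        if bet == 0.0:
            raise ZeroDivisionError("tridag fails: zero pivot element")
        u[j] = (r[j] - a[j] * u[j - 1]) / bet   # forward substitution
    for j in range(n - 2, -1, -1):
        u[j] -= gam[j + 1] * u[j + 1]        # backsubstitution
    return u


def solve_poisson_1d(n):
    """Solve (15) with the matrix (16a) and right-hand side (16c) for n unknowns."""
    dx = 1.0 / n
    a = [-1.0] * n                           # lower diagonal (a[0] unused)
    b = [2.0] * (n - 1) + [1.0]              # main diagonal, last entry from (10)
    c = [-1.0] * n                           # upper diagonal (c[n-1] unused)
    r = [dx * dx] * (n - 1) + [0.5 * dx * dx]
    return tridag(a, b, c, r)


if __name__ == "__main__":
    n = 10
    u = solve_poisson_1d(n)
    x = [(k + 1) / n for k in range(n)]
    print("max error:", max(abs(uk - (xk - 0.5 * xk * xk)) for uk, xk in zip(u, x)))
```

The whole solve takes $O(N)$ operations and stores only the three diagonals plus the workspace, in contrast to the $O(N^2)$ to $O(N^3)$ flops of the iterative methods in Section 3.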
References
Golub, G.H., and C.F. Van Loan, Matrix Computations, 2nd edition, Johns Hopkins University Press, 1989.
Press, W.H., S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes: The Art of Scientific Computing, 2nd edition, Cambridge University Press, 1992.
Wesseling, P., An Introduction to Multigrid Methods, Wiley and Sons, 1992.