
Order No.: 1879

PhD Thesis

Speciality: Computer Science (Informatique)

Sparse preconditioners for dense linear systems from electromagnetic applications

presented on 23 April 2002 at the Institut National Polytechnique de Toulouse

by

Bruno CARPENTIERI
CERFACS

before the jury composed of:

G. Alléon     EADS
M. Daydé      Professor at ENSEEIHT
I. S. Duff    Project Leader at CERFACS and Group Leader at Rutherford Appleton Laboratory   (President)
L. Giraud     CERFACS
G. Meurant    CEA                                                                            (Referee)
Y. Saad       Professor at the University of Minnesota                                       (Referee)
S. Piperno    INRIA-CERMICS

CERFACS report: TH/PA/02/48



Acknowledgments

I wish to express my sincere gratitude to Iain S. Duff and Luc Giraud, who introduced me to the subject of this thesis and guided my research with vivid interest. They taught me to enjoy both rigour and simplicity, and let me experience the freedom and the excitement of personal discovery. Without their professional advice and their trust in me, this thesis would not have been possible.

My sincere thanks go to Michel Daydé for his continued support in the development of my research at CERFACS.

I am grateful to Gerard Meurant and Yousef Saad, who agreed to act as referees for my thesis. It was an honour for me to benefit from their feedback on my research work.

I wish to thank Guillaume Alléon and Serge Piperno, who opened the door to enriching collaborations with EADS and INRIA-CERMICS, respectively, and agreed to take part in my jury. Guillaume Sylvand at INRIA-CERMICS deserves thanks for providing me with codes and valuable support.

Grateful acknowledgments are made to the EMC Team at CERFACS for their interest in my work, in particular to Mbarek Fares, who provided me with the CESC code, and to Francis Collino and Florence Millot for many fertile discussions.

I would like to sincerely thank all the members of the Parallel Algorithms Team and of CSG at CERFACS for their professional and friendly support, and Brigitte Yzel for her kind help on many occasions. The Parallel Algorithms Team provided a stimulating environment in which to develop my thesis. I am grateful to the many visitors and colleagues who, at different stages, shared my enjoyment of this research.

Above all, I wish to express my deep gratitude to my family and friends for their presence and continued support.

This work was supported by INDAM under the grant "Borsa di Studio per l'Estero A.A. 1998-'99" (Provvedimento del Presidente del 30 Aprile 1998), and by CERFACS.

- B. C.



To my family



Don't just say "it is impossible" without putting in a sincere effort.
Observe the word "impossible" carefully... You can see "I'm possible".
What really matters is your attitude and your perception.

Anonymous



Abstract

In this work, we investigate the use of sparse approximate inverse preconditioners for the solution of large dense complex linear systems arising from integral equations in electromagnetism applications.

The goal of this study is the development of robust and parallelizable preconditioners that can easily be integrated in simulation codes able to treat large configurations. We first adapt to the dense situation the preconditioners initially developed for sparse linear systems. We compare their respective numerical behaviours and propose a robust pattern selection strategy for Frobenius-norm minimization preconditioners.
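As a minimal illustration of the underlying mechanism (a sketch only, not the thesis code: the function name, the toy pattern, and the toy matrix below are all hypothetical), the Frobenius norm ||AM - I||_F^2 decouples into n independent least-squares problems, one per column of M over a prescribed sparsity pattern, which is what makes this family of preconditioners naturally parallel:

```python
import numpy as np

def frobenius_min_preconditioner(A, pattern):
    """Sparse approximate inverse M minimizing ||A M - I||_F over a fixed
    sparsity pattern (sketch). pattern[j] lists the rows allowed to be
    nonzero in column j of M; each column is an independent small
    least-squares problem, so all columns can be computed in parallel."""
    n = A.shape[0]
    M = np.zeros((n, n), dtype=A.dtype)
    for j in range(n):
        J = pattern[j]                      # allowed nonzero rows of column j
        e_j = np.zeros(n, dtype=A.dtype)
        e_j[j] = 1.0
        # minimize || A[:, J] m - e_j ||_2 over the free entries m
        m, *_ = np.linalg.lstsq(A[:, J], e_j, rcond=None)
        M[J, j] = m
    return M

# toy usage: a tridiagonal pattern on a small dense test matrix
n = 8
A = np.eye(n) + 0.1 * np.random.default_rng(0).standard_normal((n, n))
pattern = [[i for i in (j - 1, j, j + 1) if 0 <= i < n] for j in range(n)]
M = frobenius_min_preconditioner(A, pattern)
print(np.linalg.norm(A @ M - np.eye(n), "fro"))  # Frobenius residual of the computed M
```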

Our approach has been implemented by another PhD student in a large parallel code that exploits a fast multipole calculation for the matrix-vector product in the Krylov iterations. This enables us to study the numerical scalability of our preconditioner on large academic and industrial test problems in order to identify its limitations. To remove these limitations we propose an embedded scheme. This inner-outer technique enables us to significantly reduce the computational cost of the simulation and to improve the robustness of the preconditioner. In particular, we were able to solve a linear system with more than a million unknowns arising from a simulation on a real aircraft. That solution was out of reach with our initial technique.
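The inner-outer scheme can be sketched as follows (an illustrative sketch only, assuming a right-preconditioned flexible GMRES, FGMRES, as the outer solver; fgmres, inner_prec and the toy problem are hypothetical names, and the real implementation applies the fast multipole matrix-vector product in place of the dense products used here): the preconditioning step of each outer iteration is itself a few inner GMRES iterations.

```python
import numpy as np

def fgmres(A, b, inner_prec, m=10, tol=1e-8, maxiter=200):
    """Right-preconditioned flexible GMRES(m) (sketch). inner_prec may be
    a non-constant operator, e.g. a few inner GMRES iterations."""
    n = b.size
    x = np.zeros_like(b)
    for _ in range(max(1, maxiter // m)):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta <= tol * np.linalg.norm(b):
            break
        V = np.zeros((n, m + 1), dtype=b.dtype)
        Z = np.zeros((n, m), dtype=b.dtype)
        H = np.zeros((m + 1, m), dtype=b.dtype)
        V[:, 0] = r / beta
        j_used = m
        for j in range(m):
            Z[:, j] = inner_prec(V[:, j])      # flexible preconditioning step
            w = A @ Z[:, j]
            for i in range(j + 1):             # Arnoldi, modified Gram-Schmidt
                H[i, j] = np.vdot(V[:, i], w)
                w = w - H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] == 0.0:             # happy breakdown
                j_used = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        e1 = np.zeros(j_used + 1, dtype=b.dtype)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:j_used + 1, :j_used], e1, rcond=None)
        x = x + Z[:, :j_used] @ y
    return x

# toy usage: the "preconditioner" is one sweep of inner GMRES(5)
rng = np.random.default_rng(0)
A = np.eye(200) + 0.05 * rng.standard_normal((200, 200))
b = rng.standard_normal(200)
inner = lambda v: fgmres(A, v, lambda u: u, m=5, tol=1e-1, maxiter=5)
x = fgmres(A, b, inner, m=10, tol=1e-8, maxiter=200)
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```

Because the inner solve changes from one outer iteration to the next, a flexible method such as FGMRES is needed in the outer loop; plain GMRES assumes a fixed preconditioner.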

Finally, we perform a preliminary study on a spectral two-level preconditioner to enhance the robustness of our preconditioner. This numerical technique exploits spectral information of the preconditioned systems to build a low-rank update of the preconditioner.
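The flavour of such an update can be conveyed by a small sketch (assuming the additive formulation with the choice W = V_ε, and using exact eigenvectors where the thesis uses approximate Ritz pairs; spectral_update and the toy check are illustrative, not the thesis code): the k eigenvalues of the preconditioned matrix M_1 A nearest zero are shifted by +1 through a rank-k correction.

```python
import numpy as np

def spectral_update(A, M1, k):
    """Additive low-rank spectral update (sketch, not the thesis code).
    Builds M2 = M1 + V Ac^{-1} W^H with W = V, where the columns of V are
    right eigenvectors of M1 A for the k eigenvalues nearest zero and
    Ac = W^H A V. Then M2 A has those eigenvalues shifted to lambda + 1,
    while the rest of the spectrum is unchanged."""
    eigval, eigvec = np.linalg.eig(M1 @ A)
    idx = np.argsort(np.abs(eigval))[:k]     # eigenvalues nearest zero
    V = eigvec[:, idx]
    W_H = V.conj().T
    Ac = W_H @ A @ V                         # small k-by-k coarse problem
    return M1 + V @ np.linalg.solve(Ac, W_H)

# toy check: the smallest eigenvalue magnitudes of M1 A move away from zero
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 30))
M1 = 0.1 * np.eye(30)                        # a deliberately poor preconditioner
M2 = spectral_update(A, M1, k=5)
print(np.sort(np.abs(np.linalg.eigvals(M1 @ A)))[:5])  # before: near zero
print(np.sort(np.abs(np.linalg.eigvals(M2 @ A)))[:5])  # after: shifted away
```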

Keywords: Krylov subspace methods, preconditioning techniques, sparse approximate inverse, Frobenius-norm minimization method, nonzero pattern selection strategies, electromagnetic scattering applications, boundary element method, fast multipole method.




Contents

1 Introduction
  1.1 The physical problem and applications
  1.2 The mathematical problem
  1.3 Numerical solution of Maxwell's equations
    1.3.1 Differential equation methods
    1.3.2 Integral equation methods
  1.4 Direct versus iterative solution methods
    1.4.1 A sparse approach for solving scattering problems

2 Iterative solution via preconditioned Krylov solvers of dense systems in electromagnetism
  2.1 Introduction and motivation
  2.2 Preconditioning based on sparsification strategies
    2.2.1 SSOR
    2.2.2 Incomplete Cholesky factorization
    2.2.3 AINV
    2.2.4 SPAI
    2.2.5 SLU
    2.2.6 Other preconditioners
  2.3 Concluding remarks

3 Sparse pattern selection strategies for robust Frobenius-norm minimization preconditioners
  3.1 Introduction and motivation
  3.2 Pattern selection strategies for Frobenius-norm minimization methods in electromagnetism
    3.2.1 Algebraic strategy
    3.2.2 Topological strategy
    3.2.3 Geometric strategy
    3.2.4 Numerical experiments
  3.3 Strategies for the coefficient matrix
  3.4 Numerical results
  3.5 Concluding remarks

4 Symmetric Frobenius-norm minimization preconditioners in electromagnetism
  4.1 Comparison with standard preconditioners
  4.2 Symmetrization strategies for the Frobenius-norm minimization method
  4.3 Concluding remarks

5 Combining fast multipole techniques and approximate inverse preconditioners for large parallel electromagnetics calculations
  5.1 The fast multipole method
  5.2 Implementation of the Frobenius-norm minimization preconditioner in the fast multipole framework
  5.3 Numerical scalability of the preconditioner
  5.4 Improving the preconditioner robustness using embedded iterations
  5.5 Concluding remarks

6 Spectral two-level preconditioner
  6.1 Introduction and motivation
  6.2 Two-level preconditioner via low-rank spectral updates
    6.2.1 Additive formulation
    6.2.2 Numerical experiments
    6.2.3 Symmetric formulation
  6.3 Multiplicative formulation of low-rank spectral updates
    6.3.1 Numerical experiments
  6.4 Concluding remarks

7 Conclusions and perspectives

A Numerical results with the two-level spectral preconditioner
  A.1 Effect of the low-rank updates on the GMRES convergence
  A.2 Experiments with the operator W^H = V_ε^H M_1
  A.3 Cost of the eigencomputation
  A.4 Sensitivity of the preconditioner to the accuracy of the eigencomputation
  A.5 Experiments with a poor preconditioner M_1
  A.6 Numerical results for the symmetric formulation
  A.7 Numerical results for the multiplicative formulation


List of Tables

2.1.1 Number of matrix-vector products needed by some unpreconditioned Krylov solvers to reduce the residual by a factor of 10^{-5}.
2.2.2 Number of iterations using both symmetric and unsymmetric preconditioned Krylov methods to reduce the normwise backward error by 10^{-5} on Example 1. The symbol '-' means that convergence was not obtained after 500 iterations. The symbol '*' means that the method is not applicable.
2.2.3 Number of iterations required by different Krylov solvers preconditioned by SSOR to reduce the residual by 10^{-5}. The symbol '-' means that convergence was not obtained after 500 iterations.
2.2.4 Number of iterations, varying the sparsity level of Ã and the level of fill-in on Example 1.
2.2.5 Number of iterations, varying the sparsity level of Ã and the level of fill-in on Example 2.
2.2.6 Number of iterations, varying the sparsity level of Ã and the level of fill-in on Example 3.
2.2.7 Number of iterations, varying the sparsity level of Ã and the level of fill-in on Example 4.
2.2.8 Number of iterations, varying the sparsity level of Ã and the level of fill-in on Example 5.
2.2.9 Number of SQMR iterations, varying the shift parameter for various levels of fill-in in IC.
2.2.10 Number of iterations required by different Krylov solvers preconditioned by AINV to reduce the residual by 10^{-5}. The symbol '-' means that convergence was not obtained after 500 iterations.
2.2.11 Number of iterations required by different Krylov solvers preconditioned by AINV to reduce the residual by 10^{-5}. The preconditioner is computed using the dense coefficient matrix. The symbol '-' means that convergence was not obtained after 500 iterations.
2.2.12 Number of iterations required by different Krylov solvers preconditioned by SPAI to reduce the residual by 10^{-5}. The symbol '-' means that convergence was not obtained after 500 iterations.
2.2.13 Number of iterations required by different Krylov solvers preconditioned by SLU to reduce the residual by 10^{-5}. The symbol '-' means that convergence was not obtained after 500 iterations.

3.2.1 Number of iterations using the preconditioners based on dense A.
3.3.2 Number of iterations for GMRES(50) preconditioned with different values for the density of M using the same pattern for A and larger patterns. A geometric approach is adopted to construct the patterns. The test problem is Example 1. This is representative of the general behaviour observed.
3.4.3 Number of iterations to solve the set of test problems.
3.4.4 CPU time to compute the preconditioners.
3.5.5 Number of iterations to solve the set of test models by using a multiple density geometric strategy to construct the preconditioner. The pattern imposed on M is twice as dense as that imposed on A.
3.5.6 Number of iterations to solve the set of test models by using a topological strategy to sparsify A and a geometric strategy for the preconditioner. The pattern imposed on M is twice as dense as that imposed on A.

4.1.1 Number of iterations with some standard preconditioners computed using sparse A (algebraic).
4.2.2 Number of iterations on the test examples using the same pattern for the preconditioners.
4.2.3 Number of iterations for M_{Sym-Frob} combined with SQMR using three times more non-zeros in Ã than in the preconditioner.
4.2.4 Number of iterations of SQMR with M_{Sym-Frob} with different values for the density of M, using the same pattern for A and larger patterns. The test problem is Example 1.
4.2.5 Number of iterations of SQMR with M_{Aver-Frob} with different values for the density of M, using the same pattern for A and larger patterns. The test problem is Example 1.
4.2.6 Number of iterations of SQMR with M_{Sym-Frob} with different orderings.
4.2.7 Number of iterations on the test examples using the same pattern for the preconditioners. An algebraic pattern is used to sparsify A.
4.2.8 Number of iterations for M_{Sym-Frob} combined with SQMR using three times more non-zeros in Ã than in the preconditioner. An algebraic pattern is used to sparsify A.
4.2.9 Number of iterations of SQMR with M_{Sym-Frob} with different values for the density of M, using the same pattern for A and larger patterns. A geometric approach is adopted to construct the pattern for the preconditioner and an algebraic approach is adopted to construct the pattern for the coefficient matrix. The test problem is Example 1.
4.2.10 Number of iterations of SQMR with M_{Aver-Frob} with different values for the density of M, using the same pattern for A and larger patterns. A geometric approach is adopted to construct the pattern for the preconditioner and an algebraic approach is adopted to construct the pattern for the coefficient matrix. The test problem is Example 1.
4.2.11 Number of iterations of SQMR with M_{Sym-Frob} with different orderings. An algebraic pattern is used to sparsify A.

5.3.1 Total number of matrix-vector products required to converge on a sphere on problems of increasing size - tolerance = 10^{-2}. The size of the leaf-boxes in the oct-tree associated with the preconditioner is 0.125 wavelengths.
5.3.2 Elapsed time required to build the preconditioner and by GMRES(30) to converge on a sphere on problems of increasing size on eight processors of a Compaq Alpha server - tolerance = 10^{-2}.
5.3.3 Total number of matrix-vector products required to converge on an aircraft on problems of increasing size - tolerance = 2 · 10^{-2}.
5.3.4 Elapsed time required to build the preconditioner and by GMRES(30) to converge on an aircraft on problems of increasing size on eight processors of a Compaq Alpha server - tolerance = 2 · 10^{-2}.
5.3.5 Elapsed time to build the preconditioner, elapsed time to solve the problem, and total number of matrix-vector products using GMRES(30) on an aircraft with 213084 unknowns - tolerance = 2 · 10^{-2} - eight Compaq processors, varying the parameters controlling the density of the preconditioner. The symbol '-' means stagnation after 1000 iterations.
5.3.6 Tests on the parallel scalability of the code relative to the construction and application of the preconditioner and to the matrix-vector product operation on problems of increasing size. The test example is the Airbus aircraft.
5.4.7 Global elapsed time and total number of matrix-vector products required to converge on a sphere with 367500 points, varying the size of the restart parameters and the maximum number of inner GMRES iterations per FGMRES preconditioning step - tolerance = 10^{-2} - eight Compaq processors.
5.4.8 Global elapsed time and total number of matrix-vector products required to converge on an aircraft with 213084 unknowns, varying the size of the restart parameters and the maximum number of inner GMRES iterations per FGMRES preconditioning step - tolerance = 2 · 10^{-2} - eight Compaq processors.
5.4.9 Total number of matrix-vector products required to converge on a sphere on problems of increasing size - tolerance = 10^{-2}.
5.4.10 Total number of matrix-vector products required to converge on an aircraft on problems of increasing size - tolerance = 2 · 10^{-2}.

6.2.1 Effect of shifting the eigenvalues nearest zero on the convergence of GMRES(10) for Example 2. We show the magnitude of successively shifted eigenvalues and the number of iterations required when these eigenvalues are shifted. A tolerance of 10^{-8} is required in the iterative solution.
6.2.2 Effect of shifting the eigenvalues nearest zero on the convergence of GMRES(10) for Example 5. We show the magnitude of successively shifted eigenvalues and the number of iterations required when these eigenvalues are shifted. A tolerance of 10^{-8} is required in the iterative solution.
6.2.3 Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space on Example 1. Different choices are considered for the operator W^H.
6.2.4 Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space on Example 2. Different choices are considered for the operator W^H.
6.2.5 Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space on Example 3. Different choices are considered for the operator W^H.
6.2.6 Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space on Example 4. Different choices are considered for the operator W^H.
6.2.7 Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space on Example 5. Different choices are considered for the operator W^H.
6.2.8 Number of matrix-vector products required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding right eigenvectors.
6.2.9 Number of amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding right eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.
6.2.10 Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.

A.1.1 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space.
A.1.2 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space.
A.1.3 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space.
A.1.4 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space.
A.1.5 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space.
A.1.6 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space.
A.1.7 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space.
A.1.8 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space.
A.1.9 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space.
A.1.10 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space.

A.2.11 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.
A.2.12 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.
A.2.13 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.
A.2.14 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.
A.2.15 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.
A.2.16 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.
A.2.17 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.
A.2.18 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.
A.2.19 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.
A.2.20 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.



A.3.21 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.
A.3.22 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.
A.3.23 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.
A.3.24 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.
A.3.25 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.

A.4.26 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the residual by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.
A.4.27 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.
A.4.28 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.
A.4.29 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.
A.4.30 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.
A.4.31 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.
A.4.32 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.
A.4.33 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.
A.4.34 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.
A.4.35 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.

A.5.36 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The same nonzero structure is imposed on A and M_1.
A.5.37 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The same nonzero structure is imposed on A and M_1.
A.5.38 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.
A.5.39 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The same nonzero structure is imposed on A and M_1.
A.5.40 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The same nonzero structure is imposed on A and M_1.
A.5.41 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The same nonzero structure is imposed on A and M_1.
A.5.42 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The same nonzero structure is imposed on A and M_1.
A.5.43 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The same nonzero structure is imposed on A and M_1.
A.5.44 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The same nonzero structure is imposed on A and M_1.
A.5.45 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The same nonzero structure is imposed on A and M_1.
A.5.46 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.
A.5.47 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.
A.5.48 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.
A.5.49 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.
A.5.50 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.

A.6.51 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.52 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.53 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.54 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.55 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.56 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.57 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.58 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.59 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.60 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.61 Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.62 Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.

A.7.63 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
A.7.64 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
A.7.65 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
A.7.66 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
A.7.67 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
A.7.68 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
A.7.69 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
A.7.70 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
A.7.71 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
A.7.72 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
A.7.73 Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates. The preconditioner is updated in multiplicative form.
A.7.74 Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates. The preconditioner is updated in multiplicative form.


xxvi


List of Figures

1.3.1 Example of discretized mesh. . . . 8

2.1.1 Meshes associated with test examples. . . . 30

2.1.2 Eigenvalue distribution in the complex plane of the coefficient matrix of Example 3. . . . 31

2.2.3 Pattern structure of the large entries of A. The test problem is Example 5. . . . 31

2.2.4 Nonzero pattern for A when the smallest entries are discarded. The test problem is Example 5. . . . 32

2.2.5 Sensitivity of SQMR convergence to the SSOR parameter $\omega$ for Example 1. . . . 32

2.2.6 Sensitivity of SQMR convergence to the SSOR parameter $\omega$ for Example 4. . . . 33

2.2.7 Incomplete factorization algorithm - $M = LDL^T$. . . . 33

2.2.8 The spectrum of the matrix preconditioned with IC(1), the condition number of L, and the number of iterations with SQMR for various values of the shift parameter $\tau$. The test problem is Example 1 and the density of $\tilde{A}$ is around 3%. . . . 35

2.2.9 The eigenvalue distribution on the square [-1, 1] of the matrix preconditioned with IC(1), the condition number of L, and the number of iterations with SQMR for various values of the shift parameter $\tau$. The test problem is Example 1 and the density of $\tilde{A}$ is around 3%. . . . 36

2.2.10 The eigenvalue distribution on the square [-0.3, 0.3] of the matrix preconditioned with IC(1), the condition number of L, and the number of iterations with SQMR for various values of the shift parameter $\tau$. The test problem is Example 1 and the density of $\tilde{A}$ is around 3%. . . . 37

2.2.11 The biconjugation algorithm - $M = ZD^{-1}Z^T$. . . . 39

2.2.12 Sparsity patterns of the inverse of A (on the left) and of the inverse of its lower triangular factor (on the right), where all the entries whose relative magnitude is smaller than $5.0 \times 10^{-2}$ are dropped. The test problem, representative of the general trend, is a small sphere. . . . 44

2.2.13 Histograms of the magnitude of the entries of the first column of $A^{-1}$ and its lower triangular factor. A similar behaviour has been observed for all the other columns. The test problem, representative of the general trend, is a small sphere. . . . 44

3.2.1 Pattern structure of $A^{-1}$. The test problem is Example 5. . . . 57

3.2.2 Example of discretized mesh. . . . 59

3.2.3 Topological neighbours of a DOF in the mesh. . . . 59

3.2.4 Topological localization in the mesh for the large entries of A. The test problem is Example 1 and is representative of the general behaviour. . . . 60

3.2.5 Topological localization in the mesh for the large entries of $A^{-1}$. The test problem is Example 1 and is representative of the general behaviour. . . . 61

3.2.6 Evolution of the density of the pattern computed for increasing number of levels. The test problem is Example 1. This is representative of the general behaviour. . . . 62

3.2.7 Geometric localization in the mesh for the large entries of A. The test problem is Example 1. This is representative of the general behaviour. . . . 63

3.2.8 Geometric localization in the mesh for the large entries of $A^{-1}$. The test problem is Example 1. This is representative of the general behaviour. . . . 64

3.2.9 Evolution of the density of the pattern computed for larger geometric neighbourhoods. The test problem is Example 1. This is representative of the general behaviour. . . . 64

3.2.10 Mesh of Example 2. . . . 66

3.3.11 Nonzero pattern for $A^{-1}$ when the smallest entries are discarded. The test problem is Example 5. . . . 67

3.3.12 Sparsity pattern of the inverse of sparse A associated with Example 1. The pattern has been sparsified with the same value of the threshold used for the sparsification displayed in Figure 3.3.11. . . . 68

3.3.13 CPU time for the construction of the preconditioner using a different number of nonzeros in the patterns for A and M. The test problem is Example 1. This is representative of the other examples. . . . 69

3.4.14 Eigenvalue distribution for the coefficient matrix preconditioned by using a single density strategy on Example 2. . . . 73

3.4.15 Eigenvalue distribution for the coefficient matrix preconditioned by using a multiple density strategy on Example 2. . . . 74

5.1.1 Interactions in the one-level FMM. For each leaf-box, the interactions with the gray neighbouring leaf-boxes are computed directly. The contributions of far-away cubes are computed approximately. The multipole expansions of far-away boxes are translated to local expansions for the leaf-box; these contributions are summed together and the total field induced by far-away cubes is evaluated from local expansions. . . . 94

5.1.2 The oct-tree in the FMM algorithm. The maximum number of children is eight. The actual number corresponds to the subset of eight that intersect the object (courtesy of G. Sylvand, INRIA CERMICS). . . . 95

5.1.3 Interactions in the multilevel FMM. The interactions for the gray boxes are computed directly. We denote by dashed lines the interaction list for the observation box, which consists of those cubes that are not neighbours of the cube itself but whose parent is a neighbour of the cube's parent. The interactions of the cubes in the list are computed using the FMM. All the other interactions are computed hierarchically on a coarser level, denoted by solid lines. . . . 96

5.3.4 Mesh associated with the Airbus aircraft (courtesy of EADS). The surface is discretized by 15784 triangles. . . . 97

5.3.5 The RCS curve for an Airbus aircraft discretized with 200000 unknowns. The problem is formulated using the EFIE formulation and a tolerance of $2 \cdot 10^{-2}$ in the iterative solution. The quantity reported on the ordinate axis indicates the value of the energy radiated back at different incidence angles. . . . 101

5.3.6 The RCS curve for an Airbus aircraft discretized with 200000 unknowns. The problem is formulated using the CFIE formulation and a tolerance of $\cdot 10^{-6}$ in the iterative solution. The quantity reported on the ordinate axis indicates the value of the energy radiated back at different incidence angles. . . . 101

5.3.7 Effect of the restart parameter on GMRES stagnation on an aircraft with 94704 unknowns. . . . 102

5.4.8 Inner-outer solution schemes in the FMM context. Sketch of the algorithm. . . . 104

5.4.9 Convergence history of restarted GMRES for different values of restart on an aircraft with 94704 unknowns. . . . 106

5.4.10 Effect of the restart parameter on FGMRES stagnation on an aircraft with 94704 unknowns using GMRES(20) as inner solver. . . . 108

6.2.1 Eigenvalue distribution for the coefficient matrix preconditioned by the Frobenius-norm minimization method on Example 2. . . . 115

6.2.2 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 1. . . . 119

6.2.3 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 2. . . . 120

6.2.4 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 3. . . . 121

6.2.5 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 4. . . . 122

6.2.6 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 5. . . . 122

6.2.7 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for three choices of restart and increasing size of the coarse space on Example 1. . . . 123

6.2.8 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for three choices of restart and increasing size of the coarse space on Example 2. . . . 124

6.2.9 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for three choices of restart and increasing size of the coarse space on Example 3. . . . 124

6.2.10 Eigenvalue distribution for the coefficient matrix preconditioned by a Frobenius-norm minimization method on Example 2. The same sparsity pattern is used for A and for the preconditioner. . . . 133

6.2.11 Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 1. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is used for A and $M_1$. . . . 133

6.2.12 Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 2. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is used for A and $M_1$. . . . 134

6.2.13 Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 3. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is used for A and $M_1$. . . . 134

6.2.14 Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 4. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is used for A and $M_1$. . . . 135

6.2.15 Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space on Example 1. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates. . . . 136

6.2.16 Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space on Example 2. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates. . . . 137

6.2.17 Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space on Example 3. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates. . . . 137

6.2.18 Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space on Example 4. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates. . . . 138

6.2.19 Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space on Example 5. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates. . . . 138

6.3.20 Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing number of corrections on Example 1. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates. The preconditioner is updated in multiplicative form. . . . 141

6.3.21 Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 3. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates. The preconditioner is updated in multiplicative form. . . . 142

6.3.22 Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 4. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates. The preconditioner is updated in multiplicative form. . . . 142


Chapter 1

Introduction

This thesis considers the problem of designing effective preconditioning strategies for the iterative solution of boundary integral equations in electromagnetism. An accurate numerical solution of these problems is required in the simulation of many industrial processes, such as the prediction of the Radar Cross Section (RCS) of arbitrarily shaped 3D objects like aircraft, the analysis of the electromagnetic compatibility of electrical devices with their environment, and many others. In the last 20 years, owing to the impressive development in computer technology and to the introduction of fast methods which require less computational cost and memory resources, a rigorous numerical solution of many of these applications has become possible [29]. Nowadays challenging problems in an industrial setting demand a continuous reduction in the computational complexity of the numerical methods employed; the aim of this research is to investigate the use of sparse linear algebra techniques (with particular emphasis on preconditioning) for the solution of dense linear systems of equations arising from scattering problems expressed in an integral formulation.

In this chapter, we illustrate the motivation of our research and present the major topics discussed in the thesis. In Section 1.1, we describe the physical problem we are interested in and give some examples of applications. In Section 1.2, we formulate the mathematical problem and, in Section 1.3, we overview some of the principal approaches generally used to solve scattering problems. Finally, in Section 1.4, we discuss direct and iterative solution strategies and introduce some issues relevant to the design of the preconditioner.


1.1 The physical problem and applications

Electromagnetic scattering problems address the physical issue of detecting the diffraction pattern of the electromagnetic radiation scattered from a large and complex body when it is illuminated by an incident wave. A good understanding of these phenomena is crucial to the design of many industrial devices like radars, antennae, computer microprocessors, optical fibre systems, cellular telephones, transistors, modems, and so on. Electronic circuits produce and are subject to electromagnetic interference, and reducing radiation and signal distortion have become two major issues in the design of modern electronic devices. The increase of currents and frequencies in industrial simulations makes electromagnetic compatibility requirements more difficult to meet and demands an accurate analysis prior to the design phase.

The study of electromagnetic scattering is required in radar applications, where a target is illuminated by incident radiation and the energy radiated back to the radar is analysed to retrieve information on the target. In fact, the amount of radiated energy depends on the radar cross-section of the target, on its shape, on the material of which it is composed, and on the wavelength of the incident radiation. Radar measurements are vital for estimating surface currents in oceanography, for mapping precipitation areas and detecting wind direction and speed in meteorological and climatic studies, as well as in the production of accurate weather forecasts, geophysical prospecting from remote sensing data, wireless communication and bioelectromagnetics. In particular, the computation of the radar cross-section is used to identify unknown targets as well as to design stealth technology.

Modern targets reduce their observability features by using new materials. Engineers design, develop and test absorbing materials which can control radiation, reduce the signatures of military systems, preserve electromagnetic compatibility with other devices, and isolate recording studios and listening rooms. A good knowledge of the electromagnetic properties of materials can be critical for economic competitiveness and technological advances in many industrial sectors. All these simulations can be very demanding in terms of computer resources; they require innovative algorithms and the use of high performance computers to afford a rigorous numerical solution.

1.2 The mathematical problem

The mathematical formulation of scattering problems relies on Maxwell's equations, originally introduced by James Maxwell in 1864 in the article A Dynamical Theory of the Electromagnetic Field [103] as 20 scalar equations. Maxwell's equations were reformulated in the 1880s as a set of four vector differential equations, describing the time and space evolution of the electric and the magnetic field around the scatterer. They are:

$$
\begin{cases}
\nabla \times H = J + \dfrac{\partial D}{\partial t}, \\
\nabla \times E = -\dfrac{\partial B}{\partial t}, \\
\nabla \cdot D = \rho, \\
\nabla \cdot B = 0.
\end{cases}
\tag{1.2.1}
$$

The vector fields which appear in (1.2.1) are the electric field E(x, t), the magnetic field H(x, t), the magnetic flux density B(x, t) and the electric flux density D(x, t). Equations (1.2.1) also involve the current density J(x, t) and the charge density $\rho(x, t)$. Given a vector field A represented in Cartesian coordinates in the form $A(x, y, z) = A_x(x, y, z)\,\mathbf{i} + A_y(x, y, z)\,\mathbf{j} + A_z(x, y, z)\,\mathbf{k}$, the components of the curl operator $\nabla \times A$ are

$$
\begin{cases}
(\nabla \times A)_x = \dfrac{\partial A_z}{\partial y} - \dfrac{\partial A_y}{\partial z}, \\
(\nabla \times A)_y = \dfrac{\partial A_x}{\partial z} - \dfrac{\partial A_z}{\partial x}, \\
(\nabla \times A)_z = \dfrac{\partial A_y}{\partial x} - \dfrac{\partial A_x}{\partial y}.
\end{cases}
$$

The divergence operator $\nabla \cdot A$ in Cartesian coordinates is

$$
\nabla \cdot A = \frac{\partial A_x}{\partial x} + \frac{\partial A_y}{\partial y} + \frac{\partial A_z}{\partial z}.
$$
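These definitions translate directly into code. The following sketch (illustrative only, not part of the thesis experiments) evaluates the curl and the divergence of a vector field sampled on a uniform 3D grid with central differences:

```python
import numpy as np

# Illustrative sketch: curl and divergence of a vector field sampled on a
# uniform grid with spacing h. Arrays are indexed as [x, y, z], and
# np.gradient(F, h, axis=k) approximates the partial derivative along axis k.
def curl(Ax, Ay, Az, h):
    cx = np.gradient(Az, h, axis=1) - np.gradient(Ay, h, axis=2)
    cy = np.gradient(Ax, h, axis=2) - np.gradient(Az, h, axis=0)
    cz = np.gradient(Ay, h, axis=0) - np.gradient(Ax, h, axis=1)
    return cx, cy, cz

def divergence(Ax, Ay, Az, h):
    return (np.gradient(Ax, h, axis=0)
            + np.gradient(Ay, h, axis=1)
            + np.gradient(Az, h, axis=2))
```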

The continuity equation, which expresses the conservation of charge, relates the quantities J and $\rho$:

$$
\frac{\partial \rho}{\partial t} + \nabla \cdot J = 0.
$$

In an isotropic conductor the current density is related to the electric field by Ohm's law:

$$
J = \sigma E,
$$

where $\sigma(x)$ is called the electric conductivity. If $\sigma$ is nonzero, the medium is called a conductor, whereas if $\sigma = 0$ the medium is referred to as a dielectric. Relations also exist between D and E, and between B and H; they are determined by the polarization and magnetization properties of the medium containing the scatterer. In a linear isotropic medium we have

$$
D = \epsilon E, \qquad B = \mu H,
$$

where the functions $\epsilon(x)$ and $\mu(x)$ are the electric permittivity and the magnetic permeability, respectively. In a vacuum D = E and B = H. This equality can be assumed valid, up to some approximation, when the medium is air. In this case, Maxwell's equations can be simplified and read:

$$
\begin{cases}
\nabla \times H = J + \dfrac{\partial E}{\partial t}, \\
\nabla \times E = -\dfrac{\partial H}{\partial t}, \\
\nabla \cdot E = \rho, \\
\nabla \cdot H = 0.
\end{cases}
\tag{1.2.2}
$$

Boundary conditions are associated with system (1.2.2) to describe different physical situations. For scattering from perfect conductors, which represents an important model problem in industrial simulations, the electric field vanishes inside the object and the total tangential electric field on the surface of the scatterer is zero. Absorbing radiation conditions at infinity are imposed, like the Silver-Müller radiation condition [25]

$$
\lim_{r \to \infty} \left( H^s \times x - r E^s \right) = 0
$$

uniformly in all directions $\hat{x} = x/|x|$, where $r = |x|$ and $H^s$ and $E^s$ are the scattered parts of the fields.

A further simplification comes when Maxwell's equations are formulated in the frequency domain rather than in the time domain. Since the sum of two solutions is still a solution, Fourier transformations can be introduced to remove the time dependency from system (1.2.2) and to write it as a set of several time-independent systems, each corresponding to one fixed value of the frequency. All the quantities in (1.2.2) are assumed to have harmonic behaviour in time, that is, they can be written in the form $A(x, t) = A(x)e^{i\omega t}$ ($\omega$ is a constant), and their time dependency is completely determined by the amplitude and relative phase. For a dielectric body the new system assumes the form:

$$
\begin{cases}
\nabla \times H = +i\omega E, \\
\nabla \times E = -i\omega H, \\
\nabla \cdot E = 0, \\
\nabla \cdot H = 0,
\end{cases}
\tag{1.2.3}
$$

where now E = E(x) and H = H(x). Here $\omega = ck = 2\pi c/\lambda$ is referred to as the angular frequency, k as the wave number and $\lambda$ as the wavelength of the electromagnetic wave. The constant c is the speed of light.
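To make the passage from (1.2.2) to (1.2.3) explicit (a standard step, spelled out here for completeness), substitute the harmonic ansatz into Faraday's law: the time derivative collapses into a multiplication by $i\omega$,

$$
\frac{\partial}{\partial t}\left[ H(x)e^{i\omega t} \right] = i\omega\, H(x)e^{i\omega t},
\qquad \text{so} \qquad
\nabla \times E = -\frac{\partial H}{\partial t}
\;\Longrightarrow\;
\nabla \times E(x) = -i\omega\, H(x),
$$

after cancelling the common factor $e^{i\omega t}$; the other equations of (1.2.3) follow in the same way.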



1.3 Numerical solution of Maxwell's equations

A popular solution approach eliminates the magnetic field H from (1.2.3) and obtains a vector Helmholtz equation with a divergence condition:

$$
\begin{cases}
\Delta E + k^2 E = 0, \\
\nabla \cdot E = 0.
\end{cases}
\tag{1.3.4}
$$

Systems of the form (1.3.4) are challenging to solve. An analytic solution can be computed when the geometry of the scatterer is very regular, as in the case of a sphere or a spheroid. More complicated boundaries require the use of numerical techniques.

Objects of interest in industrial applications generally have large dimension in terms of wavelength, and the computation of their scattering cross-section can be very demanding in terms of computer resources. Until the emergence of high-performance computers in the early eighties, the solution was afforded by using approximate high-frequency techniques such as the shooting and bouncing ray method (SBR) [101]. Basically, ray-based asymptotic methods like SBR and the uniform theory of diffraction rely on the idea that EM scattering becomes a localized phenomenon as the size of the scatterer increases with respect to the wavelength. In the last 20 years, the impressive advance in computer technology and the introduction of fast methods with lower computational and memory requirements have made a rigorous numerical solution affordable for many practical applications. Nowadays, computer scientists generally adopt two distinct approaches for the numerical solution, based on either differential or integral equation methods.

1.3.1 Differential equation methods

The first approach solves system (1.3.4) for the electric field surrounding the scatterer by differential equation methods. Classical discretization schemes like the finite-element method (FEM) [125, 145] or the finite-difference method (FDM) [99, 137] can be used to discretize the continuous model and give rise to a sparse linear system of equations. The domain outside the object is truncated and an artificial boundary is introduced to simulate an infinite volume [20, 83, 85]. Absorbing boundary conditions do not alter the sparsity structure of the matrix from the discretization but have to be imposed at some distance from the scatterer. More accurate exterior boundary conditions, based on integral equations, allow us to bring the exterior boundary of the simulation region closer to the surface of the scatterer and to limit the size of the linear system to solve [89, 104]. As they are based on integral equations, they result in a part of the matrix being dense in the final system, which can increase the overall solution cost.

The discretization of large 3D domains may suffer from grid dispersion errors, which occur when a wave has a different phase velocity on the grid compared to the exact solution [9, 90, 100]. Grid dispersion errors accumulate in space and, for 2D and 3D problems over large simulation regions, their effect can be troublesome, introducing spurious solutions in the computation. The effect of grid dispersion errors can be reduced by using finer grids or higher-order accurate differential equation solvers, which substantially increase the problem size, or by coupling the differential equation solver with an integral equation solver.

Because of the sparsity structure of the discretization matrix, differential equation methods have become popular solution methods for EM problems.

1.3.2 Integral equation methods

An alternative class of methods is represented by integral equation solvers. Using the equivalence principle, system (1.3.4) can be recast in the form of four integral equations which relate the electric and magnetic fields E and H to the equivalent electric and magnetic currents J and M on the surface of the object. Integral equation methods solve for the induced currents globally, whereas differential equation methods solve for the fields. The electric-field integral equation (EFIE) expresses the electric field E outside the object in terms of the induced current J. In the case of harmonic time dependency it reads

$$
E(x) = -\int_{\Gamma_c} \nabla G(x, x')\,\rho(x')\,d^3x' \;-\; ik \int_{\Gamma} G(x, x')\,J(x')\,d^3x' \;+\; E^E(x), \tag{1.3.5}
$$

where $E^E$ is the electric field due to external sources, and G is the Green's function for scattering problems:

$$
G(x, x') = \frac{e^{-ik|x - x'|}}{|x - x'|}.
$$

The EFIE provides a first-kind integral equation, which is well known to be ill-conditioned, but it is the only integral formulation that can be used for open targets. Another formulation, referred to as the magnetic-field integral equation (MFIE), expresses the magnetic field outside the object in terms of the induced current. Both formulations suffer from interior resonances, which can make the numerical solution more problematic at some frequencies, known as resonant frequencies. The problem of interior resonances is particularly troubling for large objects. A possible remedy is to combine linearly the EFIE and MFIE formulations. The resulting equation, known as the combined-field integral equation (CFIE), does not suffer from internal resonance and is much better conditioned, as it generally provides an integral equation of the second kind, but it can be used only for closed targets. Owing to these nice properties, the use of the CFIE formulation is considered mandatory for closed surfaces.

The resulting EFIE, MFIE and CFIE are converted into matrix equations by the Method of Moments [86]. The unknown current J(x) on the surface of the object is expanded into a set of basis functions $B_i$, $i = 1, 2, \ldots, N$:

$$
J(x) = \sum_{i=1}^{N} J_i B_i(x).
$$

This expansion is introduced into (1.3.5), and the discretized equation is applied to a set of test functions. A linear system of equations is finally obtained, whose unknowns are the coefficients of the expansion. The entries of the coefficient matrix are expressed in terms of surface integrals and assume the simplified form

$$
A_{KL} = \int\!\!\int G(x, y)\, B_K(x) \cdot B_L(y)\, dL(y)\, dK(x). \tag{1.3.6}
$$

When m-point Gauss quadrature formulae are used to compute the surface integrals in (1.3.6), the entries of the coefficient matrix have the form

$$
A_{KL} = \sum_{i=1}^{m} \sum_{j=1}^{m} \omega_i \omega_j\, G(x_{K_i}, y_{L_j})\, B_K(x_{K_i}) \cdot B_L(y_{L_j}).
$$
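As an illustration of how such an assembly is typically organized, consider the following minimal sketch. It is not the CESC code used in this thesis: it assumes one vector basis function per panel with precomputed quadrature points, weights and basis values (the names `quad_pts`, `quad_w`, `basis` are illustrative), and it leaves out the singular self-interaction terms, which require dedicated integration rules.

```python
import numpy as np

def green(x, y, k):
    # Helmholtz Green's function e^{-ik|x-y|} / |x-y|, as defined above.
    r = np.linalg.norm(x - y)
    return np.exp(-1j * k * r) / r

def assemble(quad_pts, quad_w, basis, k):
    # quad_pts[K]: (m, 3) quadrature points associated with basis function K
    # quad_w[K]:   (m,)   quadrature weights
    # basis[K]:    (m, 3) values of the vector basis function B_K
    n = len(quad_pts)
    A = np.zeros((n, n), dtype=complex)
    for K in range(n):
        for L in range(n):
            if K == L:
                continue  # singular integral: needs special treatment
            acc = 0j
            for i, wi in enumerate(quad_w[K]):
                for j, wj in enumerate(quad_w[L]):
                    acc += (wi * wj
                            * green(quad_pts[K][i], quad_pts[L][j], k)
                            * np.dot(basis[K][i], basis[L][j]))
            A[K, L] = acc
    return A
```

Every pair (K, L) contributes a nonzero value, which is why the resulting matrix is dense.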

The resulting linear system is dense and complex: unsymmetric in the case of the MFIE and CFIE, symmetric but non-Hermitian in the case of the EFIE formulation.

For homogeneous or layered homogeneous dielectric bodies, integral equations are discretized on the surface of the object or at the discontinuous interfaces between two different materials. Thus the number of unknowns is generally much smaller when compared to the discretization of large 3D spaces by finite-difference or finite-element methods. However, the global coupling of the induced currents in the problem results in dense matrices. The cost of the solution associated with these dense matrices has for a long time precluded the popularity of integral solution methods in EM. In recent years, their application in the context of the study of radar targets of different materials and the availability of larger computer resources have motivated an increasing interest towards integral methods.

Throughout this thesis, we focus on preconditioning strategies for the EFIE formulation of scattering problems. In the integral equation context that we consider, the problems are discretized by the Method of Moments using the Rao-Wilton-Glisson (RWG) basis functions [116]. The surface of the object is modelled by a triangular faceted mesh (see Figure 1.3.1), and each RWG basis function is assigned to one interior edge in the mesh. Each unknown in the problem represents the vectorial flux across one edge in the triangular mesh. The total number of unknowns is given by the number of interior edges, which is about one and a half times the number of triangular facets. In order to have a correct approximation to the oscillating solution of Maxwell's equations, physical constraints impose that the average edge length a has to be between $0.1\lambda$ and $0.2\lambda$, where $\lambda$ is the wavelength of the incoming wave [11]. Two factors mainly affect the dimension N of the linear system to solve, namely the total surface area and the frequency of the problem. For a given target the size of the system is proportional to the square of the frequency, and the memory cost for the storage of the $N^2$ complex numbers of the full discretization matrix is proportional to the fourth power of the frequency. This cost increases drastically when a fine discretization is required, as is the case for rough geometries, and can make the numerical solution of medium-size problems unaffordable even on modern computers. Nowadays a typical electromagnetic problem in industry can have hundreds of thousands or a few million unknowns.
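The scaling is easy to quantify. With the mesh constraint $a \approx 0.1\lambda$ and $\lambda = c/f$, the number of interior edges covering a fixed surface area S grows like $S/a^2$; hence (a back-of-the-envelope illustration, assuming double-precision complex storage of 16 bytes per entry):

$$
N \;\propto\; \frac{S}{a^2} \;\propto\; \frac{S}{\lambda^2} \;\propto\; S f^2,
\qquad
\text{storage} \;=\; 16\,N^2 \ \text{bytes} \;\propto\; f^4 .
$$

Doubling the frequency therefore multiplies the number of unknowns by four and the storage for the dense matrix by sixteen.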

Figure 1.3.1: Example of discretized mesh.

1.4 Direct versus iterative solution methods

Direct methods are often the method of choice for the solution of these systems in an industrial environment because they are reliable and predictable both in terms of accuracy and cost. Dense linear algebra packages such as LAPACK [5] provide reliable implementations of the LU factorization attaining good performance on modern computer architectures. In particular, they use Level 3 BLAS [51, 52] for block operations, which enables us to exploit data locality in the cache memory. Except when the geometries are very irregular, the coefficient matrices of the discretized problem are not very ill-conditioned, and direct methods compute fairly accurate solutions. The factorization can be performed once and then reused to compute the solution for all excitations. In industrial simulations, objects are illuminated at several, slightly different incidence directions, and hundreds of thousands of systems often have to be solved for the same application, all having the same coefficient matrix and a different right-hand side.

For the solution of large-scale problems, direct methods become impractical even on large parallel platforms because they require the storage of $N^2$ single or double precision complex entries of the coefficient matrix and $O(N^3)$ floating-point operations to compute the factorization, where N denotes the size of the linear system. Some direct solvers with reduced computational complexity have been introduced for the case when the solution is sought for blocks of right-hand sides, like the EADS out-of-core parallel solver [1], the Nested Equivalence Principle Algorithm (NEPAL) [30, 31] and the Recursive Aggregate T-Matrix Algorithm (RATMA) [31, 32], but the computational cost remains a bottleneck for large-scale applications. Although, in the last twenty years, computer technology has gone from flops to Gigaflops, that is a speedup factor of $10^9$, the size of the largest dense problems solved on current architectures has increased by only a factor of three [56, 57].
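A rough count makes the point. As an illustrative estimate only, assume 16 bytes per double-precision complex entry and the classical $\frac{2}{3}N^3$ operation count of an LU factorization (complex arithmetic adds a constant factor): for a moderately large problem with $N = 10^5$,

$$
16\,N^2 = 1.6 \times 10^{11} \ \text{bytes} \approx 160\ \text{GB of storage},
\qquad
\tfrac{2}{3}N^3 \approx 6.7 \times 10^{14} \ \text{operations}.
$$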

1.4.1 A sparse approach for solving scattering problems

It can be argued that all large dense matrices hide some structure behind their $N^2$ entries. The structure sometimes emerges naturally at the matrix level (Toeplitz, circulant, orthogonal matrices) and sometimes can be identified from the origin of the problem. When the number of unknowns is large, the discretized problem reflects more closely the properties of the continuous problem, and the entries of the discretization matrix are far from arbitrary. Exploiting this structure can enable the use of sparse linear algebra techniques and lead to a significant reduction of the overall solution cost. The use of iterative methods can be promising from this viewpoint because they simply require a routine to compute matrix-vector products and do not need knowledge of all the entries of the coefficient matrix. Special properties of the problem can be profitably used to reduce the computational cost of this procedure. Under favourable conditions, iterative methods improve the approximate solution at each step. When the required accuracy is obtained, one can stop the iteration.

In the last decades, active research efforts have been devoted to understanding the theoretical and numerical properties of modern iterative solvers. Although they still cannot compete with direct solvers in terms of robustness, they have been successfully used in many contexts. In particular, it is now established that iterative solvers have to be used with some form of preconditioning to be effective on challenging problems, like those arising in industry (see, for instance, [2, 41, 60, 146]). Provided we have fast matrix-vector multiplications and robust preconditioners, the iterative solution via modern Krylov solvers can be an alternative to direct methods.

There are active research efforts on fast methods [4, 82] to perform matrix-vector products with $O(N \log N)$ computational complexity. These methods, generally referred to as hierarchical methods, were introduced originally in the context of the study of particle simulations as a way to reduce costs and enable the solution of large problems, or to demand more accuracy in the computation [6, 8]. Hierarchical methods can be effective on boundary element applications, and many research efforts have been successful in this direction, including strategies for parallel distributed memory implementations [45, 46, 47, 79, 80].

In this thesis, we focus on the other key component of Krylov methods in this context; that is, we study the design of robust preconditioning techniques. The design of the preconditioner is generally very problem-dependent and can take great advantage of a good knowledge of the underlying physical problem. General purpose preconditioners can fail on specific classes of problems, and for some of them a good preconditioner is not known yet. A preconditioner M is required to be a good approximation of A in some sense (or of $A^{-1}$, depending on the context), to be easy to compute, and cheap to store and to apply. For electromagnetic scattering problems expressed in integral formulation, some special constraints are required in addition to the usual ones. For large problems the use of fast methods is mandatory for the matrix-vector products. When fast methods are used, the coefficient matrix is not completely stored in memory and only some of the entries, corresponding to the near-field interactions, are explicitly computed and available for the construction of the preconditioner. Hierarchical methods are often implemented in parallel, partitioning the domain among different processors, and the matrix-vector products are computed in a distributed manner, trying to meet the goals of both load balancing and reduced communication. Thus, parallelism is a relevant factor to consider in the design of the preconditioner. Nowadays the typical problem size in the electromagnetic industry is continually increasing, and the effectiveness of preconditioned Krylov subspace solvers should be combined with the property of numerical scalability; that is, the numerical behaviour of the preconditioner should not depend on the mesh size or on the frequency of the problem. Finally, matrices arising from the discretization of integral equations can be highly indefinite, and many standard preconditioners can exhibit surprisingly poor performance.

This manuscript is structured as follows. In Chapter 2, we establish the need for preconditioning linear systems of equations which arise from the discretization of boundary integral equations in electromagnetism, and we test and compare several standard preconditioners computed from a sparse approximation of the dense coefficient matrix. We study their numerical behaviour on a set of model problems arising from both academic and industrial applications, and gain some insight on potential causes of failure. In Chapter 3, we focus our analysis on sparse approximate inverse methods and we propose some efficient static nonzero pattern selection strategies for the construction of a robust Frobenius-norm minimization preconditioner in electromagnetism. We introduce suitable strategies to identify the relevant entries to consider in the original matrix A, as well as an appropriate sparsity structure for the approximate inverse. In Chapter 4, we illustrate the numerical and computational efficiency of the proposed preconditioner on a set of model problems, and we complete the study considering two symmetric preconditioners based on Frobenius-norm minimization. In Chapter 5, we consider the implementation of the Frobenius-norm minimization preconditioner within the code that implements the Fast Multipole Method (FMM). We combine the sparse approximate inverse preconditioner with fast multipole techniques for the solution of huge electromagnetic problems. We study the numerical and parallel scalability of the implementation and we investigate the numerical behaviour of inner-outer iterative solution schemes implemented in a multipole context with different levels of accuracy for the matrix-vector products in the inner and outer loops. In Chapter 6, we introduce an algebraic multilevel strategy based on low-rank updates for the preconditioner, computed by using spectral information of the preconditioned matrix. We illustrate the computational and numerical efficiency of the algorithm on a set of model problems that is representative of real electromagnetic calculations. We finally draw some conclusions arising from the work and address perspectives for future research.




Chapter 2

Iterative solution via preconditioned Krylov solvers of dense systems in electromagnetism

In this chapter we establish the need for preconditioning linear systems of equations which arise from the discretization of boundary integral equations in electromagnetism. In Section 2.1, we illustrate the numerical behaviour of iterative Krylov solvers on a set of model problems arising both from industrial and from academic applications. The numerical results suggest the need for preconditioning to effectively reduce the number of iterations required to obtain convergence. In Section 2.2, we introduce the idea of preconditioning based on sparsification strategies, and we test and compare several standard preconditioners computed from a sparse approximation of the dense coefficient matrix. We study their numerical behaviour on model problems and gain some insight on potential causes of failure.

2.1 Introduction and motivation

In this section we study the numerical behaviour of several iterative solvers for the solution of linear systems of the form

$$
Ax = b \tag{2.1.1}
$$

where the coefficient matrix A arises from the discretization of boundary integral equations in electromagnetism. Among different integral


formulations, here we focus on the EFIE formulation (1.3.5), because it is more general and more difficult to solve. We use the following Krylov methods:

• restarted GMRES [123];
• Bi-CGSTAB [142] and Bi-CGSTAB(2) [129];
• symmetric [69], nonsymmetric [67] and transpose-free QMR [66];
• CGS [131].

As a set of model problems for the numerical experiments we consider the following geometries, arising both from academic and from industrial applications, which are representative of the general numerical behaviour observed. For physical consistency we have set the frequency of the wave so that there are about ten discretization points per wavelength [11].

Example 1: a cylinder with a hollow inside, a matrix of order n = 1080, see Figure 2.1.1(a);

Example 2: a cylinder with a break on the surface, a matrix of order n = 1299, see Figure 2.1.1(b);

Example 3: a satellite, a matrix of order n = 1701, see Figure 2.1.1(c);

Example 4: a parallelopiped, a matrix of order n = 2016, see Figure 2.1.1(d); and

Example 5: a sphere, a matrix of order n = 2430, see Figure 2.1.1(e).

The first three examples are considered because they can be representative of real industrial simulations. The geometries of Examples 4 and 5 are very regular, and they are mainly introduced to study the numerical behaviour of the proposed methods on smooth surfaces. In spite of their small dimension, these problems are not easy to solve. Except for two of the model problems, the sphere and the parallelopiped, the problems are tough because their geometries have open surfaces. Larger problems will be examined in Chapter 5 when we consider the multipole method.


(a) Example 1   (b) Example 2   (c) Example 3   (d) Example 4   (e) Example 5

Figure 2.1.1: Meshes associated with test examples.


16 2. Iterative solution via preconditioned Krylov solvers ...<br />

Table 2.1.1 shows the number of matrix-vector products needed by each of the solvers to reduce the residual by a factor of 10^{-5}. This tolerance is adequate for engineering purposes, as it makes it possible to localize fairly accurately the distribution of the currents on the surface of the object. In each case, we take as initial guess x_0 = 0, and choose the right-hand side so that the exact solution of the system is known. In the GMRES code [63] and the symmetric QMR code [62] (referred to as SQMR in the forthcoming tables), iterations are stopped when, for the current approximation x_m, the computed value of

    ‖r_m‖_2 / (α ‖x_m‖_2 + β)

satisfies a fixed tolerance. Here r_m = b − A x_m is the residual vector, and the standard choices for the constants α and β in backward error analysis are α = ‖A‖_2 and β = ‖b‖_2. In all our tests we use α = 0 and β = ‖b‖_2 = ‖r_0‖_2, because of the zero initial guess. For CGS and Bi-CGSTAB, we use the implementations provided by the HSL 2000 [87] subroutines MI06 and MI03 respectively, suitably adapted to complex arithmetic. These routines accept the current approximation x_m when

    ‖b − A x_m‖_2 ≤ max(‖b − A x_0‖_2 · ε_1, ε_2),

where ε_1 and ε_2 are user-defined tolerances. In our case we take ε_1 equal to the required accuracy, and ε_2 = 0.0. For Bi-CGSTAB(2) we use the implementation of the Bi-CGSTAB(l) algorithm developed by D. Fokkema, which introduces some enhancements to improve stability and robustness, as explained in [127] and [128]. The algorithm stops the iterations when the relative residual norm ‖r_n‖_2/‖r_0‖_2 becomes smaller than a fixed tolerance. In the tests with nonsymmetric QMR (referred to as UQMR in the forthcoming tables) and TFQMR, we use, respectively, the ZUCPL and ZUTFX routines provided in QMRPACK [70]. In particular, ZUCPL implements a double complex nonsymmetric QMR algorithm based on the coupled two-term look-ahead Lanczos variant (see [68]). Both ZUCPL and ZUTFX stop the iterations when the relative residual norm ‖r_n‖_2/‖r_0‖_2 becomes smaller than a fixed tolerance. Notice that, since x_0 = 0, all the stopping criteria are equivalent, allowing a fair comparison among all these methods. All the numerical experiments reported in this section correspond to runs on a Sun workstation in double complex arithmetic, and Level 2 BLAS operations are used to carry out the dense matrix-vector products. In connection with GMRES, we test different values of the restart m, from 10 up to 110. We recall that each iteration involves one matrix-vector product for restarted GMRES and SQMR, two for Bi-CGSTAB and CGS, three for UQMR and four for TFQMR.
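For concreteness, the stopping test used for GMRES and SQMR above can be written as a small helper; the sketch below is our own illustration in Python with NumPy, not code from the thesis, and reduces to the relative residual norm for the choices α = 0 and β = ‖b‖_2:

```python
import numpy as np

def backward_error(A, b, x_m, alpha=0.0, beta=None):
    """Normwise backward error ||b - A x_m||_2 / (alpha*||x_m||_2 + beta).

    With alpha = 0 and beta = ||b||_2 (legitimate here because x_0 = 0,
    so ||b||_2 = ||r_0||_2), this is the relative residual norm.
    """
    if beta is None:
        beta = np.linalg.norm(b)
    r = b - A @ x_m
    return np.linalg.norm(r) / (alpha * np.linalg.norm(x_m) + beta)

# Iterations stop when backward_error(A, b, x_m) <= 1e-5.
```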



Example   Size    GMRES(10)  GMRES(30)  GMRES(50)  GMRES(80)  GMRES(110)  Bi-CGStab
   1      1080      +1000      +1000      600        255         204         445
   2      1299      +1000        826      589        403         292         717
   3      1701      +1000        824      651        556         493       +1000
   4      2016        426        232      195        160         149         354
   5      2430      +1000        356      238        148         127         303

Example   Size    Bi-CGStab(2)   SQMR    UQMR    TFQMR    CGS
   1      1080        320         149     693      700     330
   2      1299        404         186   +1000      996     438
   3      1701        668         345   +1000    +1000     418
   4      2016        228          92     514      470     256
   5      2430        284          98     511      434     284

Table 2.1.1: Number of matrix-vector products needed by some unpreconditioned Krylov solvers to reduce the residual by a factor of 10^{-5}. An entry '+1000' indicates that the reduction was not achieved within 1000 matrix-vector products.

Except for SQMR, all the other solvers exhibit very slow convergence on the first three examples, which correspond to irregular geometries and are more difficult to solve. The last two examples are easier because the geometries are very regular; however, the iterative solution is still expensive in terms of the number of matrix-vector products. These experiments reveal the remarkable robustness of SQMR, which clearly outperforms the nonsymmetric solvers on all the test cases, even GMRES with large restarts. The results also reveal the good performance of Bi-CGSTAB(2) compared to the standard Bi-CGSTAB method, which generally requires at least one third more matrix-vector products to converge. On the most difficult problems, slow convergence is essentially due to the bad spectral properties of the coefficient matrix. Figure 2.1.2 plots the distribution of the eigenvalues in the complex plane for Example 3; the eigenvalues are scattered from the left to the right of the spectrum, many of them have a large negative real part, and no clustering appears. Such a distribution is not at all favourable for the rapid convergence of Krylov solvers.

Figure 2.1.2: Eigenvalue distribution in the complex plane of the coefficient matrix of Example 3.

Krylov methods look for the solution of the system in the Krylov space K_k(A, b) = span{b, Ab, A^2 b, ..., A^{k−1} b}. This is a good space from which to construct approximate solutions for a nonsingular linear system because it is intimately related to A^{−1}: the inverse of any nonsingular matrix A can be written in terms of powers of A with the help of the minimal polynomial of A.

The minimal polynomial q(t) of A is the unique monic polynomial of minimum degree such that q(A) = 0. If the minimal polynomial of A has degree m, then the solution of Ax = b lies in the space K_m(A, b). Consequently, the smaller the degree of the minimal polynomial, the faster the expected rate of convergence of a Krylov method (see [88]). If preconditioning A by a nonsingular matrix M causes the eigenvalues of M^{−1}A to fall into a few clusters, say t of them, whose diameters are small enough, then M^{−1}A behaves numerically like a matrix with t distinct eigenvalues. As a result, we would expect t iterations of a Krylov method to produce reasonably accurate approximations. It has been shown in [74, 122, 148] that in practice, with the availability of a high quality preconditioner, the choice of the Krylov subspace accelerator is not so critical.
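The link between the degree of the minimal polynomial and the iteration count can be checked numerically. The following sketch (a NumPy illustration of the argument above, not part of the thesis) constructs a diagonalizable matrix whose minimal polynomial has degree t = 5 and monitors the minimal residual over the growing Krylov spaces; the residual drops to roundoff level at the fifth step:

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 200, 5
eigs = np.repeat(np.arange(1.0, t + 1.0), n // t)  # exactly t distinct eigenvalues
X = rng.standard_normal((n, n))
A = X @ np.diag(eigs) @ np.linalg.inv(X)           # minimal polynomial of degree t
b = rng.standard_normal(n)

V = [b]
for k in range(1, t + 1):
    K = np.stack(V, axis=1)                  # basis of K_k(A, b)
    c, *_ = np.linalg.lstsq(A @ K, b, rcond=None)
    r = b - A @ (K @ c)                      # minimal residual over K_k(A, b)
    print(k, np.linalg.norm(r) / np.linalg.norm(b))
    V.append(A @ V[-1])
```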

2.2 Preconditioning based on sparsification strategies

A preconditioner M should satisfy the following demands:

• M is a good approximation to A in some sense (sometimes to A^{−1}, depending on the context);

• the construction and storage of M are not expensive;

• the system Mx = b is much easier to solve than the original one.

The transformed preconditioned system has the form M^{−1}Ax = M^{−1}b if preconditioning from the left, and AM^{−1}y = b, with x = M^{−1}y, when preconditioning from the right. For a preconditioner M given in the form M = M_1 M_2, it is also possible to consider the two-sided preconditioned system M_1^{−1} A M_2^{−1} z = M_1^{−1} b, with x = M_2^{−1} z.

Most of the existing preconditioners can be divided into either implicit or explicit form. A preconditioner is said to be of implicit form if its application, within each step of an iterative method, requires the solution of a linear system; it is implicitly defined by any nonsingular matrix M ≈ A. The most important example of this class is represented by incomplete factorization methods, where M is implicitly defined by M = L̄Ū, with L̄ and Ū triangular matrices that approximate the exact L and U factors of a standard factorization of A, according to some dropping strategy adopted during the factorization. It is well known that these methods are sensitive to indefiniteness in the coefficient matrix A and can lead to unstable triangular solves and very poor preconditioners (see [34]). Another important drawback of ILU techniques is that they are not naturally suited to parallel implementation, since the sparse triangular solves can lead to a severe degradation of performance on vector and parallel machines.

Explicit preconditioning techniques try to mitigate these difficulties. They directly approximate A^{−1} by a product M of sparse matrices, so that the preconditioning operation reduces to forming one or more matrix-vector products. Consequently, the application of the preconditioner should be easier to parallelize, with different strategies depending on the particular architecture. In addition, some of these techniques can also perform the construction phase in parallel. On certain indefinite problems with large nonsymmetric parts, these methods have provided better results than techniques based on incomplete factorizations (see [35]), representing an efficient alternative for the solution of difficult applications. A comparison of approximate inverse and ILU preconditioners can be found in [76].

In the next sections, we study the numerical behaviour of several standard preconditioners, both of implicit and of explicit form, in combination with Krylov methods for the solution of systems (2.1.1). All the preconditioners are computed from a sparse approximation of the dense coefficient matrix. On general problems, this approach can cause a severe deterioration of the quality of the preconditioner; in the BEM context, it is likely to be more effective, since a very sparse matrix can retain the most relevant contributions to the singular integrals. In Figure 2.2.3 we depict the pattern structure of the large entries in the discretization matrix for Example 5, which is representative of the general trend. Large to small entries are depicted in different colours, from red to green, yellow and blue. The picture shows that, in the discretization matrix, only a small set of entries generally have large magnitude. The largest entries are located on the main diagonal, and only a few adjacent bands have entries of high magnitude. Most of the remaining entries have much smaller modulus. In Figure 2.2.4, we plot for the same example the matrix obtained by scaling A = [a_ij] so that max_{i,j} |a_ij| = 1, and discarding from A all entries less than ε = 0.05 in modulus. This matrix is 98.5% sparse. The figure emphasizes the strong coupling among neighbouring edges introduced in the geometrical domain by the Boundary Element Method, and suggests the possibility of extracting a sparsity pattern from A by simply discarding elements of negligible magnitude, which correspond to weak coupling contributions between distant nodes.

Figure 2.2.3: Pattern structure of the large entries of A. The test problem is Example 5.

Figure 2.2.4: Nonzero pattern of A when the smallest entries are discarded. The test problem is Example 5.

The dropping operation is generally referred to as sparsification. The idea of sparsifying dense matrices before computing the preconditioner was introduced by Kolotilina [93] in the context of sparse approximate inverse methods. Alléon et al. [2], Chen [28] and Vavasis [144] used this idea for the preconditioning of dense systems from the discretization of boundary integral equations, and Tang and Wan [140] in the context of multigrid methods. Similar ideas are also exploited by Ruge and Stüben [118] in the context of algebraic multigrid methods. On sparse systems, sparsification can be helpful to identify the most relevant connections in the direct problem, especially when the coefficient matrix contains many small entries or is fairly dense (see [33] and [91]).

Several heuristics can be used to sparsify A while trying to retain the main contributions to the singular integrals. Some approaches are the following:

• find, in each column of A, the k entries of largest modulus, where k ≪ n is a positive integer. The choice of the parameter k is generally problem-dependent. The resulting matrix will have exactly k·n entries;

• for each column of A, select the row indices of the k largest entries in modulus and then, for each row index i corresponding to one of these entries, perform the same search on column i. These new row indices are added to the previous ones to form the nonzero pattern of the column. This heuristic, referred to as neighbours of neighbours, is described in detail in [36];

• the same approach as in the previous heuristic, but performing more than one iteration, and halving the number of largest entries to be located at each iteration in order to preserve sparsity. In practice, two iterations are enough [2];

• scale A so that its largest entry has magnitude equal to 1, and retain in the pattern only the elements located in positions (i, j) such that |a_ij| > ε, where the threshold parameter ε ∈ (0, 1). This heuristic was proposed by Kolotilina in [93].

Combinations of these approaches can also be used. In the numerical experiments, the preconditioners considered are constructed from the sparse near-field approximation of A, computed by using the first heuristic; we will refer to this matrix as sparsified(A) and denote it by Ã. We symmetrize the pattern after computing it in order to preserve symmetry in Ã; a minimal sketch of this construction is given below.
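The sketch below is our own Python illustration (the function name is ours): it builds à by keeping the k largest entries in modulus of each column and then symmetrizing the pattern:

```python
import numpy as np
import scipy.sparse as sp

def sparsified(A, k):
    """Keep, in each column of the dense matrix A, its k entries of largest
    modulus, then symmetrize the nonzero pattern so that the sparse
    approximation of a symmetric A remains symmetric."""
    pattern = np.zeros(A.shape, dtype=bool)
    for j in range(A.shape[1]):
        idx = np.argsort(-np.abs(A[:, j]))[:k]  # k largest entries in modulus
        pattern[idx, j] = True
    pattern |= pattern.T                         # symmetrize the pattern
    return sp.csr_matrix(np.where(pattern, A, 0.0))
```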

We consider the following methods, implemented as right preconditioners:

• SSOR(ω), where ω is the relaxation parameter;

• IC(k), the incomplete Cholesky factorization technique with k levels of fill-in, i.e. taking for the factors a sparsity pattern based on position and prescribed in advance;

• AINV, the approximate inverse method introduced in [16], which uses a dropping strategy based on values;

• SPAI, a Frobenius-norm minimization technique with the adaptive strategy proposed by Gould and Scott [76] for the selection of the sparsity pattern of the preconditioner.

In order to illustrate the general behaviour of these preconditioners, we first show in Table 2.2.2 the number of iterations required to compute the solution of Example 1. All the preconditioners are computed using the same sparse approximation of the original matrix, and all have roughly the same number of nonzero entries. In the incomplete Cholesky factorization, no additional level of fill-in was allowed in the factors; with AINV, we selected a suitable dropping threshold (around 10^{-3}) to obtain the same density as the other methods; and finally, with SPAI, we chose a priori, for each column of M, the same fixed maximum number of nonzeros as in the computation of sparsified(A). In the SSOR method, we choose ω = 1. In Table 2.2.2 we give the number of iterations for both GMRES and SQMR, which also corresponds to the number of matrix-vector products, the most time-consuming part of the algorithms. In the following sections, we aim to understand the numerical behaviour of these methods on electromagnetic problems and to identify some potential causes of failure.

Example 1 - Density of à = 4% - Density of M = 4%

Precond.   GMRES(50)   GMRES(110)   GMRES(∞)   SQMR
None           –           204         139      142
Jacobi        465          174         134      142
SSOR          214          100         100      145
IC(0)          –            –          159       –
AINV           –            –           –        –
SPAI          336           79          79       *

Table 2.2.2: Number of iterations using both symmetric and unsymmetric preconditioned Krylov methods to reduce the normwise backward error by 10^{-5} on Example 1. The symbol '–' means that convergence was not obtained after 500 iterations. The symbol '*' means that the method is not applicable.

2.2.1 SSOR

The SSOR preconditioner is the most basic preconditioning method apart from a diagonal scaling. It is defined as

    M = (D + ωE) D^{−1} (D + ωE^T),

where E is the strictly lower triangular part of Ã, and D is the diagonal matrix whose nonzero entries are the diagonal entries of Ã. In the case ω = 1, D + E is the lower triangular part of Ã, including the diagonal, and D + E^T is its upper triangular part. We recall that à is symmetric, because A is symmetric and we use a symmetric pattern for the sparsification.
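As an illustration, the application of M^{−1} reduces to two sparse triangular solves and a diagonal scaling; a minimal Python sketch (ours, assuming à is stored in SciPy sparse format) is:

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def ssor_apply(A_tilde, omega, y):
    """Apply z = M^{-1} y for M = (D + omega*E) D^{-1} (D + omega*E^T),
    where E is the strictly lower triangular part of A_tilde and D is
    its diagonal."""
    d = A_tilde.diagonal()
    D = sp.diags(d)
    E = sp.tril(A_tilde, k=-1)
    w = spla.spsolve_triangular(sp.csr_matrix(D + omega * E), y, lower=True)
    w = d * w                                     # multiply by D
    z = spla.spsolve_triangular(sp.csr_matrix(D + omega * E.T), w, lower=False)
    return z
```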

In Table 2.2.3 we show the number of iterations required by different Krylov solvers preconditioned by SSOR to reduce the residual by a factor of 10^{-5}. For these experiments we use ω = 1 to compute the preconditioner, and we consider increasing values of the density of the matrix Ã. Although very cheap to compute, SSOR is not very robust. Increasing the density of the sparse approximation of A does not help to improve its performance, and indeed on some problems it behaves like a diagonal scaling (ω = 0). In Figures 2.2.5 and 2.2.6 we illustrate the sensitivity of the SQMR convergence to the parameter ω for Examples 1 and 4. When SSOR is used as a stationary iterative solver, the relaxation parameter ω must be selected in the interval [0, 2]. When SSOR is used as a preconditioner, the choice of ω may be less constraining; thus we also show experiments with values slightly larger than 2.0.


                      GMRES(m)
Density of Ã   m=10   m=30   m=50   m=80   m=110   Bi-CGStab   UQMR   SQMR   TFQMR

Example 1
  2%             –      –     213    145    103       310        –     149     –
  4%             –      –     214    139    100       297        –     145     –
  6%             –      –     224    136     98       317        –     149     –
  8%             –      –     216    127     95       307        –     149     –
 10%             –      –     202    126     94       360        –     151     –

Example 2
  2%             –     478    269    184    146        –         –     195     –
  4%             –      –     281    178    145       349        –     187     –
  6%             –      –     350    194    152        –         –     186     –
  8%             –      –     381    205    156        –         –     189     –
 10%             –      –     385    200    157       428        –     193     –

Example 3
  2%             –      –     411    314    245        –         –     419     –
  4%             –      –     405    306    233        –         –     420     –
  6%             –      –     406    306    231       486        –     412     –
  8%             –      –     405    303    228       498        –     421     –
 10%             –      –     406    302    229        –         –     326     –

Example 4
  2%            371    192    138    116     95       193       379     85    342
  4%            457    206    145    119     95       221       387     85    400
  6%            464    208    148    119     97       224       399     85    356
  8%            445    214    152    121     97       263       389     85    392
 10%            475    217    157    122     96       223       396     85    402

Example 5
  2%             –     327    208    152    125       371        –      67     –
  4%             –     436    272    192    160       471        –      67     –
  6%             –      –     333    217    184        –         –      68     –
  8%             –      –     381    231    191        –         –      68     –
 10%             –      –     423    242    195        –         –      73     –

Table 2.2.3: Number of iterations required by different Krylov solvers preconditioned by SSOR to reduce the residual by 10^{-5}. The symbol '–' means that convergence was not obtained after 500 iterations.

Figure 2.2.5: Sensitivity of SQMR convergence to the SSOR parameter ω for Example 1 (size 1080, density of sparsified(A) = 6%; SQMR iterations versus ω).

Figure 2.2.6: Sensitivity of SQMR convergence to the SSOR parameter ω for Example 4 (size 2016, density of sparsified(A) = 6%; SQMR iterations versus ω).

2.2.2 Incomplete Cholesky factorization

Incomplete factorization methods are one of the most natural ways to construct preconditioners of implicit type.

In the general nonsymmetric case, they start from a factorization method, such as an LU or Cholesky decomposition or even a QR factorization, that decomposes the matrix into a product of triangular factors, and modify it to reduce the construction cost. The basic idea is to keep the factors artificially sparse, for instance by dropping some elements in prescribed off-diagonal positions during the standard Gaussian elimination algorithm. It is well known that, even when the matrix is sparse, the triangular factors L and U, and similarly the unitary factor Q and the upper triangular factor R, can often be fairly dense. The preconditioning operation z = M^{−1}y is computed by solving the linear system L̄Ūz = y, where L̄ ≈ L and Ū ≈ U, which is performed in two distinct steps:

1. solve L̄w = y;
2. solve Ūz = w.

ILU preconditioners are amongst the most reliable in a general setting. Originally developed for sparse matrices, they can also be applied to dense systems by extracting a sparsity pattern in advance and performing the incomplete factorization on the sparsified matrix. This class has been intensively studied and successfully employed on a wide range of symmetric problems, providing a good balance between computational cost and reduction of the number of iterations (see [27] and [55]). Well known theoretical results on the existence and stability of the factorization can be proved for the class of M-matrices [105], and recent studies consider more general symmetric matrices, both structured and unstructured.


In this section, we consider the incomplete Cholesky factorization and denote it by IC. We assume that the standard IC factorization matrix M of à is given in the form

    M = LDL^T,    (2.2.2)

where D and L stand for, respectively, the diagonal matrix and the unit lower triangular matrix whose entries are computed by means of the algorithm given in Figure 2.2.7. The set F of fill-in entries to be kept is given by

    F = { (k, i) | lev(l_{k,i}) ≤ l },

where the integer l denotes a user-specified maximal level of fill-in. The level lev(l_{k,i}) of the coefficient l_{k,i} of L is defined by:

Initialization:   lev(l_{k,i}) = 0 if l_{k,i} ≠ 0 or k = i, and lev(l_{k,i}) = ∞ otherwise.

Factorization:    lev(l_{k,i}) = min { lev(l_{k,i}), lev(l_{i,j}) + lev(l_{k,j}) + 1 }.

The resulting preconditioner is usually denoted by IC(l). Alternative strategies that dynamically discard fill-in entries are summarized in [122].
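The level computation admits a compact, if inefficient, dense transcription; the sketch below is our own illustration with cubic cost (practical codes compute the levels symbolically during the elimination) and returns the positions kept by IC(l):

```python
import numpy as np

def fill_levels(pattern, l):
    """Compute lev(l_{k,i}) from a symmetric boolean nonzero pattern of
    A_tilde and return F = {(k, i) : lev(l_{k,i}) <= l} as a boolean
    lower triangular mask."""
    n = pattern.shape[0]
    INF = np.iinfo(np.int64).max // 2
    lev = np.where(pattern | np.eye(n, dtype=bool), 0, INF)
    for j in range(n - 1):                   # mirrors the elimination loops
        for i in range(j + 1, n):
            for k in range(i + 1, n):
                lev[k, i] = min(lev[k, i], lev[i, j] + lev[k, j] + 1)
    return np.tril(lev <= l)
```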

In Tables 2.2.4 to 2.2.8, we display the number of iterations using an incomplete Cholesky factorization preconditioner on the five model problems. In this and in the forthcoming tables, the symbol '–' means that convergence was not obtained after 500 iterations. We show results for increasing values of the density of the sparse approximation of A as well as various levels of fill-in. The general trend is that increasing the fill-in produces a much more robust preconditioner than applying IC(0) to a denser sparse approximation of the original matrix. Moreover, IC(l) with l ≥ 1 may deliver a good rate of convergence provided the coefficient matrix is not too sparse, as we get closer to the exact LDL^T factorization. However, on indefinite problems the numerical behaviour of IC can be fairly chaotic; this can be observed in Table 2.2.8 for Example 5. The factorization of a very sparse approximation (up to 2%) of the coefficient matrix can be stable and deliver a good rate of convergence, especially if at least one level of fill-in is retained. For higher values of the density of the approximation of A, the factors may become very ill-conditioned, and consequently the preconditioner is very poor. As shown in the tables, ill-conditioning of the factors is not related to ill-conditioning of the matrix Ã. This behaviour has already been observed on sparse real indefinite systems; see for instance [34].

As an attempt at a possible remedy, following [109, 110], we apply IC(l) to a perturbation of à by a complex diagonal matrix.


Compute D and L

Initialization phase:
    d_{i,i} = ã_{i,i},   i = 1, 2, ..., n
    l_{i,j} = ã_{i,j},   i = 2, ..., n,  j = 1, 2, ..., i−1

Incomplete factorization process:
    do j = 1, 2, ..., n−1
        do i = j+1, j+2, ..., n
            d_{i,i} = d_{i,i} − l_{i,j}^2 / d_{j,j}
            l_{i,j} = l_{i,j} / d_{j,j}
            do k = i+1, i+2, ..., n
                if (k, i) ∈ F then
                    l_{k,i} = l_{k,i} − l_{i,j} l_{k,j}
                end if
            end do
        end do
    end do

Figure 2.2.7: Incomplete factorization algorithm - M = LDL^T.

More specifically, we use

    Ã_τ = Ã + i τ h ∆_r,    (2.2.3)

where ∆_r = diag(Re(A)) = diag(Re(Ã)), τ stands for a nonnegative real parameter, and

    h = n^{−1/d} with d = 3 (the space dimension).    (2.2.4)

The intention is to move the eigenvalues of the preconditioned system along the imaginary axis and thus avoid a possible eigenvalue cluster close to zero.
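The construction of the shifted matrix (2.2.3) is straightforward; a short sketch (ours, with à stored as a SciPy sparse matrix) reads:

```python
import scipy.sparse as sp

def shifted(A_tilde, tau, d=3):
    """Return A_tau = A_tilde + 1j*tau*h*Delta_r, with h = n**(-1/d) and
    Delta_r the diagonal matrix of the real parts of diag(A_tilde)."""
    n = A_tilde.shape[0]
    h = n ** (-1.0 / d)
    delta_r = sp.diags(A_tilde.diagonal().real)
    return (A_tilde + 1j * tau * h * delta_r).tocsr()
```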

In Table 2.2.9, we show the number of SQMR iterations for different values of the shift parameter τ and various levels of fill-in in the preconditioner. The value of the shift is problem-dependent, and should be selected to strike a good balance between making the factorization process more stable and not perturbing the coefficient matrix significantly; a good value generally lies between 0 and 2. Although it is not easy to tune and its effect is difficult to predict, a small diagonal shift can help to compute a more stable factorization, and in some cases the performance of the preconditioner improves significantly.

In Figures 2.2.8, 2.2.9 and 2.2.10, we illustrate the effect of this shift strategy on the eigenvalue distribution of the preconditioned matrix.


Example 1

Density of Ã   κ∞(Ã)    IC(level)   Density of M   GMRES(30)   GMRES(50)   SQMR
   2%           50321    IC(0)          2.0%            –           –          –
                         IC(1)          4.5%            –           –          –
                         IC(2)          7.8%            –           –          –
   3%          120282    IC(0)          3.0%            –           –          –
                         IC(1)          7.5%            –           –          –
                         IC(2)         13.0%            –           –          –
   4%           29727    IC(0)          4.0%            –           –          –
                         IC(1)         11.9%            –           –          –
                         IC(2)         23.4%            –           –         194
   5%            5350    IC(0)          5.0%            –           –         398
                         IC(1)         16.9%            –           –         222
                         IC(2)         32.3%           310         100         86
   6%           12610    IC(0)          6.0%            –           –         296
                         IC(1)         21.7%            –           –         128
                         IC(2)         39.0%            81          46         45

Table 2.2.4: Number of iterations, varying the sparsity level of à and the level of fill-in, on Example 1.

For each value of the shift parameter τ, we display κ(L), the condition number (calculated using the LAPACK package) of the computed L factor, and the number of iterations required by SQMR. The eigenvalues are scattered all over the complex plane when no shift is used, whereas they look more clustered when a shift is applied. As we mentioned before, a clustered spectrum of the preconditioned matrix is usually considered a desirable property for the fast convergence of Krylov solvers. However, for incomplete factorizations, the condition number of the factors plays a more important role in the rate of convergence of the Krylov iterations than the eigenvalue distribution: if the triangular factors computed by the incomplete factorization process are very ill-conditioned, the long recurrences associated with the triangular solves are unstable, and the use of the preconditioner may be totally ineffective.
factorization process are very ill-conditioned, the long recurrences associated


Example 2

Density of Ã   κ∞(Ã)    IC(level)   Density of M   GMRES(30)   GMRES(50)   SQMR
   2%           14136    IC(0)          2.0%            –           –         168
                         IC(1)          4.1%            –           –         386
                         IC(2)          6.6%            –           –          –
   3%             998    IC(0)          3.0%            –           –         171
                         IC(1)          6.7%            84          76         35
                         IC(2)         11.5%            84          46         30
   4%             737    IC(0)          4.0%            –          327        121
                         IC(1)          9.9%            46          38         31
                         IC(2)         17.5%            32          31         25
   5%             647    IC(0)          5.0%            –           –         103
                         IC(1)         13.2%            41          36         31
                         IC(2)         23.4%            29          29         25
   6%             648    IC(0)          6.0%            –           –         143
                         IC(1)         15.9%            41          35         30
                         IC(2)         28.2%            28          29         23

Table 2.2.5: Number of iterations, varying the sparsity level of à and the level of fill-in, on Example 2.

An auto-tuned strategy might be designed, consisting of incrementing the value of the shift and computing a new incomplete factorization whenever the condition number of the current factor is too large. Although time consuming, such a strategy might construct a robust shifted IC factorization on highly indefinite problems.
on highly indefinite problems.


Example 3

Density of Ã   κ∞(Ã)    IC(level)   Density of M   GMRES(30)   GMRES(50)   SQMR
   2%           33348    IC(0)          2.0%            –           –          –
                         IC(1)          4.5%            –           –          –
                         IC(2)          7.0%            –           –          –
   3%           13269    IC(0)          3.0%            –           –          –
                         IC(1)          7.1%            –          247        110
                         IC(2)         11.3%            60          41         40
   4%            9568    IC(0)          4.0%            –           –         388
                         IC(1)         10.0%            80          47         47
                         IC(2)         15.9%            26          26         24
   5%            1874    IC(0)          5.0%            –           –         342
                         IC(1)         12.9%            39          33         32
                         IC(2)         20.4%            21          21         19
   6%            1403    IC(0)          6.0%            –           –         362
                         IC(1)         15.8%            29          29         27
                         IC(2)         24.5%            18          18         15

Table 2.2.6: Number of iterations, varying the sparsity level of à and the level of fill-in, on Example 3.


Example 4

Density of Ã   κ∞(Ã)    IC(level)   Density of M   GMRES(30)   GMRES(50)   SQMR
   2%             541    IC(0)          2.0%           285         221         98
                         IC(1)          5.1%            46          42         30
                         IC(2)          8.6%            30          30         24
   3%             346    IC(0)          3.0%            –           –         467
                         IC(1)          8.3%            34          32         24
                         IC(2)         14.2%            23          23         14
   4%             322    IC(0)          4.0%           255         187         96
                         IC(1)         10.9%            24          24         15
                         IC(2)         17.9%            19          19         12
   5%             369    IC(0)          5.0%            –           –          –
                         IC(1)         14.7%            23          23         15
                         IC(2)         24.5%            19          19         11
   6%             370    IC(0)          6.0%           477         341        146
                         IC(1)         18.6%            19          19         12
                         IC(2)         30.2%            16          16         10

Table 2.2.7: Number of iterations, varying the sparsity level of à and the level of fill-in, on Example 4.


Example 5

Density of Ã   κ∞(Ã)   IC(level)   Density of M   κ∞(L)      GMRES(30)   GMRES(50)   SQMR
   2%            263    IC(0)          2.0%        2·10^3        378         245       102
                        IC(1)          5.1%        1·10^3         79          68        45
                        IC(2)          9.1%        9·10^2         58          48        34
   3%            270    IC(0)          3.0%        1·10^6          –           –         –
                        IC(1)          7.8%        1·10^5          –           –         –
                        IC(2)         12.8%        3·10^3         48          45        30
   4%            253    IC(0)          4.0%        6·10^9          –           –         –
                        IC(1)         11.7%        2·10^5          –           –         –
                        IC(2)         19.0%        7·10^3         40          38        25
   5%            285    IC(0)          5.0%        6·10^10         –           –         –
                        IC(1)         14.6%        1·10^5          –           –       307
                        IC(2)         23.0%        3·10^4        150          84        49
   6%            294    IC(0)          6.0%        8·10^11         –           –         –
                        IC(1)         18.8%        5·10^11         –           –         –
                        IC(2)         29.6%        7·10^4          –           –       242

Table 2.2.8: Number of iterations, varying the sparsity level of à and the level of fill-in, on Example 5.


                                            τ
IC(level)   Density of M    0.0    0.1    0.3    0.5    0.7    0.9    1.1

Example 1 - Density of à = 5%
IC(0)          5.0%         398     –     222    166    117    123    109
IC(1)         16.9%         222     –      –     169     90     73     67
IC(2)         32.3%          86    159    146    134     67     68     62

Example 2 - Density of à = 2%
IC(0)          2.0%         168    423     –      –     458    182    180
IC(1)          4.1%         386     –      –      –     363    141    142
IC(2)          6.6%          –     380    200     –     474    142    117

Example 3 - Density of à = 3%
IC(0)          3.0%          –      –      –      –      –     179    172
IC(1)          7.1%         110    139     –     336     95    109    145
IC(2)         11.3%          40     92     –      95     80     85     90

Example 4 - Density of à = 4%
IC(0)          3.0%         467    189     –      –      –      –     206
IC(1)          8.4%          24     26     60    234     –      –      –
IC(2)         14.2%          14     15     21     28     –      –      –

Example 5 - Density of à = 4%
IC(0)          4.0%          –      –      –      –      –      –      –
IC(1)         11.7%          –      –      –      –      –      –      –
IC(2)         19.0%          25    131    123     –      –      –      –

Table 2.2.9: Number of SQMR iterations, varying the shift parameter, for various levels of fill-in in IC.


Figure 2.2.8: The spectrum of the matrix preconditioned with IC(1), the condition number of L, and the number of iterations with SQMR for various values of the shift parameter τ. The test problem is Example 1 and the density of à is around 3%. The panels correspond to: τ = 0.0 (κ(L) = 526284, SQMR iter. = +500); τ = 0.1 (κ(L) = 134975, +500); τ = 0.3 (κ(L) = 9608, 313); τ = 0.5 (κ(L) = 2165, 161); τ = 0.7 (κ(L) = 777, 117); τ = 0.9 (κ(L) = 434, 104); τ = 1.1 (κ(L) = 261, 95); τ = 1.3 (κ(L) = 183, 94).


Figure 2.2.9: The eigenvalue distribution on the square [−1, 1] of the matrix preconditioned with IC(1), the condition number of L, and the number of iterations with SQMR for the same values of the shift parameter τ as in Figure 2.2.8. The test problem is Example 1 and the density of à is around 3%.


Figure 2.2.10: The eigenvalue distribution on the square [−0.3, 0.3] of the matrix preconditioned with IC(1), the condition number of L, and the number of iterations with SQMR for the same values of the shift parameter τ as in Figure 2.2.8. The test problem is Example 1 and the density of à is around 3%.


2.2.3 AINV

An alternative way to construct a preconditioner is to compute an explicit approximation of the inverse of the coefficient matrix. In this section we consider two techniques: the first constructs an approximation of the inverse of the factors using an Ã-biconjugation process [19], and the other uses a Frobenius-norm minimization technique [93].

If the matrix à can be written in the form LDL^T, where L is unit lower triangular and D is diagonal, then its inverse can be decomposed as Ã^{−1} = L^{−T}D^{−1}L^{−1} = ZD^{−1}Z^T, where Z = L^{−T} is unit upper triangular. Factorized sparse approximate inverse techniques compute sparse approximations Z̄ ≈ Z, so that the resulting preconditioner is M = Z̄ D̄^{−1} Z̄^T ≈ Ã^{−1}, for D̄ ≈ D.

In the approach known as AINV, the triangular factors are computed by means of a set of Ã-biconjugate vectors {z_i}_{i=1}^n, such that z_i^T Ã z_j = 0 if and only if i ≠ j. Then, introducing the matrix Z = [z_1, z_2, ..., z_n], the relation

    Z^T Ã Z = D = diag(p_1, p_2, ..., p_n)

holds, where p_i = z_i^T Ã z_i ≠ 0, and the inverse is equal to

    Ã^{−1} = Z D^{−1} Z^T = Σ_{i=1}^n z_i z_i^T / p_i.

The sets of Ã-biconjugate vectors are computed by means of a (two-sided) Gram-Schmidt orthogonalization process with respect to the bilinear form associated with Ã; a sketch of the algorithm is given in Figure 2.2.11. In exact arithmetic this process can be completed if and only if à admits an LU factorization. AINV does not require a pattern prescribed in advance for the approximate inverse factors; sparsity is preserved during the process by discarding elements of the computed approximate inverse factor whose magnitude is smaller than a given positive threshold.

Compute D^{−1} and Z

Initialization phase:
    z_i^{(0)} = e_i (1 ≤ i ≤ n),   A = [a_1, ..., a_n]

The biconjugation algorithm:
    do i = 1, 2, ..., n
        do j = i, i+1, ..., n
            p_j^{(i−1)} = a_i^T z_j^{(i−1)}
        end do
        do j = i+1, ..., n
            z_j^{(i)} = z_j^{(i−1)} − (p_j^{(i−1)} / p_i^{(i−1)}) z_i^{(i−1)}
        end do
    end do
    z_i = z_i^{(i−1)},   p_i = p_i^{(i−1)},   i = 1, ..., n

Figure 2.2.11: The biconjugation algorithm - M = ZD^{−1}Z^T.
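A dense Python transcription of this biconjugation process (our own illustration for a symmetric Ã; a practical AINV code works with sparse data structures and drops against the threshold as described above) might read:

```python
import numpy as np

def ainv_factors(A, tol=1e-3):
    """Two-sided Gram-Schmidt biconjugation: returns Z and p such that
    z_i^T A z_j = 0 for i != j; entries of Z smaller than tol in modulus
    are dropped to preserve sparsity."""
    n = A.shape[0]
    Z = np.eye(n, dtype=A.dtype)
    p = np.zeros(n, dtype=A.dtype)
    for i in range(n):
        q = A[i, :] @ Z[:, i:]          # p_j = a_i^T z_j for j = i, ..., n-1
        p[i] = q[0]
        Z[:, i + 1:] -= np.outer(Z[:, i], q[1:] / p[i])
        Z[np.abs(Z) < tol] = 0.0        # value-based dropping
    return Z, p                          # M = Z diag(1/p) Z^T ~ A^{-1}
```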

An alternative approach was proposed by Kolotilina and Yeremin in a series of papers [95, 96, 97, 98]. This approach, known as FSAI, approximates Ã^{−1} by the factorization G^T G, where G is a sparse lower triangular matrix approximating the inverse of the lower triangular Cholesky factor L̃ of Ã. This technique has obtained good results on some difficult problems and is suitable for parallel implementation, but it requires an a priori prescription of the sparsity pattern for the approximate factors. The approximate inverse factor is computed by minimizing ‖I − GL̃‖_F^2, which can be accomplished without knowing the Cholesky factor L̃ by solving the normal equations

    {G L̃ L̃^T}_{ij} = {L̃^T}_{ij},   (i, j) ∈ S_{L̃},    (2.2.5)

where S_{L̃} is a lower triangular nonzero pattern for G. Equation (2.2.5) can be replaced by

    {G̃ Ã}_{ij} = I_{ij},   (i, j) ∈ S_{L̃},    (2.2.6)

where G̃ = D̃^{−1}G and D̃ is the diagonal of L̃. Then each row of G̃ can be computed independently by solving a small linear system. The preconditioned matrix has the form

    G Ã G^T = D̃ G̃ Ã G̃^T D̃.

The matrix D̃ is not known and is generally chosen so that the diagonal of G Ã G^T is all ones.
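Each row of G̃ in (2.2.6) leads to a small independent linear system; an illustrative dense sketch (ours) for row i with prescribed pattern indices J_i is:

```python
import numpy as np

def fsai_row(A_tilde, i, Ji):
    """Solve {G_tilde A_tilde}_{ij} = I_{ij} for j in the prescribed lower
    triangular pattern: returns row i of G_tilde restricted to Ji."""
    Asub = A_tilde[np.ix_(Ji, Ji)]               # small dense subblock
    rhs = (np.asarray(Ji) == i).astype(A_tilde.dtype)
    return np.linalg.solve(Asub, rhs)            # one small system per row
```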

Recently, another approximate inverse based on incomplete biconjugation has been proposed in [148]. The idea is to compute a unit lower triangular matrix L = [L_1, L_2, ..., L_n] of order n such that L^T Ã L is a nonsingular diagonal matrix, say D^{−1} = diag(d_{11}^{−1}, d_{22}^{−1}, ..., d_{nn}^{−1}). This is equivalent to the relations

    L_i^T Ã L_j = 0 if i ≠ j,   L_i^T Ã L_i ≠ 0.    (2.2.7)

In other words, L_i and L_j are Ã-biconjugate, and the inverse can then be written as Ã^{−1} = LDL^T. The procedure computes the factors of Ã^{−1} using relations (2.2.7), and preserves a sparsity pattern for the factor L by discarding entries of small modulus.
discarding entries with small modulus.<br />

In Table 2.2.10 we show the number of iterations needed by GMRES and<br />

SQMR preconditioned by AINV to reduce the normwise backward error by<br />

10 −5 on the five examples considered. On the most difficult problems, the<br />

per<strong>for</strong>mance of this preconditioner is very poor. For low values of density<br />

of Ã, AINV is less effective than a diagonal scaling, and its quality does not<br />

improve even when the <strong>dense</strong> coefficient matrix is used <strong>for</strong> the construction<br />

as shown in the results of Table 2.2.11. Both re-ordering and shift strategies<br />

do not improve the effectiveness of the preconditioner. We per<strong>for</strong>med<br />

in particular experiments with the reverse Cuthil-MacKee ordering [37],<br />

the minimum degree ordering [71, 141] and the spectral nested dissection<br />

ordering [114]. The best per<strong>for</strong>mance were observed with the minimum<br />

degree algorithm that in some cases enables to have smaller norm-wise<br />

backward error at the end of convergence. We mention that very similar<br />

or sometimes more disappointing results have been observed with the FSAI<br />

method and the other factorized approximate inverse proposed in [148].


                    GMRES(m)
Density of Ã    m=50   m=110   m=∞    SQMR

Example 1
  2%              –       –      –       –
  4%              –       –      –       –
  6%              –       –     313      –
  8%              –       –     350      –
 10%              –       –     207     306

Example 2
  2%              –       –      –       –
  4%              –       –     206     345
  6%             402     213    143     175
  8%             318     195    120     132
 10%             144      93     93      99

Example 3
  2%              –       –      –       –
  4%             264     101    101     105
  6%              56      51     51      48
  8%              37      37     37      34
 10%              31      31     31      29

Example 4
  2%              –       –     280     387
  4%              83      68     68      57
  6%              46      46     46      34
  8%              42      42     42      32
 10%              48      48     48      38

Example 5
  2%             177     142    121     111
  4%              –       –     213     251
  6%              –      407    194     210
  8%              –      404    179     207
 10%              –      328    154     189

Table 2.2.10: Number of iterations required by different Krylov solvers preconditioned by AINV to reduce the residual by 10^{-5}. The symbol '–' means that convergence was not obtained after 500 iterations.


                    GMRES(m)
Density of Ã    m=50   m=110   m=∞    SQMR

Example 1
  2%              –       –      –       –
  4%              –       –      –       –
  6%              –       –      –       –
  8%              –       –      –       –
 10%              –       –     483      –

Example 2
  2%              –       –      –       –
  4%              –       –     495      –
  6%              –       –     361      –
  8%              –       –     279      –
 10%              –       –     209     486

Example 3
  2%              –      288    153     176
  4%             101      79     79      78
  6%              66      57     57      52
  8%              42      42     42      38
 10%              36      36     36      34

Example 4
  2%              –       –     211     245
  4%              –      315    154     182
  6%              –      202    127     142
  8%             447     107    107     114
 10%             198      90     90      91

Example 5
  2%              –       –      –       –
  4%              –       –      –       –
  6%              –       –     259     474
  8%              –       –     229     358
 10%              –       –     216     374

Table 2.2.11: Number of iterations required by different Krylov solvers preconditioned by AINV to reduce the residual by 10^{-5}. The preconditioner is computed using the dense coefficient matrix. The symbol '–' means that convergence was not obtained after 500 iterations.


Possible causes of failure of factorized approximate inverses

One potential difficulty with the factorized approximate inverse method AINV is the tuning of the threshold parameter that controls the fill-in in the inverse factors. For a typical example, we display in Figure 2.2.12 the sparsity pattern of A^{−1} (on the left) and of L^{−1}, the inverse of its Cholesky factor (on the right), where all the entries smaller than 5.0 × 10^{−2} have been dropped after a symmetric scaling such that max_i |a_{ji}| = max_i |l_{ji}| = 1. The location of the large entries in the inverse matrix exhibits some structure. In addition, only a very small number of its entries have large magnitude compared to the others, which are much smaller. This fact has been successfully exploited to define various a priori pattern selection strategies for Frobenius-norm minimization preconditioners [2, 22] in non-factorized form. On the contrary, the inverse factors that are explicitly approximated by AINV and by FSAI can be totally unstructured, as shown in Figure 2.2.12(b). In this case, the a priori selection of a sparse pattern for the factors can be extremely hard, as no real structure is revealed, preventing the use of techniques like FSAI. In Figure 2.2.13 we plot the magnitude of the entries in the first column of A^{−1} (on the left) and of L^{−1} (on the right) against their row index. These plots indicate that any dropping strategy, either static or dynamic, may be very difficult to tune, as it can easily discard relevant information and potentially lead to a very poor preconditioner. Selecting too small a threshold would retain too many entries and lead to a fairly dense preconditioner; for instance, on the small example considered, if a threshold of 0.05 is used the preconditioner is 14.8% dense. A larger threshold would yield a sparser preconditioner, but might discard too many entries of moderate magnitude that are important for the quality of the preconditioner. On the same example, all the entries with magnitude smaller than 0.2 must be dropped to keep the density of the inverse factor around 3%. Because of these issues, finding a threshold that gives a good trade-off between sparsity and numerical efficiency is challenging and very problem-dependent.

Figure 2.2.12: Sparsity patterns of the inverse of A (on the left, density 8.75%) and of the inverse of its lower triangular factor (on the right, density 29.39%), where all the entries whose relative magnitude is smaller than 5.0 × 10^{−2} are dropped. The test problem, representative of the general trend, is a small sphere.

Figure 2.2.13: Histograms of the magnitude of the entries of the first column of A^{−1} and of its lower triangular factor. A similar behaviour has been observed for all the other columns. The test problem, representative of the general trend, is a small sphere.

2.2.4 SPAI

Frobenius-norm minimization is a natural approach for building explicit preconditioners. This method computes a sparse approximate inverse as the matrix M = {m_ij} which minimizes ‖I − MÃ‖_F (or ‖I − ÃM‖_F for right preconditioning) subject to certain sparsity constraints. Early references to this class can be found in [12, 13, 14, 65], and in [2] for some applications to boundary element matrices in electromagnetism. The Frobenius norm is usually chosen since it allows the decoupling of the constrained minimization problem into n independent linear least-squares problems, one for each column of M (when preconditioning from the right) or row of M (when preconditioning from the left).

The independence of these least-squares problems follows immediately from the identity

    ‖I − MÃ‖_F^2 = ‖I − ÃM^T‖_F^2 = Σ_{j=1}^n ‖e_j − Ã m_{j•}‖_2^2,    (2.2.8)

where e_j is the j-th unit vector and m_{j•} is the column vector representing the j-th row of M.


In the case of right preconditioning, the analogous relation

    ‖I − ÃM‖_F^2 = Σ_{j=1}^n ‖e_j − Ã m_{•j}‖_2^2    (2.2.9)

holds, where m_{•j} is the column vector representing the j-th column of M. Clearly, there is considerable scope for parallelism in this approach. However, the preconditioner is not guaranteed to be nonsingular, and the symmetry of à is generally not preserved in M. The main issue in the computation of the sparse approximate inverse is the selection of the nonzero pattern of M, that is, the set of indices

    S = { (i, j) ∈ [1, n]^2  s.t.  m_ij ≠ 0 }.

If the sparsity pattern of M is known, the nonzero structure of its j-th column is automatically determined, and defined as

    J = { i ∈ [1, n]  s.t.  (i, j) ∈ S }.

The least-squares solution involves only the columns of à indexed by J; we denote this subset by Ã(:, J). Because à is sparse, many rows of Ã(:, J) are usually null and do not affect the solution of the least-squares problems (2.2.9). Thus, if I is the set of indices corresponding to the nonzero rows of Ã(:, J), and if we define  = Ã(I, J), m̂_j = m_j(J) and ê_j = e_j(I), the actual "reduced" least-squares problems to solve are

    min ‖ê_j − Â m̂_j‖_2,   j = 1, ..., n.    (2.2.10)

Usually problems (2.2.10) are of much smaller size than problems (2.2.9).
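A minimal sketch of the static-pattern case (our own NumPy illustration of the reduced problem (2.2.10), solved column by column; the function name is ours) could be:

```python
import numpy as np

def spai_column(A_tilde, j, J):
    """Solve the reduced least-squares problem (2.2.10) for column j of M,
    given its prescribed nonzero row indices J (A_tilde is dense here for
    simplicity)."""
    AJ = A_tilde[:, J]
    I = np.nonzero(np.any(AJ != 0, axis=1))[0]   # nonzero rows of A(:, J)
    A_hat = AJ[I, :]
    e_hat = (I == j).astype(A_tilde.dtype)       # e_j restricted to rows I
    m_hat, *_ = np.linalg.lstsq(A_hat, e_hat, rcond=None)
    m_j = np.zeros(A_tilde.shape[1], dtype=A_tilde.dtype)
    m_j[J] = m_hat
    return m_j
```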

Two different approaches can be followed for the selection of the sparsity pattern of M: an adaptive technique that dynamically tries to identify the best structure for M, and a static technique, where the pattern of M is prescribed a priori on the basis of some heuristic. The idea is to keep M reasonably sparse while trying to capture the "large" entries of the inverse, which are expected to contribute the most to the quality of the preconditioner. A static approach, requiring an a priori nonzero pattern for the preconditioner, introduces significant scope for parallelism and has the advantage that the memory requirements and the computational cost of the setup phase are known in advance; however, it can be very problem-dependent. A dynamic approach is generally effective but usually very expensive. These methods usually start from a simple initial guess, like a diagonal matrix, and then improve the pattern until a criterion of the form ‖Ãm_j − e_j‖_2 < ε is satisfied for each j and a given ε > 0, e_j being the j-th column of the identity matrix, or until a maximum number of nonzeros in the j-th column m_j of M has been reached.


Different strategies can be adopted to enrich the initial nonzero structure of the j-th column of the preconditioner. The method known as SPAI [84] uses a heuristic that selects the new indices predicted to be the most effective at reducing the residual norm

    ‖r‖_2 = ‖Ã(:, J) m̂_j − ê_j‖_2.    (2.2.11)

Grote and Huckle [84] propose solving a one-dimensional minimization problem. If L = { l s.t. r(l) ≠ 0 }, then the new candidates are selected from Ĩ = { j s.t. Ã(L, j) ≠ 0 }. They suggest solving, for each j ∈ Ĩ, the problem

    min_{μ_j} ‖r + μ_j Ã e_j‖_2.

The solution of this problem is

    μ_j = − r^T Ã e_j / ‖Ã e_j‖_2^2,

and the residual norm of the updated solution is given by

    ρ_j^2 = ‖r‖_2^2 − (r^T Ã e_j)^2 / ‖Ã e_j‖_2^2.

The proposed heuristic selects the indices which maximize (r^T Ã e_j)^2 / ‖Ã e_j‖_2^2. More than one new candidate can be selected at a time, and the algorithm stops when either a maximum number of nonzeros per column is reached or the required accuracy is achieved. The algorithm can deliver very good preconditioners even on hard problems, but at the cost of large time and memory requirements, although the execution time can be significantly reduced by exploiting parallelism. A comparison of the construction cost with that of ILU-type methods can be found in [18, 76].
of parallelism. A comparison in terms of construction cost with ILU-type<br />

methods can be found in [18, 76].<br />

In Table 2.2.12, we show the number of iterations needed by Krylov solvers preconditioned by SPAI to solve the model problems. As for the other preconditioners, we consider different levels of density in the sparse approximation of A. Provided the preconditioner is dense enough, SPAI is quite effective in reducing the number of iterations. Also, the quality of the preconditioner on difficult problems can be remarkably improved if the dense coefficient matrix is used for the construction. For instance, on Example 1, if SPAI is computed using the full A, then a density of 2% for the approximate inverse enables the convergence of GMRES(80) in 75 iterations, whereas convergence is not achieved in 500 iterations if the approximate inverse is computed using a sparse approximation of A. However, the adaptive strategy requires a prohibitive time. The construction of the approximate inverse using 6% density for Ã takes nearly one hour of computation on a SGI



Origin 2000 for Example 4 and three hours for Example 5. When using the dense matrix A in the computation, the construction of the preconditioner for the same examples takes more than one day.

2.2.5 SLU

In this section we use the sparsified matrix Ã as an implicit preconditioner; that is, the sparsified matrix is factorized using ME47, a sparse direct solver from HSL [87], and those exact factors are used as the preconditioner. It thus represents an extreme case with respect to ILU(0), since complete fill-in is allowed in the factors. This method will be referred to as SLU. This approach, although not easily parallelizable, is generally quite effective on this class of applications for dense enough sparse approximations of A.
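The idea can be sketched as follows (illustrative Python; SciPy's splu stands in for the ME47 solver used in the thesis, and the relative thresholding rule is an assumption):

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import splu, LinearOperator, gmres

    def slu_preconditioner(A, rel_threshold):
        """Sparsify the dense matrix A by a global relative threshold, factorize
        the sparsified matrix exactly with a sparse direct solver, and return the
        factors wrapped as a preconditioning operator M ~= A^{-1}."""
        mask = np.abs(A) >= rel_threshold * np.abs(A).max()
        A_tilde = sp.csc_matrix(np.where(mask, A, 0.0))
        lu = splu(A_tilde)                       # complete fill-in is allowed
        n = A.shape[0]
        return LinearOperator((n, n), matvec=lu.solve, dtype=A.dtype)

    # usage sketch: x, info = gmres(A, b, M=slu_preconditioner(A, 1e-2), restart=50)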

In Table 2.2.13 we show the number of iterations required by different Krylov solvers preconditioned by SLU to reduce the normwise backward error by a factor of 10⁻⁵. As shown in the table, when the preconditioner is very sparse, the numerical quality of this approach deteriorates and the Frobenius-norm minimization method is more robust.



Example 1
Density of Ã      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
                  m=10   m=30   m=50   m=80   m=110
  2%                –      –      –      –      –            –         –       –
  4%                –      –     336     79     79          333       254     370
  6%                –      –     150     65     65          269       243     312
  8%                –     242     82     56     56          175       195     240
 10%                –     237     50     50     50          127       174     196

Example 2
Density of Ã      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
                  m=10   m=30   m=50   m=80   m=110
  2%                –      –      –      –     212           –         –       –
  4%                –      –     494     79     79          371       315      –
  6%                –      –     185     72     72          291       279     432
  8%                –      –     134     66     66          277       287     406
 10%                –      –     109     62     62          229       267     458

Example 3
Density of Ã      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
                  m=10   m=30   m=50   m=80   m=110
  2%                –      –      –      –      –            –         –       –
  4%                –      –     194     72     72          187       255     340
  6%                –     230     80     55     55          153       177     222
  8%                –     151     48     48     48          181       162     196
 10%                –     151     46     46     46          157       159     208

Example 4
Density of Ã      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
                  m=10   m=30   m=50   m=80   m=110
  2%                –      –     253     81     81           –        309     394
  4%                –      –     187    113     85          374       331     424
  6%                –     401    153     76     76          288       270     370
  8%                –      90     47     47     47           76       171     170
 10%               41      28     28     28     28           35       105      74

Example 5
Density of Ã      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
                  m=10   m=30   m=50   m=80   m=110
  2%                –      –      –      –      –            –         –       –
  4%                –     183    138     73     73          213       457     338
  6%                –      –     194    122     93           –        448     442
  8%                –     289    137     71     71           –        345     358
 10%                –     283    100     68     68           –        334     266

Table 2.2.12: Number of iterations required by different Krylov solvers preconditioned by SPAI to reduce the residual by 10⁻⁵. The symbol '–' means that convergence was not obtained after 500 iterations.



Example 1
Density of Ã      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
                  m=10   m=30   m=50   m=80   m=110
  2%              +500   +500   +500    364    241         +500      +500    486
  4%              +500   +500    128     65     65          136       111    114
  6%                60     31     31     31     31           23        36     28
  8%                51     27     27     27     27           21        34     22
 10%                33     22     22     22     22           14        25     17

Example 2
Density of Ã      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
                  m=10   m=30   m=50   m=80   m=110
  2%              +500   +500   +500    288    109          489       290    229
  4%                50     30     30     30     30           18        42     22
  6%                40     27     27     27     27           16        38     21
  8%                32     24     24     24     24           14        35     19
 10%                26     21     21     21     21           13        30     16

Example 3
Density of Ã      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
                  m=10   m=30   m=50   m=80   m=110
  2%              +500   +500    330    171    108          207       234    206
  4%                38     27     27     27     27           16        29     19
  6%                27     21     21     21     21           11        22     14
  8%                21     17     17     17     17           10        17     12
 10%                18     15     15     15     15            9        16     10

Example 4
Density of Ã      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
                  m=10   m=30   m=50   m=80   m=110
  2%                37     35     34     34     34           17        39     21
  4%                23     21     21     21     21           10        24     14
  6%                18     17     17     17     17            9        18     10
  8%                15     15     15     15     15            8        16      9
 10%                14     13     13     13     13            7        15      9

Example 5
Density of Ã      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
                  m=10   m=30   m=50   m=80   m=110
  2%                72     45     42     42                  34        46     37
  4%                42     29     29     29     29           23        32     25
  6%                29     26     26     26     26           20        28     16
  8%                29     23     23     23     23           17        25     15
 10%                28     21     21     21     21           17        25     18

Table 2.2.13: Number of iterations required by different Krylov solvers preconditioned by SLU to reduce the residual by 10⁻⁵. The symbol '+500' means that convergence was not obtained after 500 iterations.



2.2.6 Other preconditioners

A third class of explicit methods deserves to be mentioned here, although we will not consider it in our numerical experiments. It is based on ILU techniques and, in the general nonsymmetric case, builds the sparse approximate inverse by first performing an incomplete LU factorization Ã ≈ L̄Ū and then approximately inverting the L̄ and Ū factors by solving the 2n triangular linear systems

    \bar{L} x_i = e_i , \qquad \bar{U} y_i = e_i , \qquad 1 \le i \le n .

These systems are solved approximately, either by prescribing sparsity patterns for the approximate inverses of L̄ and Ū and using a Frobenius-type method, or by using the adaptive SPAI method without any pattern fixed in advance. Another approach, which has provided better results, consists in solving the 2n triangular systems by customary forward and backward substitution, respectively, and adopting a dropping strategy, based either on position or on values, to maintain sparsity in the columns of the computed inverse factors. Generally two different levels of incompleteness are applied, rather than one as in the other approximate inverse methods. These preconditioners are not easy to use: relying on an ILU factorization, they are almost useless for highly nonsymmetric, indefinite matrices, and since incomplete processes are strongly sequential, the preconditioner building phase is not entirely parallelizable, although the independence of the 2n triangular solves suggests good scope for parallelism. References to this class can be found in [3, 40, 133].
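As an illustration of the second variant, the following sketch (illustrative Python, not from the references above) approximately inverts a sparse triangular factor column by column with value-based dropping:

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import spsolve_triangular

    def approx_inverse_factor(L, keep):
        """Approximately invert a sparse lower triangular factor L by forward
        substitution on L x_i = e_i, keeping only the `keep` largest-magnitude
        entries of each computed column (value-based dropping)."""
        n = L.shape[0]
        Lcsr = sp.csr_matrix(L)
        cols = []
        for i in range(n):
            e = np.zeros(n); e[i] = 1.0
            x = spsolve_triangular(Lcsr, e, lower=True)
            idx = np.argsort(np.abs(x))[-keep:]       # indices of retained entries
            cols.append(sp.csc_matrix((x[idx], (idx, np.zeros(len(idx), dtype=int))),
                                      shape=(n, 1)))
        return sp.hstack(cols).tocsc()                # sparse approximation of L^{-1}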

2.3 Concluding remarks

In this chapter we have established the need for preconditioning linear systems of equations which arise from the discretization of boundary integral equations in electromagnetism. We have discussed several standard preconditioners based on sparsification strategies and have studied and compared their numerical behaviour on a set of model problems that may be representative of real electromagnetic calculations. We have shown that the incomplete factorization process is highly unstable on indefinite matrices like those arising from the discretization of the EFIE formulation. Using numerical experiments we have shown that the triangular factors computed by the factorization can be very ill-conditioned, and that the long recurrences associated with the triangular solves are unstable. As an attempt at a possible remedy, we have introduced a small complex shift to move the eigenvalues of the preconditioned system along the imaginary axis and thus try to avoid a possible cluster of eigenvalues close to zero. A small diagonal complex shift can help to compute a more stable factorization.



However, suitable strategies remain to be devised to tune the optimal value of the shift and to predict its effect. Factorized approximate inverses, namely AINV and FSAI, exhibit poor convergence behaviour because the inverse factors can be totally unstructured; neither reordering nor shift strategies improve their effectiveness. Any dropping strategy, either static or dynamic, may be very difficult to tune as it can easily discard relevant information and potentially lead to a very poor preconditioner. Among the different techniques, Frobenius-norm minimization methods are quite efficient because they deliver a good rate of convergence. However, they require a high computational effort, so their use is mainly effective in a parallel setting. To be computationally affordable on dense linear systems, Frobenius-norm minimization preconditioning techniques require a suitable strategy to identify the relevant entries to consider in the original matrix A, in order to define small least-squares problems, as well as an appropriate sparsity structure for the approximate inverse. Prescribing a pattern in advance for the preconditioner can greatly reduce the amount of work in terms of CPU time. The problem of cost is evident for the computation of SPAI, since fast convergence can be obtained for high values of the sparsity ratio, but then the adaptive strategy requires a prohibitive time and computational cost in a sequential environment. Compared to sparse approximate inverse methods, SSOR is generally slower, but is very cheap to compute. Its main drawback is that it is not parallelizable; in addition, for much larger problems, the cost per iteration will grow so that this preconditioner will no longer be competitive with the other techniques. Finally, the SLU preconditioner, although generally quite effective on this class of applications, is not easily parallelizable and requires dense enough sparse approximations of A. This preconditioner can be expensive in terms of both memory and CPU time for the solution of large problems, and thus it is mainly interesting for comparison purposes.




Chapter 3

Sparse pattern selection strategies for robust Frobenius-norm minimization preconditioners

In the previous chapter, we established the need for preconditioning linear systems of equations arising from the discretization of boundary integral equations (expressed via the EFIE formulation) in electromagnetism. We briefly discussed some preconditioners and compared their performance on a set of model problems arising both from academic and from industrial applications. The numerical results suggest that sparse approximate inverse techniques can be good candidates to precondition this class of problems efficiently. In particular, the Frobenius-norm minimization approach can greatly reduce the number of iterations needed compared with the implicit approach based on incomplete factorization. In addition, Frobenius-norm minimization is inherently parallel. To be computationally affordable on dense linear systems, Frobenius-norm minimization preconditioners require a suitable strategy to identify the relevant entries to consider in the original matrix A, in order to define small least-squares problems, as well as an appropriate sparsity structure for the approximate inverse.

In this chapter, we propose some efficient static nonzero pattern selection strategies both for the preconditioner and for the selection of the entries of A. In Section 3.1, we overview both dynamic and static approaches to compute the sparsity pattern of Frobenius-norm minimization preconditioners. In Section 3.2, we introduce and compare some strategies to prescribe in advance the nonzero structure of the preconditioner in electromagnetic applications. In Section 3.3, we propose the use of a different




pattern selection procedure for the original matrix from that used for the preconditioner and finally, in Section 3.4, we illustrate the numerical and computational efficiency of the proposed preconditioners on a set of model problems.

3.1 Introduction and motivation

We introduced Frobenius-norm minimization in Section 2.2.4. The idea is to compute the sparse approximate inverse of a matrix A as the matrix M which minimizes ‖I − MA‖_F (or ‖I − AM‖_F for right preconditioning) subject to certain sparsity constraints. The main issue is the selection of the nonzero pattern of M. The idea is to keep M reasonably sparse while trying to capture the "large" entries of the inverse, which are expected to contribute the most to the quality of the preconditioner. For this purpose, two approaches can be followed: an adaptive technique that dynamically tries to identify the best structure for M, and a static technique, where the pattern of M is prescribed a priori based on some heuristics.

A simple approach is to prescribe the locations of the nonzeros of M before computing their actual values. When the coefficient matrix has a special structure or special properties, efforts have been made to find a pattern that can retain the entries of A⁻¹ having large modulus [42, 48, 49, 138], and indeed some theoretical studies have shown that there are cases where the large entries in A⁻¹ are clustered near the diagonal [58, 106]. If A is row diagonally dominant, then the entries in the inverse decay columnwise, and vice versa [138]. When A is a banded SPD matrix, the entries of A⁻¹ decay exponentially along each row or column; more precisely, if b_{ij} is the element located at the i-th row and j-th column of A⁻¹, then

    |b_{ij}| \le C \gamma^{|i-j|} ,        (3.1.1)

where γ < 1 and C > 0 are constants. In this case a banded M would be a good approximation to A⁻¹ [49]. For many PDE problems the entries of the inverse exhibit some decaying behaviour, and a good sparse pattern for the approximate inverse can be computed in advance. However, the constant C in relation (3.1.1) can be very large and the decay unacceptably slow, or the decay can be non-monotonic and thus hardly predictable [139].
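The decay bound (3.1.1) is easy to observe numerically; the following small experiment (an illustrative example, not from the thesis) prints part of a row of the inverse of a tridiagonal SPD matrix:

    import numpy as np

    # Build a tridiagonal SPD matrix and inspect the decay of its inverse
    n = 50
    A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    B = np.linalg.inv(A)
    i = n // 2
    for j in range(i, i + 6):
        # |B[i, j]| shrinks roughly geometrically as |i - j| grows
        print(f"|B[{i},{j}]| = {abs(B[i, j]):.2e}")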

For sparse matrices, the nonzero structure of the approximate inverse can be computed from graph information on the coefficient matrix. The sparsity structure of a sparse matrix A of order n is represented by a directed graph G(A), where the vertices are the integers {1, 2, ..., n} and the edges connect pairs of distinct vertices (i, j) corresponding to nonzero off-diagonal entries a_{ij} of A. The inverse will contain a nonzero in the (i, j) location whenever there is a directed path connecting vertex i to vertex j in G(A) [72].



Several heuristics can be used to traverse the graph along specific directions and select a suitable subset of vertices of G(A) to construct the sparsity pattern of the approximate inverse. Benson and Frederickson [13] define the structure for the j-th column of the approximate inverse, in the case of structurally symmetric matrices with a full diagonal, by selecting in G(A) vertex j and its q-th level nearest neighbours. They call the matrices defined by these patterns q-local matrices. A 0-local matrix has a diagonal structure, while a 1-local matrix has the same sparsity pattern as A. Taking for the sparse approximate inverse the same pattern as A generally works well only for specific classes of problems; using more levels can improve the quality of the preconditioner, but the storage can become prohibitive when q is increased, and even q = 2 is impractical in many cases [61].

The direction of the path in the graph can be selected based on physical considerations dictated by the decay of the magnitude of the entries observed in the discrete Green's function for many problems [139]. The discrete Green's function can be considered as a row or a column of the exact inverse depicted on the physical computational grid. Dropping or sparsification can help to identify the most relevant interactions in the direct problem and to select suitable search directions in the graph. For instance, dropping entries of A smaller than a global threshold can detect anisotropy in the underlying problem and reveal it when no additional physical information is available. Chow [33] proposes combining sparsification with the use of patterns of powers of the sparsified matrix for preconditioning linear systems arising from the discretization of PDE problems. Sparsification can remarkably reduce the construction cost of the preconditioner, and the use of matrix powers makes it possible to retain the largest entries in the Green's function. A post-processing stage, called filtration, can be included to drop small-magnitude entries in the sparse approximate inverse and reduce the cost of storing and applying the preconditioner. However, the choice of these parameters is problem-dependent, and this strategy is not guaranteed to be effective on systems not arising from PDEs.
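A minimal sketch of this sparsification-plus-powers heuristic (illustrative Python; the relative threshold rule and the power loop are assumptions, not Chow's implementation):

    import scipy.sparse as sp

    def power_pattern(A, tau, q):
        """Pattern heuristic: drop entries of A below the global threshold tau
        (relative to the largest modulus), then use the nonzero pattern of the
        q-th power of the sparsified matrix for the approximate inverse."""
        A = sp.csr_matrix(A)
        S = A.multiply(abs(A) >= tau * abs(A).max())   # sparsified matrix
        S.eliminate_zeros()
        P = (S != 0).astype(int)                       # boolean pattern of S
        pattern = P.copy()
        for _ in range(q - 1):
            pattern = ((pattern @ P) != 0).astype(int) # pattern of S^q
        return pattern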

The difficulty of extracting a good sparsity pattern for the approximate inverse of matrices with a general sparsity pattern has motivated the investigation of adaptive strategies that compute the pattern of the approximate inverse dynamically. The adaptive procedure known as SPAI has already been described in Section 2.2.4. The procedure described in [35] uses a few steps of an iterative solver, like minimal residual, to approximately minimize the least-squares problems of relation (2.2.9). The sparsity pattern automatically emerges during the computation, and a dual threshold strategy is adopted to drop small entries either in the search directions or in the iterates. To control costs, operations must be performed in sparse-sparse mode, meaning that sparse matrix-sparse vector multiplications are performed. These algorithms usually compute the approximate inverse starting with an initial pattern and estimate the



accuracy of the computed preconditioner by monitoring the 2-norm of the residual R = I − AM. If the norm is larger than a user-defined threshold, or the number of nonzeros used is less than a fixed maximum, the pattern is enlarged according to some heuristics and the approximate inverse is recomputed. The process is repeated until the required accuracy is attained. We refer to these as adaptive procedures.

We have already mentioned the problem of cost for the computation of SPAI. Fast convergence can be obtained for high values of the sparsity ratio, but then the adaptive strategy requires a prohibitive time and computational cost in a sequential environment. In general, adaptive strategies can solve much more general or harder problems but tend to be very expensive. The use of effective static pattern selection strategies can greatly reduce the amount of work in terms of CPU time and substantially improve the overall setup process, introducing significant scope for parallelism. Also, the memory requirements and the computational cost of the setup phase are known in advance.

In the next sections, we investigate nonzero pattern selection strategies for the computation of sparse approximate inverses on electromagnetic problems. We consider both methods based on the magnitude of the entries and methods which exploit geometric or topological information from the underlying meshes. The pattern is computed in a preprocessing step and then used to compute the entries of the preconditioner.

3.2 Pattern selection strategies for Frobenius-norm minimization methods in electromagnetism

3.2.1 Algebraic strategy

The boundary element method discretizes integral equations on the surface of the scattering object, generally introducing a very localized strong coupling among the edges of the underlying mesh. Each edge is strongly connected to only a few neighbours while far-away connections, although not null, are much weaker. This means that a very sparse matrix can still retain the most relevant contributions from the singular integrals that give rise to dense matrices.

Owing to the decay of the discrete Green's function, the inverse of A may exhibit a structure very similar to that of A. Figure 3.2.1 shows the typical decay of the discrete Green's function for Example 5, a scattering problem from a small sphere, which is representative of the general trend. In the density coloured plot, large to small magnitude entries in the inverse matrix



are depicted in different colours, from red to green, yellow and blue. The discrete Green's function peaks at a point, then decays rapidly, and far from the diagonal only a small set of entries have large magnitude.

Figure 3.2.1: Pattern structure of A⁻¹. The test problem is Example 5.

In this case, a good pattern for the sparse approximate inverse is likely to be the nonzero pattern of a sparse approximation to A, constructed by dropping all the entries lower than a prescribed global threshold, as suggested for instance in [93]. We refer to this approach as the algebraic approach.

The dropping heuristics described in Section 2.2 can be used to compute the sparse pattern for the approximate inverse. In [2], these approaches were compared and showed similar results in their ability to cluster the eigenvalues of the preconditioned systems. The first and the last heuristics are the simplest and are more suitable for a parallel implementation. In addition, the first one has the advantage of placing the number of nonzero entries in the approximate inverse under complete user control, and of achieving perfect load balancing in a parallel implementation. A drawback common to all the heuristics is that we need some deus ex machina to find optimal values for the parameters. In the numerical experiments, we have selected the strategy where, for each column of A, the k entries of largest modulus are retained (k ≪ n a positive integer).
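A minimal sketch of this column-wise selection (illustrative Python, not code from the thesis):

    import numpy as np
    import scipy.sparse as sp

    def k_largest_per_column(A, k):
        """For each column of the dense matrix A, keep only the k entries of
        largest modulus; the result defines the sparse pattern (and values)
        used to build the preconditioner."""
        A = np.asarray(A)
        rows, cols, vals = [], [], []
        for j in range(A.shape[1]):
            idx = np.argsort(np.abs(A[:, j]))[-k:]   # k largest in modulus
            rows.extend(idx)
            cols.extend([j] * k)
            vals.extend(A[idx, j])
        return sp.csc_matrix((vals, (rows, cols)), shape=A.shape)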

The algebraic strategy generally works well and competes with the approach that adaptively defines the nonzero pattern, as implemented in the SPAI preconditioner described in reference [84]. Nevertheless it



suffers some drawbacks that put severe limits on its use in practical applications. For large problems, accessing all the entries of the matrix A becomes too expensive or even impossible. This is the case in the fast multipole framework, where not all the entries of the matrix A are even available. In addition, on complex geometries, a pattern for the sparse approximate inverse computed by using information solely from A may lead to a poor preconditioner. These two main drawbacks motivate the investigation of more appropriate techniques to define a sparsity pattern for the preconditioner.

Because we work in an integral equation context, we can use more information than just the entries of the matrix of the discretized problem. In particular, we can exploit the underlying mesh and extract further relevant information to construct the preconditioner. Two types of information are available from the mesh:

• the connectivity graph, describing the topological neighbourhood among the edges, and

• the coordinates of the nodes in the mesh, describing geometric neighbourhoods among the edges.

3.2.2 Topological strategy

In the integral equation context that we consider, the surface of the object is discretized by a triangular mesh (see Figure 3.2.2). Each degree of freedom (DOF), representing an unknown in the linear system, corresponds to the vectorial flux across an edge of the mesh.

When the object geometry is smooth, only neighbouring edges can have a strong interaction with each other, while far-away connections are generally much weaker. Thus an effective pattern for the sparse approximate inverse can be prescribed by exploiting topological information related to the near field. The sparsity pattern for any row of the preconditioner can be defined according to the concept of level k neighbours, as introduced in [115]. Figure 3.2.3 shows the hierarchical representation of the mesh in terms of topological levels. Level 1 neighbours of a DOF are the DOF itself plus the four DOFs belonging to the two triangles that share the edge corresponding to the DOF. Level 2 neighbours are all the level 1 neighbours plus the DOFs of the triangles that are neighbours of the two triangles considered at level 1, and so forth.
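A minimal sketch of this level-k construction (illustrative Python; the edge2tri and tri2edges adjacency maps are assumed to have been built beforehand from the mesh):

    from collections import deque

    def level_k_neighbours(edge2tri, tri2edges, dof, k):
        """Breadth-first collection of the level-k topological neighbours of a
        DOF (a mesh edge): level 1 adds the edges of the two triangles sharing
        the DOF, level 2 adds the edges of their neighbouring triangles, etc."""
        level = {dof: 0}
        frontier = deque([dof])
        while frontier:
            e = frontier.popleft()
            if level[e] == k:
                continue
            for t in edge2tri[e]:            # triangles containing this edge
                for e2 in tri2edges[t]:      # their edges are one level further
                    if e2 not in level:
                        level[e2] = level[e] + 1
                        frontier.append(e2)
        return set(level)                    # row pattern for this DOF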

In Figures 3.2.4 and 3.2.5 we plot, for each pair of DOFs of the mesh of Example 1, the magnitude of the associated entry in A and in A⁻¹ with respect to their relative level of neighbours. The large entries in A⁻¹ derive from the interaction of a very localized set of edges in the mesh, so that by retaining a few levels of neighbours for each DOF an effective preconditioner is likely to be constructed.



Figure 3.2.2: Example of discretized mesh.

Figure 3.2.3: Topological neighbours of a DOF in the mesh.

Three levels can generally provide a good pattern for constructing an effective sparse approximate inverse. Using more levels increases the computational cost but does not substantially improve the quality of the preconditioner. We will refer to this pattern selection strategy as the topological strategy. In Figure 3.2.6 we show how the density of nonzeros in the preconditioner evolves when the number of levels is increased.



It can be seen that for up to five levels the preconditioner is still sparse, with a density lower than 10%. Considering too many topological levels may cause the unnecessary introduction of nonzeros in the sparse approximation; some of these nonzero entries do not contribute much to the quality of the approximation.

Figure 3.2.4: Topological localization in the mesh for the large entries of A (magnitude vs. levels). The test problem is Example 1 and is representative of the general behaviour.

3.2.3 Geometric strategy

When the object geometry is not smooth, two edges that are far away in the topological sense can have a strong interaction with each other, so that they are strongly coupled in the inverse matrix. For the scattering problem of Example 1, we plot in Figures 3.2.7 and 3.2.8, for the interaction of each pair of edges in the mesh, the magnitude of the associated entry in A and in A⁻¹ with respect to their distance in terms of wavelength. The largest entries of A⁻¹ on smooth geometries may come from the interaction of a geometrically localized set of entries in the mesh. If we construct the sparse pattern for the inverse by only using information related to A, we may retain many small entries in the preconditioner, contributing marginally to its quality, but may neglect some of the large ones, potentially damaging the quality of the preconditioner. Also, when the surface of the object is very non-smooth, these large entries may come from the interaction of edges that are far away or not connected in a topological sense, but are neighbours in a geometric sense. Thus they cannot be detected by using only topological information related to the near field.



Figure 3.2.5: Topological localization in the mesh for the large entries of A⁻¹ (magnitude vs. levels). The test problem is Example 1 and is representative of the general behaviour.

Figure 3.2.8 suggests that we can select the pattern for the preconditioner using physical information, that is: for each edge we select all the edges within a sufficiently large sphere that defines our geometric neighbourhood. By using a suitable size for this sphere, we hope to include the most relevant contributions to the inverse and consequently to obtain an effective sparse approximate inverse. This selection strategy will be referred to as the geometric strategy. In Figure 3.2.9 we show how the density of nonzeros in the preconditioner evolves when the radius of the sphere increases.
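A minimal sketch of this neighbourhood search (illustrative Python using a k-d tree; representing each edge by its midpoint is an assumption made for the sketch):

    import numpy as np
    from scipy.spatial import cKDTree

    def geometric_pattern(midpoints, radius):
        """For each edge (DOF), select all edges whose midpoints lie within a
        sphere of the given radius; the returned lists of indices define the
        rows of the sparsity pattern of the preconditioner.
        midpoints: (n, 3) array of edge midpoint coordinates."""
        tree = cKDTree(midpoints)
        # neighbours[i] lists the DOFs inside the sphere centred at DOF i
        neighbours = tree.query_ball_point(midpoints, r=radius)
        return neighbours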

3.2.4 Numerical experiments

In this section, we compare the different strategies described above in the solution of our test problems. Using the three pattern selection strategies for M, we denote by

• M_a the preconditioner computed by using the algebraic strategy,

• M_t the preconditioner computed by using the topological strategy,

• M_g the preconditioner computed by using the geometric strategy,

• SPAI the preconditioner constructed by using the dynamic strategy implemented by [77] and described in Section 2.2.4.

To evaluate the effectiveness of the proposed strategies, we first consider using the dense matrix A to construct the preconditioners M_a, M_t, M_g and SPAI. This requires the solution of large dense least-squares problems.
SP AI. This requires the solution of large <strong>dense</strong> least-squares problems.




Figure 3.2.6: Evolution of the density (percentage) of the pattern computed for an increasing number of levels. The test problem is Example 1. This is representative of the general behaviour.

The density of the preconditioner varies from one problem to another for the same value of the distance parameter chosen to define M_g. As Figure 3.2.8 shows, and tests on all the other examples confirm, the entries corresponding to edges contained within a sphere of radius 0.12 times the wavelength retain many of the large entries of the inverse while giving rise to quite a sparse preconditioner. For all our numerical experiments, we choose the value of k in the construction of M_a and SPAI, and the level of neighbours used to generate M_t, so that they have the same density as M_g, when necessary discarding some small entries of the preconditioner so that all have the same number of entries.

As for the numerical experiments reported in the previous chapter, we show results for different Krylov solvers. The stopping criterion in all cases consists in reducing the normwise backward error by 10⁻⁵. The symbol '–' means that convergence was not obtained after 500 iterations. In each case, we took as the initial guess x₀ = 0, and the right-hand side was such that the exact solution of the system was known. We performed different tests with different known solutions, observing identical results. All the numerical experiments were performed in double precision complex arithmetic on a SGI Origin 2000, and the numbers of iterations reported here are for left preconditioning. Very similar results were obtained when preconditioning from the right.
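For reference, the stopping quantity can be computed as follows (illustrative Python; the particular norms used here are an assumption, as the thesis does not spell out the formula at this point):

    import numpy as np

    def normwise_backward_error(A, x, b):
        """Normwise backward error ||b - A x|| / (||A|| ||x|| + ||b||):
        the iteration is stopped once this quantity falls below 1e-5."""
        r = b - A @ x
        return np.linalg.norm(r) / (np.linalg.norm(A) * np.linalg.norm(x)
                                    + np.linalg.norm(b))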

From the results shown in Table 3.2.1, we first note that all the preconditioners accelerate the convergence of the Krylov solvers, and in some cases enable convergence when the unpreconditioned solver diverges or converges very slowly.



Figure 3.2.7: Geometric localization in the mesh for the large entries of A (magnitude vs. distance). The test problem is Example 1. This is representative of the general behaviour.

These numerical experiments also highlight the advantages of the geometric strategy. It not only outperforms the algebraic approach and is more robust than the topological approach, which has a similar computational complexity, but it also generally outperforms the adaptive approach implemented in SPAI, which is much more sophisticated and more expensive in execution time and memory. SPAI competes with M_g only on Example 1, where the density of the preconditioner is higher. This trend, namely that the denser the preconditioner the more efficient SPAI is, has been observed on many other examples. However, for sparse preconditioners, SPAI may be quite poor, as illustrated on Example 4, where preconditioned GMRES(30) and Bi-CGStab are slower than without a preconditioner, and the iteration diverges for GMRES(10) with the SPAI preconditioner while it converges with the other three preconditioners. On the non-smooth geometry, that is Example 2, an explanation of why the geometric approach should lead to a better sparse preconditioner is suggested by Figure 3.2.10. Some far-away edges in the connectivity graph, those from each side of the break, are weakly connected in the mesh but can have a strong interaction with each other and can lead to large entries in the inverse matrix.



Figure 3.2.8: Geometric localization in the mesh for the large entries of A⁻¹ (magnitude vs. distance). The test problem is Example 1. This is representative of the general behaviour.


Figure 3.2.9: Evolution of the density (percentage) of the pattern computed for larger geometric neighbourhoods (distance/wavelength). The test problem is Example 1. This is representative of the general behaviour.



Example 1 – Density of M = 5.03%
Precond.      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
              m=10   m=30   m=50   m=80   m=110
Unprec.         –      –      –     251    202          223       231     175
M_j             –      –     465    222    174          239       210     169
M_a           219     135     96     72     72           86       107      72
M_t           100      49     36     36     36           35        42      32
M_g           124      68     46     46     46           44        58      38
SPAI            –      67     44     44     44           48        50      43

Example 2 – Density of M = 1.59%
Precond.      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
              m=10   m=30   m=50   m=80   m=110
Unprec.         –      –      –     398    289          359       403     249
M_j             –      –     473    330    243          257       354     228
M_a           472     273    239    207    184          330       313     141
M_t             –     470    346    243    195          187       275     158
M_g            90      72     55     52     52           44        82      40
SPAI            –      –      99     61     61          168        97     111

Example 4 – Density of M = 1.04%
Precond.      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
              m=10   m=30   m=50   m=80   m=110
Unprec.         –     224    191    158    147          177       170     118
M_j           350     211    178    153    140          188       152     110
M_a           212     157    141    132    123          131       145     115
M_t           288     187    160    146    139          145       156      98
M_g            63      51     41     41     41           37        47      32
SPAI            –     370    184    112     84          256        96      85

Table 3.2.1: Number of iterations using the preconditioners based on dense A.

3.3 Strategies for the coefficient matrix

When the coefficient matrix of the linear system is dense, the construction of even a very sparse preconditioner may become too expensive in execution time as the problem size increases. Both memory and execution time are significantly reduced by replacing A with a sparse approximation. On general problems, this approach can cause a severe deterioration of the quality of the preconditioner; in the context of the Boundary Element Method (BEM), since a very sparse matrix can retain the most relevant contributions to the singular integrals, it is likely to be more effective. The use of a sparse matrix substantially reduces the size of the least-squares problems, which can then be solved efficiently by direct methods.



Figure 3.2.10: Mesh of Example 2.


The algebraic heuristic described in the previous sections is well suited to sparsifying A. In [2], the same nonzero sparsity pattern is selected both for A and for M; in that case, especially when the pattern is very sparse, the computed preconditioner may be poor on some geometries. The effect of replacing A with its sparse approximation is highlighted for some problems in Figure 3.3.12, where we display the sparsified pattern of the inverse of the sparsified A. We see that the resulting pattern is very different from the sparsified pattern of the inverse of A shown in Figure 3.3.11.

A possible remedy is to increase the density of the patterns for both A and M. To a certain extent this improves the convergence, but the computational cost of generating the preconditioner grows almost cubically with respect to the density. A cheaper remedy is to choose a different number of nonzeros for the patterns of A and M, with fewer entries in the preconditioner than in Ã, the sparse approximation of A. To illustrate this effect, we show in Table 3.3.2 the number of iterations of preconditioned GMRES(50), where the preconditioners are built by using either the same sparsity pattern for A or a two, three or five times denser pattern for A.

Except when the preconditioner is very sparse, increasing the density of the pattern imposed on A for a given density of M accelerates the convergence as expected, getting quite rapidly very close to the number of iterations required when using the full A. The additional cost in terms of CPU time is negligible, as can be seen in Figure 3.3.13 for experiments on Example 1. This is due to the fact that the complexity of the QR factorization used to solve the least-squares problems is proportional to the square of the number of columns times the number of rows. Thus, increasing the number of rows, that is the number of entries of Ã, is much cheaper in terms of overall CPU time than increasing the density of the preconditioner, that is the number of columns of the least-squares problems.



Figure 3.3.11: Nonzero pattern of sparsified(A⁻¹), that is, of A⁻¹ when the smallest entries are discarded. The test problem is Example 5.

Example 1
                            Percentage density of M
Density strategy     1     2     3     4     5     6     7     8     9    10
Same                 –     –    299   146    68    47    47    42    37    39
2 times              –     –    248   155    76    46    40    39    39    38
3 times              –    253   207   109    49    39    39    37    35    34
5 times              –    258   213    99    48    37    38    34    33    33
Full A              364   359   144    96    46    35    35    34    32    31

Table 3.3.2: Number of iterations for GMRES(50) preconditioned with different values for the density of M, using the same pattern for A and larger patterns. A geometric approach is adopted to construct the patterns. The test problem is Example 1. This is representative of the general behaviour observed.

Notice that this observation is true for both left and right preconditioning because, according to (2.2.8) and (2.2.9), the smaller dimension of the matrices involved in the least-squares problems always corresponds to the entries of M to be computed, and the larger to the entries of the sparsified matrix from A.
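A quick back-of-the-envelope check of this cost argument (illustrative Python; the flop count 2mn² for Householder QR of an m×n matrix is a standard estimate, and the problem sizes below are hypothetical):

    def qr_flops(m, n):
        """Approximate flop count of Householder QR on an m x n matrix (m >= n)."""
        return 2 * m * n * n

    m, n = 200, 20   # rows ~ entries kept in the sparsified A, cols ~ entries of m_j
    print(qr_flops(2 * m, n) / qr_flops(m, n))   # doubling the rows:    2x the cost
    print(qr_flops(m, 2 * n) / qr_flops(m, n))   # doubling the columns: 4x the cost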



Figure 3.3.12: Sparsity pattern of the inverse of the sparsified A associated with Example 1. The pattern has been sparsified with the same value of the threshold as used for the sparsification of A⁻¹ displayed in Figure 3.3.11.

3.4 Numerical results

We report in this section on the numerical results obtained by replacing A with its sparse approximation in the construction of the preconditioner. In Table 3.4.3 we use the following notation:

• M_{a-a}, introduced in [2] and computed by using algebraic information from A; the same pattern is used for the preconditioner;

• M_{a-t}, constructed by using the algebraic strategy to sparsify A and the topological strategy to prescribe the pattern for the preconditioner;

• M_{a-g}, constructed by using the geometric approach and an algebraic heuristic for A with the same density as for the preconditioner;

• M_{2a-t}, similar to M_{a-t}, but the pattern imposed on A is twice as dense as that imposed on M_{a-t};

• M_{2a-g}, similar to M_{a-g} but, as in the previous case, the pattern imposed on A is twice as dense as that imposed on M_{a-g}.




Figure 3.3.13: CPU time for the construction of the preconditioner using a different number of nonzeros in the patterns for A and M (pattern for A one, three, or five times as dense as for M, and the full A), plotted against the density of the preconditioning matrix. The test problem is Example 1. This is representative of the other examples.

For the sake of comparison, we also report the number of iterations without using a preconditioner and with only a diagonal scaling, denoted by M_j (j stands for Jacobi preconditioner).

Other combinations are possible for defining the selection strategies for the patterns of A and M. Here we focus on the most promising ones, which use information from the mesh to retain the large entries of the inverse, and the algebraic strategy for A to capture the most relevant contributions to the singular integrals. We also consider the preconditioner M_{a-a} to compare with previous tests [2] that were performed on geometries different from those considered here. We show, in Table 3.4.3, the results of our numerical experiments. For each example, we give the number of iterations required by each preconditioned solver.



Example 1 – Density of M = 5.03%
Precond.      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
              m=10   m=30   m=50   m=80   m=110
Unprec.         –      –      –     251    202          223       231     175
M_j             –      –     465    222    174          239       210     169
M_{a-a}       284     170    138    114     92          120       156      94
M_{a-t}       179      61     45     45     45           43        58      36
M_{a-g}       147      93     68     59     59           55        73      53
M_{2a-t}      128      56     40     40     40           37        50      36
M_{2a-g}      131      79     52     51     51           59        65      44

Example 2 – Density of M = 1.59%
Precond.      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
              m=10   m=30   m=50   m=80   m=110
Unprec.         –      –      –     398    289          359       403     249
M_j             –      –     473    330    243          257       354     228
M_{a-a}         –     319    255    221    203          181       319     135
M_{a-t}         –     261    213    174    169          128       251     121
M_{a-g}       251     178    150    138    117          106       256     116
M_{2a-t}        –     370    284    202    182          176       276     127
M_{2a-g}      100      73     61     55     55           48        93      40

Example 3 – Density of M = 2.35%
Precond.      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
              m=10   m=30   m=50   m=80   m=110
Unprec.         –      –      –      –     488           –        444     308
M_j             –      –      –     491    427          375       356     306
M_{a-a}       436     316    240    193    125          144       166     135
M_{a-t}       137     108     93     71     71           64        93      66
M_{a-g}         –     464    296    203    108          240       166     144
M_{2a-t}      113      78     59     53     53           41        61      44
M_{2a-g}      122      84     72     59     59           53        67      50

Example 4 – Density of M = 1.04%
Precond.      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
              m=10   m=30   m=50   m=80   m=110
Unprec.         –     224    191    158    147          177       170     118
M_j           350     211    178    153    140          188       152     110
M_{a-a}       299     205    172    146    133          162       180     103
M_{a-t}       266     152    130    114     99           92       127      83
M_{a-g}        81      67     66     63     63           39        79      41
M_{2a-t}      269     167    143    136    116          107       137      93
M_{2a-g}       71      60     47     47     47           43        61      41

Example 5 – Density of M = 0.63%
Precond.      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
              m=10   m=30   m=50   m=80   m=110
Unprec.         –     344    233    146    125          152       170     109
M_j             –     326    219    140    131          183       173     107
M_{a-a}         –     352    249    154    134          202       183     107
M_{a-t}       360      66     64     60     60           34        76      46
M_{a-g}       313      81     68     61     61           36        74      40
M_{2a-t}       71      48     47     47     47           25        54      30
M_{2a-g}       88      42     39     39     39           21        45      25

Table 3.4.3: Number of iterations to solve the set of test problems.

Example 1 – Density of M = 5.03%
M_{a-a}   M_{a-t}   M_{2a-t}   M_{a-g}   M_{2a-g}
 83.42     91.07      91.78     79.47      80.18

Example 2 – Density of M = 1.59%
M_{a-a}   M_{a-t}   M_{2a-t}   M_{a-g}   M_{2a-g}
 13.98     16.45      16.73     13.53      13.67

Example 3 – Density of M = 2.35%
M_{a-a}   M_{a-t}   M_{2a-t}   M_{a-g}   M_{2a-g}
 83.59    146.44     147.79    109.45     110.30

Example 4 – Density of M = 1.04%
M_{a-a}   M_{a-t}   M_{2a-t}   M_{a-g}   M_{2a-g}
 31.75     38.05      38.23     31.12      31.24

Example 5 – Density of M = 0.63%
M_{a-a}   M_{a-t}   M_{2a-t}   M_{a-g}   M_{2a-g}
 27.66     70.93      71.29     26.04      26.13

Table 3.4.4: CPU time to compute the preconditioners.

In Table 3.4.4, we show the CPU time required to compute the preconditioners when the least-squares problems are solved using LAPACK routines. The CPU time for constructing M_{a-t} and M_{2a-t} is in some cases much larger than that needed for M_{a-g} and M_{2a-g}. The reason is that, in the topological strategy, it is not possible to prescribe exactly a value for the density. Thus, for each problem, we select a suitable number of levels of neighbours to obtain the closest number of nonzeros to that retained in the pattern based on the geometric approach. After the construction of the



preconditioner, we drop its smallest entries to ensure an identical number of nonzeros for the two strategies. The results illustrate that considering twice as dense a pattern for A as for M does not cause a significant growth in the computational time, although it enables us to construct a more robust preconditioner.

We first observe that using a sparse approximation of A degrades the convergence rate of the preconditioned iterations when the nonzero pattern imposed on the preconditioner is very sparse. However, if we adopt the geometric strategy to define the sparsity pattern for the approximate inverse, the convergence rate is not affected very much. For even larger values of the density, the difference in the number of iterations between using the full A or an algebraic sparse approximation becomes negligible. For all the experiments, M_{a-g} still outperforms M_{a-a} and is generally more robust than M_{a-t}; the most efficient and robust preconditioner is M_{2a-g}. The multiple density strategy allows us to improve the efficiency and the robustness of the Frobenius-norm preconditioner on this class of problems without requiring any more time for the construction of the preconditioner. For all the test examples, it enables us to get the fastest convergence, even for GMRES with a low restart parameter on problems where neither M_{a-a} nor M_{a-g} converge.

The effectiveness of this multiple density heuristic is illustrated in Figures 3.4.14 and 3.4.15, where we see the effect of preconditioning on the clustering of the eigenvalues of A for the most difficult problem, Example 2. The eigenvalues of the preconditioned matrices are in both cases well clustered around the point (1.0, 0.0) (with a more effective clustering for M_{2a-g}), but those obtained by using the multiple density strategy are further from the origin. This is highly desirable when trying to improve the convergence of Krylov solvers.

Another advantage of this multiple density heuristic is that it generally allows us to reduce the density of the preconditioner (and thus its construction cost) while preserving its numerical quality. Although no specific results are reported to illustrate this aspect, this behaviour may be partially observed in Table 3.3.2.




Figure 3.4.14: Eigenvalue distribution of the coefficient matrix preconditioned by using a single density strategy on Example 2.

3.5 Concluding remarks

We have presented some a priori pattern selection strategies for the construction of a robust sparse Frobenius-norm minimization preconditioner for electromagnetic scattering problems expressed in integral formulation. We have shown that, by using additional geometric information from the underlying mesh, it is possible to construct robust sparse preconditioners at an affordable computational and memory cost. The topological strategy requires less computational effort to construct the pattern but, since the density is a step function of the number of levels, the construction of the preconditioner can require some additional computation. Also, it may not handle very well complex geometries where some parts of the object are not connected. By retaining two different densities in the patterns of A and M, we can greatly decrease the computational cost of the construction of the preconditioner, usually a bottleneck for this family of methods, preserving the efficiency while increasing the robustness of the resulting preconditioner. Although sparsifying A using an algebraic dropping strategy seems to be the most natural approach to get a sparse approximation of A when all its entries are available, either the topological or the geometric criterion can be used to define the sparse approximation of A. These alternatives are attractive in a multipole framework, where not all the entries of A are computed. The geometric approach can also be used to sparsify A without noticeably deteriorating the quality of the preconditioner.



Figure 3.4.15: Eigenvalue distribution for the coefficient matrix preconditioned by using a multiple density strategy on Example 2. (Plot: eigenvalues in the complex plane; real axis from -0.5 to 1.5, imaginary axis from -1.5 to 0.5.)

As suggested by Figure 3.2.4, owing to the strongly localized coupling introduced by the discretization of the integral equations, the topological approach can also provide a good sparse approximation of A by retaining just a few levels of neighbouring edges for each DOF in the mesh. The numerical behaviour of this approach is illustrated in Table 3.5.6. In both cases the resulting preconditioner is still robust and better suited to a fast multipole framework, since it does not require knowledge of the location of the largest entries of A.
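To make the topological criterion concrete, the following minimal sketch builds the level-based pattern from the mesh connectivity. It assumes a SciPy sparse DOF-to-DOF adjacency matrix C (C[i,j] nonzero when edges i and j are neighbours in the mesh); this input and the function name are illustrative, not part of the codes used in this thesis.

import scipy.sparse as sp

def level_pattern(C, levels):
    # C: n-by-n sparse adjacency of the mesh DOFs (assumed input).
    # Returns the pattern that retains, for each DOF, its neighbours
    # up to `levels` levels away (plus the DOF itself).
    P = sp.identity(C.shape[0], format='csr')
    for _ in range(levels):
        P = P + P @ C        # add one more level of neighbours
    P.data[:] = 1.0          # keep only the nonzero structure
    return P

Column j of the result gives the retained entries of column j of M; a larger level count yields the denser pattern used for the sparse approximation of A.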

M_{2g-g}

Example  GMRES(10)  GMRES(30)  GMRES(50)  GMRES(80)  GMRES(110)  Bi-CGStab  UQMR  TFQMR
   1        165        103         75         60         60          66       71     61
   2        145        110         95         76         76          68      140     64
   3        129         89         70         57         57          49       69     52
   4         71         57         48         48         48          38       52     34
   5        110         46         42         42         42          24       50     27

Table 3.5.5: Number of iterations to solve the set of test models using a multiple density geometric strategy to construct the preconditioner. The pattern imposed on A is twice as dense as that imposed on M.



M_{2t-g}

Example  GMRES(10)  GMRES(30)  GMRES(50)  GMRES(80)  GMRES(110)  Bi-CGStab  UQMR  TFQMR
   1        197         87         49         49         49          50       66     50
   2        103         82         72         61         61          49      111     50
   3        143         98         84         60         60          56       70     53
   4         70         58         49         49         49          39       65     37
   5        143         50         47         47         47          29       57     28

Table 3.5.6: Number of iterations to solve the set of test models using a topological strategy to sparsify A and a geometric strategy for the preconditioner. The pattern imposed on A is twice as dense as that imposed on M.




Chapter 4

Symmetric Frobenius-norm minimization preconditioners in electromagnetism

In the previous chapter we introduced and compared some strategies to compute a priori the nonzero sparsity pattern for Frobenius-norm minimization preconditioners in electromagnetic applications. The results of the numerical experiments suggest that, by using additional geometric information from the underlying mesh, it is possible to construct very sparse preconditioners and to make them more robust. In this chapter, we illustrate the numerical and computational efficiency of the proposed preconditioner. In Section 4.1, we assess the effectiveness of the sparse approximate inverse compared with standard methods for the solution of a set of model problems that are representative of real electromagnetic calculations. In Section 4.2, we complete the study by considering two symmetric preconditioners based on Frobenius-norm minimization.

4.1 Comparison with standard preconditioners

In this section we assess the performance of the proposed Frobenius-norm minimization approach. In Table 4.1.1, we show the numerical results observed on Examples 1-5 with some standard preconditioners, of both explicit and implicit form. These are: diagonal scaling (M_j), SSOR, ILU(0), SPAI, and SLU applied to a sparse approximation of A constructed using the algebraic approach. All these preconditioners, except SLU, exhibit much poorer acceleration capabilities than M_{2a-g}. If we reduce the density of the preconditioner on Examples 1 and 3, M_{2a-g} converges more slowly but becomes the most efficient.




It should also be noted that SPAI works reasonably well when computed using the dense A (see Table 3.2.1), but with the sparse A it does not converge on Example 2 (see Table 4.1.1). In addition, following [35], we performed some numerical experiments in which we obtained an approximate m_{•j} from (2.2.9) by dropping the smallest entries of the iterates computed by a few steps of either the Minimum Residual method or GMRES. Unfortunately, the performance of these approaches for dynamically defining the pattern of the preconditioner was disappointing: they only improved on the unpreconditioned case when a relatively large number of iterations was used to build the preconditioner, making them unaffordable for our problems.

The purpose of this study is to understand the numerical behaviour of the preconditioners. Nevertheless, we do recognize that some of the simple strategies have a much lower cost for building the preconditioner and so could result in a faster solution. When SSOR converges, it is often the fastest in terms of the CPU time for the overall solution of the linear system. When the solution is performed for only one right-hand side, the construction cost of the other preconditioners cannot be compensated for by the reduction in the number of iterations; the matrix-vector product is performed using BLAS kernels that make the iteration cost quite cheap for the problem sizes we have considered. For instance, when solving Example 1 with GMRES(50) on a SUN Enterprise, SSOR converges in 31.4 seconds, while M_{2a-g} requires 190 seconds for the construction and 7.6 seconds for the iterations. However, in electromagnetic applications the same linear system has to be solved with many right-hand sides when illuminating an object with various waves corresponding to different angles of incidence. For that example, if we have more than eight right-hand sides (the break-even point is 190/(31.4 - 7.6), approximately 8 solves), the construction cost of M_{2a-g} is offset by the time saved in the iterations and M_{2a-g} becomes more efficient than SSOR. In addition, the construction and the application of M_{2a-g} are fully parallelizable, while the parallelization of SSOR requires some reordering of the equations that may be difficult to implement efficiently on a distributed memory platform.
memory plat<strong>for</strong>m.



Example 1 - Density of M = 5.03%

Precond.    GMRES(10)  GMRES(30)  GMRES(50)  GMRES(80)  GMRES(110)  Bi-CGStab  UQMR  TFQMR
M_j             –          –         465        222        174         239      210    169
SSOR            –          –         216        136         98         147      177    135
ILU(0)          –          –          –          –          –           –       479     –
SPAI            –          –         192         68         68         150       83     94
SLU            160         53         38         38         38          46       50     39
M_{2a-g}       131         79         52         51         51          59       65     44

Example 2 - Density of M = 1.59%

Precond.    GMRES(10)  GMRES(30)  GMRES(50)  GMRES(80)  GMRES(110)  Bi-CGStab  UQMR  TFQMR
M_j             –          –         473        330        243         257      354    228
SSOR            –         413        245        164        134         185      281    266
ILU(0)          –          –          –          –         322         385      394    439
SPAI            –          –          –          –          –           –        –      –
SLU             –          –          –          –         282          –        –      –
M_{2a-g}       100         73         61         55         55          48       93     40

Example 3 - Density of M = 2.35%

Precond.    GMRES(10)  GMRES(30)  GMRES(50)  GMRES(80)  GMRES(110)  Bi-CGStab  UQMR  TFQMR
M_j             –          –          –         491        427         375      356    306
SSOR            –         500        397        301        226         228      246    199
ILU(0)          –          –          –         474        185          –       388     –
SPAI            –          –          –         157         89         198      119    122
SLU             36         25         25         25         25          14       27     19
M_{2a-g}       122         84         72         59         59          53       67     50

Example 4 - Density of M = 1.04%

Precond.    GMRES(10)  GMRES(30)  GMRES(50)  GMRES(80)  GMRES(110)  Bi-CGStab  UQMR  TFQMR
M_j
SSOR           360        185        137        112         93          94      124     84
ILU(0)          –         359        280        202        127         203      179    136
SPAI            99         78         59         55         55          49       72     53
SLU             99         78         59         55         55          49       72     53
M_{2a-g}        71         60         47         47         47          43       61     41



Example 5 - Density of M = 0.63%

Precond.    GMRES(10)  GMRES(30)  GMRES(50)  GMRES(80)  GMRES(110)  Bi-CGStab  UQMR  TFQMR
M_j
SSOR            –         296        194        145        115         161      168    124
ILU(0)          –          –          –          –         414         345      389    272
SPAI            –         454        196        124         91         118      118     96
SLU            115         68         52         52         52          29       59     42
M_{2a-g}        88         42         39         39         39          21       45     25

Table 4.1.1: Number of iterations with some standard preconditioners computed using the sparse A (algebraic).

4.2 Symmetrization strategies for the Frobenius-norm minimization method

The linear systems arising from the discretization by BEM can be symmetric non-Hermitian in the Electric Field Integral Equation formulation (EFIE), or unsymmetric in the Combined Field Integral Equation formulation (CFIE). In this thesis, as mentioned in the previous chapters, we only consider cases where the matrix is symmetric, because the EFIE usually gives rise to linear systems that are more difficult to solve with iterative methods. Another motivation to focus only on the EFIE formulation is that it does not impose any restriction on the geometry of the scattering obstacle, as the CFIE does, and in this respect it is more general. However, the sparse approximate inverse computed by the Frobenius-norm minimization method is not guaranteed to be symmetric, and usually is not, even if a symmetric pattern is imposed on M; consequently it might not fully exploit all the characteristics of the linear system. This fact prevents the use of symmetric Krylov solvers. To complete the earlier studies, in this section we consider two possible symmetrization strategies for Frobenius-norm minimization using a prescribed pattern for the preconditioner based on geometric information. As before, all the preconditioners are computed using as input Ã, a sparse approximation of the dense coefficient matrix A.

If M_{Frob} denotes the unsymmetric matrix resulting from the minimization (2.2.9), the first strategy simply averages its off-diagonal entries. That is

    M_{Aver-Frob} = (M_{Frob} + M_{Frob}^T) / 2.    (4.2.1)

An alternative way to construct a symmetric sparse approximate inverse is to compute only the lower triangular part, including the diagonal, of the preconditioner. The nonzeros calculated are reflected with respect to the diagonal and are used to update the right-hand sides of the subsequent least-squares problems involved in the construction of the remaining columns of the preconditioner. More precisely, in the computation of the k-th column of the preconditioner, the entries m_{ik} for i < k are set to the entries m_{ki} that are already available, and only the lower triangular entries are computed. The entries m_{ki} are then used to update the right-hand sides of the least-squares problems which involve the remaining unknowns m_{ik}, for i ≥ k. The least-squares problems are as follows:

    min ‖ê_j − Ã m̂_{•j}‖₂²    (4.2.2)

where ê_j = e_j − ∑_{k<j} m_{kj} ã_{•k}, with ã_{•k} the k-th column of Ã, and m̂_{•j} carries only the unknown entries m_{ij}, i ≥ j.
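A minimal dense-storage sketch of the two symmetrization strategies is given below; it is illustrative only (an actual implementation works on sparse data structures and reuses QR factorizations), and `patterns`, which lists the retained row indices of each column, is a hypothetical input.

import numpy as np

def aver_frob(M_frob):
    # M_{Aver-Frob} of (4.2.1): average the off-diagonal entries.
    # For complex symmetric (non-Hermitian) systems the plain
    # transpose, not the conjugate transpose, preserves symmetry.
    return 0.5 * (M_frob + M_frob.T)

def sym_frob(A_tilde, patterns):
    # M_{Sym-Frob}: compute only the lower triangular entries of each
    # column; the entries above the diagonal are reflected from earlier
    # columns and moved to the right-hand side, as in (4.2.2).
    n = A_tilde.shape[0]
    M = np.zeros((n, n), dtype=A_tilde.dtype)
    for j in range(n):
        J = np.asarray(patterns[j])
        known, unknown = J[J < j], J[J >= j]
        M[known, j] = M[j, known]              # reflect m_kj = m_jk
        e = np.zeros(n, dtype=A_tilde.dtype)
        e[j] = 1.0
        e -= A_tilde[:, known] @ M[known, j]   # updated right-hand side
        M[unknown, j] = np.linalg.lstsq(A_tilde[:, unknown], e, rcond=None)[0]
    return M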



Frobenius-norm minimization type <strong>preconditioners</strong>, both symmetric and<br />

unsymmetric. In the following we consider a geometric approach to define<br />

the sparsity pattern <strong>for</strong> Ã, as it is the only one that can be efficiently<br />

implemented in a parallel fast multipole environment [23]. We compare the<br />

unsymmetric preconditioner M F rob and the two symmetric <strong>preconditioners</strong><br />

M Aver−F rob and M Sym−F rob . The column entitled “Relative Flops” displays<br />

σ QR (M)<br />

the ratio<br />

σ QR (M F rob ) , where the σ QR(M) represents the number of floatingpoint<br />

operations required by the sequence of QR factorizations used to build<br />

the preconditioner M, that is either M = M Aver−F rob or M = M Sym−F rob .<br />

In this table, it can be seen that M_{Aver-Frob} almost always requires fewer iterations than M_{Sym-Frob}, which imposes the symmetry directly and consequently computes only half of the entries. Since M_{Sym-Frob} computes fewer entries, the associated values in the column "Relative Flops" are all less than one, and close to a third in all cases. On the hardest test cases (Examples 1 and 3), the combination of SQMR and M_{Aver-Frob} needs less than half the number of iterations of M_{Frob} with GMRES(30), and is only very slightly less efficient than M_{Frob} with GMRES(80). On the less difficult problems, SQMR plus M_{Aver-Frob} converges between 21 and 37% faster than GMRES(80) plus M_{Frob}, and between 31 and 43% faster than GMRES(30) plus M_{Frob}. M_{Sym-Frob}, which computes only half of the entries of the preconditioner, has a poor convergence behaviour on the hardest problems, and is slightly less efficient than M_{Aver-Frob} on the other problems when used with SQMR. Nevertheless, we should mention that, for the sake of comparison, these preliminary experiments were performed using the set of parameters for the densities of Ã and M that was the best for M_{Frob} and consequently nearly optimal for M_{Aver-Frob}; the performance of M_{Sym-Frob} can be improved, as shown by the results reported in Table 4.2.3. These first experiments reveal the remarkable robustness of SQMR when used in combination with a symmetric preconditioner. This combination generally outperforms GMRES even for large restarts.

The best alternative for significantly improving the behaviour of M_{Sym-Frob} is to enlarge significantly the density of Ã and only marginally increase the density of the preconditioner. In Table 4.2.3, we show the number of iterations observed with the strategy that uses a density of Ã three times larger than that of M_{Sym-Frob}; we recall that for M_{Aver-Frob} and M_{Frob} a density of Ã twice as large as that of the preconditioner is usually the best trade-off between computing cost and numerical efficiency. It can be seen that M_{Sym-Frob} then becomes slightly better than M_{Aver-Frob} (as reported in Table 4.2.2) and is less expensive to build. In this table, we use the same values of σ_QR(M_{Frob}) as in Table 4.2.2 to evaluate the ratio "Relative Flops".
to evaluate the ratio “Relative Flops”.



Example 1 - Density of Ã = 10.13% - Density of M = 5.03%

Precond.       GMRES(30)  GMRES(80)  GMRES(∞)  SQMR  Relative Flops
M_{Frob}          108         60        60       *        1.00
M_{Aver-Frob}     171         79        79      74        1.00
M_{Sym-Frob}       –           –       301       –        0.25

Example 2 - Density of Ã = 3.17% - Density of M = 1.99%

Precond.       GMRES(30)  GMRES(80)  GMRES(∞)  SQMR  Relative Flops
M_{Frob}           57         43        43       *        1.00
M_{Aver-Frob}      59         44        44      34        1.00
M_{Sym-Frob}       60         46        39      41        0.28

Example 3 - Density of Ã = 4.72% - Density of M = 2.35%

Precond.       GMRES(30)  GMRES(80)  GMRES(∞)  SQMR  Relative Flops
M_{Frob}           89         57        57       *        1.00
M_{Aver-Frob}     122         63        63      58        1.00
M_{Sym-Frob}      318        135        91     102        0.29

Example 4 - Density of Ã = 2.08% - Density of M = 1.04%

Precond.       GMRES(30)  GMRES(80)  GMRES(∞)  SQMR  Relative Flops
M_{Frob}           58         48        48       *        1.00
M_{Aver-Frob}      59         47        47      30        1.00
M_{Sym-Frob}       63         51        51      33        0.30

Example 5 - Density of Ã = 1.25% - Density of M = 0.62%

Precond.       GMRES(30)  GMRES(80)  GMRES(∞)  SQMR  Relative Flops
M_{Frob}           35         33        33       *        1.00
M_{Aver-Frob}      35         34        34      24        1.00
M_{Sym-Frob}       51         38        38      32        0.31

Table 4.2.2: Number of iterations on the test examples using the same pattern for the preconditioners.

Example  Density                     GMRES(30)  GMRES(80)  GMRES(∞)  SQMR  Relative Flops
   1     Ã = 11.98%, M = 6.10%          172         68        68      67        0.40
   2     Ã =  5.94%, M = 2.04%           56         41        41      33        0.30
   3     Ã = 11.01%, M = 3.14%           88         57        57      56        0.66
   4     Ã =  2.08%, M = 1.19%           56         50        50      32        0.47
   5     Ã =  1.98%, M = 0.62%           33         33        33      15        0.34

Table 4.2.3: Number of iterations for M_{Sym-Frob} combined with SQMR, using three times more nonzeros in Ã than in the preconditioner.



To illustrate the effect of the densities of Ã and of the preconditioners, we performed experiments with preconditioned SQMR, where the preconditioners are built by using either the same sparsity pattern for Ã or a two, three or five times denser pattern for Ã. We report in Tables 4.2.4 and 4.2.5 the number of SQMR iterations for M_{Sym-Frob} and for M_{Aver-Frob}, respectively. In these tables, M_{Sym-Frob} always requires more iterations than M_{Aver-Frob} for the same densities of Ã and of the preconditioner, but its computation costs about a quarter of the flops in each test.

Example 1 - Percentage density of M

Density strategy     1    2    3    4    5    6    7    8    9   10
Same                 –    –    –    –    –   180  150  118  105   55
2.0 times            –    –    –    –    –    67   56   48   91   42
3.0 times            –    –    –    –   393   55   52   47   74   39
5.0 times            –    –    –    –   346   53   50   45   56   39

Table 4.2.4: Number of iterations of SQMR with M_{Sym-Frob} for different values of the density of M, using the same pattern for A and larger patterns. The test problem is Example 1.

Example 1 - Percentage density of M

Density strategy     1    2    3    4    5    6    7    8    9   10
Same                 –    –    –   336   78   55   55   45   38   40
2.0 times            –    –   426  105   81   50   48   43   43   44
3.0 times            –   426  293  113   92   49   45   36   35   35
5.0 times            –   315  248  114   80   44   38   37   37   35

Table 4.2.5: Number of iterations of SQMR with M_{Aver-Frob} for different values of the density of M, using the same pattern for A and larger patterns. The test problem is Example 1.

Because the construction of M_{Sym-Frob} depends on the ordering selected, a natural question concerns the sensitivity of the quality of the preconditioner to this choice. In particular, it is shown in [54] that the numerical behaviour of IC is very dependent on the ordering, and a similar study with comparable conclusions for AINV is described in [17]. In Table 4.2.6, we display the number of iterations with SQMR, selecting the same density parameters as those used for the experiments reported in Table 4.2.3, but using different orderings to permute the original pattern of M_{Sym-Frob}. More precisely, we consider the reverse Cuthill-McKee ordering [37] (RCM), the minimum degree ordering [71, 141] (MD), the spectral nested dissection ordering [114] (SND) and, lastly, a reordering of the matrix that puts the denser rows and columns first (DF). It can be seen that M_{Sym-Frob} is not too sensitive to the ordering and none of the tested orderings appears superior to the others.

Example  Density                     Original  RCM  MD  SND  DF
   1     Ã = 11.98%, M = 6.10%          67      93  93   75  87
   2     Ã =  5.94%, M = 2.04%          33      41  40   40  44
   3     Ã = 11.01%, M = 3.14%          56      51  68   73  77
   4     Ã =  2.08%, M = 1.19%          32      42  40   39  39
   5     Ã =  1.98%, M = 0.62%          15      26  25   26  23

Table 4.2.6: Number of iterations of SQMR with M_{Sym-Frob} with different orderings.

For comparison, in Table 4.2.7 we report comparative results amongst different Frobenius-norm minimization type preconditioners, both symmetric and unsymmetric, obtained when the algebraic dropping strategy is used to sparsify the coefficient matrix. In this case, M_{Aver-Frob} always performs better than M_{Sym-Frob}, but it is at least three times more expensive to compute. On Examples 1 and 3, the hardest test cases, the combination of SQMR and M_{Aver-Frob} needs up to 65% more iterations than GMRES(80) plus M_{Frob}, but competes with GMRES(30) plus M_{Frob}. On the less difficult problems, SQMR plus M_{Aver-Frob} converges between 18 and 35% faster than GMRES(80) plus M_{Frob}, and between 20 and 47% faster than GMRES(30) plus M_{Frob}. The best alternative to significantly improve the behaviour of M_{Sym-Frob} remains to enlarge notably the density of Ã and only marginally that of the preconditioner. This can be observed in Table 4.2.8, where we show the number of iterations obtained with the strategy that uses a density of Ã at most three times larger than that of M_{Sym-Frob}. Once again the behaviour of M_{Sym-Frob} is comparable to that of M_{Aver-Frob} described in Table 4.2.7, while it is less expensive to build.

In Tables 4.2.9 and 4.2.10 we illustrate the effect of the density of the approximation of the original matrix and of the preconditioners on the convergence of SQMR. The preconditioners are built by using either the same sparsity pattern for Ã or a two, three or five times denser pattern for Ã. We report in Tables 4.2.9 and 4.2.10, respectively, the number of SQMR iterations when an algebraic approach is used for Ã and a geometric approach is selected for M_{Sym-Frob} and for M_{Aver-Frob}. If we compare these results with those reported in Table 4.2.4, it can be seen that, on hard problems, using geometric information even to prescribe the pattern of Ã is beneficial. M_{Sym-Frob} remains rather insensitive to the ordering, as shown by the results of Table 4.2.11.

Example 1 - Density of Ã = 10.19% - Density of M = 5.03%

Precond.       GMRES(30)  GMRES(80)  GMRES(110)  SQMR  Relative Flops
M_{Frob}           79         51         51        *        1.00
M_{Aver-Frob}     196        119         90       84        1.00
M_{Sym-Frob}       –          –          –         –        0.25

Example 2 - Density of Ã = 3.18% - Density of M = 1.99%

Precond.       GMRES(30)  GMRES(80)  GMRES(110)  SQMR  Relative Flops
M_{Frob}           45         39         39        *        1.00
M_{Aver-Frob}      48         40         40       32        1.00
M_{Sym-Frob}       78         49         49       46        0.28

Example 3 - Density of Ã = 4.69% - Density of M = 2.35%

Precond.       GMRES(30)  GMRES(80)  GMRES(110)  SQMR  Relative Flops
M_{Frob}           84         59         59        *        1.00
M_{Aver-Frob}     119         74         74       74        1.00
M_{Sym-Frob}       –          –          –         –        0.29

Example 4 - Density of Ã = 2.10% - Density of M = 1.04%

Precond.       GMRES(30)  GMRES(80)  GMRES(110)  SQMR  Relative Flops
M_{Frob}           60         47         47        *        1.00
M_{Aver-Frob}      64         49         49       32        1.00
M_{Sym-Frob}       64         51         51       33        0.30

Example 5 - Density of Ã = 1.27% - Density of M = 0.62%

Precond.       GMRES(30)  GMRES(80)  GMRES(110)  SQMR  Relative Flops
M_{Frob}           42         39         39        *        1.00
M_{Aver-Frob}      30         30         30       25        1.00
M_{Sym-Frob}       50         36         36       31        0.31

Table 4.2.7: Number of iterations on the test examples using the same pattern for the preconditioners. An algebraic pattern is used to sparsify A.



Example  Density                     GMRES(30)  GMRES(80)  GMRES(∞)  SQMR  Relative Flops
   1     Ã = 12%,    M = 6%             360         79        79      79        0.41
   2     Ã =  5.97%, M = 2.04%           59         43        43      34        0.57
   3     Ã = 11.08%, M = 3.14%          171         76        76      78        0.66
   4     Ã =  2.10%, M = 1.19%           51         44        44      31        0.47
   5     Ã =  1.87%, M = 0.62%           33         33        33      14        0.34

Table 4.2.8: Number of iterations for M_{Sym-Frob} combined with SQMR using three times more nonzeros in Ã than in the preconditioner. An algebraic pattern is used to sparsify A.

Example 1 - Percentage density of M

Density strategy     1    2    3    4    5    6    7    8    9   10
Same                 –    –    –    –    –    –   494  364  440   90
2.0 times            –    –    –    –    –    79  173  105   81   58
3.0 times            –    –    –    –    –    64   66   71   45   55
5.0 times            –    –    –    –   346   52   70   56   40   41

Table 4.2.9: Number of iterations of SQMR with M_{Sym-Frob} for different values of the density of M, using the same pattern for A and larger patterns. A geometric approach is adopted to construct the pattern for the preconditioner and an algebraic approach is adopted to construct the pattern for the coefficient matrix. The test problem is Example 1.



Example 1 - Percentage density of M

Density strategy     1    2    3    4    5    6    7    8    9   10
Same                391    –   433   99   89   48   50   38   36   36
2.0 times            –   420  272  112   84   44   37   36   33   34
3.0 times           362  363  222   96   86   40   43   36   36   35
5.0 times            –   365  251  100   76   40   38   34   35   36

Table 4.2.10: Number of iterations of SQMR with M_{Aver-Frob} for different values of the density of M, using the same pattern for A and larger patterns. A geometric approach is adopted to construct the pattern for the preconditioner and an algebraic approach is adopted to construct the pattern for the coefficient matrix. The test problem is Example 1.

Example  Density                     Original  RCM   MD  SND   DF
   1     Ã = 12%,    M = 6%             79      72   70   71   76
   2     Ã =  5.97%, M = 2.04%          34      39   39   35   39
   3     Ã = 11.08%, M = 3.14%          78     122   92  112  122
   4     Ã =  2.10%, M = 1.19%          31      29   30   30   27
   5     Ã =  1.87%, M = 0.62%          14      27   24   26   14

Table 4.2.11: Number of iterations of SQMR with M_{Sym-Frob} with different orderings. An algebraic pattern is used to sparsify A.

4.3 Concluding remarks

In this chapter we have assessed the performance of the Frobenius-norm minimization preconditioner for the solution of dense complex symmetric non-Hermitian systems of equations arising from electromagnetic applications. The set of problems used for the numerical experiments can be considered representative of larger systems. We have also investigated the use of symmetric preconditioners, which reflect the symmetry of the original matrix in the associated preconditioner and enable the use of a symmetric Krylov solver that can be cheaper than GMRES iterations. Both M_{Aver-Frob} and M_{Sym-Frob} appear to be efficient and robust. Through numerical experiments, we have shown that M_{Sym-Frob} is not too sensitive to the column ordering, while M_{Aver-Frob} is totally insensitive to it. In addition, M_{Aver-Frob} is straightforward to parallelize, even though it requires more flops for its construction. It would probably be the preconditioner of choice in a parallel distributed fast multipole environment, but possibilities for parallelizing M_{Sym-Frob} also exist, by using colouring techniques to detect independent subsets of columns that can be computed in parallel. In a multipole context the algorithm must be recast in terms of blocks, and Level 2 BLAS operations have to be used for the least-squares updates. Finally, the major benefit of these two preconditioners is the remarkable robustness they exhibit when used in conjunction with SQMR.




Chapter 5

Combining fast multipole techniques and approximate inverse preconditioners for large parallel electromagnetics calculations

In this chapter we consider the implementation of the Frobenius-norm minimization preconditioner described in Chapter 3 within a code that implements the Fast Multipole Method (FMM). We combine the sparse approximate inverse preconditioner with fast multipole techniques for the solution of very large electromagnetic problems. The chapter is organized as follows: in Section 5.1 we briefly overview the FMM. In Section 5.2 we describe the implementation of the Frobenius-norm minimization preconditioner in the parallel multipole context developed by [135]. In Section 5.3 we study the numerical and parallel scalability of the implementation for the solution of large problems. Finally, in Section 5.4 we investigate the numerical behaviour of inner-outer iterative solution schemes implemented in a multipole context with different levels of accuracy for the matrix-vector products in the inner and outer loops. We consider in particular FGMRES as the outer solver, with an inner GMRES iteration preconditioned by the Frobenius-norm minimization method. We illustrate the robustness and effectiveness of this scheme for the solution of problems with up to one million unknowns.




5.1 The fast multipole method

The FMM, introduced by Greengard and Rokhlin [82], provides an algorithm for computing approximate matrix-vector products for electromagnetic scattering problems. The method is fast in the sense that the computation of one matrix-vector product costs O(n log n) arithmetic operations instead of the usual O(n²) operations, and is approximate in the sense that the relative error with respect to the exact computation is around 10⁻³ [38, 135]. It is based on truncated series expansions of the Green's function for the electric-field integral equation (EFIE). The EFIE can be written as

    E(x) = −∫_Γ ∇G(x, x′) ρ(x′) d³x′ − (ik/c) ∫_Γ G(x, x′) J(x′) d³x′ + E_E(x),    (5.1.1)

where E_E is the electric field due to external sources, J(x) is the current density, ρ(x) is the charge density, and the constants k and c are the wavenumber and the speed of light, respectively. The Green's function G can be expressed as

    G(x, x′) = e^{−ik|x−x′|} / |x − x′|.    (5.1.2)

The EFIE is converted into matrix equations by the Method of Moments [86]. The unknown current J(x) on the surface of the object is expanded into a set of basis functions B_i, i = 1, 2, ..., N:

    J(x) = ∑_{i=1}^{N} J_i B_i(x).

This expansion is introduced in (5.1.1), and the discretized equation is applied to a set of test functions; a linear system is finally obtained. The entries of the coefficient matrix of the system are expressed in terms of surface integrals and have the form

    A_{KL} = ∫∫ G(x, y) B_K(x) · B_L(y) dL(y) dK(x).    (5.1.3)

When m-point Gauss quadrature formulae are used to compute the surface integrals in (5.1.3), the entries of the coefficient matrix take the form

    A_{KL} = ∑_{i=1}^{m} ∑_{j=1}^{m} ω_i ω_j G(x_{Ki}, y_{Lj}) B_K(x_{Ki}) · B_L(y_{Lj}).    (5.1.4)
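To illustrate (5.1.4), a sketch of the assembly of one entry follows; the quadrature points and weights and the basis-function callables are assumed inputs, not the actual assembly code used in this thesis.

import numpy as np

def entry_KL(G, xq, wx, yq, wy, BK, BL):
    # Double quadrature sum of (5.1.4) for one matrix entry A_KL.
    # xq, yq: quadrature points on triangles K and L; wx, wy: weights;
    # BK, BL: basis functions mapping a point to a 3-vector.
    a = 0.0 + 0.0j
    for xi, wi in zip(xq, wx):
        for yj, wj in zip(yq, wy):
            a += wi * wj * G(xi, yj) * np.dot(BK(xi), BL(yj))
    return a

def G(x, y, k=1.0):
    # The kernel (5.1.2) for a wavenumber k.
    r = np.linalg.norm(np.asarray(x) - np.asarray(y))
    return np.exp(-1j * k * r) / r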

Single and multilevel variants of the FMM exist and, for the multilevel algorithm, there are adaptive variants that handle inhomogeneous discretizations efficiently. In the one-level algorithm, the 3D obstacle is entirely enclosed in a large rectangular domain, and the domain is divided into eight boxes (four in 2D). Each box is recursively divided until the length of the edges of the boxes of the current level is small enough compared with the wavelength. The neighbourhood of a box is defined by the box itself and its 26 adjacent neighbours (eight in 2D). The interactions of degrees of freedom within nearby boxes are computed exactly from (5.1.4), where the Green's function is expressed via (5.1.2). The contributions of far-away cubes are computed approximately: for each far-away box, the effect of a large number of degrees of freedom is concentrated into one multipole coefficient, computed using a truncated series expansion of the Green's function

    G(x, y) = ∑_{p=1}^{P} ψ_p(x) φ_p(y).    (5.1.5)

The expansion (5.1.5) separates the Green's function into two sets of terms, ψ_p and φ_p, that depend on the observation point x and on the source (or evaluation) point y, respectively. In (5.1.5) the origin of the expansion is near the source point and the observation point x is far away. Local coefficients for the observation cubes are computed by summing together the multipole coefficients of far-away boxes, and the total effect of the far field on each observation point is evaluated from the local expansions (see Figure 5.1.1 for a 2D illustration). Local and multipole coefficients can be computed in a preprocessing step; the approximate computation of the far field enables us to reduce the computational cost of the matrix-vector product to O(n^{3/2}) in the basic one-level algorithm.
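The computational saving behind the separable expansion can be seen in a few lines: once the kernel is written as in (5.1.5), a far-field block becomes a rank-P product, and a matrix-vector product with it costs O((m + n)P) instead of O(mn). The factors below are random placeholders standing in for the actual expansion functions.

import numpy as np

rng = np.random.default_rng(0)
m, n, P = 1000, 800, 16             # observation points, sources, expansion length
Psi = rng.standard_normal((m, P))   # stands in for psi_p(x_i)
Phi = rng.standard_normal((n, P))   # stands in for phi_p(y_j)
q = rng.standard_normal(n)          # source strengths

y_fast = Psi @ (Phi.T @ q)          # O((m + n) P) operations
y_slow = (Psi @ Phi.T) @ q          # forming the dense block costs O(m n)
assert np.allclose(y_fast, y_slow)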

In the hierarchical multilevel algorithm, the obstacle is enclosed in a cube, the cube is divided into eight subcubes, and each subcube is recursively divided until the size of the smallest box is generally half a wavelength. Tree-structured data are used at all levels; in particular, only non-empty cubes are indexed and recorded in the data structure. The resulting tree is called an oct-tree (see Figure 5.1.2) and we refer to its leaves as the leaf-boxes. The oct-tree provides a hierarchical representation of the computational domain partitioned by boxes. Each box has one parent in the oct-tree, except for the largest cube, which encloses the whole domain, and up to eight children; the leaf-boxes obviously have no children. Multipole coefficients are computed for all cubes in the lowest level of the oct-tree, that is, for the leaf-boxes. Multipole coefficients of the parent cubes in the hierarchy are computed by summing together contributions from the multipole coefficients of their children. The process is repeated recursively up to the coarsest possible level. For each observation cube, an interaction list is defined that consists of those cubes that are not neighbours of the cube itself but whose parent is a neighbour of the cube's parent. In Figure 5.1.3 we denote by dashed lines the interaction list



for the observation cube in the 2D case. The interactions of degrees of freedom within neighbouring boxes are computed exactly, while the interactions between cubes in the interaction list are computed using the FMM. All the other interactions are computed hierarchically on a coarser level by traversing the oct-tree. Both the computational cost and the memory requirement of the algorithm are of order O(n log n). For further details on the algorithmic steps see [39, 115, 124], and [38, 44, 45, 46] for recent theoretical investigations. Parallel implementations of hierarchical methods have been described in [78, 79, 80, 81, 126, 149].
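The interaction-list rule just described fits in a few lines. The sketch below assumes a hypothetical oct-tree stored as dictionaries, where parent[b] is the parent of box b, children[p] lists the children of box p, and neighbours[b] is the set containing b and its adjacent boxes on the same level.

def interaction_list(b, parent, children, neighbours):
    # Children of the neighbours of b's parent ...
    candidates = [c for p in neighbours[parent[b]] for c in children[p]]
    # ... that are not themselves neighbours of b.
    return [c for c in candidates if c not in neighbours[b]]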

Figure 5.1.1: Interactions in the one-level FMM. For each leaf-box, the interactions with the gray neighbouring leaf-boxes are computed directly. The contributions of far-away cubes are computed approximately: the multipole expansions of far-away boxes are translated to local expansions for the leaf-box, these contributions are summed together, and the total field induced by far-away cubes is evaluated from the local expansions.

5.2 Implementation of the Frobenius-norm minimization preconditioner in the fast multipole framework

An efficient implementation of the Frobenius-norm minimization preconditioner in the FMM context exploits the box-wise partitioning of the domain. The subdivision into boxes of the computational domain uses



Figure 5.1.2: The oct-tree in the FMM algorithm. The maximum number of children is eight; the actual number corresponds to the subset of the eight subcubes that intersect the object (courtesy of G. Sylvand, INRIA CERMICS).

geometric information from the obstacle, that is, the spatial coordinates of its degrees of freedom. As we know from Chapter 3, this information can be profitably used to compute an effective a priori sparsity pattern for the approximate inverse. In the FMM implementation, we adopt the following criterion: the nonzero structure of each column of the preconditioner is defined by retaining all the edges within a given leaf-box and those in one level of neighbouring boxes. We recall that the neighbourhood of a box is defined by the box itself and its 26 adjacent neighbours (eight in 2D). The sparse approximation of the dense coefficient matrix is defined by retaining the entries associated with edges included in the given leaf-box as well as those belonging to the two levels of neighbours. The actual entries of the approximate inverse are computed column by column by solving independent least-squares problems. The main advantage of defining the patterns of the preconditioner and of the sparsified matrix box-wise is that we only have to compute one QR factorization per leaf-box: the least-squares problems corresponding to edges within the same box are identical, because they are defined using the same nonzero structure and the same entries of A. This means that the QR factorization can be performed once and reused many times, which improves the efficiency of the computation significantly. The preconditioner has a sparse block structure; each block is dense and is associated with one leaf-box. Its construction can use a partitioning different from the one used to approximate the dense coefficient matrix and represented by the oct-tree.



Figure 5.1.3: Interactions in the multilevel FMM. The interactions for the gray boxes are computed directly. The dashed lines denote the interaction list for the observation box, which consists of those cubes that are not neighbours of the cube itself but whose parent is a neighbour of the cube's parent. The interactions of the cubes in the list are computed using the FMM. All the other interactions are computed hierarchically on a coarser level, denoted by solid lines.

The size of the smallest boxes in the partitioning associated with the preconditioner is a user-defined parameter that can be tuned to control the number of nonzeros computed per row, that is, the density of the preconditioner. According to our criterion, the larger the size of the leaf-boxes, the larger the geometric neighbourhood that determines the sparsity structure of the columns of the preconditioner. Parallelism can be exploited by assigning disjoint subsets of leaf-boxes to different processors and performing the least-squares solutions independently on each processor. Communication is required only to get the entries of the coefficient matrix associated with neighbouring leaf-boxes.
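A compact sketch of this box-wise construction is given below, in dense storage for readability; boxes[b] (the DOFs of leaf-box b) and nbrs[b] (the DOFs of the box and its one level of neighbours) are assumed inputs, since the actual code works inside the parallel FMM data structures.

import numpy as np

def boxwise_spai(A_tilde, boxes, nbrs):
    # All columns (edges) of a leaf-box share the same pattern, so a
    # single QR factorization per box is computed and then reused for
    # every column of the box.
    n = A_tilde.shape[0]
    M = np.zeros_like(A_tilde)
    for b, dofs in enumerate(boxes):
        J = np.asarray(nbrs[b])                  # shared column pattern
        Q, R = np.linalg.qr(A_tilde[:, J])       # one QR per leaf-box
        for j in dofs:                           # reused for each edge
            e = np.zeros(n, dtype=A_tilde.dtype)
            e[j] = 1.0
            M[J, j] = np.linalg.solve(R, Q.conj().T @ e)
    return M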

5.3 Numerical scalability of the preconditioner

In this section we show results concerning the numerical scalability of the Frobenius-norm minimization preconditioner. They have been obtained by increasing the value of the frequency while illuminating the same obstacle; the surface of the object is always discretized using ten points per wavelength. We consider two test examples: a sphere of radius 1 metre
per wavelength. We consider two test examples: a sphere of radius 1 metre



and an Airbus aircraft (see Figure 5.3.4) that represents a real-life model problem in an industrial context.

Figure 5.3.4: Mesh associated with the Airbus aircraft (courtesy of EADS). The surface is discretized by 15784 triangles.

In Table 5.3.1, we present the number of matrix-vector products using either GMRES(30) or TFQMR with a required accuracy of 10⁻² on the normwise backward error ‖r‖/‖b‖, where r denotes the residual and b the right-hand side of the linear system, for the experiments on the sphere. This tolerance is adequate for engineering purposes, as it enables us to determine correctly the radar cross section of the object. The symbol '–' means no convergence after 1500 iterations. In Table 5.3.2, we show the number of iterations and the parallel elapsed time to build the preconditioner and to solve the linear system as its size is increased. Similar information is reported for the experiments on the Airbus aircraft in Tables 5.3.3 and 5.3.4. All the runs were performed in single precision on eight processors of a Compaq Alpha server. The Compaq Alpha server is a cluster of Symmetric Multi-Processors; each node consists of four Alpha processors that share 512 Mb of memory. On that computer the



temporary disk space that can be used by the out-of-core solver is around 189 Gb.

Size of the    Density of the    Frequency   radius/λ   GMRES(30)  TFQMR
linear system  preconditioner      (GHz)
    40368          1.16%            0.9          3           99      152
    71148          0.33%            1.2          4           83      171
   112908          0.21%            1.5          5           96      134
   161472          0.15%            1.8          6           96      654
   221952          0.11%            2.1          7          438       –
   288300          0.08%            2.4          8          348       –
   549552          0.04%            3.3         11          532       –
  1023168          0.02%            4.5         15         1196       –

Table 5.3.1: Total number of matrix-vector products required to converge on a sphere for problems of increasing size; tolerance = 10⁻². The size of the leaf-boxes in the oct-tree associated with the preconditioner is 0.125 wavelengths.

Size of the    GMRES(30)   Disk memory     Construction   Solution
linear system  iterations  used (Mbytes)       time         time
    71148           83          16.5          13 mins       3 mins
   161472           96          37.8          30 mins       8 mins
   288300          348          67.9          55 mins       1 hour
   549552          532         129.7        1 h 45 mins     4 hours
  1023168         1196         243.5        3 h 10 mins     1 day

Table 5.3.2: Elapsed time required to build the preconditioner and for GMRES(30) to converge on a sphere, for problems of increasing size, on eight processors of a Compaq Alpha server; tolerance = 10⁻².



Size of the    Frequency   GMRES(30)  TFQMR
linear system    (GHz)
    23676          2.3          61       –
    94704          4.6         101       –
   213084          6.9         225       –
   378816          9.2          –        –
   591900         11.4          –        –
  1160124         16.1          –        –

Table 5.3.3: Total number of matrix-vector products required to converge on an aircraft for problems of increasing size; tolerance = 2·10⁻².

Size of the    GMRES(30)   Disk memory     Construction   Solution
linear system  iterations  used (Mbytes)       time         time
    23676           61           5.7           4 mins       3 mins
    94704          101          26.3          26 mins      13 mins
   213084          225          63.7          54 mins      47 mins
   591900           –          169.9        2 h 30 mins       –
  1160124           –          338.8        3 h 15 mins       –

Table 5.3.4: Elapsed time required to build the preconditioner and for GMRES(30) to converge on an aircraft, for problems of increasing size, on eight processors of a Compaq Alpha server; tolerance = 2·10⁻².

The number of iterations and the computational cost grow rapidly with the problem size. On the sphere, the number of iterations required by GMRES(30) is nearly constant for small problems but increases linearly for larger problems. The solution by GMRES(30) of a scattering problem at a frequency of 3.3 GHz, discretized with half a million points, requires 532 matrix-vector products and four hours of computation to solve the associated linear system. Nearly one day of computation is necessary to solve the same problem at a frequency of 4.5 GHz; in this case the matrix has one million unknowns, and GMRES(30) requires 1196 iterations to converge. Compared with GMRES, TFQMR exhibits a very poor convergence behaviour: it never converges in fewer than 1500 matrix-vector products on systems with more than two hundred thousand unknowns. The Airbus aircraft is even more challenging to solve. On the smallest problem,



of size 23676, neither GMRES(30) nor TFQMR converges to 10⁻² in 1500 iterations, and 115 iterations are required by full GMRES. On a larger test, of size 94704, stagnation occurs for small or medium restarts, and full GMRES requires 625 iterations. In Figure 5.3.7, we show for this problem the normwise backward error after 1500 iterations of GMRES for different values of the restart, when stagnation appears. Convergence to 10⁻² is achieved only when a very large restart (around 500) is selected. For large problems, however, this choice might not be affordable, because it is too demanding in terms of storage requirements. If we relax the required accuracy to 2·10⁻², convergence becomes much easier to achieve in a reasonable elapsed time and at an affordable memory cost, at least for medium-size problems. As can be observed in Table 5.3.3, GMRES(30) then converges in fewer than 1500 iterations on problems of size up to two hundred thousand unknowns. Although this tolerance may seem artificial, we checked at the end of the computation that the radar cross section of the obstacle was accurately determined. In Figures 5.3.5 and 5.3.6 we show the typical curves of the radar cross section for an Airbus aircraft discretized with 200000 unknowns. The quantity reported on the ordinate axis indicates the value of the energy radiated back at different incidence angles. The RCS curve depicted in Figure 5.3.5 is obtained when we require an accuracy of 2·10⁻² on the normwise backward error in the solution of the linear system. The RCS curve depicted in Figure 5.3.6 is obtained using another integral formulation, the CFIE, which is better conditioned and simpler to solve, and requiring an accuracy of 10⁻⁶ on the normwise backward error in the iterative solution. The CFIE formulation is less general than the EFIE but can be used on closed targets like the Airbus aircraft. It can be observed that in both figures the peaks are equally well approximated. Thus, for engineering purposes the solution is still meaningful and can be exploited in the design process. We use this tolerance for the remaining numerical experiments on the Airbus aircraft.

In Table 5.3.5, we investigate the influence of the density on the quality of the preconditioner for the aircraft. We adopt the same criterion described in Section 5.2 to define the sparsity patterns, but we increase the size of the leaf-boxes in the oct-tree associated with the preconditioner. The best trade-off between cost and performance is obtained for 0.125 wavelengths, which is the default value set in the code. If the preconditioner is reused to solve systems with the same coefficient matrix and multiple right-hand sides, it might be worth computing more nonzeros, because the construction cost can be quickly amortized. If the size of the leaf-boxes is large enough, the preconditioner is very effective in reducing the number of GMRES iterations; for values smaller than 0.1 wavelengths the preconditioner is very sparse and quite poor, while for values larger than 0.2 wavelengths the memory requirements exceed the limits of our machine.

Finally, in Table 5.3.6, we show the parallel scalability of the implementation of the preconditioner in the FMM code [135].



Figure 5.3.5: The RCS curve for an Airbus aircraft discretized with 200000 unknowns. The problem is formulated using the EFIE, with a tolerance of 2·10⁻² in the iterative solution. The quantity reported on the ordinate axis indicates the value of the energy radiated back at different incidence angles.

Figure 5.3.6: The RCS curve for an Airbus aircraft discretized with 200000 unknowns. The problem is formulated using the CFIE, with a tolerance of 10⁻⁶ in the iterative solution. The quantity reported on the ordinate axis indicates the value of the energy radiated back at different incidence angles.

We solve problems of increasing size on a larger number of processors, keeping the number of unknowns per processor constant. We refer to [135] for a complete description of the parallel code that we used.



Figure 5.3.7: Effect of the restart parameter on GMRES stagnation on an aircraft with 94704 unknowns. (Plot: normwise backward error after 1500 iterations of restarted GMRES versus the value of the restart, from 0 to 500; the error ranges from about 0.03 down to 0.01.)

radius (in     # nonzeros     # mat-vec   Construction   Solution     Overall
wavelengths)   per row in M   in GMRES     time (sec)    time (sec)  time (sec)
   0.097           183            –           1275            –           –
   0.110           235           472          1836          8121        9957
   0.125           299           225          2593          2846        5439
   0.141           372            –           4213            –           –
   0.157           461            –           5866            –           –
   0.176           569           278          7234          3637       10871
   0.195           684           129         10043          1571       11614

Table 5.3.5: Elapsed time to build the preconditioner, elapsed time to solve the problem, and total number of matrix-vector products using GMRES(30) on an aircraft with 213084 unknowns; tolerance = 2·10⁻²; eight processors of the Compaq machine, varying the parameter controlling the density of the preconditioner. The symbol '–' means stagnation after 1000 iterations.



Problem size   Nb procs   Construction   Elapsed time     Elapsed time
                           time (sec)    precond (sec)    mat-vec (sec)
   112908          8           513           0.39             1.77
   161472         12           488           0.40             1.95
   221952         16           497           0.43             2.15
   288300         20           520           0.45             2.28
   342732         24           523           0.47             3.10
   393132         28           514           0.47             3.30
   451632         32           509           0.48             2.80
   674028         48           504           0.54             3.70
   900912         64           514           0.60             3.80

Table 5.3.6: Parallel scalability of the code with respect to the construction and application of the preconditioner and to the matrix-vector product, on problems of increasing size. The test example is the Airbus aircraft.

5.4 Improving the preconditioner robustness using embedded iterations

The numerical results shown in the previous section indicate that the Frobenius-norm minimization preconditioner tends to be less effective as the problem size increases. By its nature, the sparse approximate inverse is inherently local, because each degree of freedom is coupled to only a few neighbours. The compact support that we use to define the preconditioner does not allow an exchange of global information, and when the exact inverse is globally coupled this lack of global information may have a severe impact on the quality of the preconditioner. In addition, in a multipole context, the density of the sparse approximate inverse tends to decrease for increasing values of the frequency, because the size of the subdivision boxes gets smaller when the frequency of the problem is higher. For the solution of large problems it may therefore be necessary to introduce some mechanism to recover global information on the numerical behaviour of the discrete Green's function. In this section we investigate the behaviour of inner-outer solution schemes implemented in the FMM context. We consider in particular FGMRES [121] as the outer solver, with an inner GMRES iteration preconditioned with the Frobenius-norm minimization method; for FGMRES, we use the implementation described in [64]. The motivation that naturally leads us to consider inner-outer schemes is to try to balance the locality of the preconditioner with the use of the multipole matrix.



Outer solver −→ FGMRES, FQMR
Do k = 1, 2, ...
   • M-V product: FMM with high accuracy
   • Preconditioning: inner solver (GMRES, TFQMR, ...)
       Do i = 1, 2, ...
          • M-V product: FMM with low accuracy
          • Preconditioning: M_Frob
       End Do
End Do

Figure 5.4.8: Inner-outer solution schemes in the FMM context. Sketch of the algorithm.

to balance the locality of the preconditioner with the use of the multipole matrix. The matrix-vector products within the outer and the inner solvers are carried out at different accuracies: a highly accurate FMM is used within the outer solver, which actually solves the linear system, while a less accurate FMM within the inner solver is used as a preconditioner for the outer scheme. In fact, we solve a nearby system for the preconditioning operation, which enables us to save considerable computational effort during the iterative process. More precisely, the FMM accuracy is "high" for the FGMRES iteration (the relative error in the matrix-vector computation is around $5 \cdot 10^{-4}$ compared to the exact computation) and "medium" for the inner iteration (the relative error is around $10^{-3}$). We present a sketch of the algorithm in Figure 5.4.8. One could apply this idea recursively and embed several FGMRES schemes with decreasing FMM accuracy, down to the lowest accuracy in the innermost GMRES. However, in our work we only consider a two-level scheme; we will see that this is already quite effective.
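To make the structure of the scheme concrete, the following is a minimal Python/NumPy sketch of a restarted, right-preconditioned flexible GMRES. It is an illustration rather than the code used in this chapter: the functions matvec (standing for the high-accuracy FMM product) and precond (standing for a few inner GMRES steps with the low-accuracy FMM, preconditioned by the Frobenius-norm method) are assumptions of this sketch and must be supplied by the user.

    import numpy as np

    def fgmres(matvec, b, precond, m=5, tol=2e-2, max_restarts=100):
        # Restarted, right-preconditioned flexible GMRES(m).
        #   matvec  : x -> A x         (e.g. a high-accuracy FMM product)
        #   precond : r -> z ~ inv(A) r, allowed to change between calls
        #             (e.g. a few GMRES steps with a low-accuracy FMM)
        n = b.shape[0]
        x = np.zeros_like(b)
        beta0 = np.linalg.norm(b)
        for _ in range(max_restarts):
            r = b - matvec(x)
            beta = np.linalg.norm(r)
            if beta <= tol * beta0:                   # normwise backward error test
                return x
            V = np.zeros((n, m + 1), dtype=b.dtype)   # Arnoldi basis
            Z = np.zeros((n, m), dtype=b.dtype)       # preconditioned directions
            H = np.zeros((m + 1, m), dtype=b.dtype)
            V[:, 0] = r / beta
            m_eff = m
            for j in range(m):
                Z[:, j] = precond(V[:, j])            # flexible step: M_j may vary
                w = matvec(Z[:, j])
                for i in range(j + 1):                # modified Gram-Schmidt
                    H[i, j] = np.vdot(V[:, i], w)
                    w = w - H[i, j] * V[:, i]
                H[j + 1, j] = np.linalg.norm(w)
                if H[j + 1, j] < 1e-14 * beta:        # (lucky) breakdown
                    m_eff = j + 1
                    break
                V[:, j + 1] = w / H[j + 1, j]
            e1 = np.zeros(m_eff + 1, dtype=b.dtype)
            e1[0] = beta
            y = np.linalg.lstsq(H[:m_eff + 1, :m_eff], e1, rcond=None)[0]
            x = x + Z[:, :m_eff] @ y                  # the update uses Z, not V
        return x

Note that the preconditioned directions Z must be stored alongside the Krylov basis V; this is precisely what doubles the storage of FGMRES with respect to standard GMRES for the same restart value, a point we return to below.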

Among the various possibilities, we select FGMRES(5) and GMRES(20), which seem to give the optimal trade-off, as the results reported in Tables 5.4.7 and 5.4.8 show for experiments on a sphere with 367500 points and an Airbus aircraft with 213084 points, respectively.



restart FGMRES | restart GMRES | max inner GMRES | total inner mat-vec | total outer mat-vec | Solution time (sec)
5 | 10 | 10 | 230 | 29 | 4211
5 | 10 | 20 | 231 | 15 | 3526
5 | 10 | 30 | 288 | 12 | 4544
5 | 20 | 20 | 180 | 12 | 2741
5 | 20 | 30 | 248 | 11 | 3967
5 | 20 | 40 | 246 | 9 | 3785
10 | 10 | 10 | 180 | 21 | 2912

Table 5.4.7: Global elapsed time and total number of matrix-vector products required to converge on a sphere with 367500 points, varying the size of the restart parameters and the maximum number of inner GMRES iterations per FGMRES preconditioning step – tolerance = $10^{-2}$ – eight processors Compaq.

restart FGMRES | restart GMRES | max inner GMRES | total outer mat-vec | total inner mat-vec | Solution time (sec)
5 | 10 | 10 | +200 | +4000 | –
5 | 10 | 20 | 13 | 210 | 2198
5 | 10 | 30 | 12 | 288 | 3178
5 | 20 | 20 | 11 | 160 | 1768
5 | 20 | 30 | 9 | 186 | 2020
5 | 20 | 40 | 7 | 205 | 2183
5 | 30 | 30 | 9 | 180 | 1946
10 | 10 | 10 | 19 | 160 | 1861
10 | 20 | 20 | 10 | 160 | 1827
10 | 30 | 30 | 8 | 180 | 2123

Table 5.4.8: Global elapsed time and total number of matrix-vector products required to converge on an aircraft with 213084 unknowns, varying the size of the restart parameters and the maximum number of inner GMRES iterations per FGMRES preconditioning step – tolerance = $2 \cdot 10^{-2}$ – eight processors Compaq. The symbol '–' means no convergence.



The convergence history of GMRES depicted in Figure 5.4.9 for different values of the restart gives us some clues to the numerical behaviour of the proposed scheme. The residual of GMRES tends to decrease very rapidly in the first few iterations independently of the restart, then decreases much more slowly, and finally stagnates at a value that depends on the restart: the larger the restart, the lower the stagnation value. This suggests that a few steps (up to 20) in the inner solver can be very effective for obtaining a significant reduction of the initial residual. A different numerical behaviour has been observed with TFQMR as the inner solver: the residual is nearly constant, or decreases very slowly, at the beginning of the convergence, so that this method is ineffective as an inner solver. Figure 5.4.9 also shows that large restarts of GMRES do not enable a further reduction of the normwise backward error at the beginning of the convergence. Thus small restarts should be preferred in the inner GMRES iterations.

[Plot omitted: normwise backward error (from 0.01 to 0.07) against the number of M-V products (from 0 to 1500), one curve per restart value: 10, 20, 30, 50, 80, 150, 300, 500.]

Figure 5.4.9: Convergence history of restarted GMRES for different values of the restart on an aircraft with 94704 unknowns.

We show the results of some preliminary experiments in Tables 5.4.9 and 5.4.10, where we report the number of inner and outer matrix-vector products needed to achieve convergence on the sphere using a tolerance of $10^{-2}$, and on the Airbus aircraft using a tolerance of $2 \cdot 10^{-2}$. We also give timings. The comparison with the results shown in Tables 5.3.1 and 5.3.3 is fair because GMRES(30) has exactly the same storage requirements as the combination FGMRES(5)/GMRES(20); in fact, for the same restart value, the storage requirement for the FGMRES algorithm is twice that for the standard GMRES algorithm, as it stores the preconditioned vectors of the Krylov basis. The combination FGMRES/GMRES remarkably enhances the robustness of the preconditioner on large problems. On the sphere with 367500 points, it enables convergence in 16 outer and 252 total inner iterations, whereas GMRES(30) does not converge in 1500 iterations. On the sphere with one million unknowns, the elapsed time for the iterative solution is reduced from 11 hours to one and a half hours on 16 processors. The enhancement of the robustness of the preconditioner is even more significant on the Airbus aircraft, as GMRES(30) does not converge for problem sizes larger than around 200000 unknowns. This can be observed in Table 5.4.10, and also in Figure 5.4.10, where we report the normwise backward error after 100 outer iterations of FGMRES for different values of the restart of FGMRES. The value reported in this figure can be considered as the level of stagnation of the normwise backward error. The depicted curve can be compared to the one given in Figure 5.3.7: the normwise backward error is much smaller than that obtained with the standard GMRES at a comparable computational cost. Finally, we mention that the combination FGMRES(5)/GMRES(20) does not converge on one problem, of size 378816, using a tolerance of $2 \cdot 10^{-2}$. In fact, for some specific values of the frequency, resonance phenomena may occur in the associated physical problem, and the resulting linear system can become very ill-conditioned.

Size of the linear system | FGMRES(5) mat-vec | GMRES(20) mat-vec | Solution time
40368 | 7 | 105 | 2 mins
71148 | 7 | 105 | 4 mins
112908 | 7 | 105 | 7 mins
161472 | 9 | 126 | 13 mins
221952 | 13 | 210 | 29 mins
288300 | 13 | 210 | 37 mins
367500 | 16 | 252 | 1 h 10 mins
549552 | 17 | 260 | 1 h 50 mins
1023168 | 17 | 260 | 3 h 20 mins

Table 5.4.9: Total number of matrix-vector products required to converge on a sphere on problems of increasing size – tolerance = $10^{-2}$.



Size of the linear system | FGMRES(5) mat-vec | GMRES(20) mat-vec | Solution time
23676 | 15 | 220 | 7 mins
94704 | 7 | 100 | 9 mins
213084 | 11 | 160 | 36 mins
591900 | 17 | 260 | 3 h 25 mins
1160124 | 19 | 300 | 8 h 42 mins

Table 5.4.10: Total number of matrix-vector products required to converge on an aircraft on problems of increasing size – tolerance = $2 \cdot 10^{-2}$.

[Plot omitted: normwise backward error after 100 iterations of restarted FGMRES (from 0.004 to 0.016) against the value of the restart for FGMRES (from 0 to 150).]

Figure 5.4.10: Effect of the restart parameter on FGMRES stagnation on an aircraft with 94704 unknowns using GMRES(20) as inner solver.

5.5 Concluding remarks

In this chapter, we have described the implementation of the Frobenius-norm minimization preconditioner within the code that implements the Fast Multipole Method (FMM). We have studied the numerical and parallel scalability of the implementation for the solution of large problems, with up to one million unknowns, and we have investigated the numerical behaviour of inner-outer iterative solution schemes implemented in a multipole context with different levels of accuracy for the matrix-vector products in the inner and outer loops. In particular, we have shown that the combination FGMRES(5)/GMRES(20) can effectively enhance the robustness of the preconditioner, significantly reducing the computational cost and the storage requirement for the solution of large problems. Most of the experiments shown in this chapter require a huge amount of computation and storage, and they often reach the memory limits of our target machine. For the solution of systems with one million unknowns, direct methods would require 8 Tbytes of storage and 37 years of computation on one processor of the target computer (assuming the computation runs at peak performance).

Some questions are still open. One issue concerns the optimal tuning of the inner accuracy of the FMM. In the numerical experiments we selected a "medium" accuracy for the inner iteration; as mentioned before, using a less accurate FMM in the inner GMRES does not enable us to get convergence of the outer FGMRES. A multilevel scheme can be designed as a natural extension of the simple two-level scheme considered in this chapter, with several embedded FGMRES levels going down to the lowest accuracy in the innermost GMRES. An interesting further experiment might be to use variants of these schemes, based on the FQMR method [136] as the outer solver and SQMR as the inner solver. The SQMR scheme is remarkably robust on these applications when used in combination with a symmetric Frobenius-norm minimization preconditioner such as those introduced in Chapter 4.




Chapter 6

Spectral two-level preconditioner

In the previous chapter, we analysed the numerical behaviour of the Frobenius-norm minimization method for the solution of large problems. The numerical results indicate that the preconditioner is less effective when the problem size increases, because of the inherently local nature of the approximate inverse and the global behaviour of the equations. In this chapter, we introduce an algebraic multilevel strategy based on low-rank updates of the preconditioner, computed by using spectral information of the preconditioned matrix.

The chapter is organized in the following way. In Section 6.1, we motivate the idea of the construction of multilevel preconditioners via low-rank updates, and we provide a few references to similar work. In Section 6.2, we describe an additive formulation of the preconditioner for both unsymmetric and symmetric systems, and we show the results of numerical experiments illustrating the computational and numerical efficiency of the algorithm on a set of model problems arising from electromagnetic calculations. In Section 6.3, we describe a multiplicative formulation of the preconditioner and give some comparative results. We conclude the chapter with some final remarks and perspectives.

6.1 Introduction and motivation

The construction of the Frobenius-norm minimization preconditioner is inherently local. Each degree of freedom in the approximate inverse is coupled to only a very few neighbours, and this compact support does not allow an exchange of global information. When the exact inverse is globally coupled, the lack of global information may have a severe impact on the quality of the preconditioner. The discrete Green's function in electromagnetic applications exhibits a rapid decay; nevertheless, the exact inverse is dense and thus has global support. The locality of the preconditioner can be reduced by increasing the number of nonzeros computed, but the construction cost grows almost cubically with respect to the density. Enlarging the sparsity pattern imposed on A can be a cheaper remedy, because the computational cost of the least-squares solutions grows only linearly with the number of rows. However, in a multipole context, where only the entries of the coefficient matrix associated with the near-field interactions are available, the computation of additional entries of A requires the approximation of surface integrals.
A requires the approximation of surface integrals.<br />

In this chapter, we propose a refinement technique which enhances the<br />

robustness of the approximate inverse on large problems. The method<br />

is based on the introduction of low-rank updates computed by exploiting<br />

spectral in<strong>for</strong>mation of the preconditioned matrix. The purpose here<br />

is to remove the effect of the smallest eigenvalues in magnitude in the<br />

preconditioned matrix, which potentially can slow down the convergence<br />

of Krylov solvers. We discussed in Chapter 2 that a clustered spectrum<br />

is highly desirable property <strong>for</strong> the rapid convergence of Krylov methods.<br />

In exact arithmetic the number of distinct eigenvalues would determine<br />

the maximum dimension of the Krylov subspace. If the diameters of<br />

the clusters are small enough, the eigenvalues within each cluster behave<br />

numerically like a single eigenvalue, and we would expect less iterations<br />

of a Krylov method to produce reasonably accurate approximations. The<br />

Frobenius-norm minimization preconditioner succeeds in clustering most of<br />

the eigenvalues far <strong>from</strong> the origin, nevertheless eigenvalues nearest zero<br />

can potentially slow down convergence. Theoretical studies have related<br />

super<strong>linear</strong> convergence of GMRES to the convergence of Ritz values [143].<br />

Basically, convergence occurs as if, at each iteration of GMRES, the next<br />

smallest eigenvalue in magnitude is removed <strong>from</strong> the system. As the<br />

restarting procedure destroys in<strong>for</strong>mation about the Ritz values at each<br />

restart, the super<strong>linear</strong> convergence may be lost. Thus removing the effect<br />

of small eigenvalues in the preconditioned matrix can have a beneficial effect<br />

on the convergence.<br />

There are essentially two different approaches <strong>for</strong> exploiting in<strong>for</strong>mation<br />

related to the smallest eigenvalues during the iteration. The first<br />

idea is to compute a few, k say, approximate eigenvectors of MA<br />

corresponding to the k smallest eigenvalues in magnitude, and enlarge<br />

the Krylov subspace with those directions. At each restart, let<br />

u 1 , u 2 , ..., u k be approximate eigenvectors corresponding to the approximate<br />

eigenvalues of MA closest to the origin. The updated solution of<br />

the <strong>linear</strong> system in the next cycle of GMRES is extracted <strong>from</strong><br />

Span{r 0 , Ar 0 , A 2 r 0 , A 3 r 0 , ..., A m−k−1 r 0 , u 1 , u 2 , ..., u k }. This approach is<br />

referred to as the augmented subspace approach (see [112, 113, 120]).



The approximate eigenvectors can be chosen to be Ritz vectors from the Arnoldi method. The standard implementation of the restarted GMRES(m) algorithm is based on the Arnoldi process, and this allows us to recover spectral information about $MA$ during the iterations. Deflation techniques of this kind have been proposed in [94, 43].

The second idea exploits spectral information gathered during the Arnoldi process to determine an approximation of an invariant subspace of $A$ associated with the eigenvalues nearest the origin, and uses this information to construct a preconditioner or to update the preconditioner. The idea of using exact invariant subspaces to improve the eigenvalue distribution was proposed in [119]. Information from the invariant subspace associated with the smallest eigenvalues and its orthogonal complement is used to construct a preconditioner in the approach proposed in [7]. This information can be obtained from the Arnoldi decomposition of a matrix $A$ of size $n$, which has the form

$$ A V_m = V_m H_m + f_m e_m^T, $$

where $V_m \in \mathbb{R}^{n \times m}$, $f_m \in \mathbb{R}^n$, $e_m$ is the $m$-th unit vector of $\mathbb{R}^m$, $V_m^T V_m = I_m$, $V_m^T f_m = 0$, and $H_m \in \mathbb{R}^{m \times m}$ is an upper Hessenberg matrix. If the Arnoldi process is started from $V_m e_1 = r_0/\|r_0\|$, the columns of $V_m$ span the Krylov subspace $\mathcal{K}_m(A, r_0)$. Let the matrix $V_k \in \mathbb{R}^{n \times k}$ consist of the first $k$ columns $v_1, v_2, \ldots, v_k$ of $V_m$, and let the columns of the orthogonal matrix $W_{n-k}$ span the orthogonal complement of $\operatorname{span}\{v_1, v_2, \ldots, v_k\}$. As $W_{n-k}^T W_{n-k} = I_{n-k}$, the columns of the matrix $[V_k \ W_{n-k}]$ form an orthogonal basis of $\mathbb{R}^n$. In [7] the inverse of the matrix

$$ M = V_k H_k V_k^T + W_{n-k} W_{n-k}^T $$

is used as a left preconditioner. It can be expressed as

$$ M^{-1} = V_k H_k^{-1} V_k^T + W_{n-k} W_{n-k}^T. $$

At each restart, the preconditioner is updated by extracting new eigenvalues which are the smallest in magnitude. The proposed algorithm uses the recursion formulae of the implicitly restarted Arnoldi (IRA) method described in [132], and the determination of the preconditioner does not require the evaluation of any matrix-vector products with the matrix $A$ in addition to those needed for the Arnoldi process.
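As an aside, applying this $M^{-1}$ does not require forming $W_{n-k}$ explicitly, since $W_{n-k} W_{n-k}^T = I - V_k V_k^T$. A minimal NumPy sketch of the application (assuming $V_k$ has orthonormal columns from the Arnoldi process and $H_k$ is the leading $k \times k$ block of the Hessenberg matrix):

    import numpy as np

    def apply_deflated_inverse(Vk, Hk, v):
        # M^{-1} v = V_k H_k^{-1} (V_k^T v) + (I - V_k V_k^T) v,
        # using W W^T = I - V_k V_k^T so that W is never formed.
        c = Vk.T @ v                      # coefficients in the Krylov basis
        return Vk @ np.linalg.solve(Hk, c) + (v - Vk @ c)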

Another adaptive procedure to determine a preconditioner during the GMRES iterations was introduced in [59]. It is based on the same idea of estimating the invariant subspace corresponding to the smallest eigenvalues. The preconditioner relies on a deflation technique such that the linear system is solved exactly in an invariant subspace of dimension $r$ corresponding to the $r$ smallest eigenvalues of $A$.



Finally, a preconditioner for GMRES based on a sequence of rank-one updates that involve the left and right eigenvectors associated with the smallest eigenvalues is proposed in [92]. The method is based on the idea of translating isolated eigenvalues consecutively, group by group, into a vicinity of the point (1.0, 0.0), using low-rank projections of the coefficient matrix of the form

$$ \tilde{A} = A \, (I_n + u_1 v_1^H) \cdots (I_n + u_l v_l^H). $$

The vectors $u_j$ and $v_j$, $j \in [1, l]$, are determined so as to ensure the numerical stability of consecutive translations of groups of isolated eigenvalues of $\tilde{A}$. After each restart of GMRES(m), approximations to the isolated eigenvalues to be translated are computed by the Arnoldi process. The isolated eigenvalues are translated towards the point (1.0, 0.0) of the spectrum, and the next cycle of GMRES(m) is applied to the transformed matrix. The effectiveness of this method relies on the assumption that most of the eigenvalues of $A$ are already clustered close to (1.0, 0.0) in the complex plane.

Most of these schemes are combined with the GMRES procedure, as they derive information directly from its internal Arnoldi process. In our work, we consider an explicit eigencomputation, which makes the preconditioner independent of the Krylov solver used for the actual solution of the linear system.

6.2 Two-level preconditioner via low-rank spectral updates

The Frobenius-norm minimization preconditioner succeeds in clustering most of the eigenvalues far from the origin. This can be observed in Figure 6.2.1, where we see a big cluster near (1.0, 0.0) in the spectrum of the preconditioned matrix for Example 2. This kind of distribution is highly desirable for fast convergence of Krylov solvers. Nevertheless, the eigenvalues nearest to zero can potentially slow down convergence. When we use $M_{2g-g}$, it is difficult to remove all the smallest eigenvalues close to the origin, even if we increase the number of nonzeros.

In the next sections, we propose a refinement technique for the approximate inverse based on the introduction of low-rank corrections computed by using spectral information associated with the smallest eigenvalues of $MA$. Roughly speaking, the proposed technique consists in solving the preconditioned system exactly on a coarse space and using this information to update the preconditioned residual. We first present our technique for unsymmetric linear systems, and then derive a variant for symmetric and SPD matrices.
symmetric and SPD matrices.



[Plot omitted: eigenvalue distribution in the complex plane; real axis from −0.5 to 1.5, imaginary axis from −1.5 to 0.5.]

Figure 6.2.1: Eigenvalue distribution for the coefficient matrix preconditioned by the Frobenius-norm minimization method on Example 2.

6.2.1 Additive formulation

We consider the solution of the linear system

$$ Ax = b, \qquad (6.2.1) $$

where $A$ is an $n \times n$ complex unsymmetric nonsingular matrix, and $x$ and $b$ are vectors of size $n$. The linear system is solved using a preconditioned Krylov solver, and we denote by $M_1$ the left preconditioner, meaning that we solve

$$ M_1 A x = M_1 b. \qquad (6.2.2) $$

We assume that the preconditioned matrix $M_1 A$ is diagonalizable, that is,

$$ M_1 A = V \Lambda V^{-1}, \qquad (6.2.3) $$

with $\Lambda = \operatorname{diag}(\lambda_i)$, where $|\lambda_1| \le \ldots \le |\lambda_n|$ are the eigenvalues and $V = (v_i)$ the associated right eigenvectors. We denote by $U = (u_i)$ the associated left eigenvectors; we then have $U^H V = \operatorname{diag}(u_i^H v_i)$, with $u_i^H v_i \ne 0$ for all $i$ [147]. Let $V_\varepsilon$ be the set of right eigenvectors associated with the eigenvalues $\lambda_i$ such that $|\lambda_i| \le \varepsilon$. Similarly, we define by $U_\varepsilon$ the corresponding subset of left eigenvectors.

Theorem 1. Let

$$ A_c = U_\varepsilon^H M_1 A V_\varepsilon, \qquad M_c = V_\varepsilon A_c^{-1} U_\varepsilon^H M_1 \qquad \text{and} \qquad M = M_1 + M_c. $$

Then $MA$ is diagonalisable and we have $MA = V \operatorname{diag}(\eta_i) V^{-1}$ with

$$ \eta_i = \lambda_i \quad \text{if } |\lambda_i| > \varepsilon, \qquad \eta_i = 1 + \lambda_i \quad \text{if } |\lambda_i| \le \varepsilon. $$

Proof. We first remark that $A_c = \operatorname{diag}(\lambda_i u_i^H v_i)$ with $|\lambda_i| \le \varepsilon$, and therefore $A_c$ is nonsingular. Let $V = (V_\varepsilon, V_{\bar\varepsilon})$, where $V_{\bar\varepsilon}$ is the set of the $(n-k)$ right eigenvectors associated with the eigenvalues $|\lambda_i| > \varepsilon$. Let $D_\varepsilon = \operatorname{diag}(\lambda_i)$ with $|\lambda_i| \le \varepsilon$ and $D_{\bar\varepsilon} = \operatorname{diag}(\lambda_j)$ with $|\lambda_j| > \varepsilon$. The following relations hold:

$$ M A V_\varepsilon = M_1 A V_\varepsilon + V_\varepsilon A_c^{-1} U_\varepsilon^H M_1 A V_\varepsilon = V_\varepsilon D_\varepsilon + V_\varepsilon I_k = V_\varepsilon (D_\varepsilon + I_k), $$

where $I_k$ denotes the $(k \times k)$ identity matrix, and

$$ M A V_{\bar\varepsilon} = M_1 A V_{\bar\varepsilon} + V_\varepsilon A_c^{-1} U_\varepsilon^H M_1 A V_{\bar\varepsilon} = V_{\bar\varepsilon} D_{\bar\varepsilon} + V_\varepsilon A_c^{-1} U_\varepsilon^H V_{\bar\varepsilon} D_{\bar\varepsilon} = V_{\bar\varepsilon} D_{\bar\varepsilon}, \quad \text{as } U_\varepsilon^H V_{\bar\varepsilon} = 0. $$

We then have

$$ M A V = V \begin{pmatrix} D_\varepsilon + I_k & 0 \\ 0 & D_{\bar\varepsilon} \end{pmatrix}. \qquad \blacksquare $$

$A_c$ represents the projection of the matrix $M_1 A$ onto the coarse space defined by the approximate eigenvectors associated with its smallest eigenvalues.
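The statement is easy to check numerically. The following NumPy/SciPy sketch uses a small random system as a toy illustration (not one of the thesis test cases), with a perturbed inverse standing in for $M_1$, and verifies that the $k$ selected eigenvalues of $M_1 A$ are shifted to $1 + \lambda_i$ while the others are left unchanged:

    import numpy as np
    from scipy.linalg import eig

    rng = np.random.default_rng(0)
    n, k = 50, 5
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    M1 = np.linalg.inv(A) + 0.05 * rng.standard_normal((n, n))  # a crude "M1"

    lam, U, V = eig(M1 @ A, left=True, right=True)
    sel = np.argsort(np.abs(lam))[:k]          # k smallest eigenvalues of M1 A
    Ve, Ue = V[:, sel], U[:, sel]

    Ac = Ue.conj().T @ (M1 @ A) @ Ve           # coarse matrix A_c (k x k)
    Mc = Ve @ np.linalg.solve(Ac, Ue.conj().T @ M1)
    M = M1 + Mc                                # updated preconditioner

    eta = eig(M @ A, right=False)
    expected = lam.copy()
    expected[sel] += 1.0                       # eta_i = 1 + lambda_i on the coarse space
    print(np.allclose(np.sort_complex(eta), np.sort_complex(expected)))  # True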

Theorem 2. Let $W$ be such that

$$ \tilde{A}_c = W^H A V_\varepsilon \quad \text{has full rank}, $$

and let

$$ \tilde{M}_c = V_\varepsilon \tilde{A}_c^{-1} W^H \qquad \text{and} \qquad \tilde{M} = M_1 + \tilde{M}_c. $$

Then $\tilde{M}A$ is similar to a matrix whose eigenvalues are

$$ \eta_i = \lambda_i \quad \text{if } |\lambda_i| > \varepsilon, \qquad \eta_i = 1 + \lambda_i \quad \text{if } |\lambda_i| \le \varepsilon. $$

Proof. With the same notation as for Theorem 1, we have

$$ \tilde{M} A V_\varepsilon = M_1 A V_\varepsilon + V_\varepsilon \tilde{A}_c^{-1} W^H A V_\varepsilon = V_\varepsilon D_\varepsilon + V_\varepsilon I_k = V_\varepsilon (D_\varepsilon + I_k), $$

and

$$ \tilde{M} A V_{\bar\varepsilon} = M_1 A V_{\bar\varepsilon} + V_\varepsilon \tilde{A}_c^{-1} W^H A V_{\bar\varepsilon} = V_{\bar\varepsilon} D_{\bar\varepsilon} + V_\varepsilon C = (V_\varepsilon \ V_{\bar\varepsilon}) \begin{pmatrix} C \\ D_{\bar\varepsilon} \end{pmatrix}, \quad \text{with } C = \tilde{A}_c^{-1} W^H A V_{\bar\varepsilon}. $$

We then have

$$ \tilde{M} A V = V \begin{pmatrix} D_\varepsilon + I_k & C \\ 0 & D_{\bar\varepsilon} \end{pmatrix}. \qquad \blacksquare $$

For right preconditioning, that is $A M_1 y = b$, similar results hold.

Lemma 1. Let

$$ A_c = U_\varepsilon^H A M_1 V_\varepsilon, \qquad M_c = M_1 V_\varepsilon A_c^{-1} U_\varepsilon^H \qquad \text{and} \qquad M = M_1 + M_c. $$

Then $AM$ is diagonalisable and we have $AM = V \operatorname{diag}(\eta_i) V^{-1}$ with

$$ \eta_i = \lambda_i \quad \text{if } |\lambda_i| > \varepsilon, \qquad \eta_i = 1 + \lambda_i \quad \text{if } |\lambda_i| \le \varepsilon. $$

Lemma 2. Let $W$ be such that

$$ \tilde{A}_c = W^H A M_1 V_\varepsilon \quad \text{has full rank}, $$

and let

$$ \tilde{M}_c = M_1 V_\varepsilon \tilde{A}_c^{-1} W^H \qquad \text{and} \qquad \tilde{M} = M_1 + \tilde{M}_c. $$

Then $A\tilde{M}$ is similar to a matrix whose eigenvalues are

$$ \eta_i = \lambda_i \quad \text{if } |\lambda_i| > \varepsilon, \qquad \eta_i = 1 + \lambda_i \quad \text{if } |\lambda_i| \le \varepsilon. $$

We should point out that an obvious choice exists if the symmetry of the preconditioner has to be preserved. For left preconditioning, we can set $W = V_\varepsilon$, which nevertheless does not imply that $\tilde{A}_c$ has full rank. For SPD matrices this choice leads to a SPD preconditioner: indeed, the preconditioner $\tilde{M}$ is the sum of a SPD matrix $M_1$ and a low-rank update that is symmetric semi-definite. It can be noticed that in this case the preconditioner has a form similar to those proposed in [24] for two-level preconditioners in non-overlapping domain decomposition.
domain decomposition.



6.2.2 Numerical experiments

In this section, we show some numerical results that illustrate the effectiveness of the spectral two-level preconditioner for the solution of dense complex symmetric non-Hermitian systems arising from the discretization of surface integral equations in electromagnetism. In our experiments, the eigenpairs are computed in a preprocessing step, before performing the iterative solution. This makes the preconditioner independent of the Krylov solver used for the actual solution of the linear system, at the cost of this extra computation. We use the IRA method implemented in the ARPACK package to compute approximations to the smallest eigenvalues and the corresponding approximate eigenvectors. The methods implemented in the ARPACK software are derived from a class of algorithms called Krylov subspace projection methods; these use information from the sequence of vectors generated by the power method to compute eigenvectors corresponding to eigenvalues other than the one of largest magnitude.
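In a matrix-free setting, this preprocessing step amounts to a call of roughly the following form. This is a sketch using SciPy's ARPACK wrapper; the operator applying $x \mapsto M_1(Ax)$, and the availability of M1, A and n, are assumptions of the illustration, not the interface of the code used in the thesis:

    from scipy.sparse.linalg import LinearOperator, eigs

    M1A = LinearOperator((n, n), matvec=lambda x: M1 @ (A @ x), dtype=complex)
    # ten approximate eigenpairs of M1*A of smallest magnitude ('SM'),
    # computed by the implicitly restarted Arnoldi method at tolerance 0.1
    lam, Veps = eigs(M1A, k=10, which='SM', tol=0.1)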

In our experiments, we consider coarse spaces of dimension up to 20, and different values of the restart for GMRES, from 10 to 110. For each test problem, we perform experiments with two levels of accuracy in the GMRES solution, to gain more insight into the robustness of our method. We provide extensive results in Appendix A; in this chapter we show the qualitative numerical behaviour of our method on our set of test examples, which can be considered representative of the general trend in electromagnetic applications.

First we consider the unsymmetric formulation described in Theorem 1. In Figures 6.2.2-6.2.6 we show the number of iterations required by GMRES(10) to reduce the normwise backward error to $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space. The numerical results show that the introduction of the low-rank updates can remarkably enhance the robustness of the approximate inverse. By selecting up to 10 eigenpairs, the number of iterations decreases by at least a factor of 2 in most of the experiments reported. The gain is more relevant when high accuracy is required for the approximate solution. On Example 2, the preconditioning updates enable fast convergence of GMRES with a low restart within a tolerance of $10^{-8}$, whereas no convergence was obtained in 1500 iterations without updates. However, a substantial improvement in the convergence is also observed when low accuracy is required: in the most effective case, by selecting 10 corrections, the number of GMRES iterations needed to achieve convergence to $10^{-5}$ using low restarts reduces by more than a factor of 4 on Example 5. If more eigenpairs are selected, generally no substantial further improvement is observed. In fact, the gain in terms of iterations is strongly related to the magnitude of the shifted eigenvalues: a speed-up in convergence is obtained when a full cluster of small eigenvalues is completely removed. This is illustrated in Tables 6.2.1 and 6.2.2, where we show the effect on the convergence of GMRES(10) of deflating eigenvalues of increasing magnitude on Examples 2 and 5, which are representative of the general trend. On Example 2, the presence of a very small eigenvalue slows down the convergence significantly; once this eigenvalue is shifted, the number of iterations rapidly decreases. On Example 5, there is a cluster of seven eigenvalues of magnitude around $10^{-3}$. When the eigenvalues within the cluster are shifted, a quick speed-up of convergence is observed; the shifting of the remaining eigenvalues does not have any further impact on the convergence. In Figures 6.2.7-6.2.9, we show the number of iterations required by restarted GMRES to reduce the normwise backward error to $10^{-8}$ for different values of the restart and increasing size of the coarse space. The remarkable enhancement of the robustness of the preconditioner enables the use of very small restarts for GMRES.

[Plot omitted: iterations of GMRES(10) (from 50 to 400) against the size of the coarse space (0 to 20); curves for GMRES tolerances 1.0e−8 and 1.0e−5; panel title: Example 1 − Size = 1080 − IRAM tolerance = 0.1.]

Figure 6.2.2: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 1.



[Plot omitted: iterations of GMRES(10) (from 50 to 350) against the size of the coarse space (0 to 20); curves for GMRES tolerances 1.0e−8 and 1.0e−5; panel title: Example 2 − Size = 1299 − IRAM tolerance = 0.1.]

Figure 6.2.3: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 2.

Nr of shifted eigenvalues | Magnitude of the eigenvalue | GMRES(10) iterations, Toler = $10^{-8}$
0 | | +1500
1 | 7.1116e-04 | 310
2 | 4.9685e-02 | 306
3 | 5.2737e-02 | 308
4 | 6.3989e-02 | 304
5 | 7.0395e-02 | 309
6 | 7.7396e-02 | 313
7 | 7.8442e-02 | 246
8 | 8.9548e-02 | 205
9 | 9.1598e-02 | 205
10 | 9.9216e-02 | 198

Table 6.2.1: Effect of shifting the eigenvalues nearest zero on the convergence of GMRES(10) for Example 2. We show the magnitude of successively shifted eigenvalues and the number of iterations required when these eigenvalues are shifted. A tolerance of $10^{-8}$ is required in the iterative solution.



[Plot omitted: iterations of GMRES(10) (from 50 to 300) against the size of the coarse space (0 to 20); curves for GMRES tolerances 1.0e−8 and 1.0e−5; panel title: Example 3 − Size = 1701 − IRAM tolerance = 0.1.]

Figure 6.2.4: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 3.

Nr of shifted eigenvalues | Magnitude of the eigenvalue | GMRES(10) iterations, Toler = $10^{-8}$
0 | | 297
1 | 8.7837e-03 | 290
2 | 8.7968e-03 | 290
3 | 8.7993e-03 | 287
4 | 9.8873e-03 | 254
5 | 9.9015e-03 | 232
6 | 9.9053e-03 | 392
7 | 9.9126e-03 | 52
8 | 2.3331e-01 | 52
9 | 2.4811e-01 | 53
10 | 2.4813e-01 | 53

Table 6.2.2: Effect of shifting the eigenvalues nearest zero on the convergence of GMRES(10) for Example 5. We show the magnitude of successively shifted eigenvalues and the number of iterations required when these eigenvalues are shifted. A tolerance of $10^{-8}$ is required in the iterative solution.



[Plot omitted: iterations of GMRES(10) (from 20 to 160) against the size of the coarse space (0 to 20); curves for GMRES tolerances 1.0e−8 and 1.0e−5; panel title: Example 4 − Size = 2016 − IRAM tolerance = 0.1.]

Figure 6.2.5: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 4.

[Plot omitted: iterations of GMRES(10) (from 0 to 400) against the size of the coarse space (0 to 20); curves for GMRES tolerances 1.0e−8 and 1.0e−5; panel title: Example 5 − Size = 2430 − IRAM tolerance = 0.1.]

Figure 6.2.6: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 5.



[Plot omitted: iterations of GMRES(m) (from 0 to 400) against the size of the coarse space (0 to 20); curves for restarts m = 10, 30, 50; panel title: Example 1 − Size = 1080 − IRAM tolerance = 0.1.]

Figure 6.2.7: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ for three choices of restart and increasing size of the coarse space on Example 1.

In the second set of experiments, we consider the formulation of the preconditioner illustrated in Theorem 2, where we select $W^H = V_\varepsilon^H M_1$ to save the computation of left eigenvectors. The quality of the preconditioner is very well preserved, as we see in Tables 6.2.3-6.2.7, while the construction cost of the low-rank updates is halved.

In Table 6.2.8, we show the number of matrix-vector products required by the ARPACK implementation of the IRA method to compute the smallest approximate eigenvalues and the associated approximate right eigenvectors. All the numerical experiments are performed in double precision complex arithmetic on an SGI Origin 2000. We remark that these matrix-vector products do not include those required for the iterative solution. Although the computation can be expensive, the cost can be amortized if the preconditioner is reused to solve linear systems with the same coefficient matrix and several right-hand sides. In Table 6.2.9 we show the number of amortization vectors relative to GMRES(10) and a tolerance of $10^{-5}$, that is, the number of right-hand sides that have to be considered to amortize the extra cost of the eigencomputation. The localization of a few eigenvalues within a cluster may be more expensive than the computation of a full group of small eigenvalues. The optimal trade-off seems to be a coarse space of size around 10: in that case the number of amortization vectors is reasonably small, especially compared to real electromagnetic calculations, where linear systems with the same coefficient matrix and up to thousands of right-hand sides are often solved.
often solved.



[Plot omitted: iterations of GMRES(m) (from 50 to 500) against the size of the coarse space (0 to 20); curves for restarts m = 10, 30, 50; panel title: Example 2 − Size = 1299 − IRAM tolerance = 0.1.]

Figure 6.2.8: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ for three choices of restart and increasing size of the coarse space on Example 2.

[Plot omitted: iterations of GMRES(m) (from 0 to 300) against the size of the coarse space (0 to 20); curves for restarts m = 10, 30, 50; panel title: Example 3 − Size = 1701 − IRAM tolerance = 0.1.]

Figure 6.2.9: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ for three choices of restart and increasing size of the coarse space on Example 3.



Size of the coarse space | W^H = U_ε^H M_1 | W^H = V_ε^H M_1 | W = V_ε
1 | 314 | 315 | 316
2 | 314 | 314 | 312
3 | 313 | 314 | 315
4 | 310 | 313 | 308
5 | 313 | 306 | 315
6 | 315 | 303 | 311
7 | 315 | 298 | 290
8 | 315 | 294 | 292
9 | 315 | 303 | 302
10 | 248 | 244 | 244
11 | 206 | 206 | 204
12 | 197 | 190 | 215
13 | 194 | 177 | 208
14 | 192 | 177 | 184
15 | 191 | 180 | 186
16 | 189 | 184 | 189
17 | 189 | 180 | 195
18 | 175 | 180 | 205
19 | 166 | 174 | 182
20 | 153 | 174 | 173

Table 6.2.3: Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ for increasing size of the coarse space on Example 1. Different choices are considered for the operator $W^H$.



Size of the coarse space | W^H = U_ε^H M_1 | W^H = V_ε^H M_1 | W = V_ε
1 | 310 | 285 | 304
2 | 306 | 286 | 305
3 | 308 | 286 | 310
4 | 304 | 279 | 303
5 | 309 | 286 | 310
6 | 313 | 286 | 307
7 | 246 | 229 | 239
8 | 205 | 188 | 201
9 | 205 | 187 | 202
10 | 198 | 185 | 194
11 | 198 | 184 | 194
12 | 198 | 196 | 193
13 | 198 | 187 | 193
14 | 185 | 190 | 194
15 | 175 | 189 | 193
16 | 186 | 183 | 185
17 | 159 | 178 | 183
18 | 192 | 179 | 187
19 | 167 | 178 | 185
20 | 187 | 168 | 169

Table 6.2.4: Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ for increasing size of the coarse space on Example 2. Different choices are considered for the operator $W^H$.



Size of the coarse space | W^H = U_ε^H M_1 | W^H = V_ε^H M_1 | W = V_ε
1 | 267 | 254 | 260
2 | 271 | 286 | 267
3 | 263 | 284 | 272
4 | 260 | 259 | 256
5 | 255 | 269 | 262
6 | 209 | 221 | 199
7 | 209 | 222 | 202
8 | 209 | 225 | 208
9 | 137 | 133 | 135
10 | 127 | 126 | 126
11 | 126 | 124 | 125
12 | 115 | 117 | 115
13 | 119 | 117 | 118
14 | 119 | 119 | 120
15 | 114 | 119 | 110
16 | 104 | 105 | 103
17 | 105 | 106 | 105
18 | 103 | 105 | 102
19 | 97 | 99 | 94
20 | 96 | 96 | 90

Table 6.2.5: Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ for increasing size of the coarse space on Example 3. Different choices are considered for the operator $W^H$.



Size of the coarse space | W^H = U_ε^H M_1 | W^H = V_ε^H M_1 | W = V_ε
1 | 145 | 145 | 145
2 | 134 | 134 | 135
3 | 133 | 130 | 131
4 | 127 | 125 | 126
5 | 126 | 123 | 124
6 | 123 | 120 | 122
7 | 101 | 101 | 101
8 | 101 | 101 | 99
9 | 100 | 98 | 94
10 | 72 | 94 | 93
11 | 95 | 93 | 86
12 | 86 | 86 | 86
13 | 86 | 86 | 85
14 | 84 | 85 | 83
15 | 82 | 82 | 82
16 | 81 | 81 | 82
17 | 81 | 82 | 82
18 | 80 | 81 | 82
19 | 82 | 81 | 81
20 | 76 | 77 | 77

Table 6.2.6: Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ for increasing size of the coarse space on Example 4. Different choices are considered for the operator $W^H$.



Size of the coarse space | W^H = U_ε^H M_1 | W^H = V_ε^H M_1 | W = V_ε
1 | 290 | 312 | 290
2 | 290 | 311 | 290
3 | 287 | 354 | 287
4 | 254 | 345 | 252
5 | 232 | 270 | 214
6 | 392 | 559 | 430
7 | 52 | 53 | 51
8 | 52 | 55 | 52
9 | 53 | 55 | 53
10 | 53 | 54 | 52
11 | 53 | 53 | 52
12 | 52 | 52 | 49
13 | 58 | 53 | 49
14 | 50 | 52 | 50
15 | 51 | 52 | 50
16 | 51 | 52 | 50
17 | 51 | 52 | 50
18 | 60 | 52 | 50
19 | 59 | 53 | 52
20 | 60 | 53 | 52

Table 6.2.7: Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ for increasing size of the coarse space on Example 5. Different choices are considered for the operator $W^H$.



Size of the coarse space | Ex. 1 | Ex. 2 | Ex. 3 | Ex. 4 | Ex. 5
1 | 90 | 135 | 120 | 75 | 60
2 | 388 | 440 | 336 | 168 | 58
3 | 243 | 524 | 290 | 214 | 107
4 | 281 | 469 | 250 | 178 | 103
5 | 354 | 423 | 192 | 149 | 163
6 | 293 | 357 | 183 | 180 | 156
7 | 247 | 340 | 175 | 134 | 128
8 | 198 | 333 | 165 | 236 | 105
9 | 179 | 345 | 154 | 261 | 125
10 | 138 | 358 | 169 | 207 | 128
11 | 186 | 527 | 157 | 191 | 160
12 | 213 | 579 | 219 | 197 | 131
13 | 189 | 574 | 224 | 248 | 162
14 | 235 | 1010 | 212 | 309 | 126
15 | 276 | 1762 | 223 | 355 | 164
16 | 266 | 1053 | 202 | 412 | 237
17 | 514 | 751 | 226 | 408 | 227
18 | 336 | 3050 | 264 | 390 | 223
19 | 336 | 2359 | 264 | 426 | 756
20 | 650 | 1066 | 300 | 345 | 220

Table 6.2.8: Number of matrix-vector products required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding right eigenvectors.

A natural question concerns the sensitivity of the preconditioner to the accuracy of the approximate eigenvectors. In the numerical experiments we require an accuracy of 0.1 in the computation of the eigenpairs. The stopping criterion adopted in the ARPACK implementation of the IRA algorithm ensures a small backward error on the Ritz pairs; the backward error is defined as the smallest perturbation $\Delta A$, in norm, such that the Ritz pair is an eigenpair of the perturbed matrix $A + \Delta A$. At the end of the computation, we checked that the required accuracy was attained. If $\tilde\lambda$ is an approximate eigenvalue and $\tilde{x}$ is the corresponding approximate eigenvector, then the normwise backward error associated with the eigenpair $(\tilde\lambda, \tilde{x})$ is

$$ \frac{\|r\|}{\alpha \|\tilde{x}\|}, \qquad \text{where } \alpha > 0 \text{ and } r = A\tilde{x} - \tilde\lambda\tilde{x}. $$

In Table 6.2.10, the spectral information is computed at an accuracy of the order of the machine precision, that is $10^{-16}$. No remarkable differences from the previous results can be observed in the number of iterations, except for Example 2.



Size of the coarse space | Ex. 1 | Ex. 2 | Ex. 3 | Ex. 4 | Ex. 5
1 | 9 | 135 | - | - | 60
2 | 36 | 74 | - | 56 | 58
3 | 23 | 105 | 290 | 72 | 18
4 | 26 | 94 | 84 | 30 | 5
5 | 33 | 85 | 48 | 25 | 5
6 | 25 | 179 | 9 | 26 | 156
7 | 21 | 14 | 8 | 7 | 2
8 | 17 | 8 | 8 | 11 | 2
9 | 15 | 8 | 3 | 12 | 2
10 | 4 | 8 | 3 | 5 | 2
11 | 3 | 12 | 3 | 8 | 2
12 | 4 | 13 | 4 | 8 | 2
13 | 3 | 13 | 4 | 10 | 2
14 | 4 | 16 | 4 | 12 | 2
15 | 4 | 26 | 4 | 13 | 2
16 | 4 | 19 | 3 | 15 | 3
17 | 8 | 10 | 4 | 15 | 3
18 | 5 | 53 | 4 | 49 | 3
19 | 4 | 33 | 4 | 16 | 10
20 | 7 | 21 | 4 | 12 | 3

Table 6.2.9: Number of amortization vectors associated with the IRAM computation of the approximate eigenvalues nearest zero and the corresponding right eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.

In Figures 6.2.11-6.2.14, we investigate the numerical behaviour of our method in the presence of a larger cluster of small eigenvalues in $M_1 A$, that is, when $M_1$ is a poor preconditioner. This generally happens when the nonzero structure of the approximate inverse is very sparse, or when less information from $A$ is used to construct $M_1$. As we mentioned in Chapter 3, and as shown in Figure 6.2.10 for Example 2, a side-effect of reducing the number of nonzeros in the sparse approximation of $A$ is that a larger number of eigenvalues cluster around the origin of the spectrum of the preconditioned matrix. In the experiments reported in Figures 6.2.11-6.2.14, the Frobenius-norm preconditioner is constructed using the same nonzero structure to sparsify $A$ and to compute $M_1$.



Size of the coarse space | Ex. 1 | Ex. 2 | Ex. 3 | Ex. 4 | Ex. 5
1 | 315 | 215 | 254 | 145 | 312
2 | 314 | 202 | 286 | 134 | 311
3 | 314 | 193 | 284 | 130 | 354
4 | 313 | 192 | 259 | 125 | 346
5 | 306 | 190 | 269 | 123 | 270
6 | 303 | 189 | 221 | 120 | 552
7 | 298 | 161 | 222 | 101 | 53
8 | 294 | 147 | 225 | 101 | 55
9 | 303 | 146 | 133 | 99 | 54
10 | 244 | 144 | 126 | 94 | 54
11 | 206 | 143 | 124 | 93 | 50
12 | 190 | 143 | 117 | 86 | 50
13 | 177 | 140 | 117 | 86 | 48
14 | 177 | 139 | 119 | 85 | 48
15 | 182 | 139 | 117 | 82 | 48
16 | 184 | 139 | 106 | 81 | 48
17 | 171 | 139 | 106 | 81 | 48
18 | 176 | 135 | 102 | 81 | 48
19 | 177 | 135 | 94 | 80 | 47
20 | 178 | 131 | 99 | 80 | 50

Table 6.2.10: Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of the Ritz pairs is carried out at machine precision.

As these results show, when the preconditioner is not very effective, the spectral corrections, although beneficial, do not enhance its robustness significantly. Coarse spaces of larger size may be necessary to shift the clustered eigenvalues nearest zero and speed up the convergence. The localization of the eigenvalues of smallest magnitude by the IRA method is also much more expensive in this situation, as illustrated by the numerical experiments reported in Appendix A.
the numerical experiments reported in Appendix A.



[Plot omitted: eigenvalue distribution in the complex plane; real axis from −0.5 to 1.5, imaginary axis from −1.5 to 0.5.]

Figure 6.2.10: Eigenvalue distribution for the coefficient matrix preconditioned by a Frobenius-norm minimization method on Example 2. The same sparsity pattern is used for A and for the preconditioner.

[Plot omitted: iterations of GMRES(10) (from 200 to 900) against the size of the coarse space (0 to 20); curves for tolerances 1.0e−8 and 1.0e−5; panel title: Example 1 − Size = 1080 − IRAM tolerance = 0.1.]

Figure 6.2.11: Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 1. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is used for $A$ and $M_1$.



[Plot omitted: iterations of GMRES(10) (from 200 to 550) against the size of the coarse space (0 to 20); curves for tolerances 1.0e−8 and 1.0e−5; panel title: Example 2 − Size = 1299 − IRAM tolerance = 0.1.]

Figure 6.2.12: Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 2. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is used for $A$ and $M_1$.

[Plot omitted: iterations of GMRES(10) (from 50 to 350) against the size of the coarse space (0 to 20); curves for tolerances 1.0e−8 and 1.0e−5; problem size 1701, IRAM tolerance 0.1.]

Figure 6.2.13: Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 3. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is used for $A$ and $M_1$.



[Plot omitted: iterations of GMRES(10) (from 100 to 280) against the size of the coarse space (0 to 20); curves for tolerances 1.0e−8 and 1.0e−5; panel title: Example 4 − Size = 2016 − IRAM tolerance = 0.1.]

Figure 6.2.14: Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 4. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is used for $A$ and $M_1$.


6.2.3 Symmetric formulation

One problem with the previous formulations is that the updated preconditioner M is no longer symmetric even if M_1 is symmetric. A symmetric formulation can be obtained if we choose W = V_ε in Theorem 2. Nevertheless we point out that, as in the case W^H = V_ε^H M_1, the projected matrix Ã_c is not guaranteed to have full rank. For SPD matrices this choice naturally leads to an SPD preconditioner.

In Tables 6.2.3-6.2.7, we show experiments with this choice for the operator W. The method remains effective, as no noticeable deterioration can be observed in the quality of the computed preconditioner. In Figures 6.2.15-6.2.19, we use the symmetric Frobenius-norm minimization method obtained by averaging the off-diagonal entries, and we solve the linear system with the SQMR algorithm. The remarkable robustness of this solver on electromagnetic applications should be noted: it clearly outperforms GMRES even with a large restart.
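To fix ideas, the following minimal numpy sketch applies the symmetrically updated preconditioner. It assumes that the update of Theorem 2 takes the additive form M = M_1 + V_ε Ã_c^{-1} W^H with Ã_c = W^H A V_ε, here specialized to W = V_ε; this reading of the theorem, the dense storage and the helper names are assumptions for illustration only, not quoted from the text.

    import numpy as np

    def apply_symmetric_update(p, A, M1, V_eps):
        # Apply z = M p with M = M1 + V_eps Ac^{-1} V_eps^H (choice W = V_eps),
        # under the assumed additive form of Theorem 2. Ac = V_eps^H A V_eps is
        # the small projected matrix; as noted above, it may fail to have full
        # rank, in which case the solve below breaks down.
        Ac = V_eps.conj().T @ (A @ V_eps)
        z = M1 @ p                                            # first-level sweep
        z += V_eps @ np.linalg.solve(Ac, V_eps.conj().T @ p)  # symmetric low-rank update
        return z

Because the correction term V_ε Ã_c^{-1} V_ε^H is Hermitian by construction, the update preserves the symmetry of M_1, which is what enables the use of SQMR in the experiments below.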

[Figure 6.2.15: plot of the number of iterations of SQMR against the size of the coarse space (0 to 20), with curves for SQMR Toler = 1.0e−8 and 1.0e−5. Panel title: Example 1 − Size = 1080 − IRAM tolerance = 0.1.]

Figure 6.2.15: Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space on Example 1. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.


[Figure 6.2.16: plot of the number of iterations of SQMR against the size of the coarse space (0 to 20), with curves for SQMR Toler = 1.0e−8 and 1.0e−5. Panel title: Example 2 − Size = 1299 − IRAM tolerance = 0.1.]

Figure 6.2.16: Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space on Example 2. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.

[Figure 6.2.17: plot of the number of iterations of SQMR against the size of the coarse space (0 to 20), with curves for SQMR Toler = 1.0e−8 and 1.0e−5. Panel title: Example 3 − Size = 1701 − IRAM tolerance = 0.1.]

Figure 6.2.17: Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space on Example 3. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.


[Figure 6.2.18: plot of the number of iterations of SQMR against the size of the coarse space (0 to 20), with curves for SQMR Toler = 1.0e−8 and 1.0e−5. Panel title: Example 4 − Size = 2016 − IRAM tolerance = 0.1.]

Figure 6.2.18: Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space on Example 4. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.

[Figure 6.2.19: plot of the number of iterations of SQMR against the size of the coarse space (0 to 20), with curves for SQMR Toler = 1.0e−8 and 1.0e−5. Panel title: Example 5 − Size = 2430 − IRAM tolerance = 0.1.]

Figure 6.2.19: Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space on Example 5. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.


6.3 Multiplicative formulation of low-rank spectral updates

The spectral information that we compute can be exploited differently if we look at it from a multigrid viewpoint. This leads us to derive a multiplicative version of our two-level preconditioner that can be expressed as a two-grid algorithm. In order to illustrate this link, let us first briefly describe the classical geometric two-grid algorithm.

For solving a linear system Ax = b with initial guess x_0, where A comes from a discretization of an elliptic operator on a mesh, a geometric two-grid algorithm can be briefly described as follows:

1. Pre-smoothing: a few iterations are performed to damp the high frequencies of the error. The components that are eliminated belong to the subspace spanned by the eigenvectors associated with the large eigenvalues of the pre-smoothing iteration matrix. One iteration of this pre-smoother might be written

   x_new = x_old + B(b − A x_old)                                  (6.3.4)

   and we let x^{k+1/3} denote the approximate solution after µ_1 pre-smoothing iterations.

2. Coarse grid correction: the components left in the error are smooth and can therefore be represented on a coarser mesh. Consequently the residual is projected onto the coarse mesh and the error equation is solved exactly in the associated coarse space. The error on the coarse mesh is then interpolated back onto the fine mesh to correct x^{k+1/3}. If we denote by R the projection operator and by P the prolongation/interpolation operator, the coarse grid problem is usually defined by the Galerkin formula A_c = RAP. The coarse grid correction can then be written

   x^{k+2/3} = x^{k+1/3} + P A_c^{-1} R (b − A x^{k+1/3})          (6.3.5)

3. Post-smoothing: a few additional smoothing iterations are performed to eliminate the high-frequency components that might have been introduced by the interpolation. We then compute the new iterate x^{k+1} by performing µ_2 iterations of the iterative scheme (6.3.4) with initial guess x^{k+2/3}. A sketch of one full cycle is given below.
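The following minimal numpy sketch assembles steps (6.3.4)-(6.3.5) into one cycle; the dense matrices and the direct coarse solve are illustrative simplifications, not the implementation used in the thesis.

    import numpy as np

    def two_grid_cycle(x, A, b, B, R, P, mu1=1, mu2=1):
        # One geometric two-grid cycle with smoother B, restriction R and
        # prolongation P; the coarse problem uses the Galerkin formula A_c = R A P.
        Ac = R @ A @ P
        for _ in range(mu1):                                 # pre-smoothing, scheme (6.3.4)
            x = x + B @ (b - A @ x)
        x = x + P @ np.linalg.solve(Ac, R @ (b - A @ x))     # coarse grid correction (6.3.5)
        for _ in range(mu2):                                 # post-smoothing, scheme (6.3.4)
            x = x + B @ (b - A @ x)
        return x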

In a geometric multigrid algorithm, the coarse grid correction effectively solves the error equation restricted to the subspace associated with the smallest eigenvalues. The corresponding eigenvectors are associated with the low-frequency modes and can consequently be represented geometrically on a coarser mesh.

In our multiplicative algorithm we make the “coarse grid” correction explicit by actually projecting the error equation directly onto the subspace associated with the smallest eigenvalues. More precisely, the smoother is defined by our preconditioner M_1, the restriction is R = V_ε and the prolongation is U_ε^H. The preconditioning operation is performed in three distinct steps; for the sake of simplicity in this exposition we set µ_1 = µ_2 = 1. The first step consists of a sweep with the sparse approximate inverse M_1, that is, ẑ = M_1 p, where p is the vector to precondition. The second step is intended to correct some components of the preconditioned vector ẑ along the directions defined by the approximate eigenvectors corresponding to the approximate eigenvalues smallest in magnitude. Using the notation of Section 6.2.1,

   ẑ = ẑ + V_ε A_c^{-1} U_ε (p − A ẑ).

Finally, the sparse approximate inverse is used to refine the preconditioning operation in the complement of the subspace determined by the approximate eigenvectors corresponding to the smallest eigenvalues:

   ẑ = ẑ + M_1 (p − A ẑ).

The nomenclature multiplicative formulation is inherited from the framework of domain decomposition methods, as similarities exist with Schwarz methods [26, 130]. In addition, with µ_1 = µ_2 = 1 the two-step correction can be expressed in the compact form

   ẑ = ẑ + B(p − A ẑ),

where B = (I − (I − M_1 A)(I − V_ε A_c^{-1} U_ε A)) A^{-1}.
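Putting the three steps together, here is a minimal numpy sketch of the preconditioning operation with µ_1 = µ_2 = 1. It assumes A_c = U_ε A V_ε (consistent with the Galerkin formula above) and takes U_ε as the operator applied to residuals, matching the displayed correction step; since Section 6.2.1 is not reproduced here, these are illustrative assumptions, as are the dense storage and helper names.

    import numpy as np

    def apply_multiplicative(p, A, M1, V_eps, U_eps):
        # Three-step multiplicative two-level preconditioner:
        # smoothing sweep, spectral coarse correction, smoothing sweep.
        Ac = U_eps @ (A @ V_eps)                  # small coarse matrix (assumed Galerkin form)
        z = M1 @ p                                # 1) sweep with the sparse approximate inverse
        z = z + V_eps @ np.linalg.solve(Ac, U_eps @ (p - A @ z))   # 2) coarse correction
        z = z + M1 @ (p - A @ z)                  # 3) refinement sweep with M1
        return z

The two residual evaluations in steps 2 and 3 are the source of the two extra matrix-vector products per iteration noted in the experiments below.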

6.3.1 Numerical experiments

In this section, we show the qualitative numerical behaviour of our method on our set of test examples. In Figures 6.3.20, 6.3.21 and 6.3.22, we show the number of iterations required by restarted GMRES to reduce the residual to a prescribed accuracy for increasing size of the coarse space. As before, in the extensive results reported in Appendix A we consider coarse spaces of increasing dimension, up to 20, and values of the GMRES restart from 10 to 110. The preconditioner is very effective, as shown in Figures 6.3.20, 6.3.21 and 6.3.22. Compared to the additive formulation, a larger reduction in iteration count is observed on all five examples. However, we point out that each iteration step requires two additional matrix-vector products, which makes this formulation always more expensive than the additive one. Finally, we mention that this formulation naturally leads to a symmetric preconditioner if M_1 is symmetric. Thus the SQMR algorithm can be used to solve the problem, and the results obtained with this solver are shown in Appendix A. It should be noted that the results are surprisingly poor on two test problems, Examples 2 and 5.

[Figure 6.3.20: plot of the number of iterations of GMRES(10) against the size of the coarse space (0 to 20), with curves for GMRES Toler = 1.0e−8 and 1.0e−5. Panel title: Example 1 − Size = 1080 − IRAM tolerance = 0.1.]

Figure 6.3.20: Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} and 10^{-5} for increasing number of corrections on Example 1. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates. The preconditioner is updated in multiplicative form.


[Figure 6.3.21: plot of the number of iterations of GMRES(10) against the size of the coarse space (0 to 20), with curves for GMRES Toler = 1.0e−8 and 1.0e−5. Panel title: Example 3 − Size = 1701 − IRAM tolerance = 0.1.]

Figure 6.3.21: Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} and 10^{-5} for increasing size of the coarse space on Example 3. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates. The preconditioner is updated in multiplicative form.

[Figure 6.3.22: plot of the number of iterations of GMRES(10) against the size of the coarse space (0 to 20), with curves for GMRES Toler = 1.0e−8 and 1.0e−5. Panel title: Example 4 − Size = 2016 − IRAM tolerance = 0.1.]

Figure 6.3.22: Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} and 10^{-5} for increasing size of the coarse space on Example 4. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates. The preconditioner is updated in multiplicative form.


6.4 Concluding remarks

In this chapter, we have presented a refinement technique for the approximate inverse based on low-rank corrections computed using spectral information from the preconditioned matrix. We have shown the effectiveness and the robustness of the resulting preconditioner on a set of small but tough problems arising from electromagnetic applications. The method is very well suited for use on electromagnetic problems, as the preconditioner is often used to solve systems with the same coefficient matrix and multiple right-hand sides. In this way, the extra cost of computing the preconditioner updates can be amortized. It can be combined with the inner-outer schemes via embedded iterations described in the previous chapter to construct robust and efficient preconditioners for electromagnetic applications, but we do not carry out this study in this thesis. Moreover, this technique can be used for general problems. Preliminary results on domain decomposition methods [117] and SPD matrices from the Harwell-Boeing collection [53] are encouraging.




Chapter 7

Conclusions and perspectives

In this thesis, we have presented preconditioning methods for the numerical solution, using iterative Krylov solvers, of dense complex symmetric non-Hermitian systems of equations arising from the discretization of boundary integral equations in electromagnetism. We have illustrated both the numerical behaviour and the cost of the proposed preconditioners, identified potential causes of failure and introduced techniques to enhance their robustness. The major concern of the thesis has been to design robust sparse approximate inverse preconditioners based on Frobenius-norm minimization techniques. However, in Chapter 2, we considered several standard preconditioners based on the idea of sparsification, of both implicit and explicit type, and we studied their numerical behaviour on electromagnetic applications.

We have shown that incomplete LU factorization methods do not work well for such systems. The incomplete factorization process is highly unstable on indefinite matrices like those arising from the discretization of the EFIE formulation. Using numerical experiments we have shown that the triangular factors computed by the factorization can be very ill-conditioned, and that the long recurrences associated with the triangular solves are unstable. As an attempt at a possible remedy, we introduced a small complex shift to move the eigenvalues of the preconditioned system along the imaginary axis and thus try to avoid a possible cluster of eigenvalues close to zero. A small diagonal complex shift can help to compute a more stable factorization, and in some cases the performance of the preconditioner can improve significantly. Further work is required to make the preconditioner more robust. Condition estimators can be incorporated into the factorization process to detect instabilities during the computation, and suitable strategies introduced to tune the optimal value of the shift and to predict its effect. The construction of the preconditioner is inherently sequential, but many recent research efforts have been devoted to exploiting parallelism [102, 111]. This gives hope that it might be worth examining this method further in a parallel and multipole context.

Factorized approximate inverses, namely AINV and FSAI, exhibit poor convergence behaviour because the inverse factors can be totally unstructured; neither reordering nor shift strategies improve their effectiveness. Any dropping strategy, either static or dynamic, may be very difficult to tune as it can easily discard relevant information and potentially lead to a very poor preconditioner. In this case, finding the appropriate threshold to enable a good trade-off between sparsity and numerical efficiency is challenging and very problem-dependent. Graph partitioning algorithms can be used to define a sparse structure for the inverse factors. Geometric and spectral partitioning methods would split the graph of the sparse approximation Ã to A into a number, say p, of independent subgraphs of roughly equal size, with relatively few connections between them. By numbering the interior nodes first and the interface nodes last, the permuted matrix assumes the form

P^T \tilde{A} P =
\begin{pmatrix}
A_1 &     &        &     & B_1^T  \\
    & A_2 &        &     & B_2^T  \\
    &     & \ddots &     & \vdots \\
    &     &        & A_p & B_p^T  \\
B_1 & B_2 & \cdots & B_p & A_S
\end{pmatrix}

where P is a permutation matrix. The diagonal blocks A_1, A_2, ..., A_p correspond to connections between nodes in the same subgraph; the off-diagonal blocks B_i correspond to connections between nodes of distinct subgraphs, and the block A_S represents connections between interface nodes. This permutation strategy can also be used to introduce parallelism in the construction of some inherently sequential preconditioners [15, 102]. The inverse of the permuted matrix admits the decomposition

P^{-1} \tilde{A}^{-1} P^{-T} = L^{-T} D^{-1} L^{-1} =
\begin{pmatrix}
I_1 &     &        &     & L_1^{-T} \\
    & I_2 &        &     & L_2^{-T} \\
    &     & \ddots &     & \vdots   \\
    &     &        & I_p & L_p^{-T} \\
    &     &        &     & I_S
\end{pmatrix}
\cdot
\begin{pmatrix}
T_1^{-1} &          &        &          &          \\
         & T_2^{-1} &        &          &          \\
         &          & \ddots &          &          \\
         &          &        & T_p^{-1} &          \\
         &          &        &          & T_S^{-1}
\end{pmatrix}
\cdot
\begin{pmatrix}
I_1      &          &        &          &     \\
         & I_2      &        &          &     \\
         &          & \ddots &          &     \\
         &          &        & I_p      &     \\
L_1^{-1} & L_2^{-1} & \cdots & L_p^{-1} & I_S
\end{pmatrix}

It can be seen that fill-in in the inverse factor L^{-1} can occur only in the blocks L_i^{-1}. The use of these techniques might enable the control and prediction of fill-in in the inverse factors, and significantly enhance the robustness of factorized approximate inverse methods like AINV or FSAI on electromagnetic applications.
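To make the reordering concrete, a small sketch follows; the partition array, the interface mask and the helper name are hypothetical, introduced only to illustrate the interior-first numbering that yields the bordered form above.

    import numpy as np

    def interface_last_permutation(part, interface):
        # part[i]: subgraph id of node i; interface[i]: True if node i lies on an
        # interface between subgraphs. Interior nodes come first, grouped by
        # subgraph, and interface nodes come last.
        n = len(part)
        interior = sorted((i for i in range(n) if not interface[i]), key=lambda i: part[i])
        border = [i for i in range(n) if interface[i]]
        return np.array(interior + border)

    # Usage: perm = interface_last_permutation(part, interface)
    #        A_perm = A[np.ix_(perm, perm)]   # the bordered matrix P^T A P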

In Chapter 2, we have shown that the locations of the large entries in the inverse matrix exhibit some structure, and thus a non-factorized approximate inverse can be a good candidate to precondition these systems effectively. In particular, preconditioners based on Frobenius-norm minimization are much less prone to instabilities than incomplete factorization methods. To be computationally affordable on dense systems, these preconditioners require a suitable strategy to identify the relevant entries to consider in the original matrix A in order to define small least-squares problems, as well as an appropriate sparsity structure for the approximate inverse. In Chapter 3 we exploited the decay of the discrete Green's function to compute an effective a priori pattern for the approximate inverse. We have shown that, by using additional geometric information from the underlying mesh, it is possible to construct robust sparse preconditioners at an affordable computational and memory cost. An important feature of the pattern selection strategy based on geometric information is that it does not require access to all the entries of the matrix A, so that it is well suited for an implementation in a fast multipole setting, where A is not directly available and only the near-field entries are computed. Strategies that use information from the connectivity graph of the underlying mesh require less computational effort to construct the pattern but are generally less effective. Also, they may not handle very well complex geometries where some parts of the object are not connected. By retaining two different densities in the patterns of A and M we can increase the robustness of the resulting preconditioner without penalizing the cost of its construction. The numerical experiments show that, using this pattern selection strategy, we can compute a very sparse but effective preconditioner. With the same low density, none of the standard preconditioners that we discussed earlier can compete with it.

In Chapter 4, we proposed two symmetric preconditioners that exploit the symmetry of the original matrix in the associated preconditioner and enable the use of a symmetric Krylov solver, which proves to be cheaper than GMRES iterations. The first strategy simply averages the off-diagonal entries. We have shown that this approach, used in combination with the SQMR solver, is fairly robust and is totally insensitive to column ordering; however, the construction of the preconditioner has the same computational cost as in the unsymmetric case. The second strategy only computes the lower triangular part, including the diagonal, of the preconditioner. The nonzeros calculated are reflected with respect to the diagonal and are used to update the right-hand sides of the subsequent least-squares problems involved in the construction of the remaining columns of the preconditioner. If m denotes the number of nonzero entries in the approximate inverse, this method only computes (m + n)/2 nonzeros. Thus the overall computational complexity of the construction can be considerably smaller. Through numerical experiments, we have shown that this method is not too sensitive to column ordering. Both these methods appear to be efficient and exhibit a remarkable robustness when used in conjunction with SQMR. They are promising for use in a parallel and multipole context for the solution of large systems. The first approach is straightforward to parallelize even though it requires more flops for its construction. It would probably be the preconditioner of choice in a parallel distributed fast multipole environment. The second approach is less than half as expensive and can be computationally attractive, especially for large problems. Possibilities for parallelizing this approach also exist, by using colouring techniques to detect independent subsets of columns that can be computed in parallel. In a multipole context the algorithm must be recast by blocks, and Level 2 BLAS operations have to be used for the least-squares updates. Further work is required to implement this procedure.

In Chapter 5, we illustrated the implementation of the Frobenius-norm minimization preconditioner within a parallel out-of-core research code that implements the Fast Multipole Method (FMM), and we studied the numerical and parallel scalability of the implementation for the solution of large scattering applications, up to one million unknowns. On problems of this size, the construction of the preconditioner can be demanding in terms of time, memory and disk resources. A potential limit of the Frobenius-norm minimization preconditioner, and in general of any sparse approximate inverse method, is that it tends to be less effective on large problems because the number of iterations increases rapidly with the problem size. In Chapter 5, we proposed the use of inner-outer iterative solution schemes implemented in a multipole context with different levels of accuracy for the matrix-vector products in the inner and outer loops. We have shown that the use of the multipole matrix can be effective to balance the locality of the preconditioner. In particular, the combination FGMRES(5)/GMRES(20) can enhance the robustness of the preconditioner, significantly reducing the computational cost and the storage requirements for the solution of large problems. We have successfully used this approach to solve systems of size up to one million unknowns; the approach is very promising for the solution of challenging real-life industrial applications. Some questions are still open. One issue concerns the optimal tuning of the inner accuracy. In the numerical experiments, we selected a “medium” accuracy for the inner iteration. A multilevel scheme can be designed as a natural extension of the simple two-level scheme considered in Chapter 5, with several embedded FGMRES levels going down to the lowest accuracy in the innermost GMRES. Variants of these schemes can be based on flexible variants of the SQMR method as outer solvers and SQMR as the inner solver.

In Chapter 6, we investigated a refinement technique for the approximate inverse based on low-rank corrections computed using spectral information from the preconditioned matrix. We have illustrated the effectiveness and the robustness of the proposed preconditioner on a set of small but tough problems arising from electromagnetic applications, and we have analysed the cost of the algorithm. The conclusion is that the method is very well suited for the solution of electromagnetic problems; the extra cost of computing the preconditioner updates can be quickly amortized by considering a few right-hand sides. Also, the preconditioner is independent of the Krylov solver used for the actual solution of the linear system. A symmetric formulation has been derived, and numerical results have shown the remarkable robustness of this formulation when used in conjunction with SQMR. The numerical results are encouraging for the investigation of this procedure for the solution of much larger problems. The computation of the preconditioning updates by the IRA method is based on matrix-vector operations and thus can be easily integrated within the code that implements the Fast Multipole Method. It could be combined with inner-outer schemes via embedded iterations to construct preconditioners for electromagnetic applications that might be expected to be very robust and effective. Although the electromagnetic context is an ideal setting for its application, the proposed technique can be effectively used in other contexts, as it only requires algebraic information from the preconditioned matrix. Preliminary results on domain decomposition methods and on both SPD and unsymmetric linear systems from the Harwell-Boeing sparse matrix collection are encouraging.

The idea of updating the preconditioner by using low-rank corrections is a natural one in the context of integral equations, and is inherently related to the algebraic structure of the discretized integral operator. A block structure of the coefficient matrix naturally emerges when the oct-tree is considered and the unknowns are numbered consecutively by leaf-boxes. If the n unknowns are divided into p groups, the coefficient matrix can be written in the form

   A = D + Q,

where

   D = diag{T_{11}, T_{22}, ..., T_{pp}}

is a block-diagonal matrix, and Q is a block matrix with zero blocks on the diagonal. Each block T_{kk} represents the connection between edges within the same leaf-box, and each off-diagonal block Q_{kl}, l ≠ k, represents the connection between edges of group k and group l. The off-diagonal blocks Q_{kl} corresponding to far-away groups k and l have low rank r_{kl} and thus can be expressed as the sum of r_{kl} rank-one updates as follows


   Q_{kl} = \sum_{i=1}^{r_{kl}} u_{kl}^i (v_{kl}^i)^T = U_{kl} V_{kl}^T,

where

   U_{kl} = [u_{kl}^1, u_{kl}^2, ..., u_{kl}^{r_{kl}}],
   V_{kl} = [v_{kl}^1, v_{kl}^2, ..., v_{kl}^{r_{kl}}].
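Purely for illustration, the factors U_{kl} and V_{kl} could be obtained from a truncated SVD, as sketched below; as remarked later in this chapter, an explicit SVD of the off-diagonal blocks is too expensive in practice and is not feasible in a multipole context, so this is a conceptual example only, with assumed names and tolerances.

    import numpy as np

    def low_rank_block(Q_kl, tol=1e-8):
        # Compress an off-diagonal block as Q_kl ~= U_kl @ V_kl.T by truncating
        # the SVD at relative tolerance tol; r plays the role of the rank r_kl.
        u, s, vt = np.linalg.svd(Q_kl, full_matrices=False)
        r = max(1, int(np.sum(s > tol * s[0])))
        U_kl = u[:, :r] * s[:r]        # singular values absorbed into U_kl
        V_kl = vt[:r, :].T
        return U_kl, V_kl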

Matrix-free methods [75] use this idea to approximate nonsingular coefficient matrices in 3D boundary integral applications from CEM and CFD by purely algebraic techniques. The Matrix Decomposition Algorithm and its multilevel variant [107, 108] approximate the far-field interactions of electromagnetic scattering problems by standard linear algebra techniques. In [10] an iterative algorithm is proposed to compute low-rank approximations to blocks of large unstructured matrices; the algorithm uses only a few entries from the original blocks, and the approximate rank is not needed in advance.

The idea of low-rank approximations can be exploited in the design of the preconditioner [21]. Denoting by U and V the matrices

U = \begin{pmatrix}
U_{11} & 0      & \cdots & 0      & U_{12} & 0      & \cdots & 0      & \cdots & U_{1p} & 0      & \cdots & 0      \\
0      & U_{21} & \cdots & 0      & 0      & U_{22} & \cdots & 0      & \cdots & 0      & U_{2p} & \cdots & 0      \\
\vdots &        & \ddots & \vdots & \vdots &        & \ddots & \vdots & \cdots & \vdots &        & \ddots & \vdots \\
0      & 0      & \cdots & U_{p1} & 0      & 0      & \cdots & U_{p2} & \cdots & 0      & 0      & \cdots & U_{pp}
\end{pmatrix}

V = \begin{pmatrix}
V_{11} & V_{21} & \cdots & V_{p1} & 0      & 0      & \cdots & 0      & \cdots & 0      & 0      & \cdots & 0      \\
0      & 0      & \cdots & 0      & V_{12} & V_{22} & \cdots & V_{p2} & \cdots & 0      & 0      & \cdots & 0      \\
\vdots &        &        & \vdots & \vdots &        &        & \vdots & \ddots & \vdots &        &        & \vdots \\
0      & 0      & \cdots & 0      & 0      & 0      & \cdots & 0      & \cdots & V_{1p} & V_{2p} & \cdots & V_{pp}
\end{pmatrix}

the matrix Q can be written as the product UV^T. In our case the blocks U_{ii} and V_{ii} are null for i = 1, ..., p. By using the Sherman-Morrison-Woodbury formula [73], the following explicit expression can be derived for the inverse of B:

   B^{-1} = (D + UV^T)^{-1} = D^{-1} − D^{-1} U (I + G)^{-1} V^T D^{-1},

where G = V^T D^{-1} U is of order m = \sum_{k,l} r_{kl}. The application of the preconditioner requires “inverting”, that is, exactly factorizing, the diagonal blocks of D and the matrix I + G, which has small size. It might be profitable to explore this further in future research. Preliminary results show that this strategy can be effective provided that the diagonal blocks are exactly factorized. Some questions are still open. An explicit computation of the singular value decomposition of the off-diagonal blocks is too expensive and is not feasible in a multipole context where the entries of these blocks are not available. More sophisticated block partitioning schemes need to be investigated to select the ranks of the off-diagonal blocks appropriately.
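The application of B^{-1} described above can be sketched as follows in numpy; the callable D_solve, standing for the exact factorization of the block-diagonal D, is an assumed interface rather than code from the thesis.

    import numpy as np

    def smw_apply(D_solve, U, V, p):
        # Apply B^{-1} p = D^{-1} p - D^{-1} U (I + G)^{-1} V^T D^{-1} p,
        # with G = V^T D^{-1} U of small order m; D_solve(X) applies D^{-1}
        # to a vector or to the columns of a matrix.
        y = D_solve(p)
        G = V.T @ D_solve(U)
        w = np.linalg.solve(np.eye(G.shape[0]) + G, V.T @ y)
        return y - D_solve(U @ w)

Only the small m-by-m system I + G is solved densely; all other work consists of block-diagonal solves and tall-skinny products, which is what makes the formula attractive here.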

Some methods work well for our applications, and we have tuned them for problems in this area. It would be interesting in future work to see whether these methods are applicable in other areas, for example in acoustics.




Appendix A

Numerical results with the two-level spectral preconditioner


A.1 Effect of the low-rank updates on the GMRES convergence

Example 1

Size of the     GMRES(m), Toler. 1e-8
coarse space    m=10   m=30   m=50   m=80   m=110
 0               358    213    144     79     79
 1               314    179    138     76     76
 2               314    173    127     73     73
 3               313    172    116     70     70
 4               310    169    113     69     69
 5               313    169    108     67     67
 6               315    162     97     64     64
 7               315    145     91     62     62
 8               315    138     78     59     59
 9               315    134     75     57     57
10               248    103     60     53     53
11               206     98     53     52     52
12               197     96     52     52     52
13               194     91     52     51     51
14               192     90     51     51     51
15               191     90     51     51     51
16               189     80     48     48     48
17               189     80     48     48     48
18               175     80     48     48     48
19               166     60     42     42     42
20               153     54     37     37     37

Table A.1.1: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space.


Example 1

Size of the     GMRES(m), Toler. 1e-5
coarse space    m=10   m=30   m=50   m=80   m=110
 0               165    103     75     60     60
 1               154     87     64     56     56
 2               154     87     62     54     54
 3               154     87     62     53     53
 4               154     87     62     53     53
 5               154     87     61     53     53
 6               153     77     50     50     50
 7               153     73     48     48     48
 8               153     72     45     45     45
 9               153     68     44     44     44
10               129     52     40     40     40
11               102     50     39     39     39
12                97     49     39     39     39
13                92     48     38     38     38
14                92     48     38     38     38
15                92     48     38     38     38
16                91     45     35     35     35
17                92     45     35     35     35
18                97     45     35     35     35
19                79     32     31     31     31
20                69     26     26     26     26

Table A.1.2: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space.


Example 2

Size of the     GMRES(m), Toler. 1e-8
coarse space    m=10    m=30    m=50   m=80   m=110
 0              +1500   +1500    496    311    198
 1               310     235     192    151    107
 2               306     222     184    144    104
 3               308     209     177    138    101
 4               304     208     170    135     97
 5               309     206     164    132     96
 6               313     205     158    123     92
 7               246     174     146    108     88
 8               205     159     138    102     87
 9               205     159     138    101     87
10               198     155     136     99     86
11               198     154     136     98     86
12               198     154     136     96     84
13               198     153     136     89     83
14               185     131     109     74     74
15               175     138     115     75     75
16               186     137     112     74     74
17               159     117      98     70     70
18               192     135     105     70     70
19               167     126      98     68     68
20               187     143     112     73     73

Table A.1.3: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space.


Example 2

Size of the     GMRES(m), Toler. 1e-5
coarse space    m=10   m=30   m=50   m=80   m=110
 0               145    110     95     76     76
 1               144    110     95     76     76
 2               139    104     91     73     73
 3               140    103     90     73     73
 4               140    103     90     73     73
 5               140    102     90     73     73
 6               143    100     88     72     72
 7               119     89     81     68     68
 8               100     83     76     65     65
 9               100     82     76     65     65
10                99     82     76     64     64
11                99     82     76     64     64
12                99     82     76     64     64
13                99     81     75     64     64
14                80     60     48     48     48
15                77     67     53     52     52
16                87     69     60     54     54
17                69     56     47     47     47
18                87     66     52     51     51
19                73     58     46     46     46
20                93     74     65     58     58

Table A.1.4: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space.


Example 3

Size of the     GMRES(m), Toler. 1e-8
coarse space    m=10   m=30   m=50   m=80   m=110
 0               268    174    130     79     79
 1               267    171    123     76     76
 2               271    171    121     72     72
 3               263    170    115     70     70
 4               260    153    100     67     67
 5               255    141     93     64     64
 6               209    111     79     60     60
 7               209    111     78     60     60
 8               209    111     78     58     58
 9               137     86     66     55     55
10               127     82     61     54     54
11               126     82     61     54     54
12               115     80     56     53     53
13               119     81     56     52     52
14               119     81     56     52     52
15               114     79     52     51     51
16               104     74     49     49     49
17               105     68     48     48     48
18               103     57     43     43     43
19                97     59     44     44     44
20                96     57     44     44     44

Table A.1.5: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space.


Example 3

Size of the     GMRES(m), Toler. 1e-5
coarse space    m=10   m=30   m=50   m=80   m=110
 0               129     89     70     57     57
 1               129     89     70     56     56
 2               129     88     69     56     56
 3               128     88     69     56     56
 4               126     86     57     53     53
 5               125     76     49     49     49
 6               107     60     45     45     45
 7               107     60     45     45     45
 8               107     60     45     45     45
 9                73     51     42     42     42
10                65     49     40     40     40
11                65     49     40     40     40
12                63     48     40     40     40
13                64     48     40     40     40
14                63     48     40     40     40
15                62     46     40     40     40
16                59     45     37     37     37
17                54     44     36     36     36
18                53     32     31     31     31
19                53     34     33     33     33
20                53     33     32     32     32

Table A.1.6: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space.


Example 4

Size of the     GMRES(m), Toler. 1e-8
coarse space    m=10   m=30   m=50   m=80   m=110
 0               145    113     90     71     71
 1               145    113     90     68     68
 2               134    105     83     65     65
 3               133    105     83     65     65
 4               127     97     74     61     61
 5               126     95     75     61     61
 6               123     91     63     58     58
 7               101     77     58     56     56
 8               101     77     58     56     56
 9               100     75     58     56     56
10                72     55     43     43     43
11                95     74     55     53     53
12                86     70     51     51     51
13                86     68     49     49     49
14                84     66     49     49     49
15                82     63     49     49     49
16                81     63     49     49     49
17                81     65     49     49     49
18                80     65     49     49     49
19                82     65     49     49     49
20                76     59     47     47     47

Table A.1.7: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space.


Example 4

Size of the     GMRES(m), Toler. 1e-5
coarse space    m=10   m=30   m=50   m=80   m=110
 0                71     57     48     48     48
 1                71     57     48     48     48
 2                68     54     45     45     45
 3                68     53     45     45     45
 4                65     46     43     43     43
 5                65     46     42     42     42
 6                64     45     41     41     41
 7                49     41     38     38     38
 8                49     41     38     38     38
 9                48     41     38     38     38
10                20     18     18     18     18
11                46     38     37     37     37
12                44     36     34     34     34
13                44     36     34     34     34
14                43     35     34     34     34
15                43     34     34     34     34
16                42     34     34     34     34
17                43     34     34     34     34
18                43     34     34     34     34
19                43     34     34     34     34
20                40     32     32     32     32

Table A.1.8: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space.


Example 5

Size of the     GMRES(m), Toler. 1e-8
coarse space    m=10   m=30   m=50   m=80   m=110
 0               297     87     75     66     66
 1               290     78     75     66     66
 2               290     78     75     66     66
 3               287     66     68     58     58
 4               254     66     64     58     58
 5               232     66     62     58     58
 6               392     66     50     50     50
 7                52     43     39     39     39
 8                52     43     39     39     39
 9                53     43     39     39     39
10                53     43     40     40     40
11                53     43     40     40     40
12                52     44     38     38     38
13                58     46     43     43     43
14                50     44     38     38     38
15                51     44     38     38     38
16                51     44     38     38     38
17                51     44     38     38     38
18                60     45     40     40     40
19                59     45     41     41     41
20                60     45     42     42     42

Table A.1.9: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space.


Example 5

Size of the     GMRES(m), Toler. 1e-5
coarse space    m=10   m=30   m=50   m=80   m=110
 0               110     46     42     42     42
 1               109     45     41     41     41
 2               109     45     41     41     41
 3               104     34     33     33     33
 4                88     34     33     33     33
 5                73     34     33     33     33
 6               109     35     33     33     33
 7                23     21     21     21     21
 8                23     21     21     21     21
 9                23     21     21     21     21
10                23     22     22     22     22
11                23     22     22     22     22
12                24     21     21     21     21
13                28     24     24     24     24
14                23     21     21     21     21
15                23     21     21     21     21
16                23     21     21     21     21
17                23     21     21     21     21
18                30     24     24     24     24
19                30     24     24     24     24
20                32     24     24     24     24

Table A.1.10: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space.


A.2 Experiments with the operator W^H = V_ε^H M_1

Example 1

Size of the     GMRES(m), Toler. 1e-8
coarse space    m=10   m=30   m=50   m=80   m=110
 0               358    213    144     79     79
 1               315    176    137     76     76
 2               314    171    125     72     72
 3               314    171    115     70     70
 4               313    169    109     68     68
 5               306    171    107     67     67
 6               303    169     96     64     64
 7               298    145     90     61     61
 8               294    138     76     58     58
 9               303    134     71     57     57
10               244    100     59     53     53
11               206     94     53     51     51
12               190     96     52     51     51
13               177     88     51     51     51
14               177     88     50     50     50
15               180     88     50     50     50
16               184     80     47     47     47
17               180     80     47     47     47
18               180     79     47     47     47
19               174     76     46     46     46
20               174     61     44     44     44

Table A.2.11: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.


Example 1

Size of the     GMRES(m), Toler. 1e-5
coarse space    m=10   m=30   m=50   m=80   m=110
 0               165    103     75     60     60
 1               151     86     63     56     56
 2               152     86     61     53     53
 3               153     86     61     53     53
 4               150     86     61     53     53
 5               150     84     61     53     53
 6               146     81     50     50     50
 7               146     48     48     48     48
 8               138     70     44     44     44
 9               141     70     43     43     43
10               144     64     40     40     40
11               120     51     39     39     39
12                98     49     38     38     38
13                97     47     37     37     37
14                93     46     37     37     37
15                93     46     37     37     37
16                90     46     34     34     34
17                90     44     34     34     34
18                90     44     34     34     34
19                93     44     34     34     34
20                93     33     32     32     32

Table A.2.12: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.


Example 2

Size of the     GMRES(m), Toler. 1e-8
coarse space    m=10    m=30    m=50   m=80   m=110
 0              +1500   +1500    496    311    198
 1               285     215     177    141    103
 2               286     202     169    133    100
 3               286     193     160    129     97
 4               279     192     152    125     93
 5               286     190     149    124     92
 6               286     189     146    111     89
 7               229     161     137     95     85
 8               188     147     129     90     84
 9               187     146     129     91     84
10               185     144     127     90     83
11               184     143     127     89     83
12               196     148     131     91     83
13               187     147     130     85     81
14               190     144     129     80     80
15               189     144     126     77     77
16               183     137     114     74     74
17               178     135     109     73     73
18               179     136     108     73     73
19               178     135     102     70     70
20               168     130     100     69     69

Table A.2.13: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.


Example 2

Size of the     GMRES(m), Toler. 1e-5
coarse space    m=10   m=30   m=50   m=80   m=110
 0               145    110     95     76     76
 1               119     90     83     69     69
 2               118     88     80     67     67
 3               123     88     80     67     67
 4               122     88     80     67     67
 5               124     88     80     67     67
 6               123     88     80     66     66
 7               106     78     71     62     62
 8                84     71     64     58     58
 9                86     71     65     58     58
10                85     71     65     58     58
11                85     71     64     58     58
12                94     74     69     61     61
13                94     74     68     61     61
14                94     74     68     61     61
15                92     74     66     58     58
16                88     69     60     54     54
17                87     69     60     54     54
18                86     69     60     54     54
19                88     68     58     53     53
20                85     67     56     53     53

Table A.2.14: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.


Example 3

Size of the     GMRES(m), Toler. 1e-8
coarse space    m=10   m=30   m=50   m=80   m=110
 0               268    174    130     79     79
 1               254    170    121     76     76
 2               286    170    119     72     72
 3               284    169    114     70     70
 4               259    150     99     66     66
 5               269    141     92     63     63
 6               221    110     78     60     60
 7               222    108     77     59     59
 8               225    109     77     58     58
 9               133     86     65     55     55
10               126     82     59     53     53
11               124     82     60     53     53
12               117     81     56     52     52
13               117     81     56     52     52
14               119     80     56     52     52
15               119     79     53     51     51
16               105     74     49     49     49
17               106     69     47     47     47
18               105     65     46     46     46
19                99     58     44     44     44
20                96     58     44     44     44

Table A.2.15: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.


Example 3

Size of the     GMRES(m), Toler. 1e-5
coarse space    m=10   m=30   m=50   m=80   m=110
 0               129     89     70     57     57
 1               130     88     70     56     56
 2               147     88     68     56     56
 3               145     88     69     56     56
 4               139     83     55     52     52
 5               135     74     49     49     49
 6               116     59     45     45     45
 7               115     60     45     45     45
 8               115     60     45     45     45
 9                70     50     41     41     41
10                66     48     40     40     40
11                66     48     40     40     40
12                64     47     39     39     39
13                64     47     39     39     39
14                64     47     39     39     39
15                62     46     39     39     39
16                56     44     37     37     37
17                56     42     36     36     36
18                55     41     35     35     35
19                56     34     33     33     33
20                56     33     32     32     32

Table A.2.16: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.


Example 4

Size of the     GMRES(m), Toler. 1e-8
coarse space    m=10   m=30   m=50   m=80   m=110
 0               145    113     90     71     71
 1               145    113     90     68     68
 2               134    106     83     65     65
 3               130    103     85     65     65
 4               125     97     74     61     61
 5               123     93     73     61     61
 6               120     91     66     58     58
 7               101     78     58     56     56
 8               101     78     58     56     56
 9                98     78     58     56     56
10                94     74     56     55     55
11                93     74     55     53     53
12                86     70     52     51     51
13                86     68     50     50     50
14                85     67     49     49     49
15                82     64     49     49     49
16                81     64     49     49     49
17                82     66     49     49     49
18                81     66     49     49     49
19                81     67     50     50     50
20                77     62     47     47     47

Table A.2.17: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.


Example 4

Size of the     GMRES(m), Toler. 1e-5
coarse space    m=10   m=30   m=50   m=80   m=110
 0                71     57     48     48     48
 1                71     57     48     48     48
 2                68     54     45     45     45
 3                66     50     45     45     45
 4                64     46     43     43     43
 5                63     45     41     41     41
 6                63     45     41     41     41
 7                50     41     38     38     38
 8                50     41     38     38     38
 9                49     41     38     38     38
10                46     39     37     37     37
11                46     38     37     37     37
12                44     36     35     35     35
13                45     36     35     35     35
14                44     35     34     34     34
15                43     34     34     34     34
16                43     34     33     33     33
17                43     35     34     34     34
18                43     35     34     34     34
19                43     35     34     34     34
20                41     33     32     32     32

Table A.2.18: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.


Example 5

Size of the     GMRES(m), Toler. 1e-8
coarse space    m=10   m=30   m=50   m=80   m=110
 0               297     87     75     66     66
 1               312     79     75     66     66
 2               311     79     75     66     66
 3               354     66     68     58     58
 4               345     66     64     58     58
 5               270     66     62     58     58
 6               559     66     50     50     50
 7                53     43     40     40     40
 8                55     43     40     40     40
 9                55     43     40     40     40
10                54     44     41     41     41
11                53     44     41     41     41
12                52     43     38     38     38
13                53     44     38     38     38
14                52     44     39     39     39
15                52     44     39     39     39
16                52     44     39     39     39
17                52     44     39     39     39
18                52     44     39     39     39
19                53     45     39     39     39
20                53     45     40     40     40

Table A.2.19: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.


Example 5

Size of the     GMRES(m), Toler. 1e-5
coarse space    m=10   m=30   m=50   m=80   m=110
 0               110     46     42     42     42
 1               111     45     41     41     41
 2               111     45     41     41     41
 3               112     34     34     34     34
 4                92     35     34     34     34
 5                72     35     34     34     34
 6               121     36     34     34     34
 7                23     21     21     21     21
 8                23     22     22     22     22
 9                24     22     22     22     22
10                24     21     21     21     21
11                23     21     21     21     21
12                23     21     21     21     21
13                23     21     21     21     21
14                24     21     21     21     21
15                24     21     21     21     21
16                24     21     21     21     21
17                24     22     22     22     22
18                24     22     22     22     22
19                24     22     22     22     22
20                24     22     22     22     22

Table A.2.20: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.


A.3 Cost of the eigencomputation

Example 1

Nr. of eigenvalues   M-V products   CPU-time (in sec)   A-V
         1                 90               15             9
         2                388               41            36
         3                243               22            23
         4                281               35            26
         5                354               33            33
         6                293               27            25
         7                247               23            21
         8                198               19            17
         9                179               26            15
        10                138               14             4
        11                186               26             3
        12                213               21             4
        13                189               22             3
        14                235               23             4
        15                276               36             4
        16                266               31             4
        17                514               50             8
        18                336               34             5
        19                336               34             4
        20                650               75             7

Table A.3.21: Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.
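The IRAM runs summarised in these tables are the kind of computation ARPACK implements. As a hedged illustration (the operator below is a random stand-in for the preconditioned matrix, not one of the five examples), SciPy's ARPACK wrapper computes the eigenvalues nearest zero in shift-invert mode:

    import numpy as np
    from scipy.sparse.linalg import eigs

    rng = np.random.default_rng(1)
    n, k = 500, 10
    M1A = np.eye(n) + 0.05 * rng.standard_normal((n, n))  # stand-in for M1*A

    # IRAM: k approximate eigenvalues nearest zero and their eigenvectors,
    # obtained via shift-invert about sigma = 0.
    lam, V = eigs(M1A, k=k, sigma=0.0)
    print(np.sort(np.abs(lam)))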



Example 2

Nr. of eigenvalues   M-V products   CPU-time (in sec)   A-V
         1                135               18           135
         2                440               59            74
         3                524               71           105
         4                469               63            94
         5                423               58            85
         6                357               49           179
         7                340               47            14
         8                333               46             8
         9                345               48             8
        10                358               50             8
        11                527               74            12
        12                579               81            13
        13                574               81            13
        14               1010              142            16
        15               1762              303            26
        16               1053              149            19
        17                751              107            10
        18               3050              514            53
        19               2359              335            33
        20               1066              188            21

Table A.3.22: Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.



Example 3

Nr. of eigenvalues   M-V products   CPU-time (in sec)   A-V
         1                120               29             -
         2                336               79             -
         3                290               69           290
         4                250               60            84
         5                192               46            48
         6                183               45             9
         7                175               43             8
         8                165               41             8
         9                154               39             3
        10                169               42             3
        11                157               40             3
        12                219               56             4
        13                224               62             4
        14                212               70             4
        15                223               57             4
        16                202               53             3
        17                226               59             4
        18                264               69             4
        19                264               69             4
        20                300               78             4

Table A.3.23: Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.



Example 4

Nr. of eigenvalues   M-V products   CPU-time (in sec)   A-V
         1                 75               25             -
         2                168               56            56
         3                214               72            72
         4                178               60            30
         5                149               57            25
         6                180               73            26
         7                134               47             7
         8                236               81            11
         9                261               90            12
        10                207               72             5
        11                191               67             8
        12                197               70             8
        13                248               88            10
        14                309              109            12
        15                355              125            13
        16                412              156            15
        17                408              144            15
        18                390              138            49
        19                426              191            16
        20                345              159            12

Table A.3.24: Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.



Example 5

Nr. of eigenvalues   M-V products   CPU-time (in sec)   A-V
         1                 60               30            60
         2                 58               29            58
         3                107               53            18
         4                103               51             5
         5                163               81             5
         6                156               78           156
         7                128               65             2
         8                105               54             2
         9                125               65             2
        10                128               67             2
        11                160               83             2
        12                131               94             2
        13                162               85             2
        14                126               68             2
        15                164               97             2
        16                237              124             3
        17                227              119             3
        18                223              118             3
        19                756              454            10
        20                220              118             3

Table A.3.25: Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.



A.4 Sensitivity of the preconditioner to the accuracy of the eigencomputation

Example 1

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          358    213    144     79     79
     1          315    176    137     76     76
     2          314    171    125     72     72
     3          314    171    115     70     70
     4          313    169    109     68     68
     5          306    171    107     67     67
     6          303    169     96     64     64
     7          298    145     90     61     61
     8          294    138     76     58     58
     9          303    134     71     57     57
    10          244    100     59     53     53
    11          206     94     53     51     51
    12          190     96     52     51     51
    13          177     88     51     51     51
    14          177     88     50     50     50
    15          182     88     50     50     50
    16          184     80     47     47     47
    17          171     80     47     47     47
    18          176     79     47     47     47
    19          177     77     47     47     47
    20          178     61     44     44     44

Table A.4.26: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.
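In ARPACK-style interfaces the accuracy studied in this section reduces to a single stopping tolerance for the Ritz pairs. A hedged continuation of the Section A.3 sketch (reusing its stand-in operator M1A and the count k):

    from scipy.sparse.linalg import eigs

    # tol=0 is ARPACK's default and means machine precision, as in the
    # tables of this section; a nonzero tol relaxes the Ritz-pair accuracy.
    lam_exact, V_exact = eigs(M1A, k=k, sigma=0.0, tol=0)
    lam_loose, V_loose = eigs(M1A, k=k, sigma=0.0, tol=1e-2)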



Example 1

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          165    103     75     60     60
     1          151     86     63     56     56
     2          152     86     61     53     53
     3          153     86     61     53     53
     4          150     86     61     53     53
     5          150     84     61     53     53
     6          146     81     50     50     50
     7          138     70     48     48     48
     8          141     70     44     44     44
     9          144     64     43     43     43
    10          120     51     40     40     40
    11           98     49     39     39     39
    12           97     47     38     38     38
    13           93     46     37     37     37
    14           93     46     37     37     37
    15           91     46     37     37     37
    16           92     44     34     34     34
    17           89     44     34     34     34
    18           90     44     34     34     34
    19           92     44     34     34     34
    20           90     34     32     32     32

Table A.4.27: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.



Example 2

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0        +1500    496    311    198    123
     1          215    177    141    103    103
     2          202    169    133    100    100
     3          193    160    129     97     97
     4          192    152    125     93     93
     5          190    149    124     92     92
     6          189    146    111     89     89
     7          161    137     95     85     85
     8          147    129     90     84     84
     9          146    129     91     84     84
    10          144    127     90     83     83
    11          143    127     89     83     83
    12          143    126     88     82     82
    13          140    122     80     80     80
    14          139    118     79     79     79
    15          139    119     79     79     79
    16          139    118     79     79     79
    17          139    116     76     76     76
    18          135    113     75     75     75
    19          135    114     75     75     75
    20          131    109     73     73     73

Table A.4.28: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.



Example 2

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          145    110     95     76     76
     1          119     90     83     69     69
     2          118     88     80     67     67
     3          123     88     80     67     67
     4          122     88     80     67     67
     5          124     88     80     67     67
     6          123     88     80     66     66
     7          106     78     71     62     62
     8           84     71     64     58     58
     9           86     71     65     58     58
    10           85     71     65     58     58
    11           85     71     64     58     58
    12           84     71     64     58     58
    13           84     70     62     56     56
    14           83     70     62     56     56
    15           83     70     62     56     56
    16           84     70     62     56     56
    17           84     70     61     55     55
    18           79     68     59     55     55
    19           79     68     60     55     55
    20           79     67     58     53     53

Table A.4.29: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.



Example 3

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          268    174    130     79     79
     1          254    170    121     76     76
     2          286    170    119     72     72
     3          284    169    114     70     70
     4          259    150     99     66     66
     5          269    141     92     63     63
     6          221    110     78     60     60
     7          222    108     77     59     59
     8          225    109     77     58     58
     9          133     86     65     55     55
    10          126     82     59     53     53
    11          124     82     60     53     53
    12          117     81     56     52     52
    13          117     81     56     52     52
    14          119     80     56     52     52
    15          117     79     53     51     51
    16          104     73     49     49     49
    17          106     70     47     47     47
    18          102     64     46     46     46
    19           94     58     44     44     44
    20           99     58     44     44     44

Table A.4.30: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.



Example 3

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          129     89     70     57     57
     1          130     88     70     56     56
     2          147     88     68     56     56
     3          145     88     69     56     56
     4          139     83     55     52     52
     5          135     74     49     49     49
     6          116     59     45     45     45
     7          115     60     45     45     45
     8          115     60     45     45     45
     9           70     50     41     41     41
    10           66     48     40     40     40
    11           66     48     40     40     40
    12           64     47     39     39     39
    13           63     47     39     39     39
    14           63     47     39     39     39
    15           62     46     39     39     39
    16           55     44     37     37     37
    17           56     42     36     36     36
    18           53     39     35     35     35
    19           55     34     33     33     33
    20           55     34     32     32     32

Table A.4.31: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.



Example 4

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          145    113     90     71     71
     1          145    113     90     68     68
     2          134    106     83     65     65
     3          130    103     85     65     65
     4          125     97     74     61     61
     5          123     93     73     61     61
     6          120     91     66     58     58
     7          101     78     58     56     56
     8          101     78     58     56     56
     9           99     77     58     56     56
    10           94     74     56     55     55
    11           93     74     55     53     53
    12           86     70     52     51     51
    13           86     68     50     50     50
    14           85     67     49     49     49
    15           82     65     49     49     49
    16           81     64     49     49     49
    17           81     66     49     49     49
    18           81     66     49     49     49
    19           80     64     49     49     49
    20           80     65     49     49     49

Table A.4.32: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.



Example 4

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0           71     57     48     48     48
     1           71     57     48     48     48
     2           68     54     45     45     45
     3           66     50     45     45     45
     4           64     46     43     43     43
     5           63     45     41     41     41
     6           63     45     41     41     41
     7           50     41     38     38     38
     8           50     41     38     38     38
     9           48     40     38     38     38
    10           46     39     37     37     37
    11           46     38     37     37     37
    12           44     36     35     35     35
    13           45     36     35     35     35
    14           44     35     34     34     34
    15           43     34     34     34     34
    16           43     34     34     34     34
    17           43     34     34     34     34
    18           43     34     34     34     34
    19           43     34     34     34     34
    20           43     35     34     34     34

Table A.4.33: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.



Example 5

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          297     87     75     66     66
     1          312     79     75     66     66
     2          311     79     75     66     66
     3          354     66     68     58     58
     4          346     66     64     58     58
     5          270     66     62     58     58
     6          552     66     50     50     50
     7           53     43     40     40     40
     8           55     43     40     40     40
     9           54     44     40     40     40
    10           54     44     40     40     40
    11           50     43     38     38     38
    12           50     43     38     38     38
    13           48     43     38     38     38
    14           48     43     38     38     38
    15           48     43     38     38     38
    16           48     43     38     38     38
    17           48     43     38     38     38
    18           48     43     38     38     38
    19           47     41     36     36     36
    20           50     40     36     36     36

Table A.4.34: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.



Example 5

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          110     46     42     42     42
     1          111     45     41     41     41
     2          111     45     41     41     41
     3          112     34     34     34     34
     4           92     35     34     34     34
     5           72     35     34     34     34
     6          121     36     34     34     34
     7           23     21     21     21     21
     8           23     22     22     22     22
     9           23     21     21     21     21
    10           23     21     21     21     21
    11           23     21     21     21     21
    12           23     21     21     21     21
    13           23     21     21     21     21
    14           23     21     21     21     21
    15           23     21     21     21     21
    16           23     21     21     21     21
    17           23     21     21     21     21
    18           23     21     21     21     21
    19           23     21     21     21     21
    20           24     21     21     21     21

Table A.4.35: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.



A.5 Experiments with a poor preconditioner $M_1$

Example 1

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          818    418    303    193    142
     1          804    418    303    193    139
     2          780    419    301    184    122
     3          784    419    306    178    112
     4          779    348    262    154    105
     5          766    328    247    153    104
     6          696    317    238    148    102
     7          722    316    233    149    102
     8          690    318    236    148    102
     9          710    314    235    148    102
    10          666    298    231    145    101
    11          710    290    227    144     99
    12          635    260    196    132     93
    13          628    258    196    131     93
    14          589    255    195    130     93
    15          648    256    195    130     92
    16          626    255    190    126     91
    17          658    251    185    113     87
    18          654    250    185    113     87
    19          615    251    184    113     87
    20          658    238    161     93     83

Table A.5.36: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is imposed on $A$ and $M_1$.
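The poor $M_1$ used throughout this section is still a Frobenius-norm minimization preconditioner, only with the nonzero pattern of $M_1$ forced to coincide with that of $A$. The following is a minimal dense sketch of the column-by-column least-squares construction behind such preconditioners; the toy matrix and the pattern extraction are illustrative assumptions, not the CESC implementation.

    import numpy as np

    def frobenius_min_inverse(A, pattern):
        """Right approximate inverse minimising ||A M - I||_F column by
        column; pattern[j] lists the rows allowed to be nonzero in column
        j of M (here, the nonzero structure of A itself)."""
        n = A.shape[0]
        M = np.zeros_like(A)
        for j in range(n):
            rows = pattern[j]
            ej = np.zeros(n)
            ej[j] = 1.0
            # Small least-squares problem restricted to the allowed entries.
            mj, *_ = np.linalg.lstsq(A[:, rows], ej, rcond=None)
            M[rows, j] = mj
        return M

    # Illustrative use: the pattern of A on a small random sparse-ish matrix.
    rng = np.random.default_rng(2)
    n = 50
    A = np.eye(n) + 0.1 * (rng.random((n, n)) < 0.05) * rng.standard_normal((n, n))
    pattern = [np.nonzero(A[:, j])[0] for j in range(n)]
    M1 = frobenius_min_inverse(A, pattern)
    print(np.linalg.norm(A @ M1 - np.eye(n)))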



Example 1

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          342    174    138     81     81
     1          338    174    138     82     82
     2          325    174    138     82     82
     3          331    174    138     82     82
     4          327    146    121     74     74
     5          309    134    114     72     72
     6          291    133    113     71     71
     7          302    133    113     71     71
     8          285    133    113     71     71
     9          301    133    113     71     71
    10          304    131    111     71     71
    11          290    127    107     69     69
    12          267    108     86     65     65
    13          269    107     85     64     64
    14          269    107     85     64     64
    15          268    107     85     63     63
    16          269    106     85     63     63
    17          275    106     85     61     61
    18          270    106     85     61     61
    19          265    106     85     61     61
    20          270     99     74     57     57

Table A.5.37: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is imposed on $A$ and $M_1$.



Example 2

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0        +1500  +1500  +1500   1058    509
     1          518    371    311    265    224
     2          513    372    310    265    223
     3          515    372    309    264    222
     4          513    371    308    263    220
     5          516    370    308    263    220
     6          504    370    307    262    220
     7          515    369    307    263    220
     8          506    367    306    262    220
     9          506    367    306    262    219
    10          508    365    306    261    219
    11          502    366    305    261    219
    12          502    362    304    260    219
    13          497    363    304    260    219
    14          499    363    304    260    219
    15          499    363    304    260    219
    16          504    363    304    259    219
    17          497    362    302    259    218
    18          490    358    299    256    218
    19          490    358    299    256    218
    20          492    358    299    255    218

Table A.5.38: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is imposed on $A$ and $M_1$.



Example 2

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          247    180    155    137    115
     1          239    174    149    133    109
     2          237    174    149    133    109
     3          238    174    149    133    109
     4          237    174    148    133    109
     5          239    174    149    133    109
     6          237    174    148    132    109
     7          237    174    148    132    109
     8          239    173    148    132    109
     9          237    173    148    132    109
    10          239    173    148    132    108
    11          233    173    148    132    108
    12          235    173    148    132    108
    13          233    173    148    132    108
    14          229    173    148    132    108
    15          237    173    148    132    108
    16          237    172    148    132    108
    17          232    172    147    131    108
    18          232    171    147    130    107
    19          233    171    147    130    107
    20          233    170    147    130    107

Table A.5.39: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is imposed on $A$ and $M_1$.



Example 3

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          303    234    194    159    132
     1          275    210    176    152    118
     2          278    210    175    150    111
     3          276    209    173    149    111
     4          273    208    173    149    110
     5          277    208    173    149    110
     6          273    208    173    148    108
     7          253    191    163    143    106
     8          254    191    163    143    106
     9          253    190    163    141    103
    10          220    175    148    134    100
    11          221    175    148    133     99
    12          221    173    147    133     99
    13          216    172    145    131     99
    14          219    172    145    131     96
    15          219    168    143    128     93
    16          217    168    143    127     93
    17          217    166    142    119     90
    18          213    164    142    119     90
    19          213    164    142    119     90
    20          200    150    133    109     88

Table A.5.40: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is imposed on $A$ and $M_1$.



Example 3

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          152    118    101     94     86
     1          149    114     97     84     82
     2          149    114     96     83     82
     3          147    113     96     81     81
     4          144    112     95     80     80
     5          145    112     95     80     80
     6          144    112     95     80     80
     7          136    105     91     77     77
     8          137    105     91     77     77
     9          136    105     91     77     77
    10          119     97     84     73     73
    11          118     97     84     73     73
    12          118     97     84     72     72
    13          117     96     83     72     72
    14          117     96     83     72     72
    15          117     95     82     71     71
    16          116     93     81     70     70
    17          115     90     80     69     69
    18          114     90     80     68     68
    19          114     90     80     68     68
    20          108     84     76     66     66

Table A.5.41: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is imposed on $A$ and $M_1$.



Example 4

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          256    235    214    194    171
     1          256    235    214    193    170
     2          255    232    211    190    167
     3          255    232    211    190    166
     4          252    229    207    186    159
     5          251    229    207    186    155
     6          250    227    206    185    155
     7          249    223    199    170    151
     8          248    222    199    169    149
     9          248    222    199    169    149
    10          248    222    198    169    148
    11          247    221    197    168    133
    12          248    220    191    159    125
    13          240    199    169    148    119
    14          240    199    169    148    119
    15          236    195    167    146    117
    16          237    194    166    146    117
    17          236    194    166    146    116
    18          236    194    166    146    116
    19          229    191    163    139    114
    20          226    192    163    139    112

Table A.5.42: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is imposed on $A$ and $M_1$.



Example 4

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          123    110    101     91     87
     1          123    110    101     91     87
     2          122    109    100     91     87
     3          122    109    100     91     87
     4          122    108     99     90     87
     5          121    108     99     90     86
     6          120    107     98     89     86
     7          119    103     95     83     83
     8          119    103     95     83     83
     9          119    103     95     83     83
    10          119    102     94     83     83
    11          118    101     93     82     81
    12          118    100     92     81     76
    13          117     96     88     76     76
    14          117     96     88     76     75
    15          116     94     87     75     75
    16          116     92     85     75     75
    17          116     92     85     75     75
    18          116     92     85     75     75
    19          114     91     84     72     72
    20          112     92     84     72     72

Table A.5.43: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is imposed on $A$ and $M_1$.



Example 5

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0        +1500    321    175    156    144
     1        +1500    310    153    155    144
     2        +1500    310    153    155    144
     3         1443    174    149    118    104
     4         1341    175    149    117    104
     5         1292    174    149    116    104
     6         1058    193    141     94     89
     7          132     95     86     74     74
     8          132     95     86     74     74
     9          132     95     86     74     74
    10          132     95     86     74     74
    11          132     94     84     74     74
    12          129     93     84     74     74
    13          128     92     85     74     74
    14          125     90     86     74     74
    15          120     90     85     74     74
    16          120     90     85     74     74
    17          120     90     85     74     74
    18          119     88     84     74     74
    19          119     86     82     70     70
    20          120     86     83     70     70

Table A.5.44: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is imposed on $A$ and $M_1$.



Example 5

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          527     92     80     72     72
     1          523     90     80     72     72
     2          523     90     80     72     72
     3          509     66     60     57     57
     4          462     65     61     57     57
     5          433     65     61     57     57
     6          270     64     76     57     57
     7           62     43     41     41     41
     8           62     43     41     41     41
     9           62     43     41     41     41
    10           62     43     41     41     41
    11           62     43     41     41     41
    12           59     44     41     41     41
    13           58     44     41     41     41
    14           56     43     40     40     40
    15           55     40     37     37     37
    16           55     40     37     37     37
    17           55     40     37     37     37
    18           55     40     37     37     37
    19           55     40     37     37     37
    20           56     39     37     37     37

Table A.5.45: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is imposed on $A$ and $M_1$.



Example 1

Nr. of eigenvalues   M-V products   CPU-time (in sec)   A-V
         1                480               54           120
         2               5198              526           306
         3               1580              174           144
         4               1355              153            91
         5                997               96            31
         6                926              116            19
         7                891               82            23
         8                965               89            17
         9               1367              126            34
        10               1317              139            35
        11               1331              124            26
        12               1829              248            12
        13               1738              316            24
        14               2872              302            40
        15               3084              355            42
        16               2574              258            36
        17               2654              436            40
        18               2156              253            30
        19               1689              163            22
        20               3284              336            46

Table A.5.46: Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.



Example 2

Nr. of eigenvalues   M-V products   CPU-time (in sec)   A-V
         1                285               47            36
         2               1714              408            18
         3              10242             1690          1138
         4               2083              280           209
         5               9892             1320          1237
         6               5353              716            54
         7               2599              548           260
         8              21113             2845          2640
         9               2716              387           272
        10               3311              798           414
        11               3197              442           229
        12               2534              357           212
        13               2605              358           187
        14               2515              345           140
        15               6477              943           648
        16               3079              429           308
        17               3054              480           204
        18               4658              653           311
        19               3806              581           272
        20               9304             1319           665

Table A.5.47: Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.



Example 3

Nr. of eigenvalues   M-V products   CPU-time (in sec)   A-V
         1                180               43            60
         2                631              148           211
         3                991              234           199
         4                955              288           120
         5                703              170           101
         6                590              141            74
         7                537              172            34
         8                743              179            50
         9                542              152            16
        10                736              177            23
        11                904              240            27
        12                745              227            22
        13                784              242            23
        14                986              247            29
        15               1443              388            42
        16               1333              335            38
        17               1331              284            36
        18               1224              311            33
        19               1495              487            40
        20               1652              411            38

Table A.5.48: Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.



Example 4

Nr. of eigenvalues   M-V products   CPU-time (in sec)   A-V
         1                555              189             -
         2               4307             1439          4307
         3               1402              461          1402
         4               2154              709          2154
         5               1343              617           672
         6               1097              475            66
         7               1044              457           261
         8               1541              757           386
         9               1413              614           354
        10               1441              496           361
        11               3440             1166           688
        12               3688             1241           738
        13               4473             1548           746
        14               2514              948           419
        15               1695              573           243
        16               5491             1864           785
        17               2787              993           399
        18               3573             1217           511
        19               7160             2462           796
        20               8188             2879           745

Table A.5.49: Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.



Example 5

Nr. of eigenvalues   M-V products   CPU-time (in sec)   A-V
         1                105               51            27
         2                 58               29            15
         3                237              115            14
         4                252              145             4
         5                163               81             2
         6                251              128             1
         7                239              134             1
         8                229              114             1
         9                209              118             1
        10                213              154             1
        11                215              109             1
        12                608              565             2
        13                665              348             2
        14                655              383             2
        15                817              420             2
        16                850              439             2
        17               1060              620             3
        18               1247              622             3
        19                973              842             3
        20               4293             2206            10

Table A.5.50: Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.



A.6 Numerical results for the symmetric formulation

Example 1

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          358    213    144     79     79
     1          316    179    137     76     76
     2          312    172    126     73     73
     3          315    170    116     70     70
     4          308    166    112     68     68
     5          315    170    108     67     67
     6          311    170     96     64     64
     7          290    144     90     62     62
     8          292    138     77     58     58
     9          302    134     72     57     57
    10          244     99     60     53     53
    11          204     96     54     51     51
    12          215     96     54     51     51
    13          208     89     52     51     51
    14          184     88     51     51     51
    15          186     88     51     51     51
    16          189     80     47     47     47
    17          195     80     47     47     47
    18          205     77     47     47     47
    19          182     77     47     47     47
    20          173     63     44     44     44

Table A.6.51: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.
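With the symmetric choice $W = V_\varepsilon$ the corrected preconditioner remains symmetric whenever $A$ and $M_1$ are, which is what makes the SQMR runs at the end of this section possible; for the complex symmetric systems considered here the plain transpose, not the conjugate transpose, is the natural pairing. A minimal sketch of this variant, with the same caveat as before that the authoritative formulation is the one in Theorem 2:

    import numpy as np

    def symmetric_spectral_update(A, M1, V):
        """Symmetric variant W = V: M2 = M1 + V (V^T A V)^{-1} V^T stays
        symmetric whenever A and M1 are (plain transposes on purpose)."""
        Ac = V.T @ A @ V
        return M1 + V @ np.linalg.solve(Ac, V.T)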



Example 1

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          165    103     75     60     60
     1          153     87     64     56     56
     2          154     87     62     53     53
     3          154     87     61     53     53
     4          150     87     61     53     53
     5          152     87     61     53     53
     6          151     81     50     50     50
     7          148     72     48     48     48
     8          147     70     44     44     44
     9          146     66     43     43     43
    10          122     51     40     40     40
    11          110     50     39     39     39
    12          108     49     38     38     38
    13          106     47     38     38     38
    14          106     46     38     38     38
    15           95     47     38     38     38
    16          105     44     35     35     35
    17          109     45     35     35     35
    18          106     45     35     35     35
    19          105     44     35     35     35
    20          105     34     32     32     32

Table A.6.52: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



Example 2

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0        +1500  +1500    496    311    198
     1          304    235    192    151    107
     2          305    222    184    143    104
     3          310    209    177    138    101
     4          303    208    170    136     97
     5          310    206    164    133     95
     6          307    205    160    123     92
     7          239    174    146    107     89
     8          201    159    138    100     87
     9          202    159    136     99     86
    10          194    155    135     97     86
    11          194    155    135     97     86
    12          193    155    135     95     84
    13          193    154    134     88     83
    14          194    150    133     82     81
    15          193    147    130     78     78
    16          185    143    119     75     75
    17          183    141    115     74     74
    18          187    141    113     74     74
    19          185    140    107     71     71
    20          169    135    103     70     70

Table A.6.53: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



Example 2

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          145    110     95     76     76
     1          143    110     95     76     76
     2          140    104     91     73     73
     3          146    103     90     73     73
     4          142    103     90     73     73
     5          147    103     90     73     73
     6          148    101     89     72     72
     7          118     89     82     68     68
     8          118     81     75     64     64
     9           99     80     75     64     64
    10           97     80     75     64     64
    11           97     80     74     63     63
    12           96     80     75     63     63
    13           97     80     74     63     63
    14           96     79     74     63     63
    15           99     77     70     60     60
    16           96     74     65     57     57
    17           92     74     65     57     57
    18           93     74     65     57     57
    19           93     73     63     56     56
    20           86     71     60     54     54

Table A.6.54: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



Example 3

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          268    174    130     79     79
     1          260    171    121     76     76
     2          267    169    120     72     72
     3          272    169    114     70     70
     4          256    155    100     67     67
     5          262    142     93     64     64
     6          199    112     79     60     60
     7          202    112     79     60     60
     8          208    112     79     58     58
     9          135     87     66     55     55
    10          126     82     62     54     54
    11          125     82     61     54     54
    12          115     81     57     53     53
    13          118     81     57     53     53
    14          120     81     58     53     53
    15          110     76     50     50     50
    16          103     69     47     47     47
    17          105     65     46     46     46
    18          102     66     47     47     47
    19           94     59     45     45     45
    20           90     58     44     44     44

Table A.6.55: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



Example 3

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          129     89     70     57     57
     1          129     88     70     56     56
     2          133     88     69     56     56
     3          137     88     69     56     56
     4          127     86     57     53     53
     5          129     76     49     49     49
     6          108     60     45     45     45
     7          108     60     45     45     45
     8          109     60     45     45     45
     9           73     51     42     42     42
    10           66     49     40     40     40
    11           65     49     40     40     40
    12           62     48     40     40     40
    13           61     48     40     40     40
    14           66     48     40     40     40
    15           56     42     36     36     36
    16           50     40     34     34     34
    17           52     38     34     34     34
    18           56     42     35     35     35
    19           54     36     34     34     34
    20           53     35     33     33     33

Table A.6.56: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



Example 4

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          145    113     90     71     71
     1          145    113     90     68     68
     2          135    106     83     65     65
     3          131    103     85     65     65
     4          126     97     74     61     61
     5          124     94     72     61     61
     6          122     91     64     58     58
     7          101     75     58     56     56
     8           99     75     58     56     56
     9           94     74     58     56     56
    10           93     74     55     55     55
    11           86     74     55     53     53
    12           86     70     51     51     51
    13           85     68     50     50     50
    14           83     67     49     49     49
    15           82     65     49     49     49
    16           82     65     49     49     49
    17           82     65     49     49     49
    18           82     66     49     49     49
    19           81     66     50     50     50
    20           77     61     47     47     47

Table A.6.57: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



Example 4

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0           71     57     48     48     48
     1           71     57     48     48     48
     2           68     54     45     45     45
     3           66     50     45     45     45
     4           65     46     43     43     43
     5           64     45     41     41     41
     6           64     45     41     41     41
     7           50     41     38     38     38
     8           50     41     38     38     38
     9           48     41     38     38     38
    10           46     38     37     37     37
    11           46     38     37     37     37
    12           45     36     35     35     35
    13           45     36     35     35     35
    14           44     35     34     34     34
    15           43     34     34     34     34
    16           43     34     34     34     34
    17           43     35     34     34     34
    18           43     35     34     34     34
    19           43     35     34     34     34
    20           40     33     32     32     32

Table A.6.58: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



Example 5

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          297     87     75     66     66
     1          290     78     75     66     66
     2          290     78     75     66     66
     3          287     66     68     58     58
     4          252     66     64     58     58
     5          214     66     62     58     58
     6          430     66     50     50     50
     7           51     43     39     39     39
     8           52     43     39     39     39
     9           53     43     39     39     39
    10           52     44     40     40     40
    11           52     44     40     40     40
    12           49     44     38     38     38
    13           49     44     38     38     38
    14           50     44     38     38     38
    15           50     44     38     38     38
    16           50     44     38     38     38
    17           50     44     38     38     38
    18           50     44     38     38     38
    19           52     44     39     39     39
    20           52     44     39     39     39

Table A.6.59: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



Example 5

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          110     46     42     42     42
     1          109     45     41     41     41
     2          109     45     41     41     41
     3          104     34     33     33     33
     4           88     34     33     33     33
     5           72     34     33     33     33
     6          102     36     33     33     33
     7           23     21     21     21     21
     8           23     21     21     21     21
     9           23     21     21     21     21
    10           23     21     21     21     21
    11           23     21     21     21     21
    12           23     20     20     20     20
    13           23     21     21     21     21
    14           23     21     21     21     21
    15           23     21     21     21     21
    16           23     21     21     21     21
    17           23     21     21     21     21
    18           23     21     21     21     21
    19           23     21     21     21     21
    20           24     22     22     22     22

Table A.6.60: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



Size of the                  Example
coarse space   Nr. 1   Nr. 2   Nr. 3   Nr. 4   Nr. 5
     0           103     161      92      61      51
     1            97     129      90      61      51
     2            90     119      85      60      51
     3            84     120      78      56      40
     4            78     117      77      54      40
     5            71     108      71      53      40
     6            65     104      67      50      40
     7            60      99      67      49      33
     8            60      97      66      49      33
     9            58      94      58      49      34
    10            58      91      56      46      33
    11            55      82      58      46      33
    12            52      82      56      42      34
    13            47      86      52      40      33
    14            44      77      51      41      33
    15            44      76      51      41      26
    16            40      73      48      41      34
    17            42      70      49      41      34
    18            41      69      48      41      34
    19            37      66      47      39      34
    20            37      68      46      39      34

Table A.6.61: Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



Size of the                  Example
coarse space   Nr. 1   Nr. 2   Nr. 3   Nr. 4   Nr. 5
     0            74      70      58      30      24
     1            74      70      58      30      23
     2            65      70      58      30      23
     3            60      70      58      31      23
     4            57      70      55      27      23
     5            49      70      50      27      23
     6            47      70      45      27      15
     7            37      69      45      23      15
     8            40      69      45      23      15
     9            40      56      42      23      14
    10            40      58      42      23      14
    11            34      59      42      21      14
    12            30      56      40      21      14
    13            28      59      37      20      14
    14            25      59      36      20      14
    15            23      51      36      20      14
    16            23      47      33      20      14
    17            23      47      33      19      14
    18            23      47      33      20      14
    19            22      42      33      19      14
    20            22      43      33      19      14

Table A.6.62: Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



A.7 Numerical results for the multiplicative formulation

Example 1

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          358    213    144     79     79
     1          189     89     49     49     49
     2          188     88     47     47     47
     3          188     85     45     45     45
     4          186     83     44     44     44
     5          186     82     43     43     43
     6          183     70     41     41     41
     7          178     60     39     39     39
     8          178     56     37     37     37
     9          170     53     36     36     36
    10          116     44     34     34     34
    11          105     40     33     33     33
    12          103     40     33     33     33
    13           97     40     33     33     33
    14           96     38     32     32     32
    15           96     38     32     32     32
    16           88     30     30     30     30
    17           88     30     30     30     30
    18           88     30     30     30     30
    19           88     29     29     29     29
    20           79     25     25     25     25

Table A.7.63: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
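In the multiplicative variant the coarse correction is applied to the residual left over by the first-level preconditioner rather than added independently. The sketch below shows one standard way to realise a two-level multiplicative application; it is a generic scheme under the same illustrative assumptions as the earlier sketches, not necessarily the exact update used in these runs.

    import numpy as np

    def apply_multiplicative(A, M1, V, W, r):
        """Two-level multiplicative preconditioning of a residual r:
        z1 = M1 r, then an exact coarse solve on what is left, i.e.
        z = z1 + V Ac^{-1} W^H (r - A z1) with Ac = W^H A V."""
        z1 = M1 @ r
        Ac = W.conj().T @ A @ V
        coarse = np.linalg.solve(Ac, W.conj().T @ (r - A @ z1))
        return z1 + V @ coarse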



Example 1

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          165    103     75     60     60
     1           90     44     37     37     37
     2           90     42     35     35     35
     3           90     42     35     35     35
     4           90     42     35     35     35
     5           90     42     34     34     34
     6           89     39     32     32     32
     7           85     31     31     31     31
     8           85     29     29     29     29
     9           83     28     28     28     28
    10           58     26     26     26     26
    11           50     25     25     25     25
    12           49     25     25     25     25
    13           47     24     24     24     24
    14           47     24     24     24     24
    15           47     24     24     24     24
    16           45     22     22     22     22
    17           45     22     22     22     22
    18           45     22     22     22     22
    19           43     21     21     21     21
    20           37     18     18     18     18

Table A.7.64: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The preconditioner is updated in multiplicative form.



Example 2

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0        +1500  +1500    496    311    198
     1          190    132     98     68     68
     2          187    123     94     66     66
     3          176    117     92     64     64
     4          190    116     89     61     61
     5          189    113     87     60     60
     6          188    110     81     59     59
     7          150     99     73     56     56
     8          130     94     69     55     55
     9          129     93     68     55     55
    10          127     90     67     55     55
    11          127     90     67     55     55
    12          130     94     67     54     54
    13          125     90     62     53     53
    14          113     81     49     49     49
    15          113     77     47     47     47
    16          127     85     49     49     49
    17          279     85     47     47     47
    18          130     83     48     48     48
    19          109     69     44     44     44
    20          128     78     46     46     46

Table A.7.65: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The preconditioner is updated in multiplicative form.



Example 2

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          145    110     95     76     76
     1           82     59     47     47     47
     2           82     56     45     45     45
     3           80     56     45     45     45
     4           82     56     45     45     45
     5           80     56     45     45     45
     6           83     55     44     44     44
     7           68     51     42     42     42
     8           63     48     40     40     40
     9           60     48     40     40     40
    10           60     48     40     40     40
    11           60     48     40     40     40
    12           64     49     40     40     40
    13           60     48     40     40     40
    14           47     36     32     32     32
    15           47     35     31     31     31
    16           64     47     38     38     38
    17          108     42     33     33     33
    18           64     46     37     37     37
    19           47     34     32     32     32
    20           60     42     35     35     35

Table A.7.66: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The preconditioner is updated in multiplicative form.



Example 3

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          268    174    130     79     79
     1          163     97     52     51     51
     2          166     96     49     49     49
     3          170     96     47     47     47
     4          163     88     45     45     45
     5          156     71     43     43     43
     6          119     58     41     41     41
     7          119     58     41     41     41
     8          119     57     40     40     40
     9           89     51     37     37     37
    10           80     48     36     36     36
    11           80     47     36     36     36
    12           77     46     36     36     36
    13           75     45     35     35     35
    14           75     45     35     35     35
    15           70     44     34     34     34
    16           67     38     33     33     33
    17           66     35     32     32     32
    18           63     32     31     31     31
    19           60     30     30     30     30
    20           56     29     29     29     29

Table A.7.67: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The preconditioner is updated in multiplicative form.



Example 3

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          129     89     70     57     57
     1           87     56     39     39     39
     2           88     56     38     38     38
     3           89     57     38     38     38
     4           89     55     36     36     36
     5           82     44     34     34     34
     6           63     33     31     31     31
     7           64     33     31     31     31
     8           64     33     31     31     31
     9           49     29     29     29     29
    10           45     28     28     28     28
    11           45     28     28     28     28
    12           43     28     28     28     28
    13           43     28     28     28     28
    14           43     27     27     27     27
    15           42     27     27     27     27
    16           39     26     26     26     26
    17           39     25     25     25     25
    18           35     24     24     24     24
    19           37     23     23     23     23
    20           34     22     22     22     22

Table A.7.68: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The preconditioner is updated in multiplicative form.



Example 4

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          145    113     90     71     71
     1          103     66     48     48     48
     2           98     64     45     45     45
     3           98     64     45     45     45
     4           90     59     43     43     43
     5           93     59     43     43     43
     6           87     57     41     41     41
     7           75     51     40     40     40
     8           75     51     40     40     40
     9           76     51     40     40     40
    10           64     43     35     35     35
    11           70     49     37     37     37
    12           64     40     36     36     36
    13           62     37     35     35     35
    14           61     37     35     35     35
    15           59     36     34     34     34
    16           59     36     34     34     34
    17           59     36     34     34     34
    18           59     36     34     34     34
    19           59     35     33     33     33
    20           55     34     33     33     33

Table A.7.69: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The preconditioner is updated in multiplicative form.



Example 4

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0           71     57     48     48     48
     1           52     39     34     34     34
     2           51     35     33     33     33
     3           50     35     33     33     33
     4           45     32     32     32     32
     5           45     33     32     32     32
     6           44     31     31     31     31
     7           36     28     28     28     28
     8           36     28     28     28     28
     9           37     28     28     28     28
    10           27     22     22     22     22
    11           34     27     27     27     27
    12           31     26     26     26     26
    13           31     25     25     25     25
    14           30     25     25     25     25
    15           30     25     25     25     25
    16           30     25     25     25     25
    17           29     25     25     25     25
    18           29     24     24     24     24
    19           30     24     24     24     24
    20           27     23     23     23     23

Table A.7.70: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The preconditioner is updated in multiplicative form.



Example 5

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          297     87     75     66     66
     1          137     65     48     48     48
     2          136     65     48     48     48
     3          119     47     43     43     43
     4          121     47     43     43     43
     5          123     48     43     43     43
     6          270     47     36     36     36
     7           38     29     29     29     29
     8           38     30     30     30     30
     9           38     30     30     30     30
    10           38     30     30     30     30
    11           39     30     30     30     30
    12           38     29     29     29     29
    13           44     30     30     30     30
    14           37     28     28     28     28
    15           37     28     28     28     28
    16           36     28     28     28     28
    17           37     28     28     28     28
    18           41     30     30     30     30
    19           40     30     30     30     30
    20           43     30     30     30     30

Table A.7.71: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The preconditioner is updated in multiplicative form.



Example 5

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          110     46     42     42     42
     1           46     40     32     32     32
     2           46     40     32     32     32
     3           46     24     24     24     24
     4           58     24     24     24     24
     5           59     24     24     24     24
     6           85     24     24     24     24
     7           20     16     16     16     16
     8           20     16     16     16     16
     9           20     16     16     16     16
    10           20     16     16     16     16
    11           20     16     16     16     16
    12           19     16     16     16     16
    13           22     17     17     17     17
    14           19     15     15     15     15
    15           19     15     15     15     15
    16           19     15     15     15     15
    17           19     15     15     15     15
    18           23     18     18     18     18
    19           24     19     19     19     19
    20           25     19     19     19     19

Table A.7.72: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The preconditioner is updated in multiplicative form.



Size of the                        Example
coarse space    Nr. 1   Nr. 2   Nr. 3   Nr. 4   Nr. 5
      0           103     161      92      61      51
      1            97     108      75      46      37
      2           107    +500      77      45      40
      3            96    +500      69      49      39
      4            93    +500      77      46      32
      5            91    +500      68      51      64
      6            78    +500      66      42     186
      7            73    +500      70      41      32
      8            73    +500      65      42      42
      9            68    +500      56      42    +500
     10            68    +500      56      38      66
     11            72    +500      59      47     183
     12            53    +500      58      40    +500
     13            56    +500      49      36    +500
     14            40    +500      48      36    +500
     15            40    +500      46      36    +500
     16            42    +500      43      37     116
     17            35    +500      45      38    +500
     18            37    +500      45      37    +500
     19            41    +500      44      35    +500
     20            39    +500      43      35    +500

Table A.7.73: Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^-8 for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_eps is used for the low-rank updates. The preconditioner is updated in multiplicative form.



Size of the                        Example
coarse space    Nr. 1   Nr. 2   Nr. 3   Nr. 4   Nr. 5
      0            74      70      58      30      24
      1            61      46      42      21      10
      2            59      57      45      21      10
      3            50      59      46      21      10
      4            46      63      42      17      10
      5            43    +500      38      17      10
      6            36    +500      34      17      10
      7            37      68      36      13      10
      8            34     154      37      13      10
      9            33    +500      34      13      10
     10            32    +500      31      13      16
     11            30    +500      32      13      10
     12            27      51      32      13      10
     13            27    +500      29      13      16
     14            22    +500      25      13      14
     15            20    +500      27      13      13
     16            18    +500      24      13      14
     17            16    +500      25      13      13
     18            20    +500      25      13      12
     19            18    +500      24      13      18
     20            18    +500      23      13      32

Table A.7.74: Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^-5 for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_eps is used for the low-rank updates. The preconditioner is updated in multiplicative form.
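The restriction W = V_eps matters because SQMR requires a symmetric preconditioner when A is complex symmetric, as the matrices arising here are. The toy fragment below illustrates the point on a small dense example: with W = V the coarse correction V Ac^{-1} V^T is symmetric under the plain transpose (no conjugation), and a symmetrized multiplicative combination then keeps M = M^T. The combination used, M = Mc + (I - Mc A) M1 (I - A Mc), is an assumption chosen for illustration and is not claimed to be the exact formula of Theorem 2; M1 and V are likewise assumed inputs.

    import numpy as np

    def symmetric_spectral_update(A, M1, V):
        # Dense toy sketch: coarse correction Mc = V Ac^{-1} V^T with W = V,
        # combined multiplicatively in a symmetry-preserving way.
        Ac = V.T @ (A @ V)                  # complex symmetric when A is
        Mc = V @ np.linalg.solve(Ac, V.T)   # Mc^T = Mc
        I = np.eye(A.shape[0], dtype=A.dtype)
        return Mc + (I - Mc @ A) @ M1 @ (I - A @ Mc)

    # Sanity check on a random complex-symmetric problem:
    rng = np.random.default_rng(0)
    n, k = 12, 3
    B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    A = B + B.T                             # A^T = A, but A is not Hermitian
    M1 = np.diag(1.0 / np.diag(A))          # crude symmetric first level (Jacobi)
    V = rng.standard_normal((n, k))
    M = symmetric_spectral_update(A, M1, V)
    assert np.allclose(M, M.T)              # the update preserves symmetry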




