
Order No.: 1879

PhD Thesis

Speciality: Computer Science (Informatique)

Sparse preconditioners for dense linear systems from electromagnetic applications

presented on 23 April 2002 at the Institut National Polytechnique de Toulouse

by

Bruno CARPENTIERI
CERFACS

before the jury composed of:

G. Alléon     EADS
M. Daydé      Professor at ENSEEIHT
I. S. Duff    Project Leader at CERFACS and Group Leader at Rutherford Appleton Laboratory   (President)
L. Giraud     CERFACS
G. Meurant    CEA                                                                            (Referee)
Y. Saad       Professor at the University of Minnesota                                       (Referee)
S. Piperno    INRIA-CERMICS

CERFACS report: TH/PA/02/48



Acknowledgments

I wish to express my sincere gratitude to Iain S. Duff and Luc Giraud, who introduced me to the subject of this thesis and guided my research with vivid interest. They taught me to enjoy both rigour and simplicity, and let me experience the freedom and the excitement of personal discovery. Without their professional advice and their trust in me, this thesis would not have been possible.

My sincere thanks go to Michel Daydé for his continued support in the development of my research at CERFACS.

I am grateful to Gerard Meurant and Yousef Saad, who agreed to act as referees for my thesis. It was an honour for me to benefit from their feedback on my research work.

I wish to thank Guillaume Alléon and Serge Piperno, who opened the door to enriching collaborations with EADS and INRIA-CERMICS, respectively, and agreed to take part in my jury. Guillaume Sylvand at INRIA-CERMICS deserves thanks for providing me with codes and valuable support.

Grateful acknowledgments are made to the EMC Team at CERFACS for their interest in my work, in particular to Mbarek Fares, who provided me with the CESC code, and to Francis Collino and Florence Millot for many fertile discussions.

I would like to sincerely thank all the members of the Parallel Algorithms Team and of CSG at CERFACS for their professional and friendly support, and Brigitte Yzel for her kind help on many occasions. The Parallel Algorithms Team provided a stimulating environment in which to develop my thesis. I am grateful to the many visitors and colleagues who, at different stages, shared my enjoyment of this research.

Above all, I wish to express my deep gratitude to my family and friends for their presence and continued support.

This work was supported by INDAM under the grant "Borsa di Studio per l'Estero A.A. 1998-'99" (Provvedimento del Presidente del 30 Aprile 1998), and by CERFACS.

- B. C.



To my family



Don't just say "it is impossible" without putting in a sincere effort.
Observe the word "impossible" carefully... You can see "I'm possible".
What really matters is your attitude and your perception.

Anonymous



Abstract

In this work, we investigate the use of sparse approximate inverse preconditioners for the solution of large dense complex linear systems arising from integral equations in electromagnetism applications.

The goal of this study is the development of robust and parallelizable preconditioners that can easily be integrated in simulation codes able to treat large configurations. We first adapt to the dense situation the preconditioners initially developed for sparse linear systems. We compare their respective numerical behaviours and propose a robust pattern selection strategy for Frobenius-norm minimization preconditioners.
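As a minimal illustration of the underlying mechanism (a sketch only, not the thesis code: the function name, the toy pattern, and the toy matrix below are all hypothetical), the Frobenius norm ||AM - I||_F^2 decouples into n independent least-squares problems, one per column of M over a prescribed sparsity pattern, which is what makes this family of preconditioners naturally parallel:

```python
import numpy as np

def frobenius_min_preconditioner(A, pattern):
    """Sparse approximate inverse M minimizing ||A M - I||_F over a fixed
    sparsity pattern (sketch). pattern[j] lists the rows allowed to be
    nonzero in column j of M; each column is an independent small
    least-squares problem, so all columns can be computed in parallel."""
    n = A.shape[0]
    M = np.zeros((n, n), dtype=A.dtype)
    for j in range(n):
        J = pattern[j]                      # allowed nonzero rows of column j
        e_j = np.zeros(n, dtype=A.dtype)
        e_j[j] = 1.0
        # minimize || A[:, J] m - e_j ||_2 over the free entries m
        m, *_ = np.linalg.lstsq(A[:, J], e_j, rcond=None)
        M[J, j] = m
    return M

# toy usage: a tridiagonal pattern on a small dense test matrix
n = 8
A = np.eye(n) + 0.1 * np.random.default_rng(0).standard_normal((n, n))
pattern = [[i for i in (j - 1, j, j + 1) if 0 <= i < n] for j in range(n)]
M = frobenius_min_preconditioner(A, pattern)
print(np.linalg.norm(A @ M - np.eye(n), "fro"))  # Frobenius residual of the computed M
```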

Our approach has been implemented by another PhD student in a large parallel code that exploits a fast multipole calculation for the matrix-vector product in the Krylov iterations. This enables us to study the numerical scalability of our preconditioner on large academic and industrial test problems in order to identify its limitations. To remove these limitations we propose an embedded scheme. This inner-outer technique enables us to significantly reduce the computational cost of the simulation and to improve the robustness of the preconditioner. In particular, we were able to solve a linear system with more than a million unknowns arising from a simulation on a real aircraft. That solution was out of reach with our initial technique.
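The inner-outer scheme can be sketched as follows (an illustrative sketch only, assuming a right-preconditioned flexible GMRES, FGMRES, as the outer solver; fgmres, inner_prec and the toy problem are hypothetical names, and the real implementation applies the fast multipole matrix-vector product in place of the dense products used here): the preconditioning step of each outer iteration is itself a few inner GMRES iterations.

```python
import numpy as np

def fgmres(A, b, inner_prec, m=10, tol=1e-8, maxiter=200):
    """Right-preconditioned flexible GMRES(m) (sketch). inner_prec may be
    a non-constant operator, e.g. a few inner GMRES iterations."""
    n = b.size
    x = np.zeros_like(b)
    for _ in range(max(1, maxiter // m)):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta <= tol * np.linalg.norm(b):
            break
        V = np.zeros((n, m + 1), dtype=b.dtype)
        Z = np.zeros((n, m), dtype=b.dtype)
        H = np.zeros((m + 1, m), dtype=b.dtype)
        V[:, 0] = r / beta
        j_used = m
        for j in range(m):
            Z[:, j] = inner_prec(V[:, j])      # flexible preconditioning step
            w = A @ Z[:, j]
            for i in range(j + 1):             # Arnoldi, modified Gram-Schmidt
                H[i, j] = np.vdot(V[:, i], w)
                w = w - H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] == 0.0:             # happy breakdown
                j_used = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        e1 = np.zeros(j_used + 1, dtype=b.dtype)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:j_used + 1, :j_used], e1, rcond=None)
        x = x + Z[:, :j_used] @ y
    return x

# toy usage: the "preconditioner" is one sweep of inner GMRES(5)
rng = np.random.default_rng(0)
A = np.eye(200) + 0.05 * rng.standard_normal((200, 200))
b = rng.standard_normal(200)
inner = lambda v: fgmres(A, v, lambda u: u, m=5, tol=1e-1, maxiter=5)
x = fgmres(A, b, inner, m=10, tol=1e-8, maxiter=200)
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```

Because the inner solve changes from one outer iteration to the next, a flexible method such as FGMRES is needed in the outer loop; plain GMRES assumes a fixed preconditioner.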

Finally, we perform a preliminary study on a spectral two-level preconditioner to enhance the robustness of our preconditioner. This numerical technique exploits spectral information of the preconditioned systems to build a low-rank update of the preconditioner.
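The flavour of such an update can be conveyed by a small sketch (assuming the additive formulation with the choice W = V_ε, and using exact eigenvectors where the thesis uses approximate Ritz pairs; spectral_update and the toy check are illustrative, not the thesis code): the k eigenvalues of the preconditioned matrix M_1 A nearest zero are shifted by +1 through a rank-k correction.

```python
import numpy as np

def spectral_update(A, M1, k):
    """Additive low-rank spectral update (sketch, not the thesis code).
    Builds M2 = M1 + V Ac^{-1} W^H with W = V, where the columns of V are
    right eigenvectors of M1 A for the k eigenvalues nearest zero and
    Ac = W^H A V. Then M2 A has those eigenvalues shifted to lambda + 1,
    while the rest of the spectrum is unchanged."""
    eigval, eigvec = np.linalg.eig(M1 @ A)
    idx = np.argsort(np.abs(eigval))[:k]     # eigenvalues nearest zero
    V = eigvec[:, idx]
    W_H = V.conj().T
    Ac = W_H @ A @ V                         # small k-by-k coarse problem
    return M1 + V @ np.linalg.solve(Ac, W_H)

# toy check: the smallest eigenvalue magnitudes of M1 A move away from zero
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 30))
M1 = 0.1 * np.eye(30)                        # a deliberately poor preconditioner
M2 = spectral_update(A, M1, k=5)
print(np.sort(np.abs(np.linalg.eigvals(M1 @ A)))[:5])  # before: near zero
print(np.sort(np.abs(np.linalg.eigvals(M2 @ A)))[:5])  # after: shifted away
```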

Keywords: Krylov subspace methods, preconditioning techniques, sparse approximate inverse, Frobenius-norm minimization method, nonzero pattern selection strategies, electromagnetic scattering applications, boundary element method, fast multipole method.




Contents

1 Introduction
  1.1 The physical problem and applications
  1.2 The mathematical problem
  1.3 Numerical solution of Maxwell's equations
    1.3.1 Differential equation methods
    1.3.2 Integral equation methods
  1.4 Direct versus iterative solution methods
    1.4.1 A sparse approach for solving scattering problems

2 Iterative solution via preconditioned Krylov solvers of dense systems in electromagnetism
  2.1 Introduction and motivation
  2.2 Preconditioning based on sparsification strategies
    2.2.1 SSOR
    2.2.2 Incomplete Cholesky factorization
    2.2.3 AINV
    2.2.4 SPAI
    2.2.5 SLU
    2.2.6 Other preconditioners
  2.3 Concluding remarks

3 Sparse pattern selection strategies for robust Frobenius-norm minimization preconditioners
  3.1 Introduction and motivation
  3.2 Pattern selection strategies for Frobenius-norm minimization methods in electromagnetism
    3.2.1 Algebraic strategy
    3.2.2 Topological strategy
    3.2.3 Geometric strategy
    3.2.4 Numerical experiments
  3.3 Strategies for the coefficient matrix
  3.4 Numerical results
  3.5 Concluding remarks

4 Symmetric Frobenius-norm minimization preconditioners in electromagnetism
  4.1 Comparison with standard preconditioners
  4.2 Symmetrization strategies for the Frobenius-norm minimization method
  4.3 Concluding remarks

5 Combining fast multipole techniques and approximate inverse preconditioners for large parallel electromagnetics calculations
  5.1 The fast multipole method
  5.2 Implementation of the Frobenius-norm minimization preconditioner in the fast multipole framework
  5.3 Numerical scalability of the preconditioner
  5.4 Improving the preconditioner robustness using embedded iterations
  5.5 Concluding remarks

6 Spectral two-level preconditioner
  6.1 Introduction and motivation
  6.2 Two-level preconditioner via low-rank spectral updates
    6.2.1 Additive formulation
    6.2.2 Numerical experiments
    6.2.3 Symmetric formulation
  6.3 Multiplicative formulation of low-rank spectral updates
    6.3.1 Numerical experiments
  6.4 Concluding remarks

7 Conclusions and perspectives

A Numerical results with the two-level spectral preconditioner
  A.1 Effect of the low-rank updates on the GMRES convergence
  A.2 Experiments with the operator W^H = V_ε^H M_1
  A.3 Cost of the eigencomputation
  A.4 Sensitivity of the preconditioner to the accuracy of the eigencomputation
  A.5 Experiments with a poor preconditioner M_1
  A.6 Numerical results for the symmetric formulation
  A.7 Numerical results for the multiplicative formulation


List of Tables

2.1.1 Number of matrix-vector products needed by some unpreconditioned Krylov solvers to reduce the residual by a factor of 10^{-5}.
2.2.2 Number of iterations using both symmetric and unsymmetric preconditioned Krylov methods to reduce the normwise backward error by 10^{-5} on Example 1. The symbol '-' means that convergence was not obtained after 500 iterations. The symbol '*' means that the method is not applicable.
2.2.3 Number of iterations required by different Krylov solvers preconditioned by SSOR to reduce the residual by 10^{-5}. The symbol '-' means that convergence was not obtained after 500 iterations.
2.2.4 Number of iterations, varying the sparsity level of Ã and the level of fill-in on Example 1.
2.2.5 Number of iterations, varying the sparsity level of Ã and the level of fill-in on Example 2.
2.2.6 Number of iterations, varying the sparsity level of Ã and the level of fill-in on Example 3.
2.2.7 Number of iterations, varying the sparsity level of Ã and the level of fill-in on Example 4.
2.2.8 Number of iterations, varying the sparsity level of Ã and the level of fill-in on Example 5.
2.2.9 Number of SQMR iterations, varying the shift parameter for various levels of fill-in in IC.
2.2.10 Number of iterations required by different Krylov solvers preconditioned by AINV to reduce the residual by 10^{-5}. The symbol '-' means that convergence was not obtained after 500 iterations.
2.2.11 Number of iterations required by different Krylov solvers preconditioned by AINV to reduce the residual by 10^{-5}. The preconditioner is computed using the dense coefficient matrix. The symbol '-' means that convergence was not obtained after 500 iterations.
2.2.12 Number of iterations required by different Krylov solvers preconditioned by SPAI to reduce the residual by 10^{-5}. The symbol '-' means that convergence was not obtained after 500 iterations.
2.2.13 Number of iterations required by different Krylov solvers preconditioned by SLU to reduce the residual by 10^{-5}. The symbol '-' means that convergence was not obtained after 500 iterations.

3.2.1 Number of iterations using the preconditioners based on dense A.
3.3.2 Number of iterations for GMRES(50) preconditioned with different values for the density of M using the same pattern for A and larger patterns. A geometric approach is adopted to construct the patterns. The test problem is Example 1. This is representative of the general behaviour observed.
3.4.3 Number of iterations to solve the set of test problems.
3.4.4 CPU time to compute the preconditioners.
3.5.5 Number of iterations to solve the set of test models by using a multiple density geometric strategy to construct the preconditioner. The pattern imposed on M is twice as dense as that imposed on A.
3.5.6 Number of iterations to solve the set of test models by using a topological strategy to sparsify A and a geometric strategy for the preconditioner. The pattern imposed on M is twice as dense as that imposed on A.

4.1.1 Number of iterations with some standard preconditioners computed using sparse A (algebraic).
4.2.2 Number of iterations on the test examples using the same pattern for the preconditioners.
4.2.3 Number of iterations for M_{Sym-Frob} combined with SQMR using three times more non-zeros in Ã than in the preconditioner.
4.2.4 Number of iterations of SQMR with M_{Sym-Frob} with different values for the density of M, using the same pattern for A and larger patterns. The test problem is Example 1.
4.2.5 Number of iterations of SQMR with M_{Aver-Frob} with different values for the density of M, using the same pattern for A and larger patterns. The test problem is Example 1.
4.2.6 Number of iterations of SQMR with M_{Sym-Frob} with different orderings.
4.2.7 Number of iterations on the test examples using the same pattern for the preconditioners. An algebraic pattern is used to sparsify A.
4.2.8 Number of iterations for M_{Sym-Frob} combined with SQMR using three times more non-zeros in Ã than in the preconditioner. An algebraic pattern is used to sparsify A.
4.2.9 Number of iterations of SQMR with M_{Sym-Frob} with different values for the density of M, using the same pattern for A and larger patterns. A geometric approach is adopted to construct the pattern for the preconditioner and an algebraic approach is adopted to construct the pattern for the coefficient matrix. The test problem is Example 1.
4.2.10 Number of iterations of SQMR with M_{Aver-Frob} with different values for the density of M, using the same pattern for A and larger patterns. A geometric approach is adopted to construct the pattern for the preconditioner and an algebraic approach is adopted to construct the pattern for the coefficient matrix. The test problem is Example 1.
4.2.11 Number of iterations of SQMR with M_{Sym-Frob} with different orderings. An algebraic pattern is used to sparsify A.

5.3.1 Total number of matrix-vector products required to converge on a sphere on problems of increasing size - tolerance = 10^{-2}. The size of the leaf-boxes in the oct-tree associated with the preconditioner is 0.125 wavelengths.
5.3.2 Elapsed time required to build the preconditioner and by GMRES(30) to converge on a sphere on problems of increasing size on eight processors of a Compaq Alpha server - tolerance = 10^{-2}.
5.3.3 Total number of matrix-vector products required to converge on an aircraft on problems of increasing size - tolerance = 2 · 10^{-2}.
5.3.4 Elapsed time required to build the preconditioner and by GMRES(30) to converge on an aircraft on problems of increasing size on eight processors of a Compaq Alpha server - tolerance = 2 · 10^{-2}.
5.3.5 Elapsed time to build the preconditioner, elapsed time to solve the problem, and total number of matrix-vector products using GMRES(30) on an aircraft with 213084 unknowns - tolerance = 2 · 10^{-2} - eight Compaq processors, varying the parameters controlling the density of the preconditioner. The symbol '-' means stagnation after 1000 iterations.
5.3.6 Tests on the parallel scalability of the code relative to the construction and application of the preconditioner and to the matrix-vector product operation on problems of increasing size. The test example is the Airbus aircraft.
5.4.7 Global elapsed time and total number of matrix-vector products required to converge on a sphere with 367500 points, varying the size of the restart parameters and the maximum number of inner GMRES iterations per FGMRES preconditioning step - tolerance = 10^{-2} - eight Compaq processors.
5.4.8 Global elapsed time and total number of matrix-vector products required to converge on an aircraft with 213084 unknowns, varying the size of the restart parameters and the maximum number of inner GMRES iterations per FGMRES preconditioning step - tolerance = 2 · 10^{-2} - eight Compaq processors.
5.4.9 Total number of matrix-vector products required to converge on a sphere on problems of increasing size - tolerance = 10^{-2}.
5.4.10 Total number of matrix-vector products required to converge on an aircraft on problems of increasing size - tolerance = 2 · 10^{-2}.

6.2.1 Effect of shifting the eigenvalues nearest zero on the convergence of GMRES(10) for Example 2. We show the magnitude of successively shifted eigenvalues and the number of iterations required when these eigenvalues are shifted. A tolerance of 10^{-8} is required in the iterative solution.
6.2.2 Effect of shifting the eigenvalues nearest zero on the convergence of GMRES(10) for Example 5. We show the magnitude of successively shifted eigenvalues and the number of iterations required when these eigenvalues are shifted. A tolerance of 10^{-8} is required in the iterative solution.
6.2.3 Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space on Example 1. Different choices are considered for the operator W^H.
6.2.4 Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space on Example 2. Different choices are considered for the operator W^H.
6.2.5 Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space on Example 3. Different choices are considered for the operator W^H.
6.2.6 Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space on Example 4. Different choices are considered for the operator W^H.
6.2.7 Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space on Example 5. Different choices are considered for the operator W^H.
6.2.8 Number of matrix-vector products required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding right eigenvectors.
6.2.9 Number of amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding right eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.
6.2.10 Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.

A.1.1 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space.
A.1.2 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space.
A.1.3 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space.
A.1.4 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space.
A.1.5 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space.
A.1.6 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space.
A.1.7 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space.
A.1.8 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space.
A.1.9 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space.
A.1.10 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space.

A.2.11 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.
A.2.12 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.
A.2.13 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.
A.2.14 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.
A.2.15 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.
A.2.16 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.
A.2.17 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.
A.2.18 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.
A.2.19 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.
A.2.20 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.



A.3.21 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.
A.3.22 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.
A.3.23 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.
A.3.24 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.
A.3.25 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.

A.4.26 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the residual by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.
A.4.27 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.
A.4.28 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.
A.4.29 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.
A.4.30 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.
A.4.31 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.
A.4.32 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.
A.4.33 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.
A.4.34 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.
A.4.35 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.

A.5.36 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The same nonzero structure is imposed on A and M_1.
A.5.37 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The same nonzero structure is imposed on A and M_1.
A.5.38 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.
A.5.39 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The same nonzero structure is imposed on A and M_1.
A.5.40 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The same nonzero structure is imposed on A and M_1.
A.5.41 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The same nonzero structure is imposed on A and M_1.
A.5.42 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The same nonzero structure is imposed on A and M_1.
A.5.43 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The same nonzero structure is imposed on A and M_1.
A.5.44 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The same nonzero structure is imposed on A and M_1.
A.5.45 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates. The same nonzero structure is imposed on A and M_1.
A.5.46 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.
A.5.47 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.
A.5.48 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.
A.5.49 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.
A.5.50 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of 10^{-5}.

A.6.51 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.52 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.53 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.54 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.55 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.56 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.57 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.58 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.59 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.60 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.61 Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.
A.6.62 Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.

A.7.63 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
A.7.64 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
A.7.65 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
A.7.66 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
A.7.67 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
A.7.68 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
A.7.69 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
A.7.70 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
A.7.71 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
A.7.72 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
A.7.73 Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates. The preconditioner is updated in multiplicative form.
A.7.74 Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates. The preconditioner is updated in multiplicative form.


xxvi


List of Figures

1.3.1 Example of discretized mesh. . . . 8

2.1.1 Meshes associated with test examples. . . . 30

2.1.2 Eigenvalue distribution in the complex plane of the coefficient matrix of Example 3. . . . 31

2.2.3 Pattern structure of the large entries of A. The test problem is Example 5. . . . 31

2.2.4 Nonzero pattern for A when the smallest entries are discarded. The test problem is Example 5. . . . 32

2.2.5 Sensitivity of SQMR convergence to the SSOR parameter $\omega$ for Example 1. . . . 32

2.2.6 Sensitivity of SQMR convergence to the SSOR parameter $\omega$ for Example 4. . . . 33

2.2.7 Incomplete factorization algorithm - $M = LDL^T$. . . . 33

2.2.8 The spectrum of the matrix preconditioned with IC(1), the condition number of L, and the number of iterations with SQMR for various values of the shift parameter $\tau$. The test problem is Example 1 and the density of $\tilde{A}$ is around 3%. . . . 35

2.2.9 The eigenvalue distribution on the square [-1, 1] of the matrix preconditioned with IC(1), the condition number of L, and the number of iterations with SQMR for various values of the shift parameter $\tau$. The test problem is Example 1 and the density of $\tilde{A}$ is around 3%. . . . 36

2.2.10 The eigenvalue distribution on the square [-0.3, 0.3] of the matrix preconditioned with IC(1), the condition number of L, and the number of iterations with SQMR for various values of the shift parameter $\tau$. The test problem is Example 1 and the density of $\tilde{A}$ is around 3%. . . . 37

2.2.11 The biconjugation algorithm - $M = ZD^{-1}Z^T$. . . . 39

2.2.12 Sparsity patterns of the inverse of A (on the left) and of the inverse of its lower triangular factor (on the right), where all the entries whose relative magnitude is smaller than $5.0 \times 10^{-2}$ are dropped. The test problem, representative of the general trend, is a small sphere. . . . 44

2.2.13 Histograms of the magnitude of the entries of the first column of $A^{-1}$ and its lower triangular factor. A similar behaviour has been observed for all the other columns. The test problem, representative of the general trend, is a small sphere. . . . 44

3.2.1 Pattern structure of $A^{-1}$. The test problem is Example 5. . . . 57

3.2.2 Example of discretized mesh. . . . 59

3.2.3 Topological neighbours of a DOF in the mesh. . . . 59

3.2.4 Topological localization in the mesh for the large entries of A. The test problem is Example 1 and is representative of the general behaviour. . . . 60

3.2.5 Topological localization in the mesh for the large entries of $A^{-1}$. The test problem is Example 1 and is representative of the general behaviour. . . . 61

3.2.6 Evolution of the density of the pattern computed for increasing number of levels. The test problem is Example 1. This is representative of the general behaviour. . . . 62

3.2.7 Geometric localization in the mesh for the large entries of A. The test problem is Example 1. This is representative of the general behaviour. . . . 63

3.2.8 Geometric localization in the mesh for the large entries of $A^{-1}$. The test problem is Example 1. This is representative of the general behaviour. . . . 64

3.2.9 Evolution of the density of the pattern computed for larger geometric neighbourhoods. The test problem is Example 1. This is representative of the general behaviour. . . . 64

3.2.10 Mesh of Example 2. . . . 66

3.3.11 Nonzero pattern for $A^{-1}$ when the smallest entries are discarded. The test problem is Example 5. . . . 67

3.3.12 Sparsity pattern of the inverse of sparse A associated with Example 1. The pattern has been sparsified with the same value of the threshold used for the sparsification displayed in Figure 3.3.11. . . . 68

3.3.13 CPU time for the construction of the preconditioner using a different number of nonzeros in the patterns for A and M. The test problem is Example 1. This is representative of the other examples. . . . 69

3.4.14 Eigenvalue distribution for the coefficient matrix preconditioned by using a single density strategy on Example 2. . . . 73

3.4.15 Eigenvalue distribution for the coefficient matrix preconditioned by using a multiple density strategy on Example 2. . . . 74

5.1.1 Interactions in the one-level FMM. For each leaf-box, the interactions with the gray neighbouring leaf-boxes are computed directly. The contributions of far-away cubes are computed approximately. The multipole expansions of far-away boxes are translated to local expansions for the leaf-box; these contributions are summed together and the total field induced by far-away cubes is evaluated from local expansions. . . . 94

5.1.2 The oct-tree in the FMM algorithm. The maximum number of children is eight. The actual number corresponds to the subset of eight that intersect the object (courtesy of G. Sylvand, INRIA CERMICS). . . . 95

5.1.3 Interactions in the multilevel FMM. The interactions for the gray boxes are computed directly. We denote by dashed lines the interaction list for the observation box, which consists of those cubes that are not neighbours of the cube itself but whose parent is a neighbour of the cube's parent. The interactions of the cubes in the list are computed using the FMM. All the other interactions are computed hierarchically on a coarser level, denoted by solid lines. . . . 96

5.3.4 Mesh associated with the Airbus aircraft (courtesy of EADS). The surface is discretized by 15784 triangles. . . . 97

5.3.5 The RCS curve for an Airbus aircraft discretized with 200000 unknowns. The problem is formulated using the EFIE formulation and a tolerance of $2 \cdot 10^{-2}$ in the iterative solution. The quantity reported on the ordinate axis indicates the value of the energy radiated back at different incidence angles. . . . 101

5.3.6 The RCS curve for an Airbus aircraft discretized with 200000 unknowns. The problem is formulated using the CFIE formulation and a tolerance of $\cdot 10^{-6}$ in the iterative solution. The quantity reported on the ordinate axis indicates the value of the energy radiated back at different incidence angles. . . . 101

5.3.7 Effect of the restart parameter on GMRES stagnation on an aircraft with 94704 unknowns. . . . 102

5.4.8 Inner-outer solution schemes in the FMM context. Sketch of the algorithm. . . . 104

5.4.9 Convergence history of restarted GMRES for different values of restart on an aircraft with 94704 unknowns. . . . 106

5.4.10 Effect of the restart parameter on FGMRES stagnation on an aircraft with 94704 unknowns using GMRES(20) as inner solver. . . . 108

6.2.1 Eigenvalue distribution for the coefficient matrix preconditioned by the Frobenius-norm minimization method on Example 2. . . . 115

6.2.2 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 1. . . . 119

6.2.3 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 2. . . . 120

6.2.4 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 3. . . . 121

6.2.5 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 4. . . . 122

6.2.6 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 5. . . . 122

6.2.7 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for three choices of restart and increasing size of the coarse space on Example 1. . . . 123

6.2.8 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for three choices of restart and increasing size of the coarse space on Example 2. . . . 124

6.2.9 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for three choices of restart and increasing size of the coarse space on Example 3. . . . 124

6.2.10 Eigenvalue distribution for the coefficient matrix preconditioned by a Frobenius-norm minimization method on Example 2. The same sparsity pattern is used for A and for the preconditioner. . . . 133

6.2.11 Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 1. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is used for A and $M_1$. . . . 133

6.2.12 Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 2. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is used for A and $M_1$. . . . 134

6.2.13 Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 3. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is used for A and $M_1$. . . . 134

6.2.14 Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 4. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is used for A and $M_1$. . . . 135

6.2.15 Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space on Example 1. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates. . . . 136

6.2.16 Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space on Example 2. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates. . . . 137

6.2.17 Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space on Example 3. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates. . . . 137

6.2.18 Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space on Example 4. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates. . . . 138

6.2.19 Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space on Example 5. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates. . . . 138

6.3.20 Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing number of corrections on Example 1. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates. The preconditioner is updated in multiplicative form. . . . 141

6.3.21 Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 3. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates. The preconditioner is updated in multiplicative form. . . . 142

6.3.22 Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 4. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates. The preconditioner is updated in multiplicative form. . . . 142


Chapter 1

Introduction

This thesis considers the problem of designing effective preconditioning strategies for the iterative solution of boundary integral equations in electromagnetism. An accurate numerical solution of these problems is required in the simulation of many industrial processes, such as the prediction of the Radar Cross Section (RCS) of arbitrarily shaped 3D objects like aircraft, the analysis of the electromagnetic compatibility of electrical devices with their environment, and many others. In the last 20 years, owing to the impressive development in computer technology and to the introduction of fast methods which require less computational cost and memory resources, a rigorous numerical solution of many of these applications has become possible [29]. Nowadays challenging problems in an industrial setting demand a continuous reduction in the computational complexity of the numerical methods employed; the aim of this research is to investigate the use of sparse linear algebra techniques (with particular emphasis on preconditioning) for the solution of dense linear systems of equations arising from scattering problems expressed in an integral formulation.

In this chapter, we illustrate the motivation of our research and present the major topics discussed in the thesis. In Section 1.1, we describe the physical problem we are interested in and give some examples of applications. In Section 1.2, we formulate the mathematical problem and, in Section 1.3, we overview some of the principal approaches generally used to solve scattering problems. Finally, in Section 1.4, we discuss direct and iterative solution strategies and introduce some issues relevant to the design of the preconditioner.


1.1 The physical problem and applications

Electromagnetic scattering problems address the physical issue of detecting the diffraction pattern of the electromagnetic radiation scattered from a large and complex body when it is illuminated by an incident wave. A good understanding of these phenomena is crucial to the design of many industrial devices like radars, antennae, computer microprocessors, optical fibre systems, cellular telephones, transistors, modems, and so on. Electronic circuits produce and are subject to electromagnetic interference, and reducing radiation and signal distortion have become two major issues in the design of modern electronic devices. The increase of currents and frequencies in industrial simulations makes electromagnetic compatibility requirements more difficult to meet and demands an accurate analysis prior to the design phase.

The study of electromagnetic scattering is required in radar applications, where a target is illuminated by incident radiation and the energy radiated back to the radar is analysed to retrieve information on the target. In fact, the amount of radiated energy depends on the radar cross-section of the target, on its shape, on the material of which it is composed, and on the wavelength of the incident radiation. Radar measurements are vital for estimating surface currents in oceanography, for mapping precipitation areas and detecting wind direction and speed in meteorological and climatic studies, as well as in the production of accurate weather forecasts, geophysical prospecting from remote sensing data, wireless communication and bioelectromagnetics. In particular, the computation of the radar cross-section is used to identify unknown targets as well as to design stealth technology.

Modern targets reduce their observability features by using new materials. Engineers design, develop and test absorbing materials which can control radiation, reduce the signatures of military systems, preserve electromagnetic compatibility with other devices, and isolate recording studios and listening rooms. A good knowledge of the electromagnetic properties of materials can be critical for economic competitiveness and technological advances in many industrial sectors. All these simulations can be very demanding in terms of computer resources; they require innovative algorithms and the use of high performance computers to afford a rigorous numerical solution.

1.2 The mathematical problem

The mathematical formulation of scattering problems relies on Maxwell's equations, originally introduced by James Maxwell in 1864 in the article A Dynamical Theory of the Electromagnetic Field [103] as 20 scalar equations. Maxwell's equations were reformulated in the 1880s as a set of four vector differential equations, describing the time and space evolution of the electric and the magnetic field around the scatterer. They are:

$$
\begin{cases}
\nabla \times H = J + \dfrac{\partial D}{\partial t}, \\
\nabla \times E = -\dfrac{\partial B}{\partial t}, \\
\nabla \cdot D = \rho, \\
\nabla \cdot B = 0.
\end{cases}
\tag{1.2.1}
$$

The vector fields which appear in (1.2.1) are the electric field E(x, t), the magnetic field H(x, t), the magnetic flux density B(x, t) and the electric flux density D(x, t). Equations (1.2.1) also involve the current density J(x, t) and the charge density $\rho(x, t)$. Given a vector field A represented in Cartesian coordinates in the form $A(x, y, z) = A_x(x, y, z)\,\mathbf{i} + A_y(x, y, z)\,\mathbf{j} + A_z(x, y, z)\,\mathbf{k}$, the components of the curl operator $\nabla \times A$ are

$$
\begin{cases}
(\nabla \times A)_x = \dfrac{\partial A_z}{\partial y} - \dfrac{\partial A_y}{\partial z}, \\
(\nabla \times A)_y = \dfrac{\partial A_x}{\partial z} - \dfrac{\partial A_z}{\partial x}, \\
(\nabla \times A)_z = \dfrac{\partial A_y}{\partial x} - \dfrac{\partial A_x}{\partial y}.
\end{cases}
$$

The divergence operator $\nabla \cdot A$ in Cartesian coordinates is

$$
\nabla \cdot A = \frac{\partial A_x}{\partial x} + \frac{\partial A_y}{\partial y} + \frac{\partial A_z}{\partial z}.
$$
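These definitions translate directly into code. The following sketch (illustrative only, not part of the thesis experiments) evaluates the curl and the divergence of a vector field sampled on a uniform 3D grid with central differences:

```python
import numpy as np

# Illustrative sketch: curl and divergence of a vector field sampled on a
# uniform grid with spacing h. Arrays are indexed as [x, y, z], and
# np.gradient(F, h, axis=k) approximates the partial derivative along axis k.
def curl(Ax, Ay, Az, h):
    cx = np.gradient(Az, h, axis=1) - np.gradient(Ay, h, axis=2)
    cy = np.gradient(Ax, h, axis=2) - np.gradient(Az, h, axis=0)
    cz = np.gradient(Ay, h, axis=0) - np.gradient(Ax, h, axis=1)
    return cx, cy, cz

def divergence(Ax, Ay, Az, h):
    return (np.gradient(Ax, h, axis=0)
            + np.gradient(Ay, h, axis=1)
            + np.gradient(Az, h, axis=2))
```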

The continuity equation, which expresses the conservation of charge, relates the quantities J and $\rho$:

$$
\frac{\partial \rho}{\partial t} + \nabla \cdot J = 0.
$$

In an isotropic conductor the current density is related to the electric field by Ohm's law:

$$
J = \sigma E,
$$

where $\sigma(x)$ is called the electric conductivity. If $\sigma$ is nonzero, the medium is called a conductor, whereas if $\sigma = 0$ the medium is referred to as a dielectric. Relations also exist between D and E, and between B and H; they are determined by the polarization and magnetization properties of the medium containing the scatterer. In a linear isotropic medium we have

$$
D = \epsilon E, \qquad B = \mu H,
$$

where the functions $\epsilon(x)$ and $\mu(x)$ are the electric permittivity and the magnetic permeability, respectively. In a vacuum D = E and B = H. This equality can be assumed valid, up to some approximation, when the medium is air. In this case, Maxwell's equations can be simplified and read:

$$
\begin{cases}
\nabla \times H = J + \dfrac{\partial E}{\partial t}, \\
\nabla \times E = -\dfrac{\partial H}{\partial t}, \\
\nabla \cdot E = \rho, \\
\nabla \cdot H = 0.
\end{cases}
\tag{1.2.2}
$$

Boundary conditions are associated with system (1.2.2) to describe different physical situations. For scattering from perfect conductors, which represents an important model problem in industrial simulations, the electric field vanishes inside the object and the total tangential electric field on the surface of the scatterer is zero. Absorbing radiation conditions at infinity are imposed, like the Silver-Müller radiation condition [25]

$$
\lim_{r \to \infty} \left( H^s \times x - r E^s \right) = 0
$$

uniformly in all directions $\hat{x} = x/|x|$, where $r = |x|$ and $H^s$ and $E^s$ are the scattered parts of the fields.

A further simplification comes when Maxwell's equations are formulated in the frequency domain rather than in the time domain. Since the sum of two solutions is still a solution, Fourier transformations can be introduced to remove the time dependency from system (1.2.2) and to write it as a set of several time-independent systems, each corresponding to one fixed value of the frequency. All the quantities in (1.2.2) are assumed to have harmonic behaviour in time, that is, they can be written in the form $A(x, t) = A(x)e^{i\omega t}$ ($\omega$ is a constant), and their time dependency is completely determined by the amplitude and relative phase. For a dielectric body the new system assumes the form:

$$
\begin{cases}
\nabla \times H = +i\omega E, \\
\nabla \times E = -i\omega H, \\
\nabla \cdot E = 0, \\
\nabla \cdot H = 0,
\end{cases}
\tag{1.2.3}
$$

where now E = E(x) and H = H(x). Here $\omega = ck = 2\pi c/\lambda$ is referred to as the angular frequency, k as the wave number and $\lambda$ as the wavelength of the electromagnetic wave. The constant c is the speed of light.
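To make the passage from (1.2.2) to (1.2.3) explicit (a standard step, spelled out here for completeness), substitute the harmonic ansatz into Faraday's law: the time derivative collapses into a multiplication by $i\omega$,

$$
\frac{\partial}{\partial t}\left[ H(x)e^{i\omega t} \right] = i\omega\, H(x)e^{i\omega t},
\qquad \text{so} \qquad
\nabla \times E = -\frac{\partial H}{\partial t}
\;\Longrightarrow\;
\nabla \times E(x) = -i\omega\, H(x),
$$

after cancelling the common factor $e^{i\omega t}$; the other equations of (1.2.3) follow in the same way.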



1.3 Numerical solution of Maxwell's equations

A popular solution approach eliminates the magnetic field H from (1.2.3) and obtains a vector Helmholtz equation with a divergence condition:

$$
\begin{cases}
\Delta E + k^2 E = 0, \\
\nabla \cdot E = 0.
\end{cases}
\tag{1.3.4}
$$

Systems of the form (1.3.4) are challenging to solve. An analytic solution can be computed when the geometry of the scatterer is very regular, as in the case of a sphere or a spheroid. More complicated boundaries require the use of numerical techniques.

Objects of interest in industrial applications generally have large dimension in terms of wavelength, and the computation of their scattering cross-section can be very demanding in terms of computer resources. Until the emergence of high-performance computers in the early eighties, the solution was afforded by using approximate high-frequency techniques such as the shooting and bouncing ray method (SBR) [101]. Basically, ray-based asymptotic methods like SBR and the uniform theory of diffraction rely on the idea that EM scattering becomes a localized phenomenon as the size of the scatterer increases with respect to the wavelength. In the last 20 years, the impressive advance in computer technology and the introduction of fast methods with lower computational and memory requirements have made a rigorous numerical solution affordable for many practical applications. Nowadays, computer scientists generally adopt two distinct approaches for the numerical solution, based on either differential or integral equation methods.

1.3.1 Differential equation methods

The first approach solves system (1.3.4) for the electric field surrounding the scatterer by differential equation methods. Classical discretization schemes like the finite-element method (FEM) [125, 145] or the finite-difference method (FDM) [99, 137] can be used to discretize the continuous model and give rise to a sparse linear system of equations. The domain outside the object is truncated and an artificial boundary is introduced to simulate an infinite volume [20, 83, 85]. Absorbing boundary conditions do not alter the sparsity structure of the matrix from the discretization but have to be imposed at some distance from the scatterer. More accurate exterior boundary conditions, based on integral equations, allow us to bring the exterior boundary of the simulation region closer to the surface of the scatterer and to limit the size of the linear system to solve [89, 104]. As they are based on integral equations, they result in a part of the matrix being dense in the final system, which can increase the overall solution cost.

The discretization of large 3D domains may suffer from grid dispersion errors, which occur when a wave has a different phase velocity on the grid compared to the exact solution [9, 90, 100]. Grid dispersion errors accumulate in space and, for 2D and 3D problems over large simulation regions, their effect can be troublesome, introducing spurious solutions in the computation. The effect of grid dispersion errors can be reduced by using finer grids or higher-order accurate differential equation solvers, which substantially increase the problem size, or by coupling the differential equation solver with an integral equation solver.

Because of the sparsity structure of the discretization matrix, differential equation methods have become popular solution methods for EM problems.

1.3.2 Integral equation methods

An alternative class of methods is represented by integral equation solvers. Using the equivalence principle, system (1.3.4) can be recast in the form of four integral equations which relate the electric and magnetic fields E and H to the equivalent electric and magnetic currents J and M on the surface of the object. Integral equation methods solve for the induced currents globally, whereas differential equation methods solve for the fields. The electric-field integral equation (EFIE) expresses the electric field E outside the object in terms of the induced current J. In the case of harmonic time dependency it reads

$$
E(x) = -\int_{\Gamma_c} \nabla G(x, x')\,\rho(x')\,d^3x' \;-\; ik \int_{\Gamma} G(x, x')\,J(x')\,d^3x' \;+\; E^E(x), \tag{1.3.5}
$$

where $E^E$ is the electric field due to external sources, and G is the Green's function for scattering problems:

$$
G(x, x') = \frac{e^{-ik|x - x'|}}{|x - x'|}.
$$

The EFIE provides a first-kind integral equation, which is well known to be ill-conditioned, but it is the only integral formulation that can be used for open targets. Another formulation, referred to as the magnetic-field integral equation (MFIE), expresses the magnetic field outside the object in terms of the induced current. Both formulations suffer from interior resonances, which can make the numerical solution more problematic at some frequencies, known as resonant frequencies. The problem of interior resonances is particularly troubling for large objects. A possible remedy is to combine linearly the EFIE and MFIE formulations. The resulting equation, known as the combined-field integral equation (CFIE), does not suffer from internal resonance and is much better conditioned, as it generally provides an integral equation of the second kind, but it can be used only for closed targets. Owing to these nice properties, the use of the CFIE formulation is considered mandatory for closed surfaces.

The resulting EFIE, MFIE and CFIE are converted into matrix equations by the Method of Moments [86]. The unknown current J(x) on the surface of the object is expanded into a set of basis functions $B_i$, $i = 1, 2, \ldots, N$:

$$
J(x) = \sum_{i=1}^{N} J_i B_i(x).
$$

This expansion is introduced into (1.3.5), and the discretized equation is applied to a set of test functions. A linear system of equations is finally obtained, whose unknowns are the coefficients of the expansion. The entries of the coefficient matrix are expressed in terms of surface integrals and assume the simplified form

$$
A_{KL} = \int\!\!\int G(x, y)\, B_K(x) \cdot B_L(y)\, dL(y)\, dK(x). \tag{1.3.6}
$$

When m-point Gauss quadrature formulae are used to compute the surface integrals in (1.3.6), the entries of the coefficient matrix have the form

$$
A_{KL} = \sum_{i=1}^{m} \sum_{j=1}^{m} \omega_i \omega_j\, G(x_{K_i}, y_{L_j})\, B_K(x_{K_i}) \cdot B_L(y_{L_j}).
$$
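As an illustration of how such an assembly is typically organized, consider the following minimal sketch. It is not the CESC code used in this thesis: it assumes one vector basis function per panel with precomputed quadrature points, weights and basis values (the names `quad_pts`, `quad_w`, `basis` are illustrative), and it leaves out the singular self-interaction terms, which require dedicated integration rules.

```python
import numpy as np

def green(x, y, k):
    # Helmholtz Green's function e^{-ik|x-y|} / |x-y|, as defined above.
    r = np.linalg.norm(x - y)
    return np.exp(-1j * k * r) / r

def assemble(quad_pts, quad_w, basis, k):
    # quad_pts[K]: (m, 3) quadrature points associated with basis function K
    # quad_w[K]:   (m,)   quadrature weights
    # basis[K]:    (m, 3) values of the vector basis function B_K
    n = len(quad_pts)
    A = np.zeros((n, n), dtype=complex)
    for K in range(n):
        for L in range(n):
            if K == L:
                continue  # singular integral: needs special treatment
            acc = 0j
            for i, wi in enumerate(quad_w[K]):
                for j, wj in enumerate(quad_w[L]):
                    acc += (wi * wj
                            * green(quad_pts[K][i], quad_pts[L][j], k)
                            * np.dot(basis[K][i], basis[L][j]))
            A[K, L] = acc
    return A
```

Every pair (K, L) contributes a nonzero value, which is why the resulting matrix is dense.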

The resulting linear system is dense and complex: unsymmetric in the case of the MFIE and CFIE, symmetric but non-Hermitian in the case of the EFIE formulation.

For homogeneous or layered homogeneous dielectric bodies, integral equations are discretized on the surface of the object or at the discontinuous interfaces between two different materials. Thus the number of unknowns is generally much smaller when compared to the discretization of large 3D spaces by finite-difference or finite-element methods. However, the global coupling of the induced currents in the problem results in dense matrices. The cost of the solution associated with these dense matrices has for a long time precluded the popularity of integral solution methods in EM. In recent years, their application in the context of the study of radar targets of different materials and the availability of larger computer resources have motivated an increasing interest towards integral methods.

Throughout this thesis, we focus on preconditioning strategies for the EFIE formulation of scattering problems. In the integral equation context that we consider, the problems are discretized by the Method of Moments using the Rao-Wilton-Glisson (RWG) basis functions [116]. The surface of the object is modelled by a triangular faceted mesh (see Figure 1.3.1), and each RWG basis function is assigned to one interior edge in the mesh. Each unknown in the problem represents the vectorial flux across one edge in the triangular mesh. The total number of unknowns is given by the number of interior edges, which is about one and a half times the number of triangular facets. In order to have a correct approximation to the oscillating solution of Maxwell's equations, physical constraints impose that the average edge length a has to be between $0.1\lambda$ and $0.2\lambda$, where $\lambda$ is the wavelength of the incoming wave [11]. Two factors mainly affect the dimension N of the linear system to solve, namely the total surface area and the frequency of the problem. For a given target the size of the system is proportional to the square of the frequency, and the memory cost for the storage of the $N^2$ complex numbers of the full discretization matrix is proportional to the fourth power of the frequency. This cost increases drastically when a fine discretization is required, as is the case for rough geometries, and can make the numerical solution of medium-size problems unaffordable even on modern computers. Nowadays a typical electromagnetic problem in industry can have hundreds of thousands or a few million unknowns.
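The scaling is easy to quantify. With the mesh constraint $a \approx 0.1\lambda$ and $\lambda = c/f$, the number of interior edges covering a fixed surface area S grows like $S/a^2$; hence (a back-of-the-envelope illustration, assuming double-precision complex storage of 16 bytes per entry):

$$
N \;\propto\; \frac{S}{a^2} \;\propto\; \frac{S}{\lambda^2} \;\propto\; S f^2,
\qquad
\text{storage} \;=\; 16\,N^2 \ \text{bytes} \;\propto\; f^4 .
$$

Doubling the frequency therefore multiplies the number of unknowns by four and the storage for the dense matrix by sixteen.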

Figure 1.3.1: Example of discretized mesh.

1.4 Direct versus iterative solution methods

Direct methods are often the method of choice for the solution of these systems in an industrial environment because they are reliable and predictable both in terms of accuracy and cost. Dense linear algebra packages such as LAPACK [5] provide reliable implementations of the LU factorization attaining good performance on modern computer architectures. In particular, they use Level 3 BLAS [51, 52] for block operations, which enables us to exploit data locality in the cache memory. Except when the geometries are very irregular, the coefficient matrices of the discretized problem are not very ill-conditioned, and direct methods compute fairly accurate solutions. The factorization can be performed once and then reused to compute the solution for all excitations. In industrial simulations, objects are illuminated at several, slightly different incidence directions, and hundreds of thousands of systems often have to be solved for the same application, all having the same coefficient matrix and a different right-hand side.

For the solution of large-scale problems, direct methods become impractical even on large parallel platforms because they require the storage of $N^2$ single or double precision complex entries of the coefficient matrix and $O(N^3)$ floating-point operations to compute the factorization, where N denotes the size of the linear system. Some direct solvers with reduced computational complexity have been introduced for the case when the solution is sought for blocks of right-hand sides, like the EADS out-of-core parallel solver [1], the Nested Equivalence Principle Algorithm (NEPAL) [30, 31] and the Recursive Aggregate T-Matrix Algorithm (RATMA) [31, 32], but the computational cost remains a bottleneck for large-scale applications. Although, in the last twenty years, computer technology has gone from flops to Gigaflops, that is a speedup factor of $10^9$, the size of the largest dense problems solved on current architectures has increased by only a factor of three [56, 57].
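A rough count makes the point. As an illustrative estimate only, assume 16 bytes per double-precision complex entry and the classical $\frac{2}{3}N^3$ operation count of an LU factorization (complex arithmetic adds a constant factor): for a moderately large problem with $N = 10^5$,

$$
16\,N^2 = 1.6 \times 10^{11} \ \text{bytes} \approx 160\ \text{GB of storage},
\qquad
\tfrac{2}{3}N^3 \approx 6.7 \times 10^{14} \ \text{operations}.
$$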

1.4.1 A sparse approach for solving scattering problems

It can be argued that all large dense matrices hide some structure behind their $N^2$ entries. The structure sometimes emerges naturally at the matrix level (Toeplitz, circulant, orthogonal matrices) and sometimes can be identified from the origin of the problem. When the number of unknowns is large, the discretized problem reflects more closely the properties of the continuous problem, and the entries of the discretization matrix are far from arbitrary. Exploiting this structure can enable the use of sparse linear algebra techniques and lead to a significant reduction of the overall solution cost. The use of iterative methods can be promising from this viewpoint because they simply require a routine to compute matrix-vector products and do not need knowledge of all the entries of the coefficient matrix. Special properties of the problem can be profitably used to reduce the computational cost of this procedure. Under favourable conditions, iterative methods improve the approximate solution at each step. When the required accuracy is obtained, one can stop the iteration.

In the last decades, active research efforts have been devoted to understanding the theoretical and numerical properties of modern iterative solvers. Although they still cannot compete with direct solvers in terms of robustness, they have been successfully used in many contexts. In particular, it is now established that iterative solvers have to be used with some form of preconditioning to be effective on challenging problems, like those arising in industry (see, for instance, [2, 41, 60, 146]). Provided we have fast matrix-vector multiplications and robust preconditioners, the iterative solution via modern Krylov solvers can be an alternative to direct methods.

There are active research efforts on fast methods [4, 82] to perform matrix-vector products with $O(N \log N)$ computational complexity. These methods, generally referred to as hierarchical methods, were introduced originally in the context of the study of particle simulations as a way to reduce costs and enable the solution of large problems, or to demand more accuracy in the computation [6, 8]. Hierarchical methods can be effective on boundary element applications, and many research efforts have been successful in this direction, including strategies for parallel distributed memory implementations [45, 46, 47, 79, 80].

In this thesis, we focus on the other key component of Krylov methods in this context; that is, we study the design of robust preconditioning techniques. The design of the preconditioner is generally very problem-dependent and can take great advantage of a good knowledge of the underlying physical problem. General purpose preconditioners can fail on specific classes of problems, and for some of them a good preconditioner is not known yet. A preconditioner M is required to be a good approximation of A in some sense (or of $A^{-1}$, depending on the context), to be easy to compute, and cheap to store and to apply. For electromagnetic scattering problems expressed in integral formulation, some special constraints are required in addition to the usual ones. For large problems the use of fast methods is mandatory for the matrix-vector products. When fast methods are used, the coefficient matrix is not completely stored in memory and only some of the entries, corresponding to the near-field interactions, are explicitly computed and available for the construction of the preconditioner. Hierarchical methods are often implemented in parallel, partitioning the domain among different processors, and the matrix-vector products are computed in a distributed manner, trying to meet the goals of both load balancing and reduced communication. Thus, parallelism is a relevant factor to consider in the design of the preconditioner. Nowadays the typical problem size in the electromagnetic industry is continually increasing, and the effectiveness of preconditioned Krylov subspace solvers should be combined with the property of numerical scalability; that is, the numerical behaviour of the preconditioner should not depend on the mesh size or on the frequency of the problem. Finally, matrices arising from the discretization of integral equations can be highly indefinite, and many standard preconditioners can exhibit surprisingly poor performance.

This manuscript is structured as follows. In Chapter 2, we establish the need for preconditioning linear systems of equations which arise from the discretization of boundary integral equations in electromagnetism, and we test and compare several standard preconditioners computed from a sparse approximation of the dense coefficient matrix. We study their numerical behaviour on a set of model problems arising from both academic and industrial applications, and gain some insight on potential causes of failure. In Chapter 3, we focus our analysis on sparse approximate inverse methods and we propose some efficient static nonzero pattern selection strategies for the construction of a robust Frobenius-norm minimization preconditioner in electromagnetism. We introduce suitable strategies to identify the relevant entries to consider in the original matrix A, as well as an appropriate sparsity structure for the approximate inverse. In Chapter 4, we illustrate the numerical and computational efficiency of the proposed preconditioner on a set of model problems, and we complete the study considering two symmetric preconditioners based on Frobenius-norm minimization. In Chapter 5, we consider the implementation of the Frobenius-norm minimization preconditioner within the code that implements the Fast Multipole Method (FMM). We combine the sparse approximate inverse preconditioner with fast multipole techniques for the solution of huge electromagnetic problems. We study the numerical and parallel scalability of the implementation and we investigate the numerical behaviour of inner-outer iterative solution schemes implemented in a multipole context with different levels of accuracy for the matrix-vector products in the inner and outer loops. In Chapter 6, we introduce an algebraic multilevel strategy based on low-rank updates for the preconditioner, computed by using spectral information of the preconditioned matrix. We illustrate the computational and numerical efficiency of the algorithm on a set of model problems that is representative of real electromagnetic calculations. We finally draw some conclusions arising from the work and address perspectives for future research.




Chapter 2

Iterative solution via preconditioned Krylov solvers of dense systems in electromagnetism

In this chapter we establish the need for preconditioning linear systems of equations which arise from the discretization of boundary integral equations in electromagnetism. In Section 2.1, we illustrate the numerical behaviour of iterative Krylov solvers on a set of model problems arising both from industrial and from academic applications. The numerical results suggest the need for preconditioning to effectively reduce the number of iterations required to obtain convergence. In Section 2.2, we introduce the idea of preconditioning based on sparsification strategies, and we test and compare several standard preconditioners computed from a sparse approximation of the dense coefficient matrix. We study their numerical behaviour on model problems and gain some insight on potential causes of failure.

2.1 Introduction and motivation

In this section we study the numerical behaviour of several iterative solvers for the solution of linear systems of the form

$$
Ax = b \tag{2.1.1}
$$

where the coefficient matrix A arises from the discretization of boundary integral equations in electromagnetism. Among different integral


formulations, here we focus on the EFIE formulation (1.3.5), because it is more general and more difficult to solve. We use the following Krylov methods:

• restarted GMRES [123];
• Bi-CGSTAB [142] and Bi-CGSTAB(2) [129];
• symmetric [69], nonsymmetric [67] and transpose-free QMR [66];
• CGS [131].

As a set of model problems for the numerical experiments we consider the following geometries, arising both from academic and from industrial applications, which are representative of the general numerical behaviour observed. For physical consistency we have set the frequency of the wave so that there are about ten discretization points per wavelength [11].

Example 1: a cylinder with a hollow inside, a matrix of order n = 1080, see Figure 2.1.1(a);

Example 2: a cylinder with a break on the surface, a matrix of order n = 1299, see Figure 2.1.1(b);

Example 3: a satellite, a matrix of order n = 1701, see Figure 2.1.1(c);

Example 4: a parallelopiped, a matrix of order n = 2016, see Figure 2.1.1(d); and

Example 5: a sphere, a matrix of order n = 2430, see Figure 2.1.1(e).

The first three examples are considered because they can be representative of real industrial simulations. The geometries of Examples 4 and 5 are very regular, and they are mainly introduced to study the numerical behaviour of the proposed methods on smooth surfaces. In spite of their small dimension, these problems are not easy to solve. Except for two of the model problems, the sphere and the parallelopiped, the problems are tough because their geometries have open surfaces. Larger problems will be examined in Chapter 5 when we consider the multipole method.


(a) Example 1   (b) Example 2   (c) Example 3   (d) Example 4   (e) Example 5

Figure 2.1.1: Meshes associated with test examples.


16 2. Iterative solution via preconditioned Krylov solvers ...<br />

Table 2.1.1 shows the number of matrix-vector products needed by each of the solvers to reduce the residual by a factor of 10^{-5}. This tolerance is adequate for engineering purposes, as it makes it possible to localize fairly accurately the distribution of the currents on the surface of the object. In each case, we take as initial guess x_0 = 0, and choose the right-hand side so that the exact solution of the system is known. In the GMRES code [63] and the symmetric QMR code [62] (referred to as SQMR in the forthcoming tables), iterations are stopped when, for the current approximation x_m, the computed value of

    ‖r_m‖_2 / (α ‖x_m‖_2 + β)

satisfies a fixed tolerance. Here r_m = b − A x_m is the residual vector, and the standard choices for the constants α and β in backward error analysis are α = ‖A‖_2 and β = ‖b‖_2. In all our tests we use α = 0 and β = ‖b‖_2 = ‖r_0‖_2, because of the zero initial guess. For CGS and Bi-CGSTAB, we use the implementations provided by the HSL 2000 [87] subroutines MI06 and MI03 respectively, suitably adapted to complex arithmetic. These routines accept the current approximation x_m when

    ‖b − A x_m‖_2 ≤ max(‖b − A x_0‖_2 · ε_1, ε_2),

where ε_1 and ε_2 are user-defined tolerances. In our case we take ε_1 equal to the required accuracy, and ε_2 = 0.0. For Bi-CGSTAB(2) we use the implementation of the Bi-CGSTAB(l) algorithm developed by D. Fokkema, which introduces some enhancements to improve stability and robustness, as explained in [127] and [128]. The algorithm stops the iterations when the relative residual norm ‖r_n‖_2/‖r_0‖_2 becomes smaller than a fixed tolerance. In the tests with nonsymmetric QMR (referred to as UQMR in the forthcoming tables) and TFQMR, we use, respectively, the ZUCPL and ZUTFX routines provided in QMRPACK [70]. In particular, ZUCPL implements a double complex nonsymmetric QMR algorithm based on the coupled two-term look-ahead Lanczos variant (see [68]). Both ZUCPL and ZUTFX stop the iterations when the relative residual norm ‖r_n‖_2/‖r_0‖_2 becomes smaller than a fixed tolerance. Notice that, since x_0 = 0, all the stopping criteria are equivalent, allowing a fair comparison among all these methods. All the numerical experiments reported in this section correspond to runs on a Sun workstation in double complex arithmetic, and Level 2 BLAS operations are used to carry out the dense matrix-vector products. In connection with GMRES, we test different values of the restart m, from 10 up to 110. We recall that each iteration involves one matrix-vector product for restarted GMRES and SQMR, two for Bi-CGSTAB and CGS, three for UQMR and four for TFQMR.
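For concreteness, the stopping test used for GMRES and SQMR above can be written as a small helper; the sketch below is our own illustration in Python with NumPy, not code from the thesis, and reduces to the relative residual norm for the choices α = 0 and β = ‖b‖_2:

```python
import numpy as np

def backward_error(A, b, x_m, alpha=0.0, beta=None):
    """Normwise backward error ||b - A x_m||_2 / (alpha*||x_m||_2 + beta).

    With alpha = 0 and beta = ||b||_2 (legitimate here because x_0 = 0,
    so ||b||_2 = ||r_0||_2), this is the relative residual norm.
    """
    if beta is None:
        beta = np.linalg.norm(b)
    r = b - A @ x_m
    return np.linalg.norm(r) / (alpha * np.linalg.norm(x_m) + beta)

# Iterations stop when backward_error(A, b, x_m) <= 1e-5.
```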



Example   Size    GMRES(10)  GMRES(30)  GMRES(50)  GMRES(80)  GMRES(110)  Bi-CGStab
   1      1080      +1000      +1000      600        255         204         445
   2      1299      +1000        826      589        403         292         717
   3      1701      +1000        824      651        556         493       +1000
   4      2016        426        232      195        160         149         354
   5      2430      +1000        356      238        148         127         303

Example   Size    Bi-CGStab(2)   SQMR    UQMR    TFQMR    CGS
   1      1080        320         149     693      700     330
   2      1299        404         186   +1000      996     438
   3      1701        668         345   +1000    +1000     418
   4      2016        228          92     514      470     256
   5      2430        284          98     511      434     284

Table 2.1.1: Number of matrix-vector products needed by some unpreconditioned Krylov solvers to reduce the residual by a factor of 10^{-5}. An entry '+1000' indicates that the reduction was not achieved within 1000 matrix-vector products.

Except for SQMR, all the other solvers exhibit very slow convergence on the first three examples, which correspond to irregular geometries and are more difficult to solve. The last two examples are easier because the geometries are very regular; however, the iterative solution is still expensive in terms of the number of matrix-vector products. These experiments reveal the remarkable robustness of SQMR, which clearly outperforms the nonsymmetric solvers on all the test cases, even GMRES with large restarts. The results also reveal the good performance of Bi-CGSTAB(2) compared to the standard Bi-CGSTAB method, which generally requires at least one third more matrix-vector products to converge. On the most difficult problems, slow convergence is essentially due to the bad spectral properties of the coefficient matrix. Figure 2.1.2 plots the distribution of the eigenvalues in the complex plane for Example 3; the eigenvalues are scattered from the left to the right of the spectrum, many of them have a large negative real part, and no clustering appears. Such a distribution is not at all favourable for the rapid convergence of Krylov solvers.

Figure 2.1.2: Eigenvalue distribution in the complex plane of the coefficient matrix of Example 3.

Krylov methods look for the solution of the system in the Krylov space K_k(A, b) = span{b, Ab, A^2 b, ..., A^{k−1} b}. This is a good space from which to construct approximate solutions for a nonsingular linear system because it is intimately related to A^{−1}: the inverse of any nonsingular matrix A can be written in terms of powers of A with the help of the minimal polynomial of A.

The minimal polynomial q(t) of A is the unique monic polynomial of minimum degree such that q(A) = 0. If the minimal polynomial of A has degree m, then the solution of Ax = b lies in the space K_m(A, b). Consequently, the smaller the degree of the minimal polynomial, the faster the expected rate of convergence of a Krylov method (see [88]). If preconditioning A by a nonsingular matrix M causes the eigenvalues of M^{−1}A to fall into a few clusters, say t of them, whose diameters are small enough, then M^{−1}A behaves numerically like a matrix with t distinct eigenvalues. As a result, we would expect t iterations of a Krylov method to produce reasonably accurate approximations. It has been shown in [74, 122, 148] that in practice, with the availability of a high quality preconditioner, the choice of the Krylov subspace accelerator is not so critical.
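The link between the degree of the minimal polynomial and the iteration count can be checked numerically. The following sketch (a NumPy illustration of the argument above, not part of the thesis) constructs a diagonalizable matrix whose minimal polynomial has degree t = 5 and monitors the minimal residual over the growing Krylov spaces; the residual drops to roundoff level at the fifth step:

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 200, 5
eigs = np.repeat(np.arange(1.0, t + 1.0), n // t)  # exactly t distinct eigenvalues
X = rng.standard_normal((n, n))
A = X @ np.diag(eigs) @ np.linalg.inv(X)           # minimal polynomial of degree t
b = rng.standard_normal(n)

V = [b]
for k in range(1, t + 1):
    K = np.stack(V, axis=1)                  # basis of K_k(A, b)
    c, *_ = np.linalg.lstsq(A @ K, b, rcond=None)
    r = b - A @ (K @ c)                      # minimal residual over K_k(A, b)
    print(k, np.linalg.norm(r) / np.linalg.norm(b))
    V.append(A @ V[-1])
```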

2.2 Preconditioning based on sparsification strategies

A preconditioner M should satisfy the following demands:

• M is a good approximation to A in some sense (sometimes to A^{−1}, depending on the context);

• the construction and storage of M are not expensive;

• the system Mx = b is much easier to solve than the original one.

The transformed preconditioned system has the form M^{−1}Ax = M^{−1}b if preconditioning from the left, and AM^{−1}y = b, with x = M^{−1}y, when preconditioning from the right. For a preconditioner M given in the form M = M_1 M_2, it is also possible to consider the two-sided preconditioned system M_1^{−1} A M_2^{−1} z = M_1^{−1} b, with x = M_2^{−1} z.

Most of the existing preconditioners can be divided into either implicit or explicit form. A preconditioner is said to be of implicit form if its application, within each step of an iterative method, requires the solution of a linear system; it is implicitly defined by any nonsingular matrix M ≈ A. The most important example of this class is represented by incomplete factorization methods, where M is implicitly defined by M = L̄Ū, with L̄ and Ū triangular matrices that approximate the exact L and U factors of a standard factorization of A, according to some dropping strategy adopted during the factorization. It is well known that these methods are sensitive to indefiniteness in the coefficient matrix A and can lead to unstable triangular solves and very poor preconditioners (see [34]). Another important drawback of ILU techniques is that they are not naturally suited to parallel implementation, since the sparse triangular solves can lead to a severe degradation of performance on vector and parallel machines.

Explicit preconditioning techniques try to mitigate these difficulties. They directly approximate A^{−1} by a product M of sparse matrices, so that the preconditioning operation reduces to forming one or more matrix-vector products. Consequently, the application of the preconditioner should be easier to parallelize, with different strategies depending on the particular architecture. In addition, some of these techniques can also perform the construction phase in parallel. On certain indefinite problems with large nonsymmetric parts, these methods have provided better results than techniques based on incomplete factorizations (see [35]), representing an efficient alternative for the solution of difficult applications. A comparison of approximate inverse and ILU preconditioners can be found in [76].

In the next sections, we study the numerical behaviour of several standard preconditioners, both of implicit and of explicit form, in combination with Krylov methods for the solution of systems (2.1.1). All the preconditioners are computed from a sparse approximation of the dense coefficient matrix. On general problems, this approach can cause a severe deterioration of the quality of the preconditioner; in the BEM context, it is likely to be more effective, since a very sparse matrix can retain the most relevant contributions to the singular integrals. In Figure 2.2.3 we depict the pattern structure of the large entries in the discretization matrix for Example 5, which is representative of the general trend. Large to small entries are depicted in different colours, from red to green, yellow and blue. The picture shows that, in the discretization matrix, only a small set of entries generally have large magnitude. The largest entries are located on the main diagonal, and only a few adjacent bands have entries of high magnitude. Most of the remaining entries have much smaller modulus. In Figure 2.2.4, we plot for the same example the matrix obtained by scaling A = [a_ij] so that max_{i,j} |a_ij| = 1, and discarding from A all entries less than ε = 0.05 in modulus. This matrix is 98.5% sparse. The figure emphasizes the strong coupling among neighbouring edges introduced in the geometrical domain by the Boundary Element Method, and suggests the possibility of extracting a sparsity pattern from A by simply discarding elements of negligible magnitude, which correspond to weak coupling contributions between distant nodes.

Figure 2.2.3: Pattern structure of the large entries of A. The test problem is Example 5.

Figure 2.2.4: Nonzero pattern of A when the smallest entries are discarded. The test problem is Example 5.

The dropping operation is generally referred to as sparsification. The idea of sparsifying dense matrices before computing the preconditioner was introduced by Kolotilina [93] in the context of sparse approximate inverse methods. Alléon et al. [2], Chen [28] and Vavasis [144] used this idea for the preconditioning of dense systems from the discretization of boundary integral equations, and Tang and Wan [140] in the context of multigrid methods. Similar ideas are also exploited by Ruge and Stüben [118] in the context of algebraic multigrid methods. On sparse systems, sparsification can be helpful to identify the most relevant connections in the direct problem, especially when the coefficient matrix contains many small entries or is fairly dense (see [33] and [91]).

Several heuristics can be used to sparsify A while trying to retain the main contributions to the singular integrals. Some approaches are the following:

• find, in each column of A, the k entries of largest modulus, where k ≪ n is a positive integer. The choice of the parameter k is generally problem-dependent. The resulting matrix will have exactly k·n entries;

• for each column of A, select the row indices of the k largest entries in modulus and then, for each row index i corresponding to one of these entries, perform the same search on column i. These new row indices are added to the previous ones to form the nonzero pattern of the column. This heuristic, referred to as neighbours of neighbours, is described in detail in [36];

• the same approach as in the previous heuristic, but performing more than one iteration, and halving the number of largest entries to be located at each iteration in order to preserve sparsity. In practice, two iterations are enough [2];

• scale A so that its largest entry has magnitude equal to 1, and retain in the pattern only the elements located in positions (i, j) such that |a_ij| > ε, where the threshold parameter ε ∈ (0, 1). This heuristic was proposed by Kolotilina in [93].

Combinations of these approaches can also be used. In the numerical experiments, the preconditioners considered are constructed from the sparse near-field approximation of A, computed by using the first heuristic; we will refer to this matrix as sparsified(A) and denote it by Ã. We symmetrize the pattern after computing it in order to preserve symmetry in Ã; a minimal sketch of this construction is given below.
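The sketch below is our own Python illustration (the function name is ours): it builds à by keeping the k largest entries in modulus of each column and then symmetrizing the pattern:

```python
import numpy as np
import scipy.sparse as sp

def sparsified(A, k):
    """Keep, in each column of the dense matrix A, its k entries of largest
    modulus, then symmetrize the nonzero pattern so that the sparse
    approximation of a symmetric A remains symmetric."""
    pattern = np.zeros(A.shape, dtype=bool)
    for j in range(A.shape[1]):
        idx = np.argsort(-np.abs(A[:, j]))[:k]  # k largest entries in modulus
        pattern[idx, j] = True
    pattern |= pattern.T                         # symmetrize the pattern
    return sp.csr_matrix(np.where(pattern, A, 0.0))
```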

We consider the following methods, implemented as right preconditioners:

• SSOR(ω), where ω is the relaxation parameter;

• IC(k), the incomplete Cholesky factorization technique with k levels of fill-in, i.e. taking for the factors a sparsity pattern based on position and prescribed in advance;

• AINV, the approximate inverse method introduced in [16], which uses a dropping strategy based on values;

• SPAI, a Frobenius-norm minimization technique with the adaptive strategy proposed by Gould and Scott [76] for the selection of the sparsity pattern of the preconditioner.

In order to illustrate the general behaviour of these preconditioners, we first show in Table 2.2.2 the number of iterations required to compute the solution of Example 1. All the preconditioners are computed using the same sparse approximation of the original matrix, and all have roughly the same number of nonzero entries. In the incomplete Cholesky factorization, no additional level of fill-in was allowed in the factors; with AINV, we selected a suitable dropping threshold (around 10^{-3}) to obtain the same density as the other methods; and finally, with SPAI, we chose a priori, for each column of M, the same fixed maximum number of nonzeros as in the computation of sparsified(A). In the SSOR method, we choose ω = 1. In Table 2.2.2 we give the number of iterations for both GMRES and SQMR, which also corresponds to the number of matrix-vector products, the most time-consuming part of the algorithms. In the following sections, we aim to understand the numerical behaviour of these methods on electromagnetic problems and to identify some potential causes of failure.

Example 1 - Density of à = 4% - Density of M = 4%

Precond.   GMRES(50)   GMRES(110)   GMRES(∞)   SQMR
None           –           204         139      142
Jacobi        465          174         134      142
SSOR          214          100         100      145
IC(0)          –            –          159       –
AINV           –            –           –        –
SPAI          336           79          79       *

Table 2.2.2: Number of iterations using both symmetric and unsymmetric preconditioned Krylov methods to reduce the normwise backward error by 10^{-5} on Example 1. The symbol '–' means that convergence was not obtained after 500 iterations. The symbol '*' means that the method is not applicable.

2.2.1 SSOR

The SSOR preconditioner is the most basic preconditioning method apart from a diagonal scaling. It is defined as

    M = (D + ωE) D^{−1} (D + ωE^T),

where E is the strictly lower triangular part of Ã, and D is the diagonal matrix whose nonzero entries are the diagonal entries of Ã. In the case ω = 1, D + E is the lower triangular part of Ã, including the diagonal, and D + E^T is its upper triangular part. We recall that à is symmetric, because A is symmetric and we use a symmetric pattern for the sparsification.
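As an illustration, the application of M^{−1} reduces to two sparse triangular solves and a diagonal scaling; a minimal Python sketch (ours, assuming à is stored in SciPy sparse format) is:

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def ssor_apply(A_tilde, omega, y):
    """Apply z = M^{-1} y for M = (D + omega*E) D^{-1} (D + omega*E^T),
    where E is the strictly lower triangular part of A_tilde and D is
    its diagonal."""
    d = A_tilde.diagonal()
    D = sp.diags(d)
    E = sp.tril(A_tilde, k=-1)
    w = spla.spsolve_triangular(sp.csr_matrix(D + omega * E), y, lower=True)
    w = d * w                                     # multiply by D
    z = spla.spsolve_triangular(sp.csr_matrix(D + omega * E.T), w, lower=False)
    return z
```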

In Table 2.2.3 we show the number of iterations required by different Krylov solvers preconditioned by SSOR to reduce the residual by a factor of 10^{-5}. For these experiments we use ω = 1 to compute the preconditioner, and we consider increasing values of the density of the matrix Ã. Although very cheap to compute, SSOR is not very robust. Increasing the density of the sparse approximation of A does not help to improve its performance, and indeed on some problems it behaves like a diagonal scaling (ω = 0). In Figures 2.2.5 and 2.2.6 we illustrate the sensitivity of the SQMR convergence to the parameter ω for Examples 1 and 4. When SSOR is used as a stationary iterative solver, the relaxation parameter ω must be selected in the interval [0, 2]. When SSOR is used as a preconditioner, the choice of ω may be less constraining; thus we also show experiments with values slightly larger than 2.0.


                      GMRES(m)
Density of Ã   m=10   m=30   m=50   m=80   m=110   Bi-CGStab   UQMR   SQMR   TFQMR

Example 1
  2%             –      –     213    145    103       310        –     149     –
  4%             –      –     214    139    100       297        –     145     –
  6%             –      –     224    136     98       317        –     149     –
  8%             –      –     216    127     95       307        –     149     –
 10%             –      –     202    126     94       360        –     151     –

Example 2
  2%             –     478    269    184    146        –         –     195     –
  4%             –      –     281    178    145       349        –     187     –
  6%             –      –     350    194    152        –         –     186     –
  8%             –      –     381    205    156        –         –     189     –
 10%             –      –     385    200    157       428        –     193     –

Example 3
  2%             –      –     411    314    245        –         –     419     –
  4%             –      –     405    306    233        –         –     420     –
  6%             –      –     406    306    231       486        –     412     –
  8%             –      –     405    303    228       498        –     421     –
 10%             –      –     406    302    229        –         –     326     –

Example 4
  2%            371    192    138    116     95       193       379     85    342
  4%            457    206    145    119     95       221       387     85    400
  6%            464    208    148    119     97       224       399     85    356
  8%            445    214    152    121     97       263       389     85    392
 10%            475    217    157    122     96       223       396     85    402

Example 5
  2%             –     327    208    152    125       371        –      67     –
  4%             –     436    272    192    160       471        –      67     –
  6%             –      –     333    217    184        –         –      68     –
  8%             –      –     381    231    191        –         –      68     –
 10%             –      –     423    242    195        –         –      73     –

Table 2.2.3: Number of iterations required by different Krylov solvers preconditioned by SSOR to reduce the residual by 10^{-5}. The symbol '–' means that convergence was not obtained after 500 iterations.

Figure 2.2.5: Sensitivity of SQMR convergence to the SSOR parameter ω for Example 1 (size 1080, density of sparsified(A) = 6%; SQMR iterations versus ω).

Figure 2.2.6: Sensitivity of SQMR convergence to the SSOR parameter ω for Example 4 (size 2016, density of sparsified(A) = 6%; SQMR iterations versus ω).

2.2.2 Incomplete Cholesky factorization

Incomplete factorization methods are one of the most natural ways to construct preconditioners of implicit type.

In the general nonsymmetric case, they start from a factorization method, such as an LU or Cholesky decomposition or even a QR factorization, that decomposes the matrix into a product of triangular factors, and modify it to reduce the construction cost. The basic idea is to keep the factors artificially sparse, for instance by dropping some elements in prescribed off-diagonal positions during the standard Gaussian elimination algorithm. It is well known that, even when the matrix is sparse, the triangular factors L and U, and similarly the unitary factor Q and the upper triangular factor R, can often be fairly dense. The preconditioning operation z = M^{−1}y is computed by solving the linear system L̄Ūz = y, where L̄ ≈ L and Ū ≈ U, which is performed in two distinct steps:

1. solve L̄w = y;
2. solve Ūz = w.

ILU preconditioners are amongst the most reliable in a general setting. Originally developed for sparse matrices, they can also be applied to dense systems by extracting a sparsity pattern in advance and performing the incomplete factorization on the sparsified matrix. This class has been intensively studied and successfully employed on a wide range of symmetric problems, providing a good balance between computational cost and reduction of the number of iterations (see [27] and [55]). Well known theoretical results on the existence and stability of the factorization can be proved for the class of M-matrices [105], and recent studies consider more general symmetric matrices, both structured and unstructured.


In this section, we consider the incomplete Cholesky factorization and denote it by IC. We assume that the standard IC factorization matrix M of à is given in the form

    M = LDL^T,    (2.2.2)

where D and L stand for, respectively, the diagonal matrix and the unit lower triangular matrix whose entries are computed by means of the algorithm given in Figure 2.2.7. The set F of fill-in entries to be kept is given by

    F = { (k, i) | lev(l_{k,i}) ≤ l },

where the integer l denotes a user-specified maximal level of fill-in. The level lev(l_{k,i}) of the coefficient l_{k,i} of L is defined by:

Initialization:   lev(l_{k,i}) = 0 if l_{k,i} ≠ 0 or k = i, and lev(l_{k,i}) = ∞ otherwise.

Factorization:    lev(l_{k,i}) = min { lev(l_{k,i}), lev(l_{i,j}) + lev(l_{k,j}) + 1 }.

The resulting preconditioner is usually denoted by IC(l). Alternative strategies that dynamically discard fill-in entries are summarized in [122].
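The level computation admits a compact, if inefficient, dense transcription; the sketch below is our own illustration with cubic cost (practical codes compute the levels symbolically during the elimination) and returns the positions kept by IC(l):

```python
import numpy as np

def fill_levels(pattern, l):
    """Compute lev(l_{k,i}) from a symmetric boolean nonzero pattern of
    A_tilde and return F = {(k, i) : lev(l_{k,i}) <= l} as a boolean
    lower triangular mask."""
    n = pattern.shape[0]
    INF = np.iinfo(np.int64).max // 2
    lev = np.where(pattern | np.eye(n, dtype=bool), 0, INF)
    for j in range(n - 1):                   # mirrors the elimination loops
        for i in range(j + 1, n):
            for k in range(i + 1, n):
                lev[k, i] = min(lev[k, i], lev[i, j] + lev[k, j] + 1)
    return np.tril(lev <= l)
```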

In Tables 2.2.4 to 2.2.8, we display the number of iterations using an incomplete Cholesky factorization preconditioner on the five model problems. In this and in the forthcoming tables, the symbol '–' means that convergence was not obtained after 500 iterations. We show results for increasing values of the density of the sparse approximation of A as well as various levels of fill-in. The general trend is that increasing the fill-in produces a much more robust preconditioner than applying IC(0) to a denser sparse approximation of the original matrix. Moreover, IC(l) with l ≥ 1 may deliver a good rate of convergence provided the coefficient matrix is not too sparse, as we get closer to the exact LDL^T factorization. However, on indefinite problems the numerical behaviour of IC can be fairly chaotic; this can be observed in Table 2.2.8 for Example 5. The factorization of a very sparse approximation (up to 2%) of the coefficient matrix can be stable and deliver a good rate of convergence, especially if at least one level of fill-in is retained. For higher values of the density of the approximation of A, the factors may become very ill-conditioned, and consequently the preconditioner is very poor. As shown in the tables, ill-conditioning of the factors is not related to ill-conditioning of the matrix Ã. This behaviour has already been observed on sparse real indefinite systems; see for instance [34].

As an attempt at a possible remedy, following [109, 110], we apply IC(l) to a perturbation of à by a complex diagonal matrix.


Compute D and L

Initialization phase:
    d_{i,i} = ã_{i,i},   i = 1, 2, ..., n
    l_{i,j} = ã_{i,j},   i = 2, ..., n,  j = 1, 2, ..., i−1

Incomplete factorization process:
    do j = 1, 2, ..., n−1
        do i = j+1, j+2, ..., n
            d_{i,i} = d_{i,i} − l_{i,j}^2 / d_{j,j}
            l_{i,j} = l_{i,j} / d_{j,j}
            do k = i+1, i+2, ..., n
                if (k, i) ∈ F then
                    l_{k,i} = l_{k,i} − l_{i,j} l_{k,j}
                end if
            end do
        end do
    end do

Figure 2.2.7: Incomplete factorization algorithm - M = LDL^T.

More specifically, we use

    Ã_τ = Ã + i τ h ∆_r,    (2.2.3)

where ∆_r = diag(Re(A)) = diag(Re(Ã)), τ stands for a nonnegative real parameter, and

    h = n^{−1/d} with d = 3 (the space dimension).    (2.2.4)

The intention is to move the eigenvalues of the preconditioned system along the imaginary axis and thus avoid a possible eigenvalue cluster close to zero.
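The construction of the shifted matrix (2.2.3) is straightforward; a short sketch (ours, with à stored as a SciPy sparse matrix) reads:

```python
import scipy.sparse as sp

def shifted(A_tilde, tau, d=3):
    """Return A_tau = A_tilde + 1j*tau*h*Delta_r, with h = n**(-1/d) and
    Delta_r the diagonal matrix of the real parts of diag(A_tilde)."""
    n = A_tilde.shape[0]
    h = n ** (-1.0 / d)
    delta_r = sp.diags(A_tilde.diagonal().real)
    return (A_tilde + 1j * tau * h * delta_r).tocsr()
```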

In Table 2.2.9, we show the number of SQMR iterations for different values of the shift parameter τ and various levels of fill-in in the preconditioner. The value of the shift is problem-dependent, and should be selected to strike a good balance between making the factorization process more stable and not perturbing the coefficient matrix significantly; a good value generally lies between 0 and 2. Although it is not easy to tune and its effect is difficult to predict, a small diagonal shift can help to compute a more stable factorization, and in some cases the performance of the preconditioner improves significantly.

In Figures 2.2.8, 2.2.9 and 2.2.10, we illustrate the effect of this shift strategy on the eigenvalue distribution of the preconditioned matrix.


Example 1

Density of Ã   κ∞(Ã)    IC(level)   Density of M   GMRES(30)   GMRES(50)   SQMR
   2%           50321    IC(0)          2.0%            –           –          –
                         IC(1)          4.5%            –           –          –
                         IC(2)          7.8%            –           –          –
   3%          120282    IC(0)          3.0%            –           –          –
                         IC(1)          7.5%            –           –          –
                         IC(2)         13.0%            –           –          –
   4%           29727    IC(0)          4.0%            –           –          –
                         IC(1)         11.9%            –           –          –
                         IC(2)         23.4%            –           –         194
   5%            5350    IC(0)          5.0%            –           –         398
                         IC(1)         16.9%            –           –         222
                         IC(2)         32.3%           310         100         86
   6%           12610    IC(0)          6.0%            –           –         296
                         IC(1)         21.7%            –           –         128
                         IC(2)         39.0%            81          46         45

Table 2.2.4: Number of iterations, varying the sparsity level of à and the level of fill-in, on Example 1.

For each value of the shift parameter τ, we display κ(L), the condition number (calculated using the LAPACK package) of the computed L factor, and the number of iterations required by SQMR. The eigenvalues are scattered all over the complex plane when no shift is used, whereas they look more clustered when a shift is applied. As we mentioned before, a clustered spectrum of the preconditioned matrix is usually considered a desirable property for the fast convergence of Krylov solvers. However, for incomplete factorizations, the condition number of the factors plays a more important role in the rate of convergence of the Krylov iterations than the eigenvalue distribution: if the triangular factors computed by the incomplete factorization process are very ill-conditioned, the long recurrences associated with the triangular solves are unstable, and the use of the preconditioner may be totally ineffective.
factorization process are very ill-conditioned, the long recurrences associated


Example 2

Density of Ã   κ∞(Ã)    IC(level)   Density of M   GMRES(30)   GMRES(50)   SQMR
   2%           14136    IC(0)          2.0%            –           –         168
                         IC(1)          4.1%            –           –         386
                         IC(2)          6.6%            –           –          –
   3%             998    IC(0)          3.0%            –           –         171
                         IC(1)          6.7%            84          76         35
                         IC(2)         11.5%            84          46         30
   4%             737    IC(0)          4.0%            –          327        121
                         IC(1)          9.9%            46          38         31
                         IC(2)         17.5%            32          31         25
   5%             647    IC(0)          5.0%            –           –         103
                         IC(1)         13.2%            41          36         31
                         IC(2)         23.4%            29          29         25
   6%             648    IC(0)          6.0%            –           –         143
                         IC(1)         15.9%            41          35         30
                         IC(2)         28.2%            28          29         23

Table 2.2.5: Number of iterations, varying the sparsity level of à and the level of fill-in, on Example 2.

An auto-tuned strategy might be designed, consisting of incrementing the value of the shift and computing a new incomplete factorization whenever the condition number of the current factor is too large. Although time consuming, such a strategy might construct a robust shifted IC factorization on highly indefinite problems.
on highly indefinite problems.


Example 3

Density of Ã   κ∞(Ã)    IC(level)   Density of M   GMRES(30)   GMRES(50)   SQMR
   2%           33348    IC(0)          2.0%            –           –          –
                         IC(1)          4.5%            –           –          –
                         IC(2)          7.0%            –           –          –
   3%           13269    IC(0)          3.0%            –           –          –
                         IC(1)          7.1%            –          247        110
                         IC(2)         11.3%            60          41         40
   4%            9568    IC(0)          4.0%            –           –         388
                         IC(1)         10.0%            80          47         47
                         IC(2)         15.9%            26          26         24
   5%            1874    IC(0)          5.0%            –           –         342
                         IC(1)         12.9%            39          33         32
                         IC(2)         20.4%            21          21         19
   6%            1403    IC(0)          6.0%            –           –         362
                         IC(1)         15.8%            29          29         27
                         IC(2)         24.5%            18          18         15

Table 2.2.6: Number of iterations, varying the sparsity level of à and the level of fill-in, on Example 3.


Example 4

Density of Ã   κ∞(Ã)    IC(level)   Density of M   GMRES(30)   GMRES(50)   SQMR
   2%             541    IC(0)          2.0%           285         221         98
                         IC(1)          5.1%            46          42         30
                         IC(2)          8.6%            30          30         24
   3%             346    IC(0)          3.0%            –           –         467
                         IC(1)          8.3%            34          32         24
                         IC(2)         14.2%            23          23         14
   4%             322    IC(0)          4.0%           255         187         96
                         IC(1)         10.9%            24          24         15
                         IC(2)         17.9%            19          19         12
   5%             369    IC(0)          5.0%            –           –          –
                         IC(1)         14.7%            23          23         15
                         IC(2)         24.5%            19          19         11
   6%             370    IC(0)          6.0%           477         341        146
                         IC(1)         18.6%            19          19         12
                         IC(2)         30.2%            16          16         10

Table 2.2.7: Number of iterations, varying the sparsity level of à and the level of fill-in, on Example 4.


Example 5

Density of Ã   κ∞(Ã)   IC(level)   Density of M   κ∞(L)      GMRES(30)   GMRES(50)   SQMR
   2%            263    IC(0)          2.0%        2·10^3        378         245       102
                        IC(1)          5.1%        1·10^3         79          68        45
                        IC(2)          9.1%        9·10^2         58          48        34
   3%            270    IC(0)          3.0%        1·10^6          –           –         –
                        IC(1)          7.8%        1·10^5          –           –         –
                        IC(2)         12.8%        3·10^3         48          45        30
   4%            253    IC(0)          4.0%        6·10^9          –           –         –
                        IC(1)         11.7%        2·10^5          –           –         –
                        IC(2)         19.0%        7·10^3         40          38        25
   5%            285    IC(0)          5.0%        6·10^10         –           –         –
                        IC(1)         14.6%        1·10^5          –           –       307
                        IC(2)         23.0%        3·10^4        150          84        49
   6%            294    IC(0)          6.0%        8·10^11         –           –         –
                        IC(1)         18.8%        5·10^11         –           –         –
                        IC(2)         29.6%        7·10^4          –           –       242

Table 2.2.8: Number of iterations, varying the sparsity level of à and the level of fill-in, on Example 5.


                                            τ
IC(level)   Density of M    0.0    0.1    0.3    0.5    0.7    0.9    1.1

Example 1 - Density of à = 5%
IC(0)          5.0%         398     –     222    166    117    123    109
IC(1)         16.9%         222     –      –     169     90     73     67
IC(2)         32.3%          86    159    146    134     67     68     62

Example 2 - Density of à = 2%
IC(0)          2.0%         168    423     –      –     458    182    180
IC(1)          4.1%         386     –      –      –     363    141    142
IC(2)          6.6%          –     380    200     –     474    142    117

Example 3 - Density of à = 3%
IC(0)          3.0%          –      –      –      –      –     179    172
IC(1)          7.1%         110    139     –     336     95    109    145
IC(2)         11.3%          40     92     –      95     80     85     90

Example 4 - Density of à = 4%
IC(0)          3.0%         467    189     –      –      –      –     206
IC(1)          8.4%          24     26     60    234     –      –      –
IC(2)         14.2%          14     15     21     28     –      –      –

Example 5 - Density of à = 4%
IC(0)          4.0%          –      –      –      –      –      –      –
IC(1)         11.7%          –      –      –      –      –      –      –
IC(2)         19.0%          25    131    123     –      –      –      –

Table 2.2.9: Number of SQMR iterations, varying the shift parameter, for various levels of fill-in in IC.


Figure 2.2.8: The spectrum of the matrix preconditioned with IC(1), the condition number of L, and the number of iterations with SQMR for various values of the shift parameter τ. The test problem is Example 1 and the density of à is around 3%. The panels correspond to: τ = 0.0 (κ(L) = 526284, SQMR iter. = +500); τ = 0.1 (κ(L) = 134975, +500); τ = 0.3 (κ(L) = 9608, 313); τ = 0.5 (κ(L) = 2165, 161); τ = 0.7 (κ(L) = 777, 117); τ = 0.9 (κ(L) = 434, 104); τ = 1.1 (κ(L) = 261, 95); τ = 1.3 (κ(L) = 183, 94).


Figure 2.2.9: The eigenvalue distribution on the square [−1, 1] of the matrix preconditioned with IC(1), the condition number of L, and the number of iterations with SQMR for the same values of the shift parameter τ as in Figure 2.2.8. The test problem is Example 1 and the density of à is around 3%.


Figure 2.2.10: The eigenvalue distribution on the square [−0.3, 0.3] of the matrix preconditioned with IC(1), the condition number of L, and the number of iterations with SQMR for the same values of the shift parameter τ as in Figure 2.2.8. The test problem is Example 1 and the density of à is around 3%.


2.2.3 AINV

An alternative way to construct a preconditioner is to compute an explicit approximation of the inverse of the coefficient matrix. In this section we consider two techniques: the first constructs an approximation of the inverse of the factors using an Ã-biconjugation process [19], and the other uses a Frobenius-norm minimization technique [93].

If the matrix à can be written in the form LDL^T, where L is unit lower triangular and D is diagonal, then its inverse can be decomposed as Ã^{−1} = L^{−T}D^{−1}L^{−1} = ZD^{−1}Z^T, where Z = L^{−T} is unit upper triangular. Factorized sparse approximate inverse techniques compute sparse approximations Z̄ ≈ Z, so that the resulting preconditioner is M = Z̄ D̄^{−1} Z̄^T ≈ Ã^{−1}, for D̄ ≈ D.

In the approach known as AINV, the triangular factors are computed by means of a set of Ã-biconjugate vectors {z_i}_{i=1}^n, such that z_i^T Ã z_j = 0 if and only if i ≠ j. Then, introducing the matrix Z = [z_1, z_2, ..., z_n], the relation

    Z^T Ã Z = D = diag(p_1, p_2, ..., p_n)

holds, where p_i = z_i^T Ã z_i ≠ 0, and the inverse is equal to

    Ã^{−1} = Z D^{−1} Z^T = Σ_{i=1}^n z_i z_i^T / p_i.

The sets of Ã-biconjugate vectors are computed by means of a (two-sided) Gram-Schmidt orthogonalization process with respect to the bilinear form associated with Ã; a sketch of the algorithm is given in Figure 2.2.11. In exact arithmetic this process can be completed if and only if à admits an LU factorization. AINV does not require a pattern prescribed in advance for the approximate inverse factors; sparsity is preserved during the process by discarding elements of the computed approximate inverse factor whose magnitude is smaller than a given positive threshold.

Compute D^{−1} and Z

Initialization phase:
    z_i^{(0)} = e_i (1 ≤ i ≤ n),   A = [a_1, ..., a_n]

The biconjugation algorithm:
    do i = 1, 2, ..., n
        do j = i, i+1, ..., n
            p_j^{(i−1)} = a_i^T z_j^{(i−1)}
        end do
        do j = i+1, ..., n
            z_j^{(i)} = z_j^{(i−1)} − (p_j^{(i−1)} / p_i^{(i−1)}) z_i^{(i−1)}
        end do
    end do
    z_i = z_i^{(i−1)},   p_i = p_i^{(i−1)},   i = 1, ..., n

Figure 2.2.11: The biconjugation algorithm - M = ZD^{−1}Z^T.
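A dense Python transcription of this biconjugation process (our own illustration for a symmetric Ã; a practical AINV code works with sparse data structures and drops against the threshold as described above) might read:

```python
import numpy as np

def ainv_factors(A, tol=1e-3):
    """Two-sided Gram-Schmidt biconjugation: returns Z and p such that
    z_i^T A z_j = 0 for i != j; entries of Z smaller than tol in modulus
    are dropped to preserve sparsity."""
    n = A.shape[0]
    Z = np.eye(n, dtype=A.dtype)
    p = np.zeros(n, dtype=A.dtype)
    for i in range(n):
        q = A[i, :] @ Z[:, i:]          # p_j = a_i^T z_j for j = i, ..., n-1
        p[i] = q[0]
        Z[:, i + 1:] -= np.outer(Z[:, i], q[1:] / p[i])
        Z[np.abs(Z) < tol] = 0.0        # value-based dropping
    return Z, p                          # M = Z diag(1/p) Z^T ~ A^{-1}
```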

An alternative approach was proposed by Kolotilina and Yeremin in a series of papers [95, 96, 97, 98]. This approach, known as FSAI, approximates Ã^{−1} by the factorization G^T G, where G is a sparse lower triangular matrix approximating the inverse of the lower triangular Cholesky factor L̃ of Ã. This technique has obtained good results on some difficult problems and is suitable for parallel implementation, but it requires an a priori prescription of the sparsity pattern for the approximate factors. The approximate inverse factor is computed by minimizing ‖I − GL̃‖_F^2, which can be accomplished without knowing the Cholesky factor L̃ by solving the normal equations

    {G L̃ L̃^T}_{ij} = {L̃^T}_{ij},   (i, j) ∈ S_{L̃},    (2.2.5)

where S_{L̃} is a lower triangular nonzero pattern for G. Equation (2.2.5) can be replaced by

    {G̃ Ã}_{ij} = I_{ij},   (i, j) ∈ S_{L̃},    (2.2.6)

where G̃ = D̃^{−1}G and D̃ is the diagonal of L̃. Then each row of G̃ can be computed independently by solving a small linear system. The preconditioned matrix has the form

    G Ã G^T = D̃ G̃ Ã G̃^T D̃.

The matrix D̃ is not known and is generally chosen so that the diagonal of G Ã G^T is all ones.
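Each row of G̃ in (2.2.6) leads to a small independent linear system; an illustrative dense sketch (ours) for row i with prescribed pattern indices J_i is:

```python
import numpy as np

def fsai_row(A_tilde, i, Ji):
    """Solve {G_tilde A_tilde}_{ij} = I_{ij} for j in the prescribed lower
    triangular pattern: returns row i of G_tilde restricted to Ji."""
    Asub = A_tilde[np.ix_(Ji, Ji)]               # small dense subblock
    rhs = (np.asarray(Ji) == i).astype(A_tilde.dtype)
    return np.linalg.solve(Asub, rhs)            # one small system per row
```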

Recently, another approximate inverse based on incomplete biconjugation has been proposed in [148]. The idea is to compute a unit lower triangular matrix L = [L_1, L_2, ..., L_n] of order n such that L^T Ã L is a nonsingular diagonal matrix, say D^{−1} = diag(d_{11}^{−1}, d_{22}^{−1}, ..., d_{nn}^{−1}). This is equivalent to the relations

    L_i^T Ã L_j = 0 if i ≠ j,   L_i^T Ã L_i ≠ 0.    (2.2.7)

In other words, L_i and L_j are Ã-biconjugate, and the inverse can then be written as Ã^{−1} = LDL^T. The procedure computes the factors of Ã^{−1} using relations (2.2.7), and preserves a sparsity pattern for the factor L by discarding entries of small modulus.
discarding entries with small modulus.<br />

In Table 2.2.10 we show the number of iterations needed by GMRES and<br />

SQMR preconditioned by AINV to reduce the normwise backward error by<br />

10 −5 on the five examples considered. On the most difficult problems, the<br />

per<strong>for</strong>mance of this preconditioner is very poor. For low values of density<br />

of Ã, AINV is less effective than a diagonal scaling, and its quality does not<br />

improve even when the <strong>dense</strong> coefficient matrix is used <strong>for</strong> the construction<br />

as shown in the results of Table 2.2.11. Both re-ordering and shift strategies<br />

do not improve the effectiveness of the preconditioner. We per<strong>for</strong>med<br />

in particular experiments with the reverse Cuthil-MacKee ordering [37],<br />

the minimum degree ordering [71, 141] and the spectral nested dissection<br />

ordering [114]. The best per<strong>for</strong>mance were observed with the minimum<br />

degree algorithm that in some cases enables to have smaller norm-wise<br />

backward error at the end of convergence. We mention that very similar<br />

or sometimes more disappointing results have been observed with the FSAI<br />

method and the other factorized approximate inverse proposed in [148].


                    GMRES(m)
Density of Ã    m=50   m=110   m=∞    SQMR

Example 1
  2%              –       –      –       –
  4%              –       –      –       –
  6%              –       –     313      –
  8%              –       –     350      –
 10%              –       –     207     306

Example 2
  2%              –       –      –       –
  4%              –       –     206     345
  6%             402     213    143     175
  8%             318     195    120     132
 10%             144      93     93      99

Example 3
  2%              –       –      –       –
  4%             264     101    101     105
  6%              56      51     51      48
  8%              37      37     37      34
 10%              31      31     31      29

Example 4
  2%              –       –     280     387
  4%              83      68     68      57
  6%              46      46     46      34
  8%              42      42     42      32
 10%              48      48     48      38

Example 5
  2%             177     142    121     111
  4%              –       –     213     251
  6%              –      407    194     210
  8%              –      404    179     207
 10%              –      328    154     189

Table 2.2.10: Number of iterations required by different Krylov solvers preconditioned by AINV to reduce the residual by 10^{-5}. The symbol '–' means that convergence was not obtained after 500 iterations.


                    GMRES(m)
Density of Ã    m=50   m=110   m=∞    SQMR

Example 1
  2%              –       –      –       –
  4%              –       –      –       –
  6%              –       –      –       –
  8%              –       –      –       –
 10%              –       –     483      –

Example 2
  2%              –       –      –       –
  4%              –       –     495      –
  6%              –       –     361      –
  8%              –       –     279      –
 10%              –       –     209     486

Example 3
  2%              –      288    153     176
  4%             101      79     79      78
  6%              66      57     57      52
  8%              42      42     42      38
 10%              36      36     36      34

Example 4
  2%              –       –     211     245
  4%              –      315    154     182
  6%              –      202    127     142
  8%             447     107    107     114
 10%             198      90     90      91

Example 5
  2%              –       –      –       –
  4%              –       –      –       –
  6%              –       –     259     474
  8%              –       –     229     358
 10%              –       –     216     374

Table 2.2.11: Number of iterations required by different Krylov solvers preconditioned by AINV to reduce the residual by 10^{-5}. The preconditioner is computed using the dense coefficient matrix. The symbol '–' means that convergence was not obtained after 500 iterations.


Possible causes of failure of factorized approximate inverses

One potential difficulty with the factorized approximate inverse method AINV is the tuning of the threshold parameter that controls the fill-in in the inverse factors. For a typical example, we display in Figure 2.2.12 the sparsity pattern of A^{−1} (on the left) and of L^{−1}, the inverse of its Cholesky factor (on the right), where all the entries smaller than 5.0 × 10^{−2} have been dropped after a symmetric scaling such that max_i |a_{ji}| = max_i |l_{ji}| = 1. The location of the large entries in the inverse matrix exhibits some structure. In addition, only a very small number of its entries have large magnitude compared to the others, which are much smaller. This fact has been successfully exploited to define various a priori pattern selection strategies for Frobenius-norm minimization preconditioners [2, 22] in non-factorized form. On the contrary, the inverse factors that are explicitly approximated by AINV and by FSAI can be totally unstructured, as shown in Figure 2.2.12(b). In this case, the a priori selection of a sparse pattern for the factors can be extremely hard, as no real structure is revealed, preventing the use of techniques like FSAI. In Figure 2.2.13 we plot the magnitude of the entries in the first column of A^{−1} (on the left) and of L^{−1} (on the right) against their row index. These plots indicate that any dropping strategy, either static or dynamic, may be very difficult to tune, as it can easily discard relevant information and potentially lead to a very poor preconditioner. Selecting too small a threshold would retain too many entries and lead to a fairly dense preconditioner; for instance, on the small example considered, if a threshold of 0.05 is used the preconditioner is 14.8% dense. A larger threshold would yield a sparser preconditioner, but might discard too many entries of moderate magnitude that are important for the quality of the preconditioner. On the same example, all the entries with magnitude smaller than 0.2 must be dropped to keep the density of the inverse factor around 3%. Because of these issues, finding a threshold that gives a good trade-off between sparsity and numerical efficiency is challenging and very problem-dependent.

Figure 2.2.12: Sparsity patterns of the inverse of A (on the left, density 8.75%) and of the inverse of its lower triangular factor (on the right, density 29.39%), where all the entries whose relative magnitude is smaller than 5.0 × 10^{−2} are dropped. The test problem, representative of the general trend, is a small sphere.

Figure 2.2.13: Histograms of the magnitude of the entries of the first column of A^{−1} and of its lower triangular factor. A similar behaviour has been observed for all the other columns. The test problem, representative of the general trend, is a small sphere.

2.2.4 SPAI

Frobenius-norm minimization is a natural approach for building explicit preconditioners. This method computes a sparse approximate inverse as the matrix M = {m_ij} which minimizes ‖I − MÃ‖_F (or ‖I − ÃM‖_F for right preconditioning) subject to certain sparsity constraints. Early references to this class can be found in [12, 13, 14, 65], and in [2] for some applications to boundary element matrices in electromagnetism. The Frobenius norm is usually chosen since it allows the decoupling of the constrained minimization problem into n independent linear least-squares problems, one for each column of M (when preconditioning from the right) or row of M (when preconditioning from the left).

The independence of these least-squares problems follows immediately from the identity

    ‖I − MÃ‖_F^2 = ‖I − ÃM^T‖_F^2 = Σ_{j=1}^n ‖e_j − Ã m_{j•}‖_2^2,    (2.2.8)

where e_j is the j-th unit vector and m_{j•} is the column vector representing the j-th row of M.


In the case of right preconditioning, the analogous relation

    ‖I − ÃM‖_F^2 = Σ_{j=1}^n ‖e_j − Ã m_{•j}‖_2^2    (2.2.9)

holds, where m_{•j} is the column vector representing the j-th column of M. Clearly, there is considerable scope for parallelism in this approach. However, the preconditioner is not guaranteed to be nonsingular, and the symmetry of à is generally not preserved in M. The main issue in the computation of the sparse approximate inverse is the selection of the nonzero pattern of M, that is, the set of indices

    S = { (i, j) ∈ [1, n]^2  s.t.  m_ij ≠ 0 }.

If the sparsity pattern of M is known, the nonzero structure of its j-th column is automatically determined, and defined as

    J = { i ∈ [1, n]  s.t.  (i, j) ∈ S }.

The least-squares solution involves only the columns of à indexed by J; we denote this subset by Ã(:, J). Because à is sparse, many rows of Ã(:, J) are usually null and do not affect the solution of the least-squares problems (2.2.9). Thus, if I is the set of indices corresponding to the nonzero rows of Ã(:, J), and if we define  = Ã(I, J), m̂_j = m_j(J) and ê_j = e_j(I), the actual "reduced" least-squares problems to solve are

    min ‖ê_j − Â m̂_j‖_2,   j = 1, ..., n.    (2.2.10)

Usually problems (2.2.10) are of much smaller size than problems (2.2.9).
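A minimal sketch of the static-pattern case (our own NumPy illustration of the reduced problem (2.2.10), solved column by column; the function name is ours) could be:

```python
import numpy as np

def spai_column(A_tilde, j, J):
    """Solve the reduced least-squares problem (2.2.10) for column j of M,
    given its prescribed nonzero row indices J (A_tilde is dense here for
    simplicity)."""
    AJ = A_tilde[:, J]
    I = np.nonzero(np.any(AJ != 0, axis=1))[0]   # nonzero rows of A(:, J)
    A_hat = AJ[I, :]
    e_hat = (I == j).astype(A_tilde.dtype)       # e_j restricted to rows I
    m_hat, *_ = np.linalg.lstsq(A_hat, e_hat, rcond=None)
    m_j = np.zeros(A_tilde.shape[1], dtype=A_tilde.dtype)
    m_j[J] = m_hat
    return m_j
```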

Two different approaches can be followed for the selection of the sparsity pattern of M: an adaptive technique that dynamically tries to identify the best structure for M, and a static technique, where the pattern of M is prescribed a priori on the basis of some heuristic. The idea is to keep M reasonably sparse while trying to capture the "large" entries of the inverse, which are expected to contribute the most to the quality of the preconditioner. A static approach, requiring an a priori nonzero pattern for the preconditioner, introduces significant scope for parallelism and has the advantage that the memory requirements and the computational cost of the setup phase are known in advance; however, it can be very problem-dependent. A dynamic approach is generally effective but usually very expensive. These methods usually start from a simple initial guess, like a diagonal matrix, and then improve the pattern until a criterion of the form ‖Ãm_j − e_j‖_2 < ε is satisfied for each j and a given ε > 0, e_j being the j-th column of the identity matrix, or until a maximum number of nonzeros in the j-th column m_j of M has been reached.


Different strategies can be adopted to enrich the initial nonzero structure of the j-th column of the preconditioner. The method known as SPAI [84] uses a heuristic that selects the new indices predicted to be the most effective at reducing the residual norm

    ‖r‖_2 = ‖Ã(:, J) m̂_j − ê_j‖_2.    (2.2.11)

Grote and Huckle [84] propose solving a one-dimensional minimization problem. If L = { l s.t. r(l) ≠ 0 }, then the new candidates are selected from Ĩ = { j s.t. Ã(L, j) ≠ 0 }. They suggest solving, for each j ∈ Ĩ, the problem

    min_{μ_j} ‖r + μ_j Ã e_j‖_2.

The solution of this problem is

    μ_j = − r^T Ã e_j / ‖Ã e_j‖_2^2,

and the residual norm of the updated solution is given by

    ρ_j^2 = ‖r‖_2^2 − (r^T Ã e_j)^2 / ‖Ã e_j‖_2^2.

The proposed heuristic selects the indices which maximize (r^T Ã e_j)^2 / ‖Ã e_j‖_2^2. More than one new candidate can be selected at a time, and the algorithm stops when either a maximum number of nonzeros per column is reached or the required accuracy is achieved. The algorithm can deliver very good preconditioners even on hard problems, but at the cost of large time and memory requirements, although the execution time can be significantly reduced by exploiting parallelism. A comparison of the construction cost with that of ILU-type methods can be found in [18, 76].
of parallelism. A comparison in terms of construction cost with ILU-type<br />

methods can be found in [18, 76].<br />

In Table 2.2.12, we show the number of iterations needed by Krylov solvers preconditioned by SPAI to solve the model problems. As for the other preconditioners, we consider different levels of density in the sparse approximation of A. Provided the preconditioner is dense enough, SPAI is quite effective in reducing the number of iterations. Also, the quality of the preconditioner on difficult problems can be remarkably improved if the dense coefficient matrix is used for the construction. For instance, on Example 1, if SPAI is computed using the full A, then a density of 2% for the approximate inverse enables the convergence of GMRES(80) in 75 iterations, whereas convergence is not achieved in 500 iterations if the approximate inverse is computed using a sparse approximation of A. However, the adaptive strategy requires a prohibitive time. The construction of the approximate inverse using 6% density for Ã takes nearly one hour of computation on a SGI



Origin 2000 for Example 4 and three hours for Example 5. When using the dense matrix A in the computation, the construction of the preconditioner for the same examples takes more than one day.

2.2.5 SLU

In this section we use the sparsified matrix Ã as an implicit preconditioner; that is, the sparsified matrix is factorized using ME47, a sparse direct solver from HSL [87], and those exact factors are used as the preconditioner. It thus represents an extreme case with respect to ILU(0), since complete fill-in is allowed in the factors. This method will be referred to as SLU. This approach, although not easily parallelizable, is generally quite effective on this class of applications for dense enough sparse approximations of A.
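The idea can be sketched as follows (illustrative Python; SciPy's splu stands in for the ME47 solver used in the thesis, and the relative thresholding rule is an assumption):

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import splu, LinearOperator, gmres

    def slu_preconditioner(A, rel_threshold):
        """Sparsify the dense matrix A by a global relative threshold, factorize
        the sparsified matrix exactly with a sparse direct solver, and return the
        factors wrapped as a preconditioning operator M ~= A^{-1}."""
        mask = np.abs(A) >= rel_threshold * np.abs(A).max()
        A_tilde = sp.csc_matrix(np.where(mask, A, 0.0))
        lu = splu(A_tilde)                       # complete fill-in is allowed
        n = A.shape[0]
        return LinearOperator((n, n), matvec=lu.solve, dtype=A.dtype)

    # usage sketch: x, info = gmres(A, b, M=slu_preconditioner(A, 1e-2), restart=50)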

In Table 2.2.13 we show the number of iterations required by different Krylov solvers preconditioned by SLU to reduce the normwise backward error by a factor of 10⁻⁵. As shown in the table, when the preconditioner is very sparse, the numerical quality of this approach deteriorates and the Frobenius-norm minimization method is more robust.



Example 1
Density of Ã      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
                  m=10   m=30   m=50   m=80   m=110
  2%                –      –      –      –      –            –         –       –
  4%                –      –     336     79     79          333       254     370
  6%                –      –     150     65     65          269       243     312
  8%                –     242     82     56     56          175       195     240
 10%                –     237     50     50     50          127       174     196

Example 2
Density of Ã      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
                  m=10   m=30   m=50   m=80   m=110
  2%                –      –      –      –     212           –         –       –
  4%                –      –     494     79     79          371       315      –
  6%                –      –     185     72     72          291       279     432
  8%                –      –     134     66     66          277       287     406
 10%                –      –     109     62     62          229       267     458

Example 3
Density of Ã      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
                  m=10   m=30   m=50   m=80   m=110
  2%                –      –      –      –      –            –         –       –
  4%                –      –     194     72     72          187       255     340
  6%                –     230     80     55     55          153       177     222
  8%                –     151     48     48     48          181       162     196
 10%                –     151     46     46     46          157       159     208

Example 4
Density of Ã      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
                  m=10   m=30   m=50   m=80   m=110
  2%                –      –     253     81     81           –        309     394
  4%                –      –     187    113     85          374       331     424
  6%                –     401    153     76     76          288       270     370
  8%                –      90     47     47     47           76       171     170
 10%               41      28     28     28     28           35       105      74

Example 5
Density of Ã      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
                  m=10   m=30   m=50   m=80   m=110
  2%                –      –      –      –      –            –         –       –
  4%                –     183    138     73     73          213       457     338
  6%                –      –     194    122     93           –        448     442
  8%                –     289    137     71     71           –        345     358
 10%                –     283    100     68     68           –        334     266

Table 2.2.12: Number of iterations required by different Krylov solvers preconditioned by SPAI to reduce the residual by 10⁻⁵. The symbol '–' means that convergence was not obtained after 500 iterations.



Example 1
Density of Ã      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
                  m=10   m=30   m=50   m=80   m=110
  2%              +500   +500   +500    364    241         +500      +500    486
  4%              +500   +500    128     65     65          136       111    114
  6%                60     31     31     31     31           23        36     28
  8%                51     27     27     27     27           21        34     22
 10%                33     22     22     22     22           14        25     17

Example 2
Density of Ã      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
                  m=10   m=30   m=50   m=80   m=110
  2%              +500   +500   +500    288    109          489       290    229
  4%                50     30     30     30     30           18        42     22
  6%                40     27     27     27     27           16        38     21
  8%                32     24     24     24     24           14        35     19
 10%                26     21     21     21     21           13        30     16

Example 3
Density of Ã      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
                  m=10   m=30   m=50   m=80   m=110
  2%              +500   +500    330    171    108          207       234    206
  4%                38     27     27     27     27           16        29     19
  6%                27     21     21     21     21           11        22     14
  8%                21     17     17     17     17           10        17     12
 10%                18     15     15     15     15            9        16     10

Example 4
Density of Ã      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
                  m=10   m=30   m=50   m=80   m=110
  2%                37     35     34     34     34           17        39     21
  4%                23     21     21     21     21           10        24     14
  6%                18     17     17     17     17            9        18     10
  8%                15     15     15     15     15            8        16      9
 10%                14     13     13     13     13            7        15      9

Example 5
Density of Ã      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
                  m=10   m=30   m=50   m=80   m=110
  2%                72     45     42     42                  34        46     37
  4%                42     29     29     29     29           23        32     25
  6%                29     26     26     26     26           20        28     16
  8%                29     23     23     23     23           17        25     15
 10%                28     21     21     21     21           17        25     18

Table 2.2.13: Number of iterations required by different Krylov solvers preconditioned by SLU to reduce the residual by 10⁻⁵. The symbol '+500' means that convergence was not obtained after 500 iterations.



2.2.6 Other preconditioners

A third class of explicit methods deserves to be mentioned here, although we will not consider it in our numerical experiments. It is based on ILU techniques and, in the general nonsymmetric case, builds the sparse approximate inverse by first performing an incomplete LU factorization Ã ≈ L̄Ū and then approximately inverting the L̄ and Ū factors by solving the 2n triangular linear systems

    \bar{L} x_i = e_i , \qquad \bar{U} y_i = e_i , \qquad 1 \le i \le n .

These systems are solved approximately, either by prescribing sparsity patterns for the approximate inverses of L̄ and Ū and using a Frobenius-type method, or by using the adaptive SPAI method without any pattern fixed in advance. Another approach, which has provided better results, consists in solving the 2n triangular systems by customary forward and backward substitution, respectively, and adopting a dropping strategy, based either on position or on values, to maintain sparsity in the columns of the computed inverse factors. Generally two different levels of incompleteness are applied, rather than one as in the other approximate inverse methods. These preconditioners are not easy to use: relying on an ILU factorization, they are almost useless for highly nonsymmetric, indefinite matrices, and since incomplete processes are strongly sequential, the preconditioner building phase is not entirely parallelizable, although the independence of the 2n triangular solves suggests good scope for parallelism. References to this class can be found in [3, 40, 133].
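As an illustration of the second variant, the following sketch (illustrative Python, not from the references above) approximately inverts a sparse triangular factor column by column with value-based dropping:

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import spsolve_triangular

    def approx_inverse_factor(L, keep):
        """Approximately invert a sparse lower triangular factor L by forward
        substitution on L x_i = e_i, keeping only the `keep` largest-magnitude
        entries of each computed column (value-based dropping)."""
        n = L.shape[0]
        Lcsr = sp.csr_matrix(L)
        cols = []
        for i in range(n):
            e = np.zeros(n); e[i] = 1.0
            x = spsolve_triangular(Lcsr, e, lower=True)
            idx = np.argsort(np.abs(x))[-keep:]       # indices of retained entries
            cols.append(sp.csc_matrix((x[idx], (idx, np.zeros(len(idx), dtype=int))),
                                      shape=(n, 1)))
        return sp.hstack(cols).tocsc()                # sparse approximation of L^{-1}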

2.3 Concluding remarks

In this chapter we have established the need for preconditioning linear systems of equations which arise from the discretization of boundary integral equations in electromagnetism. We have discussed several standard preconditioners based on sparsification strategies and have studied and compared their numerical behaviour on a set of model problems that may be representative of real electromagnetic calculations. We have shown that the incomplete factorization process is highly unstable on indefinite matrices like those arising from the discretization of the EFIE formulation. Using numerical experiments we have shown that the triangular factors computed by the factorization can be very ill-conditioned, and that the long recurrences associated with the triangular solves are unstable. As an attempt at a possible remedy, we have introduced a small complex shift to move the eigenvalues of the preconditioned system along the imaginary axis and thus try to avoid a possible cluster of eigenvalues close to zero. A small diagonal complex shift can help to compute a more stable factorization.



However, suitable strategies remain to be devised to tune the optimal value of the shift and to predict its effect. Factorized approximate inverses, namely AINV and FSAI, exhibit poor convergence behaviour because the inverse factors can be totally unstructured; neither reordering nor shift strategies improve their effectiveness. Any dropping strategy, either static or dynamic, may be very difficult to tune as it can easily discard relevant information and potentially lead to a very poor preconditioner. Among the different techniques, Frobenius-norm minimization methods are quite efficient because they deliver a good rate of convergence. However, they require a high computational effort, so their use is mainly effective in a parallel setting. To be computationally affordable on dense linear systems, Frobenius-norm minimization preconditioning techniques require a suitable strategy to identify the relevant entries to consider in the original matrix A, in order to define small least-squares problems, as well as an appropriate sparsity structure for the approximate inverse. Prescribing a pattern in advance for the preconditioner can greatly reduce the amount of work in terms of CPU time. The problem of cost is evident for the computation of SPAI, since fast convergence can be obtained for high values of the sparsity ratio, but then the adaptive strategy requires a prohibitive time and computational cost in a sequential environment. Compared to sparse approximate inverse methods, SSOR is generally slower, but is very cheap to compute. Its main drawback is that it is not parallelizable; in addition, for much larger problems, the cost per iteration will grow so that this preconditioner will no longer be competitive with the other techniques. Finally, the SLU preconditioner, although generally quite effective on this class of applications, is not easily parallelizable and requires dense enough sparse approximations of A. This preconditioner can be expensive in terms of both memory and CPU time for the solution of large problems, and thus it is mainly interesting for comparison purposes.




Chapter 3

Sparse pattern selection strategies for robust Frobenius-norm minimization preconditioners

In the previous chapter, we established the need for preconditioning linear systems of equations arising from the discretization of boundary integral equations (expressed via the EFIE formulation) in electromagnetism. We briefly discussed some preconditioners and compared their performance on a set of model problems arising both from academic and from industrial applications. The numerical results suggest that sparse approximate inverse techniques can be good candidates to precondition this class of problems efficiently. In particular, the Frobenius-norm minimization approach can greatly reduce the number of iterations needed compared with the implicit approach based on incomplete factorization. In addition, Frobenius-norm minimization is inherently parallel. To be computationally affordable on dense linear systems, Frobenius-norm minimization preconditioners require a suitable strategy to identify the relevant entries to consider in the original matrix A, in order to define small least-squares problems, as well as an appropriate sparsity structure for the approximate inverse.

In this chapter, we propose some efficient static nonzero pattern selection strategies both for the preconditioner and for the selection of the entries of A. In Section 3.1, we overview both dynamic and static approaches to compute the sparsity pattern of Frobenius-norm minimization preconditioners. In Section 3.2, we introduce and compare some strategies to prescribe in advance the nonzero structure of the preconditioner in electromagnetic applications. In Section 3.3, we propose the use of a different




pattern selection procedure for the original matrix from that used for the preconditioner and finally, in Section 3.4, we illustrate the numerical and computational efficiency of the proposed preconditioners on a set of model problems.

3.1 Introduction and motivation

We introduced Frobenius-norm minimization in Section 2.2.4. The idea is to compute the sparse approximate inverse of a matrix A as the matrix M which minimizes ‖I − MA‖_F (or ‖I − AM‖_F for right preconditioning) subject to certain sparsity constraints. The main issue is the selection of the nonzero pattern of M. The idea is to keep M reasonably sparse while trying to capture the "large" entries of the inverse, which are expected to contribute the most to the quality of the preconditioner. For this purpose, two approaches can be followed: an adaptive technique that dynamically tries to identify the best structure for M, and a static technique, where the pattern of M is prescribed a priori based on some heuristics.

A simple approach is to prescribe the locations of the nonzeros of M before computing their actual values. When the coefficient matrix has a special structure or special properties, efforts have been made to find a pattern that can retain the entries of A⁻¹ having large modulus [42, 48, 49, 138], and indeed some theoretical studies have shown that there are cases where the large entries in A⁻¹ are clustered near the diagonal [58, 106]. If A is row diagonally dominant, then the entries in the inverse decay columnwise, and vice versa [138]. When A is a banded SPD matrix, the entries of A⁻¹ decay exponentially along each row or column; more precisely, if b_{ij} is the element located at the i-th row and j-th column of A⁻¹, then

    |b_{ij}| \le C \gamma^{|i-j|} ,        (3.1.1)

where γ < 1 and C > 0 are constants. In this case a banded M would be a good approximation to A⁻¹ [49]. For many PDE problems the entries of the inverse exhibit some decaying behaviour, and a good sparse pattern for the approximate inverse can be computed in advance. However, the constant C in relation (3.1.1) can be very large and the decay unacceptably slow, or the decay can be non-monotonic and thus hardly predictable [139].
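The decay bound (3.1.1) is easy to observe numerically; the following small experiment (an illustrative example, not from the thesis) prints part of a row of the inverse of a tridiagonal SPD matrix:

    import numpy as np

    # Build a tridiagonal SPD matrix and inspect the decay of its inverse
    n = 50
    A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    B = np.linalg.inv(A)
    i = n // 2
    for j in range(i, i + 6):
        # |B[i, j]| shrinks roughly geometrically as |i - j| grows
        print(f"|B[{i},{j}]| = {abs(B[i, j]):.2e}")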

For sparse matrices, the nonzero structure of the approximate inverse can be computed from graph information on the coefficient matrix. The sparsity structure of a sparse matrix A of order n is represented by a directed graph G(A), where the vertices are the integers {1, 2, ..., n} and the edges connect pairs of distinct vertices (i, j) corresponding to nonzero off-diagonal entries a_{ij} of A. The inverse will contain a nonzero in the (i, j) location whenever there is a directed path connecting vertex i to vertex j in G(A) [72].



Several heuristics can be used to traverse the graph along specific directions and select a suitable subset of vertices of G(A) to construct the sparsity pattern of the approximate inverse. Benson and Frederickson [13] define the structure for the j-th column of the approximate inverse, in the case of structurally symmetric matrices with a full diagonal, by selecting in G(A) vertex j and its q-th level nearest neighbours. They call the matrices defined by these patterns q-local matrices. A 0-local matrix has a diagonal structure, while a 1-local matrix has the same sparsity pattern as A. Taking for the sparse approximate inverse the same pattern as A generally works well only for specific classes of problems; using more levels can improve the quality of the preconditioner, but the storage can become prohibitive when q is increased, and even q = 2 is impractical in many cases [61].

The direction of the path in the graph can be selected based on physical considerations dictated by the decay of the magnitude of the entries observed in the discrete Green's function for many problems [139]. The discrete Green's function can be considered as a row or a column of the exact inverse depicted on the physical computational grid. Dropping or sparsification can help to identify the most relevant interactions in the direct problem and to select suitable search directions in the graph. For instance, dropping entries of A smaller than a global threshold can detect anisotropy in the underlying problem and reveal it when no additional physical information is available. Chow [33] proposes combining sparsification with the use of patterns of powers of the sparsified matrix for preconditioning linear systems arising from the discretization of PDE problems. Sparsification can remarkably reduce the construction cost of the preconditioner, and the use of matrix powers makes it possible to retain the largest entries in the Green's function. A post-processing stage, called filtration, can be included to drop small-magnitude entries in the sparse approximate inverse and reduce the cost of storing and applying the preconditioner. However, the choice of these parameters is problem-dependent, and this strategy is not guaranteed to be effective on systems not arising from PDEs.
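A minimal sketch of this sparsification-plus-powers heuristic (illustrative Python; the relative threshold rule and the power loop are assumptions, not Chow's implementation):

    import scipy.sparse as sp

    def power_pattern(A, tau, q):
        """Pattern heuristic: drop entries of A below the global threshold tau
        (relative to the largest modulus), then use the nonzero pattern of the
        q-th power of the sparsified matrix for the approximate inverse."""
        A = sp.csr_matrix(A)
        S = A.multiply(abs(A) >= tau * abs(A).max())   # sparsified matrix
        S.eliminate_zeros()
        P = (S != 0).astype(int)                       # boolean pattern of S
        pattern = P.copy()
        for _ in range(q - 1):
            pattern = ((pattern @ P) != 0).astype(int) # pattern of S^q
        return pattern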

The difficulty of extracting a good sparsity pattern for the approximate inverse of matrices with a general sparsity pattern has motivated the investigation of adaptive strategies that compute the pattern of the approximate inverse dynamically. The adaptive procedure known as SPAI has already been described in Section 2.2.4. The procedure described in [35] uses a few steps of an iterative solver, like minimal residual, to approximately minimize the least-squares problems of relation (2.2.9). The sparsity pattern automatically emerges during the computation, and a dual threshold strategy is adopted to drop small entries either in the search directions or in the iterates. To control costs, operations must be performed in sparse-sparse mode, meaning that sparse matrix-sparse vector multiplications are performed. These algorithms usually compute the approximate inverse starting with an initial pattern and estimate the



accuracy of the computed preconditioner by monitoring the 2-norm of the residual R = I − AM. If the norm is larger than a user-defined threshold, or the number of nonzeros used is less than a fixed maximum, the pattern is enlarged according to some heuristics and the approximate inverse is recomputed. The process is repeated until the required accuracy is attained. We refer to these as adaptive procedures.

We have already mentioned the problem of cost for the computation of SPAI. Fast convergence can be obtained for high values of the sparsity ratio, but then the adaptive strategy requires a prohibitive time and computational cost in a sequential environment. In general, adaptive strategies can solve much more general or harder problems but tend to be very expensive. The use of effective static pattern selection strategies can greatly reduce the amount of work in terms of CPU time and substantially improve the overall setup process, introducing significant scope for parallelism. Also, the memory requirements and the computational cost of the setup phase are known in advance.

In the next sections, we investigate nonzero pattern selection strategies for the computation of sparse approximate inverses on electromagnetic problems. We consider both methods based on the magnitude of the entries and methods which exploit geometric or topological information from the underlying meshes. The pattern is computed in a preprocessing step and then used to compute the entries of the preconditioner.

3.2 Pattern selection strategies for Frobenius-norm minimization methods in electromagnetism

3.2.1 Algebraic strategy

The boundary element method discretizes integral equations on the surface of the scattering object, generally introducing a very localized strong coupling among the edges of the underlying mesh. Each edge is strongly connected to only a few neighbours while far-away connections, although not null, are much weaker. This means that a very sparse matrix can still retain the most relevant contributions from the singular integrals that give rise to dense matrices.

Owing to the decay of the discrete Green's function, the inverse of A may exhibit a structure very similar to that of A. Figure 3.2.1 shows the typical decay of the discrete Green's function for Example 5, a scattering problem from a small sphere, which is representative of the general trend. In the density coloured plot, large to small magnitude entries in the inverse matrix



are depicted in different colours, from red to green, yellow and blue. The discrete Green's function peaks at a point, then decays rapidly, and far from the diagonal only a small set of entries have large magnitude.

Figure 3.2.1: Pattern structure of A⁻¹. The test problem is Example 5.

In this case, a good pattern for the sparse approximate inverse is likely to be the nonzero pattern of a sparse approximation to A, constructed by dropping all the entries lower than a prescribed global threshold, as suggested for instance in [93]. We refer to this approach as the algebraic approach.

The dropping heuristics described in Section 2.2 can be used to compute the sparse pattern for the approximate inverse. In [2], these approaches were compared and showed similar results in their ability to cluster the eigenvalues of the preconditioned systems. The first and the last heuristics are the simplest and are more suitable for a parallel implementation. In addition, the first one has the advantage of placing the number of nonzero entries in the approximate inverse under complete user control, and of achieving perfect load balancing in a parallel implementation. A drawback common to all the heuristics is that we need some deus ex machina to find optimal values for the parameters. In the numerical experiments, we have selected the strategy where, for each column of A, the k entries of largest modulus are retained (k ≪ n a positive integer).
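A minimal sketch of this column-wise selection (illustrative Python, not code from the thesis):

    import numpy as np
    import scipy.sparse as sp

    def k_largest_per_column(A, k):
        """For each column of the dense matrix A, keep only the k entries of
        largest modulus; the result defines the sparse pattern (and values)
        used to build the preconditioner."""
        A = np.asarray(A)
        rows, cols, vals = [], [], []
        for j in range(A.shape[1]):
            idx = np.argsort(np.abs(A[:, j]))[-k:]   # k largest in modulus
            rows.extend(idx)
            cols.extend([j] * k)
            vals.extend(A[idx, j])
        return sp.csc_matrix((vals, (rows, cols)), shape=A.shape)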

The algebraic strategy generally works well and competes with the approach that adaptively defines the nonzero pattern, as implemented in the SPAI preconditioner described in reference [84]. Nevertheless it



suffers some drawbacks that put severe limits on its use in practical applications. For large problems, accessing all the entries of the matrix A becomes too expensive or even impossible. This is the case in the fast multipole framework, where not all the entries of the matrix A are even available. In addition, on complex geometries, a pattern for the sparse approximate inverse computed by using information solely from A may lead to a poor preconditioner. These two main drawbacks motivate the investigation of more appropriate techniques to define a sparsity pattern for the preconditioner.

Because we work in an integral equation context, we can use more information than just the entries of the matrix of the discretized problem. In particular, we can exploit the underlying mesh and extract further relevant information to construct the preconditioner. Two types of information are available from the mesh:

• the connectivity graph, describing the topological neighbourhood among the edges, and

• the coordinates of the nodes in the mesh, describing geometric neighbourhoods among the edges.

3.2.2 Topological strategy

In the integral equation context that we consider, the surface of the object is discretized by a triangular mesh (see Figure 3.2.2). Each degree of freedom (DOF), representing an unknown in the linear system, corresponds to the vectorial flux across an edge of the mesh.

When the object geometry is smooth, only neighbouring edges can have a strong interaction with each other, while far-away connections are generally much weaker. Thus an effective pattern for the sparse approximate inverse can be prescribed by exploiting topological information related to the near field. The sparsity pattern for any row of the preconditioner can be defined according to the concept of level k neighbours, as introduced in [115]. Figure 3.2.3 shows the hierarchical representation of the mesh in terms of topological levels. Level 1 neighbours of a DOF are the DOF itself plus the four DOFs belonging to the two triangles that share the edge corresponding to the DOF. Level 2 neighbours are all the level 1 neighbours plus the DOFs of the triangles that are neighbours of the two triangles considered at level 1, and so forth.
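A minimal sketch of this level-k construction (illustrative Python; the edge2tri and tri2edges adjacency maps are assumed to have been built beforehand from the mesh):

    from collections import deque

    def level_k_neighbours(edge2tri, tri2edges, dof, k):
        """Breadth-first collection of the level-k topological neighbours of a
        DOF (a mesh edge): level 1 adds the edges of the two triangles sharing
        the DOF, level 2 adds the edges of their neighbouring triangles, etc."""
        level = {dof: 0}
        frontier = deque([dof])
        while frontier:
            e = frontier.popleft()
            if level[e] == k:
                continue
            for t in edge2tri[e]:            # triangles containing this edge
                for e2 in tri2edges[t]:      # their edges are one level further
                    if e2 not in level:
                        level[e2] = level[e] + 1
                        frontier.append(e2)
        return set(level)                    # row pattern for this DOF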

In Figures 3.2.4 and 3.2.5 we plot, for each pair of DOFs of the mesh of Example 1, the magnitude of the associated entry in A and in A⁻¹ with respect to their relative level of neighbours. The large entries in A⁻¹ derive from the interaction of a very localized set of edges in the mesh, so that by retaining a few levels of neighbours for each DOF an effective preconditioner is likely to be constructed.



Figure 3.2.2: Example of discretized mesh.

Figure 3.2.3: Topological neighbours of a DOF in the mesh.

Three levels can generally provide a good pattern for constructing an effective sparse approximate inverse. Using more levels increases the computational cost but does not substantially improve the quality of the preconditioner. We will refer to this pattern selection strategy as the topological strategy. In Figure 3.2.6 we show how the density of nonzeros in the preconditioner evolves when the number of levels is increased.



It can be seen that for up to five levels the preconditioner is still sparse, with a density lower than 10%. Considering too many topological levels may cause the unnecessary introduction of nonzeros in the sparse approximation; some of these nonzero entries do not contribute much to the quality of the approximation.

Figure 3.2.4: Topological localization in the mesh for the large entries of A (magnitude vs. levels). The test problem is Example 1 and is representative of the general behaviour.

3.2.3 Geometric strategy

When the object geometry is not smooth, two edges that are far away in the topological sense can have a strong interaction with each other, so that they are strongly coupled in the inverse matrix. For the scattering problem of Example 1, we plot in Figures 3.2.7 and 3.2.8, for the interaction of each pair of edges in the mesh, the magnitude of the associated entry in A and in A⁻¹ with respect to their distance in terms of wavelength. The largest entries of A⁻¹ on smooth geometries may come from the interaction of a geometrically localized set of entries in the mesh. If we construct the sparse pattern for the inverse by only using information related to A, we may retain many small entries in the preconditioner, contributing marginally to its quality, but may neglect some of the large ones, potentially damaging the quality of the preconditioner. Also, when the surface of the object is very non-smooth, these large entries may come from the interaction of edges that are far away or not connected in a topological sense, but are neighbours in a geometric sense. Thus they cannot be detected by using only topological information related to the near field.



Figure 3.2.5: Topological localization in the mesh for the large entries of A⁻¹ (magnitude vs. levels). The test problem is Example 1 and is representative of the general behaviour.

Figure 3.2.8 suggests that we can select the pattern for the preconditioner using physical information, that is: for each edge we select all the edges within a sufficiently large sphere that defines our geometric neighbourhood. By using a suitable size for this sphere, we hope to include the most relevant contributions to the inverse and consequently to obtain an effective sparse approximate inverse. This selection strategy will be referred to as the geometric strategy. In Figure 3.2.9 we show how the density of nonzeros in the preconditioner evolves when the radius of the sphere increases.
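A minimal sketch of this neighbourhood search (illustrative Python using a k-d tree; representing each edge by its midpoint is an assumption made for the sketch):

    import numpy as np
    from scipy.spatial import cKDTree

    def geometric_pattern(midpoints, radius):
        """For each edge (DOF), select all edges whose midpoints lie within a
        sphere of the given radius; the returned lists of indices define the
        rows of the sparsity pattern of the preconditioner.
        midpoints: (n, 3) array of edge midpoint coordinates."""
        tree = cKDTree(midpoints)
        # neighbours[i] lists the DOFs inside the sphere centred at DOF i
        neighbours = tree.query_ball_point(midpoints, r=radius)
        return neighbours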

3.2.4 Numerical experiments

In this section, we compare the different strategies described above in the solution of our test problems. Using the three pattern selection strategies for M, we denote by

• M_a the preconditioner computed by using the algebraic strategy,

• M_t the preconditioner computed by using the topological strategy,

• M_g the preconditioner computed by using the geometric strategy,

• SPAI the preconditioner constructed by using the dynamic strategy implemented by [77] and described in Section 2.2.4.

To evaluate the effectiveness of the proposed strategies, we first consider using the dense matrix A to construct the preconditioners M_a, M_t, M_g and SPAI. This requires the solution of large dense least-squares problems.
SP AI. This requires the solution of large <strong>dense</strong> least-squares problems.




Figure 3.2.6: Evolution of the density (percentage) of the pattern computed for an increasing number of levels. The test problem is Example 1. This is representative of the general behaviour.

The density of the preconditioner varies from one problem to another for the same value of the distance parameter chosen to define M_g. As Figure 3.2.8 shows, and tests on all the other examples confirm, the entries corresponding to edges contained within a sphere of radius 0.12 times the wavelength retain many of the large entries of the inverse while giving rise to quite a sparse preconditioner. For all our numerical experiments, we choose the value of k in the construction of M_a and SPAI, and the level of neighbours used to generate M_t, so that they have the same density as M_g, when necessary discarding some small entries of the preconditioner so that all have the same number of entries.

As for the numerical experiments reported in the previous chapter, we show results for different Krylov solvers. The stopping criterion in all cases consists in reducing the normwise backward error by 10⁻⁵. The symbol '–' means that convergence was not obtained after 500 iterations. In each case, we took as the initial guess x₀ = 0, and the right-hand side was such that the exact solution of the system was known. We performed different tests with different known solutions, observing identical results. All the numerical experiments were performed in double precision complex arithmetic on a SGI Origin 2000, and the numbers of iterations reported here are for left preconditioning. Very similar results were obtained when preconditioning from the right.
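For reference, the stopping quantity can be computed as follows (illustrative Python; the particular norms used here are an assumption, as the thesis does not spell out the formula at this point):

    import numpy as np

    def normwise_backward_error(A, x, b):
        """Normwise backward error ||b - A x|| / (||A|| ||x|| + ||b||):
        the iteration is stopped once this quantity falls below 1e-5."""
        r = b - A @ x
        return np.linalg.norm(r) / (np.linalg.norm(A) * np.linalg.norm(x)
                                    + np.linalg.norm(b))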

From the results shown in Table 3.2.1, we first note that all the preconditioners accelerate the convergence of the Krylov solvers, and in some cases enable convergence when the unpreconditioned solver diverges or converges very slowly.



Figure 3.2.7: Geometric localization in the mesh for the large entries of A (magnitude vs. distance). The test problem is Example 1. This is representative of the general behaviour.

These numerical experiments also highlight the advantages of the geometric strategy. It not only outperforms the algebraic approach and is more robust than the topological approach, which has a similar computational complexity, but it also generally outperforms the adaptive approach implemented in SPAI, which is much more sophisticated and more expensive in execution time and memory. SPAI competes with M_g only on Example 1, where the density of the preconditioner is higher. This trend, namely that the denser the preconditioner the more efficient SPAI is, has been observed on many other examples. However, for sparse preconditioners, SPAI may be quite poor, as illustrated on Example 4, where preconditioned GMRES(30) and Bi-CGStab are slower than without a preconditioner, and the iteration diverges for GMRES(10) with the SPAI preconditioner while it converges with the other three preconditioners. On the non-smooth geometry, that is Example 2, an explanation of why the geometric approach should lead to a better sparse preconditioner is suggested by Figure 3.2.10. Some far-away edges in the connectivity graph, those from each side of the break, are weakly connected in the mesh but can have a strong interaction with each other and can lead to large entries in the inverse matrix.



Figure 3.2.8: Geometric localization in the mesh for the large entries of A⁻¹ (magnitude vs. distance). The test problem is Example 1. This is representative of the general behaviour.


Figure 3.2.9: Evolution of the density (percentage) of the pattern computed for larger geometric neighbourhoods (distance/wavelength). The test problem is Example 1. This is representative of the general behaviour.



Example 1 – Density of M = 5.03%
Precond.      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
              m=10   m=30   m=50   m=80   m=110
Unprec.         –      –      –     251    202          223       231     175
M_j             –      –     465    222    174          239       210     169
M_a           219     135     96     72     72           86       107      72
M_t           100      49     36     36     36           35        42      32
M_g           124      68     46     46     46           44        58      38
SPAI            –      67     44     44     44           48        50      43

Example 2 – Density of M = 1.59%
Precond.      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
              m=10   m=30   m=50   m=80   m=110
Unprec.         –      –      –     398    289          359       403     249
M_j             –      –     473    330    243          257       354     228
M_a           472     273    239    207    184          330       313     141
M_t             –     470    346    243    195          187       275     158
M_g            90      72     55     52     52           44        82      40
SPAI            –      –      99     61     61          168        97     111

Example 4 – Density of M = 1.04%
Precond.      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
              m=10   m=30   m=50   m=80   m=110
Unprec.         –     224    191    158    147          177       170     118
M_j           350     211    178    153    140          188       152     110
M_a           212     157    141    132    123          131       145     115
M_t           288     187    160    146    139          145       156      98
M_g            63      51     41     41     41           37        47      32
SPAI            –     370    184    112     84          256        96      85

Table 3.2.1: Number of iterations using the preconditioners based on dense A.

3.3 Strategies for the coefficient matrix

When the coefficient matrix of the linear system is dense, the construction of even a very sparse preconditioner may become too expensive in execution time as the problem size increases. Both memory and execution time are significantly reduced by replacing A with a sparse approximation. On general problems, this approach can cause a severe deterioration of the quality of the preconditioner; in the context of the Boundary Element Method (BEM), since a very sparse matrix can retain the most relevant contributions to the singular integrals, it is likely to be more effective. The use of a sparse matrix substantially reduces the size of the least-squares problems, which can then be solved efficiently by direct methods.



Figure 3.2.10: Mesh of Example 2.


The algebraic heuristic described in the previous sections is well suited to sparsifying A. In [2], the same nonzero sparsity pattern is selected both for A and for M; in that case, especially when the pattern is very sparse, the computed preconditioner may be poor on some geometries. The effect of replacing A with its sparse approximation is highlighted for some problems in Figure 3.3.12, where we display the sparsified pattern of the inverse of the sparsified A. We see that the resulting pattern is very different from the sparsified pattern of the inverse of A shown in Figure 3.3.11.

A possible remedy is to increase the density of the patterns for both A and M. To a certain extent this improves the convergence, but the computational cost of generating the preconditioner grows almost cubically with respect to the density. A cheaper remedy is to choose a different number of nonzeros for the patterns of A and M, with fewer entries in the preconditioner than in Ã, the sparse approximation of A. To illustrate this effect, we show in Table 3.3.2 the number of iterations of preconditioned GMRES(50), where the preconditioners are built by using either the same sparsity pattern for A or a two, three or five times denser pattern for A.

Except when the preconditioner is very sparse, increasing the density of the pattern imposed on A for a given density of M accelerates the convergence as expected, getting quite rapidly very close to the number of iterations required when using the full A. The additional cost in terms of CPU time is negligible, as can be seen in Figure 3.3.13 for experiments on Example 1. This is due to the fact that the complexity of the QR factorization used to solve the least-squares problems is proportional to the square of the number of columns times the number of rows. Thus, increasing the number of rows, that is the number of entries of Ã, is much cheaper in terms of overall CPU time than increasing the density of the preconditioner, that is the number of columns of the least-squares problems.



Figure 3.3.11: Nonzero pattern of sparsified(A⁻¹), that is, of A⁻¹ when the smallest entries are discarded. The test problem is Example 5.

Example 1
                            Percentage density of M
Density strategy     1     2     3     4     5     6     7     8     9    10
Same                 –     –    299   146    68    47    47    42    37    39
2 times              –     –    248   155    76    46    40    39    39    38
3 times              –    253   207   109    49    39    39    37    35    34
5 times              –    258   213    99    48    37    38    34    33    33
Full A              364   359   144    96    46    35    35    34    32    31

Table 3.3.2: Number of iterations for GMRES(50) preconditioned with different values for the density of M, using the same pattern for A and larger patterns. A geometric approach is adopted to construct the patterns. The test problem is Example 1. This is representative of the general behaviour observed.

Notice that this observation is true for both left and right preconditioning because, according to (2.2.8) and (2.2.9), the smaller dimension of the matrices involved in the least-squares problems always corresponds to the entries of M to be computed, and the larger to the entries of the sparsified matrix from A.
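A quick back-of-the-envelope check of this cost argument (illustrative Python; the flop count 2mn² for Householder QR of an m×n matrix is a standard estimate, and the problem sizes below are hypothetical):

    def qr_flops(m, n):
        """Approximate flop count of Householder QR on an m x n matrix (m >= n)."""
        return 2 * m * n * n

    m, n = 200, 20   # rows ~ entries kept in the sparsified A, cols ~ entries of m_j
    print(qr_flops(2 * m, n) / qr_flops(m, n))   # doubling the rows:    2x the cost
    print(qr_flops(m, 2 * n) / qr_flops(m, n))   # doubling the columns: 4x the cost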



Figure 3.3.12: Sparsity pattern of the inverse of the sparsified A associated with Example 1. The pattern has been sparsified with the same value of the threshold as used for the sparsification of A⁻¹ displayed in Figure 3.3.11.

3.4 Numerical results

We report in this section on the numerical results obtained by replacing A with its sparse approximation in the construction of the preconditioner. In Table 3.4.3 we use the following notation:

• M_{a-a}, introduced in [2] and computed by using algebraic information from A; the same pattern is used for the preconditioner;

• M_{a-t}, constructed by using the algebraic strategy to sparsify A and the topological strategy to prescribe the pattern for the preconditioner;

• M_{a-g}, constructed by using the geometric approach and an algebraic heuristic for A with the same density as for the preconditioner;

• M_{2a-t}, similar to M_{a-t}, but the pattern imposed on A is twice as dense as that imposed on M_{a-t};

• M_{2a-g}, similar to M_{a-g} but, as in the previous case, the pattern imposed on A is twice as dense as that imposed on M_{a-g}.




Figure 3.3.13: CPU time for the construction of the preconditioner using a different number of nonzeros in the patterns for A and M (pattern for A one, three, or five times as dense as for M, and the full A), plotted against the density of the preconditioning matrix. The test problem is Example 1. This is representative of the other examples.

For the sake of comparison, we also report the number of iterations without using a preconditioner and with only a diagonal scaling, denoted by M_j (j stands for Jacobi preconditioner).

Other combinations are possible for defining the selection strategies for the patterns of A and M. Here we focus on the most promising ones, which use information from the mesh to retain the large entries of the inverse, and the algebraic strategy for A to capture the most relevant contributions to the singular integrals. We also consider the preconditioner M_{a-a} to compare with previous tests [2] that were performed on geometries different from those considered here. We show, in Table 3.4.3, the results of our numerical experiments. For each example, we give the number of iterations required by each preconditioned solver.



Example 1 – Density of M = 5.03%
Precond.      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
              m=10   m=30   m=50   m=80   m=110
Unprec.         –      –      –     251    202          223       231     175
M_j             –      –     465    222    174          239       210     169
M_{a-a}       284     170    138    114     92          120       156      94
M_{a-t}       179      61     45     45     45           43        58      36
M_{a-g}       147      93     68     59     59           55        73      53
M_{2a-t}      128      56     40     40     40           37        50      36
M_{2a-g}      131      79     52     51     51           59        65      44

Example 2 – Density of M = 1.59%
Precond.      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
              m=10   m=30   m=50   m=80   m=110
Unprec.         –      –      –     398    289          359       403     249
M_j             –      –     473    330    243          257       354     228
M_{a-a}         –     319    255    221    203          181       319     135
M_{a-t}         –     261    213    174    169          128       251     121
M_{a-g}       251     178    150    138    117          106       256     116
M_{2a-t}        –     370    284    202    182          176       276     127
M_{2a-g}      100      73     61     55     55           48        93      40

Example 3 – Density of M = 2.35%
Precond.      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
              m=10   m=30   m=50   m=80   m=110
Unprec.         –      –      –      –     488           –        444     308
M_j             –      –      –     491    427          375       356     306
M_{a-a}       436     316    240    193    125          144       166     135
M_{a-t}       137     108     93     71     71           64        93      66
M_{a-g}         –     464    296    203    108          240       166     144
M_{2a-t}      113      78     59     53     53           41        61      44
M_{2a-g}      122      84     72     59     59           53        67      50

Example 4 – Density of M = 1.04%
Precond.      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
              m=10   m=30   m=50   m=80   m=110
Unprec.         –     224    191    158    147          177       170     118
M_j           350     211    178    153    140          188       152     110
M_{a-a}       299     205    172    146    133          162       180     103
M_{a-t}       266     152    130    114     99           92       127      83
M_{a-g}        81      67     66     63     63           39        79      41
M_{2a-t}      269     167    143    136    116          107       137      93
M_{2a-g}       71      60     47     47     47           43        61      41

Example 5 – Density of M = 0.63%
Precond.      GMRES(m)                               Bi-CGStab   UQMR   TFQMR
              m=10   m=30   m=50   m=80   m=110
Unprec.         –     344    233    146    125          152       170     109
M_j             –     326    219    140    131          183       173     107
M_{a-a}         –     352    249    154    134          202       183     107
M_{a-t}       360      66     64     60     60           34        76      46
M_{a-g}       313      81     68     61     61           36        74      40
M_{2a-t}       71      48     47     47     47           25        54      30
M_{2a-g}       88      42     39     39     39           21        45      25

Table 3.4.3: Number of iterations to solve the set of test problems.

Example 1 – Density of M = 5.03%
M_{a-a}   M_{a-t}   M_{2a-t}   M_{a-g}   M_{2a-g}
 83.42     91.07      91.78     79.47      80.18

Example 2 – Density of M = 1.59%
M_{a-a}   M_{a-t}   M_{2a-t}   M_{a-g}   M_{2a-g}
 13.98     16.45      16.73     13.53      13.67

Example 3 – Density of M = 2.35%
M_{a-a}   M_{a-t}   M_{2a-t}   M_{a-g}   M_{2a-g}
 83.59    146.44     147.79    109.45     110.30

Example 4 – Density of M = 1.04%
M_{a-a}   M_{a-t}   M_{2a-t}   M_{a-g}   M_{2a-g}
 31.75     38.05      38.23     31.12      31.24

Example 5 – Density of M = 0.63%
M_{a-a}   M_{a-t}   M_{2a-t}   M_{a-g}   M_{2a-g}
 27.66     70.93      71.29     26.04      26.13

Table 3.4.4: CPU time to compute the preconditioners.

In Table 3.4.4, we show the CPU time required to compute the preconditioners when the least-squares problems are solved using LAPACK routines. The CPU time for constructing M_{a-t} and M_{2a-t} is in some cases much larger than that needed for M_{a-g} and M_{2a-g}. The reason is that, in the topological strategy, it is not possible to prescribe exactly a value for the density. Thus, for each problem, we select a suitable number of levels of neighbours to obtain the closest number of nonzeros to that retained in the pattern based on the geometric approach. After the construction of the



preconditioner, we drop its smallest entries to ensure an identical number of nonzeros for the two strategies. The results illustrate that considering twice as dense a pattern for A as for M does not cause a significant growth in the computational time, although it enables us to construct a more robust preconditioner.

We first observe that using a sparse approximation of A degrades the convergence rate of the preconditioned iterations when the nonzero pattern imposed on the preconditioner is very sparse. However, if we adopt the geometric strategy to define the sparsity pattern for the approximate inverse, the convergence rate is not affected very much. For even larger values of the density, the difference in the number of iterations between using the full A or an algebraic sparse approximation becomes negligible. For all the experiments, M_{a-g} still outperforms M_{a-a} and is generally more robust than M_{a-t}; the most efficient and robust preconditioner is M_{2a-g}. The multiple density strategy allows us to improve the efficiency and the robustness of the Frobenius-norm preconditioner on this class of problems without requiring any more time for the construction of the preconditioner. For all the test examples, it enables us to get the fastest convergence, even for GMRES with a low restart parameter on problems where neither M_{a-a} nor M_{a-g} converge.

The effectiveness of this multiple density heuristic is illustrated in Figures 3.4.14 and 3.4.15, where we see the effect of preconditioning on the clustering of the eigenvalues of A for the most difficult problem, Example 2. The eigenvalues of the preconditioned matrices are in both cases well clustered around the point (1.0, 0.0) (with a more effective clustering for M_{2a-g}), but those obtained by using the multiple density strategy are further from the origin. This is highly desirable when trying to improve the convergence of Krylov solvers.

Another advantage of this multiple density heuristic is that it generally allows us to reduce the density of the preconditioner (and thus its construction cost) while preserving its numerical quality. Although no specific results are reported to illustrate this aspect, this behaviour may be partially observed in Table 3.3.2.




Figure 3.4.14: Eigenvalue distribution of the coefficient matrix preconditioned by using a single density strategy on Example 2.

3.5 Concluding remarks

We have presented some a priori pattern selection strategies for the construction of a robust sparse Frobenius-norm minimization preconditioner for electromagnetic scattering problems expressed in integral formulation. We have shown that, by using additional geometric information from the underlying mesh, it is possible to construct robust sparse preconditioners at an affordable computational and memory cost. The topological strategy requires less computational effort to construct the pattern but, since the density is a step function of the number of levels, the construction of the preconditioner can require some additional computation. Also, it may not handle very well complex geometries where some parts of the object are not connected. By retaining two different densities in the patterns of A and M, we can greatly decrease the computational cost of the construction of the preconditioner, usually a bottleneck for this family of methods, preserving the efficiency while increasing the robustness of the resulting preconditioner. Although sparsifying A using an algebraic dropping strategy seems to be the most natural approach to get a sparse approximation of A when all its entries are available, either the topological or the geometric criterion can be used to define the sparse approximation of A. These alternatives are attractive in a multipole framework, where not all the entries of A are computed. The geometric approach can also be used to sparsify A without noticeably deteriorating the quality of the preconditioner.



Figure 3.4.15: Eigenvalue distribution for the coefficient matrix preconditioned by using a multiple density strategy on Example 2. (Plot: eigenvalues in the complex plane; real axis from -0.5 to 1.5, imaginary axis from -1.5 to 0.5.)

As suggested by Figure 3.2.4, owing to the strongly localized coupling introduced by the discretization of the integral equations, the topological approach can also provide a good sparse approximation of A by retaining just a few levels of neighbouring edges for each DOF in the mesh. The numerical behaviour of this approach is illustrated in Table 3.5.6. In both cases the resulting preconditioner is still robust and better suited to a fast multipole framework, since it does not require knowledge of the location of the largest entries of A.
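To make the topological criterion concrete, the following minimal sketch builds the level-based pattern from the mesh connectivity. It assumes a SciPy sparse DOF-to-DOF adjacency matrix C (C[i,j] nonzero when edges i and j are neighbours in the mesh); this input and the function name are illustrative, not part of the codes used in this thesis.

import scipy.sparse as sp

def level_pattern(C, levels):
    # C: n-by-n sparse adjacency of the mesh DOFs (assumed input).
    # Returns the pattern that retains, for each DOF, its neighbours
    # up to `levels` levels away (plus the DOF itself).
    P = sp.identity(C.shape[0], format='csr')
    for _ in range(levels):
        P = P + P @ C        # add one more level of neighbours
    P.data[:] = 1.0          # keep only the nonzero structure
    return P

Column j of the result gives the retained entries of column j of M; a larger level count yields the denser pattern used for the sparse approximation of A.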

M_{2g-g}

Example  GMRES(10)  GMRES(30)  GMRES(50)  GMRES(80)  GMRES(110)  Bi-CGStab  UQMR  TFQMR
   1        165        103         75         60         60          66       71     61
   2        145        110         95         76         76          68      140     64
   3        129         89         70         57         57          49       69     52
   4         71         57         48         48         48          38       52     34
   5        110         46         42         42         42          24       50     27

Table 3.5.5: Number of iterations to solve the set of test models using a multiple density geometric strategy to construct the preconditioner. The pattern imposed on A is twice as dense as that imposed on M.



M_{2t-g}

Example  GMRES(10)  GMRES(30)  GMRES(50)  GMRES(80)  GMRES(110)  Bi-CGStab  UQMR  TFQMR
   1        197         87         49         49         49          50       66     50
   2        103         82         72         61         61          49      111     50
   3        143         98         84         60         60          56       70     53
   4         70         58         49         49         49          39       65     37
   5        143         50         47         47         47          29       57     28

Table 3.5.6: Number of iterations to solve the set of test models using a topological strategy to sparsify A and a geometric strategy for the preconditioner. The pattern imposed on A is twice as dense as that imposed on M.




Chapter 4

Symmetric Frobenius-norm minimization preconditioners in electromagnetism

In the previous chapter we introduced and compared some strategies to compute a priori the nonzero sparsity pattern for Frobenius-norm minimization preconditioners in electromagnetic applications. The results of the numerical experiments suggest that, by using additional geometric information from the underlying mesh, it is possible to construct very sparse preconditioners and to make them more robust. In this chapter, we illustrate the numerical and computational efficiency of the proposed preconditioner. In Section 4.1, we assess the effectiveness of the sparse approximate inverse compared with standard methods for the solution of a set of model problems that are representative of real electromagnetic calculations. In Section 4.2, we complete the study by considering two symmetric preconditioners based on Frobenius-norm minimization.

4.1 Comparison with standard preconditioners

In this section we assess the performance of the proposed Frobenius-norm minimization approach. In Table 4.1.1, we show the numerical results observed on Examples 1-5 with some standard preconditioners, of both explicit and implicit form. These are: diagonal scaling (M_j), SSOR, ILU(0), SPAI, and SLU applied to a sparse approximation of A constructed using the algebraic approach. All these preconditioners, except SLU, exhibit much poorer acceleration capabilities than M_{2a-g}. If we reduce the density of the preconditioner on Examples 1 and 3, M_{2a-g} converges more slowly but becomes the most efficient.




It should also be noted that SPAI works reasonably well when computed using the dense A (see Table 3.2.1), but with the sparse A it does not converge on Example 2 (see Table 4.1.1). In addition, following [35], we performed some numerical experiments in which we obtained an approximate m_{•j} from (2.2.9) by dropping the smallest entries of the iterates computed by a few steps of either the Minimum Residual method or GMRES. Unfortunately, the performance of these approaches for dynamically defining the pattern of the preconditioner was disappointing: they only improved on the unpreconditioned case when a relatively large number of iterations was used to build the preconditioner, making them unaffordable for our problems.

The purpose of this study is to understand the numerical behaviour of the preconditioners. Nevertheless, we do recognize that some of the simple strategies have a much lower cost for building the preconditioner and so could result in a faster solution. When SSOR converges, it is often the fastest in terms of the CPU time for the overall solution of the linear system. When the solution is performed for only one right-hand side, the construction cost of the other preconditioners cannot be compensated for by the reduction in the number of iterations; the matrix-vector product is performed using BLAS kernels that make the iteration cost quite cheap for the problem sizes we have considered. For instance, when solving Example 1 with GMRES(50) on a SUN Enterprise, SSOR converges in 31.4 seconds, while M_{2a-g} requires 190 seconds for the construction and 7.6 seconds for the iterations. However, in electromagnetic applications the same linear system has to be solved with many right-hand sides when illuminating an object with various waves corresponding to different angles of incidence. For that example, if we have more than eight right-hand sides (the break-even point is 190/(31.4 - 7.6), approximately 8 solves), the construction cost of M_{2a-g} is offset by the time saved in the iterations and M_{2a-g} becomes more efficient than SSOR. In addition, the construction and the application of M_{2a-g} are fully parallelizable, while the parallelization of SSOR requires some reordering of the equations that may be difficult to implement efficiently on a distributed memory platform.
memory plat<strong>for</strong>m.



Example 1 - Density of M = 5.03%

Precond.    GMRES(10)  GMRES(30)  GMRES(50)  GMRES(80)  GMRES(110)  Bi-CGStab  UQMR  TFQMR
M_j             –          –         465        222        174         239      210    169
SSOR            –          –         216        136         98         147      177    135
ILU(0)          –          –          –          –          –           –       479     –
SPAI            –          –         192         68         68         150       83     94
SLU            160         53         38         38         38          46       50     39
M_{2a-g}       131         79         52         51         51          59       65     44

Example 2 - Density of M = 1.59%

Precond.    GMRES(10)  GMRES(30)  GMRES(50)  GMRES(80)  GMRES(110)  Bi-CGStab  UQMR  TFQMR
M_j             –          –         473        330        243         257      354    228
SSOR            –         413        245        164        134         185      281    266
ILU(0)          –          –          –          –         322         385      394    439
SPAI            –          –          –          –          –           –        –      –
SLU             –          –          –          –         282          –        –      –
M_{2a-g}       100         73         61         55         55          48       93     40

Example 3 - Density of M = 2.35%

Precond.    GMRES(10)  GMRES(30)  GMRES(50)  GMRES(80)  GMRES(110)  Bi-CGStab  UQMR  TFQMR
M_j             –          –          –         491        427         375      356    306
SSOR            –         500        397        301        226         228      246    199
ILU(0)          –          –          –         474        185          –       388     –
SPAI            –          –          –         157         89         198      119    122
SLU             36         25         25         25         25          14       27     19
M_{2a-g}       122         84         72         59         59          53       67     50

Example 4 - Density of M = 1.04%

Precond.    GMRES(10)  GMRES(30)  GMRES(50)  GMRES(80)  GMRES(110)  Bi-CGStab  UQMR  TFQMR
M_j
SSOR           360        185        137        112         93          94      124     84
ILU(0)          –         359        280        202        127         203      179    136
SPAI            99         78         59         55         55          49       72     53
SLU             99         78         59         55         55          49       72     53
M_{2a-g}        71         60         47         47         47          43       61     41



Example 5 - Density of M = 0.63%

Precond.    GMRES(10)  GMRES(30)  GMRES(50)  GMRES(80)  GMRES(110)  Bi-CGStab  UQMR  TFQMR
M_j
SSOR            –         296        194        145        115         161      168    124
ILU(0)          –          –          –          –         414         345      389    272
SPAI            –         454        196        124         91         118      118     96
SLU            115         68         52         52         52          29       59     42
M_{2a-g}        88         42         39         39         39          21       45     25

Table 4.1.1: Number of iterations with some standard preconditioners computed using the sparse A (algebraic).

4.2 Symmetrization strategies for the Frobenius-norm minimization method

The linear systems arising from the discretization by BEM can be symmetric non-Hermitian in the Electric Field Integral Equation formulation (EFIE), or unsymmetric in the Combined Field Integral Equation formulation (CFIE). In this thesis, as mentioned in the previous chapters, we only consider cases where the matrix is symmetric, because the EFIE usually gives rise to linear systems that are more difficult to solve with iterative methods. Another motivation to focus only on the EFIE formulation is that it does not impose any restriction on the geometry of the scattering obstacle, as the CFIE does, and in this respect it is more general. However, the sparse approximate inverse computed by the Frobenius-norm minimization method is not guaranteed to be symmetric, and usually is not, even if a symmetric pattern is imposed on M; consequently it might not fully exploit all the characteristics of the linear system. This fact prevents the use of symmetric Krylov solvers. To complete the earlier studies, in this section we consider two possible symmetrization strategies for Frobenius-norm minimization using a prescribed pattern for the preconditioner based on geometric information. As before, all the preconditioners are computed using as input Ã, a sparse approximation of the dense coefficient matrix A.

If M_{Frob} denotes the unsymmetric matrix resulting from the minimization (2.2.9), the first strategy simply averages its off-diagonal entries. That is

    M_{Aver-Frob} = (M_{Frob} + M_{Frob}^T) / 2.    (4.2.1)

An alternative way to construct a symmetric sparse approximate inverse is to compute only the lower triangular part, including the diagonal, of the preconditioner. The nonzeros calculated are reflected with respect to the diagonal and are used to update the right-hand sides of the subsequent least-squares problems involved in the construction of the remaining columns of the preconditioner. More precisely, in the computation of the k-th column of the preconditioner, the entries m_{ik} for i < k are set to the entries m_{ki} that are already available, and only the lower triangular entries are computed. The entries m_{ki} are then used to update the right-hand sides of the least-squares problems which involve the remaining unknowns m_{ik}, for i ≥ k. The least-squares problems are as follows:

    min ‖ê_j − Ã m̂_{•j}‖₂²    (4.2.2)

where ê_j = e_j − ∑_{k<j} m_{kj} ã_{•k}, with ã_{•k} the k-th column of Ã, and m̂_{•j} carries only the unknown entries m_{ij}, i ≥ j.
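A minimal dense-storage sketch of the two symmetrization strategies is given below; it is illustrative only (an actual implementation works on sparse data structures and reuses QR factorizations), and `patterns`, which lists the retained row indices of each column, is a hypothetical input.

import numpy as np

def aver_frob(M_frob):
    # M_{Aver-Frob} of (4.2.1): average the off-diagonal entries.
    # For complex symmetric (non-Hermitian) systems the plain
    # transpose, not the conjugate transpose, preserves symmetry.
    return 0.5 * (M_frob + M_frob.T)

def sym_frob(A_tilde, patterns):
    # M_{Sym-Frob}: compute only the lower triangular entries of each
    # column; the entries above the diagonal are reflected from earlier
    # columns and moved to the right-hand side, as in (4.2.2).
    n = A_tilde.shape[0]
    M = np.zeros((n, n), dtype=A_tilde.dtype)
    for j in range(n):
        J = np.asarray(patterns[j])
        known, unknown = J[J < j], J[J >= j]
        M[known, j] = M[j, known]              # reflect m_kj = m_jk
        e = np.zeros(n, dtype=A_tilde.dtype)
        e[j] = 1.0
        e -= A_tilde[:, known] @ M[known, j]   # updated right-hand side
        M[unknown, j] = np.linalg.lstsq(A_tilde[:, unknown], e, rcond=None)[0]
    return M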



Frobenius-norm minimization type <strong>preconditioners</strong>, both symmetric and<br />

unsymmetric. In the following we consider a geometric approach to define<br />

the sparsity pattern <strong>for</strong> Ã, as it is the only one that can be efficiently<br />

implemented in a parallel fast multipole environment [23]. We compare the<br />

unsymmetric preconditioner M F rob and the two symmetric <strong>preconditioners</strong><br />

M Aver−F rob and M Sym−F rob . The column entitled “Relative Flops” displays<br />

σ QR (M)<br />

the ratio<br />

σ QR (M F rob ) , where the σ QR(M) represents the number of floatingpoint<br />

operations required by the sequence of QR factorizations used to build<br />

the preconditioner M, that is either M = M Aver−F rob or M = M Sym−F rob .<br />

In this table, it can be seen that M_{Aver-Frob} almost always requires fewer iterations than M_{Sym-Frob}, which imposes the symmetry directly and consequently computes only half of the entries. Since M_{Sym-Frob} computes fewer entries, the associated values in the column "Relative Flops" are all less than one, and close to a third in all cases. On the hardest test cases (Examples 1 and 3), the combination of SQMR and M_{Aver-Frob} needs less than half the number of iterations of M_{Frob} with GMRES(30), and is only very slightly less efficient than M_{Frob} with GMRES(80). On the less difficult problems, SQMR plus M_{Aver-Frob} converges between 21 and 37% faster than GMRES(80) plus M_{Frob}, and between 31 and 43% faster than GMRES(30) plus M_{Frob}. M_{Sym-Frob}, which computes only half of the entries of the preconditioner, has a poor convergence behaviour on the hardest problems, and is slightly less efficient than M_{Aver-Frob} on the other problems when used with SQMR. Nevertheless, we should mention that, for the sake of comparison, these preliminary experiments were performed using the set of parameters for the densities of Ã and M that was the best for M_{Frob} and consequently nearly optimal for M_{Aver-Frob}; the performance of M_{Sym-Frob} can be improved, as shown by the results reported in Table 4.2.3. These first experiments reveal the remarkable robustness of SQMR when used in combination with a symmetric preconditioner. This combination generally outperforms GMRES even for large restarts.

The best alternative for significantly improving the behaviour of M_{Sym-Frob} is to enlarge significantly the density of Ã and only marginally increase the density of the preconditioner. In Table 4.2.3, we show the number of iterations observed with the strategy that uses a density of Ã three times larger than that of M_{Sym-Frob}; we recall that for M_{Aver-Frob} and M_{Frob} a density of Ã twice as large as that of the preconditioner is usually the best trade-off between computing cost and numerical efficiency. It can be seen that M_{Sym-Frob} then becomes slightly better than M_{Aver-Frob} (as reported in Table 4.2.2) and is less expensive to build. In this table, we use the same values of σ_QR(M_{Frob}) as in Table 4.2.2 to evaluate the ratio "Relative Flops".
to evaluate the ratio “Relative Flops”.



Example 1 - Density of Ã = 10.13% - Density of M = 5.03%

Precond.       GMRES(30)  GMRES(80)  GMRES(∞)  SQMR  Relative Flops
M_{Frob}          108         60        60       *        1.00
M_{Aver-Frob}     171         79        79      74        1.00
M_{Sym-Frob}       –           –       301       –        0.25

Example 2 - Density of Ã = 3.17% - Density of M = 1.99%

Precond.       GMRES(30)  GMRES(80)  GMRES(∞)  SQMR  Relative Flops
M_{Frob}           57         43        43       *        1.00
M_{Aver-Frob}      59         44        44      34        1.00
M_{Sym-Frob}       60         46        39      41        0.28

Example 3 - Density of Ã = 4.72% - Density of M = 2.35%

Precond.       GMRES(30)  GMRES(80)  GMRES(∞)  SQMR  Relative Flops
M_{Frob}           89         57        57       *        1.00
M_{Aver-Frob}     122         63        63      58        1.00
M_{Sym-Frob}      318        135        91     102        0.29

Example 4 - Density of Ã = 2.08% - Density of M = 1.04%

Precond.       GMRES(30)  GMRES(80)  GMRES(∞)  SQMR  Relative Flops
M_{Frob}           58         48        48       *        1.00
M_{Aver-Frob}      59         47        47      30        1.00
M_{Sym-Frob}       63         51        51      33        0.30

Example 5 - Density of Ã = 1.25% - Density of M = 0.62%

Precond.       GMRES(30)  GMRES(80)  GMRES(∞)  SQMR  Relative Flops
M_{Frob}           35         33        33       *        1.00
M_{Aver-Frob}      35         34        34      24        1.00
M_{Sym-Frob}       51         38        38      32        0.31

Table 4.2.2: Number of iterations on the test examples using the same pattern for the preconditioners.

Example  Density                     GMRES(30)  GMRES(80)  GMRES(∞)  SQMR  Relative Flops
   1     Ã = 11.98%, M = 6.10%          172         68        68      67        0.40
   2     Ã =  5.94%, M = 2.04%           56         41        41      33        0.30
   3     Ã = 11.01%, M = 3.14%           88         57        57      56        0.66
   4     Ã =  2.08%, M = 1.19%           56         50        50      32        0.47
   5     Ã =  1.98%, M = 0.62%           33         33        33      15        0.34

Table 4.2.3: Number of iterations for M_{Sym-Frob} combined with SQMR, using three times more nonzeros in Ã than in the preconditioner.



To illustrate the effect of the densities of Ã and of the preconditioners, we performed experiments with preconditioned SQMR, where the preconditioners are built by using either the same sparsity pattern for Ã or a two, three or five times denser pattern for Ã. We report in Tables 4.2.4 and 4.2.5 the number of SQMR iterations for M_{Sym-Frob} and for M_{Aver-Frob}, respectively. In these tables, M_{Sym-Frob} always requires more iterations than M_{Aver-Frob} for the same densities of Ã and of the preconditioner, but its computation costs about a quarter of the flops in each test.

Example 1 - Percentage density of M

Density strategy     1    2    3    4    5    6    7    8    9   10
Same                 –    –    –    –    –   180  150  118  105   55
2.0 times            –    –    –    –    –    67   56   48   91   42
3.0 times            –    –    –    –   393   55   52   47   74   39
5.0 times            –    –    –    –   346   53   50   45   56   39

Table 4.2.4: Number of iterations of SQMR with M_{Sym-Frob} for different values of the density of M, using the same pattern for A and larger patterns. The test problem is Example 1.

Example 1 - Percentage density of M

Density strategy     1    2    3    4    5    6    7    8    9   10
Same                 –    –    –   336   78   55   55   45   38   40
2.0 times            –    –   426  105   81   50   48   43   43   44
3.0 times            –   426  293  113   92   49   45   36   35   35
5.0 times            –   315  248  114   80   44   38   37   37   35

Table 4.2.5: Number of iterations of SQMR with M_{Aver-Frob} for different values of the density of M, using the same pattern for A and larger patterns. The test problem is Example 1.

Because the construction of M_{Sym-Frob} depends on the ordering selected, a natural question concerns the sensitivity of the quality of the preconditioner to this choice. In particular, it is shown in [54] that the numerical behaviour of IC is very dependent on the ordering, and a similar study with comparable conclusions for AINV is described in [17]. In Table 4.2.6, we display the number of iterations with SQMR, selecting the same density parameters as those used for the experiments reported in Table 4.2.3, but using different orderings to permute the original pattern of M_{Sym-Frob}. More precisely, we consider the reverse Cuthill-McKee ordering [37] (RCM), the minimum degree ordering [71, 141] (MD), the spectral nested dissection ordering [114] (SND) and, lastly, a reordering of the matrix that puts the denser rows and columns first (DF). It can be seen that M_{Sym-Frob} is not too sensitive to the ordering and none of the tested orderings appears superior to the others.

Example  Density                     Original  RCM  MD  SND  DF
   1     Ã = 11.98%, M = 6.10%          67      93  93   75  87
   2     Ã =  5.94%, M = 2.04%          33      41  40   40  44
   3     Ã = 11.01%, M = 3.14%          56      51  68   73  77
   4     Ã =  2.08%, M = 1.19%          32      42  40   39  39
   5     Ã =  1.98%, M = 0.62%          15      26  25   26  23

Table 4.2.6: Number of iterations of SQMR with M_{Sym-Frob} with different orderings.

For comparison, in Table 4.2.7 we report comparative results amongst different Frobenius-norm minimization type preconditioners, both symmetric and unsymmetric, obtained when the algebraic dropping strategy is used to sparsify the coefficient matrix. In this case, M_{Aver-Frob} always performs better than M_{Sym-Frob}, but it is at least three times more expensive to compute. On Examples 1 and 3, the hardest test cases, the combination of SQMR and M_{Aver-Frob} needs up to 65% more iterations than GMRES(80) plus M_{Frob}, but competes with GMRES(30) plus M_{Frob}. On the less difficult problems, SQMR plus M_{Aver-Frob} converges between 18 and 35% faster than GMRES(80) plus M_{Frob}, and between 20 and 47% faster than GMRES(30) plus M_{Frob}. The best alternative to significantly improve the behaviour of M_{Sym-Frob} remains to enlarge notably the density of Ã and only marginally that of the preconditioner. This can be observed in Table 4.2.8, where we show the number of iterations obtained with the strategy that uses a density of Ã at most three times larger than that of M_{Sym-Frob}. Once again the behaviour of M_{Sym-Frob} is comparable to that of M_{Aver-Frob} described in Table 4.2.7, while it is less expensive to build.

In Tables 4.2.9 and 4.2.10 we illustrate the effect of the density of the approximation of the original matrix and of the preconditioners on the convergence of SQMR. The preconditioners are built by using either the same sparsity pattern for Ã or a two, three or five times denser pattern for Ã. We report in Tables 4.2.9 and 4.2.10, respectively, the number of SQMR iterations when an algebraic approach is used for Ã and a geometric approach is selected for M_{Sym-Frob} and for M_{Aver-Frob}. If we compare these results with those reported in Table 4.2.4, it can be seen that, on hard problems, using geometric information even to prescribe the pattern of Ã is beneficial. M_{Sym-Frob} remains rather insensitive to the ordering, as shown by the results of Table 4.2.11.

Example 1 - Density of Ã = 10.19% - Density of M = 5.03%

Precond.       GMRES(30)  GMRES(80)  GMRES(110)  SQMR  Relative Flops
M_{Frob}           79         51         51        *        1.00
M_{Aver-Frob}     196        119         90       84        1.00
M_{Sym-Frob}       –          –          –         –        0.25

Example 2 - Density of Ã = 3.18% - Density of M = 1.99%

Precond.       GMRES(30)  GMRES(80)  GMRES(110)  SQMR  Relative Flops
M_{Frob}           45         39         39        *        1.00
M_{Aver-Frob}      48         40         40       32        1.00
M_{Sym-Frob}       78         49         49       46        0.28

Example 3 - Density of Ã = 4.69% - Density of M = 2.35%

Precond.       GMRES(30)  GMRES(80)  GMRES(110)  SQMR  Relative Flops
M_{Frob}           84         59         59        *        1.00
M_{Aver-Frob}     119         74         74       74        1.00
M_{Sym-Frob}       –          –          –         –        0.29

Example 4 - Density of Ã = 2.10% - Density of M = 1.04%

Precond.       GMRES(30)  GMRES(80)  GMRES(110)  SQMR  Relative Flops
M_{Frob}           60         47         47        *        1.00
M_{Aver-Frob}      64         49         49       32        1.00
M_{Sym-Frob}       64         51         51       33        0.30

Example 5 - Density of Ã = 1.27% - Density of M = 0.62%

Precond.       GMRES(30)  GMRES(80)  GMRES(110)  SQMR  Relative Flops
M_{Frob}           42         39         39        *        1.00
M_{Aver-Frob}      30         30         30       25        1.00
M_{Sym-Frob}       50         36         36       31        0.31

Table 4.2.7: Number of iterations on the test examples using the same pattern for the preconditioners. An algebraic pattern is used to sparsify A.



Example  Density                     GMRES(30)  GMRES(80)  GMRES(∞)  SQMR  Relative Flops
   1     Ã = 12%,    M = 6%             360         79        79      79        0.41
   2     Ã =  5.97%, M = 2.04%           59         43        43      34        0.57
   3     Ã = 11.08%, M = 3.14%          171         76        76      78        0.66
   4     Ã =  2.10%, M = 1.19%           51         44        44      31        0.47
   5     Ã =  1.87%, M = 0.62%           33         33        33      14        0.34

Table 4.2.8: Number of iterations for M_{Sym-Frob} combined with SQMR using three times more nonzeros in Ã than in the preconditioner. An algebraic pattern is used to sparsify A.

Example 1 - Percentage density of M

Density strategy     1    2    3    4    5    6    7    8    9   10
Same                 –    –    –    –    –    –   494  364  440   90
2.0 times            –    –    –    –    –    79  173  105   81   58
3.0 times            –    –    –    –    –    64   66   71   45   55
5.0 times            –    –    –    –   346   52   70   56   40   41

Table 4.2.9: Number of iterations of SQMR with M_{Sym-Frob} for different values of the density of M, using the same pattern for A and larger patterns. A geometric approach is adopted to construct the pattern for the preconditioner and an algebraic approach is adopted to construct the pattern for the coefficient matrix. The test problem is Example 1.



Example 1 - Percentage density of M

Density strategy     1    2    3    4    5    6    7    8    9   10
Same                391    –   433   99   89   48   50   38   36   36
2.0 times            –   420  272  112   84   44   37   36   33   34
3.0 times           362  363  222   96   86   40   43   36   36   35
5.0 times            –   365  251  100   76   40   38   34   35   36

Table 4.2.10: Number of iterations of SQMR with M_{Aver-Frob} for different values of the density of M, using the same pattern for A and larger patterns. A geometric approach is adopted to construct the pattern for the preconditioner and an algebraic approach is adopted to construct the pattern for the coefficient matrix. The test problem is Example 1.

Example  Density                     Original  RCM   MD  SND   DF
   1     Ã = 12%,    M = 6%             79      72   70   71   76
   2     Ã =  5.97%, M = 2.04%          34      39   39   35   39
   3     Ã = 11.08%, M = 3.14%          78     122   92  112  122
   4     Ã =  2.10%, M = 1.19%          31      29   30   30   27
   5     Ã =  1.87%, M = 0.62%          14      27   24   26   14

Table 4.2.11: Number of iterations of SQMR with M_{Sym-Frob} with different orderings. An algebraic pattern is used to sparsify A.

4.3 Concluding remarks

In this chapter we have assessed the performance of the Frobenius-norm minimization preconditioner for the solution of dense complex symmetric non-Hermitian systems of equations arising from electromagnetic applications. The set of problems used for the numerical experiments can be considered representative of larger systems. We have also investigated the use of symmetric preconditioners, which reflect the symmetry of the original matrix in the associated preconditioner and enable the use of a symmetric Krylov solver that can be cheaper than GMRES iterations. Both M_{Aver-Frob} and M_{Sym-Frob} appear to be efficient and robust. Through numerical experiments, we have shown that M_{Sym-Frob} is not too sensitive to the column ordering, while M_{Aver-Frob} is totally insensitive to it. In addition, M_{Aver-Frob} is straightforward to parallelize, even though it requires more flops for its construction. It would probably be the preconditioner of choice in a parallel distributed fast multipole environment, but possibilities for parallelizing M_{Sym-Frob} also exist, by using colouring techniques to detect independent subsets of columns that can be computed in parallel. In a multipole context the algorithm must be recast in terms of blocks, and Level 2 BLAS operations have to be used for the least-squares updates. Finally, the major benefit of these two preconditioners is the remarkable robustness they exhibit when used in conjunction with SQMR.




Chapter 5

Combining fast multipole techniques and approximate inverse preconditioners for large parallel electromagnetics calculations

In this chapter we consider the implementation of the Frobenius-norm minimization preconditioner described in Chapter 3 within a code that implements the Fast Multipole Method (FMM). We combine the sparse approximate inverse preconditioner with fast multipole techniques for the solution of very large electromagnetic problems. The chapter is organized as follows: in Section 5.1 we briefly overview the FMM. In Section 5.2 we describe the implementation of the Frobenius-norm minimization preconditioner in the parallel multipole context developed by [135]. In Section 5.3 we study the numerical and parallel scalability of the implementation for the solution of large problems. Finally, in Section 5.4 we investigate the numerical behaviour of inner-outer iterative solution schemes implemented in a multipole context with different levels of accuracy for the matrix-vector products in the inner and outer loops. We consider in particular FGMRES as the outer solver, with an inner GMRES iteration preconditioned by the Frobenius-norm minimization method. We illustrate the robustness and effectiveness of this scheme for the solution of problems with up to one million unknowns.




5.1 The fast multipole method

The FMM, introduced by Greengard and Rokhlin [82], provides an algorithm for computing approximate matrix-vector products for electromagnetic scattering problems. The method is fast in the sense that the computation of one matrix-vector product costs O(n log n) arithmetic operations instead of the usual O(n²) operations, and is approximate in the sense that the relative error with respect to the exact computation is around 10⁻³ [38, 135]. It is based on truncated series expansions of the Green's function for the electric-field integral equation (EFIE). The EFIE can be written as

    E(x) = −∫_Γ ∇G(x, x′) ρ(x′) d³x′ − (ik/c) ∫_Γ G(x, x′) J(x′) d³x′ + E_E(x),    (5.1.1)

where E_E is the electric field due to external sources, J(x) is the current density, ρ(x) is the charge density, and the constants k and c are the wavenumber and the speed of light, respectively. The Green's function G can be expressed as

    G(x, x′) = e^{−ik|x−x′|} / |x − x′|.    (5.1.2)

The EFIE is converted into matrix equations by the Method of Moments [86]. The unknown current J(x) on the surface of the object is expanded into a set of basis functions B_i, i = 1, 2, ..., N:

    J(x) = ∑_{i=1}^{N} J_i B_i(x).

This expansion is introduced in (5.1.1), and the discretized equation is applied to a set of test functions; a linear system is finally obtained. The entries of the coefficient matrix of the system are expressed in terms of surface integrals and have the form

    A_{KL} = ∫∫ G(x, y) B_K(x) · B_L(y) dL(y) dK(x).    (5.1.3)

When m-point Gauss quadrature formulae are used to compute the surface integrals in (5.1.3), the entries of the coefficient matrix take the form

    A_{KL} = ∑_{i=1}^{m} ∑_{j=1}^{m} ω_i ω_j G(x_{Ki}, y_{Lj}) B_K(x_{Ki}) · B_L(y_{Lj}).    (5.1.4)
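To illustrate (5.1.4), a sketch of the assembly of one entry follows; the quadrature points and weights and the basis-function callables are assumed inputs, not the actual assembly code used in this thesis.

import numpy as np

def entry_KL(G, xq, wx, yq, wy, BK, BL):
    # Double quadrature sum of (5.1.4) for one matrix entry A_KL.
    # xq, yq: quadrature points on triangles K and L; wx, wy: weights;
    # BK, BL: basis functions mapping a point to a 3-vector.
    a = 0.0 + 0.0j
    for xi, wi in zip(xq, wx):
        for yj, wj in zip(yq, wy):
            a += wi * wj * G(xi, yj) * np.dot(BK(xi), BL(yj))
    return a

def G(x, y, k=1.0):
    # The kernel (5.1.2) for a wavenumber k.
    r = np.linalg.norm(np.asarray(x) - np.asarray(y))
    return np.exp(-1j * k * r) / r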

Single and multilevel variants of the FMM exist and, for the multilevel algorithm, there are adaptive variants that handle inhomogeneous discretizations efficiently. In the one-level algorithm, the 3D obstacle is entirely enclosed in a large rectangular domain, and the domain is divided into eight boxes (four in 2D). Each box is recursively divided until the length of the edges of the boxes of the current level is small enough compared with the wavelength. The neighbourhood of a box is defined by the box itself and its 26 adjacent neighbours (eight in 2D). The interactions of degrees of freedom within nearby boxes are computed exactly from (5.1.4), where the Green's function is expressed via (5.1.2). The contributions of far-away cubes are computed approximately: for each far-away box, the effect of a large number of degrees of freedom is concentrated into one multipole coefficient, computed using a truncated series expansion of the Green's function

    G(x, y) = ∑_{p=1}^{P} ψ_p(x) φ_p(y).    (5.1.5)

The expansion (5.1.5) separates the Green's function into two sets of terms, ψ_p and φ_p, that depend on the observation point x and on the source (or evaluation) point y, respectively. In (5.1.5) the origin of the expansion is near the source point and the observation point x is far away. Local coefficients for the observation cubes are computed by summing together the multipole coefficients of far-away boxes, and the total effect of the far field on each observation point is evaluated from the local expansions (see Figure 5.1.1 for a 2D illustration). Local and multipole coefficients can be computed in a preprocessing step; the approximate computation of the far field enables us to reduce the computational cost of the matrix-vector product to O(n^{3/2}) in the basic one-level algorithm.
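The computational saving behind the separable expansion can be seen in a few lines: once the kernel is written as in (5.1.5), a far-field block becomes a rank-P product, and a matrix-vector product with it costs O((m + n)P) instead of O(mn). The factors below are random placeholders standing in for the actual expansion functions.

import numpy as np

rng = np.random.default_rng(0)
m, n, P = 1000, 800, 16             # observation points, sources, expansion length
Psi = rng.standard_normal((m, P))   # stands in for psi_p(x_i)
Phi = rng.standard_normal((n, P))   # stands in for phi_p(y_j)
q = rng.standard_normal(n)          # source strengths

y_fast = Psi @ (Phi.T @ q)          # O((m + n) P) operations
y_slow = (Psi @ Phi.T) @ q          # forming the dense block costs O(m n)
assert np.allclose(y_fast, y_slow)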

In the hierarchical multilevel algorithm, the obstacle is enclosed in a cube, the cube is divided into eight subcubes, and each subcube is recursively divided until the size of the smallest box is generally half a wavelength. Tree-structured data are used at all levels; in particular, only non-empty cubes are indexed and recorded in the data structure. The resulting tree is called an oct-tree (see Figure 5.1.2) and we refer to its leaves as the leaf-boxes. The oct-tree provides a hierarchical representation of the computational domain partitioned by boxes. Each box has one parent in the oct-tree, except for the largest cube, which encloses the whole domain, and up to eight children; the leaf-boxes obviously have no children. Multipole coefficients are computed for all cubes in the lowest level of the oct-tree, that is, for the leaf-boxes. Multipole coefficients of the parent cubes in the hierarchy are computed by summing together contributions from the multipole coefficients of their children. The process is repeated recursively up to the coarsest possible level. For each observation cube, an interaction list is defined that consists of those cubes that are not neighbours of the cube itself but whose parent is a neighbour of the cube's parent. In Figure 5.1.3 we denote by dashed lines the interaction list



for the observation cube in the 2D case. The interactions of degrees of freedom within neighbouring boxes are computed exactly, while the interactions between cubes in the interaction list are computed using the FMM. All the other interactions are computed hierarchically on a coarser level by traversing the oct-tree. Both the computational cost and the memory requirement of the algorithm are of order O(n log n). For further details on the algorithmic steps see [39, 115, 124], and [38, 44, 45, 46] for recent theoretical investigations. Parallel implementations of hierarchical methods have been described in [78, 79, 80, 81, 126, 149].
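The interaction-list rule just described fits in a few lines. The sketch below assumes a hypothetical oct-tree stored as dictionaries, where parent[b] is the parent of box b, children[p] lists the children of box p, and neighbours[b] is the set containing b and its adjacent boxes on the same level.

def interaction_list(b, parent, children, neighbours):
    # Children of the neighbours of b's parent ...
    candidates = [c for p in neighbours[parent[b]] for c in children[p]]
    # ... that are not themselves neighbours of b.
    return [c for c in candidates if c not in neighbours[b]]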

Figure 5.1.1: Interactions in the one-level FMM. For each leaf-box, the interactions with the gray neighbouring leaf-boxes are computed directly. The contributions of far-away cubes are computed approximately: the multipole expansions of far-away boxes are translated to local expansions for the leaf-box, these contributions are summed together, and the total field induced by far-away cubes is evaluated from the local expansions.

5.2 Implementation of the Frobenius-norm minimization preconditioner in the fast multipole framework

An efficient implementation of the Frobenius-norm minimization preconditioner in the FMM context exploits the box-wise partitioning of the domain. The subdivision into boxes of the computational domain uses



Figure 5.1.2: The oct-tree in the FMM algorithm. The maximum number of children is eight; the actual number corresponds to the subset of the eight subcubes that intersect the object (courtesy of G. Sylvand, INRIA CERMICS).

geometric information from the obstacle, that is, the spatial coordinates of its degrees of freedom. As we know from Chapter 3, this information can be profitably used to compute an effective a priori sparsity pattern for the approximate inverse. In the FMM implementation, we adopt the following criterion: the nonzero structure of each column of the preconditioner is defined by retaining all the edges within a given leaf-box and those in one level of neighbouring boxes. We recall that the neighbourhood of a box is defined by the box itself and its 26 adjacent neighbours (eight in 2D). The sparse approximation of the dense coefficient matrix is defined by retaining the entries associated with edges included in the given leaf-box as well as those belonging to the two levels of neighbours. The actual entries of the approximate inverse are computed column by column by solving independent least-squares problems. The main advantage of defining the patterns of the preconditioner and of the sparsified matrix box-wise is that we only have to compute one QR factorization per leaf-box: the least-squares problems corresponding to edges within the same box are identical, because they are defined using the same nonzero structure and the same entries of A. This means that the QR factorization can be performed once and reused many times, which improves the efficiency of the computation significantly. The preconditioner has a sparse block structure; each block is dense and is associated with one leaf-box. Its construction can use a partitioning different from the one used to approximate the dense coefficient matrix and represented by the oct-tree.



Figure 5.1.3: Interactions in the multilevel FMM. The interactions for the gray boxes are computed directly. The dashed lines denote the interaction list for the observation box, which consists of those cubes that are not neighbours of the cube itself but whose parent is a neighbour of the cube's parent. The interactions of the cubes in the list are computed using the FMM. All the other interactions are computed hierarchically on a coarser level, denoted by solid lines.

The size of the smallest boxes in the partitioning associated with the preconditioner is a user-defined parameter that can be tuned to control the number of nonzeros computed per row, that is, the density of the preconditioner. According to our criterion, the larger the size of the leaf-boxes, the larger the geometric neighbourhood that determines the sparsity structure of the columns of the preconditioner. Parallelism can be exploited by assigning disjoint subsets of leaf-boxes to different processors and performing the least-squares solutions independently on each processor. Communication is required only to get the entries of the coefficient matrix associated with neighbouring leaf-boxes.
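A compact sketch of this box-wise construction is given below, in dense storage for readability; boxes[b] (the DOFs of leaf-box b) and nbrs[b] (the DOFs of the box and its one level of neighbours) are assumed inputs, since the actual code works inside the parallel FMM data structures.

import numpy as np

def boxwise_spai(A_tilde, boxes, nbrs):
    # All columns (edges) of a leaf-box share the same pattern, so a
    # single QR factorization per box is computed and then reused for
    # every column of the box.
    n = A_tilde.shape[0]
    M = np.zeros_like(A_tilde)
    for b, dofs in enumerate(boxes):
        J = np.asarray(nbrs[b])                  # shared column pattern
        Q, R = np.linalg.qr(A_tilde[:, J])       # one QR per leaf-box
        for j in dofs:                           # reused for each edge
            e = np.zeros(n, dtype=A_tilde.dtype)
            e[j] = 1.0
            M[J, j] = np.linalg.solve(R, Q.conj().T @ e)
    return M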

5.3 Numerical scalability of the preconditioner

In this section we show results concerning the numerical scalability of the Frobenius-norm minimization preconditioner. They have been obtained by increasing the value of the frequency while illuminating the same obstacle; the surface of the object is always discretized using ten points per wavelength. We consider two test examples: a sphere of radius 1 metre
per wavelength. We consider two test examples: a sphere of radius 1 metre



and an Airbus aircraft (see Figure 5.3.4) that represents a real-life model problem in an industrial context.

Figure 5.3.4: Mesh associated with the Airbus aircraft (courtesy of EADS). The surface is discretized by 15784 triangles.

In Table 5.3.1, we present the number of matrix-vector products using either GMRES(30) or TFQMR with a required accuracy of 10⁻² on the normwise backward error ‖r‖/‖b‖, where r denotes the residual and b the right-hand side of the linear system, for the experiments on the sphere. This tolerance is adequate for engineering purposes, as it enables us to determine correctly the radar cross section of the object. The symbol '–' means no convergence after 1500 iterations. In Table 5.3.2, we show the number of iterations and the parallel elapsed time to build the preconditioner and to solve the linear system as its size is increased. Similar information is reported for the experiments on the Airbus aircraft in Tables 5.3.3 and 5.3.4. All the runs were performed in single precision on eight processors of a Compaq Alpha server. The Compaq Alpha server is a cluster of Symmetric Multi-Processors; each node consists of four Alpha processors that share 512 Mb of memory. On that computer the



temporary disk space that can be used by the out-of-core solver is around 189 Gb.

Size of the    Density of the    Frequency   radius/λ   GMRES(30)  TFQMR
linear system  preconditioner      (GHz)
    40368          1.16%            0.9          3           99      152
    71148          0.33%            1.2          4           83      171
   112908          0.21%            1.5          5           96      134
   161472          0.15%            1.8          6           96      654
   221952          0.11%            2.1          7          438       –
   288300          0.08%            2.4          8          348       –
   549552          0.04%            3.3         11          532       –
  1023168          0.02%            4.5         15         1196       –

Table 5.3.1: Total number of matrix-vector products required to converge on a sphere for problems of increasing size; tolerance = 10⁻². The size of the leaf-boxes in the oct-tree associated with the preconditioner is 0.125 wavelengths.

Size of the    GMRES(30)   Disk memory     Construction   Solution
linear system  iterations  used (Mbytes)       time         time
    71148           83          16.5          13 mins       3 mins
   161472           96          37.8          30 mins       8 mins
   288300          348          67.9          55 mins       1 hour
   549552          532         129.7        1 h 45 mins     4 hours
  1023168         1196         243.5        3 h 10 mins     1 day

Table 5.3.2: Elapsed time required to build the preconditioner and for GMRES(30) to converge on a sphere, for problems of increasing size, on eight processors of a Compaq Alpha server; tolerance = 10⁻².



Size of the    Frequency   GMRES(30)  TFQMR
linear system    (GHz)
    23676          2.3          61       –
    94704          4.6         101       –
   213084          6.9         225       –
   378816          9.2          –        –
   591900         11.4          –        –
  1160124         16.1          –        –

Table 5.3.3: Total number of matrix-vector products required to converge on an aircraft for problems of increasing size; tolerance = 2·10⁻².

Size of the    GMRES(30)   Disk memory     Construction   Solution
linear system  iterations  used (Mbytes)       time         time
    23676           61           5.7           4 mins       3 mins
    94704          101          26.3          26 mins      13 mins
   213084          225          63.7          54 mins      47 mins
   591900           –          169.9        2 h 30 mins       –
  1160124           –          338.8        3 h 15 mins       –

Table 5.3.4: Elapsed time required to build the preconditioner and for GMRES(30) to converge on an aircraft, for problems of increasing size, on eight processors of a Compaq Alpha server; tolerance = 2·10⁻².

The number of iterations and the computational cost grow rapidly with the problem size. On the sphere, the number of iterations required by GMRES(30) is nearly constant for small problems but increases linearly for larger problems. The solution by GMRES(30) of a scattering problem at a frequency of 3.3 GHz, discretized with half a million points, requires 532 matrix-vector products and four hours of computation to solve the associated linear system. Nearly one day of computation is necessary to solve the same problem at a frequency of 4.5 GHz; in this case the matrix has one million unknowns, and GMRES(30) requires 1196 iterations to converge. Compared with GMRES, TFQMR exhibits a very poor convergence behaviour: it never converges in fewer than 1500 matrix-vector products on systems with more than two hundred thousand unknowns. The Airbus aircraft is even more challenging to solve. On the smallest problem,



of size 23676, neither GMRES(30) nor TFQMR converges to 10⁻² in 1500 iterations, and 115 iterations are required by full GMRES. On a larger test, of size 94704, stagnation occurs for small or medium restarts, and full GMRES requires 625 iterations. In Figure 5.3.7, we show for this problem the normwise backward error after 1500 iterations of GMRES for different values of the restart, when stagnation appears. Convergence to 10⁻² is achieved only when a very large restart (around 500) is selected. For large problems, however, this choice might not be affordable, because it is too demanding in terms of storage requirements. If we relax the required accuracy to 2·10⁻², convergence becomes much easier to achieve in a reasonable elapsed time and at an affordable memory cost, at least for medium-size problems. As can be observed in Table 5.3.3, GMRES(30) then converges in fewer than 1500 iterations on problems of size up to two hundred thousand unknowns. Although this tolerance may seem artificial, we checked at the end of the computation that the radar cross section of the obstacle was accurately determined. In Figures 5.3.5 and 5.3.6 we show the typical curves of the radar cross section for an Airbus aircraft discretized with 200000 unknowns. The quantity reported on the ordinate axis indicates the value of the energy radiated back at different incidence angles. The RCS curve depicted in Figure 5.3.5 is obtained when we require an accuracy of 2·10⁻² on the normwise backward error in the solution of the linear system. The RCS curve depicted in Figure 5.3.6 is obtained using another integral formulation, the CFIE, which is better conditioned and simpler to solve, and requiring an accuracy of 10⁻⁶ on the normwise backward error in the iterative solution. The CFIE formulation is less general than the EFIE but can be used on closed targets like the Airbus aircraft. It can be observed that in both figures the peaks are equally well approximated. Thus, for engineering purposes the solution is still meaningful and can be exploited in the design process. We use this tolerance for the remaining numerical experiments on the Airbus aircraft.

In Table 5.3.5, we investigate the influence of the density on the quality of the preconditioner for the aircraft. We adopt the same criterion described in Section 5.2 to define the sparsity patterns, but we increase the size of the leaf-boxes in the oct-tree associated with the preconditioner. The best trade-off between cost and performance is obtained for 0.125 wavelengths, which is the default value set in the code. If the preconditioner is reused to solve systems with the same coefficient matrix and multiple right-hand sides, it might be worth computing more nonzeros, because the construction cost can be quickly amortized. If the size of the leaf-boxes is large enough, the preconditioner is very effective in reducing the number of GMRES iterations; for values smaller than 0.1 wavelengths the preconditioner is very sparse and quite poor, while for values larger than 0.2 wavelengths the memory requirements exceed the limits of our machine.

Finally, in Table 5.3.6, we show the parallel scalability of the implementation of the preconditioner in the FMM code [135].



Figure 5.3.5: The RCS curve for an Airbus aircraft discretized with 200000 unknowns. The problem is formulated using the EFIE, with a tolerance of 2·10⁻² in the iterative solution. The quantity reported on the ordinate axis indicates the value of the energy radiated back at different incidence angles.

Figure 5.3.6: The RCS curve for an Airbus aircraft discretized with 200000 unknowns. The problem is formulated using the CFIE, with a tolerance of 10⁻⁶ in the iterative solution. The quantity reported on the ordinate axis indicates the value of the energy radiated back at different incidence angles.

We solve problems of increasing size on a larger number of processors, keeping the number of unknowns per processor constant. We refer to [135] for a complete description of the parallel code that we used.



Figure 5.3.7: Effect of the restart parameter on GMRES stagnation on an aircraft with 94704 unknowns. (Plot: normwise backward error after 1500 iterations of restarted GMRES versus the value of the restart, from 0 to 500; the error ranges from about 0.03 down to 0.01.)

radius (in     # nonzeros     # mat-vec   Construction   Solution     Overall
wavelengths)   per row in M   in GMRES     time (sec)    time (sec)  time (sec)
   0.097           183            –           1275            –           –
   0.110           235           472          1836          8121        9957
   0.125           299           225          2593          2846        5439
   0.141           372            –           4213            –           –
   0.157           461            –           5866            –           –
   0.176           569           278          7234          3637       10871
   0.195           684           129         10043          1571       11614

Table 5.3.5: Elapsed time to build the preconditioner, elapsed time to solve the problem, and total number of matrix-vector products using GMRES(30) on an aircraft with 213084 unknowns; tolerance = 2·10⁻²; eight processors of the Compaq machine, varying the parameter controlling the density of the preconditioner. The symbol '–' means stagnation after 1000 iterations.



Problem size   Nb procs   Construction   Elapsed time     Elapsed time
                           time (sec)    precond (sec)    mat-vec (sec)
   112908          8           513           0.39             1.77
   161472         12           488           0.40             1.95
   221952         16           497           0.43             2.15
   288300         20           520           0.45             2.28
   342732         24           523           0.47             3.10
   393132         28           514           0.47             3.30
   451632         32           509           0.48             2.80
   674028         48           504           0.54             3.70
   900912         64           514           0.60             3.80

Table 5.3.6: Parallel scalability of the code with respect to the construction and application of the preconditioner and to the matrix-vector product, on problems of increasing size. The test example is the Airbus aircraft.

5.4 Improving the preconditioner robustness using embedded iterations

The numerical results shown in the previous section indicate that the Frobenius-norm minimization preconditioner tends to be less effective as the problem size increases. By its nature, the sparse approximate inverse is inherently local, because each degree of freedom is coupled to only a few neighbours. The compact support that we use to define the preconditioner does not allow an exchange of global information, and when the exact inverse is globally coupled this lack of global information may have a severe impact on the quality of the preconditioner. In addition, in a multipole context, the density of the sparse approximate inverse tends to decrease for increasing values of the frequency, because the size of the subdivision boxes gets smaller when the frequency of the problem is higher. For the solution of large problems it may therefore be necessary to introduce some mechanism to recover global information on the numerical behaviour of the discrete Green's function. In this section we investigate the behaviour of inner-outer solution schemes implemented in the FMM context. We consider in particular FGMRES [121] as the outer solver, with an inner GMRES iteration preconditioned with the Frobenius-norm minimization method; for FGMRES, we use the implementation described in [64]. The motivation that naturally leads us to consider inner-outer schemes is to try to balance the locality of the preconditioner with the use of the multipole matrix.



Outer solver −→ FGMRES, FQMR
Do k = 1, 2, ...
   • M-V product: FMM with high accuracy
   • Preconditioning: inner solver (GMRES, TFQMR, ...)
       Do i = 1, 2, ...
          • M-V product: FMM with low accuracy
          • Preconditioning: M_Frob
       End Do
End Do

Figure 5.4.8: Inner-outer solution schemes in the FMM context. Sketch of the algorithm.

to balance the locality of the preconditioner with the use of the multipole matrix. The matrix-vector products within the outer and the inner solvers are carried out at different accuracies: a highly accurate FMM is used within the outer solver, which actually solves the linear system, while a less accurate FMM within the inner solver is used as a preconditioner for the outer scheme. In fact, we solve a nearby system for the preconditioning operation, which enables us to save considerable computational effort during the iterative process. More precisely, the FMM accuracy is "high" for the FGMRES iteration (the relative error in the matrix-vector computation is around $5 \cdot 10^{-4}$ compared to the exact computation) and "medium" for the inner iteration (the relative error is around $10^{-3}$). We present a sketch of the algorithm in Figure 5.4.8. One could apply this idea recursively and embed several FGMRES schemes with decreasing FMM accuracy, down to the lowest accuracy in the innermost GMRES. However, in our work we only consider a two-level scheme; we will see that this is already quite effective.
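To make the structure of the scheme concrete, the following is a minimal Python/NumPy sketch of a restarted, right-preconditioned flexible GMRES. It is an illustration rather than the code used in this chapter: the functions matvec (standing for the high-accuracy FMM product) and precond (standing for a few inner GMRES steps with the low-accuracy FMM, preconditioned by the Frobenius-norm method) are assumptions of this sketch and must be supplied by the user.

    import numpy as np

    def fgmres(matvec, b, precond, m=5, tol=2e-2, max_restarts=100):
        # Restarted, right-preconditioned flexible GMRES(m).
        #   matvec  : x -> A x         (e.g. a high-accuracy FMM product)
        #   precond : r -> z ~ inv(A) r, allowed to change between calls
        #             (e.g. a few GMRES steps with a low-accuracy FMM)
        n = b.shape[0]
        x = np.zeros_like(b)
        beta0 = np.linalg.norm(b)
        for _ in range(max_restarts):
            r = b - matvec(x)
            beta = np.linalg.norm(r)
            if beta <= tol * beta0:                   # normwise backward error test
                return x
            V = np.zeros((n, m + 1), dtype=b.dtype)   # Arnoldi basis
            Z = np.zeros((n, m), dtype=b.dtype)       # preconditioned directions
            H = np.zeros((m + 1, m), dtype=b.dtype)
            V[:, 0] = r / beta
            m_eff = m
            for j in range(m):
                Z[:, j] = precond(V[:, j])            # flexible step: M_j may vary
                w = matvec(Z[:, j])
                for i in range(j + 1):                # modified Gram-Schmidt
                    H[i, j] = np.vdot(V[:, i], w)
                    w = w - H[i, j] * V[:, i]
                H[j + 1, j] = np.linalg.norm(w)
                if H[j + 1, j] < 1e-14 * beta:        # (lucky) breakdown
                    m_eff = j + 1
                    break
                V[:, j + 1] = w / H[j + 1, j]
            e1 = np.zeros(m_eff + 1, dtype=b.dtype)
            e1[0] = beta
            y = np.linalg.lstsq(H[:m_eff + 1, :m_eff], e1, rcond=None)[0]
            x = x + Z[:, :m_eff] @ y                  # the update uses Z, not V
        return x

Note that the preconditioned directions Z must be stored alongside the Krylov basis V; this is precisely what doubles the storage of FGMRES with respect to standard GMRES for the same restart value, a point we return to below.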

Among the various possibilities, we select FGMRES(5) and GMRES(20), which seem to give the optimal trade-off, as the results reported in Tables 5.4.7 and 5.4.8 show for experiments on a sphere with 367500 points and an Airbus aircraft with 213084 points, respectively.



restart FGMRES | restart GMRES | max inner GMRES | total inner mat-vec | total outer mat-vec | Solution time (sec)
5 | 10 | 10 | 230 | 29 | 4211
5 | 10 | 20 | 231 | 15 | 3526
5 | 10 | 30 | 288 | 12 | 4544
5 | 20 | 20 | 180 | 12 | 2741
5 | 20 | 30 | 248 | 11 | 3967
5 | 20 | 40 | 246 | 9 | 3785
10 | 10 | 10 | 180 | 21 | 2912

Table 5.4.7: Global elapsed time and total number of matrix-vector products required to converge on a sphere with 367500 points, varying the size of the restart parameters and the maximum number of inner GMRES iterations per FGMRES preconditioning step – tolerance = $10^{-2}$ – eight processors Compaq.

restart FGMRES | restart GMRES | max inner GMRES | total outer mat-vec | total inner mat-vec | Solution time (sec)
5 | 10 | 10 | +200 | +4000 | –
5 | 10 | 20 | 13 | 210 | 2198
5 | 10 | 30 | 12 | 288 | 3178
5 | 20 | 20 | 11 | 160 | 1768
5 | 20 | 30 | 9 | 186 | 2020
5 | 20 | 40 | 7 | 205 | 2183
5 | 30 | 30 | 9 | 180 | 1946
10 | 10 | 10 | 19 | 160 | 1861
10 | 20 | 20 | 10 | 160 | 1827
10 | 30 | 30 | 8 | 180 | 2123

Table 5.4.8: Global elapsed time and total number of matrix-vector products required to converge on an aircraft with 213084 unknowns, varying the size of the restart parameters and the maximum number of inner GMRES iterations per FGMRES preconditioning step – tolerance = $2 \cdot 10^{-2}$ – eight processors Compaq. The symbol '–' means no convergence.



The convergence history of GMRES depicted in Figure 5.4.9 for different values of the restart gives us some clues to the numerical behaviour of the proposed scheme. The residual of GMRES tends to decrease very rapidly in the first few iterations independently of the restart, then decreases much more slowly, and finally stagnates at a value that depends on the restart: the larger the restart, the lower the stagnation value. This suggests that a few steps (up to 20) in the inner solver can be very effective for obtaining a significant reduction of the initial residual. A different numerical behaviour has been observed with TFQMR as the inner solver: the residual is nearly constant, or decreases very slowly, at the beginning of the convergence, so that this method is ineffective as an inner solver. Figure 5.4.9 also shows that large restarts of GMRES do not enable a further reduction of the normwise backward error at the beginning of the convergence. Thus small restarts should be preferred in the inner GMRES iterations.

[Plot omitted: normwise backward error (from 0.01 to 0.07) against the number of M-V products (from 0 to 1500), one curve per restart value: 10, 20, 30, 50, 80, 150, 300, 500.]

Figure 5.4.9: Convergence history of restarted GMRES for different values of the restart on an aircraft with 94704 unknowns.

We show the results of some preliminary experiments in Tables 5.4.9 and 5.4.10, where we report the number of inner and outer matrix-vector products needed to achieve convergence on the sphere using a tolerance of $10^{-2}$, and on the Airbus aircraft using a tolerance of $2 \cdot 10^{-2}$. We also give timings. The comparison with the results shown in Tables 5.3.1 and 5.3.3 is fair because GMRES(30) has exactly the same storage requirements as the combination FGMRES(5)/GMRES(20); in fact, for the same restart value, the storage requirement for the FGMRES algorithm is twice that for the standard GMRES algorithm, as it stores the preconditioned vectors of the Krylov basis. The combination FGMRES/GMRES remarkably enhances the robustness of the preconditioner on large problems. On the sphere with 367500 points, it enables convergence in 16 outer and 252 total inner iterations, whereas GMRES(30) does not converge in 1500 iterations. On the sphere with one million unknowns, the elapsed time for the iterative solution is reduced from 11 hours to one and a half hours on 16 processors. The enhancement of the robustness of the preconditioner is even more significant on the Airbus aircraft, as GMRES(30) does not converge for problem sizes larger than around 200000 unknowns. This can be observed in Table 5.4.10, and also in Figure 5.4.10, where we report the normwise backward error after 100 outer iterations of FGMRES for different values of the restart of FGMRES. The value reported in this figure can be considered as the level of stagnation of the normwise backward error. The depicted curve can be compared to the one given in Figure 5.3.7: the normwise backward error is much smaller than that obtained with the standard GMRES at a comparable computational cost. Finally, we mention that the combination FGMRES(5)/GMRES(20) does not converge on one problem, of size 378816, using a tolerance of $2 \cdot 10^{-2}$. In fact, for some specific values of the frequency, resonance phenomena may occur in the associated physical problem, and the resulting linear system can become very ill-conditioned.

Size of the linear system | FGMRES(5) mat-vec | GMRES(20) mat-vec | Solution time
40368 | 7 | 105 | 2 mins
71148 | 7 | 105 | 4 mins
112908 | 7 | 105 | 7 mins
161472 | 9 | 126 | 13 mins
221952 | 13 | 210 | 29 mins
288300 | 13 | 210 | 37 mins
367500 | 16 | 252 | 1 h 10 mins
549552 | 17 | 260 | 1 h 50 mins
1023168 | 17 | 260 | 3 h 20 mins

Table 5.4.9: Total number of matrix-vector products required to converge on a sphere on problems of increasing size – tolerance = $10^{-2}$.



Size of the linear system | FGMRES(5) mat-vec | GMRES(20) mat-vec | Solution time
23676 | 15 | 220 | 7 mins
94704 | 7 | 100 | 9 mins
213084 | 11 | 160 | 36 mins
591900 | 17 | 260 | 3 h 25 mins
1160124 | 19 | 300 | 8 h 42 mins

Table 5.4.10: Total number of matrix-vector products required to converge on an aircraft on problems of increasing size – tolerance = $2 \cdot 10^{-2}$.

[Plot omitted: normwise backward error after 100 iterations of restarted FGMRES (from 0.004 to 0.016) against the value of the restart for FGMRES (from 0 to 150).]

Figure 5.4.10: Effect of the restart parameter on FGMRES stagnation on an aircraft with 94704 unknowns using GMRES(20) as inner solver.

5.5 Concluding remarks

In this chapter, we have described the implementation of the Frobenius-norm minimization preconditioner within the code that implements the Fast Multipole Method (FMM). We have studied the numerical and parallel scalability of the implementation for the solution of large problems, with up to one million unknowns, and we have investigated the numerical behaviour of inner-outer iterative solution schemes implemented in a multipole context with different levels of accuracy for the matrix-vector products in the inner and outer loops. In particular, we have shown that the combination FGMRES(5)/GMRES(20) can effectively enhance the robustness of the preconditioner, significantly reducing the computational cost and the storage requirement for the solution of large problems. Most of the experiments shown in this chapter require a huge amount of computation and storage, and they often reach the memory limits of our target machine. For the solution of systems with one million unknowns, direct methods would require 8 Tbytes of storage and 37 years of computation on one processor of the target computer (assuming the computation runs at peak performance).

Some questions are still open. One issue concerns the optimal tuning of the inner accuracy of the FMM. In the numerical experiments we selected a "medium" accuracy for the inner iteration; as mentioned before, using a less accurate FMM in the inner GMRES does not enable us to get convergence of the outer FGMRES. A multilevel scheme can be designed as a natural extension of the simple two-level scheme considered in this chapter, with several embedded FGMRES levels going down to the lowest accuracy in the innermost GMRES. An interesting further experiment might be to use variants of these schemes, based on the FQMR method [136] as the outer solver and SQMR as the inner solver. The SQMR scheme is remarkably robust on these applications when used in combination with a symmetric Frobenius-norm minimization preconditioner such as those introduced in Chapter 4.




Chapter 6

Spectral two-level preconditioner

In the previous chapter, we analysed the numerical behaviour of the Frobenius-norm minimization method for the solution of large problems. The numerical results indicate that the preconditioner is less effective when the problem size increases, because of the inherently local nature of the approximate inverse and the global behaviour of the equations. In this chapter, we introduce an algebraic multilevel strategy based on low-rank updates of the preconditioner, computed by using spectral information of the preconditioned matrix.

The chapter is organized in the following way. In Section 6.1, we motivate the idea of the construction of multilevel preconditioners via low-rank updates, and we provide a few references to similar work. In Section 6.2, we describe an additive formulation of the preconditioner for both unsymmetric and symmetric systems, and we show the results of numerical experiments illustrating the computational and numerical efficiency of the algorithm on a set of model problems arising from electromagnetic calculations. In Section 6.3, we describe a multiplicative formulation of the preconditioner and give some comparative results. We conclude the chapter with some final remarks and perspectives.

6.1 Introduction and motivation

The construction of the Frobenius-norm minimization preconditioner is inherently local. Each degree of freedom in the approximate inverse is coupled to only a very few neighbours, and this compact support does not allow an exchange of global information. When the exact inverse is globally coupled, the lack of global information may have a severe impact on the quality of the preconditioner. The discrete Green's function in electromagnetic applications exhibits a rapid decay; nevertheless, the exact inverse is dense and thus has global support. The locality of the preconditioner can be reduced by increasing the number of nonzeros computed, but the construction cost grows almost cubically with respect to the density. Enlarging the sparsity pattern imposed on A can be a cheaper remedy, because the computational cost of the least-squares solutions grows only linearly with the number of rows. However, in a multipole context, where only the entries of the coefficient matrix associated with the near-field interactions are available, the computation of additional entries of A requires the approximation of surface integrals.
A requires the approximation of surface integrals.<br />

In this chapter, we propose a refinement technique which enhances the<br />

robustness of the approximate inverse on large problems. The method<br />

is based on the introduction of low-rank updates computed by exploiting<br />

spectral in<strong>for</strong>mation of the preconditioned matrix. The purpose here<br />

is to remove the effect of the smallest eigenvalues in magnitude in the<br />

preconditioned matrix, which potentially can slow down the convergence<br />

of Krylov solvers. We discussed in Chapter 2 that a clustered spectrum<br />

is highly desirable property <strong>for</strong> the rapid convergence of Krylov methods.<br />

In exact arithmetic the number of distinct eigenvalues would determine<br />

the maximum dimension of the Krylov subspace. If the diameters of<br />

the clusters are small enough, the eigenvalues within each cluster behave<br />

numerically like a single eigenvalue, and we would expect less iterations<br />

of a Krylov method to produce reasonably accurate approximations. The<br />

Frobenius-norm minimization preconditioner succeeds in clustering most of<br />

the eigenvalues far <strong>from</strong> the origin, nevertheless eigenvalues nearest zero<br />

can potentially slow down convergence. Theoretical studies have related<br />

super<strong>linear</strong> convergence of GMRES to the convergence of Ritz values [143].<br />

Basically, convergence occurs as if, at each iteration of GMRES, the next<br />

smallest eigenvalue in magnitude is removed <strong>from</strong> the system. As the<br />

restarting procedure destroys in<strong>for</strong>mation about the Ritz values at each<br />

restart, the super<strong>linear</strong> convergence may be lost. Thus removing the effect<br />

of small eigenvalues in the preconditioned matrix can have a beneficial effect<br />

on the convergence.<br />

There are essentially two different approaches <strong>for</strong> exploiting in<strong>for</strong>mation<br />

related to the smallest eigenvalues during the iteration. The first<br />

idea is to compute a few, k say, approximate eigenvectors of MA<br />

corresponding to the k smallest eigenvalues in magnitude, and enlarge<br />

the Krylov subspace with those directions. At each restart, let<br />

u 1 , u 2 , ..., u k be approximate eigenvectors corresponding to the approximate<br />

eigenvalues of MA closest to the origin. The updated solution of<br />

the <strong>linear</strong> system in the next cycle of GMRES is extracted <strong>from</strong><br />

Span{r 0 , Ar 0 , A 2 r 0 , A 3 r 0 , ..., A m−k−1 r 0 , u 1 , u 2 , ..., u k }. This approach is<br />

referred to as the augmented subspace approach (see [112, 113, 120]).



The approximate eigenvectors can be chosen to be Ritz vectors from the Arnoldi method. The standard implementation of the restarted GMRES(m) algorithm is based on the Arnoldi process, and this allows us to recover spectral information about $MA$ during the iterations. Deflation techniques of this kind have been proposed in [94, 43].

The second idea exploits spectral information gathered during the Arnoldi process to determine an approximation of an invariant subspace of $A$ associated with the eigenvalues nearest the origin, and uses this information to construct a preconditioner or to update the preconditioner. The idea of using exact invariant subspaces to improve the eigenvalue distribution was proposed in [119]. Information from the invariant subspace associated with the smallest eigenvalues and its orthogonal complement is used to construct a preconditioner in the approach proposed in [7]. This information can be obtained from the Arnoldi decomposition of a matrix $A$ of size $n$, which has the form

$$ A V_m = V_m H_m + f_m e_m^T, $$

where $V_m \in \mathbb{R}^{n \times m}$, $f_m \in \mathbb{R}^n$, $e_m$ is the $m$-th unit vector of $\mathbb{R}^m$, $V_m^T V_m = I_m$, $V_m^T f_m = 0$, and $H_m \in \mathbb{R}^{m \times m}$ is an upper Hessenberg matrix. If the Arnoldi process is started from $V_m e_1 = r_0/\|r_0\|$, the columns of $V_m$ span the Krylov subspace $\mathcal{K}_m(A, r_0)$. Let the matrix $V_k \in \mathbb{R}^{n \times k}$ consist of the first $k$ columns $v_1, v_2, \ldots, v_k$ of $V_m$, and let the columns of the orthogonal matrix $W_{n-k}$ span the orthogonal complement of $\operatorname{span}\{v_1, v_2, \ldots, v_k\}$. As $W_{n-k}^T W_{n-k} = I_{n-k}$, the columns of the matrix $[V_k \ W_{n-k}]$ form an orthogonal basis of $\mathbb{R}^n$. In [7] the inverse of the matrix

$$ M = V_k H_k V_k^T + W_{n-k} W_{n-k}^T $$

is used as a left preconditioner. It can be expressed as

$$ M^{-1} = V_k H_k^{-1} V_k^T + W_{n-k} W_{n-k}^T. $$

At each restart, the preconditioner is updated by extracting new eigenvalues which are the smallest in magnitude. The proposed algorithm uses the recursion formulae of the implicitly restarted Arnoldi (IRA) method described in [132], and the determination of the preconditioner does not require the evaluation of any matrix-vector products with the matrix $A$ in addition to those needed for the Arnoldi process.
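As an aside, applying this $M^{-1}$ does not require forming $W_{n-k}$ explicitly, since $W_{n-k} W_{n-k}^T = I - V_k V_k^T$. A minimal NumPy sketch of the application (assuming $V_k$ has orthonormal columns from the Arnoldi process and $H_k$ is the leading $k \times k$ block of the Hessenberg matrix):

    import numpy as np

    def apply_deflated_inverse(Vk, Hk, v):
        # M^{-1} v = V_k H_k^{-1} (V_k^T v) + (I - V_k V_k^T) v,
        # using W W^T = I - V_k V_k^T so that W is never formed.
        c = Vk.T @ v                      # coefficients in the Krylov basis
        return Vk @ np.linalg.solve(Hk, c) + (v - Vk @ c)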

Another adaptive procedure to determine a preconditioner during the GMRES iterations was introduced in [59]. It is based on the same idea of estimating the invariant subspace corresponding to the smallest eigenvalues. The preconditioner relies on a deflation technique such that the linear system is solved exactly in an invariant subspace of dimension $r$ corresponding to the $r$ smallest eigenvalues of $A$.



Finally, a preconditioner for GMRES based on a sequence of rank-one updates that involve the left and right eigenvectors associated with the smallest eigenvalues is proposed in [92]. The method is based on the idea of translating isolated eigenvalues consecutively, group by group, into a vicinity of the point (1.0, 0.0), using low-rank projections of the coefficient matrix of the form

$$ \tilde{A} = A \, (I_n + u_1 v_1^H) \cdots (I_n + u_l v_l^H). $$

The vectors $u_j$ and $v_j$, $j \in [1, l]$, are determined so as to ensure the numerical stability of consecutive translations of groups of isolated eigenvalues of $\tilde{A}$. After each restart of GMRES(m), approximations to the isolated eigenvalues to be translated are computed by the Arnoldi process. The isolated eigenvalues are translated towards the point (1.0, 0.0) of the spectrum, and the next cycle of GMRES(m) is applied to the transformed matrix. The effectiveness of this method relies on the assumption that most of the eigenvalues of $A$ are already clustered close to (1.0, 0.0) in the complex plane.

Most of these schemes are combined with the GMRES procedure, as they derive information directly from its internal Arnoldi process. In our work, we consider an explicit eigencomputation, which makes the preconditioner independent of the Krylov solver used for the actual solution of the linear system.

6.2 Two-level preconditioner via low-rank spectral updates

The Frobenius-norm minimization preconditioner succeeds in clustering most of the eigenvalues far from the origin. This can be observed in Figure 6.2.1, where we see a big cluster near (1.0, 0.0) in the spectrum of the preconditioned matrix for Example 2. This kind of distribution is highly desirable for fast convergence of Krylov solvers. Nevertheless, the eigenvalues nearest to zero can potentially slow down convergence. When we use $M_{2g-g}$, it is difficult to remove all the smallest eigenvalues close to the origin, even if we increase the number of nonzeros.

In the next sections, we propose a refinement technique for the approximate inverse based on the introduction of low-rank corrections computed by using spectral information associated with the smallest eigenvalues of $MA$. Roughly speaking, the proposed technique consists in solving the preconditioned system exactly on a coarse space and using this information to update the preconditioned residual. We first present our technique for unsymmetric linear systems, and then derive a variant for symmetric and SPD matrices.
symmetric and SPD matrices.



[Plot omitted: eigenvalue distribution in the complex plane; real axis from −0.5 to 1.5, imaginary axis from −1.5 to 0.5.]

Figure 6.2.1: Eigenvalue distribution for the coefficient matrix preconditioned by the Frobenius-norm minimization method on Example 2.

6.2.1 Additive formulation

We consider the solution of the linear system

$$ Ax = b, \qquad (6.2.1) $$

where $A$ is an $n \times n$ complex unsymmetric nonsingular matrix, and $x$ and $b$ are vectors of size $n$. The linear system is solved using a preconditioned Krylov solver, and we denote by $M_1$ the left preconditioner, meaning that we solve

$$ M_1 A x = M_1 b. \qquad (6.2.2) $$

We assume that the preconditioned matrix $M_1 A$ is diagonalizable, that is,

$$ M_1 A = V \Lambda V^{-1}, \qquad (6.2.3) $$

with $\Lambda = \operatorname{diag}(\lambda_i)$, where $|\lambda_1| \le \ldots \le |\lambda_n|$ are the eigenvalues and $V = (v_i)$ the associated right eigenvectors. We denote by $U = (u_i)$ the associated left eigenvectors; we then have $U^H V = \operatorname{diag}(u_i^H v_i)$, with $u_i^H v_i \ne 0$ for all $i$ [147]. Let $V_\varepsilon$ be the set of right eigenvectors associated with the eigenvalues $\lambda_i$ such that $|\lambda_i| \le \varepsilon$. Similarly, we define by $U_\varepsilon$ the corresponding subset of left eigenvectors.

Theorem 1. Let

$$ A_c = U_\varepsilon^H M_1 A V_\varepsilon, \qquad M_c = V_\varepsilon A_c^{-1} U_\varepsilon^H M_1 \qquad \text{and} \qquad M = M_1 + M_c. $$

Then $MA$ is diagonalisable and we have $MA = V \operatorname{diag}(\eta_i) V^{-1}$ with

$$ \eta_i = \lambda_i \quad \text{if } |\lambda_i| > \varepsilon, \qquad \eta_i = 1 + \lambda_i \quad \text{if } |\lambda_i| \le \varepsilon. $$

Proof. We first remark that $A_c = \operatorname{diag}(\lambda_i u_i^H v_i)$ with $|\lambda_i| \le \varepsilon$, and therefore $A_c$ is nonsingular. Let $V = (V_\varepsilon, V_{\bar\varepsilon})$, where $V_{\bar\varepsilon}$ is the set of the $(n-k)$ right eigenvectors associated with the eigenvalues $|\lambda_i| > \varepsilon$. Let $D_\varepsilon = \operatorname{diag}(\lambda_i)$ with $|\lambda_i| \le \varepsilon$ and $D_{\bar\varepsilon} = \operatorname{diag}(\lambda_j)$ with $|\lambda_j| > \varepsilon$. The following relations hold:

$$ M A V_\varepsilon = M_1 A V_\varepsilon + V_\varepsilon A_c^{-1} U_\varepsilon^H M_1 A V_\varepsilon = V_\varepsilon D_\varepsilon + V_\varepsilon I_k = V_\varepsilon (D_\varepsilon + I_k), $$

where $I_k$ denotes the $(k \times k)$ identity matrix, and

$$ M A V_{\bar\varepsilon} = M_1 A V_{\bar\varepsilon} + V_\varepsilon A_c^{-1} U_\varepsilon^H M_1 A V_{\bar\varepsilon} = V_{\bar\varepsilon} D_{\bar\varepsilon} + V_\varepsilon A_c^{-1} U_\varepsilon^H V_{\bar\varepsilon} D_{\bar\varepsilon} = V_{\bar\varepsilon} D_{\bar\varepsilon}, \quad \text{as } U_\varepsilon^H V_{\bar\varepsilon} = 0. $$

We then have

$$ M A V = V \begin{pmatrix} D_\varepsilon + I_k & 0 \\ 0 & D_{\bar\varepsilon} \end{pmatrix}. \qquad \blacksquare $$

$A_c$ represents the projection of the matrix $M_1 A$ onto the coarse space defined by the approximate eigenvectors associated with its smallest eigenvalues.
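The statement is easy to check numerically. The following NumPy/SciPy sketch uses a small random system as a toy illustration (not one of the thesis test cases), with a perturbed inverse standing in for $M_1$, and verifies that the $k$ selected eigenvalues of $M_1 A$ are shifted to $1 + \lambda_i$ while the others are left unchanged:

    import numpy as np
    from scipy.linalg import eig

    rng = np.random.default_rng(0)
    n, k = 50, 5
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    M1 = np.linalg.inv(A) + 0.05 * rng.standard_normal((n, n))  # a crude "M1"

    lam, U, V = eig(M1 @ A, left=True, right=True)
    sel = np.argsort(np.abs(lam))[:k]          # k smallest eigenvalues of M1 A
    Ve, Ue = V[:, sel], U[:, sel]

    Ac = Ue.conj().T @ (M1 @ A) @ Ve           # coarse matrix A_c (k x k)
    Mc = Ve @ np.linalg.solve(Ac, Ue.conj().T @ M1)
    M = M1 + Mc                                # updated preconditioner

    eta = eig(M @ A, right=False)
    expected = lam.copy()
    expected[sel] += 1.0                       # eta_i = 1 + lambda_i on the coarse space
    print(np.allclose(np.sort_complex(eta), np.sort_complex(expected)))  # True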

Theorem 2. Let $W$ be such that

$$ \tilde{A}_c = W^H A V_\varepsilon \quad \text{has full rank}, $$

and let

$$ \tilde{M}_c = V_\varepsilon \tilde{A}_c^{-1} W^H \qquad \text{and} \qquad \tilde{M} = M_1 + \tilde{M}_c. $$

Then $\tilde{M}A$ is similar to a matrix whose eigenvalues are

$$ \eta_i = \lambda_i \quad \text{if } |\lambda_i| > \varepsilon, \qquad \eta_i = 1 + \lambda_i \quad \text{if } |\lambda_i| \le \varepsilon. $$

Proof. With the same notation as for Theorem 1, we have

$$ \tilde{M} A V_\varepsilon = M_1 A V_\varepsilon + V_\varepsilon \tilde{A}_c^{-1} W^H A V_\varepsilon = V_\varepsilon D_\varepsilon + V_\varepsilon I_k = V_\varepsilon (D_\varepsilon + I_k), $$

and

$$ \tilde{M} A V_{\bar\varepsilon} = M_1 A V_{\bar\varepsilon} + V_\varepsilon \tilde{A}_c^{-1} W^H A V_{\bar\varepsilon} = V_{\bar\varepsilon} D_{\bar\varepsilon} + V_\varepsilon C = (V_\varepsilon \ V_{\bar\varepsilon}) \begin{pmatrix} C \\ D_{\bar\varepsilon} \end{pmatrix}, \quad \text{with } C = \tilde{A}_c^{-1} W^H A V_{\bar\varepsilon}. $$

We then have

$$ \tilde{M} A V = V \begin{pmatrix} D_\varepsilon + I_k & C \\ 0 & D_{\bar\varepsilon} \end{pmatrix}. \qquad \blacksquare $$

For right preconditioning, that is $A M_1 y = b$, similar results hold.

Lemma 1. Let

$$ A_c = U_\varepsilon^H A M_1 V_\varepsilon, \qquad M_c = M_1 V_\varepsilon A_c^{-1} U_\varepsilon^H \qquad \text{and} \qquad M = M_1 + M_c. $$

Then $AM$ is diagonalisable and we have $AM = V \operatorname{diag}(\eta_i) V^{-1}$ with

$$ \eta_i = \lambda_i \quad \text{if } |\lambda_i| > \varepsilon, \qquad \eta_i = 1 + \lambda_i \quad \text{if } |\lambda_i| \le \varepsilon. $$

Lemma 2. Let $W$ be such that

$$ \tilde{A}_c = W^H A M_1 V_\varepsilon \quad \text{has full rank}, $$

and let

$$ \tilde{M}_c = M_1 V_\varepsilon \tilde{A}_c^{-1} W^H \qquad \text{and} \qquad \tilde{M} = M_1 + \tilde{M}_c. $$

Then $A\tilde{M}$ is similar to a matrix whose eigenvalues are

$$ \eta_i = \lambda_i \quad \text{if } |\lambda_i| > \varepsilon, \qquad \eta_i = 1 + \lambda_i \quad \text{if } |\lambda_i| \le \varepsilon. $$

We should point out that an obvious choice exists if the symmetry of the preconditioner has to be preserved. For left preconditioning, we can set $W = V_\varepsilon$, which nevertheless does not imply that $\tilde{A}_c$ has full rank. For SPD matrices this choice leads to a SPD preconditioner: indeed, the preconditioner $\tilde{M}$ is the sum of a SPD matrix $M_1$ and a low-rank update that is symmetric semi-definite. It can be noticed that in this case the preconditioner has a form similar to those proposed in [24] for two-level preconditioners in non-overlapping domain decomposition.
domain decomposition.



6.2.2 Numerical experiments

In this section, we show some numerical results that illustrate the effectiveness of the spectral two-level preconditioner for the solution of dense complex symmetric non-Hermitian systems arising from the discretization of surface integral equations in electromagnetism. In our experiments, the eigenpairs are computed in a preprocessing step, before performing the iterative solution. This makes the preconditioner independent of the Krylov solver used for the actual solution of the linear system, at the cost of this extra computation. We use the IRA method implemented in the ARPACK package to compute approximations to the smallest eigenvalues and the corresponding approximate eigenvectors. The methods implemented in the ARPACK software are derived from a class of algorithms called Krylov subspace projection methods; these use information from the sequence of vectors generated by the power method to compute eigenvectors corresponding to eigenvalues other than the one of largest magnitude.
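In a matrix-free setting, this preprocessing step amounts to a call of roughly the following form. This is a sketch using SciPy's ARPACK wrapper; the operator applying $x \mapsto M_1(Ax)$, and the availability of M1, A and n, are assumptions of the illustration, not the interface of the code used in the thesis:

    from scipy.sparse.linalg import LinearOperator, eigs

    M1A = LinearOperator((n, n), matvec=lambda x: M1 @ (A @ x), dtype=complex)
    # ten approximate eigenpairs of M1*A of smallest magnitude ('SM'),
    # computed by the implicitly restarted Arnoldi method at tolerance 0.1
    lam, Veps = eigs(M1A, k=10, which='SM', tol=0.1)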

In our experiments, we consider coarse spaces of dimension up to 20, and different values of the restart for GMRES, from 10 to 110. For each test problem, we perform experiments with two levels of accuracy in the GMRES solution, to gain more insight into the robustness of our method. We provide extensive results in Appendix A; in this chapter we show the qualitative numerical behaviour of our method on our set of test examples, which can be considered representative of the general trend in electromagnetic applications.

First we consider the unsymmetric formulation described in Theorem 1. In Figures 6.2.2-6.2.6 we show the number of iterations required by GMRES(10) to reduce the normwise backward error to $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space. The numerical results show that the introduction of the low-rank updates can remarkably enhance the robustness of the approximate inverse. By selecting up to 10 eigenpairs, the number of iterations decreases by at least a factor of 2 in most of the experiments reported. The gain is more relevant when high accuracy is required for the approximate solution. On Example 2, the preconditioning updates enable fast convergence of GMRES with a low restart within a tolerance of $10^{-8}$, whereas no convergence was obtained in 1500 iterations without updates. However, a substantial improvement in the convergence is also observed when low accuracy is required: in the most effective case, by selecting 10 corrections, the number of GMRES iterations needed to achieve convergence to $10^{-5}$ using low restarts reduces by more than a factor of 4 on Example 5. If more eigenpairs are selected, generally no substantial further improvement is observed. In fact, the gain in terms of iterations is strongly related to the magnitude of the shifted eigenvalues: a speed-up in convergence is obtained when a full cluster of small eigenvalues is completely removed. This is illustrated in Tables 6.2.1 and 6.2.2, where we show the effect on the convergence of GMRES(10) of deflating eigenvalues of increasing magnitude on Examples 2 and 5, which are representative of the general trend. On Example 2, the presence of a very small eigenvalue slows down the convergence significantly; once this eigenvalue is shifted, the number of iterations rapidly decreases. On Example 5, there is a cluster of seven eigenvalues of magnitude around $10^{-3}$. When the eigenvalues within the cluster are shifted, a quick speed-up of convergence is observed; the shifting of the remaining eigenvalues does not have any further impact on the convergence. In Figures 6.2.7-6.2.9, we show the number of iterations required by restarted GMRES to reduce the normwise backward error to $10^{-8}$ for different values of the restart and increasing size of the coarse space. The remarkable enhancement of the robustness of the preconditioner enables the use of very small restarts for GMRES.

[Plot omitted: iterations of GMRES(10) (from 50 to 400) against the size of the coarse space (0 to 20); curves for GMRES tolerances 1.0e−8 and 1.0e−5; panel title: Example 1 − Size = 1080 − IRAM tolerance = 0.1.]

Figure 6.2.2: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 1.



[Plot omitted: iterations of GMRES(10) (from 50 to 350) against the size of the coarse space (0 to 20); curves for GMRES tolerances 1.0e−8 and 1.0e−5; panel title: Example 2 − Size = 1299 − IRAM tolerance = 0.1.]

Figure 6.2.3: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 2.

Nr of shifted eigenvalues | Magnitude of the eigenvalue | GMRES(10) iterations, Toler = $10^{-8}$
0 | | +1500
1 | 7.1116e-04 | 310
2 | 4.9685e-02 | 306
3 | 5.2737e-02 | 308
4 | 6.3989e-02 | 304
5 | 7.0395e-02 | 309
6 | 7.7396e-02 | 313
7 | 7.8442e-02 | 246
8 | 8.9548e-02 | 205
9 | 9.1598e-02 | 205
10 | 9.9216e-02 | 198

Table 6.2.1: Effect of shifting the eigenvalues nearest zero on the convergence of GMRES(10) for Example 2. We show the magnitude of successively shifted eigenvalues and the number of iterations required when these eigenvalues are shifted. A tolerance of $10^{-8}$ is required in the iterative solution.



[Plot omitted: iterations of GMRES(10) (from 50 to 300) against the size of the coarse space (0 to 20); curves for GMRES tolerances 1.0e−8 and 1.0e−5; panel title: Example 3 − Size = 1701 − IRAM tolerance = 0.1.]

Figure 6.2.4: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 3.

Nr of shifted eigenvalues | Magnitude of the eigenvalue | GMRES(10) iterations, Toler = $10^{-8}$
0 | | 297
1 | 8.7837e-03 | 290
2 | 8.7968e-03 | 290
3 | 8.7993e-03 | 287
4 | 9.8873e-03 | 254
5 | 9.9015e-03 | 232
6 | 9.9053e-03 | 392
7 | 9.9126e-03 | 52
8 | 2.3331e-01 | 52
9 | 2.4811e-01 | 53
10 | 2.4813e-01 | 53

Table 6.2.2: Effect of shifting the eigenvalues nearest zero on the convergence of GMRES(10) for Example 5. We show the magnitude of successively shifted eigenvalues and the number of iterations required when these eigenvalues are shifted. A tolerance of $10^{-8}$ is required in the iterative solution.



[Plot omitted: iterations of GMRES(10) (from 20 to 160) against the size of the coarse space (0 to 20); curves for GMRES tolerances 1.0e−8 and 1.0e−5; panel title: Example 4 − Size = 2016 − IRAM tolerance = 0.1.]

Figure 6.2.5: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 4.

[Plot omitted: iterations of GMRES(10) (from 0 to 400) against the size of the coarse space (0 to 20); curves for GMRES tolerances 1.0e−8 and 1.0e−5; panel title: Example 5 − Size = 2430 − IRAM tolerance = 0.1.]

Figure 6.2.6: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 5.



[Plot omitted: iterations of GMRES(m) (from 0 to 400) against the size of the coarse space (0 to 20); curves for restarts m = 10, 30, 50; panel title: Example 1 − Size = 1080 − IRAM tolerance = 0.1.]

Figure 6.2.7: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ for three choices of restart and increasing size of the coarse space on Example 1.

In the second set of experiments, we consider the formulation of the preconditioner illustrated in Theorem 2, where we select $W^H = V_\varepsilon^H M_1$ to save the computation of left eigenvectors. The quality of the preconditioner is very well preserved, as we see in Tables 6.2.3-6.2.7, while the construction cost of the low-rank updates is halved.

In Table 6.2.8, we show the number of matrix-vector products required by the ARPACK implementation of the IRA method to compute the smallest approximate eigenvalues and the associated approximate right eigenvectors. All the numerical experiments are performed in double precision complex arithmetic on an SGI Origin 2000. We remark that these matrix-vector products do not include those required for the iterative solution. Although the computation can be expensive, the cost can be amortized if the preconditioner is reused to solve linear systems with the same coefficient matrix and several right-hand sides. In Table 6.2.9 we show the number of amortization vectors relative to GMRES(10) and a tolerance of $10^{-5}$, that is, the number of right-hand sides that have to be considered to amortize the extra cost of the eigencomputation. The localization of a few eigenvalues within a cluster may be more expensive than the computation of a full group of small eigenvalues. The optimal trade-off seems to be a coarse space of size around 10: in that case the number of amortization vectors is reasonably small, especially compared to real electromagnetic calculations, where linear systems with the same coefficient matrix and up to thousands of right-hand sides are often solved.
often solved.



[Plot omitted: iterations of GMRES(m) (from 50 to 500) against the size of the coarse space (0 to 20); curves for restarts m = 10, 30, 50; panel title: Example 2 − Size = 1299 − IRAM tolerance = 0.1.]

Figure 6.2.8: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ for three choices of restart and increasing size of the coarse space on Example 2.

[Plot omitted: iterations of GMRES(m) (from 0 to 300) against the size of the coarse space (0 to 20); curves for restarts m = 10, 30, 50; panel title: Example 3 − Size = 1701 − IRAM tolerance = 0.1.]

Figure 6.2.9: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ for three choices of restart and increasing size of the coarse space on Example 3.



Size of the coarse space | W^H = U_ε^H M_1 | W^H = V_ε^H M_1 | W = V_ε
1 | 314 | 315 | 316
2 | 314 | 314 | 312
3 | 313 | 314 | 315
4 | 310 | 313 | 308
5 | 313 | 306 | 315
6 | 315 | 303 | 311
7 | 315 | 298 | 290
8 | 315 | 294 | 292
9 | 315 | 303 | 302
10 | 248 | 244 | 244
11 | 206 | 206 | 204
12 | 197 | 190 | 215
13 | 194 | 177 | 208
14 | 192 | 177 | 184
15 | 191 | 180 | 186
16 | 189 | 184 | 189
17 | 189 | 180 | 195
18 | 175 | 180 | 205
19 | 166 | 174 | 182
20 | 153 | 174 | 173

Table 6.2.3: Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ for increasing size of the coarse space on Example 1. Different choices are considered for the operator $W^H$.



Size of the coarse space | W^H = U_ε^H M_1 | W^H = V_ε^H M_1 | W = V_ε
1 | 310 | 285 | 304
2 | 306 | 286 | 305
3 | 308 | 286 | 310
4 | 304 | 279 | 303
5 | 309 | 286 | 310
6 | 313 | 286 | 307
7 | 246 | 229 | 239
8 | 205 | 188 | 201
9 | 205 | 187 | 202
10 | 198 | 185 | 194
11 | 198 | 184 | 194
12 | 198 | 196 | 193
13 | 198 | 187 | 193
14 | 185 | 190 | 194
15 | 175 | 189 | 193
16 | 186 | 183 | 185
17 | 159 | 178 | 183
18 | 192 | 179 | 187
19 | 167 | 178 | 185
20 | 187 | 168 | 169

Table 6.2.4: Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ for increasing size of the coarse space on Example 2. Different choices are considered for the operator $W^H$.



Size of the coarse space | W^H = U_ε^H M_1 | W^H = V_ε^H M_1 | W = V_ε
1 | 267 | 254 | 260
2 | 271 | 286 | 267
3 | 263 | 284 | 272
4 | 260 | 259 | 256
5 | 255 | 269 | 262
6 | 209 | 221 | 199
7 | 209 | 222 | 202
8 | 209 | 225 | 208
9 | 137 | 133 | 135
10 | 127 | 126 | 126
11 | 126 | 124 | 125
12 | 115 | 117 | 115
13 | 119 | 117 | 118
14 | 119 | 119 | 120
15 | 114 | 119 | 110
16 | 104 | 105 | 103
17 | 105 | 106 | 105
18 | 103 | 105 | 102
19 | 97 | 99 | 94
20 | 96 | 96 | 90

Table 6.2.5: Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ for increasing size of the coarse space on Example 3. Different choices are considered for the operator $W^H$.



Size of the coarse space | W^H = U_ε^H M_1 | W^H = V_ε^H M_1 | W = V_ε
1 | 145 | 145 | 145
2 | 134 | 134 | 135
3 | 133 | 130 | 131
4 | 127 | 125 | 126
5 | 126 | 123 | 124
6 | 123 | 120 | 122
7 | 101 | 101 | 101
8 | 101 | 101 | 99
9 | 100 | 98 | 94
10 | 72 | 94 | 93
11 | 95 | 93 | 86
12 | 86 | 86 | 86
13 | 86 | 86 | 85
14 | 84 | 85 | 83
15 | 82 | 82 | 82
16 | 81 | 81 | 82
17 | 81 | 82 | 82
18 | 80 | 81 | 82
19 | 82 | 81 | 81
20 | 76 | 77 | 77

Table 6.2.6: Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ for increasing size of the coarse space on Example 4. Different choices are considered for the operator $W^H$.



Size of the coarse space | W^H = U_ε^H M_1 | W^H = V_ε^H M_1 | W = V_ε
1 | 290 | 312 | 290
2 | 290 | 311 | 290
3 | 287 | 354 | 287
4 | 254 | 345 | 252
5 | 232 | 270 | 214
6 | 392 | 559 | 430
7 | 52 | 53 | 51
8 | 52 | 55 | 52
9 | 53 | 55 | 53
10 | 53 | 54 | 52
11 | 53 | 53 | 52
12 | 52 | 52 | 49
13 | 58 | 53 | 49
14 | 50 | 52 | 50
15 | 51 | 52 | 50
16 | 51 | 52 | 50
17 | 51 | 52 | 50
18 | 60 | 52 | 50
19 | 59 | 53 | 52
20 | 60 | 53 | 52

Table 6.2.7: Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ for increasing size of the coarse space on Example 5. Different choices are considered for the operator $W^H$.



Size of the coarse space | Ex. 1 | Ex. 2 | Ex. 3 | Ex. 4 | Ex. 5
1 | 90 | 135 | 120 | 75 | 60
2 | 388 | 440 | 336 | 168 | 58
3 | 243 | 524 | 290 | 214 | 107
4 | 281 | 469 | 250 | 178 | 103
5 | 354 | 423 | 192 | 149 | 163
6 | 293 | 357 | 183 | 180 | 156
7 | 247 | 340 | 175 | 134 | 128
8 | 198 | 333 | 165 | 236 | 105
9 | 179 | 345 | 154 | 261 | 125
10 | 138 | 358 | 169 | 207 | 128
11 | 186 | 527 | 157 | 191 | 160
12 | 213 | 579 | 219 | 197 | 131
13 | 189 | 574 | 224 | 248 | 162
14 | 235 | 1010 | 212 | 309 | 126
15 | 276 | 1762 | 223 | 355 | 164
16 | 266 | 1053 | 202 | 412 | 237
17 | 514 | 751 | 226 | 408 | 227
18 | 336 | 3050 | 264 | 390 | 223
19 | 336 | 2359 | 264 | 426 | 756
20 | 650 | 1066 | 300 | 345 | 220

Table 6.2.8: Number of matrix-vector products required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding right eigenvectors.

A natural question concerns the sensitivity of the preconditioner to the accuracy of the approximate eigenvectors. In the numerical experiments we require an accuracy of 0.1 in the computation of the eigenpairs. The stopping criterion adopted in the ARPACK implementation of the IRA algorithm ensures a small backward error on the Ritz pairs; the backward error is defined as the smallest perturbation $\Delta A$, in norm, such that the Ritz pair is an eigenpair of the perturbed matrix $A + \Delta A$. At the end of the computation, we checked that the required accuracy was attained. If $\tilde\lambda$ is an approximate eigenvalue and $\tilde{x}$ is the corresponding approximate eigenvector, then the normwise backward error associated with the eigenpair $(\tilde\lambda, \tilde{x})$ is

$$ \frac{\|r\|}{\alpha \|\tilde{x}\|}, \qquad \text{where } \alpha > 0 \text{ and } r = A\tilde{x} - \tilde\lambda\tilde{x}. $$

In Table 6.2.10, the spectral information is computed at an accuracy of the order of the machine precision, that is $10^{-16}$. No remarkable differences from the previous results can be observed in the number of iterations, except for Example 2.



Size of the coarse space | Ex. 1 | Ex. 2 | Ex. 3 | Ex. 4 | Ex. 5
1 | 9 | 135 | - | - | 60
2 | 36 | 74 | - | 56 | 58
3 | 23 | 105 | 290 | 72 | 18
4 | 26 | 94 | 84 | 30 | 5
5 | 33 | 85 | 48 | 25 | 5
6 | 25 | 179 | 9 | 26 | 156
7 | 21 | 14 | 8 | 7 | 2
8 | 17 | 8 | 8 | 11 | 2
9 | 15 | 8 | 3 | 12 | 2
10 | 4 | 8 | 3 | 5 | 2
11 | 3 | 12 | 3 | 8 | 2
12 | 4 | 13 | 4 | 8 | 2
13 | 3 | 13 | 4 | 10 | 2
14 | 4 | 16 | 4 | 12 | 2
15 | 4 | 26 | 4 | 13 | 2
16 | 4 | 19 | 3 | 15 | 3
17 | 8 | 10 | 4 | 15 | 3
18 | 5 | 53 | 4 | 49 | 3
19 | 4 | 33 | 4 | 16 | 10
20 | 7 | 21 | 4 | 12 | 3

Table 6.2.9: Number of amortization vectors associated with the IRAM computation of the approximate eigenvalues nearest zero and the corresponding right eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.

In Figures 6.2.11-6.2.14, we investigate the numerical behaviour of our method in the presence of a larger cluster of small eigenvalues in $M_1 A$, that is, when $M_1$ is a poor preconditioner. This generally happens when the nonzero structure of the approximate inverse is very sparse, or when less information from $A$ is used to construct $M_1$. As we mentioned in Chapter 3, and as shown in Figure 6.2.10 for Example 2, a side-effect of reducing the number of nonzeros in the sparse approximation of $A$ is that a larger number of eigenvalues cluster around the origin of the spectrum of the preconditioned matrix. In the experiments reported in Figures 6.2.11-6.2.14, the Frobenius-norm preconditioner is constructed using the same nonzero structure to sparsify $A$ and to compute $M_1$.



Size of the coarse space | Ex. 1 | Ex. 2 | Ex. 3 | Ex. 4 | Ex. 5
1 | 315 | 215 | 254 | 145 | 312
2 | 314 | 202 | 286 | 134 | 311
3 | 314 | 193 | 284 | 130 | 354
4 | 313 | 192 | 259 | 125 | 346
5 | 306 | 190 | 269 | 123 | 270
6 | 303 | 189 | 221 | 120 | 552
7 | 298 | 161 | 222 | 101 | 53
8 | 294 | 147 | 225 | 101 | 55
9 | 303 | 146 | 133 | 99 | 54
10 | 244 | 144 | 126 | 94 | 54
11 | 206 | 143 | 124 | 93 | 50
12 | 190 | 143 | 117 | 86 | 50
13 | 177 | 140 | 117 | 86 | 48
14 | 177 | 139 | 119 | 85 | 48
15 | 182 | 139 | 117 | 82 | 48
16 | 184 | 139 | 106 | 81 | 48
17 | 171 | 139 | 106 | 81 | 48
18 | 176 | 135 | 102 | 81 | 48
19 | 177 | 135 | 94 | 80 | 47
20 | 178 | 131 | 99 | 80 | 50

Table 6.2.10: Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of the Ritz pairs is carried out at machine precision.

As these results show, when the preconditioner is not very effective, the spectral corrections, although beneficial, do not enhance its robustness significantly. Coarse spaces of larger size may be necessary to shift the clustered eigenvalues nearest zero and speed up the convergence. The localization of the eigenvalues of smallest magnitude by the IRA method is also much more expensive in this situation, as illustrated by the numerical experiments reported in Appendix A.
the numerical experiments reported in Appendix A.



[Plot omitted: eigenvalue distribution in the complex plane; real axis from −0.5 to 1.5, imaginary axis from −1.5 to 0.5.]

Figure 6.2.10: Eigenvalue distribution for the coefficient matrix preconditioned by a Frobenius-norm minimization method on Example 2. The same sparsity pattern is used for A and for the preconditioner.

[Plot omitted: iterations of GMRES(10) (from 200 to 900) against the size of the coarse space (0 to 20); curves for tolerances 1.0e−8 and 1.0e−5; panel title: Example 1 − Size = 1080 − IRAM tolerance = 0.1.]

Figure 6.2.11: Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 1. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is used for $A$ and $M_1$.



[Plot omitted: iterations of GMRES(10) (from 200 to 550) against the size of the coarse space (0 to 20); curves for tolerances 1.0e−8 and 1.0e−5; panel title: Example 2 − Size = 1299 − IRAM tolerance = 0.1.]

Figure 6.2.12: Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 2. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is used for $A$ and $M_1$.

[Plot omitted: iterations of GMRES(10) (from 50 to 350) against the size of the coarse space (0 to 20); curves for tolerances 1.0e−8 and 1.0e−5; problem size 1701, IRAM tolerance 0.1.]

Figure 6.2.13: Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 3. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is used for $A$ and $M_1$.



[Plot omitted: iterations of GMRES(10) (from 100 to 280) against the size of the coarse space (0 to 20); curves for tolerances 1.0e−8 and 1.0e−5; panel title: Example 4 − Size = 2016 − IRAM tolerance = 0.1.]

Figure 6.2.14: Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error to $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space on Example 4. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is used for $A$ and $M_1$.


6.2.3 Symmetric formulation

One problem with the previous formulations is that the updated preconditioner M is no longer symmetric even if M_1 is symmetric. A symmetric formulation can be obtained if we choose W = V_ε in Theorem 2. Nevertheless we point out that, as in the case W^H = V_ε^H M_1, the projected matrix Ã_c is not guaranteed to have full rank. For SPD matrices this choice naturally leads to an SPD preconditioner.

In Tables 6.2.3-6.2.7, we show experiments with this choice for the operator W. The method remains effective, as no noticeable deterioration can be observed in the quality of the computed preconditioner. In Figures 6.2.15-6.2.19, we use the symmetric Frobenius-norm minimization method obtained by averaging the off-diagonal entries, and we solve the linear system with the SQMR algorithm. The remarkable robustness of this solver on electromagnetic applications should be noted: it clearly outperforms GMRES even with a large restart.
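To fix ideas, the following minimal numpy sketch applies the symmetrically updated preconditioner. It assumes that the update of Theorem 2 takes the additive form M = M_1 + V_ε Ã_c^{-1} W^H with Ã_c = W^H A V_ε, here specialized to W = V_ε; this reading of the theorem, the dense storage and the helper names are assumptions for illustration only, not quoted from the text.

    import numpy as np

    def apply_symmetric_update(p, A, M1, V_eps):
        # Apply z = M p with M = M1 + V_eps Ac^{-1} V_eps^H (choice W = V_eps),
        # under the assumed additive form of Theorem 2. Ac = V_eps^H A V_eps is
        # the small projected matrix; as noted above, it may fail to have full
        # rank, in which case the solve below breaks down.
        Ac = V_eps.conj().T @ (A @ V_eps)
        z = M1 @ p                                            # first-level sweep
        z += V_eps @ np.linalg.solve(Ac, V_eps.conj().T @ p)  # symmetric low-rank update
        return z

Because the correction term V_ε Ã_c^{-1} V_ε^H is Hermitian by construction, the update preserves the symmetry of M_1, which is what enables the use of SQMR in the experiments below.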

[Figure 6.2.15: plot of the number of iterations of SQMR against the size of the coarse space (0 to 20), with curves for SQMR Toler = 1.0e−8 and 1.0e−5. Panel title: Example 1 − Size = 1080 − IRAM tolerance = 0.1.]

Figure 6.2.15: Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space on Example 1. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.


[Figure 6.2.16: plot of the number of iterations of SQMR against the size of the coarse space (0 to 20), with curves for SQMR Toler = 1.0e−8 and 1.0e−5. Panel title: Example 2 − Size = 1299 − IRAM tolerance = 0.1.]

Figure 6.2.16: Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space on Example 2. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.

[Figure 6.2.17: plot of the number of iterations of SQMR against the size of the coarse space (0 to 20), with curves for SQMR Toler = 1.0e−8 and 1.0e−5. Panel title: Example 3 − Size = 1701 − IRAM tolerance = 0.1.]

Figure 6.2.17: Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space on Example 3. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.


[Figure 6.2.18: plot of the number of iterations of SQMR against the size of the coarse space (0 to 20), with curves for SQMR Toler = 1.0e−8 and 1.0e−5. Panel title: Example 4 − Size = 2016 − IRAM tolerance = 0.1.]

Figure 6.2.18: Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space on Example 4. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.

[Figure 6.2.19: plot of the number of iterations of SQMR against the size of the coarse space (0 to 20), with curves for SQMR Toler = 1.0e−8 and 1.0e−5. Panel title: Example 5 − Size = 2430 − IRAM tolerance = 0.1.]

Figure 6.2.19: Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space on Example 5. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates.


6.3 Multiplicative formulation of low-rank spectral updates

The spectral information that we compute can be exploited differently if we look at it from a multigrid viewpoint. This leads us to derive a multiplicative version of our two-level preconditioner that can be expressed as a two-grid algorithm. In order to illustrate this link, let us first briefly describe the classical geometric two-grid algorithm.

For solving a linear system Ax = b with initial guess x_0, where A comes from a discretization of an elliptic operator on a mesh, a geometric two-grid algorithm can be briefly described as follows:

1. Pre-smoothing: a few iterations are performed to damp the high frequencies of the error. The components that are eliminated belong to the subspace spanned by the eigenvectors associated with the large eigenvalues of the pre-smoothing iteration matrix. One iteration of this pre-smoother might be written

   x_new = x_old + B(b − A x_old)                                  (6.3.4)

   and we let x^{k+1/3} denote the approximate solution after µ_1 pre-smoothing iterations.

2. Coarse grid correction: the components left in the error are smooth and can therefore be represented on a coarser mesh. Consequently the residual is projected onto the coarse mesh and the error equation is solved exactly in the associated coarse space. The error on the coarse mesh is then interpolated back onto the fine mesh to correct x^{k+1/3}. If we denote by R the projection operator and by P the prolongation/interpolation operator, the coarse grid problem is usually defined by the Galerkin formula A_c = RAP. The coarse grid correction can then be written

   x^{k+2/3} = x^{k+1/3} + P A_c^{-1} R (b − A x^{k+1/3})          (6.3.5)

3. Post-smoothing: a few additional smoothing iterations are performed to eliminate the high-frequency components that might have been introduced by the interpolation. We then compute the new iterate x^{k+1} by performing µ_2 iterations of the iterative scheme (6.3.4) with initial guess x^{k+2/3}. A sketch of one full cycle is given below.
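The following minimal numpy sketch assembles steps (6.3.4)-(6.3.5) into one cycle; the dense matrices and the direct coarse solve are illustrative simplifications, not the implementation used in the thesis.

    import numpy as np

    def two_grid_cycle(x, A, b, B, R, P, mu1=1, mu2=1):
        # One geometric two-grid cycle with smoother B, restriction R and
        # prolongation P; the coarse problem uses the Galerkin formula A_c = R A P.
        Ac = R @ A @ P
        for _ in range(mu1):                                 # pre-smoothing, scheme (6.3.4)
            x = x + B @ (b - A @ x)
        x = x + P @ np.linalg.solve(Ac, R @ (b - A @ x))     # coarse grid correction (6.3.5)
        for _ in range(mu2):                                 # post-smoothing, scheme (6.3.4)
            x = x + B @ (b - A @ x)
        return x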

In a geometric multigrid algorithm, the coarse grid correction effectively solves the error equation restricted to the subspace associated with the smallest eigenvalues. The corresponding eigenvectors are associated with the low-frequency modes and can consequently be represented geometrically on a coarser mesh.

In our multiplicative algorithm we make the “coarse grid” correction explicit by actually projecting the error equation directly onto the subspace associated with the smallest eigenvalues. More precisely, the smoother is defined by our preconditioner M_1, the restriction is R = V_ε and the prolongation is U_ε^H. The preconditioning operation is performed in three distinct steps; for the sake of simplicity in this exposition we set µ_1 = µ_2 = 1. The first step consists of a sweep with the sparse approximate inverse M_1, that is, ẑ = M_1 p, where p is the vector to precondition. The second step is intended to correct some components of the preconditioned vector ẑ along the directions defined by the approximate eigenvectors corresponding to the approximate eigenvalues smallest in magnitude. Using the notation of Section 6.2.1,

   ẑ = ẑ + V_ε A_c^{-1} U_ε (p − A ẑ).

Finally, the sparse approximate inverse is used to refine the preconditioning operation in the complement of the subspace determined by the approximate eigenvectors corresponding to the smallest eigenvalues:

   ẑ = ẑ + M_1 (p − A ẑ).

The nomenclature multiplicative formulation is inherited from the framework of domain decomposition methods, as similarities exist with Schwarz methods [26, 130]. In addition, with µ_1 = µ_2 = 1 the two-step correction can be expressed in the compact form

   ẑ = ẑ + B(p − A ẑ),

where B = (I − (I − M_1 A)(I − V_ε A_c^{-1} U_ε A)) A^{-1}.
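Putting the three steps together, here is a minimal numpy sketch of the preconditioning operation with µ_1 = µ_2 = 1. It assumes A_c = U_ε A V_ε (consistent with the Galerkin formula above) and takes U_ε as the operator applied to residuals, matching the displayed correction step; since Section 6.2.1 is not reproduced here, these are illustrative assumptions, as are the dense storage and helper names.

    import numpy as np

    def apply_multiplicative(p, A, M1, V_eps, U_eps):
        # Three-step multiplicative two-level preconditioner:
        # smoothing sweep, spectral coarse correction, smoothing sweep.
        Ac = U_eps @ (A @ V_eps)                  # small coarse matrix (assumed Galerkin form)
        z = M1 @ p                                # 1) sweep with the sparse approximate inverse
        z = z + V_eps @ np.linalg.solve(Ac, U_eps @ (p - A @ z))   # 2) coarse correction
        z = z + M1 @ (p - A @ z)                  # 3) refinement sweep with M1
        return z

The two residual evaluations in steps 2 and 3 are the source of the two extra matrix-vector products per iteration noted in the experiments below.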

6.3.1 Numerical experiments

In this section, we show the qualitative numerical behaviour of our method on our set of test examples. In Figures 6.3.20, 6.3.21 and 6.3.22, we show the number of iterations required by restarted GMRES to reduce the residual to a prescribed accuracy for increasing size of the coarse space. As before, in the extensive results reported in Appendix A we consider coarse spaces of increasing dimension, up to 20, and values of the GMRES restart from 10 to 110. The preconditioner is very effective, as shown in Figures 6.3.20, 6.3.21 and 6.3.22. Compared to the additive formulation, a larger reduction in iteration count is observed on all five examples. However, we point out that each iteration step requires two additional matrix-vector products, which makes this formulation always more expensive than the additive one. Finally, we mention that this formulation naturally leads to a symmetric preconditioner if M_1 is symmetric. Thus the SQMR algorithm can be used to solve the problem, and the results obtained with this solver are shown in Appendix A. It should be noted that the results are surprisingly poor on two test problems, Examples 2 and 5.

[Figure 6.3.20: plot of the number of iterations of GMRES(10) against the size of the coarse space (0 to 20), with curves for GMRES Toler = 1.0e−8 and 1.0e−5. Panel title: Example 1 − Size = 1080 − IRAM tolerance = 0.1.]

Figure 6.3.20: Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} and 10^{-5} for increasing number of corrections on Example 1. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates. The preconditioner is updated in multiplicative form.


[Figure 6.3.21: plot of the number of iterations of GMRES(10) against the size of the coarse space (0 to 20), with curves for GMRES Toler = 1.0e−8 and 1.0e−5. Panel title: Example 3 − Size = 1701 − IRAM tolerance = 0.1.]

Figure 6.3.21: Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} and 10^{-5} for increasing size of the coarse space on Example 3. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates. The preconditioner is updated in multiplicative form.

[Figure 6.3.22: plot of the number of iterations of GMRES(10) against the size of the coarse space (0 to 20), with curves for GMRES Toler = 1.0e−8 and 1.0e−5. Panel title: Example 4 − Size = 2016 − IRAM tolerance = 0.1.]

Figure 6.3.22: Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} and 10^{-5} for increasing size of the coarse space on Example 4. The symmetric formulation of Theorem 2 with the choice W = V_ε is used for the low-rank updates. The preconditioner is updated in multiplicative form.


6.4 Concluding remarks

In this chapter, we have presented a refinement technique for the approximate inverse based on low-rank corrections computed using spectral information from the preconditioned matrix. We have shown the effectiveness and the robustness of the resulting preconditioner on a set of small but tough problems arising from electromagnetic applications. The method is very well suited for use on electromagnetic problems, as the preconditioner is often used to solve systems with the same coefficient matrix and multiple right-hand sides. In this way, the extra cost of computing the preconditioner updates can be amortized. It can be combined with the inner-outer schemes via embedded iterations described in the previous chapter to construct robust and efficient preconditioners for electromagnetic applications, but we do not carry out this study in this thesis. Moreover, this technique can be used for general problems. Preliminary results on domain decomposition methods [117] and SPD matrices from the Harwell-Boeing collection [53] are encouraging.




Chapter 7

Conclusions and perspectives

In this thesis, we have presented preconditioning methods for the numerical solution, using iterative Krylov solvers, of dense complex symmetric non-Hermitian systems of equations arising from the discretization of boundary integral equations in electromagnetism. We have illustrated both the numerical behaviour and the cost of the proposed preconditioners, identified potential causes of failure and introduced techniques to enhance their robustness. The major concern of the thesis has been to design robust sparse approximate inverse preconditioners based on Frobenius-norm minimization techniques. However, in Chapter 2, we considered several standard preconditioners based on the idea of sparsification, of both implicit and explicit type, and we studied their numerical behaviour on electromagnetic applications.

We have shown that incomplete LU factorization methods do not work well for such systems. The incomplete factorization process is highly unstable on indefinite matrices like those arising from the discretization of the EFIE formulation. Using numerical experiments we have shown that the triangular factors computed by the factorization can be very ill-conditioned, and that the long recurrences associated with the triangular solves are unstable. As an attempt at a possible remedy, we introduced a small complex shift to move the eigenvalues of the preconditioned system along the imaginary axis and thus try to avoid a possible cluster of eigenvalues close to zero. A small diagonal complex shift can help to compute a more stable factorization, and in some cases the performance of the preconditioner can improve significantly. Further work is required to make the preconditioner more robust. Condition estimators can be incorporated into the factorization process to detect instabilities during the computation, and suitable strategies introduced to tune the optimal value of the shift and to predict its effect. The construction of the preconditioner is inherently sequential, but many recent research efforts have been devoted to exploiting parallelism [102, 111]. This gives hope that it might be worth examining this method further in a parallel and multipole context.

Factorized approximate inverses, namely AINV and FSAI, exhibit poor convergence behaviour because the inverse factors can be totally unstructured; neither reordering nor shift strategies improve their effectiveness. Any dropping strategy, either static or dynamic, may be very difficult to tune as it can easily discard relevant information and potentially lead to a very poor preconditioner. In this case, finding the appropriate threshold to enable a good trade-off between sparsity and numerical efficiency is challenging and very problem-dependent. Graph partitioning algorithms can be used to define a sparse structure for the inverse factors. Geometric and spectral partitioning methods would split the graph of the sparse approximation Ã to A into a number, say p, of independent subgraphs of roughly equal size, with relatively few connections between them. By numbering the interior nodes first and the interface nodes last, the permuted matrix assumes the form

P^T \tilde{A} P =
\begin{pmatrix}
A_1 &     &        &     & B_1^T  \\
    & A_2 &        &     & B_2^T  \\
    &     & \ddots &     & \vdots \\
    &     &        & A_p & B_p^T  \\
B_1 & B_2 & \cdots & B_p & A_S
\end{pmatrix}

where P is a permutation matrix. The diagonal blocks A_1, A_2, ..., A_p correspond to connections between nodes in the same subgraph; the off-diagonal blocks B_i correspond to connections between nodes of distinct subgraphs, and the block A_S represents connections between interface nodes. This permutation strategy can also be used to introduce parallelism in the construction of some inherently sequential preconditioners [15, 102]. The inverse of the permuted matrix admits the decomposition

P^{-1} \tilde{A}^{-1} P^{-T} = L^{-T} D^{-1} L^{-1} =
\begin{pmatrix}
I_1 &     &        &     & L_1^{-T} \\
    & I_2 &        &     & L_2^{-T} \\
    &     & \ddots &     & \vdots   \\
    &     &        & I_p & L_p^{-T} \\
    &     &        &     & I_S
\end{pmatrix}
\cdot
\begin{pmatrix}
T_1^{-1} &          &        &          &          \\
         & T_2^{-1} &        &          &          \\
         &          & \ddots &          &          \\
         &          &        & T_p^{-1} &          \\
         &          &        &          & T_S^{-1}
\end{pmatrix}
\cdot
\begin{pmatrix}
I_1      &          &        &          &     \\
         & I_2      &        &          &     \\
         &          & \ddots &          &     \\
         &          &        & I_p      &     \\
L_1^{-1} & L_2^{-1} & \cdots & L_p^{-1} & I_S
\end{pmatrix}

It can be seen that fill-in in the inverse factor L^{-1} can occur only in the blocks L_i^{-1}. The use of these techniques might enable the control and prediction of fill-in in the inverse factors, and significantly enhance the robustness of factorized approximate inverse methods like AINV or FSAI on electromagnetic applications.
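To make the reordering concrete, a small sketch follows; the partition array, the interface mask and the helper name are hypothetical, introduced only to illustrate the interior-first numbering that yields the bordered form above.

    import numpy as np

    def interface_last_permutation(part, interface):
        # part[i]: subgraph id of node i; interface[i]: True if node i lies on an
        # interface between subgraphs. Interior nodes come first, grouped by
        # subgraph, and interface nodes come last.
        n = len(part)
        interior = sorted((i for i in range(n) if not interface[i]), key=lambda i: part[i])
        border = [i for i in range(n) if interface[i]]
        return np.array(interior + border)

    # Usage: perm = interface_last_permutation(part, interface)
    #        A_perm = A[np.ix_(perm, perm)]   # the bordered matrix P^T A P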

In Chapter 2, we have shown that the locations of the large entries in the inverse matrix exhibit some structure, and thus a non-factorized approximate inverse can be a good candidate to precondition these systems effectively. In particular, preconditioners based on Frobenius-norm minimization are much less prone to instabilities than incomplete factorization methods. To be computationally affordable on dense systems, these preconditioners require a suitable strategy to identify the relevant entries to consider in the original matrix A in order to define small least-squares problems, as well as an appropriate sparsity structure for the approximate inverse. In Chapter 3 we exploited the decay of the discrete Green's function to compute an effective a priori pattern for the approximate inverse. We have shown that, by using additional geometric information from the underlying mesh, it is possible to construct robust sparse preconditioners at an affordable computational and memory cost. An important feature of the pattern selection strategy based on geometric information is that it does not require access to all the entries of the matrix A, so that it is well suited for an implementation in a fast multipole setting, where A is not directly available and only the near-field entries are computed. Strategies that use information from the connectivity graph of the underlying mesh require less computational effort to construct the pattern but are generally less effective. Also, they may not handle very well complex geometries where some parts of the object are not connected. By retaining two different densities in the patterns of A and M we can increase the robustness of the resulting preconditioner without penalizing the cost of its construction. The numerical experiments show that, using this pattern selection strategy, we can compute a very sparse but effective preconditioner. With the same low density, none of the standard preconditioners that we discussed earlier can compete with it.

In Chapter 4, we proposed two symmetric preconditioners that exploit the symmetry of the original matrix in the associated preconditioner and enable the use of a symmetric Krylov solver, which proves to be cheaper than GMRES iterations. The first strategy simply averages the off-diagonal entries. We have shown that this approach, used in combination with the SQMR solver, is fairly robust and is totally insensitive to column ordering; however, the construction of the preconditioner has the same computational cost as in the unsymmetric case. The second strategy only computes the lower triangular part, including the diagonal, of the preconditioner. The nonzeros calculated are reflected with respect to the diagonal and are used to update the right-hand sides of the subsequent least-squares problems involved in the construction of the remaining columns of the preconditioner. If m denotes the number of nonzero entries in the approximate inverse, this method only computes (m + n)/2 nonzeros. Thus the overall computational complexity of the construction can be considerably smaller. Through numerical experiments, we have shown that this method is not too sensitive to column ordering. Both these methods appear to be efficient and exhibit a remarkable robustness when used in conjunction with SQMR. They are promising for use in a parallel and multipole context for the solution of large systems. The first approach is straightforward to parallelize even though it requires more flops for its construction. It would probably be the preconditioner of choice in a parallel distributed fast multipole environment. The second approach is less than half as expensive and can be computationally attractive, especially for large problems. Possibilities for parallelizing this approach also exist, by using colouring techniques to detect independent subsets of columns that can be computed in parallel. In a multipole context the algorithm must be recast by blocks, and Level 2 BLAS operations have to be used for the least-squares updates. Further work is required to implement this procedure.

In Chapter 5, we illustrated the implementation of the Frobenius-norm minimization preconditioner within a parallel out-of-core research code that implements the Fast Multipole Method (FMM), and we studied the numerical and parallel scalability of the implementation for the solution of large scattering applications, up to one million unknowns. On problems of this size, the construction of the preconditioner can be demanding in terms of time, memory and disk resources. A potential limit of the Frobenius-norm minimization preconditioner, and in general of any sparse approximate inverse method, is that it tends to be less effective on large problems because the number of iterations increases rapidly with the problem size. In Chapter 5, we proposed the use of inner-outer iterative solution schemes implemented in a multipole context with different levels of accuracy for the matrix-vector products in the inner and outer loops. We have shown that the use of the multipole matrix can be effective to balance the locality of the preconditioner. In particular, the combination FGMRES(5)/GMRES(20) can enhance the robustness of the preconditioner, significantly reducing the computational cost and the storage requirements for the solution of large problems. We have successfully used this approach to solve systems of size up to one million unknowns; the approach is very promising for the solution of challenging real-life industrial applications. Some questions are still open. One issue concerns the optimal tuning of the inner accuracy. In the numerical experiments, we selected a “medium” accuracy for the inner iteration. A multilevel scheme can be designed as a natural extension of the simple two-level scheme considered in Chapter 5, with several embedded FGMRES levels going down to the lowest accuracy in the innermost GMRES. Variants of these schemes can be based on flexible variants of the SQMR method as outer solvers and SQMR as the inner solver.

In Chapter 6, we investigated a refinement technique for the approximate inverse based on low-rank corrections computed using spectral information from the preconditioned matrix. We have illustrated the effectiveness and the robustness of the proposed preconditioner on a set of small but tough problems arising from electromagnetic applications, and we have analysed the cost of the algorithm. The conclusion is that the method is very well suited for the solution of electromagnetic problems; the extra cost of computing the preconditioner updates can be quickly amortized by considering a few right-hand sides. Also, the preconditioner is independent of the Krylov solver used for the actual solution of the linear system. A symmetric formulation has been derived, and numerical results have shown the remarkable robustness of this formulation when used in conjunction with SQMR. The numerical results are encouraging for the investigation of this procedure for the solution of much larger problems. The computation of the preconditioning updates by the IRA method is based on matrix-vector operations and thus can be easily integrated within the code that implements the Fast Multipole Method. It could be combined with inner-outer schemes via embedded iterations to construct preconditioners for electromagnetic applications that might be expected to be very robust and effective. Although the electromagnetic context is an ideal setting for its application, the proposed technique can be effectively used in other contexts, as it only requires algebraic information from the preconditioned matrix. Preliminary results on domain decomposition methods and on both SPD and unsymmetric linear systems from the Harwell-Boeing sparse matrix collection are encouraging.

The idea of updating the preconditioner by using low-rank corrections is a natural one in the context of integral equations, and is inherently related to the algebraic structure of the discretized integral operator. A block structure of the coefficient matrix naturally emerges when the oct-tree is considered and the unknowns are numbered consecutively by leaf-boxes. If the n unknowns are divided into p groups, the coefficient matrix can be written in the form

   A = D + Q,

where

   D = diag{T_{11}, T_{22}, ..., T_{pp}}

is a block-diagonal matrix, and Q is a block matrix with zero blocks on the diagonal. Each block T_{kk} represents the connection between edges within the same leaf-box, and each off-diagonal block Q_{kl}, l ≠ k, represents the connection between edges of group k and group l. The off-diagonal blocks Q_{kl} corresponding to far-away groups k and l have low rank r_{kl} and thus can be expressed as the sum of r_{kl} rank-one updates as follows


   Q_{kl} = \sum_{i=1}^{r_{kl}} u_{kl}^i (v_{kl}^i)^T = U_{kl} V_{kl}^T,

where

   U_{kl} = [u_{kl}^1, u_{kl}^2, ..., u_{kl}^{r_{kl}}],
   V_{kl} = [v_{kl}^1, v_{kl}^2, ..., v_{kl}^{r_{kl}}].
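Purely for illustration, the factors U_{kl} and V_{kl} could be obtained from a truncated SVD, as sketched below; as remarked later in this chapter, an explicit SVD of the off-diagonal blocks is too expensive in practice and is not feasible in a multipole context, so this is a conceptual example only, with assumed names and tolerances.

    import numpy as np

    def low_rank_block(Q_kl, tol=1e-8):
        # Compress an off-diagonal block as Q_kl ~= U_kl @ V_kl.T by truncating
        # the SVD at relative tolerance tol; r plays the role of the rank r_kl.
        u, s, vt = np.linalg.svd(Q_kl, full_matrices=False)
        r = max(1, int(np.sum(s > tol * s[0])))
        U_kl = u[:, :r] * s[:r]        # singular values absorbed into U_kl
        V_kl = vt[:r, :].T
        return U_kl, V_kl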

Matrix-free methods [75] use this idea to approximate nonsingular coefficient matrices in 3D boundary integral applications from CEM and CFD by purely algebraic techniques. The Matrix Decomposition Algorithm and its multilevel variant [107, 108] approximate the far-field interactions of electromagnetic scattering problems by standard linear algebra techniques. In [10] an iterative algorithm is proposed to compute low-rank approximations to blocks of large unstructured matrices; the algorithm uses only a few entries from the original blocks, and the approximate rank is not needed in advance.

The idea of low-rank approximations can be exploited in the design of the preconditioner [21]. Denoting by U and V the matrices

U = \begin{pmatrix}
U_{11} & 0      & \cdots & 0      & U_{12} & 0      & \cdots & 0      & \cdots & U_{1p} & 0      & \cdots & 0      \\
0      & U_{21} & \cdots & 0      & 0      & U_{22} & \cdots & 0      & \cdots & 0      & U_{2p} & \cdots & 0      \\
\vdots &        & \ddots & \vdots & \vdots &        & \ddots & \vdots & \cdots & \vdots &        & \ddots & \vdots \\
0      & 0      & \cdots & U_{p1} & 0      & 0      & \cdots & U_{p2} & \cdots & 0      & 0      & \cdots & U_{pp}
\end{pmatrix}

V = \begin{pmatrix}
V_{11} & V_{21} & \cdots & V_{p1} & 0      & 0      & \cdots & 0      & \cdots & 0      & 0      & \cdots & 0      \\
0      & 0      & \cdots & 0      & V_{12} & V_{22} & \cdots & V_{p2} & \cdots & 0      & 0      & \cdots & 0      \\
\vdots &        &        & \vdots & \vdots &        &        & \vdots & \ddots & \vdots &        &        & \vdots \\
0      & 0      & \cdots & 0      & 0      & 0      & \cdots & 0      & \cdots & V_{1p} & V_{2p} & \cdots & V_{pp}
\end{pmatrix}

the matrix Q can be written as the product UV^T. In our case the blocks U_{ii} and V_{ii} are null for i = 1, ..., p. By using the Sherman-Morrison-Woodbury formula [73], the following explicit expression can be derived for the inverse of B:

   B^{-1} = (D + UV^T)^{-1} = D^{-1} − D^{-1} U (I + G)^{-1} V^T D^{-1},

where G = V^T D^{-1} U is of order m = \sum_{k,l} r_{kl}. The application of the preconditioner requires “inverting”, that is, exactly factorizing, the diagonal blocks of D and the matrix I + G, which has small size. It might be profitable to explore this further in future research. Preliminary results show that this strategy can be effective provided that the diagonal blocks are exactly factorized. Some questions are still open. An explicit computation of the singular value decomposition of the off-diagonal blocks is too expensive and is not feasible in a multipole context where the entries of these blocks are not available. More sophisticated block partitioning schemes need to be investigated to select the ranks of the off-diagonal blocks appropriately.
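The application of B^{-1} described above can be sketched as follows in numpy; the callable D_solve, standing for the exact factorization of the block-diagonal D, is an assumed interface rather than code from the thesis.

    import numpy as np

    def smw_apply(D_solve, U, V, p):
        # Apply B^{-1} p = D^{-1} p - D^{-1} U (I + G)^{-1} V^T D^{-1} p,
        # with G = V^T D^{-1} U of small order m; D_solve(X) applies D^{-1}
        # to a vector or to the columns of a matrix.
        y = D_solve(p)
        G = V.T @ D_solve(U)
        w = np.linalg.solve(np.eye(G.shape[0]) + G, V.T @ y)
        return y - D_solve(U @ w)

Only the small m-by-m system I + G is solved densely; all other work consists of block-diagonal solves and tall-skinny products, which is what makes the formula attractive here.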

Some methods work well for our applications, and we have tuned them for problems in this area. It would be interesting in future work to see whether these methods are applicable in other areas, for example in acoustics.




Appendix A

Numerical results with the two-level spectral preconditioner


A.1 Effect of the low-rank updates on the GMRES convergence

Example 1

Size of the     GMRES(m), Toler. 1e-8
coarse space    m=10   m=30   m=50   m=80   m=110
 0               358    213    144     79     79
 1               314    179    138     76     76
 2               314    173    127     73     73
 3               313    172    116     70     70
 4               310    169    113     69     69
 5               313    169    108     67     67
 6               315    162     97     64     64
 7               315    145     91     62     62
 8               315    138     78     59     59
 9               315    134     75     57     57
10               248    103     60     53     53
11               206     98     53     52     52
12               197     96     52     52     52
13               194     91     52     51     51
14               192     90     51     51     51
15               191     90     51     51     51
16               189     80     48     48     48
17               189     80     48     48     48
18               175     80     48     48     48
19               166     60     42     42     42
20               153     54     37     37     37

Table A.1.1: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space.


Example 1

Size of the     GMRES(m), Toler. 1e-5
coarse space    m=10   m=30   m=50   m=80   m=110
 0               165    103     75     60     60
 1               154     87     64     56     56
 2               154     87     62     54     54
 3               154     87     62     53     53
 4               154     87     62     53     53
 5               154     87     61     53     53
 6               153     77     50     50     50
 7               153     73     48     48     48
 8               153     72     45     45     45
 9               153     68     44     44     44
10               129     52     40     40     40
11               102     50     39     39     39
12                97     49     39     39     39
13                92     48     38     38     38
14                92     48     38     38     38
15                92     48     38     38     38
16                91     45     35     35     35
17                92     45     35     35     35
18                97     45     35     35     35
19                79     32     31     31     31
20                69     26     26     26     26

Table A.1.2: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space.


Example 2

Size of the     GMRES(m), Toler. 1e-8
coarse space    m=10    m=30    m=50   m=80   m=110
 0              +1500   +1500    496    311    198
 1               310     235     192    151    107
 2               306     222     184    144    104
 3               308     209     177    138    101
 4               304     208     170    135     97
 5               309     206     164    132     96
 6               313     205     158    123     92
 7               246     174     146    108     88
 8               205     159     138    102     87
 9               205     159     138    101     87
10               198     155     136     99     86
11               198     154     136     98     86
12               198     154     136     96     84
13               198     153     136     89     83
14               185     131     109     74     74
15               175     138     115     75     75
16               186     137     112     74     74
17               159     117      98     70     70
18               192     135     105     70     70
19               167     126      98     68     68
20               187     143     112     73     73

Table A.1.3: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space.


Example 2

Size of the     GMRES(m), Toler. 1e-5
coarse space    m=10   m=30   m=50   m=80   m=110
 0               145    110     95     76     76
 1               144    110     95     76     76
 2               139    104     91     73     73
 3               140    103     90     73     73
 4               140    103     90     73     73
 5               140    102     90     73     73
 6               143    100     88     72     72
 7               119     89     81     68     68
 8               100     83     76     65     65
 9               100     82     76     65     65
10                99     82     76     64     64
11                99     82     76     64     64
12                99     82     76     64     64
13                99     81     75     64     64
14                80     60     48     48     48
15                77     67     53     52     52
16                87     69     60     54     54
17                69     56     47     47     47
18                87     66     52     51     51
19                73     58     46     46     46
20                93     74     65     58     58

Table A.1.4: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space.


Example 3

Size of the     GMRES(m), Toler. 1e-8
coarse space    m=10   m=30   m=50   m=80   m=110
 0               268    174    130     79     79
 1               267    171    123     76     76
 2               271    171    121     72     72
 3               263    170    115     70     70
 4               260    153    100     67     67
 5               255    141     93     64     64
 6               209    111     79     60     60
 7               209    111     78     60     60
 8               209    111     78     58     58
 9               137     86     66     55     55
10               127     82     61     54     54
11               126     82     61     54     54
12               115     80     56     53     53
13               119     81     56     52     52
14               119     81     56     52     52
15               114     79     52     51     51
16               104     74     49     49     49
17               105     68     48     48     48
18               103     57     43     43     43
19                97     59     44     44     44
20                96     57     44     44     44

Table A.1.5: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space.


Example 3

Size of the     GMRES(m), Toler. 1e-5
coarse space    m=10   m=30   m=50   m=80   m=110
 0               129     89     70     57     57
 1               129     89     70     56     56
 2               129     88     69     56     56
 3               128     88     69     56     56
 4               126     86     57     53     53
 5               125     76     49     49     49
 6               107     60     45     45     45
 7               107     60     45     45     45
 8               107     60     45     45     45
 9                73     51     42     42     42
10                65     49     40     40     40
11                65     49     40     40     40
12                63     48     40     40     40
13                64     48     40     40     40
14                63     48     40     40     40
15                62     46     40     40     40
16                59     45     37     37     37
17                54     44     36     36     36
18                53     32     31     31     31
19                53     34     33     33     33
20                53     33     32     32     32

Table A.1.6: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space.


Example 4

Size of the     GMRES(m), Toler. 1e-8
coarse space    m=10   m=30   m=50   m=80   m=110
 0               145    113     90     71     71
 1               145    113     90     68     68
 2               134    105     83     65     65
 3               133    105     83     65     65
 4               127     97     74     61     61
 5               126     95     75     61     61
 6               123     91     63     58     58
 7               101     77     58     56     56
 8               101     77     58     56     56
 9               100     75     58     56     56
10                72     55     43     43     43
11                95     74     55     53     53
12                86     70     51     51     51
13                86     68     49     49     49
14                84     66     49     49     49
15                82     63     49     49     49
16                81     63     49     49     49
17                81     65     49     49     49
18                80     65     49     49     49
19                82     65     49     49     49
20                76     59     47     47     47

Table A.1.7: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space.


Example 4

Size of the     GMRES(m), Toler. 1e-5
coarse space    m=10   m=30   m=50   m=80   m=110
 0                71     57     48     48     48
 1                71     57     48     48     48
 2                68     54     45     45     45
 3                68     53     45     45     45
 4                65     46     43     43     43
 5                65     46     42     42     42
 6                64     45     41     41     41
 7                49     41     38     38     38
 8                49     41     38     38     38
 9                48     41     38     38     38
10                20     18     18     18     18
11                46     38     37     37     37
12                44     36     34     34     34
13                44     36     34     34     34
14                43     35     34     34     34
15                43     34     34     34     34
16                42     34     34     34     34
17                43     34     34     34     34
18                43     34     34     34     34
19                43     34     34     34     34
20                40     32     32     32     32

Table A.1.8: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space.


Example 5

Size of the     GMRES(m), Toler. 1e-8
coarse space    m=10   m=30   m=50   m=80   m=110
 0               297     87     75     66     66
 1               290     78     75     66     66
 2               290     78     75     66     66
 3               287     66     68     58     58
 4               254     66     64     58     58
 5               232     66     62     58     58
 6               392     66     50     50     50
 7                52     43     39     39     39
 8                52     43     39     39     39
 9                53     43     39     39     39
10                53     43     40     40     40
11                53     43     40     40     40
12                52     44     38     38     38
13                58     46     43     43     43
14                50     44     38     38     38
15                51     44     38     38     38
16                51     44     38     38     38
17                51     44     38     38     38
18                60     45     40     40     40
19                59     45     41     41     41
20                60     45     42     42     42

Table A.1.9: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space.


Example 5

Size of the     GMRES(m), Toler. 1e-5
coarse space    m=10   m=30   m=50   m=80   m=110
 0               110     46     42     42     42
 1               109     45     41     41     41
 2               109     45     41     41     41
 3               104     34     33     33     33
 4                88     34     33     33     33
 5                73     34     33     33     33
 6               109     35     33     33     33
 7                23     21     21     21     21
 8                23     21     21     21     21
 9                23     21     21     21     21
10                23     22     22     22     22
11                23     22     22     22     22
12                24     21     21     21     21
13                28     24     24     24     24
14                23     21     21     21     21
15                23     21     21     21     21
16                23     21     21     21     21
17                23     21     21     21     21
18                30     24     24     24     24
19                30     24     24     24     24
20                32     24     24     24     24

Table A.1.10: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space.


A.2 Experiments with the operator W^H = V_ε^H M_1

Example 1

Size of the     GMRES(m), Toler. 1e-8
coarse space    m=10   m=30   m=50   m=80   m=110
 0               358    213    144     79     79
 1               315    176    137     76     76
 2               314    171    125     72     72
 3               314    171    115     70     70
 4               313    169    109     68     68
 5               306    171    107     67     67
 6               303    169     96     64     64
 7               298    145     90     61     61
 8               294    138     76     58     58
 9               303    134     71     57     57
10               244    100     59     53     53
11               206     94     53     51     51
12               190     96     52     51     51
13               177     88     51     51     51
14               177     88     50     50     50
15               180     88     50     50     50
16               184     80     47     47     47
17               180     80     47     47     47
18               180     79     47     47     47
19               174     76     46     46     46
20               174     61     44     44     44

Table A.2.11: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.


Example 1

Size of the     GMRES(m), Toler. 1e-5
coarse space    m=10   m=30   m=50   m=80   m=110
 0               165    103     75     60     60
 1               151     86     63     56     56
 2               152     86     61     53     53
 3               153     86     61     53     53
 4               150     86     61     53     53
 5               150     84     61     53     53
 6               146     81     50     50     50
 7               146     48     48     48     48
 8               138     70     44     44     44
 9               141     70     43     43     43
10               144     64     40     40     40
11               120     51     39     39     39
12                98     49     38     38     38
13                97     47     37     37     37
14                93     46     37     37     37
15                93     46     37     37     37
16                90     46     34     34     34
17                90     44     34     34     34
18                90     44     34     34     34
19                93     44     34     34     34
20                93     33     32     32     32

Table A.2.12: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.


Example 2

Size of the     GMRES(m), Toler. 1e-8
coarse space    m=10    m=30    m=50   m=80   m=110
 0              +1500   +1500    496    311    198
 1               285     215     177    141    103
 2               286     202     169    133    100
 3               286     193     160    129     97
 4               279     192     152    125     93
 5               286     190     149    124     92
 6               286     189     146    111     89
 7               229     161     137     95     85
 8               188     147     129     90     84
 9               187     146     129     91     84
10               185     144     127     90     83
11               184     143     127     89     83
12               196     148     131     91     83
13               187     147     130     85     81
14               190     144     129     80     80
15               189     144     126     77     77
16               183     137     114     74     74
17               178     135     109     73     73
18               179     136     108     73     73
19               178     135     102     70     70
20               168     130     100     69     69

Table A.2.13: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.


Example 2

Size of the     GMRES(m), Toler. 1e-5
coarse space    m=10   m=30   m=50   m=80   m=110
 0               145    110     95     76     76
 1               119     90     83     69     69
 2               118     88     80     67     67
 3               123     88     80     67     67
 4               122     88     80     67     67
 5               124     88     80     67     67
 6               123     88     80     66     66
 7               106     78     71     62     62
 8                84     71     64     58     58
 9                86     71     65     58     58
10                85     71     65     58     58
11                85     71     64     58     58
12                94     74     69     61     61
13                94     74     68     61     61
14                94     74     68     61     61
15                92     74     66     58     58
16                88     69     60     54     54
17                87     69     60     54     54
18                86     69     60     54     54
19                88     68     58     53     53
20                85     67     56     53     53

Table A.2.14: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.


Example 3

Size of the     GMRES(m), Toler. 1e-8
coarse space    m=10   m=30   m=50   m=80   m=110
 0               268    174    130     79     79
 1               254    170    121     76     76
 2               286    170    119     72     72
 3               284    169    114     70     70
 4               259    150     99     66     66
 5               269    141     92     63     63
 6               221    110     78     60     60
 7               222    108     77     59     59
 8               225    109     77     58     58
 9               133     86     65     55     55
10               126     82     59     53     53
11               124     82     60     53     53
12               117     81     56     52     52
13               117     81     56     52     52
14               119     80     56     52     52
15               119     79     53     51     51
16               105     74     49     49     49
17               106     69     47     47     47
18               105     65     46     46     46
19                99     58     44     44     44
20                96     58     44     44     44

Table A.2.15: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.


Example 3

Size of the     GMRES(m), Toler. 1e-5
coarse space    m=10   m=30   m=50   m=80   m=110
 0               129     89     70     57     57
 1               130     88     70     56     56
 2               147     88     68     56     56
 3               145     88     69     56     56
 4               139     83     55     52     52
 5               135     74     49     49     49
 6               116     59     45     45     45
 7               115     60     45     45     45
 8               115     60     45     45     45
 9                70     50     41     41     41
10                66     48     40     40     40
11                66     48     40     40     40
12                64     47     39     39     39
13                64     47     39     39     39
14                64     47     39     39     39
15                62     46     39     39     39
16                56     44     37     37     37
17                56     42     36     36     36
18                55     41     35     35     35
19                56     34     33     33     33
20                56     33     32     32     32

Table A.2.16: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.


Example 4

Size of the     GMRES(m), Toler. 1e-8
coarse space    m=10   m=30   m=50   m=80   m=110
 0               145    113     90     71     71
 1               145    113     90     68     68
 2               134    106     83     65     65
 3               130    103     85     65     65
 4               125     97     74     61     61
 5               123     93     73     61     61
 6               120     91     66     58     58
 7               101     78     58     56     56
 8               101     78     58     56     56
 9                98     78     58     56     56
10                94     74     56     55     55
11                93     74     55     53     53
12                86     70     52     51     51
13                86     68     50     50     50
14                85     67     49     49     49
15                82     64     49     49     49
16                81     64     49     49     49
17                82     66     49     49     49
18                81     66     49     49     49
19                81     67     50     50     50
20                77     62     47     47     47

Table A.2.17: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.


Example 4

Size of the     GMRES(m), Toler. 1e-5
coarse space    m=10   m=30   m=50   m=80   m=110
 0                71     57     48     48     48
 1                71     57     48     48     48
 2                68     54     45     45     45
 3                66     50     45     45     45
 4                64     46     43     43     43
 5                63     45     41     41     41
 6                63     45     41     41     41
 7                50     41     38     38     38
 8                50     41     38     38     38
 9                49     41     38     38     38
10                46     39     37     37     37
11                46     38     37     37     37
12                44     36     35     35     35
13                45     36     35     35     35
14                44     35     34     34     34
15                43     34     34     34     34
16                43     34     33     33     33
17                43     35     34     34     34
18                43     35     34     34     34
19                43     35     34     34     34
20                41     33     32     32     32

Table A.2.18: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.


Example 5

Size of the     GMRES(m), Toler. 1e-8
coarse space    m=10   m=30   m=50   m=80   m=110
 0               297     87     75     66     66
 1               312     79     75     66     66
 2               311     79     75     66     66
 3               354     66     68     58     58
 4               345     66     64     58     58
 5               270     66     62     58     58
 6               559     66     50     50     50
 7                53     43     40     40     40
 8                55     43     40     40     40
 9                55     43     40     40     40
10                54     44     41     41     41
11                53     44     41     41     41
12                52     43     38     38     38
13                53     44     38     38     38
14                52     44     39     39     39
15                52     44     39     39     39
16                52     44     39     39     39
17                52     44     39     39     39
18                52     44     39     39     39
19                53     45     39     39     39
20                53     45     40     40     40

Table A.2.19: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-8} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.


Example 5

Size of the     GMRES(m), Toler. 1e-5
coarse space    m=10   m=30   m=50   m=80   m=110
 0               110     46     42     42     42
 1               111     45     41     41     41
 2               111     45     41     41     41
 3               112     34     34     34     34
 4                92     35     34     34     34
 5                72     35     34     34     34
 6               121     36     34     34     34
 7                23     21     21     21     21
 8                23     22     22     22     22
 9                24     22     22     22     22
10                24     21     21     21     21
11                23     21     21     21     21
12                23     21     21     21     21
13                23     21     21     21     21
14                24     21     21     21     21
15                24     21     21     21     21
16                24     21     21     21     21
17                24     22     22     22     22
18                24     22     22     22     22
19                24     22     22     22     22
20                24     22     22     22     22

Table A.2.20: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^{-5} for increasing size of the coarse space. The formulation of Theorem 2 with the choice W^H = V_ε^H M_1 is used for the low-rank updates.


A.3 Cost of the eigencomputation

Example 1

Nr. of eigenvalues   M-V products   CPU-time (in sec)   A-V
         1                 90               15             9
         2                388               41            36
         3                243               22            23
         4                281               35            26
         5                354               33            33
         6                293               27            25
         7                247               23            21
         8                198               19            17
         9                179               26            15
        10                138               14             4
        11                186               26             3
        12                213               21             4
        13                189               22             3
        14                235               23             4
        15                276               36             4
        16                266               31             4
        17                514               50             8
        18                336               34             5
        19                336               34             4
        20                650               75             7

Table A.3.21: Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.
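The IRAM runs summarised in these tables are the kind of computation ARPACK implements. As a hedged illustration (the operator below is a random stand-in for the preconditioned matrix, not one of the five examples), SciPy's ARPACK wrapper computes the eigenvalues nearest zero in shift-invert mode:

    import numpy as np
    from scipy.sparse.linalg import eigs

    rng = np.random.default_rng(1)
    n, k = 500, 10
    M1A = np.eye(n) + 0.05 * rng.standard_normal((n, n))  # stand-in for M1*A

    # IRAM: k approximate eigenvalues nearest zero and their eigenvectors,
    # obtained via shift-invert about sigma = 0.
    lam, V = eigs(M1A, k=k, sigma=0.0)
    print(np.sort(np.abs(lam)))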



Example 2

Nr. of eigenvalues   M-V products   CPU-time (in sec)   A-V
         1                135               18           135
         2                440               59            74
         3                524               71           105
         4                469               63            94
         5                423               58            85
         6                357               49           179
         7                340               47            14
         8                333               46             8
         9                345               48             8
        10                358               50             8
        11                527               74            12
        12                579               81            13
        13                574               81            13
        14               1010              142            16
        15               1762              303            26
        16               1053              149            19
        17                751              107            10
        18               3050              514            53
        19               2359              335            33
        20               1066              188            21

Table A.3.22: Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.



Example 3

Nr. of eigenvalues   M-V products   CPU-time (in sec)   A-V
         1                120               29             -
         2                336               79             -
         3                290               69           290
         4                250               60            84
         5                192               46            48
         6                183               45             9
         7                175               43             8
         8                165               41             8
         9                154               39             3
        10                169               42             3
        11                157               40             3
        12                219               56             4
        13                224               62             4
        14                212               70             4
        15                223               57             4
        16                202               53             3
        17                226               59             4
        18                264               69             4
        19                264               69             4
        20                300               78             4

Table A.3.23: Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.



Example 4

Nr. of eigenvalues   M-V products   CPU-time (in sec)   A-V
         1                 75               25             -
         2                168               56            56
         3                214               72            72
         4                178               60            30
         5                149               57            25
         6                180               73            26
         7                134               47             7
         8                236               81            11
         9                261               90            12
        10                207               72             5
        11                191               67             8
        12                197               70             8
        13                248               88            10
        14                309              109            12
        15                355              125            13
        16                412              156            15
        17                408              144            15
        18                390              138            49
        19                426              191            16
        20                345              159            12

Table A.3.24: Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.



Example 5

Nr. of eigenvalues   M-V products   CPU-time (in sec)   A-V
         1                 60               30            60
         2                 58               29            58
         3                107               53            18
         4                103               51             5
         5                163               81             5
         6                156               78           156
         7                128               65             2
         8                105               54             2
         9                125               65             2
        10                128               67             2
        11                160               83             2
        12                131               94             2
        13                162               85             2
        14                126               68             2
        15                164               97             2
        16                237              124             3
        17                227              119             3
        18                223              118             3
        19                756              454            10
        20                220              118             3

Table A.3.25: Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.



A.4 Sensitivity of the preconditioner to the accuracy of the eigencomputation

Example 1

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          358    213    144     79     79
     1          315    176    137     76     76
     2          314    171    125     72     72
     3          314    171    115     70     70
     4          313    169    109     68     68
     5          306    171    107     67     67
     6          303    169     96     64     64
     7          298    145     90     61     61
     8          294    138     76     58     58
     9          303    134     71     57     57
    10          244    100     59     53     53
    11          206     94     53     51     51
    12          190     96     52     51     51
    13          177     88     51     51     51
    14          177     88     50     50     50
    15          182     88     50     50     50
    16          184     80     47     47     47
    17          171     80     47     47     47
    18          176     79     47     47     47
    19          177     77     47     47     47
    20          178     61     44     44     44

Table A.4.26: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.
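In ARPACK-style interfaces the accuracy studied in this section reduces to a single stopping tolerance for the Ritz pairs. A hedged continuation of the Section A.3 sketch (reusing its stand-in operator M1A and the count k):

    from scipy.sparse.linalg import eigs

    # tol=0 is ARPACK's default and means machine precision, as in the
    # tables of this section; a nonzero tol relaxes the Ritz-pair accuracy.
    lam_exact, V_exact = eigs(M1A, k=k, sigma=0.0, tol=0)
    lam_loose, V_loose = eigs(M1A, k=k, sigma=0.0, tol=1e-2)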



Example 1

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          165    103     75     60     60
     1          151     86     63     56     56
     2          152     86     61     53     53
     3          153     86     61     53     53
     4          150     86     61     53     53
     5          150     84     61     53     53
     6          146     81     50     50     50
     7          138     70     48     48     48
     8          141     70     44     44     44
     9          144     64     43     43     43
    10          120     51     40     40     40
    11           98     49     39     39     39
    12           97     47     38     38     38
    13           93     46     37     37     37
    14           93     46     37     37     37
    15           91     46     37     37     37
    16           92     44     34     34     34
    17           89     44     34     34     34
    18           90     44     34     34     34
    19           92     44     34     34     34
    20           90     34     32     32     32

Table A.4.27: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.



Example 2

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0        +1500    496    311    198    123
     1          215    177    141    103    103
     2          202    169    133    100    100
     3          193    160    129     97     97
     4          192    152    125     93     93
     5          190    149    124     92     92
     6          189    146    111     89     89
     7          161    137     95     85     85
     8          147    129     90     84     84
     9          146    129     91     84     84
    10          144    127     90     83     83
    11          143    127     89     83     83
    12          143    126     88     82     82
    13          140    122     80     80     80
    14          139    118     79     79     79
    15          139    119     79     79     79
    16          139    118     79     79     79
    17          139    116     76     76     76
    18          135    113     75     75     75
    19          135    114     75     75     75
    20          131    109     73     73     73

Table A.4.28: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.



Example 2

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          145    110     95     76     76
     1          119     90     83     69     69
     2          118     88     80     67     67
     3          123     88     80     67     67
     4          122     88     80     67     67
     5          124     88     80     67     67
     6          123     88     80     66     66
     7          106     78     71     62     62
     8           84     71     64     58     58
     9           86     71     65     58     58
    10           85     71     65     58     58
    11           85     71     64     58     58
    12           84     71     64     58     58
    13           84     70     62     56     56
    14           83     70     62     56     56
    15           83     70     62     56     56
    16           84     70     62     56     56
    17           84     70     61     55     55
    18           79     68     59     55     55
    19           79     68     60     55     55
    20           79     67     58     53     53

Table A.4.29: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.



Example 3

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          268    174    130     79     79
     1          254    170    121     76     76
     2          286    170    119     72     72
     3          284    169    114     70     70
     4          259    150     99     66     66
     5          269    141     92     63     63
     6          221    110     78     60     60
     7          222    108     77     59     59
     8          225    109     77     58     58
     9          133     86     65     55     55
    10          126     82     59     53     53
    11          124     82     60     53     53
    12          117     81     56     52     52
    13          117     81     56     52     52
    14          119     80     56     52     52
    15          117     79     53     51     51
    16          104     73     49     49     49
    17          106     70     47     47     47
    18          102     64     46     46     46
    19           94     58     44     44     44
    20           99     58     44     44     44

Table A.4.30: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.



Example 3

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          129     89     70     57     57
     1          130     88     70     56     56
     2          147     88     68     56     56
     3          145     88     69     56     56
     4          139     83     55     52     52
     5          135     74     49     49     49
     6          116     59     45     45     45
     7          115     60     45     45     45
     8          115     60     45     45     45
     9           70     50     41     41     41
    10           66     48     40     40     40
    11           66     48     40     40     40
    12           64     47     39     39     39
    13           63     47     39     39     39
    14           63     47     39     39     39
    15           62     46     39     39     39
    16           55     44     37     37     37
    17           56     42     36     36     36
    18           53     39     35     35     35
    19           55     34     33     33     33
    20           55     34     32     32     32

Table A.4.31: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.



Example 4

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          145    113     90     71     71
     1          145    113     90     68     68
     2          134    106     83     65     65
     3          130    103     85     65     65
     4          125     97     74     61     61
     5          123     93     73     61     61
     6          120     91     66     58     58
     7          101     78     58     56     56
     8          101     78     58     56     56
     9           99     77     58     56     56
    10           94     74     56     55     55
    11           93     74     55     53     53
    12           86     70     52     51     51
    13           86     68     50     50     50
    14           85     67     49     49     49
    15           82     65     49     49     49
    16           81     64     49     49     49
    17           81     66     49     49     49
    18           81     66     49     49     49
    19           80     64     49     49     49
    20           80     65     49     49     49

Table A.4.32: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.



Example 4

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0           71     57     48     48     48
     1           71     57     48     48     48
     2           68     54     45     45     45
     3           66     50     45     45     45
     4           64     46     43     43     43
     5           63     45     41     41     41
     6           63     45     41     41     41
     7           50     41     38     38     38
     8           50     41     38     38     38
     9           48     40     38     38     38
    10           46     39     37     37     37
    11           46     38     37     37     37
    12           44     36     35     35     35
    13           45     36     35     35     35
    14           44     35     34     34     34
    15           43     34     34     34     34
    16           43     34     34     34     34
    17           43     34     34     34     34
    18           43     34     34     34     34
    19           43     34     34     34     34
    20           43     35     34     34     34

Table A.4.33: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.



Example 5

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          297     87     75     66     66
     1          312     79     75     66     66
     2          311     79     75     66     66
     3          354     66     68     58     58
     4          346     66     64     58     58
     5          270     66     62     58     58
     6          552     66     50     50     50
     7           53     43     40     40     40
     8           55     43     40     40     40
     9           54     44     40     40     40
    10           54     44     40     40     40
    11           50     43     38     38     38
    12           50     43     38     38     38
    13           48     43     38     38     38
    14           48     43     38     38     38
    15           48     43     38     38     38
    16           48     43     38     38     38
    17           48     43     38     38     38
    18           48     43     38     38     38
    19           47     41     36     36     36
    20           50     40     36     36     36

Table A.4.34: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.



Example 5

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          110     46     42     42     42
     1          111     45     41     41     41
     2          111     45     41     41     41
     3          112     34     34     34     34
     4           92     35     34     34     34
     5           72     35     34     34     34
     6          121     36     34     34     34
     7           23     21     21     21     21
     8           23     22     22     22     22
     9           23     21     21     21     21
    10           23     21     21     21     21
    11           23     21     21     21     21
    12           23     21     21     21     21
    13           23     21     21     21     21
    14           23     21     21     21     21
    15           23     21     21     21     21
    16           23     21     21     21     21
    17           23     21     21     21     21
    18           23     21     21     21     21
    19           23     21     21     21     21
    20           24     21     21     21     21

Table A.4.35: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.



A.5 Experiments with a poor preconditioner $M_1$

Example 1

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          818    418    303    193    142
     1          804    418    303    193    139
     2          780    419    301    184    122
     3          784    419    306    178    112
     4          779    348    262    154    105
     5          766    328    247    153    104
     6          696    317    238    148    102
     7          722    316    233    149    102
     8          690    318    236    148    102
     9          710    314    235    148    102
    10          666    298    231    145    101
    11          710    290    227    144     99
    12          635    260    196    132     93
    13          628    258    196    131     93
    14          589    255    195    130     93
    15          648    256    195    130     92
    16          626    255    190    126     91
    17          658    251    185    113     87
    18          654    250    185    113     87
    19          615    251    184    113     87
    20          658    238    161     93     83

Table A.5.36: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is imposed on $A$ and $M_1$.
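The poor $M_1$ used throughout this section is still a Frobenius-norm minimization preconditioner, only with the nonzero pattern of $M_1$ forced to coincide with that of $A$. The following is a minimal dense sketch of the column-by-column least-squares construction behind such preconditioners; the toy matrix and the pattern extraction are illustrative assumptions, not the CESC implementation.

    import numpy as np

    def frobenius_min_inverse(A, pattern):
        """Right approximate inverse minimising ||A M - I||_F column by
        column; pattern[j] lists the rows allowed to be nonzero in column
        j of M (here, the nonzero structure of A itself)."""
        n = A.shape[0]
        M = np.zeros_like(A)
        for j in range(n):
            rows = pattern[j]
            ej = np.zeros(n)
            ej[j] = 1.0
            # Small least-squares problem restricted to the allowed entries.
            mj, *_ = np.linalg.lstsq(A[:, rows], ej, rcond=None)
            M[rows, j] = mj
        return M

    # Illustrative use: the pattern of A on a small random sparse-ish matrix.
    rng = np.random.default_rng(2)
    n = 50
    A = np.eye(n) + 0.1 * (rng.random((n, n)) < 0.05) * rng.standard_normal((n, n))
    pattern = [np.nonzero(A[:, j])[0] for j in range(n)]
    M1 = frobenius_min_inverse(A, pattern)
    print(np.linalg.norm(A @ M1 - np.eye(n)))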



Example 1

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          342    174    138     81     81
     1          338    174    138     82     82
     2          325    174    138     82     82
     3          331    174    138     82     82
     4          327    146    121     74     74
     5          309    134    114     72     72
     6          291    133    113     71     71
     7          302    133    113     71     71
     8          285    133    113     71     71
     9          301    133    113     71     71
    10          304    131    111     71     71
    11          290    127    107     69     69
    12          267    108     86     65     65
    13          269    107     85     64     64
    14          269    107     85     64     64
    15          268    107     85     63     63
    16          269    106     85     63     63
    17          275    106     85     61     61
    18          270    106     85     61     61
    19          265    106     85     61     61
    20          270     99     74     57     57

Table A.5.37: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is imposed on $A$ and $M_1$.



Example 2

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0        +1500  +1500  +1500   1058    509
     1          518    371    311    265    224
     2          513    372    310    265    223
     3          515    372    309    264    222
     4          513    371    308    263    220
     5          516    370    308    263    220
     6          504    370    307    262    220
     7          515    369    307    263    220
     8          506    367    306    262    220
     9          506    367    306    262    219
    10          508    365    306    261    219
    11          502    366    305    261    219
    12          502    362    304    260    219
    13          497    363    304    260    219
    14          499    363    304    260    219
    15          499    363    304    260    219
    16          504    363    304    259    219
    17          497    362    302    259    218
    18          490    358    299    256    218
    19          490    358    299    256    218
    20          492    358    299    255    218

Table A.5.38: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is imposed on $A$ and $M_1$.



Example 2

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          247    180    155    137    115
     1          239    174    149    133    109
     2          237    174    149    133    109
     3          238    174    149    133    109
     4          237    174    148    133    109
     5          239    174    149    133    109
     6          237    174    148    132    109
     7          237    174    148    132    109
     8          239    173    148    132    109
     9          237    173    148    132    109
    10          239    173    148    132    108
    11          233    173    148    132    108
    12          235    173    148    132    108
    13          233    173    148    132    108
    14          229    173    148    132    108
    15          237    173    148    132    108
    16          237    172    148    132    108
    17          232    172    147    131    108
    18          232    171    147    130    107
    19          233    171    147    130    107
    20          233    170    147    130    107

Table A.5.39: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is imposed on $A$ and $M_1$.



Example 3

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          303    234    194    159    132
     1          275    210    176    152    118
     2          278    210    175    150    111
     3          276    209    173    149    111
     4          273    208    173    149    110
     5          277    208    173    149    110
     6          273    208    173    148    108
     7          253    191    163    143    106
     8          254    191    163    143    106
     9          253    190    163    141    103
    10          220    175    148    134    100
    11          221    175    148    133     99
    12          221    173    147    133     99
    13          216    172    145    131     99
    14          219    172    145    131     96
    15          219    168    143    128     93
    16          217    168    143    127     93
    17          217    166    142    119     90
    18          213    164    142    119     90
    19          213    164    142    119     90
    20          200    150    133    109     88

Table A.5.40: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is imposed on $A$ and $M_1$.



Example 3

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          152    118    101     94     86
     1          149    114     97     84     82
     2          149    114     96     83     82
     3          147    113     96     81     81
     4          144    112     95     80     80
     5          145    112     95     80     80
     6          144    112     95     80     80
     7          136    105     91     77     77
     8          137    105     91     77     77
     9          136    105     91     77     77
    10          119     97     84     73     73
    11          118     97     84     73     73
    12          118     97     84     72     72
    13          117     96     83     72     72
    14          117     96     83     72     72
    15          117     95     82     71     71
    16          116     93     81     70     70
    17          115     90     80     69     69
    18          114     90     80     68     68
    19          114     90     80     68     68
    20          108     84     76     66     66

Table A.5.41: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is imposed on $A$ and $M_1$.



Example 4

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          256    235    214    194    171
     1          256    235    214    193    170
     2          255    232    211    190    167
     3          255    232    211    190    166
     4          252    229    207    186    159
     5          251    229    207    186    155
     6          250    227    206    185    155
     7          249    223    199    170    151
     8          248    222    199    169    149
     9          248    222    199    169    149
    10          248    222    198    169    148
    11          247    221    197    168    133
    12          248    220    191    159    125
    13          240    199    169    148    119
    14          240    199    169    148    119
    15          236    195    167    146    117
    16          237    194    166    146    117
    17          236    194    166    146    116
    18          236    194    166    146    116
    19          229    191    163    139    114
    20          226    192    163    139    112

Table A.5.42: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is imposed on $A$ and $M_1$.



Example 4

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          123    110    101     91     87
     1          123    110    101     91     87
     2          122    109    100     91     87
     3          122    109    100     91     87
     4          122    108     99     90     87
     5          121    108     99     90     86
     6          120    107     98     89     86
     7          119    103     95     83     83
     8          119    103     95     83     83
     9          119    103     95     83     83
    10          119    102     94     83     83
    11          118    101     93     82     81
    12          118    100     92     81     76
    13          117     96     88     76     76
    14          117     96     88     76     75
    15          116     94     87     75     75
    16          116     92     85     75     75
    17          116     92     85     75     75
    18          116     92     85     75     75
    19          114     91     84     72     72
    20          112     92     84     72     72

Table A.5.43: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is imposed on $A$ and $M_1$.



Example 5

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0        +1500    321    175    156    144
     1        +1500    310    153    155    144
     2        +1500    310    153    155    144
     3         1443    174    149    118    104
     4         1341    175    149    117    104
     5         1292    174    149    116    104
     6         1058    193    141     94     89
     7          132     95     86     74     74
     8          132     95     86     74     74
     9          132     95     86     74     74
    10          132     95     86     74     74
    11          132     94     84     74     74
    12          129     93     84     74     74
    13          128     92     85     74     74
    14          125     90     86     74     74
    15          120     90     85     74     74
    16          120     90     85     74     74
    17          120     90     85     74     74
    18          119     88     84     74     74
    19          119     86     82     70     70
    20          120     86     83     70     70

Table A.5.44: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is imposed on $A$ and $M_1$.



Example 5

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          527     92     80     72     72
     1          523     90     80     72     72
     2          523     90     80     72     72
     3          509     66     60     57     57
     4          462     65     61     57     57
     5          433     65     61     57     57
     6          270     64     76     57     57
     7           62     43     41     41     41
     8           62     43     41     41     41
     9           62     43     41     41     41
    10           62     43     41     41     41
    11           62     43     41     41     41
    12           59     44     41     41     41
    13           58     44     41     41     41
    14           56     43     40     40     40
    15           55     40     37     37     37
    16           55     40     37     37     37
    17           55     40     37     37     37
    18           55     40     37     37     37
    19           55     40     37     37     37
    20           56     39     37     37     37

Table A.5.45: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M_1$ is used for the low-rank updates. The same nonzero structure is imposed on $A$ and $M_1$.



Example 1

Nr. of eigenvalues   M-V products   CPU-time (in sec)   A-V
         1                480               54           120
         2               5198              526           306
         3               1580              174           144
         4               1355              153            91
         5                997               96            31
         6                926              116            19
         7                891               82            23
         8                965               89            17
         9               1367              126            34
        10               1317              139            35
        11               1331              124            26
        12               1829              248            12
        13               1738              316            24
        14               2872              302            40
        15               3084              355            42
        16               2574              258            36
        17               2654              436            40
        18               2156              253            30
        19               1689              163            22
        20               3284              336            46

Table A.5.46: Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.



Example 2

Nr. of eigenvalues   M-V products   CPU-time (in sec)   A-V
         1                285               47            36
         2               1714              408            18
         3              10242             1690          1138
         4               2083              280           209
         5               9892             1320          1237
         6               5353              716            54
         7               2599              548           260
         8              21113             2845          2640
         9               2716              387           272
        10               3311              798           414
        11               3197              442           229
        12               2534              357           212
        13               2605              358           187
        14               2515              345           140
        15               6477              943           648
        16               3079              429           308
        17               3054              480           204
        18               4658              653           311
        19               3806              581           272
        20               9304             1319           665

Table A.5.47: Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.



Example 3

Nr. of eigenvalues   M-V products   CPU-time (in sec)   A-V
         1                180               43            60
         2                631              148           211
         3                991              234           199
         4                955              288           120
         5                703              170           101
         6                590              141            74
         7                537              172            34
         8                743              179            50
         9                542              152            16
        10                736              177            23
        11                904              240            27
        12                745              227            22
        13                784              242            23
        14                986              247            29
        15               1443              388            42
        16               1333              335            38
        17               1331              284            36
        18               1224              311            33
        19               1495              487            40
        20               1652              411            38

Table A.5.48: Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.



Example 4

Nr. of eigenvalues   M-V products   CPU-time (in sec)   A-V
         1                555              189             -
         2               4307             1439          4307
         3               1402              461          1402
         4               2154              709          2154
         5               1343              617           672
         6               1097              475            66
         7               1044              457           261
         8               1541              757           386
         9               1413              614           354
        10               1441              496           361
        11               3440             1166           688
        12               3688             1241           738
        13               4473             1548           746
        14               2514              948           419
        15               1695              573           243
        16               5491             1864           785
        17               2787              993           399
        18               3573             1217           511
        19               7160             2462           796
        20               8188             2879           745

Table A.5.49: Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.



Example 5

Nr. of eigenvalues   M-V products   CPU-time (in sec)   A-V
         1                105               51            27
         2                 58               29            15
         3                237              115            14
         4                252              145             4
         5                163               81             2
         6                251              128             1
         7                239              134             1
         8                229              114             1
         9                209              118             1
        10                213              154             1
        11                215              109             1
        12                608              565             2
        13                665              348             2
        14                655              383             2
        15                817              420             2
        16                850              439             2
        17               1060              620             3
        18               1247              622             3
        19                973              842             3
        20               4293             2206            10

Table A.5.50: Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a tolerance of $10^{-5}$.



A.6 Numerical results for the symmetric formulation

Example 1

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          358    213    144     79     79
     1          316    179    137     76     76
     2          312    172    126     73     73
     3          315    170    116     70     70
     4          308    166    112     68     68
     5          315    170    108     67     67
     6          311    170     96     64     64
     7          290    144     90     62     62
     8          292    138     77     58     58
     9          302    134     72     57     57
    10          244     99     60     53     53
    11          204     96     54     51     51
    12          215     96     54     51     51
    13          208     89     52     51     51
    14          184     88     51     51     51
    15          186     88     51     51     51
    16          189     80     47     47     47
    17          195     80     47     47     47
    18          205     77     47     47     47
    19          182     77     47     47     47
    20          173     63     44     44     44

Table A.6.51: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.
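With the symmetric choice $W = V_\varepsilon$ the corrected preconditioner remains symmetric whenever $A$ and $M_1$ are, which is what makes the SQMR runs at the end of this section possible; for the complex symmetric systems considered here the plain transpose, not the conjugate transpose, is the natural pairing. A minimal sketch of this variant, with the same caveat as before that the authoritative formulation is the one in Theorem 2:

    import numpy as np

    def symmetric_spectral_update(A, M1, V):
        """Symmetric variant W = V: M2 = M1 + V (V^T A V)^{-1} V^T stays
        symmetric whenever A and M1 are (plain transposes on purpose)."""
        Ac = V.T @ A @ V
        return M1 + V @ np.linalg.solve(Ac, V.T)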



Example 1

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          165    103     75     60     60
     1          153     87     64     56     56
     2          154     87     62     53     53
     3          154     87     61     53     53
     4          150     87     61     53     53
     5          152     87     61     53     53
     6          151     81     50     50     50
     7          148     72     48     48     48
     8          147     70     44     44     44
     9          146     66     43     43     43
    10          122     51     40     40     40
    11          110     50     39     39     39
    12          108     49     38     38     38
    13          106     47     38     38     38
    14          106     46     38     38     38
    15           95     47     38     38     38
    16          105     44     35     35     35
    17          109     45     35     35     35
    18          106     45     35     35     35
    19          105     44     35     35     35
    20          105     34     32     32     32

Table A.6.52: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



Example 2

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0        +1500  +1500    496    311    198
     1          304    235    192    151    107
     2          305    222    184    143    104
     3          310    209    177    138    101
     4          303    208    170    136     97
     5          310    206    164    133     95
     6          307    205    160    123     92
     7          239    174    146    107     89
     8          201    159    138    100     87
     9          202    159    136     99     86
    10          194    155    135     97     86
    11          194    155    135     97     86
    12          193    155    135     95     84
    13          193    154    134     88     83
    14          194    150    133     82     81
    15          193    147    130     78     78
    16          185    143    119     75     75
    17          183    141    115     74     74
    18          187    141    113     74     74
    19          185    140    107     71     71
    20          169    135    103     70     70

Table A.6.53: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



Example 2

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          145    110     95     76     76
     1          143    110     95     76     76
     2          140    104     91     73     73
     3          146    103     90     73     73
     4          142    103     90     73     73
     5          147    103     90     73     73
     6          148    101     89     72     72
     7          118     89     82     68     68
     8          118     81     75     64     64
     9           99     80     75     64     64
    10           97     80     75     64     64
    11           97     80     74     63     63
    12           96     80     75     63     63
    13           97     80     74     63     63
    14           96     79     74     63     63
    15           99     77     70     60     60
    16           96     74     65     57     57
    17           92     74     65     57     57
    18           93     74     65     57     57
    19           93     73     63     56     56
    20           86     71     60     54     54

Table A.6.54: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



Example 3

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          268    174    130     79     79
     1          260    171    121     76     76
     2          267    169    120     72     72
     3          272    169    114     70     70
     4          256    155    100     67     67
     5          262    142     93     64     64
     6          199    112     79     60     60
     7          202    112     79     60     60
     8          208    112     79     58     58
     9          135     87     66     55     55
    10          126     82     62     54     54
    11          125     82     61     54     54
    12          115     81     57     53     53
    13          118     81     57     53     53
    14          120     81     58     53     53
    15          110     76     50     50     50
    16          103     69     47     47     47
    17          105     65     46     46     46
    18          102     66     47     47     47
    19           94     59     45     45     45
    20           90     58     44     44     44

Table A.6.55: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



Example 3

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          129     89     70     57     57
     1          129     88     70     56     56
     2          133     88     69     56     56
     3          137     88     69     56     56
     4          127     86     57     53     53
     5          129     76     49     49     49
     6          108     60     45     45     45
     7          108     60     45     45     45
     8          109     60     45     45     45
     9           73     51     42     42     42
    10           66     49     40     40     40
    11           65     49     40     40     40
    12           62     48     40     40     40
    13           61     48     40     40     40
    14           66     48     40     40     40
    15           56     42     36     36     36
    16           50     40     34     34     34
    17           52     38     34     34     34
    18           56     42     35     35     35
    19           54     36     34     34     34
    20           53     35     33     33     33

Table A.6.56: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



Example 4

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          145    113     90     71     71
     1          145    113     90     68     68
     2          135    106     83     65     65
     3          131    103     85     65     65
     4          126     97     74     61     61
     5          124     94     72     61     61
     6          122     91     64     58     58
     7          101     75     58     56     56
     8           99     75     58     56     56
     9           94     74     58     56     56
    10           93     74     55     55     55
    11           86     74     55     53     53
    12           86     70     51     51     51
    13           85     68     50     50     50
    14           83     67     49     49     49
    15           82     65     49     49     49
    16           82     65     49     49     49
    17           82     65     49     49     49
    18           82     66     49     49     49
    19           81     66     50     50     50
    20           77     61     47     47     47

Table A.6.57: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



Example 4

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0           71     57     48     48     48
     1           71     57     48     48     48
     2           68     54     45     45     45
     3           66     50     45     45     45
     4           65     46     43     43     43
     5           64     45     41     41     41
     6           64     45     41     41     41
     7           50     41     38     38     38
     8           50     41     38     38     38
     9           48     41     38     38     38
    10           46     38     37     37     37
    11           46     38     37     37     37
    12           45     36     35     35     35
    13           45     36     35     35     35
    14           44     35     34     34     34
    15           43     34     34     34     34
    16           43     34     34     34     34
    17           43     35     34     34     34
    18           43     35     34     34     34
    19           43     35     34     34     34
    20           40     33     32     32     32

Table A.6.58: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



Example 5

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          297     87     75     66     66
     1          290     78     75     66     66
     2          290     78     75     66     66
     3          287     66     68     58     58
     4          252     66     64     58     58
     5          214     66     62     58     58
     6          430     66     50     50     50
     7           51     43     39     39     39
     8           52     43     39     39     39
     9           53     43     39     39     39
    10           52     44     40     40     40
    11           52     44     40     40     40
    12           49     44     38     38     38
    13           49     44     38     38     38
    14           50     44     38     38     38
    15           50     44     38     38     38
    16           50     44     38     38     38
    17           50     44     38     38     38
    18           50     44     38     38     38
    19           52     44     39     39     39
    20           52     44     39     39     39

Table A.6.59: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



Example 5

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          110     46     42     42     42
     1          109     45     41     41     41
     2          109     45     41     41     41
     3          104     34     33     33     33
     4           88     34     33     33     33
     5           72     34     33     33     33
     6          102     36     33     33     33
     7           23     21     21     21     21
     8           23     21     21     21     21
     9           23     21     21     21     21
    10           23     21     21     21     21
    11           23     21     21     21     21
    12           23     20     20     20     20
    13           23     21     21     21     21
    14           23     21     21     21     21
    15           23     21     21     21     21
    16           23     21     21     21     21
    17           23     21     21     21     21
    18           23     21     21     21     21
    19           23     21     21     21     21
    20           24     22     22     22     22

Table A.6.60: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



Size of the                  Example
coarse space   Nr. 1   Nr. 2   Nr. 3   Nr. 4   Nr. 5
     0           103     161      92      61      51
     1            97     129      90      61      51
     2            90     119      85      60      51
     3            84     120      78      56      40
     4            78     117      77      54      40
     5            71     108      71      53      40
     6            65     104      67      50      40
     7            60      99      67      49      33
     8            60      97      66      49      33
     9            58      94      58      49      34
    10            58      91      56      46      33
    11            55      82      58      46      33
    12            52      82      56      42      34
    13            47      86      52      40      33
    14            44      77      51      41      33
    15            44      76      51      41      26
    16            40      73      48      41      34
    17            42      70      49      41      34
    18            41      69      48      41      34
    19            37      66      47      39      34
    20            37      68      46      39      34

Table A.6.61: Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



Size of the                  Example
coarse space   Nr. 1   Nr. 2   Nr. 3   Nr. 4   Nr. 5
     0            74      70      58      30      24
     1            74      70      58      30      23
     2            65      70      58      30      23
     3            60      70      58      31      23
     4            57      70      55      27      23
     5            49      70      50      27      23
     6            47      70      45      27      15
     7            37      69      45      23      15
     8            40      69      45      23      15
     9            40      56      42      23      14
    10            40      58      42      23      14
    11            34      59      42      21      14
    12            30      56      40      21      14
    13            28      59      37      20      14
    14            25      59      36      20      14
    15            23      51      36      20      14
    16            23      47      33      20      14
    17            23      47      33      19      14
    18            23      47      33      20      14
    19            22      42      33      19      14
    20            22      43      33      19      14

Table A.6.62: Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.



A.7 Numerical results for the multiplicative formulation

Example 1

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          358    213    144     79     79
     1          189     89     49     49     49
     2          188     88     47     47     47
     3          188     85     45     45     45
     4          186     83     44     44     44
     5          186     82     43     43     43
     6          183     70     41     41     41
     7          178     60     39     39     39
     8          178     56     37     37     37
     9          170     53     36     36     36
    10          116     44     34     34     34
    11          105     40     33     33     33
    12          103     40     33     33     33
    13           97     40     33     33     33
    14           96     38     32     32     32
    15           96     38     32     32     32
    16           88     30     30     30     30
    17           88     30     30     30     30
    18           88     30     30     30     30
    19           88     29     29     29     29
    20           79     25     25     25     25

Table A.7.63: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
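In the multiplicative variant the coarse correction is applied to the residual left over by the first-level preconditioner rather than added independently. The sketch below shows one standard way to realise a two-level multiplicative application; it is a generic scheme under the same illustrative assumptions as the earlier sketches, not necessarily the exact update used in these runs.

    import numpy as np

    def apply_multiplicative(A, M1, V, W, r):
        """Two-level multiplicative preconditioning of a residual r:
        z1 = M1 r, then an exact coarse solve on what is left, i.e.
        z = z1 + V Ac^{-1} W^H (r - A z1) with Ac = W^H A V."""
        z1 = M1 @ r
        Ac = W.conj().T @ A @ V
        coarse = np.linalg.solve(Ac, W.conj().T @ (r - A @ z1))
        return z1 + V @ coarse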



Example 1

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          165    103     75     60     60
     1           90     44     37     37     37
     2           90     42     35     35     35
     3           90     42     35     35     35
     4           90     42     35     35     35
     5           90     42     34     34     34
     6           89     39     32     32     32
     7           85     31     31     31     31
     8           85     29     29     29     29
     9           83     28     28     28     28
    10           58     26     26     26     26
    11           50     25     25     25     25
    12           49     25     25     25     25
    13           47     24     24     24     24
    14           47     24     24     24     24
    15           47     24     24     24     24
    16           45     22     22     22     22
    17           45     22     22     22     22
    18           45     22     22     22     22
    19           43     21     21     21     21
    20           37     18     18     18     18

Table A.7.64: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The preconditioner is updated in multiplicative form.



Example 2

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0        +1500  +1500    496    311    198
     1          190    132     98     68     68
     2          187    123     94     66     66
     3          176    117     92     64     64
     4          190    116     89     61     61
     5          189    113     87     60     60
     6          188    110     81     59     59
     7          150     99     73     56     56
     8          130     94     69     55     55
     9          129     93     68     55     55
    10          127     90     67     55     55
    11          127     90     67     55     55
    12          130     94     67     54     54
    13          125     90     62     53     53
    14          113     81     49     49     49
    15          113     77     47     47     47
    16          127     85     49     49     49
    17          279     85     47     47     47
    18          130     83     48     48     48
    19          109     69     44     44     44
    20          128     78     46     46     46

Table A.7.65: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The preconditioner is updated in multiplicative form.



Example 2

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          145    110     95     76     76
     1           82     59     47     47     47
     2           82     56     45     45     45
     3           80     56     45     45     45
     4           82     56     45     45     45
     5           80     56     45     45     45
     6           83     55     44     44     44
     7           68     51     42     42     42
     8           63     48     40     40     40
     9           60     48     40     40     40
    10           60     48     40     40     40
    11           60     48     40     40     40
    12           64     49     40     40     40
    13           60     48     40     40     40
    14           47     36     32     32     32
    15           47     35     31     31     31
    16           64     47     38     38     38
    17          108     42     33     33     33
    18           64     46     37     37     37
    19           47     34     32     32     32
    20           60     42     35     35     35

Table A.7.66: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The preconditioner is updated in multiplicative form.



Example 3

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          268    174    130     79     79
     1          163     97     52     51     51
     2          166     96     49     49     49
     3          170     96     47     47     47
     4          163     88     45     45     45
     5          156     71     43     43     43
     6          119     58     41     41     41
     7          119     58     41     41     41
     8          119     57     40     40     40
     9           89     51     37     37     37
    10           80     48     36     36     36
    11           80     47     36     36     36
    12           77     46     36     36     36
    13           75     45     35     35     35
    14           75     45     35     35     35
    15           70     44     34     34     34
    16           67     38     33     33     33
    17           66     35     32     32     32
    18           63     32     31     31     31
    19           60     30     30     30     30
    20           56     29     29     29     29

Table A.7.67: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The preconditioner is updated in multiplicative form.



Example 3

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          129     89     70     57     57
     1           87     56     39     39     39
     2           88     56     38     38     38
     3           89     57     38     38     38
     4           89     55     36     36     36
     5           82     44     34     34     34
     6           63     33     31     31     31
     7           64     33     31     31     31
     8           64     33     31     31     31
     9           49     29     29     29     29
    10           45     28     28     28     28
    11           45     28     28     28     28
    12           43     28     28     28     28
    13           43     28     28     28     28
    14           43     27     27     27     27
    15           42     27     27     27     27
    16           39     26     26     26     26
    17           39     25     25     25     25
    18           35     24     24     24     24
    19           37     23     23     23     23
    20           34     22     22     22     22

Table A.7.68: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The preconditioner is updated in multiplicative form.



Example 4

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          145    113     90     71     71
     1          103     66     48     48     48
     2           98     64     45     45     45
     3           98     64     45     45     45
     4           90     59     43     43     43
     5           93     59     43     43     43
     6           87     57     41     41     41
     7           75     51     40     40     40
     8           75     51     40     40     40
     9           76     51     40     40     40
    10           64     43     35     35     35
    11           70     49     37     37     37
    12           64     40     36     36     36
    13           62     37     35     35     35
    14           61     37     35     35     35
    15           59     36     34     34     34
    16           59     36     34     34     34
    17           59     36     34     34     34
    18           59     36     34     34     34
    19           59     35     33     33     33
    20           55     34     33     33     33

Table A.7.69: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The preconditioner is updated in multiplicative form.



Example 4

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0           71     57     48     48     48
     1           52     39     34     34     34
     2           51     35     33     33     33
     3           50     35     33     33     33
     4           45     32     32     32     32
     5           45     33     32     32     32
     6           44     31     31     31     31
     7           36     28     28     28     28
     8           36     28     28     28     28
     9           37     28     28     28     28
    10           27     22     22     22     22
    11           34     27     27     27     27
    12           31     26     26     26     26
    13           31     25     25     25     25
    14           30     25     25     25     25
    15           30     25     25     25     25
    16           30     25     25     25     25
    17           29     25     25     25     25
    18           29     24     24     24     24
    19           30     24     24     24     24
    20           27     23     23     23     23

Table A.7.70: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The preconditioner is updated in multiplicative form.



Example 5

Size of the     GMRES(m), Toler. 1e-8
coarse space   m=10   m=30   m=50   m=80  m=110
     0          297     87     75     66     66
     1          137     65     48     48     48
     2          136     65     48     48     48
     3          119     47     43     43     43
     4          121     47     43     43     43
     5          123     48     43     43     43
     6          270     47     36     36     36
     7           38     29     29     29     29
     8           38     30     30     30     30
     9           38     30     30     30     30
    10           38     30     30     30     30
    11           39     30     30     30     30
    12           38     29     29     29     29
    13           44     30     30     30     30
    14           37     28     28     28     28
    15           37     28     28     28     28
    16           36     28     28     28     28
    17           37     28     28     28     28
    18           41     30     30     30     30
    19           40     30     30     30     30
    20           43     30     30     30     30

Table A.7.71: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The preconditioner is updated in multiplicative form.



Example 5

Size of the     GMRES(m), Toler. 1e-5
coarse space   m=10   m=30   m=50   m=80  m=110
     0          110     46     42     42     42
     1           46     40     32     32     32
     2           46     40     32     32     32
     3           46     24     24     24     24
     4           58     24     24     24     24
     5           59     24     24     24     24
     6           85     24     24     24     24
     7           20     16     16     16     16
     8           20     16     16     16     16
     9           20     16     16     16     16
    10           20     16     16     16     16
    11           20     16     16     16     16
    12           19     16     16     16     16
    13           22     17     17     17     17
    14           19     15     15     15     15
    15           19     15     15     15     15
    16           19     15     15     15     15
    17           19     15     15     15     15
    18           23     18     18     18     18
    19           24     19     19     19     19
    20           25     19     19     19     19

Table A.7.72: Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space. The preconditioner is updated in multiplicative form.



Size of the                        Example
coarse space    Nr. 1   Nr. 2   Nr. 3   Nr. 4   Nr. 5
      0           103     161      92      61      51
      1            97     108      75      46      37
      2           107    +500      77      45      40
      3            96    +500      69      49      39
      4            93    +500      77      46      32
      5            91    +500      68      51      64
      6            78    +500      66      42     186
      7            73    +500      70      41      32
      8            73    +500      65      42      42
      9            68    +500      56      42    +500
     10            68    +500      56      38      66
     11            72    +500      59      47     183
     12            53    +500      58      40    +500
     13            56    +500      49      36    +500
     14            40    +500      48      36    +500
     15            40    +500      46      36    +500
     16            42    +500      43      37     116
     17            35    +500      45      38    +500
     18            37    +500      45      37    +500
     19            41    +500      44      35    +500
     20            39    +500      43      35    +500

Table A.7.73: Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^-8 for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_eps is used for the low-rank updates. The preconditioner is updated in multiplicative form.



Size of the                        Example
coarse space    Nr. 1   Nr. 2   Nr. 3   Nr. 4   Nr. 5
      0            74      70      58      30      24
      1            61      46      42      21      10
      2            59      57      45      21      10
      3            50      59      46      21      10
      4            46      63      42      17      10
      5            43    +500      38      17      10
      6            36    +500      34      17      10
      7            37      68      36      13      10
      8            34     154      37      13      10
      9            33    +500      34      13      10
     10            32    +500      31      13      16
     11            30    +500      32      13      10
     12            27      51      32      13      10
     13            27    +500      29      13      16
     14            22    +500      25      13      14
     15            20    +500      27      13      13
     16            18    +500      24      13      14
     17            16    +500      25      13      13
     18            20    +500      25      13      12
     19            18    +500      24      13      18
     20            18    +500      23      13      32

Table A.7.74: Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by 10^-5 for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice W = V_eps is used for the low-rank updates. The preconditioner is updated in multiplicative form.
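The restriction W = V_eps matters because SQMR requires a symmetric preconditioner when A is complex symmetric, as the matrices arising here are. The toy fragment below illustrates the point on a small dense example: with W = V the coarse correction V Ac^{-1} V^T is symmetric under the plain transpose (no conjugation), and a symmetrized multiplicative combination then keeps M = M^T. The combination used, M = Mc + (I - Mc A) M1 (I - A Mc), is an assumption chosen for illustration and is not claimed to be the exact formula of Theorem 2; M1 and V are likewise assumed inputs.

    import numpy as np

    def symmetric_spectral_update(A, M1, V):
        # Dense toy sketch: coarse correction Mc = V Ac^{-1} V^T with W = V,
        # combined multiplicatively in a symmetry-preserving way.
        Ac = V.T @ (A @ V)                  # complex symmetric when A is
        Mc = V @ np.linalg.solve(Ac, V.T)   # Mc^T = Mc
        I = np.eye(A.shape[0], dtype=A.dtype)
        return Mc + (I - Mc @ A) @ M1 @ (I - A @ Mc)

    # Sanity check on a random complex-symmetric problem:
    rng = np.random.default_rng(0)
    n, k = 12, 3
    B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    A = B + B.T                             # A^T = A, but A is not Hermitian
    M1 = np.diag(1.0 / np.diag(A))          # crude symmetric first level (Jacobi)
    V = rng.standard_normal((n, k))
    M = symmetric_spectral_update(A, M1, V)
    assert np.allclose(M, M.T)              # the update preserves symmetry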




