HSL and the solution of sparse linear systems - EPCC

HSL and the solution of sparse linear systems

Jennifer A. Scott 

Computational Science and Engineering Department, Rutherford Appleton Laboratory.

J.A.Scott@rl.ac.uk 

Group homepage: www.cse.clrc.ac.uk/nag/

EPCC, January 2005

Overview

• Who we are

• HSL and software design

• Sparse linear solvers: a brief introduction

The Numerical Analysis Group at RAL 

• We belong to the Computational Science and Engineering (CSE) Department of CCLRC.

• CSE aims to provide world-class expertise and support for the UK theoretical and computational science communities, in both academia and industry.

• We are based at the Rutherford Appleton Laboratory in Oxfordshire (about 15 miles south of Oxford).

• RAL employs around 1200 people (mainly scientists and engineers) plus a large number of visitors.

• We are a small Group (4 permanent members plus a consultant) ... there has only been one staff change in 15 years!

• Currently, much of the core funding for the Group is provided by an EPSRC grant (GR/S42170).


Why use a Numerical Library? 

• Developing reliable, robust, accurate and efficient software for the areas covered by numerical libraries requires considerable experience and takes years of effort.

• It is therefore cost-effective to use a library written and developed by experts. This:

– reduces programming time and effort

– increases productivity

– allows confidence in the results.


• There is much free mathematical software available on the web (a useful site is gams.nist.gov).

• Some free software is excellent and is fully documented, tested, and maintained (eg the LAPACK linear algebra library for dense matrices is available in the public domain).

• BUT beware of the unknown: on the web there is no overall standard and no quality control.

• Such software often offers no guarantee of maintenance, user support, or continuity.

• The alternative is a commercial library, eg NAG, IMSL, HSL.

• Also available commercially are high-performance technical computing environments (MATLAB, Mathematica, ...).


HSL 

• Began as the Harwell Subroutine Library in 1963.

• Portable, fully documented and tested Fortran packages.

• Primarily written and developed by the RAL Numerical Analysis Group.

• Each package performs a basic numerical task (eg solve a linear system, find eigenvalues) and has been designed to be incorporated into programs.

• Particular strengths in:

– sparse matrix computations

– optimization

– large-scale system solution

HSL has an international reputation for reliability and efficiency.


Benefits and advantages of HSL

For academics:

• Freely available to ALL UK academics

• Teaching aid (mainly MSc and PhD level)

• More time for concentrating on own area of research (avoid “reinventing the wheel”!)

• Can be used with confidence (“black box”)


Benefits and advantages of HSL (cont.) 

For commercial organisations: 

• Shorten application development cycle, cutting time-to-market and gaining competitive advantage

• Reduce overall development costs 

• More time to focus on specialist aspects of applications 

• Improve application accuracy and robustness 

• Fully supported and maintained software 

HSL routines have been incorporated into a large number of commercial products.


Current version 

• A new version of HSL is released every 2-3 years (with UK academics given access to new routines as soon as testing is completed).

• Latest version: HSL 2004 ... released September 2004 

• HSL is currently marketed by Aspen Technology. 


How to get HSL 

• HSL packages are available without charge, for academic purposes, to any user whose email address ends in .ac.uk.

• Access to HSL is via our website by means of a short-lived individual password-controlled account.

• Potential users are asked for brief details, including the use they intend to make of HSL.

• Please provide this data, as it helps us (and our funding body) to evaluate the relevance of our software to UK academia.

• Users must accept a conditions-of-use form, and are not permitted to distribute any HSL codes they download to a third party.

Further details of HSL: www.cse.clrc.ac.uk/nag/hsl 


Design of the HSL Library 

• HSL is split into HSL 2004 and HSL Archive. 

• HSL Archive consists of older packages that have been superseded either by improved HSL packages or by public domain libraries such as LAPACK.

• HSL Archive is free to all for non-commercial use but its use is not supported.

• All HSL usage (main library and Archive) requires a valid licence.

HSL provides users with source code. 


• HSL packages are classified into chapters. These were decided on in the early days (certainly by Release 1 of the Catalogue).

• The chapters led to the HSL naming convention, eg AD02 for automatic differentiation belongs to the ‘A’ chapter on computer algebra and MA48 is part of the ‘MA’ chapter of matrix linear algebra packages.

• The prefix HSL_ is used to indicate that the package is written in Fortran 90 or 95 (some packages have Fortran 77 and Fortran 90 versions).

• The HSL catalogue provides a complete list of the packages in HSL 2004 and for each gives a brief outline of its purpose, method, origin, language and other attributes.

• An extensive index assists potential users in choosing packages appropriately.


Software design aims within HSL 

We aim to design our software so that it is 

• Portable 

• Efficient 

• Reliable 

• Straightforward to use 

• General purpose 

• Flexible 

• Threadsafe 


How we achieve these objectives

Portability:

• Software written in standard Fortran (older codes in Fortran 77, more recent codes in Fortran 90 and 95).

• Parallel codes use MPI for message passing.

• Small number of machine-dependent routines (eg FD05 returns real-valued machine constants).
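As an illustration of the kind of quantity a routine such as FD05 supplies, the sketch below computes the machine epsilon (the gap between 1.0 and the next larger floating-point number) by repeated halving. It is written in Python purely for illustration and makes no claim about FD05's interface; in Fortran 90 the intrinsic EPSILON returns this value directly.

```python
import sys


def machine_epsilon():
    """Return the smallest power of two eps with 1.0 + eps > 1.0.

    Assumes IEEE double-precision floats with round-to-nearest, which is
    what Python's float uses on essentially all current platforms.
    """
    eps = 1.0
    # Keep halving while adding half of eps to 1.0 still changes the result.
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps


print(machine_epsilon(), sys.float_info.epsilon)
```

Computing such constants once, inside a small machine-dependent routine, is what lets the rest of a library stay portable.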

Efficiency:

• Extensive experience of Fortran programming (in particular, sparse matrix coding).

• Use of (eg) BLAS and LAPACK, with options for tuning for different platforms.

• Performance compared with other state-of-the-art packages.


Reliability: 

• Extensive testing using comprehensive test deck 

• Also testing on real applications of different sizes 

• Tests performed on a range of computer platforms with a range of Fortran compilers.


Ease of use: 

• Software is fully documented, with each package having its own specification sheets.

• These include a simple example illustrating the use of the code (which may be used as a template).

• Parameters that must be set by the user are kept to a minimum.

• User interface simplified through the use of Fortran 90 (dynamic memory allocation, which also allows easy restart).

• The main codes provide checks on the user’s data. In case of an error, a flag is set and, optionally, a message is written. This assists the user in debugging their calling program and data.
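The flag-plus-optional-message style of data checking can be pictured with the hypothetical routine below, which checks a matrix supplied as (row, column, value) triples. The function name, the flag values (negative for an error, positive for a warning) and the policy of ignoring out-of-range entries and summing duplicates are assumptions chosen for illustration, not the documented behaviour of any particular HSL package.

```python
import sys


def check_entries(n, entries, stream=None):
    """Hypothetical data check for a matrix given as (row, col, value) triples.

    Returns (flag, cleaned), where cleaned maps (row, col) -> value and
      flag = 0   no problems found
      flag = 1   warning: out-of-range entries ignored and/or duplicates summed
      flag = -1  error: the order n is not positive
    If stream is given (eg sys.stderr), a message is written for each problem.
    """
    def report(msg):
        if stream is not None:
            print(msg, file=stream)

    if n <= 0:
        report(f"Error: order n = {n} must be positive")
        return -1, {}
    flag, cleaned = 0, {}
    for i, j, v in entries:
        if not (0 <= i < n and 0 <= j < n):
            report(f"Warning: entry ({i}, {j}) is out of range and is ignored")
            flag = 1
        elif (i, j) in cleaned:
            report(f"Warning: duplicate entry ({i}, {j}) is summed")
            cleaned[(i, j)] += v
            flag = 1
        else:
            cleaned[(i, j)] = v
    return flag, cleaned


flag, a = check_entries(2, [(0, 0, 1.0), (0, 0, 2.0), (5, 1, 3.0)], sys.stderr)
print(flag, a)  # 1 {(0, 0): 3.0}
```

Returning a flag rather than stopping leaves the decision to the calling program, while the optional message stream helps when debugging the data.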


General purpose: 

• Packages are not designed for a particular problem arising from a single application area. This means that our software may not always be the best for a given problem but will perform well on a range of problems.

Flexibility:

• Many packages offer the more experienced user a range of options. These can include how to input the problem data and whether it is to be checked for errors, blocksizes for use with BLAS, and the stability threshold parameter (linear solvers).
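A stability threshold parameter is typically a value u with 0 ≤ u ≤ 1: an entry a_ij is accepted as a pivot only if |a_ij| ≥ u · max_k |a_kj|, so u = 1 corresponds to ordinary partial pivoting and smaller u leaves more freedom to preserve sparsity. The sketch below combines that test with the classical Markowitz cost (r_i - 1)(c_j - 1), where r_i and c_j count the nonzeros in row i and column j. It is a minimal dense-array illustration under these assumptions; select_pivot is a hypothetical name and this is not the actual pivot search of any HSL code.

```python
def select_pivot(a, u):
    """Return ((i, j), cost) for the entry of least Markowitz cost
    (r_i - 1)(c_j - 1) among the nonzeros passing the threshold test
    |a_ij| >= u * max_k |a_kj|. Ties go to the first entry found."""
    n = len(a)
    r = [sum(1 for x in row if x != 0.0) for row in a]                # row counts
    c = [sum(1 for i in range(n) if a[i][j] != 0.0) for j in range(n)]  # column counts
    best, best_cost = None, None
    for j in range(n):
        colmax = max(abs(a[i][j]) for i in range(n))
        for i in range(n):
            if a[i][j] != 0.0 and abs(a[i][j]) >= u * colmax:
                cost = (r[i] - 1) * (c[j] - 1)
                if best_cost is None or cost < best_cost:
                    best, best_cost = (i, j), cost
    return best, best_cost


a = [[4.0, 1.0, 1.0],
     [1.0, 0.01, 0.0],
     [1.0, 0.0, 3.0]]
print(select_pivot(a, 0.1))  # ((2, 2), 1)
print(select_pivot(a, 0.0))  # ((1, 1), 1)
```

With u = 0 only sparsity matters, and the tiny entry 0.01 is chosen; with u = 0.1 it fails the threshold test and an equally cheap but numerically safer pivot is taken instead.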


Threadsafe: 

• The first release to be threadsafe was HSL 2002.

• All use of (eg) COMMON and SAVE has been removed.

• This allows HSL packages to be used in multi-threaded applications.

• Note: the Archive is not threadsafe (and there are no plans to make it so, as the Archive is not actively developed).


Sparse systems

Problem: we wish to solve

Ax = b

where A is LARGE and sparse.

Informal definition: A is sparse if

• many entries are zero

• it is worthwhile to exploit these zeros.


• The idea of what is LARGE has changed significantly over the last 30-40 years.

• Problems of order > 10^6 are common.

• The largest problems require iterative solvers (eg CG, GMRES, MINRES, ...).

• Our interest lies mainly in direct solvers.

• Direct methods involve an explicit factorization, eg A = LU (L, U lower and upper triangular matrices).

• Combining direct and iterative solvers has recently become an active area of research, eg direct solvers used to obtain preconditioners for iterative solvers.


Many application areas in science, engineering, and finance give rise to sparse systems:

• chemical engineering 

• economic modelling 

• fluid flow 

• oceanography 

• linear programming 

• structural engineering ... 

But each has different patterns and characteristics.


[Figure: sparsity pattern, Circuit simulation (circuit3), nz = 48137]

[Figure: sparsity pattern, Reservoir modelling (pores3), nz = 3474]

[Figure: sparsity pattern, Economic modelling, nz = 7682]

[Figure: sparsity pattern, Structural engineering, nz = 428650]

[Figure: sparsity pattern, Acoustics, nz = 342828]

[Figure: sparsity pattern, Chemical engineering, nz = 14677]

[Figure: sparsity pattern, Linear programming, nz = 4841]


Solving sparse systems

Let A be n × n with nz nonzeros.

Gaussian elimination for a dense problem requires O(n^2) storage and O(n^3) flops, and is hence infeasible for large n.

A sparse algorithm aims to solve the equations in O(n) + O(nz) time and space.
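One standard way of achieving O(n) + O(nz) storage is compressed sparse row (CSR) storage, which keeps n + 1 row pointers plus one column index and one value per nonzero. The sketch below (plain Python; to_csr and csr_matvec are names chosen for illustration) builds CSR arrays from (row, column, value) triples and forms y = Ax in O(n + nz) operations. HSL codes use their own input formats; CSR is shown only as a representative sparse data structure.

```python
def to_csr(n, entries):
    """Convert (row, col, value) triples of an n x n matrix to CSR arrays."""
    counts = [0] * n
    for i, _, _ in entries:
        counts[i] += 1
    row_ptr = [0] * (n + 1)           # row i occupies row_ptr[i]:row_ptr[i+1]
    for i in range(n):
        row_ptr[i + 1] = row_ptr[i] + counts[i]
    col_idx = [0] * len(entries)
    values = [0.0] * len(entries)
    next_slot = row_ptr[:n]           # copy: next free position in each row
    for i, j, v in entries:
        p = next_slot[i]
        col_idx[p], values[p] = j, v
        next_slot[i] += 1
    return row_ptr, col_idx, values


def csr_matvec(row_ptr, col_idx, values, x):
    """y = A x, touching each row pointer and each stored nonzero once."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[p] * x[col_idx[p]]
    return y


entries = [(0, 0, 4.0), (0, 1, -1.0), (1, 0, -1.0), (1, 1, 4.0),
           (1, 2, -1.0), (2, 1, -1.0), (2, 2, 4.0)]
row_ptr, col_idx, values = to_csr(3, entries)
print(row_ptr)                                          # [0, 2, 5, 7]
print(csr_matvec(row_ptr, col_idx, values, [1.0, 1.0, 1.0]))  # [3.0, 2.0, 3.0]
```

For comparison, a dense double-precision array of order n = 10^6 would need 8 × 10^12 bytes (8 TB), whereas CSR storage grows only with n + nz.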


Why is it hard?

• We have to worry about the zero entries.

• We need to use sparse data structures.

• If we just go ahead and apply Gaussian elimination to sparse A, the zeros will, in general, rapidly fill in.

• We have to order carefully, eg

    (a)            (b)
    x x x x x      x       x
    x x              x     x
    x   x              x   x
    x     x              x x
    x       x      x x x x x

  (a) fills in totally; (b) has no fill-in.
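The effect of ordering on fill-in can be checked symbolically. The sketch below (Python; the helper names are illustrative) runs Gaussian elimination on the sparsity pattern only, taking pivots down the diagonal in natural order, and counts the new nonzeros created; it assumes there is no exact numerical cancellation. For the 5 × 5 arrowhead matrices above, putting the full row and column first fills in the whole trailing 4 × 4 block (12 new entries), while putting them last creates none.

```python
def symbolic_fill(n, pattern):
    """Count fill-in when Gaussian elimination is applied to the pattern of an
    n x n matrix, pivoting down the diagonal in the natural order 0, ..., n-1.
    Only the structure is used, so exact numerical cancellation is ignored."""
    nz = set(pattern)
    fill = 0
    for k in range(n):
        # Rows below and columns to the right of pivot k holding a nonzero
        rows = [i for i in range(k + 1, n) if (i, k) in nz]
        cols = [j for j in range(k + 1, n) if (k, j) in nz]
        # Every such (i, j) pair becomes nonzero after the update with pivot k
        for i in rows:
            for j in cols:
                if (i, j) not in nz:
                    nz.add((i, j))
                    fill += 1
    return fill


def arrowhead(n, full_first):
    """Pattern of an n x n arrowhead matrix: the diagonal plus one full row and
    column, placed first (ordering (a)) or last (ordering (b))."""
    d = 0 if full_first else n - 1
    pattern = {(i, i) for i in range(n)}
    pattern |= {(d, j) for j in range(n)}
    pattern |= {(i, d) for i in range(n)}
    return pattern


print(symbolic_fill(5, arrowhead(5, full_first=True)))   # 12: (a) fills in totally
print(symbolic_fill(5, arrowhead(5, full_first=False)))  # 0: (b) no fill-in
```

The two patterns are symmetric permutations of each other, so the difference is entirely due to the ordering.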


Note: A does not have to be very large for it to be worthwhile to exploit sparsity.

Here we compare the sparse solver MA48 (HSL) with a dense solver SGESV (LAPACK) on some problems from practical applications (timings in seconds).

Identifier   n      nz      MA48   SGESV
FS 680 3     680    2646    0.06   0.96
PORES 2      1224   9613    0.54   4.54
BCSSTK27     1224   56126   2.07   4.55
NNC1374      1374   8606    0.70   6.19
WEST2021     2021   7353    0.21   18.88
ORANI678     2529   90158   1.17   36.37


HSL contains several sparse direct solvers: 

• Some are for symmetric systems, others for unsymmetric systems.

• There are solvers designed for element problems.

• There are solvers that use minimal storage.

• Some are designed for particular sparsity structures (banded, highly unsymmetric, KKT, ...).

• There are solvers for real systems and solvers for complex systems.

• Some exploit high-level BLAS.

• Some incorporate scaling, iterative refinement, different orderings, ...


Chemical process engineering problems 

• Realistic, industrial-scale process modelling problems for dynamic simulation and optimization require large-scale computation.

• Solving large, sparse linear systems is often a bottleneck (up to 95% of total computation time).

• These systems involve matrices that:

– are very sparse

– are not diagonally dominant

– are numerically indefinite

– have a highly unsymmetric structure

– may be ill-conditioned.

• Consequently, the choice of algorithm/solver is limited.


[Figure: sparsity pattern, Chemical process engineering (hydr1 matrix), nz = 23752]

A general-purpose solver such as the HSL code MA48 is typically used for these problems.


Phases in MA48

• (Optional) Preorder to block triangular form, eg

\[
PAQ =
\begin{pmatrix}
B_{11} &        &        &        \\
B_{21} & B_{22} &        &        \\
\vdots &        & \ddots &        \\
B_{l1} & B_{l2} & \cdots & B_{ll}
\end{pmatrix}
\]

(only the diagonal blocks B_ii need to be factorized).

• Analyse - sparsity analysed to produce a suitable ordering and data structures for efficient factorization (pivot sequence chosen to minimise fill-in and for numerical stability).

• Factorize - compute L and U using information from analyse.

• Solve - forward elimination and back substitution.

These phases are typical of a sparse direct solver.
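Once A has been permuted to block lower triangular form, Ax = b can be solved block by block: for i = 1, ..., l, solve B_ii x_i = b_i - Σ_{j<i} B_ij x_j. Only the diagonal blocks are factorized; the off-diagonal blocks appear only in matrix-vector products. The sketch below assumes the permutations P and Q have already been applied to A, b and x, and, for brevity, solves each (small, dense) diagonal block by Gaussian elimination with partial pivoting, where a real code would use a sparse factorization of each B_ii. The helper names are illustrative.

```python
def dense_solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with
    partial pivoting (stands in for a sparse factorization of B_ii)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def block_lower_solve(blocks, b_parts):
    """Solve a block lower triangular system; blocks[i][j] holds B_ij for j <= i.
    Only the diagonal blocks B_ii are ever factorized."""
    x_parts = []
    for i, b_i in enumerate(b_parts):
        rhs = list(b_i)
        for j in range(i):  # subtract B_ij x_j for the blocks already solved
            for r, row in enumerate(blocks[i][j]):
                rhs[r] -= sum(v * xc for v, xc in zip(row, x_parts[j]))
        x_parts.append(dense_solve(blocks[i][i], rhs))
    return x_parts


blocks = [
    [[[2.0]]],                                   # B11
    [[[1.0], [0.0]], [[1.0, 1.0], [0.0, 2.0]]],  # B21, B22
]
print(block_lower_solve(blocks, [[4.0], [5.0, 4.0]]))  # [[2.0], [1.0, 2.0]]
```

Because the work is confined to the diagonal blocks, fill-in can never spread between blocks, which is why the preordering can pay off.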


• The analyse phase selects a tentative pivot sequence to try to minimise fill-in.

• The analysis can be reused to factorize other matrices with the same sparsity pattern (important in many applications, eg solving a nonlinear system using a Newton-type method).

• The pivot sequence can be modified during the factorize phase to ensure stability, or it can be fixed (fast factorize option).

• Once the factors are computed, they can be used to solve repeatedly for different right-hand sides b.
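The separation of phases described above can be sketched in a few lines of Python. In this minimal sketch, analyse works on the sparsity pattern alone and picks diagonal pivots greedily by Markowitz cost, factorize applies that pivot sequence without modification (in the spirit of the fixed, fast factorize option), and solve performs forward elimination and back substitution. The function names are hypothetical, dense arrays are used for brevity, and restricting pivots to the diagonal without any stability check is a simplification: MA48 also takes numerical values into account and may change the pivot sequence during factorize.

```python
def analyse(n, pattern):
    """Choose a diagonal pivot order from the sparsity pattern only: at each
    step take the remaining pivot of least Markowitz cost (r - 1)(c - 1),
    then record the fill it would create."""
    nz = set(pattern) | {(i, i) for i in range(n)}
    remaining, order = set(range(n)), []
    while remaining:
        def cost(k):
            r = sum(1 for j in remaining if (k, j) in nz)
            c = sum(1 for i in remaining if (i, k) in nz)
            return (r - 1) * (c - 1)

        k = min(sorted(remaining), key=cost)
        remaining.remove(k)
        rows = [i for i in remaining if (i, k) in nz]
        cols = [j for j in remaining if (k, j) in nz]
        nz.update((i, j) for i in rows for j in cols)
        order.append(k)
    return order


def factorize(a, order):
    """LU factors (L unit lower, stored below the diagonal) of the symmetrically
    permuted matrix, with the pivot sequence fixed by analyse."""
    n = len(order)
    lu = [[a[order[i]][order[j]] for j in range(n)] for i in range(n)]
    for k in range(n):
        if lu[k][k] == 0.0:
            raise ZeroDivisionError(f"zero pivot at step {k}")
        for i in range(k + 1, n):
            if lu[i][k] != 0.0:
                lu[i][k] /= lu[k][k]  # multiplier, stored in L
                for j in range(k + 1, n):
                    lu[i][j] -= lu[i][k] * lu[k][j]
    return lu


def solve(lu, order, b):
    """Forward elimination with L, back substitution with U, then unpermute."""
    n = len(order)
    y = [b[order[i]] for i in range(n)]
    for i in range(n):
        y[i] -= sum(lu[i][j] * y[j] for j in range(i))
    for i in range(n - 1, -1, -1):
        y[i] = (y[i] - sum(lu[i][j] * y[j] for j in range(i + 1, n))) / lu[i][i]
    x = [0.0] * n
    for i, k in enumerate(order):
        x[k] = y[i]
    return x


pattern = [(0, 1), (0, 2), (0, 3), (1, 0), (2, 0), (3, 0)]
order = analyse(4, pattern)  # done once for the pattern
print(order)                 # [1, 2, 0, 3]
a1 = [[4.0, 1.0, 1.0, 1.0], [1.0, 4.0, 0.0, 0.0],
      [1.0, 0.0, 4.0, 0.0], [1.0, 0.0, 0.0, 4.0]]
a2 = [[5.0, 2.0, 2.0, 2.0], [2.0, 5.0, 0.0, 0.0],
      [2.0, 0.0, 5.0, 0.0], [2.0, 0.0, 0.0, 5.0]]
lu1 = factorize(a1, order)
print(solve(lu1, order, [7.0, 5.0, 5.0, 5.0]))      # close to [1, 1, 1, 1]
lu2 = factorize(a2, order)   # same pattern, so the analysis is reused
print(solve(lu2, order, [23.0, 12.0, 17.0, 22.0]))  # close to [1, 2, 3, 4]
```

Here analyse moves the full row and column of the arrowhead pattern towards the end, so neither factorization creates any fill-in, and each set of factors could be reused for further right-hand sides.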


MA48 results

Results on an SGI Origin (timings in seconds).

Identifier   n        nz          Analyse   Factorize   Fast Factorize   Solve
onetone2     36,057   227,628     10.43     3.47        2.97             0.10
bayer01      57,735   277,774     5.02      1.20        0.65             0.10
lhr71c       70,304   1,528,092   40.26     9.99        7.56             0.39
icomp        75,724   338,711     0.60      0.18        0.13             0.06


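Parallel approach

Start by preordering A to Singly Bordered Block Diagonal (SBBD) form

\[
\begin{pmatrix}
A_{11} &        &        &        & C_1    \\
       & A_{22} &        &        & C_2    \\
       &        & \ddots &        & \vdots \\
       &        &        & A_{NN} & C_N
\end{pmatrix},
\]

where

• the A_ll are m_l × n_l matrices with m_l ≥ n_l;

• the C_l are m_l × k with k ≪ n_l.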


[Figure: bayer04 before and after reordering]


• Perform a partial LU decomposition of each (A_ll, C_l).

• Complete the factorization by forming and then factorizing an interface problem.

Advantages over designing a general parallel sparse solver:

• Allows us to exploit existing, fully tested and developed, sophisticated direct solvers.

• Processors are preassigned all the necessary matrix data before the factorization starts.

• Communication is only required to send Schur complement matrices to the processor responsible for the interface problem (plus communication of interface data during the solve phase).

• The interface matrix is much smaller than the original matrix and can be factorized using any existing sparse solver.


Unfortunately, factorization of the rectangular submatrices (A_ll, C_l) cannot be performed using an existing direct solver without modifications:

• We have to distinguish between columns of A_ll that are candidates for elimination and those belonging to the border C_l that must be passed to the interface problem.

• Pivots can ONLY be chosen from A_ll.

• We must have access to the Schur complement remaining at the end of each partial factorization.

• The submatrix may be rank deficient.
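The modified partial factorization can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the HSL_MP48 algorithm: the block [A_ll C_l] is held as a dense m_l × (n_l + k) array, pivots are taken only from the first n_l columns using partial pivoting within each column, a column of A_ll with no entry larger than a small tolerance is simply left uneliminated (covering the rank-deficient case), and everything not eliminated is returned as the Schur complement to be passed to the interface problem. The function name partial_lu and the tolerance are hypothetical.

```python
def partial_lu(block, nl, tol=1e-12):
    """Partially factorize the m x (nl + k) array [A_ll  C_l].

    Pivots are taken only from the first nl columns (the A_ll part), with
    partial pivoting inside each column. A column of A_ll with no entry of
    magnitude above tol is left uneliminated (rank-deficient case).
    Returns (pivots, schur, left_cols): the (row, col) pivots used, the Schur
    complement on the rows and columns that were not eliminated, and the
    original column indices of that Schur complement.
    """
    m = [list(row) for row in block]
    ncols = len(m[0])
    free_rows = list(range(len(m)))
    pivots = []
    for j in range(nl):
        p = max(free_rows, key=lambda i: abs(m[i][j]), default=None)
        if p is None or abs(m[p][j]) <= tol:
            continue  # no acceptable pivot: column j stays in the Schur complement
        free_rows.remove(p)
        pivots.append((p, j))
        for i in free_rows:
            f = m[i][j] / m[p][j]
            if f != 0.0:
                for c in range(ncols):
                    m[i][c] -= f * m[p][c]
    eliminated = {j for _, j in pivots}
    left_cols = [c for c in range(ncols) if c not in eliminated]
    schur = [[m[i][c] for c in left_cols] for i in free_rows]
    return pivots, schur, left_cols


block = [[2.0, 0.0, 1.0],  # columns 0-1: A_ll, column 2: border C_l
         [1.0, 1.0, 0.0],
         [0.0, 1.0, 1.0]]
print(partial_lu(block, nl=2))  # ([(0, 0), (1, 1)], [[1.5]], [2])
```

In the example the 1 × 1 Schur complement 1.5 is what this processor would send on for assembly into the interface problem; border columns are never used as pivots.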


Efficiency 

Our new parallel direct solver is called HSL_MP48.

The efficiency of HSL_MP48 depends on:

• the SBBD form having a small interface (the interface problem is solved using a single processor);

• obtaining good load balance.

A matrix may arise naturally in SBBD form (eg from the components of a chemical processing plant), but this is not the case in general. The MONET algorithm of Yifan Hu has been very successful at obtaining SBBD forms (available in HSL 2002 as routine HSL_MC66).


Results 

Numerical results for the parallel direct solver HSL_MP48:

• 12-processor SGI Origin2000 (Manchester University)

• cpuset facility (exclusive access to the processors and their local memory)

• Fortran 90 compiler in 64-bit mode with optimization flags -O3 -OPT:Olimit=0

• Vendor-supplied BLAS

• SBBD form with N = 8 blocks

• All timings are wallclock timings in seconds.


Results (cont.)

Example: bayer01 (chemical process simulation problem)

• n = 57735, nz = 277774.

• Time for Analyse + Factorize + Solve (speedup in brackets):

            HSL_MP48
    MA48    p = 1    p = 2         p = 4         p = 8
    6.37    4.23     2.39 (1.8)    1.48 (2.9)    0.97 (4.4)

• Time for interface problem is 0.06.

• Time for Solve (speedup in brackets):

            HSL_MP48
    MA48    p = 1    p = 2         p = 4         p = 8
    0.105   0.116    0.082 (1.4)   0.050 (2.3)   0.047 (2.5)
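The bracketed speedups appear to be measured relative to HSL_MP48 on a single processor rather than to MA48: recomputing T(1)/T(p) from the bayer01 times above reproduces the quoted values. A short check (the helper name speedups is illustrative):

```python
def speedups(times):
    """Speedup T(1)/T(p) for each p > 1, rounded to one decimal place."""
    t1 = times[1]
    return {p: round(t1 / t, 1) for p, t in times.items() if p != 1}


# bayer01 timings for HSL_MP48 in seconds, keyed by number of processors p
total_times = {1: 4.23, 2: 2.39, 4: 1.48, 8: 0.97}
solve_times = {1: 0.116, 2: 0.082, 4: 0.050, 8: 0.047}
print(speedups(total_times))  # {2: 1.8, 4: 2.9, 8: 4.4}
print(speedups(solve_times))  # {2: 1.4, 4: 2.3, 8: 2.5}
```

Measured against MA48 instead, the 8-processor total time is 6.37/0.97, or about 6.6 times faster.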



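Example: lhr71c (light hydrocarbon recovery problem)

• n = 70304, nz = 1528092.

• Time for Analyse + Factorize + Solve (speedup in brackets):

            HSL_MP48
    MA48    p = 1    p = 2         p = 4         p = 8
    50.6    71.2     39.8 (1.8)    22.3 (3.2)    12.4 (5.7)

• Time for Solve (speedup in brackets):

            HSL_MP48
    MA48    p = 1    p = 2         p = 4         p = 8
    0.39    0.51     0.32 (1.6)    0.20 (2.6)    0.13 (4.1)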



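Example: 10cols (chemical process simulation problem)

• n = 29496, nz = 109588.

• Time for Analyse + Factorize + Solve (speedup in brackets):

            HSL_MP48
    MA48    p = 1    p = 2         p = 4         p = 8
    16.4    2.75     1.60 (1.7)    0.93 (3.0)    0.65 (4.2)

• Flops (× 10^5): MA48 = 1611; HSL_MP48 = 183

• Time for Solve (speedup in brackets):

            HSL_MP48
    MA48    p = 1    p = 2         p = 4         p = 8
    0.074   0.034    0.032 (1.6)   0.027 (2.0)   0.023 (2.4)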


Concluding remarks 

• HSL is an established but continually evolving library of mathematical software.

• The problems users want to solve are always increasing in size.

• Most of these are sparse.

• Thus new techniques/methods continue to be developed.

• New solvers to implement them continue to be written.

• Recently, parallel algorithms are being developed.

We always welcome new applications and test problems.
