HSL and the solution of sparse linear systems - EPCC
HSL and the solution of sparse linear systems
Jennifer A. Scott
Computational Science and Engineering Department,
Rutherford Appleton Laboratory.
J.A.Scott@rl.ac.uk
Group homepage: www.cse.clrc.ac.uk/nag/
Overview
• Who we are
• HSL and software design
• Sparse linear solvers: a brief introduction
EPCC January 2005
The Numerical Analysis Group at RAL
• We belong to the Computational Science and Engineering (CSE) Department of CCLRC.
• CSE aims to provide world-class expertise and support for UK theoretical and computational science communities, in both academia and industry.
• We are based at the Rutherford Appleton Laboratory in Oxfordshire (about 15 miles south of Oxford).
• RAL employs around 1200 people (mainly scientists and engineers) plus a large number of visitors.
• We are a small Group (4 permanent members plus a consultant) ... there has only been one staff change in 15 years!
• Currently, much of the core funding for the Group is provided by an EPSRC grant (GR/S42170).
Why use a Numerical Library?
• Developing reliable, robust, accurate and efficient software for areas covered by numerical libraries requires considerable experience and takes years of effort.
• Thus it is cost effective to use a Library written and developed by experts.
• Reduces programming time and effort
• Increases productivity
• Allows confidence in the results
• There is much free mathematical software available on the web (a useful site is gams.nist.gov)
• Some free software is excellent and is fully documented, tested, and maintained (eg the LAPACK linear algebra library for dense matrices is available in the public domain)
• BUT beware of the unknown - on the web there is no overall standard and no quality control
• Free software often offers no guarantee of maintenance, user support, or continuity
• The alternative is a commercial library, eg NAG, IMSL, HSL
• Also available commercially are high-performance technical computing environments (MATLAB, Mathematica ...)
HSL
• Began as Harwell Subroutine Library in 1963.
• Portable, fully documented and tested Fortran packages.
• Primarily written and developed by RAL Numerical Analysis Group.
• Each package performs a basic numerical task (eg solve linear system, find eigenvalues) and has been designed to be incorporated into programs.
• Particular strengths in:
– sparse matrix computations
– optimization
– large-scale system solution
HSL has an international reputation for reliability and efficiency.
Benefits and advantages of HSL
For academics:
• Freely available to ALL UK academics
• Teaching aid (mainly MSc and PhD level)
• More time for concentrating on own area of research (avoid “reinventing the wheel”!)
• Can be used with confidence (“black box”)
Benefits and advantages of HSL (cont.)
For commercial organisations:
• Shorten application development cycle, cutting time-to-market and gaining competitive advantage
• Reduce overall development costs
• More time to focus on specialist aspects of applications
• Improve application accuracy and robustness
• Fully supported and maintained software
HSL routines have been incorporated into a large number of commercial products.
Current version
• A new version of HSL is released every 2-3 years (with UK academics given access to new routines as soon as testing is completed).
• Latest version: HSL 2004 ... released September 2004
• HSL is currently marketed by Aspen Technology.
How to get HSL
• HSL packages are available without charge, for academic purposes, to any user whose email address ends in .ac.uk.
• Access to HSL is via our website by means of a short-lived individual password-controlled account.
• Potential users are asked for brief details, including the use they intend to make of HSL.
• Please provide this data as it helps us (and our funding body) to evaluate the relevance of our software to UK academia.
• Users must accept a conditions-of-use form, and are not permitted to distribute any HSL codes they download to a third party.
Further details of HSL: www.cse.clrc.ac.uk/nag/hsl
Design of the HSL Library
• HSL is split into HSL 2004 and HSL Archive.
• HSL Archive consists of older packages that have been superseded either by improved HSL packages or by public domain libraries such as LAPACK.
• HSL Archive is free to all for non-commercial use but its use is not supported.
• All HSL usage (main library and Archive) requires a valid licence.
HSL provides users with source code.
• HSL packages are classified into chapters. These were decided on in the early days (certainly by Release 1 of the Catalogue).
• The chapters led to the HSL naming convention, eg AD02 for automatic differentiation belongs to the ‘A’ chapter on computer algebra and MA48 is part of the ‘MA’ chapter of matrix linear algebra packages.
• The prefix HSL_ is used to indicate the package is written in Fortran 90 or 95 (some packages have Fortran 77 and Fortran 90 versions).
• The HSL catalogue provides a complete list of the packages in HSL 2004 and for each gives a brief outline of purpose, method, origin, language and other attributes.
• An extensive index assists potential users in choosing packages appropriately.
Software design aims within HSL
We aim to design our software so that it is
• Portable
• Efficient
• Reliable
• Straightforward to use
• General purpose
• Flexible
• Threadsafe
How we achieve these objectives
Portability:
• Software written in standard Fortran (older codes are Fortran 77, more recently, Fortran 90 and 95).
• Parallel codes use MPI for message passing.
• Small number of machine-dependent routines (eg FD05 returns real-valued machine constants).
Efficiency:
• Extensive experience of Fortran programming (in particular, sparse matrix coding)
• Use of (eg) BLAS and LAPACK, with options for tuning for different platforms
• Performance compared with other state-of-the-art packages
Reliability:
• Extensive testing using comprehensive test deck
• Also testing on real applications of different sizes
• Tests performed on a range of computer platforms with a range of Fortran compilers.
Ease of use:
• Software is fully documented with each package having its own specification sheets
• These include a simple example illustrating the use of the code (may be used as a template).
• Parameters that must be set by the user are kept to a minimum.
• User interface simplified through use of Fortran 90 (dynamic memory allocation ... also allows easy restart)
• The main codes provide checks on the user’s data. In case of an error, a flag is set and, optionally, a message written. This assists the user with debugging their calling program and data.
General purpose:
• Packages are not designed for a particular problem arising from a single application area. This means that our software may not always be the best for a given problem but will perform well on a range of problems.
Flexibility:
• Many packages offer the more experienced user a range of options. These can include options on how to input the problem data and whether it is to be checked for errors, blocksizes for use with BLAS, and the stability threshold parameter (linear solvers).
Threadsafe:
• The first release to be threadsafe was HSL 2002.
• All use of (eg) COMMON and SAVE removed.
• Allows HSL packages to be used in multi-threaded applications.
• Note: the Archive is not threadsafe (and there are no plans for this as the Archive is not actively developed).
Sparse systems
Problem: we wish to solve
$$Ax = b$$
where A is LARGE and sparse.
Informal definition: A is sparse if
• many entries are zero
• it is worthwhile to exploit these zeros.
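One common way of exploiting the zeros is to store only the nonzero entries. The sketch below uses coordinate (triplet) storage and a matrix-vector product that touches only the stored entries. The 4 x 4 matrix and the function name `coo_matvec` are made up for illustration; this is not how any particular HSL package stores its data.

```python
# Coordinate (triplet) storage: one (row, column, value) triple per nonzero.
# The matrix is a hypothetical 4 x 4 example with 6 nonzeros out of 16 entries.
n = 4
rows = [0, 0, 1, 2, 2, 3]
cols = [0, 3, 1, 0, 2, 3]
vals = [4.0, 1.0, 3.0, 2.0, 5.0, 6.0]


def coo_matvec(n, rows, cols, vals, x):
    """Return y = A x, doing work proportional to the number of nonzeros."""
    y = [0.0] * n
    for i, j, a in zip(rows, cols, vals):
        y[i] += a * x[j]
    return y


x = [1.0, 2.0, 3.0, 4.0]
y = coo_matvec(n, rows, cols, vals, x)
print(y)
```

The loop performs one multiply-add per stored entry, so the cost scales with nz rather than with n squared.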
• The idea of what is LARGE has changed significantly over the last 30-40 years.
• Problems of order $> 10^6$ are common.
• Largest problems require iterative solvers (eg CG, GMRES, MINRES,...).
• Our interest lies mainly in direct solvers.
• Direct methods involve explicit factorization, eg $A = LU$ (L, U lower and upper triangular matrices).
• Recently combining direct and iterative solvers has become an active area of research, eg direct solvers used to obtain preconditioners for iterative solvers.
Many application areas in science, engineering, and finance give rise to sparse systems
• chemical engineering
• economic modelling
• fluid flow
• oceanography
• linear programming
• structural engineering ...
But all have different patterns and characteristics.
[Figure: sparsity pattern, circuit simulation (circuit3), nz = 48137]
[Figure: sparsity pattern, reservoir modelling (pores3), nz = 3474]
[Figure: sparsity pattern, economic modelling, nz = 7682]
[Figure: sparsity pattern, structural engineering, nz = 428650]
[Figure: sparsity pattern, acoustics, nz = 342828]
[Figure: sparsity pattern, chemical engineering, nz = 14677]
[Figure: sparsity pattern, linear programming, nz = 4841]
Solving sparse systems
Let A be $n \times n$ with $nz$ nonzeros.
Gaussian elimination for a dense problem requires $O(n^2)$ storage and $O(n^3)$ flops.
Hence infeasible for large n.
A sparse algorithm aims to solve the equations in $O(n) + O(nz)$ time and space.
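To put rough numbers on these estimates, the sketch below compares dense storage with the storage needed for the nonzeros alone, using the dimensions of lhr71c (n = 70304, nz = 1528092) quoted later in the talk. The byte sizes (8-byte reals, 4-byte integers), the compressed-column layout, and the 2n^3/3 leading term for dense LU flops are assumptions made for this illustration; the sparse figure counts A only and ignores any fill-in in the factors.

```python
def dense_cost(n, bytes_per_entry=8):
    """Storage in bytes and approximate flop count for dense Gaussian elimination."""
    storage = bytes_per_entry * n * n
    flops = 2 * n**3 // 3  # leading term of the dense LU operation count
    return storage, flops


def sparse_storage(n, nz, bytes_per_value=8, bytes_per_index=4):
    """Bytes to hold A in compressed-column form: values, row indices, column pointers."""
    return nz * (bytes_per_value + bytes_per_index) + (n + 1) * bytes_per_index


n, nz = 70304, 1528092  # dimensions of lhr71c as quoted in the talk
dense_bytes, dense_flops = dense_cost(n)
sparse_bytes = sparse_storage(n, nz)

print(f"dense storage : {dense_bytes / 1e9:.1f} GB")
print(f"sparse storage: {sparse_bytes / 1e6:.1f} MB (matrix only, no fill-in)")
print(f"dense flops   : {dense_flops:.2e}")
```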
Why is it hard?
• We have to worry about the zero entries
• Need to use sparse data structures
• If we just go ahead and apply Gaussian elimination to sparse A, the zeros will, in general, rapidly fill in.
• We have to order carefully, eg
$$
\text{(a)}\ \begin{pmatrix} \times & \times & \times & \times & \times \\ \times & \times & & & \\ \times & & \times & & \\ \times & & & \times & \\ \times & & & & \times \end{pmatrix}
\qquad
\text{(b)}\ \begin{pmatrix} \times & & & & \times \\ & \times & & & \times \\ & & \times & & \times \\ & & & \times & \times \\ \times & \times & \times & \times & \times \end{pmatrix}
$$
(a) fills in totally (b) no fill-in
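The effect of the ordering can be checked with a purely symbolic elimination that tracks only the nonzero pattern. The sketch below is a minimal illustration, assuming pivots are taken down the diagonal in the given order with no numerical pivoting or cancellation; the function name `fill_in` is made up for the example.

```python
def fill_in(n, pattern):
    """Count the new nonzeros created when pivots 0..n-1 are eliminated in order.

    pattern is a set of (row, column) positions of the nonzeros of A.
    """
    filled = set(pattern)
    for k in range(n):
        col = [i for i in range(k + 1, n) if (i, k) in filled]
        row = [j for j in range(k + 1, n) if (k, j) in filled]
        # Eliminating pivot k couples every row in `col` with every column in `row`.
        for i in col:
            for j in row:
                filled.add((i, j))
    return len(filled) - len(set(pattern))


n = 5
diag = {(i, i) for i in range(n)}
# (a) full first row and column; (b) full last row and column.
arrow_a = diag | {(0, j) for j in range(n)} | {(i, 0) for i in range(n)}
arrow_b = diag | {(n - 1, j) for j in range(n)} | {(i, n - 1) for i in range(n)}

print("fill-in for (a):", fill_in(n, arrow_a))
print("fill-in for (b):", fill_in(n, arrow_b))
```

Pattern (a) gains 12 entries and becomes completely dense, while (b), which is the same matrix with the first row and column moved to the end, gains none.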
Note: A does not have to be very large for it to be worthwhile to exploit sparsity.
Here we compare the sparse solver MA48 (HSL) with a dense solver SGESV (LAPACK) on some problems from practical applications (timings in seconds).
Identifier    n       nz       MA48    SGESV
FS 680 3      680     2646     0.06     0.96
PORES 2       1224    9613     0.54     4.54
BCSSTK27      1224    56126    2.07     4.55
NNC1374       1374    8606     0.70     6.19
WEST2021      2021    7353     0.21    18.88
ORANI678      2529    90158    1.17    36.37
HSL contains several sparse direct solvers:
• Some are for symmetric systems, others for unsymmetric systems.
• There are solvers designed for element problems.
• There are solvers that use minimal storage.
• Some are designed for particular sparsity structures (banded, highly unsymmetric, KKT ...).
• There are solvers for real systems and solvers for complex systems.
• Some exploit high level BLAS.
• Some incorporate scaling / iterative refinement / different orderings ...
Chemical process engineering problems
• Realistic, industrial-scale process modelling problems for dynamic simulation and optimization require large-scale computation.
• Solving large, sparse linear systems is often a bottleneck (up to 95% of total computation time).
• These systems involve matrices that are:
– Very sparse
– Not diagonally dominant
– Numerically indefinite
– Highly unsymmetric in structure
– Possibly ill-conditioned
• Consequently choice of algorithm/solver is limited.
[Figure: sparsity pattern, chemical process engineering (hydr1 matrix), nz = 23752]
A general-purpose solver such as the HSL code MA48 is typically used for these problems.
Phases in MA48
• (Optional) Preorder to block triangular form, eg
$$
PAQ = \begin{pmatrix} B_{11} & & & \\ B_{21} & B_{22} & & \\ \vdots & \vdots & \ddots & \\ B_{l1} & B_{l2} & \cdots & B_{ll} \end{pmatrix}
$$
(only need to factorize $B_{ii}$)
• Analyse - sparsity analysed to produce suitable ordering and data structures for efficient factorization (pivot sequence chosen to minimise fill-in and for numerical stability).
• Factorize - compute L and U using information from analyse
• Solve - forward elimination and back substitution
These phases are typical of a sparse direct solver.
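The remark that only the diagonal blocks need factorizing can be illustrated with a block forward substitution: each diagonal block is solved in turn, and the off-diagonal blocks are used only in matrix-vector products. The sketch below is a dense toy version under stated assumptions. The permutations P and Q are taken as already applied, the helper names `solve_small` and `block_forward_solve` and the test data are hypothetical, and MA48 itself works with sparse factors rather than the small dense solve used here.

```python
def solve_small(A, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def block_forward_solve(diag, below, rhs):
    """Solve a block lower triangular system.

    diag[i] is B_ii, below[i][j] is B_ij for j < i, rhs[i] is the matching part of b.
    """
    xs = []
    for i, (Bii, bi) in enumerate(zip(diag, rhs)):
        r = bi[:]
        for j in range(i):  # subtract B_ij x_j for the blocks already solved
            Bij = below[i][j]
            for row in range(len(r)):
                r[row] -= sum(Bij[row][c] * xs[j][c] for c in range(len(xs[j])))
        xs.append(solve_small(Bii, r))  # only the diagonal block is "factorized"
    return xs


B11 = [[2.0, 1.0], [1.0, 3.0]]
B21 = [[1.0, 0.0]]
B22 = [[4.0]]
xs = block_forward_solve([B11, B22], [[], [B21]], [[4.0, 7.0], [13.0]])
print(xs)
```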
• The analyse phase selects a tentative pivot sequence to try to minimise fill-in.
• The analysis can be reused to factorize other matrices with the same sparsity pattern (important in many applications, eg solving a nonlinear system using a Newton-type method).
• The pivot sequence can be modified during the factorize phase to ensure stability or it can be fixed (fast factorize option).
• Once the factors are computed, they can be used to solve repeatedly for different right-hand sides b.
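The separation of factorize and solve can be sketched with a small dense LU factorization: the expensive step is done once, and each new right-hand side costs only a forward elimination and a back substitution. This is a dense illustration with hypothetical function names (`lu_factorize`, `lu_solve`) and made-up data; it does not reproduce MA48's interface, its sparse data structures, or its analyse phase.

```python
def lu_factorize(A):
    """LU factorization with partial pivoting.

    Returns (LU, perm): L (unit lower triangular) and U packed into one matrix,
    and perm[k] = index of the original row now in position k.
    """
    n = len(A)
    LU = [row[:] for row in A]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        if LU[p][k] == 0.0:
            raise ValueError("matrix is singular")
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]  # multiplier, stored in place of the zero
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, perm


def lu_solve(LU, perm, b):
    """Solve A x = b using factors from lu_factorize."""
    n = len(LU)
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):  # forward elimination with unit lower triangular L
        for j in range(i):
            y[i] -= LU[i][j] * y[j]
    for i in reversed(range(n)):  # back substitution with U
        for j in range(i + 1, n):
            y[i] -= LU[i][j] * y[j]
        y[i] /= LU[i][i]
    return y


A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
LU, perm = lu_factorize(A)               # factorize once
x1 = lu_solve(LU, perm, [4.0, 10.0, 14.0])  # first right-hand side
x2 = lu_solve(LU, perm, [2.0, 0.0, -4.0])   # second right-hand side, same factors
print(x1, x2)
```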
MA48 results
Results on an SGI Origin (timings in seconds).
Identifier    n         nz           Analyse    Factorize    Fast Factorize    Solve
onetone2      36,057    227,628      10.43      3.47         2.97              0.10
bayer01       57,735    277,774       5.02      1.20         0.65              0.10
lhr71c        70,304    1,528,092    40.26      9.99         7.56              0.39
icomp         75,724    338,711       0.60      0.18         0.13              0.06
Parallel approach
Start by preordering A to Singly Bordered Block Diagonal (SBBD) form
$$
\begin{pmatrix} A_{11} & & & & C_1 \\ & A_{22} & & & C_2 \\ & & \ddots & & \vdots \\ & & & A_{NN} & C_N \end{pmatrix},
$$
where
• $A_{ll}$ are $m_l \times n_l$ matrices with $m_l \ge n_l$
• $C_l$ are $m_l \times k$ with $k \ll n_l$.
[Figure: ayer04 before and after reordering]
• Perform partial LU decomposition of each $(A_{ll}, C_l)$.
• Complete factorization by forming and then factorizing an interface problem
Advantages over designing a general parallel sparse solver:
• Allows us to exploit existing fully tested and developed sophisticated direct solvers.
• Processors are preassigned all the necessary matrix data before the factorization starts.
• Communications only required to send Schur complement matrices to the processor responsible for the interface problem (plus communication of interface data during the solve phase).
• Interface matrix much smaller than the original matrix; factorize using any existing sparse solver.
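The two steps, partial elimination of each bordered block followed by an interface problem, can be sketched on a tiny dense example. Everything in the block below is an assumption made for illustration: the 5 x 5 system with N = 2 blocks and k = 1 border column is made up, the helper names are hypothetical, the right-hand side is carried along as an extra column, and the blocks are processed in a plain loop where a parallel code would assign one block per process. It is not the HSL implementation, which works with sparse factors and must also cope with rank-deficient blocks.

```python
def partial_eliminate(aug, n_l):
    """Eliminate the first n_l columns of an augmented block [A_ll | C_l | b_l].

    Pivots are chosen by row interchange from the first n_l columns only.
    Returns (top, schur): the n_l reduced rows, and the remaining rows with the
    eliminated columns dropped, i.e. rows of [S_l | r_l] for the interface problem.
    """
    M = [row[:] for row in aug]
    m = len(M)
    for c in range(n_l):
        p = max(range(c, m), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, m):
            f = M[i][c] / M[c][c]
            for j in range(c, len(M[i])):
                M[i][j] -= f * M[c][j]
    return M[:n_l], [row[n_l:] for row in M[n_l:]]


def back_substitute(top, n_l, x_iface):
    """Recover the n_l block unknowns once the interface unknowns are known."""
    x = [0.0] * n_l
    for i in reversed(range(n_l)):
        row = top[i]
        s = row[-1]
        s -= sum(row[j] * x[j] for j in range(i + 1, n_l))
        s -= sum(row[n_l + c] * x_iface[c] for c in range(len(x_iface)))
        x[i] = s / row[i]
    return x


k = 1  # number of border (interface) columns
blocks = [
    # rows of [A_ll | C_l | b_l], together with n_l
    ([[2.0, 1.0, 1.0, 9.0], [1.0, 3.0, 0.0, 7.0], [1.0, 1.0, 2.0, 13.0]], 2),
    ([[4.0, 1.0, 1.0, 21.0], [1.0, 3.0, 2.0, 25.0]], 2),
]

reduced, iface = [], []
for aug, n_l in blocks:  # independent per block
    top, schur = partial_eliminate(aug, n_l)
    reduced.append((top, n_l))
    iface.extend(schur)  # Schur complement rows sent to the interface problem

top_i, _ = partial_eliminate(iface, k)  # factorize the small interface problem
x_iface = back_substitute(top_i, k, [])
x_blocks = [back_substitute(top, n_l, x_iface) for top, n_l in reduced]
print(x_blocks, x_iface)
```

Only the Schur complement rows leave each block, which mirrors the limited communication described in the list above.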
Unfortunately, factorization of the rectangular submatrices $(A_{ll}, C_l)$ cannot be performed using an existing direct solver without modifications.
• Have to distinguish between columns of $A_{ll}$ that are candidates for elimination and those belonging to border $C_l$ that must be passed to the interface problem.
• Pivots can ONLY be chosen from $A_{ll}$.
• Must have access to the Schur complement remaining at the end of each partial factorization.
• Submatrix may be rank deficient.
Efficiency
Our new parallel direct solver is called HSL_MP48.
Efficiency of HSL_MP48 depends on:
• SBBD having a small interface.
(Interface problem is solved using a single processor).
• Obtaining good load balance.
Matrix may naturally arise in SBBD form (eg the components of a chemical processing plant) but not in general.
The MONET algorithm of Yifan Hu is very successful (available in HSL 2002 as routine HSL_MC66).
Results
Numerical results for parallel direct solver HSL_MP48.
• 12 processor SGI Origin2000 (Manchester University)
• cpuset facility (exclusive access to the processors and their local memory)
• Fortran 90 compiler in 64 bit mode with optimization flags -O3 -OPT:Olimit=0
• Vendor-supplied BLAS
• SBBD with N = 8 blocks
• All timings are wallclock timings in seconds.
Results (cont.)
Example: bayer01 (chemical process simulation problem)
• n = 57735, nz = 277774.
• Time for Analyse + Factorize + Solve (Speedup)
MA48     HSL_MP48 (p = 1)    p = 2          p = 4          p = 8
6.37     4.23                2.39 (1.8)     1.48 (2.9)     0.97 (4.4)
• Time for interface problem is 0.06.
• Time for Solve (Speedup)
MA48     HSL_MP48 (p = 1)    p = 2          p = 4          p = 8
0.105    0.116               0.082 (1.4)    0.050 (2.3)    0.047 (2.5)
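The bracketed speedups for bayer01 appear to be the p = 1 time of the parallel solver divided by the time on p processors. The short check below reproduces them for the Analyse + Factorize + Solve row; parallel efficiency (speedup divided by p) is an extra quantity computed here for illustration and is not quoted in the talk.

```python
# Wallclock times in seconds for bayer01 (Analyse + Factorize + Solve), keyed by p.
times = {1: 4.23, 2: 2.39, 4: 1.48, 8: 0.97}


def speedups(times):
    """Speedup on p processors relative to the p = 1 run of the same solver."""
    return {p: times[1] / t for p, t in times.items()}


s = speedups(times)
for p in sorted(times):
    efficiency = s[p] / p
    print(f"p = {p}: speedup {s[p]:.1f}, efficiency {efficiency:.2f}")
```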
Results (cont.)
Example: lhr71c (light hydrocarbon recovery problem)
• n = 70304, nz = 1528092.
• Time for Analyse + Factorize + Solve (Speedup)
MA48     HSL_MP48 (p = 1)    p = 2          p = 4          p = 8
50.6     71.2                39.8 (1.8)     22.3 (3.2)     12.4 (5.7)
• Time for interface problem is 0.7.
• Time for Solve (Speedup)
MA48     HSL_MP48 (p = 1)    p = 2          p = 4          p = 8
0.39     0.51                0.32 (1.6)     0.20 (2.6)     0.13 (4.1)
Results (cont.)
Example: 10cols (chemical process simulation problem)
• n = 29496, nz = 109588.
• Time for Analyse + Factorize + Solve (Speedup)
MA48     HSL_MP48 (p = 1)    p = 2          p = 4          p = 8
16.4     2.75                1.60 (1.7)     0.93 (3.0)     0.65 (4.2)
• Time for interface problem is 0.15.
• Flops ($\times 10^5$): MA48 = 1611; HSL_MP48 = 183
• Time for Solve (Speedup)
MA48     HSL_MP48 (p = 1)    p = 2          p = 4          p = 8
0.074    0.034               0.032 (1.6)    0.027 (2.0)    0.023 (2.4)
Concluding remarks
• HSL is an established but continually evolving library of mathematical software.
• Problems users want to solve are always increasing in size.
• Most of these are sparse.
• Thus new techniques/methods continue to be developed.
• New solvers to implement them continue to be written.
• Recently, parallel algorithms are being developed.
We always welcome new applications and test problems.