GPU based cloud computing - Open Grid Forum
GPU based cloud computing<br />
Dairsie Latimer, Petapath, UK<br />
Petapath<br />
© NVIDIA Corporation 2010
About Petapath<br />
! Founded in 2008 to focus on delivering innovative hardware and<br />
software solutions into the high performance computing (HPC) markets<br />
! Partnered with HP and SGI to deliver two Petascale prototype<br />
systems as part of the PRACE WP8 programme<br />
! The system is a testbed for new ideas in usability, scalability and<br />
efficiency of large computer installations<br />
! Active in exploiting emerging standards for acceleration technologies;<br />
a member of the Khronos Group, sitting on the OpenCL working group<br />
! We also provide consulting expertise for companies wishing to explore the<br />
advantages offered by heterogeneous systems<br />
What is Heterogeneous or GPU Computing?<br />
[Diagram: x86 CPU and GPU connected by the PCIe bus]<br />
Computing with CPU + GPU<br />
Heterogeneous Computing
Low Latency or High Throughput?<br />
CPU<br />
! Optimised for low-latency<br />
access to cached data sets<br />
! Control logic for out-of-order<br />
and speculative execution<br />
<strong>GPU</strong><br />
! Optimised for data-parallel,<br />
throughput computation<br />
! Architecture tolerant of<br />
memory latency<br />
! More transistors dedicated to<br />
computation<br />
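One rough way to see why a latency-tolerant, throughput-oriented design needs many threads is Little's law: work in flight = latency × throughput. The sketch below is illustrative only; the cycle counts are hypothetical and do not describe any particular CPU or GPU.

```python
def in_flight_needed(latency_cycles: int, ops_per_cycle: int) -> int:
    """Little's law: operations that must be in flight to keep a unit busy."""
    return latency_cycles * ops_per_cycle


# Hypothetical figures, chosen only to contrast the two regimes.
cached_access = in_flight_needed(latency_cycles=4, ops_per_cycle=1)
dram_access = in_flight_needed(latency_cycles=400, ops_per_cycle=1)

print("in flight to hide a cached access:", cached_access)
print("in flight to hide a DRAM access:  ", dram_access)
```

A CPU shortens the latency term with caches and speculation; a GPU instead supplies the concurrency term with many threads.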
NVIDIA GPU Computing Ecosystem<br />
[Diagram labels: ISV; GPU Architecture; CUDA Training Company; CUDA Development Specialist; TPP / OEM; Hardware Architect; VAR; CUDA SDK & Tools; NVIDIA Hardware Solutions; Customer Requirements; Customer Application; Hardware Architecture; Deployment]
Science is Desperate for Throughput<br />
[Chart: Gigaflops (1; 1,000; 1,000,000 = 1 Petaflop; 1,000,000,000 = 1 Exaflop) against year (1982, 1997, 2003, 2006, 2010, 2012)]<br />
Ran for 8 months to simulate 2 nanoseconds
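To put "8 months to simulate 2 nanoseconds" in perspective, the arithmetic below extrapolates that rate linearly. The 1 microsecond target is a hypothetical chosen for illustration, not a figure from the slide.

```python
def months_needed(target_ns: float, ns_done: float = 2.0, months_taken: float = 8.0) -> float:
    """Linear extrapolation from the quoted rate (2 ns in 8 months)."""
    return target_ns * months_taken / ns_done


target_ns = 1000.0  # hypothetical target: 1 microsecond of simulated time
months = months_needed(target_ns)
print(f"{months:.0f} months, or about {months / 12:.0f} years, at the quoted rate")
```

The same number read the other way is the speedup needed to finish that hypothetical run in a single month.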
Power Crisis in Supercomputing<br />
[Chart: household power equivalent against year (1982, 1996, 2008, 2020): Gigaflop = Block; Teraflop = Neighborhood; Petaflop = Town; Exaflop = City]
Enter the GPU<br />
NVIDIA GPU Product Families<br />
! GeForce®: Entertainment<br />
! Tesla™: High-Performance Computing<br />
! Quadro®: Design & Creation
NEXT-GENERATION <strong>GPU</strong> ARCHITECTURE —<br />
‘FERMI’<br />
Introducing the ‘Fermi’ Tesla Architecture<br />
The Soul of a Supercomputer in the body of a <strong>GPU</strong><br />
! 3 billion transistors<br />
! Up to 2× the cores (C2050 has 448)<br />
! Up to 8× the peak DP performance<br />
! ECC on all memories<br />
! L1 and L2 caches<br />
! Improved memory bandwidth (GDDR5)<br />
! Up to 1 Terabyte of addressable GPU memory<br />
! Concurrent kernels<br />
! Hardware support for C++<br />
Design Goal of Fermi<br />
! Expand performance sweet spot of the GPU<br />
! Bring more users, more applications to the GPU<br />
[Diagram: axes labelled Data Parallel / Instruction Parallel and Many Decisions / Large Data Sets]
Streaming Multiprocessor Architecture<br />
! 32 CUDA cores per SM (512 total)<br />
! 8× peak double precision floating<br />
point performance<br />
! 50% of peak single precision<br />
! Dual Thread Scheduler<br />
! 64 KB of RAM for shared memory<br />
and L1 cache (configurable)<br />
Load/Store Units × 16<br />
Special Func Units × 4<br />
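The core counts above translate into peak throughput with simple arithmetic. The sketch below uses the figures from the slides (32 cores per SM, 512 cores in the full design, 448 on the C2050, double precision at 50% of single precision). The 1.15 GHz clock and the convention of counting one fused multiply-add as two floating-point operations per core per cycle are assumptions, not taken from the slides.

```python
CORES_PER_SM = 32
ASSUMED_CLOCK_GHZ = 1.15   # assumption: not stated in the slides
FLOPS_PER_FMA = 2          # assumption: one FMA counted as two operations


def peak_gflops(cores: int, clock_ghz: float) -> float:
    """Peak single precision GFLOPS if every core issues one FMA per cycle."""
    return cores * clock_ghz * FLOPS_PER_FMA


sms_full = 512 // CORES_PER_SM    # SMs in the full 512-core design
sms_c2050 = 448 // CORES_PER_SM   # SMs enabled on a 448-core C2050

sp = peak_gflops(448, ASSUMED_CLOCK_GHZ)
dp = 0.5 * sp                     # slide: DP is 50% of peak SP

print(f"SMs: {sms_full} (full), {sms_c2050} (C2050)")
print(f"C2050 peak: about {sp:.0f} GFLOPS SP, about {dp:.0f} GFLOPS DP")
```

Under these assumptions the double precision estimate lands a little above 500 GFLOPS, in line with the "500+ Gigaflops" quoted for the C2050 in the product table.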
CUDA Core Architecture<br />
! New IEEE 754-2008 floating-point standard,<br />
surpassing even the most advanced CPUs<br />
! Fused multiply-add (FMA) instruction<br />
for both single and double precision<br />
! New integer ALU optimized for<br />
64-bit and extended precision<br />
operations<br />
FP Unit<br />
INT Unit<br />
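What distinguishes a fused multiply-add from a separate multiply and add is that a*b + c is rounded once rather than twice. The sketch below emulates that on the host in double precision using exact rational arithmetic; it is not GPU code, and the input values are chosen purely to expose the difference.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Multiply, round, then add and round again (two roundings)."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated FMA: compute a*b + c exactly, then round once to a float."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the exact product 1 - 2**-54 is not representable.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

print("unfused:", unfused_multiply_add(a, b, c))
print("fused:  ", fused_multiply_add(a, b, c))
```

With these inputs the unfused product rounds to 1.0 before the addition, so the small term is lost, whereas the single-rounding version keeps it.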
Cached Memory Hierarchy<br />
! First <strong>GPU</strong> architecture to support a true cache<br />
hierarchy in combination with on-chip shared memory<br />
! L1 Cache per SM (32 cores)<br />
! Improves bandwidth and reduces latency<br />
! Unified L2 Cache (768 KB)<br />
! Fast, coherent data sharing across all cores in the <strong>GPU</strong><br />
[Diagram: Parallel DataCache memory hierarchy]
Larger, Faster, Resilient Memory Interface<br />
! GDDR5 memory interface<br />
! 2× signaling speed of GDDR3<br />
! Up to 1 Terabyte of addressable memory attached to the GPU<br />
! Operate on larger data sets (3 and 6 GB cards)<br />
! ECC protection for GDDR5 DRAM<br />
! All major internal memories are ECC protected<br />
! Register file, L1 cache, L2 cache<br />
GigaThread Hardware Thread Scheduler<br />
GigaThread Streaming Data Transfer Engine<br />
! Dual DMA engines<br />
! Simultaneous CPU→GPU and GPU→CPU<br />
data transfer<br />
! Fully overlapped with CPU and <strong>GPU</strong><br />
processing time<br />
! Activity Snapshot:<br />
[Diagram: timeline of Kernel 0 to Kernel 3, each accompanied by transfers on the streaming data transfer engines SDT0 and SDT1]
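The benefit of overlapping transfers with kernels can be estimated with an idealised pipeline model. The sketch below assumes one copy engine per direction, identical chunks, and perfect overlap between stages; the stage times are hypothetical units, not measurements.

```python
def serial_time(n_chunks: int, h2d: int, kernel: int, d2h: int) -> int:
    """No overlap: each chunk is copied in, processed and copied out in turn."""
    return n_chunks * (h2d + kernel + d2h)


def overlapped_time(n_chunks: int, h2d: int, kernel: int, d2h: int) -> int:
    """Idealised three-stage pipeline: after the first chunk fills the
    pipeline, each further chunk costs only the slowest stage."""
    return (h2d + kernel + d2h) + (n_chunks - 1) * max(h2d, kernel, d2h)


# Four chunks, matching Kernel 0 to Kernel 3; stage times are hypothetical.
n, h2d, kernel, d2h = 4, 1, 2, 1
print("serial:    ", serial_time(n, h2d, kernel, d2h))
print("overlapped:", overlapped_time(n, h2d, kernel, d2h))
```

In this model the gain is largest when the three stages take similar time, since the slowest stage sets the steady-state pace.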
Enhanced Software Support<br />
! Many new features in CUDA Toolkit 3.0<br />
! To be released on Friday<br />
! Including early support for the Fermi architecture:<br />
! Native 64-bit <strong>GPU</strong> support<br />
! Multiple Copy Engine support<br />
! ECC reporting<br />
! Concurrent Kernel Execution<br />
! Fermi HW debugging support in cuda-gdb<br />
Enhanced Software Support<br />
! <strong>Open</strong>CL 1.0 Support<br />
! First class language citizen in CUDA Architecture<br />
! Supports ICD (so interoperability between vendors is a possibility)<br />
! Profiling support available<br />
! Debug support coming to Parallel Nsight (NEXUS) soon<br />
! gDebugger CL from graphicREMEDY<br />
! Third party <strong>Open</strong>CL profiler/debugger/memory checker<br />
! Software Tools Ecosystem is starting to grow<br />
! Given boost by existence of <strong>Open</strong>CL<br />
“Oak Ridge National Lab (ORNL) has already announced it<br />
will be using Fermi technology in an upcoming super that is<br />
‘expected to be 10-times more powerful than today's fastest<br />
supercomputer.’<br />
Since ORNL's Jaguar supercomputer, for all intents and<br />
purposes, holds that title, and is in the process of being<br />
upgraded to 2.3 PFlops…<br />
…we can surmise that the upcoming Fermi-equipped super is<br />
going to be in the 20 Petaflops range.”<br />
September 30 2009<br />
NVIDIA TESLA PRODUCTS<br />
Tesla GPU Computing Products: 10 Series<br />
! SuperMicro 1U GPU SuperServer: 2 Tesla GPUs; single precision 1.87 Teraflops; double precision 156 Gigaflops; memory 8 GB (4 GB / GPU)<br />
! Tesla S1070 1U System: 4 Tesla GPUs; single precision 4.14 Teraflops; double precision 346 Gigaflops; memory 16 GB (4 GB / GPU)<br />
! Tesla C1060 Computing Board: 1 Tesla GPU; single precision 933 Gigaflops; double precision 78 Gigaflops; memory 4 GB<br />
! Tesla Personal Supercomputer: 4 Tesla GPUs; single precision 3.7 Teraflops; double precision 312 Gigaflops; memory 16 GB (4 GB / GPU)
Tesla GPU Computing Products: 20 Series<br />
! Tesla S2050 1U System: 4 Tesla GPUs; double precision 2.1 – 2.5 Teraflops; memory 12 GB (3 GB / GPU)<br />
! Tesla S2070 1U System: 4 Tesla GPUs; double precision 2.1 – 2.5 Teraflops; memory 24 GB (6 GB / GPU)<br />
! Tesla C2050 Computing Board: 1 Tesla GPU; double precision 500+ Gigaflops; memory 3 GB<br />
! Tesla C2070 Computing Board: 1 Tesla GPU; double precision 500+ Gigaflops; memory 6 GB
HETEROGENEOUS CLUSTERS<br />
Data Centers: Space and Energy Limited<br />
Traditional Data Center Cluster<br />
! Quad-core CPUs, 8 cores per server<br />
! 2x performance requires 2x number of servers<br />
! 1000s of cores in 1000s of servers<br />
Heterogeneous Data Center Cluster<br />
! 10,000s of cores in 100s of servers<br />
! Augment/replace host servers<br />
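The server counts on this slide follow from cores per server. The sketch below counts cores only and says nothing about per-core performance; the heterogeneous node (8 CPU cores plus 2 GPUs of 448 cores each) is a hypothetical configuration assembled from figures elsewhere in the deck, and the 10,000-core target is likewise an illustrative assumption.

```python
def servers_needed(total_cores: int, cores_per_server: int) -> int:
    """Smallest number of servers providing at least total_cores."""
    return -(-total_cores // cores_per_server)  # ceiling division


TARGET_CORES = 10_000                    # illustrative target
CPU_ONLY_CORES = 8                       # slide: 8 cores per server
HETEROGENEOUS_CORES = 8 + 2 * 448        # hypothetical: 2 GPUs per 1U server

print("CPU-only servers:     ", servers_needed(TARGET_CORES, CPU_ONLY_CORES))
print("heterogeneous servers:", servers_needed(TARGET_CORES, HETEROGENEOUS_CORES))
```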
Cluster Deployment<br />
! There are now a number of GPU-aware cluster management systems<br />
! ActiveEon ProActive Parallel Suite® Version 4.2<br />
! Platform Cluster Manager and HPC Workgroup<br />
! Streamline Computing <strong>GPU</strong> Environment (SCGE)<br />
! Not just installation aids<br />
! i.e. putting the driver and toolkits in the right place<br />
! now starting to provide <strong>GPU</strong> node discovery and job steering<br />
! NVIDIA and Mellanox<br />
! Better interoperability between Mellanox InfiniBand adapters and NVIDIA Tesla GPUs<br />
! Can provide as much as a 30% performance improvement by eliminating<br />
unnecessary data movement in a multi node heterogeneous application<br />
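GPU node discovery and job steering, as mentioned in the list above, can be pictured as matching a job's GPU request against an inventory of nodes. The sketch below is a generic first-fit illustration; the `Node` type, node names and `steer_job` function are hypothetical and do not correspond to the API of any of the products named.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical inventory record produced by a discovery step."""
    name: str
    total_gpus: int
    busy_gpus: int = 0

    @property
    def free_gpus(self) -> int:
        return self.total_gpus - self.busy_gpus


def steer_job(nodes: List[Node], gpus_required: int) -> Optional[Node]:
    """First fit: reserve GPUs on the first node with enough free, else None."""
    for node in nodes:
        if node.free_gpus >= gpus_required:
            node.busy_gpus += gpus_required
            return node
    return None


cluster = [Node("cpu-only", 0), Node("gpu-a", 2, busy_gpus=1), Node("gpu-b", 4)]
chosen = steer_job(cluster, gpus_required=2)
print("2-GPU job placed on:", chosen.name if chosen else "no node available")
```

A production scheduler would also weigh memory, interconnect locality and queue policy; first fit is used here only to keep the idea visible.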
Cluster Deployment<br />
! A number of cluster and distributed debug tools now support<br />
CUDA and NVIDIA Tesla<br />
! Allinea® DDT for NVIDIA CUDA<br />
! Extends well known Distributed Debugging Tool (DDT) with CUDA<br />
support<br />
! TotalView® debugger (part of an Early Experience Program)<br />
! Adds CUDA support; intention to support OpenCL has also been announced<br />
! Both <strong>based</strong> on the Parallel Nsight (NEXUS) Debugging API<br />
NVIDIA Reality Server 3.0<br />
! Cloud <strong>computing</strong> platform for running 3D web applications<br />
! Consists of a Tesla RS GPU-based server cluster running<br />
RealityServer software from mental images<br />
! Deployed in a number of different sizes<br />
! From 2 to 100s of 1U servers<br />
! iray® - Interactive Photorealistic Rendering Technology<br />
! Streams interactive 3D applications to any web connected device<br />
! Designers and architects can now share and visualize complex 3D models<br />
under different lighting and environmental conditions<br />
DISTRIBUTED COMPUTING PROJECTS<br />
Distributed Computing Projects<br />
! Traditional distributed <strong>computing</strong> projects have been<br />
making use of <strong>GPU</strong>s for some time (non-commercial)<br />
! Typically have 1,000s to 10,000s of contributors<br />
! Folding@Home has access to 6.5 PFLOPS of compute<br />
! Of which ~95% comes from <strong>GPU</strong>s or PS3s<br />
! Many are bio-informatics, molecular dynamics<br />
and quantum chemistry codes<br />
! Represent the current sweet spot applications<br />
! Ubiquity of <strong>GPU</strong>s in home systems helps<br />
Distributed Computing Projects<br />
! Folding@Home<br />
! Directed by Prof. Vijay Pande at Stanford University (http://folding.stanford.edu/)<br />
! Most recent <strong>GPU</strong>3 Core <strong>based</strong> on <strong>Open</strong>MM 1.0 (https://simtk.org/home/openmm)<br />
! <strong>Open</strong>MM library provides tools for molecular modeling simulation<br />
! Can be hooked into any MM application, allowing that code to do<br />
molecular modeling with minimal extra effort<br />
! <strong>Open</strong>MM has a strong emphasis on hardware acceleration providing<br />
not just a consistent API, but much greater performance<br />
! Current NVIDIA target is via CUDA Toolkit 2.3<br />
! <strong>Open</strong>MM 1.0 also provides Beta support for <strong>Open</strong>CL<br />
! OpenCL is the long-term convergence software platform<br />
Distributed Computing Projects<br />
! Berkeley <strong>Open</strong> Infrastructure for Network Computing<br />
! BOINC project (http://boinc.berkeley.edu/)<br />
! Platform infrastructure originally evolved from SETI@home<br />
! Many projects use BOINC and several of these have<br />
heterogeneous compute implementations (http://boinc.berkeley.edu/wiki/GPU_computing)<br />
! Examples include:<br />
! <strong>GPU</strong>GRID.net<br />
! SETI@home<br />
! Milkyway@home (IEEE 754 Double precision capable <strong>GPU</strong> required)<br />
! AQUA@home<br />
! Lattice<br />
! Collatz Conjecture<br />
Distributed Computing Projects<br />
! <strong>GPU</strong>GRID.net<br />
! Dr. Gianni De Fabritiis,<br />
Research Group of Biomedical Informatics<br />
University Pompeu Fabra-IMIM, Barcelona<br />
! Uses <strong>GPU</strong>s to deliver high-performance all-atom biomolecular<br />
simulation of proteins using ACEMD (http://multiscalelab.org/acemd)<br />
! ACEMD is a production bio-molecular dynamics code specially optimized to run<br />
on graphics processing units (<strong>GPU</strong>s) from NVIDIA<br />
! It reads CHARMM/NAMD and AMBER input files with a simple and powerful<br />
configuration interface<br />
! A commercial implementation of ACEMD is available from Acellera Ltd<br />
(http://www.acellera.com/acemd/)<br />
! What makes this particularly interesting is that it is implemented using <strong>Open</strong>CL<br />
Distributed Computing Projects<br />
! Have had to use brute force methods to deal with robustness<br />
! Run the same work unit (WU) with multiple users and compare results<br />
! Running on purpose designed heterogeneous grids with ECC<br />
! Means that some of the paranoia can be relaxed<br />
(can at least detect there have been soft errors or WU corruption)<br />
! Results in better throughput on these systems<br />
! But does result in divergence between Consumer and HPC devices<br />
! Should be compensated for by HPC class devices being about 4x faster<br />
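The brute-force robustness check described above (run the same WU with several users and compare results) amounts to a quorum vote. The sketch below is a minimal illustration that compares result digests for exact equality; the function name, the digest strings and the quorum of 2 are hypothetical and not taken from any particular project.

```python
from collections import Counter
from typing import Optional, Sequence


def validate_work_unit(results: Sequence[str], quorum: int = 2) -> Optional[str]:
    """Return the result that at least `quorum` contributors agree on, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None


# Hypothetical digests returned by three contributors for one work unit.
returned = ["a1f3", "a1f3", "9c0e"]
print("accepted result:", validate_work_unit(returned))
```

Every redundant copy of a WU consumes throughput that could have gone to new work, which is why detecting errors on ECC-protected hardware allows the redundancy to be reduced.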
Tesla Bio Workbench<br />
Accelerating New Science<br />
January, 2010<br />
http://www.nvidia.com/bio_workbench<br />
Introducing Tesla Bio WorkBench<br />
! TeraChem, LAMMPS, GPU-AutoDock, MUMmerGPU<br />
! Download, Documentation; Technical papers; Discussion Forums; Benchmarks & Configurations<br />
! Tesla Personal Supercomputer; Tesla GPU Clusters
Tesla Bio Workbench Applications<br />
! AMBER (MD)<br />
! ACEMD (MD)<br />
! GROMACS (MD)<br />
! GROMOS (MD)<br />
! LAMMPS (MD)<br />
! NAMD (MD)<br />
! TeraChem (QC)<br />
! VMD (Visualization MD & QC)<br />
! Docking<br />
! <strong>GPU</strong> AutoDock<br />
! Sequence analysis<br />
! CUDASW++ (Smith-Waterman)<br />
! MUMmer<strong>GPU</strong><br />
! <strong>GPU</strong>-HMMER<br />
! CUDA-MEME Motif Discovery<br />
Recommended Hardware Configurations<br />
Tesla Personal Supercomputer<br />
! Up to 4 Tesla C1060s per workstation<br />
! 4 GB main memory / GPU<br />
Tesla GPU Clusters<br />
! Tesla S1070 1U: 4 GPUs per 1U<br />
! Integrated CPU-GPU Server: 2 GPUs per 1U + 2 CPUs<br />
Specifics at http://www.nvidia.com/bio_workbench
Molecular Dynamics and<br />
Quantum Chemistry Applications<br />
! AMBER (MD)<br />
! ACEMD (MD)<br />
! HOOMD (MD)<br />
! GROMACS (MD)<br />
! LAMMPS (MD)<br />
! NAMD (MD)<br />
! TeraChem (QC)<br />
! VMD (Viz. MD & QC)<br />
! Typical speed ups of 3-8x on a single Tesla C1060 vs a modern 1U server<br />
! Some applications (compute bound) show 20-100x speed ups<br />
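One common way to reason about why whole-application speedups (3-8x) can sit well below the best compute-bound cases (20-100x) is Amdahl's law, which is not stated on the slide and is offered here only as a sketch. The accelerated fractions and kernel speedup below are hypothetical.

```python
def overall_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: speedup of the whole run when only part of it is accelerated."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / kernel_speedup)


# Hypothetical cases: the same 100x kernel speedup, different coverage.
for fraction in (0.80, 0.90, 0.99):
    print(f"{fraction:.0%} accelerated -> {overall_speedup(fraction, 100.0):.1f}x overall")
```

The unaccelerated remainder quickly dominates, so an application approaches the kernel speedup only when nearly all of its run time is spent in the accelerated part.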
Usage of TeraGrid National Supercomputing Grid<br />
[Chart annotation: Half of the cycles]
Summary<br />
! ‘Fermi’ debuts HPC/Enterprise features<br />
! Particularly ECC and high performance double precision<br />
! Software development environments are now more mature<br />
! Significant software ecosystem is starting to emerge<br />
! Broadening availability of development tools, libraries and applications<br />
! Heterogeneous (<strong>GPU</strong>) aware cluster management systems<br />
! Economics, open standards and improving programming<br />
methodologies<br />
! Heterogeneous computing is gradually shedding the long-held perception<br />
that it is just an ‘exotic’ niche technology<br />
Questions?
Supporting Slides
AMBER Molecular Dynamics<br />
! Alpha (now): Generalized Born<br />
! Q1 2010: PME (Particle Mesh Ewald); Beta release<br />
! Q2 2010: Multi-GPU + MPI support; Beta 2 release<br />
Generalized Born Simulations<br />
! Implicit solvent GB results<br />
! 1 Tesla GPU 8x faster than 2 quad-core CPUs<br />
[Chart: speedups of 7x and 8.6x]<br />
More Info<br />
http://www.nvidia.com/object/amber_on_tesla.html<br />
Data courtesy of San Diego Supercomputing Center
GROMACS Molecular Dynamics<br />
! Beta (now): Particle Mesh Ewald (PME); implicit solvent GB; arbitrary forms of nonbonded interactions<br />
! Q2 2010: Multi-GPU + MPI support; Beta 2 release<br />
! PME results<br />
! 1 Tesla GPU 3.5x-4.7x faster than CPU<br />
[Chart: GROMACS on Tesla GPU vs CPU; bars labelled Particle-Mesh-Ewald (PME), Reaction-Field and Cutoffs; values shown 3.5x, 5.2x and 22x]<br />
More Info<br />
http://www.nvidia.com/object/gromacs_on_tesla.html<br />
Data courtesy of Stockholm Center for Biomembrane Research
HOOMD Blue Molecular Dynamics<br />
! Written bottom-up for CUDA<br />
<strong>GPU</strong>s<br />
! Modeled after LAMMPS<br />
! Supports multiple <strong>GPU</strong>s<br />
! 1 Tesla <strong>GPU</strong> outperforms 32<br />
CPUs running LAMMPS<br />
More Info<br />
http://www.nvidia.com/object/hoomd_on_tesla.html<br />
Data courtesy of University of Michigan
LAMMPS: Molecular Dynamics on a <strong>GPU</strong> Cluster<br />
! Available as beta on CUDA<br />
! Cut-off <strong>based</strong> non-bonded terms<br />
! 2 GPUs outperform 24 CPUs<br />
! PME <strong>based</strong> electrostatic<br />
! Preliminary results: 5X speed-up<br />
! Multiple <strong>GPU</strong> + MPI support<br />
enabled<br />
[Chart label: 2 GPUs = 24 CPUs]<br />
More Info<br />
http://www.nvidia.com/object/lammps_on_tesla.html<br />
Data courtesy of Scott Hampton & Pratul K. Agarwal<br />
Oak Ridge National Laboratory
NAMD: Scaling Molecular Dynamics on a <strong>GPU</strong> Cluster<br />
! Feature complete on CUDA :<br />
available in NAMD 2.7 Beta 2<br />
! Full electrostatics with PME<br />
! Multiple time-stepping<br />
! 1-4 Exclusions<br />
! 4 <strong>GPU</strong> Tesla PSC outperforms<br />
8 CPU servers<br />
[Chart label: 4 GPUs = 16 CPUs]<br />
! Scales to a <strong>GPU</strong> cluster<br />
More Info<br />
http://www.nvidia.com/object/namd_on_tesla.html<br />
Data courtesy of Theoretical and Computational Bio-physics Group, UIUC
TeraChem: Quantum Chemistry Package for <strong>GPU</strong>s<br />
! Beta (now): HF, Kohn-Sham, DFT; multiple GPUs supported<br />
! Q1 2010: Full release; MPI support<br />
! First QC SW written ground-up for GPUs<br />
! 4 Tesla GPUs outperform 256 quad-core CPUs<br />
More Info<br />
http://www.nvidia.com/object/terachem_on_tesla.html<br />
VMD: Acceleration using CUDA <strong>GPU</strong>s<br />
! Several CUDA applications in<br />
VMD 1.8.7<br />
! Molecular Orbital Display<br />
! Coulomb-<strong>based</strong> Ion Placement<br />
! Implicit Ligand Sampling<br />
! Speedups : 20x - 100x<br />
! Multiple <strong>GPU</strong> support enabled<br />
More Info<br />
http://www.nvidia.com/object/vmd_on_tesla.html<br />
Images and data courtesy of Beckman Institute for Advanced Science and Technology, UIUC<br />
<strong>GPU</strong>-HMMER: Protein Sequence Alignment<br />
! Protein sequence alignment<br />
using profile HMMs<br />
! Available now<br />
! Supports multiple <strong>GPU</strong>s<br />
! Speedups of 60-100x over CPU<br />
! Download<br />
! http://www.mpihmmer.org/releases.htm<br />
MUMmer<strong>GPU</strong>:<br />
Genome Sequence Alignment<br />
! High-throughput pair-wise<br />
local sequence alignment<br />
! Designed for large sequences<br />
! Drop-in replacement for<br />
“mummer” component in<br />
MUMmer software<br />
! Speedups 3.5x to 3.75x<br />
! Download<br />
! http://mummergpu.sourceforge.net