GPU based cloud computing - Open Grid Forum

GPU based cloud computing

Dairsie Latimer, Petapath, UK

© NVIDIA Corporation 2010


About Petapath

• Founded in 2008 to focus on delivering innovative hardware and software solutions into the high performance computing (HPC) markets
• Partnered with HP and SGI to deliver two Petascale prototype systems as part of the PRACE WP8 programme
  • These systems are a testbed for new ideas in the usability, scalability and efficiency of large computer installations
• Active in exploiting emerging standards for acceleration technologies; a member of the Khronos Group, sitting on the OpenCL working committee
• Also provides consulting expertise for companies wishing to explore the advantages offered by heterogeneous systems


What is Heterogeneous or GPU Computing?

[Diagram: an x86 CPU and a GPU connected by the PCIe bus]

Computing with CPU + GPU: Heterogeneous Computing (a minimal CUDA example follows)
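Below is a minimal sketch of the pattern the diagram describes, assuming nothing beyond the standard CUDA runtime: the host CPU stages data across the PCIe bus, the GPU runs the data-parallel kernel, and the result comes back.

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <cstdlib>

    // Data-parallel work: one GPU thread per element.
    __global__ void vecAdd(const float* a, const float* b, float* c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main()
    {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);
        float *ha = (float*)malloc(bytes), *hb = (float*)malloc(bytes), *hc = (float*)malloc(bytes);
        for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

        float *da, *db, *dc;
        cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
        cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);  // CPU -> GPU over PCIe
        cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

        vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n);    // GPU does the throughput work

        cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);  // GPU -> CPU over PCIe
        printf("c[0] = %f\n", hc[0]);
        cudaFree(da); cudaFree(db); cudaFree(dc);
        free(ha); free(hb); free(hc);
        return 0;
    }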


Low Latency or High Throughput?

CPU
• Optimised for low-latency access to cached data sets
• Control logic for out-of-order and speculative execution

GPU
• Optimised for data-parallel, throughput computation
• Architecture tolerant of memory latency (see the kernel sketch below)
• More transistors dedicated to computation
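A small illustration of the throughput style, not tied to any particular application: a grid-stride SAXPY kernel. With thousands of threads in flight, warps waiting on DRAM are simply swapped out for warps that are ready, which is how the architecture tolerates memory latency.

    // y = a*x + y over n elements; each thread strides across the grid.
    __global__ void saxpy(int n, float a, const float* x, float* y)
    {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x;
             i < n;
             i += blockDim.x * gridDim.x)
            y[i] = a * x[i] + y[i];
    }

    // Launched with enough blocks to oversubscribe the machine, e.g.:
    // saxpy<<<4096, 256>>>(n, 2.0f, d_x, d_y);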


NVIDIA GPU Computing Ecosystem

[Diagram: the ecosystem linking customer requirements to deployment. NVIDIA supplies the GPU architecture, hardware solutions, and the CUDA SDK & tools; ISVs, CUDA training companies, CUDA development specialists, TPPs/OEMs, VARs, and hardware architects build the customer application and hardware architecture around them]


Science is Desperate for Throughput

[Chart: compute demand in Gigaflops, 1982-2012, rising from 1 Gigaflop past 1 Petaflop (1,000,000 Gigaflops) toward 1 Exaflop (1,000,000,000 Gigaflops); one early simulation ran for 8 months to simulate 2 nanoseconds]


Power Crisis in Supercomputing

[Chart: household power equivalent of supercomputers, 1982-2020: a Gigaflop machine draws roughly a block's worth of power, a Teraflop machine a neighborhood's, a Petaflop machine a town's, and an Exaflop machine a city's]


Enter the GPU: NVIDIA GPU Product Families

• GeForce®: Entertainment
• Quadro®: Design & Creation
• Tesla™: High-Performance Computing


NEXT-GENERATION GPU ARCHITECTURE — ‘FERMI’


Introducing the ‘Fermi’ Tesla Architecture

The soul of a supercomputer in the body of a GPU:

• 3 billion transistors
• Up to 2× the cores (the C2050 has 448)
• Up to 8× the peak double-precision performance
• ECC on all memories
• L1 and L2 caches
• Improved memory bandwidth (GDDR5)
• Up to 1 Terabyte of GPU memory
• Concurrent kernels
• Hardware support for C++

(Several of these features can be queried at run time; see the sketch below.)
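A hedged sketch of how an application can detect these features at run time through the standard CUDA runtime API (Fermi parts report compute capability 2.x):

    #include <cuda_runtime.h>
    #include <cstdio>

    int main()
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        printf("%s: compute capability %d.%d, %d multiprocessors\n",
               prop.name, prop.major, prop.minor, prop.multiProcessorCount);
        printf("concurrent kernels: %s\n", prop.concurrentKernels ? "yes" : "no");
        if (prop.major >= 2)
            printf("Fermi-class features (L1/L2 caches, C++ support) available\n");
        return 0;
    }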


Design Goal of Fermi

• Expand the performance sweet spot of the GPU
• Bring more users and more applications to the GPU

[Diagram: the application space spanned by data-parallel vs. instruction-parallel work, and by many decisions vs. large data sets]


Streaming Multiprocessor Architecture

• 32 CUDA cores per SM (512 total)
• 8× peak double-precision floating-point performance
  • 50% of peak single-precision performance
• Dual thread scheduler
• 64 KB of RAM for shared memory and L1 cache (configurable per kernel; see the sketch below)
• Load/store units × 16
• Special function units × 4
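The shared-memory/L1 split is chosen per kernel. A minimal sketch using the standard runtime call (the kernel name here is just a stand-in):

    __global__ void stencilKernel() { /* a kernel that leans on shared memory */ }

    // Host side: request the 48 KB shared / 16 KB L1 split for this kernel...
    cudaFuncSetCacheConfig(stencilKernel, cudaFuncCachePreferShared);
    // ...or favour L1 instead (16 KB shared / 48 KB L1):
    // cudaFuncSetCacheConfig(stencilKernel, cudaFuncCachePreferL1);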


CUDA Core Architecture

• New IEEE 754-2008 floating-point support, surpassing even the most advanced CPUs
• Fused multiply-add (FMA) instruction for both single and double precision (illustrated below)
• New integer ALU optimized for 64-bit and extended-precision operations
• Each core pairs an FP unit with an INT unit
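A one-line device-side illustration: fma() computes a*b + c with a single rounding step, as IEEE 754-2008 specifies, and maps to the hardware FMA instruction.

    // Double precision; the single-precision forms are fmaf(a, x, y)
    // or the intrinsic __fmaf_rn(a, x, y).
    __device__ double axpy_term(double a, double x, double y)
    {
        return fma(a, x, y);   // one rounding step, unlike separate mul + add
    }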


Cached Memory Hierarchy

• First GPU architecture to support a true cache hierarchy in combination with on-chip shared memory (example below)
• L1 cache per SM (32 cores)
  • Improves bandwidth and reduces latency
• Unified L2 cache (768 KB)
  • Fast, coherent data sharing across all cores in the GPU
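A small example of the on-chip shared memory that sits alongside the new caches: a block-level sum reduction staged entirely in shared memory (assumes a power-of-two block size of 256).

    __global__ void blockSum(const float* in, float* out, int n)
    {
        __shared__ float s[256];              // on-chip, per-SM storage
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        s[threadIdx.x] = (i < n) ? in[i] : 0.0f;
        __syncthreads();

        // Tree reduction within the block; all traffic stays on chip.
        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (threadIdx.x < stride)
                s[threadIdx.x] += s[threadIdx.x + stride];
            __syncthreads();
        }
        if (threadIdx.x == 0) out[blockIdx.x] = s[0];
    }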


Larger, Faster, Resilient Memory Interface

• GDDR5 memory interface
  • 2× the signaling speed of GDDR3
• Up to 1 Terabyte of memory attached to the GPU
  • Operate on larger data sets (3 and 6 GB cards)
• ECC protection for the GDDR5 DRAM
• All major internal memories are ECC protected
  • Register file, L1 cache, L2 cache

(ECC state and usable capacity can be queried; see below.)
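A short sketch for checking, before staging a large data set, whether ECC is on and how much device memory is actually free (on Fermi-era Tesla boards, enabling ECC reserves part of the raw GDDR5 capacity):

    size_t freeBytes, totalBytes;
    cudaMemGetInfo(&freeBytes, &totalBytes);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("ECC %s, %zu MB free of %zu MB\n",
           prop.ECCEnabled ? "on" : "off",
           freeBytes >> 20, totalBytes >> 20);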


GigaThread Hardware Thread Scheduler

[Diagram slide]


GigaThread Streaming Data Transfer Engine

• Dual DMA engines
• Simultaneous CPU-to-GPU and GPU-to-CPU data transfer
• Fully overlapped with CPU and GPU processing time (stream sketch below)

[Activity snapshot: kernels 0-3 execute while transfer engines SDT0 and SDT1 stream data in parallel]
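A sketch of the overlap the dual engines enable, assuming device buffers d_a0/d_a1, page-locked host buffers h_a0/h_a1 (allocated with cudaMallocHost, which asynchronous copies require), and a kernel named process; all of these names are hypothetical.

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // Stream 0: upload, compute, download.
    cudaMemcpyAsync(d_a0, h_a0, bytes, cudaMemcpyHostToDevice, s0);
    process<<<grid, block, 0, s0>>>(d_a0);
    cudaMemcpyAsync(h_a0, d_a0, bytes, cudaMemcpyDeviceToHost, s0);

    // Stream 1: the same pipeline; with one DMA engine per direction,
    // its copies can run while stream 0 is computing, and vice versa.
    cudaMemcpyAsync(d_a1, h_a1, bytes, cudaMemcpyHostToDevice, s1);
    process<<<grid, block, 0, s1>>>(d_a1);
    cudaMemcpyAsync(h_a1, d_a1, bytes, cudaMemcpyDeviceToHost, s1);

    cudaStreamSynchronize(s0);
    cudaStreamSynchronize(s1);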


Enhanced Software Support

• Many new features in CUDA Toolkit 3.0
  • To be released on Friday
• Including early support for the Fermi architecture:
  • Native 64-bit GPU support
  • Multiple copy-engine support
  • ECC reporting
  • Concurrent kernel execution (example below)
  • Fermi hardware debugging support in cuda-gdb
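Concurrent kernel execution in its simplest form, sketched with two trivial independent kernels: launched into separate streams, they may execute at the same time on Fermi-class hardware instead of serialising.

    __global__ void kernelA() { /* independent work */ }
    __global__ void kernelB() { /* independent work */ }

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    kernelA<<<16, 128, 0, s0>>>();   // different streams, no dependencies:
    kernelB<<<16, 128, 0, s1>>>();   // eligible to run concurrently
    cudaDeviceSynchronize();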


Enhanced Software Support

• OpenCL 1.0 support
  • A first-class language citizen in the CUDA architecture
  • Supports the ICD (so interoperability between vendors is a possibility)
  • Profiling support available
  • Debug support coming to Parallel Nsight (NEXUS) soon
• gDebugger CL from graphicREMEDY
  • A third-party OpenCL profiler, debugger and memory checker
• The software tools ecosystem is starting to grow
  • Given a boost by the existence of OpenCL


“Oak Ridge National Lab (ORNL) has already announced it will be using Fermi technology in an upcoming super that is ‘expected to be 10-times more powerful than today's fastest supercomputer.’

Since ORNL's Jaguar supercomputer, for all intents and purposes, holds that title, and is in the process of being upgraded to 2.3 PFlops…

…we can surmise that the upcoming Fermi-equipped super is going to be in the 20 Petaflops range.”

September 30, 2009


NVIDIA TESLA PRODUCTS


Tesla GPU Computing Products: 10 Series

                       SuperMicro 1U      Tesla S1070       Tesla C1060       Tesla Personal
                       GPU SuperServer    1U System         Computing Board   Supercomputer
GPUs                   2 Tesla GPUs       4 Tesla GPUs      1 Tesla GPU       4 Tesla GPUs
Single precision       1.87 Teraflops     4.14 Teraflops    933 Gigaflops     3.7 Teraflops
Double precision       156 Gigaflops      346 Gigaflops     78 Gigaflops      312 Gigaflops
Memory                 8 GB (4 GB/GPU)    16 GB (4 GB/GPU)  4 GB              16 GB (4 GB/GPU)


Tesla GPU Computing Products: 20 Series

                       Tesla S2050        Tesla S2070        Tesla C2050       Tesla C2070
                       1U System          1U System          Computing Board   Computing Board
GPUs                   4 Tesla GPUs       4 Tesla GPUs       1 Tesla GPU       1 Tesla GPU
Double precision       2.1-2.5 Teraflops  2.1-2.5 Teraflops  500+ Gigaflops    500+ Gigaflops
Memory                 12 GB (3 GB/GPU)   24 GB (6 GB/GPU)   3 GB              6 GB


HETEROGENEOUS CLUSTERS


Data Centers: Space and Energy Limited

Traditional data center cluster
• Quad-core CPUs: 8 cores per server
• 2× the performance requires 2× the number of servers
• 1000s of cores, 1000s of servers

Heterogeneous data center cluster
• 10,000s of cores, 100s of servers
• Augment or replace the host servers


Cluster Deployment

• There are now a number of GPU-aware cluster management systems
  • ActiveEon ProActive Parallel Suite® Version 4.2
  • Platform Cluster Manager and HPC Workgroup
  • Streamline Computing GPU Environment (SCGE)
• Not just installation aids (i.e. putting the driver and toolkits in the right place); these are starting to provide GPU node discovery and job steering
• NVIDIA and Mellanox
  • Better interoperability between Mellanox InfiniBand adapters and NVIDIA Tesla GPUs
  • Can provide as much as a 30% performance improvement by eliminating unnecessary data movement in a multi-node heterogeneous application


Cluster Deployment

• A number of cluster and distributed debugging tools now support CUDA and NVIDIA Tesla
  • Allinea® DDT for NVIDIA CUDA: extends the well-known Distributed Debugging Tool (DDT) with CUDA support
  • TotalView® debugger (part of an Early Experience Program): extends the debugger with CUDA support; TotalView has also announced intentions to support OpenCL
• Both are based on the Parallel Nsight (NEXUS) debugging API


NVIDIA RealityServer 3.0

• A cloud computing platform for running 3D web applications
• Consists of a Tesla RS GPU-based server cluster running RealityServer software from mental images
• Deployed in a number of different sizes, from 2 to 100s of 1U servers
• iray®: Interactive Photorealistic Rendering Technology
• Streams interactive 3D applications to any web-connected device
• Designers and architects can now share and visualize complex 3D models under different lighting and environmental conditions


DISTRIBUTED COMPUTING PROJECTS


Distributed Computing Projects

• Traditional (non-commercial) distributed computing projects have been making use of GPUs for some time
• Typically have thousands to tens of thousands of contributors
• Folding@Home has access to 6.5 PFLOPS of compute
  • Of which ~95% comes from GPUs or PS3s
• Many are bio-informatics, molecular dynamics and quantum chemistry codes
  • These represent the current sweet-spot applications (a toy kernel in this style is sketched below)
  • The ubiquity of GPUs in home systems helps
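Illustrative only, and emphatically not the Folding@Home GPU core: a toy all-pairs Lennard-Jones force kernel (with ε = σ = 1 and a small softening term, both assumptions) shows why molecular dynamics suits GPUs so well. O(N²) arithmetic over a compact working set keeps every core busy.

    __global__ void ljForces(const float4* pos, float4* force, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        float4 pi = pos[i];
        float3 f = make_float3(0.0f, 0.0f, 0.0f);
        for (int j = 0; j < n; ++j) {
            if (j == i) continue;
            float4 pj = pos[j];
            float dx = pi.x - pj.x, dy = pi.y - pj.y, dz = pi.z - pj.z;
            float r2   = dx*dx + dy*dy + dz*dz + 1e-6f;  // softened distance^2
            float inv2 = 1.0f / r2;
            float inv6 = inv2 * inv2 * inv2;
            // Lennard-Jones force magnitude divided by r (eps = sigma = 1):
            float s = 24.0f * inv2 * inv6 * (2.0f * inv6 - 1.0f);
            f.x += s * dx; f.y += s * dy; f.z += s * dz;
        }
        force[i] = make_float4(f.x, f.y, f.z, 0.0f);
    }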


Distributed Computing Projects

• Folding@Home
  • Directed by Prof. Vijay Pande at Stanford University (http://folding.stanford.edu/)
  • The most recent GPU3 core is based on OpenMM 1.0 (https://simtk.org/home/openmm)
• The OpenMM library provides tools for molecular modeling simulation
  • It can be hooked into any MM application, allowing that code to do molecular modeling with minimal extra effort
  • OpenMM has a strong emphasis on hardware acceleration, providing not just a consistent API but much greater performance
• The current NVIDIA target is via CUDA Toolkit 2.3
  • OpenMM 1.0 also provides beta support for OpenCL
  • OpenCL is the long-term convergence software platform


Distributed Computing Projects

• Berkeley Open Infrastructure for Network Computing (BOINC)
  • BOINC project (http://boinc.berkeley.edu/)
  • The platform infrastructure originally evolved from SETI@home
• Many projects use BOINC, and several of them have heterogeneous compute implementations (http://boinc.berkeley.edu/wiki/GPU_computing)
• Examples include:
  • GPUGRID.net
  • SETI@home
  • Milkyway@home (requires an IEEE 754 double-precision-capable GPU)
  • AQUA@home
  • Lattice
  • Collatz Conjecture


Distributed Computing Projects

• GPUGRID.net
  • Dr. Gianni De Fabritiis, Research Group of Biomedical Informatics, University Pompeu Fabra-IMIM, Barcelona
  • Uses GPUs to deliver high-performance all-atom biomolecular simulation of proteins using ACEMD (http://multiscalelab.org/acemd)
• ACEMD is a production bio-molecular dynamics code specially optimized to run on graphics processing units (GPUs) from NVIDIA
  • It reads CHARMM/NAMD and AMBER input files with a simple and powerful configuration interface
  • A commercial implementation of ACEMD is available from Acellera Ltd (http://www.acellera.com/acemd/)
  • What makes this particularly interesting is that it is implemented using OpenCL


Distributed Computing Projects

• Projects have had to use brute-force methods to deal with robustness
  • Run the same work unit (WU) with multiple users and compare the results (sketched below)
• Running on purpose-designed heterogeneous grids with ECC means some of the paranoia can be relaxed
  • Soft errors or WU corruption can at least be detected
  • This results in better throughput on these systems
• But it does result in divergence between consumer and HPC devices
  • This should be compensated for by HPC-class devices being about 4× faster
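A trivial host-side sketch of that brute-force check, with hypothetical result buffers: two independent runs of the same work unit are compared bit-for-bit, and any mismatch flags a soft error or corrupted WU that non-ECC consumer hardware could not report on its own.

    #include <cstring>

    // r1 and r2 hold the outputs of two independent runs of one work unit.
    bool workUnitAgrees(const float* r1, const float* r2, size_t n)
    {
        return std::memcmp(r1, r2, n * sizeof(float)) == 0;  // bitwise identical?
    }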


Tesla Bio Workbench

Accelerating New Science

January 2010

http://www.nvidia.com/bio_workbench


Introducing Tesla Bio WorkBench

[Diagram: applications (TeraChem, LAMMPS, GPU-AutoDock, MUMmerGPU) backed by community resources (downloads and documentation, technical papers, discussion forums, benchmarks & configurations), running on Tesla Personal Supercomputers and Tesla GPU clusters]


Tesla Bio Workbench Applications

Molecular dynamics and quantum chemistry
• AMBER (MD)
• ACEMD (MD)
• GROMACS (MD)
• GROMOS (MD)
• LAMMPS (MD)
• NAMD (MD)
• TeraChem (QC)
• VMD (visualization of MD & QC)

Docking
• GPU AutoDock

Sequence analysis
• CUDASW++ (Smith-Waterman)
• MUMmerGPU
• GPU-HMMER
• CUDA-MEME motif discovery


Recommended Hardware Configurations

Tesla Personal Supercomputer
• Up to 4 Tesla C1060s per workstation
• 4 GB main memory per GPU

Tesla GPU clusters
• Tesla S1070 1U: 4 GPUs per 1U
• Integrated CPU-GPU server: 2 GPUs per 1U + 2 CPUs

Specifics at http://www.nvidia.com/bio_workbench




Molecular Dynamics and Quantum Chemistry Applications

• AMBER (MD)
• ACEMD (MD)
• HOOMD (MD)
• GROMACS (MD)
• LAMMPS (MD)
• NAMD (MD)
• TeraChem (QC)
• VMD (viz. of MD & QC)

• Typical speed-ups of 3-8× on a single Tesla C1060 vs. a modern 1U server
• Some (compute-bound) applications show 20-100× speed-ups


Usage of TeraGrid National Supercomputing Grid

[Chart: these molecular dynamics and quantum chemistry applications account for half of the cycles]




Summary

• ‘Fermi’ debuts HPC/enterprise features
  • Particularly ECC and high-performance double precision
• Software development environments are now more mature
• A significant software ecosystem is starting to emerge
  • Broadening availability of development tools, libraries and applications
  • Heterogeneous (GPU-aware) cluster management systems
• Economics, open standards and improving programming methodologies
  • Heterogeneous computing is gradually changing the long-held perception that it is just an ‘exotic’ niche technology


Questions?


Supporting Slides


AMBER Molecular Dynamics

Roadmap
• Alpha (now): Generalized Born
• Q1 2010: PME (Particle Mesh Ewald); Beta release
• Q2 2010: Multi-GPU + MPI support; Beta 2 release

Generalized Born simulations
• Implicit solvent GB results
• 1 Tesla GPU is 8× faster than 2 quad-core CPUs

[Chart: 7× and 8.6× speed-ups]

More info: http://www.nvidia.com/object/amber_on_tesla.html
Data courtesy of the San Diego Supercomputing Center


GROMACS Molecular Dynamics

Roadmap
• Beta (now): Particle Mesh Ewald (PME); implicit solvent GB; arbitrary forms of non-bonded interactions
• Q2 2010: Multi-GPU + MPI support; Beta 2 release

PME results
• 1 Tesla GPU is 3.5×-4.7× faster than a CPU

[Chart: GROMACS on a Tesla GPU vs. CPU: 3.5× with Particle Mesh Ewald (PME), 5.2× with reaction-field, 22× with cutoffs]

More info: http://www.nvidia.com/object/gromacs_on_tesla.html
Data courtesy of the Stockholm Center for Biomembrane Research


HOOMD Blue Molecular Dynamics

• Written bottom-up for CUDA GPUs
  • Modeled after LAMMPS
• Supports multiple GPUs
• 1 Tesla GPU outperforms 32 CPUs running LAMMPS

More info: http://www.nvidia.com/object/hoomd_on_tesla.html
Data courtesy of the University of Michigan


LAMMPS: Molecular Dynamics on a GPU Cluster

• Available as a beta on CUDA
• Cut-off based non-bonded terms
  • 2 GPUs outperform 24 CPUs
• PME-based electrostatics
  • Preliminary results: 5× speed-up
• Multiple GPU + MPI support enabled

[Chart: 2 GPUs = 24 CPUs]

More info: http://www.nvidia.com/object/lammps_on_tesla.html
Data courtesy of Scott Hampton & Pratul K. Agarwal, Oak Ridge National Laboratory


NAMD: Scaling Molecular Dynamics on a GPU Cluster

• Feature complete on CUDA; available in NAMD 2.7 Beta 2
  • Full electrostatics with PME
  • Multiple time-stepping
  • 1-4 exclusions
• A 4-GPU Tesla Personal Supercomputer outperforms 8 CPU servers
  • 4 GPUs = 16 CPUs
• Scales to a GPU cluster

More info: http://www.nvidia.com/object/namd_on_tesla.html
Data courtesy of the Theoretical and Computational Biophysics Group, UIUC


TeraChem: Quantum Chemistry Package for GPUs

Roadmap
• Beta (now): HF, Kohn-Sham DFT; multiple GPUs supported
• Q1 2010: full release; MPI support

• The first QC software written ground-up for GPUs
• 4 Tesla GPUs outperform 256 quad-core CPUs

More info: http://www.nvidia.com/object/terachem_on_tesla.html


VMD: Acceleration using CUDA GPUs

• Several CUDA applications in VMD 1.8.7
  • Molecular orbital display
  • Coulomb-based ion placement
  • Implicit ligand sampling
• Speed-ups: 20×-100×
• Multiple GPU support enabled

More info: http://www.nvidia.com/object/vmd_on_tesla.html
Images and data courtesy of the Beckman Institute for Advanced Science and Technology, UIUC


GPU-HMMER: Protein Sequence Alignment

• Protein sequence alignment using profile HMMs
• Available now
• Supports multiple GPUs
• Speed-ups range from 60× to 100× over a CPU

[Chart: GPU vs. CPU runtimes]

Download: http://www.mpihmmer.org/releases.htm


MUMmerGPU: Genome Sequence Alignment

• High-throughput pair-wise local sequence alignment
• Designed for large sequences
• A drop-in replacement for the “mummer” component in the MUMmer software
• Speed-ups of 3.5× to 3.75×

Download: http://mummergpu.sourceforge.net
