Dan Werthimer - CASPER - University of California, Berkeley
Collaboration for Radio<br />
Astronomy Instrumentation<br />
<strong>Dan</strong> <strong>Werthimer</strong> and <strong>CASPER</strong> Collaborators<br />
http://casper.berkeley.edu
<strong>CASPER</strong><br />
Collaboration for Radio Astronomy<br />
Signal Processing and Electronics Research<br />
Collaborators<br />
Xilinx, Fujitsu, HP, Sun/Oracle, Nvidia, NSF, NASA, NRAO, NAIC,<br />
CFA (Harvard/Smithsonian), Haystack (MIT), Caltech, Cornell, CSIRO/ATNF,<br />
JPL/DSN, South Africa KAT, Manchester/Jodrell Bank, GMRT (India),<br />
Oxford, Bologna, Metsahovi Observatory/Helsinki <strong>University</strong>,<br />
<strong>University</strong> <strong>of</strong> <strong>California</strong>, <strong>Berkeley</strong>; Swinburne <strong>University</strong> (Australia),<br />
SETI Institute, <strong>University</strong> <strong>of</strong> <strong>California</strong>, Santa Barbara;<br />
<strong>University</strong> <strong>of</strong> <strong>California</strong>, Los Angeles; CNRS (France), <strong>University</strong> <strong>of</strong> Maryland,<br />
Nancay Observatory, University <strong>of</strong> Cape Town (South Africa),<br />
ASTRON (Netherlands), Academia Sinica (Taiwan), Cambridge,<br />
Brigham Young <strong>University</strong>, Rhodes <strong>University</strong> (South Africa)
The Problem with the Traditional<br />
Hardware Development Model<br />
• Takes 5 to 10 years<br />
• Cost Dominated by NRE because <strong>of</strong><br />
custom Boards, Backplanes, Protocols<br />
• Antiquated by the time it’s released.<br />
• How to buy the hardware at the last<br />
minute<br />
• Each observatory designs from scratch
Solution:<br />
• Modular General Purpose Hardware<br />
– Low number <strong>of</strong> board designs<br />
– Can be upgraded piecemeal or all together<br />
– Reusable<br />
– Standard signal processing model which<br />
is consistent between upgrades.
<strong>CASPER</strong> Real-time Signal Processing Instrumentation<br />
• Low NRE, shared by the community<br />
• Rapid development<br />
• Open-source, collaborative<br />
• Reusable, platform-independent gateware<br />
• Modular, upgradeable hardware<br />
• Industry standard communication protocols<br />
• Use switches to solve correlator interconnect<br />
• Low Cost
Collaboration (not turn key instruments)<br />
• Share Open Source Libraries<br />
• Workshops (Tamara)<br />
• Videos and Docs on Tool Flow, Libraries<br />
• Wiki, Mailing List<br />
• Open Source Boards (available from vendors)
Tutorials<br />
• 1: Introduction to Simulink, Roach and BORPH<br />
• 2: 10GbE<br />
• 3: Basic spectrometer (400MHz, 2k channels)<br />
• 4: 4-input pocket correlator (400MHz, 1k ch)<br />
• 5: ADC ROACH CPU/GPU<br />
• 6: GPU tutorials (Richard Edgar, David Kirk)<br />
Jason Manley, Terry Filiba, Mark Wagner,<br />
Wesley New, Andrew Marten, Danny Price,<br />
Jack Hickish, Griffin Foster
Roach Motel (Roach Nest) (KAT)
Current <strong>CASPER</strong> ADC Boards<br />
ADC2x1000-8 (dual 1GSa/sec, single 2Gsps, 8 bit)<br />
ADC1x3000-8 (3GSa/sec, 8 bit; 6Gsps interleaved)<br />
64ADCx64-12 (64x 64MSa/sec, 12 bit)<br />
ADC4x250-8 (quad 250MSa/sec, 8 bit)<br />
katADC (dual 1.5GSa/sec, 8 bit, with gain, atten, synth)<br />
ADC2x550-12 (dual 550 Msps, 12 bit)<br />
ADC2x400-14 (dual 400 Msps, 14 bit)<br />
ADC1x5000-8 (1x5Gsps, 2x2.5Gsps, 4x1.25Gsps – Taiwan)<br />
ADC1x1000-12 (optically isolated 12 bit 1Gsps – JPL)
Upcoming Hardware<br />
• Roach II (Virtex 6 – South Africa team)<br />
• Rhino (Spartan 6, ARM CPU, FMC connect)<br />
• Roach III (Virtex 7)<br />
• 20 to 26 Gsps ADC board
Board Interconnect - Upgradable<br />
• Problem: Backplanes are short lived<br />
(S100, Multibus, VME, ISA, EISA, PCI, PCIx, PCIe,<br />
compactPCI, compactPCIe, ATCA…)<br />
• Solution: Use 10Gbit Ethernet<br />
(10Gbe, Infiniband, Myrinet, Xaui, Aurora)<br />
Copper CX4 (40 meters max) or Optical
Beowulf Cluster Like General Purpose Architecture<br />
Dynamic Allocation <strong>of</strong> Resources, need not be FPGA based<br />
[Diagram labels: Polyphase Filter Banks (ADC, PFB, FPGA DSP Module); Commercial off-the-shelf Multicast 10 Gbps (10GE or InfiniBand) Switch; Reconfigurable Compute Cluster (FPGA DSP Modules: Correlator, Beamformers/Spectrometers, Pulsar timer); General-purpose CPUs]
BORPH Operating System – Hayden So<br />
• An extended version <strong>of</strong> the Linux operating system<br />
– Treats FPGAs = CPUs<br />
• FPGA applications execute as hardware processes<br />
• HW/SW communication<br />
– UNIX file I/O<br />
• Benefits<br />
– Easy to understand for novice/experienced users<br />
– Remote control+monitor<br />
[Diagram labels: Software (SW processes), Hardware (HW processes on FPGA), User Library, Hardware User Library, BORPH Kernel, Device Driver, Hardware Platform (Network, UART, HD…); interfaces: file, pipe, socket, IPC, ioreg]<br />
Poster Session 3 P3_09 (11am): File System Access From Reconfigurable FPGA Hardware Processes in BORPH
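The "HW/SW communication through UNIX file I/O" idea can be sketched in Python as follows. This is an illustration only: the slide does not give the path under which BORPH exposes a register, so a temporary file stands in for it, and the register name `acc_len`, the helper names, and the 4-byte big-endian word layout are all assumptions.

```python
import os
import struct
import tempfile

def write_reg(path, value):
    """Write a 32-bit unsigned value to a register exposed as a file."""
    with open(path, "wb") as f:
        # Assumption: the register is one 4-byte big-endian word.
        f.write(struct.pack(">I", value))

def read_reg(path):
    """Read a 32-bit unsigned value back from the register file."""
    with open(path, "rb") as f:
        return struct.unpack(">I", f.read(4))[0]

# A temporary file stands in for a real register file so the sketch
# runs on any machine; "acc_len" is a hypothetical register name.
demo_dir = tempfile.mkdtemp()
demo_reg = os.path.join(demo_dir, "acc_len")

write_reg(demo_reg, 1024)
print(read_reg(demo_reg))
```

The point of the design is that ordinary tools (shell redirection, scripts, remote shells) can control and monitor a hardware process without a special driver API.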
Simulink-based Design Tool Flow<br />
• Simulink Xilinx System Generator Library<br />
• Custom BEE2 Library Blocksets<br />
• Software programmable registers<br />
• BEE Platform Studio
FFT controls<br />
Simulink Library – Aaron Parsons, David MacMahon<br />
Verilog Library – Jeff Mock<br />
• Transform length<br />
• Bandwidth<br />
• Complex or Real<br />
• Number <strong>of</strong> Polarizations<br />
• Input bit width and output bit width<br />
• Twiddle coefficient bit width<br />
• Run-time programmable down-shifting<br />
• Decimate option
PFB vs. FFT
Digital Down-Converter<br />
• Selectable # <strong>of</strong> FIR taps<br />
• On-the-fly programmable mix frequency<br />
• Selectable FIR coeff<br />
• Agile sub-band selection.
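A digital down-converter of the kind listed above can be sketched in a few lines of Python: mix with a complex exponential, low-pass filter with an FIR, then decimate. This is not the CASPER library block; the function name, the boxcar coefficients, and the sample rate and tone frequency in the demo are assumptions chosen for illustration.

```python
import cmath
import math

def ddc(samples, mix_freq, sample_rate, fir_coeffs, decimation):
    """Mix to baseband, FIR low-pass filter, and decimate."""
    # 1. Mix: multiply by a complex exponential at -mix_freq.
    mixed = [
        x * cmath.exp(-2j * math.pi * mix_freq * n / sample_rate)
        for n, x in enumerate(samples)
    ]
    # 2. Direct-form FIR, 3. computed only at every `decimation`-th sample.
    ntaps = len(fir_coeffs)
    out = []
    for n in range(ntaps - 1, len(mixed), decimation):
        out.append(sum(fir_coeffs[k] * mixed[n - k] for k in range(ntaps)))
    return out

# Demo: a real 100 MHz tone sampled at 800 MHz, mixed down by 100 MHz.
fs = 800e6
f0 = 100e6
tone = [math.cos(2 * math.pi * f0 * n / fs) for n in range(256)]
taps = [1.0 / 8] * 8  # 8-tap boxcar as a stand-in for real FIR coefficients
baseband = ddc(tone, f0, fs, taps, 8)
print(len(baseband), abs(baseband[0]))
```

After mixing, the tone sits at DC with amplitude 0.5 and its image sits at -200 MHz; the 8-tap boxcar averages the image over exactly two periods, so only the DC term survives.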
X-Engine Correlation Architecture<br />
(Lynn Urry, Aaron Parsons)
Hardware and Software Libraries
Applications<br />
• VLBI Mark 5B data recorder – Haystack, NRAO – 512 MHz<br />
• Beamforming – ATA, SMA<br />
• SETI – Arecibo (UCB)<br />
JPL/UCB DSN (Preston, Gulkis, Levin, Jones)<br />
• Correlators and Imagers:<br />
ATA (Aaron Parsons, Mel Wright)<br />
PAPER (Reionization Experiment)<br />
Carma Next Gen<br />
MeerKAT/SKA South Africa<br />
GMRT next gen correlator<br />
Bologna (SKA), FASR<br />
• Pulsar Timing and Searching, Transients<br />
Green Bank, Allen Telescope Array, VLA,<br />
Swinburne (Parkes), MeerKAT, Nancay
SETI Spectrometers<br />
• Parkes Southern SERENDIP<br />
• ALFA SETI Sky Survey (300 MHz x 7 beams)<br />
• JPL DSN Sky Survey (eventually 20 GHz bandwidth)<br />
Radio Astronomy Spectrometers<br />
• GALFA Spectrometer – Arecibo Multibeam Hydrogen Survey<br />
• Astronomy Signal Processor – ASP – Don Backer, Ingrid<br />
Stairs, et al. (pulsars)<br />
• Antenna Holography, ATNF, China<br />
• Gavert (DSN education, outreach) – 8 GHz BW – G. Jones<br />
• CMB Bolometer Readout – Caltech, UCB<br />
• Fast Readout Spectrometers (Parkes, NRAO, ATA...)
ATA Fly’s Eye Transient Instrument<br />
44 fast readout spectrometers<br />
3 weeks to build<br />
Geoff Bower, Jim Cordes, Griffin<br />
Foster, Joeri van Leeuwen, Peter<br />
McMahon, Andrew Siemion, Mark<br />
Wagner, <strong>Dan</strong> <strong>Werthimer</strong>
Undergraduate Radio Astronomy Course
4096 channel Mars spectrometer<br />
“Chip in a day” FPGA to ASIC
<strong>CASPER</strong> Correlator Collaboration<br />
Allen Telescope Array (90 uS imaging – G. Jones)<br />
PAPER (Epoch <strong>of</strong> Reionization)<br />
Carma Next Generation<br />
MeerKAT/SKA South Africa<br />
GMRT next gen<br />
Bologna<br />
ISI (Infrared) – 6 Gsps<br />
SKADS (Oxford)<br />
SMA next gen (CFA, ASIAA)<br />
FASR, Baryon Acoustic Oscillation
<strong>CASPER</strong> FX Architecture<br />
[Diagram: F Engines 0 to N-1 and X Engines 0 to N-1, interconnected through a 10GbE Switch]
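The FX split can be sketched in plain Python: an F stage that channelizes each antenna's samples and an X stage that cross-multiplies every antenna pair per channel and accumulates. This is a toy model, not CASPER gateware; the function names are hypothetical, a direct DFT stands in for the FFT/PFB, and the switch that routes channels from F engines to X engines is not modeled.

```python
import cmath
import math

def f_engine(block):
    """F stage: channelize one block of samples with a direct DFT."""
    n = len(block)
    return [
        sum(block[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def x_engine(spectra):
    """X stage: cross-multiply every antenna pair, channel by channel."""
    nant = len(spectra)
    nchan = len(spectra[0])
    return {
        (i, j): [spectra[i][c] * spectra[j][c].conjugate() for c in range(nchan)]
        for i in range(nant)
        for j in range(i, nant)
    }

def fx_correlate(antenna_blocks):
    """antenna_blocks[a][b] is block b of samples from antenna a."""
    acc = None
    for b in range(len(antenna_blocks[0])):
        vis = x_engine([f_engine(ant[b]) for ant in antenna_blocks])
        if acc is None:
            acc = vis
        else:
            for key in acc:
                acc[key] = [a + v for a, v in zip(acc[key], vis[key])]
    return acc

# Demo: a tone in channel 2 of 8; antenna 1 sees it with a 90 degree phase lead.
tone = [cmath.exp(2j * math.pi * 2 * t / 8) for t in range(8)]
ant0 = [tone, tone]
ant1 = [[1j * s for s in tone]] * 2
vis = fx_correlate([ant0, ant1])
print(cmath.phase(vis[(0, 1)][2]))
```

The cross-correlation phase in channel 2 recovers the phase difference between the two inputs (with a minus sign, since antenna 1 is conjugated).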
<strong>CASPER</strong> FXB Correlator/Beamformer<br />
(correlator needed to calibrate beamformer)<br />
[Diagram: F Engines 0 to N-1 and X Engines 0 to N-1, interconnected through a 10GbE Switch]
Correlators and Beamformers<br />
• Locally Synchronous, Globally Asynchronous (like a computer cluster)<br />
• Data is time stamped with 1 PPS at ADC<br />
• Solve the correlator/beamformer interconnect<br />
problem by using 10 GbE switches<br />
(for both interconnect and fast readout)<br />
• No need for high density complex boards<br />
• Use FIFOs to align data before correlation or<br />
beamforming…
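The timestamp-plus-FIFO alignment described above can be sketched as follows. This is a software illustration under assumed conventions: each stream is a FIFO of `(timestamp, payload)` tuples in increasing timestamp order, and the function name and packet labels are hypothetical.

```python
from collections import deque

def align_streams(fifos):
    """Pop one packet per stream, all carrying the same timestamp.

    Packets older than the newest head-of-line timestamp are dropped.
    Returns (timestamp, [payloads]) or None if any stream runs dry.
    """
    while all(fifos):
        newest_head = max(f[0][0] for f in fifos)
        # Discard packets that can no longer be matched.
        for f in fifos:
            while f and f[0][0] < newest_head:
                f.popleft()
        if all(fifos) and all(f[0][0] == newest_head for f in fifos):
            return newest_head, [f.popleft()[1] for f in fifos]
    return None

# Demo: three streams arriving out of step, one with a missing packet.
a = deque([(0, "a0"), (1, "a1"), (2, "a2")])
b = deque([(1, "b1"), (2, "b2")])
c = deque([(0, "c0"), (2, "c2")])
first = align_streams([a, b, c])
print(first)
```

Because alignment relies only on the timestamps, the boards need no shared clock distribution beyond the 1 PPS reference used for stamping.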
Correlator Comparison - 2009 benchmarks<br />
GPU: 1.3 MHz per GPU, Greenhill et al, http://www.scigpu.org<br />
CPU: 2.0 MHz per 8 core computer, Roy, Gupta, et al, optimized code<br />
FPGA: 10.4 MHz per FPGA (<strong>CASPER</strong> : Xilinx XC5VSX95T roach boards)<br />
Power (critical for SKA)<br />
GPU: 150 watts/MHz (including GPU, CPU/motherboard, P.S.)<br />
CPU: 150 watts/MHz<br />
FPGA: 6 watts per MHz (including digitizers, P.S., CPU, motherboard)<br />
ASIC: 2 watts per MHz (estimate)<br />
(For 32 antenna, dual polarization, 500 MHz correlator)
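As a worked example, the benchmark figures can be scaled to the 500 MHz correlator the slide refers to. The pairing of each figure with its platform (GPU 1.3 MHz and 150 W/MHz, CPU 2.0 MHz and 150 W/MHz, FPGA 10.4 MHz and 6 W/MHz, ASIC 2 W/MHz) is a reading of the slide layout, and the function names are hypothetical.

```python
import math

BANDWIDTH_MHZ = 500  # 32 antenna, dual polarization correlator

# Correlated bandwidth handled by one device (2009 benchmarks).
mhz_per_device = {"GPU": 1.3, "CPU": 2.0, "FPGA": 10.4}

# Power per MHz of correlated bandwidth.
watts_per_mhz = {"GPU": 150, "CPU": 150, "FPGA": 6, "ASIC": 2}

def devices_needed(platform):
    """Whole devices required to cover the full bandwidth."""
    return math.ceil(BANDWIDTH_MHZ / mhz_per_device[platform])

def total_power_kw(platform):
    """Total power in kilowatts for the full bandwidth."""
    return watts_per_mhz[platform] * BANDWIDTH_MHZ / 1000

for name in mhz_per_device:
    print(name, devices_needed(name), "devices")
for name in watts_per_mhz:
    print(name, total_power_kw(name), "kW")
```

Under this reading, the FPGA system needs roughly an eighth as many devices as the GPU system and about one twenty-fifth of the power.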
Packetized FPGA FX Correlator Cost<br />
(assuming non hierarchical)<br />
Cost (2010) = Bandwidth/GHz ( $1200 N + $5 N^2 )<br />
2010 $128M for 4,000 antenna, dual pol, 1 GHz bandwidth<br />
2012 $ 64M<br />
2014 $ 32M<br />
2016 $ 16M<br />
2018 $ 8M<br />
2020 $ 4M<br />
2022 $ 2M<br />
• N = number <strong>of</strong> dual polarization receivers (full stokes)<br />
• Moore’s Law Foundry Prediction through 22nm<br />
• ITRS roadmap predicts slow down (but they always do)
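A minimal Python sketch of the cost model. The function names are hypothetical, and the halving every two years is read off the table rather than stated as a rule on the slide. Evaluated as printed for N = 4000 and 1 GHz, the formula gives about $84.8M rather than the $128M in the table; the slide does not explain the difference, so the sketch keeps the formula and the projection separate.

```python
def correlator_cost_2010(n_receivers, bandwidth_ghz):
    """Cost (2010 dollars) = Bandwidth/GHz * ($1200 N + $5 N^2)."""
    return bandwidth_ghz * (1200 * n_receivers + 5 * n_receivers ** 2)

def projected_cost(cost_2010, year):
    """Assumption: cost halves every two years after 2010, as in the table."""
    return cost_2010 / 2 ** ((year - 2010) / 2)

# Formula as printed, for 4,000 dual polarization receivers and 1 GHz.
print(correlator_cost_2010(4000, 1))

# Projection starting from the table's 2010 figure of $128M.
for year in range(2010, 2024, 2):
    print(year, projected_cost(128e6, year) / 1e6, "M$")

# The N^2 term overtakes the linear term where 5 N^2 = 1200 N, i.e. N = 240.
print(1200 / 5)
```

Below about 240 receivers the per-receiver term dominates the cost; above it the cross-multiplication term, which grows as N squared, takes over.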
Astronomy Signal Processor<br />
Terry Filiba, Peter McMahon
Parkes Pulsar Discoveries<br />
Bailes, Filiba, McMahon et al<br />
• BPSR is 13 beams, 1024 channels, 64 µs.<br />
• 50 pulsars<br />
• 11 millisecond pulsars<br />
• a magnetar (Levin et al.)<br />
• a pulsar with a planetary-mass companion.<br />
• large number <strong>of</strong> RRATs due to increased<br />
dynamic range over previous generation<br />
filterbanks.
1960 – First Radio Astronomy Digital Correlator<br />
21 lags<br />
300kHz clock<br />
discrete transistors<br />
$19,000<br />
Sandy<br />
Weinreb
Correlator processing power<br />
[Figure: correlator processing power in GFlops (log scale) versus year, 1970 to 2015; labels on the plot include DAS, DCB, DLB, DXB, VLA, EVN/WSRT, EVLA, SMA, LOFAR, ALMA and SKA]<br />
source: Arnold van Ardenne
Moore’s Law – Instruments using FPGAs: 2X per year<br />
(1,000,000 over 20 years)
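The factor of 1,000,000 follows directly from doubling once per year for 20 years:

```latex
\[
  2^{20} = 1{,}048{,}576 \approx 10^{6}
\]
```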
Future Spectrometers<br />
2015 4 THz 400 beams (10 GHz each)<br />
2020 128 THz 12,800 beams<br />
2025 4000 THz 40,000 beams<br />
2030 128,000 THz 1M beams
Cost <strong>of</strong> FPGA integer computing<br />
2010 $200 per 1E11 MAC/sec<br />
2012 $100<br />
2014 $50<br />
2016 $25<br />
2018 $13<br />
2020 $6<br />
2022 $3<br />
• XC6SLX150T FPGA with PCB, DRAM, P.S., Cooling…<br />
• Moore’s Law Foundry Prediction through 22nm<br />
• ITRS roadmap predicts slow down (but they always do)
<strong>CASPER</strong> the Friendly...<br />
• Group Helping Open-source Signal-processing<br />
Technology (GHOST)<br />
– Goal to help develop signal processing<br />
instrumentation and libraries for the<br />
community.<br />
– Open source hardware, gateware, and<br />
software.<br />
– Provide training and tutorials<br />
– Not so much delivering turn-key instruments<br />
– Promote Collaboration