Packet Processing to support 40/100GE Line Rates - Ethernet ...
Agenda
• Current Challenges
• Application Requirements
• Tabula Solutions for 100GE Packet Processing
• Tabula Technology & Tools
• Summary
• Next Steps
Santa Clara, CA USA April 2013
40GE / 100GE & Beyond... Packet Processing Challenges
Ethernet Stack: Processing Challenge
Ethernet OSI Model
Context     Layer
Data        Application
Data        Presentation
Data        Session
Segments    Transport
Packets     Network
Frames      Data Link
Bits        Physical
Functions (listed from the top of the stack down):
• Deep Packet Inspection
• Application Aware Routing, Switching
• Flow Classification
• Routing
• Bridging, Switching
Device Elements for Packet Processing
[Diagram labels: ASIC / ASSP, Functionality]
Device Elements for Packet Processing
[Diagram labels: FPGA, Functionality, DSP Block, 6-Input LUT Architecture]
Packet Processing at 100GE
• ~50% of Ethernet packets have a minimum packet size of ~64 bytes [K. Thompson, G. Miller, R. Wilder, Wide-area traffic patterns and characterization, IEEE Network, Dec. 1997]
• Line rate packet processing requirements, time per Ethernet minimum-size packet (64 bytes + 20-byte IFG):
  1GE      672 ns
  10GE     67.2 ns
  40GE     16.8 ns
  100GE    6.72 ns
• At 100GE line rate, an implementation has time for at most 2-3 packet operations plus buffering
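The per-packet times on this slide follow from simple arithmetic: (64 + 20) bytes x 8 = 672 bits on the wire per minimum-size packet, divided by the line rate. The following is a minimal sketch of that calculation; the function and constant names are illustrative and not from the presentation.

```python
# Per-packet time budget at Ethernet line rate (illustrative sketch).
MIN_FRAME_BYTES = 64   # minimum Ethernet frame size quoted on the slide
OVERHEAD_BYTES = 20    # per-packet overhead quoted on the slide as "20-byte IFG"

LINE_RATES_GBPS = {"1GE": 1, "10GE": 10, "40GE": 40, "100GE": 100}


def packet_time_ns(rate_gbps, frame_bytes=MIN_FRAME_BYTES, overhead_bytes=OVERHEAD_BYTES):
    """Time one packet occupies the wire, in nanoseconds.

    1 Gb/s is exactly 1 bit per nanosecond, so bits / Gb/s gives ns.
    """
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    return bits_on_wire / rate_gbps


def packets_per_second(rate_gbps):
    """Worst-case packet rate when every packet is minimum size."""
    return 1e9 / packet_time_ns(rate_gbps)


for name, rate in LINE_RATES_GBPS.items():
    print(f"{name:>5}: {packet_time_ns(rate):7.2f} ns per packet, "
          f"{packets_per_second(rate) / 1e6:8.2f} Mpps")
```

At 100GE this works out to 6.72 ns per packet, or roughly 148.8 million minimum-size packets per second, which is why only a handful of operations fit in the per-packet budget.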
Performance Bottlenecks are EVERYWHERE
• Limited Application Performance
• E.g. Network Security Appliance Implementation
  • Multiple ASICs, ASSPs, PLDs
  • Software
• IO Bandwidth: Multiple 10G, typically
  • 4-8 10GE lanes
  • 10-20 1GE lanes
• Application Performance:
  • Firewall Bandwidth: ~50%
  • IPS Throughput:
Application Requirements
Application Requirements
• Performance Matched Device Elements
• High Bandwidth Memory Elements
• Ability to Store Once, and Simultaneously Read Multiple Times
Application Requirements
• Simple Design Flow
• Design for Intent
• Automatic Optimization for Higher Performance
• Faster Design Cycle Times
Tabula Solutions for 100GE Packet Processing
100G L2-L4 Packet Parser Engine
• Function
  • 9-tuple header extraction at wireline speed
• Features
  • Multiple Ethernet packet formats
  • Rapid adaptation of additional packet formats
  • Microcode compiler
• Performance
  • < 0.05% Device Resources
  • < 20 ns Latency
  • 256b @ 500 MHz
[Diagram labels: Multi-port Mem [Pattern Match], Multi-port Mem [App Logic], Bit Processing, PL Reg, PL Reg]
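The "256b @ 500 MHz" figure can be sanity-checked against the 100GE line rate. The sketch below is a back-of-the-envelope calculation only; it assumes the bus carries one 256-bit word per clock and that only frame bytes (not the inter-frame overhead) cross the datapath. The helper names are hypothetical.

```python
# Back-of-the-envelope check of a 256-bit datapath at 500 MHz (illustrative sketch).

def datapath_gbps(width_bits, clock_mhz):
    """Raw datapath bandwidth in Gb/s: bus width times clock rate."""
    return width_bits * clock_mhz / 1000.0


def cycles_per_frame(frame_bytes, width_bits):
    """Clock cycles needed to move one frame across the bus (ceiling division)."""
    return (frame_bytes * 8 + width_bits - 1) // width_bits


def latency_cycles(latency_ns, clock_mhz):
    """Number of clock periods that fit into a latency figure."""
    return latency_ns * clock_mhz / 1000.0


WIDTH_BITS = 256   # from the slide
CLOCK_MHZ = 500    # from the slide

print("raw bandwidth (Gb/s):", datapath_gbps(WIDTH_BITS, CLOCK_MHZ))
print("cycles for a 64-byte frame:", cycles_per_frame(64, WIDTH_BITS))
print("clock periods in 20 ns:", latency_cycles(20, CLOCK_MHZ))
```

With the slide's figures this gives 128 Gb/s of raw bandwidth, above the 100 Gb/s line rate. A 64-byte frame occupies 2 bus cycles (4 ns, inside the 6.72 ns per-packet budget), and the quoted < 20 ns latency corresponds to fewer than 10 clock periods.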
100GE – 12x10GE Layer2 Bridge
4 x 100G Crossbar
• Function
  • 4-port 100GE Crossbar
• Features
  • Support 288 KB of per port buffering
  • 8 queues / port with 32 KB / queue
  • Built in unicast / multicast support
• Performance
  • < 0.05% LUTs
  • < 20% LRAMs
  • Port-to-port latency ~12 ns
  • 256b @ 472 MHz
Ports are made of 3-ported RAM blocks
© 2013 Tabula Confidential
Ternary Search Engine
Tabula Technology & Tools
Spacetime Addresses The Interconnect Bottleneck
[Diagram labels: Interconnect, RAM, Logic, Transceivers, each marked 2 GHz]
Everything can run at 2 GHz on Tabula Devices
Spacetime Addresses The Memory Bottleneck
• High bandwidth, multi-ported RAM blocks
  • Up to 24 ports
  • Includes ASYNC FIFO Controllers
• High bandwidth, multi-ported RAMs enable
  • Crossbars
  • State machines & Microcontrollers
  • Data structures
  • Hash tables
  • ROMs
Queuing and De-Queuing
– This memory is divided into separate queues with independent write ports
– A single read port has access to all queues
• FPGA / ASIC: Multi-ported memory must be constructed out of dual ported memories
• Similar principle for a single write port to multiple read ports
© 2013 Tabula, Inc.
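The queuing arrangement on this slide can be pictured with a small behavioral model: one memory split into several queues, each fed by its own write port, with a single read port that can drain any of them. This is a software sketch for intuition only, not a description of the Tabula hardware; the class name, the one-word-per-port-per-cycle rule, and the read-before-write ordering within a cycle are all assumptions.

```python
from collections import deque


class MultiQueueMemory:
    """Behavioral model (hypothetical): N queues, one write port each, one shared read port."""

    def __init__(self, num_queues, depth):
        self.depth = depth
        self.queues = [deque() for _ in range(num_queues)]

    def clock(self, writes=None, read_from=None):
        """Advance one clock cycle.

        writes:    dict mapping queue index -> word; at most one word per write port.
        read_from: index of the queue the single read port serves this cycle, or None.
        Returns the word read, or None if nothing was read.
        """
        # Assumption: the read sees the state from before this cycle's writes.
        word_out = None
        if read_from is not None and self.queues[read_from]:
            word_out = self.queues[read_from].popleft()

        # All write ports can accept data in the same cycle, independently.
        for index, word in (writes or {}).items():
            if len(self.queues[index]) >= self.depth:
                raise OverflowError(f"queue {index} is full")
            self.queues[index].append(word)
        return word_out


mem = MultiQueueMemory(num_queues=4, depth=8)
mem.clock(writes={0: "pkt-a", 1: "pkt-b", 2: "pkt-c"})  # three ports write in one cycle
print(mem.clock(read_from=1))                           # the read port picks queue 1
print(mem.clock(writes={3: "pkt-d"}, read_from=0))      # write and read in the same cycle
```

In the model, three writers deposit data in the same cycle without arbitration, which is the property the slide attributes to the multi-ported memory; with dual-ported memories the same behavior would need several memory instances plus glue logic.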
Broadcasting
– A single write port provides access to the buffer for the input data
– Independent read ports have access to all of the data
• FPGA / ASIC: Multi-ported memory must be constructed out of dual ported memories
Data Selection / Indexing
[Diagram labels: Memory, Logic Block 1, Logic Block 2, Logic Block 3]
• IPv4 header aligned to 32 bits
• Logic can operate on 64 bits and on smaller 16-bit or 8-bit portions using multiple ports from the memory
• Data normally appearing on different clocks can be accessed within the same clock using multiple read ports
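The idea of several logic blocks reading different-width slices of the same header at once can be illustrated in software. In the sketch below each call to `read_port` stands in for one memory read port; the helper name, the sample header bytes, and the choice of fields are illustrative assumptions, and the checksum field is left as zero rather than computed. Field offsets follow the standard IPv4 header layout.

```python
# Reading different-width slices of one buffered IPv4 header (illustrative sketch).

# 20-byte IPv4 header without options; values are made up for the example.
IPV4_HEADER = bytes([
    0x45, 0x00, 0x00, 0x54,   # version/IHL, DSCP/ECN, total length
    0x1C, 0x46, 0x40, 0x00,   # identification, flags/fragment offset
    0x40, 0x06, 0x00, 0x00,   # TTL, protocol, header checksum (not computed here)
    0xC0, 0xA8, 0x00, 0x01,   # source address 192.168.0.1
    0xC0, 0xA8, 0x00, 0xC7,   # destination address 192.168.0.199
])


def read_port(memory, byte_offset, width_bytes):
    """Model of one read port: return width_bytes from memory as a big-endian integer."""
    return int.from_bytes(memory[byte_offset:byte_offset + width_bytes], "big")


# Three "logic blocks" each use their own port on the same stored header.
address_pair = read_port(IPV4_HEADER, 12, 8)   # 64-bit slice: source + destination
total_length = read_port(IPV4_HEADER, 2, 2)    # 16-bit slice
ttl = read_port(IPV4_HEADER, 8, 1)             # 8-bit slice
protocol = read_port(IPV4_HEADER, 9, 1)        # 8-bit slice

print(hex(address_pair), total_length, ttl, protocol)
```

In hardware these reads would happen in the same clock through separate ports; here they are sequential calls, so the sketch shows only the addressing and slice widths, not the timing.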
Tabula ABAX2P1 – Designed for 100G System Implementation
• User clock rate: up to 2 GHz
• Memory Rich Architecture: high capacity at high performance
• IO Rich Architecture: designed for 4 x 100 GigE traffic
• Hard IP
• Memory Controllers
• Processing Rich Architecture
  • DSP: 2 GHz blocks
  • LCB:
Tabula Tools: Stylus®
ASIC / ASSP / FPGA: Multiple Optimization Steps
Timing Closure
Tabula Tools: Stylus®
ASIC / ASSP / FPGA: Multiple Optimization Steps
STYLUS: Timing Aware Design Flow
Synthesis, Placement, Routing
– Accurate feedback
– Short cycle
Summary
100GE Packet Processing & Beyond
• Industry trends driving 40GE / 100GE adoption
  • Big Data will be here sooner than we anticipate
  • Cheaper Optics: Silicon Photonics
  • Simplified Network Programming Initiative: SDN
• 40/100GE Packet Processing Solutions
  • Plan your architecture right
  • System Integration
    – Fixed Functions
    – Programmable Hardware
    – Software
Want To Learn More
Tabula Solutions
• Tabula Technology & Tools
• http://customer.tabula.com
Questions