SPECIAL
HIGH-END EMBEDDED COMPUTING

Beam-forming to scientific modeling: High-density compute platforms offer multiprocessor solutions

By Ian Stalker

The CompactPCI standard continues to grow in importance for high-end applications. There is, for example, increased demand for Gigabit Ethernet in a PICMG 2.16 configuration. This marks a step towards the eventual replacement of parallel bus technology with switch fabric interfaces, yet it must be stressed that the new generation of switch fabric technology, such as that used in AdvancedTCA, is still some time away from maturity and general deployment. In this article Ian covers a number of applications that make use of this type of technology, signal analysis being one of them.

Today, the preponderance of CompactPCI applications is served by classic general-purpose Single Board Computers (SBCs) and I/O products, where a single microprocessor, paired with application-specific I/O, provides sufficient computing power to perform the requisite task. Many industrial control applications, for example, fall into this category. At the other end of the performance spectrum reside applications that are essentially compute bound, meaning that the system designers will take advantage of as many MIPS and GFLOPS as their fiscal or power budget will permit. Simulation and scientific modeling are examples where there is a continual need for greater speed and resolution, and which also frequently require multiprocessor solutions.

Two classes of compute problems need multiprocessing solutions. The first class comprises compute farm applications, where multiple channels of data need to be processed but there is only a small or medium requirement for interchange between the processors working on the problem.
For these applications, Ethernet provides an ideal transport between the processors because it is simple to program with portable software and is also cost-effective. Scaling up to large systems involves the relatively straightforward process of packaging multiple boards into enclosures.

The second class of multiprocessor applications is that in which multiple processors work with a shared database on a single problem. These problems typically involve large amounts of interprocessor communication. One example of this type of application is digital radio beamforming. These applications benefit from augmenting the interboard I/O with a higher-performance, low-overhead communications technology.

A high-density 6U CompactPCI compute platform based on the PICMG 2.16 Packet Switching Backplane (cPSB) standard, such as Curtiss-Wright’s CHAMP-AV IV (CAV4), can be adapted to these applications with the addition of one or two StarFabric PMC modules to provide up to approximately 1 GBps of interboard I/O while significantly reducing processor overhead. Figure 1 shows the CAV4.

Figure 1

PICMG 2.16 plus

As previously mentioned, the CAV4 employs the PICMG 2.16 Packet Switching Backplane standard. In fact, the CAV4 does not have a PCIbus backplane interface. The PICMG 2.16 standard was developed to overcome the inherent limitation of the PCIbus. With a single, shared, parallel bus capable of 533 MBps (best case, 5 slots), CompactPCI systems were becoming limited by the throughput of their interconnect. The PICMG 2.16 standard introduced the concept of using Ethernet (10/100 or 10/100/1000) as the main data transport mechanism within a system. Using the CompactPCI midplane J3 connector, the standard defines node and fabric slots. Node slots have one or two Ethernet interfaces.
Fabric slots provide the Ethernet switching function. Systems comprise one or two switch cards and up to 20 nodes, supporting a total bandwidth of up to 5 GBps.

The CAV4 extends the PICMG 2.16 principle even further. It provides five Gigabit Ethernet interfaces to the backplane connectors. Each processing node, including the 8540 control processor, has an independent Ethernet connection to the backplane. Two of these interfaces are on the pins defined by PICMG 2.16.

Systems built using the PICMG 2.16 Ethernet standard are precursors of the new era of interprocessor communications using switched fabric technology. Standards such as VITA 41, VITA 46, AdvancedTCA, and CompactPCI Express are all based on high-speed point-to-point serial interconnect with switching instead of buses. While these technologies continue to mature, Ethernet will garner many design wins for the current generation of systems.

Ethernet performance

In the course of characterizing the performance of the CAV4, we measured the Ethernet throughput using the Wind River Systems VxWorks real-time operating system with the Blaster/Blastee test programs that are included. These programs have two tunable parameters: transmit message size and receiver buffer size. The best performance, not surprisingly, was obtained with the largest message sizes. The test used the standard VxWorks 5.5 IPv4 network stack without optimizations to take advantage of the Discovery III TCP checksum offload feature. Table 1 shows the performance obtained using PowerPC 7447A processors at different clock rates using message sizes of 48 KB.

22 / CompactPCI and AdvancedTCA Systems / June 2005

With a total of five Gigabit Ethernet interfaces, the card is capable of well in excess of 300 MBps throughput. A system composed of many CAV4s would have dramatically more Ethernet communications bandwidth than that provided by a single PCIbus.

Power consumption

It is a well-known phenomenon that the power consumption of microprocessors and accompanying system logic has been steadily rising. Desktop processors from Intel and AMD now top 100 W. In high-performance multiprocessor computing applications, however, the name of the game is computing density. The question is, “How much real computing work can be accomplished within the confines of standard enclosures and racking systems, without resorting to prohibitively expensive cooling technologies such as spray cooling or refrigerated air systems?”

The latest generation of quad processor designs is starting to push the envelope of available cooling in the IEEE 1101.10 mechanical standard. A precision air mass-flow measurement test was developed by Curtiss-Wright to qualify and accurately specify the cooling requirements for high-power, air-cooled processor boards. In concert with this program, we have characterized the power consumption of the Compact CHAMP-AV IV at different processor clock rates and inlet air temperatures. Table 2 shows the power for a test scenario designed to stress the processors and memory subsystem of the card, thereby consuming power in excess of the majority of real applications.

These tests highlight some of the factors that influence power consumption. Faster clock rates are usually accompanied by the need to power the processor core at a higher voltage, causing relatively large increases in power for relatively modest increases in clock rate, as seen between 998 MHz and 1064 MHz.
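As a rough illustration of this first factor, the 998 MHz and 1064 MHz columns of Table 2 (25 °C inlet) can be compared directly. The following is a minimal sketch under stated assumptions: the names are illustrative, and the f·V² term is the textbook approximation for CMOS dynamic power, used here only as a reference point rather than as a model of the card. Because Table 2 reports power for the whole card, the measured rise would be expected to fall below the core-only estimate.

```python
# Figures taken from Table 2 of the article (card-level average power, 25 °C inlet).
LOW = {"mhz": 998, "volts": 1.0, "watts": 54.7}
HIGH = {"mhz": 1064, "volts": 1.1, "watts": 64.9}


def pct_increase(new, old):
    """Percentage increase from old to new."""
    return 100.0 * (new - old) / old


# Measured changes between the two operating points.
clock_gain = pct_increase(HIGH["mhz"], LOW["mhz"])
power_gain = pct_increase(HIGH["watts"], LOW["watts"])

# Assumption: dynamic power scales roughly with f * V^2 (textbook CMOS
# approximation). This applies to the processor core only, not the whole card.
fv2_gain = pct_increase(
    HIGH["mhz"] * HIGH["volts"] ** 2,
    LOW["mhz"] * LOW["volts"] ** 2,
)

print(f"clock increase:            {clock_gain:.1f} %")
print(f"measured power increase:   {power_gain:.1f} %")
print(f"f*V^2 reference (assumed): {fv2_gain:.1f} %")
```

With the Table 2 values, a clock increase of about 6.6 percent coincides with a card-level power increase of about 18.6 percent, while the assumed f·V² reference gives about 29 percent for the core alone.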
The other factor, perhaps less well understood, is that processors draw more power when running at higher silicon die temperatures, which illustrates the need for effective thermal design and air management within the enclosure. Freescale’s power estimates for the 7448 processor are not publicly disclosed, but we expect to see power reductions at equivalent test conditions.

Much of the technology required for applications with high performance needs in the military market, such as radar, sonar, and signal intelligence, can be applied in products aimed at the high end of the commercial/industrial CompactPCI market space, where performance and packaging density are valued but extreme ruggedization is not.

Ian Stalker is the DSP product manager for Curtiss-Wright Controls Embedded Computing. He has more than 20 years of experience in the embedded industry and has a degree in Electronic Engineering.

For further information, contact Ian at:

Curtiss-Wright Controls Embedded Computing
741-G Miller Drive, SE
Leesburg, VA 20175
Tel: 703-779-7800 • Fax: 703-779-7805
E-mail: ian.stalker@curtisswright.com
Website: www.cwcembedded.com

Processor Clock    Ethernet Performance
665 MHz            62.6 MBps
998 MHz            79.1 MBps
1064 MHz           79.1 MBps

Table 1

Core Voltage                   1.0 V      1.0 V      1.1 V
Core Frequency                 665 MHz    998 MHz    1064 MHz
Average Power @ 25 °C inlet    47.6 W     54.7 W     64.9 W
Average Power @ 50 °C inlet    50.6 W*    59.2 W     70.9 W

*Power measured at 40 °C for this test.

Table 2
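The Ethernet figures in Table 1 can be related to the "well in excess of 300 MBps" claim with some simple arithmetic. The sketch below is illustrative only and rests on two assumptions that the article does not state: that all five Gigabit Ethernet interfaces can sustain the measured single-link rate at the same time, and that 1 MBps means 10^6 bytes per second, so that a Gigabit link carries at most 125 MBps before protocol overhead. The variable names are hypothetical.

```python
# Single-link throughput from Table 1 of the article, keyed by clock in MHz.
LINK_MBPS = {665: 62.6, 998: 79.1, 1064: 79.1}

INTERFACES = 5           # Gigabit Ethernet interfaces on the card (from the article)
PCI_BEST_CASE = 533.0    # MBps, shared PCIbus, best case with 5 slots (from the article)
GBE_RAW = 125.0          # MBps; assumes 1 Gbit/s = 125e6 bytes/s, no framing overhead

# Assumption: every interface sustains the single-link rate concurrently.
aggregate = {mhz: INTERFACES * rate for mhz, rate in LINK_MBPS.items()}

# Share of the raw Gigabit line rate achieved on one link.
utilization = {mhz: 100.0 * rate / GBE_RAW for mhz, rate in LINK_MBPS.items()}

for mhz in sorted(LINK_MBPS):
    print(
        f"{mhz:>4} MHz: {aggregate[mhz]:.1f} MBps per card, "
        f"{utilization[mhz]:.1f} % of line rate per link, "
        f"{aggregate[mhz] / PCI_BEST_CASE:.2f} x best-case PCIbus"
    )
```

Under these assumptions a single card reaches roughly 313 MBps at 665 MHz and 395.5 MBps at the higher clock rates, so two such cards would already exceed the 533 MBps best case of a shared PCIbus.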