Power Grid Analysis in VLSI Designs - SERC


Power Grid Analysis in VLSI Designs

A Thesis Submitted for the Degree of Master of Science (Engineering) in the Faculty of Engineering

By Kalpesh Shah

Super Computer Education and Research Centre
Indian Institute of Science
Bangalore – 560012

March 2007


Table of Contents

Acknowledgements
Abstract
1 Introduction
  1.1 Motivation
    1.1.1 Power Estimation
    1.1.2 Power Supply Noise
    1.1.3 MTCMOS Analysis
  1.2 Terms
  1.3 Thesis outline and Contribution
2 Toggle Activity Estimation
  2.1 Overview
  2.2 Toggle Activity Estimation
  2.3 Multi-million gate solution
    2.3.1 Deriving automatic toggle frequency values
    2.3.2 Hierarchical Modeling
  2.4 Validation and Results
  2.5 Summary
3 Power Estimation
  3.1 Overview
  3.2 Current approaches to Power Analysis
  3.3 Power analysis Tools
    3.3.1 Power Compiler [67]
    3.3.2 Power Mill (or Nano Sim) [4][68]
    3.3.3 Prime Power [66]
    3.3.4 Other Tools
  3.4 Validation Flow
    3.4.1 Netlist Setup
    3.4.2 Vector Generation
    3.4.3 Interconnect setup
  3.5 Validation and Results
  3.6 Power estimation applications
    3.6.1 Average power/ground bus currents
    3.6.2 Average power dissipation
    3.6.3 Electro migration failures
    3.6.4 Power Routing
    3.6.5 Gate Oxide Integrity Analysis
  3.7 Summary
4 Power Supply Noise Analysis
  4.1 Overview
  4.2 Cell Characterization
    4.2.1 Current Characterization Methodology
    4.2.2 Current Characterization Flow
  4.3 Power Grid network modeling
    4.3.1 Power Grid Current Waveform Modeling
  4.4 Complete Flow
    4.4.1 Timing Information Generation
    4.4.2 Power Grid Generator
    4.4.3 SPICE Simulation
  4.5 Validation and Results
    4.5.1 Peak Power Results
    4.5.2 Peak Dynamic IR Drop Results
  4.6 Summary
5 Power Up Analysis
  5.1 Switched PG Networks
  5.2 Switch Network Analysis
    5.2.1 Switch Characterization
    5.2.2 Current or Switch Prediction
  5.3 Results and Analysis
  5.4 Summary
6 Conclusion
  6.1 Summary
  6.2 Scope of Future Work
7 References
Appendix A Sample SDC file
Appendix B Sample SPEF Format
Appendix C Power Waveforms Analysis
Appendix D Current Characterization – sample spice deck
Appendix E Waveform transformation example


Table of Figures

Figure 1.1 Power Dissipation in CMOS designs
Figure 1.2 Power Density trend in CMOS designs
Figure 1.3 Leakage and Dynamic Power Dissipation [2]
Figure 1.4 Schematic of Power Grid in CMOS designs
Figure 1.5 Normalized delay and normalized delay to voltage ratio
Figure 1.6 Total power break up into leakage and active
Figure 2.1 Schematic of logic circuit 1
Figure 2.2 Schematic of Logic Circuit 2
Figure 2.3 Gated clock example
Figure 2.4 Gate Level Netlist for 'simple' design
Figure 2.5 Timing Arcs in extracted model of 'simple' design
Figure 3.1 Venn diagram of Power Components
Figure 3.2 Power Estimation in Design Stages
Figure 3.3 Power Estimation Validation Flow
Figure 3.4 Legends for Validation Flow
Figure 4.1 Voltage over time representation at an internal design node
Figure 4.2 Schematic circuit for instantaneous voltage drop analysis
Figure 4.3 Inverter waveforms measured at different nodes
Figure 4.4 Transition time vs. peak power for Inverter
Figure 4.5 Transition time vs. peak power for nand gate
Figure 4.6 Load vs. peak power for AND gate
Figure 4.7 Load vs. Peak power for OR gate
Figure 4.8 State Dependency on cell switching
Figure 4.9 Cell Characterization Flow
Figure 4.10 Power Grid Modeling
Figure 4.11 Peak IR drop Computation Flow
Figure 4.12 Prime Time flow for arrival time computation
Figure 4.13 Power Grid Generation Flow
Figure 4.14 PSN waveform of Proposed Method
Figure 4.15 PSN Reference Waveform
Figure 5.1 Gated Power Supply ([74])
Figure 5.2 Layout of 1M gate with switch network
Figure 5.3 Current Glitch and Voltage Ramp at arbitrary switch output
Figure 5.4 Typical PG network with Power Switches
Figure 5.5 Schematic Switch network Analysis Flow
Figure 5.6 Analysis model of Virtual Power Network
Figure 5.7 Infinitesimal Time Division for Current Prediction
Figure 5.8 Reduced Switch Network for validation
Figure 5.9 Voltage Ramp up over Time for various nodes
Figure 5.10 Current comparison over time
Figure 1 1MHz, Peak: 838.9 uW
Figure 2 100MHz, Peak: 840.7 uW
Figure 3 1GHz, Peak: 838.2 uW
Figure 4 1MHz base Waveform, 830.4 uW
Figure 5 100MHz Transformation, 830.4 uW
Figure 6 1GHz Transformation for 1MHz, 830.4 uW


List of Tables

Table 1.1 Consolidation of ITRS2003 Predictions
Table 1.2 Generic Term Definitions
Table 2.1 Comparison of Static vs Dynamic approaches for Power Estimation
Table 3.1 Power Modeling for CMOS gates
Table 3.2 ISCAS89 circuit description
Table 3.3 Runtime comparison between vector less and SPICE
Table 3.4 Clock Power vs. Total Power
Table 3.5 Power Estimation across various tools
Table 4.1 Comparison of Peak power Dissipation
Table 4.2 Comparison of percentage peak instantaneous IR drop
Table 4.3 Comparison of percentage peak IR drop on ISCAS89 circuits
Table 5.1 Switch Prediction by proposed algorithm
Table 5.2 Voltage Prediction
Table 5.3 Power Up analysis - Runtime Comparison


Abstract

Power has become an important design closure parameter in today's ultra low submicron digital designs. The impact of the increase in power is multi-disciplinary, affecting researchers in areas ranging from power supply design, power converter or voltage regulator design, and system, board and package thermal analysis, to power grid design, signal integrity analysis and the minimization of power itself. This work focuses on the challenges that the increase in power poses for power grid design and analysis.

Challenges arising from smaller geometries and higher power are well-researched topics, yet there is still considerable scope for further work. Traditionally, designs go through average IR drop analysis, which is highly dependent on the estimate of current dissipation. This work proposes a vectorless probabilistic toggle estimation method that extends one of the approaches proposed in the literature. We have further used the toggles computed with this approach to estimate the power of ISCAS89 benchmark circuits, which provides insight into the quality of the toggles being generated. The power estimation work is further extended to compare against various state-of-the-art methodologies, i.e. SPICE-based power estimation, logic-simulation-based power estimation, commercially available tools, etc. We finally arrive at an optimum flow recommendation that can be used as per design need and schedule.

Today's design complexity (high frequencies, high logic densities, and multiple levels of clock and power gating) has forced the design community to look beyond average IR drop. High rates of switching activity induce power supply fluctuations at the cells in a design, which is known as instantaneous IR drop. However, there is no good methodology in place to analyze this phenomenon. Ad hoc decoupling planning and on-chip intrinsic decoupling capacitance help to contain this noise, but there is no guarantee. This work also applies the average toggle computation approach to compute instantaneous IR drop for designs. Instantaneous IR drop is also known as dynamic IR drop or power supply noise. We propose a cell characterization methodology for standard cells. This data is used to build a power grid model of the design. Finally, the power network is solved to compute the instantaneous IR drop.

Leakage power minimization has forced design teams to do complex power gating, using multi-level MTCMOS in the power grid. This poses an additional analysis challenge for the power grid in terms of ON/OFF sequencing and the noise it injects. This work explains the state of the art here and highlights some of the issues and trade-offs in using MTCMOS logic. It further suggests a simple approach to quickly assess the impact of MTCMOS gates in the power grid in terms of peak currents and IR drop. The suggested approach also helps in MTCMOS gate optimization. Early leakage optimization overhead can be computed using this approach.


1 Introduction

1.1 Motivation

The VLSI industry is facing one of the biggest challenges in its evolution: power integrity closure, the next after the crosstalk-induced integrity issues of the previous decade. Power dissipation has increased phenomenally across the years, as shown in Figure 1.1, giving rise to this challenge. Figure 1.2 shows the increase in power density due to aggressive scaling and hence the increasing number of components crammed into a unit area.

[Figure 1.1 plot: power (Watts, logarithmic axis) versus year, 1971 to 2008, for processors from the 4004, 8008, 8080, 8085, 8086, 286, 386 and 486 to the Pentium® processor, with marked levels of 500 W, 1.5 KW, 5 KW and 18 KW.]

Figure 1.1 Power Dissipation in CMOS designs


[Figure 1.2 plot: power density (W/cm2, logarithmic axis) versus year, 1970 to 2010, for processors from the 4004 to the P6 and Pentium® processor, with reference levels marked Hot Plate, Nuclear Reactor and Rocket Nozzle.]

Figure 1.2 Power Density trend in CMOS designs

Table 1.1 below consolidates the ITRS2003 [1] predictions on power and its impact on design and on operating voltages.

Year                     2003   2004   2005   2006   2007   2008   2009   2010   2012
Node                            90 nm                65 nm                45 nm
Vdd (High Perf) (V)       1.2    1.2    1.1    1.1    1.1      1      1      1    0.9
Vdd (Low Power) (V)         1    0.9    0.9    0.9    0.8    0.8    0.8    0.7    0.7
High Perf Power (W)       149    158    167    180    189    200    210    218    240
Battery Operated (W)      2.1    2.2    2.3    2.4    2.5    2.6    2.7    2.8      3
PG Pads                  1700   1800   2000   2100   2200   2300   2400   2400   2600

Table 1.1 Consolidation of ITRS2003 Predictions
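To make the trend in Table 1.1 concrete, the following minimal Python sketch computes the total supply current implied by each column of the high-performance rows as I = P / Vdd. The per-pad figure assumes, purely for illustration, that half of the PG pads carry Vdd; that split is an assumption and does not come from the ITRS data.

```python
# Values copied from Table 1.1 (high-performance rows).
years = [2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2012]
vdd_v = [1.2, 1.2, 1.1, 1.1, 1.1, 1.0, 1.0, 1.0, 0.9]
power_w = [149, 158, 167, 180, 189, 200, 210, 218, 240]
pg_pads = [1700, 1800, 2000, 2100, 2200, 2300, 2400, 2400, 2600]

# Assumption for illustration only: half of the PG pads are Vdd pads.
VDD_PAD_FRACTION = 0.5


def supply_current(power, vdd):
    """Total supply current in amperes from I = P / Vdd."""
    return power / vdd


currents_a = [supply_current(p, v) for p, v in zip(power_w, vdd_v)]
per_pad_a = [i / (n * VDD_PAD_FRACTION) for i, n in zip(currents_a, pg_pads)]

for y, i, ipad in zip(years, currents_a, per_pad_a):
    print(f"{y}: total {i:6.1f} A, about {ipad * 1000:5.1f} mA per Vdd pad")
```

Under these assumptions the total current roughly doubles between 2003 and 2012 (about 124 A to about 267 A), while the supply voltage, and with it the tolerable voltage drop, shrinks.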


Further, Figure 1.3 shows that both the leakage and the dynamic components of power are continuously increasing in newer technology nodes, with leakage dominating dynamic power [2]. The next sections describe how these trends give rise to challenges in power grid analysis and lead to the work done.

Figure 1.3 Leakage and Dynamic Power Dissipation [2]


1.1.1 Power Estimation

One of the challenges in power integrity analysis is to accurately predict the power dissipation, both average and peak, of a design. Power estimation is required for package thermal analysis, power minimization, and power grid design.

The earliest proposed techniques for estimating power dissipation were based on circuit simulation, e.g. SPICE or fast SPICE simulators [3-6]. Besides being strongly pattern-dependent, these techniques are too slow to be used on modern very large scale integrated (VLSI) circuits, for which high power dissipation is a major problem.

In order to improve computational efficiency, other simulation-based techniques were proposed using various kinds of timing, switch-level, and logic simulation [7-9]. In these approaches, lookup tables are obtained by electrical simulation of the basic library elements, and the collected data are then used during gate-level simulation. These techniques generally assume that the power supply and ground voltages are fixed, and only the supply current waveform is estimated. While they are indeed more efficient than traditional circuit simulation, at the cost of some loss in accuracy, they remain strongly pattern-dependent and are still too slow for modern multi-million gate designs, where the whole chip cannot be simulated together.

In order to overcome the shortcomings of simulation-based techniques, research has focused on probabilistic and statistical techniques for toggle estimation. The use of probabilities to estimate power was first proposed in [11]. In that work, a zero-delay model was assumed so that the transition probabilities could be estimated from signal probabilities. A probabilistic power estimation approach that does compute the toggle power and does not make the zero-delay or temporal independence assumptions, called probabilistic simulation, was proposed in a few papers. In this technique, the use of probabilities was expanded to allow the specification of probability waveforms. This approach assumed spatial independence, and was not restricted to synchronous circuits.

Another probabilistic approach was proposed by Farid Najm [12], which introduced the transition density measure of circuit activity. An algorithm was also presented for propagating the transition density into the circuit. This approach does not make a zero-delay assumption and makes only the spatial independence assumption. A consequence of this independence assumption is that the computed density values are insensitive to the internal circuit delays.

Yet another probabilistic approach was presented in [13] by A. Ghosh et al., where Binary Decision Diagrams (BDDs) were used to take into account internal node correlations and toggle power, at the cost of increased computation. This approach can become computationally expensive. Apart from that, recent literature describes more accurate toggle estimation methods based on Bayesian networks [14-16]. These methods are limited in their ability to handle high gate-count designs. All of the above probabilistic and statistical techniques are applicable only to combinational circuits. They require the user to specify information on the activity at the latch outputs.

This work addresses the toggle computation problem, or pattern dependence problem, for multi-million gate designs by extending Najm's approach [12].
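The transition density idea behind [12] can be illustrated with a short sketch. Under the spatial independence assumption, the density at a gate output is the sum of each input's density weighted by the probability that the output is sensitive to that input (the probability of the Boolean difference). The Python below is a minimal illustration for a single gate, not the flow developed in this thesis; the function names `propagate` and `average_power`, the example probabilities and densities, and the capacitance value are all hypothetical.

```python
from itertools import product


def propagate(func, probs, densities):
    """Propagate signal probability and transition density through one gate.

    func      -- Boolean function of the gate, taking 0/1 arguments
    probs     -- probability that each input is 1 (inputs assumed independent)
    densities -- transition density of each input (transitions per second)
    Returns (output probability, output transition density).
    """
    n = len(probs)
    p_out = 0.0
    sensitivity = [0.0] * n  # P(output changes when input i alone toggles)
    for bits in product((0, 1), repeat=n):
        # Probability of this input assignment under independence.
        weight = 1.0
        for b, p in zip(bits, probs):
            weight *= p if b else 1.0 - p
        y = func(*bits)
        if y:
            p_out += weight
        # Boolean difference: does flipping input i change the output?
        for i in range(n):
            flipped = list(bits)
            flipped[i] ^= 1
            if func(*flipped) != y:
                sensitivity[i] += weight
    d_out = sum(s * d for s, d in zip(sensitivity, densities))
    return p_out, d_out


def average_power(cap_f, vdd_v, density):
    """Average switching power 0.5 * C * Vdd^2 * D for one node, in watts."""
    return 0.5 * cap_f * vdd_v ** 2 * density


def and2(a, b):
    return a & b


def xor2(a, b):
    return a ^ b


# Hypothetical input statistics: P(a)=0.5, P(b)=0.25, D(a)=2e6/s, D(b)=4e6/s.
p_and, d_and = propagate(and2, [0.5, 0.25], [2.0e6, 4.0e6])
p_xor, d_xor = propagate(xor2, [0.5, 0.25], [2.0e6, 4.0e6])
print(p_and, d_and)
print(p_xor, d_xor)
# Hypothetical 10 fF load at 1.2 V on the AND output.
print(average_power(10e-15, 1.2, d_and))
```

For the AND gate the output density reduces to P(b)·D(a) + P(a)·D(b), and for XOR to D(a) + D(b), since an XOR output is always sensitive to either input. The enumeration is exponential in the number of gate inputs, which is acceptable for library cells but is only a sketch of the per-gate step.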
Us<strong>in</strong>g this average power estimationhas been performed <strong>in</strong> various stages of the designs.1.1.2 <strong>Power</strong> Supply NoiseWith a phenomenal rise <strong>in</strong> the switch<strong>in</strong>g speed <strong>in</strong> the VSLI circuits, the probability of largenumber of cells switch<strong>in</strong>g <strong>in</strong> a short period of time <strong>in</strong>creases. A large number of simultaneous17


switch<strong>in</strong>g occurr<strong>in</strong>g <strong>in</strong> a short period of time can cause a considerable amount of noise <strong>in</strong> thepower supply network of a circuit. <strong>Power</strong> supply noise means decrease <strong>in</strong> voltage seen by cell<strong>Power</strong> Ground nodes. Schematic of <strong>Power</strong> Network gird is shown <strong>in</strong> Figure 1.4. The resistiveparasitic R <strong>in</strong> the power distribution network is accountable for the resistive noise, which is theIR voltage drop <strong>in</strong> the PG network. Apart from R, on chip decoupl<strong>in</strong>g capacitance also plays abig role. The switch<strong>in</strong>g noise <strong>in</strong> the power distribution network must be conta<strong>in</strong>ed to a tolerablelevel to ensure the reliability/performance of a circuit.IO PadVdd Pad Vss Pad IO PadIO PadIO PadVssPadIO Pad5Vss PadIO Pad1IO PadVdd Pad Vss Pad IO PadFigure 1.4 Schematic of <strong>Power</strong> <strong>Grid</strong> <strong>in</strong> CMOS designsExcessive voltage drops manifest themselves as glitches on the PG buses and cause:• Erroneous logic signals18


• Degradation <strong>in</strong> switch<strong>in</strong>g speeds• Reduction <strong>in</strong> Noise Marg<strong>in</strong> and Driv<strong>in</strong>g Capability of the gatesAccord<strong>in</strong>g to a study on Pentium®4 [26], power supply noise can reduce clock frequency by6.5% on 130 nm node and can reduce clock frequency by 8% on 90 nm node. All these arehandled through various marg<strong>in</strong>s <strong>in</strong> design flow as there are no efficient solutions available toaddress dynamic V drop problem <strong>in</strong> design flow.There is some work done to estimate peak power as well as decoupl<strong>in</strong>g capacitor <strong>in</strong> this regard.In [27], a pattern-<strong>in</strong>dependent, l<strong>in</strong>ear time algorithm is described that estimates the maximumcurrent waveforms at various contact po<strong>in</strong>ts <strong>in</strong> the circuit. The algorithm is first demonstratedfor simple gate delay and current models. The expression for model<strong>in</strong>g the delays and currentwaveforms for a general gate is derived and the way to extend the algorithm under moregeneral models is also described. The authors improved the work <strong>in</strong> [28]. In [29] measures ofpeak power are proposed <strong>in</strong> the context of sequential circuits, and a procedure is presented toobta<strong>in</strong> lower bounds on these measures, as well as provid<strong>in</strong>g the actual <strong>in</strong>put vectors that atta<strong>in</strong>such bounds. Automatic generation of a functional vector loop for near-worst case powerconsumption is atta<strong>in</strong>ed.Paper [30] presents a statistical method for estimat<strong>in</strong>g the peakpower dissipation <strong>in</strong> <strong>VLSI</strong> circuits. 
The method is based on the theory of extreme order statistics and its application to the probabilistic distributions of the cycle-by-cycle power consumption, the maximum-likelihood estimation, and the Monte-Carlo simulation. It can be used to predict the maximum power of a VLSI circuit in the set of constrained input vector pairs as well as the complete set of all possible input vector pairs. The simulation-based nature of the method avoids the limitations of a gate-level delay model and a gate-level circuit structure. Also, the method produces maximum power estimates to satisfy user-specified error and confidence levels. Experimental results show that this method typically produces maximum power estimates within 5% of the actual value and with a 90% confidence level by simulating fewer than 2500 input vectors. Another technique, described in [31], computes the peak power of a design while maintaining the current waveform accuracy. It models logic gates by breaking the gates into various nodes. It then models various currents in terms of these nodes, which are evaluated quickly during logic simulation to measure power. However, this is based on logic simulation and is therefore extremely difficult to scale.

Chen and Ling [36] proposed an approach to estimate the power supply noise based on an integrated package-level and chip-level power bus model. Chang, Gupta, and Breuer [37] proposed an analytical model to estimate the ground bounce caused by the switching in the internal circuitry for sub-micron VLSI circuits. Jiang, Cheng, and Deng [38] proposed a Genetic Algorithm-based approach that considered the dependence of switching noise on input patterns under a distributed RC model of the PG network. Zhao, Roy, and Kho proposed an event-driven simulation based approach to calculate the worst case power supply noise under a distributed RLC model [39].

There are still more challenges in this area where very little work has been done.

First, to analyze Power Ground (PG) noise, worst case vectors are required, with which the parasitic network of the chip is simulated.
Not only does the whole approach need a lot of data and memory, but today’s SPICE simulators are also unable to handle such complexity in terms of runtime and capacity. In many cases (in practice, nearly always), determining the worst case vectors is not straightforward.


Second, today’s designs have a huge PG network. It is known that the voltages seen at various nodes in this network will vary. The resultant voltage across the power-ground bus of a macro impacts its delay, as shown in Figure 1.5. Note that delay is non-linear at low voltages. Further, the change in delay per change in voltage is even more non-linear than the delay itself. This is very important to designers, as it can cause delay issues or design failures. Due to the high dependency of delay on voltage, dynamic V drop in the PG network is fast becoming a critical concern for chip designers [41][59-60].

Figure 1.5 Normalized delay and normalized delay to voltage ratio (x-axis: voltage, from 1.2 down to 0.8; curves: rise delay, fall delay, rise delay to voltage change, fall delay to voltage change)

The third aspect of the PG noise problem is that it is an iterative phenomenon [41]. When the voltage across a cell decreases due to a sudden rise in switching activity, it also changes the delays and hence the simultaneous switching. This in turn can reduce or increase the dynamic noise issues. It can reduce them in the sense that the simultaneous switching may decrease altogether, or increase them because a hot spot of the design can move to some other location. Handling this is not a trivial task from an analysis perspective.


Fourth, design methodologies today expect analysis to meet predefined PG noise targets. In reality, any voltage drop is acceptable as long as the required timing goals are met. However, this is not done due to lack of analysis data.

Fifth, it has been found that devices often fail on testers due to excessive simultaneous switching in SCAN testing. This creates serious testability issues, and hence dynamic V drop must be analyzed not only for functional mode but also for other modes such as test.

This work addresses the dynamic PG noise problem, which is also described as the dynamic V drop problem in some literature. Based on the above-mentioned issues, the goal is to address the dynamic V drop problem with a runtime efficient enough for today’s multi-million gate designs. The goal is also to evaluate the impact of dynamic V drop on timing.

1.1.3 MTCMOS Analysis

Leakage power constitutes more than half of the total power in today’s ultra sub-micron designs. See Figure 1.6 below.


Figure 1.6 Total power break up into leakage and active

Leakage power control and power network integrity have become key areas of interest for today’s power sensitive designs. In comments on the power consumption problem at the 2002 International Electron Devices Meeting, Intel chairman Andrew Grove cited off-state current leakage in particular as a limiting factor in future microprocessor integration [72]. Designers have been coming up with innovative ways to reduce leakage power using various techniques: reducing device power supply and frequency of operation [73], Multi-Vt transistor usage [74-79], controlling input states [74], memory leakage reduction [75], using reverse body bias [76], and using transistor stacks [77]. A detailed study on sources of leakage power and reduction techniques can be found in [82].

Several techniques are available to reduce leakage; a gated power supply using power switches is one of the most promising. Power switches consist of several PMOS transistors and controlling signals and are used to dynamically switch the power supply to a specific region of the chip off or on. This work studies the challenges associated with using power switches and proposes a fast analysis technique to estimate peak currents while the power ramp-up of logic happens.

1.2 Terms

Generic terms used in this report are described below.

ASIC: Acronym for Application Specific Integrated Circuit. A custom or semicustom integrated circuit, such as a cell or gate array, created for a specific application. The complexity of ASICs typically requires significant use of CAD techniques.
Block: Also known as functional block or module. Any block within the design hierarchy, instantiated one or more times, that will be laid out separately is referred to as a block module. Block modules are defined divisions of a chip based on functionality and can be worked on independently of other functional blocks.
Netlist: A description of the circuit. The description can be a gate-level or Register-Transfer level (RTL) one. It can also be in different languages like Verilog, VHDL or SPICE.
Physical Design: A portion of a chip or circuit corresponding to a block module that is laid out separately using a Physical Design tool. It is also referred to as a physical block, layout region, or layout block.
RTL: Acronym for Register Transfer Level.
Characterization: Electrical analysis performed for the purpose of determining typical device performance characteristics and/or parametric limits.


CMOS: Acronym for Complementary Metal Oxide Semiconductor. An MOS technology in which both P-channel and N-channel devices are fabricated on the same die.
Die: A single square or rectangular piece of silicon into which a specific semiconductor circuit has been diffused.
Electromigration: Particle migration in aluminum or copper thin-film or polysilicon conductors at grain boundaries as a result of high current densities. Electromigration can lead to either an open circuit condition in a conductor or a short between adjacent connectors.
Interconnect: The metallization connecting two or more active elements on the surface of a die; also, the wires connecting the die to the package leads.
Timing Window: A timing window specifies the interval, for each circuit node, in which a transition activity is anticipated. For a single clock domain, the time interval can lie within a clock period. There can be more than one interval, or overlapping intervals, depending on the complexity of the paths converging on the node.

Table 1.2 Generic Term Definitions

1.3 Thesis outline and Contribution

There are 3 distinct problems addressed in this work.

The first is average power estimation using probabilistic toggle estimation for multi-million gate designs. Unless specified by the user, the approach calculates switching probabilities as well as switching rates at different nodes in the circuit (including primary inputs). We have studied the switching activity calculation methods in the extensive literature already available and enhanced one of the techniques to meet multi-million gate design needs. This work helps in average dynamic power estimation and also addresses the challenges of toggle estimation, which has varied applications such as peak power estimation, power supply noise analysis and reliability analysis.

The second is dynamic power supply noise estimation. In this regard, a prototype flow is developed in conjunction with the Prime Time STA flow and SPICE to measure power supply noise. The work describes a gate characterization methodology that involves a one-time SPICE simulation, and how the PG network is modeled using the characterized data.

The third problem addressed is power grid analysis where MTCMOS gates are inserted. The work focuses on MTCMOS analysis challenges and the key factors to focus on when a group of logic turns ON from the OFF state. In this regard, a flow is developed to estimate peak currents or optimize MTCMOS resistance and switches.

We restrict our scope to CMOS circuits mapped on a predefined cell library, and we follow the two step paradigm: library modeling, and analysis of the design using the modeled information. Library modeling involves a description of the cells and their functional, structural or electrical behavior as needed for block or design analysis, and happens once for all. Electrical behavior modeling happens through characterization using a circuit simulator (e.g. SPICE [3]).

The document is organized as below. The toggle estimation problem is addressed in Chapter 2. Chapter 3 describes the various power estimation techniques and tools available in industry and compares the power numbers with the above toggle estimation method. Chapter 4 describes power supply noise estimation and Chapter 5 describes MTCMOS power-up analysis. Finally, an extensive list of publications is given at the end for further reference.


2 Toggle Activity Estimation

2.1 Overview

In CMOS technologies, the chip components draw power supply current only during a logic transition, if we ignore the small leakage current. The current is also proportional to the supply voltage value seen by the cell or macro. While this is considered an attractive low-power feature of these technologies, it makes the power estimation and voltage drop highly dependent on the switching activity inside these circuits [11][97]. This means that a more active circuit will consume more current and hence will contribute a higher voltage drop. The activity of a circuit is known by running simulation patterns and analyzing the data. The pattern-dependence problem is serious. Often, the power of a functional block needs to be estimated when the rest of the chip has not yet been designed, or even completely specified. In such a case, very little may be known about the inputs to this functional block, and complete and specific information about its inputs would be impossible to obtain.

This drives the pattern-independent toggle activity estimation problem, often referred to as the vector-less approach. Since the vector-less approach does not require patterns, it is also called ‘static’, whereas the vector based approach is called ‘dynamic’. Table 2.1 compares these 2 approaches.

STATIC | DYNAMIC
Uses probabilistic approach as described in [12] or zero delay simulation based approach. | Uses logic simulation to generate switching activity or SPICE simulation to calculate power.
Vector-less approach. | Vector based approach. Hence quality is as good as the input vectors. Imagine the number of patterns possible for a 100-input block.
Many times gives an upper bound. | Gives accurate result.
Modeling of certain elements (hard macro/complex block) is difficult. | Since it is vector based, functional models can be used during simulation.
Very fast (few minutes to hours). | Very slow (few days to weeks).
Lot of research into products for average power estimation. | Can give instantaneous power.
Synopsys has: Power Compiler | Synopsys has: Power Mill (Nano Sim)

Table 2.1 Comparison of Static vs Dynamic approaches for Power Estimation

This work describes the approach used for toggle frequency estimation and its limitations. Further, it proposes solutions to handle these limitations, which makes the approach usable for big designs.

A few terms are defined below to clarify the discussion:

Transition Density: If a logic signal x(t) makes n(T) transitions in a time interval of length T, then the transition density of x(t) is defined as:
D(x) = n(T)/T, where T is a very long time (ideally infinite)


For large T, D(x) becomes a time-invariant function and hence there is no need to account for temporal correlation.

Toggle Frequency: If a node x toggles n(T) times over a time interval of length T, then the toggle frequency F(x) is defined as:
F(x) = n(T)/(2*T), where T is a very long time (ideally infinite)
For example, if the node is switching at 20 MHz, it is expected that the node will switch 2 times in 50 ns. As can be seen, the toggle frequency can be converted to transition density or switching activity by the following equation:
Toggle density = # of transitions / Period = Switching Activity
All three terms mentioned above are used interchangeably in this document.
It should be noted that the toggle frequency of a node has no direct relation with the clock domain(s) in which the node (or logic) exists. We have used the clock domain frequency as an upper bound on the toggle frequency calculated by our approach.

Signal Probability: The signal probability P(x) at a node x is defined as the average fraction of the clock period in which the steady-state value of x is logic high.

2.2 Toggle Activity Estimation

This section gives an overview of Farid Najm’s work.

The Boolean difference of the output is computed with respect to each input pin. The Boolean difference of a function y (the output) is taken with respect to x (each of the inputs). It is defined as:


$$\frac{dy}{dx} = y|_{x=1} \oplus y|_{x=0} \qquad (1)$$

It was shown in [5] that, if the inputs x_i to the Boolean logic are (spatially) independent, then the density of its output y is given by:

$$D(y) = \sum_{i=1}^{n} P\left(\frac{dy}{dx_i}\right) D(x_i) \qquad (2)$$

In (2), it is assumed that all inputs are independent. This can lead to inaccuracy where primary inputs diverge and then reconverge on the way to primary outputs, since the reconverging signals are not really spatially independent. However, at block level the primary inputs can be considered largely independent, and hence the above approach can be modeled more accurately if the whole block’s Boolean difference is computed.

Given the signal probability and toggle density values at the primary inputs of a logic circuit, a single pass over the circuit, using (2), gives the density at every node. Note that apart from estimating toggle densities at the output node, we also need to calculate output signal probabilities in order to estimate the toggle density of the subsequent logic. This is simple for a two-input AND gate with independent inputs:
P(Y) = P(A)*P(B)
or, for a NAND gate:
P(Y) = 1 - P(A)*P(B)

2.3 Multi-million gate solution

When we apply the above approach, it gives good results for designs which are small, can be analyzed flat and are dominated by combinational logic. Besides, it is not always possible to run flat due to other logistic concerns: blocks are designed first, the rest of the design is being done hierarchically, or there are reusable IPs in the design which do not have a netlist. The approach described in the previous section was extended to handle such requirements.

We also came across several issues while applying this approach to some large designs (>5M gates) and implementing the tool, Toggle Frequency Calculator. In this section, we discuss in detail the solutions that address each of these problems.

2.3.1 Deriving automatic toggle frequency values

1 Primary Input Handling
The toggle rate at a primary input is not known. Since primary inputs are driven externally, there is no easy way to predict their toggle rate. The same is true for the primary input signal probability. Consider Figure 2.1 and Figure 2.2.

Figure 2.1 Schematic of logic circuit 1


Figure 2.2 Schematic of Logic Circuit 2

In the circuits above, the inputs Clk or D going to the block can be primary inputs. Unless the user gives a toggle rate, it is highly difficult to compute one. We used static timing analysis [24][25] specifications to derive these inputs. They are:

Input Delay Specification: A constraint that specifies the minimum or maximum amount of delay from a clock edge to the arrival of a signal at a specified input port. The input delay specification is with respect to a clock that triggers events on that signal.
Clock Specification: Specifies the characteristics of a clock, including the clock name, source, period and waveform.
Mode Specifications: Specify the constant values applied on certain ports or pins to drive timing analysis in a specific mode. This means that these pins or ports are not toggling during the analysis. They also specify the constant value to which the port or pin is tied.

• For clock inputs, we used the toggle rate specified as per the clock specification.
• For non-clock inputs, we used the clock specified on the input delay specification.
• For constant ports, we used a toggle rate of 0 and a static probability based on the tied constant value, i.e. if it is constant 0, the static probability is 0, else it is 1.
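As a rough illustration of the three rules above, the sketch below maps a simplified constraint record to a (toggle frequency, static probability) pair. The `PortConstraint` record and the function name are hypothetical; they are not part of the Toggle Frequency Calculator or of any SDC parser. Two further assumptions are not stated in the text: a non-clock input is assigned the frequency of the clock named in its input delay specification, and every non-constant input gets a static probability of 0.5.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PortConstraint:
    """Hypothetical, simplified view of the constraints on one primary input."""
    # Period of the port's own clock (clock port) or related clock (data port).
    clock_period_ns: Optional[float] = None
    # 0 or 1 when the port is tied by a mode specification, else None.
    constant: Optional[int] = None


def primary_input_activity(c: PortConstraint) -> Tuple[float, float]:
    """Return (toggle frequency in Hz, static probability) for a primary input."""
    if c.constant is not None:
        # Constant ports never toggle; the probability follows the tied value.
        return 0.0, (1.0 if c.constant else 0.0)
    if c.clock_period_ns is None:
        raise ValueError("no clock or input delay constraint: user must give a toggle rate")
    # Clock ports and data ports both take the frequency of the (related) clock.
    # Assumption of this sketch: 0.5 static probability for non-constant inputs.
    return 1e9 / c.clock_period_ns, 0.5
```

A port with neither a constant nor a clock raises an error here, which mirrors the remark that the toggle rate cannot be derived unless the user supplies it.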


A sample SDC file with the above commands is shown in Appendix A. Note that an SDC file is a collection of commands in Tcl format, so we have shown only the commands which are primarily required.

2 Sequential element modeling (e.g. flip-flops, latches)
Sequential elements do not switch arbitrarily whenever the input switches. Hence, we cannot apply the formulas in equations (1) and (2).
We used the following formula to compute the toggle frequency at the output of sequential cells. Note that by sequential cells we mean latches and basic flip-flops, not complex macros, which are dealt with separately.
Qout = min(DataInput, clock/2)
The upper bound of clock/2 is required since we identified certain cases where the data input toggles more than clock/2. For the cases where the data input does not toggle more than clock/2, the output cannot toggle more than the data input. The above equation takes care of both facts.

3 Boolean gates that do not reflect realistic scenarios: exor/exnor gates, mux
Equations (1) and (2) can compute a toggle rate higher than the clock toggle rate for these gates. The result can go even higher if there are more such gates in the transitive fan-out. We found that this is not the case on actual designs, and in many cases it was not the intended behavior. We identified such cells as exceptions and clipped their toggle rate to half of the clock toggle rate.
In similar fashion, we identified mux cells as exceptions and assigned the output toggle rate to the maximum toggle rate of all inputs.
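The following minimal sketch shows why exor gates need the clipping described in item 3, together with the flip-flop and mux rules. It evaluates equation (2) by exhaustive enumeration of the other inputs, which is only practical for small gates. All function names and the example rates (in MHz) are hypothetical and chosen for illustration; this is not the thesis implementation.

```python
from itertools import product


def boolean_difference_probability(func, probs, i):
    """P(dy/dx_i = 1) for spatially independent inputs with signal probabilities probs."""
    n = len(probs)
    others = [j for j in range(n) if j != i]
    total = 0.0
    for bits in product((0, 1), repeat=n - 1):
        assign = [0] * n
        weight = 1.0
        for j, b in zip(others, bits):
            assign[j] = b
            weight *= probs[j] if b else 1.0 - probs[j]
        assign[i] = 1
        y1 = func(*assign)
        assign[i] = 0
        y0 = func(*assign)
        if y1 != y0:  # equation (1): y|x=1 XOR y|x=0
            total += weight
    return total


def output_density(func, probs, densities):
    """Equation (2): sum over inputs of P(dy/dx_i) * D(x_i)."""
    return sum(boolean_difference_probability(func, probs, i) * densities[i]
               for i in range(len(probs)))


def clip_to_half_clock(rate, clock_rate):
    """Exception rule for exor/exnor cells."""
    return min(rate, clock_rate / 2.0)


def flop_output_rate(data_rate, clock_rate):
    """Qout = min(DataInput, clock/2)."""
    return min(data_rate, clock_rate / 2.0)


def mux_output_rate(input_rates):
    """Mux output takes the maximum toggle rate of all inputs."""
    return max(input_rates)


xor_rate = output_density(lambda a, b: a ^ b, [0.5, 0.5], [60.0, 60.0])
and_rate = output_density(lambda a, b: a & b, [0.5, 0.5], [60.0, 60.0])
clipped_xor_rate = clip_to_half_clock(xor_rate, clock_rate=100.0)
```

For the exor gate the Boolean difference with respect to either input is always 1, so the input densities simply add up and each further exor stage in the fan-out raises the estimate again, whereas the AND gate attenuates them.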


4 Complex loop handling
These were handled by breaking the loops. We broke each loop at the first point where we found the loop forming.

5 Unconnected inputs going into logic
This was handled by reverse tracking the first sequential cell encountered in the transitive fan-out of the unconnected inputs. This algorithm gives the clock controlling the toggle rate down the line.
If the unconnected inputs are clocks, we assigned the worst toggle rate of the block itself.

6 Gated clocks or generated clocks
A gated clock is a clock signal that can be modified by logic within the design, such as a clock that can be turned off to save power. A schematic of a gated clock is shown in Figure 2.3.

Figure 2.3 Gated clock example

We made the gating elements transparent for toggle propagation. A clock gating cell is handled like a buffer.

7 Design Constraints: guidelines for realistic, usable toggle activity estimation


Some care is still needed despite all the above solutions. For example, toggle estimation must be done based on the targeted application. This drives certain inputs used in items 1 to 6 above. In the implementation, we kept certain hooks to give control to the user.

2.3.2 Hierarchical Modeling

1. A huge portion of the design is occupied by memories; however, memory output switching activity calculation is not straightforward.
2. Complex functionalities: hard macros.
3. Multi-million gate designs cannot afford flat analysis due to cycle time and the inherent limitations of probabilistic approaches. We needed to devise a method to do hierarchical analysis by modeling sub-blocks and using them as black boxes.

We used the timing modeling approach to handle (1), (2) and (3).

All standard library components are presently modeled in a liberty file [69]. Static timing analysis tools can generate a similar liberty file for blocks after completing the analysis [25]. This file has the following information:
• Input pin to output pin timing arcs
• Setup and hold constraints for the data input and clock input
• Output timing with respect to either an input pin or the related clock

We derive the output toggle frequency f(out) as below.


In case of an input-to-output timing arc:
f(out) = maximum(all controlling input toggle rates)
In case of a clock-to-output timing arc:
f(out) = average switching activity of the clock domain

Figure 2.4 shows the gate level netlist of a design called ‘simple’. Figure 2.5 shows the timing arcs which will be extracted by Prime Time, a leading industry timing analysis tool [25]. The timing arc information will be used to compute the output toggle rate as explained below.

Figure 2.4 Gate Level Netlist for 'simple' design


Figure 2.5 Timing Arcs in extracted model of 'simple' design

There are combinational arcs from i3 to out2 and from i1 to out2. Hence, the output toggle rate at out2 will be controlled by the same clock as i3 or i1. In this case, we assign the maximum of the i3 and i1 toggle rates to the output pin. The other timing arc is clk2->out1. In this case, out1 will be assigned the average switching activity of clk2.

Thus, using timing model information, we generate output toggle rates for memories, complex hard macros or blocks.

2.4 Validation and Results

The above changes were incorporated into executable code and applied to ISCAS89 circuits. The results were compared through power estimation, as discussed in the next chapter.
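As an illustration of the timing-arc rule from Section 2.3.2, the sketch below applies it to the arcs of the 'simple' design. The arc tuples, the function name and all numeric rates are hypothetical; a real flow would read the arcs from the extracted liberty model. The sketch also assumes that each output has only one kind of arc.

```python
def output_toggle_rates(arcs, input_rates, clock_activity):
    """Assign a toggle rate to every output of a black-box timing model.

    arcs: iterable of (kind, source, output), kind being "comb" or "clock".
    input_rates: toggle rate of each data input pin.
    clock_activity: average switching activity of each clock domain.
    """
    rates = {}
    for kind, source, output in arcs:
        if kind == "comb":
            # Input-to-output arc: maximum over all controlling inputs.
            rates[output] = max(rates.get(output, 0.0), input_rates[source])
        elif kind == "clock":
            # Clock-to-output arc: average activity of the clock domain.
            rates[output] = clock_activity[source]
        else:
            raise ValueError(f"unknown arc kind: {kind}")
    return rates


# Arcs of the 'simple' design as described in the text; rates are made up.
simple_arcs = [("comb", "i3", "out2"), ("comb", "i1", "out2"), ("clock", "clk2", "out1")]
simple_rates = output_toggle_rates(
    simple_arcs,
    input_rates={"i1": 10.0, "i3": 25.0},
    clock_activity={"clk2": 40.0},
)
```

With these made-up numbers, out2 follows the faster of i1 and i3, and out1 follows the clk2 domain.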


2.5 Summary

In this work, we address real issues faced by large designs. Automatic toggle generation eases usability and improves accuracy. Hierarchical analysis supports hierarchical design, which is a common methodology for handling design complexity.


3 Power Estimation

3.1 Overview

Accurate power estimates are necessary at various stages of the design in order to make correct architectural, implementation and cost tradeoffs [61]. Architectural level tradeoffs are at a higher level and involve software or instruction level power modeling, or high level activity numbers for different blocks, to make implementation tradeoffs. Many times weighted averages are used to identify the best cost options [62-65]. Once the design is converted to a structural netlist and physical design starts, power estimation mainly drives package design, PG network design and lower level power minimization. In this case, power dissipation is described as below.

P = (A*C*V^2*f) + (τ*A*V*Ishort) + (V*Ileak)

Where
A = activity factor: specifies the amount of switching at various internal nodes of the design. Note that ‘f’ is the clock frequency, which is readily available for most designs. The activity factor specifies how much a node toggles per ‘f’ transitions of the clock. The activity factor can be derived from simulation patterns of the logic.
C = capacitance: interconnect load capacitance or wire capacitance
V = dynamic voltage: voltage at which the logic operates
f = frequency: clock frequency at which the logic operates


Ishort = short-circuit current during switching: during a transition in CMOS logic, both NMOS and PMOS are ON for a moment. During this time, current finds a direct path from power supply to ground. This is called short-circuit current. It depends on the input transition duration of the CMOS gate.
τ = duration of short-circuit current
Ileak = leakage current [72-80][32]

Figure 3.1 defines the various components of power and their relationship or contribution to total power estimation.

• Internal power: cell internal switching power (can vary based on macro size) and short circuit power
• Short circuit power: power dissipated by a momentary short circuit between the P and N transistors of a gate during switching
• Switching power (70-80%): power dissipated by the charging and discharging of the load capacitance, (VDD^2) * Σ over all cells of (Cload(i) * TR(i))
• Static (leakage) power (5%): power dissipated by a gate when it is not switching, Σ over all cells of PCellLeakage(i)
• Dynamic power consists of switching power and short circuit power
• The ASIC flow characterizes libraries for average and leakage power

Figure 3.1 Venn diagram of Power Components
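To make the three terms of the power equation in Section 3.1 concrete, the sketch below evaluates them for one set of made-up numbers. The function name and all values are hypothetical. One assumption goes beyond the text: the short-circuit term is multiplied by f so that it comes out in watts (the equation as printed has no f in that term, which would leave it in joules).

```python
def power_components(A, C, V, f, tau, i_short, i_leak):
    """Return (switching, short_circuit, leakage) power in watts.

    A: activity factor, C: switched capacitance in farads, V: supply in volts,
    f: clock frequency in hertz, tau: short-circuit duration in seconds,
    i_short / i_leak: short-circuit and leakage currents in amperes.
    """
    switching = A * C * V ** 2 * f
    # Assumption of this sketch: "* f" converts energy per cycle into power.
    short_circuit = tau * A * V * i_short * f
    leakage = V * i_leak
    return switching, short_circuit, leakage


# Made-up operating point: 1 nF switched capacitance, 1.2 V, 100 MHz.
switching, short_circuit, leakage = power_components(
    A=0.2, C=1e-9, V=1.2, f=100e6, tau=50e-12, i_short=0.01, i_leak=0.005
)
total = switching + short_circuit + leakage
```

The quadratic dependence of the switching term on V is what makes supply voltage reduction so effective, and also what ties power estimation to the voltage drop seen on the PG network.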


In this work, the above power components and their computation are studied extensively. To address the problem in a systematic manner, power estimation has been simplified in the following way. These assumptions are acceptable given the global analysis that we are considering.

Power supply and ground voltage levels throughout the chip are fixed, so that it becomes simpler to compute the power by estimating the current drawn by every sub-circuit assuming a given fixed power supply voltage. Note that this does not mean that different blocks cannot be at different voltage levels. This allows pre-characterizing library components for the required voltage points.

The circuit is built of logic gates and latches or reusable IPs, and has the popular and well-structured design style of a synchronous sequential circuit. In other words, it consists of flops driven by a common clock and combinational logic blocks whose inputs (outputs) are derived from flop outputs (inputs). It is also assumed that the flops are edge-triggered and, with the use of CMOS design technology, the circuit draws no steady-state supply current. This allows breaking down the average power dissipation of the circuit into 2 components:
• The power consumed by the flops
• The power consumed by the combinational logic blocks

This chapter is organized as below. The next section further explains cell based power analysis. The following section briefly introduces the tools used for comparison against power estimation based on the toggle computation described in the previous chapter. Validation and results are described after that.


3.2 Current approaches to Power Analysis

Cell-based power estimation consists of cell characterization and logic simulation or activity estimation. The characterization phase entails a set of electrical simulations of each library cell for all possible input transitions and for a wide range of fanin and fanout conditions. Timing and power information obtained in this way is used to construct lookup tables for the basic library elements [46][69].

The total leakage power of a circuit is derived by summing the leakage power of the design's constituent library cells:

$$P_{leakageTotal} = \sum_{\forall Cell(i)} P_{CellLeakage}(i) \qquad (3)$$

where $P_{CellLeakage}(i)$ is the leakage power dissipation of each cell. Technology library developers annotate the library cells with the approximate total leakage power dissipated by each cell. There is usually a single static power number per library cell, but sometimes leakage power can depend on the logical condition of the cell. In this case, the library cell is annotated with a state-dependent static power.

A cell's internal power is the sum of the internal power of all of the cell's inputs and outputs as modeled in the technology library:

$$P_{Internal} = \sum_{\forall Pin(i)} E_i \cdot A(i) \cdot f(i) \qquad (4)$$

where $E_i$ is the internal energy of each pin. In practice, the internal energy of a pin is characterized in the technology library and can be accessed by simple table look-up. Depending on the required accuracy, different look-up tables can be provided by the library designers, as explained in Table 3.1.

| Lookup table | Pin direction | Indices |
|---|---|---|
| One-dimensional | Input/Output | Input transition OR output load capacitance |
| Two-dimensional | Output | Input transition and output load capacitance |
| Three-dimensional | Output | Input transition and output load capacitance of the two outputs that have equal or opposite logic values |

Table 3.1 Power Modeling for CMOS gates

The switching power is calculated in the following way:

$$P_{switching} = V_{DD}^2 \sum_{\forall Cell} C_{load}(i) \cdot A(i) \cdot f(i) \qquad (5)$$

where $C_{load}(i)$ is the capacitive load of net i. Without any physical information, the load capacitance $C_{load}(i)$ is calculated using the wire load model of the net and the fanout of the driving pin. Usually, this approach achieves relative accuracy.

Apart from the approaches mentioned above, the following factors are also important for accurate power estimation.


1. Temperature dependency of power. Power consumption in CMOS depends on mobility factors, threshold voltage and doping concentrations. These factors are temperature dependent. Hence power also varies with temperature.
2. Voltage dependency of power. The voltage dependency of power is well known (P = C*V*V*f). This holds for CMOS technology as well: if we model the CMOS component as a capacitor, it is clear that power varies with the supply voltage.
3. Power increases with the frequency of operation. In fact, many designs nowadays have different modes of operation: a high-frequency mode when the device is operational and a low-frequency mode when the device is in standby. The impact of frequency on power estimation has already been discussed in the previous section.
4. Nowadays, most designs contain a significant share of flops or registers. According to one statistic, around 40-50% of the logic of a design consists of flops. If all the flops are clocked throughout operation, the clock network consumes almost 50% of total power. It is therefore sometimes helpful to analyze power consumption on the clock network. This work analyzes the clock power contribution to total power.
5. Process corner also impacts the currents and power consumption. This is especially true for leakage power. A typical VLSI process has a leakage power variation of the order of 4-6 times from the worst process corner to the best.
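Equations (3) to (5), together with the voltage and frequency dependence listed above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the flow used in this work: the `Pin` and `Cell` records, their field names and the numeric values are hypothetical, and A(i)*f(i) is taken to be the toggle rate of pin or net i.

```python
from dataclasses import dataclass


@dataclass
class Pin:
    energy_j: float   # E_i: internal energy per transition (library look-up), joules
    activity: float   # A(i): assumed transitions per clock cycle
    freq_hz: float    # f(i): clock frequency seen by the pin, hertz


@dataclass
class Cell:
    leakage_w: float     # P_CellLeakage(i), watts
    pins: list           # Pin records that contribute internal power
    cload_f: float       # Cload(i): load on the cell's output net, farads
    out_activity: float  # A(i) of the output net
    freq_hz: float       # f(i) of the output net, hertz


def leakage_total(cells):
    """Equation (3): sum of the per-cell leakage power."""
    return sum(c.leakage_w for c in cells)


def internal_power(cell):
    """Equation (4): sum over pins of E_i * A(i) * f(i)."""
    return sum(p.energy_j * p.activity * p.freq_hz for p in cell.pins)


def switching_power(cells, vdd):
    """Equation (5): VDD^2 times the sum of Cload(i) * A(i) * f(i)."""
    return vdd ** 2 * sum(c.cload_f * c.out_activity * c.freq_hz for c in cells)


if __name__ == "__main__":
    # Hypothetical single-cell design at 1.2 V and 100 MHz.
    demo = [Cell(leakage_w=1e-9,
                 pins=[Pin(energy_j=2e-15, activity=0.5, freq_hz=100e6)],
                 cload_f=10e-15, out_activity=0.2, freq_hz=100e6)]
    print("leakage  (W):", leakage_total(demo))
    print("internal (W):", sum(internal_power(c) for c in demo))
    print("switching(W):", switching_power(demo, vdd=1.2))
```

Because the switching term scales with the square of the supply voltage and linearly with frequency, rerunning the sketch with a different `vdd` or `freq_hz` shows the sensitivities described in items 2 and 3.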


Based on the power sensitivity and tool study analysis in this section, we propose a power estimation flow for a typical design cycle, as shown in Figure 3.2 below. Note that the power analysis varies from RTL design to pre-layout netlist to post-layout netlist.

[Figure 3.2 Power Estimation in Design Stages. Design stages: Architecture, RTL, Unplaced Netlist, Placed Netlist, Detailed Route Over. Blocks: Power Estimation (spreadsheet); Forward SAIF* or Frequency Constraints; Toggle Frequency Calculator; Logic Simulation; PIF File Generation; Power Estimation in Power Compiler (wire load, global SPEF, detailed SPEF); RC SPICE Netlist; NanoSim; PrimePower. Paths are marked "Recommended" or "Least Preferred". * SAIF: Switching Activity File based approach.]

3.3 Power analysis Tools

3.3.1 Power Compiler [67]

Formerly known as Design Power, Power Compiler is currently the most widely used Synopsys tool. Power Compiler, typically used during synthesis, performs power optimization as well as power estimation. The tool has static algorithms for calculating switching activity at various circuit nodes and propagating it. It is a known fact that Power Compiler cannot estimate good switching activity for sequential cells. It should also be noted that most ASIC vendors base their cell power modeling on Synopsys Liberty syntax, so it is highly important for single-cell power estimation to be close to the Power Compiler number. The Synopsys Reference Manual on Power Compiler [18] gives the basic power calculation theory and a description of the terms used in its tools.

We used Power Compiler in two modes.
• The first mode was to use Power Compiler as a complete solution for power estimation. In this approach, we generated input switching activity from our vectors and supplied it to Power Compiler. Power Compiler propagated the switching activity based on switching probability and then calculated power. In this method, it used some assignment method for sequential cells, and we went ahead with that because our aim was to verify the default switching activity propagation algorithm of Power Compiler.
• The second mode was to use Power Compiler only as a power calculation engine. In this approach, we generated switching activity at all the nodes using the methodology defined in Chapter 2 and used only the power calculation engine. As mentioned earlier, the power calculation engine is quite accurate, so our aim here was to use the resulting power estimates to evaluate the accuracy of other methods of determining switching activity.

3.3.2 PowerMill (or NanoSim) [4][68]

PowerMill is a Synopsys tool (currently known as NanoSim) with a fast SPICE engine at its core.
It has been identified as correlating well with SPICE for two of the single-cell circuits and one small design. PowerMill is a dynamic, simulation-based tool and hence requires patterns for simulation.

We used PowerMill to calculate average and peak power. The main reason was the runtime advantage of PowerMill compared to SPICE. It should be noted that PowerMill can take a SPICE netlist as input, so switching between PowerMill and SPICE is transparent, if needed.

3.3.3 PrimePower [66]

PrimePower is another offering in the Synopsys power portfolio. It is a dynamic, vector-based solution. The key difference is that PowerMill is a SPICE-based tool whereas PrimePower is a logic-simulation-based tool. In other words, PowerMill is tuned for accuracy and analog kinds of designs, whereas PrimePower is tuned for digital, and specifically ASIC, kinds of designs with reasonably good accuracy. PrimePower has a PLI interface to leading industry simulators, e.g. VCS, Modelsim, Verilog etc. While doing logic verification with these simulators, if we instantiate one call/command, the PLI dumps binary files. These binary files can be used in PrimePower to do power estimation. It should be noted that PrimePower can also do peak power analysis.

We used PrimePower for both average and peak power analysis. The simulator interface used was VCS.

3.3.4 Other Tools

This project used VTRAN for converting vectors to SPICE stimulus.
VTRAN is one of the Synopsys offerings and is a generic translator of vectors from one format to another. It supports all major industry formats as well as the internal formats of many prominent ASIC/EDA vendors.

VCS was used for logic simulation. There is no specific reason for using this simulator except that it is a Synopsys offering and therefore works with PrimePower without major hurdles.

A few TI internal programs were used to set up an automated flow. They are listed below.
1. genFuncTDL: an internal utility to generate random vectors with a specified clock rate.
2. SimOut: a test constraint validation environment.
3. SDFAligner: translates SDF from one simulator's format to a format compatible with another simulator.
4. SigProbGen: converts vectors to input switching activity and calculates probabilities.
5. DREPGEN: generates data compatible with TFC.
6. A translator from ASCII benchmark data to Verilog netlist and SPICE netlist.

3.4 Validation Flow

The validation flow diagram, with its data management and color convention, is shown in Figure 3.3. Some of the key steps are described below.


[Figure 3.3 Power Estimation Validation Flow. Blocks shown in the diagram: DREPGEN, DREP file + data, GENFUNCTDL, random TDL, Verilog netlist, DC scripts, TFC, user frequency file, SIGPROBGEN, translator (Verilog), power estimation, switching activity file, VTRAN cmd, VTRAN, ISCAS89 circuits, SPICE netlist, PowerMill, PWL file, SimOut cfg, translator (SPICE), CMD, SDF, test bench, power, PrimePower, PIF, VCS_PIF, full VCD, comparison and report.]

• White: third-party tools
• Green: automatically generated data or written translator
• Grey: TI tools
• Default: standard inputs/outputs
• Blue: final output
• Ellipse: data file(s)
• Rhombus: process block(s)

Figure 3.4 Legends for Validation Flow


3.4.1 Netlist Setup

Standard industry benchmark circuits, ISCAS89, are used for the validation. The circuits' complexity ranges from 14 gates to 22000 gates. Detailed statistics of the circuits are given in Table 3.2 [71].

To make the validation complete, two single-cell circuits are added for 'micro' level validation.

The ISCAS89 benchmark circuits were mapped to a 130nm technology for analysis. Note that no optimization or synthesis was used while mapping the circuits to the 130nm technology; however, a predetermined set of cells was used:
• 2-, 3- and 4-input AND/NAND gates
• 2-, 3- and 4-input OR and NOR gates
• Buffers and inverters
• 2- and 3-input XOR and XNOR gates
• Flops

3.4.2 Vector Generation

Random vectors were generated for all the ISCAS89 circuits. The number of vectors was based on circuit complexity and number of gates, and varies from 4 vectors to approximately 38000 vectors. The same set of vectors is used for logic simulation and SPICE simulation, as well as for deriving switching activity and static probabilities for the input pins.
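The derivation of switching activity and static probability from a vector set can be illustrated with a short sketch. It assumes that static probability is the fraction of vectors in which a pin is 1 and that switching activity is the fraction of consecutive vector pairs in which the pin changes value; the function names are hypothetical and this is not the internal SigProbGen or genFuncTDL utility.

```python
import random


def random_vectors(num_vectors, num_inputs, seed=0):
    """Generate reproducible random 0/1 vectors, one list per clock cycle."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(num_inputs)]
            for _ in range(num_vectors)]


def input_statistics(vectors):
    """Return one (static_probability, toggle_rate) tuple per input pin."""
    count = len(vectors)
    stats = []
    for pin in range(len(vectors[0])):
        column = [vector[pin] for vector in vectors]
        # Fraction of cycles in which the pin is at logic 1.
        static_prob = sum(column) / count
        # Fraction of consecutive cycle pairs in which the pin changes value.
        toggles = sum(1 for a, b in zip(column, column[1:]) if a != b)
        toggle_rate = toggles / (count - 1) if count > 1 else 0.0
        stats.append((static_prob, toggle_rate))
    return stats


if __name__ == "__main__":
    vectors = random_vectors(num_vectors=1000, num_inputs=4)
    for pin, (prob, rate) in enumerate(input_statistics(vectors)):
        print(f"pin {pin}: static probability {prob:.3f}, toggle rate {rate:.3f}")
```

Using the same seeded vector set for every tool mirrors the requirement above that logic simulation, SPICE simulation and activity derivation all see identical stimulus.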


3.4.3 Interconnect setup

All the circuits are treated as synthesized Verilog netlists, and hence parasitic information was not available. To make the comparison more realistic, no-load modes were used in Power Compiler and in SPICE simulation. The logic simulation was based on SDF generated by Synopsys tools.

3.5 Validation and Results

The complete data from the different tools are shown in Table 3.5. Table 3.2 describes the circuits used for benchmarking. Table 3.3 compares run time between the dynamic method and the modified toggle computation method for some of the big design blocks. Table 3.4 shows power estimation for the clock network vs. total power estimation. All the power data is dynamic power in uW.
• The power numbers mainly reflect the cell internal power and the switching power due to gate input capacitances only, as no interconnects were assumed.
• All the experiments are done at the nominal operating point, i.e. normal process, 25 °C temperature and 1.2 V supply (nominal voltage).
• Clock network power is around 50% of total dynamic power for many circuits, but this is not true in all cases: in Table 3.4 the share ranges from about 8% (s1494) to 83% (s13207).
• The run time reduction of the static approach is large. Taking the minutes and CPU hours reported in Table 3.3 at face value, it ranges from roughly 460 times (s13207) to 2500 times (s35932) and exceeds 1000 times for the largest designs.
• PrimePower-reported power is optimistic relative to PowerMill in many cases. This was not expected and is being looked into.
• TFC is within 30% of PowerMill-reported power. However, there are certain exceptions where it reports more than 30% optimistic power or more than 50% pessimistic power.
• Power Compiler is more than 50% pessimistic in most of the cases.
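The percentage columns of Tables 3.4 and 3.5 appear to be a clock share (CLK power divided by total power) and relative deviations of one estimate from a reference. The sketch below states that reading as code; the formulas are an inference from the tabulated values, not something the text spells out, and the function names are hypothetical. The numbers used are the s27 row.

```python
def percent_deviation(estimate, reference):
    """Relative deviation of an estimate from a reference, in percent.

    Positive means the estimate is pessimistic (higher than the reference),
    negative means it is optimistic.
    """
    return (estimate - reference) / reference * 100.0


def clock_share(clk_power, total_power):
    """Clock network power as a percentage of total dynamic power."""
    return clk_power / total_power * 100.0


if __name__ == "__main__":
    # s27 row: Power Compiler, proposed approach, PrimePower, PowerMill (uW).
    power_compiler, proposed, prime_power, power_mill = 12.69, 10.91, 10.03, 9.36
    print("new vs Power Compiler :", round(percent_deviation(proposed, power_compiler), 2))
    print("Power Compiler vs PM  :", round(percent_deviation(power_compiler, power_mill), 2))
    print("new approach vs PM    :", round(percent_deviation(proposed, power_mill), 2))
    print("PrimePower vs PM      :", round(percent_deviation(prime_power, power_mill), 2))
    # s27 clock power and total power from Table 3.4 (uW).
    print("clock share           :", round(clock_share(6.39, 10.91), 2))
```

The results agree with the tabulated percentages to within a few hundredths of a percentage point, which would be consistent with the tables having been computed from unrounded power values.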


| Design name | IN | OUT | Flops | Boolean (gates+inv) |
|---|---|---|---|---|
| s111 | 8 | 1 | 0 | 8 |
| s1196 | 14 | 14 | 18 | 388+141 |
| s1238 | 14 | 14 | 18 | 428+80 |
| s13207 | 31 | 121 | 669 | 2573+5378 |
| s13207_1 | 62 | 152 | 638 | 2573+5378 |
| s1423 | 17 | 5 | 74 | 490+167 |
| s1488 | 8 | 19 | 6 | 550+103 |
| s1494 | 8 | 19 | 6 | 558+89 |
| s15850 | 14 | 87 | 597 | 3448+6324 |
| s15850_1 | 77 | 150 | 534 | 3448+6324 |
| s208_1 | 10 | 1 | 8 | 66+38 |
| s27 | 4 | 1 | 3 | 8+2 |
| s298 | 3 | 6 | 14 | 75+44 |
| s344 | 9 | 11 | 15 | 101+59 |
| s349 | 9 | 11 | 15 | 104+57 |
| s35932 | 35 | 320 | 1728 | 12204+3861 |
| s382 | 3 | 6 | 21 | 99+59 |
| s38417 | 28 | 106 | 1636 | 8709+13470 |
| s38584 | 12 | 278 | 1452 | 11448+7805 |
| s38584_1 | 38 | 304 | 1426 | 11448+7805 |
| s386 | 7 | 7 | 6 | 118+41 |
| s4 | 2 | 1 | 1 | 0 |
| s400 | 3 | 6 | 21 | 106+58 |
| s420_1 | 18 | 1 | 16 | 140+78 |
| s444 | 3 | 6 | 21 | 119+62 |
| s5 | 2 | 1 | 0 | 1+0 |
| s510 | 19 | 7 | 6 | 179+32 |
| s526 | 3 | 6 | 21 | 141+52 |
| s526n | 3 | 6 | 21 | 140+54 |
| s5378 | 35 | 49 | 179 | 1004+1775 |
| s641 | 35 | 24 | 19 | 107+272 |
| s713 | 35 | 23 | 19 | 139+254 |
| s820 | 18 | 19 | 5 | 256+33 |
| s832 | 18 | 19 | 5 | 262+25 |
| s838_1 | 34 | 1 | 32 | 288+158 |
| s9234 | 19 | 22 | 228 | 2027+3570 |
| s9234_1 | 36 | 39 | 211 | 2027+3570 |
| s953 | 16 | 23 | 29 | 311+84 |

Table 3.2 ISCAS89 circuit description

| Design | TFC + Power Compiler runtime (minutes) | PowerMill runtime (CPU hours) |
|---|---|---|
| s13207 | 3 | 23 |
| s13207_1 | 3 | 24 |
| s15850 | 3 | 25 |
| s15850_1 | 3 | 26 |
| s35932 | 6 | 250 |
| s38417 | 6 | 189 |
| s38584 | 7 | 205 |
| s38584_1 | 7 | 212 |

Table 3.3 Runtime comparison between vectorless and SPICE

| Design name | CLK power (uW) | Total power (uW) | %CLK/Total |
|---|---|---|---|
| s4 | 2.13 | 3.35 | 63.6 |
| s27 | 6.39 | 10.91 | 58.61 |
| s208_1 | 17.05 | 30.43 | 56.04 |
| s298 | 29.84 | 54.12 | 55.14 |
| s344 | 31.97 | 61.11 | 52.32 |
| s349 | 31.97 | 61.14 | 52.29 |
| s382 | 47.04 | 91.73 | 51.28 |
| s386 | 12.79 | 32.28 | 39.62 |
| s400 | 47.04 | 94.51 | 49.77 |
| s420_1 | 34.1 | 53.75 | 63.46 |
| s444 | 44.76 | 84.83 | 52.77 |
| s510 | 12.79 | 29.43 | 43.46 |
| s526n | 44.76 | 85.94 | 52.08 |
| s526 | 44.76 | 85.89 | 52.11 |
| s641 | 40.5 | 117.38 | 34.5 |
| s713 | 40.5 | 123.07 | 32.91 |
| s820 | 10.66 | 72.29 | 14.74 |
| s832 | 10.66 | 72.5 | 14.7 |
| s838_1 | 68.21 | 99.96 | 68.24 |
| s953 | 61.81 | 102.37 | 60.38 |
| s1494 | 12.79 | 158.7 | 8.06 |
| s1488 | 12.79 | 158.24 | 8.08 |
| s1423 | 157.73 | 356.1 | 44.29 |
| s1238 | 38.37 | 150.51 | 25.49 |
| s1196 | 38.37 | 151.17 | 25.38 |
| s5378 | 381.55 | 751.75 | 50.75 |
| s9234_1 | 449.75 | 891.59 | 50.44 |
| s9234 | 485.99 | 632.35 | 76.85 |
| s13207_1 | 1359.9 | 1908.3 | 71.26 |
| s13207 | 1426 | 1718 | 83 |
| s15850 | 1272.5 | 1971.3 | 64.55 |
| s15850_1 | 1138.2 | 2630.3 | 43.27 |
| s38417 | 3289.1 | 4659.3 | 70.59 |
| s35932 | 3450.5 | 9654 | 35.74 |
| s38584_1 | 2920.7 | 8339.6 | 35.02 |
| s38584 | 2966.3 | 8057.2 | 36.82 |

Table 3.4 Clock Power vs. Total Power

| Design name | Power Compiler (uW) | Proposed approach (uW) | PrimePower (uW) | PowerMill (uW) | % new power / Power Compiler | % Power Compiler / PowerMill | % new approach / PowerMill | % PrimePower / PowerMill |
|---|---|---|---|---|---|---|---|---|
| s111 | 5.5 | 2.23 | 0 | 2.87 | -59.42 | 91.62 | -22.24 | -100 |
| s4 | 3.72 | 3.35 | 2.93 | 2.79 | -9.95 | 33.43 | 20.16 | 4.95 |
| s5 | 2.49 | 1.34 | 0.47 | 1.72 | -46.12 | 44.66 | -22.05 | -72.61 |
| s27 | 12.69 | 10.91 | 10.03 | 9.36 | -14.01 | 35.54 | 16.55 | 7.14 |
| s208_1 | 44.91 | 30.43 | 22.4 | 29.03 | -32.25 | 54.7 | 4.81 | -22.84 |
| s298 | 67.33 | 54.12 | 40.05 | 41.42 | -19.62 | 62.57 | 30.67 | -3.31 |
| s344 | 85.24 | 61.11 | 56.55 | 65.7 | -28.31 | 29.74 | -6.99 | -13.93 |
| s349 | 86.48 | 61.14 | 56.66 | 65.86 | -29.3 | 31.31 | -7.16 | -13.97 |
| s382 | 83.57 | 91.73 | 52.75 | 53.15 | 9.76 | 57.25 | 72.6 | -0.75 |
| s386 | 75.15 | 32.28 | 42.78 | 48.46 | -57.05 | 55.07 | -33.4 | -11.73 |
| s400 | 83.96 | 94.51 | 52.77 | 53.3 | 12.58 | 57.51 | 77.32 | -1 |
| s420_1 | 70.19 | 53.75 | 45.6 | 44.12 | -23.43 | 59.11 | 21.83 | 3.37 |
| s444 | 83.79 | 84.83 | 52.9 | 53.64 | 1.24 | 56.22 | 58.15 | -1.38 |
| s510 | 64.68 | 29.43 | 18.23 | 47.43 | -54.51 | 36.36 | -37.96 | -61.57 |
| s526n | 85.2 | 85.94 | 53.54 | 53.89 | 0.87 | 58.1 | 59.48 | -0.65 |
| s526 | 85.41 | 85.89 | 53.67 | 54.08 | 0.57 | 57.93 | 58.83 | -0.75 |
| s641 | 159.77 | 117.38 | 72.37 | 93.34 | -26.53 | 71.17 | 25.76 | -22.46 |
| s713 | 162.62 | 123.07 | 74.51 | 96.57 | -24.32 | 68.41 | 27.44 | -22.84 |
| s820 | 119.02 | 72.29 | 47.96 | 73 | -39.27 | 63.04 | -0.98 | -34.3 |
| s832 | 119.18 | 72.5 | 48.03 | 73.34 | -39.17 | 62.51 | -1.14 | -34.51 |
| s838_1 | 126.27 | 99.96 | 93.41 | 75.78 | -20.84 | 66.63 | 31.91 | 23.27 |
| s953 | 159.75 | 102.37 | 85.98 | 88.5 | -35.92 | 80.51 | 15.67 | -2.85 |
| s1494 | 187.71 | 158.7 | 98.28 | 136.47 | -15.45 | 37.54 | 16.29 | -27.99 |
| s1488 | 203.99 | 158.24 | 98.16 | 135.83 | -22.42 | 50.18 | 16.5 | -27.73 |
| s1423 | 406.56 | 356.1 | 244.9 | 278.03 | -12.41 | 46.23 | 28.08 | -11.92 |
| s1238 | 302.45 | 150.51 | 128.2 | 151.55 | -50.24 | 99.57 | -0.69 | -15.41 |
| s1196 | 296.7 | 151.17 | 126.5 | 151.13 | -49.05 | 96.33 | 0.03 | -16.3 |
| s5378 | 1041.2 | 751.75 | 584.3 | 688.62 | -27.8 | 51.2 | 9.17 | -15.15 |
| s9234_1 | 1480.6 | 891.59 | 704.7 | 812.36 | -39.78 | 82.26 | 9.75 | -13.25 |
| s9234 | 1300.4 | 632.35 | 508.2 | 472.82 | -51.37 | 175.03 | 33.74 | 7.48 |
| s13207_1 | 2853 | 1908.3 | 1533 | 1677.46 | -33.11 | 70.08 | 13.76 | -8.61 |
| s13207 | 2572 | 1718 | 1436 | 1418.89 | -33.2 | 81.27 | 21.08 | 1.21 |
| s15850 | 2640.3 | 1971.3 | 1400 | 1361.52 | -25.34 | 93.92 | 44.79 | 2.83 |
| s15850_1 | 3272.6 | 2630.3 | 1539 | 1945.25 | -19.63 | 68.24 | 35.22 | -20.88 |
| s38417 | 7654.6 | 4659.3 | 4352 | 4688.74 | -39.13 | 63.26 | -0.63 | -7.18 |
| s35932 | 17606 | 9654 | 6789 | 8513.75 | -45.17 | 106.79 | 13.39 | -20.26 |
| s38584_1 | 12031.7 | 8339.6 | 5630 | 6738.36 | -30.69 | 78.56 | 23.76 | -16.45 |
| s38584 | 10951.4 | 8057.2 | 4261 | 6235.13 | -26.43 | 75.64 | 29.22 | -31.66 |

Table 3.5 Power Estimation across various tools

3.6 Power estimation applications

Once the power estimation has been done, the data can be used in a post-processing step to investigate various circuit properties. Note that some of these are applications of the average toggle calculation method described above.

3.6.1 Average power/ground bus currents

Consider the problem of computing the average current in the power or ground bus branches. This can be solved using toggle densities and the average power consumption of each library cell.


We can approximate the average power of each cell based on toggle densities, and approximate the power or ground network as distributed or lumped R and C. By simulating this power network in SPICE, one can estimate the average power/ground bus currents [31].

3.6.2 Average power dissipation

As a direct consequence of the power estimation described above, it should be clear that the analysis gives the overall average power dissipation, summed over all circuit nodes.

3.6.3 Electromigration failures

Electromigration [93][94] is a major reliability problem caused by the transport of atoms in a metal line due to electron flow. Under persistent current stress, this can cause deformation of the metal, leading to either short or open circuits. Electromigration failure depends on the average and root mean square (RMS) current densities in the metal leads. The average current in each metal lead can be estimated by the method described in this chapter, so potential electromigration problems can be addressed in either the power network or the signal leads.

3.6.4 Power Routing

It has been noticed that inaccurate power estimation is normally the root cause of 'over design' of the power network. With an accurate power number, it is possible to have a dense power grid on one block and a light power grid on another, which also reduces the overall IR drop problem.


3.6.5 Gate Oxide Integrity Analysis

The reduction in gate oxide thickness in submicron technologies has resulted in an increased electric field at the gate oxide. An excessive electric field (> 5MV/cm) can damage the gate oxide and also reduce the Time Dependent Dielectric Breakdown (TDDB) strength. Excessive electric fields are caused by undershoot and overshoot at the gate terminal. A high duty cycle of overshoots/undershoots will result in permanent failure of the transistors. The Failure in Time (FIT) rate represents the probability of device failure in 10 years of operation. In this regard, the duty cycles of signal input pins are measured based on toggle density.

3.7 Summary

Based on our validation flow and analysis of results, there is a way to estimate a good power number with minimum run time, as shown in Table 3.3. However, the toggle frequency calculation method has certain limitations: it is based on probabilistic algorithms, has no timing information and does not perform any logic simulation. Some 'power' designers may be interested in good accuracy at the cost of run time. We have proposed a power estimation flow that caters to the needs of 'power' users as well as normal users.


4 Power Supply Noise Analysis

4.1 Overview

Figure 4.1 below gives a representative voltage waveform at an internal node of a digital design while it is operational. The fluctuations arise from switching CMOS logic and from inductances in the power supply, package and interconnect.

[Figure 4.1 Voltage over time representation at an internal design node. Axes: voltage vs. time. Annotations: max voltage, min voltage, time-average IR drop, "increases propagation delay".]

The dips in voltage are due to sudden changes in current during logic switching, since inductance adds di/dt noise. Apart from that, in CMOS the currents while logic switches are higher than the average currents used for average IR drop analysis. This causes an additional i(t)*R drop, where R is the resistance of the power grid. The total drop seen at the sink of the current is:

deltaV = L(di/dt) + i(t)*R
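The expression for deltaV can be evaluated numerically on a sampled current waveform. The sketch below is a minimal illustration that assumes a single lumped inductance L and resistance R between the ideal supply and the current sink and approximates di/dt with a backward difference; the function name and the sample values are hypothetical.

```python
def supply_drop(times_s, currents_a, inductance_h, resistance_ohm):
    """Return deltaV(t) = L*(di/dt) + i(t)*R for each sample.

    di/dt is a backward difference between consecutive samples and is taken
    as zero for the first sample, where no earlier point exists.
    """
    drops = []
    for k, current in enumerate(currents_a):
        if k == 0:
            didt = 0.0
        else:
            didt = (current - currents_a[k - 1]) / (times_s[k] - times_s[k - 1])
        drops.append(inductance_h * didt + current * resistance_ohm)
    return drops


if __name__ == "__main__":
    # Hypothetical current ramp: 0 to 1 mA in 1 ns, then held constant.
    times = [0.0, 1e-9, 2e-9]
    currents = [0.0, 1e-3, 1e-3]
    for t, dv in zip(times, supply_drop(times, currents, 1e-9, 10.0)):
        print(f"t = {t:.1e} s  deltaV = {dv * 1e3:.2f} mV")
```

In this example the inductive term contributes only while the current is changing, whereas the resistive term persists for as long as current flows.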


The most popular technique to control this IR drop is to insert decoupling capacitors in the design. Figure 4.2 shows an electrical representation of the inductance and the dynamic switching of a cell that cause power supply noise, and of the decoupling capacitors that help meet this instantaneous current need.

[Figure 4.2 Schematic circuit for instantaneous voltage drop analysis. Labeled elements: Vdd, Vss, Vdd pin, Vss pin, Vdd net, Vss net, cell, Idd, Iss, Lpd, Rpd, Cpd, Lps, Rps, Cps, Rnd, Cnd, Rns, Cns, Cdecap.]

This work focuses on computing the instantaneous IR drop (deltaV), or the actual voltage (Vdd-deltaV), at a cell's power/ground ports. Vdd is an ideal voltage source here and is constant over time. Here too, our approach is focused on cell-based designs. The next section explains the cell characterization and modeling needed for block-level analysis. Using this characterization, we build a power grid network that can be simulated. This is discussed in section 4.3. Section 4.4 explains the prototype flow we developed, and the chapter ends with validation results and a conclusion.

4.2 Cell Characterization

Definition: Cell characterization is a process through which data is prepared for every cell for usage in the design. The process involves SPICE characterization as well as post-processing of data. Characterization and its usage need to be in complete alignment.

4.2.1 Current Characterization Methodology

For instantaneous power grid analysis, we analyzed cell peak current waveforms. Figure 4.3 shows the transient waveforms of an inverter cell simulated at 250MHz (VDD is the power pin and VSS is the ground pin). It shows the voltage waveforms of the primary input and primary output of the inverter (VA, VY), the current waveforms at the VDD and VSS ports (IRVDD, IRVSS), and the voltage waveforms at the VDD and VSS ports (VVDD_INV1, VVSS_INV1).

Note that the current waveforms at VDD and VSS are similar except for one difference: the transition direction. The current waveform at VDD when the output is charging is the same as the current waveform at VSS when the output is discharging, and vice versa. This is true for the inverter in this case, but it can vary if the cell is not properly balanced. In any case, however, the amount of charge supplied/discharged will be constant, since it is governed by the load connected at the output.
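The statement that the transferred charge is fixed by the output load can be checked on a sampled current waveform by integrating it and comparing against Q = Cload * VDD. The sketch below assumes a full-swing transition, ignores short-circuit charge, and uses a triangular current pulse purely as an illustrative shape; none of the names or values come from the characterization data.

```python
def delivered_charge(times_s, currents_a):
    """Trapezoidal integral of i(t) over the sampled waveform, in coulombs."""
    charge = 0.0
    for k in range(1, len(times_s)):
        average = 0.5 * (currents_a[k] + currents_a[k - 1])
        charge += average * (times_s[k] - times_s[k - 1])
    return charge


def triangular_peak(cload_f, vdd, width_s):
    """Peak of a triangular pulse of the given base width carrying Q = C*V."""
    return 2.0 * cload_f * vdd / width_s


if __name__ == "__main__":
    cload, vdd = 10e-15, 1.2          # hypothetical 10 fF load at 1.2 V
    for width in (100e-12, 50e-12):   # narrower pulse models a faster slew
        peak = triangular_peak(cload, vdd, width)
        times = [0.0, width / 2, width]
        currents = [0.0, peak, 0.0]
        print(f"width {width:.0e} s  peak {peak:.2e} A  "
              f"charge {delivered_charge(times, currents):.2e} C")
```

Under this simplified pulse shape, halving the pulse width doubles the peak current while the integrated charge stays at Cload * VDD.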


[Figure 4.3 Inverter waveforms measured at different nodes. Annotations on the plot:
• Output is rising. There is notable symmetry for rise/fall. This helps us to characterize only one current and do the analysis at the power/ground network.
• Output is rising. This alignment is preserved for better results during current waveform generation. The same is true for output falling.]


In this work, we have maintained the temporal relationship between the power and ground current waveforms and decoupled the simulations, i.e. they are simulated separately and the IR drop results are merged.

We performed simulations and arrived at the following conclusions.
• The shape of the current waveform remains the same across different frequencies if the same patterns are used. Note that the overall simulation time decreases when frequency increases for the same set of patterns. This is not a surprise, as the load being charged and discharged is the same during each transition for the same slew and the same set of patterns. In the case of a CMOS gate, the shape of the current waveform remains the same up to very high frequencies (period ~= 3 times the 0-100% slew). (Appendix C)
• The slew or transition time (used interchangeably) plays a big role in determining the peak power of cells. When the slew decreases, the width of the current spike decreases and its peak increases. Figure 4.4 and Figure 4.5 show the peak power variation for different input transition times. Note the variation of ~2x for the inverter and ~1.5x for the 2-input NAND gate.


[Figure 4.4 Transition time vs. peak power for inverter]

[Figure 4.5 Transition time vs. peak power for NAND gate]

• Peak power varies with output load. The change is as expected, since the load capacitance together with the MOS resistance produces an exponential voltage ramp. The peak depends largely on the MOS ON resistance and on the initial voltage. Figure 4.6 and Figure 4.7 plot the variation for an AND gate and an OR gate. Note that the variation is only ~1-3% across a wide range of loads.


[Figure 4.6 Load vs. peak power for AND gate]

[Figure 4.7 Load vs. peak power for OR gate]

• For cell characterization, pattern dependency is not critical. This is expected, as most of the circuits have 1-2 levels of logic, where each pattern activates/deactivates most of the transistors. However, as cells become larger, some logic may not be activated during switching. In this case, it is important to choose useful patterns for cell current characterization.
• For cell characterization, transition direction matters for a given power supply. This means that it is important to capture the output rise and fall transitions during characterization and to apply them appropriately during use (Figure 4.3). In our case, we capture the rise and fall transitions together and use them for analysis, making the proposed approach direction independent.

[Figure 4.8 State Dependency on cell switching]

We also established a few corollaries that will be used later in the discussion.
1. Slew impacts the short-circuit current of the device. For a multi-stage block, slew impacts the first stage the most, and the overall current waveform is largely unaffected by this change. The impact varies from low to high as the number of design stages decreases.
2. Glitches or hazardous transitions can contribute to the peak current need of the circuit. Modeling glitches in non-SPICE analysis is not trivial. It is desirable that glitches are reduced by robust design practices. In this work, it is assumed that there are no glitches in the design.


3. The temporal correlation between different inputs influences the characterization data considerably. This is due to simultaneous switching. We used the least-affecting combination, i.e. zero skew between multiple inputs, in our analysis; this is also the worst case (Figure 4.8).

4.2.2 Current Characterization Flow

Current source generation involves determining the time-variant current waveform for each cell. This is the current waveform seen at the VDD pin of the cell when the cell output is rising or falling. The flow is shown in Figure 4.9. A sample SPICE deck is shown in Appendix D. The PERL program that takes its input from the SPICE simulation offers the following options. In our case, we took the last option with 75 ps as the sampling interval.
1. full: All current data available in the punch file is written out in two-column format, the first column giving the simulation time and the second column giving the current value at each simulation time instance.
2. fixed: The total simulation time is divided into 8192 points, and the current value at these 8192 time-values is obtained either directly, if available, or by interpolation.
3. Interval filtered: An interval in picoseconds is specified, and from it the program derives the time-values for which data is expected. Again, the current data corresponding to these time-values is obtained directly, if available, or by interpolation.
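The "interval filtered" option can be illustrated with a short Python sketch. This is not the PERL program used in this work; the function name `resample_interval` and the sample waveform are hypothetical, and linear interpolation between neighbouring simulation points is an assumption, since the text only says "by interpolation".

```python
from bisect import bisect_left


def resample_interval(times_ps, currents, interval_ps=75.0):
    """Sample a (time, current) waveform every interval_ps picoseconds.

    times_ps must be strictly increasing. A stored value is used when a
    sample exists at the requested time; otherwise the value is linearly
    interpolated between the two neighbouring samples (an assumption).
    """
    if len(times_ps) != len(currents) or len(times_ps) < 2:
        raise ValueError("need at least two matching (time, current) samples")
    out = []
    n = 0
    t = times_ps[0]
    while t <= times_ps[-1]:
        k = bisect_left(times_ps, t)      # first stored time >= t
        if times_ps[k] == t:
            i = currents[k]               # value available directly
        else:
            t0, t1 = times_ps[k - 1], times_ps[k]
            i0, i1 = currents[k - 1], currents[k]
            i = i0 + (i1 - i0) * (t - t0) / (t1 - t0)
        out.append((t, i))
        n += 1
        t = times_ps[0] + n * interval_ps  # avoid accumulating round-off
    return out


# Hypothetical triangular current pulse, times in ps, current in amps.
samples = resample_interval([0, 100, 200, 300], [0.0, 1.0e-3, 0.0, 0.0])
```

The "fixed" option differs only in how the requested time-values are chosen (8192 equally spaced points instead of a fixed interval).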


[Figure 4.9 Cell Characterization Flow: Cell SPICE Deck -> SPICE simulation @ 10 MHz -> Perl processing to sample VDD currents]

Using the above methodology, we characterized all the cells instantiated in the ISCAS89 circuits.

4.3 Power Grid network modeling

This section describes how the Power Grid network is built using the cell characterization data. The Power Grid presents resistance, capacitance and inductance to the switching logic. Figure 4.10 shows the schematic of a typical power grid [45]. The power and ground supply pins are modeled as ideal voltage sources. Methodologies, however, vary widely in terms of current source modeling and capacitance estimation [50 51 52 53]. This work also focuses on current source modeling, which is described in the next subsection.


[Figure 4.10 Power Grid Modeling. Annotation: each such arm represents resistance…]

Once the power grid is determined along with its capacitance and current source distribution, it can be realized as a matrix data structure and solved to compute the voltages at the desired nodes, specifically the nodes where cell components are connected, as below.

V * Y = I

where V is the voltage value at each node, Y is the admittance (inverse resistance) of each PG segment, and I is the current that we have characterized.

Or:

v(t) = Z * i(t)    (Z = R – jW for power network)
V(w) = Z(w) * I(w)
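As a hedged illustration of the matrix formulation, the sketch below assembles the admittance matrix of a small, purely resistive mesh and solves Y * V = I for the node voltages. It is a DC simplification, not the solver used in this work (which relies on SPICE): capacitance and inductance are ignored, the pad is assumed to connect to the ideal supply through one arm resistance, and all function names and numbers are hypothetical.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-18:
            raise ValueError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mesh_voltages(rows, cols, r_arm, vdd, pad_nodes, loads):
    """Node voltages of a rows x cols resistive VDD mesh.

    pad_nodes: set of (row, col) nodes tied to the ideal supply through
               one arm resistance (assumption made for this sketch).
    loads:     dict (row, col) -> current drawn from that node, in amps.
    """
    idx = {(r, c): r * cols + c for r in range(rows) for c in range(cols)}
    n = rows * cols
    g = 1.0 / r_arm
    y = [[0.0] * n for _ in range(n)]
    i_vec = [0.0] * n
    for (r, c), k in idx.items():
        # Stamp the arm to the node below and the node to the right.
        for nb in ((r + 1, c), (r, c + 1)):
            if nb in idx:
                j = idx[nb]
                y[k][k] += g
                y[j][j] += g
                y[k][j] -= g
                y[j][k] -= g
        if (r, c) in pad_nodes:
            y[k][k] += g
            i_vec[k] += g * vdd
        i_vec[k] -= loads.get((r, c), 0.0)
    v = solve_linear(y, i_vec)
    return {node: v[k] for node, k in idx.items()}


# Hypothetical 3x3 mesh, 0.5 ohm arms, one pad, one 2 mA load in a corner.
volts = mesh_voltages(3, 3, 0.5, 1.2, {(0, 0)}, {(2, 2): 2.0e-3})
```

Replacing the constant load currents with the time-varying characterized currents, and adding the capacitive and inductive elements, turns this static solve into the transient problem that is handed to SPICE.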


In our work, we computed resistances and capacitances based on technology data for the 130nm node. A sample program was written to realize the mesh structure shown in Figure 4.10 for the VDD network, and VSS was taken as ideal ground. This is not an issue, since all the VSS network elements can be lumped into the VDD network. After determining the Power Grid current waveform, we solved the network through SPICE simulations.

4.3.1 Power Grid Current Waveform Modeling

Power Grid current waveform modeling involves the following steps:
1. Compute the toggle frequency for each instance in the design as proposed in Chapter 2.
2. Using the characterized current data for the cell, transform the current data to the toggle frequency computed above.
3. Compute the input arrival time for each instance in the design. This is done using Static Timing Analysis. Compute the shift required in the current waveform with reference to the clock edge. For simplicity, we have assumed 0 skew for the clock network.
4. Hook up the current sources and solve the PG network.
5. Determine the PG model simulation time.

These are explained further below.

1 Read the characterized data.


Characterized data was transformed from the time domain to the frequency domain. Sampling is done at a fixed frequency, much higher than common design frequencies: 1000/75 ~ 13.33 GHz (one sample every 75 ps), and the pairs [t, i(t)] are stored.

I(t) = i(0)d(0) + i(Ts)d(Ts) + i(2*Ts)d(2*Ts) + … (N samples)

Where,
'Ts' is the sampling interval, 75 ps in this case (a sampling frequency of 13.33 GHz)
i(t) is the current value at time 't'
d(t) = 1 when t = n*Ts, else 0; n ranges over the N sample indices 0, …, N-1
For computational efficiency N may be chosen as a power of 2: N = 2 ** p (p is an integer)

Now, the Fourier transform of the samples is performed:

I[k] = sum over n = 0, …, N-1 of i[n] * exp(-j*2*pi*k*n/N)

2 Model the current waveform for each Boolean gate at the computed toggle frequency.
• A compression factor (M) is defined to meet the targeted frequency of the cell under consideration.
M = targeted frequency / cell characterized frequency (10MHz in this work)
• The transformation preserves the base of the current transients. This would not have been possible in the time domain while scaling frequency; hence the need for the frequency domain transformation. Appendix E shows the waveform generated after transformation from the 1 MHz waveform. As can be seen, the 1GHz waveform is not as expected. This is not an issue since, apart from clock cells, other cells are not expected to switch at 1


GHz average toggle frequency. Besides, this can be handled by characterizing clock cells at a higher frequency.
• Current data is compressed by the compression factor.
• When the data was transformed to the frequency domain and the frequency spectrum was examined, the notable point was a large share of lower-frequency components, representing the approximate triangles of the SPICE waveform, while most of the medium to high frequency components were zero, representing the zero or low-leakage portion of the power waveform.

3 Attach the current waveform at the PG node where this cell's power or ground pin is connected.

4 Compute the total simulation time
• If all instances in the design are driven with their respective waveforms, the matrix solver gives the peak voltage drop value over the interval from 0 to the LCM of the periods of all gates.
• Computing the least common multiple (LCM) is computationally intensive for most designs. Even if it is computed, the resulting simulation time is prohibitively high, and so is the memory requirement.
• In practice we use a smaller number to ensure a shorter simulation time and more realistic data. We computed the simulation time as below.

Tstop = f(minimum toggle frequency, max delay)
      = time period of the minimum-frequency cell + maximum delay of all cell outputs
      = 2000 ns (for a minimum frequency of 1 MHz and 1000 ns as worst delay)

5 Establishing temporal relationship


Perform timing analysis and, based on the input arrival times, shift the current waveforms along the time axis. The purpose of timing analysis is to establish the temporal correlation between the various nodes of the design: even when two or more nodes have the same toggle frequency, their instances are not switched simultaneously unless required. In this work, we have chosen to work with toggle frequency and delay instead of timing windows [28][45]. The reasons are:
• Not all circuit nodes switch in every clock cycle. Average activity computation establishes the relative amount of switching among the various nodes. This is possible because activity estimation techniques consider circuit functionality. The average switching activity for most nodes is believed to be about 20% of the controlling clock frequency. In certain solutions, the average switching activity for non-clock signals is assumed to be only 10%.
• The timing window method uses classical path sensitization to identify the interval of switching. Because STA inherently assumes that all activity on a path finishes within 1 clock period (unless specified explicitly as a multi-cycle path), the timing intervals for all nodes lie within a clock period. This makes the whole pseudo-dynamic simulation approach pessimistic (see results).
• During timing analysis, we collected 2 sets of data: first, the sensitization edge of the node, i.e. whether the node is rising or falling at that time, and second, the delay of the node from the reference node.

Definition: Reference nodes are those nodes that can be considered as 0-delay nodes. All the flip-flop outputs are considered reference nodes in our analysis. When the input clock to the flip-flop has some propagation


delay associated with it, the reference node will also have a delay associated with it.

It can be seen that any frequency higher than 1 MHz will have at least some repetition in its current signature, e.g. a node switching at 50 MHz (20 ns period) will have 50 repetitions of its current signature in a 1000 ns simulation.

By changing the minimum frequency, we can change the simulation time considerably. For example, by raising the minimum frequency to 50 MHz, we can ensure that all current sources toggling at less than 50 MHz do not contribute (or contribute only an average current) to the dynamic V-drop analysis, and in that case the maximum simulation time can become only 20 ns. In all our analysis we have assumed 1 MHz as the minimum frequency.

The number of points in the piecewise linear current waveform is based on the sampling resolution applied as the first step after reading the characterized data. Increasing or decreasing this frequency changes the accuracy, traded against runtime. In our analysis, we have assumed 75 ps as the sampling interval.

The clock network toggles all the time. Also, many designs aim for small insertion delays as well as near-zero skew. This makes the clock network one of the largest contributors to total current as well as peak current.

4.4 Complete Flow

Cell characterization and PG network modeling are shown in Figure 4.11. We take a Verilog netlist as input and calculate the average toggle frequency of each circuit node using a simulation-less approach. The frequency constraints are user conditions to drive the frequency calculation


of any node. Alternatively, frequency constraints can be generated from logic simulation or functional patterns. The SDC contains the timing constraints of the design. It is used in toggle activity calculation as well as timing analysis. Timing information consists of the max delay for paths converging on any node and the sensitization edge across that path. Current signatures for each of the blocks (library macros as well as hierarchical blocks) are generated from current models, timing information and activity estimation. The document explains all of these processing steps (toggle calculation, timing measurement, current signature generation and block modeling) in detail. Once the current signatures are hooked to the parasitic PG network, a transient simulation is performed to measure the V-drop at each macro node, and a dynamic transient current waveform is generated for the power-ground pins. The V-drop data is fed to the timing analysis engine to analyze the impact of V-drop on timing.

[Figure 4.11 Peak IR drop Computation Flow: Netlist, Frequency Constraints and SDC feed the Toggle Frequency Calculator and Timing Analysis; these, together with Current Char, feed the PWL Generator -> RLC netlist with current sources -> SPICE Simulation -> Peak Dynamic Power/Supply Noise]


The next sections explain the Power Grid Generator, Timing Information Generation and SPICE simulation details.

4.4.1 Timing Information Generation

Timing information was generated using Prime Time. Prime Time requires the Verilog netlist, SDC and SPEF (Standard Parasitic Exchange Format) files as input. We also wrote a tcl script (Prime Time supports the TCL command language) to get arrival time information for all nodes of the circuit. The Prime Time flow is shown in Figure 4.12 below. The sample SDC file [24][25] and SPEF used are shown in Appendix A and B.

[Figure 4.12 Prime Time flow for arrival time computation: SDC File, Verilog Netlist and SPEF -> Prime Time Arrival Time Computation -> Timing Report]

4.4.2 Power Grid Generator

The Power Grid Generator flow is expanded further below in Figure 4.13.


[Figure 4.13 Power Grid Generation Flow. Cell flow: Cell Char @ fixed frequency (10MHz in our work). Analysis flow: Toggle Frequency Calculator and Timing Report (delay information) -> Perl code (processes various inputs) -> MATLAB program (compression factor M computed; M-based compression in frequency domain) -> Perl code (PG mesh generation, current PWL hookup) -> PG Network]

The PERL program combines, for every node, the toggle frequency value obtained using TFC with the delay value of the corresponding node. The output file containing this information for all the cells is given to MATLAB.

MATLAB program: It is given two inputs. One is the current data at the prototype frequencies for all the gates. The other is a file containing the delay and average activity information for all the cells of the circuit. Depending on the activity, the prototype current data is compressed, and this data is then shifted by an amount equal to the delay at that node. The same procedure is repeated for all the cells. The resulting current data for all the cells is stored in a file. The second input is a file that describes the VLSI circuit for which the power data is to be obtained.
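The effect of the compress-and-shift step can be sketched in Python under simplifying assumptions. The sketch below is a time-domain stand-in, not the frequency-domain compression performed by the MATLAB program: it keeps one characterized current pulse unchanged, repeats it once per toggle period, and shifts it by the node delay. The function names and the pulse values are hypothetical; the stop-time rule follows the Tstop expression of Section 4.3.1.

```python
def simulation_stop_ns(min_toggle_mhz, max_delay_ns):
    """Tstop = period of the slowest-toggling cell + worst-case delay."""
    return 1000.0 / min_toggle_mhz + max_delay_ns


def tile_signature(pulse, toggle_mhz, delay_ns, t_stop_ns):
    """Repeat one current pulse at the toggle period, shifted by delay_ns.

    pulse: list of (time_ns, current) points for a single switching
           event, with time measured from the start of the event.
    Returns piecewise-linear points (time_ns, current); only events that
    fit completely before t_stop_ns are emitted.
    """
    period_ns = 1000.0 / toggle_mhz
    width_ns = pulse[-1][0]
    if width_ns >= period_ns:
        raise ValueError("pulse does not fit inside one toggle period")
    points = []
    k = 0
    while delay_ns + k * period_ns + width_ns <= t_stop_ns:
        start = delay_ns + k * period_ns
        points.extend((start + t, i) for t, i in pulse)
        k += 1
    return points


# Hypothetical triangular pulse: 0.3 ns wide, 1 mA peak.
pulse = [(0.0, 0.0), (0.1, 1.0e-3), (0.3, 0.0)]
t_stop = simulation_stop_ns(1.0, 1000.0)       # 1 MHz minimum, 1000 ns delay
pwl = tile_signature(pulse, 50.0, 2.5, t_stop)  # 50 MHz node, 2.5 ns delay
```

With a 50 MHz node and a 1000 ns window the pulse is emitted 50 times, which matches the repetition count quoted in Section 4.3.1.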


Based on the generated current signatures, a new PG network is created. After this, all the macro instances are replaced with the corresponding current signatures. In our analysis, we took a PG network with a uniform Power Grid and ideal GND. We did not do any actual power routing but attached the current sources randomly. This is compared with the actual SPICE circuits for all macros in the same PG network at the same locations.

4.4.3 SPICE Simulation

Each cell is now replaced by a current source driven by its corresponding PWL data. Package R, L and C are attached to the top-level power pins. SPICE simulation is performed. The voltage at each node of the power mesh is punched. The IR drop for each cell is calculated using a CODAC (Characterization & Optimization of Digital & Analog Circuits) program (a TI internal program), which subtracts the minimum voltage obtained at each node from the power supply to give the Peak Dynamic IR Drop at that node. This is done for all the nodes of the circuit. The same CODAC program can be used to calculate the Average Dynamic IR Drop at each node of the circuit.

4.5 Validation and Results

In this work, we have made the following simplifications:
• We modeled the power grid by creating an n x m mesh. The resistance of each arm in the mesh was derived from the Ohm/um number. We also assumed 2 such arms in parallel to account for the multi-layer chip scenario.
• A matrix solver was not developed as part of this work. Instead, we used available SPICE simulators.
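The post-processing described in Section 4.4.3 can be sketched as follows. This is not the CODAC program, which is internal to TI; it is a minimal Python illustration assuming a uniformly sampled voltage waveform for one mesh node, with hypothetical function names and values.

```python
def ir_drop_stats(node_voltages, vdd):
    """Return (peak_drop, average_drop) in volts for one node waveform.

    node_voltages: uniformly sampled voltages at one power-mesh node.
    Peak drop is supply minus the minimum voltage seen at the node;
    average drop is supply minus the mean voltage.
    """
    if not node_voltages:
        raise ValueError("empty waveform")
    peak = vdd - min(node_voltages)
    average = vdd - sum(node_voltages) / len(node_voltages)
    return peak, average


def percent_drop(drop, vdd):
    """Express a voltage drop as a percentage of the supply."""
    return 100.0 * drop / vdd


# Hypothetical node waveform on a 1.2 V supply.
peak, avg = ir_drop_stats([1.20, 1.17, 1.14, 1.18, 1.20], 1.2)
peak_pct = percent_drop(peak, 1.2)
```

Running this over every mesh node and taking the largest peak value gives the kind of per-circuit percentage reported in the result tables.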


We executed the flow as explained in the previous section. Instead of 1MHz, we used 10MHz for characterization, to reduce the amount of data. We still sampled the cell data at 13.33GHz.

4.5.1 Peak Power Results

Three small circuits were studied to stabilize the above approach. These three circuits are:
• TWOAND: The circuit consists of two AND gates, one after the other.
• ANDOR: The circuit consists of one AND gate followed by one OR gate.
• 2AND-1OR: This circuit has two AND gates at the first level. The outputs of these AND gates are fed to an OR gate whose output is the final output.

The peak power data is obtained for the three small circuits using the approach described in the report and using SPICE simulation. The data obtained using the average switching activity approach and SPICE for 100 MHz and 500 MHz input frequency is given below in Table 4.1.

Peak power (watts)
Frequency | TWOAND SPICE | TWOAND our approach | AND-OR SPICE | AND-OR our approach | 2AND-1OR SPICE | 2AND-1OR our approach
100 MHz | 0.0016817 | 0.0016 | 0.0009409 | 0.0008421 | 0.0019253 | 0.0019
500 MHz | 0.00168113 | 0.0016 | 0.0009410 | 0.00086539 | 0.00192531 | 0.0018

Table 4.1 Comparison of Peak Power Dissipation

4.5.2 Peak Dynamic IR Drop Results

For determining peak dynamic IR drop, three circuits were used initially.
• 100 Inverter Chain: A chain of 100 inverters, with the output of each inverter acting as the input of the next. The delay of the chain is longer than the period of operation.
• 32 Bit Shift Register: This 32-bit shift register is a series/parallel shift register. Depending on the input and selection criteria, the input is shifted in a series or parallel manner.
• 16 Bit Adder: This is a 16-bit binary adder. 'Carry Forward' logic is used for addition.

The following points are taken into account while generating the netlists for these circuits.
• Package RLC is added to each power pad.
• An ideal voltage source is attached to each power pad.
• A uniform mesh structure is used and all leaf cells are placed randomly onto it.
• A reduced interconnect network was used, based on driving point admittance estimation, for power as well as signal lines.
• No existing decoupling capacitors were estimated.

The peak dynamic IR drop data is obtained using the Average Activity approach, the Timing Window approach and SPICE simulation. The data obtained is shown in Table 4.2.


Circuit | %Drop in average activity | %Drop in Timing Window approach | SPICE %Drop
100 Inverter Chain | 1.65 | 6 | 1
32 Bit Shift Register | 17.5 | 40 | 12
16 Bit Adder | 31 | NA | 19.16

Table 4.2 Comparison of percentage peak instantaneous IR drop

It is clear that the accuracy of the Average Activity method is better than that of the Timing Window method. To check the performance of this approach, the Average Activity method was applied to a few industry standard circuits. Table 4.3 below compares the maximum dynamic IR drop in a circuit using average switching activity and Power Mill. Power Mill is a SPICE based transient analysis tool offered by Synopsys. It is now called Nano Sim.

Circuit | %V drop using avg activity | %V drop in Power Mill | %Error
s27 | 4.5 | 5.8 | -22.4138
s344 | 6.3 | 6.6 | -4.54545
s349 | 6.2 | 7.5 | -17.3333
s444 | 8.6 | 13.3 | -35.3383
s1238 | 13.4 | 13.3 | 0.75188
s298 | 12.5 | 15 | -16.6667

Table 4.3 Comparison of percentage peak IR drop on ISCAS89 circuits
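The %Error column of Table 4.3 is consistent with the relative difference between the average-activity estimate and the Power Mill reference. The short Python sketch below recomputes it from the two %V drop columns; the function name `percent_error` is chosen for illustration only.

```python
def percent_error(estimate, reference):
    """Relative error of an estimate against a reference, in percent."""
    return 100.0 * (estimate - reference) / reference


# (circuit, %V drop using avg activity, %V drop in Power Mill) from Table 4.3
table_4_3 = [
    ("s27", 4.5, 5.8),
    ("s344", 6.3, 6.6),
    ("s349", 6.2, 7.5),
    ("s444", 8.6, 13.3),
    ("s1238", 13.4, 13.3),
    ("s298", 12.5, 15.0),
]

errors = {name: percent_error(est, ref) for name, est, ref in table_4_3}
for name, err in errors.items():
    print(f"{name:6s} {err:9.4f}")
```

A negative value means the average-activity approach reports a smaller drop than the reference simulation.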


Power Supply Noise waveforms for the average activity approach and for SPICE simulation with the actual logic are shown in Figure 4.14 and Figure 4.15 below.

Figure 4.14 PSN waveform of Proposed Method
Figure 4.15 PSN Reference Waveform


4.6 Summary

We proposed a novel PG network modeling technique. The approach involves average switching activity calculation, transient current characterization of the basic Boolean gates of the library, derivation of the PG network model, and transient simulation of the PG model using a vectorless approach. The results are derived from this simulation as desired. Further, our global average switching activity calculation method ensures that we can consider the global timing impact of global voltage drop without extra runtime. This reduces the need for local maximum voltage drop analysis on timing [26]. Our approach also yields detailed data on the voltage drop across the chip/block, and based on this profile, suitable decoupling can be placed at the required locations. The results were validated against a dynamic fast SPICE simulator (Nano Sim) and showed that this average switching rate calculation gives results close to those of dynamic vector analysis. An additional advantage is that average switching activity also gives an accurate analysis of the average V-drop. Hence the approach we are suggesting gives both average and dynamic PG noise results simultaneously.

The approach is scalable to multimillion gate designs by using the technique proposed by Blaauw et al [55]. There is further scope to extend this work to understand decap sensitivity, as well as to skew the analysis toward a certain end target, e.g. PG grid robustness or Monte Carlo based analysis for higher accuracy and coverage.


5 Power Up Analysis

One of the popular techniques to reduce leakage is to use a gated power supply [74, 79, 80]. Shekhar [74] has highlighted a technique called 'sleep transistor' and the challenges associated with it. This technique gates the power supply with a high-threshold transistor when the supply is not required, as shown in Figure 5.1. The 'sleep transistor', also known as a 'power switch', turns off the power supply when a portion of the chip is idle, thus saving leakage current. Apart from design challenges, the technique brings additional design analysis challenges, as listed below.

Figure 5.1 Gated Power Supply ([74])

1. When the power supply turns on from the off state, a huge capacitive load gets charged, causing a large surge in current and hence Power Supply Noise (PSN). This can couple onto signal lines, causing a state change or delay change. It can also remain within supply


network but cause a large dynamic IR drop that in turn affects circuit performance. The goal is to predict the surge and control it.
2. The transistor in series with the supply acts as a large resistor in the normal mode of operation, causing additional IR drop. This in turn degrades performance. The IR drop across the transistor can be as high as 5-20mV. The goal is to do an average IR drop analysis to assess the impact of the switch.
3. Optimization of switches to get the best leakage improvement. The optimization has area penalty, IR drop or Power Supply Noise as cost parameters. For example, a low number of switches gives good leakage improvement but high IR drop and Power Supply Noise.
4. When the power supply goes down, all sequential logic in the virtual power domain loses its state. This puts an extra constraint on overall system behavior. There is also a technique where the state is preserved through 'retention flops' [2, 81]. This technique needs extra power routing to save state, as well as control logic. The timing analysis needs to capture the mode switching.
5. Placement and routing of extra signals, special cells (such as retention flops) and the virtual power network.
6. The trade-off between leakage and number of power switches.
7. Power routing closes immediately after floor planning. The switches need to be placed by this time. It is important to have an early power-up analysis flow to compute the required


number of optimal switches meeting the peak current surge as well as IR drop and leakage needs.

Often, PSN is a non-negotiable parameter, and the design-planning goal is to identify the total number of switches that limits PSN to a user-defined level. This paper describes an analytical method to determine the optimum number of power switches and the power-up glitch. Section II elaborates on the switched PG network and the PSN problem. Section III outlines the approach to analyze such networks. Section IV correlates the results we have achieved with SPICE and reports the efficiency of the algorithm.

5.1 Switched PG Networks

Power Supply Noise is a widely acknowledged research domain in today's high performance designs. Various analysis techniques have also been proposed in the literature [26-31]. However, there is not much awareness of the Power Supply Noise caused by turning on power domains when a gated power supply is used. Figure 5.2 shows the switch network for a 1M-gate design, and Figure 5.3 shows a current glitch and voltage ramp on an arbitrary switch output. Note that the current surge can persist for a considerable amount of time, causing a performance impact on 'on' blocks.


[Figure 5.2 Layout of 1M gate with switch network. Annotation: Power Switch]
Figure 5.3 Current Glitch and Voltage Ramp at arbitrary switch output

A typical PG network with Power Switches can be represented as shown in Figure 5.4. Some of the characteristics of this network are: [87]
• 2 kinds of domain: a golden domain on the non-gated power supply, and multiple virtual domains on the switched power supply.
• The virtual domains are not connected among themselves. They are connected to the golden domain through the switch network.


• The switch network consists of one or more different kinds of switches for a given domain.
• Switch networks are not shared across virtual power domains.
• Random logic is connected to the golden domain as well as to all virtual domains.
• Control logic enables any one or more virtual domains to turn on/off at any time.
• Further, any switch network consists of a parallel network, a sequential network, or a combination of both. The parallel configuration allows all switches to turn on simultaneously, whereas the sequential configuration allows each switch to turn on one by one after some delay.

[Figure 5.4 Typical PG network with Power Switches. Labels: switch control logic, off-chip power supply, non-gated VDD power network, switch network (SW), virtual power network, logic networks; zoomed views of N switches (SW1, SW2, SW3, ...) in parallel configuration and in sequential configuration with delay D1 between consecutive switches]

When the power supply is 'off' and the virtual network is disconnected, the current that passes through is leakage current. If the leakage current of the virtual logic is significantly higher than that of the switch network, leakage current improves. When the switches are turned on, i.e. when the power supply connects to the virtual power network, the loads in the virtual


power network start gett<strong>in</strong>g charged. Loads <strong>in</strong>clude <strong>in</strong>terconnect capacitances, gatecapacitances as well as the circuit diffusion/diode caps. The amount of current be<strong>in</strong>g sunk bythese caps depends on the ability of switch network to provide charge <strong>in</strong> a given time. Due tofast current need of the virtual power doma<strong>in</strong>, there is L*di/dt noise be<strong>in</strong>g <strong>in</strong>jected <strong>in</strong>to circuitthat can affect normal function<strong>in</strong>g of the golden power doma<strong>in</strong>. Note that despite of capacitiveload dom<strong>in</strong>at<strong>in</strong>g, the peak current is still limited by saturation current of switch caus<strong>in</strong>g currentprofile we got <strong>in</strong> Figure 5.3.5.2 Switch Network <strong>Analysis</strong>Switch Network <strong>Analysis</strong> (SNA) early <strong>in</strong> design-plann<strong>in</strong>g <strong>in</strong>cludes decision of switch networktopology, identification of switches to be used, total system tim<strong>in</strong>gs for turn<strong>in</strong>g on/off powerdoma<strong>in</strong>s as well as total power supply noise contribution by a switch network. Sequentialconfiguration allows configur<strong>in</strong>g delay such that the peak current at any po<strong>in</strong>t of time can becontrolled to meet the specification of system noise and hence the tradeoff between the totaltime systems requires to on/off virtual network and the noise criteria. This <strong>in</strong>formation shouldgo to the placement and rout<strong>in</strong>g tools for physical design. 
Further, the noise contribution of the switch network comes from the maximum current surge it causes; the parameters to optimize are the total number of switches of each type in the network and the delay.

The following assumptions are made to keep the analysis simple; the solution can be extended to relax them.

• The delay between two consecutive switches is the same.
• Two types of switches exist in the network.


• The voltage at any node in the virtual power network has the same value at any time instant during power ON, provided there is zero static IR drop.
• The switch network is sequential. A parallel configuration is essentially one BIG switch: all the transistors forming it have their characteristics lumped into a single MOS.

The high-level flow for the analysis is shown in the block diagram of Figure 5.5.

(Flow: Switch IV Characterization → Current prediction that charges capacitive load → Determination of required parameters)
Figure 5.5 Schematic Switch Network Analysis Flow

5.2.1 Switch Characterization

Switch IV characterization records the current sourced through the switch for different voltages between the golden and virtual power ports of the switch. This is obtained from a transient SPICE simulation of the switch. The data is stored in value-pair (voltage, current) format for further processing.

Switch characterization also involves measuring the switch ON resistance. This is the resistance that the switches offer during normal functionality, i.e. when the switches are turned ON and the virtual power network is connected to the golden power network. It is measured by placing a 10 mV source across the switch and measuring the current. This resistance value is later used for average IR drop analysis across the switch.
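As a rough illustration of how the ON-resistance measurement feeds the later IR drop estimate, the Python sketch below computes RON from the 10 mV measurement and then applies the static IR drop relation of Section 5.2.2.2 (Iavg*RON/N with Iavg = Pavg/VDD). The function names and every numeric value are hypothetical and chosen only for illustration; the sketch assumes N identical switches conducting in parallel.

```python
def on_resistance(v_test, i_measured):
    """R_ON = V / I for a turned-on switch with a small test voltage across it."""
    if i_measured <= 0:
        raise ValueError("measured current must be positive")
    return v_test / i_measured


def static_ir_drop(p_avg, vdd, r_on, n_switches):
    """Average IR drop across N identical parallel switches.

    I_avg = P_avg / VDD, R_eff = R_ON / N, drop = I_avg * R_eff.
    """
    if n_switches <= 0:
        raise ValueError("need at least one switch")
    i_avg = p_avg / vdd
    return i_avg * r_on / n_switches


# Hypothetical values: 10 mV test source, 0.5 mA measured current,
# 120 mW average power at VDD = 1.2 V, 2000 switches.
r_on = on_resistance(v_test=0.010, i_measured=0.0005)
drop = static_ir_drop(p_avg=0.120, vdd=1.2, r_on=r_on, n_switches=2000)
print(f"R_ON = {r_on:.1f} ohm, static IR drop = {drop * 1e3:.2f} mV")
```

Keeping the two steps as separate functions mirrors the flow in the text: RON is characterized once per switch type, while the IR drop is re-evaluated for each candidate switch count.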


Note that the first characterization (the IV characterization) is also a resistance characterization. This resistance varies with the voltage across the switch, so it is also called non-linear resistance characterization.

5.2.2 Current or Switch Prediction

Current prediction is based on a simplified extracted model of the block under consideration, shown in Figure 5.6. The switch network is modeled along with its detailed connectivity and timing, whereas the logic connected to the virtual domain is modeled as a capacitive load. The current through the switch is predicted over infinitesimally small time intervals. The CV characteristic is applied as follows:

$$ I = \frac{dq}{dt} \quad\text{or}\quad dq = I\,dt \tag{1} $$
$$ dq = C\,dv \tag{2} $$
$$ dv = \frac{I\,dt}{C} \tag{3} $$

(Diagram: VDD, switch network, output node Vout, extracted total Cload)
Figure 5.6 Analysis model of Virtual Power Network

Equation 3 forms the basis of Algorithm 1, described in the next section. The delay between two consecutive switches is used to predict the charge supplied by the switch to the virtual power network domain. The IV table of the switch is used to predict the current by further dividing the delay into infinitesimally small time intervals, as shown in Figure 5.7. From the initial voltage and the charge supplied, the voltage is derived at the moment the next switch starts turning on. This process continues until either all switches are turned on or the specified voltage level is reached. Further, if all the switches are turned on but the voltage is still lower than the ideal value (golden VDD), the same method continues in order to predict the maximum surge in current. The predicted number of switches is used to predict the static IR drop across the switch network, as explained in Algorithm 2. This is another important parameter that will not be discussed further in this chapter.

Figure 5.7 Infinitesimal Time Division for Current Prediction

The parameters that can be analyzed through this setup include:

• The total number of switches required to reach a required voltage value.
• Alternatively, the voltage value that can be reached with a given number of switches.


• The maximum current surge that will occur for a given number of switches.
• The impact of the delay between consecutive switches as they turn on.
• The IR drop across the switch network.

5.2.2.1 Algorithm for Power Switch Network Analysis:

Initialize the load voltage to zero and the charging current to zero.
{
For each infinitesimally small time period, predict the current from the IV table of the switch type, based on the voltage at the lumped load.
Identify the actual current based on the number of switches turned on at that instant of time.
Track the current at VDD, i.e. if the new current is greater than the old one, assign the new current to the maximum surge current.
Calculate the rise in voltage over the infinitesimally small time period using equation (3).
Continue until either all the switches are turned on or the desired voltage level is reached.
}
Print the maximum surge current and the voltage level reached after turning on the specific number of switches required by the user.
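The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the thesis implementation: it uses a single switch type (the thesis analysis uses two), takes the voltage across every switch as VDD minus the lumped load voltage, interpolates the IV table linearly, and integrates equation (3) with a fixed time step. The names `switch_current` and `power_up` and the IV table values are hypothetical.

```python
from bisect import bisect_right


def switch_current(iv_table, v_across):
    """Current of one switch at voltage v_across, by linear interpolation.

    iv_table is a list of (voltage, current) pairs sorted by voltage;
    values outside the table are clamped to its end points (an assumption).
    """
    volts = [v for v, _ in iv_table]
    if v_across <= volts[0]:
        return iv_table[0][1]
    if v_across >= volts[-1]:
        return iv_table[-1][1]
    hi = bisect_right(volts, v_across)
    (v0, i0), (v1, i1) = iv_table[hi - 1], iv_table[hi]
    return i0 + (i1 - i0) * (v_across - v0) / (v1 - v0)


def power_up(iv_table, c_load, vdd, n_switches, delay, dt, v_target=None):
    """Sequential power-up of a lumped capacitive load.

    Switch k turns on at time (k - 1) * delay. Returns a tuple
    (v_load, max_surge, surge_switch, switches_on).
    """
    v_load = 0.0                       # load starts fully discharged
    max_surge, surge_switch = 0.0, 0
    steps = max(1, round(delay / dt))  # time steps between consecutive switches
    for n_on in range(1, n_switches + 1):
        for _ in range(steps):
            # Per-switch current from the IV table, scaled by the switches that are on.
            i_total = n_on * switch_current(iv_table, vdd - v_load)
            if i_total > max_surge:
                max_surge, surge_switch = i_total, n_on
            # Equation (3): dv = I * dt / C, clamped so the load never exceeds VDD.
            v_load = min(vdd, v_load + i_total * dt / c_load)
            if v_target is not None and v_load >= v_target:
                return v_load, max_surge, surge_switch, n_on
    return v_load, max_surge, surge_switch, n_switches


# Hypothetical IV table (volts, amps) and load values, for illustration only.
IV = [(0.0, 0.0), (0.2, 0.3e-3), (0.6, 0.5e-3), (1.2, 0.55e-3)]
v, surge, at_switch, used = power_up(IV, c_load=1e-9, vdd=1.2,
                                     n_switches=100, delay=1e-9, dt=1e-11)
print(f"V = {v * 1e3:.0f} mV, peak = {surge * 1e3:.1f} mA at switch {at_switch}")
```

Passing `v_target` answers the first question in the parameter list (how many switches are needed to reach a voltage), while omitting it answers the second (what voltage a given number of switches reaches); the surge and the switch at which it occurs are tracked in both cases.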


The above algorithm is developed for the case where the delay between two consecutive switches in the sequential switch network is the same. However, it can be extended to scenarios with different delays. In that case, timing information from Static Timing Analysis or simulations is needed.

5.2.2.2 Algorithm for Static IR drop analysis across power switches:

{
Read the switch characterization data; for static IR drop, read the ON channel resistance (RON).
Determine the total number of switches (N) required to reach the desired voltage level, which is specified by the user, using the “Algorithm for Power Switch Network Analysis”.
The effective resistance of the N switches predicted above is RON/N.
Compute the power consumption of the switched-off or virtual power network using any of the methods described in this work (methods outside this work can also be used).
Compute the average current consumption of the virtual power network: Iavg = Pavg/VDD.
The static IR drop across the switch network is Iavg*RON/N.
}

5.3 Results and Analysis

The traditional approach to studying the above would be a full-fledged SPICE simulation that includes the virtual power network and the switch network, with each switch turned on after some delay. Note that this involves thousands of switches in the switch network and about a million gates or more in the virtual network. Such a simulation takes weeks even with the fast SPICE simulators available in the market. It also comes very late in the design cycle.

Alternatively, we can reduce the virtual power network by modeling the interconnect load and gate capacitance as a large distributed capacitance, and the on-channel transistor resistance as an effective resistance in series with each distributed C, to reduce the number of active elements, and then simulate the reduced power network using SPICE (Figure 5.8). This approach improves simulation time by orders of magnitude, but the run time is still days. It can be done during design planning or after the detailed design is over.

Figure 5.8 Reduced Switch Network for validation

The technique presented in the last section is static in nature; it reduces the runtime to a few minutes and correlates very well with the techniques described above. The algorithms described above were analyzed with switches designed in TI’s 90 nm node. All the results below are for a 1M equivalent gate block. 1M gates could not be simulated in SPICE along with the switches, so the simplified model described in the previous paragraph was employed to obtain SPICE-accuracy data while keeping the switch network intact. We employed a switch network with two kinds of switches for this analysis [87]. The first set of switches took the virtual domain up to a specific voltage level, and the second kind of switches, with high capacity, were turned on in a sequential manner to measure the surge in current.

Table 5.1 shows the prediction of the number of switches for a given voltage. As the number of switches increases, the algorithm's results come within 1% of the SPICE-based simulation, whereas when the number of switches is small, the inaccuracy is within 10%. In other words, the predicted number is within 1-10% of the simulated number. The table also shows the predicted current surge and the number of the switch whose turn-on causes the maximum peak. Essentially, along with the surge, we predict the switch at which the maximum surge occurs. This helps to further optimize the network of second-type switches. Table 5.2 shows the voltage prediction for a given number of switches.

The advantage of the whole solution comes from the large runtime improvement, which enables early analysis and tradeoffs in the design (Table 5.3). The runtime gain clearly outweighs the small inaccuracy in switch prediction or voltage prediction. Note that the runtime does not include the switch IV characterization time, since that is a one-time effort. In static analysis, we can quickly dump much more information as needed to understand certain behavior for tradeoff analysis. We can also predict the time-domain behavior of voltage and current using the approach described in this work. Figure 5.9 compares the predicted voltage over time with that at a few arbitrary nodes simulated in SPICE. Figure 5.10 compares the predicted current over time with the current measured at VDD. The agreement is good considering that the analysis is targeted at early tradeoff analysis.


Vdesired (mV) | Actual #Switches | Switches by Algorithm | Current Surge (mA) | Current Surge after #switches
--- | --- | --- | --- | ---
20 | 380 | 403 | 950 | 123
69 | 760 | 771 | 881 | 114
271 | 1560 | 1554 | 749 | 100
583 | 2340 | 2328 | 467 | 97
869 | 2964 | 2971 | 266 | 81
1170 | 4368 | 4308 | 24 | 43

Table 5.1 Switch Prediction by proposed algorithm

#Switches | Simulated Voltage (mV) | Voltage by Algorithm (mV) | Surge Current (mA) | Surge Current after switch # | %Error in voltages
--- | --- | --- | --- | --- | ---
780 | 63 | 70.54 | 892 | 101 | 11
1560 | 280 | 273.53 | 784 | 94 | -0.2
2340 | 587 | 589.26 | 546 | 78 | 0.38
3120 | 926 | 927.7 | 263 | 64 | 0.18

Table 5.2 Voltage Prediction


No. of switches | Simulation Time (in days) | Algorithm Runtime (in minutes)
--- | --- | ---
780 | ~1.5 | < 1
1560 | ~4 | < 1
2340 | ~5 | < 1
2940 | ~6 | < 1

Table 5.3 Power Up analysis - Runtime Comparison

Figure 5.9 Voltage Ramp up over Time for various nodes
(Plot: voltage in mV, 0 to 1400, against time; series: Predicted, SPICE@node1, SPICE@node2.)

Figure 5.10 Current comparison over time
(Plot: current in mA, 0 to 1000, against time; series: Predicted, SPICE.)


5.4 Summary

There are various techniques for reducing the leakage power of a design; the ‘gated power supply’, ‘sleep transistor’ or ‘switched power network’ technique is one of the efficient ones. The analysis techniques described in this work help by providing quick data for architecture-level decisions when the ‘switched network’ technique is used. The runtime is a few seconds, so the design team can run many iterations to arrive at the optimum number of switches. The analytical method for calculating the total number of switches is fast because it involves a one-time SPICE simulation (only the IV characteristic of the switch); the rest of the analysis is static. Using the same method, we have also analyzed the ‘power on glitch’ of the design, which contributes to power supply noise during power up. All the results closely match SPICE simulation.


6 Conclusion

6.1 Summary

The power grid analysis challenges faced by CMOS technology are discussed in this thesis. For a robust power grid, designs need to go through the following analyses:

• Accurate Power Estimation
• Instantaneous IR drop analysis and decap planning
• Power Up analysis for designs using MTCMOS for leakage reduction

The key results of this work can be summarized as follows:

1. Successfully implemented a hierarchical probabilistic toggle computation approach that is applicable to multi-million gate designs while maintaining the desired accuracy.
2. Discussed power dissipation in cell-based CMOS design. A flow is proposed for power estimation at various design stages that can improve the accuracy of estimation. The flow also helps the user make runtime and accuracy tradeoffs.
3. Proposed the cell characterization methodology for instantaneous IR drop analysis as well as Power Up analysis for MTCMOS.
4. Discussed a prototype flow developed for instantaneous IR drop estimation based on the average toggle rate computed by the toggle methodology proposed in this work. This flow estimates instantaneous as well as average IR drop numbers during the same simulation.


5. Power Up analysis for MTCMOS-based digital designs. The methodology is validated using a prototype flow and gives a large runtime improvement compared to SPICE. The methodology also helps in MTCMOS gate optimization.

6.2 Scope of Future Work

The analysis approaches proposed in this work help in robust power grid analysis. The work has some possible extensions that would further help designs.

First, the power estimation proposed in this work relies on a gate-level netlist. RTL-level power estimation would help the block designer make power tradeoffs early in the design, such as MTCMOS usage or multi-Vt usage as proposed in [17].

Second, it is possible to improve the correlation between pre-layout and post-layout power numbers. One reason they differ is the clock tree expansion and buffer insertion performed during placement and routing to meet timing constraints. Early estimation techniques can be developed to estimate the additional cell count so that power numbers correlate better across stages.

Third, the amount of cell characterization data stored for each cell is very large. A typical ASIC technology contains 2000-4000 cells.
The data can be reduced if we store just the current signatures during transitions and use them to model the current source in block-level analysis. This would also eliminate the need for the frequency-domain transform performed here. Techniques used in some commercial tools, in conjunction with the analysis approach presented in this work, can help improve data reduction.

Fourth, we have not gone into the details of decoupling capacitance for instantaneous IR drop analysis in this work. The work can be extended to study the various decoupling capacitors extensively: intrinsic ones due to NWELL, non-switching gates and RAMs, as well as intentional ones distributed by the user. Decoupling capacitor estimation, characterization and what-if impact analysis on instantaneous IR drop is an important area for further research.

Fifth, the MTCMOS analysis approach proposed in this work is useful early in design planning for making efficient tradeoffs between MTCMOS switches and noise tolerance levels in the design. In this work, we have modeled the switched power network with a lumped capacitance. This does not model the time-domain behavior of the PG network due to PG resistance. A more accurate approach can be developed that models a distributed RC for the PG network once placement and power routing are done. We believe this would give quick and accurate analysis of the actual network compared to SPICE-like simulations.




7 References

1. Semiconductor Industry Assoc., International Technology Roadmap for Semiconductors, 2003 Update - http://public.itrs.net/Files/2003ITRS/Home2003.htm
2. Nam Sung Kim, David Blaauw et al, “Leakage Current: Moore’s Law Meets Static Power”, IEEE Computer, Dec 2003.
3. The SPICE Home Page, http://bwrc.eecs.berkeley.edu/Classes/IcBook/SPICE/
4. Rabe, D; Jochens, G.; Kruse, L.; Nebel, W, “Power-simulation of cell based ASICs: accuracy- and performance trade-offs”, Proceedings of Design automation and test in Europe, Feb 1998
5. F. Najm, “A survey of power estimation techniques in VLSI circuits,” IEEE Trans. VLSI System., vol. 2, pp. 446–455, Dec. 1994.
6. C. Y. Tsui, M. Pedram, and A. Despain, “Efficient estimation of dynamic power dissipation under a real delay model,” in Proc. IEEE Int. Conf. Computer-Aided Design, 1993, pp. 224–228
7. B. J. George et al., “Power analysis and characterization for semi custom design,” in Proc. Int. Workshop Low Power Design, 1994, pp. 215–218.
8. J.-Y. Lin et al., “A cell-based power estimation in CMOS combinational circuits,” in Proc. IEEE Int. Conf. Computer-Aided Design, 1994, pp. 304–309.
9. H. Sarin and A. McNelly, “A power modeling and characterization method for logic simulation,” in Proc. IEEE Custom Integrated Circuits Conf., 1995, pp. 363–366.
10. Synopsys’ Design Power, (http://www.synopsys.com/products/power/power.html)
11. N. Weste and K. Eshraghian, “Principles of CMOS VLSI Design”, VLSI Systems Series, Addison-Wesley, 1985.
12. Najm, F.N, “Transition Density, a stochastic measure of Activity in Digital Circuits”, DAC, pp. 644-649, June 1991.
13. Ghosh, A.; Devadas, S.; Keutzer, K.; White, J, “Estimation of average switching activity in combinational and sequential circuits”, DAC, pp. 253-259, June 1992
14. S. Bhanja, N. Ranganathan, “Dependency Preserving Probabilistic Modeling of Switching Activity using Bayesian Networks”, 38th Design Automation Conference, pp. 209-214, 2001.
15. HUGIN API reference manual. Version 5.3. http://www.hugin.com
16. David Heckerman, “A tutorial on learning with Bayesian Networks”, ftp://ftp.research.microsoft.com/pub/tr/tr-95-06.pdf, March 1995.
17. Agarwal, A.; Mukhopadhyay, S.; Raychowdhury, A.; Roy, K.; Kim, C.H, “Leakage power analysis and reduction in nanoscale circuits”, IEEE Micro, Volume 26, Issue 2, pp. 68-80, March 2006.
18. Keshavarzi, A.; Tschanz, J.W.; Narendra, S.; De, V.; Daasch, W.R.; Roy, K.; Sachdev, M; Hawkins, C.F, “Leakage and process variation effects in current testing on future CMOS circuits”, IEEE Design & Test of Computers, Volume 9, Issue 5, pp. 36-43, Sept 2002.
19. Dresig, F. Lanches, P. Rettig, O., et al, “Simulation and reduction of CMOS power dissipation at logic level”, Design Automation, 1993, with the European Event in ASIC Design. Proceedings, pp. 341-246, Feb 1993.
20. An-Chang Deng Yan-Chyuan Shiau Loh, K.-H, “Time domain current waveform simulation of CMOS circuits”, IEEE international conference on Computer aided design 1988, pp. 208-211, Nov 1988.


21. F.N. Najm, R. Burch, P. Yang, and I.N. Hajj, “Probabilistic Simulation for Reliability Analysis of CMOS VLSI Circuits”, IEEE Transactions on CAD, 9(4):439-450, April 1990.
22. Randal S and Tom Phoenix and Brian d foy, “Learning Perl”, 4th Edition, O’Reilly & Associates, ISBN 0596101058
23. Matlab Tutorial, http://www.math.ufl.edu/help/matlab-tutorial/
24. Synopsys, Inc, “Using the Synopsys® Design Constraints Format”, Application Note, Sept 2005.
25. Himanshu Bhatnagar, “Advanced ASIC Chip Synthesis: Using Synopsys Design Compiler Physical Compiler and Primetime”, 2nd Edition, Kluwer Academic Publishers, ISBN: 0792376447.
26. Martin Saint-Laurent, Swaminathan, “Impact of Power Supply Noise on Timing In High Frequency Microprocessors”, IEEE Trans on Advanced Packaging, pp. 135-144, Feb 2004
27. Kriplani, H.; Najm, F.; Hajj, I, “Improved Delay and Current Models for Estimating Maximum Currents in CMOS VLSI Circuits”, ISCAS 94, pp. 435-438, June 1994.
28. Kriplani, H.; Najm, F.N.; Hajj, I.N, “Pattern Independent Maximum Current Estimation in Power and Ground Buses of CMOS VLSI Circuits: Algorithms, Signal Correlations, and Their Resolution”, IEEE Trans on CAD of Integrated Circuits and Systems, pp. 998-1012, Aug 1995.
29. Hsiao, M.S.; Rudnick, E.M.; Patel, J.H., “Peak Power Estimation of VLSI Circuits: New Peak Power Measures”, IEEE Trans on VLSI Systems, pp. 435-439, Aug 2000
30. Qing Wu; Qinru Qiu; Pedram, M, “Estimation of Peak Power Dissipation in VLSI Circuits Using the Limiting Distributions of Extreme Order Statistics”, IEEE Trans on CAD of Integrated Circuits and Systems, pp. 942-956, Aug 2001.
31. Bogliolo, A. Benini, L. de Micheli, G. Ricco, B., “Gate-level power and current simulation of CMOS integrated circuits”, Very Large Scale Integration (VLSI) Systems, pp. 473-488, Dec 1997
32. Anantha Chandrakasan’s Home Page: http://www-mtl.mit.edu/~anantha/publications.html, http://www.fetchbook.info/search_Anantha_Chandrakasan/searchBy_Author.html
33. FFT Tutorial, http://www.ele.uri.edu/~hansenj/projects/ele436/fft.pdf
34. Jeff Tranter and Paul Raines, “Tcl/Tk in a Nutshell”, O’Reilly Associates, ISBN 1565924339
35. Alan V. Oppenheim, Ronald W. Schafer, John R. Buck, “Discrete Time Signal Processing”, 2nd Edition, Prentice Hall, ISBN 0137549202
36. Chen, H.H.; Ling, D.D, “Power Supply Analysis Methodology for Deep-Submicron VLSI Chip Design”, DAC, pp. 638-643, June 1997.
37. Yi-Shing Chang; Gupta, S.K.; Breuer, M.A, “Analysis of Ground Bounce in Deep-Submicron Circuits”, VLSI Test Symposium, pp. 110-116, May 1997
38. Yi-Min Jiang; Kwang-Ting Cheng; An-Chang Deng, “Estimation of Maximum Power Supply Noise for Deep Sub-Micron Designs”, International sym on low power electronics and design, pp. 233-238, Aug 1998.
39. Zhao, S.; Roy, K.; Koh, C.-K, “Estimation of Inductive and Resistive Switching Noise on Power Supply Network in Deep Sub-Micron CMOS Circuits”, International conference on Computer Design, pp. 65-72, Sept 2000.
40. S. Bobba, I.N. Hajj, “Maximum voltage variation in the power distribution network of VLSI circuits with RLC Models,” Proc of ISLPED, Aug 2001


41. Bai, G.; Bobba, S.; Hajj, I.N, “Static Timing Analysis Including Power Supply Noise Effect on Propagation Delay in VLSI Circuits”, DAC, pp. 295-300, 2001.
42. G. Steele, et al., “Full-Chip Verification Methods for DSM Power Distribution Systems,” Proc. of DAC, pp. 744-749, 1998
43. R. Chaudhry, D. Blaauw, R. Panda and T. Edwards, “Current Signature Compression For IR-Drop Analysis,” Proc. Design Automation Conference, pp. 162-167, 2000
44. S. Bobba and I. N. Hajj, “Estimation of maximum current envelope for power bus analysis and design,” Proc. of ISPD, pp 141-146, Apr 1998
45. Rishi Bhooshan (TI) et.al, “A Unique Method For Dynamic Voltage Drop Analysis and Decoupling Capacitance Estimation”, VDAT 2003
46. Cirit, M.A., “Characterizing a VLSI standard cell library”, Digital Object Identifier 10.1109/CICC, pp. 25.7.2-25.7.4, May 1991
47. Debnath, S.P.; Sukumar, J.; Udaykumar, H, “A methodology for fast vector based power supply and substrate noise analyses”, International conference on VLSI Design, pp. 808-811, Jan 2005.
48. Dalal, A.; Lev, L.; Mitra, S.; “Design of an efficient power distribution network for the UltraSPARC-I microprocessor”, IEEE conference on Computer Design: VLSI in computers and processors, pp. 118-123, Oct 1995
49. Chen, H.H.; Schuster, S.E.; “On-chip decoupling capacitor optimization for high-performance VLSI design”, VLSI Technology, Systems and Applications, pp. 99-103, June 1995.
50. Larsson, P, “Power supply noise in future IC's: a crystal ball reading”, Custom Integrated Circuits, pp. 467-474, May 1999.
51. Sotman, M.; Popovich, M.; Kolodny, A.; Friedman, E, “Leveraging symbiotic on-die decoupling capacitance”, Electrical Performance of Electronic Packaging, pp. 111-114, Oct 2005
52. Larsson, P, “Resonance and damping in CMOS circuits with on-chip decoupling capacitance”, IEEE Transactions on Circuits and Systems-I, vol 45, pp. 849-858, Aug 1998
53. Larsson, P, “Parasitic Resistance in an MOS Transistor Used as On-Chip Decoupling Capacitance,” IEEE Journal of Solid State Circuits, vol 32, pp 574-576, Apr 1997
54. Chaudhry, R.; Panda, R.; Edwards, T.; Blaauw, D, “Design and analysis of power distribution networks with accurate RLC models”, International conference on VLSI Design, pp. 151-155, Jan 2000
55. Min Zhao; Panda, R.V.; Sapatnekar, S.S.; Edwards, T.; Chaudhry, R.; Blaauw, D, “Hierarchical analysis of power distribution networks”, DAC, pp. 150-155, June 2000
56. IBM Methodology for Power Supply Noise - http://www.research.ibm.com/da/nova.html
57. R. Heald et. al, “Implementation of a 3rd Generation Sparc V9 64b Microprocessor”, Proc IEEE ISSCC, pp. 412-413, 2000
58. Yi-Min Jiang Kwang-Ting Cheng, “Analysis of Performance Impact Caused by Power Supply Noise in Deep Submicron Devices”, DAC, June 1999
59. Apache Design Solutions, “Reshaping Nanometer Flows with Physical Power Integrity”, http://www.apache-da.com, White Paper, May 2003.
60. Anthony Ralston, Philip Rabinowitz, “A First course in Numerical Analysis”, 2nd Edition, Dover Publications, ISBN 048641454X.
61. Kalpesh Shah, “SNUG 2006 Panel Discussion”


62. H. Mehta, R.M. Owens, M.J. Irwin, “Energy Characterization Based on Clustering,” 33rd Design Automation Conference, June 1996.
63. D. Brooks, V. Tiwari, and M. Martonosi, “Wattch: A framework for Architectural-Level Power Analysis and Optimizations,” Proc of International Symposium on Computer Architecture, pp. 83-94, June 2000
64. V. Tiwari, S. Malik, and A. Wolfe, “Power Analysis of Embedded Software: A First Step toward software power minimization,” IEEE Trans VLSI Systems, vol 2, no. 4, pp 437-445, 1994
65. E. Macii, M. Pedram and F. Somenzi, “High Level Power Modeling and Estimation,” IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, vol 17, November 1998.
66. Synopsys Prime Power - http://www.synopsys.com/products/power/primepower_ds.pdf
67. Synopsys Power Compiler - http://www.synopsys.com/products/power/power_ds.pdf
68. Synopsys Nanosim - http://www.synopsys.com/products/mixedsignal/nanosim/nanosim.html
69. Synopsys Liberty Format - http://www.synopsys.com/partners/tapin/lib_info.html
70. M Horowitz and R Gonzalez, “Energy dissipation in general purpose Microprocessors”, IJSSC, vol 31, Sept 1996.
71. Brglez, F. Bryan, D. Kozminski, K., “Combinational profiles of sequential benchmark circuits”, ISCAS, vol 3, pp. 1929-1934, May 1989.
72. R. Wilson and D. Lammers, “Grove Calls Leakage Chip Designers’ Top Problem,” EE Times, 13 Dec 2002; www.eetimes.com/story/OEG20021213S0040.
73. Intel SpeedStep technology, http://www.intel.com
74. Y. Ye, S Borkar, V. De, “A New Technique for Standby Leakage Reduction in High-Performance Circuits,” 1998 Symposium on VLSI Circuits, June 1998.
75. M. Powell et al., “Reducing Leakage in a High Performance Deep-Submicron Instruction Cache,” IEEE Trans. VLSI, Feb 2001, pp 77-89
76. Ali K., Charles H. et al., “Effect of reverse body bias for low power CMOS circuits”
77. Kaushik R, Mark C.J., Dinesh S., “leakage control with efficient use of transistor stacks in single threshold CMOS”
78. Shekhar Borkar, “Low Power Design Challenges for the Decade”, 2001.
79. Kumagai, K.; Iwaki, H.; Yoshida, H.; Suzuki, H.; Yamada, T.; Kurosawa, S.; “A Novel Powering Down Scheme for low Vt CMOS Circuits”, 1998 Symposium on , 11-13 June 1998. Pages: 44 – 45
80. Mutoh, S.; Douseki, T.; Matsuya, Y.; Aoki, T.; Yamada, J., “1V high-speed digital circuit technology with 0.5μm multi-threshold CMOS”, IEEE ASIC Conference, 1993.
81. Akamatsu, H.; Iwata, T.; Yamamoto, H.; Hirata, T.; Yamauchi, H.; Kotani, H.; Matsuzawa, A.; “A low power data holding circuit with an intermittent power supply scheme for sub-1V MT-CMOS LSIs”, VLSI Circuits, 1996. Digest of Technical Papers., 1996 Symposium on , 13-15 June 1996 Pages: 14 – 15
82. Ye, Y.; Borkar, S.; De, V., “A new technique for standby leakage reduction in high-performance circuits”, Symposium on VLSI Circuits, June 1998. Page(s): 40-41
83. Das, K.K.; Joshi, R.V.; Chuang, C.T.; Cook, P.W.; Brown, R.B., “New digital circuit techniques for total standby leakage reduction in Nano-scale SOI technology”, pp. 309-312, ISSCC, Sept 2003.
84. Wenxin Wang; Anis, M.; Areibi, S, “Fast techniques for standby leakage reduction in MTCMOS circuits”, ISOCC, pp. 21-24, Sept 2004


85. Fei Li; Lei He; Saluja, K.K.; “Estimation of maximum power-up current”, DAC, pp. 51-56, Jan 200286. Calhoun, B.H.; Honore, F.A.; Chandrakasan, A.P, “A leakage reduction methodology for distributed MTCMOS”, JSSC, pp. 818-826,May 200487. Royannez, P.; Mair, H.; Dahan, F.; Wagner, M.; Streeter, M.; Bouetel, L.; Blasquez, J.; Clasen, H.; Sem<strong>in</strong>o, G.; Dong, J.; Scott, D.; Pitts,B.; Raibaut, C.; Um<strong>in</strong>g Ko, “90nm Low Leakage SoC Design Techniques for Wireless Applications”, ISSCC, pp. 138-139, Feb 2005.88. R. Heald, et al., “Implementation of a 3 rd Generation SPARC V9 64b Microprocessor,” Proc. IEEE ISSCC, pp 412-413, 200089. P. Gronowski, W. Bowhill, R. Preston, M. Gowan, and R. Allmon, “High Performance Microprocessor Design,” IEEE Journal of SolidState Circuits, vol 33, no 5, pp. 676-686, Apr 1998.90. J. Darnauer, D. Chengson, B. Schmidt, and E. Priest, “Electrical Evaluation of Flip-Chip package Alternatives for Next GenerationMicroprocessor,“ Electronic Components and Technology Conference, pp. 666-673, 199891. S. Borkar, “Low <strong>Power</strong> Design Challenges for the Decade,” Proc. of ISLPED, 200092. V. Tiwari, D. S<strong>in</strong>gh, S. Rajgopal, G. Mehta, R. Patel and F. Baez, “Reduc<strong>in</strong>g <strong>Power</strong> <strong>in</strong> High performance Microprocessors,” Proc. ofDesign Automations Conference, 199793. Wachnik, R.A.; Filippi, R.G.; Shaw, T.M.; L<strong>in</strong>, P.C, “Practical benefits of the electromigration short-length effect, <strong>in</strong>clud<strong>in</strong>g a new designrule methodology and an electromigration resistant power grid with enhanced wireability”, Sym on <strong>VLSI</strong> Technology, pp. 220-221, June2000.94. J. Kitch<strong>in</strong>, “Statistical Electromigration Budget<strong>in</strong>g for Reliable Design and Verification <strong>in</strong> a 300-MHz Microprocessor”, Symposium on<strong>VLSI</strong> Circuits Digests, pp. 115-116, 199595. T .H. 
Cormen, C. E. Leiserson, R. L. Rivest “Introduction to Algorithms”, PHI96. Chapra, S.C, Canale R P “Numerical Methods for Eng<strong>in</strong>eers” 3rd Ed., McGraw-Hill 1998.97. Rabey, “Digital Integrated Circuits Design”, Pearson Education, Second Edition, 2003113




Appendix A Sample SDC file

create_clock -period [get_ports clk]
set_input_delay -clock clk1 [get_ports IN*]
set_case_analysis 0 [get_ports *reset* *scan_mode*]
report_timing


Appendix B Sample SPEF Format

*SPEF "IEEE 1481-1997"
*DESIGN "s27"
*DATE "Mon Dec 13 10:05:00 1999"
*VENDOR "TI"
*PROGRAM "vlog2spef"
*VERSION "1.0"
*DESIGN_FLOW "Dummy From Verilog"
*DIVIDER /
*DELIMITER :
*BUS_DELIMITER []
*T_UNIT 1 NS
*C_UNIT 1 PF
*R_UNIT 1 KOHM
*L_UNIT 1e-3 UH

*PORTS
G17 O *L 0.1
G3 I *S 0.1 0.1
G2 I *S 0.1 0.1
G1 I *S 0.1 0.1
G0 I *S 0.1 0.1
PREZ I *S 0.1 0.1
CLK I *S 0.1 0.1

*D_NET G17 0.1
*CONN
*I IV110_1:Y O *L 0.1 *D IV110
*P G17 O *L 0.1
*CAP
0 G17 0.1
1 IV110_1:Y 0.1
2 G17:0 0.1
*RES
0 G17 G17:0 0.1
1 IV110_1:Y G17:0 0.1
*END

*D_NET G3 0.1
*CONN
*I OR210_1:A I *L 0.1 *D OR210
*P G3 I *L 0.1
*CAP
0 G3 0.1
1 OR210_1:A 0.1
2 G3:0 0.1
*RES
0 G3 G3:0 0.1
1 OR210_1:A G3:0 0.1
*END

*D_NET G2 0.1
*CONN
*I NO210_3:A I *L 0.1 *D NO210
*P G2 I *L 0.1
*CAP
0 G2 0.1
1 NO210_3:A 0.1
2 G2:0 0.1
*RES
0 G2 G2:0 0.1
1 NO210_3:A G2:0 0.1
*END

*D_NET G1 0.1
*CONN
*I NO210_2:A I *L 0.1 *D NO210
*P G1 I *L 0.1
*CAP
0 G1 0.1
1 NO210_2:A 0.1
2 G1:0 0.1
*RES
0 G1 G1:0 0.1
1 NO210_2:A G1:0 0.1
*END

*D_NET G0 0.1
*CONN
*I IV110_0:A I *L 0.1 *D IV110
*P G0 I *L 0.1
*CAP
0 G0 0.1
1 IV110_0:A 0.1
2 G0:0 0.1
*RES
0 G0 G0:0 0.1
1 IV110_0:A G0:0 0.1
*END

*D_NET PREZ 0.1
*CONN
*I DTP10J_0:PREZ I *L 0.1 *D DTP10J
*I DTP10J_1:PREZ I *L 0.1 *D DTP10J
*I DTP10J_2:PREZ I *L 0.1 *D DTP10J
*P PREZ I *L 0.1
*CAP
0 PREZ 0.1
1 DTP10J_0:PREZ 0.1
2 DTP10J_1:PREZ 0.1
3 DTP10J_2:PREZ 0.1
4 PREZ:0 0.1
*RES
0 PREZ PREZ:0 0.1
1 DTP10J_0:PREZ PREZ:0 0.1
2 DTP10J_1:PREZ PREZ:0 0.1
3 DTP10J_2:PREZ PREZ:0 0.1
*END

*D_NET CLK 0.1
*CONN
*I DTP10J_0:CLK I *L 0.1 *D DTP10J
*I DTP10J_1:CLK I *L 0.1 *D DTP10J
*I DTP10J_2:CLK I *L 0.1 *D DTP10J
*P CLK I *L 0.1
*CAP
0 CLK 0.1
1 DTP10J_0:CLK 0.1
2 DTP10J_1:CLK 0.1
3 DTP10J_2:CLK 0.1
4 CLK:0 0.1
*RES
0 CLK CLK:0 0.1
1 DTP10J_0:CLK CLK:0 0.1
2 DTP10J_1:CLK CLK:0 0.1
3 DTP10J_2:CLK CLK:0 0.1
*END

*D_NET G10 0.1
*CONN
*I DTP10J_0:D I *L 0.1 *D DTP10J
*I NO210_0:Y O *L 0.1 *D NO210
*CAP
0 DTP10J_0:D 0.1
1 NO210_0:Y 0.1
2 G10:0 0.1
*RES
0 DTP10J_0:D G10:0 0.1
1 NO210_0:Y G10:0 0.1
*END

*D_NET G5 0.1
*CONN
*I DTP10J_0:Q O *L 0.1 *D DTP10J
*I NO210_1:A I *L 0.1 *D NO210
*CAP
0 DTP10J_0:Q 0.1
1 NO210_1:A 0.1
2 G5:0 0.1
*RES
0 DTP10J_0:Q G5:0 0.1
1 NO210_1:A G5:0 0.1
*END

*D_NET G6 0.1
*CONN
*I DTP10J_1:Q O *L 0.1 *D DTP10J
*I AN210_0:B I *L 0.1 *D AN210
*CAP
0 DTP10J_1:Q 0.1
1 AN210_0:B 0.1
2 G6:0 0.1
*RES
0 DTP10J_1:Q G6:0 0.1
1 AN210_0:B G6:0 0.1
*END
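To make the structure of the SPEF listing concrete, the following minimal Python sketch walks the *D_NET sections and sums the entries of each *CAP block. The function name `total_cap_per_net` and the embedded one-net sample are illustrative assumptions, not part of the thesis flow, and the parser handles only the simple subset of SPEF used in this appendix (no name maps, no coupling capacitances).

```python
def total_cap_per_net(spef_text):
    """Sum the *CAP entries of every *D_NET section; returns {net: total}."""
    totals = {}
    net = None        # name of the net currently being read
    in_cap = False    # True while inside a *CAP block
    for raw in spef_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*D_NET"):
            net = line.split()[1]
            totals[net] = 0.0
            in_cap = False
        elif line == "*CAP":
            in_cap = net is not None
        elif line.startswith("*"):
            # Any other keyword (*CONN, *I, *P, *RES, *END) closes the *CAP block.
            in_cap = False
            if line == "*END":
                net = None
        elif in_cap:
            # Entry format in this subset: <index> <node> <capacitance>
            totals[net] += float(line.split()[-1])
    return totals


# Hypothetical input: the G17 net from the listing, one statement per line.
SAMPLE = """\
*D_NET G17 0.1
*CONN
*I IV110_1:Y O *L 0.1 *D IV110
*P G17 O *L 0.1
*CAP
0 G17 0.1
1 IV110_1:Y 0.1
2 G17:0 0.1
*RES
0 G17 G17:0 0.1
1 IV110_1:Y G17:0 0.1
*END
"""

caps = total_cap_per_net(SAMPLE)
print(caps)
```

Because the sample file is a dummy generated from Verilog, the summed *CAP entries of a net need not equal the total given on its *D_NET line.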


Appendix C Power Waveforms Analysis

AND gate power waveforms at different frequency points. Note that the waveform shape and peaks match across the frequency range.

Figure 1 1MHz, Peak: 838.9 uW
Figure 2 100MHz, Peak: 840.7 uW
Figure 3 1GHz, Peak: 838.2 uW
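How closely the three peaks agree can be checked with a few lines of arithmetic. The sketch below uses only the peak values quoted in the figure captions; the variable names are illustrative.

```python
# Peak power per frequency point, in microwatts, from the figure captions.
peaks_uw = {"1MHz": 838.9, "100MHz": 840.7, "1GHz": 838.2}

values = list(peaks_uw.values())
mean_uw = sum(values) / len(values)
spread_uw = max(values) - min(values)      # largest minus smallest peak
spread_pct = 100.0 * spread_uw / mean_uw   # spread relative to the mean peak

print(f"mean = {mean_uw:.1f} uW, spread = {spread_uw:.1f} uW ({spread_pct:.2f} %)")
```

The spread is 2.5 uW, roughly 0.3 % of the mean peak, over three decades of frequency.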


Appendix D Current Characterization – sample spice deck

*
*epic tech="voltage 1.2v"
*epic "vdd 0 1.2 0.01"
*epic "vss 0 0 0.01"
*epic "invoke spice3 %input %output"
* spice options
.inc /user/kalpu/cloc/autochar/userware/spice_options noprint
* temperature = 25
.temp 25
.inc ../user_data/models_strong noprint
*.inc /db/pdk/1233c035a/current/models/current/tis/model.paths.strong noprint
.inc /user/kalpu/cloc/autochar/subckt/sr40/an210h noprint
PVDD 1.2v
vdd vdd 0 PVDD
RVDD VDD VDD_inv1 1000
RVSS VSS_inv1 0 1000
xinv1 A B Y VSS_inv1 vdd_inv1 an210h
*10 MHz
VA A 0 PULSE 0 PVDD 1n pslew pslew 50n 100n
*Vb B 0 PULSE 0 PVDD 1n pslew pslew 50n 100n
Vb B 0 PVDD
Pslew 0.01n
pload 50ff
CY Y 0 pload
.tran 0.01ns 250ns
.MEASURE TR AVGPWR AVG P(Vvdd) FROM=20ns TO=60ns
.punch tr V(Vdd_inv1 vss_inv1)
.punch tr I(VVDD)
.punch tr I(rvdd)
.punch tr V(A B Y)
*.punch tr I(rvdd rvss)
.end
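The .MEASURE statement in the deck asks the simulator for the average supply power between 20 ns and 60 ns. A minimal Python sketch of such a windowed average follows. It assumes the average is the time integral of the waveform divided by the window length, which is the usual definition, although the exact behavior depends on the simulator. The function name `windowed_average` and the triangular test waveform are illustrative, not simulation data from the thesis.

```python
def windowed_average(times, values, t_from, t_to):
    """Trapezoidal time-weighted average of a sampled waveform on [t_from, t_to].

    Assumes `times` is strictly increasing and covers the whole window.
    """
    def interp(t):
        # Linear interpolation between the two samples that bracket t.
        for i in range(len(times) - 1):
            t0, t1 = times[i], times[i + 1]
            if t0 <= t <= t1:
                v0, v1 = values[i], values[i + 1]
                return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        raise ValueError("t lies outside the sampled range")

    # Window edges plus every sample strictly inside the window.
    pts = [(t_from, interp(t_from))]
    pts += [(t, v) for t, v in zip(times, values) if t_from < t < t_to]
    pts.append((t_to, interp(t_to)))

    area = sum(0.5 * (v0 + v1) * (t1 - t0)
               for (t0, v0), (t1, v1) in zip(pts, pts[1:]))
    return area / (t_to - t_from)


# Hypothetical waveform: a triangular power pulse (time in ns, power in mW).
t_ns = [0.0, 20.0, 40.0, 60.0, 80.0]
p_mw = [0.0, 0.0, 2.0, 0.0, 0.0]
avg_mw = windowed_average(t_ns, p_mw, 20.0, 60.0)
print(avg_mw)
```

A triangle of height 2 over a 40 ns window averages to half its peak, which gives a quick sanity check for the routine.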


Appendix E Waveform transformation example

Figure 4 1MHz base Waveform, 830.4 uW
Figure 5 100MHz Transformation, 830.4 uW
Figure 6 1GHz Transformation for 1MHz, 830.4 uW
