Power Grid Analysis in VLSI Designs - SERC

Power Grid Analysis in VLSI Designs

A Thesis Submitted for the Degree of Master of Science (Engineering) in the Faculty of Engineering

By Kalpesh Shah

Super Computer Education and Research Centre
Indian Institute of Science
Bangalore – 560012
March 2007

Table of Contents

Acknowledgements
Abstract
1 Introduction
    1.1 Motivation
        1.1.1 Power Estimation
        1.1.2 Power Supply Noise
        1.1.3 MTCMOS Analysis
    1.2 Terms
    1.3 Thesis outline and Contribution
2 Toggle Activity Estimation
    2.1 Overview
    2.2 Toggle Activity Estimation
    2.3 Multi-million gate solution
        2.3.1 Deriving automatic toggle frequency values
        2.3.2 Hierarchical Modeling
    2.4 Validation and Results
    2.5 Summary
3 Power Estimation
    3.1 Overview
    3.2 Current approaches to Power Analysis
    3.3 Power analysis Tools
        3.3.1 Power Compiler: [67]
        3.3.2 Power Mill (or Nano Sim) [4][68]
        3.3.3 Prime Power [66]
        3.3.4 Other Tools
    3.4 Validation Flow
        3.4.1 Netlist Setup:
        3.4.2 Vector Generation
        3.4.3 Interconnect setup
    3.5 Validation and Results
    3.6 Power estimation applications
        3.6.1 Average power/ground bus currents
        3.6.2 Average power dissipation
        3.6.3 Electro migration failures
        3.6.4 Power Routing
        3.6.5 Gate Oxide Integrity Analysis
    3.7 Summary
4 Power Supply Noise Analysis
    4.1 Overview
    4.2 Cell Characterization
        4.2.1 Current Characterization Methodology
        4.2.2 Current Characterization Flow
    4.3 Power Grid network modeling
        4.3.1 Power Grid Current Waveform Modeling
    4.4 Complete Flow
        4.4.1 Timing Information Generation
        4.4.2 Power Grid Generator
        4.4.3 SPICE Simulation
    4.5 Validation and Results
        4.5.1 Peak Power Results
        4.5.2 Peak Dynamic IR Drop Results
    4.6 Summary
5 Power Up Analysis
    5.1 Switched PG Networks
    5.2 Switch Network Analysis
        5.2.1 Switch Characterization
        5.2.2 Current or Switch Prediction
    5.3 Results and Analysis
    5.4 Summary
6 Conclusion
    6.1 Summary
    6.2 Scope of Future Work
7 References
Appendix A Sample SDC file
Appendix B Sample SPEF Format
Appendix C Power Waveforms Analysis
Appendix D Current Characterization - sample SPICE deck
Appendix E Waveform transformation example

Table of Figures

Figure 1.1 Power Dissipation in CMOS designs
Figure 1.2 Power Density trend in CMOS designs
Figure 1.3 Leakage and Dynamic Power Dissipation [2]
Figure 1.4 Schematic of Power Grid in CMOS designs
Figure 1.5 Normalized delay and normalized delay to voltage ratio
Figure 1.6 Total power break up into leakage and active
Figure 2.1 Schematic of logic circuit 1
Figure 2.2 Schematic of Logic Circuit 2
Figure 2.3 Gated clock example
Figure 2.4 Gate Level Netlist for 'simple' design
Figure 2.5 Timing Arcs in extracted model of 'simple' design
Figure 3.1 Venn diagram of Power Components
Figure 3.2 Power Estimation in Design Stages
Figure 3.3 Power Estimation Validation Flow
Figure 3.4 Legends for Validation Flow
Figure 4.1 Voltage over time representation at an internal design node
Figure 4.2 Schematic circuit for instantaneous voltage drop analysis
Figure 4.3 Inverter waveforms measured at different nodes
Figure 4.4 Transition time vs. peak power for Inverter
Figure 4.5 Transition time vs. peak power for NAND gate
Figure 4.6 Load vs. peak power for AND gate
Figure 4.7 Load vs. peak power for OR gate
Figure 4.8 State Dependency on cell switching
Figure 4.9 Cell Characterization Flow
Figure 4.10 Power Grid Modeling
Figure 4.11 Peak IR drop Computation Flow
Figure 4.12 Prime Time flow for arrival time computation
Figure 4.13 Power Grid Generation Flow
Figure 4.14 PSN waveform of Proposed Method
Figure 4.15 PSN Reference Waveform
Figure 5.1 Gated Power Supply ([74])
Figure 5.2 Layout of 1M gate with switch network
Figure 5.3 Current Glitch and Voltage Ramp at arbitrary switch output
Figure 5.4 Typical PG network with Power Switches
Figure 5.5 Schematic Switch network Analysis Flow
Figure 5.6 Analysis model of Virtual Power Network
Figure 5.7 Infinitesimal Time Division for Current Prediction
Figure 5.8 Reduced Switch Network for validation
Figure 5.9 Voltage Ramp up over Time for various nodes
Figure 5.10 Current comparison over time
Figure 1 1MHz, Peak: 838.9 uW
Figure 2 100MHz, Peak: 840.7 uW
Figure 3 1GHz, Peak: 838.2 uW
Figure 4 1MHz base Waveform, 830.4 uW
Figure 5 100MHz Transformation, 830.4 uW
Figure 6 1GHz Transformation for 1MHz, 830.4 uW

List of Tables

Table 1.1 Consolidation of ITRS2003 Predictions
Table 1.2 Generic Term Definitions
Table 2.1 Comparison of Static vs Dynamic approaches for Power Estimation
Table 3.1 Power Modeling for CMOS gates
Table 3.2 ISCAS89 circuit description
Table 3.3 Runtime comparison between vector less and SPICE
Table 3.4 Clock Power vs. Total Power
Table 3.5 Power Estimation across various tools
Table 4.1 Comparison of Peak power Dissipation
Table 4.2 Comparison of percentage peak instantaneous IR drop
Table 4.3 Comparison of percentage peak IR drop on ISCAS89 circuits
Table 5.1 Switch Prediction by proposed algorithm
Table 5.2 Voltage Prediction
Table 5.3 Power Up analysis - Runtime Comparison

Abstract

Power has become an important design closure parameter in today’s ultra-deep-submicron digital designs. The impact of increasing power spans many disciplines, ranging from power supply design, power converter and voltage regulator design, and system, board and package thermal analysis to power grid design, signal integrity analysis and the minimization of power itself. This work focuses on the challenges that increasing power poses for power grid design and analysis.

The challenges arising from smaller geometries and higher power are well-researched topics, yet there is still considerable scope for further work. Traditionally, designs go through average IR drop analysis, which is highly dependent on the estimate of current dissipation. This work proposes a vectorless probabilistic toggle estimation method that extends one of the approaches proposed in the literature. The toggles computed using this approach are then used to estimate the power of the ISCAS89 benchmark circuits, which provides insight into the quality of the toggles being generated. The power estimation work is further extended to compare against various state-of-the-art methodologies, namely SPICE-based power estimation, logic-simulation-based power estimation and commercially available tools. Finally, we arrive at a flow recommendation that can be applied according to design need and schedule.

Today’s design complexity (high frequencies, high logic densities, and multiple levels of clock and power gating) has forced the design community to look beyond average IR drop. High rates of switching activity induce power supply fluctuations at the cells in a design, which is known as instantaneous IR drop. However, there is no good analysis methodology in place for this phenomenon. Ad hoc decoupling planning and on-chip intrinsic decoupling capacitance help to contain this noise, but offer no guarantee. This work also applies the average toggle computation approach to instantaneous IR drop analysis. Instantaneous IR drop is also known as dynamic IR drop or power supply noise. We propose a cell characterization methodology for standard cells. The characterization data is used to build a power grid model of the design. Finally, the power network is solved to compute the instantaneous IR drop.

Leakage power minimization has forced design teams to adopt complex power gating, namely multi-level MTCMOS usage in the power grid. This poses additional analysis challenges for the power grid in terms of ON/OFF sequencing and the noise it injects. This work explains the state of the art and highlights some of the issues and trade-offs of using MTCMOS logic. It further suggests a simple approach to quickly assess the impact of MTCMOS gates on the power grid in terms of peak currents and IR drop. The suggested approach also helps in MTCMOS gate optimization, and the overhead of leakage optimization can be computed early using it.

1 Introduction

1.1 Motivation

The VLSI industry is facing one of the biggest challenges in its evolution: power integrity closure, which follows the crosstalk-induced integrity issues of the previous decade. Power dissipation has increased phenomenally over the years, as shown in Figure 1.1, giving rise to this challenge. Figure 1.2 shows the increase in power density caused by aggressive scaling, which packs more components into a unit area.

Figure 1.1 Power Dissipation in CMOS designs (power in watts on a logarithmic scale versus year, 1971 to 2008, for processors from the 4004 to the Pentium® family, with levels of 500 W, 1.5 kW, 5 kW and 18 kW marked)

Figure 1.2 Power Density trend in CMOS designs (power density in W/cm2 on a logarithmic scale versus year, 1970 to 2010, for processors from the 4004 to the Pentium® family, with the power densities of a hot plate, a nuclear reactor and a rocket nozzle marked for reference)

Table 1.1 below consolidates the ITRS2003 [1] predictions on power and its impact on design and on operating voltages.

                      2003   2004     2005   2006   2007     2008   2009   2010     2012
                             (90 nm)                (65 nm)                (45 nm)
Vdd (High Perf) (V)   1.2    1.2      1.1    1.1    1.1      1      1      1        0.9
Vdd (Low Power) (V)   1      0.9      0.9    0.9    0.8      0.8    0.8    0.7      0.7
High Perf Power (W)   149    158      167    180    189      200    210    218      240
Battery Operated (W)  2.1    2.2      2.3    2.4    2.5      2.6    2.7    2.8      3
PG Pads               1700   1800     2000   2100   2200     2300   2400   2400     2600

Table 1.1 Consolidation of ITRS2003 Predictions
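The consequence of these predictions for the power grid can be made concrete with simple arithmetic on the high-performance rows of Table 1.1. The sketch below computes the average supply current from I = P / Vdd for each year. The per-pad figure rests on an assumption that is not in the table: that half of the PG pads are Vdd pads and that they share the current equally. The function names are illustrative only.

```python
# Values copied from the high-performance rows of Table 1.1 (ITRS 2003).
YEARS = [2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2012]
VDD_HIGH_PERF = [1.2, 1.2, 1.1, 1.1, 1.1, 1.0, 1.0, 1.0, 0.9]    # volts
POWER_HIGH_PERF = [149, 158, 167, 180, 189, 200, 210, 218, 240]  # watts
PG_PADS = [1700, 1800, 2000, 2100, 2200, 2300, 2400, 2400, 2600]


def supply_current(power_w, vdd_v):
    """Average supply current in amperes, from I = P / Vdd."""
    return power_w / vdd_v


def current_per_vdd_pad(power_w, vdd_v, pg_pads, vdd_fraction=0.5):
    """Average current per Vdd pad in amperes.

    vdd_fraction is an assumption (not from the table): the share of PG pads
    that are Vdd pads, with the current split equally among them.
    """
    return supply_current(power_w, vdd_v) / (pg_pads * vdd_fraction)


# Total average supply current for each year in the table.
CURRENTS = [supply_current(p, v) for p, v in zip(POWER_HIGH_PERF, VDD_HIGH_PERF)]

if __name__ == "__main__":
    rows = zip(YEARS, CURRENTS, POWER_HIGH_PERF, VDD_HIGH_PERF, PG_PADS)
    for year, amps, power, vdd, pads in rows:
        per_pad = current_per_vdd_pad(power, vdd, pads)
        print(f"{year}: {amps:6.1f} A total, {per_pad * 1000:5.1f} mA per Vdd pad")
```

With the table values, the total average current rises from about 124 A in 2003 to about 267 A in 2012, while the supply voltage, and with it the absolute voltage budget available for IR drop, falls.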

Further, Figure 1.3 shows that both the leakage and the dynamic components of power are continuously increasing, with leakage dominating dynamic power in newer technology nodes [2]. The next sections describe how these trends give rise to challenges in power grid analysis and lead to the work done here.

Figure 1.3 Leakage and Dynamic Power Dissipation [2]

1.1.1 Power Estimation

One of the challenges in power integrity analysis is to accurately predict the power dissipation, both average and peak, of a design. Power estimation is required for package thermal analysis, power minimization and power grid design.

The earliest proposed techniques for estimating power dissipation were based on circuit simulation, e.g. SPICE or fast SPICE simulators [3-6]. Besides being strongly pattern-dependent, these techniques are too slow to be used on modern very-large-scale integrated (VLSI) circuits, for which high power dissipation is a major problem.

To improve computational efficiency, other simulation-based techniques were proposed using various kinds of timing, switch-level and logic simulation [7-9]. In these approaches, lookup tables are obtained by electrical simulation of the basic library elements, and the collected data are then used during gate-level simulation. These techniques generally assume that the power supply and ground voltages are fixed, and only the supply current waveform is estimated. While they are more efficient than traditional circuit simulation, at the cost of some loss in accuracy, they remain strongly pattern-dependent and are still too slow for modern multi-million gate designs, where the whole chip cannot be simulated together.

To overcome the shortcomings of simulation-based techniques, research has focused on probabilistic and statistical techniques for toggle estimation. The use of probabilities to estimate power was first proposed in [11]. In that work, a zero-delay model was assumed so that the transition probabilities could be estimated from signal probabilities. A probabilistic power estimation approach that does compute the toggle power and does not make the zero-delay or temporal independence assumptions, called probabilistic simulation, was proposed in a few papers. In this technique, the use of probabilities was expanded to allow the specification of probability waveforms. This approach assumed spatial independence and was not restricted to synchronous circuits.

Another probabilistic approach was proposed by Farid N. Najm [12], who introduced the transition density measure of circuit activity. An algorithm was also presented for propagating the transition density through the circuit. This approach does not make a zero-delay assumption and makes only the spatial independence assumption. As a result of this independence assumption, the computed density values are insensitive to the internal circuit delays.

Yet another probabilistic approach was presented by A. Ghosh et al. in [13], where Binary Decision Diagrams (BDDs) were used to take into account internal node correlations and toggle power, at the cost of increased computation. More recent literature describes more accurate toggle estimation methods based on Bayesian networks [14-16], but these are limited in their ability to handle high gate count designs. All of the above probabilistic and statistical techniques are applicable only to combinational circuits; they require the user to specify information on the activity at the latch outputs.

This work addresses the toggle computation problem, or pattern dependence problem, for multi-million gate designs by extending Najm’s approach [12]. Using this extension, average power estimation has been performed at various stages of the design.

1.1.2 Power Supply Noise

With the phenomenal rise in switching speed in VLSI circuits, the probability of a large number of cells switching in a short period of time increases. A large number of simultaneous switching events in a short period can cause considerable noise in the power supply network of a circuit. Power supply noise means a decrease in the voltage seen across a cell’s power and ground (PG) nodes. A schematic of the power network grid is shown in Figure 1.4. The resistive parasitic R in the power distribution network is responsible for the resistive noise, which is the IR voltage drop in the PG network. Apart from R, on-chip decoupling capacitance also plays a big role. The switching noise in the power distribution network must be contained to a tolerable level to ensure the reliability and performance of a circuit.

Figure 1.4 Schematic of Power Grid in CMOS designs (power grid surrounded by a ring of IO, Vdd and Vss pads)

Excessive voltage drops manifest themselves as glitches on the PG buses and cause:
• Erroneous logic signals
• Degradation in switching speeds
• Reduction in noise margin and driving capability of the gates

According to a study on the Pentium® 4 [26], power supply noise can reduce clock frequency by 6.5% at the 130 nm node and by 8% at the 90 nm node. All of these effects are handled through various margins in the design flow, as there are no efficient solutions available to address the dynamic V drop problem.

Some work has been done on estimating peak power and decoupling capacitance in this regard. In [27], a pattern-independent, linear-time algorithm is described that estimates the maximum current waveforms at various contact points in the circuit. The algorithm is first demonstrated for simple gate delay and current models. An expression for modeling the delays and current waveforms of a general gate is derived, and a way to extend the algorithm to more general models is also described. The authors improved on this work in [28]. In [29], measures of peak power are proposed in the context of sequential circuits, and a procedure is presented that obtains lower bounds on these measures and provides the actual input vectors that attain such bounds. Automatic generation of a functional vector loop for near-worst-case power consumption is achieved. Paper [30] presents a statistical method for estimating the peak power dissipation in VLSI circuits. The method is based on the theory of extreme order statistics and its application to the probabilistic distributions of the cycle-by-cycle power consumption, maximum-likelihood estimation and Monte Carlo simulation. It can be used to predict the maximum power of a VLSI circuit over a set of constrained input vector pairs as well as over the complete set of all possible input vector pairs. The simulation-based nature of the method avoids the limitations of a gate-level delay model and a gate-level circuit structure. The method also produces maximum power estimates that satisfy user-specified error and confidence levels. Experimental results show that this method typically produces maximum power estimates within 5% of the actual value, at a 90% confidence level, by simulating fewer than 2500 input vectors. Another technique, described in [31], computes the peak power of a design while maintaining the accuracy of the current waveform. It models logic gates by breaking them into various nodes, and then models the currents in terms of these nodes, which are evaluated quickly during logic simulation to measure power. However, because it is based on logic simulation, it is extremely difficult to scale.

Chen and Ling [36] proposed an approach to estimate the power supply noise based on an integrated package-level and chip-level power bus model. Chang, Gupta, and Breuer [37] proposed an analytical model to estimate the ground bounce caused by switching in the internal circuitry of sub-micron VLSI circuits. Jiang, Cheng, and Deng [38] proposed a genetic-algorithm-based approach that considers the dependence of switching noise on input patterns under a distributed RC model of the PG network. Zhao, Roy, and Koh proposed an event-driven, simulation-based approach to calculate the worst-case power supply noise under a distributed RLC model [39].

There are still more challenges in this area, where very little work has been done.

First, to analyze power/ground (PG) noise, worst-case vectors are required, with which the parasitic network of the chip is simulated. Not only does the whole approach need a lot of data and memory, but today’s SPICE simulators cannot handle such complexity in terms of runtime and capacity. In practice, determining the worst-case vectors is rarely straightforward.
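As a rough illustration of the resistive IR drop described in this section, and of why it depends on which cells switch together, the sketch below solves a deliberately simplified case: a single rail fed from one Vdd pad, with cells tapping current along it. Everything here is a hypothetical toy model rather than the analysis method of this work; decoupling capacitance, inductance and the two-dimensional mesh of a real grid are ignored, and the resistance and current values are made up.

```python
def rail_voltages(vdd, seg_res, cell_currents):
    """Node voltages along a single rail fed from one pad at its left end.

    seg_res[k]       resistance (ohms) of the rail segment feeding node k
    cell_currents[k] current (amperes) drawn by the cell at node k

    Each segment carries the current of every cell at or beyond its node,
    so the drop across segment k is seg_res[k] * sum(cell_currents[k:]).
    """
    if len(seg_res) != len(cell_currents):
        raise ValueError("one rail segment is required per cell")
    voltages = []
    v = vdd
    for k, r in enumerate(seg_res):
        downstream = sum(cell_currents[k:])
        v -= r * downstream
        voltages.append(v)
    return voltages


def worst_drop_percent(vdd, voltages):
    """Largest drop along the rail as a percentage of the pad voltage."""
    return 100.0 * (vdd - min(voltages)) / vdd


if __name__ == "__main__":
    vdd = 1.2
    seg_res = [0.5, 0.5, 0.5, 0.5]     # hypothetical segment resistances
    quiet = [0.01, 0.01, 0.01, 0.01]   # few cells switching
    busy = [0.05, 0.05, 0.05, 0.05]    # many cells switching together
    for name, currents in (("quiet", quiet), ("busy", busy)):
        volts = rail_voltages(vdd, seg_res, currents)
        print(name, [round(x, 3) for x in volts],
              f"worst drop {worst_drop_percent(vdd, volts):.1f}%")
```

In this toy case the cell farthest from the pad sees 1.15 V when activity is low and 0.95 V when all cells draw five times the current, which is the pattern dependence that makes worst-case vector selection matter.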

Second, today’s designs have huge PG networks, and the voltages seen at the various nodes of such a network vary. The resultant voltage across the power and ground buses of a macro affects its delay, as shown in Figure 1.5. Note that delay is non-linear at low voltages. Further, the ratio of the change in delay to the change in voltage is even more non-linear than the delay itself; this is very important to designers, as it can cause timing issues or design failures. Because delay depends so strongly on voltage, dynamic V drop in the PG network is fast becoming a critical concern for chip designers [41][59-60].

Figure 1.5 Normalized delay and normalized delay to voltage ratio (rise delay, fall delay, and the rise and fall delay-to-voltage-change ratios, plotted against voltage from 1.2 down to 0.8)

The third aspect of the PG noise problem is that it is an iterative phenomenon [41]. When the voltage across a cell decreases due to a sudden rise in switching activity, the cell delays change, and hence so does the simultaneous switching. This in turn can reduce or increase the dynamic noise: reduce, in the sense that the amount of simultaneous switching may fall altogether; increase, because the hot spot may move from one part of the design to another. Handling this is not a trivial task from an analysis perspective.
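The non-linear dependence of delay on supply voltage noted above can be illustrated with a simple analytical model. The sketch below uses the alpha-power law, in which gate delay is proportional to V / (V - Vt)^alpha. This model is swapped in purely for illustration and is not the source of the curves in Figure 1.5; the threshold voltage, the exponent and the 1.2 V normalization point are assumed values.

```python
def alpha_power_delay(v, vt=0.35, alpha=1.3):
    """Relative gate delay from the alpha-power law: V / (V - Vt)**alpha.

    vt and alpha are illustrative assumptions, not values from this work.
    """
    if v <= vt:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return v / (v - vt) ** alpha


def normalized_delay(v, v_nom=1.2, **model):
    """Delay at supply v relative to the delay at the nominal supply v_nom."""
    return alpha_power_delay(v, **model) / alpha_power_delay(v_nom, **model)


def delay_to_voltage_ratio(v, dv=1e-4, **model):
    """Central-difference estimate of d(normalized delay) / dV at supply v."""
    hi = normalized_delay(v + dv, **model)
    lo = normalized_delay(v - dv, **model)
    return (hi - lo) / (2 * dv)


if __name__ == "__main__":
    for v in (1.2, 1.1, 1.0, 0.9, 0.8):
        print(f"V = {v:.2f}: delay x{normalized_delay(v):.3f}, "
              f"sensitivity {delay_to_voltage_ratio(v):+.3f} per volt")
```

Under these assumptions the delay grows faster than linearly as the supply falls, and the sensitivity of delay to voltage grows faster still, so the same amount of V drop costs more timing at a node that is already running at a reduced voltage.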

Fourth, design methodologies today expect analysis to meet predefined PG noise targets. In reality, any voltage drop is acceptable as long as the required timing goals are met. However, this is not exploited due to lack of analysis data.

Fifth, it has been found that devices often fail on testers because of excessive simultaneous switching during SCAN testing. This creates serious testability issues, so dynamic V drop must be analyzed not only for functional mode but also for other modes such as test.

This work addresses the dynamic PG noise problem, also described as the dynamic V drop problem in some literature. Based on the above-mentioned issues, the goal is to address the dynamic V drop problem with a runtime efficient enough for today's multi-million gate designs. The goal is also to evaluate the impact of dynamic V drop on timing.

1.1.3 MTCMOS Analysis

Leakage power constitutes more than half of the total power in today's ultra sub-micron designs. See Figure 1.6 below.

Figure 1.6 Total power break up into leakage and active

Leakage power control and power network integrity have become key areas of interest for today's power sensitive designs. In comments on the power consumption problem at the 2002 International Electron Devices Meeting, Intel chairman Andrew Grove cited off-state current leakage in particular as a limiting factor in future microprocessor integration [72]. Designers have been coming up with innovative ways to reduce leakage power using various techniques: reducing device power supply and frequency of operation [73], Multi-Vt transistor usage [74-79], controlling input states [74], memory leakage reduction [75], using reverse body bias [76], and using transistor stacks [77]. A detailed study of the sources of leakage power and reduction techniques can be found in [82].

Several techniques are available to reduce leakage; gated power supply using power switches is one of the most promising. Power switches consist of several PMOS transistors and controlling signals, and are used to dynamically switch the power supply to a specific region of the chip off or on. This work studies the challenges associated with using power switches and proposes a fast analysis technique to estimate peak currents during the power ramp-up of logic.

1.2 Terms

Generic terms used in this report are described below.

ASIC: Acronym for Application Specific Integrated Circuit. A custom or semi-custom integrated circuit, such as a cell or gate array, created for a specific application. The complexity of ASICs typically requires significant use of CAD techniques.

Block: Also known as functional block or module. Any block within the design hierarchy, instantiated one or more times, that will be laid out separately is referred to as a block module. Block modules are defined divisions of a chip based on functionality and can be worked on independently of other functional blocks.

Netlist: A description of the circuit. The description can be gate-level or Register-Transfer level (RTL). It can also be in different languages such as Verilog, VHDL or SPICE.

Physical Design: A portion of a chip or circuit corresponding to a block module that is laid out separately using a Physical Design tool. It is also referred to as a physical block, layout region, or layout block.

RTL: Acronym for Register Transfer Level.

Characterization: Electrical analysis performed for the purpose of determining typical device performance characteristics and/or parametric limits.

CMOS: Acronym for Complementary Metal Oxide Semiconductor. An MOS technology in which both P-channel and N-channel devices are fabricated on the same die.

Die: A single square or rectangular piece of silicon into which a specific semiconductor circuit has been diffused.

Electromigration: Particle migration in aluminum or copper thin-film or polysilicon conductors at grain boundaries as a result of high current densities. Electromigration can lead to either an open circuit condition in a conductor or a short between adjacent connectors.

Interconnect: The metallization connecting two or more active elements on the surface of a die; also, the wires connecting the die to the package leads.

Timing Window: A timing window specifies the interval at each circuit node in which a transition is anticipated. For a single clock domain, the time interval lies within a clock period. There can be more than one interval, or overlapping intervals, depending on the complexity of the paths converging on the node.

Table 1.2 Generic Term Definitions

1.3 Thesis outline and Contribution

There are 3 distinct problems addressed in this work.

First, Average Power Estimation using probabilistic toggle estimation for multi-million gate designs. Unless specified by the user, the approach calculates switching probabilities as well as switching rates at the different nodes of the circuit (including primary inputs). We have studied the switching activity calculation methods in the extensive literature already available and enhanced one of the techniques to meet multi-million gate design needs. This work helps in average dynamic power estimation and also addresses the challenges of toggle estimation, which has varied applications such as peak power estimation, power supply noise analysis and reliability analysis.

Second, Dynamic Power Supply Noise estimation. A prototype flow is developed in conjunction with the Prime Time STA flow and SPICE to measure power supply noise. The work describes a gate characterization methodology that involves a one-time SPICE simulation, and how the PG network is modeled using the characterized data.

The third problem addressed is power grid analysis where MTCMOS gates are inserted. The work focuses on MTCMOS analysis challenges and the key factors to consider when a group of logic turns ON from the OFF state. A flow is developed to estimate peak currents or optimize MTCMOS resistance and switches.

We restrict our scope to CMOS circuits mapped onto a predefined cell library, and we follow the two step paradigm: library modeling, and analysis of the design using the modeled information. Library modeling involves the description of cells and their functional, structural or electrical behavior as needed for block or design analysis, and happens once for all. Electrical behavior modeling happens through characterization using a circuit simulator (e.g. SPICE [3]).

The document is organized as follows. The toggle estimation problem is addressed in Chapter 2. Chapter 3 describes the various power estimation techniques and tools available in industry and compares their power numbers with the above toggle estimation method. Chapter 4 describes Power Supply Noise Estimation and Chapter 5 describes MTCMOS Power Up analysis. Finally, an extensive list of publications is given at the end for further reference.

2 Toggle Activity Estimation

2.1 Overview

In CMOS technologies, the chip components draw power supply current only during a logic transition, if we ignore the small leakage current. The current is also proportional to the supply voltage seen by the cell or macro. While this is considered an attractive low-power feature of these technologies, it makes power estimation and voltage drop highly dependent on the switching activity inside these circuits [11][97]. That is, a more active circuit consumes more current and hence contributes a higher voltage drop. The activity of a circuit is determined by running simulation patterns and analyzing the data. This pattern-dependence problem is serious. Often, the power of a functional block needs to be estimated when the rest of the chip has not yet been designed, or even completely specified. In such a case, very little may be known about the inputs to this functional block, and complete and specific information about its inputs would be impossible to obtain.

This motivates the pattern independent toggle activity estimation problem, often referred to as the vector-less approach. Since the vector-less approach does not require patterns, it is also called 'static', whereas the vector based approach is called 'dynamic'. Table 2.1 compares these 2 approaches.

STATIC | DYNAMIC
Uses probabilistic approach as described in [12] or zero delay simulation based approach. | Uses logic simulation to generate switching activity or SPICE simulation to calculate power.
Vector-less approach. | Vector based approach. Hence quality is as good as the input vectors. Imagine the number of patterns possible for a 100 input block.
Many times gives an upper bound. | Gives accurate results.
Modeling of certain elements (hard macro/complex block) is difficult. | Since it is vector based, functional models can be used during simulation.
Very fast (few minutes to hours). | Very slow (few days to weeks).
Lot of research into products for average power estimation. | Can give instantaneous power.
Synopsys has: Power Compiler | Synopsys has: Power Mill (Nano Sim)

Table 2.1 Comparison of Static vs Dynamic approaches for Power Estimation

This work describes the approach used for toggle frequency estimation and its limitations. It further proposes solutions to these limitations, which make the approach usable for big designs.

A few terms are defined below to clarify the discussion:

Transition Density: If a logic signal x(t) makes n(T) transitions in a time interval of length T, then the transition density of x(t) is defined as:

D(x) = n(T)/T, where T is very large (ideally infinite)

For large T, D(x) becomes a time invariant function, and hence there is no need to account for temporal correlation.

Toggle Frequency: If a node x toggles n(T) times over a time interval of length T, then the toggle frequency F(x) is defined as:

F(x) = n(T)/(2*T), where T is very large (ideally infinite)

For example, if a node is switching at 20 MHz, it is expected to switch 2 times in 50 ns. As can be seen, the toggle frequency can be converted to transition density or switching activity by the following equation:

Toggle density = # of transitions / Period = Switching Activity

All three terms mentioned above are used interchangeably in this document.

It should be noted that the toggle frequency of a node has no direct relation to the clock domain(s) in which the node (or logic) exists. We have used the clock domain frequency as an upper bound on the toggle frequency calculated by our approach.

Signal Probability: The signal probability P(x) at a node x is defined as the average fraction of clock periods in which the steady state value of x is logic high.

2.2 Toggle Activity Estimation

This section gives an overview of Farid Najm's work.

The Boolean difference of the output is computed with respect to each input pin. The Boolean difference of a function y (the output) with respect to x (one of the inputs) is defined as:

$$\frac{dy}{dx} = y|_{x=1} \oplus y|_{x=0} \qquad (1)$$

It was shown in [5] that, if the inputs x_i to a Boolean logic block are (spatially) independent, then the density of its output y is given by:

$$D(y) = \sum_{i=1}^{n} P\left(\frac{dy}{dx_i}\right) D(x_i) \qquad (2)$$

In (2), it is assumed that all inputs are independent. This can lead to inaccuracy where primary inputs diverge and then reconverge towards the primary outputs; such signals are not really spatially independent. However, at block level, the primary inputs can be considered largely independent, and hence the above approach can be modeled more accurately if the Boolean difference of the whole block is computed.

Given the signal probability and toggle density values at the primary inputs of a logic circuit, a single pass over the circuit, using (2), gives the density at every node. Note that apart from estimating toggle densities at the output node, we also need to calculate output signal probabilities in order to estimate the toggle density of subsequent logic. This is simple for a two input AND gate:

P(Y) = P(A)*P(B)

or, for a NAND gate:

P(Y) = 1 - P(A)*P(B)

2.3 Multi-million gate solution

The above approach gives good results for designs that are small, can be analyzed flat, and are dominated by combinational logic. Besides, it is not always possible to run flat because of logistic concerns: blocks are designed first, the rest of the design is being done hierarchically, or the design contains reusable IPs that do not have a netlist. The approach described in the previous section was extended to handle such requirements.

We also came across several issues while applying this approach to some large designs (>5M gates) and implementing the tool, Toggle Frequency Calculator. In this section, we discuss solutions that address each of these problems in detail.

2.3.1 Deriving automatic toggle frequency values

1 Primary Input Handling

The toggle rate at a primary input is not known. Since primary inputs are driven externally, there is no easy way to predict their toggle rate. The same is true for primary input signal probability. Consider Figure 2.1 and Figure 2.2.

Figure 2.1 Schematic of logic circuit 1

Figure 2.2 Schematic of Logic Circuit 2

In the circuits above, the inputs Clk or D going to the block can be primary inputs. Unless the user gives a toggle rate, it is very difficult to compute one. We used static timing analysis [24][25] specifications to derive these inputs. They are:

Input Delay Specification: A constraint that specifies the minimum or maximum amount of delay from a clock edge to the arrival of a signal at a specified input port. The input delay specification is with respect to a clock that triggers events on that signal.

Clock Specification: Specifies the characteristics of a clock, including the clock name, source, period and waveform.

Mode Specification: Specifies the constant values applied to certain ports or pins to drive timing analysis in a specific mode. This means that these pins or ports do not toggle during the analysis. It also specifies the constant value to which the port or pin is tied.

For clock inputs, we used the toggle rate given by the clock specification.
For non-clock inputs, we used the clock named in the Input Delay specification.
For constant ports, we used a toggle rate of 0 and a static probability based on the tied constant value, i.e. if it is constant 0, the static probability is 0, else it is 1.
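The three rules above can be illustrated with a minimal Python sketch. It is not the Toggle Frequency Calculator itself. The dictionary layout, the function name and the port names are hypothetical. Two points are assumptions that the text does not state: a non-clock input is given the frequency of its related clock as its toggle rate, and every non-constant input is given a static probability of 0.5.

```python
from dataclasses import dataclass


@dataclass
class InputActivity:
    toggle_rate_hz: float      # toggle rate assigned to the port
    static_probability: float  # probability that the port is at logic high


def derive_primary_input_activity(ports, clocks, input_delays, constants,
                                  default_probability=0.5):
    """Assign activity to primary inputs from timing constraints.

    clocks:       clock port -> clock period in seconds (clock specification)
    input_delays: data port  -> name of the related clock (input delay spec)
    constants:    port       -> tied value 0 or 1 (mode specification)
    """
    activity = {}
    for port in ports:
        if port in constants:
            # Constant port: never toggles, probability follows the tied value.
            activity[port] = InputActivity(0.0, float(constants[port]))
        elif port in clocks:
            # Clock port: rate comes from the clock specification.
            activity[port] = InputActivity(1.0 / clocks[port], default_probability)
        elif port in input_delays:
            # Data port: ASSUMPTION, use the frequency of the related clock.
            related_clock = input_delays[port]
            activity[port] = InputActivity(1.0 / clocks[related_clock],
                                           default_probability)
        else:
            # No constraint found: leave the value for the user to supply.
            activity[port] = None
    return activity


# Hypothetical block with a 100 MHz clock, one data input and one mode pin.
example = derive_primary_input_activity(
    ports=["clk", "d", "scan_en", "unconstrained"],
    clocks={"clk": 10e-9},
    input_delays={"d": "clk"},
    constants={"scan_en": 0},
)
```

Ports without any constraint are returned as `None`, so that a user-supplied value can be filled in instead of a silent default.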

A sample SDC file with the above commands is shown in Appendix A. Note that an SDC file is a collection of commands in Tcl format, so we have shown only the commands which are primarily required.

2 Sequential element modeling (e.g. flip-flops, latches)

Sequential elements do not switch arbitrarily whenever an input switches. Hence, we cannot apply equations (1) and (2).

We used the following formula to compute the toggle frequency at the output of sequential cells. Note that by sequential cells we mean latches and basic flip-flops, not complex macros, which are dealt with separately.

Qout = min(DataInput, clock/2)

The upper bound of clock/2 is required since we identified certain cases where the data input toggles more than clock/2. This is explained below. Where the data input toggles no more than clock/2, the output cannot toggle more than the data input. The above equation takes care of both facts.

3 Boolean gates that did not reflect realistic scenarios: exor/exnor gates, mux

For exor/exnor gates, equations (1) and (2) can compute a toggle rate higher than the clock toggle rate. It can go even higher if there are more such gates in the transitive fan-out. We found that this does not happen on actual designs and, in many cases, was not the intended behavior. We treated such cells as exceptions and clipped their toggle rate to half of the clock toggle rate.

In a similar fashion, we treated mux cells as exceptions and assigned the output toggle rate to the maximum toggle rate of all inputs.
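Equation (2) and the adjustments described in items 2 and 3 can be sketched in Python as follows. This is an illustrative sketch, not the thesis implementation. All function names are hypothetical, inputs are assumed to be spatially independent as in equation (2), and the Boolean difference probability is obtained by enumerating the truth table, which is only practical for small gates.

```python
from itertools import product


def boolean_difference_probability(func, probs, i):
    """P(dy/dx_i): probability that flipping input i changes the output."""
    n = len(probs)
    others = [k for k in range(n) if k != i]
    total = 0.0
    for values in product([0, 1], repeat=len(others)):
        assignment = dict(zip(others, values))
        # Probability of this assignment of the other (independent) inputs.
        weight = 1.0
        for k, v in assignment.items():
            weight *= probs[k] if v else 1.0 - probs[k]
        y_high = func(*[1 if k == i else assignment[k] for k in range(n)])
        y_low = func(*[0 if k == i else assignment[k] for k in range(n)])
        if y_high != y_low:  # y|x=1 XOR y|x=0, equation (1)
            total += weight
    return total


def output_density(func, probs, densities):
    """Equation (2): D(y) = sum over i of P(dy/dx_i) * D(x_i)."""
    return sum(boolean_difference_probability(func, probs, i) * densities[i]
               for i in range(len(probs)))


def flop_output_rate(data_rate, clock_rate):
    """Item 2: Qout = min(DataInput, clock/2)."""
    return min(data_rate, clock_rate / 2.0)


def xor_output_rate(probs, input_rates, clock_rate):
    """Item 3: equation (2) for an exor gate, clipped to half the clock rate."""
    raw = output_density(lambda *bits: sum(bits) % 2, probs, input_rates)
    return min(raw, clock_rate / 2.0)


def mux_output_rate(input_rates):
    """Item 3: a mux output takes the maximum toggle rate of its inputs."""
    return max(input_rates)


# Two-input AND gate with P(A) = P(B) = 0.5 and input densities 10 and 20.
and_density = output_density(lambda a, b: a & b, [0.5, 0.5], [10.0, 20.0])
```

For the AND gate, P(dy/dA) equals P(B), so the example evaluates to 0.5*10 + 0.5*20. For an exor gate the Boolean difference is always 1, so the unclipped density is the plain sum of the input densities, which is why it can exceed the clock rate.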

4 Complex loop handling

Loops were handled by breaking them. We broke each loop at the first point where we found the loop forming.

5 Unconnected inputs going into logic

This was handled by tracking the first sequential cell encountered in the transitive fan-out of the unconnected inputs. This gives the clock controlling the toggle rate down the line.

If the unconnected inputs are clocks, we assigned the worst toggle rate of the block itself.

6 Gated clocks or generated clocks

A gated clock is a clock signal that can be modified by logic within the design, such as a clock that can be turned off to save power. A schematic of a gated clock is shown in Figure 2.3.

Figure 2.3 Gated clock example

We made the gating elements transparent for toggle propagation. A clock gating cell is handled like a buffer.

7 Design Constraints: guidelines for realistic, usable toggle activity estimation

Some care needs to be taken despite all the above solutions. For example, toggle estimation must be done based on the targeted application. This drives certain inputs used in 1-6 above. In the implementation, we kept certain hooks to give control to the user.

2.3.2 Hierarchical Modeling

1. A huge portion of the design is occupied by memories; however, calculating memory output switching activity is not straightforward.
2. Complex functionalities: hard macros.
3. Multi-million gate designs cannot afford flat analysis due to cycle time and the inherent limitations of probabilistic approaches. We needed to devise a method for hierarchical analysis that models sub-blocks and uses them as black boxes.

We used the timing modeling approach to handle (1), (2) and (3).

All standard library components are presently modeled in a Liberty file [69]. Static timing analysis tools can generate a similar Liberty file for blocks after completing the analysis [25]. This file has the following information:

• Input pin to output pin timing arcs
• Setup and hold constraints for the data input and clock input
• Output timing with respect to either an input pin or the related clock

We derive the output toggle frequency f(out) as below.

For an input-to-output timing arc:

f(out) = maximum(all controlling input toggle rates)

For a clock-to-output timing arc:

f(out) = average switching activity of clock domain

Figure 2.4 shows the gate level netlist of a design called 'simple'. Figure 2.5 shows the timing arcs that will be extracted by Prime Time, a leading industry timing analysis tool [25]. The timing arc information is used to compute the output toggle rate as explained below.

Figure 2.4 Gate Level Netlist for 'simple' design

Figure 2.5 Timing Arcs in extracted model of 'simple' design

There are combinational arcs from i3 to out2 and from i1 to out2. Hence, the output toggle rate at out2 is controlled by the same clock as i3 or i1. In this case, we assign the maximum of the i3 and i1 toggle rates to the output pin. The other timing arc is clk2->out1. In this case, out1 is assigned the average switching activity of clk2.

Thus, using timing model information, we generate output toggle rates for memories, complex hard macros or blocks.

2.4 Validation and Results

The above changes were incorporated into executable code and applied to the ISCAS89 circuits. The results were compared through power estimation, as discussed in the next chapter.
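The timing-arc rule of Section 2.3.2, applied to the 'simple' design, can be sketched in Python as below. The arc tuple layout, the function name and all numeric rates are hypothetical. Taking the maximum when an output has arcs of both kinds is an assumption of this sketch; the text does not cover that case.

```python
def block_output_toggle_rates(arcs, input_rates, clock_domain_activity):
    """Derive output toggle rates of a black-box block from its timing arcs.

    arcs: list of (source_pin, output_pin, kind), where kind is
          'combinational' or 'clock_to_output'
    input_rates: input pin -> toggle rate
    clock_domain_activity: clock pin -> average switching activity of its domain
    """
    candidates = {}
    for source, output, kind in arcs:
        if kind == "combinational":
            rate = input_rates[source]
        elif kind == "clock_to_output":
            rate = clock_domain_activity[source]
        else:
            raise ValueError(f"unknown timing arc kind: {kind}")
        candidates.setdefault(output, []).append(rate)
    # Maximum over all controlling arcs of each output pin.
    return {output: max(rates) for output, rates in candidates.items()}


# Arcs of the 'simple' design as described in the text; rates are made up.
simple_arcs = [
    ("i3", "out2", "combinational"),
    ("i1", "out2", "combinational"),
    ("clk2", "out1", "clock_to_output"),
]
simple_rates = block_output_toggle_rates(
    simple_arcs,
    input_rates={"i1": 12.0e6, "i3": 20.0e6},
    clock_domain_activity={"clk2": 8.0e6},
)
```

With these made-up numbers, out2 takes the larger of the i1 and i3 rates and out1 takes the average activity of the clk2 domain, which matches the worked description above.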

2.5 Summary

In this work, we address real issues faced by large designs. Automatic toggle generation eases usability and improves accuracy. Hierarchical analysis supports hierarchical design, which is a common methodology for handling design complexity.

3 Power Estimation

3.1 Overview

Accurate power estimates are necessary at various stages of the design in order to make correct architectural, implementation and cost tradeoffs [61]. Architectural level tradeoffs are higher level and involve software or instruction level power modeling, or high level activity numbers for different blocks, to make implementation tradeoffs. Many times weighted averages are used to identify the best cost options [62-65]. Once the design is converted to a structural netlist and Physical Design starts, power estimation mainly drives package design, PG network design and lower level power minimization. In this case, power dissipation is described as below.

P = (A*C*V^2*f) + (τ*A*V*Ishort) + (V*Ileak)

where

A = activity factor. This specifies the amount of switching at the various internal nodes of the design. Note that 'f' is the clock frequency, which is readily available for most designs. The activity factor specifies how much a node toggles per 'f' transitions of the clock. It can be derived from simulation patterns of the logic.

C = capacitance. Interconnect load capacitance or wire capacitance.

V = dynamic voltage. Voltage at which the logic operates.

f = frequency. Clock frequency at which the logic operates.

Ishort = short-circuit current during switching. During a transition in CMOS logic, both the NMOS and PMOS devices are ON momentarily. During this time, current finds a direct path from power supply to ground. This is called short-circuit current. It depends on the input transition duration of the CMOS gate.

τ = duration of short-circuit current

Ileak = leakage current [72-80][32]

Figure 3.1 defines the various components of power and their relationship or contribution to the total power estimate.

Figure 3.1 Venn diagram of Power Components. The diagram contains the following items:

• Internal power: cell internal switching power (can vary based on macro size) and short circuit power.
• Short circuit power: power dissipated by a momentary short circuit between the P and N transistors of a gate during switching.
• Switching power (70-80%): power dissipated by the charging and discharging of the load capacitance, (VDD^2) * Σ_{∀Cell} (Cload(i) * TR(i)).
• Static (leakage) power (5%): power dissipated by a gate when it is not switching, Σ_{∀Cell(i)} PCellLeakage(i).
• Dynamic power consists of switching power and short circuit power.
• The ASIC flow characterizes libraries for average and leakage power.
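As a small numerical illustration, the power expression of Section 3.1 can be evaluated term by term in Python. The function name and all input values are hypothetical and chosen only to exercise the formula. The three terms are transcribed from the expression exactly as it is written in the text.

```python
def power_components(activity, capacitance_f, voltage_v, frequency_hz,
                     tau_s, i_short_a, i_leak_a):
    """Evaluate P = (A*C*V^2*f) + (tau*A*V*Ishort) + (V*Ileak) term by term."""
    switching = activity * capacitance_f * voltage_v ** 2 * frequency_hz
    short_circuit = tau_s * activity * voltage_v * i_short_a
    leakage = voltage_v * i_leak_a
    return {
        "switching": switching,
        "short_circuit": short_circuit,
        "leakage": leakage,
        "total": switching + short_circuit + leakage,
    }


# Hypothetical node: A = 0.2, C = 1 pF, V = 1.2 V, f = 100 MHz.
node_power = power_components(
    activity=0.2,
    capacitance_f=1e-12,
    voltage_v=1.2,
    frequency_hz=1e8,
    tau_s=1e-10,
    i_short_a=1e-3,
    i_leak_a=1e-6,
)
```

Keeping the terms separate makes the voltage dependency visible: the switching term scales with V^2, while the other two terms scale linearly with V for fixed currents.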

In this work, the above power components and their computation are studied extensively. To address the problem in a systematic manner, power estimation has been simplified in the following way. These assumptions are acceptable given the global analysis that we are considering.

Power supply and ground voltage levels throughout the chip are fixed, so that it becomes simpler to compute the power by estimating the current drawn by every sub-circuit assuming a given fixed power supply voltage. Note that this does not mean that different blocks cannot be at different voltage levels. This allows pre-characterizing library components for the required voltage points.

The circuit is built of logic gates and latches or reusable IPs, and has the popular and well-structured design style of a synchronous sequential circuit. In other words, it consists of flops driven by a common clock and combinational logic blocks whose inputs (outputs) are derived from flop outputs (inputs). It is also assumed that the flops are edge-triggered and that, with the use of CMOS design technology, the circuit draws no steady-state supply current. This allows breaking down the average power dissipation of the circuit into 2 components:

• The power consumed by the flops
• The power consumed by the combinational logic blocks.

This chapter is organized as follows. The next section further explains cell based power analysis. The section after that briefly introduces the tools used for comparison against the power estimation performed with the toggle computation described in the previous chapter. Validation and results are described last.

3.2 Current approaches to Power Analysis

Cell based power estimation consists of cell characterization and logic simulation or activity estimation. The characterization phase entails a set of electrical simulations of each library cell for all possible input transitions and for a wide range of fanin and fanout conditions. Timing and power information obtained in this way is used to construct lookup tables for the basic library elements [46][69].

Summing the leakage power of the design's constituent library cells gives the total leakage power of a circuit:

$$P_{leakageTotal} = \sum_{\forall Cell(i)} P_{CellLeakage}(i) \qquad (3)$$

where P_CellLeakage(i) is the leakage power dissipation of each cell. Technology library developers annotate the library cells with the approximate total leakage power dissipated by each cell. There is usually a single static power number per library cell, but sometimes leakage power can depend on the logical condition of the cell. In this case, the library cell is annotated with a state dependent static power.

A cell's internal power is the sum of the internal power of all of the cell's inputs and outputs as modeled in the technology library:

$$P_{Internal} = \sum_{\forall Pin(i)} E_i \cdot A(i) \cdot f(i) \qquad (4)$$

where E_i is the internal energy of each pin. In practice, the internal energy of a pin is characterized in the technology library and can be accessed by simple table look-up. Depending on the required accuracy, different look-up tables can be provided by the library designers, as explained in Table 3.1.

Lookup Table | Pin Direction | Indices
One dimensional | Input/Output | Input transition OR output load capacitance
Two dimensional | Output | Input transition and output load capacitance
Three dimensional | Output | Input transition and output load capacitance of the two outputs that have equal or opposite logic values

Table 3.1 Power Modeling for CMOS gates

The switching power is calculated in the following way:

$$P_{switching} = V_{DD}^{2} \cdot \sum_{\forall Cell} C_{load}(i) \cdot A(i) \cdot f(i) \qquad (5)$$

where Cload(i) is the capacitive load of net i. Without any physical information, the load capacitance Cload(i) is calculated using the wire load model of the net and the fanout of the driving pin. Usually, this approach achieves relative accuracy.

Apart from the approaches mentioned above, the following factors are also important for accurate power estimation.

1. Temperature dependency of power. Power consumption in CMOS depends on mobility factors, threshold voltage and doping concentrations. These factors are temperature dependent. Hence power also varies with temperature.
2. Voltage dependency of power. The voltage dependency of power is well known (P=C*V*V*f), and it holds for CMOS technology as well. If we model the CMOS component as a capacitor, it is clear that power varies with the supply voltage.
3. Power increases with the frequency of operation. In fact, many designs nowadays have different modes of operation: a high frequency mode when the device is operational and a low frequency mode when the device is in standby. The impact of frequency on power estimation has already been discussed in the previous section.
4. Nowadays, most designs have a significant share of flops or registers. According to one statistic, flops make up around 40-50% of the logic of a design. If all the flops are clocked throughout the operation, the clock network consumes almost 50% of total power. It is sometimes helpful to analyze the power consumption of the clock network. This work analyzes the clock power contribution to total power.
5. The process corner also impacts currents and power consumption. This is especially true for leakage power. A typical VLSI process has a leakage power variation on the order of 4-6 times from worst process to best process.
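Equations (3) to (5) of Section 3.2 can be combined into a small cell based estimate, sketched here in Python. The data classes, field names and numeric values are hypothetical. A real flow would read the leakage and internal energy values from the characterized library tables rather than from constants.

```python
from dataclasses import dataclass, field


@dataclass
class Pin:
    internal_energy_j: float  # E_i, from a library look-up table
    activity: float           # A(i)
    frequency_hz: float       # f(i)


@dataclass
class Cell:
    leakage_w: float           # P_CellLeakage(i)
    load_capacitance_f: float  # Cload(i) of the net driven by the cell
    output_activity: float     # A(i) of the driven net
    output_frequency_hz: float # f(i) of the driven net
    pins: list = field(default_factory=list)


def leakage_power(cells):
    """Equation (3): sum of per-cell leakage power."""
    return sum(cell.leakage_w for cell in cells)


def internal_power(cells):
    """Equation (4): sum over all pins of E_i * A(i) * f(i)."""
    return sum(pin.internal_energy_j * pin.activity * pin.frequency_hz
               for cell in cells for pin in cell.pins)


def switching_power(cells, vdd_v):
    """Equation (5): VDD^2 * sum over cells of Cload(i) * A(i) * f(i)."""
    return vdd_v ** 2 * sum(
        cell.load_capacitance_f * cell.output_activity * cell.output_frequency_hz
        for cell in cells)


# Two hypothetical cells at VDD = 1.2 V and f = 100 MHz.
design = [
    Cell(1e-9, 2e-15, 0.5, 1e8, pins=[Pin(1e-15, 0.5, 1e8)]),
    Cell(2e-9, 4e-15, 0.25, 1e8, pins=[Pin(2e-15, 0.25, 1e8),
                                       Pin(1e-15, 0.5, 1e8)]),
]
total_power_w = (leakage_power(design) + internal_power(design)
                 + switching_power(design, 1.2))
```

The activity values A(i) are the only inputs that depend on the stimulus, which is where the toggle estimates of Chapter 2 would enter such a calculation.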

Based on the power sensitivity and tool study analysis in this section, we propose a power estimation flow for a typical design cycle, as shown in Figure 3.2 below. Note that the power analysis varies from RTL design to pre-layout netlist to post-layout netlist.

Figure 3.2 Power Estimation in Design Stages (flow diagram; design stages: Architecture, RTL, Unplaced Netlist, Placed Netlist, Detailed Route Over; blocks: Power Estimation (spreadsheet), Forward SAIF* or Frequency Constraints, Toggle Frequency Calculator, Logic Simulation, PIF File Generation, Power Estimation in Power Compiler (wire load, global SPEF, detailed SPEF), RC SPICE Netlist, NanoSim, PrimePower; paths are marked Recommended or Least Preferred)
* SAIF - Switching Activity File based approach

3.3 Power analysis Tools

3.3.1 Power Compiler [67]

Formerly known as Design Power, Power Compiler is currently the most widely used Synopsys tool. Power Compiler, typically used during synthesis, performs power optimization as well as power estimation. The tool has static algorithms for calculating switching activity at various circuit nodes and propagating it. It is a known fact that Power Compiler cannot estimate switching activity well for sequential cells. It should also be noted that most ASIC vendors have cell power modeling based on Synopsys Liberty syntax, so it is highly important for single cell power estimation to be close to the Power Compiler number. The Synopsys Reference Manual on Power Compiler [18] gives the basic power calculation theory and a description of the terms used in its tools.

We used Power Compiler in two modes.

The first mode was to use Power Compiler as a complete solution for power estimation. In this approach, we generated input switching activity from our vectors and specified it to Power Compiler. Power Compiler propagated the switching activity based on switching probability and then calculated power. In this method, it used some assignment method for sequential cells, and we went ahead with that because our aim was to verify the default switching activity propagation algorithm of Power Compiler.

The second mode was to use Power Compiler just as a power calculation engine. In this approach, we generated switching activity at all the nodes by using the methodology defined in Chapter 2 and used the power calculation engine. As mentioned earlier, the power calculation engine is quite accurate, so our aim was to evaluate, based on the power estimate, the accuracy of switching activity determination by other methods.

3.3.2 Power Mill (or Nano Sim) [4][68]

Power Mill is a Synopsys tool (currently known as Nano Sim) with a fast SPICE engine at its core. It was found to correlate well with SPICE for two of the single cell circuits and one small design. Power Mill is a dynamic simulation based tool and hence requires patterns for simulation.

We used Power Mill to calculate average and peak power. The main reason was the runtime advantage of Power Mill compared to SPICE. It should be noted that Power Mill can take a SPICE netlist as input, so any switching between Power Mill and SPICE is transparent, if needed.

3.3.3 Prime Power [66]

Prime Power is another offering in the Synopsys power portfolio. It is a dynamic, vector based solution. The key difference is that Power Mill is a SPICE based tool whereas Prime Power is a logic simulation based tool. In other words, Power Mill is tuned more for accuracy and analog designs, whereas Prime Power is tuned for digital, and specifically ASIC, designs with reasonably good accuracy. Prime Power has a PLI interface to leading industry simulators, e.g. VCS, Modelsim, Verilog etc. While doing logic verification with these simulators, if we instantiate one call/command, the PLI dumps binary files. These binary files can be used in Prime Power for power estimation. It should be noted that Prime Power can also do peak power analysis.

We used Prime Power for both average and peak power analysis. The simulator interface used was VCS.

3.3.4 Other Tools

This project used VTRAN for converting vectors to SPICE stimulus. VTRAN is one of the Synopsys offerings and is a generic translator of vectors from one format to another. It supports all major industry formats as well as the internal formats of many prominent ASIC/EDA vendors.

VCS was used for logic simulation. There is no specific reason for using this simulator except that it is a Synopsys offering and so works with Prime Power without major hurdles.

A few TI internal programs were used to set up an automated flow. They are listed below.

1. genFuncTDL: an internal utility to generate random vectors with a specified clock rate.
2. SimOut: a test constraint validation environment.
3. SDFAligner: translates SDF from one simulator's format to a format compatible with another simulator.
4. SigProbGen: converts vectors to input switching activity and calculates probabilities.
5. DREPGEN: generates data compatible with TFC.
6. A translator from ASCII benchmark data to Verilog netlist and SPICE netlist.

3.4 Validation Flow

The validation flow diagram, data management and color convention are shown in Figure 3.3. Some of the key steps are described below.

Figure 3.3 Power Estimation Validation Flow (flow diagram; labels: DREPGEN, DREP file + data, GENFUNCTDL, random TDL, Verilog netlist, DC scripts, TFC, user frequency file, SIGPROBGEN, translator, Verilog, power estimation, switching activity file, VTRAN cmd, VTRAN, ISCAS89 circuits, SPICE netlist, POWERMILL, PWL file, SimOut cfg, translator, SPICE cmd, SDF, test bench, power, PrimePower, PIF, VCS_PIF, full VCD, comparison and report)

• White: Third party tools
• Green: Automatically generated data or written translator
• Grey: TI tools
• Default: Standard inputs/outputs
• Blue: Final output
• Ellipse: Data file(s)
• Rhombus: Process block(s)

Figure 3.4 Legends for Validation Flow

3.4.1 Netlist Setup

Standard industry benchmark circuits, ISCAS89, are used for the validation. Circuit complexity ranges from 14 gates to 22000 gates. Detailed statistics of the circuits are given in Table 3.2. [71]

To make the validation complete, two single-cell circuits are added for ‘micro’ level validation.

ISCAS89 benchmark circuits were mapped to 130nm technology for analysis. No optimization or synthesis was used while mapping the circuits to 130nm technology; instead, a predetermined set of cells was used:
• 2-, 3- and 4-input AND/NAND gates
• 2-, 3- and 4-input OR and NOR gates
• Buffers and inverters
• 2- and 3-input XOR and XNOR gates
• Flops

3.4.2 Vector Generation

Random vectors were generated for all the ISCAS89 circuits. The number of vectors was based on circuit complexity and number of gates, varying from approximately 4 to 38000 vectors. The same set of vectors is used for logic simulation and SPICE simulation, as well as for deriving switching activity and static probabilities for input pins.

3.4.3 Interconnect Setup

All the circuits are treated as synthesized Verilog netlists, and hence parasitic information was not available. To make the comparison more realistic, no-load modes were used in Power Compiler and in SPICE simulation. The logic simulation was based on SDF generated from Synopsys.

3.5 Validation and Results

The complete data from the different tools are shown in Table 3.5. Table 3.2 describes the circuits used for benchmarking. Table 3.3 compares run time between the dynamic method and the modified toggle computation method for some of the big design blocks. Table 3.4 shows power estimation for the clock network vs. total power estimation. All the power data is dynamic power in uW.

• The power numbers mainly reflect cell internal power and the switching power due to gate input capacitances only, as no interconnects were assumed.
• All experiments are done at the nominal operating point, i.e. normal process, 25 C temperature and 1.2 V (nominal voltage).
• Clock network power is around 50% of total dynamic power for many circuits, but this does not hold in all cases (Table 3.4 ranges from about 8% to 83%).
• Run time reduction with the static approach is of the order of 1000 times (Table 3.3).
• Prime Power reported power is optimistic compared to PowerMill in many cases. This is not what we expected and we are looking into it.
• TFC is within 30% of PowerMill reported power. However, there are certain exceptions where it reports 30% optimistic power or >50% pessimistic power.
• Power Compiler is >50% pessimistic in most of the cases.
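As a rough check of the run-time claim, the ratio can be computed directly from the Table 3.3 figures. This is a minimal sketch that assumes "mts" denotes minutes and that PowerMill CPU hours can be compared directly with those minutes; the names `RUNTIMES` and `speedup` are illustrative and not part of the thesis flow.

```python
# Run-time figures copied from Table 3.3:
# design -> (TFC + Power Compiler minutes, PowerMill CPU hours).
RUNTIMES = {
    "S13207": (3, 23), "S13207_1": (3, 24), "S15850": (3, 25),
    "S15850_1": (3, 26), "S35932": (6, 250), "S38417": (6, 189),
    "S38584": (7, 205), "S38584_1": (7, 212),
}


def speedup(static_minutes, dynamic_cpu_hours):
    """Ratio of dynamic run time to static run time (assumes comparable units)."""
    return dynamic_cpu_hours * 60.0 / static_minutes


ratios = {name: speedup(*t) for name, t in RUNTIMES.items()}
for name, ratio in ratios.items():
    print(f"{name:10s} {ratio:7.0f}x")
```

Under these assumptions the ratio spans roughly 460x (S13207) to 2500x (S35932).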

Design Name  IN   OUT  Flops  Boolean (gates+inv)
s111         8    1    0      8
s1196        14   14   18     388+141
s1238        14   14   18     428+80
s13207       31   121  669    2573+5378
s13207_1     62   152  638    2573+5378
s1423        17   5    74     490+167
s1488        8    19   6      550+103
s1494        8    19   6      558+89
s15850       14   87   597    3448+6324
s15850_1     77   150  534    3448+6324
s208_1       10   1    8      66+38
s27          4    1    3      8+2
s298         3    6    14     75+44
s344         9    11   15     101+59
s349         9    11   15     104+57
s35932       35   320  1728   12204+3861
s382         3    6    21     99+59
s38417       28   106  1636   8709+13470
s38584       12   278  1452   11448+7805
s38584_1     38   304  1426   11448+7805
s386         7    7    6      118+41
s4           2    1    1      0
s400         3    6    21     106+58
s420_1       18   1    16     140+78
s444         3    6    21     119+62
s5           2    1    0      1+0
s510         19   7    6      179+32
s526         3    6    21     141+52
s526n        3    6    21     140+54
s5378        35   49   179    1004+1775
s641         35   24   19     107+272
s713         35   23   19     139+254
s820         18   19   5      256+33
s832         18   19   5      262+25
s838_1       34   1    32     288+158
s9234        19   22   228    2027+3570
s9234_1      36   39   211    2027+3570
s953         16   23   29     311+84

Table 3.2 ISCAS89 circuit description

Design     TFC + Power Compiler runtime (min)  PowerMill runtime (CPU hr)
S13207     3                                   23
S13207_1   3                                   24
S15850     3                                   25
S15850_1   3                                   26
S35932     6                                   250
S38417     6                                   189
S38584     7                                   205
S38584_1   7                                   212

Table 3.3 Runtime comparison between vector less and SPICE

Design Name  CLK Power  Total Power  %CLK/Total
s4           2.13       3.35         63.6
s27          6.39       10.91        58.61
s208_1       17.05      30.43        56.04
s298         29.84      54.12        55.14
s344         31.97      61.11        52.32
s349         31.97      61.14        52.29
s382         47.04      91.73        51.28
s386         12.79      32.28        39.62
s400         47.04      94.51        49.77
s420_1       34.1       53.75        63.46
s444         44.76      84.83        52.77
s510         12.79      29.43        43.46
s526n        44.76      85.94        52.08
s526         44.76      85.89        52.11
s641         40.5       117.38       34.5
s713         40.5       123.07       32.91
s820         10.66      72.29        14.74
s832         10.66      72.5         14.7
s838_1       68.21      99.96        68.24
s953         61.81      102.37       60.38
s1494        12.79      158.7        8.06
s1488        12.79      158.24       8.08
s1423        157.73     356.1        44.29
s1238        38.37      150.51       25.49
s1196        38.37      151.17       25.38
s5378        381.55     751.75       50.75
s9234_1      449.75     891.59       50.44
s9234        485.99     632.35       76.85
s13207_1     1359.9     1908.3       71.26
s13207       1426       1718         83
s15850       1272.5     1971.3       64.55
s15850_1     1138.2     2630.3       43.27
s38417       3289.1     4659.3       70.59
s35932       3450.5     9654         35.74
s38584_1     2920.7     8339.6       35.02
s38584       2966.3     8057.2       36.82

Table 3.4 Clock Power vs. Total Power

Design    Power     Proposed  Prime              %newpower/     %powercompiler/  %newapproach/  %primepower/
Name      Compiler  Approach  Power   PowerMill  powercompiler  PowerMill        PowerMill      PowerMill
s111      5.5       2.23      0       2.87       -59.42         91.62            -22.24         -100
s4        3.72      3.35      2.93    2.79       -9.95          33.43            20.16          4.95
s5        2.49      1.34      0.47    1.72       -46.12         44.66            -22.05         -72.61
s27       12.69     10.91     10.03   9.36       -14.01         35.54            16.55          7.14
s208_1    44.91     30.43     22.4    29.03      -32.25         54.7             4.81           -22.84
s298      67.33     54.12     40.05   41.42      -19.62         62.57            30.67          -3.31
s344      85.24     61.11     56.55   65.7       -28.31         29.74            -6.99          -13.93
s349      86.48     61.14     56.66   65.86      -29.3          31.31            -7.16          -13.97
s382      83.57     91.73     52.75   53.15      9.76           57.25            72.6           -0.75
s386      75.15     32.28     42.78   48.46      -57.05         55.07            -33.4          -11.73
s400      83.96     94.51     52.77   53.3       12.58          57.51            77.32          -1
s420_1    70.19     53.75     45.6    44.12      -23.43         59.11            21.83          3.37
s444      83.79     84.83     52.9    53.64      1.24           56.22            58.15          -1.38
s510      64.68     29.43     18.23   47.43      -54.51         36.36            -37.96         -61.57
s526n     85.2      85.94     53.54   53.89      0.87           58.1             59.48          -0.65
s526      85.41     85.89     53.67   54.08      0.57           57.93            58.83          -0.75
s641      159.77    117.38    72.37   93.34      -26.53         71.17            25.76          -22.46
s713      162.62    123.07    74.51   96.57      -24.32         68.41            27.44          -22.84
s820      119.02    72.29     47.96   73         -39.27         63.04            -0.98          -34.3
s832      119.18    72.5      48.03   73.34      -39.17         62.51            -1.14          -34.51
s838_1    126.27    99.96     93.41   75.78      -20.84         66.63            31.91          23.27
s953      159.75    102.37    85.98   88.5       -35.92         80.51            15.67          -2.85
s1494     187.71    158.7     98.28   136.47     -15.45         37.54            16.29          -27.99
s1488     203.99    158.24    98.16   135.83     -22.42         50.18            16.5           -27.73
s1423     406.56    356.1     244.9   278.03     -12.41         46.23            28.08          -11.92
s1238     302.45    150.51    128.2   151.55     -50.24         99.57            -0.69          -15.41
s1196     296.7     151.17    126.5   151.13     -49.05         96.33            0.03           -16.3
s5378     1041.2    751.75    584.3   688.62     -27.8          51.2             9.17           -15.15
s9234_1   1480.6    891.59    704.7   812.36     -39.78         82.26            9.75           -13.25
s9234     1300.4    632.35    508.2   472.82     -51.37         175.03           33.74          7.48
s13207_1  2853      1908.3    1533    1677.46    -33.11         70.08            13.76          -8.61
s13207    2572      1718      1436    1418.89    -33.2          81.27            21.08          1.21
s15850    2640.3    1971.3    1400    1361.52    -25.34         93.92            44.79          2.83
s15850_1  3272.6    2630.3    1539    1945.25    -19.63         68.24            35.22          -20.88
s38417    7654.6    4659.3    4352    4688.74    -39.13         63.26            -0.63          -7.18
s35932    17606     9654      6789    8513.75    -45.17         106.79           13.39          -20.26
s38584_1  12031.7   8339.6    5630    6738.36    -30.69         78.56            23.76          -16.45
s38584    10951.4   8057.2    4261    6235.13    -26.43         75.64            29.22          -31.66

Table 3.5 Power Estimation across various tools

3.6 Power estimation applications

Once the power estimation has been done, the data can be used in a post-processing step to investigate various circuit properties. Note that some of them are applications of the average toggle calculation method described above.

3.6.1 Average power/ground bus currents

Consider the problem of computing the average current in the power or ground bus branches. This can be solved using toggle densities and average power consumption for each library cell.
We can approximate the average power for each cell based on toggle densities, and approximate the power or ground network as distributed or lumped R and C. By simulating this power network in SPICE, one can estimate the average power/ground bus currents. [31]

3.6.2 Average power dissipation

As a direct consequence of the power estimation described above, the analysis gives the overall average power dissipation by summing over all circuit nodes.

3.6.3 Electromigration failures

Electromigration [93][94] is a major reliability problem caused by the transport of atoms in a metal line due to electron flow. Under persistent current stress, this can deform the metal, leading to either short or open circuits. Electromigration failure depends on the average and root mean square (RMS) current densities in metal leads. The average current in each metal lead can be estimated by the method described in this chapter, and thus potential electromigration problems can be addressed in either the power network or the signal leads.

3.6.4 Power Routing

Inaccurate power estimation is normally the root cause of ‘over design’ of the power network. With an accurate power number, it is possible to use a dense power grid on one block and a light power grid on another, thereby also reducing the overall IR drop problem.
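For the electromigration check in Section 3.6.3, the two quantities of interest are the average and RMS values of a lead's current. The following is a minimal sketch, assuming a uniformly sampled current waveform and a rectangular lead cross-section; the sample values, dimensions and function names are hypothetical and no failure limits are implied.

```python
import math


def average_and_rms(samples):
    """Average and RMS of uniformly sampled current values (any consistent unit)."""
    n = len(samples)
    avg = sum(samples) / n
    rms = math.sqrt(sum(x * x for x in samples) / n)
    return avg, rms


def current_density(current_ma, width_um, thickness_um):
    """Current density in mA/um^2 for a rectangular metal lead cross-section."""
    return current_ma / (width_um * thickness_um)


# Hypothetical waveform: the lead carries 2 mA for a quarter of the period.
wave = [2.0, 0.0, 0.0, 0.0]
avg, rms = average_and_rms(wave)
print(avg, rms, current_density(avg, 1.0, 0.5))
```

For a spiky waveform like this one the RMS value is noticeably higher than the average, which is why both are tracked.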

3.6.5 Gate Oxide Integrity Analysis

Reduction in gate oxide thickness in submicron technologies has resulted in an increased electric field at the gate oxides. An excessive electric field (> 5 MV/cm) can damage the gate oxide and also reduce the Time Dependent Dielectric Breakdown (TDDB) strength. Excessive electric fields are caused by undershoot and overshoot at the gate terminal. A high duty cycle of overshoots/undershoots will result in permanent failure of the transistors. The Failure in Time (FIT) rate represents the probability of device failure in 10 years of operation. In this regard, the duty cycle of signal input pins is measured based on toggle density.

3.7 Summary

Based on our validation flow and analysis of results, a good power number can be estimated with minimum run time, as shown in Table 3.3. However, the toggle frequency calculation method has certain limitations: it is based on probabilistic algorithms, has no timing information and does not do any logic simulation. Some ‘power’ designers may be interested in good accuracy at the cost of run time. We have proposed a power estimation flow that caters to the needs of ‘power’ users as well as normal users.

4 Power Supply Noise Analysis

4.1 Overview

Figure 4.1 below gives a representative voltage waveform at an internal node in digital designs while they are operational. The fluctuations arise due to switching CMOS logic and inductances in the power supply, package and interconnect.

[Figure 4.1: voltage vs. time plot marking max voltage, min voltage and average IR drop; annotation: increases propagation delay.]

Figure 4.1 Voltage over time representation at an internal design node

The dips in voltage are due to sudden changes in current during logic switching, since inductance adds di/dt noise. In addition, CMOS currents while logic switches are higher than the average currents used for average IR drop analysis. This causes an additional i(t)*R drop, where R is the resistance of the Power Grid. The total drop seen at the current sink is:

deltaV = L(di/dt) + i(t)*R
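To make the two terms of the drop equation concrete, here is a minimal sketch that evaluates deltaV = L(di/dt) + i(t)*R on a sampled current waveform. The inductance, resistance and current values are hypothetical, and di/dt is approximated with a backward difference, which is an assumption of this sketch rather than the method used in the thesis.

```python
def supply_drop(times_ns, currents_ma, l_nh, r_ohm):
    """Return deltaV (in mV) at each sample: L*di/dt + i(t)*R.

    Units: nH * (mA/ns) = mV and mA * ohm = mV, so no scaling is needed.
    di/dt is a backward difference; the first sample uses di/dt = 0.
    """
    drops = []
    for k, (t, i) in enumerate(zip(times_ns, currents_ma)):
        if k == 0:
            didt = 0.0
        else:
            didt = (i - currents_ma[k - 1]) / (t - times_ns[k - 1])
        drops.append(l_nh * didt + i * r_ohm)
    return drops


# Hypothetical triangular current spike: 0 -> 10 mA -> 0 in 0.5 ns.
t = [0.0, 0.25, 0.5, 0.75]
i = [0.0, 10.0, 0.0, 0.0]
drops = supply_drop(t, i, l_nh=0.5, r_ohm=2.0)
print(drops)
```

On the rising edge both terms add; on the falling edge the inductive term changes sign, which corresponds to the overshoot above the nominal supply seen in the waveform.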

The most popular technique to control this IR drop is to insert decoupling capacitors in the design. Figure 4.2 shows an electrical representation of the inductance and the dynamic switching of a cell that cause power supply noise, and of the decoupling capacitors that help meet this instantaneous current need.

[Figure 4.2: schematic showing the Vdd and Vss pins and nets, supply-side elements (Lpd, Rpd, Cpd, Lps, Rps, Cps), network elements (Rnd, Cnd, Rns, Cns), currents Idd and Iss, the cell and Cdecap.]

Figure 4.2 Schematic circuit for instantaneous voltage drop analysis

This work focuses on computing the instantaneous IR drop (deltaV), or the actual voltage (Vdd-deltaV), at a cell's Power/Ground ports. Vdd is an ideal voltage source here and is constant over time. Here too, our approach focuses on cell-based designs. The next section explains the cell characterization and modeling needed for block-level analysis. Using this characterization, we build a power grid network that can be simulated; this is discussed in Section 4.3. Section 4.4 explains the prototype flow we developed, and the chapter ends with validation results and conclusion.

4.2 Cell Characterization

Definition: Cell characterization is a process through which data is prepared for every cell for usage in the design. The process involves SPICE characterization as well as post-processing of data. Characterization and its usage need to be in complete alignment.

4.2.1 Current Characterization Methodology

For instantaneous Power Grid analysis, we analyzed cell peak current waveforms. Figure 4.3 shows the transient waveforms of an inverter cell simulated at 250MHz (VDD is the power pin and VSS is the ground pin). It has the voltage waveforms of the primary input and primary output (VA, VY) of the inverter, the current waveforms in the VDD and VSS ports (IRVDD, IRVSS), and the voltage waveforms at the VDD and VSS ports (VVDD_INV1, VVSS_INV1).

Note that the current waveforms at VDD and VSS are similar except for one difference: transition direction. The current waveform at VDD when the output is charging is the same as the current waveform at VSS when the output is discharging, and vice versa. This is true here for the inverter, but it can vary if the cell is not balanced properly. In any case, the amount of charge supplied/discharged will be constant, since it is governed by the load connected at the output.
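The constant-charge observation can be illustrated with a small calculation. This sketch assumes the current spike is approximated by a triangle whose area equals the transferred charge Q = C*V; the load, supply and pulse widths are hypothetical values, and the triangular shape is a simplification rather than a characterized waveform.

```python
def transferred_charge_fc(c_load_ff, vdd_v):
    """Charge moved per output transition, Q = C * V (fF * V = fC)."""
    return c_load_ff * vdd_v


def triangular_peak_ua(charge_fc, width_ps):
    """Peak of a triangular current pulse with area Q: I_peak = 2*Q/width.

    fC / ps = mA, so the result is multiplied by 1000 to give uA.
    """
    return 2.0 * charge_fc / width_ps * 1000.0


q = transferred_charge_fc(10.0, 1.2)   # hypothetical 10 fF load at 1.2 V
wide = triangular_peak_ua(q, 200.0)    # slower transition, 200 ps wide spike
narrow = triangular_peak_ua(q, 100.0)  # faster transition, 100 ps wide spike
print(q, wide, narrow)
```

With the charge fixed by the load, halving the width of the spike doubles its peak, so a faster transition yields a narrower but taller current pulse.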

[Figure 4.3: inverter transient waveforms. Annotations: (1) Output is rising: there is notable symmetry for rise/fall, which helps us characterize only one current and do the analysis at the Power/Ground network. (2) Output is rising: this alignment is preserved for better results during current waveform generation; the same is true for output falling.]

Figure 4.3 Inverter waveforms measured at different nodes

In this work, we have maintained the temporal relationship between Power and Ground current waveforms and decoupled the simulations, i.e. they are simulated separately and the IR drop results are merged.

We performed simulations and arrived at the following conclusions.
• The shape of the current waveform remains the same across different frequencies if the same patterns are used. Note that the overall simulation time decreases when frequency increases for the same set of patterns. This is not a surprise, as the load being charged and discharged is the same during each transition for the same slew and the same set of patterns. For a CMOS gate, the shape of the current waveform remains the same up to very high frequencies (period ~= 3 times the 0-100% slew). (Appendix C)
• The slew or transition time (used interchangeably) plays a big role in determining the peak power of cells. When the slew decreases, the width of the current spike decreases and its peak increases. Figure 4.4 and Figure 4.5 show the peak power variation for different input transition times. Note the variation of ~2x for the inverter and ~1.5x for the 2-input NAND gate.

Figure 4.4 Transition time vs. peak power for Inverter

Figure 4.5 Transition time vs. peak power for NAND gate

• Peak power varies with output load. The change is as expected, since the increased capacitance together with the MOS resistance gives an exponential voltage ramp-up. The peak depends largely on the MOS ON resistance as well as the initial voltage. Figure 4.6 and Figure 4.7 show the variation for an AND gate and an OR gate. Note that the variation is ~1-3% across a wide range of load.

Figure 4.6 Load vs. peak power for AND gate

Figure 4.7 Load vs. peak power for OR gate

• For cell characterization, pattern dependency is not critical. This is expected, as most of the circuits have 1-2 levels of logic where each pattern activates/deactivates most of the transistors. However, as cells become larger, some logic may not get activated during switching. In that case, it is important to choose useful patterns for cell current characterization.
• For cell characterization, transition direction matters for a given power supply: output rise and fall transitions must both be captured during characterization and applied appropriately during use (Figure 4.3). In our case, we capture rise and fall transitions together and use them for analysis, making the proposed approach direction independent.

Figure 4.8 State Dependency on cell switching

We also established a few corollaries that will be used later in the discussion.
1. Slew impacts the short-circuit current of the device. For a multi-stage block, slew impacts the 1st stage the most, and the overall current waveform is unaffected by this change. The impact goes from low to high as the number of design stages decreases.
2. Glitches or hazardous transitions can contribute to the peak current need of the circuit. Modeling glitches in non-SPICE analysis is not trivial. It is desirable that glitches be reduced by robust design practices. In this work, it is assumed that there are no glitches in the design.

3. The temporal correlation between different inputs strongly influences the characterization data, due to simultaneous switching. We have used the least affecting combination, i.e. 0 skew between multiple inputs, in our analysis; this is also the worst case. (Figure 4.8)

4.2.2 Current Characterization Flow

Current source generation involves determining the time-variant current waveform for each cell. This is the current waveform seen at the VDD pin of the cell when the cell output is rising or falling. The flow is shown in Figure 4.9. A sample SPICE deck is shown in Appendix D. The PERL program that takes input from the SPICE simulation has the following options. In our case, we took the last option with 75ps as the sampling interval.
1. full – The whole current data available in the punch file is given as output in two-column format, the first column giving the simulation time and the second column giving the current value at each simulation time instance.
2. fixed – The total simulation time is divided into 8192 points and the current value at these 8192 time-values is obtained either directly, if available, or by interpolation.
3. Interval filtered – An interval in picoseconds is specified and the program derives from it the time-values for which data is expected. Again, the current data corresponding to these time-values is obtained directly, if available, or by interpolation.
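The "interval filtered" option can be sketched as follows. This is not the PERL program used in the thesis; it is a minimal Python illustration that assumes linear interpolation between neighbouring simulation points, and the function name and sample data are hypothetical.

```python
from bisect import bisect_left


def sample_at_interval(times_ps, currents, interval_ps):
    """Resample (time, current) points on a uniform grid.

    A grid time that coincides with a simulation point uses that value
    directly; otherwise the value is linearly interpolated between its two
    neighbours. times_ps must be strictly increasing.
    """
    out = []
    k = 0
    t = times_ps[0]
    while t <= times_ps[-1]:
        idx = bisect_left(times_ps, t)
        if times_ps[idx] == t:
            value = currents[idx]
        else:
            t0, t1 = times_ps[idx - 1], times_ps[idx]
            i0, i1 = currents[idx - 1], currents[idx]
            value = i0 + (i1 - i0) * (t - t0) / (t1 - t0)
        out.append((t, value))
        k += 1
        t = times_ps[0] + k * interval_ps
    return out


# Hypothetical punch-file data with non-uniform simulator time steps.
times = [0, 50, 100, 150, 300]
current = [0.0, 4.0, 8.0, 2.0, 0.0]
resampled = sample_at_interval(times, current, 75)
print(resampled)
```

The grid points at 0, 150 and 300 ps are taken directly from the data, while those at 75 and 225 ps are interpolated.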

[Figure 4.9: flow diagram. Cell SPICE deck -> SPICE simulation @ 10 MHz -> Perl processing to sample VDD currents.]

Figure 4.9 Cell Characterization Flow

Using the above methodology, we characterized all the cells instantiated in the ISCAS89 circuits.

4.3 Power Grid network modeling

This section describes building the Power Grid network using the cell characterization data. The Power Grid presents resistance, capacitance and inductance to the switching logic. Figure 4.10 shows the schematic of a typical power grid. [45] The power and ground supply pins are modeled as ideal voltage sources. Methodologies vary widely, however, in current source modeling and capacitance estimation [50][51][52][53]. This work also focuses on current source modeling, which is described in the next subsection.

[Figure 4.10: power grid mesh schematic; annotation: each such arm represents resistance…]

Figure 4.10 Power Grid Modeling

Once the power grid is determined along with the capacitance and current source distribution, it can be realized as a matrix data structure and solved to compute the voltages at desired nodes, specifically the nodes where cell components are connected, as below.

V * Y = I

where V is the voltage value at each node, Y is the admittance (the reciprocal of the resistance) of each PG segment, and I is the current that we have characterized.

OR
v(t) = Z * i(t)    (Z = R – jW for power network)
V(w) = z(w) * i(w)
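The matrix formulation can be illustrated on a deliberately small case. The sketch below assumes a purely resistive one-dimensional rail fed from a single ideal pad (not the full mesh of Figure 4.10) and solves Y * drop = I for the voltage drop at each node with a plain Gaussian elimination; the segment resistance, sink currents and function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rail_drops(n_nodes, r_seg_ohm, sink_ma):
    """Voltage drop (mV) at each node of a rail fed from one ideal pad.

    Builds the nodal admittance matrix Y for the chain pad-n0-n1-... and
    solves Y * drop = I, where drop = Vdd - v(node) and I holds the sink
    currents in mA (mA * ohm = mV).
    """
    g = 1.0 / r_seg_ohm
    y = [[0.0] * n_nodes for _ in range(n_nodes)]
    for k in range(n_nodes):
        y[k][k] += g              # segment towards the pad side
        if k + 1 < n_nodes:       # segment towards the next node
            y[k][k] += g
            y[k][k + 1] -= g
            y[k + 1][k] -= g
    return solve_linear(y, sink_ma)


drops = rail_drops(3, 1.0, [1.0, 1.0, 1.0])
print(drops)
```

The node farthest from the pad sees the largest drop, because every upstream segment carries the current of all downstream cells. A transient analysis with capacitance and inductance needs a time-stepping solver, which is why the thesis relies on SPICE instead.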

In our work, we computed resistances and capacitances based on technology data for the 130nm node. A sample program was written to realize the mesh structure shown in Figure 4.10 for the VDD network, and VSS was taken as ideal ground. This is not an issue, since all the VSS network elements can be lumped into the VDD network. After determining the Power Grid current waveforms, we solved the network through SPICE simulations.

4.3.1 Power Grid Current Waveform Modeling

Power Grid current waveform modeling involves the following steps:
1. Compute the toggle frequency for each instance in the design as proposed in Chapter 2.
2. Using the characterized current data for the cell, transform the current data to the toggle frequency computed above.
3. Compute the input arrival time for each instance in the design. This is done using Static Timing Analysis. Compute the shift required in the current waveform with reference to the clock edge. For simplicity, we have assumed 0 skew for the clock network.
4. Hook up the current sources and solve the PG network.
5. Determine the PG model simulation time.

These are explained further below.

1 Read the characterized data.

Characterized data was transformed from the time domain to the frequency domain. The sampling is done at a fixed frequency (much higher than common design frequency values), 1000/75 ~ 13.33 GHz, and [t, i(t)] pairs are stored.

I(t) = i(0)d(0) + i(0+Ts)d(0+Ts) + i(0+2*Ts)d(0+2*Ts) + … (N samples)

where
‘Ts’ is the sampling interval, 75 ps in this case (a sampling frequency of 13.33 GHz)
i(t) is the current value at time ‘t’
d(t) = 1 when t = n*Ts, else 0, with n indexing the N samples

For computational efficiency, N may be chosen as a power of 2, i.e. N = 2 ** n (n is an integer). The Fourier transform of the samples is then performed:

I[k] = \sum_{n=0}^{N-1} i[n] \, e^{-j 2 \pi k n / N}, \quad k = 0, \dots, N-1

2 Model the current waveform for each Boolean gate at the computed toggle frequency.
• A compression factor (M) is defined to meet the targeted frequency of the cell under consideration.
M = targeted frequency / cell characterized frequency (10MHz in this work)
• Transformation allows preserving the base of the current transients. This would not have been possible in the time domain while scaling frequency; hence the need for the frequency domain transformation. Appendix E shows the waveform generated after transformation from the 1 MHz waveform. As can be seen, the 1GHz waveform is not as expected. This is not an issue since, apart from clock cells, other cells are not expected to switch at a 1 GHz average toggle frequency. Besides, this can be handled by characterizing clock cells at a higher frequency.
• Current data is compressed by the compression factor.
• When the data was transformed to the frequency domain and the frequency spectrum examined, the notable point was a good chunk of lower frequency components, signifying the approximate triangles of the SPICE waveform, while most of the medium to high frequency components were zero, signifying the zero or low-leakage portion of the power waveform.

3 Attach the current waveform at the PG node where this cell’s power or ground pin is connected.

4 Compute the total simulation time
• If all instances in the design are applied with their respective waveforms, the matrix solver gives the peak voltage drop value from 0 to LCM (period of all gates).
• Computing the least common multiple (LCM) is computationally intensive for most designs. Even if we do that, the resulting simulation time is prohibitively high, and the memory requirement also becomes high.
• In practice we use a smaller number than that to ensure less simulation time and more realistic data. We computed the simulation time as below.
Tstop = f(minimum toggle frequency, max delay)
      = Time period of minimum frequency cell + maximum delay of all cell outputs
      = 2000 ns (for minimum frequency as 1 MHz and 1000 ns as worst delay)

5 Establishing temporal relationship

Timing analysis is performed and, based on input arrival times, the current waveforms are shifted along the time axis. The purpose of timing analysis is to establish temporal correlation between the various nodes of the design, i.e. even though 2 or more nodes have the same toggle frequency, all instances in the design will not switch simultaneously unless needed. In this work, we have chosen to work with toggle frequency and delay instead of timing windows [28][45]. The reasons are:
• Not all circuit nodes switch in all clock cycles. Average activity computation establishes the relative amount of switching among the various nodes. This is possible because activity estimation techniques consider circuit functionality. The average switching activity of most nodes is believed to be 20% of the controlling clock frequency. In certain solutions, the average switching activity for non-clock signals is assumed to be only 10%.
• The timing window method uses classical path sensitization to identify the interval of switching. Because of the inherent STA assumption that all activity on a path should finish within 1 clock period (unless specified explicitly using a multi-cycle path), the timing intervals for all nodes lie within a clock period. This makes the whole approach of pseudo-dynamic simulation pessimistic. (see results)
• During timing analysis, we collected 2 sets of data: first, the sensitization edge of the node, i.e. whether the node is rising or falling at that time; and second, the delay of the node from the reference node.

Definition: Reference nodes are those nodes that can be considered as 0 delay nodes. All flip-flop outputs are considered reference nodes in our analysis. When the input clock to the flip-flop has some propagation delay associated with it, the reference node will have that delay associated with it.

It can be seen that any frequency higher than 1 MHz will have at least some repetition in its current signature, i.e. a node switching at 50 MHz (20 ns) will have 50 repetitions of its current signature in a 1000 ns simulation.

By changing the minimum frequency, we can change the simulation time considerably. For example, by changing the minimum frequency to 50 MHz, we can ensure that all current sources below 50 MHz do not contribute (or contribute only an average current) to dynamic V-drop analysis, and in that case the maximum simulation time can become only 20 ns. In all our analysis we have assumed 1 MHz as the minimum frequency.

The number of points in the piecewise linear current waveform is based on the sampling resolution chosen in the first step after reading the characterized data. Increasing or decreasing this sampling frequency trades accuracy against runtime. In our analysis, we have assumed 75 ps as the sampling interval.

The clock network toggles all the time. Also, many designs aim for small insertion delays as well as near-zero skew. This makes the clock network one of the largest contributors to total current as well as peak current.

4.4 Complete Flow

Cell characterization and PG network modeling are explained in Figure 4.11. We take a Verilog netlist as input and calculate the average toggle frequency of each circuit node using a simulation-less approach. The frequency constraints are user conditions that drive the frequency calculation of any node. Alternatively, frequency constraints can be generated from logic simulation or functional patterns. The SDC contains the timing constraints of the design; it is used in toggle activity calculation as well as timing analysis. Timing information consists of the max delay for paths converging to any node and the sensitization edge across that path. Current signatures for each of the blocks (library macros as well as hierarchical blocks) are generated from current models, timing information and activity estimation. The document explains all the processing steps (toggle calculation, timing measurement, current signature generation and block modeling) in detail. Once the current signatures are hooked to the parasitic PG network, a transient simulation is performed to measure the V-drop at each macro node, and a dynamic transient current waveform is generated for the power-ground pins. The V-drop data is fed to the timing analysis engine to analyze the impact of V-drop on timing.

[Figure 4.11: flow diagram. Inputs: Netlist, Frequency Constraints, SDC. Blocks: Toggle Frequency Calculator, Timing Analysis, Current Char, PWL Generator, RLC netlist with current sources, SPICE Simulation. Output: Peak Dynamic Power/Supply Noise.]

Figure 4.11 Peak IR drop Computation Flow
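The simulation-time arithmetic from Section 4.3.1 can be checked with a short sketch. It simply encodes Tstop as the period of the slowest-toggling cell plus the worst output delay, and counts how often a faster node repeats within a window; the function names are illustrative and not taken from the thesis tools.

```python
def period_ns(freq_mhz):
    """Period in ns of a signal toggling at freq_mhz."""
    return 1000.0 / freq_mhz


def stop_time_ns(min_toggle_mhz, max_delay_ns):
    """Tstop = period of the slowest-toggling cell + worst output delay."""
    return period_ns(min_toggle_mhz) + max_delay_ns


def repetitions(freq_mhz, window_ns):
    """Whole repetitions of a node's current signature inside window_ns."""
    return int(window_ns // period_ns(freq_mhz))


print(stop_time_ns(1.0, 1000.0))   # 1 MHz minimum frequency, 1000 ns worst delay
print(repetitions(50.0, 1000.0))   # a 50 MHz node in a 1000 ns window
```

With the values quoted in the text this gives a 2000 ns stop time and 50 repetitions of a 50 MHz signature in 1000 ns. Raising the minimum frequency to 50 MHz shrinks the period term to 20 ns.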

The next sections explain the Power Grid Generator, Timing Information Generation and SPICE simulation details.

4.4.1 Timing Information Generation

Timing information was generated using Prime Time. Prime Time requires Verilog netlist, SDC and SPEF (Standard Parasitic Exchange Format) files as input. We also wrote a Tcl script (Prime Time supports the Tcl command language) to get arrival time information for all nodes of the circuit. The Prime Time flow is shown in Figure 4.12 below. The sample SDC file [24][25] and SPEF used are shown in Appendix A and B.

[Figure 4.12: flow diagram. Inputs: SDC file, Verilog netlist, SPEF -> Prime Time arrival time computation -> timing report.]

Figure 4.12 Prime Time flow for arrival time computation

4.4.2 Power Grid Generator

The Power Grid Generator flow is expanded further below in Figure 4.13.

[Figure 4.13: flow diagram. Cell flow: cell characterization at a fixed frequency (10MHz in our work). Analysis flow: Toggle Frequency Calculator and timing report (delay information) -> Perl code (processes various inputs) -> MATLAB program (compression factor M computed; M-based compression in frequency domain) -> Perl code (PG mesh generation, current PWL hookup) -> PG network.]

Figure 4.13 Power Grid Generation Flow

The PERL program combines, for every node, the toggle frequency values obtained using TFC with the delay values of the corresponding nodes. The output file containing this information for all the cells is given to MATLAB.

MATLAB program: it is given two inputs. One is the current data at the prototype frequencies for all the gates. The other is a file containing delay and average activity information for all the cells of the circuit. Depending upon the activity, the prototype current data is compressed, and this data is then shifted by an amount equal to the delay at that node. The same procedure is repeated for all the cells. The resulting current data for all the cells is stored in a file. The second input is a file, which contains the following information about the VLSI circuit for which we have to obtain the power data.
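The bookkeeping around the MATLAB step can be sketched as below. This is not the MATLAB program itself and it does not reproduce the frequency-domain compression; it only shows the compression factor, the delay shift and the formatting of the result as a SPICE PWL argument list. The sketch assumes the prototype waveform starts at zero current, with times in ps and currents in mA; all names and values are hypothetical.

```python
def compression_factor(target_mhz, characterized_mhz=10.0):
    """M = targeted frequency / cell characterized frequency."""
    return target_mhz / characterized_mhz


def shift_waveform(points, delay_ps):
    """Delay a list of (time_ps, current_ma) points along the time axis.

    Assumes the first point carries zero current; a zero point is kept at
    t = 0 so the source stays defined from the start of the simulation.
    """
    shifted = [(t + delay_ps, i) for t, i in points]
    if delay_ps > 0:
        shifted.insert(0, (0.0, 0.0))
    return shifted


def to_pwl(points):
    """Format points as a SPICE PWL argument list (times in ps, currents in mA)."""
    body = " ".join(f"{t:g}p {i:g}m" for t, i in points)
    return f"PWL({body})"


prototype = [(0, 0.0), (75, 1.5), (150, 0.0)]   # hypothetical current spike
shifted = shift_waveform(prototype, 300)        # node arrival delay of 300 ps
print(compression_factor(100.0))
print(to_pwl(shifted))
```

The resulting string can be attached to a current source in the PG netlist, one per cell instance.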

Based on the generated current signatures, a new PG network is created. After this, all the macro instances are replaced with the corresponding current signatures. In our analysis, we took a PG network with a uniform Power Grid and ideal GND. We did not do any actual power routing but attached the current sources randomly. This is compared with the actual SPICE circuits for all macros in the same PG network at the same locations.

4.4.3 SPICE Simulation

Now each cell is replaced by a current source driven by its corresponding PWL data. Package R, L & C are attached to the top-level power pins, and SPICE simulation is performed. The voltage at each node of the power mesh is punched. The IR drop for each cell is calculated using a CODAC (Characterization & Optimization of Digital & Analog Circuits) program (a TI internal program), which takes the difference between the power supply voltage and the minimum voltage obtained at each node to give the Peak Dynamic IR Drop at that node. This is done for all the nodes of the circuit. The same CODAC program can be used to calculate the Average Dynamic IR Drop at each node of the circuit.

4.5 Validation and Results

In this work, we have made the following simplifications:
• Modeled the power grid by creating an n x m mesh. The resistance of each arm in the mesh was derived from an Ohm/um number. We also assumed 2 such arms in parallel to account for the multi-layer chip scenario.
• A matrix solver was not developed as part of this work. Instead, we used available SPICE simulators.
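The post-processing described in Section 4.4.3 reduces to a small calculation per node. The sketch below is not the CODAC program; it is a hypothetical stand-in that assumes the peak drop is the supply voltage minus the minimum node voltage and that the average drop uses the mean of the sampled voltages.

```python
def ir_drop_percent(node_voltages, vdd):
    """Peak and average dynamic IR drop at one node, as a percentage of Vdd.

    Peak drop uses the minimum voltage seen at the node; average drop uses
    the mean of the sampled voltages.
    """
    peak = vdd - min(node_voltages)
    average = vdd - sum(node_voltages) / len(node_voltages)
    return 100.0 * peak / vdd, 100.0 * average / vdd


# Hypothetical voltages punched at one mesh node, 1.2 V nominal supply.
peak_pct, avg_pct = ir_drop_percent([1.2, 1.14, 1.17, 1.2], vdd=1.2)
print(peak_pct, avg_pct)
```

For these sample values the peak drop is about 5% of Vdd while the average drop is under 2%, which illustrates why peak and average analyses are reported separately.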

We executed the flow as explained in the previous section. Instead of 1MHz, we used 10MHz for characterization, to reduce the amount of data. We still sampled the cell data at 13.33GHz.

4.5.1 Peak Power Results

Three small circuits were studied to stabilize the above approach:
• TWOAND – the circuit consists of two AND gates, one after the other.
• ANDOR – the circuit consists of one AND gate followed by one OR gate.
• 2AND-1OR – this circuit has two AND gates at the first level. The outputs of these AND gates are given to an OR gate whose output is the final output.

Peak power data was obtained for the three small circuits using the approach described in this report and using SPICE simulation. The data obtained using the average switching activity approach and SPICE for 100 MHz and 500 MHz input frequency is given below in Table 4.1.

Peak power (Watts)
           TWOAND                    AND-OR                    2AND-1OR
Frequency  SPICE       Our approach  SPICE       Our approach  SPICE       Our approach
100 MHz    0.0016817   0.0016        0.0009409   0.0008421     0.0019253   0.0019
500 MHz    0.00168113  0.0016        0.0009410   0.00086539    0.00192531  0.0018

Table 4.1 Comparison of Peak power Dissipation

4.5.2 Peak Dynamic IR Drop Results

For determining peak Dynamic IR drop, initially three circuits were used.
• 100 Inverter Chain – a chain of 100 inverters, with the output of each inverter acting as the input of the next. The delay of the chain is longer than the clock period of operation.
• 32 Bit Shift Register – a 32-bit series/parallel shift register. Depending upon the input and selection criteria, the input is shifted in a serial or parallel manner.
• 16 Bit Adder – a 16-bit binary adder. ‘Carry Forward’ logic is used for addition.

The following points were taken into account while generating the netlists for these circuits.
• Package RLC is added to each power pad.
• An ideal voltage source is attached to each power pad.
• A uniform mesh structure is used and all leaf cells are placed randomly onto it.
• A reduced interconnect network was used, based on driving point admittance estimation for power as well as signal lines.
• No existing decoupling capacitors were estimated.

The peak Dynamic IR drop data was obtained using the Average Activity approach, the Timing Window approach and SPICE simulation. The data obtained is shown in Table 4.2.

Circuit                  %Drop in Average Activity   %Drop in Timing Window Approach   SPICE %Drop
100 Inverter Chain       1.65                        6                                 1
32 Bit Shift Register    17.5                        40                                12
16 Bit Adder             31                          NA                                19.16

Table 4.2 Comparison of percentage peak instantaneous IR drop

Table 4.2 shows that the Average Activity method is closer to SPICE than the Timing Window method on the circuits where both were run. To check the performance of this approach, the Average Activity method was applied to a few industry standard circuits. Table 4.3 compares the maximum dynamic IR drop in each circuit obtained using average switching activity with that obtained from Power Mill. Power Mill is a SPICE-based transient analysis tool offered by Synopsys. It is now called NanoSim.

Circuit   %V Drop using Avg Activity   %V Drop in Power Mill   %Error
s27       4.5                          5.8                     -22.4138
s344      6.3                          6.6                     -4.54545
s349      6.2                          7.5                     -17.3333
s444      8.6                          13.3                    -35.3383
s1238     13.4                         13.3                    0.75188
s298      12.5                         15                      -16.6667

Table 4.3 Comparison of percentage peak IR drop on ISCAS89 circuits
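The %Error column of Table 4.3 is consistent with a signed relative error taken against the Power Mill value. The short Python check below recomputes it from the two voltage drop columns. The variable and function names are illustrative only.

```python
# (avg-activity %V drop, Power Mill %V drop) as listed in Table 4.3
table_4_3 = {
    "s27": (4.5, 5.8),
    "s344": (6.3, 6.6),
    "s349": (6.2, 7.5),
    "s444": (8.6, 13.3),
    "s1238": (13.4, 13.3),
    "s298": (12.5, 15.0),
}


def percent_error(estimate, reference):
    """Signed error of the estimate relative to the reference, in percent."""
    return (estimate - reference) / reference * 100.0


errors = {name: percent_error(avg, pm) for name, (avg, pm) in table_4_3.items()}

for name, err in errors.items():
    print(f"{name:6s} {err:9.4f}")
```

A negative value means that the average activity approach reports a smaller drop than Power Mill for that circuit.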

Power Supply Noise (PSN) waveforms for the average activity approach and for the SPICE simulation with actual logic are shown in Figure 4.14 and Figure 4.15, respectively.

Figure 4.14 PSN waveform of Proposed Method

Figure 4.15 PSN Reference Waveform

4.6 Summary

We proposed a novel PG network modeling technique. The approach involves average switching activity calculation, transient current characterization of the basic Boolean gates of the library, derivation of a PG network model, and transient simulation of the PG model using a vectorless approach. The desired results are derived from this simulation. Further, our global average switching activity calculation lets us consider the global timing impact of global voltage drop without extra runtime, which reduces the need for local maximum voltage drop analysis on timing [26]. Our approach also yields detailed voltage drop data across the chip or block, and this profile can be used to place decoupling capacitance at the locations that need it. The results were validated against a dynamic fast SPICE simulator (NanoSim) and show that the average switching rate calculation gives results close to dynamic vector-based analysis. An additional advantage is that average switching activity also gives an accurate analysis of the average voltage drop. Hence the approach gives both average and dynamic PG noise results simultaneously.

The approach is scalable to multimillion gate designs by using the technique proposed by Blaauw et al. [55]. This work could be extended to study decap sensitivity, and to skew the analysis toward a specific end target, e.g. PG grid robustness or Monte Carlo based analysis for higher accuracy and coverage.

5 Power Up Analysis

One of the popular techniques to reduce leakage is to use a gated power supply [74, 79, 80]. Ye et al. [74] have highlighted a technique called ‘sleep transistor’ and the challenges associated with it. The technique gates the power supply with a high threshold transistor when the supply is not required, as shown in Figure 5.1. The ‘sleep transistor’, also known as a ‘power switch’, turns off the power supply when a portion of the chip is idle and thus saves leakage current. Apart from design challenges, the technique has additional design analysis challenges, as listed below.

Figure 5.1 Gated Power Supply ([74])

1. When the power supply turns on from the off state, a large capacitive load is charged. This causes a large surge in current and hence Power Supply Noise (PSN). The noise can couple with signal lines, causing a state change or a delay change. It can also remain within the supply network but cause a large dynamic IR drop that in turn affects circuit performance. The goal is to predict the surge and control it.
2. The transistor acts as a resistor in series with the supply in the normal mode of operation, causing additional IR drop. This in turn degrades performance. The IR drop across the transistor can be as high as 5-20 mV. The goal is to do an average IR drop analysis to assess the impact of the switch.
3. Optimization of switches to get the best leakage improvement. The optimization has area penalty, IR drop or Power Supply Noise as cost parameters. For example, a low number of switches gives good leakage improvement but high IR drop and Power Supply Noise.
4. When the power supply goes down, all sequential logic in the virtual power domain loses its state. This puts an extra constraint on overall system behavior. There is also a technique where the state is preserved through ‘retention flops’ [2, 81]. That technique needs extra power routing to save state, as well as control logic. The timing analysis needs to capture the mode switching.
5. Placement and routing of extra signals, special cells (such as retention flops) and the virtual power network.
6. The trade-off between leakage and the number of power switches.
7. Power routing closes immediately after floor planning, and the switches need to be placed by this time. It is important to have an early power up analysis flow to compute the required number of switches that meets the peak current surge as well as the IR drop and leakage needs.

Often, PSN is a non-negotiable parameter and the design-planning goal is to identify the total number of switches that limits PSN to a user-defined level. This chapter describes an analytical method to determine the optimum number of power switches and the power up glitch. Section 5.1 elaborates on the switched PG network and the PSN problem. Section 5.2 outlines the approach to analyze such networks. Section 5.3 correlates the results we have achieved with SPICE and discusses the efficiency of the algorithm.

5.1 Switched PG Networks

Power Supply Noise is a widely acknowledged research domain in today’s high performance designs, and various analysis techniques have been proposed in the literature [26-31]. However, there is not much awareness of the Power Supply Noise caused by turning on power domains when a gated power supply is used. Figure 5.2 shows the switch network for a 1M-gate design and Figure 5.3 shows a current glitch and voltage ramp at an arbitrary switch output. Note that the current surge can persist for a considerable amount of time, causing a performance impact on the blocks that are ‘on’.

Figure 5.2 Layout of 1M gate with switch network (label: Power Switch)

Figure 5.3 Current Glitch and Voltage Ramp at arbitrary switch output

A typical PG network with power switches can be represented as shown in Figure 5.4. Some of the characteristics of this network are [87]:
• There are two kinds of domain: a golden domain on the non-gated power supply, and multiple virtual domains on the switched power supply.
• The virtual domains are not connected to one another. They are connected to the golden domain through the switch network.
• The switch network of a given domain consists of one or more kinds of switches.
• Switch networks are not shared across virtual power domains.
• Random logic is connected to the golden domain as well as to all virtual domains.
• Control logic can turn any one or more virtual domains on or off at any time.
• Any switch network consists of a parallel network, a sequential network, or a combination of both. A parallel configuration turns all switches on simultaneously, whereas a sequential configuration turns the switches on one by one, each after some delay.

Figure 5.4 Typical PG network with Power Switches (block diagram: off-chip power supply, non-gated VDD power network, switch control logic, switches, virtual power network and logic networks; zoomed panels show N switches in parallel configuration and N switches in sequential configuration with delay elements D1)

When the power supply is ‘off’ and the virtual network is disconnected, the current that passes through is leakage current. If the leakage current of the virtual logic is significantly higher than the leakage of the switch network, the overall leakage current improves. When the switches are turned on, i.e. when the power supply connects to the virtual power network, the loads in the virtual power network start charging. The loads include interconnect capacitances, gate capacitances and the circuit diffusion/diode capacitances. The amount of current sunk by these capacitances depends on the ability of the switch network to provide charge in a given time. Because the virtual power domain demands current quickly, L*di/dt noise is injected into the circuit and can affect the normal functioning of the golden power domain. Note that although the capacitive load dominates, the peak current is still limited by the saturation current of the switches, which produces the current profile in Figure 5.3.

5.2 Switch Network Analysis

Switch Network Analysis (SNA) early in design planning covers the choice of switch network topology, identification of the switches to be used, the total system timing for turning power domains on and off, and the total power supply noise contributed by a switch network. A sequential configuration allows the delay to be set such that the peak current at any point in time meets the system noise specification. There is therefore a trade-off between the total time the system needs to turn the virtual network on or off and the noise criteria. This information should go to the placement and routing tools for physical design. Further, the noise contribution of a switch network comes from the maximum current surge it causes, and the parameters to optimize are the total number of switches of each type in the network and the delay.

The following assumptions are made to keep the analysis simple. The solution can be extended to handle the cases they exclude.
• The delay between any two consecutive switches is the same.
• Two types of switches exist in the network.
• The voltage at every node in the virtual power network is the same at any time instant during power ON, if there is zero static IR drop.
• The switch network is sequential. A parallel configuration essentially means one BIG switch, with the characteristics of all the transistors forming it lumped into a single MOS.

The high-level flow for the analysis is shown in the block diagram of Figure 5.5.

Figure 5.5 Schematic Switch network Analysis Flow (blocks: Switch IV Characterization; Current prediction that charges capacitive load; Determination of required parameters)

5.2.1 Switch Characterization

Switch IV characterization records the current sourced through the switch for different voltages between the golden and virtual power ports of the switch. This is achieved using transient SPICE simulation of the switch. The data is stored in value-pair (voltage-current) format for further processing.

Switch characterization also involves measurement of the switch ON resistance. This is the resistance that the switches offer during normal functionality, i.e. when the switches are turned ON and the virtual power network is connected to the golden power network. It is measured by putting a 10 mV battery across the switch and measuring the current. This resistance value is later used for average IR drop analysis across the switch.
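The two characterization outputs described above can be handled with a few lines of Python. This is a minimal sketch under stated assumptions. The table values are made up, the function names are hypothetical, and linear interpolation between stored voltage-current pairs is an assumption, since the text only states that the data is stored as value pairs.

```python
from bisect import bisect_left


def r_on(measured_current_a, v_across=0.010):
    """ON resistance from the current measured with 10 mV across the switch."""
    return v_across / measured_current_a


def iv_lookup(iv_table, v):
    """Current at voltage v from sorted (voltage, current) pairs.

    Interpolates linearly between neighbouring pairs and clamps outside the table.
    """
    volts = [point[0] for point in iv_table]
    if v <= volts[0]:
        return iv_table[0][1]
    if v >= volts[-1]:
        return iv_table[-1][1]
    k = bisect_left(volts, v)
    (v0, i0), (v1, i1) = iv_table[k - 1], iv_table[k]
    return i0 + (i1 - i0) * (v - v0) / (v1 - v0)


# Hypothetical characterization data: (volts across switch, amps through switch).
iv_table = [(0.0, 0.0), (0.5, 1.0e-3), (1.0, 1.5e-3)]
print(iv_lookup(iv_table, 0.25))   # between the first two pairs
print(r_on(1.0e-3))                # 10 mV over 1 mA
```

The flattening slope of the made-up table mimics a switch approaching saturation, which is why a single resistance value is not enough during power up.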

Note that the first characterization (the IV characterization) is also a resistance characterization. This resistance varies with the voltage across the switch, so it is also called non-linear resistance characterization.

5.2.2 Current or Switch Prediction

Current prediction is based on a simplified extracted model of the block under consideration, as shown in Figure 5.6. The switch network is modeled along with its detailed connectivity and timing, whereas the logic connected to the virtual domain is modeled as a capacitive load. The current through the switch is predicted over infinitesimally small time durations. The CV characteristic is applied as follows:

$I = \frac{dq}{dt}$, i.e. $dq = I\,dt$   (1)
But $dq = C\,dv$   (2)
Hence $dv = \frac{I\,dt}{C}$   (3)

Figure 5.6 Analysis model of Virtual Power Network (VDD feeds the switch network, whose output Vout drives the extracted total Cload)

Equation (3) forms the basis of Algorithm 1 described in the next section. The delay between two consecutive switches is used to predict the charge supplied by the switch to the virtual power network domain. The IV table of the switch is used to predict the current by further dividing the delay into infinitesimally small time durations, as shown in Figure 5.7. Based on the initial voltage and the charge supplied, the voltage at the moment the next switch starts turning on is derived. This process continues until either all switches are turned on or the specified voltage level is reached. If all the switches are turned on but the voltage is still lower than the ideal voltage (VDD golden), the same method continues in order to predict the maximum surge in current. The predicted number of switches is used to predict the static IR drop across the switch network, as explained in Algorithm 2. This is another important parameter, but it is not discussed further in this chapter.

Figure 5.7 Infinitesimal Time Division for Current Prediction

Parameters that can be analyzed through this setup include:
• The total number of switches required to reach a required voltage value.
• Alternatively, the voltage value that can be reached with a given number of switches.
• The maximum current surge that will occur for a given number of switches.
• The impact of the delay between consecutive switches as they turn on.
• The IR drop across the switch network.

5.2.2.1 Algorithm for Power Switch Network Analysis:

Initialize the load voltage and the charging current to zero.
{
For each infinitesimally small time period, predict the current from the IV table of the switch type, based on the voltage at the lumped load.
Compute the actual current from the number of switches turned on at that instant of time.
Track the current at VDD: if the new current is greater than the previous maximum, assign the new current to the maximum surge current.
Calculate the rise in voltage over the small time period using equation (3).
Continue until either all the switches are turned on or the desired voltage level is reached.
}
Print the maximum surge current and the voltage level reached after turning on the number of switches specified by the user.
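The steps of the algorithm above can be sketched in Python as follows. This is an illustrative sketch, not the implementation used in this work. It assumes a single switch type, equal delay between consecutive switches, a lumped load capacitance, and a user-supplied function `iv_current` that returns the current of one switch for a given voltage across it. All names and numbers are hypothetical.

```python
def simulate_power_up(iv_current, n_switches, delay_s, c_load_f, vdd,
                      v_target=None, steps_per_delay=100):
    """Step the lumped load voltage while switches turn on one by one."""
    dt = delay_s / steps_per_delay      # small time step inside one switch delay
    v_load = 0.0                        # load voltage starts at zero
    max_surge = 0.0
    switch_at_max = 0
    n_on = 0
    for n_on in range(1, n_switches + 1):       # switch number n_on has turned on
        for _ in range(steps_per_delay):
            # Current of one switch from its IV data, scaled by switches that are on.
            i_total = n_on * iv_current(vdd - v_load)
            if i_total > max_surge:             # track the surge seen at VDD
                max_surge, switch_at_max = i_total, n_on
            v_load += i_total * dt / c_load_f   # equation (3): dv = I dt / C
            if v_target is not None and v_load >= v_target:
                return {"voltage": v_load, "switches_on": n_on,
                        "max_surge": max_surge, "switch_at_max": switch_at_max}
    return {"voltage": v_load, "switches_on": n_on,
            "max_surge": max_surge, "switch_at_max": switch_at_max}


# Hypothetical switch: resistive at low voltage, saturating at 0.2 mA.
def example_switch(v_across):
    return max(0.0, min(v_across / 2000.0, 0.2e-3))


result = simulate_power_up(example_switch, n_switches=500, delay_s=1e-9,
                           c_load_f=1e-9, vdd=1.2, v_target=1.0)
print(result)
```

Called with `v_target`, the sketch answers how many switches are needed to reach a voltage. Called without it, it returns the voltage reached with a given number of switches. Both cases also report the maximum surge and the switch at which it occurs.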

The algorithm of Section 5.2.2.1 is developed for the case where the delay between two consecutive switches in a sequential switch network is the same. However, it can be extended to scenarios with differing delays. In that case, timing information from Static Timing Analysis or from simulations is needed.

5.2.2.2 Algorithm for Static IR drop analysis across power switches:

{
Read the switch characterization data. For static IR drop, read the ON channel resistance (RON).
Determine the total number of switches (N) required to reach the desired voltage level, which is specified by the user, using the “Algorithm for Power Switch Network Analysis”.
The effective resistance of the N switches predicted above is: RON/N.
Compute the power consumption of the switched (virtual) power network using any of the methods described in this work, or a method from outside this work.
Compute the average current consumption of the virtual power network: Iavg = Pavg/VDD.
The static IR drop across the switch network is: Iavg*RON/N.
}

5.3 Results and Analysis

The traditional approach to studying the above would be a full-fledged SPICE simulation that includes the virtual power network and the switch network, with each switch turned on after some delay. Note that this involves thousands of switches in the switch network and about a million gates or more in the virtual network. Such a simulation takes weeks even with the fast SPICE simulators available in the market, and it can be run only very late in the design cycle.

Alternatively, the virtual power network can be reduced by modeling the interconnect load and gate capacitance as a large distributed capacitance, and the ON channel transistor resistance as an effective resistance in series with each distributed C. This reduces the number of active elements, and the reduced power network can be simulated using SPICE (Figure 5.8). This approach gives an orders-of-magnitude improvement in simulation time, but the runtime is still days. It can be done during design planning or after detailed design is over.

Figure 5.8 Reduced Switch Network for validation

The technique presented in the last section is static in nature. It reduces the runtime to a few minutes and correlates very well with the techniques described above. The algorithms described above were analyzed with switches designed in TI’s 90 nm node. All the results below are for a 1M equivalent gate block. 1M gates could not be simulated using SPICE along with the switches, so the simplified model described in the previous paragraph was employed to get SPICE accuracy data while keeping the switch network intact. We employed a switch network with two kinds of switches for this analysis [87]. One set of switches took the virtual domain up to a specific voltage level, and a second kind of switch with higher capacity was turned on sequentially to measure the surge in current.

Table 5.1 shows the predicted number of switches for a given voltage. For large numbers of switches the algorithm is within about 1% of the SPICE-based simulation, whereas for small numbers of switches the inaccuracy is within 10%. In other words, the predicted number is close to the realistic number, with an accuracy of 1-10%. The table also shows the predicted current surge and the number of the switch whose turn-on causes the maximum peak. Essentially, along with the surge, we predict the switch at which the maximum surge occurs. This helps to further optimize the second type of switch network. Table 5.2 shows the voltage prediction for a given number of switches.

The advantage of the whole solution comes from the large runtime improvement, which enables early analysis and trade-offs in the design (Table 5.3). The runtime gain clearly outweighs the small inaccuracy in switch prediction or voltage prediction. Note that the runtime does not include the switch IV characterization time, since that is a one-time effort. In static analysis, we can quickly dump much more information as needed to understand a particular behavior for trade-off analysis. We can also predict the time domain behavior of voltage and current using the approach described in this work. Figure 5.9 compares the predicted voltage over time to a few arbitrary nodes simulated in SPICE. Figure 5.10 compares the predicted current over time to the current measured at VDD. The correlation is good, considering that the analysis is targeted at early trade-off analysis.
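The static IR drop procedure of Section 5.2.2.2 reduces to three arithmetic steps, shown below as a small Python sketch. The function name and the input numbers are hypothetical and are not measured data from this work.

```python
def static_ir_drop(r_on_ohm, n_switches, p_avg_w, vdd_v):
    """Static IR drop across N identical switches in parallel."""
    r_eff = r_on_ohm / n_switches   # effective resistance: RON / N
    i_avg = p_avg_w / vdd_v         # average current: Iavg = Pavg / VDD
    return i_avg * r_eff            # drop: Iavg * RON / N


# Hypothetical inputs: RON = 100 ohm, N = 1000 switches, Pavg = 100 mW, VDD = 1.0 V.
drop_v = static_ir_drop(100.0, 1000, 0.1, 1.0)
print(f"{drop_v * 1e3:.1f} mV")
```

With these made-up inputs the drop is 10 mV. Doubling the number of switches halves the drop, which is the area versus IR drop trade-off mentioned at the start of this chapter.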

Vdesired (mV)   Actual #Switches   Switches by Algorithm   Current Surge (mA)   Current Surge after #Switches
20              380                403                     950                  123
69              760                771                     881                  114
271             1560               1554                    749                  100
583             2340               2328                    467                  97
869             2964               2971                    266                  81
1170            4368               4308                    24                   43

Table 5.1 Switch Prediction by proposed algorithm

#Switches   Simulated Voltage (mV)   Voltage by Algorithm (mV)   Surge Current (mA)   Surge Current after Switch #   %Error in Voltage
780         63                       70.54                       892                  101                            11
1560        280                      273.53                      784                  94                             -0.2
2340        587                      589.26                      546                  78                             0.38
3120        926                      927.7                       263                  64                             0.18

Table 5.2 Voltage Prediction
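The switch-count accuracy in Table 5.1 can be recomputed directly from its second and third columns. The Python sketch below does this, assuming the error is taken relative to the actual (SPICE) switch count. The names are illustrative.

```python
# (Vdesired in mV, actual #switches from SPICE, #switches predicted by the algorithm)
table_5_1 = [
    (20, 380, 403),
    (69, 760, 771),
    (271, 1560, 1554),
    (583, 2340, 2328),
    (869, 2964, 2971),
    (1170, 4368, 4308),
]


def switch_error_pct(actual, predicted):
    """Signed error of the predicted switch count relative to SPICE, in percent."""
    return (predicted - actual) / actual * 100.0


switch_errors = [switch_error_pct(a, p) for _, a, p in table_5_1]

for (v_mv, actual, predicted), err in zip(table_5_1, switch_errors):
    print(f"{v_mv:5d} mV  actual={actual:5d}  predicted={predicted:5d}  error={err:6.2f}%")
```

The largest deviation is at the smallest switch count, and every row stays inside the 10% bound quoted in the text.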

No. of switches   Simulation Time (days)   Algorithm Runtime (minutes)
780               ~1.5                     < 1
1560              ~4                       < 1
2340              ~5                       < 1
2940              ~6                       < 1

Table 5.3 Power Up analysis - Runtime Comparison

Figure 5.9 Voltage Ramp up over Time for various nodes (y-axis: voltage in mV, 0 to 1400; x-axis: time; series: Predicted, SPICE@node1, SPICE@node2)

Figure 5.10 Current comparison over time (y-axis: current in mA, 0 to 1000; x-axis: time; series: Predicted, SPICE)

5.4 Summary

There are various techniques to reduce the leakage power of a design. The ‘gated power supply’, also called ‘sleep transistor’ or ‘switched power network’, is one of the efficient methods. The analysis techniques described in this work provide quick data for architecture level decisions when the ‘switched network’ technique is used. The runtime is short enough that the design team can run many iterations to find the optimum number of switches. The analytical method to calculate the total number of switches is fast because it involves a one-time SPICE simulation, covering only the IV characteristic of the switch, and the rest of the analysis is static. We have also used the method to analyze the ‘power on glitch’ that contributes to Power Supply Noise during power up. All the results closely match SPICE simulation.

6 Conclusion

6.1 Summary

This thesis discussed the power grid analysis challenges faced in CMOS technology. For a robust power grid, designs need to go through the following analyses:
• Accurate power estimation
• Instantaneous IR drop analysis and decap planning
• Power up analysis for designs using MTCMOS for leakage reduction

The key results of this work can be summarized as follows:
1. We successfully implemented a hierarchical probabilistic toggle computation approach that is applicable to multi-million gate designs while maintaining the desired accuracy.
2. We discussed power dissipation in cell-based CMOS design and proposed a flow for power estimation at various design stages that can improve the accuracy of estimation. The flow also helps the user make runtime and accuracy trade-offs.
3. We proposed a cell characterization methodology for instantaneous IR drop analysis as well as for power up analysis for MTCMOS.
4. We discussed a prototype flow developed for instantaneous IR drop estimation based on the average toggle rate computed by the toggle methodology proposed in this work. This flow estimates instantaneous as well as average IR drop numbers in the same simulation.
5. We presented power up analysis for MTCMOS-based digital designs. The methodology is validated using a prototype flow and gives a large runtime improvement compared to SPICE. The methodology also helps in MTCMOS gate optimization.

6.2 Scope of Future Work

The analysis approaches proposed in this work help in robust power grid analysis. Several extensions are possible to further help designs.

First, the power estimation proposed in this work relies on a gate level netlist. RTL level power estimation would help block designers trade off power early in the design, for example through MTCMOS usage or multi-Vt usage as proposed in [17].

Second, it is possible to improve the correlation between pre-layout and post-layout power numbers. One reason they differ is the clock tree expansion and buffer insertion performed during placement and routing to meet timing constraints. Early estimation techniques can be developed to estimate the additional cell count and so better correlate the power numbers at various stages.

Third, the amount of characterization data stored for each cell is very large, and a typical ASIC technology contains 2000-4000 cells. The data could be reduced by storing only the current signatures during transitions and using them to model the current source in block level analysis. This would also eliminate the need for the frequency domain transform performed here. Techniques used in some commercial tools, in conjunction with the analysis approach presented in this work, can help with this data reduction.

Fourth, we have not gone into the details of decoupling capacitance for instantaneous IR drop analysis in this work. The work can be extended to study the various decoupling capacitors extensively: intrinsic ones due to NWELL, non-switching gates and RAMs, as well as intentional ones distributed by the user. Decoupling capacitor estimation, characterization and what-if impact analysis on instantaneous IR drop is an important area for further research.

Fifth, the MTCMOS analysis approach proposed in this work is useful early in design planning to make efficient trade-offs between MTCMOS switches and noise tolerance levels in the design. In this work, we have modeled the switched power network with a lumped capacitance. This does not model the time domain behavior of the PG network due to PG resistance. A more accurate approach can be developed that models a distributed RC for the PG network once placement and power routing are done. It is our belief that this will give a quick and accurate analysis of the actual network compared to SPICE-like simulations.


7 References

1. Semiconductor Industry Assoc., International Technology Roadmap for Semiconductors, 2003 Update - http://public.itrs.net/Files/2003ITRS/Home2003.htm
2. Nam Sung Kim, David Blaauw et al, “Leakage Current: Moore’s Law Meets Static Power”, IEEE Computer, Dec 2003.
3. The SPICE Home Page, http://bwrc.eecs.berkeley.edu/Classes/IcBook/SPICE/
4. Rabe, D; Jochens, G.; Kruse, L.; Nebel, W, “Power-simulation of cell based ASICs: accuracy- and performance trade-offs”, Proceedings of Design automation and test in Europe, Feb 1998.
5. F. Najm, “A survey of power estimation techniques in VLSI circuits,” IEEE Trans. VLSI System., vol. 2, pp. 446–455, Dec. 1994.
6. C. Y. Tsui, M. Pedram, and A. Despain, “Efficient estimation of dynamic power dissipation under a real delay model,” in Proc. IEEE Int. Conf. Computer-Aided Design, 1993, pp. 224–228.
7. B. J. George et al., “Power analysis and characterization for semi custom design,” in Proc. Int. Workshop Low Power Design, 1994, pp. 215–218.
8. J.-Y. Lin et al., “A cell-based power estimation in CMOS combinational circuits,” in Proc. IEEE Int. Conf. Computer-Aided Design, 1994, pp. 304–309.
9. H. Sarin and A. McNelly, “A power modeling and characterization method for logic simulation,” in Proc. IEEE Custom Integrated Circuits Conf., 1995, pp. 363–366.
10. Synopsys’ Design Power, (http://www.synopsys.com/products/power/power.html)
11. N. Weste and K. Eshraghian, “Principles of CMOS VLSI Design”, VLSI Systems Series, Addison-Wesley, 1985.
12. Najm, F.N, “Transition Density, a stochastic measure of Activity in Digital Circuits”, DAC, pp. 644-649, June 1991.
13. Ghosh, A.; Devadas, S.; Keutzer, K.; White, J, “Estimation of average switching activity in combinational and sequential circuits”, DAC, pp. 253-259, June 1992.
14. S. Bhanja, N. Ranganathan, “Dependency Preserving Probabilistic Modeling of Switching Activity using Bayesian Networks”, 38th Design Automation Conference, pp. 209-214, 2001.
15. HUGIN API reference manual. Version 5.3. http://www.hugin.com
16. David Heckerman, “A tutorial on learning with Bayesian Networks”, ftp://ftp.research.microsoft.com/pub/tr/tr-95-06.pdf, March 1995.
17. Agarwal, A.; Mukhopadhyay, S.; Raychowdhury, A.; Roy, K.; Kim, C.H, “Leakage power analysis and reduction in nanoscale circuits”, IEEE Micro, Volume 26, Issue 2, pp. 68-80, March 2006.
18. Keshavarzi, A.; Tschanz, J.W.; Narendra, S.; De, V.; Daasch, W.R.; Roy, K.; Sachdev, M; Hawkins, C.F, “Leakage and process variation effects in current testing on future CMOS circuits”, IEEE Design & Test of Computers, Volume 9, Issue 5, pp. 36-43, Sept 2002.
19. Dresig, F. Lanches, P. Rettig, O., et al, “Simulation and reduction of CMOS power dissipation at logic level”, Design Automation, 1993, with the European Event in ASIC Design. Proceedings, pp. 341-246, Feb 1993.
20. An-Chang Deng Yan-Chyuan Shiau Loh, K.-H, “Time domain current waveform simulation of CMOS circuits”, IEEE international conference on Computer aided design 1988, pp. 208-211, Nov 1988.

21. F.N. Najm, R. Burch, P. Yang, and I.N. Hajj, “Probabilistic Simulation for Reliability Analysis of CMOS VLSI Circuits”, IEEE Transactions on CAD, 9(4):439-450, April 1990.
22. Randal L. Schwartz, Tom Phoenix and brian d foy, “Learning Perl”, 4th Edition, O’Reilly & Associates, ISBN 0596101058
23. Matlab Tutorial, http://www.math.ufl.edu/help/matlab-tutorial/
24. Synopsys, Inc, “Using the Synopsys® Design Constraints Format”, Application Note, Sept 2005.
25. Himanshu Bhatnagar, “Advanced ASIC Chip Synthesis: Using Synopsys Design Compiler Physical Compiler and Primetime”, 2nd Edition, Kluwer Academic Publishers, ISBN: 0792376447.
26. Martin Saint-Laurent, Swaminathan, “Impact of Power Supply Noise on Timing In High Frequency Microprocessors”, IEEE Trans on Advanced Packaging, pp. 135-144, Feb 2004.
27. Kriplani, H.; Najm, F.; Hajj, I, “Improved Delay and Current Models for Estimating Maximum Currents in CMOS VLSI Circuits”, ISCAS 94, pp. 435-438, June 1994.
28. Kriplani, H.; Najm, F.N.; Hajj, I.N, “Pattern Independent Maximum Current Estimation in Power and Ground Buses of CMOS VLSI Circuits: Algorithms, Signal Correlations, and Their Resolution”, IEEE Trans on CAD of Integrated Circuits and Systems, pp. 998-1012, Aug 1995.
29. Hsiao, M.S.; Rudnick, E.M.; Patel, J.H., “Peak Power Estimation of VLSI Circuits: New Peak Power Measures”, IEEE Trans on VLSI Systems, pp. 435-439, Aug 2000.
30. Qing Wu; Qinru Qiu; Pedram, M, “Estimation of Peak Power Dissipation in VLSI Circuits Using the Limiting Distributions of Extreme Order Statistics”, IEEE Trans on CAD of Integrated Circuits and Systems, pp. 942-956, Aug 2001.
31. Bogliolo, A.; Benini, L.; De Micheli, G.; Ricco, B., “Gate-level power and current simulation of CMOS integrated circuits”, Very Large Scale Integration (VLSI) Systems, pp. 473-488, Dec 1997.
32. Anantha Chandrakasan’s Home Page: http://www-mtl.mit.edu/~anantha/publications.html, http://www.fetchbook.info/search_Anantha_Chandrakasan/searchBy_Author.html
33. FFT Tutorial, http://www.ele.uri.edu/~hansenj/projects/ele436/fft.pdf
34. Jeff Tranter and Paul Raines, “Tcl/Tk in a Nutshell”, O’Reilly & Associates, ISBN 1565924339
35. Alan V. Oppenheim, Ronald W. Schafer, John R. Buck, “Discrete Time Signal Processing”, 2nd Edition, Prentice Hall, ISBN 0137549202
36. Chen, H.H.; Ling, D.D, “Power Supply Analysis Methodology for Deep-Submicron VLSI Chip Design”, DAC, pp. 638-643, June 1997.
37. Yi-Shing Chang; Gupta, S.K.; Breuer, M.A, “Analysis of Ground Bounce in Deep-Submicron Circuits”, VLSI Test Symposium, pp. 110-116, May 1997.
38. Yi-Min Jiang; Kwang-Ting Cheng; An-Chang Deng, “Estimation of Maximum Power Supply Noise for Deep Sub-Micron Designs”, International sym on low power electronics and design, pp. 233-238, Aug 1998.
39. Zhao, S.; Roy, K.; Koh, C.-K, “Estimation of Inductive and Resistive Switching Noise on Power Supply Network in Deep Sub-Micron CMOS Circuits”, International conference on Computer Design, pp. 65-72, Sept 2000.
40. S. Bobba, I.N. Hajj, “Maximum voltage variation in the power distribution network of VLSI circuits with RLC Models,” Proc of ISLPED, Aug 2001.

41. Bai, G.; Bobba, S.; Hajj, I.N, “Static Timing Analysis Including Power Supply Noise Effect on Propagation Delay in VLSI Circuits”, DAC, pp. 295-300, 2001.
42. G. Steele, et al., “Full-Chip Verification Methods for DSM Power Distribution Systems,” Proc. of DAC, pp. 744-749, 1998.
43. R. Chaudhry, D. Blaauw, R. Panda and T. Edwards, “Current Signature Compression For IR-Drop Analysis,” Proc. Design Automation Conference, pp. 162-167, 2000.
44. S. Bobba and I. N. Hajj, “Estimation of maximum current envelope for power bus analysis and design,” Proc. of ISPD, pp. 141-146, Apr 1998.
45. Rishi Bhooshan (TI) et al, “A Unique Method For Dynamic Voltage Drop Analysis and Decoupling Capacitance Estimation”, VDAT 2003.
46. Cirit, M.A., “Characterizing a VLSI standard cell library”, Digital Object Identifier 10.1109/CICC, pp. 25.7.2-25.7.4, May 1991.
47. Debnath, S.P.; Sukumar, J.; Udaykumar, H, “A methodology for fast vector based power supply and substrate noise analyses”, International conference on VLSI Design, pp. 808-811, Jan 2005.
48. Dalal, A.; Lev, L.; Mitra, S., “Design of an efficient power distribution network for the UltraSPARC-I microprocessor”, IEEE conference on Computer Design: VLSI in computers and processors, pp. 118-123, Oct 1995.
49. Chen, H.H.; Schuster, S.E., “On-chip decoupling capacitor optimization for high-performance VLSI design”, VLSI Technology, Systems and Applications, pp. 99-103, June 1995.
50. Larsson, P, “Power supply noise in future IC's: a crystal ball reading”, Custom Integrated Circuits, pp. 467-474, May 1999.
51. Sotman, M.; Popovich, M.; Kolodny, A.; Friedman, E, “Leveraging symbiotic on-die decoupling capacitance”, Electrical Performance of Electronic Packaging, pp. 111-114, Oct 2005.
52. Larsson, P, “Resonance and damping in CMOS circuits with on-chip decoupling capacitance”, IEEE Transactions on Circuits and Systems-I, vol 45, pp. 849-858, Aug 1998.
53. Larsson, P, “Parasitic Resistance in an MOS Transistor Used as On-Chip Decoupling Capacitance,” IEEE Journal of Solid State Circuits, vol 32, pp. 574-576, Apr 1997.
54. Chaudhry, R.; Panda, R.; Edwards, T.; Blaauw, D, “Design and analysis of power distribution networks with accurate RLC models”, International conference on VLSI Design, pp. 151-155, Jan 2000.
55. Min Zhao; Panda, R.V.; Sapatnekar, S.S.; Edwards, T.; Chaudhry, R.; Blaauw, D, “Hierarchical analysis of power distribution networks”, DAC, pp. 150-155, June 2000.
56. IBM Methodology for Power Supply Noise - http://www.research.ibm.com/da/nova.html
57. R. Heald et al, “Implementation of a 3rd Generation Sparc V9 64b Microprocessor”, Proc IEEE ISSCC, pp. 412-413, 2000.
58. Yi-Min Jiang, Kwang-Ting Cheng, “Analysis of Performance Impact Caused by Power Supply Noise in Deep Submicron Devices”, DAC, June 1999.
59. Apache Design Solutions, “Reshaping Nanometer Flows with Physical Power Integrity”, http://www.apache-da.com, White Paper, May 2003.
60. Anthony Ralston, Philip Rabinowitz, “A First course in Numerical Analysis”, 2nd Edition, Dover Publications, ISBN 048641454X.
61. Kalpesh Shah, “SNUG 2006 Panel Discussion”

62. H. Mehta, R.M.Owens, M.J.Irwin, “Energy Characterization Based on Clustering,” 33rd Design Automation Conference, June 1996.
63. D. Brooks, V. Tiwari, and M. Martonosi, “Wattch: A framework for Architectural-Level Power Analysis and Optimizations,” Proc of International Symposium on Computer Architecture, pp. 83-94, June 2000
64. V. Tiwari, S. Malik, and A. Wolfe, “Power Analysis of Embedded Software: A First Step toward software power minimization,” IEEE Trans VLSI Systems, vol 2, no. 4, pp 437-445, 1994
65. E. Macii, M. Pedram and F. Somenzi, “High Level Power Modeling and Estimation,” IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, vol 17, November 1998.
66. Synopsys Prime Power - http://www.synopsys.com/products/power/primepower_ds.pdf
67. Synopsys Power Compiler - http://www.synopsys.com/products/power/power_ds.pdf
68. Synopsys Nanosim - http://www.synopsys.com/products/mixedsignal/nanosim/nanosim.html
69. Synopsys Liberty Format - http://www.synopsys.com/partners/tapin/lib_info.html
70. M Horowitz and R Gonzalez, “Energy dissipation in general purpose Microprocessors”, IJSSC, vol 31, Sept 1996.
71. Brglez, F.; Bryan, D.; Kozminski, K., “Combinational profiles of sequential benchmark circuits”, ISCAS, vol 3, pp. 1929-1934, May 1989.
72. R. Wilson and D. Lammers, “Grove Calls Leakage Chip Designers’ Top Problem,” EE Times, 13 Dec 2002; www.eetimes.com/story/OEG20021213S0040.
73. Intel SpeedStep technology, http://www.intel.com
74. Y.Ye, S Borkar, V. De, “A New Technique for Standby Leakage Reduction in High-Performance Circuits,” 1998 Symposium on VLSI Circuits, June 1998.
75. M. Powell et al., “Reducing Leakage in a High Performance Deep-Submicron Instruction Cache,” IEEE Trans. VLSI, Feb 2001, pp 77-89
76. Ali K., Charles H. et al., “Effect of reverse body bias for low power CMOS circuits”
77. Kaushik R, Mark C.J., Dinesh S., “leakage control with efficient use of transistor stacks in single threshold CMOS”
78. Shekhar Borkar, “Low Power Design Challenges for the Decade”, 2001.
79. Kumagai, K.; Iwaki, H.; Yoshida, H.; Suzuki, H.; Yamada, T.; Kurosawa, S.; “A Novel Powering Down Scheme for low Vt CMOS Circuits”, 1998 Symposium on , 11-13 June 1998. Pages: 44-45
80. Mutoh, S.; Douseki, T.; Matsuya, Y.; Aoki, T.; Yamada, J., “1V high-speed digital circuit technology with 0.5μm multi-threshold CMOS”, IEEE ASIC Conference, 1993.
81. Akamatsu, H.; Iwata, T.; Yamamoto, H.; Hirata, T.; Yamauchi, H.; Kotani, H.; Matsuzawa, A.; “A low power data holding circuit with an intermittent power supply scheme for sub-1V MT-CMOS LSIs”, VLSI Circuits, 1996. Digest of Technical Papers., 1996 Symposium on , 13-15 June 1996. Pages: 14-15
82. Ye, Y.; Borkar, S.; De, V., “A new technique for standby leakage reduction in high-performance circuits”, Symposium on VLSI Circuits, June 1998. Page(s): 40-41
83. Das, K.K.; Joshi, R.V.; Chuang, C.T.; Cook, P.W.; Brown, R.B., “New digital circuit techniques for total standby leakage reduction in Nano-scale SOI technology”, pp. 309-312, ISSCC, Sept 2003.
84. Wenxin Wang; Anis, M.; Areibi, S, “Fast techniques for standby leakage reduction in MTCMOS circuits”, ISOCC, pp. 21-24, Sept 2004

85. Fei Li; Lei He; Saluja, K.K.; “Estimation of maximum power-up current”, DAC, pp. 51-56, Jan 2002
86. Calhoun, B.H.; Honore, F.A.; Chandrakasan, A.P, “A leakage reduction methodology for distributed MTCMOS”, JSSC, pp. 818-826, May 2004
87. Royannez, P.; Mair, H.; Dahan, F.; Wagner, M.; Streeter, M.; Bouetel, L.; Blasquez, J.; Clasen, H.; Semino, G.; Dong, J.; Scott, D.; Pitts, B.; Raibaut, C.; Uming Ko, “90nm Low Leakage SoC Design Techniques for Wireless Applications”, ISSCC, pp. 138-139, Feb 2005.
88. R. Heald, et al., “Implementation of a 3rd Generation SPARC V9 64b Microprocessor,” Proc. IEEE ISSCC, pp 412-413, 2000
89. P. Gronowski, W. Bowhill, R. Preston, M. Gowan, and R. Allmon, “High Performance Microprocessor Design,” IEEE Journal of Solid State Circuits, vol 33, no 5, pp. 676-686, Apr 1998.
90. J. Darnauer, D. Chengson, B. Schmidt, and E. Priest, “Electrical Evaluation of Flip-Chip package Alternatives for Next Generation Microprocessor,” Electronic Components and Technology Conference, pp. 666-673, 1998
91. S. Borkar, “Low Power Design Challenges for the Decade,” Proc. of ISLPED, 2000
92. V. Tiwari, D. Singh, S. Rajgopal, G. Mehta, R. Patel and F. Baez, “Reducing Power in High performance Microprocessors,” Proc. of Design Automations Conference, 1997
93. Wachnik, R.A.; Filippi, R.G.; Shaw, T.M.; Lin, P.C, “Practical benefits of the electromigration short-length effect, including a new design rule methodology and an electromigration resistant power grid with enhanced wireability”, Sym on VLSI Technology, pp. 220-221, June 2000.
94. J. Kitchin, “Statistical Electromigration Budgeting for Reliable Design and Verification in a 300-MHz Microprocessor”, Symposium on VLSI Circuits Digests, pp. 115-116, 1995
95. T. H. Cormen, C. E. Leiserson, R. L. Rivest, “Introduction to Algorithms”, PHI
96. Chapra, S.C, Canale R P, “Numerical Methods for Engineers”, 3rd Ed., McGraw-Hill 1998.
97. Rabaey, “Digital Integrated Circuits Design”, Pearson Education, Second Edition, 2003


Appendix A Sample SDC file

create_clock -period [get_ports clk]
set_input_delay -clock clk1 [get_ports IN*]
set_case_analysis 0 [get_ports *reset* *scan_mode*]
report_timing

Appendix B Sample SPEF Format

*SPEF "IEEE 1481-1997"
*DESIGN "s27"
*DATE "Mon Dec 13 10:05:00 1999"
*VENDOR "TI"
*PROGRAM "vlog2spef"
*VERSION "1.0"
*DESIGN_FLOW "Dummy From Verilog"
*DIVIDER /
*DELIMITER :
*BUS_DELIMITER []
*T_UNIT 1 NS
*C_UNIT 1 PF
*R_UNIT 1 KOHM
*L_UNIT 1e-3 UH
*PORTS
G17 O *L 0.1
G3 I *S 0.1 0.1
G2 I *S 0.1 0.1
G1 I *S 0.1 0.1
G0 I *S 0.1 0.1
PREZ I *S 0.1 0.1
CLK I *S 0.1 0.1
*D_NET G17 0.1
*CONN
*I IV110_1:Y O *L 0.1 *D IV110
*P G17 O *L 0.1
*CAP
0 G17 0.1
1 IV110_1:Y 0.1
2 G17:0 0.1
*RES
0 G17 G17:0 0.1
1 IV110_1:Y G17:0 0.1
*END
*D_NET G3 0.1
*CONN
*I OR210_1:A I *L 0.1 *D OR210
*P G3 I *L 0.1
*CAP
0 G3 0.1
1 OR210_1:A 0.1
2 G3:0 0.1
*RES
0 G3 G3:0 0.1
1 OR210_1:A G3:0 0.1
*END
*D_NET G2 0.1
*CONN
*I NO210_3:A I *L 0.1 *D NO210
*P G2 I *L 0.1
*CAP
0 G2 0.1
1 NO210_3:A 0.1
2 G2:0 0.1
*RES
0 G2 G2:0 0.1
1 NO210_3:A G2:0 0.1
*END
*D_NET G1 0.1
*CONN
*I NO210_2:A I *L 0.1 *D NO210
*P G1 I *L 0.1
*CAP
0 G1 0.1
1 NO210_2:A 0.1
2 G1:0 0.1
*RES
0 G1 G1:0 0.1
1 NO210_2:A G1:0 0.1
*END
*D_NET G0 0.1
*CONN
*I IV110_0:A I *L 0.1 *D IV110
*P G0 I *L 0.1
*CAP
0 G0 0.1
1 IV110_0:A 0.1
2 G0:0 0.1
*RES
0 G0 G0:0 0.1
1 IV110_0:A G0:0 0.1
*END
*D_NET PREZ 0.1
*CONN
*I DTP10J_0:PREZ I *L 0.1 *D DTP10J
*I DTP10J_1:PREZ I *L 0.1 *D DTP10J
*I DTP10J_2:PREZ I *L 0.1 *D DTP10J
*P PREZ I *L 0.1
*CAP
0 PREZ 0.1
1 DTP10J_0:PREZ 0.1
2 DTP10J_1:PREZ 0.1
3 DTP10J_2:PREZ 0.1
4 PREZ:0 0.1
*RES
0 PREZ PREZ:0 0.1
1 DTP10J_0:PREZ PREZ:0 0.1
2 DTP10J_1:PREZ PREZ:0 0.1
3 DTP10J_2:PREZ PREZ:0 0.1
*END
*D_NET CLK 0.1
*CONN
*I DTP10J_0:CLK I *L 0.1 *D DTP10J
*I DTP10J_1:CLK I *L 0.1 *D DTP10J
*I DTP10J_2:CLK I *L 0.1 *D DTP10J
*P CLK I *L 0.1
*CAP
0 CLK 0.1
1 DTP10J_0:CLK 0.1
2 DTP10J_1:CLK 0.1
3 DTP10J_2:CLK 0.1
4 CLK:0 0.1
*RES
0 CLK CLK:0 0.1
1 DTP10J_0:CLK CLK:0 0.1
2 DTP10J_1:CLK CLK:0 0.1
3 DTP10J_2:CLK CLK:0 0.1
*END
*D_NET G10 0.1
*CONN
*I DTP10J_0:D I *L 0.1 *D DTP10J
*I NO210_0:Y O *L 0.1 *D NO210
*CAP
0 DTP10J_0:D 0.1
1 NO210_0:Y 0.1
2 G10:0 0.1
*RES
0 DTP10J_0:D G10:0 0.1
1 NO210_0:Y G10:0 0.1
*END
*D_NET G5 0.1
*CONN
*I DTP10J_0:Q O *L 0.1 *D DTP10J
*I NO210_1:A I *L 0.1 *D NO210
*CAP
0 DTP10J_0:Q 0.1
1 NO210_1:A 0.1
2 G5:0 0.1
*RES
0 DTP10J_0:Q G5:0 0.1
1 NO210_1:A G5:0 0.1
*END
*D_NET G6 0.1
*CONN
*I DTP10J_1:Q O *L 0.1 *D DTP10J
*I AN210_0:B I *L 0.1 *D AN210
*CAP
0 DTP10J_1:Q 0.1
1 AN210_0:B 0.1
2 G6:0 0.1
*RES
0 DTP10J_1:Q G6:0 0.1
1 AN210_0:B G6:0 0.1
*END
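The *D_NET blocks in the sample above share one layout: a *CONN section, a *CAP section of indexed capacitance entries, a *RES section of indexed resistance entries, and a closing *END. As an illustration only, the following Python sketch walks that layout and totals the *CAP and *RES values per net. The function name `sum_net_parasitics` and the embedded `SAMPLE` string are assumptions made for this example, not part of any tool used in the thesis, and the sketch covers only the subset of SPEF that appears in this appendix. The 0.1 values are the placeholder numbers from the sample, so the totals carry no physical meaning.

```python
from collections import defaultdict

# One net copied from the sample above (units per the header: PF and KOHM).
SAMPLE = """\
*D_NET G17 0.1
*CONN
*I IV110_1:Y O *L 0.1 *D IV110
*P G17 O *L 0.1
*CAP
0 G17 0.1
1 IV110_1:Y 0.1
2 G17:0 0.1
*RES
0 G17 G17:0 0.1
1 IV110_1:Y G17:0 0.1
*END
"""


def sum_net_parasitics(spef_text):
    """Return {net: {"cap": total, "res": total}} for each *D_NET block."""
    totals = defaultdict(lambda: {"cap": 0.0, "res": 0.0})
    net, section = None, None
    for raw in spef_text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        keyword = fields[0]
        if keyword == "*D_NET":
            net, section = fields[1], None
            totals[net]  # register the net even if it has no entries
        elif keyword == "*END":
            net, section = None, None
        elif keyword in ("*CAP", "*RES"):
            section = keyword[1:].lower()
        elif keyword.startswith("*"):
            section = None  # *CONN, *I, *P and header lines are skipped
        elif net and section:
            # Indexed entry: the last field is the value.
            totals[net][section] += float(fields[-1])
    return dict(totals)


totals = sum_net_parasitics(SAMPLE)
for name, values in totals.items():
    print(name, values)
```

Keeping the parser as a small state machine (current net, current section) mirrors the block structure of the file and avoids any dependence on line counts within a section.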

Appendix C Power Waveforms Analysis

AND gate power waveforms at different frequency points. Note that the waveform shape and peaks match across the frequency range.

Figure 1 1MHz, Peak: 838.9 uW
Figure 2 100MHz, Peak: 840.7 uW
Figure 3 1GHz, Peak: 838.2 uW
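The agreement between the three peaks can be checked with simple arithmetic. The short Python sketch below uses only the peak values quoted in the Figure 1 to Figure 3 captions; the helper name `peak_spread_percent` is an assumption introduced for this illustration.

```python
# Peak power in uW per frequency point, taken from the Figure 1-3 captions.
peaks_uw = {"1MHz": 838.9, "100MHz": 840.7, "1GHz": 838.2}


def peak_spread_percent(peaks):
    """Return (max - min) / min * 100 over the given peak values."""
    lowest = min(peaks.values())
    highest = max(peaks.values())
    return (highest - lowest) / lowest * 100.0


spread = peak_spread_percent(peaks_uw)
print(f"peak spread across frequencies: {spread:.2f}%")
```

With these three values the spread is about 0.3% of the smallest peak, which is consistent with the statement that the peaks match across the frequency range.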

Appendix D Current Characterization – sample spice deck

**epic tech="voltage 1.2v"
*epic "vdd 0 1.2 0.01"
*epic "vss 0 0 0.01"
*epic "invoke spice3 %input %output"
* spice options
.inc /user/kalpu/cloc/autochar/userware/spice_options noprint
* temperature = 25
.temp 25
.inc ../user_data/models_strong noprint
*.inc /db/pdk/1233c035a/current/models/current/tis/model.paths.strong noprint
.inc /user/kalpu/cloc/autochar/subckt/sr40/an210h noprint
PVDD 1.2v
vvdd vdd 0 PVDD
RVDD VDD VDD_inv1 1000
RVSS VSS_inv1 0 1000
xinv1 A B Y VSS_inv1 vdd_inv1 an210h
*10 MHz
VA A 0 PULSE 0 PVDD 1n pslew pslew 50n 100n
*Vb B 0 PULSE 0 PVDD 1n pslew pslew 50n 100n
Vb B 0 PVDD
Pslew 0.01n
pload 50ff
CY Y 0 pload
.tran 0.01ns 250ns
.MEASURE TR AVGPWR AVG P(Vvdd) FROM=20ns TO=60ns
.punch tr V(Vdd_inv1 vss_inv1)
.punch tr I(VVDD)
.punch tr I(rvdd)
.punch tr V(A B Y)
*.punch tr I(rvdd rvss)
.end
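The .MEASURE line in the deck above asks for the average of P(Vvdd) over the window from 20 ns to 60 ns. As a rough illustration of what such a windowed average means, the Python sketch below integrates a sampled power waveform with the trapezoidal rule and divides by the window length. The function name `window_average` and the sample data are hypothetical; they are not simulator output and do not describe how any particular simulator implements the measurement.

```python
def window_average(times, values, t_from, t_to):
    """Trapezoidal time-average of sampled values for t_from <= t <= t_to.

    Only samples that fall inside the window are used, so the window
    edges are assumed to coincide with sample times.
    """
    points = [(t, v) for t, v in zip(times, values) if t_from <= t <= t_to]
    if len(points) < 2:
        raise ValueError("need at least two samples inside the window")
    area = 0.0
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        area += 0.5 * (v0 + v1) * (t1 - t0)
    return area / (points[-1][0] - points[0][0])


# Hypothetical samples: time in ns, supply current in mA, 1.2 V supply.
times_ns = [0, 10, 20, 30, 40, 50, 60, 70]
current_ma = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
power_mw = [1.2 * i for i in current_ma]

avg_mw = window_average(times_ns, power_mw, 20, 60)
print(f"average power over 20-60 ns: {avg_mw:.3f} mW")
```

The same helper applies to any sampled waveform, so it could also be used on a current trace to obtain an average current over the measurement window.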

Appendix E Waveform transformation example

Figure 4 1MHz base Waveform, 830.4 uW
Figure 5 100MHz Transformation, 830.4 uW
Figure 6 1GHz Transformation for 1MHz, 830.4 uW
