
Part IV .pdf - UCSD VLSI CAD Laboratory


Section IV: Timing Closure Techniques
June 2002 DAC02 - Physical Chip Implementation


IBM contributions to this presentation include:
• T.J. Watson Research Center
• Austin Research Lab
• ASIC Design Centers
• EDA Organization
* For more detailed information see the references at the end of this presentation, which include a wide variety of IBM and external publications covering these areas.


Overview
• Introduction
• Review material (timing and synthesis)
• Introduction to placement
• Placement algorithms
• Paradigms for placement-synthesis integration
• Placement aware synthesis techniques
• Congestion avoidance / mitigation techniques
• Routing optimization


Timing Closure
• Many aspects of a design contribute to performance, power, and density
  - Architecture / Logic Implementation
  - PD Design Style (Flat, Hierarchical, etc.)
  - Clocking Paradigm / Test / Circuit Family
  - Floor Plan / Synthesis / Placement / Routing
• Design Automation for timing closure is more significant than ever before
  - Designs are larger
  - Wires are longer, invalidating statistical synthesis models, and requiring lots of buffers
  - Cycle times are more aggressive


Design Automation Tools are Individually Mature
• Timing analysis
• Synthesis / Technology mapping
• Placement / Routing
• Floor Planning
• Extraction / Analysis


Challenge is to integrate them into one cooperative application
[Figure: a netlist goes in; physical synthesis combines place & route, synthesis, and timing; a completed design comes out.]


Design Flow Evolution:
[Figure: the flow evolves from design entry, synthesis with timing, and place & route, through placement-driven synthesis with post-placement tuning, to integrated timing-driven placement and synthesis plus automatic routing.]
1. Tech independent optimization
2. Tech mapping
3. Timing correction

1. Physically aware optimizations
2. Physically aware timing correction
3. Timing / Noise aware routing


What you should expect:
• High level concepts presented are generally applicable across a wide range of tools / methodologies (i.e., not IBM specific)
• Specific tool internals used in this tutorial are taken from IBM tools. They should provide a reasonable “feel” for how things are done in the industry.


Worldwide ASIC/PLD Sales
Top 5 Suppliers for 2001
• IBM: $2758, growth 1.2%
• Agere: $1310, growth -43.5%
• LSI: $1243, growth -38.2%
• NEC: $1243, growth -35.2%
• Xilinx: $1149, growth -26.3%
Revenue: Millions of U.S. Dollars
Source: Gartner Dataquest (March 2002)


IBM ASIC Supplier #1 since 1999
Source: Dataquest 96-02

Rank  1996       1997       1998       1999       2000       2001
1     NEC        NEC        Lucent     IBM        IBM        IBM
2     LSI Logic  IBM        IBM        Lucent     Lucent     Lucent
3     Fujitsu    Lucent     NEC        NEC        LSI Logic  LSI Logic
4     Lucent     Fujitsu    LSI Logic  LSI Logic  NEC        NEC
5     IBM        LSI Logic  Fujitsu    Fujitsu    Xilinx     Xilinx
6     TI         TI         Altera     Xilinx     Fujitsu    Fujitsu
7     Toshiba    VLSI       Xilinx     Altera     Altera     Altera
8     Xilinx     Toshiba    TI         STM        Toshiba    Toshiba
9     Hitachi    Altera     Toshiba    VLSI       STM        Mitsubishi
10    VLSI       Xilinx     VLSI       TI         TI         Agilent


Section Outline
• Introduction
• Review material (timing and synthesis)
• Introduction to placement
• Placement algorithms
• Paradigms for placement-synthesis integration
• Placement aware synthesis techniques
• Congestion avoidance / mitigation techniques
• Routing optimization


Static Timing Analysis


Timing Analysis Basics:
Why static timing since simulation is more accurate?
• Simulation has a number of key drawbacks
  - requires input state vectors
  - long runtimes
• A simple example:
[Figure: a small gate network with inputs a, b, c and output z.]
How would one calculate the worst case rising delay from a to z?

        c=0           c=1
b=0     a-z delay1    a-z delay2
b=1     a-z delay3    a-z delay4

Exponential explosion as possible design input states grow!


Timing Analysis Basics:
Definition of basic terms
- Arrival time (AT): the time at which a pin switches state
- Slew: the rate at which a signal switches
  - usually the difference between the 10% and 90% points on the voltage curve: slew = time90 - time10
  - the arrival time is taken at the 50% point: AT = time50
- Required arrival time (RAT): the time by which a signal must arrive in order to avoid a chip fail
- Slack = Required arrival time - Arrival time
  - Positive slack good, negative slack bad
[Figure: voltage versus time curve rising to vdd, with the 10%, 50%, and 90% points marked.]


Timing Analysis Basics:
• Block based timing:
• Worst value only stored at merge points
• Each segment is processed just once
Example Problem: What is slack at PO?
[Figure: example circuit annotated with segment delays (d) and arrival times (at), starting from at=0 at the inputs. Temporary arrival times at merge points are replaced by the worst value. At the PO, at=11 and rat=10.]
Slack = -1
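The block-based idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the circuit below is hypothetical (it is not the one drawn on the slide, although it is chosen to end with at=11 and rat=10), and the names `late_arrival_times`, `edges`, and `input_arrivals` are invented for the example. Each node is visited once in topological order and only the worst (largest) arrival time is kept at a merge point.

```python
from graphlib import TopologicalSorter


def late_arrival_times(edges, input_arrivals):
    """Block-based late-mode timing: one pass, worst value kept at merge points."""
    fanin = {}
    for src, dst, delay in edges:
        fanin.setdefault(src, [])
        fanin.setdefault(dst, []).append((src, delay))
    # Visit every node only after all of its fanin nodes have been timed.
    predecessors = {n: [s for s, _ in ins] for n, ins in fanin.items()}
    at = {}
    for node in TopologicalSorter(predecessors).static_order():
        if fanin[node]:
            # Merge point: store only the worst (latest) arrival.
            at[node] = max(at[src] + delay for src, delay in fanin[node])
        else:
            at[node] = input_arrivals[node]
    return at


# Hypothetical circuit, NOT the slide's: (driver, sink, segment delay).
edges = [("i1", "n1", 1), ("i2", "n1", 2), ("i2", "n2", 3), ("i3", "n2", 5),
         ("n1", "n3", 3), ("n2", "n3", 1), ("n3", "po", 5)]
at = late_arrival_times(edges, {"i1": 0, "i2": 0, "i3": 0})
rat_po = 10
slack_po = rat_po - at["po"]  # slack = required arrival time - arrival time
print(at["po"], slack_po)
```

Because every segment is touched once, the cost grows with the number of segments rather than with the number of paths or input states.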


Timing Analysis Basics:
• What is Incremental Timing?
• Enabling small incremental changes without full retiming
• Only direct fanin/fanout cone is processed
[Figure: the same example circuit after a local change. Only the affected arrival times are recomputed; at the PO the arrival time becomes at=10 against rat=10, so slack=0: it passed!]


Early Mode Analysis
• Definitions change as follows
  - longest becomes shortest
  - slack = arrival - required
[Figure: example circuit with pins a, b, c, x, y annotated with early-mode arrival times (AT), required arrival times (RAT), and slacks (SL); for instance AT_x = 1, RAT_x = 2, SL_x = 1 - 2 = -1.]
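A minimal sketch of how the two modes differ, assuming hypothetical helper names `slack` and `merge_arrival` (they are not taken from any tool): late mode keeps the longest arrival and computes required minus arrival, early mode keeps the shortest arrival and computes arrival minus required.

```python
def slack(arrival, required, mode="late"):
    """Slack is positive when the constraint is met in either mode."""
    if mode == "late":
        return required - arrival   # signal must arrive no later than required
    if mode == "early":
        return arrival - required   # signal must arrive no earlier than required
    raise ValueError("mode must be 'late' or 'early'")


def merge_arrival(candidates, mode="late"):
    """Value stored at a merge point: longest in late mode, shortest in early mode."""
    return max(candidates) if mode == "late" else min(candidates)


late_slack = slack(arrival=11, required=10, mode="late")    # the PO of the late-mode example
early_slack = slack(arrival=1, required=2, mode="early")    # pin x of the early-mode example
print(late_slack, early_slack)
```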


Timing Correction
• Fix electrical violations
  - Resize cells
  - Buffer nets
  - Copy (clone) cells
• Fix timing problems
  - Local transforms (bag of tricks)
  - Path-based transforms


Local Synthesis Transforms
• Resize cells
• Buffer or clone to reduce load on critical nets
• Decompose large cells
• Swap connections on commutative pins or among equivalent nets
• Move critical signals forward
• Pad early paths
• Area recovery


Transform Example: Double Inverter Removal
[Figure: a path before and after removing a pair of inverters; Delay = 4 becomes Delay = 2.]


Resizing
[Figure: delay (d, 0 to 0.05) versus load (0 to 1) curves for cell sizes A, B, and C. A gate with inputs a and b drives sinks d, e, f with loads 0.2, 0.2, 0.3; the figure marks a delay of 0.035 for size A and 0.026 for size C.]


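Cloning
[Figure: delay (d, 0 to 0.05) versus load (0 to 1) curves for cell sizes A, B, and C. A gate with inputs a and b drives sinks d, e, f, g, h with load 0.2 each; after cloning, two copies of the gate (labeled A and B) share the sinks.]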


Buffering
[Figure: delay (d, 0 to 0.05) versus load (0 to 1) curves for cell sizes A, B, and C. A gate with inputs a and b drives sinks d, e, f, g, h with load 0.2 each; after buffering, a buffer B with input load 0.1 drives part of the fanout.]


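Redesign Fan-in Tree
Arr(a)=4, Arr(b)=3, Arr(c)=1, Arr(d)=0
[Figure: two trees of delay-1 gates combining a, b, c, d into e. In the original tree Arr(e)=6; in the redesigned tree, where c and d are combined first, then b, then a, Arr(e)=5.]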
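The arrival times on this slide can be reproduced with a short sketch. It assumes two-input gates with unit delay implementing an associative function, so inputs may be regrouped freely; the function names are hypothetical. The greedy rule used here (always combine the two earliest signals) is one simple way to build such a tree, not necessarily the method used in the tools described.

```python
import heapq


def balanced_tree_arrival(arrivals, gate_delay=1):
    """Pair adjacent signals level by level, ignoring arrival times."""
    level = list(arrivals)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            # A leftover odd signal passes through to the next level unchanged.
            nxt.append(max(pair) + gate_delay if len(pair) == 2 else pair[0])
        level = nxt
    return level[0]


def arrival_aware_tree(arrivals, gate_delay=1):
    """Greedy redesign: always combine the two earliest-arriving signals."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        first = heapq.heappop(heap)
        second = heapq.heappop(heap)
        heapq.heappush(heap, max(first, second) + gate_delay)
    return heap[0]


arrivals = [4, 3, 1, 0]  # Arr(a), Arr(b), Arr(c), Arr(d) from the slide
original = balanced_tree_arrival(arrivals)
redesigned = arrival_aware_tree(arrivals)
print(original, redesigned)
```

With the slide's arrival times the balanced tree gives Arr(e)=6 and the arrival-aware tree gives Arr(e)=5, because the late signal a passes through only one gate.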


Redesign Fan-out Tree
[Figure: two fan-out trees annotated with delays; Longest Path = 5 before the redesign and Longest Path = 4 after it.]
Slowdown of buffer due to load


Decomposition


Swap Commutative Pins
Simple sorting on arrival times and delay works
[Figure: a gate with commutative input pins a, b, c, annotated with arrival times and pin delays, before and after swapping.]


Move Critical Signals Forward
[Figure: a circuit with signals a, b, c, d and output e, before and after rewiring.]
• Based on ATPG
  - linear in circuit size
  - Detects redundancies efficiently
• Efficiently find wires to be added and removed.
  - Based on mandatory assignments.


Section Outline
• Introduction
• Review material (timing and synthesis)
• Introduction to placement
• Placement algorithms
• Paradigms for placement-synthesis integration
• Placement aware synthesis techniques
• Congestion avoidance / mitigation techniques
• Routing optimization


Placement Objective:
• Find optimal relative ordering of cells
  - minimize wire length and congestion
  - maximize timing slack
• Find optimal spacing of cells
  - eliminate wiring congestion problems
  - provide space for post placement synthesis
    - clock trees
    - buffer insertion
    - timing correction
• Find optimal Global Position


Optimal Relative Order:
[Figure: cells A, B, C]


To spread ...
[Figure: cells A, B, C]


.. or not to spread
[Figure: cells A, B, C]




… or to the right
[Figure: cells A, B, C]


Optimal Relative Order:
[Figure: cells A, B, C]
Without “free” space the problem is dominated by order


Placement Footprints:
[Figure panels: Standard Cell; Data Path; IP - Floorplanning]


Placement Footprints:
Mixed Data Path & sea of gates:
[Figure labels: Core, Reserved areas, IO, Control]


Placement Footprints:
[Figure panels: Perimeter IO; Area IO]


Placement objectives are subject to User Constraints / Design Style:
• Hierarchical Design Constraints
  - pin location
  - power rail
  - reserved layers
• Flat Design w/ Floor Plan constraints
• Fixed circuits
• IO connections


Unconstrained Placement


Floor planned Placement


Congestion Map


Advantages of Hierarchy
• Design is carved into smaller pieces that can be worked on in parallel (improved throughput)
• A known floor plan provides the logic design team with a large degree of placement control.
• A known floor plan provides early knowledge of long wires
• Timing closure problems can be addressed by tools, logic design, and hierarchy manipulation
• Late design changes can be done with minimal turmoil to the entire design


Disadvantages of Hierarchy
• Results depend on the quality of the hierarchy. The logic hierarchy must be designed with PD taken into account.
• Additional methodology requirements must be met to enable hierarchy, e.g., pin assignment, macro abstract management, area budgeting, floor planning, timing budgets, etc.
• Late design changes may affect multiple components.
• Hierarchy allows divergent methodologies
• Hierarchy hinders DA algorithms. They can no longer perform global optimizations.


Physical Synthesis Flow
[Flowchart:]
1. Synthesized netlist, unplaced: wire-load models, physically “unaware” timing
2. Cleanup: remove buffers, nominal power levels on gates
3. Initial “basic” placement: for minimal wire-length, min-cut
4. Logical + placement optimizations: Steiner tree estimates, physically aware timing, physically aware logic optimizations
5. Timing improvement? Yes: timing-driven placement w/ resynthesis (for minimal net weights, based on the timing of the net), then repeat. No more: placed netlist.


Logical + Placement Optimizations
• Start with a placed or unplaced netlist
• Do recursive partitioning
• During and following each partition action, apply logic optimizations such as
  - timing corrections
  - rebuffering
  - repowering
  - cloning
  - pin swapping
  - move boxes
  - … etc
[Figure labels: Bin, Cut]


Section Outline
• Introduction
• Review material (timing and synthesis)
• Introduction to placement
• Placement algorithms
• Paradigms for placement-synthesis integration
• Placement aware synthesis techniques
• Congestion avoidance / mitigation techniques
• Routing optimization


Overview of Common Placement Algorithms:
- Simulated Annealing
- Quadratic Placement
- Partitioning


Simulated Annealing:
for (temp = high; temp > absolute_zero; temp -= increment) {
    make a random move
    score the move
    use temp dependent probability to decide to accept or reject
}
Note: Clustering can be used to improve performance
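The loop above can be made concrete with a toy example. The sketch below is an assumption-laden illustration, not the tutorial's placer: cells are placed in a single row, the score is the total net span, a move swaps two cells, and the temperature is cooled geometrically rather than by a fixed increment. All names (`wirelength`, `anneal`, `cooling`, and so on) are invented for the example.

```python
import math
import random


def wirelength(order, nets):
    """Score: sum over nets of the span between leftmost and rightmost cell."""
    pos = {cell: i for i, cell in enumerate(order)}
    return sum(max(pos[c] for c in net) - min(pos[c] for c in net) for net in nets)


def anneal(order, nets, t_high=10.0, t_low=0.01, cooling=0.95,
           moves_per_temp=50, seed=0):
    rng = random.Random(seed)
    current = list(order)
    cost = wirelength(current, nets)
    best, best_cost = list(current), cost
    temp = t_high
    while temp > t_low:
        for _ in range(moves_per_temp):
            # Make a random move: swap two cells.
            i, j = rng.sample(range(len(current)), 2)
            current[i], current[j] = current[j], current[i]
            # Score the move.
            delta = wirelength(current, nets) - cost
            # Temperature-dependent probability decides accept or reject.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                cost += delta
                if cost < best_cost:
                    best, best_cost = list(current), cost
            else:
                current[i], current[j] = current[j], current[i]  # undo the swap
        temp *= cooling
    return best, best_cost


initial = ["a", "d", "b", "e", "c"]
nets = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
initial_cost = wirelength(initial, nets)
best, best_cost = anneal(initial, nets)
print(initial_cost, best_cost, best)
```

New constraints would be added to the scoring function only; the move generator stays "dumb", which is the point made on the next slide.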


Annealing:
Pros:
- ease of implementation, dumb moves / smart scoring
- can easily accommodate new constraints - just add them to the scoring function
- great quality
- can be made to run on parallel processors
Cons:
- very long run time


Quadratic Placement


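Review:
[Figure: fixed points at x=100 and x=200, with movable objects x1 and x2 connected in a chain between them.]
$$\mathrm{Cost} = (x_1 - 100)^2 + (x_1 - x_2)^2 + (x_2 - 200)^2$$
$$\frac{\partial\,\mathrm{Cost}}{\partial x_1} = 2(x_1 - 100) + 2(x_1 - x_2)$$
$$\frac{\partial\,\mathrm{Cost}}{\partial x_2} = -2(x_1 - x_2) + 2(x_2 - 200)$$
Setting the partial derivatives = 0 we solve for the minimum Cost:
$$Ax + B = 0$$
$$\begin{bmatrix} 4 & -2 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} -200 \\ -400 \end{bmatrix} = 0$$
$$\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} -100 \\ -200 \end{bmatrix} = 0$$
x1 = 400/3, x2 = 500/3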


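Review:
[Figure: fixed points at x=100 and x=200, with movable objects x1 and x2 connected in a chain between them.]
Setting the partial derivatives = 0 we solve for the minimum Cost:
$$Ax + B = 0$$
$$\begin{bmatrix} 4 & -2 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} -200 \\ -400 \end{bmatrix} = 0$$
$$\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} -100 \\ -200 \end{bmatrix} = 0$$
x1 = 400/3, x2 = 500/3
Interpretation of matrices A and B (in the reduced form, after dividing by 2):
The diagonal values A[i,i] correspond to the number of connections to xi
The off diagonal values A[i,j] are -1 if object i is connected to object j, 0 otherwise
The values B[i] correspond to the negated sum of the locations of fixed objects connected to object i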
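The interpretation of A and B suggests building the system directly from the netlist. The sketch below does that for two-pin connections and solves Ax + B = 0 exactly with rational arithmetic. It assumes the reduced (divided by 2) form, so off-diagonal entries are -1 and B holds the negated fixed locations; `build_system`, `solve`, and the pin names `p100` and `p200` are hypothetical.

```python
from fractions import Fraction


def build_system(movable, connections, fixed):
    """Build A and B of Ax + B = 0 from two-pin connections."""
    index = {name: i for i, name in enumerate(movable)}
    n = len(movable)
    A = [[Fraction(0)] * n for _ in range(n)]
    B = [Fraction(0)] * n
    for u, v in connections:
        for a, b in ((u, v), (v, u)):
            if a not in index:
                continue
            i = index[a]
            A[i][i] += 1                  # one more connection on object a
            if b in index:
                A[i][index[b]] -= 1       # connected movable objects
            else:
                B[i] -= fixed[b]          # negated location of a fixed object
    return A, B


def solve(A, B):
    """Solve Ax + B = 0 by Gauss-Jordan elimination (fine for tiny systems)."""
    n = len(B)
    M = [row[:] + [-b] for row, b in zip(A, B)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [rv - factor * cv for rv, cv in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


connections = [("p100", "x1"), ("x1", "x2"), ("x2", "p200")]
A, B = build_system(["x1", "x2"], connections, {"p100": 100, "p200": 200})
x = solve(A, B)
print(A, B, x)
```

For the review example this reproduces A = [[2, -1], [-1, 2]], B = [-100, -200], and x1 = 400/3, x2 = 500/3. A production placer would use a sparse iterative solver instead of dense elimination.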


Why formulate the problem this way?
• Because we can
• Because it is trivial to solve
• Because there is only one solution
• Because the solution is a global optimum
• Because the solution conveys “relative order” information
• Because the solution conveys “global position” information


However:
• Solution is not legal
• Solution depends on fixed anchor points
• Solution does not minimize linear wire length, congestion, or timing
• Solution is generally highly overlapping w/ high density (i.e., needs to be spread out)


What does the solution look like?
• To get an intuitive feel for the solution, examine the relaxation method for solving Ax + B = 0
• Actual program implementation may use other solution methods (that are generally less intuitive).
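One common reading of "relaxation" is Gauss-Seidel iteration, sketched below under that assumption. Row i of Ax + B = 0 says that x_i times its number of connections equals the sum of its neighbors' positions, so each sweep simply moves every movable object to the average position of the objects it connects to. The names `relax`, `neighbors`, `p100`, and `p200` are hypothetical.

```python
def relax(neighbors, fixed, sweeps=100):
    """Repeatedly move each movable object to the average of its neighbors."""
    x = {name: 0.0 for name in neighbors}
    for _ in range(sweeps):
        for name, adjacent in neighbors.items():
            total = sum(fixed[a] if a in fixed else x[a] for a in adjacent)
            x[name] = total / len(adjacent)
    return x


# The review example: p100 - x1 - x2 - p200 connected in a chain.
neighbors = {"x1": ["p100", "x2"], "x2": ["x1", "p200"]}
fixed = {"p100": 100.0, "p200": 200.0}
x = relax(neighbors, fixed)
print(x)
```

The intuition is that objects behave as if joined by springs and settle where the pulls balance, which is why the result depends on the fixed anchor points.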


Solution of Quadratic using Relaxation:




Constrained Solutions:
• Sometimes we want to solve for the minimum wire length subject to a constraint
• Example: Using quadratic for partitioning, we may want the quadratic placement to be "centered"




Constrained Solutions
To minimize Cost = f(x) subject to a constraint g(x) = 0 we can use Lagrangian multipliers to modify the Cost function as follows:
$$\mathrm{Cost} = f(x) + k\,g(x)$$
$$\frac{\partial}{\partial x}\mathrm{Cost} = \frac{\partial}{\partial x} f(x) + k\,\frac{\partial}{\partial x} g(x)$$
Using CG as a constraint:
$$CG = \frac{\sum_{i=1}^{n} s_i x_i}{\sum_{i=1}^{n} s_i}$$
where $s_i$ is the size of object i and n is the number of objects.
$$g(x) = \frac{\sum_{i=1}^{n} s_i x_i}{\sum_{i=1}^{n} s_i} - CG$$
$$\frac{\partial}{\partial x_i} g(x) = \frac{s_i}{N}$$
where we use N to represent the constant $\sum_{i=1}^{n} s_i$.
We have already shown that $\frac{\partial}{\partial x} f(x) = 0$ leads to the system of equations Ax + B = 0.
Therefore solving the constrained problem
$$\frac{\partial}{\partial x}\mathrm{Cost} = \frac{\partial}{\partial x} f(x) + k\,\frac{\partial}{\partial x} g(x) = 0$$
leads to:
$$Ax + B + k\,\frac{s_i}{N} = 0$$


Constrained Solutions (cont):
To solve $Ax + B + k\,\frac{s_i}{N} = 0$ we could use a packaged solver and add the additional unknown k and the equation $CG = \frac{\sum_{i=1}^{n} s_i x_i}{N}$ to our matrices and solve.
Here is an alternative way to solve the system: by substitution we let
$$x = x_u + k\,x_l$$
where $x_u$ is the unconstrained solution (i.e., the solution to Ax + B = 0).
Assuming we can solve the unconstrained problem, $x_u$ is known. By substitution we get:
$$A(x_u + k\,x_l) + B + k\,\frac{s_i}{N} = 0$$
which, since $A x_u + B = 0$, becomes:
$$A k\,x_l + k\,\frac{s_i}{N} = 0 \quad\text{or}\quad A x_l + \frac{s_i}{N} = 0$$


Constrained Solutions (cont):
We need to solve:
$$A x_l + \frac{s_i}{N} = 0$$
Note: The A matrix is the same as the A matrix for the unconstrained solution. Since the A matrix is the netlist connectivity specification, we have A. The B matrix here is $\frac{s_i}{N}$ instead of the sum of fixed location connects.
Interpretation:
The solution to $A x_l + \frac{s_i}{N} = 0$ can be obtained by modifying the original netlist and placement such that:
1.) All fixed objects are moved to x = 0
2.) A constant force vector is applied to each object. The constant force vector for the i’th object has magnitude $\frac{s_i}{N}$
Then use the same solver as was used to solve Ax + B = 0


Constrained Solutions (cont):
We also need to solve for k.
From the CG relationship and $x = x_u + k\,x_l$ we get:
$$CG = \frac{\sum_{i=1}^{n} s_i \left(x_{u_i} + k\,x_{l_i}\right)}{N} \quad\text{where}\quad N = \sum_{i=1}^{n} s_i \text{ (i.e., total size)}$$
Since we have solved for $x_u$ and $x_l$, the only unknown is k, and we get:
$$k = \frac{N\,CG - \sum_{i=1}^{n} s_i x_{u_i}}{\sum_{i=1}^{n} s_i x_{l_i}}$$


Constrained Solutions (summary):
To minimize f(x) (the wl squared cost function) subject to a CG constraint we do the following:
1.) Solve for $x_u$ by solving $A x_u + B = 0$ using relaxation or some other method
2.) Solve for $x_l$ in $A x_l + \frac{s_i}{N} = 0$ as follows:
  - Move all fixed objects to location = 0
  - Add a constant force vector to each object. The constant force vector for the i’th object has magnitude $\frac{s_i}{N}$
  - Using relaxation or some other method, solve for $x_l$
3.) Solve for k using
$$k = \frac{N\,CG - \sum_{i=1}^{n} s_i x_{u_i}}{\sum_{i=1}^{n} s_i x_{l_i}}$$
4.) Compute the final placement using $x = x_u + k\,x_l$
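The four steps can be sketched as follows, assuming a small dense A matrix and a Gauss-Seidel solver as the "relaxation or some other method". The function names are hypothetical, and the data is the two-object example from the review slides with sizes 10 and 100 and a target CG of 150.

```python
def gauss_seidel(A, B, sweeps=200):
    """Solve Ax + B = 0 by relaxation (assumes the iteration converges for A)."""
    n = len(B)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            off_diagonal = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = -(B[i] + off_diagonal) / A[i][i]
    return x


def center_of_gravity(sizes, x):
    return sum(s * xi for s, xi in zip(sizes, x)) / sum(sizes)


def place_with_cg(A, B, sizes, cg_target):
    total = sum(sizes)                                   # N
    x_u = gauss_seidel(A, B)                             # step 1: unconstrained
    x_l = gauss_seidel(A, [s / total for s in sizes])    # step 2: forces s_i/N
    k = ((total * cg_target - sum(s * v for s, v in zip(sizes, x_u)))
         / sum(s * v for s, v in zip(sizes, x_l)))       # step 3
    x = [u + k * l for u, l in zip(x_u, x_l)]            # step 4
    return x, x_u, x_l, k


A = [[2.0, -1.0], [-1.0, 2.0]]
B = [-100.0, -200.0]
sizes = [10.0, 100.0]
x, x_u, x_l, k = place_with_cg(A, B, sizes, cg_target=150.0)
cg_unconstrained = center_of_gravity(sizes, x_u)
cg_final = center_of_gravity(sizes, x)
print(x, cg_unconstrained, cg_final)
```

Note that `x_u` and `x_l` do not depend on the target, so trying another CG only repeats steps 3 and 4.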


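Review: Force CG to 150
[Figure: fixed points at x=100 and x=200; movable objects x1 (s=10) and x2 (s=100) connected in a chain between them.]
From the previous example we know that the solution to $A x_u + B = 0$ is
$$x_u = \begin{bmatrix} 133.33 \\ 166.67 \end{bmatrix}$$
With this solution the CG is at
$$\frac{(10 \cdot 133.33) + (100 \cdot 166.67)}{110} = 163.64$$
(i.e., not 150).
Now we need to solve $A x_l + \frac{s_i}{N} = 0$, which is the same as solving
$$A x_l + \begin{bmatrix} 0 \\ 0 \end{bmatrix} + \frac{s_i}{N} = 0$$
Recall that the B matrix represents the position of fixed objects. So, this equation represents the solution to:
[Figure: both fixed points moved to x = 0, with constant forces 10/110 on x1 and 100/110 on x2.]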


Constrained Solutions (summary):
Advantages of this approach:
1.) The solver data structure is the netlist only, i.e., no additional memory requirements
2.) Sometimes the unconstrained solution is by itself sufficient, therefore we can avoid the additional overhead of producing the constrained solution
3.) The numerical iterations in this method are NOT dependent on the CG. We can solve for xu and xl, then try many different CG points at very low cost.


Quadratic Techniques:
Pros:
- mathematically well behaved
- efficient solution techniques find global optimum
- great quality
Cons:
- solution of Ax + B = 0 is not a legal placement, so generally some additional partitioning techniques are required.
- solution of Ax + B = 0 is that of the "mapped" problem, i.e., nets are represented as cliques, and the solution minimizes wire length squared, not linear wire length unless additional methods are deployed
- fixed IOs are required for these techniques to work well


Partitioning


Partitioning:
Objective:
Given a set of interconnected blocks, produce two sets that are of equal size, and such that the number of nets connecting the two sets is minimized.


FM Partitioning:
list_of_sets = entire_chip;
while (any_set_has_2_or_more_objects(list_of_sets)) {
    for_each_set_in(list_of_sets) {
        partition_it();
    }
    /* each time through this loop the number of */
    /* sets in the list doubles. */
}
[Figure panels: Initial Random Placement; After Cut 1; After Cut 2]


FM Partitioning:
Moves are made based on object gain.
Object Gain: The amount of change in cut crossings that will occur if an object is moved from its current partition into the other partition
- each object is assigned a gain
- objects are put into a sorted gain list
- the object with the highest gain from the smaller of the two sides is selected and moved.
- the moved object is "locked"
- gains of "touched" objects are recomputed
- gain lists are resorted
[Figure: example netlist split by a cut line, each object annotated with its gain.]
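The gain definition can be checked with a small sketch. It assumes nets are tuples of cell names and `side` maps each cell to partition 0 or 1; the names `cut_size` and `gain` and the example netlist are hypothetical. A net adds +1 to a cell's gain when the cell is alone on its side of that net (moving it uncuts the net) and -1 when the whole net is on the cell's side (moving it cuts the net).

```python
def cut_size(nets, side):
    """Number of nets with cells on both sides of the cut."""
    return sum(1 for net in nets if len({side[c] for c in net}) > 1)


def gain(cell, nets, side):
    """Reduction in cut crossings if `cell` moves to the other partition."""
    g = 0
    for net in nets:
        if cell not in net or len(net) < 2:
            continue
        same_side = sum(1 for c in net if side[c] == side[cell])
        if same_side == 1:
            g += 1          # cell is alone on its side: the net becomes uncut
        elif same_side == len(net):
            g -= 1          # net is entirely on this side: the net becomes cut
    return g


nets = [("a", "b"), ("b", "c", "d"), ("c", "d"), ("a", "d")]
side = {"a": 0, "b": 0, "c": 1, "d": 1}
gains = {cell: gain(cell, nets, side) for cell in side}
print(cut_size(nets, side), gains)
```

A full FM pass would keep these gains in bucket lists, lock each moved cell, and update only the gains of cells on nets touched by the move; recomputing from scratch as above is for illustration only.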


FM Partitioning:
[Figure sequence: step-by-step FM example in which objects are moved and locked one at a time and the gains of the touched objects are updated after each move.]


Partitioning:
Pros:
- very fast
- great quality
- scales nearly linearly with problem size
Cons:
- non-trivial to implement
- very directed algorithm, but this limits the ability to deal with miscellaneous constraints


FM Partitioning
- For large designs min-cut (FM) produces poor results
To compensate, there are two widely used enhancements:
1.) Quadratic seeding
2.) Multi-Level partitioning


Partitioning:
[Figure: cells on either side of a cut line, with candidate moves labeled move1 through move4.]


Global Placement - Multi-Level Partitioning:
generate clusters;
while (there are clusters) {
    partition_it;
    remove 1 cluster layer;
}
partition_it;
[Figure sequence: animation of multi-level partitioning on an example netlist, with gains shown on each object and moves labeled move1 through move4 as cluster layers are removed.]


MLP/FM Partitioning Cons:
• Does not know how to handle “free” space
• Results tend to be erratic, i.e., results from run to run have significant variation


MLP/FM Partitioning Pros:
• Handles designs that have no fixed connection points
• Very fast - can handle large designs


Hybrid Techniques
• Use both MLP and Quadratic techniques
• Results are more predictable due to quadratic cost function
• Partitioning is used for overlap removal
• Quadratic is used for “free” space handling and some relative order indications


Quadratic Partitioning




Analytical Constraint Generation
• Combine Quadratic techniques with MLP
• Use Quadratic solution to determine global position (i.e., balance)
• Use MLP to determine relative ordering of cells


Analytical Constraint Generation
[Figure: two bins of Capacity = 2 each and cells of Area = 1, comparing the quadratic solution, a poor solution, and the ACG solution obtained with the analytical constraint.]


Analytical Constraint Generation




MLP w/ ACG


Global Route Results:


MLP w/o ACG


Side by Side Comparison:
[Figure panels: Original; ACG]




Observations on Quadratic Placement
• placements are predictable and repeatable
• timing is inherently better
• wire length is not the best, but good
• run time: slower than MLP by 4x
• run time: faster than annealing by 4x
• excellent “free space” handling
• placements “feel” similar to those produced by annealing


Repeatability Example:
• One circuit
• Minimum linear length occurs for all solutions where y=50, 0 < x < 100
• Minimum quadratic length occurs for y=50, x=50
• Quadratic solution IS both minimum linear and minimum quadratic length
[Figure: one movable circuit connected to two fixed points, labeled (0,50) and (0,100) on the slide.]
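The repeatability argument can be checked numerically. The sketch below assumes the two fixed points sit at (0,50) and (100,50), which is what the stated ranges for x and y imply; that placement of the fixed points, and the function names, are assumptions rather than something the slide states outright.

```python
# Assumed fixed pin locations; the movable circuit connects to both.
fixed = [(0, 50), (100, 50)]


def linear_length(x, y):
    """Manhattan wire length from (x, y) to every fixed point."""
    return sum(abs(x - fx) + abs(y - fy) for fx, fy in fixed)


def quadratic_length(x, y):
    """Squared Euclidean wire length from (x, y) to every fixed point."""
    return sum((x - fx) ** 2 + (y - fy) ** 2 for fx, fy in fixed)


candidates = list(range(0, 101, 10))
linear = [linear_length(x, 50) for x in candidates]
best_x = min(candidates, key=lambda x: quadratic_length(x, 50))
print(linear, best_x)
```

Every candidate on the line y=50 has the same linear length, so a linear objective can return any of them from run to run, while the quadratic objective singles out x=50.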


Section Outline
• Introduction
• Review material (timing and synthesis)
• Introduction to placement
• Placement algorithms
• Paradigms for placement-synthesis integration
• Placement aware synthesis techniques
• Congestion avoidance / mitigation techniques
• Routing optimization


Synthesis - Placement Interface


Partitioning Algorithm: Partition & Reflow
[Flowchart:]
Read Data
Preprocessing Netlist
Global Placement: While ( any partition has > 2 cells )
    Divide each Partition
    Reflow across partitions
    Synthesis
Detailed Placement
Done


What Synthesis Can Do when Invoked:
- add boxes
- delete boxes
- add nets
- delete nets
- reconnect nets
- change box sizes
- query placement locations of boxes
- query "bin" statistics
- remove a box from a bin
- add a box to a bin


Placement and Synthesis Integration
• Loosely coupled: (methodology coupling)
  - do some synthesis, then write out data
  - do some placement, then write out data
  - .. Repeat
• Interleaved: (placement & synthesis in same process)
  - do pre-pd synthesis
    - for each placement step redo synthesis
• Tightly coupled: (simultaneous P&S aware transforms)


Loosely Coupled Placement & Synthesis:
Characteristics:
- Placement is treated as a black box
- Multiple placement runs are made
Flow: Do Placement, then Analyze, then check "Meet Objectives?"
- No: re-synthesize, generate constraints, and repeat
- Yes: Done w/ placement


Interleaved Placement & Synthesis:
Characteristics:
- the placement flow is the same as in a placement-only methodology
- in between each step of the placement progression, synthesis is invoked
(Figure: placement steps with a Synthesis call between each step.)


Tightly Coupled
• Placement and synthesis algorithms become co-dependent
• Placement algorithms have awareness of synthesis activity
• Synthesis algorithms have awareness of placement activity


Section Outline• Introduction• Review material (timing and synthesis)• Introduction to placement• Placement algorithms• Paradigms for placement-synthesisintegration• Placement aware synthesis techniques• Congestion avoidance / mitigation techniques• Routing optimizationJune 2002 DAC02 - Physical Chip Implementation 166


Placement Driven CloningcriticalCloning to off-loadnon-critical pathfrom critical pathnon-criticalJune 2002 DAC02 - Physical Chip Implementation 167


Placement Driven ExpansionLogicAOLogicExpansionTransformationLogicLogicExpansion allows primitives to beplaced in a more timing friendlywayLogicLogicJune 2002 DAC02 - Physical Chip Implementation 168


Example:Tightly Coupled PlacementDriven ExpansionJune 2002 DAC02 - Physical Chip Implementation 169


Tightly Coupled Synthesis & Placement:abcdefgTransformadbefcgJune 2002 DAC02 - Physical Chip Implementation 170


Tightly Coupled Synthesis & Placement Example:Suppose the primary IO constraints look like this:cabfgJune 2002 DAC02 - Physical Chip Implementation 171de


Tightly Coupled Synthesis & Placement Example:The placement of the synthesized netlist would look something like this:cabfgJune 2002 DAC02 - Physical Chip Implementation 172de


Tightly Coupled Synthesis & Placement Example:If we could re-synthesize the netlist, we could get something that looks like this.cabfgJune 2002 DAC02 - Physical Chip Implementation 173de


Tightly Coupled Synthesis & Placement:weight = 1abcabcweight = 1/10defgPD-MAPdefgJune 2002 DAC02 - Physical Chip Implementation 174


Tightly Coupled Synthesis & Placement example:
Map_TREE
For each cut
    partition_it
    For each partition
        If (partition number > M) {
            if (related_node_count < N)
                merge_nodes
            if (related_node_count == 1)
                merge_node into neighbor partition
        }
    end
end


Tightly Coupled Synthesis & Placement Example:cabfgJune 2002 DAC02 - Physical Chip Implementation 176de


Tightly Coupled Synthesis & Placement Example:cabfgJune 2002 DAC02 - Physical Chip Implementation 177de


Tightly Coupled Synthesis & Placement Example:cabfgJune 2002 DAC02 - Physical Chip Implementation 178de


Tightly Coupled Synthesis & Placement Example:cabfgJune 2002 DAC02 - Physical Chip Implementation 179de


Tightly Coupled Synthesis & Placement Example:cabfgJune 2002 DAC02 - Physical Chip Implementation 180de


Tightly Coupled Synthesis & Placement Example:cabfgJune 2002 DAC02 - Physical Chip Implementation 181de


Tightly Coupled Synthesis & Placement Example:abcabcfgResultfgd ed eJune 2002 DAC02 - Physical Chip Implementation 182


Placement Driven Timing CorrectionJune 2002 DAC02 - Physical Chip Implementation 183


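Redesign Fan-in Tree
Input arrival times: Arr(a)=4, Arr(b)=3, Arr(c)=1, Arr(d)=0; each gate has delay 1.
(Figure: original tree over inputs a, b, c, d gives Arr(e)=6; the redesigned tree combines c and d first, then b, then a, and gives Arr(e)=5.)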
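The slide's numbers can be reproduced with a small sketch. It assumes two-input gates with unit delay and that the original structure pairs (a,b) and (c,d) before combining them; the greedy "combine the two earliest signals first" rule used here is one common heuristic, not necessarily the transform the authors' tool applies, and the function names are hypothetical.

```python
import heapq

def balanced_arrival(arrivals, gate_delay=1):
    """Arrival time at the root when inputs are paired in the given order, level by level."""
    level = list(arrivals)
    while len(level) > 1:
        nxt = [max(level[i], level[i + 1]) + gate_delay
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an odd input passes through to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]

def greedy_arrival(arrivals, gate_delay=1):
    """Arrival time at the root when the two earliest signals are always combined first."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        first = heapq.heappop(heap)
        second = heapq.heappop(heap)
        heapq.heappush(heap, max(first, second) + gate_delay)
    return heap[0]

if __name__ == "__main__":
    arr = [4, 3, 1, 0]              # Arr(a), Arr(b), Arr(c), Arr(d) from the slide
    print("balanced:", balanced_arrival(arr))
    print("redesigned:", greedy_arrival(arr))
```

Late-arriving inputs end up close to the root, so they pass through fewer gates.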


Placement Driven Repowering• Repowering is traditionally done using loadbased cell characterization• Placement changes continuously duringpartitioning• Need high efficiency algorithms to dorepowering in this environment• Solution: Use Gain Based FormulationJune 2002 DAC02 - Physical Chip Implementation 185


Delay Models
Load based formulation: delay expressed in terms of the output load $C_{out}$ (the slide's expression involves $k_1$, $c_1$, $E$, $(1+\beta)$, $C_{inv}$ and $p$).
Gain based formulation: $g = C_{out} / C_{in}$, $d = l \cdot g + p$
d: delay
l: logical effort
g: gain
p: intrinsic delay


Area vs Delay Centric
• Load Based Paradigm (load-based delay eq.): sized, area centric
  - Know: size of each cell; total area
  - Don’t know: wire loads; delay of each cell; delay of a path
  - Estimation error is in the delay: local ‘path based’ property
• Gain Based Paradigm (gain-based delay eq.): sizeless, delay centric
  - Know: the delay of each cell; the delay of a path
  - Don’t know: wire loads; the area of each cell; the total area
  - Estimation error is in the area: global property


Design FlowHigh Level SynthesisLoadBasedDelay(DCL)LibraryAnalysisRestructuringTech MappingGainBased OptGainBasedDelayDiscretizationLate Timing CorrJune 2002 DAC02 - Physical Chip Implementation 188


Power Levels (Gate Sizes)
(Figure: delay d versus output load C_out, from 0 to 1, for power levels A, B and C.)


Library (Gain) Analysis
$g = C_{out} / C_{in}$, $d = l \cdot g + p$
(Figure: delay d versus gain g, from 0.5 to 10.5, for power levels A, B and C.)


Area and Load Calculation
(Figure: chain of three gates with gains $g_1$, $g_2$, $g_3$ driving $C_{out} = 0.71$.)
$C_3 = C_{out} / g_3$, $C_2 = C_3 / g_2$, $C_1 = C_2 / g_1$
• Start at primary outputs / register inputs.
• Much like static timing analysis.
• Incremental.
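The backward load calculation can be sketched in a few lines. This is a minimal illustration for a single chain of gates, assuming the gain of each stage is already fixed; the function name and the example gains are hypothetical, and a real netlist would need a reverse topological traversal with fanout loads summed at each net.

```python
def input_caps(gains, c_out):
    """Propagate load backwards through a chain: C_i = C_(i+1) / g_i.

    gains[0] belongs to the first gate in the chain, gains[-1] to the gate
    that drives c_out. Returns the input capacitance of each gate, first to last.
    """
    caps = []
    load = c_out
    for gain in reversed(gains):
        load = load / gain      # input cap of this gate = its load / its gain
        caps.append(load)
    caps.reverse()
    return caps

if __name__ == "__main__":
    # Illustrative gains; only C_out = 0.71 comes from the slide.
    print(input_caps([3.0, 3.0, 3.0], 0.71))
```

Because each capacitance depends only on what is downstream, a change to one gate's gain only requires recomputing the gates upstream of it, which is what makes the calculation incremental.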


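Gain Calculation
(Figure: chain of four gates with delays $d_1 \dots d_4$, input capacitance $C_{in}$, internal nodes $C_2$, $C_3$, $C_4$, and load $C_{out}$.)
$G = \prod_{i=1}^{N} g_i = g_1 g_2 g_3 g_4 = \frac{C_2}{C_{in}} \cdot \frac{C_3}{C_2} \cdot \frac{C_4}{C_3} \cdot \frac{C_{out}}{C_4} = \frac{C_{out}}{C_{in}}$
$D = \sum_{i=1}^{N} d_i = d_1 + d_2 + d_3 + d_4$
$d = l \cdot g + p = f + p$
Minimize $D$ such that $G = C_{out} / C_{in}$, that is:
Minimize $D = \sum_{i=1}^{N} f_i + \sum_{i=1}^{N} p_i$ such that $F = \prod_{i=1}^{N} f_i = \prod_{i=1}^{N} l_i \cdot \prod_{i=1}^{N} g_i = L \cdot \frac{C_{out}}{C_{in}}$
Geometric solution: $f_i = f$ for every stage, so $f^N = F$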


Example I
(Figure: NAND2, NOR2, INV chain with delays $d_1$, $d_2$, $d_3$; $C_{in} = 0.19$, $C_{out} = 0.71$.)
NAND2: $d = 0.0308 + 0.011 \times C_2 / C_{in}$
NOR2: $d = 0.0496 + 0.021 \times C_3 / C_2$
INV: $d = 0.0295 + 0.009 \times C_{out} / C_3$
$F = f_1 \cdot f_2 \cdot f_3 = f^3 = L \cdot C_{out} / C_{in} = 0.000008$
$f = 0.0203$

        Nand2    Nor2     Inv      Path
p       0.0308   0.0496   0.0295   0.1099
f       0.0203   0.0203   0.0203   0.0609
d       0.0511   0.0699   0.0498   0.1708
Cin     0.19     0.3364   0.3283   .
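The equal-effort ("geometric") solution can be sketched as follows. The per-gate slope and intrinsic values are the ones printed on the slide; the function name and return layout are hypothetical, and because the slide's table is rounded, the computed digits may not match it exactly.

```python
import math

def equal_effort_sizing(slopes, intrinsics, c_in, c_out):
    """Size a chain so that every stage has the same effort delay f.

    slopes[i] is l_i in d_i = l_i * g_i + p_i, intrinsics[i] is p_i.
    Returns (f, input capacitance of each stage, delay of each stage).
    """
    n = len(slopes)
    total_effort = math.prod(slopes) * c_out / c_in   # F = L * Cout / Cin
    f = total_effort ** (1.0 / n)                     # f_i = f, with f**N = F
    caps = [c_in]
    for slope in slopes[:-1]:
        gain = f / slope                              # g_i = f / l_i
        caps.append(caps[-1] * gain)                  # C_(i+1) = g_i * C_i
    delays = [f + p for p in intrinsics]
    return f, caps, delays

if __name__ == "__main__":
    f, caps, delays = equal_effort_sizing(
        slopes=[0.011, 0.021, 0.009],        # NAND2, NOR2, INV (from the slide)
        intrinsics=[0.0308, 0.0496, 0.0295],
        c_in=0.19, c_out=0.71)
    print("f =", f)
    print("Cin per stage =", caps)
    print("path delay =", sum(delays))
```

Since the intrinsic delays are constants, only the sum of the effort delays can be optimized, and for a fixed product that sum is smallest when all terms are equal.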


Constant Delay Calculation
(Figure: delay $d_c$ versus $C_{out}$, with gain $g_c$.)
Inverter: $d = l \cdot g + p$
Set: $g_c = 3.6$
Calculate: $C_{out} = g_c \cdot C_{in}$
Measure: $d_c$
Set: $C_{out} = 0$
Measure: $d_0 = p$
Calculate: $l \cdot g_c = d_c - p = f_c$
Other gates: $l \cdot g = f_c$
$g_{nand} = 2.5$
$g_{nor} = 1.8$
$d_{c,nand} = l_{nand} \cdot g_{nand} + p_{nand}$


Discretization
• From gain-based model back to appropriate power levels
• There is an error in timing/load when ‘ideal’ power levels are not available.
  - Goal: Minimize this error.
  - Can be tuned to delay error or capacitance error.
(Figure: chain of three gates with gains $g_1$, $g_2$, $g_3$ driving $C_{out} = 0.71$; $d = l \cdot g + p$; $C_3 = C_{out} / g_3$, $C_2 = C_3 / g_2$, $C_1 = C_2 / g_1$.)
[Kudva98] [Beeftink98]


Gain Based: Observations:
• Gain based algorithms: a major improvement.
  - More homogeneous (global) algorithms and designs.
  - Can be better targeted for area and/or delay.
• Reveal inherent cell characteristics to optimization tools, leading to improved QOR
• Good library design is required to facilitate the discretization step
• Ideally suited for operation within Physical Synthesis


Placement Driven BufferingRip Outall BuffersInsert Buffers basedon placement infoJune 2002 DAC02 - Physical Chip Implementation 197


What to do About Long Wires?• Add buffers• Tune wire sizes• Modify the placement to reduce themJune 2002 DAC02 - Physical Chip Implementation 198


Placement Driven vs LogicDriven Buffer Insertion• Logic driven buffer insertion focuses on logictopology and buffer sizing while assuming astatistical wire load model• Placement driven buffering uses an existingplacement as the fundamental constraintJune 2002 DAC02 - Physical Chip Implementation 199


Placement Driven BufferInsertion: Buffopt (IBM)• Multiple buffer types• Inverters• Capacitance, Slew and Noise constraints• Wire Sizing• Simultaneous driver sizing• High order interconnect delay and Ceffective• Blockage handlingJune 2002 DAC02 - Physical Chip Implementation 200


How Do Buffers Help?
• Reduce delay
  - Wire delay is quadratic in length
  - Buffers make delay essentially linear
  - Delay becomes gate dominated, not wire dominated
• Fix other problems
  - Bad slews at sinks
  - Capacitance range violations
  - Noise induced by capacitance coupling
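A back-of-envelope derivation of the quadratic-to-linear claim, under simplifying assumptions that are not from the slides: a uniform wire of length $L$ with resistance $r$ and capacitance $c$ per unit length, and $k$ equally spaced ideal buffers, each adding a fixed delay $t_b$ (buffer input capacitance and drive resistance are ignored).

```latex
\begin{align*}
T_{\text{unbuffered}} &= \tfrac{1}{2}\, r c L^{2} \\
T_{\text{buffered}}(k) &= (k+1)\cdot \tfrac{1}{2}\, r c \left(\frac{L}{k+1}\right)^{2} + k\, t_b
                        = \frac{r c L^{2}}{2(k+1)} + k\, t_b \\
\frac{dT_{\text{buffered}}}{dk} = 0 \;&\Rightarrow\; k + 1 = L \sqrt{\frac{r c}{2\, t_b}} \\
T_{\text{buffered}}^{\min} &= L \sqrt{2\, r c\, t_b} - t_b
\end{align*}
```

With the best buffer count, the delay grows linearly in $L$ and the buffer delay $t_b$ sets the slope, which is the sense in which delay becomes gate dominated.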


How Does Wire Sizing Help?
• Highly resistive lines increase delay
• Wider wires or thick metal layers reduce resistance, but can increase capacitance
• For long interconnect, the resistance reduction outweighs the capacitance increase


Simple Buffer Insertion Problem
Given: Source and sink locations, sink capacitances and RATs, a buffer type, source delay rules, unit wire resistance and capacitance
(Figure: source $s_0$ driving four sinks with required arrival times RAT 1 through RAT 4, and a buffer.)


Simple Buffer Insertion Problem
Find: Buffer locations and a routing tree such that slack at the source is maximized
$q(s_0) = \min_{1 \le i \le 4} \{ RAT(s_i) - delay(s_0, s_i) \}$
(Figure: source $s_0$ driving four sinks with required arrival times RAT 1 through RAT 4.)


Fundamental Buffer Insertion
• Van Ginneken’s dynamic programming algorithm
• Building block: candidate (Cap, slack)
  - Candidates for each node stored as a list
  - Each sink has one candidate
  - Propagate candidates up the tree
• Guarantees optimal solution
• Quadratic complexity


Assumptions for the Basic Van Ginneken algorithm:
• Given a routing tree
• Given a set of potential insertion points
• Single buffer size
• No sink or driver sizing
• Linear gate delay model
  - $R_d C_{down} + K_d$
• Elmore wire delay model
  - $R_w (C_w / 2 + C_{down})$


Van Ginneken Extensions• Multiple buffer types• Inverters• Capacitance, Slew and Noise constraints• Wire Sizing• Simultaneous driver sizing• High order interconnect delay and Ceffective• Blockage recognitionJune 2002 DAC02 - Physical Chip Implementation 207


Example
- Connect the end points of the net using a Steiner route
- Add candidate nodes
- Final buffer solution is optimal for this route and this set of candidate nodes.
- Other routes may produce better final solutions.
- Net routing topology is an input to Van Ginneken’s algorithm


ExampleJune 2002 DAC02 - Physical Chip Implementation 209


How Many Candidates?
• Number of candidates seems to double with each additional node
• Prune a candidate when another candidate has better slack and its own capacitance is greater or equal
• Pruning leaves a linear number of candidates


Pseudo Code:
(Figure: nodes 6 5 4 3 2 1 along the route.)
List = NULL;
For each node (bottom up traversal of graph)
{
    augment each item in list with wire segment up to node
    duplicate the list
    for each element of the duplicate list
        add a buffer at node
    analyze each element in list
    analyze each element in buffered (duplicate) list
    pick best element of buffered list and delete the rest
    new list is union of list and “best” element of buffered list
}
Pick best solution;
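The pseudo code can be turned into a runnable sketch for the simplest case: a two-pin net with one candidate buffer location per node, a single buffer type, the linear gate delay model and the Elmore wire model from the assumptions slide. All parameter values and function names below are illustrative assumptions; a full implementation would also merge candidate lists at branch points.

```python
def add_wire(cands, r_w, c_w):
    """Extend each (cap, slack, buffers) candidate by one wire segment (Elmore delay)."""
    return [(c + c_w, q - r_w * (c_w / 2.0 + c), bufs) for c, q, bufs in cands]

def add_buffer(cands, node, r_b, c_b, k_b):
    """Place a buffer at `node`: delay r_b * c_down + k_b, upstream sees only c_b."""
    return [(c_b, q - (r_b * c + k_b), bufs + (node,)) for c, q, bufs in cands]

def prune(cands):
    """Keep only candidates not dominated by one with <= cap and >= slack."""
    kept, best_q = [], float("-inf")
    for c, q, bufs in sorted(cands, key=lambda t: (t[0], -t[1])):
        if q > best_q:
            kept.append((c, q, bufs))
            best_q = q
    return kept

def buffer_line(n_nodes, sink_cap, sink_rat, r_w, c_w, r_b, c_b, k_b, r_d, k_d):
    """Bottom-up pass from sink (node 1) to source; returns (best slack, buffer nodes)."""
    cands = [(sink_cap, sink_rat, ())]
    for node in range(1, n_nodes + 1):
        cands = add_wire(cands, r_w, c_w)
        buffered = add_buffer(cands, node, r_b, c_b, k_b)
        best_buffered = max(buffered, key=lambda t: t[1])  # all share cap c_b
        cands = prune(cands + [best_buffered])
    cands = add_wire(cands, r_w, c_w)                      # last segment to the driver
    return max(((q - (r_d * c + k_d), bufs) for c, q, bufs in cands),
               key=lambda t: t[0])

def slack_for(buffered_nodes, n_nodes, sink_cap, sink_rat,
              r_w, c_w, r_b, c_b, k_b, r_d, k_d):
    """Directly evaluate one fixed buffer assignment (used to cross-check the DP)."""
    c, q = sink_cap, sink_rat
    for node in range(1, n_nodes + 1):
        q -= r_w * (c_w / 2.0 + c)
        c += c_w
        if node in buffered_nodes:
            q -= r_b * c + k_b
            c = c_b
    q -= r_w * (c_w / 2.0 + c)
    c += c_w
    return q - (r_d * c + k_d)

# Illustrative technology numbers (assumptions, not from the slides).
EXAMPLE = dict(sink_cap=1.0, sink_rat=100.0, r_w=2.0, c_w=2.0,
               r_b=1.0, c_b=0.5, k_b=1.0, r_d=1.0, k_d=0.0)

if __name__ == "__main__":
    print(buffer_line(6, **EXAMPLE))
```

Every buffered candidate at a node has the same capacitance (the buffer input), so only the one with the best slack can survive, which is why a single element of the buffered list is kept per node.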


Example
(Figure: nodes 6 5 4 3 2 1 along the route.)
• Node 1 processing: 2 evaluations, at most 2 candidates kept
• Node 2 processing: 4 evaluations, at most 3 candidates kept
• Node 3 processing: 6 evaluations, at most 4 candidates kept
• Node 4 processing: 8 evaluations, at most 5 candidates kept
• Node 5 processing: 10 evaluations, at most 6 candidates kept
• Node 6 processing: 12 evaluations, at most 7 candidates kept
  - Now pick the best one: optimal solution
$Num\_evaluations = 2 \sum_{i=1}^{n} i = n(n+1)$, $Num\_candidates \le n + 1$


Merging BranchesCriticalMerge is additiveJune 2002 DAC02 - Physical Chip Implementation 213


Van Ginneken Algorithm Summary
• Good
  - Clever pruning controls # of candidates
  - Finds an optimal solution in quadratic time
  - Easily extended to cover a variety of important considerations (like multiple buffer types, wire sizing, polarity, slew, and capacitance constraints, etc.)
• Bad
  - Results depend on quality of route provided


Example Route:Critical: can not offloaddue to routeDifferent route leadsto better solutionJune 2002 DAC02 - Physical Chip Implementation 215


Physical Synthesis FlowWire-load ModelsSynthesized NetlistUnplacedPhysically “unaware” timingCleanup: Remove buffers,nominal power levels on gatesInitial “basic” placementFor minimal wire-length, min-cut,Steiner tree estimates, physicallyaware timingLogical + Placement optimizationsPhysically aware logicoptimizationsYesTimingNo moreImprovement?Placed NetlistTiming-driven placementw/resynthesisFor minimal netweights, based onthe timing of the netJune 2002 DAC02 - Physical Chip Implementation 216


Example Route:If still critical,add net weightJune 2002 DAC02 - Physical Chip Implementation 217


Example Route:June 2002 DAC02 - Physical Chip Implementation 218


Multiple Buffer Types
• Instead of one buffer type, can choose from m power levels
• Generate m candidates instead of one
• Still optimal
• Complexity increase is quadratic in m


Inverters
• Store candidates in “+” and “-” lists
  - + implies polarity preserved
  - - implies polarity reversed
• Adding an inverter
  - Switches a candidate in the + list to the - list
  - Switches a candidate in the - list to the + list
• Final result only chosen from the + list


Capacitance Constraints• Each gate g can drive at most C(g) capacitance• When inserting buffer g, check downstreamcapacitance.• If it is bigger than C(g), throw out candidate• Increases efficiencyJune 2002 DAC02 - Physical Chip Implementation 221


Slew Constraints• Similar to capacitance constraints• When inserting buffer, compute slews togates driven by buffer• If any slew exceeds its target, throw outcandidate• Potential difficulty: computing slew accuratelyin bottom-up fashionJune 2002 DAC02 - Physical Chip Implementation 222


Noise Constraints• Each gate has acceptable noise threshold• Compute cumulative noise for each wire viaDevgan noise metric• Throw out candidates that violate noiseCan avoid noise while optimizing timing!June 2002 DAC02 - Physical Chip Implementation 223


Wire Sizing:
(Figure: nodes 6 5 4 3 2 1 along the route.)
For each node (bottom up traversal of graph)
{
    for each Wire Size
    {
        augment each item in list with Sized wire segment
        duplicate the list
        for each element of the duplicate list
            add a buffer at node
        analyze each element in list
        analyze each element in buffered (duplicate) list
        pick best element of buffered list and delete the rest
        new list is union of list and “best” element of buffered list
    }
}
Do Final pruning & Pick best solution;


Blockage RecognitionDelete insertionpoints that runover blockagesJune 2002 DAC02 - Physical Chip Implementation 225


Route Around BlockageJune 2002 DAC02 - Physical Chip Implementation 226


Buffer BaysJune 2002 DAC02 - Physical Chip Implementation 227


Routing Into Buffer BaysJune 2002 DAC02 - Physical Chip Implementation 228


“Buffer Site”
• Similar to buffer bays, only exact buffer locations are pre-specified, not just areas
• Useful as a mechanism for IP blocks and microprocessor design
• Dummy cell that holds a buffer
• Not connected to any net
• Becomes a buffer when assigned to a net
• Extra sites → decoupling caps
• Sprinkle sites throughout the design
• Allocate a percentage within macros


Routing Into Buffer SitesJune 2002 DAC02 - Physical Chip Implementation 230


Generate Steiner TreeJune 2002 DAC02 - Physical Chip Implementation 231


Reduce Congestion andCouplingJune 2002 DAC02 - Physical Chip Implementation 232


Reduce Congestion andCouplingJune 2002 DAC02 - Physical Chip Implementation 233


Assign BuffersJune 2002 DAC02 - Physical Chip Implementation 234


Comments about Buffering andWire Sizing:• Extremely critical: One of the highestleverage timing closure items• There are extended provably correctalgorithms for dealing with the problem.• Steiner route & Blockage avoidance aremostly heuristic: Hot research area!June 2002 DAC02 - Physical Chip Implementation 235


Section Outline• Introduction• Review material (timing and synthesis)• Introduction to placement• Placement algorithms• Paradigms for placement-synthesisintegration• Placement aware synthesis techniques• Congestion avoidance / mitigation techniques• Routing OptimizationJune 2002 DAC02 - Physical Chip Implementation 236


Congestion MitigationJune 2002 DAC02 - Physical Chip Implementation 237


Sources of Congestion
• Placement quality: Do we have a good relative ordering of cells?
• Placement density: Do we have appropriate cell spreading?
• Preplacement of large cells: Is there a better location for these cells?
• Floorplan quality: Is this a good floorplan / hierarchy?
• Netlist complexity: Are some logic groupings inherently difficult to route?
• Library characteristics: Do some cells block too much metal internally?


Congestion Mitigation
• Constructive avoidance
  - control global placement pin density: fewer pins per unit area means fewer wires per unit area
  - monitor congestion during placement and perform dynamic spreading
• Post placement fix up
  - remove problems from an already placed netlist


Constructive Avoidance:Characteristics:- as placement is formed, take action toavoid problems- between each step of the placementprogression there is the potential toevaluate congestion and take actionGroute / Spread / RedoGroute / Spread / RedoGroute / Spread / Redo… etcJune 2002 DAC02 - Physical Chip Implementation 240


Constructive Avoidance Deficiencies:
• Depends on early estimates of congestion that may not be accurate enough to avoid all problems
• Post placement actions such as clock tree insertion, repowering, buffering, etc. may add congestion to the design
• Guard banding with conservative “constructive avoidance” causes loss of performance and density


Post Placement CongestionMitigation• Use production global router, not internalplacement based global router• Translate congestion values into densitytargets for placement regions• Perform flow based circuit spreading• Preserve relative logic ordering of cellsJune 2002 DAC02 - Physical Chip Implementation 242


Network Flow based Spreading
Min-cost max-flow formulation
(Figure: source s, supply nodes i with b(i) > 0, demand nodes j with b(j) < 0, sink t.)
$\forall i$ with $b(i) > 0$: $Cap(e_{si}) = b(i)$, $Cost(e_{si}) = 0$
$\forall i \ne s, j \ne t$: $Cap(e_{ij}) = \infty$ (large int), $Cost(e_{ij}) = K$
$\forall j$ with $b(j) < 0$: $Cap(e_{jt}) = -b(j)$, $Cost(e_{jt}) = 0$
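The formulation above can be exercised on a toy instance. The sketch below is a plain successive-shortest-path min-cost max-flow solver written for illustration, not the production spreader; the one-dimensional row of four bins, the choice of connecting only neighboring bins, and all names are assumptions.

```python
INF = float("inf")
BIG = 10 ** 9   # stands in for the "Infinity (large int)" capacity

def min_cost_max_flow(n, edges, s, t):
    """Successive shortest paths with Bellman-Ford; edges are (u, v, cap, cost)."""
    graph = [[] for _ in range(n)]
    to, cap, cost = [], [], []
    for u, v, c, w in edges:
        graph[u].append(len(to)); to.append(v); cap.append(c); cost.append(w)
        graph[v].append(len(to)); to.append(u); cap.append(0); cost.append(-w)
    flow = total_cost = 0
    while True:
        dist = [INF] * n
        prev_edge = [-1] * n
        dist[s] = 0
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == INF:
                    continue
                for e in graph[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]]:
                        dist[to[e]] = dist[u] + cost[e]
                        prev_edge[to[e]] = e
                        changed = True
            if not changed:
                break
        if dist[t] == INF:
            return flow, total_cost
        push, v = INF, t
        while v != s:                       # find the bottleneck on the path
            e = prev_edge[v]
            push = min(push, cap[e])
            v = to[e ^ 1]                   # e ^ 1 is the paired reverse edge
        v = t
        while v != s:                       # augment along the path
            e = prev_edge[v]
            cap[e] -= push
            cap[e ^ 1] += push
            v = to[e ^ 1]
        flow += push
        total_cost += push * dist[t]

def spread_bins(b, k=1):
    """b[i] > 0: bin i is over target by b[i] cells; b[i] < 0: bin i has room for -b[i]."""
    n_bins = len(b)
    s, t = n_bins, n_bins + 1
    edges = []
    for i, bi in enumerate(b):
        if bi > 0:
            edges.append((s, i, bi, 0))     # Cap(e_si) = b(i), Cost = 0
        elif bi < 0:
            edges.append((i, t, -bi, 0))    # Cap(e_jt) = -b(j), Cost = 0
    for i in range(n_bins - 1):             # neighboring bins, both directions
        edges.append((i, i + 1, BIG, k))    # Cap = large int, Cost = K
        edges.append((i + 1, i, BIG, k))
    return min_cost_max_flow(n_bins + 2, edges, s, t)

if __name__ == "__main__":
    print(spread_bins([3, 0, -1, -2]))      # (cells moved, total bin-hops)
```

Charging a cost K per bin-to-bin hop makes the cheapest flow move cells the shortest distance, which is consistent with the goal of preserving the relative ordering of logic.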


Congestion Driven Circuit Spreading
Initial Placement
Calculate bin level congestion
Translate bin score to bin target density
Network flow based circuit spreading
Is congestion below threshold?
- No: return to "Calculate bin level congestion"
- Yes: Final Placement




We’ve Talked About• Placement algorithms• Placement / Synthesis interaction• Placement aware synthesis techniques• The Constant Delay paradigm• Physical Buffer insertion / Wire sizing• Congestion MitigationJune 2002 DAC02 - Physical Chip Implementation 249


Let’s Look at some Examples:June 2002 DAC02 - Physical Chip Implementation 250


Pure MLPQuadraticJune 2002 DAC02 - Physical Chip Implementation 251


Optimization Resultsbuffer shatter clonefaninJune 2002 DAC02 - Physical Chip Implementation 252


Optimization ResultsJune 2002 DAC02 - Physical Chip Implementation 253


Optimization ResultsJune 2002 DAC02 - Physical Chip Implementation 254


Section outline• Introduction• Review material (timing and synthesis)• Introduction to placement• Placement algorithms• Paradigms for placement-synthesisintegration• Placement aware synthesis techniques• Congestion avoidance / mitigation techniques• Routing optimizationJune 2002 DAC02 - Physical Chip Implementation 255


Routing Based Optimization:RBO (IBM)June 2002 DAC02 - Physical Chip Implementation 256


Routing based Timing Closure Issues
• Post-routing timing problems can be significant
  - affect design schedule
  - may be too numerous to fix manually
• Increasing design density can reduce cost, but it also increases wiring congestion
  - timing and signal integrity become more significant
  - available resource for manual fixup is limited
  - without automation, may not be doable
• Rerouting with constraints may resolve some of the problems, but this process is slow


Solution:• Integrate global routing, detailed routing andtiming correction• Global routing is efficient enough to be run inan iterative timing closure loop• Timing critical nets avoid scenic routes• Non-critical nets that go scenic can berepowered and buffered prior to detailedroutingJune 2002 DAC02 - Physical Chip Implementation 258


Example Problem:criticalpathsnon-critical pathsideal "Steiner" routesTiming deficient wiringsolutionForce optimal use ofwiring resource (e.g.critical paths getdirect route)PDS Timinguses steiner wires - fastPost PD TimingCatches thisproblem: Slow!Timing driven wiringsolutionRBO Timing DrivenRouting sees thisduring global routestage: Fast!June 2002 DAC02 - Physical Chip Implementation 259


Global Routing
Divides the entire chip into localized rectangular regions called tiles.
Compresses several pin locations in each tile to a single pin location.
All the shapes, wires and opens are represented in terms of global track capacity and usage.


Global Routing
• Two step approach
  - Create the initial Steiner routes
  - Compute the edge congestion on the grid
  - Perform a rip-up and reroute using a shortest path algorithm to reduce the overall congestion of the design
• Advantages
  - Can communicate with the detail router
  - Good correlation with the final detail routing solution


Routing Based OptimizationCurrentMethodologyRBO MethodologyPhysicalSynthesisXrGlobalPhysicalSynthesisGlobalRoutingExtractorOptimizerRBO /PhysicalSynthesisEinstimerDetailedRoutingDetailedRoutingTimingAnalysisNo Timing Criticalityfor Global routerCostly Manual TimingCorrectionAnalysisJune 2002 DAC02 - Physical Chip Implementation 262


RBO Extraction Process
• Very fast
  - Excellent correlation with final 3D extraction
• Uses global routes for extraction
• Neighbor information is probabilistically determined based on the global routing congestion information
• Based on extraction tables
Probability of having a neighbor = (#OccupiedTracks) / (#Capacity) = 2/4 = 0.5
(Figure labels: edge values 3, 3, 2, 1; Capacity of All Edges = 5)


RBO results on memcntl
Design: Example 1
Nets: ~1.6M
Size: 23193 x 23193
Congestion: attached is a display of global congestion


Timing Critical Nets: Without RBOJune 2002 DAC02 - Physical Chip Implementation 265


Nets Routed with RBO flowJune 2002 DAC02 - Physical Chip Implementation 266


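RBO Results - Example 1
Columns: Worst Slack, #Slack Violations, #Cap Violations, #Slew Violations, #Opens, #Loops
(Values are listed in the order they appear on the slide; rows with fewer than six values have blank cells whose positions are not recoverable.)
Steiner Estimates: -0.47, 17, 1, 18
XrLocal without RBO: -1.57, 4687, 14, 128, 50, 87020
RBO Timing Closure (Global Routes): -0.48, 209
Detailed routing with RBO: -0.43, 14, 18, 1, 54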


Example 2: - Critical Net Routed Without RBOJune 2002 DAC02 - Physical Chip Implementation 268


Example 2: Critical Net Routed With RBOJune 2002 DAC02 - Physical Chip Implementation 269


Example 2 Results Summary

                            Worst Slack   #Slack Violations   #Cap Violations   #Slew Violations   #Opens   #Loops
Final routing without RBO   -0.54         1224                33                270                0        1152
Using RBO                   -0.29         909                 32                274                0        1070


PDS - RBO IntegrationCurrent FlowProposed Flow -Existing ToolsProjected FlowTiming ClosureNo Noise SignOffSteiner Routes TimingClosurePDS-EinstimerDetail RoutingNot Noise AwareSteiner Routes TimingClosurePDS-EinstimerGlobal Routes TimingClosureNoise AwareSteiner-GlobalPDS-Einstimer-RBOProbabilistic Detection ofNoise ProblemsPDS-Einstimer-RBOXrouterRBO-EinstimerNoise AvoidanceTiming SignOffChipEdit-EinstimerSignOff Noise DetectionETCoupling-3DNoisePDS-Einstimer-RBOSignOff Noise DetectionETCoupling-3DNoiseNoise Correction10111212Noise CorrectionManual Correction93847 56Manual CorrectionJune 2002 DAC02 - Physical Chip Implementation 27112 11 10 29 387 564


Noise Detection and Avoidance:
• RBO (Detection)
  - Length based
    - Initial selection includes length and slack threshold
    - Further pruning based on Worst Case Miller Timing
  - Switching window based refinement
    - Pattern generation based on switching window overlaps
• RBO (Avoidance)
  - Long net spreading
  - Track reordering
  - Incremental placement changes
  - Layer assignment


Noise Detection and Avoidance:
• Wire width selection
• Physical Synthesis
  - Integration
  - Fix cap and slew violations with global routes
  - Interface to RBO
  - Noise alleviation resizing
  - Noise aware buffering


Long Net SpreaderJune 2002 DAC02 - Physical Chip Implementation 274


Wrap Up
• Timing closure today is highly dependent on integrated tools.
• Tightly integrated placement, timing and synthesis tools are available today from multiple vendors.
• Placement techniques are dominated by quadratic techniques and partitioning.
• Next on the list for integration are routing and signal integrity tools (happening now).
• These tools have a high degree of complexity. It takes large, well funded DA organizations to compete in this space.


Placement References
• C. J. Alpert, T. Chan, D. J.-H. Huang, I. Markov, and K. Yan, “Quadratic Placement Revisited”, Proc. 34th IEEE/ACM Design Automation Conference, 1997, pp. 752-757
• C. J. Alpert, J.-H. Huang, and A. B. Kahng, “Multilevel Circuit Partitioning”, Proc. 34th IEEE/ACM Design Automation Conference, 1997, pp. 530-533
• U. Brenner and A. Rohe, “An Effective Congestion Driven Placement Framework”, International Symposium on Physical Design, 2002, pp. 6-11
• A. E. Caldwell, A. B. Kahng, and I. L. Markov, “Can Recursive Bisection Alone Produce Routable Placements?”, Proc. 37th IEEE/ACM Design Automation Conference, 2000, pp. 477-482
• M. A. Breuer, “Min-Cut Placement”, J. Design Automation and Fault Tolerant Computing, 1(4), 1977, pp. 343-362
• J. Vygen, “Algorithms for Large-Scale Flat Placement”, Proc. 34th IEEE/ACM Design Automation Conference, 1997, pp. 746-751
• H. Eisenmann and F. M. Johannes, “Generic Global Placement and Floorplanning”, Proc. 35th IEEE/ACM Design Automation Conference, 1998, pp. 269-274
• S.-L. Ou and M. Pedram, “Timing Driven Placement Based on Partitioning with Dynamic Cut-Net Control”, Proc. 37th IEEE/ACM Design Automation Conference, 2000, pp. 472-476
• C. M. Fiduccia and R. M. Mattheyses, “A Linear Time Heuristic for Improving Network Partitions”, Proc. ACM/IEEE Design Automation Conference, 1982, pp. 175-181


Synthesis References
• C. L. Berman, J. L. Carter, and K. F. Day. The Fanout Problem: From Theory to Practice. In Advanced Research in VLSI: Proceedings of the 1989 Decennial Caltech Conference, pages 69-99, 1989
• C. L. Berman, D. J. Hathaway, A. S. LaPaugh, and L. H. Trevillyan. Efficient Techniques for Timing Corrections. In International Symposium on Circuits and Systems, pages 415-419, 1990
• F. Beeftink, P. N. Kudva, D. S. Kung, R. Puri, and L. Stok. Combinatorial Cell Design for CMOS Libraries. INTEGRATION, the VLSI Journal, 29:67-93, 2000
• W. Donath, P. Kudva, L. Stok, P. Villarrubia, L. Reddy, and A. Sullivan. Transformational placement and synthesis. In DATE, pages 194-201, 2000
• D. J. Hathaway, R. P. Abato, A. D. Drumm, and L. P. P. P. van Ginneken. Incremental timing analysis. Technical report, IBM Corp., 1996. U.S. patent 5,508,937.
• D. Kung, P. Kudva, and A. Sullivan. A Gate Sizing Algorithm using Geometric Programming. In Proc. of the International Workshop on Logic Synthesis, 1997
• T. Kutzschebauch and L. Stok. Regularity driven logic synthesis. In Proc. of the Int. Conf. on Computer Aided Design, Nov 2000.
• P. Rezvani, A. H. Ajami, M. Pedram, and H. Savoj. LEOPARD: A Logical Effort based fanout Optimizer for Area and Delay. In IEEE/ACM International Conference on CAD, pages 516-519, 1999.
• L. Stok, M. Iyer, and A. Sullivan. Wavefront technology mapping. In DATE, pages 531-536, 1999
• D. S. Kung. A Fast Fanout Optimization for Near-Continuous Buffer Libraries. In IEEE/ACM Design Automation Conference, pages 352-355, 1998


DP Buffer Insertion References
• Buffer placement in distributed RC-tree networks for minimal Elmore delay. van Ginneken, L.P.P.P. Circuits and Systems, 1990, IEEE International Symposium on, 1990. Page(s): 865-868, vol. 2
• Optimal wire sizing and buffer insertion for low power and a generalized delay model. Lillis, J.; Chung-Kuan Cheng; Lin, T.-T.Y. Solid-State Circuits, IEEE Journal of, Volume: 31, Issue: 3, March 1996. Page(s): 437-447
• Buffer insertion for noise and delay optimization. Alpert, C.J.; Devgan, A.; Quay, S.T. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, Volume: 18, Issue: 11, Nov. 1999. Page(s): 1633-1645
• Buffer insertion with accurate gate and interconnect delay computation. Alpert, C.J.; Devgan, A.; Quay, S.T. Design Automation Conference, 1999. Proceedings. 36th, 1999. Page(s): 479-484
• Wire Segmenting For Improved Buffer Insertion. Alpert, C.; Devgan, A. Design Automation Conference, 1997. Proceedings of the 34th. Page(s): 588-593
• Simultaneous routing and buffer insertion for high performance interconnect. Lillis, J.; Chung-Kuan Cheng; Ting-Ting Y. Lin. VLSI, 1996. Proceedings, Sixth Great Lakes Symposium on, 1996. Page(s): 148-153


Blockage Avoidance References
• Steiner tree optimization for buffers, blockages, and bays. Alpert, C.J.; Gandham, G.; Jiang Hu; Neves, J.I.; Quay, S.T.; Sapatnekar, S.S. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, Volume: 20, Issue: 4, April 2001. Page(s): 556-562
• A fast algorithm for context-aware buffer insertion. Jagannathan, A.; Sung-Woo Hur; Lillis, J. Design Automation Conference, 2000. Proceedings 2000. Page(s): 368-373
• Simultaneous routing and buffer insertion with restrictions on buffer locations. Hai Zhou; Wong, D.F.; I-Min Liu; Aziz, A. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, Volume: 19, Issue: 7, July 2000. Page(s): 819-824
• Maze routing with buffer insertion and wire sizing. Minghorng Lai; Wong, D.F. Design Automation Conference, 2000. Proceedings 2000. Page(s): 374-378
• Routing tree construction under fixed buffer locations. Cong, J.; Xin Yuan. Design Automation Conference, 2000. Proceedings 2000. Page(s): 379-384


Interconnect Planning References
• A practical methodology for early buffer and wire resource allocation. Alpert, C.J.; Jiang Hu; Sapatnekar, S.S.; Villarrubia, P.G. Design Automation Conference, 2001. Proceedings, 2001. Page(s): 189-194
• An interconnect-centric design flow for nanometer technologies. Cong, J. Proceedings of the IEEE, Volume: 89, Issue: 4, April 2001. Page(s): 505-528
• Buffer block planning for interconnect-driven floorplanning. Cong, J.; Tianming Kong; Pan, D.Z. Computer-Aided Design, 1999. Digest of Technical Papers. 1999 IEEE/ACM International Conference on, 1999. Page(s): 358-363
• Provably good global buffering using an available buffer block plan. Dragan, F.F.; Kahng, A.B.; Mandoiu, I.; Muddu, S.; Zelikovsky, A. Computer Aided Design, 2000. ICCAD-2000. IEEE/ACM International Conference on, 2000. Page(s): 104-109
• Provably good global buffering by multiterminal multicommodity flow approximation. Dragan, F.F.; Kahng, A.B.; Mandoiu, I.; Muddu, S.; Zelikovsky, A. Design Automation Conference, 2001. Proceedings of the ASP-DAC 2001. Asia and South Pacific, 2001. Page(s): 120-125
• Planning buffer locations by network flows. Tang, X.; Wong, D.F. International Symposium on Physical Design, April 2001. Page(s): 180-185
• Routability-Driven Repeater Block Planning for Interconnect-Centric Floorplanning. Sarkar, P.; Sundararaman, V.; Koh, C.-K. International Symposium on Physical Design, April 2001. Page(s): 186-191
