12.07.2015 Views

Pdf - Tutorial on High-Level Synthesis - University of Windsor

The last type of scheduling algorithm we will consider is global both in the way it selects the next operation to be scheduled and in the way it decides the control step in which to put it. In this type of algorithm, the range of possible control step assignments for each operation is calculated, given the time constraints and the precedence relations between the operations.

In freedom-based scheduling, the operations on the critical path are scheduled first and assigned to functional units. Then the other operations are scheduled and assigned one at a time. At each step the unscheduled operation with the least freedom, that is, the one with the smallest range of control steps into which it can go, is chosen, so that operations that might present more difficult scheduling problems are taken care of first, before they become blocked.

In force-directed scheduling, the range of possible control steps for each operation is used to form a so-called Distribution Graph. The distribution graph shows, for each control step, how heavily loaded that step is, given that all possible schedules are equally likely. If an operation could be done in any of k control steps, then 1/k is added to each of those control steps in the graph. For example, Figure 5 shows a dataflow graph, the range of steps for each operation, and the corresponding distribution graph for the addition operations, assuming a time constraint of three control steps. Addition a1 must be scheduled in step 1, so it contributes 1 to that step. Similarly, addition a2 adds 1 to control step 2. Addition a3 could be scheduled in either step 2 or step 3, so it contributes 1/2 to each. Operations are then selected and placed so as to balance the distribution as much as possible. In the above example, a3 would first be scheduled into step 3, since that would have the greatest effect in balancing the graph.

Figure 5. A Distribution Graph

3.2 Data Path Allocation

Data path allocation involves mapping operations onto functional units, assigning values to registers, and providing interconnections between operators and registers using buses and multiplexers. The decision to use ALUs instead of simple operators is also made at this time. The optimization goal is usually to minimize some objective function, such as

  • total interconnect length,
  • total register, bus driver and multiplexer cost, or
  • critical path delays.

There may also be one or more constraints on the design which limit total area of the design, total throughput, or delay from input to output.

The techniques used to perform data path allocation can be classified into two types, iterative/constructive and global. Iterative/constructive techniques assign elements one at a time, while global techniques find simultaneous solutions to a number of assignments at a time. Exhaustive search is an extreme case of a global solution technique. Iterative/constructive techniques generally look at less of the search space than global techniques, and therefore are more efficient, but are less likely to find optimal solutions.

3.2.1 Iterative/Constructive Techniques

Iterative/constructive techniques select an operation, value or interconnection to be assigned, make the assignment, and then iterate. The rules which determine the next operation, value or interconnect to be selected can vary from global rules, which examine many or all items before selecting one, to local selection rules, which select the items in a fixed order, usually as they occur in the data flow graph from inputs to outputs. Global selection involves selecting a candidate for assignment on the basis of some metric, for example taking the candidate that would add the minimum additional cost to the design. Hafer's data path allocator, the first RT synthesis program which dealt with TTL chips, was iterative and used local selection [9].
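
As an aside on the force-directed scheduling discussion above, the distribution graph for the three additions can be computed mechanically. The following is a minimal Python sketch, not the published algorithm: the function names `distribution_graph` and `most_balanced_step` are hypothetical, and using the peak load as the balance measure is an assumption made for illustration (the text only says that operations are placed so as to balance the distribution).

```python
from fractions import Fraction


def distribution_graph(ranges, num_steps):
    """Build the distribution graph for one operation type.

    ranges maps an operation name to its (earliest, latest) control steps,
    inclusive. An operation that can go into any of k steps adds 1/k to
    each of those steps.
    """
    graph = {step: Fraction(0) for step in range(1, num_steps + 1)}
    for earliest, latest in ranges.values():
        k = latest - earliest + 1
        for step in range(earliest, latest + 1):
            graph[step] += Fraction(1, k)
    return graph


def most_balanced_step(ranges, op, num_steps):
    """Pick the step for `op` that gives the lowest peak load.

    Assumption: "balance" is measured as the maximum entry of the graph
    after fixing `op` to a single step. This is a simplification, not a
    force calculation.
    """
    earliest, latest = ranges[op]

    def peak_if_fixed(step):
        trial = dict(ranges)
        trial[op] = (step, step)
        return max(distribution_graph(trial, num_steps).values())

    return min(range(earliest, latest + 1), key=peak_if_fixed)


# Ranges taken from the Figure 5 discussion: a1 in step 1, a2 in step 2,
# a3 in step 2 or 3, with a time constraint of three control steps.
ranges = {"a1": (1, 1), "a2": (2, 2), "a3": (2, 3)}
dg = distribution_graph(ranges, num_steps=3)
print({step: str(load) for step, load in dg.items()})  # {1: '1', 2: '3/2', 3: '1/2'}
print(most_balanced_step(ranges, "a3", num_steps=3))   # 3
```

Fixing a3 in step 3 gives a load of 1 in every step, whereas step 2 would give loads of 1, 2 and 0, which matches the choice described in the text.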
The DAA used a local criterion to select which element to assign next, but chose where to assign it on the basis of rules that encoded expert knowledge about the data path design of microprocessors. Once this knowledge base had been tested and improved through repeated interviews with designers, the DAA was able to produce much cleaner data paths than when it began [13, pages 26-31]. EMUCS [10] used a global selection criterion, based on minimizing both the number of functional units and registers and the multiplexing needed, to choose the next element to assign and where to assign it. The Elf system also sought to minimize interconnect, but used a local selection criterion. The REAL program [15] separated out register allocation and performed it after scheduling, but prior to operator and interconnect allocation. REAL is constructive, and selects the earliest value to assign at each step, sharing registers among values whenever possible.

Figure 6. Greedy Data Path Allocation

An example of greedy allocation is shown in Fig. 6. The dataflow graph on the left is processed from earliest time step to latest. Operators, registers and interconnect are allocated for each time step in sequence. Thus, the selection rule is local, and the allocation constructive. Assignments are made so as to minimize interconnect. In the case shown in the figure, a2 was assigned to adder2 since the increase in multiplexing cost required by that allocation was zero.
a4 was assigned to adder1 because there was already a connection from the register to that adder. Other variations are possible, each with different multiplexing costs. For example, if we had assigned a2 to adder1 and a4 to adder1 without checking for interconnection costs, then the final multiplexing would have been more expensive. A more global selection rule also could have been applied. For example, we could have selected the next item for allocation on the basis of minimization of cost increase. In this case, if we had already allocated a3 to adder2, then the next step would be to allocate a4 to the same adder, since they occur in different time steps, and the incremental cost of doing that assignment is less than assigning a2 to adder1.

Paper 23.1, page 334
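
The greedy, interconnect-minimizing allocation described for Fig. 6 can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the allocator of any system named above: the dataflow graph `OPS` is a made-up example (it does not reproduce Figure 6), the function name `greedy_allocate` and the `check_cost` flag are hypothetical, and multiplexing cost is approximated by counting new source-to-adder connections.

```python
def greedy_allocate(ops, num_adders, check_cost=True):
    """Bind additions to adders one time step at a time.

    ops is a list of (name, step, sources) tuples. The selection rule is
    local: operations are taken in time-step order. With check_cost=True,
    each operation goes to the free adder that needs the fewest new
    connections; with check_cost=False it goes to the first free adder.
    """
    connections = [set() for _ in range(num_adders)]  # sources wired to each adder
    busy = set()                                      # (adder, step) pairs in use
    binding = {}
    for name, step, sources in sorted(ops, key=lambda op: op[1]):
        free = [a for a in range(num_adders) if (a, step) not in busy]
        if not free:
            raise ValueError(f"no adder free in step {step} for {name}")
        if check_cost:
            # Incremental cost: sources not yet connected to this adder.
            best = min(free, key=lambda a: len(set(sources) - connections[a]))
        else:
            best = free[0]
        binding[name] = best
        busy.add((best, step))
        connections[best] |= set(sources)
    return binding, connections


# Hypothetical dataflow graph: four additions reading registers r1..r5.
OPS = [
    ("a1", 1, ("r1", "r2")),
    ("a2", 1, ("r3", "r4")),
    ("a3", 2, ("r1", "r5")),
    ("a4", 3, ("r3", "r4")),
]

binding, conns = greedy_allocate(OPS, num_adders=2)
print(binding)                       # {'a1': 0, 'a2': 1, 'a3': 0, 'a4': 1}
print(sum(len(c) for c in conns))    # 5 connections in total

_, naive = greedy_allocate(OPS, num_adders=2, check_cost=False)
print(sum(len(c) for c in naive))    # 7 connections without the cost check
```

In this made-up example, a4 lands on the second adder at zero extra cost because its sources are already wired there, while ignoring interconnect cost leaves the design with two more connections. A more global rule of the kind mentioned in the text would instead scan all unassigned operations at each iteration and pick the (operation, adder) pair with the smallest cost increase.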
