
Pdf - Tutorial on High-Level Synthesis - University of Windsor



2. The Synthesis Task

The system to be designed is usually represented at the algorithmic level by a programming language such as Pascal [27] or Ada [8], or by a hardware description language that is similar to a programming language, such as ISPS [2]. Most of the languages used are procedural languages. That is, they describe data manipulation in terms of assignment statements that are organized into larger blocks using standard control constructs for sequential execution, conditional execution and iteration. There have been experiments, however, with various types of non-procedural hardware description languages, including applicative, LISP-like languages [11] and declarative languages such as Prolog.

The first step in high-level synthesis is usually the compilation of the formal language into an internal representation. Two types of internal representations are generally used: parse trees and graphs. Most approaches use variations of graphs that contain both the data-flow and the control flow implied by the specification [16], [26], [12]. Fig. 1 shows a part of a simple program that computes the square-root of X using Newton's method, along with its graphical representation. The number of iterations necessary in practice is very small. In the example, 4 iterations were chosen. A first degree minimax polynomial approximation for the interval gives the initial value.
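As a rough illustration of the behavior that the square-root specification describes, the following Python sketch runs Newton's method for four iterations from a linear initial guess. The two constants are taken from the Fig. 1 listing; the function name `sqrt_newton` and the sample inputs are assumptions made for this example, not part of the original specification.

```python
def sqrt_newton(x):
    """Approximate sqrt(x) with the structure of the Fig. 1 program."""
    # Initial value from the first-degree polynomial used in Fig. 1.
    y = 0.222222 + 0.888889 * x
    i = 0
    # DO UNTIL I > 3: the body executes for I = 0, 1, 2, 3 (four iterations).
    while not i > 3:
        y = 0.5 * (y + x / y)  # Newton step for y*y = x
        i = i + 1
    return y


if __name__ == "__main__":
    # Hypothetical sample inputs chosen only for demonstration.
    for x in (0.25, 0.5, 1.0):
        print(x, sqrt_newton(x))
```

Each pass through the loop contains a division, an addition, a multiplication by 0.5, an addition of 1 to I and the loop test, which are the operations that appear as nodes in the graph representation.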
The data-flow and control flow graphs are shown separately in the figure for intelligibility. The control graph is derived directly from the explicit order given in the program and from the compiler's choice of how to parse the arithmetic expressions. The data-flow graph shows the essential ordering of operations in the program imposed by the data relations in the specification. For example, in fig. 1, the addition at the top of the diagram depends for its input on data produced by the multiplication. This implies that the multiplication must be done first in any valid ordering of the operations. On the other hand, there is no dependence between the I + 1 operation inside the loop and any of the operations in the chain that calculates Y, so the I + 1 may be done in parallel with those operations, as well as before or after them. The data-flow graph can also be used to remove the dependence on the way internal variables are used in the specification, since each value produced by one operation and consumed by another is represented uniquely by an arc. This ability to reassign variables is important both for reordering operations and for simplifying the datapaths.

    ...
    Y := 0.222222 + 0.888889 * X;
    I := 0;
    DO UNTIL I > 3 LOOP
        Y := 0.5 * (Y + X / Y);
        I := I + 1;

Figure 1. High-level Specification and graph for sqrt

The rest of this section outlines the various steps used in turning the intermediate form into a RT-level structure, using the square root example to illustrate the different steps.

Since the specification has been written for human readability and not for direct translation into hardware, it is desirable to do some initial optimization of the internal representation. These high-level transformations include such compiler-like optimizations as dead code elimination, constant propagation, common subexpression elimination, inline expansion of procedures and loop unrolling. Local transformations, including those that are more specific to hardware, are also used. In the example, the loop-ending criterion can be changed to I = 0 using a two-bit variable for I. The multiplication times 0.5 can be replaced by a right shift by one. The addition of 1 to I can be replaced by an increment operation. The internal representation after these optimizations is depicted on the left in fig. 2. Loop unrolling can also be done in this case since the number of iterations is fixed and small.

The next two steps in synthesis are the core of transforming behavior into structure: scheduling and allocation. They are closely interrelated and depend on each other.
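The local transformations described for the square root example can be tried out in a small Python sketch. It assumes a fixed-point number format with 16 fractional bits, which the text does not specify; the names `FRAC_BITS`, `to_fixed`, `to_float` and `sqrt_optimized` are hypothetical. The sketch replaces the multiplication by 0.5 with a right shift, the addition of 1 with an increment of a two-bit counter, and the loop test with I = 0.

```python
FRAC_BITS = 16  # assumed fixed-point format: 16 fractional bits


def to_fixed(value):
    """Convert a float to the assumed fixed-point integer format."""
    return int(round(value * (1 << FRAC_BITS)))


def to_float(fixed):
    """Convert a fixed-point integer back to a float."""
    return fixed / (1 << FRAC_BITS)


def sqrt_optimized(x_fixed):
    """Square root of a positive fixed-point value, with local transformations.

    Returns the fixed-point result and the number of loop iterations.
    """
    # Initial value: same polynomial as the specification, in fixed point.
    y = to_fixed(0.222222) + ((to_fixed(0.888889) * x_fixed) >> FRAC_BITS)
    i = 0            # two-bit loop counter
    iterations = 0   # bookkeeping for the demonstration only
    while True:
        quotient = (x_fixed << FRAC_BITS) // y  # X / Y in fixed point
        y = (y + quotient) >> 1                 # right shift replaces 0.5 * (...)
        i = (i + 1) & 0b11                      # increment, wrapping at two bits
        iterations += 1
        if i == 0:                              # loop-ending criterion I = 0
            break
    return y, iterations


if __name__ == "__main__":
    result, count = sqrt_optimized(to_fixed(0.25))
    print(to_float(result), count)
```

Because a two-bit counter wraps from 3 back to 0, testing I = 0 after the increment ends the loop after exactly four iterations, so no comparator against the constant 3 is needed.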
Scheduling consists in assigning the operations to so-called control steps. A control step is the fundamental sequencing unit in synchronous systems; it corresponds to a clock cycle. Allocation consists in assigning the operations to hardware, i.e. allocating functional units, storage and communication paths.

The aim of scheduling is to minimize the amount of time or the number of control steps needed for completion of the program, given certain limits on the available hardware resources. In our example, a trivial special case uses just one functional unit and one memory. Each operation has to be scheduled in a different control step, so the computation takes 3+4*5=23 control steps. To speed up the computation at the expense of adding more hardware, the control graph can be packed into control steps as tightly as possible, observing only the essential dependencies required by the data-flow graph and by the loop boundaries. This form is shown in fig. 2. Notice that two dummy nodes to delimit the loop boundaries were introduced. Since the shift operation is free, with two functional units the operations can now be scheduled in 2+4*2=10 control steps.

Figure 2. Optimized Control Graph and Schedule (legend entries: multiplexer, shift (#), constant (c))

Paper 23.1
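The two control-step counts for the square root example can be reproduced with a small greedy scheduler, sketched below in Python. This is not the scheduling algorithm of any particular synthesis system; it is a minimal illustration that assumes every operation takes one control step, that the loop boundaries act as barriers (so the total is the pre-loop steps plus four times the loop-body steps), and that a "free" operation such as the shift needs no functional unit and can share a step with its producer. The operation names and the dependency table are one reading of the example, chosen for this sketch.

```python
def schedule_block(ops, deps, n_units, free=frozenset()):
    """Greedily assign ops (listed in program order) to control steps.

    deps maps an op to the ops in the same block whose results it consumes.
    Ops in `free` use no functional unit and may share a step with their
    producers. Returns the number of control steps used.
    """
    step_of = {}
    remaining = list(ops)
    step = 0
    while remaining:
        step += 1
        used = 0  # functional units occupied in this control step
        for op in list(remaining):
            preds = deps.get(op, ())
            if op in free:
                # Chained operation: producers may be in this very step.
                ready = all(p in step_of for p in preds)
            else:
                # Producers must have finished in an earlier step.
                ready = all(step_of.get(p, step) < step for p in preds)
            if ready and (op in free or used < n_units):
                step_of[op] = step
                remaining.remove(op)
                if op not in free:
                    used += 1
    return step


# Hypothetical operation names for the square root example.
PRE_LOOP = ["mul0", "add0", "clr_i"]            # Y := c0 + c1*X; I := 0
LOOP_BODY = ["div", "add", "half", "inc", "cmp"]
DEPS = {"add0": ["mul0"], "add": ["div"], "half": ["add"], "cmp": ["inc"]}


def total_steps(n_units, free=frozenset(), iterations=4):
    """Control steps for the whole program: pre-loop plus repeated body."""
    pre = schedule_block(PRE_LOOP, DEPS, n_units, free)
    body = schedule_block(LOOP_BODY, DEPS, n_units, free)
    return pre + iterations * body


if __name__ == "__main__":
    print(total_steps(n_units=1))                  # one functional unit
    print(total_steps(n_units=2, free={"half"}))   # two units, free shift
```

With one functional unit every operation occupies its own step, which gives the 3+4*5 count; with two units and the halving treated as a free shift, the division and the increment share one step and the addition and the loop test share the next, which gives the 2+4*2 count.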
