Verification of Parameterised FPGA Circuit Descriptions with Layout ...


24.04.2013

CHAPTER 6. LAYOUT CASE STUDIES

One invariant we have seen across all circuit examples is that manually placing designs significantly reduces the time taken by the place and route stage of the compilation process, with the reduction ranging from 14% for the unpipelined matrix multiplier to 92% for the unpipelined adder tree. This result is not overly surprising, since the place and route stage has been reduced to just routing. It is nevertheless worth confirming, because the denser placements specified by the user constraints could have increased the routing time by more than is saved by avoiding automatic placement.

Another fairly firm conclusion is that in almost all cases [4] manually placed designs require less logic area than automatically placed ones. The logic mapping specified by the manual constraints is denser than that used by the automatic tools, and reductions in area of 40% are commonly achieved, with a maximum area reduction of 61% observed for the unpipelined adder tree. In one case the manual mapping for a butterfly circuit used less than 70% of the device resources, while the automatic mapping and placement was unable to fit the same circuit onto the device. Manual placement is clearly superior here.

The effect of manual placement on maximum clock frequency depends on other constraints. Given a homogeneous environment, simulated annealing is able to generate circuits with equivalent or better performance by discovering high-speed routing paths between cells that humans would not consider sensible. For example, the placed 24-bit pipelined binomial filter circuit is 35% slower than the automatically placed version. However, when other constraints affect the layout, simulated annealing does not perform so well and the regular layout constraints can produce significant performance gains.
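To make the comparison with automatic placement more concrete, the following is a minimal sketch of how a simulated annealing placer can work. It is an illustration only: the text does not describe the internals of the vendor tools, so the cost function (half-perimeter wirelength), the geometric cooling schedule, the move set and all names (`wirelength`, `anneal_placement`, the toy chain netlist) are assumptions chosen for brevity rather than a description of the placer actually used.

```python
import math
import random


def wirelength(placement, nets):
    """Sum of half-perimeter bounding boxes, a simple proxy for routing cost."""
    total = 0
    for net in nets:
        xs = [placement[cell][0] for cell in net]
        ys = [placement[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal_placement(cells, nets, width, height,
                     steps=5000, t_start=5.0, t_end=0.01, seed=0):
    """Toy annealer: returns (initial_placement, best_placement_found)."""
    if len(cells) > width * height:
        raise ValueError("more cells than sites on the grid")
    rng = random.Random(seed)
    # sites[i] for i < len(cells) is the site of cells[i]; the rest are free.
    sites = [(x, y) for x in range(width) for y in range(height)]
    rng.shuffle(sites)
    n = len(cells)

    def as_dict():
        return {cell: sites[i] for i, cell in enumerate(cells)}

    initial = as_dict()
    cost = wirelength(initial, nets)
    best, best_cost = initial, cost
    for step in range(steps):
        # Geometric cooling from t_start towards t_end (hypothetical schedule).
        temp = t_start * (t_end / t_start) ** (step / steps)
        # Propose swapping one cell with another cell or with a free site.
        i = rng.randrange(n)
        j = rng.randrange(len(sites))
        sites[i], sites[j] = sites[j], sites[i]
        new_cost = wirelength(as_dict(), nets)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = as_dict(), cost
        else:
            sites[i], sites[j] = sites[j], sites[i]  # undo the rejected move
    return initial, best


# Usage: a hypothetical chain of 8 cells placed on a 4x4 grid.
cells = [f"c{i}" for i in range(8)]
nets = [(cells[i], cells[i + 1]) for i in range(7)]
initial, best = anneal_placement(cells, nets, width=4, height=4)
print(wirelength(initial, nets), "->", wirelength(best, nets))
```

Because such a search is free to put any cell on any site, it can find irregular arrangements that a regular, hand-specified layout would never contain. Adding fixed constraints, such as forcing groups of cells into vertical columns, shrinks the set of legal moves, which is one plausible reading of why the annealer fares worse in the constrained cases discussed here.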
The constraints that affect simulated annealing appear to be the use of the fast carry chain circuitry, which forces some cells to be laid out vertically, and the level of device utilisation, which reduces the ability of the placer to find high-speed routes through less densely packed logic. For pipelined bitonic merger butterfly networks, the 4-bit manually placed circuit utilises only 28% of the device and runs 14% slower than the automatically placed version; however, the 8-bit version utilises 55% of the device and runs 48% faster than the automatically placed version.

Manual placement often produces better results than simulated annealing for unpipelined circuits, where the maximum clock frequency is already much lower than for pipelined ones. This is not unexpected, since wiring delays accumulate in the same way as logic propagation delays in unpipelined circuits.

Generally, manual placement appears to lead to reduced power consumption, with reductions of up to 40% possible (for the pipelined adder trees). In general, power consumption can be reduced even if the maximum clock frequency of the placed design is lower than that of the automatically placed circuit. For the binomial filter, power savings of 2-13% were observed even though the placed circuits had lower maximum clock frequencies. In the case of the butterfly network, a correlation with device utilisation/circuit size was once again observed, with the 4-bit circuit consuming more power when placed, although an 8-bit circuit consumed less.

6.8 Summary

We have demonstrated our layout framework with a variety of real circuits, including a matrix multiplier described with a new type of higher-dimensional combinator, a binomial filter, a butterfly network and a median filter. We have demonstrated how functional reasoning can be used to derive pipelined versions, while the layout framework can be used to verify layouts. We have found that in many, though not all, cases manually placed designs outperform automatically placed circuits, with higher maximum operating frequencies, lower device utilisation, lower power consumption and a faster place and route process.

[4] The exception to this that we observed was the binomial filter circuit, where we deliberately specified a less dense logic mapping in order to achieve a better aligned layout.

