Verification of Parameterised FPGA Circuit Descriptions with Layout ...
CHAPTER 6. LAYOUT CASE STUDIES 160

Proof. By moving the delay out of the instantiation of cube cell and exploiting the timeless property of wiring blocks to re-organise.³

We can use a particular instance of this theorem for the cube and manipulate it so that the resulting circuit is implementable, to give:

Theorem 29

\[ \mathrm{cube}_{x,y,z}\, R = [\tilde{\triangle}_z D,\ \tilde{\triangle}_z D,\ \mathrm{id}] \;;\; \mathrm{cube}_{x,y,z}(R \;;\; [D, \mathrm{id}, \mathrm{id}]) \;;\; [\mathrm{id},\ \triangle_z D,\ \triangle_z D] \]

This theorem can then be used to generate synchronisation registers for the interfaces of the matrix multiplier when we pipeline it.

We undertake a layout verification of the cubical matrix multiplier using the previous correctness proofs for col, grid, the multiplier and ripple adder. In the cube block we need to add further explicit type annotations to eliminate some type unknowns, because the Isabelle/HOL theory of zip is not sufficiently detailed to describe the full types. This indicates that a full theory of zip (and indeed tplapl and tplapr) is required, and not just the simple approximation contained in the theory Inbuilt, even though the inbuilt blocks do not affect layout. Otherwise proofs are mostly automatic, with some intervention required for series compositions and for expanding the definition of word transpose. Appendix D.3 gives the full Quartz description and some proofs for the matrix multiplier circuit.

6.6.4 Results

We generate a circuit that multiplies two 2 × 2 matrices together to produce a 2 × 2 matrix as output, and evaluate two of these components on the Virtex-II. Table 6.8 shows the results for this circuit. Power consumption was measured at a clock frequency of 16.625 MHz. As can be seen, the manually placed version is out-performed by the automatically placed version for both the pipelined and unpipelined variants in terms of maximum clock frequency, although the percentage difference is small.
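The size of these differences can be checked directly from the figures reported in Table 6.8. The following is a minimal sketch in Python; the dictionary layout, the metric keys and the helper name `relative_change` are illustrative assumptions and are not part of the Quartz toolchain or the original evaluation scripts.

```python
# Figures transcribed from Table 6.8.
# Keys are hypothetical names: slices, place & route time (s),
# maximum clock frequency (MHz) and power (mW).
RESULTS = {
    ("unpipelined", "auto"):   {"slices": 1568, "par_s": 14, "fmax_mhz": 23.7, "power_mw": 276},
    ("unpipelined", "placed"): {"slices": 1413, "par_s": 12, "fmax_mhz": 23.0, "power_mw": 108},
    ("pipelined", "auto"):     {"slices": 1542, "par_s": 16, "fmax_mhz": 30.7, "power_mw": 192},
    ("pipelined", "placed"):   {"slices": 1476, "par_s": 12, "fmax_mhz": 28.9, "power_mw": 204},
}


def relative_change(variant, metric):
    """Percentage change of the placed design relative to the automatically placed one."""
    auto = RESULTS[(variant, "auto")][metric]
    placed = RESULTS[(variant, "placed")][metric]
    return 100.0 * (placed - auto) / auto


if __name__ == "__main__":
    for variant in ("unpipelined", "pipelined"):
        for metric in ("fmax_mhz", "slices", "par_s", "power_mw"):
            change = relative_change(variant, metric)
            print(f"{variant:12s} {metric:9s} {change:+6.1f}%")
```

On these figures the placed designs give up roughly 3% (unpipelined) and 6% (pipelined) of maximum clock frequency relative to automatic placement, while using fewer slices and less place and route time in both cases.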
                      Slices   Utilisation   P&R time (s)   Max. freq. (MHz)   Power (mW)
Unpipelined/Auto        1568           31%             14               23.7          276
Unpipelined/Placed      1413           28%             12               23.0          108
Pipelined/Auto          1542           30%             16               30.7          192
Pipelined/Placed        1476           29%             12               28.9          204

Table 6.8: Results for matrix multiplier circuit

The power consumption figures present a confusing picture, with no clear trends emerging. The pipelined, placed version actually consumes more power than the unpipelined, placed version (204 mW against 108 mW), which is unexpected since previous results have shown that pipelining reduces power consumption for many kinds of circuits [85]. However, the overall power consumption of the circuit is so low that any real effects may be overwhelmed by noise. Because of the shape of the placed matrix multiplier it is not possible to fit more onto the Virtex device; however, by increasing the pipelining (by pipelining the multipliers themselves, for example) the design could be run and evaluated at a higher clock frequency.

The placed version does place and route faster than the unplaced circuit and consumes fewer resources on the chip, so it could still be superior in situations where the small difference in maximum clock frequency is not significant.

It is possible that an alternative layout for the cube combinator would produce better results. With the z signals being used for the accumulator data path, there are long wires between the respective elements of each grid.

6.7 Evaluation and Conclusions

For the five example designs in this chapter we have seen a range of results for our four evaluation metrics: logic area, place and route time, maximum clock frequency and power consumption.

An important realisation is that manual placement is not an optimisation method per se but rather a way of exerting more control over the compilation process. The way in which that control is exercised determines whether the circuits generated are better or worse in some way than those that would have been generated automatically.

³ Proving that the D element can be moved out of cube cell is in fact difficult and fiddly; however, cube cell is designed to allow this, so we gloss over the proof here.