Verification of Parameterised FPGA Circuit Descriptions with Layout ...

Imperial College of Science, 

Technology and Medicine 

(University of London) 

Department of Computing 

Verification of Parameterised FPGA Circuit 

Descriptions with Layout Information 

by 

Oliver D. Pell 

Submitted in partial fulfilment 

of the requirements for the MSc 

Degree in Advanced Computing of the 

University of London and for the 

Diploma of Imperial College of 

Science, Technology and Medicine.

Abstract 

Manual placement is commonly used in FPGA circuit design in order to achieve better results 

than would be generated by automatic place and route algorithms. However, explicit place- 

ment of individual components in parameterised descriptions is tedious and error-prone. In 

this thesis we present a framework for the design and verification of parameterised hardware 

libraries with layout information. There are five main contributions: 

(1) We develop additions to the Quartz language and provide compiler support to allow 

the addition of generic layout information to parameterised circuit descriptions described 

using iterative and recursive constructs. We show how functional combinators can be given 

multiple layout interpretations. 

(2) We provide a specification of layout correctness and develop a proof environment to 

allow the verification of parameterised Quartz circuit layouts. We prove a range of useful 

theorems about common circuit layout expressions and achieve a high level of automation of 

the verification process. 

(3) We develop and verify a range of placed combinator libraries describing useful circuit 

structures including rows, grids, trees and less regular examples. We show that our verifica- 

tion environment can not only establish correctness but also can highlight counter-examples 

where layouts are incorrect. 

(4) We show how distributed specialisation can be used to achieve transparent HDL-level 

specialisation of circuits when some inputs are known. Distributed specialisation allows the 

correctness of specialised circuits to be proven more easily than using lower-level methods. 

We demonstrate the use of our layout framework to specialise parameterised circuits and 

show that our system is able to achieve design compaction. 

(5) We describe and verify the layouts of five example circuits, including a butterfly network, 

binomial filter and matrix multiplier. We show that manual placement can reduce compilation 

time, reduce logic area by up to 60%, reduce power consumption by up to 20% and increase 

maximum clock frequency by up to 80% for unpipelined circuits and 48% for pipelined 

circuits.

Acknowledgements 

Firstly, I’d like to thank my supervisor, Wayne Luk, for invaluable discussions and support 

throughout this project and the work that has preceded it. 

I’d also like to thank all the past and present members of the Custom Computing research 

group for general help and advice. Particular thanks are due to Tobias Becker, Jacob Bower, 

Arran Derbyshire, Rob Dimond and Henry Styles for pointing me in interesting directions, 

reading drafts, stating the obvious when I had missed it or simply sharing their experience 

of where it’s best to aim kicks at expensive bits of FPGA hardware in order to get them to 

do what you want them to. 

I owe a debt of gratitude to those who have developed the tools I have used in this project, 

most especially the people at University of Cambridge, TU Munich and elsewhere who are 

responsible for Isabelle. Thanks are definitely due to the members of the isabelle-users 

mailing list, particularly Larry Paulson and Tobias Nipkow, for answering my Isabelle-related 

questions. 

Finally, I’d like to thank the friends and family who have put up with me all these years and 

especially Nia who has had to listen to me talk endlessly about hardware verification and 

synthesis and yet has managed, most of the time, to pretend that it is interesting. 

ii

Table of Contents 

1 Introduction 1 

1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 

1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 

1.3 Organisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 

2 Background and Related Work 6 

2.1 FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 

2.1.1 Generating FPGA Circuits . . . . . . . . . . . . . . . . . . . . . . . . 7 

2.1.2 Layout Information in Hardware Descriptions . . . . . . . . . . . . . . 8 

2.1.3 FPGA Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 

2.2 Describing Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 

2.2.1 High Level Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 

2.2.2 Lower Level Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . 14 

2.3 Quartz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 

2.3.1 Type System and Overloading . . . . . . . . . . . . . . . . . . . . . . 16 

2.3.2 Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 

2.3.3 Formal Reasoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 

2.4 Automated Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 

2.4.1 Model Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 

2.4.2 Theorem Proving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 

2.4.3 Comparison with Model Checking . . . . . . . . . . . . . . . . . . . . 24 

2.4.4 Logic and Proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 

2.4.5 Theorem Proving Tools . . . . . . . . . . . . . . . . . . . . . . . . . . 25 

2.4.6 Embeddings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 

iii

TABLE OF CONTENTS iv 

2.5 Isabelle: A Generic Theorem Prover . . . . . . . . . . . . . . . . . . . . . . . 28 

2.5.1 Meta-logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 

2.5.2 Theories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 

2.5.3 Unification, Resolution and Proof . . . . . . . . . . . . . . . . . . . . . 31 

2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 

3 Generating Parameterised Libraries with Layout 34 

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 

3.2 Placement Infrastructure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 

3.3 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 

3.4 Block Sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 

3.4.1 Size Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 

3.4.2 Size of Block Instantiations . . . . . . . . . . . . . . . . . . . . . . . . 42 

3.4.3 Size Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 

3.5 Parameterised Quartz with Placement . . . . . . . . . . . . . . . . . . . . . . 46 

3.5.1 Laid-out Combinators . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 

3.5.2 Naive vs General Placement . . . . . . . . . . . . . . . . . . . . . . . . 47 

3.5.3 A Placed Ripple Adder . . . . . . . . . . . . . . . . . . . . . . . . . . 49 

3.6 Different Layout Interpretations . . . . . . . . . . . . . . . . . . . . . . . . . . 51 

3.6.1 Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 

3.6.2 Combinators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 

3.7 Compiling Placed Quartz Designs . . . . . . . . . . . . . . . . . . . . . . . . . 55 

3.7.1 Changes to the Type Processing Module . . . . . . . . . . . . . . . . . 56 

3.7.2 Layout Processing Module . . . . . . . . . . . . . . . . . . . . . . . . . 57 

3.7.3 Distillation of Size Expressions . . . . . . . . . . . . . . . . . . . . . . 58 

3.7.4 Recursive Size Expressions . . . . . . . . . . . . . . . . . . . . . . . . 59 

3.7.5 Expression Simplification . . . . . . . . . . . . . . . . . . . . . . . . . 61 

3.8 Compiling LE-Pebble into VHDL . . . . . . . . . . . . . . . . . . . . . . . . . 62 

3.9 Summary and Comparison with Related Work . . . . . . . . . . . . . . . . . 63 

4 Verifying Circuit Layouts 65 

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

TABLE OF CONTENTS v 

4.2 Choice of Formalism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 

4.3 Specifying Correctness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 

4.3.1 Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 

4.3.2 Containment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 

4.3.3 Intersection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 

4.4 Proof Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 

4.4.1 Type System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 

4.4.2 Blocks and Block Instantiation . . . . . . . . . . . . . . . . . . . . . . 73 

4.4.3 Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 

4.5 Generating Theories of Quartz Programs . . . . . . . . . . . . . . . . . . . . . 79 

4.5.1 Compiler Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 

4.5.2 Generating Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 

4.5.3 Generating Proof Obligations . . . . . . . . . . . . . . . . . . . . . . . 83 

4.6 Proving the Prelude Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 

4.6.1 Proofs with Tacticals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 

4.6.2 Improved Proof Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . 90 

4.6.3 Building a Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 

4.7 Proving Other Combinators . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 

4.7.1 Index Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 

4.7.2 An Irregular Grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 

4.7.3 H-Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 

4.7.4 Surround . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 

4.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 

4.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 

5 Specialisation 107 

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 

5.2 Distributed Specialisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 

5.2.1 Specialising Primitives . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 

5.2.2 Benefits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 

5.2.3 Verifying Distributed Specialisation . . . . . . . . . . . . . . . . . . . 114 

5.3 Optimal Distributed Specialisation . . . . . . . . . . . . . . . . . . . . . . . . 116

TABLE OF CONTENTS vi 

5.3.1 Specialising a Ripple Adder . . . . . . . . . . . . . . . . . . . . . . . . 116 

5.3.2 Modified Type System . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 

5.4 High Level Specialisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 

5.5 Specialising a Multiplier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 

5.5.1 Parallel Multiplier Implementation . . . . . . . . . . . . . . . . . . . . 122 

5.5.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 

5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 

6 Layout Case Studies 130 

6.1 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 

6.2 Adder Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 

6.2.1 Ripple Adder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 

6.2.2 Possible Tree Layouts . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 

6.2.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 

6.3 Median Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 

6.3.1 Circuit Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138 

6.3.2 Layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 

6.3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 

6.4 Butterfly Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142 

6.4.1 Butterfly Combinator . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 

6.4.2 Implementing a bitonic merger . . . . . . . . . . . . . . . . . . . . . . 145 

6.4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 

6.5 Binomial Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 

6.5.1 Circuit Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 

6.5.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 

6.6 Matrix Multiplier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 

6.6.1 A 3D “cube” Combinator . . . . . . . . . . . . . . . . . . . . . . . . . 152 

6.6.2 Describing N-dimensional Combinators . . . . . . . . . . . . . . . . . 155 

6.6.3 A 3D Matrix Multiplier . . . . . . . . . . . . . . . . . . . . . . . . . . 158 

6.6.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160 

6.7 Evaluation and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 

6.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

TABLE OF CONTENTS vii 

7 Conclusion and Future Work 164 

7.1 This Thesis’ Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 

7.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 

7.3 Comparison with Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . 168 

7.3.1 VHDL with Explicit Co-ordinates . . . . . . . . . . . . . . . . . . . . 168 

7.3.2 Relative Placement in Pebble . . . . . . . . . . . . . . . . . . . . . . . 168 

7.3.3 Ruby and Lava . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170 

7.4 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171 

7.4.1 Further Support For Alternative Layout Interpretations . . . . . . . . 171 

7.4.2 Less User Interaction In Proofs . . . . . . . . . . . . . . . . . . . . . . 172 

7.4.3 Integrating Layout and Functional Verification . . . . . . . . . . . . . 173 

7.4.4 Run-time Reconfiguration . . . . . . . . . . . . . . . . . . . . . . . . . 173 

7.4.5 Properties of N-Dimensional Combinators . . . . . . . . . . . . . . . . 174 

Bibliography 175 

A Quartz Language Grammar 184 

B Theoretical Basis for Layout Reasoning 187 

B.1 IntAlgebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188 

B.2 Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191 

B.3 Block . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191 

B.4 Inbuilt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192 

B.5 SeriesComposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193 

B.6 ParallelComposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193 

B.7 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195 

B.8 Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201 

B.9 CompilerSimps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204 

B.10 QuartzLayout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 

B.11 Minf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 

C Placed Combinator Libraries 207 

C.1 Prelude Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207

TABLE OF CONTENTS viii 

C.1.1 fst . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207 

C.1.2 R −1 (converse) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209 

C.1.3 R n (rcomp) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210 

C.1.4 Q\P (conjugate) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213 

C.1.5 map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214 

C.1.6 (tri) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217 

C.1.7 R↔S (beside) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219 

C.1.8 row . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222 

C.1.9 grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225 

C.1.10 loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227 

C.2 Recursive Index Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228 

C.2.1 ichain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228 

C.2.2 imap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230 

C.2.3 irow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232 

C.2.4 irdl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235 

C.2.5 igrid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239 

C.3 Irregular Grid Arrangement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244 

C.4 Square Element Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252 

C.5 H-Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260 

D Circuit Layout Case Studies 268 

D.1 Median Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268 

D.1.1 Quartz Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268 

D.1.2 Theory max2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271 

D.1.3 Theory min2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274 

D.1.4 Theory eq . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276 

D.1.5 Theory insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278 

D.1.6 Theory lct cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280 

D.1.7 Theory locater . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282 

D.1.8 Theory del cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283 

D.1.9 Theory compactor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285 

D.1.10 Theory insert median . . . . . . . . . . . . . . . . . . . . . . . . . . . 287

TABLE OF CONTENTS ix 

D.1.11 Theory nextstate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288 

D.1.12 Theory filter core . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291 

D.1.13 Theory filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293 

D.2 Butterfly Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294 

D.2.1 Quartz description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294 

D.2.2 Theory comparator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296 

D.2.3 Theory sort2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297 

D.2.4 Theory butterfly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301 

D.2.5 Theory merger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303 

D.3 Matrix Multiplier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305 

D.3.1 Quartz Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305 

D.3.2 Theory cube cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306 

D.3.3 Theory cube . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309 

D.3.4 Theory matmultcell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312 

D.3.5 Theory matmult . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313

Chapter 1 

Introduction 

This thesis presents a framework for the design and verification of parameterised hardware 

libraries with layout information. Our framework is based around the Quartz hardware 

description language, which supports higher-order combinators and relational operators de- 

signed to promote concise descriptions and formal verification of design function. We extend 

the Quartz language and compiler framework to allow the circuit designer to add layout 

information and formally verify the correctness of parameterised layouts rather than relying 

on automatic placement. 

The framework includes a conservative extension to the class of Quartz expressions to provide 

the extra functionality which we show is required to describe generic layouts for both recur- 

sively and iteratively described circuits, a compiler infrastructure for compiling designs with 

layout information into parameterised hardware libraries and a proof tool with a library of 

theorems to automate the verification of circuit layouts. The use of the framework is demon- 

strated on a range of example designs including median filter and matrix multiplier circuits 

and we show that including layout information in designs can improve their performance by 

up to 82%, reduce logic area by 40-60% and reduce power consumption in comparison with 

using the standard Xilinx place and route tools. 

The potential of this framework to support dynamic specialisation applications is also il- 

lustrated and we introduce the concept of distributed specialisation to achieve HDL-level 

specialisation of circuits in a manner such that the specialised circuits can easily be verified. 

1

CHAPTER 1. INTRODUCTION 2 

1.1 Motivation 

It is a characteristic of custom computing machines that the technology developers have put 

considerable effort into automating the translation from high level hardware descriptions into 

the underlying reconfigurable device fabric. Two critical stages in this process are placement 

and routing where computational resources on the device are allocated and connected to- 

gether. The effectiveness of the placement and routing algorithms has a significant impact 

on the performance of the resulting circuit since a badly placed design will feature unnecessary 

long wire lengths with accompanying delays and impact on maximum clock frequency. 

While modern place & route systems based on simulated annealing can achieve excellent 

results for the highest performance it is still common to intervene manually in the placement 

of designs and much of the value of Intellectual Property Cores, such as those produced by the 

Xilinx Core Generator tool, is that they are carefully laid out to provide good performance. 

It has been shown that user-supplied placement information can often significantly improve 

circuit performance in some common applications [77]. 

The placement of a particular design can be accomplished using graphical tools, either from 

scratch or amending an initial automatic placement. However, manual placements that 

exploit design geometry are likely to scale to a whole class of similar designs, for example the 

arrangement of a parallel multiplier into a grid-shaped structure will be the same for an 8-bit 

multiplier as for a 32-bit multiplier. Since it is possible to produce a single parameterised 

design description for an n-bit multiplier it is desirable to describe design layouts in the same 

way, such that the multiplier need only be laid out once for all possible values of n. 

Precise control of layout is particularly useful for parameterised hardware libraries, where 

placement must also be parameterised, since any inefficiency will affect all circuits that use 

them. This also impacts on circuits produced from high-level descriptions using imperative 

languages such as Handel-C [12] where eventually the description must be implemented by 

combining libraries of circuits performing different functions. 

Controlling placement is also desirable for reconfigurable circuits in order to minimise the 

reconfiguration necessary when switching between different chip configurations since com- 

ponents at identical locations common to both configurations do not need to be changed.


B 

A 

C 

Figure 1.1: An irregular grid such as this one is impossible to describe using purely beside 

and below relative placement. 

Run-time reconfiguration is a growing area of interest as designers seek to make better use of 

the reconfigurable nature of FPGAs to improve performance, particularly since FPGAs are 

otherwise slower and more power-hungry than ASICs. 

Placement information can be specified within hardware description languages by giving 

explicit co-ordinates for each component, however this approach is tedious and error prone. 

The potential for errors is particularly significant for parameterised libraries since designs 

which have painstakingly been established as functionally correct may still fail to synthesise 

properly for some parameter values if these particular parameterisations produce an invalid 

layout – for example, one where more than one component is placed at the same location or 

components are placed outside the area on the chip allocated to the library design. 

The use of relative placement information, placing components beside or below one another, 

has been proposed for simplifying the process of laying out circuits and reducing the potential 

for errors. Errors can not be totally eliminated unless explicit placement is disallowed entirely 

and beside/below placement alone is utilised, however this approach is too restrictive to 

permit the description of all possible desirable circuit layouts. For example, it is impossible to 

describe the layout of components shown in Figure 1.1 using only beside and below placement 

directives. 

The objective of this work is to create a framework allowing the clear and efficient description 

of parameterised circuit descriptions with layout information and which provides a formal 

assurance of correctness for layouts. 

D 

E


1.2 Contributions 

The contributions of this thesis are: 

1. A framework supporting the addition of generic layout information to parameterised 

FPGA libraries described in the Quartz language. We show how these circuit de- 

scriptions can be compiled onto FPGAs and describe how they can be translated into 

parameterised library descriptions in the Pebble hardware description language and 

structural VHDL. We give all Quartz circuit descriptions composed using standard li- 

brary combinators a basic layout interpretation, which is often already optimal, and also 

allow descriptions to be annotated to change the generated layout while maintaining 

the same function through the use of the overloaded combinator blocks. (Chapter 3) 

2. We provide a specification of layout correctness and construct an environment based on 

higher-order logic to allow the verification of the manual layouts specified for Quartz 

combinators and for full circuits. We prove a range of useful properties about common 

circuit layout expressions which we use to automate the verification of layouts for new 

combinators or circuits using the Isabelle theorem prover. (Chapter 4) 

3. We develop and verify a range of placed combinator libraries for common structures 

such as rows, columns, grids, trees and less regular examples such as the pathological 

example shown in Figure 1.1 and a square-element interface configuration. We achieve 

high levels of automation in the verification of these combinator layouts, with many 

proofs completed without any human intervention and others requiring only minimal 

user involvement. We show that the verification framework can not only establish 

correctness of valid layouts but also highlight counter-examples where layouts are not 

correct. (Chapter 4) 

4. We introduce distributed specialisation, where HDL-level specialisation of circuits when 

one or more inputs are known can be achieved without centralised control. Distributed 

specialisation allows circuits to be specialised in order to produce higher-performance 

variants at the HDL-level, eliminating or reducing the need for slow low-level opti- 

misation of circuits in time-critical dynamic specialisation applications. Distributed 

specialisation also makes formal verification of the specialisation process much sim-


pler, allowing the correctness of all specialised circuits to be established by proving 

their equivalence to the original general circuit. We show that our layout framework 

supports distributed specialisation and can be used to achieve design compaction dur- 

ing specialisation of parameterised circuits, in contrast to more conventional low-level 

approaches which eliminate unnecessary logic but do not compact the circuit. Speciali- 

sation with compaction reduces the area on-chip that must be allocated to a circuit and 

we demonstrate that it improves performance for a simple parallel multiplier design. 

(Chapter 5) 

5. We describe and verify the layout for five example circuits, including a median filter, 

butterfly network and a matrix multiplier described using a new class of n-dimensional 

combinators, and investigate the benefits of using user-specified placement constraints 

during synthesis. We show that manually placed designs can be placed a d routed faster 

and often have higher performance and lower power consumption while requiring less 

logic area on a Xilinx Virtex-II device. Improvements of up to 80% in maximum clock 

frequency and a 61% reduction in area are observed. (Chapter 6) 

1.3 Organisation 

The remainder of this thesis is organised as follows: Chapter 2 presents relevant background 

information and related work. Chapter 3 introduces the layout description framework and 

illustrates how circuit descriptions with layout information can be compiled. Chapter 4 de- 

tails the layout verification environment and gives details of some key proofs. Chapter 5 

introduces distributed specialisation and demonstrates the use of the layout framework to 

produce specialised circuits. Chapter 6 describes the construction, verification and perfor- 

mance of some example circuits. Chapter 7 evaluates this work, draws conclusions and 

presents recommendations for future research. 

Appendix A gives the full grammar of the extended Quartz language. Appendix B gives 

the definitions and proofs in the verification environment for Quartz circuit layouts and 

Appendix C gives example proofs for a variety of library combinators. Appendix D contains 

some of the proofs for the layout correctness of the circuit examples in Chapter 6.

Chapter 2 

Background and Related Work 

This chapter details some of the background to the work in this thesis. Section 2.1 briefly 

introduces FPGAs and the process of creating FPGA circuits, including Section 2.1.2 which 

describes previous work enabling the specification of circuit layouts manually. Section 2.2 

summarises some of the different ways of describing FPGA hardware and Section 2.3 in- 

troduces the Quartz hardware description language. Section 2.4 discusses some of the back- 

ground to automated verification of hardware and Section 2.5 introduces the Isabelle theorem 

prover. Section 2.6 summarises the contents of this chapter. 

2.1 FPGAs 

Field Programmable Gate Arrays are programmable logic devices which aim to combine 

the user control and time-to-market benefits of programmable logic devices (PLDs) with the 

densities and cost benefits of gate arrays. In the past decade FPGAs have increased in speed, 

size and density such that they are no longer limited to implementing glue logic within a 

system and are now increasingly used to implement major functions or complete systems. 

For example, in 1998 the Xilinx XC4000 series provided approximately 10,000 logic cells 

while the latest Virtex-4 family now provides around 200,000 cells. Reconfigurable logic has 

been used to effectively implement designs in computationally-intensive applications such as 

digital signal processing and cryptography. 

6

CHAPTER 2. BACKGROUND AND RELATED WORK 7 

FPGAs consist of a grid of programmable logic cells and programmable routing to connect 

these computational elements together. The vast proportion of chip area is made up of 

programmable routing resources. As well as basic programmable logic cells, many FPGA 

architectures also include more complex hardware such as embedded RAMs, multipliers and 

instruction processors. 

The main component of each FPGA logic cell is typically an SRAM look-up table (LUT), 

which can be programmed to implement any n-input logic function. Each logic cell usually 

contains a flip-flop which can be connected to the output of the LUT and possibly other 

specific logic designed to accelerate particular common functions. 

FPGAs are programmed to adopt a particular configuration by loading a “bitstream”. This 

is usually done at start-up but it is also possible reconfigure FPGAs at run-time, adapting 

their function as they continue to process data. This bitstream is typically generated from a 

high-level description in some kind of hardware description language. 

2.1.1 Generating FPGA Circuits 

The process of generating FPGA configurations from hardware descriptions is usually a 

complex and time consuming process. Four major steps are: 

1. Synthesis: generating a graph representing of logical expressions representing a design 

from a higher-level hardware description. This could include the process of creating a 

logical description from an imperative one, such as Handel-C [12]. 

2. Mapping: assigning logic graph nodes into devices resources such as look-up tables or 

registers. Algorithms like FlowMap [13] can be used to achieve this. 

3. Placement: the placement of mapped resources onto specific resources of the target 

architecture. Automatic placement algorithms typically use heuristics such as simulated 

annealing [35]. 

4. Routing: configuring the programmable routing fabric to connect the placed resources 

together to implement the logic graph.


LEN: for i in 0 to bits −1 generate 

constant row : natural :=(( width −1)/2)−(i /2) ; 

constant column : natural :=0; 

constant s l i c e : natural :=0; 

constant rloc str : string := ”R” & itoa (row) & 

”C” & itoa (column ) & ” . S” & itoa ( s l i c e ) ; 

attribute RLOC of U1: label is rloc str ; 

begin 

U1: FDE port map ( 

Q => dd( j ) , 

D => ff d , 

C => clk , 

CE =>lcl en ( en idx ) ) ; 

end generate LEN; 

Figure 2.1: Using RLOCs with a VHDL generate statement to explicitly place FDE primitives 

These four stages can require a long period to run because each can involve many iterations 

to optimise the results. In addition the results are often sub-optimal, particularly for the 

synthesis stage when a high-level description is being translated into logic equations. It 

is common to use structural, rather than behavioural, hardware descriptions for designing 

circuits where performance is important, or for hardware libraries where the return from the 

effort spent on optimising them is high because any inefficiency would affect all designs that 

use them. 

Automatic placement algorithms, while capable of producing good results, operate at a low- 

level and often human designers are able to specify better layouts by hand, exploiting their 

knowledge of the higher-level design structure. When elements of a circuit are badly placed 

this can lead to unnecessarily long wires which have a negative effect on the circuit’s perfor- 

mance. 

2.1.2 Layout Information in Hardware Descriptions 

When describing high performance hardware it is possible to specify layout constraints in 

the source hardware description which replace or augment the automatic placement system 

to produce better circuit layouts. This is often most beneficial when a human designer can 

exploit knowledge about the structure of a circuit to describe a geometric arrangement of 

components that will lead to related logic being placed closely together.


Structural hardware descriptions in VHDL or Verilog can be annotated with “RLOC” place- 

ment constraints for targeting Xilinx FPGA architectures. Figure 2.1 illustrates how this 

can be used in VHDL to place a column of flip-flops. The placement co-ordinates are given 

using the co-ordinate system for Virtex and earlier Spartan devices, where positions are 

given as a string of “RmCn.Sp” to place a component in a particular row and column and 

then at a numbered slice at that position. Later device families, including Virtex-II, use a 

slice-based co-ordinate system where components are placed with a co-ordinate scheme of 

“XmYn” where each combination of (m, n) describes a different slice. 

The Pebble language [46] supports a simpler but equivalent placement infrastructure by 

allowing hardware instantiations to be annotated with at (x,y). These co-ordinates are 

mapped into an appropriate architecture-specific format when the Pebble descriptions are 

compiled into VHDL [48]. 

Specifying layouts with absolute co-ordinates is tedious and error-prone, particularly for 

parameterised hardware descriptions where placement constraints may work for some com- 

binations of parameters but not others. For parameterised, placed, hardware libraries this is 

a particular issue since even if the design if functionally correct some instantiations of it may 

not produce valid layouts. 

An alternative is relative placement, where components are instantiated below or beside one 

another. Relative placement is not as powerful as placement with explicit co-ordinates, how- 

ever if properly specified then it does remove the possibility of describing incorrect layouts. 

The Pebble system has been extended to support relative placement [49, 50]. All block 

instantiations must be contained within a beside or below block which describes layout on a 

grid. beside for and below for constructs are provided to handle iteration. The Pebble system 

is unique in that relatively placed circuit descriptions can be compiled into parameterised 

libraries with explicit co-ordinates, however it does not always do this in the most optimal 

manner as we will discuss in Chapter 3. 

Relative placement systems based on higher-order combinators have been demonstrated for 

Lava [7] and Ruby [24, 26, 31]. Ruby circuits are described as relations between inputs 

and outputs and can be given layout interpretations through their use of beside and below


combinators. Lava provides a more sophisticated layout system designed specifically to target 

Xilinx FPGAs with combinators to place components beside each other, below each other or 

at the same location. 

Circuit layouts derived from Ruby relational descriptions can not be invalid, since only beside 

or below placement is allowed. The Lava system is more powerful and, because it allows 

components to be placed on top of each other (desirable in order to instantiate the different 

hardware primitives within a single slice) can generate invalid layouts. 

2.1.3 FPGA Architectures 

The three main companies producing FPGAs commercially are Altera, Actel and Xilinx. 

Most FPGAs are based on static RAM although Actel produces FPGAs based on antifuses 

[23]. An antifuse is an electrically programmable two-terminal device which changes from 

high to low resistance when a programming voltage is applied to it. Actel FPGAs are mainly 

used in military applications. 

Altera FPGA architectures are based around simple logic elements which contain a look-up 

table, flip-flop and some additional circuitry to implement fast-carry chains. Xilinx architec- 

tures are based around more complex Configurable Logic Blocks, each of which contains four 

“slices”. 

Xilinx Virtex-II [86] is a typical family of FPGAs with 11 members, ranging from 40,000 

to 8M system gates. The slices within each CLB are arranged in two columns with fast 

connections between the slices in each column for propagating carry signals. Each slice 

contains two 4-input function generators, carry logic, logic gates, multiplexers and storage 

elements. Figure 2.2 shows the circuit diagram of the top half of a slice in the Virtex-II 

architecture. 

The 4-input LUTs in each slice are capable of implementing any boolean function of up to 

four inputs and the propagation delay of the component is independent of the function being 

implemented. In addition to the basic LUT a component that is particularly worthy of note 

is the MUXCY multiplexer which permits the implementation of fast carry signals between 

the slices arranged vertically in a column. The ORCY component and dedicated Sum of


SOPIN 

Dual-Port 

Shift-Reg 

G4 

G3 

G2 

G1 

WG4 

WG3 

A4 

A3 

A2 

A1 

WG4 

WG3 

LUT 

RAM 

ROM 

G 

D 

WG2 

WG1 

WG2 

WG1 

MC15 

WS DI 

ALTDIG 

BY 

SLICEWE[2:0] 

CE 

CLK 

SR 

SHIFTIN COUT 

0 

MULTAND 

WSG 

WE[2:0] 

WE 

CLK 

WSF 

1 

1 

0 

SHIFTOUT 

MUXCY 

0 1 

G2 

PROD 

G1 

BY 

CYOG 

MUXCY 

0 1 

Shared between 

x & y Registers 

XORG 

ORCY 

DYMUX 

YBMUX 

CE 

CLK 

GYMUX 

FF 

LATCH 

D Q 

CE 

CK 

Y 

SR REV 

Figure 2.2: Top half of a Virtex-II slice (Copyright 2000-2005 Xilinx, Inc) 

SR 

SOPOUT 

Products (SOP) chain are designed to support the implementation of large SOP expressions. 

Some FPGAs include full instruction processors on-chip, in order to provide some additional 

general purpose computing capability. General purpose processors which incorporate some 

reconfigurable logic are also under development [4]. These software-configurable processors 

are designed to combine the benefits of FPGA parallelism with fast general purpose compu- 

tation to offer significant performance gains. 

2.2 Describing Hardware 

With the increasing complexity of electronic systems computer aided design (CAD) tools 

have become increasingly important. For decades, hardware was primarily designed using 

CIN 

YB 

Y 

DY 

Q 

DIG


schematic capture. Basic elements (logic gates) were selected, placed on a schematic and then 

connected together. This “bottom up” design approach can take a long time to generate large 

circuits and results in designs that are difficult to change. 

In the past 20 years a change has taken place in the methodology of circuit design and 

today hardware description languages (HDLs) have come to be the dominant design mecha- 

nism. Hardware Description Languages (HDLs) offer a number of significant advantages over 

schematic-based design: 

• Designs can be produced at register transfer level (RTL) without choosing a specific 

fabrication technology. Logic synthesis tools can be used to automatically convert a 

design for any fabrication mechanism, including generating FPGA bitstreams. 

• HDLs facilitate a top-down approach to design [69] that allows successive refinement 

of a specification. 

• HDLs support shorter development phases in projects by allowing functional verification 

of circuits early in the design cycle. 

• A textual description of a circuit complete with comments is an easier way to develop 

and debug circuits than large schematics, which become extremely cumbersome for 

complex designs. 

Hardware languages can be characterised by their programming style and the level of abstrac- 

tion they provide from the underlying hardware. High level languages allow hardware to be 

described more easily but less precisely and tend to produce circuits quicker but with lower 

performance. Lower level structural languages are closer to traditional schematic capture for 

circuit design and allow precise control to produce high performance circuits. 

2.2.1 High Level Approaches 

The increasing complexity of integrated circuits is leading to a significant gap between the 

amount of logic (reconfigurable, or otherwise) it is possible to fabricate on a given area of 

silicon and the capabilities of circuit designers to make use of this capacity. This “design gap”


is leading to considerable interest in methods of describing hardware at higher and higher 

levels of abstraction. Even though circuits generated from high level descriptions may be 

less efficient and have worse performance than those that have been subject to painstaking 

low level effort this is often less important than the easier development and shorter time to 

market it allows. 

One approach to high level description of hardware is to compile imperative languages directly 

into hardware. Imperative programming languages specify explicit manipulations of the state 

of a computer system through a series of instructions and are commonly used by computer 

programmers to write software applications. Unlike in VHDL or other declarative languages, 

the designer can concentrate on describing an algorithm in a similar style to a conventional 

programming language. 

A number of imperative languages have been proposed for hardware description. Handel-C 

[12] is a high level imperative language designed to be compiled into hardware. Available as 

a commercial system, Handel-C is based on ANSI-C and has extensions provided specifically 

for hardware development including explicit declaration of data widths, parallel processing 

and communication between parallel elements. Cobble [82] is another imperative language 

that allows declarative blocks to be used within imperative programs, allowing the benefits 

of low-level and high-level programming styles to be exploited. 

Another approach that has been taken is to compile higher-level system models directly into 

hardware. The theoretical development of digital signal processing algorithms in particular 

is often conducted using mathematical tools like Matlab and systems have been presented for 

converting Matlab models into FPGA implementations automatically [36, 57, 58]. Other work 

includes graphical tools for composing hardware from block diagrams [56] and compilation 

of a domain-specific language for networking applications into FPGAs [37]. 

SAFL [54] is a functional language for high-level hardware description. Functional pro- 

gramming treats computation as the evaluation of mathematical functions, emphasising the 

evaluation of expressions rather than the execution of commands. Pure functional languages 

encourage the use of formal reasoning about programs and are also characterised by the use 

of higher order functions - functions which take other functions as arguments. Functional 

languages are often considered to have significant advantages over imperative languages [28]


for conventional computer programming. Many of these advantages (such as the expressive 

power of higher order functions) are also advantages for hardware design since they enable 

the more effective capture of common design patterns. 

2.2.2 Lower Level Approaches 

Lower level hardware descriptions are typically declarative. Declarative languages describe 

relationships between constructs, rather than specifying an explicit series of steps to follow 

and can describe hardware in a way that corresponds most closely to schematic capture. 

VHDL [30] and Verilog [81] are the leading HDLs in use in industry and both allow hardware 

to be described structurally in this way. 

VHSIC HDL (Very high speed integrated circuit hardware description language) was devel- 

oped in the 1980s and was accepted as an IEEE standard in 1987. VHDL supports the 

development, verification, synthesis and testing of hardware designs and is an extremely 

complex language with a rich range of constructs, many of which have no obvious hardware 

implementation. This makes it powerful language but also a difficult one to learn and to use. 

Pebble [46] is a simplified variant of structural VHDL that allows hardware to be described 

through the connection of parameterised blocks. 

Hardware ML (HML) [41] is an example of a hardware description language based on the func- 

tional programming language SML. HML is designed to combine the advantages of strongly 

typed languages with the conciseness of untyped languages. Lava [7] is a functional structural 

HDL based on the popular functional language Haskell. Lava is focused on describing not just 

the function of circuit elements but also their relative layout. Lava’s composition operators 

compose both behaviour (by connecting the output of one circuit to the input of the next) 

and layout (by placing one circuit next to another). Ruby [31] is a relational programming 

language that supports higher order and polymorphic relations. Ruby supports a relational 

view of hardware and formal reasoning about design correctness, however new users require 

some time to familiarise themselves with its variable-free notation.


block addthree ((wire a, wire b), wire c) ∼ (wire d) { 

wire t. 

(a,b) ; add ; t. 

(c, t) ; add ; d. 

} 

2.3 Quartz 

Figure 2.3: A Quartz block which adds together three numbers 

Quartz [62, 65] is a declarative block composition language intended for describing digital 

circuits. A Quartz description is composed of a series of blocks which are defined by their 

name, interface type, local definitions and body statements. A block’s interface is divided, 

in a relational style, into a domain and a range. Primitive blocks represent hardware or 

simulation primitives and control the function of the circuit, while composite blocks control 

the structure and inter-connections of the primitives. 

Blocks can be visualised geometrically, typically as four-sided tiles that can be placed on a 

two dimensional surface in some inter-connected manner. An abstract additional dimension 

allows for the connection of additional signals such as a clock or static parameter values 

without disturbing the underlying geometry. Figure 2.3 illustrates the Quartz description and 

visual representation for a block which adds together three values, the dotted line indicates 

the division between the block’s domain and range. It is common to reason about and 

refine Quartz and Ruby designs using pictures and tools have been developed for Ruby to 

produce design diagrams automatically [25]. This pictorial interpretation is a useful aid to 

implementation, since interconnection is often minimised by careful placement of components. 

Quartz differs from existing relational and functional languages by allowing designers to 

mix VHDL-like and relational styles in a single design as appropriate. The language is a 

development of work on Pebble [46] and the Ruby [31] relational calculus.


The Quartz framework is also intended as a platform for experimenting with language and 

CAD algorithm design. The relative simplicity of the language and the modular design of 

the compiler simplifies experimentation with new language constructs and processing stages. 

A workflow exists to take Quartz designs from an abstract specification through to real 

hardware using a compiler which transforms Quartz into Pebble then into VHDL. 

Quartz higher-order combinators allow circuits to be described quickly and concisely by 

having libraries of common circuit structures available such as rows and columns as well as 

more complex structures such as trees. Quartz designs can also be parameterised by integer or 

boolean input variables and the combination of these features leads to an expressive language 

which puts the power of the Ruby calculus within a practical VHDL-inspired framework. 

2.3.1 Type System and Overloading 

Quartz has three basic signal types for wires, integers and booleans. Integers and booleans 

are used for parameterisation to control the circuit description while wires are a primitive 

type which are not evaluated by the compiler during elaboration. Assignment to wires is 

overloaded allowing both boolean and integer values to be statically assigned to wires as 

well as wires being connected together. Although an integer assignment to a wire clearly 

has no meaning in terms of pure hardware it can be very useful for simulating word-level 

descriptions of a design where a wire type signal can represent a data word. 

The language supports both tuples and vectors of signals. While tuple size is fixed by the 

designer at coding-time the length of a vector can be parameterised. Blocks also have a block 

type and can be passed as parameters to other blocks. Any valid polymorphic operation can 

be carried out on blocks, for example it is acceptable to build a vector of blocks. 

Quartz uses implicit typing and infers most types using the Hindley/Milner type system 

[14, 27, 53], however designers are required to enter type signatures for each block entity. 

This requirement contributes to the documentation of the code and allows clear and localised 

feedback on type errors. Combined with the use of tuples rather than lists for fixed-arity 

groups of signals, this substantially accelerates design development by eliminating time spent 

chasing confusing type errors.


Quartz also supports overloading. Overloading, or ad-hoc polymorphism, describes the use 

of a single identifier to produce different implementations depending on context, the standard 

example being the use of “+” to represent addition of both integers and floating point num- 

bers in most programming languages. Quartz blocks can be overloaded by defining multiple 

blocks with the same name, a mechanism that has a number of uses, including: 

• Primitive blocks can be overloaded when multiple hardware primitives are available 

which essentially carry out the same operation but with different types. 

• Higher-order combinators can be overloaded when multiple blocks have the same basic 

function but different parameterisations, or different non-functional properties. 

• Composite blocks can be overloaded with primitive ones as “wrappers” around the 

primitives e.g. to provide multiple different interfaces to the same functionality. 

The inclusion of overloading allows the designer to work at a higher level of abstraction than 

would otherwise be possible. In order to maintain type inference and permit the automatic 

resolution of overloading without requiring explicit annotations by the designer Quartz uses 

a type inference algorithm based on satisfiability matrix predicates [63, 64] – matrices that 

represent possible values of a type and relationships between type variables. This system 

minimises ambiguity and can express n-ary constraints between type variables clearly and 

easily. 

A key feature of the Quartz overloading system is that it permits overloading of blocks 

with overlapping types. This means, for example, that it is possible to overload a fully 

polymorphic block with type τ ∼ τ with specific instances for wire ∼ wire and int ∼ int. 

The overloading mechanism will select the most specific matching block to use, so for this 

example the instance with type wire ∼ wire would be used if wire types were supplied, even 

though the polymorphic instance also matches. 

Satisfiability matrices can also support blocks with different numbers of parameters by using 

an empty/void type Ω, which can be used to “pad” matrices and block types so that they 

are all the same length. Ω only unifies with itself so blocks with the wrong number of 

parameters are eliminated from the matrix when unification fails. This is a particularly 

useful mechanism for overloading blocks with versions that are have the same function but


have additional tuples of parameters to specify non-functional properties. 

2.3.2 Statements 

Quartz composite blocks contain one or more statements which can instantiate other blocks. 

A block is instantiated by connecting the domain/range signals to the values applied to them 

in the calling environment then processing any statements contained within the block. If the 

instantiated block is a primitive then the instantiation leads to an instance of that primitive 

existing in the final synthesised hardware. 

Quartz provides specific constructs to compose together and instantiate blocks in series with- 

out explicit connections or in parallel. These relational composition operators allow circuits 

to be described very concisely and also aid formal reasoning and a transformational design 

style. 

Series composition, denoted by the semi-colon operator, connects the range of one block to 

the domain of another. Parallel composition, denoted by square brackets, allows the “side- 

by-side” instantiation of blocks that do not communicate. For example, the following Quartz 

statement uses composition to express that d = ¬(a ∧ b) and e = ¬c: 

((a, b), c) ; [and2 ; inv , inv] ; (d, e) 

As well as the block instantiation statements, Quartz also supports a range of other statement 

types including loops, conditionals and assertions. One construct worthy of particular men- 

tion is the signal connection operation “=”. This is a polymorphic, bi-directional operation 

expressing the equivalence between two signals. The bi-directional nature of signal connec- 

tion allows blocks to be re-arranged by higher-order combinators, for example wiring blocks 

can be inverted, and still be compiled into VHDL with output = input style assignments. 

2.3.3 Formal Reasoning 

The higher order combinators and wiring blocks that form the Quartz prelude library have 

simple mathematical properties, giving us a useful set of laws for transforming circuits. We 

can use Quartz laws to reason about Quartz circuits formally and to alter the representation


of a circuit or to refine a circuit from a high level abstract representation into a more concrete 

one (see [32] and [74] for an illustration of this approach). 

An example of the definitions at the basis of Quartz are shown in the definitions below of 

the Quartz identity and converse relations: 

a ; id ; b ⇔ a = b 

a ; R −1 ; b ⇔ b ; R ; a 

More complex Quartz blocks are usually reasoned about using recursive definitions, such as 

the one below for repeated series composition (rcomp): 

R 0 = id 

R n+1 = R ; R n 

As implemented in the prelude library most Quartz blocks are defined iteratively rather than 

recursively, since this leads to a more concise VHDL implementation, however it is usual 

to use the recursive definitions for formal reasoning. The equivalence of the iterative and 

recursive definitions for Quartz blocks can be established formally [62]. 

Using these simple formal definitions we can prove transformation rules that can be applied 

to Quartz block instantiation statements, or entire blocks. For example, Theorem 1 below is 

a useful precursor to a theorem which allows designers to retime repeated compositions: 

Theorem 1 R ; S = S ; R ⇒ R ; S n = S n ; R 

Proof By induction on n. Base case, n = 0 is easy using the relationship R ; id = id ; R. 

For the induction case n+1, we can expand the left hand side using the definition of repeated 

series composition: 

R ; S = S ; R ⇒ R ; S ; S n = S n+1 ; R 

Re-arranging using the precondition gives: 

R ; S = S ; R ⇒ S ; R ; S n = S n+1 ; R


We are then able to use the induction hypothesis: 

R ; S = S ; R ⇒ S ; S n ; R = S n+1 ; R 

and fold back using the definition of repeated composition to complete the proof: 

R ; S = S ; R ⇒ S n+1 ; R = S n+1 ; R 

When R is a functional block and S is the polymorphic delay element D (which can be 

implemented using D flip-flops) the pre-condition of this theorem is called the timeless con- 

dition and is the basis of many re-timing theorems which allow us to prove the equivalence 

of pipelined Quartz circuits to their combinational equivalents. 

Quartz transformations can be used as part of a transformational design methodology where 

a correct but inefficient or non-implementable specification is gradually refined into a useful 

implementation. Each step in this refinement is proved correct and thus the final result is 

correct by design, requiring no additional verification. The T-Ruby design system [74] is an 

example of a transformational design environment that provides assistance for this process. 

Transformational design has its critics [16] and its suitability for general-purpose circuit 

design remains questionable, however it can be effective for carrying out certain operations. 

One issue with transformational design is the question of completeness - whether for every 

valid specification and implementation there is a chain of correct transformations that can be 

applied to transform one into the other. The same problem applies to general theorem proving 

in a language such as Quartz, with is effectiveness the same process in a different direction. 

The completeness property is not well understood but it has been proved that complete 

transformation systems can not exist for languages that support unbounded recursion or 

iteration constructs (such as while loops) [83]. 

2.4 Automated Verification 

In recent years design validation has become the key bottleneck in the digital circuit design 

industry. Formal methods for verifying the correctness of hardware designs are one possible 

solution to this problem. With this approach hardware behaviour is described mathematically 

and a formal proof is produced to verify that the implemented circuit meets a rigorous


specification. 

The aim of automated verification is to have this formal verification process carried out auto- 

matically, or to provide substantial machine assistance allowing users to carry out verification 

more easily. 

Formal verification, however conducted, usually begins with the construction of a specifica- 

tion. This provides a high level description of the expected behaviour of the system being 

described given a particular sequence of inputs. In order to be useful, a formal specification 

should be constructed that is an unambiguous description in some formalism. Logic is a 

popular formalism for describing hardware functionality and logics such as first-order logic, 

higher-order logic and modal/temporal logic have all been used to describe hardware speci- 

fications. The choice of formalism depends on the style of verification to be performed (for 

example, what properties are to be verified). It is also necessary to build an implementation 

model of the system. For some systems, this may be the hardware description of the system 

itself but in others it may be necessary to apply some abstraction in order to produce a 

usable model. 

The key to verification is to relate these mathematical models at different levels of abstrac- 

tion. A set of desired mathematical expressions proved in the specification model should be 

shown to hold in the implementation model. The correctness of the original specification 

is absolutely essential to formal verification since without this it is not possible to make 

meaningful statements about the implementation model. 

One popular means of relating specification and implementation models is model checking. 

This approach has key benefits (such as its ease of use) but quickly becomes computational 

intractable as the size of the hardware to be verified increases. An alternative is theorem 

proving, where the formal semantics of hardware descriptions are shown to be equivalent 

through a chain of mathematical reasoning. Theorem proofs can be extremely large and 

complex, so mechanised theorem-proving tools are often used to help construct them. 

A fundamental point that must be laboured is that formal methods can not guarantee the 

correctness of the final product, only of the design process. At some level all formal methods 

involve assumptions about the correct behaviour of underlying layers - if not the tools that


compile a design description into actual hardware then the operation of the hardware itself. 

The limits of formal methods for verification must be kept in mind – the techniques do 

not guarantee correct hardware but they do promise to remove or reduce error in the most 

error-prone stages of the process. 

2.4.1 Model Checking 

Model checking is an automatic model based property-verification method that is widely 

applicable for verification tasks. Checking starts with a model description and attempts to 

discover whether hypotheses asserted by the user are valid in the model. In this way the 

model checker can verify properties of the model (such as freedom from deadlocks), or can 

provide counterexamples in the form of an execution trace which fails the test. 

Modal checking is based on temporal logic [29] which allows the expression of formulae over 

transition systems. Model checking is essentially the exploration of the full state space of a 

system and thus can be highly automated but the size of the model that can effectively be 

executed is limited by the practical constraints of computer processor power, memory, etc. 

Despite this through considerable practical work on data structures (for example BDDs [11]) 

circuits of considerable size have been verified using model checkers. 

Symbolic Trajectory Evaluation (STE) [73], for example, is a model checking approach de- 

signed to verify circuits with very large state spaces since it is more sensitive to the property 

being checked than to the size of the circuit. STE grew out of symbolic simulation and it 

still close to traditional simulation as a verification method. 

A number of commercially available tools for hardware verification through model checking 

are available from EDA vendors such as Cadence and Synopsis. Model checking is increasingly 

used in commercial circuit development as part of the verification process, although not 

totally replacing simulation. Model checking is particularly useful even in systems that are 

too large for full exhaustive checking (which is most full systems) because it finds counter- 

examples - state transition traces that do not meet the specification - and so can be used 

as part of a bug-fixing process. Simulation and model checking can be used in combination 

to explore a large state space, with simulation used to reach an interesting state and then


model checking used to explore exhaustively around that state. 

2.4.2 Theorem Proving 

In theorem proving the relationship between a specification and an implementation is re- 

garded as a theorem that must be proved in an appropriate formalism. 

The broadest interpretation of theorem proving can encompass most methods of formal ver- 

ification including checking boolean equivalence and model checking, however it is generally 

applied to mathematic proofs of the properties of systems. The chief advantage of this ap- 

proach is the formal proof established through this process can be justified at every step and 

thus the overall soundness of the process is ensured. However the size and complexity of 

even relatively simple theorems means that proof by a human is often a long and difficulty 

process. 

Mechanised theorem-proving systems can be used to aid the proof of large theorems. Despite 

the name, these systems are generally better regarded as proof assistants than provers, since 

they usually require considerable human intervention to steer them toward their goal. The- 

orem provers can often automate trivial stages of proofs leaving only the difficult parts for 

humans to tackle and many can automatically explore possible proofs as a tree search using 

a variety of different algorithms. 

Theorem provers and the field of automated deduction in general have a long history, dating 

back to Robinson’s demonstration of resolution as a basis for mechanised deduction [70] in 

1965. One of the earliest applications of theorem proving was to geometric problems, as 

we apply it in this thesis, with Gelernter’s geometry-theorem proving machine [19, 20] in 

1959. Computational geometry [67] is an active field in its own right that specifically tackles 

the kind of proofs we consider in Chapter 4. However, geometry-specific algorithms are too 

restrictive in the type of equations they can process as we discuss later in this thesis.


2.4.3 Comparison with Model Checking 

Although theorem proving tends to require much more human intervention than model check- 

ing, it has distinct advantages. Firstly, it does not suffer from the state-space explosion that 

afflicts model checking as circuits grow in size. Secondly, it is often easy to prove theorems 

for entire classes of circuits while model checking tends to be restricted to specific instances 

of circuits, for example, the Quartz retiming law proved earlier as Theorem 1 can be applied 

to any circuit described using repeated composition and similar laws have been proved for 

common circuit structures such as rows, columns etc. 

When there is a high degree of proof re-use possible theorem proving may have an advantage 

since key lemmas can be proved once and used in other proofs many times. Also, when model 

checking is used to completely verify an implementation against a specification it is often 

necessary to expend considerable effort on simplifying the implementation model to reduce 

the state space of the problem to something tractable. Recent work [5] has demonstrated that 

theorem proving can produce better/stronger results in a similar amount of time to model 

checking for the verification of a security architecture, although a greater level of proof expert 

involvement was required. 

2.4.4 Logic and Proof 

The essential step in all theorem proving is to formulate the problem in some kind of logic and 

the choice of logic to be used is an issue of some debate. Problems can be formulated either 

in “raw logic” or can be embedded in an application-specific notation however the power of 

the underlying logic is key. Simple logics support more automation and computer-assisted 

proof search procedures are more likely to be effective however powerful logics support better 

specification and embedding. 

Set theory and first order logic is a standard logical foundation for many theorem proving 

applications. Higher-order logics, which allow more flexibility in the scope of quantifiers and 

the use of higher-order arguments to functions/predicates, are more expressive than first 

order logic however they are less “well behaved” and this makes automation more difficult. 

In general, it is easier for tool users to switch between different logics, which all share common


themes and concepts, than it is to switch between different proof methodologies or theorem 

proving tools. It is questionable whether there is any benefit in attempting to enforce a 

standard logic, though it seems likely that any moves in this direction in the future will be 

driven by EDA tool builders. 

Most mature theorem proving tools support both top-down/backward proof and bottom- 

up/forward proof. Backward proof involves the statement of a theorem goal and the applica- 

tion of rules to split the goal to be proved into subgoals. Each subgoal can then be handled 

in the same manner, splitting the goals repeatedly until trivial subgoals are reached that 

can be proved directly from logical axioms. Forward proof proceeds by starting from basic 

axioms and combining them using rules of inference until eventually the goal to be proved is 

deduced. Both styles have advantages and drawbacks, for example every step in a forward 

proof is correct and proved, however it may not bring the user any closer to proving the main 

proof goal. In a backward proof however, a proof is guaranteed to terminate at the correct 

conclusion but it may not actually be a proof of anything at all, unless it can eventually be 

reduced to something axiomatic. 

There are two main styles of interacting with a theorem prover: declarative or imperative 

(tactic-style). The imperative style effectively involves the creation of a proof-generating 

program as a combination of prover tactics in a typically prover-specific format. This style, 

typified by the descendants of the LCF theorem prover [21], is useful for finding proofs and 

for programming verification algorithms but produces output that is generally unreadable. 

As soon as the proof itself and not just the existence of a proof becomes important, the 

declarative style becomes beneficial. This style, pioneered by the Mizar proof checker [71], 

involves the statement of a series of lemmas or subgoals leading to a conclusion. Declarative 

systems are good for mechanised checking of proofs and can produce proof scripts that are 

easily readable by humans but are unwieldy and impractical for finding the original proofs. 

2.4.5 Theorem Proving Tools 

There are a range of theorem proving systems in widespread use. One is HOL, named after 

Higher Order Logic, the underlying formalism. HOL [22] is intended to be a general platform 

for the modelling of systems in higher-order logic with reasoning based on natural deduction


with a few primitive inference rules and axioms. HOL has only a few kinds of primitive 

terms: variables, constants, function applications and λ-abstractions with all other notations 

are derived from them (it is worth remembering that higher-order logic is based on typed 

λ-calculus). 

This simple core logic means that eventually all reasoning should be reduced to these primitive 

inference steps and this approach is quite low level relative to other theorem provers. HOL 

uses tactics to guide the system in the application of primitive steps toward solving theorem 

proving goals. A tactic can be regarded as a high level proof step where the primitive 

steps necessary to achieve the same functionality are carried out automatically. Tacticals are 

functions are used to combine a series of tactics into a larger step of inference. 

HOL has been used extensively in hardware verification, including the verification of full 

microprocessors [18, 33]. 

PVS, the Prototype Verification System [59], is a general-purpose interactive verification en- 

vironment developed at SRI International. The specification language of PVS is based on 

higher-order logic but also incorporates predicate types and subtypes that allow the defi- 

nition of partial functions. These constrained types lead to a type checking process that 

is undecidable and type correctness may incur additional proof obligations for the user to 

manage. 

Inference steps in PVS proceed at a high level, with primitive rules for operations such 

as boolean simplification and decision procedures for linear arithmetic. Unlike HOL it is 

therefore not necessary to rely on tactics to the same extent in order to build usable proof 

steps for the interactive environment. Strategies, which are analogous to HOL tactics, can 

be constructed to automate a sequence of PVS inference steps. 

PVS is less customisable than other theorem provers, however the high degree of automation 

makes it a very practical tool. One example of the large-scale use of PVS is the verifica- 

tion of the AAMP5 microprocessor [79], a commercial processor with around half a million 

transistors. 

ACL2 [34] is an automated reasoning system based on Boyer-Moore logic [9], a first-order, 

quantifier-free logic. ACL’s logic is a very small subset of Common Lisp, a standard list


processing language. Models of all kinds of systems can be built in ACL2 and, once written, 

can be executed as Lisp programs. 

Since ACL2 is based on a first-order logic it is considerably less expressive than theorem 

provers such as HOL which use higher-order logic, however the simply logic allows a very 

high degree of automation with little user intervention required. The user directs the proof 

search procedure by proving supporting lemmas. 

ACL2 has been used to verify some industrial processors, for example a Motorola digital 

signal processing chip [10]. 

Isabelle [61] is a descendant of the LCF system [21]. Unlike most theorem provers which 

focus on providing a single underlying formalism, Isabelle is intended to be used as a general 

platform for the implementation of theorem provers in a large variety of logics. The motiva- 

tion behind the creation of Isabelle is that, while theorem proving is an extremely difficult 

problem, most of the difficulties have to do with logic in general rather than any particular 

logic. A generic theorem prover will probably never provide the full range of support that a 

dedicated prover for each logic could, however by reducing the “barriers to entry” it makes it 

more likely that some proof support will be available for less common or application specific 

logics. Furthermore, since it is currently easier to learn a logic than to learn how to use 

a theorem prover it is also easier for users to learn a single prover and then use the most 

appropriate logic for their needs. 

2.4.6 Embeddings 

In order to using theorem proving for verification of hardware written in a description lan- 

guage it is necessary to develop an implementation of the language in a logic within a theorem 

prover. This process is referred to as embedding and broadly speaking these can be further 

sub-divided [8] into deep embeddings and shallow embeddings. 

A shallow embedding, or semantic embedding, involves the definition of the meaning of the 

language directly in terms of the connectives in the logic. A deep embedding is characterised 

by the definition of the syntax of a language in a formal logic, typically as some sort of 

abstract data type and the definition of a semantic meaning function.


A shallow embedding is typically much easier to construct than a deep embedding since the 

meaning of the language can be encoded directly. With a deep embedding the meaning of 

the language must be defined as a function over the abstract syntax in a way that can be 

cumbersome however it does have the advantage that the theorem prover is able to reason 

over syntactic structures and state theorems about all programs. 

2.5 Isabelle: A Generic Theorem Prover 

Although the work presented in this thesis is not dependent on a particular theorem prover, 

we make extensive use of the Isabelle proof tool and a basic understanding of its capabilities 

and how they can be used is useful for understanding the work presented in Chapter 4 in 

particular. 

2.5.1 Meta-logic 

Isabelle’s distinctive feature is its representation of logics within a small fragment of a higher- 

order logic, called the meta-logic [60]. 

Isabelle’s meta-logic is typed with basic and function types σ → τ. The basic types depend 

on the logic being represented but always include the type prop for propositions. The terms of 

the meta-logic are essentially those of the typed λ-calculus - constants, variables, abstractions 

and function applications. 

There are essentially three operations within the meta-logic: universal quantification, meta- 

implication and meta-equality. Isabelle uses the symbols , =⇒ and ≡ for these operations to 

avoid confusion with the equivalent operations in object logics. Implication expresses logical 

entailment, quantification expresses generality in rules and axiom schemes and meta-equality 

is intended for expressing definitions. 

Isabelle’s meta-logic is simply a logic like any other complete with basic inference rules. For 

example, Figure 2.5.1 shows the meta-logic rules for universal quantification and implication 

introduction and elimination.


[φ] 

ψ 

(⇒-I) 

φ ⇒ ψ 

φ ⇒ ψ φ 

(⇒-E) 

ψ 

(Î-I) 

φ 

x.φ 

 

x.φ 

(Î-E) 

φ[b/x] 

Figure 2.4: Isabelle meta-logic inference rules 

The power of Isabelle lies in the ability to use the meta-logic to represent the inference 

rules of other logics. Object logics are formalised by extending Isabelle’s meta-logic with 

types, constants and axioms. The natural deduction rules of object logics are represented by 

meta-level axioms. For example, the rules for introduction and elimination of the logical and 

operation in first order logic can be expressed as: 

P Q 

(∧-I) 

P ∧ Q 

P ∧ Q 

(∧-E1) 

P 

P ∧ Q 

(∧-E2) 

Q 

Declared as axioms in the Isabelle meta-logic these inference rules can be described by: 

P ; Q =⇒ P ∧ Q 

P ∧ Q =⇒ P 

P ∧ Q =⇒ Q 

Where the nested implication φ1 =⇒ (· · · φn =⇒ ψ) can be abbreviated as φ1 ; . . . ; φn =⇒ ψ 

which allows the easy expression of a rule with n premises. The syntactic resemblance 

between the meta-level axioms and the original inference rules is a happy coincidence arising 

from the similarities of the logic being represented to the meta-logic. In general, Isabelle 

possesses sophisticated mechanisms for supporting object logics with syntax independent 

from the meta-logic through syntax declarations and transformation rules which can be used 

to rewrite parsed abstract syntax trees.


The question can of course be raised as to whether the meta-logic representation is correct. 

Paulson [60] defines the properties that a meta-logic formalisation is faithful if it admits no 

incorrect object-level inferences and adequate if it admits all correct object-level inferences. 

2.5.2 Theories 

The basic building block of Isabelle mathematics is theories, which organise syntax, declara- 

tions, axioms and proofs. Theories are built starting from the Pure theory which represents 

the meta-logic, by extending and combining existing theories. 

Isabelle theories support multiple inheritance and theory dependencies form a directed acyclic 

graph (DAG). Theories can declare additional syntax for constants (operators) within a 

logic using a priority grammar where each nonterminal is annotated with an integer priority 

which controls how it is parsed. Mixfix annotations allow the formulation of sophisticated 

grammar productions to produce readable notation. Special support exists for variable- 

binding constructs such as quantifiers which can be declared as binders. 

Figure 2.5 shows the definition of a minimal logic of implication in Isabelle. Line 1 begins 

the MinLogic theory by stating that it inherits directly from the Pure meta-logic. Lines 2-5 

declare a type o of object logic formulae and line 7 declares a coercion from formulae to 

propositions. This allows object level operators to be defined over the type o rather than 

the general meta-logic prop type, which is important to prevent object logic operations being 

used on meta-logic propositions themselves. 

The consts section defines the constants and operators in the logic, annotating them with 

their types (the short double arrow ⇒ indicates a function type, and [a, b, c] ⇒ d abbreviates 

a ⇒ b ⇒ c ⇒ d). Lines 7 and 8 demonstrate the use of mixfix syntax annotations to describe 

how the constructs should be parsed, for example infixr indicates that the implication symbol 

should be regarded as a right associative infix operator. 

The axioms section declares the three inference rules of the logic as meta-logic axioms. 

Isabelle provides a wide-range of existing theories, grouped into complete object logics. The 

most commonly used Isabelle logics are first-order logic, ZF set theory (which is built as 

extension of first-order logic) and an implementation of higher-order logic in the style of the


1 theory MinLogic = Pure : 

2 types 

3 o 

4 arities 

5 o :: logic 

6 consts 

7 Trueprop :: "o ⇒ prop" (" " 5) 

8 "−→ ":: "[o,o] ⇒ o" (infixr 10) 

9 False :: "o" 

10 axioms 

11 impI : "(P =⇒ Q) =⇒ P −→ Q" 

12 impE : " P −→ Q; P =⇒ Q" 

13 falseE : "False =⇒ P" 

14 end 

HOL system. 

Figure 2.5: A minimal Isabelle logic 

2.5.3 Unification, Resolution and Proof 

Unification [70] is equation solving, for example the solution of solving f(?x, c) ≡ f(d, ?y) is 

?x = d and ?y = c. Isabelle uses higher-order unification, which operates on typed λ-terms 

as an equation solving mechanism to support the application of rules to goals. Higher-order 

unification also handles function unknowns so must guess the unknown function ?f in order 

to solve the equation ?f(t) = g(u1, . . . , uk). Isabelle denotes unknowns for unification (called 

schematic variables) by prefixing their name with a question mark. Logically schematic 

variables are similar to free variables however while ordinary variables remain fixed unknowns 

may be instantiated by unification. 

Resolution is used to combine two Isabelle theorems, renaming variables and instantiating 

unknowns as necessary to allow rules to be unified with the current proof state to create the 

next state. Resolution can be used for both forward and backward proof, although backward 

proof tends to be the preferred proof mechanism in Isabelle. A meta-level theorem such as 

A ; B =⇒ C can be regarded as an inference rule with premises A and B and conclusion 

C or can equally be viewed as a proof state with subgoals A and B and main goal C. 

In backward proof a goal is unified with the conclusion of a rule and the premises are created 

as new subgoals. For example, consider the trivial proof of the theorem A −→ A in the logic


defined in Figure 2.5. This proof can be represented in Isabelle as: 

theorem "A −→ A" 

apply (rule impI) 

apply (assumption) 

done 

The first instruction is for Isabelle to apply the rule impI which lifts the object level implica- 

tion to a meta-logic implication of the form P =⇒ P (The unknowns P and Q in the axiom 

are both unified with the constant A). From this state the proof can be completed using the 

assumption method which attempts to solve the right hand side of a meta-implication using 

the assumptions on the left. A =⇒ A can be proved this way leaving an empty (complete) 

proof state and the proof is completed with the done command. 

Isabelle can also solve simple theorems automatically. For example, the proof (in higher-order 

logic) that reversing a list twice is the same as the original list can be proved as: 

theorem rev rev [simp]: "rev (rev xs) = xs" 

apply (induct xs) 

apply (auto) 

done 

This theorem has been named (rev rev) and declared as a simplification rule. It is proved 

by the application of induction on the variable xs and then automatically by Isabelle. The 

auto method uses both of Isabelle’s two most useful automation tools, the simplifier and the 

classical reasoner. 

The simplifier is a term rewriting tool. It repeatedly applies equations from left to right, 

using declared sets of simplification rules. Virtually any theorem in the form A = B can be 

used as a simplification rule however it is generally the case that only rules which genuinely 

simplify the proposition should be declared as automatic simplification rules. In the example 

above,xs is clearly simpler thanrev (rev xs) so this could be safely used as a simplification 

rule however if the theorem were proved in reverse – xs = rev (rev xs) – then this would 

not be a suitable simplification rule. Not only does it not simplify the proposition but it 

will actually cause the simplifier to loop infinitely by continually expanding the xs on the 

right hand side. It is often useful to invoke the simplifier by itself, which can be done using 

Isabelle’s simp method.


The classical reasoner is a family of tools which performs backward proofs automatically. 

The classical reasoner performs backward proof search and, combined with the simplifier, can 

prove many theorems without much user intervention. The classical reasoner can decompose 

goals into less complex subgoals using pre-proved lemmas and thus can be guided by supplying 

sets of useful lemmas that will substantially shorten the proof. 

2.6 Summary 

In this chapter we have introduced some of the background to the work described in this 

thesis and highlighted some related work. We have described FPGA architectures in general 

and examining the structure of the Xilinx Virtex-II FGPA family in more detail. We have 

described Quartz, the hardware description language we use as the basis of this work and 

shown how it supports formal reasoning. 

We have described the field of automated verification based on theorem proving and com- 

pared this approach with model checking. Finally, we have discussed in detail some of the 

capabilities of the Isabelle generic theorem prover we use in this work.

Chapter 3 

Generating Parameterised 

Libraries with Layout 

In this chapter we describe how the Quartz framework can be extended to allow the generation 

of parameterised hardware libraries with layout information. Section 3.1 introduces our 

motivation for supporting hand placement of designs and discusses the benefits of relative 

placement. Section 3.2 introduces the basic concepts behind our placement infrastructure 

and Section 3.3 discusses what is required to provide language support for hand placement in 

Quartz. Section 3.4 demonstrates how block sizes can be described by extending the class of 

Quartz expressions and Section 3.5 demonstrates how the Quartz layout system can describe 

parameterised combinators and full designs. Section 3.6 shows how blocks can be given 

multiple layout interpretations. Section 3.7 describes the compilation of placed Quartz into 

an extended version of Pebble 5, while Section 3.8 describes how this can then be compiled 

into parameterised VHDL. Section 3.9 summarises this chapter and discusses related work. 

3.1 Introduction 

Placement and routing are critical steps in the compilation of a high-level hardware de- 

scription onto the reconfigurable fabric of a FGPA. The effectiveness of the placement and 

routing algorithms has a significant impact on the performance of the resulting circuit since 

34

CHAPTER 3. GENERATING PARAMETERISED LIBRARIES WITH LAYOUT 35 

a badly placed design will feature unnecessary long wire lengths with accompanying delays 

and impact on maximum clock frequency. 

While modern place and route systems based on simulated annealing can achieve excellent 

results for the highest performance it is still common to intervene manually in the placement 

of designs and much of the value of Intellectual Property Cores, such as those produced by the 

Xilinx Core Generator tool, is that they are carefully laid out to provide good performance. 

Manual placement is of particular value when a human designer can recognise and exploit an 

underlying regularity in a high level description. 

Ensuring optimal resource use and layout is particularly important for parameterised hard- 

ware libraries since any inefficiency will affect all designs that use them. Controlling place- 

ment is also desirable for reconfigurable circuits to support partial reconfiguration, since 

identical components at identical locations common to two different configurations do not 

need to be reconfigured when switching between them. 

The Xilinx “RLOC” placement macro allows primitive hardware components to be placed 

relative to each other in a structural design description. RLOCs provide a relatively low-level 

interface to influence the place and route system and allow the production of parameterised, 

fully or partially laid out designs in a conventional hardware description language like VHDL 

or Verilog. However, supplying explicit coordinate information for every component in a 

large circuit is tedious and error-prone. 

Relative placement, laying out components beside or below one another, has been proposed 

for producing designs. Systems that support this technique include Ruby [26], Lava [7] and 

Pebble [49, 50]. Neither the Ruby-based or Lava-based systems that have been developed 

maintain full parameterisation while processing layout, something that makes them unsuit- 

able if the desired goal is to produce parameterised library descriptions. The Pebble system 

does maintain parameterisation while converting relative positions into explicit coordinates 

however the generated layouts are not necessarily optimal and designers are restricted to 

using relative positioning only, without any explicit coordinates. 

In this chapter we develop a system that allows the combination of relative and explicit 

placement information in a single framework and the compilation of these descriptions into


FPGA libraries in the industry standard language VHDL while maintaining a high degree of 

parameterisation. 

3.2 Placement Infrastructure 

Our system is based around the Quartz framework [65]. Quartz is a simple language that 

incorporates features found in a wide range of other languages, thus providing a good in- 

frastructure for seeing the impact of different features on the layout system. This also means 

that the framework we develop can be adapted to many other languages merely by removing 

aspects the language does not support. 

Quartz supports polymorphism, higher-order combinators, overloading and compilation to 

parameterised VHDL via the Pebble system. The language also supports both recursion 

and explicit iteration with a for loop structure. While the favoured method of repetition 

in functional languages is recursion, parameterised circuit descriptions are more commonly 

written iteratively so support for both mechanisms is useful. 

Quartz is a high-order language that supports hardware blocks which can be parameterised 

by other blocks. These higher order combinators can be leveraged to describe component 

placement as well as functionality, thus eliminating the necessity of adding specific language 

constructs for below, beside, below for... etc as was the focus of the earlier work on parame- 

terised placement in Pebble. 

Two key concepts underly our Quartz placement infrastructure: an explicit placement at 

(x,y) command for block instantiations and a hierarchical approach to placement so explicit 

coordinates within each block are relative to a local origin. 

However, in a high-order language such as Quartz absolute explicit co-ordinates are even less 

useful than in a language such as VHDL. Because hardware blocks can be passed as higher- 

order parameters the actual block instantiated by a particular piece of code can actually be 

unknown at design-time, leaving the problem of specifying absolute coordinate locations for 

the next block to place an intractable one since it depends on the precise size of the unknown 

block. It is therefore necessary to support relative positioning even in explicit coordinates


so that, for example, block R can be positioned explicitly at an offset of the (unknown) 

size of block S. We do this by providing height() and width() functions that can be used in 

expressions to explicitly refer to the size of a particular block. 

Quartz combinators can therefore be used to provide relative placement capabilities though 

they are themselves described in terms of explicit coordinates. This approach has the ad- 

vantage that layout and function can be described by the same code and in many cases a 

geometrically sensible layout is a natural by-product of the functional description. 

The Quartz layout system provides an architecture independent abstract homogeneous grid 

of resources. This allows the layout infrastructure to be developed in an architecture indepen- 

dent manner and layout information can be translated into the specific format required for 

the target architecture during compilation. The assumption that the available resources are 

arranged in a homogeneous grid is an approximation to the complexity of modern FPGAs 

where the device fabric will typically also include specialised resources such as embedded 

RAM or fast multipliers however it is sufficiently close to reality for practical use. 

The Quartz layout grid is composed of elements of uniform size 1 × 1, so all positions on the 

grid can be addressed using purely integer co-ordinates. This is, once again, an approximation 

to FPGA structure where a single computational block has multiple resources, for example 

the Xilinx Virtex-II slice illustrated in Chapter 2 includes a look-up table and also other 

logic gates and multiplexers. How this simplified grid is used to describe real circuits will be 

demonstrated in Chapter 6. 

3.3 Requirements 

Figure 3.1 shows the different Quartz statements. Each type of statement produces particular 

kinds of structures with specific layout requirements. Our layout infrastructure must be able 

to support all of these structures. It is important to note that what we are essentially 

discussing how the size of each statement can be described since if the explicit coordinate 

system is to support fully relative placement then it is necessary to be able to place statements 

relative to the size of other statements. The same approach can be extended to whole blocks 

and each block should be given a size function, parameterised in the same way as the block


〈stmt〉 ::= assert (〈expr〉) "〈string〉" . 

| 〈expr〉 = 〈expr〉 . 

| 〈arg〉 ; 〈blkref 〉 (; 〈blkref 〉 )* ; 〈arg〉 . 

| for 〈id〉 = 〈expr〉..〈expr〉 { 〈stmt〉* } . 

| if (〈expr〉) { 〈stmt〉* } ( else { 〈stmt〉* } )? . 

〈blkref 〉 ::= 〈id〉 〈arg〉* | [ ] | [ 〈blkref 〉 (, 〈blkref 〉)* ] 

〈arg〉 ::= 〈id〉 〈vecindex 〉 | 〈expr〉 

| ( ) | ( 〈arg〉 (, 〈arg〉)* ) 

Figure 3.1: Grammar of Quartz statements 

itself, which describes the size of the statements within the block. 

Assertion statements are compile-time directives that check for the validity of particular pre- 

conditions and thus have no effect on the resulting layout. The assignment operation has 

three (overloaded) uses: 

1. Assigning values to variables which may control elaboration. 

2. Assigning static values to wires. 

3. Connecting wires together. 

(1) clearly does not have any direct effect on layout (save that it may affect the way later 

statements are processed). In our model that concentrates on layout rather than routing we 

will assume that (2) and (3) also have no effect on layout. This is a reasonable assumption 

since FPGA routing resources are independent from computational ones and it is not generally 

necessary to allocate computational logic resources to either of these functions 1 . 

Conditional statements allow different hardware to be generated depending on the evaluation 

of a boolean expression. This is significant because the two possible branches of the condi- 

tional can have different sizes and thus any reference to the size of the overall conditional 

must take this into account. The easiest way to do this is to allow placement expressions to 

contain conditionals of their own, a simple and remarkably powerful approach. 

Loops allow the instantiation of multiple hardware blocks. An important realisation for the 

1 Issues with insufficient routing resources are unusual in most cases when using the standard placer and 

router. However, if a lot of logic is packed with LOC/RLOC constraints regardless of the available routing 

resources this could increase the risk of routing congestion making it possible to describe circuits that can 

not be routed.


general use of these constructs is that each iteration of the loop need not have the same size 

(though it often will), for example the loop statements could themselves contain a conditional. 

Additional functions will need to be added to the placement expressions to allow placement 

relative to arbitrary loop constructs to be described. 

Block instantiations include not just instantiations of individual blocks but also series and 

parallel compositions of blocks. Series and parallel compositions must be given their own 

layout interpretation to ensure that blocks within a composition are laid out correctly, while 

the whole composition itself can be placed using the at (x,y) command. 

An important requirement for a truly general Quartz framework is that it must be able to 

support the full range of Quartz block parameterisation. Two features of Quartz that differ 

from languages like Pebble/VHDL are a particular issue here: 

• Quartz blocks are relational and may have multiple possible interpretations in terms of 

inputs/outputs. During compilation a single input/output arrangement is selected for 

each block however when describing the size of a block in terms of its inputs it is not 

always clear which parameters supplied to the block are inputs and which are outputs. 

• Variable parameters are not restricted to a particular “generics” region of the block 

interface and can be distributed anywhere throughout the block’s domain/range. Fur- 

thermore, Quartz blocks can output variables as well as being parameterised by input 

variables. The compilation of one block may therefore produce a value which impacts 

the compilation of a later block. 

Since a block size can depend on any variable parameter it is important that the block size 

function should be in terms of all input variables, whether in the domain or range. However, 

this is complicated further by polymorphism – which signals are variables and which are 

real hardware is not in general known. While a polymorphic variable can not effect the 

compilation, and hence size, of the immediate block (otherwise its type would have been 

deduced as either int or bool during the type-checking stage), it could be supplied as a 

parameter to a higher-order parameter block and (sometimes) affect that block’s size, hence 

affecting the calling block’s size indirectly. In order to support this possibility block size 

functions must be phrased in terms of all domain and range signals, since any one of them


could affect the block’s elaboration. 

A good example of the usefulness of not restricting variables to a specific region of block 

parameters are shadow values [75, 76]. This refers to when hardware wires are “paired” 

with variables which control the instantiation behaviour of ‘clever components’ - components 

which take account of the context (as expressed by the variable values paired with each wire) 

to generate appropriate hardware). Sheeran describes the use of shadow values to control 

the elaboration of a somewhat regular circuit for finding the median of a set of values and we 

discuss the use of similar (though differently motivated) constructs for specialising designs 

in Chapter 5 of this thesis. 

3.4 Block Sizes 

3.4.1 Size Expressions 

Unlike a functional language such as Haskell and in a style that is more similar to a language 

like VHDL, Quartz statements are distinct from arithmetic or logical expressions. In the 

previous section we established that since block sizes can vary depending on statement exe- 

cution the system to describe block sizes must mirror statement structures, thus we extend 

Quartz expressions with new constructs to facilitate this. 

Figure 3.2 shows how Quartz expressions have been augmented. The height and width func- 

tions allow an expression to refer explicitly to the size of a block or block composition. 

They are parameterised not just by a block name but by a full block instantiation, such as 

(a, (b, c)) ; sndadd ; mult ; d. This allows all possible parameterisation information to be cap- 

tured. max is an n-ary operator which selects the maximum value from a set of expressions. 

The conditional, if(cond, etrue, efalse), chooses one of two values depending on the value of 

the conditional test. 

The sum(i = l..u, e(i)) function sums an expression over a range, implementing 

u 

e(i). 

Unlike the mathematical definition of summation, if the lower bound is greater than the 

upper bound the function returns zero, rather than being undefined. The maxf function is 

parameterised in the same way and selects the maximum value of the expression in the range, 

i=l


〈expr〉 ::= 〈expr〉 〈bop〉 〈expr〉 

| 〈uop〉 〈expr〉 

| 〈var〉 

| 〈num〉 

| true | false 

| height ( 〈blkinst〉 ) 

| width ( 〈blkinst〉 ) 

| max ( 〈expr〉, 〈expr〉 (, 〈expr〉)* ) 

| if ( 〈expr〉 , 〈expr〉 , 〈expr〉 ) 

| sum ( 〈id〉 = 〈expr〉 .. 〈expr〉 , 〈expr〉 ) 

| maxf ( 〈id〉 = 〈expr〉 .. 〈expr〉 , 〈expr〉 ) 

〈bop〉 ::= and | or | nand | nor | xor | xnor 

| + | - | * | / | ** | mod | == | != 

| < | | >= 

〈uop〉 ::= - | abs | not 

Figure 3.2: Grammar of Quartz expressions 

once again returning zero if the lower bound is greater than the upper bound. 

More formally, the semantics of Quartz expressions are described by the evaluation function 

E. The clauses describing the semantics of the new operations are shown in Figure 3.3. 

The concept of a “function” is kept clearly restricted to block height and width expressions, 

while elsewhere the semantics are based on substitution (shown as {a ↦→ b}) rather than 

λ-abstraction. This distinction exists because Quartz expressions must be compiled into 

Pebble/VHDL expressions without functions, λ-calculus like semantics are restricted to block 

size functions since these will be eliminated during Quartz compilation. We will show in 

Chapter 4 how λ-calculus style functions can be used to achieve the same semantics. 

Note that we have extended expressions with max and maxf operators but have not included 

their duals min and minf. While this might be desirable in the interests of symmetry actu- 

ally we do not require minimum-finding operations for generating layouts and in any event 

expressions using these functions can be manipulated to eliminate them using the following 

two relationships: 

Theorem 2 ∀a b. min(a, b) = −max(−a, −b) 

Theorem 3 ∀e1 e2 e3. minf(i = e1..e2, e3) = −maxf(i = e1..e2, −e3)


E :: SizeEnv → BlockEnv → V arEnv → Exp → Exp 

Eσ β µ height(x ; bs ; y) = 

let (w,h) = SIσ βbs in Eσ β µh(x, y) 

Eσ β µ width(x ; bs ; y) = 

let (w,h) = SIσ βbs in Eσ β µw(x, y) 

Eσ β µ max(e1, e2, . . . , en) = 

let e ′ 1 = Eσ β µe1 in 


. 

let e ′ n = Eσ β µen in 

max (e ′ 1 , e′ 2 , . . ., e′ n) 

Eσ β µ sum(i = e1..e2, f) = 


if e ′ 2 < Eσ β µe1 then 0 

else Eσ β µ{i ↦→ e ′ 2 }f+sum(i = e1..(e ′ 2 

Eσ β µ maxf(i = e1..e2, f) = 

− 1), f) 

let e ′ 1 = Eσ β µe1 in let e ′ 2 = Eσ β µe2 in 

if e ′ 2 < e′ 1 then 0 

else if e ′ 2 = e′ 1 then {i ↦→ e′ 2 }f 

else Eσ β µmax({i ↦→ e ′ 2 }f, maxf(i = e1..(e ′ 2 − 1), f)) 

Figure 3.3: Semantics of size expressions 

Proof Theorem 2 is proved by expanding the definitions of min and max and re-arranging. 

Theorem 3 is proved by analysing the cases where e1 > e2 and e1 ≤ e2 separately. In the 

first case, both functions return zero and thus the proposition holds, in the latter case proof 

is by induction on e2 (with a base case of e1 = e2) and then re-arrangement using Theorem 

2. Mechanised proofs are given as theorems min max corres and minf maxf corres in 

Appendix B.11. 

3.4.2 Size of Block Instantiations 

The function SIσ β returns the size functions for a particular block instantiation. It is 

parameterised by two environments, σ maps block identifiers to a pair of (width, height) 

functions, β maps block identifiers to their definition. The function definition can be seen in 

Figure 3.4. The semantic function for blocks, Bβ, generates a logical predicate corresponding 

to the definition of a block identifier, SIσ β uses the function B ′ β which carries out the same 

operation for block instantiation statements. These functions are defined in Figure 4.8 on 

page 81 however their precise definition is mainly relevant to layout verification in Chapter 4 

rather than compilation.


SI :: SizeEnv → BlockEnv → Blkinst → (Exp × Exp) 

SIσ β bid p1 . . . pn = 

let (w, h) = σ(bid) in (w(p1 . . . pn), h(p1 . . . pn)) 

SIσ β [ b1 , . . . , bn ] = 

let (w1, h1) = SIσ βb1 in 

. 

let (wn, hn) = SIσ βbn in 

(λ(x1, . . .,xn)(y1, . . . , yn). max(w1(x1, y1), . . . , wn(xn, yn)), 

λ(x1, . . . , xn)(y1, . . .,yn). h1(x1, y1) + · · · + hn(xn, yn)) 

SIσ β b1 ; b2 = 



(λxy. let s = (ιs. B ′ β b1(x, s) ∧ B ′ β b2(s, y)) in w1(x, s) + w2(s, y), 

λxy. let s = (ιs. B ′ β b1(x, s) ∧ B ′ β b2(s, y)) in max(h1(x, s), h2(s, y))) 

Figure 3.4: Calculating a size function for a block instantiation 

S 

R 

T 

(a) Parallel composition 

R 

S 

T 

(b) Series composition 

Figure 3.5: Sizes generated for block compositions


Where a single block is instantiated the width and height functions are extracted from the σ 

environment and are returned after the application of curried parameters p1 . . . pn. The size of 

a parallel composition of blocks is calculated as though the blocks are arranged vertically, with 

the overall height being the sum of the height of each individual block and the overall width 

being the maximum width of any block in the composition. The size of a series composition 

of blocks is calculated as though the blocks are arranged horizontally in a similar manner. 

Series composition is represented as a fully associative binary operator (an n-size composition 

can be represented as nested series compositions) for convenience. The series composition 

size function internal signal s is defined in terms of the ι definite description operator. This 

operator, in the form ιx. P(x) can be read as “the x such that P(x) holds”. In this case 

the internal signal s is the one such that it obeys the predicates produced by the application 

of function B ′ to the two blocks. This is made necessary by the relational nature of Quartz 

blocks which means it is not possible to state whether block b1 or block b2 generates the value 

s (or some combination of the two, if it is a tuple of values). This logical nicety is practically 

irrelevant for us in this chapter since we will convert relational Quartz descriptions into 

functional ones during compilation while evaluating any size functions. 

The grey boxes with dotted outlines in Figure 3.5 illustrate the sizes calculated for series and 

parallel compositions diagrammatically. 

3.4.3 Size Inference 

As noted in the previous section each type of statement has a particular layout interpretation 

and we can exploit that to describe the size of each type of statement recursively. The size 

of an entire block can then be described in terms of the sizes of its constituent statements. 

The function SB, shown in Figure 3.6, returns a pair of expressions describing the height 

and width of a block in terms of its domain, range and internal signals. 

Unlike the SI function we described previously this function is directly implementable since 

it does not use the definite description operator or refer explicitly to internal signals. Note 

also that this function does not infer size functions for the block, it infers size expressions 

in terms of the block’s local environment. These size expressions can be converted into


SB :: Block → (Exp × Exp) 

SB block bid d1 . . . dn ∼ r { τ1 id1 . . . τq idq. stmts } = SS ′ stmts 

SS ′ :: StmtList → (Exp × Exp) 

SS ′ stmt1 . . . stmtn = 

let (w1, h1) = SSstmt1 in 

. 

let (wn, hn) = SSstmtn in 

(max(w1, . . .,wn), max(h1, . . . , hn)) 

SS :: Stmt → (Exp × Exp) 

SS assert e str = (0, 0) 

SS e1 = e2 = (0, 0) 

SS if e { stmts1 } else { stmts2 } = 

let (w1, h1) = SS ′ stmts1 in 

let (w2, h2) = SS ′ stmts2 in 

(if(e, w1, w2), if(e, h1, h2)) 

SS for i = e1..e2 { stmts } = 

let (w, h) = SSstmts in 

(maxf(i = e1..e2, w), maxf(i = e1..e2, h)) 

SS a ; blkinst ; b at (x, y) = 

(width(a ; blkinst ; b) + x, height(a ; blkinst ; b) + y) 

Figure 3.6: Inferring the size of a block 

functions in terms of the block’s domain and range variables by binding any local variables to 

values determined through an ι operator and the semantic meaning function for the block’s 

statements, however we will not pursue this approach in this chapter. This introduces a 

slight divergence between the function SI which is parameterised by a SizeEnv environment 

of block size functions and the size inference function, which produces expressions, however 

for the purposes of our implementation this is desirable - we take an alternative approach to 

implementing these semantics during the compilation process. In Chapter 4 we will return 

to using size functions for verification purposes and Figure 4.9 on page 82 illustrates how size 

expressions can be converted into size functions. 

The key concept this function implements is that a block’s size is the top right corner of a 

bounding box that encloses all sub-blocks instantiated within it. This is found by selecting 

the maximum value for each statement then selecting the maximum of all statements for the 

block as a whole. It is important to note that because max has been added to the set of 

available expression operators, the expressions that describe the height and width can remain 

fully parameterised in terms of a block’s signals and the max function can be evaluated exactly 

during full elaboration of the design rather than forcing the compiler to select a (possibly


block beside (block R (‘a, ‘b) ∼ (‘d, ‘x), block S (‘x, ‘c) ∼ (‘e, ‘f)) 

(‘a a, (‘b b, ‘c c)) ∼ ((‘d d, ‘e e), ‘f f) 

attributes { 

width = width((a,b) ;R ;(d, is )) + width((is, c) ; S ; (e, f)). 

height = max (height ((a, b) ;R ;(d, is )), height((is, c) ; S ; (e, f))). 

} 

{ 

‘x is. 

(a, b) ; R ; (d, is) at (0,0). 

(is , c) ; S ; (e, f) at (width((a, b) ; R ; (d, is )), 0). 

} 

Figure 3.7: A placed interpretation of the Quartz beside combinator 

non-optimal) static approximation to this value. 

The function is defined recursively over the structure of statements, with three basic cases. 

Assignments and assertions are given size expressions (0,0) since they are assumed to take 

up no space. Block instantiations have size expressions that are the size of the instantiation 

itself added to its x and y placed position. Conditional generation statements produce a 

conditional expression. Iteration statements are the most interesting construct, here the 

maxf function is used to select the maximum size values as the loop variable varies between 

its lower and upper limit. If the loop upper bound is less than its lower bound (e2 < e1 in 

Figure 3.6) then the loop never executes and this is reflected in the semantics for maxf which 

returns zero in these circumstances. 

3.5 Parameterised Quartz with Placement 

These concepts can be brought together to write Quartz code with relative placement using 

explicit co-ordinates. 

3.5.1 Laid-out Combinators 

In Figure 3.7 a version of the R↔S (beside) combinator which has been laid out with relative 

positioning is illustrated. The block R is placed at the origin and the block S is beside it 

with a y-offset of zero and an x-offset of the width of the R block.


The standard block definition has been augmented with an set of attributes giving height 

and width expressions for the block. The width of the combinator is described as the sum 

of the widths of the R and S blocks, while the height is the maximum of the heights of the 

R and S blocks. These manually specified size expressions are the same as those that would 

have been returned by the inference algorithm SBσ β, simplified with the trivial relationship 

x + 0 = x. 

This laid-out beside combinator can now be used to position blocks relative to each other in 

a larger circuit, with the added advantage that it can also describe the inter-connection of 

the blocks. This allows the significant problem of writing correct explicit layouts to be split 

into simpler and more manageable modules: in the instantiation beside(R, S) the internal 

arrangement of the combinator can be ignored and only the size of the entire beside block 

needs to be known. 

3.5.2 Naive vs General Placement 

Figure 3.8 shows a description of the Quartz map n R combinator which applies a block to each 

element of a vector. The block’s function is described using a single loop which instantiates 

multiple R blocks, each of which is explicitly placed at a set of co-ordinates. 

The layout produced by this description for n = 4 is shown in Figure 3.9(a). At first glance 

this layout appears to be correct: the grey bounding box shows the overall height of the map 

block specified as a multiple of n times the height of the R block and the width is the same 

as the width of R, while each block is placed with an x-offset of zero and a y-offset calculated 

from the number of R blocks below it. This layout is in many cases correct, however it 

relies on a key assumption: that the size of R is the same for all iterations across the vector. 

The vector is polymorphic and could be a tuple containing variables which effect the size of 

each R instance yet this would not be reflected in the layout generated by this naive map 

implementation. Figure 3.9(b) illustrates the same map block implementation for a situation 

where the R instances do not all have the same size: block placement is incorrect - leaving 

empty space between blocks or causing them to overlap - and the overall size of the whole 

block is also too small since two R instances lie outside the bounding box.


block map (int n, block R ‘a ∼ ‘b) (‘a i[n]) ∼ (‘b o[n]) 

attributes { 

width = width(i[0] ;R ; o[0]) . 

height = height(i[0] ; R ; o[0]) ∗ n. 

} { 

int j. 

for j = 0..n−1 { 

i [j] ; R ; o[j] at (0, height(i[0] ; R ; o[0]) ∗ j). 

} . 

} 

(0, 3 * height R) 

(0, 2 * height R) 

(0, 1 * height R) 

(0, 0) 

Figure 3.8: The Quartz map n R combinator with naive layout information 

R 

R 

R 

R 

(a) Naive layout 

(width R, 4 * height R) 

(0, 3 * height R) 

(0, 2 * height R) 

(0, 1 * height R) 

(0, 0) 

R 

R 

R 

R 

(width R, 4 * height R) 

(b) Failure of naive layout 

(0, height i[0];R;o[0] + 

height i[1];R;o[1] + 

height i[2];R;o[2]) 

(0, height i[0];R;o[0] + 

height i[1];R;o[1]) 

(0, height i[0];R;o[0]) 

Figure 3.9: Different layouts for map n R 

block map (int n, block R ‘a ∼ ‘b) (‘a i[n]) ∼ (‘b o[n]) 

attributes { 

width = maxf(k=0..n−1, width (i[k] ;R ;o[k])). 

height = sum(k=0..n−1, height (i[k] ;R ;o[k])). 

} { 

int j. 

for j = 0..n−1 { 

i [j] ; R ; o[j] at (0, sum(k=0..j−1,height(i[k] ;R ; o[k]))) . 

} . 

} 

(0, 0) 

R 

R 

R 

R 

(c) General layout 

(max width R, 

sum(j=0..3, height i[j];R;o[j]) 

Figure 3.10: The Quartz map n R combinator with general layout information


The general case requires a more general layout description, shown in Figure 3.10. Instead 

of multiplication, the sum function is used to ensure that each block is placed at exactly the 

right position and the sum and maxf functions are used to describe the bounding box for the 

whole map n R block. Figure 3.9(c) demonstrates that this layout description can cope with 

the irregular example. 

It is worth noting that the manually specified height and width for this block are not the 

same expressions that would be inferred by the inference algorithm in the previous section. 

The inferred height expression would be: 

maxf(j = 0..n − 1,width(i[j] ; R ; o[j]) + 0) 

and the width expression would be: 

maxf(j = 0..n − 1,height(i[j] ; R ; o[j]) +sum(k = 0..j − 1, height(i[k] ; R ; o[k])) 

Both of these are more complex expressions than the manually specified expressions and this 

is the main advantage of manually specifying size expressions rather than using the inference 

algorithm. It is important to note that they are exactly equivalent (we will develop proofs 

of this kind of relationship in Chapter 4). 

3.5.3 A Placed Ripple Adder 

A simple example of the kind of parameterised placed circuit that can be described with this 

infrastructure is a ripple adder. The code in Figure 3.11 describes an n-bit ripple adder built 

from a row of full-adder blocks (note that we have assumed that a full-adder block fadd is 

available as a library/primitive element of size 2 × 1). 

This description uses the row combinator, which has been annotated with placement infor- 

mation, to generate a connected row of full adders. The rippleadd block uses the language 

construct zip to re-arrange the tuple of vectors (a, b) into a vector of tuples for application 

to the row of full adders and uses the wiring block apr (append-right) (in the Quartz prelude 

library) to append the carry-out wire to the sum output.


block row (int n, block R (‘a, ‘b) ∼ (‘c, ‘a)) (‘a l , ‘b t[n]) ∼ 

(‘c b[n], ‘a r) 

attributes { 

height = maxf(k=0..n−1, height((is[k], t[k]) ;R ; (b[k], is [k+1]))). 

width = sum(k=0..n−1, width ((is[k], t[k]) ;R ; (b[k], is [k+1]))). 

} { 

// Wires: l = left , t = top, b = bottom, r = right 

int i. ‘a is [n+1]. 

is [0] = l. 

for i = 0..n−1 { 

(is [ i ], t[ i ]) ; R ; (b[i ], is [ i+1]) 

at (sum(k=0..i−1,width((is[k], t[k]) ; R ; (b[k], is [k+1]))), 0). 

} . 

r = is[n]. 

} 

block fadd (wire cin, (wire a, wire b)) ∼ (wire ans, wire cout) 

attributes { height = 1. width = 2. }{ } 

block rippleadd (int n) (wire a[n], wire b[n]) ∼ (wire ans[n+1]) 

attributes { 

height = 1. 

width = 2 ∗ n. 

} { 

wire cin. 

cin = false. 

(cin, (a, b)) ; snd (zip 2) ; row (n, fadd) ; apr n ; ans at (0,0). 

} 

(0,0) 

Figure 3.11: A simple placed ripple adder 

fadd fadd fadd fadd 

Figure 3.12: The ripple adder laid out on a grid for n = 4 

(8,1)


rippleadd is not a combinator and thus much simpler height and width expressions have been 

specified than would be inferred by the inference algorithm (though the result is the same). 

The zip and apr blocks are both pure wiring which require zero room so the row of full adders 

is the only block which actually contributes to the size of rippleadd - and since fadd has a 

constant height and width the size of the whole ripple adder can be described extremely 

clearly. 

Figure 3.12 shows the layout of the ripple adder circuit for n = 4. 

3.6 Different Layout Interpretations 

While functional descriptions of Quartz designs tend to have a useful obvious layout inter- 

pretation, it is often the case that designs have more than one possible layout interpretation. 

For example, the ripple adder illustrated as a row in Figure 3.12 could equally be described 

as a vertical array of full-adders. In this case we could use the col combinator rather than 

the row combinator however because Quartz combinators describe both function and layout 

it is sometimes desirable to use the same combinator with a different layout interpretation. 

For example, an m×n grid of A elements, surrounded by interface elements B on the domain 

and C on the range can be described in Quartz as: 

[map n B, map m B] ; grid m,n A ; [map m C, map n C] 

This is a common circuit structure, for example where blocks B and C could be input and 

output registers. However, the standard layout interpretation attached to Quartz constructs 

will produce the layout shown in Figure 3.13(a). This is not the worst possible layout that 

could be assigned to this circuit but it is far from compact with long wire lengths and a large 

area of unused logic resources. 

A better layout interpretation would be that shown in Figure 3.13(b). This layout can be 

enclosed in a much smaller bounding box and has shorter wires between the B and C elements 

and the grid. However, this layout is difficult to generate with the system as described so 

far.


B 

B 

B 

B 

B 

B 

B 

B 

B 

B 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

(a) Default layout interpretation 

A 

A 

A 

A 

A 

A 

A 

A 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

B 

B 

B 

B 

B B B B B B 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

C C C C C C 

(b) A better layout interpretation 

Figure 3.13: A grid of m ×n A elements surrounded by B and C elements can have multiple 

layout interpretations. Shown here for n = 4, m = 6. 

3.6.1 Composition 

Series and parallel composition have been given particular layout interpretations. This means 

that a series composition will always place elements from left to right, and a parallel compo- 

sition will always stack elements vertically. In this case, we want the parallel composition to 

be “wrapped around” the top and bottom of the grid - not something that can be achieved 

within the current framework since it implies that the grid is placed within the bounding box 

described by the parallel compositions. There are, however, three solutions that can resolve 

this situation: 

1. The Quartz description can be re-cast using the beside and below combinators. This 

would be quite effective, however it introduces some unnecessary complexity since beside 

and below operate on blocks with four connections and the map n B and map m C blocks 

have only two. The π1 −1 /π1 and π2 −1 /π2 wiring blocks can be used to resolve this 

issue, however the resulting description will be less clear than the original. 

2. The series and parallel compositions can be expanded out into separate block instan- 

tiations with explicit internal signals. This new expanded version can be derived from 

the original description by refinement. After the new description has been produced, 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

A 

C 

C 

C 

C


block surround (block A (‘a1, ‘a2) ∼ (‘a3, ‘a4), 

(block B ‘l ∼ ‘a1, block C ‘t ∼ ‘a2), 

(block D ‘a3 ∼ ‘b, block E ‘a4 ∼ ‘r)) 

(‘l l , ‘t t) ∼ (‘b b, ‘r r) { 

‘a1 l2. ‘a2 t2. ‘a3 b2. ‘a4 r2. 

l ; B ; l2 at (0, height(b2 ; D ; b)). 

t ; C ; t2 at (width(l ; B ; l2), max(height(b2 ;D ;b) + height((l2,t2);A;(b2,r2)), 

height(b2;D;b) + height(r2;E;r))). 

(l2, t2) ; A ; (b2, r2) at (width(l ; B ; l2), height(b2 ; D ; b)). 

b2 ; D ; b at (width(l ; B ; l2), 0). 

r2 ; E ; r at (width(l ; B ; l2) + width((l2,t2);A;(b2,r2)), height(b2;D;b)). 

} 

Figure 3.14: A combinator describing the function and layout of the grid interface in Figure 

3.13(b) 

each block instantiation can be given explicit co-ordinates. 

3. Since this arrangement is extremely common a new combinator can be created to 

describe it. 

4. A final alternative is to remove the layout interpretation from series and parallel com- 

positions and require explicit co-ordinates within compositions. This is, in general, 

more trouble than it is worth - however the option can be left open of not using explicit 

co-ordinates and defaulting to the standard interpretation. 

We would suggest option (3) is the most practical and flexible. The purpose of combining 

explicit layout with combinators is to provide just this flexibility and a general combinator 

implementing this structure is illustrated in Figure 3.14. This combinator appears complex 

at first however it actually has a simple structure, with interface elements B, C, D and E 

placed on the four sides of the square element A. Applied to the interface elements in the 

grid example, this combinator is functionally identical: 

Theorem 4 

surround(grid m,n A, (map n B, map m B), (map m C, map n C)) = 

[map n B, map m B] ; grid m,n A ; [map m C, map n C]


Proof Pointwise, by expanding the definition of surround to give the proposition: 

∃l2, t2, b2, r2. l ; map n B ; l2 ∧ t ; map m B ; t2 ∧ (l2, t2) ; grid m,n A ; (b2, r2) 

∧ b2 ; D ; b ∧ r2 ; E ; r = 

[map n B, map m B] ; grid m,n A ; [map m C, map n C] 

Then by expanding the definitions of series and parallel composition. 

The placement co-ordinates of the C block in this combinator are particularly worthy of 

interest. The max function is used to describe the y co-ordinate that C is placed at, this 

ensures that the layout is truly general and independent of the sizes of each block. The 

verification of this combinator’s layout is discussed further in Section 4.7. 

3.6.2 Combinators 

Another key problem is that the map n R combinator block itself has been given a layout 

interpretation - with its elements arranged vertically. However, in this example we require 

both a vertically and horizontally arranged map . One possibility is to require designers to 

choose between vmap and hmap blocks, however this reduces the link between the original 

pure-functional description and the placed version. It also ignores the fact that most of the 

time the map combinator is used in manner where vertical layout is the most appropriate. 

The Quartz overloading mechanism can be used to handle circumstances where multiple 

blocks are required which have the same basic functional properties but differ in some other 

way. Figure 3.15 illustrates how this system can be used to provide a map n R combinator 

which has a default vertical layout but can also be configured to be laid out horizontally. 

A simple usage of map n R will be resolved to refer to the default case map block which then 

instantiates the general combinator with the specific default layout (vertical, for map). If 

horizontal layout is desired then the fully flexible map block can be invoked specifically by 

adding an additional orientation parameter to the instantiation. 

In Figure 3.15 we have used an integer parameter to control the orientation of the map block 

however if the Quartz type system were extended to support enumerated types then this could


#define VERTICAL 0 

#define HORIZONTAL 1 

// Map combinator supporting horizontal and vertical placement 

block map (int orientation) (int n, block R ‘a ∼ ‘b) (‘a i [n]) ∼ (‘b o[n]) 

attributes { 

width = if(orientation==1, sum(k=0..n−1, width (i[k] ;R ; 

o[k])), maxf(k=0..n−1, width (i[k] ;R ;o[k]))) . 

height = if(orientation==1, maxf(k=0..n−1, height (i[k] ;R ; 

o[k ])), sum(k=0..n−1, height (i[k] ;R ; o[k]))) . 

} { 

int j. 

assert (orientation == 1 or orientation == 0) ”Invalid placement specified”; 

for j = 0..n−1 { 

if (orientation == 0) { 

i [j] ; R ; o[j] at (0, sum(k=0..j−1,height(i[k] ;R ; o[k]))) . 

} else { 

i [j] ; R ; o[j] at (sum(k=0..j−1,width(i[k] ;R ;o[k ])), 0). 

} . 

} . 

} 

// Overloaded map block providing default placement interpretation 

block map (int n, block R ‘a ∼ ‘b) (‘a i[n]) ∼ (‘b o[n]) → map VERTICAL. 

Figure 3.15: Using overloading to provide multiple layout interpretations of combinators 

be replaced with an enumerated type parameter with values of horizontal or vertical. In this 

case the pre-processor definitions of the constants could be removed and the parameter- 

checking assertion would not be necessary. 

This approach can be applied for all combinator blocks which have more than one possible 

orientation, for example a row could be laid out vertically or a column could be laid out 

horizontally. 

3.7 Compiling Placed Quartz Designs 

Our goal when compiling Quartz is, where possible, to only partially evaluate the Quartz 

description maintaining the maximum possible degree of parameterisation in the Pebble 

output. When compiling Quartz with layout into Pebble relative positioning information 

must be eliminated and replaced with expressions defining absolute coordinates.


Quartz AST 

Lexer/ 

Parser 

Preprocessed 

Quartz 

Preprocessor 

Quartz 

Input 

Type Processing 

Identifier 

Conversion 

Type 

Inference 

Overloading 

Resolution 

Full Instantiation 

Layout Processing 

Placement checks 

Size Inference 

Layout 

Verification 

Pebble Output 

Direction Processing 

Direction 

Inference 

Direction 

Concreting 

Directional 

Instantiation 

Pretty printer 

Distillation 

Block 

Analysis 

Translation / 

Unwinding 

Layout 

Generation 

Identifier 

Selection 

Pebble AST 

Figure 3.16: Modified compiler architecture for compiling Quartz designs with layout information 

To achieve this, the Pebble 5 language must be extended to enable the evaluation of the more 

complex expressions that can be generated by Quartz compilation. We describe this extended 

Pebble as LE-Pebble (Layout-Enhanced Pebble). The expressions supported by the Pebble 

5 language are essentially the same as those of Quartz before the new functions were added 

to support layout. LE-Pebble differs in that the standard expressions must be augmented 

with max, maxf, sum and if expression types. In practice we will be able to eliminate these 

constructs when compiling from Quartz to Pebble in many cases, however in the general case 

they are still required. 

LE-Pebble does not require height() and width() functions, these will be eliminated during 

compilation and replaced with appropriate simple expressions. 

The Quartz compilation framework is extended as shown in Figure 3.16 to support the 

compilation of designs with layout information. 

3.7.1 Changes to the Type Processing Module 

In the standard compiler framework the type processing module carries out type checking 

on the Quartz input, using a derivative of the Hindley/Milner algorithm [14, 53]. The type


processor also resolves overloading using an algorithm based around satisfiability matrix 

predicates [64]. The standard output of the type processing module is a monomorphic, 

annotated AST with multiple copies of polymorphic blocks instantiated for each utilised 

type. 

This output is suitable for the standard later stages of the compiler, however for placed 

Quartz compilation we need to insert a new layout processing module between the type 

processor and direction processor. This requires significant changes to the type processor to 

split the instantiation stage in two, one part that resolves overloading and the other that 

removes all polymorphism. 

Since different overloaded instances of Quartz blocks can have different size expressions it 

is necessary to resolve overloading prior to layout generation or layout verification. This is 

achieved by the new “overloading resolution” stage within the type processor which replaces 

references to overloaded blocks with references to specific instances, instantiating new copies 

of blocks which use different overloaded instances on different occasions. This non-overloaded 

but still polymorphic Quartz AST is then passed to the layout processing module. 

After layout processing, the type processor completes the process of generating monomorphic 

Quartz by eliminating all polymorphic types. 

3.7.2 Layout Processing Module 

The layout processing module, despite its name, does not take full responsibility for compiling 

layout information in Quartz descriptions. It is responsible for initial processing of layout 

information and preparation for later stages. 

If the compiler is invoked on a design and is not requested to generate placed output the layout 

processing module strips all size attributes and placement information from blocks. The 

modified circuit design is then passed to the later compilation stages. If layout verification 

or placed output mode is requested then the module checks that there are no unplaced 

block instantiations within the circuit (unless they are the only instantiation within a block, 

in which case they are automatically placed at (0, 0)) and then the size inference stage is 

invoked.


The block size inference procedure ensures that all blocks are annotated with height and width 

expressions. If these expressions have been specified manually then no action is necessary, 

however if not the procedure executes the function SB to generate size expressions for the 

block. 

The verification role of the layout processing module will be discussed in Chapter 4. The 

actual task of converting Quartz layout information to explicit co-ordinates in LE-Pebble is 

handled by new stages in the distillation module. 

3.7.3 Distillation of Size Expressions 

The distillation module actually converts Quartz descriptions into Pebble, eliminating higher- 

order parameters and other constructs that are not valid in Pebble. This module has been 

extended with code to compile layout expressions from Quartz into LE-Pebble. This is done 

in a recursive descent over expression trees, specifically eliminating height() and width() 

references. 

Three cases have to be dealt with: 

1. The height()/width() function refers to a single block with domain and range signals 

linked into the local environment. In this case the block’s height or width expression is 

retrieved and compiled, before replacing the height()/width() function with appropriate 

substitutions for the correct applied domain/range signals. If the expression refers to 

signals internal to the block then these must be replaced with fresh identifiers and the 

code within the block that defines the values of these variables lifted into the current 

context. 

2. The function refers to a parallel composition of blocks. This case is compiled by ap- 

plying the size interpretation of parallel compositions (function SIσ β) and extracting 

expressions for each block in the composition, combining them using either addition 

(height) or the max function (width). 

3. The function refers to a series composition of blocks. In this case a similar operation 

must be employed as for parallel compositions however with a significant difference.


If the generated size expression depends on internal signals hidden within the series 

composition then these must be explicitly instantiated. This is already the process 

that occurs when Quartz series compositions are compiled into Pebble and if the series 

composition exactly matches an instantiation carried out anyway then the same newly 

declared signals can be used. However, if the composition is not otherwise present 

in the block new variables must be declared and code that sets their values must be 

lifted into the local context. This implements the semantics of the definite description 

operator used in function SIσ β. 

3.7.4 Recursive Size Expressions 

Many Quartz blocks have both recursive and iterative definitions. The two are usually 

equivalent, although the iterative versions compile more neatly into Pebble/VHDL. It is 

quite common to use the recursive definition of a common combinator for formal reasoning 

purposes and the iterative version in the generated hardware description. 

Some blocks, such as the binary tree combinator, can be more clearly defined recursively than 

iteratively. Size inference of recursively defined blocks can give recursive size expressions and 

while it is sometimes possible to manually specify iterative size expressions this is not always 

the case. The height and width of the recursively defined map combinator can be easily 

described using the same expressions as the iterative version, however the same can not be 

said for more complex combinators such as rows or grids where there are internal signals 

hidden by the recursion that can not be referenced or where blocks do not have a clearly 

defined size in terms of the simple sum and maxf functions (such as a tree). In any event, 

automation of this transformation is non-trivial so where size inference is used alone it is 

possible to be left with recursive size functions. 

For example, the recursively defined map combinator can be described by: 

map 0 R ⇔ id 

map n R ⇔ apln−1 −1 ; [R, map n−1 R] ; apln−1 

After expanding the parallel composition, apl and id this gives size expressions for map n


which can not be further simplified of: 

height = if(n = 0, 0, height(i[0] ; R ; o[0]) + height(i[n − 1..1] ; map n−1 R ; o[n − 1..1])) 

width = if(n = 0, 0, max(width(i[0] ; R ; o[0]), width(i[n − 1..1] ; map n−1 R ; o[n − 1..1])) 

Without knowing the value of n it is not possible to compile away the height() and width() 

references in the same way as for iterative size expressions. One strategy for dealing with 

these expressions is to mark them as requiring unwinding in the final output, in which case the 

full expression can be evaluated at compile time and non-parameterised Pebble is generated 2 . 

Another way for dealing with these expressions while maintaining parameterisation is to 

compile the recursion directly into Pebble. This requires the addition of a fix operator to 

Pebble expressions and makes them much more complex. This does resolve the issue and the 

recursive Pebble expressions can be compiled into specific VHDL functions for translation 

into VHDL (see the next section) however it does introduce unwanted complexity into Pebble. 

An alternative, which is not a fully general approach, is to attempt to compute the transitive 

closure of the recursive size functions. Often recursive blocks are controlled by a single 

integer parameter which decreases by 1 for each recursive call and in these circumstances the 

resulting value can often be determined. This approach depends on the complexity of the 

height and width functions for the R block, however if the size of R is constant for all inputs 

then the recursive functions can be simplified to just: 

height = heightR × n 

width = widthR 

In general it is better to write iterative descriptions where possible as these can usually be 

compiled more efficiently anyway. 

2 This is the approach taken in the current implementation of the compiler.


3.7.5 Expression Simplification 

After height() and width() functions have been eliminated the result is valid Pebble ex- 

pressions, however these are often significantly more complex than necessary and can be 

extensively simplified - for example, maxf(j = 0..5, 3) can be simplified to just 3. The Quartz 

compiler already possesses a sophisticated logic optimiser which simplifies simple logical and 

arithmetic expressions such as x + y − y and also evaluates any expressions where the result 

is known, such as 4 + 5 × 2. 

The logic optimiser is extended to implement the evaluation of max, if, maxf and sum expres- 

sion types when all parameters are known. Since we are generating parameterised Pebble 

output it is often the case that functions can not be fully evaluated and in these circumstances 

a range of transformations are applied to simplify expressions. 

Uses of if can be simplified if the conditional test can be evaluated statically. Uses of max 

are simplified to remove any values which are known to be less than or equal to another 

value (remaining as a parameter). For example, max(a, b, 4, 5, a + 2) can be simplified to 

max(b, 5, a + 2) since a + 2 > a and 5 > 4 always. These relationships can be determined 

using the logic optimiser itself to investigate whether x > y can be simplified to true or false 

for each pair x and y of parameters. 

sum and maxf are more complex functions to simplify. A number of generic transformations 

have been developed for expressions involving these functions and these transformations are 

applied by the optimiser where possible. The transformations applied to sum are: 

n < m ⇒ sum(i = m..n, f) = 0 

m ≤ n ⇒ sum(i = m..n, f) = (n − m + 1) × f 

i /∈ f ⇒ sum(i = m..n, f) = if(m ≤ n, (n − m + 1) × f, 0) 

While similar transformations are applied to maxf: 

n < m ⇒ maxf(i = m..n, f) = 0 

i /∈ f ⇒ maxf(i = m..n, f) = if(m ≤ n, f, 0)


function sum template(b : integer ; t : integer ) return integer is 

constant b : integer := 0; 

variable a : integer := 0; 

variable n : integer ; 

begin 

for n in b to t loop 

a := + a ; 

end loop ; 

return a ; 

end function sum width ; 

Figure 3.17: VHDL template for implementing the LE-Pebble sum function 

Proof By induction on n and re-arrangement using established properties of maxf and sum. 

Appendix B.9 gives mechanised proofs for these simplification rules. 

Conditionals can sometimes be further simplified by taking advantage of the context specified 

by assertions within blocks. For example, if a block contains an assertion which states m ≤ n 

then the correct branch of an if expression dependent on that condition can be selected 

statically. Alternatively, another common situation is for m = 0 and n ≥ 1 to be asserted, 

which can also be used to simplify the same expression. 

3.8 Compiling LE-Pebble into VHDL 

To complete the process of generating parameterised hardware libraries in an industry- 

standard format, LE-Pebble can be compiled into VHDL. The compilation of standard Pebble 

into parameterised structural VHDL has been reported previously [46, 48] however the addi- 

tion of new expression types that are not supported in VHDL to Pebble makes this process 

slightly more complicated. 

While VHDL does not support higher-order functions, it does support the definition of func- 

tions within structural hardware descriptions. This mechanism can be used to compile com- 

plex functions in LE-Pebble such as maxf into VHDL by generating functions based around 

simple templates. Iterative version of sum and maxf are preferable for VHDL implemen- 

tation and these can be described equivalent to the recursive definitions of these functions. 

Figure 3.17 gives the VHDL template for the sum function. This can be instantiated for any 

particular function by replacing the text “” with the expression to evaluate. This


function max(a : integer ; b : integer ) return integer is 

begin 

if a > b then 

return a ; 

else 

return b ; 

end if ; 

end function max; 

Figure 3.18: VHDL max function 

template can be parameterised in its upper and lower bounds and thus only one function 

needs to be generated for each usage of sum over the same function. The semantics of this 

VHDL sum function are the same as those of the Quartz/LE-Pebble function. If the range 

of the sum is zero then the default value of a (0) is returned, otherwise the sum is returned. 

An equivalent template can be instantiated for maxf. The n-input Quartz/LE-Pebble max 

function can be transformed to use nested calls to a 2-input VHDL max function defined as 

in Figure 3.18. Conditional expressions can be implemented using VHDL conditionals. 

Placement co-ordinates themselves are compiled into the appropriate constructs for a partic- 

ular target architecture, as specified by a directive in the source file. We have implemented 

placement support for the Xilinx Virtex and Virtex-II architectures, generating RLOC place- 

ment macros. Virtex and Virtex-II use different co-ordinate schemes and the Pebble compiler 

maps from the basic abstract grid onto these co-ordinate schemes. 

For the Virtex-II architecture, for example (see Chapter 2), each slice contains two look-up 

tables and other logic that can be explicitly instantiated. However placement co-ordinates 

are described in terms of individual slices. When mapping from Pebble to VHDL the grid is 

squashed so each set of two vertically adjacent 1 × 1 blocks are placed in the same slice. The 

basic layout element is thus a half-slice. 

3.9 Summary and Comparison with Related Work 

In this chapter we have described a layout infrastructure for the Quartz language which allows 

designers to apply explicit absolute or relative placement information to hardware designs. 

Our infrastructure allows Quartz higher-order combinators to be given layout interpretations


and then used to combine blocks with the explicit co-ordinates hidden. Some examples of 

full circuits described and laid out using this framework can be found in Chapter 6. 

The Quartz layout infrastructure is, to our knowledge, unique in supporting both iterative and 

recursive constructs and the compilation of designer-specified combinators into parameterised 

output. 

Systems based around Ruby [26] and Lava [7] have been used to exploit the geometric in- 

terpretation of higher-order combinators to generate placed circuits, however both generate 

flattened netlists not parameterised output. Neither support for loop iteration constructs, 

although this is somewhat less relevant since they are not being compiled into a form that 

relies heavily on iteration (VHDL). 

Pebble [49, 50] has been extended with below and beside relative placement language con- 

structs and a procedure for the conversion from relative to explicit co-ordinates has been 

formally analysed. This approach allows the generation of parameterised output and ensures 

that components can not overlap, however it is somewhat limiting in that it does not support 

mixing absolute and relative placement information in a single description and can not de- 

scribe pathological examples such as the irregular grid arrangement illustrated in Chapter 1. 

Furthermore, the generated placements are not necessarily as compact as possible, although 

this can be improved through the use of partial evaluation. The Pebble approach of building 

relative placement constructs into the language rather than allowing them to be user-defined 

as Quartz permits through its higher-order combinators is inherently more limited, although 

necessarily so since Pebble is not a high-order language.

Chapter 4 

Verifying Circuit Layouts 

In this chapter we describe how Quartz circuit layouts can be verified mechanically. Sec- 

tion 4.1 introduces our basic approach and Section 4.2 explains why Higher-Order Logic was 

selected as an good formalism. Section 4.3 discusses how we will define “correctness” of a lay- 

out. Section 4.4 describes the theoretical developments that provide a proof environment for 

Quartz layouts, while Section 4.5 describes how the Quartz compiler is adapted to generate 

definitions and proof obligations automatically. Section 4.6 demonstrates the application of 

our verification framework to the Prelude library and describes how automatically-generated 

scripts are honed based on these examples, while in Section 4.7 we apply the system to a 

range of other combinators. Section 4.8 discusses the strengths and weaknesses of our system 

and Section 4.9 summarises this chapter. 


Using explicit co-ordinates to define the placement of components in parameterised circuit 

descriptions can be complex and error-prone. The Quartz layout system based on giving 

layout interpretations to higher-order combinators substantially reduces the scope for errors 

since the task of describing a circuit layout is divided hierarchically into smaller and simpler 

components. Combinators with layout interpretations can be written once and used many 

times, so while the placement co-ordinates within each combinator are relatively simple, it is 

65

CHAPTER 4. VERIFYING CIRCUIT LAYOUTS 66 

still vital to ensure that the layout is correct. 

While close examination by a human designer is an good method of finding many bugs in 

layout descriptions, it is no substitute for a formal assurance of correctness. In the develop- 

ment of the example designs illustrated in Chapter 6 there have been several occasions where 

Quartz descriptions which appeared on first inspection to be correctly laid out have turned 

out to contain errors. 

The hierarchical decomposition that is typical of Quartz circuit designs can be exploited so 

that sections of the a circuit can be proved correct independently of each other and then 

these proofs integrated into a proof of correctness for the entire design. However, the type 

of proofs involved in formally verifying a layout description are not particularly well suited 

to hand-proof by the designer, or indeed a different proof expert. 

Layouts of large numbers of components leads to long theorems requiring proof - but the 

constituents of these theorems are often either trivial or quite simple. These two factors 

combine to make it likely that “pen and paper” proof of these theorems is likely to be 

particularly unreliable unless extreme care is taken. 

Furthermore, there is a high level of proof re-use between different circuit descriptions, by 

exploiting similar properties of arithmetic operators, binary relations and Quartz size expres- 

sion functions. This suggests that layout verification may be a good candidate for the use of 

mechanised proof tools. 

In this chapter we describe a proof infrastructure based on Higher-Order Logic which elimi- 

nates the possibility of human error in proofs and demonstrates a high level of automation. 

4.2 Choice of Formalism 

There are two main possible approaches to verifying Quartz layouts. The first is to verify the 

output circuit description for each compiled circuit, either in the form of parameterised/hier- 

archical VHDL or in netlist format. Verification of placed netlists is effectively carried out 

by the synthesis tools that generate the FPGA bitstream since they will raise an error if an 

incorrect layout is specified, however this is not of much use if the desire is to provide an


assurance that the generated parameterised VHDL library (Chapter 3) is correct. 

Verification of the parameterised VHDL would be possible but would have to be substantially 

repeated for each new circuit. The alternative approach is to attempt to verify the original 

Quartz description. Quartz combinators could be verified once and then this proof could be 

used in the proofs of all circuit descriptions that use this combinator. This approach offers a 

higher degree of reuse of proofs, which is extremely beneficial. 

Since Quartz is a high-order language, we have selected Higher-Order Logic (HOL) as the 

appropriate formalism for our proof system. This enables us to model most of the features 

of Quartz descriptions and thus to conduct verification at the level closest to the original 

circuit description. The use of HOL for functional verification of hardware is well understood 

[8, 52], although the level of automation that can be achieved is often not that great. 

Other formalisms, such as the Boyer-Moore logic [9] used by the ACL2 theorem prover 

[34], are simpler and can be more highly automated however they are less general. Using a 

first-order logic makes it impossible to prove properties of Quartz higher-order combinators, 

only about their instantiated instances i.e. while we might wish to prove the correctness of 

the map n R combinator we would be restricted to separately proving the correctness of the 

map n add and map n inv blocks. 

In this chapter we develop a system based around the embedding of HOL in the Isabelle 

[61] generic theorem prover. Isabelle/HOL [55] is a well developed Isabelle object logic and 

comes with many useful definitions and theorems. 

Our infrastructure is not specific to Isabelle, or to the Isabelle version of HOL, nor are we 

limited to using HOL for proofs. The layout verification stage of the Quartz compiler is 

designed to be invoked on polymorphic, high-order circuit descriptions however it could be 

invoked later during compilation to produce output for a different formalism. We separate 

the generation of proof obligations from the interface to the Isabelle theorem prover and it 

would be easy to provide an interface to a different proof tool.


4.3 Specifying Correctness 

In order to formally verify anything it is necessary to state the requirements for correctness. 

In the case of circuit layouts we define correctness in terms of validity, containment and 

intersection. 

4.3.1 Validity 

Validity is a property of the size expressions (height and width) for a block: 

Definition 5 A block size expression is valid if, for all allowable values of all variables in 

the expression, it always evaluates to a value greater than or equal to zero. 

∀x1, x2, . . . , xn. assertions(x1, x2, . . .,xn) ⇒ 0 ≤ f(x1, x2, . . . , xn) 

This requirement may appear trivial and indeed its proof is often easy, however it is an 

extremely important requirement. Blocks with size expressions that evaluate to negative 

values will usually render otherwise correct layouts useless. A common proof obligation for 

other correctness requirements is of the form: 

sizeA ≤ sizeA + sizeB 

This is provable only if it can be assumed that sizeB ≥ 0. 

The implication in Definition 5 is also significant. It states that it is only necessary for size 

expressions to be valid for inputs that meet the preconditions specified in the design (via 

assertions). For example, a size expression n × 2 is valid provided that n ≥ 0. 

4.3.2 Containment 

The size of a block is a bounding box defined as a rectangle with bottom left co-ordinates (0, 0) 

and the top right corner with co-ordinates as defined by the block’s size expressions. The size 

of a block can be specified manually and can be regarded as a specification that a block must


A 

B 

C 

E 

D 

(a) Incorrect containment 

F 

B 

C 

A 

E 

D 

(b) Correct containment 

Figure 4.1: Different layouts for the same components can affect containment within an area 

meet. A block meets the containment requirement if all sub-blocks are instantiated within 

the bounding box. 

Definition 6 A block is correctly contained if, for all allowable values of all input variables 

to the block, all instantiated sub-components fall within the block’s bounding box 

∀x1, . . .,xn. assertions(x1, . . .,xn) ⇒ ∀(p ; B ; q at (x, y)) ∈ InstantiatedBlocks. 

0 ≤ x ∧ 0 ≤ y ∧ 

x + Bwidth ≤ width(x1, . . . , xn) ∧ y + Bheight ≤ height(x1, . . . , xn) 

Figure 4.1 illustrates how the same components can be laid out within a block in ways that 

either fail or meet this layout correctness constraint. 

The containment constraint is also vital for layout verification proofs. It permits hierarchical 

proofs where the internal arrangement of blocks can be ignored and the block analysed purely 

in terms of its size expressions. It is sometimes desirable to relax this correctness requirement, 

as we discuss later in this chapter. 

4.3.3 Intersection 

The most significant area for potential problems with explicitly laid-out designs to arise is 

in the intersection of blocks. While this problem is somewhat reduced by hierarchical design 

descriptions and relative co-ordinates the possibility of errors is not totally eliminated. In- 

tersection occurs when the bounding boxes of two blocks overlap, leading to logic resources 

F


(x21, y21) 

(x21, y21) 

(x11, y11) 

(x22, y22) 

(x11, y11) 

(a) x22 ≤ x11 

(x22, y22) 

(x12, y12) 

(c) y12 ≤ y21 

(x12, y12) 

(x11, y11) 

(x11, y11) 

(x21, y21) 

(x12, y12) 

(x21, y21) 

(b) x12 ≤ x21 

(x12, y12) 

(x22, y22) 

(d) y22 ≤ y11 

(x22, y22) 

Figure 4.2: Situations under which two rectangles will not overlap 

possibly being allocated to both. This must obviously be avoided in correct circuit descrip- 

tions. 

Ensuring this correctness criteria is the common target of work on relative placement. The 

problem of ensuring non-overlapping of n rectangles is a well understood objective in con- 

straint programming [1, 6] and is a problem in placement algorithms, however in our case 

we are interested not in finding a solution to the problem but rather in checking one. Exist- 

ing work on constraint solving applied to this problem is not suitable for adaptation to the 

Quartz system because we permit a much ricer language of expressions than simple arithmetic 

operations to define block sizes. 

Two objects placed on a n-dimensional surface will not overlap if is one dimension in which 

they can not intersect. For rectangles on a two-dimensional surface this means that one 

block must be either below or above the other, or to the left or right of it. This means that 

there are four possible situations in which a layout can be correct, illustrated graphically 

in Figure 4.2. When the rectangle A is described by corners (x11, y11) and (x12, y12), and 

rectangle B is described by corners (x21, y21) and (x22, y22) this can be represented by the 

logical disjunction: 

(x22 ≤ x11) ∨ (x12 ≤ x21) ∨ (y12 ≤ y21) ∨ (y22 ≤ y11)


This leads naturally to the definition of intersection correctness: 

Definition 7 For every pair of block instantiations A and B within a block, where A is 

placed at (xA, yA) with size functions (widthA, heightA) and B is placed at (xB, yB) with 

size functions (widthB, heightB), for all possible allowable input values: 

(xB +widthB ≤ xA) ∨ (xA+widthA ≤ xB) ∨ (yB+heightB ≤ yA) ∨ (yA+heightA ≤ yB) 

A naive implementation of Definition 7 has (n − 1) × (n − 2) proof obligations. However, by 

exploiting symmetry this can be reduced to (n − 1) × (n − 2)/2 obligations. This still means 

that for blocks with a large number of components instantiated within them a large number 

of proof obligations are generated, however because Quartz designs tend to be broken up into 

many entities each of which only contains a few constructs this is less of a problem than it 

might seem at first. 

The generation of proof obligations from Quartz descriptions is discussed in detail in Section 

4.5, however it is worth mentioning here the special case of iteration. A Quartz for loop 

can lead to more than one block being instantiated which can potentially intersect with one 

another. When generating proof obligations for a for loop construct, an additional proof 

obligation therefore exists that for any two iterations of the loop, any instantiated blocks can 

not overlap with one another. 

4.4 Proof Environment 

We develop a mechanised theorem proving environment for layout verification based on a 

shallow embedding [8] of Quartz in Higher-Order Logic. This involves the definition of the 

semantics of Quartz constructs in terms of HOL connectives. 

We develop the QuartzLayout library of theories which provides definitions for and useful 

theorems about a sufficient subset of Quartz to enable layout proofs. Our embedding is quite 

different to the typical embeddings of hardware description languages in logic since our aim 

is not to engage in functional verification but rather to verify layout. This means that the


CompilerSimps 

Structures 

Functions 

ParallelComposition 

IntAlgebra 

QuartzLayout 

SeriesComposition 

Inbuilt 

Block 

Types 

Figure 4.3: The QuartzLayout theory hierarchy. Rectangular nodes develop the theory of the 

language itself, oval nodes define functions for size expressions and useful theorems. Arrows 

indicate dependencies. 

actual hardware generated by a Quartz description is of interest only so far as it affects the 

layout of the design and wiring is of no interest, since we assume that wiring resources are 

separate from computational resources. 

Our Quartz model is built up hierarchically from a number of Isabelle theories, as illustrated 

in Figure 4.3. In this chapter we describe how some of the language features of Quartz are 

modelled and identify some useful theorems. The full Isabelle theory development can be 

seen in Appendix B. 

4.4.1 Type System 

Isabelle/HOL has types and polymorphism based on the Hindley/Milner system [53] - the 

same basic system as Quartz. It also supports overloading, although it is based on type 

classes [84] which are not compatible with the Quartz overloading system [64]. Overloading 

complicates verification since it is not necessarily possible to determine which block instance 

is selected and different instances can have different size expressions. 

In the QuartzLayout system we ignore overloading and assume it has been resolved prior to


verification. The modified Quartz compiler ensures that this occurs (see Section 4.5). 

Otherwise the HOL type system is generally suitable for modelling the Quartz type sys- 

tem. Quartz boolean types can be represented as HOL booleans and Quartz integers can 

be modelled as HOL integers. Isabelle/HOL also defines a theory of natural numbers with 

an extensive range of useful theorems and we will make some use of this when modelling 

recursive blocks, however in general we use the integer type which is a more accurate model 

and supports true arithmetic. 

Quartz tuples are represented using the HOL Cartesian product type, which defines tuples as 

nested pairs. This means that the types of (a, b, c) and (a, (b, c)) are considered equivalent, 

however this is not a major problem since this is more permissive than the Quartz type 

system and designs must already type-check using the Quartz compiler’s type processor. 

The Types theory defines the types of wires and of vectors. Wires are named types only, with 

no properties. This is a useful model for our purposes since we are actually not interested 

in wiring in any way and the values on wires in a generated circuit are not relevant to its 

compilation. 

Vectors are defined as functions from integers to some other type. In this model a two 

dimensional Quartz vector wire a[m][n] is given type int → int → wire. There are significant 

limitations to this model [52], essentially arising from the fact that it is not possible to 

identify that a vector has a fixed number of elements and if each element of two vectors are 

equivalent then the vectors themselves are equivalent. However, this theoretical model is 

quite sufficient for layout proofs although it would likely be limiting when reasoning about 

functional properties. 

4.4.2 Blocks and Block Instantiation 

There are three characteristics of blocks that are of interest when reasoning about Quartz 

layouts: their semantic interpretation, width expression and height expression. It is necessary 

to model the meaning of blocks as well as purely their height and width expressions since 

the two are inter-related, with size expressions possibly involving internal signals defined in 

a block.


record (’a,’b)block = 

Def :: "’a" 

Height :: "’b" 

Width :: "’b" 

constdefs 

ap :: "[(’ a⇒ ’b,’a⇒ ’c)block,’a]⇒ (’b,’c)block" 

( infixl "$" 49) 

"ap ≡ λB x. (| Def = Def B x, Height = Height B x, Width = Width B x |)" 

constdefs 

inst :: "[’a, (’a⇒ ’b⇒ bool,’a⇒ ’b⇒ int)block,’b]⇒ (bool, int)block" 

(" ;;; ;;; " [45, 46, 47] 45) 

"inst ≡ (λ x B y. (| Def = Def B x y, Height = Height B x y, Width = Width B x y |))" 

Figure 4.4: Part of the QuartzLayout theory of blocks 

Because Quartz blocks are relational, rather than functional, we must model block semantics 

as logical predicates on their domain and range signals. A predicate block returns the boolean 

true for all valid combinations of inputs and outputs and false for invalid combinations. This 

means that a predicate can not calculate a block’s output values from its inputs in the same 

way as a function, however it can confirm that values are correct. Within block semantic 

definitions, internal signals can be represented using existential quantifiers ∃ (written “EX” 

in Isabelle/HOL). 

We model block size functions, rather than size expressions. A size function includes the 

block’s semantic definition within it to define the values of any internal variables, bound 

using the definite description operator. 

A block is modelled as a record consisting of a logical predicate, height function and width 

function, as shown in Figure 4.4. A block with type δ1 δ2 . . .δn ∼ ρ will be modelled as a 

record of type (δ1 → δ2 → · · · → δn → ρ → bool, δ1 → δ2 → · · · → δn → ρ → int)block. 

While the general form of the type is the same for both the size functions (which return 

integers) and the semantic definition predicate (which returns a boolean) they have to be 

defined separately because of the way types are described. 

Two operations are defined on block records to model the behaviour described by statements 

of the type a ; bid p1 . . .pn ; b. Block application describes the application of an inner 

(curried) parameter to a block. So as not to confuse this with HOL function application 

in QuartzLayout this requires the use of a dollar sign operator. The block instantiation


constdefs 

ser :: " [(’ a⇒ ’b⇒ bool,’a⇒ ’b⇒ int)block,(’b⇒ ’c⇒ bool,’b⇒ ’c⇒ int)block] 

⇒ (’a⇒ ’c⇒ bool, ’a⇒ ’c⇒ int)block" (infixl ";;" 48) 

"ser ≡ (λ B1 B2. (| Def = λx y. ∃ s. (Def B1) x s ∧ (Def B2) s y, 

Height = λx y. let s = (THE s. (Def B1) x s ∧ (Def B2) s y) in 

max (Height B1 x s) (Height B2 s y), 

Width = λx y. let s = (THE s. (Def B1) x s ∧ (Def B2) s y) in 

(Width B1 x s) + (Width B2 s y)|))" 

Figure 4.5: Part of the QuartzLayout theory of series composition 

operation is identified by triple semi-colons in HOL. This eases the job of parsing Quart- 

zLayout definitions and allows it to clearly distinguished from series compositions. A block 

instantiation in QuartzLayout is therefore represented by the string a ;;; bid $ p1 . . . $ 

pn ;;; b. This is sufficiently similar to the Quartz syntax to be readable and yet sufficiently 

simplified to not require the development of AST-transforming ML functions for Isabelle. 

The contents of a block record are accessed using the functions Def, Height and Width, so 

Height (a ;;; bid $ p1 . . . $ pn ;;; b) will return an integer value corresponding to 

the height of that block instantiation. This is identical to the height() function of Quartz 

expressions and is used to model this operation. 

Block compositions are described in the SeriesComposition and ParallelComposition theories. 

Parallel composition is defined as an operation on nested pairs, mirroring the definition of 

Cartesian products used to model Quartz tuples and is syntactically represented using double 

square brackets. 

Series composition is defined as a left-associative binary operator (identified by double semi- 

colons), as illustrated in Figure 4.5. The semantic definition of a series composition of 

two blocks is defined with an explicit internal signal connecting them, while the definite 

description operator is used to parameterise the width and height functions of the two blocks 

by the correct value to determine the width and height of the composition. 

The definitions of series and parallel composition implement the SIσ β function defined on 

Page 43. Series compositions are laid out horizontally with the widths summed and the 

maximum height selected, while parallel compositions are laid out vertically with heights 

summed and the maximum width selected.


Block operations and definitions in QuartzLayout are sufficiently powerful to allow the proofs 

of some (functional) Quartz Laws, such as: fst A ; sndB = [A, B]. This kind of theorem can 

be described in QuartzLayout as: 

!! a b c d. Def ((a, b) ;;; fst $ A ;; snd $ B ;;; (c, d)) = Def ((a, b) ;;; 

[[ A , B ]] ;;; (c, d)) 

(where “!!” is Isabelle’s meta-logic operator for universal quantification). This can be proved 

using Isabelle’s simplifier to expand the definitions of parallel composition, fst and snd. 

4.4.3 Expressions 

HOL arithmetic expressions will be used to model Quartz expressions. The IntAlgebra theory 

defines several additional operators that are required by Quartz but are not provided in 

Isabelle/HOL - such as a greater-than ordering and a power function for integers. The 

IntAlgebra theory also includes proofs of many useful theorems for re-arranging arithmetic 

inequalities of the kinds that will be needed to reason about circuit layouts. Most of these 

theorems can be proved easily using Isabelle’s simplifier, classical reasoner or arithmetic 

decision procedure. Some particularly useful and simple theorems concern the max function: 

Theorem 8 ∀ m n. n ≤ max n m 

Theorem 9 ∀ f g n. n ≤ f ∨ n ≤ g ⇒ n ≤ max f g 

Theorem 10 ∀ m n. 0 ≤ n ⇒ max(m + n) m = (m + n) 

Proof Trivial, by expanding the definition of the max function. Mechanised proofs are 

given in Appendix B.1 as theorems max nm nleq, max geq n disj and max xyge0. 

Another theorem that is simple and extremely useful is: 

Theorem 11 (0 ≤ a ∧ 0 ≤ b ∧ 0 ≤ c ∧ a ≤ b) ⇒ a ≤ (b + c) 

Proof Trivial, given as theorem z aleq bc in Appendix B.1.


consts 

maxf :: "(int∗int∗(int⇒ int))⇒ int" 

recdef maxf "measure (λ(b, t, f). nat(t+1−b))" 

"maxf (bot, top, fun) = (if (top < bot) then 0 

else ( 

case (top = bot) of True ⇒ fun top 

| False ⇒ ( 

let one = fun top in 

let two = maxf (bot, top − 1, fun) in 

max one two)))" 

consts 

sum :: "(int∗int∗(int⇒ int))⇒ int" 

recdef sum "measure (λ(b, t, f). nat(t+1−b))" 

"sum (bot, top, fun) = ( 

case (top < bot) of True ⇒ 0 

| False ⇒ sum (bot, top − 1, fun) + (fun top))" 

Figure 4.6: Definitions of the Quartz sum and maxf functions in QuartzLayout 

Theorem 11 is particularly useful since we are often confronted with expressions of this form 

where a, b and c are size functions for various blocks. Using this theorem we can simplify 

the logical proposition substantially, since we will prove that 0 ≤ x for all size expressions x 

as a matter of course and can use these theorems in other proofs. 

While Isabelle/HOL already includes definitions of max and if, we have to define the complex 

functions maxf and sum. The definitions of these functions from theory Functions can be 

seen in Figure 4.6. These definitions have the same semantics as the appropriate clauses of 

the semantic function for Quartz expressions Eσ β µ given on page 42. 

The functions are defined using the recdef method of declaring arbitrary recursive functions. 

The measure functions allow Isabelle/HOL to prove termination of the functions by showing 

that the supplied measure decreases for each recursive call. We use “case” rather than “if” 

expressions within the functions to avoid problems with the use of Isabelle’s simplifier where 

conditionals are repeatedly split, leading to a loop. Besides this, the two function identically 

and the equivalence between the two can be proved easily (see theorem maxf expand if in 

Appendix B.7 for such a proof for the maxf function). 

We prove the correctness of the maxf function, showing that it implements a logical definition:


Theorem 12 

∀ b t f. b ≤ t ∧ (∀y. 0 ≤ f y) ⇒ 

let max = maxf(b, t, f) in 

∃y. b ≤ y ∧ y ≤ t ∧ f y = max ∧ 

∀x. b ≤ x ∧ x ≤ t ⇒ (f x) ≤ max 

Proof The two parts of the conjunction can be proved separately. The first states that the 

maxf function returns a value that is produced by the function f while the second states that 

there is no greater value returned by f within the specified range. Both can be proved by 

induction over t, bounded from below by b (in Isabelle this is done using the int ge induct 

induction schema) and then re-arrangement using the definitions of max and maxf. Mech- 

anised proofs are available as theorems maxf ansvalid, maxf fmax and maxf is maxf in 

Appendix B.7. 

This proof is significant since it provides a mechanism for moving between the functional and 

logical definitions of the maxf should this be desired during a proof. 

The Functions theory contains the proofs of many useful properties of maxf and sum, includ- 

ing how they relate. Most of these proofs are carried out by induction on the value of top. 

Some particularly useful theorems about maxf are: 

Theorem 13 ∀ m n f. (∀y. 0 ≤ f y) ⇒ 0 ≤ maxf(m, n, f) 

Theorem 14 ∀ m n f. n < m ⇒ 0 ≤ maxf(m, n, f) 

Both of these theorems are useful in containment proofs. Some similarly useful theorems for 

sum are: 

Theorem 15 ∀ b t f g. b ≤ t ⇒ sum(b, t, λi. f i + g i) = sum(b, t, f) + sum(b, t, g) 

Theorem 16 ∀ b t f. b ≤ t ⇒ sum(b, t, f) + f (t + 1) = sum(b, t + 1, f)


Theorem 17 ∀ b t f. (∀y. 0 ≤ f y) ⇒ 0 ≤ sum(b, t, f) 

We also prove theorems about the inter-relationship between these two functions, such as: 

Theorem 18 ∀ b t f. (∀ x. 0 ≤ fx) ⇒ maxf(b, t, λk. sum(b, k, f)) = sum(b, t, f) 

Theorem 19 

∀ b t f. b ≤ t ∧ (∀ x. 0 ≤ f x) ⇒ 

maxf(b, t + 1, λk. sum(b, k, f)) = maxf(b, t, λk. sum(b, k, f) + f(t + 1)) 

Proofs of all these theorems and many others about maxf and sum are listed in Appendix B.7. 

4.5 Generating Theories of Quartz Programs 

Given that we are using a shallow embedding of Quartz in Isabelle/HOL, we can not reason 

about Quartz descriptions directly. Instead, we must translate Quartz descriptions into 

semantic descriptions in Higher-Order Logic. 

4.5.1 Compiler Architecture 

When in verification mode the Quartz compiler translates from Quartz descriptions to Is- 

abelle/HOL during compilation. This allows the Quartz compiler to resolve all overloading 

prior to generating HOL descriptions - t a vital step since our QuartzLayout system does not 

support overloading. 

Figure 4.7 shows the functionality of the Quartz compiler when in layout verification mode. 

Layout verification is divided between the Layout Processing and Isabelle modules. The 

Layout Processing module converts Quartz programs into their HOL semantic definitions 

and generates theorems that must be proved to verify the correctness of a layout. These 

HOL definitions and proof obligations are generated in an abstract data format which is then 

passed to the Isabelle module.


Quartz AST 

Lexer/ 

Parser 

Preprocessed 

Quartz 

Preprocessor 

Quartz 

Input 

Type Processing 

Identifier 

Conversion 

Type 

Inference 

Overloading 

Resolution 

Full Instantiation 

Placement checks 

Find dependencies 

for composites 

Generate HOL 

for composites 

Generate 

Isabelle scripts 

Layout Processing 

Isabelle Module 

Size Inference 

Generate HOL 

for primitives 

Generate proof 

goals 

Generate 

Tacticals 

Isabelle 

input 

Pretty 

printer 

Figure 4.7: Functions of the Quartz compiler in layout verification mode 

The Isabelle module is the only part of the compiler process that is specific to Isabelle. 

It generates Isabelle/HOL proof scripts using the correct syntax and outputs them to disk 

where they can be loaded by Isabelle. The Isabelle module also does its best to automatically 

generate input to Isabelle’s automatic proof tools, this is described in Section 4.6. 

Where blocks are part of a library and have already had their layouts proved the compiler 

does not generate proof scripts, assuming that these scripts are available for loading into the 

theorem prover from elsewhere. This is controlled by a special “layout-proved” attribute that 

can be attached to blocks. 

4.5.2 Generating Definitions 

Given the model developed in the previous section, Isabelle/HOL expressions with the ex- 

panded functionality of the QuartzLayout Functions library now essentially implement a 

super-set of Quartz expressions - it is therefore possible to translate Quartz expressions into 

Isabelle/HOL directly merely by re-writing syntax as necessary. 

Quartz blocks and the statements that they are composed of must be translated into their 

semantic interpretations. We assign a particular semantic interpretation to each statement 

type, recursively defined as necessary (for example, for loop statements) and describe a block 

as the logical conjunction of its statement semantics. This process is syntax-directed and


B ′ β :: BlockEnv → Blkinst → Predicate 

B ′ β bid p1 . . . pn = 

if bid ∈ β then Bββ(bid)(p1, . . .,pn) 

else bid(p1, . . .,pn) 

B ′ β [ b1 , . . ., bn ] = 

let m1 = B ′ βb1 in 

. 

let mn = B ′ β bn in 

λ(x1, . . . , xn) (y1, . . .,yn). m1(x1, y1) ∧ . . . ∧ mn(xn, yn) 

B ′ β b1 ; b2 = 

let m1 = B ′ β b1 in 

let m2 = B ′ β b2 in 

λxy. ∃s. m1(x, s) ∧ m2(s, y) 

B :: BlockEnv → Block → Predicate 

Bβ block bid d1 . . . dn ∼ r { τ1 id1 . . . τp idp. stmts } = 

λd1 . . . dn r. ∃ id1 . . .idp. S ′ β stmts 

S ′ :: BlockEnv → StmtList → Bool 

S ′ β stmt1 . . . stmtn = 

let m1 = Sβstmt1 in 

. 

let mn = Sβstmtn in 

m1 ∧ . . . ∧ mn 

Sβ :: BlockEnv → Stmt → Bool 

Sβ assert e str = e 

Sβ e1 = e2 = e1 = e2 

Sβ if e { stmts1 } else { stmts2 } = 

let m1 = S ′ βstmts1 in 

let m2 = S ′ βstmts2 in 

if e then m1 else m2 

Sβ for i = e1..e2 { stmts } = 

∀i. e1 ≤ i ≤ e2 −→ S ′ βstmts Sβ a ; blkinst ; b at (x, y) = B ′ βblkinst(a, b) 

Figure 4.8: Converting Quartz blocks into their semantic interpretation in Higher-Order 

Logic. The function Bβ defines the formal semantics of Quartz using HOL.


SB ′ :: BlockEnv → Block → (SizeFunc × SizeFunc) 

SB ′ β block bid d1 . . . dn ∼ r attributes { height = h. width = w. } { τ1 id1 . . . τp idp. 

stmts } = 

let m = S ′ βstmts in 

(λd1 . . .dn r. let (id1, . . . , idp) = (ι(id1, . . .,idp). m) in w, 

λd1 . . . dn r. let (id1, . . . , idp) = (ι(id1, . . . , idp). m) in h) 

Figure 4.9: Converting Quartz size expressions into QuartzLayout size functions 

automatic. Figure 4.8 gives the definition of the function Bβ which gives the semantics of a 

block as a logical predicate. This function gives Quartz a formal semantics in HOL, using an 

environment β which maps block identifiers to their definitions. 

The function in Figure 4.8 is implemented in the Quartz compiler layout processing module. 

The only difference between the formal definition and the compiler implementation is that the 

function B ′ β is not executed, instead the modelling of block instantiation within QuartzLayout 

(Section 4.4.2) is used. For example, the semantics of the map n R combinator, as generated 

by the compiler, are described by: 

λ(n, R)io. ∀j. (0 ≤ j ∧ j ≤ n − 1) −→ B ′ βR(i[j], o[j]) 

In Isabelle’s ASCII syntax this is written as (note that “o” has been replaced with “o ” since 

“o” is a reserved keyword in Isabelle/HOL): 

% (n, R) i o . ALL (j::int). ((0


Isabelle defines arbitrary recursive functions using the recdef construct however functions 

defined this way require termination to be proved automatically and this is difficult for the 

theorem prover to do because of the way Quartz blocks are defined. Unfortunately recdef is a 

closed box and it is not possible to manually direct the proof and recdef compilation process 

to generate useful results. 

Luckily Isabelle does provide a (less general) recursive construct called primrec for imple- 

menting recursion over data structures. This can be used with the natural numbers type to 

write recursive equations for Quartz blocks in the form R0 = g and Rn+1 = f(Rn). This is 

limited to cases where the recursion of the block is controlled by a single integer parameter 

that decreases to zero, however this comfortably describes most recursively defined Quartz 

blocks. 

We have carried out the translation from recdef to primrec manually, however there is no 

reason why the process could not be automated. 

4.5.3 Generating Proof Obligations 

The Quartz compiler also generates a series of proof obligations that check the correctness 

of a layout specification. The correctness theorems are split into three groups, representing 

proofs for validity, containment and intersection. These theorems are contained within the 

same theory file that defines the block’s semantic definition, height and width functions. This 

ensures that the theory of a block can only be loaded to support that of a block dependent 

on it once it has itself been proved correct. 

All theorems are universally quantified across all domain and range signals. Validity theorems 

are the simplest and are proved under the assumption that the size functions of all higher- 

order block parameters are also valid. Assertion pre-conditions asserted within the block’s 

body are also assumed and can be used in the proof. The general format of the height validity 

theorem for a block B of type d1 . . . dn ∼ r is:


∀ d1 . . . dn r. 

 

(∀(A :: block d A 1 . . . dA m ∼ rA ) ∈ {d1, . . .,dn, r}. 

∀ d A 1 . . .d A m r A . 0 ≤ HeightA d A 1 . . . d A m r A ) ∧ 

 

(∀ c ∈ assertions(B). c) ⇒ 

0 ≤ HeightB d1 . . . dn r 

The width theorem is similar, with width functions substituted for height functions 1 . 

The validity theorems for each block’s height and width function are given the names 

height ge0 and width ge0 and can be used in other theories. It is common to require these 

proofs when determining the validity of size functions for blocks that use them. 

Containment theorems are generated for each statement that involves block instantiations, 

stating that the leftmost bottom point of each block is greater than of equal to (0, 0) and 

the top rightmost is less than or equal to (width, height) for all possible values of block 

parameters (provided assertions are met). The recursive descent algorithm that calculates 

containment theorems is shown in Figure 4.10. 

For the map n R combinator this generates a containment theorem of: 

theorem "Î(n::int) (R::((’t395⇒ ’t396⇒ bool,’t395⇒ ’t396⇒ int)block)) (i::(’t395)vector) (o 

::(’t396)vector). ∀ (j::int). ((0 ≤ j) ∧ (j ≤ (n − 1))) −→ Def (i ;;; R ;;; o ) ; ∀ ( 

qs691::’t395) (qs692::’t396). 0 ≤ (Height (qs691 ;;; R ;;; qs692)) ; ∀ (qs691 ::’ t395) (qs692 

::’ t396). 0 ≤ (Width (qs691 ;;; R ;;; qs692)) =⇒ 

∀ (j :: int). ((0 ≤ j) ∧ (j ≤ (n − 1))) −→ (((0::int) ≤ 0) ∧ (0 ≤ sum (0, j − 1, λqs403. 

Height (i ;;; R ;;; o ))) ∧ ((0 + (Width (i ;;; R ;;; o ))) ≤ ( 

maxf (0, n − 1, λqs401. Width (i ;;; R ;;; o )))) ∧ ((sum (0, j − 1, λ 

qs403. Height (i ;;; R ;;; o )) + (Height (i ;;; R ;;; o ))) ≤ 

sum (0, n − 1, λqs402. Height (i ;;; R ;;; o ))))" 

Note that the Quartz compiler has annotated the Isabelle theory with the results of its own 

1 Actually the Isabelle module generates 4 validity theorems, in two different representations. This is done 

because a size function can be evaluated either as (Height A) p1 . . . pn or as Height(pn−1 ; A p1 . . . pn−2 ; pn). 

Both of these are precisely equivalent and the compiler automatically generates a proof for one in terms of 

the other, requiring the designer to only prove 2 of the theorems manually. It was originally felt that both 

representations could be useful in proofs however it has emerged that one format is the most useful and thus 

the other 2 theorems are essentially redundant.


CON T :: Block → Bool 

CON T block bid d1 . . . dn ∼ r { τ1 id1 . . . τp idp. stmts } = 

∀d1 . . . dn r. S ∅stmts ⇒ SCON T ′ stmts 

SCON T ′ :: StmtList → Bool 

SCON T ′ 

β stmt1 . . . stmtn = 

let c1 = SCON T stmt1 in 

. 

let cn = SCON T stmtn in 

c1 ∧ . . . ∧ cn 

SCON T :: Stmt → Bool 

SCON T assert e str = True 

SCON T e1 = e2 = True 

SCON T if e { stmts1 } else { stmts2 } = 

if e then SCON T ′ stmts1 else SCON T ′ stmts2 

SCON T for i = e1..e2 { stmts } = 

∀i. e1 ≤ i ≤ e2 −→ SCON T ′ stmts 

SCON T a ; blkinst ; b at (x, y) = 

(0 ≤ x) ∧ (0 ≤ y) ∧ 

(x + Width(a ; blkinst ; b) ≤ widthd1 . . .dn r) ∧ 

(y + Height(a ; blkinst ; b) ≤ heightd1 . . . dn r) 

Figure 4.10: Generating containment theorems 

type inference, ensuring that the Isabelle types are correct. The semantic interpretation of 

the block is generated as a set of assumptions, defining assertions and possibly determining 

the values of internal signals (none in the case of map ). 

Intersection theorems are the most complex. They are generated for each block instantiation 

except the first, checking intersection against the previous statements. Figure 4.11 gives the 

algorithm that generates intersection proof obligations. At first glance this appears quite 

complex, however its structure is really quite simple. The compiler makes a forward pass 

through the block statements, accumulating statements that have already been processed in 

the list φ. For each block instantiation that is encountered the least and maximum x and 

y co-ordinates are identified and are then compared with the equivalent co-ordinates for all 

blocks previously instantiated. This implements Definition 7, as can be most clearly seen in 

the last clause of function IN T ERSECT . 

An important case is the handling of for loops by the function SIN T ERφ. This generates 

two set of requirements - that the elements within a loop don’t intersect with previously 

instantiated blocks and that the elements within the loop don’t intersect with each other. This


IN T ER :: Block → Bool 

IN T ER block bid d1 . . . dn ∼ r { τ1 id1 . . . τp idp. stmts } = 

∀d1 . . . dn r. S ∅stmts ⇒ SIN T ER ′ ∅ stmts 

SIN T ER ′ φ :: StmtListStmtList → Bool 

SIN T ER ′ φ stmt1 . . . stmtn = 

let c1 = SIN T ERφstmt1 in 

let c2 = SIN T ERφ∪{stmt1}stmt2 in 

. 

let cn = SIN T ER φ∪{stmt1,...,stmtn−1}stmtn in 

c1 ∧ . . . ∧ cn 

SIN T ERφ :: StmtList → Stmt → Bool 

SIN T ERφ assert e str = True 

SIN T ERφ e1 = e2 = True 

SIN T ERφ if e { stmts1 } else { stmts2 } = 

if e then SIN T ER ′ φstmts1 else SIN T ER ′ φstmts2 SIN T ERφ for i = e1..e2 { stmts } = 

∀ i. e1 ≤ i ≤ e2 −→ SIN T ER ′ φstmts ∧ 

∀ i j. e1 ≤ i ≤ e2 ∧ e1 ≤ j ≤ e2 ∧ i = j −→ SIN T ER ′ {i↦→j}stmtsstmts 

SIN T ERφ a ; blkinst ; b at (x1, y1) = 

let (x2, y2) = (x1 + Width(a ; blkinst ; b), y1 + Height(a ; blkinst ; b)) in 

IN T ERSECT ′ (x1,y1),(x2,y2)φ 

IN T ERSECT ′ (x1,y1),(x2,y2) :: ((Exp × Exp) × (Exp × Exp)) → StmtList → Bool 

IN T ERSECT ′ (x1,y1),(x2,y2) stmt1 . . . stmtn = 

let c1 = IN T ERSECT (x1,y1),(x2,y2)stmt1 in 

. 

. 

let cn = IN T ERSECT (x1,y1),(x2,y2)stmtn in 

c1 ∧ . . . ∧ cn 

IN T ERSECT (x1,y1),(x2,y2) :: ((Exp × Exp) × (Exp × Exp)) → Stmt → Bool 

IN T ERSECT (x1,y1),(x2,y2) assert e str = True 

IN T ERSECT (x1,y1),(x2,y2) e1 = e2 = True 

IN T ERSECT (x1,y1),(x2,y2) if e { stmts1 } else { stmts2 } = 

if e then IN T ERSECT (x1,y1),(x2,y2)stmts1 

else IN T ERSECT (x1,y1),(x2,y2)stmts2 

IN T ERSECT (x1,y1),(x2,y2) for i = e1..e2 { stmts } = 

∀ i. e1 ≤ i ≤ e2 −→ IN T ERSECT (x1,y1),(x2,y2)stmts 

IN T ERSECT (x1,y1),(x2,y2) a ; blkinst ; b at (x ′ 1, y ′ 1) = 

let (x ′ 2 , y′ 2 ) = (x′ 1 + Width(a ; blkinst ; b), y′ 1 + Height(a ; blkinst ; b)) in 

(x ′ 2 ≤ x1) ∨ (x2 ≤ x ′ 1 ) ∨ (y′ 2 ≤ y′ 1 ) ∨ (y2 ≤ y ′ 1 ) 

Figure 4.11: Generating intersection theorems


is handled by checking the statements against themselves with a new identifier substituted 

for the loop variable. Intersection is only permitted if the two loop variables are the same and 

thus describing one iteration of the loop, which will obviously intersect totally with itself. 

The intersection theorem generated for map illustrates this, containing only the case checking 

that loop iterations do not overlap: 

theorem "Î(n::int) (R::((’t395⇒ ’t396⇒ bool,’t395⇒ ’t396⇒ int)block)) (i::(’t395)vector) (o 

::(’t396)vector). ∀ (j::int). ((0 ≤ j) ∧ (j ≤ (n − 1))) −→ Def (i ;;; R ;;; o ) ; ∀ ( 

qs691::’t395) (qs692::’t396). 0 ≤ (Height (qs691 ;;; R ;;; qs692)) ; ∀ (qs691 ::’ t395) (qs692 

::’ t396). 0 ≤ (Width (qs691 ;;; R ;;; qs692)) =⇒ 

∀ (j :: int) (j ’:: int). ((0 ≤ j) ∧ (j ≤ (n − 1)) ∧ (0 ≤ j’) ∧ (j’ ≤ (n − 1)) ∧ (j’ = j)) −→ 

(((0 + (Width (i ;;; R ;;; o ))) ≤ 0) | ((0 + (Width (i ;;; R ;;; o ))) ≤ 

0) | ((sum (0, j’ − 1, λ qs403. Height (i ;;; R ;;; o )) + (Height (i ;;; R ;;; o ))) ≤ sum (0, j − 1, λqs403. Height (i ;;; R ;;; o ))) 

| ((sum (0, j − 1, λqs403. Height (i ;;; R ;;; o )) + (Height (i ;;; 

R ;;; o ))) ≤ sum (0, j’ − 1, λqs403. Height (i ;;; R ;;; o ))))" 

It is important to note that the algorithms given in Figure 4.10 and Figure 4.11 are pseudo- 

code and the implementation in the Quartz compiler differs slightly. For example, the com- 

piler carries out a large number of optimisations to eliminate unnecessary goals that are 

defined as being true. In addition, rather than generating one large intersection theorem 

the compiler splits it on a statement by statement basis into multiple theorems to make the 

individual proofs a little simpler. 

While containment and intersection theorems are not used elsewhere, the validity theorems 

often are. It is therefore often useful to “prune” the assumption sets of these theorems to 

remove assumptions that are not necessary for the proof. When these theorems are used in 

other proofs the assumptions will themselves become proof goals and proofs can be simplified 

if the number of assumptions is minimised.


4.6 Proving the Prelude Library 

One set of Quartz blocks it is essential to prove is the prelude library, the basic set of 

operations that encode a variety of useful functions. The majority of the blocks in the 

prelude are actually wiring constructs rather than combinators, however it is still necessary 

to give them all a layout interpretation. We have done this according to the following rules: 

1. Wiring blocks, which simply re-arrange signals, are given size 0 × 0. 

2. Repeated composition (R n /rcomp) is laid out horizontally. /tri and ˜ /irt blocks 

which both use repeated composition are laid out vertically. map is laid out vertically, 

as are col and rdr . 

3. All other blocks are laid out horizontally, with the exception of grid which is two 

dimensional. 

The prelude library provides a fairly comprehensive set of blocks with different signal arrange- 

ments and is thus useful for experimenting with how automatic proof tools can be tuned to 

minimise the human intervention in proofs. 

4.6.1 Proofs with Tacticals 

Our initial approach to automating proof of the Prelude library involves generating proof 

tacticals in the Quartz compiler Isabelle module. Tacticals combine elementary proof steps 

and automated tactics with basic repetition or choice operations. 

We design tacticals based on experience with hand-proof of a variety of prelude blocks. These 

tacticals are based on invocations of Isabelle’s simplifier with specific simplification rules, 

interspersed with uses of the primitive rule method to decompose goals into multiple simpler 

sub-goals using theorems such as Theorems 13 and 17. Auto-generating proof scripts for all 

prelude blocks we find that many theories run correctly with the automatically generated 

tacticals without any human intervention, however some require intervention. 

The conjugate and conjugate2 blocks (defined in Figure 4.12) require additional intervention 

to prove the validity of their size functions due to the use of series composition. We de-


block conjugate (block Q ‘a ∼ ‘a, block P ‘a ∼ ‘a) (‘a i) ∼ (‘a o) attributes { 

height = height (i ;converse P ; Q ; P ; o). 

width = width (i ;converse P ; Q ; P ; o). 

} → converse P ; Q ; P. 

block conjugate2 (block R (‘a,‘b) ∼ (‘a,‘b), block S (‘a,‘b) ∼ (‘a,‘b)) 

(‘a i1, ‘b i2) ∼ (‘a o1, ‘b o2) attributes { 

height = height ((i1,i2) ; swap ; converse S ; swap ; R ; S ; (o1,o2)). 

width = width ((i1,i2) ;swap ; converse S ; swap ; R ; S ; (o1,o2)). 

} → swap ; converse S ; swap ; R ; S. 

Figure 4.12: Some of the Prelude library blocks that require manual intervention to prove 

their layouts using the tactical-based methods. 

velop two new theorems about series compositions that are useful here for decomposing this 

problem: 

Theorem 20ÎPQxy. Îxy. 0 ≤ Width P x y ;Îxy. (0:: int) ≤ Width Q x y 

=⇒ 0 ≤ Width (P ;; Q) x y 

Theorem 21ÎPQxy. Îxy. 0 ≤ Height P x y ;Îxy. (0:: int) ≤ Height Q x y 

=⇒ 0 ≤ Height (P ;; Q) x y 

Proof By simplification, expanding the definitions of series composition and “let”. Theo- 

rem 20 also requires the use of the simple lemma 0 ≤ m ∧ 0 ≤ n ⇒ 0 ≤ m + n. Mechanised 

proofs are given in Appendix B.5 as part of the SeriesComposition theory. 

These theorems essentially prove the validity of series composition size functions, assuming 

that the size functions of their constituent blocks are also valid. We prove similar theorems 

for parallel composition in Appendix B.6. 

Using these theorems we can prove the validity theorems for these two blocks. conjugate2 also 

requires additional intervention to split a single identifier representing a pair into two values, 

something that can be done automatically by Isabelle’s auto method. The combinators tri, 

irt, below and grid also require some manual intervention to re-write the proof scripts. Once 

all proof scripts are correct, Isabelle can prove the entire library in 47 seconds of processing 

time.


Overall our tacticals prove good at proving intersection theorems and are mostly effective at 

proving validity and containment theorems. 

4.6.2 Improved Proof Scripts 

Following our experiments with tactical-based proofs of the Prelude library, we attempt the 

same proofs using Isabelle’s standard auto proof method. This method interleaves invocations 

of the simplifier and classical reasoner and can be supplied with sets of simplification rules 

and theorems to use. All our decompositional theorems are proved in the style of introduction 

rules and are supplied to auto as such. 

This method proves effective for validity and containment theorems however its results on 

intersection theorems are far from impressive. Isabelle’s automatic tools consistently select 

the wrong parts of the disjunctions to attempt to prove and leave proof states that are not 

just unproven but actually unprovable. 

We therefore design a new set of rules for generating proof scripts that combines the best of 

these methods. auto-based proofs are generated for validity and containment theorems while 

custom tacticals are generated for intersection theorems. The intersection tacticals include 

an invocation to auto as a last resort when the other options in the tactical fail to prove the 

goal, thus handling the rare circumstances when the classical reasoner can prove a goal but 

our custom tactical can not. 

Returning to our map example, the complete height function validity theorem and proof is 

generated as: 

theorem height ge0 int: "Î(n::int) (R::((’t395⇒ ’t396⇒ bool,’t395⇒ ’t396⇒ int)block)) (i::(’ 

t395)vector) (o ::(’t396)vector). 

∀ (qs691 ::’ t395) (qs692 ::’ t396). 0 ≤ (Height (qs691 ;;; R ;;; qs692)) =⇒ 

0 ≤ (height (n, R) i o )" 

apply (auto intro: sum ge0 maxf ge0 sum ge0 frange maxf ge0 frange z aleq bc simp add: 

done 

Let def max def) 

The containment theorem is similarly proved using an appropriately parameterised auto. The


compiler procedure that generates these scripts also supplies the height ge0 and width ge0 

theorems for all blocks used in the description as introduction rules (none in this case, since 

map only instantiates the supplied R block). This allows validity proofs to build upon one 

another. 

The intersection theorem for map (page 87) is proved by the generated tactical: 

apply (simp, rule impI, simp)? 

apply (( 

done 

(rule allI )+, 

(case tac "0 ≤ n"), 

rule impdisj 12of4, 

(rule loop sum overlap|rule loop sum overlap’), 

(simp add: overlap0’’)+) | 

((rule allI )+, 

(case tac "0 ≤ n"), 

rule impdisj 34of4, 

rule loop sum overlap2, 


auto intro: sum ge0 maxf ge0 sum nsub1 plusf maxf encloses) 

The loop sum overlap theorems are proved in the Structures theory. This theory contains 

theorems that match common layout structures, such as the layout of components in a loop. 

loop sum overlap is given as: 

Î(n::int) (j :: int) (j ’:: int). m ≤ n ;Îy. 0 ≤ f y =⇒ 

((m ≤ j) ∧ (j ≤ (n − 1)) ∧ (m ≤ j’) ∧ (j’ ≤ (n − 1)) ∧ (j’ = j)) −→ 

((sum (m, j − 1, f) + f j) ≤ sum (m, j’ − 1, f) | 

(sum (m, j’ − 1, f) + f j ’) ≤ sum (m, j − 1, f)) 

Its proof involves a number of steps and is given in Appendix B.8. The other loop sum overlap 

theorems are similar. 

Table 4.1 gives statistics on the proofs for some of the blocks in the Prelude library and details 

for all of those where proofs required manual intervention. Overall of nearly 40 blocks in the 

Prelude library, only 5 required manual intervention in their proofs. Using the auto method


Block Type Theorems Intervention Required 

id Wiring 2 

dash Wiring 2 

dstl Composite wiring 5 Expand definition of mfork 

dstr Composite wiring 5 Expand definition of mfork 

pair Composite wiring 3 

rcomp (R n ) Combinator 4 

tri ( ) Combinator 4 

irt ( ˜ ) Combinator 4 Manual containment & intersection 

beside (R↔S) Combinator 5 

row Combinator 4 

conjugate (R\S) Combinator 3 Handling of series composition 2 

conjugate2 (R \S) Combinator 3 Handling of series composition 

Table 4.1: Statistics on the layout proofs for some of the prelude library blocks 

is slower than the tactical-only approach, requiring 1 minute 11 seconds to execute the full 

proofs. However, we are more interested in the amount of human intervention required to 

prove layouts rather than CPU run-time, so long as it remains reasonably low. 

4.6.3 Building a Library 

Because the prelude library is used in virtually every Quartz circuit description it is desirable 

to not only prove its layout correct but also to ensure that the theorems the proofs make 

available are formatted in the most appropriate format to ease later proofs. 

This involves re-phrasing the height ge0 and width ge0 theorems for each block to remove 

unnecessary assumptions, since these would be unnecessary proof burdens on any later proof. 

At the same time we are also able to simplify the auto-generated proof scripts to remove 

redundant proof commands. 

Once final proof scripts for the prelude library are completed they are compiled into an 

Isabelle heap image that can be loaded directly in the same way as the HOL base system or 

the QuartzLayout library. This means that blocks which use prelude theories do not need 

to run the proofs before they can be used. In the Quartz placed prelude library all blocks 

are given the “layout-proved” attribute, indicating to the layout verification modules of the 

compiler that proof scripts do not need to be generated for them. 

2 The application of the series composition decomposition theorems should be automated when supplied 

to auto, however the proof tools do not always apply them correctly.


R (3) 

R (2) 

R (1) 

R (0) 

(a) 

imap 4 R 

R (0) R (1) R (2) R (3) 

(b) irow 4 R 

Figure 4.13: Index operators 

R (3,0) R (3,1) R (3,2) 

R (2,0) R (2,1) R (2,2) 

R (1,0) R (1,1) R (1,2) 

R (0,0) R (1,0) R (2,0) 

(c) igrid 3,4 R 

The full definitions and proofs for some of the prelude library blocks are given in Appen- 

dix C.1. This appendix omits all wiring blocks, where proofs are usually trivial and many 

blocks where the block structures are very similar and thus the proofs identical to others 

(such as col, which is very similar to row ). 

4.7 Proving Other Combinators 

While the prelude library consists of some extremely useful constructs, most of the blocks 

in it are quite simple. In Chapter 6 we will investigate the effectiveness of our verification 

framework when applied to full circuit descriptions, however we are also interested in the 

ease with which we can prove other useful libraries of combinators. 

4.7.1 Index Operators 

The index operators are versions of some of the standard Quartz prelude blocks which pa- 

rameterise their blocks with an integer parameter. For example, the index-map combinator 

imap n R is similar to map n R except the it instantiates instances of R parameterised with 

0, 1, . . .,n − 1 as shown in Figure 4.13(a). Operations such as irow n R (Figure 4.13(b)) and 

igrid n R (Figure 4.13(c)) correspond to rown R and grid n R respectively. 

The index operators are particularly important examples for our system because the extra 

parameterisation of the R block could lead to the size of each instance of R being different.


block irow (int n, block R int (‘a, ‘b) ∼ (‘c, ‘a)) (‘a l , ‘b t[n]) ∼ (‘c b[n], ‘a r) 

attributes { 

height = if(n==0, 0, height((l, t) ; snd (converse (apr (n − 1))) ; 

beside (irow (n−1, R), R n) ; fst (apr (n−1)) ; (b, r))). 

width = if(n==0, 0, width((l, t) ;snd (converse (apr (n − 1))) ; 

beside (irow (n−1, R), R n) ; fst (apr (n−1)) ; (b, r))). 

} { 

// Wires: l = left , t = top, b = bottom, r = right 

assert (n >= 0) ”n >= 0 is required”. 

if (n == 0) { l = r. } // b and t are empty vectors anyway 

else { 

(l , t) ; 

snd (converse (apr (n − 1))) ; 

beside (irow (n−1, R), R n) ; 

fst (apr (n−1)) ; 

(b, r) at (0,0). 

} . 

} 

Figure 4.14: Recursive definition of irow n R 

We investigate proofs of two different versions of the index operators: defined iteratively and 

defined recursively. Recursively defined versions of some index operators are of particular 

interest because these versions can be used to represent less regular circuit descriptions. 

Iterative versions of the index operators are defined a very similar way to the standard prelude 

operators, and the proof infrastructure developed and tested on the prelude works well for 

them - all iterative definitions are proved fully automatically. 

Recursive definitions are slightly more complex and we will examine irow n R as an example. 

Figure 4.14 gives the recursive definition of this combinator. 

The first step in verifying this layout is to generate the theory definitions using the compiler as 

normal, however we must then re-jig the definitions to use Isabelle’s primrec rather than recdef 

constructs (see Section 4.5.2). This process could be performed automatically. The recursive 

width function definition for the irow block after this process is given in Figure 4.15. Because 

the width function recurses over the natural number n, when the parameter is supplied to 

the R block the conversion function “int” must be used to convert the natural number to an 

integer. 

Similarly, the definition of the irow uses the int2nat function we define in the IntAlgebra


primrec 

"width 0 R l t b r = 0" 

"width (Suc n) R l t b r = 

Width ((l, t) ;;; 

snd $ (converse $ (apr $ (int n))) ;; 

beside $ ((| Def = λb c. arbitrary, Height = λb c. arbitrary, Width = λ 

(l,t) (b, r). width n R l t b r |), R $ (int n + 1)) ;; 

fst $ (apr $ (int n)) 

;;; (b, r) 

)" 

Figure 4.15: Isabelle definition of the irow width function 

theory to convert the integer parameter to a natural number. int2nat is similar to the inbuilt 

nat type converter except that it is only defined for values where n ≥ 0. 

Proofs for recursive functions tend to follow a simple structure: induction and then some 

application of auto, possibly combined with other methods. In order to prove the validity 

theorems for irow it is necessary to massage the propositions [55], in order to move variables 

that must be encompassed by the induction onto the right-hand-side of the meta-implication. 

For example, the width validity theorem for irow is phrased as: 

theorem width ge0 int [rule format]: " 

Î(n::nat) 

(R::(( int⇒ (’t107∗’t108)⇒ (’t109∗’t107)⇒ bool,int⇒ (’t107∗’t108)⇒ (’t109∗’t107)⇒ int)block)). 

∀ (qs137::int) (qs138 ::(’ t107∗’t108)) (qs139 ::(’ t109∗’t107)). 

0 ≤ (Height (qs138 ;;; R $ qs137 ;;; qs139)) ; 

∀ (qs137::int) (qs138 ::(’ t107∗’t108)) (qs139 ::(’ t109∗’t107)). 

0 ≤ (Width (qs138 ;;; R $ qs137 ;;; qs139)) 

=⇒ ∀ l t b r. 0 ≤ (width n R l t b r)" 

This differs from the standard representation in that the signals l, t, b and r have been moved 

from being meta-quantified to being object-level quantified. The “rule format” tag instructs 

Isabelle to re-phrase the theorem using meta-quantification once it is proved. 

Proof of this theorem involves applying induction to split it into two cases to prove: 

goal (theorem (width ge0 int), 2 subgoals): 

1.ÎnR. ∀ qs137 qs138 qs139. 0 ≤ Height (qs138 ;;; R $ qs137 ;;; qs139); 

∀ qs137 qs138 qs139. 0 ≤ Width (qs138 ;;; R $ qs137 ;;; qs139) 

=⇒ ∀ l t b r. 0 ≤ irow.width 0 R l t b r


2.ÎnRna. 

∀ qs137 qs138 qs139. 0 ≤ Height (qs138 ;;; R $ qs137 ;;; qs139); 

∀ qs137 qs138 qs139. 0 ≤ Width (qs138 ;;; R $ qs137 ;;; qs139); 

∀ l t b r. 0 ≤ irow.width na R l t b r 

=⇒ ∀ l t b r. 0 ≤ irow.width (Suc na) R l t b r 

The proof should now be completed by auto intro: width ser ge0, however the automatic 

proof tools do not work in this case. We can however prove the base case and expand the 

induction case using only the simplifier: 

> apply (simp, simp) 

goal (theorem (width ge0 int), 1 subgoal): 

1.ÎRna. 

∀ qs137 a b aa ba. 0 ≤ Height R qs137 (a, b) (aa, ba); 

∀ qs137 a b aa ba. 0 ≤ Width R qs137 (a, b) (aa, ba); 


=⇒ ∀ l t b r. 

0 ≤ Width 

(snd.snd $ (converse.converse $ (apr $ int na)) ;; 

beside $ 

((| Def = λb c. arbitrary, Height = λb c. arbitrary, 

Width = λ(l, t). split (irow.width na R l t) |), 

R $ int na + 1) ;; 

fst . fst $ (apr $ int na)) 

(l, t) (b, r) 

We can apply the width ser ge0 rule manually, after removing the universal quantifiers, 

which splits the goal into three sub-goals: 

> apply (rule allI )+ 

> apply (rule width ser ge0)+ 

goal (theorem (width ge0 int), 3 subgoals): 

1.ÎRna l t b r x y xa ya. 



∀ l t b r. 0 ≤ irow.width na R l t b r


=⇒ 0 ≤ Width (snd.snd $ (converse.converse $ (apr $ int na))) xa ya 

2.ÎRna l t b r x y xa ya. 




=⇒ 0 ≤ Width 

(beside $ 

((| Def = λb c. arbitrary, Height = λb c. arbitrary, 

Width = λ(l, t). split (irow.width na R l t) |), 

R $ int na + 1)) 

xa ya 

3.ÎRna l t b r x y. 




=⇒ 0 ≤ Width (fst.fst $ (apr $ int na)) x y 

The proof can then be completed by auto. Similar techniques can be adopted for other proofs, 

although not all will require induction (the containment theorem for irow does not). Note 

that we have not required any properties of maxf or sum - these functions are used when 

describing the layout of iteratively defined blocks and are not needed for recursively defined 

combinators. 

We also attempted proofs for manually specified size functions which expanded the definitions 

of apr and apl etc, referring to explicit vector indexes. These are simpler expressions than the 

full expressions produced by the inference algorithm, however a slightly unexpected result is 

that this substantially complicates the proofs. Proof now requires expansion of the definitions 

of intermediate signals within the series compositions to check that there is a correspondence 

between the values produced by the usages of the append blocks and those that are manually 

specified. This should be a simple process but it is not because of the way blocks are defined 

as logical predicates rather than functions. 

The usage of block predicates to define values bound by the define description operator 

requires the elimination of definite descriptions to extract the real value. This is a significant


B 

A 

C 

Figure 4.16: An irregular grid such as this one is impossible to describe using purely beside 

and below relative placement. 

proof obligation and can be accomplished by using HOL theorems such as: 

theorem the equality: "P a;Îx. P x =⇒ x = a =⇒ (THE x. P x) = a" 

theorem theI2: "P a;Îx. P x =⇒ x = a;Îx. P x =⇒ Q x =⇒ Q (THE x. P x)" 

Theorem theI2 is the most useful, since it allows a definite description to be extracted from 

within a let-definition. However, selecting the correct value is then a proof obligation in 

three parts: proving that a value bound by the predicate exists, proving that it is unique 

and proving that it satisfies the original proposition. This is a complex and fiddly process 

which in general is not worth bothering with. It is, however, an illuminating and unexpected 

observation that directional abstraction, which generally aids reasoning about functional 

properties, complicates reasoning about layout when internal signals are involved. 

Full definitions and theories for some of the recursive index operators can be found in Ap- 

pendix C.2. 

4.7.2 An Irregular Grid 

In Chapter 1 we introduced one example of a pathological layout arrangement that was 

impossible to describe using purely beside and below relative placement constructs. This 

irregular grid arrangement is shown in Figure 4.16. 

Quartz can describe a combinator with this layout, as shown in Figure 4.17. Layout proofs 

were attempted for two versions of this combinator: one with an inferred size function and 

one where the size function was specified manually. There was a marked difference in the 

D 

E


block irregular grid (block A ‘a1 ∼ ‘a2, block B ‘b1 ∼ ‘b2, 

block C ‘c1 ∼ ‘c2, block D ‘d1 ∼ ‘d2, block E ‘e1 ∼ ‘e2) 

(‘a1 a1, ‘b1 b1, ‘c1 c1, ‘d1 d1, ‘e1 e1) ∼ 

(‘a2 a2, ‘b2 b2, ‘c2 c2, ‘d2 d2, ‘e2 e2) 

attributes { 

height = max (height (a1;A;a2) + height(b1;B;b2), 

height(a1;A;a2)+height(c1;C;c2)+height(d1;D;d2), 

height(e1;E;e2)+height(d1;D;d2)). 

width = max (width(a1;A;a2) + width(e1;E;e2), 

width(b1;B;b2) + width(c1;C;c2) + width(e1;E;e2), 

width(b1;B;b2) + width(d1;D;d2)). 

} { 

a1 ; A ; a2 at (0,0). 

b1 ; B ; b2 at (0, height(a1 ; A ; a2)). 

c1 ; C ; c2 at (width(b1 ;B ; b2), height(a1 ; A ; a2)). 

d1 ; D ; d2 

at (width (b1 ;B ; b2), 

max (height (c1 ;C ; c2) + height(a1 ;A ; a2), height(e1 ; E ; e2))). 

e1 ; E ; e2 

at (max (width (a1 ;A ;a2), width(c1 ;C ; c2) + width(b1 ;B ; b2)), 0). 

} 

Figure 4.17: Quartz description for the irregular grid arrangement shown in Figure 4.16 

script execution times for the two combinators, with the inferred size combinator requiring 

2 min 37s to execute while the manually specified size combinator required only 31s. In 

both cases proof scripts required some minor amendments from the auto-generated defaults, 

although these were relatively simple. 

The most important observation from this combinator stems from proof of its four intersection 

theorems. During proofs of these theorems a bug in the layout description was discovered, 

illustrating that the proof process can have a role in the early debugging of this kind of layout 

description. While one downside of theorem proving as opposed to model checking is that it 

does not provide counter-examples, failure to prove an intersection proof obligation tends to 

leave a proof state that with only a little massaging clearly reveals the source of the error. 

Appendix C.3 gives the full proof script for this combinator. 

4.7.3 H-Tree 

A H-tree is type of layout shown in Figure 4.18. It is of particular interest in circuit design 

since it can be used to lay out a tree-shaped circuit in a square block with balanced wire


R 

R R R 

R 

R 

R R 

R 

R 

R R R 

R 

R 

R 

R 

R 

R 

R R R 

R 

R 

R 

R R R 

Figure 4.18: An H-Tree arrangement of R blocks for n = 5 

block htree (int n, block R (‘a, ‘a) ∼ ‘a) (‘a i [m]) ∼ (‘a o) { 

const m = 2 ∗∗ n. 

‘a st1 in[m/2], st2 in[m/2], st1 out, st2 out. 

if n == 0 { o = i[0]. } else { 

i ; half (m/2) ; (st1 in, st2 in) at (0,0). 

if (n mod 2 == 0) { 

// Vertical sub−tree arrangement 

st1 in ; htree (n−1, R) ; st1 out at (0,0). 

(st1 out, st2 out) ; R ; o 

at (0, height(st1 in ; htree (n−1, R) ; st1 out)). 

st2 in ; htree (n−1, R) ; st2 out 

at (0, height(st1 in;htree (n−1, R);st1 out) + 

height ((st1 out, st2 out);R;o)). 

} else { 

// Horizontal sub−tree arrangement 

st1 in ; htree (n−1, R) ; st1 out at (0,0). 


at (width(st1 in ;htree (n−1, R) ; st1 out), 0). 


at (width(st1 in;htree (n−1, R);st1 out) + 

width ((st1 out, st2 out);R;o), 0). 

} . 

} . 

} 

Figure 4.19: Quartz description for the a H-tree combinator 

R 

R 

R


lengths (save on the interface points). This kind of structure can be realised by a recursive 

combinator which alternates between a horizontal and vertical layout for each sub-tree. A 

Quartz description of this combinator can be seen in Figure 4.19. Functionally this combi- 

nator is identical to a standard binary-tree arrangement, although the layout description is 

quite complicated. 

The verification of this kind of combinator is especially important, as we saw with the irreg- 

ular grid example, since the complex layout is relatively more likely to contain errors. The 

semantic definition, height and width functions for this block are recursive and defined using 

Isabelle primrec constructs. The use of multiple internal signals requires special handling. 

We have used tuples of internal signals to bind them in a single predicate, however Isabelle’s 

handling of tuples does not allow these to be split easily, so it is better to re-define the internal 

signals individually binding them with identical copies of the predicate, where other signals 

are existentially quantified within it. This leads to definitions that are long and contain a 

great deal of redundancy but makes proofs substantially easier and since the proof tool is 

designed to handle large definitions and proof scripts easily this is a trade-off worth making. 

Validity theorems are proved by induction on n and then use of the auto method. Contain- 

ment theorems require a combination of auto with some primitive deduction. Intersection is 

proved automatically with the expansion of the definition of the half block. 

4.7.4 Surround 

In Section 3.6 we introduced the surround combinator which describes a square block sur- 

rounded by interface elements. Figure 4.20 illustrates the generic instantiation of this com- 

binator with an A block surrounded by B, C, D and E interface elements. It is important 

to note that this combinator is general and could describe situations where the interface 

elements are the same, or absent (replaced by the identity block). 

The Quartz description for this block given in Figure 3.14 (Page 3.14) is similarly general. 

It must describe a correct layout for situations where the sizes of the blocks within the 

combinator vary arbitrarily. In fact, during the verification of this combinator we discovered 

an error in the block placement which would probably have otherwise remained undiscovered.


B 

C 

A 

D 

Figure 4.20: The surround combinator 

Appendix C.4 gives the Quartz description and correctness proof for this combinator’s layout. 

Validity of height and width expressions are proved by auto configured to expand let defin- 

itions and using Theorem 11 (z aleq bc). Containment proofs can also most be completed 

purely by auto however one requires the use of a variant of Theorem 9. 

The true value of the verification methodology comes into play with the intersection proofs. 

Once again, these are proved entirely automatically using purely auto - however the error that 

was discovered in the layout was discovered because an intersection theorem was not proved. 

The error was that C was naively placed with its y co-ordinate defined by heightA+heightD 

however this did not take into account of the fact that it was possible for it to overlap block E 

under some circumstances. A simple correction to define the y co-ordinate as the maximum 

of the height of E or A and D was sufficient to produce a valid layout. 

This is still a relatively simple layout for this combinator. A more complex layout description 

could use conditionals to compare the heights and widths of the various blocks and adjust 

their relative placement accordingly (for example, the B and E blocks could be aligned with 

the bottom of the combinator as a whole rather than the bottom of the A block if they have 

a greater height than A). 

E


4.8 Discussion 

Our approach to layout verification is effective, both at verifying the correctness of combina- 

tors and in finding the source of errors. However, it does have some drawbacks: 

1. It generates many proof goals for blocks. 

2. Many proofs that should be completed automatically require some manual intervention 

to tweak the role of the automated tactics. 

3. We have not formally established the link between the proof obligations we generate 

for each block and the original definitions of correctness. 

The first issue stems from the increased role of the size inference system over what was 

originally expected. When this work was begun it was presumed that size inference would be 

inefficient and it would almost always be preferable to manually specify block sizes. However, 

we have found that the opposite is often the case. Except for primitive blocks, where sizes 

must always be specified manually, size inference is usually easier than writing complex size 

expressions by hand. In addition, while hand-coded size expressions are more efficient than 

inferred ones, the differences between the two tend to follow common patterns (some of which 

we have proved as theorems during the course of this work). 

It seems likely that in most cases the manually specified size functions could be produced by 

applying correctness-preserving transformations to the inferred size expressions automatically 

in the compiler. 

Using the size inference system, it should no longer be necessary to prove validity and con- 

tainment for each block’s size expressions if these properties could be proved for the inference 

algorithm itself. We can satisfy ourselves by inspection that sizes inferred by the inference 

system have correct containment properties since the inference algorithm is designed to select 

the topmost, rightmost possible co-ordinate of a layout 3 . We can also prove a theorem about 

the size inference function to show validity of its results: 

Theorem 22 For all blocks A, where R is the set of higher-order parameters of A and for 

3 “Correct by definition” is not totally satisfactory, however it is sufficient for our purposes.


all r ∈ R, heightr and widthr are valid, the height and width expressions inferred for block 

A will be valid. 

Proof By induction on the structure of statements and using Theorems 9, 11, 13 and 17. 

Similar proofs must be completed for the size function for block compositions SI but these 

are similar and easy. 

However, if these theorems are simply omitted from the Isabelle theories this makes it im- 

possible for them to be referred to in proofs for blocks which have manually specified size 

functions. Manually specified size functions are far from useless - they can be simpler than 

inferred ones and allow blocks to be allocated sizes that are larger than required. This latter 

point is particularly important for run-time reconfigurable designs where it may be desirable 

to allocate the same (maximum) amount of space to a design regardless of how big it actually 

is and allow all different generated designs to grow or shrink within that boundary. 

A way around this is to describe the size inference function itself in Isabelle and prove these 

properties about it using a deep embedding of Quartz. A meta-theorem could then be proved 

in Isabelle/HOL to provide validity proofs for all size expressions produced by the inference 

algorithm. 

Issue 2 appears to mainly be one of implementation. Isabelle’s automated proof tools do 

produce very good results when their rule sets are correctly configured and further investi- 

gation is necessary to reveal whether these issues are caused by niceties in the phrasing of 

theorems or subtle bugs in the way they are applied by the classical reasoner. It should be 

stressed that where manual intervention in proofs has been necessary it has not required any 

theorems or lemmas that are not already present in the QuartzLayout library and thus for 

an experienced user the task is generally an easy one. 

The third issue is perhaps one of choice. Formal verification is time consuming and difficult. 

As such it makes sense to apply it only to the parts of a design or design process where it 

is most likely to yield results. No hardware development system is available that is formally 

verified from “top to bottom” and in our system we have chosen to implement formal verifi- 

cation for a particular subset, based on particular definitions that we can reason are correct 

in a semi-formal way.


Once again, an alternative to this would perhaps be to attempt a deep embedding of Quartz 

in Isabelle and to prove the correctness of layouts in terms of the compilation function. A 

deep embedding, particularly one which encompassed a full compilation procedure, would be 

an even more substantial undertaking than this shallow embedding. Layout verification based 

on a deep embedding of compilation (for intersection proofs) and the size inference algorithm 

(for validity and containment proofs) would, for each individual design, probably require the 

proof of significantly fewer theorems but each theorem is likely to be more complex. The 

benefit would be an increased level of formal assurance in the result but quite how much of 

a benefit this is remains an open question. 

Another aspect of our system that could be changed is the use of containment proofs. We 

have based layout verification around the assumption that each block can be contained within 

a rectangle and this rectangle can then be used as an abstraction for the block layout in later 

proofs. However, for irregular shaped blocks this may not be optimal - for example consider 

the case of two triangular circuits, which could be laid out inter-locking to form a rectangle 

- but only if their size functions are not rectangular. Introducing block boundaries described 

by arbitrary functions would massively complicate reasoning and the approach we would 

advocate using our system is to describe shapes such as these as new combinators as and 

when they are required - these layouts can then be proved as normal. An alternative is to 

relax the requirement on containment proofs and verify the containment of a set of blocks. 

This means verifying the layout at a level of a set of blocks rather than each individual block 

in the set. 

4.9 Summary 

In this chapter we have described a system for verifying Quartz circuit layouts in a shallow 

embedding of Quartz in Higher-Order Logic using the generic theorem prover Isabelle. We 

give Quartz a formal semantics in HOL. 

We have described new features of the Quartz compiler which support automatic conversion 

of Quartz descriptions into Isabelle/HOL definitions and the automatic generation of proof 

obligations to verify their layouts. Our modified compiler is also reasonably effective at gen-


erating proof scripts which will automatically prove these theorems using Isabelle’s simplifier 

and classical reasoner. 

We have demonstrated our system on a range of combinators including the full prelude and 

index operators libraries. Large theorems have been proven relatively easily and we have also 

demonstrated that our system can reveal the flaws in circuit layouts in a useful manner for 

aiding development, rather than purely giving a false result. 

Chapter 6 contains some examples of the usage of the layout generation and verification 

system on a range of complete designs.

Chapter 5 

Specialisation 

In this chapter we illustrate how Quartz can be used to create specialised, placed designs when 

some input values are known at compile-time. Section 5.1 introduces the benefits of design 

specialisation and some of its applications. In Section 5.2 we illustrate how we can achieve 

distributed specialisation transparently using Quartz “clever components” as primitive ele- 

ments. In Section 5.3 we discuss the limitations of our current infrastructure for distributed 

specialisation and outline the requirements for an optimal system, while Section 5.4 discusses 

the role of specialisation code at a higher-level than primitive components. Section 5.5 illus- 

trates high level specialisation of a multiplier circuit and evaluates the performance impact 

of compacting designs when logic is optimised away. Section 5.6 summarises this chapter. 


Design specialisation is a useful tool that can be used to optimise the performance of digital 

circuits when the values of some inputs are known. It is something that is so commonly used 

in circuit design as to almost not be worth mentioning with simple optimisations such as 

replacing a full multiplier by a constant co-efficient multiplier when one input is fixed or, at 

a meta-level, the choice of selecting a multiplier implementation rather than a full ALU for 

a circuit that will only perform multiplication. 

Typically most design specialisation is static specialisation - a design-time optimisation that 

107

CHAPTER 5. SPECIALISATION 108 

is carried out either by the designer or automatically by synthesis tools. The process can 

encompass a range of low level logic optimisations such as constant propagation and dead 

logic removal (eliminating logic that computes a result which is never used). However, the 

increasing role of FPGAs and their potential for run-time reconfiguration raises the possibility 

of dynamic specialisation - changing the circuit at run-time. 

At present the most common use of run-time reconfiguration is to swap different pre-synthesised 

library circuits on and off a chip. However, dynamic specialisation can be used to reconfigure 

an FPGA to carry out the same operation but to perform it in some way that is better - for 

example using fewer logic resources or being able to run at a higher clock frequency. 

Dynamic specialisation becomes a useful option for circuits which do not have static inputs 

but do have one or more inputs that changes at a much lower rate than the others. A good 

example is a cryptographic processor which may have two inputs: an encryption key and 

plaintext data to encrypt. If the key changes much less frequently than the plaintext then 

the decryption circuit can usefully be specialised for that key value, producing a design that 

is more efficient. 

The usefulness of dynamic specialisation depends on the trade-off between the expected 

benefit to be gained from specialisation and the time taken to generate a specialised design 

and reconfigure the FPGA. This trade-off is not necessarily one purely of time, since it could 

be that the desire is to free logic area on the FPGA for other uses rather than purely making 

the design run faster. 

Dynamic specialisation poses a particular difficulty from the point of view of design verifi- 

cation. Standard verification methodologies based on simulation (or even model checking of 

the finished design) are not going to be any use for verifying the correctness of designs which 

are being produced in the order of seconds rather than days, weeks or months - and very 

probably without any human intervention. This suggests that it is appropriate to employ 

formal verification and theorem proving for dynamic specialisation to verify the equivalence 

of a specialised version to the general-purpose design. 

Some success has been reported with this approach in verifying a procedure to partially 

evaluate a multiplier circuit using higher-order logic [80]. This partial evaluation procedure


operated at a low level on a placed and routed circuit and replaced unnecessary functional 

components with wires. However, this process leaves the bounding box of the circuit un- 

changed and is thus not effective at freeing logic area on the device. In addition this bounding 

box can contain long wire lengths which will have a performance impact on the specialised 

design, reducing its maximum clock frequency. 

An alternative approach to specialising the low level hardware is to generate a specialised 

design at a higher level. This however requires that the full process of synthesising, mapping, 

placing and routing a design is completed for the specialised circuit - something that will 

probably take far too long. This time can be reduced by taking the middle road and gener- 

ating a mapped, placed circuit which only then needs to be routed. Design methodologies 

which allow fast generation of FPGA bitstreams from this kind of description have been 

described [78]. 

In this chapter we demonstrate how our Quartz placement infrastructure can be used to 

specialise designs and illustrate the principle of distributed specialisation. 

5.2 Distributed Specialisation 

We use the term “distributed specialisation” to describe Quartz blocks which appear to be 

hardware elements but which actually contain code which controls their elaboration to pro- 

duce simpler hardware if possible. These self-specialising blocks can be seamlessly integrated 

into a design to achieve HDL-level support for specialisation. 

Distributed specialisation is characterised by the lack of any centralised control, making 

specialisation available transparently to the designer. We will also demonstrate the slightly 

counter-intuitive concept that distributing the specialisation code into multiple locations 

actually makes design verification easier as compared to when specialisation is explicitly 

controlled by a “specialise” input, such as has been demonstrated with the Pebble layout 

system [49]. 

Our self-specialising blocks are “clever components” [75], although they are quite different to 

those previously demonstrated. Rather than pairing hardware wires with extra information


A B Q 

0 0 0 

0 1 0 

1 0 0 

1 1 1 

(a) 2-input 

B Q 

0 0 

1 1 

(b) A=true 

B Q 

0 0 

1 0 

(c) A=false 

Figure 5.1: AND gate truth tables 

about them, we instead totally replace them with Quartz variables. 

5.2.1 Specialising Primitives 

We will first introduce this simple but powerful idea with a basic example: a 2-input logical 

and gate. The truth table for a 2-input and gate is shown in Figure 5.1(a), showing that 

the output signal Q is only asserted if both the inputs A and B are true. If one of the 

input signals is known then this truth table can be simplified as shown in Figure 5.1(b) and 

Figure 5.1(c). If A is false, then Q always equals false, regardless of the value of B, while if 

A is true then Q takes the value of B. 

This allows us to describe the specialisation of an and gate when the A input is fixed - if it 

is fixed with value true then the gate should specialise to a wire linking Q and B, while if it 

fixed with value false then Q should just be statically assigned the value false. 

In distributed specialisation we enclose this behaviour within a composite and2 block which 

will transparently carry out this operation when connected to a static value. We can use 

the Quartz overloading mechanism to select between different possible uses of the and gate 

primitive - with two wire values, or one wire and one known value or two known values - using 

the type system, assuming that static values are represented as Quartz booleans. Figure 5.2 

illustrates the overloaded Quartz blocks that can describe this operation. 

If the two input values are unknown (i.e. they are real dynamic values carried on hardware 

wires) then the hardware primitive and2 is selected which elaborates to a primitive gate with 

size 1 × 1. If both inputs are known then no hardware is generated at all and instead a 

boolean output variable is generated with the value of the and operation. If one input is 

known then either a wire or a static assignment of c to ground is generated.


// Hardware primitive 

block and2 (wire a, wire b) ∼ (wire c) attributes { height = 1. width = 1. }{ } 

// Specialising when both inputs are known 

block and2 (bool a, bool b) ∼ (bool c) attributes { width = 0. height = 0. } 

→ c = (a and b). 

// Specialising when one input is known 

block and2 (bool a, wire b) ∼ (wire c) attributes { width = 0. height = 0. } 

→ if a { c = b. } else { c = false. } . 

block and2 (wire a, bool b) ∼ (wire c) → (b, a) ; and2 ; c. 

Figure 5.2: Distributed specialisation of an and2 block 

A B Q 

0 0 0 

0 1 1 

1 0 1 

1 1 0 

(a) 2-input 

B Q 

0 1 

1 0 

(b) A=true 

B Q 

0 0 

1 1 

(c) A=false 

Figure 5.3: Exclusive-or gate truth tables 

A slightly more complicated example is an exclusive-or function. The full and specialised 

truth tables for this block are given in Figure 5.3. Here the relationship between B and Q 

when A is known is slightly different, if A is false then Q = B however if A is true then Q is 

B inverted. Distributed specialisation can describe xor2 blocks which enclose this behaviour, 

either connecting the two signals together or generating a simple inverter rather than a full 

xor gate. Figure 5.4 shows the Quartz description for xor specialisation. 

This example is a particularly significant one because it demonstrates how conditionals in 

size expressions can be used to reflect the specialisation in the layout of a circuit. These size 

expressions will be propagated through the circuit to create a size expression for the overall 

circuit which is dependent on the value of the static parameter. 

Specialising the xor2 gate to an inverter does not actually save any logic area on an FPGA, 

since logic functions are implemented by look-up tables, however if realised on an ASIC the 

difference between 8 transistors to implement an xor function and 2 to implement an inverter 

is definitely worth having.


// Hardware primitive 

block xor2 (wire a, wire b) ∼ (wire c) attributes { height = 1. width = 1. }{ } 

// Specialising when both inputs are known 

block xor2 (bool a, bool b) ∼ (bool c) attributes { width = 0. height = 0. } 

→ c = (a xor b). 

// Specialising when one input is known 

block xor2 (bool a, wire b) ∼ (wire c) 

attributes { width = if (a,1,0). height = if (a,1,0). } 

→ if a { b ; inv ; c. } else { c = b. } . 

block xor2 (wire a, bool b) ∼ (wire c) 

→ (b, a) ; xor2 ; c. 

5.2.2 Benefits 

Figure 5.4: Distributed specialisation of an xor2 block 

So far we have given simple examples of how distributed specialisation can be used to achieve 

constant propagation and elimination of unnecessary logic primitives. It is worth pointing 

out that this operation will be carried out by any reasonable synthesis tool anyway and we are 

not claiming that applying constant propagation to circuits is anything new. The strengths 

of distributed specialisation are its three advantages over low-level hardware optimisation by 

synthesis tools. 

Firstly, when using components that implement distributed specialisation in a laid out circuit, 

the circuit placement can itself be parameterised by the static parameters. This means that 

blocks can shrink in size as logic is eliminated and the remainder of the circuit components’ 

positions will be adjusted accordingly. Low-level constant propagation can not achieve this, 

even if operating only on a placed rather than placed & routed circuit. While the synthesis 

tools will eliminate logic that is not being used, it does not change the positions of other 

elements of the circuit and could not know how to move them anyway since the circuit layout 

is only parameterised in the high-level description. By moving specialisation up to the same 

level as the layout description (the HDL level) we are able to make sure that the layout is 

parameterised and not only propagate constants through a design but also achieve design 

compaction. 

Compaction has two advantages. It allows specialisation to reduce the overall size of the


circuit, rather than just eliminating some logic internally while maintaining the same overall 

bounding box. This means that the free logic resources on the FPGA can be used more 

efficiently. It also minimises wire lengths by moving components that would otherwise be 

joined by long wires to be adjacent to each other. 

The second advantage of distributed specialisation is that, because it is conducted at a high- 

level, it should be able to be processed much faster than low level constant propagation. This 

is aided because, due to the overloading mechanism only selecting specialising blocks if at 

least one input is static, constant propagation only needs to be analysed through the parts 

of the circuit where it can actually have an effect. 

In addition, distributed specialisation does not just have to take place at the primitive level. 

Designers who write hardware library blocks can also provide blocks with specialisation code 

which could perform high level optimisations rather than using the code in the lower level 

circuit blocks. For example, this means that a grid shaped circuit where entire rows are 

expected to be eliminated can be specialised at the row level rather than processing each 

row element individually. We discuss in the advantages of high level specialisation further in 

Section 5.4. 

Speed is an important factor in any mechanism that is intended for use in dynamic specialisa- 

tion applications where it is imperative to minimise the time taken to generate a new FPGA 

bitstream. Using distributed specialisation with the Quartz layout framework it is possible 

to describe constant propagation, mapping (by using only primitive hardware components 

in descriptions) and placement all within the high-level description leaving only routing and 

bitstream generation to be completed on the system output before it can be used to configure 

a device. 

The third advantage of distributed specialisation is that it provides a clear and modular 

framework with which to verify the specialisation procedure and thus, by extension, the 

correctness of all specialised circuits produced.


5.2.3 Verifying Distributed Specialisation 

In dynamic specialisation applications after the correctness of a general circuit has been 

carefully determined, through whatever means, it is vital to ensure that the new circuits 

generated by the specialisation system are functionally correct. Since it is not possible to 

simulate each new circuit as it is generated the only reasonable approach is to verify the 

specialisation process itself. 

Distributed specialisation provides a clear and modular approach to the verification prob- 

lem. Essentially we rely on the fact that a block with specialisation code should, under all 

circumstances, output the same logical value - regardless of how this is computed from static 

or dynamic hardware inputs. 

This splits the overall task of verifying a specialisation procedure into many smaller and much 

simpler tasks: verification of each self-specialising block. In theory it is much worse to have 

to verify each new specialising block that is written than to verify a single procedure that 

specialises any circuit however in practice we would suggest that this approach is superior. 

It is likely that most circuits will rely on self-specialising blocks that have already been 

developed and thus will not require further verification. The kind of blocks likely to have 

their own specialisation code developed will be library blocks which will should already be 

subject to extensive verification effort. If high-level specialisation has been implemented for 

blocks in order to increase the speed of processing then this will be functionally identical to 

the lower-level specialisation and the lower-level proofs can be used to verify the high-level 

specialisation procedure. 

Because specialisation is being carried out at the HDL level we do not need to concern 

ourselves with details of signal routing and need to perform only a high level functional 

verification. By dividing the verification task each individual verification goal will tend to be 

quite simple, to the point where we would expect automatic proof tools to be able to achieve 

a high level of automation in proving self-specialising blocks automatically. 

Figure 5.5 illustrates simple proofs for the partial specialisation and2 and xor2 blocks de- 

scribed in Figures 5.2 and 5.4. The hardware primitives have been given HOL semantics 

and the partial specialisation blocks described in their expanded HOL form as given by the


constdefs (∗ Semantics for hardware primitives ∗) 

and2 :: "wire⇒ wire⇒ wire⇒ bool" 

"and2 ≡ (λ a b c. c = (a ∧ b))" 

inv :: "wire⇒ wire⇒ bool" 

"inv ≡ (λ a b. b = (˜ a))" 

xor2 :: "wire⇒ wire⇒ wire⇒ bool" 

"xor2 ≡ (λ a b c. c = (a ∧ (˜ b) | b ∧ (˜a)))" 

constdefs (∗ Semantics for partial specialisation blocks ∗) 

and2’ :: "bool⇒ wire⇒ wire⇒ bool" 

"and2’ ≡ (λ a b c. if a then (c = b) else (c = False))" 

xor2’ :: "bool⇒ wire⇒ wire⇒ bool" 

"xor2’ ≡ (λ a b c. if a then (inv b c) else (c = b))" 

(∗ Correctness theorems for specialised blocks ∗) 

theorem and2 spec: "Îabc. and2 a b c = and2’ a b c" 

by (simp add: and2 def and2’ def) 

theorem xor2 spec: "Îabc. xor2 a b c = xor2’ a b c" 

by (simp add: xor2 def xor2’ def inv def) 

Figure 5.5: Functional verification of distributed specialisation of and2 and xor2 

semantic function Bβ (Figure 4.8, page 81). 

This is actually a rather simpler verification model than would be desired, since the wire 

type is simply a synonym for bool and thus there is no need for explicit reference to the types 

of the input/output signals - nevertheless it suitably demonstrates the concept. The two 

theorems demonstrating the equivalence of the blocks are proved by using just the simplifier 

and expanding the block definitions. 

Functional verification is only one side of the verification task: it is also vital to verify the 

correctness of the layout of a specialised circuit. This can be done using the layout verification 

framework we described in Chapter 4. When specialising designs it is useful to state and 

prove a particular additional correctness theorem of the form: 

∀sigs. Width cctspec sigs ≤ Widthcctgen sigs ∧ Heightcctspec sigs ≤ Heightcctgen sigs 

In fact, this theorem should be phrased so it is only universally quantified over dynamic 

parameters. Parameters which genuinely affect the size of the circuit and are not expected to 

vary at run-time (such as the bit-width of a ripple adder) can not be included in such proofs 

since it would clearly render the theorem unprovable.


b 

a 

cout 

cin 

Figure 5.6: Full adder cell 

These layout theorems can be proved for each self-specialising block and then these proofs 

combined to verify the size correctness of an entire circuit. This once again neatly decomposes 

a potentially large verification task into many small and easy parts. For example, for the 

xor2 gate in Figure 5.4 the width correctness theorem can be stated and proved easily: 

∀ a b c. Width xor2spec a b c ≤ Width xor2gen a b c 

⇒ ∀ a b c. if(a, 1, 0) ≤ 1 

⇒ 1 ≤ 1 ∧ 0 ≤ 1 ⇒ True 

5.3 Optimal Distributed Specialisation 

Reviewing the effectiveness of our self-specialising blocks on a real circuit we discover that it 

is necessary to address some shortcomings with the basic approach. 

5.3.1 Specialising a Ripple Adder 

We will demonstrate distributed specialisation in Quartz using a simple ripple adder circuit. 

This is composed of full adder cells as shown in Figure 5.6 and described in Quartz as shown 

in Figure 5.7. The full adder is implemented using two xor gates and a multiplexer, which is 

described separately using two 2-input and gates, an or and an inverter. This description is 

not designed for any particular target architecture however, and what matters is the effect 

on this circuit of specialising it. 

As can be seen in Figure 5.7 the full-adder block accepts only hardware wires for inputs cin 

ans


block fadd ((‘a a, wire b), wire cin) ∼ (wire cout, wire ans) { 

wire xored ab. 

(a,b) ; xor2 ; xored ab at (0, height((a, cin) ; mux2 xored ab ;cout)). 

(cin, xored ab) ;xor2 ; ans at (width((a,b) ; xor2 ; xored ab), height((a, cin) ; 

mux2 xored ab ;cout)). 

(a, cin) ; mux2 xored ab ;cout at (0,0). 

} 

Figure 5.7: Quartz description for a full adder 

Resource Standard A=111 A=100 A=001 

xor gates 6 3 3 3 

and gates 6 3 3 3 

inverters 2 6 4 4 

or gates 3 3 3 3 

Total gates 17 15 13 13 

Saving - 12% 29% 29% 

Total transistors 106 72 68 68 

Saving - 32% 35% 35% 

Table 5.1: Using distributed specialisation to specialise a ripple adder 

and b but input a is more flexible and, since it is connected to overloaded self-specialising 

blocks, can be of either wire or bool type. This allows the a input to be specialised by 

supplying a static value. 

The full adder blocks can be combined together using the col combinator to create a ripple 

adder. We can then use the Quartz compiler to synthesise a netlist for this design with either 

two dynamic inputs or one dynamic and one statically specialised input. Table 5.1 compares 

3-bit versions of the standard ripple adder and three specialised versions. 

The first point to observe from these results is that the distributed specialisation has produced 

a significant saving in on-chip resources, though more so in terms of transistors by converting 

complex functions into simpler ones than by eliminating logic entirely. However, it is also 

obvious that this result is not optimal - for example, adding “100” should be implementable 

as a single full-adder with all other blocks reduced to wiring, an arrangement that only 

requires 6 gates using this fadd block. 

The reason our distributed specialisation system produces such poor results is the Quartz 

typing system and can be seen in the code describing distributed specialisation of the and2 

block, or the or2 block shown in Figure 5.8. These two blocks either connect their output to


block or2 (bool a, wire b) ∼ (wire c) attributes { width = 0. height = 0. } 

→ if a { c = true. } else { c = b. } . 

block or2 (wire a, bool b) ∼ (wire c) 

→ (b, a) ; or2 ; c. 

Figure 5.8: Distributed specialisation for an or2 block 

the dynamic input or statically assign it to some value, depending on the value of the static 

input. 

For or2, c is connected to b if a is false, and connected to ground otherwise. Because c is 

connected to a wire in one branch of the conditional it must itself have a wire type since it 

is not possible to assign wire values to any other type. The assignment c = true used in 

the other branch uses the overloaded signal connection operator, which allows static boolean 

values to be assigned to wires. 

This block correctly specialises itself, however it does not allow a propagation of the constant 

value. If a is true, the more optimal solution is to produce a boolean output c assigned with 

the value true. This would then allow blocks connected to the c output of or2 to properly 

specialise themselves, whereas with this block description other blocks assume that the value 

of c is unknown. 

5.3.2 Modified Type System 

To achieve proper propagation of the constant, c would need to be typed as a boolean if a 

is true and as a wire otherwise. This is impossible in a statically typed language like Quartz 

where types are determined by an inference process prior to the program executing. 

This problem also occurs at the combinator level. When the col combinator is used to 

connect together multiple fadd blocks it requires that the fadd block has a type of the form 

(α, β) ∼ (β, γ) - that is that the top and bottom connections must have the same type so they 

can be connected together. This means that it is not possible for the carry signal moving up 

the column to alternate between bool and wire types, depending on whether the carry value is 

known. Nor is it possible to take account of the fact that, in our ripple adder implementation, 

the initial carry-in input is always zero and use this to simplify the first full adder.


Modifications are necessary to the type system to provide this kind of support. One possibility 

is to provide a full system of dependent types, where the type of a signal could depend on a 

variable value. This complicates type inference - making it undecidable - and would require 

the designer to write complex type declarations. 

A better alternative is to achieve the same power by introducing the much simpler construct 

of enumerated types. These are standard constructs in many programming languages, which 

allow programmers to specify their own types using type constructors. Functional program- 

ming languages usually provide easy-to-use support for recursive types and these are used to 

define recursive data structures such as linked lists and trees. We do not require recursive 

types – merely the ability for a value that can have multiple interpretations to have a simple 

type. 

We could define a data type which could be declared by (in pseudo-code): 

type data = 

Known of bool 

Wire of wire. 

Values of type data would then be used in circuit descriptions, rather than wire or bool, and 

the specific value would be extracted by the block itself. Figure 5.9 illustrates what an or2 

block that used this mechanism could look like. Because static and dynamic values now have 

the same type we are no longer able to use the overloading mechanism to select between 

instances and instead this or2 block contains code to generate the correct output regardless 

of whether zero, one or both inputs are known. This block with specialising code can however 

be overloaded with the hardware primitive with type wire wire ∼ wire and the type system 

can determine which block to instantiate. 

In this description “Wire” and “Known” are used as both an access function, to retrieve the 

wire value attached to the data inputs a and b and as a constructor, and a constructor, to 

be pattern matched. It may be that there is a more appropriate syntax, however it is the 

concept that matters. 

This system can be used to achieve optimal constant propagation results and verification is 

still relatively easy using a suitable model in an automatic theorem prover.


block or2 (data a, data b) ∼ (data c) { 

if (a = Known a2) { 

if a2 { c = Known true. }else { c = b. } . 

} else if (b = Known b2) { 

if b2 { c = Known true. }else { c = Known false. }. 

} else 

(Wire a, Wire b) ; or2 ; (Wire c). 

} . 

} 

Figure 5.9: Distributed specialisation for an or2 block with a better type system 

5.4 High Level Specialisation 

Primitive-level specialisation with clever components that can eliminate individual gates is 

a useful process however it is not a total solution to all possible specialisation requirements. 

A higher level approach to specialisation, writing specialisation code for larger blocks such 

as library elements, also has a significant role to play. 

An important consideration is that constant folding, while one of the most useful optimi- 

sations to carry out when specialising circuits with an aim to reduce their area or improve 

performance, is not the only specialisation procedure we might wish to apply. There are 

many reasons we might wish to specialise a design, for example: 

1. To eliminate unnecessary logic in order to free space on the device for other function- 

ality, or to reduce the circuit’s power consumption. 

2. To increase the maximum clock frequency and run the circuit at a higher speed. 

3. To eliminate unnecessary computation from the critical path of a pipelined design, 

reducing the overall latency. 

4. To free space, allowing it to be used to further parallelise the computation. 

Items 3 & 4 are particularly interesting. If the initial stages of a pipelined computation could 

be eliminated by pre-computation of some of the inputs then the resulting circuit’s latency 

could be reduced. If the circuit is required to have a specific latency then this “latency slack” 

can be used to introduce additional pipelining in the later stages. This could then allow the 

design to run at a higher clock frequency overall, or the design could be run at the same


Buffer/Queue 

R 

(a) General circuit 

R 

Buffer/Queue 

(b) Specialised 

R 

Buffer/Queue 

Additional Control Logic 

R R 

Additional Control Logic 

(c) Parallelised 

Figure 5.10: Space freed by specialisation can be used to further parallelise a circuit 

clock frequency but hopefully with reduced power consumption due to the reduction in glitch 

propagation [85]. 

Alternatively, if the logic resources required to carry out a computation can be reduced 

but the space allocated on the device remains the same then the freed space can be used 

for accelerating the computation in other ways. It could even be used to duplicate the 

computational unit, increasing throughput if tasks are switched between processors rather 

than queueing for a single processor, as illustrated in Figure 5.10. In this diagram, the dotted 

box indicates the logic area allocated to the computation, which can be used to implement 

either a general processor, a single specialised processor and some unused logic, or multiple 

specialised processors with additional control. 

This kind of specialisation is not mere constant propagation and requires a higher-level of 

designer involvement. Circuit designers can program library blocks to exhibit this kind of 

specialisation behaviour, using Quartz conditionals extended with block size constructs to 

identify the size of specialised components. This means that a block to implement the system 

in Figure 5.10 could generate any number of additional copies of the computational block 

depending on the ratio between the size of Rgen and Rspec. 

High-level specialisation can also be used to perform constant propagation on a macro-level, 

eliminating the need to process the specialisation individually for each primitive block. This 

means that the specialisation process can be run more quickly, important for dynamic spe- 

cialisation applications where it is necessary to produce a new circuit quickly at run-time. 

High-level and primitive-level specialisation can be combined in descriptions so that large 

contiguous collections of blocks can be eliminated at the high level while individual blocks in


irregular positions can be eliminated using primitive-level specialisation. 

High-level specialisation can fit our definition of distributed specialising - blocks operating 

independently without centralised control. In some cases it may be desirable to provide 

explicit control over the kind of specialisation engaged in and this can be done by adding extra 

parameters. The Quartz overloading mechanism can be used to overload a parameterised 

block with a non-parameterised one which instantiates the self-specialising block with a 

default set of parameters – the same method we used in Section 3.6 to give blocks multiple 

layout interpretations. 

5.5 Specialising a Multiplier 

We will demonstrate high level specialisation with a simple example: a parallel multiplier 

circuit. Since one of the main advantages of specialising designs in Quartz rather than using 

synthesis tool optimisations is that we are able to specialise and compact placed designs 

we will take this opportunity to evaluate whether any performance benefit is gained from 

compaction. 

5.5.1 Parallel Multiplier Implementation 

Before we can specialise a multiplier circuit it is necessary to describe a multiplier circuit in 

Quartz. Since we are interested in evaluating the real performance of the specialised circuit, 

we will design a multiplier for a real FPGA architecture - Xilinx Virtex-II. 

A parallel multiplier operates using a shift-add methodology that is similar to the way binary 

multiplication is performed on paper. A multiplier performing the operation x × y can 

be described as a grid-shaped circuit with x values flowing vertically and y values flowing 

horizontally. Each functional cell must perform a the multiplication operation for one bit 

of x and one bit of y, producing sum and carry outputs in a similar way to a ripple adder, 

except that these outputs will be connected to additional processing cells with only the final 

stage producing an output. 

The Virtex-II architecture contains specific components within each half-slice to implement


a fast carry chain designed to support the generation of fast adders and multipliers. Each 

half-slice also contains a mult and component which allows two functional cells implementing 

multiplication to be described within a single slice. Figure 5.11 illustrates how a half-slice 

can be configured to form part of the functional cell of a multiplier. 

At first glance this is an inefficient way of implementing the functional cell, since the x · y 

logical and operation is computed twice. However this is not actually the case since the area 

within the dotted boundary is implemented using the slice look-up table. The performance 

and area required by the look-up table is independent of the actual logic function it is used 

to implement and the intermediate x · y signal does not actually exist, so can not be used as 

an input to the top multiplexer. 

The lower and gate is the slice mult and component which is specifically available for carrying 

out this operation and can only be connected to the lower two inputs of the look-up table. 

The second exclusive-or operation and the multiplexer are also available already as dedicated 

devices within the slice so do not require any additional resources that are not already in 

existence. 

This slice logic can be combined with a wiring arrangement to form a cell suitable for com- 

position into a grid. Figure 5.12(a) shows the wiring within a multiplier cell and how it is 

connected to the slice circuitry. The ACCin and ACCout signals provide a diagonal connec- 

tion between the SUMout output of the cell to the left and the Qin input of the cell above. 

X and Y signals are routed through the cell as well as being connected to the slice logic and 

the output signals are connected to the cells above and to the right. Figure 5.13 shows the 

Quartz description of the multiplier cell. 

This cell design can be composed into a grid, describing a multiplier with a y input on the 

left and x input on the bottom. The multiplication results are output on the right side and 

the top side. When multiplying an n bit number by an m bit number the lower n bits of 

the result will be output on the right and the upper m bits will be available in carry-save 

representation on the top connections. An additional adder circuit must be connected to the 

top connections to produce a full m+n bit output. Figure 5.14 shows the Quartz description 

for an n-bit by n-bit multiplier, where only the first n bits of the output are utilised. This 

circuit is similar to one derived formally using the T-Ruby system [68], although ours is


Yin 

ACCin 

Qin 

y 

x 

Figure 5.11: Virtex-II cell configuration to create a parallel multiplier 

ACCout 

Yin 

Pout 

SLICE LOGIC 

Xin 

Xout 

Qin Pin 

Xin 

(a) Standard cell design 

Yout 

SUMout 

Yin 

Pout 

ACCin 

Pin 

ACCout 

Qin 

SUMout 

(b) Specialised X=0 

Figure 5.12: Functional cells for parallel multiplier 

block multcell ((wire acc in, wire y in), (wire acc out, wire p out, wire x out)) ∼ 

((wire q in, wire p in, wire x in), (wire sum out, wire y out)) 

attributes { height = 1. width = 1. }{ 

wire xored sig. 

y out = y in. 

x out = x in. 

acc out = acc in. 

(x in, y in, q in) ; mult lut ; xored sig at (0,0). 

(p in, xored sig) ; xorcy ; sum out at (0,0). 

((x in, y in), p in) ; fst mult and ;muxcy xored sig ;p out at (0,0). 

} 

Figure 5.13: Quartz description of the multiplier cell 

Yout 

SUMout


block mult (int n) (wire y[n], wire x[n]) ∼ (wire z[n]) { 

wire zeros[n]. 

int j. 

for j = 0..n−1 { zeros[j] = false. } . 

(zeros, y) ; 

zip 2 ; rev n ; 

converse (pi1) ; 

grid (n, n, multcell) ; 

[converse (zip 3), map (n, pi1) ; rev n] ; 

((zeros, zeros, x), z) at (0,0). 

} 

Figure 5.14: Quartz description of the multiplier grid 

designed specifically for a real circuit architecture and thus has different data-flow. 

When the value of the x input is known the circuit can be specialised. When x = 0, the 

individual cell can be replaced by the arrangement shown in Figure 5.12(b). Because the x bit 

value is common to the entire column the entire column can be eliminated and replaced with 

an iterative wiring arrangement that directly connects the sum out outputs of the previous 

column to the sum out output of the current column displaced vertically by one cell. This is 

described in a spec multcol block which produces a specialised multiplier with unnecessary 

rows eliminated. 

Rather than using the grid combinator, we describe a multiplier with column-level special- 

isation using irow and this spec multcol block. spec multcol is parameterised by a boolean 

array of the bit values of static signal x and the index parameterisation of irow is used to 

extract the correct value for each column. This is just one way of describing this behaviour, 

an alternative would be to zip the boolean vector with the other column inputs and use the 

standard row combinator. 

The general and specialised multipliers can be overloaded so that the correct instance of mult 

is selected depending on whether a static or dynamic x value is specified. 

5.5.2 Results 

We expect the performance of the specialised multiplier to depend on the precise value of the 

x input, since 0s in the x value do not require any computation at all. Therefore we synthe-


Resources Standard Settings Constrained Timing 

x value Slices Diff Max freq. (Mhz) Diff Max freq. (Mhz) Diff 

3 15 -79% 162 315% 192 368% 

9 14 -80% 153 292% 170 314% 

85 26 -63% 88 126% 95 131% 

121 32 -55% 75 92% 83 102% 

128 5 -93% 114 192% 115 180% 

170 25 -65% 88 126% 97 137% 

255 71 0% 38 -4% 41 0% 

Table 5.2: Results of multiplier specialisation without compaction 

sise a relatively small multiplier circuit, for two 8-bit inputs, so that different performance 

behaviour can be explored without requiring too large a number of different input values to 

be evaluated. We connect registers to the inputs and output so that the maximum clock 

frequency of the design can be evaluated. With this design, a general multiplier requires 71 

slices on the device. We generate two versions of the design on the device - one with the 

standard Xilinx tool settings and a second with a timing constraint to (hopefully) generate a 

better routed circuit. The standard design can run at a maximum clock frequency of 39Mhz, 

while the version synthesised with the timing constraint can run up to 41Mhz. 

Table 5.2 shows the results for specialising the multiplier for various x values with the descrip- 

tion configured to prevent compaction of the design. Compaction is prevented by manually 

specifying a size expression for the spec multcol block which does not contain a conditional 

and is the same size regardless of whether the column is specialised away. The “Diff” columns 

indicate the percentage difference between the values for the specialised multiplier and for 

the general multiplier, standard or with the timing constraint as appropriate. 

There are clearly significant speed and area savings that can be made by replacing the 

general multiplier by constant coefficient versions. To understand the results it is necessary 

to understand how the different values impact the structure of the generated circuit, as 

shown in Figure 5.15. Examining these diagrams it is clear that the multiplier for x = 255 is 

virtually identical to the full multiplier - although it does differ in the important respect that 

it does not require input pins for its x input. It is interesting to note that the Xilinx software 

actually initially generates a design which runs slower than the full multiplier, although this 

effect is eliminated when the timing constraint is introduced. With the timing constraint 

the constant-coefficient version does actually run 0.08% faster, however this effect is pretty


y 

y 

y 

y 

x 0 

x 0 

x 0 

x 1 x 2 x3 x 4 x 5 x 6 x7 

x 

(a) Full multiplier 

x 3 

x 

(c) x = 9 

x 3 x 4 x 5 x 6 

x 

(e) x = 121 

x 1 x 3 x 5 x 7 

x 

(g) x = 170 

ans 

ans 

ans 

ans 

y 

y 

y 

y 

x 0 

x 0 

x 0 

x 1 

x 

(b) x = 3 

x 2 x 4 x 6 

x 

(d) x = 85 

x 

(f) x = 128 

x 7 

x 1 x 2 x3 x 4 x 5 x 6 x7 

x 

(h) x = 255 

Figure 5.15: Comparing the full multiplier with specialised constant co-efficient multipliers 

ans 

ans 

ans 

ans


Resources Standard Settings Constrained Timing 

x value Slices Diff Max freq. (Mhz) Diff Max freq. (Mhz) Diff 

3 15 0% 162 0% 192 0% 

9 14 0% 161 5% 170 0% 

85 26 0% 86 -2% 99 4% 

121 32 0% 76 1% 83


with the uncompacted design the freed logic is dispersed throughout the multiplier circuit. 

This example demonstrates that compaction can improve performance and we would expect 

the performance gain to be much larger for larger circuits where there is more potential for 

compaction, even in a larger multiplier circuit. 

5.6 Summary 

In this chapter we have demonstrated how the Quartz layout infrastructure can be used to 

create specialised versions of designs by optimising for particular static inputs. 

We have presented the mechanism of distributed specialisation and demonstrated how this 

can be used to specialise a ripple adder using the Quartz overloading mechanism. We have 

also highlighted the capabilities required to implement an optimal distributed specialisa- 

tion system. We have shown how distributed specialisation lends itself to clear and simple 

verification in a way that could be easily automated using a theorem prover. 

One of the advantages of performing specialisation at the Quartz level is that we are able to 

achieve compaction of placed designs. We have demonstrated that this can lead to increased 

performance for a specialised multiplication circuit.

Chapter 6 

Layout Case Studies 

In this chapter we demonstrate the use of our layout framework by describing some full 

circuits and comparing the performance and compilation times for versions with and without 

placement. Section 6.1 outlines our basic approach to collecting results. In Section 6.2 we 

describe pipelined and unpipelined binary trees of ripple adders. Section 6.3 gives the Quartz 

design and results for a simple median filter, while Section 6.4 describes a butterfly network 

of 2-sorters and introduces the low-level register pipelining combinator. Section 6.5 describes 

and analyses a binomial filter circuit. Section 6.6 introduces a new class of n-dimensional 

combinators and shows how the 3D version can be used to describe a matrix multiplier 

with an implicit 2D layout interpretation. Section 6.7 evaluates our results and Section 6.8 

summarises this chapter. 

6.1 Approach 

In this chapter we describe a variety of different circuits with layout information, verify their 

layouts and evaluate the performance of the resulting circuit with and without the placement 

constraints. Designs were synthesised for a Xilinx Virtex-II FPGA so that the required logic 

resources, maximum operating frequency and power consumption could be measured. 

Designs are expressed in Quartz and compiled using the Quartz compiler with layout gen- 

eration (Chapter 3) into Pebble 5. The Pebble 5 compiler [66] is then used to produce 

130

CHAPTER 6. LAYOUT CASE STUDIES 131 

a flattened, placed netlist in VHDL format. This flattened VHDL instantiates architecture 

primitives and is enclosed in a hand-coded VHDL testbench for synthesis using the Xilinx ISE 

software. The testbench has been used to carry out the following functions as appropriate: 

1. Clock division. Where the power consumption of circuits has been measured a simple 

clock divider circuit has been used to ensure that all circuits are run within their 

maximum clock frequency. 

2. Generating input data. Linear Feedback Shift Register Counters [3] are used to provide 

pseudo-random input data to designs when necessary. 

3. Provision of suitable interface registers. 

4. XORing of outputs to a single chip pin. 

Outputs were XORed together when it was desired to investigate the power consumption of 

the circuitry on a chip to the minimise the influence of I/O power. It is necessary to connect 

a single output pin to prevent the entire circuit being optimised away by the synthesis tools. 

Power consumption has been measured from a Celoxica RC200 development board, equipped 

with an XC2V1000 FPGA. This is a complex development board and the board power con- 

sumption dwarfs that of the FPGA itself, thus we have measured the quiescent power of 

the board (when the FPGA is programmed to be empty) and deducted this from our power 

measurements. Power consumption itself was measured by monitoring the current drawn by 

the board at its operating voltage of 12V. 

We use four metrics to evaluate our designs: 

1. Maximum clock frequency. This is an important measure of the performance of the 

generated circuit, indicating how fast it can be run and thus how fast it can process 

data. We would hope that manually placed designs would have higher maximum clock 

frequencies than automatically placed ones, however previous results have shown that 

this is not always the case [77]. While placed designs often outperform unplaced designs 

very significantly, some types of design do not. Maximum clock frequency is not the 

only characteristic of circuits we are interested in and a placed design may still be 

preferable to an unplaced one if it outperforms on one of our other metrics.


2. Power consumption. For some variants of circuits we have been able to measure the 

relative power consumption of placed and unplaced designs and compare them. Once 

again we would hope that manually placed designs would have lower power consump- 

tion, however this may differ from circuit to circuit. 

3. Place and route time. The time taken to place and route a circuit is an important 

part of the overall hardware compilation time, particularly for dynamic specialisation 

applications where it is necessary to generate circuits very quickly. This is measured 

by the Xilinx synthesis tools running on an Intel dual Pentium 4 Xeon 2.6Ghz PC with 

4GB of RAM. 

4. Logic area. The logic area used by circuits will be measured as the number of slices 

required on the Virtex-II. 

6.2 Adder Tree 

The simplest circuits we analyse are pipelined and unpipelined binary adder trees. 

6.2.1 Ripple Adder 

Binary ripple adders can be laid out quite densely on the Virtex-II architecture, using a single 

slice to implement the addition (and pipeline delay, if desirable) of two bits. This is because 

the architecture of each slice (Figure 2.2, page 11) is such that two full-adder circuits can be 

implemented in a single slice, using both function generators and the carry chain. 

The Virtex slice architecture contains specialised carry logic designed to create fast carry 

chains. A Virtex full adder can be built by using the 4-input look-up table as an xor function 

which is then connected to the muxcy carry multiplexer to generate the carry out signal and 

xorcy logic to generate the sum result signal (which can then be registered if desired). This 

arrangement is depicted in Figure 6.1 and the Quartz code which generates this arrangement 

is shown in Figure 6.2. This description exploits the geometric interpretation of Quartz block 

domains and ranges to use the cout signal in the domain to indicate that it is connected to 

the top side of the block and cin in the range to indicate it is on the bottom of the block.


b 

a 

cout 

Figure 6.1: Circuit diagram of the full adder block 

block fadd ((wire a, wire b), wire cout) ∼ (wire cin, wire ans) 


wire xored ab. 

(a,b) ; xor2 ; xored ab at (0,0). 

(cin, xored ab) ;xorcy ; ans at (0,0). 

(a, cin) ; muxcy xored ab ;cout at (0,0). 

} 

Figure 6.2: A full adder within a single Virtex 2 slice 

All three primitive components within the fadd block are placed at co-ordinates (0, 0) indi- 

cating that they should be located within the same slice. The xor2 block is a wrapper for the 

Xilinx lut2 primitive which initialises the look-up table with the values necessary to produce 

an xor function. 

We have illustrated the full adder from the unpipelined ripple adder circuit, however, it is 

possible to connect a pipeline register on the output ans signal in a series composition with 

the xorcy block, all within the same slice. A full n-bit ripple adder can be formed using the 

column combinator: (a, b) ; zip 2 ; π1 −1 ; coln fadd ; (cin, ans). 

The verification of the fadd layout is a particular issue. Since the smallest meaningful element 

in our layout framework is a block of size 1 ×1 (half a slice, for Virtex-II) and our framework 

assumes a homogeneous grid of resources, it is not possible to verify the layout within the full 

adder block. Instead, we reason about circuits using the full adder at the level of individual 

full adders, assuming that fadd itself is correct. The xor2, xorcy and muxcy blocks (and fd 

flip-flop primitives, if desired) are given a size of 0 × 0 to indicate that many can be packed 

within a slice. 

cin 

ans


(a) “Tree” (b) Vertical (c) Horizontal 

Figure 6.3: Different ways of laying out binary trees 

This does not prevent incorrect packing of computational resources into slices, however this 

error is a simple one to avoid. Precisely what might be legal or not legal is architecture 

dependent and thus not desirable to implement within our general framework. Since a 

slice/sub-slice “primitive” such as the full adder is not parameterised its design is gener- 

ally either obviously correct or incorrect and in any event synthesis tools will always be able 

to generate an error message if the logic mapping is invalid. 

6.2.2 Possible Tree Layouts 

In Chapter 4 we illustrated how the Quartz system could describe a H-tree layout. This is just 

one of the possible ways of laying out a binary tree of components and we will experiment with 

a simple horizontal layout (Figure 6.3(c) 1 ). We choose this layout because, with the vertical 

arrangement of ripple adders using the fast carry chain circuitry, it can be laid out extremely 

densely. In the horizontal layout the two sub-trees are laid out to each side of the root node. 

We also experimented with a “traditional” tree layout, laid out with spaces between nodes 

(Figure 6.3(a)) and a compaction of the tree into a single column (Figure 6.3(b)) however 

1 The horizontal tree arrangement is shown without wiring, due to its density.


block btree (int p, block R (‘a, ‘a) ∼ ‘a) (‘a i [m]) ∼ (‘a o) { 

const m = 2 ∗∗ p. ‘a t1, t2. ‘a i1[m/2], i2[m/2]. 

assert (p >= 0) ”btree p


Slices Util. t-PAR (s) Max freq. (Mhz) 

Unpipelined/Auto 777 15% 48 61.5 

Unpipelined/Placed 406 7% 19 69.7 

Pipelined/Auto 777 15% 28 206.0 

Pipelined/Placed 406 7% 26 152.0 

Table 6.1: Results for a single adder tree 

R using the timeless pre-condition and split into two repeated anti-delays within the parallel 

composition using the property that D is polymorphic: 

half 2 n ; [btreen(R ; D) ; D −n , btreen(R ; D) ; D −n ] ; R 

The induction hypothesis can then be used to complete the proof. 

Using this combinator we generate a 6-level tree of 8-bit ripple adders, producing a circuit 

which adds together 64 input values. The manual placement is compared with the identical 

circuit compiled without placement and using the Xilinx placement algorithm. 

6.2.3 Results 

Table 6.1 shows the results for placed and unplaced pipelined and unpipelined adder trees. 

“Util” is the percentage of resources utilised on the device, “t-PAR” is the amount of time 

required to place and route the circuit. As expected, pipelining increases the maximum clock 

frequency significantly (although far from the predicted theoretical maximum of ×8). It is 

also interesting to note that the manually placed design has worse performance than the 

automatically placed version when the circuit is pipelined, even though the manually placed 

version has been mapped into fewer Virtex slices. 

We also experimented with placing multiple adder trees on the FPGA. Table 6.2 illustrates 

the results for an FPGA loaded with 7 of the adder trees. The difference in the resources used 

by the placed and unplaced descriptions is very significant, and possibly partially responsible 

for the fact that the placed version now exhibits significantly higher performance than the 

unplaced version regardless of pipelining. 

The difference in the number of slices used is quite interesting. It implies that the process 

of packing primitives into slices automatically does so much less densely than the manual


Slices Util. t-PAR (s) Max freq. (Mhz) Pwr (mW) 

Unpipelined/Auto 4872 95% 225 53.3 - 

Unpipelined/Placed 1907 37% 19 65.5 - 

Pipelined/Auto 4872 95% 142 123.0 1404 

Pipelined/Placed 1908 37% 40 150.9 852 

Table 6.2: Results for 7 adder trees 

method. This appears to be a result of the Xilinx algorithm only packing, as a first preference, 

“related” logic into the same slice. Thus, while the manually specified layout tends to use 

both function generators in a slice, the automatic one prefers to use only one. This may 

allow the Xilinx router to perform better and could explain why the automatically placed 

single pipelined adder tree example requires more FPGA resources than the placed version 

but still runs faster. 

We measure the power consumption of the pipelined variants. Running at the same clock 

frequency, the placed design consumes substantially less power (39% less dynamic power, 

once the quiescent consumption of the development board is subtracted) than the unplaced 

design, though it is unclear whether this is the result of the design using fewer logic resources 

or of better routing. 

6.3 Median Filter 

Median filters are a special case of ranked order filtering. The median filtering operation is 

widely used in digital image processing to remove noise and in a variety of other applications. 

Our circuit will be restricted to one dimensional filtering, although the extension to a two 

dimensional filter is not difficult. 

A 1-dimensional median filtering operation involves “sliding” a filter window along a range 

of values and selecting the median value from the elements currently within the window. 

This can be achieved by sorting the elements and selecting the middle value - obviously the 

window size must always be an odd number so that there is a middle element to select. In 

our circuit the elements within the current window are stored and each cycle a new value 

is inserted while the oldest is discarded. Since only one element differs between different 

window positions we do not need to implement a full sorter but can simplify the circuitry to


new value 

delays 

insert 

previous state 

locater 

compactor 

next state 

midelem 

median value 

Figure 6.5: Block diagram for the median filter 

simply insert a value into the correct position in an already sorted list. 

6.3.1 Circuit Design 

Our design is loosely based on a state-machine based design previously described in Ruby [26], 

however our realisation differs substantially and, we would suggest, is made much simpler by 

using the full features of Quartz rather than basic Ruby relations 2 . Figure 6.5 shows the basic 

architecture of our filter, with several blocks that operate on a current state and produce a 

next state. This is essentially the state transition and output logic of a state machine and 

this circuit can be composed with appropriate registers using the loop combinator. 

The two inputs to this filter core are the previous state (a set of sorted values) and a single 

new value. This new value is inserted into the sorted list by the insert block which implements 

one stage of an insertion sort. If the state contains n elements then the output of the insert 

block contains n + 1 elements, one of which (the oldest) must now be removed. 

As well as being connected to the insert block the new value is also fed into an n-element 

shift register, which is used to determine the value to remove to make the new state. The 

locater carries this out, matching the value from n cycles ago with the values in the state 

until the first match is found. 

The locater is composed of a row of lct cells, which are shown in Figure 6.6(a). These cells 

2 The Ruby design was however refined into a bit-level version, while we will concentrate on the word-level 

circuit.


s 

a a2 

f 

eq 

or2 

d 

f2 

(a) lct cell (b) del cell in 

through mode 

(c) del cell in 

compact mode 

Figure 6.6: Cells making up various filter blocks 

output an array of boolean values (d) which control the operation of the compactor block. 

The value a is the value the locater is “looking for”, s is one of the elements in the state, f 

is a boolean indicating whether the value has been located yet. Each lct cell compares its 

state value with the a value and outputs d and f2 values which control the compactor. 

The compactor is made up of a row of cells which use multiplexers controlled by the d output 

of the corresponding lct cell. This mode signal configures the dell cell block into one of two 

configurations: through mode (Figure 6.6(b)) and compact mode (Figure 6.6(c)). 

The locater configures all compactor cells to the right of the first detected match to the 

correct value (the same value could be within the current state multiple times) into through 

mode. This has the effect of routing the detected value to the right, while the other values are 

routed straight through. The value that appears on the rightmost output of the compactor 

is discarded. The compactor outputs the correct n elements of the next state. 

The midelem block is a simple wiring block which outputs the median value by extracting 

the middle value from the sorted list. 

6.3.2 Layout 

We arrange the insert, locater and compactor blocks vertically roughly as shown in Figure 6.5. 

lct cell is formed from an equality checker, composed of a column of and3 gates on top of 

an or2 gate. del cell is implemented by two multiplexers arranged in adjacent columns. 

The insert insertion-sort block is built from a row of min2 and max2 sorters, with each 

bit implemented as a comparator function unit and another function unit operating as a


Block/Theory Theorems Proof Details 

mux 4 Add mux lut def to simplification set 

mux ff 4 Add mux lut def to simplification set 

max2 6 Expand mux def, mux lut def, rephrase a tactical 

min2 6 Identical to max2 

insert 3 Expand compositions 

midelem 2 Automatic, assertion premise removed 

insert median 3 Expand compositions 

del cell 5 Manual intersection theorem 

compactor 3 Expand compositions 

eq 4 Containment proof requires additional lemma 

lct cell 5 Fully automatic 

locater 3 Fully automatic 

nextstate 5 Fully automatic 

filter core 3 Expand compositions 

filter 3 Fully automatic 

multiplexer. 

Table 6.3: Statistics for median filter layout correctness proof 

The median filter design description, shown in Appendix D.1, is a good example of how the 

Quartz layout system is easy to use and allows the mixing of different styles of expressing 

placement. Size inference is used for all blocks, and since we do not define any combinators 

that we intend to re-use absolute co-ordinates can be used to simplify placement. 

The verification of such a layout description is important, since it could easily contain simple 

errors and we have undertaken this using our framework. Table 6.3 shows statistics and 

information on some of the proofs conducted during the verification of the median filter 

layout (also shown in Appendix D.1). 

Generally the verification of this circuit layout is easy: although considerable tweaking of the 

proof scripts are required for some blocks, the nature of such changes was always obvious 

and simple to carry out. Once again the inability of the classical reasoner to appropriately 

decompose series and parallel compositions is responsible for several required interventions, 

although once these compositions are handled using low-level reasoning tools the automated 

tools complete the proof. 

The complete proof scripts for this design take 6 min 5s to run. This is a surprisingly long 

time, mostly accounted for by the max2 and min2 blocks, however it is still well within a 

reasonable time frame.


shift registers 

insert 

locater 

compactor 

Figure 6.7: Median filter realised on a Virtex-II 


Default synthesis tool configuration 

Automatic 247 4% 26 32.2 

Placed 147 2% 19 37.7 

30ns timing constraint 

Automatic 247 4% 27 37.2 

Placed 147 2% 19 37.1 


Automatic 247 4% 71 42.5 

Placed 147 2% 54 41.1 

Table 6.4: Results for a median filter with 8-bit data values and a window size of 5 

Figure 6.7 shows the logic resources used when a median filter is realised on a Virtex-II chip. 

The relative layout of the different components of the circuit can be clearly seen. 

6.3.3 Results 

We synthesise two different sizes of the median filter for a Virtex-II chip. Results for the 

smaller version, with 8-bit data values and a window size of 5, are shown in Table 6.4. 

We compared the results for three different configurations of the Xilinx synthesis tools. The 

place & route tool can be configured with a desired timing constraint on the resulting circuit, 

which the tool will attempt to meet. It can be seen that this does have a significant effect



Default synthesis tool configuration 

Automatic 2401 47% 116 2.95 

Placed 1317 26% 73 5.38 


Automatic 2401 47% 127 4.36 

Placed 1317 26% 62 5.35 


Automatic 2401 47% 158 4.82 

Placed 1317 26% 91 6.04 

Table 6.5: Results for a median filter with 32-bit data values and a window size of 11 

on the maximum clock frequency, though less so for the manually placed design. The time 

taken by the place & route process increases substantially as the timing constraint is made 

more stringent, although the manually placed version is processed quicker for all constraint 

settings. 

The performance of the automatically and manually placed versions are similar. However, 

both circuits use only a tiny proportion of the available logic resources and we have previously 

observed that automatic place & route seems to have more of an advantage at low utilisations. 

Table 6.5 gives the results for a larger median filter design, with 32-bit data values and a 

window size of 11. This design uses significantly more of the chip and the placed design has 

the clear advantage over the automatically placed version, in terms of both place & route 

time and maximum clock frequency. The clock frequency advantage is greatest when no 

timing constraint is specified, where the placed version is 82% faster, however even when the 

difference is minimised for the 200ns timing constraint the placed version is still 22% faster 

- and takes half the time to process. 

We do not record the power consumption of the median filter circuit because it runs at too 

slow a clock frequency to make this worthwhile. We could pipeline the design if there was a 

desire to increase its operating frequency. 

6.4 Butterfly Network 

Butterfly circuits are characterised by their intensive wiring patterns. Such networks are 

commonly used in applications such as computing a Fast Fourier Transform.


Figure 6.8: A butterfly network of degree 4 (2 4 = 16 inputs) 

block butterfly (int n, block R (‘a, ‘a) ∼ (‘a, ‘a)) (‘a l [m]) ∼ (‘a r[m]) { 

const m = 2 ∗∗ n. 

l ; rcomp (n, 

riffle (m/2) ; 

pair (m/2) ; 

map (m/2, vecpair ; R ; converse (vecpair)) ; 

converse (pair (m/2)) 

) ; r. 

} 

6.4.1 Butterfly Combinator 

Figure 6.9: Quartz butterfly combinator 

A butterfly network is an arrangement of functional blocks as shown in Figure 6.8. The 

network is characterised by repeated instantiations of the same, or similar functional blocks, 

with a particular wiring arrangement between them. There are a number of different ways 

of describing butterfly networks in Quartz and one of the simplest is shown in Figure 6.9. 

This combinator uses repeated composition to connect together multiple instantiations of 

the the functional block R with a wiring arrangement described by riffle and pair. The pair 

block converts a one dimensional vector into a two dimensional vector of pairs. The riffle 

operation splits a vector into two halves and then combines them in an interleaved fashion, 

an iterative version of this wiring block is given in Figure 6.10. 

The vecpair block converts a two element vector into a pair (tuple) of elements and its 

converse performs the opposite operation. The Quartz vecpair block is defined referencing


block riffle (int n) (‘a x[2∗n]) ∼ (‘a y[2∗n]) 


int i. 

for i = 0..n∗2−1 { 

if (i mod 2 == 0) {y[i] = x[i/2]. } else { y[i ] = x[n+i/2]. } . 

} . 

} 

Figure 6.10: Iterative riffle operation 

block spacer (int w, int h) (‘a i) ∼ (‘a o) 

attributes { width = w. height = h. }→ i = o. 

block spacer (int w, int h) (‘a i1, ‘b i2) ∼ (‘b o1, ‘a o2) 

attributes { width = w. height = h. }→ (o1, o2) = (i2, i1). 

Figure 6.11: Two sided and four sided spacer blocks 

explicit vector indexes however it can also be defined using append blocks: 

vecpair = apl2 −1 ; snd [−] −1 = apr2 −1 ; fst [−] −1 

The proof of this relationship is easy. 

The butterfly combinator can be instantiated with any R block to produce a variety of 

different butterfly networks. Its structure has a clear layout interpretation imparted by the 

combinator blocks it utilises: map will arrange the R blocks of each stage of the butterfly 

vertically and each stage will be laid out horizontally next to the previous stage by the rcomp 

block. 

This is a very dense arrangement and it is possible that for some architectures there may 

be insufficient routing resources available to route the complex wiring network between each 

stage of the butterfly. To avoid this problem the butterfly combinator can have a spacer block 

added to the description of each stage. A spacer is a block which is functionally identical 

to the identity block but is defined to have a non-zero size, it can thus be used to produce 

empty space in designs. 

Figure 6.11 illustrates two spacer blocks for use in two-sided and four-sided circuit arrange- 

ments. They are declared as instances of an overloaded spacer identifier which are selected 

between depending on their type. For the butterfly combinator the spacer component can


gr_lut 

gr_lut 

gr_lut 

gr_lut 

gr_lut 

gr_lut 

eq_lut 

eq_lut 

eq_lut 

eq_lut 

eq_lut 

eq_lut 

mux mux 

mux mux 

mux mux 

mux mux 

mux mux 

mux mux 

Figure 6.12: 6-bit 2-sorter circuit 

be placed anywhere within the rcomp parameter composition since the type is polymorphic, 

however the logical place to put the spacer in order to put room between each butterfly stage 

is next to the map instantiation - it can then be given the desired width and any height (less 

than the expected height of the map instantiation) and the series composition layout will 

ensure that this space is left free. 

6.4.2 Implementing a bitonic merger 

The butterfly circuit we evaluate is a network of 2-sorters. This is a bitonic merger circuit 

which merges together two sorted lists. The merger is bitonic because the order of the input 

lists must be opposed – i.e. if one is ascending then the other must be descending or vice 

versa. 

We design a 2-sorter circuit which operates on n-bit data values and lay it out as a 4 × n 

block as shown in Figure 6.12. The first two columns are a comparator which outputs a 

control signal to the multiplexers to select the maximum and minimum values. 

The butterfly sorting network can be pipelined by inserting registers between each stage, 

replacing the R block by R ; D We can state the correctness of a pipelining arrangement 

with the following theorem: 

Theorem 24 R ; D = D ; R ⇒ butterfly n R = butterfly n (R ; D) ; D −n 

Proof This requires a lemma about repeated series composition:


block register (wire clk) (block R ‘a ∼ (wire)) (‘a i) ∼ (wire o) 

attributes { width = 1. height = 1. } 

{ 

wire o2. 

assert (height(i ; R ; o2)



4-bit data 





6-bit data 





8-bit data 





10-bit data 

Unpipelined/Auto - >100% - - - 


Pipelined/Auto - > 100% - - - 


the relationship: 

Table 6.6: Results for 2 64-input bitonic merger circuits 

Theorem 26 R : α ∼ wire ⇒ R ; D = register R 

6.4.3 Results 

We generate a bitonic merger circuit that merges two lists of 32 8-bit numbers pipelined and 

unpipelined for four different bit widths. Two of these circuits are placed on the Virtex-II 

chip. 

Figure 6.6 shows the results for these circuits. The placed version takes up significantly 

fewer logic resources on the device than the unplaced version. In fact, for the 10-bit data the 

unplaced circuit the synthesis tools report that the design can not actually be mapped onto 

the device, while the placed version uses less than 70% of the available resources. 

The picture for our other metrics is less clear, although manual placement clearly produces 

significant improvements in maximum clock frequency for unpipelined designs the same is 

not always the case for the pipelined versions. The same mixed picture is observed with


+ 

+ + + 

D D D D 

Figure 6.14: A 4-stage binomial filter 

power consumption figures, with the manually placed designs consuming 20% less power for 

the 8-bit variant but 43% more power for the 4-bit variant. 

Once again it appears that the level of device utilisation heavily influences the effectiveness 

of manual placement, with the automatic placement producing better results for designs that 

only utilise a small part of the device while for larger circuits manual placement has the 

advantage. 

6.5 Binomial Filter 

A binomial filter is a simple digital signal processing circuit that we can easily describe using 

our framework. 

6.5.1 Circuit Design 

An n-stage binomial filter is composed of n adders and delay elements arranged as shown in 

Figure 6.14. We could implement this circuit using the ripple adder from Section 6.2 and 

implement a word-level pipeline by placing additional registers between each stage. 

We will however use an alternative implementation which allows us to pipeline the design at 

the bit level. This involves abandoning the Virtex carry chain circuitry, which is designed 

to implement fast, un-pipelined, carry chains. We describe a new full adder component with 

size 1 × 2 that uses two function generators to generate the sum and carry-out outputs given 

the three input signals for each stage. Carry signals can still propagate vertically through a 

column of these full adders and the sum output can be connected to the input of the next 

stage. We use the fork wiring block to copy the output of the previous stage and the fst 

combinator to map registers along only one input to the next adder. These registers have 

size 1 ×1 and so in order to place them correctly aligned with the full adders which have size


block lift (int n, block R ‘a ∼ ‘b) (‘a i) ∼ (‘b o) { 

i ; R ; o at (0, n). 

} 

Figure 6.15: The lift combinator can be used to place blocks at a certain y-offset 


24-bit data 





26-bit data 





28-bit data 





32-bit data 





Table 6.7: Results for a 2 binomial filter circuits 

1 × 2 we use the lift combinator, shown in Figure 6.15. lift instantiates a block at a certain 

y-offset, leaving empty space underneath it. 

6.5.2 Results 

We compile a 32-stage binomial filter for several different bitwidths and implement two on the 

Virtex-II device. We evaluate versions without any pipelining and with bit-level pipelining, 

with no input/output synchronisation. 

The results for the variants of this circuit, shown in Figure 6.7, are particularly interesting. 

Firstly, this is the first circuit where the manual placement has not always led to a denser 

logic mapping than the automatic algorithms, with several manually placed versions actually 

requiring marginally more slices than their automatically placed equivalents. This only ap-


plies for the unpipelined circuits and is not surprising because we have deliberately specified 

a less-dense packing for this circuit, using the lift block to align registers with the adders and 

thus using only one out of each two slice flip-flops in these columns. In the unpipelined cir- 

cuits we are also not utilising the in-slice flip-flops of the adder columns which could be used 

to implement the coefficient delay. If we had specified a denser packing for the unpipelined 

circuit then we would use b × n fewer slices (where b is the number of bits and n the number 

of stages) – saving over 1000 slices for the 32-bit, 32 stage circuit. 

The manually placed designs have better maximum clock frequencies than the automatically 

placed ones when there is no pipelining and significantly worse performance when pipelined. 

For example, the placed 32-bit filter runs 14% faster than the unplaced version when un- 

pipelined, but 30% slower when pipelined. 

The clock frequency result is important, since it indicates that simulated annealing has 

achieved better results for the pipelined circuit regardless of the circuit size. The difference 

between the placed and unplaced circuits does nonetheless differ depending on the circuit 

size, with the placed circuit 35% slower for 24-bit data but 30% slower for the larger 32-bit 

data variant. 

This result is consistent with previous research [77] which indicated that placed adder circuits 

not employing the carry chain were outperformed by circuits placed using simulated anneal- 

ing. It shows that without the vertical placement constraint enforced by use of the carry 

chain simulated annealing can place cells where it likes and find high speed paths between 

cells which humans would probably not have considered. In the absence of other constraints, 

simulated annealing can therefore find irregular layouts which are better than what a human 

would consider reasonable. 

Interestingly although for the pipelined circuits the manually placed variants have lower 

maximum clock frequencies, when run at the same clock frequency they consume between 

1.5% and 13.1% less power. This indicates that manual placement has a role to play even 

in circuits such as this one where it leads to a decreased maximum potential performance by 

specifying a denser logic mapping and reducing circuit power consumption. If a 24-bit filter 

that can run at 50Mhz is desired then the placed circuit is clearly superior – it will compile 

quicker, use less logic area and consume significantly less power while running at that speed.


6.6 Matrix Multiplier 

Our final circuit example is a matrix multiplier circuit. Matrix multiplication is a simple 

operation that has applications in many scientific computing applications as well as branches 

of digital signal processing. The multiplication of two matrices requires a large number of 

multiplication and addition operations and there is considerable potential for parallelisation, 

making a hardware implementation attractive. 

Band matrix multipliers have previously been described as systolic grid-shaped circuits using 

Ruby [44, 68]. These descriptions were difficult to relate to a simple specification of matrix 

multiplication as a set of multiply-accumulate operations. We will describe our system in a 

clearer manner using a new combinator that describes 3D circuits. 

We are motivated to create new higher-dimensional combinators by the realisation that a 

combinator with a certain dimension is appropriate for processing data with a particular 

dimension. For example, a one-dimensional array such as a row can process one dimensional 

data, while a grid can process two dimensional data (or two one-dimensional data streams). 

However, the confusion with mapping an operation such as matrix multiplication onto a grid 

arises from the fact that the input data is two two-dimensional data sources and the output is 

two dimensional, while the grid itself only has 4 1-dimensional interface points in its domain 

and range. If we can describe higher dimensional combinators, such as cubical structures, 

then potentially we can describe circuits that operate on this kind of data much more clearly. 

An important point to note is that we are talking about multi-dimensional circuit descrip- 

tions, which are not the same as multi-dimensional circuits. Three dimensional FPGAs 

have been proposed [2, 17, 40, 51], however four dimensional and higher-dimensional FP- 

GAs pose interesting implementation difficulties. An alternative is to attempt to realise 

higher-dimensional FPGA circuitry on standard two dimensional silicon. Schmit proposes 

drawing on three and four dimensional topologies to increase the wiring density on stan- 

dard two-dimensional silicon [72]. What we propose is designing combinators which describe 

multi-dimensional circuits but which we expect to realise on two-dimensional silicon. 

Any higher-dimensional array can be flattened onto a 2D grid in a number of ways. We 

propose higher-dimensional combinators that have an implicit layout interpretation on the


domain 

(a) Domain & range convention 

range 

z 

y 

x 

(b) Dimensions 

Figure 6.16: Representing three dimensional blocks 

2D FPGA but can also be given alternative layout interpretations through manipulation 

using correctness-preserving transformations. 

6.6.1 A 3D “cube” Combinator 

The standard layout interpretation of a Quartz block is as a four sided tile, with two sides 

assigned to the domain and two to the range. The assignment of sides to domain and range 

and the division into two sides is a convention only. It is important to realise that any 

convention is sufficient providing it is consistently applied and we can bear this in mind 

when choosing a convention for visualising the three dimensional blocks which make up a 

cubical circuit. 

Figure 6.16(a) illustrates how we divide the six sides of a cube into a block’s domain and 

range. Visualised in this way the block’s top, back and left sides form the domain while the 

front, bottom and right form the range. We describe block domains and ranges as tuples of 

different dimensions, so the range is described of a tuple of (xs, ys, zs) while the range is a 

tuple of (zs, ys, xs). We use the signal xs to describe signals that are travelling along the 

x-axis, ys to mean the signals travelling along the y-axis etc. Note that we have extended 

the convention of reversing the order of the sides from the 2D case to this 3D case. In the 

n-dimensional case a block’s range dimensions should always be expressed in reverse order 

from its domain dimensions. 

Each side of the cube is itself a two dimensional array of values. This requires some convention 

of assigning dimensions, which we do as shown in Figure 6.16(b). This means that the 

domain and range signals are assigned the dimensions of: xs[z][y], ys[z][x] and zs[y][x]. This


block cube (int x, int y, int z, block R (‘a, ‘b, ‘c) ∼ (‘c, ‘b, ‘a)) 

(‘a x d[z][y], ‘b y d[z][x], ‘c z d[y][x]) ∼ 

(‘c z r[y][x], ‘b y r[z][x], ‘a x r[z][y]) { 

‘a xs[x+1][z][y]. ‘b ys[y+1][z][x]. ‘c zs[z+1][y][x]. 

int ix, iy, iz. 

} 

xs[0] = x d. ys[y] = y d. zs[z] = z d. 

x r = xs[x]. y r = ys[0]. z r = zs[0]. 

for ix = 0..x−1 { 

for iy = 0..y−1 { 

for iz = 0..z−1 { 

(xs[ix ][ iz ][ iy ], ys[iy+1][iz ][ ix ], zs[ iz+1][iy ][ ix ]) ; 

R ; 

(zs[ iz ][ iy ][ ix ], ys[iy ][ iz ][ ix ], xs[ix+1][iz ][ iy]). 

} . 

} . 

} . 

Figure 6.17: cube combinator defined iteratively with explicit internal signals 

generalises easy to the n-dimensional case where extra dimensions can simply be added. 

We can describe a cube combinator iteratively as shown in Figure 6.17 in terms of explicit 

internal signals. Multiple copies of the R block are instantiated and connected to internal 

signals xs, ys and zs that hold values flowing along the x-axis, y-axis and z-axis respectively. 

This definition uses three different internal signals rather than declaring a single vector of 

internal signals with an extra dimension, this is because the signals flowing along each axis 

can have different types - as illustrated in the type signature for the R block and the cube 

block itself. 

This interpretation of a cube also identifies the final important element of our convention – 

that the (0, 0, 0) point is located at the front, left of a cubical structure. This is reflected in 

the indexes chosen to connect to the R block in the loop. While this is a short description it 

is quite complex and has no particularly obvious layout interpretation. 

We can envisage a cube as a 1D array of 2D arrays. We exploit this to represent a cube as a 

column of grids. This kind of description immediately gives the cubical circuit a 2D layout 

interpretation – as grids laid out vertically on top of one another. Figure 6.18 illustrates the 

wiring involved in this kind of arrangement. In this example two grids are placed on top of


y_in 

x_in 

x_in 

y_in 

x_in 

x_in 

R R 

R R 

R R 

R R 

z_out 

z_in 

R 

R 

R 

R 

x_out 

x_out 

y_out 

x_out 

x_out 

y_out 

Figure 6.18: A cubical circuit can be viewed as a column of grids. 

each other, however the wiring is somewhat complex as the z dimension connections need to 

be routed through the x and y connections between the two grids. 

We can do this by “folding” the extra dimensional signals into a tuple with one of the 

“standard” grid dimensions and extracting them as needed. This can be done using the 

zip n,m language construct which converts an n-tuple of vectors into an m-dimensional vector 

of tuples. A pre-requisite for this operation is that the vectors are the same size in the 

dimensions that are being zipped and it turns out that there is only one valid way of carrying 

this out. 

Figure 6.19 shows how a cube combinator can be written in terms of col and grid. Note the 

re-arrangement of the domain tuple into a pair of a tuple of (x d, y d) and z d while the 

same has been done to the range tuple. Pairs are actually extremely powerful mechanisms - 

they allow us to split signals into a part to perform operations on and a part to ignore. The 

conversion of the 3-tuple of domain/range signals into a pair allows the use of the fst and snd 

blocks to control the applications of the zip 2 block - we will show how this transformation 

can be included in the description shortly.


block cube (int x, int y, int z, block R (‘a, ‘b, ‘c) ∼ (‘c, ‘b, ‘a)) 

(‘a x d[z][y], ‘b y d[z][x], ‘c z d[y][x]) ∼ 

(‘c z r[y][x], ‘b y r[z][x], ‘a x r[z][y]) { 

((x d, y d), z d) ; 

fst (zip 2) ; 

col (z, swap ; rsh ; fst (zip 2) ; 

grid (x, y, cube cell (x, R)) ; 

snd (converse (zip 2)) ; rsh ; fst swap ; lsh) ; 

snd (converse (zip 2)) ; 

(z r, (y r, x r)). 

} 

Figure 6.19: A Quartz description for cube as a column of grids 

block cube cell (int n, block R (‘a, ‘b, ‘c) ∼ (‘c, ‘b, ‘a)) 

((‘c z[n], ‘a x), ‘b y) ∼ (‘b y2, (‘c z2[n], ‘a x2)) { 

((x,y), z) ; 

snd (converse (apl (n−1))) ; rsh ; 

fst (tplapr 2 ; R ; converse (tplapl 2) ; swap) ; 

lsh ; snd (swap ; apr (n−1)) ; 

((y2, x2), z2). 

} 

Figure 6.20: The cube cell re-wiring block 

This cube combinator creates a grid of not R blocks but cube cell blocks. This is a wiring 

block with a description shown in Figure 6.20. It splits the full z vector that has been 

“packaged” up with the x vector and extracts the signal that is for this element, connects it 

to R and then appends the z output of R back into the z vector. The whole operation rotates 

the z vector as the value is extracted from the left and appended to the right, meaning that 

the next block will extract the correct (different) z signal and by the end of the row the 

resulting z signals are all neatly packed into the vector. Figure 6.21 illustrates this structure. 

6.6.2 Describing N-dimensional Combinators 

The description in Figure 6.20 uses two new language constructs: tplapl and tplapr. These 

are tuple append blocks that function in much the same way as apl and apr do for vectors, 

adding or removing elements from the left or right of a tuple. These must be defined as 

language constructs that are statically parameterised because otherwise they are not valid 

within the Quartz type system which requires that tuples are fixed-arity data structures. We


x_in 

y_in 

R 

x_out 

z_in z_out 

y_out 

Figure 6.21: The cube cell wiring block 

have implemented tplapl and tplapr using the Quartz compiler infrastructure for defining new 

experimental language constructs. 

The tuple-append operations do exhibit some interesting behaviour because of the way they 

interact with other aspects of the Quartz type system, in particular the typing rule which 

treats a singleton as equivalent to a single-element tuple. For example, the operation tplapl 

1 or tplapr 1 applied to a pair leaves the pair unchanged – the effect is to append the left or 

right element to the single element tuple of the other element forming a new pair within the 

original tuple, which is then eliminated because it only contains a single element (the pair). 

For tplapl the process looks a little like: 

(a, b) −→ ((a), b) −→ ((a, b)) −→ (a, b) 

The tuple-append operators allow us to decompose an n-tuple into a pair of the leftmost/right- 

most element and the rest of the tuple. Once this has been done we can use standard oper- 

ations on pairs to manipulate the tuple. They are vital in allowing us to generalise the 3D 

combinator we have developed in this section into an n-dimensional meta-combinator, nd . 

We describe this as a meta-combinator because it is not itself a valid Quartz combinator 

since it is parameterised in its number of dimensions and utilises tuples of parameterisable 

lengths, something that is not valid in the Quartz type system. Each possible instance is a 

valid combinator but must be described individually. It has been designed to use point-free 

wiring constructs in order to achieve its generality and thus could be a valid construct in the 

untyped Ruby calculus. The limiting influence of type systems in specification languages has


nd2 (i1, i2) R = coli2 (rowi1 R) 

ndn (i1, . . .,in) R = tplaprn−1 −1 ; fst (zip n−1 ) ; colin 

swap ; snd tplapln−2 −1 ; rsh ; fst (zip 2,n−2 ) ; 

tplapln−2 ; ndn−1 (i1, . . .,in−1) (ndcelln (x, R)) ; 

tplaprn−2 −1 −1 

; snd(zip2,n−2 ; swap) ; rsh ; 

 

swap ; snd tplaprn−2 ; 

snd(zip n−1 −1 ) ; tplapln−1 

ndcelln (m, R) = tplapln−2 −1 ; fstswap ; lsh ; sndswap ; rsh ; fst tplapln−2 ; 

been much discussed [38]. 

snd(aplm−1 −1 ) ; tplapl2 ; tplapr2 −1 ; 

fst (tplaprn−1 ; R ; tplaprn−1 −1 ; swap) ; 

lsh ; snd(swap ; aprm−1) ; fst tplaprn−2 −1 ; 

lsh ; sndswap ; tplaprn−2 

Figure 6.22: Description of an n-dimensional meta-combinator 

Figure 6.22 illustrates our n-dimensional combinator description. This combinator is defined 

recursively with a base case of n = 2 (grid ). For n = 3 the description of nd simplifies to 

that of cube and ndcell simplifies to the description of cube cell. 

The definition of an n-dimensional combinator description raises the tantalising possibility of 

being able to prove theorems for all higher-dimensional combinators as a single theorem. For 

example, we could prove a theorem that could totally pipeline an n-dimensional array taking 

the pipelining theorem for a grid as the base case. Alternatively we could investigate how to 

serialise [45] higher-dimensional descriptions into lower dimensional ones (e.g. converting a 

4D description into 3D, 2D, 1D equivalents with appropriate multiplexers to manipulate the 

input signals). 

Unfortunately, the descriptions of nd and ndcell themselves are complex and fiddly, even 

though most individual steps are just describing simple wiring re-arrangements. Since most 

of the commands are simple wiring they are easy to reason about, for example tplapl and tplapr 

are timeless and thus registers can be moved through them easily. The complex description 

does seem to suggest that it could be a candidate for employing mechanised theorem proving.


Matrix 1 

Empty matrix 

Result matrix 

Matrix 2 

(a) Matrix multiplier arrangement 

x_in 

mult 

add 

z_in 

y_in 

z_out 

(b) Functional cell 

Figure 6.23: Implementing a 3D matrix multiplier 

N-dimensional circuits have many potential uses in describing operations on multi-dimensional 

data and have already been discussed for some applications [42, 43]. They could potentially 

be used to provide a quick route for translating imperative for loop algorithms into hard- 

ware, extending existing work on mapping nested loop algorithms into multi-dimensional 

arrays [39]. 

In this work we shall concentrate on describing a single circuit, a matrix multiplier, using 

the three-dimensional cube combinator. 

6.6.3 A 3D Matrix Multiplier 

A cubical circuit description can be used to combine two dimensional data and generate a 

two dimensional output. This is ideal for describing matrix multiplication as we can describe 

a circuit which has data from the two source matrices moving unchanged through the array 

while the output matrix is accumulated along a different axis. Figure 6.23(a) illustrates this 

arrangement. 

The cubical circuit can be made up of cells that function as shown in Figure 6.23(b). Fig- 

ure 6.24 shows the Quartz description for the matrix multiplier. Note that the first matrix 

must be transposed in order to correctly arrange the elements of the matrix for the circuit, 

nevertheless this is an extremely simple description of multiplying a y ×z matrix and a z ×x 

matrix to produce a y × x matrix as output.


block matmultcell (int n) (wire x in[n], wire y in[n], wire z in[n]) ∼ 

(wire z out[n], wire y out[n], wire x out[n]) { 

x out = x in. 

y out = y in. 

((x in, y in), z in) ; fst (mult n) ; add n; z out at (0,0). 

} 

block matmult (int bits) (int x, int y, int z) 

(wire mat1[y][z ][ bits ], wire mat2[z][x][ bits ]) ∼ (wire mat3[y][x][ bits ]) { 

wire emptymat[y][x][bits]. 

wire mat trans[z][y][ bits ]. 

int i , j, k. 

for i = 0..y−1 { 

for j = 0..x−1 { 

for k = 0..bits−1 { emptymat[i][j ][ k] = false. } . } . } . 

mat1 ; word transpose bits (y, z) ; mat trans at (0,0). 

(mat trans, mat2, emptymat) ; 

cube (x, y, z, matmultcell bits) ; 

converse (tplapl 2) ; 

pi1 ; 

mat3 at (0,0). 

} 

Figure 6.24: Quartz description of the 3D matrix multiplier 

In the implementation of the functional cell we use the column ripple adder developed in 

Section 6.2 and the placed parallel multiplier we described in Chapter 5. 

We can pipeline this circuit by inserting registers on the z data path, since this is the 

accumulator path. We can derive a general pipelining arrangement for nd using a retiming 

theorem for a column: 

Theorem 27 coln R = fst ( ˜ 

n D) ; coln (R ; D) ; [D −n , ˜ 

n D−1 ] 

We can apply this to the n-dimensional combinator description to give a theorem for the 

one-dimensional pipelining of an n-dimensional structure: 

Theorem 28 

ndn R = tplaprn−1 −1 ; fst (zip n−1 ; ˜ 

in D) ; tplaprn−1 ; 

ndn (R ; tplapln−1 −1 ; fst D ; tplapln−1) 

tplapln−1 −1 ; [D −in , zip n−1 ; ˜ 

in D−1 ; zip n−1 −1 ] ; tplapln−1


Proof By moving the delay out of the instantiation of cube cell and exploiting the timeless 

property of wiring blocks to re-organise 3 . 

We can use a particular instance of this theorem for the cube and manipulate it so the 

resulting circuit is implementable to give: 

Theorem 29 

cubex,y,zR = [ ˜ 

z D, ˜ 

z D, id] ; cubex,y,z(R ; [D, id, id]) ; [id, 

z D, z D] 

This theorem can then be used to generate synchronisation registers for the interfaces of the 

matrix multiplier when we pipeline it. 

We undertake a layout verification of the cubical matrix multiplier using the previous cor- 

rectness proofs for col, grid, the multiplier and ripple adder. In the cube block we need 

to add further explicit type annotations to eliminate some type unknowns because the Is- 

abelle/HOL theory of zip is not sufficiently detailed to describe the full types. This indicates 

that a full theory of zip (and indeed tplapl and tplapr) is required, and not just the simple 

approximation contained in the theory Inbuilt, even though the inbuilt blocks do not affect 

layout. Otherwise proofs are mostly automatic, with some intervention required for series 

compositions and expanding the definition of word transpose. Appendix D.3 gives the full 

Quartz description and some proofs for the matrix multiplier circuit. 

6.6.4 Results 

We generate a circuit that multiples two 2 ×2 matrices together to produce a 2 ×2 matrix as 

output and evaluate two of these components on the Virtex-II. Table 6.8 shows the results 

for this circuit. Power consumption was measured at a clock frequency of 16.625Mhz. 

As can be seen the placed version is out-performed by the automatically placed version for 

both the pipelined and unpipelined variants in terms of maximum clock frequency, although 

the percentage difference is small. The power consumption figures present a confusing picture, 

3 Actually, proving that the D element can be moved out of cube cell is a difficult and fiddly proof, however 

cube cell is designed to allow this to take place so we will gloss over this at this point.



Unpipelined/Auto 1568 31% 14 23.7 276 

Unpipelined/Placed 1413 28% 12 23.0 108 



Table 6.8: Results for matrix multiplier circuit 

with no clear trends emerging. The pipelined, placed version actually consumes more power 

than the the unpipelined version, which is unexpected since previous results have shown 

pipelining reduces power consumption for many kinds of circuits [85]. However, the overall 

power consumption of the circuit is so low it is possible that any real effects are being 

overwhelmed by noise. Because of the shape of the placed matrix multiplier it is not possible 

to fit more onto the Virtex device however by increasing the pipelining (by pipelining the 

multipliers themselves for example) the design could be run and evaluated at a higher clock 

frequency. 

The placed version does place & route faster than the unplaced circuit and consumes fewer 

resources on the chip, so could still be superior in some situations where the small difference 

in maximum clock frequency is not significant. 

It is possible that an alternative layout for the cube combinator would produce better results. 

With the z signals being used for the accumulator data path there are long wires between 

the respective elements of each grid. 

6.7 Evaluation and Conclusions 

For the five example designs in this chapter we have seen a range of results for our four 

evaluation metrics: logic area, place and route time, maximum clock frequency and power 

consumption. 

At important realisation that manual placement is not an optimisation method per se but 

rather a way of exerting more control over the compilation process. The way in which that 

control is exercised determines whether the circuits generated are better or worse in some 

way than those that would have been generated automatically.


One invariant we have seen across all circuit examples is that manually placing designs 

significantly reduces the time taken for the place and route stage of the compilation process 

to execute, with the reduction ranging from 14% for the unpipelined matrix multiplier to 

92% for the unpipelined adder tree. This result is not overly surprising, since the place and 

route stage has been reduced to just routing, however it is beneficial to confirm it since it 

could have been the case that the denser placements specified by the user constraints would 

increase the routing time by more than is saved by avoiding automatic placement. 

Another fairly firm conclusion is that in almost all cases 4 manually placed designs require 

less logic area than automatically placed ones. The logic mapping specified by the manual 

constraints is denser than that used by the automatic tools and reductions in area of 40% are 

commonly achieved, with a maximum of 61% area reduction observed for the unpipelined 

adder tree. In one case the manual mapping for a butterfly circuit used less than 70% of 

the device resources while the automatic mapping and placement was unable to fit the same 

circuit onto the device. Manual placement is clearly significantly superior here. 

The effectiveness of manual placement on positively influencing maximum clock frequency 

depends on other constraints. Given a homogenous environment, simulated annealing is able 

to generate circuits with equivalent or better performance by discovering high speed routing 

paths between cells that humans would not consider sensible - for example, the placed 24-bit 

pipelined binomial filter circuit is 35% slower than the automatically placed version. However, 

when other constraints are affecting the layout simulated annealing does not perform so well 

and the regular layout constraints can produce significant performance gains. 

The kind of constraints that affect simulated annealing appear to be use of the fast carry chain 

circuitry, which forces some cells to be laid out vertically, and level of device utilisation which 

reduces the ability to the placer to find high-speed routes through less densely packed logic. 

For pipelined bitonic merger butterfly networks, the 4-bit manually placed circuit utilises 

only 28% of the device runs 14% slower than the automatically placed version - however the 

8-bit version utilises 55% of the device and runs 48% faster than the automatically placed 

version. 

Manual placement often produces better results than simulated annealing for unpipelined 

4 The exception to this we observed was the binomial filter circuit, where we deliberately specified a less 

dense logic mapping in order to achieve a better aligned layout.


circuits where the maximum clock frequency is already much lower than pipelined ones. This 

is not unexpected, since wiring delays will accumulate in the same way as logic propagation 

delays in unpipelined circuits. 

Generally, manual placement appears to lead to reduced power consumption, with reductions 

in power consumption of up to 40% possible (for the pipelined adder trees). In general power 

consumption can be reduced even if the maximum clock frequency of the placed design is 

lower than that for the automatically placed circuit. For the binomial filter power savings of 

2-13% were observed even though the placed circuits had lower maximum clock frequencies. 

In the case of the butterfly network a correlation was once again observed with device util- 

isation/circuit size - with the 4-bit circuit consuming more power when placed although an 

8-bit circuit consumed less. 

6.8 Summary 

We have demonstrated our layout framework with a variety of real circuits including a ma- 

trix multiplier described with a new type of higher-dimensional combinator, a binomial filter, 

a butterfly network and a median filter. We have demonstrated how functional reasoning 

can be used to derive pipelined versions, while the layout framework can be used to verify 

layouts. We have found that in many, though not all cases, manually placed designs outper- 

form automatically placed circuits with higher maximum operating frequencies, lower device 

utilisation, lower power consumption and a faster place and route process.

Chapter 7 

Conclusion and Future Work 

In this final chapter we review the work reported in this thesis, its contribution and potential 

for future development. 

7.1 This Thesis’ Contribution 

In this thesis we have described the design, implementation and applications of a framework 

for describing and verifying parameterised FPGA circuits with layout information. 

In Chapter 3 we describe how we can extend Quartz with additional constructs to describe 

placed circuits. We show how two functions – maxf and sum – are sufficient to describe 

the placement and size of iterative structures. We demonstrate how the size of blocks can 

be inferred and also provide a mechanism for sizes to be specified manually. We show how 

high-order Quartz descriptions with layout information can be compiled into parameterised 

hardware libraries with high-order parameters removed. 

Chapter 4 describes an infrastructure for the verification of Quartz circuit layouts using the 

Isabelle theorem prover. We give a formal semantics for Quartz descriptions in HOL and 

provide HOL interpretations of layout correctness. Using a modified Quartz compiler we are 

able to automatically generate semantic definitions for Quartz blocks and proof obligations for 

layout correctness. The compiler can also generate proof scripts using Isabelle’s simplifier and 

164

CHAPTER 7. CONCLUSION AND FUTURE WORK 165 

classical reasoner to verify the layouts of many blocks without requiring any user intervention 

at all. We demonstrate how this verification infrastructure can be applied to a range of useful 

combinators. 

In Chapter 5 we illustrate the use of our system to specialise designs when certain input values 

are known. We introduce the idea of distributed specialisation with self-specialising Quartz 

blocks and show that removing central control of HDL-level specialisation has significant 

advantages in terms of easing verification and faster processing for dynamic specialisation 

applications. We show that HDL-level specialisation of placed designs is able to achieve 

design compaction, unlike lower level approaches which operate on a fully routed circuit 

or placed netlist. We demonstrate the specialisation of a parallel multiplier and show that 

specialisation with compaction substantially increases performance and reduces the logic area 

required to implement the function. 

Chapters 6 demonstrates the use of our layout framework with several complete circuit de- 

scriptions, including a median filter and matrix multiplier. We introduce a new n-dimensional 

combinator for use in multi-dimensional circuit descriptions and show that the cubical version 

of this combinator can describe a matrix multiplication operation more clearly than existing 

combinators. We also demonstrate how Quartz combinators can be used at a low-level to 

pipeline circuits using the in-slice flip-flops of the Xilinx Virtex-II FPGA architecture. We 

show that for many circuits, manually placed designs are compiled quicker, require less logic 

area, have a higher maximum clock frequency and a lower power consumption than when 

automatically placed. 

7.2 Evaluation 

The layout generation system works well and we have succeeded in generating parameterised 

Pebble libraries from a variety of Quartz descriptions. In most cases maxf and sum functions 

are eliminated from the output in favour of conditional expressions by compiler optimisations 

– and these conditionals could themselves be eliminated by replacing them with Pebble 

conditional generation of the different alternatives if we so wished. The current version of 

the Pebble compiler does not currently support generation of parameterised VHDL so we


have been unable to implement our system for generating parameterised placed VHDL from 

Pebble, although we are confident of its design and do not foresee any difficulties with this. 

There is currently no support for generating parameterised Pebble when Quartz blocks have 

recursive size expressions. There are a number of possible solutions to this problem that we 

have discussed, though we have not implemented them. The only totally general solution 

would be to allow Pebble expressions to contain recursive functions, which is not difficult 

although it is slightly untidy. We would recommend that recursive size expressions should be 

eliminated where possible by attempting to compute the transitive closure of the recursion. 

The verification framework also appears to work well for a wide variety of descriptions, rang- 

ing from library combinators to real circuits such as the median filter described in Section 6.3. 

However, we have learnt some important lessons about the relative ease of verifying different 

kinds of Quartz structures. 

Circuits described iteratively can generally be verified relatively easily using the extensive 

range of theorems in the QuartzLayout library developed for maxf and sum. The theorems 

in the Structures theory which can be applied to easily prove intersection theorems for loops 

are particularly useful since the proof goals tend to be quite complicated but are all of the 

same basic structure. 

Verification of recursively defined Quartz blocks is less automated. The compiler’s trans- 

lation into Isabelle recdef recursive definitions is correct but the definitions are not easily 

utilised by the theorem prover. We have had much more success with defining blocks using 

primitive recursion – however this is not a completely general approach. Proofs for recursive 

size functions require induction, however generally after this the automated proof tools are 

effective at completing the proof. 

Two important lessons have been learnt from the verification of our example circuits. Firstly, 

that the size inference process produces better results than expected and the need for manual 

specification of size functions is much less than was initially presumed. Secondly, that the 

relational nature of Quartz seriously complicates layout reasoning in some cases. 

The success of the size inference process is such that for most real circuits it is likely that 

size inference will be used almost all of the time, with manual size expressions only specified


occasionally. Manual size expressions are still necessary, in order to describe the size of 

primitive blocks or blocks which should reserve more space on the FPGA than they actually 

require – for example, so that larger designs can be swapped into the same area using run- 

time reconfiguration. However, the present verification framework requires the verification 

of the validity and containment of size expressions generated from the inference algorithm 

and these proof goals could be mostly eliminated if proof of the correctness of the inference 

algorithm itself could be built into the Isabelle embedding 1 . 

That Quartz directional abstraction complicates layout reasoning is a slightly unexpected 

result since it is generally regarded as simplifying functional reasoning. The use of the 

definite description operator to define the values of internal signals using a block’s predicate 

as generated by the semantic function Bβ makes it difficult to use the actual values of these 

signals (unless they are defined very simply). It is possible that the mechanism for resolving 

Quartz directions could be coded in Isabelle and used to extract the real values however 

since the algorithm is incomplete it is difficult to see how the necessary properties could be 

established formally. 

Despite this, internal signals do not tend to cause problems most of the time because it is 

not actually necessary to resolve their value. Since Quartz is one of the few languages with 

directional abstraction this would also not be an issue if the verification framework were 

applied to another language. 

We have also shown how the layout infrastructure can be used as part of a system of dis- 

tributed self-specialising Quartz blocks to transparently specialise hardware when one or 

more input values are known at compile-time. The limitations of our current framework for 

carrying out distributed specialisation are discussed in Section 5.3 however even with these 

limitations it can be a useful tool for carrying out some simple optimisations on generated 

hardware without requiring any significant designer effort. 

HDL-level specialisation is particularly important for placed hardware libraries because it 

allows designs to be compacted as logic is eliminated from circuits. This is not something 

that can be achieved with low level specialisation of the synthesised design because placement 

constraints are only parameterised in the high-level description. 

1 It would not be possible to totally eliminate all containment proof obligations, since it still remains 

necessary to prove that no block exists to the left or below of (0, 0)


7.3 Comparison with Related Work 

To our knowledge, ours is the only work which addresses the issue of generating and verifying 

parameterised hardware libraries with explicit placement information and support for recur- 

sively and iterative described structures. However, other work has taken different approaches 

to the problem. 

7.3.1 VHDL with Explicit Co-ordinates 

VHDL and Verilog can be used with absolute placement co-ordinates specified using “RLOC” 

constraints for Xilinx architectures. VHDL does not provide any particular support for place- 

ment in particular, however it can be extended with user-defined functions to implement the 

equivalent of our maxf and sum operations. VHDL does not support higher-order functions, 

however, so these would need to be coded explicitly for each hardware arrangement. 

Nevertheless, our theorem proving verification framework could be applied to VHDL designs. 

The potential for the use of a few dozen key theorems to automate proofs is severely reduced 

by the lack of higher-order functions however theorem proving can still be used to verify 

first-order placement co-ordinates. Size functions for each VHDL entity would probably 

need to be entered manually, although they could possibly be inferred in a similar way to 

Quartz providing a suitable subset of the language is used. VHDL allows the combination 

of behavioural and structural design styles within a single description and if this was done 

then verification of the correctness of entity size functions would probably be impossible 

without incorporating a theoretical model of how behavioural descriptions are elaborated 

into structural hardware. 

7.3.2 Relative Placement in Pebble 

Pebble has been extended with support for relative placement [49]. The Pebble system uses 

new language constructs to provide below, beside, below for and beside for capabilities and 

a functional specification has been given for a procedure which compiles relative positions 

into absolute co-ordinates. The Pebble system is clean and effective at describing many


common hardware constructs, however it contains several flaws which are not present in 

our approach. Firstly, by limiting itself to “conventional” arithmetic and logical expressions 

the Pebble system can not infer conditional branches in block sizes and is forced to select 

the maximum possible size of conditionals – something which can introduce inefficiency since 

circuits may not be placed as compactly as possible. Compaction of circuits with conditionals 

can be achieved using partial evaluation, however this requires that the final design is not 

parameterised in any variables which appear in conditional expressions. 

With the Pebble system layout correctness is achieved by design, through reliance on simple 

beside and below placement and the use of block size inference. However the Pebble system 

does not meet our own, more stringent, definition of correctness – for example, it may generate 

incorrect layouts when the size of each iteration of a loop is different. The insistence on using 

relative placement is also limiting and means that pathological cases such as the irregular 

grid example we described in Section 4.7 can not be described. 

On the other hand, the Pebble system does have some advantages over our framework. Firstly, 

it is much simpler and easier to implement in other languages, such as VHDL. By limiting 

itself to basic arithmetic and boolean expressions hardware may not be laid out completely 

optimally, however the expressions it does generate are simpler than those generated by the 

Quartz system which often contain conditionals or maxf/sum functions which have not been 

optimised away by the limited number of optimisations applied by the Quartz compiler. 

Finally, by incorporating beside and below as language constructs the Pebble system can 

express n-ary placement relationships more easily than can be done with Quartz. Quartz 

combinators are blocks like any other and have a fixed arity so can only be parameterised 

by a certain number of other blocks. Combinators can be parameterised by block vectors 

however this creates an unwanted type constraint that all blocks must have the same type. 

The Pebble system is designed to support placement of iteratively described circuit descrip- 

tions and does not detail how recursive blocks should be handled. This also causes problems 

for the compilation of Quartz layouts (see Section 3.7.4) however our general infrastructure 

does support recursive size functions.


7.3.3 Ruby and Lava 

Both Ruby [26] and Lava [7] have been used to generate placed circuit descriptions using 

their higher-order combinators. Ruby and Lava combinators can describe both function and 

placement, as with Quartz. Neither the Ruby nor Lava system supports the generation of 

parameterised output, instead they generate flattened netlists. 

Ruby uses a variable-free notation of relations to describe circuits and it is thus relatively 

easy to give key combinators, such as beside and below a layout interpretation which can then 

be used to generate placed output. Ruby’s design style does not support explicit instantiation 

and signal connection and can not support explicit co-ordinates. This is a limiting factor and 

the system can not support the irregular grid example. No explicit verification infrastructure 

is available for Ruby layouts, however since the output is a flattened netlist and placement 

is limited to beside/below relationships it should be impossible to describe invalid layouts. 

Lava is a more flexible system based on Haskell. Lava provides a combinators which place 

components below each other, beside each other or at the same location. Unlike Ruby, it 

does not enforce a variable-free notation and is thus more flexible. A version of Lava has also 

been demonstrated that supports layout with explicit co-ordinates [77]. 

While Lava provides constructs to aid the construction of correct layouts (using beside and 

below), it also permits possibly invalid layouts (by placing components in the same slice 2 , 

either explicitly or using the combinator provided for this purpose). Unlike Ruby, Lava does 

therefore permit the description of invalid layouts and there is no infrastructure available for 

verifying Lava layouts however our layout verification infrastructure could be adapted to this 

purpose. 

Ruby and Lava use recursion as their only means of repetition. The languages do not support 

iterative descriptions, which can be a clearer way of describing some circuit arrangements 

than recursion (although they are not more powerful per se), although this is less relevant 

than for Quartz since they do not produce output in a format that supports iteration. 

Ruby and Lava do not support giving combinators different layout interpretations in the 

2 The version of Lava which supports placement was developed at Xilinx and is designed specifically to 

target Xilinx FPGAs.


same way as our framework. Lava does support overloading but using Haskell type classes 

which are not suitable for overloading blocks with different parameterisations in the way that 

is required for providing additional layout parameters. 

7.4 Future Work 

Several aspects of this work are particularly open ended and we will end by making a few 

recommendations for areas worthy of future investigation. 

7.4.1 Further Support For Alternative Layout Interpretations 

We have demonstrated how blocks can be given different layout interpretations and over- 

loading used to give one of these interpretations the status of a “default”. However, this 

approach still requires that two or more different layouts are explicitly coded for combinator 

blocks. 

While there will be some cases where blocks are described with completely unrelated layout 

interpretations, in most cases we expect these different interpretations to be variations on 

a theme. It is possible that these operations could be better described by vertical and 

horizontal flipping or rotation and higher-order blocks which performed these operation on 

their parameter block could provide a simpler method of achieving this result. Combinators 

with could rotate or flip blocks allow abstraction of a particular kind of layout operation and 

promote separation of concerns in the same way as higher-order combinators do. 

Such layout-manipulation combinators would need to be based on a different theoretical basis 

to our current system, where one block can not alter the internal structure of another. The 

verification of such combinators would be an interesting exercise, particularly ensuring that 

they do not invalidate a previously valid layout. 

Another useful extension would be to provide a mechanism for series and parallel compositions 

to be given multiple layout interpretations. Lava achieves this for series composition by 

providing different combinators for different series composition layouts, Quartz could take a 

similar approach but achieve it more concisely since series composition is a language-level


construct. Parallel compositions or series compositions could potentially be annotated with 

explicit co-ordinates, or left to follow their default layout interpretation, as appropriate. This 

reduces the link between functional and non-functional (layout) description which, while it 

is often desirable, can sometimes be limiting – for example, when placing binary trees the 

ideal functional description is not a particularly good layout. 

7.4.2 Less User Interaction In Proofs 

One issue with our proof environment is the handling of recursive size functions using Is- 

abelle’s recdef construct. At present our approach is to manually convert the automatically 

generated definitions into primitive recursion and it would definitely be possible to automate 

this process, however it is only valid for a (large) subset of Quartz blocks. It should be pos- 

sible to get general size functions working by proving appropriate congruence rules to direct 

the automatic recdef termination proofs however recdef is a closed box and it will probably 

be necessary to closely examine and possibly change the Isabelle/HOL source code to find 

and correct the problems. 

While we achieved extremely good levels of automation, there appears to be potential to 

make some considerable improvements. Firstly, we would advocate further investigation into 

how the layout theorems regarding series and parallel composition can be better utilised by 

the automatic proof tools. There appears to be no reason why these theorems should need 

to be applied manually and many blocks that use composition could be proved much more 

easily if compositions were decomposed by the theorem prover correctly. 

We would also suggest that more experimentation with the ideal configuration of rule sets 

for the classical reasoner and simplifier to maximise effectiveness. We have generated scripts 

which specify specific rule sets for each theorem proof in the Quartz compiler but a better 

mechanism would be to specify these rule sets as defaults and prove theorems using simply 

“auto”. This is not as easy as it sounds since we are presently generating different scripts 

for different types of theorems and some sort of compromise solution would need to be found 

that performed as well on all types of theorem as the current proof-specific method. 

Another avenue that might be worth investigating is to combine multiple logics and provers


to achieve better automation. While Quartz combinators are usually high-order blocks and 

require a high-order formalism to verify their layouts, Quartz circuits tend to be parame- 

terised purely by integer or boolean parameters. As such, it is possible that Quartz libraries 

could be verified in higher-order logic and these proofs could be in some way be treated as 

axiomatic in proofs for a whole circuit using a different prover with a different formalism. 

The ACL2 theorem prover [34] is known for supporting very high levels of automation but 

proves theorems in the first-order Boyer-Moore logic [9], it is possible that a combination 

of Isabelle and ACL2 could produce better results than Isabelle alone. A major practical 

difficulty that would need to be overcome here is ensuring the soundness of the interaction 

between the two different logics. 

7.4.3 Integrating Layout and Functional Verification 

In this work we have developed a shallow embedding of Quartz designed specifically to enable 

the verification of design layouts. It seems slightly paradoxical to maintain two different 

verification systems, one for functionality and one for layout, when the two could potentially 

be combined into a single embedding within a theorem prover. 

To support full functional reasoning (some limited reasoning about functional properties is 

already possible) QuartzLayout would need to be extended with a timing model to allow the 

data values on wires to be properly modelled in synchronous circuits. It is likely that a deep 

embedding of Quartz, rather than a shallow semantic embedding, would be the best way to 

combine functional and layout verification in a single environment. We have already laid the 

foundations for the definition of such an embedding by defining a formal semantics of Quartz 

in HOL and this function could be translated into an Isabelle implementation to provide a 

meaning function for the deep embedding. 

7.4.4 Run-time Reconfiguration 

In Chapter 5 we demonstrated the ability of our layout framework to support the special- 

isation of designs. Dynamic specialisation of designs at run-time is potentially a highly 

worthwhile activity, with performance gains outweighing the time required to reconfigure a


chip in operation. 

We have showed that distributed specialisation makes the process of verifying the specialisa- 

tion of designs easier than lower-level approaches [80] and we have suggested that it should 

be quicker, though we have not quantitatively evaluated the execution speed of a HDL-level 

distributed specialisation process. 

Distributed specialisation in Quartz is an area that is definitely worthy of further investi- 

gation. In particular the addition of particular constructs to to the language to support 

run-time reconfiguration directly, as have been demonstrated for Pebble [15], could be in- 

vestigated. This would support a higher-level interface to run-time reconfiguration than the 

current capabilities using virtual multiplexer blocks [47]. 

7.4.5 Properties of N-Dimensional Combinators 

The new class of n-dimensional combinators we introduced in Section 6.6 show the potential 

to substantially simplify the description of certain kinds of circuits, particularly those that 

manipulate multiple multi-dimensional data sources. Initial investigations also indicate that 

they could be used to clearly describe the translation of some classes of imperative function 

descriptions based on nested loops into hardware. 

The definition of an n-dimensional combinator suggests that it should be possible to prove 

useful theorems such as retiming and serialisation for all n-dimensional combinators at once, 

even though the description of the fully general case is substantially more complicated that 

the combinator for any particular dimension. Theorems that are valid for this entire class of 

combinators could be extremely useful since, even if complicated to prove, they would only 

need to be proved once. 

It is also worth investigating the different ways of mapping n-dimensional descriptions into 

two-dimensional FPGA hardware and, if there are multiple good ways of doing this, under 

what situations each is optimal.

Bibliography 

[1] A. Aggoun and N. Beldiceanu. Extending CHIP in order to solve complex scheduling 

and placement problems. J. Mathematical and Computer Modelling, 17(7):57–73, 1993. 

[2] M. J. Alexander, J. P. Cohoon, J. L. Colflesh, J. Karro, and G. Robins. Three- 

dimensional field-programmable gate arrays. In Proc. IEEE Intl. ASIC Conf., 1995. 

[3] P. Alfke. Efficient Shift Registers, LFSR Counters and Long Pseudo-Random Sequence 

Generators, Xilinx Application Note 052, July 1996. 

[4] J. M. Arnold. S5: The architecture and development flow of a software configurable 

processor. In Proc FPT’05: IEEE Conf. Field Programmable Technology, to appear. 

[5] D. Basin, H. Kuruma, K. Takaragi, and B. Wolff. Verification of a signature architecture 

with HOL-Z. In Formal Methods 2005, volume 3582 of LNCS, pages 269–285. Springer- 

Verlag, 2005. 

[6] N. Beldiceanu and M. Carlsson. Sweep as a generic pruning technique applied to the non- 

overlapping rectangles constraint. In T. Walsh, editor, Proc. Constraint Programming 

2001, volume 2239 of LNCS, pages 377–391. Springer-Verlag, 2001. 

[7] P. Bjesse, K. Claessen, M. Sheeran, and S. Singh. Lava: hardware design in Haskell. 

In ICFP ’98: Proc. 3rd ACM SIGPLAN Intl. Conf. on Functional programming, pages 

174–184. ACM Press, 1998. 

[8] R. Boulton, A. Gordon, M. Gordon, J. Harrison, J. Herbert, and J. V. Tassel. Experience 

with embedding hardware description languages in HOL. In V. Stavridou, T. F. Melham, 

and R. T. Boute, editors, IFIP TC10/WG 10.2 Intl. Conf. on Theorem Provers in Cir- 

175

BIBLIOGRAPHY 176 

cuit Design: Theory, Practice and Experience, pages 129–156. North-Holland/Elsevier, 

June 1992. 

[9] R. S. Boyer and J. S. Moore. A Computational Logic. Academic Press, NY, 1979. 

[10] B. Brock, M. Kaufmann, and J. S. Moore. ACL2 theorems about commercial micro- 

processors. In M. K. Srivas and A. J. Camilleri, editors, FMCAD’96: First International 

Conference on Formal Methods in Computer Aided Design, volume 1166 of LNCS, pages 

275–293, Palo Alto, California, USA, 1996. Springer-Verlag. 

[11] R. E. Bryant. Symbolic boolean manipulation with ordered binary-decision diagrams. 

ACM Computing Surveys, 24(3):293–318, 1992. 

[12] Celoxica. Handel-C Lanuage Reference Manual, 2001. 

[13] J. Cong and Y. Ding. FlowMap: An optimal technology mapping algorithm for delay 

optimization in lookup-table based FPGA designs. IEEE Trans. Computer Aided Design 

of Integrated Circuits and Systems, 13(1):1–12, 1994. 

[14] L. Damas and R. Milner. Principal type-schemes for functional programs. In POPL ’82: 

Proc. 9th ACM Symp. on Principles of Programming Languages, pages 207–212. ACM 

Press, 1982. 

[15] A. Derbyshire and W. Luk. Compiling run-time parametrisable designs. In Proc. 

FPT’02: IEEE Intl. Conf. on Field-Programmable Technology, pages 44–51, December 

2002. 

[16] E. W. Dijkstra. Selected Writings on Computing: A Personal Perspective. Springer- 

Verlag, 1982. 

[17] H. Fan, J. Liu, and Y.-L. Wu. General models for optimum arbitrary-dimension FPGA 

switch box designs. In Proc. IEEE/ACM Conf. CAD (ICCAD), pages 93–98, 2000. 

[18] A. C. J. Fox. Formal specification and verification of ARM6. In D. Basin and B. Wolff, 

editors, TPHOLS’03: 16th International Conference on Theorem Proving in Higher 

Order Logics, volume 2758 of LNCS. Springer-Verlag, 2003.


[19] H. Gelernter. Realization of a geometry-theorem proving machine. In J. Siekmann and 

G. Wrightson, editors, Automation of Reasoning: Classical Papers on Computational 

Logic 1957–1966, volume 1, pages 99–124. Springer-Verlag, 1983. 

[20] H. Gelernter, J. R. Hansen, and D. W. Loveland. Empirical explorations of the geometry- 

theorem proving machine. In J. Siekmann and G. Wrightson, editors, Automation of Rea- 

soning: Classical Papers on Computational Logic 1957–1966, pages 140–150. Springer- 

Verlag, 1983. 

[21] M. Gordon, R. Milner, and C. Wadsworth. Edinburgh LCF: a mechanised logic of 

computation, volume 78 of LNCS. Springer-Verlag, 1979. 

[22] M. J. C. Gordon. HOL: A proof generating system for higher-order logic. In G. Birtwistle 

and P. Subrahmanyam, editors, VLSI Specification, Verification and Synthesis, pages 

73–128. Kluwer, 1988. 

[23] J. Greene, E. Hamdy, and S. Beal. Antifuse field programmable gate arrays. Proceedings 

of the IEEE, 81(7):1042–1056, 1993. 

[24] S. Guo and W. Luk. Compiling Ruby into FPGAs. In W. Moore and W. Luk, editors, 

Proc. FPL’95: Field Programmable Logic and Applications, volume 975 of LNCS, pages 

188–197. Springer-Verlag, 1995. 

[25] S. Guo and W. Luk. Producing design diagrams from declarative descriptions. In 

S. Yand, J. Zhou, and C. Li, editors, Proc. 4th Intl. Conf. on CAD/CG, pages 1084– 

1093. SPIE, 1995. 

[26] S. Guo and W. Luk. An integrated system for developing regular array designs. J. 

Systems Architecture, 47(3-4):315–337, 2001. 

[27] R. Hindley. The principal type scheme of an object in combinatory logic. Trans. Amer. 

Math. Soc., 146:29–60, 1969. 

[28] J. Hughes. Why functional programming matters. Computer Journal, 32(2):98–107, 

1989. 

[29] M. Huth and M. Ryan. Logic in Computer Science. Cambridge University Press, 2004.


[30] IEEE. IEEE standard VHDL language reference manual, IEEE Std 1076-1987, March 

1988. 

[31] G. Jones and M. Sheeran. Circuit design in Ruby. In J. Staunstrup, editor, Formal 

Methods for VLSI Design, pages 13–70. North-Holland/Elsevier, 1990. 

[32] G. Jones and M. Sheeran. Relations and refinement in circuit design. In C. Morgan and 

J. Woodcock, editors, 3rd Refinement Workshop, Springer Workshops in Computing. 

Springer-Verlag, 1991. 

[33] J. J. Joyce. Formal verification and implementation of a microprocessor. In G. Birtwistle 

and P. A. Subrahmanyam, editors, VLSI Specification, Verification and Synthesis. 

Kluwer, 1988. 

[34] M. Kaufmann and J. S. Moore. An industrial strength theorem prover for a logic based 

on Common Lisp. IEEE Transactions on Software Engineering, 23(4):203–213, April 

1997. 

[35] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. 

Science, 220(4598):671–680, 1983. 

[36] A. Krukowski and I. Kale. Simulink/Matlab-to-VHDL route for full custom/FPGA rapid 

prototyping of DSP algorithms. In Proc. DSP’99: Matlab DSP Conference, Tampere, 

Finland, November 1999. 

[37] C. Kulkarni, G. Brebner, and G. Schelle. Mapping a domain specific language to a 

platform FPGA. In Proc. DAC’04: 41st Design Automation Conference, pages 924– 

927, 2004. 

[38] L. Lamport and L. C. Paulson. Should your specification language be typed? ACM 

Trans. Program. Lang. Syst., 21(3):502–526, 1999. 

[39] P.-Z. Lee and Z. M. Kedem. Mapping nested loop algorithms into multidimensional 

systolic arrays. IEEE Trans. Parallel and Distributed Systems, 1(1):64–76, 1990. 

[40] M. Lesser, W. M. Meleis, M. M. Vai, S. Chiricescu, W. Xu, and P. M. Zavracky. Rothko: 

A three-dimensional FPGA. IEEE Design and Test of Computers, 15(1):16–23, 1998.


[41] Y. Li and M. Leeser. HML, a novel hardware description language and its translation 

to VHDL. IEEE Trans. on Very Large Scale Integration Systems, 8(1):1–8, 2000. 

[42] H. Lim and E. E. Swartzlander. Multidimensional systolic arrays for the implementation 

of discrete fourier transforms. IEEE Trans. Signal Processing, 47(5):1359–1370, May 

1999. 

[43] N. Ling and M. A. Bayoumi. The design and implementation of multidimensional systolic 

arrays for DSP applications. In Proc. ICASSP-89: Intl. Conf. on Acoustics, Speech, and 

Signal Processing, pages 1142–1145. IEEE, 1989. 

[44] W. Luk. Systolic band-matrix multipliers. Electronics Letters, 26(6):403–405, March 

1990. 

[45] W. Luk. Systematic serialisation of array-based architectures. Integration, the VLSI 

Journal, 14(3):333–360, February 1993. 

[46] W. Luk and S. McKeever. Pebble: a language for parameterised and reconfigurable 

hardware design. In R. W. Hartenstein and A. Keevallik, editors, Proc. FPL’98: Field- 

Programmable Logic and Applications, volume 1482 of LNCS, pages 9–18. Springer- 

Verlag, 1998. 

[47] W. Luk, N. Shirazi, and P. Y. K. Cheung. Compilation tools for run-time reconfig- 

urable designs. In Proc. FCCM’97: 5th IEEE Symp. on Field-Programmable Custom 

Computing Machines, pages 56–65. IEEE Computer Society, 1997. 

[48] S. McKeever and W. Luk. Towards provably-correct hardware compilation tools based 

on pass separation techniques. In Proc. CHARME ’01: 11th Conf. on Correct Hardware 

Design and Verification Methods, volume 2144 of LNCS, pages 212–227, London, UK, 

2001. Springer-Verlag. 

[49] S. McKeever, W. Luk, and A. Derbyshire. Compiling hardware descriptions with relative 

placement information for parameterised libraries. In M. Aagaard and J. O’Leary, edi- 

tors, Proc. FMCAD 2002: 4th Intl. Conf. Formal Methods in Computer-Aided Design, 

volume 2517 of LNCS, pages 342–359. Springer-Verlag, 2002.


[50] S. McKeever, W. Luk, and A. Derbyshire. Towards verifying parametrised hardware li- 

braries with relative placement information. In Proc. HICSS ’03: 36th Hawaii Intl. Conf. 

on System Sciences, page 10, Washington, DC, USA, 2003. IEEE Computer Society. 

[51] W. M. Meleis, M. Leeser, P. Zavracky, and M. M. Vai. Architectural design of a three 

dimensional FPGA. In Proc. 17th Conf. Advanced Research in VLSI (ARVLSI), pages 

256–268, September 1997. 

[52] T. Melham. Higher Order Logic and Hardware Verification. Cambridge Tracts in The- 

oretical Computer Science. Cambridge University Press, 1993. 

[53] R. Milner. A theory of type polymorphism in programming. J. Comput. Syst. Sci., 

17:348–375, 1978. 

[54] A. Mycroft and R. Sharp. Higher-level techniques for hardware description and synthesis. 

International Journal on Software tools for Technology Transfer, 4(3):271–297, May 

2003. 

[55] T. Nipkow, L. C. Paulson, and M. Wenzel. Isabelle/HOL: A Proof Assistant for Higher- 

Order Logic, volume 2283 of LNCS. Springer-Verlag, 2002. 

[56] S.-W. Ong, N. Kerkiz, B. Srijanto, C. Tan, M. Langston, D. Newport, and D. Bouldin. 

Automatic mapping of multiple applications to multiple adaptive computing systems. 

In Proc. FCCM’01: Proc. 9th IEEE Symp. Field-Programmable Custom Computing 

Machines, pages 10–20. IEEE, 2001. 

[57] J. Ou and V. K. Prasanna. Parameterized and energy efficient adaptive beamforming on 

FPGAs using MATLAB/Simulink. In Proc. ICASSP’04: IEEE Intl. Conf. Acoustics, 

Speech, and Signal Processing, volume 5, pages 181–184, 2004. 

[58] J. Ou and V. K. Prasanna. Pygen: a MATLAB/Simulink based tool for synthesizing 

parameterized and energy efficient designs using FPGAs. In Proc. FCCM 2004: 12th 

IEEE Symp. Field-Programmable Custom Computing Machines, pages 47–56. IEEE, 

April 2004. 

[59] S. Owre, J. M. Rushby, and N. Shankar. PVS: A prototype verification system. In 

D. Kapur, editor, 11th Intl. Conf. on Automated Deduction (CADE), volume 607 of 

LNAI, pages 748–752, Saratoga, NY, June 1992. Springer-Verlag.


[60] L. C. Paulson. The foundation of a generic theorem prover. J. Automated Reasoning, 

5:363–397, 1989. 

[61] L. C. Paulson. Isabelle: A Generic Theorem Prover, volume 828 of LNCS. Springer- 

Verlag, 1994. 

[62] O. Pell. Quartz: A new language for hardware description. Final Year Project Report, 

Dept of Computing, Imperial College, June 2004. 

[63] O. Pell. Quartz compilation algorithms. ISO Dissertation, Dept of Computing, Imperial 

College, January 2005. 

[64] O. Pell and W. Luk. Resolving Quartz overloading. In D. Borrione and W. Paul, edi- 

tors, Proc. CHARME’05: 13th Conference on Correct Hardware Design and Verification 

Methods, volume 3725 of LNCS, pages 380–383. Springer-Verlag, 2005. 

[65] O. Pell and W. Luk. Quartz: A framework for correct and efficient reconfigurable design. 

In Proc. RECONFIG’05: Intl. Conf. on Reconfigurable Computing and FPGAs. IEEE 

Computer Society Press, September 2005, to appear. 

[66] O. Pell and H. Yu. User’s tutorial for Pebble 5.0. https://cc.doc.ic.ac.uk/local/ 

pebble/doc/users/. 

[67] F. P. Preparata and M. I. Shamos. Computational Geometry: An Introduction. Springer- 

Verlag, 1985. 

[68] O. Rasmussen. Transformational VLSI Design. PhD thesis, Technical University of 

Denmark, 1997. 

[69] T. Riesgo, Y. Torroja, and E. de la Torre. Design methodologies based on hardware 

description languages. IEEE Trans. Industrial Electronics, 46(1):3–12, February 1999. 

[70] J. A. Robinson. A machine-oriented logic based on the resolution principle. J. ACM, 

12(1):23–41, 1965. 

[71] P. Rudnicki. An overview of the Mizar project. In Proc. 1992 Workshop on Types for 

Proofs and Programs, Chalmers University of Technology, Bastad, 1992.


[72] H. Schmit. Extra-dimensional island-style FPGAs. In Proc. FPL 2003: Field Program- 

mable Logic and Applications, pages 406–415, 2003. 

[73] C.-J. H. Seger and R. E. Bryant. Formal verification by symbolic evaluation of partially- 

ordered trajectories. Formal Methods in Systems Design, 6:147–189, 1994. 

[74] R. Sharp and O. Rasmussen. The T-Ruby design system. Formal Methods in System 

Design, 11(3):239–264, 1997. 

[75] M. Sheeran. Finding regularity: Describing and analysing circuits that are not quite 

regular. In Proc. CHARME’03: Correct Hardware Design and Verification Methods, 

volume 2860 of LNCS, pages 4–18. Springer-Verlag, 2003. 

[76] M. Sheeran. Generating fast multipliers using clever circuits. In A. J. Hu and A. K. 

Martin, editors, Proc. FMCAD 2004: 5th Intl. Conf. Formal Methods in Computer- 

Aided Design, volume 3312 of LNCS, pages 6–20. Springer-Verlag, 2004. 

[77] S. Singh. Death of the RLOC? In FCCM’00: Proc. 8th IEEE Symp. on Field- 

Programmable Custom Computing Machines, page 145, Washington, DC, USA, 2000. 

IEEE Computer Society. 

[78] S. Singh and P. James-Roxby. Lava and JBits: From HDL to bitstream in seconds. In 

Proc. FCCM’01: 9th IEEE Symp. Field-Programmable Custom Computing Machines, 

2001. 

[79] M. K. Srivas and S. P. Miller. Applying formal verification to the AAMP5 microproces- 

sor: A case study in the industrial use of formal methods. Formal Methods in System 

Design, 8(2):153–188, 1996. 

[80] K. W. Susanto and T. Melham. Formally analyzed dynamic synthesis of hardware. J. 

Supercomputing, 19(1):7–22, 2001. 

[81] D. E. Thomes and P. Moorby. The Verilog Hardware Description Language. Kluwer 

Academic, 3rd edition edition, 1996. 

[82] T. Todman and W. Luk. Combining imperative and declarative hardware descriptions. 

In Proc. HICSS ’03: 36th Hawaii Intl. Conf. on System Sciences, pages 280–289. IEEE 

Computer Society, 2003.


[83] J. Voeten. On the fundamental limitations of transformational design. ACM Transac- 

tions on Design Automation of Electronic Systems, 6(4):533–552, 2001. 

[84] P. Wadler and S. Blott. How to make ad-hoc polymorphism less ad hoc. In POPL ’89: 

Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of program- 

ming languages, pages 60–76. ACM Press, 1989. 

[85] S. J. E. Wilton, S.-S. Ang, and W. Luk. The impact of pipelining on energy per operation 

in field-programmable gate arrays. In Proc. FPL’04: Intl. Conf. on Field-Programmable 

Logic, volume 3203 of LNCS, pages 719–728, Antwerp, Belgium, August 2004. Springer- 

Verlag. 

[86] Xilinx, Inc. Virtex-II Platform FPGAs: Complete Data Sheet, March 2005. DS031.

Appendix A 

Quartz Language Grammar 

This appendix contains the grammar for the Quartz language with placement constructs, in 

extended Backus-Naur form (EBNF). 

Terminals are set in typewriter font, while non-terminals are set between angled brackets. 

Round brackets are used to indicate grouping. 

The asterisk symbol (*) indicates zero-or-more repetition. The plus symbol (+) indicates 

once-or-more repetition. The question mark symbol (?) indicates zero-or-once repetition. 

〈design〉 ::= 〈blockdef 〉* 

〈blockdef 〉 ::= block 〈id〉 〈domain〉 ~ 〈range〉 { 〈dec〉* 〈stmt〉* } 

〈domain〉 ::= 〈io〉 〈io〉* 

〈range〉 ::= 〈io〉 

| block 〈id〉 〈domain〉 ~ 〈range〉 -> 〈singlestmt〉 

〈io〉 ::= ( 〈io tuple elt〉 (, 〈io tuple elt〉 )* ) 

〈io tuple elt〉 ::= 〈dir〉? 〈basictype〉 〈id〉 〈vecindex 〉* 

〈dir〉 ::= in 

| 〈dir〉? 〈block〉 〈id〉 〈blocksig〉 〈vecindex 〉* 

| ( 〈io tuple elt〉 (, 〈io tuple elt〉 )* ) 

| out 

184

APPENDIX A. QUARTZ LANGUAGE GRAMMAR 185 

| ^〈id〉 

| ^〈id〉* 

〈basictype〉 ::= int 

| bool 

| wire 

| ‘〈id〉 

〈vecindex 〉 ::= ([〈expr〉])* 

〈blocksig〉 ::= 〈sig elt〉 ~ 〈sig elt〉 

〈sig elt〉 ::= 〈dir〉? 〈basictype〉 〈vecindex 〉* 

| 〈dir〉? block 〈blocksig〉 〈vecindex 〉* 

| ( 〈sig elt〉 (, 〈sig elt〉)* ) 

〈dec〉 ::= 〈basictype〉 〈id〉 〈vecindex 〉 (, 〈id〉 〈vecindex 〉)* . 

〈singlestmt〉 ::= 〈blkref 〉 . 

| const 〈id〉 = 〈expr〉 (, 〈id〉 = 〈expr〉)* . 

| 〈stmt〉 

〈stmt〉 ::= 〈blkinst〉 . 

| for 〈id〉 = 〈expr〉..〈expr〉 { 〈stmt〉* } . 

| if (〈expr〉) { 〈stmt〉* } ( else { 〈stmt〉* } ) . 

| 〈expr〉 = 〈expr〉 . 

| assert (〈expr〉) "〈string〉" . 

〈blkinst〉 ::= 〈domainval〉 ; 〈blkref 〉 (; 〈blkref 〉 )* ; 〈rangeval〉 

| 〈blkref 〉 〈domainval〉 ~ 〈rangeval〉 

| 〈domainval〉 ; 〈blkref 〉 (; 〈blkref 〉 )* ; 〈rangeval〉 at ( 〈expr〉 , 〈expr〉 ) 

| 〈blkref 〉 〈domainval〉 ~ 〈rangeval〉 at ( 〈expr〉 , 〈expr〉 ) 

〈blkref 〉 ::= 〈id〉 〈arg〉* 

| [ ] 

| [ 〈blkref 〉 (, 〈blkref 〉)* ] 

〈domainval〉 ::= 〈arg〉 〈arg〉*

APPENDIX A. QUARTZ LANGUAGE GRAMMAR 186 

〈rangeval〉 ::= 〈arg〉 

〈arg〉 ::= 〈id〉 〈vecindex 〉 

| 〈expr〉 

| ( ) 

| ( 〈arg〉 (, 〈arg〉)* ) 

〈expr〉 ::= 〈expr〉 〈bop〉 〈expr〉 

| 〈uop〉 〈expr〉 

| 〈id〉 

| 〈num〉 

| true | false 

| 〈expr〉..〈expr〉 

| ( 〈expr〉 ) 

| height ( 〈blkinst〉 ) 

| width ( 〈blkinst〉 ) 

| max ( 〈expr〉, 〈expr〉 (, 〈expr〉)* ) 

| if ( 〈expr〉 , 〈expr〉 , 〈expr〉 ) 

| sum ( 〈id〉 = 〈expr〉 .. 〈expr〉 , 〈expr〉 ) 

| maxf ( 〈id〉 = 〈expr〉 .. 〈expr〉 , 〈expr〉 ) 

〈bop〉 ::= and | or | nand | nor | xor | xnor 

| + | - | * | / | ** | mod | == | != 

| < | | >= 

〈uop〉 ::= - | abs | not 

〈id〉 ::= (‘A’-‘Z’ | ‘a’-‘z’)+ (‘A’-‘Z’ | ‘a’-‘z’ | ‘0’-‘9’ | ‘ ’)* 

〈num〉 ::= (‘0’-‘9’)*

Appendix B 

Theoretical Basis for Layout 

Reasoning 

This appendix contains Isabelle theories which form the QuartzLayout library. 

IntAlgebra defines Quartz operators that are not already present in HOL, including > 

and a power operation for integers. It also includes many useful theorems that can be used 

to rewrite integer expressions. 

Types declares the Quartz types of wires and vectors. 

Block defines Quartz blocks as Isabelle records consisting of their functional definition, 

height function and width function. The theory also defines the block instantiation operation 

and a number of simplification theorems. 

Inbuilt defines the layout interpretations of language constructs that are treated as inbuilt 

blocks (such as zip). 

SeriesComposition defines the semantics and layout interpretation of Quartz series com- 

position. It also includes proofs of useful properties for the layout of series compositions. 

187

APPENDIX B. THEORETICAL BASIS FOR LAYOUT REASONING 188 

ParallelComposition defines the semantics and layout interpretation of Quartz parallel 

composition. The theory includes AST rewriting functions to allow Isabelle to parse and 

pretty-print parallel compositions and also contains proofs of useful properties of parallel 

composition layouts. 

Functions defines the maxf and sum functions and includes their correctness proofs and 

theorems describing a wide range of useful properties. 

Structures contains theorems that are particularly useful for simplifying proof goals that 

are formed by certain circuit structures (e.g. horizontal arrays). 

CompilerSimps contains the proofs of simplification rules used in the Quartz compiler to 

simplify maxf and sum functions. 

QuartzLayout is a dummy theory which brings together all dependent theories to be used 

as the root library. 

Minf is not included in the QuartzLayout library. It contains definitions and correspon- 

dance theorems between min and max functions and minf and maxf functions. If expressions 

using these minimum operators are desired these theorems can be used to rewrite expressions 

into forms which use purely the max and maxf functions. 

B.1 IntAlgebra 

header {* Useful theorems about integers for use in size function reasoning *} 

theory IntAlgebra = Main: 

section {* Additional ordering operators *} 

constdefs 

grthn:: "[’a::ord, ’a]=>bool" ("(_/ > _)" [50, 51] 50) 

"grthn == % a b. (b < a)" 

geq:: "[’a::ord, ’a]=>bool" ("(_/ >= _)" [50, 51] 50) 

"geq == % a b. (b


declare geq_def [simp] 

section {* Power function for integers *} 

(* Undefined for negative argument y - result must be an integer *) 

consts 

pwr :: "[int,int]=>int" (infixr 60) 

defs 

pwr_def: "x pwr y == if y >= 0 then int (nat x ^ nat y) else arbitrary" 

section {* Reasoning with equality and inequalities *} 

theorem zless_eq: "(((x::int)


lemma z_aleq_bc: "[| (0::int)


(* A less permissive version of the ’nat(n)’ function. ’nat(n)’ is a total function 

*) 

(* but we want a partial function defined only for n >= 0 *) 

constdefs 

int2nat :: "int => nat" 

"int2nat == (% x. if 0 int2nat x = arbitrary" 

by (simp add: int2nat_def) 

theorem int2nat_defined [simp]: "!! (x::int). 0 int2nat x = nat x" 

by (simp add: int2nat_def) 

end 

B.2 Types 

header {* Definition of Quartz types *} 

theory Types = Main: 

typedecl wire 

types ’a vector = "int => ’a" 

constdefs 

vecelem :: "’a vector => int => ’a" ("_" [51, 52] 51) 

"vecelem == % v i. (v i)" 

vecrange:: "’a vector => int => int => ’a vector" ("_" [55,56,57] 55) 

"vecrange == % v ub lb. (%x. if (x + lb) wire" 

bool2wire :: "bool => wire" 

end 

B.3 Block 

header {* Definition of Quartz blocks as records of functional definition and size 

functions *} 

theory Block = Types: 

record (’a,’b)block = 

Def :: "’a" 

Height :: "’b" 

Width :: "’b" 

section {* Currying of blocks and block instantiation *} 

constdefs 

ap :: "[(’a=>’b,’a=>’c)block,’a]=>(’b,’c)block" (infixl "$" 49) 

"ap == % B x. (| Def = Def B x, Height = Height B x, Width = Width B x |)" 

constdefs


inst :: "[’a, (’a=>’b=>bool,’a=>’b=>int)block,’b]=>(bool, int)block" ("_ ;;; _ ;;; 

_" [45, 46, 47] 45) 

"inst == (% x B y. (| Def = Def B x y, Height = Height B x y, Width = Width B x y 

|))" 

section {* Simplification theorems *} 

theorem height_extract [simp]: "Height (x ;;; A ;;; y) = ((Height A) x y)" 

by (simp add:inst_def) 

theorem width_extract [simp]: "Width (x ;;; A ;;; y) = ((Width A) x y)" 

by (simp add:inst_def) 

theorem def_extract: "Def (x ;;; A ;;; y) = (Def A) x y" 

by (simp add: inst_def) 

theorem height_ap [simp]: "Height (A $ x) = (Height A) x" 

by (simp add: ap_def) 

theorem width_ap [simp]: "Width (A $ x) = (Width A) x" 


theorem def_ap [simp]: "Def (A $ x) = (Def A) x" 


section {* Congruence, for recdef proofs *} 

theorem ap_cong: "((| Def = s, Height = h, Width = w|) $ l) = (| Def = s l, Height = 

h l, Width = w l|)" 

by (simp) 

declare ap_cong [recdef_cong] 

end 

B.4 Inbuilt 

header {* In-built language blocks: zip and unzip *} 

theory Inbuilt = Types + Block: 

(* Don’t define structure / function - unnecessary for reasoning about size *) 

section {* Zip block *} 

consts 

defs 

zip_struct:: "int=>’a=>(’b)vector=>bool" 

zip_height:: "int=>’a=>(’b)vector=>int" 

zip_width:: "int=>’a=>(’b)vector=>int" 

zip:: "(int=>’a=>(’b)vector=>bool, int=>’a=>(’b)vector=>int)block" 

zip_height_def: "zip_height == % n x y. 0" 

zip_width_def: "zip_width == % n x y. 0" 

zip_def: "zip == (| Def = zip_struct, Height = zip_height, Width = zip_width 

|)"


declare zip_height_def [simp] 

declare zip_width_def [simp] 

declare zip_def [simp] 

section {* Unzip block *} 

consts 

defs 

unzip_struct:: "int=>(’a)vector=>’b=>bool" 

unzip_height:: "int=>(’a)vector=>’b=>int" 

unzip_width:: "int=>(’a)vector=>’b=>int" 

unzip:: "(int=>(’a)vector=>’b=>bool, int=>(’a)vector=>’b=>int)block" 

unzip_height_def: "unzip_height == % n x y. 0" 

unzip_width_def: "unzip_width == % n x y. 0" 

unzip_def: "unzip == (| Def = unzip_struct, Height = unzip_height, Width = 

unzip_width|)" 

declare unzip_height_def [simp] 

declare unzip_width_def [simp] 

declare unzip_def [simp] 

end 

B.5 SeriesComposition 

header {* Definition of Quartz series composition *} 

theory SeriesComposition = Block + IntAlgebra: 

constdefs 

ser:: "[(’a=>’b=>bool,’a=>’b=>int)block,(’b=>’c=>bool,’b=>’c=>int)block]=>(’a=>’c 

=>bool, ’a=>’c=>int)block" (infixl ";;" 48) 

"ser == (% B1 B2. (| Def = % x y. EX s. (Def B1) x s & (Def B2) s y, 

Height = % x y. let s = (THE s. (Def B1) x s & (Def B2) s y) in 

max (Height B1 x s) (Height B2 s y), 

Width = % x y. let s = (THE s. (Def B1) x s & (Def B2) s y) in 

(Width B1 x s) + (Width B2 s y)|))" 

section {* Properties of series composition *} 

theorem width_ser_ge0: "!! P Q x y. [| !! x y. 0


header {* Definition of Quartz parallel composition *} 

theory ParallelComposition = Block + IntAlgebra: 

consts 

Par :: "(’a=>’b=>bool, ’a=>’b=>int)block=>(’c=>’d=>bool,’c=>’d=>int)block=>((’a*’c 

)=>(’b*’d)=>bool, (’a*’c)=>(’b*’d)=>int)block" 

EmptyPar :: "unit=>unit" ("[[ ]]") 

section {* Syntax definitions *} 

nonterminals 

par_args parpatterns 

syntax 

"_par" :: "(’a=>’b=>bool,’a=>’b=>int)block => par_args => ((’a*’c)=>(’b*’d)=> 

bool,(’a*’c)=>(’b*’d)=>int)block" 

("[[ _ , _ ]]") 

"_par_arg" :: "(’a=>’b=>bool,’a=>’b=>int)block => par_args" ("_") 

"_par_args" :: "(’c=>’d=>bool,’c=>’d=>int)block => par_args => par_args" ("_,/ _ 

") 

"_parpattern" :: "[pttrn, parpatterns] => pttrn" ("’[[_,/ _’]]") 

"" :: "pttrn => parpatterns" ("_") 

"_parpatterns" :: "[pttrn, parpatterns] => parpatterns" ("_,/ _") 

translations 

"[[ x, y ]]" == "Par x y" 

"_par x (_par_args y z)" == "_par x (_par_arg (_par y z))" 

defs 

par_def: "Par A B == (| Def = % (d1, d2) (r1, r2). (Def A) d1 r1 & (Def B) d2 r2, 

Height = % (d1,d2) (r1, r2). (Height A) d1 r1 + (Height B) d2 

r2, 

Width = % (d1, d2) (r1, r2). max ((Width A) d1 r1) ((Width B) 

d2 r2) |)" 

section {* Expansion theorems *} 

theorem par2height: "Height (((a,b) ;;; [[ F, G ]] ;;; (c, d))) = (Height F) a c + ( 

Height G) b d" 

by (simp add: par_def) 

theorem par2width: "Width ((a,b) ;;; [[ F , G ]] ;;; (c, d)) = max ((Width F) a c) (( 

Width G) b d)" 

by (simp add: par_def) 

section {* Properties of parallel composition *} 

theorem width_par_ge0: "!! P Q x y. [| !! x y. 0


end 

B.7 Functions 

header {* Definition of Quartz expression functions: maxf, sum, etc *} 

theory Functions = IntAlgebra: 

section {* Useful logic *} 

theorem conj_imp: "(A --> B & C) = ((A --> B) & (A --> C))" 

by (auto) 

section {* maxf function to find the maximum point of a function within in a range *} 

consts 

maxf :: "(int*int*(int=>int))=>int" 

recdef maxf "measure (%(b, t, f). nat(t+1-b))" 

"maxf (bot, top, fun) = (if (top < bot) then 0 

else ( 

case (top = bot) of True => 

fun top 

| False => ( 


let two = maxf (bot, top - 1, fun) in 

max one two) 

))" 

theorem maxf_expand_if: "maxf(b,t,f) = (if (t < b) then 0 else (if t = b then f t 

else max (f t) (maxf(b,t - 1,f))))" 

by (simp add: Let_def) 

subsection {* Correctness proof *} 

constdefs 

is_maxf :: "[int,int, int=>int,int]=>bool" 

"is_maxf == % bot top fun max. 

(EX y. bot C" 

by (auto) 

lemma logic_impand: "A & B = (A --> B) | ~A" 

by (blast) 

theorem maxf_nobigger: "!! b t f. [| b


apply (rule conjI) 

apply (simp) 


apply (simp (no_asm_simp) add: Let_def del: maxf.simps) 

apply (simp only: z_leqplusone) 

apply (simp only: logic_rearr) 

apply (rule allI) 

apply (rule logic_rearr2) 

apply (simp only: le_max_iff_disj) 

apply (simp) 


apply (simp) 

done 

theorem maxf_fmax: "!! b t f x. [| b b


apply (simp del: maxf.simps) 



defer 


apply (rule exI) 


defer 


defer 

apply (simp) 

defer 

by (simp, simp, auto) 

theorem maxf_is_maxf: "!! b t f. [| b


done 

theorem maxf_ge0_frange: "!! m n f. [| (ALL y. m


theorem sum_norange_ge0: "!! m n f. [| m < n |] ==> 0


theorem sumn_plusf: "!! (p::int) q m n f. [| (!! y. 0


apply (rule impI) apply (rule impI) 

apply (subgoal_tac "0 int). [| (!!x. 0 maxf(b, t, 

%k. sum(b,k,f)) = sum(b, t, f)" 

apply (subgoal_tac "b (B|C|D|E))" 

by (auto) 

lemma impdisj_2of4: "(A --> C) ==> (A --> (B|C|D|E))" 

by (auto)


lemma impdisj_3of4: "(A --> D) ==> (A --> (B|C|D|E))" 

by (auto) 

lemma impdisj_4of4: "(A --> E) ==> (A --> (B|C|D|E))" 

by (auto) 

lemma impdisj_1of2: "(A --> B) ==> (A --> (B|C))" 

by auto 

lemma impdisj_2of2: "(A --> C) ==> (A --> (B|C))" 

by auto 

lemma impdisj_12of4: "(A --> (B|C)) ==> (A --> (B|C|D|E))" 

by auto 

lemma impdisj_34of4: "(A --> (D|E)) ==> (A --> (B|C|D|E))" 

by auto 

section {* Zero size ranges *} 

lemma overlap0: "((0::int)


((m



apply (simp (no_asm_simp) add: Let_def del: maxf.simps) 

apply (simp only: z_leqplusone) 

apply (simp only: logic_rearr) 

apply (rule logic_rearr2) 


apply (simp) 


apply (simp) 

done 

theorem maxf_sizefunc2: "!! x. [| True |] ==> b f x


theorem sum_nlem_zero [simp]: "!! (m::int) (n::int) f. [| n < m |] ==> sum (m, n, f) 

= 0" 

by (simp add: sum.simps) 

theorem sum_simp_xnotinf: "!! (m::int) (n::int) f. sum(m, n, %x. f) = (if m maxf(m, n, f) 

= 0" 

by (simp add:maxf.simps) 

theorem maxf_simp_xnotinf: "!! (m::int) (n::int) f. maxf(m, n, %x. f) = (if m


theorem min_max_corres: "!! (a::int) (b::int). min a b = - (max (-a) (-b))" 

apply (simp add: max_def min_def) 

done 

section {* minf function and correspondance with maxf *} 

consts 

minf :: "(int*int*(int=>int))=>int" 

recdef minf "measure (%(b, t, f). nat(t+1-b))" 

"minf (bot, top, fun) = (if (top < bot) then 0 

else ( 

case (top = bot) of True => 

fun top 

| False => ( 


let two = minf (bot, top - 1, fun) in 

min one two) 

))" 

theorem minf_maxf_corres: "!! (f::int=>int) b t. minf(b,t,f) = - maxf (b,t,% x.- f x) 

" 

apply (case_tac "b

Appendix C 

Placed Combinator Libraries 

This appendix contains the Quartz descriptions and layout correctness proofs for some Quartz 

libraries. The wide range of wiring blocks in the Quartz prelude library have been omitted, 

since they are all defined to have size 0 × 0, as have many combinators which have similar 

functions to the one shown. For example, the snd block is structurally very similar to the fst 

block and has been omitted, as has the col block which is very similar to row. 

C.1 Prelude Library 

C.1.1 fst 

/∗∗ Apply block R to the first element of a tuple ∗/ 

block fst (block R ‘a ∼ ‘b) (‘a i1, ‘c i2) ∼ (‘b o1, ‘c o2) 

attributes { 

height = height(i1 ;R ; o1). 

width = width(i1 ;R ;o1). 

layout−proved. 

} { 

o2 = i2. 

i1 ; R ; o1 at (0,0). 

} 

theory fst = QuartzLayout: 

section {* Function definitions *} 

consts 

struct:: "((’t276=>’t277=>bool,’t276=>’t277=>int)block)=>(’t276*’t284)=>(’t277*’ 

t284)=>bool" 

207

APPENDIX C. PLACED COMBINATOR LIBRARIES 208 

height:: "((’t276=>’t277=>bool,’t276=>’t277=>int)block)=>(’t276*’t284)=>(’t277*’ 

t284)=>int" 

width:: "((’t276=>’t277=>bool,’t276=>’t277=>int)block)=>(’t276*’t284)=>(’t277*’ 

t284)=>int" 

fst:: "(((’t276=>’t277=>bool,’t276=>’t277=>int)block)=>(’t276*’t284)=>(’t277*’ 

t284)=>bool, ((’t276=>’t277=>bool,’t276=>’t277=>int)block)=>(’t276*’t284)=>(’ 

t277*’t284)=>int)block" 

defs 

struct_def: "struct == % R (i1, i2) (o1, o2). (o2 = i2) & Def (i1 ;;; R ;;; o1)" 

height_def: "height == % R (i1, i2) (o1, o2). Height (i1 ;;; R ;;; o1)" 

width_def: "width == % R (i1, i2) (o1, o2). Width (i1 ;;; R ;;; o1)" 

fst_def: "fst == (| Def = struct, Height = height, Width = width |)" 

declare width_def [simp] 

declare height_def [simp] 

declare struct_def [simp] 

section {* Validity of width and height functions *} 

theorem height_ge0_int: "!! (R::((’t276=>’t277=>bool,’t276=>’t277=>int)block)) (i1::’ 

t276) (i2::’t284) (o1::’t277) (o2::’t284). [| ALL (qs674::’t276) (qs675::’t277). 

0 

0 ’t277=>bool,’t276=>’t277=>int)block)) (i1::’ 

t276) (i2::’t284) (o1::’t277) (o2::’t284). [| ALL (qs674::’t276) (qs675::’t277). 

0 

0 ’t277=>bool,’t276=>’t277=>int)block)) (i1::’t276 

) (i2::’t284) (o1::’t277) (o2::’t284). [| ALL (qs674::’t276) (qs675::’t277). 0 

 

0 ’t277=>bool,’t276=>’t277=>int)block)) (i1::’t276) 

(i2::’t284) (o1::’t277) (o2::’t284). [| ALL (qs674::’t276) (qs675::’t277). 0 

0 ’t277=>bool,’t276=>’t277=>int)block)) (i1::’t276) (i2::’t284 

) (o1::’t277) (o2::’t284). [| o2 = i2 ; Def (i1 ;;; R ;;; o1) ; ALL (qs674::’ 

t276) (qs675::’t277). 0


((0::int) int)block)=>’t301=>’t300=>bool" 

height:: "((’t300=>’t301=>bool,’t300=>’t301=>int)block)=>’t301=>’t300=>int" 

width:: "((’t300=>’t301=>bool,’t300=>’t301=>int)block)=>’t301=>’t300=>int" 

converse:: "(((’t300=>’t301=>bool,’t300=>’t301=>int)block)=>’t301=>’t300=>bool, 

((’t300=>’t301=>bool,’t300=>’t301=>int)block)=>’t301=>’t300=>int)block" 

defs 

struct_def: "struct == % R i o_. Def (o_ ;;; R ;;; i)" 

height_def: "height == % R i o_. Height (o_ ;;; R ;;; i)" 

width_def: "width == % R i o_. Width (o_ ;;; R ;;; i)" 

converse_def: "converse == (| Def = struct, Height = height, Width = width |)" 





theorem height_ge0_int: "!! (R::((’t300=>’t301=>bool,’t300=>’t301=>int)block)) (i::’ 

t301) (o_::’t300). [| ALL (qs678::’t300) (qs679::’t301). 0 bool,’t300=>’t301=>int)block)) (i::’ 

t301) (o_::’t300). [| ALL (qs678::’t300) (qs679::’t301). 0 bool,’t300=>’t301=>int)block)) (i::’t301) 

(o_::’t300). [| ALL (qs678::’t300) (qs679::’t301). 0


;;; qs679)) ; ALL (qs678::’t300) (qs679::’t301). 0 

0 ’t301=>bool,’t300=>’t301=>int)block)) (i::’t301) 

(o_::’t300). [| ALL (qs678::’t300) (qs679::’t301). 0 bool,’t300=>’t301=>int)block)) (i::’t301) (o_::’t300) 

. [| Def (o_ ;;; R ;;; i) ; ALL (qs678::’t300) (qs679::’t301). 0



consts 

struct:: "(int*((’t321=>’t321=>bool,’t321=>’t321=>int)block))=>’t321=>’t321=>bool 

" 

height:: "(int*((’t321=>’t321=>bool,’t321=>’t321=>int)block))=>’t321=>’t321=>int" 

width:: "(int*((’t321=>’t321=>bool,’t321=>’t321=>int)block))=>’t321=>’t321=>int" 

rcomp:: "((int*((’t321=>’t321=>bool,’t321=>’t321=>int)block))=>’t321=>’t321=>bool 

, (int*((’t321=>’t321=>bool,’t321=>’t321=>int)block))=>’t321=>’t321=>int) 

block" 

defs 

struct_def: "struct == % (n, R) i o_. EX (intsig::(’t321)vector). if n = 0 then 

o_ = i else (intsig = i) & (ALL (j::int). ((0 ’t321=>int)block 

)) (i::’t321) (o_::’t321). [| ALL (qs680::’t321) (qs681::’t321). 0 

0 ’t321=>bool,’t321=>’t321=>int)block) 

) (i::’t321) (o_::’t321). [| ALL (qs680::’t321) (qs681::’t321). 0 

0 ’t321=>bool,’t321=>’t321=>int)block)) ( 

i::’t321) (o_::’t321). [| ALL (qs680::’t321) (qs681::’t321). 0 

0 ’t321=>bool,’t321=>’t321=>int)block)) (i 

::’t321) (o_::’t321). [| ALL (qs680::’t321) (qs681::’t321). 0


;;; R ;;; qs681)) |] ==> 

0 ’t321=>bool,’t321=>’t321=>int)block)) (i::’t321) ( 

o_::’t321). [| if n = 0 then o_ = i else (intsig = i) & (ALL (j::int). ((0


C.1.4 Q\P (conjugate) 

/∗∗ Conjugation. Q \ P = Pˆ∼ 1 ;Q ;P ∗/ 

block conjugate (block Q ‘a ∼ ‘a, block P ‘a ∼ ‘a) (‘a i) ∼ (‘a o) 

attributes { 

height = height (i ;converse P ; Q ; P ; o). 

width = width (i ;converse P ; Q ; P ; o). 


} → converse P ; Q ; P. 

theory conjugate = converse: 


consts 

struct:: "(((’t343=>’t343=>bool,’t343=>’t343=>int)block)*((’t343=>’t343=>bool,’ 

t343=>’t343=>int)block))=>’t343=>’t343=>bool" 

height:: "(((’t343=>’t343=>bool,’t343=>’t343=>int)block)*((’t343=>’t343=>bool,’ 

t343=>’t343=>int)block))=>’t343=>’t343=>int" 

width:: "(((’t343=>’t343=>bool,’t343=>’t343=>int)block)*((’t343=>’t343=>bool,’ 

t343=>’t343=>int)block))=>’t343=>’t343=>int" 

conjugate:: "((((’t343=>’t343=>bool,’t343=>’t343=>int)block)*((’t343=>’t343=>bool 

,’t343=>’t343=>int)block))=>’t343=>’t343=>bool, (((’t343=>’t343=>bool,’t343 

=>’t343=>int)block)*((’t343=>’t343=>bool,’t343=>’t343=>int)block))=>’t343=>’ 

t343=>int)block" 

defs 

struct_def: "struct == % (Q, P) i o_. Def (i ;;; converse $ (P) ;; Q ;; P ;;; o_) 

" 

height_def: "height == % (Q, P) i o_. Height (i ;;; converse $ (P) ;; Q ;; P ;;; 

o_)" 

width_def: "width == % (Q, P) i o_. Width (i ;;; converse $ (P) ;; Q ;; P ;;; o_) 

" 

conjugate_def: "conjugate == (| Def = struct, Height = height, Width = width |)" 





theorem height_ge0_int: "!! (Q::((’t343=>’t343=>bool,’t343=>’t343=>int)block)) (P 

::((’t343=>’t343=>bool,’t343=>’t343=>int)block)) (i::’t343) (o_::’t343). [| ALL 

(qs683::’t343) (qs684::’t343). 0 int)block)) (P::((’ 

t343=>’t343=>bool,’t343=>’t343=>int)block)) (i::’t343) (o_::’t343). [| ALL ( 

qs683::’t343) (qs684::’t343). 0


(qs686::’t343). 0 

0 ’t343=>bool,’t343=>’t343=>int)block)) (P::((’ 

t343=>’t343=>bool,’t343=>’t343=>int)block)) (i::’t343) (o_::’t343). [| ALL ( 

qs683::’t343) (qs684::’t343). 0 int)block)) (P::((’t343 

=>’t343=>bool,’t343=>’t343=>int)block)) (i::’t343) (o_::’t343). [| ALL (qs683::’ 

t343) (qs684::’t343). 0 int)block)) (P::((’t343=>’t343=> 

bool,’t343=>’t343=>int)block)) (i::’t343) (o_::’t343). [| Def (i ;;; converse $ 

(P) ;; Q ;; P ;;; o_) ; ALL (qs683::’t343) (qs684::’t343). 0


attributes { 

width = maxf(k=0..n−1, width (i[k] ;R ;o[k])). 

height = sum(k=0..n−1, height (i[k] ;R ;o[k])). 


} { 

int j. 

for j = 0..n−1 { 

i [j] ; R ; o[j] at (0, sum(k=0..j−1,height(i[k] ;R ; o[k]))). 

} . 

} 

theory map = QuartzLayout: 


consts 

struct:: "(int*((’t395=>’t396=>bool,’t395=>’t396=>int)block))=>(’t395)vector=>(’ 

t396)vector=>bool" 

height:: "(int*((’t395=>’t396=>bool,’t395=>’t396=>int)block))=>(’t395)vector=>(’ 

t396)vector=>int" 

width:: "(int*((’t395=>’t396=>bool,’t395=>’t396=>int)block))=>(’t395)vector=>(’ 


map:: "((int*((’t395=>’t396=>bool,’t395=>’t396=>int)block))=>(’t395)vector=>(’ 

t396)vector=>bool, (int*((’t395=>’t396=>bool,’t395=>’t396=>int)block))=>(’ 

t395)vector=>(’t396)vector=>int)block" 

defs 

struct_def: "struct == % (n, R) i o_. ALL (j::int). ((0 ’t396=>bool,’t395=>’t396=>int)block 

)) (i::(’t395)vector) (o_::(’t396)vector). [| ALL (qs691::’t395) (qs692::’t396). 

0 


) (i::(’t395)vector) (o_::(’t396)vector). [| ALL (qs691::’t395) (qs692::’t396). 

0 

0


theorem height_ge0: "!! (n::int) (R::((’t395=>’t396=>bool,’t395=>’t396=>int)block)) ( 

i::(’t395)vector) (o_::(’t396)vector). [| ALL (qs691::’t395) (qs692::’t396). 0 

 


::(’t395)vector) (o_::(’t396)vector). [| ALL (qs691::’t395) (qs692::’t396). 0 

0 ’t396=>bool,’t395=>’t396=>int)block)) (i::(’t395) 

vector) (o_::(’t396)vector). [| ALL (j::int). ((0


rule impdisj_34of4, 

rule loop_sum_overlap2, 


auto intro: sum_ge0 maxf_ge0 sum_nsub1_plusf maxf_encloses) 

done 

end 

C.1.6 (tri) 

/∗∗ Triangle. Ruby /\. Place a triangle of R blocks between the n−element 

vectors i and o. Tri is an increasing triangle , irt is decreasing ∗/ 

block tri (int n, block R ‘a ∼ ‘a) (‘a i[n]) ∼ (‘a o[n]) 

attributes { 

height = if (n 1) { 

for j = 1..n−1 { 

i[j] ; rcomp (j, R) ; o[j] at (sum(j=1..j−1,width(i[j] ;rcomp (j, R) ; o[j ])), 0). 

} . 

} . 

} 

theory tri = rcomp: 


consts 

struct:: "(int*((’t415=>’t415=>bool,’t415=>’t415=>int)block))=>(’t415)vector=>(’ 

t415)vector=>bool" 

height:: "(int*((’t415=>’t415=>bool,’t415=>’t415=>int)block))=>(’t415)vector=>(’ 


width:: "(int*((’t415=>’t415=>bool,’t415=>’t415=>int)block))=>(’t415)vector=>(’ 


tri:: "((int*((’t415=>’t415=>bool,’t415=>’t415=>int)block))=>(’t415)vector=>(’ 

t415)vector=>bool, (int*((’t415=>’t415=>bool,’t415=>’t415=>int)block))=>(’ 

t415)vector=>(’t415)vector=>int)block" 

defs 

struct_def: "struct == % (n, R) i o_. (n >= 0) & (o_ = i) & (if n > 1 then 

ALL (j::int). ((1



theorem height_ge0_int: "!! (n::int) (R::((’t415=>’t415=>bool,’t415=>’t415=>int)block 

)) (i::(’t415)vector) (o_::(’t415)vector). [| ALL (qs694::’t415) (qs695::’t415). 

0 


) (i::(’t415)vector) (o_::(’t415)vector). [| ALL (qs694::’t415) (qs695::’t415). 

0 

0 ’t415=>bool,’t415=>’t415=>int)block)) ( 

i::(’t415)vector) (o_::(’t415)vector). [| ALL (qs694::’t415) (qs695::’t415). 0 

 


::(’t415)vector) (o_::(’t415)vector). [| ALL (qs694::’t415) (qs695::’t415). 0 

0 ’t415=>bool,’t415=>’t415=>int)block)) (i::(’t415) 

vector) (o_::(’t415)vector). [| n >= 0 ; o_ = i ; if n > 1 then ALL (j:: 

int). ((1


section {* Intersection theorems *} 

theorem "!! (n::int) (R::((’t415=>’t415=>bool,’t415=>’t415=>int)block)) (i::(’t415) 

vector) (o_::(’t415)vector). [| n >= 0 ; o_ = i ; if n > 1 then ALL (j:: 

int). ((1


struct:: "((((’t486*’t487)=>(’t488*’t489)=>bool,(’t486*’t487)=>(’t488*’t489)=>int 

)block)*(((’t489*’t491)=>(’t492*’t493)=>bool,(’t489*’t491)=>(’t492*’t493)=> 

int)block))=>(’t486*(’t487*’t491))=>((’t488*’t492)*’t493)=>bool" 

height:: "((((’t486*’t487)=>(’t488*’t489)=>bool,(’t486*’t487)=>(’t488*’t489)=>int 

)block)*(((’t489*’t491)=>(’t492*’t493)=>bool,(’t489*’t491)=>(’t492*’t493)=> 

int)block))=>(’t486*(’t487*’t491))=>((’t488*’t492)*’t493)=>int" 

width:: "((((’t486*’t487)=>(’t488*’t489)=>bool,(’t486*’t487)=>(’t488*’t489)=>int) 

block)*(((’t489*’t491)=>(’t492*’t493)=>bool,(’t489*’t491)=>(’t492*’t493)=>int 

)block))=>(’t486*(’t487*’t491))=>((’t488*’t492)*’t493)=>int" 

beside:: "(((((’t486*’t487)=>(’t488*’t489)=>bool,(’t486*’t487)=>(’t488*’t489)=> 

int)block)*(((’t489*’t491)=>(’t492*’t493)=>bool,(’t489*’t491)=>(’t492*’t493) 

=>int)block))=>(’t486*(’t487*’t491))=>((’t488*’t492)*’t493)=>bool, ((((’t486 

*’t487)=>(’t488*’t489)=>bool,(’t486*’t487)=>(’t488*’t489)=>int)block)*(((’ 

t489*’t491)=>(’t492*’t493)=>bool,(’t489*’t491)=>(’t492*’t493)=>int)block)) 

=>(’t486*(’t487*’t491))=>((’t488*’t492)*’t493)=>int)block" 

defs 

struct_def: "struct == % (R, S) (a, (b, c)) ((d, e), f). EX (is::’t489). Def ((a, 

b) ;;; R ;;; (d, is)) & Def ((is, c) ;;; S ;;; (e, f))" 

height_def: "height == % (R, S) (a, (b, c)) ((d, e), f). let is = (THE (is::’t489 

). Def ((a, b) ;;; R ;;; (d, is)) & Def ((is, c) ;;; S ;;; (e, f))) in max ( 

Height ((a, b) ;;; R ;;; (d, is))) (Height ((is, c) ;;; S ;;; (e, f)))" 

width_def: "width == % (R, S) (a, (b, c)) ((d, e), f). let is = (THE (is::’t489). 

Def ((a, b) ;;; R ;;; (d, is)) & Def ((is, c) ;;; S ;;; (e, f))) in (Width 

((a, b) ;;; R ;;; (d, is))) + (Width ((is, c) ;;; S ;;; (e, f)))" 

beside_def: "beside == (| Def = struct, Height = height, Width = width |)" 





theorem height_ge0_int: "!! (R::(((’t486*’t487)=>(’t488*’t489)=>bool,(’t486*’t487) 

=>(’t488*’t489)=>int)block)) (S::(((’t489*’t491)=>(’t492*’t493)=>bool,(’t489*’ 

t491)=>(’t492*’t493)=>int)block)) (a::’t486) (b::’t487) (c::’t491) (d::’t488) (e 

::’t492) (f::’t493). [| ALL (qs704::(’t486*’t487)) (qs705::(’t488*’t489)). 0 bool,(’t486*’t487)=>(’ 

t488*’t489)=>int)block)) (S::(((’t489*’t491)=>(’t492*’t493)=>bool,(’t489*’t491) 

=>(’t492*’t493)=>int)block)) (a::’t486) (b::’t487) (c::’t491) (d::’t488) (e::’ 

t492) (f::’t493). [| ALL (qs704::(’t486*’t487)) (qs705::(’t488*’t489)). 0 bool,(’t486*’t487)=>(’ 

t488*’t489)=>int)block)) (S::(((’t489*’t491)=>(’t492*’t493)=>bool,(’t489*’t491) 

=>(’t492*’t493)=>int)block)) (a::’t486) (b::’t487) (c::’t491) (d::’t488) (e::’


t492) (f::’t493). [| ALL (qs704::(’t486*’t487)) (qs705::(’t488*’t489)). 0 bool,(’t486*’t487)=>(’t488 

*’t489)=>int)block)) (S::(((’t489*’t491)=>(’t492*’t493)=>bool,(’t489*’t491)=>(’ 

t492*’t493)=>int)block)) (a::’t486) (b::’t487) (c::’t491) (d::’t488) (e::’t492) 

(f::’t493). [| ALL (qs704::(’t486*’t487)) (qs705::(’t488*’t489)). 0 bool,(’t486*’t487)=>(’t488*’t489)=> 

int)block)) (S::(((’t489*’t491)=>(’t492*’t493)=>bool,(’t489*’t491)=>(’t492*’t493 

)=>int)block)) (a::’t486) (b::’t487) (c::’t491) (d::’t488) (e::’t492) (f::’t493) 

. [| Def ((a, b) ;;; R ;;; (d, is)) ; Def ((is, c) ;;; S ;;; (e, f)) ; ALL ( 

qs704::(’t486*’t487)) (qs705::(’t488*’t489)). 0 (’t492*’t493 



qs704::(’t486*’t487)) (qs705::(’t488*’t489)). 0


theorem "!! (R::(((’t486*’t487)=>(’t488*’t489)=>bool,(’t486*’t487)=>(’t488*’t489)=> 

int)block)) (S::(((’t489*’t491)=>(’t492*’t493)=>bool,(’t489*’t491)=>(’t492*’t493 



qs704::(’t486*’t487)) (qs705::(’t488*’t489)). 0


struct:: "(int*(((’t520*’t506)=>(’t507*’t520)=>bool,(’t520*’t506)=>(’t507*’t520) 

=>int)block))=>(’t520*(’t506)vector)=>((’t507)vector*’t520)=>bool" 

height:: "(int*(((’t520*’t506)=>(’t507*’t520)=>bool,(’t520*’t506)=>(’t507*’t520) 

=>int)block))=>(’t520*(’t506)vector)=>((’t507)vector*’t520)=>int" 

width:: "(int*(((’t520*’t506)=>(’t507*’t520)=>bool,(’t520*’t506)=>(’t507*’t520)=> 

int)block))=>(’t520*(’t506)vector)=>((’t507)vector*’t520)=>int" 

row:: "((int*(((’t520*’t506)=>(’t507*’t520)=>bool,(’t520*’t506)=>(’t507*’t520)=> 

int)block))=>(’t520*(’t506)vector)=>((’t507)vector*’t520)=>bool, (int*(((’ 

t520*’t506)=>(’t507*’t520)=>bool,(’t520*’t506)=>(’t507*’t520)=>int)block)) 

=>(’t520*(’t506)vector)=>((’t507)vector*’t520)=>int)block" 

defs 

struct_def: "struct == % (n, R) (l, t) (b, r). EX (is::(’t520)vector). (is = l 

) & (ALL (i::int). ((0 bool,(’t520 

*’t506)=>(’t507*’t520)=>int)block)) (l::’t520) (t::(’t506)vector) (b::(’t507) 

vector) (r::’t520). [| ALL (qs708::(’t520*’t506)) (qs709::(’t507*’t520)). 0 

0 (’t507*’t520)=>bool,(’t520*’ 

t506)=>(’t507*’t520)=>int)block)) (l::’t520) (t::(’t506)vector) (b::(’t507) 


0 (’t507*’t520)=>bool,(’t520*’ 

t506)=>(’t507*’t520)=>int)block)) (l::’t520) (t::(’t506)vector) (b::(’t507) 


0


theorem width_ge0: "!! (n::int) (R::(((’t520*’t506)=>(’t507*’t520)=>bool,(’t520*’t506 

)=>(’t507*’t520)=>int)block)) (l::’t520) (t::(’t506)vector) (b::(’t507)vector) ( 

r::’t520). [| ALL (qs708::(’t520*’t506)) (qs709::(’t507*’t520)). 0 

0 (’t507*’t520)=>bool,(’t520*’t506)=>(’t507*’ 

t520)=>int)block)) (l::’t520) (t::(’t506)vector) (b::(’t507)vector) (r::’t520). 

[| is = l ; ALL (i::int). ((0



auto intro: sum_ge0 maxf_ge0 sum_nsub1_plusf maxf_encloses) 

done 

end 

C.1.9 grid 

/∗∗ Grid of m x n R blocks. m columns, n rows ∗/ 

block grid (int m, int n, block R (‘a, ‘b) ∼ (‘b, ‘a)) (‘a l[n], ‘b t[m]) ∼ 

(‘b b[m], ‘a r[n]) 

attributes { 

height = height ((l, t) ; row (m, col (n, R)) ; (b, r)). 

width = width ((l, t) ; row (m, col (n, R)) ; (b, r)). 


} → row (m, col (n, R)). 

theory grid = col + row: 


consts 

struct:: "(int*int*(((’t581*’t587)=>(’t587*’t581)=>bool,(’t581*’t587)=>(’t587*’ 

t581)=>int)block))=>((’t581)vector*(’t587)vector)=>((’t587)vector*(’t581) 

vector)=>bool" 

height:: "(int*int*(((’t581*’t587)=>(’t587*’t581)=>bool,(’t581*’t587)=>(’t587*’ 


vector)=>int" 

width:: "(int*int*(((’t581*’t587)=>(’t587*’t581)=>bool,(’t581*’t587)=>(’t587*’ 


vector)=>int" 

grid:: "((int*int*(((’t581*’t587)=>(’t587*’t581)=>bool,(’t581*’t587)=>(’t587*’ 


vector)=>bool, (int*int*(((’t581*’t587)=>(’t587*’t581)=>bool,(’t581*’t587) 

=>(’t587*’t581)=>int)block))=>((’t581)vector*(’t587)vector)=>((’t587)vector 

*(’t581)vector)=>int)block" 

defs 

struct_def: "struct == % (m, n, R) (l, t) (b, r). Def ((l, t) ;;; row $ (m, col $ 

(n, R)) ;;; (b, r))" 

height_def: "height == % (m, n, R) (l, t) (b, r). Height ((l, t) ;;; row $ (m, 

col $ (n, R)) ;;; (b, r))" 

width_def: "width == % (m, n, R) (l, t) (b, r). Width ((l, t) ;;; row $ (m, col $ 

(n, R)) ;;; (b, r))" 

grid_def: "grid == (| Def = struct, Height = height, Width = width |)" 





theorem height_ge0_int: "!! (m::int) (n::int) (R::(((’t581*’t587)=>(’t587*’t581)=> 

bool,(’t581*’t587)=>(’t587*’t581)=>int)block)) (l::(’t581)vector) (t::(’t587) 

vector) (b::(’t587)vector) (r::(’t581)vector). [| ALL (qs714::(’t581*’t587)) ( 

qs715::(’t587*’t581)). 0


0 (’t587*’t581)=>bool 

,(’t581*’t587)=>(’t587*’t581)=>int)block)) (l::(’t581)vector) (t::(’t587)vector) 

(b::(’t587)vector) (r::(’t581)vector). [| ALL (qs714::(’t581*’t587)) (qs715::(’ 

t587*’t581)). 0 bool,(’ 

t581*’t587)=>(’t587*’t581)=>int)block)) (l::(’t581)vector) (t::(’t587)vector) (b 

::(’t587)vector) (r::(’t581)vector). [| ALL (qs714::(’t581*’t587)) (qs715::(’ 

t587*’t581)). 0 bool,(’ 

t581*’t587)=>(’t587*’t581)=>int)block)) (l::(’t581)vector) (t::(’t587)vector) (b 

::(’t587)vector) (r::(’t581)vector). [| ALL (qs714::(’t581*’t587)) (qs715::(’ 

t587*’t581)). 0 bool,(’t581*’t587) 

=>(’t587*’t581)=>int)block)) (l::(’t581)vector) (t::(’t587)vector) (b::(’t587) 

vector) (r::(’t581)vector). [| Def ((l, t) ;;; row $ (m, col $ (n, R)) ;;; (b, r 

)) ; ALL (qs714::(’t581*’t587)) (qs715::(’t587*’t581)). 0


C.1.10 loop 

/∗∗ Loop. Ruby x (loop R) y. R for some s. ∗/ 

block loop (block R (‘a, ˆd1 ‘b) ∼ (ˆd1∗ ‘b, ‘c)) (‘a i) ∼ (‘c o) 

attributes { 

height = height((i, s) ; R ; (s, o)). 

width = width((i, s) ;R ; (s, o)). 


} { 

‘b s. 

(i, s) ; R ; (s, o) at (0,0). 

} 

theory loop = QuartzLayout: 


consts 

struct:: "(((’t632*’t633)=>(’t633*’t634)=>bool,(’t632*’t633)=>(’t633*’t634)=>int) 

block)=>’t632=>’t634=>bool" 

height:: "(((’t632*’t633)=>(’t633*’t634)=>bool,(’t632*’t633)=>(’t633*’t634)=>int) 

block)=>’t632=>’t634=>int" 

width:: "(((’t632*’t633)=>(’t633*’t634)=>bool,(’t632*’t633)=>(’t633*’t634)=>int) 

block)=>’t632=>’t634=>int" 

loop:: "((((’t632*’t633)=>(’t633*’t634)=>bool,(’t632*’t633)=>(’t633*’t634)=>int) 

block)=>’t632=>’t634=>bool, (((’t632*’t633)=>(’t633*’t634)=>bool,(’t632*’t633 

)=>(’t633*’t634)=>int)block)=>’t632=>’t634=>int)block" 

defs 

struct_def: "struct == % R i o_. EX (s::’t633). Def ((i, s) ;;; R ;;; (s, o_))" 

height_def: "height == % R i o_. let s = (THE (s::’t633). Def ((i, s) ;;; R ;;; ( 

s, o_))) in Height ((i, s) ;;; R ;;; (s, o_))" 

width_def: "width == % R i o_. let s = (THE (s::’t633). Def ((i, s) ;;; R ;;; (s, 

o_))) in Width ((i, s) ;;; R ;;; (s, o_))" 

loop_def: "loop == (| Def = struct, Height = height, Width = width |)" 





theorem height_ge0_int: "!! (R::(((’t632*’t633)=>(’t633*’t634)=>bool,(’t632*’t633) 

=>(’t633*’t634)=>int)block)) (i::’t632) (o_::’t634). [| ALL (qs722::(’t632*’t633 

)) (qs723::(’t633*’t634)). 0 

0 (’t633*’t634)=>bool,(’t632*’t633)=>(’ 

t633*’t634)=>int)block)) (i::’t632) (o_::’t634). [| ALL (qs722::(’t632*’t633)) ( 

qs723::(’t633*’t634)). 0 

0


theorem height_ge0: "!! (R::(((’t632*’t633)=>(’t633*’t634)=>bool,(’t632*’t633)=>(’ 

t633*’t634)=>int)block)) (i::’t632) (o_::’t634). [| ALL (qs722::(’t632*’t633)) ( 

qs723::(’t633*’t634)). 0 

0 (’t633*’t634)=>bool,(’t632*’t633)=>(’t633 

*’t634)=>int)block)) (i::’t632) (o_::’t634). [| ALL (qs722::(’t632*’t633)) ( 

qs723::(’t633*’t634)). 0 

0 (’t633*’t634)=>bool,(’t632*’t633)=>(’t633*’t634)=> 

int)block)) (i::’t632) (o_::’t634). [| Def ((i, s) ;;; R ;;; (s, o_)) ; ALL ( 

qs722::(’t632*’t633)) (qs723::(’t633*’t634)). 0


struct:: "nat=>((int=>’t12=>’t12=>bool,int=>’t12=>’t12=>int)block)=>’t12=>’t12=> 

bool" 

height:: "nat=>((int=>’t12=>’t12=>bool,int=>’t12=>’t12=>int)block)=>’t12=>’ 

t12=>int" 

width:: "nat=>((int=>’t12=>’t12=>bool,int=>’t12=>’t12=>int)block)=>’t12=>’t12=> 

int" 

ichain:: "((int*((int=>’t12=>’t12=>bool,int=>’t12=>’t12=>int)block))=>’t12=>’t12 

=>bool, (int*((int=>’t12=>’t12=>bool,int=>’t12=>’t12=>int)block))=>’t12=>’t12 

=>int)block" 

defs 

ichain_def: "ichain == (| Def = % (n, R) d r. struct (int2nat n) R d r, Height = 

% (n, R) d r. height (int2nat n) R d r, Width = % (n, R) d r. width (int2nat 

n) R d r |)" 

primrec 

"struct 0 R d r = (d = r)" 

"struct (Suc n) R d r = (EX is. (struct n R d is) & (Def R (int (n + 1)) is r))" 

primrec 

"height 0 R d r = 0" 

"height (Suc n) R d r = ( 

let is = (THE is. (struct (Suc n) R d is) & Def R (int (Suc n)) is r) in 

max (height n R d is) (Height R (int (Suc n)) is r) 

)" 

primrec 

"width 0 R d r = 0" 

"width (Suc n) R d r = ( 

let is = (THE is. (struct (Suc n) R d is) & Def R (int (Suc n)) is r) in 

(width n R d is) + (Width R (int (Suc n)) is r) 

)" 


theorem height_ge0_int: "!! (n::nat) (R::((int=>’t18=>’t18=>bool,int=>’t18=>’t18=>int 

)block)) (d::’t18) (r::’t18). [| ALL (qs23::int) (qs24::’t18) (qs25::’t18). 0 ’t18=>bool,int=>’ 

t18=>’t18=>int)block)). [| ALL (qs23::int) (qs24::’t18) (qs25::’t18). 0


theorem height_ge0: "!! (n::int) (R::((int=>’t18=>’t18=>bool,int=>’t18=>’t18=>int) 

block)) (d::’t18) (r::’t18). [| ALL (qs23::int) (qs24::’t18) (qs25::’t18). 0 bool,int=>’t18=>’t18=>int) 

block)) (d::’t18) (r::’t18). [| ALL (qs23::int) (qs24::’t18) (qs25::’t18). 0 bool,int=>’t18=>’t18=>int)block)) (d::’ 

t18) (r::’t18). [| if n = 0 then d = r else Def (d ;;; ichain $ (n - 1, R) ;; R 

$ n ;;; r) ; ALL (qs23::int) (qs24::’t18) (qs25::’t18). 0



consts 

struct:: "nat=>((int=>’t106=>’t106=>bool,int=>’t106=>’t106=>int)block)=>(’t106) 

vector=>(’t106)vector=>bool" 

height:: "nat=>((int=>’t106=>’t106=>bool,int=>’t106=>’t106=>int)block)=>(’t106) 

vector=>(’t106)vector=>int" 

width:: "nat=>((int=>’t106=>’t106=>bool,int=>’t106=>’t106=>int)block)=>(’t106) 

vector=>(’t106)vector=>int" 

imap:: "((int*((int=>’t106=>’t106=>bool,int=>’t106=>’t106=>int)block))=>(’t106) 

vector=>(’t106)vector=>bool, (int*((int=>’t106=>’t106=>bool,int=>’t106=>’t106 

=>int)block))=>(’t106)vector=>(’t106)vector=>int)block" 

defs 

imap_def: "imap == (| Def = % (n, R) d r. struct (int2nat n) R d r, Height = % (n 

, R) d r. height (int2nat n) R d r, Width = % (n, R) d r. width (int2nat n) R 

d r |)" 

primrec 

"struct 0 R d r = (d = r)" 

"struct (Suc n) R d r = ( 

struct n R (d) (r) & 

Def (d ;;; R $ (int (Suc n)) ;;; r) 

)" 

(* Parallel composition laid out vertically, heights add *) 

primrec 

"height 0 R d r = 0" 

"height (Suc n) R d r = ( 

Height (d ;;; R $ (int (Suc n)) ;;; r) + 

height n R (d) (r) 

)" 

primrec 

"width 0 R d r = 0" 

"width (Suc n) R d r = ( 

max (Width (d ;;; R $ (int (Suc n)) ;;; r)) 

(width n R (d) (r)) 

)" 


theorem height_ge0_int [rule_format]: "!! (n::nat) (R::((int=>’t31=>’t31=>bool,int=>’ 

t31=>’t31=>int)block)). [| n >= 0 ; ALL (qs148::int) (qs149::’t31) (qs150::’t31) 

. 0 ’t31=>bool,int=>’ 

t31=>’t31=>int)block)). [| n >= 0 ; ALL (qs148::int) (qs149::’t31) (qs150::’t31)


. 0 ’t31=>bool,int=>’t31=>’t31=>int) 

block)) (d::(’t31)vector) (r::(’t31)vector). [| n >= 0 ; ALL (qs148::int) (qs149 

::’t31) (qs150::’t31). 0 bool,int=>’t31=>’t31=>int) 

block)) (d::(’t31)vector) (r::(’t31)vector). [| n >= 0 ; ALL (qs148::int) (qs149 

::’t31) (qs150::’t31). 0 bool,int=>’t31=>’t31=>int)block)) (d::(’ 

t31)vector) (r::(’t31)vector). [| n >= 0 ; if n = 0 then d = r else Def (d ;;; 

converse $ (apr $ (n - 1)) ;; [[ imap $ (n - 1, R), R $ n ]] ;; apr $ (n - 1) 

;;; r) ; ALL (qs148::int) (qs149::’t31) (qs150::’t31). 0


/∗∗ Row of R blocks with left inputs connected to right outputs of previous block. 

Supply increasing integer parameter from 0 to n−1. ∗/ 

block irow (int n, block R int (‘a, ‘b) ∼ (‘c, ‘a)) (‘a l , ‘b t[n]) ∼ (‘c b[n], ‘a r) 

attributes { 

height = if(n==0, 0, height((l, t) ; snd (converse (apr (n − 1))) ; beside (irow (n−1, R), 

R n) ; fst (apr (n−1)) ; (b, r))). 

width = if(n==0, 0, width((l, t) ;snd (converse (apr (n − 1))) ; beside (irow (n−1, R), R 

n) ; fst (apr (n−1)) ; (b, r))). 

} { 

// Wires: l = left, t = top, b = bottom, r = right 

assert (n >= 0) ”n >= 0 is required”. 

if (n == 0) { l = r. } // b and t are empty vectors anyway 

else { 

(l , t) ; 

snd (converse (apr (n − 1))) ; 

beside (irow (n−1, R), R n) ; 

fst (apr (n−1)) ; 

(b, r) at (0,0). 

} . 

} 

theory irow = fst + beside + apr + converse + snd: 


consts 

struct:: "nat=>((int=>(’t107*’t108)=>(’t109*’t107)=>bool,int=>(’t107*’t108)=>(’ 

t109*’t107)=>int)block)=>’t107=>(’t108)vector=>(’t109)vector=>’t107=>bool" 

height:: "nat=>((int=>(’t107*’t108)=>(’t109*’t107)=>bool,int=>(’t107*’t108)=>(’ 

t109*’t107)=>int)block)=>’t107=>(’t108)vector=>(’t109)vector=>’t107=>int" 

width:: "nat=>((int=>(’t107*’t108)=>(’t109*’t107)=>bool,int=>(’t107*’t108)=>(’ 

t109*’t107)=>int)block)=>’t107=>(’t108)vector=>(’t109)vector=>’t107=>int" 

irow:: "((int*((int=>(’t107*’t108)=>(’t109*’t107)=>bool,int=>(’t107*’t108)=>(’ 

t109*’t107)=>int)block))=>(’t107*(’t108)vector)=>((’t109)vector*’t107)=>bool, 

(int*((int=>(’t107*’t108)=>(’t109*’t107)=>bool,int=>(’t107*’t108)=>(’t109*’ 

t107)=>int)block))=>(’t107*(’t108)vector)=>((’t109)vector*’t107)=>int)block" 

defs 

irow_def: "irow == (| Def = % (n, R) (l, t) (b, r). struct (int2nat n) R l t b r, 

Height = % (n, R) (l, t) (b, r). height (int2nat n) R l t b r, Width = % (n, 

R) (l, t) (b, r). width (int2nat n) R l t b r |)" 

primrec 

"struct 0 R l t b r = (l = r)" 

"struct (Suc n) R l t b r = 

Def beside 

(((| Def = % (l, t) (b, r). struct n R l t b r, 

Height = % a b. arbitrary, 

Width = % a b. arbitrary|)), 

R $ (int (Suc n))) (l, (t, t)) ((b, b< 

int n>), r)" 

primrec 

"height 0 R l t b r = 0" 

"height (Suc n) R l t b r = 

Height beside 

(((| Def = % a b. arbitrary,


Height = % (l,t) (b,r). height n R l t b r, 

Width = % (l,t) (b,r). width n R l t b r|)), 

R $ (int (Suc n))) (l, (t, t)) ((b, b< 

int n>), r)" 

primrec 

"width 0 R l t b r = 0" 

"width (Suc n) R l t b r = 

Width ((l, t) ;;; 

snd $ (converse $ (apr $ (int n))) ;; 

beside $ ((| Def = % b c. arbitrary, Height = % b c. arbitrary, Width = % (l 

,t) (b, r). width n R l t b r |), R $ (int n + 1)) ;; 

fst $ (apr $ (int n)) 

;;; (b, r) 

)" 


theorem height_ge0_int [rule_format]: "!! (n::nat) R. [| ALL (qs137::int) (qs138::(’ 

t107*’t108)) (qs139::(’t109*’t107)). 0 (’t109*’ 

t107)=>bool,int=>(’t107*’t108)=>(’t109*’t107)=>int)block)). [| ALL (qs137::int) 

(qs138::(’t107*’t108)) (qs139::(’t109*’t107)). 0 (’t109*’t107)=>bool,int 

=>(’t107*’t108)=>(’t109*’t107)=>int)block)) (l::’t107) (t::(’t108)vector) (b::(’ 

t109)vector) (r::’t107). [| n >= 0 ; ALL (qs137::int) (qs138::(’t107*’t108)) ( 

qs139::(’t109*’t107)). 0


done 

theorem width_ge0: "!! (n::int) (R::((int=>(’t107*’t108)=>(’t109*’t107)=>bool,int=>(’ 

t107*’t108)=>(’t109*’t107)=>int)block)) (l::’t107) (t::(’t108)vector) (b::(’t109 

)vector) (r::’t107). [| n >= 0 ; ALL (qs137::int) (qs138::(’t107*’t108)) (qs139 

::(’t109*’t107)). 0 (’t154*’t165)=>bool,int=>(’t165*’t153) 

=>(’t154*’t165)=>int)block)) (l::’t165) (t::(’t153)vector) (b::(’t154)vector) (r 

::’t165). [| n >= 0 ; if n = 0 then l = r else Def ((l, t) ;;; snd $ (converse $ 

(apr $ (n - 1))) ;; beside $ (irow $ (n - 1, R), R $ n) ;; fst $ (apr $ (n - 1) 

) ;;; (b, r)) ; ALL (qs205::int) (qs206::(’t165*’t153)) (qs207::(’t154*’t165)). 

0


theory irdlelem = pi2 + converse: 


consts 

struct:: "((int=>(’t36*’t28)=>’t36=>bool,int=>(’t36*’t28)=>’t36=>int)block)=>int 

=>(’t36*’t28)=>(’t28*’t36)=>bool" 

height:: "((int=>(’t36*’t28)=>’t36=>bool,int=>(’t36*’t28)=>’t36=>int)block)=>int 

=>(’t36*’t28)=>(’t28*’t36)=>int" 

width:: "((int=>(’t36*’t28)=>’t36=>bool,int=>(’t36*’t28)=>’t36=>int)block)=>int 

=>(’t36*’t28)=>(’t28*’t36)=>int" 

irdlelem:: "(((int=>(’t36*’t28)=>’t36=>bool,int=>(’t36*’t28)=>’t36=>int)block)=> 

int=>(’t36*’t28)=>(’t28*’t36)=>bool, ((int=>(’t36*’t28)=>’t36=>bool,int=>(’ 

t36*’t28)=>’t36=>int)block)=>int=>(’t36*’t28)=>(’t28*’t36)=>int)block" 

defs 

struct_def: "struct == % R n (l, t) (b, r). Def ((l, t) ;;; R $ n ;; converse $ ( 

pi2) ;;; (b, r))" 

height_def: "height == % R n (l, t) (b, r). Height ((l, t) ;;; R $ n ;; converse 

$ (pi2) ;;; (b, r))" 

width_def: "width == % R n (l, t) (b, r). Width ((l, t) ;;; R $ n ;; converse $ ( 

pi2) ;;; (b, r))" 

irdlelem_def: "irdlelem == (| Def = struct, Height = height, Width = width |)" 





theorem height_ge0_int: "!! (R::((int=>(’t36*’t28)=>’t36=>bool,int=>(’t36*’t28)=>’t36 

=>int)block)) (n::int) (l::’t36) (t::’t28) (b::’t28) (r::’t36). [| ALL (qs643:: 

int) (qs644::(’t36*’t28)) (qs645::’t36). 0 

0 (’t36*’t28)=>’t36=>bool,int=>(’t36*’t28)=>’t36 

=>int)block)) (n::int) (l::’t36) (t::’t28) (b::’t28) (r::’t36). [| ALL (qs643:: 

int) (qs644::(’t36*’t28)) (qs645::’t36). 0 

0 (’t36*’t28)=>’t36=>bool,int=>(’t36*’t28)=>’t36=> 

int)block)) (n::int) (l::’t36) (t::’t28) (b::’t28) (r::’t36). [| ALL (qs643::int 

) (qs644::(’t36*’t28)) (qs645::’t36). 0 

0


apply (simp (no_asm_simp) del: height_def width_def add: Let_def max_def converse_def 

pi2_def irdlelem_def, 

(rule height_ge0_int, (simp+)?)?) 

done 

theorem width_ge0: "!! (R::((int=>(’t36*’t28)=>’t36=>bool,int=>(’t36*’t28)=>’t36=>int 

)block)) (n::int) (l::’t36) (t::’t28) (b::’t28) (r::’t36). [| ALL (qs643::int) ( 

qs644::(’t36*’t28)) (qs645::’t36). 0 

0 (’t36*’t28)=>’t36=>bool,int=>(’t36*’t28)=>’t36=>int)block)) (n 

::int) (l::’t36) (t::’t28) (b::’t28) (r::’t36). [| Def ((l, t) ;;; R $ n ;; 

converse $ (pi2) ;;; (b, r)) ; ALL (qs643::int) (qs644::(’t36*’t28)) (qs645::’ 

t36). 0 ’t298=>int) 

block))=>(’t298*(’t257)vector)=>’t298=>bool" 

height:: "(int*((int=>(’t298*’t257)=>’t298=>bool,int=>(’t298*’t257)=>’t298=>int) 

block))=>(’t298*(’t257)vector)=>’t298=>int" 

width:: "(int*((int=>(’t298*’t257)=>’t298=>bool,int=>(’t298*’t257)=>’t298=>int) 

block))=>(’t298*(’t257)vector)=>’t298=>int" 

irdl:: "((int*((int=>(’t298*’t257)=>’t298=>bool,int=>(’t298*’t257)=>’t298=>int) 

block))=>(’t298*(’t257)vector)=>’t298=>bool, (int*((int=>(’t298*’t257)=>’t298 

=>bool,int=>(’t298*’t257)=>’t298=>int)block))=>(’t298*(’t257)vector)=>’t298=> 

int)block" 

defs 

struct_def: "struct == % (n, R) (l, t) r. Def ((l, t) ;;; irow $ (n, irdlelem $ ( 

R)) ;; pi2 ;;; r)" 

height_def: "height == % (n, R) (l, t) r. Height ((l, t) ;;; irow $ (n, irdlelem 

$ (R)) ;; pi2 ;;; r)" 

width_def: "width == % (n, R) (l, t) r. Width ((l, t) ;;; irow $ (n, irdlelem $ ( 

R)) ;; pi2 ;;; r)"


irdl_def: "irdl == (| Def = struct, Height = height, Width = width |)" 





theorem height_ge0_int: "!! (n::int) (R::((int=>(’t298*’t257)=>’t298=>bool,int=>(’ 

t298*’t257)=>’t298=>int)block)) (l::’t298) (t::(’t257)vector) (r::’t298). [| 0 

bool,int=>(’t298 

*’t257)=>’t298=>int)block)) (l::’t298) (t::(’t257)vector) (r::’t298). [| 0 bool,int=>(’t298*’ 

t257)=>’t298=>int)block)) (l::’t298) (t::(’t257)vector) (r::’t298). [| 0 bool,int=>(’t298*’ 

t257)=>’t298=>int)block)) (l::’t298) (t::(’t257)vector) (r::’t298). [| 0


theorem "!! (n::int) (R::((int=>(’t298*’t257)=>’t298=>bool,int=>(’t298*’t257)=>’t298 

=>int)block)) (l::’t298) (t::(’t257)vector) (r::’t298). [| Def ((l, t) ;;; irow 

$ (n, irdlelem $ (R)) ;; pi2 ;;; r) ; ALL (qs649::int) (qs650::(’t298*’t257)) ( 

qs651::’t298). 0 ’t22=>int)block)=>int=> 

int=>’t21=>’t22=>bool" 

height:: "(((int*int)=>’t21=>’t22=>bool,(int*int)=>’t21=>’t22=>int)block)=>int=> 

int=>’t21=>’t22=>int" 

width:: "(((int*int)=>’t21=>’t22=>bool,(int*int)=>’t21=>’t22=>int)block)=>int=> 

int=>’t21=>’t22=>int" 

curry:: "((((int*int)=>’t21=>’t22=>bool,(int*int)=>’t21=>’t22=>int)block)=>int=> 

int=>’t21=>’t22=>bool, (((int*int)=>’t21=>’t22=>bool,(int*int)=>’t21=>’t22=> 

int)block)=>int=>int=>’t21=>’t22=>int)block" 

defs 

struct_def: "struct == % R m n d r. Def (d ;;; R $ (m, n) ;;; r)" 

height_def: "height == % R m n d r. Height (d ;;; R $ (m, n) ;;; r)" 

width_def: "width == % R m n d r. Width (d ;;; R $ (m, n) ;;; r)" 

curry_def: "curry == (| Def = struct, Height = height, Width = width |)" 





theorem height_ge0_int: "!! (R::(((int*int)=>’t21=>’t22=>bool,(int*int)=>’t21=>’t22=> 

int)block)) (m::int) (n::int) (d::’t21) (r::’t22). [| ALL (qs383::(int*int)) (


qs384::’t21) (qs385::’t22). 0 ’t22=>bool,(int*int)=>’t21=>’t22=> 

int)block)) (m::int) (n::int) (d::’t21) (r::’t22). [| ALL (qs383::(int*int)) ( 

qs384::’t21) (qs385::’t22). 0 ’t22=>bool,(int*int)=>’t21=>’t22=>int) 

block)) (m::int) (n::int) (d::’t21) (r::’t22). [| ALL (qs383::(int*int)) (qs384 

::’t21) (qs385::’t22). 0 ’t22=>bool,(int*int)=>’t21=>’t22=>int) 

block)) (m::int) (n::int) (d::’t21) (r::’t22). [| ALL (qs383::(int*int)) (qs384 

::’t21) (qs385::’t22). 0 ’t22=>bool,(int*int)=>’t21=>’t22=>int)block)) (m:: 

int) (n::int) (d::’t21) (r::’t22). [| Def (d ;;; R $ (m, n) ;;; r) ; ALL (qs383 

::(int*int)) (qs384::’t21) (qs385::’t22). 0



consts 

struct:: "((int=>int=>(’t180*’t180)=>(’t180*’t180)=>bool,int=>int=>(’t180*’t180) 

=>(’t180*’t180)=>int)block)=>int=>int=>((’t180)vector*’t180)=>(’t180*(’t180) 

vector)=>bool" 

height:: "((int=>int=>(’t180*’t180)=>(’t180*’t180)=>bool,int=>int=>(’t180*’t180) 


vector)=>int" 

width:: "((int=>int=>(’t180*’t180)=>(’t180*’t180)=>bool,int=>int=>(’t180*’t180) 


vector)=>int" 

igrid1:: "(((int=>int=>(’t180*’t180)=>(’t180*’t180)=>bool,int=>int=>(’t180*’t180) 


vector)=>bool, ((int=>int=>(’t180*’t180)=>(’t180*’t180)=>bool,int=>int=>(’ 

t180*’t180)=>(’t180*’t180)=>int)block)=>int=>int=>((’t180)vector*’t180)=>(’ 

t180*(’t180)vector)=>int)block" 

defs 

struct_def: "struct == % R n i (l, t) (b, r). Def ((l, t) ;;; icol $ (n, R $ i) 

;;; (b, r))" 

height_def: "height == % R n i (l, t) (b, r). Height ((l, t) ;;; icol $ (n, R $ i 

) ;;; (b, r))" 

width_def: "width == % R n i (l, t) (b, r). Width ((l, t) ;;; icol $ (n, R $ i) 

;;; (b, r))" 

igrid1_def: "igrid1 == (| Def = struct, Height = height, Width = width |)" 





theorem height_ge0_int: "!! (R::((int=>int=>(’t180*’t180)=>(’t180*’t180)=>bool,int=> 

int=>(’t180*’t180)=>(’t180*’t180)=>int)block)) (n::int) (i::int) (l::(’t180) 

vector) (t::’t180) (b::’t180) (r::(’t180)vector). [| ALL (qs389::int) (qs390:: 

int) (qs391::(’t180*’t180)) (qs392::(’t180*’t180)). 0 (’t180*’t180)=>bool,int=> 

int=>(’t180*’t180)=>(’t180*’t180)=>int)block)) (n::int) (i::int) (l::(’t180) 

vector) (t::’t180) (b::’t180) (r::(’t180)vector). [| ALL (qs389::int) (qs390:: 

int) (qs391::(’t180*’t180)) (qs392::(’t180*’t180)). 0


section {* Additional simplification rules for different representations *} 

theorem height_ge0: "!! (R::((int=>int=>(’t180*’t180)=>(’t180*’t180)=>bool,int=>int 

=>(’t180*’t180)=>(’t180*’t180)=>int)block)) (n::int) (i::int) (l::(’t180)vector) 

(t::’t180) (b::’t180) (r::(’t180)vector). [| ALL (qs389::int) (qs390::int) ( 

qs391::(’t180*’t180)) (qs392::(’t180*’t180)). 0 (’t180*’t180)=>bool,int=>int 

=>(’t180*’t180)=>(’t180*’t180)=>int)block)) (n::int) (i::int) (l::(’t180)vector) 

(t::’t180) (b::’t180) (r::(’t180)vector). [| ALL (qs389::int) (qs390::int) ( 

qs391::(’t180*’t180)) (qs392::(’t180*’t180)). 0 (’t180*’t180)=>bool,int=>int=>(’t180*’t180 

)=>(’t180*’t180)=>int)block)) (n::int) (i::int) (l::(’t180)vector) (t::’t180) (b 

::’t180) (r::(’t180)vector). [| Def ((l, t) ;;; icol $ (n, R $ i) ;;; (b, r)) ; 

ALL (qs389::int) (qs390::int) (qs391::(’t180*’t180)) (qs392::(’t180*’t180)). 0 

(’t313*’t313)=>int)block))=>((’t313)vector*(’t313)vector)=>((’ 

t313)vector*(’t313)vector)=>bool"


height:: "(int*int*(((int*int)=>(’t313*’t313)=>(’t313*’t313)=>bool,(int*int)=>(’ 

t313*’t313)=>(’t313*’t313)=>int)block))=>((’t313)vector*(’t313)vector)=>((’ 

t313)vector*(’t313)vector)=>int" 

width:: "(int*int*(((int*int)=>(’t313*’t313)=>(’t313*’t313)=>bool,(int*int)=>(’ 


t313)vector*(’t313)vector)=>int" 

igrid:: "((int*int*(((int*int)=>(’t313*’t313)=>(’t313*’t313)=>bool,(int*int)=>(’ 


t313)vector*(’t313)vector)=>bool, (int*int*(((int*int)=>(’t313*’t313)=>(’t313 

*’t313)=>bool,(int*int)=>(’t313*’t313)=>(’t313*’t313)=>int)block))=>((’t313) 

vector*(’t313)vector)=>((’t313)vector*(’t313)vector)=>int)block" 

defs 

struct_def: "struct == % (m, n, R) (l, t) (b, r). Def ((l, t) ;;; irow $ (m, 

igrid1 $ (curry $ (R)) $ n) ;;; (b, r))" 

height_def: "height == % (m, n, R) (l, t) (b, r). Height ((l, t) ;;; irow $ (m, 


width_def: "width == % (m, n, R) (l, t) (b, r). Width ((l, t) ;;; irow $ (m, 


igrid_def: "igrid == (| Def = struct, Height = height, Width = width |)" 





theorem height_ge0_int: "!! (m::int) (n::int) (R::(((int*int)=>(’t313*’t313)=>(’t313 

*’t313)=>bool,(int*int)=>(’t313*’t313)=>(’t313*’t313)=>int)block)) (l::(’t313) 

vector) (t::(’t313)vector) (b::(’t313)vector) (r::(’t313)vector). [| ALL (qs396 

::(int*int)) (qs397::(’t313*’t313)) (qs398::(’t313*’t313)). 0 (’t313*’t313)=>(’t313*’t313)=>int)block)) (l::(’t313) 


::(int*int)) (qs397::(’t313*’t313)) (qs398::(’t313*’t313)). 0 (’t313*’t313)=>(’t313*’t313)=>int)block)) (l::(’t313)



::(int*int)) (qs397::(’t313*’t313)) (qs398::(’t313*’t313)). 0 (’t313*’t313)=>(’t313*’t313)=>int)block)) (l::(’t313)vector) 

(t::(’t313)vector) (b::(’t313)vector) (r::(’t313)vector). [| ALL (qs396::(int* 

int)) (qs397::(’t313*’t313)) (qs398::(’t313*’t313)). 0 (’t313*’t313)=>(’t313*’t313)=>int)block)) (l::(’t313)vector) (t::(’ 

t313)vector) (b::(’t313)vector) (r::(’t313)vector). [| Def ((l, t) ;;; irow $ (m 

, igrid1 $ (curry $ (R)) $ n) ;;; (b, r)) ; ALL (qs396::(int*int)) (qs397::(’ 

t313*’t313)) (qs398::(’t313*’t313)). 0


} { 

a1 ; A ; a2 

at (0,0). 

b1 ; B ; b2 

at (0, height(a1 ; A ; a2)). 

c1 ; C ; c2 

at (width(b1 ;B ; b2), height(a1 ; A ; a2)). 

d1 ; D ; d2 

at (width (b1 ;B ; b2), 

max (height (c1 ;C ; c2) + height(a1 ;A ; a2), height(e1 ; E ; e2))). 

e1 ; E ; e2 

at (max (width (a1 ;A ;a2), width(c1 ;C ; c2) + width(b1 ;B ; b2)), 0). 

} 

theory irregular_grid = QuartzLayout: 


consts 

struct:: "(((’t15=>’t16=>bool,’t15=>’t16=>int)block)*((’t18=>’t19=>bool,’t18=>’ 

t19=>int)block)*((’t21=>’t22=>bool,’t21=>’t22=>int)block)*((’t24=>’t25=>bool 

,’t24=>’t25=>int)block)*((’t27=>’t28=>bool,’t27=>’t28=>int)block))=>(’t15*’ 

t18*’t21*’t24*’t27)=>(’t16*’t19*’t22*’t25*’t28)=>bool" 

height:: "(((’t15=>’t16=>bool,’t15=>’t16=>int)block)*((’t18=>’t19=>bool,’t18=>’ 

t19=>int)block)*((’t21=>’t22=>bool,’t21=>’t22=>int)block)*((’t24=>’t25=>bool 

,’t24=>’t25=>int)block)*((’t27=>’t28=>bool,’t27=>’t28=>int)block))=>(’t15*’ 

t18*’t21*’t24*’t27)=>(’t16*’t19*’t22*’t25*’t28)=>int" 

width:: "(((’t15=>’t16=>bool,’t15=>’t16=>int)block)*((’t18=>’t19=>bool,’t18=>’t19 

=>int)block)*((’t21=>’t22=>bool,’t21=>’t22=>int)block)*((’t24=>’t25=>bool,’ 

t24=>’t25=>int)block)*((’t27=>’t28=>bool,’t27=>’t28=>int)block))=>(’t15*’t18 

*’t21*’t24*’t27)=>(’t16*’t19*’t22*’t25*’t28)=>int" 

irregular_grid:: "((((’t15=>’t16=>bool,’t15=>’t16=>int)block)*((’t18=>’t19=>bool 

,’t18=>’t19=>int)block)*((’t21=>’t22=>bool,’t21=>’t22=>int)block)*((’t24=>’ 

t25=>bool,’t24=>’t25=>int)block)*((’t27=>’t28=>bool,’t27=>’t28=>int)block)) 

=>(’t15*’t18*’t21*’t24*’t27)=>(’t16*’t19*’t22*’t25*’t28)=>bool, (((’t15=>’t16 

=>bool,’t15=>’t16=>int)block)*((’t18=>’t19=>bool,’t18=>’t19=>int)block)*((’ 

t21=>’t22=>bool,’t21=>’t22=>int)block)*((’t24=>’t25=>bool,’t24=>’t25=>int) 

block)*((’t27=>’t28=>bool,’t27=>’t28=>int)block))=>(’t15*’t18*’t21*’t24*’t27) 

=>(’t16*’t19*’t22*’t25*’t28)=>int)block" 

defs 

struct_def: "struct == % (A, B, C, D, E) (a1, b1, c1, d1, e1) (a2, b2, c2, d2, e2 

). Def (a1 ;;; A ;;; a2) & Def (b1 ;;; B ;;; b2) & Def (c1 ;;; C ;;; c2) & 

Def (d1 ;;; D ;;; d2) & Def (e1 ;;; E ;;; e2)" 

height_def: "height == % (A, B, C, D, E) (a1, b1, c1, d1, e1) (a2, b2, c2, d2, e2 

). max ((Height (a1 ;;; A ;;; a2)) + (Height (b1 ;;; B ;;; b2))) ((max (( 

Height (a1 ;;; A ;;; a2)) + (Height (c1 ;;; C ;;; c2)) + (Height (d1 ;;; D 

;;; d2))) ((Height (e1 ;;; E ;;; e2)) + (Height (d1 ;;; D ;;; d2)))))" 

width_def: "width == % (A, B, C, D, E) (a1, b1, c1, d1, e1) (a2, b2, c2, d2, e2). 

max ((Width (a1 ;;; A ;;; a2)) + (Width (e1 ;;; E ;;; e2))) ((max ((Width ( 

b1 ;;; B ;;; b2)) + (Width (c1 ;;; C ;;; c2)) + (Width (e1 ;;; E ;;; e2))) (( 

Width (b1 ;;; B ;;; b2)) + (Width (d1 ;;; D ;;; d2)))))" 

irregular_grid_def: "irregular_grid == (| Def = struct, Height = height, Width = 

width |)" 


declare height_def [simp]




theorem height_ge0_int: "!! (A::((’t15=>’t16=>bool,’t15=>’t16=>int)block)) (B::((’t18 

=>’t19=>bool,’t18=>’t19=>int)block)) (C::((’t21=>’t22=>bool,’t21=>’t22=>int) 

block)) (D::((’t24=>’t25=>bool,’t24=>’t25=>int)block)) (E::((’t27=>’t28=>bool,’ 

t27=>’t28=>int)block)) (a1::’t15) (b1::’t18) (c1::’t21) (d1::’t24) (e1::’t27) ( 

a2::’t16) (b2::’t19) (c2::’t22) (d2::’t25) (e2::’t28). [| ALL (qs40::’t15) (qs41 

::’t16). 0 bool,’t21=>’t22=>int) 

block)) (D::((’t24=>’t25=>bool,’t24=>’t25=>int)block)) (E::((’t27=>’t28=>bool,’ 

t27=>’t28=>int)block)) (a1::’t15) (b1::’t18) (c1::’t21) (d1::’t24) (e1::’t27) ( 

a2::’t16) (b2::’t19) (c2::’t22) (d2::’t25) (e2::’t28). [| ALL (qs40::’t15) (qs41 

::’t16). 0 bool,’t21=>’t22=>int)block)) 

(D::((’t24=>’t25=>bool,’t24=>’t25=>int)block)) (E::((’t27=>’t28=>bool,’t27=>’ 

t28=>int)block)) (a1::’t15) (b1::’t18) (c1::’t21) (d1::’t24) (e1::’t27) (a2::’ 

t16) (b2::’t19) (c2::’t22) (d2::’t25) (e2::’t28). [| ALL (qs40::’t15) (qs41::’ 

t16). 0


Width (qs48 ;;; E ;;; qs49)) |] ==> 

0 ’t16=>bool,’t15=>’t16=>int)block)) (B::((’t18=>’ 

t19=>bool,’t18=>’t19=>int)block)) (C::((’t21=>’t22=>bool,’t21=>’t22=>int)block)) 

(D::((’t24=>’t25=>bool,’t24=>’t25=>int)block)) (E::((’t27=>’t28=>bool,’t27=>’ 

t28=>int)block)) (a1::’t15) (b1::’t18) (c1::’t21) (d1::’t24) (e1::’t27) (a2::’ 

t16) (b2::’t19) (c2::’t22) (d2::’t25) (e2::’t28). [| ALL (qs40::’t15) (qs41::’ 

t16). 0 bool,’t21=>’t22=>int)block)) (D::((’t24 

=>’t25=>bool,’t24=>’t25=>int)block)) (E::((’t27=>’t28=>bool,’t27=>’t28=>int) 

block)) (a1::’t15) (b1::’t18) (c1::’t21) (d1::’t24) (e1::’t27) (a2::’t16) (b2::’ 

t19) (c2::’t22) (d2::’t25) (e2::’t28). [| Def (a1 ;;; A ;;; a2) ; Def (b1 ;;; B 

;;; b2) ; Def (c1 ;;; C ;;; c2) ; Def (d1 ;;; D ;;; d2) ; Def (e1 ;;; E ;;; e2) 

; ALL (qs40::’t15) (qs41::’t16). 0


theorem "!! (A::((’t15=>’t16=>bool,’t15=>’t16=>int)block)) (B::((’t18=>’t19=>bool,’ 

t18=>’t19=>int)block)) (C::((’t21=>’t22=>bool,’t21=>’t22=>int)block)) (D::((’t24 





; ALL (qs40::’t15) (qs41::’t16). 0 ’t25=>bool,’t24=>’t25=>int)block)) (E::((’t27=>’t28=>bool,’t27=>’t28=>int) 







;;; b2) ; Def (c1 ;;; C ;;; c2) ; Def (d1 ;;; D ;;; d2) ; Def (e1 ;;; E ;;; e2)






; ALL (qs40::’t15) (qs41::’t16). 0 ’t25=>bool,’t24=>’t25=>int)block)) (E::((’t27=>’t28=>bool,’t27=>’t28=>int)









; ALL (qs40::’t15) (qs41::’t16). 0 ’t28=>bool,’t27=>’t28=>int) 




; ALL (qs40::’t15) (qs41::’t16). 0


((((Width (b1 ;;; B ;;; b2)) + (Width (c1 ;;; C ;;; c2))) int)block)) (D::((’t24 





; ALL (qs40::’t15) (qs41::’t16). 0


by auto 

end 

bool,(’t11*’t12)=>(’t13*’t14)=>int)block) 

*(((’t16=>’t11=>bool,’t16=>’t11=>int)block)*((’t18=>’t12=>bool,’t18=>’t12=> 

int)block))*(((’t13=>’t20=>bool,’t13=>’t20=>int)block)*((’t14=>’t22=>bool,’ 

t14=>’t22=>int)block)))=>(’t16*’t18)=>(’t20*’t22)=>bool" 

height:: "((((’t11*’t12)=>(’t13*’t14)=>bool,(’t11*’t12)=>(’t13*’t14)=>int)block) 



t14=>’t22=>int)block)))=>(’t16*’t18)=>(’t20*’t22)=>int" 

width:: "((((’t11*’t12)=>(’t13*’t14)=>bool,(’t11*’t12)=>(’t13*’t14)=>int)block) 



t14=>’t22=>int)block)))=>(’t16*’t18)=>(’t20*’t22)=>int" 

surround:: "(((((’t11*’t12)=>(’t13*’t14)=>bool,(’t11*’t12)=>(’t13*’t14)=>int) 

block)*(((’t16=>’t11=>bool,’t16=>’t11=>int)block)*((’t18=>’t12=>bool,’t18=>’ 

t12=>int)block))*(((’t13=>’t20=>bool,’t13=>’t20=>int)block)*((’t14=>’t22=> 

bool,’t14=>’t22=>int)block)))=>(’t16*’t18)=>(’t20*’t22)=>bool, ((((’t11*’t12) 

=>(’t13*’t14)=>bool,(’t11*’t12)=>(’t13*’t14)=>int)block)*(((’t16=>’t11=>bool 

,’t16=>’t11=>int)block)*((’t18=>’t12=>bool,’t18=>’t12=>int)block))*(((’t13=>’ 

t20=>bool,’t13=>’t20=>int)block)*((’t14=>’t22=>bool,’t14=>’t22=>int)block))) 

=>(’t16*’t18)=>(’t20*’t22)=>int)block" 

defs 

struct_def: "struct == % (A, (B, C), (D, E)) (l, t) (b, r). EX (l2::’t11) (t2::’ 

t12) (b2::’t13) (r2::’t14). Def (l ;;; B ;;; l2) & Def (t ;;; C ;;; t2) & Def 

((l2, t2) ;;; A ;;; (b2, r2)) & Def (b2 ;;; D ;;; b) & Def (r2 ;;; E ;;; r)" 

height_def: "height == % (A, (B, C), (D, E)) (l, t) (b, r). 

let l2 = (THE (l2::’t11). EX (t2::’t12) (b2::’t13) (r2::’t14). Def (l ;;; B 

;;; l2) & Def (t ;;; C ;;; t2) & Def ((l2, t2) ;;; A ;;; (b2, r2)) & Def 

(b2 ;;; D ;;; b) & Def (r2 ;;; E ;;; r)) in


let t2 = (THE (t2::’t12). EX (l2::’t11) (b2::’t13) (r2::’t14). Def (l ;;; B 


(b2 ;;; D ;;; b) & Def (r2 ;;; E ;;; r)) in 

let b2 = (THE (b2::’t13). EX (l2::’t11) (t2::’t12) (r2::’t14). Def (l ;;; B 


(b2 ;;; D ;;; b) & Def (r2 ;;; E ;;; r)) in 

let r2 = (THE (r2::’t14). EX (l2::’t11) (t2::’t12) (b2::’t13). Def (l ;;; B 


(b2 ;;; D ;;; b) & Def (r2 ;;; E ;;; r)) in 

max ((Height (b2 ;;; D ;;; b)) + (Height (r2 ;;; E ;;; r))) ((max (Height (b2 

;;; D ;;; b)) ((max ((Height (b2 ;;; D ;;; b)) + (Height ((l2, t2) ;;; A 

;;; (b2, r2)))) ((max ((max ((Height (b2 ;;; D ;;; b)) + (Height (r2 ;;; 

E ;;; r))) ((Height (b2 ;;; D ;;; b)) + (Height ((l2, t2) ;;; A ;;; (b2, 

r2))))) + (Height (t ;;; C ;;; t2))) ((Height (b2 ;;; D ;;; b)) + ( 

Height (l ;;; B ;;; l2)))))))))" 

width_def: "width == % (A, (B, C), (D, E)) (l, t) (b, r). 

let l2 = (THE (l2::’t11). EX (t2::’t12) (b2::’t13) (r2::’t14). Def (l ;;; B 


(b2 ;;; D ;;; b) & Def (r2 ;;; E ;;; r)) in 

let t2 = (THE (t2::’t12). EX (l2::’t11) (b2::’t13) (r2::’t14). Def (l ;;; B 


(b2 ;;; D ;;; b) & Def (r2 ;;; E ;;; r)) in 

let b2 = (THE (b2::’t13). EX (l2::’t11) (t2::’t12) (r2::’t14). Def (l ;;; B 


(b2 ;;; D ;;; b) & Def (r2 ;;; E ;;; r)) in 

let r2 = (THE (r2::’t14). EX (l2::’t11) (t2::’t12) (b2::’t13). Def (l ;;; B 


(b2 ;;; D ;;; b) & Def (r2 ;;; E ;;; r)) in 

max ((Width (l ;;; B ;;; l2)) + (Width ((l2, t2) ;;; A ;;; (b2, r2))) + ( 

Width (r2 ;;; E ;;; r))) ((max ((Width (l ;;; B ;;; l2)) + (Width (b2 

;;; D ;;; b))) ((max ((Width (l ;;; B ;;; l2)) + (Width ((l2, t2) ;;; A 

;;; (b2, r2)))) ((max ((Width (l ;;; B ;;; l2)) + (Width (t ;;; C ;;; t2 

))) (Width (l ;;; B ;;; l2))))))))" 

surround_def: "surround == (| Def = struct, Height = height, Width = width|)" 





theorem height_ge0_int : "!! (A::(((’t11*’t12)=>(’t13*’t14)=>bool,(’t11*’t12)=>(’t13 

*’t14)=>int)block)) (B::((’t16=>’t11=>bool,’t16=>’t11=>int)block)) (C::((’t18=>’ 

t12=>bool,’t18=>’t12=>int)block)) (D::((’t13=>’t20=>bool,’t13=>’t20=>int)block)) 

(E::((’t14=>’t22=>bool,’t14=>’t22=>int)block)) (l::’t16) (t::’t18) (b::’t20) (r 

::’t22). [| ALL (qs32::(’t11*’t12)) (qs33::(’t13*’t14)). 0


theorem width_ge0_int : "!! (A::(((’t11*’t12)=>(’t13*’t14)=>bool,(’t11*’t12)=>(’t13*’ 

t14)=>int)block)) (B::((’t16=>’t11=>bool,’t16=>’t11=>int)block)) (C::((’t18=>’ 

t12=>bool,’t18=>’t12=>int)block)) (D::((’t13=>’t20=>bool,’t13=>’t20=>int)block)) 


::’t22). [| ALL (qs32::(’t11*’t12)) (qs33::(’t13*’t14)). 0 bool,’t18=>’t12=>int)block)) (D::((’t13=>’t20=>bool,’t13=>’t20=>int) 

block)) (E::((’t14=>’t22=>bool,’t14=>’t22=>int)block)) (l::’t16) (t::’t18) (b::’ 

t20) (r::’t22). [| ALL (qs32::(’t11*’t12)) (qs33::(’t13*’t14)). 0 bool,’t18=>’t12=>int)block)) (D::((’t13=>’t20=>bool,’t13=>’t20=>int)block)) 


::’t22). [| ALL (qs32::(’t11*’t12)) (qs33::(’t13*’t14)). 0


(rule width_ge0_int, (simp+)?)?) 

done 

section {* Containment theorems *} 

theorem "!! (A::(((’t11*’t12)=>(’t13*’t14)=>bool,(’t11*’t12)=>(’t13*’t14)=>int)block 

)) (B::((’t16=>’t11=>bool,’t16=>’t11=>int)block)) (C::((’t18=>’t12=>bool,’t18=>’ 

t12=>int)block)) (D::((’t13=>’t20=>bool,’t13=>’t20=>int)block)) (E::((’t14=>’t22 

=>bool,’t14=>’t22=>int)block)) (l::’t16) (t::’t18) (b::’t20) (r::’t22). [| Def ( 

l ;;; B ;;; l2) ; Def (t ;;; C ;;; t2) ; Def ((l2, t2) ;;; A ;;; (b2, r2)) ; Def 

(b2 ;;; D ;;; b) ; Def (r2 ;;; E ;;; r) ; ALL (qs32::(’t11*’t12)) (qs33::(’t13 

*’t14)). 0 ’t20=>bool,’t13=>’t20=>int)block)) (E::((’t14=>’t22 




*’t14)). 0


) ;;; A ;;; (b2, r2)))) ((Height (b2 ;;; D ;;; b)) + (Height (r2 ;;; E ;;; r) 

))) + (Height (t ;;; C ;;; t2))) (’t13*’t14)=>bool,(’t11*’t12)=>(’t13*’t14)=>int)block 










*’t14)). 0


(0 int)block 










*’t14)). 0


; ALL (qs38::’t13) (qs39::’t20). 0 ’t11=>int)block)) (C::((’t18=>’t12=>bool,’t18=>’ 





*’t14)). 0 ’t22 




*’t14)). 0


((((Width (l ;;; B ;;; l2)) + (Width ((l2, t2) ;;; A ;;; (b2, r2)))) int)block)) (D::((’t13=>’t20=>bool,’t13=>’t20=>int)block)) (E::((’t14=>’t22 




*’t14)). 0


by auto 

end 

C.5 H-Tree 

block htree (int n, block R (‘a, ‘a) ∼ ‘a) (‘a i [m]) ∼ (‘a o) 

{ 

const m = 2 ∗∗ n. 

‘a st1 in[m/2], st2 in[m/2], st1 out, st2 out. 

if n == 0 { 

o = i[0] . 

} else { 

i ; half (m/2) ; (st1 in, st2 in) 

at (0,0). 

if (n mod 2 == 0) { 

// Vertical sub−tree arrangement 


at (0,0). 


at (0, height(st1 in ; htree (n−1, R) ; st1 out)). 


at (0, height(st1 in;htree (n−1, R);st1 out) + height ((st1 out, st2 out);R;o 

)). 

} else { 

// Horizontal sub−tree arrangement 


at (0,0). 


at (width(st1 in ;htree (n−1, R) ; st1 out), 0). 


at (width(st1 in;htree (n−1, R);st1 out) + width ((st1 out, st2 out);R;o), 0) 

. 

} . 

} . 

} 

theory htree = half: 


consts 

struct:: "nat=>(((’t20*’t20)=>’t20=>bool,(’t20*’t20)=>’t20=>int)block)=>(’t20) 

vector=>’t20=>bool" 

height:: "nat=>(((’t20*’t20)=>’t20=>bool,(’t20*’t20)=>’t20=>int)block)=>(’t20) 

vector=>’t20=>int" 

width:: "nat=>(((’t20*’t20)=>’t20=>bool,(’t20*’t20)=>’t20=>int)block)=>(’t20) 

vector=>’t20=>int" 

htree:: "((int*(((’t20*’t20)=>’t20=>bool,(’t20*’t20)=>’t20=>int)block))=>(’t20) 

vector=>’t20=>bool, (int*(((’t20*’t20)=>’t20=>bool,(’t20*’t20)=>’t20=>int) 

block))=>(’t20)vector=>’t20=>int)block" 

defs 

htree_def: "htree == (| Def = % (n, R) i o_. struct (int2nat n) R i o_, Height = 

% (n, R) i o_. height (int2nat n) R i o_, Width = % (n, R) i o_. width ( 

int2nat n) R i o_|)"


primrec 

"struct 0 R i o_ = (o_ = i)" 

"struct (Suc n) R i o_ = (let m = (2 pwr (int (Suc n))) in 

EX (st1_in::(’t20)vector) (st2_in::(’t20)vector) (st1_out::’t20) (st2_out::’t20). 

Def (i ;;; half $ (m div 2) ;;; (st1_in, st2_in)) & 

(if ((Suc n) mod 2) = 0 then (struct n R st1_in st1_out) & Def ((st1_out, st2_out 

) ;;; R ;;; o_) & (struct n R st2_in st2_out) 

else (struct n R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & ( 

struct n R st2_in st2_out)))" 

primrec 

"height 0 R i o_ = 0" 

"height (Suc n) R i o_ = (let m = (2 pwr (int (Suc n))) in 

let st1_in = (THE (st1_in::(’t20)vector). EX (st2_in::(’t20)vector) (st1_out::’ 

t20) (st2_out::’t20). 


(if ((Suc n) mod 2) = 0 then (struct n R st1_in st1_out) & Def ((st1_out, 

st2_out) ;;; R ;;; o_) & (struct n R st2_in st2_out) 


struct n R st2_in st2_out)) 

) in 


t20) (st2_out::’t20). 






) in 

let st1_out = (THE (st1_out::’t20). EX (st1_in::(’t20)vector) (st2_in::(’t20) 

vector) (st2_out::’t20). 






) in 








) in 

max (Height (i ;;; half $ (m div 2) ;;; (st1_in, st2_in))) (if ((Suc n) mod 2) = 

0 then max ((height n R st1_in st1_out) + (Height ((st1_out, st2_out) ;;; R 

;;; o_)) + (height n R st2_in st2_out)) ((max ((height n R st1_in st1_out) + 

(Height ((st1_out, st2_out) ;;; R ;;; o_))) (height n R st1_in st1_out))) 

else max (height n R st2_in st2_out) ((max (Height ((st1_out, st2_out) ;;; R 

;;; o_)) (height n R st1_in st1_out)))) 

)" 

primrec 

"width 0 R i o_ = 0" 

"width (Suc n) R i o_ = (let m = (2 pwr (int (Suc n))) in


)" 


t20) (st2_out::’t20). 






) in 


t20) (st2_out::’t20). 






) in 








) in 








) in 

max (Width (i ;;; half $ (m div 2) ;;; (st1_in, st2_in))) (if ((Suc n) mod 2) = 0 

then max ((width n R st1_in st1_out) + (Width ((st1_out, st2_out) ;;; R ;;; 

o_)) + (width n R st2_in st2_out)) ((max ((width n R st1_in st1_out) + (Width 

((st1_out, st2_out) ;;; R ;;; o_))) (width n R st1_in st1_out))) else max ( 

width n R st2_in st2_out) ((max (Width ((st1_out, st2_out) ;;; R ;;; o_)) ( 

width n R st1_in st1_out)))) 


theorem height_ge0_int [rule_format]: "!! (n::nat) (R::(((’t20*’t20)=>’t20=>bool,(’ 

t20*’t20)=>’t20=>int)block)). [| ALL (qs67::(’t20*’t20)) (qs68::’t20). 0 

ALL i o_. 0 ’t20=>bool,(’t20 

*’t20)=>’t20=>int)block)). [| ALL (qs67::(’t20*’t20)) (qs68::’t20). 0


apply (induct_tac n) 

apply (auto intro: z_aleq_bc half.height_ge0 half.width_ge0 simp add: Let_def max_def 

half_def) 

done 


theorem height_ge0: "!! (n::int) (R::(((’t20*’t20)=>’t20=>bool,(’t20*’t20)=>’t20=>int 

)block)) (i::(’t20)vector) (o_::’t20). [| ALL (qs67::(’t20*’t20)) (qs68::’t20). 

0 bool,(’t20*’t20)=>’t20=>int) 

block)) (i::(’t20)vector) (o_::’t20). [| ALL (qs67::(’t20*’t20)) (qs68::’t20). 0 

bool,(’t20*’t20)=>’t20=>int)block)) (i 

::(’t20)vector) (o_::’t20). [| if n = 0 then o_ = i else Def (i ;;; half $ (m 

div 2) ;;; (st1_in, st2_in)) & (if (n mod 2) = 0 then (struct (int2nat (n - 1)) 

R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct (int2nat (n 

- 1)) R st2_in st2_out) else (struct (int2nat (n - 1)) R st1_in st1_out) & Def 

((st1_out, st2_out) ;;; R ;;; o_) & (struct (int2nat (n - 1)) R st2_in st2_out)) 

; m = (2 pwr n) ; ALL (qs67::(’t20*’t20)) (qs68::’t20). 0


0) & ((0::int)


st2_out) ;;; R ;;; o_)) + (width (int2nat (n - 1)) R st2_in st2_out)) ((max 

((width (int2nat (n - 1)) R st1_in st1_out) + (Width ((st1_out, st2_out) ;;; 

R ;;; o_))) (width (int2nat (n - 1)) R st1_in st1_out)))))) & ((0 + (height ( 

int2nat (n - 1)) R st1_in st1_out))


theorem "!! (n::int) (R::(((’t20*’t20)=>’t20=>bool,(’t20*’t20)=>’t20=>int)block)) (i 

::(’t20)vector) (o_::’t20). [| if n = 0 then o_ = i else Def (i ;;; half $ (m 

div 2) ;;; (st1_in, st2_in)) & (if (n mod 2) = 0 then (struct (int2nat (n - 1)) 

R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct (int2nat (n 

- 1)) R st2_in st2_out) else (struct (int2nat (n - 1)) R st1_in st1_out) & Def 

((st1_out, st2_out) ;;; R ;;; o_) & (struct (int2nat (n - 1)) R st2_in st2_out)) 

; m = (2 pwr n) ; ALL (qs67::(’t20*’t20)) (qs68::’t20). 0


& ((((width (int2nat (n - 1)) R st1_in st1_out) + (Width ((st1_out, st2_out) ;;; 

R ;;; o_)))

Appendix D 

Circuit Layout Case Studies 

This appendix contains the Quartz descriptions and layout verification proofs for a number of 

full hardware designs for the Xilinx Virtex II FPGA architecture. Proof scripts for hardware 

primitives have sometimes been omitted, although they have proof scripts and theorems 

generated like any other block (though empty structures). Useful layout reasoning is at the 

level of half-slices which have size 1 × 1. 

D.1 Median Filter 

D.1.1 Quartz Description 

/∗∗ Median filter in Quartz. 

@author Oliver Pell 

1−D median filter implemented as a state machine with single serial input. 

∗/ 

directive vhdl ”target:virtex2”. 

directive vhdl ”include:ieee header”. 

#include ”p prelude.qtz” 

/∗ Primitives ∗/ 

block or2 (wire a, wire b) ∼ (wire c) attributes { height=1. width=1. }{ } 

block and2 (wire a, wire b) ∼ (wire c) attributes { height=1. width=1. }{ } 

block and3 (wire a, wire b, wire c) ∼ (wire d) attributes { height=1. width=1. }{ } 

block fd (wire c) (wire d) ∼ (wire q) attributes { height=1. width=1. }{ } 

block inv (wire a) ∼ (wire b) attributes { height=1. width=1. }{ } 

block mux lut (wire s) (wire d0, wire d1) ∼ (wire o) attributes { height=1. width=1. }{ } 

block mux lut ff (wire clk) (wire s) (wire d0, wire d1) ∼ (wire o) attributes { height=1. 

width=1. }{ } 

block comp lut ((wire a, wire b), wire s) ∼ (wire o) attributes { height=1. width=1. }{ } 

268

APPENDIX D. CIRCUIT LAYOUT CASE STUDIES 269 

block mux (int b) (wire s) (wire d0[b], wire d1[b]) ∼ (wire o[b]) { 

int j. 

for j = 0..b−1 { 

(d0[j ], d1[j ]) ; mux lut s ; (o[j ]) at (0, j). 

} . 

} 

block mux ff (wire clk) (int b) (wire s) (wire d0[b], wire d1[b]) ∼ (wire o[b]) { 

int j. 

for j = 0..b−1 { 

(d0[j ], d1[j ]) ; mux lut ff clk s ; (o[j ]) at (0, j). 

} . 

} 

block max2 (int bits) (wire a[bits ], wire b[bits ]) ∼ (wire c[bits ]) { 

wire a geq b[bits+1]. 

int j. 

a geq b[0] = true. 

for j = 0..bits−1 { 

((a[j ], b[j ]) , a geq b[j]) ; comp lut ;a geq b[j+1] at (0, j). 

} . 

(b, a) ; mux bits (a geq b[bits]) ; c at (1, 0). 

} 

block min2 (int bits) (wire a[bits ], wire b[bits ]) ∼ (wire c[bits ]) { 

wire a geq b[bits+1]. 

int j. 

a geq b[0] = true. 

for j = 0..bits−1 { 

((a[j ], b[j ]) , a geq b[j]) ; comp lut ;a geq b[j+1] at (0, j). 

} . 

(a, b) ; mux bits (a geq b[bits]) ; c at (1,0). 

} 

block eq (int n) (wire a[n], wire b[n]) ∼ (wire c) { 

int j. 

wire match[n+1]. 

match[0] = true. 

for j = 0..n−1 { 

(match[j], a[j ], b[j ]) ; and3 ; (match[j+1]) at (0, j). 

} . 

c = match[n]. 

} 

/∗∗ Insertion sort ∗/ 

block insert (int bits) (int n) (‘t a, ‘t b[n]) ∼ (‘t c[n+1]) → 

row (n, fork ; [min2 bits, max2 bits]) ; apr n. 

block lct cell (int bits) ((wire f, ‘t a), ‘t s) ∼ (wire d, (wire f2,‘t a2)) { 

wire a eq s. 

a2 = a. 

d = f. 

(a, s) ; eq bits ; a eq s at (0, height((a eq s, f) ;or2 ; f2)). 

(a eq s, f) ; or2 ; f2 at (0, 0). 

}


/∗∗ Locater block determines which value of the state should be discarded. 

@input bits Number of input bits 

@input n Size of state array 

@input a Value to look for 

@input s State array to look in 

@output d Array of true|false values to control the mode for the compactor 

∗/ 

block locater (int bits) (int n) (‘t a, ‘t s[n+1]) ∼ (wire d[n+1]) { 

wire found. 

found = false. 

((found, a), s) ; 

row (n+1, lct cell bits) ; 

pi1 ; 

d at (0,0). 

} 

/∗∗ Compactor cell. Operates in shift or through mode. Shift means we 

haven’t yet encountered the value to remove, through mode means we 

have. In shift mode we push the current value to the left and output 

the last value. In through mode we destroy the last value and output 

this value directly , which will also be done by all subsequent nodes 

∗/ 

block del cell (wire clk) (int bits) (‘t x, (‘t y, wire mode)) ∼ (‘t m, ‘t n) { 

(y, x) ; mux bits mode ;n at (0,0). 

(x, y) ; mux ff clk bits mode ;m at (1,0). 

} 

/∗∗ Compactor block takes a state of size n+1 and discards the specified 

element to produce a new state of size n. 

@input n Desired size of state array 

@input s State array to look in 

@input d Array of true|false values for whether this index is the value to 

be removed 

@output s2 Output state 

∗/ 

block compactor (wire clk) (int bits) (int n) (‘t s[n+1], wire d[n+1]) ∼ (‘t s2[n]) { 

(s [0], (s[n.. 1], d[n..1])) ; 

[id, zip 2] ; 

row (n, del cell clk bits) ; 

pi1 ; 

s2 at (0,0). 

} 

/∗∗ Take the median element ∗/ 

block midelem (int n) (‘t a[n]) ∼ (‘t b) { 

assert (n mod 2 == 1) ”Can’t take the middle element of an even number!”. 

b = a[n / 2]. 

} 

/∗∗ Insert new value into block, then extract the median ∗/ 

block insert median (int bits) (int n) (‘t a, ‘t b[n]) ∼ (‘t c[n+1], ‘t d) 

→ insert bits n ; fork ; snd (midelem (n+1)). 

/∗∗ Remove element a from state s to produce state s2. Throw away d ∗/ 

block nextstate (wire clk) (int bits) (int n) (‘t a, ‘t s[n+1]) ∼ (‘t s2[n], bool d) { 

wire control[n+1]. 

(a, s) ; locater bits n ; control at (0, height((s, control) ; compactor clk bits n ; s2)).


} 

(s, control) ; compactor clk bits n ; s2 at (0,0). 

block filter core (int n, int bits) (wire clk) (wire newval[bits ], wire s[n][ bits ]) ∼ 

(‘t s2[n][ bits ], ‘t median[bits]) → 

fst (fork ; fst (rcomp (n, map (bits, fd clk)))) ; 

below ( nextstate clk bits n , insert median bits n) ; 

snd pi2. 

/∗∗ Median filter, ”n” + 1 size window for ”bits” bit values. 

@input n Window size 

@input bits Number of bits in data values 

@input clk Clock signal 

@input newval Current input value 

@output median Current output (median) value 

∗/ 

block filter (int n, int bits) (wire clk) (‘t newval) ∼ (‘t median) 

→ loop ( filter core (n, bits) clk). 

D.1.2 Theory max2 

theory max2 = mux + comp_lut: 


consts 

struct:: "int=>((wire)vector*(wire)vector)=>(wire)vector=>bool" 

height:: "int=>((wire)vector*(wire)vector)=>(wire)vector=>int" 

width:: "int=>((wire)vector*(wire)vector)=>(wire)vector=>int" 

max2:: "(int=>((wire)vector*(wire)vector)=>(wire)vector=>bool, int=>((wire)vector 

*(wire)vector)=>(wire)vector=>int)block" 

defs 

struct_def: "struct == % bits (a, b) c. EX (a_geq_b::(wire)vector). (a_geq_b = 

bool2wire True) & (ALL (j::int). ((0



theorem height_ge0_int : "!! (bits::int) (a::(wire)vector) (b::(wire)vector) (c::( 

wire)vector). 0


((0::int)


D.1.3 Theory min2 

theory min2 = mux + comp_lut: 


consts 

struct:: "int=>((wire)vector*(wire)vector)=>(wire)vector=>bool" 

height:: "int=>((wire)vector*(wire)vector)=>(wire)vector=>int" 

width:: "int=>((wire)vector*(wire)vector)=>(wire)vector=>int" 

min2:: "(int=>((wire)vector*(wire)vector)=>(wire)vector=>bool, int=>((wire)vector 

*(wire)vector)=>(wire)vector=>int)block" 

defs 

struct_def: "struct == % bits (a, b) c. EX (a_geq_b::(wire)vector). (a_geq_b = 



theorem height_ge0: "!! (bits::int) (a::(wire)vector) (b::(wire)vector) (c::(wire) 

vector). 0


;;; mux $ bits $ a_geq_b ;;; c) |] ==> 

ALL (j::int) (j’::int). ((0 ((wire)vector*(wire) 

vector)=>wire=>int)block" 

defs 

struct_def: "struct == % n (a, b) c. EX (match::(wire)vector). (match = 



j>, a, b) ;;; and3 ;;; match)) & (c = match)" 

height_def: "height == % n (a, b) c. let match = (THE (match::(wire)vector). ( 

match = bool2wire True) & (ALL (j::int). ((0 ) ;;; and3 ;;; match)))) 0" 

width_def: "width == % n (a, b) c. let match = (THE (match::(wire)vector). (match 

= bool2wire True) & (ALL (j::int). ((0


qs480>, a, b) ;;; and3 ;;; match)))) 0)))" 

apply (auto intro: sum_ge0 maxf_ge0 sum_ge0_frange maxf_ge0_frange sum_nsub1_plusf 

maxf_encloses and3.height_ge0 and3.width_ge0 simp add: and3_def) 

apply (simp add: maxf_lamx_top) 

done 

section {* Intersection theorems *} 

theorem "!! (n::int) (a::(wire)vector) (b::(wire)vector) (c::wire). [| match = 

bool2wire True ; ALL (j::int). ((0 

ALL (j::int) (j’::int). ((0 

int" 

width:: "int=>int=>((wire)vector*((wire)vector)vector)=>((wire)vector)vector=>int 

" 

insert:: "(int=>int=>((wire)vector*((wire)vector)vector)=>((wire)vector)vector=> 

bool, int=>int=>((wire)vector*((wire)vector)vector)=>((wire)vector)vector=> 

int)block" 

defs 

struct_def: "struct == % bits n (a, b) c. Def ((a, b) ;;; row $ (n, fork ;; [[ 

min2 $ bits, max2 $ bits ]]) ;; apr $ n ;;; c)" 

height_def: "height == % bits n (a, b) c. Height ((a, b) ;;; row $ (n, fork ;; [[ 

min2 $ bits, max2 $ bits ]]) ;; apr $ n ;;; c)"


width_def: "width == % bits n (a, b) c. Width ((a, b) ;;; row $ (n, fork ;; [[ 

min2 $ bits, max2 $ bits ]]) ;; apr $ n ;;; c)" 

insert_def: "insert == (| Def = struct, Height = height, Width = width |)" 





theorem height_ge0_int : "!! (bits::int) (n::int) (a::(wire)vector) (b::((wire)vector 

)vector) (c::((wire)vector)vector). 0


apply (auto intro: sum_ge0 maxf_ge0 sum_ge0_frange maxf_ge0_frange sum_nsub1_plusf 

maxf_encloses apr.height_ge0 apr.width_ge0 max2.height_ge0 max2.width_ge0 min2. 

height_ge0 min2.width_ge0 fork.height_ge0 fork.width_ge0 row.height_ge0 row. 

width_ge0) 

done 

end 

D.1.6 Theory lct cell 

theory lct_cell = or2 + eq: 


consts 

struct:: "int=>((wire*(wire)vector)*(wire)vector)=>(wire*(wire*(wire)vector))=> 

bool" 

height:: "int=>((wire*(wire)vector)*(wire)vector)=>(wire*(wire*(wire)vector))=> 

int" 

width:: "int=>((wire*(wire)vector)*(wire)vector)=>(wire*(wire*(wire)vector))=>int 

" 

lct_cell:: "(int=>((wire*(wire)vector)*(wire)vector)=>(wire*(wire*(wire)vector)) 

=>bool, int=>((wire*(wire)vector)*(wire)vector)=>(wire*(wire*(wire)vector))=> 

int)block" 

defs 

struct_def: "struct == % bits ((f, a), s) (d, (f2, a2)). EX (a_eq_s::wire). (a2 = 

a) & (d = f) & Def ((a, s) ;;; eq $ bits ;;; a_eq_s) & Def ((a_eq_s, f) ;;; 

or2 ;;; f2)" 

height_def: "height == % bits ((f, a), s) (d, (f2, a2)). let a_eq_s = (THE ( 

a_eq_s::wire). (a2 = a) & (d = f) & Def ((a, s) ;;; eq $ bits ;;; a_eq_s) & 

Def ((a_eq_s, f) ;;; or2 ;;; f2)) in max (Height ((a_eq_s, f) ;;; or2 ;;; f2) 

) ((max ((Height ((a_eq_s, f) ;;; or2 ;;; f2)) + (Height ((a, s) ;;; eq $ 

bits ;;; a_eq_s))) 0))" 

width_def: "width == % bits ((f, a), s) (d, (f2, a2)). let a_eq_s = (THE (a_eq_s 

::wire). (a2 = a) & (d = f) & Def ((a, s) ;;; eq $ bits ;;; a_eq_s) & Def (( 

a_eq_s, f) ;;; or2 ;;; f2)) in max (Width ((a_eq_s, f) ;;; or2 ;;; f2)) ((max 

(Width ((a, s) ;;; eq $ bits ;;; a_eq_s)) 0))" 

lct_cell_def: "lct_cell == (| Def = struct, Height = height, Width = width |)" 





theorem height_ge0_int : "!! (bits::int) (f::wire) (a::(wire)vector) (s::(wire)vector 

) (d::wire) (f2::wire) (a2::(wire)vector). 0


apply (auto intro: sum_ge0 maxf_ge0 sum_ge0_frange maxf_ge0_frange z_aleq_bc eq. 

height_ge0 eq.width_ge0 or2.height_ge0 or2.width_ge0 simp add: Let_def max_def) 

done 


theorem height_ge0: "!! (bits::int) (f::wire) (a::(wire)vector) (s::(wire)vector) (d 

::wire) (f2::wire) (a2::(wire)vector). 0


)) + (Height ((a, s) ;;; eq $ bits ;;; a_eq_s))) (wire)vector=>bool" 

height:: "int=>int=>((wire)vector*((wire)vector)vector)=>(wire)vector=>int" 

width:: "int=>int=>((wire)vector*((wire)vector)vector)=>(wire)vector=>int" 

locater:: "(int=>int=>((wire)vector*((wire)vector)vector)=>(wire)vector=>bool, 

int=>int=>((wire)vector*((wire)vector)vector)=>(wire)vector=>int)block" 

defs 

struct_def: "struct == % bits n (a, s) d. EX (found::wire). (found = bool2wire 

False) & Def (((found, a), s) ;;; row $ (n + 1, lct_cell $ bits) ;; pi1 ;;; d 

)" 

height_def: "height == % bits n (a, s) d. let found = (THE (found::wire). (found 

= bool2wire False) & Def (((found, a), s) ;;; row $ (n + 1, lct_cell $ bits) 

;; pi1 ;;; d)) in max (Height (((found, a), s) ;;; row $ (n + 1, lct_cell $ 

bits) ;; pi1 ;;; d)) 0" 

width_def: "width == % bits n (a, s) d. let found = (THE (found::wire). (found = 

bool2wire False) & Def (((found, a), s) ;;; row $ (n + 1, lct_cell $ bits) ;; 

pi1 ;;; d)) in max (Width (((found, a), s) ;;; row $ (n + 1, lct_cell $ bits 

) ;; pi1 ;;; d)) 0" 

locater_def: "locater == (| Def = struct, Height = height, Width = width |)" 





theorem height_ge0_int : "!! (bits::int) (n::int) (a::(wire)vector) (s::((wire)vector 

)vector) (d::(wire)vector). 0


done 

theorem width_ge0_int : "!! (bits::int) (n::int) (a::(wire)vector) (s::((wire)vector) 

vector) (d::(wire)vector). 0 ((wire)vector*((wire)vector*wire))=>((wire)vector*(wire) 

vector)=>int" 

width:: "wire=>int=>((wire)vector*((wire)vector*wire))=>((wire)vector*(wire) 

vector)=>int" 

del_cell:: "(wire=>int=>((wire)vector*((wire)vector*wire))=>((wire)vector*(wire) 

vector)=>bool, wire=>int=>((wire)vector*((wire)vector*wire))=>((wire)vector*(


wire)vector)=>int)block" 

defs 

struct_def: "struct == % clk bits (x, (y, mode)) (m, n). Def ((y, x) ;;; mux $ 

bits $ mode ;;; n) & Def ((x, y) ;;; mux_ff $ clk $ bits $ mode ;;; m)" 

height_def: "height == % clk bits (x, (y, mode)) (m, n). max (Height ((x, y) ;;; 

mux_ff $ clk $ bits $ mode ;;; m)) (Height ((y, x) ;;; mux $ bits $ mode ;;; 

n))" 

width_def: "width == % clk bits (x, (y, mode)) (m, n). max (1 + (Width ((x, y) 

;;; mux_ff $ clk $ bits $ mode ;;; m))) (Width ((y, x) ;;; mux $ bits $ mode 

;;; n))" 

del_cell_def: "del_cell == (| Def = struct, Height = height, Width = width |)" 





theorem height_ge0_int : "!! (clk::wire) (bits::int) (x::(wire)vector) (y::(wire) 

vector) (mode::wire) (m::(wire)vector) (n::(wire)vector). 0


((0::int) int" 

width:: "wire=>int=>int=>(((wire)vector)vector*(wire)vector)=>((wire)vector) 

vector=>int" 

compactor:: "(wire=>int=>int=>(((wire)vector)vector*(wire)vector)=>((wire)vector) 

vector=>bool, wire=>int=>int=>(((wire)vector)vector*(wire)vector)=>((wire) 

vector)vector=>int)block" 

defs 

struct_def: "struct == % clk bits n (s, d) s2. Def ((s, (s, d)) 

;;; [[ id, zip $ 2 ]] ;; row $ (n, del_cell $ clk $ bits) ;; pi1 ;;; s2)"


height_def: "height == % clk bits n (s, d) s2. Height ((s, (s, d)) 

;;; [[ id, zip $ 2 ]] ;; row $ (n, del_cell $ clk $ bits) ;; pi1 ;;; s2)" 

width_def: "width == % clk bits n (s, d) s2. Width ((s, (s, d)) 

;;; [[ id, zip $ 2 ]] ;; row $ (n, del_cell $ clk $ bits) ;; pi1 ;;; s2)" 

compactor_def: "compactor == (| Def = struct, Height = height, Width = width |)" 





theorem height_ge0_int : "!! (clk::wire) (bits::int) (n::int) (s::((wire)vector) 

vector) (d::(wire)vector) (s2::((wire)vector)vector). 0


((0::int) bool" 

height:: "int=>int=>((wire)vector*((wire)vector)vector)=>(((wire)vector)vector*( 

wire)vector)=>int" 

width:: "int=>int=>((wire)vector*((wire)vector)vector)=>(((wire)vector)vector*( 

wire)vector)=>int" 

insert_median:: "(int=>int=>((wire)vector*((wire)vector)vector)=>(((wire)vector) 

vector*(wire)vector)=>bool, int=>int=>((wire)vector*((wire)vector)vector) 

=>(((wire)vector)vector*(wire)vector)=>int)block" 

defs 

struct_def: "struct == % bits n (a, b) (c, d). Def ((a, b) ;;; insert $ bits $ n 

;; fork ;; snd $ (midelem $ (n + 1)) ;;; (c, d))" 

height_def: "height == % bits n (a, b) (c, d). Height ((a, b) ;;; insert $ bits $ 

n ;; fork ;; snd $ (midelem $ (n + 1)) ;;; (c, d))" 

width_def: "width == % bits n (a, b) (c, d). Width ((a, b) ;;; insert $ bits $ n 

;; fork ;; snd $ (midelem $ (n + 1)) ;;; (c, d))" 

insert_median_def: "insert_median == (| Def = struct, Height = height, Width = 

width |)" 





theorem height_ge0_int : "!! (bits::int) (n::int) (a::(wire)vector) (b::((wire)vector 

)vector) (c::((wire)vector)vector) (d::(wire)vector). 0


theorem width_ge0_int : "!! (bits::int) (n::int) (a::(wire)vector) (b::((wire)vector) 

vector) (c::((wire)vector)vector) (d::(wire)vector). 0


height:: "wire=>int=>int=>((wire)vector*((wire)vector)vector)=>(((wire)vector) 

vector*bool)=>int" 

width:: "wire=>int=>int=>((wire)vector*((wire)vector)vector)=>(((wire)vector) 

vector*bool)=>int" 

nextstate:: "(wire=>int=>int=>((wire)vector*((wire)vector)vector)=>(((wire)vector 

)vector*bool)=>bool, wire=>int=>int=>((wire)vector*((wire)vector)vector)=>((( 

wire)vector)vector*bool)=>int)block" 

defs 

struct_def: "struct == % clk bits n (a, s) (s2, d). EX (control::(wire)vector). 

Def ((a, s) ;;; locater $ bits $ n ;;; control) & Def ((s, control) ;;; 

compactor $ clk $ bits $ n ;;; s2)" 

height_def: "height == % clk bits n (a, s) (s2, d). let control = (THE (control 

::(wire)vector). Def ((a, s) ;;; locater $ bits $ n ;;; control) & Def ((s, 

control) ;;; compactor $ clk $ bits $ n ;;; s2)) in max (Height ((s, control) 

;;; compactor $ clk $ bits $ n ;;; s2)) ((Height ((s, control) ;;; compactor 

$ clk $ bits $ n ;;; s2)) + (Height ((a, s) ;;; locater $ bits $ n ;;; 

control)))" 

width_def: "width == % clk bits n (a, s) (s2, d). let control = (THE (control::( 

wire)vector). Def ((a, s) ;;; locater $ bits $ n ;;; control) & Def ((s, 

control) ;;; compactor $ clk $ bits $ n ;;; s2)) in max (Width ((s, control) 

;;; compactor $ clk $ bits $ n ;;; s2)) (Width ((a, s) ;;; locater $ bits $ n 

;;; control))" 

nextstate_def: "nextstate == (| Def = struct, Height = height, Width = width |)" 





theorem height_ge0_int : "!! (clk::wire) (bits::int) (n::int) (a::(wire)vector) (s 

::((wire)vector)vector) (s2::((wire)vector)vector) (d::bool). 0


theorem width_ge0: "!! (clk::wire) (bits::int) (n::int) (a::(wire)vector) (s::((wire) 

vector)vector) (s2::((wire)vector)vector) (d::bool). 0 

((0::int)


(rule allI)+, 

(case_tac "0 ((wire)vector*((wire)vector)vector)=>(((wire)vector) 

vector*(wire)vector)=>bool" 

height:: "(int*int)=>wire=>((wire)vector*((wire)vector)vector)=>(((wire)vector) 

vector*(wire)vector)=>int" 

width:: "(int*int)=>wire=>((wire)vector*((wire)vector)vector)=>(((wire)vector) 

vector*(wire)vector)=>int" 

filter_core:: "((int*int)=>wire=>((wire)vector*((wire)vector)vector)=>(((wire) 

vector)vector*(wire)vector)=>bool, (int*int)=>wire=>((wire)vector*((wire) 

vector)vector)=>(((wire)vector)vector*(wire)vector)=>int)block" 

defs 

struct_def: "struct == % (n, bits) clk (newval, s) (s2, median). Def ((newval, s) 

;;; fst $ (fork ;; fst $ (rcomp $ (n, map $ (bits, fd $ clk)))) ;; below $ ( 

nextstate $ clk $ bits $ n, insert_median $ bits $ n) ;; snd $ (pi2) ;;; (s2, 

median))" 

height_def: "height == % (n, bits) clk (newval, s) (s2, median). Height ((newval, 

s) ;;; fst $ (fork ;; fst $ (rcomp $ (n, map $ (bits, fd $ clk)))) ;; below 

$ (nextstate $ clk $ bits $ n, insert_median $ bits $ n) ;; snd $ (pi2) ;;; ( 

s2, median))" 

width_def: "width == % (n, bits) clk (newval, s) (s2, median). Width ((newval, s) 

;;; fst $ (fork ;; fst $ (rcomp $ (n, map $ (bits, fd $ clk)))) ;; below $ ( 

nextstate $ clk $ bits $ n, insert_median $ bits $ n) ;; snd $ (pi2) ;;; (s2, 

median))" 

filter_core_def: "filter_core == (| Def = struct, Height = height, Width = width 

|)" 





theorem height_ge0_int : "!! (n::int) (bits::int) (clk::wire) (newval::(wire)vector) 

(s::((wire)vector)vector) (s2::((wire)vector)vector) (median::(wire)vector). 0



bits $ n, insert_median $ bits $ n) ;; snd $ (pi2) ;;; (s2, median)))) (wire)vector=>(wire)vector=>bool" 

height:: "(int*int)=>wire=>(wire)vector=>(wire)vector=>int" 

width:: "(int*int)=>wire=>(wire)vector=>(wire)vector=>int" 

filter:: "((int*int)=>wire=>(wire)vector=>(wire)vector=>bool, (int*int)=>wire=>( 

wire)vector=>(wire)vector=>int)block" 

defs 

struct_def: "struct == % (n, bits) clk newval median. Def (newval ;;; loop $ ( 

filter_core $ (n, bits) $ clk) ;;; median)" 

height_def: "height == % (n, bits) clk newval median. Height (newval ;;; loop $ ( 


width_def: "width == % (n, bits) clk newval median. Width (newval ;;; loop $ ( 


filter_def: "filter == (| Def = struct, Height = height, Width = width |)" 





theorem height_ge0_int : "!! (n::int) (bits::int) (clk::wire) (newval::(wire)vector) 

(median::(wire)vector). 0


done 

Let_def max_def) 


theorem height_ge0: "!! (n::int) (bits::int) (clk::wire) (newval::(wire)vector) ( 

median::(wire)vector). 0


} 

wire o2. 

i ; R ; o2 at (0,0). 

o2 ; fd clk ; o at (0,0). 

block and3 (wire a, wire b, wire c) ∼ (wire d) 


block mux lut (wire s) (wire d0, wire d1) ∼ (wire o) 


block gr lut ((wire a, wire b), (wire is gr, wire is eq)) ∼ (wire is gr2) 


block eq lut ((wire a, wire b), wire is eq) ∼ (wire is eq2) 


block comp elem ((wire a, wire b), (wire is gr, wire is eq)) ∼ (wire is gr2, wire is eq2) { 

((a, b), (is gr, is eq)) ; gr lut ; is gr2 at (0, 0). 

((a, b), is eq) ; eq lut ; is eq2 at (1, 0). 

} 

block comparator (int bits) (wire a[bits ], wire b[bits ]) ∼ (wire a gr b) { 

wire zero, one. 

zero = false. one = true. 

((a, b), (zero, one)) ; fst (zip 2) ; rdr (bits, comp elem) ;pi1 ; a gr b at (0,0). 

} 

/∗ Two input sorting circuit with output register ∗/ 

block sort2 (wire clk) (int bits) (wire a[bits ], wire b[bits ]) ∼ 

(wire min val[bits], wire max val[bits]) { 

wire a gr b. 

(a, b) ; comparator (bits) ; a gr b at (0,0). 

(a, b) ; zip 2 ; map (bits, register clk (mux lut a gr b)) ;min val at (width((a, b) ; 

comparator (bits) ; a gr b), 0). 

(b, a) ; zip 2 ; map (bits, register clk (mux lut a gr b)) ;max val at (width((a, b) ; 

comparator (bits) ;a gr b) + width((a, b) ;zip 2 ; map (bits, mux lut a gr b) ; 

min val), 0). 

} 

block vecpair (‘a i [2]) ∼ (‘a o1, ‘a o2) → (o1, o2) = (i [0], i [1]) . 

/∗ Combinator describing an arbitrary butterfly network ∗/ 

block butterfly (int n, block R (‘a, ‘a) ∼ (‘a, ‘a)) (‘a l[m]) ∼ (‘a r[m]) { 

const m = 2 ∗∗ n. 

l ; rcomp (n, 

riffle (m/2) ; 

pair (m/2) ; 

map (m/2, vecpair ; R ; converse (vecpair)) ; 

converse (pair (m/2)) 

) 

; r. 

} 

/∗ Pipelined bitonic merger ∗/ 

block merger (int n) (wire a[m/2], wire b[m/2]) ∼ (wire c[m]) { 

const m = 2∗∗n.


} 

(a,b) ; snd (rev (m/2)) ; converse (half (m/2)) ; butterfly (n, sort2) ; c at (0,0). 

D.2.2 Theory comparator 

theory comparator = pi1 + comp_elem + rdr + fst: 


consts 

struct:: "int=>((wire)vector*(wire)vector)=>wire=>bool" 

height:: "int=>((wire)vector*(wire)vector)=>wire=>int" 

width:: "int=>((wire)vector*(wire)vector)=>wire=>int" 

comparator:: "(int=>((wire)vector*(wire)vector)=>wire=>bool, int=>((wire)vector*( 

wire)vector)=>wire=>int)block" 

defs 

struct_def: "struct == % bits (a, b) a_gr_b. EX (zero::wire) (one::wire). (zero = 

(bool2wire False)) & (one = (bool2wire True)) & Def (((a, b), (zero, one)) 

;;; fst $ (zip $ 2) ;; rdr $ (bits, comp_elem) ;; pi1 ;;; a_gr_b)" 

height_def: "height == % bits (a, b) a_gr_b. let (zero, one) = (THE (zero::wire, 

one::wire). (zero = (bool2wire False)) & (one = (bool2wire True)) & Def (((a, 

b), (zero, one)) ;;; fst $ (zip $ 2) ;; rdr $ (bits, comp_elem) ;; pi1 ;;; 

a_gr_b)) in max (Height (((a, b), (zero, one)) ;;; fst $ (zip $ 2) ;; rdr $ ( 

bits, comp_elem) ;; pi1 ;;; a_gr_b)) 0" 

width_def: "width == % bits (a, b) a_gr_b. let (zero, one) = (THE (zero::wire, 

one::wire). (zero = (bool2wire False)) & (one = (bool2wire True)) & Def (((a, 

b), (zero, one)) ;;; fst $ (zip $ 2) ;; rdr $ (bits, comp_elem) ;; pi1 ;;; 

a_gr_b)) in max (Width (((a, b), (zero, one)) ;;; fst $ (zip $ 2) ;; rdr $ ( 

bits, comp_elem) ;; pi1 ;;; a_gr_b)) 0" 

comparator_def: "comparator == (| Def = struct, Height = height, Width = width |) 

" 





theorem height_ge0_int : "!! (bits::int) (a::(wire)vector) (b::(wire)vector) (a_gr_b 

::wire). 0


apply (simp (no_asm_simp) del: height_def width_def add: Let_def max_def fst_def 

zip_def rdr_def comp_elem_def pi1_def comparator_def, 

(rule height_ge0_int, (simp+)?)?) 

done 

theorem width_ge0: "!! (bits::int) (a::(wire)vector) (b::(wire)vector) (a_gr_b::wire) 

. 0 

((0::int) 

bool" 

height:: "wire=>int=>((wire)vector*(wire)vector)=>((wire)vector*(wire)vector)=> 

int" 

width:: "wire=>int=>((wire)vector*(wire)vector)=>((wire)vector*(wire)vector)=>int 

" 

sort2:: "(wire=>int=>((wire)vector*(wire)vector)=>((wire)vector*(wire)vector)=> 

bool, wire=>int=>((wire)vector*(wire)vector)=>((wire)vector*(wire)vector)=> 

int)block" 

defs 

struct_def: "struct == % clk bits (a, b) (min_val, max_val). EX (a_gr_b::wire). 

Def ((a, b) ;;; comparator $ bits ;;; a_gr_b) & Def ((a, b) ;;; zip $ 2 ;; 

map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; min_val) & Def ((b, a) 

;;; zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; max_val) 

" 

height_def: "height == % clk bits (a, b) (min_val, max_val). let a_gr_b = (THE ( 

a_gr_b::wire). Def ((a, b) ;;; comparator $ bits ;;; a_gr_b) & Def ((a, b) 

;;; zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; min_val) 

& Def ((b, a) ;;; zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b


)) ;;; max_val)) in max (Height ((b, a) ;;; zip $ 2 ;; map $ (bits, register 

$ clk $ (mux_lut $ a_gr_b)) ;;; max_val)) ((max (Height ((a, b) ;;; zip $ 2 

;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; min_val)) (Height (( 

a, b) ;;; comparator $ bits ;;; a_gr_b))))" 

width_def: "width == % clk bits (a, b) (min_val, max_val). let a_gr_b = (THE ( 

a_gr_b::wire). Def ((a, b) ;;; comparator $ bits ;;; a_gr_b) & Def ((a, b) 

;;; zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; min_val) 

& Def ((b, a) ;;; zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b 

)) ;;; max_val)) in max ((Width ((a, b) ;;; comparator $ bits ;;; a_gr_b)) + 

(Width ((a, b) ;;; zip $ 2 ;; map $ (bits, mux_lut $ a_gr_b) ;;; min_val)) + 

(Width ((b, a) ;;; zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b 

)) ;;; max_val))) ((max ((Width ((a, b) ;;; comparator $ bits ;;; a_gr_b)) + 

(Width ((a, b) ;;; zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b 

)) ;;; min_val))) (Width ((a, b) ;;; comparator $ bits ;;; a_gr_b))))" 

sort2_def: "sort2 == (| Def = struct, Height = height, Width = width |)" 





theorem height_ge0_int : "!! (clk::wire) (bits::int) (a::(wire)vector) (b::(wire) 

vector) (min_val::(wire)vector) (max_val::(wire)vector). 0


a_gr_b)) ;;; min_val) ; Def ((b, a) ;;; zip $ 2 ;; map $ (bits, register $ clk $ 

(mux_lut $ a_gr_b)) ;;; max_val) |] ==> 

((0::int)


zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; max_val))) 

((max ((Width ((a, b) ;;; comparator $ bits ;;; a_gr_b)) + (Width ((a, b) ;;; 

zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; min_val))) 

(Width ((a, b) ;;; comparator $ bits ;;; a_gr_b)))))) & ((0 + (Height ((b, a) 

;;; zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; max_val 

))) 

((0 + (Width ((a, b) ;;; comparator $ bits ;;; a_gr_b)))


((((Width ((a, b) ;;; comparator $ bits ;;; a_gr_b)) + (Width ((a, b) ;;; zip $ 2 

;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; min_val))) (’t280*’t280)=>bool,(’t280*’t280)=>(’t280*’t280) 

=>int)block))=>(’t280)vector=>(’t280)vector=>int" 

width:: "(int*(((’t280*’t280)=>(’t280*’t280)=>bool,(’t280*’t280)=>(’t280*’t280)=> 

int)block))=>(’t280)vector=>(’t280)vector=>int" 

butterfly:: "((int*(((’t280*’t280)=>(’t280*’t280)=>bool,(’t280*’t280)=>(’t280*’ 

t280)=>int)block))=>(’t280)vector=>(’t280)vector=>bool, (int*(((’t280*’t280) 

=>(’t280*’t280)=>bool,(’t280*’t280)=>(’t280*’t280)=>int)block))=>(’t280) 

vector=>(’t280)vector=>int)block" 

defs 

struct_def: "struct == % (n, R) l r. let m = (2 pwr n) in Def (l ;;; rcomp $ (n, 

riffle $ (m div 2) ;; pair $ (m div 2) ;; map $ (m div 2, vecpair ;; R ;; 

converse $ (vecpair)) ;; converse $ (pair $ (m div 2))) ;;; r)" 

height_def: "height == % (n, R) l r. let m = (2 pwr n) in Height (l ;;; rcomp $ ( 

n, riffle $ (m div 2) ;; pair $ (m div 2) ;; map $ (m div 2, vecpair ;; R ;;



width_def: "width == % (n, R) l r. let m = (2 pwr n) in Width (l ;;; rcomp $ (n, 

riffle $ (m div 2) ;; pair $ (m div 2) ;; map $ (m div 2, vecpair ;; R ;; 


butterfly_def: "butterfly == (| Def = struct, Height = height, Width = width|)" 





theorem height_ge0_int : "!! (n::int) (R::(((’t280*’t280)=>(’t280*’t280)=>bool,(’t280 

*’t280)=>(’t280*’t280)=>int)block)) (l::(’t280)vector) (r::(’t280)vector). [| 

ALL (qs344::(’t280*’t280)) (qs345::(’t280*’t280)). 0 bool,(’t280 

*’t280)=>(’t280*’t280)=>int)block)) (l::(’t280)vector) (r::(’t280)vector). [| 

ALL (qs344::(’t280*’t280)) (qs345::(’t280*’t280)). 0 bool,(’ 

t280*’t280)=>(’t280*’t280)=>int)block)) (l::(’t280)vector) (r::(’t280)vector). 

[| ALL (qs344::(’t280*’t280)) (qs345::(’t280*’t280)). 0 bool,(’ 

t280*’t280)=>(’t280*’t280)=>int)block)) (l::(’t280)vector) (r::(’t280)vector).


[| ALL (qs344::(’t280*’t280)) (qs345::(’t280*’t280)). 0 bool,(’t280*’t280)=>(’t280 

*’t280)=>int)block)) (l::(’t280)vector) (r::(’t280)vector). [| Def (l ;;; rcomp 

$ (n, riffle $ (m div 2) ;; pair $ (m div 2) ;; map $ (m div 2, vecpair ;; R ;; 

converse $ (vecpair)) ;; converse $ (pair $ (m div 2))) ;;; r) ; m = (2 pwr n) ; 

ALL (qs344::(’t280*’t280)) (qs345::(’t280*’t280)). 0 bool" 

height:: "wire=>int=>int=>(((wire)vector)vector*((wire)vector)vector)=>((wire) 

vector)vector=>int" 

width:: "wire=>int=>int=>(((wire)vector)vector*((wire)vector)vector)=>((wire) 

vector)vector=>int" 

merger:: "(wire=>int=>int=>(((wire)vector)vector*((wire)vector)vector)=>((wire) 

vector)vector=>bool, wire=>int=>int=>(((wire)vector)vector*((wire)vector) 

vector)=>((wire)vector)vector=>int)block" 

defs 

struct_def: "struct == % clk bits n (a, b) c. let m = (2 pwr n) in Def ((a, b) 

;;; snd $ (rev $ (m div 2)) ;; converse $ (half $ (m div 2)) ;; butterfly $ ( 

n, sort2 $ clk $ bits) ;;; c)"


height_def: "height == % clk bits n (a, b) c. let m = (2 pwr n) in Height ((a, b) 

;;; snd $ (rev $ (m div 2)) ;; converse $ (half $ (m div 2)) ;; butterfly $ 

(n, sort2 $ clk $ bits) ;;; c)" 

width_def: "width == % clk bits n (a, b) c. let m = (2 pwr n) in Width ((a, b) 

;;; snd $ (rev $ (m div 2)) ;; converse $ (half $ (m div 2)) ;; butterfly $ ( 

n, sort2 $ clk $ bits) ;;; c)" 

merger_def: "merger == (| Def = struct, Height = height, Width = width|)" 





theorem height_ge0_int : "!! (clk::wire) (bits::int) (n::int) (a::((wire)vector) 

vector) (b::((wire)vector)vector) (c::((wire)vector)vector). 0


butterfly $ (n, sort2 $ clk $ bits) ;;; c)))


} 

cin = false. 

(a, b) ; zip 2 ; converse (pi1) ; col (n, fadd clk) ; (cin, ans) at (0,0). 

/∗ Repeating cell for cubical matrix multiplier ∗/ 

block matmultcell (int n) (wire clk) (wire x in[n], wire y in[n], wire z in[n]) ∼ 

(wire z out[n], wire y out[n], wire x out[n]) { 

x out = x in. 

y out = y in. 

((x in, y in), z in) ; fst (mult n) ; add n clk; z out at (0,0). 

} 

/∗∗ Transpose a matrix ∗/ 

block word transpose (int bits) (int n, int m) (‘a v1[n][m][bits ]) ∼ (‘a v2[m][n][ bits ]) { 

int i , j. 

for i = 0..n−1 { 

for j = 0..m−1 { 

v2[j ][ i ] = v1[i ][ j]. 

} . } . 

} 

/∗∗ Matrix multiplier ∗/ 

block matmult (wire clk) (int bits) (int x, int y, int z) 

(wire mat1[y][z ][ bits ], wire mat2[z][x][ bits ]) ∼ (wire mat3[y][x][ bits ]) { 

wire emptymat[y][x][bits]. 

wire mat trans[z][y][ bits ]. 

int i , j, k. 

for i = 0..y−1 { for j = 0..x−1 { for k = 0..bits−1 { emptymat[i][j][ k] = false. } . } . } . 

mat1 ; word transpose bits (y, z) ; mat trans at (0,0). 

(mat trans, mat2, emptymat) ; 

cube (x, y, z, matmultcell bits clk) ; 

converse (tplapl 2) ; 

pi1 ; 

mat3 at (0,0). 

} 

D.3.2 Theory cube cell 

theory cube_cell = apr + lsh + swap + fst + rsh + apl + converse + snd: 

section {* Temporary definitions to support tplapl in Isabelle *} 

constdefs 

tplapr2_struct :: "((’a*’b)*’c)=>(’a*’b*’c)=>bool" 

"tplapr2_struct == (% ((a, b), c) (d, e, f). a = d & b = e & f = c)" 

tplapr2 :: "(((’a*’b)*’c)=>(’a*’b*’c)=>bool,((’a*’b)*’c)=>(’a*’b*’c)=>int)block" 

"tplapr2 == (| Def = tplapr2_struct, Height = % a b. (0::int), Width = % a b. 

(0::int) |)" 

tplapl2_struct :: "(’a*(’b*’c))=>(’a*’b*’c)=>bool" 

"tplapl2_struct == (% (a, (b, c)) (d, e, f). a = d & b = e & c = f)" 

tplapl2 :: "((’a*(’b*’c))=>(’a*’b*’c)=>bool,(’a*(’b*’c))=>(’a*’b*’c)=>int)block" 

"tplapl2 == (| Def = tplapl2_struct, Height = % a b. (0::int), Width = % a b. 

(0::int) |)" 


consts


struct:: "(int*(((’t697*’t698*’t772)=>(’t772*’t698*’t697)=>bool,(’t697*’t698*’ 

t772)=>(’t772*’t698*’t697)=>int)block))=>(((’t772)vector*’t697)*’t698)=>(’ 

t698*((’t772)vector*’t697))=>bool" 

height:: "(int*(((’t697*’t698*’t772)=>(’t772*’t698*’t697)=>bool,(’t697*’t698*’ 

t772)=>(’t772*’t698*’t697)=>int)block))=>(((’t772)vector*’t697)*’t698)=>(’ 

t698*((’t772)vector*’t697))=>int" 

width:: "(int*(((’t697*’t698*’t772)=>(’t772*’t698*’t697)=>bool,(’t697*’t698*’t772 

)=>(’t772*’t698*’t697)=>int)block))=>(((’t772)vector*’t697)*’t698)=>(’t698 

*((’t772)vector*’t697))=>int" 

cube_cell:: "((int*(((’t697*’t698*’t772)=>(’t772*’t698*’t697)=>bool,(’t697*’t698 

*’t772)=>(’t772*’t698*’t697)=>int)block))=>(((’t772)vector*’t697)*’t698)=>(’ 

t698*((’t772)vector*’t697))=>bool, (int*(((’t697*’t698*’t772)=>(’t772*’t698*’ 

t697)=>bool,(’t697*’t698*’t772)=>(’t772*’t698*’t697)=>int)block))=>(((’t772) 

vector*’t697)*’t698)=>(’t698*((’t772)vector*’t697))=>int)block" 

defs 

struct_def: "struct == % (n, R) ((z, x), y) (y2, (z2, x2)). Def (((x, y), z) ;;; 

snd $ (converse $ (apl $ (n - 1))) ;; rsh ;; fst $ (tplapr2 ;; R ;; converse 

$ (tplapl2) ;; swap) ;; lsh ;; snd $ (swap ;; apr $ (n - 1)) ;;; ((y2, x2), 

z2))" 

height_def: "height == % (n, R) ((z, x), y) (y2, (z2, x2)). Height (((x, y), z) 

;;; snd $ (converse $ (apl $ (n - 1))) ;; rsh ;; fst $ (tplapr2 ;; R ;; 

converse $ (tplapl2) ;; swap) ;; lsh ;; snd $ (swap ;; apr $ (n - 1)) ;;; (( 

y2, x2), z2))" 

width_def: "width == % (n, R) ((z, x), y) (y2, (z2, x2)). Width (((x, y), z) ;;; 

snd $ (converse $ (apl $ (n - 1))) ;; rsh ;; fst $ (tplapr2 ;; R ;; converse 

$ (tplapl2) ;; swap) ;; lsh ;; snd $ (swap ;; apr $ (n - 1)) ;;; ((y2, x2), 

z2))" 

cube_cell_def: "cube_cell == (| Def = struct, Height = height, Width = width |)" 





theorem height_ge0_int : "!! (n::int) (R::(((’t697*’t698*’t772)=>(’t772*’t698*’t697) 

=>bool,(’t697*’t698*’t772)=>(’t772*’t698*’t697)=>int)block)) (z::(’t772)vector) 

(x::’t697) (y::’t698) (y2::’t698) (z2::(’t772)vector) (x2::’t697). [| ALL ( 

qs1101::(’t697*’t698*’t772)) (qs1102::(’t772*’t698*’t697)). 0


done 

theorem width_ge0_int : "!! (n::int) (R::(((’t697*’t698*’t772)=>(’t772*’t698*’t697)=> 

bool,(’t697*’t698*’t772)=>(’t772*’t698*’t697)=>int)block)) (z::(’t772)vector) (x 

::’t697) (y::’t698) (y2::’t698) (z2::(’t772)vector) (x2::’t697). [| ALL (qs1101 

::(’t697*’t698*’t772)) (qs1102::(’t772*’t698*’t697)). 0 bool 

,(’t697*’t698*’t772)=>(’t772*’t698*’t697)=>int)block)) (z::(’t772)vector) (x::’ 

t697) (y::’t698) (y2::’t698) (z2::(’t772)vector) (x2::’t697). [| ALL (qs1101::(’ 

t697*’t698*’t772)) (qs1102::(’t772*’t698*’t697)). 0 bool 

,(’t697*’t698*’t772)=>(’t772*’t698*’t697)=>int)block)) (z::(’t772)vector) (x::’ 

t697) (y::’t698) (y2::’t698) (z2::(’t772)vector) (x2::’t697). [| ALL (qs1101::(’ 

t697*’t698*’t772)) (qs1102::(’t772*’t698*’t697)). 0 bool,(’t697*’ 

t698*’t772)=>(’t772*’t698*’t697)=>int)block)) (z::(’t772)vector) (x::’t697) (y 

::’t698) (y2::’t698) (z2::(’t772)vector) (x2::’t697). [| Def (((x, y), z) ;;; 

snd $ (converse $ (apl $ (n - 1))) ;; rsh ;; fst $ (tplapr2 ;; R ;; converse $ ( 

tplapl2) ;; swap) ;; lsh ;; snd $ (swap ;; apr $ (n - 1)) ;;; ((y2, x2), z2)) ;


ALL (qs1101::(’t697*’t698*’t772)) (qs1102::(’t772*’t698*’t697)). 0 (((’t686)vector)vector*((’t805 

)vector)vector*((’t772)vector)vector)=>(((’t772)vector)vector*((’t805)vector) 

vector*((’t686)vector)vector)=>bool" 

height:: "(int*int*int*(((’t686*’t805*’t772)=>(’t772*’t805*’t686)=>bool,(’t686*’ 

t805*’t772)=>(’t772*’t805*’t686)=>int)block))=>(((’t686)vector)vector*((’t805 


vector*((’t686)vector)vector)=>int" 

width:: "(int*int*int*(((’t686*’t805*’t772)=>(’t772*’t805*’t686)=>bool,(’t686*’ 



vector*((’t686)vector)vector)=>int" 

cube:: "((int*int*int*(((’t686*’t805*’t772)=>(’t772*’t805*’t686)=>bool,(’t686*’ 



vector*((’t686)vector)vector)=>bool, (int*int*int*(((’t686*’t805*’t772)=>(’ 

t772*’t805*’t686)=>bool,(’t686*’t805*’t772)=>(’t772*’t805*’t686)=>int)block)) 

=>(((’t686)vector)vector*((’t805)vector)vector*((’t772)vector)vector)=>(((’ 

t772)vector)vector*((’t805)vector)vector*((’t686)vector)vector)=>int)block" 

defs 

struct_def: "struct == % (x, y, z, R) (x_d, y_d, z_d) (z_r, y_r, x_r). Def (((x_d 

, y_d), z_d) ;;; fst $ ((zip $ 2)::((((’t686)vector)vector*((’t805)vector) 

vector)=>((’t686)vector*(’t805)vector)vector=>bool,(((’t686)vector)vector*((’ 

t805)vector)vector)=>((’t686)vector*(’t805)vector)vector=>int)block) ;; col $ 

(z, swap ;; rsh ;; fst $ (zip $ 2) ;; grid $ (x, y, cube_cell $ (x, R)) ;; 

snd $ (converse $ (zip $ 2)) ;; rsh ;; fst $ swap ;; lsh) ;; snd $ (converse 

$ ((zip $ 2)::((((’t805)vector)vector*((’t686)vector)vector)=>((’t805)vector 

*(’t686)vector)vector=>bool,(((’t805)vector)vector*((’t686)vector)vector) 

=>((’t805)vector*(’t686)vector)vector=>int)block)) ;;; (z_r, (y_r, x_r)))" 

height_def: "height == % (x, y, z, R) (x_d, y_d, z_d) (z_r, y_r, x_r). Height ((( 

x_d, y_d), z_d) ;;; fst $ ((zip $ 2)::((((’t686)vector)vector*((’t805)vector) 

vector)=>((’t686)vector*(’t805)vector)vector=>bool,(((’t686)vector)vector*((’








width_def: "width == % (x, y, z, R) (x_d, y_d, z_d) (z_r, y_r, x_r). Width (((x_d 

, y_d), z_d) ;;; fst $ ((zip $ 2)::((((’t686)vector)vector*((’t805)vector) 

vector)=>((’t686)vector*(’t805)vector)vector=>bool,(((’t686)vector)vector*((’ 







cube_def: "cube == (| Def = struct, Height = height, Width = width |)" 





theorem height_ge0_int : "!! (x::int) (y::int) (z::int) (R::(((’t686*’t805*’t772)=>(’ 

t772*’t805*’t686)=>bool,(’t686*’t805*’t772)=>(’t772*’t805*’t686)=>int)block)) ( 

x_d::((’t686)vector)vector) (y_d::((’t805)vector)vector) (z_d::((’t772)vector) 

vector) (z_r::((’t772)vector)vector) (y_r::((’t805)vector)vector) (x_r::((’t686) 

vector)vector). [| ALL (qs1103::(’t686*’t805*’t772)) (qs1104::(’t772*’t805*’t686 

)). 0 bool,(’t686*’t805*’t772)=>(’t772*’t805*’t686)=>int)block)) ( 

x_d::((’t686)vector)vector) (y_d::((’t805)vector)vector) (z_d::((’t772)vector) 



)). 0 bool,(’t686*’t805*’t772)=>(’t772*’t805*’t686)=>int)block)) (x_d


::((’t686)vector)vector) (y_d::((’t805)vector)vector) (z_d::((’t772)vector) 



)). 0 bool,(’t686*’t805*’t772)=>(’t772*’t805*’t686)=>int)block)) (x_d 

::((’t686)vector)vector) (y_d::((’t805)vector)vector) (z_d::((’t772)vector) 



)). 0 bool,(’t686*’t805*’t772)=>(’t772*’t805*’t686)=>int)block)) (x_d::((’t686) 

vector)vector) (y_d::((’t805)vector)vector) (z_d::((’t772)vector)vector) (z_r 

::((’t772)vector)vector) (y_r::((’t805)vector)vector) (x_r::((’t686)vector) 

vector). [| Def (((x_d, y_d), z_d) ;;; fst $ (zip $ 2) ;; col $ (z, swap ;; rsh 

;; fst $ (zip $ 2) ;; grid $ (x, y, cube_cell $ (x, R)) ;; snd $ (converse $ ( 

zip $ 2)) ;; rsh ;; fst $ (swap) ;; lsh) ;; snd $ (converse $ (zip $ 2)) ;;; ( 

z_r, (y_r, x_r))) ; ALL (qs1103::(’t686*’t805*’t772)) (qs1104::(’t772*’t805*’ 

t686)). 0 ((’t686) 

vector*(’t805)vector)vector=>int)block) ;; col $ (z, swap ;; rsh ;; fst $ (zip $ 

2) ;; grid $ (x, y, cube_cell $ (x, R)) ;; snd $ (converse $ (zip $ 2)) ;; rsh 

;; fst $ swap ;; lsh) ;; snd $ (converse $ ((zip $ 2)::((((’t805)vector)vector 

*((’t686)vector)vector)=>((’t805)vector*(’t686)vector)vector=>bool,(((’t805) 

vector)vector*((’t686)vector)vector)=>((’t805)vector*(’t686)vector)vector=>int) 

block)) ;;; (z_r, (y_r, x_r))))) ((’t686)vector*(’t805)vector 

)vector=>bool,(((’t686)vector)vector*((’t805)vector)vector)=>((’t686)vector*(’ 

t805)vector)vector=>int)block) ;; col $ (z, swap ;; rsh ;; fst $ (zip $ 2) ;; 

grid $ (x, y, cube_cell $ (x, R)) ;; snd $ (converse $ (zip $ 2)) ;; rsh ;; fst 

$ swap ;; lsh) ;; snd $ (converse $ ((zip $ 2)::((((’t805)vector)vector*((’t686) 

vector)vector)=>((’t805)vector*(’t686)vector)vector=>bool,(((’t805)vector)vector 

*((’t686)vector)vector)=>((’t805)vector*(’t686)vector)vector=>int)block)) ;;; ( 

z_r, (y_r, x_r))))) &


((0 + (Height (((x_d, y_d), z_d) ;;; fst $ ((zip $ 2)::((((’t686)vector) 

vector*((’t805)vector)vector)=>((’t686)vector*(’t805)vector)vector=>bool 

,(((’t686)vector)vector*((’t805)vector)vector)=>((’t686)vector*(’t805) 

vector)vector=>int)block) ;; col $ (z, swap ;; rsh ;; fst $ (zip $ 2) ;; 

grid $ (x, y, cube_cell $ (x, R)) ;; snd $ (converse $ (zip $ 2)) ;; rsh 

;; fst $ swap ;; lsh) ;; snd $ (converse $ ((zip $ 2)::((((’t805)vector) 

vector*((’t686)vector)vector)=>((’t805)vector*(’t686)vector)vector=>bool 

,(((’t805)vector)vector*((’t686)vector)vector)=>((’t805)vector*(’t686) 

vector)vector=>int)block)) ;;; (z_r, (y_r, x_r))))) ((’t686)vector*(’t805)vector)vector=>bool,(((’t686)vector)vector 

*((’t805)vector)vector)=>((’t686)vector*(’t805)vector)vector=>int)block) 

;; col $ (z, swap ;; rsh ;; fst $ (zip $ 2) ;; grid $ (x, y, cube_cell $ 

(x, R)) ;; snd $ (converse $ (zip $ 2)) ;; rsh ;; fst $ swap ;; lsh) ;; 

snd $ (converse $ ((zip $ 2)::((((’t805)vector)vector*((’t686)vector) 

vector)=>((’t805)vector*(’t686)vector)vector=>bool,(((’t805)vector)vector 

*((’t686)vector)vector)=>((’t805)vector*(’t686)vector)vector=>int)block)) 

;;; (z_r, (y_r, x_r)))))" 

by auto 

end 

D.3.4 Theory matmultcell 

theory matmultcell = add + mult + fst: 


consts 

struct:: "int=>wire=>((wire)vector*(wire)vector*(wire)vector)=>((wire)vector*( 

wire)vector*(wire)vector)=>bool" 

height:: "int=>wire=>((wire)vector*(wire)vector*(wire)vector)=>((wire)vector*( 

wire)vector*(wire)vector)=>int" 

width:: "int=>wire=>((wire)vector*(wire)vector*(wire)vector)=>((wire)vector*(wire 

)vector*(wire)vector)=>int" 

matmultcell:: "(int=>wire=>((wire)vector*(wire)vector*(wire)vector)=>((wire) 

vector*(wire)vector*(wire)vector)=>bool, int=>wire=>((wire)vector*(wire) 

vector*(wire)vector)=>((wire)vector*(wire)vector*(wire)vector)=>int)block" 

defs 

struct_def: "struct == % n clk (x_in, y_in, z_in) (z_out, y_out, x_out). (x_out = 

x_in) & (y_out = y_in) & Def (((x_in, y_in), z_in) ;;; fst $ (mult $ n) ;; 

add $ n $ clk ;;; z_out)" 

height_def: "height == % n clk (x_in, y_in, z_in) (z_out, y_out, x_out). max ( 

Height (((x_in, y_in), z_in) ;;; fst $ (mult $ n) ;; add $ n $ clk ;;; z_out) 

) 0" 

width_def: "width == % n clk (x_in, y_in, z_in) (z_out, y_out, x_out). max (Width 

(((x_in, y_in), z_in) ;;; fst $ (mult $ n) ;; add $ n $ clk ;;; z_out)) 0" 

matmultcell_def: "matmultcell == (| Def = struct, Height = height, Width = width 

|)" 




section {* Validity of width and height functions *}


theorem height_ge0_int : "!! (n::int) (clk::wire) (x_in::(wire)vector) (y_in::(wire) 

vector) (z_in::(wire)vector) (z_out::(wire)vector) (y_out::(wire)vector) (x_out 

::(wire)vector). 0


theory matmult = pi1 + converse + matmultcell + cube + word_transpose: 


consts 

struct:: "wire=>int=>(int*int*int)=>((((wire)vector)vector)vector*(((wire)vector) 

vector)vector)=>(((wire)vector)vector)vector=>bool" 

height:: "wire=>int=>(int*int*int)=>((((wire)vector)vector)vector*(((wire)vector) 

vector)vector)=>(((wire)vector)vector)vector=>int" 

width:: "wire=>int=>(int*int*int)=>((((wire)vector)vector)vector*(((wire)vector) 

vector)vector)=>(((wire)vector)vector)vector=>int" 

matmult:: "(wire=>int=>(int*int*int)=>((((wire)vector)vector)vector*(((wire) 

vector)vector)vector)=>(((wire)vector)vector)vector=>bool, wire=>int=>(int* 

int*int)=>((((wire)vector)vector)vector*(((wire)vector)vector)vector)=>((( 

wire)vector)vector)vector=>int)block" 

defs 

struct_def: "struct == % clk bits (x, y, z) (mat1, mat2) mat3. EX (emptymat::((( 

wire)vector)vector)vector) (mat_trans::(((wire)vector)vector)vector). (ALL (i 

::int). ((0


tplapl2) ;; pi1 ;;; mat3)) 

in max (Width ((mat_trans, mat2, emptymat) ;;; cube $ (x, y, z, matmultcell $ 

bits $ clk) ;; converse $ (tplapl2) ;; pi1 ;;; mat3)) ((max (Width (mat1 ;;; 

word_transpose $ bits $ (y, z) ;;; mat_trans)) (if 0


((0


==> 

((0 + (Width (mat1 ;;; word_transpose $ bits $ (y, z) ;;; mat_trans)))

Verification of Parameterised FPGA Circuit Descriptions with Layout ...

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?