
Imperial College of Science, Technology and Medicine
(University of London)
Department of Computing

Verification of Parameterised FPGA Circuit Descriptions with Layout Information

by

Oliver D. Pell

Submitted in partial fulfilment of the requirements for the MSc Degree in Advanced Computing of the University of London and for the Diploma of Imperial College of Science, Technology and Medicine.


Abstract

Manual placement is commonly used in FPGA circuit design in order to achieve better results than would be generated by automatic place and route algorithms. However, explicit placement of individual components in parameterised descriptions is tedious and error-prone. In this thesis we present a framework for the design and verification of parameterised hardware libraries with layout information. There are five main contributions:

(1) We develop additions to the Quartz language and provide compiler support to allow the addition of generic layout information to parameterised circuit descriptions built using iterative and recursive constructs. We show how functional combinators can be given multiple layout interpretations.

(2) We provide a specification of layout correctness and develop a proof environment to allow the verification of parameterised Quartz circuit layouts. We prove a range of useful theorems about common circuit layout expressions and achieve a high level of automation of the verification process.

(3) We develop and verify a range of placed combinator libraries describing useful circuit structures, including rows, grids, trees and less regular examples. We show that our verification environment can not only establish correctness but also highlight counter-examples where layouts are incorrect.

(4) We show how distributed specialisation can be used to achieve transparent HDL-level specialisation of circuits when some inputs are known. Distributed specialisation allows the correctness of specialised circuits to be proven more easily than with lower-level methods. We demonstrate the use of our layout framework to specialise parameterised circuits and show that our system is able to achieve design compaction.

(5) We describe and verify the layouts of five example circuits, including a butterfly network, binomial filter and matrix multiplier. We show that manual placement can reduce compilation time, reduce logic area by up to 60%, reduce power consumption by up to 20% and increase maximum clock frequency by up to 80% for unpipelined circuits and 48% for pipelined circuits.


Acknowledgements

Firstly, I’d like to thank my supervisor, Wayne Luk, for invaluable discussions and support throughout this project and the work that has preceded it.

I’d also like to thank all the past and present members of the Custom Computing research group for general help and advice. Particular thanks are due to Tobias Becker, Jacob Bower, Arran Derbyshire, Rob Dimond and Henry Styles for pointing me in interesting directions, reading drafts, stating the obvious when I had missed it, or simply sharing their experience of where it’s best to aim kicks at expensive bits of FPGA hardware in order to get them to do what you want them to.

I owe a debt of gratitude to those who have developed the tools I have used in this project, most especially the people at the University of Cambridge, TU Munich and elsewhere who are responsible for Isabelle. Thanks are definitely due to the members of the isabelle-users mailing list, particularly Larry Paulson and Tobias Nipkow, for answering my Isabelle-related questions.

Finally, I’d like to thank the friends and family who have put up with me all these years, and especially Nia, who has had to listen to me talk endlessly about hardware verification and synthesis and yet has managed, most of the time, to pretend that it is interesting.


Table of Contents

1 Introduction
  1.1 Motivation
  1.2 Contributions
  1.3 Organisation

2 Background and Related Work
  2.1 FPGAs
    2.1.1 Generating FPGA Circuits
    2.1.2 Layout Information in Hardware Descriptions
    2.1.3 FPGA Architectures
  2.2 Describing Hardware
    2.2.1 High Level Approaches
    2.2.2 Lower Level Approaches
  2.3 Quartz
    2.3.1 Type System and Overloading
    2.3.2 Statements
    2.3.3 Formal Reasoning
  2.4 Automated Verification
    2.4.1 Model Checking
    2.4.2 Theorem Proving
    2.4.3 Comparison with Model Checking
    2.4.4 Logic and Proof
    2.4.5 Theorem Proving Tools
    2.4.6 Embeddings


  2.5 Isabelle: A Generic Theorem Prover
    2.5.1 Meta-logic
    2.5.2 Theories
    2.5.3 Unification, Resolution and Proof
  2.6 Summary

3 Generating Parameterised Libraries with Layout
  3.1 Introduction
  3.2 Placement Infrastructure
  3.3 Requirements
  3.4 Block Sizes
    3.4.1 Size Expressions
    3.4.2 Size of Block Instantiations
    3.4.3 Size Inference
  3.5 Parameterised Quartz with Placement
    3.5.1 Laid-out Combinators
    3.5.2 Naive vs General Placement
    3.5.3 A Placed Ripple Adder
  3.6 Different Layout Interpretations
    3.6.1 Composition
    3.6.2 Combinators
  3.7 Compiling Placed Quartz Designs
    3.7.1 Changes to the Type Processing Module
    3.7.2 Layout Processing Module
    3.7.3 Distillation of Size Expressions
    3.7.4 Recursive Size Expressions
    3.7.5 Expression Simplification
  3.8 Compiling LE-Pebble into VHDL
  3.9 Summary and Comparison with Related Work

4 Verifying Circuit Layouts
  4.1 Introduction


  4.2 Choice of Formalism
  4.3 Specifying Correctness
    4.3.1 Validity
    4.3.2 Containment
    4.3.3 Intersection
  4.4 Proof Environment
    4.4.1 Type System
    4.4.2 Blocks and Block Instantiation
    4.4.3 Expressions
  4.5 Generating Theories of Quartz Programs
    4.5.1 Compiler Architecture
    4.5.2 Generating Definitions
    4.5.3 Generating Proof Obligations
  4.6 Proving the Prelude Library
    4.6.1 Proofs with Tacticals
    4.6.2 Improved Proof Scripts
    4.6.3 Building a Library
  4.7 Proving Other Combinators
    4.7.1 Index Operators
    4.7.2 An Irregular Grid
    4.7.3 H-Tree
    4.7.4 Surround
  4.8 Discussion
  4.9 Summary

5 Specialisation
  5.1 Introduction
  5.2 Distributed Specialisation
    5.2.1 Specialising Primitives
    5.2.2 Benefits
    5.2.3 Verifying Distributed Specialisation
  5.3 Optimal Distributed Specialisation


    5.3.1 Specialising a Ripple Adder
    5.3.2 Modified Type System
  5.4 High Level Specialisation
  5.5 Specialising a Multiplier
    5.5.1 Parallel Multiplier Implementation
    5.5.2 Results
  5.6 Summary

6 Layout Case Studies
  6.1 Approach
  6.2 Adder Tree
    6.2.1 Ripple Adder
    6.2.2 Possible Tree Layouts
    6.2.3 Results
  6.3 Median Filter
    6.3.1 Circuit Design
    6.3.2 Layout
    6.3.3 Results
  6.4 Butterfly Network
    6.4.1 Butterfly Combinator
    6.4.2 Implementing a Bitonic Merger
    6.4.3 Results
  6.5 Binomial Filter
    6.5.1 Circuit Design
    6.5.2 Results
  6.6 Matrix Multiplier
    6.6.1 A 3D “cube” Combinator
    6.6.2 Describing N-dimensional Combinators
    6.6.3 A 3D Matrix Multiplier
    6.6.4 Results
  6.7 Evaluation and Conclusions
  6.8 Summary


7 Conclusion and Future Work
  7.1 This Thesis’ Contribution
  7.2 Evaluation
  7.3 Comparison with Related Work
    7.3.1 VHDL with Explicit Co-ordinates
    7.3.2 Relative Placement in Pebble
    7.3.3 Ruby and Lava
  7.4 Future Work
    7.4.1 Further Support for Alternative Layout Interpretations
    7.4.2 Less User Interaction in Proofs
    7.4.3 Integrating Layout and Functional Verification
    7.4.4 Run-time Reconfiguration
    7.4.5 Properties of N-Dimensional Combinators

Bibliography

A Quartz Language Grammar

B Theoretical Basis for Layout Reasoning
  B.1 IntAlgebra
  B.2 Types
  B.3 Block
  B.4 Inbuilt
  B.5 SeriesComposition
  B.6 ParallelComposition
  B.7 Functions
  B.8 Structures
  B.9 CompilerSimps
  B.10 QuartzLayout
  B.11 Minf

C Placed Combinator Libraries
  C.1 Prelude Library


    C.1.1 fst
    C.1.2 R⁻¹ (converse)
    C.1.3 Rⁿ (rcomp)
    C.1.4 Q\P (conjugate)
    C.1.5 map
    C.1.6 (tri)
    C.1.7 R↔S (beside)
    C.1.8 row
    C.1.9 grid
    C.1.10 loop
  C.2 Recursive Index Operators
    C.2.1 ichain
    C.2.2 imap
    C.2.3 irow
    C.2.4 irdl
    C.2.5 igrid
  C.3 Irregular Grid Arrangement
  C.4 Square Element Interface
  C.5 H-Tree

D Circuit Layout Case Studies
  D.1 Median Filter
    D.1.1 Quartz Description
    D.1.2 Theory max2
    D.1.3 Theory min2
    D.1.4 Theory eq
    D.1.5 Theory insert
    D.1.6 Theory lct_cell
    D.1.7 Theory locater
    D.1.8 Theory del_cell
    D.1.9 Theory compactor
    D.1.10 Theory insert_median


    D.1.11 Theory nextstate
    D.1.12 Theory filter_core
    D.1.13 Theory filter
  D.2 Butterfly Network
    D.2.1 Quartz Description
    D.2.2 Theory comparator
    D.2.3 Theory sort2
    D.2.4 Theory butterfly
    D.2.5 Theory merger
  D.3 Matrix Multiplier
    D.3.1 Quartz Description
    D.3.2 Theory cube_cell
    D.3.3 Theory cube
    D.3.4 Theory matmultcell
    D.3.5 Theory matmult


Chapter 1

Introduction

This thesis presents a framework for the design and verification of parameterised hardware libraries with layout information. Our framework is based around the Quartz hardware description language, which supports higher-order combinators and relational operators designed to promote concise descriptions and formal verification of design function. We extend the Quartz language and compiler framework to allow the circuit designer to add layout information and formally verify the correctness of parameterised layouts, rather than relying on automatic placement.

The framework includes: a conservative extension to the class of Quartz expressions, providing the extra functionality which we show is required to describe generic layouts for both recursively and iteratively described circuits; a compiler infrastructure for compiling designs with layout information into parameterised hardware libraries; and a proof tool with a library of theorems to automate the verification of circuit layouts. The use of the framework is demonstrated on a range of example designs, including median filter and matrix multiplier circuits, and we show that including layout information in designs can improve their performance by up to 82%, reduce logic area by 40-60% and reduce power consumption in comparison with using the standard Xilinx place and route tools.

The potential of this framework to support dynamic specialisation applications is also illustrated, and we introduce the concept of distributed specialisation to achieve HDL-level specialisation of circuits in a manner such that the specialised circuits can easily be verified.


1.1 Motivation

It is a characteristic of custom computing machines that the technology developers have put considerable effort into automating the translation from high level hardware descriptions into the underlying reconfigurable device fabric. Two critical stages in this process are placement and routing, where computational resources on the device are allocated and connected together. The effectiveness of the placement and routing algorithms has a significant impact on the performance of the resulting circuit, since a badly placed design will feature unnecessarily long wires, with accompanying delays and impact on maximum clock frequency.

While modern place and route systems based on simulated annealing can achieve excellent results, for the highest performance it is still common to intervene manually in the placement of designs; much of the value of Intellectual Property Cores, such as those produced by the Xilinx Core Generator tool, is that they are carefully laid out to provide good performance. It has been shown that user-supplied placement information can often significantly improve circuit performance in some common applications [77].

The placement of a particular design can be accomplished using graphical tools, either from scratch or by amending an initial automatic placement. However, manual placements that exploit design geometry are likely to scale to a whole class of similar designs: for example, the arrangement of a parallel multiplier into a grid-shaped structure will be the same for an 8-bit multiplier as for a 32-bit multiplier. Since it is possible to produce a single parameterised design description for an n-bit multiplier, it is desirable to describe design layouts in the same way, such that the multiplier need only be laid out once for all possible values of n.
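The idea of a layout that scales with the design parameter can be pictured as a single placement function over the parameter. The sketch below is purely illustrative (the names and the grid arrangement are assumptions, and this is not the Quartz notation used in this thesis): one parameterised description yields cell coordinates for any word width n.

```python
def multiplier_grid(n):
    """Hypothetical parameterised placement: lay out the n*n cells of a
    parallel multiplier as a grid, one cell per (column, row) position.
    The same description covers n = 8, n = 32 or any other width."""
    return {("cell", i, j): (i, j) for i in range(n) for j in range(n)}

placement_8 = multiplier_grid(8)    # 64 placed cells
placement_32 = multiplier_grid(32)  # 1024 placed cells, same description
```

Writing the placement once as a function of n is exactly what removes the need to lay the multiplier out again for each width.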

Precise control of layout is particularly useful for parameterised hardware libraries, where placement must also be parameterised, since any inefficiency will affect all circuits that use them. This also impacts on circuits produced from high-level descriptions using imperative languages such as Handel-C [12], where eventually the description must be implemented by combining libraries of circuits performing different functions.

Controlling placement is also desirable for reconfigurable circuits, in order to minimise the reconfiguration necessary when switching between different chip configurations, since components at identical locations common to both configurations do not need to be changed.


Figure 1.1: An irregular grid such as this one is impossible to describe using purely beside and below relative placement.

Run-time reconfiguration is a growing area of interest as designers seek to make better use of the reconfigurable nature of FPGAs to improve performance, particularly since FPGAs are otherwise slower and more power-hungry than ASICs.

Placement information can be specified within hardware description languages by giving explicit co-ordinates for each component; however, this approach is tedious and error-prone. The potential for errors is particularly significant for parameterised libraries, since designs which have painstakingly been established as functionally correct may still fail to synthesise properly for some parameter values, if those particular parameterisations produce an invalid layout: for example, one where more than one component is placed at the same location, or where components are placed outside the area on the chip allocated to the library design.
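For any single parameter value, these two failure modes — overlapping components and components outside the allocated region — can be checked mechanically. The following is a minimal sketch of such a point check (hypothetical Python, with invented names; the approach developed in this thesis instead establishes these properties symbolically, for all parameter values at once):

```python
def layout_valid(blocks, width, height):
    """Check one concrete layout. `blocks` maps a component name to its
    rectangle (x, y, w, h); the allocated region is width x height."""
    items = list(blocks.items())
    # every block must fit inside the allocated region
    for name, (x, y, w, h) in items:
        if x < 0 or y < 0 or x + w > width or y + h > height:
            return False, f"{name} outside allocated area"
    # no two blocks may occupy the same location
    for i, (a, (ax, ay, aw, ah)) in enumerate(items):
        for b, (bx, by, bw, bh) in items[i + 1:]:
            # axis-aligned rectangles intersect unless separated on one axis
            if ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah:
                return False, f"{a} overlaps {b}"
    return True, "ok"
```

A check like this must be re-run for every parameterisation a library user might choose, which is precisely why a once-and-for-all proof over the parameters is attractive.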

The use of relative placement information, placing components beside or below one another, has been proposed as a way of simplifying the process of laying out circuits and reducing the potential for errors. Errors cannot be totally eliminated unless explicit placement is disallowed entirely and beside/below placement alone is used; however, this approach is too restrictive to permit the description of all possible desirable circuit layouts. For example, it is impossible to describe the layout of components shown in Figure 1.1 using only beside and below placement directives.
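Relative placement in this style can be pictured as two operators over bounding boxes. The sketch below is a hypothetical illustration (the representation and names are invented here; Quartz's actual combinators also compose circuit behaviour, not just position):

```python
def beside(a, b):
    """Place layout b to the right of layout a. A layout is
    (width, height, cells), where cells maps names to (x, y)."""
    aw, ah, ac = a
    bw, bh, bc = b
    shifted = {n: (x + aw, y) for n, (x, y) in bc.items()}
    return (aw + bw, max(ah, bh), {**ac, **shifted})

def below(a, b):
    """Stack layout b below layout a: a keeps its coordinates,
    b is shifted down by a's height."""
    aw, ah, ac = a
    bw, bh, bc = b
    shifted = {n: (x, y + ah) for n, (x, y) in bc.items()}
    return (max(aw, bw), ah + bh, {**ac, **shifted})

def cell(name):
    """A 1x1 component at the local origin."""
    return (1, 1, {name: (0, 0)})

row = beside(beside(cell("A"), cell("B")), cell("C"))  # a 3x1 row
```

Any layout built solely from these two operators can be split recursively by a single straight horizontal or vertical cut, which is why the interlocking arrangement of Figure 1.1, where no straight cut separates the components, lies outside their expressive power.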

The objective of this work is to create a framework that allows the clear and efficient description of parameterised circuits with layout information, and that provides a formal assurance of correctness for layouts.


CHAPTER 1. INTRODUCTION 4<br />

1.2 Contributions<br />

The contributions <strong>of</strong> this thesis are:<br />

1. A framework supporting the addition <strong>of</strong> generic layout information to parameterised<br />

<strong>FPGA</strong> libraries described in the Quartz language. We show how these circuit de-<br />

scriptions can be compiled onto <strong>FPGA</strong>s and describe how they can be translated into<br />

parameterised library descriptions in the Pebble hardware description language and<br />

structural VHDL. We give all Quartz circuit descriptions composed using standard li-<br />

brary combinators a basic layout interpretation, which is <strong>of</strong>ten already optimal, and also<br />

allow descriptions to be annotated to change the generated layout while maintaining<br />

the same function through the use <strong>of</strong> the overloaded combinator blocks. (Chapter 3)<br />

2. We provide a specification of layout correctness and construct an environment based on higher-order logic to allow the verification of the manual layouts specified for Quartz combinators and for full circuits. We prove a range of useful properties about common circuit layout expressions, which we use to automate the verification of layouts for new combinators or circuits using the Isabelle theorem prover. (Chapter 4)

3. We develop and verify a range of placed combinator libraries for common structures such as rows, columns, grids and trees, and for less regular examples such as the pathological example shown in Figure 1.1 and a square-element interface configuration. We achieve high levels of automation in the verification of these combinator layouts, with many proofs completed without any human intervention and others requiring only minimal user involvement. We show that the verification framework can not only establish the correctness of valid layouts but also highlight counter-examples where layouts are not correct. (Chapter 4)

4. We introduce distributed specialisation, where HDL-level specialisation of circuits when one or more inputs are known can be achieved without centralised control. Distributed specialisation allows circuits to be specialised into higher-performance variants at the HDL level, eliminating or reducing the need for slow low-level optimisation of circuits in time-critical dynamic specialisation applications. It also makes formal verification of the specialisation process much simpler, allowing the correctness of all specialised circuits to be established by proving their equivalence to the original general circuit. We show that our layout framework supports distributed specialisation and can be used to achieve design compaction during specialisation of parameterised circuits, in contrast to more conventional low-level approaches, which eliminate unnecessary logic but do not compact the circuit. Specialisation with compaction reduces the on-chip area that must be allocated to a circuit, and we demonstrate that it improves performance for a simple parallel multiplier design. (Chapter 5)

5. We describe and verify the layout for five example circuits, including a median filter, a butterfly network and a matrix multiplier described using a new class of n-dimensional combinators, and investigate the benefits of using user-specified placement constraints during synthesis. We show that manually placed designs can be placed and routed faster and often have higher performance and lower power consumption while requiring less logic area on a Xilinx Virtex-II device. Improvements of up to 80% in maximum clock frequency and a 61% reduction in area are observed. (Chapter 6)

1.3 Organisation

The remainder of this thesis is organised as follows: Chapter 2 presents relevant background information and related work. Chapter 3 introduces the layout description framework and illustrates how circuit descriptions with layout information can be compiled. Chapter 4 details the layout verification environment and gives details of some key proofs. Chapter 5 introduces distributed specialisation and demonstrates the use of the layout framework to produce specialised circuits. Chapter 6 describes the construction, verification and performance of some example circuits. Chapter 7 evaluates this work, draws conclusions and presents recommendations for future research.

Appendix A gives the full grammar of the extended Quartz language. Appendix B gives the definitions and proofs in the verification environment for Quartz circuit layouts, and Appendix C gives example proofs for a variety of library combinators. Appendix D contains some of the proofs for the layout correctness of the circuit examples in Chapter 6.


Chapter 2

Background and Related Work

This chapter details some of the background to the work in this thesis. Section 2.1 briefly introduces FPGAs and the process of creating FPGA circuits, including Section 2.1.2, which describes previous work enabling circuit layouts to be specified manually. Section 2.2 summarises some of the different ways of describing FPGA hardware and Section 2.3 introduces the Quartz hardware description language. Section 2.4 discusses some of the background to automated verification of hardware and Section 2.5 introduces the Isabelle theorem prover. Section 2.6 summarises the contents of this chapter.

2.1 FPGAs

Field Programmable Gate Arrays are programmable logic devices which aim to combine the user control and time-to-market benefits of programmable logic devices (PLDs) with the densities and cost benefits of gate arrays. In the past decade FPGAs have increased in speed, size and density such that they are no longer limited to implementing glue logic within a system and are now increasingly used to implement major functions or complete systems. For example, in 1998 the Xilinx XC4000 series provided approximately 10,000 logic cells, while the latest Virtex-4 family now provides around 200,000 cells. Reconfigurable logic has been used to implement designs effectively in computationally intensive applications such as digital signal processing and cryptography.


FPGAs consist of a grid of programmable logic cells and programmable routing to connect these computational elements together. The vast majority of chip area is made up of programmable routing resources. As well as basic programmable logic cells, many FPGA architectures also include more complex hardware such as embedded RAMs, multipliers and instruction processors.

The main component of each FPGA logic cell is typically an SRAM look-up table (LUT), which can be programmed to implement any n-input logic function. Each logic cell usually contains a flip-flop which can be connected to the output of the LUT, and possibly other specific logic designed to accelerate particular common functions.

FPGAs are programmed to adopt a particular configuration by loading a “bitstream”. This is usually done at start-up, but it is also possible to reconfigure FPGAs at run-time, adapting their function as they continue to process data. The bitstream is typically generated from a high-level description in some kind of hardware description language.

2.1.1 Generating FPGA Circuits

Generating FPGA configurations from hardware descriptions is usually a complex and time-consuming process. The four major steps are:

1. Synthesis: generating a graph of logical expressions representing a design from a higher-level hardware description. This could include the process of creating a logical description from an imperative one, such as Handel-C [12].

2. Mapping: assigning logic graph nodes to device resources such as look-up tables or registers. Algorithms like FlowMap [13] can be used to achieve this.

3. Placement: placing mapped resources onto specific resources of the target architecture. Automatic placement algorithms typically use heuristics such as simulated annealing [35].

4. Routing: configuring the programmable routing fabric to connect the placed resources together to implement the logic graph.
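The placement step can be made concrete with a toy sketch. The following Python placer is our own illustration of the simulated-annealing idea, not any production algorithm: the cell names, netlist format and cooling parameters are all invented for the example.

```python
# Toy simulated-annealing placer: repeatedly propose swapping two cells and
# keep the swap if it shortens the total Manhattan wirelength (or occasionally
# even if it does not, to escape local minima while the "temperature" is high).
import math
import random

def wirelength(pos, nets):
    """Total Manhattan distance over all two-pin nets."""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in nets)

def anneal_place(cells, nets, grid=4, steps=2000, seed=0):
    rng = random.Random(seed)
    sites = [(x, y) for x in range(grid) for y in range(grid)]
    rng.shuffle(sites)
    pos = dict(zip(cells, sites))            # random initial placement
    cost, temp = wirelength(pos, nets), 2.0
    for _ in range(steps):
        a, b = rng.sample(cells, 2)
        pos[a], pos[b] = pos[b], pos[a]      # propose a swap
        new = wirelength(pos, nets)
        if new <= cost or rng.random() < math.exp((cost - new) / temp):
            cost = new                       # accept the move
        else:
            pos[a], pos[b] = pos[b], pos[a]  # reject: undo the swap
        temp *= 0.999                        # cool down
    return pos, cost

pos, cost = anneal_place(["a", "b", "c", "d"],
                         [("a", "b"), ("b", "c"), ("c", "d")])
print(cost)
```

Real placers differ enormously in move sets and cost models, but the accept/reject structure above is the essence of the heuristic cited as [35].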



LEN: for i in 0 to bits-1 generate
    constant row    : natural := ((width-1)/2)-(i/2);
    constant column : natural := 0;
    constant slice  : natural := 0;
    constant rloc_str : string := "R" & itoa(row) &
        "C" & itoa(column) & ".S" & itoa(slice);
    attribute RLOC of U1 : label is rloc_str;
begin
    U1: FDE port map (
        Q  => dd(j),
        D  => ff_d,
        C  => clk,
        CE => lcl_en(en_idx));
end generate LEN;

Figure 2.1: Using RLOCs with a VHDL generate statement to explicitly place FDE primitives

These four stages can take a long time to run because each can involve many iterations to optimise the results. In addition, the results are often sub-optimal, particularly for the synthesis stage, where a high-level description is translated into logic equations. It is common to use structural, rather than behavioural, hardware descriptions for designing circuits where performance is important, or for hardware libraries, where the return on optimisation effort is high because any inefficiency would affect all designs that use them.

Automatic placement algorithms, while capable of producing good results, operate at a low level, and human designers are often able to specify better layouts by hand by exploiting their knowledge of the higher-level design structure. When elements of a circuit are badly placed, the result can be unnecessarily long wires which have a negative effect on the circuit's performance.

2.1.2 Layout Information in Hardware Descriptions

When describing high-performance hardware it is possible to specify layout constraints in the source hardware description which replace or augment the automatic placement system to produce better circuit layouts. This is often most beneficial when a human designer can exploit knowledge about the structure of a circuit to describe a geometric arrangement of components that will lead to related logic being placed close together.



Structural hardware descriptions in VHDL or Verilog can be annotated with “RLOC” placement constraints when targeting Xilinx FPGA architectures. Figure 2.1 illustrates how this can be used in VHDL to place a column of flip-flops. The placement co-ordinates are given using the co-ordinate system for Virtex and earlier Spartan devices, where positions are given as a string “RmCn.Sp” to place a component in a particular row and column and then at a numbered slice at that position. Later device families, including Virtex-II, use a slice-based co-ordinate system where components are placed with a co-ordinate scheme of “XmYn”, where each combination of (m, n) describes a different slice.
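To make the two constraint formats concrete, the following Python sketch (ours; helper names such as `fde_rlocs` are invented) reproduces the RLOC strings that the generate loop of Figure 2.1 computes, alongside the Virtex-II-style form:

```python
# Reconstructing the "RmCn.Sp" strings produced by the generate loop in
# Figure 2.1, and the "XmYn" form used by Virtex-II and later families.

def virtex_rloc(row, column, slice_):
    return f"R{row}C{column}.S{slice_}"

def virtex2_rloc(x, y):
    return f"X{x}Y{y}"

def fde_rlocs(bits, width):
    """One RLOC per flip-flop, using the same row formula as Figure 2.1."""
    rlocs = []
    for i in range(bits):
        row = ((width - 1) // 2) - (i // 2)
        rlocs.append(virtex_rloc(row, 0, 0))
    return rlocs

print(fde_rlocs(4, 8))     # ['R3C0.S0', 'R3C0.S0', 'R2C0.S0', 'R2C0.S0']
print(virtex2_rloc(1, 2))  # X1Y2
```

Notice that the whole column of constraints is a function of the parameters `bits` and `width`; this is exactly why hand-written constraints become fragile for parameterised designs, as discussed below.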

The Pebble language [46] supports a simpler but equivalent placement infrastructure by allowing hardware instantiations to be annotated with at (x,y). These co-ordinates are mapped into an appropriate architecture-specific format when the Pebble descriptions are compiled into VHDL [48].

Specifying layouts with absolute co-ordinates is tedious and error-prone, particularly for parameterised hardware descriptions, where placement constraints may work for some combinations of parameters but not others. For parameterised, placed hardware libraries this is a particular issue, since even if the design is functionally correct some instantiations of it may not produce valid layouts.

An alternative is relative placement, where components are instantiated below or beside one another. Relative placement is not as powerful as placement with explicit co-ordinates; however, if properly specified, it removes the possibility of describing incorrect layouts.

The Pebble system has been extended to support relative placement [49, 50]. All block instantiations must be contained within a beside or below block which describes layout on a grid, and beside for and below for constructs are provided to handle iteration. The Pebble system is unique in that relatively placed circuit descriptions can be compiled into parameterised libraries with explicit co-ordinates; however, it does not always do this optimally, as we will discuss in Chapter 3.
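The idea of flattening relative placement into explicit co-ordinates can be sketched as follows. This is our own minimal Python model, not Pebble's actual algorithm or syntax, and it handles only single-cell leaf blocks:

```python
# Minimal model of relative placement: 'beside' offsets the second layout in x,
# 'below' offsets it in y, so every leaf ends up with an explicit co-ordinate.

def leaf(name):
    return {"cells": [(name, 0, 0)], "w": 1, "h": 1}

def _shift(layout, dx, dy):
    return [(n, x + dx, y + dy) for n, x, y in layout["cells"]]

def beside(a, b):
    return {"cells": a["cells"] + _shift(b, a["w"], 0),
            "w": a["w"] + b["w"], "h": max(a["h"], b["h"])}

def below(a, b):
    return {"cells": a["cells"] + _shift(b, 0, a["h"]),
            "w": max(a["w"], b["w"]), "h": a["h"] + b["h"]}

# A 2x2 grid of adders built only from beside/below composition.
grid = below(beside(leaf("add00"), leaf("add01")),
             beside(leaf("add10"), leaf("add11")))
print(sorted(grid["cells"]))
# [('add00', 0, 0), ('add01', 1, 0), ('add10', 0, 1), ('add11', 1, 1)]
```

Because every co-ordinate is derived from the composition structure, no two leaves can collide; that is the sense in which beside/below placement "removes the possibility of describing incorrect layouts".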

Relative placement systems based on higher-order combinators have been demonstrated for Lava [7] and Ruby [24, 26, 31]. Ruby circuits are described as relations between inputs and outputs and can be given layout interpretations through their use of beside and below combinators. Lava provides a more sophisticated layout system designed specifically to target Xilinx FPGAs, with combinators to place components beside each other, below each other or at the same location.

Circuit layouts derived from Ruby relational descriptions cannot be invalid, since only beside or below placement is allowed. The Lava system is more powerful but, because it allows components to be placed on top of each other (desirable in order to instantiate the different hardware primitives within a single slice), can generate invalid layouts.
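The kind of invalidity at stake can be stated very simply. The check below is our own formulation for single-cell primitives, not the verification framework developed in this thesis:

```python
# A layout is invalid if two distinct primitives claim the same cell. Pure
# beside/below composition can never do this; unrestricted placement can.

def clashes(placements):
    """placements: list of (name, x, y). Returns pairs placed on the same cell."""
    seen, bad = {}, []
    for name, x, y in placements:
        if (x, y) in seen:
            bad.append((seen[(x, y)], name))
        else:
            seen[(x, y)] = name
    return bad

print(clashes([("a", 0, 0), ("b", 1, 0)]))  # [] -- a beside layout, always valid
print(clashes([("a", 0, 0), ("b", 0, 0)]))  # [('a', 'b')] -- overlapping placement
```

For parameterised descriptions the difficulty is that such a check must hold for every parameter value, which is why a proof-based approach is needed rather than checking individual instantiations.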

2.1.3 FPGA Architectures

The three main companies producing FPGAs commercially are Altera, Actel and Xilinx. Most FPGAs are based on static RAM, although Actel produces FPGAs based on antifuses [23]. An antifuse is an electrically programmable two-terminal device which changes from high to low resistance when a programming voltage is applied to it. Actel FPGAs are mainly used in military applications.

Altera FPGA architectures are based around simple logic elements which contain a look-up table, a flip-flop and some additional circuitry to implement fast carry chains. Xilinx architectures are based around more complex Configurable Logic Blocks (CLBs), each of which contains four “slices”.

Xilinx Virtex-II [86] is a typical family of FPGAs with 11 members, ranging from 40,000 to 8M system gates. The slices within each CLB are arranged in two columns, with fast connections between the slices in each column for propagating carry signals. Each slice contains two 4-input function generators, carry logic, logic gates, multiplexers and storage elements. Figure 2.2 shows the circuit diagram of the top half of a slice in the Virtex-II architecture.

The 4-input LUTs in each slice are capable of implementing any boolean function of up to four inputs, and the propagation delay of the component is independent of the function being implemented. In addition to the basic LUT, a component particularly worthy of note is the MUXCY multiplexer, which permits the implementation of fast carry signals between the slices arranged vertically in a column. The ORCY component and dedicated Sum of Products (SOP) chain are designed to support the implementation of large SOP expressions.

Figure 2.2: Top half of a Virtex-II slice (Copyright 2000-2005 Xilinx, Inc)
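The claim that a 4-input LUT can implement any boolean function of up to four inputs follows from its being, in effect, a 16-entry truth table. The Python model below is ours and purely illustrative:

```python
# Program a 16-entry LUT from an arbitrary 4-input function, then evaluate it.

def program_lut(fn):
    """LUT contents: one table entry per input combination (a, b, c, d)."""
    return [fn((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1)
            for i in range(16)]

def lut_eval(table, a, b, c, d):
    return table[(a << 3) | (b << 2) | (c << 1) | d]

xor4 = program_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
maj3 = program_lut(lambda a, b, c, d: (a & b) | (b & c) | (a & c))  # d unused

print(lut_eval(xor4, 1, 0, 1, 1))  # 1
print(lut_eval(maj3, 1, 1, 0, 0))  # 1
```

Since evaluation is a single table look-up regardless of the programmed contents, the propagation delay is independent of the function, as noted above.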

Some FPGAs include full instruction processors on-chip, in order to provide some additional general-purpose computing capability. General-purpose processors which incorporate some reconfigurable logic are also under development [4]. These software-configurable processors are designed to combine the benefits of FPGA parallelism with fast general-purpose computation to offer significant performance gains.

2.2 Describing Hardware

With the increasing complexity of electronic systems, computer-aided design (CAD) tools have become increasingly important. For decades, hardware was primarily designed using


schematic capture. Basic elements (logic gates) were selected, placed on a schematic and then connected together. This “bottom-up” design approach can take a long time for large circuits and results in designs that are difficult to change.

In the past 20 years a change has taken place in the methodology of circuit design, and today hardware description languages (HDLs) have become the dominant design mechanism. HDLs offer a number of significant advantages over schematic-based design:

• Designs can be produced at register transfer level (RTL) without choosing a specific fabrication technology. Logic synthesis tools can be used to automatically convert a design for any fabrication mechanism, including generating FPGA bitstreams.

• HDLs facilitate a top-down approach to design [69] that allows successive refinement of a specification.

• HDLs support shorter development phases in projects by allowing functional verification of circuits early in the design cycle.

• A textual description of a circuit, complete with comments, is an easier way to develop and debug circuits than large schematics, which become extremely cumbersome for complex designs.

Hardware languages can be characterised by their programming style and the level of abstraction they provide from the underlying hardware. High-level languages allow hardware to be described more easily but less precisely, and tend to produce circuits more quickly but with lower performance. Lower-level structural languages are closer to traditional schematic capture and allow precise control to produce high-performance circuits.

2.2.1 High Level Approaches

The increasing complexity of integrated circuits is leading to a significant gap between the amount of logic (reconfigurable or otherwise) it is possible to fabricate on a given area of silicon and the capability of circuit designers to make use of this capacity. This “design gap” is leading to considerable interest in methods of describing hardware at ever higher levels of abstraction. Even though circuits generated from high-level descriptions may be less efficient and have worse performance than those that have been subject to painstaking low-level effort, this is often less important than the easier development and shorter time to market they allow.

One approach to high-level description of hardware is to compile imperative languages directly into hardware. Imperative programming languages specify explicit manipulations of the state of a computer system through a series of instructions and are commonly used to write software applications. Unlike in VHDL or other declarative languages, the designer can concentrate on describing an algorithm in a style similar to a conventional programming language.

A number of imperative languages have been proposed for hardware description. Handel-C [12] is a high-level imperative language designed to be compiled into hardware. Available as a commercial system, Handel-C is based on ANSI C with extensions provided specifically for hardware development, including explicit declaration of data widths, parallel processing and communication between parallel elements. Cobble [82] is another imperative language that allows declarative blocks to be used within imperative programs, allowing the benefits of low-level and high-level programming styles to be exploited.

Another approach is to compile higher-level system models directly into hardware. The theoretical development of digital signal processing algorithms in particular is often conducted using mathematical tools like Matlab, and systems have been presented for converting Matlab models into FPGA implementations automatically [36, 57, 58]. Other work includes graphical tools for composing hardware from block diagrams [56] and compilation of a domain-specific language for networking applications into FPGAs [37].

SAFL [54] is a functional language for high-level hardware description. Functional programming treats computation as the evaluation of mathematical functions, emphasising the evaluation of expressions rather than the execution of commands. Pure functional languages encourage the use of formal reasoning about programs and are also characterised by the use of higher-order functions: functions which take other functions as arguments. Functional languages are often considered to have significant advantages over imperative languages [28] for conventional computer programming. Many of these advantages (such as the expressive power of higher-order functions) carry over to hardware design, since they enable more effective capture of common design patterns.

2.2.2 Lower Level Approaches

Lower-level hardware descriptions are typically declarative. Declarative languages describe relationships between constructs, rather than specifying an explicit series of steps to follow, and can describe hardware in a way that corresponds most closely to schematic capture. VHDL [30] and Verilog [81] are the leading HDLs in industrial use, and both allow hardware to be described structurally in this way.

VHDL (VHSIC Hardware Description Language, where VHSIC stands for Very High Speed Integrated Circuit) was developed in the 1980s and was accepted as an IEEE standard in 1987. VHDL supports the development, verification, synthesis and testing of hardware designs and is an extremely complex language with a rich range of constructs, many of which have no obvious hardware implementation. This makes it a powerful language but also a difficult one to learn and to use. Pebble [46] is a simplified variant of structural VHDL that allows hardware to be described through the connection of parameterised blocks.

Hardware ML (HML) [41] is an example of a hardware description language based on the functional programming language SML. HML is designed to combine the advantages of strongly typed languages with the conciseness of untyped languages. Lava [7] is a functional structural HDL based on the popular functional language Haskell. Lava focuses on describing not just the function of circuit elements but also their relative layout: Lava's composition operators compose both behaviour (by connecting the output of one circuit to the input of the next) and layout (by placing one circuit next to another). Ruby [31] is a relational programming language that supports higher-order and polymorphic relations. Ruby supports a relational view of hardware and formal reasoning about design correctness; however, new users require some time to familiarise themselves with its variable-free notation.
some time to familiarise themselves <strong>with</strong> its variable-free notation.


block addthree ((wire a, wire b), wire c) ~ (wire d) {
    wire t.
    (a,b) ; add ; t.
    (c,t) ; add ; d.
}

Figure 2.3: A Quartz block which adds together three numbers

2.3 Quartz

Quartz [62, 65] is a declarative block composition language intended for describing digital circuits. A Quartz description is composed of a series of blocks which are defined by their name, interface type, local definitions and body statements. A block's interface is divided, in a relational style, into a domain and a range. Primitive blocks represent hardware or simulation primitives and control the function of the circuit, while composite blocks control the structure and interconnections of the primitives.

Blocks can be visualised geometrically, typically as four-sided tiles that can be placed on a two-dimensional surface in some interconnected manner. An abstract additional dimension allows for the connection of additional signals such as a clock or static parameter values without disturbing the underlying geometry. Figure 2.3 illustrates the Quartz description and visual representation for a block which adds together three values; the dotted line indicates the division between the block's domain and range. It is common to reason about and refine Quartz and Ruby designs using pictures, and tools have been developed for Ruby to produce design diagrams automatically [25]. This pictorial interpretation is a useful aid to implementation, since interconnection is often minimised by careful placement of components.

Quartz differs from existing relational and functional languages by allowing designers to mix VHDL-like and relational styles in a single design as appropriate. The language is a development of work on Pebble [46] and the Ruby [31] relational calculus.
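The behavioural reading of the addthree block in Figure 2.3 can be mimicked in ordinary code; the sketch below is our own Python analogue, not executable Quartz:

```python
# Each "; add ;" step in Figure 2.3 feeds the range of one block into the
# domain of the next, with t modelling the locally defined internal wire.

def add(pair):
    a, b = pair
    return a + b

def addthree(domain):
    (a, b), c = domain
    t = add((a, b))   # (a,b) ; add ; t.
    d = add((c, t))   # (c,t) ; add ; d.
    return d

print(addthree(((1, 2), 3)))  # 6
```

The nesting of the argument, `((a, b), c)`, mirrors the tupled domain of the block's interface type.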



The Quartz framework is also intended as a platform for experimenting with language and CAD algorithm design. The relative simplicity of the language and the modular design of the compiler simplify experimentation with new language constructs and processing stages. A workflow exists to take Quartz designs from an abstract specification through to real hardware, using a compiler which transforms Quartz into Pebble and then into VHDL.

Quartz higher-order combinators allow circuits to be described quickly and concisely through libraries of common circuit structures such as rows and columns, as well as more complex structures such as trees. Quartz designs can also be parameterised by integer or boolean input variables, and the combination of these features leads to an expressive language which puts the power of the Ruby calculus within a practical VHDL-inspired framework.

2.3.1 Type System and Overloading

Quartz has three basic signal types: wires, integers and booleans. Integers and booleans are used for parameterisation to control the circuit description, while wires are a primitive type which is not evaluated by the compiler during elaboration. Assignment to wires is overloaded, allowing both boolean and integer values to be statically assigned to wires as well as wires being connected together. Although an integer assignment to a wire clearly has no meaning in terms of pure hardware, it can be very useful for simulating word-level descriptions of a design, where a wire-typed signal can represent a data word.
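A software analogue of this overloaded assignment might look like the following. This is our own illustration; Quartz's actual semantics are richer:

```python
# One 'assign' name accepts another wire, a boolean, or an integer word value;
# the last case is what makes word-level simulation convenient.

class Wire:
    def __init__(self):
        self.value = None

    def assign(self, src):
        if isinstance(src, Wire):
            self.value = src.value   # connect two wires
        elif isinstance(src, (bool, int)):
            self.value = src         # static boolean/integer drive
        else:
            raise TypeError("wires carry only wires, booleans or integers")

v, w = Wire(), Wire()
v.assign(42)      # word-level data value, meaningful only in simulation
w.assign(v)
print(w.value)  # 42
```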

The language supports both tuples and vectors of signals. While tuple size is fixed by the designer at coding time, the length of a vector can be parameterised. Blocks also have a block type and can be passed as parameters to other blocks. Any valid polymorphic operation can be carried out on blocks; for example, it is acceptable to build a vector of blocks.

Quartz uses implicit typing and infers most types using the Hindley-Milner type system [14, 27, 53]; however, designers are required to enter type signatures for each block entity. This requirement contributes to the documentation of the code and allows clear and localised feedback on type errors. Combined with the use of tuples rather than lists for fixed-arity groups of signals, this substantially accelerates design development by eliminating time spent chasing confusing type errors.
chasing confusing type errors.


CHAPTER 2. BACKGROUND AND RELATED WORK 17<br />

Quartz also supports overloading. Overloading, or ad-hoc polymorphism, describes the use of a single identifier to produce different implementations depending on context; the standard example is the use of “+” to represent addition of both integers and floating point numbers in most programming languages. Quartz blocks can be overloaded by defining multiple blocks with the same name, a mechanism that has a number of uses, including:

• Primitive blocks can be overloaded when multiple hardware primitives are available which carry out essentially the same operation but with different types.

• Higher-order combinators can be overloaded when multiple blocks have the same basic function but different parameterisations, or different non-functional properties.

• Composite blocks can be overloaded with primitive ones as “wrappers” around the primitives, e.g. to provide multiple different interfaces to the same functionality.

The inclusion of overloading allows the designer to work at a higher level of abstraction than would otherwise be possible. To maintain type inference and permit the automatic resolution of overloading without requiring explicit annotations by the designer, Quartz uses a type inference algorithm based on satisfiability matrix predicates [63, 64] – matrices that represent possible values of a type and relationships between type variables. This system minimises ambiguity and can express n-ary constraints between type variables clearly and easily.

A key feature of the Quartz overloading system is that it permits overloading of blocks with overlapping types. This means, for example, that it is possible to overload a fully polymorphic block with type τ ∼ τ with specific instances for wire ∼ wire and int ∼ int. The overloading mechanism will select the most specific matching block, so in this example the instance with type wire ∼ wire would be used if wire types were supplied, even though the polymorphic instance also matches.
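To illustrate most-specific resolution, the sketch below uses an invented representation that is far simpler than the satisfiability matrix predicates Quartz actually uses: type signatures are tuples of type names, with "var" standing for a fully polymorphic type variable.

```python
# Illustrative sketch only (not the Quartz algorithm): resolve an
# overloaded block name by picking the most specific matching signature.

def specificity(sig):
    # A signature with fewer type variables is more specific.
    return sum(1 for t in sig if t != "var")

def matches(sig, args):
    # "var" matches any type; concrete names must match exactly.
    return len(sig) == len(args) and all(
        s == "var" or s == a for s, a in zip(sig, args))

def resolve(instances, args):
    """Pick the most specific instance whose signature matches args."""
    candidates = [sig for sig in instances if matches(sig, args)]
    if not candidates:
        raise TypeError("no matching instance")
    return max(candidates, key=specificity)

# A fully polymorphic instance overlapping with specific ones:
instances = [("var", "var"), ("wire", "wire"), ("int", "int")]
assert resolve(instances, ("wire", "wire")) == ("wire", "wire")
assert resolve(instances, ("bool", "bool")) == ("var", "var")
```

With wire arguments both the polymorphic and the wire-specific instance match, but the wire-specific one wins; with an unanticipated type only the polymorphic instance remains.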

Satisfiability matrices can also support blocks with different numbers of parameters by using an empty/void type Ω, which can be used to “pad” matrices and block types so that they are all the same length. Ω only unifies with itself, so blocks with the wrong number of parameters are eliminated from the matrix when unification fails. This is a particularly useful mechanism for overloading blocks with versions that have the same function but carry additional tuples of parameters to specify non-functional properties.

2.3.2 Statements

Quartz composite blocks contain one or more statements which can instantiate other blocks. A block is instantiated by connecting the domain/range signals to the values applied to them in the calling environment and then processing any statements contained within the block. If the instantiated block is a primitive then the instantiation leads to an instance of that primitive in the final synthesised hardware.

Quartz provides specific constructs to compose and instantiate blocks in series, without explicit connections, or in parallel. These relational composition operators allow circuits to be described very concisely and also aid formal reasoning and a transformational design style.

Series composition, denoted by the semi-colon operator, connects the range of one block to the domain of another. Parallel composition, denoted by square brackets, allows the “side-by-side” instantiation of blocks that do not communicate. For example, the following Quartz statement uses composition to express that d = ¬(a ∧ b) and e = ¬c:

((a, b), c) ; [and2 ; inv , inv] ; (d, e)
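The meaning of this statement can be sketched by modelling blocks as one-directional functions from domain to range. This is a simplification (Quartz connections are bi-directional) and the helper names series and parallel are invented here:

```python
# Sketch: Quartz blocks as functions, composition as combinators.

def series(*blocks):
    # "b1 ; b2 ; ..." — feed the range of each block into the next.
    def composed(x):
        for b in blocks:
            x = b(x)
        return x
    return composed

def parallel(*blocks):
    # "[b1, b2, ...]" — side-by-side blocks, one tuple component each.
    return lambda xs: tuple(b(x) for b, x in zip(blocks, xs))

and2 = lambda ab: ab[0] and ab[1]
inv = lambda a: not a

# ((a, b), c) ; [and2 ; inv , inv] ; (d, e)
circuit = parallel(series(and2, inv), inv)
a, b, c = True, True, False
d, e = circuit(((a, b), c))
assert d == (not (a and b)) and e == (not c)
```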

As well as the block instantiation statements, Quartz also supports a range of other statement types including loops, conditionals and assertions. One construct worthy of particular mention is the signal connection operation “=”. This is a polymorphic, bi-directional operation expressing the equivalence of two signals. The bi-directional nature of signal connection allows blocks to be re-arranged by higher-order combinators (for example, wiring blocks can be inverted) and still be compiled into VHDL with output = input style assignments.

2.3.3 Formal Reasoning

The higher-order combinators and wiring blocks that form the Quartz prelude library have simple mathematical properties, giving us a useful set of laws for transforming circuits. We can use these laws to reason formally about Quartz circuits, to alter the representation of a circuit, or to refine a circuit from a high-level abstract representation into a more concrete one (see [32] and [74] for illustrations of this approach).

Examples of the definitions at the basis of Quartz are shown below for the Quartz identity and converse relations:

a ; id ; b ⇔ a = b

a ; R⁻¹ ; b ⇔ b ; R ; a

More complex Quartz blocks are usually reasoned about using recursive definitions, such as the one below for repeated series composition (rcomp):

R⁰ = id

Rⁿ⁺¹ = R ; Rⁿ
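This recursive definition transcribes directly into executable form if blocks are again modelled as functions (a sketch, not the Quartz implementation):

```python
# Repeated series composition: R^0 = id, R^(n+1) = R ; R^n.

def identity(x):
    return x

def rcomp(R, n):
    if n == 0:
        return identity
    # "R ; R^n": apply R first, then the remaining n-1 copies.
    return lambda x: rcomp(R, n - 1)(R(x))

inc = lambda x: x + 1
assert rcomp(inc, 0)(5) == 5   # R^0 = id
assert rcomp(inc, 3)(5) == 8   # R ; R ; R
```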

As implemented in the prelude library, most Quartz blocks are defined iteratively rather than recursively, since this leads to a more concise VHDL implementation; however, it is usual to use the recursive definitions for formal reasoning. The equivalence of the iterative and recursive definitions for Quartz blocks can be established formally [62].

Using these simple formal definitions we can prove transformation rules that can be applied to Quartz block instantiation statements, or to entire blocks. For example, Theorem 1 below is a useful precursor to a theorem which allows designers to retime repeated compositions:

Theorem 1  R ; S = S ; R ⇒ R ; Sⁿ = Sⁿ ; R

Proof By induction on n. The base case, n = 0, is straightforward using the relationship R ; id = id ; R. For the induction case n + 1, we can expand the left hand side using the definition of repeated series composition:

R ; S = S ; R ⇒ R ; S ; Sⁿ = Sⁿ⁺¹ ; R

Re-arranging using the precondition gives:

R ; S = S ; R ⇒ S ; R ; Sⁿ = Sⁿ⁺¹ ; R

We are then able to use the induction hypothesis:

R ; S = S ; R ⇒ S ; Sⁿ ; R = Sⁿ⁺¹ ; R

and fold back using the definition of repeated composition to complete the proof:

R ; S = S ; R ⇒ Sⁿ⁺¹ ; R = Sⁿ⁺¹ ; R

When R is a functional block and S is the polymorphic delay element D (which can be implemented using D flip-flops), the precondition of this theorem is called the timeless condition and is the basis of many retiming theorems which allow us to prove the equivalence of pipelined Quartz circuits to their combinational equivalents.
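Although Theorem 1 is proved once for all blocks, its statement can be spot-checked on concrete instances. The sketch below models blocks as integer functions and checks the conclusion for a commuting pair R and S:

```python
# Concrete (not formal) check of Theorem 1:
# if R ; S = S ; R, then R ; S^n = S^n ; R.

def compose(f, g):
    # "f ; g" reads left to right: apply f, then g.
    return lambda x: g(f(x))

def power(S, n):
    # S^n by iterated composition; S^0 is the identity.
    out = lambda x: x
    for _ in range(n):
        out = compose(out, S)
    return out

R = lambda x: x + 2
S = lambda x: x + 3   # additions commute, so the precondition holds

assert all(compose(R, power(S, n))(x) == compose(power(S, n), R)(x)
           for n in range(5) for x in range(10))
```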

Quartz transformations can be used as part of a transformational design methodology where a correct but inefficient or non-implementable specification is gradually refined into a useful implementation. Each step in this refinement is proved correct and thus the final result is correct by design, requiring no additional verification. The T-Ruby design system [74] is an example of a transformational design environment that provides assistance for this process.

Transformational design has its critics [16] and its suitability for general-purpose circuit design remains questionable; however, it can be effective for carrying out certain operations. One issue with transformational design is the question of completeness: whether for every valid specification and implementation there is a chain of correct transformations that can be applied to transform one into the other. The same problem applies to general theorem proving in a language such as Quartz, which is effectively the same process in a different direction. The completeness property is not well understood, but it has been proved that complete transformation systems cannot exist for languages that support unbounded recursion or iteration constructs (such as while loops) [83].

2.4 Automated Verification

In recent years design validation has become the key bottleneck in the digital circuit design industry. Formal methods for verifying the correctness of hardware designs are one possible solution to this problem. With this approach hardware behaviour is described mathematically and a formal proof is produced to verify that the implemented circuit meets a rigorous specification.

The aim of automated verification is to have this formal verification process carried out automatically, or to provide substantial machine assistance allowing users to carry out verification more easily.

Formal verification, however conducted, usually begins with the construction of a specification. This provides a high-level description of the expected behaviour of the system given a particular sequence of inputs. To be useful, a formal specification should be an unambiguous description in some formalism. Logic is a popular formalism for describing hardware functionality, and logics such as first-order logic, higher-order logic and modal/temporal logic have all been used to describe hardware specifications. The choice of formalism depends on the style of verification to be performed (for example, what properties are to be verified). It is also necessary to build an implementation model of the system. For some systems this may be the hardware description itself, but in others it may be necessary to apply some abstraction in order to produce a usable model.

The key to verification is to relate these mathematical models at different levels of abstraction. A set of desired mathematical expressions proved in the specification model should be shown to hold in the implementation model. The correctness of the original specification is absolutely essential to formal verification, since without it no meaningful statements can be made about the implementation model.

One popular means of relating specification and implementation models is model checking. This approach has key benefits (such as its ease of use) but quickly becomes computationally intractable as the size of the hardware to be verified increases. An alternative is theorem proving, where the formal semantics of hardware descriptions are shown to be equivalent through a chain of mathematical reasoning. Such proofs can be extremely large and complex, so mechanised theorem-proving tools are often used to help construct them.

A fundamental point that must be stressed is that formal methods cannot guarantee the correctness of the final product, only of the design process. At some level all formal methods involve assumptions about the correct behaviour of underlying layers: if not the tools that compile a design description into actual hardware, then the operation of the hardware itself. The limits of formal methods for verification must be kept in mind – the techniques do not guarantee correct hardware, but they do promise to remove or reduce error in the most error-prone stages of the process.

2.4.1 Model Checking

Model checking is an automatic, model-based property-verification method that is widely applicable to verification tasks. Checking starts with a model description and attempts to discover whether hypotheses asserted by the user are valid in the model. In this way the model checker can verify properties of the model (such as freedom from deadlocks), or can provide counterexamples in the form of an execution trace which fails the test.

Model checking is based on temporal logic [29], which allows the expression of formulae over transition systems. Model checking is essentially the exploration of the full state space of a system and thus can be highly automated, but the size of the model that can effectively be checked is limited by the practical constraints of computer processor power, memory, etc. Despite this, through considerable practical work on data structures (for example BDDs [11]), circuits of considerable size have been verified using model checkers.
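The essence of explicit-state model checking (state-space exploration plus counterexample traces) can be sketched in a few lines. This toy checker is illustrative only; practical tools rely on symbolic representations such as BDDs to cope with large state spaces:

```python
# Toy explicit-state model checker: breadth-first exploration of the
# reachable states, returning a counterexample trace to the first state
# violating an invariant, or None if the invariant holds everywhere.
from collections import deque

def check_invariant(initial, successors, invariant):
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        if not invariant(s):
            trace = []            # rebuild the path back to the start
            while s is not None:
                trace.append(s)
                s = parent[s]
            return list(reversed(trace))
        for t in successors(s):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return None

# A wrapping 3-bit counter: "counter < 8" holds in every reachable
# state, while "counter < 5" fails with the trace 0, 1, ..., 5.
succ = lambda s: [(s + 1) % 8]
assert check_invariant(0, succ, lambda s: s < 8) is None
assert check_invariant(0, succ, lambda s: s < 5) == [0, 1, 2, 3, 4, 5]
```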

Symbolic Trajectory Evaluation (STE) [73], for example, is a model checking approach designed to verify circuits with very large state spaces, since it is more sensitive to the property being checked than to the size of the circuit. STE grew out of symbolic simulation and is still close to traditional simulation as a verification method.

A number of commercial tools for hardware verification through model checking are available from EDA vendors such as Cadence and Synopsys. Model checking is increasingly used in commercial circuit development as part of the verification process, although it has not totally replaced simulation. Model checking is particularly useful even for systems that are too large for full exhaustive checking (which is most full systems) because it finds counterexamples (state transition traces that do not meet the specification) and so can be used as part of a bug-fixing process. Simulation and model checking can be used in combination to explore a large state space, with simulation used to reach an interesting state and model checking then used to explore exhaustively around that state.

2.4.2 Theorem Proving

In theorem proving the relationship between a specification and an implementation is regarded as a theorem that must be proved in an appropriate formalism.

The broadest interpretation of theorem proving can encompass most methods of formal verification, including checking boolean equivalence and model checking; however, the term is generally applied to mathematical proofs of the properties of systems. The chief advantage of this approach is that the formal proof established through the process can be justified at every step, and thus the overall soundness of the process is ensured. However, the size and complexity of even relatively simple theorems means that proof by a human is often a long and difficult process.

Mechanised theorem-proving systems can be used to aid the proof of large theorems. Despite the name, these systems are generally better regarded as proof assistants than provers, since they usually require considerable human intervention to steer them toward their goal. Theorem provers can often automate trivial stages of proofs, leaving only the difficult parts for humans to tackle, and many can automatically explore possible proofs as a tree search using a variety of different algorithms.

Theorem provers, and the field of automated deduction in general, have a long history, dating back to Robinson's demonstration of resolution as a basis for mechanised deduction [70] in 1965. One of the earliest applications of theorem proving was to geometric problems, as we apply it in this thesis, with Gelernter's geometry-theorem proving machine [19, 20] in 1959. Computational geometry [67] is an active field in its own right that specifically tackles the kind of proofs we consider in Chapter 4. However, geometry-specific algorithms are too restrictive in the type of equations they can process, as we discuss later in this thesis.



2.4.3 Comparison with Model Checking

Although theorem proving tends to require much more human intervention than model checking, it has distinct advantages. Firstly, it does not suffer from the state-space explosion that afflicts model checking as circuits grow in size. Secondly, it is often easy to prove theorems for entire classes of circuits, while model checking tends to be restricted to specific instances of circuits. For example, the Quartz retiming law proved earlier as Theorem 1 can be applied to any circuit described using repeated composition, and similar laws have been proved for common circuit structures such as rows, columns etc.

When a high degree of proof re-use is possible, theorem proving may have an advantage, since key lemmas can be proved once and used in other proofs many times. Also, when model checking is used to completely verify an implementation against a specification, it is often necessary to expend considerable effort on simplifying the implementation model to reduce the state space of the problem to something tractable. Recent work [5] has demonstrated that theorem proving can produce stronger results in a similar amount of time to model checking for the verification of a security architecture, although a greater level of expert involvement in the proof was required.

2.4.4 Logic and Proof

The essential step in all theorem proving is to formulate the problem in some kind of logic, and the choice of logic is an issue of some debate. Problems can be formulated either in “raw logic” or can be embedded in an application-specific notation; however, the power of the underlying logic is key. Simple logics support more automation, and computer-assisted proof search procedures are more likely to be effective; powerful logics, on the other hand, support better specification and embedding.

Set theory and first-order logic are a standard logical foundation for many theorem proving applications. Higher-order logics, which allow more flexibility in the scope of quantifiers and the use of higher-order arguments to functions/predicates, are more expressive than first-order logic; however, they are less “well behaved” and this makes automation more difficult.

In general, it is easier for tool users to switch between different logics, which all share common themes and concepts, than it is to switch between different proof methodologies or theorem proving tools. It is questionable whether there is any benefit in attempting to enforce a standard logic, though it seems likely that any moves in this direction in the future will be driven by EDA tool builders.

Most mature theorem proving tools support both top-down/backward proof and bottom-up/forward proof. Backward proof involves the statement of a theorem goal and the application of rules to split the goal to be proved into subgoals. Each subgoal can then be handled in the same manner, splitting the goals repeatedly until trivial subgoals are reached that can be proved directly from logical axioms. Forward proof proceeds by starting from basic axioms and combining them using rules of inference until eventually the goal to be proved is deduced. Both styles have advantages and drawbacks: every step in a forward proof is correct and proved, but it may not bring the user any closer to proving the main proof goal; a backward proof is guaranteed to terminate at the desired conclusion, but it may not actually be a proof of anything at all unless it can eventually be reduced to something axiomatic.
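Backward proof can be sketched as a recursive search: a goal is proved if it is an axiom, or if some rule splits it into subgoals that can all be proved. The goals and the single splitting rule below are invented for illustration:

```python
# Toy backward prover: rules map a goal to a list of subgoals
# (or None if the rule does not apply); the search bottoms out
# at axioms, with a depth bound to guarantee termination.

def prove(goal, axioms, rules, depth=5):
    if goal in axioms:
        return True
    if depth == 0:
        return False
    for rule in rules:
        subgoals = rule(goal)
        if subgoals is not None and all(
                prove(g, axioms, rules, depth - 1) for g in subgoals):
            return True
    return False

# Goals are strings; one rule splits a conjunction into its conjuncts.
def split_and(goal):
    if " and " in goal:
        left, right = goal.split(" and ", 1)
        return [left, right]
    return None

axioms = {"p", "q"}
assert prove("p and q", axioms, [split_and])
assert not prove("p and r", axioms, [split_and])
```

Each recursive call mirrors one backward step: the goal is replaced by subgoals, and the branch only counts as a proof once every leaf is an axiom.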

There are two main styles of interacting with a theorem prover: declarative or imperative (tactic-style). The imperative style effectively involves the creation of a proof-generating program as a combination of prover tactics in a typically prover-specific format. This style, typified by the descendants of the LCF theorem prover [21], is useful for finding proofs and for programming verification algorithms but produces output that is generally unreadable. As soon as the proof itself, and not just the existence of a proof, becomes important, the declarative style becomes beneficial. This style, pioneered by the Mizar proof checker [71], involves the statement of a series of lemmas or subgoals leading to a conclusion. Declarative systems are good for mechanised checking of proofs and can produce proof scripts that are easily readable by humans, but they are unwieldy and impractical for finding the original proofs.

2.4.5 Theorem Proving Tools

There are a range of theorem proving systems in widespread use. One is HOL, named after Higher Order Logic, its underlying formalism. HOL [22] is intended to be a general platform for the modelling of systems in higher-order logic, with reasoning based on natural deduction with a few primitive inference rules and axioms. HOL has only a few kinds of primitive terms: variables, constants, function applications and λ-abstractions, with all other notations derived from them (it is worth remembering that higher-order logic is based on the typed λ-calculus).

This simple core logic means that eventually all reasoning is reduced to these primitive inference steps, and this approach is quite low level relative to other theorem provers. HOL uses tactics to guide the system in the application of primitive steps toward solving theorem proving goals. A tactic can be regarded as a high-level proof step where the primitive steps necessary to achieve the same functionality are carried out automatically. Tacticals are functions used to combine a series of tactics into a larger step of inference.

HOL has been used extensively in hardware verification, including the verification of full microprocessors [18, 33].

PVS, the Prototype Verification System [59], is a general-purpose interactive verification environment developed at SRI International. The specification language of PVS is based on higher-order logic but also incorporates predicate types and subtypes that allow the definition of partial functions. These constrained types lead to a type checking process that is undecidable, and type correctness may incur additional proof obligations for the user to manage.

Inference steps in PVS proceed at a high level, with primitive rules for operations such as boolean simplification and decision procedures for linear arithmetic. Unlike in HOL, it is therefore not necessary to rely on tactics to the same extent in order to build usable proof steps for the interactive environment. Strategies, which are analogous to HOL tactics, can be constructed to automate a sequence of PVS inference steps.

PVS is less customisable than other theorem provers; however, the high degree of automation makes it a very practical tool. One example of the large-scale use of PVS is the verification of the AAMP5 microprocessor [79], a commercial processor with around half a million transistors.

ACL2 [34] is an automated reasoning system based on Boyer-Moore logic [9], a first-order, quantifier-free logic. ACL2's logic is a very small subset of Common Lisp, a standard list processing language. Models of all kinds of systems can be built in ACL2 and, once written, can be executed as Lisp programs.

Since ACL2 is based on a first-order logic, it is considerably less expressive than theorem provers such as HOL which use higher-order logic; however, the simple logic allows a very high degree of automation with little user intervention required. The user directs the proof search procedure by proving supporting lemmas.

ACL2 has been used to verify some industrial processors, for example a Motorola digital signal processing chip [10].

Isabelle [61] is a descendant of the LCF system [21]. Unlike most theorem provers, which focus on providing a single underlying formalism, Isabelle is intended to be used as a general platform for the implementation of theorem provers in a large variety of logics. The motivation behind the creation of Isabelle is that, while theorem proving is an extremely difficult problem, most of the difficulties have to do with logic in general rather than with any particular logic. A generic theorem prover will probably never provide the full range of support that a dedicated prover for each logic could; however, by reducing the “barriers to entry” it makes it more likely that some proof support will be available for less common or application-specific logics. Furthermore, since it is currently easier to learn a logic than to learn how to use a theorem prover, it is also easier for users to learn a single prover and then use the most appropriate logic for their needs.

2.4.6 Embeddings

In order to use theorem proving for verification of hardware written in a description language, it is necessary to develop an implementation of the language in a logic within a theorem prover. This process is referred to as embedding, and broadly speaking embeddings can be sub-divided [8] into deep embeddings and shallow embeddings.

A shallow embedding, or semantic embedding, involves the definition of the meaning of the language directly in terms of the connectives in the logic. A deep embedding is characterised by the definition of the syntax of a language in a formal logic, typically as some sort of abstract data type, together with the definition of a semantic meaning function.



A shallow embedding is typically much easier to construct than a deep embedding, since the meaning of the language can be encoded directly. With a deep embedding the meaning of the language must be defined as a function over the abstract syntax, which can be cumbersome; however, it does have the advantage that the theorem prover is able to reason over syntactic structures and state theorems about all programs.
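The distinction can be sketched for a tiny gate language (the class names and meaning function below are invented for illustration): a shallow embedding encodes a circuit directly as a host-language function, while a deep embedding represents the syntax as a data type with a separate meaning function, which also permits reasoning over the syntax itself (here, counting gates):

```python
# Shallow embedding: circuits are host-language functions,
# so their meaning comes for free but their structure is opaque.
shallow_nand = lambda a, b: not (a and b)

# Deep embedding: the syntax is a data type...
class Var:
    def __init__(self, name): self.name = name

class Nand:
    def __init__(self, a, b): self.a, self.b = a, b

# ...with a separate semantic meaning function over that syntax.
def meaning(expr, env):
    if isinstance(expr, Var):
        return env[expr.name]
    return not (meaning(expr.a, env) and meaning(expr.b, env))

# Only the deep embedding supports structural analysis of circuits:
def gate_count(expr):
    if isinstance(expr, Var):
        return 0
    return 1 + gate_count(expr.a) + gate_count(expr.b)

circuit = Nand(Var("x"), Nand(Var("x"), Var("y")))
env = {"x": True, "y": False}
assert meaning(circuit, env) == shallow_nand(True, shallow_nand(True, False))
assert gate_count(circuit) == 2
```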

2.5 Isabelle: A Generic Theorem Prover

Although the work presented in this thesis is not dependent on a particular theorem prover, we make extensive use of the Isabelle proof tool, and a basic understanding of its capabilities and how they can be used is helpful for following the work presented in Chapter 4 in particular.

2.5.1 Meta-logic

Isabelle's distinctive feature is its representation of logics within a small fragment of a higher-order logic, called the meta-logic [60].

Isabelle's meta-logic is typed, with basic types and function types σ → τ. The basic types depend on the logic being represented but always include the type prop for propositions. The terms of the meta-logic are essentially those of the typed λ-calculus: constants, variables, abstractions and function applications.

There are essentially three operations <strong>with</strong>in the meta-logic: universal quantification, meta-<br />

implication and meta-equality. Isabelle uses the symbols , =⇒ and ≡ for these operations to<br />

avoid confusion <strong>with</strong> the equivalent operations in object logics. Implication expresses logical<br />

entailment, quantification expresses generality in rules and axiom schemes and meta-equality<br />

is intended for expressing definitions.<br />

Isabelle’s meta-logic is simply a logic like any other complete <strong>with</strong> basic inference rules. For<br />

example, Figure 2.5.1 shows the meta-logic rules for universal quantification and implication<br />

introduction and elimination.



    [φ]
     ⋮
     ψ
  ─────────  (=⇒-I)
   φ =⇒ ψ

   φ =⇒ ψ    φ
  ─────────────  (=⇒-E)
        ψ

      φ
  ─────────  (⋀-I)
    ⋀x. φ

    ⋀x. φ
  ─────────  (⋀-E)
    φ[b/x]

Figure 2.4: Isabelle meta-logic inference rules

The power of Isabelle lies in the ability to use the meta-logic to represent the inference rules of other logics. Object logics are formalised by extending Isabelle's meta-logic with types, constants and axioms. The natural deduction rules of object logics are represented by meta-level axioms. For example, the rules for introduction and elimination of the logical and operation in first-order logic can be expressed as:

    P    Q
  ─────────  (∧-I)
    P ∧ Q

    P ∧ Q
  ─────────  (∧-E1)
      P

    P ∧ Q
  ─────────  (∧-E2)
      Q

Declared as axioms in the Isabelle meta-logic, these inference rules can be described by:

⟦P; Q⟧ =⇒ P ∧ Q
P ∧ Q =⇒ P
P ∧ Q =⇒ Q

where the nested implication φ1 =⇒ (· · · φn =⇒ ψ) can be abbreviated as ⟦φ1; . . . ; φn⟧ =⇒ ψ, which allows the easy expression of a rule with n premises. The syntactic resemblance between the meta-level axioms and the original inference rules is a happy coincidence arising from the similarity of the logic being represented to the meta-logic. In general, Isabelle possesses sophisticated mechanisms for supporting object logics with syntax independent from the meta-logic, through syntax declarations and transformation rules which can be used to rewrite parsed abstract syntax trees.



The question can of course be raised as to whether the meta-logic representation is correct. Paulson [60] defines a meta-logic formalisation to be faithful if it admits no incorrect object-level inferences and adequate if it admits all correct object-level inferences.

2.5.2 Theories

The basic building blocks of Isabelle mathematics are theories, which organise syntax, declarations, axioms and proofs. Theories are built by extending and combining existing theories, starting from the Pure theory, which represents the meta-logic.

Isabelle theories support multiple inheritance, and theory dependencies form a directed acyclic graph (DAG). Theories can declare additional syntax for constants (operators) within a logic using a priority grammar, where each nonterminal is annotated with an integer priority which controls how it is parsed. Mixfix annotations allow the formulation of sophisticated grammar productions to produce readable notation. Special support exists for variable-binding constructs such as quantifiers, which can be declared as binders.

Figure 2.5 shows the definition of a minimal logic of implication in Isabelle. Line 1 begins the MinLogic theory by stating that it inherits directly from the Pure meta-logic. Lines 2–5 declare a type o of object logic formulae, and line 7 declares a coercion from formulae to propositions. This allows object-level operators to be defined over the type o rather than the general meta-logic prop type, which is important to prevent object logic operations being applied to meta-logic propositions themselves.

The consts section defines the constants and operators of the logic, annotating them with their types (the short double arrow ⇒ indicates a function type, and [a, b, c] ⇒ d abbreviates a ⇒ b ⇒ c ⇒ d). Lines 7 and 8 demonstrate the use of mixfix syntax annotations to describe how the constructs should be parsed; for example, infixr indicates that the implication symbol should be regarded as a right-associative infix operator.

The axioms section declares the three inference rules of the logic as meta-logic axioms.

Isabelle provides a wide range of existing theories, grouped into complete object logics. The most commonly used Isabelle logics are first-order logic, ZF set theory (which is built as an extension of first-order logic) and an implementation of higher-order logic in the style of the HOL system.

1  theory MinLogic = Pure :
2  types
3    o
4  arities
5    o :: logic
6  consts
7    Trueprop :: "o ⇒ prop"    ("_" 5)
8    "−→"     :: "[o,o] ⇒ o"   (infixr 10)
9    False    :: "o"
10 axioms
11   impI   : "(P =⇒ Q) =⇒ P −→ Q"
12   impE   : "⟦P −→ Q; P⟧ =⇒ Q"
13   falseE : "False =⇒ P"
14 end

Figure 2.5: A minimal Isabelle logic

2.5.3 Unification, Resolution and Proof

Unification [70] is equation solving: for example, solving f(?x, c) ≡ f(d, ?y) yields ?x ≡ d and ?y ≡ c. Isabelle uses higher-order unification, which operates on typed λ-terms, as an equation-solving mechanism to support the application of rules to goals. Higher-order unification also handles function unknowns, so it must guess the unknown function ?f in order to solve an equation such as ?f(t) ≡ g(u1, . . . , uk). Isabelle denotes unknowns for unification (called schematic variables) by prefixing their names with a question mark. Logically, schematic variables are similar to free variables; however, while ordinary variables remain fixed, unknowns may be instantiated by unification.
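The flavour of unification as equation solving can be illustrated with a small first-order sketch. Note that this is far simpler than the higher-order unification Isabelle actually uses, and the term encoding is our own (it also omits the occurs check):

```python
# A minimal first-order unification sketch (not Isabelle's higher-order
# algorithm): terms are ("var", name) for schematic variables like ?x,
# or (fname, arg, arg, ...) for applications and constants.
def walk(t, subst):
    """Follow variable bindings to their current value."""
    while t[0] == "var" and t[1] in subst:
        t = subst[t[1]]
    return t

def unify(t1, t2, subst=None):
    """Return a substitution making t1 and t2 equal, or None on clash."""
    if subst is None:
        subst = {}
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if t1[0] == "var":                       # bind an unknown
        return {**subst, t1[1]: t2}
    if t2[0] == "var":
        return {**subst, t2[1]: t1}
    if t1[0] != t2[0] or len(t1) != len(t2):  # head/arity clash
        return None
    for a, b in zip(t1[1:], t2[1:]):          # unify arguments pairwise
        subst = unify(a, b, subst)
        if subst is None:
            return None
    return subst

# Solving f(?x, c) ≡ f(d, ?y) gives ?x = d and ?y = c:
s = unify(("f", ("var", "x"), ("c",)), ("f", ("d",), ("var", "y")))
```

Running the example produces the substitution from the text: ?x bound to d and ?y bound to c.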

Resolution is used to combine two Isabelle theorems, renaming variables and instantiating unknowns as necessary to allow rules to be unified with the current proof state to create the next state. Resolution can be used for both forward and backward proof, although backward proof tends to be the preferred proof mechanism in Isabelle. A meta-level theorem such as ⟦A; B⟧ =⇒ C can be regarded as an inference rule with premises A and B and conclusion C, or can equally be viewed as a proof state with subgoals A and B and main goal C.

In backward proof a goal is unified with the conclusion of a rule and the premises are created as new subgoals. For example, consider the trivial proof of the theorem A −→ A in the logic defined in Figure 2.5. This proof can be represented in Isabelle as:

theorem "A −→ A"
apply (rule impI)
apply (assumption)
done

The first instruction is for Isabelle to apply the rule impI, which lifts the object-level implication to the meta-logic implication A =⇒ A (the unknowns P and Q in the axiom are both unified with the constant A). From this state the proof can be completed using the assumption method, which attempts to solve the right-hand side of a meta-implication using the assumptions on the left. A =⇒ A can be proved this way, leaving an empty (complete) proof state, and the proof is finished with the done command.

Isabelle can also solve simple theorems automatically. For example, the theorem (in higher-order logic) that reversing a list twice yields the original list can be proved as:

theorem rev_rev [simp]: "rev (rev xs) = xs"
apply (induct xs)
apply (auto)
done

This theorem has been named rev_rev and declared as a simplification rule. It is proved by the application of induction on the variable xs, and then automatically by Isabelle. The auto method uses both of Isabelle's two most useful automation tools, the simplifier and the classical reasoner.

The simplifier is a term rewriting tool. It repeatedly applies equations from left to right, using declared sets of simplification rules. Virtually any theorem of the form A = B can be used as a simplification rule; however, generally only rules which genuinely simplify the proposition should be declared as automatic simplification rules. In the example above, xs is clearly simpler than rev (rev xs), so this theorem can safely be used as a simplification rule. If the theorem were proved in reverse – xs = rev (rev xs) – then it would not be a suitable simplification rule: not only does it fail to simplify the proposition, it would actually cause the simplifier to loop infinitely by continually expanding the xs on the right-hand side. It is often useful to invoke the simplifier by itself, which can be done using Isabelle's simp method.
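The effect of a well-oriented rule can be seen in a toy rewriter. This Python sketch (our own simplified model, not Isabelle's simplifier) applies the rule rev (rev t) = t from left to right until no redex remains; reversing the orientation would instead grow the term without bound, which is why a step limit guards the loop:

```python
# Toy demonstration of simplification-rule orientation. Terms are nested
# tuples ("rev", t) or the atom "xs". The well-oriented rule
# rev (rev t) = t strictly shrinks the term, so rewriting terminates.
def step(term):
    """Apply rev(rev t) -> t at the outermost position, else inside."""
    if isinstance(term, tuple) and term[0] == "rev":
        inner = term[1]
        if isinstance(inner, tuple) and inner[0] == "rev":
            return inner[1]            # the redex: rev (rev t) -> t
        reduced = step(inner)          # otherwise rewrite inside
        if reduced is not None:
            return ("rev", reduced)
    return None                        # no redex anywhere

def simp(term, max_steps=100):
    """Rewrite until no rule matches (the simplifier's normal form)."""
    for _ in range(max_steps):
        new = step(term)
        if new is None:
            return term
        term = new
    raise RuntimeError("simplifier did not terminate")

# rev (rev (rev (rev xs))) simplifies to xs:
result = simp(("rev", ("rev", ("rev", ("rev", "xs")))))
```

With the rule oriented the other way, every application would introduce two new rev constructors, so `simp` would hit the step limit instead of reaching a normal form.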



The classical reasoner is a family of tools which perform backward proof automatically. It carries out backward proof search and, combined with the simplifier, can prove many theorems without much user intervention. The classical reasoner can decompose goals into less complex subgoals using pre-proved lemmas, and thus can be guided by supplying sets of useful lemmas that substantially shorten the proof.

2.6 Summary

In this chapter we have introduced some of the background to the work described in this thesis and highlighted some related work. We have described FPGA architectures in general and examined the structure of the Xilinx Virtex-II FPGA family in more detail. We have described Quartz, the hardware description language we use as the basis of this work, and shown how it supports formal reasoning.

We have described the field of automated verification based on theorem proving and compared this approach with model checking. Finally, we have discussed in detail some of the capabilities of the Isabelle generic theorem prover we use in this work.


Chapter 3

Generating Parameterised Libraries with Layout

In this chapter we describe how the Quartz framework can be extended to allow the generation of parameterised hardware libraries with layout information. Section 3.1 introduces our motivation for supporting hand placement of designs and discusses the benefits of relative placement. Section 3.2 introduces the basic concepts behind our placement infrastructure and Section 3.3 discusses what is required to provide language support for hand placement in Quartz. Section 3.4 demonstrates how block sizes can be described by extending the class of Quartz expressions and Section 3.5 demonstrates how the Quartz layout system can describe parameterised combinators and full designs. Section 3.6 shows how blocks can be given multiple layout interpretations. Section 3.7 describes the compilation of placed Quartz into an extended version of Pebble 5, while Section 3.8 describes how this can then be compiled into parameterised VHDL. Section 3.9 summarises this chapter and discusses related work.

3.1 Introduction

Placement and routing are critical steps in the compilation of a high-level hardware description onto the reconfigurable fabric of an FPGA. The effectiveness of the placement and routing algorithms has a significant impact on the performance of the resulting circuit, since


CHAPTER 3. GENERATING PARAMETERISED LIBRARIES WITH LAYOUT 35<br />

a badly placed design will feature unnecessarily long wire lengths, with accompanying delays and an impact on maximum clock frequency.

While modern place and route systems based on simulated annealing can achieve excellent results, for the highest performance it is still common to intervene manually in the placement of designs; much of the value of intellectual property cores, such as those produced by the Xilinx Core Generator tool, is that they are carefully laid out to provide good performance. Manual placement is of particular value when a human designer can recognise and exploit an underlying regularity in a high-level description.

Ensuring optimal resource use and layout is particularly important for parameterised hardware libraries, since any inefficiency will affect all designs that use them. Controlling placement is also desirable for reconfigurable circuits to support partial reconfiguration, since identical components at identical locations common to two different configurations do not need to be reconfigured when switching between them.

The Xilinx “RLOC” placement macro allows primitive hardware components to be placed relative to each other in a structural design description. RLOCs provide a relatively low-level interface to influence the place and route system and allow the production of parameterised, fully or partially laid-out designs in a conventional hardware description language like VHDL or Verilog. However, supplying explicit coordinate information for every component in a large circuit is tedious and error-prone.

Relative placement, laying out components beside or below one another, has been proposed for producing designs. Systems that support this technique include Ruby [26], Lava [7] and Pebble [49, 50]. Neither the Ruby-based nor the Lava-based systems that have been developed maintain full parameterisation while processing layout, which makes them unsuitable if the desired goal is to produce parameterised library descriptions. The Pebble system does maintain parameterisation while converting relative positions into explicit coordinates; however, the generated layouts are not necessarily optimal and designers are restricted to using relative positioning only, without any explicit coordinates.

In this chapter we develop a system that allows the combination of relative and explicit placement information in a single framework, and the compilation of these descriptions into FPGA libraries in the industry-standard language VHDL, while maintaining a high degree of parameterisation.

3.2 Placement Infrastructure

Our system is based around the Quartz framework [65]. Quartz is a simple language that incorporates features found in a wide range of other languages, thus providing a good infrastructure for seeing the impact of different features on the layout system. This also means that the framework we develop can be adapted to many other languages merely by removing aspects the language does not support.

Quartz supports polymorphism, higher-order combinators, overloading and compilation to parameterised VHDL via the Pebble system. The language also supports both recursion and explicit iteration with a for loop structure. While the favoured method of repetition in functional languages is recursion, parameterised circuit descriptions are more commonly written iteratively, so support for both mechanisms is useful.

Quartz is a higher-order language that supports hardware blocks which can be parameterised by other blocks. These higher-order combinators can be leveraged to describe component placement as well as functionality, thus eliminating the need to add specific language constructs for below, beside, below for, etc., as was the focus of the earlier work on parameterised placement in Pebble.

Two key concepts underlie our Quartz placement infrastructure: an explicit placement at (x, y) command for block instantiations, and a hierarchical approach to placement in which explicit coordinates within each block are relative to a local origin.

However, in a higher-order language such as Quartz, absolute explicit coordinates are even less useful than in a language such as VHDL. Because hardware blocks can be passed as higher-order parameters, the actual block instantiated by a particular piece of code can be unknown at design time, making the problem of specifying an absolute coordinate location for the next block to place intractable, since it depends on the precise size of the unknown block. It is therefore necessary to support relative positioning even in explicit coordinates, so that, for example, block R can be positioned explicitly at an offset of the (unknown) size of block S. We do this by providing height() and width() functions that can be used in expressions to refer explicitly to the size of a particular block.

Quartz combinators can therefore be used to provide relative placement capabilities, though they are themselves described in terms of explicit coordinates. This approach has the advantage that layout and function can be described by the same code, and in many cases a geometrically sensible layout is a natural by-product of the functional description.
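The interplay of hierarchical local origins and size-relative offsets can be modelled in a few lines. The following Python sketch is illustrative only: the Block class and its methods are our own hypothetical model, not part of Quartz:

```python
# A hypothetical model of hierarchical placement: each block places its
# children at coordinates relative to its own local origin, and a child
# can be offset by another block's (possibly unknown-at-design-time) size.
class Block:
    def __init__(self, width, height):
        self.width, self.height = width, height
        self.children = []                 # (child, local_x, local_y)

    def place(self, child, x, y):
        """Instantiate `child` at local coordinates (x, y)."""
        self.children.append((child, x, y))
        return self

    def absolute(self, origin=(0, 0)):
        """Flatten the hierarchy into absolute placements."""
        ox, oy = origin
        out = []
        for child, x, y in self.children:
            out.append((child, ox + x, oy + y))
            out.extend(child.absolute((ox + x, oy + y)))
        return out

# Place R beside S at an offset of S's width, mirroring
# "R at (width(S), 0)" in the text:
S, R = Block(4, 2), Block(3, 2)
top = Block(7, 2).place(S, 0, 0).place(R, S.width, 0)
positions = {id(b): (x, y) for b, x, y in top.absolute()}
```

Because R's coordinate is computed from S's width rather than written as a literal, swapping in a differently sized S would reposition R automatically, which is the point of size-relative explicit placement.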

The Quartz layout system provides an architecture-independent abstract homogeneous grid of resources. This allows the layout infrastructure to be developed in an architecture-independent manner, and layout information can be translated into the specific format required for the target architecture during compilation. The assumption that the available resources are arranged in a homogeneous grid is an approximation to the complexity of modern FPGAs, where the device fabric will typically also include specialised resources such as embedded RAM or fast multipliers; however, it is sufficiently close to reality for practical use.

The Quartz layout grid is composed of elements of uniform size 1 × 1, so all positions on the grid can be addressed using purely integer coordinates. This is, once again, an approximation to FPGA structure, where a single computational block has multiple resources; for example, the Xilinx Virtex-II slice illustrated in Chapter 2 includes a look-up table as well as other logic gates and multiplexers. How this simplified grid is used to describe real circuits will be demonstrated in Chapter 6.

3.3 Requirements

Figure 3.1 shows the different Quartz statements. Each type of statement produces particular kinds of structures with specific layout requirements. Our layout infrastructure must be able to support all of these structures. It is important to note that what we are essentially discussing is how the size of each statement can be described: if the explicit coordinate system is to support fully relative placement then it must be possible to place statements relative to the sizes of other statements. The same approach can be extended to whole blocks, and each block should be given a size function, parameterised in the same way as the block



〈stmt〉 ::= assert (〈expr〉) "〈string〉" .
        | 〈expr〉 = 〈expr〉 .
        | 〈arg〉 ; 〈blkref〉 (; 〈blkref〉)* ; 〈arg〉 .
        | for 〈id〉 = 〈expr〉..〈expr〉 { 〈stmt〉* } .
        | if (〈expr〉) { 〈stmt〉* } ( else { 〈stmt〉* } )? .

〈blkref〉 ::= 〈id〉 〈arg〉* | [ ] | [ 〈blkref〉 (, 〈blkref〉)* ]

〈arg〉 ::= 〈id〉 〈vecindex〉 | 〈expr〉 | ( ) | ( 〈arg〉 (, 〈arg〉)* )

Figure 3.1: Grammar of Quartz statements

itself, which describes the size of the statements within the block.

Assertion statements are compile-time directives that check the validity of particular preconditions and thus have no effect on the resulting layout. The assignment operation has three (overloaded) uses:

1. Assigning values to variables which may control elaboration.

2. Assigning static values to wires.

3. Connecting wires together.

(1) clearly has no direct effect on layout (save that it may affect the way later statements are processed). In our model, which concentrates on layout rather than routing, we assume that (2) and (3) also have no effect on layout. This is a reasonable assumption, since FPGA routing resources are independent of computational ones and it is not generally necessary to allocate computational logic resources to either of these functions 1.

Conditional statements allow different hardware to be generated depending on the evaluation of a boolean expression. This is significant because the two possible branches of the conditional can have different sizes, and thus any reference to the size of the overall conditional must take this into account. The easiest way to do this is to allow placement expressions to contain conditionals of their own, a simple and remarkably powerful approach.

Loops allow the instantiation of multiple hardware blocks. An important realisation for the general use of these constructs is that each iteration of the loop need not have the same size (though it often will); for example, the loop statements could themselves contain a conditional. Additional functions will need to be added to the placement expressions to allow placement relative to arbitrary loop constructs to be described.

1 Issues with insufficient routing resources are unusual in most cases when using the standard placer and router. However, if a lot of logic is packed with LOC/RLOC constraints regardless of the available routing resources, this could increase the risk of routing congestion, making it possible to describe circuits that cannot be routed.

Block instantiations include not just instantiations of individual blocks but also series and parallel compositions of blocks. Series and parallel compositions must be given their own layout interpretation, to ensure that blocks within a composition are laid out correctly, while the whole composition itself can be placed using the at (x, y) command.
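A minimal sketch of one such layout interpretation, assuming series composition tiles blocks left to right and parallel composition stacks them, is shown below. This is an illustrative choice of ours, not necessarily Quartz's actual rule:

```python
# One plausible layout interpretation for compositions (an assumption
# for illustration): series tiles horizontally, parallel stacks
# vertically; each returns local placements plus the bounding box.
def series(blocks):
    """blocks: list of (width, height). Horizontal tiling."""
    placements, x = [], 0
    for w, h in blocks:
        placements.append((x, 0))   # each block starts where the last ended
        x += w
    return placements, x, max(h for _, h in blocks)

def parallel(blocks):
    """blocks: list of (width, height). Vertical stacking."""
    placements, y = [], 0
    for w, h in blocks:
        placements.append((0, y))
        y += h
    return placements, max(w for w, _ in blocks), y

# Two 2x1 blocks: in series they occupy 4x1, in parallel 2x2.
_, sw, sh = series([(2, 1), (2, 1)])
_, pw, ph = parallel([(2, 1), (2, 1)])
```

The composition's own bounding box is what the enclosing at (x, y) command would then offset, so the internal relative layout and the external explicit placement compose cleanly.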

An important requirement for a truly general Quartz framework is that it must be able to support the full range of Quartz block parameterisation. Two features of Quartz that differ from languages like Pebble/VHDL are a particular issue here:

• Quartz blocks are relational and may have multiple possible interpretations in terms of inputs/outputs. During compilation a single input/output arrangement is selected for each block; however, when describing the size of a block in terms of its inputs it is not always clear which parameters supplied to the block are inputs and which are outputs.

• Variable parameters are not restricted to a particular “generics” region of the block interface and can be distributed anywhere throughout the block's domain/range. Furthermore, Quartz blocks can output variables as well as being parameterised by input variables. The compilation of one block may therefore produce a value which impacts the compilation of a later block.

Since a block size can depend on any variable parameter, it is important that the block size function should be in terms of all input variables, whether in the domain or range. However, this is complicated further by polymorphism: which signals are variables and which are real hardware is not in general known. While a polymorphic variable cannot affect the compilation, and hence size, of the immediate block (otherwise its type would have been deduced as either int or bool during the type-checking stage), it could be supplied as a parameter to a higher-order parameter block and (sometimes) affect that block's size, hence affecting the calling block's size indirectly. In order to support this possibility, block size functions must be phrased in terms of all domain and range signals, since any one of them



could affect the block’s elaboration.<br />

A good example of the usefulness of not restricting variables to a specific region of block parameters is shadow values [75, 76]. This refers to hardware wires being “paired” with variables which control the instantiation behaviour of ‘clever components’: components which take account of the context (as expressed by the variable values paired with each wire) to generate appropriate hardware. Sheeran describes the use of shadow values to control the elaboration of a somewhat regular circuit for finding the median of a set of values, and we discuss the use of similar (though differently motivated) constructs for specialising designs in Chapter 5 of this thesis.

3.4 Block Sizes

3.4.1 Size Expressions

Unlike a functional language such as Haskell, and in a style more similar to a language like VHDL, Quartz keeps statements distinct from arithmetic or logical expressions. In the previous section we established that, since block sizes can vary depending on statement execution, the system for describing block sizes must mirror statement structures; we therefore extend Quartz expressions with new constructs to facilitate this.

Figure 3.2 shows how Quartz expressions have been augmented. The height and width functions allow an expression to refer explicitly to the size of a block or block composition. They are parameterised not just by a block name but by a full block instantiation, such as (a, (b, c)) ; sndadd ; mult ; d. This allows all possible parameterisation information to be captured. max is an n-ary operator which selects the maximum value from a set of expressions. The conditional, if(cond, etrue, efalse), chooses one of two values depending on the value of the conditional test.

The sum(i = l..u, e(i)) function sums an expression over a range, implementing ∑_{i=l}^{u} e(i). Unlike the mathematical definition of summation, if the lower bound is greater than the upper bound the function returns zero, rather than being undefined. The maxf function is parameterised in the same way and selects the maximum value of the expression over the range,



〈expr〉 ::= 〈expr〉 〈bop〉 〈expr〉
        | 〈uop〉 〈expr〉
        | 〈var〉
        | 〈num〉
        | true | false
        | height ( 〈blkinst〉 )
        | width ( 〈blkinst〉 )
        | max ( 〈expr〉 , 〈expr〉 (, 〈expr〉)* )
        | if ( 〈expr〉 , 〈expr〉 , 〈expr〉 )
        | sum ( 〈id〉 = 〈expr〉 .. 〈expr〉 , 〈expr〉 )
        | maxf ( 〈id〉 = 〈expr〉 .. 〈expr〉 , 〈expr〉 )

〈bop〉 ::= and | or | nand | nor | xor | xnor
        | + | - | * | / | ** | mod | == | !=
        | < | <= | > | >=

〈uop〉 ::= - | abs | not

Figure 3.2: Grammar of Quartz expressions

once again returning zero if the lower bound is greater than the upper bound.

More formally, the semantics of Quartz expressions are described by the evaluation function E. The clauses describing the semantics of the new operations are shown in Figure 3.3. The concept of a “function” is kept clearly restricted to block height and width expressions, while elsewhere the semantics are based on substitution (shown as {a ↦ b}) rather than λ-abstraction. This distinction exists because Quartz expressions must be compiled into Pebble/VHDL expressions without functions; λ-calculus-like semantics are restricted to block size functions since these will be eliminated during Quartz compilation. We will show in Chapter 4 how λ-calculus style functions can be used to achieve the same semantics.

Note that we have extended expressions <strong>with</strong> max and maxf operators but have not included<br />

their duals min and minf. While these might be desirable in the interests <strong>of</strong> symmetry, we do<br />

not actually require minimum-finding operations for generating layouts, and in any event<br />

expressions using these functions can be manipulated to eliminate them using the following<br />

two relationships:<br />

Theorem 2 ∀a b. min(a, b) = −max(−a, −b)<br />

Theorem 3 ∀e1 e2 e3. minf(i = e1..e2, e3) = −maxf(i = e1..e2, −e3)
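These correspondences can be spot-checked numerically. The sketch below (Python; `minf_range`/`maxf_range` are illustrative names, with both returning 0 on an empty range as in the Quartz semantics) exercises both theorems, including the empty-range case:

```python
def maxf_range(l, u, e):
    # maxf(i = l..u, e(i)); 0 on an empty range
    return max(e(i) for i in range(l, u + 1)) if l <= u else 0

def minf_range(l, u, e):
    # the hypothetical dual, with the same empty-range convention
    return min(e(i) for i in range(l, u + 1)) if l <= u else 0

# Theorem 2: min(a, b) = -max(-a, -b)
assert all(min(a, b) == -max(-a, -b)
           for a in range(-3, 4) for b in range(-3, 4))

# Theorem 3: minf(i = e1..e2, e3) = -maxf(i = e1..e2, -e3)
e = lambda i: i * i - 3 * i
assert minf_range(0, 5, e) == -maxf_range(0, 5, lambda i: -e(i))
assert minf_range(5, 0, e) == -maxf_range(5, 0, lambda i: -e(i))  # empty: both 0
```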



E :: SizeEnv → BlockEnv → VarEnv → Exp → Exp<br />

Eσ β µ height(x ; bs ; y) =<br />

let (w,h) = SIσ βbs in Eσ β µh(x, y)<br />

Eσ β µ width(x ; bs ; y) =<br />

let (w,h) = SIσ βbs in Eσ β µw(x, y)<br />

Eσ β µ max(e1, e2, . . . , en) =<br />

let e ′ 1 = Eσ β µe1 in<br />

let e ′ 2 = Eσ β µe2 in<br />

.<br />

let e ′ n = Eσ β µen in<br />

max (e ′ 1 , e′ 2 , . . ., e′ n)<br />

Eσ β µ sum(i = e1..e2, f) =<br />

let e′2 = Eσ β µe2 in<br />

if e′2 < Eσ β µe1 then 0<br />

else Eσ β µ {i ↦→ e′2}f + sum(i = e1..(e′2 − 1), f)<br />

Eσ β µ maxf(i = e1..e2, f) =<br />

let e ′ 1 = Eσ β µe1 in let e ′ 2 = Eσ β µe2 in<br />

if e ′ 2 < e′ 1 then 0<br />

else if e ′ 2 = e′ 1 then {i ↦→ e′ 2 }f<br />

else Eσ β µmax({i ↦→ e ′ 2 }f, maxf(i = e1..(e ′ 2 − 1), f))<br />

Figure 3.3: Semantics <strong>of</strong> size expressions<br />

Pro<strong>of</strong> Theorem 2 is proved by expanding the definitions <strong>of</strong> min and max and re-arranging.<br />

Theorem 3 is proved by analysing the cases where e1 > e2 and e1 ≤ e2 separately. In the<br />

first case, both functions return zero and thus the proposition holds; in the latter case the pro<strong>of</strong><br />

is by induction on e2 (<strong>with</strong> a base case <strong>of</strong> e1 = e2) and then re-arrangement using Theorem<br />

2. Mechanised pro<strong>of</strong>s are given as theorems min_max_corres and minf_maxf_corres in<br />

Appendix B.11.<br />

3.4.2 Size <strong>of</strong> Block Instantiations<br />

The function SIσ β returns the size functions for a particular block instantiation. It is<br />

parameterised by two environments: σ maps block identifiers to a pair <strong>of</strong> (width, height)<br />

functions, while β maps block identifiers to their definition. The function definition can be seen in<br />

Figure 3.4. The semantic function for blocks, Bβ, generates a logical predicate corresponding<br />

to the definition <strong>of</strong> a block identifier; SIσ β uses the function B′β, which carries out the same<br />

operation for block instantiation statements. These functions are defined in Figure 4.8 on<br />

page 81; however, their precise definition is mainly relevant to layout verification in Chapter 4<br />

rather than compilation.



SI :: SizeEnv → BlockEnv → Blkinst → (Exp × Exp)<br />

SIσ β bid p1 . . . pn =<br />

let (w, h) = σ(bid) in (w(p1 . . . pn), h(p1 . . . pn))<br />

SIσ β [ b1 , . . . , bn ] =<br />

let (w1, h1) = SIσ βb1 in<br />

.<br />

let (wn, hn) = SIσ βbn in<br />

(λ(x1, . . .,xn)(y1, . . . , yn). max(w1(x1, y1), . . . , wn(xn, yn)),<br />

λ(x1, . . . , xn)(y1, . . .,yn). h1(x1, y1) + · · · + hn(xn, yn))<br />

SIσ β b1 ; b2 =<br />

let (w1, h1) = SIσ βb1 in<br />

let (w2, h2) = SIσ βb2 in<br />

(λxy. let s = (ιs. B ′ β b1(x, s) ∧ B ′ β b2(s, y)) in w1(x, s) + w2(s, y),<br />

λxy. let s = (ιs. B ′ β b1(x, s) ∧ B ′ β b2(s, y)) in max(h1(x, s), h2(s, y)))<br />

Figure 3.4: Calculating a size function for a block instantiation<br />

Figure 3.5: Sizes generated for block compositions: (a) parallel composition; (b) series composition



Where a single block is instantiated the width and height functions are extracted from the σ<br />

environment and are returned after the application <strong>of</strong> curried parameters p1 . . . pn. The size <strong>of</strong><br />

a parallel composition <strong>of</strong> blocks is calculated as though the blocks are arranged vertically, <strong>with</strong><br />

the overall height being the sum <strong>of</strong> the height <strong>of</strong> each individual block and the overall width<br />

being the maximum width <strong>of</strong> any block in the composition. The size <strong>of</strong> a series composition<br />

<strong>of</strong> blocks is calculated as though the blocks are arranged horizontally in a similar manner.<br />

Series composition is represented as an associative binary operator (an n-element composition<br />

can be represented as nested series compositions) for convenience. In the series composition<br />

size function, the internal signal s is defined in terms <strong>of</strong> the ι definite description operator. This<br />

operator, in the form ιx. P(x), can be read as “the x such that P(x) holds”. In this case<br />

the internal signal s is the one that obeys the predicates produced by the application<br />

<strong>of</strong> function B′ to the two blocks. This is made necessary by the relational nature <strong>of</strong> Quartz<br />

blocks, which means it is not possible to state whether block b1 or block b2 generates the value<br />

s (or some combination <strong>of</strong> the two, if it is a tuple <strong>of</strong> values). This logical nicety is practically<br />

irrelevant for us in this chapter since we will convert relational Quartz descriptions into<br />

functional ones during compilation while evaluating any size functions.<br />

The grey boxes <strong>with</strong> dotted outlines in Figure 3.5 illustrate the sizes calculated for series and<br />

parallel compositions diagrammatically.<br />
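The two composition rules can be summarised concretely. The sketch below (Python) works with concrete (width, height) pairs, a simplification of the thesis, which manipulates symbolic size functions of a block's signals; the function names are illustrative:

```python
def parallel_size(sizes):
    # Parallel composition stacks blocks vertically: the width is the
    # maximum component width, the height is the sum of component heights.
    return (max(w for w, h in sizes), sum(h for w, h in sizes))

def series_size(sizes):
    # Series composition places blocks left to right: the widths add,
    # the height is the maximum component height.
    return (sum(w for w, h in sizes), max(h for w, h in sizes))
```

For instance, composing a 2×1 block with a 3×2 block gives a 3×3 bounding box in parallel and a 5×2 bounding box in series.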

3.4.3 Size Inference<br />

As noted in the previous section each type <strong>of</strong> statement has a particular layout interpretation<br />

and we can exploit that to describe the size <strong>of</strong> each type <strong>of</strong> statement recursively. The size<br />

<strong>of</strong> an entire block can then be described in terms <strong>of</strong> the sizes <strong>of</strong> its constituent statements.<br />

The function SB, shown in Figure 3.6, returns a pair <strong>of</strong> expressions describing the height<br />

and width <strong>of</strong> a block in terms <strong>of</strong> its domain, range and internal signals.<br />

Unlike the SI function we described previously, this function is directly implementable, since<br />

it does not use the definite description operator or refer explicitly to internal signals. Note<br />

also that this function does not infer size functions for the block; it infers size expressions<br />

in terms <strong>of</strong> the block’s local environment. These size expressions can be converted into



SB :: Block → (Exp × Exp)<br />

SB block bid d1 . . . dn ∼ r { τ1 id1 . . . τq idq. stmts } = SS ′ stmts<br />

SS ′ :: StmtList → (Exp × Exp)<br />

SS ′ stmt1 . . . stmtn =<br />

let (w1, h1) = SSstmt1 in<br />

.<br />

let (wn, hn) = SSstmtn in<br />

(max(w1, . . .,wn), max(h1, . . . , hn))<br />

SS :: Stmt → (Exp × Exp)<br />

SS assert e str = (0, 0)<br />

SS e1 = e2 = (0, 0)<br />

SS if e { stmts1 } else { stmts2 } =<br />

let (w1, h1) = SS ′ stmts1 in<br />

let (w2, h2) = SS ′ stmts2 in<br />

(if(e, w1, w2), if(e, h1, h2))<br />

SS for i = e1..e2 { stmts } =<br />

let (w, h) = SSstmts in<br />

(maxf(i = e1..e2, w), maxf(i = e1..e2, h))<br />

SS a ; blkinst ; b at (x, y) =<br />

(width(a ; blkinst ; b) + x, height(a ; blkinst ; b) + y)<br />

Figure 3.6: Inferring the size <strong>of</strong> a block<br />

functions in terms <strong>of</strong> the block’s domain and range variables by binding any local variables to<br />

values determined through an ι operator and the semantic meaning function for the block’s<br />

statements; however, we will not pursue this approach in this chapter. This introduces a<br />

slight divergence between the function SI, which is parameterised by a SizeEnv environment<br />

<strong>of</strong> block size functions, and the size inference function, which produces expressions. However,<br />

for the purposes <strong>of</strong> our implementation this is desirable: we take an alternative approach to<br />

implementing these semantics during the compilation process. In Chapter 4 we will return<br />

to using size functions for verification purposes and Figure 4.9 on page 82 illustrates how size<br />

expressions can be converted into size functions.<br />

The key concept this function implements is that a block’s size is the top right corner <strong>of</strong> a<br />

bounding box that encloses all sub-blocks instantiated <strong>with</strong>in it. This is found by selecting<br />

the maximum value for each statement then selecting the maximum <strong>of</strong> all statements for the<br />

block as a whole. It is important to note that because max has been added to the set <strong>of</strong><br />

available expression operators, the expressions that describe the height and width can remain<br />

fully parameterised in terms <strong>of</strong> a block’s signals and the max function can be evaluated exactly<br />

during full elaboration <strong>of</strong> the design rather than forcing the compiler to select a (possibly



block beside (block R (‘a, ‘b) ∼ (‘d, ‘x), block S (‘x, ‘c) ∼ (‘e, ‘f))<br />

(‘a a, (‘b b, ‘c c)) ∼ ((‘d d, ‘e e), ‘f f)<br />

attributes {<br />

width = width((a,b) ;R ;(d, is )) + width((is, c) ; S ; (e, f)).<br />

height = max (height ((a, b) ;R ;(d, is )), height((is, c) ; S ; (e, f))).<br />

}<br />

{<br />

‘x is.<br />

(a, b) ; R ; (d, is) at (0,0).<br />

(is , c) ; S ; (e, f) at (width((a, b) ; R ; (d, is )), 0).<br />

}<br />

Figure 3.7: A placed interpretation <strong>of</strong> the Quartz beside combinator<br />

non-optimal) static approximation to this value.<br />

The function is defined recursively over the structure <strong>of</strong> statements, <strong>with</strong> three basic cases.<br />

Assignments and assertions are given size expressions (0,0) since they are assumed to take<br />

up no space. Block instantiations have size expressions that are the size <strong>of</strong> the instantiation<br />

itself added to its x and y placed position. Conditional generation statements produce a<br />

conditional expression. Iteration statements are the most interesting construct: here the<br />

maxf function is used to select the maximum size values as the loop variable varies between<br />

its lower and upper limit. If the loop upper bound is less than its lower bound (e2 < e1 in<br />

Figure 3.6) then the loop never executes and this is reflected in the semantics for maxf which<br />

returns zero in these circumstances.<br />
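The recursive structure of SS/SS′ can be mimicked over a toy statement representation. This Python sketch uses concrete numbers where the thesis manipulates symbolic expressions, and its AST encoding (tuples tagged `"assign"`, `"inst"`, `"if"`, `"for"`) is an assumption for illustration only:

```python
def stmt_size(stmt):
    kind = stmt[0]
    if kind in ("assign", "assert"):
        return (0, 0)                      # assumed to occupy no space
    if kind == "inst":                     # ("inst", w, h, x, y)
        _, w, h, x, y = stmt
        return (w + x, h + y)              # instantiation size plus placed offset
    if kind == "if":                       # ("if", cond, then_stmts, else_stmts)
        _, cond, s1, s2 = stmt
        return block_size(s1) if cond else block_size(s2)
    if kind == "for":                      # ("for", lo, hi, body(i))
        _, lo, hi, body = stmt
        if lo > hi:
            return (0, 0)                  # empty loop, as for maxf
        sizes = [stmt_size(body(i)) for i in range(lo, hi + 1)]
        return (max(w for w, h in sizes), max(h for w, h in sizes))

def block_size(stmts):
    # SS': the block's bounding box is the per-dimension maximum over statements
    sizes = [stmt_size(s) for s in stmts]
    return (max(w for w, h in sizes), max(h for w, h in sizes))
```

A map-like loop placing four 2×1 instances at y-offsets 0..3 yields the expected 2×4 bounding box.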

3.5 <strong>Parameterised</strong> Quartz <strong>with</strong> Placement<br />

These concepts can be brought together to write Quartz code <strong>with</strong> relative placement using<br />

explicit co-ordinates.<br />

3.5.1 Laid-out Combinators<br />

In Figure 3.7 a version <strong>of</strong> the R↔S (beside) combinator which has been laid out <strong>with</strong> relative<br />

positioning is illustrated. The block R is placed at the origin and the block S is beside it<br />

<strong>with</strong> a y-<strong>of</strong>fset <strong>of</strong> zero and an x-<strong>of</strong>fset <strong>of</strong> the width <strong>of</strong> the R block.



The standard block definition has been augmented <strong>with</strong> a set <strong>of</strong> attributes giving height<br />

and width expressions for the block. The width <strong>of</strong> the combinator is described as the sum<br />

<strong>of</strong> the widths <strong>of</strong> the R and S blocks, while the height is the maximum <strong>of</strong> the heights <strong>of</strong> the<br />

R and S blocks. These manually specified size expressions are the same as those that would<br />

have been returned by the size inference algorithm SB, simplified using the trivial identity<br />

x + 0 = x.<br />

This laid-out beside combinator can now be used to position blocks relative to each other in<br />

a larger circuit, <strong>with</strong> the added advantage that it can also describe the inter-connection <strong>of</strong><br />

the blocks. This allows the significant problem <strong>of</strong> writing correct explicit layouts to be split<br />

into simpler and more manageable modules: in the instantiation beside(R, S) the internal<br />

arrangement <strong>of</strong> the combinator can be ignored and only the size <strong>of</strong> the entire beside block<br />

needs to be known.<br />

3.5.2 Naive vs General Placement<br />

Figure 3.8 shows a description <strong>of</strong> the Quartz map n R combinator which applies a block to each<br />

element <strong>of</strong> a vector. The block’s function is described using a single loop which instantiates<br />

multiple R blocks, each <strong>of</strong> which is explicitly placed at a set <strong>of</strong> co-ordinates.<br />

The layout produced by this description for n = 4 is shown in Figure 3.9(a). At first glance<br />

this layout appears to be correct: the grey bounding box shows the overall height <strong>of</strong> the map<br />

block specified as a multiple <strong>of</strong> n times the height <strong>of</strong> the R block and the width is the same<br />

as the width <strong>of</strong> R, while each block is placed <strong>with</strong> an x-<strong>of</strong>fset <strong>of</strong> zero and a y-<strong>of</strong>fset calculated<br />

from the number <strong>of</strong> R blocks below it. This layout is in many cases correct; however, it<br />

relies on a key assumption: that the size <strong>of</strong> R is the same for all iterations across the vector.<br />

The vector is polymorphic and could be a tuple containing variables which affect the size <strong>of</strong><br />

each R instance, yet this would not be reflected in the layout generated by this naive map<br />

implementation. Figure 3.9(b) illustrates the same map block implementation for a situation<br />

where the R instances do not all have the same size: block placement is incorrect, leaving<br />

empty space between blocks or causing them to overlap, and the overall size <strong>of</strong> the whole<br />

block is also too small since two R instances lie outside the bounding box.



block map (int n, block R ‘a ∼ ‘b) (‘a i[n]) ∼ (‘b o[n])<br />

attributes {<br />

width = width(i[0] ;R ; o[0]) .<br />

height = height(i[0] ; R ; o[0]) ∗ n.<br />

} {<br />

int j.<br />

for j = 0..n−1 {<br />

i [j] ; R ; o[j] at (0, height(i[0] ; R ; o[0]) ∗ j).<br />

} .<br />

}<br />

Figure 3.8: The Quartz map n R combinator <strong>with</strong> naive layout information<br />

Figure 3.9: Different layouts for map n R: (a) naive layout; (b) failure of naive layout; (c) general layout<br />

block map (int n, block R ‘a ∼ ‘b) (‘a i[n]) ∼ (‘b o[n])<br />

attributes {<br />

width = maxf(k=0..n−1, width (i[k] ;R ;o[k])).<br />

height = sum(k=0..n−1, height (i[k] ;R ;o[k])).<br />

} {<br />

int j.<br />

for j = 0..n−1 {<br />

i [j] ; R ; o[j] at (0, sum(k=0..j−1,height(i[k] ;R ; o[k]))) .<br />

} .<br />

}<br />

Figure 3.10: The Quartz map n R combinator <strong>with</strong> general layout information



The general case requires a more general layout description, shown in Figure 3.10. Instead<br />

<strong>of</strong> multiplication, the sum function is used to ensure that each block is placed at exactly the<br />

right position and the sum and maxf functions are used to describe the bounding box for the<br />

whole map n R block. Figure 3.9(c) demonstrates that this layout description can cope <strong>with</strong><br />

the irregular example.<br />

It is worth noting that the manually specified height and width for this block are not the<br />

same expressions that would be inferred by the inference algorithm in the previous section.<br />

The inferred width expression would be:<br />

maxf(j = 0..n − 1, width(i[j] ; R ; o[j]) + 0)<br />

and the height expression would be:<br />

maxf(j = 0..n − 1, height(i[j] ; R ; o[j]) + sum(k = 0..j − 1, height(i[k] ; R ; o[k])))

Both <strong>of</strong> these are more complex expressions than the manually specified expressions and this<br />

is the main advantage <strong>of</strong> manually specifying size expressions rather than using the inference<br />

algorithm. It is important to note that they are exactly equivalent (we will develop pro<strong>of</strong>s<br />

<strong>of</strong> this kind <strong>of</strong> relationship in Chapter 4).<br />
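The equivalence between the manual and inferred height expressions can be spot-checked numerically. In this Python sketch (function names illustrative; per-instance heights are assumed nonnegative, as block sizes are), the maximum over j of each instance's height plus the heights of the instances below it equals the total sum of heights:

```python
def sum_range(l, u, e):
    # sum(i = l..u, e(i)); empty range yields 0
    return sum(e(i) for i in range(l, u + 1)) if l <= u else 0

def maxf_range(l, u, e):
    # maxf(i = l..u, e(i)); empty range yields 0
    return max(e(i) for i in range(l, u + 1)) if l <= u else 0

heights = [1, 3, 2, 2]          # hypothetical per-instance heights of R
n = len(heights)
h = lambda j: heights[j]

manual = sum_range(0, n - 1, h)                                   # specified height
inferred = maxf_range(0, n - 1,
                      lambda j: h(j) + sum_range(0, j - 1, h))    # inferred height
assert manual == inferred == 8
```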

3.5.3 A Placed Ripple Adder<br />

A simple example <strong>of</strong> the kind <strong>of</strong> parameterised placed circuit that can be described <strong>with</strong> this<br />

infrastructure is a ripple adder. The code in Figure 3.11 describes an n-bit ripple adder built<br />

from a row <strong>of</strong> full-adder blocks (note that we have assumed that a full-adder block fadd is<br />

available as a library/primitive element <strong>of</strong> size 2 × 1).<br />

This description uses the row combinator, which has been annotated <strong>with</strong> placement infor-<br />

mation, to generate a connected row <strong>of</strong> full adders. The rippleadd block uses the language<br />

construct zip to re-arrange the tuple <strong>of</strong> vectors (a, b) into a vector <strong>of</strong> tuples for application<br />

to the row <strong>of</strong> full adders and uses the wiring block apr (append-right) (in the Quartz prelude<br />

library) to append the carry-out wire to the sum output.



block row (int n, block R (‘a, ‘b) ∼ (‘c, ‘a)) (‘a l , ‘b t[n]) ∼<br />

(‘c b[n], ‘a r)<br />

attributes {<br />

height = maxf(k=0..n−1, height((is[k], t[k]) ;R ; (b[k], is [k+1]))).<br />

width = sum(k=0..n−1, width ((is[k], t[k]) ;R ; (b[k], is [k+1]))).<br />

} {<br />

// Wires: l = left , t = top, b = bottom, r = right<br />

int i. ‘a is [n+1].<br />

is [0] = l.<br />

for i = 0..n−1 {<br />

(is [ i ], t[ i ]) ; R ; (b[i ], is [ i+1])<br />

at (sum(k=0..i−1,width((is[k], t[k]) ; R ; (b[k], is [k+1]))), 0).<br />

} .<br />

r = is[n].<br />

}<br />

block fadd (wire cin, (wire a, wire b)) ∼ (wire ans, wire cout)<br />

attributes { height = 1. width = 2. }{ }<br />

block rippleadd (int n) (wire a[n], wire b[n]) ∼ (wire ans[n+1])<br />

attributes {<br />

height = 1.<br />

width = 2 ∗ n.<br />

} {<br />

wire cin.<br />

cin = false.<br />

(cin, (a, b)) ; snd (zip 2) ; row (n, fadd) ; apr n ; ans at (0,0).<br />

}<br />

Figure 3.11: A simple placed ripple adder<br />

Figure 3.12: The ripple adder laid out on a grid for n = 4



rippleadd is not a combinator and thus much simpler height and width expressions have been<br />

specified than would be inferred by the inference algorithm (though the result is the same).<br />

The zip and apr blocks are both pure wiring, which requires no space, so the row <strong>of</strong> full adders<br />

is the only block which actually contributes to the size <strong>of</strong> rippleadd; and since fadd has a<br />

constant height and width the size <strong>of</strong> the whole ripple adder can be described extremely<br />

clearly.<br />

Figure 3.12 shows the layout <strong>of</strong> the ripple adder circuit for n = 4.<br />
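Because every fadd is the constant-size 2×1 block assumed above, the placed positions reduce to simple multiples. A minimal sketch (Python; `rippleadd_layout` is a hypothetical helper, not part of the Quartz toolchain) computes each full adder's position and the adder's bounding box:

```python
FADD_W, FADD_H = 2, 1   # the 2x1 full-adder primitive assumed in the text

def rippleadd_layout(n):
    # each fadd sits at x = sum of the widths of the adders to its left;
    # with constant widths this is simply i * FADD_W
    positions = [(i * FADD_W, 0) for i in range(n)]
    bounding_box = (FADD_W * n, FADD_H)
    return positions, bounding_box
```

For n = 4 this places the adders at x = 0, 2, 4, 6 with a bounding box of (8, 1), matching the top-right corner shown in Figure 3.12.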

3.6 Different <strong>Layout</strong> Interpretations<br />

While functional descriptions <strong>of</strong> Quartz designs tend to have an obvious and useful layout<br />

interpretation, it is <strong>of</strong>ten the case that designs have more than one possible layout interpretation.<br />

For example, the ripple adder illustrated as a row in Figure 3.12 could equally be described<br />

as a vertical array <strong>of</strong> full-adders. In this case we could use the col combinator rather than<br />

the row combinator; however, because Quartz combinators describe both function and layout,<br />

it is sometimes desirable to use the same combinator <strong>with</strong> a different layout interpretation.<br />

For example, an m×n grid <strong>of</strong> A elements, surrounded by interface elements B on the domain<br />

and C on the range can be described in Quartz as:<br />

[map n B, map m B] ; grid m,n A ; [map m C, map n C]<br />

This is a common circuit structure, for example where blocks B and C could be input and<br />

output registers. However, the standard layout interpretation attached to Quartz constructs<br />

will produce the layout shown in Figure 3.13(a). This is not the worst possible layout that<br />

could be assigned to this circuit, but it is far from compact, <strong>with</strong> long wire lengths and a large<br />

area <strong>of</strong> unused logic resources.<br />

A better layout interpretation would be that shown in Figure 3.13(b). This layout can be<br />

enclosed in a much smaller bounding box and has shorter wires between the B and C elements<br />

and the grid. However, this layout is difficult to generate <strong>with</strong> the system as described so<br />

far.



Figure 3.13: A grid <strong>of</strong> m × n A elements surrounded by B and C elements can have multiple<br />

layout interpretations: (a) the default layout interpretation; (b) a better layout interpretation. Shown here for n = 4, m = 6.<br />

3.6.1 Composition<br />

Series and parallel composition have been given particular layout interpretations. This means<br />

that a series composition will always place elements from left to right, and a parallel compo-<br />

sition will always stack elements vertically. In this case, we want the parallel composition to<br />

be “wrapped around” the top and bottom <strong>of</strong> the grid, which is not something that can be achieved<br />

<strong>with</strong>in the current framework since it implies that the grid is placed <strong>with</strong>in the bounding box<br />

described by the parallel compositions. There are, however, four solutions that can resolve<br />

this situation:<br />

1. The Quartz description can be re-cast using the beside and below combinators. This<br />

would be quite effective; however, it introduces some unnecessary complexity since beside<br />

and below operate on blocks <strong>with</strong> four connections and the map n B and map m C blocks<br />

have only two. The π1⁻¹/π1 and π2⁻¹/π2 wiring blocks can be used to resolve this<br />

issue; however, the resulting description will be less clear than the original.<br />

2. The series and parallel compositions can be expanded out into separate block instan-<br />

tiations <strong>with</strong> explicit internal signals. This new expanded version can be derived from<br />

the original description by refinement. After the new description has been produced,<br />




block surround (block A (‘a1, ‘a2) ∼ (‘a3, ‘a4),<br />

(block B ‘l ∼ ‘a1, block C ‘t ∼ ‘a2),<br />

(block D ‘a3 ∼ ‘b, block E ‘a4 ∼ ‘r))<br />

(‘l l , ‘t t) ∼ (‘b b, ‘r r) {<br />

‘a1 l2. ‘a2 t2. ‘a3 b2. ‘a4 r2.<br />

l ; B ; l2 at (0, height(b2 ; D ; b)).<br />

t ; C ; t2 at (width(l ; B ; l2), max(height(b2 ;D ;b) + height((l2,t2);A;(b2,r2)),<br />

height(b2;D;b) + height(r2;E;r))).<br />

(l2, t2) ; A ; (b2, r2) at (width(l ; B ; l2), height(b2 ; D ; b)).<br />

b2 ; D ; b at (width(l ; B ; l2), 0).<br />

r2 ; E ; r at (width(l ; B ; l2) + width((l2,t2);A;(b2,r2)), height(b2;D;b)).<br />

}<br />

Figure 3.14: A combinator describing the function and layout <strong>of</strong> the grid interface in Figure<br />

3.13(b)<br />

each block instantiation can be given explicit co-ordinates.<br />

3. Since this arrangement is extremely common a new combinator can be created to<br />

describe it.<br />

4. A final alternative is to remove the layout interpretation from series and parallel com-<br />

positions and require explicit co-ordinates <strong>with</strong>in compositions. This is, in general,<br />

more trouble than it is worth; however, the option can be left open <strong>of</strong> not using explicit<br />

co-ordinates and defaulting to the standard interpretation.<br />

We would suggest option (3) is the most practical and flexible. The purpose <strong>of</strong> combining<br />

explicit layout <strong>with</strong> combinators is to provide just this flexibility and a general combinator<br />

implementing this structure is illustrated in Figure 3.14. This combinator appears complex<br />

at first; however, it actually has a simple structure, <strong>with</strong> interface elements B, C, D and E<br />

placed on the four sides <strong>of</strong> the square element A. Applied to the interface elements in the<br />

grid example, this combinator is functionally identical:<br />

Theorem 4<br />

surround(grid m,n A, (map n B, map m B), (map m C, map n C)) =<br />

[map n B, map m B] ; grid m,n A ; [map m C, map n C]



Pro<strong>of</strong> Pointwise, by expanding the definition <strong>of</strong> surround to give the proposition:<br />

∃l2, t2, b2, r2. l ; map n B ; l2 ∧ t ; map m B ; t2 ∧ (l2, t2) ; grid m,n A ; (b2, r2)<br />

∧ b2 ; map m C ; b ∧ r2 ; map n C ; r =<br />

[map n B, map m B] ; grid m,n A ; [map m C, map n C]<br />

Then by expanding the definitions <strong>of</strong> series and parallel composition.<br />

The placement co-ordinates <strong>of</strong> the C block in this combinator are particularly worthy <strong>of</strong><br />

interest. The max function is used to describe the y co-ordinate that C is placed at; this<br />

ensures that the layout is truly general and independent <strong>of</strong> the sizes <strong>of</strong> each block. The<br />

verification <strong>of</strong> this combinator’s layout is discussed further in Section 4.7.<br />

3.6.2 Combinators<br />

Another key problem is that the map n R combinator block itself has been given a layout<br />

interpretation, <strong>with</strong> its elements arranged vertically. However, in this example we require<br />

both a vertically and a horizontally arranged map. One possibility is to require designers to<br />

choose between vmap and hmap blocks; however, this weakens the link between the original<br />

pure-functional description and the placed version. It also ignores the fact that most <strong>of</strong> the<br />

time the map combinator is used in a manner where vertical layout is the most appropriate.<br />

The Quartz overloading mechanism can be used to handle circumstances where multiple<br />

blocks are required which have the same basic functional properties but differ in some other<br />

way. Figure 3.15 illustrates how this system can be used to provide a map n R combinator<br />

which has a default vertical layout but can also be configured to be laid out horizontally.<br />

A simple usage <strong>of</strong> map n R will be resolved to refer to the default case map block which then<br />

instantiates the general combinator <strong>with</strong> the specific default layout (vertical, for map). If<br />

horizontal layout is desired then the fully flexible map block can be invoked specifically by<br />

adding an additional orientation parameter to the instantiation.<br />

In Figure 3.15 we have used an integer parameter to control the orientation <strong>of</strong> the map block;<br />

however if the Quartz type system were extended to support enumerated types then this could



#define VERTICAL 0<br />

#define HORIZONTAL 1<br />

// Map combinator supporting horizontal and vertical placement<br />

block map (int orientation) (int n, block R ‘a ∼ ‘b) (‘a i [n]) ∼ (‘b o[n])<br />

attributes {<br />

width = if(orientation==1, sum(k=0..n−1, width (i[k] ;R ;<br />

o[k])), maxf(k=0..n−1, width (i[k] ;R ;o[k]))) .<br />

height = if(orientation==1, maxf(k=0..n−1, height (i[k] ;R ;<br />

o[k ])), sum(k=0..n−1, height (i[k] ;R ; o[k]))) .<br />

} {<br />

int j.<br />

assert (orientation == 1 or orientation == 0) "Invalid placement specified";<br />

for j = 0..n−1 {<br />

if (orientation == 0) {<br />

i [j] ; R ; o[j] at (0, sum(k=0..j−1,height(i[k] ;R ; o[k]))) .<br />

} else {<br />

i [j] ; R ; o[j] at (sum(k=0..j−1,width(i[k] ;R ;o[k ])), 0).<br />

} .<br />

} .<br />

}<br />

// Overloaded map block providing default placement interpretation<br />

block map (int n, block R ‘a ∼ ‘b) (‘a i[n]) ∼ (‘b o[n]) → map VERTICAL.<br />

Figure 3.15: Using overloading to provide multiple layout interpretations <strong>of</strong> combinators<br />

be replaced <strong>with</strong> an enumerated type parameter <strong>with</strong> values <strong>of</strong> horizontal or vertical. In this<br />

case the pre-processor definitions <strong>of</strong> the constants could be removed and the parameter-<br />

checking assertion would not be necessary.<br />

This approach can be applied for all combinator blocks which have more than one possible<br />

orientation, for example a row could be laid out vertically or a column could be laid out<br />

horizontally.<br />
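The orientation-parameterised placement in Figure 3.15 can be mirrored in a small executable sketch (Python; `map_layout` and the constant names are illustrative, and concrete (width, height) pairs stand in for Quartz's symbolic size expressions):

```python
VERTICAL, HORIZONTAL = 0, 1

def map_layout(sizes, orientation=VERTICAL):
    # Offsets are running sums along the chosen axis, and the bounding box
    # is (max width, sum of heights) vertically or (sum, max) horizontally.
    positions, offset = [], 0
    for w, h in sizes:
        if orientation == VERTICAL:
            positions.append((0, offset))
            offset += h
        else:
            positions.append((offset, 0))
            offset += w
    if orientation == VERTICAL:
        box = (max(w for w, _ in sizes), sum(h for _, h in sizes))
    else:
        box = (sum(w for w, _ in sizes), max(h for _, h in sizes))
    return positions, box
```

The default argument plays the role of the overloaded map block: a plain call gives the vertical interpretation, while passing `HORIZONTAL` selects the transposed layout.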

3.7 Compiling Placed Quartz Designs<br />

Our goal when compiling Quartz is, where possible, to only partially evaluate the Quartz<br />

description, maintaining the maximum possible degree <strong>of</strong> parameterisation in the Pebble<br />

output. When compiling Quartz <strong>with</strong> layout into Pebble, relative positioning information<br />

must be eliminated and replaced <strong>with</strong> expressions defining absolute coordinates.



[Figure: compiler pipeline from Quartz Input through the Preprocessor and Lexer/Parser (producing the Quartz AST), then Type Processing (Identifier Conversion, Type Inference, Overloading Resolution, Full Instantiation), Layout Processing (Placement Checks, Size Inference, Layout Verification), Direction Processing (Direction Inference, Direction Concreting, Directional Instantiation) and Distillation (Block Analysis, Translation/Unwinding, Layout Generation, Identifier Selection), ending at the Pretty Printer and the Pebble AST/Pebble Output.]<br />
Figure 3.16: Modified compiler architecture for compiling Quartz designs with layout information<br />

To achieve this, the Pebble 5 language must be extended to enable the evaluation <strong>of</strong> the more<br />

complex expressions that can be generated by Quartz compilation. We describe this extended<br />

Pebble as LE-Pebble (<strong>Layout</strong>-Enhanced Pebble). The expressions supported by the Pebble<br />

5 language are essentially the same as those <strong>of</strong> Quartz before the new functions were added<br />

to support layout. LE-Pebble differs in that the standard expressions must be augmented<br />

<strong>with</strong> max, maxf, sum and if expression types. In practice we will be able to eliminate these<br />

constructs when compiling from Quartz to Pebble in many cases; however, in the general case<br />
they are still required.<br />

LE-Pebble does not require height() and width() functions; these are eliminated during<br />
compilation and replaced with appropriate simple expressions.<br />
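The role of these extra expression forms can be sketched with a small evaluator. The following fragment is in Python rather than LE-Pebble, and the tuple encoding and function name are invented for this sketch, not the compiler's actual representation; it simply fixes the intended semantics of max, maxf, sum and if (with empty sum/maxf ranges evaluating to 0).<br />

```python
# Illustrative sketch (not the compiler's data structures) of the four
# expression forms LE-Pebble adds to standard Pebble: max, maxf, sum and if.

def ev(expr, env):
    """Evaluate an LE-Pebble-style expression tree given parameter bindings."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    if kind == "if":                      # if(cond, then, else)
        _, cond, then, other = expr
        return ev(then, env) if ev(cond, env) else ev(other, env)
    if kind == "max":                     # n-ary max over subexpressions
        return max(ev(e, env) for e in expr[1])
    if kind == "sum":                     # sum(i = lo..hi, body); empty range -> 0
        _, i, lo, hi, body = expr
        return sum(ev(body, {**env, i: k})
                   for k in range(ev(lo, env), ev(hi, env) + 1))
    if kind == "maxf":                    # maxf(i = lo..hi, body); empty range -> 0
        _, i, lo, hi, body = expr
        vals = [ev(body, {**env, i: k})
                for k in range(ev(lo, env), ev(hi, env) + 1)]
        return max(vals) if vals else 0
    raise ValueError(kind)

# e.g. width of a horizontal map of five blocks of constant width 3:
# sum(k = 0..4, 3)
width = ("sum", "k", ("const", 0), ("const", 4), ("const", 3))
```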

The Quartz compilation framework is extended as shown in Figure 3.16 to support the<br />

compilation <strong>of</strong> designs <strong>with</strong> layout information.<br />

3.7.1 Changes to the Type Processing Module<br />

In the standard compiler framework the type processing module carries out type checking<br />

on the Quartz input, using a derivative <strong>of</strong> the Hindley/Milner algorithm [14, 53]. The type



processor also resolves overloading using an algorithm based around satisfiability matrix<br />

predicates [64]. The standard output <strong>of</strong> the type processing module is a monomorphic,<br />

annotated AST <strong>with</strong> multiple copies <strong>of</strong> polymorphic blocks instantiated for each utilised<br />

type.<br />

This output is suitable for the standard later stages <strong>of</strong> the compiler, however for placed<br />

Quartz compilation we need to insert a new layout processing module between the type<br />

processor and direction processor. This requires significant changes to the type processor to<br />

split the instantiation stage in two: one part that resolves overloading and the other that<br />

removes all polymorphism.<br />

Since different overloaded instances <strong>of</strong> Quartz blocks can have different size expressions it<br />

is necessary to resolve overloading prior to layout generation or layout verification. This is<br />

achieved by the new “overloading resolution” stage <strong>with</strong>in the type processor which replaces<br />

references to overloaded blocks <strong>with</strong> references to specific instances, instantiating new copies<br />

<strong>of</strong> blocks which use different overloaded instances on different occasions. This non-overloaded<br />

but still polymorphic Quartz AST is then passed to the layout processing module.<br />

After layout processing, the type processor completes the process <strong>of</strong> generating monomorphic<br />

Quartz by eliminating all polymorphic types.<br />

3.7.2 <strong>Layout</strong> Processing Module<br />

The layout processing module, despite its name, does not take full responsibility for compiling<br />

layout information in Quartz descriptions. It is responsible for initial processing <strong>of</strong> layout<br />

information and preparation for later stages.<br />

If the compiler is invoked on a design and is not requested to generate placed output, the layout<br />

processing module strips all size attributes and placement information from blocks. The<br />

modified circuit design is then passed to the later compilation stages. If layout verification<br />

or placed output mode is requested then the module checks that there are no unplaced<br />

block instantiations <strong>with</strong>in the circuit (unless they are the only instantiation <strong>with</strong>in a block,<br />

in which case they are automatically placed at (0, 0)) and then the size inference stage is<br />

invoked.



The block size inference procedure ensures that all blocks are annotated <strong>with</strong> height and width<br />

expressions. If these expressions have been specified manually then no action is necessary;<br />
otherwise the procedure executes the function SB to generate size expressions for the<br />

block.<br />

The verification role <strong>of</strong> the layout processing module will be discussed in Chapter 4. The<br />

actual task <strong>of</strong> converting Quartz layout information to explicit co-ordinates in LE-Pebble is<br />

handled by new stages in the distillation module.<br />

3.7.3 Distillation <strong>of</strong> Size Expressions<br />

The distillation module actually converts Quartz descriptions into Pebble, eliminating higher-<br />

order parameters and other constructs that are not valid in Pebble. This module has been<br />

extended <strong>with</strong> code to compile layout expressions from Quartz into LE-Pebble. This is done<br />

in a recursive descent over expression trees, specifically eliminating height() and width()<br />

references.<br />

Three cases have to be dealt <strong>with</strong>:<br />

1. The height()/width() function refers to a single block <strong>with</strong> domain and range signals<br />

linked into the local environment. In this case the block’s height or width expression is<br />

retrieved and compiled, before replacing the height()/width() function <strong>with</strong> appropriate<br />

substitutions for the correct applied domain/range signals. If the expression refers to<br />

signals internal to the block then these must be replaced <strong>with</strong> fresh identifiers and the<br />

code <strong>with</strong>in the block that defines the values <strong>of</strong> these variables lifted into the current<br />

context.<br />

2. The function refers to a parallel composition <strong>of</strong> blocks. This case is compiled by ap-<br />

plying the size interpretation <strong>of</strong> parallel compositions (function SIσ β) and extracting<br />

expressions for each block in the composition, combining them using either addition<br />

(height) or the max function (width).<br />

3. The function refers to a series composition <strong>of</strong> blocks. In this case a similar operation<br />

must be employed as for parallel compositions, though with a significant difference.



If the generated size expression depends on internal signals hidden <strong>with</strong>in the series<br />

composition then these must be explicitly instantiated. This is already the process<br />

that occurs when Quartz series compositions are compiled into Pebble and if the series<br />

composition exactly matches an instantiation carried out anyway then the same newly<br />

declared signals can be used. However, if the composition is not otherwise present<br />

in the block new variables must be declared and code that sets their values must be<br />

lifted into the local context. This implements the semantics <strong>of</strong> the definite description<br />

operator used in function SIσ β.<br />
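Case 2 above can be illustrated with a small Python sketch (the function name and the list-of-pairs representation are assumptions of this sketch, not the compiler's interface): heights of the blocks in a parallel composition are added, while the overall width is the maximum of the individual widths.<br />

```python
# Illustrative sketch of the size interpretation of a parallel composition
# [B1, B2, ...]: stacked vertically, so heights add and widths take the max.

def parallel_size(sizes):
    """sizes: list of (height, width) pairs, one per block in the composition."""
    height = sum(h for h, _ in sizes)
    width = max((w for _, w in sizes), default=0)
    return height, width
```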

3.7.4 Recursive Size Expressions<br />

Many Quartz blocks have both recursive and iterative definitions. The two are usually<br />

equivalent, although the iterative versions compile more neatly into Pebble/VHDL. It is<br />

quite common to use the recursive definition <strong>of</strong> a common combinator for formal reasoning<br />

purposes and the iterative version in the generated hardware description.<br />

Some blocks, such as the binary tree combinator, can be more clearly defined recursively than<br />

iteratively. Size inference <strong>of</strong> recursively defined blocks can give recursive size expressions and<br />

while it is sometimes possible to manually specify iterative size expressions this is not always<br />

the case. The height and width <strong>of</strong> the recursively defined map combinator can be easily<br />

described using the same expressions as the iterative version, however the same can not be<br />

said for more complex combinators such as rows or grids where there are internal signals<br />

hidden by the recursion that can not be referenced or where blocks do not have a clearly<br />

defined size in terms <strong>of</strong> the simple sum and maxf functions (such as a tree). In any event,<br />

automation <strong>of</strong> this transformation is non-trivial so where size inference is used alone it is<br />

possible to be left <strong>with</strong> recursive size functions.<br />

For example, the recursively defined map combinator can be described by:<br />

map 0 R ⇔ id<br />

map n R ⇔ (apl n−1)⁻¹ ; [R, map n−1 R] ; apl n−1<br />

After expanding the parallel composition, apl and id this gives size expressions for map n



which can not be further simplified <strong>of</strong>:<br />

height = if(n = 0, 0, height(i[0] ; R ; o[0]) + height(i[n − 1..1] ; map n−1 R ; o[n − 1..1]))<br />

width = if(n = 0, 0, max(width(i[0] ; R ; o[0]), width(i[n − 1..1] ; map n−1 R ; o[n − 1..1])))<br />

Without knowing the value <strong>of</strong> n it is not possible to compile away the height() and width()<br />

references in the same way as for iterative size expressions. One strategy for dealing <strong>with</strong><br />

these expressions is to mark them as requiring unwinding in the final output, in which case the<br />

full expression can be evaluated at compile time and non-parameterised Pebble is generated².<br />

Another way for dealing <strong>with</strong> these expressions while maintaining parameterisation is to<br />

compile the recursion directly into Pebble. This requires the addition <strong>of</strong> a fix operator to<br />

Pebble expressions and makes them much more complex. This does resolve the issue and the<br />

recursive Pebble expressions can be compiled into specific VHDL functions for translation<br />

into VHDL (see the next section) however it does introduce unwanted complexity into Pebble.<br />

An alternative, which is not a fully general approach, is to attempt to compute the transitive<br />

closure <strong>of</strong> the recursive size functions. Often recursive blocks are controlled by a single<br />

integer parameter which decreases by 1 for each recursive call and in these circumstances the<br />

resulting value can <strong>of</strong>ten be determined. This approach depends on the complexity <strong>of</strong> the<br />

height and width functions for the R block, however if the size <strong>of</strong> R is constant for all inputs<br />

then the recursive functions can be simplified to just:<br />

height = heightR × n<br />

width = widthR<br />

In general it is better to write iterative descriptions where possible as these can usually be<br />

compiled more efficiently anyway.<br />
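The constant-size simplification above can be checked by unwinding the recursive definition directly. The Python sketch below (function names are illustrative) compares the recursive size of map against the closed form height_R × n.<br />

```python
def rec_height(n, height_r):
    """Recursive size from the map definition:
       height(map 0 R) = 0
       height(map n R) = height(R) + height(map (n-1) R),
    assuming R has constant height height_r for all inputs."""
    return 0 if n == 0 else height_r + rec_height(n - 1, height_r)

def closed_height(n, height_r):
    """Closed form obtained via the transitive closure: height = height_R * n."""
    return height_r * n
```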

² This is the approach taken in the current implementation of the compiler.



3.7.5 Expression Simplification<br />

After height() and width() functions have been eliminated the result is valid Pebble ex-<br />

pressions, however these are <strong>of</strong>ten significantly more complex than necessary and can be<br />

extensively simplified: for example, maxf(j = 0..5, 3) can be simplified to just 3. The Quartz<br />

compiler already possesses a sophisticated logic optimiser which simplifies simple logical and<br />

arithmetic expressions such as x + y − y and also evaluates any expressions where the result<br />

is known, such as 4 + 5 × 2.<br />

The logic optimiser is extended to implement the evaluation <strong>of</strong> max, if, maxf and sum expres-<br />

sion types when all parameters are known. Since we are generating parameterised Pebble<br />

output it is <strong>of</strong>ten the case that functions can not be fully evaluated and in these circumstances<br />

a range <strong>of</strong> transformations are applied to simplify expressions.<br />

Uses <strong>of</strong> if can be simplified if the conditional test can be evaluated statically. Uses <strong>of</strong> max<br />

are simplified to remove any values which are known to be less than or equal to another<br />

value (remaining as a parameter). For example, max(a, b, 4, 5, a + 2) can be simplified to<br />

max(b, 5, a + 2) since a + 2 > a and 5 > 4 always. These relationships can be determined<br />

using the logic optimiser itself to investigate whether x > y can be simplified to true or false<br />

for each pair x and y <strong>of</strong> parameters.<br />
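This pruning can be sketched as follows. The representation is a deliberate simplification: each max argument is modelled as a (variable, offset) pair, with variable None for integer constants, so this toy optimiser can only decide x + c1 ≤ x + c2 for the same variable or compare two constants; the real logic optimiser is more general.<br />

```python
# Sketch of max-argument pruning: drop any argument that is provably <= some
# other argument, keeping the earliest of any provably-equal duplicates.

def provably_le(a, b):
    """Decidable only for the same variable, or for two constants (var None)."""
    return a[0] == b[0] and a[1] <= b[1]

def prune_max(args):
    kept = []
    for i, a in enumerate(args):
        dominated = any(
            j != i and provably_le(a, b) and (j < i or not provably_le(b, a))
            for j, b in enumerate(args))
        if not dominated:
            kept.append(a)
    return kept

# max(a, b, 4, 5, a + 2): a is dominated by a + 2, and 4 by 5.
example = [("a", 0), ("b", 0), (None, 4), (None, 5), ("a", 2)]
```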

sum and maxf are more complex functions to simplify. A number <strong>of</strong> generic transformations<br />

have been developed for expressions involving these functions and these transformations are<br />

applied by the optimiser where possible. The transformations applied to sum are:<br />

n < m ⇒ sum(i = m..n, f) = 0<br />

i /∈ f ∧ m ≤ n ⇒ sum(i = m..n, f) = (n − m + 1) × f<br />

i /∈ f ⇒ sum(i = m..n, f) = if(m ≤ n, (n − m + 1) × f, 0)<br />

While similar transformations are applied to maxf:<br />

n < m ⇒ maxf(i = m..n, f) = 0<br />

i /∈ f ⇒ maxf(i = m..n, f) = if(m ≤ n, f, 0)
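For a body f in which i does not occur free, these rules can be spot-checked numerically. The sketch below (function names are invented for this sketch) evaluates sum and maxf by brute force, with empty ranges yielding 0 as in Quartz/LE-Pebble, and compares against the rewritten forms for non-negative constant f.<br />

```python
# Numeric spot-check of the sum/maxf rewrite rules when the body f is a
# constant (i.e. i does not occur free in f).

def sum_range(m, n, f):    # sum(i = m..n, f); empty range -> 0
    return sum(f for _ in range(m, n + 1))

def maxf_range(m, n, f):   # maxf(i = m..n, f); empty range -> 0
    vals = [f for _ in range(m, n + 1)]
    return max(vals) if vals else 0

def check(m, n, f):
    # sum(i = m..n, f) = if(m <= n, (n - m + 1) * f, 0)
    assert sum_range(m, n, f) == ((n - m + 1) * f if m <= n else 0)
    # maxf(i = m..n, f) = if(m <= n, f, 0)
    assert maxf_range(m, n, f) == (f if m <= n else 0)
```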



function sum_template(b : integer; t : integer) return integer is<br />
-- EXPR below stands for the expression to be summed; it is filled in<br />
-- when the template is instantiated<br />
variable a : integer := 0;<br />
begin<br />
for n in b to t loop<br />
a := EXPR + a;<br />
end loop;<br />
return a;<br />
end function sum_template;<br />

Figure 3.17: VHDL template for implementing the LE-Pebble sum function<br />

Proof. By induction on n and re-arrangement using established properties of maxf and sum.<br />

Appendix B.9 gives mechanised pro<strong>of</strong>s for these simplification rules.<br />

Conditionals can sometimes be further simplified by taking advantage <strong>of</strong> the context specified<br />

by assertions <strong>with</strong>in blocks. For example, if a block contains an assertion which states m ≤ n<br />

then the correct branch <strong>of</strong> an if expression dependent on that condition can be selected<br />

statically. Alternatively, another common situation is for m = 0 and n ≥ 1 to be asserted,<br />

which can also be used to simplify the same expression.<br />

3.8 Compiling LE-Pebble into VHDL<br />

To complete the process <strong>of</strong> generating parameterised hardware libraries in an industry-<br />

standard format, LE-Pebble can be compiled into VHDL. The compilation <strong>of</strong> standard Pebble<br />

into parameterised structural VHDL has been reported previously [46, 48] however the addi-<br />

tion <strong>of</strong> new expression types that are not supported in VHDL to Pebble makes this process<br />

slightly more complicated.<br />

While VHDL does not support higher-order functions, it does support the definition <strong>of</strong> func-<br />

tions <strong>with</strong>in structural hardware descriptions. This mechanism can be used to compile com-<br />

plex functions in LE-Pebble such as maxf into VHDL by generating functions based around<br />

simple templates. Iterative versions of sum and maxf are preferable for VHDL implementation<br />
and can be described equivalently to the recursive definitions of these functions.<br />
Figure 3.17 gives the VHDL template for the sum function. This can be instantiated for any<br />
particular function by replacing the expression placeholder in the loop body with the expression to evaluate. This



function max(a : integer ; b : integer ) return integer is<br />

begin<br />

if a > b then<br />

return a ;<br />

else<br />

return b ;<br />

end if ;<br />

end function max;<br />

Figure 3.18: VHDL max function<br />

template can be parameterised in its upper and lower bounds and thus only one function<br />

needs to be generated for each usage <strong>of</strong> sum over the same function. The semantics <strong>of</strong> this<br />

VHDL sum function are the same as those of the Quartz/LE-Pebble function. If the range<br />
of the sum is empty then the initial value of a (0) is returned; otherwise the sum is returned.<br />

An equivalent template can be instantiated for maxf. The n-input Quartz/LE-Pebble max<br />

function can be transformed to use nested calls to a 2-input VHDL max function defined as<br />

in Figure 3.18. Conditional expressions can be implemented using VHDL conditionals.<br />
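The lowering of the n-input max can be sketched directly (in Python here rather than VHDL; lower_max stands in for the transformation itself, not for any compiler function):<br />

```python
# Sketch of lowering the n-input Quartz/LE-Pebble max to nested calls of a
# 2-input function, mirroring the VHDL max of Figure 3.18:
# max(a, b, c) becomes max2(max2(a, b), c).
from functools import reduce

def max2(a, b):
    return a if a > b else b

def lower_max(args):
    return reduce(max2, args)
```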

Placement co-ordinates themselves are compiled into the appropriate constructs for a partic-<br />

ular target architecture, as specified by a directive in the source file. We have implemented<br />

placement support for the Xilinx Virtex and Virtex-II architectures, generating RLOC place-<br />

ment macros. Virtex and Virtex-II use different co-ordinate schemes and the Pebble compiler<br />

maps from the basic abstract grid onto these co-ordinate schemes.<br />

For the Virtex-II architecture, for example (see Chapter 2), each slice contains two look-up<br />

tables and other logic that can be explicitly instantiated. However placement co-ordinates<br />

are described in terms <strong>of</strong> individual slices. When mapping from Pebble to VHDL the grid is<br />

squashed so each set <strong>of</strong> two vertically adjacent 1 × 1 blocks are placed in the same slice. The<br />

basic layout element is thus a half-slice.<br />
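The squashing arithmetic can be illustrated as follows; the dictionary keys and the exact correspondence to RLOC syntax are assumptions of this sketch, but the core mapping is that abstract grid row y lands in slice row y // 2, half y mod 2.<br />

```python
# Illustrative coordinate squash for Virtex-II-style placement: two vertically
# adjacent 1x1 grid cells share one slice, so the basic layout element is a
# half-slice. (Key names and RLOC details are illustrative only.)

def grid_to_slice(x, y):
    return {"slice_x": x, "slice_y": y // 2, "half": y % 2}
```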

3.9 Summary and Comparison <strong>with</strong> Related Work<br />

In this chapter we have described a layout infrastructure for the Quartz language which allows<br />

designers to apply explicit absolute or relative placement information to hardware designs.<br />

Our infrastructure allows Quartz higher-order combinators to be given layout interpretations



and then used to combine blocks <strong>with</strong> the explicit co-ordinates hidden. Some examples <strong>of</strong><br />

full circuits described and laid out using this framework can be found in Chapter 6.<br />

The Quartz layout infrastructure is, to our knowledge, unique in supporting both iterative and<br />

recursive constructs and the compilation <strong>of</strong> designer-specified combinators into parameterised<br />

output.<br />

Systems based around Ruby [26] and Lava [7] have been used to exploit the geometric in-<br />

terpretation <strong>of</strong> higher-order combinators to generate placed circuits, however both generate<br />

flattened netlists rather than parameterised output. Neither supports loop iteration constructs,<br />

although this is somewhat less relevant since they are not being compiled into a form that<br />

relies heavily on iteration (VHDL).<br />

Pebble [49, 50] has been extended <strong>with</strong> below and beside relative placement language con-<br />

structs and a procedure for the conversion from relative to explicit co-ordinates has been<br />

formally analysed. This approach allows the generation <strong>of</strong> parameterised output and ensures<br />

that components can not overlap, however it is somewhat limiting in that it does not support<br />

mixing absolute and relative placement information in a single description and can not de-<br />

scribe pathological examples such as the irregular grid arrangement illustrated in Chapter 1.<br />

Furthermore, the generated placements are not necessarily as compact as possible, although<br />

this can be improved through the use <strong>of</strong> partial evaluation. The Pebble approach <strong>of</strong> building<br />

relative placement constructs into the language rather than allowing them to be user-defined<br />

as Quartz permits through its higher-order combinators is inherently more limited, although<br />

necessarily so since Pebble is not a high-order language.


Chapter 4<br />

Verifying <strong>Circuit</strong> <strong>Layout</strong>s<br />

In this chapter we describe how Quartz circuit layouts can be verified mechanically. Sec-<br />

tion 4.1 introduces our basic approach and Section 4.2 explains why Higher-Order Logic was<br />

selected as a good formalism. Section 4.3 discusses how we will define “correctness” of a layout.<br />
Section 4.4 describes the theoretical developments that provide a proof environment for<br />

Quartz layouts, while Section 4.5 describes how the Quartz compiler is adapted to generate<br />

definitions and pro<strong>of</strong> obligations automatically. Section 4.6 demonstrates the application <strong>of</strong><br />

our verification framework to the Prelude library and describes how automatically-generated<br />

scripts are honed based on these examples, while in Section 4.7 we apply the system to a<br />

range <strong>of</strong> other combinators. Section 4.8 discusses the strengths and weaknesses <strong>of</strong> our system<br />

and Section 4.9 summarises this chapter.<br />

4.1 Introduction<br />

Using explicit co-ordinates to define the placement <strong>of</strong> components in parameterised circuit<br />

descriptions can be complex and error-prone. The Quartz layout system based on giving<br />

layout interpretations to higher-order combinators substantially reduces the scope for errors<br />

since the task <strong>of</strong> describing a circuit layout is divided hierarchically into smaller and simpler<br />

components. Combinators <strong>with</strong> layout interpretations can be written once and used many<br />

times, so while the placement co-ordinates <strong>with</strong>in each combinator are relatively simple, it is<br />



CHAPTER 4. VERIFYING CIRCUIT LAYOUTS 66<br />

still vital to ensure that the layout is correct.<br />

While close examination by a human designer is a good method of finding many bugs in<br />

layout descriptions, it is no substitute for a formal assurance <strong>of</strong> correctness. In the develop-<br />

ment <strong>of</strong> the example designs illustrated in Chapter 6 there have been several occasions where<br />

Quartz descriptions which appeared on first inspection to be correctly laid out have turned<br />

out to contain errors.<br />

The hierarchical decomposition that is typical <strong>of</strong> Quartz circuit designs can be exploited so<br />

that sections of a circuit can be proved correct independently of each other and then<br />

these pro<strong>of</strong>s integrated into a pro<strong>of</strong> <strong>of</strong> correctness for the entire design. However, the type<br />

<strong>of</strong> pro<strong>of</strong>s involved in formally verifying a layout description are not particularly well suited<br />

to hand-pro<strong>of</strong> by the designer, or indeed a different pro<strong>of</strong> expert.<br />

Layouts of large numbers of components lead to long theorems requiring proof, but the<br />
constituents of these theorems are often either trivial or quite simple. These two factors<br />
combine to make “pen and paper” proof of these theorems particularly unreliable unless<br />
extreme care is taken.<br />

Furthermore, there is a high level <strong>of</strong> pro<strong>of</strong> re-use between different circuit descriptions, by<br />

exploiting similar properties <strong>of</strong> arithmetic operators, binary relations and Quartz size expres-<br />

sion functions. This suggests that layout verification may be a good candidate for the use <strong>of</strong><br />

mechanised pro<strong>of</strong> tools.<br />

In this chapter we describe a pro<strong>of</strong> infrastructure based on Higher-Order Logic which elimi-<br />

nates the possibility <strong>of</strong> human error in pro<strong>of</strong>s and demonstrates a high level <strong>of</strong> automation.<br />

4.2 Choice <strong>of</strong> Formalism<br />

There are two main possible approaches to verifying Quartz layouts. The first is to verify the<br />

output circuit description for each compiled circuit, either in the form <strong>of</strong> parameterised/hier-<br />

archical VHDL or in netlist format. <strong>Verification</strong> <strong>of</strong> placed netlists is effectively carried out<br />

by the synthesis tools that generate the <strong>FPGA</strong> bitstream since they will raise an error if an<br />

incorrect layout is specified, however this is not <strong>of</strong> much use if the desire is to provide an



assurance that the generated parameterised VHDL library (Chapter 3) is correct.<br />

<strong>Verification</strong> <strong>of</strong> the parameterised VHDL would be possible but would have to be substantially<br />

repeated for each new circuit. The alternative approach is to attempt to verify the original<br />

Quartz description. Quartz combinators could be verified once and then this pro<strong>of</strong> could be<br />

used in the pro<strong>of</strong>s <strong>of</strong> all circuit descriptions that use this combinator. This approach <strong>of</strong>fers a<br />

higher degree <strong>of</strong> reuse <strong>of</strong> pro<strong>of</strong>s, which is extremely beneficial.<br />

Since Quartz is a high-order language, we have selected Higher-Order Logic (HOL) as the<br />

appropriate formalism for our pro<strong>of</strong> system. This enables us to model most <strong>of</strong> the features<br />

<strong>of</strong> Quartz descriptions and thus to conduct verification at the level closest to the original<br />

circuit description. The use <strong>of</strong> HOL for functional verification <strong>of</strong> hardware is well understood<br />

[8, 52], although the level <strong>of</strong> automation that can be achieved is <strong>of</strong>ten not that great.<br />

Other formalisms, such as the Boyer-Moore logic [9] used by the ACL2 theorem prover<br />

[34], are simpler and can be more highly automated however they are less general. Using a<br />

first-order logic makes it impossible to prove properties <strong>of</strong> Quartz higher-order combinators,<br />

only about their instantiated instances: while we might wish to prove the correctness of the<br />

the map n R combinator we would be restricted to separately proving the correctness <strong>of</strong> the<br />

map n add and map n inv blocks.<br />

In this chapter we develop a system based around the embedding <strong>of</strong> HOL in the Isabelle<br />

[61] generic theorem prover. Isabelle/HOL [55] is a well developed Isabelle object logic and<br />

comes <strong>with</strong> many useful definitions and theorems.<br />

Our infrastructure is not specific to Isabelle, or to the Isabelle version <strong>of</strong> HOL, nor are we<br />

limited to using HOL for pro<strong>of</strong>s. The layout verification stage <strong>of</strong> the Quartz compiler is<br />

designed to be invoked on polymorphic, high-order circuit descriptions however it could be<br />

invoked later during compilation to produce output for a different formalism. We separate<br />

the generation <strong>of</strong> pro<strong>of</strong> obligations from the interface to the Isabelle theorem prover and it<br />

would be easy to provide an interface to a different pro<strong>of</strong> tool.



4.3 Specifying Correctness<br />

In order to formally verify anything it is necessary to state the requirements for correctness.<br />

In the case <strong>of</strong> circuit layouts we define correctness in terms <strong>of</strong> validity, containment and<br />

intersection.<br />

4.3.1 Validity<br />

Validity is a property <strong>of</strong> the size expressions (height and width) for a block:<br />

Definition 5 A block size expression is valid if, for all allowable values <strong>of</strong> all variables in<br />

the expression, it always evaluates to a value greater than or equal to zero.<br />

∀x1, x2, . . . , xn. assertions(x1, x2, . . . , xn) ⇒ 0 ≤ f(x1, x2, . . . , xn)<br />

This requirement may appear trivial and indeed its pro<strong>of</strong> is <strong>of</strong>ten easy, however it is an<br />

extremely important requirement. Blocks <strong>with</strong> size expressions that evaluate to negative<br />

values will usually render otherwise correct layouts useless. A common pro<strong>of</strong> obligation for<br />

other correctness requirements is <strong>of</strong> the form:<br />

sizeA ≤ sizeA + sizeB<br />

This is provable only if it can be assumed that sizeB ≥ 0.<br />

The implication in Definition 5 is also significant. It states that it is only necessary for size<br />

expressions to be valid for inputs that meet the preconditions specified in the design (via<br />

assertions). For example, a size expression n × 2 is valid provided that n ≥ 0.<br />
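Validity is a proof obligation, but a cheap sampled check conveys its shape. The sketch below (function names are invented here) tests 0 ≤ f(x) only at sampled points satisfying the assertions, so it can find counterexamples but never constitutes a proof.<br />

```python
# Sketch of a (non-exhaustive) validity check: a size expression is valid
# when, for every parameter value satisfying the block's assertions, it
# evaluates to a value >= 0. Sampling only finds counterexamples.

def is_valid_on_samples(size_expr, assertions, samples):
    return all(size_expr(x) >= 0 for x in samples if assertions(x))

# Example from the text: n * 2 is valid provided that n >= 0.
valid = is_valid_on_samples(lambda n: n * 2, lambda n: n >= 0, range(-5, 6))
```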

4.3.2 Containment<br />

The size <strong>of</strong> a block is a bounding box defined as a rectangle <strong>with</strong> bottom left co-ordinates (0, 0)<br />

and the top right corner <strong>with</strong> co-ordinates as defined by the block’s size expressions. The size<br />

<strong>of</strong> a block can be specified manually and can be regarded as a specification that a block must



[Figure: two arrangements of blocks A–F inside a bounding box. (a) Incorrect containment; (b) Correct containment.]<br />
Figure 4.1: Different layouts for the same components can affect containment within an area<br />

meet. A block meets the containment requirement if all sub-blocks are instantiated <strong>with</strong>in<br />

the bounding box.<br />

Definition 6 A block is correctly contained if, for all allowable values <strong>of</strong> all input variables<br />

to the block, all instantiated sub-components fall <strong>with</strong>in the block’s bounding box<br />

∀x1, . . . , xn. assertions(x1, . . . , xn) ⇒ ∀(p ; B ; q at (x, y)) ∈ InstantiatedBlocks.<br />
0 ≤ x ∧ 0 ≤ y ∧<br />
x + Bwidth ≤ width(x1, . . . , xn) ∧ y + Bheight ≤ height(x1, . . . , xn)<br />

Figure 4.1 illustrates how the same components can be laid out <strong>with</strong>in a block in ways that<br />

either fail or meet this layout correctness constraint.<br />
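A minimal executable sketch of Definition 6 (assumed names, not the generated HOL): each sub-block placed at (x, y) with a given width and height must lie inside the parent's bounding box anchored at (0, 0).

```python
# Sketch of the containment check of Definition 6. Each sub-block is a
# placement (x, y, w, h); the parent bounding box spans (0, 0) to
# (width, height).

def contained(sub_blocks, width, height):
    return all(
        0 <= x and 0 <= y and x + w <= width and y + h <= height
        for (x, y, w, h) in sub_blocks
    )

# Figure 4.1 in miniature: the same two 2x2 components inside a 4x2 box,
# laid out correctly and incorrectly.
assert contained([(0, 0, 2, 2), (2, 0, 2, 2)], 4, 2)
assert not contained([(0, 0, 2, 2), (3, 0, 2, 2)], 4, 2)
```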

The containment constraint is also vital for layout verification pro<strong>of</strong>s. It permits hierarchical<br />

pro<strong>of</strong>s where the internal arrangement <strong>of</strong> blocks can be ignored and the block analysed purely<br />

in terms <strong>of</strong> its size expressions. It is sometimes desirable to relax this correctness requirement,<br />

as we discuss later in this chapter.<br />

4.3.3 Intersection<br />

The most significant source of potential problems in explicitly laid-out designs is the intersection of blocks. While this problem is somewhat reduced by hierarchical design descriptions and relative co-ordinates, the possibility of errors is not totally eliminated. Intersection occurs when the bounding boxes of two blocks overlap, leading to logic resources



Figure 4.2: Situations under which two rectangles will not overlap: (a) x22 ≤ x11; (b) x12 ≤ x21; (c) y12 ≤ y21; (d) y22 ≤ y11. Rectangle A has corners (x11, y11) and (x12, y12); rectangle B has corners (x21, y21) and (x22, y22).

possibly being allocated to both. This must obviously be avoided in correct circuit descriptions.

Ensuring this correctness criterion is the common target of work on relative placement. The problem of ensuring that n rectangles do not overlap is well understood in constraint programming [1, 6] and arises in placement algorithms; in our case, however, we are interested not in finding a solution to the problem but in checking one. Existing work on constraint solving applied to this problem is not suitable for adaptation to the Quartz system because we permit a much richer language of expressions than simple arithmetic operations to define block sizes.

Two objects placed on an n-dimensional surface will not overlap if there is one dimension in which they cannot intersect. For rectangles on a two-dimensional surface this means that one block must be either below or above the other, or to its left or right. There are therefore four possible situations in which a layout can be correct, illustrated graphically in Figure 4.2. When rectangle A is described by corners (x11, y11) and (x12, y12), and rectangle B by corners (x21, y21) and (x22, y22), this can be represented by the logical disjunction:

(x22 ≤ x11) ∨ (x12 ≤ x21) ∨ (y12 ≤ y21) ∨ (y22 ≤ y11)



This leads naturally to the definition <strong>of</strong> intersection correctness:<br />

Definition 7 For every pair <strong>of</strong> block instantiations A and B <strong>with</strong>in a block, where A is<br />

placed at (xA, yA) <strong>with</strong> size functions (widthA, heightA) and B is placed at (xB, yB) <strong>with</strong><br />

size functions (widthB, heightB), for all possible allowable input values:<br />

(xB +widthB ≤ xA) ∨ (xA+widthA ≤ xB) ∨ (yB+heightB ≤ yA) ∨ (yA+heightA ≤ yB)<br />

A naive implementation of Definition 7 generates n × (n − 1) proof obligations for a block containing n instantiations. However, by exploiting the symmetry of the disjunction this can be reduced to n × (n − 1)/2 obligations. For blocks with a large number of instantiated components this still generates many proof obligations; however, because Quartz designs tend to be broken up into many entities, each of which contains only a few constructs, this is less of a problem than it might seem at first.
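The four-way disjunction and the pairwise reduction can be sketched as follows (illustrative Python; disjoint and no_intersections are our names, not the compiler's):

```python
from itertools import combinations

# Two placed rectangles (x, y, w, h) do not overlap iff one of the four
# disjuncts of Definition 7 holds: one rectangle lies entirely to the left
# of, to the right of, below, or above the other.

def disjoint(a, b):
    (xa, ya, wa, ha), (xb, yb, wb, hb) = a, b
    return xb + wb <= xa or xa + wa <= xb or yb + hb <= ya or ya + ha <= yb

# Because disjoint is symmetric, each unordered pair need only be checked once.
def no_intersections(blocks):
    return all(disjoint(a, b) for a, b in combinations(blocks, 2))

row = [(0, 0, 1, 1), (1, 0, 1, 1), (2, 0, 1, 1)]
assert no_intersections(row)                         # abutting squares are fine
assert not no_intersections(row + [(0.5, 0, 1, 1)])  # a shifted copy overlaps
```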

The generation of proof obligations from Quartz descriptions is discussed in detail in Section 4.5; however, it is worth mentioning here the special case of iteration. A Quartz for loop can lead to more than one block being instantiated, and these blocks can potentially intersect with one another. When generating proof obligations for a for loop construct, an additional proof obligation therefore states that the blocks instantiated by any two iterations of the loop cannot overlap.

4.4 Pro<strong>of</strong> Environment<br />

We develop a mechanised theorem proving environment for layout verification based on a<br />

shallow embedding [8] <strong>of</strong> Quartz in Higher-Order Logic. This involves the definition <strong>of</strong> the<br />

semantics <strong>of</strong> Quartz constructs in terms <strong>of</strong> HOL connectives.<br />

We develop the Quartz<strong>Layout</strong> library <strong>of</strong> theories which provides definitions for and useful<br />

theorems about a sufficient subset <strong>of</strong> Quartz to enable layout pro<strong>of</strong>s. Our embedding is quite<br />

different to the typical embeddings <strong>of</strong> hardware description languages in logic since our aim<br />

is not to engage in functional verification but rather to verify layout. This means that the



Figure 4.3: The QuartzLayout theory hierarchy (theories: Types, Inbuilt, Block, Structures, Functions, IntAlgebra, CompilerSimps, SeriesComposition, ParallelComposition, QuartzLayout). Rectangular nodes develop the theory of the language itself; oval nodes define functions for size expressions and useful theorems. Arrows indicate dependencies.

actual hardware generated by a Quartz description is of interest only in so far as it affects the layout of the design; wiring is of no interest, since we assume that wiring resources are separate from computational resources.

Our Quartz model is built up hierarchically from a number <strong>of</strong> Isabelle theories, as illustrated<br />

in Figure 4.3. In this chapter we describe how some <strong>of</strong> the language features <strong>of</strong> Quartz are<br />

modelled and identify some useful theorems. The full Isabelle theory development can be<br />

seen in Appendix B.<br />

4.4.1 Type System<br />

Isabelle/HOL has types and polymorphism based on the Hindley/Milner system [53], the same basic system as Quartz. It also supports overloading, although this is based on type classes [84], which are not compatible with the Quartz overloading system [64]. Overloading complicates verification since it is not necessarily possible to determine which block instance is selected, and different instances can have different size expressions.

In the Quartz<strong>Layout</strong> system we ignore overloading and assume it has been resolved prior to



verification. The modified Quartz compiler ensures that this occurs (see Section 4.5).<br />

Otherwise the HOL type system is generally suitable for modelling the Quartz type system. Quartz boolean types can be represented as HOL booleans and Quartz integers can be modelled as HOL integers. Isabelle/HOL also defines a theory of natural numbers with an extensive range of useful theorems, and we will make some use of this when modelling recursive blocks; in general, however, we use the integer type, which is a more accurate model and supports true arithmetic.

Quartz tuples are represented using the HOL Cartesian product type, which defines tuples as nested pairs. This means that the types of (a, b, c) and (a, (b, c)) are considered equivalent; however, this is not a major problem, since this is more permissive than the Quartz type system and designs must already type-check using the Quartz compiler's type processor.

The Types theory defines the types <strong>of</strong> wires and <strong>of</strong> vectors. Wires are named types only, <strong>with</strong><br />

no properties. This is a useful model for our purposes since we are actually not interested<br />

in wiring in any way and the values on wires in a generated circuit are not relevant to its<br />

compilation.<br />

Vectors are defined as functions from integers to some other type. In this model a two-dimensional Quartz vector wire a[m][n] is given type int → int → wire. There are significant limitations to this model [52], essentially arising from the fact that it is not possible to express that a vector has a fixed number of elements, nor to conclude that if each element of two vectors is equivalent then the vectors themselves are equivalent. However, this model is quite sufficient for layout proofs, although it would likely be limiting when reasoning about functional properties.
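The limitation can be seen concretely in this Python stand-in for the functional vector model (illustrative only; the names are ours):

```python
# Vectors modelled as total functions from indices: two vectors that agree
# on every element of the intended range [0, 4) may still differ as
# functions, so function equality cannot serve as vector equality.

def vec_a(i):
    return i * i if 0 <= i < 4 else 0

def vec_b(i):
    return i * i if 0 <= i < 4 else -1

assert all(vec_a(i) == vec_b(i) for i in range(4))  # element-wise equal
assert vec_a(7) != vec_b(7)                         # not equal as functions
```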

4.4.2 Blocks and Block Instantiation<br />

There are three characteristics of blocks that are of interest when reasoning about Quartz layouts: their semantic interpretation, width expression and height expression. It is necessary to model the meaning of blocks, not merely their height and width expressions, since the two are inter-related: size expressions can involve internal signals defined in a block.



record ('a,'b) block =
  Def :: "'a"
  Height :: "'b"
  Width :: "'b"

constdefs
  ap :: "[('a⇒'b, 'a⇒'c) block, 'a] ⇒ ('b,'c) block"    (infixl "$" 49)
  "ap ≡ λB x. (| Def = Def B x, Height = Height B x, Width = Width B x |)"

constdefs
  inst :: "['a, ('a⇒'b⇒bool, 'a⇒'b⇒int) block, 'b] ⇒ (bool, int) block"    ("_ ;;; _ ;;; _" [45, 46, 47] 45)
  "inst ≡ (λx B y. (| Def = Def B x y, Height = Height B x y, Width = Width B x y |))"

Figure 4.4: Part of the QuartzLayout theory of blocks

Because Quartz blocks are relational, rather than functional, we must model block semantics as logical predicates on their domain and range signals. A block's predicate returns true for all valid combinations of inputs and outputs and false for invalid combinations. This means that a predicate cannot calculate a block's output values from its inputs in the way a function can; however, it can confirm that values are correct. Within block semantic definitions, internal signals can be represented using existential quantifiers ∃ (written "EX" in Isabelle/HOL).

We model block size functions, rather than size expressions. A size function includes the<br />

block’s semantic definition <strong>with</strong>in it to define the values <strong>of</strong> any internal variables, bound<br />

using the definite description operator.<br />

A block is modelled as a record consisting of a logical predicate, a height function and a width function, as shown in Figure 4.4. A block with type δ1 δ2 . . . δn ∼ ρ is modelled as a record of type (δ1 → δ2 → · · · → δn → ρ → bool, δ1 → δ2 → · · · → δn → ρ → int) block. While the general form of the type is the same for both the size functions (which return integers) and the semantic definition predicate (which returns a boolean), they have to be defined separately because of the way types are described.

Two operations are defined on block records to model the behaviour described by statements of the form a ; bid p1 . . . pn ; b. Block application describes the application of an inner (curried) parameter to a block. So as not to confuse this with HOL function application, in QuartzLayout this requires the use of a dollar-sign operator. The block instantiation



constdefs
  ser :: "[('a⇒'b⇒bool, 'a⇒'b⇒int) block, ('b⇒'c⇒bool, 'b⇒'c⇒int) block]
          ⇒ ('a⇒'c⇒bool, 'a⇒'c⇒int) block"    (infixl ";;" 48)
  "ser ≡ (λB1 B2. (| Def = λx y. ∃s. (Def B1) x s ∧ (Def B2) s y,
                     Height = λx y. let s = (THE s. (Def B1) x s ∧ (Def B2) s y) in
                                    max (Height B1 x s) (Height B2 s y),
                     Width = λx y. let s = (THE s. (Def B1) x s ∧ (Def B2) s y) in
                                   (Width B1 x s) + (Width B2 s y) |))"

Figure 4.5: Part of the QuartzLayout theory of series composition

operation is identified by triple semi-colons in HOL. This eases the job of parsing QuartzLayout definitions and allows it to be clearly distinguished from series compositions. A block instantiation in QuartzLayout is therefore represented by the string a ;;; bid $ p1 . . . $ pn ;;; b. This is sufficiently similar to the Quartz syntax to be readable, and yet sufficiently simplified not to require the development of AST-transforming ML functions for Isabelle.

The contents <strong>of</strong> a block record are accessed using the functions Def, Height and Width, so<br />

Height (a ;;; bid $ p1 . . . $ pn ;;; b) will return an integer value corresponding to<br />

the height <strong>of</strong> that block instantiation. This is identical to the height() function <strong>of</strong> Quartz<br />

expressions and is used to model this operation.<br />

Block compositions are described in the SeriesComposition and ParallelComposition theories.<br />

Parallel composition is defined as an operation on nested pairs, mirroring the definition <strong>of</strong><br />

Cartesian products used to model Quartz tuples and is syntactically represented using double<br />

square brackets.<br />

Series composition is defined as a left-associative binary operator (identified by double semi-colons), as illustrated in Figure 4.5. The semantic definition of a series composition of two blocks includes an explicit internal signal connecting them, while the definite description operator is used to parameterise the width and height functions of the two blocks by the correct value to determine the width and height of the composition.

The definitions <strong>of</strong> series and parallel composition implement the SIσ β function defined on<br />

Page 43. Series compositions are laid out horizontally <strong>with</strong> the widths summed and the<br />

maximum height selected, while parallel compositions are laid out vertically <strong>with</strong> heights<br />

summed and the maximum width selected.
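As a minimal sketch of this layout rule alone (illustrative Python; the semantic predicate and definite-description machinery of the Isabelle development are elided, and the names are ours):

```python
from dataclasses import dataclass

# Size bookkeeping of the two composition operators: series composition sums
# widths and takes the maximum height; parallel composition sums heights and
# takes the maximum width.

@dataclass
class Size:
    height: int
    width: int

def series(b1, b2):
    return Size(height=max(b1.height, b2.height), width=b1.width + b2.width)

def parallel(b1, b2):
    return Size(height=b1.height + b2.height, width=max(b1.width, b2.width))

a, b = Size(height=2, width=3), Size(height=4, width=1)
assert series(a, b) == Size(height=4, width=4)    # side by side
assert parallel(a, b) == Size(height=6, width=3)  # stacked vertically
```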



Block operations and definitions in QuartzLayout are sufficiently powerful to allow the proofs of some (functional) Quartz laws, such as: fst A ; snd B = [A, B]. This kind of theorem can be described in QuartzLayout as:

!! a b c d. Def ((a, b) ;;; fst $ A ;; snd $ B ;;; (c, d)) = Def ((a, b) ;;;<br />

[[ A , B ]] ;;; (c, d))<br />

(where “!!” is Isabelle’s meta-logic operator for universal quantification). This can be proved<br />

using Isabelle’s simplifier to expand the definitions <strong>of</strong> parallel composition, fst and snd.<br />

4.4.3 Expressions<br />

HOL arithmetic expressions will be used to model Quartz expressions. The IntAlgebra theory defines several additional operators that are required by Quartz but are not provided by Isabelle/HOL, such as a greater-than ordering and a power function for integers. The IntAlgebra theory also includes proofs of many useful theorems for re-arranging arithmetic inequalities of the kinds that will be needed to reason about circuit layouts. Most of these theorems can be proved easily using Isabelle's simplifier, classical reasoner or arithmetic decision procedure. Some particularly useful and simple theorems concern the max function:

Theorem 8 ∀ m n. n ≤ max n m<br />

Theorem 9 ∀ f g n. n ≤ f ∨ n ≤ g ⇒ n ≤ max f g<br />

Theorem 10 ∀ m n. 0 ≤ n ⇒ max(m + n) m = (m + n)<br />

Proof Trivial, by expanding the definition of the max function. Mechanised proofs are given in Appendix B.1 as theorems max_nm_nleq, max_geq_n_disj and max_xyge0.

Another theorem that is simple and extremely useful is:<br />

Theorem 11 (0 ≤ a ∧ 0 ≤ b ∧ 0 ≤ c ∧ a ≤ b) ⇒ a ≤ (b + c)<br />

Proof Trivial, given as theorem z_aleq_bc in Appendix B.1.



consts
  maxf :: "(int ∗ int ∗ (int⇒int)) ⇒ int"
recdef maxf "measure (λ(b, t, f). nat (t + 1 − b))"
  "maxf (bot, top, fun) = (if (top < bot) then 0
     else (case (top = bot) of
         True ⇒ fun top
       | False ⇒ (let one = fun top in
                   let two = maxf (bot, top − 1, fun) in
                   max one two)))"

consts
  sum :: "(int ∗ int ∗ (int⇒int)) ⇒ int"
recdef sum "measure (λ(b, t, f). nat (t + 1 − b))"
  "sum (bot, top, fun) = (
     case (top < bot) of
         True ⇒ 0
       | False ⇒ sum (bot, top − 1, fun) + (fun top))"

Figure 4.6: Definitions of the Quartz sum and maxf functions in QuartzLayout

Theorem 11 is particularly useful since we are often confronted with expressions of this form, where a, b and c are size functions for various blocks. Using this theorem we can simplify the logical proposition substantially, since we prove that 0 ≤ x for all size expressions x as a matter of course and can use these theorems in other proofs.

While Isabelle/HOL already includes definitions of max and if, we have to define the more complex functions maxf and sum. The definitions of these functions, from the theory Functions, can be seen in Figure 4.6. These definitions have the same semantics as the appropriate clauses of the semantic function for Quartz expressions Eσβµ given on page 42.

The functions are defined using the recdef method of declaring arbitrary recursive functions. The measure functions allow Isabelle/HOL to prove termination by showing that the supplied measure decreases for each recursive call. We use "case" rather than "if" expressions within the functions to avoid problems with the use of Isabelle's simplifier, where conditionals are repeatedly split, leading to a loop. Besides this, the two forms behave identically and the equivalence between them can be proved easily (see theorem maxf_expand_if in Appendix B.7 for such a proof for the maxf function).
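For reference, the two functions can be transliterated into Python (an illustrative sketch; sum is renamed sumf to avoid shadowing the Python built-in):

```python
# Python transliteration of the Figure 4.6 definitions: both functions
# recurse downwards from top towards bot and return 0 on an empty range.

def maxf(bot, top, fun):
    if top < bot:
        return 0
    if top == bot:
        return fun(top)
    return max(fun(top), maxf(bot, top - 1, fun))

def sumf(bot, top, fun):
    if top < bot:
        return 0
    return sumf(bot, top - 1, fun) + fun(top)

f = lambda i: i * i
# Theorem 12 in miniature: maxf returns the maximum of f over [b, t].
assert maxf(0, 5, f) == max(f(i) for i in range(6))
# Theorem 16 in miniature: sum(b, t, f) + f(t + 1) = sum(b, t + 1, f).
assert sumf(0, 5, f) + f(6) == sumf(0, 6, f)
```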

We prove the correctness <strong>of</strong> the maxf function, showing that it implements a logical definition:



Theorem 12
∀ b t f. b ≤ t ∧ (∀y. 0 ≤ f y) ⇒
    let max = maxf(b, t, f) in
        (∃y. b ≤ y ∧ y ≤ t ∧ f y = max) ∧
        (∀x. b ≤ x ∧ x ≤ t ⇒ f x ≤ max)

Proof The two parts of the conjunction can be proved separately. The first states that the maxf function returns a value that is produced by the function f, while the second states that there is no greater value returned by f within the specified range. Both can be proved by induction over t, bounded from below by b (in Isabelle this is done using the int_ge_induct induction schema), and then re-arrangement using the definitions of max and maxf. Mechanised proofs are available as theorems maxf_ansvalid, maxf_fmax and maxf_is_maxf in Appendix B.7.

This proof is significant since it provides a mechanism for moving between the functional and logical definitions of maxf should this be desired during a proof.

The Functions theory contains the proofs of many useful properties of maxf and sum, including how they relate. Most of these proofs are carried out by induction on the value of top. Some particularly useful theorems about maxf are:

Theorem 13 ∀ m n f. (∀y. 0 ≤ f y) ⇒ 0 ≤ maxf(m, n, f)<br />

Theorem 14 ∀ m n f. n < m ⇒ 0 ≤ maxf(m, n, f)<br />

Both <strong>of</strong> these theorems are useful in containment pro<strong>of</strong>s. Some similarly useful theorems for<br />

sum are:<br />

Theorem 15 ∀ b t f g. b ≤ t ⇒ sum(b, t, λi. f i + g i) = sum(b, t, f) + sum(b, t, g)<br />

Theorem 16 ∀ b t f. b ≤ t ⇒ sum(b, t, f) + f (t + 1) = sum(b, t + 1, f)



Theorem 17 ∀ b t f. (∀y. 0 ≤ f y) ⇒ 0 ≤ sum(b, t, f)<br />

We also prove theorems about the inter-relationship between these two functions, such as:<br />

Theorem 18 ∀ b t f. (∀ x. 0 ≤ f x) ⇒ maxf(b, t, λk. sum(b, k, f)) = sum(b, t, f)

Theorem 19<br />

∀ b t f. b ≤ t ∧ (∀ x. 0 ≤ f x) ⇒<br />

maxf(b, t + 1, λk. sum(b, k, f)) = maxf(b, t, λk. sum(b, k, f) + f(t + 1))<br />

Pro<strong>of</strong>s <strong>of</strong> all these theorems and many others about maxf and sum are listed in Appendix B.7.<br />

4.5 Generating Theories <strong>of</strong> Quartz Programs<br />

Given that we are using a shallow embedding of Quartz in Isabelle/HOL, we cannot reason about Quartz descriptions directly. Instead, we must translate Quartz descriptions into semantic descriptions in Higher-Order Logic.

4.5.1 Compiler Architecture<br />

When in verification mode the Quartz compiler translates from Quartz descriptions to Isabelle/HOL during compilation. This allows the Quartz compiler to resolve all overloading prior to generating HOL descriptions, a vital step since our QuartzLayout system does not support overloading.

Figure 4.7 shows the functionality <strong>of</strong> the Quartz compiler when in layout verification mode.<br />

<strong>Layout</strong> verification is divided between the <strong>Layout</strong> Processing and Isabelle modules. The<br />

<strong>Layout</strong> Processing module converts Quartz programs into their HOL semantic definitions<br />

and generates theorems that must be proved to verify the correctness <strong>of</strong> a layout. These<br />

HOL definitions and pro<strong>of</strong> obligations are generated in an abstract data format which is then<br />

passed to the Isabelle module.



[Figure: compiler pipeline. Quartz input passes through the lexer/parser and preprocessor to produce a Quartz AST; type processing covers identifier conversion, type inference and overloading resolution; layout processing covers full instantiation, placement checks, size inference, finding dependencies for composites and generating HOL for composites, primitives and proof goals; the Isabelle module generates tacticals and Isabelle scripts via a pretty printer to produce Isabelle input.]

Figure 4.7: Functions of the Quartz compiler in layout verification mode

The Isabelle module is the only part of the compiler process that is specific to Isabelle. It generates Isabelle/HOL proof scripts using the correct syntax and outputs them to disk, where they can be loaded by Isabelle. The Isabelle module also does its best to automatically generate input to Isabelle's automatic proof tools; this is described in Section 4.6.

Where blocks are part <strong>of</strong> a library and have already had their layouts proved the compiler<br />

does not generate pro<strong>of</strong> scripts, assuming that these scripts are available for loading into the<br />

theorem prover from elsewhere. This is controlled by a special “layout-proved” attribute that<br />

can be attached to blocks.<br />

4.5.2 Generating Definitions<br />

Given the model developed in the previous section, Isabelle/HOL expressions with the expanded functionality of the QuartzLayout Functions library essentially implement a super-set of Quartz expressions. It is therefore possible to translate Quartz expressions into Isabelle/HOL directly, merely by re-writing syntax as necessary.

Quartz blocks and the statements of which they are composed must be translated into their semantic interpretations. We assign a particular semantic interpretation to each statement type, recursively defined as necessary (for example, for loop statements), and describe a block as the logical conjunction of its statement semantics. This process is syntax-directed and



B′β :: BlockEnv → BlkInst → Predicate

B′β (bid p1 . . . pn) =
    if bid ∈ β then Bβ (β(bid)) (p1, . . . , pn)
    else bid(p1, . . . , pn)

B′β [ b1, . . . , bn ] =
    let m1 = B′β b1 in
    ...
    let mn = B′β bn in
    λ(x1, . . . , xn) (y1, . . . , yn). m1(x1, y1) ∧ . . . ∧ mn(xn, yn)

B′β (b1 ; b2) =
    let m1 = B′β b1 in
    let m2 = B′β b2 in
    λx y. ∃s. m1(x, s) ∧ m2(s, y)

B :: BlockEnv → Block → Predicate

Bβ (block bid d1 . . . dn ∼ r { τ1 id1 . . . τp idp. stmts }) =
    λd1 . . . dn r. ∃ id1 . . . idp. S′β stmts

S′ :: BlockEnv → StmtList → Bool

S′β (stmt1 . . . stmtn) =
    let m1 = Sβ stmt1 in
    ...
    let mn = Sβ stmtn in
    m1 ∧ . . . ∧ mn

Sβ :: BlockEnv → Stmt → Bool

Sβ (assert e str) = e
Sβ (e1 = e2) = (e1 = e2)
Sβ (if e { stmts1 } else { stmts2 }) =
    let m1 = S′β stmts1 in
    let m2 = S′β stmts2 in
    if e then m1 else m2
Sβ (for i = e1..e2 { stmts }) = ∀i. e1 ≤ i ≤ e2 −→ S′β stmts
Sβ (a ; blkinst ; b at (x, y)) = B′β blkinst (a, b)

Figure 4.8: Converting Quartz blocks into their semantic interpretation in Higher-Order Logic. The function Bβ defines the formal semantics of Quartz using HOL.



SB′ :: BlockEnv → Block → (SizeFunc × SizeFunc)

SB′β (block bid d1 . . . dn ∼ r attributes { height = h. width = w. } { τ1 id1 . . . τp idp. stmts }) =
    let m = S′β stmts in
    (λd1 . . . dn r. let (id1, . . . , idp) = (ι(id1, . . . , idp). m) in w,
     λd1 . . . dn r. let (id1, . . . , idp) = (ι(id1, . . . , idp). m) in h)

Figure 4.9: Converting Quartz size expressions into QuartzLayout size functions

automatic. Figure 4.8 gives the definition <strong>of</strong> the function Bβ which gives the semantics <strong>of</strong> a<br />

block as a logical predicate. This function gives Quartz a formal semantics in HOL, using an<br />

environment β which maps block identifiers to their definitions.<br />

The function in Figure 4.8 is implemented in the Quartz compiler's layout processing module. The only difference between the formal definition and the compiler implementation is that the function B′β is not executed; instead, the modelling of block instantiation within QuartzLayout (Section 4.4.2) is used. For example, the semantics of the map n R combinator, as generated by the compiler, are described by:

λ(n, R) i o. ∀j. (0 ≤ j ∧ j ≤ n − 1) −→ B′β R (i[j], o[j])

In Isabelle's ASCII syntax this is written as (note that "o" has been replaced with "o_", since "o" is a reserved keyword in Isabelle/HOL):

% (n, R) i o_. ALL (j::int). ((0 <= j) & (j <= (n - 1))) --> Def (i j ;;; R ;;; o_ j)
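The generated predicate can be mimicked in Python (illustrative only; R_def stands in for Def R and the vectors are indexed lists):

```python
# Semantics of map n R: the combinator's predicate holds iff R's predicate
# relates i[j] to o[j] for every index j in [0, n - 1].

def map_semantics(n, R_def):
    return lambda i, o: all(R_def(i[j], o[j]) for j in range(n))

# A toy relational R: an inverter relating each input to its negation.
inv = lambda x, y: y == (not x)
pred = map_semantics(3, inv)
assert pred([True, False, True], [False, True, False])
assert not pred([True, False, True], [True, True, False])
```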



Isabelle defines arbitrary recursive functions using the recdef construct; however, functions defined this way require termination to be proved automatically, and this is difficult for the theorem prover to do because of the way Quartz blocks are defined. Unfortunately recdef is a closed box and it is not possible to manually direct the proof and recdef compilation process to generate useful results.

Luckily Isabelle does provide a (less general) recursive construct called primrec for implementing recursion over data structures. This can be used with the natural numbers type to write recursive equations for Quartz blocks in the form R0 = g and Rn+1 = f(Rn). This is limited to cases where the recursion of the block is controlled by a single integer parameter that decreases to zero; however, this comfortably describes most recursively defined Quartz blocks.

We have carried out the translation from recdef to primrec manually; however, there is no reason why the process could not be automated.
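The shape of the translation can be sketched in Python (illustrative only; g, f and R stand for the recursion equations R0 = g and Rn+1 = f(Rn)):

```python
# A recursion controlled by a single integer that decreases to zero can be
# re-expressed as primitive recursion on the naturals: R(0) = g and
# R(n + 1) = f(R(n)).

def primrec(g, f):
    def R(n):
        acc = g
        for _ in range(n):  # n structural descents, i.e. n applications of f
            acc = f(acc)
        return acc
    return R

# Example: a recursively defined block whose width doubles per unfolding.
width = primrec(1, lambda w: 2 * w)
assert [width(n) for n in range(5)] == [1, 2, 4, 8, 16]
```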

4.5.3 Generating Pro<strong>of</strong> Obligations<br />

The Quartz compiler also generates a series of proof obligations that check the correctness of a layout specification. The correctness theorems are split into three groups, representing proofs for validity, containment and intersection. These theorems are contained within the same theory file that defines the block's semantic definition and its height and width functions. This ensures that the theory of a block can only be loaded to support that of a block dependent on it once it has itself been proved correct.

All theorems are universally quantified across all domain and range signals. Validity theorems are the simplest and are proved under the assumption that the size functions of all higher-order block parameters are also valid. Assertion pre-conditions asserted within the block's body are also assumed and can be used in the proof. The general format of the height validity theorem for a block B of type d1 . . . dn ∼ r is:


CHAPTER 4. VERIFYING CIRCUIT LAYOUTS 84<br />

∀ d1 . . . dn r.
  ⟦ (∀ (A :: block dA1 . . . dAm ∼ rA) ∈ {d1, . . ., dn, r}.
       ∀ dA1 . . . dAm rA. 0 ≤ HeightA dA1 . . . dAm rA) ∧
    (∀ c ∈ assertions(B). c) ⟧ ⇒
  0 ≤ HeightB d1 . . . dn r

The width theorem is similar, with width functions substituted for height functions¹.

The validity theorems for each block's height and width function are given the names height_ge0 and width_ge0 and can be used in other theories. It is common to require these proofs when determining the validity of size functions for blocks that use them.

Containment theorems are generated for each statement that involves block instantiations, stating that the leftmost bottom point of each block is greater than or equal to (0, 0) and the top rightmost is less than or equal to (width, height) for all possible values of block parameters (provided assertions are met). The recursive descent algorithm that calculates containment theorems is shown in Figure 4.10.
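For a concrete, fully instantiated layout, the containment condition reduces to simple interval arithmetic. The following Python sketch illustrates the condition being checked; it uses hypothetical names and is not the compiler's implementation:

```python
def contained(block, width, height):
    """Containment condition for one placed sub-block: its bottom-left
    corner is at or above (0, 0) and its top-right corner lies within
    (width, height) of the enclosing block."""
    x, y, w, h = block  # position and size of the instantiated sub-block
    return 0 <= x and 0 <= y and x + w <= width and y + h <= height

# A 2x3 cell placed at (1, 1) fits inside a 4x5 enclosing block:
assert contained((1, 1, 2, 3), 4, 5)
# The same cell placed at (3, 3) overflows the right edge:
assert not contained((3, 3, 2, 3), 4, 5)
```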

For the map n R combinator this generates a containment theorem of:

theorem "⋀(n::int) (R::((’t395⇒ ’t396⇒ bool, ’t395⇒ ’t396⇒ int)block)) (i::(’t395)vector) (o::(’t396)vector).
  ⟦ ∀ (j::int). ((0 ≤ j) ∧ (j ≤ (n − 1))) −→ Def (i ;;; R ;;; o);
    ∀ (qs691::’t395) (qs692::’t396). 0 ≤ (Height (qs691 ;;; R ;;; qs692));
    ∀ (qs691::’t395) (qs692::’t396). 0 ≤ (Width (qs691 ;;; R ;;; qs692)) ⟧ =⇒
  ∀ (j::int). ((0 ≤ j) ∧ (j ≤ (n − 1))) −→
    (((0::int) ≤ 0) ∧
     (0 ≤ sum (0, j − 1, λqs403. Height (i ;;; R ;;; o))) ∧
     ((0 + (Width (i ;;; R ;;; o))) ≤ (maxf (0, n − 1, λqs401. Width (i ;;; R ;;; o)))) ∧
     ((sum (0, j − 1, λqs403. Height (i ;;; R ;;; o)) + (Height (i ;;; R ;;; o))) ≤
      sum (0, n − 1, λqs402. Height (i ;;; R ;;; o))))"

Note that the Quartz compiler has annotated the Isabelle theory with the results of its own

¹Actually the Isabelle module generates four validity theorems, in two different representations. This is done because a size function can be evaluated either as (Height A) p1 . . . pn or as Height (pn−1 ; A p1 . . . pn−2 ; pn). Both of these are precisely equivalent, and the compiler automatically generates a proof for one in terms of the other, requiring the designer to prove only two of the theorems manually. It was originally felt that both representations could be useful in proofs; however, it has emerged that one format is the most useful, and thus the other two theorems are essentially redundant.



CONT :: Block → Bool
CONT block bid d1 . . . dn ∼ r { τ1 id1 . . . τp idp. stmts } =
    ∀ d1 . . . dn r. S∅ stmts ⇒ SCONT′ stmts

SCONT′ :: StmtList → Bool
SCONT′ stmt1 . . . stmtn =
    let c1 = SCONT stmt1 in
    ...
    let cn = SCONT stmtn in
    c1 ∧ . . . ∧ cn

SCONT :: Stmt → Bool
SCONT assert e str = True
SCONT e1 = e2 = True
SCONT if e { stmts1 } else { stmts2 } =
    if e then SCONT′ stmts1 else SCONT′ stmts2
SCONT for i = e1..e2 { stmts } =
    ∀ i. e1 ≤ i ≤ e2 −→ SCONT′ stmts
SCONT a ; blkinst ; b at (x, y) =
    (0 ≤ x) ∧ (0 ≤ y) ∧
    (x + Width(a ; blkinst ; b) ≤ width d1 . . . dn r) ∧
    (y + Height(a ; blkinst ; b) ≤ height d1 . . . dn r)

Figure 4.10: Generating containment theorems

type inference, ensuring that the Isabelle types are correct. The semantic interpretation of the block is generated as a set of assumptions, defining assertions and possibly determining the values of internal signals (none in the case of map).

Intersection theorems are the most complex. They are generated for each block instantiation except the first, checking intersection against the previous statements. Figure 4.11 gives the algorithm that generates intersection proof obligations. At first glance this appears quite complex; however, its structure is really quite simple. The compiler makes a forward pass through the block statements, accumulating statements that have already been processed in the list φ. For each block instantiation that is encountered, the minimum and maximum x and y co-ordinates are identified and are then compared with the equivalent co-ordinates for all blocks previously instantiated. This implements Definition 7, as can be seen most clearly in the last clause of the function INTERSECT.

An important case is the handling of for loops by the function SINTERφ. This generates two sets of requirements: that the elements within a loop do not intersect with previously instantiated blocks, and that the elements within the loop do not intersect with each other. This



INTER :: Block → Bool
INTER block bid d1 . . . dn ∼ r { τ1 id1 . . . τp idp. stmts } =
    ∀ d1 . . . dn r. S∅ stmts ⇒ SINTER′∅ stmts

SINTER′φ :: StmtList → StmtList → Bool
SINTER′φ stmt1 . . . stmtn =
    let c1 = SINTERφ stmt1 in
    let c2 = SINTERφ∪{stmt1} stmt2 in
    ...
    let cn = SINTERφ∪{stmt1,...,stmtn−1} stmtn in
    c1 ∧ . . . ∧ cn

SINTERφ :: StmtList → Stmt → Bool
SINTERφ assert e str = True
SINTERφ e1 = e2 = True
SINTERφ if e { stmts1 } else { stmts2 } =
    if e then SINTER′φ stmts1 else SINTER′φ stmts2
SINTERφ for i = e1..e2 { stmts } =
    ∀ i. e1 ≤ i ≤ e2 −→ SINTER′φ stmts ∧
    ∀ i j. e1 ≤ i ≤ e2 ∧ e1 ≤ j ≤ e2 ∧ i ≠ j −→ SINTER′{i↦j}stmts stmts
SINTERφ a ; blkinst ; b at (x1, y1) =
    let (x2, y2) = (x1 + Width(a ; blkinst ; b), y1 + Height(a ; blkinst ; b)) in
    INTERSECT′(x1,y1),(x2,y2) φ

INTERSECT′(x1,y1),(x2,y2) :: ((Exp × Exp) × (Exp × Exp)) → StmtList → Bool
INTERSECT′(x1,y1),(x2,y2) stmt1 . . . stmtn =
    let c1 = INTERSECT(x1,y1),(x2,y2) stmt1 in
    ...
    let cn = INTERSECT(x1,y1),(x2,y2) stmtn in
    c1 ∧ . . . ∧ cn

INTERSECT(x1,y1),(x2,y2) :: ((Exp × Exp) × (Exp × Exp)) → Stmt → Bool
INTERSECT(x1,y1),(x2,y2) assert e str = True
INTERSECT(x1,y1),(x2,y2) e1 = e2 = True
INTERSECT(x1,y1),(x2,y2) if e { stmts1 } else { stmts2 } =
    if e then INTERSECT′(x1,y1),(x2,y2) stmts1
    else INTERSECT′(x1,y1),(x2,y2) stmts2
INTERSECT(x1,y1),(x2,y2) for i = e1..e2 { stmts } =
    ∀ i. e1 ≤ i ≤ e2 −→ INTERSECT′(x1,y1),(x2,y2) stmts
INTERSECT(x1,y1),(x2,y2) a ; blkinst ; b at (x′1, y′1) =
    let (x′2, y′2) = (x′1 + Width(a ; blkinst ; b), y′1 + Height(a ; blkinst ; b)) in
    (x′2 ≤ x1) ∨ (x2 ≤ x′1) ∨ (y′2 ≤ y1) ∨ (y2 ≤ y′1)

Figure 4.11: Generating intersection theorems



is handled by checking the statements against themselves, with a new identifier substituted for the loop variable. Intersection is only permitted if the two loop variables are the same and thus describe one iteration of the loop, which will obviously intersect totally with itself.
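The disjointness test that the intersection obligations encode is the standard separating-edge condition for axis-aligned rectangles. As an illustration only (hypothetical names, not the compiler's code), the check on a concrete layout looks like:

```python
def disjoint(a, b):
    """Two axis-aligned rectangles, given as (x1, y1, x2, y2), do not
    overlap iff one lies entirely to the left of, right of, below, or
    above the other: the disjunction used by the intersection clauses."""
    (ax1, ay1, ax2, ay2), (bx1, by1, bx2, by2) = a, b
    return bx2 <= ax1 or ax2 <= bx1 or by2 <= ay1 or ay2 <= by1

def no_intersections(blocks):
    """Pairwise check mirroring one theorem per instantiation: each
    block is compared against all previously placed blocks."""
    return all(disjoint(blocks[i], blocks[j])
               for i in range(len(blocks)) for j in range(i))

# Two cells stacked vertically do not intersect; sliding one down does.
assert no_intersections([(0, 0, 1, 1), (0, 1, 1, 2)])
assert not no_intersections([(0, 0, 1, 1), (0, 0, 1, 2)])
```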

The intersection theorem generated for map illustrates this, containing only the case checking that loop iterations do not overlap:

theorem "⋀(n::int) (R::((’t395⇒ ’t396⇒ bool, ’t395⇒ ’t396⇒ int)block)) (i::(’t395)vector) (o::(’t396)vector).
  ⟦ ∀ (j::int). ((0 ≤ j) ∧ (j ≤ (n − 1))) −→ Def (i ;;; R ;;; o);
    ∀ (qs691::’t395) (qs692::’t396). 0 ≤ (Height (qs691 ;;; R ;;; qs692));
    ∀ (qs691::’t395) (qs692::’t396). 0 ≤ (Width (qs691 ;;; R ;;; qs692)) ⟧ =⇒
  ∀ (j::int) (j’::int). ((0 ≤ j) ∧ (j ≤ (n − 1)) ∧ (0 ≤ j’) ∧ (j’ ≤ (n − 1)) ∧ (j’ ≠ j)) −→
    (((0 + (Width (i ;;; R ;;; o))) ≤ 0) |
     ((0 + (Width (i ;;; R ;;; o))) ≤ 0) |
     ((sum (0, j’ − 1, λqs403. Height (i ;;; R ;;; o)) + (Height (i ;;; R ;;; o))) ≤
      sum (0, j − 1, λqs403. Height (i ;;; R ;;; o))) |
     ((sum (0, j − 1, λqs403. Height (i ;;; R ;;; o)) + (Height (i ;;; R ;;; o))) ≤
      sum (0, j’ − 1, λqs403. Height (i ;;; R ;;; o))))"

It is important to note that the algorithms given in Figure 4.10 and Figure 4.11 are pseudo-code and the implementation in the Quartz compiler differs slightly. For example, the compiler carries out a large number of optimisations to eliminate unnecessary goals that are defined as being true. In addition, rather than generating one large intersection theorem, the compiler splits it on a statement-by-statement basis into multiple theorems to make the individual proofs a little simpler.

While containment and intersection theorems are not used elsewhere, the validity theorems often are. It is therefore often useful to "prune" the assumption sets of these theorems to remove assumptions that are not necessary for the proof. When these theorems are used in other proofs the assumptions themselves become proof goals, and proofs can be simplified if the number of assumptions is minimised.



4.6 Proving the Prelude Library

One set of Quartz blocks that it is essential to prove is the prelude library: the basic set of operations that encode a variety of useful functions. The majority of the blocks in the prelude are actually wiring constructs rather than combinators; however, it is still necessary to give them all a layout interpretation. We have done this according to the following rules:

1. Wiring blocks, which simply re-arrange signals, are given size 0 × 0.

2. Repeated composition (Rⁿ/rcomp) is laid out horizontally. The tri and irt blocks, which both use repeated composition, are laid out vertically. map is laid out vertically, as are col and rdr.

3. All other blocks are laid out horizontally, with the exception of grid, which is two-dimensional.
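For concrete sizes, these rules imply the usual size arithmetic: vertical layouts add heights and take the widest constituent, while horizontal layouts add widths and take the tallest. The following Python sketch is an assumption-level model of that arithmetic, not part of the Quartz tool chain:

```python
def stack_vertical(sizes):
    """Size of a vertical layout (e.g. map, col, rdr): heights add and
    the width is the widest constituent."""
    return (max((w for w, h in sizes), default=0),
            sum(h for w, h in sizes))

def stack_horizontal(sizes):
    """Size of a horizontal layout (e.g. rcomp, row): widths add and
    the height is the tallest constituent."""
    return (sum(w for w, h in sizes),
            max((h for w, h in sizes), default=0))

# Three 2x1 cells plus a 0x0 wiring block: a vertical layout gives a
# 2x3 column, a horizontal layout a 6x1 row; wiring contributes nothing.
assert stack_vertical([(2, 1)] * 3 + [(0, 0)]) == (2, 3)
assert stack_horizontal([(2, 1)] * 3 + [(0, 0)]) == (6, 1)
```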

The prelude library provides a fairly comprehensive set of blocks with different signal arrangements and is thus useful for experimenting with how automatic proof tools can be tuned to minimise the human intervention in proofs.

4.6.1 Proofs with Tacticals

Our initial approach to automating proof of the prelude library involves generating proof tacticals in the Quartz compiler's Isabelle module. Tacticals combine elementary proof steps and automated tactics with basic repetition or choice operations.

We design tacticals based on experience with hand-proofs of a variety of prelude blocks. These tacticals are based on invocations of Isabelle's simplifier with specific simplification rules, interspersed with uses of the primitive rule method to decompose goals into multiple simpler sub-goals using theorems such as Theorems 13 and 17. Auto-generating proof scripts for all prelude blocks, we find that many theories run correctly with the automatically generated tacticals without any human intervention; however, some require intervention.

The conjugate and conjugate2 blocks (defined in Figure 4.12) require additional intervention to prove the validity of their size functions due to the use of series composition.

block conjugate (block Q ‘a ∼ ‘a, block P ‘a ∼ ‘a) (‘a i) ∼ (‘a o) attributes {
  height = height (i ; converse P ; Q ; P ; o).
  width = width (i ; converse P ; Q ; P ; o).
} → converse P ; Q ; P.

block conjugate2 (block R (‘a,‘b) ∼ (‘a,‘b), block S (‘a,‘b) ∼ (‘a,‘b))
    (‘a i1, ‘b i2) ∼ (‘a o1, ‘b o2) attributes {
  height = height ((i1,i2) ; swap ; converse S ; swap ; R ; S ; (o1,o2)).
  width = width ((i1,i2) ; swap ; converse S ; swap ; R ; S ; (o1,o2)).
} → swap ; converse S ; swap ; R ; S.

Figure 4.12: Some of the prelude library blocks that require manual intervention to prove their layouts using the tactical-based methods.

We develop two new theorems about series compositions that are useful here for decomposing this problem:

Theorem 20  ⋀P Q x y. ⟦⋀x y. 0 ≤ Width P x y; ⋀x y. (0::int) ≤ Width Q x y⟧
  =⇒ 0 ≤ Width (P ;; Q) x y

Theorem 21  ⋀P Q x y. ⟦⋀x y. 0 ≤ Height P x y; ⋀x y. (0::int) ≤ Height Q x y⟧
  =⇒ 0 ≤ Height (P ;; Q) x y

Proof By simplification, expanding the definitions of series composition and "let". Theorem 20 also requires the use of the simple lemma 0 ≤ m ∧ 0 ≤ n ⇒ 0 ≤ m + n. Mechanised proofs are given in Appendix B.5 as part of the SeriesComposition theory.

These theorems essentially prove the validity of series composition size functions, assuming that the size functions of their constituent blocks are also valid. We prove similar theorems for parallel composition in Appendix B.6.
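The arithmetic content of Theorems 20 and 21 can be illustrated concretely. The sketch below assumes, as the proof's appeal to the lemma 0 ≤ m ∧ 0 ≤ n ⇒ 0 ≤ m + n suggests, that the width of a series composition is the sum of the constituent widths and its height the maximum of the constituent heights; treat both definitions as assumptions of this illustration, not as the Quartz semantics:

```python
def series_width(wp, wq):
    """Assumed width of a series composition P ;; Q: the widths add,
    so non-negative inputs give a non-negative result (the 0 <= m + n
    lemma used by Theorem 20)."""
    return wp + wq

def series_height(hp, hq):
    """Assumed height of P ;; Q: the taller constituent, trivially
    non-negative when both inputs are (Theorem 21)."""
    return max(hp, hq)

# Validity is preserved by composition: non-negative constituent sizes
# always yield a non-negative composite size.
for wp, hp, wq, hq in [(0, 0, 0, 0), (3, 1, 2, 5)]:
    assert series_width(wp, wq) >= 0
    assert series_height(hp, hq) >= 0
```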

Using these theorems we can prove the validity theorems for these two blocks. conjugate2 also requires additional intervention to split a single identifier representing a pair into two values, something that can be done automatically by Isabelle's auto method. The combinators tri, irt, below and grid also require some manual intervention to re-write the proof scripts. Once all proof scripts are correct, Isabelle can prove the entire library in 47 seconds of processing time.



Overall, our tacticals are good at proving intersection theorems and are mostly effective at proving validity and containment theorems.

4.6.2 Improved Proof Scripts

Following our experiments with tactical-based proofs of the prelude library, we attempt the same proofs using Isabelle's standard auto proof method. This method interleaves invocations of the simplifier and classical reasoner and can be supplied with sets of simplification rules and theorems to use. All our decompositional theorems are proved in the style of introduction rules and are supplied to auto as such.

This method proves effective for validity and containment theorems; however, its results on intersection theorems are far from impressive. Isabelle's automatic tools consistently select the wrong parts of the disjunctions to attempt to prove, leaving proof states that are not just unproven but actually unprovable.

We therefore design a new set of rules for generating proof scripts that combines the best of these methods. auto-based proofs are generated for validity and containment theorems, while custom tacticals are generated for intersection theorems. The intersection tacticals include an invocation of auto as a last resort when the other options in the tactical fail to prove the goal, thus handling the rare circumstances where the classical reasoner can prove a goal but our custom tactical cannot.

Returning to our map example, the complete height function validity theorem and proof is generated as:

theorem height_ge0_int: "⋀(n::int) (R::((’t395⇒ ’t396⇒ bool, ’t395⇒ ’t396⇒ int)block))
    (i::(’t395)vector) (o::(’t396)vector).
  ∀ (qs691::’t395) (qs692::’t396). 0 ≤ (Height (qs691 ;;; R ;;; qs692)) =⇒
  0 ≤ (height (n, R) i o)"
apply (auto intro: sum_ge0 maxf_ge0 sum_ge0_frange maxf_ge0_frange z_aleq_bc
            simp add: Let_def max_def)
done

The containment theorem is similarly proved using an appropriately parameterised auto. The compiler procedure that generates these scripts also supplies the height_ge0 and width_ge0 theorems for all blocks used in the description as introduction rules (none in this case, since map only instantiates the supplied R block). This allows validity proofs to build upon one another.

The intersection theorem for map (page 87) is proved by the generated tactical:

apply (simp, rule impI, simp)?
apply ((
    (rule allI)+,
    (case_tac "0 ≤ n"),
    rule impdisj_12of4,
    (rule loop_sum_overlap | rule loop_sum_overlap’),
    (simp add: overlap0’’)+) |
  ((rule allI)+,
    (case_tac "0 ≤ n"),
    rule impdisj_34of4,
    rule loop_sum_overlap2,
    (simp add: overlap0’’)+) |
  auto intro: sum_ge0 maxf_ge0 sum_nsub1_plusf maxf_encloses)
done

The loop_sum_overlap theorems are proved in the Structures theory. This theory contains theorems that match common layout structures, such as the layout of components in a loop. loop_sum_overlap is given as:

⋀(n::int) (j::int) (j’::int). ⟦m ≤ n; ⋀y. 0 ≤ f y⟧ =⇒
  ((m ≤ j) ∧ (j ≤ (n − 1)) ∧ (m ≤ j’) ∧ (j’ ≤ (n − 1)) ∧ (j’ ≠ j)) −→
  ((sum (m, j − 1, f) + f j) ≤ sum (m, j’ − 1, f) |
   (sum (m, j’ − 1, f) + f j’) ≤ sum (m, j − 1, f))

Its proof involves a number of steps and is given in Appendix B.8. The other loop_sum_overlap theorems are similar.
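The property captured by loop_sum_overlap can be checked concretely: with a non-negative height function f, the vertical slot occupied by loop iteration j never overlaps the slot of a different iteration j'. A Python illustration with hypothetical names:

```python
def partial_sum(m, k, f):
    """sum (m, k, f) = f(m) + f(m+1) + ... + f(k); empty when k < m."""
    return sum(f(i) for i in range(m, k + 1))

def loop_slots_disjoint(m, n, f):
    """The shape of loop_sum_overlap: iteration j occupies the vertical
    slot [sum(m, j-1, f), sum(m, j-1, f) + f(j)), and for j != j' one
    slot ends before the other begins, provided f is non-negative."""
    for j in range(m, n):
        for jp in range(m, n):
            if j != jp:
                sj = partial_sum(m, j - 1, f)
                sjp = partial_sum(m, jp - 1, f)
                if not (sj + f(j) <= sjp or sjp + f(jp) <= sj):
                    return False
    return True

# Heights that vary with the loop index still give non-overlapping slots:
assert loop_slots_disjoint(0, 4, lambda i: i + 1)
```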

Table 4.1 gives statistics on the proofs for some of the blocks in the prelude library and details for all of those where proofs required manual intervention. Overall, of the nearly 40 blocks in the prelude library, only 5 required manual intervention in their proofs. Using the auto method



Block               Type                Theorems   Intervention Required
id                  Wiring              2
dash                Wiring              2
dstl                Composite wiring    5          Expand definition of mfork
dstr                Composite wiring    5          Expand definition of mfork
pair                Composite wiring    3
rcomp (Rⁿ)          Combinator          4
tri                 Combinator          4
irt                 Combinator          4          Manual containment & intersection
beside (R↔S)        Combinator          5
row                 Combinator          4
conjugate (R\S)     Combinator          3          Handling of series composition²
conjugate2 (R\S)    Combinator          3          Handling of series composition

Table 4.1: Statistics on the layout proofs for some of the prelude library blocks

is slower than the tactical-only approach, requiring 1 minute 11 seconds to execute the full proofs. However, we are more interested in the amount of human intervention required to prove layouts than in CPU run-time, so long as the latter remains reasonably low.

4.6.3 Building a Library

Because the prelude library is used in virtually every Quartz circuit description, it is desirable not only to prove its layout correct but also to ensure that the theorems the proofs make available are phrased in the most appropriate format to ease later proofs.

This involves re-phrasing the height_ge0 and width_ge0 theorems for each block to remove unnecessary assumptions, since these would be unnecessary proof burdens on any later proof. At the same time we are also able to simplify the auto-generated proof scripts to remove redundant proof commands.

Once the final proof scripts for the prelude library are completed, they are compiled into an Isabelle heap image that can be loaded directly in the same way as the HOL base system or the QuartzLayout library. This means that blocks which use prelude theories do not need to run the proofs before they can be used. In the Quartz placed prelude library all blocks are given the "layout-proved" attribute, indicating to the layout verification modules of the compiler that proof scripts do not need to be generated for them.

²The application of the series composition decomposition theorems should be automated when supplied to auto; however, the proof tools do not always apply them correctly.



Figure 4.13: Index operators. (a) imap 4 R stacks the instances R(0) . . . R(3) vertically; (b) irow 4 R lays the instances R(0) . . . R(3) out horizontally; (c) igrid 3,4 R arranges instances R(i, j) in a two-dimensional grid.

The full definitions and proofs for some of the prelude library blocks are given in Appendix C.1. This appendix omits all wiring blocks, where proofs are usually trivial, and many blocks where the block structures are very similar and thus the proofs identical to others (such as col, which is very similar to row).

4.7 Proving Other Combinators

While the prelude library consists of some extremely useful constructs, most of the blocks in it are quite simple. In Chapter 6 we will investigate the effectiveness of our verification framework when applied to full circuit descriptions; however, we are also interested in the ease with which we can prove other useful libraries of combinators.

4.7.1 Index Operators

The index operators are versions of some of the standard Quartz prelude blocks which parameterise their blocks with an integer parameter. For example, the index-map combinator imap n R is similar to map n R except that it instantiates instances of R parameterised with 0, 1, . . ., n − 1, as shown in Figure 4.13(a). Operations such as irow n R (Figure 4.13(b)) and igrid n R (Figure 4.13(c)) correspond to row n R and grid n R respectively.

The index operators are particularly important examples for our system because the extra parameterisation of the R block could lead to the size of each instance of R being different.
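Concretely, with imap laid out as a vertical stack (Figure 4.13(a)), the size of imap n R must account for a possibly different size at each index. A Python sketch of the implied size arithmetic, under the assumption that heights add and the width is the widest instance:

```python
def imap_size(n, size_of_R):
    """Size of imap n R under a vertical-stack layout: the n instances
    R(0) ... R(n-1) are stacked, so heights add and the width is the
    widest instance. Unlike map, each instance may differ in size
    because R is parameterised with the index."""
    sizes = [size_of_R(i) for i in range(n)]
    return (max((w for w, h in sizes), default=0),
            sum(h for w, h in sizes))

# A hypothetical R whose height grows with its index parameter:
# heights 1 + 2 + 3 + 4 = 10, width 2 throughout.
assert imap_size(4, lambda i: (2, i + 1)) == (2, 10)
```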



block irow (int n, block R int (‘a, ‘b) ∼ (‘c, ‘a)) (‘a l, ‘b t[n]) ∼ (‘c b[n], ‘a r)
attributes {
  height = if(n==0, 0, height((l, t) ; snd (converse (apr (n−1))) ;
           beside (irow (n−1, R), R n) ; fst (apr (n−1)) ; (b, r))).
  width = if(n==0, 0, width((l, t) ; snd (converse (apr (n−1))) ;
          beside (irow (n−1, R), R n) ; fst (apr (n−1)) ; (b, r))).
} {
  // Wires: l = left, t = top, b = bottom, r = right
  assert (n >= 0) "n >= 0 is required".
  if (n == 0) { l = r. } // b and t are empty vectors anyway
  else {
    (l, t) ;
    snd (converse (apr (n−1))) ;
    beside (irow (n−1, R), R n) ;
    fst (apr (n−1)) ;
    (b, r) at (0,0).
  }.
}

Figure 4.14: Recursive definition of irow n R

We investigate proofs of two different versions of the index operators: defined iteratively and defined recursively. Recursively defined versions of some index operators are of particular interest because these versions can be used to represent less regular circuit descriptions.

Iterative versions of the index operators are defined in a very similar way to the standard prelude operators, and the proof infrastructure developed and tested on the prelude works well for them: all iterative definitions are proved fully automatically.

Recursive definitions are slightly more complex, and we will examine irow n R as an example. Figure 4.14 gives the recursive definition of this combinator.

The first step in verifying this layout is to generate the theory definitions using the compiler as normal; however, we must then re-jig the definitions to use Isabelle's primrec rather than recdef constructs (see Section 4.5.2). This process could be performed automatically. The recursive width function definition for the irow block after this process is given in Figure 4.15. Because the width function recurses over the natural number n, when the parameter is supplied to the R block the conversion function "int" must be used to convert the natural number to an integer.
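The recursion pattern of the width function can be illustrated numerically. The sketch below assumes that beside lays its arguments out horizontally, so widths add at each unfolding; it is an illustration of the recursion shape only, not the Isabelle definition:

```python
def irow_width(n, width_R):
    """Shape of the irow width recursion over a natural n: irow 0 has
    width 0, and irow n places irow (n-1) beside the instance R n, so
    (assuming beside adds widths) the widths accumulate. Each call
    strictly decreases n, mirroring the primrec scheme."""
    if n == 0:
        return 0
    return irow_width(n - 1, width_R) + width_R(n)

# A hypothetical R whose width equals its index parameter:
assert irow_width(3, lambda i: i) == 6  # 1 + 2 + 3
```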

Similarly, the definition of irow uses the int2nat function we define in the IntAlgebra theory to convert the integer parameter to a natural number. int2nat is similar to the inbuilt nat type converter except that it is only defined for values where n ≥ 0.

primrec
  "width 0 R l t b r = 0"
  "width (Suc n) R l t b r =
     Width ((l, t) ;;;
       snd $ (converse $ (apr $ (int n))) ;;
       beside $ ((| Def = λb c. arbitrary, Height = λb c. arbitrary,
                    Width = λ(l,t) (b, r). width n R l t b r |), R $ (int n + 1)) ;;
       fst $ (apr $ (int n))
       ;;; (b, r))"

Figure 4.15: Isabelle definition of the irow width function

Proofs for recursive functions tend to follow a simple structure: induction and then some application of auto, possibly combined with other methods. In order to prove the validity theorems for irow it is necessary to massage the propositions [55], in order to move variables that must be encompassed by the induction onto the right-hand side of the meta-implication. For example, the width validity theorem for irow is phrased as:
For example, the width validity theorem for irow is phrased as:<br />

theorem width ge0 int [rule format]: "<br />

Î(n::nat)<br />

(R::(( int⇒ (’t107∗’t108)⇒ (’t109∗’t107)⇒ bool,int⇒ (’t107∗’t108)⇒ (’t109∗’t107)⇒ int)block)).<br />

∀ (qs137::int) (qs138 ::(’ t107∗’t108)) (qs139 ::(’ t109∗’t107)).<br />

0 ≤ (Height (qs138 ;;; R $ qs137 ;;; qs139)) ;<br />

∀ (qs137::int) (qs138 ::(’ t107∗’t108)) (qs139 ::(’ t109∗’t107)).<br />

0 ≤ (Width (qs138 ;;; R $ qs137 ;;; qs139)) <br />

=⇒ ∀ l t b r. 0 ≤ (width n R l t b r)"<br />

This differs from the standard representation in that the signals l, t, b and r have been moved from being meta-quantified to being object-level quantified. The "rule_format" tag instructs Isabelle to re-phrase the theorem using meta-quantification once it is proved.

Proof of this theorem involves applying induction to split it into two cases to prove:

goal (theorem (width_ge0_int), 2 subgoals):
1. ⋀n R. ⟦∀ qs137 qs138 qs139. 0 ≤ Height (qs138 ;;; R $ qs137 ;;; qs139);
          ∀ qs137 qs138 qs139. 0 ≤ Width (qs138 ;;; R $ qs137 ;;; qs139)⟧
   =⇒ ∀ l t b r. 0 ≤ irow.width 0 R l t b r
2. ⋀n R na. ⟦∀ qs137 qs138 qs139. 0 ≤ Height (qs138 ;;; R $ qs137 ;;; qs139);
             ∀ qs137 qs138 qs139. 0 ≤ Width (qs138 ;;; R $ qs137 ;;; qs139);
             ∀ l t b r. 0 ≤ irow.width na R l t b r⟧
   =⇒ ∀ l t b r. 0 ≤ irow.width (Suc na) R l t b r

The proof should now be completed by auto intro: width_ser_ge0; however, the automatic proof tools do not work in this case. We can, however, prove the base case and expand the induction case using only the simplifier:

> apply (simp, simp)

goal (theorem (width_ge0_int), 1 subgoal):
1. ⋀R na. ⟦∀ qs137 a b aa ba. 0 ≤ Height R qs137 (a, b) (aa, ba);
           ∀ qs137 a b aa ba. 0 ≤ Width R qs137 (a, b) (aa, ba);
           ∀ l t b r. 0 ≤ irow.width na R l t b r⟧
   =⇒ ∀ l t b r.
        0 ≤ Width
             (snd.snd $ (converse.converse $ (apr $ int na)) ;;
              beside $
               ((| Def = λb c. arbitrary, Height = λb c. arbitrary,
                   Width = λ(l, t). split (irow.width na R l t) |),
                R $ int na + 1) ;;
              fst.fst $ (apr $ int na))
             (l, t) (b, r)

We can apply the width_ser_ge0 rule manually, after removing the universal quantifiers, which splits the goal into three sub-goals:

> apply (rule allI)+
> apply (rule width_ser_ge0)+

goal (theorem (width_ge0_int), 3 subgoals):
1. ⋀R na l t b r x y xa ya.
   ⟦∀ qs137 a b aa ba. 0 ≤ Height R qs137 (a, b) (aa, ba);
    ∀ qs137 a b aa ba. 0 ≤ Width R qs137 (a, b) (aa, ba);
    ∀ l t b r. 0 ≤ irow.width na R l t b r⟧
   =⇒ 0 ≤ Width (snd.snd $ (converse.converse $ (apr $ int na))) xa ya
2. ⋀R na l t b r x y xa ya.
   ⟦∀ qs137 a b aa ba. 0 ≤ Height R qs137 (a, b) (aa, ba);
    ∀ qs137 a b aa ba. 0 ≤ Width R qs137 (a, b) (aa, ba);
    ∀ l t b r. 0 ≤ irow.width na R l t b r⟧
   =⇒ 0 ≤ Width
           (beside $
             ((| Def = λb c. arbitrary, Height = λb c. arbitrary,
                 Width = λ(l, t). split (irow.width na R l t) |),
              R $ int na + 1))
           xa ya
3. ⋀R na l t b r x y.
   ⟦∀ qs137 a b aa ba. 0 ≤ Height R qs137 (a, b) (aa, ba);
    ∀ qs137 a b aa ba. 0 ≤ Width R qs137 (a, b) (aa, ba);
    ∀ l t b r. 0 ≤ irow.width na R l t b r⟧
   =⇒ 0 ≤ Width (fst.fst $ (apr $ int na)) x y

The proof can then be completed by auto. Similar techniques can be adopted for other proofs, although not all will require induction (the containment theorem for irow does not). Note that we have not required any properties of maxf or sum: these functions are used when describing the layout of iteratively defined blocks and are not needed for recursively defined combinators.

We also attempted pro<strong>of</strong>s for manually specified size functions which expanded the definitions<br />

<strong>of</strong> apr and apl etc, referring to explicit vector indexes. These are simpler expressions than the<br />

full expressions produced by the inference algorithm, however a slightly unexpected result is<br />

that this substantially complicates the pro<strong>of</strong>s. Pro<strong>of</strong> now requires expansion <strong>of</strong> the definitions<br />

<strong>of</strong> intermediate signals <strong>with</strong>in the series compositions to check that there is a correspondence<br />

between the values produced by the usages <strong>of</strong> the append blocks and those that are manually<br />

specified. This should be a simple process but it is not because <strong>of</strong> the way blocks are defined<br />

as logical predicates rather than functions.<br />

Figure 4.16: An irregular grid such as this one, with blocks A–E, is impossible to describe using purely beside and below relative placement.

The usage of block predicates to define values bound by the definite description operator requires the elimination of definite descriptions to extract the real value. This is a significant proof obligation and can be accomplished by using HOL theorems such as:

theorem the_equality: "⟦P a; ⋀x. P x =⇒ x = a⟧ =⇒ (THE x. P x) = a"
theorem theI2: "⟦P a; ⋀x. P x =⇒ x = a; ⋀x. P x =⇒ Q x⟧ =⇒ Q (THE x. P x)"

Theorem theI2 is the most useful, since it allows a definite description to be extracted from within a let-definition. However, selecting the correct value is then a proof obligation in three parts: proving that a value bound by the predicate exists, proving that it is unique, and proving that it satisfies the original proposition. This is a complex and fiddly process which in general is not worth bothering with. It is, however, an illuminating and unexpected observation that directional abstraction, which generally aids reasoning about functional properties, complicates reasoning about layout when internal signals are involved.
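The three obligations above can be mimicked concretely. The sketch below is a hypothetical Python analogue of Isabelle's `THE x. P x` operator over a finite domain; it is not part of the Quartz tool chain, and it fails in exactly the situations where the existence or uniqueness obligation would fail in a proof.

```python
def the_(pred, domain):
    """Analogue of the definite description THE x. P x: return the
    unique element of `domain` satisfying `pred`."""
    witnesses = [x for x in domain if pred(x)]
    if not witnesses:
        raise ValueError("existence obligation fails: no witness")
    if len(witnesses) > 1:
        raise ValueError("uniqueness obligation fails: several witnesses")
    return witnesses[0]

# A block predicate relating inputs to a defined internal signal:
# "the signal c such that c = a AND b", for fixed a and b.
a, b = True, False
c = the_(lambda x: x == (a and b), [False, True])
assert c is False  # satisfaction: the extracted value obeys the predicate
```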

Full definitions and theories for some of the recursive index operators can be found in Appendix C.2.

4.7.2 An Irregular Grid

In Chapter 1 we introduced one example of a pathological layout arrangement that was impossible to describe using purely beside and below relative placement constructs. This irregular grid arrangement is shown in Figure 4.16.

Quartz can describe a combinator with this layout, as shown in Figure 4.17. Layout proofs were attempted for two versions of this combinator: one with an inferred size function and one where the size function was specified manually. There was a marked difference in the


block irregular_grid (block A ‘a1 ∼ ‘a2, block B ‘b1 ∼ ‘b2,
                      block C ‘c1 ∼ ‘c2, block D ‘d1 ∼ ‘d2, block E ‘e1 ∼ ‘e2)
      (‘a1 a1, ‘b1 b1, ‘c1 c1, ‘d1 d1, ‘e1 e1) ∼
      (‘a2 a2, ‘b2 b2, ‘c2 c2, ‘d2 d2, ‘e2 e2)
attributes {
  height = max (height(a1;A;a2) + height(b1;B;b2),
                height(a1;A;a2) + height(c1;C;c2) + height(d1;D;d2),
                height(e1;E;e2) + height(d1;D;d2)).
  width = max (width(a1;A;a2) + width(e1;E;e2),
               width(b1;B;b2) + width(c1;C;c2) + width(e1;E;e2),
               width(b1;B;b2) + width(d1;D;d2)).
} {
  a1 ; A ; a2 at (0, 0).
  b1 ; B ; b2 at (0, height(a1;A;a2)).
  c1 ; C ; c2 at (width(b1;B;b2), height(a1;A;a2)).
  d1 ; D ; d2 at (width(b1;B;b2),
                  max (height(c1;C;c2) + height(a1;A;a2), height(e1;E;e2))).
  e1 ; E ; e2 at (max (width(a1;A;a2), width(c1;C;c2) + width(b1;B;b2)), 0).
}

Figure 4.17: Quartz description for the irregular grid arrangement shown in Figure 4.16

script execution times for the two combinators, with the inferred-size combinator requiring 2 min 37 s to execute while the manually specified size combinator required only 31 s. In both cases the proof scripts required some minor amendments from the auto-generated defaults, although these were relatively simple.

The most important observation from this combinator stems from the proof of its four intersection theorems. During the proofs of these theorems a bug in the layout description was discovered, illustrating that the proof process can have a role in the early debugging of this kind of layout description. While one downside of theorem proving as opposed to model checking is that it does not provide counter-examples, failure to prove an intersection proof obligation tends to leave a proof state that, with only a little massaging, clearly reveals the source of the error. Appendix C.3 gives the full proof script for this combinator.
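As a numeric illustration of what the validity and containment obligations assert for this combinator, the sketch below (not part of the Quartz tool chain) evaluates the placements and size expressions of Figure 4.17 for one hypothetical set of block sizes and checks that every block really lies inside the declared bounding box.

```python
# Sample (width, height) pairs for blocks A-E; the figures are arbitrary.
sizes = {"A": (3, 2), "B": (2, 4), "C": (1, 1), "D": (4, 3), "E": (2, 5)}
w = {k: wh[0] for k, wh in sizes.items()}
h = {k: wh[1] for k, wh in sizes.items()}

# Placements as given in the `at` clauses of Figure 4.17.
pos = {
    "A": (0, 0),
    "B": (0, h["A"]),
    "C": (w["B"], h["A"]),
    "D": (w["B"], max(h["C"] + h["A"], h["E"])),
    "E": (max(w["A"], w["C"] + w["B"]), 0),
}

# The combinator's declared height and width expressions.
height = max(h["A"] + h["B"], h["A"] + h["C"] + h["D"], h["E"] + h["D"])
width = max(w["A"] + w["E"], w["B"] + w["C"] + w["E"], w["B"] + w["D"])

# Containment: each block's far corner lies within the declared size.
for k, (x, y) in pos.items():
    assert x + w[k] <= width and y + h[k] <= height, k
```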

4.7.3 H-Tree

An H-tree is a type of layout shown in Figure 4.18. It is of particular interest in circuit design since it can be used to lay out a tree-shaped circuit in a square block with balanced wire

Figure 4.18: An H-tree arrangement of R blocks for n = 5

block htree (int n, block R (‘a, ‘a) ∼ ‘a) (‘a i[m]) ∼ (‘a o) {
  const m = 2 ∗∗ n.
  ‘a st1_in[m/2], st2_in[m/2], st1_out, st2_out.
  if n == 0 { o = i[0]. } else {
    i ; half (m/2) ; (st1_in, st2_in) at (0,0).
    if (n mod 2 == 0) {
      // Vertical sub-tree arrangement
      st1_in ; htree (n−1, R) ; st1_out at (0,0).
      (st1_out, st2_out) ; R ; o
        at (0, height(st1_in ; htree (n−1, R) ; st1_out)).
      st2_in ; htree (n−1, R) ; st2_out
        at (0, height(st1_in ; htree (n−1, R) ; st1_out) +
               height((st1_out, st2_out) ; R ; o)).
    } else {
      // Horizontal sub-tree arrangement
      st1_in ; htree (n−1, R) ; st1_out at (0,0).
      (st1_out, st2_out) ; R ; o
        at (width(st1_in ; htree (n−1, R) ; st1_out), 0).
      st2_in ; htree (n−1, R) ; st2_out
        at (width(st1_in ; htree (n−1, R) ; st1_out) +
            width((st1_out, st2_out) ; R ; o), 0).
    } .
  } .
}

Figure 4.19: Quartz description for an H-tree combinator


lengths (save at the interface points). This kind of structure can be realised by a recursive combinator which alternates between a horizontal and a vertical layout for each sub-tree. A Quartz description of this combinator can be seen in Figure 4.19. Functionally this combinator is identical to a standard binary-tree arrangement, although the layout description is quite complicated.

The verification of this kind of combinator is especially important since, as we saw with the irregular grid example, a complex layout is relatively more likely to contain errors. The semantic definition and the height and width functions for this block are recursive and defined using Isabelle primrec constructs. The use of multiple internal signals requires special handling. We have used tuples of internal signals to bind them in a single predicate; however, Isabelle's handling of tuples does not allow these to be split easily, so it is better to re-define the internal signals individually, binding each with an identical copy of the predicate in which the other signals are existentially quantified. This leads to definitions that are long and contain a great deal of redundancy, but it makes proofs substantially easier, and since the proof tool is designed to handle large definitions and proof scripts easily this is a trade-off worth making.

Validity theorems are proved by induction on n and then use of the auto method. Containment theorems require a combination of auto with some primitive deduction. Intersection is proved automatically after expanding the definition of the half block.
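The alternating recursion in Figure 4.19 also induces a simple recurrence on block sizes. The sketch below (a hypothetical model, assuming the half wiring block occupies no area) computes the bounding box of htree and checks the property that makes the H-tree attractive: after every horizontal stage the layout is square.

```python
def htree_size(n, r_w, r_h):
    """Bounding box (width, height) of the htree combinator of
    Figure 4.19, assuming the `half` wiring block has zero size."""
    if n == 0:
        return (0, 0)  # base case: o = i[0] is pure wiring
    sw, sh = htree_size(n - 1, r_w, r_h)
    if n % 2 == 0:
        # vertical arrangement: sub-tree, R, sub-tree stacked in y
        return (max(sw, r_w), 2 * sh + r_h)
    else:
        # horizontal arrangement: sub-tree, R, sub-tree side by side in x
        return (2 * sw + r_w, max(sh, r_h))

# With a 1x1 R block, every odd (horizontal) stage yields a square layout.
for n in (1, 3, 5, 7):
    w, h = htree_size(n, 1, 1)
    assert w == h
```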

4.7.4 Surround

In Section 3.6 we introduced the surround combinator, which describes a square block surrounded by interface elements. Figure 4.20 illustrates the generic instantiation of this combinator with an A block surrounded by B, C, D and E interface elements. It is important to note that this combinator is general and could describe situations where the interface elements are the same, or absent (replaced by the identity block).

The Quartz description for this block given in Figure 3.14 is similarly general. It must describe a correct layout for situations where the sizes of the blocks within the combinator vary arbitrarily. In fact, during the verification of this combinator we discovered an error in the block placement which would probably have otherwise remained undiscovered.
an error in the block placement which would probably have otherwise remained undiscovered.


Figure 4.20: The surround combinator: an A block surrounded by interface elements B, C, D and E

Appendix C.4 gives the Quartz description and correctness proof for this combinator's layout. Validity of the height and width expressions is proved by auto, configured to expand let definitions and using Theorem 11 (z_aleq_bc). Containment proofs can also mostly be completed purely by auto; however, one requires the use of a variant of Theorem 9.

The true value of the verification methodology comes into play with the intersection proofs. Once again, these are proved entirely automatically using purely auto; indeed, the error in the layout was discovered because an intersection theorem could not be proved. The error was that C was naively placed with its y co-ordinate defined by height(A) + height(D); however, this did not take into account the fact that it could overlap block E under some circumstances. A simple correction, defining the y co-ordinate as the maximum of the height of E or of A and D together, was sufficient to produce a valid layout.
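The failure mode can be seen in one dimension. In the sketch below (sample heights and the interval convention are hypothetical, with E assumed to occupy the y-interval from 0 to its height), the naive y co-ordinate for C intersects E whenever E is taller than A and D together, while the corrected maximum does not.

```python
# Hypothetical heights: E is taller than A and D combined.
hA, hD, hE, hC = 2, 2, 7, 3

def overlaps(lo1, hi1, lo2, hi2):
    """Two half-open intervals intersect."""
    return lo1 < hi2 and lo2 < hi1

# Naive placement: y co-ordinate of C is height(A) + height(D).
y_naive = hA + hD
assert overlaps(y_naive, y_naive + hC, 0, hE)       # the bug: C meets E

# Corrected placement: the maximum of height(E) and height(A) + height(D).
y_fixed = max(hE, hA + hD)
assert not overlaps(y_fixed, y_fixed + hC, 0, hE)   # no intersection
```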

This is still a relatively simple layout for this combinator. A more complex layout description could use conditionals to compare the heights and widths of the various blocks and adjust their relative placement accordingly (for example, the B and E blocks could be aligned with the bottom of the combinator as a whole, rather than with the bottom of the A block, if they have a greater height than A).


4.8 Discussion

Our approach to layout verification is effective, both at verifying the correctness of combinators and at finding the source of errors. However, it does have some drawbacks:

1. It generates many proof goals for blocks.

2. Many proofs that should be completed automatically require some manual intervention to tweak the role of the automated tactics.

3. We have not formally established the link between the proof obligations we generate for each block and the original definitions of correctness.

The first issue stems from the increased role of the size inference system over what was originally expected. When this work was begun it was presumed that size inference would be inefficient and that it would almost always be preferable to manually specify block sizes. However, we have found that the opposite is often the case. Except for primitive blocks, where sizes must always be specified manually, size inference is usually easier than writing complex size expressions by hand. In addition, while hand-coded size expressions are more efficient than inferred ones, the differences between the two tend to follow common patterns (some of which we have proved as theorems during the course of this work).

It seems likely that in most cases the manually specified size functions could be produced automatically in the compiler by applying correctness-preserving transformations to the inferred size expressions.
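One such pattern can be sketched concretely. Assuming a row of n identical blocks of width w (the functions below are hypothetical illustrations, not compiler code), the inference algorithm builds the width as the rightmost extent over all placed blocks, while a hand-written size function would state n * w directly; a transformation replacing one by the other is value-preserving.

```python
from functools import reduce

def inferred_row_width(widths):
    """Width as inference would build it: the rightmost extent
    reached by any block placed in sequence along x."""
    extents, x = [], 0
    for w in widths:
        extents.append(x + w)
        x += w
    return reduce(max, extents, 0)

def manual_row_width(n, w):
    """The simplified, hand-written form of the same size function."""
    return n * w

# The two forms agree for every row length.
for n in range(6):
    assert inferred_row_width([3] * n) == manual_row_width(n, 3)
```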

Using the size inference system, it should no longer be necessary to prove validity and containment for each block's size expressions if these properties could be proved for the inference algorithm itself. We can satisfy ourselves by inspection that sizes inferred by the inference system have correct containment properties, since the inference algorithm is designed to select the topmost, rightmost possible co-ordinate of a layout.³ We can also prove a theorem about the size inference function to show validity of its results:

Theorem 22 For all blocks A, where R is the set of higher-order parameters of A and, for all r ∈ R, height_r and width_r are valid, the height and width expressions inferred for block A will be valid.

Proof By induction on the structure of statements and using Theorems 9, 11, 13 and 17.

³ "Correct by definition" is not totally satisfactory; however, it is sufficient for our purposes.

Similar proofs must be completed for the size function for block compositions, S_I, but these are similar and easy.

However, if these theorems are simply omitted from the Isabelle theories, it becomes impossible to refer to them in proofs for blocks which have manually specified size functions. Manually specified size functions are far from useless: they can be simpler than inferred ones, and they allow blocks to be allocated sizes that are larger than required. This latter point is particularly important for run-time reconfigurable designs, where it may be desirable to allocate the same (maximum) amount of space to a design regardless of how big it actually is, and to allow all the different generated designs to grow or shrink within that boundary.

A way around this is to describe the size inference function itself in Isabelle and prove these properties about it using a deep embedding of Quartz. A meta-theorem could then be proved in Isabelle/HOL to provide validity proofs for all size expressions produced by the inference algorithm.

Issue 2 appears mainly to be one of implementation. Isabelle's automated proof tools do produce very good results when their rule sets are correctly configured, and further investigation is necessary to reveal whether these issues are caused by niceties in the phrasing of theorems or by subtle bugs in the way they are applied by the classical reasoner. It should be stressed that where manual intervention in proofs has been necessary it has not required any theorems or lemmas that are not already present in the QuartzLayout library, and thus for an experienced user the task is generally an easy one.

The third issue is perhaps one of choice. Formal verification is time-consuming and difficult. As such, it makes sense to apply it only to the parts of a design or design process where it is most likely to yield results. No hardware development system is available that is formally verified from "top to bottom", and in our system we have chosen to implement formal verification for a particular subset, based on particular definitions that we can reason are correct in a semi-formal way.
in a semi-formal way.


Once again, an alternative to this would perhaps be to attempt a deep embedding of Quartz in Isabelle and to prove the correctness of layouts in terms of the compilation function. A deep embedding, particularly one which encompassed a full compilation procedure, would be an even more substantial undertaking than this shallow embedding. Layout verification based on a deep embedding of compilation (for intersection proofs) and the size inference algorithm (for validity and containment proofs) would, for each individual design, probably require the proof of significantly fewer theorems, but each theorem is likely to be more complex. The benefit would be an increased level of formal assurance in the result, but quite how much of a benefit this is remains an open question.

Another aspect of our system that could be changed is the use of containment proofs. We have based layout verification around the assumption that each block can be contained within a rectangle, and this rectangle can then be used as an abstraction for the block layout in later proofs. However, for irregularly shaped blocks this may not be optimal: consider, for example, two triangular circuits which could be laid out inter-locking to form a rectangle, but only if their size functions are not rectangular. Introducing block boundaries described by arbitrary functions would massively complicate reasoning, and the approach we would advocate with our system is to describe shapes such as these as new combinators as and when they are required; these layouts can then be proved as normal. An alternative is to relax the requirement on containment proofs and verify the containment of a set of blocks. This means verifying the layout at the level of a set of blocks rather than of each individual block in the set.
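The rectangle abstraction underlying this whole chapter can be summarised as three executable checks. The sketch below is an illustrative model (rectangles as (x, y, width, height) tuples, a simplification of the (l, t), (b, r) co-ordinate pairs used in the Isabelle embedding), not the embedding itself.

```python
def valid(rect):
    # Validity: a block's declared sizes are non-negative.
    _, _, w, h = rect
    return w >= 0 and h >= 0

def contained(inner, outer):
    # Containment: a sub-block lies within its parent's bounding rectangle.
    x, y, w, h = inner
    X, Y, W, H = outer
    return X <= x and Y <= y and x + w <= X + W and y + h <= Y + H

def disjoint(r1, r2):
    # Intersection: two sibling blocks do not overlap.
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    return x1 + w1 <= x2 or x2 + w2 <= x1 or y1 + h1 <= y2 or y2 + h2 <= y1

a, b = (0, 0, 2, 3), (2, 0, 1, 3)
parent = (0, 0, 3, 3)
assert valid(a) and valid(b)
assert contained(a, parent) and contained(b, parent)
assert disjoint(a, b)
```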

4.9 Summary

In this chapter we have described a system for verifying Quartz circuit layouts in a shallow embedding of Quartz in Higher-Order Logic using the generic theorem prover Isabelle. We give Quartz a formal semantics in HOL.

We have described new features of the Quartz compiler which support automatic conversion of Quartz descriptions into Isabelle/HOL definitions and the automatic generation of proof obligations to verify their layouts. Our modified compiler is also reasonably effective at generating proof scripts which will automatically prove these theorems using Isabelle's simplifier and classical reasoner.

We have demonstrated our system on a range of combinators, including the full prelude and index operator libraries. Large theorems have been proven relatively easily, and we have also demonstrated that our system can reveal flaws in circuit layouts in a manner useful for aiding development, rather than purely giving a false result.

Chapter 6 contains some examples of the usage of the layout generation and verification system on a range of complete designs.


Chapter 5

Specialisation

In this chapter we illustrate how Quartz can be used to create specialised, placed designs when some input values are known at compile-time. Section 5.1 introduces the benefits of design specialisation and some of its applications. In Section 5.2 we illustrate how we can achieve distributed specialisation transparently using Quartz "clever components" as primitive elements. In Section 5.3 we discuss the limitations of our current infrastructure for distributed specialisation and outline the requirements for an optimal system, while Section 5.4 discusses the role of specialisation code at a higher level than primitive components. Section 5.5 illustrates high-level specialisation of a multiplier circuit and evaluates the performance impact of compacting designs when logic is optimised away. Section 5.6 summarises this chapter.

5.1 Introduction

Design specialisation is a useful tool that can be used to optimise the performance of digital circuits when the values of some inputs are known. It is so commonly used in circuit design as to be almost not worth mentioning, with simple optimisations such as replacing a full multiplier by a constant-coefficient multiplier when one input is fixed or, at a meta-level, the choice of a multiplier implementation rather than a full ALU for a circuit that will only perform multiplication.


Typically most design specialisation is static specialisation: a design-time optimisation that is carried out either by the designer or automatically by synthesis tools. The process can encompass a range of low-level logic optimisations such as constant propagation and dead logic removal (eliminating logic that computes a result which is never used). However, the increasing role of FPGAs and their potential for run-time reconfiguration raises the possibility of dynamic specialisation: changing the circuit at run-time.

At present the most common use of run-time reconfiguration is to swap different pre-synthesised library circuits on and off a chip. However, dynamic specialisation can be used to reconfigure an FPGA to carry out the same operation but to perform it in some way that is better, for example using fewer logic resources or being able to run at a higher clock frequency.

Dynamic specialisation becomes a useful option for circuits which do not have static inputs but do have one or more inputs that change at a much lower rate than the others. A good example is a cryptographic processor which may have two inputs: an encryption key and plaintext data to encrypt. If the key changes much less frequently than the plaintext then the encryption circuit can usefully be specialised for that key value, producing a design that is more efficient.

The usefulness of dynamic specialisation depends on the trade-off between the expected benefit to be gained from specialisation and the time taken to generate a specialised design and reconfigure the FPGA. This trade-off is not necessarily purely one of time, since the desire could be to free logic area on the FPGA for other uses rather than purely to make the design run faster.
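The time component of this trade-off reduces to a break-even inequality: specialisation pays off when the per-use saving, accumulated over the expected number of uses before the slowly changing input changes again, outweighs the cost of generating and loading the specialised design. The sketch below uses entirely hypothetical figures.

```python
def worth_specialising(t_generate, t_reconfigure,
                       t_general, t_special, uses):
    """Break-even test: does the accumulated per-use saving exceed
    the one-off cost of specialisation and reconfiguration?"""
    saving = uses * (t_general - t_special)
    cost = t_generate + t_reconfigure
    return saving > cost

# A key that stays fixed for two million blocks of plaintext: worth it.
assert worth_specialising(2.0, 0.1, 5e-6, 3e-6, 2_000_000)
# A key that changes every thousand blocks: not worth it.
assert not worth_specialising(2.0, 0.1, 5e-6, 3e-6, 1_000)
```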

Dynamic specialisation poses a particular difficulty from the point of view of design verification. Standard verification methodologies based on simulation (or even model checking of the finished design) are not going to be of any use for verifying the correctness of designs which are produced in the order of seconds rather than days, weeks or months, and very probably without any human intervention. This suggests that it is appropriate to employ formal verification and theorem proving for dynamic specialisation, to verify the equivalence of a specialised version to the general-purpose design.

Some success has been reported with this approach in verifying a procedure to partially evaluate a multiplier circuit using higher-order logic [80]. This partial evaluation procedure
evaluate a multiplier circuit using higher-order logic [80]. This partial evaluation procedure


operated at a low level on a placed and routed circuit and replaced unnecessary functional components with wires. However, this process leaves the bounding box of the circuit unchanged and is thus not effective at freeing logic area on the device. In addition, this bounding box can contain long wire lengths which will have a performance impact on the specialised design, reducing its maximum clock frequency.

An alternative approach to specialising the low-level hardware is to generate a specialised design at a higher level. This, however, requires that the full process of synthesising, mapping, placing and routing a design is completed for the specialised circuit, something that will probably take far too long. This time can be reduced by taking the middle road and generating a mapped, placed circuit which only then needs to be routed. Design methodologies which allow fast generation of FPGA bitstreams from this kind of description have been described [78].

In this chapter we demonstrate how our Quartz placement infrastructure can be used to specialise designs and illustrate the principle of distributed specialisation.

5.2 Distributed Specialisation

We use the term "distributed specialisation" to describe Quartz blocks which appear to be hardware elements but which actually contain code controlling their elaboration so as to produce simpler hardware where possible. These self-specialising blocks can be seamlessly integrated into a design to achieve HDL-level support for specialisation.

Distributed specialisation is characterised by the lack of any centralised control, making specialisation available transparently to the designer. We will also demonstrate the slightly counter-intuitive result that distributing the specialisation code into multiple locations actually makes design verification easier compared to when specialisation is explicitly controlled by a "specialise" input, as has been demonstrated with the Pebble layout system [49].

Our self-specialising blocks are "clever components" [75], although they are quite different to those previously demonstrated. Rather than pairing hardware wires with extra information


about them, we instead totally replace them with Quartz variables.

A B Q
0 0 0
0 1 0
1 0 0
1 1 1
(a) 2-input

B Q
0 0
1 1
(b) A = true

B Q
0 0
1 0
(c) A = false

Figure 5.1: AND gate truth tables

5.2.1 Specialising Primitives

We will first introduce this simple but powerful idea with a basic example: a 2-input logical and gate. The truth table for a 2-input and gate is shown in Figure 5.1(a): the output signal Q is asserted only if both the inputs A and B are true. If one of the input signals is known then this truth table can be simplified, as shown in Figures 5.1(b) and 5.1(c). If A is false then Q always equals false, regardless of the value of B, while if A is true then Q takes the value of B.

This allows us to describe the specialisation of an and gate when the A input is fixed: if it is fixed with value true then the gate should specialise to a wire linking Q and B, while if it is fixed with value false then Q should simply be statically assigned the value false.

In distributed specialisation we enclose this behaviour within a composite and2 block which will transparently carry out this operation when connected to a static value. We can use the Quartz overloading mechanism to select between the different possible uses of the and gate primitive (two wire values, one wire and one known value, or two known values) using the type system, assuming that static values are represented as Quartz booleans. Figure 5.2 illustrates the overloaded Quartz blocks that describe this operation.

If the two input values are unknown (i.e. they are real dynamic values carried on hardware wires) then the hardware primitive and2 is selected, which elaborates to a primitive gate with size 1 × 1. If both inputs are known then no hardware is generated at all and instead a boolean output variable is generated with the value of the and operation. If one input is known then either a wire or a static assignment of c to ground is generated.
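The overload selection can be modelled in a few lines. The sketch below is a Python analogue, not Quartz: static values are plain booleans, dynamic signals are hypothetical Wire objects, and and2 returns a constant, a wire, or a gate depending on which inputs are statically known.

```python
class Wire:
    """Illustrative stand-in for a dynamic hardware signal."""
    def __init__(self, name): self.name = name

class Gate:
    """Illustrative stand-in for an elaborated 1x1 primitive gate."""
    def __init__(self, op, a, b): self.op, self.a, self.b = op, a, b

def and2(a, b):
    if isinstance(a, bool) and isinstance(b, bool):
        return a and b                 # both known: fold to a constant
    if isinstance(a, bool):
        return b if a else False       # one known: wire through or ground
    if isinstance(b, bool):
        return a if b else False
    return Gate("and", a, b)           # both dynamic: a real gate

w = Wire("b")
assert and2(True, True) is True
assert and2(True, w) is w              # specialises to a wire
assert and2(False, w) is False         # specialises to ground
assert isinstance(and2(Wire("a"), w), Gate)
```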


// Hardware primitive
block and2 (wire a, wire b) ∼ (wire c) attributes { height = 1. width = 1. } { }

// Specialising when both inputs are known
block and2 (bool a, bool b) ∼ (bool c) attributes { width = 0. height = 0. }
  → c = (a and b).

// Specialising when one input is known
block and2 (bool a, wire b) ∼ (wire c) attributes { width = 0. height = 0. }
  → if a { c = b. } else { c = false. } .
block and2 (wire a, bool b) ∼ (wire c) → (b, a) ; and2 ; c.

Figure 5.2: Distributed specialisation of an and2 block

A B Q<br />

0 0 0<br />

0 1 1<br />

1 0 1<br />

1 1 0<br />

(a) 2-input<br />

B Q<br />

0 1<br />

1 0<br />

(b) A=true<br />

B Q<br />

0 0<br />

1 1<br />

(c) A=false<br />

Figure 5.3: Exclusive-or gate truth tables<br />

A slightly more complicated example is an exclusive-or function. The full and specialised<br />

truth tables for this block are given in Figure 5.3. Here the relationship between B and Q<br />

when A is known is slightly different: if A is false then Q = B, whereas if A is true then Q is<br />

B inverted. Distributed specialisation can describe xor2 blocks which enclose this behaviour,<br />

either connecting the two signals together or generating a simple inverter rather than a full<br />

xor gate. Figure 5.4 shows the Quartz description for xor specialisation.<br />

This example is a particularly significant one because it demonstrates how conditionals in<br />

size expressions can be used to reflect the specialisation in the layout <strong>of</strong> a circuit. These size<br />

expressions will be propagated through the circuit to create a size expression for the overall<br />

circuit which is dependent on the value <strong>of</strong> the static parameter.<br />
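As a sketch of this propagation (illustrative Python, not the Quartz compiler's size-expression machinery), a conditional per-cell size expression such as if(a, 1, 0) can be summed into an overall width that depends on the static parameter:<br />

```python
# Illustrative sketch: conditional size expressions (if(a, 1, 0)) for a
# specialised xor2, propagated into the width of a row of such cells.

def xor2_width(a):
    # width = if(a, 1, 0): an inverter remains when the static input a is
    # true; the cell collapses to a wire (width 0) when a is false.
    return 1 if a else 0

def row_width(static_bits):
    # The row's size expression is the sum of the per-cell expressions,
    # so the overall layout depends on the static parameter values.
    return sum(xor2_width(bit) for bit in static_bits)

print(row_width([True, False, True]))   # -> 2 (two inverters survive)
```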

Specialising the xor2 gate to an inverter does not actually save any logic area on an <strong>FPGA</strong>,<br />

since logic functions are implemented by look-up tables, however if realised on an ASIC the<br />

difference between 8 transistors to implement an xor function and 2 to implement an inverter<br />

is definitely worth having.



// Hardware primitive<br />

block xor2 (wire a, wire b) ∼ (wire c) attributes { height = 1. width = 1. }{ }<br />

// Specialising when both inputs are known<br />

block xor2 (bool a, bool b) ∼ (bool c) attributes { width = 0. height = 0. }<br />

→ c = (a xor b).<br />

// Specialising when one input is known<br />

block xor2 (bool a, wire b) ∼ (wire c)<br />

attributes { width = if (a,1,0). height = if (a,1,0). }<br />

→ if a { b ; inv ; c. } else { c = b. } .<br />

block xor2 (wire a, bool b) ∼ (wire c)<br />

→ (b, a) ; xor2 ; c.<br />

Figure 5.4: Distributed specialisation <strong>of</strong> an xor2 block<br />

5.2.2 Benefits<br />

So far we have given simple examples <strong>of</strong> how distributed specialisation can be used to achieve<br />

constant propagation and elimination <strong>of</strong> unnecessary logic primitives. It is worth pointing<br />

out that this operation will be carried out by any reasonable synthesis tool anyway and we are<br />

not claiming that applying constant propagation to circuits is anything new. The strengths<br />

<strong>of</strong> distributed specialisation lie in its three advantages over low-level hardware optimisation by<br />

synthesis tools.<br />

Firstly, when using components that implement distributed specialisation in a laid out circuit,<br />

the circuit placement can itself be parameterised by the static parameters. This means that<br />

blocks can shrink in size as logic is eliminated and the remainder <strong>of</strong> the circuit components’<br />

positions will be adjusted accordingly. Low-level constant propagation cannot achieve this,<br />

even if operating only on a placed rather than placed & routed circuit. While the synthesis<br />

tools will eliminate logic that is not being used, it does not change the positions <strong>of</strong> other<br />

elements <strong>of</strong> the circuit and could not know how to move them anyway since the circuit layout<br />

is only parameterised in the high-level description. By moving specialisation up to the same<br />

level as the layout description (the HDL level) we are able to make sure that the layout is<br />

parameterised and not only propagate constants through a design but also achieve design<br />

compaction.<br />

Compaction has two advantages. It allows specialisation to reduce the overall size <strong>of</strong> the



circuit, rather than just eliminating some logic internally while maintaining the same overall<br />

bounding box. This means that the free logic resources on the <strong>FPGA</strong> can be used more<br />

efficiently. It also minimises wire lengths by moving components that would otherwise be<br />

joined by long wires to be adjacent to each other.<br />

The second advantage <strong>of</strong> distributed specialisation is that, because it is conducted at a high-<br />

level, it should be able to be processed much faster than low level constant propagation. This<br />

is aided by the overloading mechanism, which selects specialising blocks only if at<br />

least one input is static, so constant propagation only needs to be analysed through the parts<br />

<strong>of</strong> the circuit where it can actually have an effect.<br />

In addition, distributed specialisation does not just have to take place at the primitive level.<br />

Designers who write hardware library blocks can also provide blocks <strong>with</strong> specialisation code<br />

which could perform high level optimisations rather than using the code in the lower level<br />

circuit blocks. For example, this means that a grid shaped circuit where entire rows are<br />

expected to be eliminated can be specialised at the row level rather than processing each<br />

row element individually. We discuss the advantages <strong>of</strong> high-level specialisation further in<br />

Section 5.4.<br />

Speed is an important factor in any mechanism that is intended for use in dynamic specialisa-<br />

tion applications where it is imperative to minimise the time taken to generate a new <strong>FPGA</strong><br />

bitstream. Using distributed specialisation <strong>with</strong> the Quartz layout framework it is possible<br />

to describe constant propagation, mapping (by using only primitive hardware components<br />

in descriptions) and placement all <strong>with</strong>in the high-level description leaving only routing and<br />

bitstream generation to be completed on the system output before it can be used to configure<br />

a device.<br />

The third advantage <strong>of</strong> distributed specialisation is that it provides a clear and modular<br />

framework <strong>with</strong> which to verify the specialisation procedure and thus, by extension, the<br />

correctness <strong>of</strong> all specialised circuits produced.



5.2.3 Verifying Distributed Specialisation<br />

In dynamic specialisation applications, after the correctness <strong>of</strong> a general circuit has been<br />

carefully determined, through whatever means, it is vital to ensure that the new circuits<br />

generated by the specialisation system are functionally correct. Since it is not possible to<br />

simulate each new circuit as it is generated the only reasonable approach is to verify the<br />

specialisation process itself.<br />

Distributed specialisation provides a clear and modular approach to the verification prob-<br />

lem. Essentially we rely on the fact that a block <strong>with</strong> specialisation code should, under all<br />

circumstances, output the same logical value - regardless <strong>of</strong> how this is computed from static<br />

or dynamic hardware inputs.<br />

This splits the overall task <strong>of</strong> verifying a specialisation procedure into many smaller and much<br />

simpler tasks: verification <strong>of</strong> each self-specialising block. In theory it is much worse to have<br />

to verify each new specialising block that is written than to verify a single procedure that<br />

specialises any circuit; in practice, however, we would suggest that this approach is superior.<br />

It is likely that most circuits will rely on self-specialising blocks that have already been<br />

developed and thus will not require further verification. The kind <strong>of</strong> blocks likely to have<br />

their own specialisation code developed will be library blocks which should already be<br />

subject to extensive verification effort. If high-level specialisation has been implemented for<br />

blocks in order to increase the speed <strong>of</strong> processing then this will be functionally identical to<br />

the lower-level specialisation and the lower-level pro<strong>of</strong>s can be used to verify the high-level<br />

specialisation procedure.<br />

Because specialisation is being carried out at the HDL level we do not need to concern<br />

ourselves <strong>with</strong> details <strong>of</strong> signal routing and need to perform only a high level functional<br />

verification. By dividing the verification task each individual verification goal will tend to be<br />

quite simple, to the point where we would expect automatic pro<strong>of</strong> tools to achieve<br />

a high level <strong>of</strong> automation in proving self-specialising blocks.<br />

Figure 5.5 illustrates simple pro<strong>of</strong>s for the partial specialisation and2 and xor2 blocks de-<br />

scribed in Figures 5.2 and 5.4. The hardware primitives have been given HOL semantics<br />

and the partial specialisation blocks described in their expanded HOL form as given by the



constdefs (∗ Semantics for hardware primitives ∗)<br />

and2 :: "wire ⇒ wire ⇒ wire ⇒ bool"<br />

"and2 ≡ (λ a b c. c = (a ∧ b))"<br />

inv :: "wire ⇒ wire ⇒ bool"<br />

"inv ≡ (λ a b. b = (¬ a))"<br />

xor2 :: "wire ⇒ wire ⇒ wire ⇒ bool"<br />

"xor2 ≡ (λ a b c. c = (a ∧ (¬ b) ∨ b ∧ (¬ a)))"<br />

constdefs (∗ Semantics for partial specialisation blocks ∗)<br />

and2' :: "bool ⇒ wire ⇒ wire ⇒ bool"<br />

"and2' ≡ (λ a b c. if a then (c = b) else (c = False))"<br />

xor2' :: "bool ⇒ wire ⇒ wire ⇒ bool"<br />

"xor2' ≡ (λ a b c. if a then (inv b c) else (c = b))"<br />

(∗ Correctness theorems for specialised blocks ∗)<br />

theorem and2_spec: "∀ a b c. and2 a b c = and2' a b c"<br />

by (simp add: and2_def and2'_def)<br />

theorem xor2_spec: "∀ a b c. xor2 a b c = xor2' a b c"<br />

by (simp add: xor2_def xor2'_def inv_def)<br />

Figure 5.5: Functional verification <strong>of</strong> distributed specialisation <strong>of</strong> and2 and xor2<br />

semantic function Bβ (Figure 4.8, page 81).<br />

This is actually a rather simpler verification model than would be desired, since the wire<br />

type is simply a synonym for bool and thus there is no need for explicit reference to the types<br />

<strong>of</strong> the input/output signals - nevertheless it suitably demonstrates the concept. The two<br />

theorems demonstrating the equivalence <strong>of</strong> the blocks are proved by using just the simplifier<br />

and expanding the block definitions.<br />
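Since all signals here range over booleans, the two HOL theorems can also be checked exhaustively; the following Python sketch mirrors (but does not replace) the formal proofs:<br />

```python
# Exhaustive check mirroring the correctness theorems of Figure 5.5:
# each specialised block equals the hardware primitive for all inputs.
from itertools import product

def and2(a, b): return a and b
def inv(a):     return not a
def xor2(a, b): return (a and not b) or (b and not a)

def and2_spec(a, b): return b if a else False   # if a { c = b } else { c = false }
def xor2_spec(a, b): return inv(b) if a else b  # if a { inverter } else { wire }

for a, b in product([False, True], repeat=2):
    assert and2(a, b) == and2_spec(a, b)
    assert xor2(a, b) == xor2_spec(a, b)
print("and2 and xor2 specialisations are functionally correct")
```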

Functional verification is only one side <strong>of</strong> the verification task: it is also vital to verify the<br />

correctness <strong>of</strong> the layout <strong>of</strong> a specialised circuit. This can be done using the layout verification<br />

framework we described in Chapter 4. When specialising designs it is useful to state and<br />

prove a particular additional correctness theorem <strong>of</strong> the form:<br />

∀ sigs. Width cct_spec sigs ≤ Width cct_gen sigs ∧ Height cct_spec sigs ≤ Height cct_gen sigs<br />

In fact, this theorem should be phrased so it is only universally quantified over dynamic<br />

parameters. Parameters which genuinely affect the size <strong>of</strong> the circuit and are not expected to<br />

vary at run-time (such as the bit-width <strong>of</strong> a ripple adder) cannot be included in such pro<strong>of</strong>s<br />

since it would clearly render the theorem unprovable.



Figure 5.6: Full adder cell<br />

These layout theorems can be proved for each self-specialising block and then these pro<strong>of</strong>s<br />

combined to verify the size correctness <strong>of</strong> an entire circuit. This once again neatly decomposes<br />

a potentially large verification task into many small and easy parts. For example, for the<br />

xor2 gate in Figure 5.4 the width correctness theorem can be stated and proved easily:<br />

∀ a b c. Width xor2_spec a b c ≤ Width xor2_gen a b c<br />

⇒ ∀ a b c. if(a, 1, 0) ≤ 1<br />

⇒ 1 ≤ 1 ∧ 0 ≤ 1 ⇒ True<br />

5.3 Optimal Distributed Specialisation<br />

Reviewing the effectiveness <strong>of</strong> our self-specialising blocks on a real circuit we discover that it<br />

is necessary to address some shortcomings <strong>with</strong> the basic approach.<br />

5.3.1 Specialising a Ripple Adder<br />

We will demonstrate distributed specialisation in Quartz using a simple ripple adder circuit.<br />

This is composed <strong>of</strong> full adder cells as shown in Figure 5.6 and described in Quartz as shown<br />

in Figure 5.7. The full adder is implemented using two xor gates and a multiplexer, which is<br />

described separately using two 2-input and gates, an or and an inverter. This description is<br />

not designed for any particular target architecture; what matters here is the effect<br />

on this circuit <strong>of</strong> specialising it.<br />

As can be seen in Figure 5.7 the full-adder block accepts only hardware wires for inputs cin<br />




block fadd (('a a, wire b), wire cin) ∼ (wire cout, wire ans) {<br />

wire xored_ab.<br />

(a, b) ; xor2 ; xored_ab at (0, height((a, cin) ; mux2 xored_ab ; cout)).<br />

(cin, xored_ab) ; xor2 ; ans at (width((a, b) ; xor2 ; xored_ab), height((a, cin) ;<br />

mux2 xored_ab ; cout)).<br />

(a, cin) ; mux2 xored_ab ; cout at (0, 0).<br />

}<br />

Figure 5.7: Quartz description for a full adder<br />

Resource Standard A=111 A=100 A=001<br />

xor gates 6 3 3 3<br />

and gates 6 3 3 3<br />

inverters 2 6 4 4<br />

or gates 3 3 3 3<br />

Total gates 17 15 13 13<br />

Saving - 12% 29% 29%<br />

Total transistors 106 72 68 68<br />

Saving - 32% 35% 35%<br />

Table 5.1: Using distributed specialisation to specialise a ripple adder<br />

and b but input a is more flexible and, since it is connected to overloaded self-specialising<br />

blocks, can be <strong>of</strong> either wire or bool type. This allows the a input to be specialised by<br />

supplying a static value.<br />

The full adder blocks can be combined together using the col combinator to create a ripple<br />

adder. We can then use the Quartz compiler to synthesise a netlist for this design <strong>with</strong> either<br />

two dynamic inputs or one dynamic and one statically specialised input. Table 5.1 compares<br />

3-bit versions <strong>of</strong> the standard ripple adder and three specialised versions.<br />
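The functional property that any specialised adder must preserve - identical outputs to the general circuit with the corresponding input fixed - can be checked with a small model (illustrative Python; gate counts are not modelled):<br />

```python
# Illustrative model: fixing the a input of a ripple adder (as in the
# A=100 column of Table 5.1) must not change the computed sum.

def fadd(a, b, cin):
    # Two xors for the sum; the carry is the mux behaviour of the fadd cell.
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return cout, s

def ripple_add(a_bits, b_bits, cin=0):
    """LSB-first ripple adder built from full-adder cells (col-style)."""
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        cin, s = fadd(a, b, cin)
        sum_bits.append(s)
    return sum_bits, cin

a_fixed = [0, 0, 1]                      # a = 4, i.e. "100", LSB first
for b in range(8):
    b_bits = [(b >> i) & 1 for i in range(3)]
    s_bits, cout = ripple_add(a_fixed, b_bits)
    total = sum(bit << i for i, bit in enumerate(s_bits)) + (cout << 3)
    assert total == 4 + b
print("specialised ripple adder matches a + b for every b")
```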

The first point to observe from these results is that the distributed specialisation has produced<br />

a significant saving in on-chip resources, though more so in terms <strong>of</strong> transistors by converting<br />

complex functions into simpler ones than by eliminating logic entirely. However, it is also<br />

obvious that this result is not optimal - for example, adding “100” should be implementable<br />

as a single full-adder <strong>with</strong> all other blocks reduced to wiring, an arrangement that only<br />

requires 6 gates using this fadd block.<br />

The reason our distributed specialisation system produces such poor results is the Quartz<br />

typing system, and can be seen in the code describing distributed specialisation <strong>of</strong> the and2<br />

block, or the or2 block shown in Figure 5.8. These two blocks either connect their output to



block or2 (bool a, wire b) ∼ (wire c) attributes { width = 0. height = 0. }<br />

→ if a { c = true. } else { c = b. } .<br />

block or2 (wire a, bool b) ∼ (wire c)<br />

→ (b, a) ; or2 ; c.<br />

Figure 5.8: Distributed specialisation for an or2 block<br />

the dynamic input or statically assign it to some value, depending on the value <strong>of</strong> the static<br />

input.<br />

For or2, c is connected to b if a is false, and connected to ground otherwise. Because c is<br />

connected to a wire in one branch <strong>of</strong> the conditional it must itself have a wire type since it<br />

is not possible to assign wire values to any other type. The assignment c = true used in<br />

the other branch uses the overloaded signal connection operator, which allows static boolean<br />

values to be assigned to wires.<br />

This block correctly specialises itself; however, it does not allow propagation <strong>of</strong> the constant<br />

value. If a is true, the better solution is to produce a boolean output c assigned <strong>with</strong><br />

the value true. This would then allow blocks connected to the c output <strong>of</strong> or2 to properly<br />

specialise themselves, whereas <strong>with</strong> this block description other blocks assume that the value<br />

<strong>of</strong> c is unknown.<br />

5.3.2 Modified Type System<br />

To achieve proper propagation <strong>of</strong> the constant, c would need to be typed as a boolean if a<br />

is true and as a wire otherwise. This is impossible in a statically typed language like Quartz<br />

where types are determined by an inference process prior to the program executing.<br />

This problem also occurs at the combinator level. When the col combinator is used to<br />

connect together multiple fadd blocks it requires that the fadd block has a type <strong>of</strong> the form<br />

(α, β) ∼ (β, γ) - that is that the top and bottom connections must have the same type so they<br />

can be connected together. This means that it is not possible for the carry signal moving up<br />

the column to alternate between bool and wire types, depending on whether the carry value is<br />

known. Nor is it possible to take account <strong>of</strong> the fact that, in our ripple adder implementation,<br />

the initial carry-in input is always zero and use this to simplify the first full adder.



Modifications are necessary to the type system to provide this kind <strong>of</strong> support. One possibility<br />

is to provide a full system <strong>of</strong> dependent types, where the type <strong>of</strong> a signal could depend on a<br />

variable value. This complicates type inference - making it undecidable - and would require<br />

the designer to write complex type declarations.<br />

A better alternative is to achieve the same power by introducing the much simpler construct<br />

<strong>of</strong> enumerated types. These are standard constructs in many programming languages, which<br />

allow programmers to specify their own types using type constructors. Functional program-<br />

ming languages usually provide easy-to-use support for recursive types and these are used to<br />

define recursive data structures such as linked lists and trees. We do not require recursive<br />

types – merely the ability for a value that can have multiple interpretations to have a simple<br />

type.<br />

We could define a data type which could be declared by (in pseudo-code):<br />

type data =<br />

Known <strong>of</strong> bool<br />

| Wire <strong>of</strong> wire.<br />

Values <strong>of</strong> type data would then be used in circuit descriptions, rather than wire or bool, and<br />

the specific value would be extracted by the block itself. Figure 5.9 illustrates what an or2<br />

block that used this mechanism could look like. Because static and dynamic values now have<br />

the same type we are no longer able to use the overloading mechanism to select between<br />

instances and instead this or2 block contains code to generate the correct output regardless<br />

<strong>of</strong> whether zero, one or both inputs are known. This block <strong>with</strong> specialising code can however<br />

be overloaded <strong>with</strong> the hardware primitive <strong>with</strong> type wire wire ∼ wire and the type system<br />

can determine which block to instantiate.<br />

In this description “Wire” and “Known” are used both as an access function, to retrieve the<br />

wire value attached to the data inputs a and b, and as a constructor to<br />

be pattern matched. It may be that there is a more appropriate syntax; however, it is the<br />

concept that matters.<br />

This system can be used to achieve optimal constant propagation results and verification is<br />

still relatively easy using a suitable model in an automatic theorem prover.
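A tagged-union model of the proposed data type (hypothetical Python sketch; the names Known and Wire follow the pseudo-code above) shows how a Known output lets the constant keep propagating through downstream blocks:<br />

```python
# Hypothetical sketch of the proposed `data` type as a tagged union:
# a value is either Known(bool) or Wire(signal). An or2 over this type
# can emit a Known output, so constants cross block boundaries.
from dataclasses import dataclass

@dataclass
class Known:
    value: bool

@dataclass
class Wire:
    name: str

def or2(a, b):
    if isinstance(a, Known):
        return Known(True) if a.value else b    # c = Known true, or c = b
    if isinstance(b, Known):
        return or2(b, a)                        # commute to reuse the case above
    return Wire(f"or2({a.name},{b.name})")      # both dynamic: real hardware

# The constant now propagates through a chain of blocks:
out = or2(or2(Known(True), Wire("x")), Wire("y"))
print(out)   # -> Known(value=True): no gates are generated at all
```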



block or2 (data a, data b) ∼ (data c) {<br />

if (a = Known a2) {<br />

if a2 { c = Known true. } else { c = b. } .<br />

} else if (b = Known b2) {<br />

if b2 { c = Known true. } else { c = Known false. } .<br />

} else {<br />

(Wire a, Wire b) ; or2 ; (Wire c).<br />

} .<br />

}<br />

Figure 5.9: Distributed specialisation for an or2 block <strong>with</strong> a better type system<br />

5.4 High Level Specialisation<br />

Primitive-level specialisation <strong>with</strong> clever components that can eliminate individual gates is<br />

a useful process; however, it is not a total solution to all possible specialisation requirements.<br />

A higher level approach to specialisation, writing specialisation code for larger blocks such<br />

as library elements, also has a significant role to play.<br />

An important consideration is that constant folding, while one <strong>of</strong> the most useful optimi-<br />

sations to carry out when specialising circuits <strong>with</strong> an aim to reduce their area or improve<br />

performance, is not the only specialisation procedure we might wish to apply. There are<br />

many reasons we might wish to specialise a design, for example:<br />

1. To eliminate unnecessary logic in order to free space on the device for other function-<br />

ality, or to reduce the circuit’s power consumption.<br />

2. To increase the maximum clock frequency and run the circuit at a higher speed.<br />

3. To eliminate unnecessary computation from the critical path <strong>of</strong> a pipelined design,<br />

reducing the overall latency.<br />

4. To free space, allowing it to be used to further parallelise the computation.<br />

Items 3 & 4 are particularly interesting. If the initial stages <strong>of</strong> a pipelined computation could<br />

be eliminated by pre-computation <strong>of</strong> some <strong>of</strong> the inputs then the resulting circuit’s latency<br />

could be reduced. If the circuit is required to have a specific latency then this “latency slack”<br />

can be used to introduce additional pipelining in the later stages. This could then allow the<br />

design to run at a higher clock frequency overall, or the design could be run at the same



(a) General circuit (b) Specialised (c) Parallelised<br />

Figure 5.10: Space freed by specialisation can be used to further parallelise a circuit<br />

clock frequency but hopefully <strong>with</strong> reduced power consumption due to the reduction in glitch<br />

propagation [85].<br />

Alternatively, if the logic resources required to carry out a computation can be reduced<br />

but the space allocated on the device remains the same then the freed space can be used<br />

for accelerating the computation in other ways. It could even be used to duplicate the<br />

computational unit, increasing throughput if tasks are switched between processors rather<br />

than queueing for a single processor, as illustrated in Figure 5.10. In this diagram, the dotted<br />

box indicates the logic area allocated to the computation, which can be used to implement<br />

either a general processor, a single specialised processor and some unused logic, or multiple<br />

specialised processors <strong>with</strong> additional control.<br />

This kind <strong>of</strong> specialisation is not mere constant propagation and requires a higher-level <strong>of</strong><br />

designer involvement. <strong>Circuit</strong> designers can program library blocks to exhibit this kind <strong>of</strong><br />

specialisation behaviour, using Quartz conditionals extended <strong>with</strong> block size constructs to<br />

identify the size <strong>of</strong> specialised components. This means that a block to implement the system<br />

in Figure 5.10 could generate any number <strong>of</strong> additional copies <strong>of</strong> the computational block<br />

depending on the ratio between the size <strong>of</strong> R_gen and R_spec.<br />
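The number of copies follows from a simple area calculation (illustrative sketch; the area figures and control overhead here are assumed example values, not measurements):<br />

```python
# Illustrative sketch: deriving the number of specialised processor
# copies from the ratio of R_gen to R_spec. All area figures are
# assumed example values.

def copies_that_fit(area_gen, area_spec, control_overhead):
    """How many specialised copies fit in the area reserved for R_gen."""
    if area_spec <= 0:
        raise ValueError("the specialised block must occupy some area")
    usable = area_gen - control_overhead    # additional control logic cost
    return max(1, usable // area_spec)

print(copies_that_fit(area_gen=120, area_spec=40, control_overhead=20))  # -> 2
```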

High-level specialisation can also be used to perform constant propagation on a macro-level,<br />

eliminating the need to process the specialisation individually for each primitive block. This<br />

means that the specialisation process can be run more quickly, important for dynamic spe-<br />

cialisation applications where it is necessary to produce a new circuit quickly at run-time.<br />

High-level and primitive-level specialisation can be combined in descriptions so that large<br />

contiguous collections <strong>of</strong> blocks can be eliminated at the high level while individual blocks in



irregular positions can be eliminated using primitive-level specialisation.<br />

High-level specialisation fits our definition <strong>of</strong> distributed specialisation - blocks operating<br />

independently <strong>with</strong>out centralised control. In some cases it may be desirable to provide<br />

explicit control over the kind <strong>of</strong> specialisation engaged in and this can be done by adding extra<br />

parameters. The Quartz overloading mechanism can be used to overload a parameterised<br />

block <strong>with</strong> a non-parameterised one which instantiates the self-specialising block <strong>with</strong> a<br />

default set <strong>of</strong> parameters – the same method we used in Section 3.6 to give blocks multiple<br />

layout interpretations.<br />

5.5 Specialising a Multiplier<br />

We will demonstrate high level specialisation <strong>with</strong> a simple example: a parallel multiplier<br />

circuit. Since one <strong>of</strong> the main advantages <strong>of</strong> specialising designs in Quartz rather than using<br />

synthesis tool optimisations is that we are able to specialise and compact placed designs<br />

we will take this opportunity to evaluate whether any performance benefit is gained from<br />

compaction.<br />

5.5.1 Parallel Multiplier Implementation<br />

Before we can specialise a multiplier circuit it is necessary to describe a multiplier circuit in<br />

Quartz. Since we are interested in evaluating the real performance <strong>of</strong> the specialised circuit,<br />

we will design a multiplier for a real <strong>FPGA</strong> architecture - Xilinx Virtex-II.<br />

A parallel multiplier operates using a shift-add methodology that is similar to the way binary<br />

multiplication is performed on paper. A multiplier performing the operation x × y can<br />

be described as a grid-shaped circuit <strong>with</strong> x values flowing vertically and y values flowing<br />

horizontally. Each functional cell must perform the multiplication operation for one bit<br />

<strong>of</strong> x and one bit <strong>of</strong> y, producing sum and carry outputs in a similar way to a ripple adder,<br />

except that these outputs will be connected to additional processing cells <strong>with</strong> only the final<br />

stage producing an output.<br />
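The grid's intended behaviour can be captured by a shift-add reference model (illustrative Python; this models the arithmetic only, not the Virtex-II slice mapping):<br />

```python
# Shift-add reference model of the multiplier grid: each row adds the
# partial product x * y_i, shifted by the row index, into a running
# accumulator - binary long multiplication on paper.

def shift_add_mult(x, y, n):
    acc = 0
    for i in range(n):
        y_bit = (y >> i) & 1
        partial = (x & ((1 << n) - 1)) if y_bit else 0   # x AND-ed with bit y_i
        acc += partial << i                              # shift and add
    return acc

for x in range(8):
    for y in range(8):
        assert shift_add_mult(x, y, 3) == x * y
print("3-bit shift-add multiplier matches x * y")
```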

The Virtex-II architecture contains specific components <strong>with</strong>in each half-slice to implement



a fast carry chain designed to support the generation <strong>of</strong> fast adders and multipliers. Each<br />

half-slice also contains a mult_and component which allows two functional cells implementing<br />

multiplication to be described <strong>with</strong>in a single slice. Figure 5.11 illustrates how a half-slice<br />

can be configured to form part <strong>of</strong> the functional cell <strong>of</strong> a multiplier.<br />

At first glance this is an inefficient way <strong>of</strong> implementing the functional cell, since the x · y<br />

logical and operation is computed twice. However this is not actually the case since the area<br />

<strong>with</strong>in the dotted boundary is implemented using the slice look-up table. The performance<br />

and area required by the look-up table are independent <strong>of</strong> the actual logic function it is used<br />

to implement, and the intermediate x · y signal does not actually exist, so cannot be used as<br />

an input to the top multiplexer.<br />

The lower and gate is the slice mult_and component which is specifically available for carrying<br />

out this operation and can only be connected to the lower two inputs <strong>of</strong> the look-up table.<br />

The second exclusive-or operation and the multiplexer are also available already as dedicated<br />

devices <strong>with</strong>in the slice so do not require any additional resources that are not already in<br />

existence.<br />

This slice logic can be combined <strong>with</strong> a wiring arrangement to form a cell suitable for com-<br />

position into a grid. Figure 5.12(a) shows the wiring <strong>with</strong>in a multiplier cell and how it is<br />

connected to the slice circuitry. The ACCin and ACCout signals provide a diagonal connec-<br />

tion between the SUMout output <strong>of</strong> the cell to the left and the Qin input <strong>of</strong> the cell above.<br />

X and Y signals are routed through the cell as well as being connected to the slice logic and<br />

the output signals are connected to the cells above and to the right. Figure 5.13 shows the<br />

Quartz description <strong>of</strong> the multiplier cell.<br />

This cell design can be composed into a grid, describing a multiplier <strong>with</strong> a y input on the<br />

left and x input on the bottom. The multiplication results are output on the right side and<br />

the top side. When multiplying an n bit number by an m bit number the lower n bits <strong>of</strong><br />

the result will be output on the right and the upper m bits will be available in carry-save<br />

representation on the top connections. An additional adder circuit must be connected to the<br />

top connections to produce a full m+n bit output. Figure 5.14 shows the Quartz description<br />

for an n-bit by n-bit multiplier, where only the first n bits <strong>of</strong> the output are utilised. This<br />

circuit is similar to one derived formally using the T-Ruby system [68], although ours is



Figure 5.11: Virtex-II cell configuration to create a parallel multiplier<br />

(a) Standard cell design (b) Specialised X=0<br />

Figure 5.12: Functional cells for parallel multiplier<br />

block multcell ((wire acc in, wire y in), (wire acc out, wire p out, wire x out)) ∼
               ((wire q in, wire p in, wire x in), (wire sum out, wire y out))
attributes { height = 1. width = 1. } {
    wire xored sig.
    y out = y in.
    x out = x in.
    acc out = acc in.
    (x in, y in, q in) ; mult lut ; xored sig at (0,0).
    (p in, xored sig) ; xorcy ; sum out at (0,0).
    ((x in, y in), p in) ; fst mult and ; muxcy xored sig ; p out at (0,0).
}

Figure 5.13: Quartz description of the multiplier cell




block mult (int n) (wire y[n], wire x[n]) ∼ (wire z[n]) {
    wire zeros[n].
    int j.
    for j = 0..n−1 { zeros[j] = false. } .
    (zeros, y) ;
        zip 2 ; rev n ;
        converse (pi1) ;
        grid (n, n, multcell) ;
        [converse (zip 3), map (n, pi1) ; rev n] ;
        ((zeros, zeros, x), z) at (0,0).
}

Figure 5.14: Quartz description of the multiplier grid

designed specifically for a real circuit architecture and thus has different data-flow.<br />

When the value <strong>of</strong> the x input is known the circuit can be specialised. When x = 0, the<br />

individual cell can be replaced by the arrangement shown in Figure 5.12(b). Because the x bit<br />

value is common to the entire column the entire column can be eliminated and replaced <strong>with</strong><br />

an iterative wiring arrangement that directly connects the sum out outputs <strong>of</strong> the previous<br />

column to the sum out output <strong>of</strong> the current column displaced vertically by one cell. This is<br />

described in a spec multcol block which produces a specialised multiplier <strong>with</strong> unnecessary<br />

rows eliminated.<br />

Rather than using the grid combinator, we describe a multiplier <strong>with</strong> column-level special-<br />

isation using irow and this spec multcol block. spec multcol is parameterised by a boolean<br />

array <strong>of</strong> the bit values <strong>of</strong> static signal x and the index parameterisation <strong>of</strong> irow is used to<br />

extract the correct value for each column. This is just one way of describing this behaviour; an alternative would be to zip the boolean vector with the other column inputs and use the standard row combinator.

The general and specialised multipliers can be overloaded so that the correct instance <strong>of</strong> mult<br />

is selected depending on whether a static or dynamic x value is specified.<br />
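The intuition behind the column elimination can be captured in a short behavioural sketch (illustrative Python; the names are ours, not the Quartz description): a multiplier specialised for a static x only needs adder columns where the corresponding bit of x is 1.

```python
def const_mult(x_const, y, n=8):
    """Shift-add multiplier specialised for a static n-bit x: columns
    whose x bit is 0 are eliminated entirely, leaving only wiring."""
    acc = 0
    adder_columns = 0
    for j in range(n):
        if (x_const >> j) & 1:
            acc += y << j        # column kept: one adder column
            adder_columns += 1
        # x bit is 0: the column is specialised away, no logic at all
    return acc, adder_columns
```

For x = 128 a single column survives, while x = 255 retains all eight, which is consistent with the slice counts reported in Table 5.2.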

5.5.2 Results<br />

We expect the performance <strong>of</strong> the specialised multiplier to depend on the precise value <strong>of</strong> the<br />

x input, since 0s in the x value do not require any computation at all.

            Resources          Standard Settings        Constrained Timing
x value   Slices   Diff    Max freq. (Mhz)   Diff    Max freq. (Mhz)   Diff
3           15    -79%          162          315%         192          368%
9           14    -80%          153          292%         170          314%
85          26    -63%           88          126%          95          131%
121         32    -55%           75           92%          83          102%
128          5    -93%          114          192%         115          180%
170         25    -65%           88          126%          97          137%
255         71      0%           38           -4%          41            0%

Table 5.2: Results of multiplier specialisation without compaction

Therefore we synthesise a relatively small multiplier circuit, for two 8-bit inputs, so that different performance

behaviour can be explored <strong>with</strong>out requiring too large a number <strong>of</strong> different input values to<br />

be evaluated. We connect registers to the inputs and output so that the maximum clock<br />

frequency <strong>of</strong> the design can be evaluated. With this design, a general multiplier requires 71<br />

slices on the device. We generate two versions <strong>of</strong> the design on the device - one <strong>with</strong> the<br />

standard Xilinx tool settings and a second <strong>with</strong> a timing constraint to (hopefully) generate a<br />

better routed circuit. The standard design can run at a maximum clock frequency <strong>of</strong> 39Mhz,<br />

while the version synthesised <strong>with</strong> the timing constraint can run up to 41Mhz.<br />

Table 5.2 shows the results for specialising the multiplier for various x values <strong>with</strong> the descrip-<br />

tion configured to prevent compaction <strong>of</strong> the design. Compaction is prevented by manually<br />

specifying a size expression for the spec multcol block which does not contain a conditional<br />

and is the same size regardless <strong>of</strong> whether the column is specialised away. The “Diff” columns<br />

indicate the percentage difference between the values for the specialised multiplier and for<br />

the general multiplier, standard or <strong>with</strong> the timing constraint as appropriate.<br />
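The "Diff" figures can be reproduced directly from the raw values (a sketch of the calculation only, not tooling used in this work):

```python
def pct_diff(specialised, general):
    """Percentage difference of a specialised value relative to the
    general multiplier, as in the 'Diff' columns of Table 5.2."""
    return round(100 * (specialised - general) / general)
```

For example, the first row gives pct_diff(15, 71) = -79 for slices and pct_diff(162, 39) = 315 for the standard-settings frequency, matching the table (the general multiplier uses 71 slices and runs at 39 Mhz).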

There are clearly significant speed and area savings that can be made by replacing the<br />

general multiplier by constant coefficient versions. To understand the results it is necessary<br />

to understand how the different values impact the structure <strong>of</strong> the generated circuit, as<br />

shown in Figure 5.15. Examining these diagrams it is clear that the multiplier for x = 255 is<br />

virtually identical to the full multiplier - although it does differ in the important respect that<br />

it does not require input pins for its x input. It is interesting to note that the Xilinx s<strong>of</strong>tware<br />

actually initially generates a design which runs slower than the full multiplier, although this<br />

effect is eliminated when the timing constraint is introduced. With the timing constraint<br />

the constant-coefficient version does actually run 0.08% faster, however this effect is pretty



[Figure 5.15: Comparing the full multiplier with specialised constant co-efficient multipliers — (a) Full multiplier; (b) x = 3; (c) x = 9; (d) x = 85; (e) x = 121; (f) x = 128; (g) x = 170; (h) x = 255.]



            Resources          Standard Settings        Constrained Timing
x value   Slices   Diff    Max freq. (Mhz)   Diff    Max freq. (Mhz)   Diff
3           15      0%          162            0%         192            0%
9           14      0%          161            5%         170            0%
85          26      0%           86           -2%          99            4%
121         32      0%           76            1%          83



<strong>with</strong> the uncompacted design the freed logic is dispersed throughout the multiplier circuit.<br />

This example demonstrates that compaction can improve performance, and we would expect the performance gain to be much larger for larger circuits, such as a bigger multiplier, where there is more potential for compaction.

5.6 Summary<br />

In this chapter we have demonstrated how the Quartz layout infrastructure can be used to<br />

create specialised versions <strong>of</strong> designs by optimising for particular static inputs.<br />

We have presented the mechanism <strong>of</strong> distributed specialisation and demonstrated how this<br />

can be used to specialise a ripple adder using the Quartz overloading mechanism. We have<br />

also highlighted the capabilities required to implement an optimal distributed specialisa-<br />

tion system. We have shown how distributed specialisation lends itself to clear and simple<br />

verification in a way that could be easily automated using a theorem prover.<br />

One <strong>of</strong> the advantages <strong>of</strong> performing specialisation at the Quartz level is that we are able to<br />

achieve compaction <strong>of</strong> placed designs. We have demonstrated that this can lead to increased<br />

performance for a specialised multiplication circuit.


Chapter 6<br />

<strong>Layout</strong> Case Studies<br />

In this chapter we demonstrate the use <strong>of</strong> our layout framework by describing some full<br />

circuits and comparing the performance and compilation times for versions <strong>with</strong> and <strong>with</strong>out<br />

placement. Section 6.1 outlines our basic approach to collecting results. In Section 6.2 we<br />

describe pipelined and unpipelined binary trees <strong>of</strong> ripple adders. Section 6.3 gives the Quartz<br />

design and results for a simple median filter, while Section 6.4 describes a butterfly network<br />

<strong>of</strong> 2-sorters and introduces the low-level register pipelining combinator. Section 6.5 describes<br />

and analyses a binomial filter circuit. Section 6.6 introduces a new class <strong>of</strong> n-dimensional<br />

combinators and shows how the 3D version can be used to describe a matrix multiplier<br />

<strong>with</strong> an implicit 2D layout interpretation. Section 6.7 evaluates our results and Section 6.8<br />

summarises this chapter.<br />

6.1 Approach<br />

In this chapter we describe a variety <strong>of</strong> different circuits <strong>with</strong> layout information, verify their<br />

layouts and evaluate the performance <strong>of</strong> the resulting circuit <strong>with</strong> and <strong>with</strong>out the placement<br />

constraints. Designs were synthesised for a Xilinx Virtex-II <strong>FPGA</strong> so that the required logic<br />

resources, maximum operating frequency and power consumption could be measured.<br />

Designs are expressed in Quartz and compiled using the Quartz compiler <strong>with</strong> layout gen-<br />

eration (Chapter 3) into Pebble 5. The Pebble 5 compiler [66] is then used to produce<br />



CHAPTER 6. LAYOUT CASE STUDIES 131<br />

a flattened, placed netlist in VHDL format. This flattened VHDL instantiates architecture<br />

primitives and is enclosed in a hand-coded VHDL testbench for synthesis using the Xilinx ISE<br />

s<strong>of</strong>tware. The testbench has been used to carry out the following functions as appropriate:<br />

1. Clock division. Where the power consumption <strong>of</strong> circuits has been measured a simple<br />

clock divider circuit has been used to ensure that all circuits are run <strong>with</strong>in their<br />

maximum clock frequency.<br />

2. Generating input data. Linear Feedback Shift Register Counters [3] are used to provide<br />

pseudo-random input data to designs when necessary.<br />

3. Provision <strong>of</strong> suitable interface registers.<br />

4. XORing <strong>of</strong> outputs to a single chip pin.<br />

Outputs were XORed together when it was desired to investigate the power consumption of the circuitry on a chip, to minimise the influence of I/O power. It is necessary to connect a single output pin to prevent the entire circuit being optimised away by the synthesis tools.
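The principle of the LFSR counters used for stimulus can be illustrated with a minimal example (a generic 4-bit maximal-length LFSR; the actual testbench generators are not reproduced here):

```python
def lfsr4(state, taps=(3, 2)):
    """One step of a 4-bit Fibonacci LFSR: XOR the tapped bits,
    shift left, and feed the result back into bit 0. Taps at bits
    3 and 2 correspond to the primitive polynomial x^4 + x^3 + 1."""
    fb = ((state >> taps[0]) ^ (state >> taps[1])) & 1
    return ((state << 1) | fb) & 0xF
```

With a primitive feedback polynomial the register cycles through all 15 non-zero states before repeating, providing inexpensive pseudo-random input data.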

Power consumption has been measured from a Celoxica RC200 development board, equipped<br />

with an XC2V1000 FPGA. This is a complex development board and the board power consumption dwarfs that of the FPGA itself; thus we have measured the quiescent power of

measurements. Power consumption itself was measured by monitoring the current drawn by<br />

the board at its operating voltage <strong>of</strong> 12V.<br />
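The derivation of the reported power figures is then a simple subtraction at the board supply voltage (the currents below are illustrative; only the derived milliwatt figures are reported in this work):

```python
V_BOARD = 12.0  # RC200 board supply voltage, in volts

def fpga_power_mw(i_loaded_ma, i_quiescent_ma):
    """Estimate FPGA power: subtract the quiescent board current
    (FPGA programmed empty) from the loaded current, then convert
    to milliwatts at the 12 V operating voltage."""
    return V_BOARD * (i_loaded_ma - i_quiescent_ma)
```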

We use four metrics to evaluate our designs:<br />

1. Maximum clock frequency. This is an important measure <strong>of</strong> the performance <strong>of</strong> the<br />

generated circuit, indicating how fast it can be run and thus how fast it can process<br />

data. We would hope that manually placed designs would have higher maximum clock<br />

frequencies than automatically placed ones, however previous results have shown that<br />

this is not always the case [77]. While placed designs <strong>of</strong>ten outperform unplaced designs<br />

very significantly, some types <strong>of</strong> design do not. Maximum clock frequency is not the<br />

only characteristic <strong>of</strong> circuits we are interested in and a placed design may still be<br />

preferable to an unplaced one if it outperforms on one <strong>of</strong> our other metrics.



2. Power consumption. For some variants <strong>of</strong> circuits we have been able to measure the<br />

relative power consumption of placed and unplaced designs and compare them. Once again we would hope that manually placed designs would have lower power consumption; however, this may differ from circuit to circuit.

3. Place and route time. The time taken to place and route a circuit is an important<br />

part <strong>of</strong> the overall hardware compilation time, particularly for dynamic specialisation<br />

applications where it is necessary to generate circuits very quickly. This is measured<br />

by the Xilinx synthesis tools running on an Intel dual Pentium 4 Xeon 2.6Ghz PC <strong>with</strong><br />

4GB <strong>of</strong> RAM.<br />

4. Logic area. The logic area used by circuits will be measured as the number <strong>of</strong> slices<br />

required on the Virtex-II.<br />

6.2 Adder Tree<br />

The simplest circuits we analyse are pipelined and unpipelined binary adder trees.<br />

6.2.1 Ripple Adder<br />

Binary ripple adders can be laid out quite densely on the Virtex-II architecture, using a single<br />

slice to implement the addition (and pipeline delay, if desirable) <strong>of</strong> two bits. This is because<br />

the architecture <strong>of</strong> each slice (Figure 2.2, page 11) is such that two full-adder circuits can be<br />

implemented in a single slice, using both function generators and the carry chain.<br />

The Virtex slice architecture contains specialised carry logic designed to create fast carry<br />

chains. A Virtex full adder can be built by using the 4-input look-up table as an xor function<br />

which is then connected to the muxcy carry multiplexer to generate the carry out signal and<br />

xorcy logic to generate the sum result signal (which can then be registered if desired). This<br />

arrangement is depicted in Figure 6.1 and the Quartz code which generates this arrangement<br />

is shown in Figure 6.2. This description exploits the geometric interpretation <strong>of</strong> Quartz block<br />

domains and ranges to use the cout signal in the domain to indicate that it is connected to<br />

the top side <strong>of</strong> the block and cin in the range to indicate it is on the bottom <strong>of</strong> the block.



[Figure 6.1: Circuit diagram of the full adder block — inputs a, b and cin; outputs cout and ans.]

block fadd ((wire a, wire b), wire cout) ∼ (wire cin, wire ans)
attributes { height = 1. width = 1. } {
    wire xored ab.
    (a, b) ; xor2 ; xored ab at (0,0).
    (cin, xored ab) ; xorcy ; ans at (0,0).
    (a, cin) ; muxcy xored ab ; cout at (0,0).
}

Figure 6.2: A full adder within a single Virtex-II slice

All three primitive components <strong>with</strong>in the fadd block are placed at co-ordinates (0, 0) indi-<br />

cating that they should be located <strong>with</strong>in the same slice. The xor2 block is a wrapper for the<br />

Xilinx lut2 primitive which initialises the look-up table <strong>with</strong> the values necessary to produce<br />

an xor function.<br />

We have illustrated the full adder from the unpipelined ripple adder circuit, however, it is<br />

possible to connect a pipeline register on the output ans signal in a series composition <strong>with</strong><br />

the xorcy block, all within the same slice. A full n-bit ripple adder can be formed using the column combinator: (a, b) ; zip 2 ; π1^−1 ; col_n fadd ; (cin, ans).
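The behaviour of fadd and of the resulting ripple column can be modelled directly (a Python sketch of the logic in Figure 6.2; signal names follow the Quartz description, while the column composition is a functional reading of the combinator, not the combinator itself):

```python
def fadd(a, b, cin):
    """Gate-level model of fadd: a LUT xor2, the xorcy sum gate and
    the muxcy carry multiplexer."""
    xored_ab = a ^ b
    ans = cin ^ xored_ab            # xorcy
    cout = cin if xored_ab else a   # muxcy, selected by the LUT output
    return cout, ans

def ripple_add(a_bits, b_bits, cin=0):
    """Column of fadd cells, least significant bit first."""
    ans = []
    for a, b in zip(a_bits, b_bits):
        cin, s = fadd(a, b, cin)
        ans.append(s)
    return ans, cin
```

The muxcy selection is the standard carry-chain trick: when a XOR b is 1 the incoming carry propagates, otherwise the carry out equals a.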

The verification <strong>of</strong> the fadd layout is a particular issue. Since the smallest meaningful element<br />

in our layout framework is a block <strong>of</strong> size 1 ×1 (half a slice, for Virtex-II) and our framework<br />

assumes a homogeneous grid <strong>of</strong> resources, it is not possible to verify the layout <strong>with</strong>in the full<br />

adder block. Instead, we reason about circuits using the full adder at the level <strong>of</strong> individual<br />

full adders, assuming that fadd itself is correct. The xor2, xorcy and muxcy blocks (and fd<br />

flip-flop primitives, if desired) are given a size <strong>of</strong> 0 × 0 to indicate that many can be packed<br />

<strong>with</strong>in a slice.<br />




(a) “Tree” (b) Vertical (c) Horizontal<br />

Figure 6.3: Different ways <strong>of</strong> laying out binary trees<br />

This does not prevent incorrect packing of computational resources into slices; however, this error is a simple one to avoid. Precisely what is or is not legal is architecture-dependent and thus not desirable to implement within our general framework. Since a slice/sub-slice "primitive" such as the full adder is not parameterised, its design is generally either obviously correct or incorrect, and in any event the synthesis tools will always be able to generate an error message if the logic mapping is invalid.

6.2.2 Possible Tree <strong>Layout</strong>s<br />

In Chapter 4 we illustrated how the Quartz system could describe a H-tree layout. This is just<br />

one <strong>of</strong> the possible ways <strong>of</strong> laying out a binary tree <strong>of</strong> components and we will experiment <strong>with</strong><br />

a simple horizontal layout (Figure 6.3(c)¹). We choose this layout because, with the vertical

arrangement <strong>of</strong> ripple adders using the fast carry chain circuitry, it can be laid out extremely<br />

densely. In the horizontal layout the two sub-trees are laid out to each side <strong>of</strong> the root node.<br />

We also experimented <strong>with</strong> a “traditional” tree layout, laid out <strong>with</strong> spaces between nodes<br />

(Figure 6.3(a)) and a compaction <strong>of</strong> the tree into a single column (Figure 6.3(b)) however<br />

¹ The horizontal tree arrangement is shown without wiring, due to its density.



block btree (int p, block R (‘a, ‘a) ∼ ‘a) (‘a i [m]) ∼ (‘a o) {<br />

const m = 2 ∗∗ p. ‘a t1, t2. ‘a i1[m/2], i2[m/2].<br />

assert (p >= 0) ”btree p



                      Slices   Util.   t-PAR (s)   Max freq. (Mhz)
Unpipelined/Auto        777     15%        48            61.5
Unpipelined/Placed      406      7%        19            69.7
Pipelined/Auto          777     15%        28           206.0
Pipelined/Placed        406      7%        26           152.0

Table 6.1: Results for a single adder tree

R using the timeless pre-condition and split into two repeated anti-delays <strong>with</strong>in the parallel<br />

composition using the property that D is polymorphic:<br />

half 2^n ; [btree_n (R ; D) ; D^−n , btree_n (R ; D) ; D^−n ] ; R

The induction hypothesis can then be used to complete the pro<strong>of</strong>.<br />
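Functionally, the btree combinator reduces 2^p inputs with a two-input block R. A Python sketch of this reading (layout and the pipelining anti-delays omitted; R is modelled as an ordinary two-argument function):

```python
def btree(p, R, xs):
    """Binary-tree combinator: apply the two-input block R at each
    node of a tree over 2**p inputs."""
    assert p >= 0 and len(xs) == 2 ** p
    if p == 0:
        return xs[0]
    half = len(xs) // 2              # split the inputs in two halves
    return R(btree(p - 1, R, xs[:half]),
             btree(p - 1, R, xs[half:]))
```

With R as addition, btree(6, R, …) corresponds to the 6-level tree of adders summing 64 input values.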

Using this combinator we generate a 6-level tree <strong>of</strong> 8-bit ripple adders, producing a circuit<br />

which adds together 64 input values. The manual placement is compared <strong>with</strong> the identical<br />

circuit compiled <strong>with</strong>out placement and using the Xilinx placement algorithm.<br />

6.2.3 Results<br />

Table 6.1 shows the results for placed and unplaced pipelined and unpipelined adder trees.<br />

“Util” is the percentage <strong>of</strong> resources utilised on the device, “t-PAR” is the amount <strong>of</strong> time<br />

required to place and route the circuit. As expected, pipelining increases the maximum clock<br />

frequency significantly (although far from the predicted theoretical maximum <strong>of</strong> ×8). It is<br />

also interesting to note that the manually placed design has worse performance than the<br />

automatically placed version when the circuit is pipelined, even though the manually placed<br />

version has been mapped into fewer Virtex slices.<br />

We also experimented <strong>with</strong> placing multiple adder trees on the <strong>FPGA</strong>. Table 6.2 illustrates<br />

the results for an <strong>FPGA</strong> loaded <strong>with</strong> 7 <strong>of</strong> the adder trees. The difference in the resources used<br />

by the placed and unplaced descriptions is very significant, and possibly partially responsible<br />

for the fact that the placed version now exhibits significantly higher performance than the<br />

unplaced version regardless <strong>of</strong> pipelining.<br />

The difference in the number <strong>of</strong> slices used is quite interesting. It implies that the process<br />

<strong>of</strong> packing primitives into slices automatically does so much less densely than the manual



                      Slices   Util.   t-PAR (s)   Max freq. (Mhz)   Pwr (mW)
Unpipelined/Auto       4872     95%       225            53.3            -
Unpipelined/Placed     1907     37%        19            65.5            -
Pipelined/Auto         4872     95%       142           123.0          1404
Pipelined/Placed       1908     37%        40           150.9           852

Table 6.2: Results for 7 adder trees

method. This appears to be a result <strong>of</strong> the Xilinx algorithm only packing, as a first preference,<br />

“related” logic into the same slice. Thus, while the manually specified layout tends to use<br />

both function generators in a slice, the automatic one prefers to use only one. This may<br />

allow the Xilinx router to perform better and could explain why the automatically placed<br />

single pipelined adder tree example requires more <strong>FPGA</strong> resources than the placed version<br />

but still runs faster.<br />

We measure the power consumption <strong>of</strong> the pipelined variants. Running at the same clock<br />

frequency, the placed design consumes substantially less power (39% less dynamic power,<br />

once the quiescent consumption <strong>of</strong> the development board is subtracted) than the unplaced<br />

design, though it is unclear whether this is the result <strong>of</strong> the design using fewer logic resources<br />

or <strong>of</strong> better routing.<br />

6.3 Median Filter<br />

Median filters are a special case <strong>of</strong> ranked order filtering. The median filtering operation is<br />

widely used in digital image processing to remove noise and in a variety <strong>of</strong> other applications.<br />

Our circuit will be restricted to one dimensional filtering, although the extension to a two<br />

dimensional filter is not difficult.<br />

A 1-dimensional median filtering operation involves “sliding” a filter window along a range<br />

<strong>of</strong> values and selecting the median value from the elements currently <strong>with</strong>in the window.<br />

This can be achieved by sorting the elements and selecting the middle value - obviously the<br />

window size must always be an odd number so that there is a middle element to select. In<br />

our circuit the elements <strong>with</strong>in the current window are stored and each cycle a new value<br />

is inserted while the oldest is discarded. Since only one element differs between different<br />

window positions we do not need to implement a full sorter but can simplify the circuitry to simply insert a value into the correct position in an already sorted list.

[Figure 6.5: Block diagram for the median filter — new value, delays, insert, previous state, locater, compactor, next state, midelem, median value.]
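One cycle of this window update (insert the new sample into a sorted list, then locate and remove the oldest) can be sketched behaviourally (illustrative Python; this is a word-level model, not the cell-level Quartz design):

```python
def filter_step(state, history, new):
    """One cycle of the filter core: insert the new value into the
    sorted state, then remove the oldest value (first match, as the
    locater does) and return the median."""
    oldest = history.pop(0)            # output of the n-stage delay line
    history.append(new)
    expanded = sorted(state + [new])   # state is sorted, so this is
                                       # effectively one insertion step
    expanded.remove(oldest)            # locate first match and compact
    median = expanded[len(expanded) // 2]   # midelem: middle element
    return expanded, history, median
```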

6.3.1 <strong>Circuit</strong> Design<br />

Our design is loosely based on a state-machine-based design previously described in Ruby [26]; however, our realisation differs substantially and, we would suggest, is made much simpler by using the full features of Quartz rather than basic Ruby relations². Figure 6.5 shows the basic

architecture <strong>of</strong> our filter, <strong>with</strong> several blocks that operate on a current state and produce a<br />

next state. This is essentially the state transition and output logic <strong>of</strong> a state machine and<br />

this circuit can be composed <strong>with</strong> appropriate registers using the loop combinator.<br />

The two inputs to this filter core are the previous state (a set <strong>of</strong> sorted values) and a single<br />

new value. This new value is inserted into the sorted list by the insert block which implements<br />

one stage <strong>of</strong> an insertion sort. If the state contains n elements then the output <strong>of</strong> the insert<br />

block contains n + 1 elements, one <strong>of</strong> which (the oldest) must now be removed.<br />

As well as being connected to the insert block the new value is also fed into an n-element<br />

shift register, which is used to determine the value to remove to make the new state. The<br />

locater carries this out, matching the value from n cycles ago <strong>with</strong> the values in the state<br />

until the first match is found.<br />

The locater is composed <strong>of</strong> a row <strong>of</strong> lct cells, which are shown in Figure 6.6(a). These cells<br />

² The Ruby design was however refined into a bit-level version, while we will concentrate on the word-level circuit.



[Figure 6.6: Cells making up various filter blocks — (a) lct cell; (b) del cell in through mode; (c) del cell in compact mode.]

output an array <strong>of</strong> boolean values (d) which control the operation <strong>of</strong> the compactor block.<br />

The value a is the value the locater is “looking for”, s is one <strong>of</strong> the elements in the state, f<br />

is a boolean indicating whether the value has been located yet. Each lct cell compares its<br />

state value <strong>with</strong> the a value and outputs d and f2 values which control the compactor.<br />

The compactor is made up of a row of cells which use multiplexers controlled by the d output of the corresponding lct cell. This mode signal configures the del cell block into one of two configurations: through mode (Figure 6.6(b)) and compact mode (Figure 6.6(c)).

The locater configures all compactor cells to the right <strong>of</strong> the first detected match to the<br />

correct value (the same value could be <strong>with</strong>in the current state multiple times) into through<br />

mode. This has the effect <strong>of</strong> routing the detected value to the right, while the other values are<br />

routed straight through. The value that appears on the rightmost output <strong>of</strong> the compactor<br />

is discarded. The compactor outputs the correct n elements <strong>of</strong> the next state.<br />

The midelem block is a simple wiring block which outputs the median value by extracting<br />

the middle value from the sorted list.<br />

6.3.2 <strong>Layout</strong><br />

We arrange the insert, locater and compactor blocks vertically roughly as shown in Figure 6.5.<br />

lct cell is formed from an equality checker, composed <strong>of</strong> a column <strong>of</strong> and3 gates on top <strong>of</strong><br />

an or2 gate. del cell is implemented by two multiplexers arranged in adjacent columns.<br />

The insert insertion-sort block is built from a row of min2 and max2 sorters, with each bit implemented as a comparator function unit and another function unit operating as a multiplexer.

Block/Theory     Theorems   Proof Details
mux                  4      Add mux lut def to simplification set
mux ff               4      Add mux lut def to simplification set
max2                 6      Expand mux def, mux lut def, rephrase a tactical
min2                 6      Identical to max2
insert               3      Expand compositions
midelem              2      Automatic, assertion premise removed
insert median        3      Expand compositions
del cell             5      Manual intersection theorem
compactor            3      Expand compositions
eq                   4      Containment proof requires additional lemma
lct cell             5      Fully automatic
locater              3      Fully automatic
nextstate            5      Fully automatic
filter core          3      Expand compositions
filter               3      Fully automatic

Table 6.3: Statistics for median filter layout correctness proof

The median filter design description, shown in Appendix D.1, is a good example <strong>of</strong> how the<br />

Quartz layout system is easy to use and allows the mixing <strong>of</strong> different styles <strong>of</strong> expressing<br />

placement. Size inference is used for all blocks, and since we do not define any combinators that we intend to re-use, absolute co-ordinates can be used to simplify placement.

The verification <strong>of</strong> such a layout description is important, since it could easily contain simple<br />

errors and we have undertaken this using our framework. Table 6.3 shows statistics and<br />

information on some <strong>of</strong> the pro<strong>of</strong>s conducted during the verification <strong>of</strong> the median filter<br />

layout (also shown in Appendix D.1).<br />

Generally the verification of this circuit layout is easy: although considerable tweaking of the proof scripts was required for some blocks, the nature of such changes was always obvious and simple to carry out. Once again the inability of the classical reasoner to appropriately

decompose series and parallel compositions is responsible for several required interventions,<br />

although once these compositions are handled using low-level reasoning tools the automated<br />

tools complete the pro<strong>of</strong>.<br />

The complete proof scripts for this design take 6 min 5 s to run. This is a surprisingly long time, mostly accounted for by the max2 and min2 blocks; however, it is still well within a reasonable time frame.



[Figure 6.7: Median filter realised on a Virtex-II, with the shift registers, insert, locater and compactor blocks visible.]

                Slices   Util.   t-PAR (s)   Max freq. (Mhz)
Default synthesis tool configuration
  Automatic       247      4%        26            32.2
  Placed          147      2%        19            37.7
30ns timing constraint
  Automatic       247      4%        27            37.2
  Placed          147      2%        19            37.1
20ns timing constraint
  Automatic       247      4%        71            42.5
  Placed          147      2%        54            41.1

Table 6.4: Results for a median filter with 8-bit data values and a window size of 5

Figure 6.7 shows the logic resources used when a median filter is realised on a Virtex-II chip. The relative layout of the different components of the circuit can be clearly seen.

6.3.3 Results

We synthesise two different sizes of the median filter for a Virtex-II chip. Results for the smaller version, with 8-bit data values and a window size of 5, are shown in Table 6.4.

We compared the results for three different configurations of the Xilinx synthesis tools. The place & route tool can be configured with a desired timing constraint on the resulting circuit, which the tool will attempt to meet. It can be seen that this does have a significant effect on the maximum clock frequency, though less so for the manually placed design. The time taken by the place & route process increases substantially as the timing constraint is made more stringent, although the manually placed version is processed more quickly for all constraint settings.

                          Slices   Util.   t-PAR (s)   Max freq. (MHz)
Default synthesis tool configuration
  Automatic               2401     47%     116         2.95
  Placed                  1317     26%     73          5.38
200 ns timing constraint
  Automatic               2401     47%     127         4.36
  Placed                  1317     26%     62          5.35
150 ns timing constraint
  Automatic               2401     47%     158         4.82
  Placed                  1317     26%     91          6.04

Table 6.5: Results for a median filter with 32-bit data values and a window size of 11

The performance of the automatically and manually placed versions is similar. However, both circuits use only a tiny proportion of the available logic resources, and we have previously observed that automatic place & route seems to have more of an advantage at low utilisations.

Table 6.5 gives the results for a larger median filter design, with 32-bit data values and a window size of 11. This design uses significantly more of the chip and the placed design has a clear advantage over the automatically placed version, in terms of both place & route time and maximum clock frequency. The clock frequency advantage is greatest when no timing constraint is specified, where the placed version is 82% faster; however, even when the difference is minimised for the 200 ns timing constraint the placed version is still 22% faster, and takes half the time to process.

We do not record the power consumption of the median filter circuit because it runs at too slow a clock frequency to make this worthwhile. We could pipeline the design if there was a desire to increase its operating frequency.

6.4 Butterfly Network

Butterfly circuits are characterised by their intensive wiring patterns. Such networks are commonly used in applications such as computing a Fast Fourier Transform.



Figure 6.8: A butterfly network of degree 4 (2^4 = 16 inputs)

block butterfly (int n, block R (‘a, ‘a) ∼ (‘a, ‘a)) (‘a l[m]) ∼ (‘a r[m]) {
  const m = 2 ∗∗ n.
  l ; rcomp (n,
        riffle (m/2) ;
        pair (m/2) ;
        map (m/2, vecpair ; R ; converse (vecpair)) ;
        converse (pair (m/2))
      ) ; r.
}

Figure 6.9: Quartz butterfly combinator

6.4.1 Butterfly Combinator

A butterfly network is an arrangement of functional blocks as shown in Figure 6.8. The network is characterised by repeated instantiations of the same, or similar, functional blocks, with a particular wiring arrangement between them. There are a number of different ways of describing butterfly networks in Quartz and one of the simplest is shown in Figure 6.9. This combinator uses repeated composition to connect together multiple instantiations of the functional block R with a wiring arrangement described by riffle and pair. The pair block converts a one-dimensional vector into a two-dimensional vector of pairs. The riffle operation splits a vector into two halves and then combines them in an interleaved fashion; an iterative version of this wiring block is given in Figure 6.10.

The vecpair block converts a two-element vector into a pair (tuple) of elements and its converse performs the opposite operation. The Quartz vecpair block is defined referencing



block riffle (int n) (‘a x[2∗n]) ∼ (‘a y[2∗n])
attributes { height = 0. width = 0. }
{
  int i.
  for i = 0..n∗2−1 {
    if (i mod 2 == 0) { y[i] = x[i/2]. } else { y[i] = x[n+i/2]. } .
  } .
}

Figure 6.10: Iterative riffle operation
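The indexing in the riffle block of Figure 6.10 can be cross-checked with a small behavioural model on plain lists (a Python sketch; the function name and list encoding are ours, not part of Quartz):

```python
def riffle(xs):
    """Model of the Quartz riffle block on a list of length 2*n:
    even outputs come from the first half, odd outputs from the
    second, i.e. y[i] = x[i/2] if i is even, else x[n + i/2]."""
    n = len(xs) // 2
    return [xs[i // 2] if i % 2 == 0 else xs[n + i // 2]
            for i in range(2 * n)]

# The halves [0,1,2,3] and [4,5,6,7] are combined in an interleaved fashion:
print(riffle([0, 1, 2, 3, 4, 5, 6, 7]))  # [0, 4, 1, 5, 2, 6, 3, 7]
```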

block spacer (int w, int h) (‘a i) ∼ (‘a o)
attributes { width = w. height = h. } → i = o.

block spacer (int w, int h) (‘a i1, ‘b i2) ∼ (‘b o1, ‘a o2)
attributes { width = w. height = h. } → (o1, o2) = (i2, i1).

Figure 6.11: Two-sided and four-sided spacer blocks

explicit vector indexes; however, it can also be defined using append blocks:

vecpair = apl2⁻¹ ; snd [−]⁻¹ = apr2⁻¹ ; fst [−]⁻¹

The proof of this relationship is easy.

The butterfly combinator can be instantiated with any R block to produce a variety of different butterfly networks. Its structure has a clear layout interpretation imparted by the combinator blocks it utilises: map will arrange the R blocks of each stage of the butterfly vertically and each stage will be laid out horizontally next to the previous stage by the rcomp block.
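The combinator also has a simple behavioural reading: each of the n stages riffles the wires, pairs neighbours and applies R to each pair. The Python sketch below is our own model of that reading (not Quartz itself); instantiated with a 2-sorter it behaves as a bitonic merger:

```python
def riffle(xs):
    # the riffle wiring of Figure 6.10, on plain lists
    n = len(xs) // 2
    return [xs[i // 2] if i % 2 == 0 else xs[n + i // 2] for i in range(2 * n)]

def butterfly(n, R, xs):
    """Behavioural model of the butterfly combinator: n repeated stages,
    each riffling the 2**n wires, pairing neighbours and applying R."""
    assert len(xs) == 2 ** n
    for _ in range(n):
        xs = riffle(xs)
        xs = [v for i in range(0, len(xs), 2) for v in R(xs[i], xs[i + 1])]
    return xs

# With a 2-sorter for R, an ascending and a descending sorted list are merged:
two_sorter = lambda a, b: (min(a, b), max(a, b))
print(butterfly(3, two_sorter, [1, 4, 6, 7] + [9, 8, 3, 2]))
# [1, 2, 3, 4, 6, 7, 8, 9]
```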

This is a very dense arrangement and it is possible that for some architectures there may be insufficient routing resources available to route the complex wiring network between each stage of the butterfly. To avoid this problem the butterfly combinator can have a spacer block added to the description of each stage. A spacer is a block which is functionally identical to the identity block but is defined to have a non-zero size; it can thus be used to produce empty space in designs.

Figure 6.11 illustrates two spacer blocks for use in two-sided and four-sided circuit arrangements. They are declared as instances of an overloaded spacer identifier which are selected between depending on their type. For the butterfly combinator the spacer component can



Figure 6.12: 6-bit 2-sorter circuit (columns of gr_lut and eq_lut comparator cells driving pairs of multiplexers)

be placed anywhere within the rcomp parameter composition since the type is polymorphic; however, the logical place to put the spacer in order to leave room between each butterfly stage is next to the map instantiation. It can then be given the desired width and any height (less than the expected height of the map instantiation) and the series composition layout will ensure that this space is left free.
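The way a spacer reserves area can be illustrated with a toy size model (our own, greatly simplified from Quartz's size inference): a block carries a footprint, series composition lays blocks side by side so widths add and the height is the maximum, and a spacer is an identity with a non-zero size:

```python
class Block:
    """Toy layout model: a behaviour plus a (width, height) footprint."""
    def __init__(self, f, w, h):
        self.f, self.w, self.h = f, w, h

    def __rshift__(self, other):
        # a >> b models series composition `a ; b`: laid out side by side,
        # widths add and the composite is as tall as its tallest part
        return Block(lambda x: other.f(self.f(x)),
                     self.w + other.w, max(self.h, other.h))

def spacer(w, h):
    # functionally the identity, but reserving a w x h area of empty space
    return Block(lambda x: x, w, h)

stage = Block(lambda x: x + 1, 2, 4) >> spacer(3, 1)
print(stage.f(10), stage.w, stage.h)  # 11 5 4
```

The spacer changes the footprint of the composition without changing its behaviour, which is exactly the property used between butterfly stages.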

6.4.2 Implementing a bitonic merger

The butterfly circuit we evaluate is a network of 2-sorters. This is a bitonic merger circuit which merges together two sorted lists. The merger is bitonic because the order of the input lists must be opposed – i.e. if one is ascending then the other must be descending or vice versa.

We design a 2-sorter circuit which operates on n-bit data values and lay it out as a 4 × n block as shown in Figure 6.12. The first two columns are a comparator which outputs a control signal to the multiplexers to select the maximum and minimum values.
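Behaviourally, the comparator-plus-multiplexer structure reduces to the following (a Python sketch mirroring the structure of Figure 6.12; the code itself is our illustration, not the Quartz description):

```python
def two_sorter(a, b):
    """A comparator drives two multiplexer columns that steer the
    minimum and maximum of the two inputs to the two outputs."""
    swap = a > b           # comparator (the gr_lut/eq_lut columns)
    mn = b if swap else a  # mux column selecting the minimum
    mx = a if swap else b  # mux column selecting the maximum
    return mn, mx

print(two_sorter(7, 3))  # (3, 7)
```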

The butterfly sorting network can be pipelined by inserting registers between each stage, replacing the R block by R ; D. We can state the correctness of a pipelining arrangement with the following theorem:

Theorem 24 R ; D = D ; R ⇒ butterfly n R = butterfly n (R ; D) ; D⁻ⁿ

Proof This requires a lemma about repeated series composition:



block register (wire clk) (block R ‘a ∼ (wire)) (‘a i) ∼ (wire o)
attributes { width = 1. height = 1. }
{
  wire o2.
  assert (height(i ; R ; o2)



                        Slices   Util.   t-PAR (s)   Max freq. (MHz)   Pwr (mW)
4-bit data
  Unpipelined/Auto      2559     50%     16          25.1              -
  Unpipelined/Placed    1439     28%     6           27.2              -
  Pipelined/Auto        2559     50%     19          76.1              252
  Pipelined/Placed      1439     28%     11          65.5              360
6-bit data
  Unpipelined/Auto      3874     76%     37          18.6              -
  Unpipelined/Placed    2114     41%     12          19.7              -
  Pipelined/Auto        3887     76%     36          70.4              432
  Pipelined/Placed      2127     42%     14          67.6              396
8-bit data
  Unpipelined/Auto      5118     100%    69          10.0              -
  Unpipelined/Placed    2778     54%     19          17.6              -
  Pipelined/Auto        5118     100%    74          36.4              648
  Pipelined/Placed      2812     55%     22          53.8              516
10-bit data
  Unpipelined/Auto      -        >100%   -           -                 -
  Unpipelined/Placed    3453     67%     19          15.8              -
  Pipelined/Auto        -        >100%   -           -                 -
  Pipelined/Placed      3496     68%     18          57.4              624

Table 6.6: Results for 64-input bitonic merger circuits

the relationship:

Theorem 26 R : α ∼ wire ⇒ R ; D = register R

6.4.3 Results

We generate a bitonic merger circuit that merges two sorted lists of 32 numbers, pipelined and unpipelined, for four different bit widths. Two of these circuits are placed on the Virtex-II chip.

Table 6.6 shows the results for these circuits. The placed version takes up significantly fewer logic resources on the device than the unplaced version. In fact, for the 10-bit data width the synthesis tools report that the unplaced design cannot actually be mapped onto the device, while the placed version uses less than 70% of the available resources.

The picture for our other metrics is less clear: although manual placement clearly produces significant improvements in maximum clock frequency for unpipelined designs, the same is not always the case for the pipelined versions. The same mixed picture is observed with power consumption figures, with the manually placed designs consuming 20% less power for the 8-bit variant but 43% more power for the 4-bit variant.

Figure 6.14: A 4-stage binomial filter

Once again it appears that the level of device utilisation heavily influences the effectiveness of manual placement, with the automatic placement producing better results for designs that only utilise a small part of the device while for larger circuits manual placement has the advantage.

6.5 Binomial Filter

A binomial filter is a simple digital signal processing circuit that we can easily describe using our framework.

6.5.1 Circuit Design

An n-stage binomial filter is composed of n adders and delay elements arranged as shown in Figure 6.14. We could implement this circuit using the ripple adder from Section 6.2 and implement a word-level pipeline by placing additional registers between each stage.

We will however use an alternative implementation which allows us to pipeline the design at the bit level. This involves abandoning the Virtex carry chain circuitry, which is designed to implement fast, un-pipelined carry chains. We describe a new full adder component with size 1 × 2 that uses two function generators to generate the sum and carry-out outputs given the three input signals for each stage. Carry signals can still propagate vertically through a column of these full adders and the sum output can be connected to the input of the next stage. We use the fork wiring block to copy the output of the previous stage and the fst combinator to map registers along only one input to the next adder. These registers have size 1 × 1 and so, in order to place them correctly aligned with the full adders, which have size



block lift (int n, block R ‘a ∼ ‘b) (‘a i) ∼ (‘b o) {
  i ; R ; o at (0, n).
}

Figure 6.15: The lift combinator can be used to place blocks at a certain y-offset

                        Slices   Util.   t-PAR (s)   Max freq. (MHz)   Pwr (mW)
24-bit data
  Unpipelined/Auto      3074     60%     29          15.2              -
  Unpipelined/Placed    3111     61%     11          17.2              -
  Pipelined/Auto        3817     75%     37          153.2             648
  Pipelined/Placed      3114     61%     16          99.7              588
26-bit data
  Unpipelined/Auto      3336     65%     31          15.0              -
  Unpipelined/Placed    3370     66%     12          17.4              -
  Pipelined/Auto        4140     81%     38          146.0             684
  Pipelined/Placed      3373     66%     19          103.8             612
28-bit data
  Unpipelined/Auto      3597     70%     34          13.9              -
  Unpipelined/Placed    3629     71%     13          16.8              -
  Pipelined/Auto        4463     87%     46          120.3             732
  Pipelined/Placed      3632     71%     17          73.5              636
32-bit data
  Unpipelined/Auto      4128     81%     38          13.4              -
  Unpipelined/Placed    4150     81%     14          15.4              -
  Pipelined/Auto        5108     100%    61          137.1             768
  Pipelined/Placed      4149     81%     14          96.0              756

Table 6.7: Results for 32-stage binomial filter circuits

1 × 2, we use the lift combinator, shown in Figure 6.15. lift instantiates a block at a certain y-offset, leaving empty space underneath it.
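Over streams of samples the structure of Figure 6.14 has a compact behavioural reading. The Python sketch below (our own model, with D as a one-cycle delay initialised to zero) shows that the impulse response of an n-stage filter is a row of Pascal's triangle, which is what makes the filter binomial:

```python
def delay(xs, init=0):
    """D: a one-cycle register, modelled on a finite stream."""
    return [init] + xs[:-1]

def binomial_filter(n, xs):
    """n stages, each adding the stream to a one-cycle-delayed copy
    of itself, as in Figure 6.14 (where n = 4)."""
    for _ in range(n):
        xs = [a + b for a, b in zip(xs, delay(xs))]
    return xs

# Impulse response of the 4-stage filter: binomial coefficients 1 4 6 4 1
print(binomial_filter(4, [1, 0, 0, 0, 0, 0]))  # [1, 4, 6, 4, 1, 0]
```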

6.5.2 Results

We compile a 32-stage binomial filter for several different bitwidths and implement two on the Virtex-II device. We evaluate versions without any pipelining and with bit-level pipelining, with no input/output synchronisation.

The results for the variants of this circuit, shown in Table 6.7, are particularly interesting. Firstly, this is the first circuit where the manual placement has not always led to a denser logic mapping than the automatic algorithms, with several manually placed versions actually requiring marginally more slices than their automatically placed equivalents. This only applies for the unpipelined circuits and is not surprising, because we have deliberately specified a less-dense packing for this circuit, using the lift block to align registers with the adders and thus using only one out of each two slice flip-flops in these columns. In the unpipelined circuits we are also not utilising the in-slice flip-flops of the adder columns, which could be used to implement the coefficient delay. If we had specified a denser packing for the unpipelined circuit then we would use b × n fewer slices (where b is the number of bits and n the number of stages) – saving over 1000 slices for the 32-bit, 32-stage circuit.

The manually placed designs have better maximum clock frequencies than the automatically placed ones when there is no pipelining and significantly worse performance when pipelined. For example, the placed 32-bit filter runs 14% faster than the unplaced version when unpipelined, but 30% slower when pipelined.

The clock frequency result is important, since it indicates that simulated annealing has achieved better results for the pipelined circuit regardless of the circuit size. The difference between the placed and unplaced circuits does nonetheless differ depending on the circuit size, with the placed circuit 35% slower for 24-bit data but 30% slower for the larger 32-bit data variant.

This result is consistent with previous research [77] which indicated that placed adder circuits not employing the carry chain were outperformed by circuits placed using simulated annealing. It shows that without the vertical placement constraint enforced by use of the carry chain, simulated annealing can place cells where it likes and find high-speed paths between cells which humans would probably not have considered. In the absence of other constraints, simulated annealing can therefore find irregular layouts which are better than what a human would consider reasonable.

Interestingly, although for the pipelined circuits the manually placed variants have lower maximum clock frequencies, when run at the same clock frequency they consume between 1.5% and 13.1% less power. This indicates that manual placement has a role to play, even in circuits such as this one where it leads to a decreased maximum potential performance, by specifying a denser logic mapping and reducing circuit power consumption. If a 24-bit filter that can run at 50 MHz is desired then the placed circuit is clearly superior – it will compile quicker, use less logic area and consume significantly less power while running at that speed.



6.6 Matrix Multiplier

Our final circuit example is a matrix multiplier circuit. Matrix multiplication is a simple operation that is used in many scientific computing applications as well as branches of digital signal processing. The multiplication of two matrices requires a large number of multiplication and addition operations and there is considerable potential for parallelisation, making a hardware implementation attractive.

Band matrix multipliers have previously been described as systolic grid-shaped circuits using Ruby [44, 68]. These descriptions were difficult to relate to a simple specification of matrix multiplication as a set of multiply-accumulate operations. We will describe our system in a clearer manner using a new combinator that describes 3D circuits.

We are motivated to create new higher-dimensional combinators by the realisation that a combinator with a certain dimension is appropriate for processing data with a particular dimension. For example, a one-dimensional array such as a row can process one-dimensional data, while a grid can process two-dimensional data (or two one-dimensional data streams). However, the confusion with mapping an operation such as matrix multiplication onto a grid arises from the fact that the input is two two-dimensional data sources and the output is two-dimensional, while the grid itself only has four one-dimensional interface points in its domain and range. If we can describe higher-dimensional combinators, such as cubical structures, then potentially we can describe circuits that operate on this kind of data much more clearly.

An important point to note is that we are talking about multi-dimensional circuit descriptions, which are not the same as multi-dimensional circuits. Three-dimensional FPGAs have been proposed [2, 17, 40, 51]; however, four-dimensional and higher-dimensional FPGAs pose interesting implementation difficulties. An alternative is to attempt to realise higher-dimensional FPGA circuitry on standard two-dimensional silicon. Schmit proposes drawing on three- and four-dimensional topologies to increase the wiring density on standard two-dimensional silicon [72]. What we propose is designing combinators which describe multi-dimensional circuits but which we expect to realise on two-dimensional silicon.

Any higher-dimensional array can be flattened onto a 2D grid in a number of ways. We propose higher-dimensional combinators that have an implicit layout interpretation on the



Figure 6.16: Representing three dimensional blocks: (a) domain & range convention; (b) dimensions (the x, y and z axes)

2D FPGA but can also be given alternative layout interpretations through manipulation using correctness-preserving transformations.

6.6.1 A 3D “cube” Combinator

The standard layout interpretation of a Quartz block is as a four-sided tile, with two sides assigned to the domain and two to the range. The assignment of sides to domain and range and the division into two sides is a convention only. It is important to realise that any convention is sufficient providing it is consistently applied, and we can bear this in mind when choosing a convention for visualising the three-dimensional blocks which make up a cubical circuit.

Figure 6.16(a) illustrates how we divide the six sides of a cube into a block's domain and range. Visualised in this way the block's top, back and left sides form the domain while the front, bottom and right form the range. We describe block domains and ranges as tuples of different dimensions, so the domain is described as a tuple of (xs, ys, zs) while the range is a tuple of (zs, ys, xs). We use the signal xs to describe signals that are travelling along the x-axis, ys to mean the signals travelling along the y-axis, etc. Note that we have extended the convention of reversing the order of the sides from the 2D case to this 3D case. In the n-dimensional case a block's range dimensions should always be expressed in reverse order from its domain dimensions.

Each side of the cube is itself a two-dimensional array of values. This requires some convention for assigning dimensions, which we do as shown in Figure 6.16(b). This means that the domain and range signals are assigned the dimensions of: xs[z][y], ys[z][x] and zs[y][x]. This



block cube (int x, int y, int z, block R (‘a, ‘b, ‘c) ∼ (‘c, ‘b, ‘a))
    (‘a x_d[z][y], ‘b y_d[z][x], ‘c z_d[y][x]) ∼
    (‘c z_r[y][x], ‘b y_r[z][x], ‘a x_r[z][y]) {
  ‘a xs[x+1][z][y]. ‘b ys[y+1][z][x]. ‘c zs[z+1][y][x].
  int ix, iy, iz.
  xs[0] = x_d. ys[y] = y_d. zs[z] = z_d.
  x_r = xs[x]. y_r = ys[0]. z_r = zs[0].
  for ix = 0..x−1 {
    for iy = 0..y−1 {
      for iz = 0..z−1 {
        (xs[ix][iz][iy], ys[iy+1][iz][ix], zs[iz+1][iy][ix]) ;
        R ;
        (zs[iz][iy][ix], ys[iy][iz][ix], xs[ix+1][iz][iy]).
      } .
    } .
  } .
}

Figure 6.17: cube combinator defined iteratively with explicit internal signals

generalises easily to the n-dimensional case where extra dimensions can simply be added.

We can describe a cube combinator iteratively, as shown in Figure 6.17, in terms of explicit internal signals. Multiple copies of the R block are instantiated and connected to internal signals xs, ys and zs that hold values flowing along the x-axis, y-axis and z-axis respectively. This definition uses three different internal signals rather than declaring a single vector of internal signals with an extra dimension; this is because the signals flowing along each axis can have different types, as illustrated in the type signature for the R block and the cube block itself.
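The dataflow implied by the indexing in Figure 6.17 can be checked with a small Python model (our own encoding; signals become nested lists, and the y and z loops run in reverse so that every input to R is computed before it is consumed, sequentialising what is concurrent dataflow in hardware):

```python
def cube(x, y, z, R, x_d, y_d, z_d):
    """Model of the cube combinator.  R maps (a, b, c) to (c2, b2, a2);
    x values flow in ascending x while y and z values flow from index
    y (resp. z) down to 0, matching xs[0] = x_d, ys[y] = y_d, zs[z] = z_d."""
    xs = [[[None] * y for _ in range(z)] for _ in range(x + 1)]
    ys = [[[None] * x for _ in range(z)] for _ in range(y + 1)]
    zs = [[[None] * x for _ in range(y)] for _ in range(z + 1)]
    xs[0], ys[y], zs[z] = x_d, y_d, z_d
    for ix in range(x):
        for iy in reversed(range(y)):      # reverse order sequentialises
            for iz in reversed(range(z)):  # the y/z dataflow correctly
                c2, b2, a2 = R(xs[ix][iz][iy], ys[iy + 1][iz][ix],
                               zs[iz + 1][iy][ix])
                zs[iz][iy][ix] = c2
                ys[iy][iz][ix] = b2
                xs[ix + 1][iz][iy] = a2
    return zs[0], ys[0], xs[x]

# A multiply-accumulate cell: the z signal accumulates a * b
mac = lambda a, b, c: (c + a * b, b, a)
print(cube(1, 1, 1, mac, [[2]], [[3]], [[10]]))  # ([[16]], [[3]], [[2]])
```

A pure routing cell, lambda a, b, c: (c, b, a), passes every face straight through, which is a quick way to confirm the wiring convention.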

This interpretation of a cube also identifies the final important element of our convention – that the (0, 0, 0) point is located at the front, left of a cubical structure. This is reflected in the indexes chosen to connect to the R block in the loop. While this is a short description it is quite complex and has no particularly obvious layout interpretation.

We can envisage a cube as a 1D array of 2D arrays. We exploit this to represent a cube as a column of grids. This kind of description immediately gives the cubical circuit a 2D layout interpretation – as grids laid out vertically on top of one another. Figure 6.18 illustrates the wiring involved in this kind of arrangement. In this example two grids are placed on top of



Figure 6.18: A cubical circuit can be viewed as a column of grids

each other; however, the wiring is somewhat complex as the z-dimension connections need to be routed through the x and y connections between the two grids.

We can do this by “folding” the extra-dimensional signals into a tuple with one of the “standard” grid dimensions and extracting them as needed. This can be done using the zip n,m language construct, which converts an n-tuple of vectors into an m-dimensional vector of tuples. A pre-requisite for this operation is that the vectors are the same size in the dimensions that are being zipped, and it turns out that there is only one valid way of carrying this out.
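For the pair case used here, zip 2 and its converse can be modelled directly (a Python sketch; the names zip2/unzip2 are ours):

```python
def zip2(va, vb):
    """zip_2: a pair of equal-sized vectors becomes one vector of pairs."""
    assert len(va) == len(vb)  # the pre-requisite noted above
    return [(a, b) for a, b in zip(va, vb)]

def unzip2(vp):
    """converse (zip_2): a vector of pairs back into a pair of vectors."""
    return [a for a, _ in vp], [b for _, b in vp]

print(zip2([1, 2], ['x', 'y']))      # [(1, 'x'), (2, 'y')]
print(unzip2([(1, 'x'), (2, 'y')]))  # ([1, 2], ['x', 'y'])
```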

Figure 6.19 shows how a cube combinator can be written in terms of col and grid. Note the re-arrangement of the domain tuple into a pair of a tuple of (x_d, y_d) and z_d, while the same has been done to the range tuple. Pairs are actually extremely powerful mechanisms: they allow us to split signals into a part to perform operations on and a part to ignore. The conversion of the 3-tuple of domain/range signals into a pair allows the use of the fst and snd blocks to control the applications of the zip 2 block; we will show how this transformation can be included in the description shortly.



block cube (int x, int y, int z, block R (‘a, ‘b, ‘c) ∼ (‘c, ‘b, ‘a))
    (‘a x_d[z][y], ‘b y_d[z][x], ‘c z_d[y][x]) ∼
    (‘c z_r[y][x], ‘b y_r[z][x], ‘a x_r[z][y]) {
  ((x_d, y_d), z_d) ;
  fst (zip 2) ;
  col (z, swap ; rsh ; fst (zip 2) ;
       grid (x, y, cube_cell (x, R)) ;
       snd (converse (zip 2)) ; rsh ; fst swap ; lsh) ;
  snd (converse (zip 2)) ;
  (z_r, (y_r, x_r)).
}

Figure 6.19: A Quartz description for cube as a column of grids

block cube_cell (int n, block R (‘a, ‘b, ‘c) ∼ (‘c, ‘b, ‘a))
    ((‘c z[n], ‘a x), ‘b y) ∼ (‘b y2, (‘c z2[n], ‘a x2)) {
  ((x, y), z) ;
  snd (converse (apl (n−1))) ; rsh ;
  fst (tplapr 2 ; R ; converse (tplapl 2) ; swap) ;
  lsh ; snd (swap ; apr (n−1)) ;
  ((y2, x2), z2).
}

Figure 6.20: The cube_cell re-wiring block

This cube combinator creates a grid not of R blocks but of cube_cell blocks. This is a wiring block with a description shown in Figure 6.20. It splits the full z vector that has been “packaged” up with the x vector and extracts the signal that is for this element, connects it to R, and then appends the z output of R back into the z vector. The whole operation rotates the z vector as the value is extracted from the left and appended to the right, meaning that the next block will extract the correct (different) z signal and by the end of the row the resulting z signals are all neatly packed into the vector. Figure 6.21 illustrates this structure.
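The rotation described above can be modelled as follows (a Python sketch; the list-based encoding of the ((z, x), y) interface is our own):

```python
def cube_cell(R, zx, y):
    """One grid cell: take the head of the packaged z vector, feed it
    to R together with the x and y inputs, and append R's z output on
    the right, so the vector is rotated one position per cell."""
    z_vec, x = zx
    z_head, z_rest = z_vec[0], z_vec[1:]
    z2, y2, x2 = R(x, y, z_head)      # R : (a, b, c) -> (c2, b2, a2)
    return y2, (z_rest + [z2], x2)

# With a pure routing R the packaged z vector is simply rotated left:
route = lambda a, b, c: (c, b, a)
print(cube_cell(route, ([1, 2, 3], 'x'), 'y'))  # ('y', ([2, 3, 1], 'x'))
```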

6.6.2 Describing N-dimensional Combinators

The description in Figure 6.20 uses two new language constructs: tplapl and tplapr. These are tuple append blocks that function in much the same way as apl and apr do for vectors, adding or removing elements from the left or right of a tuple. These must be defined as language constructs that are statically parameterised because otherwise they are not valid within the Quartz type system, which requires that tuples are fixed-arity data structures. We



Figure 6.21: The cube_cell wiring block

have implemented tplapl and tplapr using the Quartz compiler infrastructure for defining new experimental language constructs.

The tuple-append operations do exhibit some interesting behaviour because of the way they interact with other aspects of the Quartz type system, in particular the typing rule which treats a singleton as equivalent to a single-element tuple. For example, the operation tplapl 1 or tplapr 1 applied to a pair leaves the pair unchanged: the effect is to append the left or right element to the single-element tuple of the other element, forming a new pair within the original tuple, which is then eliminated because it only contains a single element (the pair). For tplapl the process looks a little like:

(a, b) −→ ((a), b) −→ ((a, b)) −→ (a, b)
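This singleton behaviour can be mimicked with ordinary tuples (a Python sketch; tplapl_inv models the decomposition direction tplapl⁻¹, and both names are ours):

```python
def tplapl_inv(t):
    """Decompose a tuple into (leftmost element, rest); a one-element
    rest collapses to the element itself, mirroring the Quartz rule
    that a singleton is equivalent to a single-element tuple."""
    head, rest = t[0], t[1:]
    return (head, rest[0] if len(rest) == 1 else rest)

def tplapl(p):
    """Re-attach a leftmost element to a tuple (the converse direction)."""
    head, rest = p
    rest = rest if isinstance(rest, tuple) else (rest,)
    return (head,) + rest

print(tplapl_inv((1, 2, 3)))  # (1, (2, 3))
print(tplapl_inv((1, 2)))     # (1, 2)  -- a pair is left unchanged
print(tplapl((1, (2, 3))))    # (1, 2, 3)
```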

The tuple-append operators allow us to decompose an n-tuple into a pair of the leftmost or rightmost element and the rest of the tuple. Once this has been done we can use standard operations on pairs to manipulate the tuple. They are vital in allowing us to generalise the 3D combinator we have developed in this section into an n-dimensional meta-combinator, nd. We describe this as a meta-combinator because it is not itself a valid Quartz combinator, since it is parameterised in its number of dimensions and utilises tuples of parameterisable lengths, something that is not valid in the Quartz type system. Each possible instance is a valid combinator but must be described individually. It has been designed to use point-free wiring constructs in order to achieve its generality and thus could be a valid construct in the untyped Ruby calculus. The limiting influence of type systems in specification languages has

untyped Ruby calculus. The limiting influence <strong>of</strong> type systems in specification languages has


CHAPTER 6. LAYOUT CASE STUDIES 157<br />

nd2 (i1, i2) R = coli2 (rowi1 R)<br />

ndn (i1, . . .,in) R = tplaprn−1 −1 ; fst (zip n−1 ) ; colin<br />

swap ; snd tplapln−2 −1 ; rsh ; fst (zip 2,n−2 ) ;<br />

tplapln−2 ; ndn−1 (i1, . . .,in−1) (ndcelln (x, R)) ;<br />

tplaprn−2 −1 −1<br />

; snd(zip2,n−2 ; swap) ; rsh ;<br />

<br />

swap ; snd tplaprn−2 ;<br />

snd(zip n−1 −1 ) ; tplapln−1<br />

ndcelln (m, R) = tplapln−2 −1 ; fstswap ; lsh ; sndswap ; rsh ; fst tplapln−2 ;<br />

been much discussed [38].<br />

snd(aplm−1 −1 ) ; tplapl2 ; tplapr2 −1 ;<br />

fst (tplaprn−1 ; R ; tplaprn−1 −1 ; swap) ;<br />

lsh ; snd(swap ; aprm−1) ; fst tplaprn−2 −1 ;<br />

lsh ; sndswap ; tplaprn−2<br />

Figure 6.22: Description <strong>of</strong> an n-dimensional meta-combinator<br />

Figure 6.22 illustrates our n-dimensional combinator description. This combinator is defined<br />

recursively <strong>with</strong> a base case <strong>of</strong> n = 2 (grid ). For n = 3 the description <strong>of</strong> nd simplifies to<br />

that <strong>of</strong> cube and ndcell simplifies to the description <strong>of</strong> cube cell.<br />

The definition of an n-dimensional combinator description raises the tantalising possibility of being able to prove theorems for all higher-dimensional combinators as a single theorem. For example, we could prove a theorem that could totally pipeline an n-dimensional array, taking the pipelining theorem for a grid as the base case. Alternatively, we could investigate how to serialise [45] higher-dimensional descriptions into lower-dimensional ones (e.g. converting a 4D description into 3D, 2D and 1D equivalents with appropriate multiplexers to manipulate the input signals).

Unfortunately, the descriptions of nd and ndcell themselves are complex and fiddly, even though most individual steps are just describing simple wiring re-arrangements. Since most of the commands are simple wiring they are easy to reason about; for example, tplapl and tplapr are timeless and thus registers can be moved through them easily. The complex description does seem to suggest that it could be a candidate for employing mechanised theorem proving.


[Figure 6.23: Implementing a 3D matrix multiplier. (a) Matrix multiplier arrangement: Matrix 1, Matrix 2 and an empty matrix enter the cube and the result matrix emerges. (b) Functional cell: mult and add combine x_in, y_in and z_in to produce z_out.]

N-dimensional circuits have many potential uses in describing operations on multi-dimensional data and have already been discussed for some applications [42, 43]. They could potentially be used to provide a quick route for translating imperative for-loop algorithms into hardware, extending existing work on mapping nested loop algorithms into multi-dimensional arrays [39].

In this work we shall concentrate on describing a single circuit, a matrix multiplier, using the three-dimensional cube combinator.

6.6.3 A 3D Matrix Multiplier

A cubical circuit description can be used to combine two-dimensional data and generate a two-dimensional output. This is ideal for describing matrix multiplication, as we can describe a circuit which has data from the two source matrices moving unchanged through the array while the output matrix is accumulated along a different axis. Figure 6.23(a) illustrates this arrangement.

The cubical circuit can be made up of cells that function as shown in Figure 6.23(b). Figure 6.24 shows the Quartz description for the matrix multiplier. Note that the first matrix must be transposed in order to correctly arrange the elements of the matrix for the circuit; nevertheless, this is an extremely simple description of multiplying a y × z matrix and a z × x matrix to produce a y × x matrix as output.
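The accumulation scheme of Figure 6.23(a) can be sketched in software. The following Python model is an illustration of the data movement only, not the generated hardware: each cell at position (i, j, k) computes z_out = x_in · y_in + z_in, as in Figure 6.23(b), and passes x and y through unchanged, so the result accumulates along the z axis.

```python
def cube_matmult(mat1, mat2):
    """Multiply a y*z matrix by a z*x matrix by sweeping an accumulator
    along the z axis of a conceptual cube of functional cells."""
    y, z = len(mat1), len(mat1[0])
    x = len(mat2[0])
    result = [[0] * x for _ in range(y)]   # the 'empty matrix' input
    for i in range(y):
        for j in range(x):
            for k in range(z):             # z_out = x_in * y_in + z_in at each cell
                result[i][j] += mat1[i][k] * mat2[k][j]
    return result
```

The three nested loops correspond to the three axes of the cube; the hardware evaluates all cells concurrently, of course, but the accumulated values are the same.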


block matmultcell (int n) (wire x_in[n], wire y_in[n], wire z_in[n]) ∼
                          (wire z_out[n], wire y_out[n], wire x_out[n]) {
    x_out = x_in.
    y_out = y_in.
    ((x_in, y_in), z_in) ; fst (mult n) ; add n ; z_out at (0,0).
}

block matmult (int bits) (int x, int y, int z)
              (wire mat1[y][z][bits], wire mat2[z][x][bits]) ∼ (wire mat3[y][x][bits]) {
    wire emptymat[y][x][bits].
    wire mat_trans[z][y][bits].
    int i, j, k.
    for i = 0..y−1 {
        for j = 0..x−1 {
            for k = 0..bits−1 { emptymat[i][j][k] = false. } . } . } .
    mat1 ; word_transpose bits (y, z) ; mat_trans at (0,0).
    (mat_trans, mat2, emptymat) ;
        cube (x, y, z, matmultcell bits) ;
        converse (tplapl 2) ;
        pi1 ;
        mat3 at (0,0).
}

Figure 6.24: Quartz description of the 3D matrix multiplier

In the implementation of the functional cell we use the column ripple adder developed in Section 6.2 and the placed parallel multiplier we described in Chapter 5.

We can pipeline this circuit by inserting registers on the z data path, since this is the accumulator path. We can derive a general pipelining arrangement for nd using a retiming theorem for a column:

Theorem 27: col_n R = fst (˜n D) ; col_n (R ; D) ; [D^{-n}, ˜n D^{-1}]

We can apply this to the n-dimensional combinator description to give a theorem for the one-dimensional pipelining of an n-dimensional structure:

Theorem 28:
nd_n R = tplapr_{n-1}^{-1} ; fst (zip_{n-1} ; ˜i_n D) ; tplapr_{n-1} ;
         nd_n (R ; tplapl_{n-1}^{-1} ; fst D ; tplapl_{n-1}) ;
         tplapl_{n-1}^{-1} ; [D^{-i_n}, zip_{n-1} ; ˜i_n D^{-1} ; zip_{n-1}^{-1}] ; tplapl_{n-1}


Proof: By moving the delay out of the instantiation of cube cell and exploiting the timeless property of wiring blocks to re-organise.³

We can use a particular instance of this theorem for the cube and manipulate it so the resulting circuit is implementable, to give:

Theorem 29: cube_{x,y,z} R = [˜z D, ˜z D, id] ; cube_{x,y,z} (R ; [D, id, id]) ; [id, ˜z D, ˜z D]

This theorem can then be used to generate synchronisation registers for the interfaces of the matrix multiplier when we pipeline it.
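The intuition behind these pipelining theorems, that per-cell registers plus synchronisation registers on the inputs and outputs preserve the computed function up to latency, can be checked on a one-dimensional analogue. The sketch below is a simplification of our own using plain addition cells, not the Quartz proof: it simulates a column of n accumulator cells with one register after each cell and an i-stage input delay on lane i, so the sum of the vector fed at tick t emerges at tick t + n - 1.

```python
def pipelined_column(vectors):
    """Simulate a pipelined accumulator column.

    vectors: one input vector per clock tick, each of length n.
    Lane i is delayed by i synchronisation registers; each cell adds
    its (delayed) input to the registered partial sum from the cell
    below it.  Returns the stream of values on the final register;
    sum(vectors[t]) appears at tick t + n - 1.
    """
    n = len(vectors[0])
    regs = [0] * n
    outputs = []
    for t in range(len(vectors) + n):      # run long enough to flush
        new = []
        for i in range(n):
            # lane i sees element i of the vector fed t-i ticks ago
            src = vectors[t - i][i] if 0 <= t - i < len(vectors) else 0
            prev = regs[i - 1] if i > 0 else 0   # registered partial sum
            new.append(prev + src)
        regs = new
        outputs.append(regs[-1])
    return outputs
```

For vectors [[1, 2, 3], [4, 5, 6], [7, 8, 9]] the sums 6, 15 and 24 appear at ticks 2, 3 and 4, matching the unpipelined results with a latency of n - 1: the input skew plays the role of the synchronisation registers introduced by Theorem 29.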

We undertake a layout verification of the cubical matrix multiplier using the previous correctness proofs for col, grid, the multiplier and the ripple adder. In the cube block we need to add further explicit type annotations to eliminate some type unknowns, because the Isabelle/HOL theory of zip is not sufficiently detailed to describe the full types. This indicates that a full theory of zip (and indeed tplapl and tplapr) is required, and not just the simple approximation contained in the theory Inbuilt, even though the inbuilt blocks do not affect layout. Otherwise, proofs are mostly automatic, with some intervention required for series compositions and for expanding the definition of word_transpose. Appendix D.3 gives the full Quartz description and some proofs for the matrix multiplier circuit.

6.6.4 Results

We generate a circuit that multiplies two 2×2 matrices together to produce a 2×2 matrix as output and evaluate two of these components on the Virtex-II. Table 6.8 shows the results for this circuit. Power consumption was measured at a clock frequency of 16.625 MHz.

³Actually, proving that the D element can be moved out of cube cell is a difficult and fiddly proof; however, cube cell is designed to allow this to take place, so we will gloss over it at this point.

                     Slices   Util.   t-PAR (s)   Max freq. (MHz)   Pwr (mW)
Unpipelined/Auto      1568     31%       14            23.7            276
Unpipelined/Placed    1413     28%       12            23.0            108
Pipelined/Auto        1542     30%       16            30.7            192
Pipelined/Placed      1476     29%       12            28.9            204

Table 6.8: Results for matrix multiplier circuit

As can be seen, the placed version is out-performed by the automatically placed version for both the pipelined and unpipelined variants in terms of maximum clock frequency, although the percentage difference is small. The power consumption figures present a confusing picture, with no clear trends emerging. The pipelined, placed version actually consumes more power than the unpipelined version, which is unexpected since previous results have shown pipelining reduces power consumption for many kinds of circuits [85]. However, the overall power consumption of the circuit is so low that it is possible that any real effects are being overwhelmed by noise. Because of the shape of the placed matrix multiplier it is not possible to fit more onto the Virtex device; however, by increasing the pipelining (by pipelining the multipliers themselves, for example) the design could be run and evaluated at a higher clock frequency.

The placed version does place and route faster than the unplaced circuit and consumes fewer resources on the chip, so it could still be superior in situations where the small difference in maximum clock frequency is not significant.

It is possible that an alternative layout for the cube combinator would produce better results. With the z signals being used for the accumulator data path, there are long wires between the respective elements of each grid.

6.7 Evaluation and Conclusions

For the five example designs in this chapter we have seen a range of results for our four evaluation metrics: logic area, place and route time, maximum clock frequency and power consumption.

An important realisation is that manual placement is not an optimisation method per se but rather a way of exerting more control over the compilation process. The way in which that control is exercised determines whether the circuits generated are better or worse in some way than those that would have been generated automatically.


One invariant we have seen across all circuit examples is that manually placing designs significantly reduces the time taken for the place and route stage of the compilation process to execute, with the reduction ranging from 14% for the unpipelined matrix multiplier to 92% for the unpipelined adder tree. This result is not overly surprising, since the place and route stage has been reduced to just routing; however, it is beneficial to confirm it, since it could have been the case that the denser placements specified by the user constraints would increase the routing time by more than is saved by avoiding automatic placement.

Another fairly firm conclusion is that in almost all cases⁴ manually placed designs require less logic area than automatically placed ones. The logic mapping specified by the manual constraints is denser than that used by the automatic tools, and reductions in area of 40% are commonly achieved, with a maximum of 61% area reduction observed for the unpipelined adder tree. In one case the manual mapping for a butterfly circuit used less than 70% of the device resources while the automatic mapping and placement was unable to fit the same circuit onto the device. Manual placement is clearly significantly superior here.

The effectiveness of manual placement in positively influencing maximum clock frequency depends on other constraints. Given a homogeneous environment, simulated annealing is able to generate circuits with equivalent or better performance by discovering high-speed routing paths between cells that humans would not consider sensible: for example, the placed 24-bit pipelined binomial filter circuit is 35% slower than the automatically placed version. However, when other constraints are affecting the layout, simulated annealing does not perform so well and the regular layout constraints can produce significant performance gains.

The kinds of constraints that affect simulated annealing appear to be use of the fast carry chain circuitry, which forces some cells to be laid out vertically, and the level of device utilisation, which reduces the ability of the placer to find high-speed routes through less densely packed logic. For pipelined bitonic merger butterfly networks, the 4-bit manually placed circuit utilises only 28% of the device and runs 14% slower than the automatically placed version; however, the 8-bit version utilises 55% of the device and runs 48% faster than the automatically placed version.

Manual placement often produces better results than simulated annealing for unpipelined circuits, where the maximum clock frequency is already much lower than for pipelined ones. This is not unexpected, since wiring delays will accumulate in the same way as logic propagation delays in unpipelined circuits.

⁴The exception to this that we observed was the binomial filter circuit, where we deliberately specified a less dense logic mapping in order to achieve a better-aligned layout.

Generally, manual placement appears to lead to reduced power consumption, with reductions of up to 40% possible (for the pipelined adder trees). In general, power consumption can be reduced even if the maximum clock frequency of the placed design is lower than that of the automatically placed circuit. For the binomial filter, power savings of 2-13% were observed even though the placed circuits had lower maximum clock frequencies. In the case of the butterfly network a correlation was once again observed with device utilisation/circuit size, with the 4-bit circuit consuming more power when placed although the 8-bit circuit consumed less.

6.8 Summary

We have demonstrated our layout framework with a variety of real circuits, including a matrix multiplier described with a new type of higher-dimensional combinator, a binomial filter, a butterfly network and a median filter. We have demonstrated how functional reasoning can be used to derive pipelined versions, while the layout framework can be used to verify layouts. We have found that in many, though not all, cases manually placed designs outperform automatically placed circuits, with higher maximum operating frequencies, lower device utilisation, lower power consumption and a faster place and route process.


Chapter 7

Conclusion and Future Work

In this final chapter we review the work reported in this thesis, its contribution and its potential for future development.

7.1 This Thesis' Contribution

In this thesis we have described the design, implementation and applications of a framework for describing and verifying parameterised FPGA circuits with layout information.

In Chapter 3 we describe how we can extend Quartz with additional constructs to describe placed circuits. We show how two functions – maxf and sum – are sufficient to describe the placement and size of iterative structures. We demonstrate how the size of blocks can be inferred and also provide a mechanism for sizes to be specified manually. We show how higher-order Quartz descriptions with layout information can be compiled into parameterised hardware libraries with higher-order parameters removed.

Chapter 4 describes an infrastructure for the verification of Quartz circuit layouts using the Isabelle theorem prover. We give a formal semantics for Quartz descriptions in HOL and provide HOL interpretations of layout correctness. Using a modified Quartz compiler we are able to automatically generate semantic definitions for Quartz blocks and proof obligations for layout correctness. The compiler can also generate proof scripts using Isabelle's simplifier and classical reasoner to verify the layouts of many blocks without requiring any user intervention at all. We demonstrate how this verification infrastructure can be applied to a range of useful combinators.
combinators.<br />

In Chapter 5 we illustrate the use of our system to specialise designs when certain input values are known. We introduce the idea of distributed specialisation with self-specialising Quartz blocks and show that removing central control of HDL-level specialisation has significant advantages in terms of easing verification and faster processing for dynamic specialisation applications. We show that HDL-level specialisation of placed designs is able to achieve design compaction, unlike lower-level approaches which operate on a fully routed circuit or placed netlist. We demonstrate the specialisation of a parallel multiplier and show that specialisation with compaction substantially increases performance and reduces the logic area required to implement the function.

Chapter 6 demonstrates the use of our layout framework with several complete circuit descriptions, including a median filter and a matrix multiplier. We introduce a new n-dimensional combinator for use in multi-dimensional circuit descriptions and show that the cubical version of this combinator can describe a matrix multiplication operation more clearly than existing combinators. We also demonstrate how Quartz combinators can be used at a low level to pipeline circuits using the in-slice flip-flops of the Xilinx Virtex-II FPGA architecture. We show that for many circuits, manually placed designs are compiled quicker, require less logic area, have a higher maximum clock frequency and have a lower power consumption than when automatically placed.

7.2 Evaluation

The layout generation system works well and we have succeeded in generating parameterised Pebble libraries from a variety of Quartz descriptions. In most cases maxf and sum functions are eliminated from the output in favour of conditional expressions by compiler optimisations, and these conditionals could themselves be eliminated by replacing them with Pebble conditional generation of the different alternatives if we so wished. The current version of the Pebble compiler does not support generation of parameterised VHDL, so we have been unable to implement our system for generating parameterised placed VHDL from Pebble, although we are confident of its design and do not foresee any difficulties with this.

There is currently no support for generating parameterised Pebble when Quartz blocks have recursive size expressions. There are a number of possible solutions to this problem that we have discussed, though we have not implemented them. The only totally general solution would be to allow Pebble expressions to contain recursive functions, which is not difficult although it is slightly untidy. We would recommend that recursive size expressions should be eliminated where possible by attempting to compute the transitive closure of the recursion.
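One simple way to approximate this elimination, sketched here under our own assumptions rather than as an implemented compiler pass, is to unroll a recursive size definition bottom-up for the parameter values of interest, so that a non-recursive table (or conditional expression) can be emitted in its place.

```python
def unroll_size(step, base, n):
    """Tabulate a recursive size function bottom-up.

    step(k, prev) gives size(k) in terms of size(k-1); base is size(0).
    The resulting table can stand in for the recursive definition in
    any output language that lacks recursive expressions."""
    table = {0: base}
    for k in range(1, n + 1):
        table[k] = step(k, table[k - 1])
    return table
```

For example, a tree-shaped block whose size obeys size(k) = 2·size(k-1) + 1 with size(0) = 1 unrolls to 1, 3, 7, 15, from which the closed form 2^(k+1) - 1 is apparent and could be emitted directly.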

The verification framework also appears to work well for a wide variety of descriptions, ranging from library combinators to real circuits such as the median filter described in Section 6.3. However, we have learnt some important lessons about the relative ease of verifying different kinds of Quartz structures.

Circuits described iteratively can generally be verified relatively easily using the extensive range of theorems in the QuartzLayout library developed for maxf and sum. The theorems in the Structures theory, which can be applied to easily prove intersection theorems for loops, are particularly useful, since the proof goals tend to be quite complicated but are all of the same basic structure.

Verification of recursively defined Quartz blocks is less automated. The compiler's translation into Isabelle recdef recursive definitions is correct, but the definitions are not easily utilised by the theorem prover. We have had much more success with defining blocks using primitive recursion; however, this is not a completely general approach. Proofs for recursive size functions require induction; however, generally after this the automated proof tools are effective at completing the proof.

Two important lessons have been learnt from the verification of our example circuits. Firstly, that the size inference process produces better results than expected, and the need for manual specification of size functions is much less than was initially presumed. Secondly, that the relational nature of Quartz seriously complicates layout reasoning in some cases.

The success of the size inference process is such that for most real circuits it is likely that size inference will be used almost all of the time, with manual size expressions only specified occasionally. Manual size expressions are still necessary in order to describe the size of primitive blocks, or of blocks which should reserve more space on the FPGA than they actually require – for example, so that larger designs can be swapped into the same area using run-time reconfiguration. However, the present verification framework requires the verification of the validity and containment of size expressions generated from the inference algorithm, and these proof goals could be mostly eliminated if a proof of the correctness of the inference algorithm itself could be built into the Isabelle embedding¹.

That Quartz directional abstraction complicates layout reasoning is a slightly unexpected result, since it is generally regarded as simplifying functional reasoning. The use of the definite description operator to define the values of internal signals using a block's predicate as generated by the semantic function Bβ makes it difficult to use the actual values of these signals (unless they are defined very simply). It is possible that the mechanism for resolving Quartz directions could be coded in Isabelle and used to extract the real values; however, since the algorithm is incomplete, it is difficult to see how the necessary properties could be established formally.

Despite this, internal signals do not tend to cause problems most of the time, because it is not actually necessary to resolve their values. Since Quartz is one of the few languages with directional abstraction, this would also not be an issue if the verification framework were applied to another language.

We have also shown how the layout infrastructure can be used as part of a system of distributed self-specialising Quartz blocks to transparently specialise hardware when one or more input values are known at compile-time. The limitations of our current framework for carrying out distributed specialisation are discussed in Section 5.3; however, even with these limitations it can be a useful tool for carrying out some simple optimisations on generated hardware without requiring any significant designer effort.

HDL-level specialisation is particularly important for placed hardware libraries because it allows designs to be compacted as logic is eliminated from circuits. This is not something that can be achieved with low-level specialisation of the synthesised design, because placement constraints are only parameterised in the high-level description.

¹It would not be possible to totally eliminate all containment proof obligations, since it still remains necessary to prove that no block exists to the left of or below (0, 0).


7.3 Comparison with Related Work

To our knowledge, ours is the only work which addresses the issue of generating and verifying parameterised hardware libraries with explicit placement information and support for recursively and iteratively described structures. However, other work has taken different approaches to the problem.

7.3.1 VHDL with Explicit Co-ordinates

VHDL and Verilog can be used with absolute placement co-ordinates specified using "RLOC" constraints for Xilinx architectures. VHDL does not provide any particular support for placement; however, it can be extended with user-defined functions to implement the equivalent of our maxf and sum operations. VHDL does not support higher-order functions, however, so these would need to be coded explicitly for each hardware arrangement.
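To illustrate what such user-defined functions might look like, here is a hypothetical Python model of the two operations. The exact Quartz semantics are defined in Chapter 3; this sketch assumes only the summary given there: sum accumulates the sizes of preceding blocks to give an offset, and maxf takes a maximum over block sizes.

```python
def placement_sum(size, i):
    """x-offset of block i in a row: the total width of blocks 0..i-1.
    size(k) returns the (width, height) of block k."""
    return sum(size(k)[0] for k in range(i))

def placement_maxf(size, n):
    """Height of a row of n blocks: the height of the tallest block."""
    return max((size(k)[1] for k in range(n)), default=0)

def place_row(size, n):
    """Absolute bottom-left co-ordinates for a row of n variably sized blocks."""
    return [(placement_sum(size, i), 0) for i in range(n)]
```

In VHDL these would be ordinary pure functions over arrays of sizes, written once per hardware arrangement for lack of higher-order parameters.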

Nevertheless, our theorem proving verification framework could be applied to VHDL designs. The potential for the use of a few dozen key theorems to automate proofs is severely reduced by the lack of higher-order functions; however, theorem proving can still be used to verify first-order placement co-ordinates. Size functions for each VHDL entity would probably need to be entered manually, although they could possibly be inferred in a similar way to Quartz, provided a suitable subset of the language is used. VHDL allows the combination of behavioural and structural design styles within a single description, and if this were done then verification of the correctness of entity size functions would probably be impossible without incorporating a theoretical model of how behavioural descriptions are elaborated into structural hardware.

7.3.2 Relative Placement in Pebble

Pebble has been extended with support for relative placement [49]. The Pebble system uses new language constructs to provide below, beside, below for and beside for capabilities, and a functional specification has been given for a procedure which compiles relative positions into absolute co-ordinates. The Pebble system is clean and effective at describing many common hardware constructs; however, it contains several flaws which are not present in our approach. Firstly, by limiting itself to "conventional" arithmetic and logical expressions the Pebble system cannot infer conditional branches in block sizes and is forced to select the maximum possible size of conditionals – something which can introduce inefficiency, since circuits may not be placed as compactly as possible. Compaction of circuits with conditionals can be achieved using partial evaluation; however, this requires that the final design is not parameterised in any variables which appear in conditional expressions.
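As a concrete illustration of what compiling relative placement into absolute co-ordinates involves (a hypothetical sketch of ours, not the published Pebble specification [49]), the following Python function flattens a tree of beside/below combinators over sized leaf blocks:

```python
def layout(node, x=0, y=0):
    """Compile a tree of ('beside', a, b), ('below', a, b) and
    ('leaf', name, w, h) nodes into absolute co-ordinates.
    Returns (placements, width, height) where each placement is
    (name, x, y, w, h)."""
    tag = node[0]
    if tag == 'leaf':
        _, name, w, h = node
        return [(name, x, y, w, h)], w, h
    _, a, b = node
    pa, wa, ha = layout(a, x, y)
    if tag == 'beside':                    # b starts where a's width ends
        pb, wb, hb = layout(b, x + wa, y)
        return pa + pb, wa + wb, max(ha, hb)
    else:                                  # 'below': b stacked along y
        pb, wb, hb = layout(b, x, y + ha)
        return pa + pb, max(wa, wb), ha + hb
```

A loop combinator such as beside for corresponds to folding beside over the iterations; size inference amounts to the (w, h) values propagated back up the tree.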

With the Pebble system, layout correctness is achieved by design, through reliance on simple beside and below placement and the use of block size inference. However, the Pebble system does not meet our own, more stringent, definition of correctness; for example, it may generate incorrect layouts when the size of each iteration of a loop is different. The insistence on using relative placement is also limiting and means that pathological cases such as the irregular grid example we described in Section 4.7 cannot be described.

On the other hand, the Pebble system does have some advantages over our framework. Firstly, it is much simpler and easier to implement in other languages, such as VHDL. By limiting itself to basic arithmetic and boolean expressions, hardware may not be laid out completely optimally; however, the expressions it does generate are simpler than those generated by the Quartz system, which often contain conditionals or maxf/sum functions that have not been optimised away by the limited number of optimisations applied by the Quartz compiler.

Finally, by incorporating beside and below as language constructs, the Pebble system can express n-ary placement relationships more easily than can be done with Quartz. Quartz combinators are blocks like any other and have a fixed arity, so they can only be parameterised by a certain number of other blocks. Combinators can be parameterised by block vectors; however, this creates an unwanted type constraint that all blocks must have the same type.
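The arity limitation can be sketched as follows; the model is our own, with blocks reduced to (width, height) size pairs rather than real Quartz blocks:

```python
# A binary combinator has a fixed arity of exactly two blocks.
# An n-ary version takes a vector (list) of blocks instead, but in a
# typed HDL the vector then constrains every element to one common
# interface type.
from functools import reduce

def beside2(a, b):
    # fixed arity: exactly two blocks, side by side
    return (a[0] + b[0], max(a[1], b[1]))

def beside_vector(blocks):
    # n-ary placement via a vector of homogeneous blocks
    return reduce(beside2, blocks)
```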

The Pebble system is designed to support placement of iteratively described circuit descriptions and does not detail how recursive blocks should be handled. This also causes problems for the compilation of Quartz layouts (see Section 3.7.4); however, our general infrastructure does support recursive size functions.



7.3.3 Ruby and Lava

Both Ruby [26] and Lava [7] have been used to generate placed circuit descriptions using their higher-order combinators. Ruby and Lava combinators can describe both function and placement, as with Quartz. Neither the Ruby nor the Lava system supports the generation of parameterised output; instead they generate flattened netlists.

Ruby uses a variable-free notation of relations to describe circuits, and it is thus relatively easy to give key combinators, such as beside and below, a layout interpretation which can then be used to generate placed output. Ruby’s design style does not support explicit instantiation and signal connection and cannot support explicit co-ordinates. This is a limiting factor, and the system cannot support the irregular grid example. No explicit verification infrastructure is available for Ruby layouts; however, since the output is a flattened netlist and placement is limited to beside/below relationships, it should be impossible to describe invalid layouts.

Lava is a more flexible system based on Haskell. Lava provides combinators which place components below each other, beside each other or at the same location. Unlike Ruby, it does not enforce a variable-free notation and is thus more flexible. A version of Lava has also been demonstrated that supports layout with explicit co-ordinates [77].

While Lava provides constructs to aid the construction of correct layouts (using beside and below), it also permits possibly invalid layouts (by placing components in the same slice 2, either explicitly or using the combinator provided for this purpose). Unlike Ruby, Lava does therefore permit the description of invalid layouts, and there is no infrastructure available for verifying Lava layouts; however, our layout verification infrastructure could be adapted to this purpose.
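Adapting our layout verification to this setting essentially means checking that no two explicitly placed components occupy a common slice. A minimal sketch of such a check (our own model, not Lava’s API):

```python
# Each placed component is (x, y, w, h).  Two axis-aligned rectangles
# overlap iff they intersect in both dimensions; strict inequalities
# mean that components merely sharing an edge do not conflict.

def overlaps(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def layout_valid(components):
    # valid iff no pair of components occupies a common slice
    return all(not overlaps(components[i], components[j])
               for i in range(len(components))
               for j in range(i + 1, len(components)))
```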

Ruby and Lava use recursion as their only means of repetition. The languages do not support iterative descriptions, which can be a clearer way of describing some circuit arrangements than recursion (although they are not more powerful per se); however, this is less relevant than for Quartz, since they do not produce output in a format that supports iteration.

Ruby and Lava do not support giving combinators different layout interpretations in the same way as our framework does. Lava does support overloading, but via Haskell type classes, which are not suitable for overloading blocks with different parameterisations in the way that is required for providing additional layout parameters.

2 The version of Lava which supports placement was developed at Xilinx and is designed specifically to target Xilinx FPGAs.

7.4 Future Work

Several aspects of this work are particularly open-ended, and we will end by making a few recommendations for areas worthy of future investigation.

7.4.1 Further Support For Alternative Layout Interpretations

We have demonstrated how blocks can be given different layout interpretations, and overloading used to give one of these interpretations the status of a “default”. However, this approach still requires that two or more different layouts are explicitly coded for combinator blocks.

While there will be some cases where blocks are described with completely unrelated layout interpretations, in most cases we expect these different interpretations to be variations on a theme. It is possible that these operations could be better described by vertical and horizontal flipping or rotation, and higher-order blocks which performed these operations on their parameter block could provide a simpler method of achieving this result. Combinators which could rotate or flip blocks would allow abstraction of a particular kind of layout operation and promote separation of concerns in the same way as higher-order combinators do.
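Such a combinator can be viewed as a co-ordinate transformation over an already-placed block. A sketch of horizontal flipping under this view (a model of our own devising, not part of the current framework):

```python
# A placed block is (width, height, components), where each component
# is (x, y, w, h) in the block's local co-ordinate space.

def hflip(block):
    # mirror every component about the block's vertical centre line;
    # the bounding box is unchanged, so a layout that was valid
    # (non-overlapping, in bounds) stays valid after flipping
    width, height, comps = block
    flipped = [(width - x - w, y, w, h) for (x, y, w, h) in comps]
    return (width, height, flipped)
```

Verifying that transformations like this preserve layout validity is exactly the exercise suggested above.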

Such layout-manipulation combinators would need a different theoretical basis from that of our current system, in which one block cannot alter the internal structure of another. The verification of such combinators would be an interesting exercise, particularly ensuring that they do not invalidate a previously valid layout.

Another useful extension would be to provide a mechanism for series and parallel compositions to be given multiple layout interpretations. Lava achieves this for series composition by providing different combinators for different series composition layouts; Quartz could take a similar approach but achieve it more concisely, since series composition is a language-level construct. Parallel compositions or series compositions could potentially be annotated with explicit co-ordinates, or left to follow their default layout interpretation, as appropriate. This would weaken the link between the functional and the non-functional (layout) description; that link, while often desirable, can sometimes be limiting: for example, when placing binary trees the ideal functional description is not a particularly good layout.

7.4.2 Less User Interaction In Proofs

One issue with our proof environment is the handling of recursive size functions using Isabelle’s recdef construct. At present our approach is to manually convert the automatically generated definitions into primitive recursion; it would certainly be possible to automate this process, though it is only valid for a (large) subset of Quartz blocks. It should be possible to get general size functions working by proving appropriate congruence rules to direct the automatic recdef termination proofs; however, recdef is a closed box, and it will probably be necessary to closely examine, and possibly change, the Isabelle/HOL source code to find and correct the problems.
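The kind of conversion involved can be illustrated on a toy size function (the function below is invented for illustration; real Quartz size functions are more involved). A definition that recurses on n div 2 is not primitive recursive over n, but it can be re-expressed by threading an explicitly decreasing counter:

```python
def size_rec(n):
    # recursion on n // 2: a recdef-style definition whose termination
    # needs a separate measure argument
    return 1 if n <= 1 else 2 * size_rec(n // 2) + 1

def size_prim(n):
    # primitive-recursive form: go recurses only on fuel - 1, and
    # fuel = n is always sufficient because m halves at each step
    def go(fuel, m):
        if fuel == 0 or m <= 1:
            return 1
        return 2 * go(fuel - 1, m // 2) + 1
    return go(n, n)
```

Both definitions compute the same function; only the recursion scheme, and hence the termination argument a prover must accept, differs.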

While we achieved extremely good levels of automation, there appears to be potential for considerable further improvement. Firstly, we would advocate further investigation into how the layout theorems regarding series and parallel composition can be better utilised by the automatic proof tools. There appears to be no reason why these theorems should need to be applied manually, and many blocks that use composition could be proved much more easily if compositions were decomposed by the theorem prover correctly.

We would also suggest more experimentation with the ideal configuration of rule sets for the classical reasoner and simplifier, to maximise their effectiveness. We have generated scripts which specify specific rule sets for each theorem proof in the Quartz compiler, but a better mechanism would be to specify these rule sets as defaults and prove theorems using simply “auto”. This is not as easy as it sounds, since we presently generate different scripts for different types of theorem, and some sort of compromise solution would need to be found that performed as well on all types of theorem as the current proof-specific method.

Another avenue that might be worth investigating is combining multiple logics and provers to achieve better automation. While Quartz combinators are usually higher-order blocks and require a higher-order formalism to verify their layouts, Quartz circuits tend to be parameterised purely by integer or boolean parameters. As such, it is possible that Quartz libraries could be verified in higher-order logic and these proofs could in some way be treated as axiomatic in proofs for a whole circuit using a different prover with a different formalism. The ACL2 theorem prover [34] is known for supporting very high levels of automation but proves theorems in the first-order Boyer-Moore logic [9]; it is possible that a combination of Isabelle and ACL2 could produce better results than Isabelle alone. A major practical difficulty that would need to be overcome here is ensuring the soundness of the interaction between the two different logics.

7.4.3 Integrating Layout and Functional Verification

In this work we have developed a shallow embedding of Quartz designed specifically to enable the verification of design layouts. It seems slightly paradoxical to maintain two different verification systems, one for functionality and one for layout, when the two could potentially be combined into a single embedding within a theorem prover.

To support full functional reasoning (some limited reasoning about functional properties is already possible), QuartzLayout would need to be extended with a timing model to allow the data values on wires to be properly modelled in synchronous circuits. It is likely that a deep embedding of Quartz, rather than a shallow semantic embedding, would be the best way to combine functional and layout verification in a single environment. We have already laid the foundations for the definition of such an embedding by defining a formal semantics of Quartz in HOL, and this function could be translated into an Isabelle implementation to provide a meaning function for the deep embedding.

7.4.4 Run-time Reconfiguration

In Chapter 5 we demonstrated the ability of our layout framework to support the specialisation of designs. Dynamic specialisation of designs at run-time is potentially a highly worthwhile activity, with performance gains outweighing the time required to reconfigure a chip in operation.

We have shown that distributed specialisation makes the process of verifying the specialisation of designs easier than lower-level approaches [80], and we have suggested that it should be quicker, though we have not quantitatively evaluated the execution speed of an HDL-level distributed specialisation process.

Distributed specialisation in Quartz is an area that is definitely worthy of further investigation. In particular, the addition of constructs to the language to support run-time reconfiguration directly, as has been demonstrated for Pebble [15], could be investigated. This would provide a higher-level interface to run-time reconfiguration than the current capabilities using virtual multiplexer blocks [47].

7.4.5 Properties of N-Dimensional Combinators

The new class of n-dimensional combinators we introduced in Section 6.6 shows the potential to substantially simplify the description of certain kinds of circuits, particularly those that manipulate multiple multi-dimensional data sources. Initial investigations also indicate that they could be used to clearly describe the translation into hardware of some classes of imperative function descriptions based on nested loops.
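The appeal of proving theorems once for the whole family stems from the fact that every dimension is generated by a single definition; an illustrative sketch of such a dimension-indexed combinator (our own, far simpler than the real combinators):

```python
# mapn(n, f, data) applies f to the elements of an n-dimensional
# nested list: mapn 0 is plain application, mapn 1 is map, mapn 2 is
# map-of-maps, and so on.  A property proved about mapn for arbitrary
# n covers the whole family of combinators at once.

def mapn(n, f, data):
    if n == 0:
        return f(data)
    return [mapn(n - 1, f, row) for row in data]
```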

The definition of an n-dimensional combinator suggests that it should be possible to prove useful theorems, such as retiming and serialisation, for all n-dimensional combinators at once, even though the description of the fully general case is substantially more complicated than the combinator for any particular dimension. Theorems that are valid for this entire class of combinators could be extremely useful since, even if complicated to prove, they would only need to be proved once.

It is also worth investigating the different ways of mapping n-dimensional descriptions into two-dimensional FPGA hardware and, if there are multiple good ways of doing this, under what situations each is optimal.


Bibliography<br />

[1] A. Aggoun and N. Beldiceanu. Extending CHIP in order to solve complex scheduling<br />

and placement problems. J. Mathematical and Computer Modelling, 17(7):57–73, 1993.<br />

[2] M. J. Alexander, J. P. Cohoon, J. L. Colflesh, J. Karro, and G. Robins. Three-<br />

dimensional field-programmable gate arrays. In Proc. IEEE Intl. ASIC Conf., 1995.<br />

[3] P. Alfke. Efficient Shift Registers, LFSR Counters and Long Pseudo-Random Sequence<br />

Generators, Xilinx Application Note 052, July 1996.<br />

[4] J. M. Arnold. S5: The architecture and development flow of a software configurable processor. In Proc. FPT’05: IEEE Conf. Field Programmable Technology, to appear.

[5] D. Basin, H. Kuruma, K. Takaragi, and B. Wolff. <strong>Verification</strong> <strong>of</strong> a signature architecture<br />

<strong>with</strong> HOL-Z. In Formal Methods 2005, volume 3582 <strong>of</strong> LNCS, pages 269–285. Springer-<br />

Verlag, 2005.<br />

[6] N. Beldiceanu and M. Carlsson. Sweep as a generic pruning technique applied to the non-<br />

overlapping rectangles constraint. In T. Walsh, editor, Proc. Constraint Programming<br />

2001, volume 2239 <strong>of</strong> LNCS, pages 377–391. Springer-Verlag, 2001.<br />

[7] P. Bjesse, K. Claessen, M. Sheeran, and S. Singh. Lava: hardware design in Haskell.<br />

In ICFP ’98: Proc. 3rd ACM SIGPLAN Intl. Conf. on Functional programming, pages<br />

174–184. ACM Press, 1998.<br />

[8] R. Boulton, A. Gordon, M. Gordon, J. Harrison, J. Herbert, and J. V. Tassel. Experience with embedding hardware description languages in HOL. In V. Stavridou, T. F. Melham, and R. T. Boute, editors, IFIP TC10/WG 10.2 Intl. Conf. on Theorem Provers in Circuit Design: Theory, Practice and Experience, pages 129–156. North-Holland/Elsevier, June 1992.

[9] R. S. Boyer and J. S. Moore. A Computational Logic. Academic Press, NY, 1979.<br />

[10] B. Brock, M. Kaufmann, and J. S. Moore. ACL2 theorems about commercial micro-<br />

processors. In M. K. Srivas and A. J. Camilleri, editors, FMCAD’96: First International<br />

Conference on Formal Methods in Computer Aided Design, volume 1166 <strong>of</strong> LNCS, pages<br />

275–293, Palo Alto, California, USA, 1996. Springer-Verlag.<br />

[11] R. E. Bryant. Symbolic boolean manipulation <strong>with</strong> ordered binary-decision diagrams.<br />

ACM Computing Surveys, 24(3):293–318, 1992.<br />

[12] Celoxica. Handel-C Language Reference Manual, 2001.

[13] J. Cong and Y. Ding. FlowMap: An optimal technology mapping algorithm for delay<br />

optimization in lookup-table based <strong>FPGA</strong> designs. IEEE Trans. Computer Aided Design<br />

<strong>of</strong> Integrated <strong>Circuit</strong>s and Systems, 13(1):1–12, 1994.<br />

[14] L. Damas and R. Milner. Principal type-schemes for functional programs. In POPL ’82:<br />

Proc. 9th ACM Symp. on Principles <strong>of</strong> Programming Languages, pages 207–212. ACM<br />

Press, 1982.<br />

[15] A. Derbyshire and W. Luk. Compiling run-time parametrisable designs. In Proc.<br />

FPT’02: IEEE Intl. Conf. on Field-Programmable Technology, pages 44–51, December<br />

2002.<br />

[16] E. W. Dijkstra. Selected Writings on Computing: A Personal Perspective. Springer-<br />

Verlag, 1982.<br />

[17] H. Fan, J. Liu, and Y.-L. Wu. General models for optimum arbitrary-dimension <strong>FPGA</strong><br />

switch box designs. In Proc. IEEE/ACM Conf. CAD (ICCAD), pages 93–98, 2000.<br />

[18] A. C. J. Fox. Formal specification and verification <strong>of</strong> ARM6. In D. Basin and B. Wolff,<br />

editors, TPHOLS’03: 16th International Conference on Theorem Proving in Higher<br />

Order Logics, volume 2758 <strong>of</strong> LNCS. Springer-Verlag, 2003.



[19] H. Gelernter. Realization <strong>of</strong> a geometry-theorem proving machine. In J. Siekmann and<br />

G. Wrightson, editors, Automation <strong>of</strong> Reasoning: Classical Papers on Computational<br />

Logic 1957–1966, volume 1, pages 99–124. Springer-Verlag, 1983.<br />

[20] H. Gelernter, J. R. Hansen, and D. W. Loveland. Empirical explorations <strong>of</strong> the geometry-<br />

theorem proving machine. In J. Siekmann and G. Wrightson, editors, Automation <strong>of</strong> Rea-<br />

soning: Classical Papers on Computational Logic 1957–1966, pages 140–150. Springer-<br />

Verlag, 1983.<br />

[21] M. Gordon, R. Milner, and C. Wadsworth. Edinburgh LCF: a mechanised logic <strong>of</strong><br />

computation, volume 78 <strong>of</strong> LNCS. Springer-Verlag, 1979.<br />

[22] M. J. C. Gordon. HOL: A pro<strong>of</strong> generating system for higher-order logic. In G. Birtwistle<br />

and P. Subrahmanyam, editors, VLSI Specification, <strong>Verification</strong> and Synthesis, pages<br />

73–128. Kluwer, 1988.<br />

[23] J. Greene, E. Hamdy, and S. Beal. Antifuse field programmable gate arrays. Proceedings<br />

<strong>of</strong> the IEEE, 81(7):1042–1056, 1993.<br />

[24] S. Guo and W. Luk. Compiling Ruby into <strong>FPGA</strong>s. In W. Moore and W. Luk, editors,<br />

Proc. FPL’95: Field Programmable Logic and Applications, volume 975 <strong>of</strong> LNCS, pages<br />

188–197. Springer-Verlag, 1995.<br />

[25] S. Guo and W. Luk. Producing design diagrams from declarative descriptions. In S. Yang, J. Zhou, and C. Li, editors, Proc. 4th Intl. Conf. on CAD/CG, pages 1084–1093. SPIE, 1995.

[26] S. Guo and W. Luk. An integrated system for developing regular array designs. J.<br />

Systems Architecture, 47(3-4):315–337, 2001.<br />

[27] R. Hindley. The principal type scheme <strong>of</strong> an object in combinatory logic. Trans. Amer.<br />

Math. Soc., 146:29–60, 1969.<br />

[28] J. Hughes. Why functional programming matters. Computer Journal, 32(2):98–107,<br />

1989.<br />

[29] M. Huth and M. Ryan. Logic in Computer Science. Cambridge University Press, 2004.



[30] IEEE. IEEE standard VHDL language reference manual, IEEE Std 1076-1987, March<br />

1988.<br />

[31] G. Jones and M. Sheeran. <strong>Circuit</strong> design in Ruby. In J. Staunstrup, editor, Formal<br />

Methods for VLSI Design, pages 13–70. North-Holland/Elsevier, 1990.<br />

[32] G. Jones and M. Sheeran. Relations and refinement in circuit design. In C. Morgan and<br />

J. Woodcock, editors, 3rd Refinement Workshop, Springer Workshops in Computing.<br />

Springer-Verlag, 1991.<br />

[33] J. J. Joyce. Formal verification and implementation <strong>of</strong> a microprocessor. In G. Birtwistle<br />

and P. A. Subrahmanyam, editors, VLSI Specification, <strong>Verification</strong> and Synthesis.<br />

Kluwer, 1988.<br />

[34] M. Kaufmann and J. S. Moore. An industrial strength theorem prover for a logic based<br />

on Common Lisp. IEEE Transactions on S<strong>of</strong>tware Engineering, 23(4):203–213, April<br />

1997.<br />

[35] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing.<br />

Science, 220(4598):671–680, 1983.<br />

[36] A. Krukowski and I. Kale. Simulink/Matlab-to-VHDL route for full custom/<strong>FPGA</strong> rapid<br />

prototyping <strong>of</strong> DSP algorithms. In Proc. DSP’99: Matlab DSP Conference, Tampere,<br />

Finland, November 1999.<br />

[37] C. Kulkarni, G. Brebner, and G. Schelle. Mapping a domain specific language to a<br />

platform <strong>FPGA</strong>. In Proc. DAC’04: 41st Design Automation Conference, pages 924–<br />

927, 2004.<br />

[38] L. Lamport and L. C. Paulson. Should your specification language be typed? ACM<br />

Trans. Program. Lang. Syst., 21(3):502–526, 1999.<br />

[39] P.-Z. Lee and Z. M. Kedem. Mapping nested loop algorithms into multidimensional<br />

systolic arrays. IEEE Trans. Parallel and Distributed Systems, 1(1):64–76, 1990.<br />

[40] M. Lesser, W. M. Meleis, M. M. Vai, S. Chiricescu, W. Xu, and P. M. Zavracky. Rothko:<br />

A three-dimensional <strong>FPGA</strong>. IEEE Design and Test <strong>of</strong> Computers, 15(1):16–23, 1998.



[41] Y. Li and M. Leeser. HML, a novel hardware description language and its translation<br />

to VHDL. IEEE Trans. on Very Large Scale Integration Systems, 8(1):1–8, 2000.<br />

[42] H. Lim and E. E. Swartzlander. Multidimensional systolic arrays for the implementation<br />

<strong>of</strong> discrete fourier transforms. IEEE Trans. Signal Processing, 47(5):1359–1370, May<br />

1999.<br />

[43] N. Ling and M. A. Bayoumi. The design and implementation <strong>of</strong> multidimensional systolic<br />

arrays for DSP applications. In Proc. ICASSP-89: Intl. Conf. on Acoustics, Speech, and<br />

Signal Processing, pages 1142–1145. IEEE, 1989.<br />

[44] W. Luk. Systolic band-matrix multipliers. Electronics Letters, 26(6):403–405, March<br />

1990.<br />

[45] W. Luk. Systematic serialisation <strong>of</strong> array-based architectures. Integration, the VLSI<br />

Journal, 14(3):333–360, February 1993.<br />

[46] W. Luk and S. McKeever. Pebble: a language for parameterised and reconfigurable<br />

hardware design. In R. W. Hartenstein and A. Keevallik, editors, Proc. FPL’98: Field-<br />

Programmable Logic and Applications, volume 1482 <strong>of</strong> LNCS, pages 9–18. Springer-<br />

Verlag, 1998.<br />

[47] W. Luk, N. Shirazi, and P. Y. K. Cheung. Compilation tools for run-time reconfig-<br />

urable designs. In Proc. FCCM’97: 5th IEEE Symp. on Field-Programmable Custom<br />

Computing Machines, pages 56–65. IEEE Computer Society, 1997.<br />

[48] S. McKeever and W. Luk. Towards provably-correct hardware compilation tools based<br />

on pass separation techniques. In Proc. CHARME ’01: 11th Conf. on Correct Hardware<br />

Design and <strong>Verification</strong> Methods, volume 2144 <strong>of</strong> LNCS, pages 212–227, London, UK,<br />

2001. Springer-Verlag.<br />

[49] S. McKeever, W. Luk, and A. Derbyshire. Compiling hardware descriptions <strong>with</strong> relative<br />

placement information for parameterised libraries. In M. Aagaard and J. O’Leary, edi-<br />

tors, Proc. FMCAD 2002: 4th Intl. Conf. Formal Methods in Computer-Aided Design,<br />

volume 2517 <strong>of</strong> LNCS, pages 342–359. Springer-Verlag, 2002.



[50] S. McKeever, W. Luk, and A. Derbyshire. Towards verifying parametrised hardware li-<br />

braries <strong>with</strong> relative placement information. In Proc. HICSS ’03: 36th Hawaii Intl. Conf.<br />

on System Sciences, page 10, Washington, DC, USA, 2003. IEEE Computer Society.<br />

[51] W. M. Meleis, M. Leeser, P. Zavracky, and M. M. Vai. Architectural design <strong>of</strong> a three<br />

dimensional <strong>FPGA</strong>. In Proc. 17th Conf. Advanced Research in VLSI (ARVLSI), pages<br />

256–268, September 1997.<br />

[52] T. Melham. Higher Order Logic and Hardware <strong>Verification</strong>. Cambridge Tracts in The-<br />

oretical Computer Science. Cambridge University Press, 1993.<br />

[53] R. Milner. A theory <strong>of</strong> type polymorphism in programming. J. Comput. Syst. Sci.,<br />

17:348–375, 1978.<br />

[54] A. Mycr<strong>of</strong>t and R. Sharp. Higher-level techniques for hardware description and synthesis.<br />

International Journal on S<strong>of</strong>tware tools for Technology Transfer, 4(3):271–297, May<br />

2003.<br />

[55] T. Nipkow, L. C. Paulson, and M. Wenzel. Isabelle/HOL: A Pro<strong>of</strong> Assistant for Higher-<br />

Order Logic, volume 2283 <strong>of</strong> LNCS. Springer-Verlag, 2002.<br />

[56] S.-W. Ong, N. Kerkiz, B. Srijanto, C. Tan, M. Langston, D. Newport, and D. Bouldin.<br />

Automatic mapping <strong>of</strong> multiple applications to multiple adaptive computing systems.<br />

In Proc. FCCM’01: Proc. 9th IEEE Symp. Field-Programmable Custom Computing<br />

Machines, pages 10–20. IEEE, 2001.<br />

[57] J. Ou and V. K. Prasanna. Parameterized and energy efficient adaptive beamforming on<br />

<strong>FPGA</strong>s using MATLAB/Simulink. In Proc. ICASSP’04: IEEE Intl. Conf. Acoustics,<br />

Speech, and Signal Processing, volume 5, pages 181–184, 2004.<br />

[58] J. Ou and V. K. Prasanna. Pygen: a MATLAB/Simulink based tool for synthesizing<br />

parameterized and energy efficient designs using <strong>FPGA</strong>s. In Proc. FCCM 2004: 12th<br />

IEEE Symp. Field-Programmable Custom Computing Machines, pages 47–56. IEEE,<br />

April 2004.<br />

[59] S. Owre, J. M. Rushby, and N. Shankar. PVS: A prototype verification system. In<br />

D. Kapur, editor, 11th Intl. Conf. on Automated Deduction (CADE), volume 607 <strong>of</strong><br />

LNAI, pages 748–752, Saratoga, NY, June 1992. Springer-Verlag.



[60] L. C. Paulson. The foundation <strong>of</strong> a generic theorem prover. J. Automated Reasoning,<br />

5:363–397, 1989.<br />

[61] L. C. Paulson. Isabelle: A Generic Theorem Prover, volume 828 <strong>of</strong> LNCS. Springer-<br />

Verlag, 1994.<br />

[62] O. Pell. Quartz: A new language for hardware description. Final Year Project Report,<br />

Dept <strong>of</strong> Computing, Imperial College, June 2004.<br />

[63] O. Pell. Quartz compilation algorithms. ISO Dissertation, Dept <strong>of</strong> Computing, Imperial<br />

College, January 2005.<br />

[64] O. Pell and W. Luk. Resolving Quartz overloading. In D. Borrione and W. Paul, edi-<br />

tors, Proc. CHARME’05: 13th Conference on Correct Hardware Design and <strong>Verification</strong><br />

Methods, volume 3725 <strong>of</strong> LNCS, pages 380–383. Springer-Verlag, 2005.<br />

[65] O. Pell and W. Luk. Quartz: A framework for correct and efficient reconfigurable design.<br />

In Proc. RECONFIG’05: Intl. Conf. on Reconfigurable Computing and <strong>FPGA</strong>s. IEEE<br />

Computer Society Press, September 2005, to appear.<br />

[66] O. Pell and H. Yu. User’s tutorial for Pebble 5.0. https://cc.doc.ic.ac.uk/local/pebble/doc/users/.

[67] F. P. Preparata and M. I. Shamos. Computational Geometry: An Introduction. Springer-<br />

Verlag, 1985.<br />

[68] O. Rasmussen. Transformational VLSI Design. PhD thesis, Technical University <strong>of</strong><br />

Denmark, 1997.<br />

[69] T. Riesgo, Y. Torroja, and E. de la Torre. Design methodologies based on hardware<br />

description languages. IEEE Trans. Industrial Electronics, 46(1):3–12, February 1999.<br />

[70] J. A. Robinson. A machine-oriented logic based on the resolution principle. J. ACM,<br />

12(1):23–41, 1965.<br />

[71] P. Rudnicki. An overview <strong>of</strong> the Mizar project. In Proc. 1992 Workshop on Types for<br />

Pro<strong>of</strong>s and Programs, Chalmers University <strong>of</strong> Technology, Bastad, 1992.



[72] H. Schmit. Extra-dimensional island-style <strong>FPGA</strong>s. In Proc. FPL 2003: Field Program-<br />

mable Logic and Applications, pages 406–415, 2003.<br />

[73] C.-J. H. Seger and R. E. Bryant. Formal verification by symbolic evaluation <strong>of</strong> partially-<br />

ordered trajectories. Formal Methods in Systems Design, 6:147–189, 1994.<br />

[74] R. Sharp and O. Rasmussen. The T-Ruby design system. Formal Methods in System<br />

Design, 11(3):239–264, 1997.<br />

[75] M. Sheeran. Finding regularity: Describing and analysing circuits that are not quite<br />

regular. In Proc. CHARME’03: Correct Hardware Design and <strong>Verification</strong> Methods,<br />

volume 2860 <strong>of</strong> LNCS, pages 4–18. Springer-Verlag, 2003.<br />

[76] M. Sheeran. Generating fast multipliers using clever circuits. In A. J. Hu and A. K.<br />

Martin, editors, Proc. FMCAD 2004: 5th Intl. Conf. Formal Methods in Computer-<br />

Aided Design, volume 3312 <strong>of</strong> LNCS, pages 6–20. Springer-Verlag, 2004.<br />

[77] S. Singh. Death <strong>of</strong> the RLOC? In FCCM’00: Proc. 8th IEEE Symp. on Field-<br />

Programmable Custom Computing Machines, page 145, Washington, DC, USA, 2000.<br />

IEEE Computer Society.<br />

[78] S. Singh and P. James-Roxby. Lava and JBits: From HDL to bitstream in seconds. In<br />

Proc. FCCM’01: 9th IEEE Symp. Field-Programmable Custom Computing Machines,<br />

2001.<br />

[79] M. K. Srivas and S. P. Miller. Applying formal verification to the AAMP5 microproces-<br />

sor: A case study in the industrial use <strong>of</strong> formal methods. Formal Methods in System<br />

Design, 8(2):153–188, 1996.<br />

[80] K. W. Susanto and T. Melham. Formally analyzed dynamic synthesis <strong>of</strong> hardware. J.<br />

Supercomputing, 19(1):7–22, 2001.<br />

[81] D. E. Thomas and P. Moorby. The Verilog Hardware Description Language. Kluwer Academic, 3rd edition, 1996.

[82] T. Todman and W. Luk. Combining imperative and declarative hardware descriptions.<br />

In Proc. HICSS ’03: 36th Hawaii Intl. Conf. on System Sciences, pages 280–289. IEEE<br />

Computer Society, 2003.



[83] J. Voeten. On the fundamental limitations of transformational design. ACM Transactions on Design Automation of Electronic Systems, 6(4):533–552, 2001.

[84] P. Wadler and S. Blott. How to make ad-hoc polymorphism less ad hoc. In Proc. POPL '89: 16th ACM SIGPLAN-SIGACT Symp. on Principles of Programming Languages, pages 60–76. ACM Press, 1989.

[85] S. J. E. Wilton, S.-S. Ang, and W. Luk. The impact of pipelining on energy per operation in field-programmable gate arrays. In Proc. FPL'04: Intl. Conf. on Field-Programmable Logic, volume 3203 of LNCS, pages 719–728, Antwerp, Belgium, August 2004. Springer-Verlag.

[86] Xilinx, Inc. Virtex-II Platform FPGAs: Complete Data Sheet, March 2005. DS031.


Appendix A

Quartz Language Grammar

This appendix contains the grammar for the Quartz language with placement constructs, in extended Backus-Naur form (EBNF).

Terminals are set in typewriter font, while non-terminals are set between angled brackets. Round brackets are used to indicate grouping.

The asterisk symbol (*) indicates zero-or-more repetition. The plus symbol (+) indicates one-or-more repetition. The question mark symbol (?) indicates zero-or-one repetition.

〈design〉 ::= 〈blockdef〉*

〈blockdef〉 ::= block 〈id〉 〈domain〉 ~ 〈range〉 { 〈dec〉* 〈stmt〉* }
| block 〈id〉 〈domain〉 ~ 〈range〉 -> 〈singlestmt〉

〈domain〉 ::= 〈io〉 〈io〉*

〈range〉 ::= 〈io〉

〈io〉 ::= ( 〈io tuple elt〉 (, 〈io tuple elt〉)* )

〈io tuple elt〉 ::= 〈dir〉? 〈basictype〉 〈id〉 〈vecindex〉*
| 〈dir〉? block 〈id〉 〈blocksig〉 〈vecindex〉*
| ( 〈io tuple elt〉 (, 〈io tuple elt〉)* )

〈dir〉 ::= in
| out
| ^〈id〉
| ^〈id〉*

〈basictype〉 ::= int
| bool
| wire
| ‘〈id〉

〈vecindex〉 ::= ([ 〈expr〉 ])*

〈blocksig〉 ::= 〈sig elt〉 ~ 〈sig elt〉

〈sig elt〉 ::= 〈dir〉? 〈basictype〉 〈vecindex〉*
| 〈dir〉? block 〈blocksig〉 〈vecindex〉*
| ( 〈sig elt〉 (, 〈sig elt〉)* )

〈dec〉 ::= 〈basictype〉 〈id〉 〈vecindex〉 (, 〈id〉 〈vecindex〉)* .
| const 〈id〉 = 〈expr〉 (, 〈id〉 = 〈expr〉)* .

〈singlestmt〉 ::= 〈blkref〉 .
| 〈stmt〉

〈stmt〉 ::= 〈blkinst〉 .
| for 〈id〉 = 〈expr〉..〈expr〉 { 〈stmt〉* } .
| if (〈expr〉) { 〈stmt〉* } ( else { 〈stmt〉* } )? .
| 〈expr〉 = 〈expr〉 .
| assert (〈expr〉) "〈string〉" .

〈blkinst〉 ::= 〈domainval〉 ; 〈blkref〉 (; 〈blkref〉)* ; 〈rangeval〉
| 〈blkref〉 〈domainval〉 ~ 〈rangeval〉
| 〈domainval〉 ; 〈blkref〉 (; 〈blkref〉)* ; 〈rangeval〉 at ( 〈expr〉 , 〈expr〉 )
| 〈blkref〉 〈domainval〉 ~ 〈rangeval〉 at ( 〈expr〉 , 〈expr〉 )

〈blkref〉 ::= 〈id〉 〈arg〉*
| [ ]
| [ 〈blkref〉 (, 〈blkref〉)* ]

〈domainval〉 ::= 〈arg〉 〈arg〉*

〈rangeval〉 ::= 〈arg〉

〈arg〉 ::= 〈id〉 〈vecindex〉
| 〈expr〉
| ( )
| ( 〈arg〉 (, 〈arg〉)* )

〈expr〉 ::= 〈expr〉 〈bop〉 〈expr〉
| 〈uop〉 〈expr〉
| 〈id〉
| 〈num〉
| true | false
| 〈expr〉..〈expr〉
| ( 〈expr〉 )
| height ( 〈blkinst〉 )
| width ( 〈blkinst〉 )
| max ( 〈expr〉 , 〈expr〉 (, 〈expr〉)* )
| if ( 〈expr〉 , 〈expr〉 , 〈expr〉 )
| sum ( 〈id〉 = 〈expr〉 .. 〈expr〉 , 〈expr〉 )
| maxf ( 〈id〉 = 〈expr〉 .. 〈expr〉 , 〈expr〉 )

〈bop〉 ::= and | or | nand | nor | xor | xnor
| + | - | * | / | ** | mod | == | !=
| < | <= | > | >=

〈uop〉 ::= - | abs | not

〈id〉 ::= (‘A’-‘Z’ | ‘a’-‘z’)+ (‘A’-‘Z’ | ‘a’-‘z’ | ‘0’-‘9’ | ‘_’)*

〈num〉 ::= (‘0’-‘9’)+
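For illustration, the following small block (a hypothetical example, not taken from the thesis) exercises the main productions: a 〈blockdef〉 with an integer parameter and vector I/O, a local 〈dec〉, and a for 〈stmt〉 whose body is a wiring assignment.

```
block ident (int n) (wire x[n]) ~ (wire y[n]) {
  int j.
  for j = 0..n-1 {
    y[j] = x[j].
  } .
}
```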


Appendix B

Theoretical Basis for Layout Reasoning

This appendix contains the Isabelle theories which form the QuartzLayout library.

IntAlgebra defines Quartz operators that are not already present in HOL, including > and a power operation for integers. It also includes many useful theorems that can be used to rewrite integer expressions.

Types declares the Quartz types of wires and vectors.

Block defines Quartz blocks as Isabelle records consisting of their functional definition, height function and width function. The theory also defines the block instantiation operation and a number of simplification theorems.

Inbuilt defines the layout interpretations of language constructs that are treated as inbuilt blocks (such as zip).

SeriesComposition defines the semantics and layout interpretation of Quartz series composition. It also includes proofs of useful properties for the layout of series compositions.

ParallelComposition defines the semantics and layout interpretation of Quartz parallel composition. The theory includes AST rewriting functions to allow Isabelle to parse and pretty-print parallel compositions, and also contains proofs of useful properties of parallel composition layouts.

Functions defines the maxf and sum functions and includes their correctness proofs and theorems describing a wide range of useful properties.

Structures contains theorems that are particularly useful for simplifying proof goals that are formed by certain circuit structures (e.g. horizontal arrays).

CompilerSimps contains the proofs of simplification rules used in the Quartz compiler to simplify maxf and sum functions.

QuartzLayout is a dummy theory which brings together all dependent theories to be used as the root library.

Minf is not included in the QuartzLayout library. It contains definitions and correspondence theorems between min and max functions and minf and maxf functions. If expressions using these minimum operators are desired, these theorems can be used to rewrite expressions into forms which use purely the max and maxf functions.
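The recursive behaviour of maxf and sum (defined in the Functions theory below) can be read off directly: both fold a function over the inclusive integer range bot..top and return 0 on an empty range. The following Python model is an informal sketch, not part of the library; sumf is a stand-in name chosen to avoid Python's built-in sum.

```python
def maxf(bot, top, f):
    # 0 on an empty range, otherwise the maximum of f over [bot..top]
    if top < bot:
        return 0
    if top == bot:
        return f(top)
    return max(f(top), maxf(bot, top - 1, f))

def sumf(bot, top, f):
    # 0 on an empty range, otherwise the sum of f over [bot..top]
    if top < bot:
        return 0
    return f(top) + sumf(bot, top - 1, f)
```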

B.1 IntAlgebra

header {* Useful theorems about integers for use in size function reasoning *}

theory IntAlgebra = Main:

section {* Additional ordering operators *}

constdefs
  grthn:: "[’a::ord, ’a]=>bool" ("(_/ > _)" [50, 51] 50)
  "grthn == % a b. (b < a)"
  geq:: "[’a::ord, ’a]=>bool" ("(_/ >= _)" [50, 51] 50)
  "geq == % a b. (b <= a)"

declare geq_def [simp]

section {* Power function for integers *}

(* Undefined for negative argument y - result must be an integer *)
consts
  pwr :: "[int,int]=>int" (infixr 60)
defs
  pwr_def: "x pwr y == if y >= 0 then int (nat x ^ nat y) else arbitrary"

section {* Reasoning with equality and inequalities *}

theorem zless_eq: "(((x::int)

lemma z_aleq_bc: "[| (0::int)

(* A less permissive version of the ’nat(n)’ function. ’nat(n)’ is a total function *)
(* but we want a partial function defined only for n >= 0 *)
constdefs
  int2nat :: "int => nat"
  "int2nat == (% x. if 0 <= x then nat x else arbitrary)"

theorem "!! (x::int). x < 0 ==> int2nat x = arbitrary"
by (simp add: int2nat_def)

theorem int2nat_defined [simp]: "!! (x::int). 0 <= x ==> int2nat x = nat x"
by (simp add: int2nat_def)

end
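A quick executable reading of the pwr definition above: nat coerces a negative base to zero, and the result is unspecified for a negative exponent. The Python model below is illustrative only and not part of the theory.

```python
def pwr(x, y):
    # models "x pwr y == if y >= 0 then int (nat x ^ nat y) else arbitrary";
    # we raise where Isabelle leaves the value arbitrary
    if y < 0:
        raise ValueError("pwr is unspecified for a negative exponent")
    return max(x, 0) ** y
```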

B.2 Types

header {* Definition of Quartz types *}

theory Types = Main:

typedecl wire

types ’a vector = "int => ’a"

constdefs
  vecelem :: "’a vector => int => ’a" ("_" [51, 52] 51)
  "vecelem == % v i. (v i)"
  vecrange:: "’a vector => int => int => ’a vector" ("_" [55,56,57] 55)
  "vecrange == % v ub lb. (%x. if (x + lb) wire"
  bool2wire :: "bool => wire"

end

B.3 Block

header {* Definition of Quartz blocks as records of functional definition and size functions *}

theory Block = Types:

record (’a,’b)block =
  Def :: "’a"
  Height :: "’b"
  Width :: "’b"

section {* Currying of blocks and block instantiation *}

constdefs
  ap :: "[(’a=>’b,’a=>’c)block,’a]=>(’b,’c)block" (infixl "$" 49)
  "ap == % B x. (| Def = Def B x, Height = Height B x, Width = Width B x |)"

constdefs
  inst :: "[’a, (’a=>’b=>bool,’a=>’b=>int)block,’b]=>(bool, int)block" ("_ ;;; _ ;;; _" [45, 46, 47] 45)
  "inst == (% x B y. (| Def = Def B x y, Height = Height B x y, Width = Width B x y |))"

section {* Simplification theorems *}

theorem height_extract [simp]: "Height (x ;;; A ;;; y) = ((Height A) x y)"
by (simp add: inst_def)

theorem width_extract [simp]: "Width (x ;;; A ;;; y) = ((Width A) x y)"
by (simp add: inst_def)

theorem def_extract: "Def (x ;;; A ;;; y) = (Def A) x y"
by (simp add: inst_def)

theorem height_ap [simp]: "Height (A $ x) = (Height A) x"
by (simp add: ap_def)

theorem width_ap [simp]: "Width (A $ x) = (Width A) x"
by (simp add: ap_def)

theorem def_ap [simp]: "Def (A $ x) = (Def A) x"
by (simp add: ap_def)

section {* Congruence, for recdef proofs *}

theorem ap_cong: "((| Def = s, Height = h, Width = w|) $ l) = (| Def = s l, Height = h l, Width = w l|)"
by (simp)

declare ap_cong [recdef_cong]

end
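The block-record operations above have a direct functional reading: instantiation (x ;;; B ;;; y) applies each field of the record to the domain and range values, which is exactly what the height_extract and width_extract simplifications express. The following Python sketch is illustrative only, with dictionaries standing in for Isabelle records.

```python
def inst(x, B, y):
    # x ;;; B ;;; y : instantiate block B at domain x and range y
    return {"Def": B["Def"](x, y),
            "Height": B["Height"](x, y),
            "Width": B["Width"](x, y)}

# a hypothetical 1x2 cell whose Def relates equal domain and range values
cell = {"Def": lambda x, y: x == y,
        "Height": lambda x, y: 1,
        "Width": lambda x, y: 2}
```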

B.4 Inbuilt

header {* In-built language blocks: zip and unzip *}

theory Inbuilt = Types + Block:

(* Don’t define structure / function - unnecessary for reasoning about size *)

section {* Zip block *}

consts
  zip_struct:: "int=>’a=>(’b)vector=>bool"
  zip_height:: "int=>’a=>(’b)vector=>int"
  zip_width:: "int=>’a=>(’b)vector=>int"
  zip:: "(int=>’a=>(’b)vector=>bool, int=>’a=>(’b)vector=>int)block"

defs
  zip_height_def: "zip_height == % n x y. 0"
  zip_width_def: "zip_width == % n x y. 0"
  zip_def: "zip == (| Def = zip_struct, Height = zip_height, Width = zip_width |)"

declare zip_height_def [simp]
declare zip_width_def [simp]
declare zip_def [simp]

section {* Unzip block *}

consts
  unzip_struct:: "int=>(’a)vector=>’b=>bool"
  unzip_height:: "int=>(’a)vector=>’b=>int"
  unzip_width:: "int=>(’a)vector=>’b=>int"
  unzip:: "(int=>(’a)vector=>’b=>bool, int=>(’a)vector=>’b=>int)block"

defs
  unzip_height_def: "unzip_height == % n x y. 0"
  unzip_width_def: "unzip_width == % n x y. 0"
  unzip_def: "unzip == (| Def = unzip_struct, Height = unzip_height, Width = unzip_width |)"

declare unzip_height_def [simp]
declare unzip_width_def [simp]
declare unzip_def [simp]

end

B.5 SeriesComposition

header {* Definition of Quartz series composition *}

theory SeriesComposition = Block + IntAlgebra:

constdefs
  ser:: "[(’a=>’b=>bool,’a=>’b=>int)block,(’b=>’c=>bool,’b=>’c=>int)block]=>(’a=>’c=>bool, ’a=>’c=>int)block" (infixl ";;" 48)
  "ser == (% B1 B2. (| Def = % x y. EX s. (Def B1) x s & (Def B2) s y,
                       Height = % x y. let s = (THE s. (Def B1) x s & (Def B2) s y) in
                                       max (Height B1 x s) (Height B2 s y),
                       Width = % x y. let s = (THE s. (Def B1) x s & (Def B2) s y) in
                                      (Width B1 x s) + (Width B2 s y)|))"

section {* Properties of series composition *}

theorem width_ser_ge0: "!! P Q x y. [| !! x y. 0


B.6 ParallelComposition

header {* Definition of Quartz parallel composition *}

theory ParallelComposition = Block + IntAlgebra:

consts
  Par :: "(’a=>’b=>bool, ’a=>’b=>int)block=>(’c=>’d=>bool,’c=>’d=>int)block=>((’a*’c)=>(’b*’d)=>bool, (’a*’c)=>(’b*’d)=>int)block"
  EmptyPar :: "unit=>unit" ("[[ ]]")

section {* Syntax definitions *}

nonterminals
  par_args parpatterns

syntax
  "_par" :: "(’a=>’b=>bool,’a=>’b=>int)block => par_args => ((’a*’c)=>(’b*’d)=>bool,(’a*’c)=>(’b*’d)=>int)block" ("[[ _ , _ ]]")
  "_par_arg" :: "(’a=>’b=>bool,’a=>’b=>int)block => par_args" ("_")
  "_par_args" :: "(’c=>’d=>bool,’c=>’d=>int)block => par_args => par_args" ("_,/ _")
  "_parpattern" :: "[pttrn, parpatterns] => pttrn" ("’[[_,/ _’]]")
  "" :: "pttrn => parpatterns" ("_")
  "_parpatterns" :: "[pttrn, parpatterns] => parpatterns" ("_,/ _")

translations
  "[[ x, y ]]" == "Par x y"
  "_par x (_par_args y z)" == "_par x (_par_arg (_par y z))"

defs
  par_def: "Par A B == (| Def = % (d1, d2) (r1, r2). (Def A) d1 r1 & (Def B) d2 r2,
                          Height = % (d1, d2) (r1, r2). (Height A) d1 r1 + (Height B) d2 r2,
                          Width = % (d1, d2) (r1, r2). max ((Width A) d1 r1) ((Width B) d2 r2) |)"

section {* Expansion theorems *}

theorem par2height: "Height (((a,b) ;;; [[ F, G ]] ;;; (c, d))) = (Height F) a c + (Height G) b d"
by (simp add: par_def)

theorem par2width: "Width ((a,b) ;;; [[ F , G ]] ;;; (c, d)) = max ((Width F) a c) ((Width G) b d)"
by (simp add: par_def)

section {* Properties of parallel composition *}

theorem width_par_ge0: "!! P Q x y. [| !! x y. 0

end
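The size semantics of the two composition forms are dual: series composition (ser) adds the widths and takes the maximum of the heights, while parallel composition (Par) adds the heights and takes the maximum of the widths. This illustrative Python sketch (not part of the Isabelle development) records just that arithmetic, with sizes as (height, width) pairs.

```python
def series_size(size1, size2):
    # ser / ";;": blocks sit side by side, so widths add and height is the max
    (h1, w1), (h2, w2) = size1, size2
    return (max(h1, h2), w1 + w2)

def parallel_size(size1, size2):
    # Par / "[[ , ]]": blocks stack vertically, so heights add and width is the max
    (h1, w1), (h2, w2) = size1, size2
    return (h1 + h2, max(w1, w2))
```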

B.7 Functions

header {* Definition of Quartz expression functions: maxf, sum, etc *}

theory Functions = IntAlgebra:

section {* Useful logic *}

theorem conj_imp: "(A --> B & C) = ((A --> B) & (A --> C))"
by (auto)

section {* maxf function to find the maximum point of a function within a range *}

consts
  maxf :: "(int*int*(int=>int))=>int"

recdef maxf "measure (%(b, t, f). nat(t+1-b))"
  "maxf (bot, top, fun) = (if (top < bot) then 0
                           else (
                             case (top = bot) of True =>
                               fun top
                             | False => (
                               let one = fun top in
                               let two = maxf (bot, top - 1, fun) in
                               max one two)
                           ))"

theorem maxf_expand_if: "maxf(b,t,f) = (if (t < b) then 0 else (if t = b then f t else max (f t) (maxf(b,t - 1,f))))"
by (simp add: Let_def)

subsection {* Correctness proof *}

constdefs<br />

is_maxf :: "[int,int, int=>int,int]=>bool"<br />

"is_maxf == % bot top fun max.<br />

(EX y. bot C"<br />

by (auto)<br />

lemma logic_impand: "A & B = (A --> B) | ~A"<br />

by (blast)<br />

theorem maxf_nobigger: "!! b t f. [| b



apply (rule conjI)<br />

apply (simp)<br />

apply (rule impI)<br />

apply (simp (no_asm_simp) add: Let_def del: maxf.simps)<br />

apply (simp only: z_leqplusone)<br />

apply (simp only: logic_rearr)<br />

apply (rule allI)<br />

apply (rule logic_rearr2)<br />

apply (simp only: le_max_iff_disj)<br />

apply (simp)<br />

apply (simp only: le_max_iff_disj)<br />

apply (simp)<br />

done<br />

theorem maxf_fmax: "!! b t f x. [| b b



apply (simp del: maxf.simps)<br />

apply (rule conjI)<br />

apply (rule impI)<br />

defer<br />

apply (rule impI)<br />

apply (rule exI)<br />

apply (rule conjI)<br />

defer<br />

apply (rule conjI)<br />

defer<br />

apply (simp)<br />

defer<br />

by (simp, simp, auto)<br />

theorem maxf_is_maxf: "!! b t f. [| b



done<br />

theorem maxf_ge0_frange: "!! m n f. [| (ALL y. m



theorem sum_norange_ge0: "!! m n f. [| m < n |] ==> 0



theorem sumn_plusf: "!! (p::int) q m n f. [| (!! y. 0



apply (rule impI) apply (rule impI)<br />

apply (subgoal_tac "0 int). [| (!!x. 0 maxf(b, t,<br />

%k. sum(b,k,f)) = sum(b, t, f)"<br />

apply (subgoal_tac "b (B|C|D|E))"<br />

by (auto)<br />

lemma impdisj_2<strong>of</strong>4: "(A --> C) ==> (A --> (B|C|D|E))"<br />

by (auto)



lemma impdisj_3<strong>of</strong>4: "(A --> D) ==> (A --> (B|C|D|E))"<br />

by (auto)<br />

lemma impdisj_4<strong>of</strong>4: "(A --> E) ==> (A --> (B|C|D|E))"<br />

by (auto)<br />

lemma impdisj_1<strong>of</strong>2: "(A --> B) ==> (A --> (B|C))"<br />

by auto<br />

lemma impdisj_2<strong>of</strong>2: "(A --> C) ==> (A --> (B|C))"<br />

by auto<br />

lemma impdisj_12<strong>of</strong>4: "(A --> (B|C)) ==> (A --> (B|C|D|E))"<br />

by auto<br />

lemma impdisj_34<strong>of</strong>4: "(A --> (D|E)) ==> (A --> (B|C|D|E))"<br />

by auto<br />

section {* Zero size ranges *}<br />

lemma overlap0: "((0::int)



((m



apply (rule impI)<br />

apply (simp (no_asm_simp) add: Let_def del: maxf.simps)<br />

apply (simp only: z_leqplusone)<br />

apply (simp only: logic_rearr)<br />

apply (rule logic_rearr2)<br />

apply (simp only: le_max_iff_disj)<br />

apply (simp)<br />

apply (simp only: le_max_iff_disj)<br />

apply (simp)<br />

done<br />

theorem maxf_sizefunc2: "!! x. [| True |] ==> b f x



theorem sum_nlem_zero [simp]: "!! (m::int) (n::int) f. [| n < m |] ==> sum (m, n, f)<br />

= 0"<br />

by (simp add: sum.simps)<br />

theorem sum_simp_xnotinf: "!! (m::int) (n::int) f. sum(m, n, %x. f) = (if m maxf(m, n, f)<br />

= 0"<br />

by (simp add:maxf.simps)<br />

theorem maxf_simp_xnotinf: "!! (m::int) (n::int) f. maxf(m, n, %x. f) = (if m



theorem min_max_corres: "!! (a::int) (b::int). min a b = - (max (-a) (-b))"
apply (simp add: max_def min_def)
done

section {* minf function and correspondence with maxf *}

consts
  minf :: "(int*int*(int=>int))=>int"

recdef minf "measure (%(b, t, f). nat(t+1-b))"
  "minf (bot, top, fun) = (if (top < bot) then 0
                           else (
                             case (top = bot) of True =>
                               fun top
                             | False => (
                               let one = fun top in
                               let two = minf (bot, top - 1, fun) in
                               min one two)
                           ))"

theorem minf_maxf_corres: "!! (f::int=>int) b t. minf(b,t,f) = - maxf (b,t,% x. - f x)"
apply (case_tac "b


Appendix C

Placed Combinator Libraries

This appendix contains the Quartz descriptions and layout correctness proofs for some Quartz libraries. The wide range of wiring blocks in the Quartz prelude library has been omitted, since they are all defined to have size 0 × 0, as have many combinators whose functions are similar to those shown. For example, the snd block is structurally very similar to the fst block and has been omitted, as has the col block, which is very similar to row.

C.1 Prelude Library

C.1.1 fst

/** Apply block R to the first element of a tuple */
block fst (block R ‘a ~ ‘b) (‘a i1, ‘c i2) ~ (‘b o1, ‘c o2)
attributes {
  height = height(i1 ; R ; o1).
  width = width(i1 ; R ; o1).
  layout-proved.
} {
  o2 = i2.
  i1 ; R ; o1 at (0,0).
}

theory fst = Quartz<strong>Layout</strong>:<br />

section {* Function definitions *}<br />

consts<br />

struct:: "((’t276=>’t277=>bool,’t276=>’t277=>int)block)=>(’t276*’t284)=>(’t277*’<br />

t284)=>bool"<br />




height:: "((’t276=>’t277=>bool,’t276=>’t277=>int)block)=>(’t276*’t284)=>(’t277*’<br />

t284)=>int"<br />

width:: "((’t276=>’t277=>bool,’t276=>’t277=>int)block)=>(’t276*’t284)=>(’t277*’<br />

t284)=>int"<br />

fst:: "(((’t276=>’t277=>bool,’t276=>’t277=>int)block)=>(’t276*’t284)=>(’t277*’<br />

t284)=>bool, ((’t276=>’t277=>bool,’t276=>’t277=>int)block)=>(’t276*’t284)=>(’<br />

t277*’t284)=>int)block"<br />

defs<br />

struct_def: "struct == % R (i1, i2) (o1, o2). (o2 = i2) & Def (i1 ;;; R ;;; o1)"<br />

height_def: "height == % R (i1, i2) (o1, o2). Height (i1 ;;; R ;;; o1)"<br />

width_def: "width == % R (i1, i2) (o1, o2). Width (i1 ;;; R ;;; o1)"<br />

fst_def: "fst == (| Def = struct, Height = height, Width = width |)"<br />

declare width_def [simp]<br />

declare height_def [simp]<br />

declare struct_def [simp]<br />

section {* Validity <strong>of</strong> width and height functions *}<br />

theorem height_ge0_int: "!! (R::((’t276=>’t277=>bool,’t276=>’t277=>int)block)) (i1::’<br />

t276) (i2::’t284) (o1::’t277) (o2::’t284). [| ALL (qs674::’t276) (qs675::’t277).<br />

0 <br />

0 ’t277=>bool,’t276=>’t277=>int)block)) (i1::’<br />

t276) (i2::’t284) (o1::’t277) (o2::’t284). [| ALL (qs674::’t276) (qs675::’t277).<br />

0 <br />

0 ’t277=>bool,’t276=>’t277=>int)block)) (i1::’t276<br />

) (i2::’t284) (o1::’t277) (o2::’t284). [| ALL (qs674::’t276) (qs675::’t277). 0<br />

<br />

0 ’t277=>bool,’t276=>’t277=>int)block)) (i1::’t276)<br />

(i2::’t284) (o1::’t277) (o2::’t284). [| ALL (qs674::’t276) (qs675::’t277). 0 <br />

0 ’t277=>bool,’t276=>’t277=>int)block)) (i1::’t276) (i2::’t284<br />

) (o1::’t277) (o2::’t284). [| o2 = i2 ; Def (i1 ;;; R ;;; o1) ; ALL (qs674::’<br />

t276) (qs675::’t277). 0



((0::int) int)block)=>’t301=>’t300=>bool"<br />

height:: "((’t300=>’t301=>bool,’t300=>’t301=>int)block)=>’t301=>’t300=>int"<br />

width:: "((’t300=>’t301=>bool,’t300=>’t301=>int)block)=>’t301=>’t300=>int"<br />

converse:: "(((’t300=>’t301=>bool,’t300=>’t301=>int)block)=>’t301=>’t300=>bool,<br />

((’t300=>’t301=>bool,’t300=>’t301=>int)block)=>’t301=>’t300=>int)block"<br />

defs<br />

struct_def: "struct == % R i o_. Def (o_ ;;; R ;;; i)"<br />

height_def: "height == % R i o_. Height (o_ ;;; R ;;; i)"<br />

width_def: "width == % R i o_. Width (o_ ;;; R ;;; i)"<br />

converse_def: "converse == (| Def = struct, Height = height, Width = width |)"<br />

declare width_def [simp]<br />

declare height_def [simp]<br />

declare struct_def [simp]<br />

section {* Validity <strong>of</strong> width and height functions *}<br />

theorem height_ge0_int: "!! (R::((’t300=>’t301=>bool,’t300=>’t301=>int)block)) (i::’<br />

t301) (o_::’t300). [| ALL (qs678::’t300) (qs679::’t301). 0 bool,’t300=>’t301=>int)block)) (i::’<br />

t301) (o_::’t300). [| ALL (qs678::’t300) (qs679::’t301). 0 bool,’t300=>’t301=>int)block)) (i::’t301)<br />

(o_::’t300). [| ALL (qs678::’t300) (qs679::’t301). 0



;;; qs679)) ; ALL (qs678::’t300) (qs679::’t301). 0 <br />

0 ’t301=>bool,’t300=>’t301=>int)block)) (i::’t301)<br />

(o_::’t300). [| ALL (qs678::’t300) (qs679::’t301). 0 bool,’t300=>’t301=>int)block)) (i::’t301) (o_::’t300)<br />

. [| Def (o_ ;;; R ;;; i) ; ALL (qs678::’t300) (qs679::’t301). 0



section {* Function definitions *}<br />

consts<br />

struct:: "(int*((’t321=>’t321=>bool,’t321=>’t321=>int)block))=>’t321=>’t321=>bool<br />

"<br />

height:: "(int*((’t321=>’t321=>bool,’t321=>’t321=>int)block))=>’t321=>’t321=>int"<br />

width:: "(int*((’t321=>’t321=>bool,’t321=>’t321=>int)block))=>’t321=>’t321=>int"<br />

rcomp:: "((int*((’t321=>’t321=>bool,’t321=>’t321=>int)block))=>’t321=>’t321=>bool<br />

, (int*((’t321=>’t321=>bool,’t321=>’t321=>int)block))=>’t321=>’t321=>int)<br />

block"<br />

defs<br />

struct_def: "struct == % (n, R) i o_. EX (intsig::(’t321)vector). if n = 0 then<br />

o_ = i else (intsig = i) & (ALL (j::int). ((0 ’t321=>int)block<br />

)) (i::’t321) (o_::’t321). [| ALL (qs680::’t321) (qs681::’t321). 0 <br />

0 ’t321=>bool,’t321=>’t321=>int)block)<br />

) (i::’t321) (o_::’t321). [| ALL (qs680::’t321) (qs681::’t321). 0 <br />

0 ’t321=>bool,’t321=>’t321=>int)block)) (<br />

i::’t321) (o_::’t321). [| ALL (qs680::’t321) (qs681::’t321). 0 <br />

0 ’t321=>bool,’t321=>’t321=>int)block)) (i<br />

::’t321) (o_::’t321). [| ALL (qs680::’t321) (qs681::’t321). 0



;;; R ;;; qs681)) |] ==><br />

0 ’t321=>bool,’t321=>’t321=>int)block)) (i::’t321) (<br />

o_::’t321). [| if n = 0 then o_ = i else (intsig = i) & (ALL (j::int). ((0



C.1.4 Q\P (conjugate)

/** Conjugation. Q \ P = P^~1 ; Q ; P */
block conjugate (block Q ‘a ~ ‘a, block P ‘a ~ ‘a) (‘a i) ~ (‘a o)
attributes {
  height = height (i ; converse P ; Q ; P ; o).
  width = width (i ; converse P ; Q ; P ; o).
  layout-proved.
} -> converse P ; Q ; P.

theory conjugate = converse:<br />

section {* Function definitions *}<br />

consts<br />

struct:: "(((’t343=>’t343=>bool,’t343=>’t343=>int)block)*((’t343=>’t343=>bool,’<br />

t343=>’t343=>int)block))=>’t343=>’t343=>bool"<br />

height:: "(((’t343=>’t343=>bool,’t343=>’t343=>int)block)*((’t343=>’t343=>bool,’<br />

t343=>’t343=>int)block))=>’t343=>’t343=>int"<br />

width:: "(((’t343=>’t343=>bool,’t343=>’t343=>int)block)*((’t343=>’t343=>bool,’<br />

t343=>’t343=>int)block))=>’t343=>’t343=>int"<br />

conjugate:: "((((’t343=>’t343=>bool,’t343=>’t343=>int)block)*((’t343=>’t343=>bool<br />

,’t343=>’t343=>int)block))=>’t343=>’t343=>bool, (((’t343=>’t343=>bool,’t343<br />

=>’t343=>int)block)*((’t343=>’t343=>bool,’t343=>’t343=>int)block))=>’t343=>’<br />

t343=>int)block"<br />

defs<br />

struct_def: "struct == % (Q, P) i o_. Def (i ;;; converse $ (P) ;; Q ;; P ;;; o_)<br />

"<br />

height_def: "height == % (Q, P) i o_. Height (i ;;; converse $ (P) ;; Q ;; P ;;;<br />

o_)"<br />

width_def: "width == % (Q, P) i o_. Width (i ;;; converse $ (P) ;; Q ;; P ;;; o_)<br />

"<br />

conjugate_def: "conjugate == (| Def = struct, Height = height, Width = width |)"<br />

declare width_def [simp]<br />

declare height_def [simp]<br />

declare struct_def [simp]<br />

section {* Validity <strong>of</strong> width and height functions *}<br />

theorem height_ge0_int: "!! (Q::((’t343=>’t343=>bool,’t343=>’t343=>int)block)) (P<br />

::((’t343=>’t343=>bool,’t343=>’t343=>int)block)) (i::’t343) (o_::’t343). [| ALL<br />

(qs683::’t343) (qs684::’t343). 0 int)block)) (P::((’<br />

t343=>’t343=>bool,’t343=>’t343=>int)block)) (i::’t343) (o_::’t343). [| ALL (<br />

qs683::’t343) (qs684::’t343). 0



(qs686::’t343). 0 <br />

0 ’t343=>bool,’t343=>’t343=>int)block)) (P::((’<br />

t343=>’t343=>bool,’t343=>’t343=>int)block)) (i::’t343) (o_::’t343). [| ALL (<br />

qs683::’t343) (qs684::’t343). 0 int)block)) (P::((’t343<br />

=>’t343=>bool,’t343=>’t343=>int)block)) (i::’t343) (o_::’t343). [| ALL (qs683::’<br />

t343) (qs684::’t343). 0 int)block)) (P::((’t343=>’t343=><br />

bool,’t343=>’t343=>int)block)) (i::’t343) (o_::’t343). [| Def (i ;;; converse $<br />

(P) ;; Q ;; P ;;; o_) ; ALL (qs683::’t343) (qs684::’t343). 0



attributes {
  width = maxf(k=0..n-1, width (i[k] ; R ; o[k])).
  height = sum(k=0..n-1, height (i[k] ; R ; o[k])).
  layout-proved.
} {
  int j.
  for j = 0..n-1 {
    i[j] ; R ; o[j] at (0, sum(k=0..j-1, height(i[k] ; R ; o[k]))).
  } .
}
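The attributes above say that a map lays its n instances of R in a vertical stack: the overall height is the sum of the component heights, the overall width is their maximum, and instance j is placed at vertical offset sum(k=0..j-1, height of instance k). This can be modelled with a small illustrative Python helper (hypothetical, not part of the library):

```python
def map_layout(n, size):
    # size(k) returns the (height, width) of instance k, i.e. i[k] ; R ; o[k]
    height = sum(size(k)[0] for k in range(n))
    width = max((size(k)[1] for k in range(n)), default=0)
    # instance k sits at (0, sum of the heights of instances 0..k-1)
    offsets = [(0, sum(size(j)[0] for j in range(k))) for k in range(n)]
    return (height, width), offsets
```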

theory map = Quartz<strong>Layout</strong>:<br />

section {* Function definitions *}<br />

consts<br />

struct:: "(int*((’t395=>’t396=>bool,’t395=>’t396=>int)block))=>(’t395)vector=>(’<br />

t396)vector=>bool"<br />

height:: "(int*((’t395=>’t396=>bool,’t395=>’t396=>int)block))=>(’t395)vector=>(’<br />

t396)vector=>int"<br />

width:: "(int*((’t395=>’t396=>bool,’t395=>’t396=>int)block))=>(’t395)vector=>(’<br />

t396)vector=>int"<br />

map:: "((int*((’t395=>’t396=>bool,’t395=>’t396=>int)block))=>(’t395)vector=>(’<br />

t396)vector=>bool, (int*((’t395=>’t396=>bool,’t395=>’t396=>int)block))=>(’<br />

t395)vector=>(’t396)vector=>int)block"<br />

defs<br />

struct_def: "struct == % (n, R) i o_. ALL (j::int). ((0 ’t396=>bool,’t395=>’t396=>int)block<br />

)) (i::(’t395)vector) (o_::(’t396)vector). [| ALL (qs691::’t395) (qs692::’t396).<br />

0 <br />

0 ’t396=>bool,’t395=>’t396=>int)block)<br />

) (i::(’t395)vector) (o_::(’t396)vector). [| ALL (qs691::’t395) (qs692::’t396).<br />

0 <br />

0



theorem height_ge0: "!! (n::int) (R::((’t395=>’t396=>bool,’t395=>’t396=>int)block)) (<br />

i::(’t395)vector) (o_::(’t396)vector). [| ALL (qs691::’t395) (qs692::’t396). 0<br />

<br />

0 ’t396=>bool,’t395=>’t396=>int)block)) (i<br />

::(’t395)vector) (o_::(’t396)vector). [| ALL (qs691::’t395) (qs692::’t396). 0 <br />

0 ’t396=>bool,’t395=>’t396=>int)block)) (i::(’t395)<br />

vector) (o_::(’t396)vector). [| ALL (j::int). ((0



rule impdisj_34<strong>of</strong>4,<br />

rule loop_sum_overlap2,<br />

(simp add: overlap0’’)+) |<br />

auto intro: sum_ge0 maxf_ge0 sum_nsub1_plusf maxf_encloses)<br />

done<br />

end<br />

C.1.6 /\ (tri)

/** Triangle. Ruby /\. Place a triangle of R blocks between the n-element
vectors i and o. tri is an increasing triangle, irt is decreasing */
block tri (int n, block R ‘a ~ ‘a) (‘a i[n]) ~ (‘a o[n])

attributes {<br />

height = if (n 1) {<br />

for j = 1..n−1 {<br />

i[j] ; rcomp (j, R) ; o[j] at (sum(j=1..j−1,width(i[j] ;rcomp (j, R) ; o[j ])), 0).<br />

} .<br />

} .<br />

}<br />

theory tri = rcomp:<br />

section {* Function definitions *}<br />

consts<br />

struct:: "(int*((’t415=>’t415=>bool,’t415=>’t415=>int)block))=>(’t415)vector=>(’<br />

t415)vector=>bool"<br />

height:: "(int*((’t415=>’t415=>bool,’t415=>’t415=>int)block))=>(’t415)vector=>(’<br />

t415)vector=>int"<br />

width:: "(int*((’t415=>’t415=>bool,’t415=>’t415=>int)block))=>(’t415)vector=>(’<br />

t415)vector=>int"<br />

tri:: "((int*((’t415=>’t415=>bool,’t415=>’t415=>int)block))=>(’t415)vector=>(’<br />

t415)vector=>bool, (int*((’t415=>’t415=>bool,’t415=>’t415=>int)block))=>(’<br />

t415)vector=>(’t415)vector=>int)block"<br />

defs<br />

struct_def: "struct == % (n, R) i o_. (n >= 0) & (o_ = i) & (if n > 1 then<br />

ALL (j::int). ((1



section {* Validity <strong>of</strong> width and height functions *}<br />

theorem height_ge0_int: "!! (n::int) (R::((’t415=>’t415=>bool,’t415=>’t415=>int)block<br />

)) (i::(’t415)vector) (o_::(’t415)vector). [| ALL (qs694::’t415) (qs695::’t415).<br />

0 <br />

0 ’t415=>bool,’t415=>’t415=>int)block)<br />

) (i::(’t415)vector) (o_::(’t415)vector). [| ALL (qs694::’t415) (qs695::’t415).<br />

0 <br />

0 ’t415=>bool,’t415=>’t415=>int)block)) (<br />

i::(’t415)vector) (o_::(’t415)vector). [| ALL (qs694::’t415) (qs695::’t415). 0<br />

<br />

0 ’t415=>bool,’t415=>’t415=>int)block)) (i<br />

::(’t415)vector) (o_::(’t415)vector). [| ALL (qs694::’t415) (qs695::’t415). 0 <br />

0 ’t415=>bool,’t415=>’t415=>int)block)) (i::(’t415)<br />

vector) (o_::(’t415)vector). [| n >= 0 ; o_ = i ; if n > 1 then ALL (j::<br />

int). ((1



section {* Intersection theorems *}<br />

theorem "!! (n::int) (R::((’t415=>’t415=>bool,’t415=>’t415=>int)block)) (i::(’t415)<br />

vector) (o_::(’t415)vector). [| n >= 0 ; o_ = i ; if n > 1 then ALL (j::<br />

int). ((1



consts
  struct:: "(((('t486*'t487)=>('t488*'t489)=>bool,('t486*'t487)=>('t488*'t489)=>int)block)*((('t489*'t491)=>('t492*'t493)=>bool,('t489*'t491)=>('t492*'t493)=>int)block))=>('t486*('t487*'t491))=>(('t488*'t492)*'t493)=>bool"
  height:: "(((('t486*'t487)=>('t488*'t489)=>bool,('t486*'t487)=>('t488*'t489)=>int)block)*((('t489*'t491)=>('t492*'t493)=>bool,('t489*'t491)=>('t492*'t493)=>int)block))=>('t486*('t487*'t491))=>(('t488*'t492)*'t493)=>int"
  width:: "(((('t486*'t487)=>('t488*'t489)=>bool,('t486*'t487)=>('t488*'t489)=>int)block)*((('t489*'t491)=>('t492*'t493)=>bool,('t489*'t491)=>('t492*'t493)=>int)block))=>('t486*('t487*'t491))=>(('t488*'t492)*'t493)=>int"
  beside:: "((((('t486*'t487)=>('t488*'t489)=>bool,('t486*'t487)=>('t488*'t489)=>int)block)*((('t489*'t491)=>('t492*'t493)=>bool,('t489*'t491)=>('t492*'t493)=>int)block))=>('t486*('t487*'t491))=>(('t488*'t492)*'t493)=>bool, (((('t486*'t487)=>('t488*'t489)=>bool,('t486*'t487)=>('t488*'t489)=>int)block)*((('t489*'t491)=>('t492*'t493)=>bool,('t489*'t491)=>('t492*'t493)=>int)block))=>('t486*('t487*'t491))=>(('t488*'t492)*'t493)=>int)block"

defs
  struct_def: "struct == % (R, S) (a, (b, c)) ((d, e), f). EX (is::'t489). Def ((a, b) ;;; R ;;; (d, is)) & Def ((is, c) ;;; S ;;; (e, f))"
  height_def: "height == % (R, S) (a, (b, c)) ((d, e), f). let is = (THE (is::'t489). Def ((a, b) ;;; R ;;; (d, is)) & Def ((is, c) ;;; S ;;; (e, f))) in max (Height ((a, b) ;;; R ;;; (d, is))) (Height ((is, c) ;;; S ;;; (e, f)))"
  width_def: "width == % (R, S) (a, (b, c)) ((d, e), f). let is = (THE (is::'t489). Def ((a, b) ;;; R ;;; (d, is)) & Def ((is, c) ;;; S ;;; (e, f))) in (Width ((a, b) ;;; R ;;; (d, is))) + (Width ((is, c) ;;; S ;;; (e, f)))"
  beside_def: "beside == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]
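The height_def and width_def above encode the standard arithmetic for horizontal composition: the widths of the two sub-blocks add, while the overall height is the maximum of the two. A minimal Python sketch of just that bounding-box arithmetic (the Block record and the sample dimensions are illustrative, not part of Quartz or the Isabelle theories, which also track wire compatibility and definedness):

```python
from dataclasses import dataclass

@dataclass
class Block:
    """Illustrative stand-in for a placed block: only its bounding box."""
    width: int
    height: int

def beside(r: Block, s: Block) -> Block:
    # Horizontal composition: widths add, heights take the maximum,
    # mirroring width_def and height_def above.
    return Block(r.width + s.width, max(r.height, s.height))

print(beside(Block(2, 3), Block(4, 1)))  # Block(width=6, height=3)
```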

section {* Validity of width and height functions *}

theorem height_ge0_int: "!! (R::(((’t486*’t487)=>(’t488*’t489)=>bool,(’t486*’t487)<br />

=>(’t488*’t489)=>int)block)) (S::(((’t489*’t491)=>(’t492*’t493)=>bool,(’t489*’<br />

t491)=>(’t492*’t493)=>int)block)) (a::’t486) (b::’t487) (c::’t491) (d::’t488) (e<br />

::’t492) (f::’t493). [| ALL (qs704::(’t486*’t487)) (qs705::(’t488*’t489)). 0 bool,(’t486*’t487)=>(’<br />

t488*’t489)=>int)block)) (S::(((’t489*’t491)=>(’t492*’t493)=>bool,(’t489*’t491)<br />

=>(’t492*’t493)=>int)block)) (a::’t486) (b::’t487) (c::’t491) (d::’t488) (e::’<br />

t492) (f::’t493). [| ALL (qs704::(’t486*’t487)) (qs705::(’t488*’t489)). 0 bool,(’t486*’t487)=>(’<br />

t488*’t489)=>int)block)) (S::(((’t489*’t491)=>(’t492*’t493)=>bool,(’t489*’t491)<br />

=>(’t492*’t493)=>int)block)) (a::’t486) (b::’t487) (c::’t491) (d::’t488) (e::’



t492) (f::’t493). [| ALL (qs704::(’t486*’t487)) (qs705::(’t488*’t489)). 0 bool,(’t486*’t487)=>(’t488<br />

*’t489)=>int)block)) (S::(((’t489*’t491)=>(’t492*’t493)=>bool,(’t489*’t491)=>(’<br />

t492*’t493)=>int)block)) (a::’t486) (b::’t487) (c::’t491) (d::’t488) (e::’t492)<br />

(f::’t493). [| ALL (qs704::(’t486*’t487)) (qs705::(’t488*’t489)). 0 bool,(’t486*’t487)=>(’t488*’t489)=><br />

int)block)) (S::(((’t489*’t491)=>(’t492*’t493)=>bool,(’t489*’t491)=>(’t492*’t493<br />

)=>int)block)) (a::’t486) (b::’t487) (c::’t491) (d::’t488) (e::’t492) (f::’t493)<br />

. [| Def ((a, b) ;;; R ;;; (d, is)) ; Def ((is, c) ;;; S ;;; (e, f)) ; ALL (<br />

qs704::(’t486*’t487)) (qs705::(’t488*’t489)). 0 (’t492*’t493<br />

)=>int)block)) (a::’t486) (b::’t487) (c::’t491) (d::’t488) (e::’t492) (f::’t493)<br />

. [| Def ((a, b) ;;; R ;;; (d, is)) ; Def ((is, c) ;;; S ;;; (e, f)) ; ALL (<br />

qs704::(’t486*’t487)) (qs705::(’t488*’t489)). 0



theorem "!! (R::(((’t486*’t487)=>(’t488*’t489)=>bool,(’t486*’t487)=>(’t488*’t489)=><br />

int)block)) (S::(((’t489*’t491)=>(’t492*’t493)=>bool,(’t489*’t491)=>(’t492*’t493<br />

)=>int)block)) (a::’t486) (b::’t487) (c::’t491) (d::’t488) (e::’t492) (f::’t493)<br />

. [| Def ((a, b) ;;; R ;;; (d, is)) ; Def ((is, c) ;;; S ;;; (e, f)) ; ALL (<br />

qs704::(’t486*’t487)) (qs705::(’t488*’t489)). 0



  struct:: "(int*((('t520*'t506)=>('t507*'t520)=>bool,('t520*'t506)=>('t507*'t520)=>int)block))=>('t520*('t506)vector)=>(('t507)vector*'t520)=>bool"
  height:: "(int*((('t520*'t506)=>('t507*'t520)=>bool,('t520*'t506)=>('t507*'t520)=>int)block))=>('t520*('t506)vector)=>(('t507)vector*'t520)=>int"
  width:: "(int*((('t520*'t506)=>('t507*'t520)=>bool,('t520*'t506)=>('t507*'t520)=>int)block))=>('t520*('t506)vector)=>(('t507)vector*'t520)=>int"
  row:: "((int*((('t520*'t506)=>('t507*'t520)=>bool,('t520*'t506)=>('t507*'t520)=>int)block))=>('t520*('t506)vector)=>(('t507)vector*'t520)=>bool, (int*((('t520*'t506)=>('t507*'t520)=>bool,('t520*'t506)=>('t507*'t520)=>int)block))=>('t520*('t506)vector)=>(('t507)vector*'t520)=>int)block"

defs

struct_def: "struct == % (n, R) (l, t) (b, r). EX (is::(’t520)vector). (is = l<br />

) & (ALL (i::int). ((0 bool,(’t520<br />

*’t506)=>(’t507*’t520)=>int)block)) (l::’t520) (t::(’t506)vector) (b::(’t507)<br />

vector) (r::’t520). [| ALL (qs708::(’t520*’t506)) (qs709::(’t507*’t520)). 0 <br />

0 (’t507*’t520)=>bool,(’t520*’<br />

t506)=>(’t507*’t520)=>int)block)) (l::’t520) (t::(’t506)vector) (b::(’t507)<br />

vector) (r::’t520). [| ALL (qs708::(’t520*’t506)) (qs709::(’t507*’t520)). 0 <br />

0 (’t507*’t520)=>bool,(’t520*’<br />

t506)=>(’t507*’t520)=>int)block)) (l::’t520) (t::(’t506)vector) (b::(’t507)<br />

vector) (r::’t520). [| ALL (qs708::(’t520*’t506)) (qs709::(’t507*’t520)). 0 <br />

0



theorem width_ge0: "!! (n::int) (R::(((’t520*’t506)=>(’t507*’t520)=>bool,(’t520*’t506<br />

)=>(’t507*’t520)=>int)block)) (l::’t520) (t::(’t506)vector) (b::(’t507)vector) (<br />

r::’t520). [| ALL (qs708::(’t520*’t506)) (qs709::(’t507*’t520)). 0 <br />

0 (’t507*’t520)=>bool,(’t520*’t506)=>(’t507*’<br />

t520)=>int)block)) (l::’t520) (t::(’t506)vector) (b::(’t507)vector) (r::’t520).<br />

[| is = l ; ALL (i::int). ((0



  (simp add: overlap0'')+) |
  auto intro: sum_ge0 maxf_ge0 sum_nsub1_plusf maxf_encloses)
done

end

C.1.9 grid

/** Grid of m x n R blocks. m columns, n rows */
block grid (int m, int n, block R ('a, 'b) ~ ('b, 'a)) ('a l[n], 'b t[m]) ~ ('b b[m], 'a r[n])
attributes {
  height = height ((l, t) ; row (m, col (n, R)) ; (b, r)).
  width = width ((l, t) ; row (m, col (n, R)) ; (b, r)).
  layout-proved.
} → row (m, col (n, R)).

theory grid = col + row:

section {* Function definitions *}

consts
  struct:: "(int*int*((('t581*'t587)=>('t587*'t581)=>bool,('t581*'t587)=>('t587*'t581)=>int)block))=>(('t581)vector*('t587)vector)=>(('t587)vector*('t581)vector)=>bool"
  height:: "(int*int*((('t581*'t587)=>('t587*'t581)=>bool,('t581*'t587)=>('t587*'t581)=>int)block))=>(('t581)vector*('t587)vector)=>(('t587)vector*('t581)vector)=>int"
  width:: "(int*int*((('t581*'t587)=>('t587*'t581)=>bool,('t581*'t587)=>('t587*'t581)=>int)block))=>(('t581)vector*('t587)vector)=>(('t587)vector*('t581)vector)=>int"
  grid:: "((int*int*((('t581*'t587)=>('t587*'t581)=>bool,('t581*'t587)=>('t587*'t581)=>int)block))=>(('t581)vector*('t587)vector)=>(('t587)vector*('t581)vector)=>bool, (int*int*((('t581*'t587)=>('t587*'t581)=>bool,('t581*'t587)=>('t587*'t581)=>int)block))=>(('t581)vector*('t587)vector)=>(('t587)vector*('t581)vector)=>int)block"

defs
  struct_def: "struct == % (m, n, R) (l, t) (b, r). Def ((l, t) ;;; row $ (m, col $ (n, R)) ;;; (b, r))"
  height_def: "height == % (m, n, R) (l, t) (b, r). Height ((l, t) ;;; row $ (m, col $ (n, R)) ;;; (b, r))"
  width_def: "width == % (m, n, R) (l, t) (b, r). Width ((l, t) ;;; row $ (m, col $ (n, R)) ;;; (b, r))"
  grid_def: "grid == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]
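Since grid (m, n, R) is defined as row (m, col (n, R)), its footprint follows directly from the row and col arithmetic. A hedged Python sketch, under the simplifying assumption that every instance of R has the same width w and height h (the Isabelle definitions compute per-instance sizes; the function names here are illustrative):

```python
def col_dims(n: int, w: int, h: int) -> tuple:
    # col stacks n copies vertically: heights add, width is unchanged.
    return (w, n * h)

def row_dims(m: int, w: int, h: int) -> tuple:
    # row chains m copies horizontally: widths add, height is unchanged.
    return (m * w, h)

def grid_dims(m: int, n: int, w: int, h: int) -> tuple:
    # grid (m, n, R) = row (m, col (n, R)), as in grid_def above.
    return row_dims(m, *col_dims(n, w, h))

print(grid_dims(3, 2, 4, 5))  # (12, 10)
```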

section {* Validity of width and height functions *}

theorem height_ge0_int: "!! (m::int) (n::int) (R::(((’t581*’t587)=>(’t587*’t581)=><br />

bool,(’t581*’t587)=>(’t587*’t581)=>int)block)) (l::(’t581)vector) (t::(’t587)<br />

vector) (b::(’t587)vector) (r::(’t581)vector). [| ALL (qs714::(’t581*’t587)) (<br />

qs715::(’t587*’t581)). 0



0 (’t587*’t581)=>bool<br />

,(’t581*’t587)=>(’t587*’t581)=>int)block)) (l::(’t581)vector) (t::(’t587)vector)<br />

(b::(’t587)vector) (r::(’t581)vector). [| ALL (qs714::(’t581*’t587)) (qs715::(’<br />

t587*’t581)). 0 bool,(’<br />

t581*’t587)=>(’t587*’t581)=>int)block)) (l::(’t581)vector) (t::(’t587)vector) (b<br />

::(’t587)vector) (r::(’t581)vector). [| ALL (qs714::(’t581*’t587)) (qs715::(’<br />

t587*’t581)). 0 bool,(’<br />

t581*’t587)=>(’t587*’t581)=>int)block)) (l::(’t581)vector) (t::(’t587)vector) (b<br />

::(’t587)vector) (r::(’t581)vector). [| ALL (qs714::(’t581*’t587)) (qs715::(’<br />

t587*’t581)). 0 bool,(’t581*’t587)<br />

=>(’t587*’t581)=>int)block)) (l::(’t581)vector) (t::(’t587)vector) (b::(’t587)<br />

vector) (r::(’t581)vector). [| Def ((l, t) ;;; row $ (m, col $ (n, R)) ;;; (b, r<br />

)) ; ALL (qs714::(’t581*’t587)) (qs715::(’t587*’t581)). 0



C.1.10 loop

/** Loop. Ruby x (loop R) y. R for some s. */
block loop (block R ('a, ^d1 'b) ~ (^d1* 'b, 'c)) ('a i) ~ ('c o)
attributes {
  height = height((i, s) ; R ; (s, o)).
  width = width((i, s) ; R ; (s, o)).
  layout-proved.
} {
  'b s.
  (i, s) ; R ; (s, o) at (0,0).
}

theory loop = QuartzLayout:

section {* Function definitions *}

consts
  struct:: "((('t632*'t633)=>('t633*'t634)=>bool,('t632*'t633)=>('t633*'t634)=>int)block)=>'t632=>'t634=>bool"
  height:: "((('t632*'t633)=>('t633*'t634)=>bool,('t632*'t633)=>('t633*'t634)=>int)block)=>'t632=>'t634=>int"
  width:: "((('t632*'t633)=>('t633*'t634)=>bool,('t632*'t633)=>('t633*'t634)=>int)block)=>'t632=>'t634=>int"
  loop:: "(((('t632*'t633)=>('t633*'t634)=>bool,('t632*'t633)=>('t633*'t634)=>int)block)=>'t632=>'t634=>bool, ((('t632*'t633)=>('t633*'t634)=>bool,('t632*'t633)=>('t633*'t634)=>int)block)=>'t632=>'t634=>int)block"

defs
  struct_def: "struct == % R i o_. EX (s::'t633). Def ((i, s) ;;; R ;;; (s, o_))"
  height_def: "height == % R i o_. let s = (THE (s::'t633). Def ((i, s) ;;; R ;;; (s, o_))) in Height ((i, s) ;;; R ;;; (s, o_))"
  width_def: "width == % R i o_. let s = (THE (s::'t633). Def ((i, s) ;;; R ;;; (s, o_))) in Width ((i, s) ;;; R ;;; (s, o_))"
  loop_def: "loop == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]

section {* Validity of width and height functions *}

theorem height_ge0_int: "!! (R::(((’t632*’t633)=>(’t633*’t634)=>bool,(’t632*’t633)<br />

=>(’t633*’t634)=>int)block)) (i::’t632) (o_::’t634). [| ALL (qs722::(’t632*’t633<br />

)) (qs723::(’t633*’t634)). 0 <br />

0 (’t633*’t634)=>bool,(’t632*’t633)=>(’<br />

t633*’t634)=>int)block)) (i::’t632) (o_::’t634). [| ALL (qs722::(’t632*’t633)) (<br />

qs723::(’t633*’t634)). 0 <br />

0



theorem height_ge0: "!! (R::(((’t632*’t633)=>(’t633*’t634)=>bool,(’t632*’t633)=>(’<br />

t633*’t634)=>int)block)) (i::’t632) (o_::’t634). [| ALL (qs722::(’t632*’t633)) (<br />

qs723::(’t633*’t634)). 0 <br />

0 (’t633*’t634)=>bool,(’t632*’t633)=>(’t633<br />

*’t634)=>int)block)) (i::’t632) (o_::’t634). [| ALL (qs722::(’t632*’t633)) (<br />

qs723::(’t633*’t634)). 0 <br />

0 (’t633*’t634)=>bool,(’t632*’t633)=>(’t633*’t634)=><br />

int)block)) (i::’t632) (o_::’t634). [| Def ((i, s) ;;; R ;;; (s, o_)) ; ALL (<br />

qs722::(’t632*’t633)) (qs723::(’t633*’t634)). 0



consts
  struct:: "nat=>((int=>'t12=>'t12=>bool,int=>'t12=>'t12=>int)block)=>'t12=>'t12=>bool"
  height:: "nat=>((int=>'t12=>'t12=>bool,int=>'t12=>'t12=>int)block)=>'t12=>'t12=>int"
  width:: "nat=>((int=>'t12=>'t12=>bool,int=>'t12=>'t12=>int)block)=>'t12=>'t12=>int"
  ichain:: "((int*((int=>'t12=>'t12=>bool,int=>'t12=>'t12=>int)block))=>'t12=>'t12=>bool, (int*((int=>'t12=>'t12=>bool,int=>'t12=>'t12=>int)block))=>'t12=>'t12=>int)block"

defs
  ichain_def: "ichain == (| Def = % (n, R) d r. struct (int2nat n) R d r, Height = % (n, R) d r. height (int2nat n) R d r, Width = % (n, R) d r. width (int2nat n) R d r |)"

primrec
  "struct 0 R d r = (d = r)"
  "struct (Suc n) R d r = (EX is. (struct n R d is) & (Def R (int (n + 1)) is r))"

primrec
  "height 0 R d r = 0"
  "height (Suc n) R d r = (
    let is = (THE is. (struct (Suc n) R d is) & Def R (int (Suc n)) is r) in
    max (height n R d is) (Height R (int (Suc n)) is r)
  )"

primrec
  "width 0 R d r = 0"
  "width (Suc n) R d r = (
    let is = (THE is. (struct (Suc n) R d is) & Def R (int (Suc n)) is r) in
    (width n R d is) + (Width R (int (Suc n)) is r)
  )"
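The primrec equations above compute ichain's footprint by recursion on n: instance widths accumulate by addition (the blocks sit side by side) while the height is the maximum over all instances. A small Python sketch of the same recurrence, with dims(i) standing in for the Width/Height of the i-th instance of R (an illustrative interface, not the Isabelle one):

```python
def ichain_dims(n, dims):
    # Base case n = 0 has zero extent, as in the primrec equations.
    if n == 0:
        return (0, 0)
    w, h = ichain_dims(n - 1, dims)
    wi, hi = dims(n)  # footprint of the n-th instance of R
    # widths add, heights take the maximum
    return (w + wi, max(h, hi))

print(ichain_dims(3, lambda i: (i, 2)))  # (6, 2)
```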

section {* Validity of width and height functions *}

theorem height_ge0_int: "!! (n::nat) (R::((int=>’t18=>’t18=>bool,int=>’t18=>’t18=>int<br />

)block)) (d::’t18) (r::’t18). [| ALL (qs23::int) (qs24::’t18) (qs25::’t18). 0 ’t18=>bool,int=>’<br />

t18=>’t18=>int)block)). [| ALL (qs23::int) (qs24::’t18) (qs25::’t18). 0



theorem height_ge0: "!! (n::int) (R::((int=>’t18=>’t18=>bool,int=>’t18=>’t18=>int)<br />

block)) (d::’t18) (r::’t18). [| ALL (qs23::int) (qs24::’t18) (qs25::’t18). 0 bool,int=>’t18=>’t18=>int)<br />

block)) (d::’t18) (r::’t18). [| ALL (qs23::int) (qs24::’t18) (qs25::’t18). 0 bool,int=>’t18=>’t18=>int)block)) (d::’<br />

t18) (r::’t18). [| if n = 0 then d = r else Def (d ;;; ichain $ (n - 1, R) ;; R<br />

$ n ;;; r) ; ALL (qs23::int) (qs24::’t18) (qs25::’t18). 0



section {* Function definitions *}

consts
  struct:: "nat=>((int=>'t106=>'t106=>bool,int=>'t106=>'t106=>int)block)=>('t106)vector=>('t106)vector=>bool"
  height:: "nat=>((int=>'t106=>'t106=>bool,int=>'t106=>'t106=>int)block)=>('t106)vector=>('t106)vector=>int"
  width:: "nat=>((int=>'t106=>'t106=>bool,int=>'t106=>'t106=>int)block)=>('t106)vector=>('t106)vector=>int"
  imap:: "((int*((int=>'t106=>'t106=>bool,int=>'t106=>'t106=>int)block))=>('t106)vector=>('t106)vector=>bool, (int*((int=>'t106=>'t106=>bool,int=>'t106=>'t106=>int)block))=>('t106)vector=>('t106)vector=>int)block"

defs
  imap_def: "imap == (| Def = % (n, R) d r. struct (int2nat n) R d r, Height = % (n, R) d r. height (int2nat n) R d r, Width = % (n, R) d r. width (int2nat n) R d r |)"

primrec
  "struct 0 R d r = (d = r)"
  "struct (Suc n) R d r = (
    struct n R (d) (r) &
    Def (d ;;; R $ (int (Suc n)) ;;; r)
  )"

(* Parallel composition laid out vertically, heights add *)
primrec
  "height 0 R d r = 0"
  "height (Suc n) R d r = (
    Height (d ;;; R $ (int (Suc n)) ;;; r) +
    height n R (d) (r)
  )"

primrec
  "width 0 R d r = 0"
  "width (Suc n) R d r = (
    max (Width (d ;;; R $ (int (Suc n)) ;;; r))
    (width n R (d) (r))
  )"
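As the comment in the theory notes, imap lays its instances out vertically, so the roles of the two dimensions swap relative to ichain: heights add and the width is the maximum. A matching Python sketch (dims(i) is again an illustrative stand-in for the i-th instance's footprint, not part of the Isabelle theory):

```python
def imap_dims(n, dims):
    # Vertical parallel composition: heights add, widths take the maximum,
    # mirroring the height/width primrec equations above.
    if n == 0:
        return (0, 0)
    w, h = imap_dims(n - 1, dims)
    wi, hi = dims(n)
    return (max(w, wi), h + hi)

print(imap_dims(3, lambda i: (2, i)))  # (2, 6)
```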

section {* Validity of width and height functions *}

theorem height_ge0_int [rule_format]: "!! (n::nat) (R::((int=>’t31=>’t31=>bool,int=>’<br />

t31=>’t31=>int)block)). [| n >= 0 ; ALL (qs148::int) (qs149::’t31) (qs150::’t31)<br />

. 0 ’t31=>bool,int=>’<br />

t31=>’t31=>int)block)). [| n >= 0 ; ALL (qs148::int) (qs149::’t31) (qs150::’t31)



. 0 ’t31=>bool,int=>’t31=>’t31=>int)<br />

block)) (d::(’t31)vector) (r::(’t31)vector). [| n >= 0 ; ALL (qs148::int) (qs149<br />

::’t31) (qs150::’t31). 0 bool,int=>’t31=>’t31=>int)<br />

block)) (d::(’t31)vector) (r::(’t31)vector). [| n >= 0 ; ALL (qs148::int) (qs149<br />

::’t31) (qs150::’t31). 0 bool,int=>’t31=>’t31=>int)block)) (d::(’<br />

t31)vector) (r::(’t31)vector). [| n >= 0 ; if n = 0 then d = r else Def (d ;;;<br />

converse $ (apr $ (n - 1)) ;; [[ imap $ (n - 1, R), R $ n ]] ;; apr $ (n - 1)<br />

;;; r) ; ALL (qs148::int) (qs149::’t31) (qs150::’t31). 0



/** Row of R blocks with left inputs connected to right outputs of previous block.
    Supply increasing integer parameter from 0 to n-1. */
block irow (int n, block R int ('a, 'b) ~ ('c, 'a)) ('a l, 'b t[n]) ~ ('c b[n], 'a r)
attributes {
  height = if(n==0, 0, height((l, t) ; snd (converse (apr (n - 1))) ; beside (irow (n-1, R), R n) ; fst (apr (n-1)) ; (b, r))).
  width = if(n==0, 0, width((l, t) ; snd (converse (apr (n - 1))) ; beside (irow (n-1, R), R n) ; fst (apr (n-1)) ; (b, r))).
} {
  // Wires: l = left, t = top, b = bottom, r = right
  assert (n >= 0) "n >= 0 is required".
  if (n == 0) { l = r. } // b and t are empty vectors anyway
  else {
    (l, t) ;
    snd (converse (apr (n - 1))) ;
    beside (irow (n-1, R), R n) ;
    fst (apr (n-1)) ;
    (b, r) at (0,0).
  } .
}

theory irow = fst + beside + apr + converse + snd:

section {* Function definitions *}

consts
  struct:: "nat=>((int=>('t107*'t108)=>('t109*'t107)=>bool,int=>('t107*'t108)=>('t109*'t107)=>int)block)=>'t107=>('t108)vector=>('t109)vector=>'t107=>bool"
  height:: "nat=>((int=>('t107*'t108)=>('t109*'t107)=>bool,int=>('t107*'t108)=>('t109*'t107)=>int)block)=>'t107=>('t108)vector=>('t109)vector=>'t107=>int"
  width:: "nat=>((int=>('t107*'t108)=>('t109*'t107)=>bool,int=>('t107*'t108)=>('t109*'t107)=>int)block)=>'t107=>('t108)vector=>('t109)vector=>'t107=>int"
  irow:: "((int*((int=>('t107*'t108)=>('t109*'t107)=>bool,int=>('t107*'t108)=>('t109*'t107)=>int)block))=>('t107*('t108)vector)=>(('t109)vector*'t107)=>bool, (int*((int=>('t107*'t108)=>('t109*'t107)=>bool,int=>('t107*'t108)=>('t109*'t107)=>int)block))=>('t107*('t108)vector)=>(('t109)vector*'t107)=>int)block"

defs
  irow_def: "irow == (| Def = % (n, R) (l, t) (b, r). struct (int2nat n) R l t b r, Height = % (n, R) (l, t) (b, r). height (int2nat n) R l t b r, Width = % (n, R) (l, t) (b, r). width (int2nat n) R l t b r |)"

primrec
  "struct 0 R l t b r = (l = r)"
  "struct (Suc n) R l t b r =
    Def beside
    (((| Def = % (l, t) (b, r). struct n R l t b r,
    Height = % a b. arbitrary,
    Width = % a b. arbitrary|)),
    R $ (int (Suc n))) (l, (t, t)) ((b, b<<int n>), r)"

primrec
  "height 0 R l t b r = 0"
  "height (Suc n) R l t b r =
    Height beside
    (((| Def = % a b. arbitrary,
    Height = % (l,t) (b,r). height n R l t b r,
    Width = % (l,t) (b,r). width n R l t b r|)),
    R $ (int (Suc n))) (l, (t, t)) ((b, b<<int n>), r)"

primrec
  "width 0 R l t b r = 0"
  "width (Suc n) R l t b r =
    Width ((l, t) ;;;
    snd $ (converse $ (apr $ (int n))) ;;
    beside $ ((| Def = % b c. arbitrary, Height = % b c. arbitrary, Width = % (l,t) (b, r). width n R l t b r |), R $ (int n + 1)) ;;
    fst $ (apr $ (int n))
    ;;; (b, r)
  )"

section {* Validity of width and height functions *}

theorem height_ge0_int [rule_format]: "!! (n::nat) R. [| ALL (qs137::int) (qs138::(’<br />

t107*’t108)) (qs139::(’t109*’t107)). 0 (’t109*’<br />

t107)=>bool,int=>(’t107*’t108)=>(’t109*’t107)=>int)block)). [| ALL (qs137::int)<br />

(qs138::(’t107*’t108)) (qs139::(’t109*’t107)). 0 (’t109*’t107)=>bool,int<br />

=>(’t107*’t108)=>(’t109*’t107)=>int)block)) (l::’t107) (t::(’t108)vector) (b::(’<br />

t109)vector) (r::’t107). [| n >= 0 ; ALL (qs137::int) (qs138::(’t107*’t108)) (<br />

qs139::(’t109*’t107)). 0



done<br />

theorem width_ge0: "!! (n::int) (R::((int=>(’t107*’t108)=>(’t109*’t107)=>bool,int=>(’<br />

t107*’t108)=>(’t109*’t107)=>int)block)) (l::’t107) (t::(’t108)vector) (b::(’t109<br />

)vector) (r::’t107). [| n >= 0 ; ALL (qs137::int) (qs138::(’t107*’t108)) (qs139<br />

::(’t109*’t107)). 0 (’t154*’t165)=>bool,int=>(’t165*’t153)<br />

=>(’t154*’t165)=>int)block)) (l::’t165) (t::(’t153)vector) (b::(’t154)vector) (r<br />

::’t165). [| n >= 0 ; if n = 0 then l = r else Def ((l, t) ;;; snd $ (converse $<br />

(apr $ (n - 1))) ;; beside $ (irow $ (n - 1, R), R $ n) ;; fst $ (apr $ (n - 1)<br />

) ;;; (b, r)) ; ALL (qs205::int) (qs206::(’t165*’t153)) (qs207::(’t154*’t165)).<br />

0



theory irdlelem = pi2 + converse:

section {* Function definitions *}

consts
  struct:: "((int=>('t36*'t28)=>'t36=>bool,int=>('t36*'t28)=>'t36=>int)block)=>int=>('t36*'t28)=>('t28*'t36)=>bool"
  height:: "((int=>('t36*'t28)=>'t36=>bool,int=>('t36*'t28)=>'t36=>int)block)=>int=>('t36*'t28)=>('t28*'t36)=>int"
  width:: "((int=>('t36*'t28)=>'t36=>bool,int=>('t36*'t28)=>'t36=>int)block)=>int=>('t36*'t28)=>('t28*'t36)=>int"
  irdlelem:: "(((int=>('t36*'t28)=>'t36=>bool,int=>('t36*'t28)=>'t36=>int)block)=>int=>('t36*'t28)=>('t28*'t36)=>bool, ((int=>('t36*'t28)=>'t36=>bool,int=>('t36*'t28)=>'t36=>int)block)=>int=>('t36*'t28)=>('t28*'t36)=>int)block"

defs
  struct_def: "struct == % R n (l, t) (b, r). Def ((l, t) ;;; R $ n ;; converse $ (pi2) ;;; (b, r))"
  height_def: "height == % R n (l, t) (b, r). Height ((l, t) ;;; R $ n ;; converse $ (pi2) ;;; (b, r))"
  width_def: "width == % R n (l, t) (b, r). Width ((l, t) ;;; R $ n ;; converse $ (pi2) ;;; (b, r))"
  irdlelem_def: "irdlelem == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]

section {* Validity of width and height functions *}

theorem height_ge0_int: "!! (R::((int=>(’t36*’t28)=>’t36=>bool,int=>(’t36*’t28)=>’t36<br />

=>int)block)) (n::int) (l::’t36) (t::’t28) (b::’t28) (r::’t36). [| ALL (qs643::<br />

int) (qs644::(’t36*’t28)) (qs645::’t36). 0 <br />

0 (’t36*’t28)=>’t36=>bool,int=>(’t36*’t28)=>’t36<br />

=>int)block)) (n::int) (l::’t36) (t::’t28) (b::’t28) (r::’t36). [| ALL (qs643::<br />

int) (qs644::(’t36*’t28)) (qs645::’t36). 0 <br />

0 (’t36*’t28)=>’t36=>bool,int=>(’t36*’t28)=>’t36=><br />

int)block)) (n::int) (l::’t36) (t::’t28) (b::’t28) (r::’t36). [| ALL (qs643::int<br />

) (qs644::(’t36*’t28)) (qs645::’t36). 0 <br />

0



apply (simp (no_asm_simp) del: height_def width_def add: Let_def max_def converse_def pi2_def irdlelem_def,
  (rule height_ge0_int, (simp+)?)?)
done

theorem width_ge0: "!! (R::((int=>(’t36*’t28)=>’t36=>bool,int=>(’t36*’t28)=>’t36=>int<br />

)block)) (n::int) (l::’t36) (t::’t28) (b::’t28) (r::’t36). [| ALL (qs643::int) (<br />

qs644::(’t36*’t28)) (qs645::’t36). 0 <br />

0 (’t36*’t28)=>’t36=>bool,int=>(’t36*’t28)=>’t36=>int)block)) (n<br />

::int) (l::’t36) (t::’t28) (b::’t28) (r::’t36). [| Def ((l, t) ;;; R $ n ;;<br />

converse $ (pi2) ;;; (b, r)) ; ALL (qs643::int) (qs644::(’t36*’t28)) (qs645::’<br />

t36). 0 ’t298=>int)<br />

block))=>(’t298*(’t257)vector)=>’t298=>bool"<br />

  height:: "(int*((int=>('t298*'t257)=>'t298=>bool,int=>('t298*'t257)=>'t298=>int)block))=>('t298*('t257)vector)=>'t298=>int"
  width:: "(int*((int=>('t298*'t257)=>'t298=>bool,int=>('t298*'t257)=>'t298=>int)block))=>('t298*('t257)vector)=>'t298=>int"
  irdl:: "((int*((int=>('t298*'t257)=>'t298=>bool,int=>('t298*'t257)=>'t298=>int)block))=>('t298*('t257)vector)=>'t298=>bool, (int*((int=>('t298*'t257)=>'t298=>bool,int=>('t298*'t257)=>'t298=>int)block))=>('t298*('t257)vector)=>'t298=>int)block"

defs
  struct_def: "struct == % (n, R) (l, t) r. Def ((l, t) ;;; irow $ (n, irdlelem $ (R)) ;; pi2 ;;; r)"
  height_def: "height == % (n, R) (l, t) r. Height ((l, t) ;;; irow $ (n, irdlelem $ (R)) ;; pi2 ;;; r)"
  width_def: "width == % (n, R) (l, t) r. Width ((l, t) ;;; irow $ (n, irdlelem $ (R)) ;; pi2 ;;; r)"
  irdl_def: "irdl == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]

section {* Validity of width and height functions *}

theorem height_ge0_int: "!! (n::int) (R::((int=>(’t298*’t257)=>’t298=>bool,int=>(’<br />

t298*’t257)=>’t298=>int)block)) (l::’t298) (t::(’t257)vector) (r::’t298). [| 0<br />

bool,int=>(’t298<br />

*’t257)=>’t298=>int)block)) (l::’t298) (t::(’t257)vector) (r::’t298). [| 0 bool,int=>(’t298*’<br />

t257)=>’t298=>int)block)) (l::’t298) (t::(’t257)vector) (r::’t298). [| 0 bool,int=>(’t298*’<br />

t257)=>’t298=>int)block)) (l::’t298) (t::(’t257)vector) (r::’t298). [| 0



theorem "!! (n::int) (R::((int=>(’t298*’t257)=>’t298=>bool,int=>(’t298*’t257)=>’t298<br />

=>int)block)) (l::’t298) (t::(’t257)vector) (r::’t298). [| Def ((l, t) ;;; irow<br />

$ (n, irdlelem $ (R)) ;; pi2 ;;; r) ; ALL (qs649::int) (qs650::(’t298*’t257)) (<br />

qs651::’t298). 0 ’t22=>int)block)=>int=><br />

int=>’t21=>’t22=>bool"<br />

  height:: "(((int*int)=>'t21=>'t22=>bool,(int*int)=>'t21=>'t22=>int)block)=>int=>int=>'t21=>'t22=>int"
  width:: "(((int*int)=>'t21=>'t22=>bool,(int*int)=>'t21=>'t22=>int)block)=>int=>int=>'t21=>'t22=>int"
  curry:: "((((int*int)=>'t21=>'t22=>bool,(int*int)=>'t21=>'t22=>int)block)=>int=>int=>'t21=>'t22=>bool, (((int*int)=>'t21=>'t22=>bool,(int*int)=>'t21=>'t22=>int)block)=>int=>int=>'t21=>'t22=>int)block"

defs
  struct_def: "struct == % R m n d r. Def (d ;;; R $ (m, n) ;;; r)"
  height_def: "height == % R m n d r. Height (d ;;; R $ (m, n) ;;; r)"
  width_def: "width == % R m n d r. Width (d ;;; R $ (m, n) ;;; r)"
  curry_def: "curry == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]

section {* Validity of width and height functions *}

theorem height_ge0_int: "!! (R::(((int*int)=>’t21=>’t22=>bool,(int*int)=>’t21=>’t22=><br />

int)block)) (m::int) (n::int) (d::’t21) (r::’t22). [| ALL (qs383::(int*int)) (



qs384::’t21) (qs385::’t22). 0 ’t22=>bool,(int*int)=>’t21=>’t22=><br />

int)block)) (m::int) (n::int) (d::’t21) (r::’t22). [| ALL (qs383::(int*int)) (<br />

qs384::’t21) (qs385::’t22). 0 ’t22=>bool,(int*int)=>’t21=>’t22=>int)<br />

block)) (m::int) (n::int) (d::’t21) (r::’t22). [| ALL (qs383::(int*int)) (qs384<br />

::’t21) (qs385::’t22). 0 ’t22=>bool,(int*int)=>’t21=>’t22=>int)<br />

block)) (m::int) (n::int) (d::’t21) (r::’t22). [| ALL (qs383::(int*int)) (qs384<br />

::’t21) (qs385::’t22). 0 ’t22=>bool,(int*int)=>’t21=>’t22=>int)block)) (m::<br />

int) (n::int) (d::’t21) (r::’t22). [| Def (d ;;; R $ (m, n) ;;; r) ; ALL (qs383<br />

::(int*int)) (qs384::’t21) (qs385::’t22). 0



section {* Function definitions *}

consts
  struct:: "((int=>int=>('t180*'t180)=>('t180*'t180)=>bool,int=>int=>('t180*'t180)=>('t180*'t180)=>int)block)=>int=>int=>(('t180)vector*'t180)=>('t180*('t180)vector)=>bool"
  height:: "((int=>int=>('t180*'t180)=>('t180*'t180)=>bool,int=>int=>('t180*'t180)=>('t180*'t180)=>int)block)=>int=>int=>(('t180)vector*'t180)=>('t180*('t180)vector)=>int"
  width:: "((int=>int=>('t180*'t180)=>('t180*'t180)=>bool,int=>int=>('t180*'t180)=>('t180*'t180)=>int)block)=>int=>int=>(('t180)vector*'t180)=>('t180*('t180)vector)=>int"
  igrid1:: "(((int=>int=>('t180*'t180)=>('t180*'t180)=>bool,int=>int=>('t180*'t180)=>('t180*'t180)=>int)block)=>int=>int=>(('t180)vector*'t180)=>('t180*('t180)vector)=>bool, ((int=>int=>('t180*'t180)=>('t180*'t180)=>bool,int=>int=>('t180*'t180)=>('t180*'t180)=>int)block)=>int=>int=>(('t180)vector*'t180)=>('t180*('t180)vector)=>int)block"

defs
  struct_def: "struct == % R n i (l, t) (b, r). Def ((l, t) ;;; icol $ (n, R $ i) ;;; (b, r))"
  height_def: "height == % R n i (l, t) (b, r). Height ((l, t) ;;; icol $ (n, R $ i) ;;; (b, r))"
  width_def: "width == % R n i (l, t) (b, r). Width ((l, t) ;;; icol $ (n, R $ i) ;;; (b, r))"
  igrid1_def: "igrid1 == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]

section {* Validity of width and height functions *}

theorem height_ge0_int: "!! (R::((int=>int=>(’t180*’t180)=>(’t180*’t180)=>bool,int=><br />

int=>(’t180*’t180)=>(’t180*’t180)=>int)block)) (n::int) (i::int) (l::(’t180)<br />

vector) (t::’t180) (b::’t180) (r::(’t180)vector). [| ALL (qs389::int) (qs390::<br />

int) (qs391::(’t180*’t180)) (qs392::(’t180*’t180)). 0 (’t180*’t180)=>bool,int=><br />

int=>(’t180*’t180)=>(’t180*’t180)=>int)block)) (n::int) (i::int) (l::(’t180)<br />

vector) (t::’t180) (b::’t180) (r::(’t180)vector). [| ALL (qs389::int) (qs390::<br />

int) (qs391::(’t180*’t180)) (qs392::(’t180*’t180)). 0



section {* Additional simplification rules for different representations *}

theorem height_ge0: "!! (R::((int=>int=>('t180*'t180)=>('t180*'t180)=>bool,int=>int
=>('t180*'t180)=>('t180*'t180)=>int)block)) (n::int) (i::int) (l::('t180)vector)
(t::'t180) (b::'t180) (r::('t180)vector). [| ALL (qs389::int) (qs390::int) (
qs391::('t180*'t180)) (qs392::('t180*'t180)). 0 ('t180*'t180)=>bool,int=>int
=>('t180*'t180)=>('t180*'t180)=>int)block)) (n::int) (i::int) (l::('t180)vector)
(t::'t180) (b::'t180) (r::('t180)vector). [| ALL (qs389::int) (qs390::int) (
qs391::('t180*'t180)) (qs392::('t180*'t180)). 0 ('t180*'t180)=>bool,int=>int=>('t180*'t180
)=>('t180*'t180)=>int)block)) (n::int) (i::int) (l::('t180)vector) (t::'t180) (b
::'t180) (r::('t180)vector). [| Def ((l, t) ;;; icol $ (n, R $ i) ;;; (b, r)) ;
ALL (qs389::int) (qs390::int) (qs391::('t180*'t180)) (qs392::('t180*'t180)). 0
('t313*'t313)=>int)block))=>(('t313)vector*('t313)vector)=>(('
t313)vector*('t313)vector)=>bool"



  height:: "(int*int*(((int*int)=>('t313*'t313)=>('t313*'t313)=>bool,(int*int)=>('t313*'t313)=>('t313*'t313)=>int)block))=>(('t313)vector*('t313)vector)=>(('t313)vector*('t313)vector)=>int"
  width:: "(int*int*(((int*int)=>('t313*'t313)=>('t313*'t313)=>bool,(int*int)=>('t313*'t313)=>('t313*'t313)=>int)block))=>(('t313)vector*('t313)vector)=>(('t313)vector*('t313)vector)=>int"
  igrid:: "((int*int*(((int*int)=>('t313*'t313)=>('t313*'t313)=>bool,(int*int)=>('t313*'t313)=>('t313*'t313)=>int)block))=>(('t313)vector*('t313)vector)=>(('t313)vector*('t313)vector)=>bool, (int*int*(((int*int)=>('t313*'t313)=>('t313*'t313)=>bool,(int*int)=>('t313*'t313)=>('t313*'t313)=>int)block))=>(('t313)vector*('t313)vector)=>(('t313)vector*('t313)vector)=>int)block"

defs
  struct_def: "struct == % (m, n, R) (l, t) (b, r). Def ((l, t) ;;; irow $ (m, igrid1 $ (curry $ (R)) $ n) ;;; (b, r))"
  height_def: "height == % (m, n, R) (l, t) (b, r). Height ((l, t) ;;; irow $ (m, igrid1 $ (curry $ (R)) $ n) ;;; (b, r))"
  width_def: "width == % (m, n, R) (l, t) (b, r). Width ((l, t) ;;; irow $ (m, igrid1 $ (curry $ (R)) $ n) ;;; (b, r))"
  igrid_def: "igrid == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]

section {* Validity of width and height functions *}

theorem height_ge0_int: "!! (m::int) (n::int) (R::(((int*int)=>('t313*'t313)=>('t313
*'t313)=>bool,(int*int)=>('t313*'t313)=>('t313*'t313)=>int)block)) (l::('t313)
vector) (t::('t313)vector) (b::('t313)vector) (r::('t313)vector). [| ALL (qs396
::(int*int)) (qs397::('t313*'t313)) (qs398::('t313*'t313)). 0 ('t313*'t313)=>('t313*'t313)=>int)block)) (l::('t313)
vector) (t::('t313)vector) (b::('t313)vector) (r::('t313)vector). [| ALL (qs396
::(int*int)) (qs397::('t313*'t313)) (qs398::('t313*'t313)). 0 ('t313*'t313)=>('t313*'t313)=>int)block)) (l::('t313)
vector) (t::('t313)vector) (b::('t313)vector) (r::('t313)vector). [| ALL (qs396
::(int*int)) (qs397::('t313*'t313)) (qs398::('t313*'t313)). 0 ('t313*'t313)=>('t313*'t313)=>int)block)) (l::('t313)vector)
(t::('t313)vector) (b::('t313)vector) (r::('t313)vector). [| ALL (qs396::(int*
int)) (qs397::('t313*'t313)) (qs398::('t313*'t313)). 0 ('t313*'t313)=>('t313*'t313)=>int)block)) (l::('t313)vector) (t::('
t313)vector) (b::('t313)vector) (r::('t313)vector). [| Def ((l, t) ;;; irow $ (m
, igrid1 $ (curry $ (R)) $ n) ;;; (b, r)) ; ALL (qs396::(int*int)) (qs397::('
t313*'t313)) (qs398::('t313*'t313)). 0



} {
  a1 ; A ; a2
    at (0,0).
  b1 ; B ; b2
    at (0, height(a1 ; A ; a2)).
  c1 ; C ; c2
    at (width(b1 ; B ; b2), height(a1 ; A ; a2)).
  d1 ; D ; d2
    at (width(b1 ; B ; b2),
        max (height(c1 ; C ; c2) + height(a1 ; A ; a2), height(e1 ; E ; e2))).
  e1 ; E ; e2
    at (max (width(a1 ; A ; a2), width(c1 ; C ; c2) + width(b1 ; B ; b2)), 0).
}
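The placement above is plain coordinate arithmetic over block widths and heights. As a sanity check, here is a small Python sketch that computes the five block origins exactly as the `at` expressions do; the block sizes in the example are made up for illustration and do not come from the thesis.

```python
def place_irregular_grid(sizes):
    """Compute block origins for the irregular grid placement.

    sizes maps each block name to a (width, height) pair; the returned
    dict maps names to (x, y) origins, mirroring the 'at' expressions
    in the Quartz description above.
    """
    (wa, ha) = sizes["A"]
    (wb, hb) = sizes["B"]
    (wc, hc) = sizes["C"]
    (wd, hd) = sizes["D"]
    (we, he) = sizes["E"]
    return {
        "A": (0, 0),                          # at (0,0)
        "B": (0, ha),                         # at (0, height(A))
        "C": (wb, ha),                        # at (width(B), height(A))
        "D": (wb, max(hc + ha, he)),          # at (width(B), max(height(C)+height(A), height(E)))
        "E": (max(wa, wc + wb), 0),           # at (max(width(A), width(C)+width(B)), 0)
    }

# Example with hypothetical sizes:
pos = place_irregular_grid(
    {"A": (4, 2), "B": (2, 3), "C": (3, 2), "D": (2, 2), "E": (3, 5)}
)
```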

theory irregular_grid = QuartzLayout:

section {* Function definitions *}

consts

  struct:: "((('t15=>'t16=>bool,'t15=>'t16=>int)block)*(('t18=>'t19=>bool,'t18=>'t19=>int)block)*(('t21=>'t22=>bool,'t21=>'t22=>int)block)*(('t24=>'t25=>bool,'t24=>'t25=>int)block)*(('t27=>'t28=>bool,'t27=>'t28=>int)block))=>('t15*'t18*'t21*'t24*'t27)=>('t16*'t19*'t22*'t25*'t28)=>bool"
  height:: "((('t15=>'t16=>bool,'t15=>'t16=>int)block)*(('t18=>'t19=>bool,'t18=>'t19=>int)block)*(('t21=>'t22=>bool,'t21=>'t22=>int)block)*(('t24=>'t25=>bool,'t24=>'t25=>int)block)*(('t27=>'t28=>bool,'t27=>'t28=>int)block))=>('t15*'t18*'t21*'t24*'t27)=>('t16*'t19*'t22*'t25*'t28)=>int"
  width:: "((('t15=>'t16=>bool,'t15=>'t16=>int)block)*(('t18=>'t19=>bool,'t18=>'t19=>int)block)*(('t21=>'t22=>bool,'t21=>'t22=>int)block)*(('t24=>'t25=>bool,'t24=>'t25=>int)block)*(('t27=>'t28=>bool,'t27=>'t28=>int)block))=>('t15*'t18*'t21*'t24*'t27)=>('t16*'t19*'t22*'t25*'t28)=>int"
  irregular_grid:: "(((('t15=>'t16=>bool,'t15=>'t16=>int)block)*(('t18=>'t19=>bool,'t18=>'t19=>int)block)*(('t21=>'t22=>bool,'t21=>'t22=>int)block)*(('t24=>'t25=>bool,'t24=>'t25=>int)block)*(('t27=>'t28=>bool,'t27=>'t28=>int)block))=>('t15*'t18*'t21*'t24*'t27)=>('t16*'t19*'t22*'t25*'t28)=>bool, ((('t15=>'t16=>bool,'t15=>'t16=>int)block)*(('t18=>'t19=>bool,'t18=>'t19=>int)block)*(('t21=>'t22=>bool,'t21=>'t22=>int)block)*(('t24=>'t25=>bool,'t24=>'t25=>int)block)*(('t27=>'t28=>bool,'t27=>'t28=>int)block))=>('t15*'t18*'t21*'t24*'t27)=>('t16*'t19*'t22*'t25*'t28)=>int)block"

defs<br />

struct_def: "struct == % (A, B, C, D, E) (a1, b1, c1, d1, e1) (a2, b2, c2, d2, e2<br />

). Def (a1 ;;; A ;;; a2) & Def (b1 ;;; B ;;; b2) & Def (c1 ;;; C ;;; c2) &<br />

Def (d1 ;;; D ;;; d2) & Def (e1 ;;; E ;;; e2)"<br />

height_def: "height == % (A, B, C, D, E) (a1, b1, c1, d1, e1) (a2, b2, c2, d2, e2<br />

). max ((Height (a1 ;;; A ;;; a2)) + (Height (b1 ;;; B ;;; b2))) ((max ((<br />

Height (a1 ;;; A ;;; a2)) + (Height (c1 ;;; C ;;; c2)) + (Height (d1 ;;; D<br />

;;; d2))) ((Height (e1 ;;; E ;;; e2)) + (Height (d1 ;;; D ;;; d2)))))"<br />

width_def: "width == % (A, B, C, D, E) (a1, b1, c1, d1, e1) (a2, b2, c2, d2, e2).<br />

max ((Width (a1 ;;; A ;;; a2)) + (Width (e1 ;;; E ;;; e2))) ((max ((Width (<br />

b1 ;;; B ;;; b2)) + (Width (c1 ;;; C ;;; c2)) + (Width (e1 ;;; E ;;; e2))) ((<br />

Width (b1 ;;; B ;;; b2)) + (Width (d1 ;;; D ;;; d2)))))"<br />

irregular_grid_def: "irregular_grid == (| Def = struct, Height = height, Width =<br />

width |)"<br />
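The `height_def` and `width_def` equations fold the placement into a bounding box. The following Python sketch transcribes the two `max` formulas directly; the sizes in the example are hypothetical, chosen only to exercise the arithmetic.

```python
def irregular_grid_extent(sizes):
    """Bounding (width, height) of the irregular grid, mirroring
    width_def and height_def above.  sizes: name -> (width, height)."""
    (wa, ha) = sizes["A"]
    (wb, hb) = sizes["B"]
    (wc, hc) = sizes["C"]
    (wd, hd) = sizes["D"]
    (we, he) = sizes["E"]
    # height_def: max(h_A + h_B, max(h_A + h_C + h_D, h_E + h_D))
    height = max(ha + hb, max(ha + hc + hd, he + hd))
    # width_def:  max(w_A + w_E, max(w_B + w_C + w_E, w_B + w_D))
    width = max(wa + we, max(wb + wc + we, wb + wd))
    return (width, height)

# Same hypothetical sizes as the placement example:
extent = irregular_grid_extent(
    {"A": (4, 2), "B": (2, 3), "C": (3, 2), "D": (2, 2), "E": (3, 5)}
)
```

With these sizes the extent agrees with the placement: block E's right edge (x = 5, width 3) sets the overall width, and block D's bottom edge (y = 5, height 2) sets the overall height.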

declare width_def [simp]
declare height_def [simp]



declare struct_def [simp]

section {* Validity of width and height functions *}

theorem height_ge0_int: "!! (A::(('t15=>'t16=>bool,'t15=>'t16=>int)block)) (B::(('t18
=>'t19=>bool,'t18=>'t19=>int)block)) (C::(('t21=>'t22=>bool,'t21=>'t22=>int)
block)) (D::(('t24=>'t25=>bool,'t24=>'t25=>int)block)) (E::(('t27=>'t28=>bool,'
t27=>'t28=>int)block)) (a1::'t15) (b1::'t18) (c1::'t21) (d1::'t24) (e1::'t27) (
a2::'t16) (b2::'t19) (c2::'t22) (d2::'t25) (e2::'t28). [| ALL (qs40::'t15) (qs41
::'t16). 0 bool,'t21=>'t22=>int)
block)) (D::(('t24=>'t25=>bool,'t24=>'t25=>int)block)) (E::(('t27=>'t28=>bool,'
t27=>'t28=>int)block)) (a1::'t15) (b1::'t18) (c1::'t21) (d1::'t24) (e1::'t27) (
a2::'t16) (b2::'t19) (c2::'t22) (d2::'t25) (e2::'t28). [| ALL (qs40::'t15) (qs41
::'t16). 0 bool,'t21=>'t22=>int)block))
(D::(('t24=>'t25=>bool,'t24=>'t25=>int)block)) (E::(('t27=>'t28=>bool,'t27=>'
t28=>int)block)) (a1::'t15) (b1::'t18) (c1::'t21) (d1::'t24) (e1::'t27) (a2::'
t16) (b2::'t19) (c2::'t22) (d2::'t25) (e2::'t28). [| ALL (qs40::'t15) (qs41::'
t16). 0
Width (qs48 ;;; E ;;; qs49)) |] ==>
0 't16=>bool,'t15=>'t16=>int)block)) (B::(('t18=>'
t19=>bool,'t18=>'t19=>int)block)) (C::(('t21=>'t22=>bool,'t21=>'t22=>int)block))
(D::(('t24=>'t25=>bool,'t24=>'t25=>int)block)) (E::(('t27=>'t28=>bool,'t27=>'
t28=>int)block)) (a1::'t15) (b1::'t18) (c1::'t21) (d1::'t24) (e1::'t27) (a2::'
t16) (b2::'t19) (c2::'t22) (d2::'t25) (e2::'t28). [| ALL (qs40::'t15) (qs41::'
t16). 0 bool,'t21=>'t22=>int)block)) (D::(('t24
=>'t25=>bool,'t24=>'t25=>int)block)) (E::(('t27=>'t28=>bool,'t27=>'t28=>int)
block)) (a1::'t15) (b1::'t18) (c1::'t21) (d1::'t24) (e1::'t27) (a2::'t16) (b2::'
t19) (c2::'t22) (d2::'t25) (e2::'t28). [| Def (a1 ;;; A ;;; a2) ; Def (b1 ;;; B
;;; b2) ; Def (c1 ;;; C ;;; c2) ; Def (d1 ;;; D ;;; d2) ; Def (e1 ;;; E ;;; e2)
; ALL (qs40::'t15) (qs41::'t16). 0



theorem "!! (A::(('t15=>'t16=>bool,'t15=>'t16=>int)block)) (B::(('t18=>'t19=>bool,'
t18=>'t19=>int)block)) (C::(('t21=>'t22=>bool,'t21=>'t22=>int)block)) (D::(('t24
=>'t25=>bool,'t24=>'t25=>int)block)) (E::(('t27=>'t28=>bool,'t27=>'t28=>int)
block)) (a1::'t15) (b1::'t18) (c1::'t21) (d1::'t24) (e1::'t27) (a2::'t16) (b2::'
t19) (c2::'t22) (d2::'t25) (e2::'t28). [| Def (a1 ;;; A ;;; a2) ; Def (b1 ;;; B
;;; b2) ; Def (c1 ;;; C ;;; c2) ; Def (d1 ;;; D ;;; d2) ; Def (e1 ;;; E ;;; e2)
; ALL (qs40::'t15) (qs41::'t16). 0 't25=>bool,'t24=>'t25=>int)block)) (E::(('t27=>'t28=>bool,'t27=>'t28=>int)
block)) (a1::'t15) (b1::'t18) (c1::'t21) (d1::'t24) (e1::'t27) (a2::'t16) (b2::'
t19) (c2::'t22) (d2::'t25) (e2::'t28). [| Def (a1 ;;; A ;;; a2) ; Def (b1 ;;; B
;;; b2) ; Def (c1 ;;; C ;;; c2) ; Def (d1 ;;; D ;;; d2) ; Def (e1 ;;; E ;;; e2)
; ALL (qs40::'t15) (qs41::'t16). 0 't25=>bool,'t24=>'t25=>int)block)) (E::(('t27=>'t28=>bool,'t27=>'t28=>int)
block)) (a1::'t15) (b1::'t18) (c1::'t21) (d1::'t24) (e1::'t27) (a2::'t16) (b2::'
t19) (c2::'t22) (d2::'t25) (e2::'t28). [| Def (a1 ;;; A ;;; a2) ; Def (b1 ;;; B
;;; b2) ; Def (c1 ;;; C ;;; c2) ; Def (d1 ;;; D ;;; d2) ; Def (e1 ;;; E ;;; e2)
; ALL (qs40::'t15) (qs41::'t16). 0 't25=>bool,'t24=>'t25=>int)block)) (E::(('t27=>'t28=>bool,'t27=>'t28=>int)
block)) (a1::'t15) (b1::'t18) (c1::'t21) (d1::'t24) (e1::'t27) (a2::'t16) (b2::'
t19) (c2::'t22) (d2::'t25) (e2::'t28). [| Def (a1 ;;; A ;;; a2) ; Def (b1 ;;; B
;;; b2) ; Def (c1 ;;; C ;;; c2) ; Def (d1 ;;; D ;;; d2) ; Def (e1 ;;; E ;;; e2)
; ALL (qs40::'t15) (qs41::'t16). 0 't25=>bool,'t24=>'t25=>int)block)) (E::(('t27=>'t28=>bool,'t27=>'t28=>int)
block)) (a1::'t15) (b1::'t18) (c1::'t21) (d1::'t24) (e1::'t27) (a2::'t16) (b2::'
t19) (c2::'t22) (d2::'t25) (e2::'t28). [| Def (a1 ;;; A ;;; a2) ; Def (b1 ;;; B
;;; b2) ; Def (c1 ;;; C ;;; c2) ; Def (d1 ;;; D ;;; d2) ; Def (e1 ;;; E ;;; e2)
; ALL (qs40::'t15) (qs41::'t16). 0 't25=>bool,'t24=>'t25=>int)block)) (E::(('t27=>'t28=>bool,'t27=>'t28=>int)
block)) (a1::'t15) (b1::'t18) (c1::'t21) (d1::'t24) (e1::'t27) (a2::'t16) (b2::'
t19) (c2::'t22) (d2::'t25) (e2::'t28). [| Def (a1 ;;; A ;;; a2) ; Def (b1 ;;; B
;;; b2) ; Def (c1 ;;; C ;;; c2) ; Def (d1 ;;; D ;;; d2) ; Def (e1 ;;; E ;;; e2)
; ALL (qs40::'t15) (qs41::'t16). 0 't28=>bool,'t27=>'t28=>int)
block)) (a1::'t15) (b1::'t18) (c1::'t21) (d1::'t24) (e1::'t27) (a2::'t16) (b2::'
t19) (c2::'t22) (d2::'t25) (e2::'t28). [| Def (a1 ;;; A ;;; a2) ; Def (b1 ;;; B
;;; b2) ; Def (c1 ;;; C ;;; c2) ; Def (d1 ;;; D ;;; d2) ; Def (e1 ;;; E ;;; e2)
; ALL (qs40::'t15) (qs41::'t16). 0



((((Width (b1 ;;; B ;;; b2)) + (Width (c1 ;;; C ;;; c2))) int)block)) (D::(('t24
=>'t25=>bool,'t24=>'t25=>int)block)) (E::(('t27=>'t28=>bool,'t27=>'t28=>int)
block)) (a1::'t15) (b1::'t18) (c1::'t21) (d1::'t24) (e1::'t27) (a2::'t16) (b2::'
t19) (c2::'t22) (d2::'t25) (e2::'t28). [| Def (a1 ;;; A ;;; a2) ; Def (b1 ;;; B
;;; b2) ; Def (c1 ;;; C ;;; c2) ; Def (d1 ;;; D ;;; d2) ; Def (e1 ;;; E ;;; e2)
; ALL (qs40::'t15) (qs41::'t16). 0



by auto

end

consts
  struct:: "(((('t11*'t12)=>('t13*'t14)=>bool,('t11*'t12)=>('t13*'t14)=>int)block)*((('t16=>'t11=>bool,'t16=>'t11=>int)block)*(('t18=>'t12=>bool,'t18=>'t12=>int)block))*((('t13=>'t20=>bool,'t13=>'t20=>int)block)*(('t14=>'t22=>bool,'t14=>'t22=>int)block)))=>('t16*'t18)=>('t20*'t22)=>bool"

  height:: "(((('t11*'t12)=>('t13*'t14)=>bool,('t11*'t12)=>('t13*'t14)=>int)block)*((('t16=>'t11=>bool,'t16=>'t11=>int)block)*(('t18=>'t12=>bool,'t18=>'t12=>int)block))*((('t13=>'t20=>bool,'t13=>'t20=>int)block)*(('t14=>'t22=>bool,'t14=>'t22=>int)block)))=>('t16*'t18)=>('t20*'t22)=>int"
  width:: "(((('t11*'t12)=>('t13*'t14)=>bool,('t11*'t12)=>('t13*'t14)=>int)block)*((('t16=>'t11=>bool,'t16=>'t11=>int)block)*(('t18=>'t12=>bool,'t18=>'t12=>int)block))*((('t13=>'t20=>bool,'t13=>'t20=>int)block)*(('t14=>'t22=>bool,'t14=>'t22=>int)block)))=>('t16*'t18)=>('t20*'t22)=>int"
  surround:: "((((('t11*'t12)=>('t13*'t14)=>bool,('t11*'t12)=>('t13*'t14)=>int)block)*((('t16=>'t11=>bool,'t16=>'t11=>int)block)*(('t18=>'t12=>bool,'t18=>'t12=>int)block))*((('t13=>'t20=>bool,'t13=>'t20=>int)block)*(('t14=>'t22=>bool,'t14=>'t22=>int)block)))=>('t16*'t18)=>('t20*'t22)=>bool, (((('t11*'t12)=>('t13*'t14)=>bool,('t11*'t12)=>('t13*'t14)=>int)block)*((('t16=>'t11=>bool,'t16=>'t11=>int)block)*(('t18=>'t12=>bool,'t18=>'t12=>int)block))*((('t13=>'t20=>bool,'t13=>'t20=>int)block)*(('t14=>'t22=>bool,'t14=>'t22=>int)block)))=>('t16*'t18)=>('t20*'t22)=>int)block"

defs<br />

struct_def: "struct == % (A, (B, C), (D, E)) (l, t) (b, r). EX (l2::’t11) (t2::’<br />

t12) (b2::’t13) (r2::’t14). Def (l ;;; B ;;; l2) & Def (t ;;; C ;;; t2) & Def<br />

((l2, t2) ;;; A ;;; (b2, r2)) & Def (b2 ;;; D ;;; b) & Def (r2 ;;; E ;;; r)"<br />

height_def: "height == % (A, (B, C), (D, E)) (l, t) (b, r).<br />

let l2 = (THE (l2::’t11). EX (t2::’t12) (b2::’t13) (r2::’t14). Def (l ;;; B<br />

;;; l2) & Def (t ;;; C ;;; t2) & Def ((l2, t2) ;;; A ;;; (b2, r2)) & Def<br />

(b2 ;;; D ;;; b) & Def (r2 ;;; E ;;; r)) in


APPENDIX C. PLACED COMBINATOR LIBRARIES 253<br />

let t2 = (THE (t2::’t12). EX (l2::’t11) (b2::’t13) (r2::’t14). Def (l ;;; B<br />

;;; l2) & Def (t ;;; C ;;; t2) & Def ((l2, t2) ;;; A ;;; (b2, r2)) & Def<br />

(b2 ;;; D ;;; b) & Def (r2 ;;; E ;;; r)) in<br />

let b2 = (THE (b2::’t13). EX (l2::’t11) (t2::’t12) (r2::’t14). Def (l ;;; B<br />

;;; l2) & Def (t ;;; C ;;; t2) & Def ((l2, t2) ;;; A ;;; (b2, r2)) & Def<br />

(b2 ;;; D ;;; b) & Def (r2 ;;; E ;;; r)) in<br />

let r2 = (THE (r2::’t14). EX (l2::’t11) (t2::’t12) (b2::’t13). Def (l ;;; B<br />

;;; l2) & Def (t ;;; C ;;; t2) & Def ((l2, t2) ;;; A ;;; (b2, r2)) & Def<br />

(b2 ;;; D ;;; b) & Def (r2 ;;; E ;;; r)) in<br />

max ((Height (b2 ;;; D ;;; b)) + (Height (r2 ;;; E ;;; r))) ((max (Height (b2<br />

;;; D ;;; b)) ((max ((Height (b2 ;;; D ;;; b)) + (Height ((l2, t2) ;;; A<br />

;;; (b2, r2)))) ((max ((max ((Height (b2 ;;; D ;;; b)) + (Height (r2 ;;;<br />

E ;;; r))) ((Height (b2 ;;; D ;;; b)) + (Height ((l2, t2) ;;; A ;;; (b2,<br />

r2))))) + (Height (t ;;; C ;;; t2))) ((Height (b2 ;;; D ;;; b)) + (<br />

Height (l ;;; B ;;; l2)))))))))"<br />

width_def: "width == % (A, (B, C), (D, E)) (l, t) (b, r).<br />

let l2 = (THE (l2::’t11). EX (t2::’t12) (b2::’t13) (r2::’t14). Def (l ;;; B<br />

;;; l2) & Def (t ;;; C ;;; t2) & Def ((l2, t2) ;;; A ;;; (b2, r2)) & Def<br />

(b2 ;;; D ;;; b) & Def (r2 ;;; E ;;; r)) in<br />

let t2 = (THE (t2::’t12). EX (l2::’t11) (b2::’t13) (r2::’t14). Def (l ;;; B<br />

;;; l2) & Def (t ;;; C ;;; t2) & Def ((l2, t2) ;;; A ;;; (b2, r2)) & Def<br />

(b2 ;;; D ;;; b) & Def (r2 ;;; E ;;; r)) in<br />

let b2 = (THE (b2::’t13). EX (l2::’t11) (t2::’t12) (r2::’t14). Def (l ;;; B<br />

;;; l2) & Def (t ;;; C ;;; t2) & Def ((l2, t2) ;;; A ;;; (b2, r2)) & Def<br />

(b2 ;;; D ;;; b) & Def (r2 ;;; E ;;; r)) in<br />

let r2 = (THE (r2::’t14). EX (l2::’t11) (t2::’t12) (b2::’t13). Def (l ;;; B<br />

;;; l2) & Def (t ;;; C ;;; t2) & Def ((l2, t2) ;;; A ;;; (b2, r2)) & Def<br />

(b2 ;;; D ;;; b) & Def (r2 ;;; E ;;; r)) in<br />

max ((Width (l ;;; B ;;; l2)) + (Width ((l2, t2) ;;; A ;;; (b2, r2))) + (<br />

Width (r2 ;;; E ;;; r))) ((max ((Width (l ;;; B ;;; l2)) + (Width (b2<br />

;;; D ;;; b))) ((max ((Width (l ;;; B ;;; l2)) + (Width ((l2, t2) ;;; A<br />

;;; (b2, r2)))) ((max ((Width (l ;;; B ;;; l2)) + (Width (t ;;; C ;;; t2<br />

))) (Width (l ;;; B ;;; l2))))))))"<br />

surround_def: "surround == (| Def = struct, Height = height, Width = width|)"<br />

declare width_def [simp]<br />

declare height_def [simp]<br />

declare struct_def [simp]<br />

section {* Validity of width and height functions *}

theorem height_ge0_int : "!! (A::((('t11*'t12)=>('t13*'t14)=>bool,('t11*'t12)=>('t13
*'t14)=>int)block)) (B::(('t16=>'t11=>bool,'t16=>'t11=>int)block)) (C::(('t18=>'
t12=>bool,'t18=>'t12=>int)block)) (D::(('t13=>'t20=>bool,'t13=>'t20=>int)block))
(E::(('t14=>'t22=>bool,'t14=>'t22=>int)block)) (l::'t16) (t::'t18) (b::'t20) (r
::'t22). [| ALL (qs32::('t11*'t12)) (qs33::('t13*'t14)). 0



theorem width_ge0_int : "!! (A::((('t11*'t12)=>('t13*'t14)=>bool,('t11*'t12)=>('t13*'
t14)=>int)block)) (B::(('t16=>'t11=>bool,'t16=>'t11=>int)block)) (C::(('t18=>'
t12=>bool,'t18=>'t12=>int)block)) (D::(('t13=>'t20=>bool,'t13=>'t20=>int)block))
(E::(('t14=>'t22=>bool,'t14=>'t22=>int)block)) (l::'t16) (t::'t18) (b::'t20) (r
::'t22). [| ALL (qs32::('t11*'t12)) (qs33::('t13*'t14)). 0 bool,'t18=>'t12=>int)block)) (D::(('t13=>'t20=>bool,'t13=>'t20=>int)
block)) (E::(('t14=>'t22=>bool,'t14=>'t22=>int)block)) (l::'t16) (t::'t18) (b::'
t20) (r::'t22). [| ALL (qs32::('t11*'t12)) (qs33::('t13*'t14)). 0 bool,'t18=>'t12=>int)block)) (D::(('t13=>'t20=>bool,'t13=>'t20=>int)block))
(E::(('t14=>'t22=>bool,'t14=>'t22=>int)block)) (l::'t16) (t::'t18) (b::'t20) (r
::'t22). [| ALL (qs32::('t11*'t12)) (qs33::('t13*'t14)). 0
(rule width_ge0_int, (simp+)?)?)
done

section {* Containment theorems *}

theorem "!! (A::((('t11*'t12)=>('t13*'t14)=>bool,('t11*'t12)=>('t13*'t14)=>int)block
)) (B::(('t16=>'t11=>bool,'t16=>'t11=>int)block)) (C::(('t18=>'t12=>bool,'t18=>'
t12=>int)block)) (D::(('t13=>'t20=>bool,'t13=>'t20=>int)block)) (E::(('t14=>'t22
=>bool,'t14=>'t22=>int)block)) (l::'t16) (t::'t18) (b::'t20) (r::'t22). [| Def (
l ;;; B ;;; l2) ; Def (t ;;; C ;;; t2) ; Def ((l2, t2) ;;; A ;;; (b2, r2)) ; Def
(b2 ;;; D ;;; b) ; Def (r2 ;;; E ;;; r) ; ALL (qs32::('t11*'t12)) (qs33::('t13
*'t14)). 0 't20=>bool,'t13=>'t20=>int)block)) (E::(('t14=>'t22
=>bool,'t14=>'t22=>int)block)) (l::'t16) (t::'t18) (b::'t20) (r::'t22). [| Def (
l ;;; B ;;; l2) ; Def (t ;;; C ;;; t2) ; Def ((l2, t2) ;;; A ;;; (b2, r2)) ; Def
(b2 ;;; D ;;; b) ; Def (r2 ;;; E ;;; r) ; ALL (qs32::('t11*'t12)) (qs33::('t13
*'t14)). 0
) ;;; A ;;; (b2, r2)))) ((Height (b2 ;;; D ;;; b)) + (Height (r2 ;;; E ;;; r)
))) + (Height (t ;;; C ;;; t2))) ('t13*'t14)=>bool,('t11*'t12)=>('t13*'t14)=>int)block
)) (B::(('t16=>'t11=>bool,'t16=>'t11=>int)block)) (C::(('t18=>'t12=>bool,'t18=>'
t12=>int)block)) (D::(('t13=>'t20=>bool,'t13=>'t20=>int)block)) (E::(('t14=>'t22
=>bool,'t14=>'t22=>int)block)) (l::'t16) (t::'t18) (b::'t20) (r::'t22). [| Def (
l ;;; B ;;; l2) ; Def (t ;;; C ;;; t2) ; Def ((l2, t2) ;;; A ;;; (b2, r2)) ; Def
(b2 ;;; D ;;; b) ; Def (r2 ;;; E ;;; r) ; ALL (qs32::('t11*'t12)) (qs33::('t13
*'t14)). 0 't20=>bool,'t13=>'t20=>int)block)) (E::(('t14=>'t22
=>bool,'t14=>'t22=>int)block)) (l::'t16) (t::'t18) (b::'t20) (r::'t22). [| Def (
l ;;; B ;;; l2) ; Def (t ;;; C ;;; t2) ; Def ((l2, t2) ;;; A ;;; (b2, r2)) ; Def
(b2 ;;; D ;;; b) ; Def (r2 ;;; E ;;; r) ; ALL (qs32::('t11*'t12)) (qs33::('t13
*'t14)). 0
(0 int)block
)) (B::(('t16=>'t11=>bool,'t16=>'t11=>int)block)) (C::(('t18=>'t12=>bool,'t18=>'
t12=>int)block)) (D::(('t13=>'t20=>bool,'t13=>'t20=>int)block)) (E::(('t14=>'t22
=>bool,'t14=>'t22=>int)block)) (l::'t16) (t::'t18) (b::'t20) (r::'t22). [| Def (
l ;;; B ;;; l2) ; Def (t ;;; C ;;; t2) ; Def ((l2, t2) ;;; A ;;; (b2, r2)) ; Def
(b2 ;;; D ;;; b) ; Def (r2 ;;; E ;;; r) ; ALL (qs32::('t11*'t12)) (qs33::('t13
*'t14)). 0 't20=>bool,'t13=>'t20=>int)block)) (E::(('t14=>'t22
=>bool,'t14=>'t22=>int)block)) (l::'t16) (t::'t18) (b::'t20) (r::'t22). [| Def (
l ;;; B ;;; l2) ; Def (t ;;; C ;;; t2) ; Def ((l2, t2) ;;; A ;;; (b2, r2)) ; Def
(b2 ;;; D ;;; b) ; Def (r2 ;;; E ;;; r) ; ALL (qs32::('t11*'t12)) (qs33::('t13
*'t14)). 0
; ALL (qs38::'t13) (qs39::'t20). 0 't11=>int)block)) (C::(('t18=>'t12=>bool,'t18=>'
t12=>int)block)) (D::(('t13=>'t20=>bool,'t13=>'t20=>int)block)) (E::(('t14=>'t22
=>bool,'t14=>'t22=>int)block)) (l::'t16) (t::'t18) (b::'t20) (r::'t22). [| Def (
l ;;; B ;;; l2) ; Def (t ;;; C ;;; t2) ; Def ((l2, t2) ;;; A ;;; (b2, r2)) ; Def
(b2 ;;; D ;;; b) ; Def (r2 ;;; E ;;; r) ; ALL (qs32::('t11*'t12)) (qs33::('t13
*'t14)). 0 't22
=>bool,'t14=>'t22=>int)block)) (l::'t16) (t::'t18) (b::'t20) (r::'t22). [| Def (
l ;;; B ;;; l2) ; Def (t ;;; C ;;; t2) ; Def ((l2, t2) ;;; A ;;; (b2, r2)) ; Def
(b2 ;;; D ;;; b) ; Def (r2 ;;; E ;;; r) ; ALL (qs32::('t11*'t12)) (qs33::('t13
*'t14)). 0
((((Width (l ;;; B ;;; l2)) + (Width ((l2, t2) ;;; A ;;; (b2, r2)))) int)block)) (D::(('t13=>'t20=>bool,'t13=>'t20=>int)block)) (E::(('t14=>'t22
=>bool,'t14=>'t22=>int)block)) (l::'t16) (t::'t18) (b::'t20) (r::'t22). [| Def (
l ;;; B ;;; l2) ; Def (t ;;; C ;;; t2) ; Def ((l2, t2) ;;; A ;;; (b2, r2)) ; Def
(b2 ;;; D ;;; b) ; Def (r2 ;;; E ;;; r) ; ALL (qs32::('t11*'t12)) (qs33::('t13
*'t14)). 0



by auto

end

C.5 H-Tree

block htree (int n, block R ('a, 'a) ~ 'a) ('a i[m]) ~ ('a o)
{
  const m = 2 ** n.
  'a st1_in[m/2], st2_in[m/2], st1_out, st2_out.
  if n == 0 {
    o = i[0].
  } else {
    i ; half (m/2) ; (st1_in, st2_in)
      at (0,0).
    if (n mod 2 == 0) {
      // Vertical sub-tree arrangement
      st1_in ; htree (n-1, R) ; st1_out
        at (0,0).
      (st1_out, st2_out) ; R ; o
        at (0, height(st1_in ; htree (n-1, R) ; st1_out)).
      st2_in ; htree (n-1, R) ; st2_out
        at (0, height(st1_in ; htree (n-1, R) ; st1_out) + height((st1_out, st2_out) ; R ; o)).
    } else {
      // Horizontal sub-tree arrangement
      st1_in ; htree (n-1, R) ; st1_out
        at (0,0).
      (st1_out, st2_out) ; R ; o
        at (width(st1_in ; htree (n-1, R) ; st1_out), 0).
      st2_in ; htree (n-1, R) ; st2_out
        at (width(st1_in ; htree (n-1, R) ; st1_out) + width((st1_out, st2_out) ; R ; o), 0).
    } .
  } .
}
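The recursion alternates vertical and horizontal composition with the parity of n. Under two simplifying assumptions (the half wiring block occupies no area, and both sub-trees have the extent of htree(n-1)), the overall layout size can be sketched in Python; the function and the default R size of 1 x 1 are illustrative, not part of the thesis.

```python
def htree_size(n, r=(1, 1)):
    """(width, height) of the placed H-tree with 2**n leaves.

    Mirrors the alternating placement above under two simplifying
    assumptions: the 'half' wiring block occupies no area, and both
    sub-trees share the extent of htree(n-1).  r is the (width,
    height) of the combining block R.
    """
    if n == 0:
        return (0, 0)  # base case: o = i[0], no placed logic
    sw, sh = htree_size(n - 1, r)
    rw, rh = r
    if n % 2 == 0:
        # vertical arrangement: sub-tree, R, sub-tree stacked in y
        return (max(sw, rw), sh + rh + sh)
    else:
        # horizontal arrangement: sub-tree, R, sub-tree side by side in x
        return (sw + rw + sw, max(sh, rh))
```

With a 1 x 1 combining block the bounding box grows by roughly a factor of two in one dimension every level, alternating axes, which is the characteristic H-tree shape.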

theory htree = half:

section {* Function definitions *}

consts
  struct:: "nat=>((('t20*'t20)=>'t20=>bool,('t20*'t20)=>'t20=>int)block)=>('t20)vector=>'t20=>bool"
  height:: "nat=>((('t20*'t20)=>'t20=>bool,('t20*'t20)=>'t20=>int)block)=>('t20)vector=>'t20=>int"
  width:: "nat=>((('t20*'t20)=>'t20=>bool,('t20*'t20)=>'t20=>int)block)=>('t20)vector=>'t20=>int"
  htree:: "((int*((('t20*'t20)=>'t20=>bool,('t20*'t20)=>'t20=>int)block))=>('t20)vector=>'t20=>bool, (int*((('t20*'t20)=>'t20=>bool,('t20*'t20)=>'t20=>int)block))=>('t20)vector=>'t20=>int)block"

defs
  htree_def: "htree == (| Def = % (n, R) i o_. struct (int2nat n) R i o_, Height = % (n, R) i o_. height (int2nat n) R i o_, Width = % (n, R) i o_. width (int2nat n) R i o_|)"



primrec
  "struct 0 R i o_ = (o_ = i)"
  "struct (Suc n) R i o_ = (let m = (2 pwr (int (Suc n))) in
    EX (st1_in::('t20)vector) (st2_in::('t20)vector) (st1_out::'t20) (st2_out::'t20).
      Def (i ;;; half $ (m div 2) ;;; (st1_in, st2_in)) &
      (if ((Suc n) mod 2) = 0 then (struct n R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct n R st2_in st2_out)
       else (struct n R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct n R st2_in st2_out)))"

primrec
  "height 0 R i o_ = 0"
  "height (Suc n) R i o_ = (let m = (2 pwr (int (Suc n))) in
    let st1_in = (THE (st1_in::('t20)vector). EX (st2_in::('t20)vector) (st1_out::'t20) (st2_out::'t20).
      Def (i ;;; half $ (m div 2) ;;; (st1_in, st2_in)) &
      (if ((Suc n) mod 2) = 0 then (struct n R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct n R st2_in st2_out)
       else (struct n R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct n R st2_in st2_out))) in
    let st2_in = (THE (st2_in::('t20)vector). EX (st1_in::('t20)vector) (st1_out::'t20) (st2_out::'t20).
      Def (i ;;; half $ (m div 2) ;;; (st1_in, st2_in)) &
      (if ((Suc n) mod 2) = 0 then (struct n R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct n R st2_in st2_out)
       else (struct n R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct n R st2_in st2_out))) in
    let st1_out = (THE (st1_out::'t20). EX (st1_in::('t20)vector) (st2_in::('t20)vector) (st2_out::'t20).
      Def (i ;;; half $ (m div 2) ;;; (st1_in, st2_in)) &
      (if ((Suc n) mod 2) = 0 then (struct n R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct n R st2_in st2_out)
       else (struct n R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct n R st2_in st2_out))) in
    let st2_out = (THE (st2_out::'t20). EX (st1_in::('t20)vector) (st2_in::('t20)vector) (st1_out::'t20).
      Def (i ;;; half $ (m div 2) ;;; (st1_in, st2_in)) &
      (if ((Suc n) mod 2) = 0 then (struct n R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct n R st2_in st2_out)
       else (struct n R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct n R st2_in st2_out))) in
    max (Height (i ;;; half $ (m div 2) ;;; (st1_in, st2_in))) (if ((Suc n) mod 2) = 0 then max ((height n R st1_in st1_out) + (Height ((st1_out, st2_out) ;;; R ;;; o_)) + (height n R st2_in st2_out)) ((max ((height n R st1_in st1_out) + (Height ((st1_out, st2_out) ;;; R ;;; o_))) (height n R st1_in st1_out))) else max (height n R st2_in st2_out) ((max (Height ((st1_out, st2_out) ;;; R ;;; o_)) (height n R st1_in st1_out))))
  )"

primrec
  "width 0 R i o_ = 0"
  "width (Suc n) R i o_ = (let m = (2 pwr (int (Suc n))) in
    let st1_in = (THE (st1_in::('t20)vector). EX (st2_in::('t20)vector) (st1_out::'t20) (st2_out::'t20).
      Def (i ;;; half $ (m div 2) ;;; (st1_in, st2_in)) &
      (if ((Suc n) mod 2) = 0 then (struct n R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct n R st2_in st2_out)
       else (struct n R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct n R st2_in st2_out))) in
    let st2_in = (THE (st2_in::('t20)vector). EX (st1_in::('t20)vector) (st1_out::'t20) (st2_out::'t20).
      Def (i ;;; half $ (m div 2) ;;; (st1_in, st2_in)) &
      (if ((Suc n) mod 2) = 0 then (struct n R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct n R st2_in st2_out)
       else (struct n R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct n R st2_in st2_out))) in
    let st1_out = (THE (st1_out::'t20). EX (st1_in::('t20)vector) (st2_in::('t20)vector) (st2_out::'t20).
      Def (i ;;; half $ (m div 2) ;;; (st1_in, st2_in)) &
      (if ((Suc n) mod 2) = 0 then (struct n R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct n R st2_in st2_out)
       else (struct n R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct n R st2_in st2_out))) in
    let st2_out = (THE (st2_out::'t20). EX (st1_in::('t20)vector) (st2_in::('t20)vector) (st1_out::'t20).
      Def (i ;;; half $ (m div 2) ;;; (st1_in, st2_in)) &
      (if ((Suc n) mod 2) = 0 then (struct n R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct n R st2_in st2_out)
       else (struct n R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct n R st2_in st2_out))) in
    max (Width (i ;;; half $ (m div 2) ;;; (st1_in, st2_in))) (if ((Suc n) mod 2) = 0 then max ((width n R st1_in st1_out) + (Width ((st1_out, st2_out) ;;; R ;;; o_)) + (width n R st2_in st2_out)) ((max ((width n R st1_in st1_out) + (Width ((st1_out, st2_out) ;;; R ;;; o_))) (width n R st1_in st1_out))) else max (width n R st2_in st2_out) ((max (Width ((st1_out, st2_out) ;;; R ;;; o_)) (width n R st1_in st1_out))))
  )"

section {* Validity of width and height functions *}

theorem height_ge0_int [rule_format]: "!! (n::nat) (R::((('t20*'t20)=>'t20=>bool,('
t20*'t20)=>'t20=>int)block)). [| ALL (qs67::('t20*'t20)) (qs68::'t20). 0
ALL i o_. 0 't20=>bool,('t20
*'t20)=>'t20=>int)block)). [| ALL (qs67::('t20*'t20)) (qs68::'t20). 0



apply (induct_tac n)
apply (auto intro: z_aleq_bc half.height_ge0 half.width_ge0 simp add: Let_def max_def half_def)
done

section {* Additional simplification rules for different representations *}

theorem height_ge0: "!! (n::int) (R::((('t20*'t20)=>'t20=>bool,('t20*'t20)=>'t20=>int
)block)) (i::('t20)vector) (o_::'t20). [| ALL (qs67::('t20*'t20)) (qs68::'t20).
0 bool,('t20*'t20)=>'t20=>int)
block)) (i::('t20)vector) (o_::'t20). [| ALL (qs67::('t20*'t20)) (qs68::'t20). 0
bool,('t20*'t20)=>'t20=>int)block)) (i
::('t20)vector) (o_::'t20). [| if n = 0 then o_ = i else Def (i ;;; half $ (m
div 2) ;;; (st1_in, st2_in)) & (if (n mod 2) = 0 then (struct (int2nat (n - 1))
R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct (int2nat (n
- 1)) R st2_in st2_out) else (struct (int2nat (n - 1)) R st1_in st1_out) & Def
((st1_out, st2_out) ;;; R ;;; o_) & (struct (int2nat (n - 1)) R st2_in st2_out))
; m = (2 pwr n) ; ALL (qs67::('t20*'t20)) (qs68::'t20). 0



0) & ((0::int)
st2_out) ;;; R ;;; o_)) + (width (int2nat (n - 1)) R st2_in st2_out)) ((max
((width (int2nat (n - 1)) R st1_in st1_out) + (Width ((st1_out, st2_out) ;;;
R ;;; o_))) (width (int2nat (n - 1)) R st1_in st1_out)))))) & ((0 + (height (
int2nat (n - 1)) R st1_in st1_out))



theorem "!! (n::int) (R::((('t20*'t20)=>'t20=>bool,('t20*'t20)=>'t20=>int)block)) (i
::('t20)vector) (o_::'t20). [| if n = 0 then o_ = i else Def (i ;;; half $ (m
div 2) ;;; (st1_in, st2_in)) & (if (n mod 2) = 0 then (struct (int2nat (n - 1))
R st1_in st1_out) & Def ((st1_out, st2_out) ;;; R ;;; o_) & (struct (int2nat (n
- 1)) R st2_in st2_out) else (struct (int2nat (n - 1)) R st1_in st1_out) & Def
((st1_out, st2_out) ;;; R ;;; o_) & (struct (int2nat (n - 1)) R st2_in st2_out))
; m = (2 pwr n) ; ALL (qs67::('t20*'t20)) (qs68::'t20). 0


APPENDIX C. PLACED COMBINATOR LIBRARIES 267<br />

& ((((width (int2nat (n - 1)) R st1_in st1_out) + (Width ((st1_out, st2_out) ;;;<br />

R ;;; o_)))


Appendix D

Circuit Layout Case Studies

This appendix contains the Quartz descriptions and layout verification proofs for a number of
full hardware designs for the Xilinx Virtex II FPGA architecture. Proof scripts for hardware
primitives are sometimes omitted, although proof scripts and theorems are generated for them
like any other block (their structures are simply empty). Useful layout reasoning takes place
at the level of half-slices, which have size 1 × 1.

D.1 Median Filter

D.1.1 Quartz Description

/** Median filter in Quartz.
    @author Oliver Pell
    1-D median filter implemented as a state machine with single serial input.
*/
directive vhdl "target:virtex2".
directive vhdl "include:ieee_header".
#include "p_prelude.qtz"

/* Primitives */
block or2 (wire a, wire b) ~ (wire c) attributes { height=1. width=1. }{ }
block and2 (wire a, wire b) ~ (wire c) attributes { height=1. width=1. }{ }
block and3 (wire a, wire b, wire c) ~ (wire d) attributes { height=1. width=1. }{ }
block fd (wire c) (wire d) ~ (wire q) attributes { height=1. width=1. }{ }
block inv (wire a) ~ (wire b) attributes { height=1. width=1. }{ }
block mux_lut (wire s) (wire d0, wire d1) ~ (wire o) attributes { height=1. width=1. }{ }
block mux_lut_ff (wire clk) (wire s) (wire d0, wire d1) ~ (wire o) attributes { height=1. width=1. }{ }
block comp_lut ((wire a, wire b), wire s) ~ (wire o) attributes { height=1. width=1. }{ }


block mux (int b) (wire s) (wire d0[b], wire d1[b]) ~ (wire o[b]) {
    int j.
    for j = 0..b-1 {
        (d0[j], d1[j]) ; mux_lut s ; (o[j]) at (0, j).
    }.
}

block mux_ff (wire clk) (int b) (wire s) (wire d0[b], wire d1[b]) ~ (wire o[b]) {
    int j.
    for j = 0..b-1 {
        (d0[j], d1[j]) ; mux_lut_ff clk s ; (o[j]) at (0, j).
    }.
}

block max2 (int bits) (wire a[bits], wire b[bits]) ~ (wire c[bits]) {
    wire a_geq_b[bits+1].
    int j.
    a_geq_b[0] = true.
    for j = 0..bits-1 {
        ((a[j], b[j]), a_geq_b[j]) ; comp_lut ; a_geq_b[j+1] at (0, j).
    }.
    (b, a) ; mux bits (a_geq_b[bits]) ; c at (1, 0).
}

block min2 (int bits) (wire a[bits], wire b[bits]) ~ (wire c[bits]) {
    wire a_geq_b[bits+1].
    int j.
    a_geq_b[0] = true.
    for j = 0..bits-1 {
        ((a[j], b[j]), a_geq_b[j]) ; comp_lut ; a_geq_b[j+1] at (0, j).
    }.
    (a, b) ; mux bits (a_geq_b[bits]) ; c at (1, 0).
}

block eq (int n) (wire a[n], wire b[n]) ~ (wire c) {
    int j.
    wire match[n+1].
    match[0] = true.
    for j = 0..n-1 {
        (match[j], a[j], b[j]) ; and3 ; (match[j+1]) at (0, j).
    }.
    c = match[n].
}

/** Insertion sort */
block insert (int bits) (int n) ('t a, 't b[n]) ~ ('t c[n+1]) ->
    row (n, fork ; [min2 bits, max2 bits]) ; apr n.
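The insert combinator threads the new value through a row of min2/max2 cells: each cell keeps the smaller of (carry, element) in its column and passes the larger on as the next carry, and apr appends the final carry. A minimal behavioural sketch in Python (not part of the thesis; insert_sorted is a hypothetical name):

```python
def insert_sorted(a, b):
    """Insert value a into the ascending sorted list b, mirroring
    row (n, fork ; [min2 bits, max2 bits]) ; apr n."""
    out = []
    carry = a
    for x in b:
        out.append(min(carry, x))   # min2: value that stays in this column
        carry = max(carry, x)       # max2: value carried to the next cell
    return out + [carry]            # apr: append the final carry
```

For a sorted input of length n this yields a sorted output of length n + 1, which is why chaining these blocks gives an insertion sorter.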

block lct_cell (int bits) ((wire f, 't a), 't s) ~ (wire d, (wire f2, 't a2)) {
    wire a_eq_s.
    a2 = a.
    d = f.
    (a, s) ; eq bits ; a_eq_s at (0, height((a_eq_s, f) ; or2 ; f2)).
    (a_eq_s, f) ; or2 ; f2 at (0, 0).
}


APPENDIX D. CIRCUIT LAYOUT CASE STUDIES 270<br />

/∗∗ Locater block determines which value <strong>of</strong> the state should be discarded.<br />

@input bits Number <strong>of</strong> input bits<br />

@input n Size <strong>of</strong> state array<br />

@input a Value to look for<br />

@input s State array to look in<br />

@output d Array <strong>of</strong> true|false values to control the mode for the compactor<br />

∗/<br />

block locater (int bits) (int n) (‘t a, ‘t s[n+1]) ∼ (wire d[n+1]) {<br />

wire found.<br />

found = false.<br />

((found, a), s) ;<br />

row (n+1, lct cell bits) ;<br />

pi1 ;<br />

d at (0,0).<br />

}<br />

/∗∗ Compactor cell. Operates in shift or through mode. Shift means we<br />

haven’t yet encountered the value to remove, through mode means we<br />

have. In shift mode we push the current value to the left and output<br />

the last value. In through mode we destroy the last value and output<br />

this value directly , which will also be done by all subsequent nodes<br />

∗/<br />

block del cell (wire clk) (int bits) (‘t x, (‘t y, wire mode)) ∼ (‘t m, ‘t n) {<br />

(y, x) ; mux bits mode ;n at (0,0).<br />

(x, y) ; mux ff clk bits mode ;m at (1,0).<br />

}<br />

/∗∗ Compactor block takes a state <strong>of</strong> size n+1 and discards the specified<br />

element to produce a new state <strong>of</strong> size n.<br />

@input n Desired size <strong>of</strong> state array<br />

@input s State array to look in<br />

@input d Array <strong>of</strong> true|false values for whether this index is the value to<br />

be removed<br />

@output s2 Output state<br />

∗/<br />

block compactor (wire clk) (int bits) (int n) (‘t s[n+1], wire d[n+1]) ∼ (‘t s2[n]) {<br />

(s [0], (s[n.. 1], d[n..1])) ;<br />

[id, zip 2] ;<br />

row (n, del cell clk bits) ;<br />

pi1 ;<br />

s2 at (0,0).<br />

}<br />
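Behaviourally, the locater row computes a flag per position ("a match occurred strictly earlier"), and the compactor uses those flags to squeeze one element out of the state. A Python sketch of that net effect (my own names; it ignores the bit-level wiring, the registers, and the s[n..1] index reversal):

```python
def locate(a, s):
    """Mimic the locater row of lct_cell blocks: d[i] is the flag entering
    cell i, i.e. True once an element equal to a has been seen earlier."""
    found, d = False, []
    for x in s:
        d.append(found)            # d = incoming flag f, before the or2
        found = found or (x == a)  # f2 = f OR (a == s[i])
    return d

def remove_first(a, s):
    """Net effect of locater + compactor: drop the first occurrence of a
    from the n+1 element state, keeping the remaining n in order.  If a
    never matches, every cell stays in shift mode and the last element
    is the one discarded."""
    d = locate(a, s)
    i = d.index(True) - 1 if True in d else len(s) - 1
    return s[:i] + s[i + 1:]
```

This is only a model of the combinational intent, under the stated assumptions, not of the clocked datapath.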

/** Take the median element */
block midelem (int n) ('t a[n]) ~ ('t b) {
    assert (n mod 2 == 1) "Can't take the middle element of an even number!".
    b = a[n / 2].
}

/** Insert new value into block, then extract the median */
block insert_median (int bits) (int n) ('t a, 't b[n]) ~ ('t c[n+1], 't d)
    -> insert bits n ; fork ; snd (midelem (n+1)).

/** Remove element a from state s to produce state s2. Throw away d */
block nextstate (wire clk) (int bits) (int n) ('t a, 't s[n+1]) ~ ('t s2[n], bool d) {
    wire control[n+1].
    (a, s) ; locater bits n ; control at (0, height((s, control) ; compactor clk bits n ; s2)).
    (s, control) ; compactor clk bits n ; s2 at (0, 0).
}

block filter_core (int n, int bits) (wire clk) (wire newval[bits], wire s[n][bits]) ~
        ('t s2[n][bits], 't median[bits]) ->
    fst (fork ; fst (rcomp (n, map (bits, fd clk)))) ;
    below (nextstate clk bits n, insert_median bits n) ;
    snd pi2.

/** Median filter, "n" + 1 size window for "bits"-bit values.
    @input n Window size
    @input bits Number of bits in data values
    @input clk Clock signal
    @input newval Current input value
    @output median Current output (median) value
*/
block filter (int n, int bits) (wire clk) ('t newval) ~ ('t median)
    -> loop (filter_core (n, bits) clk).
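End to end, the filter keeps a sliding window of recent samples, inserts each new sample, discards the oldest, and outputs the median of the window. A behavioural Python sketch of that input/output relation (not the thesis's formal model; it assumes the delay registers start at zero):

```python
def median_filter(samples, window):
    """Behavioural model of the filter block: output the median of the
    last `window` samples each cycle (window = n + 1, assumed odd)."""
    assert window % 2 == 1, "window must be odd to have a unique median"
    buf = [0] * (window - 1)   # assumed initial register contents
    out = []
    for x in samples:
        buf.append(x)
        out.append(sorted(buf[-window:])[window // 2])
    return out
```

In the hardware the window is never re-sorted from scratch: the state array is kept sorted, and each cycle one value is inserted and one removed, which is what the insert_median and nextstate blocks implement.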

D.1.2 Theory max2

theory max2 = mux + comp_lut:

section {* Function definitions *}

consts
  struct:: "int=>((wire)vector*(wire)vector)=>(wire)vector=>bool"
  height:: "int=>((wire)vector*(wire)vector)=>(wire)vector=>int"
  width:: "int=>((wire)vector*(wire)vector)=>(wire)vector=>int"
  max2:: "(int=>((wire)vector*(wire)vector)=>(wire)vector=>bool, int=>((wire)vector*(wire)vector)=>(wire)vector=>int)block"

defs
  struct_def: "struct == % bits (a, b) c. EX (a_geq_b::(wire)vector). (a_geq_b = bool2wire True) & (ALL (j::int). ((0

section {* Validity of width and height functions *}

theorem height_ge0_int : "!! (bits::int) (a::(wire)vector) (b::(wire)vector) (c::(wire)vector). 0

((0::int)


D.1.3 Theory min2

theory min2 = mux + comp_lut:

section {* Function definitions *}

consts
  struct:: "int=>((wire)vector*(wire)vector)=>(wire)vector=>bool"
  height:: "int=>((wire)vector*(wire)vector)=>(wire)vector=>int"
  width:: "int=>((wire)vector*(wire)vector)=>(wire)vector=>int"
  min2:: "(int=>((wire)vector*(wire)vector)=>(wire)vector=>bool, int=>((wire)vector*(wire)vector)=>(wire)vector=>int)block"

defs
  struct_def: "struct == % bits (a, b) c. EX (a_geq_b::(wire)vector). (a_geq_b = bool2wire True) & (ALL (j::int). ((0

theorem height_ge0: "!! (bits::int) (a::(wire)vector) (b::(wire)vector) (c::(wire)vector). 0

;;; mux $ bits $ a_geq_b ;;; c) |] ==> ALL (j::int) (j'::int). ((0 ((wire)vector*(wire)vector)=>wire=>int)block"

defs
  struct_def: "struct == % n (a, b) c. EX (match::(wire)vector). (match = bool2wire True) & (ALL (j::int). ((0
j>, a, b) ;;; and3 ;;; match)) & (c = match)"
  height_def: "height == % n (a, b) c. let match = (THE (match::(wire)vector). (match = bool2wire True) & (ALL (j::int). ((0 ) ;;; and3 ;;; match)))) 0"
  width_def: "width == % n (a, b) c. let match = (THE (match::(wire)vector). (match = bool2wire True) & (ALL (j::int). ((0
qs480>, a, b) ;;; and3 ;;; match)))) 0)))"

apply (auto intro: sum_ge0 maxf_ge0 sum_ge0_frange maxf_ge0_frange sum_nsub1_plusf maxf_encloses and3.height_ge0 and3.width_ge0 simp add: and3_def)
apply (simp add: maxf_lamx_top)
done

section {* Intersection theorems *}

theorem "!! (n::int) (a::(wire)vector) (b::(wire)vector) (c::wire). [| match = bool2wire True ; ALL (j::int). ((0 ALL (j::int) (j'::int). ((0 int"
  width:: "int=>int=>((wire)vector*((wire)vector)vector)=>((wire)vector)vector=>int"
  insert:: "(int=>int=>((wire)vector*((wire)vector)vector)=>((wire)vector)vector=>bool, int=>int=>((wire)vector*((wire)vector)vector)=>((wire)vector)vector=>int)block"

defs
  struct_def: "struct == % bits n (a, b) c. Def ((a, b) ;;; row $ (n, fork ;; [[ min2 $ bits, max2 $ bits ]]) ;; apr $ n ;;; c)"
  height_def: "height == % bits n (a, b) c. Height ((a, b) ;;; row $ (n, fork ;; [[ min2 $ bits, max2 $ bits ]]) ;; apr $ n ;;; c)"


  width_def: "width == % bits n (a, b) c. Width ((a, b) ;;; row $ (n, fork ;; [[ min2 $ bits, max2 $ bits ]]) ;; apr $ n ;;; c)"
  insert_def: "insert == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]

section {* Validity of width and height functions *}

theorem height_ge0_int : "!! (bits::int) (n::int) (a::(wire)vector) (b::((wire)vector)vector) (c::((wire)vector)vector). 0

apply (auto intro: sum_ge0 maxf_ge0 sum_ge0_frange maxf_ge0_frange sum_nsub1_plusf maxf_encloses apr.height_ge0 apr.width_ge0 max2.height_ge0 max2.width_ge0 min2.height_ge0 min2.width_ge0 fork.height_ge0 fork.width_ge0 row.height_ge0 row.width_ge0)
done

end

D.1.6 Theory lct_cell

theory lct_cell = or2 + eq:

section {* Function definitions *}

consts
  struct:: "int=>((wire*(wire)vector)*(wire)vector)=>(wire*(wire*(wire)vector))=>bool"
  height:: "int=>((wire*(wire)vector)*(wire)vector)=>(wire*(wire*(wire)vector))=>int"
  width:: "int=>((wire*(wire)vector)*(wire)vector)=>(wire*(wire*(wire)vector))=>int"
  lct_cell:: "(int=>((wire*(wire)vector)*(wire)vector)=>(wire*(wire*(wire)vector))=>bool, int=>((wire*(wire)vector)*(wire)vector)=>(wire*(wire*(wire)vector))=>int)block"

defs
  struct_def: "struct == % bits ((f, a), s) (d, (f2, a2)). EX (a_eq_s::wire). (a2 = a) & (d = f) & Def ((a, s) ;;; eq $ bits ;;; a_eq_s) & Def ((a_eq_s, f) ;;; or2 ;;; f2)"
  height_def: "height == % bits ((f, a), s) (d, (f2, a2)). let a_eq_s = (THE (a_eq_s::wire). (a2 = a) & (d = f) & Def ((a, s) ;;; eq $ bits ;;; a_eq_s) & Def ((a_eq_s, f) ;;; or2 ;;; f2)) in max (Height ((a_eq_s, f) ;;; or2 ;;; f2)) ((max ((Height ((a_eq_s, f) ;;; or2 ;;; f2)) + (Height ((a, s) ;;; eq $ bits ;;; a_eq_s))) 0))"
  width_def: "width == % bits ((f, a), s) (d, (f2, a2)). let a_eq_s = (THE (a_eq_s::wire). (a2 = a) & (d = f) & Def ((a, s) ;;; eq $ bits ;;; a_eq_s) & Def ((a_eq_s, f) ;;; or2 ;;; f2)) in max (Width ((a_eq_s, f) ;;; or2 ;;; f2)) ((max (Width ((a, s) ;;; eq $ bits ;;; a_eq_s)) 0))"
  lct_cell_def: "lct_cell == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]

section {* Validity of width and height functions *}

theorem height_ge0_int : "!! (bits::int) (f::wire) (a::(wire)vector) (s::(wire)vector) (d::wire) (f2::wire) (a2::(wire)vector). 0

apply (auto intro: sum_ge0 maxf_ge0 sum_ge0_frange maxf_ge0_frange z_aleq_bc eq.height_ge0 eq.width_ge0 or2.height_ge0 or2.width_ge0 simp add: Let_def max_def)
done

section {* Additional simplification rules for different representations *}

theorem height_ge0: "!! (bits::int) (f::wire) (a::(wire)vector) (s::(wire)vector) (d::wire) (f2::wire) (a2::(wire)vector). 0


)) + (Height ((a, s) ;;; eq $ bits ;;; a_eq_s))) (wire)vector=>bool"
  height:: "int=>int=>((wire)vector*((wire)vector)vector)=>(wire)vector=>int"
  width:: "int=>int=>((wire)vector*((wire)vector)vector)=>(wire)vector=>int"
  locater:: "(int=>int=>((wire)vector*((wire)vector)vector)=>(wire)vector=>bool, int=>int=>((wire)vector*((wire)vector)vector)=>(wire)vector=>int)block"

defs
  struct_def: "struct == % bits n (a, s) d. EX (found::wire). (found = bool2wire False) & Def (((found, a), s) ;;; row $ (n + 1, lct_cell $ bits) ;; pi1 ;;; d)"
  height_def: "height == % bits n (a, s) d. let found = (THE (found::wire). (found = bool2wire False) & Def (((found, a), s) ;;; row $ (n + 1, lct_cell $ bits) ;; pi1 ;;; d)) in max (Height (((found, a), s) ;;; row $ (n + 1, lct_cell $ bits) ;; pi1 ;;; d)) 0"
  width_def: "width == % bits n (a, s) d. let found = (THE (found::wire). (found = bool2wire False) & Def (((found, a), s) ;;; row $ (n + 1, lct_cell $ bits) ;; pi1 ;;; d)) in max (Width (((found, a), s) ;;; row $ (n + 1, lct_cell $ bits) ;; pi1 ;;; d)) 0"
  locater_def: "locater == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]

section {* Validity of width and height functions *}

theorem height_ge0_int : "!! (bits::int) (n::int) (a::(wire)vector) (s::((wire)vector)vector) (d::(wire)vector). 0

done

theorem width_ge0_int : "!! (bits::int) (n::int) (a::(wire)vector) (s::((wire)vector)vector) (d::(wire)vector). 0 ((wire)vector*((wire)vector*wire))=>((wire)vector*(wire)vector)=>int"
  width:: "wire=>int=>((wire)vector*((wire)vector*wire))=>((wire)vector*(wire)vector)=>int"
  del_cell:: "(wire=>int=>((wire)vector*((wire)vector*wire))=>((wire)vector*(wire)vector)=>bool, wire=>int=>((wire)vector*((wire)vector*wire))=>((wire)vector*(wire)vector)=>int)block"

defs
  struct_def: "struct == % clk bits (x, (y, mode)) (m, n). Def ((y, x) ;;; mux $ bits $ mode ;;; n) & Def ((x, y) ;;; mux_ff $ clk $ bits $ mode ;;; m)"
  height_def: "height == % clk bits (x, (y, mode)) (m, n). max (Height ((x, y) ;;; mux_ff $ clk $ bits $ mode ;;; m)) (Height ((y, x) ;;; mux $ bits $ mode ;;; n))"
  width_def: "width == % clk bits (x, (y, mode)) (m, n). max (1 + (Width ((x, y) ;;; mux_ff $ clk $ bits $ mode ;;; m))) (Width ((y, x) ;;; mux $ bits $ mode ;;; n))"
  del_cell_def: "del_cell == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]

section {* Validity of width and height functions *}

theorem height_ge0_int : "!! (clk::wire) (bits::int) (x::(wire)vector) (y::(wire)vector) (mode::wire) (m::(wire)vector) (n::(wire)vector). 0


((0::int) int"
  width:: "wire=>int=>int=>(((wire)vector)vector*(wire)vector)=>((wire)vector)vector=>int"
  compactor:: "(wire=>int=>int=>(((wire)vector)vector*(wire)vector)=>((wire)vector)vector=>bool, wire=>int=>int=>(((wire)vector)vector*(wire)vector)=>((wire)vector)vector=>int)block"

defs
  struct_def: "struct == % clk bits n (s, d) s2. Def ((s, (s, d)) ;;; [[ id, zip $ 2 ]] ;; row $ (n, del_cell $ clk $ bits) ;; pi1 ;;; s2)"
  height_def: "height == % clk bits n (s, d) s2. Height ((s, (s, d)) ;;; [[ id, zip $ 2 ]] ;; row $ (n, del_cell $ clk $ bits) ;; pi1 ;;; s2)"
  width_def: "width == % clk bits n (s, d) s2. Width ((s, (s, d)) ;;; [[ id, zip $ 2 ]] ;; row $ (n, del_cell $ clk $ bits) ;; pi1 ;;; s2)"
  compactor_def: "compactor == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]

section {* Validity of width and height functions *}

theorem height_ge0_int : "!! (clk::wire) (bits::int) (n::int) (s::((wire)vector)vector) (d::(wire)vector) (s2::((wire)vector)vector). 0


((0::int) bool"
  height:: "int=>int=>((wire)vector*((wire)vector)vector)=>(((wire)vector)vector*(wire)vector)=>int"
  width:: "int=>int=>((wire)vector*((wire)vector)vector)=>(((wire)vector)vector*(wire)vector)=>int"
  insert_median:: "(int=>int=>((wire)vector*((wire)vector)vector)=>(((wire)vector)vector*(wire)vector)=>bool, int=>int=>((wire)vector*((wire)vector)vector)=>(((wire)vector)vector*(wire)vector)=>int)block"

defs
  struct_def: "struct == % bits n (a, b) (c, d). Def ((a, b) ;;; insert $ bits $ n ;; fork ;; snd $ (midelem $ (n + 1)) ;;; (c, d))"
  height_def: "height == % bits n (a, b) (c, d). Height ((a, b) ;;; insert $ bits $ n ;; fork ;; snd $ (midelem $ (n + 1)) ;;; (c, d))"
  width_def: "width == % bits n (a, b) (c, d). Width ((a, b) ;;; insert $ bits $ n ;; fork ;; snd $ (midelem $ (n + 1)) ;;; (c, d))"
  insert_median_def: "insert_median == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]

section {* Validity of width and height functions *}

theorem height_ge0_int : "!! (bits::int) (n::int) (a::(wire)vector) (b::((wire)vector)vector) (c::((wire)vector)vector) (d::(wire)vector). 0


theorem width_ge0_int : "!! (bits::int) (n::int) (a::(wire)vector) (b::((wire)vector)vector) (c::((wire)vector)vector) (d::(wire)vector). 0

  height:: "wire=>int=>int=>((wire)vector*((wire)vector)vector)=>(((wire)vector)vector*bool)=>int"
  width:: "wire=>int=>int=>((wire)vector*((wire)vector)vector)=>(((wire)vector)vector*bool)=>int"
  nextstate:: "(wire=>int=>int=>((wire)vector*((wire)vector)vector)=>(((wire)vector)vector*bool)=>bool, wire=>int=>int=>((wire)vector*((wire)vector)vector)=>(((wire)vector)vector*bool)=>int)block"

defs
  struct_def: "struct == % clk bits n (a, s) (s2, d). EX (control::(wire)vector). Def ((a, s) ;;; locater $ bits $ n ;;; control) & Def ((s, control) ;;; compactor $ clk $ bits $ n ;;; s2)"
  height_def: "height == % clk bits n (a, s) (s2, d). let control = (THE (control::(wire)vector). Def ((a, s) ;;; locater $ bits $ n ;;; control) & Def ((s, control) ;;; compactor $ clk $ bits $ n ;;; s2)) in max (Height ((s, control) ;;; compactor $ clk $ bits $ n ;;; s2)) ((Height ((s, control) ;;; compactor $ clk $ bits $ n ;;; s2)) + (Height ((a, s) ;;; locater $ bits $ n ;;; control)))"
  width_def: "width == % clk bits n (a, s) (s2, d). let control = (THE (control::(wire)vector). Def ((a, s) ;;; locater $ bits $ n ;;; control) & Def ((s, control) ;;; compactor $ clk $ bits $ n ;;; s2)) in max (Width ((s, control) ;;; compactor $ clk $ bits $ n ;;; s2)) (Width ((a, s) ;;; locater $ bits $ n ;;; control))"
  nextstate_def: "nextstate == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]

section {* Validity of width and height functions *}

theorem height_ge0_int : "!! (clk::wire) (bits::int) (n::int) (a::(wire)vector) (s::((wire)vector)vector) (s2::((wire)vector)vector) (d::bool). 0


theorem width_ge0: "!! (clk::wire) (bits::int) (n::int) (a::(wire)vector) (s::((wire)vector)vector) (s2::((wire)vector)vector) (d::bool). 0 ((0::int)

(rule allI)+, (case_tac "0 ((wire)vector*((wire)vector)vector)=>(((wire)vector)vector*(wire)vector)=>bool"
  height:: "(int*int)=>wire=>((wire)vector*((wire)vector)vector)=>(((wire)vector)vector*(wire)vector)=>int"
  width:: "(int*int)=>wire=>((wire)vector*((wire)vector)vector)=>(((wire)vector)vector*(wire)vector)=>int"
  filter_core:: "((int*int)=>wire=>((wire)vector*((wire)vector)vector)=>(((wire)vector)vector*(wire)vector)=>bool, (int*int)=>wire=>((wire)vector*((wire)vector)vector)=>(((wire)vector)vector*(wire)vector)=>int)block"

defs
  struct_def: "struct == % (n, bits) clk (newval, s) (s2, median). Def ((newval, s) ;;; fst $ (fork ;; fst $ (rcomp $ (n, map $ (bits, fd $ clk)))) ;; below $ (nextstate $ clk $ bits $ n, insert_median $ bits $ n) ;; snd $ (pi2) ;;; (s2, median))"
  height_def: "height == % (n, bits) clk (newval, s) (s2, median). Height ((newval, s) ;;; fst $ (fork ;; fst $ (rcomp $ (n, map $ (bits, fd $ clk)))) ;; below $ (nextstate $ clk $ bits $ n, insert_median $ bits $ n) ;; snd $ (pi2) ;;; (s2, median))"
  width_def: "width == % (n, bits) clk (newval, s) (s2, median). Width ((newval, s) ;;; fst $ (fork ;; fst $ (rcomp $ (n, map $ (bits, fd $ clk)))) ;; below $ (nextstate $ clk $ bits $ n, insert_median $ bits $ n) ;; snd $ (pi2) ;;; (s2, median))"
  filter_core_def: "filter_core == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]

section {* Validity of width and height functions *}

theorem height_ge0_int : "!! (n::int) (bits::int) (clk::wire) (newval::(wire)vector) (s::((wire)vector)vector) (s2::((wire)vector)vector) (median::(wire)vector). 0


bits $ n, insert_median $ bits $ n) ;; snd $ (pi2) ;;; (s2, median)))) (wire)vector=>(wire)vector=>bool"
  height:: "(int*int)=>wire=>(wire)vector=>(wire)vector=>int"
  width:: "(int*int)=>wire=>(wire)vector=>(wire)vector=>int"
  filter:: "((int*int)=>wire=>(wire)vector=>(wire)vector=>bool, (int*int)=>wire=>(wire)vector=>(wire)vector=>int)block"

defs
  struct_def: "struct == % (n, bits) clk newval median. Def (newval ;;; loop $ (filter_core $ (n, bits) $ clk) ;;; median)"
  height_def: "height == % (n, bits) clk newval median. Height (newval ;;; loop $ (filter_core $ (n, bits) $ clk) ;;; median)"
  width_def: "width == % (n, bits) clk newval median. Width (newval ;;; loop $ (filter_core $ (n, bits) $ clk) ;;; median)"
  filter_def: "filter == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]

section {* Validity of width and height functions *}

theorem height_ge0_int : "!! (n::int) (bits::int) (clk::wire) (newval::(wire)vector) (median::(wire)vector). 0

Let_def max_def)
done

section {* Additional simplification rules for different representations *}

theorem height_ge0: "!! (n::int) (bits::int) (clk::wire) (newval::(wire)vector) (median::(wire)vector). 0


    wire o2.
    i ; R ; o2 at (0, 0).
    o2 ; fd clk ; o at (0, 0).
}

block and3 (wire a, wire b, wire c) ~ (wire d)
    attributes { height = 1. width = 1. }{ }

block mux_lut (wire s) (wire d0, wire d1) ~ (wire o)
    attributes { height = 1. width = 1. }{ }

block gr_lut ((wire a, wire b), (wire is_gr, wire is_eq)) ~ (wire is_gr2)
    attributes { height = 1. width = 1. }{ }

block eq_lut ((wire a, wire b), wire is_eq) ~ (wire is_eq2)
    attributes { height = 1. width = 1. }{ }

block comp_elem ((wire a, wire b), (wire is_gr, wire is_eq)) ~ (wire is_gr2, wire is_eq2) {
    ((a, b), (is_gr, is_eq)) ; gr_lut ; is_gr2 at (0, 0).
    ((a, b), is_eq) ; eq_lut ; is_eq2 at (1, 0).
}

block comparator (int bits) (wire a[bits], wire b[bits]) ~ (wire a_gr_b) {
    wire zero, one.
    zero = false. one = true.
    ((a, b), (zero, one)) ; fst (zip 2) ; rdr (bits, comp_elem) ; pi1 ; a_gr_b at (0, 0).
}
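The comparator reduces the bit pairs with comp_elem cells, threading an (is_gr, is_eq) flag pair initialised to (false, true). A hedged Python sketch of that fold (my own function name; bits are taken most-significant first here, and the exact rdr direction of the hardware is not modelled):

```python
def greater(a_bits, b_bits):
    """Behavioural model of the comparator: a > b over equal-length
    bit vectors, most-significant bit first."""
    is_gr, is_eq = False, True            # (zero, one) initial pair
    for a, b in zip(a_bits, b_bits):
        is_gr = is_gr or (is_eq and a == 1 and b == 0)  # gr_lut cell
        is_eq = is_eq and (a == b)                      # eq_lut cell
    return is_gr
```

Once a higher-order bit differs, is_eq goes false and all later cells leave is_gr unchanged, which is why a single left-to-right pass suffices.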

/* Two input sorting circuit with output register */
block sort2 (wire clk) (int bits) (wire a[bits], wire b[bits]) ~
        (wire min_val[bits], wire max_val[bits]) {
    wire a_gr_b.
    (a, b) ; comparator (bits) ; a_gr_b at (0, 0).
    (a, b) ; zip 2 ; map (bits, register clk (mux_lut a_gr_b)) ; min_val at (width((a, b) ;
        comparator (bits) ; a_gr_b), 0).
    (b, a) ; zip 2 ; map (bits, register clk (mux_lut a_gr_b)) ; max_val at (width((a, b) ;
        comparator (bits) ; a_gr_b) + width((a, b) ; zip 2 ; map (bits, mux_lut a_gr_b) ;
        min_val), 0).
}

block vecpair ('a i[2]) ~ ('a o1, 'a o2) -> (o1, o2) = (i[0], i[1]).

/* Combinator describing an arbitrary butterfly network */
block butterfly (int n, block R ('a, 'a) ~ ('a, 'a)) ('a l[m]) ~ ('a r[m]) {
    const m = 2 ** n.
    l ; rcomp (n,
        riffle (m/2) ;
        pair (m/2) ;
        map (m/2, vecpair ; R ; converse (vecpair)) ;
        converse (pair (m/2))
    ) ; r.
}

/* Pipelined bitonic merger */
block merger (int n) (wire a[m/2], wire b[m/2]) ~ (wire c[m]) {
    const m = 2**n.
    (a, b) ; snd (rev (m/2)) ; converse (half (m/2)) ; butterfly (n, sort2) ; c at (0, 0).
}
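The merger reverses its second input to form a bitonic sequence and then sorts it with a butterfly of sort2 cells. A behavioural Python sketch of that butterfly (my own function name; assumes a power-of-two length and a bitonic input, and ignores the pipeline registers):

```python
def bitonic_merge(xs):
    """Behavioural model of the butterfly of sort2 cells: each stage
    compare-exchanges elements half the (sub)sequence apart, then the
    two halves are merged recursively."""
    n = len(xs)
    if n == 1:
        return xs
    h = n // 2
    lo = [min(xs[i], xs[i + h]) for i in range(h)]  # sort2 min outputs
    hi = [max(xs[i], xs[i + h]) for i in range(h)]  # sort2 max outputs
    return bitonic_merge(lo) + bitonic_merge(hi)
```

Applied to an ascending-then-descending (bitonic) input, this yields a fully sorted output, which is the standard bitonic-merge argument the hardware relies on.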

D.2.2 Theory comparator

theory comparator = pi1 + comp_elem + rdr + fst:

section {* Function definitions *}

consts
  struct:: "int=>((wire)vector*(wire)vector)=>wire=>bool"
  height:: "int=>((wire)vector*(wire)vector)=>wire=>int"
  width:: "int=>((wire)vector*(wire)vector)=>wire=>int"
  comparator:: "(int=>((wire)vector*(wire)vector)=>wire=>bool, int=>((wire)vector*(wire)vector)=>wire=>int)block"

defs
  struct_def: "struct == % bits (a, b) a_gr_b. EX (zero::wire) (one::wire). (zero = (bool2wire False)) & (one = (bool2wire True)) & Def (((a, b), (zero, one)) ;;; fst $ (zip $ 2) ;; rdr $ (bits, comp_elem) ;; pi1 ;;; a_gr_b)"
  height_def: "height == % bits (a, b) a_gr_b. let (zero, one) = (THE (zero::wire, one::wire). (zero = (bool2wire False)) & (one = (bool2wire True)) & Def (((a, b), (zero, one)) ;;; fst $ (zip $ 2) ;; rdr $ (bits, comp_elem) ;; pi1 ;;; a_gr_b)) in max (Height (((a, b), (zero, one)) ;;; fst $ (zip $ 2) ;; rdr $ (bits, comp_elem) ;; pi1 ;;; a_gr_b)) 0"
  width_def: "width == % bits (a, b) a_gr_b. let (zero, one) = (THE (zero::wire, one::wire). (zero = (bool2wire False)) & (one = (bool2wire True)) & Def (((a, b), (zero, one)) ;;; fst $ (zip $ 2) ;; rdr $ (bits, comp_elem) ;; pi1 ;;; a_gr_b)) in max (Width (((a, b), (zero, one)) ;;; fst $ (zip $ 2) ;; rdr $ (bits, comp_elem) ;; pi1 ;;; a_gr_b)) 0"
  comparator_def: "comparator == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]

section {* Validity of width and height functions *}

theorem height_ge0_int : "!! (bits::int) (a::(wire)vector) (b::(wire)vector) (a_gr_b::wire). 0

apply (simp (no_asm_simp) del: height_def width_def add: Let_def max_def fst_def zip_def rdr_def comp_elem_def pi1_def comparator_def, (rule height_ge0_int, (simp+)?)?)
done

theorem width_ge0: "!! (bits::int) (a::(wire)vector) (b::(wire)vector) (a_gr_b::wire). 0
((0::int)

consts
  struct:: "wire=>int=>((wire)vector*(wire)vector)=>((wire)vector*(wire)vector)=>bool"
  height:: "wire=>int=>((wire)vector*(wire)vector)=>((wire)vector*(wire)vector)=>int"
  width:: "wire=>int=>((wire)vector*(wire)vector)=>((wire)vector*(wire)vector)=>int"
  sort2:: "(wire=>int=>((wire)vector*(wire)vector)=>((wire)vector*(wire)vector)=>bool,
    wire=>int=>((wire)vector*(wire)vector)=>((wire)vector*(wire)vector)=>int)block"

defs
  struct_def: "struct == % clk bits (a, b) (min_val, max_val). EX (a_gr_b::wire).
    Def ((a, b) ;;; comparator $ bits ;;; a_gr_b) &
    Def ((a, b) ;;; zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; min_val) &
    Def ((b, a) ;;; zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; max_val)"

  height_def: "height == % clk bits (a, b) (min_val, max_val). let a_gr_b = (THE (a_gr_b::wire).
    Def ((a, b) ;;; comparator $ bits ;;; a_gr_b) &
    Def ((a, b) ;;; zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; min_val) &
    Def ((b, a) ;;; zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; max_val))
    in max (Height ((b, a) ;;; zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; max_val))
       ((max (Height ((a, b) ;;; zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; min_val))
             (Height ((a, b) ;;; comparator $ bits ;;; a_gr_b))))"

  width_def: "width == % clk bits (a, b) (min_val, max_val). let a_gr_b = (THE (a_gr_b::wire).
    Def ((a, b) ;;; comparator $ bits ;;; a_gr_b) &
    Def ((a, b) ;;; zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; min_val) &
    Def ((b, a) ;;; zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; max_val))
    in max ((Width ((a, b) ;;; comparator $ bits ;;; a_gr_b)) +
            (Width ((a, b) ;;; zip $ 2 ;; map $ (bits, mux_lut $ a_gr_b) ;;; min_val)) +
            (Width ((b, a) ;;; zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; max_val)))
       ((max ((Width ((a, b) ;;; comparator $ bits ;;; a_gr_b)) +
              (Width ((a, b) ;;; zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; min_val)))
             (Width ((a, b) ;;; comparator $ bits ;;; a_gr_b))))"

  sort2_def: "sort2 == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]
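The sort2 definitions above describe a two-input compare-and-swap cell: a comparator produces a_gr_b, and multiplexers route the smaller word to min_val and the larger to max_val. A minimal behavioural sketch in Python (a model only, not the Quartz/Isabelle source; wires are modelled as plain integers and the registers' clock latency is ignored):

```python
def sort2(a: int, b: int) -> tuple[int, int]:
    """Behavioural model of the sort2 block: compare the two inputs and
    route the smaller to min_val, the larger to max_val."""
    a_gr_b = a > b                   # comparator output
    min_val = b if a_gr_b else a     # mux selects b when a > b
    max_val = a if a_gr_b else b
    return (min_val, max_val)
```

Composing many such cells in a butterfly gives the merger defined later in this appendix.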

section {* Validity of width and height functions *}

theorem height_ge0_int : "!! (clk::wire) (bits::int) (a::(wire)vector) (b::(wire)vector)
  (min_val::(wire)vector) (max_val::(wire)vector). 0
a_gr_b)) ;;; min_val) ; Def ((b, a) ;;; zip $ 2 ;; map $ (bits, register $ clk $
  (mux_lut $ a_gr_b)) ;;; max_val) |] ==>
((0::int)
zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; max_val)))
  ((max ((Width ((a, b) ;;; comparator $ bits ;;; a_gr_b)) + (Width ((a, b) ;;;
  zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; min_val)))
  (Width ((a, b) ;;; comparator $ bits ;;; a_gr_b)))))) & ((0 + (Height ((b, a)
  ;;; zip $ 2 ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; max_val)))
((0 + (Width ((a, b) ;;; comparator $ bits ;;; a_gr_b)))
((((Width ((a, b) ;;; comparator $ bits ;;; a_gr_b)) + (Width ((a, b) ;;; zip $ 2
  ;; map $ (bits, register $ clk $ (mux_lut $ a_gr_b)) ;;; min_val)))

  height:: "(int*((('t280*'t280)=>('t280*'t280)=>bool,('t280*'t280)=>('t280*'t280)=>int)block))=>('t280)vector=>('t280)vector=>int"
  width:: "(int*((('t280*'t280)=>('t280*'t280)=>bool,('t280*'t280)=>('t280*'t280)=>int)block))=>('t280)vector=>('t280)vector=>int"
  butterfly:: "((int*((('t280*'t280)=>('t280*'t280)=>bool,('t280*'t280)=>('t280*'t280)=>int)block))=>('t280)vector=>('t280)vector=>bool,
    (int*((('t280*'t280)=>('t280*'t280)=>bool,('t280*'t280)=>('t280*'t280)=>int)block))=>('t280)vector=>('t280)vector=>int)block"

defs
  struct_def: "struct == % (n, R) l r. let m = (2 pwr n) in Def (l ;;; rcomp $ (n,
    riffle $ (m div 2) ;; pair $ (m div 2) ;; map $ (m div 2, vecpair ;; R ;;
    converse $ (vecpair)) ;; converse $ (pair $ (m div 2))) ;;; r)"
  height_def: "height == % (n, R) l r. let m = (2 pwr n) in Height (l ;;; rcomp $ (n,
    riffle $ (m div 2) ;; pair $ (m div 2) ;; map $ (m div 2, vecpair ;; R ;;
    converse $ (vecpair)) ;; converse $ (pair $ (m div 2))) ;;; r)"
  width_def: "width == % (n, R) l r. let m = (2 pwr n) in Width (l ;;; rcomp $ (n,
    riffle $ (m div 2) ;; pair $ (m div 2) ;; map $ (m div 2, vecpair ;; R ;;
    converse $ (vecpair)) ;; converse $ (pair $ (m div 2))) ;;; r)"
  butterfly_def: "butterfly == (| Def = struct, Height = height, Width = width|)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]
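At each level of the butterfly, the riffle/pair wiring above pairs element i with element i + m/2 and maps the two-input block R over the resulting pairs before recursing on the halves. A behavioural sketch in Python (an interpretation of the combinator wiring, not the Quartz source; R is any function taking a pair to a pair):

```python
def butterfly(n, R, xs):
    """Behavioural model of one reading of the butterfly network over
    2^n inputs: apply R to the pairs (xs[i], xs[i + m/2]), then recurse
    on the two result halves."""
    m = len(xs)
    assert m == 2 ** n
    if n == 0:
        return list(xs)
    half = m // 2
    tops, bots = [], []
    for i in range(half):
        t, b = R(xs[i], xs[i + half])  # the mapped two-input block
        tops.append(t)
        bots.append(b)
    return butterfly(n - 1, R, tops) + butterfly(n - 1, R, bots)
```

With R instantiated to a compare-and-swap, a butterfly applied to a bitonic sequence produces a sorted one, which is how the merger below uses it.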

section {* Validity of width and height functions *}

theorem height_ge0_int : "!! (n::int) (R::((('t280*'t280)=>('t280*'t280)=>bool,('t280*'t280)=>('t280*'t280)=>int)block))
  (l::('t280)vector) (r::('t280)vector). [| ALL (qs344::('t280*'t280)) (qs345::('t280*'t280)). 0
bool,('t280*'t280)=>('t280*'t280)=>int)block)) (l::('t280)vector) (r::('t280)vector). [|
  ALL (qs344::('t280*'t280)) (qs345::('t280*'t280)). 0
bool,('t280*'t280)=>('t280*'t280)=>int)block)) (l::('t280)vector) (r::('t280)vector). [|
  ALL (qs344::('t280*'t280)) (qs345::('t280*'t280)). 0
bool,('t280*'t280)=>('t280*'t280)=>int)block)) (l::('t280)vector) (r::('t280)vector). [|
  ALL (qs344::('t280*'t280)) (qs345::('t280*'t280)). 0
bool,('t280*'t280)=>('t280*'t280)=>int)block)) (l::('t280)vector) (r::('t280)vector). [| Def (l ;;; rcomp
  $ (n, riffle $ (m div 2) ;; pair $ (m div 2) ;; map $ (m div 2, vecpair ;; R ;;
  converse $ (vecpair)) ;; converse $ (pair $ (m div 2))) ;;; r) ; m = (2 pwr n) ;
  ALL (qs344::('t280*'t280)) (qs345::('t280*'t280)). 0

consts
  struct:: "wire=>int=>int=>(((wire)vector)vector*((wire)vector)vector)=>((wire)vector)vector=>bool"
  height:: "wire=>int=>int=>(((wire)vector)vector*((wire)vector)vector)=>((wire)vector)vector=>int"
  width:: "wire=>int=>int=>(((wire)vector)vector*((wire)vector)vector)=>((wire)vector)vector=>int"
  merger:: "(wire=>int=>int=>(((wire)vector)vector*((wire)vector)vector)=>((wire)vector)vector=>bool,
    wire=>int=>int=>(((wire)vector)vector*((wire)vector)vector)=>((wire)vector)vector=>int)block"

defs
  struct_def: "struct == % clk bits n (a, b) c. let m = (2 pwr n) in Def ((a, b)
    ;;; snd $ (rev $ (m div 2)) ;; converse $ (half $ (m div 2)) ;; butterfly $ (n,
    sort2 $ clk $ bits) ;;; c)"
  height_def: "height == % clk bits n (a, b) c. let m = (2 pwr n) in Height ((a, b)
    ;;; snd $ (rev $ (m div 2)) ;; converse $ (half $ (m div 2)) ;; butterfly $ (n,
    sort2 $ clk $ bits) ;;; c)"
  width_def: "width == % clk bits n (a, b) c. let m = (2 pwr n) in Width ((a, b)
    ;;; snd $ (rev $ (m div 2)) ;; converse $ (half $ (m div 2)) ;; butterfly $ (n,
    sort2 $ clk $ bits) ;;; c)"
  merger_def: "merger == (| Def = struct, Height = height, Width = width|)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]
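The merger reverses its second input (snd $ (rev ...)), joins the two halves (converse $ (half ...)), and runs a butterfly of sort2 cells over the result: two ascending sequences with the second reversed form a bitonic sequence, which a compare-and-swap butterfly sorts. A word-level sketch in Python (an iterative model under that reading of the combinators, not the circuit itself):

```python
def bitonic_merge(xs):
    """Iterative model of the compare-and-swap butterfly: compare elements
    at distance m/2, then m/4, ..., down to 1."""
    xs = list(xs)
    stride = len(xs) // 2
    while stride > 0:
        for block in range(0, len(xs), 2 * stride):
            for i in range(block, block + stride):
                if xs[i] > xs[i + stride]:
                    xs[i], xs[i + stride] = xs[i + stride], xs[i]
        stride //= 2
    return xs

def merger(a, b):
    """Model of the merger block: a and b sorted ascending; reversing b
    makes a ++ reverse(b) bitonic, which the butterfly then sorts."""
    return bitonic_merge(list(a) + list(b)[::-1])
```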

section {* Validity of width and height functions *}

theorem height_ge0_int : "!! (clk::wire) (bits::int) (n::int) (a::((wire)vector)vector)
  (b::((wire)vector)vector) (c::((wire)vector)vector). 0
butterfly $ (n, sort2 $ clk $ bits) ;;; c)))

}
cin = false.
(a, b) ; zip 2 ; converse (pi1) ; col (n, fadd clk) ; (cin, ans) at (0,0).

/* Repeating cell for cubical matrix multiplier */
block matmultcell (int n) (wire clk) (wire x_in[n], wire y_in[n], wire z_in[n]) ~
    (wire z_out[n], wire y_out[n], wire x_out[n]) {
  x_out = x_in.
  y_out = y_in.
  ((x_in, y_in), z_in) ; fst (mult n) ; add n clk ; z_out at (0,0).
}
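Each cell multiplies the incoming x and y words, adds the product into the partial sum arriving on z, and passes x and y on unchanged to its neighbours. A word-level sketch in Python (a behavioural model only; the clocked adder's pipeline latency is ignored):

```python
def matmultcell(x_in: int, y_in: int, z_in: int) -> tuple[int, int, int]:
    """Behavioural model of the repeating cell:
    z_out = z_in + x_in * y_in   (fst (mult n) ; add n clk)
    while x and y pass straight through."""
    z_out = z_in + x_in * y_in
    return (z_out, y_in, x_in)
```

Replicating this cell in a three-dimensional grid yields the cubical multiplier below.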

/** Transpose a matrix */
block word_transpose (int bits) (int n, int m) ('a v1[n][m][bits]) ~ ('a v2[m][n][bits]) {
  int i, j.
  for i = 0..n-1 {
    for j = 0..m-1 {
      v2[j][i] = v1[i][j].
    }.
  }.
}

/** Matrix multiplier */
block matmult (wire clk) (int bits) (int x, int y, int z)
    (wire mat1[y][z][bits], wire mat2[z][x][bits]) ~ (wire mat3[y][x][bits]) {
  wire emptymat[y][x][bits].
  wire mat_trans[z][y][bits].
  int i, j, k.
  for i = 0..y-1 { for j = 0..x-1 { for k = 0..bits-1 { emptymat[i][j][k] = false. }. }. }.
  mat1 ; word_transpose bits (y, z) ; mat_trans at (0,0).
  (mat_trans, mat2, emptymat) ;
    cube (x, y, z, matmultcell bits clk) ;
    converse (tplapl 2) ;
    pi1 ;
    mat3 at (0,0).
}
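The block transposes mat1, seeds the cube of matmultcell units with a zero matrix (emptymat), and streams the operands through the grid; the net word-level effect is an ordinary matrix product. A Python reference model of that effect (the accumulation each cell performs, flattened into loops; it models results, not the circuit's wiring or timing):

```python
def matmult(mat1, mat2):
    """Word-level model of the matmult block:
    mat3[i][j] = sum over k of mat1[i][k] * mat2[k][j],
    starting from the zero matrix that 'emptymat' feeds into the cube."""
    y, z = len(mat1), len(mat1[0])
    x = len(mat2[0])
    mat3 = [[0] * x for _ in range(y)]  # plays the role of emptymat
    for i in range(y):
        for j in range(x):
            for k in range(z):
                mat3[i][j] += mat1[i][k] * mat2[k][j]
    return mat3
```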

D.3.2 Theory cube_cell

theory cube_cell = apr + lsh + swap + fst + rsh + apl + converse + snd:

section {* Temporary definitions to support tplapl in Isabelle *}

constdefs
  tplapr2_struct :: "(('a*'b)*'c)=>('a*'b*'c)=>bool"
  "tplapr2_struct == (% ((a, b), c) (d, e, f). a = d & b = e & f = c)"

  tplapr2 :: "((('a*'b)*'c)=>('a*'b*'c)=>bool,(('a*'b)*'c)=>('a*'b*'c)=>int)block"
  "tplapr2 == (| Def = tplapr2_struct, Height = % a b. (0::int), Width = % a b. (0::int) |)"

  tplapl2_struct :: "('a*('b*'c))=>('a*'b*'c)=>bool"
  "tplapl2_struct == (% (a, (b, c)) (d, e, f). a = d & b = e & c = f)"

  tplapl2 :: "(('a*('b*'c))=>('a*'b*'c)=>bool,('a*('b*'c))=>('a*'b*'c)=>int)block"
  "tplapl2 == (| Def = tplapl2_struct, Height = % a b. (0::int), Width = % a b. (0::int) |)"

section {* Function definitions *}

consts


  struct:: "(int*((('t697*'t698*'t772)=>('t772*'t698*'t697)=>bool,('t697*'t698*'t772)=>('t772*'t698*'t697)=>int)block))=>((('t772)vector*'t697)*'t698)=>('t698*(('t772)vector*'t697))=>bool"
  height:: "(int*((('t697*'t698*'t772)=>('t772*'t698*'t697)=>bool,('t697*'t698*'t772)=>('t772*'t698*'t697)=>int)block))=>((('t772)vector*'t697)*'t698)=>('t698*(('t772)vector*'t697))=>int"
  width:: "(int*((('t697*'t698*'t772)=>('t772*'t698*'t697)=>bool,('t697*'t698*'t772)=>('t772*'t698*'t697)=>int)block))=>((('t772)vector*'t697)*'t698)=>('t698*(('t772)vector*'t697))=>int"
  cube_cell:: "((int*((('t697*'t698*'t772)=>('t772*'t698*'t697)=>bool,('t697*'t698*'t772)=>('t772*'t698*'t697)=>int)block))=>((('t772)vector*'t697)*'t698)=>('t698*(('t772)vector*'t697))=>bool,
    (int*((('t697*'t698*'t772)=>('t772*'t698*'t697)=>bool,('t697*'t698*'t772)=>('t772*'t698*'t697)=>int)block))=>((('t772)vector*'t697)*'t698)=>('t698*(('t772)vector*'t697))=>int)block"

defs
  struct_def: "struct == % (n, R) ((z, x), y) (y2, (z2, x2)). Def (((x, y), z) ;;;
    snd $ (converse $ (apl $ (n - 1))) ;; rsh ;; fst $ (tplapr2 ;; R ;; converse $ (tplapl2) ;; swap) ;;
    lsh ;; snd $ (swap ;; apr $ (n - 1)) ;;; ((y2, x2), z2))"
  height_def: "height == % (n, R) ((z, x), y) (y2, (z2, x2)). Height (((x, y), z) ;;;
    snd $ (converse $ (apl $ (n - 1))) ;; rsh ;; fst $ (tplapr2 ;; R ;; converse $ (tplapl2) ;; swap) ;;
    lsh ;; snd $ (swap ;; apr $ (n - 1)) ;;; ((y2, x2), z2))"
  width_def: "width == % (n, R) ((z, x), y) (y2, (z2, x2)). Width (((x, y), z) ;;;
    snd $ (converse $ (apl $ (n - 1))) ;; rsh ;; fst $ (tplapr2 ;; R ;; converse $ (tplapl2) ;; swap) ;;
    lsh ;; snd $ (swap ;; apr $ (n - 1)) ;;; ((y2, x2), z2))"
  cube_cell_def: "cube_cell == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]

section {* Validity of width and height functions *}

theorem height_ge0_int : "!! (n::int) (R::((('t697*'t698*'t772)=>('t772*'t698*'t697)=>bool,
  ('t697*'t698*'t772)=>('t772*'t698*'t697)=>int)block)) (z::('t772)vector) (x::'t697) (y::'t698)
  (y2::'t698) (z2::('t772)vector) (x2::'t697). [| ALL (qs1101::('t697*'t698*'t772)) (qs1102::('t772*'t698*'t697)). 0
done

theorem width_ge0_int : "!! (n::int) (R::((('t697*'t698*'t772)=>('t772*'t698*'t697)=>bool,
  ('t697*'t698*'t772)=>('t772*'t698*'t697)=>int)block)) (z::('t772)vector) (x::'t697) (y::'t698)
  (y2::'t698) (z2::('t772)vector) (x2::'t697). [| ALL (qs1101::('t697*'t698*'t772)) (qs1102::('t772*'t698*'t697)). 0
bool,('t697*'t698*'t772)=>('t772*'t698*'t697)=>int)block)) (z::('t772)vector) (x::'t697) (y::'t698)
  (y2::'t698) (z2::('t772)vector) (x2::'t697). [| ALL (qs1101::('t697*'t698*'t772)) (qs1102::('t772*'t698*'t697)). 0
bool,('t697*'t698*'t772)=>('t772*'t698*'t697)=>int)block)) (z::('t772)vector) (x::'t697) (y::'t698)
  (y2::'t698) (z2::('t772)vector) (x2::'t697). [| ALL (qs1101::('t697*'t698*'t772)) (qs1102::('t772*'t698*'t697)). 0
bool,('t697*'t698*'t772)=>('t772*'t698*'t697)=>int)block)) (z::('t772)vector) (x::'t697) (y::'t698)
  (y2::'t698) (z2::('t772)vector) (x2::'t697). [| Def (((x, y), z) ;;;
  snd $ (converse $ (apl $ (n - 1))) ;; rsh ;; fst $ (tplapr2 ;; R ;; converse $ (tplapl2) ;; swap) ;;
  lsh ;; snd $ (swap ;; apr $ (n - 1)) ;;; ((y2, x2), z2)) ;


ALL (qs1101::('t697*'t698*'t772)) (qs1102::('t772*'t698*'t697)). 0

D.3.3 Theory cube

consts
  struct:: "(int*int*int*((('t686*'t805*'t772)=>('t772*'t805*'t686)=>bool,('t686*'t805*'t772)=>('t772*'t805*'t686)=>int)block))=>((('t686)vector)vector*(('t805)vector)vector*(('t772)vector)vector)=>((('t772)vector)vector*(('t805)vector)vector*(('t686)vector)vector)=>bool"
  height:: "(int*int*int*((('t686*'t805*'t772)=>('t772*'t805*'t686)=>bool,('t686*'t805*'t772)=>('t772*'t805*'t686)=>int)block))=>((('t686)vector)vector*(('t805)vector)vector*(('t772)vector)vector)=>((('t772)vector)vector*(('t805)vector)vector*(('t686)vector)vector)=>int"
  width:: "(int*int*int*((('t686*'t805*'t772)=>('t772*'t805*'t686)=>bool,('t686*'t805*'t772)=>('t772*'t805*'t686)=>int)block))=>((('t686)vector)vector*(('t805)vector)vector*(('t772)vector)vector)=>((('t772)vector)vector*(('t805)vector)vector*(('t686)vector)vector)=>int"
  cube:: "((int*int*int*((('t686*'t805*'t772)=>('t772*'t805*'t686)=>bool,('t686*'t805*'t772)=>('t772*'t805*'t686)=>int)block))=>((('t686)vector)vector*(('t805)vector)vector*(('t772)vector)vector)=>((('t772)vector)vector*(('t805)vector)vector*(('t686)vector)vector)=>bool,
    (int*int*int*((('t686*'t805*'t772)=>('t772*'t805*'t686)=>bool,('t686*'t805*'t772)=>('t772*'t805*'t686)=>int)block))=>((('t686)vector)vector*(('t805)vector)vector*(('t772)vector)vector)=>((('t772)vector)vector*(('t805)vector)vector*(('t686)vector)vector)=>int)block"

defs
  struct_def: "struct == % (x, y, z, R) (x_d, y_d, z_d) (z_r, y_r, x_r). Def (((x_d, y_d), z_d) ;;;
    fst $ ((zip $ 2)::(((('t686)vector)vector*(('t805)vector)vector)=>(('t686)vector*('t805)vector)vector=>bool,
      ((('t686)vector)vector*(('t805)vector)vector)=>(('t686)vector*('t805)vector)vector=>int)block) ;;
    col $ (z, swap ;; rsh ;; fst $ (zip $ 2) ;; grid $ (x, y, cube_cell $ (x, R)) ;;
      snd $ (converse $ (zip $ 2)) ;; rsh ;; fst $ swap ;; lsh) ;;
    snd $ (converse $ ((zip $ 2)::(((('t805)vector)vector*(('t686)vector)vector)=>(('t805)vector*('t686)vector)vector=>bool,
      ((('t805)vector)vector*(('t686)vector)vector)=>(('t805)vector*('t686)vector)vector=>int)block)) ;;;
    (z_r, (y_r, x_r)))"
  height_def: "height == % (x, y, z, R) (x_d, y_d, z_d) (z_r, y_r, x_r). Height (((x_d, y_d), z_d) ;;;
    fst $ ((zip $ 2)::(((('t686)vector)vector*(('t805)vector)vector)=>(('t686)vector*('t805)vector)vector=>bool,
      ((('t686)vector)vector*(('t805)vector)vector)=>(('t686)vector*('t805)vector)vector=>int)block) ;;
    col $ (z, swap ;; rsh ;; fst $ (zip $ 2) ;; grid $ (x, y, cube_cell $ (x, R)) ;;
      snd $ (converse $ (zip $ 2)) ;; rsh ;; fst $ swap ;; lsh) ;;
    snd $ (converse $ ((zip $ 2)::(((('t805)vector)vector*(('t686)vector)vector)=>(('t805)vector*('t686)vector)vector=>bool,
      ((('t805)vector)vector*(('t686)vector)vector)=>(('t805)vector*('t686)vector)vector=>int)block)) ;;;
    (z_r, (y_r, x_r)))"
  width_def: "width == % (x, y, z, R) (x_d, y_d, z_d) (z_r, y_r, x_r). Width (((x_d, y_d), z_d) ;;;
    fst $ ((zip $ 2)::(((('t686)vector)vector*(('t805)vector)vector)=>(('t686)vector*('t805)vector)vector=>bool,
      ((('t686)vector)vector*(('t805)vector)vector)=>(('t686)vector*('t805)vector)vector=>int)block) ;;
    col $ (z, swap ;; rsh ;; fst $ (zip $ 2) ;; grid $ (x, y, cube_cell $ (x, R)) ;;
      snd $ (converse $ (zip $ 2)) ;; rsh ;; fst $ swap ;; lsh) ;;
    snd $ (converse $ ((zip $ 2)::(((('t805)vector)vector*(('t686)vector)vector)=>(('t805)vector*('t686)vector)vector=>bool,
      ((('t805)vector)vector*(('t686)vector)vector)=>(('t805)vector*('t686)vector)vector=>int)block)) ;;;
    (z_r, (y_r, x_r)))"
  cube_def: "cube == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]

section {* Validity of width and height functions *}

theorem height_ge0_int : "!! (x::int) (y::int) (z::int) (R::((('t686*'t805*'t772)=>('t772*'t805*'t686)=>bool,
  ('t686*'t805*'t772)=>('t772*'t805*'t686)=>int)block)) (x_d::(('t686)vector)vector) (y_d::(('t805)vector)vector)
  (z_d::(('t772)vector)vector) (z_r::(('t772)vector)vector) (y_r::(('t805)vector)vector) (x_r::(('t686)vector)vector).
  [| ALL (qs1103::('t686*'t805*'t772)) (qs1104::('t772*'t805*'t686)). 0
bool,('t686*'t805*'t772)=>('t772*'t805*'t686)=>int)block)) (x_d::(('t686)vector)vector) (y_d::(('t805)vector)vector)
  (z_d::(('t772)vector)vector) (z_r::(('t772)vector)vector) (y_r::(('t805)vector)vector) (x_r::(('t686)vector)vector).
  [| ALL (qs1103::('t686*'t805*'t772)) (qs1104::('t772*'t805*'t686)). 0
bool,('t686*'t805*'t772)=>('t772*'t805*'t686)=>int)block)) (x_d::(('t686)vector)vector) (y_d::(('t805)vector)vector)
  (z_d::(('t772)vector)vector) (z_r::(('t772)vector)vector) (y_r::(('t805)vector)vector) (x_r::(('t686)vector)vector).
  [| ALL (qs1103::('t686*'t805*'t772)) (qs1104::('t772*'t805*'t686)). 0
bool,('t686*'t805*'t772)=>('t772*'t805*'t686)=>int)block)) (x_d::(('t686)vector)vector) (y_d::(('t805)vector)vector)
  (z_d::(('t772)vector)vector) (z_r::(('t772)vector)vector) (y_r::(('t805)vector)vector) (x_r::(('t686)vector)vector).
  [| ALL (qs1103::('t686*'t805*'t772)) (qs1104::('t772*'t805*'t686)). 0
bool,('t686*'t805*'t772)=>('t772*'t805*'t686)=>int)block)) (x_d::(('t686)vector)vector) (y_d::(('t805)vector)vector)
  (z_d::(('t772)vector)vector) (z_r::(('t772)vector)vector) (y_r::(('t805)vector)vector) (x_r::(('t686)vector)vector).
  [| Def (((x_d, y_d), z_d) ;;; fst $ (zip $ 2) ;; col $ (z, swap ;; rsh
  ;; fst $ (zip $ 2) ;; grid $ (x, y, cube_cell $ (x, R)) ;; snd $ (converse $ (zip $ 2)) ;;
  rsh ;; fst $ (swap) ;; lsh) ;; snd $ (converse $ (zip $ 2)) ;;; (z_r, (y_r, x_r))) ;
  ALL (qs1103::('t686*'t805*'t772)) (qs1104::('t772*'t805*'t686)). 0 (('t686)vector*('t805)vector)vector=>int)block) ;;
  col $ (z, swap ;; rsh ;; fst $ (zip $ 2) ;; grid $ (x, y, cube_cell $ (x, R)) ;;
  snd $ (converse $ (zip $ 2)) ;; rsh ;; fst $ swap ;; lsh) ;;
  snd $ (converse $ ((zip $ 2)::(((('t805)vector)vector*(('t686)vector)vector)=>(('t805)vector*('t686)vector)vector=>bool,
  ((('t805)vector)vector*(('t686)vector)vector)=>(('t805)vector*('t686)vector)vector=>int)block)) ;;;
  (z_r, (y_r, x_r))))) (('t686)vector*('t805)vector)vector=>bool,((('t686)vector)vector*(('t805)vector)vector)=>
  (('t686)vector*('t805)vector)vector=>int)block) ;; col $ (z, swap ;; rsh ;; fst $ (zip $ 2) ;;
  grid $ (x, y, cube_cell $ (x, R)) ;; snd $ (converse $ (zip $ 2)) ;; rsh ;; fst $ swap ;; lsh) ;;
  snd $ (converse $ ((zip $ 2)::(((('t805)vector)vector*(('t686)vector)vector)=>(('t805)vector*('t686)vector)vector=>bool,
  ((('t805)vector)vector*(('t686)vector)vector)=>(('t805)vector*('t686)vector)vector=>int)block)) ;;;
  (z_r, (y_r, x_r))))) &
  ((0 + (Height (((x_d, y_d), z_d) ;;; fst $ ((zip $ 2)::(((('t686)vector)vector*(('t805)vector)vector)=>
  (('t686)vector*('t805)vector)vector=>bool,((('t686)vector)vector*(('t805)vector)vector)=>
  (('t686)vector*('t805)vector)vector=>int)block) ;; col $ (z, swap ;; rsh ;; fst $ (zip $ 2) ;;
  grid $ (x, y, cube_cell $ (x, R)) ;; snd $ (converse $ (zip $ 2)) ;; rsh ;; fst $ swap ;; lsh) ;;
  snd $ (converse $ ((zip $ 2)::(((('t805)vector)vector*(('t686)vector)vector)=>(('t805)vector*('t686)vector)vector=>bool,
  ((('t805)vector)vector*(('t686)vector)vector)=>(('t805)vector*('t686)vector)vector=>int)block)) ;;;
  (z_r, (y_r, x_r))))) (('t686)vector*('t805)vector)vector=>bool,((('t686)vector)vector*(('t805)vector)vector)=>
  (('t686)vector*('t805)vector)vector=>int)block) ;; col $ (z, swap ;; rsh ;; fst $ (zip $ 2) ;;
  grid $ (x, y, cube_cell $ (x, R)) ;; snd $ (converse $ (zip $ 2)) ;; rsh ;; fst $ swap ;; lsh) ;;
  snd $ (converse $ ((zip $ 2)::(((('t805)vector)vector*(('t686)vector)vector)=>(('t805)vector*('t686)vector)vector=>bool,
  ((('t805)vector)vector*(('t686)vector)vector)=>(('t805)vector*('t686)vector)vector=>int)block)) ;;;
  (z_r, (y_r, x_r)))))"
by auto

end

D.3.4 Theory matmultcell

theory matmultcell = add + mult + fst:

section {* Function definitions *}

consts
  struct:: "int=>wire=>((wire)vector*(wire)vector*(wire)vector)=>((wire)vector*(wire)vector*(wire)vector)=>bool"
  height:: "int=>wire=>((wire)vector*(wire)vector*(wire)vector)=>((wire)vector*(wire)vector*(wire)vector)=>int"
  width:: "int=>wire=>((wire)vector*(wire)vector*(wire)vector)=>((wire)vector*(wire)vector*(wire)vector)=>int"
  matmultcell:: "(int=>wire=>((wire)vector*(wire)vector*(wire)vector)=>((wire)vector*(wire)vector*(wire)vector)=>bool,
    int=>wire=>((wire)vector*(wire)vector*(wire)vector)=>((wire)vector*(wire)vector*(wire)vector)=>int)block"

defs
  struct_def: "struct == % n clk (x_in, y_in, z_in) (z_out, y_out, x_out). (x_out = x_in) & (y_out = y_in) &
    Def (((x_in, y_in), z_in) ;;; fst $ (mult $ n) ;; add $ n $ clk ;;; z_out)"
  height_def: "height == % n clk (x_in, y_in, z_in) (z_out, y_out, x_out).
    max (Height (((x_in, y_in), z_in) ;;; fst $ (mult $ n) ;; add $ n $ clk ;;; z_out)) 0"
  width_def: "width == % n clk (x_in, y_in, z_in) (z_out, y_out, x_out).
    max (Width (((x_in, y_in), z_in) ;;; fst $ (mult $ n) ;; add $ n $ clk ;;; z_out)) 0"
  matmultcell_def: "matmultcell == (| Def = struct, Height = height, Width = width |)"

declare width_def [simp]
declare height_def [simp]
declare struct_def [simp]

section {* Validity of width and height functions *}


theorem height_ge0_int : "!! (n::int) (clk::wire) (x_in::(wire)vector) (y_in::(wire)vector)
  (z_in::(wire)vector) (z_out::(wire)vector) (y_out::(wire)vector) (x_out::(wire)vector). 0


theory matmult = pi1 + converse + matmultcell + cube + word_transpose:

section {* Function definitions *}

consts
  struct:: "wire=>int=>(int*int*int)=>((((wire)vector)vector)vector*(((wire)vector)vector)vector)=>(((wire)vector)vector)vector=>bool"
  height:: "wire=>int=>(int*int*int)=>((((wire)vector)vector)vector*(((wire)vector)vector)vector)=>(((wire)vector)vector)vector=>int"
  width:: "wire=>int=>(int*int*int)=>((((wire)vector)vector)vector*(((wire)vector)vector)vector)=>(((wire)vector)vector)vector=>int"
  matmult:: "(wire=>int=>(int*int*int)=>((((wire)vector)vector)vector*(((wire)vector)vector)vector)=>(((wire)vector)vector)vector=>bool,
    wire=>int=>(int*int*int)=>((((wire)vector)vector)vector*(((wire)vector)vector)vector)=>(((wire)vector)vector)vector=>int)block"

defs
  struct_def: "struct == % clk bits (x, y, z) (mat1, mat2) mat3. EX (emptymat::((((wire)vector)vector)vector)
    (mat_trans::(((wire)vector)vector)vector). (ALL (i::int). ((0
tplapl2) ;; pi1 ;;; mat3))
    in max (Width ((mat_trans, mat2, emptymat) ;;; cube $ (x, y, z, matmultcell $ bits $ clk) ;;
    converse $ (tplapl2) ;; pi1 ;;; mat3)) ((max (Width (mat1 ;;; word_transpose $ bits $ (y, z) ;;; mat_trans)) (if 0
((0
==>
((0 + (Width (mat1 ;;; word_transpose $ bits $ (y, z) ;;; mat_trans)))
