
2005 PhD Research in Microelectronics and Electronics (PRIME)

Proceedings of the Conference
- Volume I -

Lausanne, Switzerland
July 25 - 28, 2005




Copyright and Reprint Permission: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limit of U.S. copyright law for private use of patrons those articles in this volume that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923. For other copying, reprint or republication permission, write to IEEE Copyrights Manager, IEEE Operations Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331. All rights reserved. Copyright ©2005 by the Institute of Electrical and Electronics Engineers, Inc.

IEEE Catalog Number: 05EX1148
ISBN: 0-7803-9345-7
Library of Congress Number: 2005928249



Foreword

Dear PhD Student, Dear Participant,

We are pleased to welcome you to the very first edition of the PRIME Conference, held in Lausanne, Switzerland in 2005. Lausanne is located in a beautiful setting on the shores of the Lake of Geneva. We hope that you will take the opportunity to discover this attractive place combining great quality of life, culture, and an international touch.

The Electronics Laboratories (LEG) of the Swiss Federal Institute of Technology (EPFL), Lausanne, have organized this Conference with the active help of the Electronics Micro-system Laboratory of the University of Pavia, Italy.

PRIME 2005 is the 1st Edition of a new approach aiming at supporting the dissemination of results of PhD Research In Microelectronics and Electronics. The main goals are the following:

• Friendly competition between the PhD Students.
• Benchmark of the most promising PhD Students thanks to the ranking of the articles (e.g. award of the gold leaf certificate to the top 10% of PhD Students).
• Creation of a network of excellence between the PhD Students.

The authors of PRIME presentations come from Asia, North America and Europe. Italy (27 articles), Switzerland (18 articles) and France (13 articles) are the three main contributors. A number of PhD Students work in close collaboration with worldwide key industrial players in microelectronics (e.g. STMicroelectronics).

The Technical Program Committee selected 111 oral presentations out of 175 proposals, corresponding to an overall acceptance rate of 63%.

The main topics of the 2005 PRIME conference are analog integrated circuits (ICs), RF and communication ICs, digital ICs, ICs applied to biomedical applications, ADCs, sensors and interfaces, CAD tools, FPGAs, Systems on Chip, device modeling, MEMS components, and optical devices.

The Technical Program Committee is formed by 26 internationally recognized experts in the above-mentioned topics, distributed between the industrial world, the academic world and research centers. We would like to thank all the members of this Committee for their hard and competent work in reviewing the submitted PRIME summaries on time.

We would also like to acknowledge the important role played by the technical sponsor, the IEEE Circuits and Systems (CAS) Society, for its support and advertisement of this Conference.

We do hope that the Gala dinner will give you the opportunity to meet each other in a friendly atmosphere and that you will enjoy your stay in Switzerland.

Prof. Michel Declercq, General Chair
Dr. Catherine Dehollain, Technical Program Co-Chair
Prof. Franco Maloberti, Technical Program Co-Chair



Technical Committee

General Chair:

Professor Michel Declercq
Director of the Electronics Lab. and Dean of the School of Engineering at EPFL
E-mail: michel.declercq@epfl.ch
Address: EPFL, STI, IMM, LEG, ELB Ecublens, CH-1015 Lausanne, Switzerland.
Phone: +41 (0)21 693 69 61 ; +41 (0)21 693 39 74
Fax: +41 (0)21 693 36 40

Program Chairs:

Dr. Catherine Dehollain
Head of the RF ICs Group at the Electronics Lab. of EPFL
E-mail: catherine.dehollain@epfl.ch
Address: EPFL, STI, IMM, LEG, ELB Ecublens, CH-1015 Lausanne, Switzerland.
Phone: +41 (0)21 693 69 71
Fax: +41 (0)21 693 36 40

Professor Franco Maloberti
Microelectronics Chair Professor
E-mail: franco.maloberti@unipv.it
University of Pavia, Electronic Department, Via Ferrata 1, 27100 Pavia, Italy
Phone: (+39) 0382-985-205

Local Chair:

Isabelle Buzzi
E-mail: isabelle.buzzi@epfl.ch
Address: EPFL, STI, IMM, LEG, ELB Ecublens, CH-1015 Lausanne, Switzerland.
Phone: +41 (0)21 693 39 75
Fax: +41 (0)21 693 36 40

Other Members of the Committee:

Dr. Thierry Arnaud
Head of the RF ICs group
E-mail: thierry.arnaud@st.com
Address: ST Microelectronics-NV, 39 Chemin du champ des filles, Case Postale 21, CH-1228 Plan-les-Ouates, Switzerland.
Phone: +41 (0)22 929 58 84

Professor Erik Bruun
E-mail: eb@oersted.dtu.dk
Address: Technical University of Denmark, Ørsted-DTU, Building 348, DK-2800 Kgs. Lyngby, Denmark
Phone: (+45) 4525 3906

John Gerrits
Staff Engineer
E-mail: john.gerrits@csem.ch
Address: CSEM, Advanced Systems Engineering, Rue Jaquet-Droz 1, CH-2007 Neuchatel, Switzerland
Phone: +41 (0)32 720 56 52

Professor Kari Halonen
E-mail: karih@ecdl.hut.fi
Address: Helsinki University of Technology, Electronic Circuits Laboratory, FIN-02015, Finland



Frank Henkel
Head of the RF ICs group
E-mail: henkel@imst.de
Address: IMST GmbH, Carl-Friedrich-Gauss Strasse, D-47475 Kamp-Lintfort, Germany
Phone: +49 (0) 28 42 981 270

Professor Maher Kayal
E-mail: maher.kayal@epfl.ch
Address: EPFL, STI, IMM, LEG, ELB Ecublens, CH-1015 Lausanne, Switzerland.
Phone: +41 (0)21 693 39 81

Professor Yusuf Leblebici
Director of the LSM Laboratory
E-mail: yusuf.leblebici@epfl.ch
Address: EPFL, STI, IMM, LSM, ELB Ecublens, CH-1015 Lausanne, Switzerland.
Phone: +41 (0)21 693 69 51

Dr. Jean-René Lequepeys
Head of the RF ICs Group at the LETI
E-mail: LEQUEPEYS@chartreuse.cea.fr
CEA/Grenoble, Laboratoire d’Electronique et de Technologie de l’Information
Address: CEA/Grenoble, 17 Rue des Martyrs, 38054 Grenoble Cedex 9, France
Phone: +33 (0) 438 78 37 49

Professor Gaelle Lissorgues-Bazin
Address: School of Engineering ESIEE, Signal and Telecom. Department, Cite Descartes, BP 99, Boulevard Blaise Pascal no 2, F-93162 Noisy-Le-Grand, Cedex, France
Phone: +33 (0) 1 45 92 66 96

Professor Piero Malcovati
E-mail: piero.malcovati@unipv.it
Address: University of Pavia, Electronic Department, Via Ferrata 1, 27100 Pavia, Italy

Professor F. Xavier Moncunill
E-mail: moncunill@tsc.upc.es
Address: Department of Signal Theory and Communications, Universitat Politècnica de Catalunya, Mòdul D4-208, Jordi Girona 1-3, 08034 Barcelona, Spain
Phone: + 34 93 40 17 072

Dr. Pierre Nicole
E-mail: pierre.nicole@fr.thalesgroup.com
Project Leader (Dept. optique, hyperfréquences et microtechnologies)
Address: THALES systèmes aéroportés, 2 Avenue Gay Lussac, 78851 Elancourt, Cedex, France
Phone: + 33 (0) 1 34 59 57 98

Dr. Patrice Senn
E-mail: patrice.senn@francetelecom.com
Address: France Telecom/BD/FTR&D/DIH/OCF, ZIRST - BP 98, 38 243 Meylan Cedex, France
Phone: (33) (0)4 76 76 41 32



Professor Gilles Sicard
E-mail: gilles.sicard@imag.fr
Address: TIMA-CMP, 46 avenue Felix Viallet, 38031 Grenoble Cedex, France
Phone: +33 4 76 57 46 15

Professor Hannu Tenhunen
Dean of the School of Information Technology
E-mail: hannu@ele.kth.se
Address: School of Information Technology, Royal Institute of Technology (KTH), IT-University, IMIT/LECS/ESDlab, Isafjordsgatan 39, Box 229, 16440 Kista, Sweden
Phone: 46-8-7904119

Professor Bill Redman-White
E-mails: bill.redman-white@philips.com, wrw@ecs.soton.ac.uk
University of Southampton, United Kingdom
Philips Company, United Kingdom

Professor Gerhard Wachutka
E-mail: gw@tep.ei.tum.de
Munich University, Germany

Dr. Herman Casier
E-mail: herman_casier@amis.com
AMI Semiconductor, Belgium

Dr. Didier Belot
RF Integrated System Design Program Manager
E-mail: didier.belot@st.com
Address: ST Microelectronics, Crolles, France
Phone: 0033 476 92 62 61

Professor Michael Peter Kennedy
E-mail: peter.kennedy@ucc.ie
Address: University College Cork, Department of Microelectronic Engineering, Butler Building, North Mall, Cork, Ireland
Phone: +353 21 490 4570

Professor Willy Sansen
Head of ESAT-MICAS
E-mail: willy.sansen@esat.kuleuven.ac.be
Address: K.U. Leuven, Kast. Arenberg 10, B-3001 Leuven, Belgium
Phone: +32 16 321 077

Professor Andrea Baschirotto
E-mail: andrea.baschirotto@unile.it
Address: University of Lecce - Dept. of Innovation Engineering, Via per Monteroni, 73100 Lecce, Italy
Phone: (+39) 0832 297 213 (Lab 353)

Professor Mihai-Dan Steriu
E-mail: dan.steriu@dce.pub.ro
Address: University "Politehnica" of Bucharest - Faculty of Electronics, Telecommunications and Information Technology, 1-3, Bd. Iuliu Maniu, sector 6, Bucharest, Romania
Phone: (+40) 21 402 4840



Volume I

Monday, 25th July 2005
Morning

Registration – Building CO, in front of the auditorium CO2 - http://plan.epfl.ch
All presentations will take place in building CO (CO1 and CO2)

Auditorium CO2
DEHOLLAIN Catherine, DECLERCQ Michel
Electronics Laboratories, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
Welcome address
Overview of the School of Engineering of the Swiss Federal Institute of Technology (EPFL)

Time: 10:30 - 12:00
Topic: Analog ICs
Chairperson: F. Henkel, IMST GmbH, Germany

SCANDURRA Graziella (1), CIOFI Carmine (1), ZITO Domenico (2)
(1) University of Messina, Messina, Italy; (2) University of Pisa, Pisa, Italy
A New Topology for Transformer Based CMOS Active Inductances

VOGRIG D. (1), GEROSA A. (1), NEVIANI A. (1), GRAELL I AMAT A. (2), MONTORSI G. (2), BENEDETTO S. (2)
(1) University of Padova, Padova, Italy; (2) Politecnico di Torino, Torino, Italy
A 0.35 um CMOS Analog Turbo Decoder for a 40-bit Rate 1/3 UMTS Channel Code

YAZICIOGLU Refet Firat (1,2), MERKEN Patrick (1), VAN HOOF Chris (1,2)
(1) IMEC, Leuven, Belgium; (2) K.U.Leuven, Leuven, Belgium
Effect of Electrode Offset on the CMRR of the Current Balancing Instrumentation Amplifiers

VASILOPOULOS Athanasios, VITZILAIOS Georgios, THEODORATOS Gerasimos, PAPANANOS Yannis
National Technical University of Athens (NTUA), Athens, Greece
A Low-Voltage, Highly Linear, Integrated, Active-RC Filter

Time: 10:30 - 12:00
Topic: RF ICs
Chairperson: A. Baschirotto, University of Lecce, Italy

D'AMICO S. (1), GRASSI T. (1), RYCKAERT Julien (2), BASCHIROTTO A. (1)
(1) University of Lecce, Lecce, Italy; (2) IMEC, Leuven, Belgium
A Up to 1GHz Low-Power Continuous-time "3rd-order Filter+Integrator" Chain for Wireless Body-Area Network Receivers

WELLIG Armin
STMicroelectronics, Geneva, Switzerland
Energy-Efficient Baseband Receiver Design in Deep Sub-um Technology

GHITTORI N. (1), VIGNA A. (1), MALCOVATI P. (1), D'AMICO S. (2), BASCHIROTTO A. (2)
(1) University of Pavia, Pavia, Italy; (2) University of Lecce, Lecce, Italy
A Low-Voltage, Low-Distortion (1.2V, 29.5 dBm OIP3) Reconfigurable Baseband Block for Mobile Applications

VIDOJKOVIC Maja (1), VAN DER TANG Johan (1), BALTUS Peter (2), VAN ROERMUND Arthur (1)
(1) Eindhoven University of Technology, Eindhoven, The Netherlands; (2) ASL Philips, Eindhoven, The Netherlands
Adaptive Mixers with a Discretely and a Continuously Adjustable Performance Space



Monday, 25th July 2005
Afternoon

Time: 13:50 - 15:45
Topic: ADCs
Chairperson: F. Maloberti, University of Pavia, Italy

MOVAHEDIAN Hamid, BAKHTIAR Mehrdad Sharif
Sharif University of Technology, Tehran, Iran
A 1.5V 8-Bit Low-Power Self-Calibrating High-Speed Folding ADC

CATLI Burak (1,2), BAYAM Fidel (1,2), CAVUS Bilal Tarik (1), KEPKEP Asim, ZEKI Ali (1)
(1) Istanbul Technical University, Istanbul, Turkey; (2) ETA-IC Design Center, Istanbul, Turkey
A 4 GSa/s 7-Bit Two-Way Time-Interleaved Bipolar Track-and-Hold Amplifier Based on a Novel Analog Switch

TALAY Selçuk, DUNDAR Günhan
Bogaziçi University, Istanbul, Turkey
A Sigma-Delta ADC Design Automation Tool

TESTONI Nicola, CISTERNI Marco, FRANCHI Eleonora
University of Bologna, Bologna, Italy
A Regular Modular Architecture for Pipelined Binary Tree Multipliers Based on a SOG Structure

HANNON Jason, WEGENER Carsten, KENNEDY Michael Peter
University College Cork, Cork, Ireland
Developing a Strategic Error Source Based Design Evaluation for ADC’s

Time: 13:50 - 15:45
Topic: Digital ICs
Chairperson: A. Baschirotto, University of Lecce, Italy

WANG Yu, YANG Huazhong, WANG Hui
Tsinghua University, Beijing, China
Signal-path Level Assignment for Dual-Vt Technique

PETTENGHI Hector, AVEDILLO Maria J., QUINTANA José M.
Instituto de Microelectrónica de Sevilla, Sevilla, Spain
Novel Improved RTD-Based Implementation of Multi-Threshold Logic Gates

BOUQUET Valéry (1,2), CANET Pierre (1), LALANDE Frédéric (1), DEVIN Jean (2), LECONTE Bruno (2), MARIEMA Nicolas (1)
(1) IMT Technopole de Château Gombert, Marseille, France; (2) STMicroelectronics, Rousset, France
Variation of Flash Memory Threshold Voltage Correlated With Applied Voltage Slope in Fowler Nordheim Erase Mode

HATIRNAZ Ilhan, BADEL Stéphane, LEBLEBICI Yusuf
Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
Towards a Unified Top-Down Design Flow for Fully Differential Logic Blocks with Improved Speed and Noise Immunity

WEY I-Chyn, CHEN You-Gang, WU Chia-Tsun, WANG Wei, and WU An-Yeu
National Taiwan University, Taipei, Taiwan
A High-Speed Scalable Shift-Register Based On-Chip Serial Communication Design for SoC Applications




Time: 16:15 – 17:05
Topic: ADCs
Chairperson: F. Maloberti, University of Pavia, Italy

GHARBIYA Ahmed, JOHNS David A.
University of Toronto, Ontario, Canada
Fully Digital Feedforward Delta-Sigma Modulator

CALDWELL Trevor C., JOHNS David A.
University of Toronto, Ontario, Canada
A High-Speed Technique for Time-Interleaving Continuous-Time Delta-Sigma Modulators

Time: 16:15 – 17:30
Topic: Analog ICs
Chairperson: C. Dehollain, EPFL, Switzerland

PIOMBO Davide, ZUNINO Rodolfo
University of Genova, Genova, Italy
Analog Soft Max Circuit with Dynamic Gain Control

CAUCHETEUX Damien (1,2), BEIGNE Edith (1), RENAUDIN Marc (2), CROCHON Elisabeth (2)
(1) CEA, Grenoble, France; (2) TIMA Lab., Grenoble, France
Toward Asynchronous and High Data Rates Passive Contactless Systems

LIU Chengxin, Mc NEILL John
Worcester Polytechnic Institute, Worcester, MA, USA
A Digital-PLL-Based True Random Number Generator



Tuesday, 26th July 2005
Morning

Time: 8:30 - 10:00
Topic: RF ICs
Chairperson: A. Koukab, EPFL, Switzerland

TEDESCO Annamaria, BONFANTI Andrea, PANSERI Luigi, LACAITA Andrea
Politecnico di Milano, Milano, Italy
A 11-15 GHz CMOS /2 Frequency Divider For Broad-Band I/Q Generation

NIELSEN Michael, LARSEN Torben
Aalborg University, Denmark
A 4.2 GHz CMOS Quadrature VCO using Injection Locking for WLAN 802.11a

VON BÜREN George, ELLINGER Frank, JAECKEL Heinz
Swiss Federal Institute of Technology (ETHZ), Zürich, Switzerland
A Very Low Power Consuming 5 GHz CMOS VCO Core with 800 MHz Frequency Tuning Range

LEI Yu, KOUKAB Adil, DECLERCQ Michel
Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
Design and Optimization of CMOS Prescaler

Time: 8:30 - 10:00
Topic: FPGAs
Chairperson: A. Schmid, EPFL, Switzerland

MEINDL Tassilo, MONIACI Walter, GALLESIO Davide, PASERO Eros
Politecnico di Torino, Torino, Italy
Embedded Hardware Architecture for Statistical Rain Forecast

DENNING Daniel, IRVINE James, DEVLIN Malachy
Alba Centre, Livingston, UK
A High Throughput FPGA Camellia Implementation

RAMACCIOTTI Tommaso (1), SERAFINI Luca (1), FANUCCI Luca (1), BALDACCI Stefano (2)
(1) University of Pisa, Pisa, Italy; (2) Kayser Italia Srl, Livorno, Italy
A Novel Approach Based on Dynamic Reconfiguration for Process Controls with FPGA

MAN K.L.
Eindhoven University of Technology, Eindhoven, The Netherlands
An Overview of SystemC FL




Time: 10:30 – 12:00
Topic: RF ICs
Chairperson: J. Gerrits, CSEM, Switzerland

DE MATTEIS M., D'AMICO S., BASCHIROTTO A.
University of Lecce, Lecce, Italy
A 600mV 1.32mW 75dB-DR 4th-order Baseband Analog Filter for UMTS Receivers

CURTY Jari-Pascal, JOEHL Norbert, DEHOLLAIN Catherine, DECLERCQ Michel
Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
A 2.45GHz Remotely Powered RFID System

D'AMICO S., CHIRONI V., BASCHIROTTO A.
University of Lecce, Lecce, Italy
A 0.13 um CMOS VGA for Multistandard Receivers

ZITO Domenico, D'ASCOLI Francesco, NERI Bruno
University of Pisa, Pisa, Italy
Fully Integrated RF Front-End for WLAN: a New Step Toward Single-Chip Transceivers

Time: 10:30 – 12:00
Topic: FPGAs
Chairperson: A. Schmid, EPFL, Switzerland

MARRAKCHI Zied, MRABET Hayder, MEHREZ Habib
Université Paris 6, Paris, France
Hierarchical FPGA Clustering to Improve Routability

YANG M. (1), ALMAINI A.E.A. (1), WANG L. (2), WANG P.J. (3)
(1) Napier University, Scotland, UK; (2) Fudan University, China; (3) Ningbo University, China
An Evolutionary Approach for Symmetrical Field Programmable Gate Array Placement

STERPONE Luca, SONZA REORDA Matteo, VIOLANTE Massimo
Politecnico di Torino, Torino, Italy
RoRa: A Reliability-oriented Place and Route Algorithm for SRAM-based FPGAs

GOLOFIT Krzystof
Warsaw University of Technology, Warsaw, Poland
Efficient VLSI Implementation of Multiplication in GF(2")



Tuesday, 26th July 2005
Afternoon

Time: 13:50 - 15:45
Topic: MEMS
Chairperson: M. Mazza, EPFL, Switzerland

KRZEMIENIECKI Dariusz (1), HERMANOWICZ Ewa (2)
(1) GZT Telekom-Telmor, Gdansk, Poland; (2) Gdansk University of Technology, Gdansk, Poland
Multidirectional CATV Transformers

PISANI Marcelo, HIBERT Cyrille, BOUVET Didier, DEHOLLAIN Catherine, IONESCU Adrian M.
Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
Fabrication and Electrical Characterization of High Performance Copper/Polyimide Inductors

PERRUISSEAU-CARRIER Julien, FRITSCHI Raphaël, SKRIVERVIK Anja
Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
Design of Enhanced Multi-Bit Distributed MEMS Variable True-Time Delay Lines

SCUDERI A. (1), BIONDI T. (2), RAGONESE E. (1), PALMISANO G. (1)
(1) University of Catania, Catania, Italy; (2) STMicroelectronics, Catania, Italy
Inductance Calculation of Thick-Metal Inductors

VILLARROYA Maria (1), VERD Jaume (1), TEVA Jordi (1), ABADAL G. (1), PEREZ Francesc (2), ESTEVE Jaume (2), BARNIOL Nùria (1)
(1) Universitat Autònoma de Barcelona, Barcelona, Spain; (2) Instituto de Microelectrónica de Barcelona, Bellaterra, Spain
Cantilever Based MEMS for Multiple Mass Sensing

Time: 13:50 - 15:45
Topic: CAD Tools
Chairperson: F. Henkel, IMST GmbH, Germany

HEINZL R., SPEVAK M., SCHWAHA P., GRASSER T.
Technical University Vienna, Vienna, Austria
A Novel Technique for Coupling Three Dimensional Mesh Adaptation with an a Posteriori Error Estimator

LATTUADA Mauro, POSEGA Renzo, MATTAVELLI Marco, MLYNEK Daniel
Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
Efficient Error Correction Solutions for OFDM-Based Wireless Video

MELANI Massimiliano, FANUCCI Luca, SAPONARA Sergio
University of Pisa, Pisa, Italy
Algorithmic/Architectural Design for H.264/MPEG-4 AVC Low-Power Video CODEC

MORGENTSTERN H. (1), GROOS G. (2), KOEHNE H. (1), STECHER M. (3), JOHN W. (1), REICHEL H. (1)
(1) Fraunhofer IZM, Berlin, Germany; (2) Universität der Bundeswehr, Neubiberg, Germany; (3) Infineon Technologies, Munich, Germany
Algorithm for the Automatic Verification of Complex Mixed-Signal ICs regarding ESD-Stress

RUIZ-DE-CLAVIJO P., BELLIDO M.J., JUAN J.
Universidad de Sevilla, Sevilla, Spain
Halotis-High Accurate Logic Timing Simulator




Time: 16:15 – 17:45
Topic: MEMS
Chairperson: M. Mazza, EPFL, Switzerland

DELMONTE Nicola (1), WATTS Bernard Enrico (2), ROSA Lorenzo (1), CHIERBOLI Giovanni (1), COVA Paolo (1), MENOZZI Roberto (1)
(1) University of Parma, Parma, Italy; (2) Instituto IMEM/CNR, Parma, Italy
Test Pattern for Microwave Dielectric Properties of SrBi2Ta2O9

DESPESSE Ghislain (1), JAGER Thomas (1), CHAILLOUT Jean-Jacques (1), LEGER Jean Michel (1), BASROUR Skandar (2)
(1) CEA-LETI, Grenoble, France; (2) TIMA, Grenoble, France
Design and Fabrication of a New System for Vibration Energy Harvesting

LEO Elisabetta, BRAGHIN Francesco, RESTA Ferruccio
Politecnico di Milano, Milano, Italy
Nonlinear Vibrations of a MEMS Translational Device

SAHA Shimul Chandra, SINGH Tajeshwar, SOETHER Trond
Norwegian University of Science and Technology, Trondheim, Norway
Design and Simulation of RF MEMS Switches for High Switching Speed and Moderate Voltage Operation

Time: 16:15 – 17:25
Topic: Sensors and Interfaces
Chairperson: P.A. Besse, EPFL, Switzerland

MAJEED Bivragh, VAN SINTE Jean-Baptiste, INDRAJIT Paul, BARTON John, C O’MATHUNA Sean, DELANEY Kieran
Tyndall National Institute and Cork Institute of Technology, Cork, Ireland
Material Characterisation and Process Development for Miniaturised Wireless Sensor Network Module

ZORLU O., KEIJIK P., VINCENT F., POPOVIC R.S.
Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
A Novel Planar Magnetic Sensor Based on Orthogonal Fluxgate Principle

CHAEHOI A., LATORRE L., MAILLY F., NOUET P.
Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, France
A Monolithic CMOS 3-Axis Accelerometer Combining Piezoresistive and Heat Transfer Effects

GALA DINNER AT HOTEL BEAU-RIVAGE
OUCHY – LAUSANNE FROM 7.30 PM



Volume II

Wednesday, 27th July 2005
Morning

Time: 8:30 - 10:00
Topic: RF ICs
Chairperson: C. Dehollain, EPFL, Switzerland

MAK Pui-In, U Seng-Pan (1), MARTINS R.P. (2)
University of Macau, Macao, China; (1) Also with Chipidea Microelectronics, Macao, China; (2) On leave from Instituto Superior Técnico, Lisbon, Portugal
Multistandard-Compliant Receiver Architecture with Low-Voltage Implementation

LOPEZ Joan Lluis, TEVA Jordi, TORRES Francesc, ABADAL Gabriel, URANGA Arantxa, BARNIOL Nùria
Universitat Autònoma de Barcelona, Barcelona, Spain
Frequency Synthesis Using On-Chip Micromechanical Resonator

BURG A., HAENE S., PERELS D., LUETHI P., FELBER N., FICHTNER W.
Swiss Federal Institute of Technology (ETHZ), Zürich, Switzerland
Receiver Design for Multi-Antenna Wireless Communications

SCUDERI A. (1,2), CARRARA F. (1), PALMISANO G. (1)
(1) University of Catania, Catania, Italy; (2) STMicroelectronics, Catania, Italy
A Soft-Slope Output Power Control Circuit for RF Power Amplifiers

Time: 8:30 - 10:00
Topic: CAD Tools
Chairperson: A. Koukab, EPFL, Switzerland

MENDEZ Miguel, MATEO Diego, RUBIO Antonio, GONZALEZ José Luis
Universitat Politècnica de Catalunya, Barcelona, Spain
Characterization of the Substrate Noise Spectrum for Mixed-Signal ICs

WANG Q., VAN DER MEIJS
Delft University of Technology, Delft, The Netherlands
An Efficient Method for Substrate Impedance Extraction

KARAMPATZAKIS D.P., EVMORFOPOULOS N.E., STAMOULIS G.I.
University of Thessaly, Volos, Greece
A Statistically-Based Engine for P/G Network Optimization

COULIBALY L.M., KADIM H.J.
Liverpool JM University, Liverpool, UK
Electromagnetic Signatures: A Virtual Test and Verification Tool for SoC Signal Integrity


P 223<br />

P 227<br />

P 231<br />

P 235<br />

P 239<br />

P 243<br />

P 247<br />

P 251



Time Topic Chairperson<br />

10:30 – 12:00 RF ICs M.P. Kennedy, University College Cork, Irel<strong>and</strong><br />

TASSIN Claire 1,2 , GARCIA Patrice 1 , BEGUERET Jean-Baptiste 2 , TOUPE Romaric 2 , DEVAL<br />

Yann 2 , BELOT Didier 1<br />

1 ST<strong>Microelectronics</strong>, Crolles, France, 2 Bordeaux University, Talence, France<br />

A Mixed-Signal Cartesian Feedback L<strong>in</strong>earization System for a Zero-IF WCDMA Transmitter<br />

H<strong>and</strong>set IC<br />

COLLARD BOVY Anne 1 , COURMONTAGNE Philippe 2 ,<br />

1 ST<strong>Microelectronics</strong>, Rousset, France, 2 L2MP CNRS UMR, Marseille, France<br />

An All-Digital Signal Receiver for Transmission/Reception of Radio-Frequency Architectures<br />

HOLLIS Timothy, COMER David, COMER Donald,<br />

Brigham Young University, Provo, Utah, USA<br />

Reduction of Duty Cycle Distortion Through B<strong>and</strong>-Pass Filter<strong>in</strong>g<br />

ITALIA Aless<strong>and</strong>ro, CARRARA Francesco, RAGONESE Egidio, PALMISANO Giuseppe,<br />

University of Catania, Catania, Italy<br />

The Transformer Characteristic Resistance <strong>and</strong> its Application to the Design of RF Circuits<br />

P 255<br />

P 259<br />

P 263<br />

P 267<br />

Time Topic Chairperson<br />

10:30 – 12:00 Device Model<strong>in</strong>g F.X. Moncunill, Universitat Politècnica de Catalunya,<br />

Spa<strong>in</strong><br />

CANEPARI Anna 1,2 , BERTRAND Guillaume 1 , MINONDO Michel 1 , JOURDAN Nathalie 1 ,<br />

CHANTE Jean-Pierre 2,<br />

1 ST<strong>Microelectronics</strong>, Crolles, France, 2 CEGELY, INSA de Lyon, Villeurbanne, France<br />

DC/HF Circuit Model for LDMOS Includ<strong>in</strong>g Self-Heat<strong>in</strong>g <strong>and</strong> Quasi-Saturation<br />

BECKRICH Hélène 1,2 , SCHWARTZMANN Thierry 1 , CELI Didier 1 , ZIMMER Thomas 2 ,<br />

1 ST<strong>Microelectronics</strong>, Crolles, France, 2 Université Bordeaux I, Talence, France<br />

A Spice Model for Predict<strong>in</strong>g Static Thermal Coupl<strong>in</strong>g between Bipolar Transistors<br />

SHEIKHOLESLAMI A. 1 , HOLZER S. 3 , HEITZINGER C. 1 , LEICHT M. 2 , HAEBERLEN O. 2 ,<br />

FUGGER J. 2 , GRASSER T. 3 , SELBERHERR S. 1 ,<br />

1 Technical University Vienna, Vienna, Austria, 2 Inf<strong>in</strong>eon Technologies, Villach, Austria,<br />

3 Christian Doppler Laboratory, Technical University Vienna, Austria<br />

Inverse Model<strong>in</strong>g of Oxid Deposition Us<strong>in</strong>g Measurements of a TEOS CVD Process<br />

OCKET I. 1 , NAUWELAERS B. 1 , CARCHON G. 2 , DE RAEDT W. 2 ,<br />

1 K.U.Leuven, Leuven, Belgium, 2 IMEC, Leuven, Belgium<br />

Integration of High-Q Resonators With MCM-D for Millimeter Wave Applications<br />


P 271<br />

P 275<br />

P 279<br />

P 283


Volume II<br />

Wednesday, 27th July <strong>2005</strong><br />

Afternoon<br />

Time Topic Chairperson<br />

13:50 - 15:45 Sensors <strong>and</strong> Interfaces P.A. Besse, EPFL, Switzerl<strong>and</strong><br />

BRACKE Wouter, MERKEN Patrick, PUERS Robert, VAN HOOF Chris,<br />

IMEC, Leuven, Belgium, ESAT, K.U.Leuven, Leuven, Belgium<br />

Ultra Low Power Capacitive Sensor Interface With Smart Energy Management<br />

FRIEDRICH Nils, BOEHM Markus,<br />

University of Siegen, Siegen, Germany<br />

A Locally Autocompensat<strong>in</strong>g Image Sensor<br />

PASTRE Marc, KAYAL Maher,<br />

Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerl<strong>and</strong><br />

A Hall Sensor-Based Current Measurement Microsystem with Cont<strong>in</strong>uous Ga<strong>in</strong> Calibration<br />

SCHNEIDER Verena,<br />

Institute for <strong>Microelectronics</strong> Stuttgart (IMS), Stuttgart, Germany<br />

Fixed-Pattern Correction of HDR Image Sensors<br />

PANCHERI Lucio 1 , DALLA BETTA Gian-Franco 1 , STOPPA Davide 2 , SCANDIUZZO Mauro 2 ,<br />

VIARANI Luigi 2 , SIMONI Andrea 2<br />

1 University of Trento, Trento, Italy, 2 ITC-irst, Trento, Italy<br />

CMOS Distance Sensor Based on S<strong>in</strong>gle Photon Avalanche Diode<br />

P 287<br />

P 291<br />

P 295<br />

P 299<br />

P 303<br />

Time Topic Chairperson<br />

13:50 - 15:45 CAD Tools F.X. Moncunill, Universitat Politècnica de Catalunya,<br />

Spa<strong>in</strong><br />

GJERMUNDNES Øyste<strong>in</strong>, AAS E<strong>in</strong>ar J.<br />

Department of <strong>Electronics</strong> <strong>and</strong> Telecommunications, NTNU, Trondheim, Norway<br />

Design of a Path Delay Fault Simulator for Evaluation of Abist Generated Stimuli<br />

MACHADO Felipe, TORROJA Yago, RIESGO Teresa,<br />

Universidad Politécnica de Madrid, Madrid, Spa<strong>in</strong><br />

Exploit<strong>in</strong>g VHDL-RTL Features to Reduce the Complexity of Power Estimation <strong>in</strong> Comb<strong>in</strong>ational<br />

Circuit<br />

MARINO C., FORLITI M., ROCCHI A., GIAMBASTIANI A., IOZZI F., DE MARINIS M.,<br />

FANUCCI L.,<br />

University of Pisa, Pisa, Italy<br />

Mixed Signal Behavioral Verification us<strong>in</strong>g VHDL-AMS<br />

TELESCU M., BREHONNET P., TANGUY N., VILBE P., CALVEZ L.C.,<br />

LEST UMR CNRS, Brest, France<br />

Laguerre-Gram Reduced Order Model<strong>in</strong>g Applied to VLSI Circuit Interconnects<br />

BALZANI Marianna, REATTI Alberto,<br />

University of Florence, Florence, Italy<br />

Neural Network Based Model of a PV Array for the optimum performance of PV system<br />


P 307<br />

P 311<br />

P 315<br />

P 319<br />

P 323



Time Topic Chairperson<br />

16:15 – 17:45 Optical Devices <strong>and</strong><br />

Applications<br />

P. Da<strong>in</strong>esi, EPFL, Switzerl<strong>and</strong><br />

GENSOLEN Fabrice 1,2 , CATHEBRAS Guy 2 , MARTIN Lionel 1 , ROBERT Michel 2 ,<br />

1 STMicroelectronics, Rousset, France, 2 Université de Montpellier II, CNRS, Montpellier, France<br />

Focal Plane Integration of Image Texture Cod<strong>in</strong>g for Pixel Correspondence<br />

UPENDRANATH Vanam 1 , GOTTARDI Massimo 2 , ZORAT Aless<strong>and</strong>ro 3 ,<br />

1 Digital Systems, CEERI Pilani, India, 2 ITC-irst, Trento, Italy, 3 University of Trento, Trento, Italy<br />

A High-Speed <strong>and</strong> High-Resolution CMOS Optical Position Sensitive Device<br />

BEER Stephan, SEITZ Peter,<br />

Swiss Center for <strong>Electronics</strong> <strong>and</strong> Microtechnology (CSEM), Zürich, Switzerl<strong>and</strong><br />

Real-Time Tomographic Imag<strong>in</strong>g Without X-Rays: a Smart Pixel Array with Massively Parallel<br />

Signal Process<strong>in</strong>g for Real-Time Optical Coherence Tomography Perform<strong>in</strong>g Close to the Physical<br />

Limits<br />

GAILLARD Arc'hanmael 1 , MOHAMED-BRAHIM Tayeb 1 , ROGEL Régis 1 , CRAND Samuel 1 ,<br />

PRAT Christophe 2 , LEROY Philippe 2 ,<br />

1 Université de Rennes I, Rennes, France, 2 THOMSON R&D FRANCE, Cesson-Sévigné, France<br />

Development of Microcrystall<strong>in</strong>e Silicon Active Matrix for OLED Displays<br />

Time Topic Chairperson<br />

16:15 – 17:25 Sensors <strong>and</strong><br />

Interfaces<br />

F. Maloberti, University of Pavia, Italy<br />

BIROL Hansu, MAEDER Thomas, BOERS Marc, JACQ Carol<strong>in</strong>e, CORRADINI Giancarlo, RYSER<br />

Peter,<br />

Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerl<strong>and</strong><br />

Mill<strong>in</strong>ewton Force Sensor Based on Low Temperature Co-Fired Ceramic (LTCC) Technology<br />

PACZESNY Daniel 1 , WEREMCZUK Jersy 1 , JACHOWICZ Ryszard S. 1 , RAPIEDJKO Piotr 2 ,<br />

1 Warsaw University of Technology, Warsaw, Poland, 2 Military Medical Institute, Warsaw, Poland<br />

Construction of Fast Dew Po<strong>in</strong>t Hygrometer With Integrated Semiconductor Detector for Medical<br />

Applications<br />

GOMRI Sami, SEGUIN Jean-Luc, AGUIR Khalifa,<br />

L2MP, Université Aix-Marseille III, Marseille, France<br />

Gas Sensor Selectivity Enhancement by Noise Spectroscopy: a Mode of Adsorption-Desorption<br />

Noise<br />


P 327<br />

P 331<br />

P 335<br />

P 339<br />

P 343<br />

P 347<br />

P 351


Volume II<br />

Thursday, 28th July 2005<br />

Morn<strong>in</strong>g<br />

Time Topic Chairperson<br />

8:30 - 10:00 Analog ICs F. Maloberti, University of Pavia, Italy<br />

BADEL Stéphane, HATIRNAZ Ilhan, LEBLEBICI Yusuf,<br />

Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerl<strong>and</strong><br />

Semi-Automated Design of a MOS Current-Mode Logic St<strong>and</strong>ard Cell Library from<br />

Generic Components<br />

TOPRAK Zeynep, LEBLEBICI Yusuf,<br />

Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerl<strong>and</strong><br />

Low-Power Adaptive Bias/Clock Generator Us<strong>in</strong>g 0.18 um CMOS Technology for Multi-Core<br />

Cont<strong>in</strong>uous Voltage <strong>and</strong> Frequency Scal<strong>in</strong>g<br />

TSAKIRIDIS O., ZERVAS E., STONHAM J.,<br />

Brunel University, Uxbridge, UK<br />

Voltage Control Chaotic Colpitts Oscillator<br />

SIVIERO Claudio, STIEVANO I.S., MAIO I.A.,<br />

Politecnico di Tor<strong>in</strong>o, Tor<strong>in</strong>o, Italy<br />

Behavioral Model<strong>in</strong>g of IC Output Buffers: A Case Study<br />

Time Topic Chairperson<br />

8:30 - 10:00 Digital ICs M.P. Kennedy, University College Cork, Irel<strong>and</strong><br />

GUO X<strong>in</strong>yu, SECHEN Carl,<br />

University of Wash<strong>in</strong>gton, Seattle, WA, USA<br />

High Throughput Divider Us<strong>in</strong>g Output Prediction Logic<br />

MILLAN Alej<strong>and</strong>ro, BELLIDO Manuel J., JUAN J.,<br />

Instituto de Microelectrónica de Sevilla, Sevilla, Spa<strong>in</strong><br />

Optimization Techniques for Dynamic Behavior of Digital CMOS VLSI Circuits <strong>in</strong><br />

Nanometric Technologies<br />

IOZZI Francesco, SAPONARA Sergio, MORELLO Alex<strong>and</strong>er J., FANUCCI Luca,<br />

University of Pisa, Pisa, Italy<br />

8051 CPU Core Optimization for Low Power at Register Transfer Level<br />

DASYGENIS Minas, SOUDRIS Dimitrios, THANAILAKIS Antonios,<br />

Democritus University of Thrace, Xanthi, Greece<br />

A Data <strong>and</strong> Instruction Memory Performance <strong>and</strong> Energy Optimization Technique<br />


P 355<br />

P 359<br />

P 363<br />

P 366<br />

P 370<br />

P 374<br />

P 378<br />

P 382



Time Topic Chairperson<br />

10:30 – 12:00 Analog ICs C. Deholla<strong>in</strong>, EPFL, Switzerl<strong>and</strong><br />

MOEZ Kambiz K., ELMASRY Mohamed I.,<br />

University of Waterloo, Ontario, Canada<br />

A Novel Inductor-Free CMOS Broadb<strong>and</strong> Amplification Technique<br />

NIZZA Nicolò, MONDINI Alessio, BRUSCHI Paolo,<br />

University of Pisa, Pisa, Italy<br />

A Current Feedback Adaptive Bias<strong>in</strong>g Method for Class-AB OTA Cells<br />

SALMEH Roghoyed, MAUNDY Brent, JOHNSTON Ronald,<br />

University of Calgary, Calgary, Canada<br />

Effects of Input Match<strong>in</strong>g Inductor on the M<strong>in</strong>imum Noise Figure of a Low Noise<br />

Amplifier Designed Based on Noise Optimization Technique<br />

SIN Sai-Weng, U Seng-Pan 1 , MARTINS R.P. 2 ,<br />

University of Macau, Macao, Ch<strong>in</strong>a, 1 Also with Chipidea <strong>Microelectronics</strong>, Macao, Ch<strong>in</strong>a, 2 On<br />

leave from Instituto Superior Técnico, Lisbon, Portugal<br />

Novel Low-Voltage Circuit Techniques for Fully-Differential Reset <strong>and</strong> Switched-<br />

Opamps<br />

Time Topic Chairperson<br />

10:30 – 12:00 ICs applied to<br />

biomedical<br />

applications<br />

A. Schmid, EPFL, Switzerl<strong>and</strong><br />

MITRA Sr<strong>in</strong>joy, INDIVERI Giacomo<br />

University of Zürich <strong>and</strong> Swiss Federal Institute of Technology (ETHZ), Zürich, Switzerl<strong>and</strong><br />

A Low-Power Dual-Threshold Comparator for Neuromorphic Systems<br />

LICHTSTEINER P., DELBRUCK T.,<br />

Institute of Neuro<strong>in</strong>formatics, University of Zürich, Zürich, Switzerl<strong>and</strong><br />

A 64x64 AER Logarithmic Temporal Derivative Silicon Ret<strong>in</strong>a<br />

VALENTE G. 1 , DE MARTINO F. 2 , BALSI M. 1 , FORMISANO E. 2 ,<br />

1 "La Sapienza" Universita degli Studi di Roma, Italy, 2 University of Maastricht, The Netherlands<br />

Optimiz<strong>in</strong>g ICA Us<strong>in</strong>g Generic Knowledge of the Sources<br />

CURRY Richard, JOHNSON Simon,<br />

University of Durham, Durham, Engl<strong>and</strong><br />

A Versatile Low Power Integrated Circuit for the Record<strong>in</strong>g <strong>and</strong> Analysis of In-<br />

Vivo <strong>and</strong> In-Vitro Neural Signals<br />


P 386<br />

P 390<br />

P 394<br />

P 398<br />

P 402<br />

P 406<br />

P 410<br />

P 414


Volume II<br />

Thursday, 28th July <strong>2005</strong><br />

Afternoon<br />

Time Topic Chairperson<br />

13:50 - 15:00 System on Chip M.P. Kennedy, University College Cork, Irel<strong>and</strong><br />

LAYER Christophe, PFEIDERER Hans-Jörg,<br />

University of Ulm, Ulm, Germany<br />

A Scalable Highly Parallel VLSI Architecture Dedicated to Associative Comput<strong>in</strong>g Algorithms<br />

GÜRKAYNAK F.D., FELBER N., KAESLIN H., FICHTNER W.<br />

Swiss Federal Institute of Technology (ETHZ), Zürich, Switzerl<strong>and</strong><br />

Area, Throughput <strong>and</strong> Security Considerations for AES Crypto-ASICs<br />

GEELEN Bert 1,2 , BROCKMEYER Erik 1 , LAFRUIT Gauthier 1 , LAUWEREINS Rudy 1,2 , VERKEST<br />

Diedrik 1,2,3<br />

1 IMEC, Leuven, Belgium, 2 K.U.Leuven, Leuven, Belgium, 3 Vrije Universiteit, Brussel, Belgium<br />

Exploration of System-Level Trade-Offs for Application Mapp<strong>in</strong>g <strong>in</strong> Multi-Processor System-on-<br />

Chips.<br />

Time Topic Chairperson<br />

13:50 - 15:00 CAD Tools C. Deholla<strong>in</strong>, EPFL, Switzerl<strong>and</strong><br />

KUJANPÄÄ Tuomo, ROOS Janne, HONKALA Mikko,<br />

Hels<strong>in</strong>ki University of Technology, Hut, F<strong>in</strong>l<strong>and</strong><br />

Experimental Comparison of Optimization Methods <strong>in</strong> ANN Tra<strong>in</strong><strong>in</strong>g<br />

MALATESTA Aless<strong>and</strong>ro 1 , CARDARILLI Gian Carlo 1 , RE Marco 1 , ARNONE Luigi 2 , BOCCHIO<br />

Sara 2 ,<br />

1 University of Rome “Tor Vergata”, Rome, Italy, 2 ST<strong>Microelectronics</strong>, Agrate Brianza, Italy<br />

Development <strong>and</strong> Validation of Hardware Architectures for Real-Time High-Performance Speech<br />

Recognition Systems<br />

NDIP Ivan 1,2 , JOHN Werner 1 , REICHL Herbert 1 , THIEDE Andreas 2 ,<br />

1 Fraunhofer IZM, Berlin, Germany, 2 University of Paderborn, Paderborn, Germany<br />

A Novel Approach for RF/Microwave Model<strong>in</strong>g <strong>and</strong> Optimization of BGA Packages<br />

15:10-15:20 Auditorium CO2<br />

DEHOLLAIN Cather<strong>in</strong>e<br />

<strong>Electronics</strong> Laboratories, Swiss Federal Institute of Technology (EPFL), Lausanne,<br />

Switzerl<strong>and</strong><br />

Clos<strong>in</strong>g Session<br />


P 418<br />

P 422<br />

P 426<br />

P 430<br />

P 434<br />

P 437


AUTHOR INDEX<br />

AAS E<strong>in</strong>ar J. P 307 COMER David P 263<br />

ABADAL Gabriel P 171, 227 COMER Donald P 263<br />

AGUIR Khalifa P 351 CORRADINI Giancarlo P 343<br />

ALMAINI A.E.A. P 143 COULIBALY L.M. P 251<br />

ARNONE Luigi P 434 COURMONTAGNE Philippe P 259<br />

AVEDILLO Maria J. P 56 COVA Paolo P 195<br />

BADEL Stéphane P 63, 355 CRAND Samuel P 339<br />

BAKHTIAR Mehrdad Sharif P 33 CROCHON Elisabeth P 83<br />

BALDACCI Stefano P 115 CURRY Richard P 414<br />

BALSI M. P 410 CURTY Jari-Pascal P 127<br />

BALTUS Peter P 29 DALLA BETTA Gian-Franco P 303<br />

BALZANI Marianna P 323 D'AMICO S P 17, 25, 131<br />

BARNIOL Nùria P 171, 227 D'ASCOLI Francesco P 135<br />

BARTON John P 211 DASYGENIS M<strong>in</strong>as P 382<br />

BASCHIROTTO A. P 17, 25, 123, 131 DE MARINIS M. P 315<br />

BASROUR Sk<strong>and</strong>ar P 199 DE MARTINO F. P 410<br />

BAYAM Fidel P 37 DE MATTEIS M. P 123<br />

BECKRICH Hélène P 275 DE RAEDT W. P 283<br />

BEER Stephan P 335 DECLERCQ Michel P 103, 127<br />

BEGUERET Jean-Baptiste P 255 DEHOLLAIN Cather<strong>in</strong>e P 127, 159<br />

BEIGNE Edith P 83 DELANEY Kieran P 211<br />

BELLIDO M.J. P 191, 374 DELBRUCK T. P 406<br />

BELOT Didier P 255 DELMONTE Nicola P 195<br />

BENEDETTO S. P 5 DENNING Daniel P 111<br />

BERTRAND Guillaume P 271 DESPESSE Ghisla<strong>in</strong> P 199<br />

BIONDI T. P 167 DEVAL Yann P 255<br />

BIROL Hansu P 343 DEVIN Jean P 60<br />

BOCCHIO Sara P 434 DEVLIN Malachy P 111<br />

BOEHM Markus P 291 DUNDAR Günhan P 40<br />

BOERS Marc P 343 ELLINGER Frank P 99<br />

BONFANTI Andrea P 91 ELMASRY Mohamed I. P 386<br />

BOUQUET Valéry P 60 ESTEVE Jaume P 171<br />

BOUVET Didier P 159 EVMORFOPOULOS N.E. P 247<br />

BRACKE Wouter P 287 FANUCCI L. P 115, 183, 315, 378<br />

BRAGHIN Francesco P 203 FELBER N. P 231, 422<br />

BREHONNET P. P 319 FICHTNER W. P 231, 422<br />

BROCKMEYER Erik P 426 FORLITI M. P 315<br />

BRUSCHI Paolo P 390 FORMISANO E. P 410<br />

BURG A. P 231 FRANCHI Eleonora P 44<br />

C O’MATHUNA Sean P 211 FRIEDRICH Nils P 291<br />

CALDWELL Trevor C. P 75 FRITSCHI Raphaël P 163<br />

CALVEZ L.C. P 319 FUGGER J. P 279<br />

CANEPARI Anna P 271 GAILLARD Arc'hanmael P 339<br />

CANET Pierre P 60 GALLESIO Davide P 107<br />

CARCHON G. P 283 GARCIA Patrice P 255<br />

CARDARILLI Gian Carlo P 434 GEELEN Bert P 426<br />

CARRARA Francesco P 235, 267 GENSOLEN Fabrice P 327<br />

CATHEBRAS Guy P 327 GEROSA A. P 5<br />

CATLI Burak P 37 GHARBIYA Ahmed P 71<br />

CAUCHETEUX Damien P 83 GHITTORI N. P 25<br />

CAVUS Bilal Tarik P 37 GIAMBASTIANI A. P 315<br />

CELI Didier P 275 GJERMUNDNES Øyste<strong>in</strong> P 307<br />

CHAEHOI A. P 219 GOLOFIT Krzystof P 151<br />

CHAILLOUT Jean-Jacques P 199 GOMRI Sami P 351<br />

CHANTE Jean-Pierre P 271 GONZALEZ José Luis P 239<br />

CHEN You-Gang P 67 GOTTARDI Massimo P 331<br />

CHIERBOLI Giovanni P 195 GRAELL I AMAT A. P 5<br />

CHIRONI V. P 131 GRASSER T. P 175, 279<br />

CIOFI Carm<strong>in</strong>e P 1 GRASSI T. P 17<br />

CISTERNI Marco P 44 GROOS G. P 187<br />

COLLARD BOVY Anne P 259 GUO X<strong>in</strong>yu P 370<br />



GÜRKAYNAK F.D. P 422 MAK Pui-In P 223<br />

HAEBERLEN O. P 279 MALATESTA Aless<strong>and</strong>ro P 434<br />

HAENE S. P 231 MALCOVATI P. P 25<br />

HANNON Jason P 48 MAN K.L. P 119<br />

HATIRNAZ Ilhan P 63, 355 MARIEMA Nicolas P 60<br />

HEINZL R. P 175 MARINO C. P 315<br />

HEITZINGER C. P 279 MARRAKCHI Zied P 139<br />

HERMANOWICZ Ewa P 155 MARTIN Lionel P 327<br />

HIBERT Cyrille P 159 MARTINS R.P. P 223, 398<br />

HOLLIS Timothy P 263 MATEO Diego P 239<br />

HOLZER S. P 279 MATTAVELLI Marco P 179<br />

HONKALA Mikko P 430 MAUNDY Brent P 394<br />

INDIVERI Giacomo P 402 Mc NEILL John P 87<br />

INDRAJIT Paul P 211 MEHREZ Habib P 139<br />

IONESCU Adrian M. P 159 MEINDL Tassilo P 107<br />

IOZZI Francesco P 315, 378 MELANI Massimiliano P 183<br />

IRVINE James P 111 MENDEZ Miguel P 239<br />

ITALIA Aless<strong>and</strong>ro P 267 MENOZZI Roberto P 195<br />

JACHOWICZ Ryszard S. P 347 MERKEN Patrick P 9, 287<br />

JACQ Carol<strong>in</strong>e P 343 MILLAN Alej<strong>and</strong>ro P 374<br />

JAECKEL He<strong>in</strong>z P 99 MINONDO Michel P 271<br />

JAGER Thomas P 199 MITRA Sr<strong>in</strong>joy P 402<br />

JOEHL Norbert P 127 MLYNEK Daniel P 179<br />

JOHN Werner P 187, 437 MOEZ Kambiz K. P 386<br />

JOHNS David A. P 71, 75 MOHAMED-BRAHIM Tayeb P 339<br />

JOHNSON Simon P 414 MONDINI Alessio P 390<br />

JOHNSTON Ronald P 394 MONIACI Walter P 107<br />

JOURDAN Nathalie P 271 MONTORSI G. P 5<br />

JUAN J. P 191, 374 MORELLO Alex<strong>and</strong>er J. P 378<br />

KADIM H.J. P 251 MORGENTSTERN H. P 187<br />

KAESLIN H. P 422 MOVAHEDIAN Hamid P 33<br />

KARAMPATZAKIS D.P. P 247 MRABET Hayder P 139<br />

KAYAL Maher P 295 NAUWELAERS B. P 283<br />

KEIJIK P. P 215 NDIP Ivan P 437<br />

KENNEDY Michael Peter P 48 NERI Bruno P 135<br />

KEPKEP Asim P 37 NEVIANI A. P 5<br />

KOEHNE H. P 187 NIELSEN Michael P 95<br />

KOUKAB Adil P 103 NIZZA Nicolò P 390<br />

KRZEMIENIECKI Dariusz P 155 NOUET P. P 219<br />

KUJANPÄÄ Tuomo P 430 OCKET I. P 283<br />

LACAITA Andrea P 91 PACZESNY Daniel P 347<br />

LAFRUIT Gauthier P 426 PALMISANO Giuseppe P 167, 235, 267<br />

LALANDE Frédéric P 60 PANCHERI Lucio P 303<br />

LARSEN Torben P 95 PANSERI Luigi P 91<br />

LATORRE L. P 219 PAPANANOS Yannis P 13<br />

LATTUADA M. P 179 PASERO Eros P 107<br />

LAUWEREINS Rudy P 426 PASTRE Marc P 295<br />

LAYER Christophe P 418 PERELS D. P 231<br />

LEBLEBICI Yusuf P 63, 355, 359 PEREZ Francesc P 171<br />

LECONTE Bruno P 60 PERRUISSEAU-CARRIER J. P 163<br />

LEGER Jean Michel P 199 PETTENGHI Hector P 56<br />

LEI Yu P 103 PFEIDERER Hans-Jörg P 418<br />

LEICHT M. P 279 PIOMBO Davide P 79<br />

LEO Elisabetta P 203 PISANI Marcelo P 159<br />

LEROY Philippe P 339 POPOVIC R.S. P 215<br />

LICHTSTEINER P. P 406 POSEGA Renzo P 179<br />

LIU Chengx<strong>in</strong> P 87 PRAT Christophe P 339<br />

LOPEZ Joan Lluis P 227 PUERS Robert P 287<br />

LUETHI P. P 231 QUINTANA José M. P 56<br />

MACHADO Felipe P 311 RAGONESE Egidio P 167, 267<br />

MAEDER Thomas P 343 RAMACCIOTTI Tommaso P 115<br />

MAILLY F. P 219 RAPIEDJKO Piotr P 347<br />

MAIO I.A. P 366 RE Marco P 434<br />

MAJEED Bivragh P 211 REATTI Alberto P 323<br />



REICHL Herbert P 187, 437 TOPRAK Zeynep P 359<br />

RENAUDIN Marc P 83 TORRES Francesc P 227<br />

RESTA Ferruccio P 203 TORROJA Yago P 311<br />

RIESGO Teresa P 311 TOUPE Romaric P 255<br />

ROBERT Michel P 327 TSAKIRIDIS O. P 363<br />

ROCCHI A. P 315 U Seng-Pan P 223, 398<br />

ROGEL Régis P 339 UPENDRANATH Vanam P 331<br />

ROOS Janne P 430 URANGA Arantxa P 227<br />

ROSA Lorenzo P 195 VALENTE G. P 410<br />

RUBIO Antonio P 239 VAN DER MEIJS P 243<br />

RUIZ-DE-CLAVIJO P. P 191 VAN DER TANG Johan P 29<br />

RYCKAERT Julien P 17 VAN HOOF Chris P 9, 287<br />

RYSER Peter P 343 VAN ROERMUND Arthur P 29<br />

SAHA Shimul Ch<strong>and</strong>ra P 207 VAN SINTE Jean-Baptiste P 211<br />

SALMEH Roghoyed P 394 VASILOPOULOS Athanasios P 13<br />

SAPONARA Sergio P 183, 378 VERD Jaume P 171<br />

SCANDIUZZO Mauro P 303 VERKEST Diedrik P 426<br />

SCANDURRA Graziella P 1 VIARANI Luigi P 303<br />

SCHNEIDER Verena P 299 VIDOJKOVIC Maja P 29<br />

SCHWAHA P. P 175 VIGNA A. P 25<br />

SCHWARTZMANN Thierry P 275 VILBE P. P 319<br />

SCUDERI A. P 167, 235 VILLARROYA Maria P 171<br />

SECHEN Carl P 370 VINCENT F. P 215<br />

SEGUIN Jean-Luc P 351 VIOLANTE Massimo P 147<br />

SEITZ Peter P 335 VITZILAIOS Georgios P 13<br />

SELBERHERR S. P 279 VOGRIG D. P 5<br />

SERAFINI Luca P 115 VON BÜREN George P 99<br />

SHEIKHOLESLAMI A. P 279 WANG Hui P 52<br />

SIMONI Andrea P 303 WANG L. P 143<br />

SIN Sai-Weng P 398 WANG P.J. P 143<br />

SINGH Tajeshwar P 207 WANG Q. P 243<br />

SIVIERO Claudio P 366 WANG Wei P 67<br />

SKRIVERVIK Anja P 163 WANG Yu P 52<br />

SOETHER Trond P 207 WATTS Bernard Enrico P 195<br />

SONZA REORDA Matteo P 147 WEGENER Carsten P 48<br />

SOUDRIS Dimitrios P 382 WELLIG Armin P 21<br />

SPEVAK M. P 175 WEREMCZUK Jersy P 347<br />

STAMOULIS G.I. P 247 WEY I-Chyn P 67<br />

STECHER M. P 187 WU An-Yeu P 67<br />

STERPONE Luca P 147 WU Chia Tsu P 67<br />

STIEVANO I.S. P 366 YANG Huazhong P 52<br />

STONHAM J. P 363 YANG M. P 143<br />

STOPPA Davide P 303 YAZICIOGLU Refet Firat P 9<br />

TALAY Selçuk P 40 ZEKI Ali P 37<br />

TANGUY N. P 319 ZERVAS E. P 363<br />

TASSIN Claire P 255 ZIMMER Thomas P 275<br />

TEDESCO Annamaria P 91 ZITO Domenico P 1, 135<br />

TELESCU M. P 319 ZORAT Aless<strong>and</strong>ro P 331<br />

TESTONI Nicola P 44 ZORLU O. P 215<br />

TEVA Jordi P 171, 227 ZUNINO Rodolfo P 79<br />

THANAILAKIS Antonios P 382<br />

THEODORATOS Gerasimos P 13<br />

THIEDE Andreas P 437<br />



0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

A NEW TOPOLOGY FOR TRANSFORMER<br />

BASED CMOS ACTIVE INDUCTANCES<br />

Graziella Sc<strong>and</strong>urra 1 , Carm<strong>in</strong>e Ciofi 1 , Domenico Zito 2<br />

1 Dipartimento di Fisica della Materia e TFA, University of Mess<strong>in</strong>a,<br />

Salita Sperone 31, I-98166 Mess<strong>in</strong>a, Italy.<br />

2 Dipartimento di Ingegneria dell’Informazione, University of Pisa,<br />

Via Caruso, I-56122 Pisa, Italy.<br />

E-mail: gscandurra@ingegneria.unime.it<br />

ABSTRACT<br />

The possibility of realiz<strong>in</strong>g the analog block of an RF<br />

front end <strong>in</strong> CMOS technology is quite attractive, s<strong>in</strong>ce it<br />

would lead to the ability of implement<strong>in</strong>g both the digital<br />

<strong>and</strong> the analog sections onto a s<strong>in</strong>gle chip. The<br />

availability of high quality factor (Q) <strong>in</strong>ductors <strong>in</strong> CMOS<br />

technologies is m<strong>and</strong>atory <strong>in</strong> order to achieve this goal.<br />

This is not an easy task, and it is for this reason that<br />

several researchers have started to explore the possibility<br />

of employ<strong>in</strong>g active circuits as part of the <strong>in</strong>ductor itself,<br />

<strong>in</strong> order to compensate for the passive <strong>in</strong>ductor’s losses.<br />

In this paper, we propose a new topology for the design of<br />

integrated transformer-based high-quality active inductors.<br />

As an example, we have designed an active <strong>in</strong>ductor<br />

circuit based on a 0.35 µm CMOS technology <strong>and</strong><br />

capable of behav<strong>in</strong>g as a pure <strong>in</strong>ductance above 2.4 GHz.<br />

1. INTRODUCTION<br />

Integrated passive <strong>in</strong>ductors, especially <strong>in</strong> the case of<br />

st<strong>and</strong>ard CMOS processes, have typical values of Q that<br />

are too low for implement<strong>in</strong>g several fundamental RF<br />

functions (for example highly selective filters)[1]. This<br />

fact represents today one of the most serious obstacles<br />

toward the realization of fully <strong>in</strong>tegrated CMOS front<br />

ends. Although the quality factor can be improved by<br />

resort<strong>in</strong>g to special fabrication steps, the additional<br />

process<strong>in</strong>g cost <strong>and</strong> complication make such an approach<br />

quite unsatisfactory. It is for this reason that there is a<br />

strong <strong>in</strong>terest <strong>in</strong> the possibility of realiz<strong>in</strong>g high quality<br />

<strong>in</strong>ductances by employ<strong>in</strong>g active circuits.<br />

A few topologies of active <strong>in</strong>ductors have been proposed<br />

<strong>in</strong> the literature, which can be divided <strong>in</strong>to three ma<strong>in</strong><br />

categories: a first one <strong>in</strong> which no actual <strong>in</strong>ductance is<br />

used but an <strong>in</strong>ductive effect is obta<strong>in</strong>ed exploit<strong>in</strong>g the<br />

impedance transformation capabilities of circuits such as<br />

the “gyrator” [2], a second one in which a two-terminal element with<br />

negative resistance behavior is placed in series with the inductor<br />

<strong>in</strong> order to compensate for the losses <strong>and</strong> <strong>in</strong>crease the<br />

result<strong>in</strong>g quality factor Q[3], <strong>and</strong> a third one which<br />

exploits the magnetic coupl<strong>in</strong>g between the coils of a<br />

transformer and the current amplification introduced by an<br />

active device, as in [4] or as in the case of the Boot-Strapped<br />

Inductor (BSI) technique [5], which has so far found<br />

application only in bipolar technology [6-8]. As far as the<br />

first approach is concerned, using no actual inductors does<br />

result in a smaller occupied area with respect to other<br />

solutions. However, <strong>in</strong>ductorless circuits <strong>in</strong>troduce a<br />

relatively high level of noise that makes them useless for<br />

the most dem<strong>and</strong><strong>in</strong>g applications [6]. Also the second<br />

approach, the “negative resistance” technique, has several<br />

drawbacks: it causes a significant <strong>in</strong>crease of the noise<br />

figure, it suffers from a strong dependence on<br />

temperature, and it can make the circuit potentially<br />

unstable. It is for these reasons that, starting from the<br />

problem of the application of the BSI technique to the case<br />

of CMOS technology, we developed a new topology that<br />

significantly simplifies the realization of high Q<br />

equivalent inductances while possibly retaining the<br />

advantages connected to the use of actual inductances, that<br />

is, a low level of noise.<br />

2. CMOS BOOT STRAPPED<br />

INDUCTOR TOPOLOGY<br />

The circuit in Fig. 1 can be considered the most obvious<br />

application of the BSI technique, described in [5], to<br />

the case of CMOS technology.<br />

Figure 1. A possible implementation of a CMOS BSI.<br />


Figure 2. Simplified small signal equivalent circuit for the CMOS BSI in Fig. 1.<br />

The circuit exploits the magnetic coupling between two spiral inductors, $L_1$ and $L_2$, which are represented in series with their respective loss resistances, $R_1$ and $R_2$. The inductor $L_R$, with its loss resistance $R_R$, acts as a compensation network together with the equivalent input capacitance $C_{GS}$ of $Q_1$. Note that at the frequency at which $L_R$ and $C_{GS}$ resonate, the current $I_1$ flowing through $L_1$ is in phase with the gate-to-source voltage of $Q_1$. In this case, assuming that the equivalent transconductance gain ($g_m$) of the cascade stage is real, the current $I_1$ is also in phase with the current $I_2$ flowing in the secondary spiral ($L_2$).

As can be easily calculated from the circuit in Fig. 2, the impedance seen between nodes 1 and 2, i.e. the input impedance of the CMOS BSI, is:

$$Z_{in}(j\omega) = R_1 + j\omega L_1 + Z_P \left(1 + j\omega M g_m\right) \qquad (1)$$

where the impedance $Z_P$ is defined as follows:

$$Z_P(s) = \left(R_R + sL_R\right) \parallel \frac{1}{sC_{GS}} \qquad (2)$$

sC<br />

Writing explicitly the real and imaginary parts of $Z_{in}$, with $Z_P = R_P + jX_P$, we get:

$$\begin{cases} R_{in}(j\omega) = R_1 + R_P(j\omega) - \omega M g_m X_P(j\omega) \\ X_{in}(j\omega) = \omega L_1 + \omega M g_m R_P(j\omega) + X_P(j\omega) \end{cases} \qquad (3)$$
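
As a numerical illustration of Eqs. (1)-(3), the following Python sketch evaluates $Z_P$ and solves $R_{in} = 0$ for the transconductance, which from Eq. (3) gives $g_m = (R_1 + R_P)/(\omega M X_P)$. The element values are the ones quoted later in this section; the function names, the value chosen for $C_{GS}$, and the reading of $Q_R$ as $\omega L_R / R_R$ at the working frequency are assumptions made only for this example, since the paper does not report them. The printed numbers are therefore illustrative only.

```python
import math

def z_parallel(omega, r_r, l_r, c_gs):
    """Eq. (2): (R_R + j*omega*L_R) in parallel with 1/(j*omega*C_GS)."""
    z_l = complex(r_r, omega * l_r)
    z_c = 1.0 / complex(0.0, omega * c_gs)
    return z_l * z_c / (z_l + z_c)

def bsi_input_impedance(omega, r_1, l_1, m, g_m, z_p):
    """Eq. (1): Z_in = R_1 + j*omega*L_1 + Z_P*(1 + j*omega*M*g_m)."""
    return complex(r_1, omega * l_1) + z_p * complex(1.0, omega * m * g_m)

def gm_for_zero_resistance(omega, r_1, m, z_p):
    """Solve R_in = 0 in Eq. (3); only meaningful where X_P is positive."""
    if not z_p.imag > 0.0:
        raise ValueError("X_P must be positive (below the resonance of Z_P)")
    return (r_1 + z_p.real) / (omega * m * z_p.imag)

omega = 2.0 * math.pi * 2.4e9      # working frequency of 2.4 GHz
l_r = 2.6e-9                       # L_R = 2.6 nH
r_r = omega * l_r / 5.0            # from Q_R = 5 (assumed definition of Q)
l_1, r_1, m = l_r, r_r, 2.0e-9     # L_1 = L_R, R_1 = R_R, M = 2 nH
c_gs = 0.4e-12                     # ASSUMED capacitance, not given in the paper

z_p = z_parallel(omega, r_r, l_r, c_gs)
g_m = gm_for_zero_resistance(omega, r_1, m, z_p)
z_in = bsi_input_impedance(omega, r_1, l_1, m, g_m, z_p)
print(f"g_m = {g_m * 1e3:.1f} mA/V, Z_in = {z_in.real:.2e} + j{z_in.imag:.1f} ohm")
```

With the $g_m$ returned by `gm_for_zero_resistance`, the real part of `z_in` vanishes up to rounding error and the imaginary part stays positive, which is the purely inductive condition discussed in the text.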

In order to obtain a purely inductive behaviour for $Z_{in}$ at a given frequency, we should be able to obtain simultaneously a null real part $R_{in}$ and an inductive (positive) imaginary part $X_{in}$. It can be noted that below the resonance frequency of $Z_P$ (which is quite close to the frequency $\omega_{0R} = (L_R C_R)^{-1/2}$), the reactance $X_P$ is certainly positive. Therefore, as is clear from Eq. (3), below the resonance frequency of $Z_P$ it is possible to obtain, at a given angular frequency $\omega_D$, very small values (even 0) for the real part of the equivalent impedance, provided that proper values of $g_m$ and $M$ are used. However, in order not to end up with a negative real part of the equivalent impedance for frequencies above $\omega_D$, the resistance must also assume its minimum value at $\omega = \omega_D$. In conclusion, in order to obtain a purely inductive behaviour for $Z_{in}$ in a frequency range around $\omega_D$

(ω D


$$Z_{IN} = \frac{\left(Z_P + R_1 + j\omega L_1\right)\left(1 + j\omega M g_m\right)}{1 + \omega^2 M^2 g_m^2} \qquad (5)$$

Note that in Fig. 3 $C_R$ plays the role of $C_{GS}$ in Fig. 2. Comparing this expression with Eq. (1), it can be noted that $Z_{IN}$ for the new topology presents a larger positive phase contribution, since the term $(1 + j\omega M g_m)$ now multiplies the sum of $Z_P$, $R_1$ and $j\omega L_1$ rather than $Z_P$ alone, as happens in the previous topology. Note also that both the bias voltage $V_R$ and the capacitance $C_R$ (which can be implemented by means of a CMOS varicap) can be used to compensate for process variations and for the non-ideal behaviour of the active devices.

In order to appreciate the advantages of the new approach, we designed an active inductor using both topologies in an AMS 0.35 µm CMOS technology. Using the reasonable parameter values $L_R = 2.6$ nH with a quality factor $Q_R = 5$, $L_1 = L_R = L_2$, $R_1 = R_R$, and $M = 2$ nH, we found that, at a working frequency of 2.4 GHz, a $g_m$ of about 20 mA/V is needed with the BSI topology to obtain an ideal inductive behaviour, whereas with the new approach a $g_m$ of 11 mA/V is sufficient. Clearly, a lower value of $g_m$ results in a lower bias current for the active device and, therefore, in lower power consumption. This is particularly important for CMOS devices, which have a lower transconductance than bipolar devices for the same bias current.
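
The reduction in the required $g_m$ can be traced back to the two impedance expressions. Setting the real part to zero in Eq. (1) gives $g_m = (R_1 + R_P)/(\omega M X_P)$; doing the same in Eq. (5), whose denominator is real and positive, gives $g_m = (R_1 + R_P)/(\omega M (X_P + \omega L_1))$. The ratio between the two is $X_P/(X_P + \omega L_1)$, which is smaller than one whenever $X_P$ is positive. The Python sketch below evaluates both expressions. The value of $C_R$ and the reading of $Q_R$ as $\omega L_R / R_R$ are assumptions (the paper does not report them), so the printed values are illustrative and are not expected to reproduce 20 mA/V and 11 mA/V exactly.

```python
import math

def required_gm(omega, r_1, l_1, m, z_p, new_topology):
    """g_m that nulls the real part of the input impedance.

    BSI, Eq. (1):          Re = R_1 + R_P - omega*M*g_m*X_P
    New topology, Eq. (5): Re is proportional to
                           R_1 + R_P - omega*M*g_m*(X_P + omega*L_1)
    """
    reactance = z_p.imag + (omega * l_1 if new_topology else 0.0)
    return (r_1 + z_p.real) / (omega * m * reactance)

omega = 2.0 * math.pi * 2.4e9   # 2.4 GHz
l_1 = 2.6e-9                    # L_1 = L_R = 2.6 nH
r_1 = omega * l_1 / 5.0         # R_1 = R_R from Q_R = 5 (assumed definition)
m = 2.0e-9                      # M = 2 nH
c_r = 0.4e-12                   # ASSUMED capacitance, not given in the paper

# Z_P: (R_R + j*omega*L_R) in parallel with 1/(j*omega*C_R)
z_l = complex(r_1, omega * l_1)
z_c = 1.0 / complex(0.0, omega * c_r)
z_p = z_l * z_c / (z_l + z_c)

gm_bsi = required_gm(omega, r_1, l_1, m, z_p, new_topology=False)
gm_new = required_gm(omega, r_1, l_1, m, z_p, new_topology=True)
print(f"BSI: {gm_bsi * 1e3:.1f} mA/V, new topology: {gm_new * 1e3:.1f} mA/V")
```

Whatever capacitance is assumed, the new topology needs the smaller transconductance as long as the circuit operates below the resonance of $Z_P$.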

3. RESULTS

As an example of the performance that can be obtained with the new approach, we have been able to design an active inductance of about 6 nH at the working frequency of 2.4 GHz by employing an AMS 0.35 µm CMOS technology and circuit parameters quite close to the ones used for the example in the previous section (the width of the transistors was 100 µm). The simulation results are summarized in the plots in Fig. 4.

The bias current needed to obtain this performance is less than 1 mA for a supply voltage of 3.3 V.

The condition of ideal inductance can be reached over a significantly wide frequency range, from about 2.2 up to 2.8 GHz, as shown in Fig. 5, by setting proper values for the two available control voltages.

This result is quite encouraging, as it suggests that, by designing a proper control system such as the one used in [9], such a wide tuning range can be exploited to compensate for process and temperature variations. The possibility of designing such a control system, together with a detailed analysis of the deviations from ideal behaviour that can be expected in an actual realization of the circuit, is currently being investigated.

Figure 4. Q factor and inductance of the active inductor. Axes: Q versus f (GHz).

Figure 5. Q factor and inductance of the active inductor. Axes: Q versus f (GHz).

4. CONCLUSION

In this paper, the possibility of realizing high quality integrated active inductors by employing circuit techniques relying on integrated transformers has been explored. We started our investigation by trying to extend to MOS technology the BSI techniques successfully applied in bipolar technology. A new topology has been presented that, while considerably reducing the requirements in terms of gain, transition frequency and power consumption, provides two control voltages that can be used to accurately tune the circuit during its actual operation.

An AMS 0.35 µm CMOS technology has been used to demonstrate the validity of the proposed solution, which made it possible to reach the condition of zero equivalent series resistance (that is, a virtually infinite Q) for an inductance value of about 6 nH at 2.4 GHz by properly acting on the above-mentioned control voltages. A problem that has to be investigated in the future is the robustness of the proposed approach against process tolerances and temperature effects. However, it has already been shown in the case of bipolar technology that these problems can be solved with an accurate design and some supplemental circuitry. For this reason we believe that the work done so far is quite encouraging in view of the possibility of realizing a fully integrated CMOS RF front end, and that it is worth further investigating the actual performance that can be expected from the new topology we have developed.

Moreover, it can be expected that by following the very same approach, and provided that up-to-date technologies are employed, almost ideal inductances could be obtained at higher frequencies, particularly in the 5 GHz range, which may soon become the most interesting frequency range requiring the realization of high quality, fully integrated CMOS RF front ends for single-chip, low-cost WLAN interfaces.


5. REFERENCES

[1] F. Mernyei, F. Darrer, M. Pardoen, A. Sibrai, “Reducing the Substrate Losses for RF Integrated Inductors”, IEEE Microwave and Guided Wave Letters, vol. 8, pp. 300-301, September 1998.

[2] A. Thanachayanont, “CMOS Transistor-Only Active Inductor for IF/RF Applications”, IEEE ICIT ’02, Bangkok, Thailand.

[3] W.B. Kuhn, N.K. Yunduru, A.S. Wyszynski, “Q-Enhanced LC Bandpass Filters for Integrated Wireless Applications”, Trans. on Microwave Theory and Techniques, vol. 46, no. 12, Dec. 1998, pp. 2577-2586.

[4] Y.C. Wu, M.F. Chang, “On-Chip High Q (>3000) Transformer-Type Spiral Inductors”, Electronics Letters, vol. 38, no. 3, 31st January 2002.

[5] G. D’Angelo, L. Fanucci, A. Monorchio, A. Monterastelli, B. Neri, “High Quality Active Inductors”, IEE Electronics Letters, no. 20, 30th Sept. 1999, pp. 1727-1728.

[6] L. Fanucci, G. D’Angelo, A. Monterastelli, M. Paparo, B. Neri, “Fully Integrated Low-Noise Amplifier with High Quality Factor L-C Filter for 1.8GHz Wireless Applications”, ISCAS 2001, the 2001 IEEE International Symposium on Circuits and Systems, vol. 4, 6-9 May 2001, pp. 462-465.

[7] D. Zito, L. Fanucci, B. Neri, S. Di Pascoli, G. Scandurra, “Single Chip 1.8GHz Band Pass LNA with Temperature Self-Compensation”, SCS 2003, International Symposium on Signals, Circuits and Systems, vol. 1, 10-11 July 2003, pp. 121-124.

[8] L. Fanucci, A. Hopper, B. Neri, D. Zito, “A Novel Fully Integrated Antenna Switch for Wireless Systems”, ESSDERC '03, 33rd Conference on European Solid-State Device Research, 16-18 Sept. 2003, pp. 553-556.

[9] D. Zito, F. De Bernardinis and B. Neri, “Modeling and Design of a Tunable high Q LNA for WLAN”, Proceedings of IEEE International Conference on Signals and Electronic Systems 2004 (ICSES 2004), Poznan (PL), 13-15 Sept. 2004, pp. 253-256.



A 0.35 μm CMOS ANALOG TURBO DECODER FOR A 40-BIT, RATE 1/3, UMTS CHANNEL CODE

D. Vogrig¹, A. Gerosa¹, A. Neviani¹, A. Graell i Amat², G. Montorsi², S. Benedetto²

¹ DEI, Università di Padova, Via Gradenigo 6/B, 35131 Padova, Italy
{daniele.vogrig, gerosa, neviani}@dei.unipd.it

² CERCOM, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy
{alex.graell, montorsi, benedetto}@polito.it

ABSTRACT

In this work we present the design and preliminary test results of the first reported CMOS analog turbo decoder for a code of realistic complexity: a parallel concatenated, rate 1/3 code defined in the 3GPP standard, with an interleaver size of 40 bits and a codeword size of 132 bits. The preliminary test results demonstrate a data rate of 2 Mbit/s, with a power consumption of 6.6 mW (decoder core alone) and an energy per decoded bit and trellis state of only 1.36 nJ. Different short range low power wireless data transmission systems for ISM (Industrial Scientific and Medical) applications are compared with respect to their performances (power consumption, distance range, architecture well suited to integration, etc).

1. INTRODUCTION

After the first successful implementations of analog iterative decoders [1, 2] in BiCMOS technology, a full CMOS solution for a very simple Hamming (8,4) code was proposed in [3]. Shortly after, Gaudet et al. [4] realized the first analog turbo decoder, which is a significant step forward with respect to a decoder for a single convolutional code, but still far from a real application due to the limited interleaver size (16 bits). These prototypes have shown an outstanding improvement in power efficiency with respect to their digital counterparts, at the cost of some loss in error-correcting performance. Their success fully justifies a further research effort to demonstrate that analog decoders for very high performance codes are feasible and maintain their superiority over digital implementations.

In this work, we present the design and preliminary test results of the first reported analog turbo decoder for a realistic application [5]: a rate 1/3, parallel concatenated code defined in the 3GPP standard with an interleaver size of 40 bits.

2. SYSTEM DESCRIPTION

2.1 The 3GPP Turbo Code

The code considered in this work is taken from the 3GPP standard and consists of the parallel concatenation of two identical Recursive Convolutional Systematic (RCS) constituent encoders, each with rate 1/2 and 8 states. The two constituent encoders are connected in parallel through an interleaver that permutes blocks of information bits $u_k$ before they are fed to the second constituent encoder, in order to enhance the error-correcting capability of the code. The block size (or interleaver length) $N_B$ plays a fundamental role in setting the performance of the code in terms of coding gain. In this work, the minimum block size defined in the 3GPP standard, $N_B = 40$, was considered. Two parity bits, $c_k^{(1)}$ and $c_k^{(2)}$, are generated for every information bit $u_k$. As a consequence, the rate of the code is $R_C = 1/3$, so that a block of 40 information bits generates 120 code bits plus another $N_T = 12$ termination bits.
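
The bit count above can be reproduced with a small encoder model. The Python sketch below is a minimal illustration, not the authors' code: the generator polynomials (feedback $1 + D^2 + D^3$, feedforward $1 + D + D^3$) are assumed from the 3GPP specification rather than taken from this paper, the permutation `perm` is a hypothetical stand-in for the actual 3GPP interleaver, and the ordering of the tail bits in the output is chosen only for readability.

```python
import random

N_B = 40   # information block size used in the paper
N_T = 12   # number of termination bits stated in the paper

def rsc_encode(bits):
    """8-state recursive systematic encoder with trellis termination.

    Assumed polynomials: feedback 1 + D^2 + D^3, feedforward 1 + D + D^3.
    Returns (parity bits, tail systematic bits, tail parity bits).
    """
    s1 = s2 = s3 = 0
    parity = []
    for u in bits:
        a = u ^ s2 ^ s3                 # feedback bit
        parity.append(a ^ s1 ^ s3)      # feedforward (parity) output
        s1, s2, s3 = a, s1, s2
    tail_sys, tail_par = [], []
    for _ in range(3):                  # drive the register back to state 0
        tail_sys.append(s2 ^ s3)        # input that forces the feedback bit to 0
        tail_par.append(s1 ^ s3)
        s1, s2, s3 = 0, s1, s2
    assert (s1, s2, s3) == (0, 0, 0)
    return parity, tail_sys, tail_par

def turbo_encode(info, perm):
    """Parallel concatenation of two identical constituent encoders."""
    p1, t1_sys, t1_par = rsc_encode(info)
    p2, t2_sys, t2_par = rsc_encode([info[i] for i in perm])
    code = []
    for u, c1, c2 in zip(info, p1, p2):
        code += [u, c1, c2]             # (u_k, c_k^(1), c_k^(2))
    return code + t1_sys + t1_par + t2_sys + t2_par

rng = random.Random(0)
info = [rng.randint(0, 1) for _ in range(N_B)]
perm = rng.sample(range(N_B), N_B)      # HYPOTHETICAL interleaver, not the 3GPP one
codeword = turbo_encode(info, perm)
print(len(codeword))                    # 3 * 40 + 12 = 132
```

Each constituent encoder contributes three tail input bits and three tail parity bits, which accounts for the 12 termination bits.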

We assume that the code bit stream $(u_k, c_k^{(1)}, c_k^{(2)})$ generated by the encoder is BPSK-modulated, then transmitted through an Additive White Gaussian Noise (AWGN) channel, and finally demodulated before being processed by the analog decoder. Each block of 40 information bits is thus turned into a channel output frame $(y_k^{u}, y_k^{c1}, y_k^{c2})$ of $N_C = 132$ real numbers that is processed by the decoder in parallel.
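
The transmission model can be sketched as follows. This is a minimal Python illustration under stated assumptions: the bit-to-symbol mapping (0 to +1, 1 to -1), unit symbol energy, the interpretation of the SNR as $E_b/N_0$, and all function names are choices made for this example and are not specified in the paper. For a Gaussian channel the ratio $p(y|x=+1)/p(y|x=-1)$ equals $\exp(2y/\sigma^2)$, which is what `transition_probs` normalizes.

```python
import math
import random

def awgn_sigma(ebn0_db, rate):
    """Noise standard deviation for unit-energy BPSK symbols (assumed SNR = Eb/N0)."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return math.sqrt(1.0 / (2.0 * rate * ebn0))

def channel(bits, sigma, rng):
    """BPSK mapping (0 -> +1, 1 -> -1, an assumption) followed by AWGN."""
    return [(1.0 - 2.0 * b) + rng.gauss(0.0, sigma) for b in bits]

def transition_probs(y, sigma):
    """Normalized pair (p(y|x=+1), p(y|x=-1)) for one channel output."""
    llr = 2.0 * y / sigma ** 2          # ln[p(y|x=+1) / p(y|x=-1)]
    p_plus = 1.0 / (1.0 + math.exp(-llr))
    return p_plus, 1.0 - p_plus

rng = random.Random(1)
N_C = 132                                            # frame length from the paper
code_bits = [rng.randint(0, 1) for _ in range(N_C)]  # stand-in for a codeword
sigma = awgn_sigma(2.5, 1.0 / 3.0)
frame = channel(code_bits, sigma, rng)
probs = [transition_probs(y, sigma) for y in frame]
print(len(probs), probs[0])
```

The resulting list of 132 probability pairs plays the role of the decoder input described in the text.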

2.2 The Decoding Algorithm

A block diagram of our decoder is depicted in Fig. 1. As in the iterative decoding technique originally proposed by Berrou et al. [6], two maximum a-posteriori (MAP) decoders matched to the two constituent encoders calculate and exchange information on the probabilities of the information bits, $\hat{p}(u_k)$, based on the channel output observations (i.e. a frame of channel output symbols $y_k$) and on the extrinsic information $\tilde{p}(u_k)$ coming from the companion MAP decoder. The interleavers ($\Pi$) and deinterleavers ($\Pi^{-1}$) permute the channel, extrinsic and output information so that they match the order of the input information bits of the two constituent encoders. The equal gate at the output computes the a-posteriori probability $\hat{p}(u_k)$ from those coming from the two SISO decoders.

Fig. 1. Turbo decoder block diagram

In our work, the two identical MAP decoders (the constituent codes being identical) are realized at the functional level as Soft-Input Soft-Output (SISO) decoders that implement the forward-backward algorithm (a.k.a. the BCJR algorithm) [7, 8]. Our turbo decoder is based on the multiplicative version of the algorithm, which basically performs sum-of-product computations on bit probabilities, taking a frame of $N_C = 132$ channel transition probabilities $p(y_k|x_k=\pm 1)$ at its input and calculating the a-posteriori probabilities $\hat{p}(u_k)$.


Fig. 2. SISO decoder block diagram

A block diagram of a single SISO decoder is shown in Fig. 2. It requires six different basic building blocks (identified by the letters A through F in Fig. 2) to perform all the different sum-of-product computations [8].

3. Analog decoder design

The approach followed to implement the decoding algorithm outlined in the previous section with an analog circuit is basically the one proposed by Loeliger et al. [9] and adopted in [1] and [3]. It is a current-mode technique, in which the basic cells computing sums of products (sum-product cells) are derived directly from the well-known Gilbert multiplier cell. A transistor-level example of a sum-product cell is shown in Fig. 3, which reports the CMOS implementation of an E cell. The nMOSFET at the bottom generates the normalization current $I_B$. The nMOSFETs in the middle are operated in the weak inversion region and generate every pairwise product of the two input current vectors $(I_{X0}, I_{X1})$ and $(I_{Y0}, I_{Y1}, I_{Y2}, I_{Y3})$. In the upper part of the circuit, the currents representing the products are summed, by simply shorting the corresponding wires, or discarded, by connecting the wire to $V_{DD}$, if they are not needed in the sum. The upper pMOSFETs mirror the output currents $(I_{Z0}, I_{Z1})$ to make them available to the cascaded stages.

Fig. 3. CMOS implementation of an E cell
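
The function of a sum-product cell can be summarized with a behavioural model. The Python sketch below is an idealized illustration and not a circuit simulation: the `routing` table, which says which pairwise products are summed into which output, is hypothetical (the paper does not list the connections of the E cell), and the normalization of the outputs to the bias current is an idealization of what the circuit does.

```python
def sum_product_cell(i_x, i_y, routing, n_out, i_bias=1e-6):
    """Idealized current-mode sum-product cell.

    i_x, i_y : input current vectors, proportional to probabilities
    routing  : dict mapping an index pair (i, j) to an output index;
               products not listed are discarded (wired to VDD in the circuit)
    Returns output currents scaled so that they sum to i_bias.
    """
    sums = [0.0] * n_out
    for (i, j), k in routing.items():
        sums[k] += i_x[i] * i_y[j]      # shorted wires add the product currents
    total = sum(sums)
    if total == 0.0:
        raise ValueError("all routed products are zero")
    return [i_bias * s / total for s in sums]

# Two-element X vector, four-element Y vector, two outputs (as for the E cell).
i_x = [0.7e-6, 0.3e-6]
i_y = [0.4e-6, 0.1e-6, 0.3e-6, 0.2e-6]
routing = {(0, 0): 0, (1, 1): 0, (0, 2): 1, (1, 3): 1}   # HYPOTHETICAL wiring
i_z = sum_product_cell(i_x, i_y, routing, n_out=2)
print(i_z)
```

In this example four of the eight products are routed to the outputs and the remaining four are discarded.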

The transistor network needed to implement a SISO decoder is realized by directly connecting the sum-product cells according to the diagram reported in Fig. 2. When two cells are cascaded, high-output-impedance, saturated pMOSFETs drive the output currents into diode-connected nMOSFETs.

The interleaver is realized as a network of independent metal paths that permute the SISO1 and the channel output currents before they are fed to the SISO2 input terminals, according to the same permutation scheme adopted in the encoder. The deinterleaver is an identical network performing the opposite operation on the information flowing from SISO2 to SISO1.

The prototype also includes an I/O interface whose basic function is to store a frame of channel output symbols $y_k$ in an analog memory, convert them to currents representing the channel transition probabilities $p(y_k|x_k=\pm 1)$, perform decisions on the a-posteriori probabilities computed by the decoder, and manage the output of the decoded bits $\hat{u}_k$.

In order to ease the testing phase, the channel output is fed to the circuit in a 6-bit digital representation and is converted to a differential voltage by a programmable-gain DAC. A pseudo-differential buffer is placed at the DAC output in order to drive the large parasitic capacitance of the long interconnections that distribute the signal across the analog memory banks. The analog memory is organized into separate banks for the two SISO decoders, each including 86 storage elements: 40 for the symbols corresponding to the information bits $u_k$, another 40 for the parity check symbols, and 6 for the termination. The storage elements are realized with a pseudo-differential architecture based on sampling capacitors buffered by pMOSFET source followers. Transmission gates multiplex the output voltages of the two arrays, connecting them to a pMOSFET differential pair (DP) biased in weak inversion, which converts the differential voltage proportional to the channel output into a current pair proportional to the channel transition probabilities $p(y_k|x_k=\pm 1)$ [9].
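
The voltage-to-probability conversion relies on the exponential characteristic of transistors in weak inversion. A minimal Python sketch, assuming the textbook model of an ideal weak-inversion differential pair, is given below: the two branch currents split the bias current according to a logistic function of the differential voltage, so a voltage proportional to the log-likelihood ratio yields currents proportional to the normalized transition probabilities. The slope factor `n`, the temperature and the function names are assumptions for illustration; sign conventions of the pMOSFET pair are ignored.

```python
import math

K_B_OVER_Q = 8.617333e-5   # Boltzmann constant over electron charge, V/K

def diff_pair_currents(delta_v, i_bias=1e-6, n=1.3, temp_k=300.0):
    """Branch currents of an ideal weak-inversion differential pair.

    n (slope factor) and temp_k are ASSUMED values, not taken from the paper.
    """
    u_t = K_B_OVER_Q * temp_k                       # thermal voltage
    i_plus = i_bias / (1.0 + math.exp(-delta_v / (n * u_t)))
    return i_plus, i_bias - i_plus

def voltage_for_llr(llr, n=1.3, temp_k=300.0):
    """Differential voltage that makes ln(I+/I-) equal to the given LLR."""
    return llr * n * K_B_OVER_Q * temp_k

i_plus, i_minus = diff_pair_currents(voltage_for_llr(2.0))
print(i_plus, i_minus, math.log(i_plus / i_minus))
```

The two currents always sum to the bias current, which corresponds to the normalization of the probability pair.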


Fig. 4. Simplified timing diagram of the decoder

At the decoder output, a bank of 40 current comparators makes the final decision on the a-posteriori probabilities $\hat{p}(u_k)$.

A simplified timing diagram of the decoder is shown in Fig. 4. All operations are synchronized to a two-phase non-overlapping clock, internally generated from an external master clock. Loading an input frame takes 132 clock cycles. Thanks to the double array of memory elements, the loading and decoding operations are performed in parallel. The correct sequence of signals to drive the S/Hs in the analog memory banks is generated using circular buffers. The hard decision on the decoder output is taken in the last two clock cycles, and the 40 resulting hard bits $\hat{u}_k$ are multiplexed onto an 8-bit output port in the next five clock cycles. In the first two cycles the SISO modules are reset to uniform probability, which leaves 128 clock cycles for the decoding operation.
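
The cycle budget can be checked with a few lines of arithmetic. The Python sketch below uses only figures stated in the text, together with the nominal 2 MHz master clock reported in the experimental section; the variable names are illustrative.

```python
F_CLOCK_HZ = 2e6          # nominal master clock (from the experimental section)
FRAME_CYCLES = 132        # cycles needed to load one input frame
RESET_CYCLES = 2          # SISO modules reset to uniform probability
DECISION_CYCLES = 2       # hard decision on the decoder output
INFO_BITS = 40            # decoded bits per frame
OUTPUT_PORT_BITS = 8      # width of the output port

decode_cycles = FRAME_CYCLES - RESET_CYCLES - DECISION_CYCLES
output_cycles = INFO_BITS // OUTPUT_PORT_BITS
frame_time_s = FRAME_CYCLES / F_CLOCK_HZ
decode_time_s = decode_cycles / F_CLOCK_HZ

print(decode_cycles, output_cycles)                 # 128 5
print(frame_time_s * 1e6, decode_time_s * 1e6)      # times in microseconds
```

At the nominal clock frequency the analog network therefore has 64 μs to settle on each frame.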

Due to the complexity of the circuit, it was not possible to predict the error-correcting performance of the design by transistor-level simulations.

TABLE I

| Parameter | Value |
|---|---|
| Cell bias current $I_B$ (μA) | 1 |
| Tail current nMOSFET W/L (μm/μm) | 4/4 |
| Multiplier nMOSFET W/L (μm/μm) | 5/0.5 |
| Mirror pMOSFET W/L (μm/μm) | 2/2 |
| S/H capacitance (fF) | 305 |
| S/H nMOSFET switch W/L (μm/μm) | 2.4/0.3 |
| S/H buffer bias current (μA) | 2 |
| DP bias current (μA) | 1 |
| DP pMOSFET W/L (μm/μm) | 40/0.5 |

Transistors in the sum-product cells have been sized for 6-bit matching accuracy on differential input voltages. This figure comes from the precision used in digital decoders working with log-likelihood ratios [10]. This choice was verified using a high-abstraction-level software description of the analog decoder [5], including a simple mismatch model similar to that reported in [11], in which the input matching data were derived from transistor-level simulations of the sum-product cells. The model predicts a performance loss between 0.2 and 0.4 dB, which was considered acceptable. With transistor sizing set by the matching requirements, the cell bias current ($I_B$ in Fig. 3) was set so as to guarantee operation between weak and moderate inversion of the devices. A small number of lengthy transient simulations of the decoder core at transistor level were performed to verify that the design satisfied the speed requirements. Finally, layout solutions that keep the maximum interconnection length under control were adopted in order to limit the parasitic capacitance at critical nodes. A summary of the main design features is listed in Table I.

4. Experimental results

The circuit was designed and fabricated in a three-metal, double-poly, 0.35 μm CMOS technology. A microphotograph of the chip is shown in Fig. 5.

Fig. 5. Chip microphotograph

The experimental setup used to characterize the decoder performance consisted of a personal computer emulating the encoder and the AWGN channel, connected to a logic analyzer/pattern generator. A measurement cycle is made up of three steps: (1) generation of the test input stream; (2) decoding of the input stream by the device under test; (3) post-processing of the decoder output. In step (1), a given number of frames of channel output symbols is generated on the PC in a 6-bit digital representation and then transferred to the pattern generator. In step (2), the pattern generator feeds the digital input port of the analog decoder. As mentioned in section 3, decoding and loading of a new frame are performed in parallel. The decoded bits are captured and stored by the logic analyzer. The two synchronization signals that drive the pattern generator (READ) and the logic analyzer (OUTPUT_READY) are generated internally by the chip from an external master clock. In step (3), the decoded bits are transferred to the PC for post-processing. In the nominal testing conditions, the master clock frequency is $F_{clock} = 2$ MHz (which corresponds to the maximum channel data rate in a UMTS system), the power supply is $V_{DD} = 3.3$ V and the sum-product cell bias current is $I_B = 1$ μA.

The bit and frame error rate (BER, FER) curves vs signal-to-noise ratio (SNR) measured on a chip sample are reported in Fig. 6 (solid lines), together with the benchmark BER and FER curves (dashed lines). The benchmark is a software implementation of a digital version of the decoder, and yields the ideal performance of the decoding algorithm provided that a large enough number of iterations is executed. The performance of the sample shows a loss of about 0.5 dB at a BER of $10^{-3}$, and this loss stays constant down to the vicinity of BER = $10^{-6}$. More measurements are under way to validate this observation.


[Figure 6: BER/FER (logarithmic axis, 1.E+00 to 1.E-06) versus SNR (dB, 0 to 6). Curves: FER (ideal), BER (ideal), FER (measurement), BER (measurement); the gap between measured and ideal curves is marked as 0.5 dB.]

Fig. 6. BER and FER curves vs SNR measured (solid lines) compared to the ideal BER and FER.

As mentioned in section 3, high-abstraction-level simulations including mismatch predict a loss between 0.2 and 0.4 dB. We have also executed a set of BER measurements at an SNR of 2.5 dB with different cell bias currents I_B and clock frequencies F_clock, whose results are reported in Fig. 7. As expected, a significant increase of the BER can be observed at lower I_B and higher F_clock, due to the decrease of the decoder speed with the bias current. However, no significant performance gain was obtained when moving from the nominal testing conditions to a lower clock frequency and a higher bias current. This confirms the validity of the design choices and rules out that finite decoder speed or operation in moderate inversion contributes significantly to the BER performance loss observed in Fig. 7.

[Figure 7: plot with the nominal testing conditions marked.]

Fig. 7. BER measured as a function of cell bias current I_B and clock frequency F_clock.

A deeper experimental investigation is under way to understand the actual performance limitations of the prototype. A summary of the measured chip performance is reported in Table II. The energy per decoded bit and trellis state of the decoder core is 2.5 times lower than that reported in [4], and the performance loss is also lower.

TABLE II

                                      decoder core   whole chip
area excl. pads (mm^2)                3.7×1.1        4.5×2.0
number of transistors                 30,000         70,000
V_DD (nominal, V)                     3.3            3.3
Power (mW)                            6.8            10.3
Energy/dec. bit/trellis state (nJ)    1.4            2.1
data throughput (Mbit/s)              2

5. Conclusions

This work reports the design and testing of the first analog turbo decoder for a code of realistic complexity. The decoder was realized in a plain CMOS technology and was tested at a data rate of 2 Mbit/s, with a power consumption of 6.6 mW (decoder core alone), an energy per decoded bit and trellis state of 1.4 nJ, and a performance loss of 0.5 dB w.r.t. the ideal algorithm.

6. Bibliography<br />

[1] F. Lustenberger, M. Helfenstein, G. S. Moschytz, H.-A. Loeliger, F. Tarköy, “All-Analog Decoder for a Binary (18, 9, 5) Tail-Biting Trellis Code”, in Proc. ESSCIRC, pp. 362-365, Duisburg, Germany, Sep. 1999.

[2] M. Moerz, T. Gabara, R. Yan, J. Hagenauer, “An Analog 0.25 um BiCMOS Tailbiting MAP Decoder”, in Proc. IEEE ISSCC, pp. 356-357, San Francisco, CA, Feb. 2000.

[3] C. Winstead, J. Dai, S. Yu, C. Myers, R. R. Harrison, C. Schlegel, “CMOS Analog MAP Decoder for (8,4) Hamming Code”, IEEE J. Solid-State Circ., vol. 39, no. 1, pp. 122-131, Jan. 2004.

[4] V. C. Gaudet, P. G. Gulak, “A 13.3-Mb/s 0.35-um CMOS Analog Turbo Decoder IC with a Configurable Interleaver”, IEEE J. Solid-State Circ., vol. 38, no. 11, pp. 2010-2015, Nov. 2003.

[5] A. Graell i Amat, S. Benedetto, G. Montorsi, D. Vogrig, A. Neviani, A. Gerosa, “An Analog Turbo Decoder for the UMTS Standard”, to be presented at 2004 IEEE ISIT, Chicago, IL, USA, June 27 - July 2, 2004.

[6] C. Berrou, A. Glavieux, P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: Turbo codes”, in Proc. of ICC, Geneva, pp. 1064-1070, May 1993.

[7] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate”, IEEE Trans. Inform. Theory, vol. 20, pp. 284-287, Mar. 1974.

[8] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, “Soft-input soft-output modules for the construction and distributed iterative decoding of code networks”, European Trans. Telecommunications, vol. 9, pp. 155-172, Mar. 1998.

[9] H.-A. Loeliger, F. Lustenberger, M. Helfenstein, F. Tarköy, “Probability Propagation and Decoding in Analog VLSI”, IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 837-843, Feb. 2001.

[10] G. Montorsi, S. Benedetto, “Design of fixed-point iterative decoders for concatenated codes with interleavers”, IEEE J. Sel. Areas Comm., vol. 19, no. 5, pp. 871-882, May 2001.

[11] F. Lustenberger, H.-A. Loeliger, “On Mismatch Errors in Analog-VLSI Error Correcting Decoders”, Proc. of ISCAS '01, Sydney, Australia, vol. IV, pp. 198-201, 2001.



EFFECT OF ELECTRODE OFFSET ON THE<br />

CMRR OF THE CURRENT BALANCING<br />

INSTRUMENTATION AMPLIFIERS<br />

Refet Firat Yazicioglu 1,2 , Patrick Merken 1 , Chris Van Hoof 1,2<br />

1 IMEC, Kapeldreef 75, 3001 Leuven, Belgium<br />

2 K. U. Leuven, ESAT-INSYS, Kasteelpark Arenberg 10, 3001 Leuven, Belgium<br />

E-mail: Refet.Firat.Yazicioglu@imec.be<br />

ABSTRACT<br />

This paper describes the effect of the electrode-offset voltage on the common-mode rejection ratio (CMRR) of current-balancing instrumentation amplifiers (CBIAs). The calculated CMRR behavior of an instrumentation amplifier (IA) implemented in 0.5 µm CMOS technology is compared with simulated and measured results. Finally, a technique for improving the CMRR of the IA under electrode offset is proposed.

1. INTRODUCTION<br />

The most important challenge in extracting low-level biopotential signals is rejecting the high-level common-mode signals coupled to the human body from the mains. This common-mode signal can be 4-5 orders of magnitude higher than the biopotential signals. Therefore, an IA with a high CMRR is a vital component of biopotential acquisition systems.
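To put the "4-5 orders of magnitude" in more familiar units, the amplitude ratio can be converted to decibels. The sketch below is only an illustration: the function name `required_cmrr_db` is hypothetical, and the criterion used (attenuating the common-mode contribution down to the level of the wanted signal) is an assumption, not a specification taken from the paper.

```python
import math


def required_cmrr_db(cm_to_signal_ratio: float) -> float:
    """CMRR (in dB) that brings the common-mode contribution at the output
    down to the level of the differential signal, for a given amplitude
    ratio between common-mode and differential input signals."""
    if cm_to_signal_ratio <= 0:
        raise ValueError("ratio must be positive")
    return 20.0 * math.log10(cm_to_signal_ratio)


if __name__ == "__main__":
    for orders in (4, 5):
        ratio = 10.0 ** orders
        print(f"{orders} orders of magnitude -> {required_cmrr_db(ratio):.0f} dB")
```

Under this assumption, a common-mode signal 4-5 orders of magnitude above the biopotential signal corresponds to a CMRR of 80-100 dB just to reach equal amplitudes; a practical design would aim higher.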

Another main problem for biopotential acquisition systems is the electrode offset. In addition to high common-mode signals, biopotential signals have large DC offsets due to the mismatches between the electrodes. A commonly used electrode type for biopotential measurements is the Ag/AgCl electrode. Fig. 1 shows the measured offset voltages of several Ag/AgCl electrodes relative to ground. Unless the electrode-offset voltage is filtered, it will saturate the IA. Therefore, a high-performance IA should filter the DC offset voltage as well as reject the common-mode signals coupled to the human body.

An integrated IA was first presented in [1]. This IA does not need any matched resistors or trimmed components to achieve a high CMRR. The operation of the circuit is based on balancing the currents of the input pair transistors, and the gain is defined by the ratio of two resistors. This technique, the use of current balancing for achieving a high CMRR, was later applied to many monolithic instrumentation amplifiers ([2]-[7]); together with a high-pass filter ([5], [7]), these CBIAs can be used to filter the DC electrode offset.

In this paper, we present the effect of the DC electrode offset on the CMRR of CBIAs. The mechanisms responsible for the reduction of the CMRR under DC electrode offset are identified, and equations describing the CMRR of the CBIA architecture of [5] are derived and compared with simulated and measured results. Finally, a technique for improving the CMRR of CBIAs under electrode offset is proposed.

Figure 1. Measured DC electrode offset voltages<br />

from different Ag/AgCl electrodes relative to<br />

ground.<br />

2. CMRR MODEL OF THE CBIA<br />

ARCHITECTURE<br />

Fig. 2 shows the simplified schematic of one of the recent CBIAs [5], the CMRR of which was improved in [6]. The circuit consists of two feedback loops. The first feedback loop converts the differential input voltage into a current. This current is copied to the second feedback loop, where it is converted back into a voltage. Therefore, the gain of the circuit is defined by the ratio R2/R1. It is important to note that any current passing through resistor R1 is supplied by the current sources M3 and M4. Hence, any DC input will only affect the quiescent operating points of the current mirrors M1-M6. Therefore, in our analysis we have assumed that the differential input electrode offset only affects the transconductances of the transistors of the current mirrors, M1-M6. Moreover, we have also assumed that the transconductance of a MOS transistor is much larger than its output transconductance.

The CMRR analysis of the circuit is divided into two parts. The first is the systematic CMRR of the circuit, which defines the limit of the CMRR due to the topology, and the second is the reduction of the CMRR due to the mismatches introduced by the DC electrode offset.


Figure 2. Simplified schematic of the IA proposed<br />

by [5].<br />

2.1 Systematic CMRR<br />

The systematic CMRR is defined by the topology of the circuit and is finite unless the circuit is fully differential. Fig. 3 shows the low-frequency small-signal model of the first feedback loop of the IA of Fig. 2. Assuming matched transistor parameters and large g_mi, the common-mode gain of the first feedback loop can be written as:

$$A_{CM} = \frac{I_1}{v_{cm}} = -\frac{I_2}{v_{cm}} = \frac{g_{dsl}\, g_o\, (2 g_1 + g_m)}{2\, g_m\, g_{ml}} \qquad (1)$$

where g_ml and g_dsl are the transconductance and output transconductance of the NMOS current mirrors of Fig. 2, g_o is the total output transconductance of current sources M3 and M4, g_m is the transconductance of the input pair transistors, and g_1 is the conductance of the first gain resistance, R1. Fig. 4 shows the small-signal model of the second feedback loop. Similar to the previous analysis, the differential gain of the circuit can be written as:

$$\frac{v_{out}}{I_2 - I_1} = \frac{1}{2 g_2} \qquad (2)$$

assuming a large open-loop opamp gain, A_v.

Figure 3. Low-frequency small signal model of<br />

the first feedback loop of IA of Fig. 2.<br />

The systematic CMRR of the circuit can be found by combining Equations (1), (2), and the differential gain of the IA, which is g_1/g_2.

Figure 4. Low-frequency small signal model of<br />

the second feedback loop of IA of Fig. 2.<br />

$$CMRR_{sys} = \frac{A_{DM}}{A_{CM}} = \frac{2\, g_1\, g_m\, g_{ml}}{g_{dsl}\, g_o\, (2 g_1 + g_m)} \qquad (3)$$

Equation (3) states that, in order to maximize the CMRR of the IA, the conductance of the first gain resistance, g_1, and the transconductances of the input pair transistors should be maximized. In addition, the output resistances of the current sources M3-M6 should be as high as possible, which was already demonstrated in [6]. It is important to note that the systematic CMRR defines the upper limit of the CMRR under ideal conditions and will be reduced by process-related mismatches.
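As a numerical illustration of Equation (3), the sketch below evaluates the systematic CMRR, 2 g_1 g_m g_ml / (g_dsl g_o (2 g_1 + g_m)), for a set of small-signal parameters. Only g_1 = 100 µS follows from the 10 kΩ first gain resistance reported for the implemented circuit; every other value, as well as the function name `cmrr_sys`, is a hypothetical placeholder chosen for round numbers.

```python
import math


def cmrr_sys(g1: float, gm: float, gml: float, gdsl: float, go: float) -> float:
    """Systematic CMRR as a plain ratio (not in dB).

    g1   : conductance of the first gain resistance (S)
    gm   : transconductance of the input pair transistors (S)
    gml  : transconductance of the NMOS current mirrors (S)
    gdsl : output transconductance of the NMOS current mirrors (S)
    go   : total output transconductance of current sources M3 and M4 (S)
    """
    return 2.0 * g1 * gm * gml / (gdsl * go * (2.0 * g1 + gm))


if __name__ == "__main__":
    # g1 from R1 = 10 kOhm; the remaining values are assumed for illustration.
    value = cmrr_sys(g1=100e-6, gm=200e-6, gml=100e-6, gdsl=1e-6, go=1e-6)
    print(f"CMRR_sys = {value:.3g} ({20 * math.log10(value):.1f} dB)")
```

Halving g_o or g_dsl in this sketch doubles the result, which mirrors the remark that the output resistances of the current sources should be as high as possible.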

2.2 CMRR Limit Due to Electrode Offset<br />

Even if the transistors are perfectly matched, the differential DC electrode offset voltage introduces operating point mismatches into the IA presented in the previous section. Assuming that the common-mode to common-mode gain of the first stage is negligibly small, the mismatches due to the electrode offset can be divided into two groups. The first is the mismatch of the drain-to-source voltages of the input pair transistors, and the second is the mismatch of the output transconductances of the current sources M3 and M4.

Because CBIAs force the input pair transistors to operate at the same quiescent current, the gate-to-source voltages of the input pair transistors are equalized. Therefore, any DC offset at the gates of the input transistors will effectively create a mismatch between the source voltages of these transistors. As a consequence, a mismatch of the drain-to-source voltages is created, leading to a mismatch of the output transconductances of the input transistors. Fig. 3 shows the mismatch parameter introduced for the output transconductances of the input pair transistors, ±Δg_ds/2. The CMRR of the circuit due to this output transconductance mismatch can be written as:

$$CMRR_R(\Delta V_{in}) = \frac{g_1 / g_2}{\Delta g_{ds}(\Delta V_{in}) \cdot g_1 / (g_m \cdot g_2)} = \frac{g_m}{\Delta g_{ds}(\Delta V_{in})} \qquad (4)$$

Assuming that the input transistors always operate in the saturation region, the output transconductance of a MOS transistor can be written as:

$$g_{ds} = \frac{I_{ds}}{V_E + V_{ds}} \qquad (5)$$

where I_ds is the drain-to-source current of the MOS transistor, V_E is the Early voltage, and V_ds is the drain-to-source voltage. Equation (5) can be used to derive the change of g_ds, Δg_ds, with respect to ΔV_ds, which can be written as:

$$\Delta g_{ds} = \frac{I_{ds}\, \Delta V_{ds}}{V_E^2} \qquad (6)$$

assuming that V_E is much larger than V_ds. ΔV_ds of the input transistors has two components. The first is due to the source voltage mismatch, which equals the input DC voltage mismatch, and the second is due to the finite g_mi of the transconductance amplifier. Therefore, ΔV_ds can be written as:

$$\Delta V_{ds} = \Delta V_{in} \left( 1 + \frac{g_1}{g_{mi}} \right) \qquad (7)$$

Combining Equations (4), (6), and (7) gives the CMRR equation of the IA as:

$$CMRR_R(\Delta V_{in}) = \frac{g_m\, V_E^2}{I_{ds} \left( 1 + g_1 / g_{mi} \right) \Delta V_{in}} \qquad (8)$$

As a result, in order to maximize CMRR_R, g_1 should be much smaller than g_mi, and the input pair should be designed with long and wide transistors to maximize both g_m/I_ds and V_E^2. Under low power dissipation, g_mi will be small; therefore, g_1 should be decreased to compensate for the small value of g_mi.
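The dependence of CMRR_R on the offset can be made concrete with a short calculation of Equation (8), CMRR_R = g_m V_E^2 / (I_ds (1 + g_1/g_mi) ΔV_in). This is a sketch under stated assumptions: the function name `cmrr_r` and the values used for g_m, V_E and g_mi are hypothetical, while I_ds = 12 µA and g_1 = 100 µS correspond to the bias current and the 10 kΩ gain resistance reported for the implemented circuit.

```python
import math


def cmrr_r(gm: float, ve: float, ids: float, g1: float, gmi: float,
           dvin: float) -> float:
    """CMRR limit (plain ratio) caused by the drain-to-source voltage
    mismatch of the input pair for a differential DC offset dvin (V)."""
    if dvin <= 0:
        raise ValueError("dvin must be positive; the limit diverges at zero offset")
    return gm * ve ** 2 / (ids * (1.0 + g1 / gmi) * dvin)


if __name__ == "__main__":
    # gm, ve and gmi are assumed values, not taken from the paper.
    for dvin_mv in (5, 10, 25, 50):
        ratio = cmrr_r(gm=150e-6, ve=20.0, ids=12e-6, g1=100e-6,
                       gmi=1e-3, dvin=dvin_mv * 1e-3)
        print(f"offset {dvin_mv:2d} mV -> CMRR_R = {20 * math.log10(ratio):.1f} dB")
```

Because the offset enters the denominator linearly, every doubling of ΔV_in costs about 6 dB of CMRR_R in this model.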

The second mismatch due to the input DC offset voltage is the mismatch of the output transconductances of the current sources M3 and M4. Because any DC current passing through the first gain resistance is supplied by these current sources, the matching of their quiescent operating points is disturbed by the input DC offset voltage. If the circuit is designed for low power dissipation, any electrode offset will disturb the quiescent currents of these current sources by an amount in the µA range, resulting in a large output transconductance mismatch that depends on the topology of the current mirrors. Fig. 3 shows the added parameter, ±Δg_o/2, representing the mismatch of the output transconductances of the current sources M3 and M4. The CMRR equation of the circuit considering only the mismatches of the current sources can be written as:

$$CMRR_M(\Delta V_{in}) = \frac{A_{DM}}{A_{CM}} = \frac{g_1 / g_2}{\Delta g_o(\Delta V_{in}) / (2 g_2)} = \frac{2 g_1}{\Delta g_o(\Delta V_{in})} \qquad (9)$$

where Δg_o(ΔV_in) depends on the current mirror topology. For a cascode current mirror, similar to [6], and assuming equal-sized mirror transistors, Δg_o(ΔV_in) can be written as:

$$\Delta g_o = \frac{1}{V_E^2 \sqrt{2 K_n}} \left[ (I_{ds} + \Delta V_{in}\, g_1)^{1.5} - (I_{ds} - \Delta V_{in}\, g_1)^{1.5} \right] \qquad (10)$$

where K_n is the MOS transistor transconductance parameter. Therefore, the CMRR_M equation describing the CMRR of the IA based on the mismatches of the current sources can be written as:

$$\left. CMRR_M(\Delta V_{in}) \right|_{\Delta V_{in} g_1 < I_{ds}} = \frac{2\, g_1\, V_E^2 \sqrt{2 K_n}}{(I_{ds} + \Delta V_{in}\, g_1)^{1.5} - (I_{ds} - \Delta V_{in}\, g_1)^{1.5}} \qquad (11)$$

CMRR_M increases with increasing g_1. However, if g_1 is selected to be large, then the input stage of the IA can only accept small values of electrode offset under low power dissipation. Therefore, an ideal solution is to maximize the first gain resistance according to Equation (8) and to maximize CMRR_M by using active current mirrors, in which the total output transconductance is divided by the feedback. The use of active current mirrors thus multiplies CMRR_M by the open-loop gain of the feedback path.
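The behaviour of CMRR_M for a passive cascode mirror can be explored numerically. The sketch below assumes that the cascode output conductance scales as I^1.5 / (V_E^2 sqrt(2 K_n)), so that the mismatch is Δg_o = [(I_ds + ΔV_in g_1)^1.5 - (I_ds - ΔV_in g_1)^1.5] / (V_E^2 sqrt(2 K_n)) and CMRR_M = 2 g_1 / Δg_o; the square-root form, the function names, and the values of V_E and K_n are assumptions made for illustration. The optional `loop_gain` argument models an active mirror by multiplying the result, as stated in the text.

```python
import math


def delta_go(ids: float, dvin: float, g1: float, ve: float, kn: float) -> float:
    """Output-conductance mismatch of current sources M3/M4 (assumed
    cascode model) for a differential DC offset dvin (V)."""
    i_off = dvin * g1  # DC current forced through the first gain resistance
    if dvin <= 0:
        raise ValueError("dvin must be positive")
    if i_off >= ids:
        raise ValueError("model only valid for dvin * g1 < ids")
    return ((ids + i_off) ** 1.5 - (ids - i_off) ** 1.5) / (ve ** 2 * math.sqrt(2.0 * kn))


def cmrr_m(ids: float, dvin: float, g1: float, ve: float, kn: float,
           loop_gain: float = 1.0) -> float:
    """CMRR limit (plain ratio) due to current-source mismatch; loop_gain > 1
    represents an active current mirror."""
    return loop_gain * 2.0 * g1 / delta_go(ids, dvin, g1, ve, kn)


if __name__ == "__main__":
    # ids = 12 uA and g1 = 100 uS follow the implemented circuit;
    # ve and kn are hypothetical.
    for dvin_mv in (10, 25, 50):
        ratio = cmrr_m(ids=12e-6, dvin=dvin_mv * 1e-3, g1=100e-6, ve=20.0, kn=100e-6)
        print(f"offset {dvin_mv} mV -> CMRR_M = {20 * math.log10(ratio):.1f} dB")
```

The validity check on `dvin * g1 < ids` reflects the trade-off described above: a large g_1 shrinks the range of electrode offsets the input stage can accept at a given bias current.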

As a result, the total CMRR of the circuit can be written as the superposition of three mechanisms: the systematic CMRR, the CMRR due to the output transconductance mismatch of the input transistors, and the CMRR due to the mismatch of the output transconductances of the current sources.

$$\frac{1}{CMRR_T} = \frac{1}{CMRR_{SYS}} + \frac{1}{CMRR_R} + \frac{1}{CMRR_M} \qquad (12)$$

Although the equations are derived for a single CBIA topology, they can be applied to all the CBIA topologies [2]-[7], given that the mechanisms affecting the CMRR under electrode offset are the mismatches of the output transconductances of the input pair and of the current sources connected to the first gain resistor.
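Equation (12) combines the individual limits like resistances in parallel: the reciprocals add, so the smallest component dominates the total. A minimal sketch with made-up component values (the function names and the numbers are illustrative assumptions):

```python
import math


def total_cmrr(*components: float) -> float:
    """Combine CMRR contributions given as plain ratios (not dB)
    by summing their reciprocals."""
    if not components or any(c <= 0 for c in components):
        raise ValueError("need at least one positive CMRR value")
    return 1.0 / sum(1.0 / c for c in components)


def to_db(ratio: float) -> float:
    """Convert an amplitude ratio to decibels."""
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    # Hypothetical values: 110 dB systematic, 90 dB from the input pair,
    # 120 dB from the current sources.
    cmrr_sys_val, cmrr_r_val, cmrr_m_val = (10 ** (db / 20) for db in (110, 90, 120))
    total = total_cmrr(cmrr_sys_val, cmrr_r_val, cmrr_m_val)
    print(f"total CMRR = {to_db(total):.1f} dB")
```

With these numbers the total lands slightly below 90 dB, that is, just under the weakest of the three mechanisms.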

2.3 Verification of the Model<br />

In order to verify the presented model, a CBIA similar to the circuit shown in Fig. 2 was implemented in a 0.5-µm CMOS technology. The circuit uses active current mirrors to implement M1-M6, which increases CMRR_M. The circuit is optimized for low power dissipation and for achieving a high CMRR under high electrode offset voltages (up to 50 mV). The circuit draws 110 µA from a 3 V supply, and the bias current of each input transistor is 12 µA. The gain of the IA is 10 V/V with a first gain resistance of 10 kΩ.
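A few derived quantities follow directly from these figures. The sketch below computes them; it assumes that the gain equals R2/R1 as stated for this architecture, and that the 12 µA input-transistor bias is the current against which the offset-induced current through R1 should be compared. The function name and dictionary keys are hypothetical.

```python
def design_summary(gain: float = 10.0, r1: float = 10e3,
                   i_supply: float = 110e-6, v_supply: float = 3.0,
                   i_bias: float = 12e-6, v_offset_max: float = 50e-3) -> dict:
    """Derive secondary quantities from the reported design values."""
    g1 = 1.0 / r1                     # conductance of the first gain resistance
    i_offset = v_offset_max * g1      # DC current forced through R1 by the offset
    return {
        "r2_ohm": gain * r1,          # from gain = R2 / R1
        "g1_siemens": g1,
        "power_w": i_supply * v_supply,
        "offset_current_a": i_offset,
        "offset_current_below_bias": i_offset < i_bias,
    }


if __name__ == "__main__":
    for name, value in design_summary().items():
        print(f"{name}: {value}")
```

Under these assumptions R2 is 100 kΩ, the supply power is about 330 µW, and a 50 mV offset forces about 5 µA through R1, which stays below the 12 µA bias current.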

Fig. 5 shows the comparison of the simulated, calculated, and measured CMRR of the IA at 50 Hz as a function of the input offset voltage. The components of the total CMRR equation, Equation (12), are also shown in Fig. 5 to demonstrate their effect on the total CMRR. The simulated, calculated, and measured CMRR agree well in the large-offset region. The use of active current mirrors increases CMRR_M, leaving CMRR_R as the dominant mechanism defining the total CMRR of the IA. Under low input electrode offset voltage, the total CMRR of the IA is defined by the process-related mismatches of the circuit. However, as the input DC offset increases, which is the case if the IA is used for measuring biopotential voltages, the total CMRR is defined by the mismatches introduced by the input DC offset.

Figure 5. Comparison of the simulated, calculated, and measured CMRR of the fabricated IA. CMRR_R is shifted by 5 dB to increase visibility.


3. CMRR IMPROVEMENT<br />

The CMRR of the IA can be further improved by actively adjusting the drain-to-source voltage of the input pair transistors, which ideally sets CMRR_R to infinity at the expense of a reduced input common-mode range. Fig. 6 shows the improved IA circuit. Transistors M_p3 and M_p4 act as source followers and actively adjust the drain voltages of transistors M_p1 and M_p2. Therefore, even if there is an electrode offset at the input, the drain-to-source voltages of the input pair transistors will be equalized. A DC level shifter [8] ensures that M_p1 and M_p2 stay in saturation. Since the drain-to-source voltages are equalized, the CMRR_R component of Equation (12) vanishes.

Figure 6. Schematic of the modified circuit for improving the CMRR of the IA under electrode offset.

Fig. 7 shows the simulation result of the improved IA. Note that the simulated CMRR is very close to the calculated CMRR_M. Since all the causes of the lowered CMRR under electrode offset are eliminated, the CMRR will again be defined by the process-related mismatches of the circuit, as it is in the low-offset region.


Figure 7. Comparison of calculated and simulated CMRR of the improved IA.

4. CONCLUSIONS<br />

The CMRR of CBIAs is strongly reduced by the electrode-offset voltage. In this paper, we presented the mechanisms behind this CMRR reduction. The proposed model is compared with the simulated and measured CMRR of the implemented IA. The CMRR of the implemented IA under electrode-offset voltage is improved using active current mirrors. Finally, a circuit technique is proposed for improving the CMRR of the IA under large DC electrode offset voltages. Although applied to a single IA topology, the presented equations and circuit improvement technique can be used for all CBIA structures.

5. REFERENCES<br />

[1] H. Krabbe, “A High-Performance Monolithic Instrumentation Amplifier,” IEEE ISSCC, vol. 14, pp. 186-187, Feb. 1971.

[2] F. L. Eatock, “A Monolithic Instrumentation Amplifier with Low Input Current,” IEEE ISSCC, vol. 16, pp. 148-149, Feb. 1973.

[3] A. P. Brokaw and M. P. Timko, “An Improved Monolithic Instrumentation Amplifier,” IEEE JSSC, vol. 10, iss. 6, pp. 417-423, Dec. 1975.

[4] R. J. Van de Plassche, “A Wide-Band Monolithic Instrumentation Amplifier,” IEEE JSSC, vol. 10, iss. 6, pp. 424-431, Dec. 1975.

[5] R. Martins, S. Selberherr, and F. A. Vaz, “A CMOS IC for Portable EEG Acquisition Systems,” IEEE Trans. on Inst. and Meas., vol. 47, iss. 5, pp. 1191-1196, Oct. 1998.

[6] P. A. dal Fabbro and C. A. dos Reis Filho, “An Integrated Instrumentation Amplifier with Improved CMRR,” 15th Symp. on Int. Cir. and Sys. Design, pp. 57-61, Sept. 2002.

[7] M. S. J. Steyaert, W. M. C. Sansen, and C. Zhongyuan, “A Micropower Low-Noise Monolithic Instrumentation Amplifier for Medical Purposes,” IEEE JSSC, vol. 22, iss. 6, pp. 1163-1168, Dec. 1987.

[8] J. Ramirez-Angulo, “Low Voltage Current Mirrors for Built-in Current Sensors,” ISCAS, vol. 5, pp. 529-532, 1994.



A LOW VOLTAGE HIGHLY LINEAR INTEGRATED ACTIVE RC FILTER

Athanasios Vasilopoulos, Georgios Vitzilaios, Gerasimos Theodoratos, […]

[…] It was preferred over other solutions, such as the leapfrog, because of its straightforward realization and its easily modifiable transfer function. The selection amongst the four different filter configurations is made possible by digital signals that manipulate the values of the filter's resistors and capacitors. For clarification purposes, the structure of one resistor and capacitor of the filter is outlined in Fig. […]. The signal BAND controls the bandwidth of the filter, whereas signal CH and its complementary ELL determine the transfer function by regulating resistor and capacitor values. The resistors are assembled by series and parallel combinations of a “unit resistor” that takes different values for the Chebyshev and elliptic case (R_[…] and R_[…] respectively). The unit resistor R_[…] is depicted in Fig. […]. It includes a constant (R_[…]) and a […]-bit binary weighted part (R_[…]), the latter being responsible for adjusting the cut-off frequency.

Figure […]. Structure of (a) a resistor and (b) a capacitor of the filter.

Figure […]. Unit resistor.

Assuming ideal switches (R_on […]), then in the general case (N-bit control word) R_[…] can be expressed as follows:

[Equations garbled in the source and not recoverable: the expression of the unit resistor as a function of the digital control word.]

where


$ ���«$ �$ � 'LJLWDO ZRUG XVHG IRU WXQLQJ<br />

Q 7KH GHFLPDO YDOXH RI WKH GLJLWDO FRQWURO ZRUG<br />

“D� “D� 9DULDWLRQ RI UHVLVWRU DQG FDSDFLWRU YDOXHV<br />

UHVSHFWLYHO\ RZLQJ WR SURFHVV DQG WHPSHUDWXUH GULIWV<br />

5��� &RQVWDQW SDUW RI XQLW UHVLVWRU 5� 5��� 9DULDEOH ELQDU\ ZHLJKWHG SDUW RI XQLW UHVLVWRU 5� 5������� Q 5������� � ± 5 Q � ±<br />

5��� 1RPLQDO GHVLJQ UHVLVWRU YDOXH<br />

5 7KH VWHS RI FKDQJH IRU 5��� 5 %DVLF EXLOGLQJ EORFN RI 5��� 7KH HUURU LQ WKH YDOXH RI WKH 5& SURGXFWHPDQDWLQJIURP WKH LQKHUHQW TXDQWL]DWLRQ RI WKH DERYH DSSURDFK LV<br />

0-7803-9345-7/05/$20.00©2005 IEEE.<br />

[Equation: quantization error e. The expression is not recoverable from the extracted text; surviving symbols: e, ±, R���, ΔR, ×, n.]<br />

and becomes maximum for n = [?]. Considering the number<br />

of bits (N = [?]) and the characteristics of the CMOS<br />

process used, we deduce that ��� = ±[?]. The average<br />

quantization error is �� = ±[?]. It should be pointed out<br />

that, in practice, the switches' R�� is added to R� and<br />

affects the quantization error considerably as N grows.<br />
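
The relation between the digital control word and the unit resistor described above can be illustrated numerically. The following is only a sketch under stated assumptions, not the authors' design: it assumes ideal switches, assumes the variable part is simply n steps of ΔR added to the constant part, and uses hypothetical names (`r_const`, `delta_r`) and hypothetical values.

```python
def control_word_value(bits):
    """Return the decimal value n of a control word given MSB-first bits."""
    n = 0
    for bit in bits:
        n = (n << 1) | (bit & 1)
    return n


def unit_resistance(bits, r_const, delta_r):
    """Toy model (assumption): constant part plus n steps of delta_r."""
    return r_const + control_word_value(bits) * delta_r


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    word = [1, 0, 1, 0, 0]
    print(control_word_value(word))
    print(unit_resistance(word, r_const=10_000.0, delta_r=250.0))
```

In this model the resistance grows linearly with n, which is what makes a single additive correction of the control word possible.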

All the switches that define the transfer function are<br />

implemented by NMOS transistors. The voltage driving<br />

the gates of these MOSFETs is [?] V instead of [?] V (supply of the filter's amplifiers). This is attributed to the<br />

following reasons. Firstly, if this voltage were in the order<br />

of [?] V, then a large input signal would lead to improper<br />

behaviour, as V�� of the switches may be unsuitable for<br />

switching operation. The second reason is related to the<br />

switches' parasitics. Defining as R����� the sum of the switches' R�� when all of them are in the ON state and<br />

C����� the corresponding total parasitic capacitance, then<br />

the time constant ����� = R�����C����� associated with the<br />

switches must be negligible compared to the bandwidth of<br />

the filter. A voltage lower than [?] V raises the switches'<br />

R��, and consequently R�����, to an extent that drastically<br />

deteriorates the filter's gain response.<br />

Digital automatic tuning scheme<br />

The digital automatic tuning scheme is presented in Fig. [?].<br />

It encompasses a reference oscillator, a limiter, an external<br />

clock, a downcounter and a register. The reference<br />

oscillator has a period proportional to the RC product of<br />

its resistors (which are built from R �) and its capacitors.<br />

Under nominal conditions its frequency is [?] kHz. The<br />

external clock frequency is always kept equal to [?] MHz.<br />

The two frequencies must have a ratio of [?]. The<br />

algorithm of the tuning scheme is an upgraded version of<br />

the one mentioned in [?] and [?]. The most important<br />

merit of this approach is that it corrects the oscillator<br />

frequency and the filter characteristics in one iteration of<br />

the algorithm, instead of gradually approaching the right<br />

oscillating period as in previously reported works.<br />


Figure [?]. Block diagram of the tuning scheme.<br />

Figure [?]. Oscillator having a period proportional<br />

to the RC product.<br />

The concept upon which the algorithm of the tuning<br />

scheme relies is that the period of an oscillator such as the one<br />

represented in Fig. [?] is proportional to the RC product.<br />

Therefore, any change in the value of this product would<br />

lead to fluctuation of the oscillating frequency. Needless<br />

to say, the product value affects the time constant of the<br />

filter in a similar manner, if the passive components of the<br />

oscillator are of the same material as those comprising<br />

the filter. Accordingly, a measurement and modification of<br />

the value of the RC product that maintains the frequency<br />

of the oscillator equal to [?] kHz would also correct the<br />

filter characteristics. The system in Fig. [?] performs this<br />

adjustment by means of an algorithm elucidated below.<br />

It can easily be proven that the circuit in Fig. [?] is<br />

governed by a relation of the form:<br />

[Equation not recoverable from the extracted text; surviving symbols: V���, s, C, G.]<br />

where G = 1/R. Therefore, it is obvious that oscillations<br />

are sustained, having a period [?]RC. The resistors of<br />

the oscillator are constructed by the unit resistor R �, thus<br />

allowing the control of the period with a [?]-bit digital<br />

word. The amplifiers are identical to the ones deployed in<br />

the filter. The selection of the frequency is a compromise<br />

between the desirable accuracy and the value of the<br />

elements (especially the capacitors) of the oscillator.<br />

Greater periods are sharply achieved but impose larger<br />

values for the components. Hence, the oscillator period<br />

was chosen to be [?]s.<br />

As outlined in Fig. [?], the output of the oscillator and an<br />

external clock are connected to the EN and CLK inputs of<br />

the downcounter, respectively. Prior to each iteration, the<br />

downcounter is initialized to the value [?] and, under<br />

nominal conditions, in each oscillator period decrements<br />

[?] times. Therefore, if the oscillator period is greater than<br />

the nominal, the downcounter will reach negative values,<br />

but if it is lower, the downcounter will stop before it gets<br />

to zero. The outcome of the addition of the downcounter's<br />

final value and the digital word that controls the resistors<br />

of the filter produces the new digital word that will be fed<br />

into the resistors. The altering of the resistor value results<br />

in an oscillating period that is around [?]s. Consequently,<br />

until a significant change in circuit conditions arises, the<br />

digital control word remains the same.<br />
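
The one-iteration correction described above can be sketched with a toy model. This is a hedged illustration rather than the authors' implementation: the counter start value `init`, the word width `n_bits`, the constant `r_const_steps` and the drift figures are all hypothetical, and the model assumes that one resistor step changes the count by one clock tick under nominal conditions. The MSB handling and rounding mentioned in the text are not modelled.

```python
def counts_per_period(word, drift, r_const_steps=48):
    """Clock ticks counted during one oscillator period (toy model).

    Assumption: the period is proportional to (r_const_steps + word) and is
    scaled by a common process/temperature drift factor (1 + drift).
    """
    return int((r_const_steps + word) * (1.0 + drift))


def retune(word, drift, init=64, n_bits=5):
    """One iteration: load init, count down for one period, add the remainder."""
    final_value = init - counts_per_period(word, drift)  # negative if period too long
    new_word = word + final_value
    return max(0, min((1 << n_bits) - 1, new_word))      # clamp to the word width


if __name__ == "__main__":
    old = 16
    new = retune(old, drift=0.10)
    print(old, "->", new, "| counts after retune:", counts_per_period(new, 0.10))
```

With no drift the downcounter ends at zero and the word is unchanged; with a longer period the final value is negative, so the word (and hence the resistance) is reduced in a single step.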

Actually, in the addition the MSB of the downcounter is<br />

neglected (its only aim is to generate the value [?]) and the<br />

remaining bits are rounded so as to make the system immune to<br />

slight deviations in the frequency of the oscillator. One<br />

important observation is that care should be exercised in<br />

the design to ensure that the duty cycle of the oscillator is<br />

[?] in order for the tuning scheme to operate properly.<br />

An alternative approach is to connect the oscillator's<br />

output to the CLK input of the downcounter and use an<br />

external clock, [?] times slower than the oscillator's, as the<br />

EN input. In this case the oscillator's duty cycle is of no<br />

importance, whereas the external clock duty cycle might<br />

be easier to manage. However, the price to pay is that both<br />

frequencies fall inside the passband of the filter.<br />

Experimental results<br />

The filter was fabricated in a [?]m CMOS process and<br />

occupies [?] mm� (tuning circuit included). A chip<br />

microphotograph is illustrated in Fig. [?]. The main filter<br />

and the oscillator operate on a [?] V supply and they draw<br />

[?] mA and [?] mA, respectively.<br />

Figure [?]. Chip microphotograph.<br />

In Fig. [?] the gain response of the filter is depicted in the<br />

four modes. The frequency tuning range for the filter is<br />

[?] MHz – [?] MHz in the [?] MHz configuration, whereas<br />

in the [?] MHz mode it is [?] MHz – [?] MHz. Linearity<br />

results are given in Table [?]. The elliptic filter is a little<br />

worse in terms of linearity because it has not been<br />

optimized for dynamic range performance. The filter's<br />

output noise (integrated in the passband) and SFDR are<br />

shown in Table [?]. PSRR is presented in Table [?].<br />


Figure [?]. Filter gain response in the four states.<br />

Table [?]. Linearity results.<br />

Table [?]. Output noise (passband) and SFDR.<br />

Table [?]. PSRR of the filter.<br />

Fig. [?] displays the group delay in the four different filter<br />

configurations. In the elliptic implementation, which is<br />

optimized for group delay performance, the deviation is<br />

less than ±[?] ns throughout the passband. The CMRR of<br />

the filter is depicted in Fig. [?]. The minimum stopband<br />

attenuation for the Chebyshev filter at two times the<br />

corner frequency and above was measured to be [?] dB<br />

and [?] dB for the [?] MHz and [?] MHz configurations,<br />

respectively. In the elliptic case the minimum stopband<br />

attenuation is [?] dB in both modes. The AC response of<br />

the filter (Elliptic [?] MHz) at four different temperatures is<br />

illustrated in Fig. [?]. Because the temperature coefficient of<br />

the resistors is relatively small ([?] ppm/°C), the corner<br />

frequency of the filter deviates only ±[?] for a ±[?] °C<br />

temperature variation. Therefore, as a result of the<br />

automatic tuning scheme's ±[?] average precision, the<br />

digital tuning word is adjusted only at temperatures<br />

exceeding [?] °C.<br />

Figure [?]. Group delay of the filter.<br />

Figure [?]. CMRR of the filter.<br />

Figure [?]. AC response of the filter (Elliptic [?] MHz) at<br />

various temperatures. The number in brackets corresponds<br />

to the value of the tuning word used.<br />


CONCLUSION<br />

An integrated active-RC filter with modifiable transfer<br />

function and bandwidth was presented. The filter operates<br />

on a [?] V supply and employs a new digital automatic<br />

tuning scheme to compensate process and temperature<br />

variations. Experimental results verify the excellent<br />

linearity expected from the design and reveal a broad<br />

dynamic range.<br />

REFERENCES<br />

[?] W. Hioe, T. Oshima, Y. Shibahara, T. Doi, K. Ozaki,<br />

and S. Arayashiki, “[?]m CMOS Bluetooth<br />

analog receiver with –[?] dBm sensitivity,” IEEE J.<br />

Solid-State Circuits, vol. [?], pp. [?]–[?], Feb. [?].<br />

[?] A. Pärssinen, J. Jussila, J. Ryynänen, L. Sumanen,<br />

and K. A. I. Halonen, “A [?] GHz wide-band direct<br />

conversion receiver for WCDMA applications,”<br />

IEEE J. Solid-State Circuits, vol. [?], pp. [?]–[?],<br />

Dec. [?].<br />

[?] S. S. Lee and C. A. Laber, “A BiCMOS continuous-<br />

time filter for video signal processing applications,”<br />

IEEE J. Solid-State Circuits, vol. [?], pp. [?]–[?],<br />

Sep. [?].<br />

[?] Y. P. Tsividis, “Integrated continuous-time filter<br />

design – An overview,” IEEE J. Solid-State Circuits,<br />

vol. [?], pp. [?]–[?], Mar. [?].<br />

[?] A. M. Durham, J. B. Hughes, and W. Redman-White,<br />

“Circuit architectures for high linearity monolithic<br />

continuous-time filtering,” IEEE Trans. Circuits Syst.<br />

II, vol. [?], pp. [?]–[?], Sep. [?].<br />

[?] J. O. Voorman, A. van Bezooijen, and N. Ramalho,<br />

“On balanced integrator filters,” in Integrated<br />

Continuous-Time Filters: Principles, Design and<br />

Applications, Y. P. Tsividis and J. O. Voorman, Ed.<br />

New<br />

[?] H. Huang and E. K. F. Lee, “Design of low-voltage<br />

CMOS continuous-time filter with on-chip automatic<br />

tuning,” IEEE J. Solid-State Circuits, vol. [?], pp.<br />

[?]–[?], Aug. [?].<br />

[?] A. A. Emira and E. Sánchez-Sinencio, “A pseudo<br />

differential complex filter for Bluetooth with<br />

frequency tuning,” IEEE Trans. Circuits Syst. II, vol.<br />

[?], pp. [?]–[?], Oct. [?].<br />



A UP TO 1GHZ LOW-POWER CONTINUOUS-TIME “3RD-ORDER<br />

FILTER+INTEGRATOR” CHAIN<br />

FOR WIRELESS BODY-AREA NETWORK RECEIVERS<br />

S. D’Amico 1 (Ph.D. Student), T. Grassi 1, Ryckaert Julien 2, A. Baschirotto 1<br />

1 Department of Innovation Engineering, University of Lecce, Italy<br />

E-mail: stefano.damico@unile.it, teobaldograssi@libero.it, andrea.baschirotto@unile.it<br />

2 IMEC-Belgium<br />

E-mail: ryckj@imec.be<br />

ABSTRACT<br />

A CMOS filter+integrator baseband chain<br />

embedded in a WBAN receiver is designed in a<br />

0.13µm CMOS technology at 1.2V. The 3rd-order<br />

LP filter cut-off frequency and dc-gain can be tuned<br />

from 250MHz up to 1GHz and from 0dB up to<br />

15dB, respectively. The filter power consumption is<br />

limited to 3.2mW in the most stringent conditions,<br />

i.e. 1GHz cut-off frequency, 15dB dc-gain. For<br />

fo=250MHz and unitary dc-gain, the power<br />

consumption is 210µW. These results are obtained<br />

by using a very compact circuit design that avoids<br />

parasitic poles and the Common-Mode FeedBack<br />

circuit. A similar structure gives a 1GHz-UGB<br />

integrator with 120µW.<br />

1. INTRODUCTION<br />

The nodes of a Wireless Body Area Network<br />

(WBAN) are usually placed close to the body on or<br />

in everyday clothing [1]. A WBAN topology<br />

comprises many transmit-only sensor nodes, that<br />

have to be very simple, low cost and extremely<br />

energy efficient, some transceiver nodes, that afford<br />

a somewhat higher complexity to sense and act, and<br />

few high capability nodes, e.g. master nodes with<br />

high computational capabilities and support for<br />

higher data rates. Compared to other wireless<br />

networks a WBAN has some distinct features and<br />

requirements. Due to the close proximity of the<br />

network to the body, electromagnetic pollution<br />

should be extremely low. Thus, a non-invasive<br />

WBAN requires a low transmit power. In addition<br />

the transmission data-rate is very low. A possible<br />

technology for non-invasive WBAN communication<br />

is Low-Data-Rate Ultra-Wideband (LDR-UWB). A<br />

possible UWB receiver is shown in Fig. 1. The<br />

baseband chain is composed of a 3rd-order LP filter<br />

and an integrator, in order to relax the A/D<br />

converter requirements [2]-[7]. Due to the LDR<br />

feature, the overall chain is required to feature a<br />

SNDR of about 30dB for maximum input signal<br />

amplitude of 100mVpp.<br />

Fig. 1 - UWB receiver architecture (block labels: 90°, Pulser, Baseband chain, 3rd-order LPF ×2, ADC ×2)<br />

This implementation is relatively simple<br />

compared to the traditional super-heterodyne<br />

architecture. The receiver has neither a Phase-Locked<br />

Loop (PLL) synthesizer nor intermediate stage (IF)<br />

filtering. This simplicity translates to lower material<br />

costs and lower assembly costs.<br />

Even with the selection of an optimal<br />

communication system, the implementation is of<br />

extreme importance to reach ultra-low power<br />

consumption in the sensor node. The design of ultra-low-power<br />

building blocks in advanced CMOS<br />

technologies is mandatory for this application. In the<br />

following, circuit solutions for the blocks embedded<br />

in the baseband-chain receiver are proposed, which<br />

exploit the reduced DR requirement by adopting an<br />

aggressive implementation.<br />

2. THE 3RD-ORDER LOW-PASS FILTER<br />


Fig. 2 shows the proposed 3rd-order lowpass filter.<br />

It is composed of a transconductance input stage<br />

(the differential pairs M1-M2 and the additional<br />

M9-M10) and an active load (M3-…-M8 and C1-<br />

C2-C3).<br />


Fig. 2 - The filter design architecture<br />

The filter transfer function is:<br />

\[ \frac{V_{out}}{V_{in}} = \frac{\dfrac{g_{m1}}{g_{m5}}}{s^3 \dfrac{C_1 C_2 C_3}{g_{m3} g_{m5} g_{m7}} + s^2 \left( \dfrac{C_1 C_2}{g_{m3} g_{m7}} + \dfrac{C_3 C_2}{g_{m5} g_{m7}} \right) + s \left( \dfrac{C_1}{g_{m3}} + \dfrac{C_2}{g_{m7}} \right) + 1} \qquad (1) \]<br />

The pole frequencies are given by:<br />

\[ \omega_1 = -\frac{g_{m3}}{C_1}, \qquad \omega_{2/3} = \sqrt{\frac{g_{m5}\, g_{m7}}{C_2\, C_3}} \qquad (2) \]<br />

while the gain and Q2/3-factor are:<br />

\[ G = \frac{g_{m1}}{g_{m5}}, \qquad Q_{2/3} = \sqrt{\frac{C_3 \cdot g_{m7}}{C_2 \cdot g_{m5}}} \qquad (3) \]<br />
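
As a quick numerical check of the pole, gain and Q expressions, the sketch below evaluates them for one set of component values. It takes the square-root form of the complex-pair frequency and Q that follows from factoring the denominator of (1) into a real pole and a biquad. The 100 fF capacitance is the value quoted for the design; the transconductance value is hypothetical and is chosen only so that every pole lands at 1 GHz.

```python
import math


def filter_parameters(gm1, gm3, gm5, gm7, c1, c2, c3):
    """Evaluate pole frequencies (Hz), Q and dc gain of the 3rd-order filter."""
    f1 = gm3 / c1 / (2 * math.pi)                            # real pole magnitude
    f23 = math.sqrt(gm5 * gm7 / (c2 * c3)) / (2 * math.pi)   # complex pole pair
    q23 = math.sqrt(c3 * gm7 / (c2 * gm5))
    gain = gm1 / gm5
    return f1, f23, q23, gain


if __name__ == "__main__":
    c = 100e-15                     # 100 fF capacitors
    gm = 2 * math.pi * 1e9 * c      # hypothetical: places the poles at 1 GHz
    f1, f23, q23, gain = filter_parameters(gm, gm, gm, gm, c, c, c)
    print(f"f1 = {f1 / 1e9:.2f} GHz, f2/3 = {f23 / 1e9:.2f} GHz, "
          f"Q = {q23:.2f}, G = {gain:.2f}")
```

Because every pole frequency scales with a transconductance, changing the bias current moves the cut-off frequency without touching the capacitors, while the gain depends only on the ratio gm1/gm5.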

This structure presents the following advantages:<br />

• Compact structure: few transistors and only three<br />

capacitors are used.<br />

• Low power consumption: only two current<br />

branches are used for each polarity.<br />

• Accurate frequency response: no non-dominant<br />

poles are present and all the transistors<br />

processing the signal are NMOS, guaranteeing<br />

also immunity to technological spread.<br />

• Absence of CMFB circuit: this avoids additional<br />

power consumption. The output DC voltage is<br />

fixed by the VGS of transistors M5 and M6.<br />

• Good linearity vs. power: the load exhibits the<br />

opposite non-linear behavior of the driver, and<br />

so no linearization is necessary for the input<br />

pair.<br />

The dc-gain is programmed for the two levels (0dB<br />

& 15dB) by adding an additional input pair (Mn-<br />

Mn+1) to increment the gm1 value. Since this solution<br />

does not change the current level in the load devices,<br />

this dc-gain tuning strategy does not affect the filter<br />

pole positions, as is evident from (2) and (3).<br />

The cut-off frequency is tuned in the required<br />

250MHz-1GHz range by changing the bias current<br />

(I1 and I2): the transistors' transconductances are<br />

changed and, thus, the cut-off frequency is tuned.<br />

The in-band noise remains constant (and also the<br />

SNR). On the other hand, for lower cut-off<br />

frequencies the power consumption is strongly<br />

reduced. The drawback of this approach is the fact<br />

that the devices have to be guaranteed to operate in<br />

saturation region for all the tuning range.<br />

3. THE INTEGRATOR<br />

Fig. 3 shows the schematic of the proposed<br />

integrator. The complete transfer function is given<br />

by:<br />

\[ H(s) = \frac{g_{m1} \cdot r_o}{1 - (g_{m4} - g_{m3}) \cdot r_o} \cdot \frac{1}{1 + s \cdot C \cdot \dfrac{r_o}{1 - (g_{m4} - g_{m3}) \cdot r_o}} \qquad (4) \]<br />

Fig. 3 - The integrator structure<br />

The use of a positive feedback (M4) increases the<br />

low-frequency load and, thus, the dc-gain to almost<br />

70dB with a 1GHz-UGB. The linearity is guaranteed<br />

by complementary non-linearity in the driving (M1)<br />

and load (M3//M4) device.<br />
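
The effect of the positive feedback in (4) can be checked numerically. The sketch below is only illustrative: the transconductance and output-resistance values are hypothetical, and the 130 fF load is the figure listed for the integrator. It shows that raising gm4 above gm3 boosts the dc gain and lowers the pole by the same factor, so their product, which approximates the unity-gain bandwidth gm1/(2πC), stays unchanged.

```python
import math


def integrator_dc_gain(gm1, gm3, gm4, ro):
    """DC gain of H(s) in (4): gm1*ro / (1 - (gm4 - gm3)*ro)."""
    return gm1 * ro / (1.0 - (gm4 - gm3) * ro)


def integrator_pole_hz(gm3, gm4, ro, c):
    """Pole of H(s) in (4): (1 - (gm4 - gm3)*ro) / (2*pi*ro*c)."""
    return (1.0 - (gm4 - gm3) * ro) / (2 * math.pi * ro * c)


if __name__ == "__main__":
    gm1, gm3, ro, c = 800e-6, 500e-6, 20e3, 130e-15   # hypothetical values
    for gm4 in (500e-6, 545e-6):                       # without / with net positive feedback
        a0 = integrator_dc_gain(gm1, gm3, gm4, ro)
        fp = integrator_pole_hz(gm3, gm4, ro, c)
        print(f"gm4 = {gm4 * 1e6:.0f} uS: A0 = {20 * math.log10(a0):.1f} dB, "
              f"pole = {fp / 1e6:.1f} MHz, A0*pole = {a0 * fp / 1e9:.2f} GHz")
```

The model is valid only while (gm4 − gm3)·ro stays below one; at or beyond that point the denominator of (4) changes sign and the stage is no longer a stable integrator.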



4. SIMULATION RESULTS<br />

The 3rd-order low-pass filter and the integrator have<br />

been designed in a standard 0.13µm CMOS<br />

technology with a 1.2V supply voltage. For the LPF,<br />

non-minimum device sizes (5µm/0.3µm) have been<br />

used in order to avoid mismatch effects. For the<br />

same reason, and in consideration that the load of the<br />

following block is about 100fF, all the capacitors<br />

have been designed to be 100fF. This results in an<br />

SNR in excess w.r.t. the specs. TABLE I summarizes<br />

the filter performances for the four limit cases. The<br />

power consumption ranges between<br />

210µW (f0=250MHz, dc-gain=0dB) and 3.2mW<br />

(f0=1GHz, dc-gain=15dB). When the dc-gain is kept<br />

equal to 0dB, the input-referred noise decreases<br />

from 18 nV/√Hz to 10 nV/√Hz by passing from<br />

250MHz to 1GHz; at 15dB dc-gain it decreases<br />

from 4.5 nV/√Hz to 2.5 nV/√Hz (Table I). The THD is<br />

about −30dB for a tone at fo/3 with the maximum<br />

output signal amplitude (100mVpp). The SNR is<br />

between 45dB and 53dB. Finally, Table II<br />

summarizes the integrator performance.<br />

TABLE I – SUMMARY OF THE FILTER PERFORMANCES<br />

Case I II III IV<br />

fo [MHz] 250 250 1000 1000<br />

DC-Gain [dB] 0 15 0 15<br />

IR-Noise Level [nV/√Hz] 18 4.5 10 2.5<br />

IR-In-band Noise [µVrms] 285 71 320 80<br />

THD@100mVpp [dB] -30 -30 -32 -37<br />

SNR [dB] 45 53 44 53<br />

Power consumption [mW] 0.2 1.2 0.5 3.2<br />

Fig. 4 - Filter frequency response for four cases<br />

TABLE II – INTEGRATOR PERFORMANCE SUMMARY<br />

UGB [GHz] 1<br />

DC-Gain [dB] 69<br />

Phase Margin [degree] 88<br />

IR-Noise Level [nV/√Hz] 5.4<br />

THD(@100mVpp, 1GHz) [dB] -47<br />

Cload [fF] 130<br />

Power consumption [mW] 0.12<br />

5. CONCLUSIONS<br />

A low-power wide-band CMOS continuous-time<br />

baseband path (integrator+filter) to be embedded in<br />

a WBAN receiver is presented. The filter cut-off<br />

frequency and dc-gain can be tuned from 250MHz<br />

up to 1GHz and from 0dB up to 15dB, respectively.<br />

Nevertheless, the maximum filter power consumption is<br />

limited to 3.2mW in the worst-case conditions (i.e.<br />

1GHz cut-off frequency, 15dB dc-gain). The<br />

integrator exhibits a UGB of 1GHz consuming<br />

120µW. These results are obtained by using a very<br />

compact circuit design that avoids parasitic poles<br />

and the CMFB circuit.<br />

6. REFERENCES<br />

[1] S. DONNAY, C. VAN HOOF and B.<br />

GYSELINCKX, “Body area sensor networks for<br />

health monitoring applications” 2nd Int. Workshop<br />

on Sensor and Actor Network Protocols and<br />

Applications (SANPA), Boston 2004.<br />

[2] F. REZZI, I. BIETTI, M. CAZZANIGA, and R.<br />

CASTELLO, “A 70-mW Seventh-Order Filter with<br />

7-50 MHz Cutoff Frequency and Programmable<br />

Boost and Group Delay Equalization” IEEE Journal<br />

of Solid-State Circuits, vol. 32, no. 12, December<br />

1997.<br />

[3] I. MEHR and D. R. WELLAND, “A CMOS<br />

Continuous-Time Gm-C Filter for PRML Read<br />

Channel Applications at 150 Mb/s and Beyond”<br />

IEEE Journal of Solid-State Circuits, vol. 32, no. 4,<br />

April 1997.<br />

[4] A. WYSZYNSKI and R. SCHAUMANN,<br />

“Using Multiple-Input Transconductors To Reduce<br />

Number of Components In OTA-C Filter Design”<br />

IEE Electronics Letters, vol. 28, no. 3, January 1994.<br />

[5] S. D’AMICO and A. BASCHIROTTO, “A<br />

compact High-Frequency Low-Power Continuous-<br />

Time Gm-C Biquad Cell” IEE Electronics Letters,<br />

29th May 2003, vol. 39, no. 11, pp. 821-822.<br />

[6] A. BASCHIROTTO, U. BASCHIROTTO and<br />

R. CASTELLO, “High-frequency CMOS low-power<br />

single-branch continuous-time filters” Proc.<br />

IEEE Int. Symp. Circ. Syst., 2000, pp. II-557, II-580.<br />

[7] A. L. COBAN and P. E. ALLEN, “Low-voltage<br />

CMOS transconductance cell based on parallel<br />

operation of triode and saturation transconductors”<br />

IEE Electronics Letters, vol. 30, no. 14, July 1994.<br />


ENERGY-EFFICIENT BASEBAND RECEIVER<br />

DESIGN IN DEEP SUB-µm TECHNOLOGY<br />

Armin Wellig<br />

STMicroelectronics – Advanced System Technology<br />

39, Chemin du Champ-des-Filles, 1228 Plan-les-Ouates, Geneva, Switzerland<br />

E-mail: armin.wellig@st.com<br />

ABSTRACT<br />

Novel physical layer techniques (e.g., hybrid-ARQ)<br />

dramatically increase the embedded memory of baseband<br />

receivers. Frequent accesses to large memories are among<br />

the top dynamic power drainers. Moreover, shrinking<br />

form factors of portable devices push for deep sub-µm<br />

technology suffering from an increase in static leakage<br />

power. In this paper, we demonstrate the need to combine<br />

low power techniques at the system, architecture and<br />

device level to ensure energy-efficient designs in deep<br />

sub-µm. The 3GPP-HSDPA receiver implemented with<br />

STMicroelectronics’ 90nm CMOS technology serves as a<br />

case study, where energy reductions of up to 70% are<br />

observed compared to more conventional designs.<br />

1. INTRODUCTION<br />

A major challenge of portable device manufacturers is to<br />

guarantee long battery life despite the enormous increase<br />

in system-on-chip (SoC) complexity. To maximize the<br />

amount of computation delivered per battery life, the<br />

energy consumed per operation in a specified amount of<br />

time should be minimized. However, this only addresses<br />

one part of the “power crisis”. For most of today’s battery-operated<br />

products, there will be times when the device is<br />

“on” but not in full active use. The cell phone in standby is<br />

the most notable example of this, where the phone does<br />

little more than monitor for an incoming call, and thus<br />

consumes less power than during phone calls or other<br />

activities. Since cell phones spend so much time in<br />

standby mode, the energy consumed during standby<br />

(mainly due to leakage current) often determines the<br />

overall battery life of the phone.<br />

To achieve energy-efficiency in deep sub-µm technology,<br />

designers are forced to use a combination of low power<br />

techniques at multiple levels (i.e., system, architecture,<br />

circuit and process) tailored to a particular application. In<br />

this paper, we focus on the design challenges of memory-dominated<br />

building blocks of wireless baseband receivers;<br />

for example, the introduction of a physical layer<br />

retransmission scheme such as the H-ARQ [1] results in a<br />

6x memory increase. We show how power reduction<br />

techniques yield important dynamic and static energy<br />

savings at different design levels. In particular, the circuit<br />

activity can be reduced at the architectural level by<br />

application-specific access pattern transformations and<br />

memory partitioning strategies; at the system level,<br />

system-driven voltage scaling techniques are discussed as<br />

a means of reducing the leakage power during standby; and<br />

finally, at the device level, the need for a low power<br />

standard cell library is demonstrated.<br />

We limit the analysis to embedded SRAMs implemented<br />

with different CMOS device geometries as part of<br />

STMicroelectronics’ process technology portfolio. Some<br />

process details will not be disclosed due to confidentiality<br />

constraints.<br />

2. ENERGY DISSIPATION IN CMOS<br />

Neglecting the short circuit currents during signal<br />

transitions, the energy dissipation of CMOS circuits can<br />

be split into a dynamic and static part, formalized as [2]<br />

\[ E = \alpha \cdot C_L \cdot V_{DD}^2 + I_{leak} \cdot V_{DD} \cdot \Delta t \qquad (1) \]<br />

where α is the switching activity, CL the load capacitance,<br />

VDD the supply voltage, Ileak the total leakage current [3]<br />

caused by reverse bias PN junction leakage, subthreshold<br />

leakage, oxide tunneling, hot carrier injection, gate<br />

induced drain leakage and channel punchthrough, and ∆t<br />

the time interval during which (1) is applicable.<br />
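
To make the two terms of (1) concrete, the following sketch evaluates them separately. The activity, capacitance, leakage current and time interval are hypothetical numbers chosen only for illustration; the two supply voltages are the 0.18µm and 0.13µm values quoted later in the text.

```python
def cmos_energy(alpha, c_load, vdd, i_leak, dt):
    """Equation (1): dynamic switching energy plus leakage energy over dt."""
    dynamic = alpha * c_load * vdd ** 2
    static = i_leak * vdd * dt
    return dynamic, static


if __name__ == "__main__":
    # Hypothetical activity, capacitance, leakage and interval.
    for vdd in (1.8, 1.2):
        dyn, stat = cmos_energy(alpha=0.5, c_load=1e-12, vdd=vdd,
                                i_leak=1e-9, dt=1e-6)
        print(f"VDD = {vdd} V: dynamic = {dyn:.3e} J, static = {stat:.3e} J")
```

The dynamic term scales with the square of the supply voltage while the leakage term scales only linearly with it but grows with the interval ∆t, which is why long standby periods shift the balance towards static energy.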

As finer geometry devices have been developed, the trend<br />

has been to reduce supply voltages, which has the positive<br />

side effect of saving energy (due to the quadratic effect of<br />

VDD on dynamic energy). However, voltage levels seem to<br />

be stabilizing in the 1.0-1.2 Volts range in deep sub-µm,<br />

which means that future technologies will not<br />

automatically result in energy savings. Moreover, finer<br />

device geometries are also affected dramatically by<br />

leakage current. With the physics of the semiconductor<br />

working against the designer – by raising the threshold<br />

voltage of the transistors the leakage current can be<br />

reduced, but this comes at a corresponding decrease in the<br />

speed of the transistor – the designer must turn to circuit,<br />

architecture and algorithm exploration to find energy-efficient<br />

SoC solutions.<br />

3. DYNAMIC ENERGY REDUCTION<br />

Due to the limited space in this paper, we focus our discussion on energy-efficient memory systems. To illustrate the potential for dynamic energy savings, the average energy per SRAM access is plotted in Figure 1 for different device geometries and memory data buses (or access bit widths). From Figure 1a we can draw the following conclusions: 1) The access energy increases with memory size. This motivates memory hierarchy designs, where frequently accessed data are placed into small, energy-efficient memories (e.g., L1/L2 caches), while rarely accessed information is stored in large memories with high cost per access [4]. In its simplest form, such a hierarchy consists of breaking a large memory down into smaller banks of the same size. 2) Because of the quadratic effect of V_DD on the dynamic energy, supply voltage reductions as a result of technology scaling account for major energy savings (e.g., V_DD = 1.8 V for 0.18µm versus V_DD = 1.2 V for 0.13µm technology). The switched node capacitance C_L also decreases with constant field scaling (the electric field across the gate oxide is kept unchanged in the scaled technology to avoid irreversible gate-oxide breakdown), which further reduces the energy dissipation. We can also observe that the energy savings are smaller when only C_L scales down (here, going from 0.13µm to 90nm while keeping V_DD at 1.2 V). This observation is in line with equation (1), which predicts only a linear reduction of the dynamic energy with respect to C_L scaling.<br />
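The two scaling effects can be compared with a short calculation. The sketch below assumes that the dynamic energy is proportional to C_L · V_DD², which matches the quadratic and linear dependencies described above; the function name and the capacitance factor of 0.7 are hypothetical and not taken from the paper.

```python
def dynamic_energy_ratio(vdd_new, vdd_old, cl_ratio=1.0):
    """Return E_new / E_old, assuming E is proportional to C_L * V_DD**2."""
    return cl_ratio * (vdd_new / vdd_old) ** 2

# Supply scaling alone: 0.18 um at 1.8 V -> 0.13 um at 1.2 V, C_L held constant.
ratio_vdd = dynamic_energy_ratio(1.2, 1.8)

# Capacitance scaling alone at a fixed 1.2 V supply.
# The factor 0.7 is a hypothetical value chosen only for illustration.
ratio_cl = dynamic_energy_ratio(1.2, 1.2, cl_ratio=0.7)

print(f"voltage scaling only:     {ratio_vdd:.3f} of the original energy")
print(f"capacitance scaling only: {ratio_cl:.3f} of the original energy")
```

Under this assumption the voltage step alone brings the energy down to 4/9 of its original value, while a capacitance reduction only scales the energy by the same factor as the capacitance.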

Figure 1. Average access energy for (a) different sub-µm technologies and (b) access bit widths.<br />

From Figure 1b we conclude that the access energy increases less than linearly with access bit width for a fixed memory size. For example, for a fixed SRAM size of 32.8 kbit, accessing four 8-bit words is 2.6x more expensive than accessing a single 32-bit word with the same data content. Thus, application-specific access pattern transformations such as packing, where for example N 8-bit samples are combined into one 8N-bit word, can significantly reduce the dynamic energy of memory systems.<br />
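A minimal sketch of the packing transformation is given below. The byte order (first sample in the least significant byte) and the function names are assumptions made for illustration; the only figure taken from the text is the 2.6x energy ratio.

```python
def pack_samples(samples, n):
    """Pack groups of n unsigned 8-bit samples into 8n-bit integers.

    Assumed layout: the first sample of each group occupies the low byte.
    """
    words = []
    for i in range(0, len(samples), n):
        word = 0
        for k, s in enumerate(samples[i:i + n]):
            if not 0 <= s <= 0xFF:
                raise ValueError("samples must fit in 8 bits")
            word |= s << (8 * k)
        words.append(word)
    return words

def unpack_word(word, n):
    """Recover the n 8-bit samples stored in one packed word."""
    return [(word >> (8 * k)) & 0xFF for k in range(n)]

samples = [0x11, 0x22, 0x33, 0x44]
words = pack_samples(samples, 4)      # one 32-bit access instead of four 8-bit accesses

# Relative saving implied by the 2.6x figure quoted above.
energy_ratio = 2.6                    # four 8-bit accesses vs. one 32-bit access
saving = 1.0 - 1.0 / energy_ratio
print(f"packed word: {words[0]:#010x}, energy saving: {saving:.1%}")
```

With the quoted ratio, replacing four narrow accesses by one wide access saves a little over 60% of the access energy for that data.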

4. STATIC POWER REDUCTION<br />

Leakage power becomes increasingly troublesome in deep sub-µm technologies (see Figure 2a) for applications characterized by low active-to-standby ratios (i.e., ∆t ≅ standby period) and high memory requirements. Various techniques have been proposed to reduce the leakage power of SRAM cells [5]. At the device level, leakage reduction can be achieved by controlling the dimensions (length, oxide thickness, junction depth, etc.) and the doping profile of the transistors. At the circuit level, leakage reduction is achieved by statically or dynamically controlling the voltage of the different device terminals (i.e., gate, drain, source and substrate). From a system architect’s perspective, these low-level techniques are a design constraint and part of the low-power standard cell library specification. At the architectural level, one key concept is supply voltage scaling, either to operate the SRAM cell in the sub-threshold region (generally referred to as drowsy mode [6]) or to gate off V_DD at the expense of destroying the cell state. The energy reduction of drowsy SRAMs in a low-power 90nm CMOS process is shown in Figure 2b, where the bars indicate the total leakage power (left ordinate) and the lines the relative leakage reduction (right ordinate) with respect to “full V_DD” operation.<br />

0-7803-9345-7/05/$20.00©2005 IEEE.<br />

Figure 2. Leakage power of (a) different sub-µm technologies and (b) drowsy SRAM cells in 90nm.<br />

Note that (parts of) the SRAM can only be operated in drowsy mode if the stability of the cell can be guaranteed, that is, if the logic state of the cell does not flip when V_DD is reduced, as explained next.<br />

4.1 Stability of Drowsy SRAMs in sub-µm<br />

The static noise margin (SNM) of an SRAM cell is defined [7] as the minimum DC noise voltage necessary to flip the state of the cell. An estimate of the SNM can be obtained by drawing the static “butterfly diagram”, i.e., the voltage transfer characteristics (VTCs) of the two cross-coupled inverters as illustrated in Figure 3, and finding the largest square that fits between the two curves. We observe that the eye opening is larger for an SRAM cell in standby mode (i.e., access transistors are off) than for a cell in active mode (i.e., access transistors are on). Intuitively, the cell is most vulnerable to noise during a read access, since the ‘0’ storage node rises to a voltage higher than ground due to the bit-line discharge.<br />
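The largest-square construction can be sketched numerically. The example below is only an illustration under strong assumptions: the inverter characteristic is a hypothetical tanh model rather than SPICE data, both inverters are taken to be identical (so the second curve is the mirror image of the first about the line y = x), and all function names are made up for this sketch.

```python
import math

def inverter_vtc(v_in, vdd=1.0, gain_k=20.0):
    """Hypothetical smooth inverter characteristic (tanh model, not SPICE data)."""
    return 0.5 * vdd * (1.0 - math.tanh(gain_k * (v_in - 0.5 * vdd)))

def solve_on_diagonal(vtc, c, vdd, iters=60):
    """Find x in [0, vdd] with vtc(x) - x = c by bisection.

    vtc(x) - x is strictly decreasing for a falling characteristic.
    """
    lo, hi = 0.0, vdd
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if vtc(mid) - mid > c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def static_noise_margin(vtc, vdd=1.0, steps=2000):
    """Side of the largest square that fits into one lobe of the butterfly plot."""
    best = 0.0
    c_max = vtc(0.0)
    for i in range(1, steps):
        c = c_max * i / steps
        # Point (xa, vtc(xa)) on the first curve, lying on the line y = x + c.
        xa = solve_on_diagonal(vtc, c, vdd)
        # Point (xb, vtc(xb)) on y = x - c, mirrored to (vtc(xb), xb) on the second curve.
        xb = solve_on_diagonal(vtc, -c, vdd)
        # Both points lie on y = x + c, so their horizontal distance is the square's side.
        best = max(best, xa - vtc(xb))
    return best

snm_steep = static_noise_margin(lambda v: inverter_vtc(v, gain_k=200.0))
snm_soft = static_noise_margin(lambda v: inverter_vtc(v, gain_k=10.0))
print(f"SNM, steep inverter: {snm_steep:.3f} V; soft inverter: {snm_soft:.3f} V")
```

In this model a steeper inverter characteristic opens the eye of the butterfly plot and raises the SNM towards V_DD/2. A real analysis, as in the paper, would use simulated or measured VTCs for the active and standby cases instead of the tanh model.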

Figure 3. Voltage transfer characteristic (VTC) for SRAM cell in (a) active and (b) standby mode.<br />

To analyze the limits of stable “drowsy mode operation” in deep sub-µm technology, the SNM of a basic 6T-SRAM cell [5] was simulated in SPICE based on BSIM4 models in a 90nm process. As pointed out earlier, the worst-case situation occurs under “read-disturb” conditions, as shown in Figure 4a. For instance, setting the (application-specific) stability criterion to SNM = 100 mV, the SRAM cell can safely be operated at supply voltages as low as V_DD = 0.7 V during read access (accounting for process variations of up to 3σ and an operating temperature of 85°C). To decrease V_DD further without relaxing the stability criterion, more sophisticated devices and circuits would be needed to support stable read operations tailored for ultra-low power applications (e.g., [8]).<br />


Figure 4. Stability of a drowsy SRAM cell in (a) active and (b) standby mode in a 90nm process.<br />

In standby mode, an (off-the-shelf) low-power SRAM can be operated in the sub-threshold region even for a process variation of up to 3σ and an operating temperature of 85°C. Thus, incorporating dynamic supply voltage control in memory-dominated SoC designs is of great interest for applications characterized by long idle periods, such as cell phones during standby. Finally, it should be noted that an increase in temperature disturbs the cell stability considerably less (∆SNM ≅ 1 mV per 15°C) than variations in the die (∆SNM ≅ 50 - 65 mV at 3σ). A more detailed analysis can be found in [9].<br />
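To put the two sensitivities side by side, the following sketch applies the quoted slope of about 1 mV per 15°C over a temperature range. The 25°C starting point is an assumption for illustration (the text only states the 85°C operating point), and the linear extrapolation is likewise an assumption.

```python
def snm_shift_temperature_mv(t_low_c, t_high_c, mv_per_15c=1.0):
    """SNM change in mV over a temperature range, assuming a constant slope."""
    return (t_high_c - t_low_c) / 15.0 * mv_per_15c

# Assumed range: 25 C (hypothetical reference) up to the 85 C quoted in the text.
temp_shift = snm_shift_temperature_mv(25.0, 85.0)

# Process-variation figures quoted in the text for 3 sigma.
process_shift_range = (50.0, 65.0)
ratio_low = process_shift_range[0] / temp_shift
ratio_high = process_shift_range[1] / temp_shift

print(f"temperature: {temp_shift:.1f} mV, process: 50-65 mV "
      f"({ratio_low:.1f}x to {ratio_high:.2f}x larger)")
```

Under these assumptions the process-induced shift is more than an order of magnitude larger than the temperature-induced one.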

5. 3GPP-HSDPA RECEIVER<br />

5.1 Evolution of 3GPP<br />

Wideband code division multiple access (WCDMA) is the most widely adopted air interface for third generation (3G) systems and is specified within the 3GPP alliance [1]. It provides peak bit rates of 2 Mbps and variable data rates on demand in a 5 MHz bandwidth. Average user data rates in the order of 200-300 kbps are expected under loaded network conditions [10]. The use of CDMA access technology implies a radio receiver composed of multiple fingers that use offsets of a common spreading code to receive and combine several multi-path time-delayed signals. Such a receiver is generally referred to as a RAKE receiver. Other enhancements (compared to 2G systems like GSM [10]) impacting the physical layer complexity include soft handover, fast power control, multiple interleaving stages and Turbo codes. To define and limit the processing and storage requirements of power-constrained portable devices, the notion of “User Equipment (UE) Capability Class” was introduced in 3GPP. In the Release 4 specifications, the UE Capability Classes range from 32 kbps to 2 Mbps.<br />

To satisfy even higher data traffic demands, the 3GPP system aims to gradually increase the spectral efficiency and to support higher user data rates, especially in the downlink direction of the communication path due to its heavier load. In this context, 3GPP introduces a new feature in the Release 5 specifications called High Speed Downlink Packet Access (HSDPA). The HSDPA concept is an umbrella of features to improve both user and system performance. The signal processing concepts impacting the receiver complexity include an Adaptive Modulation and Coding (AMC) scheme, physical layer Hybrid Automatic Repeat Request (H-ARQ) and latency-constrained physical control decoding. The UE Capability Classes in Release 5 are defined up to 10 Mbps, which significantly increases the processing load and thus motivates a hardware implementation of the HSDPA receiver to satisfy the power-delay constraints.<br />

5.2 Energy-efficient HSDPA receiver design<br />

Although in principle the adoption of user-specific orthogonal codes should exempt WCDMA systems at least from intra-cell interference, multi-path dispersive channels provide the receiver with a combination of correlated time-shifted replicas of the codes, thus compromising the orthogonality. While conventional RAKE processing is sufficient to meet the compliance requirements in Release 4, channel equalization was implemented as part of the HSDPA receiver to guarantee the capacity improvements envisioned in the Release 5 specifications. The equalization is done in the frequency domain to take advantage of low-complexity FFT processing. Due to the limited space we do not give more details of the inner receiver architecture; they can be found in [11]. The outer receiver is in charge of decoding the incoming coded sequence via a prescribed algorithm to provide the coding gain for operation over noisy channels. It consists of a channel deinterleaving stage followed by H-ARQ related processing and Turbo decoding. Preliminary synthesis results of the HSDPA receiver, based on STMicroelectronics’ ultra-low leakage processes in 0.13µm and 90nm, are given in Table 1.<br />

Table 1. Preliminary synthesis results of the HSDPA receiver design.<br />
Process          0.13µm       90nm<br />
Area estimate    7.235 mm²    3.703 mm²<br />
Memory share     71 %         66 %<br />

One immediate consequence of introducing a fast physical layer retransmission scheme is an increase in memory size. From Table 1 we conclude that roughly 70% of the HSDPA data-path is composed of memory, where the biggest storage requirements stem from the H-ARQ.<br />

The idea behind H-ARQ is to initially transmit the information with little redundancy. If the decoding fails on the first attempt, the receiver keeps the soft bits of the inadequately received block in memory and then uses both old and new soft bits in the next decoding attempt. Note that up to 8 independent Stop-and-Wait ARQ processes are supported to guarantee continuous transmission. Since the input samples into the H-ARQ are in chronological order (i.e., after deinterleaving and rate matching), the 8-bit samples can be packed into 64-bit words. The choice of the packing factor N (= 8) depends on the packing efficiency, defined as the percentage of useful data out of the accessed 8N-bit word. Moreover, mapping each ARQ process onto a different SRAM bank not only reduces the dynamic energy, but also enables supply voltage scaling of different banks (see Figure 5). Depending on the radio link configuration, application requirements and system capacity, each SRAM bank can be in Read/Write, Standby or Sleep mode. The banks in Sleep mode (i.e., no ARQ process is assigned) are idle over a long period (seconds to minutes) and can be gated off. Conversely, the memory content during Standby mode must be preserved (∆t ≥ 10 ms) while operating the SRAM in sub-threshold. The projected energy savings with respect to a conventional multi-bank architecture without packing and drowsy mode support are shown in Figure 6. It is assumed that the network implements “Proportional Fair” packet scheduling [12] and that all users have the same application requirements (here 64 kbps).<br />
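The bank modes and the packing efficiency described above can be captured in a small model. This is a sketch under stated assumptions, not the receiver's actual control logic: the names `BankMode`, `bank_mode` and `packing_efficiency` are hypothetical, and the efficiency is interpreted here as the fraction of occupied sample slots when a block is not a multiple of N samples long.

```python
from enum import Enum

class BankMode(Enum):
    READ_WRITE = "full V_DD, bank is being accessed"
    STANDBY = "drowsy (sub-threshold) supply, content retained"
    SLEEP = "supply gated off, content lost"

def bank_mode(process_assigned, being_accessed):
    """Select the mode of one SRAM bank that serves one ARQ process."""
    if not process_assigned:
        return BankMode.SLEEP          # no ARQ process mapped: gate off the supply
    if being_accessed:
        return BankMode.READ_WRITE     # soft bits are being written or combined
    return BankMode.STANDBY            # waiting for a retransmission: keep the data

def packing_efficiency(num_samples, n):
    """Percentage of useful 8-bit samples among all slots of the accessed 8n-bit words."""
    if num_samples <= 0 or n <= 0:
        raise ValueError("num_samples and n must be positive")
    words = -(-num_samples // n)       # ceiling division: words needed for the block
    return 100.0 * num_samples / (words * n)

# Example with hypothetical block lengths and the packing factor N = 8 from the text.
full_block = packing_efficiency(64, 8)
partial_block = packing_efficiency(60, 8)
print(bank_mode(True, False).name, full_block, partial_block)
```

A larger packing factor reduces the number of accesses but can lower the efficiency when block lengths do not divide evenly, which is the trade-off behind the choice of N.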


Figure 5. H-ARQ Architecture template.<br />

Figure 6. Projected energy reduction of the HSDPA receiver design in a 90nm process.<br />

Note that as more users await new data packets, the standby periods increase and so does the leakage power. Depending on the number of users to be served in a cell, implementing a multi-bank SRAM architecture with packing and drowsy mode support results in an overall energy reduction of up to 70%.<br />


6. CONCLUSION<br />

As deep sub-µm technology invades the market of portable devices, a profound knowledge of the application characteristics down to the silicon is critical to assess design choices. In this paper, we have demonstrated the need to combine low-power design techniques at the levels of system (i.e., system-driven voltage scaling), architecture (i.e., access pattern transformations and memory partitioning) and device (i.e., low-power standard cell library) to ensure energy-efficient designs of memory-dominated baseband receivers. Choosing a multi-bank SRAM architecture with packing and drowsy mode support to implement the H-ARQ as part of the HSDPA receiver design yields an energy reduction of up to 70% in STMicroelectronics’ 90nm CMOS technology over conventional designs.<br />

7. REFERENCES<br />

[1] 3rd Generation Partnership Project, www.3gpp.org.<br />
[2] A.P. Chandrakasan, S. Sheng and R.W. Brodersen, Low-power CMOS digital design, IEEE J. of Solid-State Circuits, Vol. 27, No. 4, pp. 473-484, 1992.<br />
[3] K. Roy et al., Leakage Current Mechanisms and Leakage Reduction Techniques in Deep-Submicrometer CMOS Circuits, Proceedings of the IEEE, Vol. 91, No. 2, pp. 305-327, 2003.<br />
[4] L. Benini et al., Increasing energy efficiency of embedded systems by application-specific memory hierarchy generation, IEEE Design & Test of Computers, Vol. 17, No. 2, pp. 74-85, 2000.<br />
[5] Y. Nakagome et al., Review and future prospects of low-voltage RAM circuits, IBM J. Research & Development, Vol. 47, No. 5/6, pp. 525-551, 2003.<br />
[6] K. Flautner et al., Drowsy caches: simple techniques for reducing leakage power, Int. Symposium on Computer Architecture, pp. 25-29, 2002.<br />
[7] E. Seevinck et al., Static Noise Margin Analysis of MOS SRAM Cells, IEEE J. of Solid-State Circuits, Vol. 22, No. 5, pp. 748-754, 1987.<br />
[8] B.H. Calhoun and A. Chandrakasan, Characterizing and Modeling Minimum Energy Operation for Subthreshold Circuits, ISLPED, pp. 90-95, 2004.<br />
[9] A. Wellig and J. Zory, Static Noise Margin Analysis of Sub-threshold SRAM cells in deep sub-micron technology, submitted to PATMOS 2005.<br />
[10] T. Halonen et al., GSM, GPRS and EDGE Performance: Evolution Towards 3G/UMTS, Second Edition, John Wiley & Sons, Inc., New York, 2003.<br />
[11] D. Loiacono et al., Serial Block Processing for Multi-Code WCDMA Frequency Domain Equalization, IEEE WCNC, Vol. 1, pp. 164-170, 2005.<br />
[12] H.J. Kushner et al., Convergence of proportional-fair sharing algorithms under general conditions, IEEE Trans. on Wireless Communications, Vol. 3, Issue 4, pp. 1250-1259, 2004.<br />



A LOW-VOLTAGE, LOW-DISTORTION (1.2V,<br />

29.5dBm OIP3) RECONFIGURABLE BASEBAND<br />

BLOCK FOR MOBILE APPLICATIONS<br />

Nicola Ghittori¹, Andrea Vigna¹, Piero Malcovati², Stefano D’Amico³, Andrea Baschirotto³<br />
¹Department of Electronics, ²Department of Electrical Engineering, University of Pavia, Italy<br />
³Department of Innovation Engineering, University of Lecce, Italy<br />

ABSTRACT<br />

This paper presents a reconfigurable baseband DAC system (current-steering D/A converter + transimpedance stage + low-pass reconstruction filter) that operates in a low-voltage multistandard transmitter while satisfying high linearity requirements. The implemented block can process WLAN 802.11a/b/g and UMTS signals. The device has been integrated in a 0.13µm CMOS technology and operates with a 1.2 V supply voltage. Experimental results show that the proposed circuit achieves a 29.5 dBm OIP3 when configured for the WLAN setting and a 31 dBm OIP3 when configured for the UMTS setting. The current consumption, optimized for the two cases, is 16.2 mA for WLAN and 14 mA for UMTS.<br />

1. INTRODUCTION<br />

The design of analog blocks for wireless transceivers has always focused on minimizing power consumption, which is a key issue for portable devices. This, in conjunction with the diffusion of scaled-down technologies, leads to a progressive reduction of the supply voltage for mobile applications. However, the use of low supply voltages also implies some drawbacks in the design of the analog blocks. These drawbacks mainly concern the limited achievable linearity and hence the reduced dynamic range. As a matter of fact, in the state-of-the-art literature [1], a supply voltage as low as 1.2 V is never actually used in wireless transceivers, and a high linearity requirement represents a challenging bottleneck, in particular for the baseband analog blocks.<br />

Another challenge in the present market of wireless systems is the implementation of blocks that can change operation mode while exploiting the same circuitry. This feature allows the devices to meet the specifications of different standards, thus satisfying the growing demand for reconfigurable mobile terminals.<br />

In these fields of research, the aim of the presented work has been the realization of a 1.2 V low-distortion DAC system to be embedded in a wireless transmitter and able to fulfil the requirements of the WLAN and UMTS standards. The proposed block has been fabricated in a 0.13µm CMOS technology and is realized with a current-steering DAC followed by a transimpedance stage and a low-pass reconstruction filter. The filter bandwidth can be digitally programmed (11 MHz/2.5 MHz) according to the selected operation mode (WLAN/UMTS). In addition, the DAC sampling frequency can also be changed (100 MHz/50 MHz for WLAN/UMTS, respectively) to satisfy the different resolution requirements of the two standards. The use of the transimpedance stage, as well as a number of proper design choices, allows the device to achieve high linearity even with a supply voltage as low as 1.2 V.<br />

The paper is organized as follows. Section 2 describes the design of the reconfigurable system and the features of the functional blocks. Section 3 presents the experimental results. Finally, conclusions are drawn in Section 4.<br />

2. DAC SYSTEM DESIGN<br />

The fully-differential architecture of the implemented reconfigurable block is shown in Fig. 1. An 8 bit current-steering DAC, based on a fully-thermometric differential structure, is driven by the digital controls coming from the binary-to-thermometric decoder. The DAC output current is transformed into a voltage signal by a transimpedance stage, realized with an operational amplifier and two feedback resistors (a similar architecture is used in [2]). This signal is the input of the following 4th-order Bessel low-pass reconstruction filter, which generates the smoothed analog waveform. By means of a digital control signal, the filter transfer function can be programmed to accommodate the two considered standards.<br />

In the following subsections the three functional blocks (DAC, transimpedance stage and filter) are described in detail. The design of these blocks has been oriented to maximizing the linearity while optimizing the power consumption.<br />

2.1 The 8 bit current-steering DAC<br />

As shown in Fig. 1, the implemented DAC is based on an 8 bit current-steering differential architecture. A fully-thermometric structure has been chosen to reduce the effect of glitches and the DNL errors. This comes at the cost of an increase in the digital area, which is nevertheless negligible due to the low number of bits.<br />
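The benefit of the thermometric structure can be illustrated with a behavioral model of the binary-to-thermometric decoding. The sketch below is only an illustration: the function name and the cell ordering are assumptions and do not describe the decoder implemented on the chip.

```python
def binary_to_thermometer(code, bits=8):
    """Return the on/off state of the 2**bits - 1 unit cells for a binary input code."""
    n_cells = (1 << bits) - 1
    if not 0 <= code <= n_cells:
        raise ValueError("code out of range")
    # The first `code` cells are switched on, the remaining ones are off.
    return [1 if i < code else 0 for i in range(n_cells)]

therm_127 = binary_to_thermometer(127)
therm_128 = binary_to_thermometer(128)

# Number of unit cells that change state at the mid-scale transition.
changed = sum(a != b for a, b in zip(therm_127, therm_128))
print(len(therm_128), sum(therm_128), changed)
```

At the mid-scale transition from 127 to 128 only one unit cell toggles, whereas a binary-weighted converter would switch all eight weighted sources at once. This is why a thermometric structure tends to produce smaller glitches and guarantees monotonicity, at the price of 255 control lines for 8 bits.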


[Figure 1, DAC panel labels: digital controls; bias; current-steering DAC with 255 unit cells; I_unit = 5 µA; 1.2 V supply; two (255/2)·I_unit currents; outputs DACout+ and DACout−.]<br />

The achievable resolution in the signal bandwidth (10 MHz for the WLAN standard, 2.11 MHz for the UMTS standard) is higher than 8 bits, since the DAC has been designed to operate at conversion frequencies (F_s) up to a few hundred MHz and the oversampling ratio can consequently be exploited. According to the selected standard (WLAN/UMTS), F_s can be set to 100 MHz/50 MHz respectively, thus leading to a resolution of 9 bit in the WLAN bandwidth and 9.5 bit in the UMTS bandwidth (due only to quantization noise). The required in-band resolutions, which result from a behavioural analysis of the WLAN/UMTS TX chains developed in the frame of this activity [3], are equal to 8 bit and 9 bit for WLAN and UMTS, respectively. As a consequence, the chosen sampling frequencies provide in both cases a design margin to account for other noise contributions due to the real implementation.<br />
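The resolution gain from oversampling can be estimated with the textbook relation for white quantization noise, in which every factor of four in oversampling ratio adds one bit. The text does not state how its own figures were derived, so the formula, the function name and the definition of the oversampling ratio as F_s / (2·BW) are assumptions of this sketch.

```python
import math

def in_band_resolution_bits(n_bits, fs_hz, bw_hz):
    """Effective in-band resolution, assuming white quantization noise.

    The oversampling ratio is taken as fs / (2 * bw); each doubling of it
    adds half a bit.
    """
    osr = fs_hz / (2.0 * bw_hz)
    return n_bits + 0.5 * math.log2(osr)

wlan_bits = in_band_resolution_bits(8, 100e6, 10e6)    # WLAN: 100 MHz, 10 MHz band
umts_bits = in_band_resolution_bits(8, 50e6, 2.11e6)   # UMTS: 50 MHz, 2.11 MHz band
print(f"WLAN: {wlan_bits:.2f} bit, UMTS: {umts_bits:.2f} bit")
```

Under this assumption the estimate is roughly 9.2 bit for WLAN and 9.8 bit for UMTS, which is in the same range as the 9 bit and 9.5 bit quoted above. In both cases the result exceeds the required 8 bit and 9 bit, consistent with the design margin mentioned in the text.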

In order to maximize the linearity, all the sources of distortion that affect a current-steering D/A converter have been taken into account: output impedance, matching and glitches of the current sources. Usually, when high linearity is required, the cascode structure [4] is used to implement the DAC current sources. In this case, the low supply voltage of 1.2 V does not allow the use of this structure, since the margin for keeping all the transistors out of the triode region would be too small. The distortion due to the finite output impedance of the sources is nevertheless made negligible by the transimpedance stage. In fact, the virtual ground forced by the stage at the DAC output nodes makes the unit currents of the converter much more independent of the input signal than in a solution in which the DAC output voltage swing is directly produced at the output nodes of the current cells.<br />

Behavioral analysis shows that a relative matching of 2% for each single current source is sufficient to ensure a high DNL-INL yield with 0.5 LSB as the upper limit. The low supply voltage of 1.2 V limits the overdrive voltage of the DAC unit current cell to 70 mV. Consequently, using the Pelgrom model [5], a proper area for the unit source needs to be found to achieve the required matching. This area has been determined to be 36µm². Once the unit current has been fixed to 5µA as a trade-off between saving power and fitting the required conversion frequency, W and L of the unit source both turn out to be 6µm.<br />

Figure 1. Architecture of the baseband block. [Panel labels: transimpedance stage with two 220 Ω feedback resistors; digital bandwidth selector (UMTS/WLAN); low-pass filter; outputs V_out+ and V_out−.]<br />

To limit the effect of glitches, which cause dynamic distortion, a reduced voltage swing is used to drive the switches of the various current sources (the two reference voltages used are 300 mV and 800 mV). Moreover, the virtual ground forced by the transimpedance stage further reduces the energy of the glitches, improving the dynamic behavior of the DAC. Finally, the switches are set to minimum dimensions to limit their charge injection.<br />

2.2 The transimpedance stage<br />

The transimpedance stage is placed after the DAC and drives the following smoothing filter. Considering the DAC output current fixed above, the value of the two feedback resistances (220Ω, as shown in Fig. 1) has been chosen to obtain a differential peak-to-peak swing of 0.56V at the DAC output. These resistances have been accurately matched with the resistance used to generate the DAC reference current, so that the output swing is as independent as possible of the technology process spread.
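The swing figures quoted in the text can be cross-checked with a short calculation. The sketch below is only illustrative: it assumes an 8-bit DAC whose full-scale current is 255 unit currents and complementary outputs that each swing between 0 and the full-scale voltage; neither assumption is stated explicitly in the paper. The function names are hypothetical.

```python
# Illustrative swing budget for the DAC + transimpedance stage + filter.
# Values taken from the text: 5 uA unit current, 220 ohm feedback resistors,
# 8 dB filter DC gain. The 8-bit resolution is read from the Fig. 4 axis label.

N_BITS = 8
I_UNIT = 5e-6          # unit current [A]
R_FB = 220.0           # feedback resistance [ohm]
FILTER_GAIN_DB = 8.0   # DC gain of the reconstruction filter [dB]


def full_scale_current(n_bits: int, i_unit: float) -> float:
    """Assumption: full scale equals (2^n - 1) unit currents."""
    return (2 ** n_bits - 1) * i_unit


def differential_swing_pp(i_fs: float, r_fb: float) -> float:
    """Assumption: each output swings 0..i_fs*r_fb, so the differential
    peak-to-peak swing is twice the single-ended one."""
    return 2.0 * i_fs * r_fb


i_fs = full_scale_current(N_BITS, I_UNIT)
v_tia_pp = differential_swing_pp(i_fs, R_FB)
v_out_pp = v_tia_pp * 10 ** (FILTER_GAIN_DB / 20.0)

print(f"full-scale current      : {i_fs * 1e3:.3f} mA")
print(f"swing after TIA (diff.) : {v_tia_pp:.3f} Vpp")
print(f"swing after filter      : {v_out_pp:.3f} Vpp")
```

Under these assumptions the result is about 0.56Vpp after the transimpedance stage and about 1.4Vpp after the 8dB filter, in line with the values reported in the paper.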

The opamp used in the stage is fundamental to the performance of the entire system. It has to drive the low feedback resistances without introducing additional distortion on the signal, while its current consumption needs to be minimized. This goal has been achieved with a proper choice of the opamp structure (shown in Fig. 2) and with a careful design. A two-stage topology has been used: a folded cascode input stage and a class AB output stage. The folded cascode input stage guarantees the necessary DC gain of 50dB for the entire opamp. The class AB output stage allows the 220Ω resistances to be driven without excessive current consumption. The implemented opamp reaches a gain-bandwidth product of 150MHz, while the total current consumption is limited to 7mA.

Figure 2. The opamp used <strong>in</strong> the transimpedance stage.<br />

2.3 The low-pass reconstruction filter<br />

The reconfigurable reconstruction filter is a 4th-order Bessel low-pass structure consisting of the cascade of two identical active-RC biquadratic cells, with a DC gain of 8dB. A schematic representation of the complete architecture is shown in [6]. The cut-off frequency of the filter can be programmed according to the selected standard (11MHz for WLAN, 2.5MHz for UMTS), using a digital word which changes the value of the resistors. Furthermore, a digitally controlled tuning circuit allows a fine adjustment of the value of the filter capacitors. This compensates the effect of technology process spread and consequently ensures good accuracy of the cut-off frequency.

The output noise of the entire filter, dominated by the white thermal noise of the resistors, has been designed to be negligible with respect to the quantization noise introduced by the DAC.

The filter current consumption has been optimized for the two cases (4.7mA for WLAN, 2.5mA for UMTS): the bias currents of the filter opamps can be changed according to the selected mode, in order to save power when a smaller bandwidth (UMTS case) is required.

3. EXPERIMENTAL RESULTS<br />

The reconfigurable DAC system has been integrated in a standard 0.13µm CMOS technology. Fig. 3 shows the microphotograph of the test-chip, whose active area is 0.9mm². The device consumes 16.2mA (when configured for WLAN) or 14mA (when configured for UMTS), from a single 1.2V supply voltage.

The static DNL and INL of the DAC have been evaluated first. They are shown in Fig. 4. The measured static linearity, higher than 8 bits (INL max = 0.25LSB), confirms the effectiveness of the mismatch dimensioning and of the minimization, due to the transimpedance stage, of the distortion caused by the output impedance of the current sources.

The dynamic behaviour of the baseband block has been verified with single tone tests as well as intermodulation tests, with the DAC conversion frequency set to 100MHz for the WLAN case and 50MHz for the UMTS case.

0-7803-9345-7/05/$20.00 ©2005 IEEE.

Fig. 5 and Fig. 6 show the filter output spectrum when a full-scale single tone and two -6dBFS tones, respectively, are applied at the input (the device is set for the WLAN operation mode, which is the most critical). The measured SFDR is 58dB, while the IMD3 is 59dB, leading to an OIP3 of 29.5dBm. No 1dBCP measurement can be extracted due to the highly linear behaviour of the device. Fig. 7 and Fig. 8 show the output spectra measured with typical WLAN 802.11a and UMTS signals, respectively. In both cases, the I component of a quadrature signal is considered. In the WLAN case the spectrum is compared with the emission mask (defined by the standard for the I+Q RF signal) to verify the out-of-band linearity.
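The step from IMD3 to OIP3 follows the usual two-tone relation, OIP3 = P_tone + IMD3/2, which holds when the third-order products rise 3dB per dB of tone power. The sketch below applies it to the reported numbers. The per-tone output power is not stated in the text, so the value used here is only the one implied by the reported IMD3 and OIP3; the function names are hypothetical.

```python
# Two-tone relation between tone power, IMD3 and output third-order intercept.
# Assumption: ideal third-order behaviour (IM3 slope of 3 dB/dB).

def oip3_dbm(p_tone_dbm: float, imd3_db: float) -> float:
    """OIP3 from the per-tone output power and the IMD3 (tone-to-IM3 ratio)."""
    return p_tone_dbm + imd3_db / 2.0


def implied_tone_power_dbm(oip3: float, imd3_db: float) -> float:
    """Inverse relation: per-tone power consistent with a given OIP3 and IMD3."""
    return oip3 - imd3_db / 2.0


IMD3_DB = 59.0    # measured, WLAN mode
OIP3_DBM = 29.5   # reported, WLAN mode

p_tone = implied_tone_power_dbm(OIP3_DBM, IMD3_DB)
print(f"implied per-tone output power: {p_tone:.1f} dBm")
print(f"OIP3 recomputed              : {oip3_dbm(p_tone, IMD3_DB):.1f} dBm")
```

With the reported values the implied per-tone output power is 0dBm.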

Table 1 summarizes the measured performance of the fabricated device. In spite of the low supply voltage used and the limited power consumption, the achieved linearity performance is remarkable and proves the effectiveness of the proposed design. Regarding the noise performance, in both cases the achieved dynamic range fulfils the resolution requirements.

4. CONCLUSIONS<br />

In this paper a low-voltage reconfigurable DAC system (current-steering DAC + transimpedance stage + low-pass reconstruction filter) for wireless transmitters has been presented. The implemented block is capable of processing WLAN a/b/g and UMTS signals, thanks to a digital control which changes the filter transfer function. The device has been integrated in a 0.13µm CMOS technology and operates with a 1.2V supply voltage. The use of the transimpedance stage, in conjunction with proper design choices, allows the low-voltage block to achieve high linearity (full-scale SFDR of 58dB, OIP3 of 29.5dBm, for the WLAN case). The current consumption is limited to 16.2mA and 14mA for the WLAN and UMTS modes, respectively.

Figure 3. Chip microphotograph.


Figure 4. DAC DNL-INL. [Plot: INL and DNL in LSB units (8 bit), from -0.3 to 0.3, versus DAC input code / DAC code transition, from 0 to 250.]

Figure 5. Output spectrum with a 3MHz FS tone (WLAN setting). [Plot: power spectral density (dBm), from -90 to 10, versus frequency (MHz), from 0 to 10; RBW = 10kHz.]

Figure 6. Output spectrum with two -6dBFS tones (WLAN setting). [Plot: power spectral density (dBm), from -90 to 10, versus frequency (MHz), from 1 to 5; RBW = 10kHz.]

Figure 7. Output spectrum with a WLAN 802.11a input signal. [Plot: power spectral density (dBr), from -60 to 0, versus frequency (MHz), from 0 to 35; traces: WLAN signal spectrum and transmit spectrum mask.]

Figure 8. Output spectrum with a UMTS input signal. [Plot: power spectral density (dBr), from -70 to 0, versus frequency (MHz), from 1 to 10.]

Table 1. Block performance summary.

| Parameter | Value |
|---|---|
| Technology | CMOS 0.13µm |
| Supply voltage | 1.2V |
| Core area | 0.9mm² |
| DAC DNL/INL | ≤ 0.1LSB / ≤ 0.25LSB |

| Standard | WLAN | UMTS |
|---|---|---|
| Conversion frequency | 100MHz | 50MHz |
| Filter bandwidth | 11MHz | 2.5MHz |
| Differential output swing | 1.4Vpp | 1.4Vpp |
| DR (@FS) | 52dB (8.3 bit) | 56dB (9 bit) |
| SFDR (@FS) | 58dB (@ 3MHz) | 60dB (@ 0.6MHz) |
| OIP3 | 29.5dBm | 31dBm |
| Total current consumption | 16.2mA | 14mA |
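The bit equivalents quoted next to the dynamic range values are consistent with the standard relation for an ideal quantizer driven by a full-scale sine wave, DR = 6.02·N + 1.76dB. The paper does not state which formula was used, so the sketch below is only a plausibility check, and the function name is hypothetical.

```python
# Equivalent number of bits from a dynamic range figure, using the
# ideal-quantizer relation DR = 6.02*N + 1.76 dB (assumed, not stated in the paper).

def equivalent_bits(dr_db: float) -> float:
    return (dr_db - 1.76) / 6.02


for mode, dr_db in (("WLAN", 52.0), ("UMTS", 56.0)):
    print(f"{mode}: DR = {dr_db:.0f} dB -> {equivalent_bits(dr_db):.2f} bit")
```

The results, roughly 8.3 bit for 52dB and 9.0 bit for 56dB, match the values given in the summary.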

5. REFERENCES<br />

[1] Digest of Technical Papers of the 2004 and 2005 IEEE International Solid-State Circuits Conference, Session 5: WLAN Transceivers.

[2] D. Giotta et al., “Low-Power 14-bit Current Steering DAC, for ADSL2+/CO Applications in 0.13µm CMOS”, Proc. of ESSCIRC 2004, pp. 163-166, September 2004.

[3] “Enabling technologies for wireless reconfigurable terminals”, Italian National Program FIRB, Contract n°RBNE01F582, web-site http://ims.unipv.it/firb/.

[4] M. Albiol et al., “Mismatch and dynamic modeling of current sources in current-steering CMOS D/A converters: an extended design procedure”, IEEE Trans. on Circuits and Systems, vol. 51, no. 1, pp. 159-169, January 2004.

[5] M. J. M. Pelgrom et al., “Matching properties of MOS transistors”, IEEE J. Solid-State Circuits, vol. 24, pp. 1433-1439, Oct. 1989.

[6] S. D’Amico et al., “Low-power reconfigurable baseband block for UMTS/WLAN transmitters”, Proc. of NORCHIP, pp. 103-106, November 2004.


0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

ADAPTIVE MIXERS WITH A DISCRETELY AND<br />

A CONTINUOUSLY ADJUSTABLE<br />

PERFORMANCE SPACE<br />

Maja Vidojkovic, Johan van der Tang, Peter Baltus*, Arthur van Roermund

Eindhoven University of Technology, E.H. 5.06, P.O.Box 513, 5600MB, Eindhoven, The Netherlands, *ASL Philips, Eindhoven, The Netherlands

E-mail: m.vidojkovic@tue.nl<br />

ABSTRACT<br />

The demands for multi-standard, multi-band mobile handsets have motivated the development of analog front-ends with flexible circuit topologies that can support a range of applications via adjustability and configurability. Accordingly, Gilbert cell mixers with a discretely and a continuously adjustable performance space are presented and analyzed. A discretely adjustable Gilbert cell is designed in a CMOS 0.25um technology and a continuously adjustable Gilbert cell is implemented in the same technology. At 2.5GHz the following performance ranges of the discretely adjustable Gilbert cell are obtained: a voltage gain from 15dB to -2dB, a NF from 16dB to 9dB and an IIP3 from 11dBm to -8dBm. This performance is achieved for supply currents in the range from 1mA to 10mA. At 2.5GHz the measured performance ranges of the continuously adjustable Gilbert cell are: a voltage gain from 11dB to 7.5dB, a NF from 15.2dB to 12.7dB and an IIP3 from 4.2dBm to 0dBm. This performance is measured by adapting the supply current in the range from 3.3mA to 4.1mA. The combination of continuous tuning and discrete tuning is a promising solution for covering a large desired specification space.

1. INTRODUCTION<br />

The coexistence of all wireless technologies in one system, and the need for cheap and quick solutions for new standards, require multi-mode, multi-band and multi-standard mobile terminals. Hence, the demands for multi-standard mobile handsets have motivated the development of analog front-ends with flexible circuit topologies that can support a range of applications via adjustability and configurability. The concept of adjustability only makes sense if the reuse count of the flexible circuits (how many times the circuit is used with a minimum amount of design effort) justifies the investment. This, in turn, is strongly related to the performance space (the set of specifications that can be covered with one circuit, which is a sub-set of the total design space) that can be covered by the adjustable circuit.

This paper is organized as follows. In section 2 the performance space of the Gilbert cell (see Fig. 1), one of the most common mixer topologies, is investigated. An optimization procedure for its design is given in [1]. The impact of discrete variation on the performance of this topology is investigated. In section 3, as an example, a Gilbert cell mixer with an extended continuously adjustable performance space is presented. In section 4 the combination of continuous tuning and discrete tuning is recommended as future work, as a promising solution to cover a desired specification space. Section 5 is reserved for the conclusions.

Figure 1. The Gilbert cell<br />

2. A GILBERT CELL WITH A<br />

DISCRETELY ADJUSTABLE<br />

PERFORMANCE SPACE<br />

A Gilbert cell with a discretely adjustable performance space (see Fig. 2) is based on the Gilbert cell. The switching stage is the same as in the Gilbert cell: a cross-coupled differential pair consisting of the transistors M3, M4, M5 and M6. The load resistors and the transconductance stage are programmable. To facilitate the discrete adjustability, the NMOS transistors Msw1, Msw2, …, MswN and M̄sw1, M̄sw2, …, M̄swN, and the PMOS transistors MR1, MR2, …, MRN and M̄R1, M̄R2, …, M̄RN, operate as switches (the second set of each type is written here with an overbar). The transistors M11, M12, …, M1N and M21, M22, …, M2N implement the N differential pairs of the transconductance stage, which can be switched ON and OFF. Different combinations of the switches give different effective transistor sizes. Each combination of the differential pairs can be biased by one of the current sources Ibias1, Ibias2, …, IbiasN. Hence, there are 2^N combinations of (differential pair i, Ibias,l), where i=1...N, l=1...N. For each group (differential pair i, Ibias,l) there is a pair of load resistors RG1 and RL1, RG2 and RL2, …, RGN and RLN. RG,i is the load resistance that gives the maximum voltage gain for a certain group (differential pair i, Ibias,l), and RL,i is the load resistance that gives the maximum linearity for the same group. The biasing voltages (Vrfdc, Vlodc) differ for each combination of (differential pair i, Ibias,l, RG,n and RL,n), where i, l, n = 1…N. In this way different sets of specifications can be achieved.

Figure 2. The Gilbert cell with a discretely<br />

adjustable performance space<br />

The optimization procedure for the discretely adjustable Gilbert cell is similar to that for the Gilbert cell. For a desired NF a certain combination of (differential pair i, Ibias,l), i=1...N, l=1...N, is chosen. In addition, for each combination of (differential pair i, Ibias,l) the biasing voltages (Vrfdc, Vlodc) and the load resistor value (R) are adjusted in such a way that a maximum voltage gain is achieved. In this way the value of RG,l is determined. Then the value of RL,l for which the maximum input-referred IIP3 is achieved is determined.

Theoretically the number N can be unlimited. In practice it is limited and is a trade-off between the degradation of mixer performance and a desirable resolution of the adjustability. In order to estimate the performance degradation, the impact of the MOSFET switches on the mixer performance has to be considered.

Less impact of the switches on the voltage gain and the NF can be expected by decreasing the resistance Rsw,i of the transistor Msw,i when it is ON and increasing the impedance Z̄sw,i of the transistor M̄sw,i (the second set of switches is written here with an overbar). An increased W/L ratio of the transistor Msw,i decreases Rsw,i when the transistor is ON, but introduces more parasitics and results in increased chip area. A smaller W/L ratio of the transistor M̄sw,i increases Z̄sw,i when the transistor is OFF, but increases R̄sw,i when the transistor is ON. However, a high value of R̄sw,i does not influence the voltage gain and the NF since it is not in the signal path. The resistance values RR,i and R̄R,i of the transistors MR,i and M̄R,i, respectively, when they are ON can be neglected, since they are in series with the high load resistance. The parasitic capacitance when the transistors are OFF can also be neglected since the output signal is the intermediate frequency (IF) signal at relatively low frequencies.

Assuming a perfect square wave for the local oscillator (LO) voltage and neglecting the impact of the switches, the voltage gain of this topology can be calculated as:

$$G = 20\log\left(\frac{2}{\pi}\,R\left(\sum_{i=1}^{N} g_{mi}\right)\right) \tag{1}$$

where gmi is the transconductance of Mi when the transistor is ON, and R is RGi or RLi depending on the desired performance. Taking noise folding from the frequency fRF + flo into account and neglecting the thermal noise of the switching stage and the flicker (1/f) noise, the NF of this topology can be approximated by:

$$NF = 10\log\left(2 + \frac{2\sum_{i=1}^{N} R_{sw,i}}{R_s} + \frac{4\gamma}{\left(\sum_{i=1}^{N} g_{mi}\right) R_s} + \frac{\pi^2}{2 R R_s \left(\sum_{i=1}^{N} g_{mi}\right)^2}\right) \tag{2}$$
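To illustrate how the gain expression of Eq. (1), G = 20log((2/π)·R·Σgmi), produces a discrete set of gain settings, the sketch below enumerates the ON/OFF states of the switchable differential pairs. It assumes three pairs with the first one always ON. All transconductance values and the load resistance are hypothetical illustration values, not data from the paper.

```python
# Discrete gain settings from Eq. (1): G = 20*log10((2/pi) * R * sum(gm_i)).
# All numerical values below are hypothetical and only serve as an illustration.
import math
from itertools import product


def voltage_gain_db(r_load: float, gm_on: list) -> float:
    """Voltage gain in dB for the differential pairs that are switched ON."""
    return 20.0 * math.log10((2.0 / math.pi) * r_load * sum(gm_on))


GM_FIXED = 4e-3             # hypothetical gm of the always-ON pair [S]
GM_SWITCHED = [8e-3, 8e-3]  # hypothetical gm of the two switchable pairs [S]
R_LOAD = 250.0              # hypothetical load resistance [ohm]

gains = {}
for state in product((0, 1), repeat=len(GM_SWITCHED)):
    gm_on = [GM_FIXED] + [gm for gm, on in zip(GM_SWITCHED, state) if on]
    gains[state] = voltage_gain_db(R_LOAD, gm_on)
    print(f"switch state {state}: G = {gains[state]:.2f} dB")
```

Each additional pair that is switched ON raises the total transconductance and therefore the gain, which is the mechanism behind the discrete adjustability described above.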

The total inter-modulation in the discretely adjustable Gilbert cell can be approximated in a similar way as the total inter-modulation in the Gilbert cell. The total inter-modulation is approximately equal to the sum of the inter-modulation values that the transconductance stage and the switching stage would generate if the other stage were ideal [3]. Hence, in the case RL,l is ON, the inter-modulation is limited by the transconductance stage. In the case RG,l is ON, a high voltage drop at the drain of the transistors in the switching stage drives them into the triode region. Then the inter-modulation is determined by both the transconductance stage and the switching stage.

In order to compare the fixed and the discretely adjustable Gilbert cell, both topologies are simulated using the circuit simulator SpectreRF in a CMOS 0.25um technology. For this specific case the transconductance stage of the discretely adjustable Gilbert cell consists of 3 differential pairs, where M11 has W/L=50/0.25 and M12 and M13 have W/L=100/0.25. M12 and M13 can be switched ON and OFF. The combinations of transistor sizes, Ibias and load resistors used in this example are presented in Table 1. The sizes of the switches are: Msw,i has W/L=100/0.25, M̄sw,i has W/L=10/0.25, and MR,i and M̄R,i have W/L=100/0.25 (the second set of switches is written here with an overbar). The fixed Gilbert cell is simulated separately for each combination.

Increasing Ibias beyond 10mA improves the NF only slightly while the power dissipation becomes very high. A further decrease of the transistor size (below 50um) brings the transistors out of saturation for Ibias=10mA. A further increase of the transistor sizes (above 250um) slightly improves the voltage gain and the NF, while parasitics degrade circuit performance at higher frequencies. Increasing RGi does not improve the voltage gain any further. A lower value of RLi slightly improves the IIP3.


Table 1. The chosen combinations for the discretely adjustable Gilbert cell

| Transistor sizes | M11 (50um) | M11 and M12 (150um) | M11, M12, M13 (250um) |
|---|---|---|---|
| Ibias=1mA | RG1=2.5K, RL1=2K | RG1=2.5K, RL1=2K | RG1=2.5K, RL1=2K |
| Ibias=5mA | RG2=550, RL2=350 | RG2=550, RL2=350 | RG2=550, RL2=350 |
| Ibias=10mA | RG3’=180, RL3=100 | RG3=250, RL3=100 | RG3=250, RL3=100 |

The simulated ranges for the voltage gain, the NF (the SSB NF is simulated at a 50Ω source impedance) and the IIP3 of the fixed Gilbert cell and the discretely adjustable Gilbert cell are given in Table 2.

Table 2. Simulation results at 2.5GHz

| Topology | Gain [dB] | NF [dB] | IIP3 [dBm] |
|---|---|---|---|
| Fixed Gilbert cell | Max=16, Min=-2 | Max=14, Min=7 | Max=12, Min=-8 |
| Discretely adjustable Gilbert cell | Max=15, Min=-2 | Max=16, Min=9 | Max=11, Min=-8 |

From the obtained results the following can be concluded. Significant flexibility is achieved. Moreover, emerging standards can easily fit in predesigned discretely adjustable topologies, and hence products can be brought to market very quickly. The price paid for programmability and adjustability is a degradation of the circuit performance due to the parasitic capacitances and the resistance of the switches, which are biased in the triode region when they are ON. This is especially the case when the number N is higher and at higher frequencies.

3. A GILBERT CELL WITH A<br />

CONTINUOUSLY ADJUSTABLE<br />

PERFORMANCE SPACE<br />

A continuously adjustable Gilbert cell (see Fig. 3) is also based on the Gilbert cell mixer. The switching stage is the same as in the Gilbert cell: a cross-coupled differential pair consisting of the transistors M3, M4, M5 and M6. The transconductance stage in the continuously adjustable topology differs from the transconductance stage in the fixed Gilbert topology. It consists of two differential pairs. One of them consists of the transistors M1 and M2 and is referred to as the fixed differential pair. It is biased by the current source Ibias1, as in the Gilbert cell. The other consists of the transistors M11 and M22 and is called the cross-coupled differential pair. It is biased by the current source Ibias2.

The optimisation procedure of the continuously adjustable Gilbert cell is the same as for the fixed topologies. In the case Ibias2 is zero, the adjustable Gilbert cell behaves as a conventional Gilbert cell. In the case Ibias2 is non-zero, the optimisation technique is the following. The biasing voltages (Vrfdc, Vlodc), the sizes of all transistors, and the load resistor value (R) are adjusted such that all transistors are in saturation. The next step is to optimize the switching stage and the fixed differential stage for a high gain and a low noise contribution. The cross-coupled differential stage is optimised to improve the linearity. By changing the Ibias1 and Ibias2 currents, different values for the conversion gain, the noise figure and the linearity can be achieved.

Figure 3: The cont<strong>in</strong>uously adjustable Gilbert cell<br />

Assuming a perfect square wave for the local oscillator (LO) voltage, the voltage gain of the adjustable Gilbert topology can be approximated as:

$$G = 20\log\left(\frac{2}{\pi}\,R\,(g_{m1} - g_{m2})\right) \tag{3}$$

where gm1 is the transconductance of M1 and M2 and gm2 is the transconductance of M11 and M22.

Taking noise folding from the frequency fRF + flo into account, the NF of the adjustable mixer can be approximated by:

$$NF = 10\log\left(2 + \frac{4\gamma\,(g_{m1} + g_{m2})}{(g_{m1} - g_{m2})^2 R_s} + \frac{\pi^2}{2 R R_s (g_{m1} - g_{m2})^2}\right) \tag{4}$$
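The trend implied by Eqs. (3) and (4) can be made concrete with a small numerical sketch. The gain is taken as 20log((2/π)·R·(gm1 − gm2)) and the noise factor as 2 + 4γ(gm1 + gm2)/((gm1 − gm2)²·Rs) + π²/(2·R·Rs·(gm1 − gm2)²). The load and source resistances follow the measurement setup reported in the paper (R=500Ω, Rs=50Ω); the transconductance values and γ = 2/3 are hypothetical illustration values, not taken from the paper.

```python
# Gain and NF of the continuously adjustable Gilbert cell versus gm2,
# following the forms of Eqs. (3) and (4). gm values and gamma are assumptions.
import math


def gain_db(r_load: float, gm1: float, gm2: float) -> float:
    """Voltage gain [dB]; valid only for gm1 > gm2."""
    return 20.0 * math.log10((2.0 / math.pi) * r_load * (gm1 - gm2))


def nf_db(r_load: float, r_s: float, gm1: float, gm2: float, gamma: float) -> float:
    """Noise figure [dB]; valid only for gm1 > gm2."""
    gm_eff = gm1 - gm2
    noise_factor = (
        2.0
        + 4.0 * gamma * (gm1 + gm2) / (gm_eff ** 2 * r_s)
        + math.pi ** 2 / (2.0 * r_load * r_s * gm_eff ** 2)
    )
    return 10.0 * math.log10(noise_factor)


R_LOAD = 500.0      # load resistance [ohm], from the measurement setup
R_SOURCE = 50.0     # source resistance [ohm], from the measurement setup
GAMMA = 2.0 / 3.0   # assumed long-channel noise coefficient
GM1 = 10e-3         # hypothetical gm of the fixed pair [S]

sweep = [(gm2, gain_db(R_LOAD, GM1, gm2), nf_db(R_LOAD, R_SOURCE, GM1, gm2, GAMMA))
         for gm2 in (0.0, 1e-3, 2e-3, 3e-3)]
for gm2, g, nf in sweep:
    print(f"gm2 = {gm2 * 1e3:.0f} mS: G = {g:.2f} dB, NF = {nf:.2f} dB")
```

As gm2 grows (that is, as Ibias2 is increased), the effective transconductance gm1 − gm2 shrinks, so the gain falls and the NF rises.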

In order to analytically estimate the inter-modulation, the drain current is modeled by [3]:

$$I = f(V_{gs} - V_t) = K\,\frac{(V_{gs} - V_t)^2}{1 + \theta\,(V_{gs} - V_t)} \tag{5}$$

where (Vgs − Vt) is the overdrive voltage of a transistor and the coefficient θ models the source series resistance, mobility degradation because of the vertical field, and short-channel effects such as velocity saturation. For a 0.25-um CMOS technology θ = 2.5 V^-1, approximately [3]. K is expanded as K = µCox W/L. The output current from the transconductance stage can be described as a Taylor series expansion in the input voltage vgs:

$$i = i_1 - i_2 = (a_1 - b_1)\,v_{gs} + (a_2 - b_2)\,v_{gs}^2 + (a_3 - b_3)\,v_{gs}^3 + \dots \tag{6}$$

where a1 = dIbias1/dVgs1, a2 = d²Ibias1/dVgs1², a3 = d³Ibias1/dVgs1³, b1 = dIbias2/dVgs2, b2 = d²Ibias2/dVgs2², b3 = d³Ibias2/dVgs2³, (Vgs1 − Vt) is the overdrive voltage of the transistors M1 and M2, and (Vgs2 − Vt) is the overdrive voltage of the transistors M11 and M22. In order to cancel the third-order distortion products, each stage should be designed such that a3 = b3 [2]. For a3 = b3 the following relation is estimated:

$$V_{gs2} - V_t \approx \frac{1}{\theta}\left(\sqrt[4]{\frac{W_{d2}}{W_{d1}}} - 1\right) + \sqrt[4]{\frac{W_{d2}}{W_{d1}}}\,(V_{gs1} - V_t) \tag{7}$$

where Wd1 is the width of M1 and M2, and Wd2 is the width of M11 and M22. For this specific design the target is maximum adjustability over a certain range rather than design for the point of “ideal” inter-modulation cancellation.
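The cancellation condition can be traced back to the drain current model with a few lines of algebra. The following is a sketch of one possible derivation, not necessarily the route taken in [2]. It writes x = Vgs − Vt and assumes that both differential pairs share the same θ, µCox and channel length, so that K2/K1 = Wd2/Wd1.

```latex
% Current model with x = V_{gs} - V_t, split into partial fractions
I = K\,\frac{x^2}{1+\theta x}
  = K\left[\frac{x}{\theta} - \frac{1}{\theta^2} + \frac{1}{\theta^2\,(1+\theta x)}\right]

% Only the last term contributes to the third derivative
\frac{d^3 I}{dx^3} = -\frac{6K\theta}{(1+\theta x)^4}

% Setting a_3 = b_3 for the two pairs (K_1, x_1) and (K_2, x_2)
\frac{K_1}{(1+\theta x_1)^4} = \frac{K_2}{(1+\theta x_2)^4}
\;\Rightarrow\;
1+\theta x_2 = \sqrt[4]{\frac{K_2}{K_1}}\,(1+\theta x_1)

% With K_2/K_1 = W_{d2}/W_{d1} (assumed equal lengths and \mu C_{ox})
x_2 = \frac{1}{\theta}\left(\sqrt[4]{\frac{W_{d2}}{W_{d1}}} - 1\right)
      + \sqrt[4]{\frac{W_{d2}}{W_{d1}}}\;x_1
```

Under these assumptions the last line gives the overdrive voltage of the cross-coupled pair at which the third-order terms of the two pairs cancel.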

Figure 4: Measured and simulated results. [Plot: G [dB] and NF [dB] on the left axis, from 7 to 15, and IIP3 [dBm] on the right axis, from 0 to 6, versus Ibias2 [mA], from 0 to 0.8.]

The input signal is set to 2.5 GHz, while the output signal is measured at an IF of 2 MHz. The SSB NF is measured with a 50 Ω source resistance. The differential LO voltage swing is Vlo = 500 mVp, the supply voltage is 2.5 V, Ibias1 = 3.3 mA, R = 500 Ω, Ld1 = Ld2 = Ls = 0.25 um (Ld1 is the length of M1 and M2; Ld2 is the length of M11 and M22), Wd1 = 100 um, Wd2 = 40 um, and Ws = 80 um (Ws and Ls are the width and length of M3, M4, M5 and M6).<br />

Figure 5: Die micrograph of the adjustable<br />

Gilbert cell<br />

The measured (solid line) and simulated (dashed line) results are presented in fig. 4. As Ibias2 increases, the voltage gain decreases, the NF increases, and the linearity at first improves due to cancellation of the third-order distortion products. With a further increase of Ibias2 the linearity degrades, for two reasons. First, the difference of the third derivatives (a3 − b3) of the I-V curve becomes constant [3], while the first derivatives (a1 − b1) continue to cancel each other (IIP3 ~ (a1 − b1)/(a3 − b3)). Second, an increase of the total bias current increases the voltage drop at the drain of the transistors in the switching stage, which brings the switching transistors into the triode region.<br />

The die photo of the realized adjustable Gilbert cell is shown in fig. 5. The active chip area is 180 um × 210 um. The fixed Gilbert cell (fig. 1) was also realized. For the same design parameters as the continuously adjustable Gilbert cell and Ibias1 = 3.3 mA, a voltage gain of 13 dB, an NF of 12 dB and an IIP3 of 1 dBm are obtained.<br />

4. COMBINED METHOD TO VARY<br />

TOPOLOGY PERFORMANCE<br />

Based on the achieved results, some general recommendations for choosing methods to vary the performance of a topology can be proposed. Values of design parameters can be tuned continuously (analog tuning). Alternatively, each design parameter can be tuned discretely by switching between different values (digital tuning). The trends in the achieved results show that increasing the number of switches in discrete tuning degrades topology performance. The achievable continuous tuning range is also limited, because topology performance degrades as the tuning range widens. A combination of continuous and discrete tuning is therefore a promising solution to cover all points of interest in the overall specification space of a given topology.<br />

5. CONCLUSION<br />

Gilbert cell mixers with discretely and continuously adjustable performance spaces are presented and analyzed. A test circuit of a discretely adjustable Gilbert cell is designed in 0.25 um CMOS technology, and a test circuit of a continuously adjustable Gilbert cell is implemented in the same technology. A combination of continuous and discrete tuning is expected to be a good solution to cover the desired specification space of a given topology with little performance degradation.<br />

6. REFERENCES<br />

[1] V. Vidojkovic, et al., “Mixer Topology Selection for a Multi-Standard High Image-Reject Front-End”, IEEE, May 2003.<br />

[2] T. P. Weldon, “New Method for Amplifier Linearization”, University of North Carolina at Charlotte, 2002.<br />

[3] M. T. Terrovitis and R. G. Meyer, “Intermodulation Distortion in Current-Commutating CMOS Mixers”, IEEE Journal of Solid-State Circuits, vol. 35, no. 10, October 2000.<br />



A 1.5V 8-BIT LOW-POWER SELF-<br />
CALIBRATING HIGH-SPEED FOLDING ADC<br />

Hamid Movahedian, Mehrdad Sharif Bakhtiar<br />
Electrical Engineering Department<br />
Sharif University of Technology<br />
Tehran, IRAN<br />
Email: movahedian@sharif.ir<br />

ABSTRACT<br />

An 8-bit high-speed folding/interpolating ADC is presented. Designed in 0.18μm CMOS technology, the ADC dissipates only 50mW from a single 1.5V supply. A novel technique based on using both N and P folding cells is used to widen the input range, and a self-calibration technique based on trimmable MOSFETs is employed to improve the static and dynamic performance.<br />

1. INTRODUCTION<br />

Among ADC structures, flash and folding are the first choices for high-speed applications. Although the flash architecture has been used in the highest-speed ADCs, its severe disadvantage of requiring 2^N − 1 comparators for N bits of resolution has limited practical flash ADC resolutions to less than 8 bits. The folding structure effectively reduces the number of comparators by adding an analog pre-processing stage before the comparators. However, the folding structure is sensitive to mismatch between elements (i.e. voltage and current offsets). Although the mismatch performance can be improved by techniques such as averaging and interpolating, large MOSFET device sizes are still required to maintain reasonable offsets, which in turn degrade the maximum sampling frequency. As a result, considerable power must be consumed to satisfy both the static and dynamic performance requirements in classic folding ADCs.<br />

Another challenge in designing folding ADCs in scaled CMOS processes is the reduction of the power supply voltage, which reduces the input range and the voltage efficiency (i.e. the ratio of the input voltage range to the supply voltage). As voltage efficiency affects many of the design parameters (matching considerations, input capacitance, power dissipation, etc.) [1], narrowing the input range results in a more complex design process. The restriction is more severe when the supply voltage is less than the nominal value of the technology (as is the case in our design).<br />

Nevertheless, the number of reported low-voltage folding ADCs is increasing [2,3,4]. The mismatch-induced errors are usually reduced by using averaging and interpolating and/or self-calibration. The effects of reduced power supply on static and dynamic behaviour are generally compensated at the cost of increased power dissipation.<br />

Considering the energy per conversion step as a figure of merit [2]:<br />

$$F = E/\text{conv. step} = \frac{\text{Power}}{2^{ENOB} \times F_s} \qquad (1)$$<br />

For all the referred cases, F is more than 2.8pJ. As power consumption is an important parameter, especially from an industrial point of view, it would be very attractive to devise methods that use the advantages of the folding architecture without sacrificing power and/or chip area.<br />

The folding ADC proposed in this paper utilizes two novel techniques, one for increasing voltage efficiency and the other for self-calibration and compensation of static mismatch. By using these techniques, both the dynamic and static performance are improved without any need to increase the power. Simulation results show an energy per conversion step of less than 1.4pJ.<br />
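As a quick worked example of Eq. (1), the sketch below evaluates the figure of merit for the simulated values reported in this paper (50 mW, an ENOB of 7 bits, 300 Msample/s). The function name is an illustrative choice, not something defined by the authors.

```python
def energy_per_conversion_step(power_w, enob_bits, fs_hz):
    """Figure of merit of Eq. (1): Power / (2**ENOB * Fs), in joules."""
    return power_w / (2.0 ** enob_bits * fs_hz)


# Simulated values reported in this paper: 50 mW, ENOB of 7 bits, 300 MS/s.
fom_j = energy_per_conversion_step(50e-3, 7.0, 300e6)
print(f"F = {fom_j * 1e12:.2f} pJ per conversion step")
```

This gives roughly 1.3 pJ per conversion step, consistent with the "less than 1.4pJ" figure quoted above.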

2. IMPROVEMENT TECHNIQUES<br />

2.1 Widening Usable Input Voltage Range<br />

The main building block of every folding amplifier is a folding cell, which consists of one or two differential pair(s). The technique used to increase the usable input range is based on the use of both N and P differential pairs as preamplifiers and folding amplifiers [5]. In other words, the lower part of the input range is covered by P differential pairs while the upper part is covered by N differential pairs. In this way, the input range can be virtually extended at both ends to the supply limits (0 and Vdd). The usable input range, however, is smaller if extra folders are used to allow interpolation at the ends of the input range and for dc balance. For example, to obtain a folding factor of 8, using 11 folding amplifiers is a practical choice [6]. For such an architecture, the input signal range is 8/11 × 1.5 ≈ 1.1V for a 1.5V supply voltage.<br />


Figure 1. Wide Input Range Folding Block<br />

The output currents of the P-folders and N-folders are summed before being converted to voltages in active loads. Fig. 1 shows the proposed structure.<br />

As the summing node of the output currents is divided into two nodes in this structure, the total capacitance hanging from each node is reduced to about one half of the former case. In this way, the summing nodes have potentially higher bandwidths.<br />

As can be seen in Fig. 1, there are two folding cells for which one of the neighboring folders is N-type while the other is P-type. As the characteristic curves of N and P folding amplifiers are not matched, the overall characteristic curve is distorted at the interface between the two subfolders. In order to compensate for the mismatch between the characteristic curves of N- and P-folding cells, the gain of the N-preamplifiers is adjusted such that the two types of folding blocks have the same characteristic curve. This is done in additional circuitry consisting of replicas of an N- and a P-folding cell and an op-amp that adjusts the bias of the tail current source of the N-folding preamplifier (Fig. 2). The input values of the replica folding cells are selected from the reference voltages in such a way that the gains are equalized in the region between the N and P sub-ranges. The same bias voltage is applied to the tail current sources of all the N-preamplifiers.<br />

Figure 2. Equalizing gain of N- and P-folding cells<br />

2.2 Self-Calibration<br />

Considering the major sources of mismatch as independent random variables, their effect in terms of mean-squared values can be mathematically expressed as parts of an input-referred offset. The effect of such an input offset can be cancelled by introducing a “programmable input offset”. In continuous-time offset-cancellation techniques, the input offset is sampled on a capacitor before signal sampling and is subtracted from the signal during the normal sampling phase [7]. This technique, however, limits the sampling frequency, as the calibration process must be repeated before each sampling. Static or start-up calibration, on the other hand, can improve the dynamic behavior. In [4], current DACs are used to subtract the measured offset from the signal at the output of the preamplifiers. This has led to a considerable improvement in both the static and dynamic behavior, but at the cost of increased power and chip area.<br />

For a classic differential amplifier with resistive loads (R_D), the input-referred offset of the preamplifier is:<br />

$$V_{off} = \frac{V_{GS} - V_{th}}{2}\left(\frac{\Delta R_D}{R_D} + \frac{\Delta (W/L)}{W/L}\right) - \Delta V_{th} \qquad (2)$$<br />

Δ(W/L) in Eq. (2) can be adjusted using a “trimmable MOSFET” as part of one of the input transistors of the folding cells. The trimmable MOSFET consists of a number of switchable parallel transistors with different (W/L)’s and acts as a programmable-W/L transistor. It is usually used for trimming binary-weighted current sources in DACs [8].<br />

With proper weighting of (W/L)M1 … (W/L)Mn, the preamplifier containing the trimmable MOSFET(s) acts as a D/A converter which converts an n-bit digital word to a programmable input offset that can cancel the mismatch-induced offset. An n-bit latch is used to store the proper digital word.<br />

Fig. 3 shows a 4-bit digital-to-offset converter used as the preamplifier in N-folding cells (a similar structure with P-transistors is used for P-folding cells). To maintain dynamic matching, another trimmable MOSFET (with a fixed code) is used for the other input transistor.<br />
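The digital-to-offset conversion can be illustrated with a small numerical sketch that uses only the Δ(W/L) term of Eq. (2). All device values below are hypothetical, and the choice of a mid-scale fixed code for the opposite input device is an assumption made here to obtain a bipolar offset; the paper does not state which fixed code is used.

```python
def programmable_offset(code, n_bits, vov, wl_main, wl_unit):
    """Offset contribution (V) of the Delta(W/L) term of Eq. (2) for a trim code.

    code    : integer trim word, 0 .. 2**n_bits - 1
    vov     : overdrive voltage V_GS - V_th of the input pair (hypothetical)
    wl_main : W/L of the fixed part of the input transistor (hypothetical)
    wl_unit : W/L switched in by the least significant bit (hypothetical)
    The opposite input device is assumed to hold the mid-scale code.
    """
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code out of range")
    mid_code = 2 ** (n_bits - 1)
    wl_ref = wl_main + mid_code * wl_unit    # device with the fixed code
    delta_wl = (code - mid_code) * wl_unit   # binary-weighted legs add linearly
    return 0.5 * vov * delta_wl / wl_ref


for code in (0, 8, 15):
    mv = 1e3 * programmable_offset(code, n_bits=4, vov=0.2, wl_main=40.0, wl_unit=0.5)
    print(f"code {code:2d}: {mv:+.2f} mV")
```

In this simplified model the offset step per LSB is set by the ratio of the unit W/L to the total W/L, scaled by half the overdrive voltage, so the trim range and resolution can be traded off through the unit size.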

Figure 3. 4-bit Digital-to-Offset Converter<br />


The next step is devising a procedure for offset cancellation. The ultimate goal of the offset cancellation procedure is to reach the situation:<br />

$$V_{IN} = V_{ref_i} \Rightarrow V_{OUT} = 0 \qquad (3)$$<br />

This procedure is quite straightforward:<br />

1- The two inputs of each folding cell (VIN and Vrefi) are connected together to enforce the condition VIN = Vrefi.<br />

2- The programmable offset is set to its most negative value. This value must be larger in magnitude than the highest expected value of the mismatch-induced offset.<br />

3- The programmable offset is incremented in (1/2^n)·VFullScale steps until the output reaches zero (i.e. the output of the comparator toggles).<br />

4- The last value of the n-bit digital word is stored in the latch and the inputs are disconnected. The number of bits is chosen to meet the maximum allowable INL.<br />
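The four steps above can be sketched as a simple search loop. This is a behavioural illustration only: `calibrate_cell`, `make_cell`, the 2 mV step and the 5 mV mismatch are hypothetical, and the folding cell is reduced to a toy model in which the net offset is the mismatch offset plus the programmable offset.

```python
def calibrate_cell(comparator_output, n_bits):
    """Return the trim code at which the comparator first toggles.

    comparator_output(code) -> bool is a hypothetical callback that applies
    `code` to the trimmable MOSFET (with V_IN tied to V_ref_i) and reports
    whether the cell output is at or above zero.
    """
    last_code = 2 ** n_bits - 1
    for code in range(last_code + 1):      # code 0 = most negative offset
        if comparator_output(code):        # output crossed zero: stop here
            return code
    return last_code                       # offset beyond range: saturate


def make_cell(mismatch_offset_v, lsb_v, n_bits):
    """Toy folding-cell model: net offset = mismatch + programmable offset."""
    most_negative = -(2 ** (n_bits - 1)) * lsb_v

    def comparator_output(code):
        return mismatch_offset_v + most_negative + code * lsb_v >= 0.0

    return comparator_output


N_BITS, LSB_V = 4, 2e-3                    # hypothetical 4-bit trim, 2 mV steps
cell = make_cell(mismatch_offset_v=5e-3, lsb_v=LSB_V, n_bits=N_BITS)
code = calibrate_cell(cell, N_BITS)
residual = 5e-3 - (2 ** (N_BITS - 1)) * LSB_V + code * LSB_V
print(f"stored code = {code}, residual offset = {residual * 1e3:.1f} mV")
```

In this model the residual offset after calibration is always smaller than one step, which is why the number of bits follows from the INL target.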

The same procedure is repeated for every folding cell of each folding block. As this technique compensates static offsets, it is sufficient to perform it once if the latches are non-volatile memories, or once per power-up if the latches are volatile. In the latter case, it takes less than a millisecond after each power-up to perform self-calibration for the whole ADC.<br />

Fig. 4 shows the block diagram of the offset cancellation system.<br />

The self-calibration technique clearly reduces the overall chip area, because the added logic circuitry occupies negligible area compared to the large transistors needed to reach the same static performance without calibration. In this case, the overall chip area was reduced to about 0.6 mm².<br />

Figure 4. Offset Cancellation circuit<br />

Figure 5. Block Diagram of the 8-Bit Folding/Interpolating ADC<br />

Figure 6. Folding waveforms (Vout [V] versus Vin [V], with the usable range indicated)<br />

3. ADC ARCHITECTURE<br />

The block diagram of the designed ADC is shown in Fig. 5. It consists of 4 self-calibrated complementary folding blocks, resistive interpolating networks, 32 fine comparators, a digital encoder and a coarse ADC block. The entire input range of (0 – 1.5V) was folded 11 times, among them 8<br />

(0.15 V


Figure 7. Improved Comparator<br />

- Instead of tying the output nodes to ground, they were shorted together in the preamplification phase. In this way, these transistors can start the regeneration process shortly after the falling edge of the clock, resulting in lower dynamic offset.<br />

4. SIMULATION RESULTS<br />

Fig. 8 shows the simulated INL, based on Monte Carlo simulation of the ADC with and without self-calibration. Dynamic simulation of the designed ADC shows that the sampling frequency can be increased to 300 Msample/s. At this sampling rate, however, the ENOB drops to 7 bits at the Nyquist rate. The total power consumption is about 50mW, including the reference ladder power and the static and dynamic power consumption of the ADC (excluding the T&H circuit). Table 1 summarizes the simulation results in comparison with state-of-the-art folding ADCs.<br />

Figure 8. Simulated INL with and without self-calibration (INL versus output code, calibrated and uncalibrated)<br />

Table 1. Comparison with the state of the art.<br />

Parameter | This Work | [1] | [2] | [3]<br />
Technology | 0.18um | 0.12um | 0.18um | 0.18um<br />
Vdd | 1.5V | 1.2V | 1.8V | 1.8V<br />
Fs, Nyquist | 300 MHz | 100 MHz | 400 MHz | 800 MHz<br />
ENOB | 7 bit | 8.83 bit | 7.5 bit | 7.26 bit<br />

INL 0.4 LSB<br />

Power 50 mW<br />

? 0.8 LSB<br />

140 mW<br />



A 4GSa/s 7-BIT TWO-WAY TIME-INTERLEAVED<br />

BIPOLAR TRACK-AND-HOLD AMPLIFIER<br />

BASED ON A NOVEL ANALOG SWITCH<br />

Burak Çatlı 1,2 , Fidel Bayam 1,2 , Bilal Tarık Çavuş 1 , Asım Kepkep, Ali Zeki 1<br />

1 Istanbul Technical University, Faculty of Electrical <strong>and</strong> <strong>Electronics</strong> Eng.,<br />

<strong>Electronics</strong> <strong>and</strong> Communications Eng. Dept., 34469 Maslak, Istanbul, Turkey<br />

2 ETA-IC Design Center, KOSGEB B<strong>in</strong>asi, A Blok, 34469 Maslak, Istanbul, Turkey<br />

E-mail: catli@ehb.itu.edu.tr<br />

ABSTRACT<br />

Based on a novel analog switch, a two-way time-interleaved bipolar track-and-hold amplifier is proposed, achieving 7-bit resolution at a 4GSa/s sampling rate.<br />

1. INTRODUCTION<br />

The ever-increasing demand for broadband applications, such as optical communications, wideband radar and high-speed instrumentation, has made high-speed ADCs a crucial element. The performance of such ADCs, particularly their speed, is mainly limited by the track-and-hold (TH) amplifier, which is therefore a critical component.<br />

When the sampling rate of an ADC is limited by the speed of the TH amplifier, the high-speed conversion demand can be relaxed by applying time-interleaving to multiple copies of the ADC. The main disadvantages of a time-interleaved ADC are high power dissipation and large area occupation due to the repetition of sub-ADCs with all their blocks (TH amplifier, comparator array, decoder, etc.). To overcome the TH bottleneck with a simpler architecture, the “time-interleaved TH amplifier” technique is proposed. This technique confines parallelization to the TH block. Only a single copy of the remaining blocks (comparator array, decoder, etc.) is then required, so the excess power and area consumption drawbacks are apparently eliminated. Few examples of this approach have been reported [1-3]. Among these, only one can achieve the high-speed requirements of a broadband system, but with a 16-way structure, mandating complicated analog multiplexing circuitry and 16-phase clock generation [3]. In this paper, based on a novel analog switch (ASW), a 4GSa/s two-way time-interleaved TH amplifier is proposed, employing two 2GSa/s TH amplifiers.<br />

2. TRACK-AND-HOLD SYSTEM<br />

2.1 Proposed circuit<br />

The block diagram of the proposed circuit is shown in Fig. 1. The two TH amplifiers (TH-1 and TH-2) at the input are clocked to sample the input signal, similar to the TH amplifiers in a two-way time-interleaved ADC. The analog switches (ASW-1 and ASW-2), controlled by opposite clock signals (clk-t and clk-h), perform “analog multiplexing” to combine the sampled signals at the output and obtain an effective sampling rate of 4GSa/s. The proposed circuit, whose timing diagram is shown in Fig. 2, operates as follows. During the first time interval, t1-t2, TH-1 is in hold mode and the held differential signal at its outputs (out+, out-) is transferred to the output nodes of the circuit (out-p, out-n) via the analog switches ASW-1 and ASW-2 (for both, sel1=“1” and sel2=“0”). During the same interval, TH-2 is in track mode but its output is isolated from the output nodes of the circuit. At time t2, the hold period of TH-1 is over and the ASWs disconnect the output of TH-1 from the output nodes of the circuit. The interval t2-t3 is the hold period for TH-2, during which sel2=“1” and sel1=“0” for both switches, such that the held differential output signal of TH-2 is connected to the output nodes of the circuit. Thus, the ASWs combine two 2GSa/s sampled signals at the output to yield a 4GSa/s sampled signal.<br />
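The multiplexing described above can be illustrated with an idealized behavioural sketch in which each TH holds every second sampling instant and the output alternates between the two held values. Switch transients, feedthrough and clock skew are ignored, and all names are illustrative rather than taken from the design.

```python
import math

FS_TOTAL = 4e9      # effective sampling rate (Sa/s)
F_IN = 450e6        # input tone (Hz), as in the 450MHz simulation
N = 16              # number of output samples to generate


def vin(t):
    """Full-scale sine-wave input, normalized to unit amplitude."""
    return math.sin(2 * math.pi * F_IN * t)


t_step = 1.0 / FS_TOTAL
# Each TH holds every second sampling instant, i.e. it runs at 2GSa/s.
th1 = [vin(k * t_step) for k in range(0, N, 2)]   # even instants
th2 = [vin(k * t_step) for k in range(1, N, 2)]   # odd instants

# Analog multiplexing: the output alternates between the two held values.
out = [th1[k // 2] if k % 2 == 0 else th2[k // 2] for k in range(N)]

reference = [vin(k * t_step) for k in range(N)]   # ideal single 4GSa/s TH
print(out == reference)
```

In this ideal model the interleaved output is identical to that of a single TH clocked at the full rate, while each TH only has to operate at half that rate.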

Figure 1. The complete track-and-hold system.<br />

2.2 Analog switch<br />

An ASW can be thought of as two emitter followers (EFs), activated alternately by clock signals to drive their common output node. An EF can be deactivated by pulling its DC base voltage down to a low level; in this case the other EF is activated. A problem is the transient base current observed at the beginning of the activation of an EF, which reduces the usable hold period of the output signal of the preceding TH amplifier.<br />


Figure 2. Timing diagram of complete TH system.<br />

The complete schematic of the proposed analog switch, shown in Fig. 3, has some differences with respect to the simple ASW explained above. To limit the transient base currents during activation, a simple Darlington pair is used. This reduces the emitter currents of Q1 and Q2 (when ON) considerably, degrading operation. Therefore, resistive paths (R1 and R2) are added from the emitters of Q1 and Q2 to the current source I1. QS1 and QS2 (driven by opposite clock signals) are used to activate/deactivate the Darlington stages. When sel1=“1”, QS1 directs I2 to the base of Q1, to pull down this node with the help of the output buffer resistor of TH-1. Thus, Q1 is driven OFF, cutting off the path of Q3 and R1. Consequently, the other Darlington stage (Q2-Q4-R2-I1) is activated. Similarly, when sel2=“1”, QS2 is ON to activate the stage Q1-Q3-R1-I1, while the other is OFF.<br />

Figure 3. The complete schematic of analog switch.<br />

Although pulling down/releasing the bases of Q1 and Q2 for deactivation/activation is similar to the method in a switched emitter follower (SEF) of a TH amplifier, the proposed ASW is functionally different in two ways: i) When the ASW is to pass a signal (say “in1”) to its output, this signal is an almost constant voltage, already held by the preceding TH during this period (the hold period of TH-1). Conversely, the SEF of a TH is ON during the track mode and has to keep up with a slewing signal to perform proper tracking. ii) When the ASW is to block a signal (say “in1”), it isolates its output node from this signal (the output of TH-1, which is in track mode), but unlike the SEF, does not perform sampling. Actually, unlike an SEF, when one half of the ASW is OFF, its output signal is not a voltage held on a floating capacitor, but a voltage dictated by the other half of the ASW. Therefore, signal feedthrough across the deactivated part is apparently lower.<br />

2.3 Simulation results and discussion<br />

The circuit was implemented in a 0.35 µm BiCMOS<br />

technology with f_T = 30 GHz and simulated with Spectre.<br />

The designs of TH-1 and TH-2 were based on the TH amplifier<br />

in [4]. The supply voltage is 3.3 V and the complete system<br />

consumes 120 mW. The differential full-scale analog input<br />

voltage is 400 mV p-p. The hold capacitance of the TH<br />

amplifiers is 200 fF. Fig. 4 shows the ENOB curves versus<br />

input signal frequency for a single TH (sampled at<br />

4 GSa/s) and the complete two-way TH (4 GSa/s effective<br />

sampling), revealing the advantage of the time-interleaved<br />

TH, especially for f > 1 GHz. Fig. 5 shows the differential<br />

output signals for a 450 MHz full-scale sine-wave input.<br />
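For reference, ENOB figures such as those in Fig. 4 are commonly derived from the signal-to-noise-and-distortion ratio (SINAD) of a full-scale sine wave. The paper does not state how its ENOB values were computed, so the sketch below simply assumes the standard relation ENOB = (SINAD - 1.76 dB) / 6.02 dB. The function names and the sample SINAD values are illustrative and are not results from the paper.

```python
def enob_from_sinad(sinad_db: float) -> float:
    """Effective number of bits for a full-scale sine-wave test (standard relation)."""
    return (sinad_db - 1.76) / 6.02


def sinad_from_enob(enob: float) -> float:
    """Inverse relation: SINAD in dB corresponding to a given ENOB."""
    return 6.02 * enob + 1.76


# Hypothetical SINAD values, chosen only to exercise the functions.
for sinad_db in (38.0, 50.0):
    print(f"SINAD = {sinad_db:.1f} dB -> ENOB = {enob_from_sinad(sinad_db):.2f} bits")
```

Under this relation, an ENOB improvement of 2 bits corresponds to roughly 12 dB of SINAD.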

Figure 4. ENOB plots for single and two-way TH.<br />

Figure 5. Outputs of the THs and the system.<br />

3. CONCLUSION<br />

The proposed two-way time-interleaved TH is based on a<br />

novel analog switch. The ASW realizes a simpler function,<br />

and its speed and signal feedthrough problems are<br />

apparently relaxed with respect to those of a SEF; thus, the<br />

optimization space for the ASW is wider. The complete<br />

system provides an ENOB 2 bits higher than that of a<br />

single TH operating at the same sampling rate.<br />


4. REFERENCES<br />

[1] K. Nagaraj et al., “A dual-mode 700-Msample/s 6-bit<br />

200-Msamples/s 7-bit A/D converter in a 0.25 µm<br />

digital CMOS process,” IEEE J. Solid-State Circuits,<br />

vol. 35, pp. 1760-1768, 2000.<br />

[2] T. Reimann et al., “A high-speed BiCMOS<br />

switched-current track-and-hold circuit,” Proc. of<br />

Custom Integrated Circuit Conf., pp. 377-380, 1998.<br />

[3] S. M. Louwsma et al., “A 1.6 GS/s, 16 times<br />

interleaved track & hold with 7.6 ENOB in CMOS,”<br />

Proc. of ESSCIRC '04, pp. 343-346, 2004.<br />

[4] A. N. Karanicolas, “A 2.7-V 300-MS/s track-and-hold<br />

amplifier,” IEEE J. Solid-State Circuits, vol. 32,<br />

pp. 1961-1967, 1997.<br />


A SIGMA-DELTA ADC DESIGN AUTOMATION TOOL<br />

Selçuk Talay, Günhan Dündar<br />

Boğaziçi University, Department of Electrical and Electronics Engineering<br />

Bebek, 34342 Istanbul, Turkey<br />

E-mail: talays@boun.edu.tr, dundar@boun.edu.tr<br />

ABSTRACT<br />

A design automation system for Sigma-Delta analog-to-digital<br />

converters (ADCs) that operates at the system level<br />

is presented. New error models and a new approach to<br />

system-level design exploration and/or automation are<br />

illustrated. Finally, verification of the approach on a chip<br />

design is presented.<br />

1. INTRODUCTION<br />

This work presents a design automation system developed<br />

for Sigma-Delta (SD) ADCs. Various<br />

applications, especially wireless communication systems,<br />

require specific ADC designs. The specifications<br />

for these circuits may vary in many dimensions,<br />

such as resolution, conversion rate, and power. It is a<br />

challenging task for a designer to identify the most<br />

appropriate structure. Some previously developed tools<br />

[1] help the designer in different aspects. Most of<br />

these tools aim to complete the design for a specific ADC<br />

structure, and even with specific transistor-level circuits.<br />

They may give adequate solutions only for some specific<br />

implementations, since they utilize only fixed ADC<br />

architectures. Also, designers do not want to use<br />

“blackbox” design automation systems and prefer<br />

more user interaction. This work addresses the needs of an<br />

ADC designer regarding user interaction, flexibility, and the<br />

type of interaction that the designer may select throughout the<br />

design process. In addition, the equation database that the tool<br />

uses improves on some previously developed<br />

models. The aim throughout this work has been to develop<br />

a SD ADC design automation tool that can be merged<br />

into the analog design automation (ADA) system<br />

described in [2] at the system level.<br />

2. SIGMA-DELTA ADC DESIGN AUTOMATION<br />

2.1 Modeling approach<br />

There are two common approaches to SD modulator<br />

modeling. The first approach utilizes dedicated behavioral<br />

simulators for parameter optimization [3], while the initial<br />

architecture selection is performed by limiting the design<br />

space in a coarse manner, using simple SNR equations<br />

that do not contain any non-linearities. In order to obtain<br />

accurate performance information, extensive and<br />

time-consuming simulations must be performed.<br />


The other approach is to estimate the performance of the<br />

system by using analytical expressions [4]. Then,<br />

behavioral simulations on the analytical expressions<br />

themselves can be performed in order to validate the<br />

design. This approach is faster than the previous one, but the<br />

expressions must be accurate enough to reflect the real<br />

circuit. Faster simulation times allow a more thorough<br />

exploration of the design space. Furthermore, analytical<br />

expressions provide insight into the design, thus helping<br />

the designer to evaluate the tradeoffs better.<br />

In our work, the second approach was chosen. Equations<br />

describing the behavior of the SD ADC under various<br />

non-idealities were developed, and performance estimation<br />

was performed without behavioral simulations.<br />

2.2 Equation database<br />

The database contains many equations, which model the<br />

effects of non-idealities by calculating their contribution to<br />

the total noise. Then, the algorithm developed, shown<br />

briefly in Figure 1, calculates the dynamic range (DR)<br />

achievable by the design and other device-level parameters<br />

for the given specifications. The equation database has<br />

various models for noise sources in SD ADCs, such as<br />

quantization noise, thermal noise, noise from switches,<br />

jitter noise, and the slew-rate effect [5]. The equation database is<br />

utilized by the algorithm at different stages of operation in<br />

order to estimate the noise during the design automation<br />

process. The algorithm uses the quantization noise calculation<br />

at the early stages of the design process for a coarse<br />

estimation. Throughout the design, other noise<br />

contributions are calculated and candidate solutions are<br />

generated or discarded.<br />
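The coarse, quantization-noise-only estimate used at the early design stage can be sketched as follows. The paper does not list its equations, so the textbook expression for the peak signal-to-quantization-noise ratio of an ideal L-th order single-loop modulator with a B-bit quantizer is assumed here. The name `ideal_dr_db` and its parameters are hypothetical and are not taken from the tool.

```python
import math


def ideal_dr_db(order: int, osr: float, bits: int = 1) -> float:
    """Quantization-noise-limited dynamic range in dB (assumed textbook model).

    DR = 3/2 * (2L + 1) / pi^(2L) * (2^B - 1)^2 * OSR^(2L + 1)
    """
    dr = (1.5 * (2 * order + 1) / math.pi ** (2 * order)
          * (2 ** bits - 1) ** 2 * osr ** (2 * order + 1))
    return 10.0 * math.log10(dr)


# Coarse comparison of modulator orders at a fixed oversampling ratio.
for order in (1, 2, 3):
    print(f"order {order}, OSR 64: {ideal_dr_db(order, 64):.1f} dB")
```

In this model each doubling of the OSR adds about 3(2L + 1) dB, which is why a coarse search over order and OSR is cheap before the remaining noise contributions are added.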

The equation database has many equations which use<br />

device-level parameters such as amplifier gain, comparator<br />

offset, unit capacitor value, or switch resistances to<br />

calculate the system-level parameters, namely the DR,<br />

area, power, and conversion speed of the ADC. These<br />

parameters clearly characterize an ADC and allow<br />

designs to be compared with one another.<br />

The DR of the system can be found by calculating the<br />

contributions of all error sources and non-idealities. The<br />

error sources of the architecture may be collected under<br />

four headings. Three of the error sources are<br />

related to the blocks of the ADC: integrator, quantizer, and<br />

DAC. The remaining one is the clock jitter. All the error<br />

sources can be modeled either as noise sources or as<br />

variations in the transfer functions [4]. From the<br />

effects on the transfer functions, the modified noise-shaping<br />

behavior can be obtained. On the other hand, from the<br />

extra noise sources, the in-band noise can be obtained.<br />

Then, DR and other performance parameters can be<br />

calculated. In order to achieve faster optimization, the<br />

calculations for all error sources must be<br />

carried out without behavioral simulations. Most of the<br />

available approaches calculate the performance of the system<br />

through behavioral simulations, at least for some error<br />

sources. This is simply because some errors, such as slew<br />

rate, are much more difficult to model analytically.<br />

Integrator designs used in Sigma-Delta ADCs are<br />

commonly switched-capacitor integrators. Most of the<br />

errors introduced in these integrators can be modeled in<br />

the transfer function. The transfer function of a non-ideal<br />

integrator can be expressed as H(z) = Bz^{-1}/(1 - Cz^{-1}), where<br />

B represents the gain error and C represents the leakage, which at<br />

the simplest level is expressed as:<br />


$$C = \frac{C_f A_V - C_s}{C_f A_V}$$<br />
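A minimal time-domain sketch of this non-ideal integrator is given below. It implements the difference equation y[n] = C·y[n-1] + B·x[n-1] that corresponds to H(z) = Bz^{-1}/(1 - Cz^{-1}). The interpretation of the symbols (sampling capacitor `c_s`, integrating capacitor `c_f`, amplifier DC gain `a_v`) is an assumption, since the paper does not define them, and all function names are hypothetical.

```python
def leakage(c_s: float, c_f: float, a_v: float) -> float:
    """Leakage factor C = (C_f*A_V - C_s) / (C_f*A_V), as in the expression above."""
    return (c_f * a_v - c_s) / (c_f * a_v)


def integrate(x, gain_b: float = 1.0, leak_c: float = 1.0):
    """Delaying integrator: y[n] = leak_c * y[n-1] + gain_b * x[n-1]."""
    y = []
    state = 0.0   # y[n-1]
    prev_x = 0.0  # x[n-1]
    for sample in x:
        state = leak_c * state + gain_b * prev_x
        y.append(state)
        prev_x = sample
    return y


# Illustrative values only: C_s = 1 pF, C_f = 2 pF, amplifier gain of 1000.
c = leakage(1e-12, 2e-12, 1000.0)
print("leakage factor:", c)
print("ideal:", integrate([1.0] * 5))
print("leaky:", integrate([1.0] * 5, leak_c=c))
```

With `leak_c = 1` and `gain_b = 1` the sketch reduces to the ideal delaying integrator; finite amplifier gain pulls the pole slightly inside the unit circle.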

The effect of these errors is the introduction of additional<br />

terms to the well-known quantization noise power expression<br />

given in [4]. If the expressions, which depend on OSR and<br />

order, are extended to higher-order systems with some<br />

simplifications, it can be seen that the non-ideality adds<br />

extra terms to the noise power that are functions of<br />

modulator order and OSR.<br />

Although the transfer function of the integrator accounts for some<br />

non-idealities, there are more error sources, which should<br />

be added separately. These are among the major<br />

differences between our work and other reported works.<br />

The slew rate of the OPAMP causes an error that is additive<br />

to the output in the signal band. Some previous researchers<br />

estimate the slew-rate effect by calculating it<br />

analytically [4] or by performing behavioral simulations [3].<br />

For fast design automation systems, the slew-rate effect can be<br />

calculated in a different way. Our approach is to estimate<br />

the effect of slew rate from signal information [5]. In this<br />

approach, it is observed that large differences between the<br />

current output and the previous output of the integrator<br />

cause slewing problems. If this difference is larger than<br />

the corners defined in the linear slew-rate characteristic,<br />

slewing problems occur. Thus, using information from the<br />

signal statistics, the number of problematic slew-rate<br />

conditions can be obtained. In [5], it was shown that this<br />

value gives the error at the output up to a constant factor.<br />
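The counting step of this approach can be illustrated with a short sketch. It counts how often the integrator output changes by more than a threshold between consecutive samples. The threshold `max_step` stands in for the corner of the linear slew-rate characteristic, and the function names are hypothetical. The constant that maps the count to an output error in [5] is not reproduced here.

```python
def count_slew_events(outputs, max_step: float) -> int:
    """Count sample-to-sample output steps whose magnitude exceeds max_step."""
    return sum(1 for prev, cur in zip(outputs, outputs[1:])
               if abs(cur - prev) > max_step)


def slew_event_fraction(outputs, max_step: float) -> float:
    """Fraction of output steps that would cause slewing (0.0 if no steps)."""
    steps = len(outputs) - 1
    return count_slew_events(outputs, max_step) / steps if steps > 0 else 0.0


# Illustrative integrator output sequence (volts) and step limit.
y = [0.0, 0.1, 0.5, 0.55, -0.2]
print(count_slew_events(y, max_step=0.3), "slewing steps out of", len(y) - 1)
```

In a design loop, the output sequence would come from the signal statistics rather than from a behavioral simulation, which is what keeps the estimate fast.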

There are also errors caused by parasitic capacitances (Cp),<br />

which can be added to the capacitor Cs in the transfer<br />

function. If these parasitic capacitances are known or can<br />

be estimated during the design, their values can<br />

be used to calculate Cs. Capacitance mismatch is also<br />

important. The mismatch modifies the gain in the transfer<br />

function by a factor of (1-σ), where σ represents the<br />

mismatch value. The effect of this noise source at the<br />

output is an additional term in the noise power. However,<br />


these terms are quite complicated, especially for cascaded<br />

systems, and are therefore simplified to speed up<br />

the system.<br />

The input-referred noise coming from the switches and the<br />

OPAMP can be taken as an additive noise source at the<br />

input. This error causes an increase in the noise floor.<br />

The offset of the OPAMP is also an additive noise source<br />

in the feedback loop. This noise source can be added to the<br />

quantizer error.<br />

The error sources of the quantizer are the offset error and<br />

the voltage error caused by settling time or slew rate. It<br />

can easily be seen that the offset error is additive to the<br />

quantization error.<br />

The other errors can be taken as additional noise in the<br />

signal band. However, the error given in the above<br />

expression is shaped by the noise transfer function.<br />

For single-bit architectures, the errors of the digital-to-analog<br />

converter (DAC) do not have a significant impact on<br />

the whole system. However, for multibit systems the<br />

linearity of the DAC is crucial and is actually the limiting<br />

factor in the performance, since the error at this point is<br />

directly added to the input. For ADCs designed using<br />

switched-capacitor integrators, the only contribution is the<br />

noise of the switches. This error can be modeled as<br />

additive noise on the input signal, which raises the<br />

noise floor of the system.<br />

One of the drawbacks of SD ADCs is the oversampling<br />

process. Since the architecture of the SD<br />

modulator uses oversampling, the error that may come<br />

from high sampling rates cannot be ignored. SD ADCs<br />

generally have high sampling ratios, which may require<br />

jitter smaller than a few picoseconds for wideband<br />

applications. The jitter error was modeled as an additive<br />

input. The most meaningful way of defining this error is to<br />

consider the first derivative of the signal and the possible<br />

shift in the sample time. The result adds an error to the<br />

signal. In our model, the signal at the input of the first<br />

integrator with clock jitter error is defined as:<br />

X_new[nT] = X[nT+δ] ≈ X[nT] + X'[nT]·p(t)<br />

where X[nT] is the input signal, X'[nT] is the first<br />

derivative of the signal, and p(t) is the Gaussian probability<br />

function representing the deviation δ from the sample time.<br />

The linear approximation is feasible since the SD ADC employs<br />

high sampling rates, so the signal changes little between two<br />

consecutive samples. The jitter error was<br />

modeled as an additive input noise, where the X'[nT]·p(t)<br />

term is an addition to the signal. The derivative can be<br />

expressed as X[nT]-X[nT-T], which can be calculated<br />

easily since the input signal characteristics are available.<br />
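A minimal sketch of this jitter model follows. It adds the term X'[nT]·p(t) to each sample, with the derivative approximated by the backward difference X[nT] - X[nT-T] as described above. One assumption is made for dimensional consistency: the Gaussian timing deviation is expressed as a fraction of the sampling period T (parameter `sigma_rel`). The function name and parameters are hypothetical.

```python
import math
import random


def add_jitter_error(x, sigma_rel: float, seed: int = 0):
    """Return x with the first-order clock-jitter error term added.

    x_new[n] = x[n] + (x[n] - x[n-1]) * delta[n],
    where delta[n] is a Gaussian timing deviation in units of T.
    The first sample has no predecessor and is passed through unchanged.
    """
    rng = random.Random(seed)
    out = [x[0]]
    for prev, cur in zip(x, x[1:]):
        delta = rng.gauss(0.0, sigma_rel)
        out.append(cur + (cur - prev) * delta)
    return out


# Illustrative input: a sine wave sampled 64 times per period, 1 % rms jitter.
signal = [math.sin(2 * math.pi * n / 64) for n in range(128)]
jittered = add_jitter_error(signal, sigma_rel=0.01)
error = [j - s for j, s in zip(jittered, signal)]
print("peak jitter error:", max(abs(e) for e in error))
```

Because the error scales with the sample-to-sample difference, a slowly varying input is barely affected, which matches the argument for the linear approximation at high sampling rates.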

The developed algorithm can be used to find a solution<br />

with different oversampling ratios, orders, and cascade<br />

configurations. Users can even interact with the tool<br />

and select whether the solution should be a cascaded architecture<br />

or a single-loop architecture. There are also other<br />

configurations, which use feed-forward and feedback<br />

connections; their models are still under development<br />

and will be incorporated into the tool. The equation<br />

database is used for calculating the performance<br />

parameters of these architectures.<br />

[Figure 1 flowchart: Start; get desired SNR, stability, and OSR; select operation mode (1, 2, or 3). Mode 1: get design parameters, evaluate design, output feasibility. Mode 2: get library blocks; if the library is not complete, get ranges for missing data; evaluate; output feasibility and block parameters. Mode 3: search design space, find candidate solutions, output parameters of candidate solutions.]<br />

Figure 1. The algorithm.<br />

2.3 Modes of operation<br />

The developed algorithm has three different modes of<br />

operation for helping the designer during the system-level<br />

design of the ADC. The user should select a mode of<br />

operation at the beginning of the design process. The first<br />

mode (verification mode) is for checking whether the design is<br />

implementable. In this mode, the user provides all the<br />

design parameters for each block in the design and the tool<br />

checks the feasibility of the whole circuit. All<br />

design parameters provided by the user are evaluated in<br />

the equation database and their noise contribution is<br />

calculated. Then, the tool calculates the performance<br />

parameters DR, area, power, and conversion speed<br />

achievable with the given block parameters.<br />

The second mode (semi-custom mode) is where the user<br />

can attach a library to the tool. The libraries attached to the<br />

tool may contain block characterizations such as OPAMP<br />

and comparator information. Then, the tool tries to find a<br />

solution with the parameters of the blocks available in the<br />

library. This mode is useful when there are previously<br />

designed blocks. The designer may use all or<br />

some of these blocks and generate the specifications for the<br />

remaining blocks.<br />

In the last mode (the full automatic mode), the user<br />

specifies only the system-level inputs. Then, the tool searches a<br />


huge design space and finds candidate solutions based on<br />

an incorporated cost function for selecting a solution in the<br />

design space. In this mode, not only are the block parameters<br />

calculated, but the order and cascade<br />

configuration are also explored. The design space<br />

exploration can be limited by user interaction.<br />

To be able to utilize this tool to its full power in the semi-custom<br />

and full automatic modes, an accurate cost<br />

function should be defined. In this cost function, the<br />

power consumption and area of each block should also be<br />

provided. However, this information is available only if<br />

the blocks are pre-designed. In order to get around this<br />

problem, a performance estimator has been developed, as<br />

explained in [2]. This performance estimator estimates the<br />

cost of any block given the required performance<br />

specifications such as gain, SR, etc. [6]. Since the<br />

performance estimator is discussed elsewhere, it will not<br />

be detailed here. However, it is possible to guarantee the<br />

lowest cost solution with this approach. The accuracy of<br />

performance estimation lies in the approach used in the<br />

system. The performance estimator utilized in this work<br />

has been developed using EKV models and provides<br />

accurate results in reasonable CPU time.<br />

Having more than one operation mode increases the<br />

flexibility of the design process. Also, by utilizing a<br />

library, the design time for an ADC may decrease<br />

considerably. In such a case, however, the solution<br />

will certainly not be the optimum one, and may not even be close to it.<br />

The verification mode and semi-custom mode are geared<br />

towards typical design cycles in industry, where pre-verified<br />

cells exist and the engineer may wish to use them.<br />

2.4 Examples<br />

In order to show the accuracy of the models, an example is<br />

presented. Real circuit data from [4] is used since all<br />

design parameters are clearly stated. The design is a<br />

second-order system with a DR of 100.2 dB. In order to<br />

compare the models, the first mode of operation was used.<br />

The design parameters used in the reference design were<br />

given as input to the tool, and it calculated the DR value as<br />

100 dB. Although the results are very similar, the effect of<br />

capacitor mismatch was not included in the example run<br />

of the system.<br />

Another example is presented in order to show other<br />

modes of operation. The semi-custom mode of the SD<br />

ADC design tool was chosen in this example because it<br />

contains both library-oriented design and automated<br />

design. The amplifiers were designed by automatically<br />

utilizing the performance estimator and the comparators<br />

were chosen from the library. The amplifier types utilized<br />

by the performance estimator throughout this example are<br />

cascode OPAMPs.<br />

In the design example, the desired specifications are a DR<br />

of 86 dB, a supply voltage of 5 V, and an OSR of 64. The<br />

order of the modulator was not set. The desired DR value<br />

corresponds to nearly 14 bits of resolution.<br />
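The 14-bit figure follows from the usual relation between dynamic range and resolution for a full-scale sine wave, which is assumed here since the paper does not state the conversion it used:

```latex
N = \frac{\mathrm{DR} - 1.76\,\mathrm{dB}}{6.02\,\mathrm{dB/bit}}
  = \frac{86 - 1.76}{6.02} \approx 13.99\ \text{bits}
```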


The algorithm searches for candidate architectures with<br />

the given specifications. Since behavioral simulation was<br />

not performed for checking the stability of the design, the<br />

tool advised the user to use cascaded architectures, which<br />

utilize at most second-order modulators. As a result, the<br />

candidate architectures were 2-1, 2-2, and 2-2-1. However,<br />

lower-order architectures can be achieved by assigning the<br />

OSR as a free variable. In this case, the algorithm searches<br />

for OSR values which provide adequate performance. This<br />

search is performed on ideal SD ADC equations and the<br />

real performance is calculated later by adding noise<br />

contributions.<br />
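A search over OSR values on ideal equations could look like the sketch below. It is an illustration under stated assumptions, not the tool's algorithm: the ideal DR model is the textbook single-loop expression, a cascade is treated as an ideal modulator of the same overall order (for example, 2-1 as third order), and the candidate OSR list and the names `ideal_dr_db` and `min_osr` are hypothetical.

```python
import math


def ideal_dr_db(order: int, osr: float, bits: int = 1) -> float:
    """Assumed ideal DR model: 3/2 * (2L+1)/pi^(2L) * (2^B-1)^2 * OSR^(2L+1), in dB."""
    dr = (1.5 * (2 * order + 1) / math.pi ** (2 * order)
          * (2 ** bits - 1) ** 2 * osr ** (2 * order + 1))
    return 10.0 * math.log10(dr)


def min_osr(order: int, target_db: float, bits: int = 1,
            candidates=(16, 32, 64, 128, 256, 512)):
    """Smallest candidate OSR whose ideal DR reaches target_db, or None."""
    for osr in candidates:
        if ideal_dr_db(order, osr, bits) >= target_db:
            return osr
    return None


# With OSR as a free variable, lower orders become feasible at higher OSR.
for order in (1, 2, 3, 4):
    print(f"overall order {order}: minimum OSR for 86 dB = {min_osr(order, 86.0)}")
```

Since the non-ideal noise contributions are added afterwards, a practical search would presumably require some margin above the target at this stage.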

The candidate architectures which satisfy the desired DR<br />

performance for an OSR of 64 were sent to the performance<br />

estimator, and area and power values were estimated.<br />

The algorithm uses a structure called a ‘worm’, which represents a<br />

solution for the ADC. The worm contains all information<br />

such as OSR, supply voltage, configuration, noise values,<br />

DR, capacitor values, and block parameters such as<br />

amplifier gain and comparator offset. These worms are passed<br />

to functions in order to modify their relevant values, such<br />

as quantization noise or DR. For the semi-custom<br />

operation mode that is used in this example, worms were<br />

created for the 2-1, 2-2, and 2-2-1 configurations. The initial<br />

number of worms depends on the number of elements in<br />

the library. The amplifier library was not used in this<br />

example; thus, the comparator library defines the number<br />

of initial worms. Then, the user may select the range of<br />

parameters for the amplifier and other blocks. For our run,<br />

ranges for the gain of the amplifier and capacitor values in<br />

the integrator were set.<br />
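The worm structure described above can be pictured as a simple record type. The sketch below is only an illustration: the field names, the rule of one initial worm per (configuration, library comparator) pair, and the comparator offset values are all assumptions, since the paper does not give the actual data layout.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, List, Optional, Tuple


@dataclass
class Worm:
    """One candidate ADC solution (field names are illustrative)."""
    osr: int
    supply_voltage: float
    configuration: Tuple[int, ...]          # e.g. (2, 1) for a 2-1 cascade
    comparator_offset: float                # taken from the comparator library
    amplifier_gain: Optional[float] = None  # chosen later from a user-set range
    capacitor_values: Dict[str, float] = field(default_factory=dict)
    noise_powers: Dict[str, float] = field(default_factory=dict)
    dr_db: Optional[float] = None           # filled in by noise estimation


def initial_worms(configurations, comparator_offsets, osr, supply_voltage) -> List[Worm]:
    """Create one worm per (configuration, library comparator) pair."""
    return [Worm(osr, supply_voltage, tuple(cfg), offset)
            for cfg, offset in product(configurations, comparator_offsets)]


def meets_dr(worms, target_db: float) -> List[Worm]:
    """Keep worms whose estimated DR is known and reaches the target."""
    return [w for w in worms if w.dr_db is not None and w.dr_db >= target_db]


# Hypothetical comparator library with two offset values (in volts).
worms = initial_worms([(2, 1), (2, 2), (2, 2, 1)], [0.005, 0.010],
                      osr=64, supply_voltage=5.0)
print(len(worms), "initial worms")
```

Passing such records through a chain of estimation functions, each filling in or updating its own fields, mirrors the description of worms being sent to functions that modify their relevant values.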

The worms were sent to noise estimation functions which<br />

calculate the DR of the worm. As a result, the user has<br />

many solutions satisfying the specified DR performance.<br />

The algorithm generated 846 candidate solutions.<br />

The performance estimator could find a solution for 567 of<br />

them. This means that the OPAMP specifications for the<br />

rest were not achievable. These solutions were listed<br />

according to their DR values by the software. Several<br />

solutions may exist for the same DR with different<br />

structures, in which case the optimum solution (in terms of<br />

area, power, or a combination of these two) should be<br />

selected by the user.<br />

2.5 Chip Design<br />

In order to test the developed system, a chip containing two<br />

modules was fabricated in 0.6 µm AMS technology.<br />

The first module is for testing the first-order<br />

system. The second one is a 2-1 cascaded system, whose<br />

second- and third-order outputs are accessible. Correct<br />

operation of the modules was observed in initial testing.<br />

Figure 2 shows the I/O characteristics of 10 IC samples<br />

where the reference voltages are ±0.75 V. The<br />

characterization results are in good agreement with the<br />

behavior predicted by the tool. The 60 mV offset is due to<br />

process variations.<br />


[Figure 2 plot: digital output mapped to analog (axis from -1 to 1) versus differential input (axis from -1500 to 1500); curves: measured, data fit, target; data fit: y = - 0.000652*x + 0.0658.]<br />

Figure 2. Chip characterization.<br />

3. CONCLUSION<br />

In this work, a system-level design automation system for<br />

SD ADCs was presented. The developed system has<br />

some improvements on models of SD ADCs, especially<br />

on jitter and slew rate. Also, the algorithm of the system<br />

was developed such that it allows user interaction at every<br />

stage of the design process. Another novelty of the<br />

approach is that it allows the modeling and synthesis of<br />

different architectures. It was also developed for seamless<br />

integration into previously developed ADA systems. A<br />

chip was designed to verify the validity of the approach.<br />

4. ACKNOWLEDGEMENT<br />

This work was supported by TÜBİTAK under project number<br />

101E039.<br />

5. REFERENCES<br />

[1] L. Williams III and B. Wooley, "MIDAS - A<br />

functional simulator for mixed digital and analog<br />

sampled data systems," in Proc. Int. Symp. Circuits<br />

Syst. (ISCAS 92), pp. 2148-2151, 1992.<br />

[2] S. Balkır, G. Dündar, and A.S. Öğrenci, Analog VLSI<br />

Design Automation, CRC Press, 2003.<br />

[3] K. Francken and G.E. Gielen, "A High-level Simulation<br />

and Synthesis Environment for Delta Sigma<br />

Modulators," IEEE TCAD, vol. 22, pp. 1049-1061,<br />

Aug. 2003.<br />

[4] F. Medeiro, B. Perez-Verdu, and A. Rodriguez-Vazquez,<br />

Top-Down Design of High-Performance<br />

Sigma-Delta Modulators, Kluwer Academic<br />

Publishers, Holland, 1999.<br />

[5] S. Talay and G. Dündar, “Slew-Rate Effect in First<br />

Order Sigma-Delta ADC’s,” Melecon 2004,<br />

Dubrovnik, Croatia, May 2004.<br />

[6] E. Deniz and G. Dündar, “Performance Estimator for<br />

an Analog Design Automation System using EKV-modeled<br />

Analog Circuits,” to appear in ECCTD<br />

2005.<br />



A REGULAR MODULAR ARCHITECTURE FOR PIPELINED BINARY TREE MULTIPLIERS BASED ON A SOG STRUCTURE<br />

Nicola Testoni, Marco Cisterni, Eleonora Franchi<br />

Advanced Research Center on Electronic Systems (ARCES), University of Bologna<br />

v.le Risorgimento 2, 40136 Bologna, Italy<br />

e-mail: ntestoni@arces.unibo.it<br />

ABSTRACT<br />

This paper presents a highly regular modular architecture<br />

for the design of pipelined binary tree multipliers which is<br />

suited to be mapped onto a gate array device. The “tree of<br />

Wallace trees” architecture is employed in order to<br />

achieve a regular layout, while a symmetrical sea-of-gates<br />

(SOG) structure enables a reduced design time and a high<br />

predictability of line delay. Post-layout simulations for a<br />

16×16-bit pipelined multiplier designed using a 5-metal-layer,<br />

0.35 μm CMOS technology show operating<br />

frequencies up to 1 GHz.<br />

1. INTRODUCTION<br />

A great number of today's mid-volume integrated circuit<br />

production runs are not well served by the two leading<br />

technologies in this field: FPGAs and ASICs. The author of<br />

[1] suggests that a new class of devices, namely structured<br />

ASICs, could fill the gap between these technologies,<br />

since they are close to FPGAs in terms of design<br />

costs and turnaround time and close to ASICs in terms of gate<br />

capacity, performance and power consumption. Modern<br />

high-end gate array devices can be viewed as extremely<br />

fine-grain structured ASICs, being both highly regular and<br />

fast to design. From this standpoint we describe a modular<br />

architecture for pipelined binary tree multipliers that can<br />

be used to quickly design arbitrarily large multipliers<br />

with good throughput, exploiting the high<br />

line delay predictability that results from a symmetrical<br />

gate array support. Starting from the Wallace tree<br />

(WT) [2], the tree of Wallace trees (TWT) [3] architecture<br />

is discussed in Section 2. In Section 3 we highlight the<br />

advantages of a symmetrical sea of gates (SOG) structure<br />

for the design stage of a TWT pipelined multiplier and we<br />

describe implementation choices at circuit and layout<br />

level. Section 4 shows the simulation results for a<br />

prototypical implementation in a 5-metal-layer, 0.35μm<br />

CMOS technology. Section 5 concludes this paper.<br />

2. TWT ARCHITECTURE<br />

A TWT multiplier unit is divided into three stages. The<br />

first one generates the partial products. The second sums<br />

the partial products, adding them in parallel using<br />

multiple full adders (FA). The TWT architecture uses a<br />

pair of FAs at each node in order to generate two outputs<br />

from four inputs (CSA 4:2), which results in a balanced<br />

binary tree with a high degree of regularity and a<br />

simplified wiring structure. The third stage generates the<br />

final multiplication result: in this stage, any kind of fast<br />

adder architecture can be used as a carry propagate adder<br />

(CPA). The layout of the TWT can be carried out in stages,<br />

from the leaf nodes towards the root, in the form of an L-tree<br />

[4] as shown in Fig.1, so that any node is placed between<br />

its two children. Although the root always ends up in the<br />

middle of the layout, the resulting routing density is the<br />

smallest achievable for a linear placement structure,<br />

which improves the pipeline performance. Finally, due to the<br />

symmetry of the layout at each leaf, the design of<br />

multipliers with a higher number of bits comes at almost<br />

no cost, while maintaining all the benefits of constant and<br />

predictable line delay.<br />

Figure.1. TWT layout for a 16×16bit multiplier:<br />

partial product generators are shaded. The longest<br />

SOG row is enclosed by a dashed line while<br />

arrows indicate the adding direction.<br />
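The three stages described in this section can be illustrated with a short behavioral sketch. It is not a model of the circuit: each partial product row is held as a Python integer, a full adder is applied to whole rows at once, and the function names (`full_adder_rows`, `csa_4to2`, `twt_multiply`) are chosen here purely for illustration. It assumes unsigned operands and an operand width that is a power of two of at least 4.

```python
def full_adder_rows(a, b, c):
    """Carry-save 3:2 step on whole rows: a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def csa_4to2(a, b, c, d):
    """4:2 node built from a pair of full adders: a + b + c + d == out0 + out1."""
    s1, c1 = full_adder_rows(a, b, c)
    return full_adder_rows(s1, c1, d)


def twt_multiply(x, y, n_bits=16):
    """Behavioral sketch of the three stages; returns (product, tree_levels)."""
    # Stage 1: one partial product row per multiplier bit.
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(n_bits)]
    # Stage 2: balanced binary tree of 4:2 nodes, reducing the rows down to two.
    levels = 0
    while len(rows) > 2:
        reduced = []
        for i in range(0, len(rows), 4):
            reduced.extend(csa_4to2(*rows[i:i + 4]))
        rows = reduced
        levels += 1
    # Stage 3: carry propagate addition of the two remaining rows.
    return rows[0] + rows[1], levels
```

Under these assumptions a 16-bit operand pair needs three levels of 4:2 nodes (16 rows, then 8, 4 and 2) before the final carry propagate addition.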

3. SOG IMPLEMENTATION OF TWT<br />

S<strong>in</strong>ce each CSA stage of the TWT multiplier is identical to<br />

its children, an optimized implementation for the CSA<br />

should be adopted. In this paper we exploit the<br />

performance that can be achieved by using the highly<br />

regular structure of a SOG: even if this choice does not<br />

aim at single-transistor optimization, it grants a certain<br />

degree of optimization over the whole layout. Each SOG<br />

basic cell is left-right symmetrical <strong>and</strong> conta<strong>in</strong>s one<br />

NMOS transistor <strong>and</strong> one PMOS transistor. The block<br />

diagram of the TWT latch-based pipel<strong>in</strong>e is shown <strong>in</strong><br />

Fig.2.


3.1 CSA design choices<br />

Due to the SOG structure, a CSA layout with a balanced<br />

number of NMOS and PMOS transistors is required. In order to both<br />

ease the design phase and achieve a high throughput, we<br />

chose a static pass-transistor logic for each FA. The delay<br />

between FAs is reduced by removing the output inverter<br />

stages, thus using negative-logic FAs (FAn) instead of<br />

common FAs: since the number of FAs in each CSA is even,<br />

this choice does not alter the global logic function.<br />
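The paper does not spell out the signal polarities inside the CSA. One standard property that is consistent with the statement above is the self-duality of the full adder: complementing all three inputs complements both outputs, so a negative-logic FA fed with complemented operands returns true-polarity results. The sketch below checks this exhaustively on single bits; the function names are hypothetical and the code does not claim to reproduce the actual wiring of the CSA in Fig.3.

```python
from itertools import product


def full_adder(a, b, c):
    """Positive-logic full adder on single bits: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def full_adder_n(a, b, c):
    """Negative-logic full adder (FAn): both outputs are complemented."""
    s, cout = full_adder(a, b, c)
    return 1 - s, 1 - cout


def is_self_dual():
    """True if complementing every FA input complements both FA outputs."""
    for a, b, c in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, c)
        if full_adder(1 - a, 1 - b, 1 - c) != (1 - s, 1 - cout):
            return False
    return True


def fan_restores_polarity():
    """True if an FAn driven by complemented inputs yields true-polarity outputs."""
    return all(
        full_adder_n(1 - a, 1 - b, 1 - c) == full_adder(a, b, c)
        for a, b, c in product((0, 1), repeat=3)
    )
```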

Moreover, a tree of pass-transistor logic FA implements a<br />

number of pass-transistor gate cha<strong>in</strong>s: each cha<strong>in</strong> can be<br />

optimized <strong>in</strong> order to achieve the m<strong>in</strong>imum propagation<br />

time. The schematic of the basic CSA unit is shown <strong>in</strong><br />

Fig.3.<br />

3.2 Layout level design choices<br />

The CSA stages are arranged <strong>in</strong> order to have partial<br />

products of the same weight on the same SOG row. Only<br />

one row of maximum length exists (Fig.1) <strong>and</strong> it can be<br />

shown that it co<strong>in</strong>cides with the critical path (Fig.2).<br />

Interest<strong>in</strong>gly, this same row is replicated to implement<br />

both active circuitry <strong>and</strong> local <strong>in</strong>terconnections between<br />

TWT leaves <strong>in</strong> each row, thus lead<strong>in</strong>g to the highly regular<br />

layout shown in Fig.5. This layout structure is made<br />

possible by the flexibility of the FAs: since they<br />

are based on pass-transistor logic, whenever some of their<br />

inputs are fixed, they can work either as half adders or as<br />

buffers with almost the same performance as dedicated<br />

hardware. This improves both the modularity of the design<br />

and the global predictability of the line delay, since each<br />

interconnection is realized using the same structure.<br />

Additionally, since the TWT is a balanced binary tree and<br />

the SOG used is itself symmetrical, each path is equivalent<br />

to the critical one in terms of performance: this means<br />

that the optimization effort can be concentrated on the<br />

basic CSA and the pipeline buffers, since the length of<br />

each line is quantized. Finally, thanks to the symmetry of<br />

the SOG, each cell <strong>in</strong> the TWT is designed only once <strong>and</strong><br />

then mirrored left-right <strong>in</strong> order to realize the L-Tree<br />

structure. This gives way to <strong>in</strong>terconnections that have the<br />

same length <strong>and</strong> capacitance not only between each row<br />

but even between each leaf of the L-Tree.<br />

3.3 Pipel<strong>in</strong>e latches design <strong>and</strong> optimization<br />

In order to improve pipeline throughput, pass-transistor<br />

based dynamic latches are used in place of static registers:<br />

the two clock phases are generated within each latch from<br />

a single-phase clock. This additional circuitry, consisting<br />

of a cascade of 2 logic inverters, is not shown in Fig.4.<br />

Since both FAs and latches are based upon pass-transistor<br />

logic, an almost complete analysis of the critical path load<br />

capacitance can be carried out in order to optimize the latches'<br />

output buffer. Given the characteristics of the minimum-sized<br />

inverter available on the SOG and a good RC model<br />

for the propagation delay of the pass-transistor gate, the<br />

optimization effort [5] along the critical path gives the<br />

following results: an optimal number of pass-transistor gates<br />

in each stage equal to 2.7 and a stage buffer multiplying<br />

factor equal to 2.2. Given the constraints of the SOG<br />

support, the implemented solution is the closest to these<br />

values: along the critical path we have at most 3<br />

subsequent pass-transistor gates, including the<br />

one within the latch, while each latch output buffer is<br />

double-sized. The I/O propagation time is as low as 65ps<br />

for rising edges and close to 83ps for falling edges; the<br />

propagation time from the clock rising edge is 166ps for a<br />

falling output transition and 186ps for a rising one. For<br />

a 16×16bit multiplier, the difference in interconnection<br />

length between CSA stages results in a negligible<br />

difference in capacitive load, so equal-sized pipeline<br />

buffers are used.<br />

Figure.2. Block diagram of the pipeline: critical<br />

path is drawn solid while dashed block represents<br />

other TWT leaves.<br />

Figure.3. Basic CSA block schematic: two<br />

negative logic FA are used in order to reduce the<br />

propagation delay.<br />

Figure.4. Pass-transistor logic latch with Set-<br />

Reset; output transistors are double-sized in order<br />

to reduce delay.<br />

3.4 L<strong>in</strong>e drivers design <strong>and</strong> optimization<br />

Even if the TWT structure has a very regular layout, as<br />

shown in Fig.5, the multiplicand's and multiplier's input bits<br />

do not follow the same path within the unit: the length<br />

of the longest distribution line for each input signal must<br />

be correctly estimated in order to optimize the<br />

performance of the line drivers. Fig.6 displays two sample<br />

input signal lines: in general, it can be shown that, while<br />

the length of the multiplicand's distribution lines A0:A15 is<br />

almost constant throughout the whole unit, the length of the<br />

multiplier's distribution lines grows steadily from B0 to<br />

B15 but is upper-bounded by the length of the multiplicand's<br />

lines. The longest line measures 914μm and its<br />

capacitive load is 166fF. In addition, the multiplier's<br />

and multiplicand's lines have the same fan-out: for a<br />

16×16bit unit based on a 0.35μm CMOS technology this<br />

capacitive load is 214fF, about 130% of the capacitive<br />

load of the longest line, which reduces the relative difference<br />

between the input signal buffers' total capacitive loads. Input<br />

signal line drivers are implemented using a pass-transistor<br />

latch followed by a cascade of buffers with<br />

growing form factor: the optimization process [5] gives a<br />

form-factor multiplying factor of 4, while the optimal<br />

number of stages needed to drive a capacitance of 380fF is<br />

2.4. We implemented a 3-stage line driver since it<br />

produces sharper signal edges while keeping the<br />

propagation delay as low as possible.<br />
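The figures quoted in this subsection can be tied together with a small calculation. The sketch below assumes the simple tapered-buffer relation N = ln(C_load / C_in) / ln(taper), which may omit terms used in the actual optimization of [5]. The driver input capacitance is not given in the text, so `C_IN_FF` is back-computed here from the quoted 2.4 stages and should be read as an assumption (roughly 13.6 fF), not as a value taken from the paper.

```python
import math


def optimal_stage_count(c_load_ff, c_in_ff, taper):
    """Stage count of a tapered buffer chain: N = ln(C_load / C_in) / ln(taper)."""
    return math.log(c_load_ff / c_in_ff) / math.log(taper)


def per_stage_taper(c_load_ff, c_in_ff, n_stages):
    """Per-stage sizing factor when the stage count is forced to an integer."""
    return (c_load_ff / c_in_ff) ** (1.0 / n_stages)


C_LINE_FF = 166.0     # capacitive load of the longest line (from the text)
C_FANOUT_FF = 214.0   # fan-out load of an input line (from the text)
C_LOAD_FF = C_LINE_FF + C_FANOUT_FF  # total load, 380 fF as quoted

# ASSUMPTION: input capacitance back-computed from the quoted optimum of 2.4 stages.
C_IN_FF = C_LOAD_FF / 4 ** 2.4

stages = optimal_stage_count(C_LOAD_FF, C_IN_FF, 4)
taper_for_3_stages = per_stage_taper(C_LOAD_FF, C_IN_FF, 3)
```

With the stage count rounded up to 3, the per-stage factor under these assumptions drops from 4 to roughly 3, which is in line with the sharper edges mentioned above.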

3.5 TWT as multiplier build<strong>in</strong>g block<br />

Larger multipliers can be easily built using the basic TWT<br />

unit: its symmetry and modularity can be exploited by a<br />

structure compiler to design an arbitrarily large multiplier<br />

by simply mirroring and tiling appropriately sized TWT units.<br />

As shown in Fig.7, a 16×16bit TWT multiplier is made up<br />

of four 8×8bit TWT multipliers enclosing the final-stage<br />

CSA for that level. In general, the easiest way to<br />

implement a larger multiplier given a basic TWT unit is<br />

the following:<br />

• Create a copy of the TWT unit<br />

• Mirror it upside-down<br />

• Tile it over the orig<strong>in</strong>al TWT<br />

• Create a copy of the new compound<br />

• Mirror it left-right<br />

• Tile the new compounds around a CSA unit<br />

• Properly <strong>in</strong>terconnect units' <strong>in</strong>put signals.<br />
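The mirroring and tiling steps listed above can be sketched on a toy layout, where a unit is a list of equal-length strings and each character stands for one cell. The helper names and the choice to place the CSA as a column of `C` cells between the two compounds are assumptions made for illustration; the final interconnection step is not modeled.

```python
def mirror_upside_down(grid):
    """Flip a layout top to bottom."""
    return grid[::-1]


def mirror_left_right(grid):
    """Flip a layout left to right."""
    return [row[::-1] for row in grid]


def grow(unit, csa_width=1):
    """Build the next larger layout from one basic unit."""
    # Copy the unit, mirror it upside-down and tile it over the original.
    compound = mirror_upside_down(unit) + unit
    # Copy the compound and mirror it left-right.
    mirrored = mirror_left_right(compound)
    # Tile both compounds around a CSA column (assumed placement).
    csa_column = "C" * csa_width
    return [left + csa_column + right for left, right in zip(compound, mirrored)]


layout = grow(["ab", "cd"])
```

The result contains four copies of the original unit arranged symmetrically around the CSA column, matching the subdivision of Fig.7 in spirit.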

Figure.5. Layout of the 16×16bit TWT multiplier<br />

on a 0.35μm CMOS technology symmetrical sea<br />

of gate.<br />

Figure.6. Comparison between signal distribution<br />

l<strong>in</strong>es for <strong>in</strong>put A2, B2 <strong>and</strong> B13 with<strong>in</strong> a 16×16bit<br />

TWT multiplier.<br />

Figure.7. Hierarchical subdivision for a 16×16bit<br />

TWT multiplier: 8×8bit TWT are tiled enclos<strong>in</strong>g<br />

the last stage CSA.


Before <strong>in</strong>terconnect<strong>in</strong>g units, some optimization should be<br />

taken <strong>in</strong>to consideration: <strong>in</strong> fact, it can be shown that no<br />

more than 2(N-1)+log2(N) SOG rows are required for the<br />

layout of a TWT multiplier with N <strong>in</strong>put bits per channel.<br />

Us<strong>in</strong>g an 8×8bit TWT unit as basic block, up to log2(N)-4<br />

SOG rows can be elim<strong>in</strong>ated at each stage from the top of<br />

the stack without compromis<strong>in</strong>g the global functionality.<br />

4. SIMULATION RESULTS<br />

Simulations of a 16×16-bit unsigned multiplier were<br />

run in Spectre using a 5-metal-layer, 0.35μm CMOS<br />

technology. Each cell of the SOG measures 1.5μm×17.4μm<br />

and contains one NMOS transistor with WN=3.4μm and<br />

one PMOS transistor with WP=4.7μm: these form factors were<br />

chosen in order to maximize the routability of the first<br />

metal layer and to minimize the propagation delay; the cell<br />

is left-right symmetrical. A 16×16bit TWT unit measures<br />

636μm×592μm, as shown in Fig.5; in order to implement<br />

a working multiplier, distribution network drivers for the<br />

clock, set and reset signals and input signal buffers must be<br />

added: the former require 3 additional topmost SOG<br />

rows, while the latter need 28 columns in each SOG row.<br />

The dimensions of a 16×16bit TWT multiplier are thus<br />

678μm×644μm. Performance simulations involved the<br />

critical path only: worst-case commutation for each FA<br />

pair was considered in order to properly evaluate the<br />

minimum clock period. The regularity of the structure allowed<br />

for a simple but effective simulation setup: each critical<br />

path carry-out and sum-out is used to generate the<br />

corresponding carry-in and sum-in. This corresponds to<br />

the simulation of the multiplier's central row when both the<br />

lower and upper rows are involved in a worst-case<br />

commutation. As shown in Fig.8, several post-layout<br />

simulations at 3.3V power supply and different<br />

corners and temperatures were conducted in order to evaluate<br />

the highest allowed operating frequencies; after that, a<br />

quadratic least-squares fit was used to<br />

obtain continuous functions of the temperature. As<br />

expected, the fast corner allows for 1.1GHz operation<br />

over the whole industrial temperature range, while only 750MHz<br />

can be guaranteed for the slow corner. Typical-corner<br />

post-layout simulations (Fig.9) show that a 1GHz operating<br />

frequency is easily achievable on this support at 27°C.<br />
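The dimensions quoted in this section can be cross-checked against the row bound 2(N-1)+log2(N) given in Section 3.5 and the SOG cell size. The sketch below is only an arithmetic check; the variable names are illustrative and the mapping of the longer side to columns and the shorter side to rows is inferred from the numbers, not stated in the text.

```python
import math

CELL_W_UM = 1.5    # SOG cell width (from the text)
CELL_H_UM = 17.4   # SOG cell height (from the text)


def twt_rows(n_bits):
    """Upper bound on SOG rows for an N-bit TWT unit: 2(N-1) + log2(N)."""
    return 2 * (n_bits - 1) + int(math.log2(n_bits))


rows = twt_rows(16)                                    # rows in the bare TWT unit
unit_height_um = rows * CELL_H_UM                      # compare with 592 um
unit_columns = 636 / CELL_W_UM                         # columns in the bare TWT unit
multiplier_height_um = (rows + 3) * CELL_H_UM          # 3 extra rows for clock/set/reset drivers
multiplier_width_um = (unit_columns + 28) * CELL_W_UM  # 28 extra columns for input buffers
```

Under this reading, the 34 rows and 424 columns of the bare unit, plus the 3 rows and 28 columns added for drivers and buffers, reproduce the quoted 636μm×592μm and 678μm×644μm to within rounding.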

5. CONCLUSIONS<br />

A highly regular modular architecture for the design of<br />

pipelined binary tree multipliers has been presented. The<br />

regularity of the architecture combined with the structure<br />

of the SOG allows for a high degree of predictability of<br />

line delay and for well-balanced line capacitance<br />

loads. System-wide optimization of the pipeline buffers led to<br />

high operating frequencies even though the SOG does not allow for<br />

single-transistor optimization. The proposed architecture<br />

seems an ideal candidate for the design of a multiplier<br />

compiler providing both short design time and predictable<br />

performance.<br />

Figure.8. Post-layout simulation results for<br />

m<strong>in</strong>imum clock period vs. temperature sweep at<br />

different corners; solid curves are obta<strong>in</strong>ed<br />

through quadratic LS <strong>in</strong>terpolation.<br />

Figure.9. Simulation results for the critical path<br />

of a 16×16bit TWT multiplier <strong>in</strong> 0.35μm CMOS<br />

technology at Vdd=3.3V, T=27°C: all output<br />

signals' transients end before the clock rising edges.<br />

6. REFERENCES<br />

[1] B. Zahiri, “Structured ASICs: Opportunities and<br />

Challenges”, 21st IEEE International Conference on<br />

Computer Design, proceedings, pp.404-409, 2003.<br />

[2] C.S. Wallace, “A suggestion for a fast multiplier”,<br />

IEEE Transactions on Electronic Computers, vol.13,<br />

pp.14-17, Feb. 1964.<br />

[3] K.F. Pang, “Architecture for pipelined Wallace tree<br />

multiplier-accumulators”, IEEE International<br />

Conference on Computer Design, proceedings,<br />

pp.247-250, Sep. 1990.<br />

[4] Y. Harata, et al., “A high speed multiplier using<br />

redundant binary adder tree”, IEEE J. Solid St. Circ.,<br />

vol.22, pp.28-34, Feb. 1987.<br />

[5] I. Sutherland, et al., “Logical Effort: Designing Fast<br />

CMOS Circuits”, Morgan Kaufmann Publishers,<br />

1999.


Developing a Strategic Error Source Based<br />

Design Evaluation For ADC’s<br />

Jason Hannon, Carsten Wegener and Michael Peter Kennedy<br />

Abstract<br />

In this work, we develop the foundation of an error source<br />

based Design Evaluation and apply it to the example of<br />

a Successive-Approximation-Register (SAR) Analog-to-Digital<br />

converter (ADC). The proposed approach identifies the error<br />

sources in the design and relates them to the performance<br />

characteristics measured for prototype samples. Based on this<br />

identification, we extract the sources of dominant performance<br />

degradation and suggest a redesign to counteract the latter.<br />

1 Introduction<br />

The objective of this article is to develop a strategic approach to<br />

Design Evaluation of an ADC. The aims of Design Evaluation<br />

consist of:<br />

• establishing confidence that the design meets specifications<br />

with high yield in volume manufacturing [1], and<br />

• determining the sources of errors degrading device performance.<br />

If a design marginally meets its design specifications, an important<br />

piece of information to learn from Design Evaluation is<br />

which contributors affect the design performance.<br />

This information proves invaluable for two reasons: (1) guiding<br />

a redesign effort which can lead to devices that perform<br />

to improved specifications, and (2) providing information<br />

on potential design hazards for future designs when migrating<br />

to smaller geometries. Also, with tighter profit margins for IC’s,<br />

e.g. in consumer products, achieving a yield in manufacturing<br />

that is higher by even a small percentage translates into significant<br />

additional revenue.<br />

Traditional Design Evaluation approaches are ad-hoc <strong>in</strong> nature;<br />

when a design does not satisfactorily meet specifications, the<br />

design evaluation eng<strong>in</strong>eer attempts to identify the design errors,<br />

i.e. the ma<strong>in</strong> source of performance degradation. With<br />

<strong>in</strong>creas<strong>in</strong>g complexity of designs, the portion of time for evaluat<strong>in</strong>g<br />

<strong>and</strong> troubleshoot<strong>in</strong>g design errors is <strong>in</strong>creas<strong>in</strong>g [2].<br />

The approach proposed here is strategic in that all known error<br />

sources inherent to the design are identified and subsequently<br />

modeled. This provides insight into device behavior, which<br />

can be exploited by fitting contributions associated with<br />

independent error sources to the measured device response.<br />

The advantage of the model-based approach is that one can establish<br />

the model’s completeness with respect to covering the<br />

on-chip error sources; if the model is incomplete, this will show<br />

as lack of fit of the model. In this way one can ensure that all relevant<br />

error sources are covered or, at least, determine the degree<br />

of coverage, i.e. the quality of fit, obtained.<br />

Department of Microelectronic Engineering,<br />

University College Cork, Ireland<br />

E-mail: jason.hannon@ue.ucc.ie<br />

2 Design evaluation<br />

2.1 ADC transfer characteristic<br />

The transfer characteristic, i.e. the input to output relationship,<br />

of an ADC depends on the device parameters. This relationship<br />

is generally non-linear; however, if the parameters vary by<br />

only a small amount, as is typical for a stable manufacturing<br />

process, this dependence can be approximated by a linear<br />

relationship [3].<br />

The Integral Nonlinearity (INL) describes how far<br />

the converter under test is from the ideal transfer function. The<br />

IEEE [4] defines the maximum and minimum INL of an ADC<br />

as the most positive and most negative difference between the ideal<br />

and measured code transition levels.<br />

Fig. 1 shows a plot of the nominal INL characteristic versus<br />

code for the ADC used in this work. This INL curve is obtained<br />

by averaging the INL curves of the 77 devices which are taken<br />

as a representative sample for the manufacturing process. Note<br />

that this averaging removes the influence of manufacturing process<br />

variations; thus, the characteristic shown can be considered<br />

the INL curve built in by design, which ideally should be zero<br />

at all codes.<br />
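The INL extremes and the device-averaged nominal curve described in this subsection can be sketched as follows. The function names, the sign convention (measured minus ideal) and the three-level toy data are assumptions for illustration; the standard cited as [4] should be consulted for the exact definition, and none of the numbers below come from the measured devices.

```python
def inl_curve(measured_levels, ideal_levels, lsb):
    """INL per code transition, in LSB (assumed sign: measured minus ideal)."""
    return [(m - i) / lsb for m, i in zip(measured_levels, ideal_levels)]


def inl_extremes(curve):
    """Maximum (most positive) and minimum (most negative) INL of one device."""
    return max(curve), min(curve)


def nominal_inl(curves):
    """Code-by-code average over devices, i.e. the INL built in by design."""
    n_devices = len(curves)
    return [sum(values) / n_devices for values in zip(*curves)]


# Hypothetical transition levels for two toy devices, LSB = 1.0.
ideal = [0.5, 1.5, 2.5]
device_a = inl_curve([0.5, 1.75, 2.25], ideal, lsb=1.0)
device_b = inl_curve([0.5, 1.25, 2.75], ideal, lsb=1.0)
average = nominal_inl([device_a, device_b])
```

In this toy case the two devices have opposite errors, so the average is zero at every code even though each device has nonzero INL extremes, which mirrors how averaging suppresses random process variation.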

2.2 St<strong>and</strong>ard approach<br />

The st<strong>and</strong>ard approach to Design Evaluation starts with measur<strong>in</strong>g<br />

the design specifications for a sample set of devices from<br />

the manufactur<strong>in</strong>g process. For a sample set of 77 measured devices,<br />

the probability density distribution of maximum <strong>and</strong> m<strong>in</strong>imum<br />

INL values is shown <strong>in</strong> Fig. 2. If, as <strong>in</strong> the case shown,<br />

there is sufficient marg<strong>in</strong> between the performance spread <strong>and</strong><br />

the design specifications of |INL| < 1.5 LSB, the design can be<br />

submitted to high volume production. Otherwise, Design Evaluation<br />

cont<strong>in</strong>ues with the aim of identify<strong>in</strong>g <strong>and</strong> counteract<strong>in</strong>g<br />

the dom<strong>in</strong>ant on-chip error sources that contribute to performance<br />

degradations.<br />

At the point of identifying the dominant source of error, the<br />

relationship between design-specific error sources and the performance<br />

measurements becomes an important input to the Design<br />

Evaluation process. However, in the standard approach,<br />

the design engineer can only inspect the measurement results<br />

in detail and try to replicate the observed performance degradation<br />

in simulation. This trial-and-error approach is costly as it<br />

delays Time-to-Market and, moreover, does not guarantee completeness<br />

with respect to all known sources of error.<br />

Figure 1: Nominal INL characteristic with maximum of<br />

+0.39 LSB and minimum of −0.28 LSB. [Plot: Integral Nonlinearity in LSB versus code.]<br />

Figure 2: Distribution of maximum and minimum INL<br />

for ADC sample. [Plot: probability density versus Integral Nonlinearity in LSB.]<br />

Figure 3: Model mapping. [Diagram labels: K circuit elements (C1, C2), n device parameters x in R^n, Model, m measurements b in R^m.]<br />

Figure 4: SAR ADC block diagram. [Diagram labels: VIN, CMP, SAR Logic, DAC, Code Word.]<br />

2.3 Design evaluation based on error<br />

source model<br />

The approach taken in this work is to build a model which can<br />

replicate the measured results. This model is first built using the<br />

circuit schematics, with the component values setting the model's<br />

device parameters. Once the model is fitted to the measured<br />

INL transfer function, the values of the device parameters used<br />

to fit the model can be used in the circuit schematic to give the<br />

actual values of the components. The previous approach was<br />

ad hoc: unless an on-chip error source dominated the INL performance<br />

degradation, it proved difficult to locate the reason for<br />

performance degradation.<br />

The test vehicle used for this research is a 12-bit SAR ADC<br />

design similar to [5]. This ADC uses a charge-redistribution<br />

DAC in a feedback loop, as illustrated in Fig. 4. The<br />

output code of the ADC is the DAC’s input “code word” at<br />

the end of the conversion cycle controlled by the SAR logic.<br />

Nonlinearities in the transfer characteristic of the ADC are due to<br />

nonlinearities in the DAC and the settling time for the decision<br />

by the comparator (CMP). In this work, we focus on the DAC<br />

nonlinearities, as they can be shown to be the dominant contributor<br />

to the nonlinearity of the ADC.<br />
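The role of the DAC code word in the conversion cycle can be illustrated with a behavioral sketch of a successive-approximation search. This is not a model of the charge-redistribution circuit: it assumes an ideal comparator that keeps a trial bit whenever the input is at or above the DAC output, and an ideal DAC unless a replacement `dac` function is passed in. All names are hypothetical.

```python
def ideal_dac(code, vref, n_bits):
    """Ideal DAC transfer: output voltage for a given code word."""
    return vref * code / (1 << n_bits)


def sar_convert(vin, vref, n_bits=12, dac=ideal_dac):
    """Return the code word held by the SAR logic at the end of the cycle."""
    code = 0
    # One comparator decision per bit, from the MSB down to the LSB.
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        # Assumed comparator polarity: keep the bit if vin is not below the DAC output.
        if vin >= dac(trial, vref, n_bits):
            code = trial
    return code
```

Passing a `dac` function with deliberately mismatched weights is one way to see how DAC nonlinearity maps directly onto the ADC transfer characteristic, which is the case the text focuses on.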

The model-based approach focuses on the linear model concept,<br />

which exploits the relationship between on-chip error sources<br />

and the Integral Nonlinearity (INL) measurements [1]. The<br />

modeling approach uses the fact that error sources such as<br />

DAC component mismatches vary by only a small amount; thus the<br />

model can be linear in the component parameters. Known error<br />

sources that impact the INL characteristic are simulated.<br />

Since the modeling is linear, the superposition principle holds;<br />

therefore it is possible to simulate each error source individually.<br />

These simulation results combine to form the overall INL transfer<br />

characteristic. The simulation results are collected in a matrix<br />

$A$; for a 12-bit converter its size is $4096 \times n$, with $n$<br />

denoting the number of error sources. The measured device<br />

characteristic $b \in \mathbb{R}^{4096}$ is decomposed as<br />

$$b = Ax + \Delta b \qquad (1)$$<br />

where x is a vector of parameters associated with each error source and<br />

the residual ∆b is the lack of fit. Solving for x gives the values<br />

for the model's parameters. These parameters are then related<br />

back to the component values in the schematic, giving the actual<br />

values for the manufactured device. This is used to identify the<br />

dominant error source, or combination of error sources, causing<br />

the degradation in the INL transfer characteristic.<br />

Figure 5: 12-bit Charge Redistribution DAC. (Schematic labels: capacitors Cc,<br />

Cb0 to Cb6, Cseg1 to Cseg31, Ct, Cgnd and Cpar; switches Sin, Sfb and Sgnd;<br />

inputs Vin and Vref.)<br />
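The decomposition in equation (1) can be illustrated with an ordinary least-squares fit. The sketch below is only an assumption about how x might be solved for, since the text does not name a solver; the helper names `solve_least_squares` and `residual`, the 8-code toy matrix and the parameter values are all hypothetical.

```python
def solve_least_squares(A, b):
    """Solve min ||A x - b||^2 via the normal equations (A^T A) x = A^T b."""
    m, n = len(A), len(A[0])
    # Augmented normal-equation system [A^T A | A^T b].
    M = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
         + [sum(A[k][i] * b[k] for k in range(m))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("error-source signatures are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def residual(A, b, x):
    """Lack of fit: delta_b = b - A x."""
    return [bk - sum(a * xi for a, xi in zip(row, x)) for row, bk in zip(A, b)]


# Toy example: 8 codes and 2 hypothetical error-source signatures (columns of A).
A = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [1.0, -1.0],
     [0.0, -1.0], [-1.0, 0.0], [-2.0, 1.0], [-1.0, 0.0]]
x_true = [0.05, -0.02]                      # assumed parameter values
b = [sum(a * xi for a, xi in zip(row, x_true)) for row in A]

x = solve_least_squares(A, b)               # recovered parameters
delta_b = residual(A, b, x)                 # lack of fit
```

For a real 12-bit device, A would have 4096 rows (one per code) and one column per simulated error source, and a nonzero `delta_b` would indicate behaviour that the modelled error sources do not explain.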

3 Successive approximation register ADC<br />

3.1 ADC implementation<br />

The charge redistribution DAC in the SAR ADC is shown in<br />

Fig. 5. The top five bits are segmented, with the remaining seven<br />

bits being binary weighted.<br />

The operation of the ADC is as follows. Initially all the switches are<br />

in the positions shown in Fig. 5; the DAC tracks the input voltage Vin,<br />

charging all the capacitors to the input voltage. At the end of this sampling<br />

phase the switches Sfb and Sgnd are opened, preserving<br />

the charge on the capacitors. Next, all the switches at the bottom<br />

end of the capacitors are switched to ground, causing the<br />

comparator input voltage to go to −Vin. The input switch Sin<br />

is then connected to the reference voltage Vref. The SAR logic then<br />

starts a binary search for a setting of the switches at the bottom<br />

end of the capacitors for which the comparator input node<br />

returns to ground.<br />
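The binary search performed by the SAR logic can be sketched as follows. This is an idealized illustration, not the circuit described above: the DAC is assumed perfectly linear, the comparator ideal, and the function name `sar_convert` and its parameters are hypothetical.

```python
def sar_convert(v_in, v_ref, n_bits=12):
    """Return the output code found by a successive-approximation search.

    Idealized: the DAC is perfectly linear and the comparator is ideal.
    """
    code = 0
    for bit in reversed(range(n_bits)):       # most significant bit first
        trial = code | (1 << bit)             # tentatively set this bit
        v_dac = v_ref * trial / (1 << n_bits)
        if v_dac <= v_in:                     # comparator decision
            code = trial                      # keep the bit
    return code


mid_scale = sar_convert(0.5, 1.0)             # half of the reference voltage
```

Each loop iteration corresponds to one comparator decision, so a 12-bit conversion takes twelve decisions; the final `code` is the DAC input word at the end of the cycle.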

3.2 Development of the model<br />

In the circuit diagram in Fig. 5, forty-one capacitors are shown;<br />

each capacitor is labeled with the number of unit capacitors<br />

it is made up of. If all capacitors are at their ideal value<br />

and the comparator has sufficient settling time, then the simulated<br />

INL characteristic is that of an ideal ADC, i.e. zero at<br />

all codes. However, the manufacturing process causes these capacitor<br />

values to deviate randomly from ideal, which causes the<br />

INL characteristic to display nonlinearities. Typically, the manufacturing<br />

process mismatches the capacitors<br />

by 0.1% [6] of their designed value.<br />

Fig. 6 shows the simulated INL characteristics for the segment<br />

capacitors Cseg1,...,Cseg31, with each individual capacitor<br />

increased by 0.1%.<br />

Figure 6: Simulated INL characteristic for segment capacitors<br />

being increased by 0.1%. (Vertical axis: INL in LSB; legend: Cseg1, Cseg2, Cseg3.)<br />

4 Application of the model<br />

4.1 Solution with identified error sources<br />

As discussed <strong>in</strong> section 2.3, a model is built us<strong>in</strong>g simulation<br />

results obta<strong>in</strong>ed from all the known error sources. These results<br />

are then used in equation 1 to decompose the measured device<br />

response <strong>in</strong>to a set of weighted values for each error source.<br />

S<strong>in</strong>ce this model replicates the measured results, apply<strong>in</strong>g these<br />

weights to the simulated results should result <strong>in</strong> the INL characteristic<br />

obta<strong>in</strong>ed from the measured results.<br />

The results of this are shown <strong>in</strong> Fig. 7. The top graph is the<br />

average device characteristic for 77 device measurements. The<br />

bottom graph <strong>in</strong> Fig. 7 is the residual ∆b, i.e. the lack of fit for<br />

the model.<br />

The spikes in the residual represent the settling time<br />

of the comparator decision. By inspection of the residual, the<br />

largest peaks are located at codes 1023 and 3071, corresponding<br />

to the second most significant bit in the code word. Note that the<br />

ADC design allows more settling time for the most significant<br />

bit (MSB) decision, so no peak is observed at the mid-scale<br />

code 2047.<br />

4.2 Improved model<br />

With the initial model, built using the known error sources, the<br />

lack of fit was too high, making it apparent that a major error source<br />

had been missed. The next step is to identify this missing error<br />

source, which had caused the model’s lack of fit. The answer<br />

came from inspecting a zoomed plot<br />

of the fitted INL characteristic and the nominal INL characteristic.<br />

This is shown in Fig. 8. The difference between the two<br />

waveforms repeats every 32 codes, which points to the<br />

sub-DAC being the location of the missing error source.<br />

The waveform seen <strong>in</strong> the zoomed plot exhibits the shape seen<br />

from simulations for mismatches of Ct <strong>and</strong> Cc. The explanation<br />

for the mismatch is a parasitic capacitance to ground on<br />

the sub-DAC. This capacitance can be lumped <strong>in</strong>to a capacitor<br />

Cpar which is shown <strong>in</strong> Fig. 5. The lack of fit is improved by<br />

<strong>in</strong>clud<strong>in</strong>g the INL characteristic for the term<strong>in</strong>ation capacitor.<br />

The result of fitting the improved model is shown in<br />

Fig. 9. From the results it is apparent that the improved model<br />

now captures the measured INL characteristic<br />

much more closely.<br />

Figure 7: Simulated INL characteristic for segment capacitors<br />

being increased by 0.1%. (Axis labels: Nominal INL in LSB; INL in LSB.)<br />

Figure 8: Simulated model characteristics fitted to nominal<br />

INL. (Vertical axis: Nominal INL in LSB; legend: Fitted INL Characteristic, Measured INL Characteristic.)<br />

Figure 9: Improved model characteristics fitted to nominal<br />

INL. (Vertical axis: Nominal INL in LSB.)<br />

Now that we have a more accurate model, we use it to suggest<br />

a redesign that can perform to a higher specification. As<br />

was seen, a major reason for the original lack of fit was the<br />

parasitic capacitance in the sub-DAC, which is also<br />

a major contributor to the performance degradation. The size<br />

of the parasitic capacitance can be estimated from the mismatch<br />

of Ct in the nominal INL characteristic. The mismatch of<br />

Ct suggests that the parasitic capacitance Cpar is equal<br />

to 58% of a unit capacitor. The suggested redesign is to make<br />

the value of capacitor Ct 42% of a unit capacitor.<br />
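The arithmetic behind this suggestion can be written out as follows. It assumes, which the text does not state explicitly, that the nominal termination capacitance is one unit capacitor C_u and that Cpar simply adds in parallel with Ct, so that the redesigned value C_t' restores the nominal total:

```latex
\[
C_t' + C_{par} = 0.42\,C_u + 0.58\,C_u = C_u
\]
```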

5 Conclusions<br />

In this paper we have presented an error source based approach<br />

to Design Evaluation. The advantage of this model-based approach<br />

is that one can quantify the approach’s completeness <strong>in</strong><br />

terms of the on-chip error sources <strong>and</strong> that the mask<strong>in</strong>g effect<br />

of multiple error sources can be removed <strong>in</strong> the decomposition<br />

process.<br />

In the example used in this paper, a previously unquantifiable<br />

error source, the parasitic capacitance Cpar, is determined.<br />

Its presence was revealed by a large lack of fit for the first model. The<br />

model was improved to accommodate Cpar. Using the improved<br />

model, Cpar was estimated from the decomposition process and<br />

a redesign was suggested to compensate for this error source.<br />

References<br />

[1] C. Wegener <strong>and</strong> M. P. Kennedy, L<strong>in</strong>ear model-based error<br />

identification <strong>and</strong> calibration for data converters, <strong>in</strong><br />

Proc. DATE, ser. Conf. on Design, Automation <strong>and</strong> Test<br />

<strong>in</strong> Europe, Munich, Germany, March 2003, pp. 630–635.<br />

[2] International technology roadmap for semiconductors.<br />

[Onl<strong>in</strong>e]. Available: http://public.itrs.net<br />

[3] C. Wegener, Applications of linear modeling to testing and<br />

characteriz<strong>in</strong>g d/a <strong>and</strong> a/d converters, Ph.D. dissertation,<br />

University College Cork, Nov 2003.<br />

[4] IEEE-Std-1241 Draft, The Institute of Electrical <strong>and</strong> <strong>Electronics</strong><br />

Eng<strong>in</strong>eers, Inc., New York, March 2000, IEEE st<strong>and</strong>ard<br />

for term<strong>in</strong>ology <strong>and</strong> test methods for analog-to-digital<br />

converters.<br />

[5] A. Rivetti, G. Anelli, F. Angh<strong>in</strong>olfi, G. Mazza, <strong>and</strong> F. Rotondo,<br />

A low-power 10-bit ADC in a 0.25-µm CMOS: design<br />

considerations <strong>and</strong> test results, IEEE Trans. Nuclear<br />

Science, vol. 48, no. 4, pp. 1225–1228, Aug. 2000.<br />

[6] M. McNutt, S. LeMarquis, <strong>and</strong> J. Dunkley, Systematic<br />

capacitance match<strong>in</strong>g errors <strong>and</strong> corrective layout procedures,<br />

IEEE Journal of Solid-State Circuits, vol. 29, no. 4,<br />

May 1994.


0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

SIGNAL-PATH LEVEL ASSIGNMENT FOR DUAL-Vt TECHNIQUE<br />

Yu Wang, Huazhong Yang, Hui Wang<br />

Circuits <strong>and</strong> Systems Laboratory, Department of Electronic Eng<strong>in</strong>eer<strong>in</strong>g,<br />

Ts<strong>in</strong>ghua University, 100084, Beij<strong>in</strong>g, People’s Republic of Ch<strong>in</strong>a<br />

E-mail: wangyuu99@mails.ts<strong>in</strong>ghua.edu.cn<br />

ABSTRACT<br />

Along with the fast development of dual threshold voltage<br />

(dual-Vt) technology, it has become possible to use it to reduce static<br />

power in low-voltage high-performance circuits. In this<br />

paper we present a new signal-path level circuit model and<br />

an algorithm based on the new circuit model which<br />

introduces the concept of extracting sub-circuits.<br />

Experimental results show that, for the ISCAS85<br />

benchmark circuits, our algorithm produces a significant<br />

leakage-power reduction similar to that of transistor level<br />

dual-Vt assignment, while the computational cost is<br />

comparable to that of gate level dual-Vt assignment.<br />

1. INTRODUCTION<br />

With the development of the fabrication technology,<br />

leakage power dissipation has become comparable to<br />

switch<strong>in</strong>g power dissipation [1]. At the 90nm technology<br />

node, leakage power may make up 42% of total power [2].<br />

Inevitably, techniques are necessary for reduc<strong>in</strong>g the<br />

<strong>in</strong>creas<strong>in</strong>g leakage power. These leakage control methods<br />

can be broadly categorized <strong>in</strong>to two ma<strong>in</strong> categories:<br />

process level <strong>and</strong> circuit level techniques. At the process<br />

level, leakage reduction can be achieved by controll<strong>in</strong>g the<br />

dimensions (length, oxide thickness, junction depth, etc.)<br />

<strong>and</strong> dop<strong>in</strong>g profile <strong>in</strong> transistors. There are also four major<br />

circuit design techniques for leakage reduction in digital circuits,<br />

namely transistor stacking, supply voltage scaling, dynamic Vt<br />

and multiple Vt. The two main<br />

methods in the multiple Vt technique are sleep transistor<br />

insertion and dual Vt CMOS assignment. The extra area<br />

and delay due to the insertion of sleep transistors have<br />

considerable influence on the circuit performance.<br />

Furthermore, with supply voltage scaling, it is<br />

becoming harder to turn the sleep transistors on under a very low supply<br />

voltage. In dual Vt CMOS assignment for a logic circuit,<br />

a higher Vt is assigned to some of the<br />

transistors in the non-critical paths and a lower Vt is<br />

assigned to the other transistors, in order to achieve both high<br />

performance and low power simultaneously. Recently, a<br />

dual-Vt MOSFET process was developed [3], which<br />

makes the implementation of dual-Vt logic circuits more<br />

feasible. The dual-Vt method results in a significant reduction<br />

in total power dissipation and energy. Therefore,<br />

determining which gates should be assigned the high Vt has<br />

become a major emphasis in the research field.<br />

Traditionally, gate level dual-Vt assignment [4][5][6][7]<br />

suppresses less leakage power than transistor level dual-Vt assignment [8] while saving much more computation time.<br />

In this paper, we assume that all the gates are us<strong>in</strong>g the<br />

low threshold voltage <strong>in</strong> order to get the best performance<br />

(tim<strong>in</strong>g characteristic). The new signal-path level circuit<br />

model we used is different from the circuit models which<br />

consider a gate as a vertex or an edge <strong>in</strong> a graph. This is to<br />

make our algorithms useful for transistor level leakage<br />

control. We use a look-up table method in static timing<br />

analysis to get the critical paths and non-critical paths of<br />

the circuit much faster and with more accuracy. The gates<br />

in the critical paths remain unchanged to maintain the<br />

performance, and the gates in the non-critical paths are<br />

extracted into several sub-circuits. Without reiterating over the<br />

whole circuit, we can focus solely on the sub-circuits,<br />

in which we use a newly developed hierarchy-based<br />

algorithm to get an optimal result faster. Our new signal-path<br />

level dual-Vt assignment aims to achieve more leakage<br />

power saving with similar or even lower computational<br />

complexity than gate level dual-Vt assignment.<br />

2. DUAL Vt ASSIGNMENT<br />

2.1 Gate level circuit model<br />

A comb<strong>in</strong>ational circuit is represented by a directed<br />

acyclic graph (DAG) G = (V, E). Traditionally a vertex v<br />

∈V represents a CMOS transistor network which realizes<br />

a s<strong>in</strong>gle output logic function (a logic gate), while an edge<br />

(i,j)∈E, i,j∈V represents a connection from vertex i to<br />

vertex j. In this way, the transistors with<strong>in</strong> a vertex that are<br />

driven by the same logic signal will be assigned to the<br />

same threshold.<br />

The assignment of threshold voltages to the transistors <strong>in</strong><br />

the circuit can be represented as one of assign<strong>in</strong>g a<br />

threshold voltage to a vertex v∈V [3][4][5], or one of<br />

assign<strong>in</strong>g a threshold voltage to an edge (i,j)∈E [6]. Thus<br />

this allows us to treat the dual-Vt optimization problem as<br />

a k<strong>in</strong>d of graph problem. It greatly simplifies delay<br />

analysis <strong>and</strong> st<strong>and</strong>by power estimation dur<strong>in</strong>g Vt assignment. The effects on delay when a Vt change is<br />

made can be easily modeled by static tim<strong>in</strong>g analysis<br />

(STA) methods. In Figure 1, a comb<strong>in</strong>ational circuit is<br />

presented at the left side; the traditional circuit model is at<br />

the right side.


Figure 1. Orig<strong>in</strong>al circuit C17 <strong>in</strong> ISCAS85 <strong>and</strong><br />

traditional gate level circuit model.<br />

2.2 Signal-path level circuit model<br />

In our signal-path level circuit model, a vertex v ∈ V<br />

represents a p<strong>in</strong> of logic gates or a primary <strong>in</strong>put/output;<br />

an edge (i,j)∈E represents a connection from vertex i to<br />

vertex j. In our model, an edge is the abstraction of either a wire<br />

connecting two pins of two different gates or a signal-path<br />

in a logic gate from one of its input pins to an output pin.<br />

Furthermore, we have added a virtual input vertex and a<br />

virtual output vertex to our model. The virtual input vertex<br />

is connected to all the primary inputs and the virtual<br />

output vertex is connected to all the primary outputs. The<br />

fan-in of a logic gate’s input pin refers to the number of<br />

pins which connect to this input pin. The fan-out of a<br />

logic gate’s output pin refers to the number of pins which<br />

are connected to this output pin. Furthermore, vertexes<br />

which have a fan-in of zero constitute primary inputs;<br />

similarly, vertexes which have a fan-out of zero constitute<br />

primary outputs. Figure 2 shows our signal-path circuit<br />

model of circuit C17 from the ISCAS85 benchmark. If vertex<br />

i∈V represents one of gate A’s input pins and vertex j∈V<br />

represents gate A’s output pin, we define edge (i,j)∈E as<br />

a “signal-path”, and this signal-path belongs to gate A.<br />

Each signal-path actually consists of one of the<br />

parallel transistors and the transistors in series in a simple<br />

classical CMOS gate.<br />

Figure 2. Signal-path level circuit model (C17).<br />

There are several reasons for using this new circuit model.<br />

Firstly, the signal arrival time may be different for every<br />

input pin of a gate. More detailed delay information for<br />

every gate is available since the delay information for<br />

every pin of the gate is computed by STA. Secondly,<br />

through the definition of the signal-path, it is possible for<br />

transistors in a single gate to have different Vt.<br />

If we neglect the possibility of assigning<br />

different threshold voltages to signal-paths which belong to<br />

the same gate, we get the same solution as previous<br />

methods [4][5]. Finally, it leads to much lower<br />

computational complexity than transistor level assignment<br />

[8], while achieving a similar accuracy.<br />

An edge in E represents one of two kinds of<br />

connections. One is a “signal-path”; the other is the<br />

connection of two pins belonging to different gates,<br />

which in most cases represents a wire between the two pins.<br />

Hence it is possible to consider the<br />

interconnect delay during STA in order to get a more<br />

accurate model of the circuit.<br />
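The two kinds of edges, together with the virtual input and output vertexes, can be sketched as a small graph builder. This is a minimal illustration under assumed data structures; the function `build_signal_path_graph`, the pin naming scheme and the two-gate example netlist are hypothetical and are not taken from the C17 circuit.

```python
def build_signal_path_graph(gates, wires, primary_inputs, primary_outputs):
    """Return a list of (src, dst, kind) edges for the signal-path level model.

    gates: dict gate name -> (list of input pin names, output pin name)
    wires: list of (driver vertex, receiver pin) connections
    """
    edges = []
    for name, (in_pins, out_pin) in gates.items():
        for pin in in_pins:
            # One signal-path per input pin; it belongs to this gate.
            edges.append((pin, out_pin, "signal-path:" + name))
    for src, dst in wires:
        edges.append((src, dst, "wire"))
    # Virtual vertexes tie all primary inputs and all primary outputs together.
    for pi in primary_inputs:
        edges.append(("VIN", pi, "virtual"))
    for po in primary_outputs:
        edges.append((po, "VOUT", "virtual"))
    return edges


# Hypothetical netlist: two 2-input NAND gates in series.
gates = {"A": (["A.a", "A.b"], "A.y"), "B": (["B.a", "B.b"], "B.y")}
wires = [("N1", "A.a"), ("N2", "A.b"), ("A.y", "B.a"), ("N3", "B.b")]
edges = build_signal_path_graph(gates, wires, ["N1", "N2", "N3"], ["B.y"])
```

Keeping the edge kind alongside each edge makes it possible to attach a gate delay to signal-path edges and an interconnect delay to wire edges during timing analysis.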

2.3 Delay model<br />

We create our signal-path delay model and get the arrival<br />

time, required time and slack time information for each<br />

vertex and signal-path in the signal-path level circuit<br />

model.<br />

In order to get the delay attributes, we levelize the<br />

vertexes in the graph, making sure that no two vertexes<br />

belonging to the same level have an edge between them.<br />

Each pin’s fan-ins are not at the same level as the pin itself, and<br />

neither are its fan-outs; thus the two vertexes i,j∈V of an edge<br />

(i,j)∈E are not at the same level. The delay of an<br />

edge (i,j)∈E, i,j∈V is denoted by di,j. We define three attributes for every vertex v∈V,<br />

namely the arrival time ta(v), the required time treq(v), and<br />

the slack time tslk(v). The arrival time ta(v) is the<br />

worst-case delay from the primary inputs to pin v. treq(v) is the<br />

latest time by which the signal needs to arrive at pin v. We define<br />

them as:<br />

\[ t_a(v) \equiv \begin{cases} \text{given time of arrival}, & \text{if } v \text{ is the virtual input} \\ \max_{i \in \mathrm{fan\_in}(v)} \{ t_a(i) + d_{i,v} \}, & \text{otherwise} \end{cases} \]<br />

\[ t_{req}(v) \equiv \begin{cases} t_a(v), & \text{if } v \text{ is the virtual output} \\ \min_{i \in \mathrm{fan\_out}(v)} \{ t_{req}(i) - d_{v,i} \}, & \text{otherwise} \end{cases} \]<br />

By comparison, in the traditional circuit model the arrival<br />

time of a gate is the maximum of its input pins’ arrival<br />

times, and the required time of a gate is its output pin’s<br />

required time (if the gate is a CMOS transistor network<br />

which realizes a single output logic function). The slack<br />

time of a gate is defined as the difference between its required<br />

time and its arrival time. The critical path of the circuit is<br />

constituted by the set of gates that have the minimum slack<br />

time value.<br />

For every edge (i,j)∈E, i,j∈V in the graph G we also define<br />

an attribute si,j which represents the slack time of the<br />

edge:<br />

\[ s_{i,j} \equiv t_{req}(j) - t_a(i) - d_{i,j} \]<br />

Finally, the slack time of a vertex v∈V is defined as the<br />

minimum slack time of its fan-in edges:<br />

\[ t_{slk}(v) = \min_{i \in \mathrm{fan\_in}(v)} s_{i,v} \]<br />

In our delay model we def<strong>in</strong>e the critical path of the<br />

circuits as the set of the edges that has the m<strong>in</strong>imum slack<br />

time value. If there is no negative slack <strong>in</strong> the circuit, then<br />

tim<strong>in</strong>g constra<strong>in</strong>ts are satisfied [9].<br />
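The arrival-time, required-time and slack definitions of this section can be sketched as a forward and a backward pass over the DAG. This is a minimal illustration, not the authors' C++ implementation: the function name `static_timing`, the use of a topological order in place of explicit levelization, the slack assigned to a vertex without fan-in edges, and the toy delays are all assumptions.

```python
from collections import defaultdict, deque


def static_timing(edges, t_in=0.0):
    """edges: dict (i, j) -> delay d_ij on a DAG with one virtual input/output."""
    fan_in, fan_out, indeg = defaultdict(list), defaultdict(list), defaultdict(int)
    vertexes = set()
    for (i, j) in edges:
        fan_out[i].append(j)
        fan_in[j].append(i)
        indeg[j] += 1
        vertexes.update((i, j))
    # Topological order (Kahn's algorithm) stands in for levelization.
    order = []
    queue = deque(v for v in vertexes if indeg[v] == 0)
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in fan_out[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    # Forward pass: arrival time is the worst-case delay from the inputs.
    t_a = {}
    for v in order:
        t_a[v] = max((t_a[i] + edges[(i, v)] for i in fan_in[v]), default=t_in)
    # Backward pass: required time; the output's required time is its arrival time.
    t_req = {}
    for v in reversed(order):
        t_req[v] = min((t_req[j] - edges[(v, j)] for j in fan_out[v]),
                       default=t_a[v])
    # Edge slack, then vertex slack as the minimum over fan-in edges.
    s = {(i, j): t_req[j] - t_a[i] - d for (i, j), d in edges.items()}
    # Assumption: a vertex without fan-in edges gets t_req - t_a as its slack.
    t_slk = {v: min((s[(i, v)] for i in fan_in[v]), default=t_req[v] - t_a[v])
             for v in vertexes}
    return t_a, t_req, s, t_slk


# Toy graph: two input pins a and b of one gate with output pin y.
edges = {("VIN", "a"): 0.0, ("VIN", "b"): 0.0,
         ("a", "y"): 2.0, ("b", "y"): 1.0, ("y", "VOUT"): 0.0}
t_a, t_req, s, t_slk = static_timing(edges)
critical = [e for e, slack in s.items() if slack == min(s.values())]
```

In the toy graph the signal-path from b to y has positive slack while the one from a to y is critical, which is the situation where the two signal-paths of one gate could receive different threshold voltages.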

Notice that every signal-path in the same gate can have a<br />

different delay difference when its Vt changes; and when<br />

several signal-paths are changed simultaneously in one<br />

gate, the delay difference is even more complicated<br />

because of the interactions between the changed signal-paths.<br />

Here we select the largest delay difference over all the<br />

signal-paths’ change schemes as the reference delay<br />

difference of the signal-path in this kind of gate. The<br />

signal-path delay data are then derived from the look-up<br />

table of standard cells and HSPICE simulation.<br />

2.4 Leakage power model<br />

We find that the leakage power change due to a change of only one<br />

signal-path is always the same. Furthermore,<br />

if k signal-paths change their threshold<br />

voltage in a gate with w signal-paths (k &lt; w), the power change<br />

is the same no matter how the k signal-paths are chosen.<br />

The leakage power saving due to the threshold voltage change of<br />

k signal-paths is nearly k times the<br />

leakage power saving due to a change of only one signal-path.<br />

However, if all w signal-paths in the gate are changed,<br />

the leakage power saving is larger than w times the<br />

leakage power saving due to a change of only one signal-path.<br />

We therefore use two values to represent each signal-path’s<br />

leakage power attributes. The larger one applies when all the<br />

signal-paths in the gate can change to the high threshold<br />

voltage, and equals the leakage power saving due to<br />

the gate’s threshold voltage change divided by the number<br />

of signal-paths in the gate. The smaller one applies in all other<br />

conditions and equals the leakage power saving due to<br />

a change of only one signal-path. Using HSPICE and a<br />

typical library for each circuit scheme of the signal-path,<br />

we can create a table of leakage power for the signal-path’s<br />

threshold voltage change.<br />
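The two-value leakage model described above can be sketched as a small lookup. The function names `per_path_saving` and `gate_saving` and the numbers used are hypothetical; in the paper the two values per signal-path come from HSPICE-derived tables.

```python
def per_path_saving(k, w, single_path_saving, full_gate_saving):
    """Per-signal-path leakage saving when k of a gate's w paths go to high Vt."""
    if w < 1 or not 0 <= k <= w:
        raise ValueError("need w >= 1 and 0 <= k <= w")
    if k == w:
        # Larger value: whole-gate saving shared equally among the w signal-paths.
        return full_gate_saving / w
    # Smaller value: saving due to a change of only one signal-path.
    return single_path_saving


def gate_saving(k, w, single_path_saving, full_gate_saving):
    """Total leakage saving for the gate under the two-value model."""
    return k * per_path_saving(k, w, single_path_saving, full_gate_saving)


# Hypothetical numbers (arbitrary units) for a gate with w = 2 signal-paths.
partial = gate_saving(1, 2, single_path_saving=1.0, full_gate_saving=3.0)
full = gate_saving(2, 2, single_path_saving=1.0, full_gate_saving=3.0)
```

With these assumed numbers, changing both signal-paths saves more than twice the single-path saving, mirroring the observation that a fully converted gate saves more than w times the single-path value.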

We do not consider the signal probability at each p<strong>in</strong> of<br />

the gates, <strong>and</strong> we may use logic simulation or local<br />

probability propagation <strong>in</strong> our future work to make it<br />

possible to comb<strong>in</strong>e transistor stack<strong>in</strong>g effects with the<br />

circuit analysis to get a more accurate leakage power<br />

estimation table.<br />

2.5 Algorithm<br />

The dual-Vt optimization is def<strong>in</strong>ed as a problem to assign<br />

one of two threshold voltages to each signal-path to get<br />

most leakage power reduction with no performance<br />

reduction.<br />

In our algorithm, we assume the DAG representation<br />

G(V,E) of a signal-path level combinational circuit. The<br />

graph is first levelized to indicate the depth of the<br />

vertexes and signal-paths in the graph. We initialize the<br />

circuit by assigning a low threshold voltage (VTHlow) to all<br />

the signal-paths of the circuit, i.e. we configure<br />

the circuit to have the minimum delay. In the initialization<br />

procedure, we determine the delay attributes of every vertex<br />

and edge in the graph: the arrival time ta(v), the required<br />

time treq(v) and the slack time tslk(v), the edge slack time si,j, and the edge propagation delay di,j. All these attributes can be<br />

calculated using static timing analysis and the formulas<br />

given above. The fan-ins of a vertex are the<br />

previous-level vertexes which are connected to this<br />

vertex; the fan-outs of a vertex are the next-level vertexes<br />

which are connected to this vertex. Since every edge has<br />

a slack time, we extract all the edges with non-zero slack time<br />

to construct a set of sub-graphs Gsub1, Gsub2, …, Gsubn. The<br />

critical paths’ delay attributes are not affected when the Vt of some signal-paths on non-critical paths is changed.<br />

Therefore the assignment of dual Vt in the whole circuit is<br />

decomposed into several small problems, which have<br />

a much smaller solution space, so the optimal<br />

assignment of Vt is easier to find. Without reiterating over the<br />

whole circuit, we can focus solely on the sub-circuits,<br />

in which we use a newly developed hierarchy<br />

priority based algorithm to get an optimal result faster.<br />
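The sub-graph extraction step can be sketched as grouping the non-zero-slack edges into connected components. The text does not spell out the exact extraction rule, so this is an assumption: the function `extract_subgraphs`, the union-find grouping and the toy slack values are hypothetical.

```python
def extract_subgraphs(edge_slack, eps=1e-12):
    """Group edges with positive slack into connected sub-graphs.

    edge_slack: dict (i, j) -> slack s_ij. Returns a list of edge lists.
    """
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]     # path halving
            v = parent[v]
        return v

    # Edges on critical paths (zero slack) are left out and stay at low Vt.
    non_critical = [e for e, s in edge_slack.items() if s > eps]
    for i, j in non_critical:
        parent[find(i)] = find(j)             # union the two endpoints
    groups = {}
    for e in non_critical:
        groups.setdefault(find(e[0]), []).append(e)
    return list(groups.values())


# Toy slack values: two separate non-critical regions.
edge_slack = {("VIN", "a"): 0.0, ("VIN", "b"): 1.0, ("a", "y"): 0.0,
              ("b", "y"): 1.0, ("y", "VOUT"): 0.0, ("c", "d"): 0.5}
subgraphs = extract_subgraphs(edge_slack)
```

Each returned edge list can then be optimized on its own, which is what keeps the solution space of every sub-problem small.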

If we do not consider the condition that signal-paths <strong>in</strong> the<br />

same gate can have different threshold voltages, we can<br />

get the solution for gate-level dual-Vt optimization.<br />

Therefore, dur<strong>in</strong>g the sub-graph extraction, we will only<br />

consider the gates <strong>in</strong> which all the signal-paths’ slack<br />

times are positive. It could be easily realized by mapp<strong>in</strong>g a<br />

whole gate to a s<strong>in</strong>gle vertex <strong>in</strong> the graph. The arrival time<br />

of the gate is the maximum of the arrival times of the<br />

gate’s input pins. The required time of the gate is the output pin’s required time. The slack time of the gate is its required time minus its arrival time. With a small change to the algorithm, we thus obtain the gate-level optimization of the circuits.<br />
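As a minimal sketch of the gate-level timing quantities just defined (the function names and the sample numbers are assumptions made for illustration):<br />

```python
def gate_slack(input_arrival_times, output_required_time):
    """Slack of a gate mapped to a single vertex.

    Arrival time is the latest arrival among the input pins;
    required time is the required time of the output pin.
    """
    arrival_time = max(input_arrival_times)
    return output_required_time - arrival_time


def is_high_vt_candidate(input_arrival_times, output_required_time):
    """A gate is considered only if its slack is strictly positive."""
    return gate_slack(input_arrival_times, output_required_time) > 0


# Hypothetical gate: three input pins, output required at t = 3.0.
example_slack = gate_slack([1.0, 2.5, 2.0], 3.0)
```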

3. IMPLEMENTATION AND<br />

RESULTS<br />

The assignment algorithm has been implemented <strong>in</strong> C++<br />

under signal-path level static tim<strong>in</strong>g analysis environment.<br />

The values of the various transistor parameters have been taken from the TSMC library; the effective channel length is 0.13 µm and the gate oxide thickness is 2.4 nm. The circuit temperature is assumed to be 110 °C. The leakage power table and the delay look-up table are created by HSPICE simulation. In our analysis, the low threshold voltage and the supply voltage of the original circuits are assumed to be 0.2 V and 1.2 V, respectively, and the high threshold voltage during the dual-Vt optimization is assumed to be 0.3 V.<br />

In Figure 3, the labeled signal-paths can be changed to the high threshold voltage, whereas only NAND_A can be changed by gate-level assignment. The leakage power saving for C17 is 16.3% with gate-level optimization and 28.7% with signal-path-level optimization.<br />


Figure 3. Changeable signal-paths <strong>in</strong> C17.<br />

Table 1 shows that gate-level and signal-path-level optimization lead to different leakage power savings for the ISCAS85 circuits. Signal-path-level optimization achieves more leakage reduction because more transistors in the circuit implementation can be assigned to the high threshold voltage.<br />

Table 1. Comparison of leakage power reduction.<br />

ISCAS85 benchmark circuit | Gate-level reduction (%) | Signal-path-level reduction (%)<br />
C432 | 34.5 | 43.3<br />
C499 | 29.2 | 53.1<br />
C880 | 71.8 | 77.4<br />
C1355 | 45.2 | 68.1<br />
C1908 | 55.4 | 66.8<br />
C2670 | 77.3 | 85.5<br />
C3540 | 81.9 | 89.7<br />
C5315 | 69.6 | 78.5<br />
C6288 | 40.3 | 56.5<br />
C7552 | 69.2 | 75.4<br />
Average | 57.44 | 69.43<br />

0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

4. CONCLUSION<br />

In this paper we have proposed a new circuit model for combinational circuits during dual-Vt assignment. The assignment algorithm is sped up by the proper extraction of sub-graphs. By using a delay look-up table and a leakage power table generated by HSPICE simulation, we find that signal-path-level optimization achieves approximately 12 percentage points more leakage power saving than gate-level optimization (69.43% versus 57.44% on average over the ISCAS85 circuits).<br />
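The averages behind this figure can be re-derived from the values listed in Table 1. The short check below only repeats that arithmetic; the variable names are illustrative.<br />

```python
# Leakage reduction (%) per benchmark circuit, in the row order of Table 1.
gate_level = [34.5, 29.2, 71.8, 45.2, 55.4, 77.3, 81.9, 69.6, 40.3, 69.2]
signal_path_level = [43.3, 53.1, 77.4, 68.1, 66.8, 85.5, 89.7, 78.5, 56.5, 75.4]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


average_gate = mean(gate_level)
average_signal_path = mean(signal_path_level)

# Difference in percentage points between the two optimization levels.
average_gain = average_signal_path - average_gate
```

Signal-path-level optimization is better on every circuit in the table, with the per-circuit gain ranging from 5.6 points (C880) to 23.9 points (C499).<br />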


5. REFERENCES<br />

[1] G. Moore, “No exponential is forever: But forever<br />

can be delayed,” IEEE ISSCC Dig. Tech. Papers,<br />

2003, pp. 20 - 23.<br />

[2] Kao J., Narendra S., Ch<strong>and</strong>rakasan A., “Subthreshold<br />

Leakage model<strong>in</strong>g <strong>and</strong> reduction techniques”,<br />

ICCAD, 2002, pp 141 - 149<br />

[3] Z. Chen et. al., “0.18 um dual Vt MOSFET process<br />

<strong>and</strong> energy-delay measurement,” <strong>in</strong> IEDM Dig.,<br />

1996, p. 851.<br />

[4] L. Wei, Z. Chen, <strong>and</strong> K. Roy, “Design <strong>and</strong><br />

Optimization of Dual Threshold Circuits for Low<br />

Voltage, Low Power Applications”, IEEE<br />

Transaction on VLSI Systems, Vol.2. 17, NO. 1,<br />

1999, pp. 16-24.<br />

[5] Vijay Sundararajan, <strong>and</strong> Keshab K. Parhi, “LOW<br />

Power Synthesis of Dual Threshold Voltage CMOS<br />

VLSI Circuits”, Proc. ISLPED, 1999, pp. 363-368.<br />

[6] Nikhil Tripathi, Amit Bhosle, Debasis Samanta, <strong>and</strong><br />

Ajit Pal; “Optimal Assignment of High Threshold<br />

Voltage for Synthesiz<strong>in</strong>g Dual Threshold CMOS<br />

Circuits”, VLSI Design, 2001. Fourteenth<br />

International Conference on ,3-7 Jan. 2001 pp. 227 -<br />

232.<br />

[7] Qi Wang; Vrudhula, S.B.K.; “Algorithms for<br />

m<strong>in</strong>imiz<strong>in</strong>g st<strong>and</strong>by power <strong>in</strong> deep submicrometer,<br />

dual-Vt CMOS circuits”, Computer-Aided Design of<br />

Integrated Circuits <strong>and</strong> Systems, IEEE Transactions<br />

on ,Volume: 21 ,Issue: 3 ,March 2002 pp. 306 - 318 .<br />

[8] Liqiong Wei; Zhanp<strong>in</strong>g Chen; Roy, K.; Yib<strong>in</strong> Ye;<br />

De, V.; “Mixed-Vth (MVT) CMOS circuit design<br />

methodology for low power applications”, Design<br />

Automation Conference, 1999. Proceed<strong>in</strong>gs. 36th ,21-<br />

25 June 1999 pp. 430 - 435 .<br />

[9] S. Devadas, A. Ghosh, <strong>and</strong> K. Keutzer, Logic<br />

Synthesis. New York: McGraw-Hill, 1994.


Novel Improved RTD-Based Implementation of Multi-Threshold Logic Gates<br />

Héctor Pettenghi, María J. Avedillo, and José M. Quintana<br />

Instituto de Microelectrónica de Sevilla, Centro Nacional de Microelectrónica,<br />

Edificio CICA, Avda. Reina Mercedes s/n, 41012-Sevilla, SPAIN<br />

FAX: +34-955056686, E-mail: {hector, avedillo, josem}@imse.cnm.es<br />

Abstract - The augmentation of transistor technologies with Resonant Tunnelling Diodes (RTDs) has demonstrated improved circuit performance, and it has been claimed that it could be the way to extend the lifetime of current technologies. Thus, research on circuit topologies using RTDs and transistors is of critical importance for these emergent technologies. In particular, threshold logic gates (TGs) and multi-threshold gates (MTTGs) have been efficiently implemented. In this paper we propose a novel circuit topology to implement MTTGs which exhibits advantages in terms of speed and power consumption with respect to the previously reported circuit. A comparison between both topologies is carried out for a useful logic block: a gate which simultaneously implements the EXOR and the NAND functions.<br />

Index Terms - Resonant Tunneling Diodes, MOBILE, negative differential resistance, logic gates.<br />

0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

I. INTRODUCTION<br />

Resonant tunnell<strong>in</strong>g diodes (RTDs) are very fast non l<strong>in</strong>ear<br />

circuit elements which have been <strong>in</strong>tegrated with transistors<br />

to create novel quantum devices <strong>and</strong> circuits. The<br />

<strong>in</strong>corporation of tunnel diodes <strong>in</strong>to transistor technologies<br />

has demonstrated improved circuit performance: higher<br />

circuit speed, reduced component count, <strong>and</strong> lowered<br />

power consumption [1].<br />

RTDs exhibit a negative differential resistance (NDR) region in their current-voltage characteristics (Fig. 1a) which can be exploited to significantly increase the functionality implemented by a single gate in comparison to conventional MOS and bipolar technologies, thus reducing circuit complexity. Circuit applications of RTDs are mainly based on the MOnostable-BIstable Logic Element (MOBILE) [2]. The MOBILE, shown in Fig. 1a, is a rising-edge-triggered, current-controlled gate which consists of two RTDs connected in series and driven by a switching bias voltage Vbias. When Vbias is low, both RTDs are in the on-state (or low-resistance state) and the circuit is monostable. Increasing Vbias to an appropriate maximum value ensures that only the device with the lowest peak current switches (quenches) from the on-state to the off-state (the high-resistance state). The output is high if the driver RTD is the one which switches, and it is low if it is the load. Peak currents are proportional to RTD areas. Assuming equal current densities, the smallest RTD is the one which switches. Logic functionality is achieved by embedding an input stage which modifies the peak current of one of the RTDs. An input stage composed of the series connection of an RTD and a FET transistor has been proposed [3]. This is the idea of the MOBILE inverter shown in Fig. 1b. Only when a logic one is applied to the gate of a transistor does its associated RTD contribute to the NDR2 current. RTD areas are selected such that the relation of the peak currents of NDR1 and NDR2 depends on whether the external input signal Vin is “1” or “0”. For Vbias high, the output node maintains its value even if the input changes. That is, this circuit structure is self-latching, which allows pipelining at the gate level without any area overhead associated with the addition of latches, and thus very high throughput.<br />

Figure 1. MOBILE circuits, a) basic MOBILE, b) MOBILE inverter. [The figure also includes the RTD current-voltage characteristic, IRTD (mA) versus VRTD (V), with the peak current, valley current and peak voltage marked.]<br />
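The switching rule of the MOBILE can be summarized in a few lines of code. The sketch below is a behavioural illustration only: the area values are hypothetical, peak current is taken as strictly proportional to area, NDR2 is assumed to be the driver side, and the case of exactly equal peak currents is not modelled.<br />

```python
def mobile_output(load_peak_current, driver_peak_current):
    """Logic value latched on the rising edge of Vbias.

    The RTD with the lowest peak current switches to the off-state.
    The output is high (1) if the driver switches, low (0) if the load does.
    """
    return 1 if driver_peak_current < load_peak_current else 0


def mobile_inverter(v_in, load_area=1.0, driver_area=0.8, input_area=0.4,
                    peak_current_density=1.0):
    """Behavioural sketch of the MOBILE inverter of Fig. 1b.

    The input-stage RTD adds its peak current to the driver side only
    when a logic one turns its series transistor on. All areas are
    hypothetical area factors chosen for the example.
    """
    load_peak = peak_current_density * load_area
    driver_peak = peak_current_density * (
        driver_area + (input_area if v_in else 0.0)
    )
    return mobile_output(load_peak, driver_peak)


inverter_truth_table = {v_in: mobile_inverter(v_in) for v_in in (0, 1)}
```

With these example areas, the driver has the smaller peak current for a logic zero input and the larger one for a logic one input, which yields the inverting behaviour.<br />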


By using as many input stages as input variables, Threshold Gates (TGs) [4] can be realized with the same operating principle [3]. The TG design style is a well-recognized, powerful alternative to standard logic design: because of the intrinsic complexity of the functions performed by TGs, it allows realizations that require fewer gates than standard logic. Recently, a circuit structure for Multi-Threshold Gates (MTTGs) [5], a generalization of TGs able to implement even more complex functions than TGs, has been proposed using three or more series-connected RTDs [6].<br />
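To make the difference between a TG and an MTTG concrete, the following sketch evaluates both kinds of gate at the logic level. It uses the usual textbook convention, assumed here rather than quoted from the paper: a TG outputs 1 when the weighted input sum reaches its threshold, and an MTTG outputs 1 when the sum has crossed an odd number of its thresholds. The weights and thresholds are example values.<br />

```python
def weighted_sum(inputs, weights):
    """Weighted sum of binary inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


def threshold_gate(inputs, weights, threshold):
    """Single-threshold gate: 1 if the weighted sum reaches the threshold."""
    return 1 if weighted_sum(inputs, weights) >= threshold else 0


def multi_threshold_gate(inputs, weights, thresholds):
    """Multi-threshold gate: 1 if an odd number of thresholds is reached."""
    total = weighted_sum(inputs, weights)
    crossed = sum(1 for t in thresholds if total >= t)
    return crossed % 2


def nand2(a, b):
    # NAND as a single-threshold function with negative weights.
    return threshold_gate((a, b), (-1, -1), -1)


def exor2(a, b):
    # EXOR needs two thresholds: 1 only when exactly one input is high.
    return multi_threshold_gate((a, b), (1, 1), (1, 2))


truth_table = {(a, b): (nand2(a, b), exor2(a, b))
               for a in (0, 1) for b in (0, 1)}
```

EXOR is not realizable with one threshold, which is why a two-input NAND-EXOR block is a natural test case for a multi-threshold topology.<br />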

II. CIRCUIT TOPOLOGIES<br />

Fig. 2a shows the circuit diagram of a previously reported two-input logic block implementing simultaneously the NAND and the EXOR operations [7], which employs the topology of MTTGs reported in [6]. Fig. 2b depicts the newly proposed realization. In both circuits the parameter A, determining the area of some of the RTDs, is selected according to the technology and to the required speed-power trade-offs. The circuit in Fig. 2a operates on the basis of current comparison: the NDR with the lowest peak current switches off. Inputs (A and B) modulate the peak currents of both the down (NDR2) and the middle (NDR1) NDRs, although by a different amount, since the areas of the RTDs of the input stages are 2 µm² and 4 µm² respectively (see caption to Figure 2). Thus, concerning NDR1 and NDR2, the same result of the current comparison could be obtained if only NDR1 is modulated with input stages of 2 µm² RTDs. That is what has been done in Fig. 2b (input stages a and b). However, the area relationships of NDR1 and NDR2 with NDR0 must be kept for the circuit in Fig. 2b to implement the same functionality as the circuit in Fig. 2a. To achieve this, 2 µm² input stages common to both NDR1 and NDR2 have been added to the proposed circuit (input stages c and d). Note that the area of some RTDs is reduced with respect to the solution in Fig. 2a, which allows the width of their associated transistors to be reduced, since they are sized such that the RTDs are the current-limiting devices in the series connection of the RTD and the transistor. This idea is the rationale for a new topology for MTTGs. Fig. 3 shows simulation results for the proposed new design. Correct operation (y2 implements NAND and y1 implements EXOR) is obtained.<br />

Figure 2. Two-input NAND-EXOR gate, a) implemented with previously reported topology, b) implemented with new proposed topology. RTD areas are expressed as area factors; an area factor of 1 is equal to an area of 10 µm².<br />

Figure 3. Simulation results for the proposed design shown in Fig. 2b (waveforms Vbias, A, B, y2 and y1).<br />
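The area argument of Section II can be checked with simple bookkeeping. The sketch below is an interpretation of the text, not a circuit simulation: it assumes that each input drives one 2 µm² stage on NDR2 and one 4 µm² stage on NDR1 in the original topology, and one NDR1-only stage (a or b) plus one common stage (c or d), each of 2 µm², in the proposed one. The function names are hypothetical.<br />

```python
STAGE_SMALL_UM2 = 2  # input-stage RTD area of 2 um^2 (area factor 0.2)
STAGE_LARGE_UM2 = 4  # input-stage RTD area of 4 um^2 (area factor 0.4)


def original_modulation(a, b):
    """Input-stage RTD area added to (NDR1, NDR2) in the Fig. 2a topology."""
    active_inputs = a + b
    return STAGE_LARGE_UM2 * active_inputs, STAGE_SMALL_UM2 * active_inputs


def proposed_modulation(a, b):
    """Input-stage RTD area added to (NDR1, NDR2) in the Fig. 2b topology."""
    active_inputs = a + b
    ndr1_only = STAGE_SMALL_UM2 * active_inputs  # stages a and b
    common = STAGE_SMALL_UM2 * active_inputs     # stages c and d
    return ndr1_only + common, common


# Total RTD area spent on input stages in each topology (two inputs).
original_input_area = 2 * (STAGE_LARGE_UM2 + STAGE_SMALL_UM2)
proposed_input_area = 4 * STAGE_SMALL_UM2

same_modulation = all(
    original_modulation(a, b) == proposed_modulation(a, b)
    for a in (0, 1) for b in (0, 1)
)
```

Under these assumptions both topologies present the same modulation to NDR1 and NDR2 for every input combination, while the proposed one uses only 2 µm² RTDs in its input stages.<br />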

III. COMPARISON OF ARCHITECTURES<br />

In order to compare both circuit architectures we have carried out simulations of three-stage chains of these NAND-EXOR gates. The study has been done with w1 = 10 µm, w2 = 2 µm for the circuit in Fig. 2a and w1 = 5 µm, w2 = 2 µm for the circuit in Fig. 2b, with L = 0.6 µm for all transistors. These sizes have been selected to maximize the operating frequency. In all the simulations, LOCOM models for RTDs and transistors have been used [8].<br />

Fig. 4 shows the maximum operating frequency for different values of the bias voltage, taking the RTD area A as a parameter. The results in Fig. 4 indicate that faster designs can be obtained with the new idea. Also, for operating frequencies which can be achieved with both architectures, the new one requires smaller A values.<br />

Fig. 5 shows the power consumption versus operating frequency for different A values. Because of the shape of the maximum operating frequency versus Vbias curve (see Fig. 4), two values of Vbias can result in the same operating frequency, so two branches are observed in these representations. For both circuits power increases with the area factor A, but the circuit incorporating the new idea exhibits less power consumption than the respective original one with the same A value.<br />

Finally, Fig. 6 depicts power consumption versus the operating frequency. Designs with the smallest A value (and so the smallest power consumption) achieving the required operating frequency have been selected. The bias voltage is fixed at 0.7 V. It is clear that the design based on the new idea has less power consumption than the original one.<br />

Figure 4. Maximum frequency (MHz) versus bias voltage Vbias (V) for different A values. Original designs: marks over dashed line. New designs: marks over solid line.<br />

Figure 5. Power consumption (mW) versus maximum operating frequency (MHz) for different A values. Original designs: marks over dashed line. New designs: marks over solid line.<br />

Figure 6. Power consumption (mW) versus maximum operating frequency (MHz).<br />

IV. CONCLUSIONS<br />

A new RTD-based circuit topology for MTTGs has been proposed. The comparison to a previously reported one by means of a NAND-EXOR gate shows significant improvements. The new design can operate at higher frequency and consumes less power.<br />

ACKNOWLEDGEMENT<br />

This effort was partially supported by the Spanish Government under project TEC2004-02948/MIC.<br />

REFERENCES<br />

[1] P. Mazumder, S. Kulkarni, M. Bhattacharya, J.P. Sun and G.I. Haddad, “Digital Circuit Applications of Resonant Tunneling Devices”, Proceedings of the IEEE, Vol. 86, no. 4, pp. 664-686, April 1998.<br />

[2] K.J. Chen, K. Maezawa and M. Yamamoto, “InP-Based High Performance Monostable-Bistable Transition Logic Elements (MOBILEs) Using Integrated Multiple-Input Resonant-Tunneling Devices,” IEEE Electron Device Letters, Vol. 17, no. 3, pp. 127-129, March 1996.<br />

[3] C. Pacha et al., “Threshold Logic Circuit Design of<br />

Parallel Adders Us<strong>in</strong>g Resonant Tunnell<strong>in</strong>g Devices,”<br />

IEEE Trans. on VLSI Systems, Vol. 8, no. 5, pp. 558-<br />

572, Oct. 2000.<br />

[4] S. Muroga, Threshold Logic <strong>and</strong> Its Applications, New<br />

York: Wiley, 1971.<br />

[5] D. R. Har<strong>in</strong>g, “Multi-Threshold Threshold Elements”,<br />

IEEE Trans. on Electronic Computers, Vol. EC-15,<br />

No. 1, pp. 45-65, February 1966.<br />

[6] M.J. Avedillo, J.M. Qu<strong>in</strong>tana, H. Pettenghi, P.M.<br />

Kelly, <strong>and</strong> C.J. Thompson, “Multi-threshold Threshold<br />

Logic Circuit Design Us<strong>in</strong>g Resonant Tunnel<strong>in</strong>g<br />

Devices,” <strong>Electronics</strong> Letters, Vol. 39, pp. 1502-1504,<br />

2003.<br />

[7] H. Pettenghi, M.J. Avedillo, J.M. Qu<strong>in</strong>tana, “Useful<br />

Logic Blocks Based on Clocked Series-Connected<br />

RTDs” Proceed<strong>in</strong>gs. IEEE Conference on Nanotechnology,<br />

pp. 593-595, 2004.<br />

[8] W. Prost et al., EU IST Report LOCOM no. 28 844<br />

Dec. 2000.


0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

VARIATION OF FLASH MEMORY THRESHOLD<br />

VOLTAGE CORRELATED WITH APPLIED VOLTAGE<br />

SLOPE IN FOWLER NORDHEIM ERASE MODE<br />

Valéry Bouquet 1,2 , Pierre Canet 1 , Frédéric Lal<strong>and</strong>e 1 , Jean Dev<strong>in</strong> 2 , Bruno Leconte 2 , Nicolas Mariéma 1<br />

(1) L2MP-UMR CNRS 6137, IMT Technopole de Château Gombert, 13451 Marseille Cedex 13<br />

(2) ST-<strong>Microelectronics</strong>, ZI de Rousset BP 2, 13106 Rousset Cedex, FRANCE<br />

E-mail: valery.bouquet@l2mp.fr<br />

ABSTRACT<br />

In order to pre-evaluate the time needed to write a flash memory cell, we use a simplified expression for the Fowler Nordheim injection current during the erase mode, which allows us to find a relationship between the applied voltages and the resulting threshold voltage. We show in this study that the absolute value of the derivative of the threshold voltage is equal to the positive signal slope applied on the control gate. We validate this assumption by measurements on samples provided by STMicroelectronics.<br />

1. INTRODUCTION<br />

The flash memory is a non-volatile memory made of one stacked transistor with an isolated polysilicon gate, called the floating gate, which leads to a small cell area and a high integration density. In order to pre-evaluate the time needed to erase a flash memory cell, we evaluate the threshold voltage variation as a function of the applied voltages.<br />

2. FLASH MEMORY<br />

The flash memory is a non-volatile memory which stores and keeps binary information without any external electrical supply. It is made of one stacked transistor with an isolated polysilicon gate called the floating gate (Fig. 1a) [1]. We define the following elements: CPP is the inter-polysilicon capacitance between the floating gate and the control gate, Cox is the capacitance between the floating gate and the channel, and IFN is the injected current generator.<br />

(a) (b)<br />

Figure 1. (a) Structure of the stacked transistor<br />

with an isolated polysilicon float<strong>in</strong>g gate (b)<br />

Electrical equivalent schema of the flash memory.<br />

The floating gate charge, QFG, impacts the threshold voltage of the transistor to determine two binary states.<br />

$$V_T = V_{T0} - \frac{Q_{FG}}{C_{PP}} \qquad (1)$$<br />

Where VT is the threshold voltage after injection of charge; VT0 is the initial threshold voltage without injected charge in the floating gate; QFG is the charge injected in the floating gate.<br />

Two mechanisms are used to <strong>in</strong>ject charges through the<br />

th<strong>in</strong> tunnel oxide <strong>in</strong> the float<strong>in</strong>g gate : the positive charge<br />

<strong>in</strong>jection is led by a Fowler Nordheim mechanism (FN)<br />

[2] <strong>and</strong> the negative charge <strong>in</strong>jection is led by channel hot<br />

electron mechanism (CHE) [3, 4]. The positive charge<br />

<strong>in</strong>jection is called the erase mode <strong>and</strong> the negative charge<br />

<strong>in</strong>jection is called the programm<strong>in</strong>g mode.<br />

To reach an erased threshold voltage, we apply a negative signal between the control gate and the well to establish a high electric field across the thin tunnel oxide. In this way, a positive charge is injected into the floating gate by the Fowler Nordheim mechanism [2].<br />

3. MODELING OF ERASING<br />

The equivalent electrical schematic is given in Figure 1b, where CPP is the inter-polysilicon capacitance between the floating gate and the control gate and Cox is the capacitance between the floating gate and the channel. The charge injection is controlled by a Fowler Nordheim injection current given by the following equation [2].<br />

$$I_{FN} = A \times S_{TUN} \times E_{TUN}^{2} \times \exp\left(\frac{B}{E_{TUN}}\right) \qquad (2)$$<br />

Where A (A·V⁻²) and B (V·m⁻¹) are the Fowler Nordheim parameters; STUN is the tunnel area; ETUN is the electric field across the thin tunnel oxide.<br />

Using the electrical schematic (Fig. 1b) and Kirchhoff's law, we obtain expression (3), where VFG is the floating gate potential and VCG is the control gate potential.<br />

$$I_{FN} = (C_{OX} + C_{PP}) \times \frac{dV_{FG}}{dt} - C_{PP} \times \frac{dV_{CG}}{dt} \qquad (3)$$<br />

Without injection (IFN = 0 A), we introduce the coupling ratio, αG (4a). This parameter allows us to express the variation of the floating gate potential as a function of the variation of the control gate potential (4b).<br />

$$\alpha_G = \frac{C_{PP}}{C_{OX} + C_{PP}} \qquad (4a)$$<br />

$$\frac{dV_{FG}}{dt} = \alpha_G \times \frac{dV_{CG}}{dt} \qquad (4b)$$<br />

When injection occurs, the injected charges shift the floating gate potential and expression (4b) is no longer valid.<br />

In order to reach a given erase floating gate voltage, a high coupling ratio makes it possible to decrease the applied erase voltage. When the coupling ratio increases, the tunnel oxide field increases, and with it the tunnel oxide voltage. Moreover, the injected current is larger and the charges are injected faster, so it is possible to decrease the erase time. But this increase of the coupling ratio implies a worse reliability[].<br />

In order to obtain an analytical solution, we solve equation (3) by using a simplified linear expression of the Fowler Nordheim current (5), as shown in Figure 2.<br />

Figure 2. Simplification of the Fowler Nordheim current by introducing VS as threshold potential and g as the current variation for V > VS (injected current in A versus tunnel oxide voltage in V; curves: theoretical IFN and simplified IFN, with g = dIFN/dV).<br />

$$I_{FN} = g \times (V_S - V_{FG}) \qquad (5)$$<br />

In this way we obtain the expression of the floating gate potential, where ts is the time at which the injection begins:<br />

$$V_{FG} = V_S + \left(\frac{1}{g} \times \frac{dV_{CG}}{dt} \times C_{PP}\right) \times \left\{1 - \exp\left(\frac{-g \times (t - t_s)}{C_{OX} + C_{PP}}\right)\right\} \qquad (6)$$<br />

So we can express the maximal floating gate potential, reached when the permanent regime is established, as experimental conditions permit:<br />

$$V_{FGMAX} = V_S + \left(\frac{1}{g} \times \frac{dV_{CG}}{dt} \times C_{PP}\right) \qquad (7)$$<br />

We can now express the variation of the threshold voltage during an erase permanent regime (8).<br />

$$\frac{dV_T}{dt} = -\frac{1}{C_{PP}} \times I_{FN} = -\frac{1}{C_{PP}} \times g \times (V_S - V_{FGMAX}) \qquad (8a)$$<br />

$$\frac{dV_T}{dt} = -\frac{dV_{GC}}{dt} \qquad (8b)$$<br />

We show that the absolute value of the derivative of the threshold voltage during the erase permanent regime is equal to the slope of the control gate potential. Note that this equality is independent of the technological parameters of the cell, such as the coupling ratio αG. However, the time needed to reach the permanent regime does depend on the technological parameters.<br />
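The model of this section can be checked numerically. The sketch below integrates equation (3) with the linear current (5), compares the result with the closed form (6), and evaluates (8a) at the permanent-regime potential (7). All component values (capacitances, g, VS, slope) are hypothetical and chosen only to give convenient numbers; they are not the parameters of the measured cells.<br />

```python
import math


def v_fg_closed_form(t, t_s, v_s, g, c_ox, c_pp, slope):
    """Equation (6): floating-gate potential for t >= t_s."""
    tau = (c_ox + c_pp) / g
    return v_s + (c_pp * slope / g) * (1.0 - math.exp(-(t - t_s) / tau))


def v_fg_euler(t_end, t_s, v_s, g, c_ox, c_pp, slope, steps=20000):
    """Forward-Euler integration of equation (3) with I_FN from (5).

    Starts from V_FG(t_s) = V_S, the onset of injection.
    """
    dt = (t_end - t_s) / steps
    v_fg = v_s
    for _ in range(steps):
        i_fn = g * (v_s - v_fg)                          # equation (5)
        dv_fg_dt = (i_fn + c_pp * slope) / (c_ox + c_pp)  # from equation (3)
        v_fg += dv_fg_dt * dt
    return v_fg


def dvt_dt(v_fg, v_s, g, c_pp):
    """Equation (8a) evaluated at a given floating-gate potential."""
    return -(1.0 / c_pp) * g * (v_s - v_fg)


# Hypothetical parameters (assumptions, not measured values).
C_OX = 1e-15      # F
C_PP = 2e-15      # F
G = 1e-12         # S, slope of the linearized current
V_S = 8.0         # V, onset potential of the linearized current
SLOPE = 1250.0    # V/s, i.e. 1.25 V/ms applied to the control gate

tau = (C_OX + C_PP) / G
t_end = 5.0 * tau
analytic = v_fg_closed_form(t_end, 0.0, V_S, G, C_OX, C_PP, SLOPE)
numeric = v_fg_euler(t_end, 0.0, V_S, G, C_OX, C_PP, SLOPE)

v_fg_max = V_S + C_PP * SLOPE / G            # equation (7)
steady_rate = dvt_dt(v_fg_max, V_S, G, C_PP)  # equals the applied slope
coupling_ratio = C_PP / (C_OX + C_PP)         # equation (4a)
```

In this example the threshold voltage rate in the permanent regime has the same magnitude as the applied slope whatever the capacitance values, while the time constant (COX + CPP)/g that sets how quickly the regime is reached does depend on them.<br />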

4. MEASUREMENTS<br />

We designed three cells to reach different coupling ratios by adjusting the inter-poly capacitive areas of the cell. These cells were fabricated by STMicroelectronics. The technology used imposes all the oxide thicknesses: tunnel oxide thickness, t ox


reaches its maximum and constant value, the injection current is maximal and constant. The permanent regime is established, and the absolute value of the variation of the threshold voltage is the same for all cells and equal to the positive signal slope, as in (8).<br />

Figure 4. Threshold voltage evolution <strong>in</strong> erase<br />

mode for three memory cells with different<br />

coupl<strong>in</strong>g ratios (▲ cell A, α G=0.61 ; ■ cell B,<br />

α G=0.71 ; ● cell C, α G=0.78 ) by three positive<br />

signal slopes (0.625V/ms ; 1.25V/ms ; 2.5V/ms).<br />

Moreover, we observe that the final threshold voltage value depends on both the coupling ratio and the applied signal slope (Table 1). The technological limitation (maximum allowed voltage) and the expected threshold voltage will guide the choice of the erase signal slope and of the coupling ratio giving the shortest erasing time.<br />

Table 1. Final threshold voltage for three cells with different coupling ratios (cell A, αG=0.61; cell B, αG=0.71; cell C, αG=0.78), in erase mode with three positive signal slopes (0.625 V/ms; 1.25 V/ms; 2.5 V/ms) at the same final bulk-control gate potential.<br />

Signal slope | Cell A | Cell B | Cell C<br />
0.625 V/ms | 3.4 V | 2.8 V | 2.1 V<br />
1.25 V/ms | 3.7 V | 3.1 V | 2.5 V<br />
2.5 V/ms | 4 V | 3.4 V | 2.75 V<br />

5. CONCLUSION<br />

In this study, we show with a very simple model of the<br />
Fowler-Nordheim current that the derivative of the<br />
threshold voltage is opposite to the applied signal slope<br />
when the permanent regime is established, with a constant<br />
floating gate potential and a constant injected current.<br />
Neglecting the transitional domain, it is very easy to<br />
evaluate the erasing time necessary to reach a given<br />
threshold voltage for a given applied signal slope.<br />
A more accurate evaluation of this erasing time requires<br />
simulating the transitional behaviour of the cell.<br />
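The first-order estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `erase_time_ms` and the 3.0 V target shift are hypothetical and not taken from the paper, and the threshold voltage is assumed to move at a rate equal in magnitude to the signal slope for the whole erase operation, that is, the transitional domain is ignored.

```python
def erase_time_ms(delta_vt_volts: float, slope_v_per_ms: float) -> float:
    """Erase time estimate that neglects the transitional domain.

    Assumes the threshold voltage changes at a rate whose magnitude
    equals the applied signal slope during the entire erase operation.
    """
    if slope_v_per_ms <= 0:
        raise ValueError("signal slope must be positive")
    return abs(delta_vt_volts) / slope_v_per_ms


# Hypothetical target: a 3.0 V threshold shift (not a value from the paper),
# evaluated for the three signal slopes used in the measurements.
for slope in (0.625, 1.25, 2.5):
    print(f"{slope} V/ms -> {erase_time_ms(3.0, slope):.2f} ms")
```

Because the transitional behaviour is left out, such a figure is a rough estimate only.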


6. REFERENCES<br />

[1] W.D. Brown, J.E. Brewer, Non Volatile Semiconductor Memory Technology, IEEE Press (1998).<br />
[2] R.H. Fowler and L. Nordheim, Proc. R. Soc. London Ser. A 119 (1928) 173.<br />
[3] B. Eitan, D. Frohman-Bentchkowsky, Hot-Electron Injection into the Oxide in N-Channel MOS Devices, IEEE Trans. Electron Devices, ED-28 n°3 (1981) 328-340.<br />
[4] C. Hu, Lucky Electron Model for Channel Hot-Electron Emission, IEDM Tech. Dig., (1979) 22.<br />
[5] P. Canet, R. Bouchakour, F. Lalande, J.M. Mirabel, EEPROM Cell Design: Paradoxical Choice of the Coupling Ratio, J. Non-Cryst. Solids, vol. 322 (2003) p. 246-249.<br />
[6] V. Bouquet, P. Canet, F. Lalande, R. Bouchakour, J.M. Mirabel, Non Volatile Memory Cell Design: Coupling Ratio Impact on Tunnel Oxide Reliability, J. Non-Cryst. Solids (2005) to be published.<br />
[7] P. Canet, F. Lalande, J. Razafindramora, V. Bouquet, J. Postel, R. Bouchakour, J.M. Mirabel, Integrated Reliability in Non Volatile Memory Cell Design, NVMTS’2004 5th Annual Non-Volatile Memory Technology Symposium, Orlando, Florida, November 15-17 (2004) proceeding CD.


0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

TOWARDS A UNIFIED TOP-DOWN DESIGN<br />

FLOW FOR FULLY DIFFERENTIAL LOGIC<br />

BLOCKS WITH IMPROVED SPEED AND NOISE<br />

IMMUNITY<br />

Ilhan Hatirnaz, Stéphane Badel, Yusuf Leblebici<br />

Microelectronic Systems Laboratory (LSM)<br />

Ecole Polytechnique Fédérale de Lausanne (EPFL)<br />

Batiment ELD Station 11, CH-1015 Lausanne, Switzerland<br />

E-mail: {ilhan.hatirnaz, stephane.badel, yusuf.leblebici}@epfl.ch<br />

ABSTRACT<br />

A new top-down design flow (RTL-to-GDSII) is<br />
proposed for achieving high-performance and noise-immune<br />
designs consisting of differential logic blocks.<br />
The differential building blocks are based on current-mode<br />
logic (CML), which offers true differentiality with<br />
low-swing signalling, switching-independent constant<br />
power dissipation and very high-speed operation. The<br />
goal of this flow is to allow effective cancellation of<br />
inductive and capacitive noise in high-speed on-chip<br />
interconnect lines using a simple generic interconnect<br />
architecture.<br />

1. INTRODUCTION<br />

Currently, the implementation of a chip design is an<br />
iterative process guided by the design automation tools<br />
and the conventional linear design flow. In the ideal case,<br />
the limiting factor in achieving the design goals would<br />
come only from the physical limitations of the process<br />
technology used. However, especially since the advent<br />
of deep sub-micron technologies, the design flow and the<br />
electronic design automation (EDA) tools have<br />
become the limiting factors of the final performance<br />
that can be achieved.<br />

There are a number of reasons behind this fact. One reason<br />
is the increasing complexity of the designs, which cannot<br />
be handled by currently available tools. Furthermore,<br />
traditional VLSI design flows may mask some problems,<br />
which only show up during or after the final steps of the<br />
flow (for example, after the detailed routing is finished),<br />
when, in some cases, a dramatic change to the high-level<br />
description of the circuit might be necessary.<br />

In addition to the increasing design complexity, the<br />
fabrication costs of ASICs are rising rapidly as the latest<br />
technologies with much smaller feature sizes are offered.<br />
This leads to a shift from standard-cell based design<br />
to more soft-programmable design solutions, like FPGAs<br />
and/or processor-based solutions, where the mask cost is<br />
reduced to a minimum. Although these solutions help<br />
reduce the total cost and time-to-market, they are not<br />
able to offer performance comparable to ASIC-based<br />
solutions. Recently, to address this problem, structured<br />
ASICs (SA) were introduced [1]. Structured ASICs are<br />
expected to fill the gap between FPGAs and standard-cell<br />
based design approaches. Structured ASICs are based on<br />
a predefined and pre-built logic fabric, which is fabricated<br />
including the interconnect structure consisting of a number<br />
of the bottom metal layers (for example up to Metal3 or<br />
Metal4), while the rest of the metal layers are laid<br />
out later to map the design onto the wafer. These<br />
wafers are stored as base wafers until ordered by<br />
customers.<br />

This project aims to find a generic solution to signal<br />
integrity problems and a differential design flow that<br />
employs a fully differential (differential inputs, differential<br />
outputs) standard cell family based on current-mode logic<br />
(CML). This paper introduces the first step in the search<br />
for a complete solution. A flow is proposed for the<br />
implementation of a design using differential gates or<br />
structures, where the signal integrity issues are<br />
addressed early in the design flow, even in the library<br />
modelling phase. The target of this flow can be<br />
either a standard-cell library or a structured-ASIC based<br />
solution; in this paper only the implementation targeting a cell<br />
library is discussed. The benefits of such a scheme are the<br />
regularity and hence the increased predictability of the<br />
final design and, additionally, shifting the limitations from<br />
the limits of the tools or flows to the process technology<br />
limits. These goals, if achieved, would prevent the<br />
under-utilization of the very deep sub-micron (VDSM)<br />
technologies, which means that designers would be<br />
able to achieve higher clock speeds using the same<br />
technology at either an acceptable or no additional cost,<br />
resulting in a faster time-to-market.<br />

The remainder of this paper is organized as follows: The<br />
proposed differential design flow is introduced in Section<br />
2, including the individual steps and the CML-based cell<br />
library. In Section 3, the test chip including the blocks<br />
designed using this and regular design flows is presented,<br />
followed by the conclusions.<br />

2. DIFFERENTIAL DESIGN FLOW<br />

There are a number of well-known advantages of using<br />
differential signalling in high-performance systems in<br />
terms of signal integrity and noise immunity. The primary<br />
disadvantage of differential signalling is the increased<br />
number of traces per bit of information, which<br />
proportionally increases the cost of the associated routing<br />
and the total silicon area. This is the main reason why<br />
differential signalling and differential gates are used<br />
only in some very high performance<br />
designs (for example, microprocessors [2]) and only in<br />
specific cases, like routing bit lines of RAM structures.<br />
Therefore, there is not as much interest and support from<br />
the EDA world for differential designs as there is for<br />
conventional single-ended cell libraries. There are no<br />
commercial tools that provide differential logic synthesis;<br />
moreover, conventional hardware description languages<br />
do not support differential design entry.<br />

Figure 1 The proposed RTL-to-GDSII differential<br />
design flow. It should be noted that ‘S’ stands for<br />
single-ended and ‘D’ stands for differential in the<br />
descriptions of the different netlists.<br />

2.1 Logic Synthesis<br />

The proposed design flow is given in Figure 1. The main<br />
pieces of this flow are commercially available EDA tools<br />
and a number of netlist conversion scripts. The main input<br />
to the flow is a synthesizable RTL description of the<br />
design. The RTL code does not need to include any<br />
knowledge of differentiality; it only describes the design<br />
in a single-ended manner.<br />


Even if a fully characterized differential cell library<br />
(differential inputs, differential outputs) is available,<br />
current synthesis tools are not able to map<br />
nets to the differential input pairs of gates from this<br />
differential library. To overcome this issue and also make<br />
use of the complementary nature of the differential cell<br />
outputs, a new synthesis library is extracted from the fully<br />
differential library; this new library consists of<br />
single-ended input/differential output (SD) gates. The<br />
logic synthesis tool (Synopsys Design Compiler [3]) is<br />
able to benefit from the differential outputs of the logic<br />
gates offered by the SD cell library, i.e., the tool uses<br />
either both signals (inverted and non-inverted) or one of<br />
them without needing to invert the complementary net of<br />
the pair.<br />

After the mapping process is finished, the synthesized<br />
circuit is written out as a Verilog netlist. This netlist,<br />
consisting of SD gates, is then converted first to a single-ended<br />
input/single-ended output (SS) netlist and then to a fully<br />
differential (DD) Verilog netlist using the netlist conversion<br />
scripts. To check that the same functionality is kept across all<br />
the different netlists of the design, these scripts also provide<br />
a run file to be fed to the equivalence checker tool<br />
(Synopsys Formality [3]) with the Verilog netlists under<br />
comparison.<br />

During logic synthesis, it can happen that only one of the<br />
complementary outputs (either Y or Y´) drives an input of<br />
another gate, whereas the other signal stays floating,<br />
because the inputs are only single-ended. This means that<br />
the loading of the complementary nets in one differential<br />
pair might not be the same. During the SD-to-DD Verilog<br />
netlist conversion, all the SD gates are replaced with their<br />
DD counterparts. Hence, both signals of any output pair<br />
will have the same fan-out. If routed together as a pair, the<br />
differential nets will be exposed to the same wire load and<br />
therefore exhibit similar timing behaviour.<br />
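The fan-out balancing effect of the SD-to-DD conversion can be illustrated with a small Python sketch. It is not the authors' conversion script: the netlist representation, the `_SD`/`_DD` cell naming, the `_n` suffix for generated complement nets and the example gates are all hypothetical, and every gate output is assumed to have a named net even when it is left floating.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SDGate:
    cell: str      # e.g. "AND2_SD" (hypothetical naming scheme)
    inputs: dict   # pin name -> single-ended net name
    y: str         # net on output Y
    yn: str        # net on complementary output Y'


def sd_to_dd(gates):
    """Return DD gates as (cell, {pin: (net, complement)}, (y, yn)) tuples."""
    comp = {}
    for g in gates:
        comp[g.y], comp[g.yn] = g.yn, g.y
    dd = []
    for g in gates:
        pins = {}
        for pin, net in g.inputs.items():
            # Primary inputs have no driver gate, so a complement name is generated.
            pins[pin] = (net, comp.get(net, net + "_n"))
        dd.append((g.cell.replace("_SD", "_DD"), pins, (g.y, g.yn)))
    return dd


def fanout(dd_gates):
    """Count how many gate input pins each net drives in the DD netlist."""
    count = Counter()
    for _, pins, _ in dd_gates:
        for net, net_c in pins.values():
            count[net] += 1
            count[net_c] += 1
    return count


sd = [
    SDGate("AND2_SD", {"A": "a", "B": "b"}, "n1", "n1_n"),
    SDGate("OR2_SD", {"A": "n1", "B": "c"}, "n2", "n2_n"),
    SDGate("BUF_SD", {"A": "n1"}, "n3", "n3_n"),    # n1_n floats in the SD netlist
    SDGate("BUF_SD", {"A": "n2_n"}, "n4", "n4_n"),  # uses the inverted output directly
]
dd = sd_to_dd(sd)
fo = fanout(dd)
print(fo["n1"], fo["n1_n"])  # both nets of the pair now have the same fan-out
```

In this sketch a gate that used the inverted output in the SD netlist simply receives the pair in swapped order, which is one way to obtain the inversion without an extra gate.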

2.2 Placement and Routing<br />

As in the logic synthesis case, tools for routing differential<br />
signals as a differential wire pair do not exist. Some of the<br />
currently available routers can route signals together at a<br />
specific distance from each other, as desired for<br />
differential pair routing, but this feature can be applied<br />
only to a few user-defined nets.<br />

There is not much previous work available on differential<br />
routing; the existing solutions are based on routing each<br />
differential pair as one wider net, where the width of this<br />
“fat” wire is equal to the sum of the individual widths of<br />
the two nets and the spacing between them. This method was<br />
introduced in [4] for use in multi-chip modules<br />
(MCM), and it was adapted to be part of a design flow in<br />
[5], in order to obtain hardware implementations of<br />
crypto algorithms that are secure against differential power analysis<br />
(DPA) attacks.<br />

The inputs to the placement-and-routing (P&R) step are<br />
the Verilog netlist consisting of SS-gates and a LEF file<br />
(Library Exchange Format by Cadence [6]) representing<br />
the “fat-wire” technology and the cell library, in which<br />
each gate has single-ended IO pins. These pins are defined<br />
as “virtual pins” located on a higher level of metal. The<br />
regular P&R flow is followed until a DRC-clean and<br />
logically verified layout is obtained. The output of this<br />
step is a DEF file (Design Exchange Format by Cadence<br />
[6]) describing the final circuit of SS-gates and wide-wire<br />
interconnections. The next step is to run the script that<br />
replaces each SS-cell with its counterpart from the fully<br />
differential DD library and splits the “fat wires” into<br />
two nets of the regular wire width dictated by the original<br />
technology. Then, to complete the connections between<br />
the differential IO pins and the differential nets, the fully<br />
differential DD Verilog netlist is read and a corresponding<br />
connection mask is applied between every pin pair and<br />
the so-called virtual pins. The final step is to verify the<br />
interconnection network by either running LVS or using<br />
an equivalence checker tool.<br />
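The geometry behind the fat-wire step can be sketched as follows. This is a simplified illustration rather than the authors' script: the function names, the segment representation (centre line as `(x1, y1, x2, y2)`) and the numeric width and spacing are hypothetical, only axis-parallel segments are handled, and corner adjustments are ignored.

```python
def fat_width(w: float, s: float) -> float:
    """Width of the fat wire: two regular wires of width w plus the spacing s."""
    return 2 * w + s


def split_fat_segment(seg, w, s):
    """Split the centre line of an axis-parallel fat wire into two centre lines.

    The two regular wires sit symmetrically around the fat-wire centre line,
    each offset by (w + s) / 2, so that their outer edges coincide with the
    edges of the fat wire and the gap between them equals s.
    """
    x1, y1, x2, y2 = seg
    off = (w + s) / 2
    if y1 == y2:   # horizontal segment: offset in y
        return (x1, y1 - off, x2, y2 - off), (x1, y1 + off, x2, y2 + off)
    if x1 == x2:   # vertical segment: offset in x
        return (x1 - off, y1, x2 - off, y2), (x1 + off, y1, x2 + off, y2)
    raise ValueError("only axis-parallel segments are handled in this sketch")


# Hypothetical wire width and spacing in micrometres (not from the paper).
W, S = 0.25, 0.25
print(fat_width(W, S))
print(split_fat_segment((0.0, 1.0, 5.0, 1.0), W, S))
```

A production script would also have to treat bends, vias and the connection to the differential pins, which the text handles with a connection mask.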

The proposed method involves more detailed work with<br />
fewer constraints on the cell design compared to other<br />
methods in previous works, serving the goal of achieving<br />
a noise-immune design solution. During placement, the<br />
tool is allowed to use any symmetry for the cells, which<br />
might save a lot of area. The method allows the designer<br />
to run clock tree synthesis; in fact, it does not prevent the<br />
tool from applying any ECO changes that might be necessary.<br />
Moreover, it can be applied to existing differential cell<br />
libraries with little additional work.<br />

2.3 The Differential Cell Library<br />

A fully differential cell library has been designed and<br />
characterized to be used in logic synthesis and P&R. The<br />
cells are based on current-mode logic (CML), whose<br />
operation is based on the principle of re-directing (or<br />
switching) the current of a constant current source through<br />
a fully differential network of input transistors, and<br />
utilizing the reduced-swing voltage drop on a pair of<br />
complementary load devices as the output. CML circuits<br />
have been introduced as very high speed design<br />
alternatives that offer robust operation, reduced power<br />
supply/common mode noise, and improved immunity<br />
against process variations [7]. The switching delays can be<br />
significantly reduced due to the limited output voltage swing,<br />
while fully differential inputs and outputs contribute to<br />
improved noise immunity and robustness. In addition, the<br />
power dissipation of the MCML gate remains virtually<br />
independent of the switching frequency, which means that<br />
the power dissipation at higher operating frequencies is<br />
actually lower than that of an equivalent CMOS gate<br />
under the same output load conditions.<br />

Figure 2 shows a generic 3-input CML gate. The transistor<br />
M1, driven by a fixed voltage (“nbias”), provides the<br />
current for the circuit, and the transistors M2 and M3,<br />
driven by another fixed voltage (“pbias”), act as resistors.<br />
The generic function of this structure is Y = AB + A'C<br />
( Y' = [A' + B'] [A + C'] ), which corresponds to a multiplexer<br />
operation. All the 2-input logic functions, including the XOR<br />
operation, and some of the 3-input logic functions can be<br />
realized using only one generic CML gate, by assigning<br />
the input pairs to appropriate logic levels.<br />
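The multiplexer behaviour of the generic gate can be checked with a short truth-table sketch in Python. This models only the Boolean function Y = AB + A'C, not the circuit; the function names and the particular input assignments chosen for AND, OR and XOR are illustrative assumptions. Complemented inputs are assumed to be free in a differential library, since inverting a signal amounts to swapping the two wires of its pair.

```python
from itertools import product


def cml_gate(a: int, b: int, c: int):
    """Boolean model of the generic gate: returns (Y, Y') with Y = AB + A'C."""
    y = (a & b) | ((1 - a) & c)
    return y, 1 - y


# The complementary output agrees with Y' = (A' + B')(A + C') for all inputs.
for a, b, c in product((0, 1), repeat=3):
    _, yn = cml_gate(a, b, c)
    assert yn == ((1 - a) | (1 - b)) & (a | (1 - c))


def and2(a, b):
    return cml_gate(a, b, 0)[0]        # C tied to logic 0


def or2(a, b):
    return cml_gate(a, 1, b)[0]        # B tied to logic 1, C carries the second input


def xor2(a, b):
    return cml_gate(a, 1 - b, b)[0]    # B input pair swapped to obtain B'


for a, b in product((0, 1), repeat=2):
    print(a, b, and2(a, b), or2(a, b), xor2(a, b))
```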


Figure 2 Transistor-level view of the generic<br />

CML gate.<br />

The CML cell library consists of a limited number of basic<br />
logic gates, including buffers, latches and resettable flip-flops.<br />
Each cell is designed with six different drive<br />
strengths. Typical CML two-input gate delays are found to<br />
be between 90ps, where a full-adder delay is<br />
approximately 50ps.<br />

Figure 3 The layout view of the generic CML<br />
gate, including two neighboring cells of the same<br />
type, which do not have all the physical layers<br />
visible.<br />

The layout view of the generic CML gate is given in<br />
Figure 3, together with two neighboring instances of the same<br />
cell. The total size of the given layout is 22.0 um x<br />
24.5 um, where one cell has a height of 22.0 um and a<br />
width of 8.5 um. The cells use only the lowest two<br />
metal layers; hence, the upper levels are free for<br />
routing purposes.<br />

The main contribution of CML gates with respect to the<br />
predictability of the proposed flow is based on the fact that<br />
there is virtually no change in the current drawn by the gate,<br />
even when the inputs switch. This makes it easier to overcome<br />
some signal integrity problems like IR-drop and electromigration<br />
(EM), simply because they become easier to calculate or<br />
can be known at the very first steps of the design flow.<br />

3. HARDWARE IMPLEMENTATION<br />

To test and to evaluate the proposed design flow, a test<br />
chip is produced which consists of three different<br />
realizations of the same RC4 block [9], as listed below:<br />

1. RC4_ART_SER: Implemented using a<br />
commercially available single-ended CMOS<br />
standard-cell library.<br />

2. RC4_CML_FDF: Implemented using the fully differential<br />
cells, placed and routed according to<br />
the proposed flow.<br />

3. RC4_CML_SDF: Implemented using the CML-based<br />
fully differential library, with the same cell<br />
placement as in RC4_CML_FDF, but without fully differential<br />
routing.<br />

The layout of the test chip is shown in Figure 4. One key<br />
observation is the difference in size between the single-ended<br />
implementation ‘RC4_ART_SER’ (located at the<br />
top-left corner) and the differential implementations.<br />
RC4_ART_SER occupies an area of 400um x 400um,<br />
whereas the differential circuits have dimensions of<br />
approximately 1mm x 1mm, roughly six times the area. This difference is caused<br />
mainly by the area difference between the cells of the two<br />
libraries and, of course, by the need to route two nets<br />
instead of one. In return, it is expected that the fully<br />
differential version of the circuit exhibits significantly<br />
improved signal integrity characteristics and, hence,<br />
higher operating speed. Complete experimental<br />
characterization of the circuits will be done following the<br />
fabrication of the test chip.<br />

Figure 4 The top-level layout of the test chip<br />

designed with 0.18um CMOS technology.<br />


Figure 5 A closer view of the layout of the RC4<br />
block that is routed with fully differential wiring. The<br />
cell layouts are omitted for better visibility of the<br />
differential routing.<br />

4. CONCLUSION<br />

In this paper, a design flow for implementing fully<br />
differential designs is proposed. It is shown that following<br />
this flow leads to successful final layouts which are DRC<br />
and LVS clean. The next step in this work is to show that the<br />
signal integrity issues encountered with deep sub-micron<br />
technologies can be reduced to an acceptable<br />
level.<br />

5. REFERENCES<br />

[1] B. Zahiri, Structured ASICs: Opportunities and Challenges, Proceedings of ICCD’03, 2003.<br />
[2] P. E. Gronowski, W.J. Bowhill, R.P. Preston, M.K. Gowan, R.L. Allmon, High Performance Microprocessor Design, IEEE JSSC, vol. 33, Issue 5, May, 1998, pp. 676-686.<br />
[3] Synopsys, Inc., http://www.synopsys.com<br />
[4] Loy, J.; Garg, A.; Krishnamoorthy, M.; McDonald, J., Differential Routing of MCMs-CIF: the ideal bifurcation medium, Electrical Performance of Electronic Packaging, 2004. IEEE 13th Topical Meeting on 25-27 Oct. 2004 Page(s):135 – 138.<br />
[5] Tiri, K.; Verbauwhede, I., A VLSI design flow for secure side-channel attack resistant ICs, DATE 2005 Proceedings Page(s):58 - 63 Vol. 3.<br />
[6] Cadence Design Systems, http://www.cadence.com<br />
[7] P. Heydari, Design and Analysis of Low-Voltage Current-Mode Logic Buffers, 4th International Symposium on Quality Electronic Design, March 2003.<br />
[8] I. Hatirnaz, Y. Leblebici, Twisted-Differential On-Chip Interconnect Architecture for Inductive/Capacitive Crosstalk Noise Cancellation, Proceedings of Int. Symposium on System-on-Chip (SOC), Finland, 2003.<br />
[9] P. Hartmann, S. Witt, Vergleichende Analyse und VHDL-basierte Implementation von Zufallszahlengeneratoren auf Chipkarten, Studienarbeit, University of Hamburg, 2001.


A High-Speed Scalable Shift-Register Based On-Chip<br />

Serial Communication Design for SoC Applications<br />

0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

I-Chyn Wey, You-Gang Chen, Chia-Tsun Wu, Wei Wang, and An-Yeu Wu<br />

Graduate Institute of Electronics Engineering and Department of Electrical Engineering, National Taiwan University<br />

No.1, Sec. 4, Roosevelt Road, Taipei 106, Taiwan<br />

Abstract: In this paper, a high-speed, scalable<br />
on-chip serial transmission design is proposed to<br />
provide 2Gb/s transmission bandwidth for SoC<br />
applications. By using the dynamic control<br />
technology and the single-phase pulse-triggered<br />
TSPC shift register design, we can provide<br />
high-speed on-chip serial transmission. Moreover,<br />
the shift register design is scalable. By<br />
using the proposed method, we can provide 3 times<br />
wider bandwidth than the prior art<br />
design [6].<br />

I. INTRODUCTION<br />

System-on-a-chip (SoC) designs provide a feasible and<br />
economical way to integrate complex systems on a single<br />
chip. However, the exponential growth in speed and<br />
integration levels of intellectual properties (IPs) has<br />
increased the interconnection complexity, which dominates<br />
the performance of SoC designs [1-6]. As a result, On-Chip<br />
Networks (OCN) have been actively studied recently to<br />
reduce the wire complexity and solve the scalability<br />
problems of bus-based SoC communication [2-4].<br />
Serial communication, in turn, is the key technology<br />
to overcome the wire complexity and make the<br />
implementation of OCN possible and practical [5,6].<br />
However, the bandwidth of OCN transmission is<br />
limited by the serial communication clock frequency, so<br />
high-speed design becomes the critical demand in the<br />
on-chip serial communication architecture. Therefore, we<br />
propose a new technology to overcome the speed bottleneck<br />
in the on-chip serial communication design.<br />

As illustrated in Fig. 1(a), on-chip parallel-shared bus<br />
communication requires enormous wire interconnections<br />
between IPs. In our proposed on-chip high-speed serial<br />
communication architecture, as illustrated in Fig. 1(b), wire<br />
complexity can be reduced by 90% and the speed<br />
bottleneck can be overcome to provide 2Gb/s transmission<br />
bandwidth. Moreover, the proposed design can easily be<br />
scaled for higher-bandwidth designs.<br />

II. PROPOSED ON-CHIP SERIAL<br />

COMMUNICATION DESIGN<br />

The proposed high-speed on-chip serial communication<br />
architecture consists of a transmitter with a parallel-to-serial<br />
converter and a receiver with a serial-to-parallel<br />
converter, as illustrated in Fig. 2(a).<br />

Email: archi@access.ee.ntu.edu.tw<br />


[Fig. 1 panels: (a) On-Chip Parallel-Shared Bus Communication; (b) On-Chip High-Speed Serial Communication. Block labels: IP0 to IP3 (Transmit Data, Receive Data), High-Speed On-Chip Serial Communication Interface, Arbiter.]<br />

Fig. 1: Comparison of wire complexity between the<br />
parallel-shared bus and the proposed on-chip high-speed<br />
serial communication architecture<br />

A. The Proposed Transmitter Design<br />

The transmitter, as illustrated in Fig. 2(b), is composed<br />
of the ring oscillator, the shift register, and the control circuit.<br />
The ring oscillator is used to generate the high-speed clock<br />
(int_clk) to synchronize the serial transmission data. In our<br />
design, the ring oscillator can oscillate above 2 GHz to<br />
provide high transmission bandwidth. In [6], the speed<br />
of the transmitter is limited by the control circuit<br />
and the counter because of critical synchronization timing<br />
constraints. Therefore, we propose the shift register based<br />
transmitter to overcome the speed bottleneck.<br />

In the transmitter, the number of serial transmission bits is determined by the number of shift register stages. In the shift register, each bit is built from a Pulse-Triggered True-Single-Phase-Clocking D Flip-Flop (PTTFF) [7] with a delay buffer. The PTTFF is adopted as a fast register with a shorter set-up time. In this paper, the shift register is designed to be single-phase, which is realized by connecting the external clock (ext_clk) to the shift register input. The single-phase design provides a wider synchronization timing constraint tolerance to meet the high-speed demand. Moreover, the single-phase design makes the shift register scalable.

Fig. 2: The proposed high-speed, reliable on-chip serial communication design: (a) proposed high-speed on-chip serial communication architecture; (b) on-chip serial communication transmitter (control circuit, ring oscillator, single-phase shift register with outputs d1 to d10); (c) on-chip serial communication receiver (ring oscillator int_clkre with dummy cell).

The control circuit synchronizes int_clk between the transmitter and the receiver. The oscillator starts on the positive edge of ext_clk and stops after a fixed latency set by the shift register. In the proposed dynamic control circuit, the control signal is quickly precharged to “high” to trigger int_clk. When the final-stage shift register output (d10) changes from logic “0” to logic “1”, the control signal is quickly pulled down to stop int_clk.
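The start/stop behaviour described above can be sketched behaviourally. The following Python model is only an illustration under stated assumptions, not the authors' circuit: it treats each int_clk rising edge as one shift, holds the shift register input (ext_clk) at logic 1 for the whole burst, ignores all analog delays, and uses the hypothetical names `transmit_burst` and `n_stages`.

```python
def transmit_burst(n_stages=10):
    """Count int_clk pulses in one burst of a behavioural transmitter model.

    Assumptions (not from the paper): ideal edges, no analog delays, and
    ext_clk stays high for the whole burst.
    """
    stages = [0] * n_stages   # shift register outputs d1..dN, cleared before the burst
    control = 1               # precharged high on the positive edge of ext_clk
    ext_clk = 1               # the shift register input is tied to ext_clk
    pulses = 0
    while control:            # the ring oscillator runs only while control is high
        pulses += 1           # one int_clk rising edge
        stages = [ext_clk] + stages[:-1]   # every stage shifts by one position
        if stages[-1] == 1:   # final stage (d10 for N = 10) went 0 -> 1
            control = 0       # control is pulled down, stopping int_clk
    return pulses, stages


if __name__ == "__main__":
    print(transmit_burst(10)[0])
```

In this model the number of clock pulses per burst equals the number of stages, which mirrors the statement that the number of serial bits is set by the length of the shift register.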

B. The Proposed Receiver Design

The receiver is built around a ring oscillator with nearly the same frequency as int_clk, as illustrated in Fig. 2(c). To obtain the same oscillation frequency in the receiver (int_clkre), a dummy cell with the same size as the loading on int_clk is inserted on int_clkre.

III. HIGH-SPEED SCALABLE DESIGN AND TIMING ANALYSIS

To meet the speed constraint and wide-bandwidth demand in SoC, we propose a single-phase pulse-triggered TSPC shift-register design in the on-chip serial communication transmitter to replace the counter-based transmitter in [6]. The timing constraint analysis of the proposed single-phase shift register is illustrated in Fig. 3. Thanks to its single-phase operation, the proposed design provides a larger synchronization timing constraint tolerance: operating at about 2.5GHz, it provides a tolerance range of 0.151ns. Every single-bit stage of the shift register can be treated identically, and we only need to consider the case of reading logic “1” in the D flip-flop. Therefore, the design scales easily to any number of transmission bits to meet future SoC demand.

Fig. 3: The timing constraint analysis of the proposed single-phase shift register. Diagram labels: synchronization timing constraint tolerant region (Safe_Region), Transient_time, Hold_time, Buffer_delay, Setup_time. Annotated values: 0.082ns, 0.394ns, 0.151ns, 0.212ns, 0.031ns.
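As a quick consistency check on the values annotated in Fig. 3, the sketch below assumes that 0.394ns is the int_clk period at 2.54GHz and that the 0.151ns safe region is what remains of that period after subtracting the 0.212ns and 0.031ns intervals. This reading of the figure is an assumption: the extracted text does not pair each value with a label, and the helper names are hypothetical.

```python
def period_ns(freq_ghz):
    """Clock period in nanoseconds for a frequency given in GHz."""
    return 1.0 / freq_ghz


def safe_region_ns(period, *consumed):
    """Slack left in one period after subtracting the consumed intervals (ns)."""
    return period - sum(consumed)


if __name__ == "__main__":
    t_clk = round(period_ns(2.54), 3)   # int_clk frequency reported in Table 1
    print(t_clk, round(safe_region_ns(t_clk, 0.212, 0.031), 3))
```

Under these assumptions the arithmetic reproduces the 0.151ns tolerance quoted in the text.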

In the asynchronous counter (ripple counter), as illustrated in Fig. 4(b), the delay time is proportional to the bit number, because each counter stage is triggered by the output of the previous stage. As a result, the speed drops as the bit number increases. In the synchronous counter, as illustrated in Fig. 4(c), every stage is clocked in parallel at the same time. However, the circuit complexity of the counter rises as the bit number increases, and the delay time increases as well. In the proposed shift-register-based design, as illustrated in Fig. 4(a), the delay time is constant and independent of the number of serial transmission bits. The speed comparison of the counter-based design and the proposed shift-register-based design is shown in Fig. 5. With a 10-bit shift register, the speed improves by about 80% compared with a 10-bit counter. Consequently, the design is suitable for wide-bandwidth SoC applications. Furthermore, it can be easily extended to any number of transmission bits.
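The scaling argument can be illustrated with a small behavioural model. The sketch below assumes an ideal ripple counter in which a stage clocks the next one only when it falls from 1 to 0, and counts how many flip-flops must toggle one after another for a single increment. The function names are hypothetical and no real gate delays are modelled.

```python
def ripple_toggle_depth(value, n_bits):
    """Flip-flops that toggle in sequence when an n-bit ripple counter increments."""
    depth = 0
    for bit in range(n_bits):
        depth += 1                      # this stage toggles
        if not (value >> bit) & 1:      # it rose 0 -> 1, so it does not clock the next stage
            break
    return depth


def worst_case_depth(n_bits):
    """Longest toggle chain over all counter states."""
    return max(ripple_toggle_depth(v, n_bits) for v in range(2 ** n_bits))


def shift_register_depth(n_bits):
    """Every stage of a shift register is clocked in parallel, so the depth is always one."""
    return 1


if __name__ == "__main__":
    for n in (4, 10):
        print(n, worst_case_depth(n), shift_register_depth(n))
```

The worst-case chain of the ripple counter grows with the bit number, while the shift register depth stays at one stage, which is the property the text relies on.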

To generate a fast control signal, we propose a new dynamic control technique. Its operation timing is shown in Fig. 6. In the precharge period, int_clk is triggered by the positive edge of ext_clk (the negative edge of ext_clkbar). In the evaluation period, owing to the single-phase property, the shift register outputs always change from logic “0” to logic “1”. Therefore, there is no charge loss through the pull-down network even though the signal switches during the evaluation period of the dynamic control circuit. Consequently, the final-stage shift register output can be connected directly to the dynamic pull-down nMOS. In addition, at the output of the control circuit, we add four transistors to form a feedback latch that boosts the gain and avoids a floating node. The control signal can be pulled down quickly to stop int_clk because only one nMOS is connected in series with the evaluation nMOS in the pull-down network and the precharge pMOS has already been turned off.

Fig. 4: Comparisons of the shift-register circuit and the counter circuit: (a) shift register; (b) asynchronous counter; (c) synchronous counter.

Fig. 5: The speed comparison of the counter-based design and the proposed shift-register-based design.

Fig. 6: The operation timing analysis of the proposed dynamic control technique (signals: ext_clk, ext_clkbar, d10, control, CLK; precharge and evaluation periods, both without charge loss).

IV. PERFORMANCE COMPARISONS AND SIMULATION RESULTS

The detailed performance comparison of various on-chip serial communication designs is given in Table 1, and the speed timing comparison is illustrated in Fig. 7. All comparison results are based on post-layout simulations obtained from HSPICE.

With a parallel-to-serial ratio of 10, on-chip serial communication removes 90% of the interconnection wires between IPs. In the proposed design, the critical delay time is reduced to 4.25ns, an improvement of about 65%, which provides about 3 times the bandwidth. The delay of the proposed single-phase shift-register-based serial transmitter is reduced to 0.243ns, about 80% faster than the counter-based transmitter in [6]. The proposed dynamic control technique speeds up the control circuit by about 20%. Moreover, the proposed design offers good scalability and wide timing constraint tolerance. Fig. 7 shows that the proposed design operates fastest because of the fast serial data transmission of the proposed single-phase shift-register-based transmitter.

Table 1: Performance comparisons of various on-chip serial communication designs

| Metric | This Work | Async [6] | Sync [6] |
| --- | --- | --- | --- |
| Tr. Count | 384 | 394 | 562 |
| Wire Reduction | 90% | 90% | 90% |
| P/S Ratio | 10 | 10 | 10 |
| Tcritical (ns) | 4.25 | 13.18 | 12.12 |
| Improvement (Tcritical) | | 67.75% | 64.93% |
| Int_Clk Frequency | 2.54GHz | 0.73GHz | 0.80GHz |
| Bandwidth | 2Gb/s | 0.645Gb/s | 0.690Gb/s |
| Improvement (Bandwidth) | | 3.10X | 2.90X |
| T_transmitter | 0.243 | 1.213 | 1.134 |
| Improvement (T_transmitter) | | 79.97% | 78.57% |
| T_control | 0.166 | 0.214 | 0.214 |
| Improvement (T_control) | | 22.43% | 22.43% |
| Energy (uW/MHz) | 1.65 | 1.63 | 1.81 |
| Scalability | Good | Worse | Worse |
| Timing tolerance | Wide | Narrow | Narrow |
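The improvement rows of Table 1 can be reproduced from the raw entries. The sketch below is only a cross-check: it assumes that each percentage is the reduction relative to the counter-based design and that each bandwidth factor is a plain ratio. The helper and constant names are hypothetical.

```python
def reduction_pct(baseline, proposed):
    """Percentage reduction of `proposed` relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


def ratio(proposed, baseline):
    """How many times larger `proposed` is than `baseline`."""
    return proposed / baseline


# Values copied from Table 1 in the order (this work, async [6], sync [6]).
T_CRITICAL = (4.25, 13.18, 12.12)
BANDWIDTH = (2.0, 0.645, 0.690)
T_TRANSMITTER = (0.243, 1.213, 1.134)
T_CONTROL = (0.166, 0.214, 0.214)   # unit not stated in the table

if __name__ == "__main__":
    for name, row in (("Tcritical", T_CRITICAL),
                      ("T_transmitter", T_TRANSMITTER),
                      ("T_control", T_CONTROL)):
        print(name, [round(reduction_pct(b, row[0]), 2) for b in row[1:]])
    print("Bandwidth", [round(ratio(BANDWIDTH[0], b), 2) for b in BANDWIDTH[1:]])
    print("Wire reduction at P/S ratio 10:", reduction_pct(10, 1))
```

The same `reduction_pct` helper also gives the wire reduction: replacing 10 parallel wires with 1 serial wire is a 90% reduction.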

Fig. 7: The speed timing comparison of various on-chip serial communication designs (Proposed, Async, Sync), broken down into control, serial data transmission, wire delay, and receive data on a time axis from 0 to 14 ns.


Fig. 8: The layout of the proposed high-speed, scalable on-chip serial communication design (transmitter, experimental long wire, receiver).

Fig. 9: Waveform of the proposed high-speed, scalable on-chip serial communication design (signals: ext_clk, int_clk, int_clkre, ipre1 to ipre10, together with the input and output register contents).

Table 2: Performance Summary

| Parameter | Value |
| --- | --- |
| Process | UMC 0.18um |
| Supply Voltage | 1.8V |
| Transistor count | 384 |
| Transmission Bandwidth | 2Gb/s |
| Int_clock Frequency | 2.54GHz |
| Ext_clock Frequency | 200MHz |
| Implementation Area | 180um * 40um |
| Experiment Wire Length | 6mm |

To verify the function and performance in silicon, we laid out the proposed design in the UMC 0.18um process, as illustrated in Fig. 8. A 6mm experimental long wire is drawn to emulate the SoC communication environment. The function is verified to be correct at a 2Gb/s transmission rate, as demonstrated in Fig. 9. Finally, Table 2 shows the performance summary of the proposed high-speed, scalable on-chip serial communication design. The total transistor count is 384. In the UMC 0.18um process, the proposed design supports 2Gb/s transmission bandwidth with a 2.54GHz int_clk and a 200MHz ext_clk. The total implementation area is 180um*40um, excluding the experimental wire area.
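The reported figures can be cross-checked with simple arithmetic. The sketch below assumes (this is an interpretation, not a statement from the paper) that one 10-bit word is serialized per ext_clk cycle, so that the bandwidth equals the parallel-to-serial ratio times the ext_clk frequency, and that the serial burst must fit inside one ext_clk period. The function names are hypothetical.

```python
def serial_bandwidth_gbps(ps_ratio, ext_clk_mhz):
    """Bits per ext_clk cycle times ext_clk frequency, in Gb/s."""
    return ps_ratio * ext_clk_mhz / 1000.0


def burst_time_ns(ps_ratio, int_clk_ghz):
    """Time for `ps_ratio` int_clk cycles, in ns."""
    return ps_ratio / int_clk_ghz


def ext_clk_period_ns(ext_clk_mhz):
    """Period of ext_clk in ns."""
    return 1000.0 / ext_clk_mhz


if __name__ == "__main__":
    print(serial_bandwidth_gbps(10, 200))   # P/S ratio 10, ext_clk 200MHz
    print(burst_time_ns(10, 2.54), ext_clk_period_ns(200))
```

Under these assumptions the 2Gb/s figure follows directly from the 200MHz ext_clk, and both the ten int_clk cycles and the 4.25ns critical delay of Table 1 fit within one ext_clk period.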

V. CONCLUSIONS<br />

In this paper, a high-speed, scalable on-chip serial communication interface design is proposed in the UMC 0.18um process. The serial communication clock is designed to work correctly at 2.54GHz to provide 2Gb/s transmission bandwidth for SoC applications. By using the dynamic control technique, we generate a fast and reliable control signal with about 20% improvement in speed. By using the single-phase pulse-triggered TSPC shift register design, the speed is improved by about 65%, providing about 3 times the serial transmission bandwidth. Moreover, the proposed design scales easily.

REFERENCES<br />

[1] Ron Ho, Kenneth W. Mai, and Mark A. Horowitz, “The Future of Wires,” Proceedings of the IEEE, vol. 89, no. 4, pp. 490-504, 2001.

[2] L. Benini and Giovanni De Micheli, “Networks on Chips: A New SoC Paradigm,” IEEE Computer, vol. 35, no. 1, pp. 70-78, 2002.

[3] K. Lahiri, et al., “Design of High-Performance System-on-Chips using Communication Architecture Tuners,” IEEE Transactions on CAD/ICAS, vol. 23, no. 5, pp. 620-636, 2004.

[4] F. Karim, et al., “On-Chip Communication Architecture for OC-768 Network Processors,” Proc. DAC, pp. 678-683, 2001.

[5] Se-Joong Lee, et al., “An 800MHz Star-Connected On-Chip Network for Application to Systems on a Chip,” ISSCC, pp. 468-469, 2003.

[6] Shinji Kimura, et al., “An On-Chip High Speed Serial Communication Method Based on Independent Ring Oscillators,” ISSCC, pp. 390-391, 2003.

[7] J. S. Wang, Po-Hui Yang, and Duo Sheng, “Design of a 3-V 300MHz Low-Power 8-b X 8-b Pipelined Multipliers Using Pulse-Triggered TSPC Flip-Flops,” IEEE JSSC, vol. 35, pp. 583-592, April 2000.



FULLY DIGITAL FEEDFORWARD DELTA-SIGMA MODULATOR

Ahmed Gharbiya and D. A. Johns

Department of Electrical and Computer Engineering, University of Toronto, 10 King’s College Road, Toronto, Ontario, Canada, M5S 3G4

E-mail: gharbiya@eecg.utoronto.ca

ABSTRACT<br />

A delta-sigma modulator using a fully digital feedforward path is proposed. The new modulator retains the advantages of the conventional input feedforward structure and adds the elimination of the analog adder at the quantizer input and an improved signal-to-noise ratio. The new modulator requires an extra quantizer and digital adders.

1. INTRODUCTION<br />

Delta-sigma (∆Σ) modulators are widely used for high-resolution, low-to-moderate-bandwidth analog-to-digital converters (ADCs). With the continuing advancement of CMOS technology and the demand for higher bandwidth, the analog components of ∆Σ modulators are increasingly difficult to implement. The input feedforward concept is a method of relaxing the requirements on the analog blocks [1, 2]. For example, Fig. 1 shows a conventional chain of integrators with distributed feedback (CIFB) structure (solid line). This structure can be modified to include an input feedforward (CIFB-IF) path (dashed line), which removes the input-signal component from the output nodes of the integrators, relaxing the headroom requirements and possibly allowing more efficient opamp architectures to be used. Also, distortion becomes independent of the input signal, which relaxes the linearity requirements. Both of these advantages can reduce power dissipation.

Fig. 1: Second-order ∆Σ modulator (two integrator blocks, each $z^{-1}/(1-z^{-1})$).

Different input feedforward architectures have been proposed [1-3] and most are implemented completely in the analog domain. Recently, a partially digital input feedforward modulator was proposed [3]. This paper presents a modulator with the input feedforward path implemented completely in the digital domain.

The outline of this paper is as follows. Section 2 describes the new fully digital feedforward concept using a first-order ∆Σ modulator. Section 3 evaluates the performance of the proposed technique using a second-order ∆Σ modulator. Section 4 presents a simplified fully digital feedforward technique. Finally, conclusions are presented in section 5.

2. PROPOSED TOPOLOGY<br />

The fully digital feedforward (FDFF) structure is illustrated for a first-order modulator in Fig. 2. The feedforward path (dashed line) consists of an additional ADC (ADC2) with a reference voltage Vref,ADC2, which can differ from the reference voltage of the main ADC (ADC1), Vref,ADC1. In addition, the number of bits in ADC2 can differ from that of ADC1. An additional digital-to-analog converter (DAC) is needed to feed the weighted digitized input signal into the loop.

Fig. 2: First-order Fully Digital Feedforward ∆Σ modulator (blocks: ADC1, ADC2, two DACs, integrator $z^{-1}/(1-z^{-1})$; coefficients a1, b1, b2, c1; reference voltages Vref,ADC1 and Vref,ADC2; input x, output y).

The modulator can be simplified further as shown in Fig. 3, where the extra DAC is eliminated by performing the addition in the digital domain. As will be shown shortly, the coefficient values feeding the integrator from the DAC are equal; therefore, the signal processing does not involve any multiplications, and only additions are required.

Fig. 3: Modified first-order FDFF ∆Σ modulator (integrator block $z^{-1}/(1-z^{-1})$; digital output coefficient c2).


The noise transfer function of ADC1 is not affected and is first-order noise shaped. Quantization noise from ADC2 is completely cancelled and doesn’t appear at the output of the modulator; however, it appears at the output of the integrator. The signal at the output of the integrator doesn’t contain any input-signal component, only quantization noise. A linear analysis of the system leads to the following results:

$$y = z^{-1}x + \left(1 - z^{-1}\right) q_1 \tag{1}$$

$$\mathrm{out}_{\mathrm{integrator}} = -z^{-1} q_1 - z^{-1} q_2 \tag{2}$$

for the following choice of coefficients:

$$a_1 = 1,\quad b_1 = 1 - \alpha,\quad b_2 = \alpha,\quad c_1 = 1 - \alpha z^{-1},\quad c_2 = z^{-1}$$

where q1 is the quantization noise from ADC1, q2 is the quantization noise from ADC2, and α is a constant.
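The two linear-analysis results, y = z^-1 x + (1 - z^-1) q1 at the output and -z^-1 q1 - z^-1 q2 at the integrator output, can be checked numerically with a short time-domain sketch. The code below is an illustration under stated assumptions rather than the authors' simulation: it uses the α = 0 arrangement (the ADC2 code and the ADC1 code are summed digitally, fed back through one DAC, and the delayed ADC2 code is added at the output), an ideal delaying integrator, and a hypothetical 8-level uniform quantizer with ±0.5 references.

```python
import math


def quantize(v, vref=0.5, levels=8):
    """Uniform quantizer with `levels` outputs spanning [-vref, +vref] (hypothetical)."""
    step = 2.0 * vref / (levels - 1)
    index = min(max(round((v + vref) / step), 0), levels - 1)   # clip to the available codes
    return -vref + index * step


def simulate_fdff(x):
    """First-order FDFF loop in the alpha = 0 arrangement; returns per-sample records."""
    s = 0.0          # integrator output
    d2_prev = 0.0    # ADC2 code delayed by one sample (the z^-1 in the digital path)
    records = []
    for xn in x:
        d2 = quantize(xn)        # ADC2 digitizes the input
        v = quantize(s)          # ADC1 digitizes the integrator output
        records.append({"x": xn, "s": s, "q1": v - s, "q2": d2 - xn, "y": v + d2_prev})
        s += xn - (d2 + v)       # delaying integrator: input minus the DAC of the digital sum
        d2_prev = d2
    return records


if __name__ == "__main__":
    x = [0.3 * math.sin(2 * math.pi * n / 64) for n in range(256)]
    r = simulate_fdff(x)
    err_y = max(abs(r[n]["y"] - (r[n - 1]["x"] + r[n]["q1"] - r[n - 1]["q1"]))
                for n in range(1, len(r)))
    err_s = max(abs(r[n]["s"] + r[n - 1]["q1"] + r[n - 1]["q2"])
                for n in range(1, len(r)))
    print(err_y, err_s)   # both residuals should be at floating-point rounding level
```

Because q1 and q2 are defined as quantizer output minus quantizer input, both identities hold sample by sample in this model, up to floating-point rounding, regardless of the quantizer resolution.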

The first benefit of the FDFF modulator compared with classical input feedforward modulators is the elimination of the adder that is required before the main quantizer (ADC1). This analog adder is implemented using active or passive circuits. Either implementation increases design complexity and power consumption. With a passive adder, the increased power consumption is due to the reduction in dynamic range inherent in its switched-capacitor implementation, which means that the quantizer has to resolve smaller voltages and therefore consumes more power [2]. Therefore, the additional circuitry used to implement the digital feedforward modulator does not necessarily increase the power consumption, and it is simpler to implement.

The second benefit is an increase in the achievable signal-to-noise ratio (SNR). To understand how this happens, the modulator is redrawn as shown in Fig. 4 with α=0.

Fig. 4: FDFF ∆Σ modulator with the blocks rearranged

The modulator can be viewed as a two-stage modulator with the second stage (inside the dashed box) being a conventional feedback ∆Σ modulator. The modulator has two regions of operation: the first is before ADC2 saturates (the input voltage is less than Vref,ADC2) and the second is after ADC2 saturates (the input voltage is greater than Vref,ADC2). In region one, the input voltage to the second stage is mostly quantization noise and the modulator has all the benefits of the input feedforward technique. In region two, the input voltage to the second stage contains an input-signal component and the modulator operates as a conventional feedback structure. The maximum input voltage in region two is proportional to Vref,ADC1 and depends on the loop order and the number of bits in the quantizer ADC1, as in conventional modulators. Therefore, the maximum achievable input signal amplitude of the overall modulator is:

$$V_{in,\max} = V_{ref,ADC2} + k\,V_{ref,ADC1} \tag{3}$$

where k is a constant ranging from 50 to 80% [4]. This should be compared to conventional structures, where:

$$V_{in,\max} = k\,V_{ref,ADC1} \tag{4}$$

The larger input signal means that the FDFF modulator can achieve better SNR, or can consume less power for the same SNR as conventional structures.
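The size of this benefit can be estimated from the ratio of the two maximum input amplitudes, V_in,max = Vref,ADC2 + k Vref,ADC1 for the FDFF modulator and V_in,max = k Vref,ADC1 for conventional structures. The sketch below is an idealized estimate: it assumes the noise floor is unchanged, so that the SNR gain equals the gain in maximum input amplitude expressed in dB. The function names are hypothetical.

```python
import math


def max_input_fdff(vref_adc1, vref_adc2, k):
    """Maximum input amplitude of the FDFF modulator."""
    return vref_adc2 + k * vref_adc1


def max_input_conventional(vref_adc1, k):
    """Maximum input amplitude of a conventional modulator."""
    return k * vref_adc1


def input_range_gain_db(vref_adc1, vref_adc2, k):
    """Gain in maximum input amplitude of the FDFF over the conventional structure, in dB."""
    return 20.0 * math.log10(max_input_fdff(vref_adc1, vref_adc2, k)
                             / max_input_conventional(vref_adc1, k))


if __name__ == "__main__":
    for k in (0.5, 0.65, 0.8):
        print(k, round(input_range_gain_db(0.5, 0.5, k), 2))
```

With equal reference voltages the estimate depends only on k; the quoted range of 50 to 80% for k gives a gain between roughly 7 dB and 9.5 dB under these assumptions.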

The third benefit is that an existing modulator can easily be modified to incorporate the fully digital feedforward idea. The modification can help modulators that suffer from opamp nonlinearities if operation is restricted to region 1. Another benefit of restricting the operation to region 1 is improved stability. As can be seen from (3), when the maximum input signal is limited to Vref,ADC2, the modulator is kVref,ADC1 away from overloading.

3. SIMULATION RESULTS<br />

Several simulations using MATLAB® and Simulink® were run to evaluate the performance of the new modulator and to compare it to conventional structures. The second-order FDFF modulator shown in Fig. 5 is used for the simulations, with the following choice of coefficients:

$$a_1 = 1,\quad a_2 = 1,\quad b_1 = 1,\quad b_2 = 1,\quad b_3 = 0,\quad b_4 = 1,\quad c_1 = 1,\quad c_2 = z^{-2},\quad c_3 = z^{-2}$$

Fig. 5: Second-order FDFF ∆Σ modulator (two integrator blocks, each $z^{-1}/(1-z^{-1})$).

All simulations are carried out with an oversampling ratio (OSR) of 32, 3 bits in both quantizers, quantizer reference voltages Vref,ADC1 = Vref,ADC2 = ±0.5, and uniformly distributed dither added to each quantizer. The modulator is insensitive to non-idealities in the additional quantizer ADC2. In fact, variations of σ=50% in the reference levels of ADC2 result in no degradation of the SNR as long as ADC2 is monotonic.

Performing the signal processing in the digital domain requires finer DACs to feed back the sum of the digital signals. The number of levels required by the DACs was determined by simulations; DAC1 requires 15 levels and DAC2 requires 8 levels. In general, the number of levels for DAC1 is $2^{N+1}-1$ and for DAC2 is $2^{N}$, where N is the number of quantizer bits (N = 3 here).
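One way to see where a count of the form 2^(N+1) - 1 can come from is to enumerate the sums of two N-bit codes. The sketch below assumes that DAC1 converts such a sum; this is an interpretation, since the text states only that the level counts were determined by simulation. The function name is hypothetical.

```python
def summed_code_levels(n_bits):
    """Number of distinct values taken by the sum of two n-bit unsigned codes."""
    codes = range(2 ** n_bits)
    return len({a + b for a in codes for b in codes})


if __name__ == "__main__":
    n = 3                                   # 3-bit quantizers, as in the simulations
    print(summed_code_levels(n), 2 ** n)    # levels for the summed code and for a single code
```

For 3-bit quantizers the enumeration gives 15 distinct sums, matching the DAC1 level count quoted above, while a single 3-bit code has 8 levels.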


The modulator was simulated taking into account the effect of finite gain, bandwidth, and slew rate of the opamp in the first integrator. From simulations, the opamp requirements are an open-loop DC gain of 55dB, a bandwidth f-3dB of 1.25 fsampling, and a slew rate of 1.2 V/Tsampling, where f-3dB is the closed-loop -3dB bandwidth of the opamp, fsampling is the sampling frequency, and Tsampling is the sampling period. These requirements are slightly higher than those of conventional structures, but lower than those of cascade (MASH) structures. The higher requirements are needed to achieve cancellation of the ADC2 quantization noise at the output. It is also possible to compensate for these non-idealities in the digital domain to relax the opamp requirements, which will be discussed in section 4.

The first and second integrator output probability densities are shown in Fig. 6 for the FDFF, CIFB, and CIFB-IF.

Fig. 6: Output probability densities for the first and second integrators with sinusoidal input (occurrences [%] versus integrator 1 output and integrator 2 output, for the FDFF, CIFB, and CIFB-IF).

The signals for the FDFF are slightly larger than those for the CIFB-IF because they contain q2 in addition to q1; since the CIFB contains an input-signal component, its signals are larger.

The SNR improvement is illustrated in Fig. 7, where the SNR is plotted versus the input-signal amplitude in dBFS (dB with respect to the full scale, which is defined to be Vref,ADC1). The plot shows that the new modulator can achieve 8 dB better SNR than conventional structures. The SNR can be improved further by increasing Vref,ADC2. For example, if Vref,ADC2 is twice as large as Vref,ADC1, the SNR improves by an extra 3dB. The trade-off is increased signal swing at the integrator outputs. Larger reference voltages for ADC2 are possible because it is outside the loop and is therefore not limited by the opamp output swing.

[Figure 7 plot: SNDR [dB] (0 to 90) versus input amplitude A [dBFS] (-80 to 10) for the FDFF, CIFB, and CIFB-IF modulators.]<br />

Fig. 7: SNR versus input-signal amplitude<br />

The output spectrum of the modulator is illustrated in Fig.<br />

8 for both regions of operation. The input-signal frequency<br />

is at 1/4 of the Nyquist bandwidth to clearly show any<br />

distortion components. The simulations are carried out<br />

with finite gain, bandwidth, slew rate, and third-order<br />

distortion in the first integrator and with variations of<br />

σ=5% in the reference levels of ADC2.<br />

[Figure 8 plots: PSD [dB] (0 to -150) versus frequency [f/fs] (10^-3 to 10^-1, logarithmic axis) for panels (a) and (b).]<br />

Fig. 8: Output spectrum for second-order ∆Σ<br />

modulator (a) region 1 of operation (b) region 2 of<br />

operation<br />

In region one, the input-signal amplitude is 0 dBFS, and<br />

the spectrum shows no distortion, as expected from<br />

input-feedforward structures. In region two, the input-signal<br />

amplitude is 3 dBFS, and the spectrum shows a third-order<br />

distortion component. However, even if the<br />

input signal is restricted to region-one operation only,<br />

there is still an SNR gain since the input signal can be as<br />

large as 0 dBFS. Moreover, Vref, ADC2 can be set larger than<br />

Vref, ADC1, which allows the input signal to be larger than 0<br />

dBFS.


4. SIMPLIFIED FDFF MODULATOR<br />

A simpler FDFF modulator, shown in Fig. 9, can be devised<br />

by generalizing the structure in Fig. 4. The simplified<br />

FDFF consists of an input stage (ADC2 and a subtractor),<br />

a conventional ∆Σ modulator, and a digital filter H. The<br />

inter-stage gain factor “a” does not map directly from Fig.<br />

4, but it offers another degree of freedom in designing<br />

FDFF modulators.<br />

Fig. 9: Simplified Second-order FDFF ∆Σ modulator<br />

Linear analysis of the system leads to the following:<br />

\[ y = H\,x + \mathrm{NTF}_{\Delta\Sigma}\,q_1 + \left(H - a\,\mathrm{STF}_{\Delta\Sigma}\right) q_2 \qquad (5) \]<br />

where STF∆Σ and NTF∆Σ are the conventional ∆Σ<br />

modulator transfer functions. Therefore, to cancel q2 at the<br />

output of the modulator, H must equal (aSTF∆Σ).<br />

Compensating for opamp non-idealities in the simplified<br />

FDFF is straightforward. First, the errors in (aSTF∆Σ) are<br />

measured. Then, a correction is applied to H in the digital<br />

domain so that H matches (aSTF∆Σ). Since<br />

this multiplication is not in the feedback loop, it does not restrict<br />

the speed of the FDFF modulator.<br />
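
The cancellation condition H = aSTF∆Σ can be checked with a short behavioural simulation.<br />
The sketch below is an illustration only, not the authors' model. It assumes a second-order loop<br />
with two delaying integrators (so that STF∆Σ = z^-2 and NTF∆Σ = (1 - z^-1)^2), ideal uniform<br />
quantizers without clipping, and the hypothetical helper names quantize, simulate_fdff and residual.<br />
With H = a z^-2, the output should equal a x delayed by two samples plus the shaped q1, with no q2 term.<br />

```python
import math

def quantize(v, step):
    """Ideal mid-tread uniform quantizer without clipping (an assumption)."""
    return step * math.floor(v / step + 0.5)

def simulate_fdff(x, a=2.0, step1=0.25, step2=0.25):
    """Simplified FDFF: ADC2 + subtractor, second-order loop, digital filter H.

    Assumed loop: two delaying integrators with feedback gains 1 and 2,
    giving STF = z^-2 and NTF = (1 - z^-1)^2, so H = a * z^-2.
    Returns the modulator output and the loop quantization error q1.
    """
    d2 = [quantize(s, step2) for s in x]        # ADC2 output: x + q2
    u = [a * (s - d) for s, d in zip(x, d2)]    # loop input: -a * q2
    v1 = v2 = 0.0                               # integrator states
    u_prev = y_prev = 0.0
    y, q1 = [], []
    for k in range(len(x)):
        # Both integrators update from the previous-sample values.
        v1, v2 = v1 + u_prev - y_prev, v2 + v1 - 2.0 * y_prev
        yk = quantize(v2, step1)                # loop quantizer
        y.append(yk)
        q1.append(yk - v2)
        u_prev, y_prev = u[k], yk
    # Digital path: H = a * z^-2 applied to the ADC2 output, then added.
    out = [(a * d2[k - 2] if k >= 2 else 0.0) + y[k] for k in range(len(x))]
    return out, q1

def residual(x, out, q1, a):
    """Largest deviation of out from a*x[n-2] + (1 - z^-1)^2 q1."""
    worst = 0.0
    for k in range(len(x)):
        xd = x[k - 2] if k >= 2 else 0.0
        q_1 = q1[k - 1] if k >= 1 else 0.0
        q_2 = q1[k - 2] if k >= 2 else 0.0
        shaped = q1[k] - 2.0 * q_1 + q_2
        worst = max(worst, abs(out[k] - (a * xd + shaped)))
    return worst

x = [0.9 * math.sin(2 * math.pi * 0.01 * n) for n in range(2000)]
out, q1 = simulate_fdff(x, a=2.0)
print(residual(x, out, q1, a=2.0))  # expected to sit at rounding-error level
```

Replacing the digital gain a in the last step of simulate_fdff by a slightly different value is one way to see how a mismatch between H and aSTF∆Σ lets q2 leak to the output.<br />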

A second-order FDFF ∆Σ modulator is shown <strong>in</strong> Fig. 10 as<br />

an example. Note that the two adders on the left-h<strong>and</strong> side<br />

of the modulator can be comb<strong>in</strong>ed <strong>in</strong> the digital doma<strong>in</strong>.<br />

Or, they can share the same opamp <strong>in</strong> the analog doma<strong>in</strong>;<br />

therefore, there is no extra opamp required for the<br />

subtractor.<br />

[Figure 10 block diagram; the recoverable labels are the delaying-integrator transfer functions z^-1/(1 - z^-1).]<br />

Fig. 10: Simplified Second-order FDFF ∆Σ modulator<br />

The achievable SNR can be <strong>in</strong>creased further by<br />

exploit<strong>in</strong>g the <strong>in</strong>ter-stage ga<strong>in</strong> factor. There is an optimal<br />

value for the ga<strong>in</strong> that results <strong>in</strong> maximum <strong>in</strong>crease of the<br />

SNR.<br />

[Figure 11 plot: SNR [dB] (50 to 90) versus inter-stage gain (1 to 7).]<br />

Fig. 11: SNR vs. inter-stage gain<br />

Fig. 11 shows the maximum achievable SNR versus the<br />

inter-stage gain for the same specifications stated in<br />

Section 3. Picking the optimal inter-stage gain of 3 gives an<br />

extra 6 dB of SNR improvement. The trade-off is<br />

larger signal swings at the integrator outputs.<br />

Since this signal is still quantization noise only, the<br />

linearity requirements remain relaxed.<br />

5. CONCLUSION<br />

A ∆Σ modulator with a fully digital feedforward path has<br />

been described. The new modulator elim<strong>in</strong>ates the analog<br />

adder at the quantizer <strong>in</strong>put <strong>and</strong> improves the SNR. It<br />

requires one extra quantizer <strong>and</strong> digital adders.<br />

6. REFERENCES<br />

[1] J. Silva, U. Moon, J. Steensgaard, <strong>and</strong> G.C. Temes,<br />

“Wideb<strong>and</strong> low-distortion delta-sigma ADC<br />

topology,” Electron. Lett., vol. 37, pp. 737 – 738,<br />

2001.<br />

[2] A.A. Hamoui <strong>and</strong> K.W. Mart<strong>in</strong>, “High-Order<br />

Multibit Modulators <strong>and</strong> Pseudo<br />

Data-Weighted-Averag<strong>in</strong>g <strong>in</strong> Low-Oversampl<strong>in</strong>g ∆Σ<br />

ADCs for Broad-B<strong>and</strong> Applications,” IEEE Trans.<br />

Circuits Syst. I, vol. 51, pp. 72 – 85, 2004.<br />

[3] S. Kwon <strong>and</strong> F. Maloberti, “Op-Amp Sw<strong>in</strong>g<br />

Reduction <strong>in</strong> Sigma-Delta Modulators,” Proc. IEEE<br />

ISCAS, pp. I-525 – I-528, 2004.<br />

[4] S. R. Norsworthy, R. Schreier, and G. C. Temes,<br />

Delta-Sigma Data Converters: Theory, Design, and<br />

Simulation. Piscataway, NJ: IEEE Press, p. 157,<br />

1997.



A HIGH-SPEED TECHNIQUE FOR TIME-INTERLEAVING<br />

CONTINUOUS-TIME DELTA-SIGMA MODULATORS<br />

Trevor C. Caldwell <strong>and</strong> David A. Johns<br />

University of Toronto, Department of Electrical <strong>and</strong> Computer Eng<strong>in</strong>eer<strong>in</strong>g<br />

10 K<strong>in</strong>g’s College Rd., M5S 3G4, Toronto, Ontario, Canada<br />

E-mail: caldwet@eecg.toronto.edu<br />

ABSTRACT<br />

A technique for increasing the speed of time-interleaved<br />

continuous-time ∆Σ modulators is presented. Extra<br />

feedback paths allow the DAC pulse to enter the adjacent<br />

clock period without deteriorating the SNR of the ∆Σ<br />

modulator. With the added feedback paths, an extra<br />

summer must be added to the modulator to attain an<br />

equivalent impulse response, but the critical path of the<br />

modulator is able to operate twice as fast. Also, with the<br />

inclusion of non-idealities such as finite-gain and finite-bandwidth<br />

opamps and DAC clock jitter, the modulator is<br />

able to attain a higher SNR with the new technique.<br />

1. INTRODUCTION<br />

A method of time-interleaving continuous-time delta-sigma<br />

(∆Σ) modulators has been accepted for publication<br />

at the 2005 European Solid-State Circuits Conference<br />

(ESSCIRC) [1]. The modulator operates at an<br />

oversampling ratio (OSR) of 5 with a time-interleaving<br />

factor of 2 at sampling frequencies of 100MHz and<br />

200MHz. It achieves a signal-to-noise and distortion ratio<br />

(SNDR) of 57dB and 49dB in signal bandwidths of<br />

10MHz and 20MHz, respectively. As it was the first<br />

integrated circuit implementation of a time-interleaved ∆Σ<br />

modulator, an opportunity for improvement in the design<br />

has been found. This paper presents a method of<br />

increasing the speed of the modulator to overcome the<br />

low-latency requirement of the critical path using<br />

additional digital-to-analog converter (DAC) feedback<br />

paths. The method also relaxes the requirements of the<br />

opamps and increases the allowable clock jitter on<br />

the DAC clock.<br />

This paper will proceed as follows: Section 2 will discuss<br />

the new technique, while Section 3 will demonstrate<br />

simulation results with non-idealities added. Section 4 will<br />

conclude the paper.<br />

2. HIGH-SPEED SOLUTION<br />

2.1 Time-Interleaved Modulator<br />

A cont<strong>in</strong>uous-time time-<strong>in</strong>terleaved (CTTI) ∆Σ modulator<br />

is shown <strong>in</strong> Fig. 1, as derived from the appropriate<br />

discrete-time ∆Σ modulator. The noise transfer function of<br />

these two modulators is identical, while the signal transfer<br />

function is negligibly different.<br />

The critical path <strong>in</strong> this modulator is between the Bottom<br />

return-to-zero (RZ) DAC <strong>and</strong> the Top analog-to-digital<br />

converter (ADC). With<strong>in</strong> one clock period (of 2T, as<br />

compared to the orig<strong>in</strong>al period of T for the discrete-time<br />

modulator of Fig. 1), the output of the Bottom RZ DAC<br />

must be summed to the top branch where it must be<br />

quantized by the Top ADC <strong>and</strong> output by the Top RZ<br />

DAC. For both of the RZ DAC pulses to stay with<strong>in</strong> their<br />

respective clock periods, the RZ DAC pulses are chosen as<br />

shown <strong>in</strong> Fig. 1 where the Bottom RZ DAC pulse is nonzero<br />

from T/2 to 3T/2, <strong>and</strong> the Top RZ DAC pulse is nonzero<br />

from T to 2T. Also, the quantization <strong>in</strong> the Bottom<br />

ADC occurs at 0, 2T, 4T, etc., while the quantization <strong>in</strong><br />

the Top ADC occurs at T/2, 5T/2, 9T/2, etc. This scheme<br />

is successful; however, the output of the Bottom RZ DAC<br />

must be summed, quantized <strong>and</strong> output from the Top RZ<br />

DAC with<strong>in</strong> T/2 seconds. At a sampl<strong>in</strong>g frequency of<br />

200MHz (2T=5ns), this allows only 1.25ns to perform the<br />

summation, quantization <strong>and</strong> RZ DAC operation. This<br />

significantly <strong>in</strong>creases the power consumption of the ADC<br />

<strong>and</strong> summer s<strong>in</strong>ce the latency must be very low. Also, two<br />

90-degree phase shifted clocks are necessary to produce<br />

the proper Top <strong>and</strong> Bottom RZ DAC tim<strong>in</strong>g signals,<br />

requir<strong>in</strong>g a larger clock generator circuit.<br />

Figure 1. CTTI ∆Σ modulator derived from its<br />

discrete-time equivalent modulator.


2.2 Additional Feedback Paths<br />

In order to reduce the latency requirements on the critical<br />

path of the modulator, the Top RZ DAC pulse could be<br />

output at a later time. However, the DAC pulse would then<br />

enter the next clock period. A technique presented <strong>in</strong> [2]<br />

demonstrates that <strong>in</strong> a regular cont<strong>in</strong>uous-time ∆Σ<br />

modulator, an extra feedback path between the DAC <strong>and</strong><br />

the ADC can be added to compensate for a DAC pulse<br />

that enters the adjacent clock period. Typically, this<br />

technique is used to combat excess loop delay <strong>in</strong><br />

modulators with non-return-to-zero (NRZ) DAC pulses.<br />

Due to f<strong>in</strong>ite switch<strong>in</strong>g time of transistors, these<br />

modulators are unable to output a DAC pulse<br />

immediately, caus<strong>in</strong>g the NRZ pulse to enter the next<br />

clock period [3].<br />

[Figure 2 block diagram: input x(t), integrators 1/Ts, Top and Bottom ADCs and DACs, sampling time 2T, DAC pulse timing marked at 0, T and 2T.]<br />

Figure 2. CTTI modulator with additional DAC<br />

feedback paths for the Top DAC.<br />

Us<strong>in</strong>g the technique presented <strong>in</strong> [2], the CTTI ∆Σ<br />

modulator can be modified with extra feedback paths that<br />

will allow the DAC pulse to enter the adjacent clock<br />

period. Keep<strong>in</strong>g the Bottom RZ DAC pulse the same, the<br />

Top RZ DAC pulse can be delayed by another T/2 seconds<br />

so that its non-zero duration is from 3T/2 to 5T/2, where<br />

the last T/2 seconds of this pulse enters the next clock<br />

period. S<strong>in</strong>ce this is a time-<strong>in</strong>terleaved modulator, the<br />

extra DAC feedback path does not follow exactly from<br />

[2], <strong>and</strong> two feedback paths are required, one from the Top<br />

RZ DAC to the Top ADC, <strong>and</strong> another from the Top RZ<br />

DAC to the Bottom ADC. This requires two additional<br />

DACs, depend<strong>in</strong>g on the design. The result<strong>in</strong>g modulator<br />

with the additional feedback terms is shown <strong>in</strong> Fig. 2. The<br />

other DAC feedback coefficients β4, β5 <strong>and</strong> β6 must be<br />

changed to ensure equivalence between the impulse<br />

response of the new modulator <strong>and</strong> the modulator of Fig. 1<br />

(note that the two impulse responses that must be matched<br />

to each other are from the output of the Top ADC to the<br />

<strong>in</strong>put of the Top ADC, <strong>and</strong> from the output of the Top<br />

ADC to the <strong>in</strong>put of the Bottom ADC). S<strong>in</strong>ce a summation<br />

node already exists at the <strong>in</strong>put of the Top ADC, an extra<br />

summation block is necessary only at the <strong>in</strong>put of the<br />

Bottom ADC. The extra circuit elements are highlighted<br />

with a dotted l<strong>in</strong>e <strong>in</strong> Fig. 2.<br />

Since the Top RZ DAC pulse has been delayed by an extra<br />

T/2 seconds, there are T seconds for all operations in the<br />

critical path to complete, as opposed to T/2 seconds. The<br />

ADC clocks can then be adjusted so that while the<br />

quantization in the Bottom ADC still occurs at 0, 2T, 4T,<br />

etc., the quantization in the Top ADC now occurs at T, 3T,<br />

5T, etc. Therefore, the latency requirements on the circuit<br />

elements in the critical path have been relaxed by a factor<br />

of 2. This primarily affects the ADC and the summer,<br />

which have the largest delays.<br />
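
As a worked example of the timing figures quoted above, the following sketch computes the<br />
critical-path budget before and after the Top DAC pulse is delayed. It assumes the budget is the<br />
interval from the start of the Bottom DAC pulse (T/2) to the start of the Top DAC pulse, with a<br />
per-path clock period of 2T; the function name critical_path_budget is hypothetical.<br />

```python
def critical_path_budget(f_sample_hz, top_dac_start_in_T):
    """Time between the Bottom DAC pulse start (T/2) and the Top DAC pulse
    start, for a per-path clock period of 2T = 1 / f_sample_hz."""
    T = 1.0 / (2.0 * f_sample_hz)
    bottom_dac_start = 0.5 * T
    return top_dac_start_in_T * T - bottom_dac_start

fs = 200e6                                  # sampling frequency, so 2T = 5 ns
original = critical_path_budget(fs, 1.0)    # Top DAC pulse starts at T
modified = critical_path_budget(fs, 1.5)    # Top DAC pulse starts at 3T/2
print(f"original: {original * 1e9:.2f} ns, modified: {modified * 1e9:.2f} ns")
```

At 200MHz this reproduces the 1.25ns budget of the original scheme and shows the budget doubling to 2.5ns (T seconds) once the Top DAC pulse starts at 3T/2.<br />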

2.3 NRZ DAC Pulses<br />

It is more advantageous in a continuous-time ∆Σ<br />

modulator to use NRZ DAC pulses since they increase the<br />

modulator’s resilience to DAC clock jitter [4]. The Top<br />

DAC of the previous section can be modified by changing<br />

the DAC feedback parameters β4*, β5*, β6*, β8 and β9 to<br />

accommodate an NRZ DAC pulse that extends further into<br />

the next clock period (from 3T/2 to 7T/2). The resulting<br />

modulator is otherwise identical. Note that this<br />

modification is only possible because the additional feedback<br />

paths are already in place to allow the DAC pulse to enter<br />

the next clock period.<br />

The techniques applied to the Top DAC <strong>in</strong> the previous<br />

section can also be applied to the Bottom DAC. The<br />

Bottom DAC pulse can enter the adjacent clock period as<br />

an NRZ pulse by add<strong>in</strong>g two additional feedback paths<br />

<strong>and</strong> extend<strong>in</strong>g the pulse so that it lasts from T/2 to 5T/2. In<br />

this case, the two impulse responses that must be matched<br />

to each other are from the output of the Bottom ADC to<br />

the <strong>in</strong>put of the Bottom ADC, <strong>and</strong> from the output of the<br />

Bottom ADC to the <strong>in</strong>put of the Top ADC. The<br />

coefficients β1, β2 <strong>and</strong> β3 must be modified to<br />

accommodate the longer DAC pulse. S<strong>in</strong>ce summers<br />

already exist <strong>in</strong> both the top <strong>and</strong> bottom branches, no<br />

additional summers are required.<br />

The resulting Modified Continuous-Time Time-Interleaved<br />

(M-CTTI) ∆Σ modulator is shown in Fig. 3. The Top and<br />

Bottom DAC pulses are 180 degrees out of phase, and<br />

they are each of length 2T, meaning that they are NRZ<br />

pulses. The two additional feedback paths due to this<br />

modification are highlighted with a dotted line. It should<br />

be noted that the β10 feedback path cannot be combined with<br />

the β7 feedback path, even though they may appear to<br />

represent the same path. This is because the β7 feedback<br />

path still represents the signal in the critical path that must<br />

be quantized and added to the Top ADC within T seconds,<br />

while the β10 feedback path is to be combined with the<br />

next quantization interval of the Top ADC, 2T seconds<br />

later.<br />

[Figure 3 block diagram: input x(t), integrators 1/Ts, Top and Bottom ADCs and DACs, sampling time 2T, DAC pulse timing marked at 0, T, 2T and 3T, output y[n].]<br />

Figure 3. M-CTTI modulator: CTTI modulator<br />

with additional NRZ DAC feedback paths.<br />


2.4 Circuit Considerations<br />

Despite requir<strong>in</strong>g an additional summation block, the<br />

latency requirements on both the Top <strong>and</strong> Bottom paths of<br />

the new M-CTTI modulator are now equivalent, which<br />

simplifies the design. Both the Top <strong>and</strong> Bottom ADCs,<br />

DACs <strong>and</strong> summers can be designed identically, while <strong>in</strong><br />

the previous CTTI ∆Σ modulator, the b<strong>and</strong>width<br />

requirement on the Bottom ADC was much greater than<br />

that of the Top ADC. Furthermore, one of the major<br />

sources of power consumption of the orig<strong>in</strong>al CTTI<br />

modulator was the ADC, which consumed about 36% of<br />

the total power. With the new improvement, about half of<br />

this power (18% of the total power) can theoretically be<br />

saved s<strong>in</strong>ce the tolerable latency of the ADCs is now twice<br />

the previous value.<br />

While the new modulator requires four additional<br />

feedback paths, two due to the modification of the Top<br />

DAC pulses, <strong>and</strong> two more due to the modification of the<br />

Bottom DAC pulses, they can be implemented as<br />

relatively small DACs due to the low l<strong>in</strong>earity required at<br />

the <strong>in</strong>put of the ADCs. Simulations <strong>in</strong>dicate a l<strong>in</strong>earity of<br />

6 bits would be sufficient for a 10-bit design, keep<strong>in</strong>g<br />

them significantly smaller <strong>in</strong> area than the first stage DAC.<br />

Another advantage of this technique is that the clocks used<br />

to generate the RZ DAC pulses are now 180 degrees out<br />

of phase. The clocks of the original CTTI ∆Σ modulator<br />

were 90 degrees out of phase, making them much more<br />

difficult to generate. With the new DAC feedback scheme,<br />

the clocks are far easier to implement as a simple<br />

inversion, and only one off-chip clock is required. Also, a<br />

small power saving in the clock generator circuits is<br />

obtained, since fewer non-overlapping clock generators<br />

are needed.<br />

3. SIMULATIONS<br />

3.1 Ideal Modulator<br />

To confirm the equivalence between the M-CTTI<br />

modulator and the CTTI modulator, MATLAB<br />

simulations were performed. The impulse responses of the<br />

two modulators were matched to an accuracy of 0.1%<br />

using the additional feedback paths. The resulting output<br />

spectra are shown in Fig. 4, where a 3rd-order, low-pass,<br />

time-interleaved by 2, continuous-time modulator with an<br />

OSR of 10 was simulated. The two output spectra are<br />

almost identical, both achieving a signal-to-noise ratio<br />

(SNR) of 71.0dB with an input that is 4.4dB below full-scale<br />

at fS/20. These modulator specifications and<br />

the same input tone were used as the basis for comparison<br />

in the next section.<br />

3.2 Addition of Non-Idealities<br />

Various non-idealities can be added to the MATLAB<br />

model of the ∆Σ modulators to compare them with respect<br />

to more practical high-speed issues, such as finite opamp<br />

bandwidth and gain, and DAC clock jitter. First, the<br />

sensitivity of the modified feedback paths to finite opamp<br />

bandwidth and gain was investigated. The integrators<br />

were modeled as single-pole, finite-gain amplifiers. The<br />

M-CTTI modulator was slightly more resilient to these<br />

non-idealities. With a DC gain of 50dB and a first pole of<br />

fS/50 (e.g., a first pole of 8MHz for an effective sampling<br />

frequency of 400MHz, where the effective sampling<br />

frequency is twice the actual sampling frequency due to<br />

the time-interleaving by 2), the M-CTTI modulator<br />

achieved an SNR of 68.6dB, while the previous CTTI<br />

modulator achieved an SNR of 67.1dB. This difference<br />

can be significant if the modulator’s performance is<br />

limited by the bandwidth of the opamps. Also, with lower<br />

DC gains, the difference in performance between the two<br />

modulators can be much larger, indicating that the two<br />

modulators are not affected in the same way by the gain-bandwidth<br />

product, and the M-CTTI modulator is more<br />

capable of operating with lower-gain, higher-bandwidth<br />

opamps. This is important in high-speed design where the<br />

gain may need to be traded for increased bandwidth.<br />

[Figure 4 plots: amplitude (dB) versus frequency (fs=1, 0 to 0.5) for panels a) and b).]<br />

Figure 4. Output spectra. a) M-CTTI modulator of<br />

Fig. 3. b) CTTI modulator of Fig. 1.<br />

Since the new modulator uses multibit NRZ pulses, it is<br />

expected to be more resilient to DAC clock jitter. This is<br />

because every DAC pulse will contain jitter in only the<br />

portion of the pulse that is different from the previous<br />

pulse, resulting in much less total jitter as a fraction of the<br />

total pulse energy, when compared to RZ pulses (this is<br />

especially true because of the 4-bit multibit feedback used,<br />

and becomes less significant as fewer bits are used in the<br />

quantizer). Furthermore, the NRZ pulse is twice as long as<br />

the RZ DAC pulse, and<br />

therefore the jitter is proportionally only half as<br />

detrimental for the NRZ pulse. When simulated with an<br />

RMS jitter of 12ps, the degradation in the M-CTTI<br />

modulator is far less than that of the CTTI modulator. The<br />

M-CTTI modulator has an SNR of 66.0dB, while the<br />

CTTI modulator has an SNR of only 53.8dB. If the jitter<br />

is reduced to 3ps RMS, the M-CTTI modulator<br />

has an SNR of 69.8dB, while the CTTI modulator has an<br />

SNR of 64.5dB, so the performance degradation of the<br />

CTTI modulator can be reduced, but at the cost of a clock with<br />

much better jitter performance. With the M-CTTI<br />

modulator, the requirements on the DAC clock jitter are<br />

relaxed quite significantly.<br />

The performance results of the new M-CTTI modulator<br />

compared to the previous CTTI modulator are summarized<br />

in Table 1 for various opamp gain and bandwidth values,<br />

as well as for various quantities of jitter. Sample output<br />

spectra are shown in Fig. 5, where the opamp<br />

bandwidth is fS/80, the DC gain is 50dB, and the jitter is<br />

6ps RMS (these values were chosen as the approximate<br />

values of the previously designed CTTI ∆Σ modulator).<br />

The distorted shape of the noise transfer function is due to<br />

the finite bandwidth of the opamps, as well as the reduced<br />

DC gain, while the increase in the noise floor in the signal<br />

band is primarily due to the DAC clock jitter. The M-CTTI<br />

modulator has an SNR of 67.7dB, while the CTTI<br />

modulator has an SNR of 59.2dB. It is clear from this<br />

comparison that the new techniques improve the CTTI ∆Σ<br />

modulator, and the resolution of the M-CTTI modulator is<br />

increased at higher speeds.<br />

Table 1. Performance comparison of M-CTTI and CTTI modulators.<br />

First Pole | DC Gain | RMS Jitter | M-CTTI Modulator SNR | CTTI Modulator SNR<br />
fS/100 | 56dB | 0ps | 70.0dB | 66.8dB<br />
fS/50 | 50dB | 0ps | 68.6dB | 67.1dB<br />
fS/25 | 44dB | 0ps | 68.8dB | 66.0dB<br />
fS/12.5 | 38dB | 0ps | 68.2dB | 61.1dB<br />
Inf. | Inf. | 3ps | 69.8dB | 64.5dB<br />
Inf. | Inf. | 6ps | 69.0dB | 59.5dB<br />
Inf. | Inf. | 12ps | 66.0dB | 53.8dB<br />
fS/50 | 50dB | 6ps | 67.7dB | 59.2dB<br />
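
The first four rows of Table 1 trade DC gain for bandwidth, and a short calculation makes the<br />
trade explicit. The sketch below assumes a single-pole opamp, for which the gain-bandwidth product<br />
is the linear DC gain times the first-pole frequency; the helper name gbw_over_fs is hypothetical<br />
and the numbers are copied from the table.<br />

```python
# (first pole as a fraction of fS, DC gain in dB, M-CTTI SNR in dB, CTTI SNR in dB)
rows = [
    (1 / 100, 56.0, 70.0, 66.8),
    (1 / 50, 50.0, 68.6, 67.1),
    (1 / 25, 44.0, 68.8, 66.0),
    (1 / 12.5, 38.0, 68.2, 61.1),
]

def gbw_over_fs(pole_over_fs, gain_db):
    """Gain-bandwidth product of a single-pole opamp, normalised to fS."""
    return 10 ** (gain_db / 20) * pole_over_fs

for pole, gain_db, snr_m, snr_c in rows:
    print(f"pole = fS/{1 / pole:g}, gain = {gain_db:g} dB: "
          f"GBW = {gbw_over_fs(pole, gain_db):.2f} fS, "
          f"M-CTTI advantage = {snr_m - snr_c:.1f} dB")
```

Under this assumption the four rows have nearly the same gain-bandwidth product (each 6dB drop in gain is paired with a doubling of the pole frequency), so the change in SNR down the table reflects how each modulator responds to lower gain at higher bandwidth.<br />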

4. CONCLUSIONS<br />

A new technique for reducing the latency requirement in<br />

the critical path of a CTTI ∆Σ modulator has been<br />

presented. This technique requires one additional summer<br />

as well as four additional low-linearity feedback paths or<br />

DACs, but is theoretically able to save about 20% of the<br />

power consumption of the original CTTI ∆Σ modulator, as<br />

well as increase the modulator’s resilience to lower-gain<br />

and lower-bandwidth opamps and DAC clock jitter. Both<br />

of these issues were significant in the original CTTI<br />

modulator, which suffered from both high power<br />

consumption and reduced performance at high speeds due to<br />

the low-latency requirement on the critical path and the<br />

finite bandwidth of the opamps. Another advantage of the<br />

new technique is that the M-CTTI modulator requires<br />

clocks for the two ADCs and DACs that are 180 degrees<br />

apart, which are much easier to generate than the original<br />

90-degree phase-shifted clocks required in the CTTI<br />

modulator. It has been shown that with the additional<br />

DAC feedback paths, aside from the slight increase in<br />

circuit area, the new M-CTTI modulator outperforms the<br />

original CTTI modulator in many important metrics,<br />

making it much more suitable for high-speed design.<br />

[Figure 5 plots: amplitude (dB) versus frequency (fs=1, 0 to 0.5) for panels a) and b).]<br />

Figure 5. Output spectra with non-idealities. a)<br />

M-CTTI modulator b) CTTI modulator.<br />

5. REFERENCES<br />

[1] T. C. Caldwell <strong>and</strong> D. A. Johns, “A Time-Interleaved<br />

Cont<strong>in</strong>uous-Time ∆Σ Modulator with 20MHz Signal<br />

B<strong>and</strong>width,” to be published at ESSCIRC <strong>2005</strong>.<br />

[2] P. Benabes, M. Keramat <strong>and</strong> R. Kielbasa, “A<br />

Methodology for design<strong>in</strong>g Cont<strong>in</strong>uous-time Sigma-<br />

Delta Modulators,” <strong>in</strong> Proc. European Design <strong>and</strong><br />

Test Conf., pp. 46-50, 1997.<br />

[3] J. A. Cherry <strong>and</strong> W. M. Snelgrove, “Excess Loop<br />

Delay <strong>in</strong> Cont<strong>in</strong>uous-Time Delta-Sigma Modulators,”<br />

IEEE Trans. Circuit <strong>and</strong> Systems-II: Analog <strong>and</strong><br />

Digital Signal Process<strong>in</strong>g, vol. 46, pp. 376-389, Apr.<br />

1999.<br />

[4] J. A. Cherry <strong>and</strong> W. M. Snelgrove, “Clock Jitter <strong>and</strong><br />

Quantizer Metastability <strong>in</strong> Cont<strong>in</strong>uous-Time Delta-<br />

Sigma Modulators,” IEEE Trans. Circuit <strong>and</strong><br />

Systems-II: Analog <strong>and</strong> Digital Signal Process<strong>in</strong>g,<br />

vol. 46, pp. 661-676, June 1999.


0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

ANALOG SOFT MAX CIRCUIT WITH<br />

DYNAMIC GAIN CONTROL<br />

Davide Piombo <strong>and</strong> Rodolfo Zun<strong>in</strong>o<br />

Biophysical <strong>and</strong> Electronic Eng<strong>in</strong>eer<strong>in</strong>g Department (DIBE), Genoa University,<br />

Via Opera Pia 11a, 16145 Genova, Italy<br />

E-mail: {piombo, zun<strong>in</strong>o} @dibe.unige.it<br />

ABSTRACT<br />

An analog current-mode circuit performs Soft-Max<br />

computation with ga<strong>in</strong>-control capability; a theoretical<br />

analysis supports the design method <strong>and</strong> gives the<br />

expected error bound. Experimental results from postlayout<br />

simulations match theoretical predictions.<br />

1. INTRODUCTION<br />

The Soft Max (SM) function [1] of a vector, Y, of n scalar inputs, Y={y1,..,yn}, is a vector, S, of n values computed as:<br />
$$S_i(Y) = \frac{\exp(-\gamma \cdot y_i)}{\sum_{j=1}^{n} \exp(-\gamma \cdot y_j)}; \quad i=1,..,n \qquad (1)$$<br />

where γ is a gain parameter. Soft Max often plays a relevant role in analog nonlinear processing because it generates a set of output-normalized quantities from a set of non-normalized inputs, hence it is suitable for dynamic control; by adjusting the gain parameter, γ, one can tune the SM function from a smooth mapping (small γ values) to a Winner-Takes-All (WTA) selection (γ → ∞). From a circuit viewpoint, some issues hinder hardware implementations of SM processing: the considerable dynamic range possibly required to support the exponential function; precision issues in rendering the normalizing ratio accurately; finally, and most importantly, an effective SM circuit should be able to support adjustable gain control. These aspects probably explain why there is very little research on SM circuitry, as opposed to the vast literature on WTA schemata. This paper describes a general CMOS design method for current-mode SM circuits. A theoretical characterization derives an analytical upper bound to the hardware-induced approximation error, which is used to develop the actual circuit design. The schema exhibits two major advantages: first, a modular approach facilitates the design of SM systems with an arbitrary number of inputs; secondly, the circuit enables one to set the SM gain factor.<br />
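The effect of the gain parameter in (1) can be illustrated numerically. The following is a minimal Python sketch, not part of the original design; the function name `soft_max` and the input values are assumptions chosen for illustration. Note that, with the sign convention of (1), the smallest input receives the largest output.

```python
import math

def soft_max(y, gamma):
    """Soft Max of equation (1): S_i = exp(-gamma*y_i) / sum_j exp(-gamma*y_j)."""
    # Shift by the minimum input so every exponent is <= 0 (avoids overflow);
    # the common factor cancels in the ratio.
    y_min = min(y)
    weights = [math.exp(-gamma * (v - y_min)) for v in y]
    total = sum(weights)
    return [w / total for w in weights]

inputs = [0.5, 1.0, 4.0]          # hypothetical, non-normalized inputs
for gamma in (0.01, 1.0, 50.0):   # small gain: smooth map; large gain: WTA-like
    print(gamma, [round(s, 4) for s in soft_max(inputs, gamma)])
```

For the small gain the three outputs are nearly equal, while for the large gain almost all of the normalized output goes to a single component.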

2. THEORETICAL ANALYSIS OF<br />

SOFTMAX PROCESSING<br />

2.1 Approximation by power series<br />

A crucial feature of Soft Max computation is that the really important quantity is the absolute difference between the largest component and other components. In this sense, the role of the gain factor, γ, lies in resizing the input dynamic range properly. For the sake of simplicity, in the following we shall write: $\gamma \cdot y_i \equiv x_i$ and $x_i \in \Re^+$.<br />

The Soft Max computation can be split into two operations, namely, the exponential computation and a global normalization that rescales output values. The approximation method adopted in this paper computes an approximated value of the exponentiation, and uses a CMOS version of Gilbert’s global normalization circuit [2] for the second task.<br />

The exponential approximation is obtained by a Taylor power series, truncated at the m-th order term:<br />
$$\exp(-x_j) \approx \exp(-x_0) \sum_{i=0}^{m} (-1)^i \frac{(x_j - x_0)^i}{i!} \qquad (2)$$<br />

Approximated output values are computed in parallel for all components of the Soft Max input vector Y. The present approach uses the minimum input component as the pivot value, x0, of the series; such a setting ensures that all the differential terms in (2) remain non-negative. Defining the quantities $\bar{x}_j = x_j - x_0 \ge 0 \;\forall j$ and $x_{upper} = \sup_j \bar{x}_j$, and denoting with $\tilde{S}_j$ the approximation of (1) using (2), after some derivations one obtains an upper bound to the absolute approximation error for each single output component:<br />
$$Err_j = |S_j - \tilde{S}_j| \le \frac{2 \cdot (x_{upper})^{m+1}}{n \cdot (1 - x_{upper})^2 \cdot (m+1)!}; \quad j=1,..,n \qquad (3)$$<br />

When representing quantities with a current-mode coding, each input value xj is supported by a proportional current term Ij=IREF·xj. As a result, the current-mode circuit that implements (2) should first extract the minimum input component, then subtract it from each input term. The current-mode representation of this operation will be denoted as<br />
$$\bar{I}_j = I_j - \min_i\{I_i\} = I_{REF} \cdot \bar{x}_j; \quad j=1,..,n \qquad (4)$$<br />
The resulting current value for the exponential approximation will then be:<br />
$$\tilde{I}_j^{(exp)} = I_{REF} - I_{REF} \cdot \bar{x}_j + .. + (-1)^m \frac{I_{REF} \cdot \bar{x}_j^{\,m}}{m!}; \quad j=1,..,n \qquad (5)$$<br />


The key idea in the SM circuit is now to make the constant current, I0, adaptive to the current input configuration, by imposing it to be proportional to the largest input value:<br />
$$I_{REF} = K_{REF} \cdot \max_i\{\bar{I}_i\} \qquad (6)$$<br />
By this trick one has $x_{upper} = \max_i\{\bar{I}_i\} / I_{REF} = K_{REF}^{-1}$, which makes the approximation error (3) independent of the inputs, and the designer has one degree of freedom, i.e., the order, m, of approximation.<br />
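Two steps that the text leaves implicit can be written out as a short derivation. This is a sketch in the paper's symbols; the bar notation for min-subtracted quantities is assumed from (4). First, the pivot does not change the Soft Max value, because the common factor cancels in the ratio. Second, the adaptive reference confines the argument of the truncated series to a fixed interval.

```latex
% Pivot invariance: subtracting x_0 from every input leaves (1) unchanged.
S_j = \frac{e^{-x_j}}{\sum_{i=1}^{n} e^{-x_i}}
    = \frac{e^{-x_0}\, e^{-\bar{x}_j}}{e^{-x_0} \sum_{i=1}^{n} e^{-\bar{x}_i}}
    = \frac{e^{-\bar{x}_j}}{\sum_{i=1}^{n} e^{-\bar{x}_i}}

% Bounded argument: with I_REF chosen as in (6),
0 \le \bar{x}_j = \frac{\bar{I}_j}{I_{REF}}
  \le \frac{\max_i\{\bar{I}_i\}}{K_{REF} \cdot \max_i\{\bar{I}_i\}}
  = \frac{1}{K_{REF}}
```

The truncated series is therefore only ever evaluated on the interval [0, 1/KREF], whatever the input configuration.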

2.2 Approximation Error Analysis<br />

The final error characterization gives a bound that decreases rapidly with K0 for any input vector dimension n and for all approximation orders m>1:<br />
$$Err_j = |S_j - \tilde{S}_j| \le \frac{2 \cdot \left(\frac{1}{K_{REF}}\right)^{m+1}}{n \cdot \left(1 - \frac{1}{K_{REF}}\right)^2 \cdot (m+1)!} \qquad (7)$$<br />

The graph in Fig. 1 shows a family of error curves having a common approximation order m=2, for several input dimensions, n. The graph clearly shows that the adopted approximation order m=2 is sufficient to limit the approximation error under 0.05 for all K0>1.<br />
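The bound (7) is straightforward to evaluate numerically. The sketch below is an illustration only; the function name `error_bound` is an assumption, and the settings n=8, m=2 and K0 in {3, 7, 15} are taken from the experimental section.

```python
from math import factorial

def error_bound(k_ref, n, m=2):
    """Upper bound (7) on |S_j - S~_j| for gain factor k_ref > 1, n inputs, order m."""
    x_upper = 1.0 / k_ref
    return 2.0 * x_upper ** (m + 1) / (n * (1.0 - x_upper) ** 2 * factorial(m + 1))

for k_ref in (3, 7, 15):
    print(k_ref, round(error_bound(k_ref, n=8), 4))
```

Rounded to four decimals, the three values agree with the predicted-error row of Table II (0.0035, 0.0002 and 0.0000).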


Figure 1: Approximation error bound<br />

3. CURRENT MODE CIRCUIT<br />

DESIGN<br />

For the sake of clarity, the current-mode Soft Max circuit<br />

design will be expla<strong>in</strong>ed by consider<strong>in</strong>g a sample set-up<br />

adopt<strong>in</strong>g the above-derived approximation order m=2. The<br />

follow<strong>in</strong>g sections will first describe the sub-circuits to<br />

perform maximum- <strong>and</strong> m<strong>in</strong>imum-extraction, then the<br />

schemata to compute each s<strong>in</strong>gle term of the exponential<br />

approximation will be illustrated <strong>and</strong> analyzed. F<strong>in</strong>ally, a<br />

short description of the global normalization circuit will<br />

be given.<br />


3.1 Maximum M<strong>in</strong>imum Extraction<br />

The extraction of the maximum and minimum components of the input vector is performed by a hierarchical binary-tree architecture made of two-input/two-output cells. One output line from each cell gives the larger input value, whereas the other channel returns the smaller. Each hierarchy (for max and min selection, respectively) includes log2(n) layers, but the two hierarchies share the first layer. The schema of a binary cell is presented in Fig. 2, and is a variation of the classical WTA Lazzaro cell [3]. The steering cell in Fig. 3 compensates crossover distortion when the pair of input values are not separable, and gives as outputs their arithmetic average.<br />
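The tree organization described above can be mimicked with a purely behavioral model. The sketch below is an assumption-laden illustration: `compare_cell` and `max_min_tree` are hypothetical names, the cells are ideal comparators, the steering cell is ignored, and the number of inputs is assumed to be a power of two (at least 2).

```python
def compare_cell(a, b):
    """Behavioral model of one two-input/two-output cell: returns (larger, smaller)."""
    return (a, b) if a >= b else (b, a)

def max_min_tree(currents):
    """Extract (max, min) with two binary trees that share the first layer.

    Assumes len(currents) is a power of two and at least 2.
    """
    # Shared first layer: each cell feeds its larger output to the max tree
    # and its smaller output to the min tree.
    pairs = [compare_cell(currents[i], currents[i + 1])
             for i in range(0, len(currents), 2)]
    highs = [p[0] for p in pairs]
    lows = [p[1] for p in pairs]
    # Remaining layers: log2(n) - 1 per hierarchy.
    while len(highs) > 1:
        highs = [compare_cell(highs[i], highs[i + 1])[0]
                 for i in range(0, len(highs), 2)]
        lows = [compare_cell(lows[i], lows[i + 1])[1]
                for i in range(0, len(lows), 2)]
    return highs[0], lows[0]

print(max_min_tree([4.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]))
```

Sharing the first layer means an n-input system needs n/2 cells there, rather than n/2 per hierarchy.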

Figure 2: B<strong>in</strong>ary M<strong>in</strong>-/Max extraction cell<br />

Figure 3: Steer<strong>in</strong>g cell for Max-M<strong>in</strong> selection<br />

3.2 Exp approx.: the Constant Term<br />

The bias, IREF, is the first series term in (5) that must be computed to obtain the approximated exponential value. Expression (6) shows that the inputs required to generate that bias term are the difference between the largest and smallest inputs, and the gain factor KREF. The schema adopted to support (6) accordingly is a current-steering D/A converter, whose digital inputs represent the K0 factor, while the reference current is the difference current $\max_j\{\bar{I}_j\}$. Fig. 4 shows the constant term circuit schema.<br />

3.3 Exp approx.: the L<strong>in</strong>ear Term<br />

The linear term of the approximation is simply the input current of the j-th channel, after the subtraction of the minimum input current: $\bar{I}_j = I_j - \min_i\{I_i\}$.<br />



Figure 4: Cell schematic for the I REF term<br />

3.4 Exp approx.: the Quadratic Term<br />

The circuit schema for the quadratic term in (5) uses a weak-inversion trans-linear loop, which is highlighted in the grey area of Fig. 5. It is easy to verify that such a sub-circuit drives into the mirror M34/M33 a current term that is equal to $(\bar{I}_j)^2 / I_{REF}$. The output mirror exhibits a 2:1 aspect ratio to implement the denominator factor in (5).<br />

3.5 The Exponential Approximated Value<br />

Fig. 5 gives the complete schema used for the second order exponential approximation. The 2-nd order approximation of the exponential will therefore include the three terms described above. In the circuit schema of Fig. 5, the associated currents sum at node A, hence one obtains an output current value:<br />
$$I_{exp}^{j} = K_{REF} \cdot I_0 \cdot \max_i\{\bar{x}_i\} \cdot \left(1 - \frac{1}{K_{REF}} \hat{x}_j + \frac{1}{2 K_{REF}^2} \hat{x}_j^2\right) \qquad (8)$$<br />
where:<br />
$$\hat{x}_j = \frac{\bar{x}_j}{\max_i\{\bar{x}_i\}} \qquad (9)$$<br />

Figure. 5: Exponential Approximation Cell<br />


The amplification factor in (8) is common to all components, and vanishes in the global normalization step. Instead, those expressions show that, in the output current value (8), the factor (KREF)^-1 can be rewritten as the γ factor in the Soft Max definition (1); thus a dynamic control of the KREF factor (as shown in Fig. 4) allows one ultimately to adjust the Soft Max gain dynamically.<br />

3.6 The Global Normalization Circuit<br />

The final global normalization is performed by a CMOS version of Gilbert’s global normalization circuit [2], whose operation can be written as<br />
$$I_{out}^{j} = I_T \cdot \frac{I_{exp}^{j}}{\sum_{i=1}^{n} I_{exp}^{i}} \qquad (10)$$<br />

In the <strong>in</strong>put/output relationship, the IT bias factor can be<br />

set <strong>in</strong> an arbitrary way, <strong>and</strong> therefore can be adjusted to<br />

control the dynamic output range. Fig. 6 gives the circuit<br />

schema.<br />
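The complete signal chain of Section 3 (minimum subtraction, adaptive reference, three-term series, global normalization) can be sketched as an idealized behavioral model. This is an illustration under stated assumptions, not the authors' simulation set-up: the function names are hypothetical, all devices are ideal, and the reference current is taken proportional to the largest min-subtracted input as described for the constant term.

```python
import math

def soft_max_circuit(currents, k_ref, i_t=1.0):
    """Ideal model: min subtraction, adaptive I_REF, 2nd-order series, normalization."""
    i_min = min(currents)
    i_bar = [i - i_min for i in currents]        # min-subtracted currents, as in (4)
    i_ref = k_ref * max(i_bar)                   # adaptive reference, as in (6)
    if i_ref == 0.0:                             # all inputs equal: uniform output
        return [i_t / len(currents)] * len(currents)
    # Constant, linear and quadratic terms of (5) with m = 2, summed at one node.
    i_exp = [i_ref - ib + ib * ib / (2.0 * i_ref) for ib in i_bar]
    total = sum(i_exp)
    return [i_t * ie / total for ie in i_exp]    # global normalization, as in (10)

def soft_max_ideal(currents, k_ref):
    """Exact Soft Max with gain 1/k_ref on inputs rescaled to [0, 1]."""
    i_min = min(currents)
    span = max(currents) - i_min
    if span == 0.0:
        return [1.0 / len(currents)] * len(currents)
    w = [math.exp(-(i - i_min) / (k_ref * span)) for i in currents]
    total = sum(w)
    return [v / total for v in w]

experiment_1 = [4.0] + [0.5] * 7                 # input currents in microamperes
for k_ref in (3.0, 7.0, 15.0):
    approx = soft_max_circuit(experiment_1, k_ref)
    ideal = soft_max_ideal(experiment_1, k_ref)
    worst = max(abs(a - b) for a, b in zip(approx, ideal))
    print(k_ref, round(ideal[0], 4), round(approx[0], 4), worst)
```

Because the normalization divides by the sum of all channel currents, any factor common to every channel, including the absolute current scale, drops out of the result.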

Figure. 6: Current Global Normalizer<br />

4. EXPERIMENTAL RESULTS<br />

4.1 Design Environment<br />

The example Soft Max computation system has been developed using a TSMC 0.35 µm p-substrate technology through the MOSIS service; the Eldo and IC Flow tools by Mentor Graphics supported the circuit simulation and the CAD design for the layout phase, together with the Calibre tool for post-layout analysis.<br />

4.2 Experimental Tests<br />

The tests implemented to verify the functionality of the proposed schema involved a case study of n=8 input quantities. The maximum admissible input current was 4 µA, while the minimum value was 0.5 µA; the bias value for the IT term was set to 10 µA. Table I gives the sizes of the MOS transistors.<br />

Stress tests aimed at verify<strong>in</strong>g the implementation<br />

correctness <strong>in</strong> a variety of worst-case <strong>in</strong>put distributions,<br />

<strong>in</strong> which the <strong>in</strong>put configurations stressed the <strong>in</strong>put<br />

dynamic range (one <strong>in</strong>put very far from all others).



Table I - MOS Sizes for the SM Circuits<br />
Transistor | W/L (µm) | Transistor | W/L (µm)<br />
M4 M10 M12 M18 M21 M35 | 1/1 | M31 | 12/1<br />
M14 M16 | 2/1 | M34 | 6/1<br />
M15 M22 | 5/1 | M41 | 15/1<br />
M17 | 1/3 | M43 | 2/2<br />
M13 M20 M33 | 3/1 | M47 | 50/1<br />
M32 M42 | 100/1 | M48 | 1/10<br />

Table II gives the post-layout results for three SM ga<strong>in</strong><br />

sett<strong>in</strong>gs <strong>in</strong> each experiment: K0∈{3,7,15}. L<strong>in</strong>es 1, 2, <strong>and</strong><br />

3 give the ideal SM values (1) <strong>and</strong> the predicted approx<br />

error, respectively. L<strong>in</strong>es 4-7 give the actual circuit<br />

outputs <strong>and</strong> the associated overall approximation errors.<br />

Table II - Results for worst-case experiments<br />
Inputs: Experiment 1: I1 = 4µA, I2-8 = 0.5µA; Experiment 2: I1 = 0.5µA, I2-8 = 4µA<br />
Quantity | Exp. 1, K0=3 | Exp. 1, K0=7 | Exp. 1, K0=15 | Exp. 2, K0=3 | Exp. 2, K0=7 | Exp. 2, K0=15<br />
S1(Y) | 0.0929 | 0.1102 | 0.1179 | 0.1662 | 0.1415 | 0.1325<br />
Si(Y), i=2,..,8 | 0.1296 | 0.1271 | 0.1260 | 0.1191 | 0.1226 | 0.1239<br />
Erri ∀i, see (12) | 0.0035 | 0.0002 | 0.0000 | 0.0035 | 0.0002 | 0.0000<br />
Iout(1) / IT | 0.0884 | 0.1055 | 0.1148 | 0.1756 | 0.1516 | 0.1420<br />
Iout(i) / IT, i=2,..,8 | 0.1316 | 0.1300 | 0.1298 | 0.1190 | 0.1233 | 0.1258<br />
Actual Err1 | 0.0045 | 0.0047 | 0.0031 | 0.0094 | 0.0101 | 0.0095<br />
Actual Erri, i=2,..,8 | 0.0020 | 0.0029 | 0.0038 | 0.0001 | 0.0007 | 0.0019<br />

Experimental results confirm that the HW-induced error matched theoretical predictions well; SM operation was also effectively supported, as the overall distortion never exceeded 5%. The rapidly decreasing predicted error (12) when K0 increases may also suggest that, for large K0 settings, the power series can stop at the first term (m=1).<br />

Figure 7: Dynamic Error Behavior<br />


Another campaign of dynamic tests measured the circuit bandwidth, which exceeded 100 kHz. Fig. 7 gives the measured dynamic distortion when the input currents were sinusoidal waveforms with 3 µA amplitude and 45° phase shift from one another. The graph shows that the dynamic error is a zero-mean sinusoidal waveform as well, and never exceeds a 5% overall distortion. Overall power consumption was less than 5 mW.<br />
The complete layout of the example Soft Max computation system occupies approximately 400×480 µm². Fig. 8 shows the complete layout.<br />

Figure 8: Layout of the 8-<strong>in</strong>put Soft Max system<br />

5. CONCLUSIONS<br />

A current-mode analogue approach involving a 2-nd order approximation can support Soft-Max operation effectively. The research presented a modular approach that allows a designer to develop SM circuits with a predictable and consistent performance. The overall SM schema also enables one to control the SM gain dynamically.<br />

6. REFERENCES<br />

[1] Gold S, Rangarajan A, “Softmax to softassign: neural network algorithms for combinatorial optimization,” Journal of Artificial Neural Networks, 1995, vol. 2, No. 4, pp. 381-399.<br />
[2] Gilbert B, “A Monolithic 16-Channel Analog Array Normalizer,” IEEE Journal of Solid-State Circuits, SC-19, No. 6, pp. 956-963, Dec. 1984.<br />
[3] Lazzaro J, Ryckebusch S, Mahowald MA, Mead CA, “Winner-take-all networks of O(n) complexity,” in Adv. Neur. Info. Proc. Sys. NIPS 1, Morgan Kaufmann, 1989, pp. 703-711.<br />


0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

TOWARD ASYNCHRONOUS AND HIGH DATA<br />

RATES PASSIVE CONTACTLESS SYSTEMS<br />

Damien Caucheteux (1)(2) , Edith Beigné (1) , Marc Renaud<strong>in</strong> (2) , Elisabeth Crochon (1)<br />

(1) CEA, LETI-DCIS, 17 rue des Martyrs, 38054 Grenoble Cedex 9, France<br />
(2) TIMA Lab., CIS-Group, 46 av. Félix Viallet, 38031 Grenoble Cedex, France<br />

E-mail: damien.caucheteux@cea.fr<br />

ABSTRACT<br />

This paper is a contribution to the introduction of a new class of contactless remotely powered systems. To improve remotely powered device performance, such as transmission data rates, power consumption and dynamic adaptation to the environment, fully asynchronous systems are used. A new and specific event-based communication is developed and tested with an inductive link coupling model. Post-layout simulations of the asynchronous, self-adaptive to data rate tag test chip show that a downlink communication up to 1.2 Mbps is achieved with a global power consumption on the tag below 120 µW.<br />

1. INTRODUCTION<br />

A contactless system is at least composed, as shown <strong>in</strong><br />

figure 1, of a reader <strong>and</strong> a tag [1]. The first one is a fixed<br />

element, connected to a specific data base or network, <strong>and</strong><br />

the second one is a mobile element. The tag receives,<br />

through the magnetic field, the downl<strong>in</strong>k communication<br />

<strong>and</strong> emits back to the reader the upl<strong>in</strong>k communication.<br />

Figure 1. Minimum inductively powered system. [Diagram labels: Reader; Energy, Clock, Data; Tag; Few centimeters.]<br />

Accord<strong>in</strong>g to required communication distances <strong>and</strong> data<br />

rates, contactless systems usually follow one of the<br />

13.56 MHz ISO st<strong>and</strong>ards [1]. Codes, data rates, modulation<br />

characteristics <strong>and</strong> protocols are def<strong>in</strong>ed <strong>in</strong> these<br />

documents. Table 1 gives some of their characteristics.<br />

Table 1. Some 13.56 MHz downlink standards.<br />
Standard (Ref) | Type | Downlink modulation | Downlink data rate<br />
14223 | HDX/FDX | ASK 90-100% | 6 kbps<br />
14223 | SEQ | 90° PSK | 500 kbps<br />
10536 | X | ASK 100% | 9.6 kbps<br />
14443 | A | ASK 100% | 106 kbps<br />
14443 | B | ASK 10% | 106 kbps<br />
15693 | Low | ASK 100% or 10% | 1.65 kbps<br />
15693 | Fast | ASK 100% or 10% | 26.5 kbps<br />

Protocols can be sequential (SEQ) if the carrier sent by the reader is discontinuous, half-duplex (HDX) if uplink and downlink communications alternate on a continuous carrier, or full-duplex (FDX) if communications are simultaneous on a continuous carrier. The standard determines the data frame to be sent to the tag and fixes the demodulator specifications. The tag is then designed to be compatible with a reader protocol according to the predefined data frame and bit rate. The tag demodulator is resynchronized at each start-of-frame symbol to detect the incoming flow.<br />

New applications, like electronic keys, contactless cryptographic smart cards, biometric identification or stimulation systems, require high-performance contactless systems. A solution is to improve power transmission or data rates, to allow a full-duplex protocol or to reduce power consumption [2][3]. Current proposals are based on fixed synchronous protocols and data rates, limiting their use to specific environments and distances. If conditions are not satisfied, no transmission is possible, whereas applications could require performance to adapt to the environment, for example by reducing the data rate to lower power consumption.<br />

In this paper, we present a new class of 13.56 MHz inductively powered contactless systems to reach high data rates, asynchronous transmission and reduced power consumption. Moreover, a self-adaptive to data rate tag test chip is designed to detect any modulation data rate. First, asynchronous system properties and motivations are presented. Secondly, to outline air interface properties and to validate the new event-based communication scheme, an inductive link modelling methodology is described. We finally briefly present, in section 4, a self-adaptive to data rate tag architecture and features.<br />

2. ASYNCHRONOUS SYSTEMS<br />

2.1 Asynchronous circuit properties<br />

As an answer to the development of high performance<br />

applications, some works have proved benefits of asynchronous<br />

circuits on tag.<br />

Figure 2. Local handshake synchronization. [Diagram labels: operator n; data + request; acknowledge; operator n+1.]<br />


Those systems are based on ISO 14443 standard communications [4][5]. To replace the global clock synchronization signal, they rely locally on a handshake protocol, as illustrated in figure 2.<br />
First, the handshake synchronization makes those circuits active on demand, which is why they are said to be data driven. So, asynchronous circuits are active where and when needed. Moreover, block activations are distributed over time, limiting consumption current peaks. Those two properties reduce static and dynamic power consumption [6]. Spectrum and harmonics of digital circuits are also reduced.<br />
Secondly, the delay-insensitive property of asynchronous circuits makes them robust to supply voltage variations: this improves remotely powered tag performance. It also implements an automatic performance regulation on the tag, depending on available power. The circuit speed, and so its consumption, is automatically reduced when the energy received is reduced. This also reduces tag front-end constraints and consequently its complexity and area.<br />

2.2 A fully asynchronous contactless tag<br />

In this paper, we propose to take advantage of high-performance solutions and asynchronous circuits: a fully asynchronous tag, using an asynchronous event-based transmission from the reader to the tag, is introduced. To adapt tag performance to conditions, a high data rate and very flexible variable communication scheme is needed.<br />
A self-adaptive to data rate front-end enables us not to use a predefined and fixed data transmission rate. With the local synchronization of asynchronous circuits, a global data clock is no longer needed. An event-based communication is introduced here to perform asynchronous transmissions. Events notify incoming data to the tag, replacing a data transmission clock rising edge. These events are said to be asynchronous because they can appear at any time of the carrier phase and at any time after the previous one.<br />
To achieve the demodulation, the device continuously tracks events on the carrier to detect data on the coil. Then, a signal acquisition sends an image of the received event to an asynchronous event demodulator. Finally, a comparison between the detected mark and memorized marks enables the tag to recover the signature of this event and consequently the associated data value.<br />
We may notice that there is no timing assumption. Any data transmission rate can dynamically be chosen, up to the limit of system performance.<br />

3. INDUCTIVE LINK MODELING<br />

3.1 Model<strong>in</strong>g methodology<br />

To characterize communications and investigate different solutions, a simple but efficient model is used. Whereas the reader antenna is modeled by an R1L1C1 series resonant circuit, the tag antenna is modeled by an R2L2C2 parallel one [3]. The general case of an inductive link is then described by the equation system giving the received voltage on the tag, V2, and the current I1 through inductance L1, knowing the emission V1, the circuit characteristics (ωi, Qi and Li) and the inductive coupling coefficient k.<br />

$$\begin{cases} V_1 = \dfrac{L_1 \omega_1}{Q_1} I_1 + j L_1 \left(\omega - \dfrac{\omega_1^2}{\omega}\right) I_1 + j k \sqrt{L_1 L_2}\, \omega I_2 \\ V_2 = j L_2 \omega I_2 + j k \sqrt{L_1 L_2}\, \omega I_1 \\ V_2 = - \dfrac{Q_2 L_2 \omega_2^2}{\omega_2 + j Q_2 \omega} I_2 \end{cases}$$<br />
Equation 1. Inductive link model equations.<br />

These equations, given in equation 1, are the basis of a powerful tool for fast investigations. It has been implemented in Matlab Simulink to perform fast and comprehensive simulations.<br />
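For a single excitation frequency, the three relations of equation 1 form a linear system in the phasors I1, I2 and V2, which can be solved directly. The Python sketch below is a static, steady-state illustration only and is not the authors' Simulink model; the function name `inductive_link` and every numerical parameter value are assumptions chosen for the example. It takes a series R-L-C reader circuit, a parallel R-C load across the tag coil, and a mutual inductance k·sqrt(L1·L2).

```python
import numpy as np

def inductive_link(v1, omega, k, L1, Q1, omega1, L2, Q2, omega2):
    """Solve the phasor form of the link equations for (I1, I2, V2)."""
    m = k * np.sqrt(L1 * L2)                                        # mutual inductance
    z1 = L1 * omega1 / Q1 + 1j * L1 * (omega - omega1**2 / omega)   # series R1-L1-C1
    z2 = Q2 * L2 * omega2**2 / (omega2 + 1j * Q2 * omega)           # parallel R2 || C2
    # Unknown vector: [I1, I2, V2]
    a = np.array([
        [z1,             1j * m * omega,  0.0],    # V1 = z1*I1 + j*M*w*I2
        [1j * m * omega, 1j * L2 * omega, -1.0],   # V2 = j*L2*w*I2 + j*M*w*I1
        [0.0,            z2,              1.0],    # V2 = -z2*I2
    ], dtype=complex)
    b = np.array([v1, 0.0, 0.0], dtype=complex)
    i1, i2, v2 = np.linalg.solve(a, b)
    return i1, i2, v2

# Hypothetical parameters: both circuits tuned to the 13.56 MHz carrier.
w0 = 2 * np.pi * 13.56e6
for k in (0.01, 0.05, 0.2):
    i1, i2, v2 = inductive_link(1.0, w0, k, 1e-6, 20.0, w0, 2e-6, 10.0, w0)
    print(k, abs(v2), abs(i1))
```

Sweeping `omega` or `k` in such a model gives the kind of static analysis mentioned in the text; transient behaviour requires the differential form of the equations.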

3.2 Ma<strong>in</strong> <strong>in</strong>ductive l<strong>in</strong>k properties<br />

Communication characteristics are given in two steps: first, static analyses and then, dynamic simulations. Static analysis results outline coupling family characteristics defined by resonant circuit parameter values [7]. On the other hand, dynamic analyses give transient simulations of V2 and I1. With this new fast simulation model, full-duplex protocol simulation and investigation of the influence of parameter values are possible by solving a system of three differential equations deduced from equation 1. This model has been validated with a Spice-level simulation of the reader front-end architecture.<br />

This model reveals important features of modulation<br />
schemes. For example, the amplitude received by the<br />
tag is an important selection criterion, because the power transfer<br />
function depends on the amplitude level. We therefore<br />
analyze this criterion. As the tag is designed<br />
to receive a fixed voltage, the transmitted power decreases<br />
with signal amplitude. Consequently, optimal power<br />
transfer conflicts with easy ASK (Amplitude Shift<br />
Keying) demodulation, which needs high modulation index<br />
values. FSK (Frequency Shift Keying), although it is a<br />
frequency modulation, also modulates the received voltage<br />
amplitude because of the band-pass filtering function<br />
of the inductive link [8]. In contrast, the inductive model shows<br />
that PSK (Phase Shift Keying) creates only a transient<br />
amplitude modulation. This explains why the model<br />
shows that phase modulation gives good EMI radiation<br />
results for a given transmitted power.<br />

Moreover, dynamic simulations of an ISO 14443 standard link,<br />
with a 106 kbps data rate, show that the physical<br />
limit of this inductive link is high enough to allow higher<br />
data rates. Some studies have already shown that<br />
phase modulation allows high data rate transmissions [8].<br />
The presented model thus shows that we can improve<br />
transmission data rates and choose a downlink<br />
modulation that is efficient in terms of power transfer. This<br />
model leads us to the communication scheme<br />
presented below.<br />



3.3 The new event based communication<br />

A new asynchronous event-based communication scheme<br />
is deduced from those studies. No data clock transmission<br />
or frequency knowledge is needed: each data item is carried by<br />
a particular mark that fits asynchronous circuit protocols.<br />
This mark, or event, carries not only a data value but<br />
also a local synchronization that requests a demodulation.<br />
To adapt dynamically to variable data rates, the<br />
associated demodulator has to detect incoming marks.<br />
Moreover, if needed, handshake communications over<br />
the inductive link are possible using a full-duplex protocol.<br />
Given these specifications and the modulation properties<br />
presented above, we have chosen a phase modulation<br />
based on π/4 shifts of a 13.56 MHz carrier.<br />
This leads to a code, illustrated in figure 4-a, called a<br />
two-of-eight cyclic code. The phase shifts implement<br />
the asynchronous synchronization, while their signs encode<br />
the data value (logical 0 or 1). Therefore, with such<br />
phase shift events, the receiver does not need to know the<br />
absolute value of the phase; it only detects phase shifts.<br />
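The encoding described above can be sketched with phase positions counted in units of π/4, modulo 8. The mapping of '1' to a +π/4 shift and '0' to a −π/4 shift is an assumption made for this example, as are the function names; the point illustrated is that the decoder uses only phase differences, so the starting phase is irrelevant.<br />

```python
def encode(bits, start=0):
    """Return the phase positions (units of pi/4, modulo 8) after each event."""
    phases = [start % 8]
    for b in bits:
        step = 1 if b else -1              # assumed mapping: '1' -> +pi/4, '0' -> -pi/4
        phases.append((phases[-1] + step) % 8)
    return phases

def decode(phases):
    """Recover the bits from successive phase differences only."""
    bits = []
    for prev, cur in zip(phases, phases[1:]):
        diff = (cur - prev) % 8
        if diff == 1:
            bits.append(1)                 # +pi/4 shift
        elif diff == 7:
            bits.append(0)                 # -pi/4 shift
        else:
            raise ValueError("not a single pi/4 phase shift")
    return bits

message = [1, 0, 0, 1, 1]
# Same message, two different absolute starting phases.
print(decode(encode(message, start=0)), decode(encode(message, start=5)))
```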

[Figure 4: (a) phase diagram of the code with eight phase positions (0, ±π/4, ±π/2, ±3π/4, π), transitions labelled '1' and '0', and the phase step ϕn+1 − ϕn = π/4; (b) timing diagram versus time t showing the data value, the data synchronization, the phase position ϕn and the modulated carrier V2.]<br />
Figure 4. Event based communication.<br />

An asynchronous downlink communication of up to 1 Mbps,<br />
i.e. with a minimum time of 1 µs between two data items, has<br />
been simulated using the inductive link model. An<br />
asynchronous downlink transmission<br />
simulation is illustrated in figure 4-b. The data value and local<br />
synchronization are presented first. The phase position is<br />
then shown: each new data item is coded by a<br />
level transition. Finally, the signal received on the coil, V2, is<br />
simulated using the inductive link model.<br />

A new high data rate asynchronous reader-to-tag transmission<br />
is proposed. Whereas all previous devices use a<br />
fixed, predefined data rate, this new asynchronous<br />
event-based communication allows tags to receive any<br />
data rate up to a maximum value. Moreover, phase<br />
modulation is an efficient way to provide high power<br />
transmission to the tag. Furthermore, event transmission is<br />
the key to introducing a new class of contactless devices<br />
that self-adapt to the data rate. The next section<br />
explains the design of an asynchronous event-based<br />
communication demodulator on a remotely powered tag.<br />


4. ON TAG DEMODULATION<br />

The front-end of a remotely powered device generally consists,<br />
first, of high-voltage functions that control supply voltage,<br />
modulation, etc. Secondly, analog functions provide<br />
stable current references, signal extraction, etc. However,<br />
a fully asynchronous tag needs neither clock generation<br />
nor high-performance voltage regulation to<br />
supply its digital blocks. As a consequence, the tag front-end<br />
is simplified and its area reduced. Finally, an interface<br />
makes the digital circuits compatible with the analog front-end.<br />
The classical global architecture and main<br />
functions of a contactless remotely powered device are not<br />
presented here. For more information, see the<br />
RFID Handbook, 2nd Edition [1].<br />

To detect data events, a charge pump PLL structure,<br />
illustrated in figure 5, is implemented on the tag. A digital<br />
Phase and Frequency Detector (PFD) is used to analyze<br />
phases and to activate the Charge Pump (CP). It compares<br />
the signal received on the coil with an internal reference<br />
generated by the Voltage Controlled Oscillator (VCO).<br />

[Figure 5: block diagram showing the ultra low power PLL (PFD with inputs Ext-Ref and Int-Ref, signals Up and Down, CP, VCO) and the Asynchronous Phase Jump Detector (APJD) with a multi-rail output.]<br />
Figure 5. PLL and demodulation architecture.<br />

The asynchronous event-based communication is demodulated<br />
by an Asynchronous Phase Jump Detector<br />
(APJD), which produces data on its outputs from the PFD<br />
results. First, each phase shift received on signal Up<br />
or Down is filtered to separate data events from phase<br />
noise. A succession of filter cells gives n different images<br />
of the same phase shift, with different filtering characteristics<br />
chosen according to the tag position in the field.<br />
Then, knowing the encoding scheme, the signature detection<br />
links phase variations to received data: selection of<br />
a filtering characteristic, detection of the event envelope,<br />
and conversion into a code compatible with asynchronous circuits.<br />
As no data clock is needed, the data rate can dynamically<br />
take any value up to the limit set by link and circuit performance.<br />

[Figure 6: schematic of the differential delay cell, with inputs In+ and In−, outputs Out+ and Out−, supply VCC, control voltage Vcont, bias current Ipol and a symmetric load.]<br />
Figure 6. Self-regulated differential delay cell.<br />


The second key point is the design of an ultra low power<br />
VCO operating around 13.56 MHz to create the internal reference.<br />
The VCO is a ring oscillator composed of<br />
three differential voltage-controlled delay cells [9].<br />
Moreover, a self-regulation function is added to achieve an<br />
ultra low supply voltage sensitivity [10]. The VCO power<br />
consumption is 16 µW, and output frequency variations are<br />
below 1 % for supply voltage variations of up to 20 %.<br />

[Figure 7: supply sensitivity plot of normalized frequency (0.98 to 1.01) versus supply voltage (620 to 1020 mV) at 300 K, for the typical, worst speed and worst power process corners.]<br />
Figure 7. VCO supply voltage sensitivity.<br />

The analog tag front-end is designed for high performance<br />
and low power consumption. As shown in<br />
table 2, the front-end consumption is below 120 µW<br />
during a 1 Mbps demodulation. In standby mode, when the tag<br />
is waiting for events, power consumption is 10 % lower.<br />
Moreover, downlink transmissions can take any rate up to<br />
1.2 Mbps at 10 cm. Compared to a similar front-end<br />
architecture, with a 1 Mbps phase demodulation and a<br />
13.56 MHz carrier [8], the power consumption of our design<br />
is lower by a factor of 6.<br />

Table 2. Post-layout simulations.<br />

Reader to tag distance (cm) | 0 | 6 | 10<br />
Supply voltage VCCA (V) | 1.18 | 1.04 | 0.94<br />
Supply current I(VCCA) (µA) | 98.3 | 84.8 | 79.5<br />
Power consumption (µW) | 116 | 88.2 | 74.7<br />
Maximum data rate (Mbps) | 1.0 | 1.2 | 1.2<br />
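The power figures in table 2 can be cross-checked as the product of supply voltage and supply current. The short sketch below does this and also derives an energy per bit at the maximum data rate; that last quantity is not given in the paper and is computed here only as an illustration.<br />

```python
# Table 2 values: distance (cm) -> (VCCA in V, I(VCCA) in uA, power in uW, max rate in Mbps)
table2 = {
    0: (1.18, 98.3, 116.0, 1.0),
    6: (1.04, 84.8, 88.2, 1.2),
    10: (0.94, 79.5, 74.7, 1.2),
}

def power_uw(vcca_v, i_ua):
    """Supply power in microwatts from volts and microamperes."""
    return vcca_v * i_ua

def energy_per_bit_pj(p_uw, rate_mbps):
    """Energy per bit in picojoules (uW divided by Mbps); derived, not from the paper."""
    return p_uw / rate_mbps

for distance, (v, i, p, rate) in table2.items():
    print(distance, "cm:", round(power_uw(v, i), 1), "uW computed,", p, "uW listed,",
          round(energy_per_bit_pj(p, rate), 1), "pJ/bit")
```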

5. CONCLUSION<br />

A fully asynchronous contactless system that self-adapts to<br />
the data rate has been developed. An asynchronous<br />
event-based communication protocol using phase<br />
modulation and a two-of-eight cyclic code has been<br />
adopted because it fulfils both the functional requirements we<br />
target and the silicon implementation requirements.<br />
Indeed, the proposed contactless system makes it possible to<br />
increase and dynamically adjust the transmission rate, to<br />
lower the power consumption and to simplify the<br />
architecture.<br />

Post-layout simulations have shown that a downlink<br />
transmission rate of up to 1.2 Mbps can be achieved.<br />
Moreover, the proposed architecture is low power and has<br />
a low sensitivity to the supply voltage, which will extend<br />
the usage conditions and performance. The chip implementing this<br />
architecture, developed in a 130 nm standard CMOS<br />
process, is currently under test. These tests are expected to confirm<br />
the benefits of this new approach and to validate the uplink<br />
communication scheme.<br />

6. ACKNOWLEDGEMENTS<br />

The authors would like to thank all the CEA-LETI teams<br />
who have worked on this project for their help.<br />

7. REFERENCES<br />

[1] K. Finkenzeller, RFID Handbook, Fundamentals and Applications in Contactless Smart Cards and Identification, Second Edition, John Wiley & Sons Ltd, 2003.<br />
[2] M. Ghovanloo and K. Najafi, A Wideband Frequency Shift Keying Wireless Link for Inductively Powered Biomedical Implants, IEEE Transactions on Circuits and Systems, vol. 51, n° 12, 2004.<br />
[3] D. C. Galbraith, M. Soma and R. L. White, A Wide-Band Efficient Inductive Transdermal Power and Data Link with Coupling Insensitive Gain, IEEE Transactions on Biomedical Engineering, vol. 34, n° 4, 1987.<br />
[4] J. Kessels, T. Kramer, G. d. Besten, A. Peters and V. Timm, Applying Asynchronous Circuits in Contactless SmartCards, IEEE 6th Int. Symposium on Advanced Research in Asynchronous Circuits & Systems, ASYNC'00, April 2000, pp. 36-44.<br />
[5] A. Abrial, J. Bouvier, M. Renaudin, P. Senn and P. Vivet, A New Contactless Smart Card IC Using an On-Chip Antenna and an Asynchronous Microcontroller, IEEE Solid-State Circuits, vol. 36, n° 7, 2001.<br />
[6] M. Renaudin, Asynchronous circuits and systems: a promising design alternative, Microelectronics Engineering Journal, Elsevier Science, vol. 54, n° 1-2, Dec. 2000, pp. 133-149.<br />
[7] P. Doguet, Integrated stimulator for visual prosthesis, PhD thesis, Université Catholique de Louvain, 2000.<br />
[8] J. F. Gervais, J. Coulombe, F. Mounaim and M. Sawan, Bidirectional High Data Rate Transmission Interface for Inductively Powered Devices, IEEE Canadian Conference on Electrical and Computer Engineering, vol. 1, pp. 167-170, 2003.<br />
[9] J.G. Maneatis, Low-Jitter Process-Independent DLL and PLL Based on Self-Biased Techniques, IEEE Journal of Solid-State Circuits, vol. 31, n° 11, Nov. 1996, pp. 1723-1732.<br />
[10] I.C. Hwang, C. Kim and S.M. Kang, A CMOS Self-Regulating VCO with Low Supply Sensitivity, IEEE Journal of Solid-State Circuits, vol. 39, n° 1, Jan. 2004, pp. 42-48.



A DIGITAL-PLL-BASED TRUE RANDOM<br />

NUMBER GENERATOR<br />

Chengx<strong>in</strong> Liu, John McNeill<br />

Worcester Polytechnic Institute, Electrical <strong>and</strong> Computer Eng<strong>in</strong>eer<strong>in</strong>g Department,<br />

100 Institute Road, Worcester, MA 01609, USA<br />

E-mail: mcneill@ece.wpi.edu<br />

ABSTRACT<br />

A true random number generator (RNG) based on a<br />
digital phase-locked loop (PLL) has been designed and<br />
implemented in a 1.5 µm CMOS process. It achieved an<br />
output data rate of 100 kbps by sampling two<br />
30 MHz ring oscillators, and successfully passed the NIST<br />
SP800-22 test suite.<br />

1. INTRODUCTION<br />

Several emerging cryptographic applications, such as smart<br />
cards and pervasive computing, require a low-cost solution<br />
to the problem of obtaining high-quality random numbers<br />
in system-on-chip designs for secure data communication.<br />
The most popular approach by far is<br />
oscillator sampling [1][2], because of its small die<br />
area, power efficiency and high speed.<br />

As illustrated in Fig. 1, a low-frequency oscillator with<br />
high jitter samples the output of a high-frequency<br />
oscillator using a D flip-flop to produce the random number<br />
sequence. To achieve a high level of<br />
randomness, the rms jitter of the low-frequency oscillator<br />
must be much greater than the period of the fast oscillator<br />
[2]. Experimental results in [2] have shown that for<br />
CMOS ring oscillators in a 0.18 µm digital library, the<br />
ratio of jitter to mean period is less than 10^-4, which limits<br />
the maximum output data rate to 100 kbps if a 1 GHz fast<br />
oscillator is used.<br />
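The 100 kbps figure can be recovered with a short calculation. The sketch below takes the randomness condition at its limit (slow-oscillator rms jitter equal to one fast period); the symbols r, T_slow and T_fast are introduced here for illustration and do not appear in the paper.<br />

```latex
\sigma_{slow} = r\,T_{slow} \gtrsim T_{fast},
\qquad r = 10^{-4},\quad T_{fast} = \frac{1}{1\,\mathrm{GHz}} = 1\,\mathrm{ns}
\;\Longrightarrow\;
T_{slow} \gtrsim \frac{T_{fast}}{r} = 10\,\mu\mathrm{s}
\;\Longrightarrow\;
\frac{1}{T_{slow}} \lesssim 100\,\mathrm{kHz}
```

With one output bit per slow-clock period, this corresponds to at most about 100 kbps.<br />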

This paper presents a new dual-oscillator sampling<br />
architecture for random number generation. Its main<br />
advantage over the traditional approach is that it<br />
achieves the same data rate with slower clocks, which allows<br />
a cheaper process and lower cost. Section 2 reviews the<br />
jitter theory of ring oscillators and PLLs. The RNG<br />
architecture and circuit details are discussed in section 3.<br />
Section 4 gives the experimental results.<br />

[Figure 1: block diagram. A high-frequency oscillator drives the D input of a D flip-flop that is sampled by a noisy low-frequency oscillator; the Q output is the random bit stream.]<br />
Figure 1. Traditional oscillator-based RNG<br />

2. JITTER IN PLLS<br />

2.1 Jitter <strong>and</strong> phase noise <strong>in</strong> r<strong>in</strong>g oscillators<br />

White noise and 1/f noise are the main noise sources in<br />
MOS transistors, and both are upconverted to phase noise.<br />
Fig. 2 shows, on log-log scales, the typical phase noise spectrum and the time<br />
domain plot of rms jitter versus measurement time delay<br />
for open-loop ring oscillators.<br />

[Figure 2: (a) frequency domain: log(SΦ(f)) versus log(f), with a −30 dB/dec region N1·fc/f³ below the corner fc and a −20 dB/dec region N1/f² above it; (b) time domain: log(σ∆T) versus log(∆T), with a slope 0.5 region κ√∆T below tc and a slope 1 region ζ∆T above it.]<br />
Figure 2. Jitter and phase noise in ring oscillators<br />

The −20 dB/dec region in the phase noise spectrum is due<br />
to the white noise and has the form<br />
$$S_{\Phi W}(f) = \frac{N_1}{f^2} \qquad (1)$$
<br />

where N1 is the frequency domain white noise figure of<br />
merit, and f is the offset frequency from the carrier. In the time domain<br />
it is a Gaussian random process, simply<br />
reflecting the central limit theorem of statistical theory.<br />
The rms jitter after a measurement time delay ∆T is [3]<br />
$$\sigma_w(\Delta T) = \kappa\sqrt{\Delta T} \qquad (2)$$
<br />
where κ is the time domain white noise figure of merit,<br />
which is determined by the delay cell parameters and is<br />
independent of the number of stages in the ring [3]. κ is<br />
related to the frequency domain figure of merit N1 by [3]<br />
$$\kappa = \frac{\sqrt{N_1}}{f_0} \qquad (3)$$
<br />
where f0 is the VCO's oscillating frequency.<br />

Phase noise upconverted from 1/f noise dominates at the lower<br />
offset frequencies and thus at longer time delays. It has a<br />
slope of −30 dB/dec in the phase noise spectrum and is<br />
represented by [4]<br />
$$S_{\Phi 1/f}(f) = \frac{N_1 f_c}{f^3} \qquad (4)$$
<br />
where fc is the 1/f³ phase noise corner [4]. In the time domain,<br />
it is a correlated random process. The accumulated rms<br />
jitter is proportional to the measurement time delay:<br />
$$\sigma_{1/f}(\Delta T) = \zeta\,\Delta T \qquad (5)$$
<br />
where ζ is the time domain 1/f noise figure of merit.<br />

2.2 Jitter <strong>and</strong> phase noise <strong>in</strong> PLLs<br />

The noise transfer function of a PLL is a<br />
high-pass transfer function. If the PLL loop bandwidth is<br />
wide enough to cover the 1/f³ phase noise corner fc, the<br />
upconverted 1/f noise jitter is filtered out, leaving<br />
only the filtered white noise, as in Fig. 3. This remaining<br />
noise has a Gaussian distribution, and the jitter process is a<br />
correlated white noise process.<br />
The closed-loop phase noise spectrum has the form<br />
$$S_{\Phi CL}(f) = \frac{N_1 / f_L^2}{1 + (f / f_L)^2} \qquad (6)$$
<br />
where fL is the PLL loop bandwidth. By the Wiener-Khinchine<br />
theorem, the autocorrelation function of this<br />
jitter process is the inverse Fourier<br />
transform of its power spectral density (6). The<br />
autocorrelation coefficient of this jitter process is therefore<br />
$$\rho_{xx}(\Delta t) = \exp(-2\pi f_L \Delta t) \qquad (7)$$
<br />
and the rms jitter with respect to the PLL reference clock<br />
is [3]<br />
$$\sigma_x = \kappa\sqrt{\frac{1}{4\pi f_L}} \qquad (8)$$
<br />
So for a white noise dominated PLL, only κ and the PLL<br />
loop bandwidth fL are needed for jitter prediction.<br />
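Equations (2), (7) and (8) are simple enough to evaluate directly. The sketch below estimates κ from a per-period jitter using equation (2) with ∆T equal to one period, then evaluates the closed-loop jitter and the autocorrelation coefficient. The 60 ps and 30 MHz inputs are the simulation parameters listed with Fig. 6; the 1 MHz loop bandwidth and the function names are assumptions made for this example.<br />

```python
import math

def kappa_from_period_jitter(sigma_period, period):
    """Equation (2) with dT equal to one period: kappa = sigma / sqrt(T)."""
    return sigma_period / math.sqrt(period)

def closed_loop_jitter(kappa, f_loop):
    """Equation (8): rms jitter with respect to the PLL reference clock."""
    return kappa * math.sqrt(1.0 / (4.0 * math.pi * f_loop))

def autocorrelation(f_loop, dt):
    """Equation (7): autocorrelation coefficient of the closed-loop jitter."""
    return math.exp(-2.0 * math.pi * f_loop * dt)

kappa = kappa_from_period_jitter(60e-12, 1.0 / 30e6)  # parameters listed with Fig. 6
f_loop = 1e6                                          # hypothetical loop bandwidth
print("kappa =", kappa, "sqrt(s)")
print("sigma_x =", closed_loop_jitter(kappa, f_loop), "s")
print("rho at 10 us =", autocorrelation(f_loop, 10e-6))
```

Quadrupling the loop bandwidth halves σx, which is the trade-off between data rate and output jitter discussed in section 3.1.<br />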

[Figure 3: (a) frequency domain: log(SΦ(f)) versus log(f), with the frequencies fc and fL marked and the N1/f² asymptote; (b) time domain: log(σ∆T) versus log(∆T), with the κ√∆T asymptote, the times τL and tc marked, and a plateau level labelled 2σx in the extracted text.]<br />
Figure 3. Jitter and phase noise in PLLs<br />

[Figure 4: block diagram. Ring Oscillator I (with Bias and DACCtrl inputs) provides Clk and Ring Oscillator II (with a DACCtrl input) provides Data to a D flip-flop; the flip-flop output feeds the 24-bit up/down counter p (8 bits) and the 1-bit up/down counter z (1 bit), which together form 9 DAC control bits; a post-processing block outputs RandomBits.]<br />
Figure 4. The proposed RNG architecture<br />

3. RNG DESIGN<br />

3.1 System architecture<br />

The proposed RNG architecture is illustrated in Fig. 4.<br />
Two identical noisy ring oscillators are designed with<br />
white noise dominated jitter. Oscillator I is free-running<br />
and serves as the clock. The phase error of the two<br />
oscillators is sampled by a low-metastability D flip-flop,<br />
which also acts as a bang-bang phase detector. Two<br />
up/down counters form the loop filter. The 24-bit up/down<br />
counter p integrates the phase error of the two ring<br />
oscillators to set the average frequency of oscillator II, and<br />
introduces a pole into the loop transfer function. The 1-bit<br />
up/down counter z introduces the zero that stabilizes the<br />
loop, and provides instantaneous phase correction without<br />
affecting the average oscillating frequency. The<br />
two oscillators are therefore always synchronized through the<br />
feedback. The whole system is powered by a voltage<br />
regulator to reject noise from the power supply. It<br />
should be noted that the whole system is nonlinear and<br />
thus difficult to model analytically.<br />

If the loop bandwidth is wide enough to filter out the<br />
upconverted 1/f noise jitter, the jitter difference of the two<br />
oscillators is simply correlated white noise. After being sampled<br />
by the D flip-flop, it yields a correlated data stream with equal<br />
probability for '1's and '0's. From equation<br />
(7), dividing the output data rate down to around or<br />
below the PLL loop bandwidth significantly decreases the<br />
autocorrelation of the output data, so that the data<br />
stream can be considered random. Therefore, the<br />
maximum achievable data rate of the proposed RNG is<br />
limited by the loop bandwidth of the system. Since the<br />
PLL loop bandwidth can be as high as 10% of the clock<br />
frequency, ideally the maximum RNG data rate can be as<br />
high as 10% of the ring oscillator frequency.<br />
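As a rough guide to how far the data rate has to be divided down, equation (7) can be evaluated at the output bit spacing. The symbol R for the output data rate is introduced here for illustration, and the figures describe the correlation of the underlying jitter samples rather than a measured property of the output bits.<br />

```latex
\rho_{xx}\!\left(\Delta t = \tfrac{1}{R}\right) = \exp\!\left(-\frac{2\pi f_L}{R}\right),
\qquad
R = f_L:\ \rho_{xx} = e^{-2\pi} \approx 1.9\times 10^{-3},
\qquad
R = 10 f_L:\ \rho_{xx} = e^{-0.2\pi} \approx 0.53
```

Sampling at the loop bandwidth therefore leaves very little correlation between successive samples, whereas sampling ten times faster leaves a strong one.<br />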

However, a wider loop bandwidth results in lower PLL<br />
output jitter according to equation (8). The output jitter<br />
should be much larger than the LSB of the DAC; otherwise the<br />
phases of the two ring oscillators will not be synchronized<br />
but will oscillate. Thus the key factors of this design are the<br />
DAC resolution and the noise figure of merit κ.<br />

3.2 Time doma<strong>in</strong> analysis<br />

As illustrated in Fig. 5, in the presence of jitter and<br />
assuming there is no frequency drift, the transition times of<br />
the two oscillators can be expressed as<br />
$$t_1[n] = t_1[n-1] + T_1 + \varepsilon_1[n] \qquad (9)$$
<br />
$$t_2[n] = t_2[n-1] + T_2[n] + \varepsilon_2[n] \qquad (10)$$
<br />
where T1 is the constant average period of the free-running<br />
oscillator I, T2[n] is the average period of oscillator II<br />
in the n-th interval, and εi[n] is a zero-mean, discrete<br />
Gaussian random process that expresses the deviation of<br />
the period from the average. Since white noise dominates,<br />
the random process εi[n] is uncorrelated. From equation<br />
(2), the standard deviation of εi[n] is<br />
$$\sigma_{\varepsilon_i} = \kappa_i\sqrt{T_i}, \qquad i = 1, 2 \qquad (11)$$
<br />


[Figure 5: timing diagram of a clock with jitter, showing the transition times t[1], t[2], ..., t[n] and the corresponding periods T+ε[1], T+ε[2], ..., T+ε[n].]<br />
Figure 5. Definition of random processes for a clock with jitter<br />

Assuming there is no metastability in the D flip-flop, the<br />
phase error sequence rb[n], which is also the output<br />
sequence of this RNG system, is<br />
$$r_b[n] = \frac{\operatorname{sign}(t_d[n]) + 1}{2} \qquad (12)$$
<br />
where sign() is the signum function, and td[n] is<br />
$$t_d[n] = t_1[n] - t_2[n] = \left(t_1[n-1] - t_2[n-1]\right) + \left(T_1 - T_2[n]\right) + \left(\varepsilon_1[n] - \varepsilon_2[n]\right) \qquad (13)$$
<br />

Since ε1[n] and ε2[n] are independent zero-mean Gaussian<br />
random processes, their difference is also a zero-mean<br />
Gaussian random process. The first two difference terms<br />
in equation (13) express the correlation of td[n] with its<br />
previous values. Therefore the sequence td[n] is a correlated<br />
random process.<br />

The frequency of oscillator II is adjusted in every clock<br />
cycle and can be modeled as<br />
$$f_2[n+1] = f_2[1] - \left(k_p \cdot ctr_p[n] + k_z \cdot ctr_z[n]\right) \cdot K_{vco} \qquad (14)$$
<br />
where kp and kz are the gains of the counters p and z, ctrp[n] and ctrz[n] are their counting results, and Kvco is the<br />
oscillator control constant in Hz/bit.<br />
In the implementation of this design, the eight most<br />
significant bits of the 24-bit counter p are connected to the<br />
DAC control of oscillator II, while the output of the 1-bit<br />
counter z is connected to the DAC directly. So equation<br />
(14) becomes<br />
$$f_2[n+1] = f_2[1] - \left(\operatorname{fix}\!\left(ctr_p[n] / 2^{16}\right) + ctr_z[n]\right) \cdot K_{vco} \qquad (15)$$
<br />
where fix(A) is the Matlab fix function, which rounds the<br />
elements of A toward zero, resulting in an array of<br />
integers.<br />

The loop bandwidth of this digital PLL system is<br />
proportional to the oscillator control constant Kvco and to the<br />
counter gains kp and kz. Fig. 6 shows the loop acquisition<br />
process simulated in Matlab for different numbers of bits<br />
in counter p. A shorter locking time indicates a wider loop<br />
bandwidth.<br />
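The time domain model of equations (12), (13) and (15) can be iterated directly. The sketch below is a Python re-creation under stated assumptions, not the authors' Matlab code: counter p counts up when rb[n] = 1 and down otherwise, both counters are signed and start at zero, ctrz[n] equals rb[n], counter saturation is ignored, and td[n] = 0 is treated as a '0'. The default parameters are those listed with Fig. 6, with a 16-bit counter p whose eight most significant bits drive the DAC.<br />

```python
import random

def simulate_dpll(n_cycles, f1=30e6, f2_init=29.9e6, td_init=20e-9,
                  sigma=60e-12, kvco=40e3, p_bits=16, dac_bits=8, seed=1):
    """Iterate the phase error td[n] and output bits rb[n] of the digital PLL."""
    rng = random.Random(seed)
    shift = p_bits - dac_bits          # only the top dac_bits of counter p reach the DAC
    t1_period = 1.0 / f1               # oscillator I is free-running
    td, ctrp = td_init, 0
    bits, td_trace = [], []
    for _ in range(n_cycles):
        rb = 1 if td > 0 else 0        # equation (12); td == 0 treated as 0 (assumption)
        bits.append(rb)
        td_trace.append(td)
        ctrp += 1 if rb else -1        # assumed counting rule for counter p
        ctrz = rb                      # assumed 1-bit counter z output
        # Equation (15); int() truncates toward zero like Matlab fix().
        f2 = f2_init - (int(ctrp / 2 ** shift) + ctrz) * kvco
        eps1 = rng.gauss(0.0, sigma)   # period jitter of oscillator I
        eps2 = rng.gauss(0.0, sigma)   # period jitter of oscillator II
        td += (t1_period - 1.0 / f2) + (eps1 - eps2)   # equation (13)
    return bits, td_trace

bits, td_trace = simulate_dpll(8000)
print("largest negative excursion (ns):", min(td_trace) * 1e9)
print("max |td| over last 2000 cycles (ns):", max(abs(x) for x in td_trace[-2000:]) * 1e9)
```

Changing p_bits changes how quickly the integrating path moves the DAC code, which is the effect on locking time that Fig. 6 illustrates.<br />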

[Figure 6: td[n] = t1[n] − t2[n] in ns (from −150 to 50) versus time t in µs (from 0 to 200), for 14-bit, 16-bit and 18-bit counter p. Simulation parameters: f1 = 30 MHz, f2[1] = 29.9 MHz, td[1] = 20 ns, σεi = 60 ps, KVCO = 40 kHz/bit.]<br />
Figure 6. Loop acquisition simulated by Matlab<br />

3.3 DAC-controlled r<strong>in</strong>g oscillator<br />

The DAC-controlled ring oscillator is realized by adding a<br />
capacitor array to the load of a 3-stage single-ended ring<br />
oscillator, as illustrated in Fig. 7. To provide more jitter,<br />
a current-starved inverter with six extra 50 kΩ resistors at<br />
the voltage control nodes is used as the delay stage instead<br />
of a simple CMOS inverter. The two ring oscillators are<br />
fully symmetric and placed next to each other in the layout.<br />
Thus the deterministic jitter due to the power supply and<br />
substrate is common-mode and will be rejected.<br />

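[Figure 7: schematic of the delay stage with supply Vdd, control voltages Vp and Vn, output out with load Cload, and a capacitor array switched by 9 control bits (Z0, F0, F1, ..., F7) with weights 1x, 1x, 2x, ..., 128x.]<br />
Figure 7. The DAC-controlled ring oscillator<br />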

For the current-starved inverter, the noise figure of merit κ<br />
can be maximized by minimizing the power dissipated in<br />
inverter switching [5]. Minimum sizes are used for the<br />
switching NMOS transistors to achieve a higher κ at the fastest<br />
available speed. PMOS transistors are sized relative to the<br />
NMOS transistors to provide symmetric rise and fall<br />
times. The control transistors are twice the size of the<br />
switching transistors, and are biased with very small<br />
excess gate-source voltages to limit the switching current.<br />

3.4 Low metastability D flip-flop<br />

Since the edges of the oscillators are always aligned to each other, a low-metastability falling-edge-triggered D flip-flop is designed in this work, as shown in Fig. 8. When CLK is high, both the pre-amplifier [6] and the D latch [6] are reset; the transmission gate is on and the data is sampled. When CLK switches to low, the transmission gate turns off to hold the data. The pre-amplifier amplifies the difference between the held data and Vdd/2, and the D latch regenerates it to a valid logic level. To reduce the metastability error, two D flip-flops are cascaded.<br />

Figure 8. The low metastability D flip-flop (blocks: Pre-Amp, D Latch; signals: Data, CLK, Vdd/2, Reset)<br />

3.5 Digital post-processing<br />

In order to lower the autocorrelation, the raw output data stream of the digital PLL system is first divided down by 10 and then fed into a Von Neumann corrector to eliminate the bias. Finally, another frequency divider is used to further reduce the autocorrelation.<br />
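The post-processing chain described above can be sketched in software as follows. This is only an illustration, not the FPGA logic used in this work: the function names are hypothetical, "divided down by 10" is assumed to mean keeping every tenth bit, the pair-to-bit mapping of the corrector (01 gives 0, 10 gives 1) is one of the two usual conventions, and the ratio of the final divider is not stated in this section, so `final_div` is a placeholder.<br />

```python
from typing import Iterable, List


def decimate(bits: Iterable[int], factor: int) -> List[int]:
    """Keep every `factor`-th bit (assumed reading of 'divided down by')."""
    return list(bits)[::factor]


def von_neumann(bits: Iterable[int]) -> List[int]:
    """Von Neumann corrector on non-overlapping pairs.

    (0, 1) -> 0 and (1, 0) -> 1, while (0, 0) and (1, 1) are discarded.
    """
    seq = list(bits)
    return [a for a, b in zip(seq[0::2], seq[1::2]) if a != b]


def post_process(raw: Iterable[int], first_div: int = 10,
                 final_div: int = 2) -> List[int]:
    """Divider -> Von Neumann corrector -> divider.

    final_div = 2 is a placeholder value, not taken from the paper.
    """
    return decimate(von_neumann(decimate(raw, first_div)), final_div)


if __name__ == "__main__":
    print(von_neumann([0, 1, 1, 0, 1, 1, 0, 0, 1, 0]))
```

Because the corrector discards equal pairs, its output rate is data dependent and at most one quarter of its input rate on average, which is one reason the final data rate is far below the oscillator frequency.<br />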



4. EXPERIMENTAL RESULTS<br />

The customized part of this design, including the oscillators, the D flip-flops, and the voltage regulator, is implemented in a 5 V, 1.5 µm, 2-poly 2-metal CMOS process and occupies an area of 1 mm². The chip micrograph is shown in Fig. 9. Excluding the output buffers, the total power dissipation is 1.92 mW. The counters and the digital post-processing circuits are implemented in an off-chip FPGA for design flexibility.<br />

Figure 9. The die photo (labeled blocks: DAC1, DAC2, Ring1, Ring2, 2 DFFs, Resistors)<br />

0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

The two oscillators run at 30 MHz in this RNG system. Fig. 10 shows the measured rms jitter versus measurement time delay for the open-loop ring oscillators. The extracted white noise figure of merit κ is 1.48E-7.<br />

Figure 10. Jitter performance of the free-running ring oscillators at 30 MHz (vertical axis: RMS jitter in sec, 10^-11 to 10^-9; horizontal axis: measurement time delay in sec, 10^-7 to 10^-6; annotations: κ = 1.48E-7, ζ = 2.52E-4)<br />

By analyzing the autocorrelation of the raw output data before post-processing, the equivalent loop bandwidth of this PLL system is extracted as about 500 kHz. Thus the rms jitter of this system is about 60 ps according to equation (8). The DAC has a measured LSB of 44 ps with a total adjusting range of ±1.6 ns. The resolution is not as fine as expected due to variation in the fabrication process. It was therefore necessary to apply more division than expected to lower the autocorrelation sufficiently, and a data rate of only 100 kbps is achieved. This rate is expected to improve in a future iteration of the design.<br />
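The 60 ps figure can be checked with a short calculation. Equation (8) lies outside this section, so the sketch below assumes it has the closed-loop form given in [3], σ_x = κ·sqrt(1/(4π·f_L)), with κ in units of sqrt(s); that form and the function name are assumptions, while the numeric inputs are the measured values quoted above.<br />

```python
import math

KAPPA = 1.48e-7   # extracted figure of merit (assumed units: sqrt(s))
F_LOOP = 500e3    # extracted equivalent loop bandwidth, Hz
DAC_LSB = 44e-12  # measured DAC step, s


def closed_loop_rms_jitter(kappa: float, f_loop: float) -> float:
    """Assumed form of equation (8): sigma_x = kappa * sqrt(1 / (4*pi*f_L))."""
    return kappa * math.sqrt(1.0 / (4.0 * math.pi * f_loop))


sigma = closed_loop_rms_jitter(KAPPA, F_LOOP)
print(f"rms jitter = {sigma * 1e12:.1f} ps")
print(f"jitter / DAC LSB = {sigma / DAC_LSB:.2f}")
```

Under this assumption the result is close to the quoted 60 ps, and it is only slightly larger than the measured 44 ps DAC step, which is consistent with the remark that the coarse resolution forced additional division.<br />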

The quality of the randomness has been verified by the NIST SP800-22 test suite [7] over 2 Mbit long sequences. This test suite consists of 16 statistical tests, and the passing criterion for each test is that the p-value is larger than 0.01. Table 1 shows the complete test results for 3 data sequences.<br />
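As an illustration of how a p-value is obtained and compared with the 0.01 threshold, the sketch below implements the simplest test of the suite, the frequency (monobit) test, following its definition in [7]. It is not the reference implementation used for Table 1, and the function names are hypothetical.<br />

```python
import math
from typing import Sequence

ALPHA = 0.01  # pass threshold used in this paper


def monobit_p_value(bits: Sequence[int]) -> float:
    """Frequency (monobit) test: p = erfc(|S_n| / sqrt(2 n)).

    S_n is the sum of the bits mapped to +1 / -1.
    """
    n = len(bits)
    s_n = sum(1 if b else -1 for b in bits)
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2.0))


def passes(p_value: float, alpha: float = ALPHA) -> bool:
    """A sequence passes a test when its p-value exceeds alpha."""
    return p_value > alpha


if __name__ == "__main__":
    example = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]  # 6 ones, 4 zeros
    p = monobit_p_value(example)
    print(f"p-value = {p:.6f}, pass = {passes(p)}")
```

A strongly biased stream, such as one consisting only of ones, gives a p-value far below 0.01 and fails, which is the kind of defect the Von Neumann corrector is meant to remove.<br />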


Table 1. NIST SP800-22 statistical test results<br />

Test | P-value, Data Set I | P-value, Data Set II | P-value, Data Set III<br />
Frequency | 0.403123 | 0.321518 | 0.346485<br />
Block Frequency | 0.151636 | 0.755148 | 0.055258<br />
Cusum-Forward | 0.323588 | 0.556027 | 0.677662<br />
Cusum-Reverse | 0.338798 | 0.243561 | 0.506432<br />
Runs | 0.17792 | 0.194128 | 0.903376<br />
Longest Run | 0.891002 | 0.945998 | 0.124246<br />
Rank | 0.239545 | 0.24964 | 0.141371<br />
FFT | 0.270026 | 0.6652 | 0.718271<br />
Universal | 0.685955 | 0.673109 | 0.841297<br />
Approx. Entropy | 0.399766 | 0.196523 | 0.981364<br />
Serial1 | 0.866552 | 0.183901 | 0.619666<br />
Serial2 | 0.793059 | 0.411532 | 0.319159<br />
Lempel Ziv | 1.000000 | 1.000000 | 1.000000<br />
Linear Complexity | 0.448407 | 0.187954 | 0.980834<br />
Periodic Templates | 0.362154 | 0.65189 | 0.299952<br />
Aperiodic Templates | all passed | all passed | all passed<br />
Random Excursions | all passed | all passed | all passed<br />
Random Ex. Variant | all passed | all passed | all passed<br />

5. CONCLUSION<br />

A new type of true RNG based on a digital PLL has been proposed. The random bits are generated by jitter sampling of two identical synchronized ring oscillators. Compared with the traditional oscillator sampling approach, this method can achieve a higher data rate for the same clock speed. The structure has been realized in a 1.5 µm process and has successfully passed the NIST SP800-22 statistical test suite.<br />

6. REFERENCES<br />

[1] B. Jun and P. Kocher, The Intel random number generator, Technical Report, Intel, 1999.<br />

[2] M. Bucci et al., A high speed oscillator-based truly random number source for cryptographic applications on a smart card IC, IEEE Trans. on Computers, vol. 52, No. 4, pp. 403-409, 2003.<br />

[3] J. McNeill, Jitter in ring oscillators, IEEE J. Solid-State Circuits, vol. 32, No. 6, pp. 870-879, 1997.<br />

[4] C. Liu and J. McNeill, Jitter in oscillators with 1/f noise sources, Proc. IEEE ISCAS'04, vol. 1, pp. 773-776, 2004.<br />

[5] C. Liu and J. McNeill, Jitter in deep submicron CMOS single-ended ring oscillators, Proc. 5th International Conference on ASIC, pp. 715-718, 2003.<br />

[6] C. Portmann and T. Meng, Power-efficient metastability error reduction in CMOS flash A/D converters, IEEE J. Solid-State Circuits, vol. 31, No. 8, pp. 1132-1140, 1996.<br />

[7] A. Rukhin et al., A statistical test suite for random and pseudorandom number generators for cryptographic applications, NIST Special Publication 800-22, 2001.



A 11-15 GHz CMOS ÷2 FREQUENCY DIVIDER<br />

FOR BROAD-BAND I/Q GENERATION<br />

Annamaria Tedesco, Andrea Bonfanti, Luigi Panseri, Andrea Lacaita<br />

Dipartimento di Elettronica ed Informazione - Politecnico di Milano,<br />

P.za Leonardo da Vinci 32, I-20133 Milano, Italy<br />

E-mail: panseri@elet.polimi.it<br />

ABSTRACT<br />

This paper presents a 0.13 µm CMOS frequency divider for I/Q generation. To achieve a wide locking range, a novel topology based on a two-stage injection-locking ring oscillator is adopted. This architecture can reach a larger input frequency range and better phase accuracy than injection-locking LC oscillators because of the smoother slope of its phase-frequency plot. Post-layout simulations show that the circuit is able to divide an input signal spanning from 6 to 24 GHz, although the available tuning range of the signal source (an integrated VCO) limited the experimental verification to the interval 11-15 GHz, with a consequent measured locking range of 31%. A single divider draws 3 mA from a 1.2 V power supply.<br />

1. INTRODUCTION<br />

High-speed frequency dividers are among the most critical building blocks in modern CMOS wireless transceivers. They are employed both in the feedback path of Phase Locked Loops (PLL) and to generate the I/Q quadrature signals necessary in zero- or low-IF receivers. The design of a fully integrated CMOS divider is critical, above all for its power consumption, which is comparable to that of the Voltage Controlled Oscillator (VCO). This problem becomes even more severe when a wide input frequency range is required, since in this case it is not possible to adopt resonant loads, which would allow power reduction. Moreover, the dissipation of a broad-band buffer, usually interposed between the VCO and the divider itself, must also be taken into account in the total power budget. We discuss a topology for a ÷2 frequency divider for I/Q generation in a WLAN transceiver for the IEEE 802.11a/b/g standards. In the frequency synthesizer only one VCO will be employed, and a first divider provides the I/Q signals for the 802.11a channels without using an input buffer. Then, a following divider makes the signals for the 2.5 GHz band available. The key advantage of this architecture is that VCO pulling is avoided. Given the channelizations of the standards, the input frequency of the first divider must vary between 9.6 and 11.6 GHz, leading to an 18% input tuning, or locking, range. However, a more conservative figure for the locking range is 35%, to account for the process spreads in the VCO. In the next section we recall why such a high tuning range makes it impossible to adopt some recently presented low-power solutions, such as the LC injection-locking topology. Then we discuss the circuit topology and show why this circuit can achieve a wide tuning range. Finally, we present the circuit implementation and some experimental results, followed by the conclusions.<br />

2. HIGH-SPEED DIVIDERS<br />

High-speed dividers are typically realized as two latches in a negative feedback loop [1]. For multi-GHz input signals, these circuits have a differential current-steering structure that allows fast current switching, usually called source-coupled logic (SCL). To reduce the dissipation, dynamic logic was employed in [2], but it does not provide I/Q outputs. Probably the best solution in terms of power consumption is to differentially drive two injection-locking LC oscillators, as in [3]. With respect to SCL dividers the dissipation is reduced by a factor of about Q, the quality factor of the divider's resonant tank, at the cost of a large area occupation. Unfortunately, this topology is intrinsically narrow-band: to increase the divider locking range the quality factor Q of the LC tanks must be lowered, with a resulting rise in dissipation. Our locking range requirements demand a low Q, close to two, practically voiding the effect of the LC resonance. In this sense, power is traded not only with speed, but also with input locking range. A further example of this trend is the 40 GHz divider presented in [4], which demonstrates good power performance but features only a 6% locking range.<br />

2.1 Proposed injection-lock ring divider<br />

Our approach was to injection-lock an oscillator with a wide tuning range, namely a two-stage ring oscillator. Self-oscillating dividers also have the useful property of requiring less power from the driving signal, avoiding the use of a power-consuming buffer. Fig. 1 shows the circuit scheme and the detail of one stage. Simulations show that the transfer function of the single stage can be approximated as:<br />

$$T(j\omega) = T_0 \, \frac{1 - j\omega/\omega_z}{1 + j\omega/\omega_p} \qquad (1)$$<br />

The small-signal dc gain is $T_0 = g_{mi} R/(1 - g_{mc} R)$, where $g_{mi}$ and $g_{mc}$ are the transconductances of transistors $M_1$-$M_2$ and $M_3$-$M_4$, respectively. The dominant pole frequency is $\omega_p = (1 - g_{mc} R)/(C_O R)$, where $C_O$ is the output load capacitance. Owing to the gate-drain overlap capacitance of the input pair $M_1$-$M_2$, the transfer function also exhibits a right half-plane zero at a higher frequency, $\omega_z = g_{mi}/C_{gd}$. To satisfy the Barkhausen criterion for loop phase, a lag of $-\pi/2$ radians is required for each stage, which provides the quadrature outputs. The cross-connection between the two stages in Fig. 1 gives the remaining phase shift of $\pi$. The $-\pi/2$ delay is reached only thanks to the right half-plane zero, and it is straightforward to see from (1) that this shift is achieved at a frequency corresponding to the geometric mean of the pole and the zero. The ring oscillation frequency is thus $\omega_0 = (\omega_p \omega_z)^{1/2}$. Using (1) and the Barkhausen criterion for loop magnitude, we obtain the condition for the dc gain: $T_0 (\omega_p/\omega_z)^{1/2} = 1$. Then, when the amplitude rises, the "effective" transconductance of the cross-coupled pair $M_3$-$M_4$ decreases with respect to the initial small-signal value $g_{mc}/2$, until the oscillation becomes stable. It is important to note, from the expression of the dominant pole frequency, that $\omega_p$ and, accordingly, $\omega_0$ also vary: both increase when the transconductance of the pair $M_3$-$M_4$ decreases. In an oscillator this effect would lead to very poor performance in terms of frequency stability and phase noise. In our application we do not face this problem, since the circuit is driven by an input signal.<br />
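The two Barkhausen conditions derived from equation (1) can be checked numerically. In the sketch below the pole and zero frequencies are hypothetical values chosen only for illustration (they are not design values from this work); the dc gain is then set from the magnitude condition, and the stage response is evaluated at the geometric mean of pole and zero.<br />

```python
import cmath
import math


def stage_tf(w: float, t0: float, wp: float, wz: float) -> complex:
    """Single-stage response of equation (1): T0 (1 - jw/wz) / (1 + jw/wp)."""
    return t0 * (1 - 1j * w / wz) / (1 + 1j * w / wp)


# Hypothetical pole and zero frequencies, for illustration only.
wp = 2 * math.pi * 2e9
wz = 2 * math.pi * 32e9

w0 = math.sqrt(wp * wz)   # free-running frequency, geometric mean
t0 = math.sqrt(wz / wp)   # dc gain from T0 * sqrt(wp/wz) = 1

t_at_w0 = stage_tf(w0, t0, wp, wz)
phase = cmath.phase(t_at_w0)
magnitude = abs(t_at_w0)

print(f"f0 = {w0 / (2 * math.pi) / 1e9:.2f} GHz")
print(f"phase = {math.degrees(phase):.2f} deg, magnitude = {magnitude:.3f}")
```

With these values each stage contributes a lag of 90 degrees at unity gain, so the two stages plus the cross-connection close the loop with a total phase of 360 degrees.<br />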

A similar analysis can be found in [5], though the circuit in that case is a standard SCL divider and only simulation results are shown. In our case the cross-coupled pair acts as a gain-boosting stage at start-up and then as an amplitude limiter to stabilize the oscillation, not as a latch for data holding. An injection-locking ring is also discussed in [6]. In that paper, however, the circuit only generates the I/Q signals without dividing the VCO output, so the VCO could suffer from pulling.<br />

2.2 Lock Range and Phase Accuracy<br />

The two differential outputs of the VCO, running at $2\omega_{in}$, drive the divider through the tail transistors of the two stages, $M_5$ in Fig. 1 and the analogous one of the other stage. In this way the injected current $i_{in}$, at $2\omega_{in}$, is superimposed on the dc tail bias current $I_{DC}$, and the ring is forced to oscillate at $\omega_{in}$, which is in general different from $\omega_0$. The mechanism of injection locking has been rigorously discussed by many authors (see for instance [3], [6], [7]) and is intuitively recalled with the help of Fig. 2.<br />

Figure 1. Ring frequency divider driven by a differential VCO. The scheme of one stage is also shown. (labels: I, Q, 0°, 180°, input at 2ω_in, output at ω_in; stage elements: R, M1-M5, g_mi, g_mc, V_bias, V_DD)<br />

A steady condition is reached when the ring oscillates at $\omega_{in}$. In this case the input pair $M_1$-$M_2$ in Fig. 1 behaves as a mixer for the signal coming from the tail and thus converts $I_{DC}$ and $i_{in}$ to $\omega_{in}$. The dominant pole of each stage filters out the higher-frequency products. Since $\omega_{in}$ differs from $\omega_0$, the lag of each stage is not exactly $-\pi/2$ but $-\pi/2 - \varphi$. The mixer stage must balance this phase shift $\varphi$ by adding the opposite shift to its output. This is sketched in Fig. 2, where all the represented phasors rotate at $\omega_{in}$. $V_1$ represents the output signal resulting from the dc current, while $v_2$ is the additional signal caused by the injected current. The loop adjusts the phasors, by settling the angle $\alpha$, in order to obtain an additional shift $\varphi$ with respect to the state in which no current is injected. It is easy to see that when a differential signal, $i_{in}$ and $-i_{in}$, is forced into the two stages, their outputs are still in quadrature. It is also possible to evaluate the maximum frequency deviation, $\Delta\omega$, from $\omega_0$ for which the divider can operate properly [7]. If we call SL the magnitude of the slope, evaluated at $\omega_0$, of the phase vs. frequency characteristic of $T(j\omega)$, then $SL \cong \varphi/\Delta\omega$. If we define the injection strength $m = v_2/V_1 = i_{in}/I_{DC}$, Fig. 2 shows that, to a good approximation, $\varphi_{max} \cong m$, and consequently $\Delta\omega_{max} \cong m/SL$. The lock range therefore increases with $m$ and is inversely proportional to SL. The magnitude of the slope can be evaluated directly from equation (1). Recalling that the free-running frequency is $\omega_0 = (\omega_p \omega_z)^{1/2}$, the result is:<br />

$$SL = \left| \frac{d\varphi}{d\omega} \right|_{\omega_0} = \frac{2}{\omega_p + \omega_z} \qquad (2)$$<br />

Figure 2. The injection locking mechanism in the two-stage ring oscillator. (phasor diagrams for the 1st and 2nd stage with V1, v2, α and ϕ at ω_in; tail currents I_DC at dc and i_in, -i_in at 2ω_in)<br />

For a given $\omega_0$, this value is much smaller than the corresponding one in an LC tank oscillator, namely $2Q/\omega_0$. In fact, first, the arithmetic mean $(\omega_p + \omega_z)/2$ is always larger than the geometric mean, $\omega_0$. Second, the factor 2Q is usually greater than one. Thus, the lock range obtained is correspondingly larger. Moreover, a low value of SL also benefits the quadrature accuracy. The quadrature error is due to the mismatch between the values of $\omega_0$ in the two stages. For a difference $d\omega_0$, a corresponding variation $d\varphi$ must occur. This can be achieved only by changing the angle $\alpha$ between the vectors $V_1$ and $v_2$. This shift, $d\alpha$, is the quadrature error. From the phasor diagram in Fig. 2 we obtain:<br />

$$d\alpha = d\varphi \, \frac{1 + m^2 + 2m\cos\alpha}{m\,(m + \cos\alpha)} \qquad (3)$$<br />

Since the magnitude of $d\varphi$ is approximately $|d\varphi| \cong SL\,|d\omega_0|$, a smoother slope SL is beneficial in this case as well.<br />

Figure 3. Die photo (annotated dimensions: 210 µm, 54 µm, 300 µm, 36 µm; labeled blocks: VCO, High speed divider)<br />
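The slope and quadrature-error relations of Section 2.2, equations (2) and (3), can be put into numbers with a short sketch. The pole and zero frequencies, the tank quality factor and the injection strength below are hypothetical illustration values, not data from this work, and the relation Δω_max ≅ m/SL is the first-order approximation used in the text, so it is only meaningful for small m.<br />

```python
import math


def slope_ring(wp: float, wz: float) -> float:
    """Equation (2): SL = 2 / (wp + wz) for the ring stage."""
    return 2.0 / (wp + wz)


def slope_lc(q: float, w0: float) -> float:
    """Corresponding slope of an LC tank: 2Q / w0."""
    return 2.0 * q / w0


def max_lock_deviation(m: float, sl: float) -> float:
    """First-order estimate from the text: delta_w_max ~ m / SL."""
    return m / sl


def quadrature_error(dphi: float, m: float, alpha: float) -> float:
    """Equation (3): d_alpha as a function of d_phi, m and alpha."""
    return dphi * (1 + m * m + 2 * m * math.cos(alpha)) / (m * (m + math.cos(alpha)))


# Hypothetical values, for illustration only.
wp = 2 * math.pi * 2e9
wz = 2 * math.pi * 32e9
w0 = math.sqrt(wp * wz)
q_tank = 5.0
m = 0.2

sl_ring = slope_ring(wp, wz)
sl_lc = slope_lc(q_tank, w0)
print(f"SL(LC) / SL(ring) = {sl_lc / sl_ring:.2f}")
print(f"ring lock deviation = {max_lock_deviation(m, sl_ring) / (2 * math.pi) / 1e9:.2f} GHz")
print(f"d_alpha / d_phi at alpha = 0: {quadrature_error(1.0, m, 0.0):.2f}")
```

For the same injection strength, the ratio of the two slopes is also the ratio of the achievable lock deviations, which is the argument made above for preferring the ring over an LC tank when a wide locking range is needed.<br />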

3. CIRCUIT REALIZATION<br />

A cascade of two dividers has been integrated in an STM 0.13 µm CMOS technology. A differential VCO as signal source and a quadrature mixer are integrated on the same die. The two dividers have the same topology, except that the bias current of the second divider is almost half that of the first, because of its halved operating frequency. Fig. 3 shows a photo of the chip. The inductors of the differential LC-tank VCO fill most of the area. The only signals available are those at the output of the VCO and of the second divider. Moreover, to let the first circuit oscillate in free-running mode, it is possible to switch off the power supply of the VCO.<br />

Post-layout simulations show that the first circuit can operate with an input frequency varying between 6 and 24 GHz. Fig. 4 shows the simulated sensitivity curve, i.e. the minimum amplitude of the input signal of the first divider versus its operating frequency. At 19 GHz the required amplitude of the voltage signal at the input of the divider is zero. This means that the divider operates in free-running mode.<br />


Figure 4. Simulated sensitivity curve between the minimum amplitude of the input signal of the first divider and its operating frequency. (vertical axis: Input Voltage [V], 0.0 to 1.2; horizontal axis: Input Frequency [GHz], 6 to 24)<br />

4. EXPERIMENTAL RESULTS<br />

As previously discussed, one characteristic of these dividers is the capability to operate both driven by an input signal and in free-running mode. For comparison, two measured output spectra are superimposed in Fig. 5. The spectrum on the left side was measured with the divider chain driven by the VCO. Its center frequency, shifted in the figure for comparison, is 3.66 GHz, corresponding to a VCO frequency four times higher, that is 14.64 GHz. The spectrum on the right side is that of the first ring operating in free-running mode, divided by two by the second ring. The measured locking range is close to 31% because it was possible to test the divider chain only at the operating frequencies of the VCO, between 11 and 15 GHz. The measured free-running frequency is 8.3 GHz, slightly lower than the value obtained by simulations. In fact, the divider current needed for free-running oscillation was slightly higher than the one used in the simulations. This may be due to an underestimation of the parasitic capacitances, which leads to a lower open-loop gain.<br />
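The quoted locking range and output frequency follow from simple arithmetic, sketched below. The helper name is hypothetical, and the locking range is taken relative to the center frequency of the interval, which is the definition stated in the conclusions.<br />

```python
def relative_range(f_lo: float, f_hi: float) -> float:
    """Width of a frequency interval divided by its center frequency."""
    return (f_hi - f_lo) / ((f_hi + f_lo) / 2.0)


# Measured VCO tuning interval used to drive the divider chain.
locking_range = relative_range(11e9, 15e9)
print(f"locking range = {100 * locking_range:.1f} %")

# Output of the two cascaded divide-by-two stages for a 14.64 GHz input.
f_out = 14.64e9 / 4
print(f"output frequency = {f_out / 1e9:.2f} GHz")
```

The first value rounds to the 31% reported above, and the second reproduces the 3.66 GHz center frequency of the locked spectrum.<br />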

Fig. 6 first shows the output phase noise when the VCO is switched off. It is the phase noise of the first ring, reduced by 6 dB by the frequency division. As expected, this noise is very high, but it does not impair the overall noise performance. When the two dividers are injection-locked to the VCO, it is the phase noise of the VCO that appears at the output. In fact, Fig. 6 also compares the measured phase noise of the VCO with that at the output of the two locked dividers. The frequency division by four reduces the noise by 12 dB and confirms that the ring phase noise does not affect the output phase noise. Table 1 summarizes the performance of the two dividers.<br />
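The 6 dB and 12 dB figures are the ideal phase noise reduction of a frequency divider, 20·log10(N) for a division ratio N. A minimal check, with a hypothetical function name:<br />

```python
import math


def division_noise_reduction_db(ratio: float) -> float:
    """Ideal phase noise reduction of a divide-by-`ratio` stage, in dB."""
    return 20.0 * math.log10(ratio)


for ratio in (2, 4):
    print(f"divide by {ratio}: {division_noise_reduction_db(ratio):.2f} dB")
```

A measured difference equal to this ideal value, as in Fig. 6, indicates that the dividers add no noticeable noise of their own when locked.<br />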

5. CONCLUSIONS<br />

A novel frequency divider topology has been presented. The key feature of the circuit is the reduced slope of the phase-frequency plot of a single stage. This allows a measured locking range of about 31% of the center frequency to be achieved with a reasonable power budget. This value is limited by the operating frequency range of the VCO (11-15 GHz) used as injection signal. Simulations demonstrate an even larger locking range, 6-24 GHz, which could not be verified experimentally given the limited tuning range of the signal source.<br />

Figure 5. Output spectrum (left) of the cascade of the two dividers driven by the VCO running at 14.64 GHz (the spectrum is shifted for the sake of comparison), compared with the spectrum (right) of the first ring in free running, divided by two.<br />

Figure 6. Comparison between the phase noise of the VCO and the phase noise at the output of the two dividers. The difference is 12 dB, indicating that the dividers do not impair the noise performance. (vertical axis: Phase Noise [dBc/Hz], -150 to -10; horizontal axis: Frequency Offset [Hz], 10k to 10M; curves: free running ring /2, VCO, locked ring)<br />

Table 1. Measured Dividers Performance<br />

Power supply | 1.2 V<br />
Bias current, first divider | 3 mA<br />
Bias current, second divider | 1.5 mA<br />
Free running frequency, first divider | 8.3 GHz<br />
Free running frequency, second divider | 5.5 GHz<br />
Input frequency range | 11-15 GHz<br />

6. REFERENCES<br />

[1] B. Razavi, RF Microelectronics, Prentice Hall, Upper Saddle River, NJ, USA, 1998, pp. 290-295.<br />

[2] S. Pellerano, S. Levantino, C. Samori, and A. Lacaita, “A 13.5-mW frequency synthesizer with dynamic-logic frequency divider,” IEEE Journal of Solid-State Circuits, vol. 39, pp. 378-383, Feb. 2004.<br />

[3] A. Mazzanti, P. Uggetti, and F. Svelto, “Analysis and design of injection-locked LC dividers for quadrature generation,” IEEE Journal of Solid-State Circuits, vol. 39, pp. 1425-1433, Sept. 2004.<br />

[4] J. Lee and B. Razavi, “A 40-GHz frequency divider in 0.18 µm CMOS technology,” IEEE Journal of Solid-State Circuits, vol. 39, pp. 594-601, April 2004.<br />

[5] R. Dehgahani and S. M. Atarodi, “A low power wideband 2.6 GHz CMOS injection-locked ring oscillator prescaler,” in Proc. RFIC Symposium, pp. 659-662, Philadelphia, PA, 8-10 June 2003.<br />

[6] P. Kinget, R. Melville, D. Long and V. Gopinathan, “An injection-locking scheme for precision quadrature generation,” IEEE Journal of Solid-State Circuits, vol. 37, pp. 845-851, July 2002.<br />

[7] R. Adler, “A study of locking phenomena in oscillators,” Proc. IEEE, vol. 61, pp. 1380-1385, Oct. 1973.


A 4.2 GHz CMOS Quadrature VCO us<strong>in</strong>g Injection<br />

Lock<strong>in</strong>g for WLAN 802.11a<br />

0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

Michael Nielsen, Torben Larsen<br />

RISC Division, Aalborg University, Denmark, email: {mn,tl}@kom.aau.dk<br />

Abstract— A 4.2 GHz 0.24-µm CMOS VCO for a WLAN<br />

802.11a image reject transceiver is demonstrated. The VCO<br />

provides quadrature output through <strong>in</strong>jection lock<strong>in</strong>g of two<br />

LC-oscillators. Because the injection locking affects<br />
the oscillation frequency and Q-factor, a quasi linear model<br />
is used to analyze these effects. Furthermore, as this quadrature VCO<br />
is prone to imperfections, an analysis of the static phase error<br />
is presented. The phase error is measured to be 2.6°–3.1°, which<br />
can be traced back to coupling from the inductors to the<br />
DC-supply connection. The phase noise is at most −109 dBc/Hz<br />
at 1 MHz offset over the entire tuning range of 3.95–4.55 GHz,<br />
while the power consumption is 21 mW.<br />

I. INTRODUCTION<br />

For modern low cost wireless systems such as WLAN<br />
802.11a, a high level of integration and low cost are<br />
significant criteria for the transceiver architecture. The most<br />
obvious candidates are the direct conversion transceiver<br />
(DCT) and the image reject transceiver (IRT), both requiring<br />
local oscillator signals in quadrature.<br />

In an IRT the frequency planning can be done such that<br />
the second local oscillator (LO) signal is derived from the first<br />
in a divide-by-four circuit [1]. For WLAN 802.11a this<br />
means that the first local oscillator should cover a frequency<br />
range of 4.12–4.28 GHz if UNII bands 1 and 2 are<br />
supported [1]. While the quadrature signals for the second LO<br />
can be generated in the divide-by-four circuit, the first LO<br />
must still be produced in quadrature. Traditionally this quadrature<br />
generation has been done with RC-polyphase filters or divide-by-two<br />
circuits [2], [3]. However, each method has its own<br />
advantages and disadvantages.<br />

An attractive approach to produce LO signals in quadrature<br />
is to injection lock two oscillators by placing coupling<br />
transistors in series with the cross-coupled negative-R transistors<br />
(see Fig. 1) as proposed in [2], where the output<br />
signals in the ideal case are in perfect quadrature. Other<br />
configurations exist [4]; however, [4] found that<br />
the phase error of this topology is the least sensitive to mismatch.<br />

[Figure 1: circuit schematic. Device and node labels: Mcpl, Msw, Mbias, Vcc, VBias, I+, I-, Q+, Q-.]<br />

Fig. 1. Topology with injection locked oscillator [2].<br />

In [2] the oscillator has been implemented with an<br />

oscillation frequency just below 2 GHz, whereas <strong>in</strong> this<br />

work the topology is adapted for a 4.2 GHz oscillator.<br />

Because the dimensions relative to the frequency are larger<br />

<strong>in</strong> this case, the perfect quadrature is more prone to layout<br />

imperfections <strong>and</strong> consequently a phase error analysis is<br />

performed to reveal the critical layout aspects.<br />

The paper is organized as follows. First, a quasi linear<br />
model is proposed for the analysis of the oscillator. Second,<br />
the impact of the injection locking on the oscillation<br />
frequency and oscillator Q-factor is investigated. Then, a<br />
phase error analysis based on the quasi linear model shows<br />
the critical aspects of the layout. Finally, the results for the<br />
fabricated oscillator are presented and a conclusion is given.<br />

II. QUASI LINEAR MODEL<br />

Fig. 2 depicts a quasi linear model of the oscillator.<br />
In this model it is assumed that the current running in<br />
the positive branch of the I oscillator depends solely<br />
on the voltages of nodes In and Qp, and similarly for the<br />
three other branches. Although the current in the series<br />
coupled transistors also depends on the voltage of the upper<br />
drain, this is incorporated in the model through vIp = −vIn. The<br />
nonlinearities of the oscillator are modeled in the transconductances<br />
Gs,x and Gc,x, which adjust themselves to<br />
ensure that Barkhausen's criterion for oscillation is fulfilled.<br />
Gs,x represents the negative-R transconductance, and Gc,x<br />
represents the coupling between the two oscillators. Several<br />
factors affect Gc,x and Gs,x, such as the transistor size, the bias point, and<br />
the fraction of the period the transistors spend in the linear and saturation regions.<br />
Regrettably, Gc,x and Gs,x cannot be expressed directly in terms of these<br />
parameters and have to be obtained from simulation.<br />

[Figure 2: the in-phase oscillator (tank LI, CI, RI; nodes vIp, vIn) and the quadrature oscillator (tank LQ, CQ, RQ; nodes vQp, vQn), with HI = vI/vQ and HQ = vQ/vI. Controlled current sources: Gs,I·vIn + Gc,I·vQp for the Ip branch, Gs,I·vIp + Gc,I·vQn for the In branch, Gs,Q·vQn + Gc,Q·vIn for the Qp branch, Gs,Q·vQp + Gc,Q·vIp for the Qn branch.]<br />

Fig. 2. Linear model of the oscillator. R represents the loss in the<br />
LC-tanks due to the finite Q-factor. A similar model is presented<br />
in [2].<br />

A. Quadrature Oscillation<br />

The transfer functions from vQ to vI and vice versa can<br />
be found using simple circuit calculations to be<br />

$$H_I(\omega) = \frac{v_I(\omega)}{v_Q(\omega)} = \frac{-j\omega L\,\frac{G_c}{2}}{1 - \omega^2 LC + j\omega L\left(\frac{1}{R} - \frac{G_s}{2}\right)} \tag{1}$$

and<br />

$$H_Q(\omega) = \frac{v_Q(\omega)}{v_I(\omega)} = \frac{j\omega L\,\frac{G_c}{2}}{1 - \omega^2 LC + j\omega L\left(\frac{1}{R} - \frac{G_s}{2}\right)} \tag{2}$$

where it is assumed that the matching between the components<br />
is perfect (hence the subscripts I and Q are dropped).<br />

Barkhausen's criterion for oscillation states that<br />

$$H_I(\omega)\cdot H_Q(\omega) = \frac{\omega^2 L^2\,\frac{G_c^2}{4}}{\left[1 - \omega^2 LC + j\omega L\left(\frac{1}{R} - \frac{G_s}{2}\right)\right]^2} = 1 \tag{3}$$

The quadrature nature of the oscillator can be seen<br />
by analyzing Eqs. (1)–(3). Because the numerator of<br />
HI (ω) · HQ (ω) has no imaginary part, the denominator<br />
must also be purely real. Consequently, Gs = 2/R. This<br />
implies that the denominator of HI (ω) (or HQ (ω)) is<br />
real while its numerator is purely imaginary, which in turn<br />
implies that the phase between vI (ω)<br />
and vQ (ω) must be ±90° depending on the sign of the<br />
denominator. Hence the quadrature is shown.<br />
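The ±90° relation can be checked numerically. The following is a minimal sketch that evaluates Eq. (1) with Gs = 2/R at one frequency below and one above the tank resonance. Only L = 2.4 nH is taken from the paper (Fig. 3); the resonance frequency, R, Gc and the function name `h_i` are assumptions chosen for illustration.

```python
import cmath
import math

def h_i(omega, L, C, R, Gs, Gc):
    """Eq. (1): transfer function from v_Q to v_I under perfect matching."""
    num = -1j * omega * L * Gc / 2
    den = 1 - omega**2 * L * C + 1j * omega * L * (1 / R - Gs / 2)
    return num / den

# Illustrative values (assumptions, except L which is quoted in Fig. 3).
L = 2.4e-9                              # H
f0 = 4.2e9                              # Hz, assumed tank resonance
C = 1 / ((2 * math.pi * f0) ** 2 * L)   # F, chosen so the tank resonates at f0
R = 500.0                               # ohm, assumed parallel loss resistance
Gc = 2e-3                               # S, assumed coupling transconductance
Gs = 2 / R                              # S, condition that makes the denominator real

# Phase of v_I relative to v_Q below and above the tank resonance.
phase_below = math.degrees(cmath.phase(h_i(2 * math.pi * 4.0e9, L, C, R, Gs, Gc)))
phase_above = math.degrees(cmath.phase(h_i(2 * math.pi * 4.4e9, L, C, R, Gs, Gc)))
print(phase_below, phase_above)
```

Below the tank resonance the real denominator is positive and the phase is −90°; above it the denominator is negative and the phase is +90°, which is the sign dependence stated in the text.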

B. Oscillation Frequency<br />

Solving Eq. (3) with respect to the frequency gives<br />

$$\omega_{osc} = \pm\left(\Delta\omega \pm \sqrt{\Delta\omega^2 + \omega_0^2}\right), \tag{4}$$

where ω0 is the resonance frequency of the LC-tank and<br />

$$\Delta\omega = \frac{G_c}{4C}, \tag{5}$$

which may be interpreted as an offset frequency between<br />
the oscillation frequency of the oscillator and the resonance<br />
frequency of the LC-tank. As ω0 ≫ ∆ω, Eq. (4) can be<br />
reduced to<br />

$$\omega_{osc} = \omega_0 \pm \Delta\omega, \tag{6}$$

where the ± outside the parentheses in Eq. (4) represents<br />
the negative frequency and is consequently left out here.<br />
Although Eq. (6) has two mathematical solutions, only<br />
one of them is physically valid. Both measurements and<br />
simulations show that the oscillation frequency is lower than<br />
the resonance frequency. Since the parasitic capacitance<br />
is difficult to evaluate exactly, this cannot be measured<br />
directly. However, by assuming that Gc increases with<br />
increased bias current (a tendency that has been verified in<br />
simulations), it can be seen that the offset is negative because<br />
the oscillation frequency drops with increased bias current.<br />
Note that [2] finds that the offset is positive. Several factors<br />
can explain this apparent contradiction: (i) The oscillation<br />
frequency of this oscillator is more than twice that of [2]. (ii)<br />
The L/C-ratio is in this case 4 times that of [2]. (iii) The<br />
sizes of the transistors Mcpl and Msw are in this case nearly<br />
the same (refer to Fig. 3), whereas in [2] Mcpl is 5 times<br />
the size of Msw. (iv) [2] uses a 0.35 µm process, whereas<br />
this work uses a 0.24 µm process.<br />
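Eqs. (4)–(6) can be illustrated with a short numerical sketch. It computes the two positive-frequency solutions of Eq. (4), compares them with the approximation of Eq. (6), and verifies that both satisfy the loop-gain condition of Eq. (3) with Gs = 2/R. The tank resonance of 4.2 GHz and Gc = 2 mS are assumed values, and the function names are hypothetical; only L = 2.4 nH comes from Fig. 3.

```python
import math

def oscillation_frequencies(L, C, Gc):
    """Positive-frequency solutions of Eq. (4) and the approximation of Eq. (6)."""
    w0 = 1 / math.sqrt(L * C)           # resonance frequency of the LC-tank
    dw = Gc / (4 * C)                   # offset frequency, Eq. (5)
    root = math.sqrt(dw**2 + w0**2)
    exact = (root - dw, root + dw)      # lower and upper solution of Eq. (4)
    approx = (w0 - dw, w0 + dw)         # Eq. (6)
    return w0, dw, exact, approx

def loop_gain(w, L, C, Gc):
    """Eq. (3) evaluated with Gs = 2/R, so that the denominator is real."""
    return (w * L * Gc / 2) ** 2 / (1 - w**2 * L * C) ** 2

# Illustrative values (assumptions, except L which is quoted in Fig. 3).
L = 2.4e-9                                  # H
C = 1 / ((2 * math.pi * 4.2e9) ** 2 * L)    # F, tank resonance assumed at 4.2 GHz
Gc = 2e-3                                   # S, assumed coupling transconductance

w0, dw, exact, approx = oscillation_frequencies(L, C, Gc)
for w_exact, w_approx in zip(exact, approx):
    print(f"exact {w_exact / (2e9 * math.pi):.4f} GHz, "
          f"approx {w_approx / (2e9 * math.pi):.4f} GHz, "
          f"loop gain {loop_gain(w_exact, L, C, Gc):.6f}")
```

Both solutions give unity loop gain in this linear model, so the model alone does not select between them; the text relies on measurements and simulations to identify the lower one as the physical solution.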

C. Q-factor of Oscillator<br />

The Q-factor of an oscillator can be found as [5]<br />

$$Q_{osc} = \left.\frac{\omega}{2}\sqrt{\left(\frac{d\phi}{d\omega}\right)^2 + \left(\frac{dA}{d\omega}\right)^2}\,\right|_{\omega=\omega_{osc}}, \tag{7}$$

where dφ/dω and dA/dω are the derivatives of the phase and<br />
amplitude of the open loop transfer function of the oscillator.<br />

In a traditional negative-gm oscillator the Q-factor of<br />
the oscillator is the same as the Q-factor of the LC-tank.<br />
A high Q-factor reduces the phase noise and power loss<br />
(and thereby power consumption). For the injection locked<br />
oscillator the Q-factor of the oscillator is not the same<br />
as that of the LC-tank. Combining Eqs. (3), (6), and (7) allows<br />
the Q-factor to be calculated. However, the expression becomes<br />
complicated and offers little insight. Assisted by a computer<br />
algebra system, a few conclusions can be drawn (note that<br />
the phase derivative of Eq. (3) is zero because Gs cancels<br />
out the imaginary part):<br />
• If it is assumed that the parallel resistor in Fig. 2 scales<br />
linearly with the inductor size, Gs must scale inversely<br />
with L. If it is furthermore assumed that Gc scales<br />
linearly with Gs, it can be shown that the Q-factor of<br />
the oscillator is fixed for reasonable choices of the L/C-ratio.<br />
It should be noted that the first assumption<br />
is a logical consequence of the loss being<br />
dominated by the inductor and the inductor having<br />
a fixed Q-factor. Simulations have verified that the<br />
second assumption is very close to reality.<br />
• The Q-factor decreases with an increase of Gc. Put<br />
in other words: the lower the coupling between the two<br />
oscillators, the better.<br />
As for the traditional negative-gm oscillator, a small inductor<br />
increases the power loss in the LC-tank, and the oscillator needs<br />
to compensate for the increased loss [6]. Consequently,<br />
the L/C-ratio should be maximized to minimize the power<br />
consumption.<br />
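The dependence of the oscillator Q-factor on Gc can be explored numerically. The sketch below evaluates Eq. (7) on the open loop transfer function of Eq. (3) using central differences, at the lower solution of Eq. (4). It assumes that Gs stays fixed at 2/R while the frequency is perturbed, which is a simplification of the real circuit where Gs and Gc adjust themselves. All component values other than L = 2.4 nH, and all function names, are illustrative assumptions.

```python
import cmath
import math

def open_loop(w, L, C, R, Gs, Gc):
    """Eq. (3): H_I(w) * H_Q(w) under perfect matching."""
    den = 1 - w**2 * L * C + 1j * w * L * (1 / R - Gs / 2)
    return (w * L * Gc / 2) ** 2 / (den * den)

def lower_solution(L, C, Gc):
    """Lower positive-frequency solution of Eq. (4)."""
    dw = Gc / (4 * C)
    return math.sqrt(dw**2 + 1 / (L * C)) - dw

def q_osc(w, L, C, R, Gs, Gc, rel_step=1e-6):
    """Eq. (7), with both derivatives taken by central differences."""
    h = w * rel_step
    up = open_loop(w + h, L, C, R, Gs, Gc)
    down = open_loop(w - h, L, C, R, Gs, Gc)
    d_amp = (abs(up) - abs(down)) / (2 * h)
    d_phase = (cmath.phase(up) - cmath.phase(down)) / (2 * h)
    return w / 2 * math.sqrt(d_phase**2 + d_amp**2)

# Illustrative values (assumptions, except L which is quoted in Fig. 3).
L = 2.4e-9
C = 1 / ((2 * math.pi * 4.2e9) ** 2 * L)
R = 500.0
Gs = 2 / R   # assumed to stay pinned at 2/R

q_values = {}
for Gc in (1e-3, 2e-3, 4e-3):   # assumed coupling transconductances
    w = lower_solution(L, C, Gc)
    q_values[Gc] = q_osc(w, L, C, R, Gs, Gc)
    print(f"Gc = {Gc * 1e3:.0f} mS -> Q_osc = {q_values[Gc]:.1f}")
```

Under these assumptions the numerical result follows Q ≈ 1 + ω/∆ω at the lower solution, so the Q-factor falls as Gc (and therefore ∆ω) increases, in line with the second conclusion above.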

D. Phase Error Analysis<br />

There are three sources of static phase error: (i)<br />
component or bias mismatch, (ii) parasitic capacitive<br />
coupling, and (iii) mutual inductance between the inductors.<br />
In general the matching performance of CMOS is sufficient<br />
to ensure a low phase error if the critical components<br />
are placed close to each other. Because the mismatch of<br />
the active devices (transistors and varactors) is larger, these<br />
should be given higher priority in this matter.<br />
The bias mismatch is primarily due to the resistive loss in the<br />
supply connection to the center-tapped inductors. A resistive<br />
loss causes a minor voltage drop across the supply<br />
connection. If this loss is not matched, one of the oscillators<br />
will have a lower supply voltage and consequently a phase<br />
error will be introduced. Simulations indicate that a 4 Ω<br />
mismatch of the supply connection causes a 1° phase error.<br />
However, careful layout can prevent such a mismatch.<br />

Somewhat intuitively, the parasitic capacitive coupling is<br />
not a problem if the coupling is the same for all the nodes<br />
(Ip, In, Qp and Qn). A good match of the parasitic coupling<br />
can be achieved by a highly symmetric layout and some hand<br />
calculations of the interconnect areas.<br />
The last contributor to phase error is the mutual inductance<br />
between the inductors. If the mutual inductance<br />
between the two inductors is characterized by the coupling<br />
constant k, the factors of the open loop transfer function H (ω) =<br />
HI (ω) · HQ (ω) can once again be found by straightforward<br />
circuit calculations to be<br />

$$H_I(\omega) = \frac{\frac{G_{c,I}}{2}\left(-j\omega L_I \pm jk\right)}{1 - \omega^2 L_I C_I + j\omega L_I\left(\frac{1}{R_I} - \frac{G_{s,I}}{2}\right)} \tag{8}$$

and<br />

$$H_Q(\omega) = \frac{\frac{G_{c,Q}}{2}\left(j\omega L_Q \mp jk\right)}{1 - \omega^2 L_Q C_Q + j\omega L_Q\left(\frac{1}{R_Q} - \frac{G_{s,Q}}{2}\right)} \tag{9}$$

where the sign of k depends on the physical layout;<br />
note that the sign of k in Eqs. (8) and (9) is opposite. It<br />
is worth noting that because the numerators are no longer<br />
purely imaginary, the denominators need to have an imaginary<br />
part for Barkhausen's criterion for oscillation to be fulfilled.<br />
As a consequence, Gs,I and Gs,Q must change, and since it<br />
is reasonable to assume that Gs and Gc are correlated (both<br />
originate from the same transistors), Gc,I and Gc,Q also<br />
change. It is impossible to predict from the quasi linear model<br />
how these will change, so simulations are required<br />
here. Simulations show that the transconductances change<br />
constructively; that is, the phase error is larger than what<br />
can be calculated from the numerator of Eq. (8) or Eq. (9).<br />
There are two methods to ensure a low mutual coupling. Either<br />
the inductors must be placed with sufficient distance<br />
in between, or four inductors can be used as proposed in<br />
[7]. ADS Momentum simulations indicate that placing the<br />
inductors as shown in Fig. 3 reduces the phase error to a<br />
value below 0.1°.<br />

In conclusion, a low static phase error comes down to proper<br />
layout with the following guidelines:<br />
• Transistors and varactors should be placed close to<br />
each other to avoid mismatch.<br />
• The resistance of the bias supply connections should be<br />
matched.<br />
• A highly symmetric layout is preferable to match the<br />
parasitic capacitance. The interconnections that cannot<br />
be routed symmetrically should be given extra attention to<br />
ensure matching capacitance.<br />
• Finally, the inductors should be placed with sufficient<br />
distance between them to avoid mutual inductance.<br />

III. RESULTS<br />

The oscillator (depicted in Fig. 3) is fabricated in a 0.24-µm<br />
CMOS process with 2 µm top layer metal.<br />

[Figure 3: chip layout showing the oscillator core, the varactors, the buffers, the inductors LI = 2.4 nH and LQ = 2.4 nH, the DC-supply connection, and the pads GND, Vcc, Vbias, Vvar, Ip, In, Qp and Qn. Transistor sizes: Mcpl = 0.24 × 90 µm, Msw = 0.24 × 70 µm, Mbias = 1 × 70 µm.]<br />

Fig. 3. Layout of oscillator. The transistors are chosen<br />
small to reduce the parasitic capacitance. NMOS inversion mode<br />
varactors are used to tune the frequency.<br />

To increase the Q-factor of the inductors, the strip width<br />
is made narrower in the innermost turn and wider in the outermost<br />
turn. The arguments for doing so are as follows. The<br />
inductance is proportional to the average diameter, whereas<br />
most of the loss is caused by the series resistance in the<br />
windings. By decreasing the width of the inner turn and<br />
increasing the width of the outer turn, the average diameter is increased<br />
with the series loss more or less unchanged. Therefore,<br />
the Q-factor is increased. The parasitic capacitance of the<br />
inner turn is reduced in exchange for an increased outer turn<br />
capacitance. However, since the outer turn capacitance is<br />
in common mode this is not a problem. Momentum simulations<br />
indicate an increase of the Q-factor of approximately 10 %<br />
from using variable strip width compared to fixed strip width.<br />
Measurements show that the Q-factor of the inductor<br />
is approximately 12 at 4 GHz.<br />

The static phase error is measured on-chip us<strong>in</strong>g a<br />

network analyzer with one of the outputs of the oscillator<br />

(Ip) connected to the RF-<strong>in</strong>put of the analyzer. Thereby<br />

the analyzer locks to Ip <strong>and</strong> the relationship can be found<br />

between Ip <strong>and</strong> Qp by connect<strong>in</strong>g Qp to port 2 of the<br />

network analyzer <strong>and</strong> read<strong>in</strong>g out S21. The measurement<br />

setup is illustrated <strong>in</strong> Figure 4. By swapp<strong>in</strong>g the <strong>in</strong>puts <strong>and</strong><br />

re-measur<strong>in</strong>g, any cable mismatch is calibrated out <strong>and</strong> the<br />

phase <strong>and</strong> amplitude difference between the two signals can<br />

be found as<br />

$$\theta = \mathrm{angle}\left(\sqrt{\frac{S_{21,b}}{S_{21,a}}}\right) \tag{10}$$

and<br />

$$G = \mathrm{abs}\left(\sqrt{\frac{S_{21,b}}{S_{21,a}}}\right) \tag{11}$$

where θ typically is of most interest. In these calculations<br />
it is assumed that the phase between the p- and n-branch of<br />
each oscillator is 180° (hence perfect symmetry around the<br />
inductor). The phase error measured on 4 chips is 2.6°–<br />
3.1°, with a maximum spread of 0.3° per chip. Histograms<br />
of the measurements are shown in Fig. 5. The rather large<br />
phase error can be traced back to the coupling from the<br />
inductors to the DC-supply connection to inductor LQ.<br />
This connection was not included in the initial Momentum<br />
simulation and was therefore not discovered before the layout.<br />
Subsequent simulations have shown that this coupling has a large<br />
effect on both the phase error between the oscillators and<br />
the signal symmetry around the center-tapped inductors.<br />
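The swap-and-remeasure procedure can be sketched in a few lines. Following the signal definitions of Fig. 4 (S21a = αQ+/I+ and S21b = αI+/Q+), the common cable and probe factor α cancels in the ratio S21b/S21a, so Eqs. (10) and (11) recover the phase and amplitude relation between I+ and Q+. The signal values, the value of α and the function name below are synthetic assumptions used only to demonstrate the cancellation.

```python
import cmath
import math

def phase_and_gain(s21_a, s21_b):
    """Eqs. (10) and (11): phase difference [deg] and amplitude ratio."""
    ratio = cmath.sqrt(s21_b / s21_a)   # principal square root
    return math.degrees(cmath.phase(ratio)), abs(ratio)

# Synthetic example (all values are assumptions for illustration).
i_p = 1.0 + 0j                                       # I+ taken as the reference
q_p = 0.98 * cmath.exp(-1j * math.radians(87.0))     # Q+ lags I+ by 87 degrees
alpha = 0.7 * cmath.exp(1j * math.radians(33.0))     # unknown cable/probe factor

s21_a = alpha * q_p / i_p    # measurement A: a1 = I+, b2 = alpha * Q+
s21_b = alpha * i_p / q_p    # measurement B: a1 = Q+, b2 = alpha * I+

theta, gain = phase_and_gain(s21_a, s21_b)
print(f"theta = {theta:.3f} deg, G = {gain:.4f}")
```

Because the principal square root is used, the recovered phase is limited to the range from −90° to +90°; this is adequate for phase differences close to quadrature, such as those reported in the text.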


[Figure 4: two measurement configurations using the 8510C network analyzer, with Cable 1 to the RF input, Cable 2 to Port 2, and Probes 1 and 2 on the chip pads (GND, Vcc, Vbias, Vvar, I+, I-, Q+, Q-). Measurement A: a1 = I+, b2 = αQ+, S21a = b2/a1 = αQ+/I+. Measurement B: a1 = Q+, b2 = αI+, S21b = b2/a1 = αI+/Q+.]<br />

Fig. 4. Measurement setup. The usual RF-source of the HP 8510C<br />
test set is disconnected from the RF-input of the network analyzer<br />
and one of the outputs of the oscillator (DUT) is connected instead.<br />
In measurement B the chip and its bias supplies are rotated 180°.<br />

The coupl<strong>in</strong>g can be reduced by wir<strong>in</strong>g the DC-supply<br />

connection on both sides of the <strong>in</strong>ductors or by us<strong>in</strong>g an<br />

approach to layout with centered DC-supply connections.<br />

In Fig. 6 the measured phase noise <strong>and</strong> oscillation frequency<br />

are shown along with the simulated results. The measured<br />

frequency is slightly higher than simulated, which can be<br />

expla<strong>in</strong>ed by the process variation of CMOS. The higher<br />

phase noise <strong>in</strong> the transition from low to high frequency is<br />

caused by the 1/f noise AM-PM conversion due to the steep<br />

tun<strong>in</strong>g curve of the varactors (for more <strong>in</strong>formation refer<br />

to [8]). Because this mechanism to a high degree depends<br />

on the amplitude of the signal, a discrepancy between the<br />

measured <strong>and</strong> simulated results can be observed.<br />

IV. CONCLUSION<br />

A 4.2 GHz oscillator has been implemented in a 0.24-µm<br />
CMOS process with a phase noise better than −109 dBc/Hz<br />
at 1 MHz offset over the entire tuning range of 3.95–4.55<br />
GHz. An analysis of the oscillator with respect to oscillation<br />
frequency, oscillator Q-factor, and static phase error is given<br />
based on a quasi linear model. The static phase error is<br />
measured to be between 2.6° and 3.1°. This large error can be<br />
traced back to coupling from the inductors to the DC-supply<br />
connection and can be reduced by wiring the connection on<br />
both sides of the inductors.<br />

[Figure 5: four histogram panels (Chip 1, Chip 2, Chip 3, Chip 4) under the title "Histogram of θ"; horizontal axis: Phase Difference, θ [deg].]<br />

Fig. 5. Histograms of the phase error measurement.<br />

[Figure 6: Phase Noise @ 1MHz [dBc/Hz] (RBW/VBW 30 kHz) and Oscillation Frequency [GHz] versus Varactor Control Voltage [V]. Annotations: POsc = 2 dBm (differential output), PDC = 21 mW.]<br />

Fig. 6. Measured and simulated phase noise and oscillation<br />
frequency. Solid = measured, Dashed = simulations. The oscillator<br />
draws 8.3 mA from a 2.5 V voltage supply.<br />

REFERENCES<br />

[1] M. Zargari, D. K. Su, P. Yue, S. Rabii, D. Weber, B. J. Kaczynski, S. S. Mehta, K. Singh, S. Mendis, and B. A. Wooley, “A 5-GHz CMOS Transceiver for IEEE 802.11a Wireless LAN Systems,” IEEE Journal of Solid-State Circuits, pp. 1688–1694, 2002.<br />

[2] P. Andreani, A. Bonfanti, L. Romano, and C. Samori, “Analysis and Design of a 1.8-GHz CMOS LC Quadrature VCO,” IEEE Journal of Solid-State Circuits, pp. 1737–1747, December 2002.<br />

[3] B. Razavi, RF Microelectronics. Prentice Hall, 1998.<br />

[4] X. Wang and P. Andreani, “A 2GHz Low-Phase-Noise CMOS Quadrature VCO,” in Proc. IEEE Norchip Conference, 2002, pp. 303–308.<br />

[5] B. Razavi, “A Study of Phase Noise in CMOS Oscillators,” IEEE Journal of Solid-State Circuits, vol. 31, pp. 331–343, 1996.<br />

[6] M. Tiebout, “Low-Power Low-Phase-Noise Differentially Tuned Quadrature VCO Design in Standard CMOS,” IEEE Journal of Solid-State Circuits, pp. 1018–1024, July 2001.<br />

[7] P. Andreani and X. Wang, “On the Phase-Noise and Phase-Error Performances of Multiphase LC CMOS VCOs,” IEEE Journal of Solid-State Circuits, vol. 39, pp. 1883–1893, 2004.<br />

[8] S. Levantino, C. Samori, A. Bonfanti, S. L. J. Gierkink, A. L. Lacaita, and V. Boccuzzi, “Frequency Dependence on Bias Current in 5 GHz CMOS VCOs: Impact on Tuning Range and Flicker Noise Upconversion,” IEEE Journal of Solid-State Circuits, vol. 37, pp. 1003–1011, 2002.<br />




A VERY LOW POWER CONSUMING 5 GHZ<br />

CMOS VCO CORE WITH 800 MHZ FREQUENCY<br />

TUNING RANGE<br />

George von Büren, Frank Ell<strong>in</strong>ger, He<strong>in</strong>z Jäckel<br />

Swiss Federal Institute of Technology (ETH) Zurich, <strong>Electronics</strong> Laboratory,<br />

Gloriastrasse 35, CH-8092 Zurich, Switzerl<strong>and</strong><br />

E-mail: george.vonbueren@ife.ee.ethz.ch<br />

ABSTRACT<br />

A low power consuming VCO core at C-band frequencies<br />
is presented. With a supply voltage of 2.5 V and a VCO<br />
core current consumption of only 1.5 mA, a phase noise<br />
between −98 and −104 dBc/Hz is measured. Varying the<br />
tuning voltage from 1 V to 2.5 V yields a tuning range of<br />
800 MHz with a center frequency of 4.6 GHz. The fully<br />
integrated circuit is fabricated in a commercial 0.25 µm<br />
BiCMOS process.<br />

1. INTRODUCTION<br />

High-speed phase-locked loops <strong>and</strong> clock/data recovery<br />

circuits require high performance VCOs. The VCO should<br />

be able to drive several sub-blocks like prescaler, phase<br />

detector <strong>and</strong> demultiplexer. Furthermore, the jitter<br />

generation <strong>in</strong> the VCO should be m<strong>in</strong>imized. As the VCO<br />

phase noise is high-pass filtered <strong>in</strong> PLLs, the phase noise<br />

spectral power components with a frequency offset from<br />

the carrier higher than the PLL loop b<strong>and</strong>width should be<br />

as low as possible.<br />

2. DESIGN<br />

2.1 Topology<br />

The phase noise [1] of an LC VCO expressed in dBc/Hz at<br />
a frequency offset ∆f can be approximated by<br />

$$L\{\Delta f\} = 10\log\left[\frac{2kT}{P_{Sig}}\,F\left(\frac{f_0}{2Q\,\Delta f}\right)^2\right]$$

where F is the noise factor of the active circuit part, k is<br />
the Boltzmann constant, T the absolute temperature, PSig<br />
the oscillator output power, f0 the oscillation frequency,<br />
and Q the loaded oscillator tank quality factor. The major<br />
parameters to optimize are the quality factor of the tank<br />
and the oscillator amplitude. The goal of this design was<br />
to reduce the power consumption in the VCO core and to<br />
achieve a wide tuning range while still having an<br />
acceptable phase noise performance. For the realization, an<br />
NMOS-PMOS cross-coupled LC VCO topology (Fig. 1)<br />
was chosen because it reuses current, resulting in a higher<br />
oscillation amplitude than an NMOS-only structure.<br />
Doubling the oscillation amplitude lowers the<br />
phase noise by at most 6 dB. For a given bias<br />
current, the negative conductance of an NMOS-PMOS<br />
structure is −gmn/2 − gmp/2, whereas the negative<br />
conductance of an NMOS-only structure is only −gmn/2, with<br />
gmn and gmp being the transconductances of an NMOS and a<br />
PMOS transistor. A higher transconductance for a given<br />
bias current results in faster switching of the cross-coupled<br />
pair. Sizing the PMOS and NMOS transistors such that<br />
gmn ≈ gmp can improve the rise and fall time symmetry of<br />
the signals Vx and Vy, which have a frequency of fosc, and of Vp,<br />
which alternates at 2·fosc. A better signal symmetry<br />
minimizes the up-conversion of the low frequency 1/f noise of<br />
the tail current into sideband frequency components<br />
around multiples of 2·fosc and results in a lower 1/f³ noise<br />
corner frequency [2]. A disadvantage of the additional<br />
PMOS transistors is the increase of the total parasitic<br />
capacitance of the tank, resulting in a somewhat lower<br />
tuning range.<br />
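The phase noise approximation and the 6 dB statement can be checked with a short numerical sketch. The function below evaluates the expression given above; the noise factor, output power, tank Q and offset frequency are assumed values chosen for illustration (only the 4.6 GHz center frequency is taken from the abstract), and the function name is hypothetical. Doubling the voltage amplitude into a fixed load quadruples PSig, which is how the amplitude enters the formula.

```python
import math

K_BOLTZMANN = 1.380649e-23   # J/K

def phase_noise_dbc_hz(noise_factor, p_sig, f0, q_loaded, delta_f, temperature=290.0):
    """Phase noise approximation in dBc/Hz at offset delta_f."""
    ratio = (2 * K_BOLTZMANN * temperature / p_sig) * noise_factor \
            * (f0 / (2 * q_loaded * delta_f)) ** 2
    return 10 * math.log10(ratio)

# Illustrative operating point (assumptions, except the 4.6 GHz center frequency).
base = phase_noise_dbc_hz(noise_factor=4.0, p_sig=1e-3, f0=4.6e9,
                          q_loaded=10.0, delta_f=1e6)

# Doubling the oscillation amplitude quadruples the signal power.
doubled = phase_noise_dbc_hz(noise_factor=4.0, p_sig=4e-3, f0=4.6e9,
                             q_loaded=10.0, delta_f=1e6)

print(f"improvement from doubling the amplitude: {base - doubled:.2f} dB")
```

The improvement equals 10·log(4), about 6 dB, provided F and Q are unchanged; in practice a larger amplitude can raise F, which is why 6 dB is an upper bound.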

Figure 1. Schematic of the NMOS-PMOS VCO.<br />

To drive all sub-blocks of a PLL or CDR with the system<br />
clock, a VCO buffer is used. The clock buffer tree<br />
incorporates two differential amplifier stages. In the<br />
presented design a third stage is added to provide a<br />
tapered output buffer for an adequate voltage swing at the<br />
50 Ω load. The first stage isolates the VCO core from the<br />
main clock driver and should have a minimal input<br />
capacitance. The transistor width should therefore be as<br />
small as possible but large enough to drive the<br />
next stage with equal rise and fall times.<br />


2.2 Tank amplitude<br />

Fig. 2 shows the model for a parallel LC oscillator in steady state, where $-g_{active}$ represents the negative<br />
conductance of the active devices that compensates the losses in the tank, represented by $g_{tank}$.<br />

Figure 2. Steady-state parallel oscillator model.<br />

At the resonance frequency, the admittances of L and C cancel each other. Harmonics of the input current<br />
$I_{osc}$ are strongly attenuated, leaving the fundamental of $I_{osc}$ to generate a differential voltage swing of amplitude<br />
$(4/\pi) \cdot I_{osc} \cdot g_{tank}^{-1}$ across the tank, assuming a rectangular current waveform. At high frequencies, the current<br />
waveform may be approximated more closely by a sinusoid. In this case the tank amplitude is better<br />
approximated as $V_{tank} \approx I_{osc} \cdot g_{tank}^{-1}$. This mode of operation is referred to as the current-limited regime [2].<br />
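As a rough sketch of the two approximations above, the snippet below evaluates both expressions for the core current of 1.5 mA quoted later in the text. The tank conductance of 1 mS is a hypothetical value chosen only for illustration; it is not a figure from the paper.<br />

```python
import math

def tank_amplitude(i_osc, g_tank, waveform="sine"):
    """Differential tank amplitude in the current-limited regime.

    waveform="rect": fundamental of a rectangular current, (4/pi)*I/g.
    waveform="sine": sinusoidal current approximation, I/g.
    """
    if waveform == "rect":
        return (4.0 / math.pi) * i_osc / g_tank
    if waveform == "sine":
        return i_osc / g_tank
    raise ValueError("waveform must be 'rect' or 'sine'")

I_OSC = 1.5e-3   # A, core current stated in the paper
G_TANK = 1.0e-3  # S, hypothetical tank conductance (assumption)

v_rect = tank_amplitude(I_OSC, G_TANK, "rect")
v_sine = tank_amplitude(I_OSC, G_TANK, "sine")
print(f"rectangular current: {v_rect:.3f} V, sinusoidal current: {v_sine:.3f} V")
```

The rectangular-current estimate is larger by the factor 4/π, so the sinusoidal form is the more conservative of the two.<br />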

If $I_{osc}$ is steadily increased to maximize the tank voltage, the tank voltage saturates at a certain current<br />
at $V_{limit}$, which is close to the supply voltage to first order. This situation is called the voltage-limited regime.<br />
The tail current source M5 must not be driven into the triode region. Therefore, the goal is to work in the current-limited<br />
regime, very close to the voltage-limited regime, so that the tank amplitude is large and the phase noise is low. The<br />
tank voltage amplitude is in the range<br />
$$I_{osc} \cdot g_{tank,max}^{-1} \le V_{tank} \le I_{osc} \cdot g_{tank,min}^{-1} < V_{dd} - V_{dsat5}$$<br />
where $I_{osc} \cdot g_{tank,max}^{-1}$ signifies the worst-case scenario, when the varactor capacitance and the tank losses are<br />
maximal. The lower the losses in the resonator, the smaller the oscillator core current $I_{osc}$ that can be chosen while still<br />
fulfilling the oscillation start-up condition. The dimensions of the tail current transistor M5 are L = 0.5 µm and W = 76.8 µm. $V_{dsat5}$ is<br />
0.4 V with a drain-source current of $I_{osc}$ = 1.5 mA. The simulated differential tank amplitude varies between 1.4 V<br />
($g_{tank,min}$: 0.7 V


Small-signal simulation taking the wiring parasitics into account predicted an open-loop gain between 2 and 3,<br />
depending on the tuning voltage.<br />

2.4 Resonator and tuning range<br />

An oscillator can be represented as a tuneable RLC tank with a variable capacitance as the tuning element. To obtain a<br />
sharp, high-amplitude tank resonance and a steep phase roll-off, the L/C ratio should be as high as<br />
possible and the losses in the resonator should be minimized [3]. The octagonal on-chip spiral inductor,<br />
with an inductance of 1.12 nH, is a deep-trench inductor with an outer dimension of 160 µm, a width of 10 µm, a<br />
spacing of 5 µm and 2.5 turns. The deep-trench inductor uses high-resistance deep-trench mazes under the<br />
inductor to reduce the parasitic capacitance. The simulated quality factor is Q = 14 at 5 GHz, the simulated<br />
peak quality factor occurs at 5.68 GHz and the simulated self-resonance frequency is around 30 GHz.<br />

A base-collector pn-junction diode operated in reverse bias, with a tuneability of 2:1 and a<br />
quality factor ranging from 60 to 120 at 5 GHz, is chosen as the varactor.<br />
The highest Q is achieved with the cathode connected to ac ground, because of the subcollector-to-substrate<br />
capacitance, as can be seen in Fig. 5. The design of the anode width requires a trade-off between a higher<br />
capacitance ratio and a lower diode series resistance. A square anode maximizes the capacitance ratio. A long<br />
and narrow diode provides lower series resistance, which is more important for low tank losses. The<br />
varactor used has 26 cathode stripes with dimensions L = 10.28 µm and W = 2 µm to reduce the losses in the tank.<br />

Figure 5. Base-collector pn-junction diode.<br />

The tuning range of the circuit is given by:<br />
$$\Delta\omega = \frac{1}{\sqrt{L_{tank} \cdot (C_{v,min} + C_{fix})}} - \frac{1}{\sqrt{L_{tank} \cdot (C_{v,max} + C_{fix})}}$$<br />

where $L_{tank}$ = 1.12 nH, $C_{fix} = C_{PMOS} + C_{NMOS} + C_{buf} + C_{ind} + C_{wiring}$ and the simulated $C_v$ is illustrated in Fig. 6. The lower the<br />
fixed capacitance $C_{fix}$, the higher the tuning range. The total fixed capacitance $C_{fix}$ is around 350 fF over the<br />
whole tuning range. The small-signal capacitances are:<br />
$C_{PMOS}$ = 66 fF; $C_{NMOS}$ = 21 fF; $C_{buf}$ = 10 fF; $C_{ind}$ ≈ 225 fF; $C_{wiring}$ ≈ 120 fF. The quality factor of the varactor depicted<br />
in Fig. 6 decreases with lower tuning voltage $V_c$, and with it the loaded quality factor of the tank, which<br />
degrades the phase noise performance.<br />
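The influence of the fixed capacitance on the tuning range can be sketched numerically. The inductance and the 350 fF fixed capacitance are the values quoted above; the varactor limits of 400 fF and 800 fF are assumptions chosen to respect the 2:1 tuneability and are not read from Fig. 6. The 200 fF comparison value is likewise hypothetical.<br />

```python
import math

L_TANK = 1.12e-9  # H, from the text

def f_osc(c_v, c_fix, l_tank=L_TANK):
    """Resonance frequency in Hz of the tank for a given varactor capacitance."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_tank * (c_v + c_fix)))

def tuning_range(c_v_min, c_v_max, c_fix):
    """Frequency span in Hz between the two varactor limits."""
    return f_osc(c_v_min, c_fix) - f_osc(c_v_max, c_fix)

# Assumed varactor limits (2:1 ratio); illustrative only.
C_V_MIN, C_V_MAX = 400e-15, 800e-15

f_high = f_osc(C_V_MIN, 350e-15)
f_low = f_osc(C_V_MAX, 350e-15)
range_350 = tuning_range(C_V_MIN, C_V_MAX, 350e-15)
range_200 = tuning_range(C_V_MIN, C_V_MAX, 200e-15)  # hypothetical smaller C_fix

print(f"f_high = {f_high / 1e9:.2f} GHz, f_low = {f_low / 1e9:.2f} GHz")
print(f"range with 350 fF: {range_350 / 1e9:.2f} GHz, with 200 fF: {range_200 / 1e9:.2f} GHz")
```

Reducing the fixed capacitance widens the span, which is the trade-off against the extra PMOS parasitics noted for this topology.<br />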

Figure 6. Variable capacitance Cv [fF] and its quality factor Q simulated at 5 GHz versus the tuning voltage Vc [V].<br />

2.5 Implementation<br />

The fully integrated circuit is implemented as an entirely symmetric layout in a commercial 0.25 µm BiCMOS<br />
technology [4]. The overall chip size is 1 mm × 0.7 mm, but the effective chip size including the two spirals is only<br />
0.5 mm × 0.4 mm.<br />

Figure 7. Chip photo.<br />

3. MEASUREMENTS<br />

All measurements were performed on-wafer. Fig. 8 shows the measured spectrum at a tuning voltage of 2.5 V. With<br />
a resolution bandwidth of 100 kHz, the phase noise at a frequency offset of 1 MHz from the carrier frequency of<br />
5.011 GHz is (-54 - 50) dBc/Hz = -104 dBc/Hz, where the 50 dB term normalizes the 100 kHz resolution bandwidth to 1 Hz.<br />
After correcting for the losses of the spectrum analyzer (3.5 dB), the coaxial cable (0.75 dB) and the<br />
probe (0.5 dB), the measured single-ended output power over the complete tuning range is -1.5 dBm,<br />
corresponding to a peak-to-peak amplitude of 530 mVpp at 50 Ω. The simulated transient output peak-to-peak<br />
amplitude over the whole tuning range is higher than 0.5 V. The oscillator core draws only 1.5 mA from a<br />
supply voltage of 2.5 V. The current consumption of the whole chip including the output buffer is 38 mA, of which the last<br />
differential stage needs 27 mA.<br />
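The two conversions used in this paragraph can be reproduced as follows. This is a minimal sketch: the function names are illustrative, and the power conversion assumes a sinusoidal signal across the 50 Ω load.<br />

```python
import math

def phase_noise_dbc_per_hz(level_dbc, rbw_hz):
    """Normalize a noise level read in a given resolution bandwidth to 1 Hz."""
    return level_dbc - 10.0 * math.log10(rbw_hz)

def dbm_to_vpp(p_dbm, r_load=50.0):
    """Peak-to-peak voltage of a sine wave delivering p_dbm into r_load."""
    p_watt = 1e-3 * 10.0 ** (p_dbm / 10.0)
    v_rms = math.sqrt(p_watt * r_load)
    return 2.0 * math.sqrt(2.0) * v_rms

pn = phase_noise_dbc_per_hz(-54.0, 100e3)   # reading from Fig. 8, RBW 100 kHz
vpp = dbm_to_vpp(-1.5)                      # corrected output power
print(f"phase noise at 1 MHz offset: {pn:.1f} dBc/Hz")
print(f"output amplitude: {vpp * 1e3:.0f} mVpp")
```

Both results agree with the values quoted in the text: -104 dBc/Hz and roughly 530 mVpp.<br />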

Figure 8. Measured output spectrum at 5.01 GHz (power [dBm] versus frequency [GHz]). Resolution bandwidth: 100 kHz.<br />

Fig. 9 shows the simulated (dotted) and measured oscillation frequency and the measured phase noise at a frequency<br />
offset of 1 MHz from the carrier versus the tuning voltage. The measured tuning range is 800 MHz when the<br />
tuning voltage is varied from 1 V to 2.5 V. Expressed as a percentage, the tuning range is 17.4 %.<br />
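The percentage can be cross-checked from the numbers given in this section. The sketch below assumes that the relative tuning range is referred to the centre frequency and that the upper end of the range is the 5.011 GHz carrier measured at 2.5 V; the paper states neither explicitly, so the lower frequency used here is derived, not measured.<br />

```python
def relative_tuning_range(f_low, f_high):
    """Tuning range as a percentage of the centre frequency."""
    f_centre = 0.5 * (f_low + f_high)
    return 100.0 * (f_high - f_low) / f_centre

F_HIGH = 5.011e9          # Hz, measured carrier at Vc = 2.5 V
F_LOW = F_HIGH - 800e6    # Hz, derived from the 800 MHz range (assumption)

tr_percent = relative_tuning_range(F_LOW, F_HIGH)
print(f"relative tuning range: {tr_percent:.2f} %")
```

Under these assumptions the result comes out close to the quoted 17.4 %.<br />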

Figure 9. Simulated (dotted) and measured oscillation frequency [GHz] and measured phase noise [dBc/Hz] at a<br />
frequency offset of 1 MHz versus tuning voltage Vc [V].<br />

It is possible to increase the tuning voltage $V_c$ further. With a tuning voltage of $V_c$ = 5 V the measured oscillation<br />
frequency is 5.47 GHz and the phase noise at a frequency offset of 1 MHz is -110 dBc/Hz.<br />

Ref. | Iosc [mA] | Pcore [mW] | PN (1 MHz) [dBc/Hz] | Tuning range<br />
[5] | 2 | 3.8 | -101 | 7.2 %<br />
[6] | 10 | 20 | -100 | 8 %<br />
[7] | 5.5 | 14 | -114 | 18 %<br />
This work | 1.5 | 3.8 | -101 | 17 %<br />

Table 1. Comparison of 5 GHz CMOS cross-coupled VCOs implemented in 0.25 µm CMOS.<br />

Table 1 compares the figures of merit of state-of-the-art cross-coupled VCOs operating at 5 GHz and implemented<br />
in 0.25 µm CMOS processes.<br />

4. CONCLUSION<br />

The presented NMOS-PMOS cross-coupled 5 GHz LC VCO core consumes only 3.75 mW. The VCO core<br />
current was minimized by optimizing the resonator losses and the chosen topology. Despite the additional parasitic capacitance of this<br />
topology, the measured tuning range is 17 %. If the tuning voltage is increased above the supply voltage, the<br />
tuning range reaches 1.3 GHz. The phase noise at a frequency offset of 1 MHz over the whole tuning range is<br />
-101 dBc/Hz ±3 dB.<br />

5. ACKNOWLEDGEMENTS<br />

For funding and access to the BiCMOS technology, we would like to thank IBM Research Laboratory Zurich. In<br />
this context the authors would like to acknowledge Dr. M. Schmatz, IBM Research, Zurich, Switzerland, for<br />
his support and engagement for the IBM/ETH center for advanced silicon electronics (CASE). Furthermore, the<br />
authors are grateful to H. Benedickter for providing measurement equipment and support and to D. Barras for<br />
organizing the wafer run.<br />

6. REFERENCES<br />

[1] Leeson D. B., A simple model of feedback oscillator noise spectrum, Proc. IEEE, Dec. 1966.<br />

[2] Hajimiri A., Lee T. H., Design issues in CMOS differential LC oscillators, IEEE Journal of Solid-State Circuits, vol. 34, no. 5, pp. 717-724, May 1999.<br />

[3] Tiebout M., Low-power low-phase-noise differentially tuned quadrature VCO design in standard CMOS, IEEE Journal of Solid-State Circuits, vol. 36, no. 7, pp. 1018-1024, July 2001.<br />

[4] St. Onge S. A. et al., A 0.24 µm SiGe BiCMOS mixed-signal RF production technology featuring a 47 GHz ft HBT and 0.18 µm Leff CMOS, Bipolar/BiCMOS Circuits and Technology Meeting 1999, pp. 117-120, Sept. 1999.<br />

[5] Rategh H. R. et al., A CMOS frequency synthesizer with an injection-locked frequency divider for a 5-GHz wireless LAN receiver, IEEE Journal of Solid-State Circuits, vol. 35, no. 5, pp. 780-787, May 2000.<br />

[6] Yamagishi A. et al., A low-voltage 6 GHz-band CMOS monolithic LC-tank VCO using a tuning-range switching technique, International Microwave Symposium Digest, 2000 IEEE MTT-S, vol. 2, pp. 735-738, June 2000.<br />

[7] Samori C. et al., A -94 dBc/Hz@100 kHz, fully-integrated, 5-GHz, CMOS VCO with 18% tuning range for Bluetooth applications, IEEE Custom Integrated Circuits Conference 2001, pp. 201-204, May 2001.



Design and Optimization of CMOS Prescaler<br />

Yu Lei, Student Member IEEE, Adil Koukab,<br />

and Michel Declercq, Fellow IEEE<br />

Ecole Polytechnique Fédérale de Lausanne (EPFL), Laboratoire d’Electronique Générale,<br />

Batiment ELB, Station 11, CH-1015 Lausanne, Switzerl<strong>and</strong><br />

E-mail: yu.lei@epfl.ch<br />

Abstract:<br />

A UWB prescaler operating from 1 GHz to 25 GHz while drawing about 3.5 mA from a 1.8 V power<br />
supply has been designed in a 0.18 µm CMOS process. The prescaler features a new balanced dynamic load<br />
technique and an asymmetrical latch structure. The speed enhancement mainly comes from the<br />
asymmetrical sizing of the D-latch cell, while the balanced dynamic load significantly widens the<br />
frequency range and at the same time reduces power consumption. The factors and trade-offs that govern<br />
the circuit performance are addressed to optimize the design.<br />

I. Introduction:<br />

The high-speed prescaler is one of the key components in frequency synthesizers. It works at the highest<br />
frequency of the system and generally consumes a significant portion of the total power. The main<br />
function of the prescaler is to divide the frequency by two and to generate quadrature I and Q signals. The D-flip-flop<br />
architecture using CML is the most popular topology for this circuit [1-6]. Conventional CML<br />
circuits are preferred because of their high speed, good noise immunity and relatively wide operating<br />
frequency range. At high frequencies, however, their power consumption increases significantly. In this paper, a<br />
new prescaler topology with a modified dynamic load is proposed. This architecture is particularly<br />
suitable for high-speed, ultra-wideband (UWB) and low-power applications. The design parameters<br />
governing the prescaler operation are investigated and analysed, and a methodology to optimize its<br />
performance is proposed.<br />

II. Circuit Design<br />

A conventional CML prescaler that uses constant-biased PMOS transistors as loads is shown in Fig. 1.<br />
Generally, the prescaler comprises two differential current-mode D-latches mutually coupled in a<br />
negative feedback loop. As illustrated in Fig. 1, each D-latch consists of two differential pairs: transistors<br />
M7 and M8 form the logic pair, which tracks the input signal, and M5 and M6 form the latch pair.<br />
The track and latch modes are determined by the clock signals at the inputs of transistors M9 and<br />
M10. When the signal CLK is "HIGH", the circuit operates in the tracking mode. Most of the current<br />
provided by the load flows through the logic pair, M7 and M8, thereby allowing VOUT (Q, QN) to<br />
track Vin (D, DN). In the latch mode, the signal CLK is held low and the latch pair is enabled to store the<br />
logic state at the output. At the same time the tracking stage is disabled. M2 and M3 (M12, M13)<br />
are constant-biased PMOS transistors that serve as the load shared by the logic pair and the latch pair [1, 4, 7].<br />
Resistors were also used as loads in some other designs [5, 6]. Based on the conventional designs,<br />
two major modifications are introduced in the prescaler architecture to improve the operation speed<br />
and extend the operating frequency range. The first technique consists of using an asymmetrical latch<br />
design. The second technique is to implement a complementary CLK-driven load (M1-M4 in Fig. 1).<br />
This technique will be called balanced dynamic load (BDL).<br />

Fig. 1 Conventional prescaler with constant-biased load<br />
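The divide-by-two behaviour of two cross-coupled D-latches can be illustrated with a purely logical model. This is a behavioural sketch only: it ignores all analog effects discussed in this paper (load, gain, timing), and the initial states, the function name and the signal names are assumptions made for illustration.<br />

```python
def divide_by_two(clk_samples):
    """Behavioural model of two level-sensitive D-latches in a feedback loop.

    The first latch tracks when CLK is high, the second when CLK is low.
    The inverted output of the second latch is fed back to the first.
    """
    master, slave = 0, 0              # assumed initial latch states
    i_out, q_out = [], []
    for clk in clk_samples:
        if clk:                       # CLK high: first latch tracks, second holds
            master = 1 - slave        # inverted feedback closes the loop
        else:                         # CLK low: second latch tracks, first holds
            slave = master
        i_out.append(master)
        q_out.append(slave)
    return i_out, q_out

clk = [1, 0] * 4                      # four clock periods, two samples each
i_sig, q_sig = divide_by_two(clk)
print("CLK:", clk)
print("I  :", i_sig)
print("Q  :", q_sig)
```

Each output repeats every four samples while the clock repeats every two, so the frequency is halved. The second output is the first delayed by one sample, a quarter of the output period, which corresponds to the quadrature relationship between I and Q.<br />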


2.1 Asymmetrical latch design<br />


Fig. 2 High Speed BDL Prescaler<br />

Traditionally, the prescaler was designed with a symmetrical topology: the same transistor size was used<br />
in the latch and the logic pairs [4]. However, this is not an optimized solution. In fact, the logic pair<br />
should be designed with large transistors (6-10 µm gate width) in order to enhance its gain. At the same<br />
time its capacitive load, and therefore the latch transistor size, should be as small as possible to speed up the<br />
change of state. This is why equally sized latch and logic pairs have a limited operating frequency range.<br />

To increase the maximum operating frequency, the transistors of the latch pair should be shrunk.<br />
However, there is a limit to this shrinking, since the latch pair also needs some gain to drive the second D-latch<br />
and accelerate its change of state (latching mode). On the other hand, the lower limit of the<br />
operating frequency range increases significantly when the latch pair is sized down. To deal with this<br />
problem, the BDL technique is introduced. This method is discussed in the next section. For<br />
the current process the optimum ratio between logic and latch transistor sizes was found to be around 3<br />
to 1. Thanks to this optimization procedure a 30% speed enhancement is achieved compared to an<br />
implementation employing equally sized differential pairs and using the same power budget.<br />

2.2 Balanced dynamic load technique<br />

The load topology has a very important impact on the prescaler performance. Many configurations<br />
have been presented in the open literature based on passive loads [5, 6] and constant-biased active loads<br />
[1, 4, 7]. Recently, a dynamic load technique was introduced [2, 3] with significant success. The aim<br />
of the technique is to reduce the load, and so the RC time constant, in the tracking mode, and to increase the load in<br />
the latching mode for higher gain to accelerate the positive feedback setup process. The latter<br />
also implies that for the same gain one can reduce the latch size and thus reduce the current needed to<br />
charge the capacitance. To achieve this, complementary clocks were used to drive the active<br />
loads of the logic and the latch pairs. A significant improvement in the power consumption at high<br />
frequency was reported [3]. The sources of the latch pair transistors of this topology are directly<br />
grounded and the maximum reported frequency was 18 GHz. This operating frequency limit was also<br />
verified by our simulations. To enhance the frequency range of this structure, we have<br />
introduced the transistors M9, M19 to drive the latch pair (MDL in Fig. 2). This modification increases the<br />
highest operating frequency by about 30%.<br />

The effect of adding M9/M19 is analyzed here. First, inserting the tail transistors M9/M19 reduces the<br />
internal signal swing at nodes D/D*/Q/Q*, and thus reduces the time needed to change state. Second, the<br />
clock-driven latch pair presents less capacitive load in the tracking mode than the grounded latch pair in the DL<br />
prescaler of Fig. 2, which improves the operation speed.<br />
The side effect of inserting M9, M19 is that in lower<br />
frequency range (


III. Simulation results and discussion<br />

Transistor | Gate dimension<br />
M7/M8 (logic) | 6 µm<br />
M5/M6 (latch) | 3 µm<br />
M9 (latch tail current) | 16 µm<br />
M10 (logic tail current) | 8 µm<br />
M2/M3 (load) | 8 µm<br />
M1/M4 (load) | 3 µm<br />

Table 1. Transistor dimensions of the designed BDL prescaler<br />

Figure 2 illustrates all the circuit topologies used in simulation. The dimensions of all the transistors used<br />
in the BDL prescaler are shown in Table 1.<br />

Figure 3 shows that both the classical DL and the BDL prescalers work correctly with an 18 GHz input. When<br />
the input is 25 GHz, the BDL prescaler still divides correctly but the classical DL prescaler fails to work. In our<br />
simulation, 18 GHz is the upper limit of the divide range of the DL prescaler.<br />

Fig. 3 High-frequency performance comparison of the classical DL and BDL architectures<br />

Figure 4 shows the prescaler input and output signals at 8 GHz. In the upper curves the MDL<br />
prescaler of Fig.1 is used, while in the lower curves the BDL prescaler is used. The MDL prescaler fails<br />
to divide correctly while the BDL prescaler continues to work properly. The current flow in the logic and<br />
latch pairs for these two topologies is illustrated in Fig 4 (upper curves for BDL and lower curves for<br />
MDL). The latch pair clearly consumes much less current than the logic pair. This is partly due<br />
to the fact that the latch setup process is based on positive feedback and consumes only a small current<br />
after "startup". The shrinking of the latch transistors also contributes to this current saving. In the classical<br />
implementation, which employs resistors or constant-biased transistors as loads, the current flowing<br />
through the load is always maintained at the same high level, which is set by the requirement of the<br />
logic pair. This is clearly not an efficient solution. The DL technique provides a high current only<br />
for the logic pair, and thus reduces the current consumption of the whole prescaler.<br />

Fig. 4 Low-frequency failure of MDL<br />

Maximum input frequency | 25 GHz<br />
Minimum input frequency | 0.8 GHz<br />
Output voltage | 2 × 250 mVpp<br />
Divide ratio | 2<br />
Supply voltage | 1.8 V<br />
Supply current | 3.5 mA<br />
Technology | 0.18 µm CMOS<br />

Table II. Technical data<br />

Inserting a CLK-driven tail current source into the latch pair<br />
(MDL) improves the maximum operation frequency<br />
by roughly 30% but fails to divide in the low<br />
frequency range (


timeslot. This negative feedback discharges the output node and thus counteracts the positive<br />
feedback setup process of the latch pair, which charges the output node. The existence of this negative feedback<br />
timeslot significantly affects the latch setup behavior of the prescaler. This timeslot widens as the frequency<br />
decreases. Below a certain frequency threshold, the timeslot becomes so long that the latch positive<br />
feedback is totally inhibited and the latch setup fails. At this point the prescaler ceases to divide. In the<br />
MDL configuration the simulated frequency threshold is around 8.5 GHz.<br />


Fig.5 Current flow In D-latch<br />

In the BDL technique, a second load pair is introduced, which is driven by the complementary<br />
clock signal of the first pair. This resolves the problem mentioned above and sets the<br />
lower threshold to around 1 GHz. The effect of the second load pair can be seen clearly in Figure 5.<br />
It injects current into the output node periodically to counteract the effect of the negative feedback leakage,<br />
so that the latching process continues to work even at very low frequency.<br />

IV. Conclusion<br />

A UWB (1-25 GHz), low-power (3.5 mA from 1.8 V) prescaler was designed. The detailed technical data<br />
of the designed prescaler are listed in Table II. In this paper, a new balanced dynamic load architecture<br />
that improves the operating frequency range by 30 % was presented. This research illustrates that by<br />
handling the current flow in the prescaler correctly, a "balanced" ultra-wideband design with relatively<br />
low power consumption can be achieved.<br />

References:<br />

[1] D-J Yang, Kenneth K. O, "A 14-GHz 256/257 Dual-Modulus Prescaler with secondary Feedback and its application to monolithic CMOS 10.4GHz PLL," IEEE Trans. on Microwave Theory and Techniques, Vol. 52, No. 2, pp. 461-468, Feb. 2004<br />

[2] Hongmo Wang, US Patent 6,166,571, Dec. 26, 2000<br />

[3] Hongmo Wang, "A 1.8V 3mW 16.8GHz Frequency Divider in 0.25um CMOS," in IEEE Int. Solid-State Circuits Conf. Tech. Dig., San Francisco, CA, Feb. 2000, pp. 196-197<br />

[4] A. Shinmyo, M. Hashimoto, "Design and optimization of CMOS current mode logic dividers," 2004 IEEE Asia-Pacific Conference on Advanced System Integrated Circuits, 2004<br />

[5] Hans-Dieter Wohlmuth, Daniel Kehere, et al., "A 17GHz Dual-Modulus Prescaler in 120nm CMOS," IEEE Radio Frequency Integrated Circuits (RFIC) Symposium, pp. 479-482, 2003<br />

[6] Eric Touriner, Mathilde Sie, Jacques Graffeuil, "A 14.5GHz 0.35um frequency divider for dual-modulus prescaler," IEEE Radio Frequency Integrated Circuits (RFIC) Symposium, pp. 227-230, 2002<br />

[7] Behzad Razavi, Kwing F. Lee and Ran H. Yan, "Design of high-speed, low power frequency dividers and PLLs in deep submicron CMOS," IEEE J. Solid-State Circuits, Vol. 30, No. 2, pp. 101-109, Feb. 1995.



EMBEDDED HARDWARE ARCHITECTURE FOR<br />

STATISTICAL RAIN FORECAST<br />

Tassilo Meindl, Walter Moniaci, Davide Gallesio, Eros Pasero<br />

Polytechnic of Turin, Neuronica laboratory,<br />

Corso Duca degli Abruzzi, I-10129 Turin, Italy<br />

E-mail: tassilo.meindl@polito.it<br />

ABSTRACT<br />

Weather forecasting is a typical problem in which a huge amount of data coming from different types of sensors<br />
must be processed by means of complex, time-consuming algorithms. In this paper we present a new embedded data<br />
mining system, based on an FPGA [1], which performs statistical processing of sensor data and provides<br />
nowcasting for the next one, two and three hours. The statistical algorithm has been tested for forecasts of<br />
temperature, humidity and rain and provides promising results.<br />

1. INTRODUCTION<br />

Weather forecast systems are among the most demanding equation systems that a computer must solve. A very large<br />
amount of data, coming from satellites, synoptic stations and sensors located around our planet, provides each day<br />
information that must be used to foresee the weather situation in the next hours and days all around the world.<br />
Weather reports give forecasts for the next 24, 48 and 72 hours for wide areas. Today they are quite reliable, but it is<br />
not unusual for the conditions in a restricted region to change suddenly. Snow, ice and fog are very dangerous<br />
events that can occur on the roads and cause serious problems for road safety. So whereas a traditional weather<br />
forecast system provides a powerful means of indicating the weather evolution over the next days in a<br />
large area, this embedded system is used to make very short-term local predictions and warning decisions. This is<br />
made possible by a powerful suite of detection and prediction algorithms. The usefulness of this approach is<br />
that knowledge of the weather conditions in the next three hours can be used, for example, to organize outdoor<br />
activities. Knowledge of a rain event in the next two hours can be used to choose the right set of<br />
tires in a Formula One competition. Knowledge of fog occurrence in the next three hours allows airport<br />
maintenance staff to take important decisions in advance in order to avoid canceling flights at the last minute. The<br />
realization of an autonomous, embedded, portable weather station that can be used without a PC and is able to predict<br />
microclimate meteorological events and alarms in restricted areas within a short time is therefore an interesting<br />
and useful approach, different from the classical macroclimate meteorological forecast systems.<br />


2. HARDWARE ARCHITECTURE<br />

We propose a new hardware architecture for embedded processing of sensor data. To widen the field of possible applications, a distributed hardware architecture has been realized.<br />

[Figure 1: block diagram split into a “Data Management and Controlling” part and a “Data Processing” part. Labelled blocks: onboard sensors, 433 MHz radio transceiver, wireless sensors, real-time clock, LCD display, Cygnal 8051 microcontroller, USB, PC, in-system programming, control, power supply, network, battery, 2 Mbit program Flash, Xilinx Spartan 3 FPGA, 4 Mbit data Flash. Clock labels: 32.768 kHz, 24 MHz.]<br />

Figure 1. Embedded Data Processing Station.<br />

The system shown in figure 1 is built around a microcontroller, which is the heart of the “Data Management and Controlling” part, and an FPGA connected to several memory blocks, which constitutes the “Data Processing” part. Because its firmware can be modified simply and quickly, the microcontroller handles the acquisition of sensor data and time stamps and provides the user interfaces, including the connection to a PC and the on-board buttons and display, whereas the data processing and the prediction algorithm are implemented completely in the FPGA. Exact timing is a key point: it guarantees data consistency without saving time stamps in the data memory. Every 15 minutes the microcontroller acquires data from the sensors and shows them directly to the user on an LCD display. The FPGA executes the statistical algorithm described in the next section and saves the forecast values for the next 1, 2 and 3 hours in a data register, which can be read by the microcontroller. When running on battery, without power from the USB connector, the microcontroller performs efficient power management.<br />


The FPGA core has been implemented completely in VHDL. The interfaces to the external components and the control logic were written directly in VHDL, whilst the statistical processing part was implemented with CODESIMULINK, a MATLAB tool developed at the Polytechnic of Turin, which automatically generates synthesizable VHDL code.<br />

3. ALGORITHM DESCRIPTION<br />

The algorithm is composed of a “Best Day Calculation” module and a forecast module. The “Best Day Calculation” module searches the historical database for the day that best matches the current day, based on the last four hours of measurements of a particular parameter selected by the Parzen method [4]. The gradient of the historical data then allows the forecast module to perform a simple prediction for the next 1, 2 and 3 hours. Rain prediction is not an easy problem: rain is not a stationary phenomenon, and the classical Auto Regressive eXogenous (ARX) model [3] does not give good results. This paper presents a new approach based on a non-parametric statistical method. First we have to decide which variables have the most influence on the rain event. We used the Parzen method to establish the correlation between rain and the other meteorological variables.<br />

3.1 Parzen method<br />

The goal is to choose the best “predictors” for the rain forecast. We define the entropy of the parameters in the following way:<br />

\[ e(x) = -\int_{-\infty}^{+\infty} \log\bigl(p(x)\bigr) \cdot p(x)\, dx \qquad (1) \]<br />

In this formula p(x) is the Probability Density Function (PDF) of a random variable x. e(x) is an index of dispersion that lies in the range ]-∞,+∞[. Let Z be the vector of all the possible predictors that can be used to predict the variable y (the predictand). Now let X1 and X2 be two particular subsets of predictors taken from Z, with the same number of elements. To establish which of X1 and X2 is the better set of predictors, we calculated, for each set X, the following entropy difference:<br />

\[ d(X, y) = e(X, y) - e(X) = -\int \log P(X, y) \cdot P(X, y)\, dX\, dy + \int \log P(X) \cdot P(X)\, dX \qquad (2) \]<br />

where P(X,y) is the joint PDF of X and y, and P(X) is the PDF of X (the predictors). Both PDFs are unknown, so the integrals in (2) cannot be evaluated directly. We circumvented this problem by estimating the unknown PDFs with the Parzen method, which approximates an unknown probability density function by a sum of Gaussian kernels, each centered on a record of the database:<br />

\[ P_X^{*}(X; D, \Lambda) = \frac{1}{n} \sum_{i=1}^{n} \prod_{j=1}^{m} \frac{1}{\sqrt{2\pi\lambda_j^{2}}} \cdot \exp\left[ -\frac{(x_j - x_{ij})^{2}}{2\lambda_j^{2}} \right] \qquad (3) \]<br />

where,<br />

• D = {X1, …, Xn}: the data vector<br />

• n: database dimension (number of records)<br />

• xj, xij: the j-th components of X and Xi<br />

• Λ: standard deviation vector, Λ = (λ1, …, λm)<br />

Each standard deviation in Λ regulates the resolution of the estimator along the corresponding dimension. This in turn allows us to estimate d(X1,y) and d(X2,y). The better of X1 and X2 is the one that gives the smaller entropy difference. In the case shown in this paper the sets Xi are the data of the following meteorological variables:<br />

• Air temperature<br />

• Relative humidity<br />

• Solar radiation<br />

• Ground temperature<br />

• Atmospheric pressure<br />

The Parzen simulation shows that the most important meteorological variable for rain prediction is relative air humidity. We then have to decide how many past humidity values are needed for a reliable forecast. Applying the Parzen method again shows that the humidity values of the last 4 hours are sufficient to make a good prediction.<br />
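The predictor selection of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' MATLAB simulation: the function names are hypothetical, the kernel is the normalized Gaussian of equation (3), and the integrals of equation (2) are approximated by averaging the log-density over the database records, a choice the paper does not specify.

```python
import math


def parzen_pdf(x, records, lambdas):
    """Parzen estimate of the density at point x, following equation (3).

    x       -- sequence of m components
    records -- list of n database records, each with m components
    lambdas -- m kernel widths (standard deviations), one per component
    """
    total = 0.0
    for record in records:
        kernel = 1.0
        for x_j, x_ij, lam in zip(x, record, lambdas):
            norm = math.sqrt(2.0 * math.pi * lam ** 2)
            kernel *= math.exp(-((x_j - x_ij) ** 2) / (2.0 * lam ** 2)) / norm
        total += kernel
    return total / len(records)


def entropy(records, lambdas):
    """Approximate e = -integral(log(p) * p) by averaging -log(p) over the records.

    The sample average is an assumption of this sketch; the paper does not
    say how the integrals in equation (2) were evaluated numerically.
    """
    logs = [math.log(parzen_pdf(r, records, lambdas)) for r in records]
    return -sum(logs) / len(records)


def entropy_difference(predictors, predictand, lam_x, lam_y):
    """d(X, y) = e(X, y) - e(X), equation (2); smaller means a better predictor set."""
    joint = [list(x) + [y] for x, y in zip(predictors, predictand)]
    return entropy(joint, list(lam_x) + [lam_y]) - entropy(predictors, lam_x)
```

To compare two candidate sets, one would call `entropy_difference` once per set with the same predictand and kernel widths and keep the set with the smaller result. The kernel widths strongly influence the estimate, so they would have to be tuned for real data.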

3.2 Forecasting algorithm<br />

Let X be the vector of the humidity data from the last four hours before the forecast time j. For each day in the past, we take the humidity values of the four hours preceding the instant corresponding to j. In this way we obtain a matrix Z (n×m) containing the past humidity values for each day. The number of rows of Z has to be the same as the length of X; the number of columns is determined by the number of days recorded in the database. Let Y be a particular day in the past, i.e. one column of Z. We then calculate the mean square error for each past day as shown in the following formula:<br />

\[ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (X_i - Y_i)^{2} \qquad (4) \]<br />

At the end of this calculation we have a vector E containing the MSE for all the past days. We choose the day with the smallest MSE. This identifies the “matching day”: the day on which the humidity trend is closest to the humidity values of the forecast day at time j. We then fit the rain values of the matching day over the three hours following j with a third-order polynomial. The forecast rain values are then obtained by adding the rain gradient of the matching day to the last rain measurement, as shown in the following formula:<br />

\[ P(t+1) = m(t) + \left[ p_M(t+1) - p_M(t) \right] \qquad (5) \]<br />

where,<br />

• P(t+1): the rain forecast value<br />

• m(t): the measured rain at time t<br />

• pM(t+1): the rain value of the matching day at time t+1<br />

• pM(t): the rain value of the matching day at time t<br />

It is important to underline that pM(t) and pM(t+1) are not the measured rain values but the values obtained from the third-order model. In this way we limit the effect of erroneous measurements and obtain a smoother rain gradient.<br />
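A minimal Python sketch of the two steps of this section, best-day search by equation (4) and forecast by equation (5), is given here for illustration. The function names are hypothetical, and the third-order polynomial fit is assumed to have been done beforehand, so the forecast function simply receives the two model values pM(t) and pM(t+1).

```python
def mean_square_error(current, past):
    """Equation (4): mean of the squared differences between two windows."""
    return sum((x - y) ** 2 for x, y in zip(current, past)) / len(current)


def best_matching_day(current_window, history_windows):
    """Return the index of the past day whose window has the smallest MSE."""
    errors = [mean_square_error(current_window, w) for w in history_windows]
    return min(range(len(errors)), key=errors.__getitem__)


def forecast(last_measurement, model_now, model_ahead):
    """Equation (5): last measurement plus the gradient of the matching day.

    model_now and model_ahead stand for pM(t) and pM(t+1), i.e. values of the
    fitted third-order model, not raw measurements (assumed precomputed).
    """
    return last_measurement + (model_ahead - model_now)


# Hypothetical humidity windows (percent) for three past days.
history = [
    [50, 52, 54, 56],
    [80, 82, 85, 90],
    [60, 60, 60, 60],
]
current = [79, 83, 84, 91]
day = best_matching_day(current, history)
```

For the rain forecast the windows hold humidity values while `forecast` is fed rain values of the selected day, which matches the paper's use of humidity as the predictor for rain.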


4. IMPLEMENTATION<br />

4.1 Database Organization<br />

In our case 3 values, namely air temperature, humidity and precipitation, are measured every 15 minutes with a precision of 16 bits. This means that 288 values have to be stored for each day, which requires 4608 bits of data memory. We therefore selected a 4 Mbit Flash memory, which can hold the values of approximately 2½ years, more than sufficient as a historical database for the statistical forecasting algorithm. Since each day always contains exactly 288 measurements, we can segment the complete memory into day-slots of 4608 bits, each representing one day and divided into 96 time-slots of 3 measured values. From the current time we can then calculate the memory position where the next measured values have to be stored.<br />
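The memory arithmetic above can be checked with a short Python sketch. The constants come from the text; the addressing scheme itself (bit addresses, values stored back to back inside a time-slot, 4 Mbit taken as 4 × 2^20 bits) is an assumption made for illustration and is not described in the paper.

```python
BITS_PER_VALUE = 16
VALUES_PER_SLOT = 3                    # temperature, humidity, precipitation
SLOTS_PER_DAY = 24 * 60 // 15          # one time-slot every 15 minutes
BITS_PER_SLOT = BITS_PER_VALUE * VALUES_PER_SLOT
BITS_PER_DAY = BITS_PER_SLOT * SLOTS_PER_DAY
FLASH_BITS = 4 * 1024 * 1024           # assumption: 4 Mbit = 4 * 2**20 bits


def time_slot(hour, minute):
    """Index of the 15-minute time-slot within the day (0 to 95)."""
    return (hour * 60 + minute) // 15


def bit_address(day_slot, hour, minute, value_index=0):
    """Bit offset of one 16-bit value, derived only from day-slot and time."""
    return (day_slot * BITS_PER_DAY
            + time_slot(hour, minute) * BITS_PER_SLOT
            + value_index * BITS_PER_VALUE)


def capacity_days():
    """Number of complete day-slots that fit into the data Flash."""
    return FLASH_BITS // BITS_PER_DAY
```

Under these assumptions `capacity_days()` gives 910 complete days, about 2.5 years, in line with the figure quoted in the text. Because the address depends only on the day-slot and the time of day, no time stamp has to be stored next to the data.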

It has to be stressed that the current measurement time is sufficient for this calculation. Since we only search the historical database for the “Best matching day”, it does not matter if several days are missing from the database, e.g. because of a battery failure. We therefore do not add the missing days to the database but continue directly with the next free day-slot. Consequently, the last saved database position is all that is needed for the complete management. After a battery exchange or another power failure this last position can be found easily. If the current measurement time does not correspond to the expected next database position, we reconstruct the missing values by approximating them with a typical standard day cycle, so that a forecast is still possible. That forecast, however, is unlikely to be very reliable for the next 4 hours, i.e. until the previous measurements needed for the “Best Day Calculation” are available.<br />

4.2 Synthetic Database<br />

For the system to work immediately, a large database is needed for the best day calculation. Without it, the system would not operate reliably after a new installation, and several days or even months would pass before the required data had been collected. To avoid this problem we created a synthetic database that represents the meteorological characteristics of the place where the system is installed. During a new installation this database can be loaded from the PC in two modes. In the extended version, a complete year of data for the current location is loaded into the data memory. However, these data might not always be available, so in the compact version only 60 significant days for typical regions in Western Europe (see table 1) are loaded into the data memory. In this way the system can run at once on the synthetic database, and acquired data can then be appended.<br />

Table 1. Typical Western European Regions.<br />

Terrain | Features<br />
Open | Open site, open terrain, north-facing incline, no raised skyline (applies to most sites)<br />
Depression | Depression on very flat valley floor, in which cold air collects, particularly the Jura and the Alps<br />
Sea/lake | Shore of sea or larger lake (up to 1 km from the shore)<br />
City | Center of a larger city (over 100'000 inhabitants)<br />
Valley | Valley floor in mountainous valley at higher altitudes, valley floor inclined (flat valleys are often treated as depressions)<br />
Valley Central Alps | Floor of large central Alpine valley (e.g. Alpine regions of Valais)<br />
Föhn valley | Valley floor of föhn valley (regions with warm descending air currents)<br />
Valley Alpine foothills | Valley floor in northern Alpine foothills<br />

4.3 Forecasting process<br />

As explained in the previous section, we only have to find the day where the MSE is lowest. This can be achieved with simple arithmetic operations, as can be seen in equation 4. The implementation does not even need the division by n, since n is constant and does not change which day has the lowest error. After determining this best matching day, all that remains is to add its gradient for the next 1, 2 and 3 hours to the current measurement value, as can be seen in figure 2. The same procedure can be applied to the temperature forecast. For the rain forecast we search for the “Best Matching Day” in the humidity database but use the rain values for forecasting. Since the basic algorithm is independent of the data type, we reuse the same FPGA core for all forecasts.<br />

It must be admitted that up to now the algorithm is quite simple and a direct FPGA implementation might seem unnecessary, mainly because the predictor for the rain forecast was established beforehand by simulation on the PC. But with additional sensor information such as solar radiation, and the need to detect other weather events such as fog or ice formation, the reliability of the forecasting process has to be increased by dynamically determining the optimal predictors before each forecast, and neural networks might be necessary to account for cross-correlations among several predictors. Both of these improvements have recently been tested successfully in simulation. However, these algorithms are elaborate and time-consuming and can therefore be implemented best in an FPGA [5].<br />

[Figure 2: two panels, “Best matching day” and “Current day”, plotted against the time-slot, with relative humidity [%] and precipitation [mm] axes. The “Best matching day” panel shows humidity and rain measurements; the “Current day” panel shows humidity and rain measurements together with the humidity and rain forecasts. Each panel marks a “Best Day Calculation” interval followed by a “Forecasting” interval.]<br />

Figure 2. Example Forecast.<br />

5. RESULTS<br />

The algorithm described in the previous section has been applied to the rain data of a meteorological station positioned on the roof of the Polytechnic of Turin. The observation period runs from 15/03/2004 to 29/03/2004. The following figure compares the forecast values with the measured values.<br />

[Figure 3: four stacked panels, “RAIN MEASURES”, “1 HOUR FORECASTS”, “2 HOUR FORECASTS” and “3 HOUR FORECASTS”, each plotting rain in [mm] against time (0 to 1400).]<br />

Figure 3. Forecast and measured values.<br />

To evaluate the quality of the predictions we calculate the mean square error for the 1, 2 and 3 hour forecasts (table 2). The results are good: even for the three hour forecast the mean square error is low, considering that the sensitivity of the pluviometer is 0.2 mm.<br />


Table 2. Mean square error values.<br />

 | 1 h forecasts | 2 h forecasts | 3 h forecasts<br />
MSE | 0.37 mm² | 0.42 mm² | 0.49 mm²<br />
VRC | 0.39 | 0.43 | 0.51<br />

We also calculate the Variance Reduction Coefficient (VRC), as shown in the following formula:<br />

\[ \mathrm{VRC} = \frac{1}{\sigma^{2}} \cdot \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{x}_i)^{2} \qquad (6) \]<br />

where,<br />

• x_i: the forecast value<br />

• x̂_i: the measured value<br />

• N: the number of observations<br />

This index is very meaningful: it shows how much of the variability of the process (rain in this case) is explained by the predictor used (humidity). The lower the VRC, the better the forecast. Here the value is less than 1 in all cases, which means that the mean square error is smaller than the intrinsic variability of the phenomenon, expressed by its variance σ².<br />
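As an illustration, the Variance Reduction Coefficient can be computed as in the Python sketch below. The function name is hypothetical, and σ² is taken to be the population variance of the measured series, which is an assumption: the paper only states that the variance expresses the intrinsic variability of the phenomenon.

```python
def variance_reduction_coefficient(forecasts, measurements):
    """VRC = MSE / variance: forecast error relative to the variability of the data.

    Assumption: the variance is the population variance of the measurements.
    """
    n = len(measurements)
    mean = sum(measurements) / n
    variance = sum((m - mean) ** 2 for m in measurements) / n
    mse = sum((f - m) ** 2 for f, m in zip(forecasts, measurements)) / n
    return mse / variance


# Hypothetical series: always forecasting the mean gives a VRC of exactly 1.
measured = [0.0, 0.0, 2.0, 2.0]
vrc_mean_forecast = variance_reduction_coefficient([1.0, 1.0, 1.0, 1.0], measured)
vrc_better = variance_reduction_coefficient([0.0, 1.0, 2.0, 1.0], measured)
```

A forecast that always returns the mean of the series yields a VRC of 1, so values below 1 indicate that the predictor adds information. Dividing the MSE values of table 2 by the corresponding VRC values suggests a variance of roughly 0.95 to 0.98 mm² for the observed rain series, with the spread plausibly due to rounding.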

6. CONCLUSION<br />

The main characteristic of this embedded data processing station for sensor information is that the basic statistical algorithm in the FPGA core can in future be exchanged for, or extended with, more powerful algorithms such as neural networks [1], which have recently been tested with MATLAB. Moreover, since the platform is independent of the type of data, it can be used to monitor and predict any time series, such as air pollution data.<br />

7. REFERENCES<br />

[1] E. Pasero, W. Moniaci and T. Meindl, FPGA based statistical data mining processor, XV WIRN, Perugia, Italy, 15-17 September 2004.<br />

[2] E. Pasero and W. Moniaci, Artificial neural networks for meteorological nowcast, IEEE CIMSA, Boston, USA, 14-16 July 2004.<br />

[3] E.W. Jensen, P. Lindholm and S.W. Henneberg, Autoregressive modeling with exogenous input of middle-latency auditory-evoked potentials to measure rapid changes in depth of anesthesia, Methods of Information in Medicine, Vol. 35, pp. 256-260, 1996.<br />

[4] E. Parzen, On Estimation of a Probability Density Function and Mode, Ann. Math. Stat., Vol. 33, pp. 1065-1076, 1962.<br />

[5] A.R. Omondi and J.C. Rajapakse, Neural Networks in FPGAs, Proceedings of the 9th International Conference on Neural Information Processing, Vol. 2, pp. 954-959, 2002.<br />



0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

A HIGH THROUGHPUT FPGA CAMELLIA<br />

IMPLEMENTATION<br />

Daniel Denning, James Irvine, Malachy Devlin<br />

Institute of System Level Integration, Alba Centre, Alba Campus,<br />

Livingston, EH54 7EG, UK<br />

E-mail: daniel.denning@sli-instiute.ac.uk<br />

ABSTRACT<br />

In this paper we present a Field Programmable Gate Array (FPGA) implementation of the Camellia encryption algorithm. Our implementation deeply sub-pipelines the algorithm for the FPGA architecture. Camellia has been included in the portfolios of both the New European Schemes for Signatures, Integrity, and Encryption (NESSIE) in Europe and the Cryptography Research and Evaluation Committee (CRYPTREC) in Japan. The implementation has the highest published throughput of all the block ciphers recommended in the NESSIE and CRYPTREC portfolios, running at 33.25 Gbit/sec.<br />

1. INTRODUCTION<br />

The Camellia [1] symmetric-key block cipher has been recognised, through the open calls of NESSIE [2] and CRYPTREC [3], as a cryptographic algorithm to help protect the current and future information society. FPGAs provide a very beneficial platform for implementing cryptographic systems because of the inherent parallelism of the device.<br />

NESSIE was a European Union (EU) initiative and a project within the Information Society Technologies (IST) programme of the EU. Camellia was selected, together with three other block ciphers, from the 42 encryption algorithms that were submitted. Two of the others are MISTY1 and SHACAL-2. The third, the AES algorithm, was selected on the basis of its evaluation by the National Institute of Standards and Technology (NIST). The other selected algorithms included digital signatures, identification schemes, public-key encryption, MAC algorithms, and hash functions.<br />

The Camellia algorithm is a 128-bit block cipher jointly developed by NTT and Mitsubishi Electric Corporation. The algorithm has also been submitted to other standardisation organisations and evaluation projects such as ISO/IEC JTC 1/SC 27, IETF, and TV-Anytime Forum. Previous Camellia implementations have been published in [4,5] but do not investigate a sub-pipelining architecture.<br />

2. OVERVIEW OF CAMELLIA<br />

ALGORITHM<br />

The Camellia algorithm processes data blocks of 128 bits with secret keys of length 128, 192, or 256 bits. Note that Camellia has the same interface as the AES (Advanced Encryption Standard). Our implementation focuses on the algorithm with a key length of 128 bits and is key agile. A key-agile core must accept new data and a new cipher key on each clock cycle.<br />

A key length of 128 bits results in an 18-round Feistel structure. After the 6th and 12th rounds, FL/FL⁻¹ function layers are inserted to provide some non-regularity across rounds. There are also two 64-bit XOR operations before the first round and after the last, known as pre- and post-whitening. The top-level structure of the algorithm can be seen in Figure 1, as well as the inner 6-round structure. The key schedule, discussed later in this section, generates subkeys for each round, the FL/FL⁻¹ layers, and pre- and post-whitening.<br />

The FL-function is defined by:<br />

YR(32) = ((XL(32) ∩ klL(32))


[Figure residue: diagram labels M(128), kw1, kw2, k1 to k6, FL, FL⁻¹, kl1, k7 to k12, kl3, k13, k14, k15, k4, k5, k6, kw3, 64, 64, C(128).]<br />

Figure 2. Key schedule architecture for Camellia<br />

represents a substitution using one of 4 s-boxes that are defined by:<br />

\[ S_1(x) = h(g(f(x \oplus a))) \oplus b, \qquad (4) \]<br />

\[ S_2(x) = S_1(x) \lll 1, \qquad (5) \]<br />

\[ S_3(x) = S_1(x) \ggg 1, \qquad (6) \]<br />

\[ S_4(x) = S_1(x \lll 1) \qquad (7) \]<br />


3. SUB-PIPELINING ARCHITECTURE<br />

In this section we describe the sub-pipelined architecture. It also<br />

incorporates classical encryption optimisation techniques that increase<br />

the algorithm's throughput at the higher levels of the algorithm:<br />

unrolling, round pipelining, and transformation partitioning. The<br />

implementation is key agile, so a new data block can be encrypted on<br />

each clock cycle with a different key, which provides both high<br />

security and high throughput.<br />

For the sub-pipelining architecture we have fully unrolled the<br />

algorithm and added pipelining registers between each encryption round.<br />

We have then also added 5 stages of sub-pipelining shift registers into<br />

the F-Function. The registers have been added between the P-Function,<br />

the S-Function and the XOR, with a pipeline stage also added into the<br />

P-Function and a register at the output of the F-function. This<br />

architecture produces a pipeline delay of 108 clock cycles but does not<br />

have an effect on throughput. Further improvements might be made with<br />

some algorithm optimisations or floor planning. Adding the registers<br />

into the P-Function means that the function now has a branch number of<br />

3 or 2 instead of 5, depending on the data path. The architecture of the<br />

F-Function can be seen in Figure 3 on the previous page. The registers<br />

can be seen in between each of the F-function operations.<br />

Table 1. Summary of CRYPTREC and NESSIE 128-bit Block Cipher Winners and Finalists on FPGAs<br />

| Algorithm | Authors | Device | Area: slices (BRAMs) | Key agility | Throughput (Gbits/sec) | Efficiency (Mbps/slice) |<br />
| Camellia ¹ | Authors | XC2Vp50 | 11,287 (88) | YES | 33.25 | - |<br />
| AES-128 ¹ | Zambreno et al. [6] | XC2V4000 | 16,938 | YES | 23.57 | 1.39 |<br />
| MISTY1 ¹ | Rouvroy et al. [7] | XCV1000 | 6,322 | YES | 19.34 | 3.06 |<br />
| SHACAL-1 ² | McLoone et al. [8] | XC2V4000 | 13,729 | NO | 17.02 | 1.24 |<br />
| IDEA ² | Gonzalez et al. [9] | XCV600 | 12,026 | NO (estimated) | 8.3 | 0.69 |<br />
| RC6 ² | Beuchat [10] | XC2V3000 | 8,554 (80) | NO (assumed) | 15.2 | - |<br />
| Hierocrypt-3 ³ | Rogawski [11] | EPF10K130V | 25,811 LE | YES (assumed) | 0.397 | - |<br />
| CIPHERUNICORN-A ³ | NEC [12] | EP20K1500E | 7,072 LE (66 ESB) | YES | 0.044 | - |<br />
| SC2000 ⁴ | Shimoyama et al. [13] | CPF10K130V | 25,811 LE | YES (assumed) | 0.397 | - |<br />

¹ Included in both CRYPTREC and NESSIE.<br />

² Finalists not included in NESSIE portfolio.<br />

³ Implemented on an Altera FPGA device.<br />

⁴ Only able to find ASIC implementation.<br />

0-7803-9345-7/05/$20.00©2005 IEEE.<br />

One other important optimisation technique<br />

specific to the FPGA architecture is the use of<br />

dual-port block-RAM. Every round has its own<br />

associated F-function, as described earlier, and each<br />

function utilises 4 different s-boxes for byte<br />

substitution. Each F-function makes 2 calls to the<br />

same s-box, so for this implementation we have<br />

chosen to use dual-ported block-RAMs for the s-boxes.<br />

4. RESULTS<br />

We have implemented the sub-pipelined<br />

architecture on a Virtex-II Pro XC2Vp50 device.<br />

The core was designed with Xilinx’s System<br />

Generator, which produced the VHDL and<br />

associated files. The VHDL was synthesised with<br />

ISE 5.2 and verified with the test vectors that were<br />

submitted to NESSIE. The core runs at a frequency<br />

of 259MHz, which results in a throughput of<br />

33.25Gbits/sec. The implementation requires<br />

11,287 slices (47%) of the FPGA and uses 88 block-RAMs<br />

(37%). In general, comparisons against other<br />

encryption implementations can be difficult and can<br />

depend on architecture, device, embedded cores<br />

such as block-RAM, key agility, on-chip key<br />

scheduling, and other such criteria. Table 1 shows<br />

a list of other known high throughput FPGA<br />

architectures for NESSIE and CRYPTREC. For<br />

implementations that have used block-RAMs we<br />

have not included the efficiency. It is possible to<br />

calculate the extra number of slices that the block-RAMs<br />

would occupy if distributed memory were<br />

used, but this sort of calculation doesn’t take into<br />

account the extra routing, which would have an<br />

effect on the total throughput and therefore the<br />

efficiency.<br />
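The figures quoted in this section and in Table 1 can be cross-checked with a few lines of<br />
Python. The sketch assumes that a fully pipelined core accepts one 128-bit block per clock<br />
cycle, so throughput is clock frequency times block size, and that efficiency is throughput<br />
divided by slice count. The function names are our own. Under these assumptions 259 MHz gives<br />
about 33.15 Gbits/sec, while the reported 33.25 Gbits/sec corresponds to a clock of roughly<br />
259.8 MHz, so the quoted frequency is presumably rounded.<br />

```python
BLOCK_BITS = 128  # Camellia block size in bits


def throughput_gbps(f_mhz, block_bits=BLOCK_BITS):
    """Throughput in Gbits/sec when one block is completed per clock cycle."""
    return f_mhz * 1e6 * block_bits / 1e9


def clock_mhz_for(gbps, block_bits=BLOCK_BITS):
    """Clock frequency in MHz implied by a given throughput."""
    return gbps * 1e9 / block_bits / 1e6


def efficiency_mbps_per_slice(gbps, slices):
    """Efficiency metric of Table 1: Mbits/sec per slice."""
    return gbps * 1000 / slices


def latency_ns(cycles, f_mhz):
    """Time for one block to traverse the pipeline, in nanoseconds."""
    return cycles / f_mhz * 1000


if __name__ == "__main__":
    print(f"{throughput_gbps(259):.3f} Gbits/sec at 259 MHz")
    print(f"{clock_mhz_for(33.25):.2f} MHz implied by 33.25 Gbits/sec")
    print(f"{efficiency_mbps_per_slice(23.57, 16938):.2f} Mbps/slice (AES-128 row)")
    print(f"{latency_ns(108, 259):.0f} ns for the 108-cycle pipeline")
```

The efficiency values recomputed this way agree with the Table 1 entries for the designs that<br />
do not use block-RAMs.<br />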


5. CONCLUSION<br />

In this paper we have presented a sub-pipelined<br />

Camellia implementation that has a throughput of<br />

33.25Gbit/sec. Compared to other published FPGA<br />

implementations, this is the fastest known<br />

throughput of all 128-bit block ciphers<br />

recommended by NESSIE and CRYPTREC. The<br />

importance of this algorithm has been shown in the<br />

open calls held by NESSIE and CRYPTREC.<br />

The core is able to receive new secret keys and<br />

data on every clock cycle. We have utilised the<br />

available dual-port block-RAMs for use in the<br />

F-Function and have been able to pipeline the algorithm to<br />

the lowest level and then apply this to the FPGA<br />

architecture. The core has a very high throughput<br />

and thus could be used in a secure high<br />

performance computing application. As the core<br />

takes less than 50% of the Virtex-II Pro, it might be<br />

possible to place two cores on one FPGA device and<br />

so obtain a throughput of around<br />

66Gbits/sec. This will be part of future work.<br />

6. REFERENCES<br />

[1] Aoki, K., Ichikawa, T., Kanda, M., Matsui, M., Moriai, S., Nakajima, J., Tokita, T.,<br />
“Specification of Camellia – 128-bit Block Cipher”, URL:<br />
http://info.isl.ntt.co.jp/camellia/#Spec, September 2001.<br />

[2] NESSIE, “NESSIE Project Announces Final Selection of Crypto Algorithms”,<br />
IST-199-12324, February 2003.<br />

[3] CRYPTREC, Information-Technology Promotion Agency, Japan, “CRYPTREC Report 2002”,<br />

[4] Denning, D., Irvine, J., Delvin, M., “A Key Agile 17.4Gbit/sec Camellia Implementation”,<br />
proceedings of FPL’04, Antwerp, Belgium, Aug. 2004.<br />

[5] Ichikawa, T., Sorimachi, T., Kasuya, T., Matsui, M., “On the Criteria of Hardware<br />
Evaluation of Block Ciphers(1)”, Technical report of IEICE, ISEC2001-53, September 2001.<br />

[6] Zambreno, J., Nguyen, D., Choudhary, A., “Exploring Area/Delay Tradeoffs in an AES<br />
FPGA Implementation”, proceedings of FPL’04, Antwerp, Belgium, Aug. 2004.<br />

[7] Rouvroy, G., Standaert, F.X., “Efficient FPGA Implementation of Block Cipher MISTY1”,<br />
proceedings of IPDPS’03, France, April 2003.<br />

[8] McLoone, M., McCanny, J. V., “Very High Speed 17 Gbps SHACAL Encryption Architecture”,<br />
in Proc. of the 13th Int’l Conference on Field-Programmable Logic and its Applications<br />
(FPL), Portugal, September 2003.<br />

[9] Gonzalez, I., Lopez-Buebo, S., Gomez, F. J., Martinez, J., “Using Partial<br />
Reconfiguration in Cryptographic Applications: An Implementation of the IDEA Algorithm”,<br />
in Proc. of the 14th Int’l Conference on Field-Programmable Logic and its Applications<br />
(FPL), Portugal, September 2003.<br />

[10] Beuchat, J. L., “FPGA Implementations of the RC6 Block Cipher”, in Proc. of the 13th<br />
Int’l Conference on Field-Programmable Logic and its Applications (FPL), Portugal,<br />
September 2003.<br />

[11] Rogawski, M., “Analysis of Implementation Hierocrypt-3 Algorithm (and its comparison<br />
to Camellia algorithm) using ALTERA devices”, Electronic Edition, CoRR cs CR/0312035, 2003.<br />

[12] NEC Corporation, “Self Evaluation Report – CIPHERUNICORN-A”, unpublished date.<br />

[13] Shimoyama, T., Hitoshi, Y., Kazuhiro, Y., Takenaka, M., Itoh, K., Yjima, J., Torii, N.,<br />
Tanaka, H., “Specification and Supporting Document of the Block Cipher SC2000”,<br />
IST-199-12324, February 2003.<br />



A NOVEL APPROACH BASED ON DYNAMIC<br />

RECONFIGURATION FOR PROCESS CONTROLS<br />

WITH FPGA.<br />

Tommaso Ramacciotti ‡, Luca Serafini ‡, Luca Fanucci ‡, Stefano Baldacci †<br />

‡ Department of Information Engineering, University of Pisa, Via Caruso, 56122 Pisa, ITALY<br />

† Kayser Italia Srl, Via di Popogna 501, 57128 Livorno, ITALY<br />

E-mail: {t.ramacciotti,l.serafini,l.fanucci}@iet.unipi.it; s.baldacci@kayser.it<br />

ABSTRACT<br />

A demonstrative application of dynamically reconfigurable<br />

FPGA (D_FPGA) is presented along with a<br />

dedicated novel design methodology and software tools<br />

which have been developed in the framework of a<br />

European Community funded project (RECONF). The<br />

application being developed consists of the re-design,<br />

using a D_FPGA with an embedded microcontroller, of<br />

the control electronics of a scientific space experiment<br />

module initially based on a One Time Programmable<br />

(OTP) FPGA. The demo application aims to demonstrate<br />

the benefit achieved from the design of complex systems<br />

based on the dynamic re-configuration of SRAM based<br />

FPGA. By adopting a D_FPGA approach we expect an<br />

improvement in terms of silicon area and power<br />

consumption, and a decrease of the development time/costs.<br />

1. INTRODUCTION<br />

In the design of space qualified electronic systems,<br />

reconfigurable FPGAs offer the advantage of a considerable<br />

reduction of development time and costs. In fact a design<br />

error (as trivial as an inverted signal) in an OTP FPGA<br />

makes the device (once programmed) unusable and a new<br />

FPGA has to be programmed. This “trial and error”<br />

approach proves to be costly in terms of development time<br />

and also in terms of wasted FPGAs and printed circuit<br />

boards (note here that space qualification for PCBs allows<br />

a maximum of three soldering/de-soldering cycles for any<br />

given pad). Therefore, employment of dynamically<br />

reconfigurable FPGAs is highly desirable in this field,<br />

because the FPGA configuration can be modified even<br />

when the device is soldered onto a PCB.<br />

The bitstream of a D_FPGA [2] is composed of a static<br />

module (loaded at the system start up and never changed)<br />

and some dynamic modules (d_modules) that can be loaded<br />

and unloaded asynchronously according to the user<br />

requirements. Any number of d_modules can be loaded<br />

and co-exist in the D_FPGA up to its bitstream memory<br />

size. Any complex design can, in principle, be subdivided<br />

into small d_modules which implement the same<br />

functionalities in time division. Using this approach we<br />

can increase the functional density of an FPGA, and<br />

complex designs can fit into a D_FPGA with a relatively<br />

small gate count.<br />

The limitation to the applicability of this approach for<br />

real time control, with respect to a standard pipelined<br />

architecture, is the reload time of the<br />

dynamic modules. In fact, to obtain a correct sampling of<br />

the input signals the system must satisfy the following<br />

requirement:<br />

$$\sum T_{reload} + \sum T_{proc} < T_{sampling}$$<br />

where $T_{reload}$ represents the reload times of all the<br />

dynamic modules and $T_{proc}$ the processing times of<br />

the modules. $T_{sampling}$ instead represents the minimum<br />

time interval at which the fastest input signal must be<br />

acquired.<br />
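The requirement above can be checked mechanically once the reload and processing times are<br />
known. The following Python sketch does so for three d_modules. All numeric values are<br />
hypothetical placeholders chosen for illustration, not measurements from this work, and the<br />
bound is taken to be the sampling interval itself.<br />

```python
def total_cycle_us(reload_us, proc_us):
    """Sum of all reload times plus all processing times, in microseconds."""
    return sum(reload_us) + sum(proc_us)


def meets_sampling_deadline(reload_us, proc_us, sampling_us):
    """True if sum(Treload) + sum(Tproc) < Tsampling."""
    return total_cycle_us(reload_us, proc_us) < sampling_us


def slack_us(reload_us, proc_us, sampling_us):
    """Time left in each sampling interval after all reloads and processing."""
    return sampling_us - total_cycle_us(reload_us, proc_us)


# Hypothetical figures for three d_modules (microseconds).
RELOAD_US = [1500, 1500, 1500]
PROC_US = [400, 300, 500]
SAMPLING_US = 10_000

if __name__ == "__main__":
    ok = meets_sampling_deadline(RELOAD_US, PROC_US, SAMPLING_US)
    print("deadline met:", ok, "slack (us):", slack_us(RELOAD_US, PROC_US, SAMPLING_US))
```

A negative slack indicates that the design cannot be time-multiplexed at the required<br />
sampling rate without shortening the reload or processing times.<br />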

If this condition is satisfied we can approach the system<br />

design <strong>in</strong> a dynamic fashion, design<strong>in</strong>g the s<strong>in</strong>gle<br />

processes <strong>in</strong>dependently <strong>and</strong> load<strong>in</strong>g them separately <strong>in</strong><br />

the FPGA at run time.<br />

In spite of the wide availability of re-configurable FPGA<br />

architectures on the market, there are no adapted design<br />

methodologies <strong>and</strong> no tools on the market to support<br />

electronic designs based on D_FPGA. Developers are still<br />

us<strong>in</strong>g classical design environments, which are not<br />

adapted to an efficient approach, especially regard<strong>in</strong>g the<br />

dynamic system specificity.<br />

In the framework of the European Community funded<br />

project IST-RECONF [1], new software tools and a<br />

methodology for the design of systems with dynamically<br />

re-configurable FPGA have been developed and evaluated.<br />

The aim of this project is to allow the implementation of<br />

adaptive system architectures by def<strong>in</strong><strong>in</strong>g a methodology<br />

<strong>and</strong> develop<strong>in</strong>g the required design environment to take<br />

full benefits of D_FPGA. The output of the project is a<br />

complete <strong>and</strong> validated design environment composed of<br />

the follow<strong>in</strong>g parts:<br />

• New <strong>and</strong> adapted design methodology address<strong>in</strong>g<br />

partition<strong>in</strong>g <strong>and</strong> schedul<strong>in</strong>g issues <strong>in</strong>clud<strong>in</strong>g the<br />

availability of a Designer’s Guide


• Front-End tools adapted to D_FPGA but<br />

<strong>in</strong>dependent from the IC manufacturer’s technology<br />

• Back-End tools adapted to D_FPGA <strong>and</strong> dependent<br />

on the IC manufacturer’s technology.<br />

• Configuration controller generator to generate a<br />

Hardware/software description of the configuration<br />

controller that will be <strong>in</strong> charge of manag<strong>in</strong>g the<br />

reconfiguration process. It could be generated<br />

automatically, or under user constra<strong>in</strong>ts.<br />


2. THE TECHNOLOGY<br />

The Back-End tools previously introduced, which<br />

implement the modular placement and routing of the<br />

different modules onto the D_FPGA fabric, are obviously<br />

technology dependent. The Back-End tools being<br />

developed for the RECONF project are dedicated to the<br />

AT40K and AT94K (a version of the AT40K family with<br />

an embedded AVR microcontroller) FPGA families from<br />

ATMEL®. In our work we used the AT94K line of<br />

products, which is briefly introduced here (Figure 1).<br />

Figure 1. Atmel® AT94K40<br />

(Block labels in the figure: Configurable SRAM, AT40K FPGA, 36K bytes of SRAM, 40K gates.)<br />

ATMEL® users have been using microcontrollers to<br />

control their applications while using FPGAs to<br />

implement interfaces and glue logic. This fact led<br />

ATMEL® Corp. to the development of a new device that<br />

combines an 8-bit RISC AVR microcontroller and an<br />

AT40K-type FPGA. ATMEL® Corp. has categorized<br />

this new device, the AT94K, as a Field-Programmable System<br />

Level Integrated Circuit (FPSLIC) [4]. There are three<br />

AT94K devices available (AT94K05, AT94K10, and<br />

AT94K40). They differ only in the number of gates<br />

and in the size of the AVR program memory. The AVR<br />

core is based on an enhanced RISC architecture that<br />

combines a rich instruction set with 32x8 general-purpose<br />

working registers. The FPGA and AVR share a flexible<br />

interface which allows for many methods of system<br />

integration. Four memory locations in the AVR memory<br />

map are decoded into 16 select lines (8 for AT94K05) and<br />

are presented to the FPGA along with the AVR 8-bit data<br />

bus. The FPGA can be used to create additional custom<br />

peripherals for the AVR microcontroller through this<br />

interface. In addition there are 16 interrupt lines (8 for<br />

AT94K05) from the FPGA back into the AVR interrupt<br />

controller. The FPGA I/O selection is controlled by the<br />

AVR. A further communication path between the two<br />

devices is provided by a dual-port SRAM. The port<br />

connected to the FPGA is used to store data without using<br />

up bandwidth on the AVR system data bus. Through this<br />

communication interface the AVR can access the<br />

configuration memory of the FPGA, allowing the user to<br />

re-configure the FPGA through the AVR. This represents<br />

the basis of dynamic reconfiguration [5].<br />

3. A DEMONSTRATIVE SPACE<br />

APPLICATION<br />

The demonstrator envisages the re-design of the Micro<br />

Controller Module (MCM) of an experiment module used<br />

for a scientific experiment. The original MCM is based on<br />

a common (static) OTP FPGA in addition to an 80KC186<br />

microcontroller. The re-design uses a D_FPGA with an<br />

embedded microcontroller, aiming to improve the system<br />

performance in terms of power, silicon area, system<br />

reliability and flexibility. In addition to the normal<br />

monitoring of the experiment parameters, the original<br />

MCM implements in software on the 80KC186 the control<br />

loops for Temperature, Pressure and<br />

Critical Heat Flux of an experiment chamber in a<br />

microgravity environment (µg). The original design<br />

featured a process control cycle timing of about 10 ms for<br />

the three controls. In the new design we are seeking to<br />

improve the time performance and to reduce the silicon<br />

usage as well.<br />

3.1 Hardware design<br />

The system be<strong>in</strong>g re-designed lends itself to a<br />

configuration composed of a static part, which implements<br />

basic functionalities such as memory management <strong>and</strong><br />

peripherals control <strong>and</strong> three dynamic modules which<br />

implement the controll<strong>in</strong>g processes of the experiment.<br />

In the new electronic design the D_FPGA replaces the<br />

existing OTP FPGA. The controlling processes<br />

(Temperature, Pressure, Critical Heat Flux) are migrated<br />

into the D_FPGA. The controlling processes run<br />

independently and each process is loaded into and unloaded<br />

from the D_FPGA at pre-defined time intervals. The<br />

initial loading of the static part and the subsequent<br />

loading/unloading of the d_modules are the responsibility of the<br />

configuration controller, which is implemented in the<br />

embedded microcontroller (AVR).<br />


Figure 2. Demonstrator D_FPGA<br />

(Block labels in the figure: AVR, WDT, to external memory, reconfiguration interface,<br />
DAC interface, ADC interface, to AD and DA converters, SuperMacro, static module,<br />
dynamic module (TEMPERATURE, PRESSURE, HEAT FLUX), save and restore register,<br />
finite state machine, to PC simulator.)<br />

The complete bitstream <strong>in</strong>clud<strong>in</strong>g the static part with the<br />

three d_modules is stored <strong>in</strong> the SRAM configuration<br />

memory <strong>in</strong>side the D_FPGA (Figure 4).<br />

3.2 Configuration Controller (CC)<br />

The implementation of the configuration controller is of<br />

great importance in the design flow of complex systems<br />

based on D_FPGA. The configuration controller<br />

represents the most critical part of a D_FPGA, and a<br />

malfunction in this part of the system would result<br />

in a serious loss of functionality.<br />

In the RECONF design flow of D_FPGA based systems<br />

one must carefully consider the different approaches for<br />

the implementation of the configuration controller. We<br />

have chosen an “embedded” implementation us<strong>in</strong>g the<br />

D_FPGA <strong>in</strong>ternal microcontroller as configuration<br />

controller because it leads to a system of lower<br />

complexity: there is no need of external hardware <strong>and</strong><br />

software to implement the functionalities of the<br />

configuration controller.<br />

During the start-up of the system the configuration<br />

controller is in charge of loading the static part from the<br />

internal SRAM (Figure 4). The static part always remains<br />

placed in the same area of the configuration SRAM. The first<br />

dynamic module is loaded along with the static part. The<br />

next d_module gets loaded in place of the previous one<br />

after a predefined time interval, while the rest of the system<br />

keeps working without interruption of functionality (see<br />

Figure 3).<br />

Figure 3. d_modules context switching<br />

(Plot of silicon area against time: the static part stays resident while d_mod1, d_mod2 and<br />
d_mod3 are loaded in turn. Trl = reload time, TL = latency time.)<br />

During the sequential loading/unloading of the d_modules<br />

the status of the respective data signals is saved first and<br />

then restored by means of registers implemented in the<br />

static part of the D_FPGA (Figure 2).<br />
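The context switching with save and restore described above can be sketched in software as<br />
follows. This is a behavioural model only: the class and method names are hypothetical, an<br />
integer counter stands in for the state of a control loop, and the actual configuration<br />
controller runs on the AVR and rewrites FPGA configuration memory, which is not modelled here.<br />

```python
from itertools import cycle


class ConfigurationController:
    """Round-robin loader for d_modules with save/restore of their state."""

    def __init__(self, module_names):
        # One save/restore register per d_module, kept in the static part.
        self.saved_state = {name: 0 for name in module_names}
        self._order = cycle(module_names)
        self.active = None      # d_module currently loaded
        self.live_state = None  # state of the loaded d_module

    def switch(self):
        """Unload the active d_module (saving its state) and load the next one."""
        if self.active is not None:
            self.saved_state[self.active] = self.live_state  # save first
        self.active = next(self._order)
        self.live_state = self.saved_state[self.active]      # then restore
        return self.active

    def step(self):
        """Stand-in for one iteration of the loaded control process."""
        self.live_state += 1


if __name__ == "__main__":
    cc = ConfigurationController(["temperature", "pressure", "heat_flux"])
    for _ in range(6):
        name = cc.switch()
        cc.step()
        print(name, cc.live_state)
```

Each d_module resumes from the state it had when it was last unloaded, which is the role of<br />
the save and restore register in the static part.<br />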

Figure 4. The CC is in charge of loading the d_modules from the Configuration SRAM<br />

(Block labels in the figure: configuration SRAM holding the static module, D_mod_1, D_mod_2<br />
and D_mod_3; FPGA holding the static module and D_module_x; CC; AVR.)<br />

3.3 Considerations on fault tolerance<br />

Several considerations have to be taken into account to<br />

increase the fault tolerance of the system. For the<br />

d_modules, the continuous re-loading eliminates any<br />

possible configuration error because each d_module gets<br />

refreshed at every re-configuration cycle. The static part,<br />

in contrast, cannot take advantage of the continuous<br />

reloading at each re-configuration cycle. To solve this<br />

problem the static part<br />

(now termed “pseudo-static”) can be implemented as a<br />

d_module with double redundancy and error detection<br />

circuitry, which generates a trigger signal to the<br />

configuration controller when the outputs of the two<br />

blocks are different (an error has occurred). In this case the<br />

configuration controller re-loads the whole module from<br />

the D_FPGA internal bitstream memory (see Fig. 5).<br />
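The detect-and-reload scheme for the pseudo-static part can be illustrated with a small<br />
behavioural model. In this Python sketch each redundant copy is represented by a look-up<br />
table, the stored image in bitstream memory by an immutable tuple, and the names<br />
`PseudoStaticModule`, `evaluate` and `reload` are hypothetical. It is a sketch of the idea,<br />
not the circuitry used in the demonstrator.<br />

```python
class PseudoStaticModule:
    """Two redundant copies of a module plus a comparator that triggers a reload."""

    def __init__(self, golden_config):
        self.golden = tuple(golden_config)  # image kept in bitstream memory
        self.copy_a = list(self.golden)
        self.copy_b = list(self.golden)
        self.reloads = 0

    def reload(self):
        """Configuration controller action: re-load the whole module."""
        self.copy_a = list(self.golden)
        self.copy_b = list(self.golden)
        self.reloads += 1

    def evaluate(self, address):
        """Return the module output, reloading first if the two copies disagree."""
        out_a, out_b = self.copy_a[address], self.copy_b[address]
        if out_a != out_b:          # error trigger to the configuration controller
            self.reload()
            out_a = self.copy_a[address]
        return out_a


if __name__ == "__main__":
    module = PseudoStaticModule([0, 1, 1, 0])
    module.copy_b[1] ^= 1           # inject a single configuration upset
    print(module.evaluate(1), module.reloads)
```

With only two copies the comparator can detect a mismatch but cannot tell which copy is<br />
faulty, which is consistent with re-loading the whole module rather than a single block.<br />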

Figure 5. Error detection and correction<br />

(Block labels in the figure: two Static copies inside the D_FPGA, Error and Re-load signals,<br />
bitstream memory, configuration controller.)<br />

4. CONCLUSION<br />

A design methodology based on ATMEL® technology<br />

<strong>and</strong> novel design tools has been presented. The software<br />

tools have been developed with<strong>in</strong> the framework of the<br />

European Community RECONF project.<br />

The entire reconfiguration process of the D_FPGA is carried<br />

out by a configuration controller, which can be generated<br />

automatically, or under user constraints, by the RECONF<br />

Front-End tools. The configuration controller is<br />

implemented in the internal microcontroller (AVR).<br />

The following table reports the results of the tests<br />

performed with the original implementation of the MCM<br />

and with the approach based on D_FPGA:<br />

Table 1. Comparison of the two designs.<br />

| | Original design | D_FPGA design | Improvement |<br />
| Complexity [used gates] | 60000 | 35000 | -42% area usage |<br />
| Power consumption | 110 mA (OTP FPGA + 80186) | 60 mA (FPGA + AVR) | -45% power reduction |<br />
| Time performances | 10 ms (controls cycle) | ~10 ms (controls cycle) | unchanged |<br />
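The improvement percentages in the comparison table follow directly from the measured values.<br />
The short Python sketch below recomputes them. The helper name `reduction_percent` is our own<br />
and the inputs are the gate counts and supply currents reported in the table.<br />

```python
def reduction_percent(original, new):
    """Relative reduction from original to new, in percent."""
    return (original - new) / original * 100


area_reduction = reduction_percent(60000, 35000)   # used gates
power_reduction = reduction_percent(110, 60)       # supply current in mA

if __name__ == "__main__":
    print(f"area: -{area_reduction:.1f}%  power: -{power_reduction:.1f}%")
```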


We have successfully demonstrated the applicability of<br />

dynamic reconfiguration to a real process control<br />

task. This design methodology is well suited to space<br />

applications, where the reduction of the silicon area and<br />

the power consumption of the FPGA system is of primary<br />

importance.<br />

5. REFERENCES<br />

[1] RECONF Dissemination WWW Server, http://www.reconf.org/<br />

[2] J. Manuel Moreno, “Dynamic Reconfigurable FPGAs”, Second Workshop on Reconfigurable<br />
Computing and Applications, Almunecar, Spain.<br />

[3] Rafal Kielbik, “High-Level Partitioning of Digital Systems Based on Dynamically<br />
Reconfigurable Devices”, FPL2002, Montpellier, France.<br />

[4] Atmel Corp., AT94K Series FPSLIC, June 2002.<br />

[5] Atmel Corp., AT94K Series Cache Logic (Mode 4) Configuration, July 2001.<br />



AN OVERVIEW OF SystemC FL<br />

K.L. Man<br />

Formal Methods Group, Department of Mathematics <strong>and</strong> Computer Science,<br />

E<strong>in</strong>dhoven University of Technology, P.O.Box 513, 5600 MB E<strong>in</strong>dhoven, The Netherl<strong>and</strong>s<br />

Email: kman@w<strong>in</strong>.tue.nl<br />

ABSTRACT<br />

This paper describes our on-going research. Recently, we developed<br />

an algebraic theory based on classical process algebra<br />

ACP, called SystemC FL , for the specification and analysis of<br />

SystemC designs. The semantics of SystemC FL is defined by<br />

means of deduction rules in a standard structured operational<br />

semantics style that associate a labelled transition<br />

system with a SystemC FL process. In this paper, we first provide<br />

an overview of the current status of SystemC FL and show<br />

some practical applications of SystemC FL , as well as some<br />

key features and results of SystemC FL . Then, we give an outline<br />

of the latest developments of SystemC FL and point out<br />

the direction for future work.<br />

1. INTRODUCTION<br />

Process algebras are useful tools for the specification and verification<br />

of various systems. Generally speaking, process algebras<br />

describe the behavior of processes, and provide operations<br />

that allow systems to be composed in order to obtain more complex<br />

systems. Moreover, the analysis and verification of systems<br />

described using process algebras can be partially or completely<br />

carried out by mathematical proofs using the equational<br />

theory.<br />

SystemC [1] is a model<strong>in</strong>g <strong>and</strong> simulation language (without<br />

formal semantics) based on C++ for hardware <strong>and</strong> system level<br />

design model<strong>in</strong>g. Recently, SystemC has received an extreme<br />

<strong>in</strong>crease <strong>in</strong> <strong>in</strong>dustrial acceptance for system specification <strong>and</strong><br />

simulation.<br />

The goal of develop<strong>in</strong>g a formal semantics is to provide a<br />

complete <strong>and</strong> unambiguous specification of the language. It<br />

also contributes significantly to the shar<strong>in</strong>g, portability <strong>and</strong> <strong>in</strong>tegration<br />

of various applications <strong>in</strong> simulation, synthesis <strong>and</strong><br />

formal verification.<br />

Over the years, research on formal semantics in the electronics community<br />

has mainly focused on Verilog, VHDL, and SystemC. In<br />

general, their definitions were based on Abstract State Machine<br />

(ASM) specifications and Denotational Semantics. For instance<br />

(as previous work for SystemC), the simulation semantics (including<br />

watching statement, signal assignment, and wait statement)<br />

of SystemC in the form of distributed Abstract State Machine<br />

(ASM) specifications and the Denotational Semantics for<br />

a synchronous subset of SystemC were studied by [9] and [10]<br />

respectively. It is generally believed that the structured operational<br />

semantics (SOS style) [4] is more intuitive, and the<br />

methods of ASM specifications and denotational semantics appear<br />

to be difficult to apply to describe the dynamic behavior<br />

of processes (as shown in [5]). Since processes are the basic<br />

units of execution within Verilog, VHDL, and SystemC that are<br />

used to simulate the behavior of a device or a system, process<br />

algebras with SOS style semantics are intuitively the best<br />

choices for giving formal specifications of systems in the electronics<br />

community.<br />

Last year, the introduction of SystemC FL [2], based on the classical process algebra ACP [3], for the specification and analysis of SystemC designs initiated an attempt to extend the knowledge and experience of the field of process algebras to the field of hardware/software co-design. The semantics of SystemC FL is defined by means of deduction rules in SOS style that associate a labelled transition system with a SystemC FL process. A set of properties of SystemC FL was presented for a notion of bisimilarity. Using this process-algebra-based formalism, we can give unambiguous specifications of SystemC designs that can potentially be analyzed with automated software tools. Furthermore, SystemC FL is meant to be usable for formal verification.<br />

The goal of this paper is to provide an overview of the current status of SystemC FL, to show some practical applications, key features and results of SystemC FL, as well as its latest developments.<br />

This paper is organized as follows. In Section 2, we review the syntax and formal semantics of SystemC FL. Section 3 illustrates the use of SystemC FL with some case studies taken from the literature. Some key features and results of SystemC FL are shown in Section 4. Finally, some concluding remarks are made in Section 5.<br />

2. FORMAL LANGUAGE SystemC FL<br />

In this section, we give an overview of the formal language SystemC FL proposed in [2] and [8]. For an extensive treatment of SystemC FL, the reader is referred to those papers.<br />

2.1 SystemC FL data types<br />

In order to define the semantics of SystemC FL processes, we need to make some assumptions about the data types. Let Var denote the set of all variables (x0, . . . , xn), and Value denote the set of all possible values (v0, . . . , vm), which contains at least B (booleans) and R (reals). A valuation is a partial function from variables to values (e.g. x0 ↦ v0). The set of all valuations is denoted by Σ. We also assume the set Ch of all channels and the set S of all sensitivity lists with clocks that may be used in SystemC FL processes.<br />
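As a minimal sketch (not part of the SystemC FL definition), a valuation can be pictured as a Python dictionary, which is naturally a partial function: only some variables of Var are mapped. The names `Valuation` and `update`, and the variable names and values, are illustrative assumptions.

```python
# Illustrative sketch: a valuation as a partial function from variables to values.
from typing import Dict, Union

Value = Union[bool, float]      # Value contains at least booleans and reals
Valuation = Dict[str, Value]    # partial: unmapped variables are simply absent

def update(sigma: Valuation, x: str, v: Value) -> Valuation:
    """Return a new valuation equal to sigma except that x maps to v."""
    new_sigma = dict(sigma)
    new_sigma[x] = v
    return new_sigma

sigma0: Valuation = {"x0": True}      # x0 maps to true; x1 is undefined
sigma1 = update(sigma0, "x1", 2.5)    # now x1 maps to 2.5 as well
```

Returning a fresh dictionary keeps the old valuation available, which matters later when a process carries both a previous and a current valuation.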

Notice that the data types proposed above are the fundamental ones. Several extensions of the data types (e.g. “sc_bit” and “sc_logic”) were already introduced in [6].<br />

2.2 Syntax of the SystemC FL Language<br />

A process term P in SystemC FL is built from atomic process terms AP. SystemC FL consists of various operators that operate on process terms, and it is defined according to the following grammar:<br />

0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

AP ::= δ | skip | x := e | ∆en | ≫<br />
P ::= AP | P ◭ b ◮ P | b � P | P • P | P Θ P<br />
| P △d P | P �d P | ∗P | P || P | P || ℓ P<br />
| P ∼ P | ∂H(P) | τI(P) | π(P) | �(P)<br />
| Υ � P<br />

The operators are listed in descending order of their binding strength as follows: {�, •, △, �, ∗}, {◭◮, Θ, ||, || ℓ, ∼}, {∂, τ, π, �, �}. The operators inside the braces have equal binding strength. In addition, operators of equal binding strength associate to the left, and parentheses may be used to group expressions.<br />

Below is a brief introduction to the formal language SystemC FL. The formal semantics of SystemC FL is given in Subsection 2.3. Due to page limitations, the deduction rules for the operational semantics of SystemC FL are not given in this paper. For more details, see [2] and [8].<br />

SystemC FL has the following atomic process terms and operators. The deadlock δ is introduced as a constant, which represents no behavior. The skip process term performs the internal action τ, which is not externally visible. The assignment process term x := e assigns the value of expression e to x (modeling a SystemC “assignment” statement). The delay process term ∆en is able to delay for the value of the numerical expression en. The unbounded delay process term ≫ (modeling a SystemC “wait” statement) may delay for an unbounded amount of time or perform the internal action τ.<br />

The conditional composition p ◭ b ◮ q operates as a SystemC “if then else” statement, where b denotes a boolean expression and p, q ∈ P. The watching process term b � p is used to model a SystemC “watching” statement. The sequential composition p • q models the process term that behaves as p and, upon termination of p, continues to behave as process term q. The alternative composition p Θ q models a non-deterministic choice between process terms p and q.<br />

The timeout process term p △d q (modeling a SystemC “time out” construct) behaves as p if p performs a time transition before a time d ∈ R>0. Otherwise, it behaves as q. The watchdog process term p �d q behaves as p during a period of time less than d; at time d, q takes over the execution from p in p �d q. If p performs an internal cancel action χ, then the delay is canceled, and the subsequent behavior is that of p after χ is executed. The repetition process term ∗p (modeling a SystemC “loop” construct) executes p zero or more times.<br />

The parallel composition p || q, the left-parallel composition p || ℓ q and the communication composition p ∼ q are used to express parallelism (actions are executed in an interleaving manner, with the possibility of communication of actions). The encapsulation of actions is allowed using ∂H(p), where H represents the set of all actions to be blocked in p. The abstraction τI(p) behaves as the process term p, except that all action names in I are renamed to the internal action τ. The maximal progress π(p) assigns action transitions a higher priority over time transitions. This operator is needed to establish a desired communication behavior. That is, both the sender and the receiver must be able to perform time transitions, but if the two can communicate (i.e. perform action transitions), they should not perform time transitions.<br />
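The priority rule behind maximal progress can be sketched as a filter over the transitions that are currently enabled. This is only an illustration of the idea; the tuple encoding of transitions and the function name `maximal_progress` are assumptions, not the deduction rule from [2] or [8].

```python
# Illustrative sketch of the maximal progress idea: action transitions
# take priority over time transitions.
from typing import List, Tuple

Transition = Tuple[str, object]   # ("action", name) or ("time", duration)

def maximal_progress(enabled: List[Transition]) -> List[Transition]:
    """Keep only action transitions when at least one is enabled;
    otherwise leave the time transitions untouched."""
    actions = [t for t in enabled if t[0] == "action"]
    return actions if actions else list(enabled)

# A sender/receiver pair that can either communicate or let time pass:
print(maximal_progress([("action", "comm"), ("time", 1.0)]))
# With no action enabled, delaying is still allowed:
print(maximal_progress([("time", 1.0)]))
```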

The grouping of actions and their execution in one step can be done by using �(p). This is defined for the translation from SystemC FL to PROMELA (see also [15]). The signal emission operator Υ � p requires that the predicate Υ always holds. If this is the case, Υ � p behaves like p. Otherwise, it behaves as δ. This operator is newly introduced. It is needed for defining a user-friendly syntax and the translation from SystemC FL to the SMV language (see also [16]).<br />

The informal semantics of SystemC states that SystemC incorporates both point-to-point communication and multi-party communication mechanisms for the interaction between concurrent processes. However, there are no statements in SystemC for modeling these communication mechanisms. In order to capture the same communication behaviors between concurrent processes in SystemC FL, the operators ||, || ℓ, ∼, ∂H, τI, and π are introduced to give a formal specification of point-to-point communication and multi-party communication between concurrent SystemC FL processes.<br />

2.3 Semantics of the SystemC FL Language<br />

Definition 1 A SystemC FL process is a quintuple 〈P, Σ, Σ, S, Ch〉. We use the convention 〈p, σ′, σ, s, m〉 to write a SystemC FL process, where p is a process term; σ, σ′ are valuations; s is a sensitivity list with clocks; and m is a channel.<br />

Definition 2 The set of actions Aτ contains at least aa(x, v), χ and τ, where aa(x, v) is the assignment action (i.e. the value v is assigned to x), χ is the internal cancel action and τ is the internal action. The set Aτ is considered as a parameter of SystemC FL and can be freely instantiated.<br />

Definition 3 We give a formal semantics for SystemC FL processes in terms of a Labelled Transition System (LTS), and define the following transition relations on processes of SystemC FL:<br />
• −→ ⊆ (P × Σ × Σ × S × Ch) × Aτ × (P × Σ × Σ × S × Ch), denotes action transition;<br />
• −→ 〈✓, , , , 〉 ⊆ (P × Σ × Σ × S × Ch) × Aτ × (Σ × Σ × S × Ch), denotes termination, where ✓ is used to indicate a successful termination, and ✓ is not a process term;<br />
• � ⊆ (P × Σ × Σ × S × Ch) × R>0 × (P × Σ × Σ × S × Ch), denotes time transition (so-called delay).<br />

The three kinds of transition relations can be explained as follows. Firstly, an action transition 〈p, σ′, σ, s, m〉 a −→ 〈p′, σ, σ′′, s, m〉 means that the process 〈p, σ′, σ, s, m〉 executes the action a ∈ Aτ starting with the current valuation σ (at the moment the transition takes place), and by this execution p evolves into p′. Notice that σ′ represents the previous accompanying valuation of the process, and σ′′ represents the accompanying valuation of the process after the action a is executed. Secondly, a termination 〈p, σ′, σ, s, m〉 a −→ 〈✓, σ, σ′′, s, m〉 means that the process executes the action a followed by termination. Thirdly, a time transition 〈p, σ′, σ, s, m〉 d � 〈p′, σ, σ′′, s, m〉 means that the process 〈p, σ′, σ, s, m〉 may idle during a time d and then behaves like 〈p′, σ, σ′′, s, m〉.<br />
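The bookkeeping of the two valuations in a terminating action transition can be sketched for an assignment as follows. This is a hypothetical encoding under the conventions just described, not the deduction rule itself (the rules are given in [2] and [8]); the names `Process` and `assign_step` and the string "terminated" standing in for the termination symbol are assumptions.

```python
# Hypothetical encoding of a terminating assignment step on a quintuple.
from typing import Callable, Dict, NamedTuple, Tuple

Valuation = Dict[str, object]

class Process(NamedTuple):
    term: str               # process term p (kept as text in this sketch)
    prev: Valuation         # previous accompanying valuation (sigma')
    cur: Valuation          # current valuation (sigma)
    sens: Tuple[str, ...]   # sensitivity list s
    chan: str               # channel m

def assign_step(proc: Process, x: str,
                e: Callable[[Valuation], object]) -> Tuple[tuple, Process]:
    """Execute x := e: the label is aa(x, v); sigma moves to the 'previous'
    slot and the updated valuation sigma'' becomes the current one."""
    v = e(proc.cur)
    new_cur = {**proc.cur, x: v}
    label = ("aa", x, v)
    return label, Process("terminated", proc.cur, new_cur, proc.sens, proc.chan)

p = Process("count_val := count_val + 1", {}, {"count_val": 3}, ("clock",), "m")
label, q = assign_step(p, "count_val", lambda s: s["count_val"] + 1)
```

After the step, `q.prev` holds the valuation the action started from and `q.cur` the valuation it produced, while the sensitivity list and channel are unchanged.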

Two valuations (i.e. the previous accompanying valuation and the current valuation) are defined (as arguments) in the quintuple, so that a change in the valuation of the variables in the sensitivity list of the quintuple can be observed. This is needed for defining the deduction rules of some SystemC FL operators (e.g. the watching �).<br />
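Why two valuations are needed can be illustrated with a small sketch: a change on a sensitive variable is only visible by comparing the previous and the current valuation. The function names and the reading of a clock condition as a positive-edge check are assumptions made for illustration; they are not definitions from SystemC FL.

```python
# Illustrative sketch: observing changes on sensitive variables.
from typing import Dict, Iterable

Valuation = Dict[str, object]

def changed(prev: Valuation, cur: Valuation, sens: Iterable[str]) -> bool:
    """True if any variable in the sensitivity list differs between
    the previous and the current valuation."""
    return any(prev.get(x) != cur.get(x) for x in sens)

def pos_edge(prev: Valuation, cur: Valuation, clock: str = "clock") -> bool:
    """Hypothetical positive-edge check: clock was false and is now true."""
    return prev.get(clock) is False and cur.get(clock) is True

prev, cur = {"clock": False, "load": True}, {"clock": True, "load": True}
```

With a single valuation, neither check could be expressed, since both depend on the value a variable had before the last transition.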

Notice that modeling designs with the syntax presented in Subsection 2.2 may not be straightforward and intuitive for users without a computer science background. Therefore, in [16], we defined a user-friendly syntax for SystemC FL.<br />


3. CASE STUDIES<br />

Some complex case studies (memory, synchronous D flip-flop and remote procedure call (RPC) protocol) of the application of SystemC FL to the modeling and analysis of SystemC FL designs already appeared in [6]. For illustration purposes, in this section we only give a simple case study, a counter and test-bench, to show how SystemC FL can be put to practical use to model the fundamental SystemC processes.<br />

3.1 Method process<br />

When events occur on signals that a process is sensitive to, the process executes. Below is an example (written in SystemC code) that implements an integer counter.<br />

// count.h
#include "systemc.h"
SC_MODULE(count) {
  // Port element types are assumed: bool for control signals, int for data.
  sc_in<bool> load;    // load enable
  sc_in<int> din;      // data input
  sc_in<bool> clock;   // clock input
  sc_out<int> dout;    // data output
  int count_val;       // internal counter state
  void count_up();     // process body
  SC_CTOR(count) {
    SC_METHOD(count_up);      // register count_up as a method process
    sensitive_pos << clock;   // assumed: triggered on the positive clock edge
  }
};


where σ′, σ and s are defined as in Subsection 3.1. If Condclock evaluates to true, the outermost � process term executes. Then the innermost � process term also executes. First, the values 0 and true are assigned to din and load, respectively, and the process then performs some delay. After that, false is assigned to load and the process performs another delay.<br />

Since the processes (counter and test-bench) execute concurrently, the parallel composition operator is used to model the complete system. The complete system is modeled as follows:<br />
〈Condclock(σ′, σ, s) � (((countval := din ◭ load ◮ countval := countval + 1) • dout := countval) || (true � (load := true • din := 0 • ≫ • load := false • ≫))), σ′, σ, s〉<br />
for some σ′, σ, s (defined as before).<br />
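The counter part of this term, (countval := din ◭ load ◮ countval := countval + 1) • dout := countval, can be read as a state update on a valuation. The sketch below only mimics that reading on a Python dictionary; the clock guard, the test-bench and the delays are not modeled, and the function name `count_up` is borrowed from the SystemC listing for convenience.

```python
# Illustrative sketch: one activation of the counter as a valuation update.
from typing import Dict

def count_up(sigma: Dict[str, object]) -> Dict[str, object]:
    """Load din if load holds, otherwise increment; then copy the
    result to dout (conditional followed by sequential composition)."""
    new = dict(sigma)
    if new["load"]:
        new["countval"] = new["din"]
    else:
        new["countval"] = new["countval"] + 1
    new["dout"] = new["countval"]
    return new

s = {"load": True, "din": 0, "countval": 7, "dout": 7}
s = count_up(s)                       # load: countval and dout become 0
s = count_up({**s, "load": False})    # count: countval and dout become 1
```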

4. KEY FEATURES AND RESULTS<br />

This section presents some key features and results of SystemC FL.<br />

A key feature of SystemC FL is that it is a single formalism that can be used to describe various systems. Another key feature is that formal SystemC FL specifications are meant to be usable for formal verification. Analysis/formal verification takes place by extracting simpler designs from SystemC FL designs that are tailored to some specific properties of interest. Moreover, it can be specifically used for specifying concurrent systems, finite state systems and real-time systems. Desired properties of these systems/designs modeled in SystemC FL can be verified with existing formal verification tools (freely available in the public domain) by translating them formally into the formalisms that are the input languages of those tools.<br />

For instance, [15] reported that safety properties of concurrent systems modeled in SystemC FL can be verified by translating them to PROMELA, the input language of the SPIN Model Checker. Similarly, safety and liveness properties of finite state systems described in SystemC FL can be fed into the SMV [12] Model Checker to verify them (reported in [16]). Also, a formal translation was defined in [7] from SystemC FL to a variant (with very general settings) of timed automata [13]. The practical benefit of the formal translation from a SystemC FL design (modeling real-time systems) to a timed automaton is to enable verification of properties of real-time systems described in SystemC FL using existing verification tools for timed automata, such as Uppaal [14].<br />

5. CONCLUDING REMARKS<br />

In this paper, we have presented the main aspects of the current status of the formal language SystemC FL. Then, we have illustrated the use of SystemC FL through some case studies taken from the literature. We have also shown some key features and results of SystemC FL.<br />

Our future work will focus on defining the formal semantics (also in SOS style) in SystemC FL for modeling mixed-signal designs. This semantics is intended to support the development of system-level analog and mixed-signal specifications, and will be a conservative extension of the existing SystemC FL.<br />

We hope that SystemC FL will also be widely used by chip designers and verification engineers in the electronics community for the specification and analysis of SystemC designs and various systems.<br />

6. REFERENCES<br />

[1] “SystemC User’s Guide and SystemC Language Reference Manual (version 2.0),” available at http://www.systemc.org.<br />
[2] K.L. Man, “SystemC FL: Formalization of SystemC,” in IEEE Proceedings of the 12th Mediterranean Electrotechnical Conference - IEEE/MELECON 2004, Dubrovnik, Croatia, Vol. 1, pp. 201-204, May, 2004.<br />
[3] J.C.M. Baeten, W.P. Weijland, “Process Algebra,” Number 18 in Cambridge Tracts in Theoretical Computer Science, Cambridge University Press, 1990.<br />
[4] Gordon D. Plotkin, “A Structural Approach to Operational Semantics,” Report DAIMI FN-19, Computer Science Department, Aarhus University, 1981.<br />
[5] Luca Aceto, Wan Fokkink, Chris Verhoef, “Structural Operational Semantics,” in Bergstra et al. BPS01, pp. 197-292, 1999.<br />
[6] K.L. Man, “Modeling with the Formal Language of SystemC: Case Studies,” in Proceedings of the 11th International Conference Mixed Design of Integrated Circuits and Systems - IEEE/MIXDES 2004, Szczecin, Poland, pp. 407-411, June, 2004.<br />
[7] K.L. Man, “Analyzing SystemC FL Designs Using Timed Automata,” in INSPEC IEE Proceedings of the 9th Baltic Electronics Conference - IEEE/BEC 2004, Tallinn, Estonia, pp. 155-158, October, 2004.<br />
[8] K.L. Man, “The Formal Communication Semantics of SystemC FL,” to appear in IEEE Proceedings of the 8th Euromicro Conference on Digital System Design - IEEE/DSD 2005, Porto, Portugal, September, 2004.<br />
[9] W. Mueller, J. Ruf, D. Hofmann, J. Gerlach, T. Kropf, W. Rosenstiehl, “The Simulation Semantics of SystemC,” in Proceedings of DATE, 2001.<br />
[10] Ashraf Salem, “Formal Semantics of Synchronous SystemC,” in Proceedings of DATE, 2003.<br />
[11] G. J. Holzmann, The SPIN Model Checker, Addison-Wesley, 2003.<br />
[12] The SMV model checker is available at http://www-2.cs.cmu.edu/~modelcheck/.<br />
[13] R. Alur, D.L. Dill, “A Theory of Timed Automata,” Theoretical Computer Science, Vol. 126, No. 2, pp. 183-236, April, 1994.<br />
[14] Kim G. Larsen, Paul Pettersson, Wang Yi, “UPPAAL in a Nutshell,” Journal of Software Tools for Technology Transfer (STTT), Vol. 1, No. 1-2, pp. 134-152, 1997.<br />
[15] K.L. Man, “Formal Verification of SystemC FL Designs Using the SPIN Model Checker,” unpublished, March 2005.<br />
[16] K.L. Man, “Verifying SystemC FL Designs Using the SMV Model Checker,” in Proceedings of the 8th IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems - IEEE/DDECS 2005, Sopron, Hungary, pp. 244-247, April, 2005.<br />



A 600mV 1.32mW 75dB-DR 4th-order Baseband Analog Filter for UMTS Receivers<br />

M. De Matteis (Ph.D. Student), S. D’Amico (Ph.D. Student), A. Baschirotto<br />
Department of Innovation Engineering<br />
University of Lecce, Italy<br />
Email: marcello.dematteis@unile.it, stefano.damico@unile.it, andrea.baschirotto@unile.it<br />

ABSTRACT<br />

In this paper a 4th-order low-pass analog filter for UMTS receivers operating from a 600mV supply voltage is presented. The filter is designed using two 2nd-order filtering stages based on biquadratic cells, where a suitable level shifter allows rail-to-rail input and output swing to be achieved while operating at a supply voltage of Vth+3Vov. From a single 600mV supply voltage in a 0.13µm CMOS technology, the single-ended analog filter features a THD of -49dB for a 480mVpp output signal amplitude.<br />

1. INTRODUCTION<br />

Low-voltage and low-power issues are increasingly important in the design and realization of portable terminals.<br />

The state of the art operates at 1.8V, while research activities are designing blocks and complete transceivers operating with a power supply of 1.2V or lower [1]. This trend is also driven by the lower supply voltage that scaled CMOS technology can sustain.<br />

In this paper the design of the analog baseband filter is addressed. This block is critical because at this point of the receiver chain the signal has already been amplified, so a large linear range has to be guaranteed.<br />

For this design the UMTS case has been developed, with reference to the front-end designed in [2]. In this case, the receiver requires a 4th-order Butterworth low-pass filter whose transfer function parameters are given in Table I.<br />

Table I. LP Filter Transfer Function<br />
Parameter Cell 1 Cell 2<br />
f-3dB 2.11MHz 2.11MHz<br />
Adc 16dB 16dB<br />
Q (quality factor) 1.3 0.5411<br />
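The two Q values in Table I agree with the standard factorization of a 4th-order Butterworth response into two second-order sections, where each section has Q = 1/(2 cos θ) and θ is the angle of its pole pair from the negative real axis. The sketch below computes these values; the function name `butterworth_q` is an assumption, and only even orders are handled.

```python
# Sketch: Q factors of the biquad sections of an even-order Butterworth filter.
import math

def butterworth_q(order: int) -> list:
    """Q of each second-order section of an even-order Butterworth low-pass."""
    if order % 2 != 0:
        raise ValueError("this sketch handles even orders only")
    return [1.0 / (2.0 * math.cos((2 * k - 1) * math.pi / (2 * order)))
            for k in range(1, order // 2 + 1)]

qs = butterworth_q(4)   # approximately [0.541, 1.307]
```

The list is ordered from the lowest to the highest Q, so the second entry corresponds to Cell 1 (Q = 1.3) and the first to Cell 2 (Q = 0.5411).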

The stringent requirements on noise<br />

(IRN


In order to reduce VDDmin, the current source made up of MB has been added. This current source (together with the control loop with the low-pass filter) allows VoutDC to be fixed at VDD/2.<br />

As a consequence, the virtual ground principle forces Vopamp_inDC=VB, which is fixed at VB=Vov=200mV, i.e. the MB saturation/overdrive voltage. It follows that:<br />
VDDmin = Vov + VGSinputpair + Vsat_currentsource = Vth + 3Vov (6)<br />
which is much lower than VDDmin in (5).<br />

To fix the DC voltage at the opamp negative input, a biasing circuit has been added. This control-loop circuit acts only at very low frequencies; in fact, the RC pair in the control loop places a zero at DC and a pole at low frequency (2kHz).<br />

The filter input signal is a wideband signal (approximately 2MHz), so although the control-loop circuit acts as a high-pass filter, there is no significant performance degradation in the receiver.<br />

There is another aspect that has been considered in designing the control-loop circuit.<br />

In UMTS receivers the filter input signal is usually a baseband signal, which includes the zero frequency. This can cause many problems in terms of offset voltage.<br />

The natural solution is to implement a high-pass filter with a cut-off frequency in the kHz range. In the filter presented in this paper, the control loop has been designed to place the zero-pole pair so as to cut off frequencies up to a few kHz.<br />

The control-loop pole position in the frequency domain depends on:<br />
1- the DC gain of the control-loop opamp (Adc_loopOPAMP);<br />
2- the transconductance (gm) of the MOSFET MB;<br />
3- the feedback resistance (Rf);<br />
4- the RC time constant.<br />

In order to minimize the RC value, a unity-gain opamp configuration has been used in the control loop. The pole frequency is then:<br />
flow_frequency = (1 − Gloop) · 1/(2πRC)<br />
where Gloop = −gm · Adc_loopOPAMP · Rf.<br />
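The pole-frequency expression can be evaluated directly. In the sketch below the function name and all numeric values except Rf (taken from Cell 1 of Table III) are hypothetical and chosen only to show the order of magnitude; they are not design values from the paper.

```python
# Sketch: control-loop pole frequency from the expression in the text.
import math

def loop_pole_frequency(gm: float, a_dc: float, r_f: float,
                        r: float, c: float) -> float:
    """f = (1 - Gloop) / (2*pi*R*C) with Gloop = -gm * Adc * Rf."""
    g_loop = -gm * a_dc * r_f
    return (1.0 - g_loop) / (2.0 * math.pi * r * c)

# Hypothetical values: gm = 1 mS, unity-gain opamp, Rf = 21.8 kOhm (Cell 1),
# R = 1 MOhm and C = 1 nF chosen only for illustration.
f_pole = loop_pole_frequency(1e-3, 1.0, 21.8e3, 1e6, 1e-9)
```

Because Gloop is negative, the factor (1 − Gloop) is larger than one, so the loop moves the pole above the bare RC corner 1/(2πRC); with these illustrative numbers the result lands at a few kHz.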

These additional singularities in the overall frequency response allow the front-end DC-offset requirements to be relaxed without significant signal degradation [3].<br />

In the following design in a 0.13µm CMOS technology, Vth is around 300mV. Using Vov=90mV, the safe value of VDDmin=600mV is used in order to guarantee the target performance over temperature, aging, and technological (worst-case) spread.<br />
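Plugging the quoted values into Eq. (6) gives a quick check of the headroom. The helper name `vdd_min_mv` is an assumption, and reading the difference to 600mV as margin for spread is our interpretation of the word "safe" in the text.

```python
# Sketch: minimum supply from Eq. (6), VDDmin = Vth + 3*Vov, in millivolts.
def vdd_min_mv(vth_mv: int, vov_mv: int) -> int:
    """Return Vth + 3*Vov; integer millivolts avoid rounding noise."""
    return vth_mv + 3 * vov_mv

vdd_min = vdd_min_mv(300, 90)   # 570 mV
margin = 600 - vdd_min          # 30 mV below the 600 mV supply
```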


Figure 1. LV Biquadratic cell (schematic; labeled elements: R1, R2, Rf, RL, C1, C2, CL, MB, VB, VinDC, VoutDC, VO)<br />

2.1 Low Voltage Operational Amplifier<br />

The opamp to be embedded in the multi-path cell of Figure 1 is shown in Figure 2 [4]. It has to satisfy the low-voltage requirements described in the previous section, in addition to the frequency response, noise and linearity requirements of the UMTS application.<br />

The opamp is composed of two gain stages. The first gain stage is composed of a differential P-MOS pair (M1 and M2) and an N-MOS pair used as a current mirror (M3 and M4) to implement the “differential-to-single-ended” transformation. The second stage is a common-source stage (M12-M11) to maximize the output swing. Between the two gain stages, there are the transistors M8 and M5. M5 is used as a cascode device to fix VDSM4.<br />

The opamp UGBW is designed to be about 100 times higher than the filter cut-off frequency in order to reduce the effects of its lag on the filter frequency response. The opamp input-referred noise reaches the output with a gain equal to 2, and it has consequently been kept low (up to 3nV/√Hz) in order to make it negligible with respect to the other noise contributions.<br />

Table II. LP Filter Transfer Function<br />
Power supply 600mV<br />
Current consumption 1.1mA<br />
UGBW 200MHz<br />
Adc 60dB<br />
IRN 3nV/√Hz<br />
Phase Margin 60°<br />


Figure 2. Low Voltage OPAMP Structure<br />

3. SIMULATION RESULTS<br />

The 4th-order filter has been designed using a 0.13µm standard CMOS technology, with VTH≈300mV. The filter is designed to operate with VDD=600mV. A single-ended implementation is used since it is sufficient to demonstrate the low-voltage design approach.<br />

On the other h<strong>and</strong> the l<strong>in</strong>earity performance given <strong>in</strong><br />

the follow<strong>in</strong>g may validate such a design approach. Of<br />

course a performance improvement is expected with a<br />

fully-differential implementation.<br />

The values of the capacitors and resistors (given in<br />
Table III) have been calculated with a MATLAB<br />
procedure in order to satisfy the frequency response,<br />
noise and linearity requirements. The noise of<br />
the 1st cell is dominated by the thermal noise of R1 and<br />
R2. On the other hand, due to the large gain of the 1st<br />
cell, the noise requirement of the 2nd cell is relaxed, so its<br />
resistors can be larger. The overall filter transfer<br />
function is in agreement with the specifications.<br />

0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

Table III. Component Sizes<br />
Component | Cell 1 | Cell 2<br />
C1 [pF] | 62.9 | 5.9<br />
C2 [pF] | 1.25 | 0.64<br />
R1 [kΩ] | 3.46 | 12.7<br />
R2 [kΩ] | 3.29 | 19<br />
Rf [kΩ] | 21.8 | 80<br />
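As a quick plausibility check on the component values, the sketch below estimates the DC gain of each cell. It assumes, as is usual for an inverting active-RC cell, that the DC gain magnitude is Rf/R1; the paper does not state this relation explicitly, so treat it as an assumption. Under that assumption each cell contributes roughly 16 dB and the cascade roughly 32 dB, in line with the Adc entry of Table IV.

```python
import math

# Resistor values copied from Table III (in kOhm).
cells = {
    "cell 1": {"R1": 3.46, "Rf": 21.8},
    "cell 2": {"R1": 12.7, "Rf": 80.0},
}


def to_db(voltage_ratio):
    """Convert a voltage ratio to decibels."""
    return 20.0 * math.log10(voltage_ratio)


# ASSUMPTION (not stated in the paper): |DC gain| of a cell = Rf / R1.
cell_gains_db = {name: to_db(v["Rf"] / v["R1"]) for name, v in cells.items()}

# Gains of cascaded cells add in dB.
total_gain_db = sum(cell_gains_db.values())

for name, gain in cell_gains_db.items():
    print(f"{name}: {gain:.1f} dB")
print(f"cascade: {total_gain_db:.1f} dB")
```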

The linearity performance is reported in<br />
Figure 4. Good linearity is achieved:<br />
a THD=-46dB is obtained for a 480mVpp<br />
sinewave, i.e. within 50mV of the rail.<br />
In Table IV, the proposed filter is compared to a<br />
significant example from the literature [1].<br />


4. CONCLUSIONS<br />

In this paper a 600mV continuous-time filter is<br />
proposed for embedding in a UMTS receiver. The<br />
filter has a multi-path active-RC structure with a<br />
voltage level shift that allows the structure to be<br />
optimally biased. This makes it possible to operate the design with<br />
VDDmin=Vth+3·Vov and to guarantee rail-to-rail output<br />
swing. Using a single-ended implementation, the THD<br />
is -49dB for a 480mVpp output signal swing. This<br />
demonstrates the validity of the design approach,<br />
which may be extended to a fully-differential<br />
implementation.<br />

Table IV - Filter Performance Summary<br />
Parameter | This Design | [1]<br />
VDD [V] | 0.6 | 0.6<br />
Technology [µm] | 0.13 CMOS | 0.18 CMOS<br />
VTH [mV] | 300 |<br />
Current Consumption [mA] | 2.2 | 4.3<br />
Power Consumption [mW] | 1.32 | 2.58<br />
Filter order | 4th | 5th<br />
f-3dB | 2.11MHz | 135kHz<br />
Adc [dB] | 32 | 0<br />
IRN [nV/√Hz] | 17 | 30<br />
THD [dB] | -49@170mVrms | -50@50mVrms<br />
In-Band IRN [µVrms] | 30 | 65<br />
DR [dB] | 75@THD=-49dB | 58@THD=-50dB<br />
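The entries of Table IV can be cross-checked with a few lines of arithmetic. The sketch below assumes that the power is simply VDD times the supply current, that the dynamic range is the ratio of the rms signal (at the quoted THD) to the in-band rms noise, and that the 480mVpp test signal is a pure sine wave. These relations are standard definitions rather than statements taken from the paper.

```python
import math


def vpp_to_vrms(v_pp):
    """RMS value of a sine wave with peak-to-peak amplitude v_pp."""
    return v_pp / (2.0 * math.sqrt(2.0))


def power_mw(vdd_v, current_ma):
    """Static power in mW from supply voltage (V) and current (mA)."""
    return vdd_v * current_ma


def dynamic_range_db(signal_vrms, noise_vrms):
    """Dynamic range as the rms signal-to-noise ratio in dB."""
    return 20.0 * math.log10(signal_vrms / noise_vrms)


# Values copied from Table IV.
rows = {
    "this design": {"vdd_v": 0.6, "i_ma": 2.2, "sig_vrms": 170e-3, "noise_vrms": 30e-6},
    "[1]": {"vdd_v": 0.6, "i_ma": 4.3, "sig_vrms": 50e-3, "noise_vrms": 65e-6},
}

summary = {
    name: {
        "power_mw": power_mw(r["vdd_v"], r["i_ma"]),
        "dr_db": dynamic_range_db(r["sig_vrms"], r["noise_vrms"]),
    }
    for name, r in rows.items()
}

# 480 mVpp sine wave expressed in V rms (Table IV quotes 170 mVrms).
swing_vrms = vpp_to_vrms(480e-3)

for name, s in summary.items():
    print(f"{name}: {s['power_mw']:.2f} mW, DR = {s['dr_db']:.1f} dB")
print(f"480 mVpp = {swing_vrms * 1e3:.0f} mVrms")
```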

Figure 3. Filter frequency response (magnitude in dB versus frequency in Hz)<br />


Figure 4. Filter HD2 and HD3 vs. output signal amplitude<br />


References<br />
[1] S. Chatterjee, Y. Tsividis, P. Kinget, "A 0.5V Filter with PLL-Based Tuning in 0.18µm CMOS", Proceedings of ISSCC 2005<br />
[2] D. Mastretta, R. Castello, F. Gatta, P. Rossi and F. Svelto, "A 0.18µm CMOS Direct-Conversion Receiver Front-End for UMTS", Proceedings of ISSCC 2002<br />
[3] A. Springer, L. Maurer, and R. Weigel, "RF System Concepts for Highly Integrated RFICs for W-CDMA Mobile Radio Terminals", IEEE Trans. on Microwave Theory and Techniques, Jan. 2002<br />
[4] A. Baschirotto, R. Castello, "A 1V 1.8MHz CMOS Switched-opamp SC filter with rail-to-rail output swing", IEEE J. of Solid-State Circuits, December 1997, pp. 1979-1986<br />



0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

A 2.45 GHz Remotely Powered RFID System<br />

Jari-Pascal Curty, Norbert Joehl, Catherine Dehollain, Michel Declercq<br />
Laboratoire d’électronique générale, STI/IMM/LEG,<br />
Ecole Polytechnique Fédérale de Lausanne, Switzerland<br />

E-mail: jari.curty@epfl.ch<br />

ABSTRACT<br />

This paper presents a fully integrated, remotely powered<br />
and addressable Radio Frequency IDentification (RFID)<br />
transponder working at 2.45 GHz. The achieved operating<br />
range at 4 W Effective Isotropically Radiated Power<br />
(EIRP) base-station transmit power is 12 m. The Integrated<br />
Circuit (IC) is implemented in a 0.5 µm silicon-on-sapphire<br />
technology. A state-of-the-art rectifier design<br />
achieving 37% global efficiency is embedded to supply<br />
energy to the transponder. The input power necessary<br />
to operate the transponder is about 2.7 µW.<br />

1. Introduction<br />

The IC presented in this article offers a high-performance<br />
solution for long-range RFID applications. A<br />
reading distance of 12 m has been achieved using state-of-the-art<br />
IC design and antenna techniques. Examples of<br />
existing RFID technology at 2.45 GHz are described in [1],<br />
[2], [3].<br />

Figure 1: Operational principle (the reader sends power and the ID of transponder B to transponders A, B and C, and receives data from transponder B).<br />

The general application framework is described in<br />
Fig. 1. The concept is to address a group of hard-coded<br />
passive transponders in order to wake up only<br />
one of them. The reader sends an N-bit address (ID)<br />
and only the transponder that contains this ID wakes<br />
up; all others switch into quiet mode. If a transponder<br />
with the emitted ID is found, it starts modulating<br />
its input impedance to enable backscattering communication<br />
with the reader. The latter can then verify the<br />
transponder’s ID to check that it matches the one sent. Every<br />
step in a communication session is reader controlled.<br />
Transponders, which retrieve their operating power from<br />
the reader’s transmitted energy, thus operate in a master-slave<br />
configuration.<br />
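The addressing scheme described above can be illustrated with a small behavioural model. This is only a sketch: the class name, field names and the 4-bit example IDs are hypothetical and are not taken from the paper, which does not specify the protocol at this level of detail.

```python
from dataclasses import dataclass


@dataclass
class Transponder:
    """Toy model of a passive tag with a hard-coded ID (names are hypothetical)."""

    hard_coded_id: int
    awake: bool = False

    def receive_address(self, address: int) -> None:
        # Wake up only on an exact ID match; otherwise switch to quiet mode.
        self.awake = address == self.hard_coded_id

    def backscatter_id(self):
        # An awake tag answers with its own ID; a quiet tag stays silent.
        return self.hard_coded_id if self.awake else None


def reader_session(tags, address):
    """Reader-controlled session: broadcast an address, then verify the reply."""
    for tag in tags:
        tag.receive_address(address)
    replies = [r for r in (t.backscatter_id() for t in tags) if r is not None]
    # The reader checks that exactly one tag answered with the ID it sent.
    return len(replies) == 1 and replies[0] == address


tags = [Transponder(0b1010), Transponder(0b0110), Transponder(0b0001)]
found = reader_session(tags, 0b0110)    # a tag with this ID exists
missing = reader_session(tags, 0b1111)  # no tag carries this ID
```

In this model the reader drives every step, which mirrors the master-slave configuration mentioned in the text.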

2. Transponder description<br />

Figure 2: System architecture (blocks: rectifier and limiter, data slicer, IF oscillator, power-on reset, shift register and logic).<br />

Due to tight power constraints, the number of building<br />
blocks and their complexity are limited. The reader-to-tag<br />
communication link is implemented using a tiny pulse-width<br />
demodulator. The backscattering technique [4] is<br />
used for the tag-to-reader communication link. Because<br />
of 1/f noise, the tag data are modulated<br />
at an Intermediate Frequency (IF). The IF<br />
is generated by a typical relaxation oscillator<br />
structure. The latter is technology dependent and the<br />
frequency obtained may vary from die to die. One could<br />
solve this issue by using a calibration technique, at the<br />
cost of a longer protocol preamble and higher circuit<br />
complexity. Another possibility is to use a wide-bandwidth IF<br />
filter at the reader side, but the sensitivity of the<br />
system would be degraded by the higher noise level. Our<br />
approach is to develop a solution based on the architecture<br />
shown in Fig. 3.<br />
Figure 3: IF demodulation architecture (LNA, mixer, bandpass filter, A/D converter, micro-controller µC, RSSI, and VCO at frequency fC).<br />
The IF bandwidth of the loop is<br />
kept to the minimum that ensures the desired data rate.<br />
The micro-controller sweeps the voltage-controlled oscillator<br />
frequency between fC and fC + (IFmax − IFmin)<br />
until the Received Signal Strength Indicator (RSSI) circuit<br />
detects signal power located at the tag’s IF<br />
frequency.<br />
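The frequency sweep performed by the micro-controller can be sketched as a simple search loop. Everything below is illustrative: the function names, the step size, the detection threshold and the toy RSSI model are assumptions made for the example, not details given in the paper.

```python
def sweep_for_tag(f_c, if_min, if_max, step, rssi, threshold):
    """Step the VCO from f_c to f_c + (if_max - if_min).

    Returns the first VCO frequency at which rssi(f_vco) reaches the
    threshold, or None if the whole span is swept without a detection.
    """
    n_steps = int(round((if_max - if_min) / step))
    for k in range(n_steps + 1):
        f_vco = f_c + k * step
        if rssi(f_vco) >= threshold:
            return f_vco
    return None


def make_toy_rssi(f_lock, bandwidth):
    """Hypothetical RSSI: reads 1.0 when the VCO is within the narrow
    IF bandwidth around f_lock, and 0.0 otherwise."""

    def rssi(f_vco):
        return 1.0 if abs(f_vco - f_lock) <= bandwidth / 2 else 0.0

    return rssi


f_c = 2.45e9  # carrier frequency from the paper; all other numbers are made up
hit = sweep_for_tag(f_c, 1.0e6, 2.0e6, 50e3, make_toy_rssi(f_c + 400e3, 60e3), 0.5)
miss = sweep_for_tag(f_c, 1.0e6, 2.0e6, 50e3, lambda f: 0.0, 0.5)
```

The point of the design is that the IF filter can stay narrow (good sensitivity) while the sweep absorbs the die-to-die spread of the tag oscillator.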

3. Rectifier issues<br />

The rectifier is a key block in every remotely powered<br />
system. A typical two-stage modified-Greinacher structure<br />
is shown in Fig. 4. The use of low-threshold-voltage,<br />
low-reverse-current diodes and capacitors makes it<br />
possible to obtain a relatively high output voltage (1-2 V)<br />
from a sinusoidal input signal of about 200 mV. These<br />
values depend on the received power, the DC output current<br />
delivered to the load, and the quality of the impedance matching<br />
between the antenna and the rectifier’s input.<br />

Figure 4: 2-stage modified-Greinacher full-wave rectifier.<br />

A thorough study is available in [5, 6],<br />
in which the circuit of Fig. 5 models the rectifier<br />
fabricated in a given technology as an input impedance, an output<br />
impedance and a voltage-controlled voltage source.<br />
The model allows the prediction of the power needed to<br />
supply a given DC output current at a constant output<br />
DC voltage. Furthermore, it provides the rectifier input<br />
and output impedance. If the input capacitance Cin is<br />
inductively compensated, the input voltage amplitude v̂in<br />
is equal to<br />
\[ \hat{v}_{in} = 2\sqrt{2 P_{AV} R_{ant}}\;\frac{R_{in}}{R_{in}+R_{ant}} \qquad (1) \]<br />
where PAV is the available power from the antenna and<br />
Rant is the real part of its impedance. From Eq. (1), it is clear<br />
that to increase v̂in, one has to increase Rant while simultaneously<br />
keeping Rin = Rant, i.e. power matching. This<br />
is necessary in order to increase the performance of the<br />
rectifier.<br />
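Eq. (1) is easy to evaluate numerically. The sketch below plugs in the values quoted later in Section 6 of this paper (2.7 µW available power, a 300 Ω antenna, 3.4 kΩ input resistance) and assumes, as the text does, that the input capacitance is fully tuned out. Under these assumptions it gives an open-circuit source amplitude of about 80 mV and an input amplitude of about 74 mV, close to the 80 mV and 73.5 mV reported in Section 6.

```python
import math


def source_amplitude(p_av_w, r_ant_ohm):
    """Open-circuit voltage amplitude of an antenna delivering p_av_w when matched."""
    return 2.0 * math.sqrt(2.0 * p_av_w * r_ant_ohm)


def input_voltage_amplitude(p_av_w, r_ant_ohm, r_in_ohm):
    """Eq. (1): RF voltage amplitude at the rectifier input, C_in tuned out."""
    return source_amplitude(p_av_w, r_ant_ohm) * r_in_ohm / (r_in_ohm + r_ant_ohm)


p_av = 2.7e-6   # available power in W (Section 6)
r_ant = 300.0   # antenna radiation resistance in ohm (Section 6)
r_in = 3.4e3    # rectifier input resistance in ohm (Table 1)

v_open = source_amplitude(p_av, r_ant)
v_in = input_voltage_amplitude(p_av, r_ant, r_in)

# With a power match (r_in == r_ant) the input sees exactly half the source amplitude.
v_matched = input_voltage_amplitude(p_av, r_ant, r_ant)

print(f"source amplitude: {v_open * 1e3:.1f} mV")
print(f"input amplitude:  {v_in * 1e3:.1f} mV")
```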

Figure 5: Equivalent circuit for the rectifier.<br />

Based on [5, 6], the minimal input power for a 3-stage<br />
rectifier with a 300 Ω radiation-resistance antenna is<br />
equal to -16 dBm for an output current consumption of<br />
1 µA. This power value could be reduced with better<br />
antenna-tag impedance matching.<br />

4. Modulator issues<br />

The backscattering modulation technique is based on the<br />
variation of the reflection coefficient Γ at the input of a<br />
transponder. The reflected part of the power enables tag-to-reader<br />
communication. Γ can vary either in amplitude<br />
or in phase. As a result, two modulation types are possible:<br />
ASK (AM) and PSK (PM).<br />

The choice of modulation type is a strict trade-off between<br />
the power dedicated to the tag operation and the<br />
power devoted to communication. In ASK, the modulator<br />
switches the input impedance between a matched<br />
state and a reflective state, whereas in PSK it switches<br />
the reactive part of the impedance seen by the antenna between<br />
two complex-conjugate values. The power available to<br />
the tag is thus kept constant across the PSK modulation<br />
states. This could be seen as an advantage of PSK<br />
over ASK, but it can be shown that there exists an optimum<br />
for both modulations in terms of Bit Error Rate<br />
(BER), average available power and voltage amplitude<br />
at the tag input. In this case, PSK is only 0.74 dB<br />
superior to ASK (see Fig. 6), the average power available<br />
to the tag is 50% of the received power for both types, and<br />
the input voltage amplitude is more than twice as large in<br />
ASK, but only during one modulation state (since in the<br />
other state the rectifier input is shorted). In the circuit<br />
of this article, ASK modulation is used. It is<br />
implemented with a simple analog switch.<br />
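The two modulation schemes can be compared with the standard power-wave reflection coefficient. The sketch below is a simplified illustration, not the optimisation carried out by the authors: it assumes a purely resistive 300 Ω antenna, ideal matched and shorted states for ASK, equiprobable bits, and an arbitrary (hypothetical) reactance of ±300 Ω for the PSK states.

```python
def reflection_coefficient(z_load, z_ant):
    """Power-wave reflection coefficient of a load on an antenna impedance z_ant."""
    return (z_load - z_ant.conjugate()) / (z_load + z_ant)


def absorbed_fraction(gamma):
    """Fraction of the available power delivered to the load."""
    return 1.0 - abs(gamma) ** 2


z_ant = complex(300.0, 0.0)  # radiation resistance from the paper, zero reactance assumed

# ASK: matched state (all power absorbed) versus shorted input (all power reflected).
gamma_matched = reflection_coefficient(complex(300.0, 0.0), z_ant)
gamma_short = reflection_coefficient(complex(0.0, 0.0), z_ant)
ask_average = (absorbed_fraction(gamma_matched) + absorbed_fraction(gamma_short)) / 2.0

# PSK: two complex-conjugate load values (the +/-300 ohm reactance is hypothetical).
gamma_psk_a = reflection_coefficient(complex(300.0, 300.0), z_ant)
gamma_psk_b = reflection_coefficient(complex(300.0, -300.0), z_ant)

print(f"ASK average absorbed fraction: {ask_average:.2f}")
print(f"PSK absorbed fraction per state: "
      f"{absorbed_fraction(gamma_psk_a):.2f}, {absorbed_fraction(gamma_psk_b):.2f}")
```

The ASK average of one half matches the 50% figure quoted in the text, while the two PSK states have reflection coefficients of equal magnitude and opposite phase, so the absorbed power is the same in both states.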

5. Antenna issues<br />

The transponder input impedance is mainly influenced<br />
by the rectifier. Its impedance is composed of an imaginary<br />
part due to the parasitic capacitances and a real<br />
part that depends on the output current consumption.<br />
The other blocks of the RF front-end (the detector and<br />
the modulator) make only a small contribution (in the ASK<br />
case).<br />
Figure 6: Optimal ASK and PSK BER comparison (BER versus Eb/N0 in dB; curves: optimal backscattering PSK, optimal backscattering ASK, standard PSK).<br />

Figure 7: Impedance matching for the transponder (antenna source Vs with Rant driving the transponder C and R, without and with the inductive matching element L).<br />

At low received power, inductive matching can be<br />
added to resonate with the capacitance C (see Fig. 7). In<br />
this case, optimum power transfer between the antenna<br />
radiation resistance Rant and the input resistance R is<br />
achieved if they are equal. As a result, the global<br />
efficiency of the RF front-end increases. The inductance<br />
can be realized as part of the antenna design<br />
at no added fabrication cost.<br />
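To give a feeling for the size of the matching inductance, the sketch below computes the value that resonates with the input capacitance at the operating frequency. It assumes an ideal, lossless inductor and takes the capacitance from Table 1 of this paper (500 fF at 2.45 GHz); how the inductance is actually realized in the antenna is not described here.

```python
import math


def resonant_inductance(freq_hz, cap_f):
    """Inductance that resonates with cap_f at freq_hz (ideal LC resonance)."""
    omega = 2.0 * math.pi * freq_hz
    return 1.0 / (omega ** 2 * cap_f)


f0 = 2.45e9      # operating frequency in Hz
c_in = 500e-15   # input capacitance in F (Table 1)

l_match = resonant_inductance(f0, c_in)

# Sanity check: the LC pair should resonate back at f0.
f_check = 1.0 / (2.0 * math.pi * math.sqrt(l_match * c_in))

print(f"matching inductance: {l_match * 1e9:.2f} nH")
```

Under these assumptions the result is in the range of a few nanohenries, a value that can plausibly be absorbed into the antenna structure.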

In Section 6, the operating ranges of a transponder<br />
with and without inductive matching are compared for<br />
different antenna types (see Table 2).<br />

There are several types of antennas that can be used<br />
with a high-frequency transponder. The choice is closely<br />
related to the target application [7]. Patch antennas are<br />
well suited for metallic objects since it is possible to<br />
use the object's body as a ground plane. Other types of materials,<br />
e.g. wood or cardboard, also allow differential<br />
antennas. These antennas offer the advantages of higher<br />
radiation resistance compared to single-ended versions,<br />
and of lower capacitive losses. As shown in [5], [6], higher<br />
radiation resistance is desirable. For the circuit of this<br />
article, an impedance level of 300 Ω offered the<br />
best results in terms of operating range.<br />


6. Experimental results<br />

A version of the circuit has been realized in a fully depleted<br />
silicon-on-sapphire technology [8]. This competitive<br />
process offers low-threshold-voltage CMOS transistors,<br />
which is a clear advantage for remotely powered<br />
devices [9]. The isolated devices present a low parasitic<br />
capacitance compared to a standard bulk process. The<br />
input impedance in the absorptive state is shown in Table 1.<br />
Table 1: Input admittance measurement at 2.45 GHz.<br />
Freq. [GHz] | R [kΩ] | C [fF] | Vout [V] | Iout [µA]<br />
2.45 | 3.4 | 500 | 1 | 1<br />
The model presented in [5] predicted electrical<br />
parameters (input impedance, efficiency, etc.) that were<br />
in excellent agreement with on-wafer measurements. The<br />
layout of the RF front-end has to be done with particular<br />
care, i.e. parasitic capacitances should be minimized.<br />
Extracted-layout simulation is essential in order to estimate<br />
and reduce the parasitics that are directly connected<br />
to the antenna.<br />

A complete transponder is shown in Fig. 9. The antenna<br />
is optimized for 2.45 GHz. Its radiation resistance<br />
is approximately 300 Ω.<br />

Figure 8: Die micro-photograph (size: 400 µm×550 µm without the test pads).<br />

The measured operating ranges at 2.45 GHz are summarized<br />
in Table 2. Results show a clear advantage for<br />
the folded dipole antenna with inductive matching. In<br />
this particular case, the round-trip path loss at 12 m is 124 dB.<br />
With an emitted power of 36 dBm and no losses coming<br />
from the transponder, the backscattered power returning<br />
to the emitting antenna is equal to -88 dBm. The<br />
measured power at the receiving antenna is equal to<br />
-92 dBm. The losses due to the tag antenna mismatch<br />
are thus equal to 4 dB. In the case of the folded dipole<br />
without inductive compensation, those losses were equal<br />
to 14 dB.
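The link budget quoted above can be reproduced approximately with the free-space path loss formula. The sketch assumes ideal free-space propagation, applies the one-way loss twice for the reader-tag-reader path, and ignores antenna gains; these simplifications are assumptions of the example. Under them the round-trip loss comes out near 124 dB, the ideal backscattered power near -88 dBm, and the mismatch loss near 4 dB.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fspl_db(distance_m, freq_hz):
    """One-way free-space path loss in dB between isotropic antennas."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    return 20.0 * math.log10(4.0 * math.pi * distance_m / wavelength)


one_way_db = fspl_db(12.0, 2.45e9)
round_trip_db = 2.0 * one_way_db          # reader -> tag -> reader

eirp_dbm = 36.0                           # 4 W EIRP
ideal_backscatter_dbm = eirp_dbm - round_trip_db
measured_dbm = -92.0                      # measured value quoted in the text
mismatch_loss_db = ideal_backscatter_dbm - measured_dbm

print(f"round-trip path loss: {round_trip_db:.1f} dB")
print(f"ideal backscattered power: {ideal_backscatter_dbm:.1f} dBm")
print(f"implied mismatch loss: {mismatch_loss_db:.1f} dB")
```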


Table 2: Operating range for different antenna types<br />
Freq. [MHz] | Antenna | Range [m]<br />
2450 | λ/2-dipole | 6<br />
2450 | λ/2-dipole with inductive matching | 9<br />
2450 | folded dipole | 7<br />
2450 | folded dipole with inductive matching | 12<br />

The power impinging on the transponder antenna (2 dB<br />
gain) at 12 m is equal to -25.7 dBm, i.e. about 2.7 µW.<br />
With a 300 Ω radiation-resistance antenna, the equivalent<br />
voltage source amplitude is 80 mV. With an input<br />
impedance after inductive matching of 3.4 kΩ, the input<br />
RF voltage amplitude is 73.5 mV and the output<br />
DC voltage of the rectifier reaches 882 mV (multiplication<br />
factor of 12 for the present rectifier structure). The<br />
transponder sensitivity is thus 73.5 mV, or -25.7 dBm on<br />
a 300 Ω antenna. Finally, the global efficiency of the<br />
rectifier in this extreme case is about 37% (considering<br />
1 µW of output power).<br />
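The figures in this paragraph are tied together by simple conversions, sketched below. The only inputs are numbers quoted in the text (-25.7 dBm, 1 µW of output power, 73.5 mV, multiplication factor 12); the efficiency is taken here as output power divided by available input power, which is an interpretation of "global efficiency" rather than a definition given in the paper.

```python
def dbm_to_watts(p_dbm):
    """Convert a power level in dBm to watts."""
    return 1e-3 * 10.0 ** (p_dbm / 10.0)


p_in_w = dbm_to_watts(-25.7)        # power available at the tag antenna
p_out_w = 1e-6                      # DC output power considered in the text
efficiency = p_out_w / p_in_w       # global rectifier efficiency

v_rf = 73.5e-3                      # RF input amplitude in V
v_dc = 12 * v_rf                    # DC output with a multiplication factor of 12

print(f"input power: {p_in_w * 1e6:.2f} uW")
print(f"efficiency:  {efficiency * 100:.0f} %")
print(f"DC output:   {v_dc * 1e3:.0f} mV")
```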

Figure 9: Picture of the complete 2.45 GHz<br />

transponder.<br />

The distance could be increased by using a lower<br />
operating frequency, e.g. 900 MHz. In this case, the<br />
achievable distance can be more than doubled.<br />
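A rough scaling argument supports this claim. In free space, with the same EIRP, the same tag antenna gain and the same tag sensitivity, the received power depends on the ratio of wavelength to distance, so the range scales inversely with frequency. The sketch below applies this idealized scaling; it ignores differences in regulations, antenna size and circuit behaviour between the two bands.

```python
def scaled_range(range_m, f_old_hz, f_new_hz):
    """Free-space range at f_new_hz, given the range at f_old_hz,
    assuming identical EIRP, antenna gains and tag sensitivity."""
    return range_m * f_old_hz / f_new_hz


range_900 = scaled_range(12.0, 2.45e9, 900e6)
print(f"estimated range at 900 MHz: {range_900:.1f} m")
```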

7. Conclusion<br />

This paper presents a remotely powered addressable<br />
UHF RFID system. Using state-of-the-art analysis<br />
and design techniques, a 0.5-µm SOS CMOS technology<br />
and inductive matching between the antenna and<br />
the transponder, an operating range of 12 m with 4 W<br />
EIRP transmitted power was achieved. At this distance,<br />
the available power for the transponder is 2.7 µW, which<br />
corresponds to about 37% global efficiency for the rectifier, since<br />
the estimated (simulated) power consumption of the<br />
whole system is approximately 1 µW.<br />

As shown in this article, the system complexity is<br />
principally on the reader side (master-slave architecture).<br />
The functions available on the tag side are<br />
kept to a minimum with the goal of ultra-low power<br />
consumption. However, if more power<br />
is necessary for a given application, this circuit is<br />
still usable at the cost of a smaller operating range.<br />

Table 3: Operating range for 4 W EIRP at 2.45 GHz RFID systems.<br />
Range | Mod. type (a) | Process | Reference<br />
0.4 m | ASK | 0.18 µm CMOS | [2]<br />
1.8 m | n/a | n/a | [3]<br />
2.6 m (b) | PSK | 0.5 µm CMOS + Schottky | [1]<br />
12 m | ASK | 0.5 µm SOS | This work<br />
(a) Tag-reader modulation.<br />
(b) Calculated value.<br />

This paper also presents, in conjunction with [5], the<br />
trade-offs that occur in remotely powered systems. Using<br />
silicon-on-sapphire in conjunction with state-of-the-art<br />
RF design allowed us to reach an operating range of<br />
12 m. This is an important improvement compared with<br />
the technologies available today [1], [2], [3] (see Table 3).<br />

8. REFERENCES<br />

[1] U. Karthaus and M. Fischer, "Fully integrated passive UHF RFID transponder IC with 16.7 µW minimum RF input power", IEEE Journal of Solid-State Circuits, vol. 38, no. 10, pp. 1602-1608, October 2003.<br />
[2] "Hitachi µchip product", 2004, http://www.hitachi-eu.com/mu<br />
[3] "Philips uCode product", 2004, http://www.semiconductors.philips.com/markets/identification/products/ucode/index.html<br />
[4] K. Finkenzeller, "RFID handbook, Radio-Frequency Identification Fundamentals and Applications", 2nd edition, Wiley, 2003, ISBN 0-470-84402-7.<br />
[5] J.-P. Curty et al., "A model for µ-powered rectifier analysis and design", to be published in IEEE Transactions on Circuits and Systems I.<br />
[6] J. P. Curty, "A web interface for diode rectification ability evaluation", 2004, http://legwww.epfl.ch/direct<br />
[7] P. R. Foster and R. A. Burberry, "Antenna problems in RFID systems", IEE Colloquium on RFID technology, no. 1999/123, pp. 3/1-3/5, 25 Oct. 1999.<br />
[8] J. C. S. Woo, "High performance silicon on sapphire technology", 1998, http://www.ucop.edu/research/micro/98-98/97 208.pdf<br />
[9] "UTSi Process Advantages", 2004, http://www.peregrine-semi.com.au/Process/UTSIProcess2.html


0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

A 0.13µm CMOS VGA FOR MULTISTANDARD<br />

RECEIVERS<br />

S. D’Amico (Ph.D. Student) , V. Chironi, A. Baschirotto<br />

Department of Innovation Engineering<br />
University of Lecce, Italy<br />
E-mail: stefano.damico@unile.it, vchironi@email.it, andrea.baschirotto@unile.it<br />

ABSTRACT<br />

A fully balanced variable gain amplifier (VGA) in<br />
0.13µm CMOS technology for use in<br />
multistandard receivers (WLAN, UMTS, GSM, and<br />
Bluetooth) is reported. The overall architecture<br />
in which the VGA is embedded presents considerably<br />
different requirements (bandwidth, dc-gain, and<br />
common-mode input voltage) for the four telecom<br />
standards to be processed. The VGA features 4 gain<br />
steps with an IIP3 better than 30dBm and an input-referred<br />
noise voltage lower than 5nV/√Hz at 0dB<br />
dc-gain.<br />

1. INTRODUCTION<br />

Reconfigurable wireless systems (multistandard<br />
terminals) are a new wireless technology of<br />
great research and market interest. Such<br />
equipment will allow the transceiver to be<br />
reconfigured to process different wireless telecom<br />
standards, such as 802.11x<br />
WLAN and Bluetooth (indoor), and GSM and UMTS<br />
(outdoor). This approach allows the same<br />
circuit blocks to be shared by all the selected standards. Fig. 1<br />
shows the architecture of a reconfigurable receiver<br />
under development by the Italian Research Project<br />
FIRB [1]-[2]. In this scheme the VGA (marked as<br />
VGA1) links the two RF receive chains, GSM/UMTS and<br />
WLAN/Bluetooth. These two RF<br />
channels present different characteristics (i.e. supply<br />
voltage, dc-gain, and bandwidth). Thus the VGA<br />
has to meet different specifications for the selected<br />
standards [3]-[4], as reported in Table I for the most<br />
critical cases.<br />


Fig. 1 - Reconfigurable Receiver architecture<br />
In this paper a circuit solution for the VGA designed<br />
in a 0.13µm CMOS technology is proposed.<br />

2. CIRCUIT PRINCIPLE AND DESIGN<br />

The schematic of the designed VGA is shown in<br />
Fig. 2. It uses an open-loop structure, which consists<br />
of a V-I converter (input opamps, M1-M2, and Rdeg),<br />
a resistive load (RL) and an output level shifter (M7-M8).<br />
Such an open-loop structure satisfies<br />
the target specifications more efficiently than a<br />
closed-loop structure.<br />

Parameter | WLAN | UMTS<br />
Bandwidth | 12 MHz | 2.11 MHz<br />
DC-gain | -10÷20 dB | -10÷35 dB<br />
IRN @ 0dB dc-gain | 5nV/√Hz | 5nV/√Hz<br />
IP3 @ 0dB dc-gain | 30 dBm | 30 dBm<br />
Input VCM | 1.25 V | 600 mV<br />
Table I - VGA Specs for different standards



Fig. 2 – The VGA circuit.<br />

M1 and M2 operate in source-follower configuration;<br />
the differential input voltage $v_{in} = v_{in}^{+} - v_{in}^{-}$ is<br />
copied across the series connection of the two<br />
nonlinear transconductances gm1 and the linear<br />
degeneration resistors Rdeg. The gm1 linearity is<br />
improved by a closed-loop structure including the<br />
input opamps, each of which uses a single differential pair<br />
for large bandwidth and low noise.<br />

The VGA dc-gain, G, is calculated as<br />
follows:<br />
\[ G = \frac{R_L}{R_{deg} + \dfrac{1}{g_{m1}(1+A_o)}} \cdot \frac{A_o}{1+A_o} \cong \frac{R_L}{R_{deg}} \qquad (1) \]<br />

where gm1 is the M1-M2 transconductance and Ao<br />
is the input opamp dc-gain. With this solution, the<br />
different input common-mode voltages of the<br />
different standards are absorbed by the input<br />
differential stage, which passes to the output load<br />
just a current proportional to the differential input voltage.<br />
The VGA gain is then adjusted by a digital control,<br />
which changes the values of RL and Rdeg, as shown in<br />
Fig. 3 and reported in Table II. For a dc-gain lower<br />
than 0dB (i.e. for large input signals, with relaxed<br />
noise requirements but stringent linearity<br />
requirements) the gain is set by changing Rdeg<br />
(Table II lists 400 Ω at -10dB and 150 Ω at 0dB). On<br />
the other hand, for a dc-gain larger than 0dB (i.e. for<br />
small input signals, with stringent noise requirements<br />
but relaxed linearity requirements)<br />
Rdeg is set to its minimum value, while RL is<br />
increased. This is reflected in the frequency response,<br />
which presents a single pole at the gain node of RL,<br />
at the frequency fp=1/(2·π·RL·Cout), where Cout is the<br />
capacitive load. For the -10dB and 0dB gain levels,<br />
RL does not change and, accordingly, neither does fp.<br />
For larger dc-gain fp decreases, while still<br />
guaranteeing a sufficient bandwidth for the VGA.<br />
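The single-pole expression fp=1/(2·π·RL·Cout) can be turned around to estimate how much load capacitance each standard tolerates. The sketch below uses the RL values listed in Table II for the highest gain setting of each standard and the bandwidths of Table I. It assumes that the required bandwidth equals the -3dB frequency of this single pole and that no other pole matters, which is a simplification; the resulting capacitances are therefore illustrative upper bounds, not values from the paper.

```python
import math


def pole_frequency(r_load_ohm, c_out_f):
    """Pole frequency of the RL-Cout output node."""
    return 1.0 / (2.0 * math.pi * r_load_ohm * c_out_f)


def max_load_capacitance(r_load_ohm, bandwidth_hz):
    """Largest Cout for which the single pole stays at or above bandwidth_hz."""
    return 1.0 / (2.0 * math.pi * r_load_ohm * bandwidth_hz)


# (RL in ohm from Table II, required bandwidth in Hz from Table I)
cases = {
    "WLAN, 20 dB": (2e3, 12e6),
    "UMTS, 35 dB": (32e3, 2.11e6),
}

c_max = {name: max_load_capacitance(r, bw) for name, (r, bw) in cases.items()}

for name, c in c_max.items():
    print(f"{name}: Cout <= {c * 1e12:.2f} pF")
```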

132<br />

Table II – Resistor values for gain setting

AV [dB]   RL [Ω]   Rdeg [Ω]   IRN UMTS [nV/sqrt(Hz)]   IRN WLAN [nV/sqrt(Hz)]   IIP3 UMTS [dBm]   IIP3 WLAN [dBm]
-10       150      400        12.8                     13.4                     39                39.8
0         150      150        4.8                      4.9                      31.3              30.9
20        2k       150        4.22                     4.24                     -3.7              -3.6
35        32k      150        4.2                      4.2                      -18.2             -17.8

Fig. 3 - Structure for the gain control

The input-referred noise, IRN, is calculated, approximately, as follows:

$$\mathrm{IRN}^2 = 4 \cdot kT \cdot R_{deg} \cdot \left(1 + \frac{R_{deg}}{R_L}\right) + v_{in\_op}^2 \qquad (2)$$
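The noise expression (2), IRN² = 4·kT·Rdeg·(1 + Rdeg/RL) + v²in_op, can be evaluated numerically with a few lines of code. The sketch below is only illustrative: the temperature of 300 K and any value passed for the opamp noise term are assumptions (neither is given in the text), and the function name is hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23

def irn_v_per_rt_hz(r_deg, r_load, v_in_op=0.0, temp_k=300.0):
    """Input-referred noise density in V/sqrt(Hz), following Eq. (2).

    r_deg, r_load : resistances in ohms
    v_in_op       : opamp input noise density in V/sqrt(Hz) (ASSUMED value)
    temp_k        : temperature in kelvin (ASSUMED 300 K by default)
    """
    resistor_term = 4.0 * BOLTZMANN_J_PER_K * temp_k * r_deg * (1.0 + r_deg / r_load)
    return math.sqrt(resistor_term + v_in_op ** 2)

# Resistor contribution alone for the 0dB setting of Table II (Rdeg = RL = 150 ohm).
resistor_only = irn_v_per_rt_hz(150.0, 150.0)
print(f"resistor-only IRN: {resistor_only * 1e9:.2f} nV/sqrt(Hz)")
```

The resistor term grows with both Rdeg and the ratio Rdeg/RL, which is why the low-gain settings (larger Rdeg, small RL) are the noisiest ones in Table II.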

The VGA IIP3 is calculated as follows:

$$\mathrm{IIP3} = 96 \cdot V_{ov1}^2 \cdot \left(1 + g_{m1}^2 R_{deg}^2 A_0^2\right) \approx 384 \cdot I_2^2 \cdot R_{deg}^2 A_0^2 \qquad (3)$$

where Vov1 is the M1-M2 overdrive voltage. The maximum opamp gain is limited by the technology. Thus, the VGA linearity performance depends on the I2 current level.
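The approximation in (3) can be checked under two assumptions that are not stated in this section: that M1-M2 follow the square-law model, so that gm1 = 2·I2/Vov1, and that I2 is the bias current of each input transistor. For gm1·Rdeg·A0 much larger than 1:

```latex
\begin{aligned}
96\,V_{ov1}^2\left(1 + g_{m1}^2 R_{deg}^2 A_0^2\right)
  &\approx 96\,V_{ov1}^2\, g_{m1}^2 R_{deg}^2 A_0^2 \\
  &= 96\,V_{ov1}^2 \left(\frac{2 I_2}{V_{ov1}}\right)^2 R_{deg}^2 A_0^2 \\
  &= 384\, I_2^2 R_{deg}^2 A_0^2
\end{aligned}
```

Under these assumptions the overdrive cancels, which is consistent with the remark that, once A0 is fixed by the technology, the linearity is set by the I2 current level.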

The output stage is a unity-gain level shifter. It provides a 1.25V output common-mode voltage. This solution keeps M1-M2 in the saturation region.

3. SIMULATION RESULTS<br />

The proposed VGA has been designed in a standard 0.13µm CMOS technology. The frequency response for the different gain levels is shown in Fig. 4. It can be seen that the different gain levels correspond to different bandwidths, satisfying the requirements of the different standards. Table III summarizes the VGA performance for the different standards. Fig. 5 shows the layout of the device, which is under fabrication. The active chip size is 0.5mm x 1.5mm.

0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

Fig. 4 – VGA transfer functions<br />

Table III - VGA performance at 0dB dc-gain

Standard                  UMTS     WLAN
VCMin [V]                 0.6      1.25
IIP3 [dBm]                31.7     30.6
IRN [nV/sqrt(Hz)]         4.9
Current consumption
  Differential stage      1.65mA x 2
  Opamp                   1.15mA x 2
  Level shifter           0.40mA x 2
  Total                   6.4mA
Power supply              2.5V
THD (@500mV)              -50dB
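The current budget of Table III can be cross-checked with simple arithmetic. The short sketch below sums the per-block currents (each counted twice, following the "x 2" entries) and derives the dc power from the 2.5V supply; the power figure is a derived value, not one quoted in the table.

```python
# Per-block currents from Table III, in mA; each block appears twice ("x 2").
BLOCK_CURRENT_MA = {
    "differential stage": 1.65,
    "opamp": 1.15,
    "level shifter": 0.40,
}
SUPPLY_V = 2.5

total_ma = 2 * sum(BLOCK_CURRENT_MA.values())
# Derived quantity (not listed in the table): dc power drawn from the supply.
power_mw = total_ma * SUPPLY_V

print(f"total current: {total_ma:.1f} mA, dc power: {power_mw:.1f} mW")
```

The sum reproduces the 6.4mA total reported in the table.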


Fig. 5 – Chip layout<br />

4. CONCLUSIONS<br />

A high-linearity, low-noise VGA in a 0.13µm CMOS technology is proposed. The VGA is embedded in a reconfigurable receiver able to process different telecom standards (WLAN, UMTS, GSM, and Bluetooth). The VGA is able to satisfy the requirements of each standard in terms of dc-gain, bandwidth and input common-mode voltage. The proposed VGA consumes about 6.4mA, its IRN is lower than 5nV/√Hz and its IIP3 is more than 30dBm at 0dB dc-gain.

ACKNOWLEDGMENT<br />

This research has been partially supported by the<br />

Italian National Program FIRB, Contract n°<br />

RBNE01F582.<br />

5. REFERENCES<br />

[1] “Enabl<strong>in</strong>g Technologies for Wireless<br />

Reconfigurable Term<strong>in</strong>als” FIRB Project,<br />

WebPage: http://ims.unipv.it/firb/<br />

[2] S. D’Amico, A. Baschirotto, A. Vigna, N.<br />

Ghittori, <strong>and</strong> P. Malcovati, “Low-power<br />

reconfigurable baseb<strong>and</strong> block for<br />

UMTS/WLAN transmitters”, Proc. of<br />

NORCHIP 2004.<br />

[3] A.A. El-Adawy, A.M. Soliman, H.O. Elwan, “Low voltage fully CMOS voltage mode digitally controlled variable gain amplifier”, Microelectronics Journal 31 (2000) 139-146.

[4] J.J.F. Rijns, “CMOS Low-Distortion High-Frequency Variable-Gain Amplifier”, IEEE Journal of Solid-State Circuits, July 1996.


FULLY INTEGRATED RF FRONT−END FOR WLAN: A NEW<br />

STEP TOWARD SINGLE-CHIP TRANSCEIVERS<br />

Domenico Zito, Francesco D’Ascoli, Bruno Neri<br />

Dipartimento di Ingegneria dell’Informazione (DIIEIT), RFLab, University of Pisa<br />

G.Caruso 16, I-56122 Pisa, Italy<br />

e-mail: d.zito@iet.unipi.it<br />

ABSTRACT<br />

The RF stages of a fully integrated front-end in standard silicon technology for low-power Wireless Local Area Network applications are presented. The circuit solution proposed herein includes an innovative Tx/Rx antenna switch which does not require any additional technological steps and gives better performance than the switches that commonly use PIN diodes as the switching element. The operating principle is highlighted and the most representative performance figures are summarized.

1. INTRODUCTION<br />

The integration of as many devices as possible into a single chip reduces cost and improves performance. In fact, since undesired parasitic effects due to packaging are avoided and buffer stages towards external components are not necessary, the dissipated power is reduced and the design reliability is increased. All these factors contribute substantially to the cost reduction of the entire transceiver [1], which is the main target for the mass market of radiofrequency interfaces for Wireless LAN applications. The most widespread standards for these applications are IEEE 802.11a and HiPerLAN/2. Both operate in the 5-6 GHz band and are based on the Time Division Duplexing protocol and Orthogonal Frequency Division Multiplexing modulation. The maximum allowed output power for 5.15-5.35 GHz indoor applications is in the range of 50-250mW [2]. Even though these values are not so low, the integration of the Power Amplifier (PA) can be approached too, so that WLAN systems offer the interesting prospect of realizing fully integrated transceivers in standard silicon technologies.

However, the low quality factor of integrated LC filters (often limited by the inductors) strongly limits the realization of almost ideal resonant filters in standard silicon technologies. For this reason, the most widespread commercial solutions use bulky and expensive components external to the chip, such as MEMS [3] or GaAs devices [4]. Although fully integrated transmitter and receiver solutions for WLAN are increasingly widespread in the literature [2, 5], the Tx/Rx antenna switch remains the last obstacle to the complete integration of the transceiver. Only recently have some active circuits been realized as switching elements in standard microelectronic technologies [6,7]. Their main drawback is a non-negligible insertion loss (close to 1.5 dB). The novel solution presented in this paper uses an active circuit named ‘Boot-Strapped Inductor’ (BSI), which has been demonstrated to be capable of giving an equivalent inductance Leq (which can be varied by means of a control voltage by as much as a factor of 10) with a quality factor Q in excess of 50 [8, 9]. Exploiting these features, the BSI circuit can be driven from a low-impedance to a high-impedance state. Thus, the power in the circuit can be guided from the PA to the antenna during the transmission time interval (Tx) and from the antenna to the Low Noise Amplifier (LNA) during the reception time interval (Rx), as explained hereinafter. This paper is organized as follows: in Section 2, the operating principle of the novel antenna switch, the LNA and the PA are highlighted. In Section 3, the fully integrated RF front-end design for 5-6 GHz Wireless LAN is reported and discussed. Moreover, the main performance figures are reported and compared with those of other solutions. Finally, in Section 4, the conclusions are drawn.

2. THE CIRCUIT NOVELTY<br />

The core of the innovative circuit solution is the (patented) antenna switch, which was proposed in the past for 2.4 GHz WLAN systems. The antenna switch is implemented by exploiting the BSI circuit to realize a high-quality LC filter acting as a blocking impedance for the signal [10]. A BSI consists of an integrated transformer and a current amplifier, as shown in Fig. 1.

Figure 1. Basic scheme of the BSI. L1 and L2 are the inductances of the two spirals of an integrated transformer (IT); the cascode amplifier is used instead of a current amplifier (CA).

From a simplified small-signal analysis of the BSI circuit shown in Fig. 1, it follows that:

$$Z_{IN} = \left[R_1 + r_\pi - \omega M_{12}\, g_m^{*}\, r_\pi \sin\varphi\right] + j\omega\left[L_1 + M_{12}\, g_m^{*}\, r_\pi \cos\varphi\right] \qquad (1)$$


where gm* is the transconductance gain of the CA, ϕ is the phase difference (due to capacitive effects) between the currents flowing in the spirals of the integrated transformer (L1, L2), M12 is their mutual inductance and R1 is the parasitic resistance of the primary. The formula shows that ZIN depends mainly on the current gain (both magnitude and phase). Note that, if ϕ is very close to zero (this can be achieved by introducing a properly sized additional inductor to compensate the parasitic capacitance seen at the input of the CA [8]), an almost ideal equivalent inductance (Leq) is obtained. By acting on the quiescent points of the transistors Q1 and Q2 (in principle, this can be done by means of IB1 and VB2, as reported in the scheme of Fig. 1), the transconductance gain and the input capacitance of the amplifier can be varied [8], and so can the value of the equivalent inductance Leq. This approach has been successfully adopted to realize the LC resonant circuit (with L realized by means of the Leq of the BSI circuit) in a tunable LNA with excellent selectivity (f0/B3dB > 25, where f0 is the central frequency and B3dB the -3dB bandwidth) for 1 GHz applications. The measured results are reported in Fig. 2. The reliability of such an approach has been demonstrated by a test campaign on several prototypes. The robustness against temperature has been proved [11] and, moreover, in [9] a digital self-calibration system has been proposed and successfully investigated in order to compensate for the spread of process parameters, temperature and aging effects.
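To make the role of the phase ϕ in (1) concrete, the sketch below evaluates ZIN = [R1 + rπ − ω·M12·gm*·rπ·sin ϕ] + jω[L1 + M12·gm*·rπ·cos ϕ] with Python complex numbers. All component values are hypothetical placeholders chosen only for illustration; none of them comes from the paper, and the function names are assumptions.

```python
import math

def bsi_input_impedance(freq_hz, r1, r_pi, l1, m12, gm, phi):
    """Z_IN of Eq. (1) in ohms, returned as a Python complex number."""
    w = 2.0 * math.pi * freq_hz
    boot = m12 * gm * r_pi  # boot-strap term M12 * gm* * r_pi, in henries
    real = r1 + r_pi - w * boot * math.sin(phi)
    imag = w * (l1 + boot * math.cos(phi))
    return complex(real, imag)

def equivalent_inductance(z_in, freq_hz):
    """L_eq extracted from the imaginary part of Z_IN."""
    return z_in.imag / (2.0 * math.pi * freq_hz)

# HYPOTHETICAL values, for illustration only.
F0 = 5.25e9
PARAMS = dict(r1=5.0, r_pi=100.0, l1=1e-9, m12=0.5e-9, gm=0.05)

z_phi0 = bsi_input_impedance(F0, phi=0.0, **PARAMS)          # phi = 0
z_q4 = bsi_input_impedance(F0, phi=-math.pi / 4, **PARAMS)   # fourth quadrant

print("phi = 0    :", z_phi0, "Leq =", equivalent_inductance(z_phi0, F0))
print("phi = -45deg:", z_q4)
```

With ϕ = 0 the sine term vanishes and the imaginary part reduces to ω·(L1 + M12·gm*·rπ), so Leq scales with the transconductance gain. With ϕ negative the sine term adds to the real part of ZIN.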

In the case of the Tx/Rx antenna switch, however, the BSI circuit presents two operating conditions of interest: the former, in which the current amplifier is switched on and the boot-strap effect on the equivalent inductance is obtained; the latter, in which the current amplifier is switched off and the aforementioned effect is absent. The scheme of Fig. 3 clarifies the way in which the BSI circuit can be exploited to implement the novel Tx/Rx antenna switch. In particular, the capabilities of the BSI are used in two different ways in the receiver and the transmitter. During Tx, the BSI circuit at the input of the LNA (which is switched off) is active, thus providing a high-impedance path (Leq resonates with C) with respect to that of the antenna (typically, 50 Ohm).

Figure 2. Transducer gain GT (measured) of the tunable LNA based on the BSI circuit. The inset on top reports the central frequency f0 (where GT=GTmax) of the frequency responses obtained for three different pairs of IB1, VB2.


Figure 3. Principle scheme of the Tx/Rx antenna switch. The current amplifier (CA) of each BSI can be switched ON/OFF. Note the different approaches used to realize the blocking impedance in the receiver (R) and transmitter (T).

During the receive time intervals (Rx), the CA of the BSI at the input of the LNA (which is switched on) is in the off state; therefore the equivalent inductance in (1) is switched to its low value. At the same time, the BSI circuit at the output of the PA (Fig. 3) is activated in order to present a high-impedance path from the antenna toward the PA. For the sake of clarity, in this case the BSI circuit has been designed in such a way that the phase difference ϕ in (1) lies in the fourth quadrant, thus obtaining a high impedance value for both the real and imaginary parts.

The different approaches (resonance for the Rx path and high impedance for the transmitter path) are related to the different linearity constraints of the receiver and transmitter channels. The resonant solution has been shown to reach a higher blocking impedance and a larger linearity range. The proposed solution can be implemented in many different technologies and processes, in a single-ended or in a fully differential architecture (as shown in Fig. 3), and it can be adapted to a wide range of RF frequencies. The key factor for a successful design is to size the driving circuitry and the transformer so as to obtain a nearly perfect power match between the antenna and, alternately, the LNA or the PA.

3. 5-6 GHZ RF FRONT-END:<br />

CIRCUITS AND PERFORMANCES<br />

The RF front-end with the BSI-based antenna switch has been designed for low-power (50 mW) 5.15-5.35 GHz WLAN systems, using a fully differential topology with a 50 Ω antenna impedance. For reasons of space, a simplified single-ended equivalent schematic is reported in Fig. 4. The LNA is implemented by a cascode stage with inductive emitter degeneration (LE) for an integrated matching to the 50 Ω of the antenna, followed by an emitter-follower buffer stage.


Figure 4. Simplified single-ended equivalent scheme of the fully integrated RF front-end, including the BSI antenna switch. The CAs are made of two cascode stages in cascade.

The PA is realized by a cascade of two cascode emitter stages with inductive degeneration and LC resonant load filters (L1C1, L2C2) realized by means of on-chip integrated inductors.

The circuits have been designed using the HBTs of a standard 0.35 µm SiGe-CMOS technology (S35 by AMS) with a top metal thickness of 2.5 µm and a maximum cut-off frequency of 75 GHz. The chip picture is reported in Fig. 5. Circuit and electromagnetic simulations have been performed by means of SpectreRF, Advanced Design System and Momentum.

The transducer gain (GT) of the LNA is nearly 20 dB (see Fig. 6) and the Noise Figure (NF) is 3.74 dB. The input-referred CP1dB is -21 dBm. The total power consumption is 27 mW from a 3 V power supply (3 mA drawn by the buffer). As for the PA, the power gain amounts to 22.5 dB (see Fig. 7), the power saturation level is nearly 12 dBm and the total power consumption is 144 mW.
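Since the figures above mix dBm and mW, a small unit-conversion helper can be handy when comparing them. The sketch below only converts values quoted in the text; the function names are hypothetical and the supply current is a derived value, not one stated by the authors.

```python
import math

def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power level from dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)

def mw_to_dbm(p_mw: float) -> float:
    """Convert a power level from milliwatts to dBm."""
    return 10.0 * math.log10(p_mw)

pa_saturation_mw = dbm_to_mw(12.0)   # PA power saturation level quoted in dBm
wlan_low_power_dbm = mw_to_dbm(50.0) # 50 mW low-power WLAN level quoted in mW
lna_supply_ma = 27.0 / 3.0           # derived: 27 mW drawn from a 3 V supply

print(f"12 dBm = {pa_saturation_mw:.2f} mW")
print(f"50 mW  = {wlan_low_power_dbm:.2f} dBm")
print(f"LNA supply current = {lna_supply_ma:.1f} mA")
```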

In Rx mode, the input impedance of the receiver chain is very close to 50 Ω, whereas the magnitude of the output impedance of the transmitter (|ZT|, see Fig. 3) is about 840 Ω. In Tx mode, ZT is very close to 50 Ω, whereas |ZR| becomes approximately equal to 1 kΩ (see Fig. 8).

Figure 5. Chip picture (the total area is 2.4 mm²). Labelled blocks: PA, T, R, IN/OUT, LNA.

The power consumption of the BSI circuits amounts to 12.9 and 26 mW in Rx and Tx mode, respectively. The Power Ratio (PR), defined as the ratio between the power delivered to the antenna and that lost in the receiver, is greater than 10 up to +12 dBm of available output power of the PA (Pout_pa), as shown in Fig. 9.a. The corresponding insertion loss, defined as the ratio between the power available from the source and the power delivered to the load, is 0.235 dB and 0.585 dB (at most 1.1 dB at 50 mW) for Rx and Tx mode operation, respectively. These results have been obtained by a non-linear steady-state analysis and are shown in Figs. 9 and 10.
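The link between the blocking impedance and the Power Ratio can be illustrated with a deliberately idealized lumped model, which is an assumption of this sketch and not the harmonic-balance simulation used to obtain the reported figures: the wanted load (a purely real 50 Ω antenna) and the blocking branch are taken to be in parallel at the same node, so they share one voltage and the absorbed powers scale with the real parts of their admittances. Mismatch and the losses of the active path are neglected, and the function names are hypothetical.

```python
import math

def power_ratio(z_wanted, z_blocking) -> float:
    """Ratio of the real powers absorbed by two branches sharing one node voltage."""
    g_wanted = (1.0 / complex(z_wanted)).real
    g_blocking = (1.0 / complex(z_blocking)).real
    return g_wanted / g_blocking

def shunt_loss_db(pr: float) -> float:
    """Loss caused only by the power diverted into the blocking branch."""
    return 10.0 * math.log10(1.0 + 1.0 / pr)

# A purely resistive 1 kOhm blocking branch next to a 50 Ohm antenna
# (idealized: the real blocking impedance is complex and frequency dependent).
pr_ideal = power_ratio(50.0, 1000.0)
print(f"PR = {pr_ideal:.1f}, diverted-power loss = {shunt_loss_db(pr_ideal):.2f} dB")
```

In this simplified picture a resistive blocking impedance ten times larger than the antenna impedance gives PR = 10, which matches the order of magnitude of the threshold quoted above.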

Figure 6. Transducer gain (GT) and available power gain (GA) of the LNA vs. frequency. The load impedance is equal to 100 Ohms.

Figure 7. The power gain (Gp) of the Power Amplifier vs. frequency. The source impedance is equal to 100 Ohms.

Figure 8. The blocking impedance (resonant) seen from the input of the receiver during Tx. The value of the real part of ZR is higher than 500 Ohms (ten times RANT) in a frequency range of 400 MHz around 5.25 GHz.



Figure 9. Results relative to the Tx mode. a) The Power Ratio (PR) between the power delivered to the antenna and that lost in the receiver vs. the available power at the output of the PA (Pout_pa); b) the insertion loss (IL). They have been obtained by non-linear analysis (Harmonic Balance).


Figure 10. Results relative to the Rx mode. a) The Power Ratio (PR) between the power delivered to the input of the receiver and that lost in the transmitter vs. the available power at the antenna; b) the insertion loss (IL). They have been obtained by non-linear simulations (Harmonic Balance).


These results are very promising when compared with those typically obtained using external switches or duplexers, which not only have to be added as external components to the transceiver, but also introduce an insertion loss of at least 1-2 dB. The chip has just been fabricated; the measurements have already started and will be carried out in the coming weeks.

4. CONCLUSIONS<br />

The RF stages of an innovative fully integrated RF front-end including the antenna switch have been presented and demonstrated at simulation level. The circuit has been implemented in a standard 0.35 µm SiGe-CMOS process. If the results obtained are confirmed by measurements on the prototypes, this fully integrated RF front-end for WLAN in the 5-6 GHz frequency band will represent a significant new step toward the realization of single-chip transceivers.

REFERENCES<br />

[1] M. Steyaert, “S<strong>in</strong>gle chip CMOS RF transceivers:<br />

wishful th<strong>in</strong>k<strong>in</strong>g or reality”, IEE Sem<strong>in</strong>ar on Low<br />

Power IC Design (Ref. No. 2001/042), 2001, pp.1-6;<br />

[2] H. Samavati, T. H. Lee et al, “5-GHz CMOS<br />

Wireless LAN” IEEE MTT , vol. 50, pp. 268-280, Jan<br />

2002;<br />

[3] C. Goldsmith, J. Kleber, et al. “RF MEMS: benefits<br />

& challenges of an evolv<strong>in</strong>g RF switch technology”,<br />

Tech. Digest of IEEE GaAs IC Symp. 2001, pp.147-<br />

148, Oct 2001;<br />

[4] C.-H Lee, S. Chakraborty, et al., “Broadb<strong>and</strong> highly<br />

<strong>in</strong>tegrated LTCC front-end module for IEEE 802.11a<br />

WLAN applications”, Proc. of IEEE MTT Symp.<br />

2002, vol. 2, pp.1045-1048, Jun 2002;<br />

[5] M. Zargari et al., “A 5-GHz CMOS Transceiver for<br />

IEEE 802.11a Wireless LAN Systems” IEEE JSSC,<br />

vol. 37, is. 12, pp. 1688- 1692, Dec 2002;<br />

[6] Feng-Jung Huang, K.O, “A 0.5-µm CMOS T/R<br />

switch for 900-MHz wireless applications” IEEE<br />

JSSC, vol.36, is.3, pp.486-492, Mar 2001;<br />

[7] N. A. Talwalkar, C.P. Yue, et al., “Integrated CMOS<br />

Transmit-Receive Switch us<strong>in</strong>g LC-Tun<strong>in</strong>g Substrate<br />

Bias for 2.4-GHz <strong>and</strong> 5.2 GHz Applications”, IEEE<br />

JSSC, vol.39, n.6, pp. 863-870, Jun 2004;<br />

[8] G. D’Angelo, A. Monterastelli, B. Neri et al., “High-quality active inductors”, IEE Electronics Letters, vol. 35, n. 20, 30 Sep 1999;

[9] D. Zito, De Bernard<strong>in</strong>is <strong>and</strong> B. Neri, “Model<strong>in</strong>g <strong>and</strong><br />

Design of High Q Tunable Low Noise Amplifier for<br />

WLAN”, Proc. of IEEE ICSES 2004, pp.253-256, Jul<br />

2004;<br />

[10] L. Fanucci, A. Hopper, B. Neri, D. Zito “A Novel<br />

Fully Integrated Antenna Switch for Wireless<br />

Systems”, Proc. of IEEE ESSDERC 2003, pp.553-<br />

556, Sep 2003;<br />

[11] S. Di Pascoli, B. Neri, D. Zito et al., “Single-chip 1.8 GHz band pass LNA with temperature self-compensation”, Proc. of IEEE ISSCS 2003, vol.1, pp.125-128, Jul 2003.



Hierarchical FPGA cluster<strong>in</strong>g to improve routability<br />

Zied Marrakchi, Hayder Mrabet, Habib Mehrez<br />

LIP6-ASIM Laboratory, Université Paris 6, Pierre et Marie Curie<br />

4 Place Jussieu, 75252 Paris, France<br />

E-mail: zied.marrakchi@lip6.fr<br />

ABSTRACT<br />

In this paper we present a new clustering technique, based on multilevel partitioning, for hierarchical FPGAs. The purpose of this technique is to reduce area and power by considering routability in the early steps of the CAD flow. We show that this technique can reduce the number of tracks needed in the routing step by 15% compared with other packing tools.

1. INTRODUCTION<br />

Field Programmable Gate Arrays (FPGAs) have gained rapid commercial acceptance thanks to their reconfigurability and low cost. The speed and area efficiency of an FPGA are directly related to the granularity of its logic blocks. If the logic blocks are fine-grained, the circuit to be implemented will be distributed over a larger number of logic blocks. This has a negative impact on routability, since more blocks need to be interconnected. Recently, FPGA vendors have introduced hierarchical FPGAs consisting of logic clusters. Examples of such devices are the Xilinx Virtex and the Altera Apex. In these architectures several Look-Up Tables (LUTs) are clustered into one logic block to provide better performance, especially for communication, and to exploit signal sharing among LUTs. Our work focuses on how to adapt an existing multilevel partitioning tool to the FPGA clustering problem, in which some constraints imposed by the architecture (number of pins and cluster size) must be respected.

2. PREVIOUS WORK<br />

Prior research on clustered FPGA architectures has focused mainly on area and delay optimisation. Betz et al. [1] proposed a packing/clustering algorithm, Vpack, for hierarchical FPGAs [5]. The main idea in their work was to pack a technology-mapped circuit into clusters under given size and input/output pin constraints. The same authors introduced a tool, T-Vpack [2], using a timing-driven packing approach based on the idea of packing blocks on timing-critical paths to exploit fast local interconnects. The clusters generated using T-Vpack use an average of 12% fewer tracks than the clusters generated using Vpack for the same array size. A recent work, Rpack [3], presented a routability-driven packing algorithm which first identifies routability factors and then prioritizes these factors in an improved clustering function. This approach produces routing track counts comparable to those generated by T-Vpack.


3. ARCHITECTURE OVERVIEW<br />

The FPGA we are targeting has an island-style structure. As can be seen in Figure 1, the circuit is composed of clustered logic blocks (CLBs), switch blocks, connection blocks, and I/O blocks. Each CLB, which implements the user’s logic, has inputs and outputs connected by the routing network. It contains N basic logic elements (BLEs) grouped together. Each BLE contains a k-input lookup table (a k-LUT) followed by a bypass flip-flop. The LUT inputs are chosen from among a set of shared cluster inputs. In our case k = 4.

The connection block connects the input and output pins of a CLB to the routing channel, and the switch block connects the wires of two intersecting channels. The number of tracks between any two neighboring clusters is uniform and is called the channel width. The number of logic clusters that each wire segment spans before going through a switch box is called the track segment length. We assume that a cluster of size n has (2n + 2) input pins and n output pins. Indeed, this is sufficient to achieve full logic connectivity, as shown by Betz in [1]. In addition, we assume that all segments are of length 1.

Figure 1. General architecture model<br />

4. MULTILEVEL LOGIC CLUSTERING APPROACH<br />

Clustering is done in two phases. First, we apply a k-way partitioning to the circuit: given a circuit consisting of a set of modules and a set of signal nets, we divide it into k clusters such that the number of inter-cluster signal nets is minimized. Because we use an existing partitioning tool, we cannot impose some constraints in advance, such as the number of pins per cluster. In the second phase, we therefore move some vertices among the partitions to satisfy these constraints.<br />

4.1 Running the partitioning<br />

In this phase, we apply a multilevel partitioning algorithm to divide the BLEs into clusters. Because the FPGA partitioning problem requires the BLEs to be divided into k roughly equal parts, we use an algorithm that directly computes the k partitions: hMETIS-Kway [4], a k-way partitioning algorithm based on the multilevel paradigm.<br />

As shown in figure 2, the hypergraph is successively coarsened and the coarsest hypergraph is directly partitioned into k parts. This k-way partitioning is then successively refined as it is projected back onto the original hypergraph.<br />

Figure 2. The various phases of the multilevel clustering approach (coarsening phase, uncoarsening phase, constraints respect)<br />

If special properties of the interconnect can be exploited during clustering, significant gains in routability can be obtained. When we run the partitioning, we choose an objective function that minimises the number of external nets and eliminates high-density nets, because a net with a larger number of terminals is harder to route.<br />
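The "number of external nets" objective can be illustrated with a small sketch. This is not the hMETIS implementation; the function name, the net dictionary and the block-to-cluster map are assumptions made for the example only.

```python
from typing import Dict, List


def count_external_nets(nets: Dict[str, List[str]], part_of: Dict[str, int]) -> int:
    """Count nets whose terminals fall in more than one cluster."""
    external = 0
    for blocks in nets.values():
        clusters = {part_of[b] for b in blocks}  # clusters touched by this net
        if len(clusters) > 1:
            external += 1
    return external


# Hypothetical 4-block circuit split into two clusters (0 and 1).
nets = {"n1": ["a", "b"], "n2": ["b", "c", "d"], "n3": ["c", "d"]}
part_of = {"a": 0, "b": 0, "c": 1, "d": 1}
print(count_external_nets(nets, part_of))  # only n2 spans both clusters
```

A partitioner that minimises this count keeps as many nets as possible inside a single cluster.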

4.2 Respecting Constraints<br />

When the k-way hMETIS partitioning is run, it is impossible to impose constraints on the partition size and on the number of inputs per cluster. As shown in figure 2, a “constraints respecting” step was therefore added at the end of the uncoarsening phase. First, we check whether the number of blocks in each cluster exceeds the limit imposed by the FPGA architecture; if it does, we move blocks out of that cluster and place them in other clusters. Second, we check whether the number of external input nets exceeds the number of cluster input pins allowed by the architecture; if it does, we also move blocks. Such moves modify the partitions obtained and can degrade the objective function, so in both cases we select the candidates to move with the best gain. The gain is defined as the reduction in the number of external nets (approximately the number of pins per cluster) obtained when we move a block B out of a cluster C.<br />

0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

$$\mathrm{Gain}(B, C) = \sum_{i \in \mathrm{Nets}(B)} g(i, \mathrm{Nets}(C), B)$$<br />

We now describe how the gain for each net is computed. Each block output pin is accessible from outside the cluster and output pins are not shared, so the clusters have no output pin constraint. Consequently, saving output pins brings no gain.<br />

Figure 3. Logic block being moved from a cluster (labels: cluster C, block to move, nets N1 to N4)<br />

In figure 3, we show a candidate basic block B and a cluster C. The nets N1, N2, N3 and N4 contribute differently to the gain of moving block B.<br />

N1 is connected to input pins of two blocks inside cluster C, so the external input pin is still needed after the move and the gain due to N1 is 0. All terminals of N2 are inside the cluster (internal net), so one input pin of the cluster would be used for N2 if we moved block B out; the gain due to N2, in terms of used input pins per cluster, is -1. N3 is connected only to an input pin of block B in cluster C, so moving B frees an input pin and the gain is 1. The driving pin of N4 is the output of block B, and N4 has an input pin inside cluster C. If we move B, an input pin of the cluster is needed to connect N4 to its terminals outside the cluster, so N4 contributes -1 to the block gain.<br />
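The four cases discussed for figure 3 can be summed in a few lines. The sketch below is only an illustration: the case labels used as dictionary keys are invented for the example, and the per-net values are the ones stated in the text.

```python
# Per-net contribution when block B is moved out of cluster C, keyed by
# (pin of the net on B, situation of the net's other pins in C).
# The key strings are illustrative labels, not names from the paper.
REMOVAL_GAIN = {
    ("in", "other_input_pins_in_C"): 0,    # N1: the cluster still needs the input pin
    ("in", "all_pins_in_C"): -1,           # N2: internal net becomes external
    ("in", "B_has_only_pin"): 1,           # N3: an input pin is freed
    ("out", "other_input_pins_in_C"): -1,  # N4: C must now bring B's output back in
}


def removal_gain(nets_of_b):
    """Sum the per-net gains g over all nets connected to block B."""
    return sum(REMOVAL_GAIN[case] for case in nets_of_b.values())


figure3 = {
    "N1": ("in", "other_input_pins_in_C"),
    "N2": ("in", "all_pins_in_C"),
    "N3": ("in", "B_has_only_pin"),
    "N4": ("out", "other_input_pins_in_C"),
}
print(removal_gain(figure3))  # 0 - 1 + 1 - 1 = -1
```

Under these per-net values, moving block B out of cluster C would cost one input pin overall, so B would be a poor candidate.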

Table 1 lists all the cases that yield different gains for a single net connected to the candidate block.<br />

Table 1. Gain of moving a block according to a single net<br />

pin in B -> pin in C | Status in-pin | gain<br />
out -> out | B has the only pin | 0<br />
out -> out | Other in-pins in C | -1<br />
in -> in | B has the only pin | 1<br />
in -> in | Other in-pins in C | 0<br />
in -> in | All pins in C | -1<br />
in -> out | - | 0<br />

Once the candidate block to move has been selected, we must select the best receiving cluster. The receiving cluster must satisfy the following conditions:<br />

• The number of blocks is less than the limit imposed by the architecture;<br />

• The number of input pins does not exceed the number imposed by the architecture.<br />

When we add a block to a cluster, we must not exceed the number of inputs, so we compute a gain function for each cluster. The gain is the number of inputs that will be added (negative value) or saved (positive value) when we insert the block into the cluster.<br />
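The two receiver conditions and the gain-based choice can be sketched as follows. This is a hedged illustration, not the authors' code: the `Cluster` record, the `best_receiver` function and the example numbers are assumptions, and the limits use the cluster size n = 8 and the 2n + 2 input pins stated earlier.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    num_blocks: int   # blocks currently packed in the cluster
    used_inputs: int  # cluster input pins currently in use


def best_receiver(clusters, gains, max_blocks, max_inputs):
    """Return the feasible cluster with the highest insertion gain, or None."""
    best = None
    for c in clusters:
        g = gains[c.name]
        if c.num_blocks >= max_blocks:      # no room for one more block
            continue
        if c.used_inputs - g > max_inputs:  # a negative gain adds input pins
            continue
        if best is None or g > gains[best.name]:
            best = c
    return best


n = 8  # cluster size; a cluster then has 2n + 2 input pins
clusters = [Cluster("C1", 8, 10), Cluster("C2", 7, 18), Cluster("C3", 6, 15)]
gains = {"C1": 1, "C2": -1, "C3": 0}  # hypothetical insertion gains
chosen = best_receiver(clusters, gains, max_blocks=n, max_inputs=2 * n + 2)
print(chosen.name)  # C1 is full, C2 would need a 19th input pin, so C3 remains
```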

Figure 4. Logic block being inserted in a cluster (labels: cluster receiver C, block to insert, nets N1 to N3)<br />

In figure 4, a block B and a candidate cluster C are presented. Block B shares two nets with cluster C: N1 and N2. An input pin of N1 is already inside cluster C. Adding block B to the cluster places another input terminal of N1 inside the cluster, which does not change the number of input pins used, so the gain is 0. The driving pin of N2 is the output of block B, and N2 has an input pin inside cluster C. If we insert B, an input pin of the cluster is no longer needed to bring N2 in from outside; that pin is freed for another net connection and the gain for N2 is 1. N3 has no pins inside cluster C, so one input pin of the cluster will be used for N3 and the gain due to N3, in terms of used input pins per cluster, is -1.<br />
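The insertion case of figure 4 can be summed in the same way as the removal case. The case labels below are invented for illustration; the values are those given in the text for N1, N2 and N3.

```python
# Per-net contribution when block B is inserted into cluster C.
# The key strings are illustrative labels, not names from the paper.
INSERTION_GAIN = {
    "input_on_net_already_entering_C": 0,  # N1: the input pin is simply shared
    "output_drives_input_in_C": 1,         # N2: the net becomes locally driven
    "input_on_net_new_to_C": -1,           # N3: a new input pin is consumed
}


def insertion_gain(nets_of_b):
    """Sum the per-net gains g over all nets connected to block B."""
    return sum(INSERTION_GAIN[case] for case in nets_of_b.values())


figure4 = {
    "N1": "input_on_net_already_entering_C",
    "N2": "output_drives_input_in_C",
    "N3": "input_on_net_new_to_C",
}
print(insertion_gain(figure4))  # 0 + 1 - 1 = 0
```

With these values, inserting B leaves the number of used input pins of C unchanged.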

The gain of inserting a block B into cluster C is computed as follows:<br />

$$\mathrm{Gain}(B, C) = \sum_{i \in \mathrm{Nets}(B)} g(i, \mathrm{Nets}(C), B)$$<br />

where g(i, Nets(C), B) is the gain in input pins of cluster C due to net i, as defined in table 2.<br />

Table 2. Gain of inserting a block according to a single net<br />

pin in B -> pin in C | Status in-pin | gain<br />
in -> out, in -> in | - | 0<br />
out -> in | - | 1<br />
in | New net | -1<br />
out | - | 0<br />

5. EXPERIMENTAL RESULTS<br />

We implemented our clustering technique on top of VPR [5] and ran it on a 3 GHz Pentium 4 machine. We placed and routed 18 of the largest MCNC random benchmark circuits [6] on clustered FPGAs. A cluster size of 8 was used in our experiments.<br />

5.1 Routability and power consumption<br />

Table 3 shows the routing track counts for T-VPack [2], RPack [3] and our clustering technique. For the same number of clusters, T-VPack and RPack use about 15% more tracks than our clustering technique. Fewer tracks mean not only less wiring area but also smaller routing switches. The reduction in required tracks is directly related to the interconnect power saving.<br />

Table 3. Routing tracks (channel width per method)<br />

Circuits Clusters Channel(T-VPack) Channel(RPack) Channel(Ours)<br />

alu4 192 26 34 26<br />

apex2 240 34 35 33<br />

Apex4 165 37 35 35<br />

bigkey 214 17 15 11<br />

des 200 17 18 15<br />

diffeq 189 20 19 17<br />

dsip 172 14 24 11<br />

elliptic 454 37 32 30<br />

ex1010 599 41 42 31<br />

ex5p 139 37 36 31<br />

frisc 446 39 34 36<br />

misex3 178 29 32 29<br />

pdc 582 52 56 51<br />

s38417 802 29 25 18<br />

s38584 806 32 26 20<br />

seq 221 33 37 34<br />

spla 469 42 48 39<br />

tseng 133 21 21 13<br />

average 344.48 30.49 31.16 25.94
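The averages in the last row of table 3 can be turned into relative figures. The sketch below uses only those three averages; the function names are illustrative, and the two definitions (saving relative to the other tool, or extra tracks relative to ours) are shown side by side because they give different percentages.

```python
# Average channel widths taken from the last row of table 3.
avg_tracks = {"T-VPack": 30.49, "RPack": 31.16, "Ours": 25.94}


def saved_tracks_pct(other, ours):
    """Tracks saved by our method, as a percentage of the other tool's tracks."""
    return 100.0 * (other - ours) / other


def extra_tracks_pct(other, ours):
    """Extra tracks used by the other tool, as a percentage of our tracks."""
    return 100.0 * (other - ours) / ours


for tool in ("T-VPack", "RPack"):
    saved = saved_tracks_pct(avg_tracks[tool], avg_tracks["Ours"])
    extra = extra_tracks_pct(avg_tracks[tool], avg_tracks["Ours"])
    print(f"{tool}: {saved:.1f}% saved, {extra:.1f}% extra")
```

With the table averages, the saving relative to T-VPack comes out close to 15%, and the saving relative to RPack is somewhat higher.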


5.2 Circuit speed and run time<br />

We now examine the effect of our clustering method on circuit speed. We placed and routed the same benchmarks on clustered FPGAs using a timing-driven algorithm. Table 4 shows the critical path results for T-VPack [2] and our clustering technique. Speed performance is almost the same, so our clustering method addresses area and speed at the same time.<br />

Table 4. Critical path and run time<br />

Circuits T-VPack:C.Path(ns) T-VPack:Runtime(s) Ours:C.Path(ns) Ours:Runtime(s)<br />

alu4 59.8 47.3 55.3 50.2<br />

apex2 76.0 100.2 69.1 90.3<br />

Apex4 65.7 71.5 63.4 57<br />

bigkey 43.6 58.2 29.6 38<br />

des 46.5 68.3 54.9 62.4<br />

diffeq 43.7 41.5 56.7 39.1<br />

dsip 52.4 45.4 33.4 41.9<br />

elliptic 81.29 260.5 99.7 265.3<br />

ex1010 86.8 525 98.6 495.4<br />

ex5p 110.6 65.2 108.3 56.7<br />

frisc 53.9 280.3 60.0 290.8<br />

misex3 59.3 59.3 57.5 60.7<br />

pdc 126.7 818.7 121.7 825.5<br />

s38417 87.3 360 77.7 341.4<br />

s38584 59.3 420.3 55.9 413.8<br />

seq 63.5 79.5 62.7 88.2<br />

spla 98.1 314 98.7 320.7<br />

tseng 51.3 33.3 50.4 35<br />

average 70.2 202.7 69.6 198.5<br />

The time consumed by our partitioning method is relatively significant, mainly because of the “uncoarsening” and “constraints respecting” phases. Table 4 shows the total time needed to run clustering, placement and routing in both cases. By spending more time in the clustering stage, we reduce the time consumed in the placement and routing phases.<br />

6. CONCLUSION<br />

In this paper we proposed a multilevel partitioning method for cluster-based FPGAs. The method improves routability by decreasing the number of tracks required in the FPGA routing. It also has a good effect on the critical path. These improvements were achieved thanks to the multilevel partitioning and the choice of the objective function. These results in terms of area and power reduction are important for embedding FPGAs in SoCs.<br />

7. REFERENCES<br />

[1] V. Betz, J. Rose and A. Marquardt, “Architecture and CAD for Deep-Submicron FPGAs”, Kluwer Academic Publishers, 1999.<br />

[2] A. Marquardt, V. Betz and J. Rose, “Using Cluster-Based Logic Blocks and Timing-Driven Packing to Improve FPGA Speed and Density”, ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey, CA, February 1999, pp. 37-46.<br />

[3] E. Bozorgzadeh, S. Ogrenci-Memik, M. Sarrafzadeh, “RPack: Routability-driven Packing for Cluster-based FPGAs”, Proceedings, Asia-South Pacific Design Automation Conference, January 2001.<br />

[4] G. Karypis and V. Kumar, “Multilevel k-way Hypergraph Partitioning”, DAC99, New Orleans, Louisiana.<br />

[5] V. Betz, J. Rose, “A New Packing, Placement and Routing Tool for FPGA Research”, Proc. Seventh FPLA, pp. 213-222, 1997.<br />

[6] S. Yang, “Logic Synthesis and Optimization Benchmarks”, user guide version 3.0, MCNC, Jan. 1991.<br />


AN EVOLUTIONARY APPROACH FOR<br />

SYMMETRICAL FIELD PROGRAMMABLE GATE<br />

ARRAY PLACEMENT<br />

M. Yang 1, A.E.A. Almaini 1, L. Wang 2, P.J. Wang 3<br />

1 School of Engineering, Napier University, 10 Colinton Road, Edinburgh, EH10 5DT, UK<br />

2 Microelectronics Department, Fudan University, 220 Handan Road, Shanghai, 200433, China<br />

3 Faculty of Information Science and Technology, Ningbo University, 315211, China<br />

E-mail: m.yang@napier.ac.uk<br />

ABSTRACT<br />

An evolutionary computation method is used to place a set of Microelectronics Center of North Carolina (MCNC) benchmark circuits on a traditional symmetrical Field Programmable Gate Array (FPGA). The experimental results are compared with the state-of-the-art results from Versatile Placement and Routing (VPR). The proposed algorithm achieves promising performance in terms of routing channel density.<br />

1. INTRODUCTION<br />

The logic capacity of commercial FPGAs has increased significantly as process geometries have shrunk to deep-submicron dimensions such as 0.13 micron. Placement is an NP-complete problem, so the continuous growth in the number of logic elements raises new challenges.<br />

The routing resources in an FPGA are prefabricated and are severely limited compared with those of a Mask Programmable Gate Array (MPGA). To achieve 100 percent placement and routing, many algorithms have been proposed, such as the genetic algorithm (GA) [1] [2], simulated annealing (SA) [3] [4] [5] and particle swarm optimization (PSO) [6].<br />

The objective of this paper is to present a placement solution for the traditional symmetrical FPGA [7] using a GA, one of the evolutionary computation methods. A GA is a stochastic search algorithm based on biological evolution models, whose main advantages lie in its robustness of search and problem independence. The basic concepts of GAs were developed by Holland in 1975 [8]. Although a GA is robust and explores a wide search space, it normally takes a large number of generations to converge to the optimum, resulting in long CPU times. This makes fast prototyping of FPGAs difficult. A modified GA with a greedy scheme is therefore introduced.<br />

Section 2 gives a brief description of the symmetrical FPGA architecture. Section 3 presents the proposed genetic algorithm. Experimental results showing the effectiveness of the proposed approach are presented in section 4, and section 5 concludes.<br />

2. FPGA ARCHITECTURE<br />

A symmetrical FPGA [7] consists of three fundamental components: logic blocks, input and output (I/O) blocks and programmable routing resources. Logic blocks and routing resources are surrounded by I/O blocks. The symmetrical FPGA used by Xilinx is shown in Fig. 1.<br />

Figure 1. Symmetrical FPGA architecture.<br />

Fig. 2 shows a logic block of a symmetrical FPGA. It is composed of one 4-input look-up table (LUT), which implements combinational logic, one D flip-flop, and one multiplexer that selects combinational or sequential logic.<br />

Figure 2. A logic block.


3. EVOLUTIONARY APPROACH<br />

3.1 Algorithm<br />

The GA is a search technique that mimics the natural process of evolution as a means of progressing towards the optimum.<br />

It starts with an initial set of random solutions termed the population. Each individual in the population consists of a string of bits termed genes; the string itself is termed a chromosome and represents a solution to the problem. During each iteration, termed a generation, the individuals in the current population are evaluated using a fitness function. Individuals with high fitness values, i.e. good placement solutions, are more likely to be selected. Genetic operators such as crossover and mutation are employed to find good solutions. As a result, the fitness of the population improves as the number of generations increases.<br />

Figure 3. Proposed genetic algorithm.<br />

Fig. 3 shows the pseudo code for the proposed GA. The proposed algorithm modifies the standard genetic algorithm in three ways. The first is the selection operator: the 30 percent of the population with the highest fitness remains intact, while the rest is selected probabilistically based on the fitness of each individual; the higher the fitness, the more likely an individual is to be selected. The second is the greedy scheme: individuals are also selected probabilistically for greedy improvement, which tries to improve their fitness by swapping the positions of two blocks at a time for a number of iterations. The third is elitism, which ensures that good solutions survive into the next generation.<br />

3.2 Representation<br />

One possible representation of an individual is binary: a binary individual consists of a string of binary digits, where a 0 or 1 indicates whether a condition is false or true, as shown in Fig. 4. Fig. 5 shows a representation as a vector of positive integers.<br />

Figure 4. Binary representation.<br />

Figure 5. Positive integer representation.<br />

A vector of positive and negative integers is more natural for our placement optimization, so it is used to represent each individual.<br />

In this representation, only one negative integer value is used, -1, which represents an empty location. The non-negative integers lie in the range [0, n - 1], where n is the number of logic blocks. For example, if the number of logic blocks is 11, the integers are 0, 1, 2, …, 9, 10. The logic block at location (x, y) of the FPGA has a corresponding position in the chromosome, calculated as shown in Eq. 1:<br />

$$\mathrm{position} = (x - 1) \times \mathrm{size} + (y - 1)$$ (Eq. 1)<br />

where position is the position of the gene, counted from 0, and size is the size of the FPGA. For example, if the FPGA is 4 by 4, then size = 4.<br />

Figure 6. A simple example of representation.


A simple example is illustrated in Fig. 6. An empty logic block location at (1,1) corresponds to the gene at the 1st position of the vector, which therefore holds the negative integer -1. Logic block 10, located at (1,2), corresponds to the gene at the 2nd position of the vector, which holds the positive integer 10.<br />
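The mapping of Eq. 1 and the -1 convention for empty locations can be sketched as follows. The helper names and the extra block at (2, 1) are assumptions for illustration; the gene index is counted from 0, so the "1st position" of the text is index 0.

```python
EMPTY = -1  # value stored for a location that holds no logic block


def gene_position(x, y, size):
    """Eq. 1: index of the gene for FPGA location (x, y), counted from 0."""
    return (x - 1) * size + (y - 1)


def encode(placement, size):
    """Build the integer chromosome from a {(x, y): block_id} placement."""
    chromosome = [EMPTY] * (size * size)
    for (x, y), block in placement.items():
        chromosome[gene_position(x, y, size)] = block
    return chromosome


# Location (1, 1) is empty and block 10 sits at (1, 2), as in the Fig. 6 example;
# block 3 at (2, 1) is a hypothetical extra entry.
chromosome = encode({(1, 2): 10, (2, 1): 3}, size=4)
print(chromosome[:5])  # [-1, 10, -1, -1, 3]
```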

3.3 Fitness function<br />

The quality of a placement is evaluated by the fitness function: the higher the fitness, the higher the quality of the placement. Since a benchmark circuit has hundreds of nets or more, the measure is computed over all nets rather than over any subset, as shown in Eq. 2:<br />

$$f = \mathrm{maxcost} - \sum_{i=1}^{N} C(i)\,[bb_x(i) + bb_y(i)]$$ (Eq. 2)<br />

where maxcost is the worst cost for a placement and N is the total number of nets. For each net i, bb_x(i) and bb_y(i) denote the horizontal and vertical spans of its bounding box, respectively. C(i) compensates for the fact that the bounding box wire length model underestimates the wiring necessary to connect nets with more than three terminals; its value depends on the number of terminals of net i.<br />
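Eq. 2 can be evaluated directly from the terminal coordinates of each net. The sketch below makes two assumptions that are not in the text: the span of a bounding box is taken as maximum minus minimum coordinate, and `compensation` is a made-up stand-in for C(i), whose actual values are not given here.

```python
def compensation(num_terminals):
    """Hypothetical stand-in for C(i): 1.0 up to three terminals, then mild growth."""
    return 1.0 if num_terminals <= 3 else 1.0 + 0.05 * (num_terminals - 3)


def fitness(nets, maxcost):
    """Eq. 2: maxcost minus the weighted bounding-box cost summed over all nets."""
    cost = 0.0
    for terminals in nets:
        xs = [x for x, _ in terminals]
        ys = [y for _, y in terminals]
        bb_x = max(xs) - min(xs)  # horizontal span (assumed definition)
        bb_y = max(ys) - min(ys)  # vertical span (assumed definition)
        cost += compensation(len(terminals)) * (bb_x + bb_y)
    return maxcost - cost


# Two hypothetical nets given as lists of (x, y) terminal locations.
nets = [[(1, 1), (3, 2)], [(2, 2), (2, 4), (4, 4)]]
print(fitness(nets, maxcost=100.0))  # 100 - (3 + 4) = 93.0
```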

3.4 Genetic operators<br />

Since a GA works on a population, a population of individuals is first initialized and the genetic operators are then applied to it. In the proposed algorithm, the population is first evaluated with the fitness function described in section 3.3, and the individuals are sorted in increasing order of fitness. The 30 percent of individuals with the highest fitness in the current population are kept intact in the next population. The remaining 70 percent are selected as follows.<br />

The fitness of each individual is treated as a slot of a pie whose size is proportional to that fitness. M equidistant markers are placed around the pie, where M = 0.7 × K and K is the number of individuals in the population. Fig. 7 shows the sized pie and the markers around it. The total fitness of 70 percent of the population corresponds to the whole pie. A random decimal number greater than zero and less than one is generated; it determines the rotation of the markers. Each marker selects the individual whose slot it falls in, so M individuals are selected simultaneously. Although the selection procedure is random, the chance of each individual being selected is proportional to its fitness. For example, if M = 6 and the random decimal number is 0.125, the markers rotate by 45 degrees. The selected individuals are therefore 2, 3, 3, 4, 5 and 5, as shown in Fig. 7.<br />
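The marker-based selection can be sketched as below. This is an illustration under assumptions: the function name and the fitness values are invented, and the random number r is interpreted as a rotation of all markers by the fraction r of the full pie (so r = 0.125 corresponds to 45 degrees).

```python
def select_with_markers(fitnesses, m, r):
    """Pick m indices; marker j sits at fraction (r + j/m) mod 1 of the pie."""
    total = sum(fitnesses)
    selected = []
    for j in range(m):
        target = ((r + j / m) % 1.0) * total
        running = 0.0
        for index, fit in enumerate(fitnesses):
            running += fit  # right edge of this individual's slot
            if target < running:
                selected.append(index)
                break
        else:
            selected.append(len(fitnesses) - 1)  # guard against rounding at the rim
    return selected


# Equal slots: every individual is hit by exactly one marker.
print(select_with_markers([1.0, 1.0, 1.0, 1.0], m=4, r=0.125))  # [0, 1, 2, 3]
# A dominant individual (index 0) is hit by two of the three markers.
print(select_with_markers([4.0, 1.0, 1.0], m=3, r=0.125))  # [0, 0, 1]
```

Because the markers are equidistant, an individual whose slot covers a fraction p of the pie is selected roughly p × M times, which is the proportionality described above.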

Figure 7. An example of a sized pie according to the fitness of each individual and the rotation of the markers.<br />

The one-point crossover operator [1] is then employed to produce two new individuals (offspring) by combining two parent individuals. Diversity of the population is maintained by the mutation operator [1].<br />

3.5 Greedy scheme<br />

To reduce the long CPU time of a standard GA, a greedy<br />

scheme that operates on a randomly selected individual<br />

is introduced. It begins by randomly selecting<br />

one individual from the population. Then two genes are<br />

randomly selected for swapping in the selected individual,<br />

i.e. two logic blocks are randomly selected and swapped.<br />

The swap may either increase or decrease the fitness<br />

value. A swap is accepted only when the fitness value<br />

increases; swaps that reduce the fitness value are<br />

discarded. This scheme is carried out in each generation,<br />

and the two-block swap is repeated for several iterations<br />

per generation.<br />
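A minimal sketch of this greedy step is given below. The fitness function is a hypothetical stand-in (it counts blocks that already sit in their target slot), since the paper's placement fitness is not reproduced here; the function and variable names are likewise assumptions.

```python
import random

def greedy_swaps(individual, fitness, iterations, rng):
    """Try random two-gene swaps; keep a swap only if fitness strictly increases."""
    best = fitness(individual)
    for _ in range(iterations):
        i, j = rng.sample(range(len(individual)), 2)   # two distinct genes
        individual[i], individual[j] = individual[j], individual[i]
        candidate = fitness(individual)
        if candidate > best:
            best = candidate                            # accept the swap
        else:
            # Discard the swap by restoring the previous order.
            individual[i], individual[j] = individual[j], individual[i]
    return best

def toy_fitness(perm):
    # Hypothetical stand-in for the placement fitness: blocks already "in place".
    return sum(1 for slot, block in enumerate(perm) if slot == block)

rng = random.Random(0)
placement = [3, 0, 2, 1, 4]
start = toy_fitness(placement)
end = greedy_swaps(placement, toy_fitness, iterations=20, rng=rng)
print(start, end)
```

Since rejected swaps are undone, the fitness of the selected individual never decreases during the greedy step.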

4. RESULTS<br />

This section presents the experimental results obtained by<br />

the proposed algorithm. The algorithm was implemented<br />

in the C programming language. All<br />

experiments were performed on a 2.4 GHz Intel Pentium<br />

with 512 MB of memory under Red Hat Enterprise Linux AS 3.<br />

Performance is measured using 19 Microelectronics<br />

Center of North Carolina (MCNC) benchmark circuits.<br />

The same fixed parameter values are used for all tested<br />

benchmarks; they were chosen after some experiments and based on<br />

previous experience: POP_SIZE = 50, Pcrossover = 0.6,<br />

Pmutation = 0.005, Pgreedy = 0.3 and Preserve = 0.3.<br />

Placement is an intermediate stage of the Electronic<br />

Design Automation (EDA) flow, so the quality of a<br />

placement is measured by routing it. The<br />

placements generated by VPR and by our algorithm are both<br />

performed on the smallest possible FPGA size and<br />

routed by VRouter [3] with the same parameter settings for<br />

the purpose of fair comparison. The number of channel<br />

tracks required to route a placement is used to judge the<br />

quality of the placement, as listed in Table 1. The fewer<br />

channel tracks a circuit needs, the better the<br />

placement. A routing example for the tested benchmark<br />

9symml is given in Fig. 8.<br />


5. CONCLUSION<br />

An evolutionary placement algorithm for symmetrical<br />

FPGAs has been described. The proposed genetic algorithm<br />

achieves performance comparable to the state-of-the-art<br />

placement tool VPlace. Only 2 of the 19 tested MCNC<br />

benchmark circuits, clip and k2, perform worse<br />

than with VPlace. These two benchmark circuits consume one<br />

more track each.<br />

Table 1. The comparison results of rout<strong>in</strong>g<br />

channel tracks between VPR <strong>and</strong> GA.<br />

Circuit | Smallest size | Tracks, placer VPlace [3] | Tracks, placer GA (router for both: VRouter [3])<br />

5xp1 | 8x8 | 4 | 4<br />

9symml | 10x10 | 5 | 5<br />

alu2 | 15x15 | 6 | 6<br />

apex7 | 11x11 | 5 | 5<br />

b9 | 8x8 | 4 | 4<br />

cc | 6x6 | 3 | 3<br />

comp | 7x7 | 4 | 4<br />

clip | 12x12 | 5 | 6<br />

count | 7x7 | 4 | 4<br />

e64 | 17x17 | 8 | 8<br />

example2 | 19x19 | 5 | 5<br />

f51m | 8x8 | 4 | 4<br />

k2 | 23x23 | 9 | 10<br />

lal | 6x6 | 4 | 4<br />

ldd | 7x7 | 4 | 4<br />

pcler8 | 6x6 | 4 | 4<br />

term1 | 10x10 | 5 | 5<br />

too-lrg | 14x14 | 7 | 7<br />

vda | 18x18 | 8 | 8<br />

Total | | 98 | 100<br />


Figure 8. F<strong>in</strong>al rout<strong>in</strong>g of 9symml.<br />

6. REFERENCES<br />

[1] M. Yang <strong>and</strong> A.E.A. Alma<strong>in</strong>i, Hybrid Genetic<br />

Algorithm for Xil<strong>in</strong>x-style FPGA Placement, Proc. of<br />

the First Intl. Conf. on CAD/ECAD, Durham<br />

University, UK, pp. 95-100, 2004.<br />

[2] V. Schnecke <strong>and</strong> O. Vornberger, Hybrid genetic<br />

algorithms for constra<strong>in</strong>ed placement problems, IEEE<br />

Transactions on Evolutionary Computation, vol. 1,<br />

no. 4, pp. 266-277, 1997.<br />

[3] V. Betz <strong>and</strong> J. Rose, VPR: A New Pack<strong>in</strong>g,<br />

Placement <strong>and</strong> Rout<strong>in</strong>g tool for FPGA research,<br />

Proceedings of the Seventh International Workshop on<br />

Field-Programmable Logic and Applications, pp. 213-222, 1997.<br />

[4] W. Vigerske, B. Stube <strong>and</strong> M. Pleßow, Automatic<br />

Wir<strong>in</strong>g <strong>in</strong> Switch Cab<strong>in</strong>ets, Proc. of the First Intl.<br />

Conf. on CAD/ECAD, Durham University, UK, pp.<br />

90-94, 2004.<br />

[5] C. Sechen and A. Sangiovanni-Vincentelli,<br />

TimberWolf3.2: A new st<strong>and</strong>ard cell placement <strong>and</strong><br />

global rout<strong>in</strong>g package, Proc. of the 23rd DAC, pp.<br />

432-439, 1986.<br />

[6] V. G. Gudise and G. K. Venayagamoorthy, FPGA<br />

Placement <strong>and</strong> Rout<strong>in</strong>g Us<strong>in</strong>g Particle Swarm<br />

Optimization, Proc. of the IEEE Computer Society<br />

Annual Symposium on VLSI Emerg<strong>in</strong>g Trends <strong>in</strong><br />

VLSI Systems Design (ISVLSI’04), pp. 307-308,<br />

2004.<br />

[7] Xil<strong>in</strong>x Inc., XC4000E <strong>and</strong> XC4000X series FPGAs,<br />

Data sheet. 1997.<br />

[8] D. E. Goldberg, Genetic algorithms <strong>in</strong> search,<br />

optimization, <strong>and</strong> mach<strong>in</strong>e learn<strong>in</strong>g, Addison<br />

Wesley, 1989.



RoRA: A RELIABILITY-ORIENTED PLACE AND<br />

ROUTE ALGORITHM FOR SRAM-BASED FPGAS<br />

Luca Sterpone, Matteo Sonza Reorda, Massimo Violante<br />

Politecnico di Tor<strong>in</strong>o, Dipartimento di Automatica e Informatica (DAUIN), CAD group<br />

C.so Duca degli Abruzzi 24, Tor<strong>in</strong>o, Italy<br />

E-mail: luca.sterpone@polito.it<br />

ABSTRACT<br />

SRAM-based FPGA designs are extremely susceptible to<br />

Single Event Upsets (SEUs). Since the configuration<br />

memory defines the circuit that an SRAM-based<br />

Field Programmable Gate Array (FPGA) implements, any<br />

change induced by SEUs in the configuration memory<br />

may drastically modify the implemented circuit. When<br />

such devices are used in safety-critical applications,<br />

fault-tolerant techniques are needed that can mitigate the effects<br />

of SEUs in the FPGA’s configuration memory. In this paper<br />

we present a reliability-oriented place and route algorithm<br />

that is able to mitigate the effects of the considered<br />

upsets.<br />

1. INTRODUCTION<br />

SRAM-based Field Programmable Gate Arrays (FPGAs)<br />

are programmable devices used for different applications,<br />

such as signal processing, prototyping and networking.<br />

They are composed of a fixed number of routing resources<br />

(wires and programmable switches), memory modules,<br />

and logic resources (i.e., look-up tables, or LUTs, and<br />

flip-flops, or FFs). All these components are programmed by<br />

downloading a proper bitstream into an on-chip configuration<br />

memory, giving the FPGA the capability of<br />

implementing nearly any kind of digital circuit on the<br />

same chip.<br />

The content of the configuration memory is crucial for the<br />

correct operation of the circuit the FPGA implements. In<br />

SRAM-based FPGAs the mapped circuit is totally<br />

controlled by the configuration memory. When energetic<br />

particles hit the surface of the SRAM-based FPGAs, they<br />

can alter the bits compos<strong>in</strong>g the configuration memory,<br />

<strong>and</strong> therefore the circuit the FPGA implements may<br />

change its orig<strong>in</strong>al behavior.<br />

In SRAM-based FPGAs, both combinational and sequential<br />

logic are implemented by several customizable SRAM<br />

cells, which are extremely sensitive to radiation that may<br />

cause SEUs [1]. If an upset affects the combinational logic<br />

in the FPGA, it provokes a bit flip in one of the LUT<br />

cells or in the cells that control the routing. This upset has<br />

a permanent effect, and is correctable only at the next load<br />

of the configuration bitstream. Conversely, when an upset<br />

affects the user sequential logic it has a transient effect,<br />

because the flip-flop’s next load corrects it.<br />


An earlier solution consisted of developing radiation-hardened<br />

FPGAs by resorting to special manufacturing<br />

technologies. Although effective, this solution is very<br />

expensive because radiation-hardened FPGAs cost several<br />

orders of magnitude more than commercial-off-the-shelf<br />

devices. The solution that is currently under investigation<br />

consists of adopting fault-tolerant architectures to<br />

implement hardened circuits while using commercial-off-the-shelf<br />

FPGAs.<br />

The Triple Modular Redundancy (TMR) design technique is<br />

the high-level SEU mitigation technique most often used<br />

today to protect designs synthesized in SRAM-based<br />

FPGAs: since memory elements, interconnections and<br />

combinational gates are all susceptible to SEUs, full-module<br />

redundancy must be adopted. The TMR<br />

implementation uses three identical logic blocks<br />

performing the same task in parallel, with the corresponding<br />

outputs being compared through a majority voter. Although<br />

TMR is suitable for protecting FPGA memory elements,<br />

interconnections and combinational gates, it is able to<br />

mitigate only partially the effects of SEUs affecting those<br />

resources, as we have observed through a detailed<br />

analysis of FPGA resources and extensive fault-injection<br />

experiments [2].<br />
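The voting principle, and the reason it protects only partially, can be illustrated with a small sketch. The replica outputs are modeled as integers and the function name is an assumption; this is not taken from the paper's tool flow.

```python
def majority_vote(a, b, c):
    """Bitwise 2-out-of-3 majority of three replica outputs (as integers)."""
    return (a & b) | (a & c) | (b & c)

golden = 0b1011
# One faulty replica is out-voted by the other two.
print(majority_vote(golden, golden ^ 0b0100, golden) == golden)           # True
# The same wrong bit in two replicas defeats the voter.
print(majority_vote(golden ^ 0b0100, golden ^ 0b0100, golden) == golden)  # False
```

The second case is the one that matters for configuration-memory upsets: a single SEU that corrupts two replicas at once is not masked by the voter.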

In this paper we propose a reliability-oriented place <strong>and</strong><br />

route algorithm (RoRA) able to place <strong>and</strong> route the logic<br />

functions <strong>and</strong> the signals of a design <strong>in</strong> such a way that the<br />

number of SEUs affect<strong>in</strong>g the configuration memory <strong>and</strong><br />

possibly caus<strong>in</strong>g FPGA misbehaviors is significantly<br />

reduced with respect to the TMR approach. The algorithm<br />

is applicable to every k<strong>in</strong>d of design implemented on<br />

SRAM-based FPGAs.<br />

2. PRELIMINARIES<br />

A Field Programmable Gate Array consists of an array of<br />

logic blocks that can be <strong>in</strong>terconnected selectively to<br />

implement different designs. An FPGA logic block is<br />

typically capable of implement<strong>in</strong>g many different<br />

comb<strong>in</strong>ational <strong>and</strong> sequential logic functions. Today,<br />

commercial FPGAs use logic blocks that are based on<br />

transistor pairs, basic small gates such as two-<strong>in</strong>put<br />

NANDs or exclusive ORs, multiplexers, Look-up tables<br />

(LUTs) <strong>and</strong> wide-fan<strong>in</strong> AND-OR structures. An FPGA<br />

rout<strong>in</strong>g architecture <strong>in</strong>corporates wire segments of vary<strong>in</strong>g<br />

length that can be <strong>in</strong>terconnected via electrically<br />

programmable switches. The distribution of the length of


the wire segments affects directly the density <strong>and</strong><br />

performance achieved by an FPGA.<br />

The generic SRAM-based FPGA model used in this work<br />

consists of three kinds of resources, as shown in fig. 1:<br />

logic blocks, switch boxes and wiring segments. The logic<br />

blocks contain the combinational and sequential logic<br />

required to implement the user circuit. The input and<br />

output signals are connected to adjacent switch boxes<br />

through wiring segments. The switch boxes are switch<br />

matrices in which several programmable interconnect points<br />

(PIPs), i.e. pass transistors, are available; they are called routing segments<br />

and are controlled by the configuration memory. We<br />

modeled the resources within an SRAM-based FPGA as<br />

vertices and edges of a graph. Thus we have logic vertices<br />

that model the FPGA’s logic blocks, routing vertices that<br />

model the input/output points of the switch boxes, routing<br />

edges that model the PIPs and wiring edges that model the<br />

FPGA’s wiring segments. Edges and vertices are colored<br />

to indicate that the corresponding FPGA resource is used to<br />

implement a circuit. When the FPGA implements<br />

different circuits, or different replicas of the same circuit,<br />

a different color is used to mark each circuit.<br />

Figure 1. Generic FPGA architecture model (labels: logic blocks, wiring segments, switch boxes)<br />

3. SEU EFFECTS ON FPGA’S<br />

CONFIGURATION MEMORY<br />

The effects <strong>in</strong>duced by SEUs on SRAM-based FPGAs<br />

have been recently <strong>in</strong>vestigated thanks to radiation<br />

experiments [3]. More recently, an analysis that comb<strong>in</strong>es<br />

the results of radiation test<strong>in</strong>g with those obta<strong>in</strong>ed while<br />

analyz<strong>in</strong>g the mean<strong>in</strong>g of every bit <strong>in</strong> the FPGA’s<br />

configuration memory was presented <strong>in</strong> [4]. Although<br />

SEUs are transient by def<strong>in</strong>ition, when they are orig<strong>in</strong>ated<br />

<strong>in</strong> the configuration memory their effects are permanent,<br />

s<strong>in</strong>ce SEUs rema<strong>in</strong> latched until the configuration memory<br />

is rewritten with new configuration data. The errors<br />

produced by SEUs <strong>in</strong> the FPGA’s configuration memory<br />

can be classified <strong>in</strong> two different categories: errors that<br />

affect logic blocks <strong>and</strong> errors that affect the switch boxes.<br />

Considering the logic-block errors, several different<br />

phenomena may be observed, depending on which<br />

resource of the logic block the SEU modified:<br />

1. LUT error: the SEU modified one bit of a LUT, thus<br />

chang<strong>in</strong>g the comb<strong>in</strong>ational function it implements.<br />


2. MUX error: the SEU modified the configuration of a<br />

MUX <strong>in</strong> the logic block; as a result, signals are not<br />

correctly forwarded <strong>in</strong>side the logic block.<br />

3. FF error: the SEU modified the configuration of a<br />

FF, for example chang<strong>in</strong>g the polarity of the reset<br />

l<strong>in</strong>e, or that of the clock l<strong>in</strong>e.<br />

To model faulty logic blocks in the routing graph<br />

described in the previous section, we use<br />

the color black to mark each vertex corresponding to a<br />

faulty logic block.<br />

As far as switch boxes are concerned, different<br />

phenomena are possible. Although an SEU affecting a<br />

switch box modifies the configuration of one PIP, both<br />

single and multiple effects can originate. Single<br />

effects happen when the modification induced by the SEU<br />

alters the affected PIP only. In this case one situation that may<br />

happen is an Open: the SEU changes the configuration of the<br />

affected PIP in such a way that the existing connection<br />

between two routing segments is opened. In the routing<br />

graph we model such a situation by deleting the routing<br />

edge corresponding to the PIP that connects the two<br />

routing vertices.<br />

In order to describe the multiple effects in terms of<br />

modifications to the routing graph, let us consider a pair of<br />

connections between four routing vertices, AS/AD and<br />

BS/BD, as shown in fig. 2.a. Depending on the FPGA<br />

architecture, it is possible that a single<br />

configuration memory bit controls two or more routing<br />

segments. Thus, an SEU affecting it may modify two or<br />

more routing segments, provoking multiple errors.<br />

According to our graph model, SEUs affecting<br />

configuration memory bits controlling routing resources<br />

can modify two or more edges, generating multiple errors.<br />

We identified the following modifications that could be<br />

introduced between AS/AD and BS/BD: Short (fig. 2.b), where a<br />

new edge is added; Open (fig. 2.c), where two edges are deleted;<br />

and Open/Bridge (fig. 2.d), where one edge is added and one<br />

is deleted.<br />
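The three multiple effects can be read as edits to the edge set of the routing graph. The sketch below is only an illustration of that reading: the text does not say which specific edge is added, so the choice of the edge AS-BD is an assumption, as are all function names.

```python
# Two original connections, as in fig. 2.a: AS -> AD and BS -> BD.
ORIGINAL = frozenset({("AS", "AD"), ("BS", "BD")})

def short(edges):
    # Assumed example: a new edge ties net A to net B.
    return edges | {("AS", "BD")}

def open_(edges):
    # Both existing edges are deleted.
    return edges - {("AS", "AD"), ("BS", "BD")}

def open_bridge(edges):
    # Assumed example: AS -> AD is deleted and AS -> BD is added.
    return (edges - {("AS", "AD")}) | {("AS", "BD")}

for effect in (short, open_, open_bridge):
    after = effect(ORIGINAL)
    print(effect.__name__,
          "added:", sorted(after - ORIGINAL),
          "deleted:", sorted(ORIGINAL - after))
```

In every case a single upset changes the graph in a way that touches both connections, which is what makes it a multiple error.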

Figure 2. Possible multiple effects induced by one SEU on the connections AS/AD and BS/BD: (a) original connections, (b) Short, (c) Open, (d) Open/Bridge<br />

Since one SEU may be responsible for multiple errors,<br />

hardening techniques developed according to the single-fault<br />

assumption are not adequate to cope with the effects<br />

of SEUs in the configuration memory. As a result of the<br />

analysis we performed on FPGA architectures, we defined<br />

the following constraints that must be enforced by the<br />

place and route algorithm in order to develop a fault-tolerant<br />

graph marking:<br />

1. All the circuit modules <strong>and</strong> connections must be<br />

replicated three times;<br />



2. The outputs of the three circuit replicas must be voted<br />

according to the TMR principle;<br />

3. The elements of the result<strong>in</strong>g TMR architecture (logic<br />

functions <strong>and</strong> connections among them) must be<br />

placed <strong>and</strong> routed <strong>in</strong> such a way that, given the<br />

correspond<strong>in</strong>g rout<strong>in</strong>g graph, each new edge that is<br />

added (or deleted) to (from) the graph cannot provoke<br />

any fault belong<strong>in</strong>g to the follow<strong>in</strong>g categories:<br />

a. Short between different connections belong<strong>in</strong>g<br />

to different circuit replicas;<br />

b. Open affecting different connections belonging to<br />

different circuit replicas.<br />


4. THE RORA ALGORITHM<br />

/*Placement*/<br />

generate_functions_replicas (F1, F2, F3)<br />

generate_majority_voter (F4)<br />

generate_partitions (S1, S2, S3, S4)<br />

for each logic function LF ∈ Fi<br />

place LB on Si where i = {1, 2, 3, 4}<br />

/*Rout<strong>in</strong>g*/<br />

FVSi = ∅ where i = {1, 2, 3}<br />

for each source vertex SV ∈ Fi<br />

{<br />

for each dest<strong>in</strong>ation vertex DV of SV<br />

RT = route (SV, DV)<br />

update (FVSi,RT)<br />

}<br />

Figure 3. The flow of the proposed Reliability-Oriented<br />

Place <strong>and</strong> Route Algorithm RoRA<br />

In general, the commonly used design flow to map<br />

designs onto an SRAM-based FPGA consists of three<br />

phases. In the first phase, a synthesizer is used to<br />

transform a circuit model coded in a hardware description<br />

language into an RTL design. In the second phase, a<br />

technology mapper transforms the RTL design into a gate-level<br />

model composed of look-up tables (LUTs) and flip-flops<br />

(FFs) and binds them to the FPGA’s resources<br />

(producing the technology-mapped design). In the third<br />

phase, the technology-mapped design is physically<br />

implemented on the FPGA by the place and route<br />

algorithm.<br />

The problem of how to physically implement a circuit on an<br />

FPGA device is divided into two subproblems: placement<br />

and routing. The main reason behind this decomposition<br />

is to reduce the problem complexity. Our proposed<br />

reliability-oriented place and route algorithm, called<br />

RoRA, first reads a technology-mapped design. Then, it<br />

performs a reliability-oriented placement of each logic<br />

function, and finally it routes the signals between<br />

functions in such a way that multiple errors affecting two<br />

different connections are not possible.<br />

The algorithm we developed is described in fig. 3, where<br />

the placement and routing steps are shown in C-like<br />

pseudo-code. Our proposed RoRA Placement algorithm<br />

performs a robust placement, which implements the TMR<br />

principle, by executing four distinct functions:<br />

1. The generate_functions_replicas () function first reads the<br />

design description produced after the technology<br />

mapping and identifies the logic functions in the<br />

design. Second, it generates three replicas of the<br />

logic functions belonging to the original design. Let<br />

F be the set of the original design’s logic functions:<br />

at the end of this step the three sets F1, F2 and F3 are<br />

produced.<br />

2. The generate_majority_voter () function analyzes the three<br />

logic function sets F1, F2 and F3, and generates a<br />

logic function set F4 that performs the majority<br />

voting between them.<br />

3. The generate_partitions () function partitions the routing<br />

graph’s vertices into four non-overlapping sets, where<br />

each set Si (i=1,2,3,4) has enough logic vertices to<br />

contain the logic functions of the corresponding set Fi.<br />

4. Every logic function in set Fi is placed heuristically<br />

on the logic vertices in set Si, where i=1, 2, 3, 4. This<br />

phase takes care of marking the graph, by assigning<br />

each logic function to exactly one logic vertex in our<br />

routing graph.<br />

The RoRA placement algorithm places each logic<br />

function in Fi on the graph vertices belonging to Si, with<br />

the majority voter placed on S4. After the placement<br />

process, each set Si contains exclusively the functions of set<br />

Fi. This solution allows us to guarantee that single or<br />

multiple effects within one set Si only do not provoke any<br />

misbehavior of the circuit. Indeed, according to our<br />

placement, only multiple effects on the border of two<br />

different sets Si ≠ Sj may generate multiple errors that<br />

affect two different replicas.<br />

When all the logic functions are placed on the<br />

corresponding logic vertex sets, RoRA performs the routing<br />

of the interconnections between the logic vertices.<br />

Basically, the RoRA Routing algorithm works on the<br />

routing graph we developed, and it routes each connection<br />

between two logic vertices through the shortest path it can<br />

find. During path selection, the RoRA Routing algorithm<br />

dynamically labels the graph’s routing vertices, in such a<br />

way that it avoids the instantiation of two connections that<br />

may be subject to Short effects. Each graph routing vertex<br />

(RV) is labeled as free, used or forbidden, with the<br />

following meanings:<br />

1. Free: the rout<strong>in</strong>g vertex is not used by any<br />

connection.<br />

2. Used: the rout<strong>in</strong>g vertex is already used by a<br />

connection.<br />

3. Forbidden: a rout<strong>in</strong>g vertex RV is forbidden if <strong>and</strong><br />

only if:<br />

a. It belongs to set Si (RV∈Si), <strong>and</strong><br />

b. At least one rout<strong>in</strong>g edge, or one wir<strong>in</strong>g edge<br />

exists between RV <strong>and</strong> another vertex RV’<br />

belong<strong>in</strong>g to Sj (RV’∈Sj), where i ≠ j.
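Conditions (a) and (b) of the forbidden label can be sketched as a check over the edges of the routing graph. This is a static reading of the definition, whereas the paper applies the labels dynamically during routing; the data layout (a vertex-to-set map and a list of edges) and all names are assumptions.

```python
def forbidden_vertices(partition, edges):
    """partition: routing vertex -> index i of its set Si.
    edges: iterable of (u, v) routing or wiring edges.
    Returns, per set index, the vertices with an edge into another set."""
    fvs = {i: set() for i in set(partition.values())}
    for u, v in edges:
        if partition[u] != partition[v]:      # edge crosses a set border
            fvs[partition[u]].add(u)
            fvs[partition[v]].add(v)
    return fvs

# Hypothetical five-vertex graph split over three sets.
partition = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 3}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
print(forbidden_vertices(partition, edges))
```

In this toy graph only vertex a stays usable without restriction, because every other vertex has an edge that reaches a different set.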


If RV is added to the circuit and an SEU affects the routing<br />

resources in such a way that both RV and RV’ are<br />

affected, the TMR no longer works as expected. The<br />

Forbidden Vertices Sets (FVSs), which are empty at the<br />

beginning of the RoRA routing, contain the vertices<br />

marked as forbidden and belonging to the corresponding<br />

graph routing vertices set Si.<br />

RoRA performs the rout<strong>in</strong>g of each net by tak<strong>in</strong>g <strong>in</strong>to<br />

consideration all the graph’s vertices labeled as free, <strong>and</strong> it<br />

updates progressively the FVSs add<strong>in</strong>g the vertices<br />

marked as forbidden.<br />

As soon as the net is routed, <strong>and</strong> the mark<strong>in</strong>g of the graph<br />

has been updated (i.e., the vertices <strong>in</strong> the rout<strong>in</strong>g graph,<br />

<strong>and</strong> the associated edges, have been marked as used by the<br />

circuit implementation), the update () function is used to<br />

modify the set i of forbidden vertices (FVSi), which is<br />

empty at the beg<strong>in</strong>n<strong>in</strong>g of RoRA rout<strong>in</strong>g.<br />
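The routing step can be sketched as a shortest-path search restricted to vertices labeled free. Breadth-first search is used here as a stand-in, since the text only says that the shortest path found is taken; the graph, the labels and the function name are hypothetical, and the update of the forbidden sets is not modeled.

```python
from collections import deque

def route(graph, labels, src, dst):
    """Breadth-first shortest path from src to dst through vertices labeled 'free'.
    Returns the path as a list of vertices, or None if no such path exists."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in graph[u]:
            if v not in parent and (v == dst or labels.get(v) == "free"):
                parent[v] = u
                queue.append(v)
    return None

graph = {"s": ["x", "y"], "x": ["s", "t"], "y": ["s", "z"],
         "z": ["y", "t"], "t": ["x", "z"]}
labels = {"x": "forbidden", "y": "free", "z": "free"}
path = route(graph, labels, "s", "t")
for v in path[1:-1]:
    labels[v] = "used"        # mark the routed vertices
print(path)                   # ['s', 'y', 'z', 't']
```

The direct path through x is one hop shorter but is rejected because x is forbidden, which mirrors the observation that the shortest path is not always acceptable from the dependability point of view.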

5. EXPERIMENTAL RESULTS<br />

We evaluated the robustness of the circuits obtained from<br />

RoRA using the fault injection environment presented in<br />

[2]. We considered four benchmark circuits (two adders,<br />

one multiplier and a filter) and we placed them on a Xilinx<br />

Spartan® XC2S200EPQ208 [5]. The characteristics of the<br />

adopted circuits are reported in table 1, which gives the<br />

number of FPGA slices that each circuit occupies (column<br />

Area) as well as its maximum working frequency (column<br />

Speed), for the plain, the TMR and the RoRA versions. It<br />

is interesting to observe that, for the considered<br />

benchmarks, RoRA does not introduce any area overhead<br />

with respect to the traditional TMR solution (which is<br />

about 3 times larger than the plain circuit). Conversely,<br />

when placed and routed through RoRA, the circuits<br />

become 22% slower on average than their TMR<br />

versions. This effect is the result of the dependability-oriented<br />

routing algorithm that RoRA implements: the<br />

shortest path is not always selected as the best solution,<br />

since it may not be acceptable from the dependability<br />

point of view. To measure the hardness of the obtained<br />

circuits we injected 15,000 randomly selected SEUs in the<br />

FPGA’s configuration memory. The results are reported in<br />

tab. 2, where Injected Faults is the number of injected<br />

SEUs and Wrong Answer is the number of SEUs for<br />

which the faulty circuit produces outputs that differ from<br />

those of the fault-free one.<br />


Table 2. Fault Injection results.<br />

Circuit | Injected Faults | Wrong Answer [#], Plain | Wrong Answer [#], TMR | Wrong Answer [#], RoRA<br />

Add8 | 15,000 | 14,587 | 1,352 | 30<br />

Add16 | 15,000 | 14,598 | 1,692 | 41<br />

Mul8 | 15,000 | 14,603 | 1,977 | 23<br />

Filter | 15,000 | 14,642 | 1,981 | 44<br />

6. CONCLUSIONS<br />

In this paper we propose a reliability-oriented place <strong>and</strong><br />

route algorithm able to reduce the effects of SEUs <strong>in</strong><br />

SRAM-based FPGAs. Its effectiveness was evaluated on<br />

some benchmark circuits by means of fault <strong>in</strong>jection<br />

experiments <strong>in</strong> the FPGA configuration memory. For the<br />

considered benchmarks, the capability of tolerat<strong>in</strong>g SEU<br />

<strong>in</strong>creases up to 85 times. This improvement comes without<br />

any additional area cost with respect to the TMR design<br />

technique, while a performance penalty of 22% on the<br />

average is observed.<br />
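The improvement factor can be checked against Table 2, assuming it is computed as the ratio of TMR wrong answers to RoRA wrong answers for each circuit. That reading is an assumption; the counts themselves are copied from the table.

```python
# Wrong-answer counts copied from Table 2 (15,000 injected SEUs per circuit).
wrong = {
    "Add8":   {"TMR": 1352, "RoRA": 30},
    "Add16":  {"TMR": 1692, "RoRA": 41},
    "Mul8":   {"TMR": 1977, "RoRA": 23},
    "Filter": {"TMR": 1981, "RoRA": 44},
}
# Assumed definition: how many times fewer wrong answers RoRA gives than TMR.
ratios = {name: c["TMR"] / c["RoRA"] for name, c in wrong.items()}
best = max(ratios, key=ratios.get)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x fewer wrong answers than TMR")
print("largest improvement:", best)
```

Under this reading the largest ratio is obtained for Mul8 (1,977 / 23, just under 86), which is consistent with the stated "up to 85 times".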

7. REFERENCES<br />

[1] E. Normand, “Single Event Upset at Ground Level”,<br />

IEEE Transactions on Nuclear Science, vol. 43, No. 6,<br />

Dec. 1996<br />

[2] P. Bernardi, M. Sonza Reorda, L. Sterpone, M.<br />

Violante, “On the evaluation of SEUs sensitiveness<br />

<strong>in</strong> SRAM-based FPGAs”, 10th IEEE International<br />

On-L<strong>in</strong>e Test<strong>in</strong>g Symposium, 2004, pp. 115-120<br />

[3] M. Bellato, P. Bernardi, D. Bortolato, A. C<strong>and</strong>elori,<br />

M. Ceschia, A. Paccagnella, M. Rebaudengo, M.<br />

Sonza Reorda, M. Violante, P. Zambol<strong>in</strong>,<br />

“Evaluat<strong>in</strong>g the effects of SEUs affect<strong>in</strong>g the<br />

configuration memory of an SRAM-based FPGA”,<br />

IEEE Design Automation <strong>and</strong> Test <strong>in</strong> Europe, <strong>2005</strong>,<br />

pp. 1290 - 1295<br />

[4] M. Ceschia, M. Violante, M. Sonza Reorda, A.<br />

Paccagnella, P. Bernardi, M. Rebaudengo, D.<br />

Bortolato, M. Bellato, P. Zambol<strong>in</strong>, A. C<strong>and</strong>elori,<br />

“Identification <strong>and</strong> classification of s<strong>in</strong>gle-event<br />

upsets <strong>in</strong> the configuration memory of SRAM-based<br />

FPGAs”, IEEE Transaction on Nuclear Science, Dec.<br />

2003, Vol. 50, No. 6, pp. 2088-2094<br />

[5] “Spartan-II 2.5 V FPGA family”, Xil<strong>in</strong>x, 2003.<br />

Table 1. Characteristics of the adopted circuits.<br />
Circuit | Plain: Speed [MHz] | Plain: Area [# slices] | TMR: Speed [MHz] | TMR: Area [# slices] | RoRA: Speed [MHz] | RoRA: Area [# slices]<br />
Add8 | 105 | 26 | 86 | 100 | 64 | 96<br />
Add16 | 105 | 28 | 85 | 103 | 62 | 105<br />
Mul8 | 105 | 41 | 64 | 127 | 54 | 125<br />
Filter | 104 | 46 | 65 | 132 | 58 | 138<br />


EFFICIENT VLSI IMPLEMENTATION<br />
OF MULTIPLICATION IN GF(2^n)<br />
Krzysztof Gołofit<br />
Warsaw University of Technology, Faculty of Electronics and Information Technology,<br />
Ul. Nowowiejska 15/19, 00-665 Warsaw, Poland<br />
E-mail: kgolofit@elka.pw.edu.pl<br />

ABSTRACT<br />

The paper focuses on the possibilities of hardware implementation (above all VLSI proposals) of multiplication in GF(2^n). Promising solutions were implemented in 0.35 µm technology and served as models for projecting the main circuit parameters (silicon area, speed, power consumption) onto currently used and more advanced technologies.<br />

1. INTRODUCTION<br />

It is common knowledge that multiplication is the most difficult and time-consuming operation, independently of the computational platform. For hardware solutions, binary fields (arithmetic in GF(2^n)) are attractive since the operations involve only shifts and elementary operations such as conjunction and bitwise addition modulo 2. The aim of this article is to present the logical path from the theoretical background of multiplication in GF(2^n), through consecutive models, which are often difficult (or even impossible) to realize in such hardware, to efficient and useful implementations. As a result of our considerations two alternative solutions will be shown. The first one assumes stream data processing and is based on a systolic structure. The second one concerns fast multiplication, where almost all of the operations are performed simultaneously; this case requires ONB (Optimal Normal Bases) to be used. Models of these two and other crucial solutions were implemented at the lowest possible design level: full custom VLSI (Very Large Scale Integration) technology. Such an approach allows us to assess the implementation possibilities (depending first of all on the field extension) as well as the basic circuit parameters. Familiarity with the advantages of more advanced technologies (for example additional layers of metallization) makes a few more interesting solutions possible, while still giving an estimate of the achievable parameters for each of the analysed technologies.<br />

2. HARDWARE IMPLEMENTATION<br />

OF MULTIPLICATION<br />

2.1 Theoretical background of multiplication<br />
in GF(2^n)<br />

Computational number theory gives us methods to simplify and optimize almost every arithmetical operation in finite fields. The most computationally complex operation remains multiplication, which becomes less demanding when normal bases (especially ONB) are used. A normal basis can be formed using the set<br />
$$N = \{\beta,\ \beta^{2},\ \beta^{2^2},\ \ldots,\ \beta^{2^{n-1}}\} \qquad (1)$$<br />
where β∈GF(2^n) unambiguously determines the basis N. Every element A of GF(2^n) can be uniquely represented as a linear combination of the elements of N (and can be written in the form of a vector)<br />
$$A = \sum_{i=0}^{n-1} a_i \cdot \beta^{2^i} = (a_0\ a_1\ \ldots\ a_{n-1}) \qquad (2)$$<br />
Multiplication of two elements A and B of the finite field GF(2^n) can be defined as follows<br />
$$A \cdot B = \sum_{k=0}^{n-1}\sum_{i=0}^{n-1}\sum_{j=0}^{n-1} a_i b_j t_{ij}^{(k)} \beta^{2^k} = \sum_{k=0}^{n-1} c_k \cdot \beta^{2^k} = C \qquad (3)$$<br />
where C∈GF(2^n) is the result of the multiplication and<br />
$$T_k = [t_{ij}^{(k)}] = [t_{j-i,\,k-i}] \qquad (4)$$<br />
are n multiplication tables of size n-by-n, 0 ≤ k, i, j ≤ n–1 (index differences are taken modulo n). Every table T_k can be regularly produced from the table T = [t_ij] of size n×n. Table T can be calculated from the following equation<br />
$$\beta \cdot \beta^{2^i} = \sum_{j=0}^{n-1} t_{ij} \cdot \beta^{2^j} \qquad (5)$$<br />
Optimal normal bases are those with minimal complexity of every multiplication table T_k: the number of nonzero elements takes its minimal value (2n–1). In practice there are at most two nonzero elements in every row and every column of the matrix. The transformation from table T_k to table T_{k+1} (for both NB and ONB) is a regular slanting shift of the elements in the table, which can be implemented easily in hardware.<br />
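The slanting shift can be illustrated with a short Python sketch. It assumes that relation (4) reads t_ij^(k) = t_(j-i),(k-i) with indices taken modulo n, and it uses a random bit matrix as a hypothetical stand-in for T, since the shift property depends only on the indexing. The function names are illustrative.<br />

```python
import random

n = 5
random.seed(1)
# Hypothetical stand-in for the table T = [t_ij]; any n-by-n bit matrix works here.
T = [[random.randint(0, 1) for _ in range(n)] for _ in range(n)]

def table_k(T, k):
    """Build T_k from T, assuming relation (4): t_ij^(k) = t_(j-i),(k-i) mod n."""
    n = len(T)
    return [[T[(j - i) % n][(k - i) % n] for j in range(n)] for i in range(n)]

def slant_shift(Tk):
    """Move every entry one row down and one column right, wrapping around."""
    n = len(Tk)
    return [[Tk[(i - 1) % n][(j - 1) % n] for j in range(n)] for i in range(n)]

# One diagonal shift of T_k should reproduce T_(k+1), cyclically in k.
shift_ok = all(
    slant_shift(table_k(T, k)) == table_k(T, (k + 1) % n) for k in range(n)
)
print("T_(k+1) equals slant-shifted T_k for all k:", shift_ok)
```

In hardware terms this corresponds to a register array in which each flip-flop passes its value to its diagonal neighbour once per cycle.<br />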

The equation describing bit k of vector C can finally be written, on the basis of definition (3), as a product of the n-by-n matrix T_k and the elements A and B written as n-element vectors<br />
$$c_k = A\,T_k\,B' \qquad (6)$$<br />
where B' stands for the transpose of the vector B.<br />
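As an illustration of equations (3) to (6), the following Python sketch builds the table T for a small field and checks normal-basis multiplication against ordinary polynomial-basis multiplication. Everything specific here is an assumption made for the example: the field GF(2^5), the irreducible polynomial x^5 + x^2 + 1, the brute-force search for a normal element, and all function names. Relation (4) is read as t_ij^(k) = t_(j-i),(k-i) with indices modulo n.<br />

```python
from itertools import product

N = 5
POLY = 0b100101  # x^5 + x^2 + 1, an assumed irreducible polynomial for GF(2^5)

def gf_mul(a, b):
    """Reference multiplication in GF(2^N), polynomial basis, bits packed in ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> N) & 1:
            a ^= POLY
    return r

def conjugates(beta):
    """Return [beta, beta^2, beta^4, ..., beta^(2^(N-1))], the candidate basis (1)."""
    out = [beta]
    for _ in range(N - 1):
        out.append(gf_mul(out[-1], out[-1]))
    return out

def coordinates(basis):
    """Map each field element to its bit vector in `basis`; None if not a basis."""
    table = {}
    for bits in product((0, 1), repeat=N):
        value = 0
        for bit, elem in zip(bits, basis):
            if bit:
                value ^= elem
        table[value] = bits
    return table if len(table) == 2 ** N else None

# Pick the first beta whose conjugates are linearly independent (a normal element).
for beta in range(2, 2 ** N):
    basis = conjugates(beta)
    coords = coordinates(basis)
    if coords is not None:
        break

# Equation (5): row i of T holds the coordinates of beta * beta^(2^i).
T = [coords[gf_mul(basis[0], basis[i])] for i in range(N)]

def nb_mul(a, b):
    """Equations (3), (4): c_k = sum_i sum_j a_i b_j t_(j-i),(k-i) modulo 2."""
    return tuple(
        sum(a[i] & b[j] & T[(j - i) % N][(k - i) % N]
            for i in range(N) for j in range(N)) % 2
        for k in range(N)
    )

def to_element(bits):
    """Convert a normal-basis bit vector back to the packed polynomial form."""
    value = 0
    for bit, elem in zip(bits, basis):
        if bit:
            value ^= elem
    return value

# Exhaustive check over all pairs of field elements.
all_ok = all(
    nb_mul(a, b) == coords[gf_mul(to_element(a), to_element(b))]
    for a in product((0, 1), repeat=N)
    for b in product((0, 1), repeat=N)
)
print("normal element:", bin(beta), "| normal-basis products match:", all_ok)
```

The search returns the first normal element it finds, which is not necessarily an optimal one; selecting an ONB would additionally require minimizing the number of ones in T.<br />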


2.2 VLSI hardware multiplication<br />

The first idea for a hardware VLSI implementation is to build a computational array that follows the theoretical basis almost directly. The main cell of the array consists of three elements: a flip-flop, a binary conjunction and an addition modulo 2. The multiplied vectors A and B are delivered horizontally and vertically to the array, respectively, while the flip-flops hold the values of the current multiplication matrix T_k. Following formula (3), every single cell of the array performs the binary multiplication of the bits t_ij^(k) (elements t_ij of the matrix T_k), a_i and b_j. The results are first added vertically (in the columns) and then horizontally (under the array), giving the binary value of the bit c_k. A cyclic process together with regular slanting shifts (among the flip-flops) provides the whole vector C. Unfortunately, the processing time depends cubically on the field extension n (caused by the sequence: vertical addition, horizontal addition, n-fold cyclic process).<br />

Figure 1. The structure of the cells for the cyclic<br />

multiplication circuit.<br />

Computation time can be shortened from an n^3 dependence to n^2+1 by performing the second (horizontal) addition during the cycle for the next bit, while the next vertical addition is being processed. A practical implementation of the complete array model for GF(2^11), with the vertical signal addition at the top of the array, is presented in Figure 2.<br />

Figure 2. VLSI layout of the cyclic multiplication circuit model for GF(2^11).<br />
Furthermore, we can design an array in which all of the cells process data systematically at the same time, which gives a linear dependence of computation time on n. In such a solution both the multiplied vectors and the temporary intermediate results must be transported between cells. A single cell consists of a binary conjunction, an addition modulo 2 and 3 flip-flops, for the multiplied bits a_x, b_y and the intermediate result (there is no need to store and transform the values of the table T: the set can be rigidly arranged). The first intermediate results of c_1 can be computed in the first row (alongside the vector A) and the first column (alongside the vector B) of the cells. The outcomes (as well as the vectors A and B) are delivered to the second row and the second column, where the next step of the computation takes place. In the second step the first row and the first column (idle at that time) can take up the computation of the bit c_2. Graphically, the computation wave progresses vertically and horizontally towards the corner of the array; therefore the computation time is shortened to a linear dependence on n. Unfortunately, from the practical point of view, such a solution is restricted by the field extension n. The single cells located on the diagonal would have to provide the rest of the cells in the same rows and columns with the multiplied signals a_x and b_y, which cannot be implemented effectively. An illustration of this case, being less practical, is omitted in this article; however, it serves as the basis for the next solution.<br />

A definitely more interesting case is the systolic model. A systolic array is characterized by communication between adjoining cells only. In this construction the computational wave moves diagonally.<br />
Figure 3. The systolic structure: the single array cell.<br />

Flip-flops are used as the memory for the intermediate results and the currently multiplied bits a_x and b_y. There is also a binary conjunction of the bits a_x, b_y and t_ij (the values of the table T are rigidly placed) as well as an addition modulo 2 of the current and the previous intermediate result. The last column and the last row must perform a binary addition of 4 signals; however, binary conjunction is not required for the first row and the first column. The multiplied vectors A and B are delivered accordingly: in the first step we need to deliver the bits a_1 and b_1 to the first row and the first column, in the second step a_2 and b_2 to the first two rows and columns, in the third step a_3 and b_3 to the first three columns and rows respectively, etc. Consequently we have two major advantages. First, a linear (2n) dependence of time on n. Second, the input data are delivered in order from 1 to n, making stream data processing possible.<br />

The models illustrated in Figures 1 and 3 have been implemented in full custom VLSI technology (0.35 µm). Both solutions are characterized by a square dependence of the silicon area on n and a square dependence of the dissipated power on n. In the first case the area of one cell is equal to 590 µm² and in the second one to 1100 µm². Power consumption strongly depends on the field extension (obviously a square dependence), the input data, and the cell structure (the number of flip-flops and gates). The speed of the elementary cell in both circuits is comparable: the processing time of one cell amounts to 2.56 ns. The circuits differ in their dependence on n: the first one is characterized by a square dependence, n^2+1, the second by a linear dependence, 2n. Simplicity and regularity of the presented structures are indisputable advantages of those solutions. Moreover, in systems where pipelined solutions are required, systolic multiplication can find excellent application. But there are cases where stream data processing cannot be exploited: specific cryptographic solutions, where vectors are rotated between multiplications (performing squarings). Therefore other multiplication methods must be considered.<br />
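The scaling laws quoted above can be compared numerically. The Python sketch below assumes that the total multiplication time equals the cycle count multiplied by the 2.56 ns single-cell time, which the text does not state explicitly; the field size n = 163 is a hypothetical example value, and the function names are illustrative.<br />

```python
CELL_TIME_NS = 2.56  # processing time of one cell reported for the 0.35 um models

def cycles(n):
    """Cycle counts quoted in the text for a field extension n."""
    return {
        "cyclic (naive)": n ** 3,
        "cyclic (overlapped additions)": n ** 2 + 1,
        "systolic": 2 * n,
    }

def times_us(n, cell_time_ns=CELL_TIME_NS):
    """Assumption: total time = cycle count x single-cell processing time."""
    return {name: c * cell_time_ns / 1000.0 for name, c in cycles(n).items()}

# n = 11 matches the GF(2^11) layout; n = 163 is a hypothetical larger field.
for n in (11, 163):
    for name, t in times_us(n).items():
        print(f"n={n:4d}  {name:30s} {t:12.3f} us")
```

For n = 11 this gives 1331, 122 and 22 cycles respectively, which shows how quickly the gap between the cyclic and systolic structures widens with n.<br />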

2.3 Fast multiplication<br />

When an optimal extension field (OEF) is used, multiplication can be done in a particular way. The difference is that we give up the universal solution in favour of a parallel but rigid construction. With n multiplication arrays (one for each matrix T_k), the flip-flops can be omitted (which removes the most area-consuming factor). Remembering that at most two nonzero elements appear in every column, we can arrange all the computation gates in one line and only cross their connections.<br />

Figure 4. The structure of the fast multiplication<br />

circuit (for one bit).<br />


Because n similar constructions have to be built, one for every bit of the vector C, a square dependence of the gate area on n is obvious; nevertheless, the crossed connections result in a cubic dependence of area on n. This is depicted graphically in Figure 5: the area with cubic dependence is marked at the top of the middle picture and is equal to 5.75 µm² (per elementary connection) in 0.35 µm technology. The square area factor (on the left in Fig. 5) is approximately 105 µm².<br />
Figure 5. VLSI layout of the fast multiplication circuit for GF(2^11).<br />

The greatest advantage of this solution is that the circuit is very fast and the dependence of processing time on n is negligible (about ⌈log_2 n⌉). This is because almost all possible computations are performed in parallel. The time of one multiplication in the mentioned technology with 4 metallization layers for n = 500 is equal to 4 ns.<br />

A general estimate of the circuit area for the 0.35 µm technology with 3 metallization layers is given by the equation<br />
$$P = p_{k3} \cdot n^3 + p_{op} \cdot n^2 + p_s \cdot \lceil \log_2 n \rceil + p_r \qquad (7)$$<br />
where p_k3 is the area of the elementary connection (5.75 µm²); p_op is the area of the ‘oper’ component (104.19 µm²); p_s is the area of addition modulo 2 (115.65 µm²); p_r is a supplementary area factor, (n^2 + ⌈log_5 n⌉)·30 µm², mainly responsible for the distribution of the multiplied signals.<br />

The main difficulty in the speed analysis is the increasing length of the circuit interconnections. The numbers may not be precise, but their orders of magnitude are reliable. We estimate the circuit processing time by<br />
$$T = 2\,t_{in} \cdot \lceil \log_{25} n \rceil + t_{oper} + t_s \cdot \lceil \log_2 n \rceil + t_{out} \qquad (8)$$<br />
where the elementary times are: t_in, input amplifier (160 ps); t_oper, ‘oper’ component (380 ps); t_s, addition modulo 2 (300 ps); t_out, output amplifier (100 ps).<br />
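A quick numerical check of the timing estimate is sketched below in Python. It assumes that equation (8) reads T = 2·t_in·⌈log_25 n⌉ + t_oper + t_s·⌈log_2 n⌉ + t_out; the logarithm base of the first term is an interpretation of the printed formula, and the helper names are illustrative.<br />

```python
def ceil_log(n, base):
    """Smallest integer k with base**k >= n, using exact integer arithmetic."""
    k, power = 0, 1
    while power < n:
        power *= base
        k += 1
    return k

# Elementary delays in picoseconds, as listed for equation (8).
T_IN, T_OPER, T_S, T_OUT = 160, 380, 300, 100

def processing_time_ps(n):
    """Assumed reading of (8): 2*t_in*ceil(log25 n) + t_oper + t_s*ceil(log2 n) + t_out."""
    return 2 * T_IN * ceil_log(n, 25) + T_OPER + T_S * ceil_log(n, 2) + T_OUT

print(processing_time_ps(500), "ps")  # 640 + 380 + 2700 + 100 = 3820 ps
```

Under this reading the estimate for n = 500 is about 3.8 ns, the same order as the 4 ns quoted earlier for the technology with 4 metallization layers.<br />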

Nevertheless, it is possible to determine exactly the number of transistors (which can be identified with the dissipated power).<br />

2<br />

L 1, 26 ⋅ ⎡log25 n⎤+<br />

10 ⋅ n + 2 ⋅ ⎡log 2 n⎤<br />

= (9)<br />

Both parameters, processing time and circuit area, are inseparably related to the applied technology. A series of improvements is possible (described in detail in [12]), leading to a decrease in silicon area of up to 65% depending on the applied technology. The percentage reduction for additional layers of metallization (from 4 up to 7 layers) is shown relative to the technology with 3 metallization layers (illustrated in Figure 6).<br />
Figure 6. Reduction of circuit area in different technologies in comparison to the technology with 3 metallization layers.<br />

The size of the circuit, and the implementation possibilities connected with it, still remains the major problem. Fortunately, processing speed can be traded for silicon area, making this solution more flexible and useful. Following equation (4) we can observe that the differences between consecutive tables T_k and T_{k+1} in definition (3) may be replaced with an invariable table T_0 and a proper rotation of the multiplied vectors A and B<br />
$$\sum_{k=0}^{n-1}\sum_{i=0}^{n-1}\sum_{j=0}^{n-1} a_i\, b_j\, t_{ij}^{(k)} = \sum_{k=0}^{n-1}\sum_{i=0}^{n-1}\sum_{j=0}^{n-1} a_{(i+k) \bmod n}\; b_{(j+k) \bmod n}\; t_{ij}^{(0)} \qquad (10)$$<br />
where 0 ≤ k, i, j ≤ n–1 (this trick has also been used in the systolic array implementation). In the extreme case we can build only one structure as illustrated in Figure 4 and, with the aid of n rotations of A and B, obtain the entire vector C; the multiplication then takes at least n times longer.<br />
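The rotation trick of equation (10) can be checked with a small Python sketch. It assumes relation (4) in the form t_ij^(k) = t_(j-i),(k-i) with indices modulo n, and uses random bits as hypothetical stand-ins for T, A and B, because the identity depends only on the indexing. The names are illustrative.<br />

```python
import random

random.seed(7)
n = 11
# Hypothetical stand-in data: any bit matrix T and bit vectors A, B exercise the identity.
T = [[random.randint(0, 1) for _ in range(n)] for _ in range(n)]
A = [random.randint(0, 1) for _ in range(n)]
B = [random.randint(0, 1) for _ in range(n)]

def t(i, j, k):
    """Assumed relation (4): entry t_ij of table T_k, indices modulo n."""
    return T[(j - i) % n][(k - i) % n]

def c_direct(k):
    """Bit c_k computed with its own table T_k (n separate tables in hardware)."""
    return sum(A[i] & B[j] & t(i, j, k) for i in range(n) for j in range(n)) % 2

def rotate(v, k):
    """Cyclic rotation so that position i holds v[(i + k) mod n]."""
    return v[k:] + v[:k]

def c_rotated(k):
    """Bit c_k computed with the fixed table T_0 and rotated operands, as in (10)."""
    a, b = rotate(A, k), rotate(B, k)
    return sum(a[i] & b[j] & t(i, j, 0) for i in range(n) for j in range(n)) % 2

C_direct = [c_direct(k) for k in range(n)]
C_rotated = [c_rotated(k) for k in range(n)]
print("fixed table with rotations gives the same C:", C_direct == C_rotated)
```

A single hard-wired structure for T_0 plus two rotating registers therefore suffices, at the cost of n sequential evaluations.<br />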

3. CONCLUSION<br />

Comparing the proposals, the most significant finding is that the linear dependence of computation time on n in the systolic solution was reduced to a negligible dependence in the fast circuit. The cost of this change was a cubic dependence of silicon area instead of a square one. A universal solution (applicable to any binary field), simple implementation (resulting from the highly modular structure) and the ability to process stream data are indisputable advantages of the systolic model. Nevertheless the parallel and asynchronous method, difficult to implement and applicable only to ONB, makes hundreds of times more multiplications possible in the same time.<br />

As usual, the faster and bigger parallel circuits would probably be used in highly specialized hardware accelerators (for example cryptographic coprocessors), while the cyclic or systolic circuits can be implemented in less demanding solutions where economy of area or pipelined algorithms are required.<br />
An estimation of achievable parameters depending on technological possibilities, speed requirements and universality or type of data processing is feasible through specific design methods which take into account the fixed nature of the presented models.<br />

4. REFERENCES<br />

[1] S. Gao, Normal Bases over Finite Fields, Waterloo, Ontario, Canada, 1993. http://www.math.clemson.edu/~sgao/pub.html<br />
[2] S. Gao, S. A. Vanstone, On Orders of Optimal Normal Basis Generators, Waterloo, Ontario, Canada, 1994. http://www.math.clemson.edu/~sgao/pub.html<br />
[3] J. Gawiniecki, J. Szmidt, Zastosowanie ciał skończonych i krzywych eliptycznych w kryptografii, WAT, Warsaw, 1999.<br />
[4] N. Koblitz, Algebraiczne aspekty kryptografii, WNT, Warsaw, 2000.<br />
[5] International Technology Roadmap for Semiconductors, ITRS Edition, Update, 2002. http://public.itrs.net<br />
[6] A. Menezes, P. Oorschot, S. Vanstone, Handbook of Applied Cryptography, CRC Press, 1996. http://www.cacr.math.uwaterloo.ca/hac/<br />
[7] S. Roman, Field theory, New York, Berlin, Heidelberg, Springer Verlag, 1995.<br />
[8] P.B. Bhattacharya, S.K. Jain, S.R. Nagpaul, Basic abstract algebra, Cambridge University Press, 1994.<br />
[9] W. Mochnacki, Kody korekcyjne i kryptografia, Oficyna Wydawnicza Politechniki Wrocławskiej, Wrocław, 1977.<br />
[10] K. Gołofit, Implementacja systemów podpisów cyfrowych wykorzystujących krzywe eliptyczne, Master thesis (in Polish), PW, ISE, 2003.<br />
[11] K. Gołofit, Implementacja mnożenia w technologii VLSI dla zastosowań krzywych eliptycznych, III KKE Kołobrzeg, 2004.<br />
[12] K. Gołofit, Analiza możliwości implementacyjnych mnożenia nad GF(2^n) w technologiach VLSI, IV KKE Darłówko Wschodnie, 2005.<br />



MULTIDIRECTIONAL CATV TRANSFORMERS<br />

Dariusz Krzemieniecki 1 and Ewa Hermanowicz 2<br />
1 GZT Telkom-Telmor, Gdansk, Poland, dariusz.krzemieniecki@telmor.pl<br />
2 Gdansk University of Technology, Faculty of Electronics, Telecommunications and Informatics,<br />
ul. Narutowicza 11/12, 80-952 Gdansk, Poland, hewa@eti.pg.gda.pl<br />

ABSTRACT<br />

In this paper we propose novel multidirectional transformers, intended for application as passive power dividers operating in the frequency range from 5 MHz to 862 MHz in multimedia separators for CATV systems. The performance of these transformers is analysed in MATLAB by means of the node voltage technique and compared with the results of measurements.<br />

1. INTRODUCTION<br />

The objective of this article is to present the performance of multidirectional transformers intended for application as passive power dividers in multimedia separators. These are used in CATV systems in the frequency range from 5 MHz to 862 MHz. The proposed multidirectional transformers replace existing solutions based on cascades of traditional unidirectional transformers. The novel solutions presented here make it possible to divide the signal into branches of identical coupling. This highly desirable feature is very difficult to achieve in the corresponding cascades of unidirectional transformers. Moreover, the multidirectional transformers can have a uniformly shaped loss characteristic versus frequency and offer more economical solutions because of lower material consumption and lower cost of labour.<br />

2. UNIDIRECTIONAL<br />

TRANSFORMER<br />

A unidirectional transformer can be realized in four configurations. The first two of them are shown in Fig. 1. These are the basic (traditional) configuration and the inverse configuration [1], [2], [3]. Each of the other two configurations, shown in Fig. 2, involves an additional element: the so-called zero-resistor. This resistor can be used to achieve a slight improvement in the values of IN, the mismatch loss at the input, and TAP, the mismatch loss at the coupled line output. Fig. 3 presents the measured performance of a unidirectional transformer, constructed by the first author of the paper, in basic configuration. This is the so-called 10dB transformer, which means that the nominal value of the IN-TAP attenuation is 10dB. All measurements here (see also Figs. 7 and 11) were done using the Rohde-Schwarz ZVR 1.70 analyzer.<br />
The meaning of symbols:<br />
• IN-OUT (attenuation in the main line between the input IN and output OUT),<br />
• IN-TAP (attenuation in the coupled line between the input IN and the branch TAP),<br />
• OUT-TAP (attenuation between the main output OUT and the coupled branch TAP),<br />
• IN (as above),<br />
• OUT (mismatch loss at the output), and<br />
• TAP (as above),<br />
all expressed in dB, is the same as in [1]. The frequency characteristics have also been analyzed in the MATLAB environment [2], [3] (see Fig. 4), using the node voltage analysis technique under the quasi-static assumption [4]. This assumption is permissible for frequencies up to approximately 1.8 GHz for objects fulfilling the condition f [GHz] < 150 / D [mm], where D stands for the diagonal of the physical object under consideration.<br />

Figure 1. Basic (top) and inverse (bottom) configurations of the unidirectional transformer.<br />
Thus, in modeling, the admittance and scattering matrices were used, on the supposition that the circuits are composed only of lumped elements. A comparison of the calculated frequency characteristics of the designed transformers with their measured counterparts confirms that the above supposition is valid in the frequency band of interest here.<br />

The analysis is based on the complex-valued formula for the frequency dispersion of the ferrite permeability<br />
$$\mu(f) = 1 + \frac{\mu_i}{1 + j f / f_m} \qquad (1)$$<br />
given by (5) in [1], where µ_i stands for the initial permeability, f_m is the relaxation frequency, and j = √(−1). The influence of the initial permeability on the parameters of the 10dB unidirectional transformer in basic configuration is presented in Fig. 5.<br />
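Formula (1) is easy to evaluate numerically. The Python sketch below (Python is used here only for illustration; the authors' analysis was done in MATLAB) computes the complex permeability at a few frequencies in the CATV band. The relaxation frequency `F_M` is a hypothetical value chosen for the example, since the text does not give one; µ_i = 1400 is the value quoted for Figs. 3 and 4.<br />

```python
import cmath

def permeability(f_hz, mu_i, f_m_hz):
    """Complex permeability from formula (1): mu(f) = 1 + mu_i / (1 + j*f/f_m)."""
    return 1 + mu_i / (1 + 1j * f_hz / f_m_hz)

MU_I = 1400  # initial permeability quoted for Figs. 3 and 4
F_M = 5e6    # relaxation frequency in Hz: hypothetical value, not given in the text

for f_mhz in (5, 50, 500, 862):
    mu = permeability(f_mhz * 1e6, MU_I, F_M)
    print(f"{f_mhz:4d} MHz  |mu| = {abs(mu):8.2f}  phase = {cmath.phase(mu):6.3f} rad")
```

At f = 0 the formula returns 1 + µ_i, and at f = f_m the real and imaginary contributions of the dispersive term have equal magnitude, which is the usual signature of a single-pole relaxation model.<br />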


Figure 2. Configurations of the unidirectional transformer with zero-resistors.<br />
Figure 3. The performance of the 10dB unidirectional transformer in basic configuration: measurement results, µ_i = 1400. [Plot: curves IN-OUT, IN-TAP, OUT-TAP, IN, OUT and TAP in dB (5 to -45) versus f [MHz] (0 to 1000).]<br />
Figure 4. The 10dB unidirectional transformer characteristics computed in MATLAB, basic configuration, µ_i = 1400. [Plot: curves IN-OUT, IN-TAP, OUT-TAP, IN, OUT and TAP.]<br />
Figure 5. The performance of the 10dB unidirectional transformer in basic configuration versus the initial permeability, f = 5 MHz. [Plot: curves IN-OUT, IN-TAP, OUT-TAP, IN, OUT and TAP in dB (0 to -35) versus µ_i (0 to 10000).]<br />

3. MULTIDIRECTIONAL TRANSFORMERS<br />

Fig. 6 shows the basic configuration of a novel two-directional transformer [2]. The measured characteristics of the 15 dB two-directional transformer are presented in Fig. 7.<br />

Figure 6. The two-directional transformer in basic configuration.<br />

Fig. 8 presents the computed characteristics of this transformer.<br />

Note that the two-directional transformer should be realized in two configurations only: basic and inverse. Such transformers without zero resistors are impractical because of mismatch in the coupled lines. For the two-directional transformer there is an additional parameter: the isolation loss TAP A-B between the coupled branches.<br />


Figure 7. Experimental characteristics of the 15 dB two-directional transformer in basic configuration, µ_i = 1400. (Plot of the IN-OUT, IN-TAP, OUT-TAP, TAP A-B, IN, OUT and TAP curves versus f [MHz], 0 to 1000.)<br />

The influence of the initial permeability on the parameters of the 15 dB two-directional transformer in basic configuration is presented in Fig. 9.<br />

Figure 8. Calculated characteristics of the 15 dB two-directional transformer in basic configuration, µ_i = 1400. (Plot of the IN-OUT, IN-TAP, OUT-TAP, TAP A-B, IN, OUT and TAP curves.)<br />

Figure 9. Influence of the initial permeability on the characteristics of the 15 dB two-directional transformer in basic configuration, f = 5 MHz. (Plot of the same curves versus µ_i, 0 to 10000.)<br />

Figure 10. The four-directional transformer in basic configuration.<br />

Fig. 10 shows the four-directional transformer in basic configuration [2]. This transformer is constructed on the same basis as the two-directional transformer, but includes four coupled branches. The measured and calculated characteristics of the 18 dB four-directional transformer are shown in Fig. 11 and Fig. 12 respectively. As for the unidirectional and two-directional transformers, the influence of the initial permeability on the parameters of the four-directional transformer in basic configuration was calculated. The results are presented in Fig. 13.<br />

Figure 11. Experimental characteristics of the 18 dB four-directional transformer in basic configuration, µ_i = 1400. (Plot of the IN-OUT, IN-TAP, OUT-TAP, TAP A-B, IN, OUT and TAP curves.)<br />


Figure 12. Calculated characteristics of the 18 dB four-directional transformer in basic configuration, µ_i = 1400. (Plot of the IN-OUT, IN-TAP, OUT-TAP, TAP A-B, IN, OUT and TAP curves versus f [MHz], 0 to 1000.)<br />

Figure 13. Influence of the initial permeability on the performance of the 18 dB four-directional transformer in basic configuration, f = 5 MHz. (Plot of the same curves versus µ_i, 0 to 10000.)<br />

4. CONCLUSIONS<br />

The proposed multidirectional transformers, capable of substituting for a cascade of unidirectional transformers, offer:<br />

1. The possibility of dividing the signal into branches of identical coupling, which is very difficult to achieve in the corresponding cascades of unidirectional transformers.<br />

2. A uniform attenuation characteristic versus frequency.<br />

3. More economical solutions because of lower material consumption and lower labour cost, especially in the case of multidirectional transformers having four or more coupled branches.<br />


REFERENCES<br />

[1] D. I. Kim, M. Takahashi, K. Araki and Y. Naito, Optimum Design of Power Dividers with Ferrite Toroids for CATV and/or MATV Systems, IEEE Transactions on Consumer Electronics, vol. CE-29, no. 1, pp. 27-38, 1983.<br />

[2] D. Krzemieniecki, Modelling and Design of Directional Transformers, PhD Thesis under preparation, Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Poland (in Polish).<br />

[3] D. Krzemieniecki, Analysis and Optimisation of a Wideband Directional Transformer, Electronics and Telecommunications Quarterly, 2005, 51, no. 1, pp. 105-135.<br />

[4] D. De Zutter, J. Sercu and T. Dhaene, New approaches to increase the efficiency of the electromagnetic modelling of planar RF and microwave circuits, Department of Information Technology, Ghent University, 2004, pp. 6, 10, 18. http://www.win.tue.nl/masci-net/Events/DeZutter1.pdf<br />



FABRICATION AND ELECTRICAL CHARACTERIZATION OF HIGH PERFORMANCE COPPER / POLYIMIDE INDUCTORS<br />

Marcelo B. Pisani (1, 2), Cyrille Hibert (2), Didier Bouvet (1), Catherine Dehollain (1) and Adrian M. Ionescu (1)<br />

École Polytechnique Fédérale de Lausanne (EPFL),<br />
(1) Electronics Laboratory (LEG), ELB Building, Station 11, and<br />
(2) Center of Micro and Nanotechnology (CMI), BM Building, Station 17,<br />
CH-1015 Lausanne, Switzerland<br />

E-mail: marcelo.pisani@epfl.ch<br />

ABSTRACT<br />

This paper presents fabrication and RF characterization results for spiral inductors fabricated using a damascene-like thick-copper / polyimide process module. The module has a low thermal budget and is compatible with current IC interconnect architectures, making it suitable for above-IC integration of high quality factor passive devices on CMOS. Thick, highly conductive copper layers associated with low-κ polymers and high-resistivity substrates can provide RF performance that cannot be achieved using conventional thin aluminum films on low-resistivity substrates. Peak quality factors of about 20, and quality factors exceeding 10 over a wide frequency range (1–6 GHz), are demonstrated, with a self-resonant frequency of about 10 GHz. Measurements and equivalent circuit extraction results are also presented and discussed.<br />

1. INTRODUCTION<br />

There is a strong demand for high quality factor passive devices in communication applications in order to reduce power consumption, reduce noise and increase both the operating frequency and the usable bandwidth of the circuits [1, 2].<br />

Inductors are key RF passive components that can be co-integrated with CMOS circuits using the local metal interconnects to produce spirals with inductances typically in the range of 1 to 100 nH. In spite of its simplicity, this approach produces devices with low quality factor and low bandwidth. Peak quality factors in the range of 5 to 10 and resonant frequencies that do not exceed 5 GHz are commonly reported using these technologies [3]. This poor performance is mainly related to the limited metal conductivity (in general, thin aluminum), substrate losses (in general, a low-resistivity substrate is used in RF applications) and high capacitance between spiral turns and between the spiral and the substrate (in general, SiO2 is used as the inter-metal dielectric). There are a number of techniques, such as patterned ground shields underneath the inductor [4] and special substrate contact techniques [5], that can be employed to increase the quality factor, but their impact on the final performance of the devices is limited, since the main limitations come from the metal resistivity and thickness available for a given technology, together with the losses of the low-resistivity silicon substrate.<br />

The use of copper / low-κ or gold / low-κ interconnects can improve the performance of the devices by providing metals with better conductivity and dielectrics with lower dielectric constant, reducing both series resistance and parasitic capacitances [1, 6]. Combining these materials with high-resistivity substrates [7] and other microfabrication techniques [8] can provide the best combination of fabrication techniques to achieve high RF performance.<br />

We report on a 5 µm-thick-copper / polyimide process module [9] on high-resistivity silicon wafers (8 kΩ cm) capable of providing high quality factor inductors (in excess of 10) for broadband RF applications in the frequency range of 1 to 10 GHz, covering applications like ISM, GSM, Bluetooth and HIPERLAN. An equivalent circuit model is proposed and extracted for the fabricated devices.<br />

2. FABRICATION PROCESS AND CHARACTERIZATION METHODOLOGY<br />

2.1 Fabrication process<br />

Figure 1 shows a schematic cross-section view of the technology profile used to fabricate the inductors. The wafer is insulated from the devices using a 0.6 µm SiO2 + 0.2 µm Si3N4 stress-compensated bi-layer (a). Polyimide (PI 2610 from Dupont, ε_R = 2.9) is then spun on at a typical thickness of 1.5 µm (the thickness of the first metal layer). The polyimide film is then cured at 300 °C for 1 h. A 0.3 µm-thick sputtered SiO2 layer is used as a hard mask to pattern the polyimide mold using an O2-based plasma etch recipe (c, d). This process makes it possible to fabricate vertical walls with up to 5 µm resolution for 10 µm-thick lines. A 40 nm-thin tantalum layer and a 0.2 µm-thick copper layer are deposited by sputtering (e), working respectively as a diffusion barrier between copper and polyimide and as a seed layer for the electroplating of a thick copper layer (f). The copper lines are then defined by a fast removal rate chemical-mechanical polishing step (CMP, g) capable of polishing films much thicker than the ones traditionally used in thin-metal layer interconnects. The successive repetition of these process steps (h-p) provides a second level of metal with a via in between.<br />

Figure 1. Technological process flow (steps a to p) used to fabricate polyimide-embedded thick-copper spiral inductors. Materials legend: Si, SiO2, Si3N4, polyimide (4 - 8 µm), ECP Cu (4 - 8 µm), Ta (40 nm) + Cu seed (200 nm). Labels in the final cross-section: M1, M2, RF ports.<br />

2.2 RF characterization method<br />

Fabricated inductors were measured using a Cascade Microtech prober coupled to an HP 8719D vector network analyzer operating in the 0.05 to 13.51 GHz frequency range. The measurement probes have a coplanar waveguide ground-signal-ground (GSG) configuration with a 150 µm in-line pitch. They are connected to the analyzer using low-loss coaxial cables and calibrated using an alumina substrate with short-circuit (on ports 1 and 2), 1 ps through, open-circuit and 50 Ω load references (ISS/SOLT calibration technique).<br />

There are a number of quality factor definitions, depending on the device application and on the circuit parameters of interest [10]. For isolated devices operating below the self-resonant frequency, the measured and calibrated S-parameter data are usually converted into Y-parameter data, and the equivalent inductance L and quality factor Q are calculated as functions of frequency:<br />

\[ L = \frac{\operatorname{Im}(1/y_{11})}{2 \pi f} \tag{1} \]

and

\[ Q = \frac{\operatorname{Im}(1/y_{11})}{\operatorname{Re}(1/y_{11})} \tag{2} \]

where f is the frequency and 1/y_11 is the equivalent complex impedance of port 1 when port 2 is connected to ground.<br />

The self-resonant frequency is defined as the frequency at which equation (2) goes to zero; above this frequency the device behaves as a capacitor.<br />
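The extraction described by equations (1) and (2) can be sketched in a few lines. The function names below are hypothetical, and the demonstration data come from an assumed simple network (a series R-L branch in parallel with a capacitor, with illustrative values that are not taken from the measurements); the self-resonant frequency is located at the first sign change of Q, using linear interpolation between grid points.

```python
import math


def inductance_and_q(f_hz, y11):
    """Apply Eq. (1) and Eq. (2) to one complex y11 sample taken at f_hz > 0."""
    z = 1.0 / y11                      # port-1 impedance with port 2 grounded
    inductance = z.imag / (2.0 * math.pi * f_hz)
    q = z.imag / z.real
    return inductance, q


def self_resonant_frequency(freqs_hz, y11_values):
    """First frequency at which Q crosses from positive to non-positive."""
    prev_f = prev_q = None
    for f_hz, y11 in zip(freqs_hz, y11_values):
        _, q = inductance_and_q(f_hz, y11)
        if prev_q is not None and prev_q > 0.0 >= q:
            # Linear interpolation between the two bracketing samples.
            return prev_f + (f_hz - prev_f) * prev_q / (prev_q - q)
        prev_f, prev_q = f_hz, q
    return None                        # no crossing inside the swept range


def demo_y11(f_hz, r=1.0, l=2.0e-9, c=100.0e-15):
    """ASSUMED test network: (R + jwL) in parallel with C. Illustrative values."""
    w = 2.0 * math.pi * f_hz
    return 1.0 / (r + 1j * w * l) + 1j * w * c


if __name__ == "__main__":
    freqs = [0.1e9 * k for k in range(1, 151)]     # 0.1 to 15 GHz
    y11 = [demo_y11(f) for f in freqs]
    l_low, q_low = inductance_and_q(freqs[0], y11[0])
    print(f"L at 0.1 GHz: {l_low * 1e9:.3f} nH, Q: {q_low:.3f}")
    print(f"self-resonance: {self_resonant_frequency(freqs, y11) / 1e9:.2f} GHz")
```

With measured data, `freqs` and `y11` would simply be replaced by the frequency points and the y11 values obtained from the S-to-Y conversion.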

The same extraction procedure can be applied to S-parameter data obtained from electromagnetic field solvers. In this work, we used ASITIC [10] for the initial performance estimation and design of the spirals.<br />

For circuit simulation purposes, it is necessary to have a broadband equivalent circuit that fits the measured data. This π-equivalent circuit is shown in Figure 2 [2].<br />

Figure 2. Equivalent 2-port π circuit used to model the fabricated inductors.<br />

In this model, L_S accounts for the series self-inductance of the device. R_S represents the series resistance of the inductor, related directly to the ohmic losses in the metal tracks. This resistance can be reduced using wider and thicker tracks, but the reduction is limited at very high frequencies by current confinement due to the skin effect and the proximity effect [11]. At the same time, thicker and wider tracks also increase the associated parasitic capacitances because of the larger device area. C_INS1,2 are associated with the spiral-wafer insulation and can be reduced using lower-κ and thicker films for this layer. R_SUB1,2 and C_SUB1,2 are associated with the silicon substrate. These parameters are roughly proportional to the inductor surface and are strongly dependent on the substrate resistivity. The use of high-resistivity wafers increases the R_SUB1,2 values and provides at the same time higher quality factors and higher self-resonant frequencies.<br />


2. RESULTS AND DISCUSSION<br />

Figure 3 shows a top view of a typical fabricated spiral inductor with the RF probe pads used for the RF electrical characterization. Typically, fabricated inductors have 250 to 400 µm external diameter, track widths between 10 and 40 µm, track spacing between 5 and 20 µm, and inductance values between 1 and 10 nH.<br />

Figure 4 shows a focused ion beam (FIB) cross-section view of a stacked 1.5-µm metal 1 / 3-µm via / 5-µm metal 2 structure on the insulated silicon wafer. The interface between the metal levels is clean and smooth, with a contact resistance that does not exceed 35 mΩ/via for a minimum via size of 10 x 10 µm². The walls are well defined and vertical, and good electroplating filling can be observed for via sizes down to 2.5 µm.<br />

Figure 3. Top view of a fabricated square spiral<br />

with RF pads (each pad is a 100 µm square).<br />

The resistivity of the electroplated copper layers is 2.0 ± 0.1 µΩ cm, measured by the four-point probe technique for thicknesses between 1 and 8 µm.<br />
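To relate this resistivity to the skin effect mentioned in the discussion of R_S, the following sketch evaluates the DC sheet resistance of a 5 µm copper layer and the classical skin depth δ = sqrt(ρ / (π f µ0)) at a few frequencies within the band of interest. The formula is the standard one for a good, non-magnetic conductor; it is an added illustration rather than a calculation taken from the paper, and the function names are hypothetical.

```python
import math

MU_0 = 4.0e-7 * math.pi      # vacuum permeability, H/m
RHO_CU = 2.0e-8              # measured resistivity, 2.0 uOhm cm, in Ohm m
T_METAL = 5.0e-6             # thickness of the thick copper layer, m


def sheet_resistance(resistivity_ohm_m, thickness_m):
    """DC sheet resistance in ohms per square."""
    return resistivity_ohm_m / thickness_m


def skin_depth(resistivity_ohm_m, f_hz):
    """Classical skin depth of a non-magnetic conductor, in metres."""
    return math.sqrt(resistivity_ohm_m / (math.pi * f_hz * MU_0))


if __name__ == "__main__":
    r_sheet = sheet_resistance(RHO_CU, T_METAL)
    print(f"DC sheet resistance: {r_sheet * 1e3:.1f} mOhm/sq")
    for f_ghz in (1.0, 2.45, 6.0, 10.0):
        delta = skin_depth(RHO_CU, f_ghz * 1e9)
        print(f"skin depth at {f_ghz:5.2f} GHz: {delta * 1e6:.2f} um")
```

Under these assumptions the skin depth in the gigahertz range is smaller than the 5 µm metal thickness, which is consistent with the remark that thickening the tracks gives diminishing returns at very high frequencies.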

The CMP dishing and erosion is limited to about 0.2 µm, which can be considered an acceptable value for a 5 µm thick structure.<br />

Figure 4. FIB cross-section view of a double<br />

metal layer stacked structure.<br />

Figures 5a and 5b show the measured and equivalent circuit results for a 2.7 nH octagonal inductor with an external diameter of 400 µm, 2.5 turns, 30-µm track width and 10-µm track spacing. This inductor was optimized to work in a 2.45-GHz voltage controlled oscillator (VCO) circuit. The device was fabricated on top of a high-resistivity 4-inch silicon wafer (8 kΩ cm). The Cu / low-κ process associated with the high-resistivity substrate is capable of providing quality factors in excess of 10 over a wide bandwidth (1 to 6 GHz).<br />

In general, measured peak quality factors and self-resonant frequencies are 15 – 30 % below those predicted by the ASITIC field solver for inductors in the 1 – 10 nH range fabricated using this technology. We believe that this discrepancy is due on the one hand to an overestimation of the low-frequency inductance value and on the other hand to inaccurate modeling of the parasitics at frequencies beyond those for which the tool was calibrated (around 1–3 GHz) [10]. Typical simulations using slower electromagnetic analysis to provide more precise results take many hours of computation and frequently present convergence problems, suggesting that a more advanced electromagnetic simulation tool is probably needed for the predictive modeling of inductors using low-loss substrates and thick metal layers.<br />

Table 1 shows the equivalent circuit model extracted from the measurements. The results were obtained by a nonlinear least-squares fit over the frequency range below the self-resonant frequency. The quality of the fit is sufficient for SPICE-like circuit simulations using the device.<br />

The use of high-resistivity substrates shows up especially in the substrate losses (modeled by the R_SUB1 and R_SUB2 resistances). Typically, the same spiral on top of a low-resistivity substrate would present equivalent resistance values in the 20 – 200 Ω range. The use of high-resistivity substrates permits an improvement by a factor of 5 to 20 in this parameter.<br />

3. CONCLUSIONS<br />

We have designed and fabricated inductors with experimental peak quality factors around 20 and quality factors in excess of 10 over a wide operating frequency range (1 – 6 GHz). A self-resonant frequency of about 10 GHz is also demonstrated, making the same inductor usable in multi-band circuits, which is an advance over the state of the art. The developed process, using low-resistivity thick copper / polyimide on top of high-resistivity substrates, delivers performance that cannot be achieved using standard aluminum / SiO2 RF-IC technologies on top of low-resistivity substrates. The fabrication process is compatible with standard above-IC interconnects and has a low thermal budget, making it useful for direct co-integration with CMOS circuits. A broadband equivalent circuit model suitable for SPICE-like simulation below the self-resonant frequency has been extracted by fitting the measured performance.<br />


Figure 5a. Measured and equivalent circuit effective inductance for the 2.7 nH octagonal inductor (eq. 1). (Plot of inductance (nH) versus frequency (GHz), 0 to 14 GHz, with measurement and fitted model curves; Ldc = 2.7 nH.)<br />

Figure 5b. Measured and simulated quality factor results for the 2.7 nH inductor (eq. 2). (Plot of quality factor versus frequency (GHz), 0 to 14 GHz, with measured and fitted curves.)<br />

L_S = 2.7 nH<br />
R_S = 1.3 Ω<br />
C_P = 18 fF<br />
C_INS1 = C_INS2 = 160 fF<br />
R_SUB1 = R_SUB2 = 580 Ω<br />
C_SUB1 = C_SUB2 = 132 fF<br />
Peak quality factor: 18.9 @ 2.50 GHz<br />
Self-resonant frequency: 10.2 GHz<br />

Table 1. Equivalent π-circuit model parameters for the 2.7 nH inductor (Figure 2), with the calculated peak quality factor and self-resonant frequency (eq. 3).<br />
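The Table 1 parameters can be checked against equation (2) with a short script. The circuit topology used here is an assumption, since Figure 2 is not reproduced in the text: a series R_S-L_S branch in parallel with C_P between the two ports, and at each port a shunt branch made of C_INS in series with the parallel combination of R_SUB and C_SUB. With port 2 grounded, y11 is the sum of the series-branch admittance and the port-1 shunt admittance. All function names are hypothetical.

```python
import math

# Table 1 values in SI units
L_S, R_S, C_P = 2.7e-9, 1.3, 18.0e-15
C_INS, R_SUB, C_SUB = 160.0e-15, 580.0, 132.0e-15


def y11_pi_model(f_hz):
    """y11 of the ASSUMED pi topology with port 2 grounded (f_hz > 0)."""
    w = 2.0 * math.pi * f_hz
    y_series = 1.0 / (R_S + 1j * w * L_S) + 1j * w * C_P
    z_substrate = 1.0 / (1.0 / R_SUB + 1j * w * C_SUB)
    y_shunt = 1.0 / (1.0 / (1j * w * C_INS) + z_substrate)
    return y_series + y_shunt


def quality_factor(f_hz):
    """Eq. (2): Q = Im(1/y11) / Re(1/y11)."""
    z = 1.0 / y11_pi_model(f_hz)
    return z.imag / z.real


def scan(f_start_hz=0.05e9, f_stop_hz=13.5e9, step_hz=0.05e9):
    """Return (peak Q, frequency of peak Q, first grid frequency with Q <= 0)."""
    n_points = int(round((f_stop_hz - f_start_hz) / step_hz)) + 1
    peak_q, peak_f, srf = float("-inf"), None, None
    for k in range(n_points):
        f_hz = f_start_hz + k * step_hz
        q = quality_factor(f_hz)
        if q > peak_q:
            peak_q, peak_f = q, f_hz
        if srf is None and q <= 0.0:
            srf = f_hz
    return peak_q, peak_f, srf


if __name__ == "__main__":
    peak_q, peak_f, srf = scan()
    print(f"peak Q = {peak_q:.1f} at {peak_f / 1e9:.2f} GHz")
    print(f"self-resonant frequency ~ {srf / 1e9:.2f} GHz")
```

Under this assumed topology the model gives a peak quality factor and a self-resonant frequency close to the values listed in Table 1; small differences are expected from the grid step and from the topology assumption.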


Acknowledgments<br />

This work has been supported by the IST Wide-RF EU project (IST-2001-33286) and by the Swiss OFES project 01.0308. Many thanks are due to the EPFL-CMI staff for the technical and clean room facilities support.<br />

4. REFERENCES<br />

[1] Steinbrüchel and B. L. Chin, Copper interconnect technology, SPIE Press, Washington, 2001.<br />

[2] T. H. Lee and A. Hajimiri, Oscillator phase noise: a tutorial, IEEE Journal of Solid-State Circuits, vol. 35, no. 3, pp. 326-336, 2000.<br />

[3] P. Yue and S. S. Wong, Physical modeling of spiral inductors on silicon, IEEE Transactions on Electron Devices, vol. 47, no. 3, pp. 560-568, 2000.<br />

[4] C. P. Yue and S. S. Wong, On-chip spiral inductors with patterned ground shields for Si-based RF IC's, IEEE Journal of Solid-State Circuits, vol. 33, no. 5, pp. 743-752, 1998.<br />

[5] J. N. Burghartz, A. E. Ruehli, K. A. Jenkins, M. Soyuer and D. Nguyen-Ngoc, Novel substrate contact structure for high-Q silicon-integrated spiral inductors, Proc. of the International Electron Devices Meeting, pp. 55-58, 1997.<br />

[6] I. J. Bahl, High-performance inductors, IEEE Trans. on Microwave Theory and Techniques, vol. 49, no. 4, pp. 654-664, 2001.<br />

[7] A. C. Reyes, S. M. El-Ghazaly, S. Dorn, M. Dydyk, D. K. Schroder and H. Patterson, High resistivity Si as a microwave substrate, Proc. of the 46th Electronic Components and Technology Conference, pp. 382-391, May 1996.<br />

[8] J. M. López-Villegas et al., Study of integrated RF passive components performed using CMOS and Si micromachining technologies, J. Micromech. Microeng., vol. 7, no. 3, pp. 162-164, Sept. 1997.<br />

[9] M. B. Pisani, C. Hibert, D. Bouvet, P. Beaud and A. M. Ionescu, Copper/polyimide fabrication process for above RF IC integration of high quality factor inductors, Microelectronic Engineering, vol. 73-74, pp. 474-479, 2004.<br />

[10] A. M. Niknejad and R. G. Meyer, Analysis, Design and Optimization of Spiral Inductors and Transformers for Si RF IC's, IEEE Journal of Solid-State Circuits, vol. 36, no. 10, 1998.<br />

[11] W. B. Kuhn and N. M. Ibrahim, Analysis of current crowding effects in multiturn spiral inductors, IEEE Transactions on Microwave Theory and Techniques, vol. 49, no. 1, pp. 31-38, Jan. 2001.<br />



DESIGN OF ENHANCED MULTI-BIT<br />

DISTRIBUTED MEMS VARIABLE TRUE-TIME<br />

DELAY LINES<br />

ABSTRACT<br />

This paper shows how periodic structure theory can be<br />

used to precisely model <strong>and</strong> design conventional<br />

distributed MEMS variable true-time delay l<strong>in</strong>es, so as to<br />

deduce an improved topology for multi-bit operation.<br />

Results for the new configuration are then presented<br />

along with its conventional counterpart performances,<br />

highlight<strong>in</strong>g the advantage of the new configuration <strong>in</strong><br />

terms of mismatch <strong>and</strong> delay distortion.<br />

1. INTRODUCTION<br />

This paper presents new contributions to distributed MEMS transmission lines (DMTL), which work on the principle of MEMS capacitors loading a coplanar waveguide [1]. The performance of such DMTL, in terms of differential delay over losses and of stability, was significantly improved by increasing the loading capacitance ratio, which was achieved by replacing the analog loading capacitors with digital ones [2]. However, this significantly degrades the matching of the DMTL, which is a severely limiting factor for multi-bit digital DMTL. The mismatch is detrimental not only in terms of reflected energy but also because it distorts the delay.

The main reason for implementing a DMTL is to obtain a constant delay over a very wide bandwidth [3]. Conventional, frequency-dependent matching techniques therefore cannot be used. Even for operation over a relatively narrow band, the matching circuits would have to be reconfigurable, and such complicated structures would introduce parasitic effects detrimental to the DMTL performance. Finally, a tapered match requires a length comparable to the wavelength. This solution is therefore too space-consuming for MMIC applications and would also increase the overall insertion loss of the DMTL.

We propose here a new topology for such multi-bit DMTL. The usual way to achieve multi-bit operation is to cascade similar line sections of different lengths. We show that interlacing bridges of different capacitance values (corresponding to the different bits) instead improves the matching properties of the DMTL, which also reduces the oscillations in the delay.

J. Perruisseau-Carrier, R. Fritschi, A.K. Skrivervik
Ecole Polytechnique Fédérale de Lausanne (EPFL)
Station 11, 1015 Lausanne, Switzerland
E-mail: julien.perruisseau@epfl.ch


The periodic structure approach and the new interlaced-bits topology are directly applicable to other types of periodic variable true-time delay lines (V-TTDL), such as [4], and could also be of use for other types of MEMS periodic structures (e.g., [5]).

This paper is organized as follows. First, we briefly show how the periodic approach, together with a comprehensive circuit model, allows the conventional DMTL to be modeled precisely. An in-depth description of the method is available in [3]. The approach is then applied to show why the new topology for multi-bit DMTL performs better in terms of matching. Finally, the two types of DMTL are compared.

2. FINITE PERIODIC STRUCTURE<br />

A DMTL is a particular one-dimensional periodic structure whose unit cell consists of a MEMS shunt capacitor loading a coplanar waveguide (CPW). Such a periodic device can be modeled by cascading N identical two-port networks, each corresponding to a unit cell of the structure. If the cell is symmetrical and defined by its transmission matrix (hence by the four ABCD parameters, named here $A_{cell} = D_{cell}$, $B_{cell}$ and $C_{cell}$), it can be shown that the transmission matrix of the complete structure is [3]:

$$T_{PS} = T_{cell}^{N} = \begin{bmatrix} \cosh(N\gamma_{equ}d) & Z_{equ}\sinh(N\gamma_{equ}d) \\ \dfrac{1}{Z_{equ}}\sinh(N\gamma_{equ}d) & \cosh(N\gamma_{equ}d) \end{bmatrix} \qquad (1)$$

where d is the length of a unit cell, and $\gamma_{equ}$ and $Z_{equ}$ are the transmission-line equivalents of the periodic structure (equivalent propagation constant and characteristic impedance), defined in the case of symmetrical cells by:

$$\cosh(\gamma_{equ}d) = A_{cell} \qquad (2)$$

$$Z_{equ} = \pm\sqrt{\frac{B_{cell}}{C_{cell}}} \qquad (3)$$

The transmission matrix given by (1) is exactly equivalent to that of a continuous line of length Nd with characteristics $\gamma_{equ}$ and $Z_{equ}$, with the restriction that the voltages and currents are only defined at the connections between the periodic cells. As a result, conventional transmission line theory can be used for the design and analysis of DMTL, provided that the DMTL is described by its periodic structure equivalent parameters $\gamma_{equ}$ and $Z_{equ}$. This calculation is exact on the basis of the circuit model and does not require any assumption on the number of cells of the periodic structure [3].
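The equivalence stated by (1) to (3) can be checked numerically. The sketch below is a minimal illustration and not the authors' Matlab implementation. It assumes a lossless, symmetrical unit cell made of a half-length line, an ideal shunt capacitor and another half-length line. The unloaded line impedance (70 Ω) and effective permittivity (6) are hypothetical values that the paper does not give; only N = 26, d = 302 µm and the 27 fF capacitance are borrowed from Section 3.2. The sign ambiguity of (3) is resolved by computing $Z_{equ} = B_{cell}/\sinh(\gamma_{equ}d)$, which is one of the two roots.

```python
import cmath
import math

C0 = 299792458.0  # speed of light in vacuum [m/s]


def matmul(p, q):
    """Product of two 2x2 matrices stored as ((a, b), (c, d))."""
    return (
        (p[0][0] * q[0][0] + p[0][1] * q[1][0], p[0][0] * q[0][1] + p[0][1] * q[1][1]),
        (p[1][0] * q[0][0] + p[1][1] * q[1][0], p[1][0] * q[0][1] + p[1][1] * q[1][1]),
    )


def line(z0, theta):
    """ABCD matrix of a lossless line of impedance z0 and electrical length theta."""
    return ((math.cos(theta), 1j * z0 * math.sin(theta)),
            (1j * math.sin(theta) / z0, math.cos(theta)))


def shunt(y):
    """ABCD matrix of a shunt admittance y."""
    return ((1.0, 0.0), (y, 1.0))


def unit_cell(freq, z0, eps_eff, d, cap):
    """Symmetrical cell: half line, shunt capacitor, half line."""
    beta = 2 * math.pi * freq * math.sqrt(eps_eff) / C0
    half = line(z0, beta * d / 2)
    return matmul(matmul(half, shunt(2j * math.pi * freq * cap)), half)


def cascade(cell, n):
    """Direct product of n identical cells."""
    total = ((1.0, 0.0), (0.0, 1.0))
    for _ in range(n):
        total = matmul(total, cell)
    return total


def closed_form(cell, n):
    """Equation (1), with gamma_equ*d from (2) and Z_equ as a root of (3)."""
    a, b = cell[0]
    gd = cmath.acosh(a)      # gamma_equ * d, equation (2)
    z = b / cmath.sinh(gd)   # satisfies z**2 == B_cell / C_cell, equation (3)
    t = ((cmath.cosh(n * gd), z * cmath.sinh(n * gd)),
         (cmath.sinh(n * gd) / z, cmath.cosh(n * gd)))
    return t, z


# z0 and eps_eff are hypothetical; N, d and the capacitance come from Section 3.2.
cell = unit_cell(freq=10e9, z0=70.0, eps_eff=6.0, d=302e-6, cap=27e-15)
direct = cascade(cell, 26)
closed, z_equ = closed_form(cell, 26)
error = max(abs(direct[i][j] - closed[i][j]) for i in range(2) for j in range(2))
print("Z_equ =", z_equ, "max deviation =", error)
```

The deviation between the direct cascade and the closed form should be at rounding-error level, whatever the number of cells, which is the point made in the text.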

For simplicity, only the symmetrical case was considered above. However, the new topology presented in Section 4 is made of asymmetrical cells. In this case, a periodic structure model still applies and the equivalents (2), (3) are written more generally as:

$$\cosh(\gamma_{equ}d) = \frac{1}{2}\left(A_{cell} + D_{cell}\right) \qquad (4)$$

$$Z_{equ} = \frac{-2B_{cell}}{A_{cell} - D_{cell} \pm \sqrt{\left(A_{cell} + D_{cell}\right)^{2} - 4}} \qquad (5)$$

Note that in this case the two solutions for $Z_{equ}$, which correspond to the two directions of propagation along the structure, do not differ only by their sign.
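As a hedged numerical illustration of (5), the sketch below builds a deliberately asymmetrical, lossless cell (shunt capacitor, line, a different shunt capacitor, line) and evaluates both roots. All element values are hypothetical and chosen only for the example. Each root is checked against the defining property of an equivalent impedance of a periodic structure: the voltage-to-current ratio must be reproduced from one cell boundary to the next, i.e. (A z + B)/(C z + D) = z.

```python
import cmath
import math


def matmul(p, q):
    """Product of two 2x2 matrices stored as ((a, b), (c, d))."""
    return (
        (p[0][0] * q[0][0] + p[0][1] * q[1][0], p[0][0] * q[0][1] + p[0][1] * q[1][1]),
        (p[1][0] * q[0][0] + p[1][1] * q[1][0], p[1][0] * q[0][1] + p[1][1] * q[1][1]),
    )


def line(z0, theta):
    """ABCD matrix of a lossless line of impedance z0 and electrical length theta."""
    return ((math.cos(theta), 1j * z0 * math.sin(theta)),
            (1j * math.sin(theta) / z0, math.cos(theta)))


def shunt(y):
    """ABCD matrix of a shunt admittance y."""
    return ((1.0, 0.0), (y, 1.0))


def equivalent_impedances(cell):
    """Both roots of equation (5) for a reciprocal cell ((A, B), (C, D))."""
    (a, b), (_, d) = cell
    root = cmath.sqrt((a + d) ** 2 - 4)
    return -2 * b / (a - d + root), -2 * b / (a - d - root)


# Hypothetical values: 70-ohm line sections of 0.08 rad, 27 fF and 60 fF at 10 GHz.
omega = 2 * math.pi * 10e9
section = line(70.0, 0.08)
cell = matmul(matmul(shunt(1j * omega * 27e-15), section),
              matmul(shunt(1j * omega * 60e-15), section))
z_plus, z_minus = equivalent_impedances(cell)
print("Z_equ roots:", z_plus, z_minus)
```

For this asymmetrical cell the two roots are not simply opposite numbers, whereas for a symmetrical cell (A = D) expression (5) reduces to the ± pair of (3).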

3. CONVENTIONAL V-DMTL<br />

3.1 Modeling and design

The comprehensive circuit model shown in Fig. 1 and the DMTL design methods based on periodic structure modeling are described by the authors in [3]. They have been used here to model and design DMTL such as those presented in [2].

Figure 1: Circuit model for a cell in a conventional DMTL

3.2 Results<br />

Various DMTL have been built using the fabrication process presented in [3]. We present here the measurements for another digital design, compared with its circuit model results. The measured and simulated delay, insertion loss and matching are presented in Figs. 2 and 3. Good agreement is observed, which validates the model of Fig. 1. The parameters of the line are: differential delay $\Delta D$ = 18 ps, Bragg frequency $f_B$ = 80 GHz and capacitance ratio $C_r$ = 2.3. The numerical values for the circuit model are $C_M$ = 27 fF, $C_F$ = 29 fF, N = 26 cells of d = 302 µm, $R_P$ = 1 Ω, $L_P$ = 50 pH, $\Delta L_S$ = -2 pH and $\Delta R_S$ = 0.1 Ω and 0.15 Ω at 20 GHz, in the up and down states, respectively.

4. INTERLACED BITS V-DMTL<br />

4.1 Concept and modeling

When multi-bit operation is required for DMTL (N > 1 bits, where N here denotes the number of bits and not the number of cells as in Section 2), the usual approach is to cascade N DMTL sections of the kind presented in Section 3, each providing a delay corresponding to a bit weight [2]. This structure is depicted schematically on the left of Fig. 4 for N = 2 and will be referred to as the cascaded bits DMTL (CB-DMTL). On the right of the figure, we propose a new structure in which the bridges actuated by the different bits are not cascaded but interlaced. This structure will be referred to as the interlaced bits DMTL (IB-DMTL). Both structures are operated in four different states determined by the value of each control bit, named AB in the CB-DMTL case and CD in the IB-DMTL case (see Fig. 4). These states are referred to in this document by their bit values, i.e. 00, 01, 10 and 11. The bits ‘0’ and ‘1’ mean that the corresponding MEMS capacitors are in the ‘up’ and ‘down’ states, respectively.

The evolution from CB-DMTL to IB-DMTL does not simply consist in operating a single section of the CB-DMTL with ‘four-state MEMS capacitors’, since line sections are kept between the two types of bridges C and D in order to limit the distortion at high frequencies (Bragg frequency). Note also that the IB-DMTL cannot be obtained straightforwardly from a CB-DMTL design but requires a dedicated design procedure to control the differential delays appropriately.

In order to understand the main difference between the two DMTL types, let us qualitatively study the equivalent characteristic impedances of the periodic structures. We first consider the case of the CB-DMTL. In order to minimize the mismatch in the four states altogether, the equivalent characteristic impedance of each section in state 0 (high impedance) is placed above the reference impedance $Z_{ref}$. In state 1, the impedance is decreased and should now be below $Z_{ref}$. This process is illustrated by Fig. 5 and Fig. 6 and explained in detail in [3]. In the case of the IB-DMTL, there is only one periodic structure, but with four different characteristic impedances, which are ‘placed’ by the design algorithm as shown schematically on the right of Fig. 6. The worst case in terms of matching in the CB-DMTL occurs in states 01 and 10, since a significant equivalent impedance discontinuity appears at the interface between the two bit lines (Fig. 5, Fig. 6). This problem is solved by the IB-DMTL structure, since there is no discontinuity in the line in terms of equivalent impedance and the impedance in states 01 and 10 is close to the reference impedance (Fig. 6, right).

Figure 2: Absolute phase delay in ps versus frequency (0 to 20 GHz), curves labeled 0 V and 20 V (solid line: measurement, dotted: circuit model)

Figure 3: Insertion loss |s21| and matching |s11| in dB versus frequency (0 to 20 GHz), curves labeled 0 V and 20 V (solid line: measurement, dotted: circuit model)

4.2 Design<br />

The IB-DMTL was designed using a method similar to the one used for the CB-DMTL, further developed so as to obtain quasi-equal differential delays between the four states. The method was fully programmed in Matlab and provides all parameters of the DMTL, including the four capacitance values corresponding to the two types of bridge in each state.

4.3 Results<br />

This section compares the performance of the two types of DMTL. For a fair comparison, both designs were made using the same material, delay and bandwidth. The results obtained from the model used for the design are verified by full-wave simulation with Ansoft HFSS v9.2. The absolute delay and input return loss obtained are plotted in Fig. 8 and Fig. 9, respectively. It is first observed that the circuit model and the full-wave simulation are in very good agreement.

As can be seen in Fig. 9 and Table 1, the CB-DMTL and IB-DMTL exhibit approximately the same input return loss in states 00 and 11, as there is no mismatch at the transition between the two bit sections of the CB-DMTL in these states. However, in states 01 and 10, a significant mismatch occurs in the cascaded type, since two lines of impedances 44 Ω and 56.4 Ω are cascaded. This problem does not occur in the IB-DMTL case, where states 01 and 10 correspond to intermediate impedances (52.5 Ω and 46.7 Ω), hence a low input reflection |s11|. As mentioned previously, some oscillations in the delay are due to the mismatch and are therefore smaller in the IB-DMTL case.

Another important parameter for DMTL is the insertion loss, and both configurations exhibit approximately the same performance in this respect. The reason is that ohmic losses dominate over mismatch losses in the overall insertion loss.

Figure 4: Topologies and unit cell circuits: cascaded bits CB-DMTL (left) and interlaced bits IB-DMTL (right)

Figure 5: Equivalent lines model for the CB-DMTL (left) and IB-DMTL (right)

Figure 6: Illustration of the matching process for the CB-DMTL (left) and IB-DMTL (right)

Figure 7: 3D view of the designed IB-DMTL. The small fixed capacitor of one of the bridges is realized by the coupling through the silicon substrate alone.

Figure 8: Phase delay in ps of the CB-DMTL and IB-DMTL versus frequency (4 to 20 GHz) in states 00, 01, 10 and 11 (solid: HFSS simulation, dotted: lumped model).
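The impedance values quoted in Section 4.3 allow a rough first-order estimate of why the cascaded design suffers in states 01 and 10. The sketch below evaluates the reflection coefficient of a single impedance step, Γ = (Z2 - Z1)/(Z2 + Z1). It ignores multiple reflections, losses and frequency dependence, so it is not expected to reproduce the simulated |s11| values. The 50 Ω reference impedance is an assumption, since the paper only names it $Z_{ref}$.

```python
import math


def step_reflection_db(z1, z2):
    """|Gamma| in dB for a wave travelling from a line of impedance z1 into z2."""
    gamma = (z2 - z1) / (z2 + z1)
    return 20 * math.log10(abs(gamma))


Z_REF = 50.0  # assumed reference impedance [ohm], not stated in the paper

# CB-DMTL, state 01: internal step between the two bit sections (56.4 and 44 ohm).
cb_internal = step_reflection_db(56.4, 44.0)

# IB-DMTL, states 01 and 10: one uniform equivalent line seen from the reference.
ib_01 = step_reflection_db(Z_REF, 52.5)
ib_10 = step_reflection_db(Z_REF, 46.7)

print(f"CB-DMTL internal step: {cb_internal:.1f} dB")
print(f"IB-DMTL state 01: {ib_01:.1f} dB, state 10: {ib_10:.1f} dB")
```

Under these simplifications, the internal step of the cascaded line is stronger by more than 10 dB than the input step of the interlaced line, which is in line with the trend the paper reports for states 01 and 10.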

5. CONCLUSION<br />

We demonstrated the advantage of the IB-DMTL structure over its usual CB-DMTL counterpart in terms of matching and delay distortion, for two-bit operation. The cost of this improvement is an increased design complexity.

Figure 9: Input return loss |s11| in dB of the CB-DMTL and IB-DMTL versus frequency (4 to 20 GHz) in states 00, 01, 10 and 11 (solid: HFSS simulation, dotted: lumped model).

Table 1: Comparison of the two types of DMTL in terms of the equivalent impedances in Ω, the maximum of |S11| and the minimum of |S21| in dB.

        CB-DMTL                             IB-DMTL
State   Zequ,A   Zequ,B   S11max   S21min   Zequ   S11max   S21min
00      56.4     56.4     -19      -0.7     56.7   -18      -0.8
01      56.4     44       -12      -0.7     52.5   -25      -0.7
10      44       56.4     -12      -0.8     46.7   -23      -0.7
11      44       44       -17      -0.7     44.2   -19      -0.6

6. REFERENCES

[1] N. S. Barker and G. M. Rebeiz, "Optimization of distributed MEMS transmission lines phase shifters, U-band and W-band designs," IEEE Trans. Microwave Theory Tech., vol. 48, no. 11, pp. 1957-1966, Nov. 2000.

[2] J. Hayden and G. Rebeiz, "Very low-loss distributed X-band and Ka-band MEMS phase shifters using metal-air-metal capacitors," IEEE Trans. Microwave Theory Tech., vol. 51, no. 1, pp. 309-314, Jan. 2003.

[3] J. Perruisseau-Carrier, R. Fritschi, P. Crespo-Valero, and A. K. Skrivervik, "Modeling of periodic distributed MEMS, application to the design of variable true-time delay lines," IEEE Trans. Microwave Theory Tech., in review, 2005.

[4] A. S. Nagra and R. A. York, "Distributed analog phase shifters with low insertion loss," IEEE Trans. Microwave Theory Tech., vol. 47, no. 9, pp. 1705-1711, Sept. 1999.

[5] H.-T. Kim, J.-H. Park, S. Lee, S. Kim, J.-M. Kim, Y.-K. Kim, and Y. Kwon, "V-band 2-b and 4-b low-loss and low-voltage distributed MEMS digital phase shifter using metal-air-metal capacitors," IEEE Trans. Microwave Theory Tech., vol. 50, no. 12, pp. 2918-2923, Dec. 2002.



INDUCTANCE CALCULATION OF THICK-METAL INDUCTORS

A. Scuderi (1), T. Biondi (2), E. Ragonese (1), G. Palmisano (1)

(1) Università di Catania, Facoltà di Ingegneria, DIEES, Viale A. Doria 6, 95125 Catania, Italy

(2) STMicroelectronics, Stradale Primosole 50, 95121 Catania, Italy

E-mail: ascuderi@diees.unict.it

ABSTRACT<br />

In this paper, the analysis and modeling of thick-metal spiral inductors are addressed. The accuracy of a 2.5-D electromagnetic simulator is first validated by comparison with on-wafer experimental measurements. Simulation results are then employed to investigate the effect of metal thickening on inductor performance. The inductance decrease due to metal thickening is modeled using a modified current-sheet expression. The proposed formula is more accurate than the original one, with errors below 5% even for thickness-to-width ratios up to 2.5.

1. INTRODUCTION<br />

In recent years, spiral inductors have become key components in silicon-based technologies to achieve high integration levels and reduce fabrication costs. However, their quality may strongly affect the performance of most RF circuit blocks. Indeed, silicon integrated inductors suffer from both substrate and metal losses. To raise the quality factor, increasing the metal thickness and replacing aluminum with copper have been successfully investigated. Despite several experimental results, only a few works have addressed the impact of metal thickening on the design, optimization and modeling of spiral inductors [1]. Moreover, most interest has focused on the increase of the quality factor, overlooking the reduction of inductance. Although this phenomenon is taken into account in many expressions reported in the literature [2]-[4], their validation was limited to medium-thickness inductors with thickness-to-width ratios below 0.25.

In this paper, the inductance decrease of thick-metal silicon integrated inductors is investigated. A closed-form expression for low-frequency inductance calculation is also proposed. The accuracy of ADS Momentum in predicting inductor performance is demonstrated by comparing electromagnetic (EM) simulations with measurements available from previous work [5]. The proposed equation for inductance calculation is validated over a wide range of geometrical parameters and thickness-to-width ratios by means of EM simulations.

2. EM SIMULATIONS<br />

In order to investigate the effect of metal thickness on the low-frequency inductance of circular spirals, the improved capabilities of a commercial 2.5-D EM simulator, ADS Momentum by Agilent Technologies, have been exploited. The accuracy achievable with Momentum and the simulator setup were validated using several integrated inductors made available by previous research [5]. The fabrication technology employed, whose detailed cross-section is shown in Fig. 1, provides only three Al metal layers. Metal 3 is a 3-µm-thick layer used for the spiral, while Metal 2 and Metal 1 are two thinner layers used for the underpass and the ground plane, respectively. A radial pattern of oxide trenches was adopted to break induced current loops within the buried layer shielding the spiral from the underlying substrate. Buried layer contacts were placed at the edge of the ground plane, at a distance of 50 µm from the spiral.

Figure 1. Fabrication technology cross-section.<br />

The integrated inductors have turn numbers (n) from 1.5 to 5.5, metal widths (w) from 6 to 20 µm, inner diameters ($d_{in}$) from 50 to 150 µm, and an inter-metal spacing (s) of 4 µm, which results in inductances ranging from 0.3 to 8.6 nH, covering most of the values employed in the design of silicon RF ICs. The characterization was carried out using a 5-step de-embedding technique. A micrograph of an integrated inductor is shown in Fig. 2.

Fig. 3 depicts the setup employed for electromagnetic simulations, including the ground plane, the underpass and the substrate layers underlying the spiral (not visible). Unlike older versions of Momentum, the simulator allows an expansion of the metal thickness, so that inductance and quality factor are estimated more accurately. The soundness of Momentum simulations is demonstrated in Figs. 4 and 5, where simulation results and measurements of a 3.7-nH and a 2-nH inductor are compared. The simulator predicts the low-frequency inductance and the self-resonance frequency with relative errors of about 1% and 3%, respectively. Moreover, the quality factor is estimated with a maximum relative error of 9%, demonstrating the simulator accuracy in predicting the frequency dependence of series losses.

Figure 2. Micrograph of an integrated inductor.

Figure 3. Electromagnetic simulation setup.

Figure 4. Simulated and measured inductance and quality factor of a 3.7-nH inductor (n = 3.5, w = 6 µm, and $d_{in}$ = 150 µm).

Figure 5. Simulated and measured inductance and quality factor of a 2-nH inductor (n = 2.5, w = 14 µm, and $d_{in}$ = 100 µm).

For the purpose of this work, the low-frequency inductance estimation has to be accurately validated. Although the inductance decreases with frequency owing to skin and proximity effects, excellent agreement is found especially at medium frequencies, where the current distribution inside the conductor satisfies the current sheet approximation (i.e., when the skin depth is small compared to the thickness and width of the conductor). Moreover, the relative errors of low-frequency inductance for a large number of simulated and measured devices are reported in Fig. 6 as a function of inductance. The maximum and average errors of measurements with respect to electromagnetic simulations are 4.5% and less than 3%, respectively. This result demonstrates the simulator capability to estimate the inductance value for various widths, turn numbers, and diameters.

Figure 6. Relative error between simulated and measured low-frequency inductance.

Moreover, the accuracy of Momentum has been demonstrated in [6] for a metal thickness different from that of this work, since EM simulations were compared there with on-wafer experimental measurements of 5-µm-thick inductors.

The simulator provides adequate accuracy with moderate complexity and simulation time compared with full 3D electromagnetic tools. This is particularly time-saving when several inductors have to be simulated, as in this case.

3. INDUCTANCE MODELING<br />

The coil inductance has been calculated from EM simulations of open-air spirals (i.e., separated by a 5-mm air cushion from the underlying ground plane). The investigated structures have turn numbers from 1.5 to 5.5, inner diameters from 50 µm to 150 µm, and metal widths from 6 µm to 20 µm. The geometrical parameters of all investigated structures are detailed in Table I. For each structure, the metal thickness was varied from 3 µm to 15 µm. This results in thickness-to-width ratios from 0.15 to 2.5, which far exceeds the range commonly employed in the design of RF ICs.

Table I. Simulated inductors

Name   Turn number   Metal width [µm]   Inner diameter [µm]
A      5.5           6                  50
B      4.5           6                  100
C      3.5           10                 100
D      3.5           14                 50
E      1.5           14                 150
F      4.5           14                 150
G      1.5           18                 100
H      2.5           20                 150
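The thickness-to-width range quoted in the text follows directly from Table I and the 3 µm to 15 µm thickness sweep. The short sketch below recomputes it for each simulated inductor; the dictionary layout and the function name are illustrative only.

```python
# Table I: name -> (turn number, metal width [um], inner diameter [um])
INDUCTORS = {
    "A": (5.5, 6, 50),
    "B": (4.5, 6, 100),
    "C": (3.5, 10, 100),
    "D": (3.5, 14, 50),
    "E": (1.5, 14, 150),
    "F": (4.5, 14, 150),
    "G": (1.5, 18, 100),
    "H": (2.5, 20, 150),
}

T_MIN_UM, T_MAX_UM = 3.0, 15.0  # metal thickness sweep stated in the text


def ratio_range(width_um):
    """Minimum and maximum thickness-to-width ratio for a given metal width."""
    return T_MIN_UM / width_um, T_MAX_UM / width_um


ranges = {name: ratio_range(width) for name, (_, width, _) in INDUCTORS.items()}
overall = (min(low for low, _ in ranges.values()),
           max(high for _, high in ranges.values()))

for name, (low, high) in ranges.items():
    print(f"{name}: t/w from {low:.2f} to {high:.2f}")
print("overall:", overall)
```

The smallest ratio comes from the widest track (inductor H, 20 µm) and the largest from the narrowest tracks (inductors A and B, 6 µm).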

By applying the current sheet approximation [4], the inductance of spirals with finite-thickness conductors can be expressed by (1)

$$L = L_0 - \alpha \cdot \frac{\mu \cdot n^{2} \cdot d_{avg}}{2} \cdot \frac{1}{n} \cdot \ln\left(1 + \frac{t}{w}\right), \qquad (1)$$

where $L_0$ is the inductance of a zero-thickness spiral, µ is the magnetic permeability of air, $d_{avg}$ is the average diameter computed as ($d_{in}$ + $d_{out}$) / 2, n is the turn number, t is the metal thickness and w is the metal width. The correction term α, equal to 1 in the original formulation of (1), has been introduced to provide more accurate inductance calculations for spirals with large values of n and t / w, as explained in the following. A closed-form expression for $L_0$ was already given in [4] and extended to sub-nH inductances in [7]; hence, only the thickness-dependent term of (1) is dealt with in this paper.
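To make the thickness-dependent term of (1) concrete, the sketch below evaluates it for one geometry. The function names are illustrative, and the average diameter and zero-thickness inductance used in the example are hypothetical placeholders rather than values from the paper. The correction term α defaults to 1, the original formulation.

```python
import math

MU_0 = 4e-7 * math.pi  # permeability of air, taken equal to that of vacuum [H/m]


def thickness_term(n, d_avg, t, w, alpha=1.0):
    """Inductance reduction alpha*(mu*n^2*d_avg/2)*(1/n)*ln(1 + t/w), in henries."""
    return alpha * (MU_0 * n ** 2 * d_avg / 2) * (1 / n) * math.log(1 + t / w)


def relative_decrease(l0, n, d_avg, t, w, alpha=1.0):
    """Relative inductance decrease 100 * (1 - L / L0), in percent."""
    l_thick = l0 - thickness_term(n, d_avg, t, w, alpha)
    return 100 * (1 - l_thick / l0)


# Hypothetical example: n = 3.5 turns, w = 6 um, t = 3 um,
# assumed d_avg = 181 um and assumed L0 = 3.7 nH.
delta_l = thickness_term(n=3.5, d_avg=181e-6, t=3e-6, w=6e-6)
percent = relative_decrease(l0=3.7e-9, n=3.5, d_avg=181e-6, t=3e-6, w=6e-6)
print(f"delta L = {delta_l * 1e9:.3f} nH, decrease = {percent:.1f} %")
```

Because the factors n² and 1/n combine into a single factor n, the term grows linearly with the turn number and logarithmically with t / w.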

Figure 7. Relative inductance decrease [%] as a function of the thickness-to-width ratio (type A). Curves: Momentum simulation, Expression (1), Expression (1) corrected.

Figure 8. Relative inductance decrease [%] as a function of the thickness-to-width ratio (type D). Curves: Momentum simulation, Expression (1), Expression (1) corrected.

Figure 9. Relative inductance decrease [%] as a function of the thickness-to-width ratio (type G). Curves: Momentum simulation, Expression (1), Expression (1) corrected.

Figs. 7, 8 and 9 show the relative inductance decrease, calculated as 100 · (1 – L / L0), of three spirals with different geometrical layout parameters as a function of the thickness-to-width ratio. Data calculated using both the original (α = 1) and corrected versions of (1) are also reported. It can be observed that the original expression

underestimates the inductance reduction due to the increased metal thickness in all considered cases. Moreover, the discrepancies between EM simulations and calculations grow with the number of turns. As an example, relative errors increase from 16% to more than 30% as the number of turns rises from 1.5 to 5.5, even for thickness-to-width ratios smaller than 1. The formulation of (1) derives from the assumption that increasing the metal thickness only influences the self-inductance of the coil (proportional to n) while leaving the mutual inductance contributions (proportional to n² – n) unchanged. However, the above results call these assumptions into question. To improve the accuracy of (1), especially for large values of n, the correction term α was introduced.

By comparing analytical calculations with EM simulations of spirals with different geometrical parameters, it emerged that the dependence of (1) on the number of turns has to be slightly modified in order to properly account for the influence of the metal thickness on the mutual inductance contributions. This can be accomplished using an expression of the form $a \cdot n^{b}$, with a and b as fitting parameters, which allows accurate inductance modeling while still maintaining the physics-based nature of (1). The resulting expression of α is reported in (2)

\[ \alpha = 1.13 \cdot n^{0.17}. \tag{2} \]

Figs. 7, 8 and 9 demonstrate that the introduction of α in (1) substantially reduces calculation errors with respect to the original expression. Indeed, the corrected formula is in close agreement with EM simulations, providing errors smaller than 5% for turn numbers up to 5.5 and thickness-to-width ratios up to 2.5.
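As a rough illustration of how (1) and (2) combine, the following minimal sketch evaluates the relative inductance decrease 100 · (1 – L / L0) with and without the correction term. It assumes that (1) reads L = L0 − α · (µ · n² · d_avg / 2) · (1 / n) · ln(1 + t / w) and that µ is the free-space permeability; the function names and the example geometry are hypothetical and are not taken from the paper.

```python
import math

MU_0 = 4.0e-7 * math.pi  # free-space permeability in H/m (assumed for air)


def alpha_correction(n):
    """Fitted correction term of (2): alpha = 1.13 * n**0.17."""
    return 1.13 * n ** 0.17


def thickness_term(n, d_avg, t, w, alpha=1.0):
    """Thickness-dependent term of (1), in henries.

    n is the number of turns, d_avg the average diameter in metres,
    t and w the metal thickness and width (same unit for both).
    """
    return alpha * (MU_0 * n ** 2 * d_avg / 2.0) * (1.0 / n) * math.log(1.0 + t / w)


def relative_decrease(l0, n, d_avg, t, w, corrected=True):
    """Relative inductance decrease 100 * (1 - L / L0), in percent."""
    alpha = alpha_correction(n) if corrected else 1.0
    return 100.0 * thickness_term(n, d_avg, t, w, alpha) / l0


# Hypothetical spiral: illustrative numbers only, not a geometry from the paper.
n, d_avg, t, w, l0 = 3.5, 200e-6, 5e-6, 10e-6, 4e-9
print("original :", relative_decrease(l0, n, d_avg, t, w, corrected=False))
print("corrected:", relative_decrease(l0, n, d_avg, t, w, corrected=True))
```

Because α multiplies the whole thickness-dependent term, the corrected decrease is simply the original one scaled by 1.13 · n^0.17, so the gap between the two widens as the number of turns grows.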

Finally, Fig. 10 shows the simulated and calculated relative inductance decrease of all investigated spirals for t = 15 µm. Comparisons with EM simulations prove the excellent geometrical scalability of the modified expression and the improvement with respect to the original one. Indeed, maximum and average errors with respect to EM simulations are reduced from 37% to 9% and from 27% to 3%, respectively.


4. CONCLUSION<br />

A modified current-sheet expression was proposed to model the inductance decrease due to metal thickening. It achieved higher accuracy and improved geometrical scalability compared to the original one. The proposed expression showed excellent accuracy over a wide range of inductor geometries and thickness-to-width ratios.

Figure 10. Relative inductance decrease [%] of all investigated spirals (types A to H) for t = 15 µm. Series: Momentum, Expression (1) corrected, Expression (1).

5. REFERENCES<br />

[1] Y.-S. Choi and J.-B. Yoon, “Experimental analysis of the effect of metal thickness on the quality factor in integrated spiral inductors for RF ICs,” IEEE Electron Device Lett., vol. 25, pp. 76-79, Feb. 2004.

[2] S. Jenei et al., “Physics-based closed-form inductance expression for compact modelling of integrated spiral inductors,” IEEE J. Solid-State Circuits, vol. 37, pp. 77-80, Jan. 2002.

[3] C. P. Yue and S. S. Wong, “Physical modeling of spiral inductors on silicon,” IEEE Trans. Electron Devices, vol. 47, pp. 560-568, Mar. 2000.

[4] S. S. Mohan, “The design, modeling and optimization of on-chip inductor and transformer circuits,” Ph.D. thesis, Stanford University, Dec. 1999.

[5] A. Scuderi, T. Biondi, E. Ragonese and G. Palmisano, “A lumped scalable model for silicon integrated spiral inductors,” IEEE Trans. Circuits Syst. I, vol. 51, pp. 1203-1209, June 2004.

[6] Y. Tretiakov et al., “Improved modeling accuracy of thick metal passive SiGe/BiCMOS components for UWB using ADS Momentum,” in IEEE RFIC Symp. Digest, June 2004, pp. 461-464.

[7] T. Biondi, A. Scuderi, E. Ragonese and G. Palmisano, “Characterization and modeling of sub-nH integrated inductances,” in Proc. IEEE IMTC 2004, May 2004, pp. 1998-2002.



CANTILEVER BASED MEMS FOR MULTIPLE MASS SENSING<br />

María Villarroya, Jaume Verd, Jordi Teva, Gabriel Abadal, Francesc Pérez*, Jaume<br />

Esteve*, Núria Barniol<br />

Departament d’Enginyeria Electrònica, Escola Tecnica Superior d’Enginyeria, Universitat Autónoma de Barcelona, Campus UAB, 08193 Bellaterra (Barcelona), Spain

E-mail: Maria.Villarroya@uab.es

*Instituto de Microelectrónica de Barcelona, Centro de Microelectrónica de Barcelona, Consejo Superior de Investigaciones Científicas, Campus UAB, 08193 Bellaterra (Barcelona), Spain

ABSTRACT<br />

A cantilever-based Micro Electro Mechanical System (MEMS) for mass detection is presented. The sensor, intended for multiple detection, is composed of several cantilevers in an array configuration, monolithically integrated with CMOS. The cantilevers are excited electrostatically at their resonance frequency. The oscillation of each microcantilever is detected by a capacitive detection technique, and mass variation is detected as a shift of the resonance frequency. The mechanical transducers are fabricated after the CMOS process on polysilicon, one of the CMOS layers. Optical lithography is used to define the cantilevers. Cantilevers 50 µm long, 1.1 µm wide and 600 nm thick have been defined. This sensor provides a mass sensitivity of 70 ag/Hz.

1. INTRODUCTION<br />

The objective of this article is to present the fabrication and characterisation of a cantilever-based mass sensor integrated on a standard CMOS chip. To detect tiny masses (10⁻¹⁷ to 10⁻¹⁹ g), sensors based on submicrometric resonators are used [1, 2].

Cantilever-based sensors are optimal systems for chemical and biological sensing [3]. Including several devices on the same sensor allows differential measurements with higher resolution, as well as several different sensing functions on the same device (with appropriate surface functionalization).

The sensing transducer and the control system are on the same silicon substrate, with one polysilicon layer of the CMOS technology used as the structural layer for cantilever fabrication. The oscillation of the cantilever is measured by a capacitive detection technique.

2. SENSOR CHARACTERISTICS<br />

2.1 Sensor work<strong>in</strong>g pr<strong>in</strong>ciple<br />

According to the dimensions of the cantilever (Figure 1) and the mechanical properties of polysilicon (assuming E = 160 GPa, ρ = 2.33·10³ kg/m³), the spring constant (k) and the resonance frequency ($f_{res}$) can be calculated with the expressions below, where (1) and (3) are the particular forms for polysilicon:

\[ k = 4.0 \cdot 10^{10} \, \frac{t \, w^{3}}{l^{3}} \quad \text{(N/m)} \tag{1} \]

\[ f_{res} = \frac{1}{2\pi} \sqrt{\frac{E}{\rho}} \, \frac{w}{l^{2}} \quad \text{(Hz)} \tag{2} \]

\[ f_{res} = 1.3 \cdot 10^{3} \, \frac{w}{l^{2}} \quad \text{(Hz)} \tag{3} \]

The cantilever-driver transducer constitutes a mass sensor by measuring the small changes in resonance frequency produced by small changes in the cantilever mass [4]. The mass resolution (assuming that the extra mass is distributed along the cantilever) can be expressed as:

\[ \frac{\delta m}{\delta f} = 50.6 \, \frac{k}{f_{res}^{3}} \quad (\text{g} \cdot \text{Hz}^{-1}) \tag{4} \]

Thus, a theoretical mass resolution of 7.2·10⁻¹⁷ g·Hz⁻¹ can be obtained with a cantilever 50 µm long and 1.4 µm wide.
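The chain from geometry to mass resolution can be sketched numerically as follows. The sketch assumes the polysilicon form of the spring constant, k = 4.0·10¹⁰ · t · w³ / l³, the resonance frequency f_res = (1 / 2π) · √(E / ρ) · w / l², and the mass resolution δm/δf = 50.6 · k / f_res³ in g/Hz, all in SI units. The function names are hypothetical, and the 1.1 µm width and 600 nm thickness are taken from the abstract.

```python
import math

E_POLY = 160e9     # Young's modulus of polysilicon assumed in the text, Pa
RHO_POLY = 2.33e3  # density of polysilicon assumed in the text, kg/m^3


def spring_constant_poly(l, w, t):
    """Spring constant in N/m: k = 4.0e10 * t * w**3 / l**3 (polysilicon)."""
    return 4.0e10 * t * w ** 3 / l ** 3


def resonance_frequency(l, w, e=E_POLY, rho=RHO_POLY):
    """Resonance frequency in Hz: f = sqrt(E / rho) * w / (2 * pi * l**2)."""
    return math.sqrt(e / rho) * w / (2.0 * math.pi * l ** 2)


def mass_per_hertz(k, f_res):
    """Mass resolution in g/Hz: 50.6 * k / f_res**3."""
    return 50.6 * k / f_res ** 3


# Dimensions quoted in the abstract: 50 µm long, 1.1 µm wide, 600 nm thick.
l, w, t = 50e-6, 1.1e-6, 600e-9
k = spring_constant_poly(l, w, t)
f = resonance_frequency(l, w)
print(f"k = {k:.3f} N/m, f_res = {f / 1e3:.0f} kHz")
print(f"dm/df = {mass_per_hertz(k, f) / 1e-18:.0f} ag/Hz")
```

With these expressions k / f_res³ scales as t · l³, so the width cancels out of the mass resolution; the result comes out at tens of attograms per hertz, the same order as the figure quoted in the text.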


Figure 1. Definition of the dimensions of the cantilever-driver structure (drawing labels: driver, cantilever, l, s, t, w).

2.2 Excitation <strong>and</strong> Readout Circuitry<br />

The CMOS circuitry is implemented in a standard twin-well CMOS technology. It is a 5 V, 2.5 µm, 2-poly, 2-metal CMOS process suitable for mixed analog/digital applications.

The integrated circuitry consists of the readout circuitry and the multiplexing system (front-end circuitry). On-chip circuitry minimizes leakage currents through parasitic capacitances (i.e. bonding pads and external wires), which may otherwise dominate the dynamic capacitance of interest (cantilever-driver in this case) when the device is scaled down.

Figure 2. Drawing of the sensor scheme: two circuits allow differential measurements; the cantilevers are excited from the driver and polarized directly to the same voltage as the substrate.

The cantilevers are electrostatically excited by applying both AC and DC voltages; the readout is taken from one driver. Figure 2 shows a scheme of the configuration used. A different polarization voltage (DC) can be applied to each cantilever; the excitation voltage (AC) is applied to the odd-position drivers, and the readout is performed through the even-position drivers. To avoid the collapse of the cantilever onto the substrate, the substrate is polarized to the same voltage as the cantilevers through a polarization ring.

The motional current generated by the cantilever-driver interface [5] of each cantilever is detected by the readout circuit. In this work we have designed a transimpedance amplifier (I/V converter) based on an operational amplifier with resistive feedback. With this circuit we obtain a gain higher than 1 MΩ in the bandwidth required for our system [1].

2.3 Fabrication Process<br />

The main objective of the process is to keep the electrical characteristics of the CMOS circuits unchanged. The standard CMOS process is performed first, and the transducers are defined in a post-CMOS process. Special areas connected to the circuit are defined during CMOS. Figure 3 shows a cross-section drawing of the resonator fabrication area after CMOS. An aperture has been defined in the passivation layer to give access to the polysilicon. The Poly0 layer (one of the polysilicon layers of the technology) is used as the structural layer. The Poly1 layer (the second polysilicon layer of the technology) is used as a protection layer for the fabrication area during CMOS.

Figure 3 Schematic cross section of the region for the resonators<br />

fabrication after CMOS process: Poly1 (protection layer) has to<br />

be removed by RIE before fabrication process<br />

The first step after the CMOS process is to remove the protection layer by reactive ion etching (RIE). Next, the cantilever/driver structures are defined. Several technologies have been used for cantilever fabrication. With optical lithography, the mechanical transducers are defined on the whole wafer. Nanolithography techniques such as electron beam lithography allow higher resolution at lower throughput [6]. Figure 4 shows a drawing of the fabrication process using optical lithography. The first image, A), shows the fabrication area after CMOS; in B) the polysilicon protection layer is removed by RIE. Photolithography is performed to define the transducers, C), D), E). The cantilever and driver pattern is transferred to the Poly0 layer by RIE, F). With a new resist mask, H), the cantilevers are released by wet etching of the silicon oxide used as sacrificial layer, I).


A) Fabrication area after CMOS

B) Poly1 (protection layer) RIE

C) Resist deposition for UV lithography

D) UV exposure for structure definition

E) Resist mask defined after development

F) Pattern transfer to Poly0 by RIE

G) Resist removal

H) Photolithography for cantilever release

I) Structures after SiO₂ isotropic etching

Legend: silicon substrate, passivation layer, Poly 1, photoresist, Poly 0, metal, thick oxide, interlayer oxide.

Figure 4. Scheme of the fabrication process.

Figure 5 shows an optical image of the result: a four-cantilever array connected to two readout circuits. The cantilevers have been defined by optical lithography.

Figure 5. Optical image of the sensor: cantilevers defined optically on the CMOS substrate with the excitation and readout circuitry.

Figure 6 shows a detailed image of the transducer; arrays of 4 cantilevers and 5 drivers have been defined. The cantilevers are 1.1 µm wide, 50 µm long and 600 nm thick.

2.4 Results<br />

MEMS systems integrating polysilicon cantilevers with CMOS have been fabricated successfully (Figures 5 and 6). The cantilevers are 50 µm long, 1.1 µm wide and 600 nm thick, giving a mass sensitivity of 70 ag/Hz.

Figure 6. Detailed optical image of the transducer. Cantilevers 50 µm long, 1.2 µm wide and 600 nm thick have been defined.

Simultaneous electrical characterization of the frequency response of two cantilevers (shown in Figures 5 and 6) has been carried out. The results are presented in Figure 7. The detected resonance frequency is 502 kHz for cantilever #1 and 510 kHz for cantilever #2. The frequency difference between the two cantilevers is due to small variations in their dimensions caused by the technological process used.

Figure 7. Frequency response of two cantilevers, 50 µm long, 1.2 µm wide and 600 nm thick, in the same array.

From a detailed analysis of the resonance frequency of one of the cantilevers as a function of the applied voltage, the intrinsic resonance frequency of the system can be determined. Figure 8 shows the dependence of the resonance frequency on the applied effective voltage. From the linear regression, the natural resonance frequency is 519.5 kHz. As the resonance frequency depends on the Young’s modulus according to equation (2), we can determine the Young’s modulus, obtaining 127 GPa, in agreement with the values reported in the literature [6].
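The extraction of the Young's modulus from the natural resonance frequency can be sketched by inverting equation (2), read here as f_res = (1 / 2π) · √(E / ρ) · w / l². This is only an illustration under stated assumptions: the function name is hypothetical, the density is the value assumed earlier in the text, and the 1.1 µm width is an assumption (the figure captions also quote 1.2 µm, and the width actually used by the authors is not stated).

```python
import math

RHO_POLY = 2.33e3  # polysilicon density assumed in the text, kg/m^3


def youngs_modulus(f_res, l, w, rho=RHO_POLY):
    """Invert (2): E = rho * (2 * pi * f_res * l**2 / w)**2, in Pa."""
    return rho * (2.0 * math.pi * f_res * l ** 2 / w) ** 2


# 519.5 kHz is the natural frequency quoted in the text; the width of
# 1.1 µm is an assumption made for this sketch.
e = youngs_modulus(519.5e3, 50e-6, 1.1e-6)
print(f"E = {e / 1e9:.0f} GPa")
```

Since E scales as 1 / w², the extracted modulus is quite sensitive to the width: under these assumptions the 1.1 µm value gives a result close to the reported 127 GPa, while 1.2 µm gives a noticeably lower one.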


Figure 8. Dependence of the resonance frequency on the applied effective voltage.

3. CONCLUSION<br />

A mass sensing system with high resolution has been presented. The implemented array system allows multiple detection, increasing the versatility of the sensor. The readout circuitry, monolithically integrated with the micromechanical transducers, forms an on-chip system. As the resonance of two cantilevers of the array can be detected simultaneously, differential measurements, which increase the resolution of the system, are possible.

High mass resolution can be achieved in ambient conditions using optical lithography to define the cantilevers. The resolution can be improved by one order of magnitude by reducing the dimensions with e-beam lithography.

The polysilicon grains limit the resolution of the cantilever width; the use of crystalline silicon would improve the resolution. This possibility has also been developed during the PhD.

4. REFERENCES<br />

[1] K. L. Ekinci et al., “Ultimate limits to inertial mass sensing based upon nanoelectromechanical systems,” Applied Physics Letters, vol. 84, no. 22, pp. 4469-4471, 2004.

[2] J. Verd et al., “Design, fabrication and characterization of a sub-microelectromechanical resonator with monolithically integrated CMOS readout circuit,” IEEE Journal of Microelectromechanical Systems, vol. 14, no. 3, 2005.

[3] N. V. Lavrik et al., “Cantilever transducers as a platform for chemical and biological sensors,” Review of Scientific Instruments, vol. 75, no. 7, pp. 2229-2253, 2004.

[4] G. Abadal et al., “Electromechanical model of a resonating nano-cantilever based sensor for high resolution and high sensitivity mass detection,” Nanotechnology, vol. 12, pp. 100-104, 2001.

[5] J. Verd et al., “CMOS circuitry for on-chip read-out of a resonating nanometer-scale cantilever,” in DCIS 2002 Conference Proceedings, pp. 207-212, 2002.

[6] Baltes et al., “CMOS-MEMS,” Wiley-VCH, Nov. 2004.


A NOVEL TECHNIQUE FOR COUPLING<br />

THREE DIMENSIONAL MESH ADAPTATION WITH<br />

AN A POSTERIORI ERROR ESTIMATOR<br />


R. Heinzl △, M. Spevak ◦, P. Schwaha △, T. Grasser △

△ Christian Doppler Laboratory for TCAD in Microelectronics at the Institute for Microelectronics

◦ Institute for Microelectronics, Technical University Vienna, Gußhausstraße 27-29/E360, A-1040 Vienna, Austria

E-mail: Heinzl@iue.tuwien.ac.at

ABSTRACT<br />

We present a novel error-estimation-driven three-dimensional unstructured mesh adaptation technique based on a posteriori error estimation techniques with upper and lower error bounds. In contrast to other work [1, 2], we present this approach in three dimensions, using unstructured meshing techniques to enable automatic adaptation of three-dimensional unstructured meshes without any user interaction. The motivation for this approach, its applicability and its usability are presented with real-world examples.

1. INTRODUCTION<br />

Most TCAD (Technology Computer Aided Design) problems can be formulated with partial differential equations and solved by numerical methods, usually finite difference, finite element and finite volume methods. They are used to model disparate phenomena such as dopant diffusion, mechanical deformation, heat transfer, fluid flow, electromagnetic wave propagation, and quantum effects. An essential step in these methods is to find a proper tessellation of a continuous domain with discrete elements, in our case tetrahedra.

This transition from the continuous domain to a discretized domain inherently produces errors in the computed results, no matter how sophisticated or appropriate the mathematical model is. This approximation error can be enormous and can completely invalidate numerical predictions if we have no estimate or quantitative measurement of it. The general subject is referred to as a posteriori error estimation. It is essential to observe and bound the approximation error and to have a mesh adaptation strategy that guarantees the accuracy of the solution within a given range. In contrast to two dimensions, where mesh generation and adaptation techniques are mostly based on hand-crafted meshes or grids, it is almost impossible to design grids or meshes by hand in three dimensions. It is therefore very important to generate and adapt meshes in three dimensions automatically.

2. MESH GENERATION AND<br />

ADAPTATION<br />

The first step in solving equations numerically is the discretization of the underlying computational domain. A widely used approach has been to divide the domain into a structured assembly of quadrilateral cells, with the topological information being apparent from the fact that each interior vertex is surrounded by exactly the same number of cells. This kind of discretization is called a structured grid or simply a grid. The major disadvantage of this approach is that the discretization of highly non-planar elements produces a large number of points in the simulation domain. As a consequence, the subsequent simulation and calculation steps are slowed down and require a lot of computational resources.

The alternative approach is to divide the computational domain into an unstructured assembly of cells. The notable feature of an unstructured mesh is that the number of cells surrounding a typical interior vertex of the mesh is not necessarily constant. This kind of discretization is called an unstructured mesh or simply a mesh. The major disadvantage of this approach is that the element generation process is one of the most complicated procedures in the field of simulation. However, the reduction in simulation time and in the requirements on computational resources can be significant.

Because of the complex three-dimensional mesh generation process and the impracticality of using uniform refinement strategies, most TCAD simulations are based on structured grids. But with the shift to real and complex input structures, the grid approach with the refinement steps involved is no longer manageable. Here the unstructured mesh generation techniques come into play. In two dimensions most of the grid or mesh design procedure and adaptation steps are done by hand. With the step from two-dimensional to three-dimensional mesh generation and adaptation, hand-crafted design and adaptation become impossible. First, user interaction and visualization in three dimensions are very difficult. Secondly, the user cannot know where the adaptation should be done. For this reason, three-dimensional mesh generation and adaptation must be coupled with error estimation techniques to ensure an automatic adjustment for a given problem without user interaction.

A difficulty in the field of mesh adaptation is that to this date the understanding of the relationship between the quality of mesh elements, numerical accuracy, and stiffness matrix condition remains incomplete, even for the simplest cases. Experience and mathematical results have shown that isotropic elements usually lead to good results, while degenerate elements negatively affect the computation. Therefore we derive an abstract quality criterion for elements which have to be refined, so that automatic remeshing can be easily accomplished by locally removing tetrahedra patches and inserting points derived from the error estimator. Our novel technique of calculating an abstract quality criterion to control the mesh adaptation or remeshing step separates the mathematical error estimation step from the geometrical meshing step and can therefore be implemented with different error estimation models. The software components can also be easily upgraded. In the field of unstructured mesh modification the following techniques are possible:

- H-method

This method uses a geometrical parameter h for refinement (e.g. the height of a tetrahedron).

- P-method

This method varies the degree p of the approximation (e.g. quadratic ansatz functions within finite elements) while keeping the geometrical size h unchanged.

- HP-method

This method combines the p-method with the h-method.

- Adaptive remeshing method

This method extracts a patch of marked elements, which are then remeshed.

For our technique we focus mainly on the adaptive remeshing method (a kind of advancing front method [3]) because it offers the maximum degree of freedom in mesh adaptation.
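The marking step that precedes adaptive remeshing can be sketched as follows. This is a minimal illustration, not the authors' implementation: the maximum-based marking rule, the one-layer neighbour patch, the function names and the sample indicator values are all assumptions made for the example.

```python
def mark_elements(eta, fraction=0.5):
    """Return the indices of elements whose error indicator exceeds
    fraction * max(eta). The marking rule is an assumption of this sketch."""
    if not eta:
        return set()
    threshold = fraction * max(eta)
    return {k for k, value in enumerate(eta) if value > threshold}


def extract_patch(marked, neighbours):
    """Return the marked elements plus their direct neighbours, i.e. the
    patch that would be removed and remeshed (one layer, by assumption)."""
    patch = set(marked)
    for k in marked:
        patch.update(neighbours.get(k, ()))
    return patch


# Hypothetical indicators for a chain of five elements.
eta = [0.02, 0.30, 0.05, 0.28, 0.01]
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

marked = mark_elements(eta)
patch = extract_patch(marked, neighbours)
print(sorted(marked), sorted(patch))
```

Keeping the marking separate from the patch extraction mirrors the separation described above between the error estimation step and the geometrical meshing step: a different estimator only changes the values in `eta`.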

3. ERROR ESTIMATION<br />

The numerical expression of a discretized problem results in a discrete distribution of quantities and ansatz functions of a certain function class (e.g. piecewise affine functions) to describe the behavior of the quantities. Apart from the quality of the underlying mesh, the quality of the simulation essentially depends on the selection of the ansatz functions. Using piecewise affine or constant ansatz functions, as in finite volumes or finite elements, we always obtain results with a certain error. In terms of function spaces, we carry out a projection of the complete space of functions onto the subspace of piecewise affine or constant functions. Usually the Euclidean norm is used to measure the distance between two functions:<br />

\[ \|f - g\|_2 = \sqrt{\int_{-\infty}^{+\infty} \bigl(f(x) - g(x)\bigr)^2 \, dx} \qquad (1) \]<br />
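As a rough illustration of the norm (1), the following sketch approximates the distance between two functions numerically. It assumes that both functions differ only on a finite interval [a, b], so that the integral over the whole real line can be truncated; the function name `l2_distance` and the trapezoidal rule are illustrative choices, not part of the original work.

```python
import math

def l2_distance(f, g, a, b, n=1000):
    """Approximate ||f - g||_2 on [a, b] with the composite trapezoidal rule."""
    h = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        x = a + i * h
        weight = 0.5 if i in (0, n) else 1.0  # end points count half
        total += weight * (f(x) - g(x)) ** 2
    return math.sqrt(total * h)

# Distance between the affine function f(x) = x and the constant g(x) = 0 on [0, 1];
# the exact value is sqrt(1/3).
dist = l2_distance(lambda x: x, lambda x: 0.0, 0.0, 1.0)
```

The same routine could be applied to an affine interpolant and its piecewise constant projection on a single element.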

3.1 Residual based error estimation<br />

On each triangle the solution function is interpolated affinely (Figure 1) so as to obtain a globally continuous function. This function fulfills the Laplace equation in the interior of the triangle, whereas the discontinuity of the gradient of the interpolated solution function at the triangle boundaries leads to an error which can be estimated locally by the following formula [4]:<br />

\[ \eta_K = h_K \Biggl( \sum_{E \in E_K \cap E_{int}} \| J_{E,n}(u_h) \|_E^2 + \sum_{E \in E_K} \| J_{E,t}(u_h) \|_E^2 \Biggr) \qquad (2) \]<br />

where $E_K$ denotes the edges of the triangle and $E_{int}$ is the set of the interior edges. The local discontinuity of the gradient of the interpolated function at an edge is $J_E$, where $J_{E,n}$ is the normal component and $J_{E,t}$ is the tangential component. The geometry factor $h_K$ denotes a characteristic length


Figure 1: (left) Two-dimensional representation of the error estimator. The normal component of the error changes at the facet. (right) Discrete solution function uh and the interpolation function uh as functions over the mesh triangle<br />

of the triangle such as the mean edge length or the circumradius. An interpretation of the behavior of the error estimator is given in the following. A gradient in the potential causes a flux, which is free of sources in the case of the Laplace equation. If the flux is discontinuous through a facet of a tetrahedron, there has to be a source density on the facet. The Laplace equation states, however, that the source density vanishes. Therefore the estimated error is zero if the potential behaves smoothly when crossing a facet. As we use piecewise affine interpolation, the function is continuous and therefore the jump of the tangential field strength has to vanish. For this reason only the normal components of the field strength are relevant.<br />
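The argument above can be checked on a small two-dimensional example. The sketch below computes the gradient of the affine interpolant on two triangles sharing an edge and splits the gradient jump into its normal and tangential parts. The helper names (`p1_gradient`, `edge_jump`) and the sample coordinates are hypothetical and only serve as an illustration.

```python
import math

def p1_gradient(tri, vals):
    """Gradient of the affine function taking the values vals at the vertices of tri."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    u0, u1, u2 = vals
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    gx = ((u1 - u0) * (y2 - y0) - (u2 - u0) * (y1 - y0)) / det
    gy = ((x1 - x0) * (u2 - u0) - (x2 - x0) * (u1 - u0)) / det
    return gx, gy

def edge_jump(grad_a, grad_b, p, q):
    """Normal and tangential components of the gradient jump across the edge p-q."""
    length = math.hypot(q[0] - p[0], q[1] - p[1])
    tx, ty = (q[0] - p[0]) / length, (q[1] - p[1]) / length
    nx, ny = ty, -tx  # unit normal of the edge
    jx, jy = grad_a[0] - grad_b[0], grad_a[1] - grad_b[1]
    return jx * nx + jy * ny, jx * tx + jy * ty

# Two triangles sharing the edge (0,0)-(0,1); the nodal values agree on the shared edge.
grad_left = p1_gradient([(0, 0), (0, 1), (-1, 0)], [0.0, 1.0, 0.0])
grad_right = p1_gradient([(0, 0), (1, 0), (0, 1)], [0.0, 2.0, 1.0])
jump_n, jump_t = edge_jump(grad_left, grad_right, (0, 0), (0, 1))
```

Because both interpolants take the same values along the shared edge, the tangential jump is zero and only the normal jump remains in this example.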

3.2 ZZ error estimation<br />

The ZZ error estimator [5] measures how much the numerical solution uh differs from a smoothed numerical solution uh (Figure 1). For some types of differential equations such as the Laplace equation, the ZZ estimator has been shown to have upper and lower bounds [5]. For the interpolation function of the discrete numerical solution uh we use polynomial functions of degree one in each tetrahedron. The distance between the interpolated piecewise affine function and the piecewise constant function can be determined by evaluating the norm (1), which yields<br />

\[ \eta_K = \sum_i U_i^2 - \sum_{i \neq j} U_i U_j \qquad (3) \]<br />

where the $U_i$ are the result values at the vertices of the tetrahedron.<br />

3.3 Evaluation of the error estimation<br />

A quality statement regarding the calculation can be given by counting the simplices within a certain interval of error values. The range of errors (from zero to the maximum error) is divided into equidistant error classes, e.g. ten classes. With this separation, different adaptation strategies can be used: minimum number of elements, error in element, or maximum number of elements. Here we use the maximum number of elements strategy, which is bounded to 30% in each refinement step, and the error in element strategy only to sort the elements.<br />
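The classification and marking steps described above might be sketched as follows. The sketch assumes that "bounded to 30%" means that at most 30% of the elements are marked in one refinement step and that the elements with the largest error are marked first; the function names and the sample error values are hypothetical.

```python
def error_classes(errors, n_classes=10):
    """Count elements per equidistant error class between zero and the maximum error."""
    e_max = max(errors)
    counts = [0] * n_classes
    for e in errors:
        # The maximum error itself falls into the last class.
        idx = min(int(e / e_max * n_classes), n_classes - 1) if e_max > 0 else 0
        counts[idx] += 1
    return counts

def mark_elements(errors, max_fraction=0.3):
    """Indices of the elements to refine: largest errors first, at most max_fraction of all."""
    limit = int(len(errors) * max_fraction)
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    return order[:limit]

sample_errors = [0.5, 25.0, 3.0, 12.0, 0.1, 8.0, 19.0, 1.0, 0.02, 4.0]
histogram = error_classes(sample_errors)
marked = mark_elements(sample_errors)
```

A histogram that shifts towards the low classes after remeshing indicates that the local errors have dropped.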

4. RESULTS<br />

In the following the results of the error estimation and mesh adaptation techniques are shown. We use two different examples to demonstrate the behavior of our novel technique of coupling the error estimation and mesh adaptation steps through an abstract interface. First, we use a non-planar capacitor structure and calculate the potential distribution between the contacts. Here we use the residual error estimation technique only, to show the shift of the quality of the elements within each error class. The second example deals with a realistic interconnect line with tapered line elements (lines with angular side walls) and a pyramid element for the via, which connects the two lines. Here we compare the residual error and the ZZ error estimation techniques. For the non-planar capacitor we give a comparison of the initial error and the error value after one remeshing step:<br />

Quantity | Initial meshing | Remeshing<br />

Tetrahedra | 2,145 | 8,774<br />

Minimum error | 0.02 | 0.001<br />

Maximum error | 25.0 | 19.4<br />

The next diagram shows the distribution of error values. The number of tetrahedra is plotted on the y-axis while the x-axis shows the error classes. The light gray boxes show the refined error values whereas the dark gray boxes show the initial error values:<br />

[Histogram: number of tetrahedra (y-axis, 0 to 400) per error class (x-axis, classes 0 to 8); one bar is labeled 4,636]<br />

As can be seen, the error values for the elements are shifted to the left side, indicating that the local error values drop due to our refinement technique. Figure 2 depicts the error values without any refinement, whereas Figure 3 shows the distribution


Figure 2: Initial local error values without refinement<br />

Figure 3: Local error values after one refinement step<br />

of error values after one adaptation step (zero stands for a lower error, and one denotes a higher error). As we have seen in the error value diagrams, the local error values are shifted to lower values.<br />

To show the applicability and usability for a realistic example, we solve the Laplace equation within an interconnect structure and show the successful application of our technique. In Figure 4 we depict the structure, the contacts and the potential distribution. Figure 5 presents a three-dimensional visualization (not a cut through the structure) of the relative error based on the residual error estimation technique within each adaptation step. For comparison with the residual error estimator, Figure 6 presents the adaptation steps based on the ZZ error estimation technique.<br />

Figure 4: Initial interconnect structure (top) and potential distribution (bottom)<br />

The following table gives a comparison of the number of tetrahedra after each adaptation step for the two different error estimation techniques:<br />

Estimator | Initial step | Step 1 | Step 2<br />

RS: Tetrahedra | 1,720 | 2,052 | 2,334<br />

ZZ: Tetrahedra | 1,720 | 2,075 | 2,290<br />


Figure 5: Residual based error estimator, zoomed into the important via: first three adaptation steps<br />

Figure 6: ZZ error estimator, zoomed into the important via: first three adaptation steps<br />

5. CONCLUSION<br />

Using mesh refinement in combination with a posteriori error estimation leads to a considerable increase in the quality of the simulation results. Adaptive mesh refinement allows us to improve the mesh quality locally without increasing the number of mesh points dramatically. For this reason the resolution of the critical simulation domain is much higher and the relevant processes can be simulated better, whereas the regions of lower interest do not require much simulation time. In combination with a posteriori error estimation, a measure was found which triggers the refinement and indicates whether the solution is resolved adequately.<br />

6. REFERENCES<br />

[1] J.T. Oden. A Posteriori Error Estimation, Verification and Validation in Computational Solid Mechanics. ASME, USACM Standards, 2002.<br />

[2] S. Prudhomme, J.T. Oden, T. Westermann, J. Bass, and M.E. Botkin. Practical Methods for a Posteriori Error Estimation in Engineering Applications. International Journal for Numerical Methods in Engineering, 2003.<br />

[3] P. Fleischmann. Enhanced Advancing Front Delaunay Meshing in TCAD. International Conference on the Simulation of Semiconductor Processes and Devices (SISPAD), Kobe, 09-04-2002 - 09-06-2002.<br />

[4] S. Nicaise. A Posteriori Error Estimations of Some Cell-Centered Finite Volume Methods. Université de Valenciennes et du Hainaut Cambrésis, MACS, ISTV, Valenciennes Cedex 9, France, 2004.<br />

[5] O.C. Zienkiewicz and J.Z. Zhu. A Simple Error Estimator and Adaptive Procedure for Practical Engineering Analysis. Int. J. Numer. Meth. Engrg. 24 (1987) 337–357. MR 87m:73055.


0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

EFFICIENT ERROR CORRECTION SOLUTIONS<br />

FOR OFDM-BASED WIRELESS VIDEO<br />

Mauro Lattuada, Renzo Posega, Marco Mattavelli, Daniel Mlynek<br />

Ecole Polytechnique Fédérale de Lausanne (EPFL), Laboratoire de Traitement des Signaux (LTS3),<br />

Batiment ELB, Station 11, CH-1015 Lausanne, Switzerland<br />

E-mail: mauro.lattuada@epfl.ch<br />

ABSTRACT<br />

This paper describes a new error correction scheme for flexible and robust high-throughput real time video transmission systems able to cope with difficult high mobility channels. The scheme is based on combining OFDM transmission with Turbo Coding and deep time interleaving. Simulations with different channel models and field tests have shown that the proposed solution provides a relevant improvement over DVB-T based solutions.<br />

1. INTRODUCTION<br />

The objective of this work is to study wireless digital video transmission system solutions that achieve flexibility and robustness at the same time for highly mobile applications. OFDM modulation, owing to its intrinsic robustness to multipath fading, has been shown to be the best modulation choice for wireless digital video transmission. A simple evolution of today's well-established DVB-T [1] or ISDB-T [2] standards does not enable the development of an appropriate system for mobile applications. Several problems arise when trying to use transmission channels placed in higher frequency bands. The major problems are due to increased Doppler and phase noise impairments that can completely corrupt the performance of current receivers designed for less critical DVB-T mobility. New highly mobile applications require the development of appropriate OFDM processing architectures. Moreover, the DVB-T standard modes are well suited for fixed bandwidth transmission, so as to be compatible with existing analog broadcasting, and cannot easily be modified to enable transmission in variable bandwidth channels. The available channel bandwidth in wireless digital video applications can vary widely. The most important goal of this work is the investigation of the improvements achievable by Forward Error Correction (FEC) schemes combined with deep interleaving techniques applied to OFDM modulation in the presence of deep fading impairments of the transmission channel. The objective is to achieve sufficient robustness to enable QoS for real time video contributions. While the digital television standards use a concatenated error correction scheme based on a convolutional inner coder, the solution investigated in this work is based on a turbo inner coder. These codes theoretically offer better performance than convolutional codes, but the complexity and the computational requirements are higher. A concatenated turbo-block code scheme has been investigated with the purpose of taking advantage of the improved error correction capabilities for reduced complexity portable applications. The concatenated scheme obtains a very low BER and avoids error floors at low bit error rates.<br />

The second element of the error correction scheme studied in this work is the channel interleaver, used to average the channel fading in time and frequency. This mechanism is designed to adapt to the modulated bandwidth and is optimized for high throughput real time video. The proposed system architecture has finally been implemented in FPGA. The application field of the proposed system is wireless transmission for digital television production, as in the case of Digital News Gathering or sport event production. Figure 1 shows a typical application of such a system in the case of a bicycle race.<br />

Figure 1: Bicycle race example.<br />

2. DVB-T<br />

The DVB-T modulation scheme was developed to guarantee transmission error ratios between $10^{-9}$ and $10^{-12}$. The DVB-T Forward Error Correction (FEC) system has been developed to work with fixed length MPEG2-TS (Transport Stream) packets. After a proper randomization to ensure adequate binary transitions, the 188-byte packets are Reed-Solomon (204,188) coded. This outer coder adds 16 bytes and has a maximum correction capability of 8 bytes. After this, the outer interleaving, also called Forney convolutional interleaving, is performed to scatter the errors at the reception side and so make the outer coding more effective by shuffling the data bytes over 12 packets. The inner coder is a rate 1/2 convolutional code ($G_1 = 171$ oct; $G_2 = 133$ oct) and applies different puncturing patterns to obtain different coding robustness (R = 1/2; 2/3; 3/4; 5/6; 7/8). The inner interleaver consists of a bit-wise interleaving followed by a symbol interleaving. Both the bit-wise interleaver and the carrier interleaver processes are block-based, and the block length depends on the OFDM mode (2K/8K carrier modes). The bit-wise interleaver, depending on the carrier modulation mode (QPSK, 16QAM, 64QAM), distributes the stream over different carriers to prevent selective fading from corrupting consecutive message bits. The carrier interleaver then generates a permutation between carriers to spread the carriers in frequency over an OFDM symbol.<br />

Figure 2: Standard DVB-T transmitter block diagram.<br />
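The numbers quoted for the DVB-T outer code follow from the Reed-Solomon parameters, and the overall code rate of the concatenated scheme is the product of the outer and inner rates. The short sketch below works this out; the function names are illustrative only, and the overall rate ignores pilots and guard intervals.

```python
from fractions import Fraction

RS_N, RS_K = 204, 188  # Reed-Solomon (204,188): 188 data bytes, 16 parity bytes

def rs_correctable_bytes(n, k):
    """A Reed-Solomon code with n - k parity symbols corrects up to (n - k) / 2 symbol errors."""
    return (n - k) // 2

def overall_code_rate(inner_rate, n=RS_N, k=RS_K):
    """Rate of the concatenation: outer RS rate times inner (punctured) rate."""
    return Fraction(k, n) * inner_rate

INNER_RATES = [Fraction(1, 2), Fraction(2, 3), Fraction(3, 4), Fraction(5, 6), Fraction(7, 8)]
overall = {rate: overall_code_rate(rate) for rate in INNER_RATES}
correctable = rs_correctable_bytes(RS_N, RS_K)
```

For the most robust mode (R = 1/2) the overall rate is 47/102, i.e. slightly less than half of the transmitted bits carry payload.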


3. FEC and System design<br />

3.1 Turbo based concatenated FEC<br />

Quasi Error Free (QEF) performance (BER = $10^{-12}$) and acceptable implementation complexity are contradictory requirements for a pure turbo code based error correction scheme. Turbo codes can provide such performance with a high number of iterations and a large block length, and this directly impacts the memory and the processing power needed for real-time decoding. The solution developed and investigated in this work is to concatenate the turbo coder with a (204,188) Reed-Solomon outer code to correct the residual errors in the stream. The concatenated RS outer coder avoids error floors at low bit error rates and increases the working range of the system with a rapid transition to the QEF condition. This scheme is similar to the FEC proposed for DVB-S2 [3], where LDPC codes are concatenated with a BCH outer coder. The chosen inner coder is a CRSC (Circular Recursive Systematic Convolutional) turbo code [4] with code rates of R = 1/3, 2/5, 1/2, 2/3, 3/4.<br />

Table 1: Outer interleaver parameters.<br />

Parameter | Value<br />

Branches (B) | 12<br />

FIFO depth, b = 0 ÷ (B-1) | 17 ⋅ b<br />

Delay (number of packets) | 12<br />

Total memory (bytes) | 2244<br />

The outer interleaver adopted for the concatenated error correction scheme is a convolutional interleaver based on TS packets. The statistics of the errors can vary with the channel characteristics, so the convolutional interleaver has to perform good averaging over many TS packets to allow the outer decoder to be effective. At the reception side the bit stream from the turbo decoder is first converted into a byte stream, and the correct alignment is retrieved by looking for the synchronization byte of the TS header. Table 1 reports the outer interleaver parameters.<br />
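A convolutional interleaver with the parameters of Table 1 could be modelled as below. This is a minimal sketch, assuming the usual Forney structure in which branch b holds a FIFO of 17 ⋅ b bytes and the deinterleaver uses the complementary depths; the class name and its methods are hypothetical and not taken from the implemented system.

```python
from collections import deque

class ConvolutionalInterleaver:
    """Forney-style interleaver: branch b delays its bytes by unit * b branch visits."""

    def __init__(self, branches=12, unit=17, deinterleave=False, fill=None):
        depths = [unit * b for b in range(branches)]
        if deinterleave:
            depths.reverse()  # the receiver applies the complementary delays
        self.fifos = [deque([fill] * depth) for depth in depths]
        self.branch = 0

    def push(self, byte):
        """Feed one byte into the current branch and return the byte leaving it."""
        fifo = self.fifos[self.branch]
        self.branch = (self.branch + 1) % len(self.fifos)
        if not fifo:  # zero-depth branch: the byte passes straight through
            return byte
        fifo.append(byte)
        return fifo.popleft()

    def memory_bytes(self):
        return sum(len(fifo) for fifo in self.fifos)

tx = ConvolutionalInterleaver()
rx = ConvolutionalInterleaver(deinterleave=True)
data = list(range(5000))
restored = [rx.push(tx.push(value)) for value in data]
delay = 17 * 12 * 11  # end-to-end delay in bytes for B = 12 and unit depth 17
```

Under these assumptions each FIFO bank holds 1122 bytes, so interleaver and deinterleaver together need 2244 bytes, which agrees with the total memory listed in Table 1.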

3.2 Time and Frequency Interleaving<br />

Multipath fading acting on OFDM modulated signals groups the errors in bursts and decreases the efficiency of the error correction. When an OFDM signal undergoes a multipath fading process, some frequency ranges are highly attenuated, and thus several adjacent carriers are corrupted by additive noise. In the same way, shadow fading corrupts several consecutive OFDM symbols. Two strategies can be applied in the case of deep fading for OFDM symbols. The first is frequency interleaving inside each OFDM symbol, and the second, called time interleaving, acts across the OFDM symbol sequence.<br />

The frequency interleaver takes the data at the output of the inner coder and has to interleave the bit stream to prevent consecutive bits from being mapped onto the same carrier. This process can be divided into a bit-wise interleaver, which interleaves consecutive bits into different mapped carriers, and a carrier interleaver, which generates a pseudo-random sequence for the carrier mapping. The two processes are block based and can be adapted to the number of modulated useful carriers of the OFDM symbol.<br />

The time interleaver receives the modulated carrier data (2, 4 or 6 bits per data carrier) from the bit interleaver. The task of the time interleaver is to spread errors coming from shadow fading across several OFDM symbols to allow the channel coding mechanism to rebuild the corrupted data. The benefits of such a system are proportional to the memory depth because of the greater averaging period. The time interleaver needs to work on a continuous data stream and handles a huge quantity of data. The interleaving technique applied must manage a large memory and needs to access data in bursts. The structure of the interleaver is shown in Figure 3.<br />

The main characteristic of this kind of interleaver is its inherent time-continuity. One faulty OFDM symbol is spread over B consecutive OFDM symbols, and a group of E faulty OFDM symbols is spread over (B+E) symbols. The advantage of the convolutional interleaver is that it spreads time-consecutive errors evenly over the symbols. For our application the time interleaver works with blocks of 16 carriers called atoms. This simplifies the packaging of the data and improves the transfer bandwidth during the writing and reading process. Each OFDM symbol is divided into groups of carriers to vary the transmitted bandwidth. Each symbol is composed of a maximum of $N_g = 16$ groups of 96 useful carriers. The time interleaver works on variable length symbols having a number of atoms $N_a$ equal to:<br />

\[ N_a = N_g \cdot 96/16 = N_g \cdot 6 \qquad (3.1) \]<br />


The interleaving is done only among OFDM symbols, so the number of branches needs to be $B \geq N_a$; B also needs to be a multiple of $N_a$ to keep the correct atom position:<br />

\[ B = p \cdot N_a, \quad p = 1, 2, \ldots \qquad (3.2) \]<br />


Figure 3: Time interleaver.<br />

Depending on the interleaving depth ($n \cdot N_a$) and the number of atoms, the FIFO sizes $M_b$ have to be chosen as:<br />

\[ M_b = n \cdot b \cdot N_a, \quad b = 1, 2, \ldots, (B-1) \qquad (3.3) \]<br />

The delay is equal to the number of symbols to store in the memory:<br />

\[ D = n \cdot p \cdot N_a = n \cdot B \qquad (3.4) \]<br />
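Equations (3.1) to (3.4) can be evaluated together as in the following sketch. The function name and the example values p = 1 and n = 20 are assumptions chosen for illustration; the paper does not state which p and n were used.

```python
def time_interleaver_params(n_groups, p, n, carriers_per_group=96, atom_size=16):
    """Evaluate equations (3.1) to (3.4) for the time interleaver."""
    n_atoms = n_groups * carriers_per_group // atom_size          # (3.1) Na = Ng * 6
    branches = p * n_atoms                                        # (3.2) B = p * Na
    fifo_sizes = [n * b * n_atoms for b in range(1, branches)]    # (3.3) Mb = n * b * Na
    delay = n * branches                                          # (3.4) D = n * B
    return n_atoms, branches, fifo_sizes, delay

# Full bandwidth (Ng = 16) with the hypothetical choices p = 1 and n = 20.
n_atoms, branches, fifo_sizes, delay = time_interleaver_params(16, p=1, n=20)
```

With these example values the delay evaluates to 1920 symbols; whether this corresponds to the 1920-symbol interleaving depth used in the simulations is not stated in the text.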

3.3 System design<br />

Figure 4: System block diagram.<br />

The newly developed system is flexible and is able to adapt the modulation parameters depending on the application, the field conditions and the available frequencies. The spectrum is organized in blocks of carriers to achieve a variable bandwidth. Each symbol is organized in up to $N_g = 16$ groups of 96 useful carriers, 12 pilot carriers for channel estimation and 1 signalization carrier. The pilot carriers are scattered inside the OFDM symbol to obtain a better frequency and time channel estimation, and each block contains two fixed pilots. The time and frequency interleavers act only on the useful carriers of the OFDM symbol. This implies that the synchronization mechanism cannot benefit from the time-symbol interleaving, so it needs to withstand long fading periods before falling into an unlocked state. Table 2 summarizes the main system parameters.<br />
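The carrier budget and the OFDM timing implied by this organization can be cross-checked with a few lines of arithmetic. The sketch assumes that the "2K" FFT size means 2048 points and that the system frequency is the sampling rate of the FFT; both are assumptions, as are the function names.

```python
FFT_SIZE = 2048  # assumption: "2K" is taken to mean a 2048-point FFT

def carrier_budget(n_groups=16, useful=96, pilots=12, signalling=1):
    """Carrier counts obtained by multiplying the per-group figures by the number of groups."""
    per_group = useful + pilots + signalling
    return {
        "per_group": per_group,
        "useful": n_groups * useful,
        "pilots": n_groups * pilots,
        "modulated": n_groups * per_group,
    }

def ofdm_timing(fs_hz, fft_size=FFT_SIZE):
    """Inter-carrier spacing (Hz) and useful symbol duration (s) for a sampling rate fs_hz."""
    spacing = fs_hz / fft_size
    return spacing, 1.0 / spacing

budget = carrier_budget()
spacing, t_useful = ofdm_timing(12.5e6)
group_bandwidth = budget["per_group"] * spacing  # bandwidth occupied by one group of carriers
```

At 12.5 MHz this gives a spacing of about 6.1 kHz, a useful symbol duration of 163.84 µs and a group bandwidth of about 0.67 MHz, close to the values in Table 2. The per-group products give 1744 modulated and 192 pilot carriers, one fewer than the maxima listed in Table 2; the text does not say where the additional carrier comes from.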

4. Performances<br />

A complete system simulator has been developed to test the system performance. This C++ test environment has been used to validate the modulation and demodulation algorithms and to provide the VHDL test vectors for the FPGA implementation.<br />


4.1 Simulations results<br />

Table 3 shows the performance of the concatenated FEC compared to DVB-T. The table shows the required $E_b/N_0$ for BER = $2 \cdot 10^{-4}$ after the inner coder and QEF after RS for the DVB-T standard.<br />

Table 2: System parameters.<br />

Parameter | Value<br />

Useful channel bandwidth (Bw) | 0.5-25 MHz<br />

System frequency (fs) | 12.5 MHz (Bw = 0.5-10.6 MHz); 25 MHz (Bw = 1-22.2 MHz)<br />

Adjustable steps (∆Bw) | 0.66 MHz @ 12.5 MHz; 1.31 MHz @ 25 MHz<br />

FFT size | 2K<br />

Maximum number of modulated carriers (N) | 1745<br />

Maximum number of useful carriers (Nu) | 1536<br />

Inter-carrier spacing (df) | 6.1 kHz @ 12.5 MHz; 12.2 kHz @ 25 MHz<br />

Maximum number of pilot carriers | 193<br />

Useful OFDM symbol duration (Tu) | 163.84 µs @ 12.5 MHz; 81 µs @ 25 MHz<br />

Guard interval duration | 1/4, 1/8, 1/16<br />

Constellation | QPSK, 16QAM, 64QAM<br />

Inner encoder | Turbo CRSC<br />

Inner encoding rate (R) | 1/3, 2/5, 1/2, 2/3, 3/4<br />

Inner interleaving | Time and frequency<br />

Maximum time interleaver depth (Ti) | 5.4 sec QPSK; 2.7 sec 16QAM; 1.3 sec 64QAM<br />

Outer interleaving | Convolutional on 12 packets<br />

Outer encoding | RS (204,188)<br />

Useful bitrate | 4.63 ÷ 37.9 Mbit/s @ 12.5 MHz; 9.26 ÷ 75.8 Mbit/s @ 25 MHz<br />

Table 3: Performance comparison for BER = $10^{-4}$ at the output of the inner coder.<br />

Channel | Code rate | DVB-T (CC) Eb/No (dB) | Turbo Eb/No (dB)<br />

Gaussian | 1/3 | | 0.9<br />

Gaussian | 2/5 | | 1.2<br />

Gaussian | 1/2 | 3.35 | 1.35<br />

Ricean | 1/3 | | 1.3<br />

Ricean | 2/5 | | 1.6<br />

Ricean | 1/2 | 3.95 | 2.1<br />

Rayleigh | 1/3 | | 2.1<br />

Rayleigh | 2/5 | | 2.8<br />

Rayleigh | 1/2 | 5.75 | 3.95<br />

The gain obtained by using turbo codes at code rate 1/2 (Table 3) is about 2 dB for the Gaussian channel, 1.85 dB for the Ricean channel and 1.8 dB for the Rayleigh channel. Figure 5 shows the BER evolution at the output of the turbo decoder for a short signal loss. The figure reports the evolution of the BER at the output of the inner coder for a signal loss of 100 symbols (19.4 ms) in the case of inner code rate R = 1/3, QPSK modulation and $E_b/N_0$ = 1.1 dB, with a Gaussian channel and a time interleaving depth of 1920 symbols (384 ms).<br />
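The quoted gains can be reproduced from the rate 1/2 rows of Table 3, the only rows for which both schemes are listed. The dictionary layout and names below are illustrative only.

```python
# Required Eb/No in dB at code rate 1/2, taken from Table 3: (DVB-T convolutional, turbo).
REQUIRED_EBNO_DB = {
    "Gaussian": (3.35, 1.35),
    "Ricean": (3.95, 2.1),
    "Rayleigh": (5.75, 3.95),
}

def coding_gain_db(channel):
    """Difference between the Eb/No required by DVB-T and by the turbo scheme."""
    dvbt, turbo = REQUIRED_EBNO_DB[channel]
    return round(dvbt - turbo, 2)

gains = {channel: coding_gain_db(channel) for channel in REQUIRED_EBNO_DB}
```

The differences are 2.0 dB, 1.85 dB and 1.8 dB, matching the figures given in the text.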

[Plot: BER (y-axis, $10^{-6}$ to $10^{-4}$) versus OFDM symbols (x-axis, 0 to 3360)]<br />

Figure 5: BER evolution at the output of the turbo decoder for a single failure point of 100 symbols and a time interleaver depth of 1920 symbols.<br />

The graph clearly shows that after the turbo decoder the QEF threshold of BER = $2 \cdot 10^{-4}$ is not exceeded; thus the deep fading event has been overcome without residual errors.<br />

4.2 FPGA implementation<br />

The modulator and the demodulator have been implemented on a Xilinx FPGA platform. The modulator design results in a total equivalent gate count of 2.4 M gates and works at 100 MHz. The demodulator is implemented on two FPGAs, resulting in a total equivalent gate count of 10 M gates working at 50 MHz. The first FPGA receives the ADC data, controls the AGCs and implements the channel estimation and compensation, the FFT and the demapping of the data carriers. The bit metrics are then output to the FEC section implemented on the second FPGA. This FPGA implements the time interleaver and controls a 256 Mbit SDRAM working at 100 MHz.<br />

4.3 On field <strong>and</strong> implementation results<br />

The system performance has been compared with that of a standard DVB-T demodulator with 8 MHz bandwidth. The transmitter works in the 2.3 GHz band with 10 dBm emitted power. For QPSK modulation with GI=1/16 and R=1/2 the measured performance is shown in Table 4. These results give the minimum power required at the input of the demodulator to obtain an error-free image at the receiver. The implemented system has also been tested with a ground-plane transmission during some test sessions.

0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />


Figure 6: Implemented system test bench (blocks: Video IN, transmitter, fixed attenuation of 50 dB, variable attenuator from 0 to 90 dB, multimode receiver, consumer DVB-T demodulator, enhanced OFDM demodulator, Video OUT).

Table 4: Implemented system performances

DVB-T: -97 dBm

Enhanced OFDM (16 iterations, 25 ms time interleaver): -99 dBm

During these tests the transmission system was compared under real working conditions, with limited power (1 W) in a very difficult environment (forest road). The tests show that a time interleaver between 100 and 800 ms improves the robustness of the link and allows the receiver to obtain a high-quality image compared to standard DVB-T transmission.

5. CONCLUSION<br />

This work shows that a robust error correction system for high-quality video mobile terminals, based on turbo coding and time interleaving, can improve the mobile performance of OFDM systems. The robustness of the system resides in a wide error spreading associated with a strong error correction system capable of cancelling the impairments of the channel. The turbo-based concatenated error correction proves to be a good solution for QEF performance in mobile channels. The simulations show that the time interleaver is a very effective tool to overcome shadow fading events. The experimental performance of the system during field tests confirms that the proposed error correction techniques can be used in practice to improve the performance of DVB-T in highly mobile applications.

6. REFERENCES<br />

[1] Digital Video Broadcasting: Framing structure, channel coding and modulation for digital terrestrial television (DVB-T). ETSI EN 300 744 v1.4.1 (2001-01).

[2] Terrestrial Integrated Services Digital Broadcasting (ISDB-T): Specifications for channel coding, framing structure and modulation, ARIB, September 1998.

[3] Digital Video Broadcasting (DVB-S2); Second generation framing structure, channel coding and modulation systems for Broadcasting, Interactive Services, News Gathering and other broadband satellite applications. Draft ETSI EN 302 307 V1.1.1 (2004-06).

[4] D. Gnaedig, E. Boutillon, V. C. Gaudet, M. Jézéquel, P. G. Gulak, "On Multiple Slice Turbo Codes", in Proc. International Symposium on Turbo Codes and Related Topics, Brest, pp. 343-346, Sept. 2003.



ALGORITHMIC/ARCHITECTURAL DESIGN FOR<br />

H.264/MPEG-4 AVC LOW-POWER VIDEO CODEC<br />

Massimiliano Melani, Luca Fanucci, Sergio Saponara<br />

Department of Information Eng<strong>in</strong>eer<strong>in</strong>g, University of Pisa, Via Caruso, 56122, Pisa, Italy<br />

E-mail: massimiliano.melani@iet.unipi.it<br />

ABSTRACT<br />

With reference to the new H.264/AVC video codec standard, this paper presents novel algorithmic and architectural solutions for the implementation of context-aware coprocessors in real-time, low-power embedded systems. The focus is on the motion estimation task, which is the traditional bottleneck of video coding systems in terms of computational and memory costs. When the proposed VLSI architecture is implemented in CMOS technology, the same performance as the conventional Full-Search approach is achieved for a wide range of bit-rates, while remarkably reducing computational burden and power consumption.

1. INTRODUCTION<br />

ITU-T and ISO-IEC have recently released a new video coding standard, H.264/MPEG-4 AVC. It achieves the same visual quality as previous standards with bit-rate reductions of up to 50% [1], making it the enabling technology for high-quality video communication over wireline (e.g. xDSL) and wireless (e.g. UMTS, WLAN) networks. Like previous schemes, H.264/AVC is based on a hybrid motion-compensated and transform coding model, adopting multiple reference frames and variable block sizes. Additional features, particularly in the motion estimation (ME) task, improve the coding efficiency at the expense of an increased implementation cost: ME accounts for more than 60% of the whole encoder complexity and is the bottleneck of the entire codec. Thus, the design of low-power ME engines supporting multiple reference frames and variable block sizes is a key issue for H.264/AVC-compliant battery-powered video terminals. To this aim we propose to add a low-complexity context-aware controller to a conventional Full Search (FS) ME engine. By properly configuring the search parameters, the proposed controller avoids unnecessary computations and memory costs. In this way implementation complexity is reduced by a factor of up to 25, depending on the input signal, while ME accuracy is kept unaltered.

Section 2 reviews state-of-the-art algorithms and VLSI architectures for ME. Section 3 describes the proposed algorithmic solution and its architectural implementation. Section 4 shows the results achieved in terms of coding efficiency and implementation complexity when applying the context-aware controller to both FS and Fast ME search engines. Conclusions are drawn in Section 5.


2. ME ALGORITHMS AND VLSI ARCHITECTURES

A straightforward technique for performing ME is the FS: as sketched in Fig. 1, the current frame of a video sequence is divided into non-overlapping blocks (current blocks) and, for each of them, the best-matching block (reference block) is searched in one or more previously processed frames, within a search window with maximum horizontal and vertical displacements of p pixels. The matching algorithm consists in computing a cost function, usually the SAD (Sum of Absolute Differences), for all the (2p+1)^2 possible positions of the reference block within the search window. If a(i,j) and b(i,j) are the pixels of the current and reference blocks and m, n are the coordinates of the Motion Vector (MV) (i.e. the position of the reference block in the search area), the SAD for a VxH-pixel block type is:

\[
\mathrm{SAD}(m,n) = \sum_{i=0}^{V-1} \sum_{j=0}^{H-1} \left| a(i,j) - b(i+m,\, j+n) \right|, \qquad -p \le m, n \le p
\]

Figure 1. Full Search motion estimation<br />
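The SAD cost and the exhaustive scan over the search window can be sketched in a few lines of Python. This is a minimal software illustration, not the hardware engine discussed in the paper: the function names, the use of nested lists for frames, the `(top, left)` block position and the choice to skip candidates that fall outside the reference frame are all assumptions made for the example.

```python
def sad(cur, ref, top, left, m, n):
    """SAD between the V x H block `cur` and the block of `ref` whose
    top-left corner is displaced by (m, n) from position (top, left)."""
    return sum(
        abs(cur[i][j] - ref[top + i + m][left + j + n])
        for i in range(len(cur))
        for j in range(len(cur[0]))
    )


def full_search(cur, ref, top, left, p):
    """Examine every displacement -p <= m, n <= p that stays inside `ref`
    and return (SADmin, (m, n)) for the best match."""
    v, h = len(cur), len(cur[0])
    best = None
    for m in range(-p, p + 1):
        for n in range(-p, p + 1):
            r, c = top + m, left + n
            if r < 0 or c < 0 or r + v > len(ref) or c + h > len(ref[0]):
                continue  # candidate falls outside the reference frame
            cost = sad(cur, ref, top, left, m, n)
            if best is None or cost < best[0]:
                best = (cost, (m, n))
    return best


# Toy example: an 8x8 reference frame with distinct pixel values and a
# 2x2 current block copied from row 3, column 4 of that frame.
ref_frame = [[r * 8 + c for c in range(8)] for r in range(8)]
cur_block = [row[4:6] for row in ref_frame[3:5]]
print(full_search(cur_block, ref_frame, top=2, left=2, p=2))
```

With the block position set to (2, 2), the search finds a zero SAD at displacement (1, 2), i.e. at the location the block was copied from.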

In the previous H.263/MPEG-4 video coding standards only 1 reference frame and 16x16-pixel blocks were used for ME. In the emerging H.264/AVC standard the motion estimator considers the current frame and multiple reference frames and produces, as output, the minimum SADs (SADmin) and the relevant MVs for each 16x16-pixel block and its sub-partitions, down to 4x4-pixel blocks. The extension of ME to variable block sizes and multiple reference frames, together with new coding features (CABAC entropy coding, intra spatial prediction, integer transform, in-loop deblocking filter), is at the basis of the doubled coding efficiency of H.264/AVC vs. its ancestors, but is also responsible for an encoder complexity increased by up to a factor of 10 [1,6].

As concerns real-time implementation of video coding standards, ME is traditionally the bottleneck of H.26x/MPEGx coders and requires a dedicated hardware engine. As an example, implementing a FS ME for a 30 Hz CIF video (352x288 pixels), with M=5 reference frames and a search range of ±p=16 pixels, requires the examination of M·(2p+1)^2 = 5445 locations for each image block, i.e. more than 16x10^9 absolute differences and data accesses per second at pixel level (8-bit data). The high computational and memory requirements and the consequent high power consumption of the FS approach depend quadratically on the search size and linearly on the number of reference frames, but are independent of the scene content: processing slow-motion or low-quality scenes has the same high implementation cost as high-quality and/or high-dynamic videos.

Other ME techniques have been proposed in the literature in order to decrease the complexity of the search: Three-Step-Search (TSS), Four-Step-Search, 2D Log-Search [7,8,9]. Their leading idea is to reduce the number of reference blocks processed for each current block, but without exploiting the statistics of the input data; usually a local minimum is obtained instead of the global one over the search range. Hence the computational cost reduction vs. FS is achieved at the expense of a reduced estimation quality. The class of predictive algorithms [7,10] exploits the spatial and temporal correlation of typical MV fields.

For applications from tens to hundreds of Kbps they achieve the same high coding efficiency as the FS with a reduced computational load, but they are affected by two problems: (i) the higher the bit-rate of the considered application, the worse the algorithm performance; (ii) the regularity of the FS data flow is broken, thus causing poor data reuse. Several VLSI architectures have been proposed in the literature to implement FS ME; they consist of 3 main units [7,9,11]: (i) an array of processing elements realizing at pixel level the basic SAD operation; (ii) a local memory to exploit data reuse, thus reducing the accesses to large background frame memories; (iii) an I/O control unit. Such architectures are not efficient in terms of power consumption since the search engine always works at maximum effort, i.e. using the maximum search displacement and number of reference frames, even in the case of slow-motion or low-quality videos.
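The Full Search workload quoted in this section for 30 Hz CIF video (5445 locations per block, more than 16x10^9 absolute differences per second) can be reproduced with a short calculation. The sketch assumes that every pixel of the frame is compared once for each candidate location and ignores frame-border effects; the function names are illustrative.

```python
def fs_locations_per_block(p, ref_frames):
    """Candidate positions examined per block: M * (2p + 1)^2."""
    return ref_frames * (2 * p + 1) ** 2


def fs_abs_diffs_per_second(width, height, fps, p, ref_frames):
    """Pixel-level absolute differences per second for Full Search,
    assuming each pixel is compared once per candidate location."""
    return width * height * fps * fs_locations_per_block(p, ref_frames)


locations = fs_locations_per_block(p=16, ref_frames=5)
rate = fs_abs_diffs_per_second(352, 288, 30, p=16, ref_frames=5)
print(locations)         # 5445
print(f"{rate:.3e}")     # about 1.656e+10 operations per second
```

The quadratic dependence on p and the linear dependence on the number of reference frames are both visible in `fs_locations_per_block`.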

3. PROPOSED ME COPROCESSOR<br />

To address the above issues this paper proposes an ME technique that exploits input signal variations to automatically configure the search area size and the number of reference frames of a FS block-matching engine. According to the scheme in Fig. 2 we propose to add a low-complexity control unit, the Context Aware Controller [2,3], to a FS engine made up of a Search Engine plus an I/O interface and a local memory. While the search engine is working, the adaptive controller extracts from its partial results information on the input signal statistics and uses it to configure the search area size and the number of reference frames.


Figure 2. Architecture of the self-adaptive ME (blocks: I/O Control, Search Engine, Local Memory, Context Aware Controller; signals: Ext_ctrl_I/O, Data_I/O, current pixels, reference pixels, SAD, MV, search size & frame number).

The leading ideas for the design of such a controller are: (i) exploiting the spatial and temporal correlation that characterizes the MV field of real-world sequences; (ii) using the minimum SADs (SADmin) and the relevant MV values as indicators of the motion prediction accuracy. In the case of interframe coding, a high SADmin value for the processed block points out a high prediction error and indicates that the set of search parameters used (search range, number of reference frames) should be extended. A low value of SADmin instead indicates that the search parameters used allow for good motion estimation, and hence the search range and number of reference frames can be kept unaltered or reduced to save computational/memory costs. Therefore, by comparing the SADmin related to a certain block with proper thresholds, the optimal search area range and number of reference frames for the successive blocks, for each 16x16-pixel block and its subpartitions, can be determined. The analysis domain is divided into 3 zones, determined by comparing the SADmin of a certain 16x16-pixel block with 2 thresholds T1 and T2. The search window size for the successive 16x16-pixel block and its subpartitions is determined differently (by using the MVs of its spatio-temporal neighbouring blocks), according to the resulting zone. If T1


multiple reference frames are not really useful. On the contrary, in the case of sequences with a high grade of dynamism only 30% of the optimal MVs refer to the first frame; in this case multiple reference frames bring an effective benefit. As a consequence, we improved the proposed basic control, briefly discussed above, by introducing a cost function to verify whether, for all the blocks in the current frame under estimation, it is worth using multiple reference frames or not. The basic idea is to compare the maximum SADmin of the previously encoded frame with a proper threshold T3: if the maximum SADmin is lower than T3, only 1 reference frame is considered for all the blocks in the current frame, otherwise 5 reference frames are used. In the latter case, when estimating the motion of each block, the 5 search windows in the 5 reference frames have the same size (for more details see [2]). It is worth noting that the values of the thresholds T1, T2, T3 have been empirically derived by computer simulations of typical videos with a wide range of grades of dynamism and image formats, searching for a good trade-off between the performance of the resulting motion estimator and the computational overhead introduced by the decision law.

The proposed technique has been implemented both as software running on a general-purpose microprocessor (Pentium IV) and as a dedicated VLSI macrocell. The VLSI architecture of the motion estimator depicted in Fig. 2 has been implemented using a semi-custom design flow based on a reusable VHDL description and logic synthesis with Synopsys in a 0.18 µm standard-cell CMOS technology. The final motion estimator cell is made up of a context-aware controller, implementing the decision laws briefly described above, which configures the search parameters of a conventional FS engine based on the architecture presented in [12]. The adaptive reduction of the search size and of the number of reference frames obtained in the final motion estimator lightens the computational load and decreases the memory accesses required by the dedicated search engine. This reduction has been exploited, by clock gating, to lower the power needed to process a video sequence since, in CMOS VLSI circuits, the power consumption primarily derives from switching activity, which depends linearly on the computational load and memory accesses.
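The two threshold tests described in this section, the three-zone classification of a block's SADmin against T1 and T2 and the frame-level choice between 1 and 5 reference frames against T3, can be sketched as follows. The function names, the treatment of values exactly equal to a threshold, and the numeric thresholds in the usage lines are assumptions for illustration; the thresholds in the paper are derived empirically and the per-zone search-window rule is not reproduced here.

```python
def sad_zone(sad_min, t1, t2):
    """Classify the SADmin of a 16x16 block into one of 3 zones (0, 1, 2).
    Assumptions: t1 < t2, and a value equal to a threshold falls in the
    upper zone."""
    if sad_min < t1:
        return 0
    if sad_min < t2:
        return 1
    return 2


def reference_frames_for_frame(prev_frame_sad_mins, t3, max_refs=5):
    """Frame-level rule: use 1 reference frame if the largest SADmin of the
    previously encoded frame is lower than t3, otherwise max_refs."""
    return 1 if max(prev_frame_sad_mins) < t3 else max_refs


# Hypothetical threshold values, for illustration only.
T1, T2, T3 = 500, 2000, 3000
print(sad_zone(120, T1, T2))
print(reference_frames_for_frame([120, 800, 2500], T3))
print(reference_frames_for_frame([120, 800, 4100], T3))
```

Because both tests are simple comparisons on values the search engine already produces, a controller of this kind adds very little work on top of the search itself.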


4. PERFORMANCE AND COMPLEXITY RESULTS

4.1 ME Performance<br />

With respect to conventional ME solutions, and to alternative solutions proposed in the literature [4], the efficiency gain, measured as reduced processing time in the case of software implementation, ranges from a factor of 2.2 to 25, depending on the input signal and on the encoder configuration. The compression performance is the same as that of the H.264/MPEG-4 AVC scheme using FS. Fig. 3 shows the time (sec) spent on the ME task for sequences with different formats and grades of dynamism: Mother&Daughter QCIF (M&D), Foreman QCIF (FOR), Mobile&Calendar SIF (M&C) and Stefan CIF (SF), all at 30 Hz, when encoding 300 frames with QP=24, 1 reference frame and a maximum search range ±p=16. The results are obtained keeping the coding efficiency (bit-rates achieved) roughly the same over a wide range of bit-rates for equal PSNR values. We observe that our algorithm features a speed increase ranging from a factor of 2.9 up to 7 vs. the FS and from 1.1 up to 1.9 vs. [4]. Repeating the above analysis on other sequences leads to similar results.

Figure 3. ME time (sec, 0 to 300) using different motion estimators (H.264_FS-1fr, H.264_OUR, H.264_[4]) for the sequences 1) M&D, 2) FOR, 3) M&C, 4) SF.

Fig. 4 compares the PSNR vs. bit-rate performance of the JM6.1C standard model [1] using for ME our technique (H.264_OUR in Figs. 3 and 4), a FS with 5 reference frames (H.264_FS-5fr) and a FS with 1 reference frame (H.264_FS-1fr). In Fig. 4 our ME and the FS-5fr behave similarly, featuring, for all bit-rates, a PSNR 2 dB higher than the FS-1fr. The input test video is Mobile & Calendar CIF. Similar results are achieved with other test videos.

Figure 4. Coding performance using different motion estimators (Mobile & Calendar CIF): PSNR (dB, 25 to 43) vs. bit-rate (Kbps, 0 to 6000) for H.264_FS-1fr, H.264_FS-5fr and H.264_OUR.

4.2 CMOS synthesis results<br />

The complexity of the synthesized macrocell amounts to roughly 109 Kgates plus 37 Kbits of SRAM, and it allows real-time processing of the main video formats. With reference to Fig. 2, the overhead of the context-aware controller is limited in terms of circuit complexity (3.5 Kgates and 13 Kbits of SRAM) and is negligible in terms of timing, since the controller and the search engine work concurrently. The power consumption of the whole motion estimator has been extracted from gate-level simulations using as input stimuli the sequences already adopted during the algorithmic development. In the case of 30 Hz CIF videos with a maximum search range ±p=16 pixels and 5 reference frames, real-time processing is achieved with a clock frequency of about 68 MHz. The power consumption without the context-aware controller amounts to roughly 0.2 W. This value is practically independent of the input signal, since a fixed search displacement ±p=16 and 5 reference frames are used for both high-dynamic video scenes and slow-motion ones. With the context-aware control we can achieve a remarkable power saving for the whole motion estimator through clock gating. Indeed, depending on the considered input signal, the average search size ranges from 3.5 to 8.5 instead of being fixed to 16, while the average number of reference frames ranges from 1.2 to 4.6 instead of 5. As a consequence, the adaptive processing approach leads to a power consumption reduction from 60% to 95%, depending on the input signal. The lower the video sequence dynamism, the higher the power consumption reduction.
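To get a feel for where savings of this order come from, the sketch below applies the FS location count M·(2p+1)^2 from Section 2 to the quoted average search sizes and reference-frame numbers. This is only a rough first-order model assumed here for illustration: it treats the averages as if they were fixed settings, pairs the two lower and the two upper averages together, and ignores controller overhead, leakage and memory effects, so it is not expected to reproduce the 60% to 95% figures obtained from gate-level simulation.

```python
def fs_locations(p, ref_frames):
    """Candidate positions examined per block: M * (2p + 1)^2."""
    return ref_frames * (2 * p + 1) ** 2


def workload_reduction(p_avg, refs_avg, p_max=16, refs_max=5):
    """Fractional reduction in examined locations vs. the fixed setting
    (p_max, refs_max). First-order estimate only."""
    return 1.0 - fs_locations(p_avg, refs_avg) / fs_locations(p_max, refs_max)


# Assumed pairing: high-dynamism input (8.5, 4.6), low-dynamism input (3.5, 1.2).
for p_avg, refs_avg in [(8.5, 4.6), (3.5, 1.2)]:
    saving = 100 * workload_reduction(p_avg, refs_avg)
    print(f"p={p_avg}, M={refs_avg}: {saving:.1f}% fewer locations")
```

The estimate shows the same trend as the measured power figures: the smaller the average search size and number of reference frames, the larger the saving.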

4.3 Extension to Fast ME techniques<br />

The proposed technique has also been applied to the fast ME engines [5] in the JM9.0C standard model; in this case we achieve roughly the same coding efficiency (measured as bit-rate spent for a fixed PSNR) and a complexity reduction vs. [5] by a factor of at least 2. Since [5] already obtains a speed-up factor of up to 20 vs. the FS, the overall speed-up obtained by combining our context-aware control with Fast ME is higher than 40 vs. FS.

Table 1. Comparison between our technique and [5]

                     M&C     FOR     M&D
Coding time saving   69.5%   72.3%   69.2%
Bit rate increase    8%      5.40%   4.60%

5. CONCLUSION<br />

The paper presents a power-optimized motion estimator compliant with the new H.264/MPEG-4 AVC video standard. For applications with bit-rates ranging from a few Kbps to several Mbps, our ME achieves the same compression efficiency as the FS (working with multiple reference frames, variable block sizes and fixed search size) but with a noticeable complexity reduction, since unnecessary computations and memory accesses are avoided. The increased efficiency is exploited for (i) processing time reduction (up to a factor of 25) in the case of software implementation; (ii) power consumption reduction (from 60% to 95%) in the case of CMOS hardware implementation. The context-aware control technique has also been applied to recently proposed fast ME techniques, allowing for a complexity reduction higher than a factor of two with roughly the same coding efficiency. The overall speed-up obtained by combining our context-aware control with Fast ME is higher than 40 vs. FS.


6. ACKNOWLEDGMENT<br />

The work has been supported by the PRIN2003 <strong>and</strong> FIRB<br />

PRIMO projects by MIUR. Discussions with M. Casula,<br />

University of Pisa, are gratefully acknowledged.<br />

7. REFERENCES<br />

[1] T. Wiegand, G. Sullivan, G. Bjøntegaard, A. Luthra, Overview of the H.264/AVC video coding standard, IEEE TCSVT, 13 (7): 560-576, 2003.

[2] S. Saponara, M. Melani, L. Fanucci, P. Terreni, Adaptive algorithm for fast motion estimation in H.264/MPEG-4 AVC, EUSIPCO04, Vienna, 269-272.

[3] S. Saponara, M. Melani, L. Fanucci, P. Terreni, Controlling Motion Estimation Search Parameters for a Power-optimized Video Architecture in H.264/AVC, IWSES 2004, Poznan, 275-278.

[4] W. Song, M. Hong, Adaptive search range decision algorithm for fast motion estimation, Proc. of SPIE, VCIP04, vol. 5308, pp. 1073-1081, 2004.

[5] Z. Chen, J. Xu, Y. He, Efficient Fast ME Predictions and Early-termination Strategy Based on H.264 Statistical Characters, ICICS-PCM03, Singapore.

[6] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, T. Wedi, Video coding with H.264/AVC: tools, performance and complexity, IEEE Cir. and Syst. Magazine, 4 (1): 7-28, 2004.

[7] P. Kuhn, Algorithms, complexity analysis and VLSI architectures for MPEG-4 motion estimation, Kluwer Academic Publisher, 1999.

[8] M. Takahashi et al., A 60-MHz 240-mW MPEG-4 videophone LSI with 16-Mb embedded DRAM, IEEE J. Solid State Circuits 35 (11): 1713-1721, 2000.

[9] M. Harrand et al., A single-chip CIF 30 Hz H.261, H.263, H.263+ video encoder/decoder with embedded display controller, IEEE JSSC 34 (11): 1627-1633, 1999.

[10] A. Chimienti, L. Fanucci, R. Locatelli, S. Saponara, VLSI architecture for a low-power video codec system, Microelectronics Journal, 33 (5): 417-427, 2002.

[11] L. Fanucci, S. Saponara, L. Bertini, A parametric VLSI architecture for video motion estimation, Integration The VLSI Journal, 31 (1): 79-100, 2001.

[12] Y.-W. Huang, T.-C. Wang, B. Hsieh, L.-G. Chen, Hardware architecture design for variable block size motion estimation in MPEG-4 AVC/JVT/ITU-T H.264, Proc. IEEE ISCAS03, pp. 796-799, Bangkok, 2003.


0-7803-9345-7/05/$20.00 ©2005 IEEE.

ALGORITHM FOR THE AUTOMATIC VERIFICATION OF COMPLEX MIXED-SIGNAL ICS REGARDING ESD-STRESS

H. Morgenstern (1), G. Groos (2), H. Köhne (1), M. Stecher (3), W. John (1), H. Reichl (1)

(1) Fraunhofer IZM, Gustav-Meyer Allee 25, 13355 Berlin, Germany

(2) Universität der Bundeswehr, Werner-Heisenberg Weg 39, 85577 Neubiberg, Germany

(3) Infineon Technologies, AI AP TD BCD, Balanstr. 73, 81541 Munich, Germany

ABSTRACT

This publication describes an algorithm that automates the verification of a complex integrated circuit (IC) with regard to its behaviour under transient high-current impulses (e.g. ESD). To reduce the complexity of the circuit for a later transient simulation with high-current simulation models, the electrical state of the circuit under stress is analysed in a simplified way, which enables an efficient automated analysis of the whole IC. Manual extraction of the circuit parts relevant to such a transient analysis can thus be avoided. The algorithm is embedded in a commercial IC design framework and uses its existing data structures.

1. INTRODUCTION

Given the complexity of today’s integrated circuits, the development costs and ever shorter product cycles, verifying functionality by circuit simulation is essential. For both digital and mixed-signal ICs, simulations of the circuit operation are carried out as early as possible in the design cycle, e.g. using hardware description languages such as VHDL, VHDL-AMS or SystemC [1]. However, to reproduce the behaviour of analog circuit parts or the impact of parasitic resistances, capacitances or inductances, analog simulations (e.g. SPICE-based) must be integrated into the design flow.

For the sake of clarity and work sharing, designs are nowadays split into blocks and processed within a hierarchical design process. The functionality of each block is first checked individually, and the simulation of the whole IC then combines analog, digital and behavioural models. Yet a transient analog simulation of the whole IC is often no longer possible within a justifiable time because of the complexity of today’s circuits [2]. The usual manual verification or simplification of the circuit diagram is extremely time-consuming and error-prone. Moreover, it requires considerable expertise, not only in every detail of the circuit but also in the possible parasitic effects of the technology used. In the ESD case, verification of the whole circuit is especially important because complex supply nets, circuit blocks of different voltage classes and application-specific I/O or ESD structures can give rise to couplings and transient current paths that are not covered by single-block simulations [3]. A further issue when simulating a circuit subject to ESD impulses is that special simulation models are needed to model the physical behaviour during such an event. These models are considerably more complex than standard models and therefore increase both the simulation time and the probability of convergence problems.

Nevertheless, several approaches exist for predicting the behaviour of the whole IC under an ESD impulse in an early design phase. In [3], an algorithm is presented for the special case of CDM (Charged Device Model) stress; it partitions the whole circuit and models each block as an RC network. The current paths between these blocks during a CDM impulse are extracted via simulation. In contrast, the current paths in [5] are characterized by analysing the resistance of all possible paths, after which the possibility of each path is calculated. In [4], the ESD-critical devices inside the pad frame are simulated with different characteristics for snapback and breakdown, taking all possible permutations into account. In [2], a two-step approach to full-chip analysis is demonstrated, in which the whole netlist is first reduced and the reduced netlist is then simulated. A reduction algorithm analyses critical voltages in each possible path and decides whether the path is included in the reduced circuit or not. Because of the variety of possible paths, the applicability of this procedure is highly constrained in cases where a supply net is involved.

This publication presents a procedure that also reduces the whole circuit in order to allow a more precise analysis afterwards. Compared to [2], however, the reduction algorithm presented here finds the relevant paths more efficiently and considers not only fixed voltage conditions but also, to a certain approximation, the static and transient circuit states. In this way it extracts fewer superfluous paths and can distinguish between potentially damaged and otherwise involved devices.

2. Approach to Circuit Reduction

To carry out a transient simulation of the IC during ESD stress, the netlist must be reduced to those parts that decisively determine the behaviour of the circuit in the relevant case. A distinction is made between “endangered” and “involved” circuit elements. Endangered devices may be physically damaged during the stress and must be part of the reduced circuit diagram in any case. Starting from these devices, the critical current paths, i.e. the ones conducting the largest current through the endangered devices, are extracted. In addition to these paths, further devices can carry a significant current and thus decisively influence the behaviour of the whole circuit. These elements are also included in the reduced circuit diagram.

To achieve this, the circuit states of the transistors must be considered in an efficient way in order to reflect the distribution of the impulse and the influence of parasitic capacitances and resistances for the worst case.

Figure 1. Approximation of ESD-impulse by a ramp function

The rise times of an ESD current pulse range from 10 ps during CDM impulses to 10 ns during HBM impulses [8]. The protection device at the pin under test transforms these current pulses into a voltage pulse with a certain rise time (t_r) and maximum voltage (V_max). These high voltages and voltage slopes lead to currents and voltages (ohmic and capacitive) inside the circuit which damage devices or affect the circuit state of transistors. Because the worst case can be expected when both the voltage and its slope are maximal, the transient waveform of the impulse is approximated by a ramp function with the same rise time and maximum voltage (Figure 1), and the end of the rising edge is regarded as the worst case, at which the current path extraction is carried out.

Figure 2. Verification flow for netlist reduction (diagram labels: netlist; input impulse; ramp function; R_eq = t_r/C; simplified simulation (DC-simulation); current path extraction; error criteria; accuracy criteria; reduced netlist; boundary nodes; transient simulation)

To make this process time-efficient, a modified direct current simulation (DC-simulation) of the whole circuit diagram is carried out (Figure 2).


To model the capacitive behaviour of devices in a DC-simulation, the internal capacitances are replaced by equivalent resistances as follows: in a DC-simulation, these resistances should carry the same current as the internal model capacitances carry in a transient analysis at the end of the voltage ramp (worst case). That current follows from equation (1), using the approximation of the impulse by a ramp function. The resistance (R_eq) which carries the same current at constant voltage V_max as the capacitance does under the equivalent ramp can be calculated by equation (2).

$$ i(t) = C \cdot \frac{dv(t)}{dt} = C \cdot \frac{V_{max}}{t_r} \qquad (1) $$

$$ R_{eq} = \frac{V_{max}}{i} = \frac{t_r}{C} \qquad (2) $$

Thus, a DC simulation applying V_max and using those equivalent resistances is an approximation of the worst case during the ESD impulse.
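The substitution of a capacitance by its equivalent resistance can be checked numerically. The following is a minimal sketch; the function names and the component values (1 pF, 40 V, 1 ns) are illustrative assumptions and do not come from the paper's model library.

```python
def equivalent_resistance(capacitance_f: float, rise_time_s: float) -> float:
    """R_eq = t_r / C for a capacitance stressed by a linear voltage ramp."""
    if capacitance_f <= 0 or rise_time_s <= 0:
        raise ValueError("capacitance and rise time must be positive")
    return rise_time_s / capacitance_f


def ramp_current(capacitance_f: float, v_max: float, rise_time_s: float) -> float:
    """i = C * dv/dt = C * V_max / t_r during a linear ramp."""
    return capacitance_f * v_max / rise_time_s


# Illustrative values only (assumed, not taken from the paper).
C = 1e-12      # capacitance in farads
V_MAX = 40.0   # maximum ramp voltage in volts
T_R = 1e-9     # rise time in seconds

r_eq = equivalent_resistance(C, T_R)   # roughly 1 kOhm for these values
i_cap = ramp_current(C, V_MAX, T_R)    # capacitor current during the ramp
i_dc = V_MAX / r_eq                    # DC current through R_eq at V_max

print(f"R_eq = {r_eq:.1f} Ohm, i_cap = {i_cap:.4f} A, i_dc = {i_dc:.4f} A")
```

By construction, the DC current through R_eq at V_max equals the capacitor current during the ramp, which is the property the DC-models rely on.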

The simulation models can be modified in a semi-automatic process. In parallel to an existing model library, a so-called DC-library is produced in which the DC-models are archived. After these models have been embedded into the netlist, the DC-simulation of the DC-netlist can be carried out and the result file can be analysed with the help of device-specific current and voltage criteria. This process (current path extraction) analyses the paths of the highest currents starting from the damaged devices.
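The paper does not spell out the extraction procedure, so the following is only a hypothetical sketch of one way to follow the highest-current branches from a damaged device to a boundary node. The greedy strategy, the data layout (branch currents keyed by node pairs) and all node names are assumptions made for illustration.

```python
def extract_current_path(branch_currents, start_node, boundary_nodes):
    """Greedy walk (assumed strategy): from start_node, repeatedly follow the
    unvisited neighbouring branch with the largest current magnitude until a
    boundary node is reached or no unvisited neighbour remains."""
    path = [start_node]
    visited = {start_node}
    node = start_node
    while node not in boundary_nodes:
        candidates = [
            (abs(current), other)
            for (a, b), current in branch_currents.items()
            for here, other in ((a, b), (b, a))
            if here == node and other not in visited
        ]
        if not candidates:
            break  # dead end: no further branch to follow
        _, node = max(candidates)
        path.append(node)
        visited.add(node)
    return path


# Hypothetical DC result: branch currents in amperes between named nodes.
currents = {
    ("n_dev", "n1"): 1.5,
    ("n_dev", "n2"): 0.1,
    ("n1", "PAD"): 1.4,
    ("n1", "n3"): 0.1,
    ("n2", "VSS"): 0.1,
}

critical_path = extract_current_path(currents, "n_dev", {"PAD", "VSS"})
print(critical_path)
```

A production implementation would additionally apply the device-specific current and voltage criteria mentioned above; this sketch only illustrates the idea of ranking branches by current.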

Subsequently, the devices which are decisive for the behaviour of the whole circuit are extracted by means of different accuracy criteria and are included in a reduced netlist. This netlist then serves as the starting point for a transient simulation.

The whole simulation process (Figure 2) can be controlled and automated with a script language such as the one described in [6]. This makes it possible to access the internal data structures of a design framework for IC design [7].

3. Realization and Verification of the Modeling Approach

To detect overvoltages and thus to deduce possible damage to the circuit elements, the simulation models are extended by anti-serial Zener diodes (Figure 3). As soon as the voltage across one diode exceeds the breakdown voltage, a current flows in this branch and the device is marked as endangered within the algorithm for the current path extraction.
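The marking step can be pictured as a threshold test on the DC result. The sketch below is an assumption-laden simplification: it compares terminal voltages directly against a per-device breakdown voltage instead of detecting current in the Zener branch, and the device names and voltage values are hypothetical.

```python
def mark_endangered(terminal_voltages, breakdown_voltages):
    """Return the devices whose voltage magnitude exceeds their breakdown
    voltage. The magnitude is used because an anti-serial diode pair
    responds to overvoltage of either polarity."""
    endangered = []
    for device, voltage in terminal_voltages.items():
        if abs(voltage) > breakdown_voltages[device]:
            endangered.append(device)
    return endangered


# Hypothetical gate-source voltages (V) taken from a DC result file.
voltages = {"T2": 3.1, "T3": 4.9, "T6": -12.4}
# Hypothetical breakdown voltages (V) per device.
limits = {"T2": 5.0, "T3": 5.0, "T6": 8.0}

endangered_devices = mark_endangered(voltages, limits)
print(endangered_devices)
```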

The complex subcircuits of the transistor models were not changed. In particular, the internal capacitances, which have no effect during a DC simulation, were not removed; the equivalent resistances were simply added in parallel (see Figure 3). If needed, further simplification can take place here.



Figure 3. Extended DMOS simulation model<br />

Because the capacitances are substituted by their equivalent resistances, the behaviour of the circuit is modelled only within a certain approximation. As an example, in a simple RC series circuit the current flow can be described by the differential equation (3) (voltage ramp with V_max and t_r), resulting in the current transient of equation (4).


$$ \frac{di(t)}{dt} = -\frac{i(t)}{R \cdot C} + \frac{V_{max}}{R \cdot t_r} \qquad (3) $$

$$ i(t) = \frac{V_{max} \cdot C}{t_r} \left( 1 - e^{-\frac{t}{RC}} \right) \qquad (4) $$

If the capacitance is substituted by the equivalent resistance, a deviation results which depends on the ratio of R to C or, equivalently, to R_eq. Figure 4 shows the current flows of the RC and R-R_eq series connections as well as the relative deviation as a function of the ratio R_eq/R. The qualitative characteristics are well reproduced by the approximation. The maximum error occurs when R and R_eq differ by a factor of between 1.5 and 2.5 and is acceptable for a worst-case consideration, as will be shown below.

Figure 4. Current flow through RC and R-R_eq series connection and relative deviation.
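The dependence of the deviation on R_eq/R can be explored with a short calculation. This sketch evaluates the RC ramp current at the end of the ramp (t = t_r) and compares it with the DC current through R in series with R_eq = t_r/C. The definition of the relative deviation (difference divided by the transient value) and all numerical values are assumptions, so the numbers need not match Figure 4 exactly.

```python
import math


def rc_ramp_current(t, r, c, v_max, t_r):
    """Current of a series RC circuit driven by a linear voltage ramp."""
    return v_max * c / t_r * (1.0 - math.exp(-t / (r * c)))


def dc_equivalent_current(r, c, v_max, t_r):
    """DC current at V_max through R in series with R_eq = t_r / C."""
    r_eq = t_r / c
    return v_max / (r + r_eq)


def relative_deviation(ratio, v_max=40.0, t_r=1e-9, r=100.0):
    """Deviation of the DC approximation at t = t_r for ratio = R_eq / R.
    The capacitance is chosen so that t_r / C equals ratio * R."""
    c = t_r / (ratio * r)
    i_tran = rc_ramp_current(t_r, r, c, v_max, t_r)
    i_dc = dc_equivalent_current(r, c, v_max, t_r)
    return (i_dc - i_tran) / i_tran


for ratio in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
    print(f"R_eq/R = {ratio:5.1f}  deviation = {relative_deviation(ratio):+.1%}")
```

Under these assumptions the magnitude of the deviation is largest for ratios of roughly 2 and shrinks towards both very small and very large ratios, which is in line with the range quoted in the text.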

To verify this approach and to validate the approximation, transient and DC simulations were carried out with the same schematics, increasing the complexity of the analyzed circuits step by step (see also sections 4 and 5). In a first investigation, simple networks of capacitances and resistances were analyzed. The difference between transient and DC-simulation was in the range of 0.1-0.4 percent. The behaviour of a few transistors with respect to capacitive voltage dividers was studied by means of an example circuit (Figure 5). At the supply net V_DD, an impulse with a maximum voltage of 40 V and a rise time of 1 ns is applied. The capacitive coupling of transistors T3 and T2 and the resulting gate-source damage of T6 are well modelled by the DC-simulation.

Figure 5. Example circuit for test of DC-models<br />

4. Example of Use

To verify the behaviour of the algorithm under real conditions, DC-simulations and transient simulations are carried out and compared with each other. The first example is a driver stage of a communication controller, which consists of bipolar, low-voltage and high-voltage devices. The total device count is 139.

The correlation between transient and DC-simulation results is shown in Figure 6. A good correlation can be observed for currents larger than 1% of the maximum current, which shows the applicability of the approximation for a worst-case analysis.

Figure 6. Correlation between transient and DC-simulation currents (normalized)


5. Comparison of Computing Time

The approach presented in this publication assumes that a DC-simulation (at V_max) takes less time than a transient analysis (ramp function with t_r and V_max). To verify this assumption for different circuit sizes, the circuit presented in section 4 is repeatedly connected in parallel, up to 200 times. The resulting circuits are simulated transiently using a linear ramp and by a DC simulation at the operating point V = V_max using the equivalent models described above. The computing times are then evaluated.

Figure 7: Computing time as a function of device count (axes: device count, 100 to 100000; simulation time [s], 1 to 1000000; time saving [%], 0% to 70%; series: transient simulation, DC simulation, time saving)

For the DC simulation, a manual source stepping method is used to improve convergence, i.e. the DC voltage is increased in three steps up to the maximum voltage V_max, and the result of the previous simulation acts as the initial solution for the next analysis. During the transient simulation, the initial solution as well as the solution of all other time steps is found within a few iterations because of the small voltage variations between the time steps.
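The mechanics of source stepping can be illustrated on a toy problem. The sketch below is not the simulator used in the paper: it assumes a single nonlinear element with i = g·v + k·v³ in series with a resistor r, solves the operating point with a plain Newton iteration, and raises the source in three equal steps while reusing each result as the next initial guess. All parameter values are hypothetical.

```python
def solve_operating_point(v_source, v_start, r=100.0, g=1e-3, k=1e-4,
                          tol=1e-9, max_iter=50):
    """Newton iteration for the element voltage v that satisfies
    v + r * (g*v + k*v**3) = v_source, starting from v_start.
    Returns the solution and the number of iterations used."""
    v = v_start
    for iteration in range(1, max_iter + 1):
        f = v + r * (g * v + k * v ** 3) - v_source
        df = 1.0 + r * (g + 3.0 * k * v ** 2)
        step = f / df
        v -= step
        if abs(step) < tol:
            return v, iteration
    raise RuntimeError("Newton iteration did not converge")


def source_stepping(v_max, steps=3):
    """Raise the source voltage to v_max in equal steps; each operating
    point serves as the initial solution for the next step."""
    v = 0.0
    total_iterations = 0
    for n in range(1, steps + 1):
        v, iterations = solve_operating_point(v_max * n / steps, v)
        total_iterations += iterations
    return v, total_iterations


v_final, n_iter = source_stepping(40.0)
print(f"element voltage at V_max: {v_final:.4f} V after {n_iter} Newton iterations")
```

For strongly nonlinear device models a direct solve at V_max may fail to converge at all, which is the situation source stepping is meant to avoid; the toy element here converges either way and only demonstrates the stepping loop.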

Figure 7 shows time savings of 35% to 60% for the DC analysis compared to a transient simulation. This is not very large, but it has to be taken into account that the DC models were not simplified any further; they are in fact more complex because of the added resistances, so that the device count is 1.5 times higher for the DC models than for the transient simulation models. Further work is needed on simplifying the DC simulation models in order to increase the advantage in computing time.

6. Summary and Outlook

This publication presents an algorithm that is capable of analysing the behaviour of a complex mixed-signal IC in the case of an ESD event during the early design phase.

In a two-stage approach, an equivalent DC-simulation is used to model the circuit behaviour at the end of the pulse rise time (which is considered the worst case), to identify endangered devices and to extract critical paths and a reduced schematic. In a next step, this information can be used to carry out a more detailed (transient) analysis. It is shown that the deviations resulting from the approximations used are small enough for a worst-case analysis.

Further investigations are necessary to increase the time efficiency of the simulation models and the automation level of the entire analysis flow. Furthermore, parasitic capacitances and resistances should be included after a parasitic extraction of the layout has been done.

With the current state of the tool, it is possible to find endangered devices by a total IC analysis. Furthermore, the involved current paths, which can run in a complicated way across several parasitic paths and through different blocks and hierarchical levels, were identified and visualised inside the CAD environment.

In this way the reliability of the ESD analysis is increased enormously and future redesigns can be reduced.

7. References

[1] P. Rashinkar, P. Paterson, L. Singh, “System-on-a-chip Verification”, Kluwer Academic Publishers 2002

[2] M. Baird, R. Ida, “VerifyESD: A Tool for Efficient Circuit Level ESD Simulation of Mixed-Signal ICs”, EOS/ESD Symp. 2000, pp. 465-469

[3] J. Lee, K. W. Kim, S. M. S. Kang, “VeriCDF: A new Verification Methodology for Charged Device Failures”, 39th Design Automation Conference 2002

[4] M. Streibl, F. Zängl, K. Esmark, “High Abstraction Level Permutational ESD Concept Analysis”, EOS/ESD Symp. 2003

[5] P. Ngan, R. Gramacy, C. K. Wong, “Automatic Layout Based Verification of Electrostatic Discharge Paths”, EOS/ESD Symp. 2001, pp. 96-101

[6] Cadence Design Systems, “SKILL Language Reference”, June 2000

[7] Cadence Design Systems, “Cadence Design Framework II”, June 2000

[8] A. Amerasekera, C. Duvvury, “ESD in Silicon Integrated Circuits”, J. Wiley & Sons 2002

The reported R+D work in this paper is carried out in the frame of the MEDEAplus/Eureka project A509 MESDIE. This particular research is supported by the BMBF (Bundesministerium fuer Bildung und Forschung) of the Federal Republic of Germany under grant 01M3061H. The responsibility for this publication is held by the authors only.



HALOTIS – HIGH ACCURATE LOGIC TIMING SIMULATOR

P. Ruiz-de-Clavijo, M.J. Bellido, J. Juan.

Dpto. Tecnología Electrónica – Universidad de Sevilla – Spain

Instituto de Microelectrónica de Sevilla – CNM - Spain

E-mail: paulino@dte.us.es

ABSTRACT

This paper presents HALOTIS, a novel logic-timing simulator that includes the Degradation Delay Model (DDM). The DDM handles glitches with high accuracy and has been included in the HALOTIS simulation engine. HALOTIS can also be used to estimate switching activity, and its results show high accuracy when compared to HSPICE simulations.

1.MOTIVATION<br />

One of the key points in modern CMOS VLSI system design is power consumption. On the one hand, limited power consumption is a major design constraint in portable and battery-powered devices; on the other hand, the power consumption of common desktop processors is reaching values of hundreds of watts, which may seriously compromise reliability and cooling capabilities. Therefore, accurate power analysis has become an integral part of the design verification tasks.

Very accurate power estimations may be obtained through circuit-level electrical simulation using SPICE-like tools. These, however, are only applicable to small circuits or critical units because of the high CPU time and computational resources required by circuit-level simulation. Most practical power estimations need to be done at a higher level of abstraction: logic level or higher.

This work is about a logic-level simulation tool that incorporates accurate power estimation capabilities. Logic-level simulators like vss [1], modelsim [2] or verilog-xl [3] may be used to obtain power estimations through the analysis of the switching activity. Most of the power consumed in digital CMOS circuits is dynamic power (due to signal switching), which can be obtained from the operation of logic-level simulators. Total power (or current) consumption is then evaluated by using a current model for the switching signals. The accuracy of the estimation thus obtained largely depends on the ability of the logic-level simulator to provide the correct switching activity. However, it is well known that current logic-level simulators largely overestimate the switching activity (30% or more is often observed) [4] because of a poor treatment of the generation and propagation of “glitches”, or marginal pulses, that are filtered in reality. The ultimate reason for this lack of accuracy is the dynamic nature of glitch propagation, which is not taken into account in traditional logic-level behavioural models. This may become a major issue as logic-level power estimations become popular and are used to drive low-power logic synthesis.

In the past few years, our research group has developed logic-level behavioural models which explicitly incorporate the treatment of glitches through the so-called Delay Degradation Model (DDM) [5]. The main objective of this work is to develop a logic-level simulator called HALOTIS (High Accurate LOgic TIming Simulator) [6] that implements the DDM and is thus able to obtain accurate estimations of the switching activity and current consumption (and thus power) at the logic level, overcoming the limitations of common logic-level simulators. To do that, HALOTIS includes an advanced current model for logic transitions and a novel logic-simulation algorithm. Besides, HALOTIS is a fast and very accurate timing simulator.

In the next section we present the current model. Section three summarizes some results, which are briefly discussed in section four.

2.HALOTIS: HIGH ACCURACY LOGIC-TIMING SIMULATOR

The main objective of this work is to build a logic-level simulation tool called HALOTIS. The main aspects we consider in developing HALOTIS can be summarized as follows:

• The simulation engine includes the DDM proposed in previous papers [5]. This gives very high accuracy in the simulation of glitches. Also, the new algorithm for the inertial effect [7] greatly improves the accuracy with respect to the traditional approach.

• It includes an algorithm to calculate switching activity. The switching activity is used by other design tools, for example Power Compiler [1], to apply low-power design techniques. At present we are studying the SAIF [1] format specification so that these results can be used in other optimization tools.

• An algorithm to obtain the current waveform. The basic idea is to use a triangular waveform for the current. This triangle is characterized by I_max and t_max (the maximum current and the time at which it occurs) and by the beginning and the end of the triangle. To calculate I_max and t_max we are studying different models that exist in the literature [8, 9]. Preliminary current waveform results have been presented in [10].
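The triangular current waveform from the last item can be sketched as a piecewise-linear function. This is only an illustration of the shape described above: the function names, the summation helper and the numerical values are assumptions, and the models used to obtain I_max and t_max [8, 9] are not reproduced here.

```python
def triangular_current(t, t_begin, t_max, t_end, i_max):
    """Current at time t of a triangular pulse that starts at t_begin,
    peaks with i_max at t_max and returns to zero at t_end.
    Assumes t_begin <= t_max <= t_end and t_begin < t_end."""
    if t <= t_begin or t >= t_end:
        return 0.0
    if t <= t_max:
        return i_max * (t - t_begin) / (t_max - t_begin)
    return i_max * (t_end - t) / (t_end - t_max)


def pulse_charge(t_begin, t_end, i_max):
    """Charge carried by one pulse: the area of the triangle."""
    return 0.5 * i_max * (t_end - t_begin)


def total_current(t, pulses):
    """Sum of several triangular pulses, one per logic transition."""
    return sum(triangular_current(t, *pulse) for pulse in pulses)


# Hypothetical pulses: (t_begin, t_max, t_end, i_max) in arbitrary units.
pulses = [(0.0, 1.0, 3.0, 2.0), (2.0, 2.5, 4.0, 1.0)]
print(total_current(2.5, pulses))
```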


As can be seen, we are developing a logic-level simulation tool aimed at obtaining accurate estimations of power and current consumption.

3.THE DEGRADATION DELAY<br />

MODEL<br />

Typical models for logic simulation only consider the inertial effect to deal with very narrow pulses. These models show discontinuous behavior for very similar input conditions: depending on its width, an input pulse falls either in a normal propagation region or in a filtering (non-propagation) region. A real gate does not behave this way; its response changes continuously and gradually rather than abruptly. Two limit cases do appear in real behavior, one for wide pulses that are propagated normally and another for very narrow pulses that are eliminated, but there is a pulse-width range between them in which pulses are neither eliminated nor propagated normally. Inside this range, the output pulse is narrower than the corresponding input pulse, and the pulse is said to be degraded.<br />

We showed in [5] that the delay decreases exponentially as pulses are shortened. That work studied the degradation effect in depth for CMOS gates and presented a delay model that accounts for its exponential behavior. Its main conclusion is that only two parameters for each type of transition, $\tau$ and $T_0$, are needed to model the degradation effect, resulting in the following formula:<br />

$$t_p = t_{p0}\left(1 - e^{-\frac{T - T_0}{\tau}}\right) \qquad (1)$$<br />

where $t_{p0}$ is the normal propagation delay, which can be calculated using a conventional delay model [11]; $T$ is the time elapsed since the last transition at the gate's output, which measures the internal state of the gate; and $\tau$ and $T_0$ are the degradation parameters, which depend on the output load ($C_L$), the supply voltage ($V_{DD}$), the input transition time ($\tau_{in}$) and the position of the input that is changing state ($i$). It was shown in [15] that this dependence can be expressed as:<br />

$$\tau_x \times V_{DD} = A_{xi} + B_{xi} \times C_L \qquad (2)$$<br />

$$T_{0x} = \left(\frac{1}{2} - \frac{C_{xi}}{V_{DD}}\right)\tau_{in} \qquad (3)$$<br />

where "x" stands for "r" or "f" depending on the sense of the output transition (rise or fall, respectively).<br />
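The degradation model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the coefficient values and the operating point are hypothetical, and the handling of $T \le T_0$ (returning a zero delay, i.e. treating the transition as filtered) is an assumption of this sketch rather than something stated in the text.

```python
import math


def degradation_params(a, b, c, c_load, v_dd, tau_in):
    """Eqs. (2) and (3): return (tau, T0) for one transition type and input."""
    tau = (a + b * c_load) / v_dd          # Eq. (2) solved for tau
    t0 = (0.5 - c / v_dd) * tau_in         # Eq. (3)
    return tau, t0


def degraded_delay(t_p0, elapsed, tau, t0):
    """Eq. (1): delay when `elapsed` seconds have passed since the last output transition."""
    if elapsed <= t0:
        # Assumption of this sketch: Eq. (1) would give a non-positive delay,
        # so the transition is treated as filtered and 0.0 is returned.
        return 0.0
    return t_p0 * (1.0 - math.exp(-(elapsed - t0) / tau))


# Hypothetical fitting coefficients and operating point (illustration only).
tau, t0 = degradation_params(a=1e-10, b=5e3, c=0.8,
                             c_load=20e-15, v_dd=3.3, tau_in=200e-12)
for elapsed in (t0 + 0.5 * tau, t0 + 2 * tau, t0 + 10 * tau):
    delay = degraded_delay(100e-12, elapsed, tau, t0)
    print(f"T = {elapsed:.3e} s -> t_p = {delay:.3e} s")
```

For large $T$ the exponential term vanishes and the delay tends to the normal propagation delay $t_{p0}$, which is the wide-pulse limit case described above.<br />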

4. CURRENT MODEL<br />

The global current is obtained by summing the currents drawn from the power supply by each individual cell of the circuit. For a given cell we assume that current only flows during the logic switching of the cell's output. Fig. 1 shows the currents involved in a CMOS inverter structure when a rising output transition takes place. The main currents taken from $V_{DD}$ during the inverter's switching process are $I_{CL}$ and $I_{SC}$. $I_{CL}$ is the current that charges the inverter output node, and $I_{SC}$ is the short-circuit current that flows directly from $V_{DD}$ to GND while the NMOS part is still conducting. For rising output transitions, the total current drawn from the power supply is the sum of $I_{CL}$ and $I_{SC}$. For falling output transitions, the capacitance $C_L$ is discharged, and the only component taken from the power supply is $I_{SC}$.<br />

The actual current curve in the gate, as obtained with the HSPICE electrical simulator [3], is shown in Fig. 2-b. Our model fits the actual current curve with a triangle-shaped, two-piece linear curve. The triangle approximation can be seen in Fig. 2-a.<br />

Figure 1. CMOS inverter currents during logic switching.<br />

Figure 2. Inverter switching curves. (a) Triangle current model. (b) HSPICE simulation.<br />

This approach is similar to that proposed by other authors [12, 13], but uses a simpler characterization method. The triangular shape is defined by three points: the instant at which the triangle starts ($T_b$), the maximum current value and the instant at which it occurs ($T_{max}$, $I_{max}$), and the instant at which the triangle ends ($T_e$).<br />

These points can be approximated as follows: $T_b$ is the instant when the input transition starts and $T_e$ is the instant when the output transition ends, both of which are known. To calculate $I_{max}$ and $T_{max}$ we use the model proposed in [8]. In that work, the authors obtain the following equations:<br />

$$I_{max} = \sqrt{\frac{K_p \times W_p \times V_{DD}^2 \times (C_L + C_{SC})}{\tau_{in}}} \qquad (4)$$<br />

$$T_{max} = \frac{(C_L + C_{SC})\,V_{DD}}{I_{max}} \qquad (5)$$<br />

where $K_p$ and $W_p$ are respectively the transconductance factor and the width of the PMOS transistor, $V_{DD}$ is the supply voltage, $C_{SC}$ is the short-circuit capacitance as defined in [8] and $\tau_{in}$ is the transition time of the input.<br />

For falling output transitions, on the other hand, we only have to consider the short-circuit current $I_{SC}$, so that:<br />

$$I_{max} = \sqrt{\frac{K_n \times W_n \times V_{DD}^2 \times C_{SC}}{\tau_{in}}} \qquad (6)$$<br />

In this case, $T_{max}$ is the time when the input transition crosses the inverter input threshold.<br />

The proposed model allows the calculation of an<br />

approximated, triangle-shaped current curve for every<br />

output transition of a cell, only based on cell parameters<br />

<strong>and</strong> tim<strong>in</strong>g data provided dur<strong>in</strong>g logic simulation.<br />
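A minimal Python sketch of the triangle model for a rising output transition follows. All function names and numeric values are hypothetical and chosen only for illustration. The sketch also assumes that $T_{max}$ from Eq. (5) is counted from the start of the triangle $T_b$, and that $K \times W$ is expressed in A/V; neither convention is stated in the text.

```python
import math


def peak_current(k, w, v_dd, cap, tau_in):
    """Eqs. (4) and (6): I_max = sqrt(K * W * V_DD^2 * cap / tau_in)."""
    return math.sqrt(k * w * v_dd ** 2 * cap / tau_in)


def rising_triangle(k_p, w_p, v_dd, c_l, c_sc, tau_in, t_b, t_e):
    """Return (t_b, t_peak, i_max, t_e) for a rising output transition."""
    i_max = peak_current(k_p, w_p, v_dd, c_l + c_sc, tau_in)
    t_max = (c_l + c_sc) * v_dd / i_max    # Eq. (5)
    # Assumption: T_max is counted from the start of the triangle, T_b.
    return t_b, t_b + t_max, i_max, t_e


def triangle_current(t, t_b, t_peak, i_max, t_e):
    """Two-piece linear current; requires t_b < t_peak < t_e."""
    if t <= t_b or t >= t_e:
        return 0.0
    if t <= t_peak:
        return i_max * (t - t_b) / (t_peak - t_b)
    return i_max * (t_e - t) / (t_e - t_peak)


def triangle_charge(t_b, i_max, t_e):
    """Charge drawn from the supply: the area of the triangle."""
    return 0.5 * i_max * (t_e - t_b)


# Hypothetical cell and timing values (illustration only).
tri = rising_triangle(k_p=40.0, w_p=2e-6, v_dd=3.3, c_l=20e-15,
                      c_sc=5e-15, tau_in=200e-12, t_b=0.0, t_e=600e-12)
print(tri, triangle_charge(tri[0], tri[2], tri[3]))
```

Summing `triangle_current` over all switching cells at each time step would give the global current described at the start of this section.<br />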

5. RESULTS<br />

A first version of HALOTIS is already available [6, 10], and its first results are summarized here. Figs. 4, 5 and 6 show the output waveforms of a 4x4 multiplier circuit (Fig. 3) obtained with HSPICE, VERILOG and HALOTIS, respectively. The results show that the VERILOG simulation contains many glitches that are absent from the HALOTIS and HSPICE simulations.<br />

Figure 3. 4x4 multiplier circuit.<br />


This version of HALOTIS also makes it possible to calculate the switching activity (SA) of circuits. Table 1 shows the SA results for two different input sequences of the circuit of Fig. 1a obtained with HSPICE, HALOTIS and VERILOG. The results are significant in that VERILOG gives large overestimations of the switching activity (even above 60%), while HALOTIS is within 5% of HSPICE.<br />

These results show that HALOTIS could be a very useful tool when accurate data about the dynamic behaviour (timing and power) of a design are needed.<br />

Figure 4. Output waveforms of HSPICE simulation.<br />

Figure 5. Output waveforms of VERILOG simulation.<br />

Figure 6. Output waveforms of HALOTIS-DDM simulation.<br />

Table 1. Transition counts for sequences A and B.<br />

Seq | HSPICE | HALOTIS (% error) | VERILOG (% error)<br />
A | 416 | 408 (2%) | 697 (68%)<br />
B | 584 | 608 (4%) | 857 (47%)<br />
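The error percentages in Table 1 can be reproduced from the transition counts, taking HSPICE as the reference. The short sketch below assumes the error is the absolute difference relative to the HSPICE count, rounded to the nearest integer; the function name is hypothetical.

```python
def percent_error(count, reference):
    """Absolute deviation of `count` from `reference`, in percent."""
    return 100.0 * abs(count - reference) / reference


# Transition counts from Table 1: (HSPICE, HALOTIS, VERILOG).
table = {"A": (416, 408, 697), "B": (584, 608, 857)}

for seq, (hspice, halotis, verilog) in table.items():
    print(seq,
          round(percent_error(halotis, hspice)),
          round(percent_error(verilog, hspice)))
```

Under that assumption the rounded values match the percentages listed in the table.<br />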

6. REFERENCES<br />

[1] http://www.synopsys.com<br />

[2] http://www.mentorgraphics.com<br />

[3] http://www.cadence.com<br />

[4] Baena, C. and Juan, J. and Bellido, M. J. and Ruiz-de-Clavijo, P. and Jimenez, C. J. and Valencia, M., Measurement of the switching activity of CMOS digital circuits at the gate level, Proc. 12th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), pp. 353-362, September 2002, Seville, Spain.<br />

[5] Bellido-Díaz, M. J. <strong>and</strong> Juan-Chico, J. <strong>and</strong> Acosta,<br />

A. J. <strong>and</strong> Valencia, M. <strong>and</strong> Huertas, J. L., Logical<br />

modell<strong>in</strong>g of delay degradation effect <strong>in</strong> static<br />

CMOS gates, IEE Proc. Circuits Devices <strong>and</strong><br />

Systems, no. 2, Vol. 147, pp. 107-117, April 2000.<br />

[6] Ruiz-de-Clavijo, P. <strong>and</strong> Juan, J. <strong>and</strong> Bellido, M. J.<br />

<strong>and</strong> Acosta, A. J. <strong>and</strong> Valencia, M., HALOTIS:<br />

High Accuracy LOgic TIm<strong>in</strong>g Simulator with<br />

Inertial <strong>and</strong> Degradation Delay Model, Proc.<br />

Design, Automation <strong>and</strong> Test <strong>in</strong> Europe (DATE)<br />

Conference <strong>and</strong> Exhibition, Munich - Germany,<br />

March 2001.<br />

[7] J. Juan-Chico, M. J. Bellido, A. J. Acosta, and M. Valencia, Inertial effect handling method for CMOS digital IC simulation, IEE Electronics Letters, Vol. 35, pp. 2028–2030, 1999.<br />

[8] P. Maurine, R. Poirier, N. Azémard, D. Auvergne, Switching current modeling in CMOS inverter for speed and power estimation, Proc. 16th Conference on Design of Circuits and Integrated Systems (DCIS), pp. 618-622, Porto, Portugal, Nov. 2001.<br />

[9] L. Bisdounis, S. Nikolaidis, <strong>and</strong> O. Koufopavlou,<br />

Analytical transient response <strong>and</strong> propagation delay<br />

evaluation of the CMOS <strong>in</strong>verter for short-channel<br />

devices, IEEE Journal of Solid-State Circuits, vol.<br />

33, pp. 302–306, February 1998.<br />

[10] P. Ruiz-de Clavijo, J. Juan, M. J. Bellido, A.<br />

Millán, <strong>and</strong> D. Guerrero, Efficient <strong>and</strong> fast current<br />

curve estimation of CMOS digital circuits at the<br />

logic level, Proc. 12th International Workshop on<br />

Power <strong>and</strong> Tim<strong>in</strong>g Model<strong>in</strong>g, Optimization <strong>and</strong><br />

Simulation (PATMOS), pp. 400–408, Spr<strong>in</strong>ger, Sep.<br />

2002.<br />

[11] Daga, J. M. <strong>and</strong> Auvergne, D., A comprehensive<br />

delay macro model<strong>in</strong>g for submicrometer CMOS<br />

logics, IEEE Journal of Solid-State Circuits, Vol.<br />

34 – 1, 1999<br />

[12] Bogliolo, A. <strong>and</strong> Ben<strong>in</strong>i, L. <strong>and</strong> De Micheli,<br />

Giovanni <strong>and</strong> Riccò, B., Gate-level power <strong>and</strong><br />

current simulation of CMOS <strong>in</strong>tegrated circuits,<br />

IEEE Trans. on very large scale of <strong>in</strong>tegration<br />

(VLSI) systems, Vol. 5 no. 5, pp. 473-488, Dec<br />

1999.<br />

[13] Nikolaidis, S. <strong>and</strong> Chatzigeorgiou, A., Analytical<br />

estimation of propagation delay <strong>and</strong> short-circuit<br />

power dissipation <strong>in</strong> CMOS gates., International<br />

journal of circuit theory <strong>and</strong> applications, Vol. 27,<br />

pp. 375-392, 1999



TEST PATTERN FOR MICROWAVE<br />

DIELECTRIC PROPERTIES OF SrBi2Ta2O9<br />

Nicola Delmonte 1 , Bernard Enrico Watts 2 , Lorenzo Rosa 1 , Giovanni Chiorboli 1 ,<br />

Paolo Cova 1 , Roberto Menozzi 1<br />

1 Dipartimento di Ingegneria dell’Informazione, University of Parma<br />

Parco Area delle Scienze 181/A, I-43100 Parma, ITALY<br />

2 IMEM/CNR, Viale delle Scienze 37/A, I-43100 Parma, ITALY<br />

E-mail: delmonte@ee.unipr.it<br />

ABSTRACT<br />

A test structure employ<strong>in</strong>g a one-step lithography process<br />

has been built for measur<strong>in</strong>g the complex impedance of<br />

ferroelectric capacitors at microwaves. The measurements<br />

are compared to the results of a f<strong>in</strong>ite element analysis<br />

with the aim of develop<strong>in</strong>g an electrical model of the test<br />

structure <strong>in</strong> which parasitic elements appear. These<br />

elements can be experimentally measured <strong>and</strong> partially<br />

de-embedded.<br />

The purpose of this paper is the characterization of<br />

strontium-bismuth tantalate (SBT) capacitors for<br />

microwave ICs or SoCs.<br />

1. INTRODUCTION<br />

Ferroelectric technology <strong>in</strong>tegrated with the<br />

complementary metal-oxide-semiconductor (CMOS)<br />

process has been proposed for some types of non-volatile<br />

memory <strong>and</strong> analog microwave circuits (e.g. resonators<br />

<strong>and</strong> filters). In recent years, several ferroelectric th<strong>in</strong> films<br />

have been studied at microwave frequencies; lead<br />

zirconate titanate (PZT) <strong>and</strong> barium-strontium titanate<br />

(BST) have been widely <strong>in</strong>vestigated. One <strong>in</strong>terest<strong>in</strong>g<br />

ferroelectric is the SrBi2Ta2O9 (strontium-bismuth<br />

tantalate), which has been studied because of its lower<br />

dielectric fatigue when used with Pt electrodes [1, 2].<br />

However, the microwave dielectric properties of<br />

strontium-bismuth tantalate (SBT) have not yet been<br />

<strong>in</strong>vestigated as widely [3, 4]. The purpose of these studies<br />

is the microwave characterization of the dielectric<br />

properties of an SBT th<strong>in</strong> film.<br />

The dielectric properties of SBT make it a good material<br />

for the production of FERAM memories. Microwave<br />

characterizations may show other properties that could<br />

make it a good c<strong>and</strong>idate for capacitors to be employed<br />

also <strong>in</strong> microwave circuits.<br />

In this experiment, a study of high frequency dielectric<br />

properties of an SBT th<strong>in</strong> film has been performed <strong>and</strong> an<br />

equivalent circuit model has been used to correct the<br />

measurements.<br />


2. EXPERIMENTAL TECHNIQUE<br />

2.1 SBT deposition <strong>and</strong> top metal layer<br />

pattern<strong>in</strong>g<br />

Ferroelectric SrBi2Ta2O9 (SBT) thin films were deposited onto a Pt(190 nm)/TiN(20 nm)/SiO2(100 nm)/Si-undoped substrate using a metal organic decomposition (MOD) preparation technique, adapted from a method described by Joshi et al. [5]. The starting materials used were: strontium acetate Sr(CH3COO)2·0.5H2O; bismuth 2-ethylhexanoate Bi(C4H9CH(C2H5)COO)3; tantalum ethoxide Ta(CH3CH2O)5; the stabilising additives methoxyethanol CH3OCH2CH2OH and diethanolamine NH(CH2CH2OH)2; and glacial acetic acid as the solvent. The precursor solution was prepared according to the method described by Watts et al. [1] and was syringed through a 0.2 µm filter onto a 15x15 mm substrate and spun at 3000 rpm for 60 s. The baking step was done on a hot plate at 350 °C between spins. Finally, the SBT was crystallised at 750 °C in flowing oxygen for 1 h. A total of 3 layers were deposited. After baking and crystallization, the SBT film was 70 nm thick; this thickness is estimated from a previous experiment.<br />

Figure 1. Top and cross sectional view of the test structure (not to scale). Labels in the figure: top view, cross sectional view, top-ground, Au, SBT, Pt, TiN, SiO2, Si.<br />


In order to measure the dielectric properties of the SBT layer, a simple circular patch capacitor geometry was adopted, since it is convenient to measure and can be made using a one-step lithography process [6]. To realize this shape we evaporated a 100 nm Au film on the SBT layer (Fig. 1). The Au patterns were defined by the lift-off technique, to match a 150 µm pitch coplanar G-S-G microprobe used for the measurements. A matrix of metal-insulator-metal (M-I-M) capacitors was patterned with two different diameters, 50 and 100 µm, to de-embed the capacitance due to the top-ground metallization from the results provided by the vector network analyzer (VNA).<br />

Figure 2. Lumped element equivalent circuit of the test structure.<br />

The circuit in Fig. 2 can electrically model the measured impedance. In this model, RcA and RcB are the contact resistances between the S probe tip and the circular patch, and between the G probe tips and the top-ground, respectively; Cd and Ctg are the capacitances of the circular patch and the top-ground, respectively; and Rd and Rtg are resistances representing the dielectric losses of the two capacitors. Rs is the resistance of the Pt ring that connects the bottom electrode of the circular capacitor with the bottom electrode of the top-ground capacitor. Finally, Cp is the lateral parasitic capacitance that appears between the two top electrodes, the circular patch and the top-ground, while Rp is the lateral parasitic resistance between the same electrodes. This resistance is due to the SBT ring between the two desired capacitors of the test structure.<br />

If the capacitance Cp is low enough and the resistance Rp is large enough, the electrical model can be simplified as shown in Fig. 3, where Rt represents the series combination of RcA, Rs and RcB.<br />

Figure 3. Simplified lumped element equivalent circuit of the test structure.<br />

The impedance of this model is:<br />

$$Z_{AB} = R_t + \frac{R_d}{1 + \omega^2 R_d^2 C_d^2} + \frac{R_{tg}}{1 + \omega^2 R_{tg}^2 C_{tg}^2} - i\left(\frac{\omega R_d^2 C_d}{1 + \omega^2 R_d^2 C_d^2} + \frac{\omega R_{tg}^2 C_{tg}}{1 + \omega^2 R_{tg}^2 C_{tg}^2}\right)$$<br />

When C tg >> C d <strong>and</strong> R tg


To model the SBT layer we used the complex dielectric<br />

constant computed from data measured by the VNA us<strong>in</strong>g<br />

the methods described by Ma et al. [6] <strong>and</strong> Dube et al. [7].<br />

3. RESULTS AND DISCUSSION<br />

3.1 Measurements<br />

The electric properties of a dielectric material are described by the real part of the dielectric constant ε' and the loss tangent tanδ, which can be evaluated from Cd and Rd. Hence, it is necessary to de-embed the resistance Rt and the impedance of the top ground from the results given by the Anritsu 37347C VNA in the range 40 MHz to 20 GHz.<br />

Figure 5. Effect of de-embedd<strong>in</strong>g correction on loss<br />

tangent.<br />

First, the resistance Rt was extracted according to [8]; then the impedance due to the top ground (Ctg//Rtg) was de-embedded by comparing the resulting impedances of two M-I-M capacitors of different areas. The results are reported in Fig. 5, with reference to the loss tangent. The same results were obtained by considering the top ground as a short circuit in the electrical model shown in Fig. 3. With this second method, after de-embedding Rt the real part of the dielectric constant can be calculated from the measured susceptance Bm, using the formula reported in Eq. 3:<br />

$$\varepsilon'_r = \frac{B_m \cdot t}{2\pi \cdot f \cdot A_d \cdot \varepsilon_0}, \qquad (3)$$<br />

where t is the dielectric thickness, Ad is the electrode area of the capacitor under test and ε0 is the vacuum dielectric constant.<br />

In addition to the real part of the dielectric constant, a complete description of the dielectric properties requires the imaginary part of the dielectric constant ε'' or, equivalently, the loss tangent, which can be computed using Eq. 4:<br />

$$\tan\delta = \frac{\varepsilon''}{\varepsilon'} = \frac{\mathrm{Re}\{Y_m\}}{\mathrm{Im}\{Y_m\}} = \frac{\mathrm{Re}\{1/Z_m\}}{\mathrm{Im}\{1/Z_m\}}, \qquad (4)$$<br />

where Y m (Z m) is the admittance (impedance) measured<br />

after de-embedd<strong>in</strong>g R t.<br />
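The extraction described by Eqs. 3 and 4 can be illustrated with a small round-trip sketch in Python. It builds a synthetic impedance from the simplified model with the top ground treated as a short circuit, removes Rt, and recovers the dielectric constant and loss tangent. The function names are hypothetical; the film thickness (70 nm), patch diameter (50 µm) and Rt (0.6 Ω) are taken from the text, while the dielectric constant of 40 and loss tangent of 0.05 are illustrative inputs, not measured data.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def parallel_rc(r, c, f):
    """Impedance of a resistance r in parallel with a capacitance c at frequency f."""
    return r / (1 + 1j * 2 * math.pi * f * r * c)


def extract_dielectric(z_meas, r_t, f, thickness, area):
    """Apply Eqs. 3 and 4 after removing the series resistance r_t."""
    y_m = 1 / (z_meas - r_t)                       # admittance after de-embedding Rt
    eps_r = y_m.imag * thickness / (2 * math.pi * f * area * EPS0)   # Eq. 3
    tan_delta = y_m.real / y_m.imag                # Eq. 4
    return eps_r, tan_delta


# Synthetic test structure (top ground treated as a short circuit).
f = 1e9                                  # Hz
thickness = 70e-9                        # m, SBT film thickness
area = math.pi * (25e-6) ** 2            # m^2, 50 um diameter patch
c_d = EPS0 * 40.0 * area / thickness     # illustrative dielectric constant of 40
r_d = 1 / (2 * math.pi * f * c_d * 0.05) # illustrative loss tangent of 0.05
z_meas = 0.6 + parallel_rc(r_d, c_d, f)  # Rt = 0.6 ohm in series

eps_r, tan_delta = extract_dielectric(z_meas, 0.6, f, thickness, area)
print(eps_r, tan_delta)
```

With real VNA data, `z_meas` would be the measured impedance at each frequency point, and Rt would come from the extraction step cited in the text.<br />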

According to the model in Fig. 3, if the top-ground capacitor acts as a short circuit, then after de-embedding Rt the measured susceptance Bm must vary linearly with frequency. We can estimate the frequency limit of the model as the frequency above which the measured value of Bm departs from this theoretical behaviour. When the VNA calibration and the contact between the probe tips and the electrodes of the test structure are done with care, the frequency limit is higher. For the structures measured, we observed a frequency limit of about 10 GHz. tanδ becomes negative above this frequency, confirming the limit of the adopted model. Figures 6 and 7 show the results obtained for the real part of the dielectric constant and the loss tangent with zero DC bias applied to the capacitors.<br />

Figure 6. Real part of dielectric constant as a<br />

function of frequency.<br />

Figure 7. Loss tangent as a function of frequency.<br />

Under this condition, start<strong>in</strong>g from a value of about 43 at<br />

40 MHz, the dielectric constant decreases slightly with<br />

frequency down to a value of about 36 at 10 GHz. The<br />

loss tangent is always lower than 0.07. At higher<br />

frequencies the loss tangent is around zero.


3.2 F<strong>in</strong>ite element analysis<br />

The numerical analysis was limited to the electric field with low levels of current flowing in the plane; hence, inductive effects are negligible. The inward AC current density at the circular patch electrode was varied until the evaluated capacitance of the structure was equal to that obtained by the VNA for the 50 µm diameter patches (about 10 pF). When these capacitances are matched, the magnitude of the lateral parasitic impedance of Cp in parallel with Rp can be estimated. The results are shown in Fig. 8, where the magnitude of the impedance Cp//Rp is plotted for frequencies from 1 GHz to 20 GHz.<br />
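As a rough plausibility check of the 10 pF figure, a parallel-plate estimate can be made for the 50 µm patch. This assumes an ideal parallel-plate capacitor, the 70 nm film thickness quoted in Section 2.1 and a dielectric constant of about 40, a round value within the range reported in Section 3.1:

```latex
C_d = \frac{\varepsilon_0\,\varepsilon'\,A_d}{t}
    = \frac{(8.854\times10^{-12}\ \mathrm{F/m})\,(40)\,\pi\,(25\times10^{-6}\ \mathrm{m})^2}{70\times10^{-9}\ \mathrm{m}}
    \approx 9.9\ \mathrm{pF}
```

The estimate is consistent with the capacitance of about 10 pF quoted above.<br />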

Figure 8. Magnitude of the impedance C p//R P<br />

plotted versus frequency.<br />

Above 5 GHz the lateral parasitic impedance is less than<br />

10 MΩ <strong>and</strong> reaches about 3 MΩ at 20 GHz.<br />

A similar calculation shows that the resistance Rs does not vary with frequency. The computed value of Rs is 0.25 Ω, comparable to the value obtained for Rt by experimental measurements (≈0.6 Ω).<br />

The f<strong>in</strong>ite element analysis confirms that the top-ground<br />

capacitor acts as a short-circuit.<br />


4. CONCLUSION<br />

Results of the microwave characterization show that an SBT thin film produced using the MOD technique has a low loss tangent in the 40 MHz to 10 GHz range. The dielectric constant is shown to be around 40, lower than the expected value of 70 measured in previous experiments at low frequencies.<br />

The measurement technique adopted needs samples with<br />

simple top metal pattern<strong>in</strong>g that can be made with a one<br />

step lithography process.<br />

An improved model that takes a wider range of parasitics<br />

<strong>in</strong>to account is proposed. The model is based on a f<strong>in</strong>ite<br />

element analysis of the structure. This analysis has shown<br />

that the approximations made for the computations of<br />

dielectric constant <strong>and</strong> loss tangent versus frequency are<br />

good <strong>and</strong> suggests tak<strong>in</strong>g <strong>in</strong>to account the lateral parasitic<br />

element presented here.<br />


5. REFERENCES<br />

[1] B.E. Watts, F. Leccabue, S. Guerri, M. Severi, M.<br />

Fanciulli, S. Ferrari, G. Tallarida, C. Mor<strong>and</strong>i, “A<br />

comparison of Ti/Pt <strong>and</strong> TiN/Pt electrodes used with<br />

ferroelectric SrBi 2Ta 2O 9 films”, Th<strong>in</strong> Solid Films,<br />

vol. 406, pp. 23-29, 2002.<br />

[2] K. Amanuma, T. Hase, Y. Miyasaka, “Preparation<br />

<strong>and</strong> ferroelectric properties of SrBi 2Ta 2O 9 th<strong>in</strong><br />

films”, Applied Physics Letters, vol. 66, pp. 221-223,<br />

1995.<br />

[3] Y. Wu, M.J. Forbess, S. Seraji, S.J. Limmer, T.P.<br />

Chou, G. Cao, “Impedance study of SrBi 2Ta 2O 9 <strong>and</strong><br />

SrBi 2(Ta 0.9V 0.1)O 9 ferroelectrics”, Material Science<br />

<strong>and</strong> Eng<strong>in</strong>eer<strong>in</strong>g, vol. B86, pp. 70-78, 2001.<br />

[4] K. Kotani, M. Misra, I. Kawayama, M. Tonouchi,<br />

“Time- doma<strong>in</strong> Terahertz Spectroscopy of Strontium<br />

Bismuth Tantalate Th<strong>in</strong> Films”, Materials <strong>Research</strong>.<br />

Society Symposium Proceed<strong>in</strong>gs, vol. 784, 2004.<br />

[5] P. C. Joshi, S. O. Ryu, X. Zhang, <strong>and</strong> S. B. Desu,<br />

“Properties of SrBi 2Ta 2O 9 ferroelectric th<strong>in</strong> films<br />

prepared by a modified metalorganic solution<br />

deposition technique”, Applied Physics Letters, vol.<br />

70, pp. 1080-1082, 1997.<br />

[6] Z. Ma, A.J. Becker, P. Polakos, H. Hugg<strong>in</strong>s, J.<br />

Pastalan, H. Wu, K. Watts, Y.H. Wong, P.<br />

Mankiewich, “RF Measurement Technique for<br />

Characteriz<strong>in</strong>g Th<strong>in</strong> Dielectric Films”, IEEE<br />

Transactions on Electron Devices, vol. 45, no. 8,<br />

August 1998.<br />

[7] D.C. Dube, J. Baborowski, P. Muralt, N. Setter, “The<br />

effect of bottom electrode on the performance of th<strong>in</strong><br />

film based capacitors <strong>in</strong> the gigahertz region”,<br />

Applied Physics Letters, vol. 74, pp. 3546-3548,<br />

1999.<br />

[8] T.G. Kim, J. Oh, Y. Kim, T. Moon, K. Sun Hong, B.<br />

Park, “Crystall<strong>in</strong>ity Dependence of Microwave<br />

Dielectric Properties <strong>in</strong> (Ba,Sr)TiO 3 Th<strong>in</strong> Films”,<br />

Japan. Journal of Applied Physics, vol. 42, pp. 1315-<br />

1319, 2003.<br />

[9] L.F. Chen, C.K. Ong, C.P. Neo, V.V. Varadan, V.K.<br />

Varadan, “Microwave <strong>Electronics</strong>: Measurements<br />

<strong>and</strong> Materials Characterization”, John Wiley &<br />

Sons, pp. 383-412, 2004.<br />

[10] M.J. Lancaster, J. Powell, A. Porch, “Th<strong>in</strong>-film<br />

ferroelectric microwave devices”, Supercond. Sci.<br />

Technol., vol. 11, pp. 1323-1334, 1998.<br />

[11] Y. Kim, J. Oh, T.G. Kim, B. Park, “Influence of the<br />

Microstructures on the Dielectric Properties of<br />

ZrTiO 4 Th<strong>in</strong> Films at Microwave-Frequency Range”,<br />

Japan. Journal of Applied Physics, vol. 40, pp. 4599-<br />

4603, 2001.



Design <strong>and</strong> Fabrication of a New System For<br />

Vibration Energy Harvest<strong>in</strong>g<br />

Ghisla<strong>in</strong> Despesse 1 , Thomas Jager 1 , Jean-Jacques Chaillout 1 , Jean-Michel Léger 1<br />

Sk<strong>and</strong>ar Basrour 2<br />

1 CEA/DRT – LETI/DCIS, 17 rue des Martyrs, 38054 Grenoble Cedex 09, France<br />

2 TIMA, 46 avenue Félix Viallet, 38031 Grenoble Cedex, France<br />

E-mail: despesse@cea.fr<br />

ABSTRACT<br />

Global simulations, designs and characterisations of a broadband vibration energy scavenging system are reported here. High damping electrostatic conversion structures have been investigated: analytical models have been built in Mathematica and compared with ANSYS simulations. Scavenging electronics have also been developed: taking into account ohmic, capacitive and inductive losses, the global conversion efficiency has been calculated. Finally, a macro and a micro resonant structure tuned to 50 Hz have been realized. A scavenged power of 1 mW is already available on the macro structure under a vibration amplitude of 90 µm at 50 Hz. The corresponding global conversion efficiency is about 60 %, which is in good agreement with mechanical and electrical simulations.<br />

1. INTRODUCTION<br />

Advances in low power electronics and microsystems design open up the possibility of powering small wireless sensor nodes through energy scavenging techniques. Depending on the system environment, many different types of energy sources can be used: thermal, radiative, mechanical or chemical. Among these sources we had to choose one whose conversion system is suited to size reduction. We finally decided to focus on surrounding mechanical vibrations. As shown in Figure 1 and Figure 2 and by other recent studies [1], the frequencies of surrounding mechanical vibrations are mainly and widely distributed below 100 Hz.<br />

Figure 1. Acceleration spectrum on a car (acceleration A(f) in m.s⁻² versus frequency in Hz)<br />

Figure 2. Acceleration spectrum on a metallic stair (acceleration A(f) in m.s⁻² versus frequency in Hz)<br />

To convert most of these vibrations into electrical power, we chose to investigate conversion structures based on electrostatic transduction with high electrical damping. Electrostatic conversion offers several advantages: it is easy to integrate and its power density increases with size reduction. Moreover, high electrical damping is easily achievable with this transduction principle. Thus, contrary to most existing systems [1-3], our structures will be able to recover power over a large spectrum below 100 Hz.<br />

2. CONVERSION STRUCTURES<br />

2.1 Choice of the conversion system<br />

As a first step, the mechanical behavior of the conversion structure has been approximated through a linear viscous damping model [4]. As a function of the input acceleration A, its angular frequency ω, the moving mass m, the resonant angular frequency ωn of the structure and the mechanical and electrical damping ratios ζm and ζe, the maximum mechanical scavenged power P is given by:<br />

\[ P = \frac{m \zeta_e A^2 \left( \frac{\omega}{\omega_n} \right)^3}{\omega \left[ \left( 2 (\zeta_e + \zeta_m) \frac{\omega}{\omega_n} \right)^2 + \left( 1 - \left( \frac{\omega}{\omega_n} \right)^2 \right)^2 \right]} \qquad (1) \]<br />

We can easily see that the scavenged power is directly proportional to the moving mass and to the square of the input acceleration. Operation with a high electrical damping has been chosen to maximize the scavenged power for a wide range of applications (i.e. over a large frequency band), the mechanical damping being neglected compared to the electrical one.<br />

For all types of existing electrostatic microstructures (out-of-plane gap closing, in-plane overlap or in-plane gap closing [2]) and their operating cycles (voltage-constrained or charge-constrained cycle [3]), the electrical damping is due to the electrostatic force Fe appearing between two mechanical parts (the moving mass and fixed electrodes) set at different potentials. For a high damping configuration, the electrostatic force has to counterbalance almost entirely the mechanical spring force Fm=kz, which is proportional to the relative displacement z and to the mechanical structure stiffness k. In agreement with the electrostatic force characteristics presented in Table 1 as a function of the conversion structure and its operating cycle, the most convenient configuration seems to be an in-plane gap closing structure with a charge-constrained cycle. For this configuration, the electrostatic force Fe is also proportional to the relative displacement and can be expressed as Fe=kez, where ke is defined as the electrical stiffness.<br />

Structure | Q constrained | V constrained<br />

Out-of-plane | Fe constant | Fe ~ 1/z<br />

In-plane overlap | Fe ~ 1/z² | Fe constant<br />

In-plane gap closing | Fe ~ z | Fe ~ 1/z²<br />

Table 1. Electrostatic force variation for different system configurations<br />

A high electrical damping is finally achieved by choosing an electrical stiffness close to the mechanical one.<br />
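The effect of the electrical damping on bandwidth can be explored with a small numerical sketch. It assumes that the scavenged power takes the usual linear viscous-damping form for a base-excited mass-spring-damper driven by an acceleration of amplitude A; the function name, the acceleration amplitude and the two damping ratios are hypothetical values chosen for illustration, not data from the paper.

```python
import math


def scavenged_power(mass, accel, omega, omega_n, zeta_e, zeta_m):
    """Power (W) absorbed by the electrical damper of a base-excited
    mass-spring-damper with linear viscous damping (assumed model)."""
    r = omega / omega_n  # frequency ratio
    denom = (2.0 * (zeta_e + zeta_m) * r) ** 2 + (1.0 - r ** 2) ** 2
    return mass * zeta_e * accel ** 2 * r ** 3 / (omega * denom)


if __name__ == "__main__":
    f_n = 50.0                       # resonant frequency used in the text (Hz)
    omega_n = 2.0 * math.pi * f_n
    mass = 1e-3                      # 1 g moving mass
    accel = 0.1                      # hypothetical acceleration amplitude (m/s^2)
    for zeta_e in (0.01, 0.5):       # hypothetical low and high electrical damping
        for f in (30.0, 50.0, 70.0):
            p = scavenged_power(mass, accel, 2.0 * math.pi * f,
                                omega_n, zeta_e, 0.0)
            print(f"zeta_e={zeta_e:4.2f}  f={f:4.0f} Hz  P={p * 1e6:8.3f} uW")
```

Under these assumptions, a low damping ratio gives a sharper peak at resonance, while a high damping ratio gives more power away from resonance, which is the trade-off motivating the high-damping choice.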

2.2 Modeling and simulations<br />

Assuming an in-plane gap closing conversion structure operating in a charge-constrained cycle, analytical models have been developed in Mathematica to determine the mechanical and electrical parameters that maximize the scavenged power for a wide range of applications. Based on earlier experimental vibration measurements, the recoverable power and the vibration amplitude have been estimated as a function of the resonant frequency for high electrical damping configurations (i.e. ke/k close to 1). These calculations are summarized in Figure 3 for ke/k = 0.67.<br />

Figure 3. Recoverable power (a) and vibration amplitude (b) for a 1 g moving mass as a function of the resonant frequency for ke/k=0.67 (curves: car driving, car engine, drill, metallic stairs; recoverable power P in µW and displacement in µm versus resonant frequency fr in Hz)<br />

Combining the displacement and power (2 µW) constraints, a 50 Hz resonant frequency has finally been chosen for our structures.<br />

As shown by Equation (1), the scavenged power is proportional to the moving mass: to scavenge significant power, it has thus been decided to increase the moving mass of our future silicon microstructure by attaching an additional tungsten mass to the moving part. To validate the scavenging electronics principles early, it has also been decided to realize an intermediate macro prototype in bulk tungsten alloy.<br />

Complementary FEA (Finite Element Analysis) simulations with ANSYS® have also been performed to validate our mechanical designs.<br />

Finally, the scavenged mechanical power was estimated for both prototypes through global simulations, whose results are reported in Table 2 along with the main characteristics of the prototypes.<br />

Characteristics | Bulk tungsten macrostructure | Silicon microstructure<br />

Size | 18 cm² x 1 cm | 81 mm² x 0.4 mm<br />

Moving mass | 104 g | 2 g<br />

Cmin/Cmax | 900 / 3590 pF | 14 / 147 pF<br />

Resonant frequency | 50 Hz | 50 Hz<br />

Maximum displacement | 116 µm | 95 µm<br />

Maximum scavenged power | 6 mW | 70 µW<br />

Table 2. Characteristics and calculated performances of the prototypes<br />

2.3 Realizations<br />

The macroscopic conversion structure has been realized in bulk tungsten by electrical discharge machining (EDM). This technique is a good compromise to test and validate the electronics principle. The structure is presented in Figure 4.<br />

Figure 4. Bulk tungsten prototype<br />

The second prototype is a silicon one whose fabrication process is based on Deep Reactive Ion Etching (DRIE) techniques, as presented in Figure 5:<br />

Figure 5. Flowchart for the silicon prototype<br />

Flowchart description:<br />

• Step 1: Etching of the back face silicon cavity<br />

• Step 2: Wafer bonding and Aluminium metallization<br />

• Step 3: DRIE Etching and structure release<br />

Many advantages can be achieved with size reduction: a higher capacitance range and a higher quality factor with a smaller system size. The final structure is presented in Figure 6.<br />

Figure 6. Silicon prototype<br />

The measured resonant frequency of the tungsten prototype is slightly lower than 50 Hz and the effective maximum vibration amplitude of its moving mass is limited to 90 µm. This is due to limitations inherent to EDM combined with inaccuracies in the final assembly of the moving and fixed elements.<br />

Instead of the 6 mW of scavenged power initially expected with a 116 µm maximum displacement at 50 Hz, the maximum expected power at 50 Hz is then reduced to 1.76 mW.<br />

3. CONVERSION AND<br />

MANAGEMENT ELECTRONICS<br />

As the variable capacitance structure is driven by mechanical vibrations, it oscillates between a maximum capacitance Cmax and a minimum capacitance Cmin. The maximum capacitance is variable and depends on the displacement amplitude, while Cmin is fixed by the initial equilibrium position of the structure. For a charge-constrained operating cycle, a given charge is injected at an initial voltage Vmin on the structure when the maximum capacitance is reached. An electrostatic force appears and absorbs part of the mechanical movement until the structure reaches its equilibrium position. The injected charge is then recovered at a higher voltage Vmax. The maximum recoverable energy per cycle is given by:<br />

\[ E = \frac{1}{2} V_{min} V_{max} \left( C_{max} - C_{min} \right) \qquad (2) \]<br />
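As a minimal sketch of Equation (2), the function below evaluates the energy gained over one charge-constrained cycle. It assumes an ideal cycle in which the charge Q = Cmax·Vmin is conserved between injection and recovery, so that Vmax = Vmin·Cmax/Cmin. The capacitance values are those listed for the tungsten macrostructure in Table 2; the 5 V injection voltage and the function name are hypothetical.

```python
def charge_constrained_energy(v_min, c_min, c_max):
    """Return (v_max, energy) for one ideal charge-constrained cycle."""
    charge = c_max * v_min           # charge injected at maximum capacitance
    v_max = charge / c_min           # voltage once the capacitance has dropped
    energy = 0.5 * v_min * v_max * (c_max - c_min)
    return v_max, energy


if __name__ == "__main__":
    # Cmin/Cmax of the tungsten prototype; 5 V is an illustrative assumption.
    v_max, energy = charge_constrained_energy(5.0, 900e-12, 3590e-12)
    print(f"Vmax = {v_max:.1f} V, energy per cycle = {energy * 1e9:.1f} nJ")
```

The same energy is obtained from the difference in stored energy at constant charge, Q²/2 · (1/Cmin − 1/Cmax), which is a quick consistency check on the formula.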

To observe the influence of the effective movement amplitude on the capacitance and voltage variations, the response to a sinusoidal movement of decreasing amplitude has been simulated for an arbitrary structure size. Results are presented in Figure 7.<br />

Figure 7. Charge-constrained cycle operation (panels versus time t in s: relative displacement z(t) in µm, variable capacitance value in pF, variable capacitance charge in nC, variable capacitance voltage in V)<br />

As the vibration amplitude decreases, the difference between the charge and discharge energies decreases, as does the converted energy.<br />

Efficient detection of the capacitance maximum and minimum is needed for the system to work properly. A driving circuit between the storage unit and the variable capacitance is also required for the charge and the discharge.<br />

3.1 Detection of capacitance extrema<br />

The capacitance extrema can be detected by injecting a known high-frequency current into the variable capacitance: the capacitance can then be measured by filtering and/or by synchronous detection. Unfortunately, the power consumption of circuits based on this operating principle is too high. Instead of measuring the capacitance value, we therefore focused on measuring the derivative of the capacitance voltage. However, this solution works only if there is always a residual charge on the variable capacitance: instead of entirely discharging the capacitance at every cycle, the discharge is deliberately left incomplete.<br />

With this residual charge on the variable capacitance, a first extremum of the capacitance voltage is reached when the capacitance is maximum. An additional effective charge is then loaded onto the capacitance. When the capacitance becomes minimal, the voltage is maximum and reaches a second extremum: the capacitance is then almost entirely discharged and a new conversion cycle starts.<br />
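The detection principle described above can be illustrated offline on sampled data. The sketch below looks for sign changes of the discrete voltage derivative; it is only a software analogy of the idea, since the paper's detection stage is a low-power circuit whose implementation is not described here. With a fixed residual charge, V = Q/C, so a voltage minimum marks the capacitance maximum (time to charge) and a voltage maximum marks the capacitance minimum (time to discharge). The function name and the test waveform are hypothetical.

```python
import math


def voltage_extrema(samples):
    """Return (index, kind) for each local extremum of a sampled voltage,
    found from sign changes of the discrete derivative."""
    events = []
    prev_slope = 0.0
    for i in range(1, len(samples)):
        slope = samples[i] - samples[i - 1]
        if prev_slope > 0 and slope < 0:
            events.append((i - 1, "max"))   # capacitance minimum: discharge
        elif prev_slope < 0 and slope > 0:
            events.append((i - 1, "min"))   # capacitance maximum: charge
        if slope != 0:
            prev_slope = slope
    return events


if __name__ == "__main__":
    # Hypothetical capacitance waveform (pF) carrying a fixed residual charge.
    cap = [100.0 + 50.0 * math.sin(2.0 * math.pi * i / 100) for i in range(100)]
    residual_charge = 1000.0                # arbitrary units
    volt = [residual_charge / c for c in cap]
    print(voltage_extrema(volt))
```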

3.2 Power management<br />

Controlled by the detection stage, the charge and the discharge of the variable capacitance are achieved through a flyback structure presented in Figure 8:<br />

Figure 8. Flyback power structure (labels: Lp, Ls, E, C, Kp, Ks, Vc)<br />


This structure has many advantages. The first is that the storage unit side can be easily integrated (voltage lower than 5 V). Only one transistor (Ks) is still subjected to high voltage, so the leakage current losses and the parasitic capacitance losses are limited.<br />

The second is that the number of turns on each side of the magnetic circuit can be adjusted to obtain the same switch-on time for both the Kp and Ks transistors.<br />

The third advantage is that the transistor commands are referenced to the same ground potential.<br />

To charge the variable capacitance, the transistor Kp is switched on long enough for the magnetic circuit to be loaded with the requested energy. This energy is then transferred into the variable capacitance when Kp is switched off and Ks is switched on: Ks is kept on until the magnetic circuit is entirely discharged into the variable capacitance.<br />

To discharge the variable capacitance, Ks is first switched on to transfer the capacitance energy into the magnetic circuit. Then Kp is switched on and Ks off, and the energy is transferred from the magnetic circuit into the storage unit.<br />

Apart from the charge and the discharge phases, Kp and Ks are kept open.<br />
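The charging step can be made concrete with a rough estimate of how long Kp must stay on. The sketch below assumes an ideal, lossless primary inductance fed from a constant storage voltage with zero initial current, so that the current ramps as i(t) = V·t/L and the stored energy is L·i²/2. The inductance, voltage and energy values are hypothetical and are not taken from the paper.

```python
import math


def primary_on_time(energy, inductance, supply_voltage):
    """On-time (s) of Kp needed to store `energy` (J) in an ideal primary
    inductance (H) fed from a constant voltage (V), starting at zero current."""
    return math.sqrt(2.0 * inductance * energy) / supply_voltage


def peak_current(energy, inductance):
    """Primary current (A) reached when `energy` (J) is stored."""
    return math.sqrt(2.0 * energy / inductance)


if __name__ == "__main__":
    energy = 100e-9        # hypothetical energy to inject per cycle (J)
    inductance = 1e-3      # hypothetical primary inductance (H)
    supply = 3.0           # hypothetical storage voltage, below 5 V
    t_on = primary_on_time(energy, inductance, supply)
    print(f"t_on = {t_on * 1e6:.2f} us, "
          f"peak current = {peak_current(energy, inductance) * 1e3:.2f} mA")
```

With values of this order the on-time falls in the microsecond range, far shorter than a mechanical period at 50 Hz.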

Experimental measurements of these charge and discharge cycles on the bulk tungsten macrostructure are reported in Figure 9. These measurements are in good agreement with simulations.<br />

Figure 9. Charge and discharge cycle for a 30 Hz sinusoidal mechanical excitation (simulated and measured capacitance voltage Vc in V versus time, in ms for the full cycle and in µs for the charge and discharge details)<br />

Experimental charge and discharge times have been measured in the µs range: as expected, they are negligible compared with the mechanical cycle period (in the ms range).<br />

3.3 Power balance and conversion efficiency<br />

All parasitic losses associated with the magnetic circuit and the transistors have been taken into account to estimate the power balance of the global system, presented in Figure 10 for the tungsten prototype.<br />

These parasitic elements have been measured and then included in the model to simulate the loss distribution. Once these elements are taken into account, simulations are in good agreement with measurements. The power balance is almost proportional to the mechanical frequency and is strongly linked to the relative displacement amplitude. For the tungsten prototype, the power balance becomes positive for vibration amplitudes higher than 60 µm.<br />

As presented in Figure 10 for a vibration amplitude of 90 µm at 50 Hz, the global scavenged power is about 1052 µW with an absorbed mechanical power of about 1760 µW. The global efficiency is thus close to 60 %.<br />

Figure 10. Power balance for the tungsten prototype (50 Hz; vibration amplitude of 90 µm). Values shown: mechanical energy 1.8 mW; self-maintenance power 1.9 mW; charge losses -170 µW; transduction losses -78 µW; discharge losses -460 µW; scavenged power 1 mW<br />
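The figures quoted for the tungsten prototype can be cross-checked with a few lines of arithmetic. The sketch below uses only the values stated in the text and in Figure 10 (absorbed mechanical power and the three loss terms); the variable names are illustrative.

```python
# Values from the text and Figure 10, in microwatts.
absorbed_uW = 1760.0
losses_uW = {"charge": 170.0, "transduction": 78.0, "discharge": 460.0}

# Power left after subtracting every loss term from the absorbed power.
scavenged_uW = absorbed_uW - sum(losses_uW.values())
efficiency = scavenged_uW / absorbed_uW

print(f"scavenged power = {scavenged_uW:.0f} uW")   # 1052 uW
print(f"global efficiency = {efficiency:.1%}")      # about 60 %
```

Subtracting the three loss terms from the absorbed mechanical power reproduces the 1052 µW quoted above, and the ratio gives the efficiency of roughly 60 %.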

With size reduction, we can expect a higher conversion efficiency due to better mechanical accuracy and a higher quality factor in the silicon prototype.<br />

4. CONCLUSION<br />

Global simulations have been performed on a high-damping electrostatic system for vibration energy scavenging. A macrostructure in bulk tungsten and a silicon microstructure have been designed along with the associated management electronics. In agreement with simulations, 1 mW of scavenged power is already delivered by the tungsten prototype with a global conversion efficiency of 60 % when subjected to a sinusoidal vibration amplitude of 90 µm at 50 Hz. In situ measurements have also been performed and up to 250 µW has been scavenged on a car engine. First conversion results with the silicon microstructure are expected soon and should be a few tens of µW.<br />

5. REFERENCES<br />

[1] S. Roundy, P. K. Wright, and J. Rabaey, "A study of low level vibrations as a power source for wireless sensor nodes", Computer Communications, vol. 26, 1131-1144, 2003.<br />

[2] S. Roundy, P. K. Wright and K. S. J. Pister, "Micro-Electrostatic Vibration-to-Electricity Converters", Proceedings of IMECE2002, 1-10, 2002.<br />

[3] S. Meninger, "A Low Power Controller for a MEMS Based Energy Converter", Master of Science at the Massachusetts Institute of Technology, 1999.<br />

[4] C. B. Williams and R. B. Yates, "Analysis of a micro-electric generator for microsystems", Proceedings of the Transducers 95/Eurosensors IX, 369-372, 1995.


0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

NONLINEAR VIBRATIONS OF A MEMS<br />

TRANSLATIONAL DEVICE<br />

Elisabetta Leo, Francesco Braghin, Ferruccio Resta.<br />

ABSTRACT<br />

When a proof mass is significantly displaced, the nonlinear hardening characteristic of the supporting beams becomes visible. Thus, the resonance peak of the structure is no longer vertical but bends towards higher frequencies. This property is useful to easily synchronise sense and drive resonances, thus increasing the sensitivity of the MEMS device. Using a test structure designed to explore the high deformation range of the supporting beams, its nonlinear vibrations were investigated both experimentally and numerically.<br />

1. INTRODUCTION<br />

The typical structure of a tuning-fork MEMS gyroscope consists of a proof mass supported by polysilicon beams. The proof mass is set into vibration along the drive direction through the comb-drive capacitor. In the presence of an angular speed (ωz) perpendicular to the drive direction, the Coriolis acceleration produces a force along the out-of-plane direction (sense direction) having the same frequency as the driving force and proportional both to the speed of the proof mass along the drive axis and to the angular rate.<br />

To increase the sensitivity of the gyroscope (i.e. to obtain higher displacement along the sense direction), the eigenfrequencies along the drive and sense directions should be equal to the driving frequency [1] and a low damping ratio (low pressure level) should be aimed for. However, low damping means narrow resonance peaks. Small variations in the eigenfrequencies (e.g. due to small errors in the production process) would therefore lead to a very small sensitivity. Complex control logic thus has to be implemented. Otherwise, to achieve high sensitivity even in the presence of production errors, it is necessary to relax the constraint that the two resonance peaks have the same frequency.<br />

The solution proposed in this paper relies on the nonlinear characteristics of the device. A test structure (Figure 1) to achieve high displacements and thus to experimentally study the nonlinear behaviour of MEMS was designed and produced with the support of STMicroelectronics. It consists of a proof mass suspended by four beams and driven by twenty comb-drives along the x-axis (drive direction).<br />

Polytechnic of Milan, Mechanical Department,<br />

Via La Masa 34 Milan, Italy<br />

E-mail: elisabetta.leo@polimi.it<br />

Figure 1. Layout of the test structure to assess the nonlinear behaviour of supporting beams (labels: sense direction along the y-axis, drive direction along the x-axis)<br />

2. NUMERICAL MODEL<br />

Since only the first eigenfrequency of the system is of interest, a lumped parameter model of the test structure has been developed. Moreover, since the test structure was designed to assess the nonlinear behaviour of MEMS devices and is not a fully working MEMS gyroscope, only the equation of motion along the drive direction is of interest:<br />

\[ m \ddot{x} + c \dot{x} + k(x) = f_{drive} \sin(\omega_{drive} t) \]<br />

where ‘x’ is the displacement of the test structure along the drive direction, ‘m’ is the proof mass, ‘k’ and ‘c’ are respectively the stiffness and the damping coefficients along the drive axis and the right-hand term of the equation is the actuation force.<br />

In order to correctly determine the nonlinear stiffness “k(x)” of the supporting beams, a FEA model has been used. Due to the symmetry of the test structure, the proof mass is constrained to translate along the x-direction. Thus, the boundary conditions of the FEA model of a single supporting beam are a clamped (encastre) constraint at one beam end and a sliding-block constraint at the other end. Two different FEA models were used to study the influence of element types: one “beam model” and one “shell model”. In the “beam model” 260 beam elements were used, while in the “shell model” the supporting beam is discretized with 1800 shell elements. For both models, the beam cross section is assumed to be rectangular and constant over the beam length. The simulated characteristic curve (force vs. displacement) of a single supporting beam is shown in Figure 2. As expected, the two models give almost the same results. Moreover, a hardening behaviour is clearly visible. To speed up the dynamic simulation necessary to assess the nonlinear behaviour of the test structure, an analytical description of this characteristic curve would be very useful. Thus, the simulated characteristic curve of a single supporting beam was approximated by a cubic relation:<br />

\[ F = k(x) \cong K_1 x + K_3 x^3 \]<br />

F being the force applied at the free end of the beam, x being the displacement at the same end and K1 and K3 the linear and cubic stiffness constants respectively. The curve approximation is carried out using a nonlinear curve-fitting algorithm. Figure 3 shows the results of the approximation procedure: the fitted characteristic curve and the simulated one almost overlap.<br />
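A cubic characteristic of this kind can be fitted with very little code. The sketch below is not the authors' nonlinear curve-fitting routine: because F = K1·x + K3·x³ is linear in K1 and K3, it swaps in ordinary least squares solved through the 2x2 normal equations. Displacements are rescaled before fitting to keep the system well conditioned at micrometre scale. The function name and the synthetic force-displacement data are hypothetical stand-ins for the FEA output.

```python
def fit_cubic_stiffness(xs, forces):
    """Least-squares fit of F = K1*x + K3*x**3; returns (K1, K3)."""
    scale = max(abs(x) for x in xs)          # normalise displacements
    u = [x / scale for x in xs]
    s2 = sum(v ** 2 for v in u)
    s4 = sum(v ** 4 for v in u)
    s6 = sum(v ** 6 for v in u)
    b1 = sum(v * f for v, f in zip(u, forces))
    b3 = sum(v ** 3 * f for v, f in zip(u, forces))
    det = s2 * s6 - s4 * s4
    a1 = (b1 * s6 - b3 * s4) / det           # coefficient of u
    a3 = (s2 * b3 - s4 * b1) / det           # coefficient of u**3
    return a1 / scale, a3 / scale ** 3


if __name__ == "__main__":
    # Synthetic data standing in for the FEA curve (hypothetical constants).
    true_k1, true_k3 = 2.0, 1.0e10           # N/m and N/m^3
    xs = [i * 1e-6 for i in range(-10, 11)]  # displacements in metres
    forces = [true_k1 * x + true_k3 * x ** 3 for x in xs]
    print(fit_cubic_stiffness(xs, forces))
```

A positive fitted K3 corresponds to the hardening behaviour described above.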

Figure 2. Hardening characteristic curve of a single supporting beam obtained through FEM analysis<br />

Figure 3. Comparison between simulated and approximated hardening characteristic curve.<br />

Using this cubic approximation of the supporting beams’ characteristic, the differential equation that governs the motion along the drive axis becomes:<br />

\[ m \ddot{x} + c \dot{x} + N_b K_1 x + N_b K_3 x^3 = f_{drive} \sin(\omega_{drive} t) \]<br />

where ‘Nb’ is the number of beams that support the proof mass. While the parameter ‘m’ is easy to determine, the value of the damping coefficient depends on the working pressure of the device. Ambient pressure is considered here.<br />

The following simplifying assumptions were made:<br />

• the stiffness effects due to the compressibility of the fluid can be neglected with respect to the stiffness introduced by the supporting beams;<br />

• the structural damping of polysilicon can be neglected with respect to the viscous damping due to the air surrounding the structure (being an order of magnitude smaller).<br />

To model the viscous damping, Couette flow and squeeze film damping [2] are used:<br />

\[ c = c_{flow} + c_{squeeze} \]<br />

Approximating air as a Newtonian fluid, the flow and squeeze film damping coefficients are equal to:<br />

\[ c_{flow} = \mu \frac{A}{y_0} \qquad c_{squeeze} = \mu \frac{2 N \cdot l \cdot h}{y_{comb}} \]<br />

µ being the viscosity of air at ambient temperature and pressure, A being the area of the overlapped plates, y0 being the distance between the parallel surfaces, N being the number of comb fingers, l being the length of these fingers, h being the thickness of the structure (and therefore of the fingers) and ycomb being the distance between two fingers. As already mentioned, the test structure is set into vibration through a series of comb drives to achieve high actuation forces and therefore high displacements. To avoid a driving force with multiple frequencies, the pulsating components of the voltages applied to the two stators of the comb drive have opposite signs. Thus, the driving force is equal to<br />

\[ F(t) = 2N \left( \varepsilon \frac{h}{y_{comb}} \right) V_{DC} V_{AC} \sin(\omega_{drive} t) = f_{drive} \sin(\omega_{drive} t) \]<br />

where ε is the dielectric constant of air, VDC and VAC are the amplitudes of the applied constant and pulsating voltages respectively and ωdrive is the drive frequency.<br />

The equation of motion along drive direction can therefore<br />

be rewritten as:<br />

$$m \ddot{x} + c \dot{x} + k_1 x + k_3 x^3 = 2N \left( \varepsilon \frac{h}{y_{comb}} \right) V_{DC} V_{AC} \sin(\omega_{drive} t)$$<br />

This non-linear, constant-parameter, second-order differential equation is known as the Duffing equation. Although it looks simple, it has no closed-form analytical solution. It can therefore be solved in two ways: with a semi-analytical approach or with a numerical approach. The semi-analytical approach is based on the Galerkin-Urabe method [3] and determines only regime solutions (both stable and unstable) in a very fast and efficient way; the numerical approach (e.g. the Runge-Kutta method) instead determines both the transient response and the stable regime solution, but requires a much higher computational effort.<br />
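The numerical route can be illustrated with a short time-stepping sketch of the Duffing equation. This is only an illustration under stated assumptions: it uses the classical fixed-step fourth-order Runge-Kutta scheme rather than the RK 2/3 method used in the paper, and all function names and the parameter tuple layout are hypothetical.

```python
import math


def duffing_rhs(t, x, v, m, c, k1, k3, f_drive, w_drive):
    """Return (dx/dt, dv/dt) for m*x'' + c*x' + k1*x + k3*x^3 = f*sin(w*t)."""
    force = f_drive * math.sin(w_drive * t)
    return v, (force - c * v - k1 * x - k3 * x ** 3) / m


def rk4_step(t, x, v, dt, params):
    """Advance the state (x, v) by one classical Runge-Kutta step of size dt."""
    dx1, dv1 = duffing_rhs(t, x, v, *params)
    dx2, dv2 = duffing_rhs(t + dt / 2, x + dt / 2 * dx1, v + dt / 2 * dv1, *params)
    dx3, dv3 = duffing_rhs(t + dt / 2, x + dt / 2 * dx2, v + dt / 2 * dv2, *params)
    dx4, dv4 = duffing_rhs(t + dt, x + dt * dx3, v + dt * dv3, *params)
    x_new = x + dt / 6 * (dx1 + 2 * dx2 + 2 * dx3 + dx4)
    v_new = v + dt / 6 * (dv1 + 2 * dv2 + 2 * dv3 + dv4)
    return x_new, v_new


def simulate(params, dt, n_steps, x0=0.0, v0=0.0):
    """Integrate from t = 0; params = (m, c, k1, k3, f_drive, w_drive).

    Returns the list of displacements at t = 0, dt, 2*dt, ...
    """
    x, v = x0, v0
    xs = [x]
    for i in range(n_steps):
        x, v = rk4_step(i * dt, x, v, dt, params)
        xs.append(x)
    return xs
```

To estimate a regime amplitude with such a scheme, the integration has to run long enough for the transient to decay at every drive frequency, which is where the higher computational effort comes from.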

2.1 Semi-analytical method


The semi-analytical method used to solve the non-linear equation of motion of the test structure along the drive direction is the Galerkin-Urabe method. The basic idea behind this method is that a system forced by a sinusoidal actuation force, although non-linear, will have a periodic regime response. Thus, the regime response can be written as a Fourier series:<br />

$$x(t) = a_1 \cos(\omega_1 t) + b_1 \sin(\omega_1 t) + a_2 \cos(\omega_2 t) + b_2 \sin(\omega_2 t) + \dots$$<br />

where the frequencies $\omega_i$ are multiples (super-harmonics) or fractions (sub-harmonics) of the frequency $\omega_{drive}$ of the actuation force. Due to the cubic non-linearity introduced by the supporting beams, it can be shown that the system's response contains only odd harmonics. Thus the system's response x(t) is equal to:<br />

$$x(t) = \dots + a_{1/3} \cos\left(\frac{\omega_{drive}}{3} t\right) + b_{1/3} \sin\left(\frac{\omega_{drive}}{3} t\right) + a_1 \cos(\omega_{drive} t) + b_1 \sin(\omega_{drive} t) + a_3 \cos(3\omega_{drive} t) + b_3 \sin(3\omega_{drive} t) + \dots$$<br />
$$\cong \sum_{i=1}^{p} \left( a_{1/(2i+1)} \cos\left(\frac{\omega_{drive}}{2i+1} t\right) + b_{1/(2i+1)} \sin\left(\frac{\omega_{drive}}{2i+1} t\right) \right) + \sum_{j=0}^{m} \left( a_{(2j+1)} \cos\left((2j+1)\,\omega_{drive} t\right) + b_{(2j+1)} \sin\left((2j+1)\,\omega_{drive} t\right) \right)$$<br />

‘p’ being the number of sub-harmonics and ‘m’ being the number of super-harmonics considered. By increasing ‘p’ and/or ‘m’ the approximate system response tends to the exact one. Substituting into Duffing's equation, a system of 2(p+m+1) non-linear algebraic equations is obtained. Solving this system of non-linear equations through, for example, the Newton-Raphson method, the unknown coefficients $a_i$ and $b_i$ are determined. Figure 4 shows the system's regime response in the frequency domain obtained with 0 sub-harmonics (‘p=0’) and 5 super-harmonics (‘m=5’). This response is determined by applying a driving force whose frequency changes in discrete steps of 50Hz from 1kHz to 6kHz and recording the system's response at the same frequency ($a_1$ and $b_1$ coefficients).<br />
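The procedure can be sketched for the simplest case, in which only the fundamental harmonic is kept (p = 0, m = 0). This truncation is an assumption made here for brevity; the paper uses m = 5. Substituting x(t) = a cos(ωt) + b sin(ωt) into the Duffing equation and keeping only the terms at frequency ω gives two algebraic equations, which the hypothetical helper below solves with Newton-Raphson.

```python
import math


def residual(a, b, w, m, c, k1, k3, f):
    """Cosine and sine balance equations for x = a*cos(w*t) + b*sin(w*t)."""
    stiff = k1 - m * w * w + 0.75 * k3 * (a * a + b * b)
    r_cos = stiff * a + c * w * b
    r_sin = stiff * b - c * w * a - f
    return r_cos, r_sin


def jacobian(a, b, w, m, c, k1, k3):
    """Analytic Jacobian of the two balance equations with respect to (a, b)."""
    stiff = k1 - m * w * w + 0.75 * k3 * (a * a + b * b)
    j11 = stiff + 1.5 * k3 * a * a
    j12 = 1.5 * k3 * a * b + c * w
    j21 = 1.5 * k3 * a * b - c * w
    j22 = stiff + 1.5 * k3 * b * b
    return j11, j12, j21, j22


def solve_first_harmonic(w, m, c, k1, k3, f, a0=0.0, b0=0.0,
                         tol=1e-12, max_iter=50):
    """Newton-Raphson iteration for the coefficients (a, b)."""
    a, b = a0, b0
    for _ in range(max_iter):
        r1, r2 = residual(a, b, w, m, c, k1, k3, f)
        j11, j12, j21, j22 = jacobian(a, b, w, m, c, k1, k3)
        det = j11 * j22 - j12 * j21
        # Solve J * [da, db] = -[r1, r2] with the explicit 2x2 inverse.
        da = (-r1 * j22 + r2 * j12) / det
        db = (r1 * j21 - r2 * j11) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) <= tol * (1.0 + abs(a) + abs(b)):
            break
    return a, b
```

In a frequency sweep, the solution found at one frequency is a natural initial guess (a0, b0) for the next one; different initial guesses can converge to different branches, including unstable ones.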

Figure 4. Non-linear frequency response function of the test structure obtained with the semi-analytical method (0 sub-harmonics and 5 super-harmonics).<br />


2.2 Numerical method<br />

It’s important to consider that the <strong>in</strong>tegration step as well<br />

as the order of the <strong>in</strong>tegration method <strong>in</strong>troduce<br />

perturbations <strong>in</strong> the function to be <strong>in</strong>tegrated. Increas<strong>in</strong>g<br />

the order or decreas<strong>in</strong>g the step reduces the perturbation.<br />

The first consequence is that, as already po<strong>in</strong>ted out, the<br />

numerical <strong>in</strong>tegration methods cannot determ<strong>in</strong>e unstable<br />

branches of the system’s response. Furthermore, when<br />

stable <strong>and</strong> unstable branches of the system’s response get<br />

close to each other, the jump phenomenon may occur<br />

earlier or later as a function of the size of the <strong>in</strong>tegration<br />

step or of the order of the integration method. To improve accuracy, either higher-order integration methods or very small integration steps are required. Thus, a high computational time is usually necessary to obtain the system's regime response with good approximation. Several different integration methods were tested (variable-step integration methods, variable-order integration methods, non-stiff and stiff integration methods) in order to assess the best compromise between computational time and accuracy.<br />
The system's regime responses shown in Figure 5 were obtained by using the same integration method (RK 2/3) but different integration steps (10µs and 5µs). It can be noticed that the identified jump-down frequency is higher with the smaller step size. However, as expected, the required computational time is about three times higher.<br />

Figure 5. Nonlinear frequency response function of the test structure obtained with the numerical method (Runge-Kutta 2nd-3rd order method). Vertical axis: [m], ×10⁻⁶; horizontal axis: [Hz]; curves: RK23c step 10e-6 and RK23c step 5e-6.<br />

3. COMPARISON BETWEEN<br />

NUMERICAL AND EXPERIMENTAL<br />

RESULTS<br />

To validate the numerical model, the test structure was fabricated and its velocity along the drive direction was measured with a laser Doppler vibrometer. To obtain the displacement amplitude, the velocity amplitude of the proof mass is divided by the (angular) actuation frequency.<br />
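The conversion from measured velocity to displacement can be written out as a short worked relation. It assumes a purely harmonic regime response at the actuation frequency; the symbols X, V and f are introduced here only for illustration.

```latex
x(t) = X \sin(\omega_{drive} t)
\;\Rightarrow\;
\dot{x}(t) = X\,\omega_{drive} \cos(\omega_{drive} t)
\;\Rightarrow\;
X = \frac{V}{\omega_{drive}} = \frac{V}{2\pi f},
\qquad V = \max_t |\dot{x}(t)|
```

If the actuation frequency is given in Hz, the factor 2π must therefore be included in the division.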

Figure 6 <strong>and</strong> Figure 7 show the system’s regime response<br />

of the test structure <strong>in</strong> terms of magnitude of the frequency<br />

response function. Stars <strong>in</strong>dicate the simulated system’s<br />

response achieved through the semi-analytical method<br />

while squares <strong>and</strong> circles show the measured displacement<br />

values at <strong>in</strong>creas<strong>in</strong>g <strong>and</strong> decreas<strong>in</strong>g actuation frequency<br />

respectively. The difference between the two figures is the<br />

actuation force: Figure 6 refers to an actuation force of<br />

1.35µN while Figure 7 refers to a force of 2.7µN.


Both figures show that the model is able to correctly predict the nonlinear system's response: the jump-up and jump-down frequencies are correctly estimated, as well as the increase/decrease of the system's vibration amplitude with actuation frequency.<br />

Figure 6. Regime response of the test structure:<br />

numerical results (star marker) <strong>and</strong> experimental<br />

results (square <strong>and</strong> circle marker). Force: 1.35µN.<br />

Figure 7. Regime response of the test structure:<br />

numerical results (star marker) <strong>and</strong> experimental<br />

results (square <strong>and</strong> circle marker). Force: 2.7µN.<br />


4. CONCLUSION<br />

The nonl<strong>in</strong>earity <strong>in</strong>troduced by the support<strong>in</strong>g beams of a<br />

test structure has been <strong>in</strong>vestigated both numerically<br />

(us<strong>in</strong>g two different methodologies, a semi-analytical <strong>and</strong><br />

a numerical approach) <strong>and</strong> experimentally. Even though a<br />

very simple lumped parameter model was used, a good<br />

agreement between experimental <strong>and</strong> numerical results<br />

was obtained. Thus, a powerful tool to predict even the nonlinear structural dynamic behaviour of translational gyroscopes has been set up, which will allow better and more reliable devices to be designed.<br />

5. REFERENCES<br />

[1] Sitaraman I., Yong Z., Tamal M., Analytical Modell<strong>in</strong>g<br />

of Cross-Axis Coupl<strong>in</strong>g <strong>in</strong> Micromechanical Spr<strong>in</strong>gs,<br />

Carnegie Mellon University, Pittsburgh<br />

[2] Hosaka H., Itao K., Kuroda S., Damping Characteristics of Beam-Shaped Micro-Oscillators, Sensors and Actuators A, Vol. 49, 1995, pp. 87-95<br />
[3] Miralles J.P., Jiménez Olivo P.J., Peirò D.G., A Fast Galerkin Method to Obtain the Periodic Solutions of a Nonlinear Oscillator, Applied Mathematics and Computation, Vol. 86, 1997, pp. 261-282



DESIGN AND SIMULATION OF RF MEMS<br />

SWITCHES FOR HIGH SWITCHING SPEED AND<br />

MODERATE VOLTAGE OPERATION<br />

Shimul Ch<strong>and</strong>ra Saha, Tajeshwar S<strong>in</strong>gh, Trond Sæther<br />

Department of <strong>Electronics</strong> <strong>and</strong> Telecommunications, Norwegian University of Science <strong>and</strong><br />

Technology (NTNU), 7491 Trondheim, Norway<br />

E-mail: shimul.saha@iet.ntnu.no<br />

ABSTRACT<br />

In this paper we present the design <strong>and</strong> simulations<br />

of an RF MEMS switch with regard to its switch<strong>in</strong>g<br />

speed <strong>and</strong> pull-down voltage. The switch is required<br />

to fulfill the requirements of high speed <strong>and</strong> a<br />

moderate pull down voltage for the switch<strong>in</strong>g of<br />

CMUT (Capacitive Micromach<strong>in</strong>ed Ultrasonic<br />

Transducer) elements used for ultrasound imaging.<br />

1. INTRODUCTION<br />

The objective of this paper is to present the design of<br />

a high speed RF MEMS switch. MEMS are<br />

nowadays becom<strong>in</strong>g <strong>in</strong>creas<strong>in</strong>gly popular <strong>in</strong> RF<br />

applications due to their attractive advantages such<br />

as high isolation <strong>and</strong> low <strong>in</strong>sertion loss. Depend<strong>in</strong>g<br />

on the switch type (DC contact or capacitive), the<br />

MEMS switches can show very good RF<br />

performance from DC to several tens of GHz.<br />

However, RF MEMS switches also have some disadvantages compared to their solid-state counterparts, which include high pull-down voltage and slow switching speed. In this paper it is shown<br />

that by a careful design, a compact switch with a<br />

moderate applied voltage can be optimized for the<br />

switch<strong>in</strong>g speed (


process. Prelim<strong>in</strong>ary simulations for switch<strong>in</strong>g time<br />

& pull-down voltage were done with Mathematica ® .<br />

The pull-down voltage Vp <strong>and</strong> switch<strong>in</strong>g time ts of<br />

the bridge are given by [1][3]:<br />

$$V_p = \sqrt{\frac{8 \cdot k \cdot g_0^3}{27 \cdot \varepsilon \cdot W \cdot w}} \qquad (1)$$<br />
$$t_s = \sqrt{\frac{27}{2}} \, \frac{V_p}{\omega_0 \cdot V_s} \qquad (2)$$<br />
Figure 1: The capacitive bridge<br />

The actuation voltage Vs is selected slightly higher<br />

than the pull-down voltage calculated <strong>in</strong> (1), to<br />

<strong>in</strong>crease the switch<strong>in</strong>g speed. The proposed switch is<br />

shown <strong>in</strong> Fig. 1.<br />
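Equations (1) and (2) can be evaluated with a few lines of code. The sketch below is an illustration under assumptions: the product W·w is passed as a single electrode overlap area, ε is taken as the vacuum permittivity, and the resonant frequency of 1e6 rad/s used in the example comment is a hypothetical value, not one stated in the text.

```python
import math

EPS0 = 8.854e-12  # permittivity of free space in F/m (assumed for epsilon)


def pull_down_voltage(k, g0, overlap_area, eps=EPS0):
    """Equation (1): Vp = sqrt(8*k*g0^3 / (27*eps*W*w)), with W*w = overlap_area."""
    return math.sqrt(8.0 * k * g0 ** 3 / (27.0 * eps * overlap_area))


def switching_time(v_p, v_s, omega0):
    """Equation (2): ts = sqrt(27/2) * Vp / (omega0 * Vs)."""
    return math.sqrt(27.0 / 2.0) * v_p / (omega0 * v_s)


# Example: Vp = 12 V and Vs = 15 V as in the text, with an assumed
# omega0 = 1e6 rad/s:
# ts = switching_time(12.0, 15.0, 1.0e6)   # about 2.9e-6 s
```

Since the switching time scales with Vp/Vs, raising the actuation voltage above the pull-down voltage shortens the switching time, which is the design choice described above.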

The dimensions of the beam are 250µm × 40µm × 1.5µm, the bottom electrode is 200µm × 40µm, and the initial gap is g0 = 1µm. From the simulations in Mathematica®, a pull-down voltage of 12V is obtained. Hence, with an actuation voltage of 15V, which is of the order of the present state of the art (10-15V), a pull-down time of ts ≈ 2.9µs is obtained, which is around 10 times lower than reported results [2]. Final device simulations were done in Coventorware®. The switching times and actuation voltages are shown in Table 1. For comparison, a gold beam was also simulated; the switching time is more than 2.5 times shorter for Al than for Au at the same actuation voltage. The switching time (transient) and pull-down voltage simulations are shown in Figure 2 and Figure 3 respectively.<br />

Table 1: The switching time for the switch with Q≈1<br />
Material | Pull-down voltage (V) | Actuation voltage (V) | Switching time (µs)<br />
Aluminum | 11 | 12 | 4.90<br />
Aluminum | 11 | 15 | 2.95<br />
Aluminum | 11 | 20 | 1.85<br />
Aluminum | 11 | 25 | 1.40<br />
Gold | 12 | 15 | 7.55<br />


Figure 2: Switch<strong>in</strong>g time simulations for Al beam with<br />

10µs pulse delay <strong>and</strong> Q≈1<br />

Figure 3: Pull down voltages for Al <strong>and</strong> Au beams.<br />

3.1 The damp<strong>in</strong>g <strong>and</strong> the Quality factor<br />

The damp<strong>in</strong>g (dom<strong>in</strong>antly squeeze film damp<strong>in</strong>g)<br />

plays an important role <strong>in</strong> the switch<strong>in</strong>g time of<br />

MEMS switches (Equation 3). With<strong>in</strong> small gap<br />

heights, as <strong>in</strong> MEMS devices, the gaseous medium<br />

acts like a viscous liquid. Dur<strong>in</strong>g switch actuation<br />

when the beam moves down, the gaseous medium<br />

present with<strong>in</strong> the gap must be pushed out <strong>and</strong> the<br />

molecules undergo several collisions between the<br />

beam <strong>and</strong> electrode [3]. The viscosity of the medium<br />

<strong>and</strong> <strong>in</strong> turn the quality factor can be controlled by<br />

reduc<strong>in</strong>g the pressure of the switch medium through<br />

packag<strong>in</strong>g. The equations (4-7) for effective<br />

viscosity are shown below [3].<br />

$$m \frac{d^2 z}{dt^2} + b \frac{dz}{dt} + kz = F_{total} \qquad (3)$$<br />
$$\lambda_a = \frac{p_0}{p_a} \lambda_0 \qquad (4)$$<br />
$$K_n = \frac{\lambda_a}{g} \qquad (5)$$<br />
$$\mu_e = \frac{\mu}{1 + 9.638 \, K_n^{1.159}} \qquad (6)$$<br />
where $\lambda_a$ is the mean free path at pressure $p_a$ ($\lambda_0$ being the mean free path at the reference pressure $p_0$), g is the gap height and $K_n$ is the Knudsen number. $K_n$ is a measure of the viscosity of the gas under the MEMS beam. $\mu_e$ is the effective viscosity, a function of the<br />

Knudsen number. A high Knudsen number means<br />

low effective viscosity <strong>and</strong> the gas experiences few<br />

collisions <strong>and</strong> the flow is not viscous anymore. So<br />

by reduc<strong>in</strong>g the pressure of the medium (vacuum)<br />

the viscosity can be reduced <strong>and</strong> the switch<strong>in</strong>g speed<br />

<strong>in</strong>creased.<br />
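The pressure dependence in Equations (4) to (6) can be sketched numerically. The reference values below (atmospheric pressure of 101325 Pa and a mean free path of 0.07 µm at that pressure) are assumed typical values for air and are not given in the text; the function names are hypothetical.

```python
P0 = 101325.0      # reference (atmospheric) pressure in Pa, assumed
LAMBDA0 = 0.07e-6  # mean free path of air at P0 in m, assumed typical value


def mean_free_path(p_a, p0=P0, lambda0=LAMBDA0):
    """Equation (4): lambda_a = (p0 / p_a) * lambda0."""
    return p0 / p_a * lambda0


def knudsen_number(p_a, gap, p0=P0, lambda0=LAMBDA0):
    """Equation (5): Kn = lambda_a / g."""
    return mean_free_path(p_a, p0, lambda0) / gap


def effective_viscosity(mu, kn):
    """Equation (6): mu_e = mu / (1 + 9.638 * Kn^1.159)."""
    return mu / (1.0 + 9.638 * kn ** 1.159)


# Example: for a 1 um gap, compare atmospheric pressure with 1/30 of it.
# kn_atm = knudsen_number(P0, 1e-6)
# kn_low = knudsen_number(P0 / 30.0, 1e-6)
# ratio = effective_viscosity(1.0, kn_low) / effective_viscosity(1.0, kn_atm)
```

Because the Knudsen number is inversely proportional to both pressure and gap height, a small gap already lowers the effective viscosity at atmospheric pressure, and reducing the pressure lowers it further.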

The effective damp<strong>in</strong>g coefficient can also be<br />

reduced by hav<strong>in</strong>g perforations <strong>in</strong> the mov<strong>in</strong>g plate.<br />

The equations for damp<strong>in</strong>g without, <strong>and</strong> with<br />

perforations on the beam are given <strong>in</strong> Equation (7)<br />

<strong>and</strong> Equation (8) respectively [3, 4].<br />

$$b = \frac{3}{2\pi} \frac{\mu A^2}{g_0^3} \qquad (7)$$<br />
$$b_h = \frac{12}{N\pi} \frac{\mu A^2}{g_0^3} \left( \frac{p}{2} - \frac{p^2}{8} - \frac{\ln(p)}{4} - \frac{3}{8} \right) \qquad (8)$$<br />

Here, N is the number of holes; p is the fraction of<br />

open area <strong>in</strong> the plate. From Equation (8) it can be<br />

seen that by hav<strong>in</strong>g perforations <strong>in</strong> the membrane,<br />

the damp<strong>in</strong>g coefficient can be reduced significantly.<br />

This option is widely used because, even with perforations, the up-state capacitance is not affected, since the fringing fields ‘fill’ the empty space.<br />
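The reduction predicted by Equations (7) and (8) can be checked with a small sketch. The function names are hypothetical, and the numbers in the example comment (viscosity, plate area, open-area fraction) are illustrative assumptions; only the 80 holes and the 1 µm gap come from the text.

```python
import math


def damping_solid(mu, area, g0):
    """Equation (7): b = 3/(2*pi) * mu * A^2 / g0^3 (plate without holes)."""
    return 3.0 / (2.0 * math.pi) * mu * area ** 2 / g0 ** 3


def damping_perforated(mu, area, g0, n_holes, p):
    """Equation (8): damping of a plate with N holes and open-area fraction p."""
    shape = p / 2.0 - p ** 2 / 8.0 - math.log(p) / 4.0 - 3.0 / 8.0
    return 12.0 / (n_holes * math.pi) * mu * area ** 2 / g0 ** 3 * shape


# Example with assumed values: mu = 1.8e-5 Pa*s, A = 250 um x 40 um,
# g0 = 1 um, 80 holes, open-area fraction 0.1.
# b = damping_solid(1.8e-5, 1e-8, 1e-6)
# b_h = damping_perforated(1.8e-5, 1e-8, 1e-6, 80, 0.1)
```

The bracketed term in Equation (8) vanishes as p approaches 1, and the prefactor falls with the number of holes, so many small holes are more effective than a few large ones for the same open area.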

The beam was simulated with different values of the quality factor ‘Q’ (Equation 9), which depends on damping, i.e. mainly gas damping and modal damping ‘D’ (D=a⋅M+b⋅K, where ‘M’ is the mass of the beam and ‘K’ is the beam spring constant; ‘a’ and ‘b’ depend on the damping medium and the beam structure). Q was varied from 0.05 to 10 for different actuation voltages.<br />

$$Q = \frac{k}{\omega_0 b} \qquad (9)$$<br />
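As a brief worked relation, the quality factor can be rewritten using the standard lumped-model resonance ω0 = √(k/m). The damping ratio ζ is introduced here only for illustration and does not appear in the paper.

```latex
Q = \frac{k}{\omega_0 b},
\qquad \omega_0 = \sqrt{\frac{k}{m}}
\;\Rightarrow\;
Q = \frac{\sqrt{k\,m}}{b} = \frac{m\,\omega_0}{b},
\qquad \zeta = \frac{b}{2\sqrt{k\,m}} = \frac{1}{2Q}
```

Under this assumption, Q ≈ 0.05 corresponds to a heavily overdamped beam (ζ ≈ 10), while Q ≈ 1 corresponds to ζ ≈ 0.5.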

Thus, Q can be easily controlled by adjusting the damping coefficient, which in turn determines the switching speed. As the gap height is low, the obtained Q is very low (≈0.05) under normal conditions (atmospheric pressure, no perforations). Figure 9 shows the 3-D view of a beam with 80 holes (4×4 µm²) for which Q≈1 was easily obtained at normal pressure. The same value of Q can be obtained without perforations by reducing the pressure to ≈1/30th of atmospheric pressure. To increase Q further, both options (perforations and low pressure) can be combined; Q≈5 was thus obtained with perforations at 1/7th of atmospheric pressure. Figures 4 and 5 show the switching time for actuation voltages of 15V and 25V respectively, for different Q. It can thus be seen that at very low Q (high viscosity), the switching time is very large. Note from Figure 4 that very high Q also leads to a higher settling (release) time.<br />

Figure 4: The switch<strong>in</strong>g <strong>and</strong> release time for V a =15 V for<br />

different values of Q.<br />

Figure 5: The switch<strong>in</strong>g time for actuation voltage 25 V<br />

for different Q.<br />

At higher Q the switching (pull-down) time improves, but increasing Q beyond 1 or 2 does not reduce the switching time much relative to the reduction in medium viscosity that it requires. So Q≈1-2 may be a fair choice for high-speed switching. It can also be seen that at a substantially higher Q≈10 and an actuation voltage of 25V, the switching time is 1.25µs, which is very fast compared to the present technology [2, 3].<br />

3.2 The residual stress consideration<br />

One of the main concerns in fixed-fixed beams is the residual stress. It can change the pull-down voltage and in extreme cases render the structure useless. The residual stress usually arises from the process conditions and can be minimised by controlling the process parameters. The effect of residual stress on the switching time can also be seen. When residual stress increases, it adds to the spring constant, thus the resonant frequency increases (ω0 = √(k/m)), which reduces the switching time. Figure 6 shows the comparative study of resonant frequency with residual stress for our beam. As seen, when the residual stress increases, the resonant frequency increases, which reduces the switching time. However, the higher spring constant also raises the pull-down voltage. Thus care must be taken that the residual stress does not exceed the limits, because if the pull-down voltage exceeds the desired actuation voltage then the beam cannot be switched.<br />

Figure 6: The resonant frequency vs. residual stress. Vertical axis: resonant frequency (6,00E+05 to 1,10E+06); horizontal axis: residual stress (0,00E+00 to 3,50E+07).<br />

3.3 Different beam lengths<br />

The effect of varying the beam length was investigated. For longer beams, the spring constant k decreases, so the resonant frequency and the actuation voltage are both reduced. This has a mixed effect on the switching time. Figure 7 shows the switching times for different beam lengths.<br />

Figure 7: The switch<strong>in</strong>g time for actuation voltage 20 V<br />

for different length <strong>and</strong> Q≈1.<br />

We choose an actuation voltage of 20V because the pull-down voltage for the 200µm switch is more than 15V. From Figure 7 it can be seen that from 200µm to 250µm there is a considerable improvement in the switching time, but from 250µm to 280µm the switching time does not improve much. So, to maintain compactness, it is not worth increasing the beam length further.<br />

3.4 The S-parameters<br />

The simulated S-parameters are shown in Figure 8. The switch transmits the signal (ON state) while in the up-state, and the signal is capacitively grounded (OFF state) when the beam is in the down-state. The insertion loss is less than 1 dB in the up-state position, and the isolation in the down-state position is more than 15 dB over the typical operating frequency range of a capacitive switch (>10 GHz) and more than 25 dB at higher frequencies. The isolation can be improved by using high dielectric constant materials as shown below.<br />

Figure 8: The isolation <strong>and</strong> the <strong>in</strong>sertion loss <strong>in</strong> dB.<br />

4. CONCLUSIONS<br />

We have shown the design of an RF MEMS switch<br />

with a high switch<strong>in</strong>g speed at moderate actuation<br />

voltage. Switch<strong>in</strong>g speed <strong>in</strong>creases with actuation<br />

voltage but we have a limitation on the maximum<br />

actuation voltage. Damp<strong>in</strong>g affects the switch<strong>in</strong>g<br />

speed <strong>and</strong> can be controlled by perforations on the<br />

beam <strong>and</strong> by regulat<strong>in</strong>g the medium <strong>and</strong> pressure.<br />

The detailed effect of Q and damping was shown in Section 3.1. The switches will be fabricated at SINTEF MiNaLab, Oslo.<br />

Figure 9: A 3-D view of the beam with perforation.<br />

5. ACKNOWLEDGEMENT<br />

The authors are grateful to the Norwegian Research Council for sponsoring this work through the SMiDA project (No. 159559/130).<br />

6. REFERENCES<br />

[1] J.B. Mauldav<strong>in</strong>, Nonl<strong>in</strong>ear Electro-Mechanical<br />

Model<strong>in</strong>g of MEMS Switches, IEEE MTT-S Digest,<br />

2001.<br />

[2] D.Peroulis, Electromechanical Considerations <strong>in</strong><br />

Develop<strong>in</strong>g Low-Voltage RF MEMS Switches. IEEE<br />

Transactions on MTT, Vol.51, No. 1 January 2003.<br />

[3] G.M. Rebeiz, RF MEMS Theory, Design <strong>and</strong><br />

Technology, 1 st ed. New York: Wiley, 2003, Ch 3,5.<br />

[4] J. Bergqvist, A Silicon Condenser Microphone with a<br />

highly perforated backplate, In <strong>in</strong>ternational<br />

conference on Solid-State Sensor <strong>and</strong> Actuators<br />

Digest, New York, 1991, pp 266-269.



MATERIAL CHARACTERISATION AND<br />

PROCESS DEVELOPMENT FOR MINIATURISED<br />

WIRELESS SENSOR NETWORK MODULE<br />

Bivragh Majeed, Jean-Baptiste Van S<strong>in</strong>te Jans, Indrajit Paul, John Barton<br />

Sean C O’Mathuna, Kieran Delaney 1<br />

Tyndall National Institute, Lee Malt<strong>in</strong>gs, Prospect Row, Cork, Irel<strong>and</strong><br />

Cork Institute of Technology, Cork, Irel<strong>and</strong><br />

E-mail: bmajeed@tyndall.ie<br />

ABSTRACT<br />

The paper describes the work done in materials characterisation and process development of a proof-of-concept test vehicle for miniaturised wireless sensor network nodes. The test vehicle consists of thin silicon on a flexible substrate packaged in a novel 3-D structure. Two flexible substrates (commercial and in-house polyimide substrates), in the thickness range 25 microns down to 3 microns (each with 4 microns of sputtered copper), were analysed. It was observed that as the thickness of the polyimide substrate decreased below 9 microns, wrinkling became a major issue. The wrinkling was attributed to the copper sputtering. There was no adverse effect on test chip properties due to the reduction in chip thickness, while mechanically the chip was able to flex more with decreasing thickness and thus accommodate higher stresses. A finite element analysis model was constructed to observe the stress in the silicon die under different loading conditions. The model matched well with the experimental values and showed that gold stud bumps cause stress in the chip. Test chips were packaged to develop a highly miniaturised 3D module by folding the flexible substrate into an S-shaped structure. The 3D module consists of four test chips, each having a thickness of 50 microns, resulting in a packaged test device 450 microns in thickness and with a footprint of 18×7 mm². This module is currently being investigated for its implementation in a wireless sensor network system.<br />

1. INTRODUCTION<br />

As the dem<strong>and</strong> for low cost, highly m<strong>in</strong>iaturised<br />

electronics <strong>in</strong>creases, 3D packag<strong>in</strong>g has emerged as a high<br />

potential solution. Many <strong>in</strong>novative <strong>and</strong> excit<strong>in</strong>g 3D<br />

technologies have been recently reported <strong>in</strong> literature.<br />

These <strong>in</strong>clude techniques for wafer level stack<strong>in</strong>g [1],<br />

stacked chip scale package (S-CSP) [2], the neo-stack [3],<br />

<strong>and</strong> multi-chip stacked package [4]. Each method provides<br />

clear benefits <strong>in</strong> <strong>in</strong>creas<strong>in</strong>g the on-board systems density<br />

in numerous applications. However, from the perspective of wireless sensor networks, perhaps the most interesting technique is one that can use either bare die or packages and employs flexible substrates. The technique involves bonding the die and making the interconnection using flip-chip, and then folding the substrate to obtain a viable 3-D package. This technique will be used for the development of novel 5mm cube micro-sensor modules for use as nodes in future wireless network applications. This node presents a fundamental research target in the development of exploitable distributed scalable wireless networks. The node is developed as a phased proof-of-concept test vehicle to ensure effective investigation of assembly issues and characterisation of the behaviour of different materials.<br />

2. DEVELOPMENT AND<br />

CHARACTERISATION<br />

2.1 Development of th<strong>in</strong> flexible substrate<br />

The need for the development of th<strong>in</strong> flexible substrates<br />

arises because, <strong>in</strong> folded stack assembly, as the thickness<br />

of chip decreases below 100 microns, substrate thickness<br />

has an adverse effect on silicon volume efficiency [5].<br />

Thus, the dem<strong>and</strong> for volume effective wireless,<br />

autonomous nodes dictates that <strong>in</strong> order to fully capitalise<br />

on the benefit of th<strong>in</strong> silicon, the substrate itself needs to<br />

be made as th<strong>in</strong> as possible.<br />
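The argument above can be illustrated with a minimal sketch. It assumes a deliberately simplified stack model in which each layer is one die plus one substrate thickness; adhesive, bumps and fold radius are ignored, and the function name `silicon_fraction` is hypothetical. The chip and flex thicknesses are values mentioned in this paper.

```python
def silicon_fraction(chip_um, flex_um, layers=4):
    """Fraction of the stack height occupied by silicon.

    Simplified model (assumption): each layer is one die plus one flex
    thickness; adhesive, bumps and the fold radius are ignored.
    """
    silicon = layers * chip_um
    total = layers * (chip_um + flex_um)
    return silicon / total


if __name__ == "__main__":
    # Chip thicknesses and flex thicknesses quoted in the paper (microns).
    for chip in (525, 250, 100, 50):
        for flex in (25, 16, 3):
            share = silicon_fraction(chip, flex)
            print(f"chip {chip:>3} um, flex {flex:>2} um -> silicon {share:.0%}")
```

In this simplified model the substrate share grows quickly once the die is thinner than about 100 microns, which is the motivation for thinning the flex as well.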

The method developed to form a th<strong>in</strong> flexible substrate<br />

started with sp<strong>in</strong>n<strong>in</strong>g of precursor polyimide on a st<strong>and</strong>ard<br />

four-<strong>in</strong>ch carrier wafer. Once applied to the wafer, the<br />

precursor is thermally converted <strong>in</strong>to an <strong>in</strong>tractable<br />

polyimide film. In this way, a number of different<br />

thickness polyimides, rang<strong>in</strong>g from 16 microns down to 3<br />

microns, were prepared <strong>in</strong> order to characterise the effect<br />

of thickness on different properties of the film. This was<br />

followed by development of a test circuit, conta<strong>in</strong><strong>in</strong>g a<br />

pattern required to <strong>in</strong>terconnect a four-layer test vehicle<br />

assembly. The test pattern was generated by deposit<strong>in</strong>g a<br />

conductive seed layer on polyimide, sputter<strong>in</strong>g a copper<br />

layer 3 microns thick <strong>and</strong> then metal etch<strong>in</strong>g.<br />

The next step <strong>in</strong> the process is the technique to release the<br />

substrate from the carrier wafer. A number of different<br />

methods were <strong>in</strong>vestigated <strong>in</strong>clud<strong>in</strong>g mechanical, chemical<br />

<strong>and</strong> laser ablation. In mechanical release the polyimide<br />

was physically stripped from the wafer. This method was<br />

effective without the adhesion promoter on samples with<br />

no metal pattern<strong>in</strong>g; samples with the adhesion promoter<br />

could not be removed at all. Samples with metal patterns<br />

on them could not be successfully released either, as the<br />

polyimide was torn dur<strong>in</strong>g the peel<strong>in</strong>g process; this is<br />

believed to have happened because the copper metal acted


to create stress concentration points. In chemical release,<br />

some of the chemicals attacked the polyimide, and thus<br />

these methods were discarded. In the laser ablation technique,<br />

a short-wavelength, high-energy UV laser beam is focused<br />

onto the backside of the wafer. The schematic of the<br />

process is shown <strong>in</strong> Figure 1.<br />

Figure 1: Schematic of flex release technique us<strong>in</strong>g<br />

UV laser<br />

The effectiveness of the process depends on three factors<br />

namely: laser wavelength, carrier wafer <strong>and</strong> the polymer.<br />

A 193nm ArF excimer laser was used <strong>in</strong> the experiments<br />

with quartz wafer as carrier wafer. Quartz is used, as it is<br />

transparent to the laser, while polyimide is highly<br />

absorbing at this wavelength. The mechanism of ablation<br />

depends on the laser power density (Pd) [6]. In<br />

general, at low Pd (as in our case) the photochemical<br />

ablation mechanism dominates, and at high Pd the<br />

photothermal mechanism dominates. In the photoablation<br />

process, a sub-micron layer at the polyimide-quartz<br />

interface is etched away to release the flexible substrate<br />

from the carrier substrate [7]. The important characteristic<br />

of this process is the absence of thermal damage to the material<br />

adjacent to the ablated region. For the current experiment the<br />

laser release process gave the best possible results with the<br />

m<strong>in</strong>imum damage to the polyimide flexible substrate.<br />

2.2 Characterisation of th<strong>in</strong> flexible<br />

substrate<br />

Five types of characterisation <strong>in</strong>clud<strong>in</strong>g electrical,<br />

chemical, mechanical, moisture absorption <strong>and</strong> stress were<br />

performed on released substrates of vary<strong>in</strong>g thickness. The<br />

results were analysed <strong>and</strong> compared with commercial<br />

polyimide. The results showed that electrical, chemical,<br />

mechanical <strong>and</strong> moisture absorption properties are<br />

acceptable for all of the spun off polyimide <strong>in</strong>clud<strong>in</strong>g the<br />

th<strong>in</strong>nest ones. One of the problems that was observed<br />

dur<strong>in</strong>g the process<strong>in</strong>g of th<strong>in</strong> flex was wr<strong>in</strong>kl<strong>in</strong>g of<br />

th<strong>in</strong>nest flex once it was released from the carrier wafer.<br />

Figure 2 shows an example of wr<strong>in</strong>kled <strong>and</strong> wr<strong>in</strong>kle free<br />

substrates. There are two ma<strong>in</strong> reasons for wr<strong>in</strong>kl<strong>in</strong>g;<br />

firstly it can be caused by <strong>in</strong>ternal stress generated <strong>in</strong><br />

polyimide dur<strong>in</strong>g the cur<strong>in</strong>g process <strong>and</strong> secondly residual<br />

stress <strong>in</strong> copper. The stresses <strong>in</strong> the cured polyimide<br />

before release were calculated <strong>and</strong> it was concluded that<br />

the difference <strong>in</strong> stress level between the thickest <strong>and</strong> the<br />

thinnest polyimide was not large enough to explain the<br />

large observed difference in wrinkling patterns. Thus the<br />

residual stress in copper must drive the formation of<br />

wrinkles in areas surrounding the Cu due to non-uniform<br />

pattern<strong>in</strong>g of the circuit. The thicker substrates are too stiff<br />

to wr<strong>in</strong>kle but the driv<strong>in</strong>g force can overcome the small<br />

0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />


resistance offered by stiffness <strong>in</strong> the th<strong>in</strong>ner substrate. The<br />

critical thickness below which such an effect will be<br />

present was estimated to be 10 microns. To overcome this<br />

issue a polymer r<strong>in</strong>g formation around the circuit was<br />

<strong>in</strong>vestigated <strong>and</strong> it resulted <strong>in</strong> wr<strong>in</strong>kle free substrates.<br />

Figure 2: Laser delam<strong>in</strong>ated flexible test circuit<br />

substrate a) 3 microns b) 3 microns with polymer r<strong>in</strong>g.<br />

2.3 Th<strong>in</strong> silicon<br />

As discussed <strong>in</strong> the <strong>in</strong>troduction section of the paper, one<br />

of the major requirements, to meet the <strong>in</strong>creas<strong>in</strong>g dem<strong>and</strong><br />

for higher functionality <strong>in</strong> lower profile, is to th<strong>in</strong> the<br />

silicon die itself. This is achieved by thinning the non-functional<br />

part of the silicon die. Thinning silicon not only<br />

reduces the package profile but also offers the added<br />

advantages of lower thermal resistance and reduced device<br />

stress, thus increasing device reliability [8, 9]. Also, when<br />

silicon is thinned to below 50 microns it becomes flexible;<br />

combined with a flexible substrate, this can be a major<br />

advantage in the areas of wearable computing, embedded<br />

artefacts and wireless sensor nodes.<br />

There are many approaches <strong>in</strong>clud<strong>in</strong>g mechanical gr<strong>in</strong>d<strong>in</strong>g<br />

[10], chemical mechanical polish<strong>in</strong>g [11] <strong>and</strong> dry <strong>and</strong><br />

plasma etching [12], that can be used to achieve a silicon<br />

thickness of less than 100 µm. Each of these methods carries<br />

its own advantages and disadvantages; the<br />

final choice of method is very much<br />

dependent on the final device application and the silicon<br />

thickness required. As the thickness of the silicon is<br />

reduced below about 100µm a portion of the silicon can be<br />

damaged by some of the th<strong>in</strong>n<strong>in</strong>g methods. This damage<br />

can be separated <strong>in</strong>to two parts; surface damage <strong>in</strong>clud<strong>in</strong>g<br />

micro-cracks <strong>and</strong> crystal damage consist<strong>in</strong>g of<br />

dislocations. Both of these damage layers contribute to<br />

wafer warpage <strong>and</strong> a reduction of the fracture strength of<br />

the wafer [13]. The damage produced can propagate<br />

through the sample caus<strong>in</strong>g it to crack <strong>and</strong> it can also<br />

affect the electrical operation of the devices on the sample<br />

surface. For this reason it is imperative that this damage<br />

layer be reduced <strong>and</strong> full electrical, mechanical <strong>and</strong> visual<br />

characterisation should be done to elim<strong>in</strong>ate any error <strong>in</strong><br />



performance of the chip due to th<strong>in</strong>n<strong>in</strong>g process before<br />

packag<strong>in</strong>g.<br />

In the current work, test chips were th<strong>in</strong>ned to f<strong>in</strong>al<br />

thicknesses of 250, 100 <strong>and</strong> 50 microns. The th<strong>in</strong>n<strong>in</strong>g<br />

process <strong>in</strong>volved chemical mechanical polish<strong>in</strong>g to<br />

remove the bulk of the material followed by dry plasma<br />

etch<strong>in</strong>g to remove the damage layer. Dic<strong>in</strong>g was done with<br />

a specialised saw <strong>in</strong> order to m<strong>in</strong>imise the damage to the<br />

chip. The effects of thinning on electrical and mechanical<br />

properties were investigated and the results are presented in<br />

the following sections.<br />

2.3.1 Electrical characterisation<br />

The test chip conta<strong>in</strong>s a diffused heater resistor cover<strong>in</strong>g<br />

85% of the chip area <strong>and</strong> 3 temperature sensitive diodes.<br />

The diodes are positioned for optimal temperature sens<strong>in</strong>g<br />

across the surface of each die. For the current work the<br />

central diode is used for measur<strong>in</strong>g different parameters.<br />

The follow<strong>in</strong>g parameters were <strong>in</strong>vestigated to observe the<br />

effect of chip thickness:<br />

1. Temperature Sensitive Parameter for the Diode<br />

2. IV Characteristics<br />

3. Heater Resistance<br />

Measurements were made with a Keithley 237 current source<br />

connected to a computer via GPIB. A LabVIEW<br />

program was used to record the results of the experiment.<br />

The results showed that there is no relationship between<br />

TSP <strong>and</strong> thickness of the chip <strong>and</strong> silicon th<strong>in</strong>n<strong>in</strong>g has no<br />

adverse effect on the electrical performance of the diode.<br />

IV characteristics of the diode were measured for different<br />

chip thicknesses at different temperatures. The results showed<br />

the dependence of the IV characteristics on temperature, with no<br />

adverse effect of silicon thinning on this property. The heater<br />

resistance of the test chip was measured in order to<br />

determine the power dissipated by the chip for a given<br />

current. The results indicated that the thinnest chip<br />

dissipated the maximum power at a current of 100 mA, but<br />

the difference between the chips was small.<br />

In general th<strong>in</strong>n<strong>in</strong>g has no adverse effect on the electrical<br />

properties of the material.<br />
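The two derived quantities used above can be sketched as follows. This is only an illustration under stated assumptions: the TSP is taken to be the slope of the diode forward voltage against temperature at constant current (the usual meaning of the term, not spelled out in the text), the heater power is taken as simple Joule heating, and all numerical values in the example are hypothetical placeholders rather than measured data.

```python
def tsp_slope(temps_c, forward_volts):
    """Least-squares slope dV/dT of the diode forward voltage (V per K)."""
    n = len(temps_c)
    mean_t = sum(temps_c) / n
    mean_v = sum(forward_volts) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(temps_c, forward_volts))
    den = sum((t - mean_t) ** 2 for t in temps_c)
    return num / den


def heater_power(current_a, resistance_ohm):
    """Joule heating in the heater resistor, P = I^2 * R (watts)."""
    return current_a ** 2 * resistance_ohm


if __name__ == "__main__":
    temps = [25.0, 50.0, 75.0, 100.0]      # hypothetical calibration points
    volts = [0.700, 0.650, 0.600, 0.550]   # hypothetical diode readings
    print(f"TSP = {tsp_slope(temps, volts) * 1e3:.2f} mV/K")
    # 100 mA is the current quoted in the text; 10 ohm is a placeholder.
    print(f"P = {heater_power(0.100, 10.0):.3f} W")
```

Comparing the fitted slope across chip thicknesses is one way to check that thinning leaves the TSP unchanged.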

2.3.2 Mechanical characterisation<br />

Silicon th<strong>in</strong>n<strong>in</strong>g can impart damage to the silicon die <strong>and</strong><br />

it is very important that the damage layer be removed.<br />

Scann<strong>in</strong>g electron microscopy was performed on the<br />

backside of the chip <strong>and</strong> it was observed that there was no<br />

surface damage on the backside but there was some<br />

surface damage due to dicing. For mechanical<br />

characterisation, a three-point bend test was carried out on<br />

chips of different thicknesses. This test allowed for the<br />

determination of the chip flexural strength. A finite element<br />

analysis model was then constructed. The basis of the<br />

model is the maximum principal stress theory [14].<br />

Maximum principal stress theory can predict fracture of<br />

brittle materials subjected to stress states with at least one<br />

tensile principal stress under static loading conditions.<br />

Fracture will occur when the principal stress of greatest<br />

magnitude for that state of stress reaches a critical value.<br />

In the first model, different distributed loads were applied along the y-axis<br />

of the 525 micron die. The<br />


model predicted that for a 60 N force the maximum principal<br />

stress was higher than the flexural strength, so failure would<br />

occur. These results were within 5 percent of the<br />

experimental data. Similar results were obtained for the 50<br />

micron die, thus validating the model. Figure 3 shows the<br />

stress generated for a 3 N point force applied at the centre<br />

of a 250 micron thick die. The stress is below the flexural<br />

strength and the die should not break at this applied load. Also,<br />

the stress is concentrated near the applied load and most of<br />

the chip is stress free.<br />
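The failure criterion described above can be sketched with textbook formulas. This is not the FEA model of the paper: it assumes a simply supported rectangular beam with a central load for the three-point bend stress, and a plane-stress state for the principal stresses. The support span and die width in the example are hypothetical, since the test fixture dimensions are not given here.

```python
import math


def flexural_stress(force_n, span_m, width_m, thickness_m):
    """Peak bending stress (Pa) in a simply supported rectangular beam
    loaded at mid-span: sigma = 3 F L / (2 b d^2)."""
    return 3.0 * force_n * span_m / (2.0 * width_m * thickness_m ** 2)


def principal_stresses(sx, sy, txy):
    """In-plane principal stresses (largest first) for plane stress."""
    centre = 0.5 * (sx + sy)
    radius = math.hypot(0.5 * (sx - sy), txy)
    return centre + radius, centre - radius


def fractures(sx, sy, txy, strength):
    """Maximum principal stress criterion for a brittle material."""
    s1, _ = principal_stresses(sx, sy, txy)
    return s1 >= strength


if __name__ == "__main__":
    span = 5e-3    # hypothetical support span (m)
    width = 7e-3   # hypothetical die width (m)
    for thickness_um, force in ((525, 60.0), (250, 3.0)):
        sigma = flexural_stress(force, span, width, thickness_um * 1e-6)
        print(f"{thickness_um} um die, {force} N -> {sigma / 1e6:.0f} MPa")
```

Because the stress scales with the inverse square of the thickness, a thinned die reaches a given stress at a much smaller applied force.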

Figure 3: FEA model predict<strong>in</strong>g the stress generated<br />

<strong>in</strong> the 250 microns th<strong>in</strong> chip<br />

In the next step the model was used to predict the stress<br />

generated for four gold stud bumps placed at each corner<br />

of the chip. The results shown in Figure 4 predict that the<br />

maximum stress generated is still below the flexural strength<br />

and similar in magnitude to the previous example. Overall, however,<br />

the chip is under a higher stress state than in<br />

the previous example, because the stresses from the individual<br />

stress concentration points add up.<br />

Figure 4: Stress generated <strong>in</strong> 250 microns chip with<br />

four gold stud bumps.<br />

In the next phase of the work the effect of this stress on<br />

the properties of the flip chip test chip will be investigated.<br />

This will <strong>in</strong>volve stress<strong>in</strong>g the chip via a tensile tester <strong>and</strong><br />

measur<strong>in</strong>g the properties of the test chip.<br />

2.4 Development of 3-D module<br />

The 3-D module consisted of flexible substrates of varying<br />

thickness and of standard test dies and test dies thinned to 50 microns, and<br />

was fabricated with flip chip technology using either<br />

anisotropic conductive paste or film. The first modules<br />

were developed as technological demonstrators us<strong>in</strong>g<br />

commercial flex of 25 microns thickness. Figure 5 shows


the mechanical demonstrator built using four test chips<br />

of two different thicknesses, and Table I gives the<br />

dimensions of the two modules. It shows that for a 4-chip<br />

demonstrator with a total thickness of 480 microns, the<br />

percentage of flex increases from 4 to 20 percent.<br />

Figure 5: Technological demonstrators<br />

Table I: Dimensions of the first demonstrator (4 test chip 3D demonstrator)<br />

Chip thickness (microns) | Total thickness (microns) | % chip | % flex | Length (mm) | Width (mm)<br />

525 | 2357 | 90 | 4 | 18.74 | 7.19<br />

50 | 450 | 41 | 20 | 18.74 | 8.25<br />

An environmental test regime including humidity and high<br />

temperature/high humidity was carried out initially on<br />

unfolded test samples. The samples failed in humidity<br />

testing after 500 cycles, while in thermal cycling the samples<br />

did not fail after 200 cycles. The failure analysis indicated<br />

that humidity was a major cause of failure, which<br />

corroborates results recently reported in the literature<br />

[15]. In the next step these tests will be carried out on the<br />

folded stack to thermo-mechanically characterise the 3-D<br />

modules.<br />

3. CONCLUSION<br />

This work focused upon the early development phase of<br />

the packag<strong>in</strong>g of a wireless sensor node, <strong>and</strong>, <strong>in</strong> particular,<br />

specific material issues <strong>and</strong> assembly techniques. Folded<br />

flex provided a very good potential solution to the system<br />

packag<strong>in</strong>g challenges <strong>in</strong>herent <strong>in</strong> implement<strong>in</strong>g functional<br />

miniaturised sensor modules. The problems arising from<br />

the miniaturisation of different components, especially<br />

flexible substrates and silicon die, were investigated.<br />

Solutions and mechanisms to overcome problems like<br />

wrinkling were presented. The effect of silicon thinning<br />

was investigated and it was concluded that electrical<br />

properties are not affected by the thinning process, while thin<br />

silicon die are very flexible and are thus able to<br />

compensate for stresses generated by thermal mismatch.<br />

The FEA model showed the stresses generated in different<br />

loading situations. It showed that stress concentration<br />

points like gold stud bumps result in a more highly stressed die<br />

as their stresses add up. The first 3-D technological<br />

module showed the advantages of reducing the thickness<br />

of the chip and the feasibility of the current work. In the<br />

next phase the stresses in thin silicon under dynamic<br />

loading will be analysed and corroborated by FEA.<br />

Different reliability tests will be performed on the 3D<br />

module as part of the feasibility study for the development of the sensor<br />

node.<br />

4. REFERENCES<br />

[1] Y.P. Wu, et al., “Innovative Stacked Die Package – S2BGA”, Proc. 52nd ECTC, May 28 - 31, San Diego, USA, s06p4, 2002<br />

[2] K.Y. Chen, et al., “Ultra Thin Electronic Package”, IEEE Transactions on Advanced Packaging, Vol 23, N1, pp 22-26, 2000<br />

[3] K.D. Gann, Neo Stacking Technology, Irvine Sensors Corporation, http://www.irvine-sensors.com/pdf/Neo-Stacking%20Technology%20HDI-3.pdf<br />

[4] M. Sunohara, et al., “Development of Wafer Thinning and Double Sided Bumping Technologies for Three Dimensional Stacked LSI”, Proc. 52nd ECTC, May 28 - 31, San Diego, California, USA, s06p2, 2002<br />

[5] B. Majeed, et al., “The Development of a Test Vehicle for Applications in Ambient Electronic Systems Using Very Thin Flexible Substrate”, 3rd EMPS, Prague, Czech Republic, 16th to 18th June 2004<br />

[6] S. Kuper, et al., “Threshold Behaviour in Polyimide Photoablation: Single-Shot Rate Measurements and Surface-Temperature Modelling”, Journal of Applied Physics, A56, 43-50, 1993<br />

[7] J.U. Meyer, “High Density Interconnects for Flexible Hybrid Assemblies for Active Biomedical Implants”, IEEE Transactions on Advanced Packaging, Vol 24, Iss 3, pp 366-374, 2001<br />

[8] E. Jung, et al., “Ultra Thin Chip for Miniaturised Products”, Proc. 52nd ECTC, May 28 - 31, San Diego, California, USA, s26p5, 2002<br />

[9] S. Sinniaguine, “3 D Stacked Wafer Level Packaging”, Advanced Packaging, Vol 9, n 3, March 2000<br />

[10] Pat Halahan, “Backgrinding Technologies for Thin-Wafer Production”, Chip Scale Review, Jan 2002<br />

[11] P. Perdu, “A review of sample backside preparation techniques for VLSI”, Microelectronics Reliability 40, 2000<br />

[12] Sergey Savastiouk, “Atmospheric Downstream Plasma – An approach to 300mm Wafer Thinning”, Semi Global 300mm Report, July 1998<br />

[13] Kenji Takahashi, “Current Status of Research and Development for Three-Dimensional Chip Stack Technology”, Japanese Journal of Applied Physics, Vol. 40, April 2001<br />

[14] N. Willems, “Strength of Materials”, New York, McGraw-Hill, 1981<br />

[15] J. de Vries, et al., “100 microns Pitch Flip Chip on Foil Assemblies with adhesive interconnects”, Microelectronics Reliability, 45, pp 527-534, 2005<br />


0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />

A NOVEL PLANAR MAGNETIC SENSOR BASED<br />

ON ORTHOGONAL FLUXGATE PRINCIPLE<br />

O. Zorlu, P. Kejik, F. V<strong>in</strong>cent, R. S. Popovic<br />

Ecole Polytechnique Fédérale de Lausanne (EPFL), Laboratoire de Microsystèmes 3<br />

Batiment BM, Station 17, CH-1015 Lausanne, Switzerl<strong>and</strong><br />

E-mail: ozge.zorlu@epfl.ch<br />

ABSTRACT<br />

In this paper, we present a new planar fluxgate<br />

magnetometer structure. The sensor has the<br />

orthogonal fluxgate configuration which makes the<br />

detection part <strong>in</strong>dependent of the excitation<br />

mechanism. The sensor consists of a ferromagnetic<br />

cyl<strong>in</strong>drical core cover<strong>in</strong>g an excitation rod, <strong>and</strong><br />

planar coils for signal detection. The fabricated<br />

sensor has a l<strong>in</strong>ear range of ±250 µT, a sensitivity<br />

of 4.3 mV/mT, <strong>and</strong> a perm<strong>in</strong>g below 400 nT for<br />

200 mA peak s<strong>in</strong>usoidal excitation current at<br />

100 kHz. The effect of demagnetization on the<br />

sensitivity, l<strong>in</strong>ear range, <strong>and</strong> perm<strong>in</strong>g for this<br />

structure is demonstrated by vary<strong>in</strong>g the length of<br />

the ferromagnetic core.<br />

1. INTRODUCTION<br />

Fluxgate type magnetic sensors are used to measure the<br />

magnitude and direction of DC or<br />

low-frequency AC magnetic fields. Their typical<br />

application areas are electronic compasses, current<br />

sensors, magnetic <strong>in</strong>k read<strong>in</strong>g, detection of ferrous<br />

materials, and non-destructive testing [1, 2]. The<br />

main advantages of fluxgate sensors are their high<br />

sensitivity and linearity. On the other hand, a narrow<br />

magnetic field operating range and high perming are<br />

still problems of current fluxgate sensors [3].<br />

2. SENSOR STRUCTURE<br />

In this paper, a new orthogonal fluxgate sensor<br />

structure is presented. Figure 1 shows the new<br />

sensor structure. An excitation rod is coated with<br />

the ferromagnetic material <strong>in</strong> a way that a circular<br />

excitation field loop is formed <strong>in</strong>side the<br />

ferromagnetic material. The permeability of the<br />

ferromagnetic layer is periodically modulated <strong>in</strong> the<br />

orthogonal direction with respect to the measured<br />

magnetic field by pass<strong>in</strong>g an AC current through the<br />

excitation rod. This makes the detection part<br />

<strong>in</strong>dependent of the excitation mechanism <strong>and</strong> the<br />

measur<strong>in</strong>g range <strong>and</strong> the sensitivity can be adjusted<br />

by modify<strong>in</strong>g only the core length. The<br />


modification of the core length changes the<br />

demagnetization factor of the core. The apparent<br />

relative permeability µapp of the core deviates from<br />

its <strong>in</strong>tr<strong>in</strong>sic value accord<strong>in</strong>g to the demagnetization<br />

factor. This can be expressed as:<br />

\[ \mu_{app} = \frac{\mu_i}{1 + N(\mu_i - 1)} \qquad (1) \]<br />

where µi is the <strong>in</strong>tr<strong>in</strong>sic relative permeability <strong>and</strong> N<br />

is the demagnetization factor. A change in the<br />

apparent permeability produces a change in the<br />

linear region of the B-H curve of the ferromagnetic<br />

layer. Since the saturation magnetic flux density<br />

Bsat remains constant, a shorter core (larger N) lowers the slope of the linear region<br />

and raises the magnetic field H required to<br />

saturate the material. This corresponds to<br />

a decrease in the sensitivity, but also to an increase<br />

in the linear operating range of the sensor. This<br />

effect acts in the longitudinal direction<br />

of the sensor, i.e. in the sensing direction of the<br />

magnetic field.<br />
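The trade-off between sensitivity and linear range can be made concrete with a short numerical sketch of equation (1). It assumes an idealised piecewise-linear B-H curve, B = µ0 µapp H up to Bsat; the intrinsic permeability, the saturation flux density and the demagnetization factors below are hypothetical illustration values, not measured properties of the FeNi core.

```python
import math

MU_0 = 4.0e-7 * math.pi  # vacuum permeability (H/m)


def apparent_permeability(mu_i, n_demag):
    """Equation (1): mu_app = mu_i / (1 + N * (mu_i - 1))."""
    return mu_i / (1.0 + n_demag * (mu_i - 1.0))


def saturation_field(b_sat_t, mu_app):
    """External field H (A/m) at which a linearised core reaches B_sat."""
    return b_sat_t / (MU_0 * mu_app)


if __name__ == "__main__":
    mu_i = 1000.0   # hypothetical intrinsic relative permeability
    b_sat = 1.0     # hypothetical saturation flux density (T)
    for n in (0.0, 1e-4, 1e-3, 1e-2):   # hypothetical demagnetization factors
        mu_app = apparent_permeability(mu_i, n)
        h_sat = saturation_field(b_sat, mu_app)
        print(f"N = {n:<6g} mu_app = {mu_app:8.1f}  H_sat = {h_sat:8.1f} A/m")
```

For N = 0 the apparent permeability equals the intrinsic value; as N grows, the slope falls and the field needed for saturation rises, which is the behaviour exploited by changing the core length.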

Figure 1: The sensor structure.<br />

On the other h<strong>and</strong>, <strong>in</strong> the radial direction, i.e. <strong>in</strong> the<br />

excitation direction the geometry can be considered<br />

as an <strong>in</strong>f<strong>in</strong>itely long core due to the closed magnetic<br />

loop, for which the demagnetization factor is zero.<br />

So, the slope of the B-H curve is a maximum, which<br />

allows easy <strong>and</strong> homogenous saturation along the<br />

whole core length. Two planar coils positioned<br />

under the tips of the ferromagnetic core pick up the<br />

measured signal. The use of planar pick-up coils<br />

provides easy <strong>in</strong>tegration <strong>in</strong>to st<strong>and</strong>ard CMOS<br />

processes.



3. FABRICATION<br />

Figure 2 shows the fabricated sensor prototype. The<br />

fabrication starts with form<strong>in</strong>g the pick-up coils on a<br />

Pyrex substrate by sputter<strong>in</strong>g, photolithography, <strong>and</strong><br />

patterning. Then, a gold wire passing over<br />

the two pick-up coils is bonded. The gold wire serves<br />

as the excitation rod. After this step, the two ends<br />

of the gold wire are covered with an epoxy. The<br />

epoxy is used as a mask to determ<strong>in</strong>e the length <strong>and</strong><br />

placement of the ferromagnetic layer which is<br />

electroplated over the gold wire. A st<strong>and</strong>ard FeNi<br />

electroplat<strong>in</strong>g bath is used for the electroplat<strong>in</strong>g of<br />

the magnetic layer [4]. A 10 µm thick <strong>and</strong> 4 mm<br />

long cylindrical FeNi layer is electroplated over the<br />

20 µm diameter gold wire.<br />

Figure 2: (a) The fabricated sensor. (b) Closer<br />

view of one end. The planar coil, the<br />

electroplated and protected parts of the gold wire,<br />

and the epoxy are visible.<br />

4. SENSOR PERFORMANCE<br />

The tests of the sensor are done by pass<strong>in</strong>g a<br />

s<strong>in</strong>usoidal current through the excitation rod. The<br />

frequency of the current is kept constant at 100 kHz,<br />

whereas its amplitude is varied. The excitation<br />

current leads to an induced voltage across the pick-up<br />

coils whose 2nd harmonic is proportional to the external<br />

magnetic field. The 2nd harmonic of the induced<br />

voltage is measured with a lock-in amplifier and the<br />

linear operating range, sensitivity, and perming<br />

of the sensor are analyzed. Figure 3 shows the test<br />

results of the sensor with different excitation<br />

currents within a ±2 mT external magnetic field range.<br />

The linear range is defined as the region in which<br />

the response fits a linear function to within 1%<br />

nonlinearity. It can be seen from Figure 3 that the<br />

l<strong>in</strong>ear measur<strong>in</strong>g range is <strong>in</strong>dependent of the<br />

excitation current.<br />
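One possible way to extract the linear range from a measured response curve is sketched below. It assumes that nonlinearity means the largest deviation from a least-squares line, expressed as a fraction of the output span inside the window; the paper does not state which definition was used. The function names and the tanh-shaped test curve are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nonlinearity(xs, ys):
    """Largest deviation from the fitted line, relative to the output span."""
    slope, intercept = fit_line(xs, ys)
    span = max(ys) - min(ys)
    if span == 0:
        return 0.0
    worst = max(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))
    return worst / span


def linear_range(fields, volts, limit=0.01):
    """Largest symmetric window |B| <= b whose fit stays within `limit`."""
    best = None
    for b in sorted({abs(f) for f in fields if f != 0}):
        pts = [(f, v) for f, v in zip(fields, volts) if abs(f) <= b]
        if len(pts) < 3:
            continue
        xs, ys = zip(*pts)
        if nonlinearity(xs, ys) <= limit:
            best = b
        else:
            break  # stop at the first window that exceeds the limit
    return best


if __name__ == "__main__":
    fields_mt = [i / 20 for i in range(-40, 41)]         # -2 mT .. +2 mT
    volts = [math.tanh(f / 0.5) for f in fields_mt]      # synthetic response
    print("linear range: +/-", linear_range(fields_mt, volts), "mT")
```

The slope returned by `fit_line` over the accepted window is then the sensitivity over the linear range.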


Figure 3: The sensor response with<strong>in</strong> ±2 mT<br />

range for different levels of the s<strong>in</strong>usoidal<br />

excitation current at 100 kHz. The l<strong>in</strong>ear<br />

measur<strong>in</strong>g range (shaded region) is <strong>in</strong>dependent of<br />

the excitation current.<br />

The slopes of the fitted lines give the sensitivity of<br />

the sensor over this l<strong>in</strong>ear range. Figure 4 shows the<br />

variation of the sensitivity with the excitation<br />

current. The sensitivity of the sensor <strong>in</strong>creases with<br />

the excitation current <strong>and</strong> tends to saturate after<br />

100 mA peak excitation current.<br />

Figure 4 also shows the variation of the perming<br />

value of the sensor with the excitation current. The<br />

perm<strong>in</strong>g of the sensor is the memory effect due to<br />

hard magnetic shocks. In order to measure the effect<br />

of perm<strong>in</strong>g, an external field of about 50 mT is<br />

applied to the sensor, <strong>and</strong> then the field is reduced to<br />

ambient. Then the stabilized value seen on the lock-in<br />

amplifier is recorded. The same procedure is<br />

repeated with a field <strong>in</strong> the opposite direction. The<br />

difference between the two measured values gives<br />

the perm<strong>in</strong>g voltage. This voltage is then converted<br />

<strong>in</strong>to the equivalent <strong>in</strong>put magnetic field by divid<strong>in</strong>g<br />

the value by the sensitivity at the very small region<br />

around the zero field value. It is important to note<br />

that the sensitivity of the sensor at this region is<br />

different from the sensitivity over the whole l<strong>in</strong>ear<br />

range. It is seen from Figure 4 that the perm<strong>in</strong>g<br />

decreases with the excitation current due to the<br />

better saturation of the core.<br />
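The conversion from perming voltage to equivalent input field described above amounts to one subtraction and one division. The sketch below illustrates it; the two lock-in readings and the zero-field sensitivity are hypothetical values chosen for the example, and the function name is an assumption.

```python
def perming_field_nT(v_after_positive_mV, v_after_negative_mV,
                     zero_field_sensitivity_mV_per_mT):
    """Equivalent input field (nT) of the perming voltage.

    The perming voltage is the difference between the stabilized readings
    taken after a strong positive and a strong negative field shock; it is
    divided by the sensitivity near zero field, not the full-range value.
    """
    perming_voltage_mV = abs(v_after_positive_mV - v_after_negative_mV)
    perming_mT = perming_voltage_mV / zero_field_sensitivity_mV_per_mT
    return perming_mT * 1e6  # 1 mT = 1e6 nT

# Hypothetical readings and zero-field sensitivity (illustration only).
example_nT = perming_field_nT(0.0010, -0.0007, 5.0)
print(f"perming: {example_nT:.0f} nT")
```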

The measurements show that, with 200 mA peak<br />

excitation current at the frequency of 100 kHz, the<br />

sensor shows a l<strong>in</strong>ear response <strong>in</strong> the range of<br />

±250 µT with a perm<strong>in</strong>g below 400 nT <strong>and</strong> a<br />

sensitivity of 4.3 mV/mT.


Figure 4: The variation of the sensitivity <strong>and</strong> the<br />

perm<strong>in</strong>g of the sensor with the magnitude of the<br />

excitation current.<br />

5. EFFECT OF DEMAGNETIZATION<br />

The effect of demagnetization on the sensitivity,<br />

linear range, and perming was also investigated. The<br />

measurements are performed on the same<br />

sample with only one pick-up coil by halving the<br />

core length once all tests at a given length are done. In order<br />

to achieve different core lengths, half of the<br />

ferromagnetic core is protected by photoresist <strong>and</strong><br />

the other half is etched by an H2SO4 based solution.<br />

The tests are then performed on the new structure<br />

hav<strong>in</strong>g smaller core length. With this method,<br />

structures with 2 mm, 1 mm, <strong>and</strong> 0.5 mm lengths are<br />

obta<strong>in</strong>ed <strong>and</strong> tested. The advantage of this method<br />

over fabricat<strong>in</strong>g sensors with different lengths is that<br />

the sensor response is only dependent on the change<br />

<strong>in</strong> the length of the ferromagnetic core, but not on<br />

the variations on the electroplat<strong>in</strong>g thickness <strong>and</strong> the<br />

core distance from substrate. These two would not<br />

be well controlled if the tests would be performed<br />

with different samples.<br />

Figure 5 gives the responses of the sensors hav<strong>in</strong>g<br />

different core lengths with a 200 mA peak excitation<br />

current. It should be noted that for better<br />

comparison, the voltage values for the 4 mm long<br />

sensor are divided by 2. It is seen that the l<strong>in</strong>ear<br />

range is <strong>in</strong>creas<strong>in</strong>g <strong>and</strong> the sensitivity of the sensor<br />

is decreas<strong>in</strong>g with smaller core length. These are<br />

both due to the effect of demagnetization of the<br />

sensor.<br />

Figure 5: The response of the sensors with<br />

different core lengths at 200 mA peak current.<br />

The change in the linear range and the sensitivity<br />

with length is seen.<br />

Figure 6 provides a clearer view of the change of<br />

the linear range with the sensor length. In addition to<br />

the data for four different lengths, it presents the variation of<br />

the inverse of the apparent permeability, µapp, which<br />

is calculated for a µi of 1000 with (Eq. 1) by<br />

assuming that the core has an ellipsoidal shape [5].<br />

The linear range follows the 1/µapp<br />

curve as expected. Furthermore, Figure 6 shows that<br />

the linear range is almost the same for the same<br />

sensor length for different excitation values, as<br />

previously shown in Figure 3.<br />

0-7803-9345-7/05/$20.00©<strong>2005</strong> IEEE.<br />
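The trend of 1/µapp with core length can be reproduced with a few lines of code. The sketch below is an illustration under assumptions, since Eq. 1 is not reproduced in this part of the text: it assumes the common relation µapp = µi / (1 + N(µi - 1)) and Osborn's axial demagnetizing factor N for a prolate spheroid [5]. The core diameter `CORE_DIAMETER_MM` is a hypothetical value; the lengths and µi = 1000 are taken from the text.

```python
import math

def demag_factor_prolate(aspect_ratio):
    """Axial demagnetizing factor of a prolate spheroid (Osborn),
    with aspect_ratio = length / diameter > 1."""
    m = aspect_ratio
    if m <= 1.0:
        raise ValueError("aspect_ratio must be > 1 for a prolate spheroid")
    root = math.sqrt(m * m - 1.0)
    return (m / root * math.log(m + root) - 1.0) / (m * m - 1.0)

def apparent_permeability(mu_i, demag_factor):
    """Assumed relation: mu_app = mu_i / (1 + N * (mu_i - 1))."""
    return mu_i / (1.0 + demag_factor * (mu_i - 1.0))

MU_I = 1000.0            # from the text
CORE_DIAMETER_MM = 0.05  # hypothetical, not given in the text

for length_mm in (0.5, 1.0, 2.0, 4.0):
    n = demag_factor_prolate(length_mm / CORE_DIAMETER_MM)
    mu_app = apparent_permeability(MU_I, n)
    print(f"L = {length_mm} mm: N = {n:.5f}, 1/mu_app = {1.0 / mu_app:.5f}")
```

Under these assumptions 1/µapp grows as the core gets shorter, which is the behaviour the linear range is compared against.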

Figure 6: The change of the l<strong>in</strong>ear range with the<br />

sensor length for different excitation current<br />

values and the comparison with the 1/µapp values.<br />

Figure 7 shows the variation of the sensitivity with<br />

the core length, <strong>and</strong> its comparison with the<br />

variation of the apparent permeability, µapp. The<br />

sensitivity tends to saturate for longer core lengths<br />

due to the fact that the demagnetization factor for<br />

longer cores is close to zero <strong>and</strong> does not have such<br />

a big effect when compared to the smaller lengths.<br />

This is also <strong>in</strong> good agreement with the very small<br />

change <strong>in</strong> the l<strong>in</strong>ear range between 2 mm <strong>and</strong> 4 mm<br />

cores which is seen <strong>in</strong> Figure 6.


Figure 7: The variation of the sensitivity with the<br />

core length, <strong>and</strong> its comparison with the variation<br />

of the apparent permeability, µapp.<br />

Figure 8 shows the variation of perm<strong>in</strong>g with the<br />

excitation current for a given core length. For<br />

higher excitation currents, the sensor is saturated<br />

more strongly <strong>in</strong> the orthogonal direction. This<br />

results <strong>in</strong> the better alignment of the magnetic<br />

doma<strong>in</strong>s <strong>in</strong>side the core which gives rise to a smaller<br />

perm<strong>in</strong>g value. On the other h<strong>and</strong>, the magnetic<br />

properties <strong>in</strong> the radial direction are <strong>in</strong>dependent of<br />

the core length, which gives the same remanent<br />

magnetization for different core lengths with the<br />

same excitation current. However, the sensitivity of<br />

a shorter sensor is smaller, and this gives rise to a higher<br />

perming value.<br />

Figure 8: The variation of perm<strong>in</strong>g with the<br />

excitation current for a given core length.<br />


6. CONCLUSION<br />

A new planar orthogonal fluxgate sensor structure is<br />

presented, <strong>and</strong> the feasibility of the idea is verified<br />

by the fabricated prototype. The sensor has a l<strong>in</strong>ear<br />

range of ±250 µT, a sensitivity of 4.3 mV/mT, and a<br />

perming below 400 nT for 200 mA peak sinusoidal<br />

excitation current at 100 kHz. The effect of<br />

demagnetization on the l<strong>in</strong>ear range, sensitivity, <strong>and</strong><br />

perming of the sensor is also demonstrated by varying<br />

the length of the ferromagnetic core. The sensitivity<br />

can be <strong>in</strong>creased <strong>and</strong> the perm<strong>in</strong>g can be decreased<br />

by <strong>in</strong>creas<strong>in</strong>g the core length, but this decreases the<br />

l<strong>in</strong>ear range of the sensor. Higher excitation<br />

currents lead to a better core saturation, which<br />

decreases the perm<strong>in</strong>g.<br />

Future work will focus on scaling down the sensor. The<br />

current required for an adequate core saturation can<br />

be drastically decreased by scal<strong>in</strong>g down the sensor.<br />

Smaller sensor structures also enable the <strong>in</strong>tegration<br />

of the sensor <strong>in</strong>to CMOS processes.<br />

7. REFERENCES<br />

[1] F. Kaluza, A. Gruger, H. Gruger, “New and<br />

future applications of fluxgate sensors,” Sensors<br />

& Actuators A 106, pp. 48-51, 2003.<br />

[2] P. Ripka, “Advances in fluxgate sensors,”<br />

Sensors & Actuators A 106, pp. 8-14, 2003.<br />

[3] P. Ripka, “Magnetic Sensors and<br />

Magnetometers,” Artech House Publishers,<br />

January 2001.<br />

[4] J.M. Quemper et al., “Permalloy electroplating<br />

through photoresist molds,” Sensors &<br />

Actuators A 74, pp. 1-4, 1999.<br />

[5] J. A. Osborn, “Demagnetizing Factors for the<br />

General Ellipsoid,” Physical Review, Vol. 67,<br />

No. 11-12, June 1-15, pp. 351-357, 1945.



A Monolithic CMOS 3-axis Accelerometer<br />

Comb<strong>in</strong><strong>in</strong>g Piezoresistive <strong>and</strong> Heat Transfer Effects<br />

A. CHAEHOI, L. LATORRE, F. MAILLY <strong>and</strong> P. NOUET<br />

Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, France.<br />

latorre@lirmm.fr<br />

Abstract<br />

This paper presents the analytical modell<strong>in</strong>g <strong>and</strong><br />

experimental data of a novel monolithic 3-axis<br />

CMOS accelerometer. The proposed sensor exhibits<br />

a sensitivity of 24 mV/g and a resolution of 1 g in<br />

out-of-plane detection, and a sensitivity of 370 mV/g and<br />

a resolution of 30 mg in lateral measurement.<br />

1. INTRODUCTION<br />

Most accelerometers sense the acceleration along<br />

a single axis (vertical) or two axes (in-plane). This<br />

<strong>in</strong>formation is <strong>in</strong>sufficient <strong>in</strong> applications where full<br />

data about the movement is required. Thus 3-axis<br />

accelerometers are essential for the movement<br />

sens<strong>in</strong>g <strong>in</strong> numerous applications such as robotics or<br />

automotive applications. In this article we present a<br />

novel monolithic 3-axis accelerometer. Front Side<br />

Bulk Micromach<strong>in</strong><strong>in</strong>g (FSBM) is the cheapest way<br />

to release mechanical structures on CMOS dies.<br />

This mature technique leads to low-cost <strong>and</strong> reliable<br />

devices. The proposed sensor is based on CMOS-<br />

FSBM beam <strong>and</strong> bridges. Out of plane (z axis)<br />

accelerations are measured by a “T-shaped”<br />

cantilever beam (seismic mass). The acceleration is<br />

measured by means of two embedded stra<strong>in</strong> gauges<br />

that convert the vertical displacement of the<br />

mechanical structure into resistance variations. In-plane<br />

(x and y axes) accelerations are sensed using<br />

convection heat transfer devices. A heating resistor<br />

flanked by two temperature detectors is suspended<br />

over a cavity. The symmetric temperature profile<br />

created by the heater becomes asymmetric under<br />

acceleration, and this temperature variation is converted<br />

into a resistance variation [1].<br />

In this paper we present analytical modell<strong>in</strong>g, f<strong>in</strong>ite<br />

element simulations (FEM) <strong>and</strong> experimental results<br />

of our unique monolithic CMOS-FSBM 3-axis<br />

accelerometer.<br />


2. OUT-OF-PLANE SENSING:<br />

Piezoresistive accelerometer<br />

Suspended structures obtained with FSBM do<br />

not feature a large seismic mass. Nevertheless,<br />

we have recently demonstrated that the very low<br />

noise level <strong>in</strong> polysilicon gauges allows a theoretical<br />

resolution of about 0.5g [2]. We have then designed<br />

a test chip to verify the performances of such CMOS<br />

cantilevers used as <strong>in</strong>ertial sensors. The proposed<br />

sensor is based on a “T-shaped” cantilever beam. An<br />

image of the proposed sensor and its associated<br />

CMOS electronic circuitry is shown in Figure 1.<br />

The dimensions of the two fabricated devices are<br />

presented in Figure 2 and Table 1.<br />

Figure 1. Picture of the T-Shaped based<br />

accelerometer.<br />

Figure 2. Design parameters of the proposed<br />

accelerometer.


Table 1. Dimensions of the “T-Shaped” structures<br />

Parameter | Device A | Device B<br />

Lb × Wb (µm) | 480 × 80 | 480 × 40<br />

Lp × Wp (µm) | 280 × 280 | 280 × 280<br />

In order to simulate the behavior of the sensor<br />

concurrently with the electronic circuitry, we have<br />

implemented an analog-HDL model description of<br />

the sensor. The output voltage is directly<br />

proportional to the beam bend<strong>in</strong>g z. The expression<br />

of the bend<strong>in</strong>g z, is determ<strong>in</strong>ed as a function of the<br />

beam’s load (Figure 3), where the loads qb and qp<br />

represent the gravitational force resulting from the<br />

sensor mass itself, E the Young modulus <strong>and</strong> I the<br />

moment of <strong>in</strong>ertia of the cross section.<br />

Fig. 3: Loading model for the proposed sensor (distributed loads qb over length Lb and qp over length Lp).<br />

The bend<strong>in</strong>g z (at a distance x from the attached po<strong>in</strong>t<br />

of the beam) is obta<strong>in</strong>ed by <strong>in</strong>tegrat<strong>in</strong>g the bend<strong>in</strong>g<br />

torque T(x) along the cantilever length. In expression<br />

(1), E is the equivalent Young's modulus and I is the<br />

moment of <strong>in</strong>ertia of the beam cross section I=Ib. The<br />

moment of <strong>in</strong>ertia of the plate section is given by<br />

expression (2).<br />

$$E \cdot I \, \frac{d^2 z}{dx^2} = T(x) \qquad (1)$$<br />

$$I_p = \frac{W_p}{W_b} I_b = \frac{W_p}{W_b} I \qquad (2)$$<br />

The bending torque along the cantilever is given by:<br />

$$x \in [0, L_b]: \quad T(x) = q_b \frac{(L_b - x)^2}{2} + q_p L_p \left( L_b + \frac{L_p}{2} - x \right)$$<br />

$$x \in [L_b, L_b + L_p]: \quad T(x) = q_p \frac{(L_b + L_p - x)^2}{2} \qquad (3)$$<br />

The expression of the bending is obtained by<br />

solving equation (1). We choose the centre of the<br />

plate as the reference point where the equivalent<br />

force is applied (x = Lb + Lp/2). This gives:<br />

$$z_{plate} = q_p \frac{2 L_p^4}{45 E I (W_p / W_b)} + q_p \frac{L_p^2 L_b^2}{2 E I} + q_b \frac{L_b^4}{8 E I} + q_b \frac{L_b^3 L_p}{12 E I} + q_p \frac{L_p L_b^3}{3 E I} + q_p \frac{L_p^3 L_b}{4 E I} \qquad (4)$$<br />

From this equation we can deduce the expression of<br />

the equivalent stiffness <strong>and</strong> the equivalent mass of<br />

the model. The stiffness is def<strong>in</strong>ed by (5), where M<br />

is the equivalent lumped mass located at the centre<br />

of the plate.<br />

$$K = \frac{M \cdot g}{z_{plate}} \qquad (5)$$<br />

By nature the mass is a distributed load, while our<br />

second-order mechanical system assumes a single<br />

point force. In order to determine the expression of<br />

the equivalent point mass, we assume that the<br />

output voltage of the sensor is the same whether we<br />

consider the distributed mass or the equivalent<br />

point mass. Since the output voltage is directly<br />

proportional to the bending torque, this gives:<br />

$$T_{\text{Distributed Mass}}(x = 0) = T_{\text{Lumped Mass}}(x = 0) \qquad (6)$$<br />

where the bending torque for a distributed mass is:<br />

$$T_{\text{Distributed Mass}}(x = 0) = q_b \frac{L_b^2}{2} + q_p \frac{L_p^2}{2} + q_p L_b L_p \qquad (7)$$<br />

and the bending torque for a lumped mass is:<br />

$$T_{\text{Lumped Mass}}(x = 0) = (M_{Lumped} \cdot g) \cdot \left( L_b + \frac{L_p}{2} \right) \qquad (8)$$<br />

This gives:<br />

$$M_{Lumped} = \rho_S \cdot \left( \frac{W_b L_b^2 + W_p L_p^2 + 2 W_p L_b L_p}{2 L_b + L_p} \right) \qquad (9)$$<br />
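The torque-matching argument behind the lumped mass can be checked numerically. The sketch below is an illustration, not the authors' analog-HDL model: it assumes the distributed loads are qb = ρS·Wb·g and qp = ρS·Wp·g, with ρS read as a mass per unit area, and uses a hypothetical value for ρS. The dimensions are those of Device A in Table 1; all function names are assumptions.

```python
G = 9.81  # gravitational acceleration, m/s^2

def distributed_torque(rho_s, Lb, Wb, Lp, Wp):
    """Torque at the anchor (x = 0) from the distributed beam and plate loads."""
    qb = rho_s * Wb * G  # assumed load per unit length on the beam
    qp = rho_s * Wp * G  # assumed load per unit length on the plate
    return qb * Lb**2 / 2 + qp * Lp**2 / 2 + qp * Lb * Lp

def effective_area(Lb, Wb, Lp, Wp):
    """Geometric factor multiplying rho_s in the lumped-mass expression."""
    return (Wb * Lb**2 + Wp * Lp**2 + 2 * Wp * Lb * Lp) / (2 * Lb + Lp)

def lumped_mass(rho_s, Lb, Wb, Lp, Wp):
    """Equivalent point mass located at the centre of the plate."""
    return rho_s * effective_area(Lb, Wb, Lp, Wp)

def lumped_torque(mass, Lb, Lp):
    """Torque at the anchor from a point mass at x = Lb + Lp / 2."""
    return mass * G * (Lb + Lp / 2)

# Device A dimensions from Table 1, converted to metres.
LB, WB, LP, WP = 480e-6, 80e-6, 280e-6, 280e-6
RHO_S = 0.0116  # kg/m^2, hypothetical areal density (not given in the text)

area_a = effective_area(LB, WB, LP, WP)
mass_a = lumped_mass(RHO_S, LB, WB, LP, WP)
torque_distributed = distributed_torque(RHO_S, LB, WB, LP, WP)
torque_lumped = lumped_torque(mass_a, LB, LP)
print(f"effective area: {area_a:.4e} m^2, lumped mass: {mass_a:.3e} kg")
```

By construction the two torques agree for any geometry, which is the condition the lumped mass was derived from.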

The behavioral model of the cantilever has been<br />

validated by ANSYS® simulations (FEM).<br />

Simulation results are compared to experimental<br />

data from our prototypes in Table 2. A sensitivity of<br />

about 23.9 mV/g and a resolution of 1 g have been<br />

measured with the fabricated devices.


Table 2. Summary of results<br />

Parameter | Model | ANSYS® | Prototype<br />

Stiffness (N/m) | 0.44 | n/a | n/a<br />

Seismic Mass (kg) | 1.08 × 10⁻⁹ | n/a | n/a<br />

Sensitivity (nm/g) | 23.9 | 25.5 | 25.5<br />

Resonant Freq (kHz) | 3.22 | 3.06 | 3.06<br />
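The "Model" column of Table 2 can be cross-checked with the second-order lumped model. The sketch below is a consistency check under assumptions: it treats the stiffness as 0.44 N/m (force per unit displacement, as in the definition of K), takes g = 9.81 m/s², and uses the undamped relation f = sqrt(K/M) / (2π). Function names are illustrative.

```python
import math

G = 9.81  # m/s^2

def resonant_frequency_hz(stiffness_n_per_m, mass_kg):
    """Undamped natural frequency of a mass-spring system."""
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)

def deflection_per_g_nm(stiffness_n_per_m, mass_kg):
    """Static deflection under 1 g, in nanometres."""
    return mass_kg * G / stiffness_n_per_m * 1e9

K_MODEL = 0.44     # stiffness from Table 2, assumed to be in N/m
M_MODEL = 1.08e-9  # seismic mass from Table 2, kg

f_model_hz = resonant_frequency_hz(K_MODEL, M_MODEL)
z_model_nm = deflection_per_g_nm(K_MODEL, M_MODEL)
print(f"f = {f_model_hz / 1e3:.2f} kHz, deflection = {z_model_nm:.1f} nm/g")
```

Both results land within about 1% of the tabulated model values (3.22 kHz and 23.9 nm/g); the small gap is consistent with rounding of K and M.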

3. IN PLANE SENSING:<br />

The thermally based accelerometer<br />

For in-plane acceleration, we use a device based<br />

on convection heat transfer from a suspended<br />

heating resistor. The proposed sensor is depicted in<br />

Figure 4. The cavity is an 800 µm square; the<br />

polysilicon heating resistor measures<br />

40 × 1040 µm and each detector 30 × 420 µm. A<br />

heating resistor flanked by two temperature<br />

detectors is suspended over a cavity. With an<br />

acceleration in the chip plane and perpendicular to<br />

the resistors, the temperature profile created by the<br />

heater becomes asymmetric and this temperature<br />

variation is converted into a resistance variation by<br />

the detectors. Figure 5 shows the asymmetrical<br />

temperature distribution due to the acceleration. The<br />

detectors are arranged in a Wheatstone bridge and<br />

connected to an on-chip instrumentation amplifier<br />

(programmable gain: 20/100/1000).<br />

Figure 4. Picture of the thermally based<br />

accelerometer.<br />

We have simulated the behavior of the sensor with<br />

the FEM simulator ANSYS®. We consider a 2D<br />

model of the sensor us<strong>in</strong>g the FLOTRAN CFD<br />

element FLUID 141 where the velocities are<br />


obta<strong>in</strong>ed from the conservation of momentum<br />

pr<strong>in</strong>ciple, the pressure from the conservation of<br />

mass pr<strong>in</strong>ciple <strong>and</strong> the temperature from the law of<br />

conservation of energy. The silicon cavity width <strong>and</strong><br />

depth are 1000 <strong>and</strong> 300 µm respectively. The heater<br />

<strong>and</strong> detectors widths are 40 <strong>and</strong> 30 µm respectively<br />

<strong>and</strong> their thickness is 5 µm. The distance between<br />

the heater <strong>and</strong> the sensors is 200µm. With this<br />

model we f<strong>in</strong>d a difference of temperature on the<br />

detectors of 1.37°C for 1g acceleration <strong>and</strong> a heater<br />

temperature of 440°C. Figures 5 and 6 are examples<br />

of ANSYS simulations: Figure 5 illustrates the<br />

asymmetrical temperature distribution due to the<br />

acceleration <strong>and</strong> Figure 6 displays the fluid velocity<br />

vectors.<br />

Figure 5. Temperature distribution for an<br />

acceleration toward the left.<br />

Figure 6. Fluid velocity vectors.<br />

As described in [3], the silicon cavity limits<br />

the thermal boundary layer <strong>and</strong> the convection


phenomena <strong>in</strong> the bottom region of the fluid.<br />

Therefore the volume of the packag<strong>in</strong>g <strong>in</strong>fluences<br />

the convection <strong>in</strong> the top region <strong>and</strong> has a large<br />

consequence on the sensor sensitivity.<br />

This k<strong>in</strong>d of device exhibits a high sensitivity to the<br />

acceleration; its resolution is below 1g. To<br />

determ<strong>in</strong>e its sensitivity <strong>and</strong> resolution, we have<br />

used it as an <strong>in</strong>cl<strong>in</strong>ometer. Figure 7 displays<br />

experimental results for the test chip under<br />

<strong>in</strong>cl<strong>in</strong>ation. This result is for a 35mW heat<strong>in</strong>g<br />

power. In this condition, the heater temperature<br />

reaches 438°C (deduced from the polysilicon<br />

temperature coefficient). The temperature difference<br />

across the detectors is calculated from the output<br />

voltage (tak<strong>in</strong>g <strong>in</strong>to account the Wheatstone bridge<br />

<strong>and</strong> the amplifier ga<strong>in</strong>). We obta<strong>in</strong> about 1.53°C/g.<br />

F<strong>in</strong>ally an angular resolution of about 1.7° (around<br />

90°) is observed (SNR=1).<br />
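The angular resolution can be related to an acceleration resolution with a short calculation. The sketch below rests on an assumption about the tilt geometry that the text does not spell out: the sensitive axis is taken to see a gravity component of g·cos(θ), so the output changes fastest around 90°. The function name is illustrative.

```python
import math

def acceleration_change_g(theta_deg, delta_theta_deg):
    """Change (in g) of the assumed sensed component g*cos(theta)
    when the inclination moves from theta to theta + delta_theta."""
    t0 = math.radians(theta_deg)
    t1 = math.radians(theta_deg + delta_theta_deg)
    return abs(math.cos(t1) - math.cos(t0))

# Angular resolution reported in the text, evaluated around 90 degrees.
resolution_g = acceleration_change_g(90.0, 1.7)
print(f"equivalent acceleration resolution: {resolution_g * 1e3:.1f} mg")
```

Under this assumption, 1.7° around 90° corresponds to roughly 30 mg, in line with the lateral resolution quoted elsewhere in the paper.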

Figure 7. Sensitivity of the device to inclination (output voltage in V versus inclination in degrees).<br />

Figure 8 presents the sensor output under a sinusoidal<br />

acceleration at 40 Hz: the sensitivity is 375 mV/g and the<br />

resolution is 30 mg. The sensor has very good linearity<br />

between 0 and 10 g; the linearity error as a percentage of<br />

full scale is smaller than 2%.<br />

Figure 8. Sensitivity of the device to acceleration (output voltage in mVpp versus acceleration in g).<br />


4. CONCLUSION<br />

The self-aligned wet-etch<strong>in</strong>g of CMOS bulk from<br />

the front side (FSBM) is the cheapest way to<br />

manufacture suspended structures <strong>and</strong> so monolithic<br />

microsystems. In this paper, we demonstrate that<br />

such low cost mechanical devices can be used as<br />

<strong>in</strong>ertial sensors. We have proposed a complete<br />

model of the piezoresistive out-of-plane<br />

accelerometer. The model has been validated by<br />

FEM simulations. It is implemented <strong>in</strong>to an analog-<br />

HDL description to simulate the behaviour of the<br />

sensor concurrently with the electronic circuitry <strong>in</strong><br />

order to predict the whole system performances. We<br />

have proposed a 2D-model of the <strong>in</strong>-plane thermal<br />

accelerometer based on FEM simulations. The proposed<br />

FEM models are used as a basis for further behavioural<br />

modelling and system integration.<br />

The paper presents both simulation <strong>and</strong><br />

characterization results. We have achieved a low-cost<br />

3-axis accelerometer with a sensitivity of<br />

23.9 mV/g and a resolution of 1 g in the out-of-plane<br />

direction and a sensitivity of 370 mV/g with a<br />

resolution below 30 mg in the lateral direction. The<br />

performance of this prototype can be easily<br />

improved by on-chip amplification <strong>and</strong> filter<strong>in</strong>g.<br />

5. REFERENCES<br />

[1] A.M. Leung, J. Jones, E. Czyzewska, J. Chen<br />

<strong>and</strong> B. Woods, “Micromach<strong>in</strong>ed accelerometer<br />

based on convection heat transfer”, IEEE<br />

MEMS’1998, Heidelberg, Germany, 25-29 Jan.<br />

1998, pp. 627–630.<br />

[2] A. Chaehoi, L. Latorre, S. Baglio, P. Nouet,<br />

“Piezoresistive CMOS Beams for Inertial Sensing”,<br />

IEEE Sensors’03, Toronto, Canada, October<br />

22-24, 2003, pp. 142-143.<br />

[3] F. Mailly, A. Martinez, A. Giani,<br />

F. Pascal-Delannoy and A. Boyer, “Effect of gas pressure<br />

on the sensitivity of a micromachined thermal<br />

accelerometer”, Sens. Actuators A 109 (1-2),<br />

2003, pp. 88-94.
