2012 DAT_AP_0210.pdf - VLSI-DAT


• IEEE Solid-State Circuits Society
• IEEE Solid-State Circuits Society Taipei/Tainan Chapter
• IEEE Taipei/Tainan Section
• Taiwan IC Design Society
• Taiwan Semiconductor Industry Association

Industrial sponsors:
• Global Unichip Corp. (http://www.globalunichip.com/)
• Himax Technologies, Inc. (http://www.himax.com.tw)
• Macronix International Co., Ltd. (http://www.macronix.com)
• Nanya Technology Corporation (http://www.ntc.com)
• Realtek Semiconductor Corp. (http://www.realtek.com/)
• Vanguard International Semiconductor Corp. (http://www.vis.com.tw)
• VIA Technologies, Inc. (http://www.viatech.com)
• Winbond Electronics Corp. (http://www.winbond.com.tw)

GENERAL INFORMATION

Date: April 23 - 25, 2012
Venue: The Ambassador Hotel-Hsinchu 新竹國賓大飯店
188, Section 2, Chung Hwa Road, Hsinchu, Taiwan
新竹市中華路二段188號
Tel: +886-3-515-1111
Fax: +886-3-515-1112
Website: www.ambassadorhotel.com.tw
E-mail: reserv@ambassador-hsinchu.com.tw

REGISTRATION

ALL PARTICIPANTS (INCLUDING SPEAKERS) ARE REQUESTED TO REGISTER

Starting from 2010, VLSI-DAT does NOT produce a printed version of the conference proceedings. However, an electronic version of the conference proceedings (CD-ROM) will be provided.

I. Registration Fee

Payment of the Conference registration fee entitles a single registrant to one CD-ROM of the conference proceedings, one admission to the cocktail reception, free lunch during the conference (the lunch coupon can only be claimed BEFORE 10:00am each day) and entrance to all technical sessions. Technical Session registration does not include entrance to the DAT Tutorial.

Payment of the Tutorial fee entitles a single registrant to the tutorial with one copy of the tutorial handout. Tutorial registration does not include entrance to the technical sessions.


◎ A special price is available when registering for both the Conference and the Tutorial.
◎ The advance registration rate is available when the payment is made before the advance registration deadline, April 13, 2012. From April 14, 2012 onward, the on-site rate applies automatically to all unpaid registrations.

TO QUALIFY FOR THE MEMBER REGISTRATION FEE, REGISTRANT MUST BE A MEMBER OF IEEE PRIOR TO THE CONFERENCE.

(USD:TWD = 1:30)

Advance (on-line by April 13)
Registration Category   Conference            Short Course         Conference & Short Course
Regular                 USD420 / TWD12,600    USD100 / TWD3,000    USD460 / TWD13,800
IEEE Member             USD340 / TWD10,200    USD80 / TWD2,400     USD370 / TWD11,100
Student                 USD65 / TWD1,950      USD50 / TWD1,500     USD80 / TWD2,400

On-site (and after April 14)
Registration Category   Conference            Short Course         Conference & Short Course
Regular                 USD525 / TWD15,750    USD125 / TWD3,750    USD570 / TWD15,600
IEEE Member             USD425 / TWD12,750    USD100 / TWD3,000    USD465 / TWD13,950
Student                 USD85 / TWD2,550      USD65 / TWD1,950     USD100 / TWD3,000

Additional Options:
Additional CDs: USD67 / TWD2,000 per copy
Additional Short Course Handout: USD17 / TWD500 per copy

II. Registration Details

1. All participants (including oral/poster presenters, TPC members, invited speakers and session chairs) are requested to register for the conference. For advance registration, please visit the conference website at http://vlsitsa.itri.org.tw/
2. Thanks to sponsorship from the local government, local professors are eligible for 40% off the conference registration fee. This special rate is exclusively for local professors with regular registration for the conference, or for the conference with tutorial (or conference with short course).
3. The registration fee may be paid by check, bank draft or credit card for all registrants. Local registrants can also pay by ATM. Online payment by credit card is secured by SSL. Please follow the instructions for the chosen payment method.
4. A registration confirmation letter will be sent once the payment has been completed.
5. Each registrant must use one individual form, except for group/sponsor registrants.
6. For group registrations, please fill in the Group/Sponsor registration form on-line.
7. If you cannot register successfully through the internet, please download the Registration Form from the conference website and send it by e-mail or fax to the Conference Registrar.

Conference Registrar
Ms. Yvonne Chen / Ms. Mandy Lo
Room 101, Bldg. 21, No.195, Sec. 4, Chung Hsing Rd.,
Chutung, Hsinchu 310, Taiwan
Tel: 886-3-591-3003 or 886-3-591-2896
Fax: 886-3-582-0303
E-Mail: YvonneChen@itri.org.tw


III. Cancellation Policy

You are encouraged to register in advance for your own convenience and to enjoy the early registration rate. Due to materials printing commitments, refunds requested after April 20th, 2012 cannot be guaranteed. If written notice of cancellation reaches the Conference Registrar:
• Before 23:59 GMT+0800, April 20, 2012: A USD40 (TWD1,300) processing fee will be withheld from all refunds.
• After 23:59 GMT+0800, April 20, 2012: no refund will be given. A copy of the conference CD will be sent after the Symposium.

BEST PAPER AWARD

2011 VLSI-DAT Best Paper Award

The Best Paper Award for 2011 VLSI-DAT will be presented in the opening ceremony of the symposium.

Important Test Selection For Screening Potential Customer Returns
Nik Sumikawa, Dragoljub Gagi Drmanac, LeRoy Winemberg, Li-C. Wang, Magdy S. Abadir, University of California, Santa Barbara, USA.
VLSI Design, Automation and Test, 2011 VLSI-DAT '11. International Symposium on 25-27 April 2011
Digital Object Identifier: 10.1109/VDAT.2011.5783603
Publication Year: 2011

A Macro-Layer Level Fully Parallel Layered LDPC Decoder SOC for IEEE 802.15.3c Application
Zhixiang CHEN, Xiao PENG, Xiongxin ZHAO, Qian XIE, Leona OKAMURA, Dajiang ZHOU, Satoshi GOTO, Waseda University, Japan.
VLSI Design, Automation and Test, 2011 VLSI-DAT '11. International Symposium on 25-27 April 2011
Digital Object Identifier: 10.1109/VDAT.2011.5783634
Publication Year: 2011

2012 VLSI-DAT Best Paper Award Committee

The committee co-chairs are the Technical Program Committee co-chairs, Prof. An-Yeu Wu and Prof. Li-C. Wang. The committee members consist of invited TPC members of 2012. The Award Committee will select the 2012 Best Paper Award based on criteria including technical content and depth, the quality of the paper, and the quality of the presentation. The award will be announced after the conference and the award ceremony will be held at the opening of the 2013 symposium.

2012 VLSI-DAT Best Paper Award Candidates

The Best Paper Award Committee has selected the following papers as award candidates for 2012 VLSI-DAT and will conduct on-site evaluation during the conference.

M12 A 2kb Built-In Row-Controlled Dynamic Voltage Scaling Near-/Sub-Threshold FIFO memory for WBANs
Wei-Hung Du, Po-Tsang Huang, Ming-Hung Chang, and Wei Hwang


T41 A 4.9-mW 4-Gb/s Single-to-Differential TIA with Current-Amplifying Regulated Cascode
Tzon-Tzer Lu, Hua-Chin Lee, Chao-Shiun Wang and Chorng-Kuang Wang

T51 Design of a Real-time Software-Based GPS Baseband Receiver Using GPU Acceleration
Jyun-Cheng Wu, Lei Chen, and Tzi-Dar Chiueh

T61 3D-IC BISR for Stacked Memories Using Cross-Die Spares
Chun-Chuan Chi, Yung-Fa Chou, Ding-Ming Kwai, Yu-Ying Hsiao, Cheng-Wen Wu, Yu-Tsao Hsing, Li-Ming Denq, Tsung-Hsiang Lin

T62 Routing-Efficient Implementation of An Internal-Response-Based BIST Architecture
Wei-Cheng Lien, Tong-Yu Hsieh and Kuen-Jong Lee

T71 IMITATOR: A Deterministic Multicore Replay System with Refining Techniques
Shing-Yu Chen, Chi-Neng Wen, Geng-Hau Yang, Wen-Ben Jone, Tien-Fu Chen

T81 Automatic Delay Analysis and Throughput Optimization in Data-Driven Asynchronous Circuits
Hongguang Ren, Zhiying Wang, Doug Edwards, Wei Shi

T91 A Low-Power and Small-Area All-Digital Spread-Spectrum Clock Generator in 65nm CMOS Technology
Ching-Che Chung, Duo Sheng, and Wei-Da Ho

T101 Port Assignment for Interconnect Reduction in High-Level Synthesis
Hao Cong, Waseda University; Song Chen, Waseda University; Takeshi Yoshimura, Waseda University

W21 A Monolithic 1.85GHz 2-stage SiGe Power Amplifier with Envelope Tracking for Improved Linear Power and Efficiency
Ruili Wu, Yan Li, Jerry Lopez, and Donald Y. C. Lie

HOTEL RESERVATION

Because the conference falls in a peak season for visiting Hsinchu, the hotels do not guarantee room availability even though a block of rooms has been reserved for VLSI-DAT participants at the following hotels near the Symposium venue: the Ambassador Hotel, the Puyisy Business Hotel, and Lakeshore Hotel Metropolis I. The hotel reservation form is now available for download from the conference website. Please make your hotel reservation NO LATER THAN MARCH 30, 2012 to qualify for a room under our special rates.

The Symposium sessions will be held at the Ambassador Hotel, Hsinchu. Both the Puyisy Business Hotel and Lakeshore Hotel Metropolis I are located within a 15-minute walk of the Ambassador Hotel, while a shuttle service is available from Lakeshore Hotel Hsinchu to the conference site.


Hotel, Room, Rate (NTD)

1. Ambassador Hotel Hsinchu 國賓大飯店 (charges including breakfast)
   Deluxe Single          4,650+10%
   Deluxe Twin            5,100+10%
2. Lakeshore Hotel Metropolis I 煙波大飯店都會一館 (charges including breakfast)
   Deluxe Single          2,800 (Net)
   Deluxe Twin            3,220 (Net)
3. Puyisy Business Hotel 普邑斯商務旅館 (charges including breakfast)
   Exquisite Single       1,400 (Net)
   Deluxe Twin            1,800 (Net)
4. Lakeshore Hotel Hsinchu 煙波大飯店湖濱本館 (charges including breakfast and shuttle**)
   New Chalucet Single    3,000 (Net)
   New Chalucet Twin      3,800 (Net)

All rates are subject to change without notice, and the amount in US dollars may change with currency fluctuation (the exchange rate in January 2012 is around USD1 to TWD30). All major credit cards are accepted. Reservations are accepted by fax only.

** The free shuttle service is available from Lakeshore Hotel Hsinchu to the conference site (one way), with only one service daily at 8:00am from 23 to 25 April, 2012. For more details, please contact the Front Desk for assistance.

For more information about accommodations and room reservations for each hotel, please contact:

The Ambassador Hotel Hsinchu 新竹國賓大飯店
Tel: +886-3-515-1111
Fax: +886-3-515-1216
Address: No.188 Sec 2, Chung-Hua Road, Hsin Chu, Taiwan
E-mail: reserv@ambassador-hsinchu.com.tw; rsbn@ambassador-hsinchu.com.tw
Website: www.ambassadorhotel.com.tw
Transport:
1. Airport pickup service TWD2,000 (Mercedes)
2. Taxi: from the Taoyuan Airport terminal taxi stop to the hotel costs TWD1,500 (estimated)

Puyisy Business Hotel 普邑斯商務旅館
Tel: +886-3-542-8681
Fax: +886-3-542-5827
Address: No.193 Min Chuan Road, Hsin Chu, Taiwan
E-mail: puyisy.htl@msa.hinet.net
Website: http://www.puyisy.com
Transport:
1. Airport pickup service TWD1,300 in cash (2000cc Limo or up)
2. Taxi: from the Taoyuan Airport terminal taxi stop to the hotel costs TWD1,500 (estimated)
3. The conference site is a 16-minute walk from the Puyisy Business Hotel.

Lakeshore Hotel Metropolis I 煙波大飯店都會一館
Tel: +886-3-542-7777
Fax: +886-3-612-1237
Address: No.177, Min Sheng Rd. Hsin Chu, Taiwan
E-mail: metropolis@lakeshore.com.tw
Website: http://www.lakeshore.com.tw/english/metropolis/metropolis.htm


Lakeshore Hotel Hsinchu 煙波大飯店湖濱本館

REGISTRATION/INFORMATION CENTER

Registration material and the Symposium Proceedings will be available at the Registration and Information Center, located in the Grand Ball Room on the 10th floor of the Ambassador Hotel.

The Registration and Information Counter will be open as follows:
Monday, April 23       8:00 a.m. - 5:00 p.m.
Tuesday, April 24      8:00 a.m. - 5:00 p.m.
Wednesday, April 25    8:00 a.m. - 3:30 p.m.

COCKTAIL RECEPTION

A cocktail reception for Symposium participants will be held on Tuesday, April 24, from 6:30 p.m. to 8:00 p.m. One admission to the cocktail reception is included in the conference registration fee. Spouses are welcome and free.

IDENTIFICATION BADGE

Badges are required for admittance to all technical sessions and the cocktail reception. Please wear your badge at all times.

SPEAKER'S PREVIEW ROOM

The preview room for all speakers is located in the Mezzanine C Room on the 13th floor of the Ambassador Hotel. We encourage all speakers to check their files and prepare there in advance.

Open Hours:
Monday, April 23       8:30 a.m. - 5:00 p.m.
Tuesday, April 24      8:30 a.m. - 5:00 p.m.
Wednesday, April 25    8:30 a.m. - 3:30 p.m.

INTERNET ACCESS

Wireless LAN will be available at the conference site. Users should bring their own laptops and 802.11 b/g wireless LAN cards.

ABOUT TAIWAN

I. Visa

30-day visa-free privileges are afforded to citizens of Australia, the Republic of Korea, Malaysia, Singapore, and the U.S.A.

90-day visa-free privileges are afforded to citizens of Austria, Belgium, Bulgaria, Canada, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Japan, Latvia, Liechtenstein, Lithuania, Luxembourg, Malta, Monaco, the Netherlands, New Zealand, Norway, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, Switzerland, U.K., and Vatican City State.

For other countries, visas are required to enter Taiwan and can be obtained from R.O.C. embassies, consulates or designated representative offices in visitors' native countries. Tourist visas are recommended. Foreign nationals may obtain a tourist visa to visit the Republic of China for purposes of sightseeing, business, family visits, study or training, medical treatment, or other legitimate activities. Tourist visas may be single- or multiple-entry. In addition, a passport valid for at least 6 months is required for anyone who plans to visit Taiwan.

For more details on visa applications, please visit the website of the Bureau of Consular Affairs, Ministry of Foreign Affairs, Republic of China (Taiwan): http://www.boca.gov.tw/mp.asp?mp=2

Visa application for holders of a Mainland China passport:
Since it may take more than 2 months for visitors from China to obtain an entry visa to Taiwan, the conference registrar is now collecting a list of attendees (including authors and speakers) from China. If any of your paper presenters or team members is from the P.R.C. (or holds a passport issued by the P.R.C. government) and wants to attend the conference, please immediately contact the conference registrar, Ms. Yvonne Chen (YvonneChen@itri.org.tw), for more information. If you need other documentation or help with your visa application, please contact vlsidat@itri.org.tw for assistance.


II. Transportation

Hotel Limousine
Limousines from the Taiwan Taoyuan International Airport to the Ambassador Hotel, Hsinchu are available at around TWD2,000 (USD67). Please indicate your needs on the hotel reservation form for further arrangement.

Airport Taxi
Taxis queue outside the Arrival Halls of both terminals. To ensure the safety of passengers, only taxis approved by the Aviation Police Bureau after a strict qualification procedure are cleared to operate in Taoyuan Airport. Taxis are available at Taoyuan Airport around the clock.

The airport taxi charges according to the meter plus a 50% surcharge. The typical cost of a taxi ride to Hsinchu is around TWD1,500 (USD50).

Note: To avoid communication problems with taxi drivers, please have a note of your destination in Chinese (see pages 8 - 9 of this Advance Program) to show the driver.

III. Helpful Tips

Climate
April is one of the most pleasant months in Taiwan. The climate in April is generally mild, with temperatures of 20-30°C (68-86°F) in Hsinchu.

Currency
The New Taiwan Dollar (TWD) is the national currency. Paper currency comes in $2,000, $1,000, $500, $200, and $100 denominations, while coins come in $50, $10, $5, and $1 denominations.

The currency exchange rate in January 2012 is around TWD30 to USD1. Foreign currencies can be exchanged at the Taiwan Taoyuan International Airport, government-designated banks and hotels. Please visit the Currency Converter at http://www.xe.com/

Major credit cards are accepted in major hotels and stores, and travelers checks may be cashed at foreign-exchange banks, some tourist-oriented businesses and most international tourist hotels.

Electricity
110 Volts / 60Hz A.C.

Languages
Mandarin Chinese is the official language in Taiwan, but English is familiar to the younger generation, especially students, and to those engaged in the export trading, tourism, and hotel industries. Most taxi drivers do not speak English.


Time Zone
Taiwan is 8 hours ahead of Greenwich Mean Time (GMT + 0800).

Tipping
A 10% service charge is automatically added to room rates and meals. Other tipping is not expected.

Telecommunication Services
The local rate for public pay phones is NTD1 for two minutes. Most public phones provide international call services. English and Japanese speaking switchboard operators are employed at most hotels. Please dial the area code (02) for Taipei and (03) for Hsinchu first if you are calling from the Taiwan Taoyuan International Airport.

Useful Phone Numbers

Taiwan Taoyuan International Airport Tourist Service Center
http://www.taoyuan-airport.com/english/index.jsp
Terminal 1 Service Counter: +886-3-398-2143
Terminal 2 Service Counter: +886-3-398-3274

Tourism Bureau's Tourist Information
http://www.tbroc.gov.tw/
Tel: +886-2-2349-1500

Bureau of Consular Affairs, Ministry of Foreign Affairs
http://www.boca.gov.tw/mp?mp=2
Tel: +886-2-2343-2888

International Phone Assistance: 100
Emergency Service: 110

For more information about Taiwan, please refer to http://www.taiwan.net.tw/.

INQUIRIES

Requests for information about the Symposium should be directed to:
Ms. Elodie Ho / Doris Chiang
Secretariat of VLSI-DAT
Room 719, Bldg. 51, No. 195, Sec. 4, Chung Hsing Rd.,
Chutung, Hsinchu, Taiwan 310
Tel: +886-3-591-9039
Fax: +886-3-582-0420
E-mail: vlsidat@itri.org.tw


VENUE MAP/ROOM ASSIGNMENT

[Floor map, 10th floor: Domestic Registration, Foreigner/VIP Registration, Ballroom A, Ballroom B, Ballroom C, Ballroom D]
[Floor map, 11th floor: Mezzanine A+B]
[Floor map, 13th floor: Mezzanine C]

Location and Events

Ballroom A+B
  4/23 Joint Opening and Joint Plenary Session

Ballroom A
  4/23 Joint Session 1
  4/24 TSA Session 4, Special Talk, Joint Session 2 and Panel Discussion
  4/25 TSA Short Course I and DAT Industry Session I and II

Ballroom B
  4/23 DAT Special Session I, Session M1 and M2
  4/24 TSA Session 5, Session 7 and Cocktail Reception
  4/25 TSA Session 9 and Short Course II

Ballroom C
  4/23 TSA Session 3
  4/24 TSA Session 6 and Session 8
  4/25 TSA Session 10 and DAT Tutorial I

Hallway-10F
  4/24 DAT Poster Session

Ballroom D
  4/23 TSA Session 2
  4/24 DAT Session T1, T3, T5, T7 and T9
  4/25 DAT Special Session II and Tutorial II

Mezzanine A+B
  4/24 Session T2, T4, T6, T8 and T10
  4/25 DAT Session W1, W2 and Tutorial III

Mezzanine C
  Author Preview Room


CONFERENCE SCHEDULE

Opening & Award Ceremony
Monday, April 23, 9:00 AM ~ 9:40 AM
Ballroom A+B

TSA Symposium Chair and Co-chairs:
Michel Brillouët, CEA-Léti Minatec, France
Lewis Terman, IBM, USA
Yi-Jen Chan, ITRI, Taiwan

DAT General Co-chairs:
Shyh-Jye Jou, National Chiao Tung University, Taiwan
Patrick Yue, UC Santa Barbara, USA

2011 VLSI-TSA Best Student Paper Award

New Tellurium Implant and Segregation for Contact Resistance Reduction and Single Metallic Silicide Technology for Independent Contact Resistance Optimization in n- and p-FinFETs
Shao-Ming Koh, National University of Singapore, Singapore
Co-Authors: Eugene Yu Jin Kong, Bin Liu, Chee-Mang Ng, Pan Liu, Zhi-Qiang Mo, Kam-Chew Leong, Ganesh S. Samudra, and Yee-Chia Yeo
International Symposium on VLSI Test, Systems and Application, 25-27 April, 2011
Page(s): 74 - 75
Digital Object Identifier: 10.1109/VTSA.2011.5872238

2011 VLSI-DAT Best Paper Award

Important Test Selection For Screening Potential Customer Returns
Nik Sumikawa, Dragoljub Gagi Drmanac, LeRoy Winemberg, Li-C. Wang, Magdy S. Abadir, University of California, Santa Barbara, USA.
VLSI Design, Automation and Test, 2011 VLSI-DAT '11. International Symposium on 25-27 April 2011
Digital Object Identifier: 10.1109/VDAT.2011.5783603
Publication Year: 2011

A Macro-Layer Level Fully Parallel Layered LDPC Decoder SOC for IEEE 802.15.3c Application
Zhixiang CHEN, Xiao PENG, Xiongxin ZHAO, Qian XIE, Leona OKAMURA, Dajiang ZHOU, Satoshi GOTO, Waseda University, Japan.
VLSI Design, Automation and Test, 2011 VLSI-DAT '11. International Symposium on 25-27 April 2011
Digital Object Identifier: 10.1109/VDAT.2011.5783634
Publication Year: 2011

PROGRAM REPORT

VLSI-TSA program report


Technical Program Chair:
(Charles) Kin P. Cheung, NIST, USA

VLSI-DAT program report
Technical Program Co-chairs:
An-Yeu Wu, National Taiwan University, Taiwan
Li-C. Wang, University of California Santa Barbara, USA

JOINT PLENARY SESSION
Monday, April 23, 9:40 AM ~ 12:30 PM
Ballroom A+B

TSA Symposium Chair and Co-chairs:
Michel Brillouët, CEA-Léti Minatec, France
Lewis Terman, IBM, USA
Yi-Jen Chan, ITRI, Taiwan

DAT General Co-chairs:
Shyh-Jye Jou, National Chiao Tung University, Taiwan
Patrick Yue, UC Santa Barbara, USA

9:40 AM
JK1 Advances in Computing
Ya-Qin Zhang
Corporate Vice President of Microsoft and Chairman of Microsoft Asia-Pacific Research and Development Group (ARD)

10:30 AM Break

10:50 AM
JK2 TBD
Philippe Magarshack
VP of Technology R&D Group, STMicroelectronics

11:40 AM
JK3 TBD

12:30 PM Lunch

JOINT SESSION I: Smart Handheld Platform
Monday, April 23, 1:30 PM ~ 6:30 PM
Ballroom A

Co-chairs:
Carlos Mazuré, SOITEC, France
Bor-Sung Liang, MediaTek Inc., Taiwan

1:30 PM
JS11 TBD


2:10 PM
JS12 TBD

2:50 PM
JS13 TBD

3:30 PM Break

3:50 PM
JS14 Touch Techniques in Smart Handheld Device
Lin Lin
President of Anxar Touch Tech. Co., Ltd.
Open operating systems, innovative user interfaces and powerful integrated chipsets have totally changed the perception of the touch component in smart handheld devices. In the course of this evolution, the projected capacitive touch panel rose to prominence with Apple's mobile devices. In the wake of Apple's success, various other touch technologies have appeared. Which of these touch technologies will become the mainstream is an interesting and important question for both industry and academia. The special session is expected to present an overview of emerging touch technologies as well as focus on some of the significant ones.

4:30 PM
JS15 Google's C/C++ Toolchain for Smart Handheld Devices
Doug Kwan
Google Inc., USA
Smart handheld devices are ubiquitous today and software plays an important role on them. Therefore a compiler and related tools can improve devices by generating efficient, compact and secure code. In this paper, we share our experience of applying various compilation techniques at Google to improve software running on smart handheld devices, using Google's mobile platforms as examples. At Google we use the GNU toolchain for generating code on different platforms and for conducting compiler research and development. We have developed new techniques and added features and functionalities to the GNU tools. Some of these results are now used for smart handheld devices.

5:10 PM
JS16 The 2012 ARM Powered Compute Subsystem – Delivering the Smart Handheld Platform
Tim Whitfield
ARM Ltd., Taiwan
Driven by increased complexity, cost and shorter time to market, subsystem re-use is now adopted by most of the major semiconductor companies. This presentation will show a "2012 Compute Subsystem" for application processors targeting low-power, screen-based devices such as smartphones and tablets, giving a detailed breakdown of the key hardware and software technology employed. The presentation will outline the development decisions made for the 2012 compute subsystem using Cortex-A15 and Mali-600 processors.


5:50 PM
JS17 TBD


SPECIAL SESSION I: High-Power Green Electronics
Monday, April 23, 1:30 PM ~ 2:50 PM
Ballroom B
Chair: Chung J. Kuo, Delta, Taiwan

1:30 PM
SS11 Silicon Carbide (SiC) Devices opens the New Era
Hidemi Takasu
Rohm, Japan
The expectations for Silicon Carbide (SiC) devices in advanced power electronics applications for energy savings continue to grow. Since SiC has a breakdown electric field ten times higher than that of Silicon, SiC devices have yet to achieve ideal performance levels. In addition, new assembly technologies are necessary to bring out their maximum potential. Trench Schottky diodes and double-trench MOSFETs are demonstrated to realize the potential of SiC. These devices succeeded in improving performance by reducing the internal electric field. Trench Schottky diodes are able to reduce the forward voltage (VF) drop, and double-trench MOSFETs show extremely low on-resistances. Power modules using SiC devices demonstrate high temperature operation and high power density.

2:10 PM
SS12 TBD

2:50 PM Break

Session M1: Low Power Memory Design
Monday, 23 April, 3:10 PM ~ 4:30 PM
Ballroom B

Co-Chairs:
Gene C.H. Chuang, Industrial Technology Research Institute, Taiwan
Meng-Fan Marvin Chang, National Tsing Hua University, Taiwan

This session covers low-power and low-voltage memory design. Two papers address the 6T SRAM array: one considers BTI degradation, a major reliability concern in nano-scale CMOS VLSI design, and the other presents a read stability and write margin characterization scheme. The other two papers present a slew rate self-adjusting 2xVDD output buffer with PVT compensation and a built-in row-controlled dynamic voltage scaling near-/sub-threshold FIFO memory.

3:10 PM
M11 Embedded SRAM Ring Oscillator for In-Situ Measurement of NBTI and PBTI Degradation in CMOS 6T SRAM Array
Ming-Chien Tsai¹, Yi-Wei Lin¹, Hao-I Yang¹, Ming-Hsien Tu¹, Wei-Chiang Shih², Nan-Chun Lien¹,², Kuen-Di Lee², Shyh-Jye Jou¹, Ching-Te Chuang¹ and Wei Hwang¹
¹ National Chiao Tung University, Taiwan
² Faraday Technology Corporation, Taiwan


One of the major reliability concerns in nano-scale CMOS VLSI design is the time-dependent Bias Temperature Instability (BTI) degradation. Negative Bias Temperature Instability and Positive Bias Temperature Instability (NBTI and PBTI) weaken MOSFETs over usage/stress time. We present an embedded 6T SRAM ring oscillator structure which provides in-situ measurement/characterization capability of cell transistor degradation induced by bias temperature instability. The viability of the embedded ring oscillator odometer and the impact of bias temperature instability are demonstrated in 55nm standard performance CMOS technology.

3:30 PM
M12 A 2kb Built-In Row-Controlled Dynamic Voltage Scaling Near-/Sub-Threshold FIFO Memory for WBANs
Wei-Hung Du, Po-Tsang Huang, Ming-Hung Chang, and Wei Hwang
National Chiao Tung University, Taiwan
Because the energy source is limited, ultra-low power design is essential in energy-constrained SoCs. In this paper, a 2kb built-in row-controlled dynamic voltage scaling (DVS) FIFO memory is proposed that adapts the operating voltage in the near-/sub-threshold regions for the WBAN (wireless body area network) system. The row-based DVS provides fine-grained power switch control for each sub-block. Moreover, only one sub-block is operated in the typical mode, while the other sub-blocks are operated in the low-power mode and cut-off mode to save power. Based on TSMC 65nm technology, the proposed DVS FIFO can achieve 47.8% power saving.

3:50 PM
M13 An All-Digital Read Stability and Write Margin Characterization Scheme for CMOS 6T SRAM Array
Yi-Wei Lin¹, Ming-Chien Tsai¹, Hao-I Yang¹, Geng-Cing Lin¹, Shao-Cheng Wang¹, Ching-Te Chuang¹, Shyh-Jye Jou¹, Wei Hwang¹, Nan-Chun Lien¹,², Kuen-Di Lee² and Wei-Chiang Shih²
¹ National Chiao Tung University, Taiwan
² Faraday Technology Corporation, Taiwan
We present an all-digital Read Stability and Write Margin (WM) characterization scheme for CMOS 6T SRAM array. The scheme measures the cell Read Disturb voltage (Vread) and cell Inverter Trip voltage (Vtrip) in the SRAM cell array environment. Measured voltages are converted to frequency with a Voltage Controlled Oscillator (VCO) and a counter-based digital read-out. A 512Kb test macro is implemented in UMC 55nm 1P10M Standard Performance (SP) CMOS technology. Monte Carlo simulations validate the accuracy of the Vread and Vtrip measurement scheme, and post-layout simulations show that the resolution of the digital read-out scheme is 0.167mV/bit.

4:10 PM
M14 A Slew Rate Self-Adjusting 2xVDD Output Buffer with PVT Compensation
Chih-Lin Chen, Hsin-Yuan Tseng, Ron-Chi Kuo, Chua-Chin Wang
National Sun Yat-Sen University, Taiwan


A novel PVT (Process, Voltage, Temperature) detection and compensation technique isproposed to automatically adjust the slew rate of a 2xVDD output buffer. The thresholdvoltage (Vth) of PMOSs and NMOSs varying with process, voltage, and temperaturedeviation could be detected, respectively. The proposed design is implemented using atypical 90 nm CMOS process to justify the performance. By adjusting output currents, theslew rate of output signal could be compensated over 38% and the maximum data rate withcompensation is 345 MHz.4:30 PM BreakSession M2: ADC/ DACMonday, 23 April, 4:50 PM~6:10 PMBallroom BCo-Chairs:Tai-Cheng Lee, National Taiwan University, TaiwanPing-Hsuan Hsieh, National Tsing Hua University, TaiwanThe first paper uses comparators with each conversion based on digitally controlled thresholdlevels. It’s an interpolated subranging structure for high speed and low power. The secondlow power SAR ADC uses a partially asymmetric tri-level CDAC. The third low powertechniques for pipeline ADC is scaling the op-amp power. The last paper presents a highspeed DAC with Dynamic matching and calibrating techniques.4:50 PMM21 A 6b, 1GS/s, 9.9mW Interpolated Subranging ADC in 65nm CMOSTakumi Danjo, Masato Yoshioka, Masayuki Isogai, Masanori Hoshino, and SanrokuTsukamotoFujitsu Microelectronics Solutions Limited., Yokohama, JapanA 6b 1GS/s subranging ADC with interpolating technique, which has neither a referenceresistor ladder nor redundant comparators is presented. Each comparator operates twice eachcycle, during coarse and fine decision, for a conversion based on digitally controlledthreshold levels. The threshold levels at these decisions are different, so these are adjusted inforeground calibration. The die area is 0.04mm 2 including on-chip digitally threshold controlcircuit, and power consumption is 9.9mW. 
SNDR is 32.8 dB is achieved at 1GS/s.5:10 PMM22 A 9-bit 100MS/s Tri-level Charge Redistribution SAR ADC with AsymmetricCDAC ArrayXiaolei Zhu 1 , Yanfei Chen 2 , Sanroku Tsukamoto 2 and Tadahiro Kuroda 11 Keio University, Japan.2 Fujitsu Laboratories Ltd., JapanTo improve the ADC performance in light of area and energy efficiencies, a partiallyasymmetric tri-level CDAC design technique is proposed to save the silicon cost and poweras well. Combining the asymmetric CDAC approach with the tri-level charge redistributiontechnique makes it possible for the SAR ADC to achieve a 9-bit resolution with 4-bit + 3-bit20


split capacitor arrays. A 9-bit SAR ADC with CDAC calibration has been implemented in a65nm CMOS technology and consumes 1.26 mW from a 1.2-V supply. The staticperformance of +0.4/-0.5 LSB DNL and +0.5/-0.7 LSB INL is achieved. The ADC occupiesan active area of 0.1*0.13 mm 2 and archieves a FOM of 45fJ/conv.-step.5:30 PMM23 A 10-bit 200-MS/s Reconfigurable Pipelined ADCChia-Chi Ho and Tai-Cheng LeeNational Taiwan University, TaiwanA reconfigurable pipelined analog-to-digital converter (ADC) has been fabricated in a 90-nmCMOS technology. The analysis and design of the reconfigurable architecture are presented.Based on various system performance requirements, the suitable configuration can beadopted. The chip is designed for all the configurations under different bandwidthrequirements to prove that the reconfiguration will save significant power consumption.Simulated output codes of the designed ADC exhibit a SNDR of 59.0 dB at 200 MHz. Theactive area of the design is 0.27 mm 2 , and the power consumption at full speed is 26mWfrom a 1V supply voltage.5:50 PMM24 A 14-bit 200MS/s Current-Steering DAC Achieving over 82dB SFDR withDigitally-Assisted Calibration and Dynamic Matching TechniquesJen-Huan Tsai, Yen-Ju Chen, Yan-Fong Lai, Meng-Hung Shen and Po-Chiun HuangNational Tsing Hua University, TaiwanThis paper presents a digitally-assisted two-step linearity enhancement strategy for a 14-bitcurrent-steering DAC. The static nonlinearity introduced by the wittingly small currentsources for area reduction is firstly calibrated by evaluating and compensating the offsetcurrent between the average of 6-bit MSB array and the sum of 8-bit LSB array. An uniquedynamic element matching algorithm is then applied to suppress the residual nonlinearitieswithout introducing too many transient glitches at the output. 
A 14-bit 200MS/s DAC is fabricated to verify the proposed scheme and successfully demonstrates a decent linearity of 82.5dB SFDR within an area of 0.45mm² in a 0.18um CMOS process.

PLENARY SESSION I
Tuesday, April 24, 9:00 AM ~ 9:50 AM
Ballroom A
Technical Program Co-chairs:
An-Yeu Wu, National Taiwan University, Taiwan
Li-C. Wang, University of California Santa Barbara, USA

9:00 AM
K1 Ambient Electronics and Ultra-Low Power LSI Design
Takayasu Sakurai
University of Tokyo, Japan
Ambient electronics is emerging, where electronics submerges into environments and invisibly helps people lead productive yet safe and secure lives. The new electronics will need certain attributes: ultra-low power, unwired operation and interfaces to the physical world. In this context, the talk will cover ultra-low power LSI design, 3D integration, and large-area electronics. For ultra-low power design, the ultra-low voltage approach, which is a battle against variability, will be described. Variability of threshold voltage tends to increase as we further scale MOSFETs, and within-die VTH variation is becoming a real headache. The talk will cover recent advances in ultra-low voltage design and the application of L-/C-coupled communication technology in 3D chip stacks. Recent advances in large-area electronics, which help electronics interact with the physical world, are also touched upon.

9:50 AM Break

Session T1: Hardware Architecture of Multimedia Systems
Tuesday, 24 April, 10:10 AM~11:10 AM
Ballroom D
Co-Chairs:
Jiun-In Guo, National Chiao Tung University, Taiwan
Chih-Peng Fan, National Chung Hsing University, Taiwan
Three papers on hardware architecture design for multimedia systems are included in this session. Efficient memory access has become the major issue in current systems, and these three works address it from different points of view. In the first paper, a transform engine with low memory bandwidth is designed for JPEG-XR, and the specification of HD1080@60fps is achieved. The other two papers are related to video systems. In the second paper, the memory bandwidth is reduced with a lossless embedded compression codec in an HD video decoder.
In the third paper, a hardware-efficient true motion estimator is proposed for full-HD video systems, where both algorithm and hardware optimizations are considered.

10:10 AM
T11 Low Bandwidth HD1080@60FPS JPEG-XR Transform Design
Sheng-Wei Fan¹, Jia-Wai Chen², and Jiun-In Guo³
¹ Industrial Technology Research Institute, Taiwan
² National Chung-Cheng University, Taiwan
³ National Chiao-Tung University, Taiwan
This paper proposes a low bandwidth HD1080@60fps JPEG-XR transform design to solve the huge memory access problem induced by the Photo Overlap Transform (POT) in JPEG-XR. Three low bandwidth techniques are proposed to decrease the amount of data access in the overlapping area between MBs when doing the POT, especially in the POT-2
mode. The proposed design can reduce memory bandwidth by 72% compared to the conventional method. At an operating frequency of 250MHz, the proposed design supports real-time processing of videos with resolution up to HD1080@60fps.

10:30 AM
T12 A Lossless Embedded Compression Codec Engine for HD Video Decoding
Liang-Chi Chiu and Tian-Sheuan Chang
National Chiao Tung University, Taiwan
External memory bandwidth is the most critical issue of video coding for high performance or low power concerns. To solve this issue, this paper presents a simple yet efficient lossless embedded compression engine. The engine uses adaptive differential patterns to reduce data redundancy and encodes the residuals with simple Golomb-Rice coding. For data that cannot be compressed, the engine adopts exceptional case handling to ensure data integrity and minimize the extra storage space. With the above simple mechanism, the memory bandwidth can be reduced by 56% to 66% for 1080P video with QP at 10 and 28, respectively. The hardware design in a 0.18um CMOS process can easily support 1080P@30fps video decoding with only a 20K gate count at 93.3MHz.

10:50 AM
T13 Hardware-Efficient True Motion Estimator Based on Markov Random Field Motion Vector Correction
Fu-Chen Chen¹, Yung-Lin Huang¹, and Shao-Yi Chien²
¹ Graduate Institute of Networking and Multimedia
² National Taiwan University, Taiwan
True motion estimation is a well-known technique to find the true object motion trajectory in a video. However, if the target frame size becomes large, new design challenges are introduced. We develop a true motion estimator for video systems with Full-HD resolution. The PSNR evaluation shows that our algorithm is better than three other existing algorithms. For hardware implementation, we use Verilog HDL and synthesize it with Synopsys Design Compiler and the UMC 90nm cell library.
The implementation works at a 300MHz frequency and achieves in total a 76% bandwidth reduction, a 66% cycle reduction and an 88% on-chip SRAM reduction with the proposed ping-pong two-way scheduling and MV grouping techniques.

Session T2: Mixed-Signal Techniques
Tuesday, 24 April, 10:10 AM~11:10 AM
Mezzanine A+B
Co-chairs:
Chih-Cheng Hsieh, National Tsing Hua University, Taiwan
Wei Liang Lin, National Chung Hsing University, Taiwan
We begin this session with a paper on a training algorithm from Intel, and continue with a CMOS image sensor paper. We conclude this session with a fast-locking phase-locked loop paper.

10:10 AM
T21 Intel® Core i5/i7 QuickPath Interconnect Receiver Clocking Circuits and
Training Algorithm
Nasirul Chowdhury, Jeff Wight, Chris Mozak, Nasser Kurd
Intel Corporation, USA
This paper describes the forwarded clock amplifier (FCA), phase interpolator (PI) and training algorithm used in receiver clocking of QuickPath Interconnect™ (QPI) in Intel® Core microprocessors, implemented in 45nm and 32nm process technologies. QPI is used for communication among processors/chipsets and delivers up to 25.6GB/s BW per port at 6.4GT/s. The FCA has a built-in duty cycle corrector (DCC). Two PIs are used for each receiver lane to generate clocks that capture odd and even data independently. The novel training and retraining algorithm trains each PI for its corresponding data eye, eliminating the need for any duty cycle correction of the PI output while maximizing the eye margin.

10:30 AM
T22 Time-Delay Integration Readout with Adjacent Pixel Signal Transfer for CMOS Image Sensor
Kuo-Wei Cheng, Chin Yin, Chih-Cheng Hsieh, Wen-Hsu Chang*, Hann-Huei Tsai*, and Chin-Fong Chiu*
National Tsing Hua University, Hsinchu, Taiwan
*National Chip Implementation Center, Taiwan
This paper presents a time delay and integration (TDI) structure for CMOS image sensors (CIS) with adjacent pixel signal transfer (APST). The CCD-like TDI function is achieved in CIS by the proposed APST without additional in-pixel devices and with minimum routing effort. The in-pixel integrated signal is transferred to the adjacent pixel and summed up by an off-pixel column-shared unity-gain buffer. A 128x6 pixel array with 6x6μm² pixel size has been designed and fabricated in TSMC 0.18μm 1P6M CIS technology, providing 6 TDI stages with a fill factor of 23.1%.
It achieves an SNR improvement of 13dB, a transfer efficiency of 99.6%, and a total power dissipation of 4.43 μW per column at 1.6K fps.

10:50 AM
T23 A Fast-Locking Phase-Locked Loop Using CP Control and Gated VCO
I-Ting Lee, Yun-Ta Tsai, and Shen-Iuan Liu
National Taiwan University, Taiwan
A fast-locking phase-locked loop (PLL) using a CP control and a gated voltage-controlled oscillator is presented. This PLL is fabricated in a 90nm CMOS technology. The measured locking time is 3.78us from 640MHz to 800MHz. This PLL consumes 3.8mW from a 1.2V supply and the active area is 0.0135mm².

11:10 AM Break

Session T3: Efficient Channel Codec Design
Tuesday, 24 April, 11:30 AM~12:30 PM
Ballroom D
Co-chairs:
Hsie-Chia Chang, National Chiao Tung University, Taiwan
Yeong-Luh Ueng, National Tsing Hua University, Taiwan
In this session, three cases of channel codec design and implementation are presented. The applications cover general communications, M2M communications, and NOR flash memories.

11:30 AM
T31 A Fully-Parallel Step-by-Step BCH Decoder over Composite Field for NOR Flash Memories
Yi-Hsun Chen, Chi-Heng Yang, and Hsie-Chia Chang
National Chiao Tung University, Taiwan
This paper presents a DEC BCH decoder for NOR flash memories to improve reliability. For the DEC BCH code, the decoding mechanism can be reduced to a simple checking equation from the step-by-step algorithm. It is implemented in the proposed design with a fully-parallel architecture to meet the low access time requirement of NOR flash memories. To reduce the complexity overhead of the fully-parallel architecture, composite field arithmetic is applied to the whole decoder without using field conversion hardware. The synthesis results in 90 nm standard CMOS technology show that the latency is only 2.5 ns and the gate count is 23.2K.

11:50 AM
T32 Efficient Architecture for Reed-Solomon Decoder
Yung-Kuei Lu and Ming-Der Shieh
National Cheng Kung University, Taiwan
An efficient Reed-Solomon (RS) decoder design based on the reformulated inversionless Berlekamp-Massey (RiBM) algorithm is presented in this paper. Applying the developed control scheme and the simplified boundary cell, the resulting design can significantly reduce the hardware complexity and achieve a high throughput rate. Compared with related works, the proposed design has an advantage in area-time complexity.
With the TSMC 0.18um process, the experimental results reveal that the developed RS(255,239) decoder can operate at up to 425MHz and achieve a throughput rate of 3.4Gbps with a total gate count of 12,668.

12:10 PM
T33 Large Set Construction of User Uplink Ranging Codes for M2M Applications
Xi-Rui Wang*, Hsi-Pin Ma*, Jen-Yuan Hsu†, Pang-An Ting†
* National Tsing Hua University, Hsinchu, Taiwan
† Industrial Technology Research Institute, Hsinchu, Taiwan
Support for a large number of devices or users is important in designing M2M communications. In this paper the authors propose a design for a large set of user uplink codes in M2M communications. Zero-correlation-zone (ZCZ) sequences and Gold sequences have been applied as spreading sequences in this code division multiple access (CDMA)-like system according to their different correlation characteristics. This system supports up to 256 simultaneous users, and the total synchronization time is about 200 ms under a reasonable chip rate of 2.5 Mcps. Compared with previous systems for the ranging process, it requires only 33% of the cycles of IEEE 802.16e and 16.7% of the cycles of IEEE 802.16m.
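
Editorial note on T32: the figures quoted in the T32 abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes 8-bit symbols (the usual choice for a length-255 RS code) and one symbol processed per clock cycle; neither assumption is stated in the abstract, and the function names are hypothetical.

```python
# Plausibility check for the RS(255,239) numbers quoted in the T32 abstract.
# Assumptions (not from the abstract): 8-bit symbols over GF(2^8) and a
# decoder datapath that accepts one symbol per clock cycle.

def correctable_symbols(n: int, k: int) -> int:
    """Number of symbol errors an RS(n, k) code can correct: (n - k) // 2."""
    return (n - k) // 2

def code_rate(n: int, k: int) -> float:
    """Fraction of each codeword that carries information symbols."""
    return k / n

def throughput_bps(clock_hz: float, bits_per_symbol: int,
                   symbols_per_cycle: int = 1) -> float:
    """Raw symbol throughput in bits per second."""
    return clock_hz * bits_per_symbol * symbols_per_cycle

if __name__ == "__main__":
    n, k = 255, 239
    print("correctable symbol errors:", correctable_symbols(n, k))
    print("code rate:", round(code_rate(n, k), 4))
    print("throughput at 425 MHz (Gbps):", throughput_bps(425e6, 8) / 1e9)
```

Under these assumptions, 425 MHz times 8 bits per symbol gives 3.4 Gbps, which matches the quoted throughput, and the code corrects up to 8 symbol errors per codeword.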


Session T4: Analog Techniques
Tuesday, 24 April, 11:30 AM~12:30 PM
Mezzanine A+B
Co-chairs:
Tsung-Heng Tsai, National Chung Cheng University, Taiwan
Hao-Chiao Hong, National Chiao Tung University, Taiwan
This session starts with the design and analysis of a low voltage regulated cascode trans-impedance amplifier. Then, a new design of a low-standby-leakage ESD clamp circuit is presented. Finally, the session is concluded by a high-speed Gm-C filter.

11:30 AM
T41 A 4.9-mW 4-Gb/s Single-to-Differential TIA with Current-Amplifying Regulated Cascode
Tzon-Tzer Lu, Hua-Chin Lee, Chao-Shiun Wang and Chorng-Kuang Wang
National Taiwan University, Taiwan
This paper presents the design and analysis of a low voltage regulated cascode trans-impedance amplifier (RGC-TIA). The current combining technique in the RGC stage reuses the gain of the auxiliary amplifier for current amplification, which eliminates the current loss in the CG-CS RGC input stage and thus improves the overall TIA gain performance. The proposed low phase error active balun is also adopted to increase the immunity to common-mode noise and relax the design of the following LA. This TIA draws only 3.77 mA from 1.3 V. With a maximum 4-Gb/s data rate, this work achieves a trans-impedance gain of 61.4 dBΩ and provides differential operation within 10 degrees phase difference.

11:50 AM
T42 New Design on 2×VDD-Tolerant Power-Rail ESD Clamp Circuit with Low Standby Leakage in 65nm CMOS Process
Chih-Ting Yeh¹,² and Ming-Dou Ker²,³
¹ Industrial Technology Research Institute, Taiwan
² National Chiao-Tung University, Taiwan
³ I-Shou University, Taiwan
A 2×VDD-tolerant power-rail electrostatic discharge (ESD) clamp circuit with only thin gate oxide 1V devices and a silicon-controlled rectifier (SCR) as the main ESD clamp device has been proposed and verified in a 65nm CMOS process. The proposed power-rail ESD clamp circuit has an ultra-low standby leakage current, achieved by reducing the voltage drop across the gate oxide of the devices in the ESD detection circuit.
From the measured results, the proposed design with an SCR width of 50um can achieve 6.5kV human-body-model (HBM) and 300V machine-model (MM) ESD levels, and an ultra-low standby leakage current of 34.1nA at room temperature under the normal circuit operating condition with 1.8V bias.

12:10 PM
T43 40MHz Gm-C Filter with High Linearity OTA for Wireless Applications
Shin-Jye Hsu, Che-Yu Lu, and Chung-Chih Hung
National Chiao Tung University, Taiwan
This paper presents a fourth-order linear phase low-pass filter for high speed wireless systems. A high linearity operational transconductance amplifier (OTA) is proposed, where the characteristics of input attenuators and low source impedance are used to achieve high linearity. By using the proposed OTA as a building block, a fourth-order low-pass filter with a cutoff frequency of 40MHz is implemented. The filter is designed in a 0.18-μm CMOS process technology and the HD3 performance is about -53.4dB. The filter works with a 1.8V supply voltage and its power consumption is about 14.1mW.

12:30 PM Lunch

SPECIAL TALK
Tuesday, April 24, 12:40 PM ~ 1:20 PM
Ballroom A
Chair: Patrick Yue, University of California, Santa Barbara, USA
The Evolution of Fabless IC Industry in China: Past, Present, and Future
Ping K. Ko¹,² and C. Patrick Yue²,³
¹ Silicon Federation Inc., China
² Hong Kong University of Science and Technology, Hong Kong
³ University of California, Santa Barbara, USA
The importance of China’s role in the electronics industry, in terms of both market size and its criticality in the global supply chain, cannot be overstated. Over the past decade, China has emerged from a mere service provider of semiconductor assembly and test to a significant player in both the foundry service and fabless IC design businesses. This plenary paper will first provide a macro overview of the evolution of China’s fabless IC industry. The key statistical figures and momentous milestones and events during the 15-year history of China’s fabless IC industry will be reviewed.
Next, several case studies will be presented to highlight the unique dynamics of building and running fabless IC companies in China. Finally, the authors will speculate on the most significant trends in technology development, business models and market segments for fabless companies going into the next decade.

JOINT SESSION II: New Memory System
Tuesday, April 24, 1:30 PM ~ 4:30 PM
Ballroom A
Co-Chairs:
(Charles) Kin P. Cheung, NIST, USA
An-Yeu Wu, National Taiwan University, Taiwan

1:30 PM
JS21 Transforming memory systems: Optimizing for client value on emerging workloads
Kevin J. Nowka
IBM, USA
Computing systems are increasingly being transformed to better satisfy the demands of cloud computing, Big Data, and deep, sophisticated analytics applications. These applications are driving an explosion in the volume of data, an acceleration of the rate at which this data must be consumed, and an increase in the diversity of sources of data. Memory system architectures and designs are perhaps most affected by these changes in computing applications. This talk will explore the disruptive trends resulting from these new application spaces, the resulting demands placed on the memory systems, and the implications for existing and new memory technologies, traditional memory organization, and storage class memory.

2:10 PM
JS22 Emerging Memory Technology Perspective
Roberto Bez
Micron, Italy
Memories are gaining importance as they become fundamental in electronic system definition. Presently the industry standard technologies are still DRAM and Flash, which have been able to guarantee cost sustainability thanks to continuous scaling. NAND/DRAM miniaturization is becoming increasingly difficult, and moreover new applications require higher memory density and better performance. Therefore there are good opportunities for the alternative memory technologies, which will be presented and analyzed in this talk, to enter the market and replace/displace the standard ones.

2:50 PM Break

3:10 PM
JS23 Review of 3D High Density Storage Class Memory (SCM) Architecture
Brian Lee
CEO of Petari, USA
This paper describes a fundamental building block and architecture of future high density storage class memory products. A review of 3D architecture is presented. A detailed review of cross point select devices and memory cells is conducted, followed by their integration scheme. A critical review of the challenges is given and potential solutions are proposed.

3:50 PM
JS24 Computer Architecture for Die Stacking
Gabriel H. Loh
Advanced Micro Devices, Inc.
Three-dimensional die-stacking technologies are rapidly maturing, with intense research and development happening in the areas of manufacturing, EDA/CAD, test and yield improvement. When die-stacking technology has reached the point of economic viability for high-volume manufacturing, chip and system designers must have complete architectures ready to take advantage of this exciting new technology. The computer
architecture research area is showing great interest in 3D technology. In this talk, I will summarize some of the major directions that academic researchers are currently exploring, highlight some of these efforts, and discuss future opportunities in these and other areas of computer and system architectures. In particular, I will cover 3D opportunities for compute (including processor and application-specific accelerators), memory, and the integration of other technologies from a computer architecture perspective. I will also discuss areas where collaborations between computer architecture and other fields may provide further value for the entire die-stacking ecosystem.

4:30 PM Break

PANEL DISCUSSION: New Memory System
Tuesday, April 24, 4:50 PM ~ 6:10 PM
Ballroom A
Moderator: Nicky Lu, Etron Technology, Taiwan
Panelists:
Kevin Nowka, IBM, USA
Roberto Bez, Micron, Italy
Brian Lee, CEO of Petari, USA
Gabriel H. Loh, Advanced Micro Devices, Inc.

Session T5: SoC Design
Tuesday, 24 April, 1:30 PM~2:50 PM
Ballroom D
Chair: Bo-Cheng Charles Lai, National Chiao Tung University, Taiwan
Various SoC designs with topics covering communication, biomedical, 3D-system and hearing aids are presented in this session.

1:30 PM
T51 Design of a Real-time Software-Based GPS Baseband Receiver Using GPU Acceleration
Jyun-Cheng Wu¹, Lei Chen¹, and Tzi-Dar Chiueh¹,²
¹ National Taiwan University, Taiwan
² National Chip Implementation Center, Taiwan
In this study, we designed an energy-optimized real-time GPS software receiver that runs on desktop and server platforms. The proposed GPS software receiver moves most time-consuming operations to the GPU and achieves a considerable increase in performance as well as a substantial reduction in total energy consumption. The proposed receiver achieves a 7.5x speedup over the original CPU program. Furthermore, it reduces energy consumption by approximately 66% compared to a software receiver without GPU acceleration.

1:50 PM
T52 Universal Architecture Prototype for Patient-Centric Medical Environment
Yun-Yen ChenWu*, Hsi-Pin Ma*, Chaitali Biswas† and Dejan Markovic†
* National Tsing Hua University, Hsinchu, Taiwan
† Univ of California, Los Angeles, CA
Patient-centric medical environments are an important recent trend. In this paper, a health care system based on mobile phones for a patient-centric medical environment is proposed. The system provides remote monitoring and emergency alarms for both patients and doctors. Also, the extended mobility of the system allows patients to engage in some outdoor activities. The system transfers the recorded ECG and EEG signals from the patients’ sensors to the mobile phone by Bluetooth. The mobile phone then transfers the recorded data, along with location information from GPS or WiFi positioning, back to the medical cloud over 3G/GPRS/WiFi services.

2:10 PM
T53 An Efficient Memory Controller for 3D Heterogeneous Integration Platform
Yi-Jun Liu, Chih-Chyau Yang, Shih-Lun Chen, Chun-Chieh Chiu, Chun-Chieh Chu, Chien-Ming Wu, and Chun-Ming Huang
National Chip Implementation Center, Taiwan
This paper presents an efficient memory controller VLSI design for integrating a 3D heterogeneous MorPACK system. With the technique of sharing system-side signals, the pin count can be reduced by 41.9%, while the technique of sharing memory-side signals reduces the pin count by 19.2%. The total silicon area of the single-mode memory controllers is about 6.83 mm² in the TSMC 90 nm process.
By comparison, our proposed multi-mode memory controller has a total chip area of 3.1 mm², and the results show that the fabrication cost is reduced by 54.7%.

2:30 PM
T54 Design and Implementation of 18-band Quasi-ANSI S1.11 1/3-Octave Filter Bank for Digital Hearing Aids
Ching-Hao Lin, Kuo-Chiang Chang, Min-Hsun Chuang, Chih-Wei Liu
National Chiao Tung University, Taiwan
The ANSI S1.11 1/3-octave filter bank is suitable for hearing aids; however, the large group delay and the high computational complexity complicate matters considerably. In this paper, a 27-uW, 10-ms, 18-band quasi-ANSI S1.11 1/3-octave filter bank for processing 24 kHz audio is designed and implemented. For an 18-band digital hearing aid with a 24 kHz sampling rate, the proposed architecture has been implemented in UMC 90 CMOS technology and consumes only 73 uW. With voltage scaling, circuit-level simulation shows that the power consumption of the test chip is reduced to 27 uW, which is about 30% of that of the most energy-efficient design for digital hearing aids.

Session T6: Testing and Fault-Tolerant Techniques
Tuesday, 24 April, 1:30 PM~2:50 PM
Mezzanine A+B
Co-chairs:
Mango Chia-Tso Chao, National Chiao Tung University, Taiwan
Tong-Yu Hsieh, National Sun Yat-Sen University, Taiwan
This session addresses the emerging challenges in testing, fault-tolerant design, and yield enhancement. The first paper proposes a cross-die repair scheme for 3D ICs. The second paper presents a low overhead logic BIST scheme, the third paper addresses the quality assessment of small delay tests, and the fourth paper brings fault tolerance to matrix multiplication.

1:30 PM
T61 3D-IC BISR for Stacked Memories Using Cross-Die Spares
Chun-Chuan Chi¹,², Yung-Fa Chou², Ding-Ming Kwai², Yu-Ying Hsiao², Cheng-Wen Wu¹,², Yu-Tsao Hsing³, Li-Ming Denq³, Tsung-Hsiang Lin³
¹ National Tsing Hua University, Taiwan
² Industrial Technology Research Institute, Taiwan
³ HOY Technologies, Taiwan
3D ICs based on Through-Silicon-Vias enable the stacking of logic and memory dies to manufacture chips with higher performance, lower power, and smaller form factor. To improve the yield of the memory dies in 3D ICs, this paper proposes a Built-In Self-Repair architecture which allows the sharing of spares between different layers of dies. The corresponding pre-bond and post-bond test flow is presented as well. In order to maximize the yield gain introduced by the cross-die spares, a die matching algorithm is proposed. Experimental results show that the area overhead of the proposed BISR circuit is only 2.43%, and the yield gain achieved by cross-die spare sharing can be up to 9%.

1:50 PM
T62 Routing-Efficient Implementation of an Internal-Response-Based BIST Architecture
Wei-Cheng Lien¹, Tong-Yu Hsieh² and Kuen-Jong Lee¹
¹ National Cheng Kung University, Taiwan
² National Sun Yat-sen University, Taiwan
Internal-response-based BIST techniques use internal circuit responses to directly generate test patterns, reducing or even eliminating the storage requirement for test data.
For these techniques, appropriate routing of internal nets to the BIST circuitry is crucial for minimizing the area overhead and performance impact. In this paper, an efficient net sharing algorithm and special response decompressor hardware are proposed to minimize the total number of required nets for an internal-response-based BIST scheme. Experimental results show that on average 3.24% of nets and 2.83% area overhead of response decompressors are sufficient to achieve complete fault coverage for ISCAS’85 circuits.

2:10 PM
T63 Statistical SDFC: A Metric for Evaluating Test Quality of Small Delay Faults
Xuefeng Zhu¹,², Huawei Li¹*, Xiaowei Li¹
¹ Chinese Academy of Sciences, P.R. China
² Graduate University of Chinese Academy of Sciences, P.R. China
As the technology node continues to shrink, an effective and accurate metric is essential to
measure the test quality of small delay faults (SDFs), which may cause failure at circuit outputs. Because they include gross delay fault (GDF) coverage, prior metrics cannot concentrate on the detection of SDFs. We propose a new metric, statistical SDF coverage (S-SDFC), for differentiating SDFs and GDFs and evaluating the test quality of SDFs under the statistical delay quality model. Experimental results show that S-SDFC is more conservative and effective in evaluating the quality of test sets in detecting SDFs and outperforms other metrics in guiding test generation.

2:30 PM
T64 A Fault-Tolerant PE Array Based Matrix Multiplier Design
Bo-Yu Jan, Jiun-Lang Huang
National Taiwan University, Taiwan
This paper presents three fault tolerant schemes for the Cannon algorithm based two-dimensional PE array matrix multiplier. In the twisted column scheme, free PEs take over the tasks of adjacent faulty PEs; this preserves the computation efficiency. The column replacement scheme re-allocates the task of a faulty column to a fault-free one; this degrades the overall performance but significantly enhances fault tolerance. The hybrid approach combines the previous two schemes; it achieves the best fault tolerance and incurs no performance degradation if the number of faulty PEs is small.
Simulation results and overhead analyses are presented to validate the proposed schemes.

2:50 PM Break

Session T7: SoC Design Methodology
Tuesday, 24 April, 3:10 PM~4:30 PM
Ballroom D
Chair: Hsi-Pin Ma, National Tsing Hua University, Taiwan
This session covers various design methodology issues of SoC implementation, including trace analysis, on-chip data communication security, and parallel design on GPGPUs.

3:10 PM
T71 IMITATOR: A Deterministic Multicore Replay System with Refining Techniques
Shing-Yu Chen¹, Chi-Neng Wen², Geng-Hau Yang¹, Wen-Ben Jone³, Tien-Fu Chen¹
¹ National Chiao Tung University, Taiwan
² National Chung Cheng University, Taiwan
³ University of Cincinnati, U.S.A.
This paper proposes IMITATOR for both trace compression and deterministic replay. In contrast to most other record and replay systems, IMITATOR introduces an additional phase, the refining phase, between the record and replay phases to significantly reduce the recorder overhead while enabling faster replaying. Results with the SPLASH2 benchmark on a 32-core system show that IMITATOR can (a) significantly reduce trace size through the trace refining technologies (~15% of the native trace) and (b) achieve a replay speed 1.5 times faster on average than the replayer using the Sigrace scheme.


3:30 PM
T72 Transport-Layer Assisted Vertical Traffic Balanced Routing for Thermal-Aware Three-Dimensional Network-on-Chip Systems
Kun-Chih Chen, Chih-Hao Chao, Shu-Yen Lin, Hui-Shun Hung, and An-Yeu Andy Wu
National Taiwan University, Taiwan
The thermal problem of 3D NoC is more severe than that of 2D NoC because of chip stacking. To prevent the chip from overheating, the near-overheat routers are throttled and the topology becomes a Non-Stationary Irregular Mesh (NSI-Mesh). To deliver packets successfully, Transport Layer Assisted Routing (TLAR) was proposed. It performs better than conventional routing approaches for NSI-Mesh. However, it suffers from traffic congestion in the bottom chip layer and inter-layer traffic imbalance. In this paper, transport layer assisted Vertical Traffic Balance Routing (VTBR) is proposed. In the experimental results, it achieves better vertical traffic balance and improves network throughput by 35.3%~40%.

3:50 PM
T73 A Power Management Technology for Mobile Embedded System
Shui-An Wen, Chun-Chin Chen, Shing-Wu Tung
Industrial Technology Research Institute, Taiwan
The power consumption of mobile electronic devices has become a major specification index, and battery endurance is a critical factor for their practicality. Electronic devices have to fully utilize power management technology to maximize power efficiency from limited resources. This work introduces a portable power supply module, the “miniPAC Portable Power Module”, a system consisting of voltage management ICs that is capable of providing multiple voltages and battery charging. An application on the PAC Duo platform is introduced as an example to examine the conversion efficiency of this module.
The experimental result shows that this module has an efficiency of over 88%.

4:10 PM
T74 A Highly Parallel Design of Image Surface Layout Recovering on GPGPU
Guan-Ru Li, Bo-Cheng Charles Lai
National Chiao Tung University, Taiwan
Surface layout recovering helps computers understand the intricate information in an image by assigning local segments to different geometric classes. It greatly reduces the complexity of the subsequent image processing and is widely used in various computer vision applications. However, the algorithm walks through every image pixel and imposes intensive computation requirements. Through comprehensive analysis of the execution behavior, this paper identifies significant parallelism inherent in the algorithm. With careful consideration of both multi-threaded software and parallel hardware, the optimized parallel design on a modern GPGPU has reached an average 10.7X performance enhancement.

Session T8: Advanced Performance-Driven Design Optimization
Tuesday, 24 April, 3:10 PM~4:30 PM
Mezzanine A+B
Co-chairs:
Chien-Nan Liu, National Central University, Taiwan
Po-Hung Lin, National Chung Cheng University, Taiwan
This session consists of four papers addressing advanced performance-driven design optimization. The first paper in this session presents an automatic delay analysis approach for data-driven asynchronous circuits. The second paper proposes an analytical approach to estimate the peak wake-up current of a given wake-up input pattern at gate level. The third paper proposes a nonlinear optimization methodology for globally improving the matching quality in analog ICs. The last paper presents the challenges and state-of-the-art features of the 3D EDA tool chain.

3:10 PM
T81 Automatic Delay Analysis and Throughput Optimization in Data-Driven Asynchronous Circuits
Hongguang Ren, Zhiying Wang and Wei Shi
National University of Defense Technology, Changsha Hunan, P.R. China (2/2)
Asynchronous circuits have several potential advantages over their synchronous equivalents, including the ability to exploit average case performance, the elimination of the clock skew problem and their low power nature. This paper presents an automatic delay analysis approach for data-driven asynchronous circuits. An automatic latch insertion approach based on the delay analysis, which aims at throughput optimization, is implemented in Teak. Experimental results show that the delay analysis offers both an insight into the delay distributions in data-driven asynchronous circuits and valuable guidance for throughput improvement.

3:30 PM
T82 Peak Wake-up Current Estimation at Gate-Level with Standard Library Information
Mu-Shun Matt Lee, Yi-Chu Liu, Wan-Rong Wu and Chien-Nan Jimmy Liu
National Central University, Taiwan
In power gating designs, it is important to estimate the peak wake-up current at the design stage to avoid possible power issues. In this paper, an analytical approach is proposed to estimate the peak wake-up current of a given wake-up input pattern at gate level.
Therequired information of the proposed approach can be directly obtained from existing libraryinformation without extra characterization, which can be easily inserted into existing EDAflow with little overhead. The extra effects of power switches on the current waveforms arealso considered in the proposed approach, which significantly improve the estimationaccuracy as demonstrated in the experiments.3:50 PMT83 A Nonlinear Optimization Methodology for Resistor Matching in Analog IntegratedCircuitsSheng-Jhih Jiang, Chan-Liang Wu, and Tsung-Yi HoNational Cheng Kung University, TaiwanIn this paper, we propose a nonlinear optimization methodology for globally improving thematching quality. Our algorithm enhances the matching quality by deforming the rectilinear34


shape into a centrosymmetrical shape while simultaneously minimizing the perturbation of the pre-placed normal blocks. Experimental results show that the proposed algorithm is very promising.

4:10 PM
T84 3-D Centric Technology and Realization with TSV
Chang-Tzu Lin, Chia-Hsin Lee, Tsu-Wei Tseng, Ding-Ming Kwai, Yung-Fa Chou
Industrial Technology Research Institute, Taiwan
In this paper, we present the challenges and state-of-the-art features of the 3-D EDA tool chain. We introduce the complete realization technologies of system integration with through-silicon vias (TSVs) using a TSMC 90nm process. For optimization of system performance, we summarize the stacking concerns for timing constraints due to various stacking applications. With these configurations, an approach is proposed to overcome the 3-D timing optimization problem, which arises because commercial tools tackle only one die at a time. Empirical results show the approach is promising for present 3-D centric design methodologies.

4:30 PM Break

Session T9: All-Digital Clock Circuits
Tuesday, 24 April, 4:50 PM~6:10 PM
Ballroom D
Co-chairs:
Wei-Bin Yang, Tamkang University, Taiwan
Ching-Che Chung, National Chung Cheng University, Taiwan
This session presents four papers related to clock generation and high-speed domino circuit design. The first three papers focus on spread spectrum clock generation, multi-phase clock generation and clock skew, respectively. The fourth paper addresses high-speed domino circuit design.

4:50 PM
T91 A Low-Power and Small-Area All-Digital Spread-Spectrum Clock Generator in 65nm CMOS Technology
Ching-Che Chung¹, Duo Sheng², and Wei-Da Ho¹
¹ National Chung Cheng University, Taiwan
² Fu Jen Catholic University, Taiwan
In this paper, a low-power and small-area all-digital spread spectrum clock generator (ADSSCG) is presented. The proposed ADSSCG can provide a programmable spreading ratio. In order to maintain the frequency stability while performing triangular modulation, a fast frequency and phase relock mechanism is proposed to overcome PVT variations. The proposed ADSSCG is implemented in a 65nm CMOS process, and the active area is 100um × 100um. The simulation results show that the electromagnetic interference (EMI) reduction is 22.6dB with a 1.3% spreading ratio at 270MHz and 18.9dB with a 0.45% spreading ratio at 162MHz. The power consumption is 229uW at 270MHz with a 1.0V power supply.
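The triangular modulation and spreading ratio mentioned in the T91 abstract can be illustrated with a small sketch. This is not the authors' design: the function name `triangular_profile`, the definition of spreading ratio as peak-to-peak frequency deviation divided by the nominal frequency, and the default choice of down-spreading are all assumptions made for illustration.

```python
def triangular_profile(f_nominal_hz, spreading_ratio, n_points=8, center_spread=False):
    """Return one modulation period of a triangular frequency profile.

    Assumption: spreading_ratio = peak-to-peak deviation / nominal frequency.
    Down-spread (default) sweeps from f_nominal down to f_nominal - deviation;
    center-spread sweeps symmetrically around f_nominal.
    """
    if n_points < 2 or n_points % 2:
        raise ValueError("n_points must be an even number >= 2")
    deviation = f_nominal_hz * spreading_ratio
    f_max = f_nominal_hz + deviation / 2 if center_spread else f_nominal_hz
    f_min = f_max - deviation
    half = n_points // 2
    # Falling ramp from f_max, then rising ramp from f_min back towards f_max.
    falling = [f_max - deviation * i / half for i in range(half)]
    rising = [f_min + deviation * i / half for i in range(half)]
    return falling + rising


# Operating point quoted in the abstract: 270 MHz with a 1.3% spreading ratio.
profile = triangular_profile(270e6, 0.013)
print(f"peak-to-peak deviation: {(max(profile) - min(profile)) / 1e6:.2f} MHz")
```

Under these assumptions, a 1.3% spreading ratio at 270 MHz corresponds to a peak-to-peak deviation of 3.51 MHz.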


5:10 PM
T92 Cyclic-MPCG: Process-Resilient and Super-Resolution Multi-Phase Clock Generation by Exploiting the Cyclic Property
Ruo-Ting Ding¹, Shi-Yu Huang¹, Chao-Wen Tzeng¹, Shan-Chien Fang², and Chia-Chien Weng¹
¹ National Tsing Hua University, Taiwan
² TinnoTek Inc., Taiwan
Multi-phase clock generation (MPCG) is a problem that aims to generate a sequence of clock signals with the same frequency and uniformly shifted phases. In this paper we propose an MPCG design with two major contributions: (1) We use a process calibration scheme that makes the per-phase delay highly accurate. (2) We further exploit a so-called cyclic property to make the achievable per-phase delay much smaller than a buffer delay. The entire design can be built almost entirely from standard cells, thus lending itself easily to automation. Experimental results indicate that this design is general enough to be applied to a 16-phase clock signal (with a per-phase delay of only 100ps).

5:30 PM
T93 A Range Extending Delay-Recycled Clock Skew-Compensation and/or Duty-Cycle-Correction Circuit
Shih-Nung Wei¹, Yi-Ming Wang*², Jyun-Hua Peng², and Yuandi Surya²
¹ National Chung Cheng University, Taiwan
² National Chi Nan University, Taiwan
A clock skew-compensation and/or duty-cycle correction circuit (CSADC) is indispensable for maximizing the performance of synchronous double-edge-triggered systems. Most conventional CSADCs adopt a cascade structure whose inherently lower performance slows the locking procedure, while the dual-loop design results in more power consumption and design complexity. A range extending delay-recycled CSADC is proposed in this work. Compared to conventional CSADCs, the proposed circuit achieves at least a 2 times extension in bandwidth ratio, a 2.81 times reduction in power, and a 12 times reduction in power-to-bandwidth ratio.

5:50 PM
T94 A High-Speed Dual-Phase Processing Pipelined Domino Circuit Design with a Built-in Performance Adjusting Mechanism
Ching-Hwa Cheng, and *Jiun-In Guo
Feng Chia University, Taiwan
* National Chiao Tung University, Taiwan
A high-speed dual-phase processing domino circuit design with high performance and reliable characteristics is proposed. The cell-based automatic synthesis flow supports the quick design of high performance chips. The test chip of a dual-phase 64-bit high-speed multiplier with a built-in performance adjustment mechanism has been successfully validated using TSMC 0.18um CMOS technology. The test chip shows a 2.7X performance improvement compared to the conventional static CMOS logic design. In addition, a cell-based synthesizable design CAD flow that takes the skew-tolerance issue into consideration has been established. Finally, a built-in performance adjustment mechanism is conducted within


design.

Session T10: High Level Synthesis and Robust Design Methodology
Tuesday, 24 April 4:50 PM~6:10 PM
Mezzanine A+B
Co-chairs:
Hung-Pin (Charles) Wen, National Chiao Tung University, Taiwan
Juinn-Dar Huang, National Chiao Tung University, Taiwan
This session discusses technical advances in high level synthesis and robust design methodology. The first paper focuses on the port assignment problem in HLS. The second paper proposes an approach for ASIC synthesis based on architecture description languages. The third paper presents a novel technique for design validation. The last paper incorporates the spatial correlation into soft error analysis for better accuracy.

4:50 PM
T101 Port Assignment for Interconnect Reduction in High-Level Synthesis
Hao Cong, Song Chen, Takeshi Yoshimura
Waseda University, Japan
This paper focuses on the Port Assignment Problem for Binary Commutative Operators (PAP-BCO) in high-level synthesis. Given a binding of operations and variables, the PAP-BCO seeks to build the connections of registers to functional units with the objective of minimizing interconnects. In this paper, we formulate the PAP-BCO as a vertex partitioning problem on a graph, and propose an exact Integer Linear Programming (ILP) based method and a fast iterative method based on elementary transformations of spanning trees. Experimental results show that the fast iterative algorithm finds the optimum solution in 97% of runs, with a running time of only tens of milliseconds.

5:10 PM
T102 ASIC Synthesis Using Architecture Description Language
Zheng Wang, Xiao Wang, Anupam Chattopadhyay and Zoltan E. Rakosi
RWTH-Aachen, Germany
In this paper, an approach for high-level synthesis of ASICs from C specification is presented based on Architecture Description Languages (ADLs), which are predominantly used for modeling application-specific processors. This helps designers explore a wide range of intermediate design points between an ASIC and a weakly programmable processor. We provide several efficient algorithms for automating the high-level synthesis. The ADL-based ASIC synthesis flow is tested with 2 case studies from modern embedded applications.

5:30 PM
T103 Design Validation on Multiple-Core CPU Supported Low Power States Using Platform Based Infrared Emission Microscopy (PIREM) Technique
Yuan-Chuan Steven Chen, Dave Budka, Auston Gibertini, Dan Bockelman, Yutien Lin
Intel Corporation, USA


An innovative design validation methodology using recognized industry-wide IREM imaging techniques in conjunction with full PC platform enablement was developed and successfully applied to the IA-32 multiple-core (MC) Nehalem® microprocessor family [1]. Conventional structural-based “tester” IREM characterization and debug techniques can, for the first time, be extended to the “platform” environment. This approach has been proven to be faster and significantly more cost effective for post-silicon design validation, power debug on low power Energy Star® states, and realistic customer applications.

5:50 PM
T104 Spatial-Correlation-Aware Soft Error Rate Analysis Using Quasi-Importance Sampling
Xin-Tian Lena Wu, Kai-Hua Dennis Hsu, Lynn C.-L. Chang and Charles H.-P. Wen
National Chiao Tung University, Taiwan
Since statistical methods are important for accurately estimating the soft error rate (SER) under process variation, we incorporate the spatial correlation into SSER analysis to provide better accuracy. Moreover, SSER analysis based on quasi-Monte Carlo encounters difficulty in sampling points from a non-uniform or unbounded distribution. Therefore, we employ quasi-importance sampling in the Monte-Carlo simulation to overcome this sampling issue. Experimental results show that the quasi-importance sampling Monte-Carlo SSER analysis framework estimates circuit SSERs more precisely and reaches a 3.72X speedup compared to the baseline Monte-Carlo simulation.

PLENARY SESSION II
Wednesday, April 25, 9:00 AM ~ 9:50 AM
Ballroom A
Technical Program Co-chairs:
An-Yeu Wu, National Taiwan University, Taiwan
Li-C. Wang, University of California Santa Barbara, USA

9:00 AM
K2 Technology and Design Challenges for Smartphone SOCs
Dr. K. Lawrence Loh
Corporate VP
MediaTek Inc., Hsinchu, Taiwan
This keynote presentation begins with an introduction of the critical technologies driving the smart phone and tablet markets, such as increasingly demanding CPU performance, wireless and wired connectivity requirements, memory capacity and bandwidth, display/multimedia performance and longer usage time, which requires sophisticated low power design techniques. It is demonstrated that performance and cost are always the two major factors driving these innovations and technologies. Design considerations and technology trends are highlighted with recent developments in high speed and low power circuits and VLSI/chip


implementations, analog, power management, mixed-signal and RF circuits and systems, and challenges in designing with advanced IC fabrication processes and package technologies.

9:50 AM Break

INDUSTRY SESSION I
Wednesday, April 25, 10:10 AM ~ 11:10 AM
Ballroom A
Chair: Ing-Jer Huang, National Sun Yat-Sen University, Taiwan

10:10 AM
IS11 A 363-μW/fps Power-Aware Green Multimedia Processor for Mobile Applications
Chi-Cheng Ju, Yung-Chang Chang, Chih-Ming Wang, ChunChia Chen, Hue-Min Lin, Chia-Yun Cheng, Fred Chiu, Sheng-Jen Wang, Tsu-Ming Liu, Chung-Hung Tsai
MediaTek Inc., Taiwan
In this paper, a power-aware, low-power multimedia processor is presented. A novel clock gating scheme and dynamic frequency selection (DFS) are implemented to minimize power dissipation. The processor integrates seven standards (H.264 / VC1 / RV / AVS / MPEG-1 / MPEG-2 / MPEG-4) with several resource-sharing techniques at both the algorithmic and architectural levels to achieve significant area and power reduction. The design also adopts several fine-grain power scalability (FGPS) techniques that noticeably reduce power consumption. The processor supports decoding resolutions ranging from CIF to full-HD at working frequencies of 20~288MHz and a frame rate of 60fps, with a power dissipation of 363 μW/fps at a 1.2V supply voltage. It is fabricated in a 40nm 1P7M CMOS process with a core area of 1.40 mm².

10:30 AM
IS12 High Speed DDR2/3 PHY and Dual CPU Core Design for 28nm SoC
Kevin Ho, Tsung-Yi Chou, Po-Kai Chen and David J. Liou
Global Unichip Corp., Taiwan
As DDR DRAM runs at higher and higher speeds, the shrinking data window makes setup and hold timing closure at either the DRAM or the host chip more and more difficult. When calculating timing margins for a DDR2/3 system, it helps to break up the uncertainty contributions into transmitter, interconnect and receiver categories. Furthermore, based on the timing margins, signal integrity and power integrity analysis is the key to success. We will also present our front-end experience of high-speed and low-power 28nm CPU core hardening, from top RTL integration to synthesis and DFT. This core includes a dual-core ARM Cortex™-A9 CPU, a Level 2 Cache Controller and a Program Trace Macrocell.

10:50 AM
IS13 The Best SoC Solution with AndesCore and Andes’s Platform
Simon Jiang and Frankwell Lin
Andes Technology Corporation, Taiwan


Andes’s CPU series, called AndesCore, is based on a 32-bit reduced instruction set computing (RISC) CPU architecture and helps designers build SoC platforms. The N8, N9, N10 and N12 cores target three major application classes: entry-level MCU-based applications, mid-range Linux or RTOS applications, and high-end Linux applications. Each AndesCore (N8, N9, N10 or N12) offers designers more configuration flexibility than competing cores, with average performance and power improved by 24% and 43%, respectively. To reduce time to market, Andes provides an SoC platform named AndeShape that can be used to develop a whole SoC, helping designers integrate a single N8, N9, N10 or N12 CPU core in an FPGA or dual N12 cores in a real chip implementation, so that designers can easily develop the applications they want.

11:10 AM Break

INDUSTRY SESSION II
Wednesday, April 25, 11:30 AM ~ 12:30 PM
Ballroom A
Chair: Alan Su, Synopsys, Taiwan

11:30 AM
IS21 Test for More Than Pass/Fail Using On-chip Temperature Sensor
Chih-Wea Wang, Chen-Tung Lin, Chun-Chieh Hsu, Ching-Tung Wu, Chi-Feng Wu
Realtek Semiconductor Corp., Taiwan
In modern SoCs, process variations are represented as process corners, which designers handle with sufficient design margins. As feature sizes shrink below 90 nm, the effects of process variations are increasing severely. Moreover, circuit performance and power consumption are strongly impacted by the environmental conditions under which the circuit operates. For example, thermal hot spots increase leakage, degrade timing/stability, and reduce battery lifetime. Both increasing process variations and environment variations are now modeled in design corners, and the corresponding design margin overheads have reduced the benefit of shrinking the process node. In this paper, we first address the design margin problem. With an accurate temperature sensor and speed sensor implemented in our SoC application, we can measure the process and environment variations on each chip. Performing on-field design model correlation with realistic variation information can further reduce unnecessary design margins and improve the accuracy of process modeling. Furthermore, we can utilize the temperature sensor to construct the temperature profile and improve the system performance and power consumption.

11:50 AM
IS22 A Novel Design Methodology for Hybrid Process 3D-IC
Chien-Lin Huang, Nian-Shyang Chang, Chi-Shi Chen, Chun-Pin Lin, Chien-Ming Wu and Chun-Ming Huang
National Chip Implementation Center, Taiwan
Three-dimensional integrated circuit (3D-IC) is considered to be the most promising


technology for modern and future electronic device manufacturing. However, many challenges need to be addressed to make 3D-IC technically feasible as well as cost effective. One of the major challenges for 3D-IC is electronic design automation (EDA), due to the lack of true 3D EDA tools. In this paper, we propose a novel design methodology that makes current (2D-IC) EDA tools 3D aware. This methodology can be applied to 3D-ICs with a structure of 2 tiers bonded face to face. Since the process node for each tier of the applied 3D-IC can be different, we name it Hybrid Process 3D-IC. Based on the hierarchical design methodology, low power design techniques, flip-chip physical implementation methods, customized cell libraries and scripts are combined to build up the Hybrid Process 3D-IC design methodology. To verify the proposed methodology, a real circuit is designed according to it. The detail of the circuit will be described in the following paragraphs.

12:10 PM
IS23 Challenges and Solutions in Modern Analog Placement
Tung-Chieh Chen, Ta-Yu Kuan, Chung-Che Hsieh, and Chi-Chen Peng
SpringSoft Inc., Taiwan
The analog placement problem is to place devices without overlap or design-rule-check (DRC) errors under position constraints (e.g. symmetry, cluster) such that some cost metric (e.g. area, wire-length) is optimized. However, modern analog design challenges have reshaped the placement problem. A modern analog placer also needs to consider device layout-dependent effects (LDE) and interconnect parasitic effects. Because of the multiple objectives, it is impossible to decide on a single best placement. In this paper, we first introduce our placer, which can explore multiple placements under position constraints so that a designer can analyze the trade-off among different objectives. Then, we provide some future research directions for the modern analog placement problem.

SPECIAL SESSION II: Design Methodology for Advanced Technologies
Wednesday, April 25, 10:10 AM ~ 12:30 PM
Ballroom D
Co-Chairs:
Charles H-P Wen, National Chiao Tung University, Taiwan
Tai-Chen Chen, National Central University, Taiwan

10:10 AM
SS21 Data Mining Based Prediction Paradigm and Its Applications in Design Automation
Magdy S. Abadir
Freescale Semiconductor, USA
This talk will review several key challenges in design automation, including areas such as pre-silicon functional verification, design-silicon timing correlation, and test cost and quality, and describe data mining technologies for implementing a prediction platform that provides unique solutions to overcome these challenges. Results based on industrial cases will be discussed and other potential applications in design automation will be explained.


11:10 AM Break

11:30 AM
SS22 VLSI CAD for Emerging Nanolithography
David Pan
UT-Austin
In this talk, we will discuss emerging nanolithography technologies (including double/multiple patterning, extreme ultra-violet lithography, and electron-beam lithography) and their interactions with VLSI CAD. These emerging nanolithography technologies all have different manufacturing processes with their own challenges/issues. Meanwhile, nanometer VLSI designs and mask synthesis have to be co-optimized with these process technologies to ensure high product quality (performance/power/area, etc.), yield, and throughput to make future scaling worthwhile. Some recent results will be presented to show the enablement and effectiveness of such design and process integration.

12:10 PM
SS23 Area and Reliability Efficient ECC Scheme for 3D RAMs
Li-Jung Chang, Yu-Jen Huang, and Jin-Fu Li
National Central University, Taiwan
Soft error is one critical issue faced by nano-scale random access memories (RAMs). Three-dimensional (3D) RAM with through-silicon via (TSV) is a new approach for overcoming the memory wall. A 3D RAM consists of multiple vertically stacked dies, which provides a shielding effect for the dies below the top die. Thus, the soft error rate (SER) in the dies below the top die is smaller than that in the top die. This paper proposes an area and reliability efficient ECC (ARE-ECC) scheme for 3D RAMs that takes advantage of the shielding effect. An area and reliability optimization algorithm is also proposed to aid the designer in designing the ARE-ECC scheme for 3D RAMs. Simulation results show that the ARE-ECC scheme can effectively increase the reliability of a 3D RAM with small area overhead.

Session W1: Power Conversion Techniques
Wednesday, 25 April, 10:10 AM~11:10 AM
Mezzanine A+B
Co-chairs:
Po-Chiun Huang, National Tsing Hua University, Taiwan
Muh-Tian Shiue, National Central University, Taiwan
This session focuses on power conversion techniques. We begin with a CMOS switching converter. Then, a high-voltage driver is presented. A wireless power link design concludes the session.

10:10 AM
W11 An Area-Efficient CMOS Switching Converter with On-Chip LC Filter Using Feedforward Ripple Cancellation Technique
Po-Hsiang Lan, Yao-Jun Kuo and Po-Chiun Huang
National Tsing Hua University, Taiwan


Increasing the operation speed is a possible way to realize a monolithic switching converter with an on-chip LC filter. However, the output ripple, power efficiency, and operation range depend on the technology, area occupation, and circuit topology. In this work, in a 0.18-um CMOS process, hysteresis-based control regulates the converter at a 100 MHz switching frequency to balance filter size and power efficiency. The feedforward ripple cancellation technique reduces the output ripple, further shrinking the LC filter.

10:30 AM
W12 On Investigation into a CMOS-Process-Based High-Voltage Driver Applied to Implantable Microsystem
Cihun-Siyong Alex Gong*, Kai-Wen Yao**, Jyun-Yue Hong**, and Muh-Tian Shiue***
*/*** Industrial Technology Research Institute, Taiwan
**/*** National Central University, Taiwan
The stimulator is a key component of implantable applications such as neural prostheses. It injects current directly into the neural interface and the cells or tissues of interest, activating neural responses as long as the electric charge thresholds involved can be reached. This paper presents preliminary research results toward an efficient stimulator, with particular emphasis on efficiency. Analysis and design are introduced. Experimental results from our first-edition proof of concept are given.

10:50 AM
W13 Wireless Power Link Design Using Silicon-Embedded Inductors for Brain-Machine Interface
Rongxiang Wu¹, Salahuddin Raju¹, Mansun Chan¹, Johnny K.O. Sin¹, and C. Patrick Yue¹,²
¹ Hong Kong University of Science and Technology, Hong Kong
² University of California, Santa Barbara, USA
This paper discusses the safety requirements, equivalent circuit model, and design strategy of wireless power transmission to neural implants. A novel silicon substrate-embedded 3.2-uH spiral inductor designed inside a 4.5 mm × 4.5 mm IC is proposed for use as the receiving coil. In a practical brain-machine interface setting, full-wave EM simulations show that wireless power in the range of 1-10 mW can be delivered at 5% efficiency to an implant 1 cm below the head surface using signals between 2 and 5 MHz. The large parasitic capacitance of the “in-chip” inductor is methodically absorbed in the matching network to maximize the efficiency and power transfer.

11:10 AM Break

Session W2: Power Amplifier and Energy-Efficient Transmitter
Wednesday, 25 April, 11:30 AM~12:30 PM
Mezzanine A+B
Co-chairs:
Hwann-Kaeo Chiou, National Central University, Taiwan
Ta-Shun Chu, National Tsing Hua University, Taiwan


The first paper of this session introduces a power amplifier (PA) with an envelope tracking technique for improving linearity and efficiency. Next, another PA with a predistorter is presented. Finally, an energy-efficient transmitter with an FIR pulse-shaping filter is introduced.

11:30 AM
W21 A Monolithic 1.85GHz 2-stage SiGe Power Amplifier with Envelope Tracking for Improved Linear Power and Efficiency
Ruili Wu, Yan Li, Jerry Lopez, and Donald Y. C. Lie
Texas Tech University, USA
In this paper, a monolithic 2-stage differential PA is designed and fabricated in the 0.35 μm IBM 5PAe SiGe BiCMOS technology. All components are integrated on-chip for the PA except the input/output baluns. The envelope tracking (ET) technique is applied to this 2-stage PA to enhance the power-added efficiency (PAE) and linearity for LTE signals with a high peak-to-average ratio (PAR). The ET-based PA system achieves a linear output power of 20.4 dBm with a gain of 30.5 dB and an overall system PAE of 22%, using an LTE 16QAM 5 MHz signal with ~7.5 dB PAR at 1.85 GHz. Compared to the 2-stage PA operating with fixed voltage supplies, the ET-based PA system improves the linear output power by 3.2 dB and its system PAE by over 10%.

11:50 AM
W22 Variable Gain Active Predistorter with Linearity Enhancement for a 2.4 GHz SiGe HBT Power Amplifier Design
Kuei-Cheng Lin #1,*2, Hwann-Kaeo Chiou #1, Member, IEEE, Po-Chang Wu *2, Chu-Jung Sha *2, Chun-Lin Ko *2, Da-Chiang Chang *2, and Ying-Zong Juang *2
# National Central University, Taiwan
* Chip Implementation Center, National Applied Research Laboratories, Taiwan
This paper proposes a 2.4 GHz SiGe HBT differential power amplifier (PA) with a novel on-chip variable gain active predistorter (PD) for linearity enhancement. The fully differential active PD provides an open-collector adaptive bias control which can effectively enhance the linearity while improving power added efficiency (PAE). The PA with active PD achieves an output 1-dB compression power (OP1dB) of 20 dBm, a gain control range of 10 dB, a PAE of 30%, and an error vector magnitude (EVM) improvement of 2.4% under an OFDM/64-QAM modulation signal. The fabricated die size including pads is less than 0.74 mm² and is suitable for a highly integrated linear PA.

12:10 PM
W23 An Energy-Efficient Ultra-Wideband Transmitter with an FIR Pulse-Shaping Filter
Wei-Ning Liu and Tsung-Hsien Lin
National Taiwan University, Taiwan
An energy-efficient impulse-radio ultra-wideband (IR-UWB) transmitter (TX) operating in the sub-GHz range is reported. The IR-UWB TX is composed of a phase-locked loop, a digital modulation circuit, and an FIR-filter-based pulse generator. By utilizing a 4-tap FIR filter for pulse shaping, the output transmission spectrum is spectrally efficient. The maximum pulse repetition frequency of the proposed TX is 50 Mpulses/s. The TX operates from a 1-V


supply voltage and consumes only 1.5 mW at 50-Mbps data rate. It achieves an energy efficiency of 30 pJ/bit. The TX is fabricated in a 0.18-um CMOS technology and the core area is 0.25 mm².

12:30 PM Lunch
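The energy-efficiency figure quoted in the W23 abstract follows from dividing the power consumption by the data rate. The short check below reproduces that arithmetic; the helper name `energy_per_bit_pj` is an illustrative assumption, not something taken from the paper.

```python
def energy_per_bit_pj(power_w, data_rate_bps):
    """Energy per transmitted bit in picojoules: power / data rate."""
    return power_w / data_rate_bps * 1e12


# Figures quoted in the abstract: 1.5 mW at a 50-Mbps data rate.
tx_power_w = 1.5e-3
data_rate_bps = 50e6
print(f"{energy_per_bit_pj(tx_power_w, data_rate_bps):.1f} pJ/bit")
```

With 1.5 mW and 50 Mbps this gives 30 pJ/bit, matching the value stated in the abstract.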


Poster Session
Tuesday, 24 April 2:00 PM~6:00 PM
10F Hallway
Co-chairs:
Shing-Wu Tung, Industrial Technology Research Institute, Taiwan
Chien-Nan Kuo, National Chiao Tung University, Taiwan

PS1 A 1-V 60 GHz CMOS Low Noise Amplifier with Low Loss Microstrip Lines
Chun-Lin Ko¹,², Chieh-Pin Chang², Chien-Nan Kuo¹, Da-Chiang Chang² and Ying-Zong Juang²
¹ National Chiao Tung University, Taiwan
² National Chip Implementation Center, National Applied Research Laboratories, Taiwan
A V-band low noise amplifier (LNA) has been demonstrated in 90 nm CMOS. The LNA design uses low loss microstrip lines for all matching networks. To fulfill the metal density requirement in fabrication, the ground plane needs slots. According to the analysis and experimental results, the direction of the slot pattern affects the line loss by over 30% at 60 GHz. By slot filling under the line, the line loss can be improved by a further 10%. The LNA topology is a 3-stage common-source cascade, chosen for the low supply voltage imposed by the process. Using the microstrip lines, the LNA exhibits a low noise figure of 5.6 dB and a gain of 10.8 dB at 60 GHz while consuming only 5.5 mW from a 1.0 V power supply.

PS2 A 1-V, 44.6 ppm/°C Bandgap Reference with CDS Technique
¹Peng-Yu Chen, ²Soon-Jyh Chang, ³Chung-Ming Huang, and ³Chin-Fu Lin
¹,² National Cheng Kung University, Taiwan
³ Himax Technologies Inc., Taiwan
A CMOS bandgap reference generator with switched-capacitor and correlated double sampling techniques is presented. The proposed circuit uses a low-gain amplifier to generate an accurate reference voltage, so the structure can be implemented in a low-voltage environment. With proper design, the circuit can produce any output voltage between the supply voltage and ground. The circuit was designed and implemented in a 0.18-μm CMOS process. The core circuit occupies about 0.065 mm² (240 μm × 271 μm). Measurement results show that the temperature coefficient of the output is 47.3 ppm/°C in the temperature range from -40 °C to 100 °C. The average power consumption is about 48.1 μW.

PS3 A Highly Integrated Class-D Amplifier Using Driver Delay Hysteresis Control
Jia-Nan Tai, Hsin-Shu Chen and Hang-Quei Chiu
National Taiwan University, Taiwan
In this paper, a closed-loop class-D amplifier (CDA) that simplifies the circuit implementation of conventional hysteresis control is proposed. The functionality of hysteresis control is realized by a comparator in conjunction with MOSFET propagation delay. The switching frequency variation is reduced by means of unmatched rising and falling edge delays. The prototype CDA chip is fabricated in a 0.35-µm CMOS process and its active area is 0.2 mm². The measurement results show that the THD+N ratio is flat between 20 Hz and 20 kHz, with all values below 0.03%. PSRR and SNR are 64 dB and 83 dB, respectively.


PS4 A Low-Power, Capacitively-Divided, Ring Oscillator with Digitally Adjustable Voltage Swing
Tao Jiang¹, Kangmin Hu² and Patrick Y. Chiang¹
¹ Oregon State University, USA
² Broadcom Corporation, USA
A capacitively-divided ring oscillator is proposed to decrease dynamic power consumption by reducing voltage swing. A feedforward capacitor is placed in series with the load capacitance, effectively AC coupling each inverter stage to the next stage. Simulations with different ratios of the feedforward and load capacitors show a reduction in power consumption of as much as 36% with a 72% reduction in voltage swing. At the limits of reduced voltage swing, the power consumption is limited by static leakage current. Built in a 1.2V, 90nm CMOS process, a capacitively-coupled ring oscillator with digital trimming is injection-locked to an off-chip reference clock and measured for verification.

PS6 An Energy-Saving Spectrum Sensing Processor Based on Partial Discrete Wavelet Packet Transform
Chi-Kai Yang, Chi-Hsuan Hsieh, Yuan-Hao Huang
National Tsing-Hua University, Taiwan
Cognitive radio has drawn considerable research attention in recent years for its efficient spectrum utilization. This research proposes a discrete wavelet packet transform (DWPT)-based spectrum sensing processor for cognitive radio systems. The processor combines a partial DWPT spectrum analyzer with a double-threshold energy detection and decision result prediction technique to reduce the computational complexity. Complexity analysis shows that the proposed spectrum sensing processor saves at most 62% of the computational cost compared to the traditional FFT processor. VLSI implementation results also show that the proposed architecture saves about 40% of the energy dissipation for the IEEE 802.22 WRAN system.

PS7 DVB-T2 LDPC Decoder with Perfect Conflict Resolution
Xiongxin ZHAO, Zhixiang CHEN, Xiao PENG, Dajiang ZHOU and Satoshi GOTO
Waseda University, Japan
In this paper we focus on resolving the message updating conflict problem in the layered algorithm for DVB-T2 LDPC decoders. Unlike previous resolutions, we directly implement the layered algorithm without modifying the parity-check matrices (PCM) or the decoding algorithm. A DVB-T2 LDPC decoder architecture is also proposed, with two new techniques that guarantee conflict-free layered decoding. Experimental results show that, compared to state-of-the-art works, we achieve a slight error-correcting performance gain for DVB-T2 LDPC codes.

PS8 A Low Cost DPA-Resistant 8-bit AES Core Based on Ring Oscillators
Hsing-Ping Fu, Ju-Hung Hsiao, Po-Chun Liu, Hsie-Chia Chang, and Chen-Yi Lee
National Chiao Tung University, Taiwan
This work presents an 8-bit low cost AES core with DPA resistance for resource-limited


applications. By using ring oscillators, the protection with no throughput penalty caneffectively keep the design from DPA attacks. Both of the elegant control mechanism andlogic sharing are employed to further reduce power consumption and area overhead. Afterimplemented in UMC 90 nm CMOS technology, our approach costing 3.32K gates with6.9% overhead can achieve 178Mb/s throughput and 201uW power consumption, showingthat this design is comparable with other unprotected low cost designs. Furthermore, thecountermeasure protects key information from DPA after evaluated by 15000 traces.PS9 A Hardware in the Loop Design Methodology for FPGA System and Its Applicationto Complex FunctionsGuixuan Liang, Danping He, Jorge Portilla, Teresa RiesgoUniversidad Politécnica de Madrid, SpainIn this work, a unified algorithm-architecture-circuit co-design environment for complexFPGA system development is presented. The main objective is to find an efficientmethodology for designing a configurable optimized FPGA system by using as few efforts aspossible in verification stage, so as to speed up the development period. A proposed highperformance FFT/IFFT processor for Multiband Orthogonal Frequency DivisionMultiplexing Ultra Wideband (MB-OFDM UWB) system design process is given as anexample to demonstrate the proposed methodology. This efficient design methodology istested and considered to be suitable for almost all types of complex FPGA system designsand verifications.PS10 Design and Implementation of an Optical OFDM Baseband Receiver in FPGAYin-Tsung Hwang, Sung-Jun Tsai, Yi-Yo ChenNational Chung Hsing University, TaiwanA baseband receiver design and its FPGA implementation for an OOFDM system aimed atthe NG-PON applications are presented. A low cost IMDD architecture is adopted andbaseband DSP measures are employed to compensate various optical impairments. 
Targeting a 4 GSps throughput (8 Gbps data rate), an 8-way parallel architecture is developed to perform the synchronization, FFT and equalization, each with massive parallelism. The simulation results show a -16 dBm receiver sensitivity for 64-QAM modulation at a BER of 10^-3. The FPGA implementation yields a fully functional but speed-degraded design (250 MHz versus 500 MHz). The design occupies 21,423 logic slices and 56 embedded DSP blocks.

PS11 An OCP-AHB Bus Wrapper with Built-in ICE Support for SoC Integration
Cheng-Ta Wu, Feng-Xiang Huang, Kuan-Fu Kuo, Ing-Jer Huang
National Sun Yat-Sen University, Taiwan
As SoC design becomes more and more complicated, increasing the reusability of intellectual property (IP) blocks is the key to shortening embedded system development and integration time. However, IP reusability is affected by the complexity and diversity of different SoC development environments. In this paper we implement a standard OCP-AHB bus wrapper. With this wrapper, an IP with the OCP protocol can connect to an AMBA 2.0 AHB bus quickly, and the IP designer can focus on developing IP functionality without considering the data transactions of different interconnects. This reduces IP development time and increases reusability. Thus the system integration and verification can be accelerated. Furthermore, we added a built-in ICE architecture to make SoC verification more flexible and faster.

PS13 Design of a Pipelined Clos Network with Late Release Scheme
WeiXiang Tang and Yursun Hsu*
Industrial Technology Research Institute of Taiwan, Taiwan
* National Tsing Hua University, Taiwan
In this paper, a novel 4-stage pipelined router is proposed for a 3-stage Clos network and its corresponding network interface (NI). The proposed structure is built on DE3 and the performance is estimated using an in-house C++ simulator. To further improve the throughput, we propose a late release scheme (LRS) which reserves the allocated paths. The simulation results show throughput improvements of 9.42% and 42.91% under random and mixed traffic, respectively. The latency improvements are 5.1x and 2.53x under Jacobi linear equation simulation with 1k and 512 data sizes, respectively.

PS14 Performance Validation of Dynamic-Remapping-Based Task Scheduling on 3D Multi-Core Processors
Chien-Hui (Christina) Liao and Hung-Pin (Charles) Wen
National Chiao Tung University, Taiwan
In our previous work, we proposed a dynamic remapping strategy, Iterative Dynamic Remapping (IDR), to enhance an energy-aware task-scheduling algorithm. In this paper, the performance of IDR with transmission costs between cores is validated through comparison with a Quadratic-Programming-based method and a Genetic-Algorithm-based method. Experimental results show that the IDR strategy runs at least five orders of magnitude faster than the QP-based method while achieving comparable total energy consumption. Compared to the GA-based method, the IDR strategy runs at least three orders of magnitude faster while achieving comparable (or even better) total energy consumption.

PS15 A Master-Slave SoC Structure for HMM Based Speech Recognition
Hui Geng¹, Yiyu Shi¹, Ming Dong², Runsheng Liu¹,³
¹ Missouri University of Science & Technology, USA
² Beijing Lingshengxin Speech Technology Co. Ltd., China
³ Tsinghua University, China
In this paper, we propose a master-slave SoC structure composed of an ARM7-TDMI and a co-processor for Mahalanobis distance calculation. The SoC was implemented on an Actel FPGA. Furthermore, we implement an HMM based speech recognition system based on this SoC. Compared with the conventional ASIC co-processor and slave SoC structure, the new master-slave structure reduces the number of SRAM accesses and improves the bus efficiency. Experimental results show that with 1.40 s of Chinese speech “feixi” and a 24 MHz clock, the processing time of the M-S SoC system is 1.85 s, a 64.12% reduction compared with the software implementation on the ARM7-TDMI, and a 5.95% reduction compared with the slave-structure SoC.

PS16 Post-Bond Test Techniques for TSVs with Crosstalk Faults in 3D ICs
Yu-Jen Huang, Jin-Fu Li, and Che-Wei Chou
National Central University, Taiwan
Three-dimensional (3D) integration using through-silicon vias (TSVs) is expected to cope with the difficulties faced by current 2D system-on-chip designs. However, coupling capacitance exists between neighboring TSVs, such that TSVs are prone to crosstalk faults. In this paper, we propose a built-in self-test (BIST) scheme for the post-bond test of TSVs with crosstalk faults in 3D ICs. A test algorithm for testing crosstalk faults of TSVs is proposed. The proposed BIST scheme has the feature of low area cost. Simulation results show that the area overhead of the BIST circuit implemented in 90 nm CMOS technology is 6.7% for a 512×16 TSV array in which each TSV cell is 15 × 15 μm².

PS17 3D IC Test Scheduling Using Simulated Annealing
Chih-Yao Hsu¹, Chun-Yi Kuo¹, James C-M Li¹, and Krishnendu Chakrabarty²
¹ National Taiwan University, Taiwan
² Duke University
Three-dimensional integrated circuits (3D ICs) have many advantages over traditional integrated circuits, but many difficulties remain to be overcome. Testing of 3D ICs is regarded as the most difficult challenge. High power density in 3D ICs causes rising temperature, which may cause test yield loss. In this paper, we propose a thermal-aware test scheduling technique for 3D ICs. Our experimental results show that the maximum temperature in the test schedule of our proposed technique is under the temperature limit while the test length overhead is only 19%.

TUTORIAL I
Wednesday, April 25, 1:30 PM ~ 4:30 PM
Ballroom C
Chair: Poki Chen, National Taiwan University of Science and Technology, Taiwan
Energy-efficient On-chip Power Management: System, Circuit and Device Perspectives
Eduard Alarcón
The Technical University of Catalunya (UPC BarcelonaTech), Spain
Trends in portable applications such as mobile terminals for next generation communications proceed in the direction of increasing the computational load while concurrently reducing size and enhancing operating lifetime, whereas the density of energy sources is only expected to increase slightly. Given this scenario, there is a demand for improved power management integrated circuits for future systems-on-chip. The ultimate target is the fully monolithic integration of the power supply together with the circuits that constitute its load within either the same substrate or chip package, yielding a complete Powered System on a Chip (PSOC). This talk will cover efficient energy processing circuits within an integrated circuit environment, which require a multidisciplinary approach combining analog and mixed-signal IC design, power electronics and control theory. Topics covered will encompass on-chip power supply design and implementation, efficiency optimization, IC-compatible power inductors and capacitors, power MOSFET switches and efficient switch drivers, mixed-signal and digital controller architectures, as well as system and circuit-level design of on-chip adaptive power management techniques such as adaptive wideband envelope-tracking power supplies for RF transmitters.
Finally, a perspective on on-chip power management techniques for energy levels downscaled to the ultra-low-power regimes required in energy harvesting applications for wireless sensor networks and biomedical implants will be presented, along with an exploration of nanotechnology-enabled devices for power management.

TUTORIAL II
Wednesday, April 25, 1:30 PM ~ 4:30 PM
Ballroom D
Chair: Chua-Chin Wang, National Sun Yat-Sen University, Taiwan
Automotive Smart Power IC Design
Bernhard Wicht
Reutlingen University of Applied Sciences, Germany
Smart power ICs combine analog, digital and high-voltage/power functions in a single chip. Based on high-voltage BiCMOS technologies, smart power design allows for integration of complete systems or subsystems into single-chip solutions. The availability of these technologies has enabled new levels of functionality and new applications, all combined with cost reduction. This applies especially to the automotive market. With the adoption of more electronics, vehicles become safer, cleaner, more efficient, more comfortable and more affordable. This tutorial will give an overview of automotive design challenges and will cover the circuit design of the main smart power circuit blocks, such as charge pumps, gate and bridge drivers, and linear and switched-mode voltage regulators. System design aspects like pinout, floor planning and grounding/supply guidelines, as well as design-for-test and automotive qualification, will also be addressed. With the combination of mixed-signal and high-voltage circuits, the contents of this tutorial are also applicable to new growth areas like home automation or power management for renewable energy.

TUTORIAL III
Wednesday, April 25, 1:30 PM ~ 4:30 PM
Mezzanine A+B
Chair: Ping-Hsuan Hsieh, National Tsing Hua University, Taiwan
CMOS Transceivers for Optical Interconnects
Samuel Palermo
Texas A&M University, USA
This tutorial provides an overview of potential opportunities for the use of optics to enable compute systems to continue performance scaling well into the next decade. Current processors are limited by both inter- and intra-chip interconnect bandwidth, latency, and power consumption. In contrast, typical optical channels, including glass fibers and on-chip waveguides, display signal loss characteristics which vary by only fractions of a dB over wide wavelength ranges (tens of nanometers), allowing for efficient data transmission of several Tb/s at the speed of light. This tutorial will discuss how advanced optical devices and photonic transceiver circuit designs can provide potential advantages in current inter-chip (I/O) interconnects and future intra-chip interconnects for applications such as global clock distribution and photonic networks-on-chip (NoCs). The first part of the tutorial will give an overview of optical devices suitable for applications ranging from current optical modules to future silicon photonics. Next, circuit implementations of key transceiver circuits are detailed. The tutorial concludes with a discussion of implementation and testing challenges associated with optical interconnect systems.
