Built-In Self-Repair - Laboratory for Reliable Computing


11.07.2015 Views

Built-In Self-Repair
Cheng-Wen Wu 吳 誠 文
Lab for Reliable Computing
Dept. Electrical Engineering
National Tsing Hua University



Outline
• Motivation & Goals
• Memory Redundancy Repair
• Repair Rate Analysis
• Built-In Redundancy Analysis (BIRA)
• Built-In Self-Repair (BISR)
  − BISR Architecture
  − BISR Procedure
• Processor Based BISR
• Conclusions
m07bisr10.05 Cheng-Wen Wu, LARC, NTHU


Motivation
• Embedded memories are the most widely used cores
  − Memory cores dominate the yield of SOC
  − Redundancy repair is an effective yield-enhancement technique for memories
• Embedded memory repair using external ATE is difficult and expensive
• Built-in self-repair (BISR) is gaining popularity for embedded memories
  − Yield improvement
  − Built-in self-test (BIST) is required


From IC/Board Test to Core/SOC Test
• Cores are untested components
• Test access of cores needs to be developed
• Heterogeneous cores complicate test requirement
• Core-level test is not available
• Total test time can be extremely long
• At-speed/functional test is hard
• Diagnostics is a requirement
  − Product development
  − Yield enhancement
  − Business success
[Figure: PCB versus SOC; block labels include MPEG, ROM, uP, SRAM, ASIC, DSP, FPGA, Glue Logic, A/D Block]


ITRS Embedded-Memory Test-Requirements
• International Technology Roadmap for Semiconductors (ITRS), 2007-2009
• More row & column spares, and both divided & shared spares for segments
• High-speed portion: Memory-embedded Counters and Data-comparators
• Low-speed portion: Shared Scheduling, Pattern programming, etc.
• Testability-aware high-level synthesis for memory grouping and parallel access
• New test-oriented and parallel-test architectures

Year of Production: 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024
Embedded SRAM
  Repairing Mechanism (RC = Row & Col, RCM = RC More, M = More Sophisticated): RC RC RC RC RC RCM RCM RCM M M M M M M M M M M
  Area Investment of BIST/BISR/BISD (Kgates/Mbits): 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35
  Standardized Fast Test I/F (S = Some, P = Partially, F = Fully): S S S S S P P P P P P F F F F F F F
  Capacity (Mbits) (ITRS08): 0.5 1 1 1 2 2 2 4 4 4 8 8 8 16 16 16 NA NA
  Capacity (Mbits) (ITRS09): NA NA 64 64 128 128 128 256 256 256 512 512 512 1024 1024 1024 2048 2048
  DFT: BIST/BISR
Embedded DRAM
  Capacity (Mbits): 256 512 512 512 1024 1024 1024 2048 2048 2048 4096 4096 4096 4096 4096 8192 8192 8192
  DFT: BIST/BISR
Embedded Flash
  Capacity (Mbits) (ITRS08): 64 128 128 128 256 256 256 512 512 512 1024 1024 1024 1024 2048 2048 NA NA
  Capacity (Mbits) (ITRS09): NA NA 64 64 128 128 128 256 256 256 512 512 512 1024 1024 1024 2048 2048
  DFT: BIST/BISR/DAT
Legend (cell shading): Manufacturable and optimized solutions exist; Manufacturable solutions are known; Interim solutions are known; Manufacturable solutions are NOT known


Dominant Fail Mechanism in 65nm SRAM
• Vmin Drift
  − Vmin: minimum operating voltage of cells
  − Parametric sensitivity; not associated with a pattern defect
  − Single cell fails
Source: M. Ball et al., IEDM06, TI; J. P. Bickford et al., ASMC08, IBM


Yield Life Cycle Curve
[Figure: yield learning curve, yield (up to 1) versus time, through the Design, Yield ramp up, and Volume Production phases; annotations: design yield optimization, fab yield optimization]
What has it to do with DFT?
Source: Y. Zorian, DAC02


Infrastructure IP (IIP)
• Ensures quality, manufacturability, and reliability
  − Not function IP (FIP)
• Basic IIP types:
  − DFT/BIST IP
  − Diagnosis/debugging IP
  − BIRA/BISR IP
  − Characterization/measurement IP
  − Process monitoring IP
  − Robustness/fault tolerance IP
• Can be on wafer, in SOC, distributed over FIP, or integrated into FIP


NTHU-FTC BIST Architecture


Test Mode
• In Test Mode the BIST circuit runs a fixed algorithm for production test and repair
  − Only a few pins need to be controlled, and BGO reports the result (Go/No-Go)


Fault Analysis Mode
• In Fault Analysis Mode, we can apply a longer March algorithm for diagnosis
  − FSI captures the error information of the faulty cells
EOP format: [Figure: EOP format]


Redundancy and Repair
• Problem: We keep shrinking the RAM chip feature size and increasing its density and capacity. How do we maintain the yield?
• Solutions:
  − Fabrication
    ∗ Material, process, equipment, etc.
  − Design
    ∗ Device, circuit, etc.
  − Redundancy and repair
    ∗ On-line
      ◊ EDAC (extended Hamming code; product code)
    ∗ Off-line
      ◊ Spare rows, columns, blocks, etc.


From BIST to BISR
[Figure: progression BIST, BISD, BIRA, BISR]
• BIST: built-in self-test
• BIECA: built-in error catch & analysis
  − BISD: built-in self diagnosis
  − BIRA: built-in redundancy analysis
• BISR: built-in self-repair


RAM Built-In Self-Repair (BISR)
[Figure: block diagram with RAM, Spare Elements, BIST, Redundancy Analyzer, and Reconfiguration Mechanism]


Redundancy Architectures


RAM Redundancy Allocation
• 1-D: spare rows (or columns) only
  − SRAM
  − Algorithm: Must-Repair
• 2-D: spare rows and columns (or blocks)
  − Local and/or global spares
  − NP-complete problem
  − Conventional algorithm:
    ∗ Must-Repair phase
    ∗ Final-Repair phase
      ◊ Repair-Most (greedy) [Tarr et al., 1984]
      ◊ Fault-Driven (exhaustive, slow) [Day, 1985]
      ◊ Fault-Line Covering (b&b) [Huang et al., 1990]


RAM Redundancy Allocation (cont'd)
• Optimal 2D redundancy allocation is NP-complete [Kuo, et al., IEEE D&T87]
  − Exhaustive
    ∗ Software [Lin, et al., IEEE TR, 06/06]
    ∗ Hardware
      ◊ Sequential [Ohler, et al., ETS07]
      ◊ Parallel [Kawagoe, et al., ITC00] (bit-oriented) [Du, et al., VLSID04] (word-oriented)
  − Heuristics for reasonable resource [IEEE TR, 11/03]
    ∗ Local Optimal
    ∗ Greedy: Repair Most
      ◊ Local Repair Most
      ◊ Essential Spare Pivoting ~ Repair Threshold
[Figure: a corner fault map and the spare allocations produced by LO, LRM, ESP, and Th]


An SRAM with BISR
Source: Kim et al., ITC98


DRAM Redundancy Example I
• 4 local spare rows per block
• 2x4=8 global spare columns


DRAM Redundancy Example II
[Figure: memory hierarchy labeled Group, Bank, Block, and Domain, with IO1 to IO4; rows are marked Normal, Global Spare Row, Limited Spare Row, or Local Spare Row]


Example for Complex Row/Col Spares
• Unit-Block (UB)
• Spare Row
  − 4-bit wide, 1-bank long
  − 8 spare Rows per Half-bank
• Spare Column (Col)
  − 1-bit wide, 1-UB-pair long
  − 2 spare Cols per UB-pair
• How do we represent the spares?
Source: VTTW08


Parameterized Extensible Unit
• Unit-Block Size (bits)
  − [UnitColLength, UnitRowLength]
• Spares per Unit-Block
  − [UnitCol, UnitRow] = [2, 8]
• Spares per Grouping
  − [SegCol, SegRow] = [1, 4]
[Figure: the unit block within the entire memory]
Source: VTTW08


Col-Direction Extensibility Expression
• Col-Direction Levels
  − [2, 2, 4]
• UnitRow Perpendicular Ext. (Spare Row allocating range)
  − [0, 0, 1]
• UnitCol Parallel Ext. (Spare Col covering range)
  − [0, 1, 0]
Source: VTTW08


Row-Direction Extensibility Expression
• Row-Direction Levels
  − [2, 8]
• UnitRow Parallel Ext. (Spare Row covering range)
  − [0, 1]
• UnitCol Perp. Ext. (Spare Col allocating range)
  − [0, 0]
Source: VTTW08


Redundancy Analysis
• RA algorithm is the main factor affecting repair efficiency
[Figure: faulty memory] Method 1: Row first. Cannot repair.


Redundancy Analysis
• RA algorithm is the main factor affecting repair efficiency
[Figure: faulty memory] Method 2: Column first. Cannot repair.


Redundancy Analysis
• RA algorithm is the main factor affecting repair efficiency
[Figure: faulty memory] Method 3: Greedy. Can be repaired.
RA is important


Redundancy Analysis Simulation
[Figure: simulation flow with blocks Memory, Defect Injection, Fault Translation, Faulty Memory, Test Algorithm Simulation, Fail bit map and sub-maps, RA Algorithm, Spare Elements, RA Simulation, Result]
Ref: MTDT02


Memory Specification File
    # Memory Configuration
    Oriented = 1            # word-oriented or bit-oriented
    Word_Length = 16
    Block_Size = 256x16
    Block_Count = 4
    # Redundancy Design
    Spare_Rows = 2
    Spare_Columns = 8
    # Defects and Faults Prob.
    Random_Defects = 20
    Faulty_Rows = 15
    Faulty_Columns = 10
    Cluster_Faults = 5
    # Test Algorithms
    March = { a = 0000000000000000 ⇓ (wb) ⇓ (rb, wa) ⇑ (ra, wa) ⇑ (ra) }


Defect Injection
• Only point defects are assumed (no bulk defects)
• Defect count distributions:
  − Poisson, Gamma, Negative Binomial, etc.
• Defect locations:
  − Randomly distributed on wafer/die
• Defect distribution is process dependent
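A minimal sketch of the defect-injection step, assuming a Poisson defect count and uniformly random locations inside one block. The function names, the seed, and the 256x16 block with a mean of 20 defects (borrowed from the specification-file slide) are illustrative assumptions, not the simulator's actual interface.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def inject_defects(rows, cols, mean_defects, rng):
    """Return point-defect locations (row, col) for one memory instance."""
    n_defects = poisson_sample(mean_defects, rng)
    return [(rng.randrange(rows), rng.randrange(cols)) for _ in range(n_defects)]


rng = random.Random(2005)  # arbitrary fixed seed so a run can be repeated
defects = inject_defects(rows=256, cols=16, mean_defects=20, rng=rng)
```

A Gamma or Negative Binomial count could be swapped in by replacing `poisson_sample`; the location step stays the same.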


Fault Translation
• Defects lead to:
  − Faulty address decoder
  − Faulty sense amplifier
  − Faulty cells due to, e.g., coupling between bit-lines or word-lines
• Defects are translated to:
  − Single cell fault
  − Faulty row
  − Faulty column
  − Cluster fault
  − Etc.
• User can set the probability of each fault type


Fault Translation: Failure Patterns
• Single Bit
• Double Bits
• Partial Row
• Single Row
• Double Rows
• Partial Column
• Single Column
• Double Columns
Source: VTTW08


Fault Translation: Distribution
• Assumption
  − A defect results in a failure pattern
    ∗ PatternNum = DefectNum = DefectDensity x MemoryArea

Detailed failure patterns:
Failure Pattern            Probability (%)
Single Bit                 25.205
Double Bits                3.767
Partial Row                8.563
Single Row                 0.153
Double Row                 0.956
Partial Column             56.478
Single Column              2.187
Bit-Line Failure           0.093
Sense Amplifier Failure    2.331
Double Column              0.267

Merged failure patterns:
Failure Pattern            Probability (%)
Single Bit                 25.205
Double Bits                3.767
Single Row                 8.716
Double Row                 0.956
Single Column              61.089
Double Column              0.267
Source: VTTW08


Failure Bitmap
• Memory architecture
  − 2K x 8K
  − Same row/column structure as previous example
• Injected defects
  − 80 random defects
  − Same failure pattern distribution as previous example
Source: VTTW08


Test Algorithm Simulation
• Fault information collected on-line
• Each Read operation provides different information
[Figure: fail information gathered by the reads in the operation sequence w0 r0 w1 r1 w0 r0 w1 r1 w0 r0]
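The idea that each read contributes its own piece of fault information can be sketched with a small March simulation. This is only an illustration under stated assumptions: the fault model is limited to stuck-at cells, the test is MATS+ rather than the March algorithm in the specification file, and `run_march` is a hypothetical helper.

```python
def run_march(size, stuck_at, elements):
    """Apply March elements to a bit-oriented memory with stuck-at cells.

    stuck_at maps an address to the value the cell is stuck at (assumed
    fault model). Each element is (direction, ops) with ops such as "r0"
    or "w1". Returns (element index, op index, address) per failing read.
    """
    memory = [0] * size
    fails = []
    for e_idx, (direction, ops) in enumerate(elements):
        addresses = range(size) if direction == "up" else range(size - 1, -1, -1)
        for addr in addresses:
            for o_idx, op in enumerate(ops):
                kind, value = op[0], int(op[1])
                if kind == "w":
                    # A stuck cell ignores the written value.
                    memory[addr] = stuck_at.get(addr, value)
                elif memory[addr] != value:
                    fails.append((e_idx, o_idx, addr))
    return fails


# MATS+: {any(w0); up(r0, w1); down(r1, w0)}
MATS_PLUS = [("up", ["w0"]), ("up", ["r0", "w1"]), ("down", ["r1", "w0"])]
fails = run_march(8, {3: 1, 5: 0}, MATS_PLUS)
```

Here the stuck-at-1 cell is caught by the `r0` of the second element and the stuck-at-0 cell by the `r1` of the third, so the position of the failing read already hints at the fault type.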


RA Algorithm Evaluation
• Our RA simulator evaluates different RA algorithms and reports the respective repair rates and area overheads
• The RA algorithms are provided by the user
  − Using function calls
• Spare element types:
  − Row/column
  − Global/local
  − Shared/non-shared
Exercise: What are the factors that determine the final yield of the memory?


DRAM Constrained Repair Algorithm
1. Must-Repair Column Phase (32 fails in a column)
2. Must-Repair Row Phase (4 fails in a row segment)
3. Fail-Count Weighted-Sort Repair-Most Phase
   1. Column Repair Most (with threshold > 2)
   2. Column Repair Most (with threshold > 1, FaultyRow/Domain Weighted)
   3. Row Repair Most (with threshold > 1, Use Global Spare Rows)
4. Must-Repair Row Phase (Loop for SpareColumn-Lacked ColumnGroup)
5. Constrained Orthogonal Repair ((FaultCount - AvailSpareRow)/ColGrp Compared)
6. Row Repair Constraints Check (Share-Amp, Fix-Domain)
[Figure: memory hierarchy labeled Group, Block, and Domain, with IO1 to IO4; rows are marked Normal, Global Spare Row, Limited Spare Row, or Local Spare Row]


Repair Rate Evaluation
• Repair Rate = (Number of repaired memories) / (Number of faulty memories)
• 3-D plot for repair rate
[Figure: "Repair Analysis" 3-D plot of repair rate (%) versus spare row count (1 to 10) and spare column count (1 to 10)]


Simulation Time Reduction
• Criterion 1: $N_{\oplus} > N_{sr} + N_{sc}$ ⇒ unrepairable
• Criterion 2: $N_{sr} + N_{sc} - N_{fr} - N_{fc} \ge N_{sfc}$ ⇒ repairable
  − $N_{\oplus}$: orthogonal cell count
  − $N_{sr}$: spare row count
  − $N_{sc}$: spare column count
  − $N_{fr}$: faulty row count
  − $N_{fc}$: faulty column count
  − $N_{sfc}$: single faulty cell count
[Figure: a faulty memory with $N_{\oplus} = 8$]
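The two screening criteria can be sketched as a quick pre-check that runs before the full RA algorithm. This is a hedged sketch: `count_orthogonal` and `quick_screen` are hypothetical names, Criterion 2 is read with "greater than or equal", and the per-type check `enough_lines` is an added safeguard that is not on the slide.

```python
from collections import Counter


def count_orthogonal(faults):
    """Count faulty cells that share no row and no column with another fault."""
    rows = Counter(r for r, _ in faults)
    cols = Counter(c for _, c in faults)
    return sum(1 for r, c in faults if rows[r] == 1 and cols[c] == 1)


def quick_screen(n_orth, n_sr, n_sc, n_fr, n_fc, n_sfc):
    """Return a verdict without running the full redundancy analysis."""
    # Criterion 1: every orthogonal cell needs a spare of its own.
    if n_orth > n_sr + n_sc:
        return "unrepairable"
    # Added safeguard (assumption): faulty rows need spare rows and
    # faulty columns need spare columns.
    enough_lines = n_sr >= n_fr and n_sc >= n_fc
    # Criterion 2: spares left after the faulty lines cover the single cells.
    if enough_lines and n_sr + n_sc - n_fr - n_fc >= n_sfc:
        return "repairable"
    return "run full RA"
```

Only memories that return "run full RA" need the expensive analysis, which is where the simulation time is saved.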


Bit-Oriented Memory Example
• Size: 2048x55
• Single block
• Defects: 1-20 (Poisson distribution)
• Faulty rows: 15%
• Single faulty cells: 85%
• Spare rows: 1-10
• Spare columns: 1-10


Experimental Result
(R: spare rows; C: spare columns; RR: repair rate; Area: overhead in cells)

 R   C   RR       Area   |   R   C   RR       Area
 1   8   75.26%   16439  |   6   4   82.12%    8522
 7   2   75.52%    4481  |   4   6   82.20%   12508
 9   1   75.61%    2543  |   3   7   82.55%   14501
 2   7   75.78%   14446  |   5   5   82.81%   10515
 3   6   75.87%   12453  |   9   2   85.24%    4591
 4   5   76.13%   10460  |   1  10   85.76%   20535
 5   4   76.74%    8467  |   2   9   86.55%   18542
 6   3   76.91%    6474  |   5   6   86.98%   12563
10   1   78.12%    2598  |   3   8   87.29%   16549
 1   9   81.15%   18487  |   4   7   87.33%   14556
 8   2   81.34%    4536  |   8   3   87.41%    6584
 2   8   81.51%   16494  |   7   4   87.67%    8577
 7   3   81.67%    6529  |   6   5   87.67%   10570

Target repair rate: 75%


Experimental Result (same table as on the previous slide)
Area overhead is less than 5000 cells


CRESTA
• CRESTA: Comprehensive Real-time Exhaustive Search Test and Analysis
• At-speed BIRA
• Use multiple sub-analyzers to try all possible orders of spare-row/column allocation concurrently
  − Exhaustive; optimal
  − Hardware cost grows rapidly
• Save addresses in CAM for repair
Source: Kawagoe, et al., ITC00


CRESTA Algorithm
• During Memory BIST (for each Sub-Analyzer i):
    If (Fail == 1)
      If ((Column_Addr not in CAM_C) && (Row_Addr not in CAM_R))
        If (No more spare resource)
          UnSuc[i] = 1;
        Else if (current spare resource == row)
          CAM_R_cur = Row_Addr;
          Point to next spare resource;
        Else if (current spare resource == column)
          CAM_C_cur = Column_Addr;
          Point to next spare resource;
• Post Memory BIST:
    If (UnSuc != all '1')
      Repairable: choose a result with UnSuc[i] == 0 as your strategy;
    Else
      Unrepairable;
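The pseudocode above can be sketched in software as follows. This is an illustrative model only: the hardware runs all sub-analyzers concurrently while this sketch loops over them, Python sets stand in for the CAMs, and the fail stream `FAILS` is a made-up 8-fault example rather than the one drawn on the slides.

```python
from itertools import combinations


def strategies(n_rows, n_cols):
    """All orders in which the spares can be spent, e.g. 'RRCC', 'RCRC'."""
    total = n_rows + n_cols
    return ["".join("R" if i in row_slots else "C" for i in range(total))
            for row_slots in combinations(range(total), n_rows)]


def sub_analyzer(strategy, fail_stream):
    """Model one sub-analyzer; returns (unsuc, cam_r, cam_c)."""
    cam_r, cam_c, nxt = set(), set(), 0
    for row, col in fail_stream:
        if row in cam_r or col in cam_c:
            continue  # fail already covered by an allocated spare
        if nxt == len(strategy):
            return True, cam_r, cam_c  # no more spare resource: UnSuc
        if strategy[nxt] == "R":
            cam_r.add(row)
        else:
            cam_c.add(col)
        nxt += 1  # point to next spare resource
    return False, cam_r, cam_c


def cresta(n_rows, n_cols, fail_stream):
    """Return the first successful (strategy, rows, cols), or None."""
    for strategy in strategies(n_rows, n_cols):
        unsuc, cam_r, cam_c = sub_analyzer(strategy, fail_stream)
        if not unsuc:
            return strategy, sorted(cam_r), sorted(cam_c)
    return None


# Hypothetical fail stream in detection order (row, col).
FAILS = [(0, 0), (0, 2), (0, 4), (1, 1), (1, 3), (1, 4), (3, 0), (4, 2)]
result = cresta(2, 2, FAILS)
```

With two spare rows and two spare columns, `strategies(2, 2)` yields six orders, one per sub-analyzer, which is why the hardware cost grows quickly with the number of spares.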


Example
• A 5x5 memory cell array
  − 2 spare columns
  − 2 spare rows
  − 8 faulty cells
• 6 possible BIRA strategies
  − RRCC
  − RCRC
  − RCCR
  − CRRC
  − CRCR
  − CCRR
[Figure: 5x5 array with faulty cells numbered 1 to 8]


Sequence of Repair Analysis
[Figure: per-strategy fault maps; spares allocated so far]
1. R-R-C-C: R R C
2. R-C-R-C: R C R C (Use up!)
3. R-C-C-R: R C C
4. C-R-R-C: C R R C (Use up!)
5. C-R-C-R: C R C R (Use up!)
6. C-C-R-R: C C R R (Use up!)


Sequence of Repair Analysis
[Figure: per-strategy fault maps; spares allocated so far]
1. R-R-C-C: R R C C (Use up!)
2. R-C-R-C: R C R C (UnSuc!)
3. R-C-C-R: R C C R (UnSuc!)
4. C-R-R-C: C R R C (UnSuc!)
5. C-R-C-R: C R C R (UnSuc!)
6. C-C-R-R: C C R R (UnSuc!)


Sequence of Repair Analysis
[Figure: per-strategy fault maps; final result]
1. R-R-C-C: R R C C (Pass!)
2. R-C-R-C: R C R C (UnSuc!)
3. R-C-C-R: R C C R (UnSuc!)
4. C-R-R-C: C R R C (UnSuc!)
5. C-R-C-R: C R C R (UnSuc!)
6. C-C-R-R: C C R R (UnSuc!)


FSM of Sub-Analyzer (RRCC)
[Figure: state diagram with states Start, R1, R2, C1, C2, and UnSuc]
Transition labels: [0] Initialize; [1] Fail; [2] Pass; [3] Fail & Match; [4] Fail & Mismatch


CRESTA Architecture
[Figure: Row Address and Column Address feed a row CAM array and a column CAM array (with WE); the Fail signal drives the FSMs of the sub-analyzers, each producing an UnSuc output]


Exercise
1. What is the number of sub-analyzers if there are 4 spare rows and 6 spare columns?
2. Derive the hardware complexity and time complexity of CRESTA with respect to number of spares and faulty cells.
3. Is it possible to reduce the number of sub-analyzers and still guarantee optimal solution?
   • Prove it and give an example.
4. Can CRESTA be used for word-oriented memories?
   • Prove it and give an example.


1-D Array Architecture
• Address bits: 6
• Word length: 4
• Column Repair Vector (CRV)
[Figure: 64 rows (0 to 63) by 4 data columns D0 to D3; row k holds the bits of address Ak]
Source: Powell, et al., ITC03


Word-Oriented BIRA Algorithm
• During Memory BIST:
    If (Fail_Map != 0) /* Fail_Map = Fault Signature */
      If ((Fail_Map not in CRV) && (Row_Addr not in Spare_Rows))
        If (No more spare resource)
          Unrepairable;
        Else if (current spare resource == row)
          Spare_Row = Row_Addr;
          Point to next spare resource;
        Else if (current spare resource == column)
          CRV = CRV | Fail_Map;
          Point to next spare resource;
• Post Memory BIST:
    If ((!unrepairable) && (# of 1s in CRV


Example of C-C-R-R
[Figure: Fail_Map sequence and the resulting CRV under strategy C-C-R-R]
# of 1s in CRV = 5 > 2: Unrepairable!


Example of C-R-R-C
[Figure: Fail_Map sequence and the resulting CRV under strategy C-R-R-C]
# of 1s in CRV = 2


Interleaved Architecture
• Address bits: 6
• Word length: 2
• MUX-level: 2
• Row address = Address / MUX-level
• Column address = Address % MUX-level
• Interleaving: to avoid intra-word coupling faults
[Figure: 32 rows (0 to 31) by 4 physical columns D0, D0, D1, D1; row k holds A(2k), A(2k+1), A(2k), A(2k+1)]
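The address split on this slide is a plain integer division and remainder. A small sketch, where `split_address` and `physical_column` are hypothetical helper names and the physical-column formula assumes the D0 D0 D1 D1 ordering shown in the figure:

```python
def split_address(address, mux_level):
    """Split a word address into (row_address, column_address)."""
    return address // mux_level, address % mux_level


def physical_column(data_bit, column_address, mux_level):
    """Physical bit-line of one data bit, assuming the D0 D0 D1 D1 layout."""
    return data_bit * mux_level + column_address


row, col = split_address(63, 2)  # last word of the 6-bit address space
```

With MUX-level 2, addresses A62 and A63 land in row 31, and the two bits of any word sit two physical columns apart, which is what keeps a coupling defect between neighboring cells from hitting two bits of the same word.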


Modification for Interleaved Architecture
• Method 1
  − Size of CRV = # of bits in a row
  − Use the same algorithm as in the linear (1-D) architecture
  − Area overhead is high
• Method 2
  − Combine several CRVs into a W-CRV
  − Size of W-CRV = word length + length of MUX-level
  − # of W-CRVs = # of spare columns
  − Compare column address as well as MUX-level


Example 1
[Figure: fail maps per row of an interleaved array and the resulting W-CRV contents (MUX level and CRV fields)]


Example 2
[Figure: fail maps per row of an interleaved array and the resulting W-CRV contents (MUX level and CRV fields)]


Repair Strategy Reconfiguration by LFSR
• By using LFSR
• "1" == Spare Row, "0" == Spare Column
• 1001 → 1010 → 0101 → 1100 → 0110 → 0011

Spare Resources   Strategy Polynomial   Initial Strategy
1R1C              1+x^2                 10
1R2C              1+x^3                 100
2R1C              1+x^3                 110
2R2C              1+x+x^2+x^4           1001


Repair Strategy Selection Algorithm
[Figure: selection among repair strategies written as R/C sequences]
Exercise: What other heuristics can you use to improve the performance and reduce the cost?


Definitions
• Faulty line: row or column with at least one faulty cell
• A faulty line is covered if all faulty cells in the line are repaired by spare rows and/or columns
• A faulty cell not sharing any row or column with any other faulty cell is an orthogonal faulty cell
• r: number of (available) spare rows
• c: number of (available) spare columns
• F: number of faulty cells in a block
• F': number of orthogonal faulty cells in a block


Example Block with Faulty Cells


Repair-Most (RM)
1. Run BIST and construct bitmap
2. Construct row and column error counters
3. Run Must-Repair algorithm
4. Run greedy final-repair algorithm
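The four steps can be sketched as below, assuming the bitmap is already available as a list of faulty (row, column) cells. This is a simplified illustration: `repair_most` is a hypothetical function, the Must-Repair condition is re-checked inside the greedy loop instead of running as a separate phase, and ties are broken arbitrarily (rows before columns, larger index first).

```python
from collections import Counter


def repair_most(faults, r, c):
    """Return (spare_rows, spare_cols), or None when the spares run out."""
    remaining = set(faults)
    used = {"R": [], "C": []}
    limit = {"R": r, "C": c}
    while remaining:
        free = {k: limit[k] - len(used[k]) for k in used}
        # Row and column error counters over the not-yet-covered faults.
        row_cnt = Counter(row for row, _ in remaining)
        col_cnt = Counter(col for _, col in remaining)
        lines = [(n, "R", i) for i, n in row_cnt.items()]
        lines += [(n, "C", i) for i, n in col_cnt.items()]
        # Must-Repair: a row with more faults than the free spare columns
        # can only be fixed by a spare row (and vice versa for columns).
        must = [ln for ln in lines if ln[0] > free["C" if ln[1] == "R" else "R"]]
        usable = [ln for ln in lines if free[ln[1]] > 0]
        pool = must or usable
        if not pool:
            return None
        _, kind, index = max(pool)  # greedy: the line with the most faults
        if free[kind] == 0:
            return None  # a must-repair line has no spare of its kind left
        used[kind].append(index)
        axis = 0 if kind == "R" else 1
        remaining = {f for f in remaining if f[axis] != index}
    return sorted(used["R"]), sorted(used["C"])


FAULTS = [(1, 0), (1, 6), (2, 4), (3, 4), (5, 1), (5, 2), (7, 3)]
solution = repair_most(FAULTS, r=2, c=2)
```

The fault list reuses the coordinates from the ESP example slide; the choice of two spare rows and two spare columns is an assumption made for the illustration.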


Worst-Case Bitmap (After Must-Repair)
• Max F = 2rc
• Max F' = r+c
• Bitmap size: (rc+c)(cr+r)
[Figure: worst-case bitmap for r=2, c=4]
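For the r=2, c=4 case in the figure, the bounds work out as in this small sketch (`worst_case_bitmap` is a hypothetical helper that just evaluates the three expressions on the slide):

```python
def worst_case_bitmap(r, c):
    """Evaluate the worst-case bounds that hold after Must-Repair."""
    max_faults = 2 * r * c          # Max F
    max_orthogonal = r + c          # Max F'
    side_a, side_b = r * c + c, c * r + r   # the two bitmap dimensions
    return max_faults, max_orthogonal, side_a, side_b


max_f, max_f_orth, side_a, side_b = worst_case_bitmap(r=2, c=4)
bitmap_cells = side_a * side_b
```

So even two spare rows and four spare columns call for a 12 by 10 bitmap of 120 cells, which motivates the smaller local bitmaps used by the schemes that follow.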


Local Repair-Most (LRM)
• RM is not good enough for embedded RAM
  − Large storage requirement: bitmap and counters
  − Slow
• LRM improves the performance
  − Repair-Most based
  − Improved heuristics
  − Early termination rules
  − Concurrent BIST and BIRA
  − No separate Must-Repair phase
• LRM reduces the storage required
  − Smaller local bitmap
    ∗ From (rc+c)x(cr+r) to mxn
Ref: IEEE TR, 11/03


LRM Algorithm
• Activated by BIST whenever a faulty cell is detected
• Fault Collection (FC)
  − Collects faulty-cell addresses
  − Constructs local bitmap
  − Counts row and column errors
• Spare Allocation (SA)
  − Allocate spare rows or columns when bitmap is full
  − Allocate spare rows or columns at end


LRM: FC and SA
Faulty cells: (1,0), (1,6), (2,4), (3,4), (5,1), (5,2)


LRM Example
[Figure: local bitmap updates for faulty cells (5,2); (5,4), (5,6), (5,7); (7,3)]


Local Optimization (LO)
• LRM has drawbacks:
  − Selecting line with largest fault count may be slow
  − Multiple lines may need to be selected for repair
  − Area overhead is still high
  − Repair rate depends on bitmap size
• LO has a better repair rate based on same hardware overhead, i.e., a higher repair efficiency
  − Fault Collection (FC)
    ∗ Records faulty cells in bitmap until it is full
  − Spare Allocation (SA)
    ∗ Exhaustive search performed for repairing all faults
  − Bitmap cleared; process repeated until done
Ref: IEEE TR, 11/03


LO: Column*/Row Selection for SA
• Col selection vector: a 1 means that the corresponding col is selected for repair, unless empty (should try all combinations of E)
  1. Col 5 selected for repair
• Row selection vector
  2. Row 5 is selected for repair
* Assume column selection has a lower cost than row selection


LO Example


Essential Spare Pivoting (ESP)
• Maintain high repair rate without using a bitmap
  − Small area overhead
• Fault Collection (FC)
  − Collect and store faulty-cell address using row-pivot and column-pivot registers
    ∗ If there is a match for row (col) pivot, the pivot is an essential pivot
    ∗ If there is no match, store the row/col addresses in the pivot registers
  − If F > r+c, the RAM is irreparable
• Spare Allocation (SA)
  − Use row and column pivots for spare allocation
    ∗ Spare rows (cols) for essential row (col) pivots
  − SA for orthogonal faults
Ref: IEEE TR, 11/03
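A simplified reading of the FC and SA steps can be sketched as below. This is not the published ESP circuit: `esp` is a hypothetical function, the irreparability check is applied to the number of stored pivots, and orthogonal pivots are served with spare rows first and spare columns second, which is an arbitrary choice. The fail stream reuses the coordinates from the ESP example slide, with two spare rows and two spare columns assumed.

```python
def esp(fail_stream, r, c):
    """Return (spare_rows, spare_cols), or None if judged irreparable."""
    pivots = []  # each entry: [row, col, row_is_essential, col_is_essential]
    for row, col in fail_stream:
        matched = False
        for pivot in pivots:
            if pivot[0] == row:      # match on a row pivot
                pivot[2] = matched = True
            if pivot[1] == col:      # match on a column pivot
                pivot[3] = matched = True
        if not matched:
            if len(pivots) == r + c:
                return None          # more pivots than spares
            pivots.append([row, col, False, False])
    rows = [p[0] for p in pivots if p[2]]   # essential row pivots
    cols = [p[1] for p in pivots if p[3]]   # essential column pivots
    if len(rows) > r or len(cols) > c:
        return None
    for p in pivots:                 # orthogonal faults take what is left
        if not (p[2] or p[3]):
            if len(rows) < r:
                rows.append(p[0])
            elif len(cols) < c:
                cols.append(p[1])
            else:
                return None
    return sorted(rows), sorted(cols)


STREAM = [(1, 0), (1, 6), (2, 4), (3, 4), (5, 1), (5, 2), (7, 3)]
allocation = esp(STREAM, r=2, c=2)
```

Only r+c pivot entries are stored, with no bitmap and no error counters, which is the source of the small area overhead claimed on the slide.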


ESP Example
Faulty cells: (1,0), (1,6), (2,4), (3,4), (5,1), (5,2), (7,3)


Cell Fault Size Distribution
Mixed Poisson-exponential distribution


Repair Rate Comparison
• 1,552 RAM blocks
• 1,024x64 bits per block
• r from 6 to 10
• c from 2 to 6
• LRM bitmap: rxc
• LO bitmap: 8x4


Normalized Repair Rate


Repair Rate (r=10)


Normalized Repair Rate (r=6)


Limitation of Repair Improvement (1 Spare)


Limitation of Repair Improvement (4 Spares)


Area Overhead
Overhead is about 5-12% for 16Mb DRAM, r=8, and c=4


Computation Time (Simulated)


Configurable Spare Remapping
[Figure: address mapping between the main memory and the spare elements; the row and column address bits a5 to a0 pass through two Remap blocks that select either the main-memory address (A_M) or a spare-element address (A_S)]


Redundancy Organization
[Figure: segments SEG0 and SEG1, spare rows SR0 and SR1, spare column groups SCG0 and SCG1]
SR: Spare Row; SCG: Spare Column Group; SEG: Segment
Ref: ITC03


BISR Architecture
[Figure: block labels Q, D, A, MAO, BIRA, Wrapper, Main Memory, POR, BIST, Spare Memory]
MAO: mask address output; POR: power-on reset
Ref: ITC03


Power-On BISR Procedure
[Figure: flow. Power On; BIST tests the spare rows and columns and sends error information to BIRA; BIST tests the main memory, sends error information to BIRA, and continues; BIRA provides the masked addresses and the reduced address space; address remapping produces the final address]


Exercise
1. Can we use LRM, LO, or ESP for BIRA in this case?
   • If so, show the block diagram of the design.
   • If not, why?
2. For this particular case, develop a BIRA circuit that you think is cost effective.
3. Compare the proposed address remapping approach with the conventional laser repair method.
4. What are the advantages and disadvantages of power-on BISR?


Down-Graded Operation Mode
• If the spare rows are exhausted, the memory is operated in down-graded mode
  − The size of the memory is reduced
• For example, assume that a memory with multiple blocks is used for buffering and the blocks are chained by pointers
  − If some block is faulty and should be masked, then the pointers are updated to invalidate the block
  − The system still works if a smaller buffer is allowed


Definition
• Subword
  − A subword is consecutive bits of a word
  − Its length is the same as the group size
• Example: a 32x16 RAM with 3-bit row address and 2-bit column address
[Figure: a word with 4 subwords; a subword with 4 bits]


Row-Repair Rules
• To reduce the complexity, we use two row-repair rules
  − A row has multiple faulty subwords
  − Multiple faulty subwords with the same column address and different row addresses
• Examples:
[Figure: two fault maps with the faulty subwords marked]


BIRA Procedure
[Figure: flowchart with steps Run BIST (Done or Detects a fault), Check Row-Repair Rules (Met or Not met), Repair-Most Rules, Check Available Spare Rows, No available spare row, Export Faulty Row Address, Stop]


Repair Rate Analysis
• Repair rate
  − The ratio of the number of repaired memories to the number of defective memories
• A simulator has been implemented to estimate the repair rate of the proposed BISR scheme [Huang, et al., MTDT02]
• Industrial case:
  − SRAM size: 8Kx64
  − # of injected random faults: 1~10
  − # of memory samples: 534
  − RA algorithms: proposed and exhaustive search algorithms


Simulation Results

N_SR  N_SC  N_SCG  RR       1MA  2MA  3MA  4MA  5MA  >5MA  RR (Best)
1     0     0      18.37%   99   191  4    69   45   32    18.54%
1     4     1      73.10%   38   40   35   16   9    7     86.14%
1     8     2      94.43%   5    7    12   1    3    2     99.81%
1     12    3      99.26%   1    1    1    1    0    0     100%
2     0     0      36.55%   192  2    71   46   18   13    37.08%
2     4     1      86.09%   36   16   12   3    8    0     94.01%
2     8     2      99.26%   3    1    0    0    0    0     100%
2     12    3      100%     0    0    0    0    0    0     100%
3     0     0      72.17%   0    75   43   18   7    7     55.06%
3     4     1      96.10%   7    5    4    3    2    0     97.38%
3     8     2      99.81%   1    0    0    0    0    0     100%
3     12    3      100%     0    0    0    0    0    0     100%
4     0     0      72.36%   73   44   18   8    5    1     71.91%
4     4     1      98.52%   4    3    0    0    0    0     98.69%
4     8     2      100%     0    0    0    0    0    0     100%
4     12    3      100%     0    0    0    0    0    0     100%
5     0     0      85.90%   44   18   7    6    1    0     85.77%
5     4     1      99.81%   1    0    0    0    0    0     99.81%
5     8     2      100%     0    0    0    0    0    0     100%
5     12    3      100%     0    0    0    0    0    0     100%


An 8Kx64 Repairable SRAM
• Technology: 0.25um
• SRAM area: 6.5 mm^2
• BISR area: 0.3 mm^2
• Spare area: 0.3 mm^2
• HO spare: 4.6%
• HO bisr: 4.6%
• Repair rate: 100% (if # random faults is no more than 10)
• Redundancy: 4 spare rows and 2 spare column groups
• Group size: 4


Repair Rate Comparison (Group Size=2)
[Figure: 3-D plot of repair rate (%) versus spare rows (0 to 10) and spare columns (0 to 10) for the Proposed and Exhaustive algorithms]


Repair Rate Comparison (Group Size=4)
[Figure: 3-D plot of repair rate (%) versus spare rows (0 to 10) and spare columns (0 to 12) for the Proposed and Exhaustive algorithms]


Processor-Based BISR
• Power-on BIST
  − Load the test and RA algorithms into ROM
  − Execute the test algorithm
  − If a fault is detected, execute the RA algorithm and reconfigure the addresses
• Faulty columns/rows are replaced by redundant columns/rows
• All memory cores are tested and repaired if possible before normal operation
• Testing is done at-speed
Ref: ATS03


Processor-Based BISR (cont'd)
• Using the ARM core as the embedded processor
• The SOC platform is based on Advanced High-performance Bus (AHB)
• Components we developed:
  − RAM Selector
  − AHB interface
  − Test Pattern Generator (TPG)
  − Reconfiguration Circuit (RC)


BISR Architecture
[Figure: ARM, ROM, and SMI attached to the AHB through AHB interfaces; a RAM Selector; two RAMs, each with its own TPG and RC]
Ref: ATS03


Programmability
• Test algorithm programming:
  1. Implement the memory test algorithm and redundancy allocation (RA) algorithm in C
  2. Compile the C code into the ARM code
  3. Save the ARM code in the ROM
• Easy modification of the memory test algorithm and RA algorithm


Signals between Components
[Figure: ARM Core, RAM Selector, and two Wrapper/RAM pairs; signal labels: Correct / faulty, Op code, faulty address, Continue, element complete, repaired address, Mode]


State Diagram of the Memory Selector
[Figure: state diagram with states NormalMode, TestMode, and Reconfigure; transition labels: HRESETn, MTest, !MTest, Reconf, Test_finish, !Test_finish & !Reconf, Conf_finish, !Conf_finish]


Memory Wrapper
• Including TPG, RC and AHB interface
[Figure: AHB Command Interface, TPG, RC, comparator, and the memory (MEM); labels include MBS, A, D, Datain, Dataout, F_address, MEMR, MEMQ]


Test Pattern Generator (TPG)
• Receives test commands from the ARM processor
• Generates test signals for the memory core
• Sends faulty-cell addresses back to the ARM processor when faults are detected
• Composed of a Controller and a Waveform Generator


Controller of TPG
[Figure: state diagram with states Idle, Active, and Faulty; transition labels: BIST_active, !BIST_active, fault, !Test_finish & !fault, Test_finish, Continue, !Continue]


Test Command Word Format
Fields of the command word: Command, Continue, Background, Direction

Command  Direction  Background  Continue  Operation
000      0/1        0/1                   wwrwr
001      0/1        0/1                   w
010      X          X           X         Initial all Reg.
011      0/1        0/1                   wr
100      0/1        0/1                   rw
101      0/1        0/1                   r
110      0/1        0/1                   rwr
111      0/1        0/1                   rwrr
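The command table can be read as a lookup from the 3-bit command code to a sequence of read/write operations. A hedged sketch: the dictionary reproduces the table above, while `decode`, its return structure, and the way direction and background are passed through unchanged are illustrative assumptions (the slide does not give the bit positions of the fields).

```python
OPERATIONS = {
    0b000: "wwrwr",
    0b001: "w",
    0b011: "wr",
    0b100: "rw",
    0b101: "r",
    0b110: "rwr",
    0b111: "rwrr",
}
INIT_ALL_REGISTERS = 0b010  # direction/background/continue are don't-care


def decode(command, direction=0, background=0):
    """Turn one test command into a description the TPG model can act on."""
    if command == INIT_ALL_REGISTERS:
        return {"action": "init_registers"}
    return {
        "action": "march_element",
        "ops": list(OPERATIONS[command]),  # one entry per read or write
        "direction": direction,            # 0/1 as given in the command
        "background": background,          # 0/1 as given in the command
    }


element = decode(0b110, direction=1, background=0)
```

A March algorithm is then just a list of such commands issued by the processor, which is what makes the test algorithm easy to change in software.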


Waveform Generator
[Figure: state graph built from single read (r) and write (w) operation states that generates the operation sequence of each test command; transitions are labeled Test Command, Continue != 0, fault, and continue = 0]


Repair Process
• Upon detecting a fault, the ARM core pauses the test process and switches to the RA algorithm
• Three RA algorithms have been implemented
  − Row-first
  − Column-first
  − Must-repair
• After RA, ARM sends repair information to RC if necessary
  − Address remapping is done by RC
• RA algorithm can be replaced
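The address remapping done by the RC can be sketched as a pair of small register files that are compared against every incoming address. This is a behavioral illustration only: the class name `Reconfigurator`, its methods, and the use of `None` for an unset tag are assumptions, not the actual circuit interface.

```python
class Reconfigurator:
    """Behavioral model of row/column address remapping."""

    def __init__(self, spare_rows, spare_cols):
        # One register per spare element; None means the tag is not set.
        self.regs = {"row": [None] * spare_rows, "col": [None] * spare_cols}

    def repair(self, kind, address):
        """Record a faulty row or column; False if no spare is left."""
        regs = self.regs[kind]
        if address in regs:
            return True
        if None not in regs:
            return False
        regs[regs.index(None)] = address
        return True

    def remap(self, row, col):
        """Return the selected (row, col) targets for one access."""
        def select(kind, address):
            regs = self.regs[kind]
            if address in regs:  # comparator hit: use the spare element
                return ("spare", regs.index(address))
            return ("main", address)
        return select("row", row), select("col", col)


rc = Reconfigurator(spare_rows=2, spare_cols=1)
rc.repair("row", 5)
mapped = rc.remap(5, 3)
```

Because the repair information lives in registers, it has to be rebuilt at every power-on, which is the trade-off against a one-time laser repair.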


RC Block Diagram
[Figure: the incoming ADDR is split into R_Add and C_Add; registers hold the repair row and repair column addresses with TAG bits (written via Write R_Enable and Write C_Enable); R_Comparator and C_Comparator drive R_Selector and C_Selector; the Remapping and Data Selection blocks output the Mapped ADDR]


Conclusions
• Repair rate simulation under different redundancy analysis algorithms and spare-element configurations
  − Helps evaluate BIRA algorithms and develop BISR schemes
• BIRA can be done in an efficient way
  − LRM, LO, and ESP
• A power-on BISR scheme for RAM has been presented
  − The BIRA circuit executes the proposed RA algorithm for 2-D redundancy
  − To reduce the complexity and increase the flexibility, the spare columns are grouped and segmented
  − Software-based address remapping is performed in the down-graded operation mode
• An industrial case has been experimented
  − Full repair can be achieved (if # random faults is no more than 10)
  − Only 4.6% area overhead for the 8Kx64 SRAM
• Processor-based BISR scheme has a low area overhead
  − Reuses the on-chip processor
  − TPG Waveform Generator is designed by combining single-operation elements
  − At-Speed testing can be achieved
