IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS

Power Reduction Techniques for LDPC Decoders

Ahmad Darabiha, Student Member, IEEE, Anthony Chan Carusone, Member, IEEE, and Frank R. Kschischang, Fellow, IEEE

Abstract—This paper investigates hardware architectures for low-density parity-check (LDPC) decoders amenable to low-voltage and low-power operation. First, a highly-parallel decoder architecture with low routing overhead is described. Second, we propose an efficient method to detect early convergence of the iterative decoder and terminate the computations, thereby reducing dynamic power. We report on a bit-serial fully-parallel LDPC decoder fabricated in a 0.13-µm CMOS process and show how the above techniques affect the power consumption. With early termination, the prototype is capable of decoding with 10.4 pJ/bit/iteration, while performing within 3 dB of the Shannon limit at a BER of $10^{-5}$ and with 3.3 Gbps total throughput. If operated from a 0.6-V supply, the energy consumption can be further reduced to 2.7 pJ/bit/iteration while maintaining a total throughput of 648 Mbps, due to the highly-parallel architecture.

Index Terms—Channel coding, low-density parity-check codes, very-large-scale integration, iterative message passing, 10 Gigabit Ethernet.

I. INTRODUCTION

LDPC codes [1] have been adopted for several new digital communication standards due to their excellent error correction performance, freedom from patent protection, and inherently-parallel decoding algorithm [2]–[4]. Most of the research on LDPC decoder design so far has focused on code designs, decoding algorithms, and decoder architectures that improve decoder throughput. Fewer papers have discussed low-power architectures for LDPC decoders. Analog decoders have been proposed for low-power decoding of LDPC [5] and Turbo codes [6]. However, analog decoders have only been demonstrated on codes with block lengths less than 250 bits.
Scaling analog decoders to longer block lengths will be complicated by device mismatches and the need to store and buffer hundreds of analog inputs to the decoder. The performance of such short block-length codes is insufficient for the targeted applications, and the throughput of analog decoders is limited to less than 50 Mbps. In nanoscale CMOS processes, digital LDPC decoders appear to be the best solution for future communication applications that demand performance near the limits of channel capacity.

In this paper, we discuss techniques for low-power digital LDPC decoders. First, in Section II, a highly-parallel decoder architecture with low routing overhead is described. The parallelism permits operation from a low supply voltage, thereby providing low power consumption. Second, in Section III, we investigate an early termination scheme to reduce power consumption by stopping the decoding iterations as soon as a valid codeword is detected. Finally, Section IV reports results from a prototype bit-serial fully-parallel LDPC decoder fabricated in a 0.13-µm CMOS process.

A. Darabiha, A. Chan Carusone, and F. R. Kschischang are with The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto M5S 3G4, Canada (email: ahmadd@eecg.utoronto.ca; tcc@eecg.utoronto.ca; frank@comm.utoronto.ca).

II. LOW-POWER PARALLEL DECODERS

A. Background

LDPC codes are a sub-class of linear error control codes and can be described as the null space of a sparse $\{0, 1\}$-valued parity-check matrix $H$. They can also be described by a bipartite graph, or Tanner graph, in which check nodes $\{c_1, c_2, \ldots, c_C\}$ represent the rows of $H$ and variable nodes $\{v_1, v_2, \ldots, v_V\}$ represent the columns. An edge connects check node $c_m$ to variable node $v_n$ if and only if $H_{mn}$ is nonzero. A code is called $(d_v, d_c)$-regular if every column and every row of $H$ has $d_v$ and $d_c$ ones, respectively. As an example, Fig. 1 shows the Tanner graph for a $(3, 6)$-regular LDPC code with $V = 10$ variable nodes and $C = 5$ check nodes.

[Fig. 1. LDPC code Tanner graph, with check nodes $c_1, \ldots, c_5$ and variable nodes $v_1, \ldots, v_{10}$; the variable nodes receive their inputs from the channel.]

Min-sum decoding [7] is a type of iterative message-passing decoding that is commonly used in LDPC decoders due to its simplicity and good BER performance. Each decoding iteration consists of updating and transferring extrinsic messages between neighboring variable and check nodes. A message is a belief about the value of the corresponding received bit and is expressed in the form of a log-likelihood ratio (LLR). At the beginning of min-sum decoding, the variable nodes pass the LLR value of the received symbols (i.e., the intrinsic message) to all the neighboring check nodes. Each iteration then consists of a check update phase followed by a variable update phase. During the check update phase, the outgoing message on each edge of the check node is calculated as a function of the incoming messages from all the other edges: the magnitude of the output is the minimum of the input magnitudes, and the sign is the parity of the signs of the inputs. During the variable update phase, the outgoing message on

