Thomas Lenart 2002-10-02<br />

<strong>FFT</strong> <strong>Implementation</strong><br />

Thomas Lenart<br />

Digital ASIC group<br />

Department of Electroscience<br />

Lund University<br />

thomas.lenart@es.lth.se


Introduction<br />

• Application specific<br />

– Architecture<br />

– Number of points<br />

– Word length<br />

– Scaling<br />

• Optimized for<br />

– High throughput<br />

– Flexibility<br />

– Low power<br />

– Chip area


Research projects<br />

1. Radix-2² DIF <strong>FFT</strong> algorithm<br />

2. Acoustic Echo Canceller<br />

3. Flexible OFDM system<br />

4. Hardware acceleration in Digital Holographic Imaging


Radix-2² DIF <strong>FFT</strong> algorithm<br />

Same multiplicative complexity as radix-4, but still retains the radix-2 butterfly structure<br />

…very close to THE architecture for VLSI implementation...
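
The decomposition behind this claim can be sketched in software. The block below is an illustrative recursive radix-2² decimation-in-frequency FFT in Python, not the hardware described in the slides; the function names and the recursive structure are assumptions made for the example. Each recursion level performs two radix-2 butterfly layers. The rotation between them is only a multiplication by a power of -j, and a full twiddle multiplication is needed only once per pair of layers, which is where the radix-4 multiplicative complexity comes from.

```python
import cmath

# (-j)**s for s = 0..3; these rotations need no real multiplier in hardware.
TRIVIAL_ROT = [1, -1j, -1, 1j]

def dft(x):
    """Direct O(N^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]

def fft_r22_dif(x):
    """Radix-2^2 DIF FFT for power-of-two lengths, natural-order output."""
    n = len(x)
    if n == 1:
        return list(x)
    if n == 2:  # leftover radix-2 stage when log2(N) is odd
        return [x[0] + x[1], x[0] - x[1]]
    q = n // 4
    out = [0j] * n
    for k1 in (0, 1):
        sign = -1 if k1 else 1
        for k2 in (0, 1):
            s = k1 + 2 * k2
            sub = []
            for n3 in range(q):
                # First radix-2 butterfly layer (operands N/2 apart).
                a = x[n3] + sign * x[n3 + 2 * q]
                b = x[n3 + q] + sign * x[n3 + 3 * q]
                # Second radix-2 butterfly layer with a trivial rotation.
                h = a + TRIVIAL_ROT[s] * b
                # The only non-trivial multiplication per pair of layers.
                sub.append(h * cmath.exp(-2j * cmath.pi * n3 * s / n))
            y = fft_r22_dif(sub)
            for k3 in range(q):
                out[s + 4 * k3] = y[k3]
    return out

x = [complex(i, -i) for i in range(16)]
error = max(abs(a - b) for a, b in zip(dft(x), fft_r22_dif(x)))
print(error < 1e-9)
```

This version returns the spectrum in natural order because it writes each sub-result to index k1 + 2·k2 + 4·k3 directly; a streaming pipeline would instead produce bit-reversed output.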


Butterfly structure<br />

Multiplier #: log₄(N) - 1<br />

Adder #: 4 log₄(N)<br />

Memory size: N - 1
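
As a quick check of these expressions, the sketch below evaluates them for a few transform sizes. It assumes N is a power of 4 so that log₄(N) is an integer; the function name `pipeline_resources` is made up for this example.

```python
def pipeline_resources(n):
    """Return (multipliers, adders, memory_words) for an N-point pipeline.

    Uses the slide's expressions: log4(N) - 1, 4*log4(N) and N - 1.
    Assumes N is a power of 4 so that log4(N) is an integer.
    """
    stages = 0
    m = n
    while m > 1 and m % 4 == 0:
        m //= 4
        stages += 1
    if m != 1 or stages == 0:
        raise ValueError("n must be a power of 4 (4, 16, 64, ...)")
    return stages - 1, 4 * stages, n - 1

for n in (64, 256, 1024):
    print(n, pipeline_resources(n))
```

For N = 1024 this gives 4 multipliers, 20 adders and 1023 memory words.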


Acoustic Echo Canceller<br />

• Requires several <strong>FFT</strong> units<br />

• Hardware resource sharing between <strong>FFT</strong> units


Acoustic Echo Canceller contd.


<strong>Implementation</strong><br />

[Figure: block diagram with a memory (ports do, di, addr) and a control & twiddle factor ROM]<br />

Advantages<br />

• Flexible<br />

• Single butterfly, single memory<br />

• Single twiddle factor ROM<br />

• Compact design<br />

Disadvantages<br />

• Low throughput
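
The trade-off listed above can be illustrated with a small software model. This is a minimal sketch, assuming a plain radix-2 DIF schedule: one `butterfly` function, one `mem` list and one twiddle `rom` stand in for the single butterfly, single memory and single ROM. All names are hypothetical and the real control logic is not shown in the slides. The pass counter makes the throughput cost visible: every butterfly operation goes through the same unit, one at a time.

```python
import cmath

def make_twiddle_rom(n):
    """Single ROM holding W_N^k = exp(-2*pi*j*k/N) for k = 0..N/2-1."""
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]

def butterfly(a, b, w):
    """The one shared radix-2 DIF butterfly."""
    return a + b, (a - b) * w

def fft_single_butterfly(mem, rom):
    """In-place radix-2 DIF FFT on `mem`; returns the number of butterfly passes.

    The result is left in `mem` in bit-reversed order.
    """
    n = len(mem)
    passes = 0
    half = n // 2
    while half >= 1:
        rom_step = n // (2 * half)  # ROM address stride for this stage
        for start in range(0, n, 2 * half):
            for j in range(half):
                lo, hi = start + j, start + j + half
                mem[lo], mem[hi] = butterfly(mem[lo], mem[hi], rom[j * rom_step])
                passes += 1
        half //= 2
    return passes

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

n = 8
mem = [complex(v) for v in range(n)]
passes = fft_single_butterfly(mem, make_twiddle_rom(n))
spectrum = [mem[bit_reverse(k, 3)] for k in range(n)]  # natural order
print(passes)  # (N/2) * log2(N) = 12 butterfly passes for N = 8
```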


<strong>Implementation</strong> contd.<br />

[Table: architectures (Time multiplexed, Pipeline, 1D, 2D, Data scaling) against optimization targets (Throughput, Low power, Flexibility, Chip area); cell entries not recoverable]


Flexible OFDM system<br />

• Pipelined 32-1024 point I<strong>FFT</strong>/<strong>FFT</strong> processor which can be used in both transmitter and receiver.<br />

• Unused parts are turned off for low power consumption.<br />

[Figure: OFDM transmitter with blocks: flexible coder, constellation mapper with bit loading, 32-1024 point I<strong>FFT</strong> processor, signal reordering and CP insertion, D/A converter]


<strong>Implementation</strong><br />

[Figure: pipeline from Data In to Data Out through Stage 5 (radix-2² and radix-2), then Stages 4, 3, 2 and 1 (radix-2²); Stages 5 to 2 each have a clock gate on the Clock input]<br />

Advantages<br />

• High throughput<br />

• Easy to resize (32-1024 points)<br />

• Easy to turn off unused stages.<br />

• Possible to optimize bit width in each stage<br />

Disadvantages<br />

• Bit reversed output, will need a reordering unit on the output.<br />

• Large (4 complex multipliers and ROMs)
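
How a transform size could map onto the five stages can be sketched as follows. The mapping is a guess made for illustration and is not taken from the slides: it assumes Stage 5 runs as radix-2² for 1024 points, as a single radix-2 stage when log₂(N) is odd, and is gated off otherwise, while Stages 4 to 1 are either active radix-2² stages or gated off. The function name and the mode strings are hypothetical.

```python
def stage_modes(n_points):
    """Map a transform size (power of two, 32..1024) to per-stage modes.

    Hypothetical mapping: stage 5 is radix-2^2, radix-2 or off;
    stages 4..1 are radix-2^2 or off.
    """
    if not 32 <= n_points <= 1024 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two in 32..1024")
    bits = n_points.bit_length() - 1        # log2(N), 5..10
    r22_stages, needs_r2 = divmod(bits, 2)  # radix-2^2 stages, odd leftover
    modes = {}
    if r22_stages == 5:
        modes[5] = "radix-2^2"
        r22_stages = 4  # the remaining radix-2^2 stages are 4..1
    elif needs_r2:
        modes[5] = "radix-2"
    else:
        modes[5] = "off"
    for stage in (4, 3, 2, 1):
        modes[stage] = "radix-2^2" if stage <= r22_stages else "off"
    return modes

for n in (32, 256, 512, 1024):
    print(n, stage_modes(n))
```

Under this assumed mapping a 32-point transform keeps only Stage 5 (as radix-2) and Stages 2 and 1 active, so the clocks of Stages 4 and 3 can be gated.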



Hardware acceleration in Digital Holographic Imaging<br />

• 2D <strong>FFT</strong> calculations up to 2048x2048 complex points<br />

• Nearly constant internal word length by using data scaling<br />

• Capable of handling a wide range of input signals<br />
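
A 2D transform is separable, so it can be computed with 1D transforms over the rows followed by 1D transforms over the columns. The sketch below shows that row-column scheme in Python, with a direct DFT standing in for the 1D FFT unit; it is an illustration under those assumptions, not the accelerator's data flow, and the function names are made up.

```python
import cmath

def dft(x):
    """Direct 1D DFT, standing in for a hardware FFT unit."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]

def fft2d(image, transform=dft):
    """2D transform: 1D transform of every row, then of every column."""
    rows = [transform(list(r)) for r in image]
    cols = [transform(list(c)) for c in zip(*rows)]  # iterate over columns
    return [list(r) for r in zip(*cols)]             # transpose back

image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
spectrum = fft2d(image)
print(spectrum[0][0])  # DC term: the sum of all samples
```

Any 1D FFT routine with the same list-in, list-out interface can be passed as `transform`.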


Data scaling approaches<br />

• Block floating point (BFP)<br />

• Convergent block floating point (CBFP)<br />

[Figure: block diagrams of the scaling approaches. Recoverable labels: Memory (ports do, di, addr), Control & twiddle factor ROM, Stage 1 to Stage 6 (Stage 3 is radix-2), CBFP, Delay, W(n), Trunc, Final decoder, Controller, ⌈log₄(N)⌉-stage]
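
The basic idea of block floating point is that a whole block of samples shares one exponent, chosen so that the largest sample just fits the word length. The sketch below shows that idea on plain integers. It is a simplified illustration and not the scaling unit from the slides: the function name, the choice of right shifts and the integer data are assumptions.

```python
def bfp_scale(block, word_bits):
    """Scale a block of integers to fit signed `word_bits`-bit words.

    Returns (scaled_block, exponent), where each original value is
    approximately scaled_value * 2**exponent.
    """
    limit = (1 << (word_bits - 1)) - 1      # largest positive value
    peak = max(abs(v) for v in block)
    exponent = 0
    while (peak >> exponent) > limit:       # shift until the peak fits
        exponent += 1
    scaled = [v >> exponent for v in block]  # arithmetic shift, shared exponent
    return scaled, exponent

scaled, exponent = bfp_scale([1000, -300, 20], word_bits=8)
print(scaled, exponent)  # [125, -38, 2] 3
```

Small samples lose precision when a large sample in the same block forces a large shared exponent.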


<strong>Implementation</strong><br />

[Figure: six-stage pipeline (Stage 1 to Stage 6). Recoverable block labels: IBF, BF (butterfly), MBF, MUL, W(n), -j, FIFO, Normalizer, Equalizer, exponent]<br />

Advantages<br />

• High throughput<br />

• Low memory requirements compared to existing scaling approaches<br />

Disadvantages<br />

• Bit reversed output, will need a reordering unit on the output.


Results<br />

[Figure: Total chip size. Size (mm²) versus number of complex points (512, 1024, 2048, 4096) for: Variable data path, Convergent block floating point, Our approach]<br />

[Figure: SNR. SNR (dB) versus input amplitude (1, 1/2, 1/4, 1/8, 1/16, 1/32, 1/64) for: Constant data path, Variable data path, Convergent block floating point, Our approach without VDL, Our approach with VDL]


Conclusions<br />

• Radix-2² is an important <strong>FFT</strong> architecture contribution, used in Digital Video Broadcasting (DVB)<br />

• The projects require different implementation approaches<br />

– Flexible <strong>FFT</strong> processor<br />

– Novel data scaling approach
