08.06.2013 Views

Factorial in VHDL - ALSE


© 2009 A.L.S.E. - Factorial Application Note v1.4

Introduction

This Application Note demonstrates some advanced features of VHDL and some coding techniques that can be used to keep a description both efficient and synthesizable. While computing a factorial in hardware is unlikely to find its place in real-world applications (besides school assignments), the techniques demonstrated here are very applicable to real-world projects and can help enhance code quality.

(In other words: yes, we could code the factorial using a loop, and, yes, RTL projects usually don't need recursion...)

The function we want to implement in hardware is the well-known Factorial, noted with an exclamation mark:

0! = 1 (by convention)
1! = 1
N! = N x (N-1) x (N-2) x ... x 2 x 1

This function is easy to define using recursion:

N! = N x (N-1)!

We'll see that we can code the recursion while keeping the code synthesizable.

The Entity

To make the maximum operating frequency easy to display, we implement the Factorial (a combinational function) between two banks of registers (flip-flops), so this module is fully synchronous and pipelined. Therefore the interface needs a clock, an input vector and an output vector. This is good practice for unitary synthesis, allowing a fast and realistic estimation of complexity. Note that we don't really need a reset in this trivial case (because the system doesn't have any feedback path).

Our first attempts will implement the Factorial of numbers between 0 and 5, or between 0 and 7 (3-bit unsigned vectors), with an output requiring 13 bits or less, since 7! = 5040 < 2**13 = 8192.

Library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

Entity Factorial is
  port ( Clk    : in  std_logic;
         Din    : in  std_logic_vector (2 downto 0);    -- 0! .. 7! = 5040
         Result : out std_logic_vector (12 downto 0) ); -- 0 .. 8191
End Entity Factorial;
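The output width can be checked by hand. As a short worked calculation (the floor/log notation is ours, not the original note's), the number of bits needed to hold N! is floor(log2(N!)) + 1:

```latex
\[
2^{6} = 64 \;\le\; 5! = 120 \;<\; 2^{7} = 128
\quad\Rightarrow\quad \lfloor \log_2 120 \rfloor + 1 = 7 \text{ bits}
\]
\[
2^{12} = 4096 \;\le\; 7! = 5040 \;<\; 2^{13} = 8192
\quad\Rightarrow\quad \lfloor \log_2 5040 \rfloor + 1 = 13 \text{ bits}
\]
```

So a 13-bit Result (0 .. 8191) covers the whole 3-bit input range, and 7 bits would be enough if the input were limited to 0 .. 5.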

(c) 2009 A.L.S.E. - B. Cuzeau ApNote: Factorial in VHDL


Test bench

Even if the code is going to be straightforward, we must create at least a simple Test Bench to verify that the code works, behaviorally.

The test bench we need is extremely simple: at the input, we apply values counting up from 0 to 2**3-1 = 7 (and cycling), each remaining stable for 4 clock cycles (so we can see the input value nicely propagate to the output). And we just eyeball the output waveform.

--------------------------------------------------
-- Test Bench. Simulate -all, eyeball the results
--------------------------------------------------
-- synopsys translate_off
library IEEE; use IEEE.std_logic_1164.all; use IEEE.numeric_std.all;

Entity Factorial_tb is end;

Architecture TEST of Factorial_tb is
  signal Clk    : std_logic := '0';
  signal Din    : std_logic_vector (2 downto 0) := (others=>'0');
  signal Result : std_logic_vector (12 downto 0);
begin

assert Clk='0' or now
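As a minimal sketch of the stimulus described above (not the original listing), the following self-contained skeleton generates the clock and an input that counts from 0 to 7 and wraps, each value being held for 4 clock cycles. The entity name Stimulus_sketch, the 10 ns clock period and the 800 ns stop time are assumptions made for illustration; the unit under test would be instantiated alongside and connected to Clk, Din and Result.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

entity Stimulus_sketch is  -- hypothetical name
end entity Stimulus_sketch;

architecture TEST of Stimulus_sketch is
  constant Period : time := 10 ns;  -- assumed clock period
  signal Clk : std_logic := '0';
  signal Din : std_logic_vector (2 downto 0) := (others => '0');
begin

  -- Free-running clock
  Clk <= not Clk after Period / 2;

  -- Stop at the first rising clock edge at or after 800 ns (assumed duration).
  -- Clk appears in the condition so that the assertion is re-evaluated on
  -- every clock event.
  assert Clk = '0' or now < 800 ns
    report "Simulation has ended (not an error)." severity failure;

  -- Count 0 .. 7 and wrap, holding each value for 4 clock cycles
  process
  begin
    for n in 1 to 4 loop
      wait until rising_edge(Clk);
    end loop;
    Din <= std_logic_vector(unsigned(Din) + 1);  -- 7 + 1 wraps back to 0
  end process;

end architecture TEST;
```

Ending the run with an assertion of severity failure is only one way to stop a simulation; it works with any simulator but shows up as a failure in the log, hence the "not an error" message.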


Simulation:

All looks good.

But while this simulates correctly as seen above, it fails miserably with all the FPGA synthesis tools we tried as of April 2009 (Synplify, XST, Quartus).

The reason is that they attempt to implement the function itself in hardware, and they don't discover that the required depth of recursion is limited (they don't "unroll the loop"). Even clearly limiting the bounds, with the input parameter set to "range 0 to 5", doesn't help.

Be careful and prepared when trying this code on a synthesis tool: some tools will issue a warning and stop gracefully, others may just eat up gigabytes of your PC's memory until it dies, or end with a stack failure...

Obviously, we need to find another (more synthesis-friendly) way to describe the same logic.

Synthesis-Friendly Solution

The way to address the (lack of) "synthesizability" of the previous description is applicable to many other situations (hence the interest of this Application Note):

Use the computational algorithm to build constants, and let the synthesis tool deal with this finite set of values to generate and reduce the combinational logic.

In VHDL, it's easy to do: we build a constant Table holding the set of result values, and we simply index it with the input vector, used as an unsigned number, to retrieve the output. But doing so, we describe more or less a ROM, and we may fear that our intended combinational logic would end up in a memory block. In fact, it doesn't happen in this case: the synthesis tools are smart enough to figure out how to implement the logic, and since the Factorial decoding logic is very simple, the design fits in just a few Logic Elements (8 LUTs for 0! .. 7!, according to the synthesis results quoted in the final listing).

Let's see the code:

-- ---------------------------------------
Architecture RTL of Factorial is -- yes, this is perfect for synthesis !
-- ---------------------------------------

  -- The usual recursive function
  function fact (d : natural) return natural is
    variable res : natural;
  begin

if d


  -- Function to initialize a table with the factorial
  impure function Init_Table return Table_t is
    variable T : Table_t;
  begin
    for I in T'range loop
      T(I) := to_unsigned(fact(I),Result'length);
    end loop;
    return T;
  end function Init_Table;

  -- The Table itself, initialized by calling Init_Table:
  constant Table : Table_t := Init_Table;
  -- note : this table will be simplified into a few LUTs

  signal Dinr : std_logic_vector (Din'range);

------\
Begin -- Architecture
------/

Dinr
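The listing above breaks off at "Dinr" in this copy. As a hedged sketch of what a complete table-based version could look like, the example below combines the pieces shown so far. The entity name Fact_table_sketch is hypothetical, and the body of fact, the Table_t declaration and the clocked process are assumptions: they follow the description given earlier (a combinational function between two banks of registers) and mirror the table type of the final listing, but they are not the original code.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

entity Fact_table_sketch is  -- hypothetical name
  port ( Clk    : in  std_logic;
         Din    : in  std_logic_vector (2 downto 0);
         Result : out std_logic_vector (12 downto 0) );
end entity Fact_table_sketch;

architecture RTL of Fact_table_sketch is

  -- Recursive factorial, only ever called on constants at elaboration time
  function fact (d : natural) return natural is
  begin
    if d < 2 then
      return 1;
    else
      return d * fact(d - 1);
    end if;
  end function fact;

  -- One entry per possible input value (assumed declaration)
  type Table_t is array (0 to 2**Din'length - 1) of unsigned (Result'range);

  impure function Init_Table return Table_t is
    variable T : Table_t;
  begin
    for I in T'range loop
      T(I) := to_unsigned(fact(I), Result'length);
    end loop;
    return T;
  end function Init_Table;

  constant Table : Table_t := Init_Table;

  signal Dinr : std_logic_vector (Din'range);

begin

  -- Input register, table look-up, output register (assumed structure)
  process (Clk)
  begin
    if rising_edge(Clk) then
      Dinr   <= Din;
      Result <= std_logic_vector(Table(to_integer(unsigned(Dinr))));
    end if;
  end process;

end architecture RTL;
```

The synthesis tool never has to implement the recursion here: fact is evaluated only while the constant Table is built, so the hardware sees eight fixed 13-bit values selected by Dinr.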


Synthesis Result

[Figure: post-layout schematic. Labels visible in the drawing: Clk, Clk~clkctrl (CLKCTRL), Din[0..2], input registers Dinr[0] to Dinr[2], LUTs Mux0~0, Mux2~0, Mux3~0, Mux5~0, Mux5~1, Mux6~0, Mux6~1 and Mux7~0, output registers Result[0]~reg0 to Result[12]~reg0, Result[0..12].]

We have displayed the post-layout view and revealed the equivalent functions of the internal LUTs.


Refinement: Design Re-use and Large Integers

The previous solution wasn't bad, but what if we want to try with larger integers?

We face two issues:

– We need to change the entity (increase the number of bits of the ports),
– We will soon have to handle more than 31 bits for the output.
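The 31-bit limit comes from the VHDL integer type, whose guaranteed range only reaches 2^31 - 1. A quick check with standard factorial values (our own worked arithmetic, not part of the original note) shows where a natural result stops being enough, and how wide the output must be for a 4-bit input (0! .. 15!), the case used by the wrapper in the final listing:

```latex
\[
12! = 479\,001\,600 \;<\; 2^{31}-1 = 2\,147\,483\,647 \;<\; 13! = 6\,227\,020\,800
\]
\[
2^{40} = 1\,099\,511\,627\,776 \;\le\; 15! = 1\,307\,674\,368\,000 \;<\; 2^{41} = 2\,199\,023\,255\,552
\quad\Rightarrow\quad \lfloor \log_2 15! \rfloor + 1 = 41 \text{ bits}
\]
```

So a natural can hold results up to 12! only, and 15! needs a 41-bit vector.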

For the first issue, using generic parameters is not a very brilliant solution, because the parameterization won't be easier than modifying the port widths directly. In the previous description, changing the port widths was sufficient because, inside the architecture, we used attributes to recover the size of items.

Can we do even better? Yes, we can!

The idea is to remove the port dimensions! In VHDL jargon, our ports can use unconstrained vectors.

How can this work? The vectors take their actual dimensions at elaboration time, when the entity is hooked into the design as an instance. The upper-level module (for example the Test Bench) provides the dimensions.

Wait a sec! What if the module is at the top level (as for unitary synthesis)? Are we doomed?

Thankfully, the workaround is easy: we use a "Wrapper" as the top level, which has the same ports but with dimensioned vectors, and simply instantiates the un-dimensioned entity. Our example shows this.

For the second issue (dealing with very large integers), we need to avoid "Naturals" and use "Unsigned" instead, since unsigned vectors have no limit in size.

When we start coding the Factorial function using unsigned, we face a number of issues again:

– sizing the function parameters,
– handling the multiplication result size.

We can cure the first concern by again using unconstrained arrays as function parameters. That's another powerful feature of VHDL: the dimensions are fixed dynamically when the function is called.

The second concern comes from the fact that the multiplication operator overloaded for unsigned produces a result as wide as the sum of the widths of the operands. This is too large for our purpose: we want a result of the same size as the output (just like with naturals). And unfortunately, if you try to extract a slice of an expression like d * Fact (d-1), you'll be... disappointed. A neat work-around is to code the multiply operator as a function call, in which case the slice can be extracted:

res := "*" (d,Fact (d-1))(res'range);

Another, more recommended way in this case would be to simply use the resize function:

res := resize (d * Fact (d-1),res'length);

And this is the final solution, reproduced below, after the Conclusion.
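To see the two truncation styles side by side, here is a small simulation-only sketch. The entity name Mult_slice_sketch and the operand values (6 and 20, on 8 bits) are arbitrary choices for illustration; only "*" and resize from IEEE.numeric_std are used.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

entity Mult_slice_sketch is  -- hypothetical name
end entity Mult_slice_sketch;

architecture SIM of Mult_slice_sketch is
begin
  process
    variable a, b   : unsigned (7 downto 0);
    variable full   : unsigned (15 downto 0);  -- a'length + b'length bits
    variable r1, r2 : unsigned (7 downto 0);
  begin
    a := to_unsigned(6, 8);
    b := to_unsigned(20, 8);

    full := a * b;                     -- full-width product
    r1   := "*" (a, b)(r1'range);      -- slice of the function-call form
    r2   := resize(a * b, r2'length);  -- resize keeps the low-order bits

    assert full = 120
      report "unexpected full-width product" severity failure;
    assert r1 = 120 and r1 = r2
      report "slice and resize give different results" severity failure;

    report "6 * 20 = " & integer'image(to_integer(r1));
    wait;
  end process;
end architecture SIM;
```

Both forms simply drop the upper half of the product, so they are only safe when the true result is known to fit in the target width, which is the case when the output vector has been sized for the largest factorial.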

Conclusion

As previously mentioned, all the techniques and tricks presented here can be used in many different designs and circumstances, to enhance code quality, readability, and re-usability.

Usual disclaimer: don't ask for support if you are not a customer, but I welcome ideas and suggestions.

Happy coding in VHDL!

Bertrand Cuzeau – CTO A.L.S.E
info@alse-fr.com

-=oOo=-



-- Fact_rtl.vhd
-- --------------------------------------------------
-- Factorial Example - Synthesizable & efficient !
-- --------------------------------------------------
-- Author : (c) Bert Cuzeau. ALSE. http://www.alse-fr.com
-- Version : 3.1, using unconstrained vectors.
--           Handles numbers larger than 2**31
--
-- Synthesis results : 8 LUTs for (0! .. 7!)
--                     32 LUTs for (0! .. 15! = 1,307,674,368,000)
-- Tested with Quartus II v 9.0.
-- Should be fine with any synthesis tool.
--
-- Make sure you synthesize "Wrapper" as the top level.

Library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- ---------------------------------------
Entity Factorial is
-- ---------------------------------------
  port ( Clk    : in  std_logic;
         Din    : in  std_logic_vector;    -- Yep, unconstrained !
         Result : out std_logic_vector );
End Entity Factorial;

-- ---------------------------------------
Architecture RTL of Factorial is -- yes, this is perfect for synthesis !
-- ---------------------------------------

  -- The (almost) usual recursive function
  function Fact (d : unsigned) return unsigned is
    variable res : unsigned (d'range);
  begin

    if d < 2 then
      res := (0=>'1',others=>'0'); -- 1

    else
      res := "*" (d,Fact (d-1))(res'range); -- function call notation trick
      -- res := resize(d * Fact(d-1),res'length); -- recommended
    end if;
    return res;
  end function fact;

  -- Constant table type
  type Table_t is array (0 to 2**Din'length - 1) of unsigned(Result'range);

  -- Function to initialize a table with the factorial
  impure function Init_Table return Table_t is
    variable T : Table_t;
  begin
    for I in T'range loop
      T(I) := fact(to_unsigned(I,Result'length));
    end loop;
    return T;
  end function Init_Table;

  -- The Table itself, initialized at creation :
  constant Table : Table_t := Init_Table;
  -- note : this table will be simplified into a few LUTs

  signal Dinr : std_logic_vector (Din'range);

------\
Begin -- Architecture
------/

Dinr


--------------------------------------------------------
-- Wrapper for Synthesis (4 -> 41 bits implementation)
--------------------------------------------------------
Library IEEE; use IEEE.std_logic_1164.all;

Entity Wrapper is -- For Synthesis of 4 bits -> 41 bits
  port ( Clk    : in  std_logic;
         Din    : in  std_logic_vector (3 downto 0);    -- 0! .. 15! = 13077775800hex
         Result : out std_logic_vector (40 downto 0) ); -- 41 bits result
End Entity Wrapper;

Architecture Wrap of Wrapper is
begin
  Fact : entity work.Factorial port map (Clk,Din,Result);
end architecture Wrap;

--------------------------------------------------
-- Test Bench. Simulate -all, eyeball the results
--------------------------------------------------
-- synopsys translate_off
library IEEE; use IEEE.std_logic_1164.all; use IEEE.numeric_std.all;

Entity Factorial_tb is end;

Architecture TEST of Factorial_tb is
  signal Clk    : std_logic := '0';
  signal Din    : std_logic_vector (3 downto 0) := (others=>'0'); -- 0! .. 15!
  signal Result : std_logic_vector (40 downto 0);
begin

assert Clk='0' or now < 800 ns<br />

report "Simulation has ended (not an error)." severity failure;<br />

Clk
