SIMPLE PLL-BASED TRUE RANDOM NUMBER ... - KEMT FEI TUKE
SIMPLE PLL-BASED TRUE RANDOM NUMBER ... - KEMT FEI TUKE
SIMPLE PLL-BASED TRUE RANDOM NUMBER ... - KEMT FEI TUKE
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
<strong>SIMPLE</strong> <strong>PLL</strong>-<strong>BASED</strong> <strong>TRUE</strong> <strong>RANDOM</strong> <strong>NUMBER</strong> GENERATOR FOR<br />
EMBEDDED DIGITAL SYSTEMS<br />
Viktor Fischer * , Miloš Drutarovský ** , Martin Šimka ** and Frédéric Celle *<br />
*<br />
Laboratoire Traitement du Signal et<br />
Instrumentation, Unité Mixte de<br />
Recherche CNRS 5516,<br />
Université Jean Monnet<br />
10, rue Badouin,<br />
42 000 Saint-Etienne, France<br />
** Department of Electronics and<br />
Multimedia Communications,<br />
Technical University of Košice<br />
Park Komenského 13,<br />
04120 Košice, Slovak Republic<br />
{Milos.Drutarovsky, Martin.Simka}<br />
{fischer, celle}@univ-st-etienne.fr @tuke.sk<br />
Abstract. The paper presents a simple true random number generator (TRNG)<br />
which can be embedded in digital Application Specific Integrated Circuits<br />
(ASICs) and Field Programmable Logic Devices (FPLDs). As a source of<br />
randomness, it uses on-chip noise generated in the internal analog phaselocked<br />
loop (<strong>PLL</strong>) circuitry. In contrast with traditionally used free running<br />
oscillators, it uses a novel method of randomness extraction based on two<br />
rationally related synthesized clock signals. The generator has been developed<br />
for embedded cryptographic applications, where it significantly increases the<br />
system security, but it can be used in a wide range of applications. The quality<br />
of TRNG output is confirmed by applying special statistical tests, which pass<br />
even for high output bit-rates of several hundreds of kilobits per second.<br />
1 Introduction<br />
Producing unpredictable, i.e. irreproducible uniformly distributed, random number<br />
sequences on hardware is one of the central issues of cryptographic software and hardware<br />
implementations. Random numbers are needed in particular for key generation,<br />
authentication protocols, zero-knowledge protocols, padding, in many digital signature<br />
schemes and even in some encryption algorithms. In all these applications, security greatly<br />
depends on the quality of the source of random numbers.<br />
TRNGs can be produced using any non-deterministic process. The most typical<br />
implementations use quantum mechanics [1] which can produce up to one Mbit per second.<br />
Radioactivity can also provide true random bits [2] but at a much lower rate of several<br />
hundreds of bits per second. Many other hardware generators exist. They usually exploit the<br />
thermal noise (resistance or shoot) in electronic devices. In the well-known Intel Random<br />
Number Generator [3] a slow clock signal modulated by a thermal noise is sampled using<br />
fast clock signal from a free running oscillator. The drift between the two clocks provides<br />
the source of random bits with an average rate of 75 Kbits per second.
Our aim was to find a solution completely embedded in a digital circuit. These circuits<br />
are well suited for implementation of so called pseudo-random number generators.<br />
However, pseudo-random number generators cannot provide sufficient security, because<br />
they are predictable. Digital circuits include only limited sources of randomness, e. g.<br />
metastability, frequency of a free-running oscillator, clock jitter, etc. Usually, reliable<br />
generators based on the metastability and/or frequency instability are difficult to implement<br />
and they are not secure enough for cryptographic applications. The entropy increase is not<br />
sufficiently high in some cases, so the output of generator can be predicted [4]. In another<br />
case [5] off-chip components are required what decreases a cryptographic security of the<br />
implementation.<br />
In contrast with these methods, in [6] we have proposed a novel method of randomness<br />
extraction based on two rationally related synthesized clock signals. It was shown that it is<br />
perfectly suited for modern FPLDs with the internal analog phase-locked loop (<strong>PLL</strong>)<br />
circuitry (e.g. Apex or Stratix FPLDs from Altera). The generator provided random data<br />
with only small deviations from ideal one.<br />
In this paper we present a modified version of our generator [6] having simplified<br />
structure and higher data rate. It is based on an extended knowledge of jitter features<br />
obtained by measurements in real conditions on Altera development boards.<br />
2 The <strong>PLL</strong>-synthesized clock jitter<br />
2.1 Analog <strong>PLL</strong>s embedded in digital circuits<br />
New digital VLSI circuits use advanced clock generation and distribution circuitry based on<br />
embedded analog <strong>PLL</strong>s [7], [8], [9], [10]. A simplified block diagram of one analog based<br />
<strong>PLL</strong> block available in advanced digital circuits is depicted in Figure 1. Each <strong>PLL</strong> block<br />
can provide at least one synthesized clock signal with frequency F OUT :<br />
m KM<br />
FOUT = FIN = FIN<br />
(1)<br />
n× k KD<br />
where F IN is the frequency of the external input clock source. Reference-, feedback- and<br />
post-divider values n, m and k can vary from one to several hundreds in FPGAs [9], [10], or<br />
to several thousands in ASICs [8].<br />
:n<br />
Input Clock<br />
FIN<br />
Phase<br />
Frequency<br />
Detector<br />
:m<br />
Charge<br />
Pump<br />
&<br />
Loop<br />
VCO :k<br />
Filter Output Clock<br />
ClockShift<br />
Circuitry<br />
FOUT = FIN m<br />
n × k<br />
Figure 1: Simplified block diagram of an embedded <strong>PLL</strong> circuitry<br />
2.2 Jitter of <strong>PLL</strong> synthesized clock signals<br />
In analog <strong>PLL</strong>s, a noise causes the Voltage Controlled Oscillator (VCO) to fluctuate in<br />
frequency. Other frequency fluctuations are caused by variations of supply voltage,<br />
temperature and by a noisy environment. The internal control circuitry adjusts the VCO
ack to the specified frequency, but a certain part of the fluctuations caused by nondeterministic<br />
noise cannot be compensated for and is seen as a clock jitter.<br />
The size of the intrinsic jitter depends on the quality factor Q of the VCO, on the<br />
bandwidth of the Loop Filter and on the so-called pattern jitter introduced by the Phase<br />
Frequency Detector. It is often given in peak-to-peak value or 1-sigma (or RMS) value. 1sigma<br />
value of the jitter ( σ jit ) depends on the technology and the configuration of the <strong>PLL</strong><br />
and it can range from 3.5 ps to 10 ps for ASICs [8] up to 50 ps for FPGAs [9], [10]. Since<br />
the technology of the <strong>PLL</strong> and the quality of the VCO is usually defined, a user can change<br />
the output jitter by modification of divider values and filter bandwidth.<br />
For example the jitter in an Apex FPLD has 1-sigma value of σ jit ≈15.9<br />
ps for a<br />
F OUT = 66.6 MHz clock signal and multiplication factor 2 [11]. These results were acquired<br />
under “ideal conditions”, with only a minimal amount of FPLD resources occupied and<br />
minimal input/output activities. Our last measurements (cf. in subsection 2.3) show that the<br />
clock jitter in an Apex FPLD is significantly higher (about 140 ps ) for higher multiplication<br />
factors and when internal FPLD flip-flops are switching on different clock frequencies.<br />
2.3 Jitter size measurement<br />
Since the knowledge of the jitter size and its statistical features is crucial for correct settings<br />
of the TRNG parameters, several measurements have been made for different<br />
configurations of the <strong>PLL</strong> (different values KM<br />
and KD<br />
). We used Agilent Infiniium DCA<br />
86100B wide bandwidth oscilloscope and the Nios development board [12] with Apex<br />
EP20K200 device. The 1-sigma value of the jitter measured at the output of the <strong>PLL</strong> has<br />
achieved values σ jit ≈ 32 ps for FIN = 33. 3 MHz<br />
and KM K D = 21.<br />
For ratios<br />
KM K D = 32,43,54,65…<br />
the 1-sigma value was in the range σ jit ≈47 −64ps<br />
and the<br />
jitter has approximately Gaussian distribution as it is illustrated in Figure 2a) for<br />
configuration M D 65 = K K .<br />
However, for M D 157 48 = K K used in [6], the jitter was σ jit ≈140<br />
ps and it exhibited<br />
two peaks with a total size of 600<br />
ps (peak-to-peak). The obtained jitter distribution is<br />
depicted in Figure 2b) and it is compatible with a reference measurement made by Xilinx<br />
on Altera FPLDs [11], where a two-peak jitter distribution has been documented. Since the<br />
jitter included in the clock signal in real conditions was significantly higher than that<br />
documented by Altera (note, that Altera has used the same kind of development board, only<br />
K M and KD<br />
parameters were different), we could significantly reduce the generator<br />
complexity. As far as K M and K D parameters are chosen properly, the proposed method is<br />
insensitive to the jitter distribution (see section 3 and [6]).<br />
a) b)<br />
Figure 2: The jitter probability distribution for a) 65<br />
M D<br />
K K = b) K K = 157 48<br />
M D
3 Randomness extraction from the <strong>PLL</strong>-generated clock jitter<br />
The basic principle behind our method is to extract the randomness from the jitter of the<br />
clock signal synthesized in the embedded analog <strong>PLL</strong>. The jitter is detected by the sampling<br />
of a reference (clock) signal using a rationally related (clock) signal synthesized in the onchip<br />
analog <strong>PLL</strong>. The fundamental problem lies in the fact that the reference signal has to<br />
be sampled near the edges influenced by the jitter. The basic structure of the random<br />
bitstream generator is depicted in Figure 3.<br />
CLK<br />
CLJ<br />
<strong>PLL</strong> D<br />
CLK<br />
Q<br />
D<br />
CLK<br />
Q<br />
q(nTCLK)<br />
Decimator<br />
(NKD)<br />
x(nNTQ)<br />
Figure 3: Basic structure of random bitstream generator using <strong>PLL</strong>-synthesized low-jitter clock signal<br />
Because there is a probability that the first flip-flop could become metastable, the<br />
second flip-flop is cascaded. In case the first flip-flop produces a metastable output, it can<br />
resolve until its output is clocked by the second flip-flop. This flip-flops connection does<br />
not assure that only stable signal is clocked, but the probability that the output qn ( T CLK ) will<br />
get a valid state is much higher [13].<br />
Let CLJ be an on-chip <strong>PLL</strong>-synthesized rectangular clock waveform with the frequency<br />
FCLJ = FCLK KM/ KD<br />
(2)<br />
where CLK is a reference clock signal and parameters K M and KD<br />
defined in (1) are<br />
related to the <strong>PLL</strong> structure. Signal CLJ is sampled into the D flip-flop using a clock signal<br />
with frequency F CLK . There are KD rising edges of CLK signal and 2K M edges (rising and<br />
falling) of CLJ waveform during time period<br />
TQ = KDTCLK = KMTCLJ (3)<br />
It has been shown in [6] that if K M and KD<br />
are relative primes, the set of samples create an<br />
equidistant set of values with the distance step<br />
TCLK TCLJ<br />
d = GCD( 2 KM, KD) = GCD( 2 KM,<br />
K D)<br />
(4)<br />
2KM 2KD<br />
where GCD means Greatest Common Divisor. It has been shown that the worst-case<br />
distance between the two closest edges of CLK and CLJ during the period T Q is given as<br />
( )<br />
If KM, K D and F CLJ are chosen so, that<br />
σ > MAX ( ∆ T )<br />
MAX ∆ T = d /2<br />
(5)<br />
jit<br />
we can guarantee that during T Q the sampling edge of CLK will fall at least once into the<br />
edge zone of CLJ (the edge zone means the time interval around the edge with a width<br />
smaller than σ jit ). Therefore during the period T Q , D values of CLJ will be sampled into<br />
the first D flip-flop and at least one of them will statistically depend on the random jitter, so<br />
the output value of the second flip-flop will be nondeterministic. In [6] we used<br />
delay elements to increase the probability of overlapping of and CLJ edge zones.<br />
K<br />
qnT ( CLK )<br />
CLK<br />
min<br />
min<br />
(6)
Thanks to the measurements we got the information about the jitter value, and also about<br />
the probability of edges overlapping that both are higher as expected. Therefore the delay<br />
line is not needed anymore. The decimated output signal<br />
( Q) = ( Q) ⊕ ( Q − CLK) … ⊕ ( Q−( D−1) CLK)<br />
xnT qnT qnT T qnT K T<br />
which is generated at the output of an Exclusive-OR (XOR)-based decimator as a bit-wise<br />
addition modulo 2 ( ⊕ ) of K D samples ( ) . q sampled with the frequency F CLK , will be<br />
nondeterministic, too.<br />
It can be seen that in reference to [6] we have changed basic structure of the generator<br />
in several ways:<br />
- because of higher jitter value, which has been approved by precise jitter<br />
measurements we could replace the delay line and the bank of flip-flops by a single<br />
flip-flop,<br />
- the metastability behavior of the signal qn ( T CLK ) and thus of the generator in general<br />
was improved by addition of the second flip-flop,<br />
- we have validated that if condition (6) is fulfilled, the speed of the generator can be<br />
increased by reducing the decimation factor to NK D , where N = 1 is number of T Q<br />
periods.<br />
4 TRNG implementation<br />
We have validated our simplified structure of the random bitstream generator using Altera<br />
analog <strong>PLL</strong>s embedded in Apex E family. We have used an evaluation board with a PC<br />
Card interface. The Apex EP20K160 device has included generator, 16x128-bit FIFO, PC<br />
Card interface and a custom logic. As the best option a 2-<strong>PLL</strong> configuration (shown in<br />
Figure 4) with only one input clock signal has been chosen.<br />
Synthesized clock signals CLK and CLJ are not fed out from the FPLD (in design<br />
presented in [6] one synthesized signal has been fed out of the device and again reused in<br />
FPLD, what is definitely less secure). Therefore they cannot be manipulated separately<br />
without a circuit reconfiguration. This fact is very important for cryptographic applications,<br />
because it significantly improves overall system security.<br />
Oscil<br />
40 MHz<br />
CLK4p<br />
CLK2p (NC)<br />
CLKLK_FB2p (NC)<br />
CLKLK_OUT2p (NC)<br />
APEX EP20K160E-2X<br />
<strong>PLL</strong>4<br />
clk1<br />
inclk<br />
clk0<br />
<strong>PLL</strong>2<br />
clk1<br />
inclk<br />
clk0<br />
fCLJ<br />
(96,4 MHz)<br />
Random<br />
bitstream<br />
generator<br />
fCLK (95 MHz)<br />
S/P<br />
Conv.<br />
Serial output 454 kbits/s<br />
Custom logic<br />
&<br />
PC Card interface<br />
FIFO<br />
16x128<br />
Figure 4: Block diagram of the experimental PC card with Apex EP20K160 ETC144-2x<br />
(7)
Multiplication and division factors for individual output signals were selected as follows<br />
(the first number represents the <strong>PLL</strong> index and the second number the output index):<br />
clk 20 = 40× 53 22 = 96.36( MHz)<br />
clk 40 = 40× 19 38 = 20( MHz)<br />
clk 41= 40× 19 8 = 95( MHz)<br />
Therefore MAX ( ∆Tmin ) ≈ 12 ps (see equations (4) and (5)). This value is much smaller than<br />
measured jitter value σ jit (see section 2.3). The TRNG was implemented for N = 1 and so<br />
RTRNG ≈ 45400bit/s<br />
(about six times more than in [6]). This was done intentionally in order<br />
to check generated output bitstream under more stringent conditions.<br />
The FPLD resource requirements of the proposed TRNG blocks as well as supporting<br />
logic (FIFO and control logic) are shown in Table 1. The first four columns show resource<br />
requirements (Logic Cells, Embedded System Blocks (ESB), <strong>PLL</strong> outputs and Global<br />
Clocks used) for the generator. The other four columns give requirements of the complete<br />
circuit including FIFO buffer and data bus controller. Presented results have been obtained<br />
using Altera Quartus II v. 2.2.<br />
Table 1: TRNG resource requirements<br />
TRNG only TRNG + FIFO and logic<br />
LC ESB <strong>PLL</strong> GCLK LC ESB <strong>PLL</strong> GCLK<br />
19 0 2 2 129 2 2 3<br />
While 19 logic cells represent less than 1% of available cells, the project uses 100% of<br />
available <strong>PLL</strong>s (two <strong>PLL</strong>s). Since each <strong>PLL</strong> contains two clock outputs, two clock outputs<br />
remain available (one at each <strong>PLL</strong>) and with some restrictions, they can be used inside the<br />
application area (e. g. in our case for the PC Card interface). Internal usage of <strong>PLL</strong> clock<br />
signals represents no security problem as these signals cannot be externally manipulated.<br />
5 Statistical evaluation of TRNG<br />
There are some well-documented general statistical tests that can be used to look for<br />
deviations from an ideal TRNG [14]. A good TRNG should pass all kinds of tests.<br />
5.1 Results of the NIST tests<br />
Our NIST statistical tests were performed on 1 Gigabit of continuous TRNG output records<br />
(for N = 1)<br />
and followed the testing strategy, general recommendations and result<br />
interpretation described in [14]. We have used a set of 1024 1-Megabit sequences produced<br />
by the generator and we have evaluated the set of P -values at a significance level α = 0.01 .<br />
We have got similar results as in [6] (with the exception of FFT test, see below) so we can<br />
conclude that there are no detectable differences for ensemble of 1024 1-Megabit records.<br />
Results of these tests are not included in the paper due to space limitation.<br />
It was claimed in [6] that some 1-Gbit TRNG records did not pass NIST FFT tests.<br />
During evaluation of proposed new TRNG implementation we have found errors in NIST<br />
FFT test formulation (confirmed also in [15]). After correction of these errors in NIST<br />
package all the FFT tests pass without any problems for all tested (1-Gbit) records. To
prove even further the quality of the generator, we have applied also very strict Frequency<br />
(Monobit) tests for very long records.<br />
5.2 Frequency (Monobit) test of very long record<br />
The most common statistical test of TRNGs is the Frequency (Monobit) test [14]. Good<br />
TRNGs should provide independent binary (Bernoulli) random variables 0 and 1 with the<br />
same probability. For a sequence of independent identically distributed Bernoulli random<br />
variables x( nNK D ) we can define variable Sn = X1+ … + Xn,<br />
where Xn = 2x( nNKD)<br />
−1<br />
are<br />
antipodaly encoded values { − 1,1}<br />
. By the classic De Moivre-Laplace theorem [14], for a<br />
sufficiently large number of trials, the distribution of the normalized binomial sum<br />
sn = Sn<br />
/ n is closely approximated by a standard normal distribution N ( 0,1)<br />
and, roughly<br />
said, Sn < 3 n for almost all n (note that we used extremely long TRNG record in order to<br />
detect also very small deviations).<br />
Results for 74-Gbit record (bottom curve, for decimation factor N = 1)<br />
and 37-Gbit<br />
record (upper curve, for N = 2 ) are shown in Figure 5. Small deviation for N = 1 is<br />
visible. After decimation by factor 2, deviation is within expected values (marked as<br />
parabola with peak point in 0), but it remains visible. This is the only deviation from an<br />
ideal TRNG currently known to us. On the other hand, the bias decreasing using the XOR<br />
decimator is applicable only for independent bits. The fact that we obtained lower bias after<br />
decimation gives us indirect evidence that output bits are independent and very close to the<br />
perfectly random bits.<br />
Difference<br />
0.5<br />
0<br />
-0.5<br />
-1<br />
-1.5<br />
-2<br />
-2.5<br />
x Cummulated 106<br />
1<br />
"1" and "0" Difference<br />
0 1 2 3 4 5 6 7 8<br />
x 10 10<br />
-3<br />
Record Length<br />
Figure 5: Results of frequency tests of very long TRNG output: a) 74 Gbits ( N = 1)<br />
b) 37 Gbits ( N = 2 )<br />
6 Conclusions<br />
In this paper we have described and evaluated a simplified method of true random bitstream<br />
generation inside digital VLSI circuits. The design of the TRNG and the method of<br />
randomness extraction guarantee that the output depends on a physical undeterministic<br />
b)<br />
a)
internal random process. The randomness of the sequence of numbers has been extensively<br />
tested and only extremely small differences from an ideal TRNG have been detected.<br />
The proposed solution is very cheap, it uses a very small number of logic resources and<br />
it is faster than comparable methods. Although the functionality of the proposed solution<br />
has been demonstrated for Altera Apex family, the same principle can be used for all recent<br />
high-performance ASICs or FPLDs that include an on-chip reconfigurable analog <strong>PLL</strong> for<br />
clock synthesis.<br />
In the future we plan a deeper research of the jitter and its measurement, as its statistical<br />
features directly affect the statistical features of the output random sequence.<br />
Acknowledgments<br />
This work has been done in the frame of the project CryptArchi included in the French<br />
national program ACI Cryptologie and Slovak scientific project VEGA 1/1057/04.<br />
References<br />
[1] T. Jennewein, U. Achleitner, G. Weihs, H. Weinfurter, A. Zeilinger, “A fast and compact quantum random<br />
number generator”, Rev. Sci. Inst. 71, 1675-1680 (2000)<br />
[2] J. Walker, “Hotbits: Genuine random numbers, generated by radioactive decay”, online at<br />
www.fourmilab.ch/hotbits<br />
[3] B. Jun, P. Kocher, "The INTEL Random Number Generator", Cryptography Research, Inc., White Paper<br />
prepared for Intel Corporation, April 1999, pp.1-8, www.intel.com<br />
[4] M. Dichtl, “How to Predict the Output of a Hardware Random Number Generator”, In C.D. Walter, C. K.<br />
Koc, Ch. Paar (Eds.): CHES 2003, LNCS 2779, Springer, Berlin, 2003, pp.181-188<br />
[5] K.H. Tsoi, K.H. Leung, P.H. Leong, “Compact FPGA-based True and Pseudo Random Number<br />
Generators”, Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines<br />
(FCCM), California USA, 2003, pp.51-61<br />
[6] V. Fischer, M. Drutarovský, “True Random Number Generator Embedded in Reconfigurable Hardware”,<br />
In B.S.Kaliski Jr. et al. (Eds.): CHES 2002, LNCS 2523, Springer, Berlin, 2003, pp.415-430<br />
[7] J. G. Maneatis, “Selecting <strong>PLL</strong>s for ASIC Applications Requires Tradeoffs”, Planet Analog Magazine<br />
9/2003, www.planetanalog.com<br />
[8] “XpressArray High Density 0.18um Structured ASIC”, WEB page of the AMI Semiconductors Company,<br />
www.amis.com<br />
[9] “Using the ClockLock & ClockBoost <strong>PLL</strong> Features in Apex Devices”, Altera Application Note 115, v.2.3,<br />
May 2002, pp.1-55, www.altera.com<br />
[10] “Using <strong>PLL</strong>s in Stratix Devices”, Altera Application Note 200, v.1.0, February 2002, pp.1-70,<br />
www.altera.com<br />
[11] “Superior Jitter Management with DLLs”, Virtech Tech Topic VTT013 (v.1.2), January 21, 2003, pp.1-6,<br />
www.xilinx.com<br />
[12] “Nios Embedded Processor Development Board”, Altera Data Sheet, v.2.1, April 2002, pp.1-22,<br />
www.altera.com/nios<br />
[13] “Metastability in Altera Devices”, Altera Application Note 42, v.4.0, May 1999, pp.1-10,<br />
www.altera.com<br />
[14] A. Rukhin, J. Soto, J. Nechvatal, M. Smid, E. Barker, S. Leigh, M. Levenson, M. Vangel, D. Banks, A.<br />
Heckert, J. Dray, S. Vo, “A statistical test suite for random and pseudorandom number generators for<br />
cryptographic applications“, NIST Special Publication 800-22, May 15, 2001, pp.1-153,<br />
http://csrc.nist.gov/rng/<br />
[15] S. Kim, K. Umeno, and A. Hasegawa, “Corrections of the NIST Statistical Test Suite for Randomness”,<br />
Cryptology ePrint Archive, Report 2004/018, January 26, 2004, http://eprint.iacr.org/2004/018.pdf