09.12.2012 Views

Concrete mathematics : a foundation for computer science


8.5 HASHING

(8.32), and we have $\mathrm{Mean}(S) = (n-1)/2$, $\mathrm{Var}(S) = (n^2-1)/12$. Hence

$$\mathrm{Var}(P) = \frac{n^2-1}{12m^2} + \frac{(m-1)(n-1)}{2m^2} = \frac{(n-1)(6m+n-5)}{12m^2}. \tag{8.97}$$

Once again we have gained the desired speedup factor of $1/m$. If $m = n/\ln n$ and $n \to \infty$, the average number of probes per successful search in this case is about $\frac12 \ln n$, and the standard deviation is asymptotically $(\ln n)/\sqrt{12}$.
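As a sanity check on the closed form for Var(P) in the uniform case, the variance can be recomputed by brute force for small m and n. The sketch below assumes the model used earlier in this section: the search for the k-th inserted key probes every key j ≤ k that has the same hash value, and each of the m^n hash patterns is equally likely. The function names are illustrative only.

```python
from fractions import Fraction
from itertools import product

def probes(h, k):
    """Probes needed to find the k-th inserted key (k is 1-based).

    Assumption: keys are appended to their list in insertion order, so the
    search touches every key j <= k with the same hash value as key k.
    """
    return sum(1 for j in range(k) if h[j] == h[k - 1])

def exact_mean_var(m, n, s):
    """Exact mean and variance of P over all m**n * n elements (h_1..h_n; k)."""
    weight = Fraction(1, m ** n)          # each hash pattern is equally likely
    e1 = e2 = Fraction(0)
    for h in product(range(m), repeat=n):
        for k in range(1, n + 1):
            p = probes(h, k)
            e1 += weight * s[k - 1] * p
            e2 += weight * s[k - 1] * p * p
    return e1, e2 - e1 * e1

def uniform_var_formula(m, n):
    """Closed form (n-1)(6m+n-5)/(12 m^2) for s_k = 1/n."""
    return Fraction((n - 1) * (6 * m + n - 5), 12 * m * m)

m, n = 3, 4
s = [Fraction(1, n)] * n                  # uniform search distribution
mean, var = exact_mean_var(m, n, s)
print(mean, var, uniform_var_formula(m, n))
```

Exact rational arithmetic is used so that the enumerated variance and the closed form can be compared with `==` rather than with a floating-point tolerance.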

On the other hand, we might suppose that $s_k = (kH_n)^{-1}$ for $1 \le k \le n$; this distribution is called "Zipf's law." Then $\mathrm{Mean}(G) = n/H_n$, and $\mathrm{Var}(G) = \frac12 n(n+1)/H_n - n^2/H_n^2$. The average number of probes for $m = n/\ln n$ as $n \to \infty$ is approximately 2, with standard deviation asymptotic to $\sqrt{\ln n}/\sqrt{2}$.
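The Zipf figures can be checked numerically for a large but finite n. This is only a sketch under stated assumptions: it takes Mean(P) = 1 + Mean(S)/m and Var(P) = Var(S)/m^2 + (m-1)Mean(S)/m^2 (the same two-term shape as in the uniform case), and it assumes G counts k while S counts k-1, so that Mean(S) = Mean(G) - 1 and Var(S) = Var(G). The function name is hypothetical.

```python
import math

def zipf_probe_stats(n):
    """Mean and standard deviation of probes under Zipf's law with m = n / ln n.

    Returns (mean_P, sd_P, asymptotic_sd), where asymptotic_sd = sqrt(ln n / 2).
    """
    H = sum(1.0 / k for k in range(1, n + 1))   # harmonic number H_n
    m = n / math.log(n)                          # table size, left unrounded
    mean_G = n / H
    var_G = 0.5 * n * (n + 1) / H - (n / H) ** 2
    mean_S = mean_G - 1                          # assumption: S is G shifted by one
    var_S = var_G                                # a shift does not change variance
    mean_P = 1 + mean_S / m
    var_P = var_S / m**2 + (m - 1) * mean_S / m**2
    return mean_P, math.sqrt(var_P), math.sqrt(math.log(n) / 2)

for n in (10**3, 10**5):
    mean_P, sd_P, sd_limit = zipf_probe_stats(n)
    print(f"n={n}: mean={mean_P:.3f}, sd={sd_P:.3f}, sqrt(ln n / 2)={sd_limit:.3f}")
```

The mean approaches 2 slowly, because the correction term behaves like the ratio of ln n to H_n, which tends to 1 only at a logarithmic rate.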

In both cases the analysis allows the cautious souls among us, who fear the worst case, to rest easily: Chebyshev's inequality tells us that the lists will be nice and short, except in extremely rare cases.
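To make the appeal to Chebyshev's inequality concrete, recall that it bounds Pr(|P - mean| ≥ t) by Var(P)/t^2. The following sketch evaluates that bound in the uniform case with m = n/ln n; it assumes Mean(P) = 1 + (n-1)/(2m) from earlier in the section, and the value of n and the multiples of the standard deviation are arbitrary illustrative choices.

```python
import math

def chebyshev_bound(var, t):
    """Upper bound on Pr(|X - mean| >= t) given Var(X) = var and t > 0."""
    return min(1.0, var / (t * t))

def uniform_case(n):
    """Mean and variance of P for s_k = 1/n with m = n / ln n (unrounded)."""
    m = n / math.log(n)
    mean = 1 + (n - 1) / (2 * m)                      # assumed from earlier in the section
    var = (n - 1) * (6 * m + n - 5) / (12 * m * m)    # closed form for the uniform case
    return mean, var

n = 10**6
mean, var = uniform_case(n)
for c in (2, 5, 10):
    t = c * math.sqrt(var)                            # deviation of c standard deviations
    print(f"Pr(|P - {mean:.2f}| >= {t:.2f}) <= {chebyshev_bound(var, t):.3f}")
```

The bound depends only on the number of standard deviations, so it is quite loose; it nevertheless shows that long searches become rare without any further knowledge of the distribution.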

Case 2, continued: Variants of the variance.

[Margin note: "OK, gang, time to put on your skim suits again." -Friendly TA]

We have just computed the variance of the number of probes in a successful search, by considering $P$ to be a random variable over a probability space with $m^n \cdot n$ elements $(h_1, \ldots, h_n; k)$. But we could have adopted another point of view: Each pattern $(h_1, \ldots, h_n)$ of hash values defines a random variable $P|(h_1, \ldots, h_n)$, representing the probes we make in a successful search of a particular hash table on $n$ given keys. The average value of $P|(h_1, \ldots, h_n)$,

$$A(h_1, \ldots, h_n) = \sum_{p=1}^{n} p \cdot \Pr\bigl(P|(h_1, \ldots, h_n) = p\bigr), \tag{8.98}$$

can be said to represent the running time of a successful search. This quantity $A(h_1, \ldots, h_n)$ is a random variable that depends only on $(h_1, \ldots, h_n)$, not on the final component $k$; we can write it in the form

$$A(h_1, \ldots, h_n) = \sum_{k=1}^{n} s_k P(h_1, \ldots, h_n; k),$$

since $P|(h_1, \ldots, h_n) = p$ with probability

$$\frac{\sum_{k=1}^{n} \Pr\bigl(P(h_1,\ldots,h_n;k)=p\bigr)}{\sum_{k=1}^{n} \Pr(h_1,\ldots,h_n;k)} = \frac{\sum_{k=1}^{n} m^{-n} s_k\,[P(h_1,\ldots,h_n;k)=p]}{\sum_{k=1}^{n} m^{-n} s_k} = \sum_{k=1}^{n} s_k\,[P(h_1,\ldots,h_n;k)=p].$$
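A small computation may help fix the two points of view. The sketch below evaluates A for one particular hash pattern in both ways (as a weighted sum over k, and as an expectation over the conditional distribution of probes) and then averages A over all patterns. It assumes, as earlier in the section, that finding the k-th inserted key probes every key j ≤ k with the same hash value; the pattern, the sizes m and n, and the function names are illustrative choices.

```python
from fractions import Fraction
from itertools import product

def probes(h, k):
    """P(h_1..h_n; k): probes to find the k-th inserted key (assumed model)."""
    return sum(1 for j in range(k) if h[j] == h[k - 1])

def average_probes(h, s):
    """A(h_1..h_n) written as the sum over k of s_k * P(h_1..h_n; k)."""
    return sum(s[k - 1] * probes(h, k) for k in range(1, len(h) + 1))

def conditional_prob(h, s, p):
    """Pr(P|(h_1..h_n) = p), the sum of s_k over keys needing exactly p probes."""
    return sum(s[k - 1] for k in range(1, len(h) + 1) if probes(h, k) == p)

m, n = 3, 4
s = [Fraction(1, n)] * n                   # uniform search distribution
h = (0, 1, 0, 0)                           # one particular hash pattern

a_direct = average_probes(h, s)
a_from_dist = sum(p * conditional_prob(h, s, p) for p in range(1, n + 1))
# Averaging A over all m**n equally likely patterns recovers Mean(P).
a_overall = sum(average_probes(g, s) for g in product(range(m), repeat=n)) / m**n
print(a_direct, a_from_dist, a_overall)
```

The first two values agree for every pattern, which is what the identity for the conditional probability expresses; the third is the mean of A over the whole probability space.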
