Concrete mathematics : a foundation for computer science

8.5 HASHING

After a successful search, the desired data D(K) appears in DATA[j], as in the previous algorithm. After an unsuccessful search, we can enter K and D(K) in the table by doing the following operations:

    n := n + 1;
    if j < 0 then FIRST[i] := n else NEXT[i] := n;
    KEY[n] := K; DATA[n] := D(K); NEXT[n] := 0.        (8.83)

Now the table will once again be up to date.
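The operations in (8.83) can be tried out in a few lines of Python. The sketch below is only an illustration under stated assumptions: the search steps H1 to H4 are not reproduced in this section, so the `search` method assumes they have the usual chained-list form (hash the key, walk the list, compare keys, advance), with FIRST[i] = -1 marking an empty list and NEXT[j] = 0 marking the end of a list. The class name `ChainedTable`, the `capacity` parameter, and the choice to overwrite the data when the key is already present are hypothetical additions, not part of the text.

```python
# Sketch of a chained hash table kept in parallel arrays, 1-indexed as in the text.
# Assumptions (not from the text): FIRST[i] = -1 for an empty list,
# NEXT[j] = 0 at the end of a list, and search steps H1-H4 as labeled below.

class ChainedTable:
    def __init__(self, m, capacity, h):
        self.m = m                          # number of lists
        self.h = h                          # hash function returning a value in 1..m
        self.n = 0                          # number of records currently stored
        self.FIRST = [-1] * (m + 1)         # index 0 is unused
        self.NEXT = [0] * (capacity + 1)    # index 0 is unused
        self.KEY = [None] * (capacity + 1)
        self.DATA = [None] * (capacity + 1)

    def search(self, K):
        """Return (i, j). j > 0 means K was found at position j;
        otherwise j <= 0 and i tells insert() where to link a new record."""
        i = self.h(K)                       # H1 (assumed form)
        j = self.FIRST[i]
        while j > 0:                        # H2: stop when the list is exhausted
            if self.KEY[j] == K:            # H3: one probe
                return i, j
            i = j                           # H4: advance to the next record
            j = self.NEXT[i]
        return i, j

    def insert(self, K, D):
        """Enter K and D after an unsuccessful search, following (8.83)."""
        i, j = self.search(K)
        if j > 0:                           # key already present: overwrite (our choice)
            self.DATA[j] = D
            return j
        self.n += 1                         # n := n + 1 (IndexError below if over capacity)
        n = self.n
        if j < 0:
            self.FIRST[i] = n               # list i was empty
        else:
            self.NEXT[i] = n                # i is the last record of the list
        self.KEY[n] = K
        self.DATA[n] = D
        self.NEXT[n] = 0
        return n


# Usage with a toy hash function (hypothetical): name length mod 4, plus 1.
table = ChainedTable(m=4, capacity=10, h=lambda K: len(K) % 4 + 1)
for name, value in [("Nora", 1), ("Glenn", 2), ("Jim", 3), ("Mary", 4)]:
    table.insert(name, value)
```

Here "Nora" and "Mary" hash to the same list, so the second insertion takes the `NEXT[i] := n` branch rather than the `FIRST[i] := n` branch.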

We hope to get lists of roughly equal length, because this will make the task of searching about m times faster. The value of m is usually much greater than 4, so a factor of 1/m will be a significant improvement.

We don’t know in advance what keys will be present, but it is generally possible to choose the hash function h so that we can consider h(K) to be a random variable that is uniformly distributed between 1 and m, independent of the hash values of other keys that are present. In such cases computing the hash function is like rolling a die that has m faces. There’s a chance that all the records will fall into the same list, just as there’s a chance that a die will always turn up the same face; but probability theory tells us that the lists will almost always be pretty evenly balanced.
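The die-rolling model is easy to simulate. The following sketch assumes the idealized model just described (each hash value independent and uniform on 1..m) and uses Python's pseudorandom generator as a stand-in for h; the function names, the seed, and the sample sizes are illustrative choices, not from the text. It also computes the exact probability that all n records land in one list, which under this model is m · (1/m)^n = (1/m)^(n−1).

```python
import random
from collections import Counter
from fractions import Fraction


def list_lengths(n, m, seed=0):
    """Throw n keys into m lists uniformly at random; return the m list lengths."""
    rng = random.Random(seed)               # fixed seed so runs are repeatable
    counts = Counter(rng.randint(1, m) for _ in range(n))
    return [counts.get(i, 0) for i in range(1, m + 1)]


def prob_all_in_one_list(n, m):
    """Exact probability that n independent uniform hash values all coincide."""
    return Fraction(1, m) ** (n - 1)


lengths = list_lengths(n=10_000, m=10)
print("list lengths:", lengths)
print("shortest, longest:", min(lengths), max(lengths))
print("P(all 20 keys in one of 6 lists):", prob_all_in_one_list(20, 6))
```

With 10,000 keys and 10 lists the ideal length is 1,000, and the simulated lengths should cluster closely around that value, while the probability of total collapse into one list shrinks geometrically with n.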

Analysis of Hashing: Introduction.

“Algorithmic analysis” is a branch of computer science that derives quantitative information about the efficiency of computer methods. “Probabilistic analysis of an algorithm” is the study of an algorithm’s running time, considered as a random variable that depends on assumed characteristics of the input data. Hashing is an especially good candidate for probabilistic analysis, because it is an extremely efficient method on the average, even though its worst case is too horrible to contemplate. (The worst case occurs when all keys have the same hash value.) Indeed, a computer programmer who uses hashing had better be a believer in probability theory.

Let P be the number of times step H3 is performed when the algorithm above is used to carry out a search. (Each execution of H3 is called a “probe” in the table.) If we know P, we know how often each step is performed, depending on whether the search is successful or unsuccessful:

    Step    Unsuccessful search    Successful search
    H1      1 time                 1 time
    H2      P + 1 times            P times
    H3      P times                P times
    H4      P times                P - 1 times
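The counts in this table can be checked mechanically by instrumenting a search. The sketch below is not the book's algorithm verbatim: it assumes that H1 hashes the key and fetches the head of its list, H2 tests whether the list is exhausted, H3 compares keys, and H4 advances to the next record, which is a form consistent with the table. The function names and the small three-record example are hypothetical.

```python
def count_steps(FIRST, NEXT, KEY, h, K):
    """Search for K and count how often each (assumed) step H1-H4 is executed.
    Returns (found, steps), where steps maps step names to execution counts."""
    steps = {"H1": 0, "H2": 0, "H3": 0, "H4": 0}
    steps["H1"] += 1                    # H1: hash the key, fetch the list head
    i = h(K)
    j = FIRST[i]
    while True:
        steps["H2"] += 1                # H2: is the list exhausted?
        if j <= 0:
            return False, steps
        steps["H3"] += 1                # H3: one probe
        if KEY[j] == K:
            return True, steps
        steps["H4"] += 1                # H4: move to the next record
        i = j
        j = NEXT[i]


def matches_table(found, steps):
    """Check the step counts against the table, with P = number of probes."""
    P = steps["H3"]
    if found:
        expected = {"H1": 1, "H2": P, "H3": P, "H4": P - 1}
    else:
        expected = {"H1": 1, "H2": P + 1, "H3": P, "H4": P}
    return steps == expected


# A single list (m = 1) holding keys "a", "b", "c" in positions 1, 2, 3.
FIRST = [None, 1]                       # index 0 unused
NEXT = [None, 2, 3, 0]
KEY = [None, "a", "b", "c"]
h = lambda K: 1

for key in ["a", "c", "z"]:
    found, steps = count_steps(FIRST, NEXT, KEY, h, key)
    print(key, found, steps, matches_table(found, steps))
```

A successful search for "c" makes P = 3 probes, and an unsuccessful search for "z" also makes 3 probes but executes H2 one extra time to detect the end of the list.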
