09.12.2012 Views

Concrete mathematics : a foundation for computer science


8.5 HASHING

(Now is a good time to do warmup exercise 6.)

P is still the number of probes.

possible values of Y, and $V(E(X\mid Y))$ is the variance of this random variable with respect to the probability distribution of Y. Similarly, $E(V(X\mid Y))$ is the average of the random variables $V(X\mid y)$ as y varies. On the left of (8.105) is VX, the unconditional variance of X. Since variances are nonnegative, we always have

$$VX \ge V(E(X\mid Y)) \quad\text{and}\quad VX \ge E(V(X\mid Y)). \tag{8.106}$$
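The inequalities in (8.106) can be checked on a small example. The sketch below is only an illustration: the function name `variance_decomposition` and the joint distribution are made up for this purpose, and exact rational arithmetic is used so that the comparison involves no rounding.

```python
from fractions import Fraction


def variance_decomposition(joint):
    """joint maps (x, y) -> Pr(X = x, Y = y).

    Returns the triple (VX, V(E(X|Y)), E(V(X|Y))).
    """
    ex = sum(p * x for (x, _), p in joint.items())
    ex2 = sum(p * x * x for (x, _), p in joint.items())
    vx = ex2 - ex * ex

    # Marginal distribution of Y.
    py = {}
    for (_, y), p in joint.items():
        py[y] = py.get(y, 0) + p

    var_of_mean = 0  # V(E(X|Y)): spread of the conditional means around EX
    mean_of_var = 0  # E(V(X|Y)): average of the conditional variances
    for y0, q in py.items():
        m1 = sum(p * x for (x, y), p in joint.items() if y == y0) / q
        m2 = sum(p * x * x for (x, y), p in joint.items() if y == y0) / q
        var_of_mean += q * (m1 - ex) ** 2
        mean_of_var += q * (m2 - m1 * m1)
    return vx, var_of_mean, mean_of_var


# Hypothetical joint distribution, chosen only for illustration.
joint = {
    (0, 0): Fraction(1, 4),
    (1, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 6),
    (3, 1): Fraction(1, 3),
}
vx, var_of_mean, mean_of_var = variance_decomposition(joint)
```

Both terms on the right are nonnegative and add up to VX, which is why each one separately is at most VX.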

Case 1, again: Unsuccessful search revisited.<br />

Let’s bring our microscopic examination of hashing to a close by doing one more calculation typical of algorithmic analysis. This time we’ll look more closely at the total running time associated with an unsuccessful search, assuming that the computer will insert the previously unknown key into its memory.

The insertion process in (8.83) has two cases, depending on whether j is negative or zero. We have j < 0 if and only if P = 0, since a negative value comes from the FIRST entry of an empty list. Thus, if the list was previously empty, we have P = 0 and we must set $\mathrm{FIRST}[h_{n+1}] := n + 1$. (The new record will be inserted into position n + 1.) Otherwise we have P > 0 and we must set a LINK entry to n + 1. These two cases may take different amounts of time; therefore the total running time for an unsuccessful search has the form

$$T = \alpha + \beta P + \delta[P=0], \tag{8.107}$$
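The two insertion cases and the cost form (8.107) can be sketched in code. This is a minimal illustration, not a transcription of (8.83): the names `insert_unseen_key` and `running_time` are hypothetical, the key is assumed to be absent (so key comparisons are omitted and each record visited counts as one probe), and the convention that -1 marks an empty list while 0 marks the end of a list is an assumption modeled on the text.

```python
def insert_unseen_key(first, link, keys, key, h):
    """Append `key` (assumed absent) to the list for hash value h.

    Returns P, the number of probes made while walking that list.
    Index 0 of `keys` and `link` is unused so that 0 can mean "end of list".
    """
    probes = 0
    last = 0
    j = first[h]              # assumed convention: -1 marks an empty list
    while j > 0:
        probes += 1           # one probe per record examined
        last = j
        j = link[j]
    keys.append(key)
    link.append(0)            # the new record ends its list
    position = len(keys) - 1  # plays the role of n + 1 in the text
    if probes == 0:
        first[h] = position   # list was empty: set the FIRST entry
    else:
        link[last] = position  # list was nonempty: set a LINK entry
    return probes


def running_time(probes, alpha, beta, delta):
    """T = alpha + beta*P + delta*[P = 0], the form of (8.107)."""
    return alpha + beta * probes + delta * (1 if probes == 0 else 0)
```

The extra term delta is paid only on the empty-list branch, which is what the bracket [P = 0] expresses.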

where α, β, and δ are constants that depend on the computer being used and on the way in which hashing is encoded in that machine’s internal language. It would be nice to know the mean and variance of T, since such information is more relevant in practice than the mean and variance of P.

So far we have used probability generating functions only in connection with random variables that take nonnegative integer values. But it turns out that we can deal in essentially the same way with

$$G_X(z) = \sum_{\omega\in\Omega} \Pr(\omega)\, z^{X(\omega)}$$

when X is any real-valued random variable, because the essential characteristics of X depend only on the behavior of $G_X$ near z = 1, where powers of z are well defined. For example, the running time (8.107) of an unsuccessful search is a random variable, defined on the probability space of equally likely hash values $(h_1,\ldots,h_n;h_{n+1})$ with $1 \le h_j \le m$; we can consider the series

$$G_T(z) = \frac{1}{m^{n+1}} \sum_{h_1=1}^{m}\cdots\sum_{h_n=1}^{m}\;\sum_{h_{n+1}=1}^{m} z^{\alpha+\beta P(h_1,\ldots,h_n;h_{n+1})+\delta[P(h_1,\ldots,h_n;h_{n+1})=0]}$$
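For small m and n, the sum defining G_T(z) can be evaluated by brute force over all m^(n+1) hash sequences. The sketch below rests on two assumptions: that P for an unsuccessful search equals the number of earlier keys whose hash value matches h_{n+1}, and that exact fractions stand in for the real constants α, β, δ. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def running_time_distribution(m, n, alpha, beta, delta):
    """Exact distribution of T = alpha + beta*P + delta*[P = 0] over the
    m**(n+1) equally likely hash sequences (h_1, ..., h_n; h_{n+1})."""
    weight = Fraction(1, m ** (n + 1))
    dist = {}
    for hs in product(range(1, m + 1), repeat=n + 1):
        *old, new = hs
        # Assumption: P is the length of the list that the new key hashes to.
        probes = sum(1 for h in old if h == new)
        t = alpha + beta * probes + delta * (1 if probes == 0 else 0)
        dist[t] = dist.get(t, 0) + weight
    return dist


def G(dist, z):
    """Evaluate the sum of Pr(T = t) * z**t; exponents need not be integers."""
    return sum(float(p) * z ** float(t) for t, p in dist.items())


def mean_and_variance(dist):
    """Mean and variance of T computed directly from its distribution."""
    mean = sum(p * t for t, p in dist.items())
    second_moment = sum(p * t * t for t, p in dist.items())
    return mean, second_moment - mean * mean


dist = running_time_distribution(3, 2, Fraction(3, 2), Fraction(1, 2), Fraction(1, 4))
mean, variance = mean_and_variance(dist)
```

Under the stated assumption about P, linearity of expectation gives a mean of α + βn/m + δ((m−1)/m)^n, which provides a quick sanity check on the enumeration.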
