09.12.2012 Views

Concrete mathematics : a foundation for computer science


Somehow the verb “to hash” magically became standard terminology for key transformation during the mid-1960s, yet nobody was rash enough to use such an undignified word publicly until 1967.
-D. E. Knuth [175]

8.5 HASHING

Let’s conclude this chapter by applying probability theory to computer programming. Several important algorithms for storing and retrieving information inside a computer are based on a technique called “hashing.” The general problem is to maintain a set of records that each contain a “key” value, K, and some data D(K) about that key; we want to be able to find D(K) quickly when K is given. For example, each key might be the name of a student, and the associated data might be that student’s homework grades.

In practice, computers don’t have enough capacity to set aside one memory cell for every possible key; billions of keys are possible, but comparatively few keys are actually present in any one application. One solution to the problem is to maintain two tables KEY[j] and DATA[j] for 1 ≤ j ≤ N, where N is the total number of records that can be accommodated; another variable n tells how many records are actually present. Then we can search for a given key K by going through the table sequentially in an obvious way:

S1 Set j := 1. (We’ve searched through all positions < j.)
S2 If j > n, stop. (The search was unsuccessful.)
S3 If KEY[j] = K, stop. (The search was successful.)
S4 Increase j by 1 and return to step S2. (We’ll try again.)

After a successful search, the desired data entry D(K) appears in DATA[j]. After an unsuccessful search, we can insert K and D(K) into the table by setting

    n := j, KEY[n] := K, DATA[n] := D(K),

assuming that the table was not already filled to capacity.
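Steps S1 to S4 and the insertion rule can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: the class name `SequentialTable` and its method names are assumptions made here, index 0 of each list is left unused so that positions run from 1 to N as in the text, and the choice to leave an already-present key untouched on insertion is also an assumption, since the text only describes insertion after an unsuccessful search.

```python
class SequentialTable:
    """Sequential table with 1-based KEY and DATA arrays (index 0 unused)."""

    def __init__(self, capacity):
        self.N = capacity                    # most records the table can hold
        self.n = 0                           # records actually present
        self.KEY = [None] * (capacity + 1)
        self.DATA = [None] * (capacity + 1)

    def search(self, K):
        """Return (found, j); after an unsuccessful search j equals n + 1."""
        j = 1                                # S1
        while j <= self.n:                   # S2: leave the loop when j > n
            if self.KEY[j] == K:             # S3: successful search
                return True, j
            j += 1                           # S4: try the next position
        return False, j

    def insert(self, K, D):
        """Store K and D(K) after an unsuccessful search; return the position."""
        found, j = self.search(K)
        if found:
            return j                         # assumption: existing record is kept
        if j > self.N:
            raise OverflowError("table is filled to capacity")
        self.n = j                           # n := j
        self.KEY[j] = K                      # KEY[n] := K
        self.DATA[j] = D                     # DATA[n] := D(K)
        return j


table = SequentialTable(capacity=3)
table.insert("Ada", [90, 85])
table.insert("Bob", [70, 75])
found, position = table.search("Bob")        # found is True, position is 2
```

The unsuccessful search leaves j at n + 1, which is exactly the slot the insertion rule fills.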

This method works, but it can be dreadfully slow; we need to repeat step S2 a total of n + 1 times whenever an unsuccessful search is made, and n can be quite large.
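The count of n + 1 can be checked by instrumenting the search. The helper below is a hypothetical illustration (the name `count_s2_executions` is not from the text); it takes an ordinary Python list of keys and counts how often step S2 is reached.

```python
def count_s2_executions(keys, K):
    """Count how many times step S2 runs when searching keys for K."""
    n = len(keys)
    j = 1                          # S1
    s2_count = 0
    while True:
        s2_count += 1              # step S2 is executed here
        if j > n:                  # S2: unsuccessful search
            return s2_count
        if keys[j - 1] == K:       # S3 (the Python list is 0-based)
            return s2_count
        j += 1                     # S4


keys = ["a", "b", "c", "d"]                     # n = 4
misses = count_s2_executions(keys, "z")         # 5, that is, n + 1
hits = count_s2_executions(keys, "c")           # 3, the position of "c"
```

A successful search that finds its key in position j executes S2 only j times, so the n + 1 figure is the worst case.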

Hashing was invented to speed things up. The basic idea, in one of its popular forms, is to use m separate lists instead of one giant list. A “hash function” transforms every possible key K into a list number h(K) between 1 and m. An auxiliary table FIRST[i] for 1 ≤ i ≤ m points to the first record in list i; another auxiliary table NEXT[j] for 1 ≤ j ≤ N points to the record following record j in its list. We assume that

    FIRST[i] = -1, if list i is empty;
    NEXT[j] = 0, if record j is the last in its list.

As before, there’s a variable n that tells how many records have been stored altogether.
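The four tables and the two sentinel conventions can be sketched in Python as follows. The passage above defines only the data layout, so the search and insertion logic here is an assumption about how the tables might be used, not the book's stated procedure. The class name `ChainedTable` and the hash function `h` (character codes summed modulo m, plus 1) are likewise hypothetical, and index 0 of every list is unused so that indices match the text.

```python
class ChainedTable:
    """m separate lists threaded through 1-based KEY, DATA, and NEXT arrays."""

    def __init__(self, m, capacity):
        self.m = m
        self.N = capacity
        self.n = 0                            # records stored altogether
        self.FIRST = [-1] * (m + 1)           # FIRST[i] = -1: list i is empty
        self.NEXT = [0] * (capacity + 1)      # NEXT[j] = 0: j is last in its list
        self.KEY = [None] * (capacity + 1)
        self.DATA = [None] * (capacity + 1)

    def h(self, K):
        """Hypothetical hash function mapping a string key to 1..m."""
        return sum(ord(c) for c in K) % self.m + 1

    def search(self, K):
        """Return (j, i). j > 0 is the record holding K; j <= 0 means absent.

        When absent, i is the list number if j == -1 (empty list), or the
        last record of the list if j == 0.
        """
        i = self.h(K)
        j = self.FIRST[i]
        while j > 0:
            if self.KEY[j] == K:
                return j, i
            i, j = j, self.NEXT[j]            # step to the following record
        return j, i

    def insert(self, K, D):
        """Append K to list h(K) unless already present; return its record."""
        j, i = self.search(K)
        if j > 0:
            return j                          # assumption: existing record is kept
        if self.n == self.N:
            raise OverflowError("table is filled to capacity")
        self.n += 1
        if j < 0:
            self.FIRST[i] = self.n            # list i was empty
        else:
            self.NEXT[i] = self.n             # link after the old last record
        self.KEY[self.n] = K
        self.DATA[self.n] = D
        self.NEXT[self.n] = 0                 # the new record is last in its list
        return self.n


table = ChainedTable(m=3, capacity=5)
for name in ["a", "d", "b"]:                  # "a" and "d" both hash to list 2
    table.insert(name, name.upper())
```

In this sketch the two different sentinels are convenient: a failed search ending with -1 means FIRST must be updated, while one ending with 0 means NEXT of the last record must be updated.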
