The.Algorithm.Design.Manual.Springer-Verlag.1998

Dictionaries

Bottom line: Which binary search tree is best for your application? Probably the balanced tree for which you have the best implementation readily available. See the choices below. Which flavor of balanced tree you choose is probably not as important as how good the programmer was who coded it.

● B-trees - For data sets so large that they will not fit in main memory (say more than 1,000,000 items) your best bet will be some flavor of a B-tree. As soon as the data structure gets outside of main memory, the search time to access a particular location on a disk or CD-ROM can kill you, since this is several orders of magnitude slower than accessing RAM.

The idea behind a B-tree is to collapse several levels of a binary search tree into a single large node, so that we can make the equivalent of several search steps before another disk access is needed. We can thereafter reference enormous numbers of keys using only a few disk accesses. To get the full benefit from using a B-tree, it is important to understand explicitly how the secondary storage device and virtual memory interact, through constants such as page size and virtual/real address space.

Even for modest-sized data sets, unexpectedly poor performance of a data structure may be due to excessive swapping, so listen to your disk to help decide whether you should be using a B-tree.

● Skip lists - These are somewhat of a cult data structure. Their primary benefit seems to be ease of implementation relative to balanced trees. If you are using a canned tree implementation, and thus not coding it yourself, this benefit is eliminated. I wouldn't bother with them.
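
The B-tree argument above, that collapsing several binary levels into one node leaves only a few disk accesses per search, can be checked with a little arithmetic. The sketch below is illustrative only: the function name `levels_needed`, the key count, and the 256-way node size are assumptions chosen for the example, not figures from the text, and it treats every tree level as one disk access.

```python
def levels_needed(num_keys: int, fanout: int) -> int:
    """Return the smallest h with fanout**h >= num_keys.

    This is the number of levels a search must descend in a perfectly
    balanced tree whose nodes each branch `fanout` ways.
    """
    levels, reach = 0, 1
    while reach < num_keys:
        reach *= fanout   # one more level multiplies the reachable keys
        levels += 1
    return levels


NUM_KEYS = 10**9          # assumed data set size, too big for main memory
BINARY = 2                # ordinary binary search tree
WIDE = 256                # assumed B-tree node fanout (2**8)

binary_levels = levels_needed(NUM_KEYS, BINARY)
wide_levels = levels_needed(NUM_KEYS, WIDE)

print("binary tree levels:", binary_levels)
print("256-way tree levels:", wide_levels)
```

Because 256 = 2^8, each wide node stands in for eight binary levels, which is why the level count drops by roughly a factor of eight. The right fanout in practice depends on the page size and key size of the system at hand.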

Implementations: LEDA (see Section ) provides an extremely complete collection of dictionary data structures in C++, including hashing, perfect hashing, B-trees, red-black trees, random search trees, and skip lists. Given all of these choices, their default dictionary implementation is a randomized search tree [AS89], presumably reflecting which structure they expect to be most efficient in practice.
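
For readers unfamiliar with randomized search trees, the following is a minimal sketch of one common formulation, the treap: each key gets a random priority, the tree is a binary search tree on keys and a heap on priorities, and rotations restore the heap order after insertion. This is not LEDA's code or interface; every name here (`Node`, `insert`, `contains`, `height`) is hypothetical, and deletion is omitted.

```python
import random


class Node:
    __slots__ = ("key", "priority", "left", "right")

    def __init__(self, key, rng):
        self.key = key
        self.priority = rng.random()  # random priority drives the balancing
        self.left = None
        self.right = None


def rotate_right(node):
    """Lift node.left above node, preserving search-tree order."""
    pivot = node.left
    node.left = pivot.right
    pivot.right = node
    return pivot


def rotate_left(node):
    """Lift node.right above node, preserving search-tree order."""
    pivot = node.right
    node.right = pivot.left
    pivot.left = node
    return pivot


def insert(node, key, rng):
    """Insert key into the subtree rooted at node; return the new root."""
    if node is None:
        return Node(key, rng)
    if key < node.key:
        node.left = insert(node.left, key, rng)
        if node.left.priority > node.priority:
            node = rotate_right(node)
    elif key > node.key:
        node.right = insert(node.right, key, rng)
        if node.right.priority > node.priority:
            node = rotate_left(node)
    return node  # a duplicate key leaves the tree unchanged


def contains(node, key):
    """Ordinary binary-search-tree lookup."""
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


# Sorted insertions are the worst case for an unbalanced binary search tree.
rng = random.Random(0)
root = None
for key in range(1000):
    root = insert(root, key, rng)
print("height after 1000 sorted insertions:", height(root))
```

An unbalanced binary search tree fed the same sorted input would have height 1000; the random priorities keep the expected height logarithmic in the number of keys.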

XTango (see Section ) is an algorithm animation system for UNIX and X-windows that includes animations of such dictionary data structures as AVL trees, binary search trees, hashing, red-black trees, and treaps (randomized search trees). Many of these are interesting and quite informative to watch. Further, the C source code for each animation is included.

The 1996 DIMACS implementation challenge focused on elementary data structures like dictionaries. The world's best available implementations were likely to be identified during the course of the challenge, and they are accessible from http://dimacs.rutgers.edu/.

Bare bones implementations in C and Pascal of a dizzying variety of dictionary data structures appear in [GBY91], among them several variations on hashing and binary search trees, and optimal binary search tree construction. See Section for details.
