The Algorithm Design Manual, Springer-Verlag, 1998


18.04.2013 Views


Algorithms, Mon Jun 2 23:33:50 EDT 1997


Bucketing Techniques

If we were sorting names for the telephone book, we could start by partitioning the names according to the first letter of the last name. That will create 26 different piles, or buckets, of names. Observe that any name in the J pile must occur after every name in the I pile but before any name in the K pile. Therefore, we can proceed to sort each pile individually and just concatenate the bunch of piles together.

If the names are distributed fairly evenly among the buckets, as we might expect, the resulting 26 sorting problems should each be substantially smaller than the original problem. Further, by now partitioning each pile based on the second letter of each name, we generate smaller and smaller piles. The names will be sorted as soon as each bucket contains only a single name. The resulting algorithm is commonly called bucketsort or distribution sort.
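The refinement described above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the names `bucket_sort` and `depth` are assumptions introduced here, comparison is case-sensitive, and the handful of bucket keys at each level is ordered with Python's built-in `sorted`.

```python
from collections import defaultdict


def bucket_sort(names, depth=0):
    """Sort strings by recursively bucketing on the character at `depth`."""
    # A pile holding at most one name is already sorted.
    if len(names) <= 1:
        return list(names)

    finished = []                # names with no character left at this depth
    buckets = defaultdict(list)  # character -> names sharing that character
    for name in names:
        if len(name) == depth:
            finished.append(name)
        else:
            buckets[name[depth]].append(name)

    # Every name in this pile shares its first `depth` characters, so the
    # exhausted names are identical and precede all longer names.
    result = finished
    # Visiting the buckets in character order makes concatenation valid.
    for ch in sorted(buckets):
        result.extend(bucket_sort(buckets[ch], depth + 1))
    return result


print(bucket_sort(["Kim", "Jones", "Ito", "Shifflett", "Shi", "Jones"]))
# ['Ito', 'Jones', 'Jones', 'Kim', 'Shi', 'Shifflett']
```

Note that a pile of names sharing a long common prefix forces one level of recursion per shared character without splitting the pile at all.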

Bucketing is a very effective idea whenever we are confident that the distribution of data will be roughly uniform. It is the idea that underlies hash tables, kd-trees, and a variety of other practical data structures. The downside of such techniques is that the performance can be terrible whenever the data distribution is not what we expected. Although data structures such as balanced binary trees offer guaranteed worst-case behavior for any input distribution, no such promise exists for heuristic data structures on unexpected input distributions.
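One rough way to see how well a bucketing rule spreads a given input is to measure the share of keys that land in the fullest bucket. The sketch below assumes a non-empty input; the helper names `largest_bucket_share` and `first_letter` and both sample lists are made up for illustration. The same measurement would apply to a hash function in place of `first_letter`.

```python
from collections import Counter


def largest_bucket_share(keys, bucket_of):
    """Return the fraction of keys that fall into the fullest bucket."""
    counts = Counter(bucket_of(key) for key in keys)
    return max(counts.values()) / len(keys)


def first_letter(name):
    """Bucketing rule from the text: the first letter of the name."""
    return name[0].upper()


# Illustrative inputs only, not real directory data.
even = ["Adams", "Baker", "Clark", "Davis", "Evans", "Frank"]
skewed = ["Shifflett"] * 5 + ["Adams"]

print(largest_bucket_share(even, first_letter))    # one name in six
print(largest_bucket_share(skewed, first_letter))  # five names in six
```

A share close to one divided by the number of buckets suggests the uniformity assumption holds; a share close to one means a single bucket is nearly as large as the original problem.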

Figure: A small subset of Charlottesville Shiffletts

To show that non-uniform distributions occur in real life, consider Americans with the uncommon last name of Shifflett. The 1997 Manhattan telephone directory, with over one million names, contains exactly five Shiffletts. So how many Shiffletts should there be in a small city of 50,000 people? The figure shows a small portion of the two and a half pages of Shiffletts in the Charlottesville, Virginia telephone book. The Shifflett clan is a fixture of the region, but it would play havoc with any distribution sort program, as refining buckets from S to Sh to Shi to Shif and so on to Shifflett results in no significant partitioning.
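The arithmetic behind the question, and the effect of refining by prefix, can be checked with a short sketch. It assumes that directory entries and residents are comparable, and it rounds "over one million" down to one million, so the estimate is an upper bound. The function `bucket_counts_by_depth` and the first names in `pile` are hypothetical.

```python
# Figures quoted in the text, with the directory size rounded down.
manhattan_names = 1_000_000
manhattan_shiffletts = 5
small_city_people = 50_000

# Expected count if the small city matched Manhattan's rate.
expected = manhattan_shiffletts * small_city_people / manhattan_names
print(expected)  # 0.25


def bucket_counts_by_depth(names, max_depth):
    """For each prefix length 1..max_depth, count the distinct buckets."""
    return [len({name[:d] for name in names}) for d in range(1, max_depth + 1)]


# Hypothetical entries: same surname, different first names.
pile = ["Shifflett Ann", "Shifflett Bob", "Shifflett Carl"]
print(bucket_counts_by_depth(pile, 11))
# [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3]
```

Under a uniform model we would expect well under one Shifflett, yet the directory holds pages of them. The pile also stays in a single bucket for ten rounds of refinement before the first names finally separate it.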

