The Algorithm Design Manual, Springer-Verlag, 1998
Algorithms Mon Jun 2 23:33:50 EDT 1997
Bucketing Techniques

If we were sorting names for the telephone book, we could start by partitioning the names according to the first letter of the last name. That will create 26 different piles, or buckets, of names. Observe that any name in the J pile must occur after every name in the I pile but before any name in the K pile. Therefore, we can proceed to sort each pile individually and then concatenate the sorted piles in alphabetical order.

If the names are distributed fairly evenly among the buckets, as we might expect, the resulting 26 sorting problems should each be substantially smaller than the original problem. Further, by now partitioning each pile based on the second letter of each name, we generate smaller and smaller piles. The names will be sorted as soon as each bucket contains only a single name. The resulting algorithm is commonly called bucketsort or distribution sort.
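
The letter-by-letter refinement described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a tuned implementation: the function name `bucket_sort` and its `depth` parameter are our own choices, and we assume the names are plain strings compared character by character.

```python
from collections import defaultdict


def bucket_sort(names, depth=0):
    """Sort strings by recursively bucketing on the character at index `depth`.

    Every name passed to one call is assumed to share its first `depth`
    characters, which holds by construction of the recursion.
    """
    if len(names) <= 1:
        return list(names)  # a bucket with at most one name is already sorted

    exhausted = []               # names with no character left at this position
    buckets = defaultdict(list)  # one pile per distinct character
    for name in names:
        if len(name) <= depth:
            exhausted.append(name)
        else:
            buckets[name[depth]].append(name)

    # A name that has run out of characters precedes every longer name
    # sharing its prefix, so those come first.
    result = exhausted
    # Visit the piles in character order and concatenate the sorted piles.
    for letter in sorted(buckets):
        result.extend(bucket_sort(buckets[letter], depth + 1))
    return result
```

Calling `bucket_sort(["Kim", "Jones", "Ito", "Jackson", "Irwin", "Jo"])` should give the same order as Python's built-in `sorted` on that list. Note that the piles themselves are visited via `sorted(buckets)`; with a fixed alphabet of 26 letters this costs only a constant amount of work per call.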

Bucketing is a very effective idea whenever we are confident that the distribution of data will be roughly uniform. It is the idea that underlies hash tables, kd-trees, and a variety of other practical data structures. The downside of such techniques is that the performance can be terrible whenever the data distribution is not what we expected. Although data structures such as balanced binary trees offer guaranteed worst-case behavior for any input distribution, no such promise exists for heuristic data structures on unexpected input distributions.

Figure: A small subset of Charlottesville Shiffletts

To show that non-uniform distributions occur in real life, consider Americans with the uncommon last name of Shifflett. The 1997 Manhattan telephone directory, with over one million names, contains exactly five Shiffletts. So how many Shiffletts should there be in a small city of 50,000 people? The figure shows a small portion of the two and a half pages of Shiffletts in the Charlottesville, Virginia telephone book. The Shifflett clan is a fixture of the region, but it would play havoc with any distribution sort program, as refining buckets from S to Sh to Shi to Shif to ... to Shifflett results in no significant partitioning.
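
To see how little each refinement step helps on such data, we can measure what fraction of the names falls into the fullest bucket for a given prefix length. The sketch below is illustrative only: `largest_bucket_fraction` is a hypothetical helper, and `toy_names` is an invented eight-entry list, not data from any real directory.

```python
from collections import Counter


def largest_bucket_fraction(names, prefix_len):
    """Return the fraction of names in the fullest bucket when
    bucketing on the first `prefix_len` characters of each name."""
    counts = Counter(name[:prefix_len] for name in names)
    return max(counts.values()) / len(names)


# Invented toy input (an assumption for illustration): half the entries
# share one surname and differ only in the trailing initial.
toy_names = [
    "Shifflett A", "Shifflett B", "Shifflett C", "Shifflett D",
    "Shaw E", "Smith F", "Adams G", "Baker H",
]
```

On this toy list the fullest bucket shrinks only from six names to four as the prefix grows from one character to three, and it then stays at four names for every prefix length up to ten. Only at the eleventh character, where the entries finally differ, does each name land in its own bucket.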