The.Algorithm.Design.Manual.Springer-Verlag.1998
The.Algorithm.Design.Manual.Springer-Verlag.1998 The.Algorithm.Design.Manual.Springer-Verlag.1998
War Story: Dialing for Documents distribution of letter frequencies to help us decode the text. For example, if you hit the 3 key while typing English, you were more likely to have meant to type an `E' than either a `D' or `F'. By taking into account the frequencies of a window of three characters, we could predict the typed text. Indeed, this is what happened when I tried it on the Gettysburg Address: enurraore ane reten yeasr ain our ectherr arotght eosti on ugis aootinent a oey oation aoncdivee in licesty ane eedicatee un uhe rrorosition uiat all oen are arectee e ual ony ye are enichde in a irect aitil yar uestini yhethes uiat oatioo or aoy oation ro aoncdivee ane ro eedicatee aan loni eneure ye are oet on a irect aattlediele oe uiat yar ye iate aone un eedicate a rostion oe uiat eiele ar a einal restini rlace eor uiore yin iere iate uhdis lives uiat uhe oation ogght live it is aluniethes eittini ane rrores uiat ye rioule en ugir att in a laries reore ye aan oou eedicate ye aan oou aoorearate ye aan oou ialloy ugis iroune the arate oen litini ane eeae yin rustgilee iere iate aoorearatee it ear aante our roor rowes un ade or eeuraat the yople yill little oote oor loni renences yiat ye ray iere att it aan oetes eosiet yiat uhfy eie iere it is eor ur uhe litini rathes un ae eedicatee iere un uhe undiniside yopl yhici uhfy yin entght iere iate uiur ear ro onaky aetancde it is rathes eor ur un ae iere eedicatee un uhe irect uarl rencinini adeore ur uiat eron uhere ioooree eeae ye uale inarearee eeuotion uo tiat aaure eor yhici uhfy iere iate uhe lart eull oearure oe eeuotioo tiat ye iere iggily rerolue uiat uhere eeae riall oou iate eide io The trigram statistics did a decent job of translating it into Greek, but a terrible job of transcribing English. One reason was clear. This algorithm knew nothing about English words. If we coupled it with a dictionary, we might be on to something. The difficulty was that often two words in the dictionary would be represented by the exact same string of phone codes. For an extreme example, the code string ``22737'' collides with eleven distinct English words, including cases, cares, cards, capes, caper, and bases. As a first attempt, we reported the unambiguous characters of any words that collided in the dictionary, and used trigrams to fill in the rest of the characters. We were rewarded with: eourscore and seven yearr ain our eatherr brought forth on this continent azoey nation conceivee in liberty and dedicatee uo uhe proposition that all men are createe equal ony ye are engagee in azipeat civil yar uestioi whether that nation or aoy nation ro conceivee and ro dedicatee aan long endure ye are oet on azipeat battlefield oe that yar ye iate aone uo dedicate a rostion oe that field ar a final perthni place for those yin here iate their lives that uhe nation oight live it is altogether fittinizane proper that ye should en this aut in a larges sense ye aan oou dedicate ye aan oou consecrate ye aan oou hallow this ground the arate men litioi and deae yin strugglee here iate consecratee it ear above our roor power uo ade or detract the world will little oote oor long remember what ye ray here aut it aan meter forget what uhfy die here it is for ur uhe litioi rather uo ae dedicatee here uo uhe toeioisgee york which uhfy yin fought here iate thus ear ro mocky advancee it is rather for ur uo ae here dedicatee uo uhe great task renagogoi adfore ur that from there honoree deae ye uale increasee devotion uo that aause for which uhfy here iate uhe last eull measure oe devotion that ye here highky resolve that there deae shall oou iate fide io vain that this nation under ioe shall iate azoey birth oe freedom and that ioternmenu oe uhe people ay uhe people for uhe people shall oou perish from uhe earth file:///E|/BOOK/BOOK2/NODE80.HTM (2 of 6) [19/1/2003 1:29:21]
War Story: Dialing for Documents If you were a student of American history, maybe you could recognize it, but you certainly couldn't read it. Somehow we had to distinguish between the different dictionary words that got hashed to the same code. We could factor in the relative popularity of each word, which would help, but this would still make too many mistakes. At this point I started working with Harald Rau on the project, who proved a great collaborator for two reasons. First, he was a bright and peristent graduate student. Second, as a native German speaker he would believe every lie I told him about English grammar. Figure: The phases of the telephone code reconstruction process Harald built up a phone code reconstruction program on the lines of Figure . It worked on the input one sentence at a time, identifying dictionary words that matched each code string. The key problem was how to incorporate the grammatical constraints. ``We can get good word-use frequencies and grammatical information using this big text database called the Brown Corpus. It contains thousands of typical English sentences, each of which is parsed according to parts of speech. But how do we factor it all in?'' Harald asked. ``Let's try to think about it as a graph problem,'' I suggested. file:///E|/BOOK/BOOK2/NODE80.HTM (3 of 6) [19/1/2003 1:29:21]
- Page 209 and 210: Divide and Conquer Next: Fast Expon
- Page 211 and 212: Fast Exponentiation Next: Binary Se
- Page 213 and 214: Binary Search Next: Square and Othe
- Page 215 and 216: Exercises Next: Implementation Chal
- Page 217 and 218: Exercises whose denominations are ,
- Page 219 and 220: The Friendship Graph Next: Data Str
- Page 221 and 222: The Friendship Graph friendship gra
- Page 223 and 224: Data Structures for Graphs linked t
- Page 225 and 226: War Story: Getting the Graph ``Well
- Page 227 and 228: Traversing a Graph Next: Breadth-Fi
- Page 229 and 230: Breadth-First Search Next: Depth-Fi
- Page 231 and 232: Depth-First Search Next: Applicatio
- Page 233 and 234: Depth-First Search Next: Applicatio
- Page 235 and 236: Connected Components Next: Tree and
- Page 237 and 238: Two-Coloring Graphs Next: Topologic
- Page 239 and 240: Topological Sorting Next: Articulat
- Page 241 and 242: Modeling Graph Problems Next: Minim
- Page 243 and 244: Modeling Graph Problems good line s
- Page 245 and 246: Minimum Spanning Trees ● Prim's A
- Page 247 and 248: Prim's Algorithm inserted edge (x,y
- Page 249 and 250: Kruskal's Algorithm a tree of weigh
- Page 251 and 252: Dijkstra's Algorithm Next: All-Pair
- Page 253 and 254: All-Pairs Shortest Path Next: War S
- Page 255 and 256: War Story: Nothing but Nets Next: W
- Page 257 and 258: War Story: Nothing but Nets ``You a
- Page 259: War Story: Dialing for Documents Ne
- Page 263 and 264: War Story: Dialing for Documents CO
- Page 265 and 266: Exercises Next: Implementation Chal
- Page 267 and 268: Exercises Prove the statement or gi
- Page 269 and 270: Backtracking report it. kth element
- Page 271 and 272: Constructing All Subsets Next: Cons
- Page 273 and 274: Constructing All Paths in a Graph N
- Page 275 and 276: Search Pruning Next: Bandwidth Mini
- Page 277 and 278: Bandwidth Minimization immediately
- Page 279 and 280: War Story: Covering Chessboards Nex
- Page 281 and 282: War Story: Covering Chessboards att
- Page 283 and 284: Heuristic Methods Mon Jun 2 23:33:5
- Page 285 and 286: Simulated Annealing Return S then u
- Page 287 and 288: Traveling Salesman Problem Next: Ma
- Page 289 and 290: Independent Set Next: Circuit Board
- Page 291 and 292: Neural Networks Next: Genetic Algor
- Page 293 and 294: Genetic Algorithms Next: War Story:
- Page 295 and 296: War Story: Annealing Arrays Next: P
- Page 297 and 298: War Story: Annealing Arrays optimal
- Page 299 and 300: Parallel Algorithms Next: War Story
- Page 301 and 302: War Story: Going Nowhere Fast Next:
- Page 303 and 304: Exercises Next: Implementation Chal
- Page 305 and 306: Problems and Reductions Next: Simpl
- Page 307 and 308: Simple Reductions Next: Hamiltonian
- Page 309 and 310: Hamiltonian Cycles Next: Independen
War Story: Dialing for Documents<br />
If you were a student of American history, maybe you could recognize it, but you certainly couldn't read<br />
it. Somehow we had to distinguish between the different dictionary words that got hashed to the same<br />
code. We could factor in the relative popularity of each word, which would help, but this would still<br />
make too many mistakes.<br />
At this point I started working with Harald Rau on the project, who proved a great collaborator for two<br />
reasons. First, he was a bright and peristent graduate student. Second, as a native German speaker he<br />
would believe every lie I told him about English grammar.<br />
Figure: <strong>The</strong> phases of the telephone code reconstruction process<br />
Harald built up a phone code reconstruction program on the lines of Figure . It worked on the input<br />
one sentence at a time, identifying dictionary words that matched each code string. <strong>The</strong> key problem was<br />
how to incorporate the grammatical constraints.<br />
``We can get good word-use frequencies and grammatical information using this big text database called<br />
the Brown Corpus. It contains thousands of typical English sentences, each of which is parsed according<br />
to parts of speech. But how do we factor it all in?'' Harald asked.<br />
``Let's try to think about it as a graph problem,'' I suggested.<br />
file:///E|/BOOK/BOOK2/NODE80.HTM (3 of 6) [19/1/2003 1:29:21]