12.07.2015 Views

Dictionaries and Tolerant Retrieval

Introduction to Information Retrieval, Sec. 3.3

Document correction

• Especially needed for OCR'ed documents
• Correction algorithms are tuned for this kind of error, e.g., "rn" misread as "m"
• Can use domain-specific knowledge
• E.g., OCR confuses O and D more often than it confuses O and I, whereas O and I are adjacent on the QWERTY keyboard and so are more likely to be interchanged in typing
• But also: web pages and even printed material have typos
• Goal: the dictionary contains fewer misspellings
• But often we don't change the documents and instead aim to fix the query-document mapping
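
One way to picture "tuning" a correction algorithm for OCR is an edit distance whose costs reflect which confusions are likely. The sketch below is only an illustration under assumptions: the cost values, the tables `OCR_SUB_COST` and `OCR_MERGE_COST`, and the functions `ocr_edit_distance` and `correct` are hypothetical names and numbers, not taken from the slides.

```python
# Hypothetical confusion costs: illustrative values, not measured from real OCR output.
OCR_SUB_COST = {("O", "D"): 0.3, ("D", "O"): 0.3}        # single-character confusions
OCR_MERGE_COST = {("rn", "m"): 0.3, ("m", "rn"): 0.3}    # multi-character confusions


def ocr_edit_distance(source, target):
    """Weighted edit distance from source to target.

    Insertions, deletions and ordinary substitutions cost 1.0;
    the OCR-typical confusions listed above cost less.
    """
    n, m = len(source), len(target)
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = float(i)          # delete all of source[:i]
    for j in range(1, m + 1):
        dist[0][j] = float(j)          # insert all of target[:j]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = source[i - 1], target[j - 1]
            sub = 0.0 if a == b else OCR_SUB_COST.get((a, b), 1.0)
            best = min(
                dist[i - 1][j] + 1.0,      # deletion
                dist[i][j - 1] + 1.0,      # insertion
                dist[i - 1][j - 1] + sub,  # match or substitution
            )
            # Cheap multi-character rewrites such as "rn" <-> "m".
            for (src, tgt), cost in OCR_MERGE_COST.items():
                ls, lt = len(src), len(tgt)
                if (i >= ls and j >= lt
                        and source[i - ls:i] == src
                        and target[j - lt:j] == tgt):
                    best = min(best, dist[i - ls][j - lt] + cost)
            dist[i][j] = best
    return dist[n][m]


def correct(token, vocabulary):
    """Return the vocabulary term closest to token under the weighted distance."""
    return min(vocabulary, key=lambda term: ocr_edit_distance(token, term))
```

With these assumed costs, `correct("cornputer", ["computer", "commuter", "corner"])` picks `"computer"`, because rewriting "rn" as "m" is cheap. A corrector tuned for typing errors would use a different table, for example one that makes substitutions between keyboard-adjacent letters such as O and I cheap instead.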
