12.07.2015 Views

Dictionaries and Tolerant Retrieval



Introduction to Information Retrieval

Ch. 2
Recap of the previous lecture

• The type/token distinction
  • Terms are normalized types put in the dictionary
• Tokenization problems:
  • Hyphens, apostrophes, compounds, Chinese
• Term equivalence classing:
  • Numbers, case folding, stemming, lemmatization
• Skip pointers
  • Encoding a tree-like structure in a postings list
• Biword indexes for phrases
• Positional indexes for phrases/proximity queries
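As a reminder of how the skip pointers in the recap are used, here is a minimal sketch of intersecting two sorted postings lists with skips. The function names `add_skips` and `intersect_with_skips` are hypothetical, and placing skips every sqrt(n) entries is an assumed heuristic rather than anything stated on the slide.

```python
import math


def add_skips(postings):
    """Map a position to its skip target, spacing skips about sqrt(n) apart.

    The sqrt(n) spacing is an assumption made for this sketch.
    """
    n = len(postings)
    step = int(math.sqrt(n))
    if step < 2:
        return {}
    # i < n - step guarantees that every target i + step is a valid index.
    return {i: i + step for i in range(0, n - step, step)}


def intersect_with_skips(p1, p2):
    """Intersect two sorted lists of docIDs, following skips when safe."""
    s1, s2 = add_skips(p1), add_skips(p2)
    i = j = 0
    answer = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # Take the skip only if it does not jump past a possible match.
            if i in s1 and p1[s1[i]] <= p2[j]:
                i = s1[i]
            else:
                i += 1
        else:
            if j in s2 and p2[s2[j]] <= p1[i]:
                j = s2[j]
            else:
                j += 1
    return answer


if __name__ == "__main__":
    a = [2, 4, 8, 16, 19, 23, 28, 43]
    b = [1, 2, 3, 5, 8, 41, 51, 60, 71]
    print(intersect_with_skips(a, b))
```

A skip is followed only when the docID at its target is still less than or equal to the current docID in the other list, so no common document can be jumped over.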
