Introduction to Information Retrieval
Collection frequency vs. document frequency
Word        Collection frequency   Document frequency
insurance   10440                  3997
try         10422                  8760
• Collection frequency (cf) of t: the number of tokens of t in the collection
• Document frequency (df) of t: the number of documents that t occurs in
• Why these numbers? The two words have almost the same cf, but try occurs in more than twice as many documents as insurance.
• Which word is a better search term (and should get a higher weight)? insurance: its occurrences are concentrated in fewer documents, so it discriminates better between documents.
• This example suggests that df (and idf) is better for weighting than cf (and “icf”): cf barely distinguishes the two words, while df does.
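The following is a minimal Python sketch of the two definitions, assuming documents are already tokenized into lists of strings. The toy collection, the function names `collection_and_document_frequency` and `idf`, and the base-10 formula idf_t = log10(N / df_t) are illustrative assumptions, not taken from the slide. The final loop uses the slide's own counts to show the average number of occurrences per containing document.

```python
import math
from collections import Counter

def collection_and_document_frequency(docs):
    """Return (cf, df) Counters for a list of tokenized documents.

    cf[t]: total number of tokens of t across the whole collection.
    df[t]: number of documents that contain t at least once.
    """
    cf = Counter()
    df = Counter()
    for tokens in docs:
        cf.update(tokens)       # every occurrence counts toward cf
        df.update(set(tokens))  # each document counts at most once toward df
    return cf, df

def idf(df_t, n_docs):
    """Inverse document frequency, assuming the base-10 form log10(N / df_t)."""
    return math.log10(n_docs / df_t)

# Hypothetical toy collection (not the collection behind the slide's numbers):
# "insurance" is concentrated in one document, "try" is spread over three.
docs = [
    ["insurance", "claim", "insurance", "insurance"],
    ["try", "again"],
    ["try", "this"],
    ["try", "that"],
]

cf, df = collection_and_document_frequency(docs)
N = len(docs)
for term in ("insurance", "try"):
    # Same cf for both terms, but df (and hence idf) differs.
    print(term, cf[term], df[term], round(idf(df[term], N), 3))
# insurance 3 1 0.602
# try 3 3 0.125

# The slide's counts: average occurrences per document that contains the term.
slide = {"insurance": (10440, 3997), "try": (10422, 8760)}
for term, (cf_t, df_t) in slide.items():
    print(term, round(cf_t / df_t, 2))
# insurance 2.61
# try 1.19
```

In the toy collection both words have cf = 3, so any cf-based weight treats them alike, while idf gives insurance roughly five times the weight of try. This mirrors the slide's argument on a small scale.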