Introduction to Information Retrieval
Introduction to Information Retrieval Introduction to Information Retrieval
Introduction to Information Retrieval Document frequency • We want high weights for rare terms like ARACHNOCENTRIC. • We want low (positive) weights for frequent words like GOOD, INCREASE and LINE. • We will use document frequency to factor this into computing the matching score. • The document frequency is the number of documents in the collection that the term occurs in. 22
Introduction to Information Retrieval idf weight • df t is the document frequency, the number of documents that t occurs in. • df t is an inverse measure of the informativeness of term t. • We define the idf weight of term t as follows: (N is the number of documents in the collection.) • idf t is a measure of the informativeness of the term. • [log N/df t ] instead of [N/df t ] to “dampen” the effect of idf • Note that we use the log transformation for both term frequency and document frequency. 23
- Page 1 and 2: Introduction to Information Retriev
- Page 3 and 4: Introduction to Information Retriev
- Page 5 and 6: Introduction to Information Retriev
- Page 7 and 8: Introduction to Information Retriev
- Page 9 and 10: Introduction to Information Retriev
- Page 11 and 12: Introduction to Information Retriev
- Page 13 and 14: Introduction to Information Retriev
- Page 15 and 16: Introduction to Information Retriev
- Page 17 and 18: Introduction to Information Retriev
- Page 19 and 20: Introduction to Information Retriev
- Page 21: Introduction to Information Retriev
- Page 25 and 26: Introduction to Information Retriev
- Page 27 and 28: Introduction to Information Retriev
- Page 29 and 30: Introduction to Information Retriev
- Page 31 and 32: Introduction to Information Retriev
- Page 33 and 34: Introduction to Information Retriev
- Page 35 and 36: Introduction to Information Retriev
- Page 37 and 38: Introduction to Information Retriev
- Page 39 and 40: Introduction to Information Retriev
- Page 41 and 42: Introduction to Information Retriev
- Page 43 and 44: Introduction to Information Retriev
- Page 45 and 46: Introduction to Information Retriev
- Page 47 and 48: Introduction to Information Retriev
- Page 49 and 50: Introduction to Information Retriev
- Page 51 and 52: Introduction to Information Retriev
- Page 53: Introduction to Information Retriev
<strong>Introduction</strong> <strong>to</strong> <strong>Information</strong> <strong>Retrieval</strong><br />
idf weight<br />
• df t is the document frequency, the number of documents<br />
that t occurs in.<br />
• df t is an inverse measure of the informativeness of term t.<br />
• We define the idf weight of term t as follows:<br />
(N is the number of documents in the collection.)<br />
• idf t is a measure of the informativeness of the term.<br />
• [log N/df t ] instead of [N/df t ] <strong>to</strong> “dampen” the effect of idf<br />
• Note that we use the log transformation for both term<br />
frequency and document frequency.<br />
23