Introduction to Information Retrieval
Introduction to Information Retrieval Introduction to Information Retrieval
Introduction to Information Retrieval Term frequency tf • The term frequency tf t,d of term t in document d is defined as the number of times that t occurs in d. • We want to use tf when computing query-document match scores. • But how? • Raw term frequency is not what we want because: • A document with tf = 10 occurrences of the term is more relevant than a document with tf = 1 occurrence of the term. • But not 10 times more relevant. • Relevance does not increase proportionally with term frequency. 16
Introduction to Information Retrieval Instead of raw frequency: Log frequency weighting • The log frequency weight of term t in d is defined as follows • tf t,d → w t,d : 0 → 0, 1 → 1, 2 → 1.3, 10 → 2, 1000 → 4, etc. • Score for a document-query pair: sum over terms t in both q and d: tf-matching-score(q, d) = t∈q∩d (1 + log tf t,d ) • The score is 0 if none of the query terms is present in the document. 17
- Page 1 and 2: Introduction to Information Retriev
- Page 3 and 4: Introduction to Information Retriev
- Page 5 and 6: Introduction to Information Retriev
- Page 7 and 8: Introduction to Information Retriev
- Page 9 and 10: Introduction to Information Retriev
- Page 11 and 12: Introduction to Information Retriev
- Page 13 and 14: Introduction to Information Retriev
- Page 15: Introduction to Information Retriev
- Page 19 and 20: Introduction to Information Retriev
- Page 21 and 22: Introduction to Information Retriev
- Page 23 and 24: Introduction to Information Retriev
- Page 25 and 26: Introduction to Information Retriev
- Page 27 and 28: Introduction to Information Retriev
- Page 29 and 30: Introduction to Information Retriev
- Page 31 and 32: Introduction to Information Retriev
- Page 33 and 34: Introduction to Information Retriev
- Page 35 and 36: Introduction to Information Retriev
- Page 37 and 38: Introduction to Information Retriev
- Page 39 and 40: Introduction to Information Retriev
- Page 41 and 42: Introduction to Information Retriev
- Page 43 and 44: Introduction to Information Retriev
- Page 45 and 46: Introduction to Information Retriev
- Page 47 and 48: Introduction to Information Retriev
- Page 49 and 50: Introduction to Information Retriev
- Page 51 and 52: Introduction to Information Retriev
- Page 53: Introduction to Information Retriev
<strong>Introduction</strong> <strong>to</strong> <strong>Information</strong> <strong>Retrieval</strong><br />
Term frequency tf<br />
• The term frequency tf t,d of term t in document d is defined<br />
as the number of times that t occurs in d.<br />
• We want <strong>to</strong> use tf when computing query-document match<br />
scores.<br />
• But how?<br />
• Raw term frequency is not what we want because:<br />
• A document with tf = 10 occurrences of the term is more<br />
relevant than a document with tf = 1 occurrence of the<br />
term.<br />
• But not 10 times more relevant.<br />
• Relevance does not increase proportionally with term<br />
frequency.<br />
16