Introduction to Information Retrieval
Introduction to Information Retrieval<br />
Length normalization<br />
• How do we compute the cosine?<br />
• A vector can be (length-) normalized by dividing each of its<br />
components by its length. Here we use the L2 norm: \( \lVert \vec{x} \rVert_2 = \sqrt{\sum_i x_i^2} \)<br />
• This maps vectors onto the unit sphere<br />
• As a result, longer documents and shorter documents have<br />
weights of the same order of magnitude.<br />
• Effect on the two documents d and d′ (d appended to itself)<br />
from earlier slide: they have identical vectors after length normalization.<br />
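The points above can be checked with a small sketch. It assumes raw term-count vectors, so that appending d to itself doubles every component. The vector values and the helper names `l2_normalize` and `cosine` are hypothetical and chosen only for illustration.

```python
import math

def l2_normalize(vec):
    """Divide each component by the vector's L2 norm (its Euclidean length)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)  # a zero vector has no direction; return it unchanged
    return [x / norm for x in vec]

def cosine(u, v):
    """Cosine similarity as the dot product of the length-normalized vectors."""
    return sum(a * b for a, b in zip(l2_normalize(u), l2_normalize(v)))

# Hypothetical raw term counts for d; d_prime is d appended to itself,
# which (under the raw-count assumption) doubles every component.
d = [3, 4, 0]
d_prime = [2 * x for x in d]

d_unit = l2_normalize(d)
d_prime_unit = l2_normalize(d_prime)

print(d_unit, d_prime_unit)
print(cosine(d, d_prime))
```

Both documents map to the same point on the unit sphere, so their cosine similarity is 1 (up to floating-point rounding) even though d′ is twice as long as d.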