
Introduction to Information Retrieval



Length normalization<br />

• How do we compute the cosine?<br />

• A vector can be length-normalized by dividing each of its components by its length. Here we use the $L_2$ norm: $\|\vec{x}\|_2 = \sqrt{\sum_i x_i^2}$<br />

• This maps vectors onto the unit sphere.<br />

• As a result, longer documents and shorter documents have<br />

weights of the same order of magnitude.<br />

• Effect on the two documents d and d′ from the earlier slide (d′ is d appended to itself): they have identical vectors after length normalization.<br />
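Why d and d′ end up with identical vectors can be checked in one line. This assumes, as a simplification, that the components are raw term counts, so appending d to itself doubles every component; with sublinear weighting such as $1 + \log \mathrm{tf}$ the components would not simply double.

$$\vec{d}' = 2\vec{d} \;\Longrightarrow\; \|\vec{d}'\|_2 = \sqrt{\sum_i (2d_i)^2} = 2\|\vec{d}\|_2 \;\Longrightarrow\; \frac{\vec{d}'}{\|\vec{d}'\|_2} = \frac{2\vec{d}}{2\|\vec{d}\|_2} = \frac{\vec{d}}{\|\vec{d}\|_2}$$

Any positive scale factor $c$ cancels in the same way, so scaling a vector changes its length but not the point it maps to on the unit sphere.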
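The bullets above can be illustrated with a small Python sketch. It is not code from the slides: the function names `l2_normalize` and `cosine`, the zero-vector handling, and the toy count vectors are hypothetical choices, and weights are assumed to be raw term counts. It relies on the standard vector space model fact that the cosine of two vectors equals the dot product of their length-normalized versions.

```python
import math

def l2_normalize(vec):
    """Divide each component by the vector's L2 norm (Euclidean length)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        # Assumption: an all-zero vector has no direction, so return it unchanged.
        return list(vec)
    return [x / norm for x in vec]

def cosine(q, d):
    """Cosine similarity as the dot product of the two length-normalized vectors."""
    qn, dn = l2_normalize(q), l2_normalize(d)
    return sum(a * b for a, b in zip(qn, dn))

# Hypothetical raw term counts for d over a 3-term vocabulary;
# d_prime is d appended to itself, so every count doubles.
d = [3, 0, 4]
d_prime = [2 * x for x in d]

print(l2_normalize(d))                         # [0.6, 0.0, 0.8]
print(l2_normalize(d_prime))                   # [0.6, 0.0, 0.8]
print(math.isclose(cosine(d, d_prime), 1.0))   # True
```

Normalizing inside `cosine` keeps the sketch short; in a real index one would typically normalize each document vector once at indexing time and then score with plain dot products.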
