03.01.2015 Views

Combining Information from Multiple Internet Sources


is multiplied by 0.01. 3

If the algorithms could not be started, the application creates a combined result set from all result sets, without URL repetitions, and displays this large set together with an option to provide feedback. The URL considered to be the best can be marked as feedback; it is then sent to the application, which uses it as an anchor to start the weight calculation process.
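The fallback step described above (one combined result set without URL repetitions) can be sketched as follows. This is a minimal illustration, not the application's code: the function name `combine_result_sets` is hypothetical, and keeping the first occurrence of each URL in encounter order is an assumption, since the text does not say how the combined set is ordered.

```python
from typing import Iterable, List


def combine_result_sets(result_sets: Iterable[List[str]]) -> List[str]:
    """Merge several URL lists into one list without repeated URLs.

    Assumption: the first occurrence of a URL wins and encounter order
    is preserved; the source does not specify an ordering.
    """
    seen = set()
    combined = []
    for result_set in result_sets:
        for url in result_set:
            if url not in seen:
                seen.add(url)
                combined.append(url)
    return combined
```

The URL marked as feedback would then be picked from this combined list and passed on to the weight calculation.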

Input: Result from feedback; initial result sets
Output: Map of weights with corresponding agents
BEGIN
1. find the agent whose result set contains the result from feedback (the “winner” agent),
   set its weight to 1
2. for all other agents:
   find $d(r^{(i)}, r^{w})$
   $W[i] = \frac{|r^{(i)}| - d(r^{(i)}, r^{w})}{|r^{(i)}|}$
   where $d(r^{(i)}, r^{w})$ is the number of different URLs between the result
   set of agent i and that of the “winner” agent, and $|r^{(i)}|$ is the number of
   URLs in the result set of agent i (note that the weight is equal to zero
   when the two result sets are disjoint)
3. return weights
END

Listing 3.4.2 Weights calculation for Game theory and Auction methods<br />
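The listing can be sketched in Python under a few stated assumptions. The weight formula is read here as W[i] = (|r_i| - d) / |r_i|, where |r_i| is the number of URLs returned by agent i. The distance d is simplified to the count of URLs of agent i that the winner does not have, which is an assumption and may differ from the distance the application actually uses. If several agents returned the feedback URL, the first one found is taken as the winner, which is also an assumption. The names `count_different_urls` and `calculate_weights` are hypothetical.

```python
from typing import Dict, List


def count_different_urls(result_set: List[str], winner_set: List[str]) -> int:
    """Assumed distance: URLs of result_set that are absent from winner_set."""
    winner_urls = set(winner_set)
    return sum(1 for url in set(result_set) if url not in winner_urls)


def calculate_weights(feedback_url: str,
                      result_sets: Dict[str, List[str]]) -> Dict[str, float]:
    """Return a map agent -> weight, anchored on the agent that found feedback_url."""
    # Step 1: find the "winner" agent and give it weight 1.
    winner = next(
        (agent for agent, urls in result_sets.items() if feedback_url in urls),
        None,
    )
    if winner is None:
        raise ValueError("feedback URL not found in any result set")
    weights = {winner: 1.0}

    # Step 2: weight every other agent by its overlap with the winner.
    for agent, urls in result_sets.items():
        if agent == winner:
            continue
        size = len(set(urls))
        if size == 0:
            weights[agent] = 0.0  # assumption: an empty result set gets weight 0
            continue
        d = count_different_urls(urls, result_sets[winner])
        weights[agent] = (size - d) / size

    # Step 3: return the weights.
    return weights
```

With this simplified distance, d never exceeds |r_i|, so every weight stays between 0 and 1, and an agent that shares no URL with the winner gets weight 0.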

3.4.3 Adapted Levenshtein distance<br />

The next listing presents the adapted algorithm for finding the Levenshtein distance. An adaptation of this algorithm was used in the application to calculate distances between result sets. The algorithm is simple, yet it is fast and produces results that are easy to interpret.

In its original version it is an edit distance, a measure of distance between strings. It counts how many basic operations are needed to transform one string into another. “Basic operations” are the following:

• deletion of a character from the string
• insertion of a character into the string
• substitution of a character with another character
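The three basic operations listed above map directly onto the standard dynamic-programming formulation of the Levenshtein distance (the Wagner-Fischer scheme). The sketch below shows that textbook version, not the adapted listing of this section; the function name `levenshtein` is chosen for illustration only. Elements are compared with `==`, so the function accepts any two sequences.

```python
from typing import Sequence


def levenshtein(source: Sequence, target: Sequence) -> int:
    """Minimum number of deletions, insertions and substitutions
    needed to turn `source` into `target`."""
    # previous[j] holds the distance between the processed prefix of
    # `source` and the first j elements of `target`.
    previous = list(range(len(target) + 1))
    for i, s_item in enumerate(source, start=1):
        current = [i]  # turning i elements into an empty sequence: i deletions
        for j, t_item in enumerate(target, start=1):
            cost = 0 if s_item == t_item else 1
            current.append(min(
                previous[j] + 1,         # deletion of s_item
                current[j - 1] + 1,      # insertion of t_item
                previous[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        previous = current
    return previous[-1]
```

For example, `levenshtein("kitten", "sitting")` evaluates to 3: two substitutions and one insertion. Only two rows of the table are kept in memory, so space grows with the length of `target` rather than with the product of both lengths.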

This distance was applied to measure the distance between result sets. The adaptation was as follows: strings became result sets and characters became URLs. Having this translation, one

3 As in the previous case, this functionality was disabled for the tests presented in chapter 4. The weight of every search engine was equal to 1, so URL ranks were not altered.
