03.01.2015 Views

Combining Information from Multiple Internet Sources


http://ieeexplore.ieee.org/iel5/4106395/4106396/04106417.pdfisnumber=4106396&prod=CNF&arnumber=4106417&arSt=96&ared=101&arAuthor=Muhammad Nawaz

Table 4.2.5 Results of Consensus method and search engines for more complex query

10 http://www.madison.k12.ct.us/publications/shareddesic.htm

Consensus       Ask.com   Live   Interia   Yahoo!   Google
Set Coverage    50%       30%    70%       40%      60%
URL to URL      0%        20%    0%        0%       0%

Table 4.2.6 Coverage of Consensus method and search engines for more complex query
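
The two rows of the table can be illustrated with a small sketch. This excerpt does not restate the definitions, so the code assumes that set coverage is the share of consensus URLs found anywhere in an engine's result list, and that URL to URL coverage is the share found at exactly the same position. The function names and the sample URLs are hypothetical.

```python
def set_coverage(consensus, engine_results):
    """Share of consensus URLs that occur anywhere in the engine's list (assumed definition)."""
    if not consensus:
        return 0.0
    engine_set = set(engine_results)
    return sum(url in engine_set for url in consensus) / len(consensus)


def url_to_url_coverage(consensus, engine_results):
    """Share of consensus URLs found at the same rank in the engine's list (assumed definition)."""
    if not consensus:
        return 0.0
    matches = sum(c == e for c, e in zip(consensus, engine_results))
    return matches / len(consensus)


# Made-up placeholder URLs, not data from the tables above.
consensus = ["a", "b", "c", "d"]
engine = ["b", "a", "c", "x"]

print(set_coverage(consensus, engine))         # 0.75: a, b and c occur somewhere
print(url_to_url_coverage(consensus, engine))  # 0.25: only c sits at the same rank
```

Under these assumed definitions an engine can share most of its URLs with the consensus answer and still score close to 0% on URL to URL coverage, which matches the pattern in the table.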

It can be observed that the answer of the Consensus method covers every engine to some extent.
The most covered engine is Interia (70% set coverage), while Live is the least set-covered
one (30%). URL to URL coverage is very low, which is why the final result was considered
inconsistent. As stated before, Levenshtein distance is highly dependent on URL positions, which
leads to large distances between each engine's result set and the consensus answer.

The Consensus method is highly rank-based. As in the previous example, the URLs that make up
the final result set are, in general, highly ranked ones. If a URL appeared in the top
positions throughout the engines' result sets, it is contained in the final answer of the Consensus
method. If its ranking was low, it is not contained, because its average rank is too poor.
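
The rank-based behaviour described above can be sketched as an average-rank aggregation. This is only an illustration, not the thesis's actual procedure: it assumes that each engine returns a list without duplicates, that rank 1 is the best, and that a URL missing from an engine's list receives a penalty rank one past the longest list. The function name and the `missing_rank` parameter are hypothetical.

```python
from collections import defaultdict


def average_rank_consensus(result_lists, k, missing_rank=None):
    """Return the k URLs with the best (numerically lowest) average rank."""
    if missing_rank is None:
        # Assumption: an absent URL is ranked one past the longest result list.
        missing_rank = max(len(results) for results in result_lists) + 1

    ranks = defaultdict(list)
    for results in result_lists:
        for position, url in enumerate(results, start=1):
            ranks[url].append(position)

    n = len(result_lists)

    def average(url):
        seen = ranks[url]
        return (sum(seen) + missing_rank * (n - len(seen))) / n

    # Ties are broken alphabetically so the output is deterministic.
    return sorted(ranks, key=lambda url: (average(url), url))[:k]


# Made-up placeholder URLs.
engines = [["a", "b", "c"], ["b", "a", "d"], ["a", "d", "b"]]
print(average_rank_consensus(engines, k=3))  # ['a', 'b', 'd']
```

Here "c" appears in only one list, at rank 3, so its average rank (11/3) is worse than that of "d" (3.0) and it drops out of the top three.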

The problem with the consistency of the answer is the same as in the previous case. Low URL to
URL coverage causes the Levenshtein distances to grow, and with them the average of the
distances. Low URL to URL coverage also means that the result sets were highly dispersed when
measured with Levenshtein distance. If the URL to URL coverage were about 60-80%
for each search engine, the answer would probably be marked as consistent.
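
The position sensitivity can be seen in a short sketch that applies Levenshtein distance to ranked URL lists, treating each URL as a single symbol. The consistency rule shown (mean distance at or below a threshold) and the threshold value are assumptions made for illustration. This excerpt does not give the exact criterion.

```python
def levenshtein(a, b):
    """Edit distance between two sequences, with each URL counted as one symbol."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute, or free if equal
            ))
        previous = current
    return previous[-1]


def is_consistent(consensus, result_lists, threshold):
    """Hypothetical rule: consistent if the mean distance stays within the threshold."""
    distances = [levenshtein(consensus, results) for results in result_lists]
    return sum(distances) / len(distances) <= threshold


# Made-up placeholder URLs.
consensus = ["a", "b", "c", "d"]
swapped = ["b", "a", "c", "d"]    # same URLs, top two exchanged
reordered = ["d", "c", "b", "a"]  # same URLs, reversed order

print(levenshtein(consensus, swapped))    # 2
print(levenshtein(consensus, reordered))  # 4
print(is_consistent(consensus, [consensus, swapped], threshold=1))  # True
print(is_consistent(consensus, [reordered, swapped], threshold=1))  # False
```

The reversed list contains exactly the same URLs as the consensus answer, yet its distance equals the full list length, which is why low URL to URL coverage pushes the average distance up even when the URL sets overlap.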
