28.06.2013 Views

Papers in PDF format


Information Fusion with ProFusion*

Susan Gauch, Guijun Wang

Department of Electrical Engineering and Computer Science

The University of Kansas, Lawrence, KS 66045, USA

{sgauch, gwang}@eecs.ukans.edu

*http://www.designlab.ukans.edu/ProFusion.html

Abstract: The explosive growth of the World Wide Web, and the resulting information overload, has led to a mini-explosion in World Wide Web search engines. This mini-explosion, in turn, led to the development of ProFusion, a meta search engine. Educators, like other users, do not have the time to evaluate multiple search engines in order to make an informed choice of the best one for their purposes. Nor do they have the time to submit each query to multiple search engines and wade through the resulting flood of good information, duplicated information, irrelevant information, and missing documents. ProFusion sends user queries to multiple underlying search engines in parallel, then retrieves and merges the resulting URLs. It identifies and removes duplicates and creates a single relevance-ranked list. If desired, the actual documents can be pre-fetched to remove further duplicates and broken links. ProFusion's performance has been compared with that of the individual search engines and other meta searchers, demonstrating its ability to retrieve more relevant information and present fewer duplicate pages. Future developments include analyzing the documents for improved ranking, automatically submitting queries to the most appropriate search engines, and modifying ProFusion to be an information filtering and dissemination system.

1. Introduction

The World Wide Web contains a huge number of documents, making it very difficult to locate information relevant to a user's interests. Search tools such as InfoSeek [InfoSeek] and Lycos [Lycos] index huge collections of Web documents, allowing users to search the World Wide Web via keyword-based queries. Given a query, each such tool searches its own index and presents the user with a list of potentially relevant items, generally in ranked order. However large its index, each search tool covers only a subset of all documents available on the WWW. As more and more search tools become available, each covering a different (overlapping) subset of Web documents, it becomes increasingly difficult to choose the right one for a specific information need. ProFusion has been developed to help deal with this problem.
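The abstract summarizes ProFusion's pipeline: send the query to several engines in parallel, merge the returned URLs, remove duplicates, and produce one relevance-ranked list. The Python sketch below illustrates those steps under stated assumptions. The engines `engine_a` and `engine_b` are hypothetical stubs returning canned URL lists, duplicates are detected by simple URL normalization, and the ranking sums reciprocal ranks across engines (a CombSUM-style fusion chosen here for illustration, not necessarily the ranking ProFusion uses).

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit, urlunsplit

# Hypothetical stand-ins for underlying search engines: each takes a query
# and returns a ranked list of URLs (best first). Real engines would be
# queried over the network; these stubs return canned results.
def engine_a(query):
    return ["http://Example.com/a/", "http://example.com/b", "http://example.com/c#top"]

def engine_b(query):
    return ["http://example.com/b", "http://example.com/a", "http://example.com/d"]

ENGINES = {"A": engine_a, "B": engine_b}

def normalize_url(url):
    """Canonicalize a URL so trivially different spellings compare equal."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    # Lowercase scheme and host, drop the fragment, keep path and query.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))

def meta_search(query, engines=ENGINES):
    # 1. Send the query to every engine in parallel.
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        results = {name: f.result() for name, f in futures.items()}

    # 2. Merge the lists and remove duplicates, accumulating one score per
    #    unique URL. Assumed scoring: each engine contributes 1/rank.
    scores = {}
    for urls in results.values():
        seen = set()
        for rank, url in enumerate(urls, start=1):
            key = normalize_url(url)
            if key in seen:  # skip a duplicate within one engine's own list
                continue
            seen.add(key)
            scores[key] = scores.get(key, 0.0) + 1.0 / rank

    # 3. Produce a single ranked list (ties broken by URL for stability).
    return sorted(scores, key=lambda u: (-scores[u], u))

print(meta_search("web search"))
```

With these stubs, the two pages returned by both engines (`/a` and `/b`) are ranked ahead of the pages only one engine found. The optional pre-fetching step, which could catch duplicates hidden behind different URLs as well as broken links, is not shown.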

2. Related Work

There are several different approaches to managing the proliferation of Web search engines. One solution is a large Web page that lists several search engines and allows users to query them one at a time. One example of this approach is the All-in-One Search Page [Cross]. Unfortunately, users must still choose which single search engine to submit each query to.

Another approach is to use intelligent agents to bring back documents that are relevant to a user's interests. Such agents [Balabanovic et al. 1995][Knoblock et al. 1994] provide personal assistance to a user. For example, [Balabanovic et al. 1995] describes an adaptive agent that brings back Web pages of interest to a user on a daily basis. The user gives relevance feedback to the agent by evaluating the Web pages that were brought back, and the agent then adjusts its future searches for relevant Web pages accordingly. However, these agents [Balabanovic et al. 1995][Knoblock et al. 1994] gather information only from their own search index, which may limit the amount of information they have access to.
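The feedback loop described above can be illustrated with Rocchio's relevance-feedback update, a standard information-retrieval technique substituted here for illustration; it is not claimed to be the method used in [Balabanovic et al. 1995]. The user profile and the evaluated pages are represented as hypothetical term-weight dictionaries, and the weights `alpha`, `beta`, and `gamma` are conventional textbook defaults rather than values from any of the cited systems.

```python
from collections import defaultdict

def rocchio_update(profile, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move a term-weight profile toward pages the user liked and away from
    pages the user rejected (Rocchio relevance feedback)."""
    updated = defaultdict(float)
    # Keep a scaled copy of the current profile.
    for term, weight in profile.items():
        updated[term] += alpha * weight
    # Add the centroid of the relevant pages, subtract that of the non-relevant ones.
    # An empty list contributes nothing, so there is no division by zero.
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                updated[term] += coef * weight / len(docs)
    # Drop terms whose weight is zero or negative, a common practical choice.
    return {term: w for term, w in updated.items() if w > 0}

profile = {"search": 1.0}
liked = [{"search": 1.0, "engine": 1.0}]
disliked = [{"casino": 1.0}]
print(rocchio_update(profile, liked, disliked))  # {'search': 1.75, 'engine': 0.75}
```

Each round of feedback shifts the profile toward terms from approved pages, so the next day's search favors similar pages.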

A different approach is the meta search method, which builds on top of other search engines. Queries are submitted to the meta search engine, which in turn sends each query to multiple single search engines. When the underlying search engines return the retrieved items, the meta search engine further processes these items and
