28.06.2013 Views

Papers in PDF format


Our approach is based on automatic distribution and replication of meta-information (abstracts of resources). By “abstract” we mean a unit of meta-information: a small (within 1-2K) description of a resource. An abstract has to contain at least two components: a short description of the resource, used to judge its relevance to the query being processed, and a pointer to the resource (such as a URL or URN), used to locate it at the retrieval stage. WHOIS++ templates and Harvest SOIF templates are examples of abstracts. Other information, such as the abstract’s expiry date, its source and its popularity rating, may be stored with the abstract.
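
As an illustration only, the components listed above could be held in a record like the following. This is a minimal sketch, not the format used by What’sHot, WHOIS++ or SOIF: the class name `Abstract`, the field names and the 2 KB cap (one reading of “within 1-2K”) are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumption: "within 1-2K" is read here as a 2 KB cap on the description.
MAX_DESCRIPTION_BYTES = 2048


@dataclass
class Abstract:
    """One unit of meta-information about a resource (hypothetical layout)."""
    description: str                 # required: short text used to judge relevance
    pointer: str                     # required: URL or URN used to locate the resource
    expires: Optional[date] = None   # optional: expiry date of the abstract
    source: Optional[str] = None     # optional: where the abstract came from
    popularity: int = 0              # optional: popularity rating

    def __post_init__(self) -> None:
        # The two mandatory components must both be present.
        if not self.description or not self.pointer:
            raise ValueError("an abstract needs both a description and a pointer")
        if len(self.description.encode("utf-8")) > MAX_DESCRIPTION_BYTES:
            raise ValueError("description exceeds the assumed 2 KB limit")

    def is_expired(self, today: date) -> bool:
        # An abstract without an expiry date never expires.
        return self.expires is not None and today > self.expires
```

Keeping the description and pointer mandatory and everything else optional mirrors the “at least two components” wording; a broker could store such records without ever fetching the resource itself.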

Using abstracts instead of full-text documents makes it technologically feasible to pass, replicate, index and search the resource discovery meta-information, because it decreases index size and network traffic. It also provides a uniform approach to other types of resources, such as pictures and software packages.

To achieve the aims of distributed indexing and query processing, we could deploy a network of index brokers and allow abstracts to travel from index to index, staying longer where and while they are popular. The more popular an abstract is, the faster and further it would travel and the more replicas of it would exist.
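
The text does not specify how popularity drives travel and retention, so the following is only one possible reading, sketched under stated assumptions. The `Broker` class, the decay factor and both thresholds are hypothetical: each broker keeps a local score per abstract, forwards abstracts whose score is high enough to its neighbours, and drops abstracts whose score has decayed away.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Broker:
    """Hypothetical index broker holding a local popularity score per abstract."""
    name: str
    neighbours: List["Broker"] = field(default_factory=list)
    popularity: Dict[str, float] = field(default_factory=dict)  # pointer -> score

    def record_hit(self, pointer: str, weight: float = 1.0) -> None:
        # A retrieval or vote raises the abstract's local score.
        self.popularity[pointer] = self.popularity.get(pointer, 0.0) + weight

    def tick(self, decay: float = 0.5, forward_at: float = 2.0,
             evict_below: float = 0.25) -> None:
        """One maintenance round; all three parameters are assumed values."""
        for pointer, score in list(self.popularity.items()):
            # Popular abstracts travel: seed a replica at every neighbour.
            if score >= forward_at:
                for neighbour in self.neighbours:
                    neighbour.popularity.setdefault(pointer, 1.0)
            # Scores decay, so abstracts stay only while they remain popular.
            score *= decay
            if score < evict_below:
                del self.popularity[pointer]
            else:
                self.popularity[pointer] = score
```

Under this sketch an abstract that keeps receiving hits is forwarded on every round and so spreads further, while one that stops receiving hits disappears after a few rounds.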

Observations and Assumptions

A number of observations led to this approach and suggest that it can achieve its aims:

1. When one is looking for information, it is very likely that others have already looked for the same information. This is especially true in a specialised environment, where people work in the same area and require similar information. This observation is supported by the success of resource caching [O’Callaghan 95] and mirroring. Also, analysis of search engine log files has shown that a small number of “hot” topics account for a large part of the total number of queries.

2. In many cases, users would be satisfied with any information on the search subject, because they can use this information to find more links or references.

3. A large amount of the material published on the Internet has very low value, and there is a strong need for expert selection rather than for searches that return anything matching the query terms.

For these reasons the News, for example, is successfully used as a resource discovery system, although it was not designed for that purpose. In our experience, it is a good idea to try the news FAQs first when starting to look for material in a new area.

In What’sHot, we assume that we can obtain abstracts of resources from publishers, derive them automatically from the resources, or get them from existing systems such as Harvest. We also assume that, having retrieved and reviewed a document, people are willing to give “yes/no” feedback (a vote) on the document’s value and its relevance to the query. This is not mandatory, however. We can judge the value of a resource from the number of users who retrieved it, but an explicit vote is a better criterion.
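
One way to combine the two signals, sketched under assumptions: the text says only that an explicit vote is a better criterion than a retrieval count, so the weights below, the `Feedback` class and the way a “no” vote is subtracted are illustrative choices, not the scoring used by What’sHot.

```python
from dataclasses import dataclass

VOTE_WEIGHT = 3.0       # assumption: an explicit vote counts more than a retrieval
RETRIEVAL_WEIGHT = 1.0  # assumption: a bare retrieval is weaker evidence


@dataclass
class Feedback:
    """Hypothetical per-abstract feedback counters kept by a broker."""
    retrievals: int = 0
    yes_votes: int = 0
    no_votes: int = 0

    def value(self) -> float:
        # Retrievals give a baseline; explicit votes move the score more strongly.
        return (RETRIEVAL_WEIGHT * self.retrievals
                + VOTE_WEIGHT * (self.yes_votes - self.no_votes))
```

With these weights, a resource retrieved once and approved twice outranks one retrieved five times with no votes, which reflects the preference for explicit votes; because voting is not mandatory, a resource with no votes still receives a score from its retrievals.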

What What’sHot Looks Like For Users<br />

There is no essential change to the current search process. Instead of contacting InfoSeek or Lycos, people contact a What’sHot broker (a local one, in most cases) and receive a search form. After submitting a query, they receive a result form similar to the one a conventional search engine would return. The What’sHot result form, however, has a check box for each abstract and a “Submit” button. Users check the box only if they liked the resource, then press “Submit” to send the feedback to the broker.

A user may send their user profile (a description of interests) to a broker and, possibly for a small fee, receive e-mail messages from time to time about newly available good resources that may be of interest.

We realise that there are users (possibly many) who do not bother checking boxes and replying. The system just ignores them. This means that the system does not cache information that is interesting to these people. In
