28.06.2013 Views

Papers in PDF format

[Figure 1 labels: Providers (information servers: HTTP, Gopher, FTP); Gatherers (metadata extraction, SOIF); Gatherer Dissemination Service (information classification and dissemination); Brokers (specialized information brokers); Broker Information Service (abstraction of the Broker information space); Client (interactive interface to the information space)]

Figure 1: MITRE Information Discovery System (MIDS) Architecture

To augment the fixed taxonomy classification scheme, several clustering schemes (e.g., [Jain et al. 88]) are under investigation to provide a topical decomposition of specific subject categories. After information is filtered into the fixed taxonomy, it is processed by a clustering algorithm that groups the documents in each topical area into related piles. This provides a finer granularity of topics than the fixed taxonomy classification scheme supplies, which is especially useful when the filtering process assigns a large number of documents to a single topical area.
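
The grouping step can be illustrated with a minimal sketch. The text does not say which clustering algorithm is used, so the single-pass "leader" clustering below, the Jaccard similarity measure, the threshold value, and all function names are assumptions chosen only to show how documents in one topical area might be split into related piles.

```python
def tokenize(text):
    """Lowercase a text and return its set of alphabetic words."""
    return {w for w in text.lower().split() if w.isalpha()}


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_category(docs, threshold=0.3):
    """Single-pass (leader) clustering of the documents in one category.

    docs: dict mapping a document id to its text.
    Returns a list of clusters, each a list of document ids.
    The threshold is an illustrative value, not one taken from the paper.
    """
    clusters = []  # each entry: (leader_tokens, [document ids])
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        for leader_tokens, members in clusters:
            if jaccard(tokens, leader_tokens) >= threshold:
                members.append(doc_id)
                break
        else:
            # No existing pile is similar enough: start a new one.
            clusters.append((tokens, [doc_id]))
    return [members for _, members in clusters]


# Hypothetical documents already filtered into a "networking" category.
networking_docs = {
    "d1": "tcp congestion control window",
    "d2": "tcp congestion avoidance window",
    "d3": "routing protocol ospf area",
    "d4": "ospf routing area design",
}
piles = cluster_category(networking_docs)
```

With this toy input the category splits into two piles, one about TCP congestion and one about OSPF routing. A production system would more likely use weighted term vectors and a clustering method from the literature cited above.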

Information Organization

In addition to collecting and classifying information, the GDS distributes its processed information to a set of Harvest Brokers and to the BIS. Brokers register themselves with the GDS and have profiles, stored in a knowledge base, that specify the topical categories of the documents they are to receive. Each time the GDS classifies new information, Broker-specific files containing SOIF records are written to an output queue on secondary storage. Brokers periodically contact the GDS to receive the new information and index it. In addition, information pertaining to objects and their classification categories is routed to the BIS.
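
The profile-driven distribution described above can be sketched as follows. This is an illustration under stated assumptions: the broker names, the profile layout, and the record fields are hypothetical, records are plain dictionaries rather than real SOIF records, and the queues are in-memory lists rather than files on secondary storage.

```python
from collections import defaultdict

# Hypothetical Broker profiles: Broker name -> topical categories it wants.
BROKER_PROFILES = {
    "networking-broker": {"networking"},
    "security-broker": {"security", "networking"},
}


def route_to_brokers(records, profiles):
    """Build one output queue per Broker from newly classified records.

    records: iterable of dicts with "url" and "categories" keys.
    Returns a dict mapping Broker name to the records it should receive.
    """
    queues = defaultdict(list)
    for record in records:
        for broker, wanted in profiles.items():
            # A record is queued when it shares at least one category
            # with the Broker's profile.
            if wanted & set(record["categories"]):
                queues[broker].append(record)
    return dict(queues)


new_records = [
    {"url": "http://example.org/a", "categories": ["networking"]},
    {"url": "http://example.org/b", "categories": ["security"]},
    {"url": "http://example.org/c", "categories": ["databases"]},
]
queues = route_to_brokers(new_records, BROKER_PROFILES)
```

Here the record classified under "databases" matches no profile and is queued for no Broker; in the system described above its classification information would still be routed to the BIS.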

Changes to the document collections processed by the GDS (document deletions, additions, and updates) are reflected throughout MIDS using a weak-consistency protocol [Downing et al. 90]. Since each MIDS subsystem is run only periodically (weekly), changes to document collections are not reflected until a new batch of information is processed through the system. After processing a new batch of information, the GDS notifies Brokers and the BIS about any changes that have occurred so that they can update their databases accordingly.
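
A minimal sketch of the batch comparison that could precede such notifications is shown below. It does not implement the cited weak-consistency protocol; it only illustrates how additions, deletions, and updates might be derived from two successive batches. The function name and the use of a per-document checksum are assumptions.

```python
def diff_batches(previous, current):
    """Compare two batches of document metadata.

    previous, current: dicts mapping a URL to a content checksum (or any
    value that changes when the document changes).
    Returns a dict with sorted lists of added, deleted, and updated URLs.
    """
    prev_urls, curr_urls = set(previous), set(current)
    return {
        "added": sorted(curr_urls - prev_urls),
        "deleted": sorted(prev_urls - curr_urls),
        "updated": sorted(
            url for url in prev_urls & curr_urls
            if previous[url] != current[url]
        ),
    }


# Hypothetical weekly batches.
last_week = {"http://example.org/a": "c1", "http://example.org/b": "c2"}
this_week = {"http://example.org/b": "c3", "http://example.org/c": "c4"}
changes = diff_batches(last_week, this_week)
```

The resulting change lists are what a component in the GDS role would pass on to each Broker and to the BIS after the weekly run.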

Information Discovery and Retrieval

Information discovery and retrieval in MIDS occur through the client interface, which interacts with the Harvest Brokers and the BIS. The initial services provided by the system include topical browse, query routing, and search. The BIS provides the topical browse service by managing the fixed taxonomy of topics as well as the topics generated by the clustering techniques. The BIS also manages the document summary information generated by the classification process in the GDS, and it provides the query routing capability by determining which Harvest Brokers to search as the user browses the topical information space. Query routing is accomplished by means of a table that lists the association between Brokers and the topical categories they manage.
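
The association table lends itself to a very small lookup sketch. The table contents, Broker names, and function name below are hypothetical; the text specifies only that the table associates Brokers with the topical categories they manage.

```python
# Hypothetical association table: Broker -> topical categories it manages.
BROKER_TOPICS = {
    "broker-1": {"networking", "security"},
    "broker-2": {"databases"},
    "broker-3": {"security", "databases"},
}


def brokers_for_topic(topic, table):
    """Return the sorted names of the Brokers that manage a topic."""
    return sorted(name for name, topics in table.items() if topic in topics)


# The user browses to the "security" topic; route the query accordingly.
targets = brokers_for_topic("security", BROKER_TOPICS)
```

For larger tables, inverting the mapping once (topic to Brokers) would avoid scanning every Broker on each lookup.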

Harvest Brokers provide a full-text search capability, enabling a user to issue a query containing a Boolean expression of keywords and then receive a list of documents.
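
To make the idea of a Boolean keyword query concrete, the sketch below evaluates such a query against a small inverted index. It is not how Harvest Brokers are implemented; the index layout, the nested-tuple query form, and all names are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def evaluate(query, index, all_ids):
    """Evaluate a Boolean query given as a keyword or a nested tuple.

    Tuples take the form ("and", q1, q2), ("or", q1, q2), or ("not", q).
    Returns the set of matching document ids.
    """
    if isinstance(query, str):
        return set(index.get(query.lower(), set()))
    op = query[0]
    if op == "not":
        return all_ids - evaluate(query[1], index, all_ids)
    left = evaluate(query[1], index, all_ids)
    right = evaluate(query[2], index, all_ids)
    if op == "and":
        return left & right
    if op == "or":
        return left | right
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical indexed documents.
docs = {
    "d1": "harvest broker indexing",
    "d2": "gopher server indexing",
    "d3": "harvest gatherer",
}
index = build_index(docs)
# Query: harvest AND NOT gatherer
hits = evaluate(("and", "harvest", ("not", "gatherer")), index, set(docs))
```

A real search front end would also parse the user's query string into this structure and return titles and URLs rather than internal ids.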

All documents retrieved in the system reside at the information provider sites. Only metadata is processed and stored within MIDS. After the user consults the system, a list of documents represented by titles and URLs
