28.06.2013

Papers in PDF format

2.1 Extracting and Organizing WWW Semantic Content

The system’s principal visualizations are network displays based on documents’ keyword lists. The lists can be provided by automatic content extraction tools, such as Harvest [Bowman et al., 1994], or derived from documents retrieved by the system. The keyword list of each document is used to determine associations among documents and among terms: co-occurrence metrics yield similarity measures between documents and between keywords. Visualizations of both the document space and the term space of a WWW document set are available to the user.
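The paper does not name the co-occurrence metric. The following Python sketch assumes one common choice: cosine similarity over binary keyword vectors, i.e. the number of shared items divided by the geometric mean of the two set sizes. Documents are compared by the keywords they share, and terms by the documents in which they co-occur. The keyword lists in `docs` and all function names are hypothetical, for illustration only.

```python
from itertools import combinations
from math import sqrt

# Hypothetical keyword lists, standing in for the output of an extraction
# tool such as Harvest or for keywords derived from retrieved documents.
docs = {
    "d1": {"visualization", "network", "browsing"},
    "d2": {"network", "similarity", "keywords"},
    "d3": {"browsing", "visualization", "query"},
}


def cosine_overlap(a: set, b: set) -> float:
    """Cosine similarity of two binary vectors given as sets: |a & b| / sqrt(|a| * |b|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def document_similarities(docs: dict) -> dict:
    """Similarity for each pair of documents, based on the keywords they share."""
    return {
        (x, y): cosine_overlap(docs[x], docs[y])
        for x, y in combinations(sorted(docs), 2)
    }


def term_similarities(docs: dict) -> dict:
    """Similarity for each pair of terms, based on the documents they co-occur in."""
    postings: dict = {}  # term -> set of documents containing it
    for doc_id, keywords in docs.items():
        for term in keywords:
            postings.setdefault(term, set()).add(doc_id)
    return {
        (s, t): cosine_overlap(postings[s], postings[t])
        for s, t in combinations(sorted(postings), 2)
    }


if __name__ == "__main__":
    print(document_similarities(docs))
    print(term_similarities(docs)[("browsing", "visualization")])
```

With these lists, d1 and d3 share two of their three keywords, giving a similarity of 2/3, while d2 and d3 share none. One function serves both spaces because a term's set of documents plays the same role in term space that a document's set of keywords plays in document space.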

Figure 1 shows the content-based network of WWW documents in the main window of the screen display. In this view, documents are labeled by their content; alternatively, the display can show each document’s HTML title. Below the main window, overview diagrams show where the detailed view lies within the complete network. To the left and right of the overview is a series of visual bookmarks, set while browsing, that can be used to return to a previous viewpoint; the leftmost bookmark shows the complete network. Other navigation and orientation tools, such as anchors and signposts, let the user mark points traversed in the network and revisit them later. Whenever a navigation aid changes the viewpoint, the display zooms to the new viewpoint, maintaining fluid motion and reducing disorientation. In the upper right of the screen is a natural language query that has been transformed into a user-manipulable query graph. The query can either supply an entry point into the document network or drive a conventional weighted vector search that returns a list of documents from which to begin browsing.
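The paper describes the query's second use only as a conventional weighted vector search. A minimal Python sketch follows, assuming that the query graph yields a weight per query term (the hypothetical `query_weights` mapping), that document terms are weighted by a smoothed inverse document frequency, and that documents are ranked by cosine similarity. The corpus and all names are illustrative assumptions, not the system's actual weighting scheme.

```python
from math import log, sqrt

# Hypothetical corpus of keyword lists (illustration only).
docs = {
    "d1": {"visualization", "network", "browsing"},
    "d2": {"network", "similarity", "keywords"},
    "d3": {"browsing", "visualization", "query"},
}


def idf_weights(docs: dict) -> dict:
    """Smoothed idf: log(N / df) + 1, so a term found in every document keeps weight 1."""
    n = len(docs)
    df: dict = {}
    for keywords in docs.values():
        for term in keywords:
            df[term] = df.get(term, 0) + 1
    return {term: log(n / count) + 1.0 for term, count in df.items()}


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two sparse vectors stored as term -> weight dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def weighted_vector_search(query_weights: dict, docs: dict, k: int = 10) -> list:
    """Return up to k document ids with nonzero score, best match first."""
    idf = idf_weights(docs)
    scored = []
    for doc_id, keywords in docs.items():
        doc_vec = {t: idf[t] for t in keywords}
        score = cosine(query_weights, doc_vec)
        if score > 0:
            scored.append((score, doc_id))
    # Sort by descending score; break ties by document id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


if __name__ == "__main__":
    print(weighted_vector_search({"visualization": 2.0, "query": 1.0}, docs))
```

With this query, d3 ranks first because it matches both query terms, d1 follows with a single match, and d2, which shares no term with the query, is left out of the list from which browsing would begin.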

Figure 1: Document Explorer screen. The content-based document network is shown in the main window with overview diagrams below it. Windows to the left and right of the overview show visual bookmarks the user has set. At the upper right is the user’s natural language query and its visual representation.

The representations underlying the system’s network displays are minimum-cost networks derived from measures of term and document associations. The network of documents is based on interdocument similarity,
