28.06.2013 Views

Papers in PDF format


4.4 Database versus Information Retrieval

The Web is often used only as a means to present information. If it is not used to acquire data, no updates, inserts, or deletes need to be performed through the Web. In this case, information retrieval components may be more suitable than a database. A database supports transactions and is best suited to exact information and to combining attributes.

Information retrieval works with fuzzy, ranked matching rather than exact matches. The user supplies some terms; even a natural-language query such as "I would like to find all papers dealing with images, databases and the WWW" is appropriate. In the first step, the system eliminates useless words (stop words) from the query and expands the relevant terms using a thesaurus. Next, an inverted list is used to find the documents containing the significant words, together with their scores. The score of a word in a document can take into account how often the word appears in that text and in the whole document universe, the length of the paper, and where the word appears; for example, the title, headers, and abstract are more important than the plain text. From the scores of the individual keywords, a combined score is calculated for the best N hits. Information retrieval often leads to much better results than a database query: have you ever tried using KnowledgeFinder as a front end to Medline?
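The retrieval steps described above (stop-word removal, thesaurus expansion, an inverted list, per-word scores and a combined top-N ranking) can be sketched in Python as follows. This is a minimal sketch, not the method of any particular system: the stop-word list, the thesaurus entries, the field weights and the TF-IDF-style formula with square-root length normalization are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop words and thesaurus; a real system would use far larger lists.
STOP_WORDS = {"i", "would", "like", "to", "find", "all", "papers", "dealing",
              "with", "and", "the", "a", "of", "in", "on"}
THESAURUS = {"images": {"pictures"}, "www": {"web"}}
# Assumed field weights: title and abstract count more than plain text.
FIELD_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_list(docs):
    """Return {term: {doc_id: field-weighted frequency}} and {doc_id: token count}."""
    index = defaultdict(dict)
    lengths = {}
    for doc_id, fields in docs.items():
        weighted = Counter()
        for field, text in fields.items():
            for tok in tokenize(text):
                weighted[tok] += FIELD_WEIGHTS.get(field, 1.0)
        lengths[doc_id] = sum(len(tokenize(t)) for t in fields.values())
        for term, freq in weighted.items():
            index[term][doc_id] = freq
    return index, lengths


def process_query(query):
    """Drop stop words, then add thesaurus synonyms of the remaining terms."""
    terms = {t for t in tokenize(query) if t not in STOP_WORDS}
    expanded = set(terms)
    for t in terms:
        expanded |= THESAURUS.get(t, set())
    return expanded


def search(query, docs, n=10):
    """Combine per-term scores into one score per document; return the best n."""
    index, lengths = build_inverted_list(docs)
    scores = Counter()
    for term in process_query(query):
        postings = index.get(term)
        if not postings:
            continue
        # Terms that are rare in the document universe get a higher weight.
        idf = math.log(1 + len(docs) / len(postings))
        for doc_id, freq in postings.items():
            # Dividing by sqrt(length) keeps long papers from dominating.
            scores[doc_id] += freq / math.sqrt(lengths[doc_id]) * idf
    return scores.most_common(n)


SAMPLE_DOCS = {
    "d1": {"title": "Images in databases", "abstract": "Storing pictures", "body": ""},
    "d2": {"title": "Transaction processing", "abstract": "Locking", "body": "databases and the web"},
    "d3": {"title": "Cooking", "abstract": "Recipes", "body": "pasta"},
}

if __name__ == "__main__":
    print(search("I would like to find all papers dealing with images, databases and the WWW",
                 SAMPLE_DOCS, n=2))
```

With these sample documents, d1 ranks above d2: it matches "images", "databases" and the synonym "pictures" in its heavily weighted title and abstract, while d2 only mentions "databases" and "web" in its body text. d3 matches nothing and is not returned.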

So, use a database if you:

• would like to perform updates, inserts, and deletes reliably
• usually need exact matches, for example on the patient's name and date of birth
• would like to combine different pieces of data
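To make the database side concrete, the sketch below uses Python's built-in sqlite3 module as a stand-in for any transactional database. The patient and image tables and their contents are hypothetical. The sketch touches each point in the list: a failed transaction is rolled back as a whole, the lookup is an exact match on name and date of birth, and a join combines the two tables.

```python
import sqlite3


def run_demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: patients and the images taken of them.
    conn.executescript("""
        CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT NOT NULL, birth_date TEXT NOT NULL);
        CREATE TABLE image (id INTEGER PRIMARY KEY,
                            patient_id INTEGER NOT NULL REFERENCES patient(id),
                            modality TEXT NOT NULL);
    """)

    # Reliable inserts: `with conn` commits on success and rolls back on an exception.
    with conn:
        conn.execute("INSERT INTO patient VALUES (1, 'Doe, Jane', '1960-05-17')")
        conn.execute("INSERT INTO image (patient_id, modality) VALUES (1, 'CT')")
        conn.execute("INSERT INTO image (patient_id, modality) VALUES (1, 'MR')")

    try:
        with conn:
            conn.execute("INSERT INTO patient VALUES (2, 'Roe, Rick', '1971-01-02')")
            conn.execute("INSERT INTO patient VALUES (2, 'Duplicate', '1971-01-02')")  # same key
    except sqlite3.IntegrityError:
        pass  # the whole transaction, including the first insert, was rolled back

    patient_count = conn.execute("SELECT COUNT(*) FROM patient").fetchone()[0]

    # Exact match on name and date of birth, combined with the image table by a join.
    rows = conn.execute("""
        SELECT p.name, i.modality
        FROM patient AS p JOIN image AS i ON i.patient_id = p.id
        WHERE p.name = ? AND p.birth_date = ?
        ORDER BY i.id
    """, ("Doe, Jane", "1960-05-17")).fetchall()
    return rows, patient_count


if __name__ == "__main__":
    print(run_demo())
```

Only one patient remains after the failed transaction, and the query returns exactly the two images of that patient: there is no ranking, only a yes/no match.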

Use information retrieval if you:

• need to search text-based information
• have no clear understanding of the possible terms and values
• would like the search results ranked by relevance

You do not have to choose between a database and an information retrieval system, because modern databases support full-text search: Illustra uses a "Text DataBlade", Oracle the "Oracle ConText Option", Sybase "Topic", and IBM DB2 the "DB Text Extender".
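The interfaces of these commercial extensions are not reproduced here. As a hedged illustration of the underlying idea only, the sketch below keeps an inverted list in an ordinary SQLite table, so that a ranked text query and an exact attribute condition can be combined in a single SQL statement. The table names, the tokenizer and the plain term-frequency score are assumptions, not how any of the products above work internally.

```python
import re
import sqlite3


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def create_schema(conn):
    # Hypothetical schema: ordinary attributes plus an inverted list kept as a table.
    conn.executescript("""
        CREATE TABLE paper (id INTEGER PRIMARY KEY, title TEXT NOT NULL, year INTEGER NOT NULL);
        CREATE TABLE posting (
            term TEXT NOT NULL,
            paper_id INTEGER NOT NULL REFERENCES paper(id),
            tf INTEGER NOT NULL,              -- term frequency in the title
            PRIMARY KEY (term, paper_id)
        );
    """)


def add_paper(conn, paper_id, title, year):
    counts = {}
    for tok in tokenize(title):
        counts[tok] = counts.get(tok, 0) + 1
    # Row and postings are written in one transaction, so they stay consistent.
    with conn:
        conn.execute("INSERT INTO paper VALUES (?, ?, ?)", (paper_id, title, year))
        conn.executemany("INSERT INTO posting VALUES (?, ?, ?)",
                         [(term, paper_id, tf) for term, tf in counts.items()])


def text_search(conn, terms, min_year):
    """Rank papers by summed term frequency, restricted by an exact attribute test."""
    placeholders = ", ".join("?" for _ in terms)
    sql = f"""
        SELECT p.id, p.title, SUM(s.tf) AS score
        FROM posting AS s JOIN paper AS p ON p.id = s.paper_id
        WHERE s.term IN ({placeholders}) AND p.year >= ?
        GROUP BY p.id, p.title
        ORDER BY score DESC, p.id
    """
    return conn.execute(sql, [*terms, min_year]).fetchall()


def build_demo():
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    add_paper(conn, 1, "Images and databases on the WWW", 1996)
    add_paper(conn, 2, "Image databases", 1994)
    add_paper(conn, 3, "Databases everywhere", 1997)
    return conn


if __name__ == "__main__":
    print(text_search(build_demo(), ["databases", "www"], min_year=1995))
```

Paper 2 also contains "databases" but is excluded by the exact year condition, while papers 1 and 3 are ranked by how many query terms they contain.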

4.5 Interface Design

There are a few rules for designing a user interface. The three most important are: simple, simple, and simple. Look at Web-based search engines such as Lycos [27] and WebCrawler [28]: they have evolved from an input form with many parameters to a single input box. The entry screen no longer offers alternatives. Good default values are used for the novice user, and the experienced user knows where to change the settings if necessary.
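The same "good defaults" principle can be applied to the programming interface behind such a search box. A minimal sketch, assuming a hypothetical SearchSettings record and make_request function (neither is taken from Lycos or WebCrawler): only the query text is required, and an experienced user overrides individual settings without restating the rest.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class SearchSettings:
    # Hypothetical defaults chosen so that a novice never has to touch them.
    max_hits: int = 10
    match_all_terms: bool = False
    search_fields: tuple = ("title", "abstract", "body")


def make_request(query: str, settings: SearchSettings = SearchSettings()) -> dict:
    """The single input box supplies the query; everything else has a default."""
    return {"query": query.strip(), **asdict(settings)}


if __name__ == "__main__":
    # Novice: only the query text.
    print(make_request("image databases"))
    # Experienced user: override just the setting that matters.
    print(make_request("image databases", replace(SearchSettings(), max_hits=50)))
```

Because the settings object is frozen, the shared default cannot be modified by accident; `replace` creates a new object with only the chosen fields changed.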

4.6 Integration of Distributed Data

With distributed data we face two problems:

1) Data transfer times can be very long.
2) An (at least weakly) consistent view of the data is required.
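One common way to address both problems, not necessarily the one intended here, is to keep local copies of remote data and refresh them after a fixed time-to-live (TTL). Repeated reads avoid the long transfer, and readers see data that is at most roughly `ttl` seconds out of date, which is a weak form of consistency. The TTLCache class, the fetch callback and the TTL value are hypothetical.

```python
import time


class TTLCache:
    """Keeps local copies of remote records and re-fetches them after `ttl` seconds."""

    def __init__(self, fetch, ttl, clock=time.monotonic):
        self._fetch = fetch      # hypothetical callback that performs the slow transfer
        self._ttl = ttl
        self._clock = clock
        self._entries = {}       # key -> (value, time it was fetched)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]      # served locally; may be up to ttl seconds stale
        value = self._fetch(key)
        self._entries[key] = (value, now)
        return value


if __name__ == "__main__":
    remote = {"p1": "version 1"}
    fetched = []

    def fetch(key):
        fetched.append(key)
        return remote[key]

    fake_time = [0.0]
    cache = TTLCache(fetch, ttl=60, clock=lambda: fake_time[0])
    print(cache.get("p1"))   # transferred from the remote site
    remote["p1"] = "version 2"
    print(cache.get("p1"))   # still "version 1": stale, but no transfer
    fake_time[0] = 61
    print(cache.get("p1"))   # TTL expired, so "version 2" is fetched
    print(fetched)           # two transfers for three reads
```

Choosing the TTL trades freshness against transfer time: a shorter TTL gives a more consistent view but causes more transfers.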
