21.11.2013 Views

YEARS OF EUROPEAN ONLINE ANNÉES DE EN LIGNE ...


tual centre for text mining in biomedicine)). It offers the automatic analysis of publications of any kind in biology and medicine. Furthermore, it helps to classify the electronically available, but mostly unstructured, information on patients. Thus, relations that had previously been overlooked can also be taken into account. In the United States of America, the National Library of Medicine has used text mining for 15 years with great success. The British government has supported the UK National Centre for Text Mining with a GBP 1 million grant. The Japanese national parliament recently decided to establish a centre for text mining in biology. The list could go on, and it shows that the technology is considered to be well advanced. Drew Robb (2004) gives an impressive list of projects worldwide that handle tremendous amounts of data.

4. TEXT MINING IN COMPARISON WITH OTHER INFORMATION-RETRIEVAL METHODOLOGIES

Information retrieval did not become a discipline only with the widespread use of computers, particularly personal computers. Some of its methodologies are in fact older than computers. A number of these methodologies are described below, together with an analysis of the advantages that text-mining technologies offer.

Boolean retrieval<br />

Boolean retrieval is widely used, mostly because of its simple syntax. Terms are searched for and may be combined with the operators AND (∧), OR (∨) and NOT (¬). Although the priorities of combinations can be modified, which may result in rather complex requests, it is not too difficult to validate the syntax of a command. However, it is impossible to check the semantics of a request; for instance, it is not possible to detect terms which exclude each other. In many applications based on this technology, the syntax is extended by additional comparison operators such as > (greater than), < (less than), = (equal), ≥ (greater or equal), ≤ (less or equal) and ≠ (not equal).
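As a rough illustration of how the three operators behave, the following Python sketch evaluates Boolean requests over a tiny collection by treating AND, OR and NOT as set intersection, union and complement. The sample documents and the helper names (`docs_with`, `and_`, `or_`, `not_`) are assumptions made for this example and do not come from any particular retrieval system.

```python
# Toy collection: document id -> text (hypothetical data for illustration).
DOCS = {
    1: "text mining in biology",
    2: "boolean retrieval in databases",
    3: "text retrieval and text mining",
}

ALL_IDS = set(DOCS)


def docs_with(term):
    """Return the ids of all documents that contain the term as a whole word."""
    return {doc_id for doc_id, text in DOCS.items() if term in text.split()}


def and_(a, b):
    """AND: documents present in both result sets."""
    return a & b


def or_(a, b):
    """OR: documents present in at least one result set."""
    return a | b


def not_(a):
    """NOT: all documents of the collection that are not in the result set."""
    return ALL_IDS - a


# text AND mining
text_and_mining = and_(docs_with("text"), docs_with("mining"))

# retrieval AND NOT mining
retrieval_not_mining = and_(docs_with("retrieval"), not_(docs_with("mining")))

# biology OR boolean
biology_or_boolean = or_(docs_with("biology"), docs_with("boolean"))

# Syntactically valid, yet it can never match anything: text AND NOT text.
# The evaluator accepts the request and simply returns an empty set.
contradiction = and_(docs_with("text"), not_(docs_with("text")))

print(text_and_mining, retrieval_not_mining, biology_or_boolean, contradiction)
```

The last request shows the limitation described above in its simplest form: nothing in this kind of evaluation warns the user that the combined terms exclude each other; the result is merely empty.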

Terms may be related to fields of structured data in a database or refer to words or expressions within unstructured data. The efficiency of Boolean retrieval is often improved by collecting keywords in so-called inverted lists. Each expression is accompanied by references to the documents from which it was extracted. The problem is, however, that these lists take up a lot of storage
