21.11.2013 Views

YEARS OF EUROPEAN ONLINE ANNÉES DE EN LIGNE ...



TEXT MINING

1. INTRODUCTION

The growth of electronically available information on the Internet has been enormous and is still increasing. In February 2006, the Google search engine offered access to more than 25 billion web pages. Eight years before, the figures were relatively low, with about 320 million pages.

In its introduction, the EUR-Lex website currently mentions about 1.8 million documents in various languages which are accessible via this portal. In 2005, which was supposed to be a quiet year, more than 2 100 regulations were adopted in European legislation, which meant over 44 000 new documents in EUR-Lex. Although these amounts seem relatively small in comparison with the figures on the Internet, a human user of the EUR-Lex system will be lost without efficient support from corresponding electronic information retrieval systems.

Nevertheless, it has to be underlined that, ever since computers first started to store and retrieve information, individuals have been confronted with more and more information items. Thanks to information technology research, retrieval methodologies have become more and more sophisticated and have offered easier and better possibilities for information retrieval.

It is certainly understandable today that users of information retrieval systems are very demanding about the quality of the results and are no longer impressed by their quantity. What is the advantage when a search engine on the web confronts a user with millions of hits? Sometimes it helps if the retrieval parameters are reformulated; Boolean operators may also help to exclude some websites or at least to determine which kind of answer fits a given request better. In many cases, however, this will not necessarily reduce the number of answers, but will perhaps place the most relevant answers at the top of the list. This, however, is not really an improvement of the retrieval result.

Another problem in the same context is that accessible systems or portals are so complicated that users do not necessarily have the knowledge and experience to make the engines extract relevant data. This can be supported by the

HOLGER BAGOLA<br />

Head of the Formats<br />

and Linguistic Tools Section,<br />

Publications Office<br />
