
YEARS OF EUROPEAN ONLINE ANNÉES DE EN LIGNE ...


WORKSHOP

jects. The problem, however, is the collection of reference texts. As the general use of a language should be represented, a clear definition of the borders between a general and a specific vocabulary has to be found. But in the daily world, which is more and more influenced by new technologies, this definition is not evident.

A similar problem exists for the definition of threshold values. Their exactness has a direct impact on the usefulness of the second one of the abovementioned word classes. This is of particular importance with respect to the generalisation of specific vocabularies in the everyday language, which tends to minimise the first class.

To overcome such limits, additional methods have to be implemented.

The probabilistic language model, which is based on syntactical and semantic analyses, leads to the definition of rules which limit the number of combinations of linguistic entities. The example The bone eats a dog, although syntactically correct, has to be rejected as bone cannot be the acting part in the context of the verb eat. A similar rule will exclude Birne ‘bulb’ from being the object of the same verb ( 4 ).
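
The kind of rule described here can be illustrated with a small sketch. It is not the probabilistic model itself, only a hand-written rule check under stated assumptions: the sense inventory, the feature names ("animate", "edible") and the functions `admissible_senses` and `accept` are all hypothetical and chosen purely for illustration.

```python
# Toy sense inventory: each noun maps to a list of (sense gloss, semantic features).
# All feature assignments are illustrative assumptions, not taken from the text.
SENSES = {
    "dog": [("dog", {"animate", "edible"})],
    "bone": [("bone", {"edible"})],
    # German "Birne" is ambiguous between 'pear' and 'bulb'.
    "Birne": [("pear", {"edible"}), ("bulb", set())],
}

# Rules per verb: the features a noun sense must have to fill each role.
RESTRICTIONS = {
    "eat": {"subject": {"animate"}, "object": {"edible"}},
}


def admissible_senses(verb, role, noun):
    """Return the glosses of the noun's senses that satisfy the verb's rule for this role."""
    required = RESTRICTIONS[verb][role]
    return [gloss for gloss, features in SENSES[noun] if required <= features]


def accept(subject, verb, obj):
    """Accept a subject-verb-object combination only if both roles have an admissible sense."""
    return bool(admissible_senses(verb, "subject", subject)) and bool(
        admissible_senses(verb, "object", obj)
    )


if __name__ == "__main__":
    print(accept("dog", "eat", "bone"))                # True
    print(accept("bone", "eat", "dog"))                # False: bone cannot be the acting part
    print(admissible_senses("eat", "object", "Birne"))  # ['pear']: the 'bulb' sense is excluded
```

In this sketch the rule both rejects the implausible combination and narrows the ambiguous noun to its admissible sense. A probabilistic variant would presumably replace the hard feature test with scores estimated from text.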

Various other methods will have to be used to refine text-mining analyses. The cooperation of different scientific disciplines will be necessary in order to define relevant vocabularies.

3. IMPLEMENTATION OF TEXT MINING

Before text-mining methodologies can efficiently contribute to the acquisition of knowledge, an enormous amount of preparatory work has to be done. In particular, dictionaries have to be created which contain sufficient information for the text analysis as well as the necessary interlinking such as described by ontologies.

This may be one of the reasons why text mining is established for specific subject matter. The life science domain is particularly active. It is supposed to have the largest user community and the fastest-growing literature. The Fraunhofer Institut in Bonn-St Augustin, Germany, organises an annual conference where representatives from various subjects report on the evolution of their projects.

A project which is at a state of relatively high maturity is Biotem (Deutsches virtuelles Centrum für Text Mining in der Biomedizin (the German vir-

( 4 ) Leaving aside some sensational performers who lead their audience to believe that they are really eating bulbs.
