YEARS OF EUROPEAN ONLINE ANNÉES DE EN LIGNE ...


estig.ipbeja.pt

Thesaurus
  Entry
    Descriptor: pear tree
    Related term: pear
  …
  Entry
    Descriptor: pear
    Related term: pear tree

Ontology
  Entry
    Instance: pear tree
    Produces: pear
    Produces: wood
  Entry
    Instance: pear
    Originates from: pear tree
  Entry
    Instance: wood
    Originates from: pear tree
    Is used for: furniture

A semantic web is a very ambitious and interesting idea but, in spite of various analyses, it is and will remain a dream for a long time to come.

Text mining

In fact, text mining is a conglomeration of different methodologies which are brought together and which benefit from synergetic effects. But text mining is even more ambitious. The analysis of unstructured data is supposed to be automatic, which in fact means independent of any particular preparation of the text documents. A text-mining system is furthermore supposed to be extensible, which means that new rules can be integrated into the analysing processes simply by training. The most important advantage would be that automatic procedures can detect relations which a human expert would not otherwise have highlighted when categorising a document.

If the objects to be analysed are not prepared, i.e. indexed by means of controlled vocabularies or classified, the text-mining system must have access to sophisticated resources. It requires reusable formalisms for the representation of knowledge. This knowledge has to be integrated in the existing knowledge base. During the retrieval process, relations between identified terms have to be validated and evaluated. All these preparations or preliminary tasks are language dependent. So the knowledge base, at least, has to be conceived in a way which makes it independent of a given language.

5. TEXT MINING IN LEGISLATIVE DATA?

The fourth symposium on text mining organised by the Fraunhofer Institute concentrated on legislative data. Various projects from different countries were presented, and they may be considered very ambitious. As all of them concentrate on limited subjects (which does not rule out that huge amounts of data are processed), the speakers could report on their success. However, it must also be mentioned that the real use can hardly be verified.

Applying text mining to a complete body of legislation, even a rather young one such as that of the European Union, is therefore extremely ambitious. The different subjects are characterised by specialised vocabularies and terminologies. The reusability of tools, dictionaries or rules for the evaluation of relations is limited. Consequently, design and development will take up very large amounts of time and resources.

Nevertheless, legislative texts have a big advantage over other document types. Even if their structure is not explicitly marked up, they have an inherent structure which, in many cases, has a long tradition. Parts of the document structure can be identified by evaluating the use of specific key terms. As a consequence, an analysis in the context of information retrieval could be directed to and concentrated on those components. Therefore, in many cases, applying text-mining technologies to the titles of legislative documents could be sufficient to classify acts or to associate descriptors from a thesaurus without the intervention of a human operator.

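
The idea of associating thesaurus descriptors with an act on the basis of its title alone can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two descriptors, their trigger terms, the function names and the sample titles are hypothetical and are not taken from any real thesaurus or from the projects presented at the symposium. A production system would also need the language-dependent resources discussed above (lemmatisation, multi-word terms, multilingual vocabularies), which simple token matching does not provide.

```python
import re

# Hypothetical mini-thesaurus: descriptor -> trigger terms.
# Both the descriptors and the terms are invented for this sketch.
THESAURUS = {
    "fruit growing": {"pear", "pears", "apple", "apples", "orchard", "orchards"},
    "wood industry": {"wood", "timber", "furniture"},
}


def tokenize(title):
    """Lower-case the title and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", title.lower()))


def assign_descriptors(title, thesaurus=THESAURUS):
    """Return, in alphabetical order, every descriptor for which at
    least one trigger term occurs in the title."""
    tokens = tokenize(title)
    return sorted(
        descriptor
        for descriptor, terms in thesaurus.items()
        if tokens & terms  # non-empty intersection means a match
    )


if __name__ == "__main__":
    title = "Regulation on marketing standards for apples and pears"
    print(assign_descriptors(title))
```

A title that matches no trigger term yields an empty list, which such a system could use as the signal to route the act to a human indexer instead of classifying it automatically.
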
The abovementioned complexity and possible difficulties should not be regarded as reasons for not advancing towards automation. As text-mining systems are in general very modular, provision should be made for implementing smaller components in existing systems. This could help to validate the efficiency of the modules and, at the same time, the operational systems could be improved. Implicitly, this also means that the quality of the retrieval results should improve as well.

