YEARS OF EUROPEAN ONLINE ANNÉES DE EN LIGNE ...

development of user interfaces, which by means of taxonomies lead the user to the domain of interest they are searching for (Dörre, Gerstl and Seiffert, 2004, p. 480). One of the biggest challenges for information retrieval systems is the fact that most (if not all) documents, in the broader sense of the word, are written in a natural language (1). Among other things, natural languages offer a number of different ways of referring to the same extralinguistic facts. In a communication situation it is sometimes even a good method to reformulate a statement in order to show that the original one has been understood and/or to have it confirmed. In other cases, certain facts are paraphrased without a specific keyword being used. For example, if an author describes a nice lady in white clothes sitting on a glass bowl, it could mean that he is talking about luck. This example of a medieval allegory, which today is probably understood only by a few experts, illustrates another characteristic of human language: it changes over time. New expressions appear, and others disappear or change their meaning.

It must also be kept in mind that the items of our vocabulary are not only related to the concept of an extralinguistic fact but also to each other. This is why, for instance, good dictionaries give hints about synonyms or antonyms of a given expression. But in the context of environment protection, for example, it could also be of interest to retrieve information on air pollution. This case shows how complex the relations between expressions are, and it is doubtful whether such word fields are ever complete. Such circumstances obviously make the automatic retrieval of pertinent information extremely complex. Documents therefore need to be analysed by means of scientific linguistic methodology, which goes far beyond the still widespread approach of simple text indexing and clearing.
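The gap between simple text indexing and retrieval that takes word fields into account can be illustrated with a minimal Python sketch; it is not taken from the paper. It builds a plain inverted index over three toy documents and can optionally expand the query through a hand-made word field. The documents, the `WORD_FIELDS` dictionary and all function names are hypothetical; a real system would draw on a thesaurus or ontology, and, as noted above, such word fields are unlikely ever to be complete.

```python
import re
from collections import defaultdict

# Toy documents (hypothetical examples).
DOCS = {
    1: "New rules on environment protection were adopted.",
    2: "Emissions from factories increase air pollution in cities.",
    3: "The court ruled on a contract dispute.",
}

# Hypothetical word field: terms related to a concept, not only synonyms.
WORD_FIELDS = {
    "environment": {"pollution", "emissions", "climate"},
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Simple text indexing: map each token to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query, expand=False):
    """Return the sorted ids of documents containing any query term.

    With expand=True, each query term is first extended by its word field.
    """
    terms = set(tokenize(query))
    if expand:
        for term in list(terms):
            terms |= WORD_FIELDS.get(term, set())
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return sorted(hits)


index = build_index(DOCS)
print(search(index, "environment protection"))               # [1]
print(search(index, "environment protection", expand=True))  # [1, 2]
```

Without expansion, the query finds only document 1; with the word field it also finds document 2, which deals with air pollution without ever using the word "environment".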
(1) Some attempts to translate documents into a more formal language (Interlingua) were not very successful. See the comments by Hutchins (1986, Chapter 10).

The still ‘young’ domain of text mining tries to develop appropriate methods to support digging for information within natural language documents. Text mining is sometimes also referred to as ‘text data mining’ or ‘knowledge discovery in text’. In general, it denotes the process of retrieving information from texts, and this is the most important difference between data mining and text mining: while data mining procedures try to extract relevant information from structured databases, text mining concentrates on unstructured text documents. The tasks and objectives of the analysis process are more or less the same (Dörre, Gerstl and Seiffert, 2004, p. 480). Information is typically identified through processes that discover patterns and relations, mainly by means of statistical pattern learning. Texts are generally regarded as unstructured data, in contrast to database information, which is supposed to be structured. Text mining therefore usually involves structuring the input text by ‘parsing’, complemented by the addition and/or removal of linguistic features. This restructuring of the data permits the derivation of patterns as well as the evaluation and interpretation of the output. The quality of text mining is usually judged on a combination of relevance, novelty and tractability. Typical text-mining tasks include text classification, text clustering, concept or entity extraction, document summarisation and the modelling of entity relations.

Text-mining processes may be described as a sequential flow of activities. First, the key terms of a textual entity are identified by means of statistical algorithms. Comparing them with entries in ontologies then makes it possible to group these texts with similar ones. In this way, a knowledge base is created and then extended as further documents are analysed.

An example will show the complexity of the necessary methods. Imagine that a document contains the German word Birne ‘pear’. It has to be taken into account that this term could be used as an ellipsis or a metaphor.
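The flow of activities described above (statistical identification of key terms, then grouping via an ontology) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure of any particular system: TF-IDF is assumed as the statistical weighting, stop-word removal stands in for the much richer linguistic restructuring of the input text, and the `ONTOLOGY` dictionary, the toy documents and all function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Minimal stop-word list; removing these words is a crude stand-in for
# the linguistic restructuring of the input text.
STOPWORDS = {"the", "a", "of", "and", "in", "on", "for", "is", "are", "to"}

# Hypothetical mini-ontology mapping terms to concepts.
ONTOLOGY = {
    "fruit": "food", "apples": "food", "pears": "food",
    "tax": "finance", "vat": "finance", "budget": "finance",
}

# Toy documents (hypothetical).
DOCS = {
    "d1": "Apples and pears are fruit.",
    "d2": "The budget for VAT and tax.",
    "d3": "Pears on the market.",
}


def preprocess(text):
    """Lower-case, split into alphabetic tokens and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def key_terms(docs, k=2):
    """Return the k terms with the highest TF-IDF weight for each document."""
    tokenized = {doc_id: preprocess(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    result = {}
    for doc_id, toks in tokenized.items():
        if not toks:
            result[doc_id] = []
            continue
        tf = Counter(toks)
        scores = {
            term: (count / len(toks)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result[doc_id] = ranked[:k]
    return result


def group_by_concept(terms_per_doc):
    """Group document ids under the ontology concepts of their key terms."""
    groups = defaultdict(set)
    for doc_id, terms in terms_per_doc.items():
        for term in terms:
            if term in ONTOLOGY:
                groups[ONTOLOGY[term]].add(doc_id)
    return {concept: sorted(ids) for concept, ids in groups.items()}


terms = key_terms(DOCS)
print(terms)  # {'d1': ['apples', 'fruit'], 'd2': ['budget', 'tax'], 'd3': ['market', 'pears']}
print(group_by_concept(terms))  # {'food': ['d1', 'd3'], 'finance': ['d2']}
```

Documents d1 (via "apples" and "fruit") and d3 (via "pears") end up under the same concept although they share no key term, which is the kind of grouping an ontology enables. Because such a lookup maps a word to one concept whatever its reading, however, it cannot by itself separate the meanings of an ambiguous word such as Birne.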
This ambiguity leads us to the following virtual classes, which differ from one another in the meaning of the key term Birne:
(1) a kind of fruit;
(2) the tree which produces the fruit (‘pear tree’); this is the elliptic use for Birnenbaum;
(3) the wood of a pear tree, which is used for the construction of furniture; this is an ellipsis for Birnenholz;
(4) an electric light bulb, which often has a shape resembling a pear; this is a metaphor well established in the German vocabulary and at the same time an ellipsis for Glühbirne;
(5) ironically, the head of a human being, which in certain stylistic contexts may be compared with a pear; in that case it could be regarded as a metaphor.
While the last of these variants needs to be taken into account only in certain stylistic contexts, the others require deeper analysis so that the documents concerned can be related to similar ones. In the first case, this could consist of references to other types of fruit or foods. If the document