YEARS OF EUROPEAN ONLINE ANNÉES DE EN LIGNE ...

allows for identifying the concepts of a document. In fact, both approaches should be used in succession to arrive at a clear and solid classification of texts. The documents are actually instantiations of different concepts. Text-mining technology is not limited to the basic ideas expressed at the surface of a given text, but discovers others as well. Thus documents can be related to each other even though the key terms they use are different. For example, a document dealing with legislation on ‘environment protection’ can be classified together with documents treating problems of ‘air pollution’.

Linguistic statistics offer basic methods for distinguishing relevant components from less relevant or non-relevant ones. Regularities in the frequency of words and word forms in a given language allow linguistic relations to be described. This approach, combined with a probabilistic linguistic model, is one of the foundations of text mining. One of the most important methods analyses the differences between the vocabulary of a given text and that of a reference text collection which is supposed to represent the general vocabulary of a language.
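The comparison of a text against a reference collection can be illustrated with a minimal Python sketch. It is not the implementation of any particular text-mining system: the function names, the toy token lists and the ratio threshold of 2.0 are assumptions made for illustration only. The sketch assigns each word of the analysed text to one of the four classes that the text goes on to describe, and it simplifies the 'more or less the same' band to everything lying between the threshold and its reciprocal.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each token to its share of all tokens in the list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def classify_vocabulary(text_tokens, reference_tokens, threshold=2.0):
    """Assign every word of the analysed text to class 1, 2, 3 or 4.

    `threshold` is a hypothetical ratio (an assumption, not a value from
    the article): a word counts as over-represented when its relative
    frequency in the text is at least `threshold` times its relative
    frequency in the reference collection, and as under-represented when
    it is at most 1 / threshold times that frequency.
    """
    text_freq = relative_frequencies(text_tokens)
    ref_freq = relative_frequencies(reference_tokens)
    classes = {}
    for word, freq in text_freq.items():
        if word not in ref_freq:
            classes[word] = 1  # absent from the reference collection
            continue
        ratio = freq / ref_freq[word]
        if ratio >= threshold:
            classes[word] = 2  # clearly more frequent in the text
        elif ratio <= 1 / threshold:
            classes[word] = 4  # clearly more frequent in the reference
        else:
            classes[word] = 3  # roughly the same relative frequency
    return classes


# Toy data, invented for illustration only.
text_tokens = ["air"] * 3 + ["pollution"] * 3 + ["is", "the"]
reference_tokens = ["the"] * 4 + ["of"] * 2 + ["is", "air"]

classes = classify_vocabulary(text_tokens, reference_tokens)
for word in sorted(classes):
    print(word, classes[word])
```

Only words that occur in the analysed text are classified here. Words found solely in the reference collection fall trivially into the fourth class and are left out for brevity.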
After determining the frequencies of all word forms or tokens in both the text and the reference collection, four classes help to identify a word:

(1) words which exist in the analysed text but are not part of the reference collection: there is a very high probability that these words belong to the specific vocabulary of the domain which the text deals with;
(2) words which exist both in the analysed text and the reference collection, but with a higher relative frequency in the analysed text than in the general collection: if a predefined threshold value is exceeded, it is probable that these words also belong to the specific terminology of the domain which the text is about;
(3) words which exist both in the analysed text and the reference collection and whose relative frequencies are more or less the same: in general these words are necessary for the functioning of a language and do not relate to specific subjects;
(4) words which exist in the reference collection with higher relative frequencies than in the analysed text: with a very high probability these words are not terms of the subject matter treated by the analysed document.

The advantage of this method is that the key terms of a given text can be identified with a relatively high probability. The next step could consist of classifying the document together with documents which deal with similar subjects.

The problem, however, is the collection of reference texts. As the general use of a language should be represented, a clear definition of the border between a general and a specific vocabulary has to be found. But in everyday life, which is more and more influenced by new technologies, this definition is not evident. A similar problem exists for the definition of threshold values. Their exactness has a direct impact on the usefulness of the second of the abovementioned word classes. This is of particular importance with respect to the spread of specific vocabularies into everyday language, which tends to shrink the first class.

To overcome such limits, additional methods have to be implemented. The probabilistic language model, which is based on syntactic and semantic analyses, leads to the definition of rules which limit the number of combinations of linguistic entities. The example The bone eats a dog, although syntactically correct, has to be rejected, as bone cannot be the acting part in the context of the verb eat. A similar rule will exclude Birne ‘bulb’ from being the object of the same verb (4). Various other methods will have to be used to refine text-mining analyses. The cooperation of different scientific disciplines will be necessary in order to define relevant vocabularies.

3. IMPLEMENTATION OF TEXT MINING

Before text-mining methodologies can efficiently contribute to the acquisition of knowledge, an enormous amount of preparatory work has to be done. In particular, dictionaries have to be created which contain sufficient information for the text analysis as well as the necessary interlinking, such as that described by ontologies. This may be one of the reasons why text mining has become established in specific subject areas. The life science domain is particularly active. It is supposed to have the largest user community and the fastest-growing literature.
The Fraunhofer Institut in Bonn-St Augustin, Germany, organises an annual conference where representatives from various subject areas report on the evolution of their projects. A project which has reached a relatively high state of maturity is Biotem (Deutsches virtuelles Centrum für Text Mining in der Biomedizin (the German vir-

(4) Leaving aside some sensational performers who lead their audience to believe that they are really eating bulbs.