Unni Cathrine Eiken February 2005
enriched by several new EPAS. Still, for the purposes of this thesis, the list includes a broad enough variety of structures to be of use in the classification phase.

In the process of assessing the quality of the EPAS list, it became evident that the most interesting structures are the simplest ones. The EPAS corresponding to verb-subject-object relations contribute the most information about the selectional restrictions of the domain. An alternative way to obtain an effective and robust extraction of EPAS might have been to concentrate only on this type of structure, rather than extracting all EPAS from the text collections and then filtering out the unwanted ones.

In order to estimate the potential of a classification of the EPAS list, line diagrams were created using Formal Concept Analysis (FCA). FCA is a method of data analysis and knowledge representation which identifies conceptual structures in data sets, and it was a useful tool for identifying how the predicates and arguments in the EPAS list relate to each other. FCA distinguishes between two types of elements: formal objects and formal attributes. A formal concept is seen as a unit consisting of all the objects and attributes that belong to it (Wolff 1991, p. 430). Starting with any set of formal objects, all formal attributes that the objects have in common can be identified. When FCA was used to structure the data in the EPAS list, the arguments were treated as objects and the predicates as attributes. An FCA line diagram consists of all objects and attributes in a given context, organised hierarchically according to their shared properties. Figure 5 below shows the FCA line diagram for part of the structures in the EPAS list.⁴ Each white label corresponding to an argument from the EPAS list should be understood as a concept, and information about each concept can be read by following the upward-leading paths from it.
An object has a given attribute if there is an upward-leading path from the object to the attribute (Wolff 1994, p. 431). Using the arguments/formal objects lensmann (sergeant) and Fonn as a starting point, the associated predicates/formal attributes gi (give) and bede-om (ask-for) can be identified. The arguments lensmann and Fonn co-occur with the predicates gi and bede-om, while politi (police), which is further down in the hierarchy, co-occurs with other predicates as well as those higher up in the diagram (gi, bede-om and bekrefte (confirm)). In other words, more general concepts are found toward the bottom of the diagram, while specialised concepts are found by following the paths upwards. For the data material in this project, this can be interpreted in terms of the contextual distribution of the arguments. Arguments found in the lower parts of the diagram are more general and co-occur with a wider range of predicates than the arguments found higher up in the hierarchy. In Figure 5, it can be seen that gjerningsmann (perpetrator) and drapsmann (killer) have similar distributions in the data material: drapsmann co-occurs with the predicates velge (choose) and gjemme (hide), while gjerningsmann is found only in connection with gjemme.

On the basis of the formal concept analysis, it is clear that the EPAS list contains several arguments which show a distribution particular to their meaning. The different lines in the diagram show interesting bundles of semantically related arguments and confirm the assumption that different types of arguments show different contextual distributions within the thematic domain.

Figure 5

⁴ The diagram was made using the program Concept Explorer, downloadable from http://sourceforge.net/projects/conexp
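The reading of the line diagram described above can be illustrated with a small sketch. It is not the procedure used in the thesis (the diagram was produced with Concept Explorer); it is a minimal brute-force illustration in Python, and the toy context contains only the argument-predicate pairs named in the text, not the full EPAS list. The names `CONTEXT`, `common_attributes`, `common_objects` and `formal_concepts` are hypothetical and introduced here for illustration only.

```python
from itertools import combinations

# Toy formal context (an assumption for illustration): each argument
# (formal object) is mapped to the predicates (formal attributes) it is
# said to co-occur with in the text. The real EPAS list is much larger.
CONTEXT = {
    "lensmann": {"gi", "bede-om"},
    "Fonn": {"gi", "bede-om"},
    "politi": {"gi", "bede-om", "bekrefte"},
    "drapsmann": {"velge", "gjemme"},
    "gjerningsmann": {"gjemme"},
}


def common_attributes(objects, context):
    """Return the attributes shared by every object in `objects`.

    For an empty set of objects this is the full attribute set.
    """
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def common_objects(attributes, context):
    """Return the objects that have every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all formal concepts as (extent, intent) pairs.

    Brute force over all object subsets; only suitable for tiny contexts.
    """
    concepts = set()
    objects = sorted(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context)
            extent = common_objects(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    # Predicates shared by lensmann and Fonn.
    print(sorted(common_attributes({"lensmann", "Fonn"}, CONTEXT)))
    # Arguments that co-occur with gjemme.
    print(sorted(common_objects({"gjemme"}, CONTEXT)))
    # Arguments ordered by how many predicates they co-occur with.
    for arg in sorted(CONTEXT, key=lambda a: len(CONTEXT[a]), reverse=True):
        print(arg, sorted(CONTEXT[arg]))
```

In this toy context, an argument that co-occurs with more predicates (such as politi) has a larger attribute set than lensmann or Fonn, which corresponds to its lower position in the line diagram.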