Unni Cathrine Eiken February 2005

information estimated from the text. Hindle’s results show that semantic relatedness can be derived from the distribution of syntactic forms (Hindle 1990, p. 274). This is a similar approach to the one taken in the present work, albeit on a substantially smaller scale. Hindle (1990) addresses the data sparseness problem by estimating the probability of an unseen event by comparing it to similar events which have been seen.

Grefenstette (1992) presents a method which looks for context patterns in large domain-specific corpora and finds words that are similar with respect to how a target word is used in a specific text or domain. His program SEXTANT uses syntactically derived contexts and estimates the similarity of two words by considering the overlap of all the contexts associated with them over a large corpus (Grefenstette 1992, p. 325). As a result, a word’s context consists of all the words co-occurring with it in the corpus.

Pereira et al. (1993) also report a method for clustering words according to their distributions in given syntactic contexts. In their approach, nouns are classified based on their syntactic relations to predicates in the corpus. The method enables the automatic derivation of classes of semantically similar words from a text corpus and produces clusters the authors term “intuitively informative” (Pereira et al. 1993, p. 190).

Lin and Pantel (2001) present the unsupervised algorithm UNICON for the creation of groups of semantically similar words. Their approach examines collocation patterns consisting of dependency relationships and employs a method for selecting significant collocation patterns: those dependency relations which occur more frequently than they would if the words were independent of each other are selected as collocation patterns. This approach is further developed in Pantel and Lin (2002). Here, clusters which are relatively semantically different are initially identified, and a subset of each cluster’s members is used to create so-called centroids, which represent the average features of the subsets. Subsequently, new words are assigned to their most similar clusters. A word can be assigned to several clusters, each cluster corresponding to a sense of the word.
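The selection criterion described above — keeping dependency pairs that co-occur more often than independence would predict — can be illustrated with pointwise mutual information. The following is a minimal sketch of that idea, not Lin and Pantel’s actual implementation; the toy dependency counts and function names are invented for illustration.

```python
import math
from collections import Counter

def pmi(pair_count, count_w, count_c, total):
    """Pointwise mutual information: log of P(w, c) / (P(w) * P(c)).
    Positive when the pair co-occurs more often than independence predicts."""
    p_joint = pair_count / total
    p_indep = (count_w / total) * (count_c / total)
    return math.log(p_joint / p_indep)

# Toy (word, dependency-context) counts, invented for illustration.
pairs = Counter({
    ("drink", "obj:wine"): 8,
    ("drink", "obj:car"): 1,
    ("drive", "obj:car"): 9,
    ("drive", "obj:wine"): 1,
})

# Marginal counts for each word and each dependency context.
word_counts, ctx_counts = Counter(), Counter()
for (w, c), n in pairs.items():
    word_counts[w] += n
    ctx_counts[c] += n
total = sum(pairs.values())

# Keep only pairs whose observed frequency exceeds the independence
# baseline (PMI > 0) as significant collocation patterns.
collocations = {
    (w, c): pmi(n, word_counts[w], ctx_counts[c], total)
    for (w, c), n in pairs.items()
    if pmi(n, word_counts[w], ctx_counts[c], total) > 0
}
```

On these counts, only the pairs ("drink", "obj:wine") and ("drive", "obj:car") survive the filter, since the crossed pairs occur less often than chance would predict.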

2.2.3 Context and selectional restrictions

In the above it has been argued that a given word will tend to co-occur with a limited class of other words, and that this information can be exploited to find words that are similar in meaning. One of the reasons for this expected occurrence of similar words in similar contexts is that predicates to a certain extent limit the semantic properties of the arguments that they can combine with. This behaviour is captured through the notion of selectional restrictions, which define how a predicate restricts the class of arguments that can combine in a specific position
