12.07.2015 Views

Large-Scale Semi-Supervised Learning for Natural Language ...


Pattern Filler Type            String
#1: 3rd-person pron. sing.     it/its
#2: 3rd-person pron. plur.     they/them/their
#3: any other pronoun          he/him/his/, I/me/my, etc.
#4: infrequent word token      〈UNK〉
#5: any other token            *

Table 3.3: Pattern filler types

that we can reliably classify a pronoun as being referential or non-referential based solely on the local context surrounding the pronoun, using the techniques described in Section 3.3.

The difficulty of non-referential pronouns has been acknowledged since the beginning of computational resolution of anaphora. Hobbs [1978] notes his algorithm does not handle pronominal references to sentences nor cases where it occurs in time or weather expressions. Hirst [1981, page 17] emphasizes the importance of detecting non-referential pronouns, “lest precious hours be lost in bootless searches for textual referents.” Mueller [2006] summarizes the evolution of computational approaches to non-referential it detection. In particular, note the pioneering work of Paice and Husk [1987], the inclusion of non-referential it detection in a full anaphora resolution system by Lappin and Leass [1994], and the machine learning approach of Evans [2001].

3.7.2 Our Approach to Non-referential Pronoun Detection

We apply our web-scale disambiguation systems to this task. Like in the above approaches, we turn the context into patterns, with it as the word to be labeled. Since the output classes are not explicit words, we devise some surrogate fillers. To illustrate for Example (1), note we can extract the context pattern “make * in advance” and for Example (2) “make * in Hollywood,” where “*” represents the filler in the position of it. Non-referential instances tend to have the word it filling this position in the pattern’s distribution. This is because non-referential patterns are fairly unique to non-referential pronouns. Referential distributions occur with many other noun phrase fillers.
For example, in the Google N-gram corpus, “make it in advance” and “make them in advance” occur roughly the same number of times (442 vs. 449), indicating a referential pattern. In contrast, “make it in Hollywood” occurs 3421 times while “make them in Hollywood” does not occur at all. This indicates that some useful statistics are counts for patterns filled with the words it and them.

These simple counts strongly indicate whether another noun can replace the pronoun. Thus we can computationally distinguish between a) pronouns that refer to nouns, and b) all other instances: including those that have no antecedent, like Example (2), and those that refer to sentences, clauses, or implied topics of discourse.

We now discuss our full set of pattern fillers. For identifying non-referential it in English, we are interested in how often it occurs as a pattern filler versus other nouns. As surrogates for nouns, we gather counts for five different classes of words that fill the wildcard position, determined by string match (Table 3.3).⁶ The third-person plural they (#2) reliably occurs in patterns where referential it also resides. The occurrence of any other

⁶ Note, this work was done before the availability of the POS-tagged Google V2 corpus (Chapter 5). We could directly count noun-fillers using that corpus.
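
The filler counting described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system used in this work: the names filler_type, filler_counts, and TOY_COUNTS are hypothetical, the N-gram counts are held in an in-memory dictionary rather than looked up in the Google N-gram corpus, the spelling "<UNK>" for the infrequent-word token is assumed, and the list for class #3 contains only the pronouns named in Table 3.3 (the "etc." is not filled in). The three counts are the ones quoted in the text.

```python
from collections import Counter

# Filler classes #1-#3 of Table 3.3, matched by string.
# Class #3 ends with "etc." in the table, so this set is knowingly incomplete.
THIRD_SINGULAR = {"it", "its"}
THIRD_PLURAL = {"they", "them", "their"}
OTHER_PRONOUNS = {"he", "him", "his", "i", "me", "my"}
UNK_TOKEN = "<UNK>"  # assumed spelling of the infrequent-word token

# Toy stand-in for an N-gram count table; the counts are those quoted in the text.
TOY_COUNTS = {
    ("make", "it", "in", "advance"): 442,
    ("make", "them", "in", "advance"): 449,
    ("make", "it", "in", "Hollywood"): 3421,
}


def filler_type(token):
    """Map a filler token to its class number (1-5) from Table 3.3."""
    word = token.lower()
    if word in THIRD_SINGULAR:
        return 1
    if word in THIRD_PLURAL:
        return 2
    if word in OTHER_PRONOUNS:
        return 3
    if token == UNK_TOKEN:
        return 4
    return 5  # any other token


def filler_counts(pattern, ngram_counts):
    """Sum N-gram counts per filler class for a pattern with one "*" slot."""
    slot = pattern.index("*")
    totals = Counter()
    for ngram, count in ngram_counts.items():
        if len(ngram) != len(pattern):
            continue
        # Every position except the wildcard slot must match exactly.
        context_matches = all(
            p == w for i, (p, w) in enumerate(zip(pattern, ngram)) if i != slot
        )
        if context_matches:
            totals[filler_type(ngram[slot])] += count
    return totals


if __name__ == "__main__":
    for tail in ("advance", "Hollywood"):
        pattern = ("make", "*", "in", tail)
        print(pattern, dict(filler_counts(pattern, TOY_COUNTS)))
```

With the toy table, the pattern “make * in advance” gets similar totals for classes #1 and #2, while “make * in Hollywood” gets a class #1 total only, which mirrors the referential versus non-referential contrast drawn in the text. A real system would query corpus counts at scale and pass such totals to a classifier as features rather than compare them by hand.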
