Large-Scale Semi-Supervised Learning for Natural Language ...


[Figure 6.2: Pronoun resolution precision-recall on MUC. Interpolated precision vs. recall for the curves DSP > T, MI > T, DSP > 0, and MI > 0.]

only once in the corpus. Even if we smooth MI by smoothing Pr(n|v) in Equation 6.2 using modified Kneser-Ney smoothing [Chen and Goodman, 1998], the recall of MI > 0 on SJM only increases from 44.1% to 44.9%, still far below DSP. Frequency-based models have fundamentally low coverage. As further evidence, if we build a model of MI on the SJM corpus and use it in our pseudo-disambiguation experiment (Section 6.4.3), MI > 0 gets a MacroAvg precision of 86% but a MacroAvg recall of only 12%.⁹

6.4.6 Pronoun Resolution

Finally, we evaluate DSP on a common application of selectional preferences: choosing the correct antecedent for pronouns in text [Dagan and Itai, 1990; Kehler et al., 2004]. We study the cases where a pronoun is the direct object of a verb predicate, v. A pronoun's antecedent must obey v's selectional preferences, so a better model of SP should let us better select pronoun antecedents.¹⁰

We parsed the MUC-7 [1997] coreference corpus and extracted all pronouns in a direct object relation. For each pronoun, p, modified by a verb, v, we extracted all preceding nouns within the current or previous sentence. Thirty-nine anaphoric pronouns had an antecedent in this window and are used in the evaluation. For each p, let N(p)⁺ be the set of preceding nouns coreferent with p, and let N(p)⁻ be the remaining non-coreferent nouns. We take all pairs (v, n⁺) where n⁺ ∈ N(p)⁺ as positive, and all other pairs (v, n⁻), n⁻ ∈ N(p)⁻, as negative.

We compare MI and DSP on this set, classifying every (v, n) with MI > T (or DSP > T) as positive. By varying T, we get a precision-recall curve (Figure 6.2).
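The threshold sweep that produces such a precision-recall curve can be sketched as follows. This is a minimal illustration, not the thesis code; the function name and the toy scores are invented.

```python
# Sketch of the threshold sweep behind a precision-recall curve: each
# candidate (v, n) pair has a model score (MI or DSP) and a gold label
# (True if n is coreferent with the pronoun). Sweeping the threshold T
# from strict to lenient traces out (recall, precision) points.

def pr_curve(scored_pairs):
    """scored_pairs: list of (score, is_positive) tuples."""
    total_pos = sum(1 for _, pos in scored_pairs if pos)
    points = []
    for t in sorted({s for s, _ in scored_pairs}, reverse=True):
        predicted = [(s, pos) for s, pos in scored_pairs if s > t]
        if not predicted:
            continue  # nothing classified positive at this threshold
        tp = sum(1 for _, pos in predicted if pos)
        points.append((tp / total_pos, tp / len(predicted)))  # (recall, precision)
    return points

# Invented scores for five candidate (v, n) pairs.
print(pr_curve([(2.1, True), (1.5, False), (0.9, True), (0.2, False), (-0.3, True)]))
```

As the threshold drops, recall can only grow while precision typically falls, which is the trade-off the DSP > T and MI > T curves trace in Figure 6.2.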
Precision is low

⁹ Recall that even the Keller and Lapata [2003] system, built on the world's largest corpus, achieves only 34% recall (Table 6.1), with only 48% of positives and 27% of all pairs previously observed; but note, on the other hand, that low-count N-grams have been filtered from the N-gram corpus, and therefore perhaps this effect is overstated.

¹⁰ Note we are not trying to answer the question of whether selectional preferences are useful [Yang et al., 2005] or not [Kehler et al., 2004] for resolving pronouns when combined with features for recency, frequency, gender, syntactic role of the candidate, etc. We are only using this task as another evaluation for our models of selectional preference.
