System              Accuracy
Most-Recent Noun      17.9%
Maximum MI            28.2%
Maximum DSP           38.5%

Table 6.4: Pronoun resolution accuracy on nouns in the current or previous sentence in MUC.

because, of course, there are many nouns that satisfy the predicate's SPs that are not coreferent. DSP>0 has both a higher recall and a higher precision than accepting every pair previously seen in text (the right-most point on MI>T). The DSP>T system achieves higher precision than MI>T for points where recall is greater than 60% (where MI>T must begin to accept pairs with weaker empirical association). Note that the proportion of pairs with MI>0 is higher here than it is for general verb-objects (Section 6.4.5). On the subset of pairs with strong empirical association (MI>0), MI generally outperforms DSP at equivalent recall values.

We next compare MI and DSP as stand-alone pronoun resolution systems (Table 6.4). As a standard baseline, for each pronoun, we choose the most recent noun in text as the pronoun's antecedent, achieving 17.9% resolution accuracy. This baseline is low because many of the most-recent nouns are subjects of the pronoun's verb phrase, and resolving to them would violate syntactic coreference constraints. If we instead choose the preceding noun with the highest MI as the antecedent, we reach an accuracy of 28.2%, while choosing the preceding noun with the highest DSP achieves 38.5%. DSP thus resolves 37% more pronouns correctly than MI.

6.5 Conclusions and Future Work

We have proposed a simple, effective model of selectional preference based on discriminative training. Supervised techniques typically achieve better performance than unsupervised models, and we duplicate these gains with DSP. Here, however, the gains come at no additional labeling cost, as training examples are generated automatically from unlabeled text.

DSP allows an arbitrary combination of features, including verb co-occurrence features that yield high-quality similar-word lists as latent output. These lists not only indicate which verbs are associated with a common set of nouns; they also provide insight into the chain of narrative events in which a particular noun may participate (e.g., a particular noun may be bought, then cooked, then garnished, and then likely eaten). DSP therefore learns information similar to previous approaches that seek such narrative events directly and explicitly [Bean and Riloff, 2004; Chambers and Jurafsky, 2008]. However, note that we learn the association of events across a corpus rather than only within a particular segment of discourse, such as a document or paragraph. One option for future work is to model the local and global distributions of nouns separately, allowing for finer-grained (and potentially sense-specific) plausibility predictions.

The features used in DSP only scratch the surface of possible feature mining; information from WordNet relations, Wikipedia categories, or parallel corpora could also provide valuable clues for selectional preference. Also, if any other system were to exceed DSP's performance, it could be included as one of DSP's features.

It would be interesting to expand our co-occurrence features, including co-occurrence counts across more grammatical relations and using counts from external, unparsed corpora
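
As a concrete illustration of the antecedent-selection baselines compared in Table 6.4, the following is a minimal Python sketch of how each system picks an antecedent for a pronoun. The scoring function passed in is a hypothetical stand-in for the trained DSP model or a corpus-derived pointwise-MI estimate; the function names and the example context are assumptions of this sketch, not artifacts of the thesis implementation.

def resolve_pronoun(context, candidates, score):
    # context: the slot the pronoun fills, e.g. a (verb, relation) pair such as ("eat", "object")
    # candidates: nouns from the current or previous sentence, most recent first,
    #             already filtered by syntactic coreference constraints
    # score: a function (context, noun) -> float, e.g. a DSP or pointwise-MI scorer
    if not candidates:
        return None
    # Maximum-MI and Maximum-DSP systems differ only in the scorer they use.
    return max(candidates, key=lambda noun: score(context, noun))

def most_recent_baseline(candidates):
    # Baseline from Table 6.4: simply take the most recent preceding noun (17.9% accuracy).
    return candidates[0] if candidates else None

# Example usage with a hypothetical DSP scorer:
# antecedent = resolve_pronoun(("eat", "object"), ["pasta", "waiter", "menu"], dsp_score)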
