[Figure 3.5: Effect of pattern-word truncation on non-referential it detection. F-Score (68-80) plotted against truncated word length (1-10), with curves for stemmed patterns, truncated patterns, and unaltered patterns.]

not directly comparable to the above work because they used a different split of training and testing data, and because experiments were conducted with a maximum entropy classifier rather than an SVM. Nevertheless, this previous work provides useful insights into the performance of SUPERLM on this task. Full details are available in [Bergsma et al., 2008b]. To analyze the output of our system in greater detail, we now also report the precision, recall, and F-score of the classifier (defined in Section 2.3.2).

Stemming vs. Simple Truncation

Since applying an English stemmer to the context words (Section 3.7.2) reduces the portability of the distributional technique, we investigated a more portable pattern abstraction. Figure 3.5 compares the use of the stemmer to simply truncating the words in the patterns at a maximum length. Using no truncation (Unaltered) drops the F-Score by 4.3%, while truncating the patterns to a length of four drops the F-Score by only 1.4%, a difference which is not statistically significant. Simple truncation may be a good option for other languages where stemmers are not readily available; the optimum truncation size will likely depend on the length of the base forms of words in that language. For real-world application of our approach, truncation also reduces the table sizes (and thus storage and look-up costs) of any pre-compiled it-pattern database.

A Human Study

We also wondered what the effect is of making a classification based solely on, in aggregate, four words of context on either side of it. Another way to view the limited context is to ask: given the amount of context we have, are we making optimum use of it? We answer this by seeing how well humans can do with the same information. Our system uses 5-gram context patterns that together span from four-to-the-left to four-to-the-right of the pronoun. We thus provide these same nine-token windows to our human subjects, and ask them to decide whether the pronouns refer to previous noun phrases or not, based on these contexts.
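To make the pattern extraction and truncation abstraction concrete, the following is a minimal sketch (not the thesis implementation; the tokenization, function names, and example sentence are assumptions for illustration) of how the 5-gram context patterns spanning the nine-token window around the pronoun can be enumerated and truncated to a fixed prefix length:

    # A minimal sketch (not the thesis implementation) of the two steps
    # described above: enumerating the 5-gram context patterns that jointly
    # span the nine-token window around "it", and truncating pattern words
    # to a fixed prefix length as a portable alternative to stemming.

    def context_patterns(tokens, i, n=5):
        """Return all n-grams of tokens that contain position i (the pronoun).
        For n = 5, these jointly cover four tokens to the left and four
        to the right of the pronoun."""
        starts = range(max(0, i - n + 1), min(i, len(tokens) - n) + 1)
        return [tokens[j:j + n] for j in starts]

    def truncate_pattern(pattern, max_len=4):
        """Truncate each pattern word to at most max_len characters."""
        return [w[:max_len] for w in pattern]

    # Hypothetical example sentence, for illustration only.
    sentence = "the new rules make it difficult to trade briskly today".split()
    i = sentence.index("it")
    for pattern in context_patterns(sentence, i):
        print(truncate_pattern(pattern))

Note that distinct words sharing a four-character prefix collapse to the same pattern entry (e.g. "trade" and "trading" both become "trad"), which is why truncation shrinks a pre-compiled it-pattern table, as noted above.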
