
Large-Scale Semi-Supervised Learning for Natural Language ...


System     P     R     F     Acc
SUPERLM    80.0  73.3  76.5  86.5
Human-1    92.7  63.3  75.2  87.5
Human-2    84.0  70.0  76.4  87.0
Human-3    72.2  86.7  78.8  86.0

Table 3.4: Human vs. computer non-referential it detection (%).

Subjects first performed a dry-run experiment on separate development data. They were shown their errors, and sources of confusion were clarified. They then made the judgments unassisted on a final set of 200 test examples. Three humans performed the experiment. Their results show a range of preferences for precision versus recall, with F-score and accuracy broadly similar to SUPERLM (Table 3.4). These results show that our distributional approach already gets good leverage from the limited context information, performing at around the level achieved by our best human.

Error Analysis

It is instructive to inspect the twenty-five test instances that our system classified incorrectly, given human performance on this same set. Seventeen of the twenty-five system errors were also made by one or more human subjects, suggesting that system errors are also mostly due to limited context. For example, one of these errors was for the context: "it takes an astounding amount..." Here, the non-referential nature of the instance is not apparent without the infinitive clause that ends the sentence: "... of time to compare very long DNA sequences with each other."

Six of the eight errors unique to the system were cases where the system falsely said the pronoun was non-referential. Four of these could have referred to entire sentences or clauses rather than nouns. These confusing cases, for both humans and our system, result from our definition of a referential pronoun: pronouns with verbal or clause antecedents are considered non-referential. If an antecedent verb or clause is replaced by a nominalization (Smith researched... to Smith's research), then a neutral pronoun, in the same context, becomes referential.
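The nominalization distinction could be operationalized as a simple binary context feature. The following is a minimal sketch, not part of the system described here; the word list and helper name are hypothetical illustrations:

```python
# Hypothetical sketch: a binary feature indicating whether a known
# nominalization (e.g. "research" for "researched") appears shortly
# before the pronoun, signaling a likely referential reading.
NOMINALIZATIONS = {"research", "decision", "analysis", "installation"}

def nominalization_feature(tokens, pronoun_index, window=10):
    """Return 1 if any token in the window preceding the pronoun
    is a known nominalization, else 0."""
    start = max(0, pronoun_index - window)
    preceding = tokens[start:pronoun_index]
    return int(any(t.lower() in NOMINALIZATIONS for t in preceding))

tokens = "Smith 's research was thorough ; it covered every case".split()
print(nominalization_feature(tokens, tokens.index("it")))  # 1
```

Such a feature would let a classifier shift probability mass toward the referential class exactly in the confusing cases described above.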
When we inspect the probabilities produced by the maximum entropy classifier, we see only a weak bias for the non-referential class on these examples, reflecting our classifier's uncertainty. It would likely be possible to improve accuracy on these cases by encoding the presence or absence of preceding nominalizations as a feature of our classifier.

Another false non-referential decision is for the phrase "... machine he had installed it on." The it is actually referential, but the extracted patterns (e.g., "he had install * on") are nevertheless usually filled with it (this example also suggests using filler counts for the word "the" as a feature when it is the last word in the pattern). Again, it might be possible to fix such examples by leveraging the preceding discourse. Notably, the first noun phrase before the context is the word "software." There is strong compatibility between the pronoun-parent "install" and the candidate antecedent "software." In a full coreference resolution system, when the anaphora resolution module has a strong preference to link it to an antecedent (which it should have when the pronoun is indeed referential), we can override a weak non-referential probability. Non-referential it detection should not be a pre-processing step, but rather part of a globally-optimal configuration, as was done for general noun phrase
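The override idea above can be sketched as a simple combination rule. This is a hypothetical illustration, not the actual decision procedure or parameters of any system discussed here; the thresholds and probability values are invented for the example:

```python
def classify_it(p_nonref, antecedent_link_score,
                link_threshold=0.8, nonref_threshold=0.5):
    """Decide whether 'it' is non-referential, letting a confident
    anaphora-resolution link override a weak non-referential signal.

    p_nonref: classifier probability that the pronoun is non-referential.
    antecedent_link_score: resolver's confidence in its best antecedent.
    All thresholds are hypothetical, for illustration only.
    """
    # A strong antecedent preference overrides a weak non-referential bias.
    if antecedent_link_score >= link_threshold and p_nonref < 0.7:
        return "referential"
    return "non-referential" if p_nonref >= nonref_threshold else "referential"

# Weak non-referential probability + confident antecedent link:
print(classify_it(p_nonref=0.55, antecedent_link_score=0.9))  # referential
```

A globally-optimal formulation would go further, jointly scoring the non-referential decision and all candidate antecedent links rather than applying a hard override.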
