[Figure 3.3: Context-sensitive spelling correction learning curve. Accuracy (%) versus number of training examples (100 to 100,000, log scale) for SUPERLM, SUMLM, RATIOLM, and TRIGRAM.]

... we select the most likely word from the set. The importance of using large volumes of data has previously been noted [Banko and Brill, 2001; Liu and Curran, 2006]. Impressive levels of accuracy have been achieved on the standard confusion sets, for example, 100% on disambiguating both {affect, effect} and {weather, whether} by Golding and Roth [1999]. We thus restricted our experiments to the five confusion sets (of twenty-one in total) where the reported performance in [Golding and Roth, 1999] is below 90% (an average of 87%): {among, between}, {amount, number}, {cite, sight, site}, {peace, piece}, and {raise, rise}. We again create labeled data automatically from the NYT portion of Gigaword. For each confusion set, we extract 100K examples for training, 10K for development, and 10K for a final test set.

3.6.2 Context-sensitive Spelling Correction Results

Figure 3.3 provides the spelling correction learning curve, while Table 3.2 gives results on the five confusion sets. Choosing the most frequent label averages 66.9% on this task (BASE). TRIGRAM scores 88.4%, comparable to the trigram (page count) results reported in [Lapata and Keller, 2005]. SUPERLM again achieves the highest performance (95.7%), and it reaches this performance using many fewer training examples than with preposition selection. This is because the number of parameters grows with the number of fillers times the number of labels (recall, there are 14|F|K count-weight parameters: one for each of the 2 + 3 + 4 + 5 = 14 n-gram patterns of orders two through five that span the target position, for each filler and label), and there are 34 prepositions but only two to three confusable spellings. Note that we also include the performance reported in [Golding and Roth, 1999], although those results were obtained on a different corpus.

SUPERLM achieves a 24% relative reduction in error over RATIOLM (94.4%), which was the previous state of the art [Carlson et al., 2008]. SUMLM (94.8%) also improves on RATIOLM, although results are generally similar across the confusion sets. On {raise, rise}, SUPERLM's supervised weighting of the counts by position and size does not improve over SUMLM (Table 3.2). On all the other sets the performance is higher; for example, on {among, between}, the accuracy improves by 2.3%. On this set, counts for ...
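The automatic creation of labeled data described above works because edited text already contains the author's (presumably correct) choice from each confusion set, so every occurrence yields a free training example. Below is a minimal sketch of that extraction step in Python, assuming a tokenized corpus; the window size, placeholder token, and function names are illustrative and not taken from the thesis.

```python
from typing import Iterable, List, Tuple

CONFUSION_SET = {"peace", "piece"}
WINDOW = 4  # context tokens kept on each side; size chosen for illustration

def extract_examples(
        sentences: Iterable[List[str]]) -> List[Tuple[List[str], str]]:
    """Turn every occurrence of a confusable word into a (context, label)
    pair, using the word the author actually wrote as the gold label."""
    examples = []
    for tokens in sentences:
        lowered = [t.lower() for t in tokens]
        for i, tok in enumerate(lowered):
            if tok in CONFUSION_SET:
                left = lowered[max(0, i - WINDOW):i]
                right = lowered[i + 1:i + 1 + WINDOW]
                # Hide the target itself; the model must recover it
                # from the surrounding context alone.
                examples.append((left + ["<TARGET>"] + right, tok))
    return examples

if __name__ == "__main__":
    corpus = [["They", "signed", "the", "peace", "treaty"],
              ["He", "wanted", "a", "piece", "of", "cake"]]
    for context, label in extract_examples(corpus):
        print(" ".join(context), "->", label)
```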
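To make the comparison between the systems concrete: SUMLM can be read as an unsupervised scorer that simply sums the log-counts of every n-gram (orders two through five) spanning the target position, while SUPERLM learns a separate weight for each of those 14 count features per filler and label. The sketch below implements a SUMLM-style scorer under that reading; the toy count dictionary stands in for a web-scale n-gram collection, and all identifiers are hypothetical.

```python
import math
from typing import Dict, List

def sumlm_score(tokens: List[str], i: int, candidate: str,
                counts: Dict[str, int], orders=(2, 3, 4, 5)) -> float:
    """Sum log-counts of every n-gram of the given orders that spans
    position i, with `candidate` substituted at that position."""
    filled = tokens[:i] + [candidate] + tokens[i + 1:]
    score = 0.0
    for n in orders:
        first = max(0, i - n + 1)       # leftmost window covering position i
        last = min(i, len(filled) - n)  # rightmost window covering position i
        for start in range(first, last + 1):
            ngram = " ".join(filled[start:start + n])
            score += math.log(counts.get(ngram, 0) + 1)  # add-one, avoids log(0)
    return score

def disambiguate(tokens: List[str], i: int, confusion_set: List[str],
                 counts: Dict[str, int]) -> str:
    """Pick the confusable word whose filled-in contexts are best attested."""
    return max(confusion_set, key=lambda w: sumlm_score(tokens, i, w, counts))

if __name__ == "__main__":
    counts = {"peace treaty": 120, "the peace treaty": 95, "a piece of": 400}
    sentence = ["signed", "the", "<TARGET>", "treaty"]
    print(disambiguate(sentence, 2, ["peace", "piece"], counts))  # -> peace
```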
