Large-Scale Semi-Supervised Learning for Natural Language ...
Figure 5.3: Out-of-domain learning curve of adjective ordering classifiers on Medline. [Plot of accuracy (%) against the number of training examples (100 to 1e5) for the N-GM+LEX, N-GM, and LEX systems.]

5.4 Context-Sensitive Spelling Correction

We now turn to the generation problem of context-sensitive spelling correction. Readers of the previous two chapters will already be familiar with the task: for every occurrence of a word in a pre-defined set of confusable words (such as peace and piece), the system must select the most likely word from the set, flagging a possible usage error whenever the predicted word disagrees with the original.

Our in-domain examples are again from the New York Times (NYT) portion of Gigaword, as described in Chapter 3. Recall that these comprise the 5 confusion sets on which accuracy was below 90% in [Golding and Roth, 1999]. There are 100K training, 10K development, and 10K test examples for each confusion set. Our results are averages across confusion sets.

Out-of-domain examples are again drawn from Gutenberg and Medline. We extract all instances of words that belong to one of our confusion sets, along with their surrounding context. Assuming the extracted instances represent correct usage, we label 7.8K and 56K out-of-domain test examples for Gutenberg and Medline, respectively.

We test three unsupervised systems: 1) TRIGRAM (Chapter 3): use one token of context on the left and one on the right, and output the candidate from the confusion set that occurs most frequently in this pattern [Lapata and Keller, 2005]. 2) SUMLM (Chapter 3): measure the frequency of the candidates in all the 3-to-5-gram patterns that span the confusable word; for each candidate, sum the log-counts of all patterns filled with that candidate, and output the candidate with the highest total. 3) The baseline predicts the most frequent member of each confusion set, based on frequencies in the NYT training data.

5.4.1 Supervised Spelling Correction

Our LEX features are typical disambiguation features that flag specific aspects of the context. We have features for the words at all positions in a 9-word window (called collocation features by [Golding and Roth, 1999]), plus indicators for a particular word preceding or following the confusable word. We also include indicators for all N-grams, and their positions, within a 9-word window.

For N-GM count features, we follow Chapter 3. We include the log-counts of all
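To make the SUMLM scoring concrete, the following is a minimal Python sketch of the candidate selection described above. The ngram_logcount lookup is a hypothetical stand-in for the web-scale N-gram counts used in Chapter 3; its name and interface are assumptions for illustration, not part of the actual system.

def sumlm_score(candidate, left_context, right_context, ngram_logcount):
    """Sum log-counts of all 3-to-5-gram patterns spanning the confusable slot.

    left_context / right_context: lists of tokens immediately to the left
    and right of the confusable word (at least 4 tokens each).
    ngram_logcount: hypothetical lookup returning the log-count of a token
    tuple in the N-gram collection (0 if unseen).
    """
    total = 0.0
    for n in range(3, 6):                  # pattern lengths 3, 4, 5
        for left_len in range(0, n):       # tokens taken from the left
            right_len = n - 1 - left_len   # remaining tokens from the right
            pattern = (left_context[len(left_context) - left_len:]
                       + [candidate]
                       + right_context[:right_len])
            total += ngram_logcount(tuple(pattern))
    return total

def sumlm_predict(confusion_set, left_context, right_context, ngram_logcount):
    """Output the candidate with the highest summed log-count."""
    return max(confusion_set,
               key=lambda c: sumlm_score(c, left_context, right_context,
                                         ngram_logcount))

The enumeration yields the 12 patterns of length 3 to 5 that contain the confusable slot; in practice the counts come from a precomputed N-gram collection rather than a Python callback.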
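Along the same lines, here is a rough sketch of LEX feature extraction for a single occurrence, assuming a 9-word window with the confusable word in the centre position. The feature-template names, the maximum N-gram length, and the placeholder convention for the confusable slot are illustrative assumptions; the text above only specifies collocation features, adjacent-word indicators, and positional N-gram indicators.

def lex_features(window, max_n=3):
    """Binary indicator features for one confusable-word occurrence.

    window: list of 9 tokens, with the confusable word at index 4 and 4
    tokens of context on either side (pad with a boundary symbol if needed).
    max_n: longest N-gram indicator to include (assumed value).
    """
    assert len(window) == 9
    feats = set()
    # Collocation features: the word observed at each window position.
    for pos, tok in enumerate(window):
        if pos != 4:
            feats.add(f"w[{pos - 4}]={tok}")
    # Indicators for the word immediately preceding / following.
    feats.add(f"prev={window[3]}")
    feats.add(f"next={window[5]}")
    # Positional N-gram indicators over the window; the confusable slot is
    # replaced by a placeholder here (an assumed convention, not specified
    # in the text).
    masked = window[:4] + ["<TARGET>"] + window[5:]
    for n in range(2, max_n + 1):
        for start in range(0, 9 - n + 1):
            gram = "_".join(masked[start:start + n])
            feats.add(f"ngram[{start - 4}:{n}]={gram}")
    return feats

# Example: lex_features("I would like a peace of that cake .".split())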
