are available. This position-weighting (viz. feature-weighting) technique is similar to the decision-list weighting in [Yarowsky, 1994]. We refer to this approach as RATIOLM in our experiments.

3.4 Evaluation Methodology

We compare our supervised and unsupervised systems on three experimental tasks: preposition selection, context-sensitive spelling correction, and non-referential pronoun detection. We evaluate using accuracy: the percentage of correctly-selected labels. As a baseline (BASE), we state the accuracy of always choosing the most-frequent class. For spelling correction, we average accuracies across the five confusion sets. We also provide learning curves by varying the number of labeled training examples. It is worth reiterating that this data is used solely to weight the contribution of the different filler counts; the filler counts themselves do not change, as they are always extracted from the full Google 5-gram Corpus.

For training SUPERLM, we use a support vector machine (SVM). SVMs achieve good performance on a range of tasks (Chapter 2, Section 2.3.4). We use a linear-kernel multiclass SVM (the efficient SVM multiclass instance of SVM struct [Tsochantaridis et al., 2004]). It slightly outperformed one-versus-all SVMs in preliminary experiments (and a later, more extensive study in Chapter 4 confirmed that these preliminary intuitions were justified). We tune the SVM's regularization parameter on the development sets. We apply add-one smoothing to the counts used in SUMLM and SUPERLM, while we add 39 to the counts in RATIOLM, following the approach of Carlson et al. [2008] (40 is the count cut-off used in the Google Corpus). For all unsupervised systems, we choose the most frequent class if no counts are available. For SUMLM, we use the development sets to decide which orders of N-grams to combine, finding orders 3-5 optimal for preposition selection, 2-5 optimal for spelling correction, and 4-5 optimal for non-referential pronoun detection. Development experiments also showed RATIOLM works better starting from 4-grams, not the 5-grams originally used in [Carlson et al., 2008].

3.5 Preposition Selection

3.5.1 The Task of Preposition Selection

Choosing the correct preposition is one of the most difficult tasks for a second-language learner to master, and errors involving prepositions constitute a significant proportion of errors made by learners of English [Chodorow et al., 2007]. Several automatic approaches to preposition selection have recently been developed [Felice and Pulman, 2007; Gamon et al., 2008]. We follow the experiments of Chodorow et al. [2007], who train a classifier to choose the correct preposition among 34 candidates. 4 In [Chodorow et al., 2007], feature vectors indicate words and part-of-speech tags near the preposition, similar to the features used in most disambiguation systems, and unlike the aggregate counts we use in our supervised preposition-selection N-gram model (Section 3.3.1).

4 Chodorow et al. do not identify the 34 prepositions they use. We use the 34 from the SemEval-07 preposition sense-disambiguation task [Litkowski and Hargraves, 2007]: about, across, above, after, against, along, among, around, as, at, before, behind, beneath, beside, between, by, down, during, for, from, in, inside, into, like, of, off, on, onto, over, round, through, to, towards, with
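To make the combination of smoothed counts and the most-frequent-class fallback concrete, the following is a minimal Python sketch of a SUMLM-style scorer under the settings described above. The count table, candidate dictionary, and function names are illustrative assumptions rather than the thesis's actual implementation; in the real system the counts are precomputed from the Google 5-gram Corpus, not held in an in-memory dictionary.

```python
import math
from collections import defaultdict

# Illustrative stand-in for Google 5-gram Corpus counts (assumption: in the
# real system these are precomputed corpus counts, not an in-memory dict).
ngram_counts = defaultdict(int)

def sumlm_score(filled_ngrams, smoothing=1.0):
    """Score one candidate by summing the logs of its add-one-smoothed
    N-gram counts. `filled_ngrams` holds the context N-grams (of the orders
    chosen on the development set, e.g. 3-5 for preposition selection) with
    the candidate filler inserted."""
    return sum(math.log(ngram_counts[ng] + smoothing) for ng in filled_ngrams)

def choose_label(candidate_ngrams, most_frequent_class):
    """Pick the highest-scoring candidate; if no counts are available for
    any candidate, fall back to the most frequent class (BASE)."""
    if all(ngram_counts[ng] == 0
           for ngs in candidate_ngrams.values()
           for ng in ngs):
        return most_frequent_class
    return max(candidate_ngrams,
               key=lambda cand: sumlm_score(candidate_ngrams[cand]))
```

An analogous sketch for RATIOLM would add 39 rather than 1 to each count, mirroring the count cut-off of 40 used in the Google Corpus.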
