generated word clusters. Several researchers have used the hierarchical Brown et al. [1992] clustering algorithm, and then created features for cluster membership at different levels of the hierarchy [Miller et al., 2004; Koo et al., 2008]. Rather than clustering single words, Lin and Wu [2009] use phrasal clusters, and provide features for cluster membership when different numbers of clusters are used in the clustering.

Features for the Output of Auxiliary Classifiers

Another way to create features from unlabeled data is to create features for the output of predictions on auxiliary problems that can be trained solely with unlabeled data [Ando and Zhang, 2005]. For example, we could create a prediction for whether the word arena occurs in a document. We can take all the documents where arena does and does not occur, and build a classifier using all the other words in the document. This classifier may predict that arena does occur if the words hockey, curling, fans, etc. occur. When the predictions are used as features, if they are useful, they will receive high weight at training time. At test time, if we see a word like curling, for example, even though it was never seen in our labeled set, it may cause the predictor for arena to return a high score, and thus also cause the document to be recognized as sports.

Note that since these examples can be created automatically, this problem (and other auxiliary problems in the Ando and Zhang approach) falls into the category of those with Natural Automatic Examples, as discussed above. One possible direction for future work is to construct auxiliary problems with pseudo-negative examples. For example, we could include the predictions of various configurations of our selectional-preference classifier (Chapter 6) as a feature in a discriminatively-trained language model. We took a similar approach in our work on gender [Bergsma et al., 2009a]. We trained a classifier on automatically-created examples, but used the output of this classifier as another feature in a classifier trained on a small amount of supervised data. This resulted in a substantial gain in performance over using the original prediction on its own: 95.5% versus 92.6% (but note that other features were combined with the prediction of the auxiliary classifier).
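To make the idea concrete, the following is a minimal sketch of one such auxiliary-classifier feature, assuming scikit-learn is available. The toy corpus, the single target word arena, and the feature name aux_arena_score are illustrative assumptions; the actual Ando and Zhang [2005] method trains many auxiliary predictors over a large unlabeled corpus and combines them, so this is a simplification rather than their algorithm.

```python
"""Sketch of an auxiliary-classifier feature (assumed setup, not actual code)."""
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny stand-in for a large unlabeled corpus (no topic labels are needed).
unlabeled_docs = [
    "the fans packed the arena for the hockey game",
    "curling fans filled the arena on saturday",
    "the committee debated the new tax bill",
    "parliament passed the budget after a long debate",
]
target_word = "arena"

# Auxiliary problem: predict whether the target word occurs in a document,
# using every *other* word in the document as input features.
aux_labels = [int(target_word in doc.split()) for doc in unlabeled_docs]
vectorizer = CountVectorizer(stop_words=[target_word])  # hide the target itself
X_aux = vectorizer.fit_transform(unlabeled_docs)
aux_clf = LogisticRegression().fit(X_aux, aux_labels)

# The auxiliary prediction becomes one extra feature for the supervised task.
# Words like "hockey" or "curling" raise the arena score, so sports vocabulary
# absent from the small labeled set can still fire this feature at test time.
def auxiliary_feature(doc):
    score = aux_clf.predict_proba(vectorizer.transform([doc]))[0, 1]
    return {"aux_arena_score": score}

print(auxiliary_feature("the curling team thrilled the fans"))
```

In a full system, the score from each auxiliary predictor would simply be appended to the usual feature vector of the supervised classifier, which then learns at training time how much weight each auxiliary prediction deserves.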
Features used in this Dissertation

In this dissertation, we create features from unlabeled data in several chapters and in several different ways. In Chapter 6, to assess whether a noun is compatible with a verb, we create features for the noun's distribution only with other verbs. Thus we characterize a noun by its verb contexts, rather than its full distribution, using fewer features than a naive representation based on the noun's full distributional profile. Chapters 3 and 5 also selectively use features from parts of the total distribution of a word, phrase, or pair of words (to characterize the relation between words, for noun compound bracketing and verb tag disambiguation in Chapter 5). In Chapter 3, we characterize contexts by using selected types from the distribution of other words that occur in the context. For the adjective-ordering work in Chapter 5, we choose an order based on the distribution of the adjectives individually and combined in a phrase. Our approaches are simple, but effective. Perhaps most importantly, by leveraging the counts in a web-scale N-gram corpus, they scale to make use of all the text data on the web. On the other hand, scaling most other semi-supervised techniques to even moderately-large collections of unlabeled text remains “future work” for a large number of published approaches in the machine learning and NLP literature.
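As an illustration of the kind of count-based decision such a corpus supports, the sketch below orders a pair of adjectives by comparing phrase frequencies. The hard-coded counts and the single most-frequent-order rule are assumptions for illustration; the adjective-ordering features actually used in Chapter 5 also draw on the distributions of the adjectives individually, so this is a simplified stand-in, not the dissertation's method.

```python
"""Sketch of a count-based decision over a web-scale N-gram corpus (assumed counts)."""

# Hypothetical phrase counts (in practice, looked up in an N-gram corpus).
NGRAM_COUNTS = {
    "big red ball": 12000,
    "red big ball": 150,
}

def order_adjectives(adj1, adj2, noun):
    """Prefer whichever adjective order is more frequent in the corpus."""
    count_12 = NGRAM_COUNTS.get(f"{adj1} {adj2} {noun}", 0)
    count_21 = NGRAM_COUNTS.get(f"{adj2} {adj1} {noun}", 0)
    return (adj1, adj2) if count_12 >= count_21 else (adj2, adj1)

print(order_adjectives("red", "big", "ball"))  # -> ('big', 'red')
```

Because each decision reduces to a handful of count lookups, the approach scales to corpora far larger than anything most discriminative semi-supervised methods can process directly.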
