other tasks we only had a handful of N-GM features; here there are 21K features for the distributional patterns of N,V pairs. With more features, we need more labeled data to make the N-GM features work properly. A similar issue with such high-dimensional global features is explored in [Huang and Yates, 2009]. Reducing this feature space by pruning or performing transformations may improve accuracy both in and out of domain.

5.7 Discussion and Future Work

Of all classifiers, LEX performs worst on all cross-domain tasks. Clearly, many of the regularities that a typical classifier exploits in one domain do not transfer to new genres. N-GM features, however, do not depend directly on training examples, and thus work better cross-domain. Of course, using web-scale N-grams is not the only way to create robust classifiers. Counts from any large auxiliary corpus may also help, but web counts should help more [Lapata and Keller, 2005]. Section 5.6.2 suggests that another way to mitigate domain-dependence is to have multiple feature views.

[Banko and Brill, 2001] argue that "a logical next step for the research community would be to direct efforts towards increasing the size of annotated training collections." Assuming we really do want systems that operate beyond the specific domains on which they are trained, the community also needs to identify which systems behave as in Figure 5.2, where the accuracy of the best in-domain system actually decreases with more training examples. Our results suggest that better features, such as web pattern counts, may help more than expanding the training data. Also, systems using web-scale unlabeled data will improve automatically as the web expands, without annotation effort.

In some sense, using web counts as features is a form of domain adaptation: adapting a web model to the training domain. How do we ensure these features are adapted well and not used in domain-specific ways (especially with many features to adapt, as in Section 5.6)? One option may be to regularize the classifier specifically for out-of-domain accuracy. We found that adjusting the SVM misclassification penalty (for more regularization) can either help or hurt out-of-domain accuracy. Other regularizations are possible. In fact, one good option might be to extend the results of Chapter 4 to cross-domain usage. In this chapter, each task had a domain-neutral unsupervised approach. We could encode these systems as linear classifiers with corresponding weights. Rather than a typical SVM that minimizes the weight norm ||w|| (plus the slacks), we could regularize toward the domain-neutral weights, and otherwise minimize weight variance, so that the classifier prefers this system in the absence of other information. This regularization could be optimized on creative splits of the training data, in an effort to simulate the lexical overlap we can expect with a particular test domain. A sketch of this modified objective is given below.
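To make the proposed regularization concrete, the following is a minimal sketch, not code from this dissertation: a linear classifier with a hinge loss, trained by simple per-example subgradient steps, whose regularizer penalizes distance from a prior weight vector w0 (the domain-neutral weights) rather than distance from zero. The function name, optimizer, and toy data are illustrative assumptions.

```python
import numpy as np

def train_prior_svm(X, y, w0, C=1.0, epochs=100, lr=0.01):
    """Linear SVM regularized toward a prior weight vector w0.

    Approximately minimizes, by per-example subgradient steps,
        0.5 * ||w - w0||^2 + C * sum_i max(0, 1 - y_i * (w . x_i))
    X: (n, d) feature matrix; y: labels in {-1, +1}; w0: (d,) prior weights.
    The bias term is omitted for brevity.
    """
    w0 = np.asarray(w0, dtype=float)
    w = w0.copy()                          # start at the domain-neutral weights
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            grad = w - w0                  # pulls w back toward w0, not toward 0
            if yi * np.dot(w, xi) < 1:     # margin violated: add hinge subgradient
                grad -= C * yi * xi
            w -= lr * grad
    return w

# Toy usage: two features, a prior that trusts only the first (web-count) feature.
if __name__ == "__main__":
    rng = np.random.RandomState(0)
    X = rng.randn(200, 2)
    y = np.sign(X[:, 0] + 0.1 * rng.randn(200))
    w0 = np.array([1.0, 0.0])              # hypothetical domain-neutral weights
    print("learned weights:", train_prior_svm(X, y, w0))
```

Setting w0 to the zero vector recovers the standard SVM objective, so the prior only changes the solution where the labeled data is uninformative: in the absence of other evidence, a feature's weight stays near its domain-neutral value.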
5.8 Conclusion

We presented results on tasks spanning a range of NLP research: generation, disambiguation, parsing and tagging. Using web-scale N-gram data improves accuracy on each task. When less training data is used, or when the system is applied to a different domain, N-gram features greatly improve performance. Since most supervised NLP systems do not use web-scale counts, further cross-domain evaluation may reveal some very brittle systems. Continued effort in new domains should be a priority for the community going forward.

This concludes the part of the dissertation that explored using web-scale N-gram data as features within a supervised classifier. I hope you enjoyed it.
