8.3 Future Work

This section outlines some specific ways to extend or apply insights from this thesis.

8.3.1 Improved Learning with Automatically-Generated Examples

In part two of this thesis, we achieved good results by automatically generating training examples, but we left open some natural questions arising from this work. For example, how many negatives should be generated for each positive? How do we ensure that training with pseudo-examples transfers well to testing on real examples? While the size of the learning problem prevented extensive experiments at the time the research was originally conducted, recent advances in large-scale machine learning enable much faster training. This allows us to perform large-scale empirical studies to address the above questions, and in combination with the usual advances in computer speed and memory, such studies will become even easier. In fact, some have even suggested that large-scale learning of linear classifiers is now essentially a solved problem [Yu et al., 2010]. This provides even greater impetus to test and exploit large-scale linear pseudo-classifiers in NLP.

8.3.2 Exploiting New ML Techniques

Another interesting direction for future research is the development of learning algorithms that exploit correlations between local and global features (see Chapter 1 for an example of local and global features for VBN/VBD disambiguation). Often the local and global patterns represent the same linguistic construction, and their weights should thus be similar. For example, suppose at test time we encounter the phrase, "it was the Bears who won." Even if we haven't seen the pattern "noun who verb" as local context in the training set, we may have seen it in the global context of a VBD training instance. Laplacian regularization (previously used to exploit the distributional similarity of words for syntactic parsing [Wang et al., 2006]) provides a principled way to force global and local features to have similar weights, although simpler feature-based techniques also exist [Daumé III, 2007]; a sketch of such an objective is given at the end of this section. In particular, combining Laplacian regularization with the scaling of feature values (to allow the more predictive, local features to have higher weight) is a promising direction to explore. In any case, identifying an effective solution here could have implications for other, related problems, such as multi-task learning [Raina et al., 2006], domain adaptation [McClosky et al., 2010], and sharing feature knowledge across languages [Berg-Kirkpatrick and Klein, 2010].

8.3.3 New NLP Problems

There are a number of other important, but largely unexplored, NLP problems where web-scale solutions could have an impact. One such problem is the detection of functional relations for information extraction. A functional relation is a binary relation where each element of the domain is related to a unique element in the codomain. For example, each person has a unique birthplace and date of birth, but may have multiple children, residences, and alma maters. There are a number of novel contextual clues that could flag these relations. For example, the indefinite articles a/an tend not to occur with functional relations: we frequently observe a cousin of in text, but we rarely see a birthplace of. The latter relation is functional (a sketch of how such a statistic might be computed is also given at the end of this section). Based on our results in Chapter 5, a classifier combining such simple statistics
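To make the weight-tying idea from Section 8.3.2 concrete, the following is a minimal sketch, not a formulation from this thesis, of how a Laplacian regularizer in the spirit of Wang et al. [2006] could encourage a local feature and its corresponding global feature to receive similar weights. Here \ell is the training loss, P is an assumed set of (local, global) feature index pairs that encode the same linguistic pattern, and s_{jk} >= 0 is an assumed similarity weight for each pair:

    \min_{w} \; \sum_{i=1}^{n} \ell\big(y_i, w^{\top} x_i\big)
        \;+\; \lambda \, \|w\|_2^2
        \;+\; \gamma \sum_{(j,k) \in P} s_{jk} \, (w_j - w_k)^2

The final term can be written as w^{\top} L w for the Laplacian L of the feature-similarity graph, so standard solvers for graph-regularized objectives apply. Scaling local feature values upward, as suggested above, would let the more predictive local features retain larger weights while still being pulled toward their global counterparts.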

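As an illustration of the indefinite-article clue from Section 8.3.3, here is a minimal Python sketch, not code from the thesis, that estimates how often a candidate relation noun is preceded by a/an. The get_count function is a hypothetical stand-in for a lookup into a large n-gram corpus, and the counts used below are invented for the example:

    def indefinite_article_ratio(noun, get_count):
        """Fraction of '<noun> of' occurrences preceded by an indefinite article."""
        indefinite = get_count("a " + noun + " of") + get_count("an " + noun + " of")
        total = get_count(noun + " of")
        return indefinite / total if total > 0 else 0.0

    # Toy illustration with invented counts: a low ratio (as for birthplace) is
    # weak evidence that the relation is functional; a high ratio (as for cousin)
    # is evidence that it is not.
    toy_counts = {"a cousin of": 900, "cousin of": 2000,
                  "a birthplace of": 5, "birthplace of": 3000}
    lookup = lambda phrase: toy_counts.get(phrase, 0)
    for noun in ("cousin", "birthplace"):
        print(noun, indefinite_article_ratio(noun, lookup))

A ratio of this kind would be only one feature among the many simple statistics that such a classifier could combine.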