
One way to find a better solution in this bumpy optimization space is to initialize or fix parameters of your model in ways that bias things toward what you know you want. For example, [Haghighi and Klein, 2010] fix a number of parameters in their entity-type / coreference model using prototypes of different classes. That is, they ensure, e.g., that Bush or Gore are in the PERSON class, as are the nominals president, official, etc., and that this class is referred to by the appropriate set of pronouns. They also set a number of other parameters to “fixed heuristic values.” When the unsupervised learning kicks in, it initially has less freedom to go off the rails, as the hand-tuning has started the model from a good spot in the optimization space.

One argument that is sometimes made against fully unsupervised approaches is that the set-up is a little unrealistic. You will likely want to evaluate your approach. To evaluate your approach, you will need some labeled data. If you can produce labeled data for testing, you can produce some labeled data for training. It seems that semi-supervised learning is a more realistic situation: you have lots of unlabeled data, but you also have a few labeled examples to help you configure your parameters. In our unsupervised pronoun resolution work [Cherry and Bergsma, 2005], we also used some labeled examples to re-weight the parameters learned by EM (using the discriminative technique known as maximum entropy).³

Another interaction between unsupervised and supervised learning occurs when an unsupervised method provides intermediate structural information for a supervised algorithm. For example, unsupervised algorithms often generate the unseen word-to-word alignments in statistical machine translation [Brown et al., 1993] and also the unseen character-to-phoneme alignments in grapheme-to-phoneme conversion [Jiampojamarn et al., 2007]. This alignment information is then leveraged by subsequent supervised processing.

2.5 Semi-Supervised Learning

Semi-supervised learning is a huge and growing area of interest in many different research communities. The name semi-supervised learning has come to essentially mean that a predictor is being created from information from both labeled and unlabeled examples. There are a variety of flavours of semi-supervised learning that are relevant to NLP and merit discussion. A good recent survey of semi-supervised techniques in general is by Zhu [2005].

Semi-supervised learning was the “Special Topic of Interest” at the 2009 Conference on Natural Language Learning. The organizers, Suzanne Stevenson and Xavier Carreras, provided a thoughtful motivation for semi-supervised learning in the call for papers.⁴

“The field of natural language learning has made great strides over the last 15 years, especially in the design and application of supervised and batch learning methods. However, two challenges arise with this kind of approach. First, in core NLP tasks, supervised approaches require typically large amounts of manually annotated data, and experience has shown that results often depend on the [...]”
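The alignment interaction described earlier, where an unsupervised learner supplies the unseen alignments that supervised processing then builds on, is easy to make concrete. Below is a minimal sketch of EM training for an IBM Model 1-style word aligner in the spirit of [Brown et al., 1993]; the toy bitext and every name in the code are illustrative assumptions, not the actual systems cited.

```python
# Minimal EM sketch for an IBM Model 1-style translation table t(f|e).
# Toy data and names are illustrative only.
from collections import defaultdict

# Tiny toy bitext: (foreign sentence, English sentence) pairs.
bitext = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]

# Initialize translation probabilities t(f|e) uniformly.
f_vocab = {f for fs, _ in bitext for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(10):  # a few EM iterations suffice on this toy corpus
    count = defaultdict(float)   # expected counts c(f, e)
    total = defaultdict(float)   # expected counts c(e)
    # E-step: collect expected alignment counts under the current t(f|e).
    for fs, es in bitext:
        for f in fs:
            norm = sum(t[(f, e)] for e in es)
            for e in es:
                p = t[(f, e)] / norm
                count[(f, e)] += p
                total[e] += p
    # M-step: re-estimate t(f|e) from the expected counts.
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

# The learned table encodes the unseen word-to-word alignments that a
# downstream supervised component can then consume as structural input.
print(sorted(((f, e), round(p, 2)) for (f, e), p in t.items() if p > 0.5))
```

On this toy bitext, EM quickly concentrates probability on the intended pairs (das with the, buch with book, and so on), illustrating how the unsupervised step fills in hidden structure for later supervised stages.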
³ The line between supervised and unsupervised learning can be a little blurry. We called our use of labeled data and maximum entropy the “supervised extension” of our unsupervised system in our EM paper [Cherry and Bergsma, 2005]. A later unsupervised approach by Charniak and Elsner [2009], which also uses EM training for pronoun resolution, involved tuning essentially the same number of hyperparameters by hand (to optimize performance on a development set) as the number of parameters we tuned with supervision. Is this still unsupervised learning?

⁴ http://www.cnts.ua.ac.be/conll2009/cfp.html
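As a concrete illustration of the “supervised extension” idea in footnote 3, here is a hedged sketch of re-weighting scores from an unsupervised, EM-trained model with a maximum entropy classifier learned from a handful of labeled examples; it uses scikit-learn's LogisticRegression as the maximum entropy learner, and the features and data below are invented for illustration, not those of [Cherry and Bergsma, 2005].

```python
# Sketch of the "supervised extension": scores produced by unsupervised
# EM-trained component models are re-weighted discriminatively using a few
# labeled examples. Feature values and labels are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row holds (hypothetical) log-probabilities that EM-trained components
# assign to a (pronoun, candidate-antecedent) pair, e.g. an agreement score
# and a lexical-compatibility score.
X_labeled = np.array([
    [-0.2, -0.5],
    [-0.1, -0.3],
    [-2.0, -1.5],
    [-1.8, -2.2],
])
y_labeled = np.array([1, 1, 0, 0])  # 1 = correct antecedent

# Maximum entropy (logistic regression) training learns per-feature weights,
# rather than trusting EM's implicit equal weighting of its components.
maxent = LogisticRegression()
maxent.fit(X_labeled, y_labeled)

print("learned feature weights:", maxent.coef_)
print("p(correct antecedent):", maxent.predict_proba([[-0.3, -0.4]])[0, 1])
```

The design point is the division of labour: EM does the heavy lifting on unlabeled text, while the small labeled set only has to fix a few combination weights, which is far less data-hungry than training the whole model with supervision.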
