with standard lexical features could possibly allow robust functional relation identification across different domains and genres.

8.3.4 Improving Core NLP Technologies

I also plan to apply the web-scale semi-supervised framework to core NLP technologies that are in great demand in the NLP community.

I have previously explored a range of enhancements to pronoun resolution systems [Cherry and Bergsma, 2005; Bergsma, 2005; Bergsma and Lin, 2006; Bergsma et al., 2008b; 2008a; 2009a]. My next step will be to develop and distribute an efficient, state-of-the-art, N-gram-enabled pronoun resolution system for academic and industrial applications. In conversation with colleagues at conferences, I have found that many researchers shy away from machine-learned pronoun resolution systems because of a fear that they would not work well on new domains (i.e., the specific domain on which the research is being conducted). By incorporating web-scale statistics into pronoun resolvers, I plan to produce a robust system that people can confidently apply wherever needed.

I will also use web-scale resources to make advances in parsing, the cornerstone technology of NLP. A parser gives the structure of a sentence, identifying who is doing what to whom. Parsing digs deeper into text than typical information retrieval technology, extracting richer levels of knowledge. Companies like Google and Microsoft have recognized the need to access these deeper linguistic structures and are making parsing a focus for their next generation of search engines. I will create an accurate open-domain parser: a domain-independent parser that can reliably analyze any genre of text.
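To illustrate the kind of web-scale statistic that can make a pronoun resolver robust, consider the following toy sketch of a gender-compatibility cue. This is only an illustration, not the actual system: the counts below are invented stand-ins for lookups in a web-scale N-gram corpus, and the pattern used (a noun co-occurring with a reflexive pronoun) is just one of many possible cues.

```python
# Toy sketch of a web-scale gender/number cue for pronoun resolution.
# The counts below are invented stand-ins for N-gram corpus lookups.
NGRAM_COUNTS = {
    ("company", "itself"): 90000,
    ("company", "himself"): 500,
    ("ceo", "himself"): 40000,
    ("ceo", "itself"): 300,
}

def gender_score(candidate: str, reflexive: str) -> float:
    """Relative frequency with which the candidate noun co-occurs
    with the given reflexive pronoun in the (hypothetical) counts."""
    total = sum(c for (n, _), c in NGRAM_COUNTS.items() if n == candidate)
    return NGRAM_COUNTS.get((candidate, reflexive), 0) / total if total else 0.0

def best_antecedent(candidates, reflexive):
    """Pick the candidate most compatible with the pronoun's gender."""
    return max(candidates, key=lambda n: gender_score(n, reflexive))

# "The CEO of the company said he ..." -> "he"/"himself" prefers "ceo"
print(best_antecedent(["company", "ceo"], "himself"))  # -> ceo
```

Because such counts are aggregated over the whole web rather than a single training corpus, a cue like this degrades gracefully when the resolver is moved to a new domain.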
A few approaches have successfully adapted a parser to a specific domain, such as general non-fiction [McClosky et al., 2006b] or biomedical text [Rimell and Clark, 2008], but these systems make assumptions that would be unrealistic when parsing text in a heterogeneous collection of web pages, for example. A parser that could reliably process a variety of genres, without manual involvement, would be of great practical and scientific value.

I will create an open-domain parser by essentially adapting to all the text on the web, again building on the robust classifiers presented in Chapter 5. Parsing decisions will be based on observations in web-scale N-gram data, rather than observed (and potentially overly-specific) constructions in a particular domain. Custom algorithms could also be used to extract web-scale knowledge for difficult parsing decisions in coordination, noun compounding, and prepositional phrase attachment. Work in open-domain parsing will also require the development of new, cross-domain, task-based evaluations; these could facilitate comparison of parsers based on different formalisms.

I have recently explored both methods to improve the speed of highly-accurate graph-based parsers [Bergsma and Cherry, 2010] (thus allowing the incorporation of new features with less overhead) and ways to incorporate web-scale statistics into the subtask of noun phrase parsing [Pitler et al., 2010]. In preliminary experiments, I have identified a number of other simple N-gram-derived features that improve full-sentence parsing accuracy.

I also plan to investigate whether open-domain parsing could be improved by manually annotating parses of the most frequent N-grams in our new web-scale N-gram corpus (Chapter 5). Recall that the new N-gram corpus includes part-of-speech tags. These tags might help identify N-grams that are likely to be both syntactic constituents and syntactically ambiguous (e.g., noun compounds).
The annotation could be done either by experts or by crowdsourcing via Amazon's Mechanical Turk. A similar technique was recently demonstrated successfully for MT [Bloodgood and Callison-Burch, 2010].
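As a concrete illustration of how web-scale counts can inform one of the difficult parsing decisions mentioned above, here is a minimal sketch of count-based prepositional-phrase attachment. The counts are hypothetical stand-ins for N-gram corpus lookups, and a real system would use such counts as features in a trained classifier rather than as a hard rule:

```python
# Sketch: prepositional-phrase attachment via web-scale counts.
# Counts are hypothetical stand-ins for N-gram corpus lookups.
NGRAM_COUNTS = {
    ("pizza", "with", "anchovies"): 12000,  # noun-attachment evidence
    ("eat", "with", "anchovies"): 800,
    ("pizza", "with", "fork"): 150,
    ("eat", "with", "fork"): 9000,          # verb-attachment evidence
}

def attach(verb: str, noun: str, prep: str, obj: str) -> str:
    """Attach the PP to whichever head co-occurs with it more often."""
    noun_count = NGRAM_COUNTS.get((noun, prep, obj), 0)
    verb_count = NGRAM_COUNTS.get((verb, prep, obj), 0)
    return "noun" if noun_count > verb_count else "verb"

print(attach("eat", "pizza", "with", "anchovies"))  # -> noun
print(attach("eat", "pizza", "with", "fork"))       # -> verb
```

The same count-comparison pattern extends naturally to coordination scope and noun-compound bracketing, the other attachment-style decisions discussed above.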
My focus is thus on enabling robust, open-domain systems through better features and new kinds of labeled data. These improvements should combine constructively with recent, orthogonal advances in domain detection and adaptation [McClosky et al., 2010].

8.3.5 Mining New Data Sources

While web-scale N-gram data is very effective, future NLP technology will combine information from a variety of other structured and unstructured data sources to make better natural language inferences. Query logs, parallel bilingual corpora, and collaborative projects like Wikipedia will provide crucial knowledge for syntactic and semantic analysis. For example, there is a tremendous amount of untapped information in the Wikipedia edit histories, which record all the changes made to Wikipedia pages. As a first step in harvesting this information, we could extract a database of real spelling corrections made to Wikipedia pages. This data could be used to train and test NLP spelling correction systems at an unprecedented scale.

Furthermore, it also seems likely that information from the massive volume of online images and video will be used to inform automatic language processing. Many simple statistics can be computed from visual sources and stored, just like N-gram counts, in precompiled databases. For example, we might extract visual descriptors using algorithms like the popular and efficient SIFT algorithm [Lowe, 1999], convert these descriptors to image codewords (i.e., the bag-of-words representation of images), and then store the codeword co-occurrence counts in a large database.

In fact, services like the Google Image Search and Flickr photo-sharing websites effectively already link caption words to images in a database. These services could be exploited for building special language models, for example, for selectional preference.
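The codeword co-occurrence database described above might be assembled as in the following sketch, where each image has already been reduced to a bag of codeword ids (quantized descriptors); the ids and data here are invented for illustration:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(images):
    """Count unordered codeword pairs occurring in the same image,
    analogous to precompiled N-gram co-occurrence counts over text."""
    counts = Counter()
    for codewords in images:
        # each image contributes each distinct pair at most once
        for pair in combinations(sorted(set(codewords)), 2):
            counts[pair] += 1
    return counts

# Each image is a bag of codeword ids (e.g., quantized SIFT
# descriptors); the ids here are invented for illustration.
images = [[3, 7, 7, 12], [3, 7, 9], [7, 12]]
print(cooccurrence_counts(images)[(3, 7)])  # -> 2
```

Once precomputed, such a table could be queried exactly like an N-gram count database, making visual statistics a drop-in feature source for the classifiers of Chapter 5.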
When creating features for nouns occurring with particular verbs, for example (as in Chapter 6), we might query the image search service using the noun string as the keyword, and then create SIFT-style features for the retrieved images. Could we build a model, for example, of things that can be eaten, purely based on visual images of edible substances?

In general, I envision some breakthroughs once NLP moves beyond solving text processing in isolation and instead adopts an approach that integrates advances in large-scale processing across a variety of disciplines.

– Thanks for reading the dissertation!