Bibliography

[Adamson and Boreham, 1974] George W. Adamson and Jillian Boreham. The use of an association measure based on character structure to identify semantically related pairs of words and document titles. Information Storage and Retrieval, 10, 1974.

[Agichtein and Gravano, 2000] Eugene Agichtein and Luis Gravano. Snowball: Extracting relations from large plain-text collections. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000.

[Ailon and Mohri, 2008] Nir Ailon and Mehryar Mohri. An efficient reduction of ranking to classification. In COLT, 2008.

[Ando and Zhang, 2005] Rie Kubota Ando and Tong Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6, 2005.

[Bahl et al., 1983] Lalit R. Bahl, Frederick Jelinek, and Robert L. Mercer. A maximum likelihood approach to continuous speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5(2), 1983.

[Baker, 1975] James K. Baker. The DRAGON system - an overview. IEEE Transactions on Acoustics, Speech, and Signal Processing, 23(1), 1975.

[Banko and Brill, 2001] Michele Banko and Eric Brill. Scaling to very very large corpora for natural language disambiguation. In ACL, 2001.

[Barr et al., 2008] Cory Barr, Rosie Jones, and Moira Regelson. The linguistic structure of English web-search queries. In EMNLP, 2008.

[Barzilay and Lapata, 2006] Regina Barzilay and Mirella Lapata. Aggregation via set partitioning for natural language generation. In HLT-NAACL, 2006.

[Bean and Riloff, 2004] David L. Bean and Ellen Riloff. Unsupervised learning of contextual role knowledge for coreference resolution. In HLT-NAACL, 2004.

[Belkin et al., 2006] Mikhail Belkin, Partha Niyogi, and Vikas Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7:2399–2434, 2006.

[Bengio et al., 2003] Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. Journal of Machine Learning Research, 3, 2003.

[Berg-Kirkpatrick and Klein, 2010] Taylor Berg-Kirkpatrick and Dan Klein. Phylogenetic grammar induction. In ACL, 2010.

[Berger et al., 1996] Adam L. Berger, Stephen A. Della Pietra, and Vincent J. Della Pietra. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1), 1996.
[Bergsma and Cherry, 2010] Shane Bergsma and Colin Cherry. Fast and accurate arc filtering for dependency parsing. In COLING, 2010.

[Bergsma and Kondrak, 2007a] Shane Bergsma and Grzegorz Kondrak. Alignment-based discriminative string similarity. In ACL, 2007.

[Bergsma and Kondrak, 2007b] Shane Bergsma and Grzegorz Kondrak. Multilingual cognate identification using integer linear programming. In RANLP Workshop on Acquisition and Management of Multilingual Lexicons, 2007.

[Bergsma and Lin, 2006] Shane Bergsma and Dekang Lin. Bootstrapping path-based pronoun resolution. In COLING-ACL, 2006.

[Bergsma and Wang, 2007] Shane Bergsma and Qin Iris Wang. Learning noun phrase query segmentation. In EMNLP-CoNLL, 2007.

[Bergsma et al., 2008a] Shane Bergsma, Dekang Lin, and Randy Goebel. Discriminative learning of selectional preference from unlabeled text. In EMNLP, 2008.

[Bergsma et al., 2008b] Shane Bergsma, Dekang Lin, and Randy Goebel. Distributional identification of non-referential pronouns. In ACL-08: HLT, 2008.

[Bergsma et al., 2009a] Shane Bergsma, Dekang Lin, and Randy Goebel. Glen, Glenda or Glendale: Unsupervised and semi-supervised learning of English noun gender. In CoNLL, 2009.

[Bergsma et al., 2009b] Shane Bergsma, Dekang Lin, and Randy Goebel. Web-scale N-gram models for lexical disambiguation. In IJCAI, 2009.

[Bergsma et al., 2010a] Shane Bergsma, Aditya Bhargava, Hua He, and Grzegorz Kondrak. Predicting the semantic compositionality of prefix verbs. In EMNLP, 2010.

[Bergsma et al., 2010b] Shane Bergsma, Dekang Lin, and Dale Schuurmans. Improved natural language learning via variance-regularization support vector machines. In CoNLL, 2010.

[Bergsma et al., 2010c] Shane Bergsma, Emily Pitler, and Dekang Lin. Creating robust supervised classifiers via web-scale n-gram data. In ACL, 2010.

[Bergsma, 2005] Shane Bergsma. Automatic acquisition of gender information for anaphora resolution. In Proceedings of the 18th Conference of the Canadian Society for Computational Studies of Intelligence (Canadian AI’2005), 2005.

[Bikel, 2004] Daniel M. Bikel. Intricacies of Collins’ parsing model. Computational Linguistics, 30(4), 2004.

[Bilenko and Mooney, 2003] Mikhail Bilenko and Raymond J. Mooney. Adaptive duplicate detection using learnable string similarity measures. In KDD, 2003.

[Blitzer et al., 2005] John Blitzer, Amir Globerson, and Fernando Pereira. Distributed latent variable models of lexical co-occurrences. In AISTATS, 2005.

[Blitzer et al., 2007] John Blitzer, Mark Dredze, and Fernando Pereira. Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In ACL, 2007.

[Bloodgood and Callison-Burch, 2010] Michael Bloodgood and Chris Callison-Burch. Bucking the trend: Large-scale cost-focused active learning for statistical machine translation. In ACL, 2010.

[Blum and Mitchell, 1998] Avrim Blum and Tom Mitchell. Combining labeled and unlabeled data with co-training. In COLT, 1998.