[Brants and Franz, 2006] Thorsten Brants and Alex Franz. The Google Web 1T 5-gram Corpus Version 1.1. LDC2006T13, 2006.
[Brants et al., 2007] Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, and Jeffrey Dean. Large language models in machine translation. In EMNLP, 2007.
[Brants, 2000] Thorsten Brants. TnT – a statistical part-of-speech tagger. In ANLP, 2000.
[Brill and Moore, 2000] Eric Brill and Robert Moore. An improved error model for noisy channel spelling correction. In ACL, 2000.
[Brill et al., 2001] Eric Brill, Jimmy Lin, Michele Banko, Susan Dumais, and Andrew Ng. Data-intensive question answering. In TREC, 2001.
[Brin, 1998] Sergey Brin. Extracting patterns and relations from the world wide web. In WebDB Workshop at 6th International Conference on Extending Database Technology, 1998.
[Brockmann and Lapata, 2003] Carsten Brockmann and Mirella Lapata. Evaluating and combining approaches to selectional preference acquisition. In EACL, 2003.
[Brown et al., 1992] Peter F. Brown, Vincent J. Della Pietra, Peter V. de Souza, Jennifer C. Lai, and Robert L. Mercer. Class-based n-gram models of natural language. Computational Linguistics, 18(4), 1992.
[Brown et al., 1993] Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2), 1993.
[Carlson et al., 2001] Andrew J. Carlson, Jeffrey Rosen, and Dan Roth. Scaling up context-sensitive text correction. In AAAI/IAAI, 2001.
[Carlson et al., 2008] Andrew Carlson, Tom M. Mitchell, and Ian Fette. Data analysis project: Leveraging massive textual corpora using n-gram statistics. Technical Report CMU-ML-08-107, 2008.
[Chambers and Jurafsky, 2008] Nathanael Chambers and Dan Jurafsky. Unsupervised learning of narrative event chains. In ACL, 2008.
[Chambers and Jurafsky, 2010] Nathanael Chambers and Dan Jurafsky. Improving the use of pseudo-words for evaluating selectional preferences. In ACL, 2010.
[Charniak and Elsner, 2009] Eugene Charniak and Micha Elsner. EM works for pronoun anaphora resolution. In EACL, 2009.
[Chen and Goodman, 1998] Stanley F. Chen and Joshua Goodman. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Harvard University, 1998.
[Cherry and Bergsma, 2005] Colin Cherry and Shane Bergsma. An Expectation Maximization approach to pronoun resolution. In CoNLL, 2005.
[Chklovski and Pantel, 2004] Timothy Chklovski and Patrick Pantel. VerbOcean: Mining the web for fine-grained semantic verb relations. In EMNLP, pages 33–40, 2004.
[Chodorow et al., 2007] Martin Chodorow, Joel R. Tetreault, and Na-Rae Han. Detection of grammatical errors involving prepositions. In ACL-SIGSEM Workshop on Prepositions, 2007.
[Chomsky, 1956] Noam Chomsky. Three models for the description of language. IRE Transactions on Information Theory, 2(3), 1956.
[Church and Hanks, 1990] Kenneth W. Church and Patrick Hanks. Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1), 1990.
[Church and Mercer, 1993] Kenneth W. Church and Robert L. Mercer. Introduction to the special issue on computational linguistics using large corpora. Computational Linguistics, 19(1), 1993.
[Church and Patil, 1982] Kenneth Church and Ramesh Patil. Coping with syntactic ambiguity or how to put the block in the box on the table. Computational Linguistics, 8(3-4):139–149, 1982.
[Church et al., 2007] Kenneth Church, Ted Hart, and Jianfeng Gao. Compressing trigram language models with Golomb coding. In EMNLP-CoNLL, 2007.
[Church, 1993] Kenneth W. Church. Char_align: A program for aligning parallel texts at the character level. In ACL, 1993.
[Clark and Weir, 2002] Stephen Clark and David Weir. Class-based probability estimation using a semantic hierarchy. Computational Linguistics, 28(2), 2002.
[Cohn et al., 1994] David Cohn, Les Atlas, and Richard Ladner. Improving generalization with active learning. Mach. Learn., 15(2):201–221, 1994.
[Collins and Koo, 2005] Michael Collins and Terry Koo. Discriminative reranking for natural language parsing. Computational Linguistics, 31(1), 2005.
[Collins and Singer, 1999] Michael Collins and Yoram Singer. Unsupervised models for named entity classification. In EMNLP-VLC, 1999.
[Collins, 2002] Michael Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In EMNLP, 2002.
[Cortes and Vapnik, 1995] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Mach. Learn., 20(3):273–297, 1995.
[CPLEX, 2005] CPLEX. IBM ILOG CPLEX 9.1. www.ilog.com/products/cplex/, 2005.
[Crammer and Singer, 2001] Koby Crammer and Yoram Singer. On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2:265–292, 2001.
[Crammer and Singer, 2003] Koby Crammer and Yoram Singer. Ultraconservative online algorithms for multiclass problems. JMLR, 3:951–991, 2003.
[Crammer et al., 2006] Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, and Yoram Singer. Online passive-aggressive algorithms. JMLR, 7:551–585, 2006.
[Cucerzan and Agichtein, 2005] Silviu Cucerzan and Eugene Agichtein. Factoid question answering over unstructured and structured web content. In TREC, 2005.
[Cucerzan and Yarowsky, 1999] Silviu Cucerzan and David Yarowsky. Language independent named entity recognition combining morphological and contextual evidence. In EMNLP-VLC, 1999.
[Cucerzan and Yarowsky, 2002] Silviu Cucerzan and David Yarowsky. Augmented mixture models for lexical disambiguation. In EMNLP, 2002.
[Cucerzan and Yarowsky, 2003] Silviu Cucerzan and David Yarowsky. Minimally supervised induction of grammatical gender. In NAACL, 2003.
[Dagan and Itai, 1990] Ido Dagan and Alan Itai. Automatic processing of large corpora for the resolution of anaphora references. In COLING, volume 3, 1990.
[Dagan et al., 1999] Ido Dagan, Lillian Lee, and Fernando C. N. Pereira. Similarity-based models of word cooccurrence probabilities. Mach. Learn., 34(1-3), 1999.