[Litkowski and Hargraves, 2007] Ken Litkowski and Orin Hargraves. SemEval-2007 Task 06: Word-sense disambiguation of prepositions. In SemEval, 2007.
[Liu and Curran, 2006] Vinci Liu and James R. Curran. Web text corpus for natural language processing. In EACL, 2006.
[Lodhi et al., 2002] Huma Lodhi, Craig Saunders, John Shawe-Taylor, Nello Cristianini, and Chris Watkins. Text classification using string kernels. JMLR, 2:419–444, 2002.
[Lowe, 1999] David G. Lowe. Object recognition from local scale-invariant features. In ICCV, 1999.
[Malouf, 2000] Robert Malouf. The order of prenominal adjectives in natural language generation. In ACL, 2000.
[Mann and McCallum, 2007] Gideon S. Mann and Andrew McCallum. Simple, robust, scalable semi-supervised learning via expectation regularization. In ICML, 2007.
[Mann and Yarowsky, 2001] Gideon S. Mann and David Yarowsky. Multipath translation lexicon induction via bridge languages. In NAACL, 2001.
[Manning and Schütze, 1999] Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
[Marcus et al., 1993] Mitchell P. Marcus, Beatrice Santorini, and Mary Marcinkiewicz. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330, 1993.
[Marcus, 1980] Mitchell P. Marcus. Theory of Syntactic Recognition for Natural Languages. MIT Press, Cambridge, MA, USA, 1980.
[Marton et al., 2009] Yuval Marton, Chris Callison-Burch, and Philip Resnik. Improved statistical machine translation using monolingually-derived paraphrases. In ACL-IJCNLP, 2009.
[McCallum et al., 2005] Andrew McCallum, Kedar Bellare, and Fernando Pereira. A conditional random field for discriminatively-trained finite-state string edit distance. In UAI, 2005.
[McCallum, 1996] Andrew Kachites McCallum. Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering. http://www.cs.cmu.edu/~mccallum/bow, 1996.
[McClosky et al., 2006a] David McClosky, Eugene Charniak, and Mark Johnson. Effective self-training for parsing. In HLT-NAACL, 2006.
[McClosky et al., 2006b] David McClosky, Eugene Charniak, and Mark Johnson. Reranking and self-training for parser adaptation. In COLING-ACL, 2006.
[McClosky et al., 2010] David McClosky, Eugene Charniak, and Mark Johnson. Automatic domain adaptation for parsing. In NAACL HLT, 2010.
[McEnery and Oakes, 1996] Tony McEnery and Michael P. Oakes. Sentence and word alignment in the CRATER project. In Using Corpora for Language Research. Longman, 1996.
[Melamed, 1998] I. Dan Melamed. Manual annotation of translational equivalence. Technical Report IRCS #98-07, University of Pennsylvania, 1998.
[Melamed, 1999] I. Dan Melamed. Bitext maps and alignment via pattern recognition. Computational Linguistics, 25(1), 1999.
[Mihalcea and Moldovan, 1999] Rada Mihalcea and Dan I. Moldovan. A method for word sense disambiguation of unrestricted text. In ACL, 1999.
[Miller et al., 1990] George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine J. Miller. Introduction to WordNet: an on-line lexical database. International Journal of Lexicography, 3(4), 1990.
[Miller et al., 2004] Scott Miller, Jethran Guinness, and Alex Zamanian. Name tagging with word clusters and discriminative training. In HLT-NAACL, 2004.
[Mitchell, 2009] Margaret Mitchell. Class-based ordering of prenominal modifiers. In 12th European Workshop on Natural Language Generation, 2009.
[Modjeska et al., 2003] Natalia N. Modjeska, Katja Markert, and Malvina Nissim. Using the Web in machine learning for other-anaphora resolution. In EMNLP, 2003.
[MUC-7, 1997] MUC-7. Coreference task definition (v3.0, 13 Jul 97). In Proceedings of the Seventh Message Understanding Conference (MUC-7), 1997.
[Müller et al., 2002] Christoph Müller, Stefan Rapp, and Michael Strube. Applying co-training to reference resolution. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002.
[Müller, 2006] Christoph Müller. Automatic detection of nonreferential It in spoken multiparty dialog. In EACL, 2006.
[Mulloni and Pekar, 2006] Andrea Mulloni and Viktor Pekar. Automatic detection of orthographic cues for cognate recognition. In LREC, 2006.
[Munteanu and Marcu, 2005] Dragos S. Munteanu and Daniel Marcu. Improving machine translation performance by exploiting non-parallel corpora. Computational Linguistics, 31(4):477–504, 2005.
[Nakov and Hearst, 2005a] Preslav Nakov and Marti Hearst. Search engine statistics beyond the n-gram: Application to noun compound bracketing. In CoNLL, 2005.
[Nakov and Hearst, 2005b] Preslav Nakov and Marti Hearst. Using the web as an implicit training set: application to structural ambiguity resolution. In HLT/EMNLP, 2005.
[Nakov, 2007] Preslav Ivanov Nakov. Using the Web as an Implicit Training Set: Application to Noun Compound Syntax and Semantics. PhD thesis, University of California, Berkeley, 2007.
[Ng and Cardie, 2003a] Vincent Ng and Claire Cardie. Bootstrapping coreference classifiers with multiple machine learning algorithms. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2003.
[Ng and Cardie, 2003b] Vincent Ng and Claire Cardie. Weakly supervised natural language learning without redundant views. In Proceedings of the HLT-NAACL, 2003.
[Ng and Jordan, 2002] Andrew Y. Ng and Michael I. Jordan. Discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In NIPS, 2002.
[Och and Ney, 2002] Franz J. Och and Hermann Ney. Discriminative training and maximum entropy models for statistical machine translation. In ACL, 2002.
[Okanohara and Tsujii, 2007] Daisuke Okanohara and Jun'ichi Tsujii. A discriminative language model with pseudo-negative samples. In ACL, 2007.
[Paice and Husk, 1987] Chris D. Paice and Gareth D. Husk. Towards the automatic recognition of anaphoric features in English text: the impersonal pronoun "it". Computer Speech and Language, 2:109–132, 1987.