[Daumé III, 2007] Hal Daumé III. Frustratingly easy domain adaptation. In ACL, 2007.
[Denis and Baldridge, 2007] Pascal Denis and Jason Baldridge. Joint determination of anaphoricity and coreference using integer programming. In NAACL-HLT, 2007.
[Dou et al., 2009] Qing Dou, Shane Bergsma, Sittichai Jiampojamarn, and Grzegorz Kondrak. A ranking approach to stress prediction for letter-to-phoneme conversion. In ACL-IJCNLP, 2009.
[Dredze et al., 2008] Mark Dredze, Koby Crammer, and Fernando Pereira. Confidence-weighted linear classification. In ICML, 2008.
[Duda and Hart, 1973] Richard O. Duda and Peter E. Hart. Pattern Classification and Scene Analysis. John Wiley & Sons, 1973.
[Erk, 2007] Katrin Erk. A simple, similarity-based model for selectional preference. In ACL, 2007.
[Etzioni et al., 2005] Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. Unsupervised named-entity extraction from the web: an experimental study. Artif. Intell., 165(1), 2005.
[Evans, 2001] Richard Evans. Applying machine learning toward an automatic classification of it. Literary and Linguistic Computing, 16(1), 2001.
[Even-Zohar and Roth, 2000] Yair Even-Zohar and Dan Roth. A classification approach to word prediction. In NAACL, 2000.
[Evert, 2004] Stefan Evert. Significance tests for the evaluation of ranking methods. In COLING, 2004.
[Fan et al., 2008] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: A library for large linear classification. JMLR, 9:1871–1874, 2008.
[Felice and Pulman, 2007] Rachele De Felice and Stephen G. Pulman. Automatically acquiring models of preposition use. In ACL-SIGSEM Workshop on Prepositions, 2007.
[Fleischman et al., 2003] Michael Fleischman, Eduard Hovy, and Abdessamad Echihabi. Offline strategies for online question answering: answering questions before they are asked. In ACL, 2003.
[Fung and Roth, 2005] Pascale Fung and Dan Roth. Guest editors' introduction: Machine learning in speech and language technologies. Machine Learning, 60(1-3):5–9, 2005.
[Gale et al., 1992] William A. Gale, Kenneth W. Church, and David Yarowsky. One sense per discourse. In DARPA Speech and Natural Language Workshop, 1992.
[Gamon et al., 2008] Michael Gamon, Jianfeng Gao, Chris Brockett, Alexandre Klementiev, William B. Dolan, Dmitriy Belenko, and Lucy Vanderwende. Using contextual speller techniques and language modeling for ESL error correction. In IJCNLP, 2008.
[Ge et al., 1998] Niyu Ge, John Hale, and Eugene Charniak. A statistical approach to anaphora resolution. In Proceedings of the Sixth Workshop on Very Large Corpora, 1998.
[Gildea, 2001] Dan Gildea. Corpus variation and parser performance. In EMNLP, 2001.
[Golding and Roth, 1999] Andrew R. Golding and Dan Roth. A Winnow-based approach to context-sensitive spelling correction. Mach. Learn., 34(1-3):107–130, 1999.
[Graff, 2003] David Graff. English Gigaword. LDC2003T05, 2003.
[Grefenstette, 1999] Gregory Grefenstette. The World Wide Web as a resource for example-based machine translation tasks. In ASLIB Conference on Translating and the Computer, 1999.
[Haghighi and Klein, 2006] Aria Haghighi and Dan Klein. Prototype-driven learning for sequence models. In HLT-NAACL, 2006.
[Haghighi and Klein, 2010] Aria Haghighi and Dan Klein. Coreference resolution in a modular, entity-centered model. In HLT-NAACL, 2010.
[Hajič and Hajičová, 2007] Jan Hajič and Eva Hajičová. Some of our best friends are statisticians. In TSD, 2007.
[Har-Peled et al., 2003] Sariel Har-Peled, Dan Roth, and Dav Zimak. Constraint classification for multiclass classification and ranking. In NIPS, 2003.
[Harabagiu et al., 2001] Sanda Harabagiu, Razvan Bunescu, and Steven Maiorano. Text and knowledge mining for coreference resolution. In NAACL, 2001.
[Harman, 1992] Donna Harman. The DARPA TIPSTER project. ACM SIGIR Forum, 26(2), 1992.
[Hawker et al., 2007] Tobias Hawker, Mary Gardiner, and Andrew Bennetts. Practical queries of a massive n-gram database. In Proc. Australasian Language Technology Association Workshop, 2007.
[Hearst, 1992] Marti A. Hearst. Automatic acquisition of hyponyms from large text corpora. In COLING, 1992.
[Hirst and Budanitsky, 2005] Graeme Hirst and Alexander Budanitsky. Correcting real-word spelling errors by restoring lexical cohesion. Nat. Lang. Eng., 11(1):87–111, 2005.
[Hirst, 1981] Graeme Hirst. Anaphora in Natural Language Understanding: A Survey. Springer Verlag, 1981.
[Hobbs, 1978] Jerry Hobbs. Resolving pronoun references. Lingua, 44(311), 1978.
[Holmes et al., 1989] Virginia M. Holmes, Laurie Stowe, and Linda Cupples. Lexical expectations in parsing complement-verb sentences. Journal of Memory and Language, 28, 1989.
[Hovy et al., 2006] Eduard Hovy, Mitchell Marcus, Martha Palmer, Lance Ramshaw, and Ralph Weischedel. OntoNotes: the 90% solution. In HLT-NAACL, 2006.
[Hsu and Lin, 2002] Chih-Wei Hsu and Chih-Jen Lin. A comparison of methods for multiclass support vector machines. IEEE Trans. Neur. Networks, 13(2):415–425, 2002.
[Huang and Yates, 2009] Fei Huang and Alexander Yates. Distributional representations for handling sparsity in supervised sequence-labeling. In ACL-IJCNLP, 2009.
[Jelinek, 1976] Fred Jelinek. Continuous speech recognition by statistical methods. Proceedings of the IEEE, 64(4), 1976.
[Jelinek, 2005] Frederick Jelinek. Some of my best friends are linguists. Language Resources and Evaluation, 39, 2005.
[Jelinek, 2009] Frederick Jelinek. The dawn of statistical ASR and MT. Comput. Linguist., 35(4):483–494, 2009.
[Ji and Lin, 2009] Heng Ji and Dekang Lin. Gender and animacy knowledge discovery from web-scale N-grams for unsupervised person mention detection. In PACLIC, 2009.