[Tetreault and Chodorow, 2008] Joel R. Tetreault and Martin Chodorow. The ups and downs of preposition error detection in ESL writing. In COLING, 2008.
[Tiedemann, 1999] Jörg Tiedemann. Automatic construction of weighted string similarity measures. In EMNLP-VLC, 1999.
[Tong and Koller, 2002] Simon Tong and Daphne Koller. Support vector machine active learning with applications to text classification. JMLR, 2:45–66, 2002.
[Tratz and Hovy, 2010] Stephen Tratz and Eduard Hovy. A taxonomy, dataset, and classifier for automatic noun compound interpretation. In ACL, 2010.
[Tsochantaridis et al., 2004] Ioannis Tsochantaridis, Thomas Hofmann, Thorsten Joachims, and Yasemin Altun. Support vector machine learning for interdependent and structured output spaces. In ICML, 2004.
[Tsochantaridis et al., 2005] Ioannis Tsochantaridis, Thorsten Joachims, Thomas Hofmann, and Yasemin Altun. Large margin methods for structured and interdependent output variables. JMLR, 6:1453–1484, 2005.
[Tsuruoka et al., 2005] Yoshimasa Tsuruoka, Yuka Tateishi, Jin-Dong Kim, Tomoko Ohta, John McNaught, Sophia Ananiadou, and Jun'ichi Tsujii. Developing a robust part-of-speech tagger for biomedical text. In Advances in Informatics, 2005.
[Turian et al., 2010] Joseph Turian, Lev Ratinov, and Yoshua Bengio. Word representations: A simple and general method for semi-supervised learning. In ACL, 2010.
[Turney, 2002] Peter D. Turney. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In ACL, 2002.
[Turney, 2003] Peter D. Turney. Coherent keyphrase extraction via web mining. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03), Acapulco, Mexico, 2003.
[Turney, 2006] Peter D. Turney. Similarity of semantic relations. Computational Linguistics, 32(3):379–416, 2006.
[Uitdenbogerd, 2005] Sandra Uitdenbogerd. Readability of French as a foreign language and its uses. In Proceedings of the Australian Document Computing Symposium, 2005.
[Vadas and Curran, 2007a] David Vadas and James R. Curran. Adding noun phrase structure to the Penn Treebank. In ACL, 2007.
[Vadas and Curran, 2007b] David Vadas and James R. Curran. Large-scale supervised models for noun phrase bracketing. In PACLING, 2007.
[van den Bosch, 2006] Antal van den Bosch. All-word prediction as the ultimate confusable disambiguation. In Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing, 2006.
[Vapnik, 1998] Vladimir N. Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998.
[Vorhees, 2002] Ellen Vorhees. Overview of the TREC 2002 question answering track. In Proceedings of the Eleventh Text REtrieval Conference (TREC), 2002.
[Wang et al., 2005] Qin Iris Wang, Dale Schuurmans, and Dekang Lin. Strictly lexical dependency parsing. In International Workshop on Parsing Technologies, 2005.
[Wang et al., 2006] Qin Iris Wang, Colin Cherry, Dan Lizotte, and Dale Schuurmans. Improved large margin dependency parsing via local constraints and Laplacian regularization. In CoNLL, 2006.
[Wang et al., 2008] Qin Iris Wang, Dale Schuurmans, and Dekang Lin. Semi-supervised convex training for dependency parsing. In ACL-08: HLT, 2008.
[Weeds and Weir, 2005] Julie Weeds and David Weir. Co-occurrence retrieval: A flexible framework for lexical distributional similarity. Computational Linguistics, 31(4), 2005.
[Weston and Watkins, 1998] Jason Weston and Chris Watkins. Multi-class support vector machines. Technical Report CSD-TR-98-04, Department of Computer Science, Royal Holloway, University of London, 1998.
[Wilcox-O'Hearn et al., 2008] Amber Wilcox-O'Hearn, Graeme Hirst, and Alexander Budanitsky. Real-word spelling correction with trigrams: A reconsideration of the Mays, Damerau, and Mercer model. In CICLing, 2008.
[Witten and Frank, 2005] Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, second edition, 2005.
[Xu et al., 2009] Peng Xu, Jaeho Kang, Michael Ringgaard, and Franz Josef Och. Using a dependency parser to improve SMT for subject-object-verb languages. In HLT-NAACL, 2009.
[Yang et al., 2005] Xiaofeng Yang, Jian Su, and Chew Lim Tan. Improving pronoun resolution using statistics-based semantic compatibility information. In ACL, 2005.
[Yarowsky, 1994] David Yarowsky. Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French. In ACL, 1994.
[Yarowsky, 1995] David Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. In ACL, 1995.
[Yi et al., 2008] Xing Yi, Jianfeng Gao, and William B. Dolan. A web-based English proofing system for English as a second language users. In IJCNLP, 2008.
[Yoon et al., 2007] Su-Youn Yoon, Kyoung-Young Kim, and Richard Sproat. Multilingual transliteration using feature based phonetic method. In ACL, pages 112–119, 2007.
[Yu et al., 2007] Liang-Chih Yu, Chung-Hsien Wu, Andrew Philpot, and Eduard Hovy. OntoNotes: Sense pool verification using Google N-gram and statistical tests. In OntoLex Workshop at the 6th International Semantic Web Conference (ISWC'07), 2007.
[Yu et al., 2010] Hsiang-Fu Yu, Cho-Jui Hsieh, Kai-Wei Chang, and Chih-Jen Lin. Large linear classification when data cannot fit in memory. In KDD, 2010.
[Yuret, 2007] Deniz Yuret. KU: Word sense disambiguation by substitution. In SemEval-2007: 4th International Workshop on Semantic Evaluations, June 2007.
[Zaidan et al., 2007] Omar Zaidan, Jason Eisner, and Christine Piatko. Using "annotator rationales" to improve machine learning for text categorization. In NAACL-HLT, 2007.
[Zelenko and Aone, 2006] Dmitry Zelenko and Chinatsu Aone. Discriminative methods for transliteration. In EMNLP, 2006.
[Zhu, 2005] Xiaojin Zhu. Semi-supervised learning literature survey. Technical Report 1530, Computer Sciences, University of Wisconsin-Madison, 2005.