[Tetreault and Chodorow, 2008] Joel R. Tetreault and Martin Chodorow. The ups and downs of preposition error detection in ESL writing. In COLING, 2008.
[Tiedemann, 1999] Jörg Tiedemann. Automatic construction of weighted string similarity measures. In EMNLP-VLC, 1999.
[Tong and Koller, 2002] Simon Tong and Daphne Koller. Support vector machine active learning with applications to text classification. JMLR, 2:45–66, 2002.
[Tratz and Hovy, 2010] Stephen Tratz and Eduard Hovy. A taxonomy, dataset, and classifier for automatic noun compound interpretation. In ACL, 2010.
[Tsochantaridis et al., 2004] Ioannis Tsochantaridis, Thomas Hofmann, Thorsten Joachims, and Yasemin Altun. Support vector machine learning for interdependent and structured output spaces. In ICML, 2004.
[Tsochantaridis et al., 2005] Ioannis Tsochantaridis, Thorsten Joachims, Thomas Hofmann, and Yasemin Altun. Large margin methods for structured and interdependent output variables. JMLR, 6:1453–1484, 2005.
[Tsuruoka et al., 2005] Yoshimasa Tsuruoka, Yuka Tateishi, Jin-Dong Kim, Tomoko Ohta, John McNaught, Sophia Ananiadou, and Jun'ichi Tsujii. Developing a robust part-of-speech tagger for biomedical text. In Advances in Informatics, 2005.
[Turian et al., 2010] Joseph Turian, Lev Ratinov, and Yoshua Bengio. Word representations: A simple and general method for semi-supervised learning. In ACL, 2010.
[Turney, 2002] Peter D. Turney. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In ACL, 2002.
[Turney, 2003] Peter D. Turney. Coherent keyphrase extraction via web mining. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03), Acapulco, Mexico, 2003.
[Turney, 2006] Peter D. Turney. Similarity of semantic relations. Computational Linguistics, 32(3):379–416, 2006.
[Uitdenbogerd, 2005] Sandra Uitdenbogerd. Readability of French as a foreign language and its uses. In Proceedings of the Australian Document Computing Symposium, 2005.
[Vadas and Curran, 2007a] David Vadas and James R. Curran. Adding noun phrase structure to the Penn Treebank. In ACL, 2007.
[Vadas and Curran, 2007b] David Vadas and James R. Curran. Large-scale supervised models for noun phrase bracketing. In PACLING, 2007.
[van den Bosch, 2006] Antal van den Bosch. All-word prediction as the ultimate confusable disambiguation. In Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing, 2006.
[Vapnik, 1998] Vladimir N. Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998.
[Vorhees, 2002] Ellen Vorhees. Overview of the TREC 2002 question answering track. In Proceedings of the Eleventh Text REtrieval Conference (TREC), 2002.
[Wang et al., 2005] Qin Iris Wang, Dale Schuurmans, and Dekang Lin. Strictly lexical dependency parsing. In International Workshop on Parsing Technologies, 2005.
[Wang et al., 2006] Qin Iris Wang, Colin Cherry, Dan Lizotte, and Dale Schuurmans. Improved large margin dependency parsing via local constraints and Laplacian regularization. In CoNLL, 2006.
[Wang et al., 2008] Qin Iris Wang, Dale Schuurmans, and Dekang Lin. Semi-supervised convex training for dependency parsing. In ACL-08: HLT, 2008.
[Weeds and Weir, 2005] Julie Weeds and David Weir. Co-occurrence retrieval: A flexible framework for lexical distributional similarity. Computational Linguistics, 31(4), 2005.
[Weston and Watkins, 1998] Jason Weston and Chris Watkins. Multi-class support vector machines. Technical Report CSD-TR-98-04, Department of Computer Science, Royal Holloway, University of London, 1998.
[Wilcox-O'Hearn et al., 2008] Amber Wilcox-O'Hearn, Graeme Hirst, and Alexander Budanitsky. Real-word spelling correction with trigrams: A reconsideration of the Mays, Damerau, and Mercer model. In CICLing, 2008.
[Witten and Frank, 2005] Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, second edition, 2005.
[Xu et al., 2009] Peng Xu, Jaeho Kang, Michael Ringgaard, and Franz Josef Och. Using a dependency parser to improve SMT for subject-object-verb languages. In HLT-NAACL, 2009.
[Yang et al., 2005] Xiaofeng Yang, Jian Su, and Chew Lim Tan. Improving pronoun resolution using statistics-based semantic compatibility information. In ACL, 2005.
[Yarowsky, 1994] David Yarowsky. Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French. In ACL, 1994.
[Yarowsky, 1995] David Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. In ACL, 1995.
[Yi et al., 2008] Xing Yi, Jianfeng Gao, and William B. Dolan. A web-based English proofing system for English as a second language users. In IJCNLP, 2008.
[Yoon et al., 2007] Su-Youn Yoon, Kyoung-Young Kim, and Richard Sproat. Multilingual transliteration using feature based phonetic method. In ACL, pages 112–119, 2007.
[Yu et al., 2007] Liang-Chih Yu, Chung-Hsien Wu, Andrew Philpot, and Eduard Hovy. OntoNotes: Sense pool verification using Google N-gram and statistical tests. In OntoLex Workshop at the 6th International Semantic Web Conference (ISWC'07), 2007.
[Yu et al., 2010] Hsiang-Fu Yu, Cho-Jui Hsieh, Kai-Wei Chang, and Chih-Jen Lin. Large linear classification when data cannot fit in memory. In KDD, 2010.
[Yuret, 2007] Deniz Yuret. KU: Word sense disambiguation by substitution. In SemEval-2007: 4th International Workshop on Semantic Evaluations, June 2007.
[Zaidan et al., 2007] Omar Zaidan, Jason Eisner, and Christine Piatko. Using "annotator rationales" to improve machine learning for text categorization. In NAACL-HLT, 2007.
[Zelenko and Aone, 2006] Dmitry Zelenko and Chinatsu Aone. Discriminative methods for transliteration. In EMNLP, 2006.
[Zhu, 2005] Xiaojin Zhu. Semi-supervised learning literature survey. Technical Report 1530, Computer Sciences, University of Wisconsin-Madison, 2005.