Large-Scale Semi-Supervised Learning for Natural Language ...

12.07.2015
[Jiampojamarn et al., 2007] Sittichai Jiampojamarn, Grzegorz Kondrak, and Tarek Sherif. Applying many-to-many alignments and hidden Markov models to letter-to-phoneme conversion. In NAACL-HLT, 2007.
[Jiampojamarn et al., 2010] Sittichai Jiampojamarn, Ken Dwyer, Shane Bergsma, Aditya Bhargava, Qing Dou, Mi-Young Kim, and Grzegorz Kondrak. Transliteration generation and mining with limited training resources. Named Entities Workshop (NEWS), 2010.
[Joachims et al., 2009] Thorsten Joachims, Thomas Finley, and Chun-Nam John Yu. Cutting-plane training of structural SVMs. Mach. Learn., 77(1):27–59, 2009.
[Joachims, 1999a] Thorsten Joachims. Making large-scale Support Vector Machine learning practical. In B. Schölkopf and C. Burges, editors, Advances in Kernel Methods: Support Vector Machines. MIT-Press, 1999.
[Joachims, 1999b] Thorsten Joachims. Transductive inference for text classification using support vector machines. In International Conference on Machine Learning (ICML), 1999.
[Joachims, 2002] Thorsten Joachims. Optimizing search engines using clickthrough data. In KDD, 2002.
[Joachims, 2006] Thorsten Joachims. Training linear SVMs in linear time. In KDD, 2006.
[Jones and Ghani, 2000] Rosie Jones and Rayid Ghani. Automatically building a corpus for a minority language from the web. In Proceedings of the Student Research Workshop at the 38th Annual Meeting of the Association for Computational Linguistics, 2000.
[Jurafsky and Martin, 2000] Daniel Jurafsky and James H. Martin. Speech and language processing. Prentice Hall, 2000.
[Kehler et al., 2004] Andrew Kehler, Douglas Appelt, Lara Taylor, and Aleksandr Simma. The (non)utility of predicate-argument frequencies for pronoun interpretation. In HLT-NAACL, 2004.
[Keller and Lapata, 2003] Frank Keller and Mirella Lapata. Using the web to obtain frequencies for unseen bigrams. Computational Linguistics, 29(3):459–484, 2003.
[Kilgarriff and Grefenstette, 2003] Adam Kilgarriff and Gregory Grefenstette. Introduction to the special issue on the Web as corpus. Computational Linguistics, 29(3):333–347, 2003.
[Kilgarriff, 2007] Adam Kilgarriff. Googleology is bad science. Computational Linguistics, 33(1), 2007.
[Klementiev and Roth, 2006] Alexandre Klementiev and Dan Roth. Named entity transliteration and discovery from multilingual comparable corpora. In HLT-NAACL, 2006.
[Knight et al., 1995] Kevin Knight, Ishwar Chander, Matthew Haines, Vasileios Hatzivassiloglou, Eduard Hovy, Masayo Iida, Steve K. Luk, Richard Whitney, and Kenji Yamada. Filling knowledge gaps in a broad coverage machine translation system. In IJCAI, 1995.
[Koehn and Knight, 2002] Philipp Koehn and Kevin Knight. Learning a translation lexicon from monolingual corpora. In ACL Workshop on Unsupervised Lexical Acquisition, 2002.
[Koehn and Monz, 2006] Philipp Koehn and Christof Monz. Manual and automatic evaluation of machine translation between European languages. In NAACL Workshop on Statistical Machine Translation, 2006.
[Koehn et al., 2003] Philipp Koehn, Franz Josef Och, and Daniel Marcu. Statistical phrase-based translation. In HLT-NAACL, 2003.

[Koehn, 2005] Philipp Koehn. Europarl: A parallel corpus for statistical machine translation. In MT Summit X, 2005.
[Kondrak and Sherif, 2006] Grzegorz Kondrak and Tarek Sherif. Evaluation of several phonetic similarity algorithms on the task of cognate identification. In COLING-ACL Workshop on Linguistic Distances, 2006.
[Kondrak et al., 2003] Grzegorz Kondrak, Daniel Marcu, and Kevin Knight. Cognates can improve statistical translation models. In HLT-NAACL, 2003.
[Kondrak, 2005] Grzegorz Kondrak. Cognates and word alignment in bitexts. In MT Summit X, 2005.
[Koo et al., 2008] Terry Koo, Xavier Carreras, and Michael Collins. Simple semi-supervised dependency parsing. In ACL-08: HLT, 2008.
[Kotsia et al., 2009] Irene Kotsia, Stefanos Zafeiriou, and Ioannis Pitas. Novel multiclass classifiers based on the minimization of the within-class variance. IEEE Trans. Neur. Networks, 20(1):14–34, 2009.
[Kulick et al., 2004] Seth Kulick, Ann Bies, Mark Liberman, Mark Mandel, Ryan McDonald, Martha Palmer, Andrew Schein, Lyle Ungar, Scott Winters, and Pete White. Integrated annotation for biomedical information extraction. In BioLINK 2004: Linking Biological Literature, Ontologies and Databases, 2004.
[Kummerfeld and Curran, 2008] Jonathan K. Kummerfeld and James R. Curran. Classification of verb particle constructions with the google web1t corpus. In Australasian Language Technology Association Workshop, 2008.
[Lafferty et al., 2001] John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. Conditional Random Fields: Probabilistic models for segmenting and labeling sequence data. In ICML, 2001.
[Lapata and Keller, 2005] Mirella Lapata and Frank Keller. Web-based models for natural language processing. ACM Trans. Speech and Language Processing, 2(1):1–31, 2005.
[Lappin and Leass, 1994] Shalom Lappin and Herbert J. Leass. An algorithm for pronominal anaphora resolution. Computational Linguistics, 20(4), 1994.
[Lauer, 1995a] Mark Lauer. Corpus statistics meet the noun compound: Some empirical results. In ACL, 1995.
[Lauer, 1995b] Mark Lauer. Designing Statistical Language Learners: Experiments on Compound Nouns. PhD thesis, Macquarie University, 1995.
[Levenshtein, 1966] Vladimir I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8), 1966.
[Li and Abe, 1998] Hang Li and Naoki Abe. Generalizing case frames using a thesaurus and the MDL principle. Computational Linguistics, 24(2), 1998.
[Lin and Wu, 2009] Dekang Lin and Xiaoyun Wu. Phrase clustering for discriminative learning. In ACL-IJCNLP, 2009.
[Lin et al., 2010] Dekang Lin, Kenneth Church, Heng Ji, Satoshi Sekine, David Yarowsky, Shane Bergsma, Kailash Patil, Emily Pitler, Rachel Lathbury, Vikram Rao, Kapil Dalwani, and Sushant Narsale. New tools for web-scale N-grams. In LREC, 2010.
[Lin, 1998a] Dekang Lin. Automatic retrieval and clustering of similar words. In COLING-ACL, 1998.
[Lin, 1998b] Dekang Lin. Dependency-based evaluation of MINIPAR. In LREC Workshop on the Evaluation of Parsing Systems, 1998.


