Large-Scale Semi-Supervised Learning for Natural Language ...

[Jiampojamarn et al., 2007] Sittichai Jiampojamarn, Grzegorz Kondrak, and Tarek Sherif. Applying many-to-many alignments and hidden Markov models to letter-to-phoneme conversion. In NAACL-HLT, 2007.
[Jiampojamarn et al., 2010] Sittichai Jiampojamarn, Ken Dwyer, Shane Bergsma, Aditya Bhargava, Qing Dou, Mi-Young Kim, and Grzegorz Kondrak. Transliteration generation and mining with limited training resources. In Named Entities Workshop (NEWS), 2010.
[Joachims et al., 2009] Thorsten Joachims, Thomas Finley, and Chun-Nam John Yu. Cutting-plane training of structural SVMs. Mach. Learn., 77(1):27–59, 2009.
[Joachims, 1999a] Thorsten Joachims. Making large-scale Support Vector Machine learning practical. In B. Schölkopf and C. Burges, editors, Advances in Kernel Methods: Support Vector Machines. MIT-Press, 1999.
[Joachims, 1999b] Thorsten Joachims. Transductive inference for text classification using support vector machines. In International Conference on Machine Learning (ICML), 1999.
[Joachims, 2002] Thorsten Joachims. Optimizing search engines using clickthrough data. In KDD, 2002.
[Joachims, 2006] Thorsten Joachims. Training linear SVMs in linear time. In KDD, 2006.
[Jones and Ghani, 2000] Rosie Jones and Rayid Ghani. Automatically building a corpus for a minority language from the web. In Proceedings of the Student Research Workshop at the 38th Annual Meeting of the Association for Computational Linguistics, 2000.
[Jurafsky and Martin, 2000] Daniel Jurafsky and James H. Martin. Speech and language processing. Prentice Hall, 2000.
[Kehler et al., 2004] Andrew Kehler, Douglas Appelt, Lara Taylor, and Aleksandr Simma. The (non)utility of predicate-argument frequencies for pronoun interpretation. In HLT-NAACL, 2004.
[Keller and Lapata, 2003] Frank Keller and Mirella Lapata. Using the web to obtain frequencies for unseen bigrams. Computational Linguistics, 29(3):459–484, 2003.
[Kilgarriff and Grefenstette, 2003] Adam Kilgarriff and Gregory Grefenstette. Introduction to the special issue on the Web as corpus. Computational Linguistics, 29(3):333–347, 2003.
[Kilgarriff, 2007] Adam Kilgarriff. Googleology is bad science. Computational Linguistics, 33(1), 2007.
[Klementiev and Roth, 2006] Alexandre Klementiev and Dan Roth. Named entity transliteration and discovery from multilingual comparable corpora. In HLT-NAACL, 2006.
[Knight et al., 1995] Kevin Knight, Ishwar Chander, Matthew Haines, Vasileios Hatzivassiloglou, Eduard Hovy, Masayo Iida, Steve K. Luk, Richard Whitney, and Kenji Yamada. Filling knowledge gaps in a broad coverage machine translation system. In IJCAI, 1995.
[Koehn and Knight, 2002] Philipp Koehn and Kevin Knight. Learning a translation lexicon from monolingual corpora. In ACL Workshop on Unsupervised Lexical Acquisition, 2002.
[Koehn and Monz, 2006] Philipp Koehn and Christof Monz. Manual and automatic evaluation of machine translation between European languages. In NAACL Workshop on Statistical Machine Translation, 2006.
[Koehn et al., 2003] Philipp Koehn, Franz Josef Och, and Daniel Marcu. Statistical phrase-based translation. In HLT-NAACL, 2003.
[Koehn, 2005] Philipp Koehn. Europarl: A parallel corpus for statistical machine translation. In MT Summit X, 2005.
[Kondrak and Sherif, 2006] Grzegorz Kondrak and Tarek Sherif. Evaluation of several phonetic similarity algorithms on the task of cognate identification. In COLING-ACL Workshop on Linguistic Distances, 2006.
[Kondrak et al., 2003] Grzegorz Kondrak, Daniel Marcu, and Kevin Knight. Cognates can improve statistical translation models. In HLT-NAACL, 2003.
[Kondrak, 2005] Grzegorz Kondrak. Cognates and word alignment in bitexts. In MT Summit X, 2005.
[Koo et al., 2008] Terry Koo, Xavier Carreras, and Michael Collins. Simple semi-supervised dependency parsing. In ACL-08: HLT, 2008.
[Kotsia et al., 2009] Irene Kotsia, Stefanos Zafeiriou, and Ioannis Pitas. Novel multiclass classifiers based on the minimization of the within-class variance. IEEE Trans. Neur. Networks, 20(1):14–34, 2009.
[Kulick et al., 2004] Seth Kulick, Ann Bies, Mark Liberman, Mark Mandel, Ryan McDonald, Martha Palmer, Andrew Schein, Lyle Ungar, Scott Winters, and Pete White. Integrated annotation for biomedical information extraction. In BioLINK 2004: Linking Biological Literature, Ontologies and Databases, 2004.
[Kummerfeld and Curran, 2008] Jonathan K. Kummerfeld and James R. Curran. Classification of verb particle constructions with the google web1t corpus. In Australasian Language Technology Association Workshop, 2008.
[Lafferty et al., 2001] John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. Conditional Random Fields: Probabilistic models for segmenting and labeling sequence data. In ICML, 2001.
[Lapata and Keller, 2005] Mirella Lapata and Frank Keller. Web-based models for natural language processing. ACM Trans. Speech and Language Processing, 2(1):1–31, 2005.
[Lappin and Leass, 1994] Shalom Lappin and Herbert J. Leass. An algorithm for pronominal anaphora resolution. Computational Linguistics, 20(4), 1994.
[Lauer, 1995a] Mark Lauer. Corpus statistics meet the noun compound: Some empirical results. In ACL, 1995.
[Lauer, 1995b] Mark Lauer. Designing Statistical Language Learners: Experiments on Compound Nouns. PhD thesis, Macquarie University, 1995.
[Levenshtein, 1966] Vladimir I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8), 1966.
[Li and Abe, 1998] Hang Li and Naoki Abe. Generalizing case frames using a thesaurus and the MDL principle. Computational Linguistics, 24(2), 1998.
[Lin and Wu, 2009] Dekang Lin and Xiaoyun Wu. Phrase clustering for discriminative learning. In ACL-IJCNLP, 2009.
[Lin et al., 2010] Dekang Lin, Kenneth Church, Heng Ji, Satoshi Sekine, David Yarowsky, Shane Bergsma, Kailash Patil, Emily Pitler, Rachel Lathbury, Vikram Rao, Kapil Dalwani, and Sushant Narsale. New tools for web-scale N-grams. In LREC, 2010.
[Lin, 1998a] Dekang Lin. Automatic retrieval and clustering of similar words. In COLING-ACL, 1998.
[Lin, 1998b] Dekang Lin. Dependency-based evaluation of MINIPAR. In LREC Workshop on the Evaluation of Parsing Systems, 1998.