Large-Scale Semi-Supervised Learning for Natural Language ...

12.07.2015 Views
List of Tables

1.1 Summary of tasks handled in the dissertation . . . 8
2.1 The classifier confusion matrix . . . 19
3.1 SUMLM accuracy combining N-grams from order Min to Max . . . 45
3.2 Context-sensitive spelling correction accuracy on different confusion sets . . . 48
3.3 Pattern filler types . . . 49
3.4 Human vs. computer non-referential it detection . . . 53
4.1 Accuracy of preposition-selection SVMs . . . 64
4.2 Accuracy of spell-correction SVMs . . . 64
4.3 Accuracy of non-referential detection SVMs . . . 65
5.1 Data for tasks in Chapter 5 . . . 70
5.2 Number of labeled examples for tasks in Chapter 5 . . . 70
5.3 Adjective ordering accuracy . . . 73
5.4 Spelling correction accuracy . . . 76
5.5 NC-bracketing accuracy . . . 79
5.6 Verb-POS-disambiguation accuracy . . . 81
6.1 Pseudodisambiguation results averaged across each example . . . 89
6.2 Selectional ratings for plausible/implausible direct objects . . . 92
6.3 Recall on identification of Verb-Object pairs from an unseen corpus . . . 92
6.4 Pronoun resolution accuracy on nouns in current or previous sentence . . . 94
7.1 Foreign-English cognates and false friend training examples . . . 99
7.2 Bitext French-English development set cognate identification 11-pt average precision . . . 103
7.3 Bitext, Dictionary Foreign-to-English cognate identification 11-pt average precision . . . 103
7.4 Example features and weights for various Alignment-Based Discriminative classifiers . . . 105
7.5 Highest scored pairs by Alignment-Based Discriminative classifier . . . 106

List of Figures

2.1 The linear classifier hyperplane . . . 16
2.2 Learning from labeled and unlabeled examples . . . 26
3.1 Preposition selection learning curve . . . 44
3.2 Preposition selection over high-confidence subsets . . . 45
3.3 Context-sensitive spelling correction learning curve . . . 47
3.4 Non-referential detection learning curve . . . 51
3.5 Effect of pattern-word truncation on non-referential it detection . . . 52
4.1 Multi-class classification for web-scale N-gram models . . . 59
5.1 In-domain learning curve of adjective ordering classifiers on BNC . . . 74
5.2 Out-of-domain learning curve of adjective ordering classifiers on Gutenberg . . . 74
5.3 Out-of-domain learning curve of adjective ordering classifiers on Medline . . . 75
5.4 In-domain learning curve of spelling correction classifiers on NYT . . . 76
5.5 Out-of-domain learning curve of spelling correction classifiers on Gutenberg . . . 77
5.6 Out-of-domain learning curve of spelling correction classifiers on Medline . . . 77
5.7 In-domain NC-bracketer learning curve . . . 79
5.8 Out-of-domain learning curve of verb disambiguation classifiers on Medline . . . 81
6.1 Disambiguation results by noun frequency . . . 91
6.2 Pronoun resolution precision-recall on MUC . . . 93
7.1 LCSR histogram and polynomial trendline of French-English dictionary pairs . . . 102
7.2 Bitext French-English cognate identification learning curve . . . 104

Page 1 and 2: University of Alberta Large-Scale Se
