System                         IN    O1    O2
[Malouf, 2000]                 91.5  65.6  71.6
web c(a1, a2) vs. c(a2, a1)    87.1  83.7  86.0
SVM with N-GM features         90.0  85.8  88.5
SVM with LEX features          93.0  70.0  73.9
SVM with N-GM + LEX            93.7  83.6  85.4

Table 5.3: Adjective ordering accuracy (%). SVM and [Malouf, 2000] trained on BNC, tested on BNC (IN), Gutenberg (O1), and Medline (O2).

Finally, we also have features for all suffixes of length 1-to-4 letters, as these encode useful information about adjective class [Malouf, 2000]. Like the adjective features, the suffix features receive a value of +1 for adjectives in the first position and −1 for those in the second.

N-GM features

[Lapata and Keller, 2005] propose a web-based approach to adjective ordering: take the most-frequent order of the words on the web, c(a1, a2) vs. c(a2, a1). We adopt this as our unsupervised approach. We merge the counts for the adjectives occurring contiguously and separated by a comma.

These are the most useful N-GM features; we include them but also other, tag-based counts from Google V2. Raw counts include cases where one of the adjectives is not used as a modifier: "the special present was" vs. "the present special issue." We include log-counts for the following, more-targeted patterns [5]: c(a1 a2 N.*), c(a2 a1 N.*), c(DT a1 a2 N.*), c(DT a2 a1 N.*). We also include features for the log-counts of each adjective preceded or followed by a word matching an adjective tag: c(a1 J.*), c(J.* a1), c(a2 J.*), c(J.* a2). These assess the positional preferences of each adjective. Finally, we include the log-frequency of each adjective. The more frequent adjective occurs first in 57% of pairs.

As in all tasks, the counts are features in a classifier, so the importance of the different patterns is weighted discriminatively during training.

[5] In this notation, capital letters (and reg-exps) are matched against tags, while a1 and a2 match words.
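To make the feature set concrete, the following is a minimal sketch of how the LEX and N-GM feature vectors for a pair (a1, a2) could be assembled. It is an illustration, not the implementation used in the experiments; in particular, `ngram_count` is a hypothetical lookup into the tagged web-scale counts.

```python
# A minimal sketch of the feature construction described above, not the
# actual implementation used in the experiments. `ngram_count` is a
# hypothetical lookup into the tagged web-scale counts (e.g., Google V2).
import math

def lex_features(a1, a2):
    """LEX features: each adjective, and each of its 1-to-4-letter
    suffixes, gets +1 in the first position and -1 in the second."""
    feats = {}
    for adj, value in ((a1, +1.0), (a2, -1.0)):
        feats["word=" + adj] = feats.get("word=" + adj, 0.0) + value
        for k in range(1, 5):
            if len(adj) > k:  # skip suffixes as long as the word itself
                key = "suffix=" + adj[-k:]
                feats[key] = feats.get(key, 0.0) + value
    return feats

def ngram_features(a1, a2, ngram_count):
    """N-GM features: log-counts of the targeted ordering patterns,
    positional-preference counts, and each adjective's log-frequency."""
    def lc(*pattern):
        return math.log(1 + ngram_count(*pattern))  # add-one to allow zeros
    return {
        # each order of the pair directly before a noun tag (N.*),
        # with and without a preceding determiner (DT)
        "c(a1 a2 N.*)":    lc(a1, a2, "N.*"),
        "c(a2 a1 N.*)":    lc(a2, a1, "N.*"),
        "c(DT a1 a2 N.*)": lc("DT", a1, a2, "N.*"),
        "c(DT a2 a1 N.*)": lc("DT", a2, a1, "N.*"),
        # positional preference: each adjective next to any adjective tag
        "c(a1 J.*)": lc(a1, "J.*"),
        "c(J.* a1)": lc("J.*", a1),
        "c(a2 J.*)": lc(a2, "J.*"),
        "c(J.* a2)": lc("J.*", a2),
        # unigram log-frequency of each adjective
        "c(a1)": lc(a1),
        "c(a2)": lc(a2),
    }
```

Both dictionaries would then be merged into a single sparse feature vector, so the classifier learns the relative importance of lexical identity, suffix class, and each count pattern jointly.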
5.3.2 Adjective Ordering Results

In-domain, with both feature classes, we set a strong new standard on this data: 93.7% accuracy for the N-GM+LEX system (Table 5.3). We trained and tested [Malouf, 2000]'s program on our data; our LEX classifier, which also uses no auxiliary corpus, makes 18% fewer errors than Malouf's system. Our web-based N-GM model is also superior to the direct-evidence web-based approach of [Lapata and Keller, 2005], scoring 90.0% vs. 87.1% accuracy. These results show the benefit of both our new lexicalized and our new web-based features.

Figure 5.1 gives the in-domain learning curve. With fewer training examples, the systems with N-GM features strongly outperform the LEX-only system. Note that with tens of thousands of test examples, all differences are highly significant.

[Figure 5.1: In-domain learning curve of adjective ordering classifiers on BNC; accuracy (%) vs. number of training examples (100 to 1e5) for N-GM+LEX, N-GM, and LEX.]

Out-of-domain, LEX's accuracy drops a shocking 23% on Gutenberg and 19% on Medline (Table 5.3). [Malouf, 2000]'s system fares even worse. The overlap between training and test pairs helps explain this: while 59% of the BNC test pairs were seen in the training corpus, only 25% of Gutenberg and 18% of Medline pairs were seen in training.

While other ordering models have also achieved "very poor results" out-of-domain [Mitchell, 2009], we expected our expanded set of LEX features to provide good generalization to new data. Instead, LEX is very unreliable on new domains.

N-GM features do not rely on specific pairs in the training data, and thus remain fairly robust cross-domain. Across the three test sets, 84-89% of examples had the correct ordering appear at least once on the web. On new domains, the learned N-GM system maintains an advantage over the unsupervised c(a1, a2) vs. c(a2, a1) approach, but the difference is reduced. Note that with 10-fold cross-validation training, the N-GM system can achieve up to 87.5% on Gutenberg (90.0% for N-GM + LEX).

The learning curves showing performance on Gutenberg and Medline (while still training on BNC) are particularly instructive (Figures 5.2 and 5.3). The LEX system performs much worse than the web-based models across all training sizes. For our top in-domain system, N-GM + LEX, out-of-domain performance begins to decrease as more labeled examples are added: the system disregards the robust N-gram counts as it grows more and more confident in the LEX features, and it suffers the consequences.

[Figure 5.2: Out-of-domain learning curve of adjective ordering classifiers on Gutenberg; accuracy (%) vs. number of training examples (100 to 1e5) for N-GM+LEX, N-GM, and LEX.]
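For illustration, the train/test pair-overlap diagnostic used earlier in this section could be computed as in the sketch below. Extracting the (a1, a2) pairs from each corpus is assumed to happen elsewhere, and matching pairs regardless of order is an assumption of this sketch.

```python
# A rough sketch of the train/test pair-overlap diagnostic reported above
# (59% on BNC, 25% on Gutenberg, 18% on Medline); not the thesis's code.
def pair_overlap(train_pairs, test_pairs):
    """Return the fraction of test pairs whose adjective pair was also
    seen in training (order-insensitive match is an assumption here)."""
    seen = {frozenset(pair) for pair in train_pairs}
    hits = sum(1 for pair in test_pairs if frozenset(pair) in seen)
    return hits / len(test_pairs)

# Usage, with hypothetical pair lists extracted from each corpus:
#   pair_overlap(bnc_train_pairs, gutenberg_test_pairs)  # ~0.25
```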