Large-Scale Semi-Supervised Learning for Natural Language ...

                 Training Examples
System          10     100    1K
CS-SVM          59.0   71.0   84.3
CS-SVM+         59.4   74.9   84.5
VAR-SVM         70.2   76.2   84.5
VAR-SVM+FreeB   64.2   80.3   84.5

Table 4.3: Accuracy (%) of non-referential detection SVMs. Unsupervised (SUMLM) accuracy is 79.8%.

Results

On this task, the majority-class baseline is much higher, 66.9%, and so is the accuracy of the top unsupervised system: 94.8%. Since four of the five sets are binary classifications, where K-SVM and CS-SVM are equivalent, we give only the accuracy of the CS-SVM (it does perform better than K-SVM on the one 3-way set). VAR-SVM again exceeds the unsupervised accuracy for all training sizes, and generally performs as well as the augmented CS-SVM+ using an order of magnitude less training data (Table 4.2). Differences for ≤1K are significant.

4.4.3 Non-Referential Pronoun Detection

We use the same data and fillers as Chapter 3, and preprocess the N-gram corpus in the same way.

Recall that the unsupervised system picks non-referential if the difference between the summed count of it fillers and the summed count of they fillers is above a threshold (note this no longer fits Equation (4.5), with consequences discussed below). We thus separately minimize the variance of the it pattern weights and the they pattern weights. We use 1K training, 533 development, and 534 test examples.

Results

The most common class is referential, occurring in 59.4% of test examples. The unsupervised system again does much better, at 79.8%.

Annotated training examples are much harder to obtain for this task, and we experiment with a smaller range of training sizes (Table 4.3). The performance of VAR-SVM exceeds the performance of K-SVM across all training sizes (bold accuracies are significantly better than either CS-SVM for ≤100 examples).
However, the gains were not as large as we had hoped, and accuracy remains worse than the unsupervised system when not using all the training data. When using all the data, a fairly large C-parameter performs best on development data, so regularization plays less of a role.

After development experiments, we speculated that the poor performance relative to the unsupervised approach was related to class bias. In the other tasks, the unsupervised system chooses the highest summed score. Here, the difference in it and they counts is compared to a threshold. Since the bias feature is regularized toward zero, then, unlike the other tasks, using a low C-parameter does not produce the unsupervised system, so performance can begin below the unsupervised level.
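The unsupervised decision rule described above can be sketched as follows; the count inputs and the threshold value here are hypothetical stand-ins for the actual summed N-gram pattern counts and the tuned threshold:

```python
def classify_pronoun(it_counts, they_counts, threshold=5.0):
    """Unsupervised rule (sketch): label the pronoun non-referential
    when the summed count of 'it' fillers exceeds the summed count of
    'they' fillers by more than a threshold. The counts and threshold
    are illustrative values, not the chapter's actual ones."""
    diff = sum(it_counts) - sum(they_counts)
    return "non-referential" if diff > threshold else "referential"
```

Note that this compares the count difference to a threshold rather than picking the class with the highest summed score, which is exactly why the rule no longer fits the form of Equation (4.5).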
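The class-bias issue can be illustrated with a small sketch. Under the modified regularizer, strong regularization (low C) drives each group of pattern weights toward uniform values, but the bias is driven toward zero, so the model ends up comparing the count difference to 0 rather than to a tuned threshold. Function names and values below are illustrative, not from the chapter:

```python
import numpy as np

def var_svm_penalty(w_it, w_they):
    """Sketch of the modified regularizer: penalize the variance of
    the 'it' pattern weights and of the 'they' pattern weights
    separately (instead of ||w||^2), so strong regularization pushes
    each group toward uniform weights."""
    return np.var(w_it) + np.var(w_they)

def low_c_limit(it_counts, they_counts, bias=0.0):
    """In the low-C limit the weights are uniform within each group,
    so the decision reduces to sign(sum(it) - sum(they) + bias).
    With the bias regularized toward zero, the comparison is against
    0, not the unsupervised system's tuned threshold."""
    return sum(it_counts) - sum(they_counts) + bias > 0
```

Uniform weights within each group incur zero penalty, so the low-C solution recovers the count-difference rule, but only with a zero threshold, which is why performance can start below the unsupervised level on this task.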
