12.07.2015 Views

Large-Scale Semi-Supervised Learning for Natural Language ...

Large-Scale Semi-Supervised Learning for Natural Language ...

Large-Scale Semi-Supervised Learning for Natural Language ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

The above experimental set-up is sometimes referred to as a batch learning approach,because the algorithm is given the entire training set at once. A typical algorithm learnsa single, static model using the entire training set in one training session (remember: <strong>for</strong>a linear classifier, by model we just mean the set of weights). This is the approach takenby SVMs and maximum entropy models. This is clearly different than how humans learn;we adapt over time as new data is presented. Alternatively, an online learning algorithmis one that is presented with training examples in sequence. Online learning iterativelyre-estimates the model each time a new training instance is encountered. The perceptronis the classic example of an online learning approach, while currently MIRA [Crammerand Singer, 2003; Crammer et al., 2006] is a popular maximum-margin online learner (seeSection 2.3.4 <strong>for</strong> more on max-margin classifiers). In practice, there is little differencebetween how batch and online learners are used; if new training examples become availableto a batch learner, the new examples can simply be added to the existing training set and themodel can be re-trained on the old-plus-new combined data as another batch process.It is also worth mentioning another learning paradigm known as active learning [Cohnet al., 1994; Tong and Koller, 2002]. Here the learner does not simply train passivelyfrom whatever labeled data is available, rather, the learner can request specific examplesbe labeled if it deems adding these examples to the training set will most improve theclassifier’s predictive power. Active learning could potentially be used in conjunction withthe techniques in this dissertation to get the most benefit out of the smallest amount oftraining data possible.2.3.2 Evaluation MeasuresPer<strong>for</strong>mance is often evaluated in terms of accuracy: what percentage of examples did weclassify correctly? For example, if our decision is whether a document is about sports or not(i.e., sports is the positive class), then accuracy is the percentage of documents that are correctlylabeled as sports or non-sports. Note it is difficult to compare accuracy of classifiersacross tasks, because typically the class balance strongly affects the achievable accuracy.For example, suppose there are 100 documents in our test set, and only five of these aresports documents. Then a system could trivially achieve 95% accuracy by assigning everydocument the non-sports label. 95% might be much harder to obtain on another task witha 50-50 balance of the positive and negative classes. Accuracy is most useful as a measurewhen the per<strong>for</strong>mance of the proposed system is compared to a baseline: a reasonable,simple and perhaps even trivial classifier, such as one that picks the majority-class (themost frequent class in the training data). We use baselines whenever we state accuracy inthis dissertation.Accuracy also does not tell us whether our classifier is predicting one class disproportionatelymore often than another (that is, whether it has a bias). Statistical measures thatdo identify classifier biases are Precision, Recall, and F-Score. These measures are usedtogether extensively in classifier evaluation. 2 Again, suppose sports is the class we’re predicting.Precision tells us: of the documents that our classifier predicted to be sports, whatpercentage are actually sports? That is, precision is the ratio of true positives (elementswe predicted to be of the positive class that truly are positive, where sports is the positiveclass in our running example), divided by the sum of true positives and false positives (to-2 Wikipedia has a detailed discussion of these measures: http://en.wikipedia.org/wiki/Precision_and_recall18

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!