Pakistan Journal of Science (Vol. 63 No. 4 Dec, 2011)
SENTIMENT-ANNOTATED LEXICON CONSTRUCTION FOR AN URDU TEXT BASED
SENTIMENT ANALYZER
Afraz Z. S., A. Muhammad and Martinez-Enriquez A. M *
Department of CS & E, U. E. T., Lahore, Pakistan
** Department of CS, CINVESTAV-IPN, D.F. Mexico
Corresponding author’s email (afrazsyed@uet.edu.pk)
ABSTRACT: A lexicon based sentiment analyzer is composed of two parts: a classifier and a
lexicon of sentiment-annotated words/phrases. In this paper, a model for such a lexicon is presented, in
which polarity scores are annotated to all the subjective entries. This approach handles Urdu
words, which are morphologically rich and result in a much higher level of lexicon intricacy than
in other languages, like English. This is a pioneering effort, as no sentiment-annotated lexicon exists
for the Urdu language. Moreover, already developed lexicons of other languages cannot be used, because
Urdu exhibits exceptionally distinctive orthographical, morphological, and grammatical features. This
lexicon is constructed as a part of a lexicon based sentiment analyzer for opinionated Urdu text, given
in the form of reviews. When the developed lexicon is applied to multiple reviews, the results meet
expectations.
Key words: Natural language processing, computational linguistics, sentiment analysis, opinion mining, shallow
parsing, Urdu text processing, lexicon construction.
INTRODUCTION
The rapid proliferation of user generated text on the internet has given rise to a number of
previously unknown aspects of natural language processing and understanding. It is evident that
such a huge body of knowledge, generated by millions of minds around the world, cannot be left
free and unbridled (Glaser et al., 2002). As a result, the field of sentiment analysis, opinion
mining, or subjectivity analysis is emerging rapidly as an unexplored frontier. For the English
language, this area has been under consideration for the last decade (Hatzivassiloglou and Wiebe,
2000; Turney, 2002; Yu and Hatzivassiloglou, 2003 and Pang and Lee, 2008). These contributions
present complete models of a sentiment analyzer based on different techniques and approaches,
such as supervised or unsupervised machine learning, or lexicon based methods.
In these works, a usual model of a sentiment analyzer incorporates two components: (a) the
classifier, which analyzes and categorizes the given text, and (b) the lexicon or lexicons
containing the information about the orientations of the entries (words/phrases) as positive or
negative. These lexicons are called sentiment-annotated lexicons (Pang and Lee, 2008), because the
polarity marks indicating orientation are annotated directly to the lexicon entries. Such lexicons
can either be manually compiled or automatically generated. A considerable body of research on
sentiment-annotated lexicon construction has emerged within a few years (Annett and Kondrak, 2008;
Higashinaka et al., 2007; Andreevskaia and Bergler, 2006; Hu and Liu, 2005; Yu and
Hatzivassiloglou, 2003; Riloff et al., 2003; Turney, 2002 and Hatzivassiloglou and Wiebe, 2000).
These contributions have proposed a variety of approaches for lexicon development, lexicon
structure and the relationships between the entries.
Mainly these efforts are for the English language and exploit pre-developed linguistic resources
like corpora for the development and extraction of the required lexicons. Consequently, for the
English language this aspect of research is no longer an unsolved issue. On the other hand, Urdu
is a resource-poor language (Mukund et al., 2010) and hence, the task of domain specific
sentiment-annotated lexicon construction for Urdu text poses many challenges. To our knowledge no
such lexicon exists. However, there are a few efforts that have tried to construct lexicons for
other language processing applications of Urdu text (Ijaz and Hussain, 2007; Humayoun et al.,
2007; Muaz et al., 2009 and Mukund et al., 2010).
Therefore, this paper describes the structure, construction and evaluation of a manually tagged
sentiment-annotated lexicon of Urdu words as a component of a sentiment analysis model developed
for Urdu text. The lexicon contains information about the subjectivity of an entry in addition to
its orthographic, phonological, syntactic and morphological aspects. This approach recognizes the
subjective entries in the lexicon through two attributes, i.e., orientation (either positive or
negative) and intensity (the force of the orientation). After the development of the lexicon, it
is integrated with the sentiment classifier. The classifier preprocesses the given text and then
applies shallow
parsing based chunking. It uses the lexicon for comparing all the words/phrases present in the
text. As a result, all the subjective terms in the given text become annotated. On the basis of
the polarities of individual words, the polarity of each sentence and then of the whole review is
calculated. The overall system performance is evaluated by using a corpus of movie reviews in the
Urdu language. The classification algorithm is applied on the review corpus. Each subjective word
in the review is compared with lexicon entries for the computation of the polarity scores.
MATERIAL AND METHODS
In this section, the construction, structure and integration of the sentiment-annotated lexicon
of Urdu words developed for a sentiment classification model are described. The model is designed
to distinguish between the objective and subjective terms in a given review. Objective terms carry
neutral sentiment and have no effect on the final decision of the classification, whereas
subjective terms are considered the carriers of sentiment and their presence can alter the final
classification. Keeping this distinction in view, the lexicon entries are also categorized as
objective and subjective terms. Before going into details, some terms are defined below:
• Orientation. Orientation describes either the positivity or the negativity of a lexicon entry.
For most of the entries, orientation is predefined during the lexicon construction phase. In a
given text, however, it can be altered by the use of a polarity shifter in the sentence, e.g.,
the word “اچھا” (acha, good) has positive orientation but, with the polarity shifter “نہیں”
(naheen, not), it becomes a negative expression, i.e., “اچھا نہیں” (acha naheen, not good).
Moreover, the orientation of some words (though few in number) is highly domain specific or
depends upon the context within which they are used. These two issues are beyond the scope of
this research.
• Intensity. This is the intensity of the orientation of a lexicon entry, i.e., the force of
positivity or negativity of a term. Usually modifiers, e.g., “بہت” (bohat, very), describe the
intensity of an expression. Like other languages, Urdu has three degrees of intensity: absolute
(only positive or negative orientation), comparative (two distinct entities are compared with
each other) and superlative (one of all entities has the highest orientation).
• Polarity. The polarity mark is annotated to each lexicon entry to show its orientation and
intensity. This is done at the implementation level.
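
The effect of a polarity shifter on a predefined orientation can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Polarity` class, the `apply_shifter` function, the `NEGATORS` set and the numeric intensity value are hypothetical names and values chosen only for illustration, and the rule that the shifter directly follows the term is an assumption taken from the example “acha naheen”.

```python
from dataclasses import dataclass

# Urdu tokens used in the example (transliterations in the comments).
ACHA = "اچھا"      # acha, good
NAHEEN = "نہیں"    # naheen, not
HAI = "ہے"         # hai, is

NEGATORS = {NAHEEN}  # hypothetical set of polarity shifters


@dataclass(frozen=True)
class Polarity:
    orientation: int   # +1 for positive, -1 for negative
    intensity: float   # force of the orientation (assumed numeric scale)


def apply_shifter(polarity: Polarity, next_word: str) -> Polarity:
    """Flip the orientation when the word following the term is a polarity
    shifter; otherwise return the polarity unchanged."""
    if next_word in NEGATORS:
        return Polarity(-polarity.orientation, polarity.intensity)
    return polarity


good = Polarity(orientation=+1, intensity=1.0)
not_good = apply_shifter(good, NAHEEN)   # models "acha naheen"
still_good = apply_shifter(good, HAI)    # models "acha hai"
print(not_good.orientation, still_good.orientation)
```

In this sketch only the orientation changes sign; the intensity is carried over untouched, which keeps the two attributes independent as in the definitions above.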
Lexicon Construction: A sentiment-annotated lexicon is more intricate than other Natural
Language Processing (NLP) lexicons. There are two reasons for this intricacy:
• Each lexicon entry carries its polarity information in addition to its orthographic,
phonological, syntactic and morphological features. This polarity information is usually
represented as positive, negative or neutral. For example, SentiWordNet (Andreevskaia and
Bergler, 2006) uses triplets [positive, negative, objective], where each value lies between a
minimum of 0.0 and a maximum of 1.0.
• Most words exhibit multiple orientations depending upon their use and domain. For example, in
the sentence “This damage is everlasting”, everlasting is a positive word, but the comment’s
overall orientation is negative. Also, unpredictable is a positive word when used about a
movie’s plot, but becomes negative for the performance of a microwave oven.
Figure 1. Structure of the sentiment-annotated lexicon with respect to O and I
Construction Steps: The lexicon construction task is divided into the following steps:
• Categorize the words as either subjective or objective. When the classification algorithm is
applied on these words, the classifier simply ignores objective terms, so its performance
depends entirely upon subjective words.
• Categorize these words according to morphological rules, which work at the word level. These
rules can change the structure, meaning, and part of speech of the words. For example, rules for
the marking of an adjective with the noun it qualifies, etc.
• Identify their grammatical rules, which describe the possible structures of a sentence and the
position of the parts of speech with respect to each other. As Urdu is a free word order
language, these rules are more difficult to define and implement. For example, the use of
modifiers with adjectives or the use of auxiliaries with verbs, etc.
• Discover relationships between different lexicon entries. These relationships can define
synonyms, antonyms, cross references, etc.
• Decide and annotate polarities and then intensities to the entries. In this task, the entries
are first categorized as positive or negative and then their intensity scores are attached to
them. Some entries have only orientations, some have only intensities (like modifiers) and some
have both values.
Lexicon Structure: It is assumed that the lexicon entries are either subjective or objective. The
objective terms are saved without any polarity mark, but the subjective terms are further
categorized on the basis of orientation and intensity into three types:
• Terms with orientation only, T(O). These are the terms which are either absolute positive or
absolute negative. No degree of positivity or negativity is attached to them.
• Terms with intensity only, T(I). These are the terms which have no orientation but can
intensify the orientation of other words in the sentence.
• Terms with both orientation and intensity, T(O, I). If a term carries both orientation (either
positive or negative) and intensity, then it lies in this category and is marked with both
values.
Some examples of lexicon entries from all three categories, i.e., T(O), T(I) and T(O, I), are
given in Table 1. For example, the word “کامیاب” (kamyaab, successful) has positive orientation
but no intensity. Similarly, “زیادہ” (zyada, more) and “بہت” (bohat, very) both have intensity
and no orientation, whereas “بہتر” (behtar, better) and “بہترین” (behtareen, best) both have
positive orientation with intensities of comparative and superlative degrees, respectively.
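
One possible way to hold these three categories in a data structure is sketched below in Python. This is only an illustration under stated assumptions: the `LexiconEntry` class, its field names, the `category` property and all numeric intensity values are hypothetical and do not come from the paper, which specifies only which attributes each entry carries.

```python
from dataclasses import dataclass
from typing import Optional

# Example words from the text (transliterations in the comments).
KAMYAAB = "کامیاب"      # kamyaab, successful
ZYADA = "زیادہ"         # zyada, more
BOHAT = "بہت"           # bohat, very
BEHTAR = "بہتر"         # behtar, better
BEHTAREEN = "بہترین"    # behtareen, best


@dataclass(frozen=True)
class LexiconEntry:
    word: str
    orientation: Optional[int] = None   # +1 or -1; None when the term has none
    intensity: Optional[float] = None   # assumed numeric scale; None when absent

    @property
    def category(self) -> str:
        """Derive the category from which attributes are present."""
        if self.orientation is not None and self.intensity is not None:
            return "T(O, I)"
        if self.orientation is not None:
            return "T(O)"
        if self.intensity is not None:
            return "T(I)"
        return "objective"


# Intensity numbers below are placeholders, not values from the paper.
LEXICON = {entry.word: entry for entry in [
    LexiconEntry(KAMYAAB, orientation=+1),
    LexiconEntry(ZYADA, intensity=1.5),
    LexiconEntry(BOHAT, intensity=2.0),
    LexiconEntry(BEHTAR, orientation=+1, intensity=2.0),
    LexiconEntry(BEHTAREEN, orientation=+1, intensity=3.0),
]}

for word in (KAMYAAB, BOHAT, BEHTAREEN):
    print(LEXICON[word].category)
```

Deriving the category from the attributes, rather than storing it, means an entry cannot be labelled T(O) while also carrying an intensity.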
Figure 2. Integration of sentiment-annotated lexicon of Urdu words with the sentiment classifier
System Integration: The annotated lexicon of Urdu words is integrated with the sentiment
classifier as shown in Figure 2. First, the given text in the form of a review is taken from the
website. The sentiment classifier component of the system preprocesses this review and segments
it into sentences and then words. These words are then tagged with their respective parts of
speech. The tagged words are compared with the lexicon entries for sentiment orientations and
intensities. This comparison results in polarity marked (polarity annotated) words and phrases.
The classifier then calculates the sentiment orientation of the sentences using term polarities.
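
The flow described above (segment, look up, combine term polarities) can be sketched roughly as follows. The sketch makes several assumptions that the paper does not state: part-of-speech tagging and chunking are skipped, a modifier is taken to scale the term that directly follows it, a negator is taken to flip the term that directly precedes it, and sentence scores are simply summed to obtain the review polarity. The lexicon contents, numeric values and all function names are hypothetical.

```python
import re

# Tokens for the toy review (transliterations in the comments).
DRAMA = "ڈرامہ"     # drama (objective term)
ADAKAR = "اداکار"   # adakar, actor (objective term)
BOHAT = "بہت"       # bohat, very
ACHA = "اچھا"       # acha, good
NAHEEN = "نہیں"     # naheen, not
HAI = "ہے"          # hai, is

# word -> (orientation, intensity); None marks a missing attribute.
LEXICON = {ACHA: (+1, None), BOHAT: (None, 2.0)}
NEGATORS = {NAHEEN}
# Urdu full stop, Arabic question mark and common Latin sentence enders.
SENTENCE_END = re.compile("[\u06d4\u061f.!?]")


def sentence_score(words):
    score = 0.0
    boost = 1.0   # pending intensity from a directly preceding modifier
    last = 0.0    # contribution of the directly preceding subjective term
    for word in words:
        if word in NEGATORS:
            score -= 2 * last          # undo the contribution and invert it
            boost, last = 1.0, 0.0
            continue
        orientation, intensity = LEXICON.get(word, (None, None))
        if orientation is None:
            # Modifier (intensity only) or objective term (ignored).
            boost = intensity if intensity is not None else 1.0
            last = 0.0
            continue
        last = orientation * (intensity or 1.0) * boost
        score += last
        boost = 1.0
    return score


def classify_review(text):
    sentences = SENTENCE_END.split(text)
    total = sum(sentence_score(s.split()) for s in sentences)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


# "The drama is very good. The actor is not good."
review = f"{DRAMA} {BOHAT} {ACHA} {HAI}\u06d4 {ADAKAR} {ACHA} {NAHEEN}\u06d4"
print(classify_review(review))  # first sentence +2.0, second -1.0 -> positive
```

Objective terms never touch the score, which mirrors the statement that the classifier ignores them and depends only on subjective words.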
RESULTS AND DISCUSSION
As already mentioned, corpora of reviews in Urdu text are not available in electronic form.
Although some other corpora related to news and blogs are accessible, these are not appropriate
for the experimentation and evaluation of our system because they do not contain opinionated text
like reviews.
Therefore, two corpora are manually collected as test-beds from the domains of movies and
electronic appliances. These reviews are taken from different people to avoid monotonous
opinions. The movie reviews corpus MR (movie reviews) comprises 450 reviews in total, of which
226 are positive and 224 negative. The PR (product reviews) corpus contains 328 reviews of
electronic appliances, of which 177 are positive and 151 negative.
For measuring the performance, accuracy is used as the system performance metric. It is the
measure of how close the document classification suggested by our system is to the actual
sentiments present in the review. A series of experiments is performed on both corpora, one after
another.
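
As a small worked illustration of the metric, accuracy can be computed as the fraction of reviews whose predicted label matches the actual label. The function name and the toy labels below are hypothetical and are not taken from the MR or PR corpora.

```python
def accuracy(predicted, actual):
    """Fraction of items whose predicted label equals the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one labelled review is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy labels for four reviews; three of the four predictions are right.
actual = ["positive", "positive", "negative", "negative"]
predicted = ["positive", "negative", "negative", "negative"]
print(accuracy(predicted, actual))  # 3 / 4 = 0.75
```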
Table 2 shows the results, with accuracy of 66% (negative reviews) to 74% (positive reviews) for
MR and 77% (negative reviews) to 79% (positive reviews) for PR. It thus also gives the variation
in the classification of positive and negative reviews, separately.
Table 2. Results of experimentation on both corpora
Category   Corpus   Accuracy
Negative   MR       66%
Negative   PR       77%
Positive   MR       74%
Positive   PR       79%
Conclusions: This research work presents the structure, development and integration of a
sentiment-annotated lexicon, developed as a component of an Urdu text based sentiment analysis
system. Urdu is a morphologically rich language and hence poses many challenges for the
development of such a lexicon. Moreover, due to the unavailability of electronic text and corpora
of opinionated reviews, the task becomes even more time consuming. The next step after the
development of the lexicon is its integration with the sentiment classifier and the final
implementation of the complete system. Two types of corpora are used for testing, i.e., movie and
product reviews. Despite the inherent complexities of the language, the experimentation gives
encouraging results with an accuracy of about 74%. Therefore, it is planned to extend this
lexicon on the same structure but with larger coverage of words.
REFERENCES
Andreevskaia, A. and S. Bergler: Mining WordNet for fuzzy sentiment: Sentiment tag extraction
    from WordNet glosses. In: EACL 2006, Trento, Italy, (2006).
Annett, M. and G. Kondrak: A comparison of sentiment analysis techniques: Polarizing movie blogs.
    In: Bergler, S. (ed.) Canadian AI 2008. LNCS (LNAI), vol. 5032, pp. 25–35. Springer,
    Heidelberg, (2008).
Glaser, J., J. Dixit and P. D. Green: Studying hate crime with the Internet: What makes racists
    advocate racial violence, Journal of Social Issues 58, 1, 177-193, (2002).
Hatzivassiloglou, V. and J. Wiebe: Effects of Adjective Orientation and Gradability on Sentence
    Subjectivity. In: 18th International Conference on Computational Linguistics, New Brunswick,
    NJ, (2000).
Higashinaka, R., M. Walker and R. Prasad: Learning to generate naturalistic utterances using
    reviews in spoken dialogue systems. ACM Transactions on Speech and Language Processing
    (TSLP), (2007).
Hu, M. and B. Liu: Mining and summarizing customer reviews. In: Conference on Human Language
    Technology and Empirical Methods in Natural Language Processing, (2005).
Humayoun, M., H. Hammarström and A. Ranta: Urdu morphology, orthography and lexicon extraction.
    In: A. Farghaly and K. Megerdoomian (Eds.), Proceedings of the 2nd Workshop on Computational
    Approaches to Arabic Script-based Languages, pp. 59–66. Stanford LSA (2007).
Ijaz, M. and S. Hussain: Corpus based Urdu Lexicon Development. In: Conference on Language
    Technology (CLT 2007), University of Peshawar, Pakistan, (2007).
Muaz, A., A. Ali and S. Hussain: Analysis and Development of Urdu POS Tagged Corpora. In:
    Proceedings of the 7th Workshop on Asian Language Resources, IJCNLP, (2009).
Mukund, S., D. Ghosh and R. K. Srihari: Using Cross-Lingual Projections to Generate Semantic Role
    Labeled Corpus for Urdu - A Resource Poor Language. In: 23rd International Conference on
    Computational Linguistics COLING, (2010).
Pang, B. and L. Lee: Opinion mining and sentiment analysis. Foundations and Trends in Information
    Retrieval 2(1-2), 1–135, (2008).
Riloff, E., J. Wiebe and T. Wilson: Learning subjective nouns using extraction pattern
    bootstrapping. In: Proceedings of the Conference on Natural Language Learning (CoNLL),
    pp. 25–32, (2003).
Turney, P.: Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification
    of reviews. In: Proceedings of the Association for Computational Linguistics (ACL),
    pp. 417–424, (2002).
Yu, H. and V. Hatzivassiloglou: Towards answering opinion questions: Separating facts from
    opinions and identifying the polarity of opinion sentences. In: Proceedings of the Conference
    on Empirical Methods in Natural Language Processing (EMNLP), (2003).