Unsupervised Linguistic Approach for Sentiment Classification from Online Reviews Using Sentiwordnet 3.0

DOI : 10.17577/IJERTV2IS90012

Monalisa Ghosh

Dept. of Computer Science and Engineering GNIT, West Bengal University of Technology West Bengal, India

Animesh Kar

HOD, Dept. of Computer Application GNIT, West Bengal University of Technology

West Bengal, India

Abstract

Sentiment analysis is an area of text classification that emerged early in the last decade and has recently received a lot of attention from researchers. It involves analyzing datasets (online reviews, social media, blogs, and discussion groups) that contain opinions, with the objective of classifying those opinions as positive, negative, or neutral. Opinions play an essential part in our information-gathering behaviour before we take a decision. In this work we describe a simple technique for sentiment classification based on an unsupervised linguistic approach. Our pattern-based method applies a classification rule according to which each review is classified as positive or negative. We use SentiWordNet to calculate the overall sentiment score of each sentence. The results indicate that SentiWordNet can be used as an important resource for sentiment classification tasks. Possible further improvements to the method are also discussed.

Keywords: Sentiment Analysis, Sentiment Classification, Linguistic, Opinion Mining, SentiWordNet.

  1. Introduction:

    The World Wide Web and the Internet provide a forum through which an individual's decision-making process may be influenced by the opinions of others. For example, the customer feedback system used by amazon.com allows customers to use free-form text to rate the products and services they received. These ratings influence customers who prefer to read reviews before making a purchase decision, allowing them to make a more informed decision. Customer feedback and product evaluations can also be found at many online sites, including epinions.com and amazon.com. These kinds of online media have produced large quantities of textual data containing opinions and facts.

    The main objective of sentiment classification is to determine the polarity of comments (positive, negative or neutral) by extracting the features and components of the object that have been commented on in each document. Sentiment classification can be performed at the word/phrase level, the sentence level, or the document level. Sentiment analysis and classification are technically challenging because opinions can be expressed in subtle and complex ways, involving slang, ambiguity, sarcasm, irony and idiom.

    Sentiment analysis and classification are performed for several reasons: for example, to track the ups and downs of aggregate attitudes to a brand or product, to compare the attitudes of online customers towards one brand or product and another, and to pull out examples of particular types of positive or negative statements on some topic. They may also be used to enhance customer relationship management and to help other potential customers make informed choices.

    In this work, we propose a technique for sentence-level sentiment classification of product reviews using SentiWordNet, a resource containing opinion information on terms extracted from the WordNet database and made publicly available for opinion mining tasks and research purposes.

    The rest of the paper is organized as follows: Section 2 presents related research. Section 3 describes the proposed technique. Section 4 describes each step of the method in detail. Section 5 presents the results, and Section 6 concludes the paper.

  2. Background and Related Work:

    Sentiment analysis (or opinion mining) has attracted growing attention among computational linguists in recent years, in no small part because of the emergence of the Web, which provides both a vast corpus and a variety of potential applications. Many studies in the field have addressed sentiment-based classification. Looking at related work and the methods adopted, there are two main approaches to sentiment classification: those based on machine learning techniques and those based on linguistic (semantic orientation) techniques. In our study, sentiment analysis is performed at the sentence level (phrase-level sentiment analysis) and sentiment classification is based on a linguistic approach: our pattern-based method applies a classification rule according to which each review is classified as positive or negative depending on its overall sentiment score, calculated with the aid of SentiWordNet [1]. There is a growing number and variety of research papers on sentiment analysis and classification that use a linguistic approach; a few of them are discussed here.

    The first major attempt to classify words automatically according to their polarity was probably Hatzivassiloglou and McKeown (1997) [2]. Instead of the internet, they used the Wall Street Journal corpus, and only concerned themselves with whether a word was positive or negative.

    Turney [3] proposed an algorithm that takes a written review as input and produces a classification as output in three steps: (1) using a part-of-speech tagger to identify phrases in the review that contain adjectives or adverbs, (2) estimating the semantic orientation of each extracted phrase, and (3) assigning the review to a class, either recommended or not recommended, based on the average semantic orientation of the extracted phrases. If the average is positive, the review is assumed to recommend the item; otherwise, the item is not recommended. The PMI-IR algorithm (pointwise mutual information and information retrieval) is used to measure the similarity of pairs of words or phrases in order to estimate the semantic orientation of a phrase.
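    Steps (2) and (3) can be illustrated with a short Python sketch. In Turney's formulation [3], the semantic orientation of a phrase is PMI(phrase, "excellent") minus PMI(phrase, "poor"), which reduces to a log-ratio of search-engine hit counts. The sketch below takes those counts as plain numbers instead of querying a search engine; the counts in the usage line are hypothetical, the smoothing constant (0.01 here) only guards against division by zero, and phrase extraction (step 1) is not shown.

```python
import math
from typing import Sequence


def semantic_orientation(near_excellent: float, near_poor: float,
                         hits_excellent: float, hits_poor: float,
                         smoothing: float = 0.01) -> float:
    """SO of a phrase from hit counts, in the log-ratio form of PMI-IR.

    SO = log2( hits(phrase NEAR "excellent") * hits("poor")
             / (hits(phrase NEAR "poor") * hits("excellent")) )
    A small constant is added to the NEAR counts to avoid division by zero.
    """
    numerator = (near_excellent + smoothing) * hits_poor
    denominator = (near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


def classify_review(phrase_orientations: Sequence[float]) -> str:
    """Step 3: recommend the item if the average phrase orientation is positive."""
    average = sum(phrase_orientations) / len(phrase_orientations)
    return "recommended" if average > 0 else "not recommended"


# Hypothetical hit counts for one extracted phrase.
so = semantic_orientation(near_excellent=120, near_poor=30,
                          hits_excellent=1000, hits_poor=1000)
print(round(so, 2), classify_review([so, -0.5]))
```

    A phrase that co-occurs four times as often with "excellent" as with "poor" gets an orientation of about +2; the review label then depends only on the sign of the average over all extracted phrases.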

    Zhang and Liu [4] identified domain-dependent opinion words: nouns and noun phrases that denote product features and imply opinions are found using a feature-based opinion mining model. They also produce a list of candidate features with positive opinions and a list of candidate features with negative opinions.

    An opinion lexicon compiled by Ding et al. (2008) [5] was used to identify the opinion polarity of each product feature in a sentence. For a sentence s that contains a product feature f, the opinion words in the sentence are first identified by matching them against the words in the opinion lexicon. An orientation score for f is then computed: each positive word is assigned a score of +1 and each negative word a score of -1. The scores are summed; if the final score is positive, the opinion on the feature in s is positive, and if it is negative, the opinion on the feature in s is negative.
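    The scoring rule just described fits in a few lines of Python. This is a minimal sketch, not the implementation of [5]: the four-word lexicon is a hypothetical stand-in for the compiled opinion lexicon, tokens are matched case-insensitively, the sentence is assumed to have already been found to mention the feature f, and treating a zero total as neutral is an assumption, since the description above only covers positive and negative totals.

```python
from typing import Iterable, Mapping

# Hypothetical toy lexicon standing in for the opinion lexicon of [5]:
# +1 for positive opinion words, -1 for negative ones.
TOY_LEXICON: Mapping[str, int] = {"great": 1, "sharp": 1, "poor": -1, "blurry": -1}


def feature_orientation(sentence_tokens: Iterable[str],
                        lexicon: Mapping[str, int] = TOY_LEXICON) -> str:
    """Sum the +1/-1 scores of the opinion words in a sentence that mentions a feature."""
    score = sum(lexicon.get(token.lower(), 0) for token in sentence_tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumption: the zero case is not specified in the summary above


label = feature_orientation("The lens is sharp but the flash is poor and blurry".split())
```

    Here one positive and two negative opinion words give a total of -1, so the opinion expressed in the sentence is negative.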

    Ohana and Tierney [6] proposed a sentiment classification technique using features built from the SentiWordNet database of term polarity scores. Their approach counts positive and negative term scores to determine sentiment orientation. They also improved on this by building a data set of relevant features using SentiWordNet and training a machine learning classifier on it. They implemented a negation detection algorithm to adjust SentiWordNet scores for negated terms, and set a threshold value for cases where multiple SentiWordNet scores were found for a term.

    Dave et al. [7] developed ReviewSeer, a document-level opinion classifier that uses statistical techniques and POS tagging information to sift through and synthesize product reviews, essentially automating the sort of work done by aggregation sites or clipping services. They first used structured reviews for training and testing, identifying appropriate features and scoring methods from information retrieval for determining whether reviews are positive or negative. This approach performed as well as traditional machine learning methods.

    Yessenalina et al. [8] proposed a two-level approach to document-level sentiment classification that extracts useful sentences and predicts document-level sentiment from the extracted sentences. Unlike previous learning methods for this task, their model does not rely on gold-standard sentence-level subjectivity annotations and optimizes directly for document-level performance. The model was evaluated on a movie review dataset and on U.S. Congressional floor debates.

  3. Proposed Technique

    Figure 1 gives an architectural overview of our approach to sentiment classification of online reviews. In this work, we propose a domain-dependent, rule-based method for semantically classifying sentiment in online customer reviews and comments. The method takes a review, checks whether each sentence is objective or subjective, and decides the sentence's semantic orientation using lexical contextual information at the sentence level. Our approach can be summarized in the following steps, which are described in detail in the next section.

    • Collect reviews from online resources, split all reviews into sentences, remove stop words, and store the sentences in a database.

    • Use the Stanford POS tagger to tag each term with its part of speech.

    • Create a list of the identified noun features and then extract the noun phrases.

    • Classify the sentences into objective and subjective sentences by identifying the opinion sentences.

    • For each opinion sentence, determine the semantic orientation of its words by assigning each word a weight from the SentiWordNet dictionary.

    • Calculate the final weights of each sentence and review to decide whether it is positive or negative.
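    The first step in this list (splitting reviews into sentences, removing stop words, and storing the result) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the regular-expression sentence splitter, the small STOP_WORDS set, and the SQLite table layout are hypothetical, since the paper only says that prepositions and irrelevant words were removed and that the data were kept in a relational database. Crawling and spelling correction are not shown.

```python
import re
import sqlite3

# Hypothetical stop-word list; the paper does not publish the one it used.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "on", "at", "for", "with"}


def split_sentences(review: str) -> list:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [part for part in re.split(r"(?<=[.!?])\s+", review.strip()) if part]


def remove_stop_words(sentence: str) -> str:
    """Drop whitespace-separated tokens found (case-insensitively) in STOP_WORDS."""
    return " ".join(w for w in sentence.split() if w.lower() not in STOP_WORDS)


def store_reviews(reviews, conn: sqlite3.Connection) -> None:
    """Store each cleaned sentence with its review and sentence index (hypothetical schema)."""
    conn.execute("CREATE TABLE IF NOT EXISTS sentences "
                 "(review_id INTEGER, sent_id INTEGER, text TEXT)")
    for review_id, review in enumerate(reviews):
        for sent_id, sentence in enumerate(split_sentences(review)):
            conn.execute("INSERT INTO sentences VALUES (?, ?, ?)",
                         (review_id, sent_id, remove_stop_words(sentence)))
    conn.commit()


conn = sqlite3.connect(":memory:")
store_reviews(["The zoom is great. The battery dies in a day!"], conn)
rows = conn.execute("SELECT review_id, sent_id, text FROM sentences "
                    "ORDER BY review_id, sent_id").fetchall()
```

    Keeping the review and sentence indices in every row is one simple way to preserve the one-to-one correspondence between raw sentences and their later POS-tagged versions.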

    Figure 1: Architectural overview (components: Review set, Review Crawler, Split Sentences, Remove stop words, POS tagging, Extract candidate feature, Noun phrase, Opinion word, Feature & opinion, Determine opinion sentences, SentiWordNet Interpretation, Semantic score of opinion sentence, Calculate the review score)

  4. Method

    The sentiment classification process of the proposed method consists of the following steps, each described in a subsection below: pre-processing, review sentence tagging, feature selection, phrase extraction, and sentence classification using SentiWordNet.

    4.1 Pre-processing:

    First, we collect textual data from online sources as HTML pages and store it in a relational database; all reviews are then split into sentences to form a bag of sentences (BoS).

    Pre-processing of the data included removing noise from sentences through spelling correction, removing ellipses (...), etc. At this phase we also remove a few stop words (prepositions and irrelevant words), which noticeably improved performance. Most of this work was done to ensure that we could maintain a one-to-one correspondence between the raw data provided and the data we generated after POS tagging with the Stanford tagger.

    4.2 Review sentence tagging:

    A POS tagger parses a string of words (e.g., a sentence) and tags each term with its part of speech. For POS tagging we used the Stanford POS tagger because of its good performance and its pre-trained tagging models, which let us tag the review texts without any training data. The tagger splits the text into sentences and produces a part-of-speech tag for each word (noun, verb, adjective, etc.). The following shows two review sentences with POS tags:

      I/PRP recommend/VB this/DT superb/NN camera/NN to/TO all/DT I/PRP used/VBN until/IN now/RB EOS/EOS

      20D/CD but/CC this/DT is/VBZ really/RB big/JJ step/NN up/IN with/IN quality/NN viewfinder/NN cleaning/NN system,/NN performance/NN.

    Each sentence is saved in the review database along with the POS tag of each word in the sentence. After the POS tagging phase, the sentences and words in each review are tokenized. In this step, we collect the necessary information about the features of the review.

    4.3 Feature Extraction:

    Feature selection is one of the most important parts of our experiment, because users mainly target product features when they express their opinions about a product. The process consists of the following steps.

    • We scan the tagged sentences to retrieve all frequent nouns whose frequency lies between a minimum and a maximum threshold, and create a list of these nouns as candidate features.

    • Although all parts of speech are important, people most commonly use adjectives to express opinions. For each sentence and each noun in the candidate feature list, we check whether an adjective occurs immediately before or after the noun, and make a list of those relevant adjectives.

    • We then scan each sentence again to extract the nouns adjacent to the adjectives stored in the adjective list, and add these nouns to the list of candidate features.

    • It is difficult to find features that appear only implicitly in the reviews. For example, in the sentence "While light, it will not easily fit in pockets", the feature is the size of the camera. In this situation, the adverbs and adjectives that indicate the implicit feature are called feature indicators [9]. We considered a number of reviews and manually created a list of 15 domain-dependent feature indicators (for example, fit and expensive for the features size and price, respectively) to identify features that appear implicitly.

    4.4 Phrase Extraction:

    In product reviews, people like to express their opinions in short, simple sentences of the form product feature + opinion word. Opinion words mostly appear around the product features in review sentences, so simple collocations and patterns between opinion words and product features can be captured within a window.

    We extract noun phrases in the following manner:

    • First, extract the adjectives or opinion words only from those sentences that contain at least one of the selected features. Sentences with opinion words and a selected feature are called opinion sentences [10].

    • For each opinion sentence, determine the distance between the frequent feature and the opinion word by checking whether the position of the opinion word lies between (feature - 5) and (feature + 5), i.e. within five words on either side of the feature, and treat this pattern as a noun phrase.

    4.5 Sentiment Classification using SentiWordNet:

    SentiWordNet [1] is a lexical resource for opinion mining. It assigns to each synset of WordNet [11] three numerical sentiment scores, Obj(s), Pos(s) and Neg(s), describing how objective, positive and negative the terms contained in the synset are. Each score ranges from 0.0 to 1.0, and the three scores sum to 1.0 for each synset. Each entry contains the part-of-speech category of the entry, its positivity, its negativity, and the list of synonyms. Each word or lemma appears in the form lemma#sense-number, where the first sense corresponds to the most frequent one; different senses of a word can have different polarities. Examples of sentiment scores associated with SentiWordNet entries are shown in Figure 2.

    | POS | ID | PosScore | NegScore | SynsetTerms |
    |---|---|---|---|---|
    | a | 00001740 | 0.125 | 0 | able#1 |
    | a | 00002098 | 0 | 0.75 | unable#1 |
    | a | 00005205 | 0.5 | 0 | absolute#1 |
    | n | 06768259 | 0.125 | 0 | cheap_shot#1 |
    | v | 00514730 | 0.375 | 0 | destress#1 |

    Figure 2: SentiWordNet Fragment

    For our experiment we used the formulas described below to calculate positive and negative scores.

    • For a given lemma with n senses (lemma#n), a formula could be applied to all n posScores and negScores of the lemma. In our experiment, based on [12], [13], [14], we consider only the first (and thus most frequent) sense of the given lemma:

      $$posScore = posScore_1, \qquad negScore = negScore_1$$

    • Having identified the subjective sentences, we determine the positive and negative scores of each such sentence by averaging the positive and negative scores of its terms, using the following formula taken from [15]:

      $$SentenceScore = \frac{1}{n}\sum_{i=0}^{n-1} score(i) \qquad \text{(Formula 1)}$$

      where SentenceScore is the positive or negative score of the sentence, score(i) is the positive or negative score of the i-th word in the sentence, and n is the number of words in the sentence.

    • Finally, to decide whether a review is positive or negative, we calculate the review score as the average of the positive and negative scores of its opinionated sentences, using the following formula taken from [16]:

      $$ReviewScore = \frac{1}{n}\sum_{i=0}^{n-1} SentenceScore(i) \qquad \text{(Formula 2)}$$

      where ReviewScore is the positive or negative score of the review, SentenceScore(i) is the positive or negative score of the i-th sentence in the review, and n is the total number of sentences in the review.
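    The feature-selection and phrase-extraction steps of Sections 4.3 and 4.4 can be sketched in Python, starting from tagged text in the word/TAG format of Section 4.2. This is a minimal illustration, not the authors' implementation: the frequency thresholds (min_count, max_count), the toy tagged sentences, and all helper names are hypothetical; the indicator table holds only the two examples named in Section 4.3 (fit for size, expensive for price); opinion words are approximated by the adjectives found next to candidate features; and the window follows the (feature - 5) to (feature + 5) rule.

```python
from collections import Counter

# Hypothetical: the paper lists 15 indicators but names only these two.
FEATURE_INDICATORS = {"fit": "size", "expensive": "price"}


def parse_tagged(text, sep="/"):
    """'word/TAG word/TAG ...' -> [(word, tag), ...], splitting on the last separator."""
    pairs = []
    for token in text.split():
        word, found, tag = token.rpartition(sep)
        if found and word:
            pairs.append((word.lower(), tag))
    return pairs


def is_noun(tag):
    return tag.startswith("NN")


def is_adj(tag):
    return tag.startswith("JJ")


def neighbours(sentence, i):
    """Indices of the tokens immediately before and after position i."""
    return [j for j in (i - 1, i + 1) if 0 <= j < len(sentence)]


def extract_features(sentences, min_count=2, max_count=1000):
    """Candidate features and adjacent adjectives; the thresholds are hypothetical."""
    counts = Counter(w for s in sentences for w, t in s if is_noun(t))
    features = {w for w, c in counts.items() if min_count <= c <= max_count}
    opinion_words = set()
    # Adjectives immediately before or after a candidate feature.
    for s in sentences:
        for i, (w, t) in enumerate(s):
            if is_noun(t) and w in features:
                opinion_words |= {s[j][0] for j in neighbours(s, i) if is_adj(s[j][1])}
    # Nouns adjacent to those adjectives join the candidate features.
    for s in sentences:
        for i, (w, t) in enumerate(s):
            if is_adj(t) and w in opinion_words:
                features |= {s[j][0] for j in neighbours(s, i) if is_noun(s[j][1])}
    return features, opinion_words


def implicit_features(sentence):
    """Features signalled only by a feature-indicator word."""
    return {FEATURE_INDICATORS[w] for w, _ in sentence if w in FEATURE_INDICATORS}


def feature_windows(sentence, features, opinion_words, width=5):
    """Words from (feature - width) to (feature + width) in an opinion sentence."""
    words = [w for w, _ in sentence]
    if not (features & set(words) and opinion_words & set(words)):
        return []  # not an opinion sentence: no feature or no opinion word
    return [words[max(0, i - width): i + width + 1]
            for i, w in enumerate(words) if w in features]


reviews = [
    "Great/JJ zoom/NN and/CC sharp/JJ lens/NN",
    "The/DT zoom/NN is/VBZ slow/JJ",
    "A/DT great/JJ screen/NN",
]
sentences = [parse_tagged(r) for r in reviews]
features, opinion_words = extract_features(sentences)
windows = feature_windows(sentences[2], features, opinion_words)
```

    In the toy data only "zoom" passes the frequency threshold; "great" is found next to it, and "screen" then joins the feature list because it is adjacent to "great" in the third sentence. The second sentence is not an opinion sentence, since "slow" was never linked to a feature.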
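    The scoring of Section 4.5 (first-sense lookup, Formula 1 and Formula 2) can be sketched in Python as follows. Assumptions, all hypothetical: tokens are already lemmatized and mapped to SentiWordNet's POS letters (a, n, v, r); the "first sense" is taken to be the lowest sense number listed for a lemma; words with no SentiWordNet entry score zero but still count toward n in Formula 1; and, since the text does not spell out the final decision rule, a review is labelled positive when its averaged positive score exceeds its averaged negative score. The lexicon is built only from the Figure 2 rows, written in SentiWordNet 3.0's tab-separated layout; a real run would read the full SentiWordNet 3.0 file.

```python
from collections import defaultdict

# The Figure 2 rows in SentiWordNet 3.0's tab-separated layout
# (POS, ID, PosScore, NegScore, SynsetTerms, Gloss); glosses are left empty.
SWN_SAMPLE = (
    "# POS\tID\tPosScore\tNegScore\tSynsetTerms\tGloss\n"
    "a\t00001740\t0.125\t0\table#1\t\n"
    "a\t00002098\t0\t0.75\tunable#1\t\n"
    "a\t00005205\t0.5\t0\tabsolute#1\t\n"
    "n\t06768259\t0.125\t0\tcheap_shot#1\t\n"
    "v\t00514730\t0.375\t0\tdestress#1\t\n"
)


def load_first_senses(text):
    """Map (lemma, pos) -> (posScore, negScore) of the lemma's lowest-numbered sense."""
    senses = defaultdict(dict)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        pos, _synset_id, pos_score, neg_score, terms = line.split("\t")[:5]
        for term in terms.split():
            lemma, _, sense = term.rpartition("#")
            senses[(lemma, pos)][int(sense)] = (float(pos_score), float(neg_score))
    return {key: by_sense[min(by_sense)] for key, by_sense in senses.items()}


def average_pair(pairs):
    """Average a list of (pos, neg) pairs; shared by Formula 1 and Formula 2."""
    n = len(pairs)
    if n == 0:
        return (0.0, 0.0)
    return (sum(p for p, _ in pairs) / n, sum(q for _, q in pairs) / n)


def sentence_score(tokens, lexicon):
    """Formula 1 over (lemma, pos) tokens; words missing from the lexicon score (0, 0)."""
    return average_pair([lexicon.get(token, (0.0, 0.0)) for token in tokens])


def review_score(sentence_scores):
    """Formula 2: average of the sentence scores of one review."""
    return average_pair(sentence_scores)


def classify(score):
    """Assumed decision rule: positive when the positive score exceeds the negative one."""
    pos, neg = score
    return "positive" if pos > neg else "negative"


lexicon = load_first_senses(SWN_SAMPLE)
s1 = sentence_score([("able", "a"), ("camera", "n")], lexicon)
s2 = sentence_score([("unable", "a"), ("absolute", "a")], lexicon)
review = review_score([s1, s2])
review_label = classify(review)
```

    In the usage lines, the first sentence scores (0.0625, 0.0) and the second (0.25, 0.375); their average (0.15625, 0.1875) has the larger negative component, so the review is labelled negative under the assumed rule.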

  5. Experimental Results:

    In this section, we describe our dataset and evaluate the proposed method.

    5.1 Dataset Collection:

    All experiments were performed on a dataset of online digital camera reviews. We considered three digital cameras, Canon EOS40D, Nikon Coolpix and Nikon D3SLR; the reviews were crawled from Amazon.com and ebay.com. The dataset consists of 200 positive and 100 negative reviews in individual text files. Each review contains at least 10 sentences.

    Table I presents the whole database: the products, their type, the number of customer reviews, and the average number of sentences per review.

    Table I: Database

    | Product | Product type | No. of reviews | Average lines/review |
    |---|---|---|---|
    | Canon EOS40D | Digital Camera | 112 | 11 |
    | Nikon coolpix | Digital Camera | 86 | 13 |
    | Nikon D3SLR | Digital Camera | 59 | 7 |

    Table II: Classification of subjective and objective sentences

    | Product name | No. of reviews | Sentences | Subjective: positive | Subjective: negative | Objective |
    |---|---|---|---|---|---|
    | Canon EOS40D | 112 | 1232 | 515 | 229 | 388 |
    | Nikon coolpix | 86 | 1118 | 459 | 336 | 323 |
    | Nikon D3SLR | 59 | 413 | 168 | 71 | 174 |

    We use SentiWordNet to find the semantic orientation of each subjective (opinion) sentence, and we calculate the final semantic score of each review using Formulas 1 and 2 to decide whether it is positive or negative. We use precision and recall to evaluate the final results, which are shown in Table III.

    We used precision and recall as evaluation metrics to measure the performance of our method. Precision is the fraction of true positives (TP) among all retrieved instances, i.e. true positives plus false positives (FP), and recall is the fraction of true positives among all positive instances, i.e. true positives plus false negatives (FN). We also used the F1-measure, defined as the harmonic mean of precision (P) and recall (R):

    $$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2}{\frac{1}{P} + \frac{1}{R}} = \frac{2PR}{P + R}$$
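    These definitions translate directly into code. A minimal Python sketch: the confusion counts in the first usage line are hypothetical, while the second call recomputes the positive-review F1-measure of the Nikon D3SLR row in Table III from its precision (85.65%) and recall (83.33%), giving approximately 84.47, as reported.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


p, r, f = precision_recall_f1(tp=80, fp=20, fn=10)  # hypothetical counts
d3slr_positive_f1 = f1_from(85.65, 83.33)           # percentages from Table III
```

    Because F1 is scale-invariant in this form, it can be computed either from fractions or directly from the percentages reported in the tables.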

    Table II and Table III present the evaluation results.

    We identified the subjective (opinion) sentences using the technique described in Section 4; Table II shows the resulting classification of subjective and objective sentences.

    Table III: Results of sentence orientation

    | Product name | Positive reviews: Recall (%) | Positive reviews: Precision (%) | Positive reviews: F1-measure | Negative reviews: Recall (%) | Negative reviews: Precision (%) | Negative reviews: F1-measure |
    |---|---|---|---|---|---|---|
    | Canon EOS40D | 84.92 | 88.60 | 86.73 | 77.96 | 72.17 | 74.95 |
    | Nikon coolpix | 81.14 | 78.49 | 79.79 | 81.87 | 76.66 | 79.17 |
    | Nikon D3SLR | 83.33 | 85.65 | 84.47 | 80.69 | 86.27 | 83.39 |

  6. Conclusion:

In this paper we have proposed a set of techniques that perform linguistic semantic analysis of text to identify opinions in the sentences of customer review documents. We have also proposed a rule-based method to identify the sentiment polarity of opinion sentences. For this purpose we used SentiWordNet, which is a valuable lexicon for classifying the sentiment of online reviews. A feature selection method based on sentence tagging is employed first, and opinionated sentences are then identified based on feature and opinion words. The score of the most frequent sense is assigned to each term using SentiWordNet, and finally the sentence score is calculated to decide whether the sentence is positive or negative. Our experimental results indicate that the proposed techniques achieve reasonable performance, with F1-measures between 74.95% and 86.73% across the three products (Table III). In future work, we plan to further improve and refine our techniques, including developing a method to extend the list of product-dependent features and feature attributes. We also want to handle negation expressions that reverse the polarity of sentence sentiment, and to combine rule-based and machine learning methods to improve the accuracy of the results.

References

  1. A. Esuli and F. Sebastiani (2006). SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining. In Proceedings of the International Conference on Language Resources and Evaluation (LREC), Genoa.

  2. V. Hatzivassiloglou and K. R. McKeown (1997). Predicting the semantic orientation of adjectives. In Proceedings of the 8th Conference of the European Chapter of the Association for Computational Linguistics. Pages: 174-181.

  3. Peter D. Turney (2002). Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Pages: 417-424.

  4. Lei Zhang and Bing Liu (2011). Identifying Noun Product Features that Imply Opinions. In Proceedings of ACL-2011 (short paper), Portland, Oregon, USA.

  5. Xiaowen Ding (2008). A Holistic Lexicon-Based Approach to Opinion Mining. In Proceedings of WSDM '08, Palo Alto, California, USA. ACM 978-1-59593-927-9/08/0002.

  6. Bruno Ohana and Brendan Tierney (2009). Sentiment Classification of Reviews Using SentiWordNet. In Proceedings of the IT&T Conference.

  7. Kushal Dave, Steve Lawrence, and David M. Pennock (2003). Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In Proceedings of WWW. Pages: 519-528.

  8. Ainur Yessenalina, Yisong Yue and Claire Cardie (2010). Multi-level Structured Models for Document-level Sentiment Classification. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Pages: 1046-1056.

  9. B. Liu, M. Hu, and J. Cheng (2005). Opinion observer: analyzing and comparing opinions on the web. In WWW '05: Proceedings of the 14th International Conference on World Wide Web, NY, USA, ACM. Pages: 342-351.

  10. M. Hu and B. Liu (2004). Mining and summarizing customer reviews. In Proceedings of KDD '04.

  11. A. Esuli and F. Sebastiani (2006). Determining term subjectivity and term orientation for opinion mining. In Proceedings of EACL-06, 11th Conference of the European Chapter of the Association for Computational Linguistics, Trento, IT. Forthcoming.

  12. A. Neviarouskaya, H. Prendinger and M. Ishizuka (2009). Sentiful: Generating a reliable lexicon for sentiment analysis. In Proceedings of Affective Computing and Intelligent Interaction (ACII). Pages: 1-6.

  13. S. Agrawal (2009). Using syntactic and contextual information for sentiment polarity analysis. In Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, ACM. Pages: 620-623.

  14. M. Guerini, O. Stock, and C. Strapparava (2008). Valentino: A tool for valence shifting of natural language texts. In Proceedings of LREC 2008, Marrakech, Morocco.

  15. S. Das and M. Chen (2001). Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA), 2001.

  16. A. Khan, B. Baharudin, and K. Khan (2011). Sentiment Classification from Online Customer Reviews Using Lexical Contextual Sentence Structure. In Proceedings of the 2nd International Conference, ICSECS 2011, Kuantan, Pahang, Malaysia.
