Sentiment Analysis of Product Reviews and Evaluation of Trustworthiness

DOI : 10.17577/IJERTCONV5IS01018

Vivek Panchal

INFT, ACE

Mumbai, India

Zaineb Penwala

INFT, ACE

Mumbai, India

Sneha Prabhu

INFT, ACE

Mumbai, India

Rhea Shetty

INFT, ACE

Mumbai, India

Prof. Reena Mahe

INFT, ACE

Mumbai, India

Abstract: There has been rapid growth in the e-commerce industry, whose websites market and sell products and also allow users to express their opinions about those products. Buyers generally refer to these reviews and opinions before making a buying decision, to obtain first-hand experience of the product from certified buyers. However, users tend to adopt modern writing styles such as abbreviations, misspelled words, phrases instead of sentences, and emoticons. Hence, automatic summarization of product reviews using sentiment analysis is of great significance: it helps the company know what users liked and disliked about the product, and it helps buyers make online purchase decisions. The basic task of sentiment analysis is sentiment classification, which classifies a user review as positive, negative, or neutral. It is also important to calculate the degree of trust of the user who posts a review and the review's trustworthiness, and to generate a global reputation score for the product.

Keywords: Sentiment Analysis, Opinions, Review, Summarization, Abbreviations, Trustworthiness.

  1. INTRODUCTION

    Sentiment Analysis (SA) or Opinion Mining (OM) is the study of people's opinions, attitudes and emotions toward an entity. The entity can represent events or topics, which are most likely to be covered by reviews. Sentiment Analysis identifies the sentiment expressed in a text and then analyzes it. Therefore, the target of SA is to find opinions, identify the sentiments they express, and then classify their polarity.

    The data sets used in SA are an important issue in this field. The main source of data is product reviews. These reviews are important to business owners, who can make business decisions according to the analysis of users' opinions about their products. The reviews come mainly from review sites [1].

    To make better choices, consumers have access to the reviews and experiences of other consumers who have made similar choices. This helps them avoid mistakes that other consumers made and also clears up any confusion about the product or service.

    In the present system, most of the leading e-commerce websites tend to focus on the product or its features to quite a large extent and give very little emphasis to what other people are saying about the product.

    This paper focuses on semantically analyzing and evaluating product reviews as they are, while minimizing human bias. It does not compare products feature-wise; rather, it detects the sentiments expressed in a review together with the product features they refer to, and gives an overall rating. It also determines the degree of trust of the user providing the review and generates a global reputation score for the product [9].

  2. LITERATURE SURVEY

    Classification and summarization of online blog reviews are very important to the growth of e-commerce and social networking applications. Earlier work on automatic text summarization mainly focused on extracting sentences that are more significant than others in a document corpus [2].

    However, the task of summarizing online product reviews is very different from traditional text summarization, as it does not involve extracting significant sentences from the source text. Instead, the aim when summarizing user reviews is first to identify the semantic features of products and then to generate a comparative summary of products, based on feature-wise sentiment classification of the reviews, that guides the user in making a buying decision. Opinion mining from users' reviews involves two main tasks: (1) identification of the opinion feature set and (2) sentiment analysis of users' opinions based on the identified features [2].

    It has been observed that nouns and noun phrases (N and NP) occurring frequently in reviews are useful opinion features, while the adjectives and adverbs describing them are useful in classifying sentiment [2]. Several other topics fall under the umbrella of SA and have recently attracted researchers' attention.

    The paper [3] deals with keyword-based temporal sentiment analysis. It observes that free texts, such as comments by customers or readers, contain not only the sentiments about the topic being discussed but also temporal trends in those sentiments. Sentiment detection automatically estimates the polarity of the comments as positive, negative or sometimes neutral, whereas temporal sentiment analysis investigates the sentiment pattern within a given time period. The authors proposed a method for investigating these temporal patterns using keywords in the comments, and, using sentiment classification techniques and keyword clustering, they were able to relate the patterns to a few major events that occurred during the period under investigation. Their results showed how temporal sentiment analysis can be used to establish changes in public opinion relating to issues and events in a historically important election campaign in a developing country. The analysis began by calculating a sentiment score with a sentiment lexicon developed from SentiWordNet 3.0; in the final step, keyword clusters were identified by applying standard clustering techniques such as K-means and the self-organizing map (SOM). From this paper, we observed that the sentiment of a comment is independent of the number of keywords.
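To make the idea of a temporal sentiment pattern concrete, the sketch below scores each comment with a small lexicon and averages the scores per calendar month. The word scores in `LEXICON`, the monthly window and the function names are illustrative assumptions, not part of [3]; the keyword clustering step (K-means or SOM) is not shown.

```python
from collections import defaultdict
from datetime import date

# Hypothetical word scores standing in for a SentiWordNet-derived lexicon.
LEXICON = {"good": 0.6, "great": 0.8, "hope": 0.4, "bad": -0.6, "corrupt": -0.7}


def comment_score(text: str) -> float:
    """Sum the lexicon scores of the words in a comment (unknown words score 0)."""
    return sum(LEXICON.get(word, 0.0) for word in text.lower().split())


def monthly_sentiment(comments: list[tuple[date, str]]) -> dict[tuple[int, int], float]:
    """Average comment score per (year, month), returned in chronological order."""
    buckets: dict[tuple[int, int], list[float]] = defaultdict(list)
    for day, text in comments:
        buckets[(day.year, day.month)].append(comment_score(text))
    return {month: sum(scores) / len(scores) for month, scores in sorted(buckets.items())}


comments = [
    (date(2015, 1, 3), "great hope"),
    (date(2015, 1, 20), "bad"),
    (date(2015, 2, 2), "corrupt and bad"),
]
print(monthly_sentiment(comments))
```

A shift in the monthly averages is the kind of signal that would then be related to real-world events.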

    The paper [4] deals with spotting fake reviews via collective positive-unlabelled learning. Opinions in reviews are increasingly used by individuals and organizations for making purchase decisions, marketing and product design. Positive opinions often mean profit and fame for businesses and individuals, which, unfortunately, gives impostors a strong incentive to post fake reviews that promote or discredit target products or services. Such individuals are called opinion spammers, and their activities are called opinion spamming. The study used data from Dianping, the largest host of Chinese reviews. To improve the quality of its reviews, Dianping developed a system to detect fake reviews. All reviews flagged by this system are almost certainly fake, but the remaining reviews (the unknown set) may not all be genuine. Since the unknown set may contain many fake reviews, it is more appropriate to treat it as an unlabelled set, which calls for learning from positive and unlabelled examples (PU learning). By leveraging the intricate dependencies among reviews, users and IP addresses, the authors first proposed a collective classification algorithm called Multi-Typed Heterogeneous Collective Classification (MHCC) and then extended it to Collective Positive and Unlabelled learning (CPU).

    Their experiments were conducted on real-life reviews of 500 restaurants in Shanghai, China. The proposed approach not only outperforms the competing baseline methods but, more importantly, detects many potential fake reviews hidden in the unlabelled set, which shows the power of PU learning in solving the problem.

  3. METHODOLOGY

    For sentiment analysis, objective content is removed and subjective content is extracted for further analysis. The subjective content consists of all sentiment sentences, where a sentiment sentence is one that contains at least one positive or negative word. The product reviews posted by users are extracted and stored in a database.
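A minimal sketch of the sentiment-sentence filter described above, assuming tiny hand-made word lists (`POSITIVE`, `NEGATIVE`) in place of a full lexicon and a naive regex sentence splitter; all names are hypothetical.

```python
import re

# Hypothetical seed lists; a real system would take these words from a lexicon.
POSITIVE = {"good", "great", "excellent", "love", "sharp"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}


def split_sentences(review: str) -> list[str]:
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", review.strip()) if s.strip()]


def is_sentiment_sentence(sentence: str) -> bool:
    """True if the sentence contains at least one positive or negative word."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return bool(words & (POSITIVE | NEGATIVE))


def subjective_sentences(review: str) -> list[str]:
    """Drop objective sentences and keep only sentiment sentences."""
    return [s for s in split_sentences(review) if is_sentiment_sentence(s)]


review = "The phone was delivered on Monday. The camera is excellent! Battery life is poor."
print(subjective_sentences(review))
# ['The camera is excellent!', 'Battery life is poor.']
```

The delivery sentence is dropped because it states a fact without any opinion word.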

    The reviews posted by users may contain spelling mistakes and incorrect punctuation, so basic cleaning tasks such as spell correction and sentence boundary detection are performed. The next step is feature extraction, in which frequently occurring nouns (N) and noun phrases (NP) are treated as possible opinion features, followed by POS tagging. In natural language processing, part-of-speech (POS) taggers classify words according to their parts of speech. For sentiment analysis, a POS tagger is very useful because it can distinguish between the different parts of speech a single word can serve as. Opinion words are then extracted and their polarity identified; here SentiWordNet is used, which scores words in terms of positivity, negativity and objectivity (neutrality).

    The trustworthiness of users is calculated using a Trust Reputation System (TRS). TRSs are used in e-commerce applications to establish trustworthiness among a group of participants with respect to transaction circumstances, product characteristics and users' past experiences. In fact, e-commerce users prefer to focus on other users' opinions about a product in order to form their own sense of trust and reputation. Users share a common interest in knowing the trustworthiness of the transaction and the product. Therefore, feedback or reviews, scores, recommendations and any other information given by users are very important for trust reputation assessment. However, the reliability of this information needs to be verified. TRSs are essential mechanisms that aim to detect malicious interventions by users who intend to falsify the reputation score of a product, either positively or negatively. A TRS helps ensure that the product receives a fair rating and helps the customer make the right choice about the product.

    In the literature, there are many works, such as those that propose algorithms for calculating a reputation or define a specific set of possible reputations or ratings. However, few of them are devoted to the semantic analysis of textual feedback in order to generate the most trustworthy trust degree for the user [10]. Many existing TRS architectures, together with different algorithms, calculate the reputation score related to a product.

  4. IMPLEMENTATION

    Consumers rely on product reviews to gain insight into product quality, features, usability and functions. A genuine product review points out the pros and cons of a product, helping buyers determine whether it is the right one for them. The proposed system follows the steps below, as shown in Fig. 1, to classify the polarity of reviews, enabling buyers to make the right choice and sellers to gain insight from product feedback.

    1. Database consisting of Reviews

      The users post their opinions about products which are then stored in a database. The reviews from the database are then used for further steps.
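One way to realize the review store, sketched with Python's built-in `sqlite3` module. The `reviews` table and its columns are a hypothetical schema, since the paper does not specify one; an in-memory database is used so the example runs without files.

```python
import sqlite3


def open_review_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the review store with a hypothetical minimal schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS reviews (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               product_id TEXT NOT NULL,
               user_id TEXT NOT NULL,
               body TEXT NOT NULL,
               posted_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def add_review(conn: sqlite3.Connection, product_id: str, user_id: str, body: str) -> int:
    """Store one review and return its row id."""
    cur = conn.execute(
        "INSERT INTO reviews (product_id, user_id, body) VALUES (?, ?, ?)",
        (product_id, user_id, body),
    )
    conn.commit()
    return cur.lastrowid


def reviews_for(conn: sqlite3.Connection, product_id: str) -> list[tuple[int, str, str]]:
    """Fetch (id, user_id, body) for every review of a product, oldest first."""
    return conn.execute(
        "SELECT id, user_id, body FROM reviews WHERE product_id = ? ORDER BY id",
        (product_id,),
    ).fetchall()


conn = open_review_db()
add_review(conn, "p1", "u1", "Great camera")
add_review(conn, "p2", "u2", "Slow delivery")
add_review(conn, "p1", "u3", "Poor battery")
print(reviews_for(conn, "p1"))
# [(1, 'u1', 'Great camera'), (3, 'u3', 'Poor battery')]
```

Parameterized queries (`?` placeholders) keep user-written review text from being interpreted as SQL.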

    2. Preprocessing Phase

      Preprocessing cleans and prepares the text for further classification. Product reviews are written by non-experts in unstructured natural language and often contain spelling errors, extra white space, repetitive punctuation and incorrect punctuation. The period (.) must be disambiguated, as it may mark a full stop, a decimal point or an abbreviation (e.g., Mr., Pvt.). These problems are corrected in this phase. Sometimes a single sentence straddles multiple lines because the user presses unnecessary return keys. In such cases, we apply the sentence merge rules proposed by Dey and Haque [5].
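The cleaning steps above can be approximated as follows. This is a simplified sketch, not the merge rules of Dey and Haque [5]: line breaks are simply joined, repeated punctuation is squeezed, and a period ends a sentence unless its token appears in a hypothetical `ABBREVIATIONS` list. Decimal points such as `5.5` are never treated as boundaries because the period sits inside the token.

```python
import re

# Hypothetical abbreviation list; a real system would use a larger, domain-specific one.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "pvt.", "ltd.", "e.g.", "i.e.", "etc."}


def normalize(text: str) -> str:
    """Join lines broken by stray returns and squeeze repeated punctuation and spaces."""
    text = re.sub(r"\s*\n\s*", " ", text)       # simplified stand-in for line-merge rules
    text = re.sub(r"([!?.,])\1+", r"\1", text)  # "!!!" -> "!", "..." -> "."
    text = re.sub(r"[ \t]+", " ", text)         # collapse runs of spaces and tabs
    return text.strip()


def split_sentences(text: str) -> list[str]:
    """End a sentence at '!', '?' or a final '.' that is not part of a known abbreviation."""
    sentences: list[str] = []
    current: list[str] = []
    for token in text.split():
        current.append(token)
        if token[-1] in "!?" or (token.endswith(".") and token.lower() not in ABBREVIATIONS):
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


raw = "Mr. Shah sold me this phone!!!\nThe screen is 5.5 inches and\nthe battery is poor.  Worth it?"
print(split_sentences(normalize(raw)))
# ['Mr. Shah sold me this phone!', 'The screen is 5.5 inches and the battery is poor.', 'Worth it?']
```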

    3. Feature Extraction Phase

      In this phase, opinion features are extracted from the pre-processed text obtained from the previous phase. The nouns (N) and noun phrases (NP) are considered as opinion features and associated adjectives describing them are considered as indicators of their opinion orientation.

      All nouns (N) and noun phrases (NP) are extracted and tagged by the Link Grammar Parser, a well-known and efficient syntactic parser for English (http://www.abisource.com/projects/link-grammar/), and the frequently occurring N and NP are then identified as possible opinion features.
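The frequency step can be sketched as follows, assuming the reviews have already been POS-tagged with Penn Treebank tags by some tagger or parser; the Link Grammar Parser itself is not called here. Noun phrases are approximated as runs of consecutive nouns, and a feature is kept if it appears in at least `min_support` of the reviews, which is a hypothetical threshold.

```python
from collections import Counter

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # Penn Treebank noun tags


def noun_phrases(tagged: list[tuple[str, str]]) -> list[str]:
    """Approximate NPs as maximal runs of consecutive nouns, e.g. 'battery life'."""
    phrases: list[str] = []
    run: list[str] = []
    for word, tag in tagged + [("", "END")]:  # sentinel flushes the last run
        if tag in NOUN_TAGS:
            run.append(word.lower())
        elif run:
            phrases.append(" ".join(run))
            run = []
    return phrases


def frequent_features(tagged_reviews: list[list[tuple[str, str]]], min_support: float = 0.5) -> list[str]:
    """Keep N/NP candidates that occur in at least min_support of the reviews."""
    counts: Counter[str] = Counter()
    for tagged in tagged_reviews:
        counts.update(set(noun_phrases(tagged)))  # count each feature once per review
    threshold = min_support * len(tagged_reviews)
    return sorted(f for f, c in counts.items() if c >= threshold)


reviews = [
    [("The", "DT"), ("battery", "NN"), ("life", "NN"), ("is", "VBZ"), ("poor", "JJ")],
    [("Great", "JJ"), ("camera", "NN"), ("but", "CC"), ("battery", "NN"), ("life", "NN"),
     ("is", "VBZ"), ("short", "JJ")],
    [("The", "DT"), ("camera", "NN"), ("is", "VBZ"), ("sharp", "JJ")],
    [("Fast", "JJ"), ("delivery", "NN")],
]
print(frequent_features(reviews))
# ['battery life', 'camera']
```

"delivery" is dropped because it occurs in only one of the four reviews.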

    4. Polarity Identification

      To determine the sentiment polarity of an adjective describing an opinion feature, we use SentiWordNet, a lexical resource for opinion mining. SentiWordNet assigns three normalized sentiment scores (positivity, objectivity and negativity) to each synset of WordNet. In this way, a feature-orientation table (FO table) can be generated that records the opinion features and their corresponding descriptors of positive and negative polarity. While extracting multiword opinion features, it is possible that one multiword feature is a substring of another. In such cases, we adopt the decomposition strategy [6], which favours a shorter feature over a longer one.
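A small sketch of FO table construction. It assumes (feature, adjective) pairs have already been extracted and uses a toy `SCORES` table with made-up positivity and negativity values instead of SentiWordNet (NLTK exposes the real resource through `nltk.corpus.sentiwordnet`, which requires a data download). The `decompose` function is our reading of the decomposition strategy in [6], not its exact definition: when one candidate contains another as a contiguous word sequence, the shorter one is kept and the longer one's descriptors are attributed to it.

```python
from collections import defaultdict

# Toy (positivity, negativity) scores; made-up values standing in for SentiWordNet.
SCORES = {
    "excellent": (0.75, 0.0), "sharp": (0.5, 0.125), "fast": (0.375, 0.0),
    "poor": (0.0, 0.625), "short": (0.125, 0.375), "average": (0.125, 0.125),
}


def polarity(adjective: str) -> str:
    """Label an adjective by comparing its positivity and negativity scores."""
    pos, neg = SCORES.get(adjective.lower(), (0.0, 0.0))
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def contains(longer: str, shorter: str) -> bool:
    """True if `shorter` is a strictly shorter contiguous word sequence of `longer`."""
    lw, sw = longer.split(), shorter.split()
    return len(sw) < len(lw) and any(
        lw[i:i + len(sw)] == sw for i in range(len(lw) - len(sw) + 1)
    )


def decompose(features: set[str]) -> set[str]:
    """Favour shorter features: drop any feature that contains another candidate."""
    return {f for f in features if not any(contains(f, g) for g in features)}


def build_fo_table(pairs: list[tuple[str, str]]) -> dict[str, dict[str, list[str]]]:
    """Map each kept feature to its positive and negative descriptors."""
    kept = decompose({feature for feature, _ in pairs})
    table: dict[str, dict[str, list[str]]] = defaultdict(lambda: {"positive": [], "negative": []})
    for feature, adjective in pairs:
        label = polarity(adjective)
        if label == "neutral":
            continue  # neutral descriptors do not enter the FO table
        # A dropped longer feature hands its descriptor to the shortest kept feature inside it.
        target = feature if feature in kept else min(
            (k for k in kept if contains(feature, k)), key=len
        )
        table[target][label].append(adjective)
    return dict(table)


pairs = [("camera", "excellent"), ("camera", "sharp"), ("battery", "poor"),
         ("battery life", "short"), ("delivery", "fast"), ("screen", "average")]
fo = build_fo_table(pairs)
print(fo["battery"])
# {'positive': [], 'negative': ['poor', 'short']}
```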

    5. Trust Reputation System

    In the e-commerce context, there is a lack of direct trust assessment. Thus, users are not able to form a reputation for the product without additional help. Therefore, feedback or reviews, scores, recommendations and any other information given by users are very important for trust reputation assessment [7]. However, the reliability of this information needs to be verified.

    Fig. 1 Flow Chart of Proposed System

    TRS in e-commerce applications includes feedback analysis in its treatment of scores [8]. The system can use a proposed algorithm to calculate the trust degree of the user and the trustworthiness of the feedback, and to generate the global reputation score of the product.
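The paper does not give the formula for the global reputation score. One plausible choice, shown here purely as an assumption, is a trust-weighted mean of review sentiment, R = Σ t_i s_i / Σ t_i, where s_i in [-1, 1] is the polarity of review i and t_i in [0, 1] is the trust degree of its author. The `ScoredReview` class and both scales are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScoredReview:
    sentiment: float     # polarity of the review text in [-1, 1] (assumed scale)
    author_trust: float  # trust degree of the review's author in [0, 1] (assumed scale)


def global_reputation(reviews: list[ScoredReview], default: float = 0.0) -> float:
    """Trust-weighted mean sentiment: reviews by untrusted authors carry no weight."""
    total_trust = sum(r.author_trust for r in reviews)
    if total_trust == 0:
        return default  # no trusted evidence yet
    return sum(r.author_trust * r.sentiment for r in reviews) / total_trust


reviews = [ScoredReview(0.8, 0.9), ScoredReview(-1.0, 0.1), ScoredReview(0.5, 0.0)]
print(round(global_reputation(reviews), 2))
# 0.62
```

Weighting by author trust means a burst of negative reviews from low-trust accounts moves the score far less than a single review from a trusted user, which is the manipulation a TRS is meant to resist.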

    The user is redirected to an interface showing a selection of pre-fabricated feedbacks. Feedbacks are added to the database, and text mining sorts them into pre-fabricated feedbacks of different categories and fills out the knowledge base. The text mining algorithm includes a learning component that fills out the knowledge base automatically. The user is then invited to like or dislike each feedback/review in the displayed selection.

    Each feedback already has a degree of trustworthiness, which represents the trust degree of the user who provided it. The proposed reputation algorithm then takes the user's opinion on each review (like/dislike), together with the trustworthiness degree of the liked or disliked feedback, and uses them to generate a trust degree for the user.
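The like/dislike update can be sketched as below. The scoring rule is a hypothetical stand-in, not the algorithm of [9] or [10]: liking a feedback with trustworthiness t counts as agreement t, disliking it counts as 1 - t, and the user's trust degree is the mean agreement, with a neutral prior of 0.5 for users who have not voted.

```python
def user_trust_degree(votes: list[tuple[bool, float]], prior: float = 0.5) -> float:
    """Trust degree of a voter from (liked, feedback_trustworthiness) pairs.

    Hypothetical rule: liking a feedback with trustworthiness t counts as agreement t,
    disliking it counts as 1 - t; the result is the mean agreement in [0, 1].
    """
    if not votes:
        return prior  # no evidence yet: neutral prior
    agreement = [t if liked else 1.0 - t for liked, t in votes]
    return sum(agreement) / len(agreement)


votes = [(True, 0.9), (False, 0.2), (True, 0.3)]
print(round(user_trust_degree(votes), 3))
# 0.667
```

Under this rule, users who consistently like trustworthy feedback and dislike untrustworthy feedback move toward 1, while users who do the opposite move toward 0.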

  5. CONCLUSION

Although sentiment analysis tasks are challenging because of their natural language processing origins, much progress has been made over the last few years owing to high demand. Companies want to know how consumers perceive their products and services, and consumers want to know the opinions of others before making buying decisions. The growing need for product insights keeps sentiment analysis and opinion mining relevant. This paper tackles the fundamental problem of evaluating the degree of trust of the user alongside sentiment analysis and sentiment polarity categorization. Product reviews are used as the data for this study. A sentiment polarity identification process and an evaluation of trustworthiness have been presented, along with detailed descriptions of each step. Absence of trust is always considered a problem in online transactions, and this system aims at creating trust in online communities. The proposed design will use both ratings and semantic feedback to calculate trustworthiness and to classify comments and users.

REFERENCES

  1. Walaa Medhat, Ahmed Hassan and Hoda Korashy, "Sentiment analysis algorithms and applications: A survey," Ain Shams Engineering Journal, December 2014.

  2. Mita K. Dalal and Mukesh A. Zaveri, "Semisupervised learning based opinion summarization and classification for online product reviews," Applied Computational Intelligence and Soft Computing, January 2013.

  3. Nishanta Medagoda and Subana Shanmuganathan, "Keyword Based Temporal Sentiment Analysis," IEEE, 2015.

  4. Huayi Li, Zhiyuan Chen and Bing Liu, "Spotting Fake Reviews via Collective Positive-Unlabeled Learning," IEEE, January 2015.

  5. L. Dey and S. M. Haque, "Opinion mining from noisy text data," International Journal on Document Analysis and Recognition, vol. 12, no. 3, pp. 205-226, 2009.

  6. S. Shi and Y. Wang, "A product features mining method based on association rules and the degree of property co-occurrence," in Proceedings of the International Conference on Computer Science and Network Technology (ICCSNT '11), pp. 1190-1194, December 2011.

  7. Anna Gutowska and Andrew Sloane, "Modelling the B2C Marketplace: Evaluation of a Reputation Metric for e-commerce," Proceedings of Web Information Systems and Technologies (WEBIST), pp. 212-226, 2009.

  8. Fereshteh Ghazizadeh Ehsaei and Ab. Razak Che Hussin, "Acceptance of Feedbacks in Reputation Systems: The Role of Online Social Interactions," Information Management and Business Review, vol. 4, no. 7, pp. 391-401, July 2012 (ISSN 2220-3796).

  9. Hasnae Rahimi and Hanan El Bakkali, "Toward a New Design of Trust Reputation System in e-commerce," in Proceedings of ICMCS (International Conference on Multimedia Computing and Systems), Tangier, Morocco, IEEE, 2012.

  10. Prof. Reena Mahe, Rahul Jadhav, Pratik Gaikwad, Rahul Gadekar and Kiran Bhise, "Trustworthiness in E-Commerce Context using TRS Algorithm," International Journal of Advanced Research in Computer and Communication Engineering, vol. 4, issue 10, October 2015.
