Sentiment Analysis: Facebook Status Message

DOI : 10.17577/IJERTCONV4IS27021


Sushmitha. R
4th Sem, MS (IT), Jain University, Bangalore

Haripriya V
Assistant Professor, MS (IT), Dept of CS and IT, Jain University, Bangalore

Abstract:- Sentiment analysis has received great attention due to the abundance of opinion data on social networks such as Facebook and Twitter. On these media, sentiments are expressed in text to convey friendship, social support, anger, happiness, etc. Existing sentiment analysis studies tend to identify user behaviors and states of mind, but they remain insufficient because of the complexity of the texts conveyed. In this research paper, we focus on the use of text mining for sentiment classification.

Keywords: Sentiment analysis, classification, machine learning.

  1. INTRODUCTION

    On social network interfaces, people had the opportunity to call for change by expressing their sentiments through many kinds of publications. Statuses, commented images, videos, articles, etc. were shared on social networks to expose dictators' crimes, inequalities between regions, etc. Facebook users tended to share their feelings and thoughts and to inform their friends about conditions in their cities or neighborhoods by sharing videos and pictures and, above all, by posting short messages on walls. This was the preferred way to interact with friends and, consequently, to push them to care about what they think.

    This paper explores the potential applications of text and sentiment mining techniques to status updates in order to analyze the behavior of Tunisians during the revolution. For this purpose, we chose a random population with Facebook accounts, including males and females, students, workers, housewives, etc. The ages of the targeted population range from 21 to 54 years.

    Specifically, we apply machine learning algorithms to identify the nature of the status updates and to link them to behavioral and sentiment characteristics. For that purpose, we created our own dataset and applied two machine learning algorithms, Naïve Bayes and SVM, to it.

    Sentiment analysis, also known as opinion mining, is the analysis of the feelings (i.e. attitudes, emotions and opinions) behind words using natural language processing tools. It looks beyond the number of Likes, Shares or Comments you get on an ad campaign, product release, blog post or video to understand how people are responding to it. Was the review positive? Negative? Sarcastic? Ideologically biased?

    As marketers, being able to capture the complexity of emotional responses helps us determine if our social and content marketing initiatives are driving the actions that we planned for, while also giving us hard cues for adapting our strategy, in the event that our touch points are not resonating with our customers.

  2. SENTIMENT ANALYSIS

      1. An overview of sentiment analysis

        With the growing availability of opinion-rich resources such as social networking sites, blogs and forums, new challenges arise as people actively use information technologies to seek out and understand the opinions of others in many domains, for example about a particular product or about politics [1]. Instead of conducting costly market studies, customer satisfaction analyses or traditional surveys, companies offering products or services can use sentiment analysis to analyze published reviews, estimate the extent of product acceptance and determine strategies to improve product or service quality. Sentiment analysis also helps policy makers and politicians analyze public sentiment with respect to policies, public services or political issues.

        Several subtasks can be identified within sentiment analysis:

        Determining document subjectivity, often called subjectivity classification: this subtask determines whether a given text is objective (expressing a fact) or subjective (expressing an opinion or emotion) [2].

        Determining document orientation, often called sentiment classification or document-level sentiment classification: this subtask determines the polarity of a given subjective text, in other words whether the text expresses a positive or a negative sentiment on its subject matter [3].

        Determining the strength of document orientation: this subtask decides whether the positive sentiment expressed by a text on its subject matter is weakly, mildly or strongly positive [4].
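The three subtasks can be chained into a single pipeline. The Python sketch below is a minimal illustration, not the method of [2]-[4]: it assumes a tiny hand-made lexicon (`TOY_LEXICON`, hypothetical values), treats a text with no opinion words as objective, takes the sign of the summed scores as the polarity, and buckets the magnitude of the mean score into weak, mild or strong using cut-offs (0.4 and 0.75) that are arbitrary assumptions. Unlike the strength subtask described above, it grades negative texts as well as positive ones.

```python
# Hypothetical toy lexicon (illustrative values only): word -> prior score in [-1, 1].
TOY_LEXICON = {
    "love": 1.0, "great": 0.8, "happy": 0.7, "good": 0.5,
    "bad": -0.5, "sad": -0.7, "awful": -0.9, "hate": -1.0,
}


def analyze(text):
    """Apply the subjectivity, polarity and strength subtasks in order to one text."""
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    scores = [TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON]

    # Subtask 1, subjectivity: a text with no opinion words is treated as objective.
    if not scores:
        return {"subjective": False, "polarity": None, "strength": None}

    # Subtask 2, polarity: the sign of the summed lexicon scores.
    total = sum(scores)
    if total > 0:
        polarity = "positive"
    elif total < 0:
        polarity = "negative"
    else:
        polarity = "neutral"

    # Subtask 3, strength: bucket the magnitude of the mean score (assumed cut-offs).
    magnitude = abs(total) / len(scores)
    if magnitude < 0.4:
        strength = "weak"
    elif magnitude < 0.75:
        strength = "mild"
    else:
        strength = "strong"
    return {"subjective": True, "polarity": polarity, "strength": strength}
```

For example, `print(analyze("I love this great city!"))` prints `{'subjective': True, 'polarity': 'positive', 'strength': 'strong'}`, while a purely factual sentence is returned as not subjective.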

      2. Sentiment analysis techniques

        Sentiment analysis is a challenging field that has attracted many researchers over the last few years, so the literature offers a variety of sentiment analysis techniques. The two main families are machine learning approaches and dictionary-based approaches.

      3. Machine Learning techniques

    Machine learning methods treat sentiment classification as a topic-based text classification problem, so any text classification algorithm can be employed, such as Naïve Bayes, Support Vector Machines or Maximum Entropy.
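To make the text classification framing concrete, here is a minimal multinomial Naïve Bayes classifier written from scratch in Python. It is a sketch assuming whitespace tokenization and add-one (Laplace) smoothing; the function names `train_nb` and `predict_nb` and the four training sentences in the usage note are hypothetical and do not come from the cited studies.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Train a multinomial Naive Bayes model with add-alpha (Laplace) smoothing."""
    doc_counts = Counter(labels)            # documents per class, for the priors
    word_counts = defaultdict(Counter)      # class -> word frequencies
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)

    log_priors, log_likelihoods = {}, {}
    for c, n_docs in doc_counts.items():
        log_priors[c] = math.log(n_docs / len(labels))
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihoods[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_priors, log_likelihoods


def predict_nb(model, doc):
    """Return the class with the highest log posterior; unseen words are ignored."""
    log_priors, log_likelihoods = model
    tokens = doc.lower().split()
    scores = {}
    for c, prior in log_priors.items():
        scores[c] = prior + sum(
            log_likelihoods[c][t] for t in tokens if t in log_likelihoods[c]
        )
    return max(scores, key=scores.get)
```

Trained on `["i love this", "great happy day", "i hate this", "awful sad day"]` with labels `["pos", "pos", "neg", "neg"]`, the model assigns "love this day" to `pos` and "hate it" to `neg`. Working in log space avoids numeric underflow when many word probabilities are multiplied.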

    [5] developed an unsupervised learning algorithm based on Pointwise Mutual Information and Information Retrieval (PMI-IR) to classify texts as recommended or not recommended. [6] used three machine learning techniques (Naïve Bayes, maximum entropy classification and SVM) to classify movie reviews as positive or negative.

    They tested different feature combinations, including unigrams, unigrams + bigrams and unigrams + POS (part-of-speech) tags. These techniques outperformed the human-generated baseline, and SVM gave the best results. Other machine learning approaches in the sentiment analysis area include regression models to predict a review's usefulness [7] and a semi-supervised method for the binary classification of texts as positive or negative [8]. [9] investigated the utility of Naïve Bayes and SVM on political web logs and showed that a Naïve Bayes classifier significantly outperforms SVM.
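The feature combinations mentioned above differ only in how a text is turned into features. The sketch below, a hypothetical helper named `extract_features`, assumes the text has already been POS-tagged into (word, tag) pairs by some tagger and produces binary presence features; pairing each word with its tag is one common way to realize "unigrams + POS" and is an assumption here.

```python
def extract_features(tagged_tokens, combination="unigrams"):
    """Build binary presence features from (word, POS tag) pairs.

    combination: "unigrams", "unigrams+bigrams" or "unigrams+pos".
    """
    words = [w.lower() for w, _ in tagged_tokens]
    if combination == "unigrams+pos":
        # Each unigram is fused with its tag, so "like/VB" and "like/IN" differ.
        return {f"{w}/{tag}": 1 for w, (_, tag) in zip(words, tagged_tokens)}
    features = {w: 1 for w in words}
    if combination == "unigrams+bigrams":
        # Adjacent word pairs capture short patterns such as "not good".
        features.update({f"{a} {b}": 1 for a, b in zip(words, words[1:])})
    elif combination != "unigrams":
        raise ValueError(f"unknown combination: {combination}")
    return features
```

Bigrams matter for sentiment because a pair such as "not good" carries the opposite polarity of its second unigram.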


    In [10] the authors applied the simple online classifier Winnow to classify the polarity of documents. They showed that human agreement reaches only 75%-80% precision and recall on polarity prediction. The recall obtained by Winnow is very poor: only 43% for positive reviews and 16% for negative reviews.
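Winnow differs from additive learners such as the perceptron in that it updates weights multiplicatively. The sketch below is a Winnow2-style variant under stated assumptions: binary features given as lists of active indices, initial weights of 1, a threshold equal to the number of features, and a promotion/demotion factor `alpha = 2`. These are common textbook defaults, not the settings used in [10], and the function names are hypothetical.

```python
def winnow_predict(weights, threshold, active):
    """Predict 1 (positive) if the summed weights of active features reach the threshold."""
    return 1 if sum(weights[i] for i in active) >= threshold else 0


def train_winnow(samples, n_features, alpha=2.0, epochs=5):
    """samples: list of (active_feature_indices, label) with label in {0, 1}."""
    weights = [1.0] * n_features
    threshold = float(n_features)      # assumed default: theta = number of features
    for _ in range(epochs):
        for active, label in samples:
            predicted = winnow_predict(weights, threshold, active)
            if predicted == 0 and label == 1:      # missed positive: promote
                for i in active:
                    weights[i] *= alpha
            elif predicted == 1 and label == 0:    # false positive: demote
                for i in active:
                    weights[i] /= alpha
    return weights, threshold
```

Because only the weights of features present in a mistaken example change, Winnow copes well with very large, sparse vocabularies, although, as [10] reports, that does not guarantee good recall.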

    [11] conducted a comparative experiment on sentiment classification of online product reviews using the following classifiers: a Passive-Aggressive (PA) algorithm based classifier [12], a Language Modeling (LM) based classifier [13] and a Winnow classifier. Their results showed that the Passive-Aggressive algorithm reached the highest accuracy (90.07%) compared to the others.

    [14] conducted a sentiment analysis of restaurant reviews by building a senti-lexicon and proposed two improved versions of the Naïve Bayes algorithm. They then evaluated their performance against the original Naïve Bayes algorithm and SVM. The results showed that the improved versions of Naïve Bayes were effective.
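The Passive-Aggressive algorithm mentioned above is an online linear learner: it leaves the weights unchanged when an example is classified with a margin of at least 1 ("passive") and otherwise makes the smallest update that removes the hinge loss on that example ("aggressive"). The sketch below implements the basic PA update (step size = loss / ||x||²) with sparse dictionary features; it is an illustration under those assumptions, with hypothetical function names, not the classifier configuration of [11] or [12].

```python
def pa_score(w, x):
    """Linear score w . x for sparse dicts (feature -> value)."""
    return sum(w.get(i, 0.0) * v for i, v in x.items())


def train_pa(samples, epochs=5):
    """samples: list of (x, y), x a sparse dict feature -> value, y in {-1, +1}."""
    w = {}
    for _ in range(epochs):
        for x, y in samples:
            loss = max(0.0, 1.0 - y * pa_score(w, x))   # hinge loss
            sq_norm = sum(v * v for v in x.values())
            if loss > 0.0 and sq_norm > 0.0:
                tau = loss / sq_norm                     # basic PA step size
                for i, v in x.items():
                    w[i] = w.get(i, 0.0) + tau * y * v
    return w


def predict_pa(w, x):
    return 1 if pa_score(w, x) >= 0 else -1
```

On two toy reviews, `{"great": 1.0, "love": 1.0}` labeled +1 and `{"awful": 1.0, "hate": 1.0}` labeled -1, one pass sets each positive word's weight to 0.5 and each negative word's to -0.5, after which both examples have margin 1 and no further updates occur.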

      4. Dictionary-based approaches

        These approaches first extract the polarity of each sentence in a document; the sense of the opinion words in each phrase is then analyzed in order to classify the sentiment of the text. Generally speaking, the techniques that follow this approach are based on lexicons and use a dictionary of words mapped to their semantic value [15]. The lexicon of a language is its vocabulary. The first and best-known example is WordNet [16], a semantic lexicon in which words are grouped into sets of synonyms (called synsets). Another well-known lexicon is SentiWordNet [17], an extension of WordNet. It is a sentiment lexicon that indexes sentiment words and records the polarity of each word, whether it carries a positive or a negative sentiment.
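A dictionary-based scorer following the sentence-then-document procedure above can be sketched in a few lines. The lexicon here is hypothetical: it mimics SentiWordNet's idea of giving each entry a positivity and a negativity score (with objectivity being the remainder), but the words and values are made up for illustration, and the word-sense disambiguation that real SentiWordNet lookups require is ignored.

```python
import re

# Hypothetical SentiWordNet-style entries: word -> (positivity, negativity).
# Objectivity is implied as 1 - positivity - negativity.
TOY_SENTI_LEXICON = {
    "free": (0.625, 0.0), "proud": (0.75, 0.0), "hope": (0.5, 0.125),
    "fear": (0.0, 0.75), "crime": (0.0, 0.625), "sad": (0.125, 0.75),
}


def sentence_polarity(sentence):
    """Sum (positivity - negativity) over the words found in the lexicon."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(pos - neg for pos, neg in
               (TOY_SENTI_LEXICON.get(w, (0.0, 0.0)) for w in words))


def document_polarity(document):
    """Score each sentence first, then label the document by the total."""
    sentences = [s for s in re.split(r"[.!?]+", document) if s.strip()]
    scores = [sentence_polarity(s) for s in sentences]
    total = sum(scores)
    label = "positive" if total > 0 else "negative" if total < 0 else "neutral"
    return label, scores
```

For "We are free and proud. Fear is over." the sentence scores are 1.375 and -0.75, so the document is labeled positive. Keeping the per-sentence scores makes it possible to inspect which sentence drove the final label.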

        In this article, we perform text mining and sentiment analysis on a novel collection of Facebook status updates from Tunisian users in order to analyze sentiments and behaviors during the revolution of January 2011. Recent research on sentiment analysis [18] has focused on mining massive volumes of text containing opinions and sentiments. Unlike most text, however, wall posts are comparatively short, and they are probably the most popular Facebook

      5. How to Measure Sentiment: Measure What Matters

    Comments, shares, Likes, re-tweets, inbound links and onsite engagement are invaluable metrics that show us how people found us and whether they are engaging (or not!) with our content. But they are also what Katie Delahaye Paine, author of Measure What Matters: Online Tools for Understanding Customers, Social Media, Engagement, and Key Relationships, labels quantity or vanity metrics. She says that looking at numbers alone can give us a false sense of hope that our content is generating leads for our brand or business. With sentiment analysis, we dig deeper and look at quality metrics. Quality metrics include opinions, feelings, satisfaction ratings, the quality of shares, comments, re-tweets, replies, ratings or conversations, as well as the quality of engagement over time. See Table 1.

    Table 1: Sentiment analysis chart

    Metric              Month 1   Month 2   Change metric      Change
    Likes               2,000     4,000     Like growth        100%
    Posts               100       125       Post growth        25%
    Comments            200       300       Comments growth    50%
    Comments per post   2         2.4       CPP growth         20%
    Comments per like   0.1       0.075     CPL growth         -25%
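The derived rows and the change column of Table 1 follow from simple arithmetic on the raw counts: comments per post (CPP) is comments divided by posts, comments per like (CPL) is comments divided by likes, and each change is a percentage change from month 1 to month 2. The short Python script below recomputes them from the counts in the table; the function names are illustrative.

```python
def growth(before, after):
    """Percentage change from month 1 to month 2."""
    return (after - before) / before * 100.0


def with_ratios(month):
    """Add the two derived metrics: comments per post (CPP) and comments per like (CPL)."""
    return {
        **month,
        "comments_per_post": month["comments"] / month["posts"],
        "comments_per_like": month["comments"] / month["likes"],
    }


# Raw counts taken from Table 1.
month1 = with_ratios({"likes": 2000, "posts": 100, "comments": 200})
month2 = with_ratios({"likes": 4000, "posts": 125, "comments": 300})

for metric in month1:
    change = growth(month1[metric], month2[metric])
    print(f"{metric:<18} {month1[metric]:>7g} {month2[metric]:>7g} {change:+7.1f}%")
```

Running it prints changes of +100.0%, +25.0%, +50.0%, +20.0% and -25.0%, so comments per like fall even though every raw count grows, which is exactly the kind of signal that raw quantity metrics hide.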

  3. RELATED WORK

    Normally, users share opinions, facts or issues based on their topics of interest without being in the same place at the same time. Sentiments are analyzed after all the opinions in comments or postings have been extracted.

    By analyzing people's sentiment, the emotions of the public toward a particular issue can be observed, experimented with and quantified. Emotions can be classified into three sets of texts: texts containing positive emotions, texts containing negative emotions, and texts that only state a fact or do not express any emotion. We chose to classify into positive and negative sentiment labels. These labels are represented by:

    POSITIVE: smiley, wink, tongue, angel, shades, blush, rockon

    NEGATIVE: frown, shock, skeptical, evil, angry, fail

    Certainly, arguments can be made about our choices (is shock always negative?), but we chose to maximize our dataset size, given time constraints on data collection. After dividing the categories in this way and truncating the positive and negative sets to the same size, we ended up with 4,320 usable, labeled samples. We split the data into a training set of 3,500 samples and a test set of 820 samples (roughly 81% and 19%). Positive and negative samples were split into groups of 25 and then interleaved so that they were approximately evenly distributed within the training and test set files.
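The labeling, truncation and interleaving steps described above can be sketched as follows. This is an assumption-laden illustration, not the authors' code: it assumes each status has already been reduced to the list of emoticon names it contains, and it discards statuses that carry no listed emoticon or carry both positive and negative ones, a rule the text does not specify. `binary_label`, `interleave_in_groups` and `build_split` are hypothetical names.

```python
# Emoticon label sets, copied from the text above.
POSITIVE = {"smiley", "wink", "tongue", "angel", "shades", "blush", "rockon"}
NEGATIVE = {"frown", "shock", "skeptical", "evil", "angry", "fail"}


def binary_label(emoticons):
    """Map a status's emoticon names to 'positive', 'negative' or None (unusable)."""
    has_pos = any(e in POSITIVE for e in emoticons)
    has_neg = any(e in NEGATIVE for e in emoticons)
    if has_pos == has_neg:   # no listed emoticon, or conflicting ones: discard (assumed rule)
        return None
    return "positive" if has_pos else "negative"


def interleave_in_groups(pos, neg, group=25):
    """Truncate both classes to equal size, then alternate blocks of `group` samples."""
    n = min(len(pos), len(neg))
    pos, neg = pos[:n], neg[:n]
    out = []
    for start in range(0, n, group):
        out.extend(pos[start:start + group])
        out.extend(neg[start:start + group])
    return out


def build_split(statuses, n_train, group=25):
    """statuses: list of (text, emoticon_names). Returns (train, test) lists of (text, label)."""
    pos, neg = [], []
    for text, emoticons in statuses:
        label = binary_label(emoticons)
        if label == "positive":
            pos.append((text, label))
        elif label == "negative":
            neg.append((text, label))
    data = interleave_in_groups(pos, neg, group)
    return data[:n_train], data[n_train:]
```

With 2,160 usable statuses per class (4,320 in total) and `n_train=3500`, this yields 3,500 training and 820 test samples. Both parts are balanced, because the cut falls on a 50-sample block boundary and the final partial block also contains equal numbers of each class.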

    For the multi-class case, we chose to classify into four sentiment labels: unhappy, happy, skeptical and playful, chosen to maximize the amount of usable data. These labels are represented by:

    UNHAPPY: evil, frown, shock, angry
    HAPPY: smiley
    SKEPTICAL: skeptical
    PLAYFUL: angel, rock-on, shades, tongue, wink

    Again, truncating each class so that all are equal in size, we ended up with 3,612 usable, labeled samples spanning the classes above. Since we reasoned that multi-class classification would be harder, we chose to favour training set size over test set size, splitting the data roughly 90% for training and 10% for testing. This resulted in a training set of 3,300 samples and a test set of 312 samples. As before, each label was interleaved roughly evenly into the training and test files.
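The multi-class variant only changes the label map and the balancing step. In the sketch below (hypothetical names again), emoticons outside the four groups are dropped, every class is truncated to the size of the smallest one, and the classes are interleaved one sample at a time in a fixed order; interleaving one sample at a time, rather than in groups of 25 as in the binary case, is an assumption.

```python
from collections import defaultdict

# Multi-class label map, copied from the text above.
MULTI_CLASS = {
    "evil": "unhappy", "frown": "unhappy", "shock": "unhappy", "angry": "unhappy",
    "smiley": "happy",
    "skeptical": "skeptical",
    "angel": "playful", "rock-on": "playful", "shades": "playful",
    "tongue": "playful", "wink": "playful",
}


def balanced_round_robin(statuses):
    """statuses: list of (text, emoticon). Returns balanced, interleaved (text, label) pairs."""
    by_class = defaultdict(list)
    for text, emoticon in statuses:
        label = MULTI_CLASS.get(emoticon)
        if label is not None:              # emoticons outside the four groups are dropped
            by_class[label].append((text, label))
    if not by_class:
        return []
    n = min(len(samples) for samples in by_class.values())   # equalise class sizes
    labels = sorted(by_class)              # fixed interleaving order
    return [by_class[label][i] for i in range(n) for label in labels]
```

With 903 statuses per class (3,612 in total), taking the first 3,300 items of the result as training data leaves 312 for testing, matching the split described above.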

    Next, we had to create versions of the training and test set files that were compatible with the tool to be used. After running POS tagging and LDA on their respective training and test sets, we wrote Java programs to merge the resulting POS file of tagged status updates with the resulting LDA file of word-to-label ratios, in various ways depending on the testing objectives, to produce the final training and test sets fed into the classifier.
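The authors' merge step was written in Java, and its file formats are not described. The Python sketch below only illustrates the idea under assumed in-memory formats: a status is a list of (word, POS tag) pairs, and the LDA output is a hypothetical dictionary mapping each word to per-label ratios. Each status becomes a feature dictionary that combines word/tag presence features with the average LDA ratio for each label over the words that have one.

```python
def merge_features(tagged_status, lda_ratios, labels):
    """Combine POS-tagged words with (assumed) LDA word-to-label ratios.

    tagged_status: list of (word, tag) pairs for one status update.
    lda_ratios: hypothetical dict word -> {label: ratio}.
    """
    # Presence features from the POS-tagged text, e.g. "free/JJ".
    features = {f"{w.lower()}/{t}": 1.0 for w, t in tagged_status}
    for label in labels:
        ratios = [
            lda_ratios[w.lower()].get(label, 0.0)
            for w, _ in tagged_status
            if w.lower() in lda_ratios
        ]
        # Average ratio for this label; 0.0 when no word has an LDA entry.
        features[f"lda_{label}"] = sum(ratios) / len(ratios) if ratios else 0.0
    return features
```

Averaging is only one possible merge; the text mentions several merge variants chosen according to the testing objectives, and concatenating per-word ratios or keeping only the maximum would be equally plausible alternatives.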

  4. CONCLUSIONS AND FUTURE WORK

In this paper, we have investigated the utility of sentiment classification on a novel dataset of status updates from Tunisian Facebook users. The originality of this collection lies not only in the nationality of the users but also in the period in which they posted their status updates: the Tunisian revolution. This period was very special and unique for them, so their wall posts are undoubtedly unique and worth analyzing.

Using the best-known machine learning algorithms, we conducted a comparative experiment between the Naïve Bayes and SVM algorithms, combining different feature extractors. These algorithms can achieve high accuracy in sentiment classification when different features are combined. Although Facebook statuses have unique characteristics compared to other corpora (reviews, news, etc.), machine learning algorithms are shown to classify statuses with similar performance.

Finally, the overall performance of the proposed methodology is satisfactory. However, we would like to improve our research further by tracking changes in people's sentiment on a particular topic, exploring the time dependency of our data and analyzing trending topics dynamically. It would be very interesting to include the temporal dimension in this kind of analysis rather than focusing solely on previous posts or discussions.

REFERENCES:

[1.] M. A. Hearst, Text data mining: Issues, techniques, and the relationship to information access, Presentation notes for UW/MS workshop on data mining, 1997.

[2.] R. Feldman and J. Sanger, The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, 1st ed., New York: Cambridge University Press, 1995.

[3.] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, From Data Mining to Knowledge Discovery: An Overview, Advances in Knowledge Discovery and Data Mining, MIT Press, 1-36, Cambridge,1996.

[4.] D. Hand, H. Mannila, and P. Smyth, Principles of Data Mining, MIT Press, Cambridge, MA, 2001.

[5.] T. Mitchell. Machine Learning, McGraw Hill Cambridge, MA, 1997.

[6.] B. Liu, Sentiment Analysis and Subjectivity, Invited Chapter for the Handbook of Natural Language Processing, Second Edition, March 2010.

[7.] B. Pang, and L. Lee, Opinion Mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2): 1-135, 2008.

[8.] B. Pang and L. Lee, A Sentimental Education: Sentiment Analysis using Subjectivity Summarization based on Minimum Cuts, In: Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL'04), Barcelona, ES, 2004, pp. 271-278.

[9.] T. Wilson, Fine-grained subjectivity and sentiment analysis: Recognizing the intensity, polarity, and attitudes of private states, University of Pittsburgh, 2008.

[10.] P.D. Turney, Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews, In Proceedings of the 40th annual meeting on association for computational linguistics. Philadelphia, Pennsylvania, 2002.

[11.] T. Wilson, J. Wiebe, and R. Hwa, Just how mad are you? Finding strong and weak opinion clauses, In: Proceedings of the 21st Conference of the American Association for Artificial Intelligence (AAAI'04), San Jose, US, 2004, pp. 761-769.

[12.] A-M. Popescu, and O. Etzioni, Extracting product features and opinion from reviews, In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, 339- Vancouver, British Columbia, Canada: Association for Computational Linguistics, 2005.

[13.] B. Pang, L. Lee, and S. Vaithyanathan, Thumbs up? Sentiment classification using machine learning techniques. In P. Isabelle (Ed), In Proceeding of conference on empirical methods in natural language, Philadelphia, US, 2002, pp.79-86, Association for Computational Linguistics.

[14.] X. Zhang and F. Zhu, The influence of online consumer reviews on the demand for experience goods: The case of video games, In 27th International Conference on Information Systems (ICIS), Milwaukee, AIS Press, 2006.

[15.] A. Esuli and F. Sebastiani, Determining the semantic orientation of terms through gloss analysis, In: Proceedings of the 14th ACM International Conference on Information and Knowledge Management (CIKM'05), Bremen, DE, 2005, pp. 617-624.

[16.] K. T. Durant and M. D. Smith, Mining Sentiment Classification from Political Web Logs, WEBKDD'06, Philadelphia, Pennsylvania, USA, 2006, ACM 1-59593-444-8.

[17.] M. Hurst and K. Nigam, Retrieving topical sentiments from online document collections, in Document Recognition and Retrieval XI, 2004, pp. 27-34.

[18.] A. Hang, M. Vibhu, and D. Mayur, Comparative Experiments on Sentiment Classification for Online Product Reviews, American Association for Artificial Intelligence (www.aaai.org), 2006.
