Public Sentiment Variations - Interpretations
on Twitter

Nidhishree H. P.; Akshatha Hr; Vinod Kumar S Be

doi:10.17577/IJERTCONV3IS27121

NCRTS - 2015 (Volume 3 - Issue 27)


Open Access
Authors : Nidhishree H. P., Akshatha Hr, Vinod Kumar S Be
Paper ID : IJERTCONV3IS27121
Volume & Issue : NCRTS – 2015 (Volume 3 – Issue 27)
Published (First Online): 30-07-2018
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


1. Nidhishree H. P.
4th Sem (ISE), Vivekananda Institute of Technology

2. Akshatha Hr
4th Sem (CSE), Vivekananda Institute of Technology

3. Vinod Kumar S Be, MTech
Asst. Professor, Dept. of CSE, RV College of Engineering, Bangalore

Abstract: Millions of users share their opinions on Twitter, making it a valuable platform for tracking and analyzing public sentiment. Such tracking and analysis can provide critical information for decision making in various domains, and it has therefore attracted attention in both academia and industry. Previous research mainly focused on modeling and tracking public sentiment. In this work, we move one step further to interpret sentiment variations. We observed that emerging topics (named foreground topics) within the sentiment variation periods are highly related to the genuine reasons behind the variations. Based on this observation, we propose a Latent Dirichlet Allocation (LDA) based model, Foreground and Background LDA (FB-LDA), to distill foreground topics and filter out longstanding background topics. These foreground topics can give potential interpretations of the sentiment variations. To further enhance the readability of the mined reasons, we select the most representative tweets for foreground topics and develop another generative model called Reason Candidate and Background LDA (RCB-LDA) to rank them with respect to their popularity within the variation period. Experimental results show that our methods can effectively find foreground topics and rank reason candidates. The proposed models can also be applied to other tasks such as finding topic differences between two sets of documents.

Index Terms: Twitter, public sentiment, emerging topic mining, sentiment analysis, latent Dirichlet allocation, Gibbs sampling

1 INTRODUCTION

WITH the explosive growth of user generated messages, Twitter has become a social site where millions of users can exchange their opinions. Sentiment analysis on Twitter data provides an economical and effective way to expose public opinion in a timely manner, which is critical for decision making in various domains. For instance, a company can study the public sentiment in tweets to obtain users' feedback on its products, while a politician can adjust his/her position with respect to changes in public sentiment.

There have been a large number of research studies and industrial applications in the area of public sentiment tracking and modeling. Previous research like O'Connor et al. [19] focused on tracking public sentiment on Twitter and studying its correlation with consumer confidence and presidential job approval polls. Similar studies have been done for investigating the reflection of public sentiment on stock markets [4] and oil price indices [3]. They reported that events in real life indeed have a significant and immediate effect on the public sentiment on Twitter. However, none of these studies performed further analysis to mine useful insights behind significant sentiment variation, called public sentiment variation. One valuable analysis is to find possible reasons behind sentiment variation, which can provide important decision-making information. For example, if negative sentiment towards Barack Obama increases significantly, the White House Administration Office may be eager to know why people have changed their opinion and then react accordingly to reverse this trend. Another example is, if public sentiment changes greatly on some products, the related companies may want to know why their products receive such feedback.

S. Tan, J. Bu, and C. Chen are with Zhejiang Key Laboratory of Service Robot, College of Computer Science, Zhejiang University, Hangzhou 310027, China. E-mail: {shulongtan, bjj, chenc}@zju.edu.cn.
Y. Li, H. Sun, and X. Yan are with the Department of Computer Science, University of California, Santa Barbara, CA 93106, USA. E-mail: {yangli, huansun, xyan}@cs.ucsb.edu.
Z. Guan is with the College of Information and Technology, Northwest University of China, Xi'an 710127, China. E-mail: welbyhebei@gmail.com.
X. He is with State Key Laboratory of CAD&CG, College of Computer Science, Zhejiang University, Hangzhou 310027, China.

Manuscript received 4 Dec. 2012; revised 20 Apr. 2013; accepted 28 June 2013. Date of publication 15 July 2013; date of current version 7 May 2014.

Recommended for acceptance by J. Pei.

For information on obtaining reprints of this article, please send e-mail to: reprints@ieee.org, and reference the Digital Object Identifier below.

Digital Object Identifier 10.1109/TKDE.2013.116

It is generally difficult to find the exact causes of sentiment variations since they may involve complicated internal and external factors. We observed that the emerging topics discussed in the variation period could be highly related to the genuine reasons behind the variations. When people express their opinions, they often mention reasons (e.g., some specific events or topics) that support their current view. In this work, we consider these emerging events/topics as possible reasons.

Mining emerging events/topics is challenging: (1) The tweets collection in the variation period could be very noisy, covering irrelevant background topics which had been discussed for a long time and did not contribute to the changes of the public's opinion. How to filter out such background topics is an issue we need to solve. Text clustering and summarization techniques [5], [16] are not appropriate for this task since they will discover all topics in a text collection. (2) The events and topics related to opinion variation are hard to represent. Keywords produced by topic modeling [2] can describe the underlying events to some extent, but they are not as intuitive as natural language sentences. (3) Reasons could be complicated and involve a number of events. These events might not be equally important. Therefore, the mined events should be ranked with respect to their contributions.

Fig. 1. Sentiment variation tracking of "Obama".

In this paper, we analyze public sentiment variations on Twitter and mine possible reasons behind such variations. To track public sentiment, we combine two state-of-the-art sentiment analysis tools to obtain sentiment information towards targets of interest (e.g., Obama) in each tweet. Based on the sentiment label obtained for each tweet, we can track the public sentiment regarding the corresponding target using some descriptive statistics (e.g., Sentiment Percentage). On the tracking curves, significant sentiment variations can be detected with a pre-defined threshold (e.g., the percentage of negative tweets increases by more than 50%). Figs. 1 and 2 depict the sentiment curves for Obama and Apple. Note that in both figures, due to the existence of neutral sentiment, the sentiment percentages of positive and negative tweets do not necessarily sum to 1.

We propose two Latent Dirichlet Allocation (LDA) based models to analyze tweets in significant variation periods, and infer possible reasons for the variations. The first model, called Foreground and Background LDA (FB-LDA), can filter out background topics and extract foreground topics from tweets in the variation period, with the help of an auxiliary set of background tweets generated just before the variation. By removing the interference of longstanding background topics, FB-LDA can address the first aforementioned challenge. To handle the last two challenges, we propose another generative model called Reason Candidate and Background LDA (RCB-LDA). RCB-LDA first extracts representative tweets for the foreground topics (obtained from FB-LDA) as reason candidates. Then it associates each remaining tweet in the variation period with one reason candidate and ranks the reason candidates by the number of tweets associated with them. Experimental results on real Twitter data show that our method can outperform baseline methods and effectively mine desired information behind public sentiment variations.

Fig. 2. Sentiment variation tracking of "Apple".

In summary, the main contributions of this paper are twofold: (1) To the best of our knowledge, our study is the first work that tries to analyze and interpret the public sentiment variations in microblogging services. (2) Two novel generative models are developed to solve the reason mining problem. The two proposed models are general: they can be applied to other tasks such as finding topic differences between two sets of documents.

2 MODELS FOR SENTIMENT VARIATION ANALYSIS

To analyze public sentiment variations and mine possible reasons behind these variations, we propose two Latent Dirichlet Allocation (LDA) based models: (1) Foreground and Background LDA (FB-LDA) and (2) Reason Candidate and Background LDA (RCB-LDA). In this section we illustrate the intuitions and describe the details of the two proposed models.

2.1 Intuitions and Notations
  
  As discussed in the last section, it is hard to find the exact causes for sentiment variation. However, it is possible to find clues via analyzing the relevant tweets within the variation period, since people often justify their opinion with supporting reasons. For example, if we want to know why positive sentiment on Obama increases, we can analyze the tweets with positive sentiment in the changing period and dig out the underlying events/topics co-occurring with these positive opinions.
  
  We consider the emerging events or topics that are strongly correlated with sentiment variations as possible reasons. Mining such events/topics is not trivial. Topics discussed before the variation period may continue receiving attention for a long time. Therefore, we need to make use of the tweets generated just before the variation period to help eliminate these background topics. We can formulate this special topic mining problem as follows: given two document sets, a background set B and a foreground set T, we want to mine the special topics inside T but outside B. In our reason mining task, the foreground set T contains tweets appearing within the variation period and the background set B contains tweets appearing before the variation period. Note that this problem setting is general: it has applications beyond sentiment analysis.
  
  To solve this topic mining problem, we develop a generative model called Foreground and Background LDA (FB-LDA). Fig. 3(a) shows the graphical structure of dependencies of FB-LDA. Benefiting from the reference role of the background tweets set, FB-LDA can distinguish the foreground topics from the background or noise topics. Such foreground topics can help reveal possible reasons of the sentiment variations, in the form of word distributions. Details of FB-LDA will be described in Section 2.2.
  
  Fig. 3. (a) Foreground and Background LDA (FB-LDA). (b) Reason Candidate and Background LDA (RCB-LDA).
  
  FB-LDA utilizes word distributions to reveal possible reasons, which might not be easy for users to understand. Therefore we resort to finding representative tweets that reflect foreground topics learnt from FB-LDA. These most relevant tweets, defined as Reason Candidates C, are sentence-level representatives for foreground topics. Since they are not equally important, we rank these candidates (representative tweets) by associating the tweets in the foreground tweets set to them. Each tweet is mapped to only one candidate. The more important a reason candidate is, the more tweets it will be associated with. Top-ranked candidates will likely reveal the reasons behind sentiment variations.
  
  In particular, the association task can be done by comparing the topic distributions (obtained by topic modeling methods, such as LDA) of tweets and the reason candidates. However, this solution is not optimal since the optimization goal of the topic modeling step does not take into account the tweet-candidate association at all. Inspired by [12], we propose another generative model called Reason Candidate and Background LDA (RCB-LDA) to accomplish this task. RCB-LDA can simultaneously optimize topic learning and tweet-candidate association. RCB-LDA, as depicted in Fig. 3(b), is an extension of FB-LDA. It accepts a set of reason candidates as input and outputs the associations between tweets and those reason candidates. Details of RCB-LDA will be described in Section 2.3.
  
  For the purpose of better describing our models, we summarize all the notations used in FB-LDA and RCB-LDA in Table 1.
2.2 Foreground and Background LDA
  
  To mine foreground topics, we need to filter out all topics existing in the background tweets set, known as background topics, from the foreground tweets set. We propose a generative model, FB-LDA, to achieve this goal.
  
  As shown in Fig. 3(a), FB-LDA has two sets of word distributions: $\phi_f$ ($K_f \times V$) and $\phi_b$ ($K_b \times V$). $\phi_f$ is for foreground topics and $\phi_b$ is for background topics. $K_f$ and $K_b$ are the numbers of foreground topics and background topics, respectively, and $V$ is the size of the vocabulary. For the background tweets set, FB-LDA follows a generative process similar to that of standard LDA [2]. Given the chosen topic, each word in a background tweet is drawn from the word distribution of one background topic (i.e., one row of the matrix $\phi_b$). For the foreground tweets set, however, each tweet has two topic distributions: a foreground topic distribution $\theta_t$ and a background topic distribution $\mu_t$. For each word in a foreground tweet, an association indicator $y_t^i$, drawn from a type decision distribution $\lambda_t$, indicates whether its topic is chosen from $\theta_t$ or $\mu_t$. If $y_t^i = 0$, the topic of the word is drawn from the foreground topics (i.e., from $\theta_t$), and the word is then drawn from $\phi_f$ based on the drawn topic. Otherwise ($y_t^i = 1$), the topic of the word is drawn from the background topics (i.e., from $\mu_t$), and the word is accordingly drawn from $\phi_b$.
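  The generative story for a single foreground tweet can be illustrated with a short simulation. This is only a sketch under stated assumptions: the function name, the variable names, and the default hyperparameter values are ours, the word distributions are passed in as NumPy arrays whose rows sum to 1, and the code samples from the model rather than performing inference.

```python
import numpy as np

def generate_foreground_tweet(phi_f, phi_b, n_words, rng,
                              alpha_theta=0.1, alpha_mu=0.1, alpha_lambda=0.5):
    """Sample one foreground tweet (word ids and word types) from the FB-LDA story.

    phi_f: (K_f x V) foreground word distributions, rows sum to 1.
    phi_b: (K_b x V) background word distributions, rows sum to 1.
    The alpha defaults are illustrative assumptions, not prescribed values.
    """
    k_f, vocab = phi_f.shape
    k_b = phi_b.shape[0]
    theta_t = rng.dirichlet(np.full(k_f, alpha_theta))   # foreground topic distribution
    mu_t = rng.dirichlet(np.full(k_b, alpha_mu))         # background topic distribution
    lambda_t = rng.dirichlet(np.full(2, alpha_lambda))   # type decision distribution
    words, types = [], []
    for _ in range(n_words):
        y = int(rng.choice(2, p=lambda_t))               # 0: foreground, 1: background
        if y == 0:
            z = rng.choice(k_f, p=theta_t)               # foreground topic
            w = rng.choice(vocab, p=phi_f[z])            # word from that topic
        else:
            z = rng.choice(k_b, p=mu_t)                  # background topic
            w = rng.choice(vocab, p=phi_b[z])
        words.append(int(w))
        types.append(y)
    return words, types

# Toy usage with randomly drawn topics (purely illustrative).
rng = np.random.default_rng(0)
toy_phi_f = rng.dirichlet(np.full(30, 0.5), size=3)
toy_phi_b = rng.dirichlet(np.full(30, 0.5), size=4)
toy_words, toy_types = generate_foreground_tweet(toy_phi_f, toy_phi_b, 12, rng)
```

  Background tweets would be sampled in the same way with the type fixed to 1, which is exactly the standard LDA process.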
  
  TABLE 1
  
  Notations for Our Proposed Models
  
  With the help of background tweets, tweets coming from the foreground set but corresponding to background topics make a bigger contribution to the learning of background topics (i.e., $\phi_b$) than to the learning of foreground topics (i.e., $\phi_f$). The large number of similar tweets in the background set pulls them towards the background topics. Only tweets corresponding to foreground topics (i.e., emerging topics) are used to build foreground topics. In this way, background topics are filtered out and foreground topics are highlighted in a natural way.

  To summarize, we have the following generative process in FB-LDA:

  1. Choose a word distribution $\phi_{f;k_f} \sim \mathrm{Dirichlet}(\beta_f)$ for each foreground topic $k_f$.
  2. Choose a word distribution $\phi_{b;k_b} \sim \mathrm{Dirichlet}(\beta_b)$ for each background topic $k_b$.
  3. For each tweet $b$ in the background data, $b \in \{1, \ldots, B\}$:
     (a) Choose a topic distribution $\mu_b \sim \mathrm{Dirichlet}(\alpha_\mu)$.
     (b) For each word $w_b^i$ in the tweet, $i \in \{1, \ldots, N_b\}$:
         i. Choose a topic $z_b^i \sim \mathrm{Multinomial}(\mu_b)$.
         ii. Choose a word $w_b^i \sim \mathrm{Multinomial}(\phi_{b;z_b^i})$.
  4. For each tweet $t$ in the foreground data, $t \in \{1, \ldots, T\}$:
     (a) Choose a type decision distribution $\lambda_t \sim \mathrm{Dirichlet}(\alpha_\lambda)$.
     (b) For each word $w_t^i$ in the tweet, $i \in \{1, \ldots, N_t\}$:
         i. Choose a type $y_t^i \sim \mathrm{Bernoulli}(\lambda_t)$.
         ii. If $y_t^i = 0$:
             A. Choose a foreground topic distribution $\theta_t \sim \mathrm{Dirichlet}(\alpha_\theta)$.
             B. Choose a topic $z_t^i \sim \mathrm{Multinomial}(\theta_t)$.
             C. Choose a word $w_t^i \sim \mathrm{Multinomial}(\phi_{f;z_t^i})$.
         iii. Else (i.e., $y_t^i = 1$):
             A. Choose a background topic distribution $\mu_t \sim \mathrm{Dirichlet}(\alpha_\mu)$.
             B. Choose a topic $z_t^i \sim \mathrm{Multinomial}(\mu_t)$.
             C. Choose a word $w_t^i \sim \mathrm{Multinomial}(\phi_{b;z_t^i})$.

  Given the hyperparameters $\alpha_\theta$, $\alpha_\lambda$, $\alpha_\mu$, $\beta_f$, $\beta_b$, the joint distribution is:

  $$
  \begin{aligned}
  L &= P(\mathbf{y}, \mathbf{z}_t, \mathbf{z}_b, \mathbf{w}_t, \mathbf{w}_b \mid \alpha_\theta, \alpha_\lambda, \alpha_\mu, \beta_f, \beta_b) \\
    &= P(\mathbf{y} \mid \alpha_\lambda)\, P(\mathbf{z}_t \mid \mathbf{y} = 0; \alpha_\theta)\, P(\mathbf{z}_t, \mathbf{z}_b \mid \mathbf{y} = 1; \alpha_\mu)\, P(\mathbf{w}_t \mid \mathbf{y} = 0, \mathbf{z}_t; \beta_f)\, P(\mathbf{w}_t, \mathbf{w}_b \mid \mathbf{y} = 1, \mathbf{z}_t, \mathbf{z}_b; \beta_b).
  \end{aligned}
  \tag{1}
  $$

2.3 Reason Candidate and Background LDA

  Different from FB-LDA, RCB-LDA needs a third document set: the reason candidates, which represent specific events. In this paper we automatically select reason candidates by finding the most relevant tweets (i.e., representative tweets) for each foreground topic learnt from FB-LDA, using the following measure:

  $$
  \mathrm{Relevance}(t, k_f) = \sum_{i \in t} \phi_f^{k_f, i},
  \tag{2}
  $$

  where $\phi_f^{k_f}$ is the word distribution for the foreground topic $k_f$ and $i$ is the index of each non-repetitive word in tweet $t$. Note that we don't normalize this measure with respect to the length of the tweet, since tweets are all very short and generally have similar lengths. For other kinds of texts, normalization shall be applied. After filtering out junk tweets and merging similar ones, we consider the remaining relevant tweets as the reason candidates.

  As shown in Fig. 3(b), the generative process of RCB-LDA is similar to that of FB-LDA. It generates the reason candidates set and the background tweets set in a similar way as the standard LDA. The main difference lies in the generative process of the foreground tweets set. Each word in the foreground tweets set can select a topic from alternative topic distributions: (1) draw a foreground topic from the topic distribution of one candidate; (2) draw a background topic from its own background distribution $\mu_t$. Specifically, for each word in tweets from the foreground tweets set, a $y_t^i$ is chosen, similar to that in FB-LDA. If $y_t^i = 0$, we choose an association candidate $c_t^i$, which is drawn from a candidate association distribution $\gamma_t$. Then we draw a foreground topic from $\theta_{c_t^i}$ for that word. The generative process for $y_t^i = 1$ is the same as that in FB-LDA.

  The mapping from a foreground tweet $t$ to any reason candidate or a background topic can be controlled by $\gamma_{t,c}$ and $\lambda_{t,0}$. If $\lambda_{t,0}$ is bigger than an empirical threshold (e.g., 0.5), the tweet will be mapped to the candidate $c$ which corresponds to the largest $\gamma_{t,c}$ value; otherwise it will be mapped to the background topic.

  Due to the space limit, we omit some parts of the generative process of RCB-LDA which are similar to those in FB-LDA. Here we just present the generative process for the foreground tweets set in RCB-LDA:

  For each tweet $t$ in the foreground tweet data, $t \in \{1, \ldots, T\}$:

  1) Choose a type decision distribution $\lambda_t \sim \mathrm{Dirichlet}(\alpha_\lambda)$.
  2) Choose a candidate association distribution $\gamma_t \sim \mathrm{Dirichlet}(\alpha_\gamma)$.
  3) For each word $w_t^i$ in the tweet, $i \in \{1, \ldots, N_t\}$:
     a) Choose a type $y_t^i \sim \mathrm{Bernoulli}(\lambda_t)$.
     b) If $y_t^i = 0$:
        i) Choose a candidate $c_t^i \sim \mathrm{Multinomial}(\gamma_t)$.
        ii) Choose a topic $z_t^i \sim \mathrm{Multinomial}(\theta_{c_t^i})$.
        iii) Choose a word $w_t^i \sim \mathrm{Multinomial}(\phi_{f;z_t^i})$.
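  The relevance measure of Eq. (2) and the threshold-based mapping of a foreground tweet can be sketched in a few lines. This is an illustration under assumptions: tweets are represented as lists of integer word ids, the foreground word distributions are a NumPy array with one row per topic, and the function names are ours. The junk filtering and merging of similar tweets mentioned in the text are not covered.

```python
import numpy as np

def relevance(tweet_word_ids, phi_f, k_f):
    """Eq. (2): sum of phi_f[k_f, i] over the distinct words i of the tweet."""
    distinct = sorted(set(tweet_word_ids))
    return float(phi_f[k_f, distinct].sum())

def top_candidates(tweets, phi_f, k_f, n=1):
    """Indices of the n tweets most relevant to foreground topic k_f."""
    scores = [relevance(t, phi_f, k_f) for t in tweets]
    order = sorted(range(len(tweets)), key=lambda j: scores[j], reverse=True)
    return order[:n]

def map_tweet(lambda_t0, gamma_t, threshold=0.5):
    """Return the index of the associated candidate, or None for 'background'.

    lambda_t0 is the estimated foreground share of the tweet and gamma_t its
    candidate association distribution; 0.5 is the example threshold from the text.
    """
    if lambda_t0 > threshold:
        return int(np.argmax(gamma_t))
    return None
```

  Ranking the reason candidates then amounts to counting how many foreground tweets `map_tweet` assigns to each candidate index.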
2.4 Gibbs Sampling

  Similar to the original LDA model, exact inference for our model is intractable. Several approximate inference methods are available, such as variational inference [2], expectation propagation [17] and Gibbs Sampling [7], [9]. We use Gibbs Sampling here, since it is easy to extend and it has been proved to be quite effective in avoiding local optima.

  The sampling methods for the two models are similar to each other. Due to the space limit, we only focus on the detailed inference of the relatively more complicated model RCB-LDA, and the inference of FB-LDA can be derived similarly. The sampling methods of $\mathbf{z}_c$, $\mathbf{z}_b$, $\mathbf{z}_t$, $\mathbf{c}$ and $\mathbf{y}$ in RCB-LDA are as follows:

  $$
  P(z_c^i = k_f \mid \mathbf{z}_c^{-(c,i)}, \mathbf{z}_t, \mathbf{z}_b, \mathbf{w}_c, \mathbf{w}_t, \mathbf{w}_b, \mathbf{c}, \mathbf{y}; \alpha_\theta, \alpha_\gamma, \alpha_\lambda, \alpha_\mu, \beta_f, \beta_b)
  \propto \frac{\Omega_{v,k_f} + \beta_f - 1}{\Omega_{(\cdot),k_f} + V \cdot \beta_f - 1} \times \left(\Theta_{c,k_f} + \Psi_{c,k_f} + \alpha_\theta - 1\right),
  \tag{4}
  $$

  where $v$ is the word token in the vocabulary that has the same word symbol as word $i$. $\Omega_{v,k_f}$ is the number of times word token $v$ is drawn from the $k_f$-th foreground topic. $\Omega_{(\cdot),k_f}$ is the total number of word tokens drawn from the $k_f$-th foreground topic. $\Theta_{c,k_f}$ is the number of words in candidate $c$ which choose topic $k_f$. $\Psi_{c,k_f}$ is the number of words in the foreground tweets set which are associated with candidate $c$ and choose topic $k_f$.

  $$
  P(z_b^i = k_b \mid \mathbf{z}_b^{-(b,i)}, \mathbf{z}_c, \mathbf{z}_t, \mathbf{w}_c, \mathbf{w}_t, \mathbf{w}_b, \mathbf{c}, \mathbf{y}; \alpha_\theta, \alpha_\gamma, \alpha_\lambda, \alpha_\mu, \beta_f, \beta_b)
  \propto \frac{\Omega_{v,k_b} + \beta_b - 1}{\Omega_{(\cdot),k_b} + V \cdot \beta_b - 1} \times \left(\Delta_{b,k_b} + \alpha_\mu - 1\right),
  \tag{5}
  $$

  where $\Omega_{v,k_b}$ is the number of times word token $v$ is assigned to the $k_b$-th background topic. $\Omega_{(\cdot),k_b}$ is the total number of word tokens assigned to the $k_b$-th background topic. $\Delta_{b,k_b}$ is the number of words in tweet $b$ which choose topic $k_b$.

  $M_{t,i}$ is the number of words of type $i$ in tweet $t$ and $M_t$ is the total number of words in tweet $t$. $R_{t,c}$ is the number of words in tweet $t$ which are generated by the topics in reason candidate $c$, while $R_t$ is the total number of words in tweet $t$ which are generated by the foreground topics.

2.5 Parameter Estimation

  Given the sampled topics $\mathbf{z}_c$, $\mathbf{z}_t$, $\mathbf{z}_b$, type associations $\mathbf{y}$ and candidate associations $\mathbf{c}$, as well as the inputs $\alpha$s, $\beta$s, $\mathbf{w}_c$, $\mathbf{w}_t$ and $\mathbf{w}_b$, we can estimate the model parameters $\lambda$, $\gamma$, $\theta$, $\mu$, $\phi_f$ and $\phi_b$ for RCB-LDA as follows. Again, the detailed parameter estimation for FB-LDA is omitted here. In our experiments, we empirically set the model hyperparameters as $\alpha_\theta = 0.1$, $\alpha_\gamma = 0.1$, $\alpha_\lambda = 0.5$, $\alpha_\mu = 0.1$, $\beta_f = 0.01$ and $\beta_b = 0.01$.

  $$
  \lambda_{t,i} = \frac{M_{t,i} + \alpha_\lambda}{M_t + 2 \cdot \alpha_\lambda}; \qquad
  \gamma_{t,c} = \frac{R_{t,c} + \alpha_\gamma}{R_t + C \cdot \alpha_\gamma},
  \tag{8}
  $$

  where $i = 0$ or $1$. $\lambda_t$ serves as a threshold to determine whether the tweet is a foreground topic tweet (e.g., $\lambda_{t,0} > 0.5$) or a background tweet. $\gamma_t$ can be used to determine which candidate the tweet is associated with (e.g., by choosing the biggest $\gamma_{t,c}$).

  $$
  \theta_{c,k_f} = \frac{\Theta_{c,k_f} + \Psi_{c,k_f} + \alpha_\theta}{\Theta_{c,(\cdot)} + \Psi_{c,(\cdot)} + K_f \cdot \alpha_\theta},
  \tag{9}
  $$

  $$
  \mu_{i,k_b} = \frac{\Delta_{i,k_b} + \alpha_\mu}{\Delta_{i,(\cdot)} + K_b \cdot \alpha_\mu},
  \tag{10}
  $$

  $$
  \phi_f^{k_f,v} = \frac{\Omega_{v,k_f} + \beta_f}{\Omega_{(\cdot),k_f} + V \cdot \beta_f}; \qquad
  \phi_b^{k_b,v} = \frac{\Omega_{v,k_b} + \beta_b}{\Omega_{(\cdot),k_b} + V \cdot \beta_b};
  \tag{11}
  $$

  $\phi_f$ is the word distribution for each foreground topic; $\phi_b$ is the word distribution for each background topic.

3 TRACKING PUBLIC SENTIMENT

In our work, sentiment tracking involves the following three steps. First, we extract tweets related to our targets of interest (e.g., Obama, Apple), and preprocess the extracted tweets to make them more appropriate for sentiment analysis. Second, we assign a sentiment label to each individual tweet by combining two state-of-the-art sentiment analysis tools [6], [29]. Finally, based on the sentiment labels obtained for each tweet, we track the sentiment variation regarding the corresponding target using some descriptive statistics. Details of these steps will be described in the following subsections.
3.1 Tweets Extraction and Preprocessing
  
  To extract tweets related to the target, we go through the whole dataset and extract all the tweets which contain the keywords of the target.
  
  Compared with regular text documents, tweets are generally less formal and often written in an ad hoc manner. Sentiment analysis tools applied to raw tweets often perform very poorly. Therefore, preprocessing of tweets is necessary for obtaining satisfactory sentiment analysis results:
  1. Slang words translation: Tweets often contain a lot of slang words (e.g., lol, omg). These words are usually important for sentiment analysis, but may not be included in sentiment lexicons. Since the sentiment analysis tool [29] we are going to use is based on a sentiment lexicon, we convert these slang words into their standard forms using the Internet Slang Word Dictionary¹ and then add them to the tweets.
  2. Non-English tweets filtering: Since the sentiment analysis tools to be used only work for English texts, we remove all non-English tweets in advance. A tweet is considered non-English if more than 20 percent of its words (after slang words translation) do not appear in the GNU Aspell English Dictionary².
  3. URL removal: A lot of users include URLs in their tweets. These URLs complicate the sentiment analysis process, so we remove them from tweets.
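  The three preprocessing steps can be sketched as follows. The two small dictionaries are hypothetical stand-ins for the Internet Slang Word Dictionary and the GNU Aspell English word list, the tokenizer is a simple regular expression of our choosing, and two details are assumptions on our part: URLs are stripped first so that their fragments are not counted as words, and each slang token is replaced by its standard form.

```python
import re

# Hypothetical stand-ins; real runs would load the full slang dictionary
# and the Aspell English word list.
SLANG = {"lol": "laughing out loud", "omg": "oh my god"}
ENGLISH = {"laughing", "out", "loud", "oh", "my", "god",
           "i", "love", "the", "new", "phone", "this", "is"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")

def remove_urls(text):
    """Step 3: drop URLs from the tweet text."""
    return URL_RE.sub("", text)

def translate_slang(tokens):
    """Step 1: replace each slang token by the words of its standard form."""
    out = []
    for tok in tokens:
        out.extend(SLANG[tok].split() if tok in SLANG else [tok])
    return out

def is_english(tokens, max_unknown=0.2):
    """Step 2: reject a tweet if more than 20% of its words are out of dictionary."""
    if not tokens:
        return False
    unknown = sum(1 for tok in tokens if tok not in ENGLISH)
    return unknown / len(tokens) <= max_unknown

def preprocess(text):
    """Return the cleaned token list, or None if the tweet is filtered out."""
    tokens = re.findall(r"[a-z']+", remove_urls(text).lower())
    tokens = translate_slang(tokens)
    return tokens if is_english(tokens) else None
```

  For instance, `preprocess("OMG I love the new phone http://t.co/abc")` keeps the tweet with the slang expanded, while a tweet whose words are mostly outside the toy dictionary yields `None`.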
3.2 Sentiment Label Assignment
  
  To assign sentiment labels for each tweet more confidently, we resort to two state-of-the-art sentiment analysis tools. One is the SentiStrength³ tool [29]. This tool is based on the LIWC [27] sentiment lexicon. It works in the following way: first assign a sentiment score to each word in the text according to the sentiment lexicon; then choose the maximum positive score and the maximum negative score among those of all individual words in the text; compute the sum of the maximum positive score and the maximum negative score, denoted as FinalScore; finally, use the sign of FinalScore to indicate whether a tweet is positive, neutral or negative. The other sentiment analysis tool is TwitterSentiment⁴. TwitterSentiment is based on a Maximum Entropy classifier [6]. It uses 160,000 automatically collected tweets with emoticons as noisy labels to train the classifier. Then, based on the classifier's outputs, it assigns the sentiment label (positive, neutral or negative) with the maximum probability as the sentiment label of a tweet.
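  The lexicon-based scoring described for the first tool can be illustrated as below. This is not the SentiStrength implementation: the toy lexicon, the score range, and the choice of scoring unknown words as 0 are assumptions made only to show how FinalScore and its sign are obtained.

```python
# Hypothetical toy lexicon: positive words score 1..5, negative words -1..-5.
LEXICON = {"love": 3, "great": 4, "good": 2, "hate": -4, "bad": -2, "awful": -5}

def final_score(tokens, lexicon=LEXICON):
    """Sum of the strongest positive and the strongest negative word score."""
    scores = [lexicon.get(tok, 0) for tok in tokens]     # unknown words count as 0
    max_pos = max((s for s in scores if s > 0), default=0)
    max_neg = min((s for s in scores if s < 0), default=0)  # most negative score
    return max_pos + max_neg

def label_from_score(score):
    """Map the sign of FinalScore to a sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

  A tweet containing both "love" and "awful" gets 3 + (-5) = -2 under this toy lexicon and is therefore labeled negative.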
  1. http://www.noslang.com
  2. http://aspell.net
  3. http://sentistrength.wlv.ac.uk
  4. http://twittersentiment.appspot.com
    
    Though these two tools are very popular, their performance on real datasets is not satisfactory because a large proportion of tweets still contain noise after preprocessing. We randomly picked 1,000 tweets and manually labeled them to test the overall accuracy of these two tools. It turns out that SentiStrength and TwitterSentiment achieve 62.3% and 57.2% accuracy on this testing dataset, respectively. By analyzing more cases outside the testing set, we found that TwitterSentiment is very inclined to misjudge a non-neutral tweet as neutral, while SentiStrength is highly likely to make a wrong judgement when FinalScore is close to 0. Therefore, we design the following strategy to combine the two tools:
    1. If both tools make the same judgement, adopt this judgement;
    2. If the judgement of one tool is neutral while that of the other is not, trust the non-neutral judgement;
    3. In the case where the two judgements conflict with each other (i.e., one positive and one negative), trust SentiStrength's judgement if the absolute value of FinalScore is larger than 1; otherwise, trust TwitterSentiment's judgement.
    
    By utilizing the above heuristic strategy, the accuracy on the testing dataset is boosted to 69.7%, indicating the effectiveness of combining the two tools.
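    The three combination rules translate directly into a small function. The sketch below assumes that each tool's output has already been reduced to one of the strings "positive", "neutral" or "negative"; the function and parameter names are ours.

```python
def combine_labels(senti_label, senti_final_score, twitter_label):
    """Combine the two tools' judgements following the three rules above."""
    # Rule 1: both tools agree.
    if senti_label == twitter_label:
        return senti_label
    # Rule 2: exactly one judgement is neutral, so trust the non-neutral one.
    if senti_label == "neutral":
        return twitter_label
    if twitter_label == "neutral":
        return senti_label
    # Rule 3: positive vs. negative conflict, decided by |FinalScore|.
    if abs(senti_final_score) > 1:
        return senti_label
    return twitter_label
```

    For example, `combine_labels("positive", 1, "negative")` returns "negative" because FinalScore is too close to 0 for the lexicon-based judgement to be trusted.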
    
    Note that the low accuracy of sentiment analysis would affect the final result of reason mining. For example, the mined possible reasons for negative sentiment variations may contain positive or neutral tweets. The low accuracy is mainly due to the fact that sentiment analysis techniques are still not very reliable on noisy data such as tweets. Fortunately, our work uses the aggregated sentiment of multiple tweets. As long as the error caused by sentiment analysis is not significantly biased, the result is still useful, as demonstrated later by our case study. In practice, any sentiment analysis method that achieves better performance can be plugged into our algorithm to improve the performance of reason mining.
3.3 Sentiment Variation Tracking
  
  After obtaining the sentiment labels of all extracted tweets about a target, we can track the sentiment variation using some descriptive statistics. Previous work on burst detection usually chooses the variation of the total number of tweets over time as an indicator. However, in this work, we are interested in analyzing the time period during which the overall positive (negative) sentiment climbs upward while the overall negative (positive) sentiment slides downward. In this case, the total number of tweets is no longer informative, since the numbers of positive and negative tweets may change consistently. Here we adopt the percentage of positive or negative tweets among all the extracted tweets as an indicator for tracking sentiment variation over time. Based on these descriptive statistics, sentiment variations can be found using various heuristics (e.g., the percentage of positive/negative tweets increases by more than 50%). Figs. 1 and 2 show the sentiment curves regarding Obama and Apple from June 2009 to October 2009. We will test our proposed methods on sentiment variations of these two targets.
  
  TABLE 2
  
  Basic Statistics for the 50 Sentiment Variations
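  The tracking statistic and the 50% heuristic can be sketched as follows. The code assumes that labels are grouped per day and reads "increases by more than 50%" as a relative rise over a maximal run of consecutive increases; both the grouping and this reading are our assumptions, as are the function names.

```python
def sentiment_percentage(daily_labels, label):
    """Share of tweets carrying `label` on each day (0.0 for days without tweets)."""
    return [labels.count(label) / len(labels) if labels else 0.0
            for labels in daily_labels]

def find_variations(series, min_increase=0.5):
    """Return (start, end) index pairs of maximal increasing runs whose
    relative rise from start to end exceeds min_increase."""
    periods, start = [], 0
    for i in range(1, len(series) + 1):
        # A run ends at the end of the series or when the curve stops rising.
        if i == len(series) or series[i] <= series[i - 1]:
            end = i - 1
            if end > start and series[start] > 0:
                rise = (series[end] - series[start]) / series[start]
                if rise > min_increase:
                    periods.append((start, end))
            start = i
    return periods
```

  With the curve `[0.20, 0.22, 0.35, 0.30, 0.31, 0.29]`, only the first three days form a variation period, since the share rises from 0.20 to 0.35 (a 75% relative increase).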
4 EXPERIMENTS

In this section, we first present the experiments on a Twitter dataset for mining possible reasons of public sentiment variations. The results demonstrate that our models outperform baseline methods in finding foreground topics and ranking reason candidates. Then we apply our models to scientific article data and product review data. These applications show that our models can be used in any case where we need to mine special topics or events from one text collection in comparison with another text collection.

learned foreground topics. Here we treat the standard LDA (only using the foreground tweets set) as the baseline. We propose to measure the quality of a topic using word entropy: the conditional entropy of the word distribution given a topic, which is similar to the topic entropy [8]. Entropy measures the average amount of information expressed by each assignment to a random variable. If the a topics word distribution has low word entropy, it means that topic has a narrow focus on a set of words. Therefore, a topic mod-eling method with a low average word entropy generates topics with high clarity and interpretability. The definition of the word entropy given a topic k is as follows:

V

H(w|k) = p(wi|k)logp(wi|k)

i=1

V

Twitter Dataset

=

i=1

k,ilogk,i,

(12)

Our proposed models are tested on a Twitter dataset to analyze public sentiment variations. The dataset is obtained from the Stanford Network Analysis Platform [34]. It spans from June 11, 2009 to December 31, 2009 and contains around 476 million tweets. It covers around 20-30% of all public tweets published on Twitter during that time period. We do our experiments on a subset of the dataset, which spans from June 13, 2009 to October 31, 2009. In this work, we choose two targets to test our methods, Obama and Apple. These two targets are chosen as representatives of the political sphere and the business field, where the anal-ysis of sentiment variation plays a critical role in helping decision making. Figs. 1 and 2 show the sentiment variation curves of these two targets.

To evaluate our models, we choose 50 sentiment variations for Obama and Apple, from June 13, 2009 to October 31, 2009. In particular, we first detect sentiment variation periods by the following heuristic: if the percentage of negative/positive tweets increases by more than 50%, we mark the increasing period as a sentiment variation period. Then the foreground tweets set is generated from tweets within the variation period, while the background tweets set is formed from tweets just before the variation period. The number of background tweets is chosen to be twice that of the foreground set. Both the foreground set and the background set contain only the tweets whose sentiment labels correspond to the sentiment variation (e.g., positive label for positive sentiment increase). Table 2 shows some basic statistics for the 50 cases.
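
The detection heuristic and the construction of the two tweet sets can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: it assumes the "increase" is a relative increase measured over a maximal run of consecutive rises in a daily percentage series, that the first day of the run belongs to the period, and that tweets are stored as `(day_index, label, text)` tuples sorted by day. The function names `find_variation_periods` and `split_sets` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical tweet layout: (day index, sentiment label, text), sorted by day.
Tweet = Tuple[int, str, str]


def find_variation_periods(pct: List[float],
                           min_rel_increase: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start, end) day indices of maximal increasing runs whose
    relative increase exceeds min_rel_increase (0.5 means 50%).

    Assumption: the paper does not say how the increase is measured;
    here it is (last - first) / first over each increasing run.
    """
    periods = []
    start = 0
    for i in range(1, len(pct) + 1):
        # A run ends at the end of the series or when the value stops rising.
        if i == len(pct) or pct[i] <= pct[i - 1]:
            end = i - 1
            if end > start and pct[start] > 0:
                if (pct[end] - pct[start]) / pct[start] > min_rel_increase:
                    periods.append((start, end))
            start = i
    return periods


def split_sets(tweets: List[Tweet], start: int, end: int, label: str):
    """Build the foreground and background sets for one variation period.

    Only tweets whose label matches the variation are kept. The background
    is taken from tweets just before the period, at most twice as many as
    the foreground.
    """
    matching = [t for t in tweets if t[1] == label]
    foreground = [t for t in matching if start <= t[0] <= end]
    before = [t for t in matching if t[0] < start]
    background = before[-2 * len(foreground):] if foreground else []
    return foreground, background


# Daily share of negative tweets: it rises from 0.20 to 0.35 over days 0..2.
print(find_variation_periods([0.20, 0.21, 0.35, 0.30, 0.31, 0.29]))
```

In practice the granularity of the series (hourly, daily) and the treatment of the boundary day would need to follow the actual tracking procedure, which this sketch does not attempt to reproduce.
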
Foreground Topics from FB-LDA

In this section, we evaluate the effectiveness of our first proposed model FB-LDA, using all 50 sentiment variations shown in Table 2. We will first compare the learned topics of FB-LDA and LDA with a heuristic evaluation metric. Then we will devise two baseline methods and quantitatively compare FB-LDA with them by using manually labeled ground truth.
1. Average Word Entropy
  
  In Equation (12), $\phi_k$ is the word distribution of topic k and V is the number of words in the vocabulary.
  
  Here we configure FB-LDA to output 20 foreground topics and 20 background topics, and set LDA to produce 20 topics. The average word entropies for topics learned from FB-LDA and LDA are 3.775 and 4.389, respectively. This shows that the topics produced by FB-LDA exhibit lower word entropy than those learned by LDA, indicating that FB-LDA can generally obtain topics with less ambiguity and more interpretability.
2. Quantified Evaluation
  
  To verify that FB-LDA is effective in finding foreground topics, we first manually find foreground events for each sentiment variation and consider these foreground events as ground truth. Then we evaluate the precision and recall of the results mined by FB-LDA with respect to the ground truth.
  
  To find the ground truth, we first do text clustering on the foreground tweets set and the background tweets set respectively. Big clusters in the foreground set are extracted as candidates. Then we manually filter out the candidates which also exist in the background set. The remaining candidates are treated as the final ground truth. Each candidate is represented by 1-3 representative tweets which are different descriptions of the underlying event. On average, each sentiment variation has 9 emerging/foreground events.
  
  The precision and recall of the results found by FB-LDA are computed as follows: (a) rank foreground topics by their word entropies in ascending order; (b) for each foreground topic, select the five most relevant tweets by Equation (2); (c) if the most relevant tweets contain a tweet in the ground truth, or contain a tweet which is very similar to a tweet in the ground truth, we consider that the method finds a correct foreground event. We define the similarity between two tweets as follows:
  
  $$\mathrm{Similarity}(t_i, t_j) = \frac{2 \times |\mathrm{WordOverlap}|}{|t_i| + |t_j|}. \tag{13}$$
  
  Fig. 4. Precision-recall curves for FB-LDA, LDA and k-means.
  
  In our experiments, two tweets will be considered similar if their Similarity is no less than 0.8.
  
  Since there is no existing work that does exactly the same thing as ours (i.e., finding foreground topics/events), we design two methods based on traditional techniques as the baselines: k-means and LDA. For k-means, we first run k-means clustering on the foreground set and the background set respectively. Since clusters from the foreground set contain both foreground and background topics, we design a mechanism to filter out background clusters by comparing clusters between the foreground set and the background set. If a foreground cluster corresponds to the same topic/event as a background cluster, it is filtered out. In particular, we compute the cosine similarity of the two cluster centers. If the cosine similarity is bigger than a threshold s, we consider the two clusters to be from the same topic. In this experiment, we empirically set s = 0.8 to achieve the best performance. After background topic filtering, the remaining foreground clusters are ranked by their sizes in descending order. Then for each cluster we find the five tweets which are closest to the cluster center. The evaluation method for k-means is the same as that of FB-LDA. For the second baseline, LDA, the background topic filtering step is similar to that of k-means, but instead of comparing cluster centers, we compare the word distributions of topics. For the threshold setting, we empirically set s = 0.9 to achieve the best performance. The topic ranking method and the evaluation step of LDA are the same as those of FB-LDA.
  
  We observed that the most relevant tweets of each topic/cluster are similar to each other and clearly represent the semantics of the topic/cluster. Moreover, each of the most relevant tweets generally corresponds to a specific event. Therefore, if a representative tweet of a foreground topic/cluster appears in the ground truth set, we can reasonably conclude that the foreground topic/cluster corresponds to one ground truth event.
  
  Experimental results. Fig. 4 shows the precision-recall curves (averaged over all 50 variations) for FB-LDA, LDA and k-means. In this experiment, we configure FB-LDA to produce 20 foreground topics and 20 background topics. For LDA, we configure it to produce 20 topics on the foreground set and another 20 topics on the background set. For k-means clustering, we run it on the two sets respectively, each generating 20 clusters. FB-LDA greatly outperforms the two baselines in terms of precision and recall. LDA and k-means cannot work well because a fixed threshold for filtering out background topics is not appropriate for all cases. In comparison, FB-LDA works properly without depending on any manually set threshold.
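
The matching rule of Equation (13) and the precision/recall computation over a ranked list of topics can be sketched as below. This is a hedged illustration only: it assumes tweets are tokenized by lowercasing and splitting on whitespace, that |t| counts distinct words, that precision at rank n is the fraction of the top n topics matching at least one ground truth event, and that recall is the fraction of ground truth events matched. None of these details is fixed by the text, and the function names are hypothetical.

```python
from typing import List, Tuple


def similarity(t_i: str, t_j: str) -> float:
    """Equation (13): 2 * |WordOverlap| / (|t_i| + |t_j|).

    Assumption: words are lowercased whitespace tokens and each tweet is
    treated as a set of distinct words.
    """
    a, b = set(t_i.lower().split()), set(t_j.lower().split())
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def precision_recall_at(ranked_topics: List[List[str]],
                        events: List[List[str]],
                        n: int,
                        threshold: float = 0.8) -> Tuple[float, float]:
    """Precision and recall of the top-n topics.

    ranked_topics: most relevant tweets of each topic, topics already ranked.
    events: 1-3 representative tweets for each ground truth event.
    """
    found_events = set()
    correct_topics = 0
    for tweets in ranked_topics[:n]:
        hits = {e for e, reps in enumerate(events)
                if any(similarity(t, r) >= threshold
                       for t in tweets for r in reps)}
        if hits:
            correct_topics += 1
            found_events |= hits
    return correct_topics / n, len(found_events) / len(events)


topics = [["apple store shooting today"], ["iphone price cut rumor"]]
truth = [["shooting at apple store today"],
         ["iphone 3gs overheating risk reported"]]
print(precision_recall_at(topics, truth, n=2))  # (0.5, 0.5)
```

Sweeping n from 1 to the number of topics yields one precision-recall point per rank, which is one plausible way to obtain a curve like the one in Fig. 4.
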
Reason Ranking of RCB-LDA

In this section, we evaluate our second model RCB-LDA in ranking reason candidates.
1. Association Accuracy
  
  We randomly choose five sentiment variations from all 50 cases in Table 2, two for Obama and three for Apple. For each selected case, several reason candidates are generated (see Section 2.3). Then RCB-LDA ranks these candidates by assigning each tweet in the foreground tweets set to one of them or to the background. Candidates associated with more tweets are more likely to be the main reasons. Before showing the reason ranking results, we first measure RCB-LDA's association accuracy and compare it with two baseline methods.
  
  We manually label a subset of tweets in the foreground set as the ground truth. Each label contains two elements: one tweet and one candidate (or the background). For each case, 1,000 tweets are manually labeled. Then we extend the labeled set by comparing the contents of labeled tweets with those of unlabeled tweets. If an unlabeled tweet has the same content as a labeled tweet, it inherits the label from the labeled one.
  
  Our model is compared with two baselines: (1) TFIDF: In this method, each tweet or candidate is represented as a vector with each component weighted by term frequency/inverse document frequency (TF-IDF). The association is judged by the cosine similarity between the two vectors representing tweets and candidates. (2) LDA: For this method, we use the standard LDA model to learn topics from foreground tweets and candidates together (i.e., treating the foreground tweets and reason candidates as an entire text set). Then we compute topic distribution distances between tweets and candidates using cosine similarity. We also tried the standard Jensen-Shannon (JS) divergence to measure similarity, but the baseline methods perform even worse under this measure.
  
  For all three models, we evaluate them by measuring their mapping accuracy in assigning tweets to candidates/background. The mapping accuracy is defined as the number of correctly mapped tweets over the total number of tweets in the testing set. The mapping is controlled by a threshold (indicating mapping to any candidate or to the background): (1) For TFIDF and LDA, a tweet is mapped to a candidate if there is at least one candidate whose relative similarity with the tweet is bigger than the threshold, and the candidate with the maximum similarity is selected as the mapping destination; otherwise the tweet is mapped to the background. The relative similarity between a tweet and a candidate is defined as the ratio of the similarity between them over the sum of similarities between the tweet and all candidates. (2) For RCB-LDA, a tweet t is mapped to the candidate c with the largest value of the model parameter indexed by (t,c) if the parameter indexed by (t,0) exceeds the threshold, and mapped to the background otherwise. The meanings of the parameters indexed by (t,0) and (t,c) can be found in Section 2.
  
  TABLE 3
  
  Ranking Results of Reason Candidates by RCB-LDA
  
  This is an example of a negative sentiment variation towards Apple from July 1st to July 3rd.
  
  Fig. 5 shows the comparison of all three models' average mapping accuracies as the threshold varies. Our RCB-LDA model achieves the best accuracy over a wide range of the parameter. Moreover, compared with the two baseline methods, our method is not very sensitive to the varying threshold. LDA cannot work well for two reasons: (1) the topics learnt by LDA cannot accurately reflect the real foreground events; (2) LDA does not optimize the association goal directly.
2. Reason Ranking Example

Finally, we show an example of the reason ranking results. The example is a negative sentiment variation towards Apple from July 1st to July 3rd. Table 3 shows the results. Here we set the threshold to 0.6 and rank all the reason candidates by the number of tweets associated with them.
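
The threshold-controlled mapping used for the TFIDF and LDA baselines, the mapping accuracy, and the final ranking by associated tweet counts can be sketched as follows. This is a minimal sketch under stated assumptions: the per-candidate similarities are taken as given non-negative numbers, the background is encoded as `-1`, and all function names are hypothetical. The RCB-LDA mapping itself depends on model parameters that are not reproduced here.

```python
from collections import Counter
from typing import List, Sequence, Tuple

BACKGROUND = -1  # hypothetical encoding of "mapped to the background"


def map_tweet(similarities: Sequence[float], threshold: float) -> int:
    """Baseline rule: pick the most similar candidate if its relative
    similarity (its share of the summed similarities) exceeds the threshold,
    otherwise map the tweet to the background."""
    total = sum(similarities)
    if total <= 0:
        return BACKGROUND
    best = max(range(len(similarities)), key=lambda c: similarities[c])
    return best if similarities[best] / total > threshold else BACKGROUND


def mapping_accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Correctly mapped tweets over all tweets in the testing set."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def rank_candidates(assignments: Sequence[int]) -> List[Tuple[int, int]]:
    """Rank candidates by the number of tweets associated with them."""
    counts = Counter(a for a in assignments if a != BACKGROUND)
    return counts.most_common()


sims = [[0.9, 0.05, 0.05], [0.3, 0.3, 0.4], [0.0, 0.0, 0.0], [0.1, 0.8, 0.1]]
pred = [map_tweet(s, threshold=0.6) for s in sims]
print(pred)                                   # [0, -1, -1, 1]
print(mapping_accuracy(pred, [0, 2, -1, 1]))  # 0.75
```

Checking only the most similar candidate is equivalent to the "at least one candidate above the threshold" condition, because the candidate with the largest similarity also has the largest relative similarity.
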

As can be seen, the mined reasons are meaningful and reasonable. They clearly explain why people hold negative opinions during this period. The shooting event at an Apple store is likely the main reason. Besides, the SMS vulnerability on the iPhone and the iPhone 3GS overheating risk are also important reason candidates. Furthermore, the possible reasons are represented as natural language sentences, which greatly eases the understanding of these reasons. Reasons shown in Table 3 might be "shallow" to informed observers; however, for people with little background knowledge, these results are very useful for showing them a big picture of the events behind the sentiment variation. Compared to professional analysis provided by news or TV editors, our models can give quantitative results and require much less manual effort. Moreover, professional analysts can hardly be familiar with all the events regarding various targets (well known or not well known). Our methods can work on any target, in any time period. The mined results from our models can be used as references by those professional analysts.

Fig. 5. Average association accuracy comparison for TFIDF, LDA and RCB-LDA by varying the threshold.

4.4 Scientific Articles

As explained in Section 2.3, the proposed models are general in the sense that they can be used to mine special topics existing in one text collection but not in another. The task in this test is to find new topics of the information retrieval (IR) domain in the most recent three years, using FB-LDA. We expect to find research topics that emerged recently in the IR domain, such as Twitter Data Analysis and Hashing Techniques. Long-standing IR topics, such as Retrieval Models and Web Page Analysis, are not desired. Specifically, we mine special topics in papers published in the proceedings of ACM SIGIR 2010, 2011 and 2012. The background set contains papers published in the SIGIR conference during 2000-2009. These papers are collected from the ACM Digital Library (footnote 5). The foreground set contains 294 papers and the background set contains 630 papers. We use only the title and the abstract of each paper in this experiment.

In this test, we set FB-LDA to output 20 foreground topics along with 20 background topics, and set LDA to produce 20 topics. The average word entropy for special topics learnt from FB-LDA is 3.856 and that for topics from LDA is 4.275. It shows that FB-LDA outperforms LDA in finding meaningful topics.
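
The average word entropy used for this comparison can be computed from the topic-word distributions alone. The sketch below assumes each topic is given as a list of word probabilities that sum to 1 and uses the natural logarithm; the base of the logarithm is an assumption, since the text does not state it. The function names are hypothetical.

```python
import math
from typing import List


def word_entropy(phi_k: List[float]) -> float:
    """Word entropy of one topic: -sum_i phi_k[i] * log(phi_k[i]).

    Zero-probability words contribute nothing (0 * log 0 is taken as 0).
    Assumption: natural logarithm.
    """
    return -sum(p * math.log(p) for p in phi_k if p > 0)


def average_word_entropy(topics: List[List[float]]) -> float:
    """Mean word entropy over all topics produced by a model."""
    return sum(word_entropy(phi_k) for phi_k in topics) / len(topics)


focused = [0.97, 0.01, 0.01, 0.01]   # narrow focus on one word
diffuse = [0.25, 0.25, 0.25, 0.25]   # spread evenly over the vocabulary
print(word_entropy(focused) < word_entropy(diffuse))  # True
```

A uniform distribution over V words reaches the maximum value log V, so a lower average indicates topics that concentrate their probability mass on fewer words.
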

Tables 4 and 5 show the top 10 topics (foreground topics for FB-LDA) with the lowest word entropy learnt from FB-LDA and LDA, respectively. The second column of each table presents research topics identified by manual inspection, and the third column shows the top words for each topic. As illustrated, most foreground topics found by FB-LDA are new and emerging in recent years, whereas many of the topics learnt from LDA are general IR topics which have been studied for a long time. FB-LDA is clearly better at finding special/emerging topics.

4.5 Product Reviews

We then verify the effectiveness of RCB-LDA on product review data. The task in this experiment is to find aspects or features in which customers think Kindle 4 outperforms Kindle 3. Aspects frequently discussed in both Kindle 3 and Kindle 4 reviews, and aspects hardly mentioned in Kindle 4 reviews, should be ranked lower. In contrast, aspects frequently discussed in Kindle 4 reviews but not in Kindle 3 reviews should be ranked higher.

In this test, we consider positive reviews for Kindle 4 as foreground documents and treat positive reviews for Kindle 3 as background documents. In order to test RCB-LDA, we need some reason/aspect candidates. These aspect candidates are automatically generated from the product descriptions, which are publicly available online. Specifically, we crawled 1,000 positive (5-star) reviews for Kindle 3 and Kindle 4 respectively, from Amazon.com. From the product descriptions we generate 9 candidates to describe different aspects of Kindle (e.g., Weight, Size, etc.). Each candidate contains around 5 sentences which are all about the same aspect.

TABLE 4

Foreground Topics Learnt from FB-LDA in the Task of Finding New IR Topics

Footnote 5: http://dl.acm.org

Fig. 6 shows the comparison of RCB-LDA results under two settings: (1) RCB-LDA with both foreground and background data; (2) RCB-LDA using only foreground data. In both tables, the first column shows the count of reviews assigned to each candidate by our model. Since reviews are usually much longer than tweets, each review can cover multiple aspects. Therefore we allow one review to be assigned to multiple candidates as long as the model parameter indexed by (t,c) for that review and candidate is bigger than a threshold (0.2 in this experiment). Note that in the experiments conducted on the Twitter data, one tweet is assigned to only one reason candidate.
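
The multi-candidate assignment described above differs from the single-assignment rule used for tweets and can be sketched as follows. The sketch assumes the model output is available as one dictionary per review mapping each candidate name to its association value; this layout, the example values, and the function name `count_assignments` are hypothetical and are not taken from the experiments.

```python
from typing import Dict, List, Tuple


def count_assignments(assoc: List[Dict[str, float]],
                      threshold: float = 0.2) -> List[Tuple[str, int]]:
    """Count, for each candidate, the reviews whose association value
    exceeds the threshold, then rank candidates by that count.

    A review may be counted for several candidates. Candidates that no
    review is assigned to are omitted. Ties are broken alphabetically.
    """
    counts: Dict[str, int] = {}
    for row in assoc:
        for candidate, value in row.items():
            if value > threshold:
                counts[candidate] = counts.get(candidate, 0) + 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up association values for three reviews and three aspect candidates.
reviews = [
    {"Weight": 0.5, "Size": 0.3, "Price": 0.1},
    {"Weight": 0.25, "Size": 0.1, "Price": 0.15},
    {"Weight": 0.1, "Size": 0.45, "Price": 0.4},
]
print(count_assignments(reviews))  # [('Size', 2), ('Weight', 2), ('Price', 1)]
```
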

As illustrated in Fig. 6, without considering the background, the model ranks some aspects shared by both Kindle 3 and Kindle 4 (e.g., Special Offer, Price) at the top positions (the left table in Fig. 6). This is undesirable since we are interested in detecting the aspects in which customers think Kindle 4 outperforms Kindle 3. With the help of the background data, the results of the RCB-LDA model satisfy this requirement. As shown by the right table, when taking the background into account, the RCB-LDA model boosts the rankings of aspects which clearly show the advantages of Kindle 4 over Kindle 3 (e.g., Weight, Size, Sensibility), and at the same time lowers the rankings of shared aspects (e.g., Special Offer, Price).

The above two experiments demonstrate that our models are general and not limited to the possible reason mining problem. They can be applied to various tasks involving finding the topic differences between two sets of documents.

5 RELATED WORK

To the best of our knowledge, this is the first work to analyze and interpret public sentiment variations in microblogging services. Although there is almost no previous work on exactly the same problem, here we provide a brief review of related work from several broader perspectives.

Sentiment Analysis. In recent years, sentiment analysis, also known as opinion mining, has been widely applied to various document types, such as movie or product reviews [11], [36], webpages and blogs [35]. Pang et al. conducted a detailed survey of the existing methods on sentiment analysis [20]. As one main application of sentiment analysis, sentiment classification [13], [32] aims at classifying a given text into one or more pre-defined sentiment categories.

Online public sentiment analysis is an increasingly popular topic in social network related research. Some research work has focused on assessing the relations between online public sentiment and real-life events (e.g., consumer confidence, stock market [3], [19], [28]). They reported that events in real life indeed have a significant and immediate effect on the public sentiment on Twitter. Based on such correlations, some other work [18], [31] made use of the sentiment signals in blogs and tweets to predict movie sales and elections. Their results showed that online public sentiment is indeed a good indicator for movie sales and elections. Different from the existing work in this line, we propose to analyze possible reasons behind the public sentiment variations.

TABLE 5

Topics Learnt from LDA in the Task of Finding New IR Topics

Fig. 6. Experiment results on Kindle reviews. The left table shows the result of running RCB-LDA using only Kindle 4 reviews (foreground data) and the right table shows the result of running RCB-LDA using both Kindle 4 reviews (foreground data) and Kindle 3 reviews (background data).

Event Detection and Tracking. In this paper, we are interested in analyzing the latent reasons behind the public sentiment variations regarding a certain target. Considering the target as a given query, we can obtain all tweets about this target. Then there are two approaches to accomplish the task:

(1) detecting and tracking events, and performing sentiment analysis for the tweets about each event; (2) tracking sentiment variations about the target and finding reasons for the sentiment changes.

The first approach is intuitive but problematic because detecting and tracking events is not easy, especially for fine-grained ones, such as one paragraph in a speech or one feature of a product. These events should be good reason candidates to explain the sentiment variation. However, we cannot detect and track them by existing event tracking methods [1], [10], [15], [21], [22], [33], which are only applicable to popular events, such as the biggest events per day in the whole Twitter message stream. Leskovec et al. proposed to track memes, such as quoted phrases and sentences, in the news cycle [14]. Their method is based on the quotation relations of memes. However, such quotation relations do not frequently appear on Twitter because of the length limit of tweets. Retweets on Twitter are different from quotations since retweets can only be used to track reposted messages with exactly the same content, but not to track relayed messages using different descriptions.

We choose the second approach, which first tracks sentiment variations and then detects latent reasons behind them. Through analyzing the output of our proposed models, we find that the second approach is able to find fine-grained reasons which correspond to specific aspects of events. These fine-grained events can hardly be detected by existing event detection and tracking methods.

Data Visualization. The reason mining task aims to reveal specific information hidden in the text data, so it is also related to data visualization techniques. Recently, many works have focused on data visualization using subspace learning algorithms [23]-[25]. Unfortunately, these works are not appropriate for text data, especially for noisy text data.

Another popular data visualization technique is ranking [26], [30]. Ranking is a core technique in the information retrieval domain which can help find the most relevant information for given queries. However, the reason mining task cannot be solved by ranking methods because there are no explicit queries in this task.

Correlation between Tweets and Events. To better understand events, some work [5], [15], [16], [21], [33] has tried to characterize events using tweets. However, few of them proposed to study the correlation between tweets and events. Recently, Hu et al. proposed novel models to map tweets to each segment of a public speech [12]. Our RCB-LDA model also focuses on finding the correlation between tweets and events (i.e., reason candidates). However, it differs from previous work in the following two aspects: (1) In our model, we utilize a background tweets set as a reference to remove noise and background topics. In this way, the interference of noise and background topics can be eliminated. (2) Our model is much more general. It can not only analyze the content of a single speech, but also handle more complex cases where multiple events mix together.

6 CONCLUSION

In this paper, we investigated the problem of analyzing public sentiment variations and finding the possible reasons causing these variations. To solve the problem, we proposed two Latent Dirichlet Allocation (LDA) based models, Foreground and Background LDA (FB-LDA) and Reason Candidate and Background LDA (RCB-LDA). The FB-LDA model can filter out background topics and then extract foreground topics to reveal possible reasons. To give a more intuitive representation, the RCB-LDA model can rank a set of reason candidates expressed in natural language to provide sentence-level reasons. Our proposed models were evaluated on real Twitter data. Experimental results showed that our models can mine possible reasons behind sentiment variations. Moreover, the proposed models are general: they can be used to discover special topics or aspects in one text collection in comparison with another background text collection.

ACKNOWLEDGMENTS

This work was supported in part by the Institute for Collaborative Biotechnologies through Grant W911NF-09-0001 from the U.S. Army Research Office, and in part by the National Basic Research Program of China (973 Program) under Grant 2012CB316400, NSF 0905084 and NSF 0917228, and the National Natural Science Foundation of China (Grant No: 61125203, 61173186, 61373118). The content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred. Z. Guan is the corresponding author.

REFERENCES

[1] H. Becker, M. Naaman, and L. Gravano, "Learning similarity metrics for event identification in social media," in Proc. 3rd ACM WSDM, Macau, China, 2010.
[2] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," J. Mach. Learn. Res., vol. 3, pp. 993-1022, Jan. 2003.
[3] J. Bollen, H. Mao, and A. Pepe, "Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena," in Proc. 5th Int. AAAI Conf. Weblogs Social Media, Barcelona, Spain, 2011.
[4] J. Bollen, H. Mao, and X. Zeng, "Twitter mood predicts the stock market," J. Comput. Sci., vol. 2, no. 1, pp. 1-8, Mar. 2011.
[5] D. Chakrabarti and K. Punera, "Event summarization using tweets," in Proc. 5th Int. AAAI Conf. Weblogs Social Media, Barcelona, Spain, 2011.
[6] A. Go, R. Bhayani, and L. Huang, "Twitter sentiment classification using distant supervision," CS224N Project Rep., Stanford, pp. 1-12, 2009.
[7] T. L. Griffiths and M. Steyvers, "Finding scientific topics," Proc. Nat. Acad. Sci. USA, vol. 101, Suppl. 1, pp. 5228-5235, Apr. 2004.
[8] D. Hall, D. Jurafsky, and C. D. Manning, "Studying the history of ideas using topic models," in Proc. Conf. EMNLP, Stroudsburg, PA, USA, 2008, pp. 363-371.
[9] G. Heinrich, "Parameter estimation for text analysis," Fraunhofer IGD, Darmstadt, Germany, Univ. Leipzig, Leipzig, Germany, Tech. Rep., 2009.
[10] Z. Hong, X. Mei, and D. Tao, "Dual-force metric learning for robust distracter-resistant tracker," in Proc. ECCV, Florence, Italy, 2012.
[11] M. Hu and B. Liu, "Mining and summarizing customer reviews," in Proc. 10th ACM SIGKDD, Washington, DC, USA, 2004.
[12] Y. Hu, A. John, F. Wang, and D. D. Seligmann, "ET-LDA: Joint topic modeling for aligning events and their Twitter feedback," in Proc. 26th AAAI Conf. Artif. Intell., Vancouver, BC, Canada, 2012.
[13] L. Jiang, M. Yu, M. Zhou, X. Liu, and T. Zhao, "Target-dependent Twitter sentiment classification," in Proc. 49th HLT, Portland, OR, USA, 2011.
[14] J. Leskovec, L. Backstrom, and J. Kleinberg, "Meme-tracking and the dynamics of the news cycle," in Proc. 15th ACM SIGKDD, Paris, France, 2009.
[15] C. X. Lin, B. Zhao, Q. Mei, and J. Han, "PET: A statistical model for popular events tracking in social communities," in Proc. 16th ACM SIGKDD, Washington, DC, USA, 2010.
[16] F. Liu, Y. Liu, and F. Weng, "Why is "SXSW" trending? Exploring multiple text sources for Twitter topic summarization," in Proc. Workshop LSM, Portland, OR, USA, 2011.
[17] T. Minka and J. Lafferty, "Expectation-propagation for the generative aspect model," in Proc. 18th Conf. UAI, San Francisco, CA, USA, 2002.
[18] G. Mishne and N. Glance, "Predicting movie sales from blogger sentiment," in Proc. AAAI-CAAW, Stanford, CA, USA, 2006.
[19] B. O'Connor, R. Balasubramanyan, B. R. Routledge, and N. A. Smith, "From tweets to polls: Linking text sentiment to public opinion time series," in Proc. 4th Int. AAAI Conf. Weblogs Social Media, Washington, DC, USA, 2010.
[20] B. Pang and L. Lee, "Opinion mining and sentiment analysis," Found. Trends Inform. Retrieval, vol. 2, no. 1-2, pp. 1-135, 2008.
[21] T. Sakaki, M. Okazaki, and Y. Matsuo, "Earthquake shakes Twitter users: Real-time event detection by social sensors," in Proc. 19th Int. Conf. WWW, Raleigh, NC, USA, 2010.
[22] D. Shahaf and C. Guestrin, "Connecting the dots between news articles," in Proc. 16th ACM SIGKDD, Washington, DC, USA, 2010.
[23] D. Tao, X. Li, X. Wu, W. Hu, and S. J. Maybank, "Supervised tensor learning," Knowl. Inform. Syst., vol. 13, no. 1, pp. 1-42, 2007.
[24] D. Tao, X. Li, X. Wu, and S. J. Maybank, "General tensor discriminant analysis and Gabor features for gait recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 10, pp. 1700-1715, Oct. 2007.
[25] D. Tao, X. Li, X. Wu, and S. J. Maybank, "Geometric mean for subspace selection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 260-274, Feb. 2009.
[26] D. Tao, X. Tang, X. Li, and X. Wu, "Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 7, pp. 1088-1099, Jul. 2006.
[27] Y. Tausczik and J. Pennebaker, "The psychological meaning of words: LIWC and computerized text analysis methods," J. Lang. Soc. Psychol., vol. 29, no. 1, pp. 24-54, 2010.
[28] M. Thelwall, K. Buckley, and G. Paltoglou, "Sentiment in Twitter events," J. Amer. Soc. Inform. Sci. Technol., vol. 62, no. 2, pp. 406-418, 2011.
[29] M. Thelwall, K. Buckley, G. Paltoglou, D. Cai, and A. Kappas, "Sentiment strength detection in short informal text," J. Amer. Soc. Inform. Sci. Technol., vol. 61, no. 12, pp. 2544-2558, 2010.
[30] X. Tian, D. Tao, and Y. Rui, "Sparse transfer learning for interactive video search reranking," ACM Trans. Multimedia Comput. Commun. Appl., vol. 8, no. 3, article 26, Jul. 2012.
[31] A. Tumasjan, T. O. Sprenger, P. G. Sandner, and I. M. Welpe, "Predicting elections with Twitter: What 140 characters reveal about political sentiment," in Proc. 4th Int. AAAI Conf. Weblogs Social Media, Washington, DC, USA, 2010.
[32] X. Wang, F. Wei, X. Liu, M. Zhou, and M. Zhang, "Topic sentiment analysis in Twitter: A graph-based hashtag sentiment classification approach," in Proc. 20th ACM CIKM, Glasgow, Scotland, 2011.
[33] J. Weng and B.-S. Lee, "Event detection in Twitter," in Proc. 5th Int. AAAI Conf. Weblogs Social Media, Barcelona, Spain, 2011.
[34] J. Yang and J. Leskovec, "Patterns of temporal variation in online media," in Proc. 4th ACM Int. Conf. Web Search Data Mining, Hong Kong, China, 2011.
[35] W. Zhang, C. Yu, and W. Meng, "Opinion retrieval from blogs," in Proc. 16th ACM CIKM, Lisbon, Portugal, 2007.
[36] L. Zhuang, F. Jing, X. Zhu, and L. Zhang, "Movie review mining and summarization," in Proc. 15th ACM Int. Conf. Inform. Knowl. Manage., Arlington, TX, USA, 2006.

Shulong Tan received the BS degree in software engineering from Zhejiang University, China, in 2008. He is currently a PhD candidate in the College of Computer Science, Zhejiang University, under the supervision of Prof. Chun Chen and Prof. Jiajun Bu. His current research interests include social network mining, recommender systems and text mining.

Yang Li received the BS degree in computer science from Zhejiang University, China, in 2010. He is currently a PhD candidate at the Computer Science Department, University of California at Santa Barbara, under the supervision of Prof. Xifeng Yan. His current research interests include text mining, natural language understanding and data management.

Huan Sun received the BS degree in electronic engineering and information science from the University of Science and Technology of China, in 2010. She is currently working toward the PhD degree in computer science at the University of California, Santa Barbara. Her current research interests include statistical machine learning, deep learning, and data mining.

Ziyu Guan received the BS and the PhD degrees in computer science from Zhejiang University, China, in 2004 and 2010, respectively. He worked as a research scientist with the University of California at Santa Barbara from 2010 to 2012. He is currently a full professor at the College of Information and Technology of China's Northwest University. His current research interests include attributed graph mining and search, machine learning, expertise modeling and retrieval, and recommender systems.


Xifeng Yan is an associate professor with the University of California at Santa Barbara. He holds the Venkatesh Narayanamurti Chair in Computer Science. He received the PhD degree in computer science from the University of Illinois at Urbana-Champaign in 2006. He was a research staff member at the IBM T. J. Watson Research Center between 2006 and 2008. His current research interests include modeling, managing, and mining graphs in bioinformatics, social networks, information networks, and computer systems. His work is extensively referenced, with over 7,000 Google Scholar citations. He received the US NSF CAREER Award, the IBM Invention Achievement Award, the ACM-SIGMOD Dissertation Runner-Up Award, and the IEEE ICDM 10-year Highest Impact Paper Award. He is a member of the IEEE.

Jiajun Bu received the BS and the PhD degrees in computer science from Zhejiang University, China, in 1995 and 2000, respectively. He is a professor in the College of Computer Science, Zhejiang University. His current research interests include embedded systems, data mining, information retrieval and mobile databases. He is a member of the IEEE.

Chun Chen received the BS degree in mathematics from Xiamen University, China, in 1981, and the MS and PhD degrees in computer science from Zhejiang University, China, in 1984 and 1990 respectively. He is a professor in the College of Computer Science, Zhejiang University. His current research interests include information retrieval, data mining, computer vision, computer graphics and embedded technology. He is a member of the IEEE.

Xiaofei He received the BS degree in computer science from Zhejiang University, China, in 2000, and the PhD degree in computer science from the University of Chicago, Chicago, Illinois, in 2005. He is a professor at the State Key Lab of CAD&CG at Zhejiang University, China. Prior to joining Zhejiang University, he was a research scientist at Yahoo! Research Labs, Burbank, California. His current research interests include machine learning, information retrieval, and computer vision. He is a member of the IEEE.

