Sentiment Analysis Techniques and Approaches

DOI : 10.17577/IJERTV9IS060350


Saismita Panda, Saumya Gupta, Swati Kumari, Parul Yadav

Department of Information Technology,

Bharati Vidyapeeth's College of Engineering, New Delhi, India

Abstract: Sentiment analysis, or opinion mining, is the extraction and detailed examination of opinions and attitudes from any form of text. It is a very useful method, widely used to gauge the opinion of a large group or of the public at large. The sentiment can be based on the attitude of the author or on his/her affective state at the moment of writing the text. Social media and other online platforms contain a huge amount of unstructured data in the form of tweets, blogs, posts, etc. This paper analyzes solutions for sentiment classification at a fine-grained level, namely the sentence level, in which the polarity of a sentence is given by one of three categories: positive, negative, or neutral. We analyze the popular techniques adopted for the classical sentiment analysis problem of classifying movie reviews, namely Naïve Bayes, K-Nearest Neighbour, Random Forest, Maximum Entropy, SVM, and Voted Perceptrons, as discussed in various papers, together with their advantages and disadvantages and how often they have given researchers satisfying results.

Keywords: Sentiment analysis; Naïve Bayes; K-Nearest Neighbour; Random Forest; Maximum Entropy; SVM; Voted Perceptron

  1. INTRODUCTION

    Sentiment analysis is one of the most common text classification tools: it analyses an incoming message and tells whether the underlying sentiment is positive, negative, or neutral [1]. Before discussing the popular techniques used in sentiment analysis, it is important to understand what a sentiment is.

    Liu et al. (2009) define a sentiment or opinion as a quintuple

    $\langle o_j, f_{jk}, so_{ijkl}, h_i, t_l \rangle$,

    where $o_j$ is a target object, $f_{jk}$ is a feature of the object $o_j$, $h_i$ is an opinion holder, $t_l$ is the time when the opinion is expressed, and $so_{ijkl}$ is the sentiment value of the opinion of holder $h_i$ on feature $f_{jk}$ of object $o_j$ at time $t_l$; $so_{ijkl}$ is positive, negative, or neutral, or a more granular rating [8].
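To make the quintuple concrete, the following minimal Python sketch stores one opinion as an immutable record. The class name `Opinion`, its field names, and the example values are illustrative assumptions, not part of the original definition.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Opinion:
    """One opinion quintuple <o_j, f_jk, so_ijkl, h_i, t_l> (hypothetical field names)."""
    target: str                  # o_j: the target object, e.g. a movie
    feature: str                 # f_jk: a feature (aspect) of o_j
    sentiment: Union[str, int]   # so_ijkl: "+ve", "-ve", "neutral", or a finer rating
    holder: str                  # h_i: who holds the opinion
    time: str                    # t_l: when the opinion was expressed


# Invented example: a reviewer praising the plot of a movie.
op = Opinion(target="Inception", feature="plot", sentiment="+ve",
             holder="reviewer_42", time="2020-06-01")
print(op.feature, op.sentiment)
```

Making the record frozen reflects that an expressed opinion is a fixed observation; a later change of mind would be a new quintuple with a different $t_l$.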

    Pang et al. (2002) [2] broadly classified the applications of sentiment analysis into four categories:

    1. Applications to review-related websites, for example, movie reviews, product reviews, etc.

    2. Applications as a sensitive-information detector: heated-language detection in emails, spam detection, etc.

    3. Applications in business and government intelligence, used for gauging consumer attitudes and trends.

    4. Applications across various other domains, for gauging public opinion of political leaders or of the rules and regulations in place.

    Sentiment analysis is a series of methods, techniques, and tools for detecting and extracting subjective information, such as opinions and attitudes, from language [3]. Movie reviews are an important way to measure the performance of any movie. While a star rating tells us the success or failure of a movie quantitatively, movie reviews provide deeper qualitative measures and insights into different aspects of the movie. Identifying domain-dependent opinion words is a key problem in sentiment analysis and has been studied by various researchers. However, existing work has focused mostly on adjectives and, to some extent, verbs; limited work has been done on nouns and noun phrases [7]. Analyzing movies on the basis of their textual reviews gives us a better understanding of the expectations of reviewers.

  2. CHALLENGES

    Sentiment analysis is generally used to classify the polarity of given text data at the document, sentence, or phrase level. Sentiment analysis of conventional text, such as review documents, is considered much easier than that of tweets, because of the short length of tweets, the frequent use of informal and irregular words, and the rapid evolution of language on Twitter [23]. It is used to assign the given text to the categories positive, negative, or neutral, and sometimes also to emotional states such as "worry", "excited", and "happy". Cultural factors, sentence negation, sarcasm, terseness, language ambiguity, and differing contexts make it extremely difficult to turn a string of written text into a simple good or bad sentiment. Determining the subjectivity of content is also very difficult, because feelings, opinions, sentiments, and emotions are distinct notions in the context of natural language processing; Munezero et al. [14] describe these differences.

    Sentiment analysis is a natural language processing technique used to quantify the opinion or sentiment expressed within a selection of tweets [19].

    Sentiment analysis is also used to classify a given text as subjective or objective. Because the subjectivity of words and phrases may depend on their context, and an objective document may contain subjective sentences, this problem is more difficult than polarity classification. Although a list of the challenges faced in building a model for correct classification could be endless, the authors of Sentiment Analysis: A Literature Survey [1] group them into nine categories, eight of which are explained below:

    1. Implicit Sentiment and Sarcasm: A sentence may not contain any sentiment-bearing word and yet carry an implicit sentiment. For example, "The headset broke in two days" contains no explicitly negative words, although the sentence is negative. Thus, identifying semantics is more important in sentiment analysis than detecting syntax.

    2. Domain Dependency: The polarity (positive, negative, or neutral) of many words changes from domain to domain. For example, "You better read a book" conveys a positive sentiment in the book domain but a negative sentiment in the movie domain.

    3. Thwarted Expectations: Sometimes the author deliberately sets up a contrast. For example: "This film should be amazing. It seems to have an amazing plot, well-liked actors, and a supporting cast doing a great job as well. However, it could not hold up." Despite the many words with positive orientation, the overall sentiment of this passage is negative because of the crucial last sentence, which clearly has negative polarity. Traditional text classification would have labeled this passage positive because of the higher frequency of positive terms, since it relies on how often terms occur rather than on what role they play in the text.

    4. Pragmatics: It is necessary to detect the pragmatics of the user's view, as it may change the sentiment completely. For example, in "I just watched The Ugliest Girl in Town," capitalization is used subtly to denote sentiment; here it denotes a positive sentiment.

    5. World Knowledge: At times, world knowledge is required for detecting sentiments. For example, "He is a Frankenstein" expresses a negative sentiment, but the meaning of the word Frankenstein must be known to detect it. Thus, such knowledge has to be supplied to the system for it to understand these phrases.

    6. Subjectivity Detection: Subjectivity detection differentiates between opinionated and non-opinionated text, i.e., between someone's opinion and a stated fact. To improve the performance of the system, subjective text is separated from objective facts. For example: 1) "I hate love stories." 2) "I do not like the TV series Haters Back Off." The first example expresses a negative opinion through the word "hate", whereas in the second example "Haters" is merely part of a title, and the negative sentiment comes from "not like" rather than from the word "hate".

    7. Entity Identification: A text or sentence may mention multiple entities, so it is important to find the entity towards which the opinion is directed. For example, "Royal Tigers defeated Chennai Super Kings in an IPL match" is positive for Royal Tigers and negative for Chennai Super Kings.

    8. Negation Handling: Negation can be expressed even without explicit negative words. A common method for handling explicit negation in sentences like "I do not like the movie" is to reverse the polarity of all words appearing after the negation operator (such as "not"). This rule does not always hold: in "Not only did I like the acting, but also the direction," the polarity is not reversed after "not", because of the presence of "only".
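The negation rule in item 8 can be sketched as a small lexicon-based scorer. This is a minimal sketch under stated assumptions: the toy lexicon and negator list are hypothetical, and ending the negation scope at punctuation or "but" is an illustrative choice, not a rule taken from the surveyed papers.

```python
import re

# Hypothetical toy lexicon and negators, for illustration only.
LEXICON = {"like": 1, "love": 1, "good": 1, "hate": -1, "boring": -1, "bad": -1}
NEGATORS = {"not", "no", "never", "don't", "didn't"}
SCOPE_END = {".", ",", ";", "!", "?", "but"}   # assumed end of a negation scope


def polarity(sentence):
    """Sum word polarities, flipping them inside a negation scope."""
    tokens = re.findall(r"[a-z']+|[.,;!?]", sentence.lower())
    score, negated = 0, False
    for i, tok in enumerate(tokens):
        if tok in SCOPE_END:
            negated = False                      # scope closes at punctuation or "but"
        elif tok in NEGATORS:
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            negated = nxt != "only"              # "not only" does not negate
        elif tok in LEXICON:
            score += -LEXICON[tok] if negated else LEXICON[tok]
    return score


print(polarity("I do not like the movie."))
print(polarity("Not only did I like the acting, but also the direction."))
```

The first sentence scores negative because "like" falls inside the scope of "not"; the second scores positive because the "not only" exception leaves "like" unflipped.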

  3. DATASET

    The movie review dataset most commonly used in the research papers was the Large Movie Review Dataset [6]. It contains 50,000 reviews, each labeled with the reviewer's rating of the movie on a scale of 1-10. With customers' reviews, one can understand changes in the market and improve a product or service, and the approach scales to any type of environment [5]. As sentiments usually involve adjectives such as good/bad, happy/sad, or like/dislike, it was preferable to map these ratings to either 1 (like) or 0 (dislike): a rating above 5 was taken to mean that the reviewer liked the movie, and any other rating that they did not.

    It is important to highlight that sentiment mining can be performed at three levels [32]:

    • Document-level sentiment classification: a document is classified as positive, negative, or neutral.

    • Sentence-level sentiment classification: a sentence is classified as positive, negative, or neutral.

    • Aspect- and feature-level sentiment classification: sentences or documents are classified as positive, negative, or neutral with respect to particular aspects; this is commonly known as aspect-level sentiment classification.

    The training and testing split was first set at fifty percent each; the dataset was then redistributed into twenty percent for testing and eighty percent for training. This led to over-fitting on the training examples and worse performance on the test set. In the end, cross-validation was used, in which the complete dataset is divided into multiple folds, with different samples used for training and validation each time, and the final performance of the classifier is averaged over all results. This improved the accuracy of the models across the board.
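The rating-to-label mapping described above (a rating above 5 means "liked") is simple enough to state as code. The function name `rating_to_label` is a hypothetical helper, and rejecting ratings outside 1-10 is an added assumption.

```python
def rating_to_label(rating):
    """Map a 1-10 review rating to 1 (liked) if it is above 5, otherwise 0 (disliked)."""
    if not 1 <= rating <= 10:
        raise ValueError(f"rating must be between 1 and 10, got {rating}")
    return 1 if rating > 5 else 0


print([rating_to_label(r) for r in (1, 4, 5, 6, 7, 10)])
```

Note that a rating of exactly 5 falls on the "dislike" side under this threshold.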

  4. PREDICTIVE TASK

    The aim of this paper is to identify the best sentiment analysis technique for the movie review problem on the basis of textual information. Thus, the task is to classify whether a person liked a movie based on the review they gave it. Sentiment analysis of movie reviews is mainly useful to the creators of a movie who want to measure its overall performance using the reviews that critics and viewers provide. The outcome of this project can also be used to build a recommender that suggests movies to viewers on the basis of their previous reviews. Hundreds of thousands of users depend on online reviews: 90% of customers' decisions depended on online reviews in April 2013 (Ling et al., 2014). The main aim of sentiment scrutiny is to analyze the reviews and examine their sentiment scores. This analysis is performed at several levels (Thomas, 2013): document level (Ainur et al., 2010), sentence level (Noura et al., 2010), word/term level (Nikos et al., 2011), and aspect level (Haochen and Fei, 2015) [34]. This project helps to find groups of viewers with similar movie choices (likes or dislikes). One part of this project aims to study and understand the significance of feature extraction techniques, such as keyword spotting, lexical affinity, and statistical methods, for our problem. Alongside feature extraction, the papers also discuss different classification techniques and the accuracy of their feature representations. Sentiment mining uses NLP and information extraction (IE) approaches to analyze a large number of documents in order to gather the sentiments of comments posted by different authors [31]. This process incorporates various strategies, including computational linguistics and information retrieval (IR) [31]. Finally, a conclusion is drawn as to the most accurate technique for the current prediction task.
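As an illustration of the simplest statistical feature representation, unigram term counts, here is a minimal sketch. The tokenizer (lower-cased alphabetic runs) and the helper names are assumptions; scikit-learn's `CountVectorizer` provides a production version of the same idea.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep runs of letters and apostrophes (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(docs):
    """Give each distinct unigram a column index; sorted for reproducibility."""
    return {tok: i for i, tok in enumerate(sorted({t for d in docs for t in tokenize(d)}))}


def unigram_vector(doc, vocab):
    """Term-frequency vector; tokens outside the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for tok, count in Counter(tokenize(doc)).items():
        if tok in vocab:
            vec[vocab[tok]] = count
    return vec


vocab = build_vocabulary(["Great plot, great cast", "Boring plot"])
print(vocab)
print(unigram_vector("great great plot twist", vocab))
```

Vectors like these are the input to the classifiers compared in the following sections.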

  5. COMPARISON OF TECHNIQUES

    Several multi-class classification algorithms were evaluated on the given training data to find the most effective algorithm for the task. When evaluating the accuracy of algorithms, the same data cannot be used for both training and testing, because a model might overfit the training data and yet fail to predict anything useful for unseen data. Recently, Zainuddin et al. [26] proposed an aspect-based sentiment analysis (ABSA) framework that contained two principal tasks. The first task used aspect-based feature extraction to identify aspects of entities, and the second task [25] To avoid this problem, it is common to divide the given data into a training set and a test set. There are various approaches to doing so. Initially, the holdout approach was used, in which a portion of the original data set is used for training and the remaining portion is used for testing. However, there is still a risk of overfitting, because the parameters can be tweaked until the model performs optimally on the test set. In this way, knowledge about the test set can leak into the model, and the evaluation metrics no longer report generalization performance. Pak and Paroubek [61] present a method for automatically collecting a corpus from microblogs and building a sentiment classifier; in this instance, the corpus is gathered from Twitter. The authors claim that the approach can be adapted to multiple languages, but in their work it is only used with English [11]. To resolve the leakage issue, another part of the dataset is held out as a validation set. The evaluation workflow is therefore: training is done on the training set, evaluation is done on the validation set, and once the experiment appears successful, the final evaluation is done on the test set.

    Cumulative analysis of people's reactions towards buying products and services is vital in any project.

    Sentiment analysis aims to discover the sentiments behind opinions in texts on varying subjects [10]. However, by partitioning the available data into three separate sets (training, validation, and test), we drastically reduce the number of samples that can be used for learning the model, and the results can depend on a particular random choice for the pair of (train, validation) sets.
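The three-way holdout split, and the cost it carries, can be sketched in a few lines. The 60/20/20 proportions and the function name are assumptions chosen for illustration; the point is how few samples remain for training once both a validation and a test set are carved out.

```python
import random


def train_val_test_split(items, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then carve out disjoint test, validation and training parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # only 60 of the 100 samples remain for training
```

Changing `seed` changes which samples land in each part, which is exactly the dependence on "a particular random choice" noted above.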

    Another bibliometric study on sentiment analysis was published by Piryani et al. [17]. It shows over tenfold growth in papers per year within a decade, analyzes citation patterns, and draws a map of citations [11].

    To solve these problems, cross-validation is employed. In k-fold cross-validation (CV), the training set is split into k smaller sets (folds), and the following procedure is followed for each of the k folds:

    1. A model is trained using k-1 of the folds as training data.

    2. The resulting model is validated on the remaining fold, which is used to compute a performance measure such as accuracy.

    The performance measure reported by k-fold cross-validation is then the average of the values computed for the individual folds.

      This approach can be computationally expensive compared with the holdout technique; however, it does not waste too much data, which is a major advantage in problems where the number of samples is very small.
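The k-fold procedure above can be sketched without any library. The names `k_fold_indices` and `k_fold_cv_score` are hypothetical, the folds are contiguous (real data should be shuffled first, as scikit-learn's `KFold(shuffle=True)` does), and the "model" in the usage example is a trivial majority-class baseline on invented labels.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds whose sizes differ by at most one."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def k_fold_cv_score(fit_and_score, n_samples, k=10):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for val_idx in k_fold_indices(n_samples, k):
        held_out = set(val_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        scores.append(fit_and_score(train_idx, val_idx))
    return sum(scores) / len(scores)


# Usage with a trivial majority-class "model" on invented labels.
labels = [1, 1, 0] * 4


def majority_baseline(train_idx, val_idx):
    majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return sum(labels[i] == majority for i in val_idx) / len(val_idx)


print(k_fold_cv_score(majority_baseline, len(labels), k=4))
```

Every sample is used for validation exactly once and for training k-1 times, which is why no data is "wasted" compared with a fixed validation set.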

      Given below are an outline and the results of each evaluated technique.

  6. MACHINE LEARNING APPROACHES

    Machine learning techniques are broadly classified into two types, depending on whether the dataset is labeled:

      • Unsupervised learning: These algorithms are used to draw inferences from unlabeled datasets. They are commonly used for finding hidden patterns and grouping data.

      • Supervised learning: These algorithms are used to draw inferences from labeled datasets.

    Naive Bayes, Support Vector Machines, and Maximum Entropy are a few widely used machine learning techniques for sentiment analysis. For the second dataset, the proposed technique achieved an F1-measure of 0.795, whilst [28] achieved an F-score of 0.76. For the third dataset, the proposed

    1. Support Vector Machine Classifier

      Support vector machines are supervised learning models whose learning algorithms analyze data for classification and regression problems. They can be used for both linear and non-linear data. If the data are linearly separable, the SVM searches for the optimal separating hyperplane, a decision boundary that separates the data of one class from those of another. If the data are not linearly separable, the SVM uses a nonlinear mapping to transform the data into a higher dimension [20], where it again searches for a hyperplane that divides the data into two classes. Po-Wei Liang et al. [30] designed a framework called Opinion Miner that automatically investigated and detected the sentiments of social media messages. The annotated tweets were combined for the analysis, and the messages that contained feelings were extracted and their polarities determined as either positive or negative. To achieve this, the experimenters [30] classified the tweets into opinion and non-opinion using a Naïve Bayes classifier with unigram features [25]. The major advantage of support vector machines is their effectiveness in high-dimensional spaces. Because the decision function uses only a subset of the training points (the support vectors), the resulting model is also memory-efficient.

      Anton and Andrey [29] developed a model to extract sentiment polarity from Twitter data using n-gram and emoticon features. Their experiment demonstrated that the SVM performed better than Naïve Bayes: the best overall method was the SVM combined with unigram feature extraction, achieving a precision of 81% and a recall of 74% [25]. It should also be noted that SVMs are not efficient on unbalanced data, as they tend to perform poorly on the minority class. The scikit-learn library provides different kernels, including linear, RBF, and polynomial functions, and its SVC implementation handles multi-class classification with a one-vs-one approach. The performance of this method was evaluated using the holdout technique, with 60% of the data for training and 40% for testing, and also with 10-fold cross-validation; the resulting classification accuracy was 0.60 with the holdout method and 0.612 with 10-fold cross-validation [9].
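The idea of mapping linearly inseparable data into a higher dimension can be shown on XOR-style toy data. This is a minimal illustration under stated assumptions: the map `phi` and the separating hyperplane are chosen by hand (an SVM would learn the hyperplane from data), and the second half checks the kernel identity that lets an SVM work in the higher-dimensional space without constructing it.

```python
import math

# XOR-style toy data: no straight line in 2-D separates the two classes.
points = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
labels = [1, 1, -1, -1]


def phi(x):
    """Illustrative nonlinear map from 2-D into 3-D: append the product x1*x2."""
    x1, x2 = x
    return (x1, x2, x1 * x2)


# In the mapped space the hyperplane w . z + b = 0 with w = (0, 0, 1), b = 0
# separates the classes (picked by hand here; an SVM would learn it).
w, b = (0.0, 0.0, 1.0), 0.0


def predict(x):
    score = sum(wi * zi for wi, zi in zip(w, phi(x))) + b
    return 1 if score > 0 else -1


print([predict(p) for p in points])


# Kernel trick: (x . z)^2 equals phi2(x) . phi2(z) for phi2(x) = (x1^2, sqrt(2) x1 x2, x2^2),
# so the kernel gives the higher-dimensional inner product without building phi2.
def poly_kernel(x, z):
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def phi2(x):
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


a, c = (1.0, 2.0), (3.0, -1.0)
explicit = sum(u * v for u, v in zip(phi2(a), phi2(c)))
print(poly_kernel(a, c), explicit)
```

A degree-2 polynomial kernel corresponds to this kind of quadratic feature space; RBF kernels correspond to an infinite-dimensional one, which is why they are only ever used through the kernel function.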

    2. Stochastic Gradient Descent

      Stochastic gradient descent (SGD) is an iterative algorithm used here to improve the overall performance of the SVM. It is a simple yet efficient approach to the discriminative learning of linear classifiers under convex loss functions. Although it has been in use for a long time, it has recently gained attention because of its application to large-scale learning.

      Its ease of implementation and the many opportunities for code tuning are major advantages of this technique.

      Its main drawback is its sensitivity to feature scaling. Evaluated by classification accuracy, the model achieved 0.615 with the holdout method and 0.62 with 10-fold cross-validation [9].
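A minimal sketch of training a linear SVM with SGD on the L2-regularized hinge loss is given below. The learning rate, regularization strength, epoch count, and the toy data are assumptions for illustration; scikit-learn's `SGDClassifier` with `loss="hinge"` fits the same kind of model and is what one would use in practice.

```python
import random


def train_linear_svm_sgd(X, y, lr=0.1, lam=0.01, epochs=20, seed=0):
    """Linear SVM via SGD on lam/2 * ||w||^2 + max(0, 1 - y * (w . x + b)).

    Labels y are in {-1, +1}; the bias b is not regularized.
    """
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)                            # new random visiting order each epoch
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1 - lr * lam) for wj in w]     # gradient step on the regularizer
            if margin < 1:                            # hinge loss active: subgradient step
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Invented, linearly separable 2-D data (e.g. two scaled review features).
X = [[2, 1], [1, 2], [3, 3], [-2, -1], [-1, -2], [-3, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm_sgd(X, y)
print([predict(w, b, x) for x in X])
```

Each update touches a single example, which is what makes SGD attractive for large datasets; because the step size multiplies raw feature values, unscaled features would dominate the updates, illustrating the sensitivity to feature scaling mentioned above.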

    3. Logistic Regression

      This statistical model uses a logistic function to model the probability of a certain class. Despite its name, it is a linear model for classification rather than regression. It is also known in the literature as logit regression, the log-linear classifier, and maximum-entropy classification (MaxEnt). The model can be extended to several classes of events, such as determining which of various objects an image contains. Its performance was calculated using the holdout approach (60% training, 40% testing) and 10-fold cross-validation, giving classification accuracies of 0.60 and 0.618, respectively [9].
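For the binary case, the logistic function mentioned above gives the class probability as written below, where the weight vector $\mathbf{w}$ and bias $b$ are the learned parameters (this notation is assumed here rather than taken from [9]); a sample is assigned to class 1 exactly when $\mathbf{w}^\top \mathbf{x} + b > 0$, which is why the decision boundary is linear.

```latex
P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b)
  = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}},
\qquad
P(y = 0 \mid \mathbf{x}) = 1 - P(y = 1 \mid \mathbf{x})
```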

    4. SGD with logistic regression

      Gradient descent is an algorithm that starts from an arbitrary (for example, random) set of parameter values and, at each iteration, improves them by moving in the direction that decreases the cost, as given by the slopes (derivatives) of the cost function. It was used to improve the performance of logistic regression. The resulting classification accuracy was 0.63 with the holdout method and 0.638 with 10-fold cross-validation [9].
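The gradient descent procedure described above can be sketched for logistic regression as follows. For determinism this uses full-batch gradient descent; SGD would instead update on one example (or a small batch) at a time. The learning rate, iteration count, and one-feature toy data are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(X, y, w, b):
    """Mean negative log-likelihood: the cost that gradient descent minimizes."""
    total = 0.0
    for x, t in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        p = min(max(p, 1e-12), 1 - 1e-12)              # guard against log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(X)


def train_logreg_gd(X, y, lr=0.5, n_iter=200):
    """Batch gradient descent on the mean log-loss; labels y are 0/1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        # Gradient of the mean log-loss: average of (p - t) * x for w, and of (p - t) for b.
        errs = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - t for x, t in zip(X, y)]
        grad_w = [sum(e * x[j] for e, x in zip(errs, X)) / n for j in range(d)]
        grad_b = sum(errs) / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]  # step down the slope
        b -= lr * grad_b
    return w, b


X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = train_logreg_gd(X, y)
print(log_loss(X, y, [0.0], 0.0), log_loss(X, y, w, b))
```

The first printed value is the cost at the all-zero starting point; the second, smaller value is the cost after training.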

    5. K Nearest Neighbors Classifier

      This non-parametric algorithm stores all available cases and classifies new cases on the basis of a similarity measure. It has also been used in statistical estimation and pattern recognition.

      Neighbors-based classification is a type of instance-based learning that simply stores instances of the training data. Classification is performed by a simple majority vote of the nearest neighbors of the test point. The papers we analyzed implemented two different neighbor classifiers: K-neighbors and radius neighbors. Of the two, the K-neighbors approach is the most commonly used, where K denotes the number of neighbors considered in the decision. Saif et al. [33] introduced the idea of combining semantic features with unigram features. The features extracted by this model were used to compute the correlation of entity groups augmented with their sentiment polarities. Incorporating semantic features into the analysis does help in detecting sentiment towards entities [25].

      Evaluated by classification accuracy, the model achieved 0.625 with the holdout method and 0.638 with 10-fold cross-validation [9].
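The K-neighbors rule can be sketched with bag-of-words vectors and cosine similarity. Cosine similarity, whitespace tokenization, and the tiny invented training set are assumptions; the surveyed papers may have used other distance measures.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(train, text, k=3):
    """train: list of (text, label) pairs. Majority vote among the k most similar texts."""
    query = Counter(text.lower().split())
    ranked = sorted(train, key=lambda ex: cosine(Counter(ex[0].lower().split()), query),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


train = [("a great fun movie", "pos"), ("great acting", "pos"), ("fun and great", "pos"),
         ("a boring movie", "neg"), ("awful boring acting", "neg"), ("awful plot", "neg")]
print(knn_predict(train, "great fun"), knn_predict(train, "boring awful plot"))
```

Nothing is learned ahead of time: all the work happens at prediction, when the query is compared with every stored review, which is why neighbor methods are called instance-based.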

    6. Random Forest

      The random forest algorithm is used for both classification and regression. Multiple decision trees are built, each from a bootstrap sample drawn from the training set; a prediction is obtained from each tree, and the final prediction is selected by voting. It is better than a single decision tree because, as an ensemble, it reduces over-fitting by averaging the results.

      When a tree is constructed, the split chosen is no longer the best among all available features; instead, it is the best among a random subset of the features. Because of this randomness, the bias of the forest increases slightly compared with that of non-random trees, but this is compensated by a decrease in variance through averaging, yielding an overall better model. Evaluated by classification accuracy, the model achieved 0.63 with the holdout method and 0.631 with 10-fold cross-validation [26].

      For larger data sets, the study cited in [24] reports that significantly lower error rates seem possible: on some runs, errors were as low as 5.1% on the zip-code data, 2.2% on the letters data, and 7.9% on the satellite data, while the improvement was small on the smaller data sets. More work is needed on this, but it suggests that different injections of randomness can produce better results [24].
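The two sources of randomness described above, bootstrap samples and a random feature subset per split, can be sketched with one-level trees (decision stumps) on binary word-presence features. Using stumps instead of full trees, the three invented features, and the tree count are simplifying assumptions made to keep the example short.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """One-level tree: among the given features only, pick the binary feature whose
    split makes the fewest training errors (ties go to the lower feature index)."""
    fallback = majority(y)
    best = None
    for f in sorted(features):
        leaf = {}
        for v in (0, 1):
            branch = [lab for row, lab in zip(X, y) if row[f] == v]
            leaf[v] = majority(branch) if branch else fallback
        errors = sum(leaf[row[f]] != lab for row, lab in zip(X, y))
        if best is None or errors < best[0]:
            best = (errors, f, leaf)
    _, f, leaf = best
    return lambda row: leaf[row[f]]


def fit_forest(X, y, n_trees=25, max_features=2, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap: sample with replacement
        subset = rng.sample(range(d), max_features)     # random subset of candidate features
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], subset))
    return trees


def predict_forest(trees, row):
    return majority([tree(row) for tree in trees])      # majority vote over the trees


# Invented binary features: contains "great", contains "awful", contains "plot".
X = [[1, 0, 1], [1, 0, 0], [1, 0, 1], [0, 1, 1], [0, 1, 0], [0, 1, 0]]
y = [1, 1, 1, 0, 0, 0]
forest = fit_forest(X, y)
print([predict_forest(forest, row) for row in X])
```

Individual stumps trained on unlucky bootstrap samples may predict a single class for everything, but the vote over many trees averages such errors away, which is the variance reduction the text describes.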

    7. Naïve Bayesian classifier

      Naïve Bayes is a family of classification algorithms based on Bayes' theorem, all sharing the assumption of conditional independence between every pair of features given the class. Evaluated by classification accuracy, the model achieved 0.64 with the holdout method and 0.646 with 10-fold cross-validation [9].

      The Naïve Bayes classifier works as follows. Suppose that there is a set of training data, D, in which each tuple is represented by an n-dimensional feature vector, $X = (x_1, x_2, \ldots, x_n)$, indicating n measurements made on the tuple from n features. Assume that there are m classes, $C_1, C_2, \ldots, C_m$. Given a tuple Y, the classifier predicts that Y belongs to $C_i$ if and only if

      $P(C_i \mid Y) > P(C_j \mid Y)$ for $1 \le j \le m,\ j \ne i$,

      where $P(C_i \mid Y)$ is computed with Bayes' theorem:

      $P(C_i \mid Y) = \dfrac{P(Y \mid C_i)\, P(C_i)}{P(Y)}$    (Equation 1.1)

      Under the naive independence assumption, $P(Y \mid C_i) = \prod_{k=1}^{n} P(y_k \mid C_i)$, where $y_k$ is the k-th feature value of Y.

      Naïve Bayes performs better than SVM when the feature space is small, but as the feature space grows, SVM performs better than Naïve Bayes.
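Equation 1.1 with the independence assumption can be sketched as a multinomial Naïve Bayes over tokens. Working in log space, add-one (Laplace) smoothing, skipping unseen tokens, and the six-document corpus are assumptions for illustration; scikit-learn's `MultinomialNB` implements the same model.

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing.
    docs: list of token lists; labels: the class of each document."""
    classes = sorted(set(labels))
    vocab = {tok for doc in docs for tok in doc}
    log_prior, log_lik = {}, {}
    for c in classes:
        class_docs = [d for d, lab in zip(docs, labels) if lab == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))            # log P(C_i)
        counts = Counter(tok for d in class_docs for tok in d)
        total = sum(counts.values())
        log_lik[c] = {tok: math.log((counts[tok] + alpha) / (total + alpha * len(vocab)))
                      for tok in vocab}                                    # log P(x_k | C_i)
    return classes, log_prior, log_lik


def predict_nb(model, doc):
    """Pick the class maximizing log P(C_i) + sum_k log P(x_k | C_i).
    P(Y) is the same for every class, so it is dropped; unseen tokens are ignored."""
    classes, log_prior, log_lik = model
    scores = {c: log_prior[c] + sum(log_lik[c][t] for t in doc if t in log_lik[c])
              for c in classes}
    return max(scores, key=scores.get)


docs = [["great", "acting"], ["loved", "it"], ["great", "plot"],
        ["boring", "plot"], ["awful", "acting"], ["hated", "it"]]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
model = train_nb(docs, labels)
print(predict_nb(model, ["great", "fun"]), predict_nb(model, ["awful", "plot"]))
```

Summing logarithms instead of multiplying probabilities avoids numerical underflow on long reviews, and smoothing keeps a single unseen word-class pair from forcing a probability of zero.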

    8. Maximum Entropy Classifier

      The maximum entropy classifier makes no assumptions about the relationships among features. It estimates the conditional distribution of class labels by maximizing the entropy of the system. The conditional distribution is represented as:

      $P(y \mid X) = \dfrac{1}{Z(X)} \exp\left(\sum_i \lambda_i f_i(X, y)\right)$, with $Z(X) = \sum_{y'} \exp\left(\sum_i \lambda_i f_i(X, y')\right)$    (Equation 1.2)

      Here, X is the feature vector, y is the class label, $f_i(X, y)$ is the i-th feature function, $\lambda_i$ is its weight coefficient, and Z(X) is the normalization factor. For classification, the feature vector encodes the relationships among part-of-speech tags, emotional keywords, and negation [4].
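Equation 1.2 can be evaluated directly once feature functions and weights are fixed. In this sketch the two binary feature functions (an emotional keyword and a negation cue) and their weights $\lambda_i$ are made up; in a real classifier the weights are learned from training data.

```python
import math


def maxent_probs(x, labels, feature_fns, weights):
    """P(y | X) = exp(sum_i lambda_i * f_i(X, y)) / Z(X) for every label y."""
    scores = {lab: sum(lam * f(x, lab) for lam, f in zip(weights, feature_fns))
              for lab in labels}
    z = sum(math.exp(s) for s in scores.values())   # normalization factor Z(X)
    return {lab: math.exp(s) / z for lab, s in scores.items()}


# Hypothetical binary feature functions f_i(X, y) over a set of tokens X.
feature_fns = [
    lambda x, y: 1.0 if "great" in x and y == "pos" else 0.0,   # emotional keyword
    lambda x, y: 1.0 if "not" in x and y == "neg" else 0.0,     # negation cue
]
weights = [1.5, 2.0]  # made-up lambda_i values

p1 = maxent_probs({"great", "acting"}, ["pos", "neg"], feature_fns, weights)
p2 = maxent_probs({"not", "great"}, ["pos", "neg"], feature_fns, weights)
print(p1, p2)
```

Because both features can fire at once without any independence assumption, the negation cue's larger weight outweighs the keyword in the second example, which tips the prediction to "neg".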

    9. Ensemble classifier

      Various kinds of ensemble classifiers have been developed. This classifier uses the features of the best individual classifiers to perform the best classification. The base classifiers used three main approaches, i.e., Naive Bayes, Maximum Entropy, and SVM. A voting rule is used to create the ensemble: each item is classified according to the output of the majority of the classifiers.

      The following are the two sets of data needed in machine learning approaches [14]:

      a. Training set

      b. Test set

      The training dataset is collected to initiate machine learning and is then used to train a classifier. After a supervised classification approach has been selected, choosing the features is an important decision, because the features determine how documents are represented. During sentiment classification, the most commonly used features include:

      • Term presence and frequency

      • Part-of-speech information

      • Negations

      • Opinion words and phrases

      Semi-supervised and unsupervised techniques are designed for cases where having an initial set of labeled opinions for training the classifier is unrealistic. The lexicon-based approach uses a sentiment dictionary of opinion words and determines polarity by matching these words with the rest of the data. Sentiment scores are assigned to the opinion words to indicate how positive, negative, or objective the words in the dictionary are. Lexicon-based approaches use as their base a sentiment lexicon, which is a collection of known and precompiled sentiment phrases, idioms, and terms. This approach has been developed for different traditional genres of communication [15].

      This approach has two sub-classifications:

      • Dictionary-based: A set of opinion terms is collected and annotated manually, and this set is then grown by searching a dictionary for the synonyms and antonyms of each word. WordNet is an example of such a dictionary; the thesaurus SentiWordNet was developed from it. The major drawback of this method is that it cannot handle domain- and context-specific orientations.

        Figure 1

      • Corpus-based: The corpus-based approach provides dictionaries related to a particular domain. These dictionaries are created from a set of seed opinion terms that is grown by searching for related words using statistical or semantic techniques. The following are the two basic methods based on statistics [16]:

      • Latent Semantic Analysis (LSA). An interesting solution can be provided by different methods that are based on semantics. A comparative investigation of existing opinion mining techniques, including cross-domain, machine learning, and lexicon-based techniques, has been provided on the basis of performance measures such as recall and precision.

    TABLE 1.1: Comparison of techniques (classification accuracy)

    Machine learning approach            Holdout method    10-fold cross-validation
    Support Vector Machine Classifier    0.600             0.612
    Stochastic Gradient Descent          0.615             0.620
    Logistic Regression                  0.600             0.618
    SGD with logistic regression         0.630             0.638
    K Nearest Neighbors Classifier       0.625             0.638
    Random Forest                        0.630             0.631
    Naïve Bayesian classifier            0.640             0.646

    Figure 1.2: Accuracy obtained with the holdout method and with 10-fold cross-validation.
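The dictionary-based lexicon approach (grow a seed set through synonyms and antonyms, then score text by matching) can be sketched as follows. The tiny thesaurus is a hypothetical stand-in for a resource such as WordNet, the number of expansion rounds is an assumption, and the word-sum scoring rule is a deliberately simple choice.

```python
# Hypothetical toy thesaurus standing in for a real dictionary such as WordNet.
SYNONYMS = {"good": ["great", "fine"], "bad": ["awful", "poor"]}
ANTONYMS = {"good": ["bad"], "great": ["terrible"]}


def expand_lexicon(seeds, synonyms, antonyms, rounds=2):
    """Grow a seed lexicon: synonyms inherit a word's polarity, antonyms get the opposite.
    The first polarity assigned to a word is kept."""
    lexicon = dict(seeds)
    for _ in range(rounds):
        for word, pol in list(lexicon.items()):
            for syn in synonyms.get(word, []):
                lexicon.setdefault(syn, pol)
            for ant in antonyms.get(word, []):
                lexicon.setdefault(ant, -pol)
    return lexicon


def lexicon_score(text, lexicon):
    """Sum the polarities of the lexicon words found in the text (punctuation stripped)."""
    return sum(lexicon.get(tok.strip(".,!?"), 0) for tok in text.lower().split())


lexicon = expand_lexicon({"good": 1}, SYNONYMS, ANTONYMS)
print(sorted(lexicon.items()))
print(lexicon_score("A great cast but an awful, poor plot.", lexicon))
```

Starting from the single seed "good", two rounds of expansion already assign polarities to six more words. The same growth also illustrates the drawback noted above: every word gets one fixed polarity regardless of domain or context.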

  7. CONCLUSION

From the accuracies given above, it can be concluded that the model implemented with the Naïve Bayesian classifier has been more promising than the SVM-based approach on the given data, and it would thus be more promising to work on. The Stanford CoreNLP-based method not only carries out a normal classification process but also performs important preprocessing steps on the movie reviews. This preprocessing was mostly accomplished using tools available in the Stanford CoreNLP library, such as the Stanford Tokenizer and Stanford Lemmatizer [9], and it contributed to performance improvements in various research papers. Additionally, research at the intersection of sentiment analysis and natural language processing has addressed many problems affecting the applicability of sentiment analysis, such as irony detection [12] and multilingual support [13]. Moreover, the Sentiment Analyzer module significantly supported the model-building process for these approaches, which greatly increased the accuracy of the models built in various research papers. It was also found that, while journals rarely change their titles and vary only in issue and volume numbers, the names of conference proceedings are not consistent over the years; to overcome this issue, the authors of [18] cleaned the venue names [11]. It can therefore be concluded that researching and verifying the methods that have proved most beneficial, as well as analyzing and filtering research papers, are very important to the success of the current model. This knowledge and experience, when reused, can bring further advantages to current sentiment analysis papers and projects, and reusing them is a necessary step to include.

REFERENCES:

  1. Subhabrata Mukherjee, Pushpak Bhattacharyya, Sentiment Analysis: A Literature Survey. Available at Cornell University, Computer Science Department, Computation and Language/arxiv.org/abs/1304.4520, 2013, JOUR.

  2. Pang, B., L. Lee, and S. Vaithyanathan. 2002. Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10, 7986.

  3. B. Liu, Handbook Chapter: Sentiment Analysis and Subjectivity. Handbook of Natural Language Processing, Handbook of Natural Language Processing. Marcel Dekker, Inc. New York, NY, USA, 2009.

  4. Adyan Marendra Ramadhani, Hong Soon Goo. Twitter Sentiment Analysis using Deep Learning Methods. 7th International Annual Engineering Seminar (InAES), Yogyakarta, Indonesia, 2017.

  5. K. Kaviya, C. Roshini, V. Vaidhehi, J. Dhalia Sweetlin. Sentiment for Restaurant Rating. 2017 IEEE International Conference on Smart Technologies and Management for Computing, Controls, Energy, and Material (ICSTM).

  6. Large Movie Review Dataset, extracted from the Stanford datasets, available at ai.stanford.edu/~amaas/data/sentiment.

  7. Lei Zhang, Bing Liu, Identifying Noun Product Features that Imply Opinions, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: short papers, pages 575-580, Portland, Oregon, June 19-24, 2011.

  8. Mauro Dragoni, An Information Retrieval-based System For Multi-Domain Sentiment Analysis, Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 502-509, Denver, Colorado, June 4-5, 2015.

  9. Asiri Wijesinghe, Sentiment Analysis on Movie Reviews, ResearchGate, 2015/10/14.

  10. Abdullah Alsaeedi, Mohammad Khan, A Study on Sentiment Analysis Techniques of Twitter Data, International Journal of Advanced Computer Science and Applications, 2019, pages 361-374.

  11. Mika V. Mäntylä, Daniel Graziotin, Miikka Kuutila, The evolution of sentiment analysis: A review of research topics, venues, and top cited papers, Computer Science Review, Volume 27, February 2018, Pages 16-32, ISSN 1574-0137.

  12. A. Reyes and P. Rosso, "On the difficulty of automatically detecting irony: beyond a simple case of negation," Knowledge and Information Systems, vol. 40, no. 3, pp. 595-614, 2014.

  13. A. Hogenboom, B. Heerschop, F. Frasincar, U. Kaymak, and F. de Jong, "Multi-lingual support for lexicon-based sentiment analysis guided by semantics," Decision Support Systems, vol. 62, pp. 43-53, 2014.

  14. M. D. Munezero, C. S. Montero, E. Sutinen, and J. Pajunen, "Are they different? Affect, feeling, emotion, sentiment, and opinion detection in text," IEEE Transactions on Affective Computing, vol. 5, no. 2, pp. 101-111, 2014.

    B. Liu, "Sentiment analysis and opinion mining," Synthesis Lectures on Human Language Technologies, vol. 5, no. 1, pp. 1-167, 2012.

  15. A. Pak and P. Paroubek, "Twitter as a Corpus for Sentiment Analysis and Opinion Mining," in LREC, 2010, vol. 10.

  16. A. Tumasjan, T. O. Sprenger, P. G. Sandner, and I. M. Welpe, "Predicting elections with twitter: What 140 characters reveal about political sentiment," ICWSM, vol. 10, no. 1, pp. 178-185, 2010.

  17. R. Piryani, D. Madhavi, and V. K. Singh, "Analytical mapping of opinion mining and sentiment analysis research during 2000-2015," Information Processing & Management, vol. 53, no. 1, pp. 122-150, 2017.

  18. R Core Team, "R: A language and environment for statistical computing," 2013.

  19. Sarlan, Aliza, Nadam, Chayanit, and Basri, Shuib. (2014). Twitter sentiment analysis, pp. 212-216, doi:10.1109/ICIMU.2014.7066632.

  20. Fang, X., Zhan, J. Sentiment analysis using product review data. Journal of Big Data 2, 5 (2015).

  21. Mohey El-Din, Doaa. (2016). Analyzing Scientific Papers Based on Sentiment Analysis (First Draft). 10.13140/RG.2.1.2222.6328.

  22. Zhang, L., Wang, S., & Liu, B. (2018). Deep learning for sentiment analysis: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(4), e1253.

  23. Dos Santos, C., & Gatti, M. (2014, August). Deep convolutional neural networks for sentiment analysis of short texts. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers (pp. 69-78).

  24. Leo Breiman, Random Forests, Statistics Department, University of California, CA 94720, 2001.

  25. Alsaeedi, Abdullah; Khan, Mohammad, 2019, A Study on Sentiment Analysis Techniques of Twitter Data, International Journal of Advanced Computer Science and Applications, pages 361-374.

  26. Wijesinghe, Asiri, 2015, Sentiment Analysis on Movie Reviews, 10.13140/RG.2.2.13784.80645, Research Gate.

  27. N. Zainuddin, A. Selamat, and R. Ibrahim, "Hybrid sentiment classification on twitter aspect-based sentiment analysis," Applied Intelligence, pp. 1-15, 2017.

  28. F. M. Kundi, S. Ahmad, A. Khan, and M. Z. Asghar, "Detection and scoring of internet slangs for sentiment analysis using SentiWordNet," Life Science Journal, vol. 11, no. 9, pp. 66-72, 2014.

  29. A. Barhan and A. Shakhomirov, "Methods for Sentiment Analysis of Twitter messages," in the 12th Conference of FRUCT Association, 2012.

  30. P.-W. Liang and B.-R. Dai, "Opinion mining on social media data," in Mobile Data Management (MDM), 2013 IEEE 14th International Conference on, 2013, vol. 2: IEEE, pp. 91-96.

  31. R. Sharma, S. Nigam, and R. Jain, "Opinion mining of movie reviews at document level," arXiv preprint arXiv:1408.3829, 2014.

  32. R. Sharma, S. Nigam, and R. Jain, "Polarity detection at sentence level," International Journal of Computer Applications, vol. 86, no. 11, 2014.

  33. H. Saif, Y. He, and H. Alani, "Semantic sentiment analysis of twitter," in International semantic web conference, 2012: Springer, pp. 508-524.

  34. Doaa Mohey, El-Din Mohamed, A survey on sentiment analysis challenges, Journal of King Saud University – Engineering Sciences Volume 30, Issue 4, October 2018, Pages 330-338.
