Aspect-based Sentiment Summarization with Deep Neural Networks

DOI : 10.17577/IJERTV5IS050553


  • Authors : Dhanush D, Abhinav Kumar Thakur, Narasimha Prasad Diwakar
  • Paper ID : IJERTV5IS050553
  • Volume & Issue : Volume 05, Issue 05 (May 2016)
  • Published (First Online): 16-05-2016
  • ISSN (Online) : 2278-0181
  • Publisher Name : IJERT
  • License: This work is licensed under a Creative Commons Attribution 4.0 International License



Dhanush D

Department of Computer Science & Engineering

R.V. College of Engineering Bangalore, India

Abhinav Kumar Thakur

Department of Electronics & Communication Engineering

R.V. College of Engineering Bangalore, India

Narasimha Prasad Diwakar

Department of Master of Computer Application

R.V. College of Engineering Bangalore, India

Abstract: Aspect-based sentiment analysis (ABSA) has gained considerable importance and has seen rapid growth of research in Natural Language Processing. ABSA can be used to summarize reviews on e-commerce sites, blogs, discussion forums, etc. The ABSA problem involves two major tasks: aspect extraction and sentiment classification. To solve these tasks we apply deep neural networks and discuss their performance on the SemEval-2014 dataset. The proposed design consists of separate models: a Recurrent Neural Network that extracts aspects by tagging them in a sentence, and a Convolutional Neural Network that performs sentence-level sentiment classification. The pairing of aspects and sentiments is supported by a constituency parse tree.

Keywords: Word2Vec, Convolutional Neural Network, Aspect Based Sentiment Analysis, Recurrent Neural Network, Constituency parse tree.


    With the growth of the World Wide Web (WWW), large volumes of unstructured user content have been generated in the form of discussion forums, blogs and customer reviews on e-commerce sites. Reading every review to understand user opinions is impractical, so there is a need for methods that can leverage such huge amounts of data to give users and companies better insight. Such insights can help businesses understand market demand and monitor their standing in the market, and help consumers make better decisions while shopping.

    Sentiment analysis is one such approach to extracting insights from reviews. This task has been extensively studied in industry and academia and has gained considerable importance over the years. However, sentiment analysis alone is not sufficient, since reviews are seldom entirely positive or entirely negative. Instead, review sentiments are usually targeted towards an aspect of a business or product, i.e., a single review sentence may express both what is positive and what is negative. An example of such a case from a restaurant review is as follows:

    "Service was nice but price was not worth it and food quality was very miserable."

    Plain sentence-level sentiment classification is not the right approach for the above example, since the sentiments are directed at specific aspects: price and food have negative polarity, while service has positive polarity. To overcome this drawback, Aspect-Based Sentiment Analysis (ABSA) is needed to summarize reviews correctly. Figure 1 shows an example of a restaurant review summary for five predefined aspects (Service, Ambiance, Price, Location, Food), where the yellow filled bar indicates reviews with positive opinions, the unfilled bar indicates reviews with negative opinions, and the length of each bar indicates the number of reviews.

    Fig. 1. Example of aspect specific summary for restaurant reviews.

    ABSA usually has two main tasks: aspect tagging and sentiment analysis. Aspect extraction (tagging) involves determining all the entities a sentence talks about. Sentiment analysis involves determining the polarity expressed towards each of these aspects in the sentence.

    Aspect tagging, i.e., finding the target entity of an opinion, has so far been approached in two ways. The first method is based on linguistic rules. These rules have many limitations when applied to user reviews from the web, since users often do not follow the rules of the language. This reduces accuracy, as there are frequent mismatches between the extraction rules and the structure of review sentences. The second approach uses topic modeling: topics are extracted from the dataset using methods such as LDA and used as predefined aspects for aspect classification. Being unsupervised, this method does not work well for specific products, and its unsupervised nature also makes the approach less adaptable.

    Some previous works assume that a sentence always targets a single aspect, an assumption that gives wrong results on numerous occasions. In this paper, we discuss a method for aspect extraction that assigns a tag to each word in a sentence. We also handle cases where multiple aspects are detected in a single sentence, which is necessary because the sentiment classifier operates at the sentence level.


    The ABSA problem is usually tackled in one of two ways:

    1. SAS model (Separate Aspect Sentiment): the prediction models for aspects and for sentiment are independent.

    2. Joint Multi-Aspect Sentiment model (JMAS): an aspect and its sentiment are modelled as a pair, and each (aspect, sentiment) pair is an output class of the model.
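    The difference between the two formulations shows up in the size of the output space. The minimal sketch below contrasts them for a hypothetical set of four aspects and three sentiment labels; the label lists and the function names `sas_output_spaces` and `jmas_output_space` are illustrative assumptions, not the inventory of any specific model.

```python
from itertools import product

# Illustrative label inventories (assumed for this sketch, not taken from any model).
ASPECTS = ["food", "service", "price", "ambience"]
SENTIMENTS = ["positive", "negative", "neutral"]

def sas_output_spaces(aspects, sentiments):
    """SAS: two independent models, each predicting from its own label set."""
    return {"aspect_model": list(aspects), "sentiment_model": list(sentiments)}

def jmas_output_space(aspects, sentiments):
    """JMAS: one model whose output classes are (aspect, sentiment) pairs."""
    return list(product(aspects, sentiments))

sas = sas_output_spaces(ASPECTS, SENTIMENTS)
jmas = jmas_output_space(ASPECTS, SENTIMENTS)
print(len(sas["aspect_model"]), len(sas["sentiment_model"]))  # 4 3
print(len(jmas), jmas[:2])  # 12 [('food', 'positive'), ('food', 'negative')]
```

    SAS keeps two small label sets (4 + 3 classes here), whereas JMAS needs one class per combination (4 x 3 = 12), which grows multiplicatively as aspects are added.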

    Most of the models initially developed for ABSA were SAS models. Hu and Liu [1] use linguistic patterns to capture features and WordNet to capture polarities. In [2], a CRF with features such as POS tags, capitalization, dependency tree features and semantic tags was used to extract opinion sources from text. Mukherjee and Liu [3] describe an aspect extraction model based on semi-supervised methods that uses LDA for topic modelling. However, in such approaches the topics were muddled and not on par with human judgements, since they were modelled on a co-occurrence matrix. One implementation of ABSA in a deep learning framework used cascaded Convolutional Neural Networks for aspect extraction and sentiment classification [4]; however, it assumed that each sentence targets a single aspect.

    More recently, JMAS models have been explored. In [5], a hierarchical deep learning framework was discussed which models an aspect and its sentiment as a pair. That work was concerned with learning word representations that can reason about aspect and sentiment at the phrase level.

    Socher et al. (2013) [8] proposed the Recursive Neural Tensor Network (RNTN) for predicting the sentiment of a sentence using a tensor-based composition function, trained on the Sentiment Treebank. Its major disadvantage is that it requires sentiment labels at the phrase level. Convolutional Neural Networks applied to many traditional NLP tasks have produced better results than existing methods [6]. In [7], a CNN applied to sentence classification produced state-of-the-art results on many benchmark datasets, and a CNN requires sentiment labels only at the sentence level. An RNN for semantic slot filling has outperformed CRFs and produced state-of-the-art results on many datasets [9]. We therefore design a deep learning framework for the ABSA task using an RNN and a CNN.


    The overview of the proposed system is shown in figure 2. A review dataset for a particular product is chosen. Word embeddings for the dataset are created using word2vec [10], or embeddings pre-trained on the Google News dataset can be used. The reviews are tokenized into sentences and passed to the Recurrent Neural Network (RNN) based aspect model, which tags all the aspects in a sentence. A Convolutional Neural Network (CNN) based sentiment model classifies the sentiment of the sentence. In this way, all the aspect-sentiment pairs are extracted from the reviews and summarized. If the RNN tags multiple aspects in a single sentence, a constituency parse tree is used to break the sentence into sub-sentences such that each sub-sentence has a single aspect and its related context; each sub-sentence is then passed to the sentiment classifier.

    Fig. 2. Aspect based Sentiment Analysis architecture.

    1. Word Embedding

      The pre-processed dataset is used to learn word embeddings of a specified dimension using the word2vec tool [10]. Internally, word2vec trains a shallow neural network with either the continuous bag-of-words or the skip-gram architecture over a context window of specified size. Vectors of semantically similar words have high cosine similarity. The word embeddings serve as the input to the CNN and RNN models by mapping each word to its vector.
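      To make the two ideas in this paragraph concrete, the sketch below shows how the skip-gram architecture turns a context window into (center, context) training pairs, and how cosine similarity compares two word vectors. It is an illustration under stated assumptions, not the word2vec tool itself: `skipgram_pairs` and `cosine` are hypothetical helper names, and the 3-dimensional vectors are made up rather than trained.

```python
import math

def skipgram_pairs(tokens, window):
    """(center, context) pairs: each word predicts its neighbours within `window`."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

print(skipgram_pairs(["service", "was", "nice"], window=1))
# [('service', 'was'), ('was', 'service'), ('was', 'nice'), ('nice', 'was')]

# Toy vectors, made up for illustration (not trained embeddings).
emb = {"tasty": [0.9, 0.1, 0.0], "delicious": [0.8, 0.2, 0.1], "cheap": [0.0, 0.3, 0.9]}
print(cosine(emb["tasty"], emb["delicious"]) > cosine(emb["tasty"], emb["cheap"]))  # True
```

      In the real tool these pairs feed a shallow network whose learned weights become the embeddings; the cosine check is the same one used to judge whether similar words ended up close together.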

    2. Aspect Tagging

      In this work, an RNN is used for aspect tagging, i.e., given a sentence, the RNN must output the class of each word. Figure 3 shows the architecture of the RNN used, which is an Elman RNN [11]. The network has no depth in space (a single hidden layer) and passes the current hidden layer output to the hidden unit at the next time step.

      Fig. 3. Uni-directional RNN structure

      At any time step t, the output of each hidden unit is a non-linear function of the input at time step t and the previous hidden state. The previous hidden state acts as the memory of the network and enables it to encode more context information.

      $s_t = f(U x_t + W s_{t-1})$ (1)

      $o_t = \mathrm{softmax}(V s_t)$ (2)

      In (1), $s_t$ represents the hidden state at time step $t$, $x_t$ the input at time step $t$, and $s_{t-1}$ the hidden state from time step $t-1$. $s_{-1}$ is set to zeros, which is required to calculate $s_0$ at time $t = 0$. The function $f$ is a non-linearity, usually ReLU or tanh. In (2), $o_t$ is the output at time step $t$.

      The weights $U$, $W$ and $V$ in (1) and (2) are the weight matrices for the input-to-hidden, hidden-to-hidden and hidden-to-output connections respectively. All these weights are shared across time steps, which reduces the number of parameters to be learned; intuitively, the same operation is applied to different inputs at every time step. The input at each time step is given as an array of the word vectors in a fixed context window, with the window size usually ranging from 3 to 9.
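      The recurrence s_t = f(U x_t + W s_{t-1}), o_t = softmax(V s_t) can be sketched in NumPy as below. This is a minimal forward pass only, under several assumptions: the context window is formed by concatenating neighbouring word vectors with zero padding at sentence edges, f is tanh, all sizes are toy values, and the weights are random (untrained), so the predicted tags carry no meaning. `context_windows` and `elman_forward` are hypothetical names.

```python
import numpy as np

def context_windows(embeddings, win):
    """Concatenate the vectors of the `win` words centred on each position (win odd).
    Positions outside the sentence are zero-padded (an assumption of this sketch)."""
    n, k = embeddings.shape
    half = win // 2
    padded = np.vstack([np.zeros((half, k)), embeddings, np.zeros((half, k))])
    return np.stack([padded[t:t + win].reshape(-1) for t in range(n)])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def elman_forward(X, U, W, V, f=np.tanh):
    """s_t = f(U x_t + W s_{t-1}), o_t = softmax(V s_t), starting from s_{-1} = 0."""
    s = np.zeros(W.shape[0])
    outputs = []
    for x_t in X:
        s = f(U @ x_t + W @ s)          # the same U, W, V are reused at every time step
        outputs.append(softmax(V @ s))  # one tag distribution per word
    return np.array(outputs)

rng = np.random.default_rng(0)
n_words, k, win, hidden, n_tags = 5, 4, 3, 8, 3  # toy sizes; 3 tags, e.g. O, B-food, I-food
emb = rng.normal(size=(n_words, k))
X = context_windows(emb, win)                    # shape (5, 12)
U = rng.normal(scale=0.1, size=(hidden, win * k))
W = rng.normal(scale=0.1, size=(hidden, hidden))
V = rng.normal(scale=0.1, size=(n_tags, hidden))
probs = elman_forward(X, U, W, V)
print(probs.shape)           # (5, 3)
print(probs.argmax(axis=1))  # predicted tag index per word (weights are untrained)
```

      Because U, W and V are shared, the parameter count depends only on the window size, embedding size, hidden size and tag count, not on the sentence length.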

      Fig. 4. Constituency parse tree for a sentence with multiple aspects and different sentiments

      Handling multiple aspects: A sentence may target multiple aspects with contrasting opinions. To handle such cases, a constituency parse tree is built using the Stanford CoreNLP parser library [12]. A constituency parse contains phrase types in the non-terminal nodes and words in the terminal nodes of the parse tree.

      If the aspect tagging model tags multiple aspects, sub-sentences are extracted from the constituency parse tree by taking all the words under each S (clause) node other than the root node of the tree. The sub-sentences containing known aspects are passed individually to the sentiment classifier to obtain the aspect-sentiment pairs. An example of such a case is shown in figure 4. If multiple aspects fall under the same S node, the sentiment of that sentence is mapped to all of those aspects. Figure 5 shows such a case, with two different aspects in a sentence under a single S node; both aspects are paired with the sentiment of the sentence.
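      The clause-splitting rule can be sketched over a bracketed parse string of the kind a constituency parser such as CoreNLP prints. This is a minimal pure-Python illustration, not the authors' implementation: the helpers `parse_tree`, `split_clauses` and `pair_aspects` are hypothetical, the two parse strings are hand-written examples rather than real parser output, and reading "the S nodes other than the root" as "the outermost S nodes below the top-level sentence S" is an assumption.

```python
def parse_tree(text):
    """Parse a bracketed parse string into (label, children) tuples; words stay as str."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                          # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                          # skip ")"
        return (label, children)

    return node()

def leaves(tree):
    """Words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1] for w in leaves(child)]

def split_clauses(tree):
    """Word lists of the outermost S nodes below the top-level S, or the whole
    sentence when there are none (assumed reading of the rule described above)."""
    top = next((c for c in tree[1] if not isinstance(c, str) and c[0] == "S"), tree)
    found = []

    def collect(node):
        for child in node[1]:
            if isinstance(child, str):
                continue
            if child[0] == "S":
                found.append(leaves(child))  # keep the clause whole; do not split further
            else:
                collect(child)

    collect(top)
    return found or [leaves(top)]

def pair_aspects(clauses, aspect_words):
    """Attach each tagged aspect word to the clause that contains it."""
    out = []
    for words in clauses:
        hits = [a for a in aspect_words if a in words]
        if hits:
            out.append((" ".join(words), hits))
    return out

TWO_CLAUSES = ("(ROOT (S (S (NP (NN Service)) (VP (VBD was) (ADJP (JJ nice)))) (CC but) "
               "(S (NP (NN price)) (VP (VBD was) (ADJP (RB too) (JJ high))))))")
ONE_CLAUSE = "(ROOT (S (NP (DT The) (NN food) (CC and) (NN service)) (VP (VBD were) (ADJP (JJ great)))))"

print(pair_aspects(split_clauses(parse_tree(TWO_CLAUSES)), ["Service", "price"]))
# [('Service was nice', ['Service']), ('price was too high', ['price'])]
print(pair_aspects(split_clauses(parse_tree(ONE_CLAUSE)), ["food", "service"]))
# [('The food and service were great', ['food', 'service'])]
```

      In the first case each clause goes to the sentiment classifier separately; in the second, both aspects share one clause and therefore receive the sentiment of the whole sentence, mirroring the two situations in figures 4 and 5.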

      Fig. 5. Constituency parse tree for a sentence with multiple aspects and similar sentiments

    3. Sentiment Classifier

    The architecture of the sentiment classifier is shown in figure 6. Each word is represented as a $k$-dimensional word vector instead of a high-dimensional one-hot encoding. The word embedding matrix has size $V \times k$, where $V$ is the size of the vocabulary. An input sentence forms a matrix of size $n \times k$, where $n$ is the number of words in the sentence. The word embeddings can be learned by the CNN, or embeddings created with the word2vec tool can be used. For the SemEval tasks, the pre-trained Google News word vectors were used. Words not present in the embedding matrix are randomly initialized.
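    Building the $n \times k$ input matrix, including random initialization of words missing from the pre-trained vectors, can be sketched as follows. Assumptions: `sentence_matrix` is a hypothetical helper, `pretrained` stands in for vectors loaded from a word2vec model, the U(-0.25, 0.25) range for unknown words is an illustrative choice, and each unknown word is drawn once and cached so it keeps the same vector across sentences.

```python
import numpy as np

def sentence_matrix(tokens, pretrained, k, rng, oov_vectors):
    """Stack one k-dimensional vector per token into an n x k matrix.

    pretrained: dict word -> k-dim vector (stand-in for loaded word2vec vectors).
    Unknown words get a random vector drawn once and cached in oov_vectors.
    The U(-0.25, 0.25) range is an assumption of this sketch.
    """
    rows = []
    for word in tokens:
        if word in pretrained:
            rows.append(pretrained[word])
        else:
            if word not in oov_vectors:
                oov_vectors[word] = rng.uniform(-0.25, 0.25, size=k)
            rows.append(oov_vectors[word])
    return np.vstack(rows)

k = 4
pretrained = {"food": np.ones(k), "was": np.zeros(k)}  # toy vectors for illustration
rng = np.random.default_rng(0)
oov = {}
M = sentence_matrix(["food", "was", "scrumptious"], pretrained, k, rng, oov)
print(M.shape)  # (3, 4)
```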

    Convolution Layer: Convolution is performed on the input matrix of size $n \times k$ using multiple filters. Each convolution operation applies the following linear transformation, followed by a non-linearity, to produce a new feature:

    Fig. 6. Convolution Neural Network architecture for sentiment classification

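    $c_i = f(\mathbf{w} \cdot \mathbf{x}_{i:i+h-1} + b)$ (3)

    In (3), $\mathbf{w}$ is a filter applied over a window of $h$ words and all $k$ dimensions of each word vector, $\mathbf{x}_{i:i+h-1}$ denotes the word vectors $\mathbf{x}_i, \ldots, \mathbf{x}_{i+h-1}$ taken together, $b$ is the bias and $f$ is a non-linear function. Convolution over the entire sentence of length $n$ generates a feature map of length $n-h+1$ (the stride is always 1). Multiple filters are used in this way to generate several such feature maps.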

    Sub-sampling Layer: Sub-sampling of the feature maps is done using max-over-time pooling [6]. From each feature map of length $n-h+1$, a single maximum value $\hat{c} = \max\{c_1, c_2, \ldots, c_{n-h+1}\}$ is chosen. This is done for every feature map generated by the different filters, and the maximum values are concatenated to form a fixed-length feature vector. With max-over-time pooling, sentences of arbitrary length can be handled, since the pooling layer produces a feature vector whose length equals the number of feature maps.
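    The convolution c_i = f(w · x_{i:i+h-1} + b) and max-over-time pooling can be sketched directly in NumPy. This is an unoptimized illustration with random, untrained filters; `conv_feature_map` and `max_over_time_features` are hypothetical names, and ReLU is used for f because that is the non-linearity listed among the hyperparameters.

```python
import numpy as np

def conv_feature_map(X, w, b, f=lambda z: np.maximum(z, 0.0)):
    """c_i = f(w . x_{i:i+h-1} + b) for every window of h words (stride 1).
    X is the n x k sentence matrix and w an h x k filter, so the feature map has n-h+1 entries."""
    n = X.shape[0]
    h = w.shape[0]
    return np.array([f(np.sum(w * X[i:i + h]) + b) for i in range(n - h + 1)])

def max_over_time_features(X, filters, biases):
    """Keep the maximum of each feature map: one value per filter, whatever n is."""
    return np.array([conv_feature_map(X, w, b).max() for w, b in zip(filters, biases)])

rng = np.random.default_rng(1)
k = 6
filters = [rng.normal(size=(h, k)) for h in (3, 4, 5)]  # one toy filter per window size
biases = [0.0, 0.0, 0.0]
short = rng.normal(size=(7, k))    # a 7-word sentence
long_ = rng.normal(size=(15, k))   # a 15-word sentence
print(conv_feature_map(short, filters[0], 0.0).shape)         # (5,) = n - h + 1
print(max_over_time_features(short, filters, biases).shape)   # (3,)
print(max_over_time_features(long_, filters, biases).shape)   # (3,)
```

    Both sentences yield a vector of length 3, one entry per filter, which is why the following fully connected layer can have a fixed input size.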

    Fully connected MLP: The fixed-size feature vector in the penultimate layer is input to a fully connected Multi-Layer Perceptron, whose softmax layer outputs a probability distribution over the sentiment classes. In the MLP, regularization is implemented using dropout: during forward propagation, a fraction of the hidden units is randomly dropped out to prevent co-adaptation, and only the units that were not dropped participate in backpropagation [13].
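    The role of dropout described above, including why dropped units take no part in backpropagation, can be seen in a minimal sketch of a softmax layer over the pooled feature vector. Assumptions: dropout is applied to the pooled vector `z`, there are two sentiment classes, inputs are scaled by the keep probability at inference (a common convention), and `output_layer` and `grad_W` are hypothetical names. A hidden layer would be treated the same way.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def output_layer(z, W, b, p_drop, rng, train):
    """Softmax layer over the pooled feature vector z, with dropout on z.

    Training: each unit of z is zeroed with probability p_drop.
    Inference: nothing is dropped; z is scaled by the keep probability (1 - p_drop)
    so expected activations match training (a common convention, assumed here).
    Returns the class probabilities and the (possibly masked) input actually used.
    """
    if train:
        z_used = z * (rng.random(z.shape) >= p_drop)
    else:
        z_used = z * (1.0 - p_drop)
    return softmax(W @ z_used + b), z_used

def grad_W(probs, label, z_used):
    """Cross-entropy gradient w.r.t. W; columns of dropped units are exactly zero,
    so those units receive no weight update."""
    d = probs.copy()
    d[label] -= 1.0
    return np.outer(d, z_used)

rng = np.random.default_rng(0)
z = rng.normal(size=300)                     # e.g. 3 window sizes x 100 feature maps
W = rng.normal(scale=0.01, size=(2, 300))    # 2 sentiment classes (assumed)
b = np.zeros(2)
probs, z_used = output_layer(z, W, b, p_drop=0.5, rng=rng, train=True)
g = grad_W(probs, label=0, z_used=z_used)
print(probs, int((z_used == 0).sum()), "units dropped")
```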

    The sentiment of the sentence, obtained as the output of the CNN, is paired with the specific aspect tagged by the RNN. These aspect-sentiment pairs are aggregated to generate a summary for a specific review dataset.
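    The aggregation step amounts to counting positive and negative opinions per aspect, as in the bars of figure 1. The sketch below assumes the pipeline yields (aspect, sentiment) string pairs; `summarize` and `render` are hypothetical names, and the pairs are made-up examples echoing the restaurant review in the introduction.

```python
from collections import Counter, defaultdict

def summarize(pairs):
    """Aggregate (aspect, sentiment) pairs into per-aspect opinion counts."""
    summary = defaultdict(Counter)
    for aspect, sentiment in pairs:
        summary[aspect][sentiment] += 1
    return summary

def render(summary):
    """Text version of a figure 1 style summary: '+' per positive, '-' per negative opinion."""
    lines = []
    for aspect in sorted(summary):
        c = summary[aspect]
        lines.append(f"{aspect:<8} {'+' * c['positive']:<5}| {'-' * c['negative']}")
    return "\n".join(lines)

pairs = [("service", "positive"), ("price", "negative"), ("food", "negative"),
         ("food", "positive"), ("food", "positive")]
print(render(summarize(pairs)))
```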


    The proposed summarization approach is evaluated on a standard dataset, and the results are discussed.

    1. Dataset

      The standard dataset for Aspect Based Sentiment Analysis was taken from SemEval-2014 Task 4. It consists of around 3044 annotated training sentences stored in XML format. The frequency of the various categories among sentences in the training data is shown in figure 7. In this work, we have focussed only on subtask 3 (aspect category detection) and subtask 4 (aspect category polarity) of Task 4 for in-domain ABSA of restaurant reviews. Subtask 3 involves detecting the aspect categories in a sentence, i.e., given the sentence "Menu was great, but restaurant was expensive", the output classes must be food and price. Subtask 4 involves determining the aspect categories and their polarities, i.e., for the same sentence the output pairs must be (price, negative) and (food, positive).
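      Reading the category annotations out of the XML files can be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names below (`sentence`, `text`, `aspectCategory`, `category`, `polarity`) assume the layout of the SemEval-2014 restaurant files; the inline sample is a hand-written example built from the sentence above, and `load_categories` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed SemEval-2014 layout (not copied from the dataset).
SAMPLE = """<sentences>
  <sentence id="1">
    <text>Menu was great, but restaurant was expensive.</text>
    <aspectCategories>
      <aspectCategory category="food" polarity="positive"/>
      <aspectCategory category="price" polarity="negative"/>
    </aspectCategories>
  </sentence>
</sentences>"""

def load_categories(xml_text):
    """Return (sentence text, [(category, polarity), ...]) for every sentence."""
    root = ET.fromstring(xml_text)
    data = []
    for sent in root.iter("sentence"):
        text = sent.findtext("text")
        pairs = [(c.get("category"), c.get("polarity")) for c in sent.iter("aspectCategory")]
        data.append((text, pairs))
    return data

text, pairs = load_categories(SAMPLE)[0]
print({category for category, _ in pairs})  # subtask 3 target: the set of categories
print(pairs)                                # subtask 4 target: (category, polarity) pairs
```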

      Fig. 7. Frequency of aspect categories among sentences in train data

    2. Training

      The recurrent neural network (aspect model) is trained on an IOB-tagged dataset; for the SemEval task, the training data was tagged in IOB format. Example:

      Sentence: The design and atmosphere is just as good.

      Tags: O, B-ambience, O, B-ambience, O, O, O, O, O

      The initial training data was split into a training set and a validation set to choose the optimal number of epochs. Updates during training were done using mini-batch stochastic gradient descent, where a single sentence formed each mini-batch. The number of epochs was chosen as the one that produced the minimum loss on the validation set, and the network was then trained for that number of epochs.
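      Producing IOB tags from annotated aspect spans is mechanical: the first token of an aspect gets B-category, any following tokens of the same aspect get I-category, and everything else gets O. The sketch assumes aspects are given as token spans (start, end exclusive, category); `iob_tags` is a hypothetical helper. It reproduces the example above.

```python
def iob_tags(tokens, aspects):
    """aspects: list of (start, end, category) token spans, with end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end, category in aspects:
        tags[start] = f"B-{category}"
        for i in range(start + 1, end):
            tags[i] = f"I-{category}"
    return tags

tokens = ["The", "design", "and", "atmosphere", "is", "just", "as", "good", "."]
print(iob_tags(tokens, [(1, 2, "ambience"), (3, 4, "ambience")]))
# ['O', 'B-ambience', 'O', 'B-ambience', 'O', 'O', 'O', 'O', 'O']
```

      Multi-word aspects such as "wine list" would be tagged B-..., I-..., which lets the tagger recover span boundaries.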

      Convolutional Neural Network (sentiment model) hyperparameters chosen were: filter windows ($h$) of sizes 3, 4 and 5 with 100 feature maps, a dropout rate of 0.5, a mini-batch size of 25, ReLU as the non-linearity, and an $l_2$ constraint of 3. 10% of the training data was used as the validation set. The initial weight matrices are randomly assigned. The network is run for a fixed number of epochs, and the parameters at which accuracy is maximum on the validation set are saved and chosen as the best model.
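      Two of these settings are less self-explanatory: the $l_2$ constraint and the 10% validation split. The sketch below collects the listed values in a configuration dictionary and shows one common reading of an $l_2$ (max-norm) constraint, where after each update any weight vector whose norm exceeds 3 is rescaled back to norm 3. The dictionary keys, the per-row interpretation and the helpers `apply_max_norm` and `train_validation_split` are assumptions of this sketch.

```python
import numpy as np

# The CNN hyperparameters listed above, in one place (key names are hypothetical).
CNN_CONFIG = {
    "filter_windows": (3, 4, 5),
    "feature_maps": 100,
    "dropout": 0.5,
    "batch_size": 25,
    "activation": "relu",
    "l2_max_norm": 3.0,
    "validation_fraction": 0.10,
}

def apply_max_norm(W, s):
    """Rescale any row of W (one output unit's incoming weights) whose l2 norm exceeds s."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W * np.minimum(1.0, s / np.maximum(norms, 1e-12))

def train_validation_split(examples, fraction, rng):
    """Hold out a random `fraction` of the training examples as a validation set."""
    order = rng.permutation(len(examples))
    n_val = int(round(fraction * len(examples)))
    validation = [examples[i] for i in order[:n_val]]
    train = [examples[i] for i in order[n_val:]]
    return train, validation

W = np.array([[3.0, 4.0], [0.3, 0.4]])  # row norms 5.0 and 0.5
print(apply_max_norm(W, CNN_CONFIG["l2_max_norm"]))  # first row shrinks to norm 3, second is unchanged
```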

    3. Results

    The results of our aspect model and the best score for subtask 3 are tabulated in Table I. The RNN model was trained for 50 epochs, at which it produced the maximum F1 score on the validation set. The CNN was trained for 10 epochs, at which maximum accuracy was obtained. The evaluation metric used for subtask 3 is the F1 score, which is the harmonic mean of precision and recall.


    TABLE I

    Subtask 3          F1 score
    RNN model
    Best team score


    Subtask 4 performance was measured as the accuracy of the system on the test data. We obtained an accuracy of 0.761, whereas the highest accuracy among the submissions was 0.829.
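    For reference, the two metrics can be computed as sketched below. Assumptions: for subtask 3, gold and predicted aspect categories are compared as one set per sentence and counts are micro-averaged over all sentences; for subtask 4, each gold (category, polarity) item is scored as correct or not. `micro_f1` and `accuracy` are hypothetical names and the inputs are toy data.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1; gold and pred are lists of category sets, one per sentence."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def accuracy(gold_labels, pred_labels):
    """Fraction of items whose predicted polarity equals the gold polarity."""
    return sum(g == p for g, p in zip(gold_labels, pred_labels)) / len(gold_labels)

gold = [{"food", "price"}, {"service"}]
pred = [{"food"}, {"service", "ambience"}]  # misses price, adds ambience
print(micro_f1(gold, pred))  # precision = recall = 2/3, so F1 = 2/3
print(accuracy(["positive", "negative", "negative", "positive"],
               ["positive", "negative", "positive", "positive"]))  # 0.75
```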


In this work, we have presented a deep learning framework for aspect-based sentiment analysis that has produced competitive results on the SemEval-2014 subtasks. This summarization system could be used on e-commerce or restaurant review sites, where opinions on specific aspects of a product are important. In future, we aim to test the framework on various types of datasets while exploring enhancements to the current system.


  1. Minqing Hu, Bing Liu, Mining opinion features in customer reviews, Proceedings of the 19th National Conference on Artificial Intelligence, pp. 755-760, July 25-29, 2004, San Jose, California

  2. Yejin Choi, Claire Cardie, Ellen Riloff, Siddharth Patwardhan, Identifying sources of opinions with conditional random fields and extraction patterns, Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 355-362, October 06-08, 2005, Vancouver, British Columbia, Canada

  3. Arjun Mukherjee, Bing Liu, Aspect extraction through semi-supervised modeling, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers, July 08-14, 2012, Jeju Island, Korea

  4. Haibing Wu, Yiwei Gu, Shangdi Sun, Xiaodong Gu, Aspect-based Opinion Summarization with Convolutional Neural Networks, Nov 30, 2015. [Online]. Available:

  5. Himabindu Lakkaraju, Richard Socher, and Chris Manning, Aspect Specific Sentiment Analysis using Hierarchical Deep Learning, NIPS Workshop on Deep Learning and Representation Learning, 2014

  6. Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, Pavel Kuksa, Natural Language Processing (Almost) from Scratch, The Journal of Machine Learning Research, vol. 12, pp. 2493-2537, 2011.

  7. Kim Y. (2014), Convolutional Neural Networks for Sentence Classification, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746-1751, Doha, Qatar, October 25-29, 2014

  8. Socher R, Perelygin A, Wu J Y, et al, Recursive deep models for semantic compositionality over a sentiment treebank, In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1631-1642, 2013.

  9. G. Mesnil, Y. Dauphin, K. Yao, Y. Bengio, L. Deng, D. Hakkani-Tur, X. He, L. Heck, G. Tur, D. Yu, and G. Zweig, Using recurrent neural network for slot filling in spoken language understanding, IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 3, pp. 530-539, Mar. 2015.

  10. T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, Distributed Representations of Words and Phrases and their Compositionality, Advances in NIPS, pp. 3111-3119, 2013.

  11. J. Elman, "Finding structure in time", Cognitive Sci., vol. 14, no. 2, 1990

  12. Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky, The Stanford CoreNLP Natural Language Processing Toolkit, In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60, 2014.

  13. Hinton G E, Srivastava N, Krizhevsky A, et al, Improving neural networks by preventing co-adaptation of feature detectors, arXiv preprint arXiv:1207.0580, 2012.
