Comparative Analysis of Book Reviews using SVM Linear and RBF Kernel

Om Kolte; Archana Kadam

doi:10.17577/IJERTV11IS120141

Volume 11, Issue 12 (December 2022)


Open Access
Authors : Om Kolte , Archana Kadam
Paper ID : IJERTV11IS120141
Volume & Issue : Volume 11, Issue 12 (December 2022)
Published (First Online): 07-01-2023
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Om Kolte

Department of Computer Engineering, Pimpri Chinchwad College of Engineering Pune, India

Prof. Archana Kadam

Department of Computer Engineering, Pimpri Chinchwad College of Engineering Pune, India

Abstract: Sentiment analysis, also referred to as opinion mining, is a branch of natural language processing that focuses on identifying the opinions or feelings expressed in textual content. The primary focus of this study is the sentiment classification of book reviews using a supervised machine learning technique, the Support Vector Machine (SVM), on a book review dataset from Amazon. A comparative analysis with different kernel parameters indicates that SVM with the Radial Basis Function (rbf) kernel gives the best accuracy of 83.84%, whereas the linear kernel gives better overall results with an accuracy of 81.24%.

Keywords: Sentiment analysis, Support Vector Machine, Supervised approach, Machine Learning, Radial Basis Function (rbf)

INTRODUCTION

Supervised learning is the type of machine learning in which models are trained on well-labelled training data, meaning that each input is already tagged with the correct output. This includes algorithms like Support Vector Machine (SVM), Random Forest (RF), Naïve Bayes Classifier (NB), and many more.

Unsupervised learning is the type of machine learning in which models are trained without labelled data. Its main goal is to find hidden patterns in the data to which it is applied. This includes algorithms like the semantic orientation approach (SO-PMI-IR).

The introduction of the internet and technology has provided people with greater access to web apps via smart devices, greatly increasing the importance of rating systems. However, there are millions of product and service reviews available on the web, and reading all of them is a time-consuming and stressful effort for anyone. As a result, there is a need for approaches that automatically categorize these reviews as positive or negative in order to provide valuable information to the user. This classification task is known as sentiment analysis. It is a branch of natural language processing which focuses on identifying the opinions or feelings expressed in textual content. Data mining is a process that helps extract useful knowledge from large amounts of data. Sentiment analysis is routinely performed on widely spoken languages such as English using various machine learning algorithms.

LITERATURE REVIEW

The supervised and unsupervised approaches used for opinion mining were discussed. The supervised approaches consist of the Naïve Bayes approach and the Support Vector Machine (SVM), and the unsupervised approach consists of the semantic approach, also known as the SO-PMI-IR algorithm, used for classifying reviews. The results of that paper show that the unsupervised algorithm works better on datasets with long phrases, whereas the supervised algorithms gave higher accuracy on datasets containing short, one-line reviews. V. Kaur proposed an unsupervised semantic orientation approach for classifying books as positive, negative or neutral based on readers' reviews of the respective books. The algorithm used is called SO-PMI-IR, where:

SO: Semantic Orientation
PMI: Pointwise Mutual Information
IR: Information Retrieval

Fig. 1 Sentimental Analysis Process Model [1]
The first phase, depicted in Fig. 1, is the preparation of the dataset, in which empty and incomprehensible reviews are deleted; the text of the remaining reviews is then extracted and stored in a text file. In the second stage, two supervised techniques (NB and SVM) and an unsupervised approach (SO-PMI-IR) were used for sentiment classification.

The results of the Naïve Bayes algorithm were compared with those of the semantic approach SO-PMI-IR and of SVM.

It was observed that NB achieved better accuracy than SVM, and that its accuracy was close to that of the SO-PMI-IR approach.

Walia et al. explored the unsupervised SO-PMI-IR approach as well as the supervised NB and SVM approaches for sentiment analysis of reviews. The results showed that SO-PMI-IR gave the best accuracy and that the NB classifier outperformed the SVM. [2]

Almjawel et al. proposed a visual method for evaluating book reviews, with the aim of helping readers gain a better understanding of a variety of books through different visualizations. Using Tableau software, their paper reported the findings of a study that analyzed 1000 customer reviews posted on Amazon. [3]

Srujan et al. used multiple pre-processing techniques. Analysis of user sentiment and a representation of various visualization approaches make up the two primary components integrated into the system to answer the research questions. [4]

Addanki and Saraswathi described review classification in which a clustering algorithm groups users into clusters by interest and a collaborative algorithm recommends books, and noted that multiple algorithms have been implemented for opinion mining. [5]

ALGORITHMIC SURVEY
The following measures have been used to analyze the performance of the algorithm used:
1. Accuracy = (True Positive + True Negative) / Total instances
2. Precision = True Positive / (True Positive + False Positive)
3. Recall = True Positive / (True Positive + False Negative)
4. F-score = 2 × (Precision × Recall) / (Precision + Recall)
The techniques used in this paper have been implemented in Python and the dataset was in JSON format.
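
As a minimal sketch, the four measures listed above can be computed directly from the confusion-matrix counts. The function name `classification_metrics` and the example counts are illustrative assumptions and are not taken from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F-score from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


# Hypothetical counts, chosen only to illustrate the formulas.
metrics = classification_metrics(tp=80, tn=50, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy is 130/160 and precision is 80/100, which shows how a classifier can score differently on each measure.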
PROPOSED SYSTEM

Based on the existing research papers in this domain and their methodology, we propose a system that involves the following five stages:

Stage 1: Dataset Preparation
Stage 2: Dataset Preprocessing
Stage 3: Feature Extraction
Stage 4: Classification
Stage 5: Performance Evaluation and Comparison
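
The five stages listed above can be sketched end to end as follows. This is an illustration under stated assumptions, not the authors' implementation: the paper says only that Python and a JSON dataset were used, so the use of scikit-learn, the TF-IDF features, the weighted averaging of the scores, the field names `text` and `label`, and the toy reviews are all assumptions made for the example.

```python
import json

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stage 1: dataset preparation. Hypothetical JSON-lines records stand in for
# the Amazon review file; the field names "text" and "label" are assumed.
RAW_REVIEWS = """
{"text": "A wonderful and moving story", "label": "positive"}
{"text": "Brilliant characters and a gripping plot", "label": "positive"}
{"text": "I loved every chapter of this book", "label": "positive"}
{"text": "Beautifully written and hard to put down", "label": "positive"}
{"text": "An excellent read that I highly recommend", "label": "positive"}
{"text": "Great pacing and a satisfying ending", "label": "positive"}
{"text": "Boring plot and flat characters", "label": "negative"}
{"text": "A dull and disappointing book", "label": "negative"}
{"text": "I could not finish this terrible story", "label": "negative"}
{"text": "Poorly written and far too long", "label": "negative"}
{"text": "A waste of time and money", "label": "negative"}
{"text": "Weak ending and a confusing plot", "label": "negative"}
{"text": null, "label": "negative"}
{"text": "   ", "label": "positive"}
"""
records = [json.loads(line) for line in RAW_REVIEWS.strip().splitlines()]

# Stage 2: preprocessing. Drop reviews whose text is null or empty.
clean = [r for r in records
         if isinstance(r.get("text"), str) and r["text"].strip()]
texts = [r["text"] for r in clean]
labels = [r["label"] for r in clean]

# Hold out 33% of the reviews for testing, matching the split in Table I.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.33, stratify=labels, random_state=0)

# Stage 3: feature extraction. TF-IDF is an assumed choice of features.
vectorizer = TfidfVectorizer()
train_features = vectorizer.fit_transform(X_train)
test_features = vectorizer.transform(X_test)

# Stages 4 and 5: train one SVM per kernel and evaluate it on the test split.
results = {}
for kernel in ("rbf", "linear"):
    model = SVC(kernel=kernel)
    model.fit(train_features, y_train)
    predicted = model.predict(test_features)
    precision, recall, f_score, _ = precision_recall_fscore_support(
        y_test, predicted, average="weighted", zero_division=0)
    results[kernel] = {
        "accuracy": accuracy_score(y_test, predicted),
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }

for kernel, scores in results.items():
    print(kernel, {name: round(value, 3) for name, value in scores.items()})
```

Weighted averaging is assumed here because, under it, recall always equals accuracy, which is the pattern seen in Table I. The scores printed for this toy data say nothing about the real dataset.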

Fig. 3 Proposed System Model

Stage 1: Dataset Preparation

This stage involves collecting book reviews from the Amazon marketplace and storing them in a file to create a dataset. The file can be of any type, such as CSV or JSON. The main aim of this stage is to collect the data and store it for further processing and use.

Stage 2: Dataset Preprocessing

In this stage, the collected data is preprocessed. The dataset contains reviews with null or invalid values; these entries are removed and the remaining data is written back to the file. This gives a clean dataset that is easier to use and should yield better results.

Stage 3: Feature Extraction

Feature extraction is part of dimensionality reduction, in which an initial collection of raw data is reduced to groups that are easier to work with. This makes processing easier: the model can be built with less computational effort, and the learning and generalization steps are faster.

Stage 4: Classification

After the data has been pre-processed, we apply the SVM algorithm to identify the polarity of each review. We first used the rbf kernel, which is the default kernel of the SVM implementation, and noted the results obtained. We then used the linear kernel to try to improve on the results obtained with the rbf kernel.

Stage 5: Performance Evaluation and Comparison

The results need to be evaluated in order to measure the effectiveness of the model. For evaluation we use accuracy, F-score, recall and precision, and we compare the results of both kernels (rbf and linear) using these metrics.

RESULTS

This section shows the results of the SVM algorithm used for sentiment classification. Table I contains the results of the SVM approach with the linear and rbf kernels. A dataset of Amazon reviews stored in a JSON file was used.

SVM Linear Kernel

If the data can be separated linearly, that is, by a single line (or hyperplane), then the linear kernel is the appropriate choice. It is one of the most widely used kernels. Its primary use is in situations where the dataset has a large number of features. Text classification is one such case, because each distinct term in the text becomes a feature. The linear kernel is therefore commonly used for text classification.

SVM RBF Kernel

A kernel function takes data as its input and converts it into the form required for processing. The term "kernel" is used because the Support Vector Machine relies on a set of mathematical functions that provide the "window" through which the data is manipulated. In most cases, the kernel function transforms the training data so that a non-linear decision surface becomes a linear one in a space with a greater number of dimensions. In its simplest form, it returns the inner product of two points in a suitable feature space. The Radial Basis Function (rbf) kernel was used to perform this transformation.
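
To make the two kernels concrete, the sketch below evaluates the standard textbook definitions on a pair of small vectors: the linear kernel is the inner product of the two points, and the rbf kernel is exp(-gamma * ||x - y||^2). The function names, the example vectors and the value gamma = 0.5 are assumptions chosen for illustration. The paper does not report the gamma it used.

```python
import math


def linear_kernel(x, y):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """exp(-gamma * ||x - y||^2); gamma sets how fast similarity decays."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical three-dimensional feature vectors.
x = [1.0, 0.0, 2.0]
y = [0.0, 1.0, 2.0]

print(linear_kernel(x, y))  # 4.0
print(rbf_kernel(x, x))     # 1.0, a point is maximally similar to itself
print(rbf_kernel(x, y))     # exp(-1), since the squared distance is 2
```

The rbf value depends only on the distance between the two points and always lies in (0, 1], whereas the linear kernel grows with the magnitude of the vectors.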

TABLE I. RESULT ON THE AMAZON DATASET

Method	Performance Measure	33% Test Data
SVM with Linear Kernel	Accuracy	81.24%
	F-Score	0.812
	Recall	0.812
	Precision	0.812
SVM with rbf Kernel	Accuracy	83.84%
	F-Score	0.764
	Recall	0.838
	Precision	0.703
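
As a quick consistency check, the F-score formula can be applied to the precision and recall values reported in Table I. This is only a sketch: it treats the reported F-score as the harmonic mean of the reported precision and recall, which is exact for a single class but only approximate when the scores are averaged over classes.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision, recall and F-score as reported in Table I.
table = {
    "linear": {"precision": 0.812, "recall": 0.812, "f_score": 0.812},
    "rbf": {"precision": 0.703, "recall": 0.838, "f_score": 0.764},
}

for kernel, row in table.items():
    computed = f_score(row["precision"], row["recall"])
    print(f"{kernel}: reported {row['f_score']:.3f}, computed {computed:.3f}")
```

The recomputed values agree with the reported F-scores to within about 0.001. The rbf kernel's lower F-score follows from its lower precision, despite its higher recall.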

CONCLUSION

The sentiment analysis of book reviews using a supervised technique is the primary focus of this work. We constructed a dataset of Amazon reviews and applied the widely known support vector machine (SVM) algorithm to it.

According to the findings, SVM with the rbf kernel achieved the higher accuracy (83.84% against 81.24%), while SVM with the linear kernel produced better results overall: its precision, recall and F-score were all 0.812, compared with a precision of 0.703 and an F-score of 0.764 for the rbf kernel.

REFERENCES

[1] V. Kaur, "Sentimental Analysis of Book Reviews using Unsupervised Semantic Orientation," 2018 Second International Conference on Green Computing and Internet of Things (ICGCIoT).

[2] P. Walia, V. K. Singh and M. K. Singh, "Evaluating machine learning and unsupervised semantic orientation approaches for sentiment analysis of textual reviews," 2012 IEEE International Conference on Computational Intelligence and Computing Research, 2012, pp. 1-6.

[3] A. Almjawel, S. Bayoumi, D. Alshehri, S. Alzahrani and M. Alotaibi, "Sentiment Analysis and Visualization of Amazon Books' Reviews," 2019 2nd International Conference on Computer Applications & Information Security (ICCAIS).

[4] K. S. Srujan, S. S. Nikhil, H. Raghav Rao, K. Karthik, B. S. Harish and H. M. Keerthi Kumar, "Classification of Amazon Book Reviews Based on Sentiment Analysis."

[5] M. Addanki and S. Saraswathi, "Classification of book reviews based on sentiment analysis: A Survey."

[6] J. E. T. Akinsola, "Supervised Machine Learning Algorithms: Classification and Comparison," International Journal of Computer Trends and Technology, vol. 48, no. 3, 2017.