Multi-Attribute Analysis of Kindle Reviews

DOI : 10.17577/IJERTV11IS100013


Meet Suthar
School of Engineering Technology, Purdue University
West Lafayette, United States

Raji Sundararajan
School of Engineering Technology, Purdue University
West Lafayette, United States

Gaurav Nanda
School of Engineering Technology, Purdue University
West Lafayette, United States

Abstract: Online reviews are often used to check ratings before buying a product. In this research, various attributes of customer reviews, such as customer sentiments, are analyzed for the Kindle Fire tablet E-reader. For this, a dataset of 469 publicly available Amazon reviews was used. To study the sentiments, the Latent Dirichlet Allocation topic model was used to obtain the various topics of interest. Sentiment analysis was performed to better comprehend the positive and negative tones of each topic. The main themes that emerged from topic analysis of customer reviews for Kindle included: (a) reading usage, (b) utility as a gift, (c) price, (d) parental control, (e) durability, and (f) charging. The methodology in the study is replicable for any product, and the findings from this research can assist product manufacturers and customers in better comprehending product successes and shortcomings based on a comprehensive analysis of consumer feedback.

Keywords: Sentiment Analysis; Topic Modeling; Customer Reviews; Latent Dirichlet Allocation; Business Intelligence

    1. INTRODUCTION

In 2020, nearly two billion individuals purchased products or services online, with global online sales exceeding 4.2 trillion dollars [1], compared to 3.78 trillion dollars in 2019. Global retail e-commerce sales increased by more than 25% during the epidemic year 2020-21 [1]. Internet users explore, compare, and purchase products or services using a variety of online platforms. While some websites cater particularly to business-to-business (B2B) customers, individual consumers have access to a wide range of digital options. Online marketplaces accounted for most online transactions worldwide as of 2019. In terms of traffic, Amazon is at the top of the global list of online shopping websites [1], [2].

Consumer reviews are found in a variety of internet formats, including pages for specific goods (such as video cameras), articles in newspapers or magazines (such as Rolling Stone and consumer reports), company and industry articles (such as on Amazon), and technical and user analysis pages in a variety of areas [2]. The Amazon rating system uses a scale of one to five stars, with one being the worst and five being the best. Individual ratings assign exact values to each review; thus, reviews may be accompanied by ratings that do not match their text [2]. Consumer feedback is also available on websites and forums such as Blogstreet.com, AllConsuming.net, and onfocus.com. Amazon is a pioneer in e-commerce, and it is used by consumers every day for online shopping. It receives hundreds of evaluations from customers on their favorite goods [3].

Marketers who want to attract new consumers and retain existing ones can analyze online reviews to understand their customers' sentiments, such as wants, requirements, and expectations, and thereby build a better and more engaging brand reputation. Consumers trust 4-star ratings the most, according to the Online Reviews Survey, followed by 4.5 and 5.0 stars [4]. According to the Womply research [4], firms with a 4.0 to 4.5-star rating earn an extra 28% in yearly income, but 5-star enterprises have lower-than-average sales and occasionally earn less than 1 to 1.5-star businesses. Customer reviews are not all created equal, especially regarding where they originate and how they are generated. Some review sites only accept verified evaluations that are linked to real-world transactions; others let anybody write a review without requiring evidence of purchase (unprompted reviews) [5].

The format of the reviews has also been found to play an important role, in addition to the content [5]. Customer reviews are generally gathered in two formats on online platforms: average ratings that summarize the product's general quality (i.e., numbered ratings, also known as star ratings) and personal evaluations that contain individual accounts of experiences with that particular product. The comparative significance of these various sorts of information is a point of discussion, and a deeper understanding of them is still evolving, with different findings from various studies. Based on a recent consumer poll, customers consider average ratings to be the most significant [5], [6]. According to a recent meta-analysis, the valence (average rating) and number of reviews were observed to be among the most important factors impacting sales and sentiments [4]. Positive evaluations raise sales and attitudes, whilst negative reviews decrease them. Their impact, however, has been found to depend on review exposure, reviewer qualities, and review source [6].

Although both positive and negative evaluations can influence customer behavior, some studies have found that their impact differs [6]. Purnawirawan found that negative reviews had the greatest impact on attitudes and usefulness, indicating that negative evaluations may be more influential than positive ones, a conclusion that is consistent with other communication studies [6].

Hong and Park [4] discovered that statistical and narrative information were equally persuasive, but Ziegele and Weber [7] found that, while average ratings were relevant, unique, vivid narratives outweighed them. This view is congruent with medical studies showing that anecdotal or narrative evidence of treatment quality can be more compelling than statistical data. Because consumers generally read only a limited number of reviews before deciding, focusing on the most recent ones, the question of how strongly single reviews impact behavior is particularly relevant [7]. Towards this, in this study, we analyzed the sentiments in reviews of the Kindle, as it has a distinct product following and belongs to a unique subcategory of electronic items, the E-readers. It is a widely used product, being the most popular e-reader among 91 million e-reader users in the US [8], and it has a sizeable number of product reviews.

For this purpose, we analyzed the various aspects of user feedback using Latent Dirichlet Allocation (LDA) topic modelling, interpreted the themes of generated topics using qualitative analysis and data visualization approaches, and determined the prevalent customer sentiments associated with each topic using sentiment analysis techniques. Recent studies have used topic models for analyzing various aspects of customer feedback and deriving deeper insights about customer preferences for different products and services, including airline services [9], rental accommodations [10], restaurants [11], and other purposes, including analyzing online reviews of competing products [12] and detecting spam reviews [13]. However, no previous study has been conducted on the online customer reviews for Kindle as a product using the LDA topic model, with theme interpretation using a combination of qualitative analysis, visualization, and sentiment analysis approaches.

Fig. 1. Flow diagram of the methodology

2. METHODS

To develop data modelling through unsupervised learning algorithms, a labeled dataset that includes reviews and star ratings is required. Yet, such datasets are rare and very hard to generate. In such cases, several studies have attempted to train unsupervised learning models on 5-core public datasets, which are verified and peer reviewed. This approach has demonstrated acceptable performance. Building on this idea, this study analyzes the Kindle E-reader through a three-stage process as follows:

1. Creating unique product topics using a topic modelling approach, using multiple topic models to generate and identify the precise number of required topics.

2. Generating topic themes by qualitatively analyzing topic words and popularity.

3. Generating sentiment scores for each topic and for the dataset, for understanding the sentiment of individual topic themes.

1. Data

The Kaggle Consumer Reviews of Amazon Products dataset consists of 39,000 unique reviews of different Amazon products [14]. The data fields include reviewer ID, product ASIN code, review helpfulness vote, review content, product overall rating, and review date. Out of these reviews, 469 reviews of the Amazon Kindle E-reader were used for the analyses [14]. The dataset is 5-core data, since it has at least 5 verified reviews for each user and product. Figure 2 shows a snapshot of the dataset.
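As an illustration, the 5-core property can be checked programmatically. The sketch below is a minimal check, not the dataset's actual loading code, and it assumes hypothetical field names (`reviewerID`, `asin`) for the review records:

```python
from collections import Counter

def is_five_core(reviews):
    """Check the 5-core property: every reviewer and every product
    in the dataset has at least 5 reviews (hypothetical field names)."""
    by_user = Counter(r["reviewerID"] for r in reviews)
    by_item = Counter(r["asin"] for r in reviews)
    return (all(c >= 5 for c in by_user.values())
            and all(c >= 5 for c in by_item.values()))

# Toy example: 5 reviews by one user of one product satisfy the property
sample = [{"reviewerID": "U1", "asin": "A1"} for _ in range(5)]
print(is_five_core(sample))      # True
print(is_five_core(sample[:4]))  # False: only 4 reviews per user/product
```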

      Fig. 2. Dataset format

Reviewer ID is the unique customer number from which the reviewer can be identified. The ASIN number is used to determine the product for which the review is posted, and the Verified data field indicates whether the review is verified by Amazon. Out of all these data fields, the Reviewtext field is used for the analysis because it contains the actual review posted for the product. All other data fields are eliminated, and the dataset is converted to text format for use in topic modelling. The text file was prepared with the review texts organized in a one-review-per-line manner.

      Data in the text file format was used for further processing. Various steps including text pre-processing, data visualization, coherence calculations, topic modelling, and results analysis were performed using various Python packages [15].

      The pre-processing steps are as follows:

• Lemmatization: The dictionary base word, known as the lemma, was generated from the phrases. For example, words such as "grab", "grabbing", "grabbed", and "grabs" were all changed to "grab".

      • Tokenization: The string of text was converted into a list of tokens by removing the punctuations and converting all words to lowercase.

• N-grams: Sequences of n words that had a higher probability of occurring together in the review data were created. We considered bigrams and trigrams in this study.

      • Part of Speech tagging: The part of speech tags of each word in the narrative were determined. Only the words that were adjectives, nouns, verbs, or adverbs were considered for further processing and others were eliminated.
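The pre-processing steps above can be sketched in simplified form. The study used standard Python NLP tooling; the snippet below is a stdlib-only illustration with a toy lemma lookup table (real lemmatizers draw on dictionary resources such as WordNet), showing tokenization, lemmatization, and bigram formation:

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def lemmatize(token, lemma_map):
    """Toy lemmatizer: map inflected forms to a base form via a lookup table."""
    return lemma_map.get(token, token)

def bigrams(tokens):
    """Form adjacent word pairs; frequent pairs would be kept as phrases."""
    return list(zip(tokens, tokens[1:]))

lemmas = {"grabbing": "grab", "grabbed": "grab", "grabs": "grab"}
text = "Grabbing the Kindle, she grabbed her books."
tokens = [lemmatize(t, lemmas) for t in tokenize(text)]
print(tokens)  # ['grab', 'the', 'kindle', 'she', 'grab', 'her', 'books']
```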

Next, the pre-processed data was given as input to the LDA topic model to generate the topic outputs; in this model, each document is characterized by a distribution of topics, and each topic by a distribution of words. Sentiment analysis was then performed on the topics generated by the LDA model.

      1. LDA topic Modeling

Latent Dirichlet Allocation topic modelling is an unsupervised machine learning technique for automatically organizing, analyzing, finding, and summarizing vast amounts of electronic data [16]. It is a statistical method for identifying latent semantic patterns, i.e., abstract "themes", that appear in a set of texts. It has been found very helpful in efficiently analyzing large collections of textual data, such as customer feedback, open-ended survey responses, document collections, and news articles [16]. Given that a document is about a specific topic, one might expect certain words to appear more or less frequently in it: for example, words such as "size" and "material" will appear more frequently in documents about clothing, words such as "drive" and "comfort" will appear in documents about cars, and common words such as "the" and "is" will appear roughly equally in both.

The LDA model assumes that each word in each document comes from a topic, and the topic is selected from a per-document distribution over topics. It decomposes the probability distribution matrix of words in documents into two matrices: the distribution of topics in a document and the distribution of words in a topic [17]. The LDA model requires the number of topics as an input. To determine the optimum number of topics, the CV Coherence measure was used, as it has been found to be well-correlated with human judgement about generated topics and has been recommended by previous studies [17]. For different numbers of topics, ranging from 2 to 20, CV Coherence was determined using the Gensim LDA topic model. Since previous studies have reported that the Mallet LDA model produces better topics than other LDA models, and that the Gensim LDA model is suitable for CV Coherence calculations, both were used in this study [17]. Based on these CV Coherence values, the number of topics corresponding to the maximum CV Coherence was determined and given as input to the Mallet LDA topic model [18].
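The topic-number selection step can be sketched as follows. The coherence scores below are hypothetical placeholders, since the real values come from running the Gensim LDA model; the sketch only shows picking the maximum and, since the results later discuss coherence peaks, locating local peaks:

```python
def best_num_topics(coherence_by_k):
    """Pick the candidate topic count with the highest CV coherence."""
    return max(coherence_by_k, key=coherence_by_k.get)

def peaks(coherence_by_k):
    """Topic counts whose coherence exceeds both neighbours (local peaks)."""
    ks = sorted(coherence_by_k)
    return [k for i, k in enumerate(ks[1:-1], 1)
            if coherence_by_k[k] > coherence_by_k[ks[i - 1]]
            and coherence_by_k[k] > coherence_by_k[ks[i + 1]]]

# Hypothetical coherence scores for a few candidate topic counts
scores = {2: 0.31, 4: 0.42, 6: 0.39, 9: 0.47, 15: 0.40, 20: 0.38}
print(best_num_topics(scores))  # 9
print(peaks(scores))            # [4, 9]
```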

The output of the Mallet topic model was further analyzed to determine the theme of each topic and was given as input to the sentiment analysis model to get the predominant sentiment of the review dataset as well as of each topic. The Mallet LDA model provides the following outputs: (a) a list of the top-20 words representative of each topic, and (b) the topic-wise composition of each review, which gives the weight (ranging from 0 to 1) associated with each topic, representing how prominently that topic was present in the review text. The theme of each generated topic was determined by qualitatively analyzing the top-20 words of the topic and examining the top 30 reviews that had the highest associated weight for that topic. Figure 3 depicts the working of LDA topic modelling.

        Fig. 3. LDA Model workflow
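The use of the composition output for theme interpretation can be sketched as below: given a per-review topic-weight matrix, the reviews loading most heavily on a topic are retrieved for qualitative reading. A toy matrix stands in for the Mallet composition file:

```python
def top_reviews_for_topic(composition, topic, n=3):
    """Rank review indices by their weight for one topic.
    composition: list of per-review topic-weight lists (rows sum to 1)."""
    ranked = sorted(range(len(composition)),
                    key=lambda i: composition[i][topic], reverse=True)
    return ranked[:n]

# Toy composition matrix: 4 reviews, 2 topics
comp = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]]
print(top_reviews_for_topic(comp, topic=0, n=2))  # [0, 2]
print(top_reviews_for_topic(comp, topic=1, n=2))  # [3, 1]
```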

      2. Sentiment Analysis

Sentiment analysis is a text analysis technique used to identify positive or negative views (polarity) in a text, whether it is a full document, a paragraph, a phrase, or a clause. Sentiment analysis quantifies a speaker's or writer's attitudes, sentiments, assessments, and emotions using a computational treatment of subjectivity in a text [19]. For conducting sentiment analysis, we employed the TextBlob model. TextBlob is a Natural Language Processing (NLP) module that allows for in-depth textual data analysis and processing and uses the Natural Language ToolKit (NLTK) for the backend analysis [19]. NLTK is a library that supports categorization, classification, and a variety of other tasks by providing simple access to a large number of lexical resources. The semantic direction of a sentiment is determined by the strength of each word in the phrase. This necessitates the use of a pre-defined lexicon that categorizes negative and positive terms. A text message is often expressed by a collection of words; following the assignment of individual scores to all of the words, the final sentiment is computed using a pooling procedure, such as the average of all of the word sentiments. TextBlob returns the polarity and subjectivity of a statement. The range of polarity is -1 to 1, with -1 indicating a negative sentiment and 1 indicating a positive sentiment. Negation words, which commonly flip the polarity of a sentence, are factored in during the calculations. Semantic labels in TextBlob, such as emoticons, exclamation marks, and emojis, aid in fine-grained analysis. Subjectivity, which lies in the range [0, 1], measures the intensity of personal opinion versus factual information in the text: the higher the subjectivity, the more the text reflects personal opinion rather than factual information. Subjectivity is calculated using TextBlob's intensity feature, where the intensity of a word determines whether it modifies the next word; for example, adverbs are used as modifiers (e.g., "very nice").
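The lexicon-and-pooling idea can be sketched as follows. This is not TextBlob's actual lexicon or algorithm, only a minimal illustration of averaging word-level scores while flipping the sign of a word preceded by a negator; the lexicon entries and scores are invented for the example:

```python
LEXICON = {"great": 0.8, "love": 0.7, "bad": -0.7, "slow": -0.4}  # toy scores
NEGATORS = {"not", "never", "no"}

def polarity(tokens):
    """Average the word-level scores, flipping a word's sign when it
    directly follows a negator (a much-simplified lexicon-based sketch)."""
    scores = []
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            s = LEXICON[tok]
            if i > 0 and tokens[i - 1] in NEGATORS:
                s = -s
            scores.append(s)
    return sum(scores) / len(scores) if scores else 0.0

print(polarity("i love this great reader".split()))  # (0.7 + 0.8) / 2 = 0.75
print(polarity("not great and slow".split()))        # negative overall
```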

The sentiment analysis was performed at two different levels to better understand both the overall view and each topic. At the first level, the analysis was performed on the entire review dataset to obtain the overall sentiment of the reviews without dividing them into topics. At the second level, the data was split into the different topics obtained from topic modelling, and sentiment analysis was performed on the topics generated by the Mallet LDA model. This approach helps in understanding how people felt about each topic and how polarized they were. A better grasp of the overall tone, and of missing tones, of the subjects was obtained by comparing the sentiment analysis outputs to the themes established during the qualitative analysis.
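The second-level (per-topic) aggregation can be sketched as: assign each review to its dominant topic and average the review polarities within each topic. The polarity values and topic labels below are toy stand-ins, not values from the study:

```python
from collections import defaultdict

def topic_sentiment(polarities, dominant_topics):
    """Mean review polarity per dominant topic (second-level analysis)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for p, t in zip(polarities, dominant_topics):
        sums[t] += p
        counts[t] += 1
    return {t: sums[t] / counts[t] for t in sums}

# Toy data: review polarities and the topic each review loads on most
pols = [0.8, 0.5, -0.3, 0.1]
tops = ["gift", "gift", "charging", "charging"]
print(topic_sentiment(pols, tops))  # gift ≈ 0.65, charging ≈ -0.1
```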

    3. RESULTS AND DISCUSSION

The Gensim LDA model was run to obtain the CV Coherence values for different numbers of topics, from 2 to 20. This range was selected as it reasonably covers the minimum to the maximum number of topics given the size of the dataset. CV Coherence calculations were performed for the entire review dataset of 469 reviews. The CV Coherence values corresponding to different numbers of topics are presented graphically in Figure 4.

Fig. 4. CV Values distribution

Figure 4 shows that the CV Coherence values for 2 to 20 topics for the entire dataset had large variability. This indicates that as data size increases, the topics may become more coherent, and thus a better understanding of the topics can be obtained. It can also be noted from Figure 4 that the first peak in the CV Coherence value occurs at 4 topics, and the next at 9 topics. Mallet LDA modelling was performed for both 4 and 9 topics, and the outputs were compared to identify any repetitions in the themes. It was found that the output with 9 topics contained all the topics present in the output with 4 topics. Thus, 9 topics were used for further analysis, and the corresponding Mallet outputs were analyzed.

  1. Mallet Topic Modeling Output

The MALLET library used for performing the LDA topic model analysis generates two main outputs. The first, known as keys, contains the top-20 words of each topic. The second, known as composition, gives the weight of each topic in each review. These two files were obtained for 9 topics. Table 1 shows a comparison and analysis of the two.

Here, T1-T9 are the topics generated for the Kindle reviews with the 9-topic model. Each topic is assigned a topic theme, which represents the overall interpretation of the words in that topic. Among the 9 topics, each describes user sentiments about a certain aspect. For example, topic T1 focuses on reading, and topic T4 talks more about gifting the tablet. The other topics correspond to other features of the product and its uses. It can be observed that the 9 topics give more detailed information about the product than other analyses, such as star ratings, and also reveal more tones in the reviews.

TABLE I. TOPIC THEMES IDENTIFIED FOR 9 TOPICS FROM MALLET OUTPUT

| Topic | Topic Theme | Top words | Theme Label | Weightage |
|---|---|---|---|---|
| T1 | The tablet is great for reading, movies, music and games. | games, great, books, play, reading, kindle, tablet, love, read, watch, easy, movies, playing, lot, it's, money, web, happy, music, memory | Read | 0.12521 |
| T2 | The tablet doesn't support many apps other than the downloaded ones, and the navigation is not friendly. | great, user, can't, friendly, things, navigate, highly, apps, school, buy, downloaded, find, daughter, reading, it's, facing, front, wouldn't, worked, returned | Apps & navigation | 0.04849 |
| T3 | There are problems with charging, speakers and apps in this version of the Kindle. | product, good, store, apps, generation, kindle, amazon, google, sound, app, work, purchased, quality, system, speakers, doesn't, drawback, charging, fan, settings | Drawback | 0.0562 |
| T4 | The tablet is of great price and is good for Christmas gifts for kids. | tablet, great, loves, kindle, fire, easy, gift, good, kids, amazon, price, son, year, product, christmas, size, read, time, perfect, love | Gift | 0.31554 |
| T5 | Videos take a lot of time to load, and the programming of the tablet failed, so it was returned and replaced. | buy, superb, nice, registered, replacement, devices, online, videos, taking, time, programming, failed, yokod, cable, usb-mini, suggest, outstanding, reservations, making, finding | Replacement | 0.01388 |
| T6 | The tablet is of great price and is good for Christmas gifts for kids. | tablet, price, screen, it's, works, perfect, good, reader, beat, bought, replace, internet, phone, basic, can't, nexus, browsing, size, fits, purchase | Price | 0.09981 |
| T7 | There are a lot of parental control options in the tablet. | tablets, tablet, i've, price, purchased, case, bought, year, problems, fire, parental, tag, times, held, android, features, controls, card, find, display | Parental | 0.05329 |
| T8 | There are some durability concerns with the tablet, but the customer service was good. | i'm, version, book, fire, series, ease, set, books, purchase, faster, home, change, figure, fine, miss, buy, durable, amazon, box, pick | Durable | 0.03083 |
| T9 | There is some problem with charging, but the replacement service was good. | buy, space, charge, life, past, battery, sales, phone, device, color, kids, associate, helpful, deal, taking, warranty, port, upgrade, hard, charging | Charging | 0.04553 |

  2. Topic Popularity

The popularity of these topics can be measured by the proportion of each topic in the entire corpus. A difference among those proportions is evidence that some topics are more likely to occur than others. As seen in Table 1, the LDA output weightage of a topic indicates its popularity in the corpus. Figure 5 shows a corpus-wide comparison of topic popularity for the Mallet outputs with 4 topics and with 9 topics.

    Fig. 5. Topic Popularity graph

Figure 5 indicates that T1 is the most-mentioned topic in the 4-topic output. Similarly, T4 is the most-mentioned topic in the 9-topic output. Both concern the gift aspects of the Kindle, where the reviews strongly associated with this topic suggest that the product is good for gifting and for kids.
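Using the weightages reported in Table 1, the popularity ranking for the 9-topic output can be reproduced directly:

```python
# Topic weightages from Table 1 (corpus-wide proportion of each topic)
weightage = {"T1": 0.12521, "T2": 0.04849, "T3": 0.0562, "T4": 0.31554,
             "T5": 0.01388, "T6": 0.09981, "T7": 0.05329, "T8": 0.03083,
             "T9": 0.04553}

most_popular = max(weightage, key=weightage.get)
print(most_popular)  # T4, the gifting topic, dominates the corpus
```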

  3. Topic visualization

For topic visualization, we used PyLDAvis [20], an open-source visualization library for presenting topic models, which helps in analyzing and creating highly interactive visualizations of the topics created by LDA.

PyLDAvis generates two output graphs. The first is called the Intertopic Distance map. It shows the overlap between topics in the output, from which it can be identified which topics overlap and to what extent. The second output is the word frequency distribution. It shows the words in a particular topic and the frequency of each word in the topic as well as in the overall dataset. The Intertopic Distance map for the 9-topic LDA model is presented in Figure 6, and the word frequency distribution for Topic 2 in the 9-topic LDA model is presented in Figure 7.

In Figure 6, the Intertopic Distance map, each bubble represents a topic. The larger the bubble, the higher the percentage of words in the corpus belonging to that topic. The further the bubbles are from each other, the more different the topics are. For example, it is difficult to tell the difference between topics 1 and 2, which are both about price, but it is much easier to tell the difference between topics 7 and 8. From Table 1, we can see that topic 7 is about parental control and topic 8 is about durability, which are two distinct aspects of the product and are not likely to have overlapping topic words.
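The distances behind the intertopic map can be illustrated with the Jensen-Shannon divergence between topic-word distributions, which pyLDAvis (by default) projects into two dimensions; the distributions below are toy stand-ins for a pair of similar "price-like" topics and one distinct topic:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence in bits, skipping zero-probability terms."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two topic-word distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

price_a = [0.5, 0.3, 0.1, 0.1]     # toy distributions over a 4-word vocabulary
price_b = [0.4, 0.4, 0.1, 0.1]
parental = [0.05, 0.05, 0.5, 0.4]

# Similar topics sit close together; distinct topics sit far apart
print(js_divergence(price_a, price_b) < js_divergence(price_a, parental))  # True
```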

    Fig. 6. PyLDAvis Intertopic Distance Map Fig. 7. PyLDAvis words frequency distribution

In Figure 7, the word frequency distribution for topic 2 is shown. The list of words on the left represents the top-20 words of topic 2. The red bars represent the estimated number of times a given term was generated by the selected topic, which in the case of Figure 7 is topic 2; the word with the longest red bar is the word used most by the reviews belonging to that topic. The blue bars represent the overall frequency of each word in the corpus. As shown in Figure 7, the value was about 250 for the word "love", indicating that this term was used about 250 times within topic 2. If no topic is selected in PyLDAvis, the blue bars of the most frequently used words in the entire corpus are displayed.

We also visualized each topic with its top-20 words in the form of WordClouds [21], where the font size of each topic word represents its relative weight in the topic. WordClouds are a good visual form for presenting topics, as they highlight the most representative words of the topic and can help in theme interpretation. The WordCloud for each topic in the 9-topic LDA model is presented in Figure 8.

    As shown in Figure 8, sometimes 1-2 words dominate a topic (e.g. topic 9), and other times, there is a more even distribution of weights among topic words (e.g., Topic 6). The weights of different topic words are determined by the LDA model based on statistical processing of their frequency and co-occurrence with other topic words. Visualizing this distribution of topic word weightage makes the interpretation of topic model results more nuanced and sophisticated.

Fig. 8. Topic visualization using WordClouds for the 9-topic LDA model

  4. Sentiment Analysis

Figures 9 and 10 show the polarity distribution of the reviews and the subjectivity distribution of the entire dataset, respectively. Polarity gives a statistical value of how positive, neutral, or negative a review text is. Subjectivity shows how opinion-oriented the review is. It can be observed that the majority of this dataset tends towards positive sentiment, which correlates well with the star ratings of the dataset (Figure 11). The subjectivity of the dataset is distributed around a mean of 0.6. These observations were examined through sentiment analysis of the reviews at two levels: first, the entire dataset, and second, each topic.

Fig. 9. Product polarity score distribution

Fig. 10. Product subjectivity score distribution

Fig. 11. Kindle star rating distribution

Figure 12 shows the sentiment distribution for the entire dataset, and Figure 13 shows the sentiment analysis with the subjectivity of each review represented by the size of its dot. In Figure 13, the x-axis represents the polarity of the reviews and the y-axis represents their subjectivity. The green dots represent neutral reviews, and they lie between the negative and positive dots. The red dots represent negative reviews, and the blue dots represent positive reviews. The size of each dot represents the subjectivity of the review. Overall, there were considerably more positive reviews than negative ones.
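The positive/neutral/negative bucketing underlying such a distribution can be sketched as follows, with a hypothetical neutrality threshold `eps` and invented polarity values (the study's actual scores come from TextBlob):

```python
from collections import Counter

def classify(polarity, eps=0.05):
    """Bucket a TextBlob-style polarity score in [-1, 1] into a label;
    eps is a hypothetical band treated as neutral."""
    if polarity > eps:
        return "positive"
    if polarity < -eps:
        return "negative"
    return "neutral"

# Toy polarity scores, skewed positive like the review dataset
scores = [0.8, 0.6, 0.3, 0.0, -0.2, 0.4, 0.02]
dist = Counter(classify(s) for s in scores)
print(dist)  # positives outnumber neutrals and negatives
```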

Fig. 12. Product sentiment distribution

Fig. 13. Sentiment analysis results for the entire review dataset

CONCLUSION

In this study, we used topic modeling and sentiment analysis to analyze a large collection of product reviews and to interpret the themes of predominant topics using qualitative analysis and topic visualization. Sentiment analysis of user reviews helps in understanding the polarity and subjectivity aspects of the reviews and in determining whether the attributes discussed in the reviews were mostly positive, neutral, or negative. While prior research has largely focused on quantitative (e.g., review length) and textual (e.g., readability) characteristics, we delved deeper and identified the main topics people were talking about in online Kindle reviews using LDA topic modeling. Our findings highlight the necessity of uncovering the latent (hidden) themes that underpin online reviews in order to improve consumer review systems.

We also demonstrated that topic visualization and sentiment analysis can help in developing a more nuanced interpretation of topic themes. Our research identified interpretable themes that customers are predominantly talking about across a large set of Kindle text reviews, such as reading usage, gifting, price, parental control, durability, and charging. The findings will be useful for online customers and businesses for discerning useful information from a large collection of customer reviews.

There are a few limitations to our research. First, our dataset was not very large; we considered only 469 reviews of Kindle out of 39,000 reviews of different Amazon products. Future studies can analyze a larger dataset in order to analyze the product reviews better and identify more nuanced themes. Second, to create the topic themes, qualitative analysis was performed, which is time consuming and involves trial and error. Although topic visualization helped in the qualitative interpretation of topic themes, better and more standardized ways of conducting qualitative analysis of generated topics can be researched to make this process more efficient. Third, the current data does not include demographic data of reviewers. Future work can include demographic data to better understand theme changes and customer preferences based on countries and locations.

Fig. 14. Topic-based sentiment analysis

REFERENCES

[1] Quarterly Retail E-Commerce Sales 2nd Quarter 2021, http://www.census.gov/retail/ (accessed: Sep. 14, 2021).

[2] Y. Heng, Z. Gao, Y. Jiang, and X. Chen, Exploring hidden factors behind online food shopping from Amazon reviews: A topic mining approach, Journal of Retailing and Consumer Services, vol. 42, pp. 161–168, May 2018, doi: 10.1016/J.JRETCONSER.2018.02.006.

[3] B. von Helversen, K. Abramczuk, W. Kope, and R. Nielek, Influence of consumer reviews on online purchasing decisions in older and younger adults, Decision Support Systems, vol. 113, pp. 1–10, Sep. 2018, doi: 10.1016/J.DSS.2018.05.006.

[4] S. Hong and H. S. Park, Computer-mediated persuasion in online reviews: Statistical versus narrative evidence, Computers in Human Behavior, vol. 28, no. 3, pp. 906–919, May 2012, doi: 10.1016/J.CHB.2011.12.011.

[5] S. K. Roy, M. S. Balaji, A. Quazi, and M. Quaddus, Predictors of customer acceptance of and resistance to smart technologies in the retail sector, Journal of Retailing and Consumer Services, vol. 42, pp. 147–160, May 2018, doi: 10.1016/J.JRETCONSER.2018.02.005.

[6] N. Purnawirawan, M. Eisend, P. de Pelsmacker, and N. Dens, A Meta-analytic Investigation of the Role of Valence in Online Reviews, Journal of Interactive Marketing, vol. 31, Aug. 2015, doi: 10.1016/j.intmar.2015.05.001.

[7] M. Ziegele and M. Weber, Example, please! Comparing the effects of single customer reviews and aggregate review scores on online shoppers' product evaluations, Journal of Consumer Behaviour, vol. 14, no. 2, pp. 103–114, 2014, doi: 10.1002/cb.1503.

[8] E-Readers – Statistics & Facts. [Online]. Available: https://www.statista.com/topics/1488/e-readers/#dossierKeyfigures.

[9] Lucini, F. R., Tonetto, L. M., Fogliatto, F. S., & Anzanello, M. J. (2020). Text mining approach to explore dimensions of airline customer satisfaction using online customer reviews. Journal of Air Transport Management, 83, 101760.

[10] Ding, K., Choo, W. C., Ng, K. Y., & Ng, S. I. (2020). Employing structural topic modelling to explore perceived service quality attributes in Airbnb accommodation. International Journal of Hospitality Management, 91, 102676.

[11] Kwon, W., Lee, M., & Back, K. J. (2020). Exploring the underlying factors of customer value in restaurants: A machine learning approach. International Journal of Hospitality Management, 91, 102643.

[12] Wang, W., Feng, Y., & Dai, W. (2018). Topic analysis of online reviews for two competitive products using latent Dirichlet allocation. Electronic Commerce Research and Applications, 29, 142- 156.

[13] Wang, Z., Gu, S., & Xu, X. GSLDA: LDA-based group spamming detection in product reviews. Appl Intell 48, 3094–3107 (2018). https://doi.org/10.1007/s10489-018-1142-1

[14] B. Srigiriraju, Amazon reviews: Kindle Store Category | Kaggle, 2018. https://www.kaggle.com/bharadwaj6/kindle-reviews (accessed Sep. 14, 2021).

[15] G. Nanda, N. M. Hicks, D. R. Waller, D. Goldwasser, and K. A. Douglas, Understanding Learners' Opinion about Participation Certificates in Online Courses using Topic Modeling, International Conference on Educational Data Mining (EDM), vol. 11, Jul. 2018, ERIC: ED593201.

[16] D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent Dirichlet Allocation, Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.

[17] Selva Prabhakaran, Gensim Topic Modeling – A Guide to Building Best LDA models, 2018. https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/#14computemodelperplexityandcoherencescore (accessed Sep. 14, 2021).

[18] Senol Kurt, Topic Modeling: LDA Mallet Implementation in Python Part 1. https://medium.com/@kurtsenol21/topic-modeling-lda-mallet-implementation-in-python-part-1-c493a5297ad2 (accessed Sep. 14, 2021).

[19] Sentiment Analysis in Python: TextBlob vs Vader Sentiment vs Flair vs Building It From Scratch – neptune.ai. https://neptune.ai/blog/sentiment-analysis-python-textblob-vs-vader-vs-flair (accessed Sep. 16, 2021).

[20] B. Mabey, Welcome to pyLDAvis's documentation! pyLDAvis 2.1.2 documentation. https://pyldavis.readthedocs.io/en/latest/ (accessed Sep. 14, 2021).

[21] H. Sharma, Topic Model Visualization using pyLDAvis | by Himanshu Sharma | Towards Data Science, 2021. https://towardsdatascience.com/topic-model-visualization-using-pyldavis-fecd7c18fbf6 (accessed Sep. 14, 2021).

[22] R. Y. Kim, When does online review matter to consumers? The effect of product quality information cues, doi: 10.1007/s10660-020-09398-0.

[23] D. Kaemingk, Online Review Statistics to Know in 2021 // Qualtrics. https://www.qualtrics.com/blog/online-review-stats/ (accessed Sep. 14, 2021).

[24] C. Bloem, 84 Percent of People Trust Online Reviews As Much As Friends. Here's How to Manage What They See | Inc.com. https://www.inc.com/craig-bloem/84-percent-of-people-trust-online-reviews-as-much-.html (accessed Sep. 14, 2021).

[25] D. Sarkar, Text Analytics with Python, Text Analytics with Python, 2016, doi: 10.1007/978-1-4842-2388-8.

[26] G. Nanda, K. A. Douglas, D. R. Waller, H. E. Merzdorf, and D. Goldwasser, Analyzing Large Collections of Open-Ended Feedback from MOOC Learners Using LDA Topic Modeling and Qualitative Analysis, IEEE Transactions on Learning Technologies, vol. 14, no. 2, pp. 146–160, Apr. 2021, doi: 10.1109/TLT.2021.3064798.

[27] H. R. Marriott and M. D. Williams, Exploring consumers' perceived risk and trust for mobile shopping: A theoretical framework and empirical study, Journal of Retailing and Consumer Services, vol. 42, pp. 133–146, May 2018, doi: 10.1016/J.JRETCONSER.2018.01.017.

[28] S. Bashir, S. Anwar, Z. Awan, T. W. Qureshi, and A. B. Memon, A holistic understanding of the prospects of financial loss to enhance shoppers' trust to search, recommend, speak positive and frequently visit an online shop, Journal of Retailing and Consumer Services, vol. 42, pp. 169–174, May 2018, doi: 10.1016/J.JRETCONSER.2018.02.004.

[29] Bloomberg Businessweek – July 29, 2013 Issue, Bloomberg Businessweek, 2013.