OCR-Based Sentiment Classification Web Application with DistilBERT and Logistic Regression Baseline

DOI : 10.17577/IJERTCONV14IS010015

Chaitra S

Dept. of Computer Applications St Joseph Engineering College An Autonomous Institution Vamanjoor, Mangaluru, India

Ms Priyadarshini P

Assistant Professor

Dept. of Computer Applications St Joseph Engineering College An Autonomous Institution Vamanjoor, Mangaluru, India

Mr. Hareesh B Associate Professor Department of Computer Applications

St Joseph Engineering College Vamanjoor, Mangalore, Karnataka

Abstract – The OCR-Enabled Sentiment Classification Web Application is a full-stack system that classifies sentiment from both textual and image-based inputs by integrating Optical Character Recognition (OCR) with Natural Language Processing (NLP) techniques. The system uses a fine-tuned DistilBERT transformer model for sentiment prediction, with a logistic regression model as a baseline for performance benchmarking. Tesseract OCR extracts text from image inputs such as scanned documents, handwritten notes, and screenshots, enabling sentiment analysis of image-based content. The application has a modular architecture consisting of a ReactJS frontend, a Flask-based backend, and MongoDB for storing user data and prediction history. Training used a combination of the IMDB and Yelp datasets and a custom "tricky" dataset containing context-sensitive phrases such as "not bad" and "just okay" to improve the model's handling of real-world language. The DistilBERT model achieved a macro F1-score of 96.99%, outperforming the logistic regression baseline. Users can enter plain text or upload an image and immediately receive a sentiment result classified as Positive, Neutral, or Negative. The application also supports user authentication, a dashboard with sentiment history, and real-time feedback. Designed for customer review analysis, survey processing, and market research, the system offers a scalable and user-friendly solution for organizations that need to interpret user sentiment across diverse input formats.

Keywords – Sentiment Analysis, Optical Character Recognition (OCR), DistilBERT, Tesseract, Logistic Regression, Natural Language Processing (NLP)

  1. INTRODUCTION

    In an age driven by public opinion, understanding sentiment has become essential across industries, from e-commerce and product reviews to market analysis and social media monitoring. With the increasing volume of both textual and image-based feedback (such as scanned receipts, handwritten notes, or screenshots), traditional sentiment analysis tools fall short due to limited input support and shallow linguistic understanding. A major challenge in current sentiment analysis systems is their dependency on plain text inputs and on rule-based or shallow machine learning models like Naive Bayes or logistic regression [1][2]. These models often misinterpret nuanced expressions and struggle with ambiguous phrases such as "not bad," "meh," or "could be worse." Additionally, existing systems typically do not support sentiment extraction from image data, requiring users to manually transcribe text, a time-consuming and error-prone process. To address these limitations, we introduce a hybrid sentiment classification system that combines Optical Character Recognition (OCR) with a deep learning model based on DistilBERT, a lightweight transformer architecture [4] known for its contextual accuracy and efficiency. The tool accepts both typed and image-based user input, uses OCR (Tesseract [7]) to extract text from images, cleans the content, and classifies it as Positive, Neutral, or Negative.

    What makes this system unique is not only the integration of OCR and NLP but also its focus on real-world usability through a user-friendly web interface, login functionality, prediction history, and the inclusion of a tricky custom dataset designed to test ambiguous phrases. Our application offers a complete and deployable full-stack solution for intelligent sentiment extraction across diverse input formats.

  2. RELATED WORK

    Recent advances in sentiment analysis have shown significant improvements through the adoption of deep learning, particularly transformer-based architectures such as BERT and its variants. Traditional machine learning approaches and rule-based systems remain widely studied as lightweight alternatives, especially for structured and clean text. However, challenges remain in processing ambiguous language and non-textual inputs like scanned forms or images.

    1. Rule-Based and Traditional Machine Learning Approaches

      Early sentiment analysis methods relied heavily on rule-based systems such as TextBlob and VADER, which used predefined sentiment lexicons and grammatical heuristics. These systems performed well for binary sentiment classification but struggled with complex linguistic structures, sarcasm, and context-sensitive phrases. Logistic Regression and Support Vector Machines (SVM) became popular as lightweight alternatives, especially when combined with TF-IDF or Bag-of-Words features. While these models provided acceptable accuracy for simple datasets, they underperformed on inputs with nuanced or ambiguous sentiments [1][2].

    2. Transformer-Based Models for Contextual Understanding

      The introduction of transformer architectures, particularly BERT and its optimized variant DistilBERT, revolutionized natural language understanding by leveraging self-attention mechanisms for contextual learning. Studies such as Devlin et al. [3] and Sanh et al. [4] demonstrated that fine-tuned BERT models consistently outperform traditional models across sentiment classification benchmarks like IMDB, Yelp, and Amazon reviews.

    3. OCR and Multimodal Sentiment Extraction

      OCR technologies such as Tesseract have been successfully integrated into NLP pipelines to enable text extraction from images and scanned documents. Smith [7] provided a comprehensive overview of the Tesseract OCR engine and its capabilities in recognizing printed and handwritten text. However, few studies have combined OCR with sentiment classification in a full-stack deployable setting. Existing tools typically assume clean digital text as input, excluding a significant portion of real-world feedback data captured in screenshots, receipts, or handwritten forms.

    4. End-to-End Sentiment Systems and Dataset Challenges

      End-to-end sentiment analysis platforms that combine frontend interfaces, backend logic, and machine learning pipelines remain underexplored in academic literature. Most prior work focuses either on backend model performance or on static dataset evaluation. Moreover, custom datasets containing ambiguous phrases like "not bad," "meh," or "just okay" are rarely included, despite being prevalent in user-generated content. This gap highlights the need for systems that not only classify sentiment accurately but also handle tricky language and non-text inputs.

    5. Comparative Context

    Unlike existing models that handle only digital text, the proposed system integrates Tesseract OCR with a fine-tuned DistilBERT model for real-time sentiment classification on both text and image-based inputs. While logistic regression is used as a baseline, the transformer-based approach clearly outperforms it in handling ambiguous and context-dependent phrases. Furthermore, the full-stack web implementation using ReactJS, Flask, and MongoDB[10][11] enables user interaction, history tracking, and sentiment logging in a practical, scalable environment.

  3. METHODOLOGY

    1. System Architecture

      The OCR-Enabled Sentiment Analysis Web Application is designed to be a scalable and responsive platform that combines both traditional and modern Natural Language Processing (NLP) approaches. The primary objective of this application is to automate the sentiment classification of both typed text and images by integrating Optical Character Recognition (OCR) with deep learning techniques. The application offers a user-friendly interface and includes essential features such as login, sign-up, text and image input, real-time sentiment analysis output, and a user dashboard to view sentiment history.

      The system architecture has four major components: the frontend, the backend, the OCR module, and the sentiment classification engine. These layers communicate through RESTful APIs. Users interact with a ReactJS frontend, which sends requests to a Flask backend that handles authentication, model inference, text processing, and database operations. All user data, including input history and prediction outputs, is stored in MongoDB. For image-based input, Tesseract OCR extracts the text from the uploaded image. Final sentiment classification is performed by a fine-tuned DistilBERT transformer model, which is better suited to context-rich and tricky phrases.

    2. Data Collection

      To build a robust sentiment analysis model, we combined open-source and custom-built datasets so that the Positive, Neutral, and Negative classes were all well covered. The IMDB Movie Reviews dataset [9], originally a collection of 50,000 labeled reviews for binary classification, was adapted to three sentiment classes. We also added the Yelp Review dataset [8], mapping ratings of 1-2 stars to Negative, 3 stars to Neutral, and 4-5 stars to Positive, which added diversity in tone and real-world expressions.
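
The star-to-class mapping described above can be sketched as a small function. This is an illustration only: the function name is hypothetical, and the integer codes follow the 0/1/2 label encoding the paper gives for preprocessing. How the IMDB reviews were adapted to three classes is not specified in the paper, so it is not shown.

```python
# Hypothetical helper illustrating the Yelp mapping stated in the text:
# 1-2 stars -> Negative, 3 stars -> Neutral, 4-5 stars -> Positive.
LABEL_NAMES = {0: "Negative", 1: "Neutral", 2: "Positive"}


def stars_to_label(stars: int) -> int:
    """Return 0 (Negative), 1 (Neutral), or 2 (Positive) for a 1-5 star rating."""
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError(f"expected a rating from 1 to 5, got {stars!r}")
    if stars <= 2:
        return 0
    if stars == 3:
        return 1
    return 2


if __name__ == "__main__":
    for rating in range(1, 6):
        print(rating, LABEL_NAMES[stars_to_label(rating)])
```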

      To further improve the model's ability to handle nuanced or ambiguous language, we curated a custom "tricky" dataset. It included context-sensitive phrases like "not bad," "meh," and "just okay," which are often misinterpreted by traditional models. These examples helped test the model's understanding of subtle sentiment shifts.

      All datasets were preprocessed in a consistent way: converting text to lowercase, removing punctuation and HTML tags, and filtering out stopwords using NLTK [6]. Sentiment labels were then encoded as 0 (Negative), 1 (Neutral), and 2 (Positive) to work smoothly with both classical ML algorithms and transformer-based models.
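
A minimal sketch of the preprocessing steps listed above, using only the standard library. The function name and the short stopword set are assumptions made for illustration; the paper uses NLTK's stopword list, which is larger, and does not state the order in which the steps were applied.

```python
import re
import string

# Illustrative stand-in for NLTK's English stopword list (an assumption,
# chosen so that the sketch runs without downloading any corpus).
STOPWORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "of", "to"}

TAG_RE = re.compile(r"<[^>]+>")                          # matches HTML tags such as <br />
PUNCT_TABLE = str.maketrans("", "", string.punctuation)  # deletes ASCII punctuation


def preprocess(text: str) -> str:
    """Lowercase, strip HTML tags and punctuation, and drop stopwords."""
    text = text.lower()
    text = TAG_RE.sub(" ", text)        # replace tags with a space so words stay separated
    text = text.translate(PUNCT_TABLE)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    print(preprocess("This movie was <br />NOT bad!"))
```

One design point worth checking in any such pipeline: NLTK's default English stopword list contains negations such as "not", so filtering with it unchanged would turn "not bad" into "bad". The paper does not say whether negations were kept.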

    3. Technologies

      The OCR-Enabled Sentiment Analysis Web Application was developed as a full-stack web application with a set of open-source tools. It was developed on Windows with Visual Studio Code as the primary code editor, and its design allows it to be deployed on both Windows and Linux servers.

      On the frontend, the interface was built with ReactJS, a JavaScript library with a component-based architecture for creating dynamic user interfaces. TailwindCSS was used for styling so that the interface is responsive and usable across devices. JavaScript and Axios handle interactivity and API communication between the frontend and the backend services.

      For the backend, the application uses Flask, a lightweight Python web framework. It handles core functionality such as request routing, user authentication, image upload processing, and communication with the sentiment model. All backend functionality is contained in a single file, app.py, which keeps the backend easy to maintain.

      For sentiment analysis of images, Tesseract OCR [7] is used in the backend. It is an open-source OCR engine that extracts text from uploaded pictures or screenshots, so that both typed and image inputs can be analyzed. The extracted text is cleaned before being passed to the sentiment classifier for prediction.
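
The extract-then-clean step can be sketched as follows. This assumes the pytesseract wrapper and Pillow are installed and that the Tesseract binary is on the system path; the paper names Tesseract but not the Python binding, and both function names are hypothetical. The cleaning rules shown (dropping control characters, collapsing whitespace) are one plausible choice, not the authors' documented procedure.

```python
import re


def extract_text(image_path: str) -> str:
    """Run Tesseract on an image file and return the raw recognized text."""
    # Imported lazily so that clean_ocr_text() works without OCR installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img)


def clean_ocr_text(raw: str) -> str:
    """Replace control characters with spaces and collapse runs of whitespace."""
    # OCR output usually contains hard line breaks and may end with a form feed.
    no_control = "".join(
        ch if ch.isprintable() or ch.isspace() else " " for ch in raw
    )
    return re.sub(r"\s+", " ", no_control).strip()


if __name__ == "__main__":
    print(clean_ocr_text("Not  bad\nat all!\n\x0c"))
```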

      MongoDB [11], a NoSQL document database, stores users' account information and sentiment history. Because it is schema-less, it easily accommodates the varied data produced by each user's activity.

      The system includes two machine learning models. A Logistic Regression model with TF-IDF features, built with Scikit-learn [5], was trained first; fast and simple, it served mainly as a baseline. The production system deploys a fine-tuned DistilBERT model using the Hugging Face Transformers package [4][10]. Its transformer architecture makes it better able to interpret complex sentence structures, sarcasm, and contextual cues that the Logistic Regression model handles poorly.
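
To make the baseline's limitation concrete, the sketch below computes TF-IDF weights in plain Python. It is not the authors' code: the function name is hypothetical, and the formula (smoothed inverse document frequency followed by L2 normalization) mirrors the documented defaults of Scikit-learn's TfidfVectorizer rather than any setting stated in the paper.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1.0
           for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {term: tf * idf[term] for term, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


if __name__ == "__main__":
    first, second = tfidf_vectors(["not bad", "bad not"])
    print(first == second)  # word order is invisible to unigram TF-IDF
```

Because a unigram bag-of-words vector ignores word order, a linear classifier on top of it sees "not" and "bad" as two independent features, which is one reason such baselines tend to struggle with negated phrases.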

      Overall, this tech stack was chosen because of its balanced combination of performance, flexibility, and ease of integration. With a secure backend, responsive frontend, and robust data handling mechanisms, the system is both extensible and production-ready.

    4. Implementation

      The OCR-Enabled Sentiment Analysis Web Application was developed in a modular, step-by-step manner to ensure flexibility, smooth performance, and easy integration of both traditional and modern machine learning techniques. The whole project was developed in a Windows environment with Visual Studio Code, and the system was separated into frontend, backend, database, and machine learning modules for ease of development and debugging. The frontend was developed with ReactJS and styled with TailwindCSS. This configuration made the interface interactive and responsive on desktops, tablets, and smartphones. The component-based structure of React made it easy to organize core features such as login, registration, sentiment input (text/image), and history display. JavaScript and Axios were used to validate forms and make API calls to the backend.

      On the backend, Flask, a light Python framework, was used. The application logic was contained within a single file, app.py, and included everything from login and registration routes to password reset and model predictions. When users uploaded images, the Flask backend used Tesseract OCR to read out any text it could find, which was then sanitized and passed to the sentiment model for prediction. User information, including passwords and sentiment logs, was stored in MongoDB. Its lack of schema made it ideal for tracking user-specific information such as prediction history, timestamps, and email information. Each sentiment prediction is tied back to the corresponding user and is later viewable on their individual dashboard.
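
A sketch of what one stored prediction might look like as a document. Every field name here is an assumption; the paper states only that predictions, timestamps, and email information are stored per user.

```python
from datetime import datetime, timezone

VALID_SOURCES = ("text", "image")
VALID_SENTIMENTS = ("Positive", "Neutral", "Negative")


def build_prediction_record(email, source, text, sentiment):
    """Build a dict suitable for storage as one prediction-history entry."""
    if source not in VALID_SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    if sentiment not in VALID_SENTIMENTS:
        raise ValueError(f"unknown sentiment: {sentiment!r}")
    return {
        "email": email,            # ties the prediction back to the user
        "source": source,          # typed text or OCR-extracted text
        "text": text,
        "sentiment": sentiment,
        "created_at": datetime.now(timezone.utc),
    }


if __name__ == "__main__":
    print(build_prediction_record("user@example.com", "image", "not bad", "Positive"))
```

With PyMongo, a dict like this could be passed to a collection's `insert_one` method, and a user's dashboard could be populated by querying on the `email` field.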

      The system employs two sentiment models: a logistic regression model trained with Scikit-learn on TF-IDF features for comparison, and a fine-tuned DistilBERT transformer model deployed using Hugging Face Transformers. While the logistic model was a simple baseline, the DistilBERT model was used in production because it significantly outperformed the baseline, particularly on context-rich and ambiguous input. After extensive testing, the application was ready for deployment. Although it was built on Windows, its modularity makes deployment on Windows or Linux servers simple. Key features such as password reset, OCR capability, and real-time feedback were tested for smooth functionality. When a user submits an image or text, the backend automatically extracts or processes the input, sends it to the DistilBERT model, and saves the result. The user immediately sees the prediction, and the result is stored in the database for future access via the dashboard.
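
A sequence-classification model such as DistilBERT outputs one raw score (logit) per class. The sketch below shows how such scores could be turned into a label and a confidence value; the function name is hypothetical, and the class order assumes the 0/1/2 encoding given in the Data Collection section.

```python
import math

ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}


def decode_logits(logits):
    """Return (label, probability) for the highest-scoring class."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


if __name__ == "__main__":
    label, confidence = decode_logits([0.1, 2.3, -1.0])
    print(label, round(confidence, 3))
```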

      While more advanced logging features (like server performance or internal error tracking) are not yet active, the system already logs essential user activity such as login timestamps and previous predictions. The application is also designed to scale in the future, with possibilities for adding multilingual support, admin dashboards, and deeper analytical insights.

    5. Pseudo Codes

    Login Functionality:

    BEGIN
        DISPLAY Login Form
        IF user submits the form THEN
            RETRIEVE user data from MongoDB
            VALIDATE entered email and password
            IF credentials match THEN
                SET user session
                REDIRECT to user dashboard
            ELSE
                DISPLAY error message: "Invalid email or password"
            END IF
        END IF
    END
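
The credential check in the login pseudocode could look like the sketch below. The paper does not describe how passwords are stored, so the salted PBKDF2 hashing here is an assumption, as are the function names; a plain dict stands in for the MongoDB users collection.

```python
import hashlib
import hmac
import os

PBKDF2_ITERATIONS = 100_000  # illustrative value, not taken from the paper


def hash_password(password, salt=None):
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )
    return salt, digest


def verify_login(users, email, password):
    """Return True only if the email exists and the password matches."""
    user = users.get(email)
    if user is None:
        return False
    _, digest = hash_password(password, user["salt"])
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, user["digest"])


if __name__ == "__main__":
    salt, digest = hash_password("s3cret")
    users = {"user@example.com": {"salt": salt, "digest": digest}}
    print(verify_login(users, "user@example.com", "s3cret"))
```

Returning the same generic failure for an unknown email and a wrong password matches the single "Invalid email or password" message in the pseudocode.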

    Image or Text Input and Sentiment Analysis:

    BEGIN
        SHOW options: Upload Image / Enter Text
        WAIT for user input
        IF image is selected THEN
            SEND image + email to /upload-image
            IF response is successful THEN
                DISPLAY extracted text + sentiment
            ELSE
                SHOW error: "Image analysis failed"
            END IF
        ELSE IF text is entered THEN
            SEND text + email to /analyze-text
            IF response is successful THEN
                DISPLAY original text + sentiment
            ELSE
                SHOW error: "Text analysis failed"
            END IF
        ELSE
            SHOW message: "Please provide input"
        END IF
    END
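
The branching in the input pseudocode can be mirrored in a single function. This is a sketch rather than the application's route code: the function name and return shape are hypothetical, and the OCR and classifier steps are passed in as callables so the control flow can be exercised without Tesseract or a model.

```python
def handle_submission(text=None, image=None, *, ocr, classify):
    """Dispatch an image or text submission and return a result dict."""
    if image is not None:
        try:
            extracted = ocr(image)
            return {"ok": True, "text": extracted, "sentiment": classify(extracted)}
        except Exception:
            return {"ok": False, "error": "Image analysis failed"}
    if text and text.strip():
        try:
            return {"ok": True, "text": text, "sentiment": classify(text)}
        except Exception:
            return {"ok": False, "error": "Text analysis failed"}
    return {"ok": False, "error": "Please provide input"}


if __name__ == "__main__":
    # Stub callables standing in for the real OCR engine and model.
    result = handle_submission(
        image=b"fake-bytes",
        ocr=lambda img: "not bad",
        classify=lambda t: "Positive",
    )
    print(result)
```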

  4. RESULTS AND ANALYSIS

    The performance of the proposed OCR-Enabled Sentiment Analysis Web Application was evaluated by testing the system's ability to accurately classify sentiment across text and image-based inputs. The evaluation considered model performance, responsiveness, and real-time processing capacity. The fine-tuned DistilBERT model served as the core sentiment engine, while a logistic regression model acted as the baseline for comparison.

    1. Sentiment Classification Accuracy

      The fine-tuned DistilBERT model was evaluated on a combined dataset of IMDB reviews, Yelp ratings, and a manually curated tricky phrase dataset. This test set included ambiguous and sarcastic samples to assess real-world robustness. The model demonstrated consistently high performance, achieving the following metrics:

      • Precision: 0.9707

      • Recall: 0.9691

      • F1-Score: 0.9699

      • Overall Accuracy: 96.91%

        In comparison, the baseline Logistic Regression model trained on the same data (using TF-IDF features) achieved a macro F1-score of 0.6659 and an accuracy of 86.97%. This demonstrates the superior capability of the DistilBERT model in handling nuanced and context-heavy inputs such as "not bad" or "meh."

        Figure 4(a) shows a side-by-side bar chart comparison of the key evaluation metrics (Accuracy, Precision, Recall, and F1-score), where DistilBERT consistently outperforms the Logistic Regression model.
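
For readers unfamiliar with macro averaging, the sketch below computes accuracy and macro-averaged precision, recall, and F1 from two label lists. It is a generic illustration with a hypothetical function name and toy data; the paper does not state which averaging was used for precision and recall, so treating them as macro averages is an assumption.

```python
def macro_metrics(y_true, y_pred, labels=(0, 1, 2)):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,  # mean of per-class F1 scores
    }


if __name__ == "__main__":
    print(macro_metrics([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2]))
```

Macro averaging gives each class equal weight regardless of its size, so a class that is handled poorly lowers the score even when overall accuracy stays high. That is one plausible reading of the gap between the baseline's accuracy (86.97%) and its macro F1-score (0.6659) in Table I. Macro F1 is also the mean of per-class F1 scores rather than the harmonic mean of the averaged precision and recall, which for the baseline's reported values would be about 0.692.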

    2. OCR-Enabled Image Input Evaluation

      To make the system capable of handling image-based inputs, Tesseract OCR was integrated into the backend to extract text from screenshots, scanned documents, or printed forms. Under clear and well-formatted conditions, such as typed or printed text, the OCR engine performed reliably, with an average text extraction accuracy of around 91%. However, when working with low-quality or handwritten images, its accuracy dropped due to difficulties in recognizing certain characters.

      As illustrated in Figure 4(b), processing plain text inputs took about 1.3 seconds on average, while image-based predictions, which include the time for OCR, took slightly longer at around 1.8 seconds. This difference still falls well within the range of real-time responsiveness, showing that the system is efficient and practical for everyday use across both text and image input formats.

    3. Performance Comparison of Models

    To assess the effectiveness of the proposed system, we conducted a baseline comparison between a classical Logistic Regression model and a fine-tuned DistilBERT transformer. Both models were evaluated using standard metrics on a test set comprising IMDB, Yelp, and tricky custom samples.

    As shown in Table I, DistilBERT consistently outperformed Logistic Regression across all evaluation metrics.

    Model                 Accuracy   Precision   Recall   F1-Score
    Logistic Regression   86.97%     0.7490      0.6425   0.6659
    DistilBERT            96.91%     0.9707      0.9691   0.9699

    Table I: Model Performance Comparison

  5. DISCUSSION

    The OCR-Enabled Sentiment Analysis Web Application brings together multiple technologies to provide a seamless way to analyze sentiment from both text and image inputs. In this study, we compared two approaches, Logistic Regression as a traditional machine learning model and DistilBERT as a modern transformer-based deep learning model, to see how each performed in real-world sentiment classification scenarios. Our results showed that DistilBERT significantly surpassed Logistic Regression. Although the Logistic Regression model achieved a macro F1-score of 66.59%, it underperformed on neutral or ambiguous expressions such as "just okay," "not bad," or "meh." This is not surprising, since models based on keyword frequency tend to miss the fine-grained meaning of such expressions.

    In contrast, DistilBERT fine-tuned on our combined dataset of IMDB, Yelp, and difficult custom examples delivered an F1-score of 96.99% and an overall accuracy of 96.91%. Its grasp of sentence structure and context helped it deal with sarcasm, negation, and ambiguous input far better. Training proceeded smoothly; although validation loss spiked in the third epoch, the F1-score remained stable, suggesting no significant overfitting. The OCR module built on Tesseract also proved important. It allowed users to submit image files (such as screenshots or scanned notes), extract the text, and analyze its sentiment. It worked best with clear, high-contrast images, where it was approximately 91% accurate in text extraction. However, it fared poorly with blurry or handwritten text, sometimes resulting in extraction and prediction failures. In terms of performance, the system made predictions in about 1.3 seconds for text inputs and 1.8 seconds for image-based inputs, which is effectively real-time for most user interactions. The frontend dashboard, with its history of previous sentiment predictions, was well received in user trials for its ease of use and convenience.

    Although the Logistic Regression model is not used in the production version of the application, it served as a suitable baseline against which to measure the improvements of transformer-based models. It let us demonstrate the value of context-aware models in everyday applications. The project was designed to be modular, so that its frontend, backend, database, and machine learning models can be revised independently. It also supports secure login, password reset features, and user-specific sentiment logs stored in MongoDB. Although advanced logging and admin analytics are not yet implemented, the foundation is in place for future improvements. Overall, this study shows that combining OCR with a powerful language model like DistilBERT can significantly improve the reach and accuracy of sentiment analysis systems. The application is well suited for uses such as product review monitoring, customer feedback analysis, or social media sentiment tracking. Future enhancements could include multilingual support, better handling of handwritten images, and a more advanced error feedback mechanism.

  6. CONCLUSION

This work introduced and evaluated an OCR-Enabled Sentiment Analysis Web Application that bridges optical character recognition with advanced language understanding to interpret sentiment from both typed and image-based inputs. Built on a modular full-stack framework (ReactJS for the frontend, Flask for backend logic, and MongoDB for persistent data storage), the system provides a seamless user experience while ensuring performance and scalability. At the heart of the application lies a fine-tuned DistilBERT model that delivers high accuracy in sentiment classification, especially for nuanced or tricky phrases often found in real-world communication. A logistic regression model was also implemented as a baseline for comparison. The DistilBERT model clearly outperformed it, achieving a macro F1-score of 96.99% while maintaining fast response times averaging 1.3 seconds for text input and 1.8 seconds for image-based input. With the integration of Tesseract OCR, the system extends its capabilities beyond traditional text inputs, allowing users to analyze sentiment from screenshots, scanned forms, or printed feedback. This makes the tool versatile for use cases in product review analysis, customer service, and social media monitoring. While the system performs reliably on clean, structured inputs, future improvements can focus on better handling of handwritten or low-quality images, multi-language support, and admin dashboards for deeper insights and oversight. Ultimately, this application highlights the practical potential of combining OCR with transformer-based NLP for real-world sentiment analysis tasks. It opens up promising directions for further research in multimodal, context-aware sentiment understanding.

REFERENCES

  1. A. Go, R. Bhayani, and L. Huang, "Twitter Sentiment Classification using Distant Supervision," CS224N Project Report, Stanford University, 2009.

  2. L. Zhang, R. Ghosh, and S. M. Venkatagiri, "A review on sentiment analysis of social media data using deep learning and machine learning approaches," Journal of Big Data, vol. 9, no. 1, pp. 130, 2022.

  3. J. Devlin, M. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," in Proc. of NAACL-HLT, 2019, pp. 4171-4186.

  4. V. Sanh, L. Debut, J. Chaumond, and T. Wolf, "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter," arXiv preprint arXiv:1910.01108, 2019.

  5. F. Pedregosa et al., "Scikit-learn: Machine Learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825-2830, 2011.

  6. S. Bird, E. Klein, and E. Loper, Natural Language Processing with Python, O'Reilly Media Inc., 2009.

  7. R. Smith, "An Overview of the Tesseract OCR Engine," in Proc. Ninth International Conference on Document Analysis and Recognition (ICDAR), 2007, pp. 629-633.

  8. Yelp Dataset, https://www.yelp.com/dataset, Accessed: June 2025.

  9. IMDB Dataset of 50K Movie Reviews, https://ai.stanford.edu/~amaas/data/sentiment/, Accessed: June 2025.

  10. Hugging Face Transformers, https://huggingface.co/transformers/, Accessed: June 2025.

  11. MongoDB NoSQL Database, https://www.mongodb.com/, Accessed: May 2025.