DOI : https://doi.org/10.5281/zenodo.19760701
- Open Access
- Authors : Shiva Karthik Bairy, Shravya Jallepally, Ashish Pathak, B. Harish Goud
- Paper ID : IJERTV15IS041907
- Volume & Issue : Volume 15, Issue 04, April 2026
- Published (First Online): 25-04-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
Sentiment Analysis and Smart Review of Stakeholder Comments in the eConsultation Module
B. Harish Goud
Department of Information Technology Chaitanya Bharathi Institute of Technology Hyderabad, India
Shravya Jallepally
Department of Information Technology Chaitanya Bharathi Institute of Technology Hyderabad, India
Ashish Pathak
Department of Information Technology Chaitanya Bharathi Institute of Technology Hyderabad, India
Shiva Karthik Bairy
Department of Information Technology Chaitanya Bharathi Institute of Technology Hyderabad, India
Abstract: The Ministry of Corporate Affairs employs the eConsultation Module to seek inputs from various stakeholders on proposed legislation or proposed amendments to that legislation. While the tool enables citizen participation, a vast number of comments can be collected during the consultation process in a matter of days, and there is a high probability of overlooking essential feedback, suggestions, and opinions during this exercise. Automation of the process therefore becomes necessary, and we offer an AI-driven solution, Avalokan, that analyzes stakeholder feedback efficiently. Our solution employs sentiment analysis to classify stakeholder feedback as positive, negative, or neutral with respect to the proposed amendments, uses the Sentence-BERT cosine similarity technique to detect duplicate comments, and offers a version-based trend engine to track the evolution of sentiment across successive versions. The system achieves an accuracy of 83.33% with an F1-score of 0.7622 on real stakeholder comments. Additionally, the percentage of negative sentiment decreases consistently from 60% in v1.0 to 46% in v3.0, a measure of the improvement achieved through iterative policy-making. The proposed model beats a TF-IDF + Logistic Regression baseline in accuracy.
Index Terms: Sentiment Analysis, eConsultation, Government Policy, Natural Language Processing, Transformer Models, Decision Support System
I. INTRODUCTION
Public participation lies at the heart of modern democratic governance. The Ministry of Corporate Affairs (MCA), India, has taken a significant step in this direction by deploying an eConsultation module that invites stakeholders, ranging from industry experts and legal professionals to ordinary citizens, to comment on proposed amendments and draft legislation [1], [2]. While such platforms strengthen the transparency and legitimacy of the legislative process, their very success creates a formidable operational challenge: thousands of responses can arrive within days of a consultation opening, quickly overwhelming the capacity of human reviewers to process them thoroughly.
The problem is not merely one of volume. Manual review is inherently slow, error-prone, and difficult to scale [3]. As the number of stakeholders grows, the depth of analysis that any fixed team of reviewers can provide remains essentially unchanged, creating a widening gap between the richness of public input and the quality of insight that actually informs policy decisions. Beyond scalability, human reviewers are susceptible to cognitive fatigue and analytical bias: a well-articulated but superficial comment may receive more weight than a technically insightful one that is poorly worded.
Addressing this gap calls for an intelligent, automated system that can process large volumes of unstructured text, classify the sentiment expressed in each submission, detect redundant or near-duplicate comments, and track how public opinion evolves across successive drafts of a policy. Several prior systems have applied BERT-based models to sentiment classification in legal and governance domains [10], [11]; however, they have largely treated sentiment analysis as a standalone task, without integrating duplicate detection, version-wise trend analysis, and a production-grade deployment pipeline into a single coherent platform. The system proposed in this paper, Avalokan, addresses this gap by combining DistilBERT-based sentiment classification, Sentence-BERT cosine similarity for deduplication, a multi-version trend engine, and a full-stack Flask-MongoDB deployment into one unified decision support tool designed specifically for the eConsultation use case. By applying Natural Language Processing techniques, the Ministry can transform thousands of paragraphs of unstructured feedback into structured, actionable intelligence, making the consultation process not only more efficient but also more equitable, since every submission is processed with equal rigor regardless of how it is written.
The main contributions of this work are as follows:
- Developing an automatic sentiment analysis system for stakeholder comments on a large scale.
- Implementing semantic similarity algorithms through SBERT for duplicate comment detection.
- Sentiment trend analysis across different versions of a policy draft.
- Full-stack implementation using Flask and MongoDB.
II. LITERATURE SURVEY
The automatic analysis of large-scale textual feedback has emerged as a fundamental area of research, developing alongside improvements to e-governance technology. Early sentiment analysis systems were based on conventional machine learning methods, most notably combinations of TF-IDF feature extraction with Random Forest classifiers and Support Vector Machines, and were competitive in classification accuracy on domain-specific corpora [9], [10]. Such techniques remain computationally light and interpretable, making them convenient starting points for resource-constrained deployments.
Preprocessing is a precondition of any sentiment analysis pipeline, since real-world consultation text is disorderly and disjointed. Standard steps include tokenization, stop-word removal, lemmatization, and elimination of punctuation and special characters. Visualization tools such as word clouds and sentiment distribution charts assist policymakers in seeing key themes at a glance, without requiring strong technical know-how.
With the introduction of transformer-based language models, the landscape changed significantly. BERT [11] showed that deep bidirectional pretraining on large corpora generates contextual representations which significantly outperform previous practices on classification problems. Subsequent work expanded it to domain-specific variants: LegalBERT adapted the BERT architecture to legal text, and RoBERTa enhanced the training regime of BERT to generate more robust representations. The predicted sentiment probability under these models is computed as:
P (y | x) = softmax(W · h[CLS] + b) (1)
Later developments moved towards hybrid architectures that combine the representational power of transformer embeddings with the decision-boundary versatility of classical classifiers. [10] suggested combining BERT embeddings with a Random Forest classifier and found enhanced generalization on skewed sentiment datasets, a result directly applicable to the current study, given the class imbalance in the eConsultation corpus. Similarly, Xie et al. [12] used BERT in combination with FastText to analyze sentiment in educational text, demonstrating that lightweight auxiliary encoders can offset domain vocabulary gaps that a general-purpose BERT model may not fully capture.
Sarcasm and implicit sentiment remain open challenges across all these architectures. Arif and Nayak [13] showed that even fine-tuned BERT models struggle with figurative language and indirect expressions of disapproval, a limitation particularly relevant in policy consultation settings, where stakeholders may signal resistance through rhetorical questions or qualified praise rather than explicit negative statements.
For duplicate and near-duplicate detection, Sentence-BERT [8] introduced a siamese network architecture producing semantically meaningful sentence embeddings that support efficient cosine similarity comparison. This method is far more scalable than cross-encoding every pair of submissions, making it suitable for large-volume consultation scenarios.
For keyword extraction, TF-IDF frequency analysis has been widely used to surface recurring themes across large document sets, allowing policymakers to identify the most frequently raised concerns. Hybrid models combining RoBERTa or LegalBERT with classical classifiers have shown strong performance in legal sentiment analysis, though at greater computational expense [9], [10].
On the intersection of sentiment analysis and civic technology, Simonofski et al. [5] studied how social media sentiment can complement conventional policymaking processes, identifying the need for tools that bridge the gap between unstructured public opinion and structured legislative workflows. This directly motivates the design of Avalokan as an end-to-end decision support system rather than a standalone classifier. A keyword mapping algorithm has also been proposed to identify references to specific provisions within draft legislation [3], enabling finer-grained sentiment attribution. The Trend Engine concept, tracking sentiment shifts across drafts (v1, v2, v3), provides a longitudinal view of how public reception evolves as policymakers incorporate stakeholder feedback, and is a core component of the system this paper describes.
III. SYSTEM ARCHITECTURE AND IMPLEMENTATION
The proposed system is built upon a layered architecture consisting of three principal components: a React-based Frontend Interface, a Flask Backend, and an AI Processing Module. This separation of concerns allows each layer to be developed, tested, and scaled independently, which is essential when processing the large and unpredictable volumes of feedback characteristic of active public consultations.
Stakeholders interact with the frontend to submit comments on a draft policy. These submissions are relayed to the backend via RESTful API calls and persisted in a MongoDB database organized into four collections: Users, Policies, Drafts, and Analysis Results. MongoDB's document-oriented schema accommodates the variable structure of free-text comments without requiring rigid table definitions, and its horizontal scaling capability supports future growth in submission volume.
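As an illustration only, a submission document for the Drafts collection might be assembled as below; the field names are assumptions, since the paper does not publish its schema, and the pymongo call is shown in a comment:

```python
from datetime import datetime, timezone

def build_comment_doc(user_id: str, draft_version: str, text: str) -> dict:
    """Assemble a free-text stakeholder submission for persistence in MongoDB.

    Field names are illustrative assumptions, not the paper's actual schema.
    """
    return {
        "user_id": user_id,
        "draft_version": draft_version,  # e.g. "v1.0", "v2.0", "v3.0"
        "text": text,
        "submitted_at": datetime.now(timezone.utc),
        "analysis": None,  # filled in later by the AI Processing Module
    }

doc = build_comment_doc("u42", "v2.0", "The amendment clarifies filing deadlines.")

# With a live MongoDB instance, persisting the document is one call:
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")
#   client["avalokan"]["drafts"].insert_one(doc)
print(doc["draft_version"])
```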
Once stored, each comment enters the AI Processing Module through a multi-stage preprocessing pipeline: (i) Unicode normalization and lowercasing to ensure encoding consistency; (ii) removal of URLs, email addresses, and special characters that carry no semantic value; (iii) contraction expansion (e.g., "won't" → "will not") to reduce vocabulary fragmentation; (iv) tokenization using the DistilBERT WordPiece tokenizer, which handles out-of-vocabulary terms through subword decomposition; and (v) truncation or padding to a maximum sequence length of 512 tokens as required by the model. Stop-word removal was deliberately omitted at the transformer input stage, since DistilBERT's attention mechanism is capable of learning to down-weight uninformative tokens internally.

Fig. 1. Detailed AI Processing Flow and Modular Architecture.
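Steps (i)-(iii) can be sketched in Python as follows; the contraction map is a small illustrative sample rather than the full list the system may use, and steps (iv)-(v) are indicated in comments because they rely on the Hugging Face tokenizer:

```python
import re
import unicodedata

# Small illustrative sample; a production contraction map would be larger.
CONTRACTIONS = {"won't": "will not", "can't": "cannot", "n't": " not"}

def preprocess(text: str) -> str:
    # (i) Unicode normalization and lowercasing for encoding consistency
    text = unicodedata.normalize("NFKC", text).lower()
    # (ii) strip URLs and email addresses, then special characters
    text = re.sub(r"https?://\S+|\S+@\S+", " ", text)
    text = re.sub(r"[^a-z0-9'\s]", " ", text)
    # (iii) contraction expansion to reduce vocabulary fragmentation
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    return re.sub(r"\s+", " ", text).strip()

print(preprocess("Won't this clause (Sec. 12) hurt SMEs? See https://example.com"))
# -> "will not this clause sec 12 hurt smes see"

# Steps (iv)-(v), WordPiece tokenization with truncation/padding to 512
# tokens, would use the Hugging Face tokenizer, e.g.:
#   tok = AutoTokenizer.from_pretrained(
#       "distilbert-base-uncased-finetuned-sst-2-english")
#   enc = tok(text, truncation=True, padding="max_length", max_length=512)
```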
The preprocessed token sequences are passed to the sentiment classification component. The system uses a pretrained DistilBERT model fine-tuned on SST-2 (distilbert-base-uncased-finetuned-sst-2-english), which produces a contextual embedding for the entire input via the special [CLS] token. The pretrained SST-2 model was adapted to a three-class setting through post-processing of prediction probabilities. A linear classification layer maps this embedding to sentiment logits, and a softmax activation converts them to a probability distribution over the three sentiment classes:
P (y | x) = softmax(W · h[CLS] + b) (2)
where h[CLS] is the contextual embedding of the input sequence, and W and b are the learned parameters of the classification layer.
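A minimal sketch of the described post-processing: the binary SST-2 probabilities are mapped to three classes, with an assumed confidence threshold of 0.6 (the paper does not state its actual cut-off):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def three_class(logits, threshold=0.6):
    """Map SST-2 [negative, positive] logits to Positive/Negative/Neutral.

    The 0.6 threshold is an illustrative assumption, not the paper's value.
    """
    p_neg, p_pos = softmax(logits)
    if p_pos >= threshold:
        return "Positive"
    if p_neg >= threshold:
        return "Negative"
    return "Neutral"  # low-confidence predictions are treated as neutral

print(three_class([0.2, 2.1]))  # strongly positive logits -> "Positive"
print(three_class([0.9, 1.0]))  # near-tie -> "Neutral"
```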
To detect duplicate or near-duplicate submissions, a common occurrence in coordinated lobbying campaigns, Sentence-BERT embeddings are computed for each comment. The semantic similarity between any two comments A and B is measured using cosine similarity:
CosSim(A, B) = (A · B) / (‖A‖ ‖B‖)    (3)
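Equation (3) can be computed directly; the vectors below are toy stand-ins for actual Sentence-BERT embeddings, and the 0.9 duplicate threshold is an assumption, since the paper does not publish its value:

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two embedding vectors (Eq. 3)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

emb_a = [0.2, 0.7, 0.1]     # toy embedding of comment A
emb_b = [0.21, 0.69, 0.12]  # toy embedding of a near-duplicate comment B

DUPLICATE_THRESHOLD = 0.9   # assumed value; the paper's threshold is unspecified
print(cos_sim(emb_a, emb_b) > DUPLICATE_THRESHOLD)

# In the real system, embeddings would come from a Sentence-BERT encoder, e.g.:
#   from sentence_transformers import SentenceTransformer, util
#   model = SentenceTransformer("all-MiniLM-L6-v2")  # one possible encoder
#   util.cos_sim(model.encode(text_a), model.encode(text_b))
```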
Comment pairs whose cosine similarity exceeds a predefined threshold are flagged as duplicates and excluded from the primary sentiment aggregation, preventing organized submission campaigns from distorting the overall sentiment signal. A trend analysis module then aggregates the per-comment sentiment scores across all submissions for a given draft version, producing version-level sentiment distributions that can be compared across v1.0, v2.0, and v3.0 of a policy document.

A. Dataset Description

The dataset used in this study combines simulated and real-world stakeholder comments to balance controlled experimental conditions with real-world validity. The simulated portion was generated using a large language model prompted to produce realistic policy feedback across a range of sentiments, writing styles, and levels of technical detail. These synthetic comments were used exclusively during system development and pipeline validation; they were not included in the final performance evaluation reported in Section IV.

The real-world portion was collected through the Avalokan platform itself: registered users were invited to submit comments on three successive versions of a sample draft policy, mirroring the structure of an actual MCA consultation. All submissions were collected with user consent. Each comment was independently labelled by two annotators using a three-class schema: Positive (the commenter broadly supports the amendment or expresses approval), Negative (the commenter opposes or raises concerns), and Neutral (factual observations, questions, or ambiguous statements). Disagreements between annotators were resolved through discussion, with a third team member serving as a tiebreaker where consensus could not be reached.

The final dataset comprises approximately 1,400 comments distributed across the three classes: 32% Positive (448 comments), 47% Negative (658 comments), and 21% Neutral (294 comments). The pronounced negative skew reflects the typical pattern of policy consultations, where stakeholders who feel strongly enough to respond are more likely to raise objections than to express general approval. This class imbalance was accounted for in evaluation by reporting weighted precision and F1-score in addition to overall accuracy.

IV. RESULTS AND DISCUSSION

To establish a meaningful point of comparison, a classical baseline was constructed using TF-IDF weighted bag-of-words features combined with a Logistic Regression classifier trained with L2 regularization. This approach represents the standard pre-transformer pipeline for text classification and has been widely used as a reference model in sentiment analysis literature [9], [10]. The baseline was trained and evaluated on the same 150 real-world comments used for the proposed model, under identical train/test split conditions, to ensure a fair comparison.

The proposed system uses a DistilBERT model (distilbert-base-uncased-finetuned-sst-2-english) for sentiment classification of stakeholder comments. Experiments were conducted on approximately 150 real-world comments collected through the Avalokan platform, held out entirely from the training process and used solely for evaluation.

TABLE I
Performance Evaluation of the Proposed System.

Model                                     Accuracy   F1 Score
TF-IDF + Logistic Regression (Baseline)   75%        0.74
Proposed DistilBERT Model                 83.33%     0.762
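As a hedged illustration, the TF-IDF + Logistic Regression baseline can be sketched with scikit-learn; the six toy comments are invented stand-ins for real stakeholder feedback, and the hyperparameters shown are library defaults, not necessarily the paper's exact configuration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy training data; the real baseline was trained on the same
# 150 real-world comments used for the proposed model.
texts = [
    "I fully support this amendment",
    "Excellent clarification of the filing rules",
    "This provision will harm small companies",
    "Strongly oppose the increased penalties",
    "What is the effective date of this clause",
    "Please define the term related party",
]
labels = ["Positive", "Positive", "Negative", "Negative", "Neutral", "Neutral"]

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),            # unigram + bigram TF-IDF
    LogisticRegression(penalty="l2", max_iter=1000),  # L2-regularized classifier
)
baseline.fit(texts, labels)
print(baseline.predict(["I oppose this harmful provision"]))
```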
The DistilBERT model achieved an overall accuracy of 83.33%, a weighted precision of 70.9%, recall of 83.33%, and an F1-score of 0.7622, compared to the baseline's accuracy of 75% and F1-score of 0.74. While the DistilBERT model surpasses the baseline on accuracy, the nearly comparable F1-scores (0.7622 vs. 0.74) reflect the impact of class imbalance on per-class precision: specifically, the model tends to over-predict the majority Negative class, suppressing precision for the minority Positive and Neutral classes. This is a known behaviour of models fine-tuned on balanced benchmark datasets such as SST-2 when applied to skewed real-world distributions, and it motivates future fine-tuning directly on the eConsultation corpus.

Fig. 2. Evaluation metrics of the sentiment classification model showing accuracy, precision, recall, and F1-score computed on real stakeholder comments.
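The weighted metrics reported above can be computed with scikit-learn; the label lists below are small invented examples, not the paper's 150-comment evaluation set:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Invented toy labels for illustration only.
y_true = ["Negative", "Negative", "Negative", "Positive", "Neutral", "Positive"]
y_pred = ["Negative", "Negative", "Positive", "Positive", "Neutral", "Negative"]

acc = accuracy_score(y_true, y_pred)
# average="weighted" weights each class's score by its support, which is how
# the paper accounts for the skewed class distribution.
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"accuracy={acc:.3f} weighted_precision={prec:.3f} weighted_f1={f1:.3f}")
```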
Fig. 3. Version-wise Sentiment Evolution Trend: Stakeholder sentiment distribution (Positive, Negative, Neutral) tracked across three official draft iterations (v1.0–v3.0). A consistent decrease in negative feedback and a general increase in positive sentiment validate the iterative policy refinement process.
The version-wise analysis provides a more encouraging picture of the system's practical utility. Applied to stakeholder comments across three successive draft versions, the system tracks a clear directional trend: negative sentiment decreased from 60% in v1.0 to approximately 46% in v3.0, while positive sentiment rose from 23% in v1.0 to a peak of 35% in v2.0 before stabilizing at around 30% in v3.0. Neutral sentiment remained relatively stable, ranging from 15% in v1.0 to approximately 23% in v3.0.
The slight dip in positive sentiment from v2.0 to v3.0, alongside the rise in neutral sentiment, suggests that while the v3.0 revisions successfully addressed a significant share of stakeholder concerns, some newly introduced provisions generated uncertainty or mixed reactions rather than outright approval. This kind of nuanced, version-level insight, distinguishing between concerns resolved and new ambiguities introduced, is precisely the type of intelligence that is difficult to extract through manual review but is surfaced naturally by the trend analysis module.
Overall, the system demonstrates that the sentiment trend is moving in a positive direction across draft iterations, with the consistent reduction in negative feedback serving as a quantitative indicator of iterative policy improvement.
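A minimal sketch of the trend engine's version-level aggregation, grouping per-comment labels by draft version and converting counts to percentage distributions; the labels below are invented, not the paper's data:

```python
from collections import Counter

# Invented (version, predicted label) pairs for illustration.
predictions = [
    ("v1.0", "Negative"), ("v1.0", "Negative"), ("v1.0", "Positive"),
    ("v2.0", "Positive"), ("v2.0", "Negative"), ("v2.0", "Neutral"),
]

def version_distribution(preds):
    """Group sentiment labels by draft version and return % distributions."""
    by_version = {}
    for version, label in preds:
        by_version.setdefault(version, Counter())[label] += 1
    return {
        version: {lbl: round(100 * n / sum(counts.values()))
                  for lbl, n in counts.items()}
        for version, counts in by_version.items()
    }

print(version_distribution(predictions))
```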
V. LIMITATIONS
The greatest linguistic weakness of the existing system is the way it treats implicit and figurative sentiment. Stakeholder disagreement in policy consultations is often expressed through rhetorical questions, hedged phrasing, or sarcastic framing; for example, a remark like "I am sure this amendment will do the drafters of it a great service" carries negative sentiment that a superficial classifier is prone to misread as positive. The DistilBERT model in this case was fine-tuned on SST-2, a movie review sentiment dataset in which sentiment is predominantly explicit. Unaccustomed to the more subtle register of legal and policy discourse, the model has no mechanism for identifying irony or situational sarcasm. Addressing this would require either a dedicated sarcasm pre-filtering step, such as the fine-tuned BERT methodology investigated by Arif and Nayak [13], or domain-adaptive fine-tuning on annotated eConsultation data containing sarcastic examples.
The second constraint relates to scaling under high-load conditions. The present evaluation was carried out on approximately 150 real-world comments, which suffices for proof-of-concept validation but is orders of magnitude fewer than the volume a national-level MCA consultation would generate. The DistilBERT inference pipeline, though lighter than full BERT, still processes comments one by one in the current implementation. At scale, e.g., 50,000 submissions arriving within 48 hours, this would generate intolerable latency without asynchronous batched inference, processing queues, or model quantization to minimize per-sample inference time. The MongoDB database is horizontally scalable, yet the AI module remains a bottleneck that must be addressed prior to production deployment at national scale.
Third, the system operates in near-real-time rather than actual real-time. The sentiment score is recomputed periodically through batch processing of comments, not instantly upon submission to the dashboard. This approach suffices under most consultation circumstances, but in emergencies requiring rapid regulatory response, or during fast-moving societal events, the processing delay can result in significant shifts in public opinion going unnoticed for minutes or even hours. Integrating the inference pipeline with a tool such as Apache Kafka or Redis could help reduce processing latency. Moreover, although the evaluation dataset is of real-world origin, its collection was carried out through an artificial platform with a relatively homogeneous user community. Feedback gathered from a broader public survey involving multiple regional languages, varying literacy levels, and diverse linguistic backgrounds may exhibit different distributional characteristics compared to the current corpus. The system's performance on such heterogeneous input has yet to be established.
VI. SOCIETAL IMPACT AND PRACTICAL IMPLICATIONS
In its simplest form, Avalokan addresses a democratic deficit. This deficit arises when a government consultation receives thousands of responses but only a fraction can be reviewed: the voices of the majority of participants go unheard, no matter how committed the system claims to be to engagement. By automating sentiment classification, deduplication, and trend tracking, the system ensures that every submission is studied with the same rigor, regardless of how eloquently it is written or when during the consultation window it was submitted. This fairness of processing is not just a technological feature; it is a material improvement to the fairness of the legislative process.
To policymakers, the practical advantages are immediate and concrete. Instead of ministry officials being given a stack of thousands of raw submissions at the end of a consultation, they are presented with a dashboard displaying the overall sentiment distribution, the prevailing concerns as indicated by keyword analysis, the percentage of duplicate entries removed, and, critically, a version-wise trend chart showing how sentiment has shifted across past revisions. This compresses what would otherwise be weeks of manual post-consultation review into an hours-long analytical task, freeing officials to focus on policy content rather than the logistics of document processing.
The societal implications extend beyond efficiency. Automated deduplication prevents coordinated lobbying campaigns, in which one interest group submits hundreds of near-identical responses, from artificially inflating the apparent weight of a particular viewpoint. Toxicity filtering removes abusive or offensive posts prior to review, safeguarding the integrity of the consultation record. Together, these features make the consultation process more resistant to manipulation and more reflective of genuine public sentiment, which strengthens the democratic legitimacy of the resulting legislation.
Beyond the MCA, the Avalokan architecture is domain-agnostic. Any organization that receives large volumes of unstructured public opinion (environmental regulators, municipal planning authorities, public health agencies) faces the same information overload problem. The modular design of the system, with clearly separated preprocessing, classification, deduplication, and trend analysis components, means it can be adapted to new domains by retraining the classification layer on domain-relevant data while leaving the rest of the pipeline intact. This generalizability significantly extends the potential societal contribution of the work.
VII. CONCLUSION
In this paper, we introduced Avalokan, an AI-based decision support system meant to transform the manner in which government ministries process and respond to public input in legislative consultations. The system combines DistilBERT-based sentiment classification, Sentence-BERT cosine similarity for duplicate detection, multi-version trend analysis, and a full-stack Flask-MongoDB implementation into a single unified platform, an integration which, to the best of our knowledge, has not been previously demonstrated in the eConsultation domain.

Evaluated on 150 real-world stakeholder comments, the system achieved an accuracy of 83.33% and an F1-score of 0.7622, outperforming the TF-IDF + Logistic Regression baseline on accuracy while highlighting the class imbalance issue, a well-known challenge when benchmarked models are applied to skewed real-world distributions. The version-wise sentiment trend findings confirmed the system's capacity to identify meaningful shifts in public opinion across drafts, with negative sentiment declining from 60% in v1.0 to 46% in v3.0, a quantitative signal of iterative policy improvement that would be practically impossible to extract through manual inspection.

The identified limitations (sarcasm detection, sequential inference bottlenecks, near-real-time rather than streaming updates, and limited dataset diversity) establish a clear and actionable research agenda. Future work will focus on domain-adaptive fine-tuning of the classification model on a larger and more diverse eConsultation corpus; a streaming inference pipeline to enable truly real-time sentiment updates; and adoption of a blockchain-based immutable audit ledger to ensure the integrity and transparency of the entire consultation record. These directions build directly on the foundation the current system has established, collectively advancing Avalokan from a proof-of-concept towards a production-grade tool for democratic governance at scale.
ACKNOWLEDGMENT
The authors wish to thank the Department of Information Technology at Chaitanya Bharathi Institute of Technology, Hyderabad, for providing the infrastructure and academic environment that made this work possible. We are also grateful to all the participants who submitted comments through the Avalokan platform during data collection, and to the annotators whose careful labelling effort formed the foundation of the evaluation. Finally, we thank our peers and reviewers whose feedback helped sharpen the ideas presented in this paper.
REFERENCES
[1] A. Macintosh, "Characterizing e-participation in policy-making," in Proc. 37th Hawaii Int. Conf. System Sciences, IEEE, 2004.
[2] R. Deshmukh et al., E-governance and Digital Innovation. Springer, 2024.
[3] Z. Jin and R. Mihalcea, Natural Language Processing for Policy-Making. Springer, 2023.
[4] M. Mansoor et al., "Semantic similarity detection," 2020.
[5] A. Simonofski et al., "Policy-making with social media," 2021.
[6] E. Cambria et al., A Practical Guide to Sentiment Analysis. Springer, 2017.
[7] S. Poria et al., "Multimodal sentiment analysis," 2017.
[8] N. Reimers and I. Gurevych, "Sentence-BERT: Sentence embeddings using siamese BERT-networks," arXiv preprint arXiv:1908.10084, 2019.
[9] E. Demir and M. Bilgin, "Sentiment analysis from Turkish news texts with BERT-based language models and machine learning algorithms," 2023.
[10] U. Ghosh, S. Sarkar, S. Jana, I. Bhattacharya, K. Singh, and P. Kumari, "A hybrid framework for sentiment analysis on textual data using BERT embeddings and random forest classifier," 2026.
[11] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[12] P. Xie, H. Gu, and D. Zhou, "Modeling sentiment analysis for educational texts by combining BERT and FastText," 2024.
[13] M. F. Arif and J. Nayak, "Fine-tuned BERT model for accurate hate speech detection in social media," 2024.
