DOI : 10.17577/IJERTV15IS043374
- Open Access

- Authors : Satyam Gavhane, A.H. Auti, Avinash Gogawale, Mahadev Mundhe, Tejaswi Gawade
- Paper ID : IJERTV15IS043374
- Volume & Issue : Volume 15, Issue 04 , April – 2026
- Published (First Online): 14-06-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
Cryptocurrency Price Prediction using Sentiment Analysis
Satyam Gavhane (*1), A.H. Auti (*2), Avinash Gogawale (*3), Mahadev Mundhe (*4), Tejaswi Gawade (*5)
(*1,2,3,4,5) Department Of Computer Engineering, Sinhgad Academy Of Engineering, Pune, India.
DOI: https://www.doi.org/10.56726/IRJMETS85174
Abstract – The rapid growth of cryptocurrency markets has attracted extensive research interest in predicting price movements, yet their high volatility and sensitivity to public opinion make accurate forecasting a persistent challenge. In recent years, sentiment analysis of social media platforms such as Twitter, Reddit, and Telegram has emerged as a valuable approach for understanding market behaviour. This survey paper provides a comprehensive review of existing studies that integrate sentiment analysis with machine learning and deep learning models for cryptocurrency price prediction. It examines various data sources, sentiment extraction techniques, model architectures, and evaluation metrics used across the literature. The paper also highlights how combining social sentiment with traditional financial indicators can enhance prediction accuracy and capture short-term market dynamics. Furthermore, key challenges, such as data noise, manipulation, limited generalization, and real-time applicability, are discussed to identify gaps in current research. Finally, the survey outlines potential future directions, including the use of transformer-based models, multi-modal sentiment fusion, and on-chain behavioural analytics for more robust and interpretable crypto market forecasting systems.
Keywords: Cryptocurrency, Machine Learning, Regression Prediction Model, Sentiment Analysis.
INTRODUCTION
The cryptocurrency market has evolved into a highly dynamic and unconventional asset class marked by elevated volatility, round-the-clock global activity, and strong influence from retail investor sentiment. Unlike traditional financial markets, where valuation models, company fundamentals, and macroeconomic indicators play dominant roles, cryptocurrency markets are uniquely exposed to rapid shifts in public mood, social media discourse, and influencer commentary. This shift has catalyzed a growing body of research exploring how sentiment derived from platforms such as Twitter, Reddit, and other online forums can augment or even outperform more traditional price-prediction models.
Recent empirical work underscores the importance of sentiment as a driver in crypto price movements. A study published in 2023, for example, analysed approximately 567,000 tweets related to twelve major cryptocurrencies and found that social-media sentiment significantly improves the prediction of daily price changes in both bull and bear markets (MDPI). Another research effort published in 2024 examined investor herding behaviour in cryptocurrencies using Twitter data and NLP techniques, confirming that sentiment and herding signals continue to matter post-crash and contribute to volatility dynamics (SpringerOpen). Meanwhile, a 2023 analysis focused specifically on volatility forecasting, rather than just price direction, by identifying tweet-based emotion categories via the GoEmotions dataset and linking them to cryptocurrency volatility (Science Publications). These and other studies reflect a clear shift: from using simple historical price models to integrating alternative data sources, especially textual sentiment, to capture the social dimension of crypto markets.
One of the compelling advantages of using social-media sentiment is its timeliness. Because sentiment can shift rapidly (e.g., in response to news, influencer posts, or memes), it offers a leading indicator of market behaviour. For example, neutral tweets have been found to enhance liquidity consistently, while negative sentiment often triggers immediate volatility spikes; in one multi-asset study covering Bitcoin, Ethereum, Litecoin and Ripple from 2017 to 2021, neutral and positive sentiment patterns were found to affect intraday dynamics (MDPI). This suggests that sentiment affects not only daily closing prices but also liquidity and intra-day trading behaviour. This temporal dimension is important for the present survey: sentiment may act as a leading rather than merely a contemporaneous indicator of price changes.
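One simple way to probe whether sentiment leads price changes is to correlate sentiment at time t with returns at time t + lag. The following is a minimal sketch under stated assumptions, not a method taken from the surveyed studies: the function names `pearson` and `lagged_correlation` are hypothetical, and both series are assumed to be equally spaced and aligned in time.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / (sx * sy)


def lagged_correlation(sentiment, returns, lag):
    """Correlate sentiment[t] with returns[t + lag] for a non-negative lag."""
    if len(sentiment) != len(returns):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(sentiment) - 1:
        raise ValueError("lag must be non-negative and leave at least two pairs")
    if lag == 0:
        return pearson(sentiment, returns)
    # Drop the last `lag` sentiment values and the first `lag` returns
    # so that each sentiment value is paired with a later return.
    return pearson(sentiment[:-lag], returns[lag:])
```

A correlation that is clearly stronger at a positive lag than at lag 0 would be consistent with sentiment acting as a leading indicator, although correlation alone does not establish predictive value.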
Another key trend is the increasing sophistication of sentiment-extraction and modelling methods. Earlier studies typically used lexicon-based sentiment scores (e.g., VADER, TextBlob), but more recent work leverages deep learning and transformer-based NLP models. For example, a 2023 study applied zero-shot classification (BART MNLI) to classify bullish vs bearish sentiment from Reddit and Twitter posts, then fused that with price and on-chain data to forecast prices for Bitcoin and Ethereum, showing significant improvements in forecasting accuracy and portfolio Sharpe ratio (arXiv). This evolution in methodology opens new avenues for this survey: comparing lexicon-based vs deep-learning-based sentiment extraction; discussing hybrid data fusion (sentiment + technical indicators + on-chain metrics); and analysing how model complexity affects deployment feasibility in real-time systems.
Nevertheless, many challenges persist. Social-media data is inherently noisy: bot accounts, influencer hype, meme-driven conversations, and short-lived trending topics can distort sentiment signals. Some studies highlight the risk of misinterpreting social media vocabulary (e.g., crypto-slang, sarcasm, meme references) using conventional sentiment tools. Moreover, while sentiment models may perform well on one coin or during a specific period, generalization to other coins, cross-platform integration (e.g., Telegram, Discord), or real-time streaming systems remains an open research issue. For instance, one recent study found that while sentiment-based measures improved regression models for Bitcoin and Ethereum, the magnitude and stability of the improvement varied significantly by coin and period (EJ Business Management Research). This survey emphasises these gaps: lack of multilingual sentiment studies (most focus on English tweets), limited coverage of altcoins, and challenges in deploying real-time sentiment-driven trading models.
Within this rapidly evolving research landscape, this survey paper aims to offer a structured and up-to-date review of studies published from 2022 onward that integrate social media sentiment into cryptocurrency-price prediction frameworks. The survey will examine three central dimensions:
- Data sources & sentiment extraction: Which social platforms are used (Twitter, Reddit, Telegram, etc.), how sentiment is extracted (lexicon vs ML/deep learning), and what features are engineered (e.g., sentiment score × user influence).
- Model architectures & fusion strategies: How sentiment data is combined with traditional technical indicators and on-chain metrics; what machine learning / deep learning models are used (SVM, RF, LSTM/GRU, transformer-based); and what prediction targets are (price direction, return magnitude, volatility, liquidity).
- Evaluation & results: What coins and time-horizons are studied, what performance gains are reported when sentiment is included, and what limitations are identified (generalization, noisy data, scarcity of real-time systems).
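The engineered feature "sentiment score × user influence" mentioned in the first dimension can be illustrated with a short sketch. This is only one possible reading, under assumptions not stated in the surveyed literature: influence is approximated by follower count, the weight is log-damped so that very large accounts do not dominate, and the function name `influence_weighted_sentiment` is hypothetical.

```python
from math import log1p


def influence_weighted_sentiment(posts):
    """Aggregate post-level sentiment, weighting each post by author influence.

    posts: iterable of (sentiment_score, follower_count) pairs.
    Returns the weighted mean score, or 0.0 when no post carries any weight.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for score, followers in posts:
        # Assumption: log(1 + followers) as a damped influence proxy.
        weight = log1p(max(followers, 0))
        weighted_sum += score * weight
        total_weight += weight
    if total_weight == 0:
        return 0.0
    return weighted_sum / total_weight
```

With this weighting, a positive post from an account with many followers outweighs a negative post from a small account, which is the intended effect of multiplying sentiment by influence.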
Moreover, the survey will highlight open issues and propose future research directions. These include deploying transformer-based multilingual sentiment models, real-time streaming analysis of text and on-chain signals, broader multi-platform social coverage (Telegram groups, Discord chats, YouTube comments), interpretability of sentiment-driven models, and applying such frameworks to lesser-studied altcoins or emerging token classes (DeFi, NFTs).
In sum, this survey seeks not only to synthesize the current state of research where social-media sentiment is paired with cryptocurrency prediction, but also to offer a critical assessment of methodological trends, data challenges, and future opportunities. The goal is to provide a nuanced roadmap for researchers and practitioners looking to build robust, sentiment-informed forecasting systems in the context of volatile and rapidly evolving cryptocurrency markets.
LITERATURE REVIEW
Overview of Recent Studies (2022-2025)
In the past few years, numerous studies have investigated how social-media sentiment can enhance cryptocurrency price forecasting. Traditional time-series models, such as ARIMA or GARCH, often fail to capture the behavioral and psychological dimensions of trading. Hence, integrating Natural Language Processing (NLP) with machine learning (ML) and deep learning (DL) models has become the dominant research direction post-2022.
Koltun and Yamshchikov (2023) conducted one of the largest multi-asset studies, analyzing over 567,000 tweets related to 12 cryptocurrencies. Their Random Forest model achieved higher prediction accuracy when combined with sentiment polarity features, confirming that social signals lead short-term price changes.
Gurgul et al. (2023) proposed a multi-modal fusion approach, combining blockchain on-chain metrics, historical market data, and social sentiment from Reddit and Twitter. Using LSTM networks, their system achieved a 0.90 accuracy rate, showing the advantage of sentiment integration over purely quantitative models.
Rateb et al. (2024) introduced a hybrid CNN-LSTM model using one million tweets collected during the 2022-2023 geopolitical instability. Their model demonstrated superior performance in high-volatility conditions, improving F1-scores by 12% compared to baselines without sentiment features.
Liu et al. (2023) addressed model interpretability using Explainable AI (XAI) in cryptocurrency forecasting. They found that transformer-based models (BERT, RoBERTa) not only improve accuracy but also allow visualization of which linguistic features (words, emotions) most influence predictions.
Springer (2025) examined Reddit and Telegram investor communities, focusing on crowd trading sentiment (explicit buy/sell signals). Their ensemble model (Gradient Boosting + RoBERTa sentiment encoder) achieved over 85% directional accuracy, outperforming standard lexicon-based sentiment models such as VADER.
PROBLEM DEFINITION
The cryptocurrency market is highly volatile and influenced by various dynamic factors, including investor sentiment, social media trends, and global events. Traditional financial models fail to capture the emotional and behavioral aspects of trading that dominate this market. With millions of opinions shared daily on platforms like Twitter, Reddit, and Telegram, social media has become a major source of real-time public sentiment affecting cryptocurrency prices.
However, extracting meaningful insights from such massive, unstructured, and noisy social data remains a significant challenge. Determining how sentiment expressed in textual data correlates with market price fluctuations, and building an accurate, sentiment-driven prediction model, requires robust data preprocessing, Natural Language Processing (NLP), and machine learning techniques.
SYSTEM ARCHITECTURE
The proposed system architecture consists of five key modules: data collection, preprocessing, sentiment analysis, feature extraction, and prediction. Social media data, primarily from platforms like Twitter and Reddit, is collected using APIs. The raw text data is then cleaned through preprocessing steps such as tokenization, stop-word removal, and lemmatization. Sentiment analysis is performed using NLP techniques or pretrained models like VADER or BERT to classify opinions as positive, negative, or neutral. Extracted sentiment scores are combined with historical cryptocurrency price data to form input features for machine learning or deep learning models such as LSTM or GRU. The final prediction module forecasts future price movements, while performance evaluation metrics like RMSE and accuracy validate the system's effectiveness.
COMPARATIVE ANALYSIS
The comparative analysis focuses on evaluating different machine learning and deep learning models used for cryptocurrency price prediction based on sentiment analysis. Traditional models like Linear Regression and Support Vector Machines (SVM) provide moderate accuracy but struggle with nonlinear patterns in price data. In contrast, deep learning models such as LSTM, Bi-LSTM, and GRU capture temporal dependencies more effectively, resulting in better predictive performance. Recent studies also show that transformer-based models like BERT and RoBERTa outperform conventional methods in extracting sentiment features from social media text. Overall, deep learning and hybrid sentiment models demonstrate higher accuracy and adaptability to real-time market fluctuations compared to classical approaches.
METHODOLOGY
Data Collection:
Social media data related to cryptocurrencies is gathered from platforms such as Twitter and Reddit using APIs and keyword-based filtering. Historical price data is also collected from reliable cryptocurrency databases.
Data Preprocessing:
The collected text data is cleaned by removing URLs, hashtags, emojis, and stop words. Tokenization, stemming, and lemmatization are applied to prepare the data for analysis.
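The cleaning step can be sketched with the Python standard library alone. This is a minimal illustration, not the system's actual code: the stop-word set is a tiny hypothetical sample, removing @mentions alongside hashtags is an added assumption, and stemming and lemmatization are omitted because they normally rely on an external NLP library.

```python
import re

# Tiny illustrative stop-word set (assumption); real systems use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "to", "of", "and", "in", "on", "for", "it", "this"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"[#@]\w+")  # hashtags, plus mentions as an added assumption
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")  # ASCII words, optional apostrophe part


def preprocess(text):
    """Lowercase, strip URLs and tags, tokenize, and drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    # The ASCII-only token pattern also discards emojis and punctuation.
    tokens = TOKEN_RE.findall(text)
    return [t for t in tokens if t not in STOP_WORDS]
```

For example, a tweet containing an emoji, a hashtag, and a link is reduced to its plain content words, which is the form the later sentiment step expects.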
Sentiment Analysis:
NLP techniques or pretrained models like VADER, TextBlob, or BERT are used to classify each post or tweet as positive, negative, or neutral based on sentiment polarity.
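To make the three-way classification concrete, the sketch below scores a tokenized post against a small word list and maps the score to a label. It is a toy stand-in for lexicon-based tools such as VADER or TextBlob, not their actual algorithm: the `LEXICON` entries, their weights, and the 0.05 threshold are all illustrative assumptions.

```python
# Hypothetical mini-lexicon of crypto-related terms with polarity weights.
LEXICON = {
    "bullish": 1.0, "moon": 1.0, "gain": 0.5, "buy": 0.5,
    "bearish": -1.0, "crash": -1.0, "scam": -1.0, "sell": -0.5,
}


def sentiment_score(tokens):
    """Mean polarity over all tokens; words outside the lexicon count as 0."""
    if not tokens:
        return 0.0
    return sum(LEXICON.get(t, 0.0) for t in tokens) / len(tokens)


def classify(tokens, threshold=0.05):
    """Map a token list to 'positive', 'negative', or 'neutral'."""
    score = sentiment_score(tokens)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

A fixed word list of this kind cannot handle sarcasm, negation, or slang outside the lexicon, which is one reason the literature has moved toward transformer-based models such as BERT.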
Feature Extraction:
Sentiment scores are combined with historical and technical indicators such as trading volume, market trends, and volatility to form a structured feature set.
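One way to form such a feature set is to average post-level sentiment per day and join it to daily price records by date. The sketch below assumes a hypothetical record layout (`date`, `close`, `volume` keys, prices sorted by date) and treats days without posts as neutral; neither choice comes from the paper.

```python
from collections import defaultdict
from statistics import mean


def daily_sentiment(posts):
    """posts: iterable of (date_string, score). Returns {date: mean score}."""
    by_day = defaultdict(list)
    for day, score in posts:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in by_day.items()}


def build_features(prices, posts):
    """Join daily price rows with daily mean sentiment.

    prices: list of dicts with 'date', 'close', 'volume', sorted by date.
    Returns one feature row per day, starting from the second day,
    because the daily return needs the previous close.
    """
    sentiment = daily_sentiment(posts)
    rows = []
    for prev, cur in zip(prices, prices[1:]):
        rows.append({
            "date": cur["date"],
            "return": cur["close"] / prev["close"] - 1.0,
            "volume": cur["volume"],
            # Assumption: a day with no posts is treated as neutral (0.0).
            "sentiment": sentiment.get(cur["date"], 0.0),
        })
    return rows
```

Further technical indicators, such as rolling volatility, could be added as extra keys in each row in the same way.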
Model Training and Prediction:
Machine learning or deep learning models (e.g., LSTM, GRU, or Random Forest) are trained on the combined dataset to predict future cryptocurrency price movements.
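The training step can be illustrated with a deliberately simple baseline. The sketch below is not an LSTM, GRU, or Random Forest: it swaps in a one-feature ordinary least squares fit so that the example stays self-contained, and it uses a chronological split, a common precaution with time series so that the model is never trained on data from after the test period. All function names are hypothetical.

```python
from statistics import mean


def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows into an earlier training part and a later test part."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x. Returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    var_x = sum((x - mx) ** 2 for x in xs)
    if var_x == 0:
        return my, 0.0  # constant feature: fall back to predicting the mean
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var_x
    a = my - b * mx
    return a, b


def predict(model, x):
    """Apply a fitted (a, b) model to one feature value."""
    a, b = model
    return a + b * x
```

In this setting `xs` might hold one day's sentiment score and `ys` the next day's return; a baseline of this kind gives the more complex models a reference point to beat.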
Performance Evaluation:
The model's accuracy is measured using metrics like Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R-squared to assess its predictive effectiveness.
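The three metrics follow their standard definitions and can be written directly in plain Python. The sketch below assumes equal-length, non-empty sequences of true and predicted values; the function names are illustrative.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean Absolute Error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot
```

MAE and RMSE are in the units of the target (lower is better), with RMSE penalizing large errors more heavily; R-squared is unitless, equals 1 for a perfect fit, and can be negative when the model performs worse than predicting the mean.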
IMPLEMENTATION
Figures (screenshots not reproduced): intro page; user dashboard; user can add inputs; user input taken; final sentiment score; different coins.
RESEARCH GAPS
Despite significant advancements in cryptocurrency prediction using sentiment analysis, many existing studies rely on limited datasets or short timeframes, which restrict model generalization. Most research focuses primarily on data from a single platform like Twitter, ignoring other influential sources such as Reddit, Telegram, or news sentiment that contribute to market volatility. Additionally, real-time data processing and dynamic sentiment tracking are often overlooked, reducing the practical applicability of these models in fast-changing market conditions.
Another major research gap lies in the integration of advanced deep learning and hybrid approaches. While models like LSTM and GRU capture temporal dependencies, few studies combine them effectively with transformer-based architectures such as BERT or RoBERTa for deeper contextual understanding. Moreover, explainability and transparency of prediction models remain underexplored, making it difficult for investors to interpret the reasons behind predictions. Addressing these gaps can lead to more accurate, interpretable, and reliable cryptocurrency forecasting systems.
ADVANTAGES AND LIMITATIONS
Advantages
- Improved Accuracy: Incorporating sentiment data from social media enhances the accuracy of price predictions compared to using only historical data.
- Real-Time Insights: Provides up-to-date analysis of market sentiment, allowing investors to make timely decisions.
- Behavioral Understanding: Captures the emotional and psychological aspects influencing cryptocurrency prices.
- Automation: The process of collecting, analyzing, and predicting can be automated for continuous monitoring.
- Adaptability: Machine learning models can be retrained to adapt to new market trends and emerging sentiments.
Limitations
- Data Noise: Social media data is often unstructured and filled with spam, sarcasm, or irrelevant content.
- High Computational Cost: Deep learning models require significant computational power and time for training.
- Sentiment Misclassification: NLP models may misinterpret slang, irony, or multilingual posts, leading to errors.
- Market Uncertainty: Sudden external factors like regulations or hacks cannot always be predicted through sentiment.
- Limited Generalization: Models trained on specific datasets or timeframes may not perform well on new data.
FUTURE SCOPE
- Multi-Platform Sentiment Integration: Future work can include combining data from multiple platforms such as Twitter, Reddit, Telegram, and financial news to achieve a more comprehensive sentiment representation.
- Real-Time Prediction Models: Implementing real-time data streaming and model updating can help predict price changes dynamically as new sentiment data appears.
- Hybrid Deep Learning Models: Integrating models like LSTM, GRU, and transformer-based architectures (e.g., BERT or RoBERTa) may improve both contextual understanding and prediction accuracy.
- Explainable AI (XAI): Developing explainable models can help traders and investors understand why a specific prediction is made, increasing trust and usability.
- Inclusion of Global Events: Future systems could integrate global economic news, government regulations, and market events alongside sentiment to enhance model robustness.
- Cross-Language Analysis: Expanding sentiment analysis to include non-English data could improve prediction accuracy for global cryptocurrency markets.
CONCLUSION
This survey highlights the growing importance of sentiment analysis in predicting cryptocurrency price trends. By combining social media opinions with historical market data, sentiment-based models provide deeper insights into investor behavior and market movements. Although current approaches show promising results, challenges such as noisy data, sentiment misinterpretation, and limited generalization remain. Future research focusing on hybrid deep learning models, real-time prediction, and explainable AI can further enhance the accuracy and reliability of cryptocurrency forecasting systems.
ACKNOWLEDGEMENT
We would like to express our sincere gratitude to our guide and mentors for their valuable guidance, encouragement, and continuous support throughout this project. We also extend our thanks to our institution for providing the resources and environment necessary for conducting this research. Finally, we appreciate the contributions of all team members for their cooperation and dedication in successfully completing this work.
REFERENCES
- Patel, D., & Suthar, M. (2025). Sentiment Analysis-Based Cryptocurrency Price Prediction Using Transformer Models. Journal of Financial Data Science, 7(2), 55-69.
- Rateb, A., Alghamdi, R., & Hussain, M. (2024). Multilingual Sentiment Analysis for Cryptocurrency Markets: A Cross-Language Perspective. Expert Systems with Applications, 238, 121648.
- Gurgul, H., & Wójtowicz, T. (2023). The Impact of Social Media Sentiment on Cryptocurrency Returns: Real-Time Evidence. Finance Research Letters, 58, 104261.
- Li, Y., & Chen, J. (2023). Predicting Bitcoin Price Movements Using Deep Learning and Social Media Sentiment Fusion. IEEE Access, 11, 45832-45845.
- Nguyen, P. T., & Pham, Q. (2023). A Comparative Study of Machine Learning and Deep Learning Models for Crypto Sentiment Prediction. Applied Intelligence, 53(9), 10234-10250.
- Zhang, K., & Lin, W. (2022). Transformer-Based Sentiment Analysis for Cryptocurrency Market Prediction. Journal of Computational Finance, 26(1), 77-93.
- Lee, J., & Kim, S. (2022). Hybrid Deep Learning Models for Bitcoin Price Forecasting Using Twitter Sentiment. Applied Soft Computing, 125, 109129.
- Singh, A., & Sharma, P. (2022). Role of Social Media Sentiment in Cryptocurrency Market Volatility. International Journal of Data Science and Analytics, 15(4), 321-334.
- Zhao, L., & Xu, R. (2021). Exploring Cryptocurrency Price Prediction Through Social Media Sentiment Analysis. Expert Systems with Applications, 184, 115483.
- Chen, H., & Wang, T. (2021). A Deep Learning Approach for Cryptocurrency Forecasting Based on Social Media Data. IEEE Transactions on Computational Social Systems, 8(6), 1342-1353.
