Analyzing Stock Trend using News Articles

DOI : 10.17577/IJERTCONV8IS15003


Shravan Bhat, Siddhanth M, Sampath Kumar, Dr. Rekha B Venkatapur (Head of the Department)

Dept. of Computer Science & Engineering

K.S Institute of Technology, Bangalore, India

Abstract – Data has surpassed oil as the most valuable resource in the world, and the world economy is linked in one way or another to the data being produced. The stock market, a central part of that economy, is closely intertwined with current affairs and the news. For instance, amid news of bad loans during the Yes Bank crisis, the bank's stock dropped by 86%. This is an example of how news affects the stock market. Stock trends are affected by many factors, one of which is daily news articles.

Recent studies have shown that the massive amount of online information, social media discussion, and news stories tends to have an observable effect on the financial market. The goal of this work is therefore to determine whether there is a significant link between online news articles and the stock market, and in particular whether such news has an impact on the share price of a company.

Keywords – Machine Learning, Natural Language Processing, Stock market prediction, Analytics, Neural Network

  1. INTRODUCTION

    A stock market is an aggregation of buyers and sellers of stocks, which represent ownership of a business. These stocks can be bought and sold on stock exchanges. Since the stocks issued by individual companies are affected by many different factors both inside and outside the company, the stock market is very unpredictable, and a successful prediction could therefore yield a significant profit. Recent studies have shown that the massive amount of online information, social media discussion, and news stories tends to have an observable effect on the financial market. The goal is therefore to determine whether there is a significant link between online news articles and the stock market, and in particular whether such news has an impact on the share price of a company. From this we can also estimate how an individual news headline could in turn move the stock market.

  2. METHODOLOGY

    The project is broken into 6 parts:

    PART I: Data collection and sentiment analysis

    PART II: Developing the ML model

    PART III: Training the ML model with training data

    PART IV: Calculating the performance of the model

    PART V: Testing the ML model with testing data

    PART VI: Accuracy of the ML model

    1. Data collection and sentiment analysis

      The first step is to download the data from various news sources and their respective APIs. The news sources we used for retrieving the data are:

      1. https://www.economictimes.com

      2. https://www.deccanherald.com

      3. https://www.moneycontrol.com

      4. https://www.finance.yahoo.com

      5. https://www.investing.com

        We processed over 20 lakh (2 million) news articles spanning 8 years, which is more than any previous study that we could find. Stock data was downloaded from the stock market indices and platforms, with fields such as high, low, and volume traded. The news was scraped with the help of BeautifulSoup4, a Python library for extracting data from HTML. The downloaded information was then parsed to remove anything unnecessary: from the news articles, only the financial news was kept, and any extra tags or information were discarded. The relevant fields of the stock market data were parsed in a similar manner.
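        As a rough illustration of the parsing step, the sketch below pulls headline text out of a downloaded page and discards the surrounding markup. It uses Python's standard-library `html.parser` in place of BeautifulSoup4 so that it runs without extra packages. The sample HTML, the `h2` tag, and the `headline` class are assumptions, since the actual page layouts are not described here.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collects the text inside <h2 class="headline"> elements (hypothetical layout)."""

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._in_headline = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and dict(attrs).get("class") == "headline":
            self._in_headline = True
            self._buffer = []

    def handle_data(self, data):
        # Keep text only while inside a headline; nested tags are ignored.
        if self._in_headline:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_headline:
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.headlines.append(text)
            self._in_headline = False


def extract_headlines(html):
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.headlines


# Made-up page used only for this sketch.
SAMPLE_PAGE = """
<html><body>
  <div class="ad">Sponsored content</div>
  <h2 class="headline">Bank shares <b>slide</b> after bad-loan report</h2>
  <h2 class="headline">  Index closes higher  </h2>
  <h2 class="other">Not a headline</h2>
</body></html>
"""

headlines = extract_headlines(SAMPLE_PAGE)
print(headlines)
```

        With BeautifulSoup4 the same selection would typically be a single `find_all` call; the standard-library version is shown only to keep the sketch self-contained.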

        Determining the polarity of the news article

        This is done using VADER (Valence Aware Dictionary and sEntiment Reasoner), a lexicon-based sentiment analysis library. The library goes through the text and assigns scores that are used to determine the polarity of the news article. It reports the result in 4 categories:

        1. Positive: if the assigned score > 0

        2. Negative: if the assigned score < 0

        3. Neutral: if the assigned score ~ 0

        4. Compound: a single normalized score between -1 and +1 that combines the positive and negative sentiment of the text
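        The thresholds above can be sketched as a small labeling function. The tolerance `eps` that decides what counts as "approximately zero" is an assumption (0.05 is a commonly used cutoff for VADER's compound score), and the function name is hypothetical. In practice the score would come from the vaderSentiment package, for example `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`.

```python
def polarity_label(score, eps=0.05):
    """Map a sentiment score in [-1, 1] to a polarity label.

    eps is an assumed tolerance for "approximately zero".
    """
    if score > eps:
        return "positive"
    if score < -eps:
        return "negative"
    return "neutral"


for s in (0.62, -0.34, 0.01):
    print(s, polarity_label(s))
```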

        After downloading the news articles, we assigned each headline a VADER score. We chose this library because we found it to be accurate for news articles. Its drawback is that it tended to produce more false positives on financial news, so we also added a bag of words of common negative-sentiment terms that were being wrongly classified. The result was a parsed CSV file containing the VADER score for each headline.
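        One way to picture the bag-of-words correction is a post-processing step that lowers the score of a headline containing known negative financial terms. The word list, the penalty size, and the function name below are all hypothetical; the actual list used in the project is not given.

```python
import re

# Hypothetical examples of finance-specific negative terms.
NEGATIVE_FINANCE_WORDS = {"default", "downgrade", "bankruptcy", "selloff", "writeoff"}


def adjust_score(headline, vader_score, penalty=0.3):
    """Lower the score once per listed negative term, clamped to [-1, 1]."""
    tokens = re.findall(r"[a-z]+", headline.lower())
    hits = sum(1 for t in tokens if t in NEGATIVE_FINANCE_WORDS)
    adjusted = vader_score - penalty * hits
    return max(-1.0, min(1.0, adjusted))


print(adjust_score("Agency issues downgrade after loan default", 0.10))
print(adjust_score("Index closes higher", 0.25))
```

        A headline with no listed terms keeps its original score, so the correction only touches the cases the general-purpose lexicon is suspected of misreading.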

    2. Developing the ML model

      The ML model was developed using the TensorFlow and Keras libraries and was executed on Google Colab. The dataset was divided in an 80-20 ratio (80% for training, 20% for testing).
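      A minimal sketch of the 80-20 split, written in plain Python so that it runs on its own. Whether the rows were shuffled before splitting is not stated, so the `shuffle` flag and the fixed `seed` are assumptions, as are the function and variable names.

```python
import random


def split_80_20(rows, shuffle=True, seed=42):
    """Return (train, test) with 80% of the rows in train."""
    rows = list(rows)
    if shuffle:
        random.Random(seed).shuffle(rows)
    cut = len(rows) * 8 // 10  # integer arithmetic avoids float rounding
    return rows[:cut], rows[cut:]


# Toy stand-in for (sentiment score, price change) rows.
data = [(i / 100, -i / 100) for i in range(100)]
train, test = split_80_20(data)
print(len(train), len(test))
```

      For time-ordered data, passing `shuffle=False` keeps every test row later than every training row, which avoids evaluating on days that precede the training period.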

      The ML model can be broken down into 6 parts:

      1. Importing all the dependencies

      2. Creating the neural networks

      3. Training the model

      4. Evaluation of the model

      5. Testing the model

      6. Accuracy of the model

    3. Training the ML model with training data

      After developing the ML model, it is compiled and then trained. The training process uses the TensorFlow library with Keras. The code for running the ML model is as follows:

      train_model = model.fit(X_train, y_train, epochs=500, verbose=0, shuffle=True)

      where epochs is the number of complete passes the model makes over the training data. Here, the training set of 79,000 samples is processed 500 times, i.e. 79,000 * 500 = 39,500,000 sample presentations in total.

      fit() is the Keras method used to invoke the training process.
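      The arithmetic behind that figure, as a short sketch. The training-set size of 79,000 and the 500 epochs come from the text; the batch size of 32 is the Keras default for `fit()` when `batch_size` is not passed, which is the case in the call shown.

```python
import math

train_samples = 79_000   # training-set size quoted in the text
epochs = 500             # from the fit() call
batch_size = 32          # Keras default when batch_size is not given

sample_presentations = train_samples * epochs
steps_per_epoch = math.ceil(train_samples / batch_size)
total_updates = steps_per_epoch * epochs

print(sample_presentations)  # 39500000
print(steps_per_epoch)       # 2469
print(total_updates)         # 1234500
```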

    4. Calculating the performance of the model

      After training the machine learning model, i.e. after processing the training samples, its results are analyzed for performance. The model is then taken to the next stage, i.e. testing with the testing data.

      A snapshot of the percentage change of the shares of stock according to the model:

      Input: [[-0.232 -0.2 -0.23 ]]

      Prediction: -0.05841918

      In the snapshot, the prediction of -0.058 indicates that the model expects the stock to fall by about 0.058 (in the percentage-change units used above). The accuracy reported for this prediction is 98.7%.

    5. Testing the ML model with testing data

      In this stage, the machine learning model is tested with the testing data. This is an essential step, as it determines whether the training samples in the previous stage were processed properly and whether the model can be applied to real-time news articles to get the desired results.

    6. Accuracy of the ML model

    After testing the machine learning model [6] on the test data, its accuracy must be checked. How do we determine the accuracy of the machine learning model?

    The predicted results for the dataset are compared with the actual results, and the proportion of matches is averaged over the whole dataset and reported as a percentage.
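    The exact matching rule is not spelled out. One plausible reading, assumed here, is directional accuracy: a prediction counts as correct when its sign agrees with the sign of the actual change. The function name and the sample values are hypothetical.

```python
def directional_accuracy(predicted, actual):
    """Percentage of predictions whose sign matches the actual change."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("need at least one prediction")

    def sign(x):
        return (x > 0) - (x < 0)

    matches = sum(1 for p, a in zip(predicted, actual) if sign(p) == sign(a))
    return 100.0 * matches / len(predicted)


# Made-up values for illustration only.
predicted = [-0.058, 0.120, 0.030, -0.200]
actual = [-0.060, 0.090, -0.010, -0.150]
print(directional_accuracy(predicted, actual))  # 75.0
```

    Under this reading, an accuracy near 50% corresponds to guessing the direction at random, which gives a baseline against which a reported percentage can be compared.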

    Using a network of dense (fully connected) layers consisting of 3 layers,

    1. input layer: consisting of 128 nodes

    2. hidden layer: consisting of 128 nodes

    3. output layer: consisting of 1 node,

    an accuracy of 55.45% was obtained for the dataset used.
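    To give a sense of the size of this network, the sketch below counts its trainable parameters. It assumes three input features (suggested by the three-value input in the snapshot shown earlier) and treats the 128-node "input layer" as the first fully connected layer; both are assumptions rather than details confirmed by the text.

```python
def dense_param_count(n_inputs, layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    total = 0
    fan_in = n_inputs
    for units in layer_sizes:
        total += fan_in * units + units  # weight matrix + bias vector
        fan_in = units
    return total


n_features = 3             # assumed from the three-value input snapshot
layers = [128, 128, 1]     # layer sizes listed above
print(dense_param_count(n_features, layers))  # 17153
```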

  3. RESULTS

In general, an accuracy of 55% was achieved with a high of 61% and a low of 45%.

Some snapshots are as follows:

Fig. 1. Bar graph showing the relation between polarity and frequency

Fig. 2. Scatter plot showing the predictions in relation to the real values

Fig. 3. CSV file of the final readings

Fig. 4. Word cloud of the most common words in the dataset

CONCLUSION

In this project, we found evidence of a correlation between the price of a company's shares and the daily news associated with it. Through the machine learning model, we were able to observe that these news articles do have an impact. Though the effect is not large, it is helpful for an enthusiast who is deeply passionate about investing in the stock market.

This project was carried out on the NIFTY50 company dataset and the corresponding news dataset for each of those 50 companies. The prediction accuracy that we were able to achieve was roughly 55%, with the highest being 61% and the lowest being 45%.

ACKNOWLEDGEMENT

We would like to thank our college for giving us this opportunity. We would also like to thank our guide, the Head of the Department, for her guidance.

We would like to thank our friends and family without whom we would not have been able to complete this project.

REFERENCES

  1. Dev Shah, Haruna Isah, Farhana Zulkernine, Predicting the effects of news sentiments on the stock market, 2018 IEEE Conference on Big Data (Big Data), ISBN: 978-1-5386-5035-6/18.

  2. Yasef Kaya, M. Elif Karsligil, Stock price prediction using financial news articles, 2010 IEEE, ISBN: 978-1-4244-6928-4/10.

  3. HD Huynh, LM Dang, D Duong, A new model for stock price movements prediction using deep neural network, SoICT, 2017, pp. 57-62, ACM.

  4. Yashwanth Singh Patel, Supriyo Mandal, Stock market prediction using daily news articles, IIT Patna, 2017.

  5. Cicil Fonseka, Liwan Liyanage, A data mining algorithm to analyse stock market data using lagged correlation, 2008 IEEE, ISBN: 978-1-4244-4/08.

  6. Bhargav Hegde, Dayananda P, Mahesh Hegde, Chetan C, Deep Learning Technique for Detecting NSCLC, International Journal of Recent Technology and Engineering (IJRTE), Volume-8, Issue-3, September 2019, pp. 7841-7843. DOI: 10.35940/ijrte.C6540.098319

  7. Kalyani Joshi, Prof. Bharati, Prof. Jyothi Rao, Stock trend prediction using news sentiment analysis, IJCSIT, Vol. 8, No. 3, June 2016.
