Time Series Analysis for Understanding the Vaccination Rate using ARIMA


Amulya Maitre Department of Computer Engineering

Pimpri Chinchwad College of Engineering, Akurdi

Pune, India

Dr. K. Rajeswari Department of Computer Engineering

Pimpri Chinchwad College of Engineering, Akurdi

Pune, India

Prof. Sushma Vispute Department of Computer Engineering

Pimpri Chinchwad College of Engineering, Akurdi

Pune, India

Abstract – The pandemic of the novel Coronavirus has led to a devastating situation all around the globe. With the introduction of vaccines by various pharmaceutical companies, there is hope of recovery from this dreadful condition. In this work, we use a Machine Learning (ML) approach based on a time series model, ARIMA, to forecast vaccination rates in various countries. Through experimentation, we forecast the probable increase or decrease in vaccination rates for developed, developing, and underdeveloped countries. This approach helps identify which countries need more attention in terms of vaccine supply or awareness among their citizens.

Keywords – ARIMA, Covid-19, Machine Learning, Time Series Analysis.

The COVID-19 outbreak originated in China at the end of 2019. Through early 2020, the virus spread rapidly across countries, and COVID-19 soon gained the status of a world-upending pandemic.

With the pandemic threatening populations worldwide, the only ray of hope was to develop a vaccine against this rapidly spreading viral disease as early as possible. Scientific and medical communities quickly began their research and development processes and successfully came up with vaccines in a record-breaking short period.

However, the vaccines did not become widely available until early 2021. With the rapid development, many pharmaceutical companies have come up with their own vaccine brands, approved by the World Health Organization (WHO). Although vaccine production is in full swing, the rate of vaccinations suggests a lack of awareness among people around the globe about the benefits of these vaccines. Another possibility is that some countries have not yet been supplied with vaccines.

In this study, we intend to understand the trends of vaccinations done in countries around the world. We categorize these countries as developed, developing, and under-developed to summarize our observations. Using the time series analysis approach, we can predict the rate of vaccinations for countries in the upcoming period. This will help us analyze the citizens' response to vaccination and indicate the awareness among the citizens of that particular country. The analysis could also be an indicator for vaccine supplies in the countries.

The rest of the paper is organized as follows. Section I introduces the problem, and Section II reviews the prior research done in this field. Section III provides the data set description and methodology, followed by the results and discussion in Section IV. The conclusion of this study is in Section V.

Time series analysis is a specialized area of statistics that focuses on time-series data and the analysis of trends over time. Such data may be recorded at regular intervals or over a series of particular periods.

Ample research has been done using time series analysis in areas such as economic change, geographical change, and many more. Many applications use time series analysis to predict near-future outcomes based on defined parameters. In this section, we summarize a few applications of time series analysis and discuss some prior studies on COVID-19.

S. Julian et al. [3] used the ARIMA model to predict immunization vaccine demand in Indonesia. Their study focused on predicting the vaccine stock for immunization, which helps the government determine vaccine stock requirements for clinics and avoid a shortage or excess of vaccines in stock.

Varsharani et al. [4] worked on forecasting measles vaccine requirements using time series analysis. Their study used the Expert Modeler, which identified a simple seasonal model as the best-fit model. The study helped in the vaccine requirement calculation, as it gave upper and lower confidence limits, supporting an optimum supply of vaccines in medical colleges.

S. Saswat et al. [5] used the ARIMA model to predict the effect of the lockdown and unlock periods on the COVID-19 dataset. The model results were compared with other models such as TBATS, N-BEATS, Prophet, Single Exponential Method, Double Exponential Method, and Moving Average, using the Root Mean Square Error (RMSE).

H. Tandon et al. [11] contrasted the accuracy of the ARIMA model with other time series models such as Linear Trend, Moving Average, Quadratic Linear, Single Exponential, S-Curve Trend, and Double Exponential. They concluded that ARIMA(2,2,2) was the most accurate of the models compared for forecasting confirmed COVID-19 infection cases, based on MAPE, MAD, and MSD values. They suggest employing this model for forecasting the number of COVID-19 infection cases in India.

Benvenuto, Domenico et al. [14] performed predictions using the ARIMA model on the Johns Hopkins epidemiological data. This study predicted the epidemiological trend of the prevalence and frequency of COVID-19.

Lai D. et al. [15] applied Box-Jenkins methods, a random walk model, and an ARMA model to the daily reported SARS cases to monitor the effects of SARS in China, based on the new cases reported daily by the Ministry of Health of China.

These works demonstrate the effectiveness of the ARIMA model in various use cases, and the research undertaken helps to understand the behavior of COVID-19 infection.


      As mentioned in the introduction, the approach is to forecast the vaccination rate for some countries based on their development status. In time series forecasting, past observations are collected and analyzed to develop a suitable mathematical model that captures the underlying data-generating pattern of the series. The fitted model can then predict future values [8]. As prior studies show, ARIMA, i.e., AutoRegressive Integrated Moving Average, performs effectively in forecasting. Hence, we use the ARIMA model for forecasting vaccination rates in this study. Fig. 1. shows the flowchart of implementation for this work.

      1. Dataset

        This study uses the vaccination data provided by Ritchie et al. on Our World in Data [1], as referenced by [6]. The site has data based on various metrics like Vaccinations, Tests & positivity, Hospital & ICU, Confirmed cases, Confirmed deaths, Reproduction rate, Policy responses, and Other variables of interest. The data is updated daily based on verifiable public official sources. The dataset can be accessed online at: https://ourworldindata.org/coronavirus-source-data.

        The vaccination data consists of country-wise data on COVID-19 vaccinations for 209 countries [2]. This dataset includes some subnational locations (England, Northern Ireland, Scotland, Wales, Northern Cyprus, etc.) and international aggregates (World, continents, European Union, etc.). The dataset has various attributes which are tabulated in Table 1.

        For this analysis, however, we used only a few attributes: location, date, and people_vaccinated_per_hundred. The aim is to forecast how many people will be vaccinated per hundred in the next 30 days for selected countries based on their overall development status.
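        As a rough illustration of the attribute selection described in this section, the sketch below reads a CSV in the layout of the vaccination dataset and extracts the people_vaccinated_per_hundred series for one country. The inline sample rows, the helper name load_country_series, and the choice to drop days without a reported value are assumptions made for this example; the paper does not state how missing values were handled.

```python
import io
import pandas as pd

# Hypothetical excerpt in the layout of the vaccination CSV; the numbers are
# made up for illustration and are not real observations.
SAMPLE_CSV = """location,date,total_vaccinations,people_vaccinated_per_hundred
Norway,2021-07-01,5000000,55.1
Norway,2021-07-02,5050000,
Norway,2021-07-03,5100000,56.0
India,2021-07-01,330000000,19.8
India,2021-07-02,335000000,20.1
"""


def load_country_series(csv_source, country):
    """Return people_vaccinated_per_hundred for one country, indexed by date."""
    df = pd.read_csv(
        csv_source,
        usecols=["location", "date", "people_vaccinated_per_hundred"],
        parse_dates=["date"],
    )
    series = (
        df[df["location"] == country]
        .set_index("date")["people_vaccinated_per_hundred"]
        .sort_index()
    )
    # Assumption: days with no reported value are simply dropped.
    return series.dropna()


norway = load_country_series(io.StringIO(SAMPLE_CSV), "Norway")
print(norway)
```

        In practice the csv_source argument would be a path to the downloaded file rather than an in-memory string.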


      2. ARIMA

        Fig. 1. Flowchart of Implementation

        Classical regression is often insufficient for explaining the dynamics of a time series, as there is additional information in the data that regression does not capture. In particular, correlation can be generated through lagged linear relations. This motivated the autoregressive (AR) model and the autoregressive moving average (ARMA) model. For non-stationary series, the ARIMA model was introduced [10].

        In ARIMA, the "autoregressive" terms are lags of the stationarized series in the forecasting equation, and the "moving average" terms are lags of the forecast errors. A time series that needs to be differenced to make it stationary is an "integrated" version of a stationary series [7]. ARIMA is a model fitted to time-series data to better understand the series or to predict its future data points.
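        To make the "integrated" part concrete, the following minimal sketch differences a hypothetical series with a quadratic trend. The series and its coefficients are invented for illustration; the point is only that one round of differencing leaves a trend, while a second round removes it.

```python
import numpy as np

# Hypothetical series with a quadratic trend: 0.5*t^2 + 2*t + 10.
t = np.arange(10, dtype=float)
x = 0.5 * t**2 + 2.0 * t + 10.0

first_diff = np.diff(x, n=1)   # still trends upward: 2.5, 3.5, 4.5, ...
second_diff = np.diff(x, n=2)  # constant 1.0, so no trend remains

print(first_diff)
print(second_diff)
```

        Each differencing pass shortens the series by one observation, so a model with d = 2 works on a series two points shorter than the original.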

        As mentioned in [7][11], an ARIMA model is written as ARIMA(p, d, q), where

        • p is the number of autoregressive (AR) terms,

        • d is the number of differences needed for stationarity (the I term), and

        • q is the number of moving average (MA) terms, i.e., the lagged forecast errors in the prediction equation.

          We use the following representation of ARIMA for forecasting the vaccination rates in various countries [11].

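          ARIMA(p, d, q):

          $$X_t = \alpha_1 X_{t-1} + \alpha_2 X_{t-2} + \beta_1 Z_{t-1} + \beta_2 Z_{t-2} + Z_t \qquad (1)$$

          where

          $$Z_t = X_t - X_{t-1} \qquad (2)$$


        • $X_t$ = forecasted values for vaccinations done on the $t$-th day,

        • $\alpha_1$, $\alpha_2$, $\beta_1$, and $\beta_2$ = parameters,

        • $Z_t$ = residual term for the $t$-th day.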
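          The recursion can be sketched in a few lines, assuming the forecasting equation has the form X_t = α1·X_{t-1} + α2·X_{t-2} + β1·Z_{t-1} + β2·Z_{t-2} + Z_t with two autoregressive and two moving average terms. The function name, the parameter values, and the history below are hypothetical; the unknown future residual Z_t is replaced by zero, which is the usual convention for a point forecast.

```python
def one_step_forecast(x_hist, z_hist, alpha, beta):
    """One-step-ahead value from the two latest observations and residuals.

    x_hist: past values [..., X_{t-2}, X_{t-1}]
    z_hist: past residuals [..., Z_{t-2}, Z_{t-1}]
    alpha:  (alpha_1, alpha_2), the autoregressive parameters
    beta:   (beta_1, beta_2), the moving average parameters
    The future residual Z_t is unknown and is set to zero.
    """
    ar_part = alpha[0] * x_hist[-1] + alpha[1] * x_hist[-2]
    ma_part = beta[0] * z_hist[-1] + beta[1] * z_hist[-2]
    return ar_part + ma_part


# Hypothetical parameters and history, chosen only to show the arithmetic.
x_next = one_step_forecast(
    x_hist=[10.0, 12.0], z_hist=[0.5, -0.2], alpha=(0.6, 0.3), beta=(0.4, 0.1)
)
print(x_next)
```

          A fitted library model estimates the parameters from data and applies the recursion to the differenced series; this sketch only shows how one forecast value is assembled from its terms.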

          For implementation, we use auto_arima from the pmdarima library for Python. The auto-ARIMA method searches for the optimal parameters of an ARIMA model and returns a single best-fit ARIMA model [9].

      3. AIC

        AIC, i.e., Akaike's Information Criterion, is used to identify the best ARIMA model. When comparing multiple models, the one with the lower AIC is the better one [12].

        In auto-ARIMA, the best-fit model is selected according to a given information_criterion. This criterion can be one of Akaike Information Criterion, Bayesian Information Criterion, Corrected Akaike Information Criterion, etc. Auto-ARIMA returns the model that minimizes the chosen criterion.
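        The idea of choosing the model with the lowest AIC can be illustrated without any forecasting library. The sketch below is not the auto-ARIMA search itself: it fits plain AR(p) models by least squares to a simulated series and scores each with the Gaussian form AIC = n·ln(RSS/n) + 2k, which omits an additive constant shared by all candidates. The simulated data, the helper name fit_ar_aic, and the candidate range are assumptions for this example.

```python
import numpy as np


def fit_ar_aic(y, p, max_p):
    """Fit AR(p) with an intercept by least squares and return a Gaussian AIC.

    Every order is fitted on the same targets y[max_p:], so the AIC values of
    different orders are directly comparable.
    """
    target = y[max_p:]
    n = len(target)
    columns = [np.ones(n)]
    for lag in range(1, p + 1):
        columns.append(y[max_p - lag : len(y) - lag])
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    rss = float(np.sum((target - design @ coef) ** 2))
    k = p + 2  # AR coefficients + intercept + noise variance
    return n * np.log(rss / n) + 2 * k


# Simulated AR(2) series (hypothetical data, fixed seed for reproducibility).
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(2, len(y)):
    y[t] = 0.6 * y[t - 1] - 0.5 * y[t - 2] + rng.normal()

max_p = 5
aic_by_order = {p: fit_ar_aic(y, p, max_p) for p in range(max_p + 1)}
best_p = min(aic_by_order, key=aic_by_order.get)
print(aic_by_order)
print("order with lowest AIC:", best_p)
```

        The 2k penalty is what keeps the criterion from always preferring the largest model: an extra lag is only accepted when it reduces the residual sum of squares enough to pay for the additional parameter.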

      4. ACF & PACF

      The ACF (autocorrelation function) and PACF (partial autocorrelation function) graphs help analyze the differencing order. The series is tested for stationarity via the Augmented Dickey-Fuller (ADF) test [11]. Auto-ARIMA conducts the differencing tests and determines the order of differencing, d [9]. The test can be specified explicitly as one of Augmented Dickey-Fuller, Phillips-Perron, or Kwiatkowski-Phillips-Schmidt-Shin.
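      As a small illustration of how the ACF hints at the need for differencing, the sketch below computes the sample autocorrelation directly with NumPy for a hypothetical trending series and for its first difference. The series is simulated and the helper name sample_acf is an assumption; a plotting library would normally draw these values as the ACF graph.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered**2)
    return np.array(
        [
            np.sum(centered[lag:] * centered[: len(x) - lag]) / denom
            for lag in range(max_lag + 1)
        ]
    )


# Hypothetical cumulative series: a steady daily increase plus noise.
rng = np.random.default_rng(1)
levels = np.cumsum(1.0 + rng.normal(scale=0.3, size=200))

acf_levels = sample_acf(levels, max_lag=10)
acf_diff = sample_acf(np.diff(levels), max_lag=10)
print(acf_levels.round(2))
print(acf_diff.round(2))
```

      The slowly decaying autocorrelations of the raw series are the typical sign of non-stationarity, while the differenced series shows values close to zero beyond lag 0.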

      The implementation of this case study is inspired by the work of [11] and [13].


      As mentioned earlier, we aim to compare the vaccination rates for countries based on their overall development status: developed, developing, and under-developed.

      1. Developed countries

        Countries that commonly appear at the top of lists of developed countries include Norway, Switzerland, Ireland, Germany, and many more.

        The experimentation graphs for Norway and Switzerland data are shown below. Fig. 2. and Fig. 5. show the ACF and PACF graphs for both countries. Fig. 3. and Fig. 6. illustrate the auto-ARIMA model summary. Fig. 4. and Fig. 7. depict the forecast of vaccination rate (red line) in Norway and Switzerland, respectively, for the upcoming 30 days.

        Fig. 2. ACF and PACF graphs for Norway vaccination data

        Fig. 3. ARIMA model summary for Norway vaccination data

        Fig. 4. Forecast of Vaccination Rate for Norway

        Fig. 5. ACF and PACF graphs for Switzerland vaccination data

        Fig. 6. ARIMA model summary for Switzerland vaccination data

        Fig. 7. Forecast of Vaccination Rate for Switzerland

      2. Developing countries

        In the list of developing countries, we find countries like India, Argentina, Brazil, China, and many more.

        The experimentation graphs for India and Argentina data are shown below. Fig. 8. and Fig. 11. show the ACF and PACF graphs for both countries. Fig. 9. and Fig. 12. illustrate the auto-ARIMA model summary. Fig. 10. and Fig. 13. depict the forecast of vaccination rate (red line) in India and Argentina, respectively, for the upcoming 30 days.

        Fig. 8. ACF and PACF graphs for India vaccination data

        Fig. 9. ARIMA model summary for India vaccination data

        Fig. 10. Forecast of Vaccination Rate for India

        Fig. 11. ACF and PACF graphs for Argentina vaccination data

        Fig. 12. ARIMA model summary for Argentina vaccination data

        Fig. 13. Forecast of Vaccination Rate for Argentina

      3. Under-developed countries

        Some of the under-developed countries include Niger, Chad, South Sudan, Mali, etc. In the dataset, there are fewer tuples for these countries. This may indicate a lack of awareness among the citizens of these countries or a limited vaccine supply.

        The experimentation graphs for Niger and South Sudan data are shown below. Fig. 14. and Fig. 16. illustrate the auto-ARIMA model summary. Fig. 15. and Fig. 17. depict the forecast of vaccination rate (red line) in Niger and South Sudan, respectively, for the upcoming 30 days.

        Fig. 14. ARIMA model summary for Niger vaccination data

        Fig. 15. Forecast of Vaccination Rate for Niger

        Fig. 16. ARIMA model summary for South Sudan vaccination data

        Fig. 17. Forecast of Vaccination Rate for South Sudan

      From the above experimentation results, we observe that for the vaccinations dataset, the auto-ARIMA function selects second-order differencing, i.e., d = 2.
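      With d = 2, the model forecasts the twice-differenced series, and those forecasts have to be accumulated twice to return to people vaccinated per hundred. The sketch below shows that inverse step under stated assumptions: the function name undifference_twice and the numbers are hypothetical, and a library model performs this conversion internally.

```python
import numpy as np


def undifference_twice(last_two_levels, second_diff_forecast):
    """Turn forecasts of the twice-differenced series back into levels.

    last_two_levels: (X_{T-1}, X_T), the final two observed values
    second_diff_forecast: forecasts of X_t - 2*X_{t-1} + X_{t-2} for t > T
    """
    prev, last = last_two_levels
    last_first_diff = last - prev
    # Accumulate once to recover first differences, again to recover levels.
    first_diffs = last_first_diff + np.cumsum(second_diff_forecast)
    return last + np.cumsum(first_diffs)


# Hypothetical numbers: the last two observed rates are 40.0 and 40.5 per
# hundred, and the daily increment is forecast to stay unchanged for 30 days.
levels = undifference_twice((40.0, 40.5), np.zeros(30))
print(levels[:3], levels[-1])
```

      A zero forecast for the second difference simply extends the most recent daily increment as a straight line, which is why a d = 2 model is sensitive to the last few observations of a short series.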

      Table 1. shows the forecast vaccination rate, in people vaccinated per hundred, for the upcoming 30 days.


      | Development Status | Countries | Forecast Vaccination Rate (people vaccinated per hundred) | Observation |
      |---|---|---|---|
      | Developed | Norway, Switzerland | 50 to 80; 50 to 65 | Increasing moderately for both |
      | Developing | India, Argentina | 27 to 35; 65 to 75 | Slow increase for India, rapid increase for Argentina |
      | Underdeveloped | Niger, South Sudan | -20 to 55; -2 to 4.5 | Very slow for both |


      In this work, we forecasted the vaccination rates for some of the developed, developing, and underdeveloped countries for the upcoming 30 days.

      From the experiments, we can conclude that the vaccination rate for developed countries is increasing at a steady pace. The average forecasted rate is 50 to 80 people per hundred vaccinated in the next 30 days.

      For developing countries, the average forecasted rate is 30 to 60 people per hundred vaccinated in the upcoming month. With this, we can understand the level of social awareness among the citizens.

      However, for underdeveloped countries, we have less data on vaccinations. Due to this, the vaccination rate cannot be accurately forecasted, and the forecast covers a wide range of possibilities, from -10 to 50 people per hundred vaccinated in the next 30 days on average. This may also imply that the governments of these countries need to spread awareness among their citizens and supply them with an ample quantity of vaccines.


As future research, the model can be extended to forecast vaccination rates using other attributes provided in the dataset. The sources [1][2] also have an explicit US vaccinations dataset, which could be used for similar purposes. Different models can also be applied to the same problem statement and their performance compared.


  1. Ritchie, H., Ortiz-Ospina, E., Beltekian, D., Mathieu, E., Hasell, J., Macdonald, B., Giattino, C., Appel, C., Rodés-Guirao, L. and Roser, M., 2021. Coronavirus Pandemic (COVID-19). [online] Our World in Data. Available at: <https://ourworldindata.org/coronavirus-source-data> [Accessed 20 July 2021].

  2. Mathieu, E., Ritchie, H., Ortiz-Ospina, E. et al. A global database of COVID-19 vaccinations. Nat Hum Behav (2021). https://doi.org/10.1038/s41562-021-01122-8.

  3. Sahisnu, Julian Satya, et al. Vaccine Prediction System Using ARIMA Method. ICIC International, 2020. DOI.org (CSL JSON), https://doi.org/10.24507/icicelb.11.06.567.

  4. Vithalrao Kendre, Varsharani, et al. Forecasting Measles Vaccine Requirement By Using Time Series Analysis. Journal of Evolution of Medical and Dental Sciences, vol. 6, no. 28, Apr. 2017, pp. 2329-33. DOI.org (Crossref), doi:10.14260/Jemds/2017/501.

  5. Singh, Saswat, et al. Time Series Analysis of COVID-19 Data to Study the Effect of Lockdown and Unlock in India. Journal of The Institution of Engineers (India): Series B, Apr. 2021. Springer Link, doi:10.1007/s40031-021-00585-7.

  6. Tuli, Shreshth, et al. Predicting the Growth and Trend of COVID-19 Pandemic Using Machine Learning and Cloud Computing. Internet of Things, vol. 11, Sept. 2020, p. 100222. DOI.org (Crossref), doi:10.1016/j.iot.2020.100222.

  7. People.duke.edu. 2021. Introduction to ARIMA models. [online] Available at: <https://people.duke.edu/~rnau/411arim.htm> [Accessed 20 July 2021].

  8. R. Adhikari and R. K. Agrawal, An introductory study on time series modeling and forecasting, arXiv Prepr. arXiv1302.6613, 2013.

  9. Alkaline-ml.com. 2021. pmdarima.arima.AutoARIMA, pmdarima 1.8.2 documentation. [online] Available at: <https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.AutoARIMA.html> [Accessed 20 July 2021].

  10. R. H. Shumway and D. S. Stoffer, Time Series Analysis and Its Applications: With R Examples, Springer, 2017.

  11. H. Tandon, P. Ranjan, T. Chakraborty, V. Suhag, Coronavirus (covid-19): Arima based time-series analysis to forecast near future. arXiv:2004.07859 (2020)

  12. Keshvani, A. and Keshvani, V., 2021. Using AIC to Test ARIMA Models. [online] CoolStatsBlog. Available at: <https://coolstatsblog.com/2013/08/14/using-aic-to-test-arima-models-2/> [Accessed 20 July 2021].

  13. Benny, D., 2021. COVID-19 Vaccination Rate | Analysis of COVID-19 Vaccination Rate. [online] Analytics Vidhya. Available at: <https://www.analyticsvidhya.com/blog/2021/04/time-series-analysis-forecast-covid-19-vaccination-rate/> [Accessed 20 July 2021].

  14. Benvenuto, D., Giovanetti, M., Vassallo, L., Angeletti, S., & Ciccozzi, M. (2020). Application of the ARIMA model on the COVID-2019 epidemic dataset. Data in brief, 29, 105340. https://doi.org/10.1016/j.dib.2020.105340

  15. Lai, D. (2005). Monitoring the SARS Epidemic in China: A Time Series Analysis. Journal of Data Science, 3(3), 279-293. doi:10.6339/JDS.2005.03(3).229
