Data Science and COVID-19: predicting the spread! (using R)

No one is safe from the dreaded COVID-19 nowadays. We did not want to write about it, but there is so much to analyse, predict and share that we are going to bring the website back to life to provide you with the latest trends, forecasts and data analyses, hoping that this pandemic will come to an end soon.

Yesterday, Imperial College London (ICL) published a very interesting article that uses data from different countries to build several predictive models and show the possible scope of the spread of this virus. For our part, we asked a data analyst to create our own model for one of the countries that is suffering the most, Spain.

The original paper from the ICL explains the measures that the countries have taken, some of which are common to all of them. Social distancing, bans on public events and even complete lockdowns are some of the measures being applied to prevent the spread of COVID-19. These measures are proving effective, although the increase in the number of cases is still concerning for some European countries like Spain and Italy, whose populations are older than those of countries like Switzerland, South Korea or China.

One of the most interesting parts of the ICL’s research paper is the comparison of the estimated number of deaths per day with and without intervention. The numbers speak for themselves: ignoring the extremely fast reproduction rate of the virus and its considerable lethality would have led to several thousand deaths every single day.

Fig. 1: the possible effects of absence of intervention in Spain

Our data analyst from Spain, Sergio Roldán, knows the situation in this country first-hand. With the information from this paper and his own research based on official data from the Government of Spain, he has created a predictive model for the next three weeks, by which time the virus is expected to stop its rapid spread thanks to the extraordinary measures and the hard work of healthcare professionals and researchers. By analysing the time series data for COVID-19 in Spain, he is able to give readers a forecast of the expansion of this virus.
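This post does not spell out which model Sergio used, so what follows is only a minimal sketch of one simple time-series approach, a random walk with drift, written in Python. The function name `drift_forecast`, the toy numbers and the simplified interval formula are all our own assumptions for illustration; this is not the model behind the figures in this article.

```python
from statistics import NormalDist, stdev


def drift_forecast(series, horizon, level=0.95):
    """Random walk with drift: extend the last value by the average daily change.

    Returns a list of (step, mean, lower, upper) tuples. The interval is a
    simplified normal approximation that ignores uncertainty in the drift itself.
    """
    diffs = [later - earlier for earlier, later in zip(series, series[1:])]
    drift = sum(diffs) / len(diffs)           # average daily increase
    sigma = stdev(diffs)                      # spread of the daily increases
    z = NormalDist().inv_cdf(0.5 + level / 2)
    rows = []
    for step in range(1, horizon + 1):
        mean = series[-1] + step * drift
        half_width = z * sigma * step ** 0.5  # uncertainty grows with the horizon
        rows.append((step, mean, mean - half_width, mean + half_width))
    return rows


# Hypothetical cumulative totals, made up for illustration (NOT real Spanish data).
toy_totals = [100, 130, 150, 190, 220, 260]

forecast_rows = drift_forecast(toy_totals, horizon=3)
for step, mean, lower, upper in forecast_rows:
    print(f"day +{step}: {mean:.0f} (95% interval {lower:.0f} to {upper:.0f})")
```

The point to take away is that the band around the central estimate widens with every extra day, which is why a three-week forecast of total cases can span such a broad range.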

Fig. 2: Estimated number of infections (total) in Spain from Mar 31 to Apr 23, 2020

As we can see in the graph, there are two “main” scenarios for the pandemic. The first one, which we will call “pessimistic,” gives us an incredibly high (although possible) figure for total infections by April 23, 2020, when the infection rate is expected to drop significantly. In the second, “optimistic” scenario, the number of daily infections falls gradually and eventually reaches 0 in late April or early May 2020.

Chart: estimated number of total infections and confidence intervals

The first column, “Forecast”, is the mean value of the prediction. The “Lo” and “Hi” columns give the lower and upper bounds of the intervals at two confidence levels, 80% and 95%. At the higher confidence level (95%), the interval is wider and the numbers vary between a scary 95,075 and a whopping 353,688 total cases. At the lower confidence level (80%), the bounds move closer to each other, giving an estimate of between 139,833 and 308,931 total cases.
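To see how the two levels relate, here is a small worked example in Python. It assumes the intervals are symmetric and normal-based, which is a common default in forecasting tools but is not confirmed for the model used here. Starting only from the 95% bounds quoted above, it recovers the implied mean and standard error and then derives the 80% bounds. The helper `z_score` and the variable names are ours.

```python
from statistics import NormalDist

# 95% bounds quoted in the article for total cases by April 23, 2020.
lo95, hi95 = 95_075, 353_688


def z_score(level):
    """Two-sided normal quantile, e.g. about 1.96 for level=0.95."""
    return NormalDist().inv_cdf(0.5 + level / 2)


# Assumption: the interval is symmetric, so the mean sits at its midpoint.
point_forecast = (lo95 + hi95) / 2
std_error = (hi95 - lo95) / (2 * z_score(0.95))

# Narrow the same distribution down to the 80% level.
lo80 = point_forecast - z_score(0.80) * std_error
hi80 = point_forecast + z_score(0.80) * std_error

print(f"implied mean forecast: {point_forecast:,.1f}")
print(f"derived 80% interval: {lo80:,.1f} to {hi80:,.1f}")
```

Under that assumption, the derived 80% bounds land within about one case of the 139,833 and 308,931 in the chart, so the two intervals are simply two slices of the same bell curve around a mean of roughly 224,000 total cases.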

So far, the predictions have been very accurate, but we must note that the lack of tests, and the resulting inability to track every case accurately, can make these figures misleading. The original research paper from the ICL estimates that up to 15% of the population in Spain could be infected, including those who are currently asymptomatic or have very mild symptoms.

In future articles, we will discuss the effects of COVID-19 on the global economy and the delicate situation of the European Union as a result, as well as some estimates for macro bullets and indicators. Until then, stay safe and stay Economic.

This article was brought to you by The Economic Man and Sergio Roldán Fernández (LinkedIn post). You can read the full research paper from Imperial College London here. Props to my good friend Emma for proofreading this article 🙂
