Introduction.
"One of the certainties about the covid-19 pandemic is that little is known about it. With the progress of the epidemic in several countries, experiments, trials, more encompassing and less unreliable data, together with ideas and innovations from the biological and medical sciences start to generate knowledge on the specific dynamics of the virus. In FGV IIU (2020) a pledge is made for more, actually plenty of, analyses of the data, of all sorts and approaches, in order to produce insights and stylised facts that may help in public policy designs to tackle the different present and future impacts of the catastrophe.
In a pioneering paper, John Tukey set out the seminal ideas of looking at data per se, without the goal of building models or testing sophisticated hypotheses. The basic goal, Tukey suggested, should be to use simple techniques to open routes or paths, to be further developed later through more sophisticated approaches. This gave rise to the field known as Exploratory Data Analysis (EDA), with several independent and creative developments since. The enormous increase in computing power, and the related algorithms for dealing with extremely large databases, later gave rise to what came to be called the Big Data area and its corresponding analytical procedures, included by many in the Artificial Intelligence (AI) field, for examining very large quantities of data.
Unaware of the principles and purposes of EDA, some believe that Big Data combined with AI's methods has replaced its original procedures, oftentimes (misleadingly) called small sample approaches. Reality is, however, more complex, and the main principles have not changed. In Taleb's terminology, techniques, whether for small or big data, that are valid in Mediocristan (the realm of the normal distribution and of all those that cannot be considered too far from it) will not work for sets that follow other, more exotic distributions, such as heavy-tailed ones, and both approaches face the same need to develop strategies for these so-called odd distributions. Also, insights are sometimes deeper when extracted from smaller samples and then tested on different data sets of similar size. In short, rather than opposites, the two methodologies should be used as complements. Last but not least, the dubious quality of data on the epidemic, extremely dependent on the way measurements are made, puts an additional caveat on the use of more sophisticated or assumption-demanding models.
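To make the Mediocristan point concrete, the following minimal sketch (not taken from the paper, and using simulated rather than covid-19 data) contrasts the running sample mean of normal draws, which settles quickly, with that of Cauchy draws, a heavy-tailed case in which the mean does not even exist and the running average never stabilises.

```python
# Illustrative sketch only: simulated data, not covid-19 series.
import numpy as np


def running_mean(x):
    """Cumulative average after 1, 2, ..., n observations."""
    return np.cumsum(x) / np.arange(1, len(x) + 1)


rng = np.random.default_rng(42)
n = 100_000

normal_draws = rng.normal(size=n)           # "Mediocristan": light tails
cauchy_draws = rng.standard_cauchy(size=n)  # heavy tails: mean is undefined

print("normal, last running means:", running_mean(normal_draws)[-3:])
print("cauchy, last running means:", running_mean(cauchy_draws)[-3:])
```

Techniques that implicitly rely on such stabilisation, whatever the sample size, are the ones at risk when the underlying distribution is heavy tailed.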
In this paper, simple techniques, in a true EDA spirit, are used to try to find clues in the available data on the pandemic that would signal that a given policy package is succeeding and when its end is likely to come. In a more ambitious attempt, some inkling of the probable length of time until the epidemic could be considered under full control is also offered.
This is most crucial, as more flexible confinement rules are now dearly needed, but the legitimate fear of applying them too soon heightens, and sometimes biases, the debate on this decision.
Section 2 explores the (daily) data on new infected cases, while section 3 explores those related to (daily) deaths attributed to covid-19. In both, the publicly available files from worldometers.info have been used. Section 4 wraps up what could be extracted from the previous analyses in terms of practical procedures, while section 5 concludes with a critical view of the evidence gathered and suggestions for further pursuits. No panacea or magic solution is presented; rather, two tracking mechanisms that could act as a supplementary aid in gauging how controlled the epidemic is.
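As a hint of the kind of simple exploration applied to such daily series, the sketch below reads a daily new-cases file, smooths the weekly reporting artifacts with a 7-day rolling mean, and looks at the day-over-day growth of the smoothed series. The file name and column names ("date", "new_cases") are assumptions for illustration; the paper's actual tracking mechanisms are developed in sections 2 and 3.

```python
# Hypothetical sketch of a simple EDA pass over a daily new-cases series,
# e.g. one exported from worldometers.info; file and column names are assumed.
import pandas as pd

df = pd.read_csv("daily_cases.csv", parse_dates=["date"]).set_index("date")

# Smooth weekly reporting artifacts with a 7-day rolling mean, then use the
# day-over-day growth of the smoothed series as a rough sign of (de)acceleration.
smoothed = df["new_cases"].rolling(7).mean()
growth = smoothed.pct_change()

print(smoothed.tail())
print(growth.tail())
```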
As in the old Middle Eastern classic, at least one thousand and one hundred analyses are needed to unveil all the relevant patterns of this pandemic and to avoid death; here is simply one of them.