A Linear Study of the Spread of COVID-19 in China and Iran
Studying trends in observed rates provides valuable information for needs assessment, program planning, and the development indicators of each country. The purpose of the present study was to apply a regression model and the Fourier series to predict trends in the growth and mortality rate of coronavirus disease.
Material and Methods:
In this study, two linear analysis methods were used to predict the incidence and mortality rate of coronavirus disease in Iran and China: linear regression and the Fourier transform. The data were collected from the official media of the two countries and take the form of a time series of incidence and mortality over recent days. The models were used to estimate incidence and mortality for the coming days and were implemented in Python version 3.7.
The results of this study show that the rates of coronavirus disease incidence and mortality are still increasing. The Fourier transform-based method is more accurate than linear regression, and the accuracy of both algorithms was much higher for predicting mortality than for predicting incidence. This indicates that mortality has followed a more nearly linear course over time than incidence. Furthermore, although linear methods are well suited to predicting the near future, the growth curve of epidemic diseases is nonlinear, so linear methods cannot be used to predict incidence and mortality at distant times.
The accuracy of the mathematics-based methods for predicting the trajectory of COVID-19 was high. We predicted that the incidence of COVID-19 would remain high over the following 10 days. If the data are reliable and there are no second transmissions, we can accurately predict the transmission dynamics of COVID-19 across the cities of China and Iran. Mathematics-inspired methods are a powerful tool for supporting public health planning and policymaking.
One of the most important components in any community health planning is to determine the incidence of diseases in that community. Knowing the pattern of changes in disease incidence in each country can be of great importance for national planning.
Public health organizations hold that it is important to monitor trends in disease incidence, mortality, and the health risk factors that may contribute to adverse health events. Tracking changes in observed incidence or prevalence rates provides valuable information for needs assessment, planning, program review, and the development indicators of each country. Studying data over time can also help predict the magnitude and frequency of future events.
In December 2019, the COVID-19 virus, which passed from animals to humans in China, caused an outbreak of respiratory illness.
Iran reported its first confirmed cases of SARS-CoV-2 infection on 19 February 2020 in Qom. The pneumonia caused by the novel coronavirus disease (COVID-19) is highly infectious, and the ongoing outbreak has been declared a global public health emergency by the WHO. Coronaviruses are a group of viruses that cause diseases in mammals and birds. In humans, coronaviruses cause respiratory tract infections that are usually mild, like the common cold, though rarer forms such as SARS, MERS, and COVID-19 can be lethal.
According to ArcGIS, 105,427 confirmed cases and 3,583 deaths have been registered in around 101 countries. Among these countries, China, South Korea, Italy, Iran, and Germany have the most confirmed cases.
Piecewise linear regression, also called two-phase or multi-phase regression, 3-step regression, and broken-line regression, is a method of regression analysis in which the independent variable is divided into intervals and a separate regression line is fitted to each interval. The boundaries between the intervals are called breakpoints. Piecewise regression is used to describe breakpoints in disease mortality and incidence rates.
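The interval-by-interval fitting described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the cited studies: it assumes a single breakpoint, fits each segment independently by ordinary least squares (so the two lines are not forced to meet), and picks the split with the lowest total squared error. The function names `fit_line` and `best_breakpoint` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def best_breakpoint(xs, ys, min_points=3):
    """Try every split position and keep the one with the lowest total squared error."""
    best = None
    for k in range(min_points, len(xs) - min_points + 1):
        left = fit_line(xs[:k], ys[:k])    # segment before the breakpoint
        right = fit_line(xs[k:], ys[k:])   # segment from the breakpoint onwards
        total_sse = left[2] + right[2]
        if best is None or total_sse < best[0]:
            best = (total_sse, k, left, right)
    return best


# Hypothetical series (not real data): slow growth for 10 days,
# then a jump followed by faster growth.
days = list(range(20))
counts = [2 * d if d < 10 else 25 + 10 * (d - 10) for d in days]

total_sse, k, left, right = best_breakpoint(days, counts)
print("second segment starts at day", days[k])
print("slopes:", round(left[1], 3), round(right[1], 3))
```

An exhaustive search like this is only practical for one or two breakpoints. Choosing the number of breakpoints is a separate problem, which is what the permutation tests mentioned later in this section address.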
There are several predictive methods [6, 7], one of which is linear regression, a useful tool for describing trends in data, especially data on disease incidence or mortality. Because the pattern of disease in a country changes from year to year, mathematical models can often help in studying these trends and provide new insights into medical issues.
The application of intelligent technologies in clinical decision making has started to play a vital role in improving patients' quality of life and in reducing the cost and workload of their daily healthcare.
A 2003 study by Lau, Gange, et al. used piecewise regression to examine the trend of decreasing lymphocyte counts and hemoglobin concentrations as a precursor to AIDS in men with HIV infection. Piecewise regression has had many applications in the medical sciences, biology, animal and plant sciences, and genetics. In 2000, Kim addressed the problem of estimating the number of change points in a piecewise regression model by applying a series of permutation tests to determine the number of unknown change points.
In this paper, a model that couples the fast Fourier transform with machine learning is adopted to predict the number of patients with coronavirus disease over a ten-day period. The purpose of this article is to predict the growth rate and mortality rate of patients with coronavirus disease using linear and time series regression.
MATERIAL AND METHODS
The approach combines two popular models: linear regression (LR) and the fast Fourier transform (FFT). In linear regression, the relationships are modeled using linear predictor functions whose unknown parameters are estimated from the data. Such models are called linear models. An FFT is an algorithm that computes the discrete Fourier transform (DFT) of a sequence, or its inverse (IDFT). Fourier analysis converts a signal from its original domain (often time or space) to a representation in the frequency domain and vice versa [8, 11, 12].
The prediction of the incidence of coronavirus disease in different countries is studied using a regression model and the Fourier series. The dataset is a time series of the number of officially confirmed deaths reported on the WHO web site (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports). In this paper, we use linear regression and Fourier analysis to make predictions for the next ten days.
Because linear regression is not well suited to data whose growth is exponential, the fast Fourier transform is also used. The main purpose of the study is to investigate the coupling of the fast Fourier transform with linear regression in predicting COVID-19. In this section, an overview of the architecture of the system is presented first, followed by a detailed discussion of the fast Fourier transform and linear regression, the two major technical components of the system.
Linear regression (LR)
Linear regression was the first type of regression analysis to be studied rigorously and to be used extensively in practical applications. This is because models that depend linearly on their unknown parameters are easier to fit than models that are non-linearly related to their parameters, and because the statistical properties of the resulting estimators are easier to determine.
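As a minimal sketch of how such a model can be fitted and extended ten days ahead, the following Python code estimates the intercept and slope by ordinary least squares and evaluates the fitted line on future days. The function names `fit_line` and `forecast_linear` and the counts are hypothetical illustrations, not the implementation or data used in this study.

```python
def fit_line(days, counts):
    """Least-squares estimates (intercept, slope) for counts = intercept + slope * day."""
    n = len(days)
    mean_d = sum(days) / n
    mean_c = sum(counts) / n
    sxy = sum((d - mean_d) * (c - mean_c) for d, c in zip(days, counts))
    sxx = sum((d - mean_d) ** 2 for d in days)
    slope = sxy / sxx
    return mean_c - slope * mean_d, slope


def forecast_linear(days, counts, horizon=10):
    """Evaluate the fitted line on the `horizon` days after the last observed day."""
    intercept, slope = fit_line(days, counts)
    last_day = days[-1]
    return [intercept + slope * (last_day + h) for h in range(1, horizon + 1)]


# Hypothetical cumulative counts for ten consecutive days (not real data).
days = list(range(1, 11))
counts = [120, 160, 230, 270, 330, 390, 430, 500, 540, 610]

predictions = forecast_linear(days, counts, horizon=10)
print([round(p) for p in predictions])
```

The closed-form estimates keep the sketch dependency-free. A library routine such as `numpy.polyfit(days, counts, 1)` would return the same slope and intercept.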
Fast Fourier Transform (FFT)
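The FFT is an efficient technique for computing the discrete Fourier transform (DFT) and its inverse. Like the wavelet transform, it can be used as a windowing technique. The DFT decomposes the input data sequence in order to extract frequency information for the purpose of predicting the patient's condition one day in advance [12, 14]. In this paper the following Fourier formulas were used (Eq1, Eq2):

$$a_n = \frac{2}{P}\int_{P} s(x)\cos\left(\frac{2\pi n x}{P}\right)dx \qquad \text{(Eq1)}$$

$$b_n = \frac{2}{P}\int_{P} s(x)\sin\left(\frac{2\pi n x}{P}\right)dx \qquad \text{(Eq2)}$$

Here the integrals are taken over an interval of length $P$. If $s(x)$ is $P$-periodic, then any interval of that length is sufficient.
For $n = 0$, $a_0$ and $b_0$ can be reduced to $a_0 = \frac{2}{P}\int_{P} s(x)\,dx$ and $b_0 = 0$.
Many texts choose $P = 2\pi$ to simplify the argument of the sinusoid functions (Eq3):

$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} s(x)\cos(nx)\,dx, \qquad b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} s(x)\sin(nx)\,dx \qquad \text{(Eq3)}$$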
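The text does not spell out how the transform is carried beyond the observed window, so the following Python sketch shows one common way to do it. It assumes a least-squares linear trend is removed first, the residual is transformed, only the lowest few harmonics are kept, and the resulting sinusoids are evaluated at future time steps before the trend is added back. The function names, the `n_harmonics` parameter, and the data are hypothetical. For brevity the transform is computed directly in O(n^2) time; an FFT routine returns the same coefficients faster.

```python
import cmath
import math


def linear_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (intercept, slope)."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    stt = sum((t - mean_t) ** 2 for t in range(n))
    stv = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    slope = stv / stt
    return mean_v - slope * mean_t, slope


def dft(values):
    """Direct discrete Fourier transform; an FFT computes the same coefficients."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]


def fourier_extrapolate(values, horizon, n_harmonics=3):
    """Return fitted values for the observed days followed by `horizon` forecast days."""
    n = len(values)
    intercept, slope = linear_trend(values)
    residual = [v - (intercept + slope * t) for t, v in enumerate(values)]
    coeffs = dft(residual)
    # Keep each low frequency together with its conjugate partner (index n - k),
    # so that the reconstructed signal stays real-valued.
    keep = [k for k in range(n) if min(k, n - k) <= n_harmonics]
    result = []
    for t in range(n + horizon):
        wave = sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in keep) / n
        result.append(intercept + slope * t + wave.real)
    return result


# Hypothetical daily counts (not real data): a linear trend plus a 7-day oscillation.
observed = [50 + 5 * t + 10 * math.sin(2 * math.pi * t / 7) for t in range(28)]
extended = fourier_extrapolate(observed, horizon=10, n_harmonics=4)
print([round(v, 1) for v in extended[-10:]])
```

Because the retained sinusoids are periodic in the window length, the forecast repeats the de-trended pattern of the observed window. This is one reason such a forecast is most plausible only a few days ahead.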
The data analyzed in this article were collected from the official statistics of two countries, China and Iran, up to 7 March 2020. All results and discussion are therefore based on the properties of these data, which take the form of a time series. Fig 1 to 4 show the predicted incidence of coronavirus disease over the next 10 days.
Fig 1 to 4 show the growth of COVID-19 in both Iran and China. Over the same time intervals, China's rate of growth is much higher than Iran's, which could be due to China's large population or to the difference in when the outbreak began in the two countries. The outbreak in Iran started two months later, so Iran can be said to be at the beginning of its growth curve. On the other hand, because the outbreak began in China, the resulting awareness has been very effective in reducing the rate of infection in other countries.
As is clear from these graphs, the rate is predicted to increase linearly over the next ten days. Given the high prevalence of infection in China in recent days, the implemented model predicts a higher incidence of infection in China. What is evident from testing the linear model is that the prediction performs very well for the nearest days and that the error rate increases rapidly from the third day onwards. The reason for this is the non-linear nature of the growth of epidemic diseases. Both the Fourier transform method and linear regression predict the next day with high precision, and the forecast can be advanced day by day. The Fourier transform method was more accurate than linear regression, and its error rate was relatively lower for the following days.
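The growth of the error with the forecast horizon can be reproduced on synthetic data. The sketch below is an illustration under assumptions, not the evaluation performed in this study: it fits a straight line to the first ten days of a hypothetical series that grows 15% per day and reports the absolute error of the line for each of the next ten days. The function names and the growth rate are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates (intercept, slope) for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def errors_by_horizon(series, train_len, horizon):
    """Absolute error of a straight-line forecast for each of the next `horizon` days."""
    intercept, slope = fit_line(list(range(train_len)), series[:train_len])
    errors = []
    for h in range(1, horizon + 1):
        t = train_len - 1 + h  # index of the day being forecast
        errors.append(abs(series[t] - (intercept + slope * t)))
    return errors


# Hypothetical epidemic-like series growing 15% per day (not real data).
series = [100 * 1.15 ** t for t in range(20)]
errors = errors_by_horizon(series, train_len=10, horizon=10)
for h, err in enumerate(errors, start=1):
    print(f"day +{h}: absolute error {err:.1f}")
```

On a convex series like this one, the straight line falls further below the data with every additional day. Refitting each day and forecasting only one day ahead keeps the model on the short horizons where the error is smallest.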
Fig 5 to 8 show the prediction of mortality rates for coronavirus disease in the next ten days using either Fourier transform or linear regression.
The diagrams in Fig 5 to 8 show an increase in mortality from coronavirus disease. By comparison, the death rate in China is higher than that in Iran, which may be influenced by several factors. The important point in these graphs is that the models had much higher prediction accuracy for the first and second days than for later days, and that the accuracy of the linear models was far better for mortality than for incidence.
Examination of the mortality charts shows that, unfortunately, mortality has increased linearly over time and that, despite the awareness of many communities, its growth rate has not yet slowed. The forecasts show that China, after a period of exponential growth in deaths from COVID-19, has seen a slowdown in its growth rate. Iran, for several reasons that may include its relative readiness to cope with the disease at the beginning of the outbreak, has shown linear behavior from the start, and this linear increase continues. The mortality rate of the disease in Iran is expected to be much lower than in China over the same period.
The results showed that linear models can predict the incidence and mortality of coronavirus for the near future. Over a long period of time, these models are unable to predict values acceptably close to the real ones. FFT predicts with better accuracy than linear regression. Both FFT and linear regression predicted that the growth rate of incidence and mortality is higher in China than in Iran. This may have several causes. Coronavirus reached Iran two months after China, so Iran was better prepared than China had been when it encountered its first infected case.
Population density, public health, public access to medical centers, and the capacity of the government to confront the coronavirus all affect the results. This means that some factors behave linearly and others change the results in a nonlinear way.
According to the results, and based on past reports, the numbers of cases and deaths will grow rapidly. A rapid and effective way is needed to reduce or stop the growth in the incidence and mortality of coronavirus. Staying at home and strict rules must be applied alongside the diagnosis and treatment of infected patients.
This paper presents two linear modeling methods, the Fourier transform and linear regression, for predicting the incidence and mortality rate of coronavirus disease in both China and Iran.
The results show that linear methods can predict the incidence and mortality of coronavirus disease in the near future with good accuracy. The overall growth curve of epidemic diseases such as COVID-19 is non-linear for both cases and deaths, so linear methods work correctly only for short prediction horizons. When the time interval is shortened sufficiently, the curve behaves more like a linear model, and prediction with linear methods therefore becomes possible with good accuracy. Nonlinear methods are proposed for predicting the incidence and mortality of coronavirus disease over longer periods.
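The claim that a short enough interval looks linear can be made explicit with a first-order Taylor expansion. As an assumption for illustration only, since the text does not specify a growth model, suppose the count grows exponentially, $y(t) = y_0 e^{rt}$, with a hypothetical daily growth rate $r$, and let $h$ be the number of days ahead of the last observation $t_0$:

```latex
\begin{align*}
y(t_0 + h) &= y(t_0)\, e^{rh}
            = y(t_0)\left(1 + rh + \frac{(rh)^2}{2} + \cdots\right), \\
\hat{y}(t_0 + h) &= y(t_0)\,(1 + rh)
            \qquad \text{(linear prediction from the local slope)}, \\
\frac{y(t_0 + h) - \hat{y}(t_0 + h)}{y(t_0 + h)}
           &= 1 - (1 + rh)\, e^{-rh} \approx \frac{(rh)^2}{2}
            \qquad \text{for small } rh .
\end{align*}
```

Under this assumption the relative error of the linear prediction grows roughly with the square of the horizon, so it is small for the first few days and increases quickly afterwards.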
The authors agree on the final form of this manuscript and attest that all authors contributed to the final draft.
CONFLICTS OF INTEREST
The authors declare no conflicts of interest regarding the publication of this study.
No financial interests related to the material of this manuscript have been declared.