Designing an intelligent system for diagnosing the type of sleep apnea and determining its severity
Sleep apnea syndrome can be considered one of the most serious sleep disorders. Because awareness of this disease is limited, it has been identified as the cause of many unexpected deaths. As the number of patients with this disease grows worldwide, many patients suffer from its complications. Most of them are not treated because the polysomnography (PSG) diagnostic procedure is complex, costly and time-consuming.
Material and Methods:
This descriptive-analytical study was performed on 50 patients referred to the sleep clinic of Imam Khomeini Hospital in Tehran. A system was designed and developed to detect sleep apnea and its severity using ECG signals, RR intervals and airflow. The system was built in MATLAB 2016 using the random forest algorithm, whose inputs were 8 nonlinear time-frequency features extracted from the airflow and ECG signals and 10 nonlinear features of the RR intervals.
The accuracy for normal, obstructive, central and mixed apnea was 95.3%, 97.92%, 99.60% and 97.29%, respectively, and the accuracy for detecting normal, mild, moderate and severe apnea was 96%, 94%, 94% and 96%, respectively. According to these results, the proposed system can correctly classify the type and severity of sleep apnea.
Sleep apnea is a disorder in which breathing repeatedly stops and restarts during sleep. Each complete cessation of breathing lasting at least 10 seconds during sleep is called an apnea, and in severe cases it can recur hundreds of times in a single night. A 25% to 50% reduction in airflow during respiration, associated with a marked drop in oxygen saturation, is called hypopnea. The disorder manifests as central respiratory failure in 36% of patients, obstructive apnea in 12% of patients, and a combination of abnormal central and obstructive breathing in the remaining cases. It is more prevalent in men, older people, people with a high waist-to-hip ratio, large neck circumference or high body mass index, and cigarette smokers [5-8]. Untreated obstructive sleep apnea can increase the risk of hypertension, heart failure, kidney failure and stroke. Obstructive sleep apnea also disrupts glucose metabolism and increases the chance of developing type 2 diabetes, impaired glucose tolerance and insulin resistance. Polysomnography (PSG) is currently the standard tool for diagnosing apnea. In this method, sensors attached to the patient's body record various activities, such as the electroencephalogram, electrooculogram, electromyogram, electrocardiogram, oximetry, airflow and respiratory activity, and provide important diagnostic information [14-16]. Based on the analysis of these signals, the Apnea–Hypopnea Index (AHI) is calculated for each patient as the total number of apneas and hypopneas divided by the total sleep time in hours. This index is used to determine the severity of the disease.
According to the criteria of the American Academy of Sleep Medicine, AHI < 5 is considered normal, while 5 ≤ AHI < 15, 15 ≤ AHI < 30 and AHI ≥ 30 indicate mild, moderate and severe disease, respectively. PSG is a very expensive and time-consuming test: it requires the patient to stay overnight in a sleep laboratory and a specialist to perform and interpret it, so it cannot be performed everywhere. Over the last 20 years, many efforts have been made to diagnose the disease using fewer signals than PSG. Alvarez et al used spectral analysis of the electroencephalogram and oxygen saturation signals with a linear regression (LR) method to diagnose obstructive apnea. Khandoker et al extracted RR-interval and ECG-derived respiration (EDR) features from 125 ECG recordings and used a support vector machine (SVM) to diagnose apnea. Koley et al used the SpO2 signal and a binary classifier to detect apnea-hypopnea events online. Wenming Yang used a long short-term memory (LSTM) neural network based on airflow signals to detect apnea-hypopnea events. The aim of the present study was to design and develop an intelligent system that uses airflow and ECG signals to determine the type of apnea and the AHI value, at a lower cost than PSG and without its disadvantages. Simple features with a low computational load were used: a set of features was extracted from discrete wavelet transform coefficients and used as input to a random forest algorithm that diagnoses the type of apnea and calculates its severity.
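As a minimal sketch of the index and the cut-offs above (the function names are hypothetical and not part of the described system), the AHI can be computed from event counts and total sleep time and then mapped to a severity class using the 5, 15 and 30 events-per-hour boundaries:

```python
def apnea_hypopnea_index(n_apneas: int, n_hypopneas: int,
                         total_sleep_hours: float) -> float:
    """AHI = (number of apneas + number of hypopneas) / total sleep time in hours."""
    if total_sleep_hours <= 0:
        raise ValueError("total sleep time must be positive")
    return (n_apneas + n_hypopneas) / total_sleep_hours


def severity_from_ahi(ahi: float) -> str:
    """Map an AHI value (events per hour) to the AASM severity classes."""
    if ahi < 5:
        return "normal"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"


# Hypothetical night: 96 apneas and 48 hypopneas over 8 hours of sleep.
ahi = apnea_hypopnea_index(96, 48, 8.0)
print(ahi, severity_from_ahi(ahi))  # 18.0 moderate
```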
MATERIAL AND METHODS
The study is descriptive-developmental. This section describes the steps of the proposed apnea diagnosis algorithm; Fig 1 illustrates the proposed method.
Data recorded at the sleep clinic of Imam Khomeini Hospital in Tehran were used in this study: 50 ECG and airflow recordings from 34 men and 16 women aged 32 to 64 years with BMI between 24 and 35, with sleep durations of 7 to 9 hours, sampled at 100 Hz with 16-bit resolution, whose diagnoses were already known. Of the 50 cases, 10 were normal (AHI < 5), 13 were mild (5 ≤ AHI < 15), 17 were moderate (15 ≤ AHI < 30) and 9 were severe (AHI ≥ 30).
Preprocessing was performed on the signals. Noisy segments were first identified manually and removed. Power-line interference and baseline wander were then filtered out using Chebyshev Type II and Butterworth band-pass filters.
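A minimal preprocessing sketch for the ECG channel in Python with SciPy follows. The cut-offs are illustrative assumptions, since the text does not give them (a 0.5-40 Hz Butterworth band-pass and a Chebyshev Type II low-pass with a 45 Hz stopband edge), and zero-phase filtering is also assumed. Assuming 50 Hz mains, the interference frequency coincides with the Nyquist frequency at the 100 Hz sampling rate, so a low-pass stopband just below it is used here instead of a notch. The airflow channel, whose breathing content lies well below 1 Hz, would need different cut-offs.

```python
import numpy as np
from scipy import signal

FS = 100.0  # sampling rate of the recordings in this study (Hz)


def preprocess_ecg(x: np.ndarray, fs: float = FS) -> np.ndarray:
    """Remove baseline wander and high-frequency interference (assumed cut-offs)."""
    # Butterworth band-pass, 0.5-40 Hz: the 0.5 Hz edge suppresses baseline
    # wander, the 40 Hz edge attenuates high-frequency noise.
    band = signal.butter(4, [0.5, 40.0], btype="bandpass", fs=fs, output="sos")
    # Chebyshev Type II low-pass: at least 40 dB attenuation from 45 Hz up,
    # targeting residual mains interference near the 50 Hz Nyquist limit.
    low = signal.cheby2(6, 40, 45.0, btype="lowpass", fs=fs, output="sos")
    # sosfiltfilt runs each filter forward and backward (zero phase shift),
    # so R peaks are not delayed.
    return signal.sosfiltfilt(low, signal.sosfiltfilt(band, x))


# Example: a 10 Hz component riding on a constant offset of 2.0.
t = np.arange(0, 20, 1 / FS)
clean = preprocess_ecg(2.0 + np.sin(2 * np.pi * 10 * t))
```

Forward-backward filtering is chosen here because any phase delay would shift the R-peak positions used later for the RR intervals.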
R peaks were detected with the Pan-Tompkins method, which is based on analysis of the slope and width of the QRS complex. The R peak is the most important part of the QRS complex, and the slope of the R wave is used to locate the complex. The algorithm first passes the signal through a filter block consisting of band-pass, low-pass and high-pass filters, a derivative function, a squaring function and moving-window integration. The interval between successive R peaks (the RR interval) was calculated as follows:
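$$RR_i = R_{i+1} - R_i, \qquad i = 1, \dots, n \qquad (1)$$

where R_i is the time of occurrence of the i-th R peak.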
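The filter block and RR calculation above can be sketched as follows. This is a simplified Pan-Tompkins variant written for illustration, not the implementation used in the study: it uses a fixed detection threshold (30% of the maximum integrated value) and a 200 ms refractory distance instead of the original adaptive thresholds and search-back, and the 5-15 Hz band and 150 ms integration window are typical choices assumed here.

```python
import numpy as np
from scipy import signal


def pan_tompkins_r_peaks(ecg: np.ndarray, fs: float) -> np.ndarray:
    """Simplified Pan-Tompkins detector; returns R-peak sample indices."""
    # 1) Band-pass 5-15 Hz to emphasise QRS energy.
    sos = signal.butter(2, [5.0, 15.0], btype="bandpass", fs=fs, output="sos")
    filtered = signal.sosfiltfilt(sos, ecg)
    # 2) Derivative highlights the steep QRS slopes.
    deriv = np.gradient(filtered)
    # 3) Squaring makes all values positive and amplifies large slopes.
    squared = deriv ** 2
    # 4) Moving-window integration over about 150 ms.
    win = max(1, int(round(0.150 * fs)))
    integrated = np.convolve(squared, np.ones(win) / win, mode="same")
    # 5) Candidates: above a fixed fraction of the maximum, at least 200 ms apart.
    peaks, _ = signal.find_peaks(integrated, height=0.3 * integrated.max(),
                                 distance=int(0.2 * fs))
    # 6) Refine each candidate to the largest filtered-ECG sample nearby.
    r_peaks = []
    for p in peaks:
        lo, hi = max(0, p - win), min(len(ecg), p + win + 1)
        r_peaks.append(lo + int(np.argmax(filtered[lo:hi])))
    return np.unique(np.array(r_peaks, dtype=int))


def rr_intervals(r_peaks: np.ndarray, fs: float) -> np.ndarray:
    """Equation (1): RR_i = R_{i+1} - R_i, converted from samples to seconds."""
    return np.diff(r_peaks) / fs


# Synthetic check signal: narrow Gaussian pulses every 0.8 s at 100 Hz.
fs = 100.0
t = np.arange(0, 20, 1 / fs)
beats = np.arange(0.5, 19.5, 0.8)
ecg = sum(np.exp(-((t - b) / 0.015) ** 2) for b in beats)
rr = rr_intervals(pan_tompkins_r_peaks(ecg, fs), fs)
```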
Following the definition of apnea, 10-second windows were selected so that the end of the first window coincided with the beginning of the apnea. The window was then shifted in one-second steps, to increase accuracy and cover the whole signal, until the beginning of the window coincided with the end of the apnea. Since apneas have different lengths, an average length of 30 seconds was assumed for each apnea window at a sampling frequency of 100 Hz. To calculate the number of normal windows, the number of apneas was multiplied by 1000. In this way, the total number of windows to be assigned to the classifier was determined.
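One reading of this windowing rule is sketched below in Python with hypothetical function names: the first 10-second window ends where the apnea begins, and the window advances in 1-second steps until its start reaches the end of the apnea. For an apnea from 100 s to 120 s this gives 31 windows starting at 90, 91, ..., 120 s, each 1000 samples long at 100 Hz. Windows that would start before the recording are dropped, which is an added assumption.

```python
import math
from typing import List, Tuple


def apnea_window_starts(apnea_start_s: float, apnea_end_s: float,
                        window_s: float = 10.0,
                        shift_s: float = 1.0) -> List[float]:
    """Window start times (s) for one apnea event.

    The first window ends exactly at the apnea onset; each following window is
    shifted by shift_s until the window start reaches the apnea end.
    """
    first = apnea_start_s - window_s
    n = math.floor((apnea_end_s - first) / shift_s + 1e-9) + 1
    return [first + k * shift_s for k in range(max(n, 0))]


def window_sample_ranges(starts: List[float], fs: float = 100.0,
                         window_s: float = 10.0) -> List[Tuple[int, int]]:
    """Convert start times to [lo, hi) sample ranges, skipping negative starts."""
    width = int(round(window_s * fs))
    return [(int(round(s * fs)), int(round(s * fs)) + width)
            for s in starts if s >= 0]


starts = apnea_window_starts(100.0, 120.0)
print(len(starts), starts[0], starts[-1])  # 31 90.0 120.0
```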
C. Discrete wavelet transform
The DWT, widely used in engineering and medicine, is an adaptive signal processing tool. It analyzes a signal in different frequency bands at different resolutions, decomposing it into approximation and detail information. The DWT uses two sets of functions, called scaling and wavelet functions, which are associated with low-pass and high-pass filters, respectively. The signal is decomposed into different frequency bands by successive low-pass and high-pass filtering in the time domain. The multiresolution analysis of a signal x[n] is summarized in Fig 2.
Each stage of this decomposition consists of two digital filters and two downsampling-by-2 blocks. The first filter, h[.], is the discrete mother wavelet and is inherently a high-pass filter; the second, g[.], is inherently a low-pass filter. The outputs of the downsamplers following the first-stage high-pass and low-pass filters are the detail coefficients D1 and the approximation coefficients A1, respectively. The approximation coefficients A1 are then decomposed further, and the process continues [26-28].
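The two-filter, downsample-by-2 stage can be sketched directly in NumPy. To stay self-contained, this example uses the Haar wavelet, which is the Daubechies wavelet of order 1 (db1); the text does not state which Daubechies order was used, and in practice a library routine such as PyWavelets' `pywt.wavedec` would normally be used instead. Because the Haar filters are orthonormal, the coefficients preserve the signal energy.

```python
import numpy as np

# Haar (db1) analysis filters, normalised so the transform is orthonormal.
G = np.array([1.0, 1.0]) / np.sqrt(2.0)   # g[.]: low-pass (scaling) filter
H = np.array([1.0, -1.0]) / np.sqrt(2.0)  # h[.]: high-pass (wavelet) filter


def dwt_stage(x: np.ndarray):
    """One stage: filter with g and h, then downsample by 2."""
    # Taking odd-indexed outputs of the full convolution keeps one value per
    # non-overlapping sample pair (x[2k], x[2k+1]).
    approx = np.convolve(x, G)[1::2]
    detail = np.convolve(x, H)[1::2]
    return approx, detail


def wavedec_haar(x, level: int):
    """Multilevel DWT; returns [A_level, D_level, ..., D_1]."""
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(level):
        if len(a) % 2:                  # pad odd lengths with the last sample
            a = np.append(a, a[-1])
        a, d = dwt_stage(a)             # only the approximation is split again
        details.append(d)
    return [a] + details[::-1]


coeffs = wavedec_haar(np.arange(16.0), level=3)
print([len(c) for c in coeffs])  # [2, 2, 4, 8]
```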
D. Feature extraction
Given the non-stationary nature of ECG and airflow signals, the discrete wavelet transform seems more appropriate for identifying these patterns. In this algorithm, therefore, the detail coefficients of the 8th level of the discrete wavelet transform with a Daubechies mother wavelet were extracted from 10-second segments of the disease and normal states. Different mother wavelets and approximation and detail levels were studied, and the optimal mother wavelet and decomposition level were finally chosen experimentally by trial and error. This step yielded 10 coefficient sets containing valuable information on ECG and airflow signal patterns (4 approximation coefficient sets and one detail coefficient set for airflow, and 4 approximation coefficient sets and one detail coefficient set for ECG). Eight nonlinear features were then extracted from these wavelet coefficients, as shown in Table 1.
Table 1. Features extracted from wavelet coefficients
|Feature|Math formula|
|---|---|
|Ampl|Ampl = Max(·) − Min(·)|
For the RR intervals, 10 features were extracted, as shown in Table 2. After feature extraction, a t-test with a significance level of α = 0.05 was applied to assess how well each extracted feature discriminates between the classes.
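The feature screening step can be sketched with SciPy's two-sample t-test, applied column by column to a feature matrix of apnea windows and one of normal windows. The helper name is hypothetical. The sketch assumes Student's t-test with equal variances (SciPy's default), since the text does not say which variant was used.

```python
import numpy as np
from scipy import stats


def significant_features(x_apnea: np.ndarray, x_normal: np.ndarray,
                         alpha: float = 0.05):
    """Column-wise two-sample t-test; returns (indices with p < alpha, p-values)."""
    # Rows are windows, columns are features; axis=0 tests each column.
    _, p = stats.ttest_ind(x_apnea, x_normal, axis=0)
    return np.flatnonzero(p < alpha), p


# Toy data: feature 0 separates the groups, feature 1 does not.
a = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0], [5.0, 5.0]])
b = np.array([[11.0, 1.0], [12.0, 2.0], [13.0, 3.0], [14.0, 4.0], [15.0, 5.0]])
keep, p_values = significant_features(a, b)
print(keep)  # [0]
```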
E. Random Forest (RF) Algorithm
A random forest is an ensemble of decision trees in which each tree is built from a random vector Q. The vector Q_k determines how the k-th tree is built, and the resulting tree is denoted h(x, Q_k). To classify an input x, it is given to all trees, and the final class is the one assigned by the most trees. If X is a random vector drawn from the training data, Y is its output (class label), and h_1(x), …, h_K(x) are the individual classifiers, the margin function of this ensemble is defined as

$$mg(X, Y) = \mathrm{av}_k\, I\big(h_k(X) = Y\big) - \max_{j \neq Y} \mathrm{av}_k\, I\big(h_k(X) = j\big)$$

where I(·) is the indicator function and av_k denotes the average over the K classifiers.
If mg(X, Y) > 0, the ensemble classifies the observation correctly; if mg(X, Y) < 0, it misclassifies it.
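A direct transcription of the margin function for a single observation, given the class votes of the K trees, is sketched below. The helper is hypothetical and assumes integer class labels 0, ..., C-1.

```python
import numpy as np


def margin(votes: np.ndarray, y: int, n_classes: int) -> float:
    """mg(X, Y) = av_k I(h_k(X) = Y) - max_{j != Y} av_k I(h_k(X) = j).

    votes: class predicted by each of the K trees for one observation X.
    y: true class of X.
    """
    share = np.bincount(votes, minlength=n_classes) / len(votes)  # av_k I(h_k(X) = j)
    return float(share[y] - np.delete(share, y).max())


# Four classes (e.g. normal, obstructive, central, mixed); three trees vote.
print(margin(np.array([0, 0, 1]), y=0, n_classes=4))
```

For example, if three trees vote 0, 0 and 1 for an observation whose true class is 0, the margin is 2/3 - 1/3 = 1/3 > 0, so the ensemble classifies it correctly.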
The algorithm for building a random forest with T trees from a dataset with n observations and p predictor variables is as follows [31, 32]:
i) A random sample of n observations is drawn using the bootstrap method (sampling with replacement).
ii) A classification tree is grown on the bootstrap sample using the recursive partitioning algorithm. At each node, the split is chosen from a random subset of m of the p predictor variables.
iii) Recursive partitioning continues until the tree reaches its maximum size (i.e., one terminal node per observation); the tree is not pruned.
iv) Steps (i) to (iii) are repeated T times to build the random forest.
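Steps (i) to (iv) can be illustrated with a small from-scratch random forest in NumPy. This is a teaching sketch, not the MATLAB model used in the study. It grows unpruned trees with Gini-impurity splits at midpoint thresholds, draws m = √p candidate features per node by default (an assumption, since the text does not give m here), requires integer class labels, and predicts by majority vote.

```python
import numpy as np


def gini(y: np.ndarray) -> float:
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))


def majority(y: np.ndarray) -> int:
    vals, counts = np.unique(y, return_counts=True)
    return int(vals[np.argmax(counts)])


def grow_tree(X, y, m, rng):
    """Steps (ii)-(iii): recursive partitioning with m random candidate
    features per node, grown until leaves are pure (no pruning)."""
    if len(np.unique(y)) == 1:
        return ("leaf", int(y[0]))
    best = None
    for f in rng.choice(X.shape[1], size=m, replace=False):
        vals = np.unique(X[:, f])
        for thr in (vals[:-1] + vals[1:]) / 2:          # midpoint thresholds
            left = X[:, f] <= thr
            score = left.mean() * gini(y[left]) + (~left).mean() * gini(y[~left])
            if best is None or score < best[0]:
                best = (score, f, thr, left)
    if best is None:   # all m sampled features are constant here: stop early
        return ("leaf", majority(y))
    _, f, thr, left = best
    return ("node", f, thr, grow_tree(X[left], y[left], m, rng),
            grow_tree(X[~left], y[~left], m, rng))


def predict_tree(tree, x) -> int:
    while tree[0] == "node":
        _, f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree[1]


def fit_forest(X, y, n_trees=100, m=None, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = m or max(1, int(np.sqrt(p)))
    forest = []
    for _ in range(n_trees):                          # step (iv): repeat T times
        idx = rng.integers(0, n, size=n)              # step (i): bootstrap sample
        forest.append(grow_tree(X[idx], y[idx], m, rng))
    return forest


def predict_forest(forest, X) -> np.ndarray:
    """Majority vote of all trees (labels must be integers 0..C-1)."""
    votes = np.array([[predict_tree(t, x) for t in forest] for x in X])
    return np.array([np.bincount(v).argmax() for v in votes])


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [5, 5], [5, 6], [6, 5], [6, 6]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
forest = fit_forest(X, y, n_trees=25, m=1, seed=0)
print(predict_forest(forest, np.array([[0.5, 0.5], [5.5, 5.5]])))
```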
Table 2. Features extracted from RR intervals
F. Design of graphical user interface (GUI)
A GUI is a program interface consisting of interactive components such as icons and other graphical objects. It helps users interact with software such as the operating system, and lets them navigate a computer or device and complete actions through visual indicators and graphical icons. A GUI is tested by comparing the results of its execution with the expected results; this testing can be manual or automatic. In this research, the GUI was designed in MATLAB 2018 with the following features:
1. It can be used by anyone with minimal training.
2. It contains no unnecessary or misleading elements.
3. The software is standalone and designed to reduce the complexity of the work.
4. It automatically provides a complete report of the apnea types, their severity, and the time and duration of each apnea.
5. It is pleasant to work with; in other words, it is user-friendly.
Of the feature matrix obtained in the previous step, 70% of the data were used for training and 30% for testing the random forest algorithm. The forest was first trained on the training data: using the bagging method, the data were divided into 10 equal bags, and each time 4 randomly chosen bags were assigned to the trees (50,100,500,000). Finally, the class of each test window was determined by voting. The best forest was selected by trial and error. Note that diagnostic accuracy does not always increase with the number of trees, because other parameters, such as the number of variables selected at each node, also strongly affect the model's diagnostic accuracy. Table 3 shows a sample of the performance of different forests by number of trees, depth, shift (seconds) and mean accuracy.
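The data partitioning described above can be sketched as follows. This is one interpretation of the text (a random 70/30 split, then 10 equal bags of training indices with 4 bags drawn at random for each tree), and the function names and seeds are hypothetical.

```python
import numpy as np


def split_70_30(n_samples: int, seed: int = 0):
    """Random 70/30 split of row indices into training and test sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_train = int(round(0.7 * n_samples))
    return perm[:n_train], perm[n_train:]


def assign_bags(train_idx: np.ndarray, n_trees: int, n_bags: int = 10,
                bags_per_tree: int = 4, seed: int = 0):
    """Split training indices into n_bags (near-)equal bags and give each tree
    the union of bags_per_tree bags chosen at random without replacement."""
    rng = np.random.default_rng(seed)
    bags = np.array_split(rng.permutation(train_idx), n_bags)
    subsets = []
    for _ in range(n_trees):
        chosen = rng.choice(n_bags, size=bags_per_tree, replace=False)
        subsets.append(np.concatenate([bags[b] for b in chosen]))
    return subsets


train_idx, test_idx = split_70_30(100)
per_tree = assign_bags(train_idx, n_trees=100)   # 100 trees, as in the selected forest
```

With 100 samples, each of the 10 bags holds 7 training indices, so every tree sees 28 distinct training rows.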
The selected random forest has 100 trees, a depth of 9 and 5 sub-branches. Table 4 gives the classification matrix of the random forest results, in which the rows are the true classes and the columns are the classes diagnosed by the algorithm.
The following formulas were used to evaluate the performance of the system; the results are shown in Table 5.
Table 3. A sample of the performance of different forests by number of trees, depth, shift (seconds) and mean accuracy.
where TP and TN are the correct diagnoses and FP and FN are the incorrect diagnoses for each class.
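Because the formulas themselves are not reproduced in this text, the sketch below assumes the standard one-vs-rest definitions: accuracy = (TP + TN) / (TP + TN + FP + FN), sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). These are computed for each class of a confusion matrix whose rows are true classes and columns are predictions, the layout used in Table 4.

```python
import numpy as np


def per_class_metrics(cm) -> dict:
    """One-vs-rest metrics per class; rows = true class, columns = predicted."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)                 # correct diagnoses of each class
    fn = cm.sum(axis=1) - tp         # class members diagnosed as something else
    fp = cm.sum(axis=0) - tp         # other classes diagnosed as this class
    tn = total - tp - fn - fp        # everything else
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


m = per_class_metrics([[8, 2], [1, 9]])   # hypothetical 2-class counts
```

For this toy matrix, both classes have accuracy 0.85, sensitivity is 0.8 and 0.9, and specificity is 0.9 and 0.8.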
After the algorithm was executed and the inference engine built, the GUI was designed as three pages. The first page contains two boxes and one button. The patient's national ID code is entered in the first box and the search button is pressed. If the patient has a history of apnea and polysomnography, the second box asks whether a signal analysis should be performed: if NO is selected, the page closes; if YES is selected, the next page opens. If the patient is new, the patient is first registered, and then one of these two cases occurs. Fig 3 shows an example of the first page of the GUI.
The second page, shown in Fig 4, contains several parts. In the Browse .edf file part, the path of the file to be analyzed is entered, and in the sleep start time part, the patient's sleep start time (hours, minutes and seconds) is recorded. The window time shifting part sets the window shift, from 1 to 5 seconds. Smaller shifts give higher diagnostic accuracy but slower execution; larger shifts give lower accuracy but faster execution. After these settings, pressing the Analyze button displays the results of the signal analysis: the number of apneas of each type (obstructive, central, mixed) and of hypopneas, their total, the AHI value and the severity class (normal, mild, moderate, severe).
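The speed/accuracy trade-off of the shift setting follows from the number of 10-second windows that must be classified. Assuming a sliding window over the whole of an 8-hour recording (within the 7 to 9 hours reported for the data), a 1-second shift gives 28,791 windows and a 5-second shift gives 5,759, roughly five times fewer. The helper below is hypothetical.

```python
def n_windows(duration_s: float, window_s: float = 10.0, shift_s: float = 1.0) -> int:
    """Number of complete windows of length window_s, advanced by shift_s."""
    if duration_s < window_s:
        return 0
    return int((duration_s - window_s) // shift_s) + 1


eight_hours = 8 * 3600
for shift in range(1, 6):
    print(shift, n_windows(eight_hours, shift_s=shift))
```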
Pressing the View Analysis Results button opens the next page of the GUI (Fig 5), which gives a full report of the patient's signal analysis. The report is a four-sheet Excel file: the first sheet (Num) shows the apnea number, the second the apnea start time, the third (end) the apnea end time, and the fourth the apnea type.
To evaluate the usability of the system, its performance on its main goal was assessed retrospectively, using the medical records of patients who had undergone polysomnography and whose polysomnography results were known. For this purpose, 50 patient records were randomly selected from the database and given to the processing core of the apnea severity diagnosis system. In Table 6, the classification matrix shows the frequency and accuracy of the predictions; the rows are the physician's diagnoses and the columns are the diagnoses of the designed system.
The confusion matrix of the evaluation has been calculated and recorded in Table 7.
The accuracy for normal, mild, moderate and severe apnea was 96%, 94%, 94% and 96%, respectively.
In this study, given the need for early diagnosis of sleep apnea, a system was designed to diagnose this disease. A random forest algorithm was used as the system's inference engine, and MATLAB 2016 GUI tools were used to create the interface. Automatic sleep apnea diagnosis was performed in three stages (preprocessing, feature extraction and classification), and second-by-second analysis of the signal improved classification performance. The best random forest structure had 100 trees, a depth of 9 and 5 features. The accuracy for normal, obstructive, central and mixed apnea was 95.3%, 97.92%, 99.60% and 97.29%, respectively, and the accuracy for normal, mild, moderate and severe apnea was 96%, 94%, 94% and 96%, respectively.
To separate apnea-hypopnea events, Gutiérrez-Tobal et al extracted spectral and nonlinear features from the airflow signals of 317 individuals. A correlation method was first used to optimize the features by analyzing their correlation and redundancy, and the selected features were then fed to a combination of an LDA model and regression trees. The accuracies at AHI cut-offs of 5, 15 and 30 were 83.3%, 81% and 83.3%, respectively. To classify normal and abnormal windows, Hassan et al analyzed segments of ECG signals using a data-adaptive signal analysis method, the Tunable Q-Factor Wavelet Transform (TQWT). Three statistical features were extracted from the TQWT sub-bands to form the training and testing matrices. The overall performance of the RUSBoost algorithm was evaluated for different values of the TQWT parameters, and the optimal values were determined. Using the statistical features extracted from each sub-band, they classified normal and abnormal windows with an accuracy of 88.88%. Lakhan et al evaluated the airflow signals of 520 patients from the MrOS sleep database to classify the severity of apnea-hypopnea. They used their proposed deep learning (DL) design to extract 17 airflow features, which served as inputs to a deep neural network. Using 10-fold cross-validation, the accuracies at AHI cut-offs of 5, 15 and 30 were 82.38%, 84.15% and 92.14%, respectively. An advantage of the system designed in the present study is the automatic calculation of the AHI.
In the first stage, the data were collected, preprocessed, normalized and adjusted. In the second stage, the ECG and airflow signals were used together in the model design. In the third stage, the signals were analyzed second by second, for the first time in this study, to determine the types of apnea and calculate the AHI. In the fourth stage, the designed GUI allows physicians, technicians and even patients at home to use the system in practice. The accuracy of the simulation was higher, in terms of statistical evaluation criteria, than in previous studies. The results are promising and show that the random forest (RF) algorithm is suitable for modeling sleep apnea diagnosis. However, this study has some limitations: a small number of patients was used to train and evaluate the model, and the ECG signals were noisy because of shaking, movement or coughing during sleep. Future studies are recommended to use more patients and combinations of other signals (such as EEG, EOG and blood oxygen saturation), to apply other classification algorithms and their combinations and compare the results, and to design the GUI with other programming languages such as Java, as well as web-based and mobile-based systems.
Sleep apnea is a common disease, but it is under-recognized and under-treated, because most patients do not report daytime symptoms such as excessive sleepiness. The aim of this study was to design a sleep apnea diagnosis system using ECG and airflow signals, and the method described, together with second-by-second analysis of the signal, gave the system high reliability. Given its low cost, wide accessibility and ease of use, the system can be used in healthcare centers to help physicians diagnose apnea and its severity faster and more accurately, in home systems that require algorithms with low computational cost, and in areas where health services are insufficient.
All authors contributed to the literature review, design, data collection and analysis, and drafting of the manuscript, and read and approved the final manuscript.
CONFLICTS OF INTEREST
The authors declare no conflicts of interest regarding the publication of this study.
No financial interests related to the material of this manuscript have been declared.