A Novel Structure of Highly Interpretable Fuzzy Rules Extraction
Extracting effective rules from medical data that are both accurate and highly interpretable is essential to increase the accuracy and speed of diagnosis by specialists. Medical assistant systems that can detect the rules governing the data therefore play a vital role in the early detection of disease, and thus in increasing the chances of treatment, controlling the disease and maintaining patients' quality of life.
Material and Methods:
In this paper, a system for the automatic extraction of rules from medical data is presented, using a new hybrid method based on fuzzy logic and a genetic algorithm. The genetic algorithm is used to generate the rules automatically. The Parkinson UCI dataset, including 195 records and 23 variables, was used to evaluate the proposed method based on the criteria of interpretability, accuracy, sensitivity and specificity.
On the Parkinson's dataset, the proposed model achieved an accuracy of 84.62%. This accuracy was obtained with 4 fuzzy rules with an average rule length of 2, using 7 linguistic terms: extremely low, very low, low, normal, high, very high and extremely high. All fuzzy membership functions that represent the terms have the same width.
The proposed method, based on the three criteria of a low number of rules, short rule length and symmetric membership functions with equal width for all variables, is well suited to the automatic production of accurate and compact rules with high interpretability from medical data. A 90% dimensionality reduction in the experimental evaluation indicates that this model could be used to implement real-time systems.
In recent years, fuzzy systems have been successfully used in various fields of science, engineering and medicine for applications such as control, modeling and classification. Fuzzy rules have become very popular with users due to their high readability, easy interpretation by humans, and the insight they provide into the knowledge embedded in classification systems [5, 6]. Fuzzy systems describe a system with linguistic rules that users can easily interpret and analyze.
One of the most important factors in designing any fuzzy system is generating the fuzzy rules and the fuzzy membership functions for each rule set. There are two main ways to do this. In the first category, the rules are produced by an expert. This method is used especially in control problems with a small number of inputs. The second category is the automatic production of rules using neuro-fuzzy techniques, clustering methods and evolutionary algorithms.
In the second category, fuzzy system design can be formulated as a search problem in a high-dimensional space in which each point represents a rule set, membership functions, and system behavior. Once performance criteria are determined, system performance takes the form of a hypersurface in this space. Developing an optimal fuzzy system design is equivalent to finding the optimal position on this hypersurface. This feature makes evolutionary algorithms such as genetic algorithms good candidates for searching this hypersurface [7, 9].
One of the applications of fuzzy systems is in clinical decision support systems, where the discovery and interpretability of the rules underlying the data are of great importance. On the one hand, a specialist's knowledge of diagnostic rules is formed empirically in his or her mind over many years; on the other hand, the emergence of new diseases leads to the rapid growth of medical knowledge and treatment. Extracting rules from data that are both highly accurate and interpretable therefore helps specialists to increase the accuracy and speed of diagnosis [10-12].
In this study, a hybrid genetic-fuzzy classification model is presented to automatically extract a set of compact and highly accurate diagnostic rules from data. The proposed chromosomal structure is designed to select the most effective subset of features and provide the best diagnostic rules required by the fuzzy classification system. The Oxford Parkinson's Disease Detection dataset was used to evaluate the proposed method. Parkinson's disease (PD) is the second most common progressive neurodegenerative disease in the world and a public health concern [14, 15]. The disease is caused by insufficient production of dopamine in the brain and impairs motor and speech abilities [16-18]. More than 10 million people worldwide currently suffer from this disease. The aging of the world population is one of the factors increasing the number of patients [14, 20]. Given that speech disorders are seen in patients with PD approximately five years before they are clinically diagnosed, automated PD diagnosis systems that use voice data are a promising route to early diagnosis. Early diagnosis in turn helps to maintain quality of life and to reduce the symptoms associated with the disease.
So far, many studies have used intelligent methods to diagnose PD on the Oxford Parkinson dataset, including basic machine learning methods [21, 22], hybrid classifiers [14, 20, 23, 24], evolutionary algorithms [25, 26] and fuzzy expert systems [24, 27, 28]. The study by Abiyev and Abizade is one of the works that aimed to extract diagnostic rules for PD. The authors presented the design of a PD diagnosis system combining the Takagi-Sugeno-Kang fuzzy system with a six-layer neural network. The first layer was the input layer, the second layer contained the fuzzy membership functions that assign a degree of membership to each linguistic term, the third layer contained the fuzzy rules, and the fourth layer was the result layer of the fuzzy rules. The fifth layer multiplied the output signals of the third layer by the output signals of the fourth layer. The sixth layer calculated the system output based on a relation. To design the rules, a clustering method was used together with a gradient technique. Despite the full design description and the reported 100% accuracy of the model on the Oxford Parkinson's dataset, no diagnostic rules were stated in the article.
In 2014, Wang et al. proposed a PSO-based Fuzzy Hyper-Rectangular Composite Neural Network (PFHRCNN) to extract rules from the Parkinson's disease dataset. The data set was randomly split 50-50 into training and test sets and the method was run 10 times; with 8 rules, the reported accuracy was 82.4% on the test set and 92.6% on the training set. The article listed only 2 of the 8 rules, each of which specifies a range for all 22 features in the dataset, for both healthy people and Parkinson's patients. Given that rule length is one of the main indicators of interpretability in clinical decision support systems [10-12], the large number of conditions in each rule made the rules difficult to interpret. Rodrigues and Karunanithi used a Mamdani fuzzy inference system in 2019. Without mentioning the feature selection method, they selected four features, Spread1, DFA, FoH and Spread2, to analyze PD diagnosis and severity. The values of these four features were categorized into three sets: values of people with PD, values common to both PD and healthy people, and values of completely healthy people. According to the authors, the analysis of these four features showed a clear difference between healthy people and those with PD, but the accuracy of the model built from these rules was not reported. Their final system consisted of 81 rules, of which only 9 were given in the article. Given the importance of the interpretability of diagnostic rules for the human user, and its direct relationship with the number and length of rules, the present study aims to provide a structure that produces the most compact yet highly accurate set of rules, extracted automatically from the data set.
MATERIAL AND METHODS
This study uses the Oxford Parkinson's dataset to train and evaluate the proposed model. The dataset is taken from the UCI Machine Learning Repository and was created by Max Little of the University of Oxford in collaboration with the National Center for Voice and Speech, Denver, Colorado, USA. The dataset includes information on 31 individuals, 23 of whom have PD, and has a total of 24 attributes: patient name, status (healthy or with PD), and 22 voice measurements, including measures from the Multi-Dimensional Voice Program (MDVP). The dataset contains a total of 195 voice recordings, with 5 to 6 recordings per person of uttering vowels for 36 seconds, the details of which are described in Table 1. Of these, 147 records belong to the 23 people with PD (approximately 75% of the records) and 48 records belong to the 8 people without PD (approximately 25%). The time since diagnosis ranged from 0 to 28 years, and the age of the participants was between 46 and 85 years, with an average of 65.8. The main purpose of this data set was to differentiate people with PD from healthy people through influential voice features. Table 1 shows the mean, variance, and range of values of each feature in this data set for healthy people and for those with PD.
Table 1. Parkinson data set statistical information
The structure of the proposed diagnostic method
Fig 1 shows the flowchart of the proposed method. After normalizing the dataset to the interval [0,1], a genetic algorithm is applied to extract the rules from the dataset. A fuzzy rule base is embedded in the structure of the proposed chromosome. Each rule base contains r rules. Each rule has a consequent (class label) and an antecedent. The antecedent consists of two parts. The first part determines the indices of the variables selected for the antecedent of the rule (identified by Fi), and the second part determines the value corresponding to each of the variables selected in the first part (identified by Vi). The number of allowed values (linguistic terms) for each variable is determined by the parameter s, and the number of variables used in each antecedent is determined by the parameter f. Using this structure, by fixing the number of variables in each rule, the most effective subset of features as well as the most accurate rule can be selected. The size of the proposed chromosome is calculated from Eq 1:
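Based on the gene layout described above (f variable-index genes Fi, f value genes Vi, and one consequent gene per rule), Eq 1 presumably takes the following form. The symbol N_genes for the chromosome size is introduced here for illustration only and is an assumption, not notation taken from the original equation:

```latex
N_{\text{genes}} = r \times (2f + 1) \tag{1}
```

For example, under this form a rule base with r = 4 rules and f = 2 variables per rule would need 4 × (2·2 + 1) = 20 genes.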
Where r is the number of rules and f is the number of variables in the antecedent of each rule. The constant 1 is added for the gene representing the consequent of the rule. Each gene has a value between zero and one that is randomly generated in the initial population. A valid chromosome satisfies three conditions: first, no duplicate rules are allowed in the same rule base; second, no variable may appear more than once in the same antecedent; and third, the consequent of at least one rule (i.e., the class label of the rule) must differ from the consequent of the other rules. The purpose of the algorithm is to determine the best subset of features in the data set to form the antecedent of each rule.
A fuzzy expert system is used to calculate the fitness of each chromosome. First, each rule is converted to the Mamdani fuzzy rule form. In this form, if the value of a variable used in the antecedent of the rule is 1, it corresponds to the extreme case of low and is represented by a left-shoulder membership function. If the value of a variable used in the antecedent of the rule is s, it corresponds to the extreme case of high and is represented by a right-shoulder membership function. Otherwise, symmetric triangular functions are used to represent the other linguistic terms. These functions are among the most common functions used in research on extracting rules from medical data, including [1, 4, 5, 32, 33]. In this study, the width of all membership functions is the same and each pair of adjacent membership functions overlaps by α. In addition, symmetric triangular membership functions with fixed parameters are used to represent the output of each rule.
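The chromosome structure and the three validity conditions can be illustrated with a minimal Python sketch. The gene ordering inside a rule (f variable genes, then f value genes, then one consequent gene), the scaling used to map a gene in [0, 1] to an index, and the names `decode_chromosome` and `is_valid` are all assumptions made for illustration; the paper does not specify them.

```python
def decode_chromosome(genes, r, f, s, n_features):
    """Decode real-valued genes in [0, 1] into r rules.

    Assumed layout per rule: [F_1..F_f, V_1..V_f, consequent].
    Returns a list of (features, terms, label) tuples.
    """
    rule_len = 2 * f + 1
    if len(genes) != r * rule_len:
        raise ValueError("chromosome length must be r * (2f + 1)")
    rules = []
    for i in range(r):
        part = genes[i * rule_len:(i + 1) * rule_len]
        # Map each gene to a 0-based feature index (assumed scaling).
        feats = tuple(min(int(g * n_features), n_features - 1) for g in part[:f])
        # Map each gene to a linguistic term index in 1..s (assumed scaling).
        terms = tuple(min(int(g * s), s - 1) + 1 for g in part[f:2 * f])
        # Binary class label: 1 = PD, 0 = healthy (assumed threshold 0.5).
        label = 1 if part[2 * f] >= 0.5 else 0
        rules.append((feats, terms, label))
    return rules


def is_valid(rules):
    """Check the three validity conditions stated in the text."""
    # No variable appears twice in the same antecedent.
    no_dup_vars = all(len(set(feats)) == len(feats) for feats, _, _ in rules)
    # No duplicate rules, ignoring the order of conditions inside a rule.
    canonical = {(frozenset(zip(feats, terms)), label) for feats, terms, label in rules}
    no_dup_rules = len(canonical) == len(rules)
    # At least two different consequents appear in the rule base.
    mixed_labels = len({label for _, _, label in rules}) > 1
    return no_dup_vars and no_dup_rules and mixed_labels


example = [0.0, 0.5, 0.0, 0.99, 0.9,   # rule 1
           0.5, 0.1, 0.99, 0.0, 0.1]   # rule 2
rules = decode_chromosome(example, r=2, f=2, s=7, n_features=22)
print(rules, is_valid(rules))
```

In a genetic algorithm, a check like `is_valid` would typically be used to reject or repair offspring after crossover and mutation; which of the two the authors used is not stated.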
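The membership functions described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: term centers are spaced evenly on [0, 1] and each function's half-width equals the spacing between centers, so adjacent functions cross at a membership of 0.5. The paper's overlap parameter α is not modeled here. The names `membership` and `firing_strength`, and the use of the minimum as the fuzzy AND, are likewise assumptions.

```python
def membership(x, k, s):
    """Degree to which normalized x in [0, 1] belongs to term k (1..s).

    Term 1 is a left shoulder, term s is a right shoulder, and all
    other terms are symmetric triangles of equal width.
    """
    if s < 2 or not 1 <= k <= s:
        raise ValueError("need s >= 2 and 1 <= k <= s")
    half_width = 1.0 / (s - 1)      # assumed: equals the spacing of centers
    center = (k - 1) / (s - 1)
    if k == 1 and x <= center:      # flat part of the left shoulder
        return 1.0
    if k == s and x >= center:      # flat part of the right shoulder
        return 1.0
    return max(0.0, 1.0 - abs(x - center) / half_width)


def firing_strength(sample, antecedent, s):
    """Firing strength of one rule, using min as the fuzzy AND (assumed).

    sample: sequence of normalized feature values.
    antecedent: iterable of (feature_index, term_index) pairs.
    """
    return min(membership(sample[feat], term, s) for feat, term in antecedent)


sample = [0.0, 0.25]
print(firing_strength(sample, [(0, 1), (1, 2)], s=7))
```

With equal-width functions of this kind, a rule is fully described by a few small integers (variable index and term index), which is what keeps the rule base compact and readable.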
Comparison of the proposed method with the decision tree
The decision tree is one of the most efficient and popular data mining algorithms in the field of rule extraction. In addition to generating interpretable rules, this algorithm is also able to select effective features. Therefore, in order to evaluate the proposed method based on the criteria of interpretability, generalizability and accuracy of the extracted rules, the decision tree algorithm was also run on the Oxford Parkinson data set.
The proposed genetic algorithm was run for each combination of the number of rules, the number of variables in the antecedent of each rule, and the number of terms for each variable in the antecedent. Given that the goal was to create a compact rule base with high interpretability, the number of rules was empirically set between 2 and 7, the number of variables in each rule was set between 2 and 4, and the number of terms for each variable in the antecedent of the rules was set to 3, 5 or 7 (an inactive mode was not allowed for selected variables in this structure). To reduce the effect of randomness in the genetic algorithm, each setup was run 10 times.
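The experimental grid described above can be enumerated with a short Python sketch. It assumes that every combination of the three settings was tried, which the text implies but does not state outright; the variable names are illustrative.

```python
from itertools import product

rule_counts = range(2, 8)        # 2 to 7 rules
vars_per_rule = range(2, 5)      # 2 to 4 variables per antecedent
term_counts = (3, 5, 7)          # linguistic terms per variable
RUNS_PER_SETUP = 10              # repetitions to reduce random effects

# Each setup is a (number of rules, variables per rule, terms) triple.
setups = list(product(rule_counts, vars_per_rule, term_counts))

print(len(setups))                    # number of distinct setups
print(len(setups) * RUNS_PER_SETUP)   # total genetic algorithm runs
```

Under this assumption there are 54 setups and 540 runs in total.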
The tournament method was used to select the parents. The mutation operator was the complement method and the crossover operator was the modified whole arithmetic method; in each call, the crossover point was randomly assigned. The purpose of the proposed model was to determine the antecedent of each rule from the features of the data set.
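The three operators named above can be sketched in Python as follows. The paper gives no formulas, so every detail here is an assumption: the tournament size, the per-gene mutation rate, reading "complement" as replacing a gene g with 1 - g, and reading "modified whole arithmetic" crossover as blending the two parents only from a randomly chosen point onward.

```python
import random


def tournament_select(population, fitness, k=3, rng=random):
    """Return the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]


def complement_mutation(chromosome, rate=0.1, rng=random):
    """Replace each gene g by 1 - g with probability `rate`."""
    return [1.0 - g if rng.random() < rate else g for g in chromosome]


def arithmetic_crossover(p1, p2, point=None, alpha=0.5, rng=random):
    """Copy genes before `point`; blend the parents from `point` onward."""
    if point is None:
        point = rng.randrange(len(p1))   # random crossover point per call
    tail1 = [alpha * a + (1 - alpha) * b for a, b in zip(p1[point:], p2[point:])]
    tail2 = [alpha * b + (1 - alpha) * a for a, b in zip(p1[point:], p2[point:])]
    return p1[:point] + tail1, p2[:point] + tail2


child1, child2 = arithmetic_crossover([0.0, 0.0, 0.0], [1.0, 1.0, 1.0],
                                      point=1, alpha=0.25)
print(child1, child2)
```

Both operators keep every gene inside [0, 1], which matches the gene range stated for the chromosome, so no clipping step is needed after they are applied.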
Table 2 shows the results of running the program for the different setups. In this table, nRule is the number of rules, and (nFeat, nState) is the combination of the number of variables in each rule and the number of terms for each variable. The results show that the sensitivity of the proposed model was between 93.2 and 100%, specificity between 10.42 and 54.17%, accuracy between 77.44 and 84.62%, PPV between 77.25 and 86.34%, NPV between 66.67 and 100%, and f-measure between 86.90 and 90.74%. The highest f-measure, which was the criterion for selecting the fittest chromosomes, was 90.74%, obtained with 4 rules, 2 variables in each rule and 7 terms for each variable.
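The six evaluation criteria follow the standard confusion-matrix definitions, sketched below in Python. The function name is illustrative. The example counts are not taken from Table 2; they are the counts implied by the 100% sensitivity and 37.5% specificity reported for the best rule set on 147 PD and 48 healthy records.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary metrics in percent; PD is the positive class."""
    def pct(num, den):
        # Return NaN instead of failing when a denominator is zero.
        return 100.0 * num / den if den else float("nan")

    sensitivity = pct(tp, tp + fn)
    specificity = pct(tn, tn + fp)
    ppv = pct(tp, tp + fp)
    npv = pct(tn, tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": pct(tp + tn, tp + fn + tn + fp),
        "ppv": ppv,
        "npv": npv,
        # F-measure as the harmonic mean of PPV (precision) and sensitivity.
        "f_measure": pct(2 * tp, 2 * tp + fp + fn),
    }


# 147 PD records all detected; 18 of 48 healthy records detected.
m = classification_metrics(tp=147, fn=0, tn=18, fp=30)
print({name: round(value, 2) for name, value in m.items()})
```

These counts reproduce the reported figures: 84.62% accuracy, 83.05% PPV, 100% NPV and 90.74% f-measure.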
The best set of rules for the various combinations of the rule base, highlighted in Table 2, is presented in Table 3. In Table 3, the selected fuzzy term for each variable is specified in the antecedent of the rule.
The numerical range of each membership function for each variable can be calculated from the width of the membership functions, which was obtained by dividing the interval equally by the number of allowed terms, together with the overlap of 0.25% of the width of the functions. Table 4 shows the best sets of rules in terms of compactness, balance between sensitivity and specificity, and highest fitness, after converting the fuzzy terms to numerical values for each of the variables.
Table 2. Results of the proposed method on the Parkinson's data set for different combinations
Table 3. Selected diagnostic rules of the proposed method on the Parkinson's data set for different combinations
Table 4. Selected rules from the most compact rule sets
Table 5. Set of rules generated by the C4.5 algorithm on the Oxford Parkinson dataset
Results of the implementation of the decision tree method
The rules generated by C4.5 decision tree on the data set are shown in Table 5. This method achieved 98.46% accuracy, 100% sensitivity and 93.75% specificity using 10 rules. Out of a total of 10 generated rules, 33 variables were used in antecedents of these rules and the average length of the rule was 3.3.
According to Table 2, the sensitivity of the proposed model on the Parkinson's dataset is not below 93% in any case. This shows that the proposed model can identify people with PD with high accuracy. In more than 14% of cases, the proposed model has reached 100% sensitivity. As a result, the proposed model works well in systems where the correct diagnosis of people with PD is the first priority.
The specificity obtained by the proposed model on the Parkinson's data set was below 50% in 74% of cases. This suggests that, in those cases, the proposed model performs worse than random on this criterion. A likely reason is the small number of negative class instances (healthy individuals) in this data set: only a quarter of the samples belong to people without PD.
The NPV criterion reached 100% in all cases where the sensitivity of the model was 100%. This shows that although the proposed model identifies only a small number of healthy people as healthy, all of the people it identifies as healthy are really healthy. This measure shows the reliability of the proposed model in determining healthy individuals. One combination in which the proposed model achieves 100% sensitivity and NPV is a rule base with 2 rules, 2 variables per rule, and 7 terms for each variable, which is a compact, fully interpretable set of rules that is easy to implement. As a result, in laboratory environments where the goal is to build a test device, the proposed model is a reliable option.
On the other hand, if the goal is to build a system with high specificity and sensitivity simultaneously, the best performance of the proposed model was observed in a rule base with 3 rules, 2 variables in each rule and 3 fuzzy terms for each variable. This rule base has a sensitivity of 94.56%, a specificity of 54.17% and an accuracy of 84.62%. Compared to the set of rules obtained by the C4.5 algorithm, the proposed method produces a more compact set of rules in terms of both the number of rules (3 versus 10) and the average rule length (2 versus 3.3). Although the performance criteria of the set of rules produced by the decision tree are higher than those of the proposed method, the generalizability of the rules of the proposed method is higher. In decision tree rules, small changes in the value of a variable can lead to different decisions; Rules 1 and 2 in Table 5 differ only in the value of the variable RPDE. If this value is less than 0.47, the sample is classified as a person with PD, and if it is equal to or greater than 0.47, the sample is classified as a person without PD. In the set of rules produced by the proposed method, by contrast, minor changes in the value of one variable do not cause a different classification. Unlike tree rules, which are hierarchical and whose decisions depend on minor changes in variables, each rule produced by the proposed method can be independently evaluated by a specialist and clinically approved or rejected.
One of the limitations of the proposed system is that it has not been evaluated on other data sets, which can be covered in future work. In addition, a case-by-case study of each rule on the data set, to assess how well each rule distinguishes healthy records from PD records, has not yet been carried out and should be addressed in future work.
According to the obtained results, the proposed model has worked quite well in compact rule extraction with very high interpretability; also, considering the imbalance of positive and negative samples in the data set, the accuracy of the proposed model is quite appropriate and can be used in the construction of medical assistant systems.
In this paper, a hybrid genetic-fuzzy model is presented to extract diagnostic rules from medical data. The goal was to achieve an accurate set of rules with high interpretability. The efficiency of the proposed method was evaluated on the Oxford Parkinson's data set. This dataset contains 147 records from people with PD and 48 records from healthy individuals. Despite the imbalance of positive and negative samples in this data set, the best set of rules obtained, with 4 rules, 2 variables in each rule and 7 terms for each variable, achieved on the whole data set 84.62% accuracy, 100% sensitivity, 37.5% specificity, 83.05% PPV, 100% NPV and 90.74% f-measure. This result shows that the proposed model achieved an error rate of 15.38% using only 9% of the data set features in the construction of each rule, reducing the dimensions of the data set by 90%. In addition, due to the compactness of the rule base and the reliability of the distinction between healthy and PD people according to the NPV and PPV criteria, the proposed model can be used in the construction of medical decision support systems. Finally, because the number of rules, the number of variables in each rule and the number of linguistic terms describing each clinical variable can all be adjusted, the proposed model can be retrained to achieve higher accuracy.
The authors agree on this final form of the manuscript and attest that all authors contributed to the final draft of the manuscript.
CONFLICTS OF INTEREST
The authors declare no conflicts of interest regarding the publication of this study.
No financial interests related to the material of this manuscript have been declared.