A Novel Structure of Highly Interpretable Fuzzy Rules Extraction

Fatemeh Ahouz¹; Amin Golabpour²*

1. Behbahan Khatam Al-Anbia University of Technology, Behbahan, Iran; 2. School of Medicine, Shahroud University of Medical Sciences, Shahroud, Iran.

Correspondence: * Corresponding author: Amin Golabpour, School of Medicine, Shahroud University of Medical Sciences, Shahroud, Iran. Email: a.golabpour@shmu.ac.ir

Abstract

Introduction:

Extracting effective rules from medical data with two indicators of accuracy and high interpretability is essential to increase the accuracy and speed of diagnosis by specialists. As a result, the production of medical assistant systems that are able to detect the rules governing the data plays a vital role in early detection of the disease and thus increase the chances of treatment, disease control and maintaining the quality of life of patients.

Material and Methods:

In this paper, a system of automatic extraction of rules from medical data by a new hybrid method based on fuzzy logic and genetic algorithm is presented. Genetic algorithms are used to automatically generate these rules. The Parkinson UCI dataset including 195 records and 23 variables was used to evaluate the proposed method based on the criteria of interpretability, accuracy, sensitivity and specificity.

Results:

On the Parkinson's dataset, the proposed model achieved an accuracy of 84.62%. This accuracy was obtained with 4 fuzzy rules with an average rule length of 2, using 7 linguistic terms: extremely low, very low, low, normal, high, very high and extremely high. All fuzzy membership functions that represent the terms have the same width.

Conclusion:

The proposed method, based on the three criteria of a low number of rules, short rule length and symmetric membership functions with equal width for all variables, is well suited to the automatic production of accurate and compact rules with high interpretability from medical data. A 90% dimensionality reduction in the experimental evaluation showed that this model could be used to implement real-time systems.

Received: 2020 October 18; Accepted: 2020 November 26

FHD. 2021 Jan 1; 10: 53

doi: 10.30699/fhi.v10i1.253

Keywords: Structure, Fuzzy, Rule Extraction.

INTRODUCTION

In recent years, fuzzy systems have been successfully used in various fields of science [1], engineering [2] and medicine [3] for various applications such as control, modeling and classification [4]. Fuzzy rules have become very popular with users due to their high readability, easy interpretation by humans, and the provision of insights into the knowledge embedded in classification systems [5, 6]. They use linguistic rules to describe systems that are easily interpreted and analyzed by users [7].

One of the most important factors in designing any fuzzy system is generating the fuzzy rules and the fuzzy membership functions for each rule set. There are two main ways to do this. In the first category, the rules are produced by an expert. This method is used especially in control problems with a small number of inputs [7]. The second category is the automatic production of rules using neuro-fuzzy techniques, clustering methods and evolutionary algorithms [8].

In the second category, fuzzy system design can be formulated as a search problem in a high-dimensional space in which each point represents a rule set, membership functions, and the resulting system behavior. Once performance criteria are defined, system performance takes the form of a hypersurface in this space. Developing an optimal fuzzy system design is equivalent to finding the optimal position on this hypersurface. This makes evolutionary algorithms such as genetic algorithms good candidates for searching this hypersurface [7, 9].

One application of fuzzy systems is in clinical decision support systems, where the discovery and interpretability of the rules underlying the data are of great importance. On the one hand, a specialist's knowledge of diagnostic rules is formed through experience over many years; on the other hand, the emergence of new diseases leads to rapid growth in medical knowledge and treatment. Extracting rules from data that are both highly accurate and highly interpretable therefore helps specialists increase the accuracy and speed of diagnosis [10-12].

In this study, a hybrid genetic-fuzzy classification model is presented to automatically extract a set of compact and highly accurate diagnostic rules from data. The proposed chromosomal structure is designed to select the most effective subset of features and provide the best diagnostic rules required by the fuzzy classification system. The Oxford Parkinson's Disease Detection dataset [13] was used to evaluate the proposed method. Parkinson's disease (PD) is the second most common progressive neurodegenerative disease in the world and a public health concern [14, 15]. The disease is caused by insufficient production of dopamine in the brain and impairs motor and speech abilities [16-18]. More than 10 million people worldwide currently suffer from this disease [19]. Population aging is one of the factors increasing the number of patients with this disease [14, 20]. Given that speech disorders are seen in patients with PD approximately five years before they are clinically diagnosed [14], designing automated PD diagnosis systems that use voice data is a promising route to early diagnosis. Early diagnosis in turn helps maintain quality of life and reduce the symptoms associated with the disease.

So far, many studies have used intelligent methods to diagnose PD on the Oxford Parkinson's dataset, including basic machine learning methods [21, 22], hybrid classifiers [14, 20, 23, 24], evolutionary algorithms [25, 26] and fuzzy expert systems [24, 27, 28]. The study by Abiyev and Abizade [24] is one of the works aimed at extracting diagnostic rules for PD. The authors presented a PD diagnosis system combining the Takagi-Sugeno-Kang fuzzy system with a six-layer neural network. The first layer was the input layer, the second layer contained the fuzzy membership functions that assign a degree of membership to each linguistic term, the third layer contained the fuzzy rules, and the fourth layer was the result layer of the fuzzy rules. The fifth layer multiplied the output signals of the third layer by the output signals of the fourth layer. The sixth layer calculated the system output based on a relation. To design the rules in their work, a clustering method was used along with a gradient technique. Despite the full design description and the reported 100% accuracy of the model on the Oxford Parkinson's dataset, no diagnostic rules were stated in the article.

In 2014, Hsieh et al. [29] proposed a PSO-based Fuzzy Hyper-Rectangular Composite Neural Network (PFHRCNN) to extract rules from the Parkinson's disease dataset. The data set was randomly divided 50-50% into training and test sets and the method was run 10 times; with 8 rules, the reported accuracy was 82.4% on the test set and 92.6% on the training set. The article listed only 2 of the 8 rules, each of which specifies a range for all 22 features in the dataset for both healthy people and Parkinson's patients. Given that rule length is one of the main indicators of interpretability in clinical decision support systems [10-12], the large number of conditions in each rule made the rules difficult to interpret. Rodrigues and Karunanithi used a Mamdani fuzzy inference system in 2019 [27]. Without mentioning the feature selection method, they selected four features, Spread1, DFA, FoH and Spread2, to analyze PD diagnosis and severity. The values of these four features were categorized into three sets: values of people with PD, values common to both PD and healthy people, and values of completely healthy people. According to the authors, a clear difference in the values of these four features was observed between healthy people and those with PD, but the accuracy of the model created by these rules was not reported. Their final system consisted of 81 rules, of which only 9 were mentioned in the article. Given the importance of the interpretability of diagnostic rules for the human user and its direct relationship with the number and length of rules, the present study aims to provide a structure that produces the most compact yet highly accurate set of rules, extracted automatically from the data set.

MATERIAL AND METHODS

This study uses the Oxford Parkinson's dataset to train and evaluate the proposed model. The dataset is taken from the UCI Machine Learning Repository and was created by Max Little of the University of Oxford in collaboration with the National Center for Voice and Speech, Denver, Colorado, USA [13]. The dataset includes information on 31 individuals, 23 of whom have PD, and has a total of 24 attributes: patient name, status (healthy or with PD), and 22 voice measurements taken from the Multi-Dimensional Voice Program (MDVP) [30]. The dataset has a total of 195 voice recordings, consisting of 5 to 6 recordings from each person uttering vowels for 36 seconds, the details of which are described in Table 1. Of these, 147 records belong to the 23 people with PD (approximately 75% of the records) and 48 records belong to the 8 people without PD (approximately 25%). The time since diagnosis ranged from 0 to 28 years, and the age of the participants was between 46 and 85 years with an average of 65.8. The main purpose of this data set was to differentiate people with PD from healthy people through influential voice features. Table 1 shows the range, mean and standard deviation of each feature in this data set for healthy people and those with PD [31].

Table 1. Parkinson data set statistical information

Feature	Explanation	Parkinson's disease (status=1) [min-max]	Parkinson's disease (status=1) mean±SD	Healthy (status=0) [min-max]	Healthy (status=0) mean±SD
MDVPFo(Hz)	Average basic frequency of sound	[88.33-222.36]	145.18±32.35	[110.74-260.11]	181.94±52.73
MDVPFhi(Hz)	Maximum basic sound frequency	[102.15-588.52]	188.44±88.34	[113.60-592.03]	223.64±96.73
MDVPFlo(Hz)	Minimum basic sound frequency	[65.48-199.02]	106.89±32.27	[74.29-239.17]	145.21±58.76
MDVPJitter	Some measurements of changes in fundamental frequency	[0.00-0.03]	0.01±0.01	[0.00-0.01]	0.00±0.00
MDVPJitterAbs		[0.00-0.00]	0.00±0.00	[0.00-0.00]	0.00±0.00
MDVPRAP		[0.00-0.02]	0.00±0.00	[0.00-0.01]	0.00±0.00
MDVPPPQ		[0.00-0.02]	0.00±0.00	[0.00-0.01]	0.00±0.00
JitterDDP		[0.00-0.06]	0.01±0.01	[0.00-0.02]	0.01±0.00
MDVPShimmer	Some measurements of domain changes	[0.01-0.12]	0.03±0.02	[0.01-0.04]	0.02±0.01
MDVPShimmerdB		[0.09-1.30]	0.32±0.21	[0.09-0.41]	0.16±0.06
ShimmerAPQ3		[0.00-0.06]	0.02±0.01	[0.00-0.02]	0.01±0.00
ShimmerAPQ5		[0.01-0.08]	0.02±0.01	[0.01-0.02]	0.01±0.00
MDVPAPQ		[0.01-0.14]	0.03±0.02	[0.01-0.03]	0.01±0.00
ShimmerDDA		[0.01-0.17]	0.05±0.03	[0.01-0.07]	0.03±0.01
NHR	Two measurements of the ratio of noise to tonal components in sound condition	[0.00-0.31]	0.03±0.04	[0.00-0.11]	0.01±0.02
HNR		[8.44-29.93]	20.97±4.34	[17.88-33.05]	24.68±3.43
RPDE	Two measurements of the complexity of nonlinear dynamics	[0.26-0.69]	0.52±0.10	[0.26-0.66]	0.44±0.09
D2	Two measurements of the complexity of nonlinear dynamics	[1.77-3.67]	2.46±0.38	[1.42-2.88]	2.15±0.31
DFA	Representative of the fractal scalability of the signal	[0.57-0.83]	0.73±0.05	[0.63-0.79]	0.70±0.05
spread1	Three nonlinear measurements of fundamental frequency changes	[(-7.12) – (-2.43)]	-5.33±0.97	[(-7.96) – (-5.20)]	-6.76±0.64
spread2		[0.06-0.45]	0.25±0.08	[0.01-0.29]	0.16±0.06
PPE		[0.09-0.53]	0.23±0.08	[0.04-0.25]	0.12±0.04

The structure of the proposed diagnostic method

Fig 1 shows the flowchart of the proposed method. After normalizing the dataset to the interval [0,1], a genetic algorithm is applied to extract the rules from the dataset. A fuzzy rule base is embedded in the structure of the proposed chromosome. Each rule base contains r rules. Each rule has a consequent (class label) and an antecedent. The antecedent consists of two parts. The first part determines the indices of the variables selected for the antecedent of the rule (identified by Fⁱ), and the second part determines the value corresponding to each of the variables selected in the first part (identified by Vⁱ). The number of allowed values (linguistic terms) for each variable is determined by the parameter s, and the number of variables used in each antecedent is determined by the parameter f. With this structure, fixing the number of variables in each rule allows the most effective subset of features and the most accurate rules to be selected together. The size of the proposed chromosome is calculated from Eq 1:

$$\text{chromoSize} = r \times (2 \times f + 1) \qquad (1)$$

Where r is the number of rules and f is the number of variables in the antecedent of each rule. The constant 1 is added for the gene representing the consequent of the rule. Each gene has a value between zero and one that is randomly generated in the initial population. A valid chromosome satisfies three conditions: first, no duplicate rules are allowed in the same rule base; second, no variable may appear more than once in the same antecedent; and third, the consequent of at least one rule (i.e., its class label) must differ from the consequents of the other rules. The purpose of the algorithm is to determine the best subset of features in the data set to form the antecedent of each rule.
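The encoding described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the gene order within a rule (f feature genes, then f value genes, then one consequent gene), the mapping from a gene in [0, 1] to a feature index, linguistic term or class label, and the names `chromo_size`, `gene_to_int`, `decode` and `is_valid` are all hypothetical.

```python
import random


def chromo_size(r, f):
    """Eq. 1: r rules, each with f feature genes, f value genes and 1 consequent gene."""
    return r * (2 * f + 1)


def gene_to_int(gene, n):
    """Map a gene in [0, 1] to an integer in 1..n (assumed mapping)."""
    return min(int(gene * n) + 1, n)


def decode(chromosome, r, f, n_features, s):
    """Split a flat chromosome into r rules of the form (features, terms, label)."""
    step = 2 * f + 1
    rules = []
    for i in range(r):
        genes = chromosome[i * step:(i + 1) * step]
        features = tuple(gene_to_int(g, n_features) for g in genes[:f])
        terms = tuple(gene_to_int(g, s) for g in genes[f:2 * f])
        label = 1 if genes[-1] >= 0.5 else 0  # assumed coding: 1 = PD, 0 = healthy
        rules.append((features, terms, label))
    return rules


def is_valid(rules):
    """Check the three validity conditions stated in the text."""
    no_duplicate_rules = len(set(rules)) == len(rules)
    no_repeated_variable = all(len(set(feats)) == len(feats) for feats, _, _ in rules)
    both_classes_present = len({label for _, _, label in rules}) > 1
    return no_duplicate_rules and no_repeated_variable and both_classes_present


# Random initial chromosome for 4 rules with 2 variables each, 22 features, 7 terms.
rng = random.Random(0)
chromosome = [rng.random() for _ in range(chromo_size(4, 2))]
rule_base = decode(chromosome, r=4, f=2, n_features=22, s=7)
print(len(chromosome), is_valid(rule_base))
```

Keeping every gene in [0, 1] means the same real-valued operators can act on feature, value and consequent genes alike; the decoding step alone gives them their meaning.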

A fuzzy expert system is used to calculate the fitness of each chromosome. First, each rule is converted to Mamdani fuzzy rule form. In this form, if the value of a variable used in the antecedent of the rule is 1, it corresponds to the extreme case of low and is represented by a left shoulder membership function. If the value of a variable used in the antecedent of the rule is s, it corresponds to the extreme case of high and is represented by a right shoulder membership function. Otherwise, symmetric triangular functions are used to represent the other linguistic terms. These functions are among the most common in research on extracting rules from medical data, including [1, 4, 5, 32, 33]. In this study, all membership functions have the same width and every pair of adjacent membership functions overlaps by α. To represent the output of each rule, symmetric triangular membership functions with fixed parameters are used.

Comparison of the proposed method with the decision tree
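The membership functions described above can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes that the normalised interval [0, 1] is split into s equal parts, that each term extends a fraction `alpha` of that width beyond its part on both sides, and that the antecedent is combined with the minimum operator. The names `membership` and `firing_strength` and the default `alpha=0.25` are assumptions.

```python
def membership(x, k, s, alpha=0.25):
    """Degree to which a normalised value x in [0, 1] belongs to term k (1..s)."""
    width = 1.0 / s                        # every term has the same width
    centre = (k - 0.5) * width             # peak of the symmetric triangle
    half_base = width / 2 + alpha * width  # each term extends alpha * width past its part
    if k == 1 and x <= centre:             # left shoulder: extreme case of low
        return 1.0
    if k == s and x >= centre:             # right shoulder: extreme case of high
        return 1.0
    return max(0.0, 1.0 - abs(x - centre) / half_base)


def firing_strength(sample, features, terms, s, alpha=0.25):
    """Antecedent activation of one rule, using min as the fuzzy AND (assumption)."""
    return min(membership(sample[name], k, s, alpha) for name, k in zip(features, terms))


# A rule with two variables and 7 terms: "a is term 1 and b is term 7".
sample = {"a": 0.02, "b": 0.97}
print(firing_strength(sample, ["a", "b"], [1, 7], s=7))
```

Because only s and the overlap are free parameters, every variable shares the same partition, which is what keeps the linguistic terms comparable across features.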

The decision tree is one of the most efficient and popular data mining algorithms in the field of rule extraction [34]. In addition to generating interpretable rules, this algorithm is also able to select effective features. Therefore, in order to evaluate the proposed method based on the criteria of interpretability, generalizability and accuracy of the extracted rules, the decision tree algorithm is also implemented on the Oxford Parkinson data set.

[Figure ID: F1] Fig 1. The structure of the proposed model.

RESULTS

The proposed genetic algorithm was run for each combination of the number of rules, the number of variables in the antecedent of each rule, and the number of terms for each variable in the antecedent of the rules. Given that the goal was to create a compact rule base with high interpretability, the number of rules was empirically set between 2 and 7, the number of variables in each rule between 2 and 4, and the number of terms for each variable in the antecedent of the rules to 3, 5 or 7 (an inactive mode was not allowed for selected variables in this structure). To reduce the effect of randomness in the genetic algorithm, each setup was run 10 times.

The tournament method was used to select the parents. The mutation operator was the complement method and the crossover operator was the modified whole arithmetic method; in each call, the crossover point was randomly assigned. The purpose of the proposed model was to determine the antecedent of each rule from the features of the data set.
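The three operators named above can be sketched in Python as follows. The source does not specify the details of the "modified" whole arithmetic crossover, so the version here (blend the parents from a random point onwards, leave the prefix unchanged) is an assumption, as are the function names, the tournament size `k` and the mutation `rate`.

```python
import random


def tournament_select(population, fitness, k, rng):
    """Draw k distinct individuals at random and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]


def arithmetic_crossover(p1, p2, rng):
    """Blend both parents from a random point onwards (assumed variant)."""
    point = rng.randrange(len(p1))
    a = rng.random()
    tail1 = [a * x + (1 - a) * y for x, y in zip(p1[point:], p2[point:])]
    tail2 = [a * y + (1 - a) * x for x, y in zip(p1[point:], p2[point:])]
    return p1[:point] + tail1, p2[:point] + tail2


def complement_mutation(chromosome, rate, rng):
    """Replace each gene g by its complement 1 - g with probability rate."""
    return [1.0 - g if rng.random() < rate else g for g in chromosome]


rng = random.Random(42)
parent_a = [0.1, 0.8, 0.3, 0.9, 0.6]
parent_b = [0.7, 0.2, 0.5, 0.4, 0.0]
child_a, child_b = arithmetic_crossover(parent_a, parent_b, rng)
print(complement_mutation(child_a, rate=0.1, rng=rng))
```

Both operators keep every gene inside [0, 1], so offspring can be decoded with the same mapping as the initial population.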

Table 2 shows the results of program execution for the different configurations. In this table, nRule is the number of rules, and (nFeat, nState) is the combination of the number of variables in each rule and the number of terms for each variable. The results show that the sensitivity of the proposed model ranged from 93.2 to 100%, specificity from 10.42 to 54.17%, accuracy from 77.44 to 84.62%, PPV from 77.25 to 86.34%, NPV from 66.67 to 100% and f-measure from 86.90 to 90.74%. The highest f-measure, which was the criterion for selecting the fittest chromosomes, was 90.74%, obtained with 4 rules, 2 variables in each rule and 7 terms for each variable.
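The six evaluation criteria can be computed from a confusion matrix as in the sketch below. The confusion-matrix counts are not given in the source; they are inferred here from the reported 100% sensitivity and 37.5% specificity of the best configuration together with the 147 PD and 48 healthy records, and the f-measure is assumed to be the harmonic mean of PPV and sensitivity. The function name is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return the six criteria used in Table 2 as fractions in [0, 1]."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    f_measure = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "ppv": ppv,
        "npv": npv,
        "f_measure": f_measure,
    }


# Inferred counts: all 147 PD records detected, 18 of 48 healthy records detected.
best = classification_metrics(tp=147, fn=0, tn=18, fp=30)
print({name: round(100 * value, 2) for name, value in best.items()})
```

With these inferred counts the rounded percentages agree with the row for 4 rules and (2,7) in Table 2: 84.62% accuracy, 83.05% PPV, 100% NPV and 90.74% f-measure.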

The best set of rules for the various combinations of the rule base, highlighted in Table 2, is presented in Table 3. In Table 3, the selected fuzzy term for each variable is specified in the antecedent of the rule.

The width of each membership function was obtained by dividing the interval equally by the number of allowed terms, with an overlap of 0.25% of the width of the functions; from these, the numerical range of each term for each variable could be calculated. Table 4 shows the best rule sets in terms of compactness, balance between sensitivity and specificity, and highest fitness, after converting the fuzzy terms to numerical values of each variable.
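The conversion from a linguistic term to a numerical range can be sketched as follows. The exact formula is not given in the source, so this is an assumption: the range of a variable is split into s equal parts, term k covers the k-th part extended on both sides by a fraction `alpha` of the width, and the result is clipped to the observed range. Here `alpha` is taken as 0.25 of the width, and the function name `term_range` is hypothetical.

```python
def term_range(lo, hi, k, s, alpha=0.25):
    """Numerical support of term k (1..s) for a variable observed in [lo, hi]."""
    width = (hi - lo) / s
    lower = lo + (k - 1) * width - alpha * width
    upper = lo + k * width + alpha * width
    return max(lo, lower), min(hi, upper)


# MDVP:Flo(Hz) spans 65.48 to 239.17 over both classes in Table 1;
# "normal" is the middle term (k=2) when 3 terms are allowed.
print(term_range(65.48, 239.17, k=2, s=3))
```

Under these assumptions the bounds for MDVP:Flo(Hz) round to 109 and 196, which agrees with the corresponding rule in Table 4.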

Table 2. Results of the proposed method on the Parkinson's data set for different combinations

nRule	(nFeat, nState)	sensitivity	specificity	accuracy	PPV	NPV	FMeasure	nRule	(nFeat, nState)	sensitivity	specificity	accuracy	PPV	NPV	FMeasure
2	(2,3)	97.96	35.42	82.56	82.29	85.00	89.44	5	(2,3)	94.56	54.17	84.62	86.34	76.47	90.26
	(2,5)	99.32	33.33	83.08	82.02	94.12	89.85		(2,5)	98.64	39.58	84.10	83.33	90.48	90.34
	(2,7)	100.00	22.92	81.03	79.89	100.00	88.82		(2,7)	97.96	37.50	83.08	82.76	85.71	89.72
	(3,3)	97.96	35.42	82.56	82.29	85.00	89.44		(3,3)	94.56	54.17	84.62	86.34	76.47	90.26
	(3,5)	99.32	33.33	83.08	82.02	94.12	89.85		(3,5)	99.32	35.42	83.59	82.49	94.44	90.12
	(3,7)	100.00	22.92	81.03	79.89	100.00	88.82		(3,7)	99.32	31.25	82.56	81.56	93.75	89.57
	(4,3)	97.96	35.42	82.56	82.29	85.00	89.44		(4,3)	94.56	54.17	84.62	86.34	76.47	90.26
	(4,5)	99.32	10.42	77.44	77.25	83.33	86.90		(4,5)	98.64	33.33	82.56	81.92	88.89	89.51
	(4,7)	93.20	41.67	80.51	83.03	66.67	87.82		(4,7)	99.32	18.75	79.49	78.92	90.00	87.95
3	(2,3)	94.56	54.17	84.62	86.34	76.47	90.26	6	(2,3)	94.56	54.17	84.62	86.34	76.47	90.26
	(2,5)	99.32	35.42	83.59	82.49	94.44	90.12		(2,5)	98.64	39.58	84.10	83.33	90.48	90.34
	(2,7)	97.96	37.50	83.08	82.76	85.71	89.72		(2,7)	100.00	37.50	84.62	83.05	100.00	90.74
	(3,3)	94.56	54.17	84.62	86.34	76.47	90.26		(3,3)	94.56	54.17	84.62	86.34	76.47	90.26
	(3,5)	99.32	35.42	83.59	82.49	94.44	90.12		(3,5)	99.32	35.42	83.59	82.49	94.44	90.12
	(3,7)	98.64	31.25	82.05	81.46	88.24	89.23		(3,7)	98.64	31.25	82.05	81.46	88.24	89.23
	(4,3)	94.56	54.17	84.62	86.34	76.47	90.26		(4,3)	98.64	35.42	83.08	82.39	89.47	89.78
	(4,5)	99.32	31.25	82.56	81.56	93.75	89.57		(4,5)	98.64	37.50	83.59	82.86	90.00	90.06
	(4,7)	100.00	25.00	81.54	80.33	100.00	89.09		(4,7)	98.64	22.92	80.00	79.67	84.62	88.15
4	(2,3)	94.56	54.17	84.62	86.34	76.47	90.26	7	(2,3)	94.56	54.17	84.62	86.34	76.47	90.26
	(2,5)	98.64	39.58	84.10	83.33	90.48	90.34		(2,5)	98.64	39.58	84.10	83.33	90.48	90.34
	(2,7)	100.00	37.50	84.62	83.05	100.00	90.74		(2,7)	99.32	37.50	84.10	82.95	94.74	90.40
	(3,3)	94.56	54.17	84.62	86.34	76.47	90.26		(3,3)	94.56	54.17	84.62	86.34	76.47	90.26
	(3,5)	99.32	35.42	83.59	82.49	94.44	90.12		(3,5)	99.32	35.42	83.59	82.49	94.44	90.12
	(3,7)	100.00	25.00	81.54	80.33	100.00	89.09		(3,7)	98.64	31.25	82.05	81.46	88.24	89.23
	(4,3)	94.56	54.17	84.62	86.34	76.47	90.26		(4,3)	94.56	54.17	84.62	86.34	76.47	90.26
	(4,5)	99.32	35.42	83.59	82.49	94.44	90.12		(4,5)	98.64	14.58	77.95	77.96	77.78	87.09
	(4,7)	100.00	22.92	81.03	79.89	100.00	88.82		(4,7)	100.00	22.92	81.03	79.89	100.00	88.82

Table 3. Selected diagnostic rules of the proposed method on the Parkinson's data set for different combinations

#Rule	Rule Set	f-measure
2	if MDVP:Fhi(Hz) is very high and spread2 is very high Then Healthy; if Jitter:DDP is high and Shimmer:APQ3 is very high Then Sick	89.85%
3	if DFA is high and RPDE is high Then Healthy; if PPE is high and Shimmer:APQ5 is low Then Healthy; if MDVP:Flo(Hz) is normal and MDVP:Shimmer is normal Then Sick	90.26%
4	if MDVP:Shimmer is very very high and MDVP:Fo(Hz) is very very high Then Healthy; if MDVP:Fo(Hz) is very very low and RPDE is normal Then Healthy; if D2 is very high and MDVP:Fhi(Hz) is very high Then Healthy; if MDVP:Fhi(Hz) is very low and MDVP:APQ is very low Then Sick	90.74%
5	if HNR is very high and spread1 is very high Then Healthy; if RPDE is normal and MDVP:Fhi(Hz) is high Then Healthy; if MDVP:RAP is very high and Jitter:DDP is very high Then Healthy; if MDVP:Fhi(Hz) is very low and Jitter:DDP is normal Then Sick; if Shimmer:APQ5 is high and MDVP:PPQ is very low Then Healthy	90.34%
6	if Shimmer:APQ5 is very very high and MDVP:Jitter(%) is very very high Then Healthy; if Shimmer:APQ5 is low and Shimmer:APQ3 is very very low Then Healthy; if Jitter:DDP is very high and DFA is very high Then Healthy; if MDVP:APQ is very low and MDVP:Shimmer(dB) is very low Then Sick; if Shimmer:APQ3 is low and MDVP:Shimmer is very very low Then Sick; if MDVP:PPQ is low and MDVP:RAP is very very low Then Healthy	90.74%
7	if MDVP:Fo(Hz) is very very high and DFA is very very high Then Healthy; if spread2 is normal and MDVP:APQ is very very low Then Healthy; if MDVP:Flo(Hz) is very high and Shimmer:APQ3 is very high Then Healthy; if Shimmer:DDA is high and DFA is normal Then Sick; if MDVP:Shimmer is high and MDVP:Jitter(%) is low Then Healthy; if MDVP:Jitter(Abs) is very low and MDVP:RAP is normal Then Healthy; if Jitter:DDP is very low and Shimmer:APQ3 is very low Then Sick	90.40%

Table 4. The most compact of the selected rule sets, with fuzzy terms converted to numerical ranges

# Rule	Rule Set	Sensitivity	Specificity	F-measure
2	if 470 ≤ MDVP:Fhi(Hz) ≤ 593 and 0 ≤ spread2 ≤ 1 Then Healthy; if 0 ≤ Jitter:DDP ≤ 1 and 0 ≤ Shimmer:APQ3 ≤ 1 Then Sick	99.32	33.33	89.85%
3	if DFA == 1 and RPDE == 1 Then Healthy; if 0 ≤ PPE ≤ 1 and 0 ≤ Shimmer:APQ5 ≤ 1 Then Healthy; if 109 ≤ MDVP:Flo(Hz) ≤ 196 and 0 ≤ MDVP:Shimmer ≤ 1 Then Sick	94.56	54.17	90.26%
6	if 0 ≤ Shimmer:APQ5 ≤ 1 and 0 ≤ MDVP:Jitter(%) ≤ 1 Then Healthy; if 0 ≤ Shimmer:APQ5 ≤ 1 and 0 ≤ Shimmer:APQ3 ≤ 1 Then Healthy; if 0 ≤ Jitter:DDP ≤ 1 and DFA == 1 Then Healthy; if 0 ≤ MDVP:APQ ≤ 1 and 0 ≤ MDVP:Shimmer(dB) ≤ 1 Then Sick; if 0 ≤ Shimmer:APQ3 ≤ 1 and 0 ≤ MDVP:Shimmer ≤ 1 Then Sick; if 0 ≤ MDVP:PPQ ≤ 1 and 0 ≤ MDVP:RAP ≤ 1 Then Healthy	100.00	37.50	90.74%

Table 5. Set of rules generated by C4.5 algorithm on the Parkinson Oxford dataset

Row	Rule
1	if (PPE < 0.13 and MDVP:Fhi < 202.31 and RPDE < 0.46) then sick
2	if (PPE < 0.13 and MDVP:Fhi < 202.31 and RPDE >= 0.46) then healthy
3	if (PPE < 0.13 and MDVP:Fhi >= 202.31 and spread2 < 0.24 and D2 < 2.81) then healthy
4	if (PPE < 0.13 and MDVP:Fhi >= 202.31 and spread2 < 0.24 and D2 >= 2.81) then healthy
5	if (PPE < 0.13 and MDVP:Fhi >= 202.31 and spread2 >= 0.24) then sick
6	if (PPE >= 0.13 and Shimmer:APQ5 < 0.01 and MDVP:Fo < 112.349) then sick
7	if (PPE >= 0.13 and Shimmer:APQ5 < 0.01 and MDVP:Fo >= 112.349) then healthy
8	if (PPE >= 0.13 and Shimmer:APQ5 < 0.01 and MDVP:Fo >= 117.99 and MDVP:Shimmer < 0.021) then sick
9	if (PPE >= 0.13 and Shimmer:APQ5 < 0.01 and MDVP:Fo >= 117.99 and MDVP:Shimmer >= 0.021) then healthy
10	if (PPE >= 0.13 and Shimmer:APQ5 >= 0.01) then sick

Results of the implementation of the decision tree method

The rules generated by the C4.5 decision tree on the data set are shown in Table 5. This method achieved 98.46% accuracy, 100% sensitivity and 93.75% specificity using 10 rules. Across the 10 generated rules, a total of 33 conditions on variables appear in the antecedents, giving an average rule length of 3.3.

DISCUSSION

According to Table 2, the sensitivity of the proposed model on the Parkinson's dataset is not below 93% in any case. This shows that the proposed model can identify people with PD with high accuracy. In more than 14% of cases, the proposed model has reached 100% sensitivity. As a result, the proposed model works well in systems where the correct diagnosis of people with PD is the first priority.

The specificity obtained from the implementation of the proposed model on the Parkinson's data set was below 50% in 74% of cases. This suggests that the proposed model on the Parkinson's dataset for this criterion performs worse than random. The reason for this might be seen in the small number of negative class instances (healthy individuals) in this data set. In this dataset, a quarter of the total number of samples belong to people without PD.

The NPV criterion reached 100% in all cases where the sensitivity of the model is 100%. This shows that although the proposed model identifies only a small number of healthy people as healthy, all of the people it identifies as healthy really are healthy. This measure shows the reliability of the proposed model in determining healthy individuals. One combination in which the proposed model achieves 100% sensitivity and NPV is a set of 2 rules with 2 variables per rule and 7 terms for each variable, which is a compact, fully interpretable set of rules that is easy to implement. As a result, in laboratory environments where the goal is to build a test device, the proposed model is a reliable option.

On the other hand, if the goal is to build a system with high specificity and sensitivity simultaneously, the best performance of the proposed model was observed in a rule base with 3 rules, 2 variables in each rule and 3 fuzzy terms for each variable. This rule base has a sensitivity of 94.56%, a specificity of 54.17% and an accuracy of 84.62%. Compared to the set of rules obtained by the C4.5 algorithm, the proposed method produces a more compact set of rules in terms of the number of rules (ratio 3 to 10) and the average length of rules (ratio 2 to 3.3). Although the performance criteria of the set of rules produced by the decision tree are higher than those of the proposed method, the generalizability of the rules of the proposed method is higher. In decision tree rules, small changes in the value of a variable cause different decisions; Rules 1 and 2 in Table 5 differ only in the value of the variable RPDE. If this value is less than 0.46, the sample is classified as a person with PD, and if it is equal to or greater than 0.46, the sample is classified as a person without PD. In the set of rules produced by the proposed method, by contrast, minor changes at the level of one variable will not cause a different classification. Unlike tree rules, which are hierarchical and whose decisions depend on minor changes in variables, each rule produced by the proposed method can be evaluated independently by a specialist and approved or rejected clinically.

One limitation of the proposed system is that it has not been evaluated on different data sets, which can be addressed in future work. In addition, a case-by-case study of each rule on the data set, to assess how well each rule distinguishes healthy from PD records, has not been carried out and is left for future work.

According to the obtained results, the proposed model has worked quite well in compact rule extraction with very high interpretability; also, considering the imbalance of positive and negative samples in the data set, the accuracy of the proposed model is quite appropriate and can be used in the construction of medical assistant systems.

CONCLUSION

In this paper, a hybrid genetic-fuzzy model is presented to extract diagnostic rules from medical data. The goal was to achieve an accurate set of rules with high interpretability. The efficiency of the proposed method was evaluated on the Oxford Parkinson's data set. This dataset contains 147 records of people with PD and 48 records of healthy individuals. Despite the imbalance of positive and negative samples in this data set, the best set of rules obtained has 4 rules, 2 variables in each rule and 7 terms for each variable, and achieves 84.62% accuracy, 100% sensitivity, 37.5% specificity, 83.05% PPV, 100% NPV and 90.74% f-measure on the whole data set. This result shows that the proposed model achieved an error rate of 15.38% by using only 9% of the data set features in the construction of each rule, reducing the dimensions of the data set by 90%. Also, due to the compactness of the rule base and the reliability of the distinction between healthy and PD people according to the NPV and PPV criteria, the proposed model can be used in the construction of medical decision support systems. In addition, because the number of rules, the number of variables in each rule and the number of linguistic terms describing each clinical variable can be adjusted, the proposed model can be retrained to achieve higher accuracy.

References


1.	Feng, TC. Li, T. Kuo, PH. Variable coded hierarchical fuzzy classification model using DNA coding and evolutionary programming. Applied Mathematical Modelling 2015 39(23-24):7401–19.
2.	De, SE. Rizzi, A. Sadeghian, A. Hierarchical genetic optimization of a fuzzy logic system for energy flows management in microgrids. Applied Soft Computing 2017 60:135–49.
3.	Tan, CH. Tan, MS. Chang, SW. Yap, KS. Yap, HJ. Wong, SY. Genetic algorithm fuzzy logic for medical knowledge-based pattern classification. Journal of Engineering Science and Technology 2018 13(Special Issue on ICCSIT 2018):242–58.
4.	Ishibuchi, H.; Nojima, Y.; Kuwajima, I. Genetic rule selection as a postprocessing procedure in fuzzy data mining. International Symposium on Evolving Fuzzy Systems IEEE; 2006.
5.	Gorzalczany, MB. Rudzinski, F. Interpretable and accurate medical data classification: A multi-objective genetic-fuzzy optimization approach. Expert Systems with Applications 2017 71:26–39.
6.	Mitra, S. Hayashi, Y. Neuro-fuzzy rule generation: Survey in soft computing framework. IEEE Transactions on Neural Networks 2000 11(3):748–68.
7.	Shi, Y. Eberhart, R. Chen, Y. Implementation of evolutionary fuzzy systems. IEEE Transactions on Fuzzy Systems 1999 7(2):109–19.
8.	Chang, X. Lilly, JH. Evolutionary design of a fuzzy classifier from data. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 2004 34(4):1894–906.
9.	GaneshKumar, P. Rani, C. Devaraj, D. Victoire, T. Hybrid Ant Bee Algorithm for Fuzzy Expert System Based Sample Classification. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014 11(2):347–60.
10.	Shortliffe, E. Cimino, JJ. Biomedical informatics: Computer applications in health care and biomedicine. 4th ed Springer
11.	Sujatha, R. Ephzibah, EP. Dharinya, S. Uma, MG. Mareeswari, V. Pamidimarri, V. Comparative study on dimensionality reduction for disease diagnosis using fuzzy classifier. International Journal of Engineering and Technology 2018 7(1):79–84.
12.	Seera, M. Lim, CP. A hybrid intelligent system for medical data classification. Expert Systems with Applications 2014 41(5):2239–49.
13.	UCI Machine Learning Repository. Parkinsons data set [Internet]. 2007 [cited: 15 Jun 2020]. Available from: [WebCite Cache]
14.	Cai, Z. Gu, J. Wen, C. Zhao, D. Huang, C. Huang, H. An intelligent Parkinson's disease diagnostic system based on a chaotic bacterial foraging optimization enhanced fuzzy KNN approach. Comput Math Methods Med 2018 2018:2396952.
15.	Li, Y. Swift, S. Tucker, A. Modelling and analysing the dynamics of disease progression from cross-sectional studies. J Biomed Inform 2013 46(2):266–74.
16.	Pahuja, G. Nagabhushan, TN. A Comparative Study of Existing Machine Learning Approaches for Parkinson's Disease Detection. IETE Journal of Research 2018 2018:1–11.
17.	Devarajan, M. Ravi, L. Intelligent cyber-physical system for an efficient detection of Parkinson disease using fog computing. Multimedia Tools and Applications 2019 78:32695–719.
18.	Olivares, R. Munoz, R. Soto, R. Crawford, B. Cárdenas, D. Ponce, A. An optimized brain-based algorithm for classifying Parkinson's disease. Applied Sciences 2020 10(5):1827.
19.	Parkinson's Foundation. Statistics [Internet]. 2020 [cited: 1 Jun 2020]. Available from: [WebCite Cache]
20.	Hariharan, M. Polat, K. Sindhu, R. A new hybrid intelligent system for accurate detection of Parkinson's disease. Comput Methods Programs Biomed 2014 113(3):904–13.
21.	Marar, S. Swain, D. Hiwarkar, V. Motwani, N. Awari, A. Predicting the occurrence of Parkinson's Disease using various classification models. International Conference on Advanced Computation and Telecommunication IEEE; 2018
22.	Caesarendra, W. Putri, FT. Ariyanto, M. Setiawan, JD. Pattern recognition methods for multi stage classification of Parkinson's disease utilizing voice features. IEEE International Conference on Advanced Intelligent Mechatronics IEEE; 2015
23.	Avci, D. Dogantekin, A. An expert diagnosis system for Parkinson disease based on genetic algorithm, wavelet kernel, extreme learning machine. Parkinsons Dis 2016 2016:5264743.
24.	Abiyev, RH. Abizade, S. Diagnosing Parkinson's diseases using fuzzy neural system. Comput Math Methods Med 2016 2016:1267919.
25.	Dash, S. Thulasiram, R. Thulasiraman, P. An enhanced chaos-based firefly model for Parkinson's disease diagnosis and classification. International Conference on Information Technology (ICIT) IEEE; 2018
26.	Tomar, D. Prasad, BR. Agarwal, S. An efficient Parkinson disease diagnosis system based on least squares twin support vector machine and particle swarm optimization. International Conference on Industrial and Information Systems (ICIIS) IEEE; 2015
27.	Karunanithi, D. Rodrigues, P. A fuzzy rule-based diagnosis of Parkinson's disease. International Conference on ISMAC in Computational Vision and Bio-Engineering Springer; 2019
28.	Lee, SH. Feature selection based on the center of gravity of BSWFMs using NEWFM. Engineering Applications of Artificial Intelligence 2015 45:482–7.
29.	Hsieh, Y. Su, M. Wang, P. A PSO-based rule extractor for medical diagnosis. J Biomed Inform 2014 49:53–60.
30.	Little, M. McSharry, P. Roberts, S. Costello, D. Moroz, I. Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection. Biomed Eng Online 2007 6:23.
31.	Ahouz, F. Golabpour, A. A novel diagnostic rule for Parkinson's disease based on a hybrid extraction method. Journal of Knowledge & Health in Basic Medical Sciences 2020 15(2):42–50.
32.	Setnes, M. Roubos, H. GA-fuzzy modeling and classification: Complexity and performance. IEEE Transactions on Fuzzy Systems 2000 8(5):509–22.
33.	Mansourypoor, F. Asadi, S. Development of a reinforcement learning-based evolutionary fuzzy rule-based system for diabetes diagnosis. Comput Biol Med 2017 91:337–52.
34.	Wang, YY. Li, J. Feature-selection ability of the decision-tree algorithm and the impact of feature-selection/extraction on decision-tree results based on hyperspectral data. International Journal of Remote Sensing 2008 29(10):2993–3010.

Frontiers in Health Informatics
