Machine Learning Based Methods for Handling Imbalanced Data in Hepatitis Diagnosis

Authors: Azam Orooji (Assistant Professor, North Khorasan University of Medical Sciences, Bojnurd; firstname.lastname@example.org), Farzaneh Kermani
Journal: Frontiers in Health Informatics (Iranian Association of Medical Informatics), ISSN 2676-7104
Issue: Vol. 10, No. 1, 2021-01-23, p. 259
DOI: 10.30699/fhi.v10i1.259
Language: EN
Record dates: 2020-11-23, 2020-12-16

Introduction: Hepatitis C virus (HCV) is the leading cause of mortality from liver disease, and diagnostic systems are useful tools for better disease control and management. The aim of this study was to design an HCV prediction system and to classify disease severity using data mining methods.

Method: This applied study used the hepatitis C dataset from the UCI Machine Learning Repository. It was conducted in four steps: data preprocessing, data mining, evaluation, and system design. In preprocessing, data balancing techniques were applied. Three data mining algorithms (Multi-Layer Perceptron, Bayesian network, and decision tree) were then implemented and evaluated with 10-fold cross-validation. Finally, a user interface based on the best algorithm was designed in MATLAB (version 2016).

Results: Over-sampling improved the performance measures of the data mining algorithms in disease prediction. On the over-sampled dataset (O-dataset), the best method (random forest) reached 99.9% sensitivity, accuracy, and F-measure, and 100% specificity.

Conclusion: Because the presented approach outperformed all methods proposed in previous studies, the proposed system can be used for diagnosing HCV and determining its severity.
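The balancing and evaluation steps named in the Method can be illustrated with a minimal sketch. It is not the authors' implementation (the study reports a MATLAB system); it assumes plain random over-sampling, a binary outcome, and unstratified folds, and the function names random_oversample, kfold_indices, and binary_metrics are hypothetical helpers introduced only for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many samples as the largest class (assumed balancing scheme)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, accuracy and F-measure from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Toy data only: four negative samples and one positive sample.
    rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
    labels = [0, 0, 0, 0, 1]
    balanced_rows, balanced_labels = random_oversample(rows, labels)
    print(Counter(balanced_labels))
    print(binary_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

One design choice worth noting: in a sketch like this, random_oversample would normally be called on the training indices of each fold only, so that duplicated samples never appear in both the training and the test portion of the same fold.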