Machine Learning Models for Diagnostic Classification of Hepatitis C Tests

Oladosu Oyebisi Oladimeji, Abimbola Oladimeji, Oladimeji Olayanju



Introduction: Hepatitis C is a chronic infection caused by the hepatitis C virus, a blood-borne virus, so infection occurs through exposure to even small quantities of blood. The World Health Organization (WHO) estimates that it affects 71 million people worldwide. The infection imposes a heavy cost on individuals, groups, and governments because no vaccine is available yet. The disease is likely to keep affecting more people because its long asymptomatic phase hinders early detection.

Material and Methods: In this study, we present machine learning models that automatically classify hepatitis C diagnostic test results. We also rank the test features to determine how much each contributes to the classification, which supports decision making in the health care industry. The synthetic minority oversampling technique (SMOTE) was used to address the class imbalance in the dataset.
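The oversampling step can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic point is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours. This is not the authors' implementation; the function name `smote`, the parameter `k`, and the toy feature values are assumptions chosen for illustration only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority samples.

    minority: list of feature vectors (lists of floats), at least two samples.
    k: number of nearest minority neighbours considered (hypothetical default).
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority-class samples (illustrative values, not study data)
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote(minority, n_new=6, k=3)
```

Because every synthetic point lies between two existing minority samples, the new points stay inside the region already covered by the minority class rather than duplicating records exactly.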

Results: The models were evaluated using the Matthews correlation coefficient, F-measure, Precision-Recall curve, and Receiver Operating Characteristic Area Under Curve. Applying SMOTE improved the performance of the predictive models. Random forest (RF) performed best, with a Matthews correlation coefficient of 0.99, an F-measure of 0.99, a Precision-Recall curve value of 1.00, and a Receiver Operating Characteristic Area Under Curve of 0.99.
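For readers unfamiliar with two of these metrics, the following sketch computes the Matthews correlation coefficient and the F-measure from binary confusion-matrix counts using their standard definitions. The helper names and the toy labels are assumptions for illustration; the values are not taken from the study.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn


def matthews_cc(tp, fp, fn, tn):
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f_measure(tp, fp, fn):
    """F-measure (F1) = 2*tp / (2*tp + fp + fn)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy labels (illustrative only): 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
mcc = matthews_cc(tp, fp, fn, tn)
f1 = f_measure(tp, fp, fn)
```

Unlike the F-measure, the Matthews correlation coefficient uses all four confusion-matrix cells, including true negatives, which is why it is often preferred when classes are imbalanced.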

Conclusion: This finding has the potential to influence clinical practice when health workers aim to classify diagnostic results for the disease at an early stage.




