Towards a Better Diagnosis of Prostate Cancer: Application of Machine Learning Algorithms

Soheila Saeedi, Keivan Maghooli, Shahrzad Amirazodi, Sorayya Rezayi



Introduction: Prostate cancer is one of the leading causes of death in men, and early detection can be a significant factor in controlling and managing the disease. Data mining techniques can extract hidden knowledge from large volumes of data and help physicians diagnose the disease. This study aims to identify the algorithm that performs best in diagnosing prostate cancer.

Methods: Nine data mining techniques (Support Vector Machine, Decision Tree, Naive Bayes, K-Nearest Neighbors, Neural Network, Random Forest, Deep Learning, Auto-MLP, and Rule Induction) were used to extract hidden patterns from prostate cancer data. The dataset comprised 100 patients described by eight characteristics, and modeling was carried out in the RapidMiner Studio environment. To compare the techniques, accuracy, recall, precision, AUC, sensitivity, and specificity were calculated and reported for each one.
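As an illustration of the threshold-based evaluation measures named above, the following minimal Python sketch derives accuracy, precision, recall, sensitivity, and specificity from a binary confusion matrix. It is not the study's RapidMiner workflow: the labels in `y_true` and `y_pred` are invented toy values, and the helper names (`ConfusionCounts`, `confusion_counts`, `evaluation_metrics`) are hypothetical. Note that recall and sensitivity are the same quantity for the positive class, so they always coincide here.

```python
from dataclasses import dataclass
from math import nan


@dataclass
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # cancer cases predicted as cancer
    fp: int  # non-cancer cases predicted as cancer
    tn: int  # non-cancer cases predicted as non-cancer
    fn: int  # cancer cases predicted as non-cancer


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for one positive class label."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else nan


def evaluation_metrics(c):
    """Derive the reported measures from the confusion-matrix cells."""
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "recall": recall,
        "sensitivity": recall,  # identical to recall by definition
        "specificity": _ratio(c.tn, c.tn + c.fp),
    }


# Toy labels, invented for illustration only (1 = cancer, 0 = no cancer).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

counts = confusion_counts(y_true, y_pred)
results = evaluation_metrics(counts)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

AUC is omitted from the sketch because it requires ranked prediction scores rather than hard class labels.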

Results: The accuracy of the applied algorithms ranged from 77% to 84%. Across the evaluation criteria, K-Nearest Neighbors and Neural Network performed better than the other methods, with an accuracy of 84%. Sensitivity was 85% for K-Nearest Neighbors and 80% for Neural Network.

Conclusion: Applying different data mining techniques can reveal hidden patterns in large volumes of prostate cancer data, which in turn can support early diagnosis of the disease and reduce subsequent costs.


Keywords: Prostate Cancer; Data Mining; Machine Learning; Diagnosis; Neural Network; Deep Learning




