HamaraJournals

Performance Analysis of Data Mining Techniques for the Prediction Breast Cancer Risk on Big Data

Solmaz Sohrabei, Alireza Atashi



Introduction: Early detection makes breast cancer the most curable among cancer types, and early detection combined with accurate examination extends patients' survival. Risk factors have an important effect on breast cancer. Data mining techniques have a growing reputation in the medical field because of their high predictive capability and useful classification. These methods can help practitioners develop tools for detecting breast cancer at an early stage.

Material and Methods: The database used in this paper was provided by the Motamed Cancer Institute, ACECR, Tehran, Iran. It contains 7834 records of clinical and risk factor data from breast cancer patients. There were 4008 patients (52.4%) with breast cancer (malignant), and the remaining 3617 patients (47.6%) were without breast cancer (benign). Support vector machine, multi-layer perceptron, decision tree, K-nearest neighbor, random forest, and naïve Bayesian models were developed using 20 fields (risk factors) of the database, because the features available in the database were limited. The models were evaluated using 10-fold cross-validation. Finally, the models were compared based on sensitivity, specificity, and accuracy.
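As an illustration of the evaluation described above, the following is a minimal sketch of how sensitivity, specificity, and accuracy can be computed from predictions, and how records can be divided into 10 folds. It is not the authors' implementation. The function names (`confusion_counts`, `evaluate`, `k_fold_indices`) and the label coding (1 for malignant, 0 for benign) are assumptions made for this example, and the folds here are contiguous blocks, whereas in practice the records would normally be shuffled or stratified first.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn), assuming 1 = malignant and 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute the three indicators used to compare the models."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return {
        # Share of malignant cases that were predicted malignant.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of benign cases that were predicted benign.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # Share of all cases that were predicted correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
    }


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k (train, test) index pairs.

    Each record appears in exactly one test fold; fold sizes differ by at most one.
    """
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds
```

In a full evaluation, each model would be trained on the `train` indices of a fold and scored with `evaluate` on the `test` indices, and the indicators would then be averaged over the 10 folds.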

Results: The naïve Bayesian and artificial neural network models performed best in predicting breast cancer risk. The naïve Bayesian model had an accuracy of 93%, specificity of 93.32%, sensitivity of 95056%, and ROC of 0.95; the artificial neural network had an accuracy of 93.23%, specificity of 91.98%, sensitivity of 92.69%, and ROC of 0.8.

Conclusion: Notably, the different artificial intelligence algorithms used in this study yielded similar accuracy; these techniques could therefore be used as alternative predictive tools in breast cancer risk studies. The significant prognostic factors affecting the risk rate of breast cancer identified in this study, which were validated by risk, are useful and could be translated into decision support tools in the clinical domain.



DOI: http://dx.doi.org/10.30699/fhi.v10i1.296