
Natural Language Processing Systems for Diagnosing and Determining Level of Lung Cancer: A Systematic Review

Mahdieh Montazeri, Ali Afraz, Raheleh Mahboob Farimani, Fahimeh Ghasemian



Introduction: Lung cancer is the second most common cancer in both men and women. Using natural language processing (NLP) to automatically extract information from text reduces the labor of manual extraction from large volumes of text and saves time. The aim of this study was to systematically review studies that examined NLP methods for diagnosing and staging lung cancer.

Material and Methods: PubMed, Scopus, Web of Science, and Embase were searched for English-language articles, published up to December 2019, that reported NLP-based methods for diagnosing and staging lung cancer. Two reviewers independently assessed original papers to determine eligibility for inclusion in the review.

Results: Of 119 studies, 7 were included. Three studies developed an NLP algorithm that scans radiology notes and determines the presence or absence of nodules, in order to identify patients with incident lung nodules for treatment or follow-up. Two studies used NLP to transform the report text, including identification of UMLS terms and detection of negated findings, in order to classify reports; one of them also used an SVM-based text classification system for staging lung cancer patients. All studies reported various performance measures, which differed according to the combination of methods used. Most studies reported the sensitivity and specificity of the NLP algorithm for identifying the presence of lung nodules.
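As a rough illustration of the kind of pipeline described in the Results, the following minimal Python sketch flags reports that mention a non-negated nodule and then computes sensitivity and specificity against reference labels. It is not taken from any of the reviewed studies: the negation-cue list, the function names `mentions_nodule` and `sensitivity_specificity`, and the sentence-splitting rule are all assumptions made for illustration.

```python
import re

# Illustrative assumptions: a tiny negation-cue list and a single target term.
# Systems in the reviewed studies rely on much richer vocabularies (e.g. UMLS terms).
NEGATION_CUES = ("no", "without", "negative for", "free of")
NODULE = re.compile(r"\bnodules?\b", re.IGNORECASE)
CUE = re.compile(r"\b(?:" + "|".join(re.escape(c) for c in NEGATION_CUES) + r")\b")


def mentions_nodule(report: str) -> bool:
    """Return True if some sentence mentions a nodule with no negation cue before it.

    Only the first nodule mention in each sentence is checked.
    """
    for sentence in re.split(r"[.;\n]+", report):
        match = NODULE.search(sentence)
        if match is None:
            continue
        prefix = sentence[: match.start()].lower()
        if CUE.search(prefix) is None:
            return True
    return False


def sensitivity_specificity(predicted, actual):
    """Compute (sensitivity, specificity) from parallel lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

A rule set this small misses negation that follows the finding (for example "nodule not seen") and will split sentences on decimal points, which is one reason a combination of methods may be needed in practice.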

Conclusion: The evaluation of studies on NLP-based diagnosing and staging of lung cancer shows that a number of studies address diagnosis, but only a few address staging. In some studies, a combination of methods was considered, and NLP in isolation was not sufficient to achieve satisfactory results. There is potential to improve these studies by adding other data sources, further refinement, and subsequent validation.


DOI: http://dx.doi.org/10.30699/fhi.v10i1.264