1. Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran. 2. Medical Informatics Research Center, Institute for Future Studies in Health, Kerman University of Medical Sciences, Kerman, Iran. 3. Department of Biostatistics and Epidemiology, School of Public Health, Kerman University of Medical Sciences, Kerman, Iran.
Speech recognition (SR) technology has existed for more than two decades, but it has rarely been used in health care institutions and has not been applied uniformly across clinical domains. The aim of this study was to investigate the accuracy of a speech recognition system in four different situations in a real health service environment. We also report physicians' experience of using speech recognition technology.
For this study, NEVISA SR software (professional v.3) was installed on the computers of expert physicians. The physicians dictated a pre-designated medical report in four different modes: slow expression in a silent environment, slow expression in a crowded environment, rapid expression in a silent environment, and rapid expression in a busy environment. After 15 physicians in hospitals had used the speech recognition software, a purpose-designed questionnaire was distributed among them.
The results showed that the highest average accuracy of the speech recognition software was obtained with slow expression in the silent environment and the lowest with rapid expression in the busy environment. Of all the participants in the study, 53.3% of the physicians believed that the use of the speech recognition system improved the workflow.
We found that software accuracy was generally higher than expected, although its use requires upgrading the system and the way it is operated. To achieve the highest recognition rate and reduce errors in speech recognition, influential factors such as environmental noise, the type of software and hardware, and the training and experience of participants should also be considered.
Received: 2019 August 24; Revision Received: 2019 August 27; Accepted: 2019 August 31
Speech recognition technology has existed for more than two decades, but it has rarely been used in health care institutions and has not been applied uniformly across clinical domains. Successful use of this technology has been reported in radiology [1, 2]. These systems are designed to turn a speaker's speech into text and eliminate the need for human transcription and dictation [1, 3-5]. Early systems had multiple limitations such as limited vocabulary banks, time consumption, low speed, poor user-friendliness and, most importantly, reliance on separate (discrete) speech. These restrictions were initially considered a major barrier to adopting this technology in health care [3, 4]. Speech recognition systems fall into two main categories, separate speech and continuous speech, and the initial systems were of the separate speech type. Continuous speech systems were introduced in 1994 and became very common in hospitals and health care settings [3].
The main goal of health care organizations is to increase the quality and accuracy of documentation and to reduce documentation time and cost [6]. As the technology has developed and matured, its implementation has reduced documentation time and cost compared with traditional dictation transcription [6, 7]. Speech recognition systems have the potential to increase the quality of document creation without negatively affecting users’ time [8]. In addition, the accuracy of SR systems has been reported to exceed 90% in several studies [4, 9, 10]. A study by Ramaswamy et al., which described the advantages and disadvantages of using an SR system in MR imaging reporting, found that the average report turnaround time decreased and the mean word accuracy was 92.7% [4]. Despite the benefits of the system, some studies have reported high rates of error created by SR software [7, 10] and an increased burden of editing [4].
McGurk et al. studied the impact of voice recognition (VR) software on the error rate of radiology reports and found that the error rate increased when VR software was used [11]. Finding a way to store and document patient information has been one of the concerns of health care systems over the years. Although databases exist for storing and transferring data, there is no easy way to capture and enter data into these systems. Furthermore, physicians are reluctant to use new methods of data entry. The aim of this study was to investigate the accuracy of a speech recognition system in four different situations in a real health service environment. We also report physicians' experience of using speech recognition technology.
To check the accuracy of the speech recognition system and the effect of noise on its performance, the system was examined in the educational hospitals affiliated to Kerman University of Medical Sciences. For this study, NEVISA SR software (professional v.3) was installed on the computers of expert physicians. Fifteen physicians from different specialties, including pediatricians, urologists, cardiologists, internists, orthopedists and oncologists, were selected to participate in this project. Selection was based on interest in and perceived need for this technology. Each participant received 30-60 min of training to work with the software, and it took about 10 min to adjust the sound profile for each person. The physicians dictated the pre-designated medical report in 4 different modes: slow expression in a silent environment, slow expression in a crowded environment, rapid expression in a silent environment, and rapid expression in a busy environment. Finally, the accuracy of the system was evaluated based on the recognition rate of the words in the reported text. To measure the accuracy, the percentage of comprehensible, unambiguous sentences was considered. If a sentence was transcribed exactly as expected, it was given a score of 10. All of the produced reports were reviewed independently by two researchers, one of whom was a physician and a specialist in medical informatics. The results were analyzed with descriptive and analytical statistics, the independent-samples t-test and analysis of variance, using SPSS (ver. 16).
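The text does not give the exact formula behind the word recognition rate, so the following is only a minimal sketch of one common definition: word accuracy as 100 × (reference words − word-level edit errors) / reference words. The function name `word_accuracy` and the sample sentences are hypothetical and are not taken from the study or from the NEVISA software.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word-level accuracy in percent, based on the Levenshtein distance over words.

    Assumption: accuracy = 100 * (N - errors) / N, where N is the number of
    reference words and errors counts substitutions, insertions and deletions.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference text must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr

    errors = prev[-1]
    # Many insertions can push errors above N, so clamp at zero.
    return max(0.0, 100.0 * (len(ref) - errors) / len(ref))


if __name__ == "__main__":
    # Hypothetical dictation with one misrecognized word out of five.
    print(word_accuracy("the patient has chest pain",
                        "the patient has chess pain"))
```

A sentence-level score such as the 10-point scale mentioned above would need a separate rubric; this sketch covers only the word-level rate.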
After the 15 physicians had used the speech recognition software in hospitals, a purpose-designed questionnaire was distributed among them. It examined the physicians’ attitudes and experience regarding the impact of the software on reducing documentation time, improving the quality of patient care, improving the quality of documentation and workflow, and reducing errors and cost, as well as the ease of use of the software and their preferred data entry method. To increase the response rate, the questionnaire was limited to 8 questions. Content validity of the questionnaire was confirmed by 3 specialists in Medical Informatics and its reliability was measured by Cronbach's alpha (0.9).
The results showed that the highest average accuracy of the speech recognition software was obtained with slow expression in the silent environment and the lowest with rapid expression in the busy environment (Table 1).
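As an illustration of the reliability measure mentioned above, the sketch below computes Cronbach's alpha with the standard formula α = k/(k−1) × (1 − Σ item variances / variance of total scores). The function name and the response matrix are hypothetical; the study's questionnaire data are not reproduced here, and SPSS rather than this code was the tool reported by the authors.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    n_respondents = len(responses)
    k = len(responses[0])
    if n_respondents < 2 or k < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer the same number of items")

    # Sample variance of each item (column) and of the per-respondent totals.
    item_variances = [variance(column) for column in zip(*responses)]
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical answers of 5 respondents to 4 Likert-type items.
    demo = [
        [4, 5, 4, 5],
        [3, 3, 4, 3],
        [5, 5, 5, 4],
        [2, 2, 3, 2],
        [4, 4, 4, 5],
    ]
    print(round(cronbach_alpha(demo), 3))
```

Values close to 1 indicate that the items vary together, which is how a reported alpha of 0.9 is usually read.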
Word recognition accuracy by condition | Number | Mean (%) | SD | Standard error |
---|---|---|---|---|
Slow expression in crowded environments | 30 | 76.45 | 16.78 | 3.06 |
Slow expression in silent environment | 30 | 82.08 | 12.32 | 2.25 |
Rapid expression in a busy environment | 30 | 59.81 | 16.32 | 2.98 |
Rapid expression in a silent environment | 30 | 69.63 | 19.71 | 3.59 |
TOTAL | 120 | 71.99 | 18.29 | 1.67 |
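The last column of Table 1 appears to be the standard error of the mean, SE = SD / √N. The short check below, written as a sketch with values copied from the table, recomputes that column; the dictionary keys and function names are labels chosen here for illustration.

```python
import math

# (N, SD, value reported in the last column) copied from Table 1.
TABLE_1 = {
    "slow, crowded": (30, 16.78, 3.06),
    "slow, silent": (30, 12.32, 2.25),
    "rapid, busy": (30, 16.32, 2.98),
    "rapid, silent": (30, 19.71, 3.59),
    "total": (120, 18.29, 1.67),
}


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean for a sample of size n with standard deviation sd."""
    return sd / math.sqrt(n)


def max_discrepancy(table) -> float:
    """Largest absolute gap between the recomputed and the reported value."""
    return max(abs(standard_error(sd, n) - reported)
               for n, sd, reported in table.values())


if __name__ == "__main__":
    for label, (n, sd, reported) in TABLE_1.items():
        print(f"{label:14s} recomputed={standard_error(sd, n):.3f} reported={reported}")
    print(f"largest gap: {max_discrepancy(TABLE_1):.4f}")
```

Under this reading, every recomputed value agrees with the table to within 0.01, which is consistent with rounding of the published figures.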
The highest average concept-matching accuracy was obtained with slow expression in the silent environment and the lowest with rapid expression in the busy environment (Table 2).
Concept-matching accuracy by condition | Number | Mean (%) | SD | Standard error |
---|---|---|---|---|
Slow expression in crowded environments | 30 | 71.33 | 21.16 | 3.86 |
Slow expression in silent environment | 30 | 77 | 17.79 | 3.24 |
Rapid expression in a busy environment | 30 | 50.83 | 17.52 | 3.19 |
Rapid expression in a silent environment | 30 | 59.16 | 22.51 | 4.11 |
TOTAL | 120 | 64.58 | 22.13 | 2.02 |
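Because each condition has the same number of observations (30), the TOTAL mean in both tables should equal the plain average of the four condition means. The sketch below checks this arithmetic and also computes the gap between the best and worst condition; the variable and function names are chosen here for illustration only.

```python
from statistics import mean

# Condition means (%) copied from Tables 1 and 2, in the order:
# slow/crowded, slow/silent, rapid/busy, rapid/silent.
WORD_MEANS = [76.45, 82.08, 59.81, 69.63]
CONCEPT_MEANS = [71.33, 77.0, 50.83, 59.16]


def grand_mean(condition_means):
    """Overall mean when all conditions have equal sample sizes."""
    return mean(condition_means)


def best_worst_gap(condition_means):
    """Difference in percentage points between the best and worst condition."""
    return max(condition_means) - min(condition_means)


if __name__ == "__main__":
    print(f"word accuracy: total={grand_mean(WORD_MEANS):.2f}, "
          f"gap={best_worst_gap(WORD_MEANS):.2f}")
    print(f"concept matching: total={grand_mean(CONCEPT_MEANS):.2f}, "
          f"gap={best_worst_gap(CONCEPT_MEANS):.2f}")
```

The recomputed totals match the TOTAL rows (71.99 and 64.58), and the gap between slow dictation in silence and rapid dictation in a busy room is about 22 points for word accuracy and about 26 points for concept matching.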
Questionnaires were completed by all the participants. The analysis of the questionnaire showed that 33.3% of the participants said that the use of speech recognition technology improved the quality of care and 33.3% said that the use of the speech recognition system in the real health environment reduced the time of documentation. Also, 40% of the physicians believed that this technology would increase the quality of documentation. Of all the participants in the study, 53.3% believed that the use of the speech recognition system improved the workflow. Concerning documentation errors, 20% of the physicians stated that using this technology increased them. Concerning cost savings, 26.7% of the physicians believed that this technology would reduce costs in health care. Among the 15 physicians who responded to the questionnaire, 46.7% believed that working with the technology was very easy. Asked about their preferred data entry method, 60% of the respondents preferred the speech recognition system, 33.3% preferred the keyboard and 6.7% preferred traditional manual data entry. The mean accuracy of the speech recognition system was 71.99% (SD=18.29) and the mean concept-matching accuracy was 64.58% (SD=22.13).
Our aim in this study was to investigate the accuracy of a speech recognition system used by physicians in the real environment of health care services. The word recognition rate was used as the measure of system accuracy [4, 12]. In summary, software accuracy was 82.08% with slow expression in a quiet environment and 59.81% with rapid expression in a busy environment. The overall accuracy of the software was 71.99%. These results are consistent with Alapetite’s study, in which background noise reduced the accuracy of the software [13]. Several studies have examined the accuracy of speech recognition software [4, 14-17]. One paper reported that the accuracy of the system was 84.5% and found no relationship between the accuracy and the productivity of the speech recognition system [18].
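With only 15 respondents, each percentage above corresponds to a small whole number of physicians. The sketch below converts the reported percentages back to head counts; the dictionary labels are paraphrases chosen here, and the counts are derived by rounding rather than taken from the source.

```python
N_PHYSICIANS = 15

# Percentages as reported in the questionnaire results.
REPORTED = {
    "improved quality of care": 33.3,
    "reduced documentation time": 33.3,
    "increased documentation quality": 40.0,
    "improved workflow": 53.3,
    "increased documentation errors": 20.0,
    "would reduce costs": 26.7,
    "very easy to use": 46.7,
    "prefers speech recognition": 60.0,
    "prefers keyboard": 33.3,
    "prefers manual entry": 6.7,
}


def to_count(percent: float, n: int = N_PHYSICIANS) -> int:
    """Nearest whole number of respondents for a reported percentage."""
    return round(percent / 100 * n)


COUNTS = {label: to_count(pct) for label, pct in REPORTED.items()}

if __name__ == "__main__":
    for label, count in COUNTS.items():
        print(f"{label:32s} {count:2d} of {N_PHYSICIANS}")
```

Read this way, 53.3% is 8 of 15 physicians, and the three data entry preferences (9, 5 and 1) add up to all 15 respondents.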
This difference in the accuracy of the software can be attributed to many factors. The accuracy of the speech recognition system depends on the user's experience with the system, type of software and hardware, environment for reading the text [15], type of microphone, participants, type of training and recognition [13].
Previous studies showed no significant difference between male and female groups [15, 19]. The concept-matching accuracy was 77% with slow expression in the silent environment and 50.83% with rapid expression in the busy environment. Therefore, based on the results of this study, factors such as environmental noise and the speaker's rate of expression can affect the performance and accuracy of the system. The accuracy of the entire system increases as concept-matching accuracy improves [20]. Regarding physicians' experience, 53.3% stated that the speech recognition system could enhance the workflow. Singh's results [21] were consistent with ours: 33.3% of the physicians in this study indicated that the speech recognition system would enhance the quality of care. In Derman's work, by contrast, the participants reported that the speech recognition system did not significantly improve the clinical and administrative workflow or the quality of care [8]. Various studies have reported different results for the benefits of SR technology in the clinical workflow [3, 14, 22-24]. Of all the participants in this study, 33.3% stated that the use of the speech recognition system reduced the time of documentation. In Lyons’ study, 51% of the participants reported that, in their experience, the speech recognition system saved documentation time [25]. Time saved on documentation benefits patient care because it gives care providers more time to care for patients. In this study, the cost of installing and implementing the speech recognition system was also considered. Few physicians (26.7%) believed that the speech recognition system would reduce costs. A few studies [4, 9, 23] have reported that speech recognition systems are cost-effective.
According to Ramaswamy et al., the cost of system installation may vary depending on the size of the practice and the available infrastructure, and it should be considered before the system is installed and implemented [4].
In this study, 20% of the physicians reported that speech recognition technology increased documentation errors. Despite the benefits of this system, some studies have reported high rates of error created by SR software [7, 10] and an increased burden of editing [4]. McGurk et al. studied the impact of voice recognition (VR) software on the error rate of radiology reports and found that the error rate increased when VR software was used [11]. If this problem is left unaddressed, the use of this software in health care can have irreversible consequences. According to 40% of the participants in this study, the use of the speech recognition system increased the quality of documentation. The experience of the participants in Lyons' study was negative in terms of the quality of medical documents and the time taken for their creation by the speech recognition system [25]. Further, 46.7% of the participants reported that working with the speech recognition system was easy. Most of them (60%) preferred the speech recognition system as an input device, while only 6.7% of the physicians preferred the traditional manual entry method. According to the results of this study, speech recognition can be regarded as a promising technology in health care. If the limitations of this technology are removed and it is promoted as an input mechanism in the health care industry, it can be very helpful and meet the needs of this area. This work was the first study to examine the accuracy of speech recognition software in various health care settings in Iran. Our main goal was to examine the accuracy of the speech recognition system when used by physicians in the real environment of health care. Other studies should consider the use of speech recognition systems in different fields.
We found that software accuracy was generally higher than expected, although its use requires upgrading the system and the way it is operated. To achieve the highest recognition rate and reduce errors in speech recognition, influential factors such as environmental noise, the type of software and hardware, and the training and experience of participants should also be considered. The physicians' experience of using the speech recognition system was positive, which could be the basis for further efforts to improve the system and introduce it in health care. More research is needed to examine the impact of this technology on various health care areas. In the future, speech recognition technology as an alternative to typing text can have many benefits for hospitals and health care providers.
1. Hodgson, T. Coiera, E. Risks and benefits of speech recognition for clinical documentation: A systematic review. J Am Med Inform Assoc 2016 23(e1):e169–79.
2. Kang, HP. Sirintrapun, SJ. Nestler, RJ. Parwani, AV. Experience with voice recognition in surgical pathology at a large academic multi-institutional center. Am J Clin Pathol 2010 133(1):156–9.
3. Parente, R. Kock, N. Sonsini, J. An analysis of the implementation and impact of speech-recognition technology in the healthcare sector. Perspect Health Inf Manag 2004 1:5–28.
4. Ramaswamy, MR. Chaljub, G. Esch, O. Fanning, DD. vanSonnenberg, E. Continuous speech recognition in MR imaging reporting: advantages, disadvantages, and impact. AJR Am J Roentgenol 2000 174(3):617–22.
5. Du, TJ. Hattingh, R. Pitcher, R. The accuracy of radiology speech recognition reports in a multilingual South African teaching hospital. BMC Med Imaging 2015 15:8–13.
6. Clarke, MA. King, JL. Kim, MS. Toward successful implementation of speech recognition technology: A survey of SRT utilization issues in healthcare settings. South Med J 2015 108(7):445–51.
7. Basma, S. Lord, B. Jacks, LM. Rizk, M. Scaranelo, AM. Error rates in breast imaging reports: comparison of automatic speech recognition and dictation transcription. AJR Am J Roentgenol 2011 197(4):923–7.
8. Derman, YD. Arenovich, T. Strauss, J. Speech recognition software and electronic psychiatric progress notes: physicians' ratings and preferences. BMC Med Inform Decis Mak 2010 10:44–51.
9. Zick, RG. Olsen, J. Voice recognition software versus a traditional transcription service for physician charting in the ED. Am J Emerg Med 2001 19(4):295–8.
10. Pezzullo, JA. Tung, GA. Rogg, JM. Davis, LM. Brody, JM. Mayo-Smith, WW. Voice recognition dictation: radiologist as transcriptionist. J Digit Imaging 2008 21(4):384–9.
11. McGurk, S. Brauer, K. Macfarlane, T. Duncan, K. The effect of voice recognition software on comparative error rates in radiology reports. Br J Radiol 2008 81(970):767–70.
12. Smith, NT. Brien, RA. Pettus, DC. Jones, BR. Quinn, ML. Sarnat, A. Recognition accuracy with a voice-recognition system designed for anesthesia record keeping. Journal of Clinical Monitoring and Computing 1990 6(4):299–306.
13. Alapetite, A. Impact of noise and other factors on speech recognition in anaesthesia. Int J Med Inform 2008 77(1):68–77.
14. Issenman, RM. Jaffer, IH. Use of voice recognition software in an outpatient pediatric specialty practice. Pediatrics 2004 114(3):e290–3.
15. Kanal, KM. Hangiandreou, NJ. Sykes, A. Eklund, HE. Araoz, PA. Leon, JA. Initial evaluation of a continuous speech recognition program for radiology. J Digit Imaging 2001 14(1):30–7.
16. Al-Aynati, MM. Chorneyko, KA. Comparison of voice-automated transcription and human transcription in generating pathology reports. Arch Pathol Lab Med 2003 127(6):721–5.
17. Vorbeck, F. Ba-Ssalamah, A. Kettenbach, J. Huebsch, P. Report generation using digital speech recognition in radiology. Eur Radiol 2000 10(12):1976–82.
18. Mohr, DN. Turner, DW. Pond, GR. Kamath, JS. De, VCB. Carpenter, PC. Speech recognition as a transcription aid: A randomized comparison with standard transcription. J Am Med Inform Assoc 2003 10(1):85–93.
19. Zemmel, NJ. Park, SM. Schweitzer, J. O'Keefe, JS. Laughon, MM. Edlich, RF. Status of voicetype dictation for windows for the emergency physician. J Emerg Med 1996 14(4):511–5.
20. Detmer, WM. Shiffman, S. Wyatt, JC. Friedman, CP. Lane, CD. Fagan, LM. A continuous-speech interface to a decision support system: II An evaluation using a Wizard-of-Oz experimental paradigm. J Am Med Inform Assoc 1995 2(1):46–57.
21. Singh, M. Pal, TR. Voice recognition technology implementation in surgical pathology: advantages and limitations. Arch Pathol Lab Med 2011 135(11):1476–81.
22. Ben, B. Whiting, SO. Speech recognition 2008. Healthcare Quarterly (Toronto, Ont) 2008 11(4):99–101.
23. Callaway, EC. Sweet, CF. Siegel, E. Reiser, JM. Beall, DP. Speech recognition interface to a hospital information system using a self-designed visual basic program: Initial experience. Journal of Digital Imaging 2002 15(1):43–53.
24. Koivikko, MP. Kauppinen, T. Ahovuo, J. Improvement of report workflow and productivity using speech recognition: A follow-up study. J Digit Imaging 2008 21(4):378–82.
25. Lyons, JP. Sanders, SA. Fredrick, CD. Palmer, C. Mihalik, VL. Weigel, T. Speech recognition acceptance by physicians: A temporal replication of a survey of expectations and experiences. Health Informatics J 2016 22(3):768–78.