Abandonment of Speech Recognition by New Users

Heidi Horstmann Koester
Rehabilitation Engineering Research Center on Ergonomics
University of Michigan

Abstract

In a longitudinal study of new users of speech recognition, 7 of 8 participants were no longer using their speech recognition system 6 months after system delivery. Reasons include poor and variable performance with speech recognition, a resulting preference for non-speech input methods, and difficulties with technical implementation.

Background

Automatic speech recognition (ASR) systems have the potential to enhance the comfort and productivity of computer users who have disabilities. However, while use of speech for human communication is natural, speaking successfully to a computer is not. There are very few data on how user skill develops with ASR and how to facilitate skill development. Performance data for users without disabilities suggest that text entry rate roughly doubles after about 20 hours of use, from 14 wpm at initial use to 25-30 wpm with experience [1,2,3]. Unfortunately, these data represent only a handful of subjects, and we have found no performance measurements for users who have physical disabilities.

Research Questions

The general goals of this three-year project are to understand how well ASR systems are meeting the needs of people with disabilities and to improve user performance with ASR systems. The specific goals of this study are to:

Measure the performance and satisfaction of new ASR users;
Determine how performance and satisfaction change over time;
Understand the factors that lead to success with ASR for new users.

Methods

Subjects.

New users of speech recognition were enrolled in the study when their ASR system was delivered. Eight people participated. All but one received their ASR system following assessment by a certified ATP. Seven subjects used Dragon NaturallySpeaking, and one used Dragon Dictate. One subject had tried ASR prior to this study. Seven have physical disabilities that affect their ability to use the standard keyboard and mouse, and one has difficulty writing and reading using traditional orthography. All subjects were able to use a non-speech method to access their computer, ranging from a mouthstick to ten-finger typing.

Procedure.

Sessions occurred in the subject's home or office, on their own ASR system. For subjects who worked with a clinician, ASR training was provided by that clinician and was not controlled in any way by this study. Text entry rate and recognition accuracy with ASR were measured from the time of initial delivery of the system across a minimum of three sessions over four to six weeks. (An exception is subject BN1, who stopped using ASR before a second data point could be obtained.)

Data Collection.

Text entry rate and recognition accuracy using ASR were measured using the QuickMAP procedure, developed for this study [4]. Briefly, the procedure consists of transcribing a short paragraph using ASR, in two phases: a dictation phase, followed by a correction phase. The main purpose of the dictation phase is to measure the recognition accuracy the client is achieving with the ASR system. The correction phase allows measurement of the client's true text entry rate, when the time required to correct recognition errors is taken into account.
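The two measures can be illustrated with a minimal sketch. It is not the QuickMAP definition from [4]: it assumes that recognition accuracy is the percentage of passage words recognized correctly during dictation, and that text entry rate is the passage length divided by the combined dictation and correction time. The `Trial` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One transcription trial (hypothetical structure, not from the study)."""
    words_in_passage: int      # length of the paragraph being transcribed
    words_misrecognized: int   # recognition errors left after the dictation phase
    dictation_seconds: float   # time spent dictating
    correction_seconds: float  # time spent correcting recognition errors


def recognition_accuracy(trial: Trial) -> float:
    """Percent of passage words recognized correctly (assumed definition)."""
    correct = trial.words_in_passage - trial.words_misrecognized
    return 100.0 * correct / trial.words_in_passage


def text_entry_rate(trial: Trial) -> float:
    """Words per minute, counting correction time as well (assumed definition)."""
    total_minutes = (trial.dictation_seconds + trial.correction_seconds) / 60.0
    return trial.words_in_passage / total_minutes


# Made-up example: a 60-word passage with 6 errors, 90 s dictating, 150 s correcting.
example = Trial(words_in_passage=60, words_misrecognized=6,
                dictation_seconds=90.0, correction_seconds=150.0)
accuracy = recognition_accuracy(example)
rate = text_entry_rate(example)
```

Under these assumptions, the made-up trial gives 90% accuracy and 15 wpm. Dictation time alone would give 40 wpm, which shows how much error correction can reduce the effective rate.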

Follow-up.

Subjects' ASR usage 6 months after delivery was determined through direct contact with subjects and their clinicians. In addition to anecdotal information, five subjects completed a specific follow-up survey.

Results

Table 1 shows the major results for each subject. Recognition accuracy and text entry rate with ASR are shown at initial measurement and after 4 to 6 weeks. The general pattern is one of inconsistent and variable performance. Of the 7 subjects with a second measurement, only 2 showed improvement in recognition accuracy at 4 to 6 weeks, and only 3 showed improvement in text entry rate.

**Table 1. ASR performance and follow-up results for eight new ASR users.**
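| Subject | Recog. accuracy, initial (%) | Recog. accuracy, 4-6 weeks (%) | Text entry rate, initial (wpm) | Text entry rate, 4-6 weeks (wpm) | Quit ASR? | Reason for quitting |
|---|---|---|---|---|---|---|
| BN1 | 95 | -- | 39.0 | -- | Y | Technical problems |
| MN1 | 87 | 95 | 7.1 | 17.5 | Y | Personal issues |
| MN2 | 92 | 73 | 11.6 | 6.8 | Y | Other method is better |
| SN1 | 77 | 76 | 8.3 | 11.3 | Y | Other method is better |
| SN2 | 95 | 85 | 22.3 | 6.6 | Y | Other method is better |
| WN1 | 69 | 60 | 1.9 | 1.5 | Y | Other method is better |
| ZN1 | 77 | 99 | 5.5 | 72.6 | Y | Technical problems |
| JN1 | 94 | 91 | 39.4 | 22.0 | N | Did not quit |
| Avg. | 86 | 83 | 16.9 | 19.8 | | |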
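The summary figures can be recomputed from the rows of Table 1. The sketch below transcribes the table values and assumes that "improvement" means a strictly higher value at 4 to 6 weeks than at initial measurement. The names `TABLE_1`, `column_mean`, and `improved` are illustrative only.

```python
from statistics import mean

# (subject, accuracy initial %, accuracy 4-6 wk %, rate initial wpm, rate 4-6 wk wpm)
# None marks a measurement that was not obtained (BN1 quit before a second session).
TABLE_1 = [
    ("BN1", 95, None, 39.0, None),
    ("MN1", 87, 95, 7.1, 17.5),
    ("MN2", 92, 73, 11.6, 6.8),
    ("SN1", 77, 76, 8.3, 11.3),
    ("SN2", 95, 85, 22.3, 6.6),
    ("WN1", 69, 60, 1.9, 1.5),
    ("ZN1", 77, 99, 5.5, 72.6),
    ("JN1", 94, 91, 39.4, 22.0),
]


def column_mean(index: int) -> float:
    """Mean of one column, skipping missing measurements."""
    return mean(row[index] for row in TABLE_1 if row[index] is not None)


def improved(before: int, after: int) -> list[str]:
    """Subjects whose later value is strictly higher than their initial value."""
    return [row[0] for row in TABLE_1
            if row[after] is not None and row[after] > row[before]]


accuracy_means = (column_mean(1), column_mean(2))  # initial, 4-6 weeks
rate_means = (column_mean(3), column_mean(4))      # initial, 4-6 weeks
accuracy_improvers = improved(1, 2)
rate_improvers = improved(3, 4)
```

The 4 to 6 week averages are taken over the 7 subjects who were measured twice, while the initial averages cover all 8. This is also why the improvement counts are reported out of 7.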

Perhaps most striking, only 1 of the 8 participants, JN1, was still using her ASR system at the 6-month follow-up. She was the only participant to attain recognition accuracy above 90% both initially and at 4 to 6 weeks. Table 1 lists the primary reason why each of the other 7 participants stopped using their systems, based on follow-up survey responses as well as conversations with the participants and their clinicians.

For 4 of the 7, the main reason was that their non-speech input method, keyboard use with a mouthstick (N=2) or fingers (N=2), worked better for them. Specific reasons cited were that the non-speech method provided faster and more consistent performance, was easier to set up, and worked on any computer. These were the four users with the slowest text entry rates with ASR at 4 to 6 weeks, averaging only 6.5 words per minute.

For 2 of the 7 abandoners, technical difficulties in getting ASR to work reliably on their hardware and with their software applications were enough of a barrier that they decided to stop using ASR altogether. One of these individuals, BN1, had excellent initial performance with ASR, but her system crashed frequently. She was the only participant who did not have an experienced clinician helping her with her installation. The remaining abandoner, MN1, stopped using ASR because of personal family issues that had nothing to do with her speech recognition system.
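The breakdown of abandonment reasons can be tallied in a few lines. This sketch uses only the 4 to 6 week text entry rates and the reasons listed in Table 1. The dictionary layout and variable names are illustrative assumptions.

```python
from collections import Counter

# subject -> (text entry rate at 4-6 weeks in wpm, primary reason for quitting)
# A rate of None means no second measurement; a reason of None means the subject did not quit.
FOLLOW_UP = {
    "BN1": (None, "Technical problems"),
    "MN1": (17.5, "Personal issues"),
    "MN2": (6.8, "Other method is better"),
    "SN1": (11.3, "Other method is better"),
    "SN2": (6.6, "Other method is better"),
    "WN1": (1.5, "Other method is better"),
    "ZN1": (72.6, "Technical problems"),
    "JN1": (22.0, None),
}

quitters = {s: v for s, v in FOLLOW_UP.items() if v[1] is not None}
abandonment_rate = len(quitters) / len(FOLLOW_UP)  # 7 of 8 participants

# How many participants gave each primary reason.
reason_counts = Counter(reason for _, reason in quitters.values())

# Rates of the users who preferred their non-speech method.
preferred_other = [rate for rate, reason in quitters.values()
                   if reason == "Other method is better"]
mean_rate_preferred_other = sum(preferred_other) / len(preferred_other)

# The four slowest measured rates at 4-6 weeks, for comparison.
measured = sorted(rate for rate, _ in FOLLOW_UP.values() if rate is not None)
slowest_four = measured[:4]
```

Here `sorted(preferred_other)` matches `slowest_four`, and the mean rate for that group works out to about 6.5 wpm, consistent with the figure reported above.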

Discussion

Relatively poor and variable performance, in terms of recognition accuracy and text entry rate, appears to underlie the high rate of abandonment seen in these subjects. When users did not experience rapid improvement and short-term benefits from their ASR system, they were likely to conclude that their "tried-and-true" non-speech method was a better solution. It is possible that the variable performance seen in the first few weeks would have improved to a satisfactory level over a longer time span. Indeed, the original intent of the 6-month follow-up was to measure long-term speed and accuracy in study participants, and it was only in trying to schedule these sessions that the widespread abandonment came to light.

These findings contrast sharply with those of Schwartz and Johnson (1999), who found that 25% of 28 participants had stopped using their ASR system, and DeRosier (2002), who reported that 1 of 10 users (10%) had abandoned their system [5,6]. Neither study measured user performance, so it is hard to say whether the higher satisfaction was due to better speed and accuracy enjoyed by these users. Both groups may have had more comprehensive ASR training than the participants in this study, and this may have contributed to their greater success.

The limited success with ASR observed in this study should be interpreted cautiously. Certainly there are clients for whom ASR is an excellent solution. However, it is not a panacea, as these results show. Further research is needed to learn more about how to better facilitate positive outcomes with speech recognition.

References

1. Devine, E.G., Gaehde, S.A., and Curtis, A.C. (2000). Comparative evaluation of three continuous speech recognition software packages in the generation of medical reports. Journal of the American Medical Informatics Association, 7, 462-468.
2. Karat, J., Horn, D.B., Halverson, C.A., and Karat, C. (2000). Overcoming unusability: Developing efficient strategies in speech recognition systems. Poster at CHI 2000, ACM Conference on Human Factors in Computer Systems, The Hague, Netherlands, April 1-4, 2000.
3. Karat, C., Halverson, C.A., Horn, D.B., and Karat, J. (1999). Patterns of entry and correction in large vocabulary continuous speech recognition systems. In Proceedings of the CHI '99 Conference (pp. 568-574). Boston, MA: Association for Computing Machinery.
4. Koester, H. (2002). A method for measuring client performance with speech recognition. Proceedings of the 25th RESNA Conference (pp. 115-117), Washington, DC: RESNA.
5. Schwartz, P., and Johnson, J. (1999). The effectiveness of speech recognition technology. Proceedings of RESNA 99 (pp. 77-79), Washington, DC: RESNA.
6. DeRosier, R. (2002). Speech recognition software as an assistive device: A study of user satisfaction and psychosocial impact. M.S. Thesis, Temple University.

Acknowledgments

This study was funded by U.S. Dept of Education Grant #H133E980007. Thanks to all of the participants in this study for their generous contributions of time, effort, and insights.

Heidi Horstmann Koester, Ph.D.
1205 Beal Ave.,
University of Michigan,
Ann Arbor MI 48109-2117

hhk@umich.edu