RESNA 26th International Annual Conference
In a longitudinal study of new users of speech recognition, 7 of 8 participants were no longer using their speech recognition system 6 months after system delivery. Reasons included variable and poor performance with speech recognition, a resulting preference for non-speech input methods, and difficulties with technical implementation.
Automatic speech recognition (ASR) systems have the potential to enhance the comfort and productivity of computer users who have disabilities. However, while speech is a natural medium for communication between people, speaking successfully to a computer is not natural. Few data exist on how user skill with ASR develops or on how to facilitate that development. Performance data for users without disabilities suggest that text entry rate roughly doubles after about 20 hours of use, from 14 wpm at initial use to 25 to 30 wpm with experience [1,2,3]. Unfortunately, these data represent only a handful of subjects without disabilities, and we have found no performance measurements for users who have physical disabilities.
The general goals of this three-year project are to understand how well ASR systems are meeting the needs of people with disabilities and to improve user performance with ASR systems. The specific goals of this study are to:
New users of speech recognition were enrolled in the study when their ASR system was delivered. Eight people participated. All but one received their ASR system following assessment by a certified ATP. Seven subjects used Dragon NaturallySpeaking, and one used Dragon Dictate. One subject had tried ASR before this study. Seven had physical disabilities that affected their ability to use a standard keyboard and mouse, and one had difficulty reading and writing traditional orthography. All subjects were able to use a non-speech method to access their computer, ranging from a mouthstick to ten-finger typing.
Sessions took place in the subject's home or office, on the subject's own ASR system. For subjects who worked with a clinician, ASR training was provided by that clinician and was not controlled in any way by this study. Text entry rate and recognition accuracy with ASR were measured from initial delivery of the system across a minimum of three sessions over four to six weeks. (The exception is subject BN1, who stopped using ASR before a second data point could be obtained.)
Text entry rate and recognition accuracy using ASR were measured with the QuickMAP procedure, developed for this study. Briefly, the subject transcribes a short paragraph using ASR in two phases: a dictation phase followed by a correction phase. The main purpose of the dictation phase is to measure the recognition accuracy the client achieves with the ASR system. The correction phase allows measurement of the client's true text entry rate, once the time required to correct recognition errors is taken into account.
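The two QuickMAP measures can be illustrated with a short Python sketch. The exact QuickMAP scoring rules are not given here, so the following are assumptions: recognition accuracy is taken as one minus the word error rate of the dictation-phase output against the target paragraph (a word-level Levenshtein distance), and the corrected text entry rate is taken as the number of words in the final corrected text divided by the combined dictation and correction time in minutes. The function names are hypothetical, and words are compared exactly as typed, with no case or punctuation normalization.

```python
def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions and insertions
    needed to turn `reference` into `hypothesis` (Levenshtein over words).
    Words are compared exactly; no case or punctuation normalization (assumption)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] is the distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word deleted
                curr[j - 1] + 1,     # extra word inserted
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def recognition_accuracy(target: str, recognized: str) -> float:
    """Dictation-phase accuracy, assumed here to be 1 - word error rate."""
    n = len(target.split())
    if n == 0:
        raise ValueError("target text is empty")
    # Insertions can push the error count above n, so clamp at zero.
    return max(0.0, 1.0 - word_edit_distance(target, recognized) / n)


def corrected_text_entry_rate(final_text: str, dictation_s: float, correction_s: float) -> float:
    """Words per minute over dictation plus correction time (assumed definition)."""
    minutes = (dictation_s + correction_s) / 60.0
    if minutes <= 0:
        raise ValueError("total time must be positive")
    return len(final_text.split()) / minutes


if __name__ == "__main__":
    target = "the quick brown fox jumps over the lazy dog"
    dictated = "the quick brown box jumps over lazy dog"  # one substitution, one deletion
    print(round(recognition_accuracy(target, dictated), 3))  # 0.778
    print(corrected_text_entry_rate(target, 30, 60))         # 6.0
```

Because correction time is included in the denominator, a user whose dictation contains many recognition errors can appear fast during dictation yet still have a low corrected rate, which is the distinction the correction phase is meant to capture.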
Subjects' ASR usage 6 months after delivery was determined through direct contact with subjects and their clinicians. In addition to anecdotal information, five subjects completed a specific follow-up survey.
Table 1 shows the major results for each subject. Recognition accuracy and text entry rate with ASR are shown at initial measurement and after 4 to 6 weeks. The general pattern is one of inconsistent and variable performance. Of the 7 subjects with a second measurement, only 2 showed improved recognition accuracy after 4 to 6 weeks, and only 3 showed improved text entry rate.
Perhaps most striking is the result that only 1 of the 8 participants, JN1, was still using her ASR system on follow-up at 6 months. She was the only participant to attain recognition accuracy above 90% both initially and at 6 weeks. Table 1 lists the primary reason why the other 7 participants stopped using their systems, based on follow-up survey responses as well as conversations with the participants and their clinicians. For 4 of the 7, the main reason was that their non-speech input method, which was keyboard use with mouthstick (N=2) or fingers (N=2), worked better for them. Specific reasons cited were that the non-speech method provided faster and more consistent performance, was easier to set up, and worked on any computer. These were the four users who had the slowest text entry rates with ASR at 4 to 6 weeks, averaging only 6.5 words per minute. For 2 of the 7 abandoners, technical difficulties in getting ASR to work reliably on their hardware and with their software applications provided enough of a barrier that they decided to stop using ASR altogether. One of these individuals, BN1, had excellent initial performance with ASR, but her system crashed frequently. She was the only participant who did not have an experienced clinician helping her with her installation. The remaining abandoner, MN1, stopped using ASR due to personal family issues that had nothing to do with her speech recognition system.
Relatively poor and variable performance, in terms of recognition accuracy and text entry rate, appears to underlie the high rate of abandonment seen in these subjects. When users did not experience rapid improvement and short-term benefits from their ASR system, they were likely to conclude that their "tried-and-true" non-speech method was a better solution. It is possible that the variable performance these participants showed in the first few weeks would have improved to a satisfactory level over a longer period. Indeed, the original intent of the 6-month follow-up was to measure long-term speed and accuracy; the widespread abandonment came to light only when we tried to schedule these sessions.
These findings contrast sharply with those of Schwartz and Johnson (1999), who found that 25% of 28 participants stopped using their ASR system, and DeRosier (2002), who reported that 10% of 10 users had abandoned their system [5,6]. Neither study measured user performance, so it is hard to say whether those users' greater satisfaction reflected better speed and accuracy. Both groups may also have received more comprehensive ASR training than the participants in this study, which may have contributed to their greater success.
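Because all three samples are small, one way to put this comparison in perspective is to attach an interval to each abandonment proportion. The sketch below is not an analysis reported by the authors: it substitutes a standard Wilson score interval (95%, z = 1.96), and it assumes the earlier studies' percentages correspond to 7 of 28 and 1 of 10 users, as 25% of 28 and 10% of 10 imply. The function and variable names are illustrative.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return center - half, center + half


# Abandonment counts: this study, plus counts back-calculated (an assumption)
# from the percentages reported for the two earlier studies.
studies = {
    "This study (7 of 8)": (7, 8),
    "Schwartz and Johnson, 1999 (7 of 28)": (7, 28),
    "DeRosier, 2002 (1 of 10)": (1, 10),
}

if __name__ == "__main__":
    for label, (k, n) in studies.items():
        lo, hi = wilson_interval(k, n)
        print(f"{label}: {k / n:.1%} abandoned, 95% interval {lo:.1%} to {hi:.1%}")
```

Under these assumptions, the interval for this study spans roughly 53% to 98%, which does not overlap the intervals for the other two studies (upper bounds near 43% and 40%). The width of that interval also shows how much weight each participant carries when n = 8 (12.5 percentage points apiece), which is one reason these results call for cautious interpretation.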
The limited success with ASR observed in this study should be interpreted cautiously. Certainly there are clients for whom ASR is an excellent solution. However, it is not a panacea, as these results show. Further research is needed to learn more about how to better facilitate positive outcomes with speech recognition.