Exploring the Use of Speech Input by Blind People on ... - Washington
the end of the study sessions, showing varying levels of
satisfaction and frustration. Figure 7 also shows that all
participants felt inputting text with speech was much faster than
with the keyboard.
[Figure 7: three panels plotting response frequency against the 7-point response scale.
Panel 1: "Speech is fast compared to the keyboard." Mean = 1.6 (SD = 0.9).
Panel 2: "Speech is frustrating compared to the keyboard." Mean = 4.8 (SD = 2.1).
Panel 3: "I'm satisfied with speech compared to the keyboard." Mean = 3.5 (SD = 1.9).]
Figure 7. Responses to three statements on a
7-point Likert scale (1 is strongly agree, 7 is
strongly disagree).
5. Discussion
Our study showed that speech input is an efficient entry method
for blind people compared to the on-screen keyboard, yet it is
impeded by the time required to review and edit ASR output.
People can speak intelligibly at a rate of about 150 WPM [31],
but the average entry rate of blind people using speech in our
study was just 19.5 WPM. Nonetheless, this was comparable to
the entry rate of sighted people using the on-screen keyboard of a
smartphone, as found in prior work [7]. Furthermore, we found
that the error rate of speech input was no higher than that of
keyboard input for participants in our study. It is important to
note, however, that we measured accuracy only in terms of the
WER, which does not necessarily correlate with the intelligibility
of text [19]. The WER penalizes equally for small and major
errors in a word, but it is the standard measure for evaluating
ASR accuracy.
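To make this limitation concrete, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, insertions, and deletions) between the recognizer output and a reference text, divided by the number of reference words. The function name `word_error_rate` and the sample sentences are illustrative assumptions, not the scoring code or the materials used in the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example sentences (not taken from the study).
reference = "please call me tomorrow morning"
minor_error = "please call me tomorow morning"  # one-letter slip
major_error = "please call me banana morning"   # a different word entirely

wer_minor = word_error_rate(reference, minor_error)
wer_major = word_error_rate(reference, major_error)
```

Both hypotheses receive the same score, one substituted word out of five, although the first remains easy to understand and the second changes the meaning of the sentence.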
Six of the eight participants (75%) preferred speech over the
keyboard because of speed, but all participants faced challenges
when using speech input. Editing was the primary challenge;
participants spent the majority (80%) of their time editing the text
output by the recognizer. Surprisingly, their most common editing
technique was highly inefficient in terms of keystrokes.
Participants deleted characters with BACKSPACE and then reentered
them with the keyboard. It was unclear why they did not
select whole words to replace them, or use speech for editing
more than the keyboard. Perhaps some participants did not know
how to select whole words with VoiceOver. They may have
preferred to edit text with the keyboard because it was more
predictable, preventing additional errors.
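The keystroke inefficiency can be illustrated with a rough cost model. This is a hypothetical sketch, not a measurement from the study: it assumes the cursor starts at the end of the text and the user backspaces to the first wrong character before retyping (a worst case; the study does not report cursor positions), and it assumes that selecting a whole word costs a single action (`select_cost`, an invented parameter). All function names and the sample sentence are illustrative.

```python
def common_prefix_length(a: str, b: str) -> int:
    """Number of leading characters that two strings share."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def backspace_and_retype_cost(recognized: str, intended: str) -> int:
    """Keystrokes to backspace from the end to the first error, then retype.

    Assumption: the cursor starts at the end of the recognized text.
    """
    keep = common_prefix_length(recognized, intended)
    deletions = len(recognized) - keep
    insertions = len(intended) - keep
    return deletions + insertions


def select_and_replace_cost(recognized: str, intended: str, select_cost: int = 1) -> int:
    """Actions to select each wrong word and type its replacement.

    Assumption: the texts differ only by word substitutions, and selecting
    one word costs `select_cost` actions.
    """
    recognized_words = recognized.split()
    intended_words = intended.split()
    if len(recognized_words) != len(intended_words):
        raise ValueError("this simple model only handles word substitutions")
    return sum(
        select_cost + len(want)
        for got, want in zip(recognized_words, intended_words)
        if got != want
    )


# Hypothetical recognition error: "meat" instead of "meet".
recognized = "I will meat you at noon today"
intended = "I will meet you at noon today"

retype = backspace_and_retype_cost(recognized, intended)
replace = select_and_replace_cost(recognized, intended)
```

Under these assumptions, a single wrong letter early in a sentence costs far more keystrokes to fix by backspacing than by replacing the affected word. A user who first moves the cursor next to the error would narrow the gap, so the numbers indicate the direction of the effect rather than its size.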
Study responses were less positive than our survey responses.
This was probably because, in our study, we asked
participants to enter paragraphs that were longer and more formal
than many smartphone communications. For example, a text
message input by a survey participant was probably less than four
sentences long and not as formal as an email that one would write
to a potential employer (referring to the guidelines we gave
participants in the study). Speech is currently better suited for
short, casual messages, probably because of the difficulty of
identifying and correcting errors. We believe the research
community should facilitate the process of correcting content and
grammar to make speech input more versatile.
Our study also uncovered interesting keyboard input behavior
with VoiceOver. Since this was not our focus, we did not
document challenges with keyboard input rigorously, but
observed several interesting trends. Surprisingly, some
participants did not use the auto-correct feature, which could
have improved their speed and accuracy. They found it difficult
to monitor and dismiss auto-correct suggestions. We also
observed that VoiceOver did not communicate punctuation
clearly, and some minor grammatical issues, such as extra spaces
between words, were only noticeable when reviewing text
character by character. VoiceOver had a setting in which it speaks
punctuation marks, but not one participant used this setting.
VoiceOver also did not communicate misspelled words that were
visually identified with an underline. Enabling participants to
more easily identify punctuation and grammar and spelling issues
would likely improve efficiency and composition quality for both
keyboard and speech input.
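As one illustration of what such support might look like, the sketch below flags runs of extra spaces and spells out punctuation marks so that both could be announced without a character-by-character review. It is a hypothetical example only: the function names, the small `PUNCTUATION_NAMES` table, and the sample text are assumptions, and the code does not reflect how VoiceOver works.

```python
import re

# Assumed spoken names for a handful of punctuation marks.
PUNCTUATION_NAMES = {
    ",": "comma",
    ".": "period",
    "?": "question mark",
    "!": "exclamation mark",
    ";": "semicolon",
    ":": "colon",
}


def find_extra_spaces(text: str) -> list[tuple[int, int]]:
    """Return (start index, run length) for every run of two or more spaces."""
    return [(m.start(), len(m.group())) for m in re.finditer(r" {2,}", text)]


def verbalize_punctuation(text: str) -> str:
    """Replace each known punctuation mark with its spoken name."""
    pieces = []
    for ch in text:
        if ch in PUNCTUATION_NAMES:
            pieces.append(f" {PUNCTUATION_NAMES[ch]} ")
        else:
            pieces.append(ch)
    # Collapse the whitespace introduced around the inserted names.
    return " ".join("".join(pieces).split())


# Hypothetical draft with two runs of extra spaces.
draft = "Dear  Ms. Lee, thank   you"
extra_spaces = find_extra_spaces(draft)
spoken = verbalize_punctuation("Hi, Sam.")
```

A screen reader could use positions such as those in `extra_spaces` to jump the cursor directly to each issue, which also relates to the cursor positioning challenge discussed later in the paper.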
Throughout the paper, we have compared speech input with the
de facto standard accessible input method for touchscreens: on-screen
keyboard input with VoiceOver. However, there are input
alternatives that are commonly used by both blind and sighted
people that should be considered when evaluating speech. Several
study participants used a small external keyboard with hard keys;
one participant used the keyboard on his Braille display, which
connected to his iPhone; one participant used the on-screen input
method Fleksy [9], one of many gesture-based text entry methods
(see Related Work for others). These alternatives are more private
than speech, and probably more reliable in noisy environments. It
would be interesting to compare these methods to speech in the
future.
6. Challenges for Future Research<br />
We distill our findings into a set of challenges for researchers
interested in nonvisual text entry. These challenges can be
incorporated into both speech and gesture-based input methods.
1. Text selection – a better method for nonvisual selection
of text. This can also include other edit operations, such
as cut, copy, and paste.
2. Cursor positioning – an easier way to move a cursor
around a text area; enable a user to easily hone in on
errors.