Exploring the Use of Speech Input by Blind People on ... - Washington
Exploring the Use of Speech Input by Blind People on ... - Washington
Exploring the Use of Speech Input by Blind People on ... - Washington
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
4.2.2 Reviewing and Editing Text<br />
In this secti<strong>on</strong>, we describe <str<strong>on</strong>g>the</str<strong>on</strong>g> reviewing and editing techniques<br />
that participants used when composing paragraphs. After entering<br />
several sentences, most participants reviewed <str<strong>on</strong>g>the</str<strong>on</strong>g>ir text <str<strong>on</strong>g>by</str<strong>on</strong>g><br />
making VoiceOver read it word <str<strong>on</strong>g>by</str<strong>on</strong>g> word. If a word did not sound<br />
correct, <str<strong>on</strong>g>the</str<strong>on</strong>g>y read it character <str<strong>on</strong>g>by</str<strong>on</strong>g> character. To do this,<br />
participants used <str<strong>on</strong>g>the</str<strong>on</strong>g> VoiceOver rotor [3], which enables users to<br />
read text <strong>on</strong>e unit at a time. A unit can be a line, word, or<br />
character. To move <str<strong>on</strong>g>the</str<strong>on</strong>g> cursor from <strong>on</strong>e unit to <str<strong>on</strong>g>the</str<strong>on</strong>g> next, a user<br />
swipes down <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> screen. The user can also swipe up to move<br />
to <str<strong>on</strong>g>the</str<strong>on</strong>g> previous unit. In this way, participants moved <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
VoiceOver cursor around <str<strong>on</strong>g>the</str<strong>on</strong>g> text in <str<strong>on</strong>g>the</str<strong>on</strong>g> text area to verify <str<strong>on</strong>g>the</str<strong>on</strong>g>ir<br />
input. When reviewing, participants <str<strong>on</strong>g>of</str<strong>on</strong>g>ten read words several<br />
times, <str<strong>on</strong>g>the</str<strong>on</strong>g>n iterated through <str<strong>on</strong>g>the</str<strong>on</strong>g>ir characters.<br />
Some ASR errors were difficult to detect using <str<strong>on</strong>g>the</str<strong>on</strong>g> VoiceOver<br />
text-to-speech because <str<strong>on</strong>g>the</str<strong>on</strong>g>y sounded similar to <str<strong>on</strong>g>the</str<strong>on</strong>g> user’s original<br />
speech. For example, P1 dictated <str<strong>on</strong>g>the</str<strong>on</strong>g> phrase “lost my sight,” but<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> speech recognizer output <str<strong>on</strong>g>the</str<strong>on</strong>g> phrase, “lost my site.” P1 did not<br />
detect this error. Ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r participant, P6, said, “you can hike,”<br />
but <str<strong>on</strong>g>the</str<strong>on</strong>g> speech recognizer output, “you can’t hike.” P6 did not<br />
detect <str<strong>on</strong>g>the</str<strong>on</strong>g> error ei<str<strong>on</strong>g>the</str<strong>on</strong>g>r. Some grammatical errors were also not<br />
detected in <str<strong>on</strong>g>the</str<strong>on</strong>g> review process. VoiceOver’s text-to-speech does<br />
not differentiate upper- and lower-case letters, so <str<strong>on</strong>g>the</str<strong>on</strong>g>re were some<br />
uncorrected case errors. The composed paragraphs also included<br />
extra spaces between words and sentences that participants did<br />
not seem to be aware <str<strong>on</strong>g>of</str<strong>on</strong>g>.<br />
We observed that VoiceOver did not alert participants to passages<br />
that were recognized with low-c<strong>on</strong>fidence. Occasi<strong>on</strong>ally, <str<strong>on</strong>g>the</str<strong>on</strong>g> iOS<br />
speech recognizer underlined a word or phrase that was<br />
recognized with low c<strong>on</strong>fidence, and would present <str<strong>on</strong>g>the</str<strong>on</strong>g> user with<br />
alternative recogniti<strong>on</strong> opti<strong>on</strong>s if <str<strong>on</strong>g>the</str<strong>on</strong>g> phrase was touched. This<br />
informati<strong>on</strong> was not accessible. Also, we observed that<br />
VoiceOver communicated punctuati<strong>on</strong> marks in <str<strong>on</strong>g>the</str<strong>on</strong>g> text in a<br />
subtle manner, <str<strong>on</strong>g>by</str<strong>on</strong>g> varying prosody and pausing. This made it<br />
difficult for participants to detect punctuati<strong>on</strong> marks in <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
recognized text when reviewing it.<br />
The review process took l<strong>on</strong>ger than we expected, but participants<br />
spent much more time editing <str<strong>on</strong>g>the</str<strong>on</strong>g> ASR output. Figure 6 shows <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
number <str<strong>on</strong>g>of</str<strong>on</strong>g> edits participants performed for each paragraph input<br />
with speech. The total number <str<strong>on</strong>g>of</str<strong>on</strong>g> edits performed during speech<br />
input tasks was 96. Edits included inserting, deleting, and<br />
replacing characters, words, or strings <str<strong>on</strong>g>of</str<strong>on</strong>g> words. Only 15 (15.6%)<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>se edits were d<strong>on</strong>e using speech input. Although inputting<br />
speech was much faster than using <str<strong>on</strong>g>the</str<strong>on</strong>g> keyboard, participants<br />
preferred using <str<strong>on</strong>g>the</str<strong>on</strong>g> keyboard to edit <str<strong>on</strong>g>the</str<strong>on</strong>g> ASR output. Several<br />
participants inserted words or phrases with speech but deleted<br />
err<strong>on</strong>eous text with <str<strong>on</strong>g>the</str<strong>on</strong>g> keyboard’s BACKSPACE key. Also, <str<strong>on</strong>g>the</str<strong>on</strong>g>re<br />
was no way to move <str<strong>on</strong>g>the</str<strong>on</strong>g> cursor using speech commands, so<br />
participants preferred to c<strong>on</strong>tinue using touch to make characteror<br />
word-level edits.<br />
We observed three editing techniques for speech input in our<br />
study. We describe <str<strong>on</strong>g>the</str<strong>on</strong>g>m as follows.<br />
H<strong>on</strong>e in, delete, and reenter. The first technique was <str<strong>on</strong>g>the</str<strong>on</strong>g> most<br />
comm<strong>on</strong> and was used <str<strong>on</strong>g>by</str<strong>on</strong>g> all participants except P5 and P6 who<br />
did not know how to use VoiceOver gestures. This technique<br />
involved (1) moving <str<strong>on</strong>g>the</str<strong>on</strong>g> cursor to a desired positi<strong>on</strong> using <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
VoiceOver gestures describe previously, (2) using <str<strong>on</strong>g>the</str<strong>on</strong>g> BACKSPACE<br />
key to delete unwanted characters, words, or even phrases, and<br />
finally (3) to enter characters using <str<strong>on</strong>g>the</str<strong>on</strong>g> keyboard.<br />
H<strong>on</strong>e in, select, and reenter. The sec<strong>on</strong>d editing technique<br />
seems more efficient than <str<strong>on</strong>g>the</str<strong>on</strong>g> first, but was <strong>on</strong>ly used <str<strong>on</strong>g>by</str<strong>on</strong>g> P8. It<br />
involved (1) using VoiceOver gestures to move <str<strong>on</strong>g>the</str<strong>on</strong>g> cursor to a<br />
desired positi<strong>on</strong>, (2) choosing <str<strong>on</strong>g>the</str<strong>on</strong>g> EDIT opti<strong>on</strong> from <str<strong>on</strong>g>the</str<strong>on</strong>g> VoiceOver<br />
rotor, (3) selecting <str<strong>on</strong>g>the</str<strong>on</strong>g> word (<strong>on</strong>e <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> EDIT opti<strong>on</strong>s, al<strong>on</strong>g with<br />
COPY and PASTE), and (4) re-enter <str<strong>on</strong>g>the</str<strong>on</strong>g> word with <str<strong>on</strong>g>the</str<strong>on</strong>g> keyboard. P8<br />
sometimes re-entered <str<strong>on</strong>g>the</str<strong>on</strong>g> word with speech input. This technique<br />
required fewer key presses than “h<strong>on</strong>e in, delete, and reenter.”<br />
Delete and start over. P5 and P6 did not know how to repositi<strong>on</strong><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> cursor, so <str<strong>on</strong>g>the</str<strong>on</strong>g>y used this technique. They deleted entered text<br />
with <str<strong>on</strong>g>the</str<strong>on</strong>g> DELETE key, starting from <str<strong>on</strong>g>the</str<strong>on</strong>g> end (where <str<strong>on</strong>g>the</str<strong>on</strong>g> cursor was<br />
located <str<strong>on</strong>g>by</str<strong>on</strong>g> default) and re-entered with speech. They both said<br />
that <str<strong>on</strong>g>the</str<strong>on</strong>g>y <str<strong>on</strong>g>of</str<strong>on</strong>g>ten used speech for short messages, and deleted<br />
everything and started over if <str<strong>on</strong>g>the</str<strong>on</strong>g>re was a recogniti<strong>on</strong> error. This<br />
was <str<strong>on</strong>g>the</str<strong>on</strong>g> least flexible and, most likely, <str<strong>on</strong>g>the</str<strong>on</strong>g> least efficient editing<br />
technique for l<strong>on</strong>ger messages.<br />
Number <str<strong>on</strong>g>of</str<strong>on</strong>g> Edits<br />
0 5 10 15 20 25 30<br />
Keyboard edits<br />
<str<strong>on</strong>g>Speech</str<strong>on</strong>g> edits<br />
1.1 2.1 2.2 3.1 3.2 4.1 5.1 6.1 7.1 7.2 8.1<br />
Participant.task<br />
Figure 6. Number <str<strong>on</strong>g>of</str<strong>on</strong>g> edits performed with each<br />
modality for each composed paragraph (i.e., task).<br />
4.2.3 Qualitative <str<strong>on</strong>g>Use</str<strong>on</strong>g>r Feedback<br />
Two <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> 8 (25%) participants in our study preferred keyboard<br />
input over speech input. These were P1 and P5, who were <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
<strong>on</strong>ly two participants who used speech weekly, ra<str<strong>on</strong>g>the</str<strong>on</strong>g>r than daily.<br />
All participants menti<strong>on</strong>ed speed as <str<strong>on</strong>g>the</str<strong>on</strong>g> primary benefit <str<strong>on</strong>g>of</str<strong>on</strong>g> using<br />
speech for input. P1 and P5 preferred <str<strong>on</strong>g>the</str<strong>on</strong>g> keyboard because <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
challenge <str<strong>on</strong>g>of</str<strong>on</strong>g> editing: P5 said she knew <str<strong>on</strong>g>the</str<strong>on</strong>g>re were mistakes and<br />
didn’t know how to fix <str<strong>on</strong>g>the</str<strong>on</strong>g>m. P1 said that “although [with] speech<br />
you can get more volume <str<strong>on</strong>g>of</str<strong>on</strong>g> text in <str<strong>on</strong>g>the</str<strong>on</strong>g>re…but my weakness is<br />
efficiently editing. that’s <str<strong>on</strong>g>the</str<strong>on</strong>g> downside <str<strong>on</strong>g>of</str<strong>on</strong>g> speech.”<br />
P3 and P8 said it was easy for <str<strong>on</strong>g>the</str<strong>on</strong>g>m to express <str<strong>on</strong>g>the</str<strong>on</strong>g>ir thoughts<br />
verbally, having had prior experience with dictati<strong>on</strong> systems,<br />
unlike P1 and P2 who found it more difficult. P8 also said that<br />
speech input helped him avoid spelling mistakes—auto-correct<br />
was not sufficient for correcting spelling mistakes.<br />
I rely <strong>on</strong> dictati<strong>on</strong> more because my spelling is not <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
greatest and feel like can compose more <str<strong>on</strong>g>of</str<strong>on</strong>g> coherent<br />
sentence verbally than <str<strong>on</strong>g>by</str<strong>on</strong>g> writing, especially <strong>on</strong> an<br />
iOS device where it’s difficult to correct mistakes. On<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> computer, you can see with spell check and<br />
correct. but <strong>on</strong> iOS device it’s easier not to have to<br />
worry about it.<br />
While most preferred speech, all participants found certain<br />
aspects <str<strong>on</strong>g>of</str<strong>on</strong>g> speech input frustrating. All participants except P3<br />
cited editing as a source <str<strong>on</strong>g>of</str<strong>on</strong>g> frustrati<strong>on</strong>. P8 would like to be able to<br />
do “inline” editing as he spoke. P7 wanted an easier way to edit<br />
with speech ra<str<strong>on</strong>g>the</str<strong>on</strong>g>r than using <str<strong>on</strong>g>the</str<strong>on</strong>g> keyboard, which she did for<br />
most edits. P7 and P3 menti<strong>on</strong>ed <str<strong>on</strong>g>the</str<strong>on</strong>g> problem <str<strong>on</strong>g>of</str<strong>on</strong>g> dictating words<br />
that were out-<str<strong>on</strong>g>of</str<strong>on</strong>g>-vocabulary, such as names. Figure 7 shows<br />
Likert scale resp<strong>on</strong>ses to three statements from <str<strong>on</strong>g>the</str<strong>on</strong>g> interviews at