28.02.2014 Views

Exploring the Use of Speech Input by Blind People on ... - Washington


4.2.2 Reviewing and Editing Text

In this section, we describe the reviewing and editing techniques that participants used when composing paragraphs. After entering several sentences, most participants reviewed their text by making VoiceOver read it word by word. If a word did not sound correct, they read it character by character. To do this, participants used the VoiceOver rotor [3], which enables users to read text one unit at a time. A unit can be a line, word, or character. To move the cursor from one unit to the next, a user swipes down on the screen. The user can also swipe up to move to the previous unit. In this way, participants moved the VoiceOver cursor around the text in the text area to verify their input. When reviewing, participants often read words several times, then iterated through their characters.
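
The unit-by-unit review described above can be sketched as a small model. This is only an illustration under stated assumptions: `RotorCursor`, `split_units`, `swipe_down`, and `swipe_up` are hypothetical names chosen for this sketch, not part of VoiceOver or any iOS API, and the model ignores speech output entirely.

```python
def split_units(text, unit):
    """Split text into review units: 'line', 'word', or 'character'."""
    if unit == "line":
        return text.splitlines()
    if unit == "word":
        return text.split()
    if unit == "character":
        return list(text)
    raise ValueError(f"unknown unit: {unit}")


class RotorCursor:
    """Toy model of unit-by-unit review (hypothetical, not VoiceOver's API)."""

    def __init__(self, text, unit="word"):
        self.units = split_units(text, unit)
        if not self.units:
            raise ValueError("nothing to review")
        self.index = -1  # cursor starts before the first unit

    def swipe_down(self):
        """Move to the next unit and return it; stay on the last unit at the end."""
        self.index = min(self.index + 1, len(self.units) - 1)
        return self.units[self.index]

    def swipe_up(self):
        """Move to the previous unit and return it; stay on the first unit at the start."""
        self.index = max(self.index - 1, 0)
        return self.units[self.index]


# Review word by word, then drop to characters for a word that sounds wrong.
words = RotorCursor("lost my site", unit="word")
heard = [words.swipe_down() for _ in range(3)]
letters = RotorCursor(heard[-1], unit="character")
spelled = [letters.swipe_down() for _ in range(len(heard[-1]))]
```

In this sketch, switching from the word unit to the character unit corresponds to what participants did when a word did not sound correct: the same cursor movement, applied at a finer granularity.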

Some ASR errors were difficult to detect using the VoiceOver text-to-speech because they sounded similar to the user’s original speech. For example, P1 dictated the phrase “lost my sight,” but the speech recognizer output the phrase, “lost my site.” P1 did not detect this error. Another participant, P6, said, “you can hike,” but the speech recognizer output, “you can’t hike.” P6 did not detect the error either. Some grammatical errors were also not detected in the review process. VoiceOver’s text-to-speech does not differentiate upper- and lower-case letters, so there were some uncorrected case errors. The composed paragraphs also included extra spaces between words and sentences that participants did not seem to be aware of.

We observed that VoiceOver did not alert participants to passages that were recognized with low confidence. Occasionally, the iOS speech recognizer underlined a word or phrase that was recognized with low confidence, and would present the user with alternative recognition options if the phrase was touched. This information was not accessible. Also, we observed that VoiceOver communicated punctuation marks in the text in a subtle manner, by varying prosody and pausing. This made it difficult for participants to detect punctuation marks in the recognized text when reviewing it.

The review process took longer than we expected, but participants spent much more time editing the ASR output. Figure 6 shows the number of edits participants performed for each paragraph input with speech. The total number of edits performed during speech input tasks was 96. Edits included inserting, deleting, and replacing characters, words, or strings of words. Only 15 (15.6%) of these edits were done using speech input. Although inputting speech was much faster than using the keyboard, participants preferred using the keyboard to edit the ASR output. Several participants inserted words or phrases with speech but deleted erroneous text with the keyboard’s BACKSPACE key. Also, there was no way to move the cursor using speech commands, so participants preferred to continue using touch to make character- or word-level edits.
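
As a quick check of the proportion reported above, the split of edits by modality can be worked out directly. The sketch assumes that every one of the 96 edits was made with either speech or the keyboard (the two modalities shown in Figure 6); the keyboard count is derived under that assumption rather than stated in the text, and the function name `edit_shares` is illustrative.

```python
def edit_shares(total_edits, speech_edits):
    """Return (keyboard_edits, speech_pct, keyboard_pct), assuming two modalities only."""
    keyboard_edits = total_edits - speech_edits
    speech_pct = 100 * speech_edits / total_edits
    keyboard_pct = 100 * keyboard_edits / total_edits
    return keyboard_edits, speech_pct, keyboard_pct


# Totals reported in the text: 96 edits overall, 15 of them made with speech.
keyboard_edits, speech_pct, keyboard_pct = edit_shares(96, 15)
print(f"speech: {speech_pct:.1f}%  keyboard: {keyboard_edits} edits ({keyboard_pct:.1f}%)")
```

Under that assumption, the remaining 81 edits, roughly 84% of the total, were made with the keyboard.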

We observed three editing techniques for speech input in our study. We describe them as follows.

Hone in, delete, and reenter. The first technique was the most common and was used by all participants except P5 and P6, who did not know how to use VoiceOver gestures. This technique involved (1) moving the cursor to a desired position using the VoiceOver gestures described previously, (2) using the BACKSPACE key to delete unwanted characters, words, or even phrases, and finally (3) entering characters using the keyboard.

Hone in, select, and reenter. The second editing technique seems more efficient than the first, but was only used by P8. It involved (1) using VoiceOver gestures to move the cursor to a desired position, (2) choosing the EDIT option from the VoiceOver rotor, (3) selecting the word (one of the EDIT options, along with COPY and PASTE), and (4) re-entering the word with the keyboard. P8 sometimes re-entered the word with speech input. This technique required fewer key presses than “hone in, delete, and reenter.”
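
The difference in key presses between the two techniques can be illustrated with a rough count. This is a simplified sketch, not a measurement from the study: it assumes one key press per typed character, one BACKSPACE press per deleted character, that typing over a selected word replaces it, and it leaves cursor-movement gestures and rotor selections uncounted. Both function names are hypothetical.

```python
def presses_delete_and_reenter(wrong, right):
    """Hone in, delete, and reenter: BACKSPACE each wrong character, then type the fix."""
    return len(wrong) + len(right)


def presses_select_and_reenter(right):
    """Hone in, select, and reenter: typing replaces the selected word (assumed)."""
    return len(right)


# Replacing the misrecognized word "site" with "sight".
delete_count = presses_delete_and_reenter("site", "sight")
select_count = presses_select_and_reenter("sight")
print(delete_count, select_count)
```

Under these assumptions, the saving equals the length of the erroneous text, so it grows when a whole phrase rather than a single word must be replaced.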

Delete and start over. P5 and P6 did not know how to reposition the cursor, so they used this technique. They deleted entered text with the DELETE key, starting from the end (where the cursor was located by default) and re-entered with speech. They both said that they often used speech for short messages, and deleted everything and started over if there was a recognition error. This was the least flexible and, most likely, the least efficient editing technique for longer messages.

Figure 6. Number of edits performed with each modality for each composed paragraph (i.e., task). [Bar chart; x-axis: Participant.task (1.1, 2.1, 2.2, 3.1, 3.2, 4.1, 5.1, 6.1, 7.1, 7.2, 8.1); y-axis: Number of Edits (0 to 30); series: Keyboard edits, Speech edits.]

4.2.3 Qualitative User Feedback

Two of the 8 (25%) participants in our study preferred keyboard input over speech input. These were P1 and P5, who were the only two participants who used speech weekly, rather than daily. All participants mentioned speed as the primary benefit of using speech for input. P1 and P5 preferred the keyboard because of the challenge of editing: P5 said she knew there were mistakes and didn’t know how to fix them. P1 said that “although [with] speech you can get more volume of text in there…but my weakness is efficiently editing. that’s the downside of speech.”

P3 and P8 said it was easy for them to express their thoughts verbally, having had prior experience with dictation systems, unlike P1 and P2, who found it more difficult. P8 also said that speech input helped him avoid spelling mistakes; auto-correct was not sufficient for correcting spelling mistakes.

I rely on dictation more because my spelling is not the greatest and feel like can compose more of coherent sentence verbally than by writing, especially on an iOS device where it’s difficult to correct mistakes. On the computer, you can see with spell check and correct. but on iOS device it’s easier not to have to worry about it.

While most preferred speech, all participants found certain aspects of speech input frustrating. All participants except P3 cited editing as a source of frustration. P8 would like to be able to do “inline” editing as he spoke. P7 wanted an easier way to edit with speech rather than using the keyboard, which she did for most edits. P7 and P3 mentioned the problem of dictating words that were out-of-vocabulary, such as names. Figure 7 shows Likert scale responses to three statements from the interviews at
