
Exploring the Use of Speech Input by Blind People on Mobile Devices

Shiri Azenkot
Computer Science & Engineering | DUB Group
University of Washington
Seattle, WA 98195 USA
shiri@cs.washington.edu

Nicole B. Lee
Human Centered Design & Engineering | DUB Group
University of Washington
Seattle, WA 98195 USA
nikki@nicoleblee.com

ABSTRACT

Much recent work has explored the challenge of nonvisual text entry on mobile devices. While researchers have attempted to solve this problem with gestures, we explore a different modality: speech. We conducted a survey with 169 blind and sighted participants to investigate how often, what for, and why blind people used speech for input on their mobile devices. We found that blind people used speech more often and input longer messages than sighted people. We then conducted a study with 8 blind people to observe how they used speech input on an iPod compared with the on-screen keyboard with VoiceOver. We found that speech was nearly 5 times as fast as the keyboard. While participants were mostly satisfied with speech input, editing recognition errors was frustrating. Participants spent an average of 80.3% of their time editing. Finally, we propose challenges for future work, including more efficient eyes-free editing and better error detection methods for reviewing text.

Categories and Subject Descriptors

D.3.3; H.5.2 [Information Interfaces and Presentation]: User Interfaces – Input devices and strategies, Voice I/O; K.4.2 [Computers and society]: Social issues – assistive technologies for persons with disabilities.

Keywords

Dictation, text entry, eyes-free, mobile devices.

1. INTRODUCTION

In the past few years, touchscreen devices have become widely adopted by blind people thanks to screen readers like Apple’s VoiceOver. A blind user can explore a touchscreen by touching it as VoiceOver speaks the labels of UI elements being touched. To select an element, the user performs a second gesture such as a double tap. While this interaction makes an interface generally accessible, finding and selecting keys on an on-screen keyboard is slow and error-prone. Azenkot et al. [5] found that the mean text entry rate on an iPhone with VoiceOver was only 4.5 words per minute (WPM).

Many researchers have attempted to alleviate the challenge of nonvisual text entry using larger keys [6,22] and multi-touch taps [5,10,18]. To our knowledge, no one has explored speech as an eyes-free input modality. Yet speaking is natural, fast, and nonvisual, and automatic speech recognition (ASR) is already well integrated on mobile platforms. Mobile device users can enter text with speech in any text entry scenario on iOS and Android devices (see Figure 1) and perform actions such as searching the internet and creating calendar events with Google Voice Search [11] and Siri [4].

Figure 1. The Android (left) and iPhone (right) keyboards have a DICTATE button to the left of the SPACE key that enables users to dictate text instead of using the on-screen keyboard.

While the act of speaking is eyes-free, the process of dictating, reviewing, and editing text is complex and requires additional input. Both reviewing and editing may be challenging for blind people with today’s mobile technology. A sighted person can review a speech recognizer’s output by simply reading the text on the screen. Meanwhile, a blind person can use VoiceOver to review the text with speech output. Detecting recognition errors with speech output may be difficult, however, since the recognizer is likely to output words that sound like those the user said, but are incorrect. For example, ASR is known to have difficulty with segmentation, and may recognize the words “recognize speech” as the similarly-sounding “wreck a nice beach.”

After identifying recognition errors, blind people may face challenges correcting these errors. Prior work has shown that sighted users spend the majority of their time correcting errors when dictating text. Karat et al. [14] found that sighted people spent 66% of their time editing ASR output on a desktop dictation system. The editing process for blind users on mobile devices is likely to be much slower, since users may resort to using the on-screen keyboard to correct and edit ASR output. Perhaps the combined difficulty of reviewing and editing ASR output will outweigh the ease and speed of speech.

In this paper, we explore the patterns and challenges of the use of speech input by blind people on mobile devices. Our ultimate goal is to improve the experience of nonvisual speech input, but first we study current use to identify specific challenges that the accessibility community can address. We conducted a survey with 169 people (105 sighted and 64 blind and low-vision) to learn how often people use speech input, what they use it for, and how much they like it. We then conducted a laboratory study with 8 blind people to observe how blind people use speech to compose paragraphs. We wanted to discover what techniques people used to review and edit ASR output and how effective these techniques were.

In our survey, we found that blind people used speech for input more frequently and for longer messages than sighted people. Blind people were also more satisfied with speech than sighted people, probably because the comparative advantage of speech over keyboard input was far greater for them than for sighted people. Our laboratory study showed that speech was nearly five times as fast as the on-screen keyboard, but editing recognition errors was frustrating. Participants spent an average of 80.3% of their time reviewing and editing their text. Most edits were performed using the BACKSPACE key and reentering characters with the keyboard. Six out of eight participants in the study preferred speech to keyboard entry.

Our main contribution is our findings from the survey and study. We also contribute specific research challenges for the community to explore that can improve nonvisual text entry using both speech and touch input.

2. RELATED WORK

To our knowledge, we are the first to explore speech input for blind people in the human-computer interaction literature. Speech has mostly been studied as a form of nonvisual output (e.g., [24,28]) rather than nonvisual input. There has been some work on hands-free dictation, both for people with motor impairments [26] and for the general population [12,14].

Prior work on speech input interaction focuses on error correction, the “Achilles’ heel of speech technology” [23]. Desktop dictation systems such as Dragon NaturallySpeaking [21], which gained popularity in the late 1990s, use speech commands for cursor navigation and error correction. Users speak commands such as “move left” and “undo” to edit text or reposition the cursor. Karat et al. [14] found that novice users entered text at a rate of only 13.6 WPM with a commercial desktop dictation system and 32.5 WPM with a keyboard and mouse. This striking discrepancy was due to (1) cascades of errors triggered by a user’s correction and (2) spiral depth, a user’s repeated attempts to speak a word that is not correctly recognized.

Some work has aimed to alleviate the difficulty of error correction through touch or stylus input (see [23] for a review). Suhm et al. [29] present a system where users touch the word they want to correct, eliminating speech-based navigation. Their system also supports small gestures, such as striking out a word, as shortcuts. Martin et al. [19] enable users to correct errors in preliminary recognition results with a mouse click. These systems make error correction more efficient but have high visual demands and are not appropriate for blind users.

There is little recent academic work on speech-based input systems. Voice Typing, introduced by Kumar et al. in 2012 [16], displays recognized text as the user dictates short phrases. Users can correct recognition errors with a marking menu. Kumar et al. found that correcting errors with Voice Typing required less effort than correcting errors with the iPhone’s dictation model.

The iOS and Android dictation systems resemble the literature described above. Dictation on the iPhone follows an open-loop interaction model, where the recognizer outputs text only after the user completes the dictation. Android, in contrast, currently has incremental speech recognition, displaying recognized text as the user speaks. With VoiceOver, iOS dictation seems to be accessible to blind people, but it is unclear how incremental recognition affects accessibility on Android, especially since Android is not as generally accessible as iOS.

While there is no known work on nonvisual speech input, there has been significant interest in nonvisual touch-based input. In 2008, Slide Rule [13] introduced an accessible eyes-free interaction technique for touchscreen devices that was later adopted by VoiceOver. Since then, researchers and developers have been trying to improve the experience of eyes-free text entry with gesture-based methods. Several text entry techniques were proposed that were based on Braille, including Perkinput [5], BrailleTouch [10,27], BrailleType [22], and TypeInBraille [18]. Methods that were not based on Braille [9,25,34], including No-Look Notes [6] and NavTouch [22], did not achieve performance comparable to the former set. Despite the large amount of work in this area, entry rates for blind users remain relatively slow: at the end of a longitudinal study, Perkinput users entered text at a rate of 7.3 WPM (with a 2.1% error rate). BrailleTouch users, who were expert users of Braille keyboards, entered text at a rate of 23.1 WPM, but with an error rate of 4.8%.

Several eyes-free text entry methods were proposed and evaluated with sighted people (e.g., [30]), but they may not be appropriate for blind users. As Kane et al. found [14], blind people have different preferences and performance abilities with touch screen gestures than sighted people.

3. SURVEY: PATTERNS OF SPEECH INPUT AMONG BLIND AND SIGHTED PEOPLE

We conducted a survey to determine how often blind people use speech input on their mobile devices, what they use it for, and how they feel about it. We surveyed both blind and sighted people to evaluate the nonvisual experience against a baseline of the common (visual) use case.

3.1 Methods

We surveyed 54 blind participants, 10 low-vision participants, and 105 sighted participants. There were 31 female and 33 male blind/low-vision (BLV) participants, with an average age of 40 (age range: 18 to 66). Sighted participants were younger, with 51 males and 54 females and an average age of 32 (age range: 20 to 66). We sent emails to mailing lists related to our university and to blindness organizations to recruit participants. Participants did not receive compensation for completing the survey.

The survey included a maximum of 9 questions. The first three questions asked for demographic information: age, gender, and disability. The next question asked:

Have you recently used dictation instead of a keyboard to enter text on a smartphone? Examples of dictation include:
- Asking Siri a question, e.g., "what's the weather like today?"
- Giving Siri a command, e.g., "call John Johnson"
- Dictating an email or text message
Required.
[ ] Yes
[ ] No

If the participant answered “No” to the question above, the survey concluded with a final question that asked why not. If the participant answered “yes,” she was asked to recall a specific instance in which she used dictation. She was then asked several questions about this instance, such as when the instance occurred and what she dictated (a question to Siri, a text message, etc.). The penultimate question presented the user with three statements and asked her to describe how she felt about each statement on a Likert scale. The statements were:

• Dictation on a smartphone is accurate.
• Using dictation on a smartphone (including the time it takes to correct errors) is fast relative to an on-screen keyboard.
• I am satisfied with dictation on my smartphone.

The survey concluded with a prompt for “other comments” and a free-form text box for the response.

Surveys were completed on the Internet and responses were anonymized.

To analyze the results, we graphed the data and computed descriptive statistics for all questions. We used Wilcoxon rank-sum tests to compare means between Likert scale responses. We modeled the data with one factor, SightAbility, with two levels: BLV and Sighted. The measures corresponded to the three Likert response statements: Accurate, Fast, and Satisfied.
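As a concrete illustration of this analysis, the sketch below runs a Wilcoxon rank-sum test on two sets of Likert responses using SciPy. The response arrays are hypothetical placeholders, not the survey’s data, and the paper does not state which statistics software the authors used.

```python
# Hypothetical sketch of the survey analysis (not the authors' tooling):
# a Wilcoxon rank-sum test comparing BLV and sighted Likert responses
# for one measure, e.g., "Satisfied".
from scipy.stats import ranksums

# Placeholder 5-point Likert responses (1 = strongly disagree ... 5 = strongly agree).
blv_satisfied = [5, 4, 5, 3, 4, 5, 4, 5]
sighted_satisfied = [3, 2, 4, 3, 3, 2, 4, 3]

stat, p = ranksums(blv_satisfied, sighted_satisfied)
print(f"rank-sum statistic = {stat:.2f}, p = {p:.4f}")
```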

3.2 Results

The survey responses showed that BLV people used dictation far more frequently than sighted people: 58 BLV participants (90.6%) and 58 sighted participants (55.2%) had used dictation recently. Among BLV participants, only one used speech input on an Android device and the rest used it on iOS devices. Among sighted people, 21 participants used speech input on an Android device and 34 on an iOS device. Most BLV people had used speech input within the last day, while most sighted people had used it within the last week.

Both BLV and sighted participants used speech most for composing text messages. Table 1 shows the kinds of messages participants composed. Many more BLV than sighted people used speech to compose emails. Figure 2 supports this finding, showing that BLV people composed longer messages.

Table 1. Number of responses for a survey question.

What did you use speech input for?                          BLV   Sighted
A command (e.g., "Call bob smith")                            8        14
A question (e.g., "Siri, what's the weather like today?")    13        14
An email                                                     12         4
A text message                                               20        19
Other                                                         5         7

Figure 3 shows the means and standard deviations (SDs) of participant Likert scale responses to the penultimate question of the survey. Histograms of the data showed the distributions of responses were roughly normal, so the means and SDs represent the responses appropriately. As Figure 3 shows, BLV people were more satisfied with speech, and thought it was faster than the on-screen keyboard, compared with sighted people. This resulted in a significant effect of SightAbility on Satisfaction (W = 2296, p < 0.001) and Speed (W = 2240, p = 0.001). There was no significant effect of SightAbility on Accuracy, but there was a strong trend (W = 1977, p = 0.067). Perhaps BLV people encountered fewer recognition errors because they had more practice using speech for input.

Figure 2. Survey responses to the question, “About how long was your dictated text?” (number of BLV and sighted responses for lengths of 1–5, 6–10, and >10 words).

Figure 3. Survey responses for the Accurate, Fast, and Satisfied statements on a 5-point Likert scale: 1 is strongly disagree, and 5 is strongly agree. Responses were roughly normally distributed.

When prompted for other comments, many participants noted the challenges of editing the recognizer’s output and of speaking in noisy environments. One blind participant explained,

Accuracy in noisy environments is the biggest challenge I feel. I prefer to dictate short commands and text, saving long e-mail responses for a standard computer. Editing can be a challenge.

Three sighted participants felt awkward speaking to their device. Another sighted participant echoed this concern, feeling frustrated with the lack of feedback: “I find it hard to talk to the device. Do you yell at it and hope it understands better?”

Participants who did not use speech for input recently were mostly concerned with accuracy and errors. Some were also concerned about privacy or social appropriateness, since other people can hear what they say when they speak to their devices. Some sighted participants said that they simply “don’t need” to use speech for input or haven’t figured out how to use it yet.

3.3 Discussion

The survey results suggest that speech is already a widely used eyes-free alternative to keyboard input. Blind people seem more satisfied with speech than sighted people, probably because keyboard input with VoiceOver is so much slower than standard keyboard input that sighted people do not feel speech input offers a significant advantage.

We were surprised that nearly half of the blind participants reported that their recent speech input message was over 10 words long. From anecdotal experiences and formative work, it seemed that it would be difficult to review and edit a message that was more than 10 words long. On the other hand, more blind people entered text messages with speech than emails, suggesting that perhaps speech input was preferred for shorter and more casual messages.

The survey findings raised new questions. Exactly how long were the messages blind people entered with speech? The results suggested speech input was much faster than keyboard input, but by how much? How did participants review and edit their text, and how much time did they spend doing so? In the next section, we describe the study we conducted to answer these questions.

4. STUDY: OBSERVING THE USE OF SPEECH INPUT BY BLIND PEOPLE

After finding that nearly all of our blind survey participants used speech for input, we wanted to learn more about their experience using it. We conducted a laboratory study to observe blind people composing paragraphs with dictation and the accessible on-screen keyboard.

4.1 Methods

4.1.1 Participants

We recruited eight blind participants (five males, three females) with an average age of 44 (age range: 22 to 61). We required that participants be blind and use a smart mobile device such as a smartphone. All participants owned iPhones, which they used many times a day. Two participants had their phones for about a year, and the rest had their phones for two or more years. All had no functional vision and used VoiceOver to interact with their phones.

Since we wanted to observe how blind people use speech input in their daily lives, we also required that participants have experience using speech on their mobile devices. Six participants used speech input every day, while the remaining two used it weekly.

4.1.2 Procedure & Apparatus

Participants completed one session in the study that was about one hour and fifteen minutes long. At the beginning of a session, we asked participants for demographic information, and asked them how often and for what they used speech input on their mobile devices. We then showed participants the iPod Touch 5 that we used for the study and adjusted the keyboard and VoiceOver settings to match the participant’s preferences (e.g., disabling auto-correct).

We then asked participants to compose paragraphs using either speech or the on-screen keyboard for input. When composing with speech, participants used the DICTATE button on the keyboard, located to the left of the SPACE key (shown in Figure 1). They could use the keyboard when editing the ASR output, but we required them to use speech for the initial composition process. Participants composed text on a simple website we developed that included only a prompt, a textarea widget, and a submit button. When they completed a paragraph, they clicked submit.

We presented participants with short prompts such as, “Tell us about a book you read recently. What did you like about it?” In addition to telling them which input method to use (speech or keyboard), we gave them two guidelines:

1. Enter about 4 to 8 sentences in response to the prompt.
2. Use professional language, as though you were emailing a potential employer.

We hoped the first guideline would encourage participants to enter reasonably long paragraphs that would reveal more interesting behavior, and the second guideline would encourage them to write complete sentences with proper grammar. We recommended that participants review and edit their text as they normally would (with both input modalities). We wanted to study speech input for long and formal paragraphs because we aim to make it usable for a variety of contexts, not just short, casual text messages.

Our procedure differed from standard text entry studies in at least two ways. First, in standard text entry studies (e.g., [5,17,22,27,30,33]), participants transcribe sets of phrases. This approach is not appropriate for speech input, however, because people speak differently when reading text. Speech recognizers are trained on conversational speech, not read-aloud phrases. Recognizers would be likely to produce more errors if participants read phrases out loud. Second, most text entry studies do not permit participants to edit text post hoc (i.e., insert, delete, or replace text after repositioning the cursor). Since post-hoc editing is a common part of the composition process, especially with dictation, we included it in our study design.

After a participant entered one paragraph with each modality, we decided whether to ask him or her to enter a second set of paragraphs, depending on the amount of time remaining in the study. We counter-balanced the order of input methods and alternated methods between successive paragraphs for each participant (if he or she entered more than one paragraph). We concluded each study with a 15-minute semi-structured interview. The interview included open-ended questions about what participants liked and disliked about using speech for input. We also asked them to respond to three statements on a 7-point Likert scale (1 is strongly agree, 7 is strongly disagree). The statements were:

1. Entering text with speech was fast compared to entering text with the on-screen keyboard.
2. Entering text with speech was frustrating compared with entering text with the on-screen keyboard.
3. I am satisfied with using speech to enter text compared to using the on-screen keyboard.

We recorded audio for each study, along with a video capture of the iPod Touch’s screen. We mirrored the iPod Touch screen onto the researcher’s computer using the built-in AirPlay [2] client on the iPod and a Mac application called AirServer [1].

4.1.3 Design & Analysis

Participants composed a total of 22 paragraphs, 11 with speech input and 11 with keyboard input. Only three participants entered two paragraphs with each method; the rest entered one paragraph with each method. The average length of a paragraph was 66.4 words (SD = 41.6).


The study was an 8 x 1 design, with a single factor, Method, with two levels: Speech and Keyboard. We calculated entry speed in terms of words per minute (WPM), using the formula [30]:

WPM = ((|T| − 1) / S) × 60 × (1 / 5)    (1)

where T is the transcribed string entered, |T| is the length of T, and S is the time in seconds from the entry of the first character to the final keystroke. We included post-hoc editing in the entry rate calculation, so S included the time taken to review and edit a paragraph.
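To make Equation 1 concrete, here is a minimal Python sketch of the calculation; the 330-character, 3.4-minute example values are illustrative, chosen only to land near the speech entry rate reported below.

```python
# Entry rate per Equation 1: a "word" is standardized as 5 characters,
# and timing starts at the first character, hence len(T) - 1.
def words_per_minute(T: str, S: float) -> float:
    """T: final transcribed string; S: seconds from first character
    to final keystroke, including post-hoc review and editing."""
    return (len(T) - 1) / S * 60 * (1 / 5)

# Illustrative example: a ~330-character paragraph composed in 3.4 minutes.
print(words_per_minute("x" * 330, 3.4 * 60))  # ~19.4 WPM
```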

We evaluated accuracy in two ways. First, we computed the error rate of the speech recognizer for all text entered with speech. Second, we computed the error rate of the final transcriptions for both speech and keyboard input. We measured the error rate in terms of the word error rate (WER), a standard metric used to evaluate speech recognition systems [19]. The WER is the word edit distance (i.e., the number of word insertions, deletions, and replacements) between a reference and a transcription text, normalized by the length of the reference text. For evaluating recognition errors, the reference text is the offline, human-perceived transcript of the participant’s speech. For evaluating the final, edited keyboard and speech text, determining the reference text is less straightforward. Since we are uncertain of the user’s intended entry (this was a composition and not a transcription task), it is difficult to identify errors. As such, we considered a word to be an error if it was (1) a misspelled word, (2) a non sequitur that made no sense in the context of the sentence, or (3) a clear grammatical error such as a repeated word or a short sentence fragment. Since participants’ input was conversational, classifying words as errors was relatively straightforward, given the aforementioned categories.
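The WER computation itself is standard; the following sketch (our illustration, not the authors’ code) computes it with a word-level dynamic-programming edit distance.

```python
# Word error rate (WER): word-level edit distance (insertions, deletions,
# substitutions) normalized by the length of the reference text.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# The segmentation example from the introduction: 2 substitutions plus
# 2 insertions against a 2-word reference gives a WER of 2.0.
print(word_error_rate("recognize speech", "wreck a nice beach"))
```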

We used two-sided t-tests to compare text entry rates, since WPM was roughly normally distributed. For error rates, which were not normally distributed, we used Wilcoxon rank-sum tests to compare the means between input methods.
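A sketch of the entry-rate comparison is below, again with illustrative placeholder data rather than the study’s measurements. Given the t(10) reported in the results, a paired test over the 11 matched paragraphs is one plausible reading, though the paper does not spell this out.

```python
# Hypothetical sketch of the entry-rate comparison (illustrative data):
# a two-sided paired t-test over 11 matched paragraphs (df = 10).
from scipy.stats import ttest_rel

speech_wpm   = [34.6, 22.0, 18.5, 15.2, 12.9, 20.3, 17.8, 14.7, 25.1, 16.0, 13.4]
keyboard_wpm = [ 6.7,  4.1,  3.8,  5.2,  2.9,  4.5,  3.6,  4.0,  5.5,  3.2,  4.4]

t, p = ttest_rel(speech_wpm, keyboard_wpm)
print(f"t({len(speech_wpm) - 1}) = {t:.2f}, p = {p:.4f}")
```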

4.2 Results

There was high variability in speech input behavior among participants, so we explain some of the results in terms of individual performance. P5 and P6 did not know how to reposition the cursor, so their ability to edit was limited, impacting both speed and accuracy. P5’s accuracy was also affected by her thick foreign accent, although she still used speech for input weekly. On the other hand, P3 had many years of experience with dictation systems on different platforms and spoke with no hesitation and clear diction.

4.2.1 Entry Time and Accuracy

The entry rate for speech input was much higher than for keyboard input. With speech input, participants entered text at a rate of 19.5 WPM (SD = 10.1), while they entered only 4.3 WPM (SD = 1.5) with the keyboard. As expected, this resulted in a significant effect of Method on WPM (t(10) = -5.07, p < 0.001). The maximum entry rate was achieved by P3, at 34.6 WPM with speech, while the maximum speed for keyboard input was achieved by P7, at 6.7 WPM.
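For reference, entry rates in WPM are conventionally computed from the length of the transcribed text and the elapsed time, counting five characters per word [32]. A minimal sketch of that standard formula (a convention from the text entry literature, not a procedure specific to this study):

    def words_per_minute(transcribed_chars: int, seconds: float) -> float:
        # Standard text entry convention: a "word" is five characters,
        # and the first character carries no timing information.
        return ((transcribed_chars - 1) / seconds) * (60.0 / 5.0)

    print(words_per_minute(163, 60.0))  # 32.4 WPM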

When inputting speech, participants spent most of their time reviewing and editing recognition errors. On average, this amounted to 80.3% (SD = 10.2) of the composition time. P1 spent 94.2% (about 10 minutes) of his time reviewing and editing the recognizer's output, more than any other participant. Figure 4 shows the amount of time participants spent dictating vs. reviewing and editing their speech input compositions. Each bar in the plot represents one composed paragraph (i.e., one task).

[Figure 4: stacked bar chart. The y-axis shows time in minutes (0–20); the x-axis shows participant.task (1.1–8.1); each bar is split into inline entry and review & edits.]

Figure 4. Time spent entering (red) and reviewing and editing (blue) text. Each column represents one composed paragraph (i.e., a task).

While participants spent most of their time reviewing and editing text when using speech input, they spent little time on these activities when using the keyboard. On average, participants spent only 9.0% (SD = 11.7) of their time reviewing and editing keyboard output. This amounted to an average of 1.2 minutes of review and edit time with the keyboard, compared to 5.4 minutes of review and edit time with speech input. Participants made most of their edits in the keyboard condition inline, by deleting text with the BACKSPACE key and reentering it.

The number of errors produced by the ASR largely determined the amount of reviewing and editing participants performed, and, in turn, their overall rate of entry. The average WER for the iOS ASR was 10.2% (SD = 10.4), ranging from 0% for P3 to 35.6% for P5. Figure 5 shows the entry rates of each paragraph input with speech as a function of the ASR's WER. As expected, there was a negative correlation, with an outlier point at the far right for P5.

[Figure 5: scatter plot. The y-axis shows words per minute (0–30); the x-axis shows speech recognition word error rate (0.0–0.4); points are labeled by participant.task (1.1–8.1), with 5.1 at the far right.]

Figure 5. Entry rate vs. speech recognition error rate for speech input. The labels of the plotted points indicate the participant (1–8) and paragraph number (1–2).
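The negative relationship visible in Figure 5 could be quantified with a rank correlation, which is robust to the outlier. A sketch (assuming SciPy; the per-task values are invented placeholders, not the study's data):

    from scipy import stats

    # Hypothetical per-task (WER, WPM) pairs; placeholders only.
    wer = [0.00, 0.02, 0.05, 0.08, 0.10, 0.15, 0.20, 0.36]
    wpm = [34.6, 28.0, 22.5, 19.0, 17.2, 14.1, 11.0,  6.3]

    rho, p = stats.spearmanr(wer, wpm)  # rho < 0 indicates a negative correlation
    print(rho, p)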

Participants corrected most of the speech recognizer's errors, yielding a mean WER of 3.2% (SD = 4.6) for the final text input with speech. The mean WER for text input with the keyboard was slightly higher, at 4.0% (SD = 3.3). There was no significant effect of Method on the WER of the final text (W = 77, n.s.).



4.2.2 Reviewing and Editing Text

In this section, we describe the reviewing and editing techniques that participants used when composing paragraphs. After entering several sentences, most participants reviewed their text by making VoiceOver read it word by word. If a word did not sound correct, they read it character by character. To do this, participants used the VoiceOver rotor [3], which enables users to read text one unit at a time. A unit can be a line, word, or character. To move the cursor from one unit to the next, a user swipes down on the screen. The user can also swipe up to move to the previous unit. In this way, participants moved the VoiceOver cursor around the text in the text area to verify their input. When reviewing, participants often read words several times, then iterated through their characters.

Some ASR errors were difficult to detect using the VoiceOver text-to-speech because they sounded similar to the user's original speech. For example, P1 dictated the phrase "lost my sight," but the speech recognizer output the phrase "lost my site." P1 did not detect this error. Another participant, P6, said "you can hike," but the speech recognizer output "you can't hike." P6 did not detect the error either. Some grammatical errors were also not detected in the review process. VoiceOver's text-to-speech does not differentiate upper- and lower-case letters, so there were some uncorrected case errors. The composed paragraphs also included extra spaces between words and sentences that participants did not seem to be aware of.

We observed that VoiceOver did not alert participants to passages that were recognized with low confidence. Occasionally, the iOS speech recognizer underlined a word or phrase that was recognized with low confidence, and would present the user with alternative recognition options if the phrase was touched. This information was not accessible. Also, we observed that VoiceOver communicated punctuation marks in the text in a subtle manner, by varying prosody and pausing. This made it difficult for participants to detect punctuation marks in the recognized text when reviewing it.

The review process took longer than we expected, but participants spent much more time editing the ASR output. Figure 6 shows the number of edits participants performed for each paragraph input with speech. The total number of edits performed during speech input tasks was 96. Edits included inserting, deleting, and replacing characters, words, or strings of words. Only 15 (15.6%) of these edits were done using speech input. Although inputting speech was much faster than using the keyboard, participants preferred using the keyboard to edit the ASR output. Several participants inserted words or phrases with speech but deleted erroneous text with the keyboard's BACKSPACE key. Also, there was no way to move the cursor using speech commands, so participants preferred to continue using touch to make character- or word-level edits.

We observed three editing techniques for speech input in our study. We describe them as follows.

Hone in, delete, and reenter. The first technique was the most common and was used by all participants except P5 and P6, who did not know how to use VoiceOver gestures. This technique involved (1) moving the cursor to a desired position using the VoiceOver gestures described previously, (2) using the BACKSPACE key to delete unwanted characters, words, or even phrases, and finally (3) entering characters using the keyboard.

Hone in, select, and reenter. The second editing technique seems more efficient than the first, but was only used by P8. It involved (1) using VoiceOver gestures to move the cursor to a desired position, (2) choosing the EDIT option from the VoiceOver rotor, (3) selecting the word (one of the EDIT options, along with COPY and PASTE), and (4) re-entering the word with the keyboard. P8 sometimes re-entered the word with speech input. This technique required fewer key presses than "hone in, delete, and reenter."

Delete and start over. P5 and P6 did not know how to reposition the cursor, so they used this technique. They deleted entered text with the DELETE key, starting from the end (where the cursor was located by default), and re-entered it with speech. They both said that they often used speech for short messages, and deleted everything and started over if there was a recognition error. This was the least flexible and, most likely, the least efficient editing technique for longer messages.
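As a rough back-of-the-envelope illustration of why "hone in, select, and reenter" needs fewer key presses than "hone in, delete, and reenter" once the cursor is in place (our own simplified model, not a measurement from the study):

    # Hypothetical keystroke counts for correcting one word of length n
    # after the cursor has been positioned.

    def delete_and_reenter(word_len: int) -> int:
        # One BACKSPACE per character, then retype each character.
        return word_len + word_len

    def select_and_reenter(word_len: int) -> int:
        # A fixed handful of rotor/selection gestures replaces the
        # per-character deletions; retyping overwrites the selection.
        selection_gestures = 3  # assumed cost of rotor + EDIT + select
        return selection_gestures + word_len

    # For a 7-letter word: 14 vs. 10 key presses.
    print(delete_and_reenter(7), select_and_reenter(7))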

[Figure 6: stacked bar chart. The y-axis shows the number of edits (0–30); the x-axis shows participant.task (1.1–8.1); each bar is split into keyboard edits and speech edits.]

Figure 6. Number of edits performed with each modality for each composed paragraph (i.e., task).

4.2.3 Qualitative User Feedback

Two of the 8 (25%) participants in our study preferred keyboard input over speech input. These were P1 and P5, who were the only two participants who used speech weekly, rather than daily. All participants mentioned speed as the primary benefit of using speech for input. P1 and P5 preferred the keyboard because of the challenge of editing: P5 said she knew there were mistakes and didn't know how to fix them. P1 said that "although [with] speech you can get more volume of text in there... my weakness is efficiently editing. That's the downside of speech."

P3 and P8 said it was easy for them to express their thoughts verbally, having had prior experience with dictation systems, unlike P1 and P2, who found it more difficult. P8 also said that speech input helped him avoid spelling mistakes, since auto-correct was not sufficient for correcting them:

    I rely on dictation more because my spelling is not the greatest and feel like [I] can compose more of [a] coherent sentence verbally than by writing, especially on an iOS device where it's difficult to correct mistakes. On the computer, you can see with spell check and correct. But on [an] iOS device it's easier not to have to worry about it.

While most preferred speech, all participants found certain aspects of speech input frustrating. All participants except P3 cited editing as a source of frustration. P8 would like to be able to do "inline" editing as he spoke. P7 wanted an easier way to edit with speech rather than using the keyboard, which she did for most edits. P7 and P3 mentioned the problem of dictating words that were out-of-vocabulary, such as names. Figure 7 shows Likert scale responses to three statements from the interviews at the end of the study sessions, showing varying levels of satisfaction and frustration. Figure 7 also shows that all participants felt inputting text with speech was much faster than with the keyboard.

[Figure 7: three histograms of response frequencies on a 7-point scale, one per statement:
"Speech is fast compared to the keyboard." Mean = 1.6 (SD = 0.9).
"Speech is frustrating compared to the keyboard." Mean = 4.8 (SD = 2.1).
"I'm satisfied with speech compared to the keyboard." Mean = 3.5 (SD = 1.9).]

Figure 7. Responses to three statements on a 7-point Likert scale (1 is strongly agree, 7 is strongly disagree).

5. DISCUSSION

Our study showed that speech input is an efficient entry method for blind people compared to the on-screen keyboard, yet it is impeded by the time required to review and edit ASR output. People can speak intelligibly at a rate of about 150 WPM [31], but the average entry rate of blind people using speech in our study was just 19.5 WPM. Nonetheless, this was comparable to the entry rate of sighted people using the on-screen keyboard of a smartphone, as found in prior work [7]. Furthermore, we found that the error rate of speech input was no higher than that of keyboard input for participants in our study. It is important to note, however, that we measured accuracy only in terms of the WER, which does not necessarily correlate with the intelligibility of text [19]. The WER penalizes small and major errors in a word equally, but it is the standard measure for evaluating ASR accuracy.

Six of the eight participants (75%) preferred speech over the keyboard because of speed, but all participants faced challenges when using speech input. Editing was the primary challenge; participants spent the majority (80%) of their time editing the text output by the recognizer. Surprisingly, their most common editing technique was highly inefficient in terms of keystrokes. Participants deleted characters with BACKSPACE and then reentered them with the keyboard. It was unclear why they did not select whole words to replace them, or use speech for editing more than the keyboard. Perhaps some participants did not know how to select whole words with VoiceOver. They may have preferred to edit text with the keyboard because it was more predictable, preventing additional errors.

Study responses were less positive than our survey responses were. This was probably because, in our study, we asked participants to enter paragraphs that were longer and more formal than many smartphone communications. For example, a text message input by a survey participant was probably less than four sentences long and not as formal as an email that one would write to a potential employer (referring to the guidelines we gave participants in the study). Speech is currently better suited for short, casual messages, probably because of the difficulty of identifying and correcting errors. We believe the research community should facilitate the process of correcting content and grammar to make speech input more versatile.

Our study also uncovered interesting keyboard input behavior with VoiceOver. Since this was not our focus, we did not document challenges with keyboard input rigorously, but we observed several interesting trends. Surprisingly, some participants did not use the auto-correct feature, which could have improved their speed and accuracy; they found it difficult to monitor and dismiss auto-correct suggestions. We also observed that VoiceOver did not communicate punctuation clearly, and some minor grammatical issues, such as extra spaces between words, were only noticeable when reviewing text character by character. VoiceOver had a setting in which it speaks punctuation marks, but not one participant used this setting. VoiceOver also did not communicate misspelled words that were visually identified with an underline. Enabling participants to more easily identify punctuation, grammar, and spelling issues would likely improve efficiency and composition quality for both keyboard and speech input.

Throughout the paper, we have compared speech input with the de facto standard accessible input method for touchscreens: on-screen keyboard input with VoiceOver. However, there are input alternatives that are commonly used by both blind and sighted people that should be considered when evaluating speech. Several study participants used a small external keyboard with hard keys; one participant used the keyboard on his Braille display, which connected to his iPhone; and one participant used the on-screen input method Fleksy [9], one of many gesture-based text entry methods (see Related Work for others). These alternatives are more private than speech, and probably more reliable in noisy environments. It would be interesting to compare these methods to speech in the future.

6. CHALLENGES FOR FUTURE RESEARCH

We distill our findings into a set of challenges for researchers interested in nonvisual text entry. These challenges can be incorporated into both speech and gesture-based input methods.

1. Text selection – a better method for nonvisual selection of text. This can also include other edit operations, such as cut, copy, and paste.

2. Cursor positioning – an easier way to move a cursor around a text area; enable a user to easily hone in on errors.

3. Error detection – an easier way to detect errors such as spelling mistakes, letter case errors, and low-confidence ASR output.

4. Auto-correct – a study of how well auto-correct works for nonvisual use, and a way to make it more effective.

7. CONCLUSION

We have explored the patterns and challenges of speech input for blind mobile device users through a survey and a laboratory study. We found that speech input is a popular alternative to the on-screen keyboard with VoiceOver, yet people face challenges when reviewing and editing a speech recognizer's output, often resorting to using the keyboard. We hope this work will enable text entry researchers to better understand the patterns and challenges of current nonvisual text entry, and spur further research on nonvisual speech input.

8. ACKNOWLEDGMENTS

We thank Gina-Anne Levow, Richard Ladner, Rochelle H. Ng, and Simone Schaffer. This work was supported in part by AT&T Labs and the National Science Foundation under grant No. 1116051.

9. REFERENCES

1. AirServer. http://www.airserver.com/.
2. Apple Inc. AirPlay. http://www.apple.com/airplay/
3. Apple Inc. Chapter 11: Using VoiceOver Gestures. From VoiceOver Getting Started. http://www.apple.com/voiceover/info/guide/_1137.html#vo28035.
4. Apple Inc. Siri. http://www.apple.com/ios/siri/
5. Azenkot, S., Wobbrock, J.O., Prasain, S., and Ladner, R.E. Input finger detection for nonvisual touch screen text entry in Perkinput. Proc. GI '12, 121–129.
6. Bonner, M., Brudvik, J., Abowd, G., and Edwards, K. (2010). No-Look Notes: Accessible eyes-free multi-touch text entry. Proc. Pervasive '10, 409–426.
7. Castellucci, S. and MacKenzie, I.S. (2011). Gathering text entry metrics on Android devices. Proc. CHI EA '11, 1507–1512.
8. Fischer, A.R.H., Price, K.J., and Sears, A. Speech-based text entry for mobile handheld devices: An analysis of efficacy and error correction techniques for server-based solutions. International Journal of Human-Computer Interaction 19, 3 (2005), 279–304.
9. Fleksy App by Syntellia. http://fleksy.com/
10. Frey, B., Southern, C., and Romero, M. BrailleTouch: Mobile texting for the visually impaired. Proc. UAHCI '11, 19–25.
11. Google. Voice Search Anywhere. http://www.google.com/insidesearch/features/voicesearch/index-chrome.html
12. Halverson, C., Horn, D., Karat, C., and Karat, J. The beauty of errors: Patterns of error correction in desktop speech systems. IOS Press (1999), 133–140.
13. Kane, S.K., Bigham, J.P., and Wobbrock, J.O. (2008). Slide Rule: Making mobile touch screens accessible to blind people using multi-touch interaction techniques. Proc. ASSETS '08, 73–80.
14. Kane, S.K., Wobbrock, J.O., and Ladner, R.E. (2011). Usable gestures for blind people: Understanding preference and performance. Proc. CHI '11, 413–422.
15. Karat, C.-M., Halverson, C., Horn, D., and Karat, J. Patterns of entry and correction in large vocabulary continuous speech recognition systems. Proc. CHI '99, 568–575.
16. Kumar, A., Paek, T., and Lee, B. Voice typing: A new speech interaction model for dictation on touchscreen devices. Proc. CHI '12, 2277–2286.
17. MacKenzie, I.S. and Zhang, S.X. (1999). The design and evaluation of a high-performance soft keyboard. Proc. CHI '99, 25–31.
18. Mascetti, S., Bernareggi, C., and Belotti, M. (2011). TypeInBraille: A braille-based typing application for touchscreen devices. Proc. ASSETS '11, 295–296.
19. Martin, T.B. and Welch, J.R. Practical speech recognizers and some performance effectiveness parameters. In Trends in Speech Recognition. Prentice Hall, Englewood Cliffs, NJ, USA, 1980.
20. Mishra, T., Ljolje, A., and Gilbert, M. (2011). Predicting human perceived accuracy of ASR systems. Proc. INTERSPEECH '11, 1945–1948.
21. Nuance. Dragon NaturallySpeaking software. http://www.nuance.com/dragon/index.htm
22. Oliveira, J., Guerreiro, T., Nicolau, H., Jorge, J., and Gonçalves, D. (2011). Blind people and mobile touch-based text-entry: Acknowledging the need for different flavors. Proc. ASSETS '11, 179–186.
23. Oviatt, S. Taming recognition errors with a multimodal interface. Commun. ACM 43, 9 (2000), 45–51.
24. Pitt, I. and Edwards, A.D.N. (1996). Improving the usability of speech-based interfaces for blind users. Proc. ASSETS '96, 124–130.
25. Sánchez, J. and Aguayo, F. (2006). Mobile messenger for the blind. Proc. UAAI '06, 369–385.
26. Sears, A., Karat, C.-M., Oseitutu, K., Karimullah, A., and Feng, J. Productivity, satisfaction, and interaction strategies of individuals with spinal cord injuries and traditional users interacting with speech recognition software. Universal Access in the Information Society 1, 1 (2001), 4–15.
27. Southern, C., Clawson, J., Frey, B., Abowd, G., and Romero, M. (2012). An evaluation of BrailleTouch: Mobile touchscreen text entry for the visually impaired. Proc. MobileHCI '12, 317–326.
28. Stent, A., Syrdal, A., and Mishra, T. (2011). On the intelligibility of fast synthesized speech for individuals with early-onset blindness. Proc. ASSETS '11, 211–218.
29. Suhm, B., Myers, B., and Waibel, A. Multi-modal error correction for speech user interfaces. ACM TOCHI 8, 1 (2001), 60–98.
30. Tinwala, H. and MacKenzie, I.S. (2009). Eyes-free text entry on a touchscreen phone. Proc. TIC-STH '09, 83–88.
31. Williams, J.R. (1998). Guidelines for the use of multimedia in instruction. Proceedings of the Human Factors and Ergonomics Society 42nd Annual Meeting, 1447–1451.
32. Wobbrock, J.O. (2007). Measures of text entry performance. In Text Entry Systems: Mobility, Accessibility, Universality, I.S. MacKenzie and K. Tanaka-Ishii (eds.). San Francisco: Morgan Kaufmann, 47–74.
33. Wobbrock, J.O. and Myers, B.A. Analyzing the input stream for character-level errors in unconstrained text entry evaluations. ACM TOCHI 13, 4 (2006), 458–489.
34. Yfantidis, G. and Evreinov, G. Adaptive blind interaction technique for touchscreens. Universal Access in the Information Society 4 (2006), 328–337.
