
first we study current use to identify specific challenges that the accessibility community can address. We conducted a survey with 169 people (105 sighted and 64 blind and low-vision) to learn how often people use speech input, what they use it for, and how much they like it. We then conducted a laboratory study with 8 blind people to observe how they use speech to compose paragraphs. We wanted to discover what techniques people used to review and edit ASR output and how effective these techniques were.

In our survey, we found that blind people used speech for input more frequently and for longer messages than sighted people. Blind people were also more satisfied with speech than sighted people, probably because the comparative advantage of speech over keyboard input was far greater for them. Our laboratory study showed that speech was nearly five times as fast as the on-screen keyboard, but editing recognition errors was frustrating. Participants spent an average of 80.3% of their time reviewing and editing their text. Most edits were performed using the BACKSPACE key and reentering characters with the keyboard. Six out of eight participants in the study preferred speech to keyboard entry.

Our main contribution is our findings from the survey and study. We also contribute specific research challenges for the community to explore that could improve nonvisual text entry using both speech and touch input.

2. RELATED WORK

To our knowledge, we are the first to explore speech input for blind people in the human-computer interaction literature. Speech has mostly been studied as a form of nonvisual output (e.g., [24, 28]) rather than nonvisual input. There has been some work on hands-free dictation, both for people with motor impairments [26] and for the general population [12,14].

Prior work on speech input interaction focuses on error correction, the "Achilles Heel of speech technology" [23]. Desktop dictation systems such as Dragon NaturallySpeaking [21], which gained popularity in the late 1990s, use speech commands for cursor navigation and error correction. Users speak commands such as "move left" and "undo" to edit text or reposition the cursor. Karat et al. [14] found that novice users entered text at a rate of only 13.6 WPM with a commercial desktop dictation system, compared with 32.5 WPM with a keyboard and mouse. This striking discrepancy was due to (1) cascades of errors triggered by a user's correction and (2) spiral depth, a user's repeated attempts to speak a word that is not correctly recognized.

Some work has aimed to alleviate the difficulty of error correction through touch or stylus input (see [23] for a review). Suhm et al. [29] present a system where users touch the word they want to correct, eliminating speech-based navigation. Their system also supports small gestures, such as striking out a word, as shortcuts. Martin et al. [19] enable users to correct errors in preliminary results with a mouse click. These systems make error correction more efficient, but they have high visual demands and are not appropriate for blind users.

There is little recent academic work on speech-based input systems. Voice Typing, introduced by Kumar et al. in 2012 [16], displays recognized text as the user dictates short phrases. Users can correct recognition errors with a marking menu. Kumar et al. found that correcting errors with Voice Typing required less effort than correcting errors with the iPhone's dictation model.

The iOS and Android dictation systems resemble the literature described above. Dictation on the iPhone follows an open-loop interaction model, where the recognizer outputs text only after the user completes the dictation. Android, in contrast, currently has incremental speech recognition, displaying recognized text as the user speaks. With VoiceOver, iOS dictation seems to be accessible to blind people, but it is unclear how incremental recognition affects accessibility on Android, especially since Android is not as generally accessible as iOS.
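To make the two interaction models concrete, the sketch below shows how incremental recognition is typically requested from Android's SpeechRecognizer API: passing EXTRA_PARTIAL_RESULTS asks the engine to stream hypotheses through onPartialResults while the user is still speaking, rather than delivering text only at the end. This is an illustrative sketch of ours, not code from any system discussed above; how a screen reader would voice the streamed updates is left open, which is exactly the accessibility question raised here.

```kotlin
import android.content.Context
import android.content.Intent
import android.os.Bundle
import android.speech.RecognitionListener
import android.speech.RecognizerIntent
import android.speech.SpeechRecognizer

// Minimal sketch of incremental dictation on Android.
// Requires the RECORD_AUDIO permission; must be called on the main thread.
fun startIncrementalDictation(context: Context, onText: (String) -> Unit) {
    val recognizer = SpeechRecognizer.createSpeechRecognizer(context)
    recognizer.setRecognitionListener(object : RecognitionListener {
        override fun onPartialResults(partialResults: Bundle) {
            // Streamed while the user is still speaking (incremental model).
            partialResults.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
                ?.firstOrNull()?.let(onText)
        }
        override fun onResults(results: Bundle) {
            // Final hypothesis, delivered once the user finishes speaking.
            results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
                ?.firstOrNull()?.let(onText)
        }
        // Remaining callbacks are not needed for this sketch.
        override fun onReadyForSpeech(params: Bundle?) {}
        override fun onBeginningOfSpeech() {}
        override fun onRmsChanged(rmsdB: Float) {}
        override fun onBufferReceived(buffer: ByteArray?) {}
        override fun onEndOfSpeech() {}
        override fun onError(error: Int) {}
        override fun onEvent(eventType: Int, params: Bundle?) {}
    })
    recognizer.startListening(Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
        putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
        putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true)
    })
}
```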

While there is no known work on nonvisual speech input, there has been significant interest in nonvisual touch-based input. In 2008, Slide Rule [13] introduced an accessible eyes-free interaction technique for touchscreen devices that was later adopted by VoiceOver. Since then, researchers and developers have been trying to improve the experience of eyes-free text entry with gesture-based methods. Several proposed text entry techniques were based on Braille, including Perkinput [5], BrailleTouch [10,27], BrailleType [22], and TypeInBraille [18]. Methods that were not based on Braille [9,25,34], including No-Look Notes [6] and NavTouch [22], did not achieve performance comparable to the former set. Despite the large amount of work in this area, entry rates for blind users remain relatively slow: at the end of a longitudinal study, Perkinput users entered text at a rate of 7.3 WPM (with a 2.1% error rate). BrailleTouch users, who were expert users of Braille keyboards, entered text at a rate of 23.1 WPM, but with an error rate of 4.8%.
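For context, text entry studies like those cited above conventionally compute entry rate as words per minute, with a "word" defined as five characters, and uncorrected error rate as the minimum string distance between the presented and transcribed phrases, normalized by the longer of the two (following Soukoreff and MacKenzie's standard metrics; individual studies may differ in the details). A minimal sketch, with helper names of our own choosing:

```kotlin
// Standard text-entry metrics; a sketch, not code from the cited studies.

// Entry rate: (characters - 1) / 5 "words" over the elapsed time.
// E.g., 120 characters in 100 s -> (119 / 5) * (60 / 100) = 14.28 WPM.
fun wpm(transcribed: String, seconds: Double): Double =
    (transcribed.length - 1) / 5.0 * (60.0 / seconds)

// Uncorrected error rate: Levenshtein (minimum string) distance between
// the presented and transcribed strings, normalized by the longer one.
fun uncorrectedErrorRate(presented: String, transcribed: String): Double {
    val d = Array(presented.length + 1) { IntArray(transcribed.length + 1) }
    for (i in 0..presented.length) d[i][0] = i
    for (j in 0..transcribed.length) d[0][j] = j
    for (i in 1..presented.length) {
        for (j in 1..transcribed.length) {
            val cost = if (presented[i - 1] == transcribed[j - 1]) 0 else 1
            d[i][j] = minOf(d[i - 1][j] + 1,        // deletion
                            d[i][j - 1] + 1,        // insertion
                            d[i - 1][j - 1] + cost) // substitution
        }
    }
    return d[presented.length][transcribed.length].toDouble() /
        maxOf(presented.length, transcribed.length)
}
```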

Several eyes-free text entry methods have been proposed and evaluated with sighted people (e.g., [30]), but they may not be appropriate for blind users. As Kane et al. [14] found, blind people have different preferences and performance abilities with touchscreen gestures than sighted people.

3. SURVEY: PATTERNS OF SPEECH INPUT AMONG BLIND AND SIGHTED PEOPLE

We conducted a survey to determine how often blind people use speech input on their mobile devices, what they use it for, and how they feel about it. We surveyed both blind and sighted people to evaluate the nonvisual experience against a baseline of the common (visual) use case.

3.1 Methods

We surveyed 54 blind participants, 10 low-vision participants, and 105 sighted participants. There were 31 female and 33 male blind/low-vision (BLV) participants, with an average age of 40 (range: 18 to 66). Sighted participants were younger, with 51 male and 54 female participants and an average age of 32 (range: 20 to 66). We recruited participants by sending emails to mailing lists related to our university and to blindness organizations. Participants did not receive compensation for completing the survey.

The survey included a maximum of 9 questions. The first three asked for demographic information: age, gender, and disability. The next question asked:

Have you recently used dictation instead of a keyboard to enter text on a smartphone? Examples of dictation include:

- Asking Siri a question, e.g., "what's the weather like today?"
- Giving Siri a command, e.g., "call John Johnson"
- Dictating an email or text message

Required.
