21.01.2014 Views

Conversations in the Crowd - Microsoft Research

Conversations in the Crowd - Microsoft Research

Conversations in the Crowd - Microsoft Research

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

While fur<strong>the</strong>r <strong>in</strong>vestigation is needed, our <strong>in</strong>itial results<br />

suggest that this is a feasible approach for collect<strong>in</strong>g<br />

task-oriented dialog data. We discuss future steps to<br />

improve <strong>the</strong> efficiency of data collection with <strong>the</strong> system.<br />

We conclude with a discussion of ongo<strong>in</strong>g work<br />

on provid<strong>in</strong>g tra<strong>in</strong><strong>in</strong>g data for spoken language understand<strong>in</strong>g<br />

models by hav<strong>in</strong>g <strong>the</strong> crowd collectively parse<br />

logs from <strong>the</strong> conversation <strong>in</strong>to <strong>the</strong>ir correspond<strong>in</strong>g semantic<br />

frames. Our work suggests a possible future <strong>in</strong><br />

which <strong>the</strong> crowd can be used to <strong>in</strong>creas<strong>in</strong>gly automate<br />

<strong>the</strong> process of build<strong>in</strong>g dialog systems <strong>in</strong> new doma<strong>in</strong>s,<br />

on-demand, with m<strong>in</strong>imal effort from developers.<br />

Background<br />

Our approach draws from rich prior work <strong>in</strong> both conversational<br />

dialog systems, and crowdsourc<strong>in</strong>g.<br />

Conversational Dialog Systems<br />

Our ChatCollect system is designed to alleviate <strong>the</strong><br />

development efforts for dialog systems. The manual<br />

author<strong>in</strong>g of <strong>the</strong>se systems has been recognized as a<br />

time consum<strong>in</strong>g effort that requires doma<strong>in</strong> expertise<br />

(Ward and Pellom 1999). This challeng<strong>in</strong>g and timeconsum<strong>in</strong>g<br />

process has been used <strong>in</strong> <strong>the</strong> development<br />

of many dialog systems (e.g., (Gandhe et al. 2009;<br />

Aleven et al. 2006; Allen et al. 1996)). Alternative approaches<br />

have focused on learn<strong>in</strong>g from transcribed<br />

human-human dialogs (Gor<strong>in</strong>, Riccardi, and Wright<br />

1997) or dialogs collected through Wizard-of-Oz studies<br />

(Lathrop 2004). However, exist<strong>in</strong>g human-human corpora<br />

apply to limited doma<strong>in</strong>s, and collect<strong>in</strong>g data for<br />

new doma<strong>in</strong>s with <strong>the</strong> exist<strong>in</strong>g techniques is difficult<br />

and expensive. Moreover, <strong>the</strong>y do not scale well – an<br />

important aspect given <strong>the</strong> amount of data needed to<br />

tra<strong>in</strong> accurate models for dialog systems. The system<br />

proposed <strong>in</strong> this paper holds promise for obta<strong>in</strong><strong>in</strong>g natural<br />

dialog data for various doma<strong>in</strong>s with little to no<br />

<strong>in</strong>volvement from system designers. The approach may<br />

enable automated tra<strong>in</strong><strong>in</strong>g of future dialog systems and<br />

speed up <strong>the</strong> real-world deployment of such systems.<br />

<strong>Crowd</strong>sourc<strong>in</strong>g<br />

<strong>Crowd</strong>sourc<strong>in</strong>g has ga<strong>in</strong>ed immense popularity <strong>in</strong> recent<br />

years as a means to provide programmatic, easy,<br />

and scalable access to human <strong>in</strong>telligence. <strong>Crowd</strong>sourc<strong>in</strong>g<br />

has been applied to a large variety of tasks rang<strong>in</strong>g<br />

from search relevance (Chen et al. 2013) to image description<br />

(Bigham et al. 2010). Recently, <strong>the</strong>re has been<br />

grow<strong>in</strong>g <strong>in</strong>terest <strong>in</strong> apply<strong>in</strong>g crowdsourc<strong>in</strong>g to language<br />

problems. <strong>Crowd</strong>sourc<strong>in</strong>g tasks have been designed for<br />

speech transcription (Lasecki et al. 2012), speech acquisition<br />

(Lane et al. 2010), translation (Zaidan and<br />

Callison-Burch 2011) and paraphrase generation (Burrows,<br />

Potthast, and Ste<strong>in</strong> 2013). Some prior work has<br />

particularly focused on crowdsourc<strong>in</strong>g methods for <strong>the</strong><br />

development of dialog systems. Wang et. al. studied<br />

methods for elicit<strong>in</strong>g natural language for a given semantic<br />

form (Wang et al. 2012). The Asgard system<br />

uses crowdsourc<strong>in</strong>g for free-form language generation<br />

and for <strong>the</strong> semantic label<strong>in</strong>g of segments of <strong>the</strong> collected<br />

language (Liu et al. 2013). Bessho et. al. (Bessho,<br />

Harada, and Kuniyoshi 2012) developed a dialog system<br />

that requests efforts from a real-time crowd to make<br />

progress <strong>in</strong> a dialog when automated approaches fail.<br />

Cont<strong>in</strong>uous real-time crowdsourc<strong>in</strong>g was <strong>in</strong>troduced<br />

by Lasecki et. al (Lasecki et al. 2011) <strong>in</strong> order to create<br />

crowd-powered systems capable of <strong>in</strong>teract<strong>in</strong>g with<br />

<strong>the</strong>ir users and <strong>the</strong>ir environment. This approach was<br />

applied to conversational <strong>in</strong>teraction <strong>in</strong> Chorus (Lasecki<br />

et al. 2013b; 2013a), a conversational question answer<strong>in</strong>g<br />

system. Both of <strong>the</strong>se systems are able to quickly get<br />

<strong>in</strong>put from crowd workers by pre-recruit<strong>in</strong>g (Bigham et<br />

al. 2010; Bernste<strong>in</strong> et al. 2011). ChatCollect differs from<br />

<strong>the</strong>se systems by recruit<strong>in</strong>g all sides of <strong>the</strong> dialog from<br />

<strong>the</strong> crowd for automated dialog generation.<br />

ChatCollect System<br />

The ChatCollect system is designed to collect realistic<br />

task-oriented dialog data from crowd workers. The<br />

only <strong>in</strong>puts of <strong>the</strong> system developer to <strong>the</strong> ChatCollect<br />

system are <strong>the</strong> different tasks <strong>the</strong> developer wants to<br />

collect data about. For example, a developer may tell<br />

ChatCollect to collect data about flight reservations,<br />

plann<strong>in</strong>g a night out or decid<strong>in</strong>g what car to buy.<br />

The ChatCollect system hires a pair of workers from a<br />

crowdsourc<strong>in</strong>g marketplace <strong>in</strong> real-time and l<strong>in</strong>ks <strong>the</strong>m<br />

to an <strong>in</strong>stant-messenger style chat w<strong>in</strong>dow to have a<br />

conversation about <strong>the</strong> given task, and assigns a role to<br />

each worker. The first worker routed to <strong>the</strong> chat w<strong>in</strong>dow<br />

is assigned <strong>the</strong> role of <strong>the</strong> “Assistant” and is <strong>in</strong>structed<br />

to help someone with <strong>the</strong> task at hand, such as f<strong>in</strong>d<strong>in</strong>g<br />

a flight. The second worker is assigned <strong>the</strong> role of<br />

<strong>the</strong> “User” and is given a task type to complete with<br />

<strong>the</strong> help of <strong>the</strong> Assistant (e.g., f<strong>in</strong>d flight). In order to<br />

add realistic depth to <strong>the</strong> conversations held by workers<br />

play<strong>in</strong>g a part, <strong>the</strong> system does not prime <strong>the</strong> User<br />

with <strong>the</strong> details of <strong>the</strong> task she needs to complete, but<br />

<strong>in</strong>stead it asks <strong>the</strong> User to imag<strong>in</strong>e such a sett<strong>in</strong>g. Both<br />

workers are <strong>in</strong>structed not to share any personally identifiable<br />

<strong>in</strong>formation about <strong>the</strong>mselves.<br />

S<strong>in</strong>ce <strong>the</strong> Assistant is hired first and routed to <strong>the</strong><br />

chat <strong>in</strong>terface, <strong>the</strong> User f<strong>in</strong>ds <strong>the</strong> Assistant wait<strong>in</strong>g<br />

when she is routed to <strong>the</strong> chat <strong>in</strong>terface. The User is<br />

<strong>in</strong>structed to start <strong>the</strong> dialog when she is at <strong>the</strong> chat<br />

<strong>in</strong>terface by tell<strong>in</strong>g <strong>the</strong> Assistant about <strong>the</strong> task. The<br />

User and Assistant both share <strong>in</strong>formation about <strong>the</strong><br />

task, discuss possible solutions, and revise <strong>the</strong> task.<br />

Workers are asked to complete <strong>the</strong> task as well as<br />

possible (ra<strong>the</strong>r than meet <strong>the</strong> m<strong>in</strong>imum requirements,<br />

or just accept any response). Once <strong>the</strong> User deems that<br />

<strong>the</strong> task is complete, she can signal <strong>the</strong> end of <strong>the</strong> <strong>in</strong>teraction<br />

by click<strong>in</strong>g “done”, which forwards her and<br />

<strong>the</strong> Assistant worker to a survey page ask<strong>in</strong>g <strong>the</strong>m each<br />

about <strong>the</strong> quality of <strong>the</strong> <strong>in</strong>teraction <strong>the</strong>y had. The survey<br />

also asks <strong>the</strong> Assistant workers about <strong>the</strong> resources<br />

(e.g., websites) <strong>the</strong>y used to complete <strong>the</strong> given task.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!