Conversations in the Crowd - Microsoft Research


Discussion and Future Work

Our initial results suggest that ChatCollect can provide an effective tool for collecting task-oriented dialog data. We limited the size of the data collected for this exploratory work to 16 conversations in order to more easily review each conversation manually, but ChatCollect can be run continuously to collect larger datasets. In future work, it will be interesting to explore how the dialog data collected with our system compares to prior datasets generated by bringing participants into a lab setting, or collected through a deployed system.

While the completed conversations were all successful, fewer than half of the chat sessions that workers were routed to were completed. The main reason was that a large delay would sometimes occur between the arrivals of the two workers. Accordingly, none of the incomplete conversations had more than one round of dialog (each party spoke at most once upon arriving, but the first left before the second arrived). One quick way to address this issue is to follow marketplace-specific strategies for attracting workers to our tasks more quickly, such as reposting tasks to increase their visibility. As a more permanent fix, future versions of ChatCollect will detect when one party disconnects and route the other to an active task (with appropriate compensation for any work done so far). This also helps when a conversation is partially complete and one worker leaves, as was observed in preliminary testing. Pre-recruiting approaches used in prior work can also help ensure worker availability (Bigham et al. 2010; Bernstein et al. 2011).

Our ultimate goal is to enable the automatic training and scaling of dialog systems to new domains. Future versions of the system will investigate collecting speech from workers with microphones in order to train spoken language understanding components.
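The pairing and rerouting behavior described above can be sketched as a small matchmaking queue. This is a minimal illustration, not ChatCollect's actual implementation: the names (`PairingQueue`, `Session`, `arrive`) and the fixed wait timeout are our own assumptions. An arriving worker is paired with a waiting one if possible; a waiter whose partner never shows up within the timeout is expired and can be rerouted to another active task.

```python
import time
import uuid
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Session:
    """A chat session between (up to) two paired workers."""
    session_id: str
    workers: list = field(default_factory=list)

class PairingQueue:
    """Pairs arriving workers into chat sessions. A worker who waits
    longer than `wait_timeout` seconds without a partner is expired,
    so the caller can reroute them to another active task."""

    def __init__(self, wait_timeout=120.0):
        self.wait_timeout = wait_timeout
        self.waiting = deque()   # entries: (worker_id, arrival_time, session)
        self.active = {}         # session_id -> Session with both workers

    def arrive(self, worker_id, now=None):
        """Handle a worker arrival; returns (session, expired_waiters)."""
        now = time.time() if now is None else now
        # Drop waiters whose partner never showed up; the caller
        # reroutes these to active tasks (with compensation).
        expired = []
        while self.waiting and now - self.waiting[0][1] > self.wait_timeout:
            expired.append(self.waiting.popleft())
        # Prefer pairing with a worker who is already waiting.
        if self.waiting:
            partner_id, _, session = self.waiting.popleft()
            session.workers.append(worker_id)
            self.active[session.session_id] = session
            return session, expired
        # Otherwise open a new session and wait for a partner.
        session = Session(session_id=uuid.uuid4().hex, workers=[worker_id])
        self.waiting.append((worker_id, now, session))
        return session, expired
```

A second arrival within the timeout joins the waiter's session; a late arrival instead expires the stale waiter and starts a fresh session, which models why sessions with long inter-arrival delays ended with at most one round of dialog.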
We believe such a pipeline can reduce the cost of developing dialog systems that are able to generalize easily to new domains.

Conclusion

Crowdsourcing methods such as the ones presented here offer new opportunities for developing dialog systems that can continuously learn on demand at low cost. In this paper, we introduced ChatCollect, a system for collecting task-oriented dialog data using pairs of crowd workers. We presented results from an initial set of 16 conversations containing a total of 4296 turns, which we analyzed manually, finding that the conversations are appropriately varied and on-topic. We also discussed ongoing work on CrowdParse, a system that uses the crowd to parse semantic frames from dialogs. We believe this work takes first steps toward a future in which dialog systems can be easily trained in new domains using crowd-generated datasets.

Acknowledgements

The authors would like to thank Eric Horvitz for many insightful discussions about this work.

References

Aleven, V.; Sewall, J.; McLaren, B. M.; and Koedinger, K. R. 2006. Rapid authoring of intelligent tutors for real-world and experimental use. In ICALT 2006, 847–851.

Allen, J. F.; Miller, B. W.; Ringger, E. K.; and Sikorski, T. 1996. A robust system for natural spoken dialogue. In Proceedings of ACL 1996, 62–70.

Bernstein, M. S.; Brandt, J. R.; Miller, R. C.; and Karger, D. R. 2011. Crowds in two seconds: Enabling realtime crowd-powered interfaces. In Proceedings of UIST 2011, 33–42.

Bessho, F.; Harada, T.; and Kuniyoshi, Y. 2012. Dialog system using real-time crowdsourcing and Twitter large-scale corpus. In Proceedings of SIGDIAL 2012, 227–231.

Bigham, J. P.; Jayant, C.; Ji, H.; Little, G.; Miller, A.; Miller, R. C.; Miller, R.; Tatarowicz, A.; White, B.; White, S.; and Yeh, T. 2010. VizWiz: Nearly real-time answers to visual questions. In Proceedings of UIST 2010, 333–342.

Burrows, S.; Potthast, M.; and Stein, B. 2013. Paraphrase acquisition via crowdsourcing and machine learning. ACM Trans. Intell. Syst. Technol. 4(3):43:1–43:21.

Chen, X.; Bennett, P. N.; Collins-Thompson, K.; and Horvitz, E. 2013. Pairwise ranking aggregation in a crowdsourced setting. In Proceedings of WSDM 2013, 193–202.

Gandhe, S.; DeVault, D.; Roque, A.; Martinovski, B.; Artstein, R.; A. Leuski, J. G.; and Traum, D. 2009. From domain specification to virtual humans: An integrated approach to authoring tactical questioning characters. In Proceedings of Interspeech 2008.

Gorin, A. L.; Riccardi, G.; and Wright, J. H. 1997. How may I help you? Speech Commun. 23(1–2):113–127.

Lane, I.; Waibel, A.; Eck, M.; and Rottmann, K. 2010. Tools for collecting speech corpora via Mechanical Turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, CSLDAMT '10, 184–187.

Lasecki, W.; Murray, K.; White, S.; Miller, R. C.; and Bigham, J. P. 2011. Real-time crowd control of existing interfaces. In Proceedings of UIST 2011, 23–32.

Lasecki, W.; Miller, C.; Sadilek, A.; AbuMoussa, A.; and Bigham, J. 2012. Real-time captioning by groups of non-experts. In Proceedings of UIST 2012.

Lasecki, W. S.; Thiha, P.; Zhong, Y.; Brady, E.; and Bigham, J. P. 2013a. Answering visual questions with conversational crowd assistants. In Proceedings of ASSETS 2013, to appear.

Lasecki, W.; Wesley, R.; Nichols, J.; Kulkarni, A.; Allen, J.; and Bigham, J. 2013b. Chorus: A crowd-powered conversational assistant. In Proceedings of UIST 2013, to appear.

Lathrop, B., et al. 2004. A Wizard-of-Oz framework for collecting spoken human-computer dialogs: An experiment procedure for the design and testing of natural language in-vehicle technology systems. In Proceedings of ITS 2004.

Liu, J.; Pasupat, P.; Cyphers, S.; and Glass, J. 2013. Asgard: A portable architecture for multilingual dialogue systems. In Proceedings of ICASSP 2013.

Wang, W. Y.; Bohus, D.; Kamar, E.; and Horvitz, E. 2012. Crowdsourcing the acquisition of natural language corpora: Methods and observations. In SLT 2012. IEEE.

Ward, W., and Pellom, B. 1999. The CU Communicator system. In Proceedings of IEEE ASRU.

Zaidan, O. F., and Callison-Burch, C. 2011. Crowdsourcing translation: Professional quality from non-professionals. In Proceedings of HLT 2011, 1220–1229.

