LOUD: A 1020-Node Microphone Array and Acoustic ... - CSAIL - MIT

LOUD: A 1020-Node Microphone Array and Acoustic ... - CSAIL - MIT LOUD: A 1020-Node Microphone Array and Acoustic ... - CSAIL - MIT

groups.csail.mit.edu
from groups.csail.mit.edu More from this publisher
27.03.2013 Views

LOUD: A 1020-Node Microphone Array and Acoustic Beamformer* Eugene Weinstein 1 , Kenneth Steele 2 , Anant Agarwal 2,3 , James Glass 3 1 Courant Institute of Mathematical Sciences 2 Tilera Corporation 3 MIT Computer Science and Artificial Intelligence Lab * Based on work done at MIT CSAIL

<strong>LOUD</strong>: A <strong>1020</strong>-<strong>Node</strong><br />

<strong>Microphone</strong> <strong>Array</strong> <strong>and</strong><br />

<strong>Acoustic</strong> Beamformer*<br />

Eugene Weinstein 1 , Kenneth Steele 2 , Anant Agarwal 2,3 ,<br />

James Glass 3<br />

1 Courant Institute of Mathematical Sciences<br />

2 Tilera Corporation<br />

3 <strong>MIT</strong> Computer Science <strong>and</strong> Artificial Intelligence Lab<br />

* Based on work done at <strong>MIT</strong> <strong>CSAIL</strong>


Introduction<br />

• Recording sound in high-noise settings is difficult<br />

• e.g., noisy lab or conference room<br />

• Can use close-talking microphones (e.g., lapel mic)<br />

• However, an untethered solution is more natural<br />

• Idea: use software-steerable microphone arrays<br />

• Isolate <strong>and</strong> amplify sound using beamforming<br />

• Target application: speech recognition<br />

2


Large <strong>Microphone</strong> <strong>Array</strong>s<br />

• Large acOUstic Data (<strong>LOUD</strong>) array: <strong>1020</strong> microphones<br />

• <strong>Microphone</strong> array gain increases linearly with the number<br />

of microphones<br />

• Past large-array speech recognition experiments scarce<br />

• Processing large quantities of data in real-time is a<br />

compelling application for novel computing architectures<br />

• <strong>LOUD</strong> generates 400 Mbits/sec<br />

• We use Raw, a 16-tile parallel architecture<br />

3


<strong>Acoustic</strong> Beamforming<br />

• Selectively amplify a sound source at a particular location<br />

• Take advantage of sound propagation through space<br />

• Use simple delay-<strong>and</strong>-sum beamforming<br />

t8<br />

…<br />

0 t8-t7<br />

t8-t1<br />

+<br />

t7<br />

…<br />

t1<br />

Sound<br />

Source<br />

<strong>Microphone</strong>s<br />

Delay


Two-microphone PCB<br />

• On-board A/D converter feeds into CPLD<br />

• Data streamed to CPU using time-division multiplexing<br />

5


<strong>1020</strong>-<strong>Microphone</strong> <strong>Array</strong><br />

6


<strong>Microphone</strong> Positions<br />

• Automated procedure to calibrate microphone positions<br />

• Play a test audio “chirp” through a speaker<br />

• Record with reference mic at speaker position <strong>and</strong> at<br />

each array mic<br />

• Peak of cross-correlation function between reference,<br />

array microphones gives propagation delay<br />

• Solve for precise array geometry<br />

7


Experiments<br />

• Setting: extremely noisy hardware lab<br />

• Subject <strong>and</strong> “interferer” talking at the same time<br />

• Goal: demonstrate that speech recognition accuracy<br />

improves with microphone array size<br />

• Speaker-independent recognizer for digit strings<br />

• Record 150 utterances with interferer, 110 without<br />

• Baseline: high quality close-talking mic, 80 utterances<br />

8


Recognition Accuracy<br />

• Word error rate<br />

(WER) decreases<br />

with array size<br />

• WER drops by 87%<br />

(w/ interferer), 91%<br />

(no interferer) from<br />

one to <strong>1020</strong> mics<br />

• Accuracy approaches<br />

close-talking<br />

microphone levels!<br />

Word Error Rate (%)<br />

100<br />

90<br />

80<br />

70<br />

60<br />

50<br />

40<br />

30<br />

20<br />

10<br />

0<br />

10 0<br />

5.1. Discussion 9<br />

10 1<br />

Number of <strong>Microphone</strong>s<br />

ICSV14 • 9–12 July 2007 • Cairns • Aus<br />

10 2<br />

Figure 3. Experimental word error rates.<br />

<strong>Array</strong> with interferer<br />

<strong>Array</strong> without interferer<br />

Close!talking microphone<br />

10 3


<strong>LOUD</strong> Demo<br />

10


Summary/Future Work<br />

• <strong>LOUD</strong> allows high-quality untethered recording in very<br />

noisy settings<br />

• Speech recognition experiments demonstrate benefit of<br />

large arrays<br />

• Future work:<br />

• Implement more sophisticated beamforming techniques<br />

• Automatic speaker tracking<br />

• Conduct more experiments with different geometries,<br />

noise settings<br />

11

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!