LOUD: A 1020-Node Microphone Array and Acoustic ... - CSAIL - MIT
LOUD: A 1020-Node Microphone Array and Acoustic ... - CSAIL - MIT LOUD: A 1020-Node Microphone Array and Acoustic ... - CSAIL - MIT
LOUD: A 1020-Node Microphone Array and Acoustic Beamformer* Eugene Weinstein 1 , Kenneth Steele 2 , Anant Agarwal 2,3 , James Glass 3 1 Courant Institute of Mathematical Sciences 2 Tilera Corporation 3 MIT Computer Science and Artificial Intelligence Lab * Based on work done at MIT CSAIL
- Page 2 and 3: Introduction • Recording sound in
- Page 4 and 5: Acoustic Beamforming • Selectivel
- Page 6 and 7: 1020-Microphone Array 6
- Page 8 and 9: Experiments • Setting: extremely
- Page 10 and 11: LOUD Demo 10
<strong>LOUD</strong>: A <strong>1020</strong>-<strong>Node</strong><br />
<strong>Microphone</strong> <strong>Array</strong> <strong>and</strong><br />
<strong>Acoustic</strong> Beamformer*<br />
Eugene Weinstein 1 , Kenneth Steele 2 , Anant Agarwal 2,3 ,<br />
James Glass 3<br />
1 Courant Institute of Mathematical Sciences<br />
2 Tilera Corporation<br />
3 <strong>MIT</strong> Computer Science <strong>and</strong> Artificial Intelligence Lab<br />
* Based on work done at <strong>MIT</strong> <strong>CSAIL</strong>
Introduction<br />
• Recording sound in high-noise settings is difficult<br />
• e.g., noisy lab or conference room<br />
• Can use close-talking microphones (e.g., lapel mic)<br />
• However, an untethered solution is more natural<br />
• Idea: use software-steerable microphone arrays<br />
• Isolate <strong>and</strong> amplify sound using beamforming<br />
• Target application: speech recognition<br />
2
Large <strong>Microphone</strong> <strong>Array</strong>s<br />
• Large acOUstic Data (<strong>LOUD</strong>) array: <strong>1020</strong> microphones<br />
• <strong>Microphone</strong> array gain increases linearly with the number<br />
of microphones<br />
• Past large-array speech recognition experiments scarce<br />
• Processing large quantities of data in real-time is a<br />
compelling application for novel computing architectures<br />
• <strong>LOUD</strong> generates 400 Mbits/sec<br />
• We use Raw, a 16-tile parallel architecture<br />
3
<strong>Acoustic</strong> Beamforming<br />
• Selectively amplify a sound source at a particular location<br />
• Take advantage of sound propagation through space<br />
• Use simple delay-<strong>and</strong>-sum beamforming<br />
t8<br />
…<br />
0 t8-t7<br />
t8-t1<br />
+<br />
t7<br />
…<br />
t1<br />
Sound<br />
Source<br />
<strong>Microphone</strong>s<br />
Delay
Two-microphone PCB<br />
• On-board A/D converter feeds into CPLD<br />
• Data streamed to CPU using time-division multiplexing<br />
5
<strong>1020</strong>-<strong>Microphone</strong> <strong>Array</strong><br />
6
<strong>Microphone</strong> Positions<br />
• Automated procedure to calibrate microphone positions<br />
• Play a test audio “chirp” through a speaker<br />
• Record with reference mic at speaker position <strong>and</strong> at<br />
each array mic<br />
• Peak of cross-correlation function between reference,<br />
array microphones gives propagation delay<br />
• Solve for precise array geometry<br />
7
Experiments<br />
• Setting: extremely noisy hardware lab<br />
• Subject <strong>and</strong> “interferer” talking at the same time<br />
• Goal: demonstrate that speech recognition accuracy<br />
improves with microphone array size<br />
• Speaker-independent recognizer for digit strings<br />
• Record 150 utterances with interferer, 110 without<br />
• Baseline: high quality close-talking mic, 80 utterances<br />
8
Recognition Accuracy<br />
• Word error rate<br />
(WER) decreases<br />
with array size<br />
• WER drops by 87%<br />
(w/ interferer), 91%<br />
(no interferer) from<br />
one to <strong>1020</strong> mics<br />
• Accuracy approaches<br />
close-talking<br />
microphone levels!<br />
Word Error Rate (%)<br />
100<br />
90<br />
80<br />
70<br />
60<br />
50<br />
40<br />
30<br />
20<br />
10<br />
0<br />
10 0<br />
5.1. Discussion 9<br />
10 1<br />
Number of <strong>Microphone</strong>s<br />
ICSV14 • 9–12 July 2007 • Cairns • Aus<br />
10 2<br />
Figure 3. Experimental word error rates.<br />
<strong>Array</strong> with interferer<br />
<strong>Array</strong> without interferer<br />
Close!talking microphone<br />
10 3
<strong>LOUD</strong> Demo<br />
10
Summary/Future Work<br />
• <strong>LOUD</strong> allows high-quality untethered recording in very<br />
noisy settings<br />
• Speech recognition experiments demonstrate benefit of<br />
large arrays<br />
• Future work:<br />
• Implement more sophisticated beamforming techniques<br />
• Automatic speaker tracking<br />
• Conduct more experiments with different geometries,<br />
noise settings<br />
11