Figure 8: [Signal-flow diagram of the real-time BSS: left- and right-channel inputs pass through FFTs; SIMO-ICA updates the separation filter W(f) over 3 s durations; real-time filtering and binary masking are applied per frequency bin; separated signals are reconstructed with the inverse FFT.]

Figure 9: [ICA]

References

[1] K. Nakadai, D. Matsuura, H. Okuno, and H. Kitano, "Applying scattering theory to robot audition system: robust sound source localization and extraction," Proc. IROS-2003, pp. 1147–1152, 2003.
[2] R. Nishimura, T. Uchida, A. Lee, H. Saruwatari, K. Shikano, and Y. Matsumoto, "ASKA: Receptionist robot with speech dialogue system," Proc. IROS-2002, pp. 1314–1317, 2002.
[3] R. Prasad, H. Saruwatari, and K. Shikano, "Robots that can hear, understand and talk," Advanced Robotics, vol. 18, pp. 533–564, 2004.
[4] P. Comon, "Independent component analysis, a new concept?," Signal Processing, vol. 36, pp. 287–314, 1994.
[5] N. Murata and S. Ikeda, "An on-line algorithm for blind source separation on speech signals," Proc. NOLTA98, vol. 3, pp. 923–926, 1998.
[6] P. Smaragdis, "Blind separation of convolved mixtures in the frequency domain," Neurocomputing, vol. 22, pp. 21–34, 1998.
[7] L. Parra and C. Spence, "Convolutive blind separation of non-stationary sources," IEEE Trans. Speech & Audio Processing, vol. 8, pp. 320–327, 2000.
[8] H. Saruwatari, S. Kurita, K. Takeda, F. Itakura, T. Nishikawa, and K. Shikano, "Blind source separation combining independent component analysis and beamforming," EURASIP Journal on Applied Signal Processing, vol. 2003, pp. 1135–1146, 2003.
[9] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech & Signal Process., vol. ASSP-27, no. 2, pp. 113–120, 1979.
[10] T. Takatani, T. Nishikawa, H. Saruwatari, and K. Shikano, "High-fidelity blind separation of acoustic signals using SIMO-model-based ICA with information-geometric learning," Proc. IWAENC2003, pp. 251–254, 2003.
[11] T. Takatani, T. Nishikawa, H. Saruwatari, and K. Shikano, "High-fidelity blind separation of acoustic signals using SIMO-model-based independent component analysis," IEICE Trans. Fundamentals, vol. E87-A, no. 8, pp. 2063–2072, 2004.
[12] R. Lyon, "A computational model of binaural localization and separation," Proc. ICASSP83, pp. 1148–1151, 1983.
[13] N. Roman, D. Wang, and G. Brown, "Speech segregation based on sound localization," Proc. IJCNN01, pp. 2861–2866, 2001.
[14] M. Aoki, M. Okamoto, S. Aoki, H. Matsui, T. Sakurai, and Y. Kaneda, "Sound source segregation based on estimating incident angle of each frequency component of input signals acquired by multiple microphones," Acoustical Science and Technology, vol. 22, no. 2, pp. 149–157, 2001.
[15] H. Sawada, R. Mukai, S. Araki, and S. Makino, "Polar coordinate based nonlinear function for frequency-domain blind source separation," IEICE Trans. Fundamentals, vol. E86-A, no. 3, pp. 590–596, 2003.
[16] H. Sawada, R. Mukai, S. Araki, and S. Makino, "A robust and precise method for solving the permutation problem of frequency-domain blind source separation," Proc. Int. Sympo. on ICA and BSS, pp. 505–510, 2003.
[17] M. Aoki and K. Furuya, "Using spatial information for speech enhancement," Technical Report of IEICE, vol. EA2002-11, pp. 23–30, 2002 (in Japanese).
社団法人 人工知能学会 Japanese Society for Artificial Intelligence
JSAI Technical Report SIG-Challenge-0522-5 (10/14)

Hands-Free Speech Recognition Using Spatial Subtraction Array with Adaptive Noise Estimation Processing under Real Environment

Chie Kiuchi, Tomoya Takatani, Hiroshi Saruwatari, Kiyohiro Shikano
Nara Institute of Science and Technology
chie-k@is.naist.jp

Abstract

We propose an improved spatial subtraction array (SSA) with adaptive noise estimation processing, which aims to achieve robust hands-free speech recognition in real environments. The previously proposed SSA can recognize target speech with high accuracy under a laboratory environment. However, the conventional SSA used an ideally designed null beamformer (NBF) for noise estimation, and consequently it cannot take into account the reverberation effect that arises in an actual environment. The proposed SSA introduces an adaptive beamformer (ABF) for accurate noise estimation, and thereby remarkably improves the noise subtraction performance even under real reverberant conditions. Speech recognition experiments reveal that the word accuracy of the proposed SSA is superior to that of the conventional SSA, as well as to the conventional delay-and-sum beamformer and adaptive beamformer.

1 Introduction

[The Japanese body text of this section was lost in extraction; the surviving fragments reference the SSA [1], the delay-and-sum (DS) beamformer [2], the Griffith-Jim (GJ) adaptive beamformer [2],[3], the Mel-Frequency Cepstrum Coefficient (MFCC) [4], the null beamformer (NBF) [5], the conventional NBF-based SSA, and the proposed ABF-based SSA.]
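The subtraction stage the abstract describes (a primary beamformer output minus a noise spectrum estimated by a secondary beamformer, NBF in the conventional SSA and ABF in the proposed one) can be illustrated with a Boll-style magnitude spectral subtraction sketch. This is a minimal, hedged illustration, not the authors' implementation: the function name, the flooring parameter `beta`, and the single-frame interface are assumptions for the example.

```python
import numpy as np

def spectral_subtraction(primary_spec, noise_mag, beta=0.01):
    """Boll-style magnitude spectral subtraction (illustrative sketch).

    primary_spec : complex STFT frame of the target-enhancing
                   (delay-and-sum) beamformer output
    noise_mag    : magnitude spectrum estimated by the noise-path
                   beamformer (NBF or ABF)
    beta         : spectral floor to limit over-subtraction artifacts
    """
    mag = np.abs(primary_spec)
    phase = np.angle(primary_spec)
    # Subtract the estimated noise magnitude; floor negative results
    # at a small fraction of the original magnitude.
    enhanced_mag = np.maximum(mag - noise_mag, beta * mag)
    # Recombine with the original phase.
    return enhanced_mag * np.exp(1j * phase)
```

In an SSA-like pipeline this would run per frequency bin and frame; a more accurate noise estimate (the point of replacing NBF with ABF under reverberation) directly reduces the residual after subtraction.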