Frequency (kHz)2102102102101) Mixture20 40 60 802) Only ICA20 40 60 803) ICA and T−F masking20 40 60 804) Target source component20 40 60 80Time frameFigure 11: : 1)2) ICA 3) ICA (T-F masking)4) Table 2: 9 3 : SIR (dB) a120 b120 c170 SIR −3.9 −3.6 −5.9ICA 12.5 13.6 14.5ICA 15.1 16.5 17.65 BSS 2permutation ICA BSS permutation [16][1] B. D. Van Veen and K. M. Buckley, “Beamforming: a versatileapproach to spatial filtering,” IEEE ASSP Magazine, vol. 5, pp.4–24, Apr. 1988.[2] S. Haykin, Ed., Unsupervised Adaptive Filtering (Volume I:Blind Source Separation). John Wiley & Sons, 2000.[3] A. Hyvärinen, J. Karhunen, and E. Oja, Independent ComponentAnalysis. John Wiley & Sons, 2001.[4] K. Matsuoka and S. Nakashima, “Minimal distortion principlefor blind source separation,” in Proc. ICA 2001, Dec. 2001, pp.722–727.[5] S. C. Douglas and X. Sun, “Convolutive blind separation ofspeech mixtures using the natural gradient,” Speech Communication,vol. 39, pp. 65–78, 2003.[6] S. C. Douglas, H. Sawada, and S. Makino, “A spatio-temporalFastICA algorithm for separating convolutive mixtures,” inProc. of 2005 IEEE International Conference on Acoustics,Speech, and Signal Processing (ICASSP 2005), vol. V, Mar.2005, pp. 165–168.[7] P. Smaragdis, “Blind separation of convolved mixtures inthe frequency domain,” Neurocomputing, vol. 22, pp. 21–34,1998.[8] N. Murata, S. Ikeda, and A. Ziehe, “An approach to blindsource separation based on temporal structure of speech signals,”Neurocomputing, vol. 41, no. 1-4, pp. 1–24, Oct. 2001.[9] F. Asano, S. Ikeda, M. Ogawa, H. Asoh, and N. Kitawaki,“Combined approach of array processing and independentcomponent analysis for blind separation of acoustic signals,”IEEE Trans. Speech Audio Processing, vol. 11, no. 3, pp. 204–215, May 2003.[10] H. Saruwatari, S. Kurita, K. Takeda, F. Itakura, T. Nishikawa,and K. Shikano, “Blind source separation combining independentcomponent analysis and beamforming,,” EURASIP Journalon Applied Signal Processing, vol. 2003, no. 11, pp. 1135–1146, Nov. 2003.[11] H. Sawada, R. Mukai, S. Araki, and S. Makino, “Polar coordinatebased nonlinear function for frequency domain blindsource separation,” IEICE Trans. Fundamentals, vol. E86-A,no. 3, pp. 590–596, Mar. 2003.[12] ——, “A robust and precise method for solving the permutationproblem of frequency-domain blind source separation,”IEEE Trans. Speech Audio Processing, vol. 12, no. 5, pp. 530–538, Sept. 2004.[13] R. Mukai, H. Sawada, S. Araki, and S. Makino, “Frequencydomain blind source separation for many speech signals,”in Proc. of the 5th International Conference on IndependentComponent Analysis and Blind Signal Separation (ICA 2004 /LNCS 3195). Springer-Verlag, Sept. 2004, pp. 461–469.[14] H. Sawada, R. Mukai, S. Araki, and S. Makino, “Frequencydomainblind source separation,” in Speech Enhancement,J. Benesty, S. Makino, and J. Chen, Eds. Springer, Mar. 2005,pp. 299–327.[15] H. Sawada, S. Araki, R. Mukai, and S. Makino, “Blind extractionof a dominant source from mixtures of many sources usingICA and time-frequency masking,” in Proc. of 2005 IEEE InternationalSymposium on Circuits and Systems (ISCAS 2005),May 2005, pp. 5882–5885.[16] S. Araki, H. Sawada, R. Mukai, and S. Makino, “A novel blindsource separation method with observation vector clustering,”in Proc. 2005 International Workshop on Acoustic Echo andNoise Control (IWAENC 2005), Sept. 2005.22
社 団 法 人 人 工 知 能 学 会Japanese Society forArtificial Intelligence人 工 知 能 学 会JSAI Technical ReportSIG-CHallege-0522-4 (10/14)SIMO-ICA 2 Two-Stage Real-Time Blind Source Separation Combining SIMO-ICA and Binary Mask Processing † † † † ‡ ‡Y. Mori † , T. Takatani † , H. Saruwatari † , K. Shikano † ,T.Hiekata ‡ ,T.Morita ‡† Nara Institute of Science and Technology‡() Kobe Steel,Ltd.E-mail: †{yoshim-m, tomoya-t, sawatari, shikano}@is.naist.jp,‡{t-hiekata, takashi-morita}@kobelco.jpAbstractWe newly propose a real-time two-stage blindsource separation (BSS) for binaural mixed signalsobserved at the ears of humanoid robot, inwhich a Single-Input Multiple-Output (SIMO)-model-based independent component analysis(ICA) and binary mask processing are combined.SIMO-model-based ICA can separatethe mixed signals, not into monaural source signalsbut into SIMO-model-based signals fromindependent sources as they are at the microphones.Thus, the separated signals ofSIMO-model-based ICA can maintain the spatialqualities of each sound source, and thisyields that binary mask processing can be appliedto efficiently remove the residual interferencecomponents after SIMO-model-basedICA. The experimental results obtained witha human-like head reveal that the separationperformance can be considerably improved byusing the proposed method in comparison tothe conventional ICA-based and binary-maskbasedBSS methods.1 (BSS) (DOA) BSS BSS [1] [2, 3] (ICA) [4] BSS [5,6,7,8] BSS ICA ICA [9] ICA () 2 BSS BSS 2 Single-Input Multiple-Output (SIMO) ICA (SIMO-ICA) [10, 11] SIMO-ICA SIMO – [12, 13, 14] “SIMO” SIMO-ICA SIMO SIMO-ICA SIMO-ICA BSS23
- Page 4: SCOT(Smoothed Coherence Transform)P
- Page 8 and 9: Particle (a)(b)φ12(τ )[14]x ( t )
- Page 10 and 11: - 8 -
- Page 12 and 13: 1 () 2 SIMO-ICA 3 SIMO-ICA tele
- Page 14: ICAy FCy FCy SIMO-ICAs 1(t)x 1(t)1(
- Page 17 and 18: [15] Y. Mori, H. Saruwatari, T. Tak
- Page 19 and 20: 社 団 法 人 人 工 知 能 学
- Page 21 and 22: • 音 源 位 置マイク配 置
- Page 23: Table 1: 6 : SIR (dB)SIR 1 SIR 2 S
- Page 27 and 28: SIMO-ICA SIMO Figure 2(a)SIMO-ICA
- Page 29 and 30: Binary maskConventional ICAConventi
- Page 31 and 32: 社 団 法 人 人 工 知 能 学
- Page 33 and 34: k lo (l), k c (l), k hi (l) l k c
- Page 35 and 36: 5.75 m4.33 mNoise1.15 mUser 40°2.1
- Page 38 and 39: おける 方 法 論 に 関 し
- Page 40 and 41: Fig.6 は 幼 児 の ABR (Auditory
- Page 42 and 43: ンターフェースはスパイ
- Page 44 and 45: マイクロホン[ 正 面 ][ 左
- Page 46 and 47: s(k)Crosstalkn(k)R S(k)X P(k)X R(k)
- Page 48 and 49: する 隠 れマルコフモデル
- Page 50 and 51: 123ÙÖ ½ ¾º¾ ´º ½µ ´º
- Page 52 and 53: ÌÐ ½ ¿º¾ ÅÎÆÇÂ
- Page 54 and 55: ÁÒØÖÒØÓÒÐ ÓÒÖÒ ÓÒ Á
- Page 56 and 57: 例 えば、 同 一 時 間 差
- Page 58 and 59: いて、θの 絶 対 値 が 大
- Page 60 and 61: Fig.11 にこのシステムの 処
- Page 62 and 63: 5 , 2 EMIEWFig.1 EMIEW EMIEW 6 ,
- Page 64: 0 P th , (14) 4.4 3 4 4 1 , 3
- Page 67 and 68: 社 団 法 人 人 工 知 能 学
- Page 69 and 70: 3.1. 3.2. Fig. 3. The
- Page 71 and 72: 4.1. Fig. 5. The time co
- Page 73 and 74: 社 団 法 人 人 工 知 能 学
- Page 75 and 76:
modal (m, ), whispery (w, ), aspir
- Page 77 and 78:
Aperiodicity rate (APR)TLR (Time-La
- Page 79 and 80:
社 団 法 人 人 工 知 能 学
- Page 81 and 82:
, À, WDS-BF Ñ À℄·
- Page 83 and 84:
Table 1: Localization Error of A Si
- Page 85 and 86:
社 団 法 人 人 工 知 能 学
- Page 87 and 88:
を 行 い, 閾 値 処 理 を
- Page 89 and 90:
4. 音 声 対 話 制 御 実 験H
- Page 91 and 92:
社 団 法 人 人 工 知 能 学
- Page 93 and 94:
3 HLDAMLLR [3] (Useful Information
- Page 95 and 96:
Class 10degClass 20degClass 10degCl
- Page 97 and 98:
社 団 法 人 人 工 知 能 学
- Page 99 and 100:
赤 い 長 方 形 内 ). 以 下
- Page 101 and 102:
5.2 音 場 計 測 結 果(dB SPL)
- Page 103 and 104:
社 団 法 人 人 工 知 能 学
- Page 105 and 106:
a) 90 b) 90 MFMc) d) MFMe) 9
- Page 107 and 108:
(3) MFT Julius 7.1 Figure 4: SIG2