22nd Meeting, Special Issue on Robot Audition - Okuno Laboratory - Kyoto University

The interface is fixed to spike events, and the information that is needed, such as phase, intensity, and frequency, is carried in their timing, so the receiving side only has to select and use what it requires. Because each individual operation is simple, preparing many neurons and relating them to one another, although redundant, makes processing possible that is complementary in both time and space.

This model can also be used in a learning, developmental form. To make it developmental, one can follow Section 2.2.1 and related material: start from low-frequency-selective, phase-locked auditory nerve fibers, learn the connections between the medial superior olive and the inferior colliculus, form a rough direction map in the inferior colliculus, and then introduce the lateral olive and the lateral lemniscus so that learning proceeds with the map constrained from above and below. The activity of the lateral lemniscus, which forms the mask in this model, involves sustained firing for about 20 ms in response to a momentary input, but this property only needs to be built into the cell characteristics. For quantities such as the interaural intensity difference (IID), the computation differs in form from an algorithmic description, although the content is not very different. What matters most is that the entire model in Fig. 7 can be run on a single computational principle.

However, as noted in Section 3.1, although the principle is suited to real-time operation, the processing does not yet run in real time. This is a major issue, and further accumulation of technology is needed before the approach can be applied to robots.

4. Conclusion

We have considered robot audition from the standpoint of brain-like information processing. This paper discussed only processing with spiking neurons, but as stated at the beginning of Section 3, brain-like information processing consists of three elements (system, circuit, and device), and the synergy among them is large. It is important to pursue research on all three together. In view of the dynamic characteristics and learning ability it demands of information processing, robot audition is an ideal problem for research on brain-like information processing. We look forward to the mutual development of the two research fields.
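Japanese Society for Artificial Intelligence (JSAI)
JSAI Technical Report SIG-Challenge-0522-7 (10/14)

Near-Field Sound-Source Localization and Adaptive Noise Cancellation in a Personal Robot, PaPeRo

Miki Sato* (NEC Media and Information Research Laboratories), Akihiko Sugiyama (NEC Media and Information Research Laboratories), Shin'ichi Ohnaka (NEC Media and Information Research Laboratories)
m-sato@dh.jp.nec.com, aks@ak.jp.nec.com, s-ohnaka@cp.jp.nec.com

Abstract: This paper presents the implementation and evaluation of a speech interface for a personal robot, PaPeRo, based on sound-source localization and noise cancellation. Sound-source localization incorporates a new formula that takes near-field conditions into account to offset errors caused by the altitude of the speech source relative to the microphones. In noise cancellation, a novel stepsize control that assumes a wide range of signal-to-noise ratios in the input signal helps achieve both small residual noise and small distortion in the noise-cancelled signal. Evaluation results with signals recorded in a real environment demonstrate 40% higher source-localization performance and as much as 65% higher speech recognition rates in noisy environments.

1. Introduction

In recent years, partner-type robots intended to live alongside humans have been studied actively [1]. These robots are usually controlled from a distance by voice commands. Directional microphones are widely used to reduce the influence of background noise and interfering signals so that voice commands are recognized accurately. It is therefore important to estimate the direction from which the speech arrives and to align the directivity of the microphone with the estimated direction. Unlike communication applications such as teleconferencing, in human-robot dialogue the talker's mouth (the sound source) and the microphones cannot be regarded as lying in the same plane. Nevertheless, talker-direction estimation for robots has implicitly assumed that the source and the microphones are coplanar. The effect of this assumption on the estimated direction grows as the distance between the human and the robot decreases. Direction estimation that assumes a near field therefore becomes important.

Noise and interfering signals that cannot be suppressed by microphone directivity alone are reduced by speech enhancement. Suppression of noise and interference using one or many microphones is widely practiced, according to requirements that differ from application to application [2]. For human-robot dialogue, an adaptive noise canceller with two microphones is a good compromise in terms of the number of microphones, noise-removal performance, and distortion. An adaptive noise canceller cancels noise using two microphones, one for speech and one for noise. For use as a preprocessor for coding and speech recognition, a noise canceller has been proposed [3] that achieves both high noise-cancellation performance and small speech distortion by controlling the coefficient-update step size according to the signal-to-noise ratio (SNR). That noise canceller assumes that the speech microphone is at the talker's mouth, as in a headset, so it cannot be applied to a robot that is spoken to from various distances: the SNR varies over a wide range with the distance between the speech microphone and the mouth.

This paper introduces talker-direction estimation that assumes a near field and a noise canceller that can handle a wide range of SNRs, both implemented in PaPeRo [4], an autonomous mobile personal robot with a spoken-dialogue function. Section 2 describes the configuration of PaPeRo and its speech interface. Section 3 covers near-field talker-direction estimation, and Section 4 covers the noise canceller. Section 5 presents evaluation results to clarify the performance, and Section 6 discusses future work.

2. Personal Robot PaPeRo

2.1. Hardware

Fig. 1 shows the external appearance of the personal robot PaPeRo. PaPeRo is an autonomous mobile robot 385 mm high, 248 mm wide, and 245 mm deep, weighing 5.0 kg. It carries omnidirectional microphones: four on the front of the torso, one on each of the left and right sides, and one on the back.

Figure 1: External view of PaPeRo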
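The two-microphone adaptive noise canceller outlined in the introduction can be illustrated with a minimal sketch. It is not the step-size control of [3] or the one proposed in this paper, whose details are not given here. It assumes an NLMS filter, a crude SNR estimate (smoothed output power over smoothed noise-replica power), and a hypothetical rule that shrinks the step size as the estimated SNR rises. The function names, parameter values, and synthetic test signals are all assumptions made for illustration.

```python
import math
import random


def nlms_noise_canceller(primary, reference, taps=8,
                         mu_min=0.02, mu_max=0.2, smooth=0.99, eps=1e-8):
    """NLMS noise canceller with an SNR-dependent step size (illustrative only).

    primary   : speech-microphone samples (speech plus leaked noise)
    reference : noise-microphone samples (assumed to contain noise only)
    Returns the noise-cancelled output as a list of floats.
    """
    w = [0.0] * taps      # adaptive filter coefficients
    buf = [0.0] * taps    # recent reference samples, newest first
    p_out = eps           # smoothed power of the output (speech estimate)
    p_noise = eps         # smoothed power of the noise replica
    out = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # noise replica
        e = d - y                                   # noise-cancelled output
        out.append(e)
        # Assumed SNR estimate: output power over noise-replica power.
        p_out = smooth * p_out + (1.0 - smooth) * e * e
        p_noise = smooth * p_noise + (1.0 - smooth) * y * y
        snr = p_out / (p_noise + eps)
        # Assumed rule: high SNR gives a small step (less speech distortion),
        # low SNR gives a large step (faster noise tracking).
        mu = mu_min + (mu_max - mu_min) / (1.0 + snr)
        norm = sum(b * b for b in buf) + eps
        gain = mu * e / norm
        w = [wi + gain * bi for wi, bi in zip(w, buf)]
    return out


def make_signals(n, seed=0):
    """Synthetic test data: a sinusoid as 'speech' and white noise that
    reaches the speech microphone through a short, made-up FIR path."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    path = [0.6, -0.3, 0.1]  # hypothetical noise path to the speech microphone
    speech = [math.sin(2.0 * math.pi * 0.01 * i) for i in range(n)]
    primary = []
    for i in range(n):
        leaked = sum(h * noise[i - k] for k, h in enumerate(path) if i - k >= 0)
        primary.append(speech[i] + leaked)
    return speech, primary, noise


speech, primary, reference = make_signals(4000, seed=1)
enhanced = nlms_noise_canceller(primary, reference)
```

Comparing `enhanced` with `speech` over the later samples shows how much of the leaked noise remains. Because the step size never drops below `mu_min`, the filter can still start adapting when the initial SNR estimate is high, which is one simple way to approach the wide SNR range the text describes.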