Japanese Society for Artificial Intelligence (社団法人 人工知能学会)
JSAI Technical Report SIG-Challenge-0522-12 (10/15)

Using prosodic and voice quality features for paralinguistic information extraction in dialog speech

Carlos Toshinori ISHI, Hiroshi ISHIGURO, Norihiro HAGITA (ATR Intelligent Robotics and Communication Laboratories)
carlos@atr.jp, ishiguro@ams.eng.osaka-u.ac.jp, hagita@atr.jp

Abstract - The use of voice quality features in addition to classical prosodic features is proposed for the automatic extraction of paralinguistic information (such as speech acts, attitudes, and emotions) in dialog speech. Perceptual experiments and acoustic analysis are conducted on monosyllabic utterances spoken in several speaking styles, carrying a variety of paralinguistic information. Acoustic parameters related to prosodic and voice quality features that potentially represent the variations in speaking styles are evaluated. Experimental results indicate that prosodic features are effective for identifying some groups of speech acts with specific functions, while voice quality features are useful for identifying utterances with emotional or attitudinal expressivity.

[Sections 1 and 2 are garbled in this extraction; the legible fragments discuss phonation styles [6] (modal, breathy, whispery, vocal fry/creaky, harsh/ventricular, pressed [7,8,9,10]), F0-based prosodic features, and expressive (non-modal) speech [1,2,3,4,5].]

Fig. 1 Block diagram of the proposed framework for paralinguistic information extraction: frame- or segment-level detectors for F0 contour and duration pattern [7], vocal fry [8], aperiodicity [9], and aspiration noise [10] feed utterance-level rates into the extraction of paralinguistic information (speech act, attitude, emotion).