Tuesday afternoon, 11 November - The Acoustical Society of America


aries on a transcript. Intertranscriber agreement rates across subsets of 17–40 subjects are significantly above chance based on Fleiss’ kappa statistic, indicating that listeners’ perception of prosody is reliable, with higher agreement rates for boundary perception than for prominence. Prosody perception varies across listeners (both corpora) and across speakers (WMD), where perceived prosody varies for the same utterance produced by different speakers. Acoustic measures from stressed vowels (Buckeye: duration, intensity, F1, F2) and articulatory kinematic measures (WMD) are correlated with the perceived prosodic features of the word. [Work supported by NSF.]

2pSC15. Perception of contrastive meaning through the L+H* LH% contour. Heeyeon Y. Dennison, Amy J. Schafer, and Victoria B. Anderson (Dept. of Linguist., Univ. of Hawaii, 1890 East-West Rd., Honolulu, HI 96822, linguist@hawaii.edu)
This study establishes empirical evidence regarding listeners’ perceptions of the contrastive tune (L+H* LH%; e.g., Lee et al. 2007). Eighteen native English speakers heard three types of test sentences: (1) contrastive, “The mailbox was[L+H*] full[LH%]”; (2) positive neutral, “The mailbox[H*] was full[H* LL%]”; and (3) negated neutral, “The mailbox[H*] was not[H*] full[H* LL%].” The participants first scored them by naturalness, and then typed continuation sentences based on the perceived meaning. Three other native English speakers independently coded the continuations to evaluate participants’ interpretations of the test sentences. The results clearly demonstrated that the L+H* LH% tune generated contrastive meanings (e.g., “…but the mailman took the mail and now it is empty”) significantly more often than both the positive and negative neutral counterparts. Moreover, sentences presented in the contrastive tune were perceived as natural utterances. High coder agreement indicated a reliable function of the contrastive tune, conforming to the existing literature based on intuitive examples (e.g., Lee 1999).
Interestingly, however, the contrastive tune produced the expected contrastive meaning in only about 60% of trials, versus less than 10% contrastive continuations for the other contours. This finding shows that the interpretation of the L+H* LH% contour is more complex than previously suggested.

2pSC16. Order of presentation asymmetry in intonational contour discrimination in English. Hyekyung Hwang (Dept. of Linguist., McGill Univ., 1085 Dr. Penfield Ave., Montreal PQ H3A 1A7, Canada, hye.hwang@mail.mcgill.ca), Amy J. Schafer, and Victoria B. Anderson (Univ. of Hawaii, Honolulu, HI 96822)
In the work of Hwang et al. (2007), native English speakers showed overall poor accuracy in distinguishing initially rising versus level (e.g., L* L* H- H* L-L% vs L* L* L- H* L-L%) or initially falling versus level (e.g., H* H* L- H* L-L% vs H* H* H- H* L-L%) contour contrasts on English phrases in an AX discrimination task. Results not reported in that paper showed that discrimination was easier when the more complex F0 contour occurred second than when it occurred first. Several order-of-presentation effects in the perception of intonation have been reported (e.g., L. Morton 1997; S. Lintfert 2003; Cummins et al. 2006), but no satisfying account has been provided. This study investigated these asymmetries more systematically. The order effect was significant for falling-level contrast pairs: pairs with the more complex F0 contour last were discriminated more easily than pairs in the reverse order. Rising versus level contrasts showed a similar tendency. The results thus extend intonational discrimination asymmetries to these additional contours. They suggest that the cause of the asymmetries may depend more on F0 complexity than on F0 peak.

2pSC17. Alternatives to f0 turning points in American English intonation. Jonathan Barnes (Dept. of Romance Studies, Boston Univ., 621 Commonwealth Ave., Boston, MA 02215, jabarnes@bu.edu), Nanette Veilleux (Dept. of Comput. Sci., Simmons College, Boston, MA 02115, veilleux@simmons.edu), Alejna Brugos (Boston Univ., Boston, MA 02215, abrugos@bu.edu), and Stefanie Shattuck-Hufnagel (Res. Lab. of Electronics, MIT, Cambridge, MA 02139, stef@speech.mit.edu)
Since the inception of the autosegmental-metrical approach to intonation (Bruce 1977, Pierrehumbert 1980, Ladd 1996), the location and scaling of f0 turning points have been used to characterize phonologically distinct f0 contours in various languages, including American English. This approach is undermined, however, by the difficulty listeners experience in perceiving differences in turning point location. Numerous studies have demonstrated either listener insensitivity to changes in turning point location or the capacity for other aspects of contour “shape” to override turning-point alignment for contour identification (Chen 2003, D’Imperio 2000, Niebuhr 2008). Even labelers with access to visual representations of the f0 encounter similar challenges. By contrast, a family of related measurements using area under the f0 curve to quantify differences in contour shape appears more robust. For example, a measure of the synchronization of the center of gravity of the accentual rise with the boundaries of the accented vowel yields 93.9% correct classification in a logistic regression model on a data set of 115 labeled utterances differing in pitch accent type (L*+H vs. L+H* in ToBI terminology). This classification proceeds entirely without explicit reference to the turning points (i.e., beginning of rise, peak) traditionally used to characterize this distinction.

2pSC18. Comparison of a child’s fundamental frequencies during structured and unstructured activities: A case study. Eric Hunter (Natl. Ctr. for Voice and Speech, 1101 13th St., Denver, CO 80126, eric.hunter@ncvs2.org)
This case study investigates the difference between children’s fundamental frequency (F0) during structured and unstructured activities, building on the concept that task type influences F0 values. A healthy male child (67 months) was evaluated (31 h, 4 days). During all activities, a National Center for Voice and Speech voice dosimeter was worn to measure long-term unstructured vocal usage. Four structured tasks from previous F0 studies were also completed: (1) sustaining the vowel /Ä/, (2) sustaining the vowel /Ä/ embedded in a phrase-end word, (3) repeating a sentence, and (4) counting from one to ten. Mean F0 during vocal tasks (257 Hz), as measured by the dosimeter and acoustic analysis of microphone data, matched the literature’s average results for the child’s age. However, the child’s mean F0 during unstructured activities was significantly higher (376 Hz). The mode and median of the vocal tasks were respectively 260 and 259 Hz, while the dosimeter’s mode and median were 290 and 355 Hz. Results suggest that children produce significantly different voice patterns during clinical observations than in routine activities. Further, the long-term F0 distribution is not normal, making the statistical mean an invalid measure for such distributions. F0 mode and median are suggested as two replacement parameters to convey basic information about F0 usage.

2pSC19. Effects of acoustic cue manipulations on emotional prosody recognition. Chinar Dara and Marc Pell (School of Commun. Sci. and Disord., McGill Univ., 1266 Pine West, Montreal, QC H3G 1A8, Canada, chinar.dara@mail.mcgill.ca)
Studies on emotion recognition from prosody have largely focused on the role and effectiveness of isolated acoustic parameters, and less is known about how information from these cues is perceived and combined to infer emotional meaning.
To better understand how acoustic cues influence recognition of discrete emotions from voice, this study investigated how listeners perceptually combine information from two critical acoustic cues, pitch and speech rate, to identify emotions. For all the utterances, pitch and speech rate measures of the whole utterance were independently manipulated by factors of 1.25 (+25%) and 0.75 (-25%). To examine the influence of one cue with reference to the other cue, the three manipulations of pitch (+25%, 0%, and -25%) were crossed with the three manipulations of speech rate (+25%, 0%, and -25%). Pseudoutterances spoken in five emotional tones (happy, sad, angry, fear, and disgust) and neutral that had undergone acoustic cue manipulations were presented to 15 male and 15 female participants for an emotion identification task. Results indicated that both pitch and speech rate are important acoustic parameters for identifying emotions and, more critically, that it is the relative weight of each cue which seems to contribute significantly to categorizing happy, sad, fear, and neutral.

J. Acoust. Soc. Am., Vol. 124, No. 4, Pt. 2, October 2008, 156th Meeting: Acoustical Society of America, p. 2497

2pSC20. Perception of emphasis in urban Jordanian Arabic. Allard Jongman, Sonja Combest, Wendy Herd, and Mohammad Al-Masri (Linguist. Dept., Univ. of Kansas, 1541 Lilac Ln., Lawrence, KS 66044, jongman@ku.edu)
Previous acoustic analyses of minimal pairs of emphatic versus plain CVC stimuli showed that (1) emphatic consonants have a lower spectral mean than their plain counterparts and (2) vowels surrounding emphatic consonants are characterized by a higher F1, lower F2, and higher F3 than vowels surrounding plain consonants [Jongman et al., J. Acoust. Soc. Am. 121, 3169 (2007)]. The present perception study explores whether Arabic listeners’ recognition of emphasis is based on information in the consonant or vowel. Monosyllabic words were used with emphatic and plain consonants in either initial or final position. By means of cross-splicing, the emphatic consonant or its adjacent vowel replaced the plain consonant or its adjacent vowel. Thirty Jordanian listeners participated in the experiment. On each trial, they indicated which word (emphatic or plain) they heard. Results show that the contribution of consonantal and vocalic information to the perception of emphasis depends on vowel quality: In the context of /a/, listeners seem to make their decision primarily based on the vowel, while in the context of /i/ and /u/, properties of the consonant carry more weight in this decision. The perceptual data will be compared to the acoustic measurements. [Research supported by the NSF.]

2pSC21. The weighting of vowel quality in perception of English lexical stress. Yanhong Zhang (411 Windsor Court, Ewing, NJ 0868, zhang66@purdue.edu) and Alexander Francis (Purdue Univ., West Lafayette, IN 47907-2038)
Acoustically, English lexical stress is multidimensional, involving F0, duration, intensity, and vowel quality. Previous research found that Mandarin speakers had problems using vowel reduction in English lexical stress production. Assuming nativelike perception is a prerequisite to nativelike production for non-native speech, the weight of vowel quality relative to that of F0, duration, and intensity in Mandarin listeners’ stress perception was examined. Mandarin and English listeners judged lexical stress placement in synthesized tokens of “desert,” in which the first syllable /de/ was varied along vowel quality and each of the other cues depending on the pair of cues in focus.
Results showed that both Mandarin and English listeners consistently weighted vowel quality more than the other cues. Vowel quality and duration were treated as combinational cues by both groups. English listeners used both intensity and vowel quality separately, while Mandarin listeners did not use intensity at all. Findings suggest that Mandarin listeners had a nativelike use of vowel quality for perceiving English stress. However, Mandarin listeners treated F0 in a different way from English listeners, possibly owing to the influence of their native tonal background. Implications for the interaction between production and perception in second-language learning will also be discussed. [Work supported by Purdue Linguistics.]

2pSC22. Duration of tone sandhi in Mandarin Chinese. Bei Yang (FLARE, the Univ. of Iowa, PH658, Univ. of Iowa, Iowa City, IA 52240, bei-yang@uiowa.edu)
Tone sandhi is the alternation that tones undergo when they are connected in the flow of speech. Six types of tone sandhi of disyllabic words in Mandarin Chinese are investigated in this research, including the neutral tone, the dipping tone alternation, /bu/, /yi1/, /yi2/, and double change (both tones change in a disyllabic word). The paper explores the duration of these six types and whether the neutral tone can change into the citation tones (level, rising, dipping, and falling tones) if we change the duration of the neutral tone. Five native Chinese speakers participate in the study. Two tasks are used to elicit data. First, speakers are required to read 40 disyllabic words. The second is an identification task. The durations of eight neutral tones are lengthened through acoustic processing. The participants hear the eight processed neutral tones and eight unprocessed neutral tones, and are asked to judge the tone types. The results indicate that the neutral tone has the shortest duration, and the durations of the other sandhi tones are shorter than those of the normal citation tones.
The data also show that the neutral tone and citation tones can be altered based on duration and pitch.

2pSC23. The development of tonal duration in Mandarin-speaking children. Jie Yang (Dept. of Commun. Sci. and Disord., Univ. of Texas at Austin, 1 Univ. Station A1100, Austin, TX 78712, thyjessie@mail.utexas.edu), Randy Diehl, and Barbara Davis (Univ. of Texas at Austin, Austin, TX 78712)
Previous research found that the duration of segments decreases as children grow older. The development of suprasegmental duration, however, has not been explored. The present study investigated developmental changes in duration of the four Mandarin tones. Monolingual Mandarin-speaking children aged 5, 8, and 12 years and young adults participated in the study. Tone durations were measured in participants’ production of monosyllabic target words elicited by picture identification tasks. The results were as follows: (1) For each tone category, tone duration and variability decreased with age: 5- and 8-year-old children showed significantly longer durations than adults. Tone durations in 12-year-old children approximated adult values. (2) Despite longer durations, adultlike duration patterns across tone categories existed in all children: dipping tones were the longest, followed by rising and level tones, with falling tones being the shortest. (3) Duration differences between the rising and dipping tones became larger as children grew older. The results may be indicative of the general maturation of laryngeal control with age. Although 5- and 8-year-old children have already established lexical contrasts of tone, adultlike phonetic norms are still in the process of development. The developmental data also provide support for a hybrid account of speech production from a suprasegmental perspective.

2pSC24. Effects of language background on tonal perception for young school-aged children. Chang Liu and Jie Yang (Dept. of Commun. Sci. and Disord., Univ. of Texas at Austin, Austin, TX 78712)
Given that Chinese is a tonal language while English is not, the present study investigates how language background affects tonal perception for young children. Tonal identification and discrimination are measured for three groups of school-aged children (6–7 years old): Chinese-monolingual (CM), English-monolingual (EM), and English-Chinese bilingual (ECB) children. The children’s task is to identify and discriminate tone 1 (level), tone 2 (rising), and tone 4 (falling) of a Chinese syllable /ma/, for which the fundamental frequency contour is systematically manipulated from tone 1 to tone 2 and from tone 1 to tone 4. CM and ECB children show typical categorical perception in both identification and discrimination, while EM children cannot identify or discriminate the three tones. These results suggest that learning and exposure experience of the tonal language is critical for children to perceive tonal changes in speech sounds.

2pSC25. Hemispheric processing of pitch accent in Japanese by native and non-native listeners. Jung-Yueh Tu, Xianghua Wu, and Yue Wang (Dept. of Linguist., Simon Fraser Univ., 8888 Univ. Dr., Burnaby, BC V5A 1S6, Canada, jta31@sfu.ca)
It is well established that language processing is left hemisphere dominant. Previous findings, however, indicate that lateralization of different levels of linguistic prosody varies with their functional load as well as listeners’ linguistic experience. This study explored the hemispheric processing of Japanese pitch accent by native and non-native listeners differing in experience with pitch, including 16 native Japanese participants, 16 Mandarin Chinese participants whose native language has linguistic tonal contrasts, and 16 English participants with no tone or pitch accent background. Pitch accent pairs were dichotically presented and the listeners were asked to identify which pitch accent pattern they heard in each ear.
Preliminary results showed that for all three groups, the percentage of errors for the left ear and that for the right ear were comparable, indicating no hemispheric dominance. The Japanese group did not show the left hemisphere dominance previously found for linguistic tone processing by native listeners. The performance of the Mandarin group suggests that tone language background did not significantly affect the lateralization of pitch accent. These findings are discussed in terms of how linguistic function differentially influences the hemispheric specialization of different domains of prosodic processing by native and non-native listeners. [Work supported by the NSERC.]

2pSC26. The effect of weak tone on the f0 peak alignment. Seung-Eun Chang (Dept. of Linguist., Univ. of Texas at Austin, 1 University Station B5100, Austin, TX 78712, sechang71@gmail.com)
The f0 peak sometimes occurs after the syllable with which it is associated, and the peak alignment varies, depending on several factors such as lexical tone target, neighboring tones, focus, and so forth. This study investigates the effect of weak tones on the alignment of f0 peaks with three tone types (i.e., H, M, and R) of South Kyungsang Korean, spoken in the southeastern part of Korea. When three tone types are followed by one or two unstressed suffixes, R was found to have the maximum amount of peak delay and M was found to have the minimum amount, i.e., the peak came in the second syllable, following the R-toned syllable, but the peak came in the syllable following the H-toned syllable. This peak delay was not found for M. Thus, it is argued that the tone alternation patterns in suffixed words are
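
Abstract 2pSC18 above argues that a long-term F0 distribution is not normal, so mode and median describe it better than the mean. The following is a minimal Python sketch of that point, not the author's analysis: the function name `summarize_f0`, the 10-Hz bin width, and the sample values are all assumptions made for illustration, and the samples are synthetic, not data from the study.

```python
from statistics import mean, median, multimode


def summarize_f0(f0_hz, bin_width_hz=10):
    """Return the mean, median, and binned mode of F0 samples (in Hz).

    The mode is taken over bins of width `bin_width_hz` (an assumed value),
    because exact repeats are rare in continuous measurements.
    """
    # Floor each sample to the lower edge of its bin.
    binned = [int(v // bin_width_hz) * bin_width_hz for v in f0_hz]
    return {
        "mean": mean(f0_hz),
        "median": median(f0_hz),
        # multimode returns all most-frequent bins; take the first one.
        "mode": multimode(binned)[0],
    }


# Synthetic, right-skewed samples for illustration only (NOT study data):
# most values cluster near 260 Hz, with a few high excursions.
samples = [250, 255, 260, 260, 262, 265, 270, 300, 420, 520, 610]
stats = summarize_f0(samples)
print(stats)
```

In this synthetic example the few high values pull the mean well above the median and the mode, which stay close to where most samples cluster. That is the pattern the abstract gives as the reason for reporting mode and median.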