ANALYSIS BY SYNTHESIS OF ACOUSTIC ... - ResearchGate


dea.brunel.ac.uk

4. TRANSFORMATION OF PITCH INTONATION CORRELATES OF ACCENTS

Intonation is one of the most important characteristics of accents. Popular intonation models include Tones and Break Indices (ToBI) [10], Tilt [11], Momel [12] and Fujisaki [13]. In this paper, a model of the broad pitch intonation pattern [9] is used to describe differences in pitch intonation across accents (Figure 4). The broad patterns of variation of the pitch contour are described by four pitch trajectory features, illustrated in Figure 4: pitch frequency range, pitch contour slope (F0slope), rate of initial pitch rise (F0I) and rate of final pitch fall (F0F).

Figure 4: Broad pitch intonation pattern in accent modelling. The figure plots frequency against time for the example utterance "This warning / now represents / a minority view" and marks the initial pitch rise, pitch frequency range, pitch contour slope, pitch nucleus and final pitch fall. The thick line is the pitch contour.

Pitch frequency range represents the range of F0 from the lowest to the highest frequency. In this work, however, we define the pitch range as

$$\mathrm{range}(F_0) = \bar{F}_0 \pm 3\,\mathrm{std}(F_0) \qquad (3)$$

where $\bar{F}_0$ is the mean of F0 over the utterance and std(F0) is the standard deviation of F0. The standard deviation is used instead of the maximum and minimum values of the pitch frequency over an utterance. This avoids errors in the estimation of pitch range caused by outliers.

Figure 5: Original pitch marks (a) from a broad Australian-accented sentence and pitch marks (b) transformed towards the British accent. Pitch frequency range increases by 50%; F0slope increases by 5%; F0I increases by 50%; F0F decreases by 5%.

Figure 6: Comparison of the average duration (s) of Australian, British and American vowels (aa, ae, ah, ao, aw, ay, eh, er, ey, ih, iy, ow, oy, uh, uw).

Pitch contour slope (F0slope) is the overall slope of the pitch trajectory over an utterance.

Rate of initial pitch rise (F0I) characterizes the beginning of a pitch contour.
It is described by the rate and direction of the F0 change over the initial segment of the pitch contour at the beginning of the utterance.

Rate of final pitch fall (F0F) is described by the rate and direction of the F0 change at the end of the utterance.

A pitch transformation method based on TD-PSOLA [14] is developed that allows independent modification of the pitch intonation parameters. The modification of intonation features across British, Australian and American accents is based on the analysis and estimates of these features for each accent [9]. Figure 5 gives an example of broad pitch intonation modification.

5. TRANSFORMATION OF DURATION CORRELATES OF ACCENTS

Accents are partly conveyed by differences in vowel duration patterns and speaking rate [9]. The variation of the average phoneme durations across accents is shown in Figure 6. In our experiments, the duration pattern of an accent is transformed by adjusting the duration of each phoneme to the mean duration of the corresponding phoneme in the target accent.

The phoneme duration pattern is modified using TD-PSOLA [14]. The modification uses estimates of the ratio of the average duration of each phoneme in the source accent to that of the corresponding phoneme in the target accent.

6. RANKING OF CORRELATES OF ACCENTS

To assess perceptually the importance of the acoustic correlates of accents, mean opinion score (MOS) tests [15,16] are performed. In total, seven British subjects take part. Each subject is given five sets of sentences from different Australian speakers. Each set contains five sentences (A, B, C, D, E). Two of them are original British-accented speech and original Australian-accented speech. The remaining three are derived from the Australian-accented speech by modifying its formant, pitch and duration correlates, respectively, towards the target accent.
All the sentences in one set have the same text content and are presented in random order. In addition, subjects are given two sentences of known accent: sentence S and
sentence T. Sentence S is the source Australian-accented speech and sentence T is the target British-accented speech. Both have different text from the test sentences. Subjects are then asked to score the similarity of each sentence in the set (A to E) to the target-accent speech (sentence T) and to the source-accent speech (sentence S). Scores range from 10 (identical) to 0 (completely different). Similar tests are conducted on the British and American accent pair.

The MOS results are shown in Table 3. For the British and American accent pair, modifying the formants changes the perceived accent more than modifying the duration or the broad pitch intonation pattern. For the Australian and British accent pair, however, Australian speech after modification of the broad pitch intonation pattern is judged to have slightly more British accent characteristics than speech obtained after modification of the formants. In both cases, the duration pattern has little impact on the perceived accent.

Speech sample | British accent | American accent
Original British speech | 8 | 2
British speech after modification of formants towards American | 4 | 6
British speech after pitch intonation modification towards American | 6 | 4
British speech after duration modification towards American | 7 | 3
Original American speech | 1 | 9
Table 3(a): MOS scores of transformation of acoustic correlates of British-accented speech to the American accent.

Speech sample | Australian accent | British accent
Original Australian speech | 7 | 3
Australian speech after formant modification towards British | 5 | 5
Australian speech after pitch intonation modification towards British | 3 | 7
Australian speech after duration modification towards British | 6 | 3
Original British speech | 1 | 9
Table 3(b): MOS scores of transformation of acoustic correlates of broad Australian-accented speech to the British accent.

7. CONCLUSION

This paper presented an analysis of the acoustic correlates of accents: formants, pitch intonation and duration. Each correlate is transformed individually for perceptual assessment. Formant correlates of accents are transformed by non-uniform subband frequency warping based on 2D HMMs. A TD-PSOLA-based method is used to transform the pitch intonation and duration patterns. The MOS test indicates that formants are the most important of the three correlates of accent.

Future work includes extending the experiments to the remaining accent pair (i.e. the American and Australian accent pair) with more native American and Australian subjects.

8. ACKNOWLEDGEMENTS

We wish to thank the UK’s EPSRC for funding project no. GR/M98036.

9. REFERENCES

[1] Wells J.C., Accents of English, Cambridge University Press (1982).
[2] Humphries J., “Accent Modelling and Adaptation in Automatic Speech Recognition”, PhD Thesis, Cambridge University Engineering Department (1997).
[3] Harrington J., Cox F., Evans Z., “An Acoustic Phonetic Study of Broad, General, and Cultivated Australian English Vowels”, Australian Journal of Linguistics, 17, pp. 155-184 (1997).
[4] Watson C., Harrington J., Evans Z., “An Acoustic Comparison between New Zealand and Australian English Vowels”, Australian Journal of Linguistics (1996).
[5] Yan Q., Vaseghi S., “A Comparative Analysis of UK and US English Accents in Recognition and Synthesis”, ICASSP, Florida, pp. 413-416 (2002).
[6] Arslan L.M., Hansen J.H.L., “A Study of Temporal Features and Frequency Characteristics in American English Foreign Accent”, Journal of the Acoustical Society of America, vol. 102(1), pp. 28-40 (1997).
[7] Yan Q., Vaseghi S., “Analysis, Modelling and Synthesis of Formants of British, American and Australian Accents”, ICASSP, Hong Kong, pp. 712-715 (2003).
[8] Sethy A., Narayanan S., “Refined Speech Segmentation for Concatenative Speech Synthesis”, ICSLP (2002).
[9] Yan Q., Vaseghi S., Rentzos D., Ho C., Turajlic E., “Analysis of Acoustic Correlates of British, Australian and American Accents”, IEEE Automatic Speech Recognition and Understanding Workshop (2003).
[10] Silverman K., Beckman M., Pitrelli J., Ostendorf M., Wightman C., Price P., Pierrehumbert J., Hirschberg J., “ToBI: A Standard for Labeling English Prosody”, ICSLP (1992).
[11] Taylor P.A., “The Tilt Intonation Model”, ICSLP, Sydney, Australia (1998).
[12] Hirst D., Espesser R., “Automatic Modelling of Fundamental Frequency Using a Quadratic Spline Function”, Travaux de l’Institut de Phonétique d’Aix, vol. 15, pp. 75-85 (1993).
[13] Fujisaki H., “A Note on the Physiological and Physical Basis for the Phrase and Accent Components in the Voice Fundamental Frequency Contour”, in Fujimura O. (Ed.), Vocal Fold Physiology: Voice Production, Mechanisms and Functions, Raven, New York, NY, pp. 135-149 (1988).
[14] Moulines E., Charpentier F., “Pitch-Synchronous Waveform Processing Techniques for Text-to-Speech Synthesis Using Diphones”, Speech Communication, vol. 9, pp. 453-467 (1990).
[15] Stylianou Y., Cappé O., Moulines E., “Continuous Probabilistic Transform for Voice Conversion”, IEEE Transactions on Speech and Audio Processing, vol. 6, no. 2, pp. 131-142 (1998).
[16] Abe M., Nakamura S., Shikano K., Kuwabara H., “Voice Conversion Through Vector Quantization”, ICASSP, pp. 565-568 (1998).
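Eq. (3) in Section 4 bounds the pitch range by the mean F0 plus or minus three standard deviations, instead of by the extreme F0 values. The Python sketch below is an illustration, not the authors' code. It computes both kinds of bounds for one utterance so that the effect of a single outlier can be compared. It assumes that unvoiced frames are coded as 0 Hz and uses the population standard deviation; the paper states neither convention. The function names and the example contour are hypothetical.

```python
import statistics


def pitch_range(f0_hz):
    """Pitch-range bounds per Eq. (3): mean(F0) - 3 std(F0), mean(F0) + 3 std(F0).

    f0_hz is a frame-level F0 track in Hz. Assumption: unvoiced frames are
    coded as 0 and are dropped before the statistics are computed.
    """
    voiced = [f for f in f0_hz if f > 0]
    mean = statistics.fmean(voiced)
    std = statistics.pstdev(voiced)  # population std; an assumed convention
    return mean - 3 * std, mean + 3 * std


def minmax_range(f0_hz):
    """Naive alternative: bounds set by the lowest and highest voiced F0."""
    voiced = [f for f in f0_hz if f > 0]
    return min(voiced), max(voiced)


# Made-up contour: 500 voiced frames between 110 and 130 Hz, two unvoiced
# frames, and one 400 Hz frame standing in for a pitch-tracking error.
contour = [0, 0] + [110, 115, 120, 125, 130] * 100 + [400]
lo, hi = pitch_range(contour)
mn, mx = minmax_range(contour)
print(f"mean +/- 3 std: {lo:.1f} to {hi:.1f} Hz (width {hi - lo:.1f} Hz)")
print(f"min/max:        {mn} to {mx} Hz (width {mx - mn} Hz)")
```

Here the single 400 Hz frame sets the upper min/max bound on its own. The mean ± 3 std bounds move much less, although they are not immune: the outlier still inflates std(F0).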

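The remaining broad pitch features of Section 4 (F0slope, F0I and F0F) are defined by the rate and direction of F0 change over three spans: the whole utterance, its initial segment and its final segment. The paper does not specify how these rates are measured, so the sketch below assumes least-squares line fits. F0slope is fitted over all voiced frames, and F0I and F0F over the first and last 20% of voiced frames. The 10 ms frame period, the 20% edge fraction and the function names are assumptions for illustration only.

```python
def _ls_slope(times, values):
    """Least-squares slope of values against times (Hz/s if times are in s)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def broad_pitch_features(f0_hz, frame_period_s=0.01, edge_fraction=0.2):
    """Return (F0slope, F0I, F0F) in Hz/s for one utterance.

    Assumptions: unvoiced frames are coded as 0 Hz; F0slope is the
    least-squares slope over all voiced frames; F0I and F0F are the
    least-squares slopes over the first and last edge_fraction of the voiced
    frames. The sign gives the direction (positive = rising).
    """
    voiced = [(i * frame_period_s, f) for i, f in enumerate(f0_hz) if f > 0]
    if len(voiced) < 4:
        raise ValueError("need at least 4 voiced frames")
    times = [t for t, _ in voiced]
    values = [f for _, f in voiced]
    k = max(2, round(edge_fraction * len(voiced)))  # frames per edge segment
    f0_slope = _ls_slope(times, values)
    f0_initial = _ls_slope(times[:k], values[:k])
    f0_final = _ls_slope(times[-k:], values[-k:])
    return f0_slope, f0_initial, f0_final


# Made-up contour, one value per 10 ms frame: rises 1 Hz per frame for
# 50 frames, then falls 1 Hz per frame for 50 frames.
rise_fall = [100 + i for i in range(50)] + [149 - i for i in range(50)]
print(broad_pitch_features(rise_fall))
```

A positive F0I indicates an initial rise and a negative F0F a final fall. For the made-up rise-fall contour, the two edge slopes come out at about +100 and -100 Hz/s, while the overall slope is close to zero.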

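Figure 5 shows an example in which the pitch frequency range is increased by 50% and F0slope by 5%. The paper realises such changes with a TD-PSOLA-based method [14] that is not detailed here. The sketch below only illustrates, under stated assumptions, how a target F0 contour with a scaled range and slope might be computed before resynthesis. It is a hypothetical manipulation, not the authors' algorithm: it performs no TD-PSOLA and leaves F0I and F0F untouched. The name `modify_contour` and its defaults are assumptions.

```python
def modify_contour(f0_hz, frame_period_s=0.01, range_scale=1.5, slope_scale=1.05):
    """Return a target F0 contour with scaled spread and scaled overall slope.

    Hypothetical manipulation (not the paper's TD-PSOLA method):
      1. deviations from the mean voiced F0 are multiplied by range_scale,
         which also multiplies the least-squares slope by range_scale;
      2. a linear term (slope_scale - range_scale) * slope * (t - mean_t) is
         added so that the final least-squares slope is slope_scale * slope.
    The mean F0 is unchanged. Unvoiced frames (coded 0) stay 0, and no check
    is made that the scaled values remain positive.
    """
    idx = [i for i, f in enumerate(f0_hz) if f > 0]
    if len(idx) < 2:
        raise ValueError("need at least 2 voiced frames")
    times = [i * frame_period_s for i in idx]
    values = [f0_hz[i] for i in idx]
    mean_t = sum(times) / len(times)
    mean_f = sum(values) / len(values)
    slope = sum((t - mean_t) * (v - mean_f) for t, v in zip(times, values)) / sum(
        (t - mean_t) ** 2 for t in times
    )
    extra = (slope_scale - range_scale) * slope  # restores the requested slope
    out = [0.0] * len(f0_hz)
    for i, t, v in zip(idx, times, values):
        out[i] = mean_f + range_scale * (v - mean_f) + extra * (t - mean_t)
    return out


# Settings echoing the Figure 5 example: range +50%, F0slope +5%.
# Made-up contour: two unvoiced frames, then a falling, wobbling F0 track.
contour = [0, 0] + [120 + 4 * (i % 5) - 40 * (i * 0.01) for i in range(100)]
target = modify_contour(contour, range_scale=1.5, slope_scale=1.05)
print([round(f, 1) for f in target[:8]])
```

Because the slope correction adds a linear term, the spread around the mean is scaled exactly by range_scale only when the original contour has zero least-squares slope. Otherwise, the change in std(F0) is only approximately range_scale.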

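Section 5 adjusts each phoneme towards the mean duration of the corresponding phoneme in the target accent, using the ratio of average source and target durations. A minimal sketch of the duration computation is given below. It applies the target-to-source ratio (the reciprocal of the source-to-target ratio quoted in Section 5) as a time-scale factor, so a phoneme of average source length becomes the average target length. The waveform modification itself, done with TD-PSOLA in the paper, is not implemented. The mean durations and function names are hypothetical; the values are not read from Figure 6.

```python
def duration_scale_factors(source_mean_s, target_mean_s):
    """Time-scale factor per phoneme: target mean duration / source mean duration.

    This is the reciprocal of the source-to-target ratio quoted in Section 5.
    Phonemes missing from either table get no factor.
    """
    return {
        ph: target_mean_s[ph] / source_mean_s[ph]
        for ph in source_mean_s
        if ph in target_mean_s
    }


def transform_durations(segments, factors):
    """Scale each (phoneme, duration_s) segment; unknown phonemes are unchanged.

    Only the new target durations are computed here; resynthesising the
    waveform at these durations (TD-PSOLA in the paper) is out of scope.
    """
    return [(ph, dur * factors.get(ph, 1.0)) for ph, dur in segments]


# Hypothetical mean vowel durations in seconds (NOT values from Figure 6).
source_means = {"iy": 0.10, "ae": 0.15, "uw": 0.12}
target_means = {"iy": 0.12, "ae": 0.12, "uw": 0.12}
factors = duration_scale_factors(source_means, target_means)
utterance = [("iy", 0.10), ("ae", 0.18), ("sh", 0.09), ("uw", 0.12)]
print(transform_durations(utterance, factors))
```

Scaling each token by a per-phoneme factor keeps the token-to-token variation of the source speaker while shifting each phoneme's average towards the target accent. The consonant "sh" has no factor in this example and keeps its duration.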