Analysis of Acoustic Correlates of British ... - Brunel University


ANALYSIS OF ACOUSTIC CORRELATES OF BRITISH, AUSTRALIAN AND AMERICAN ACCENTS

Qin Yan, Saeed Vaseghi, Dimitrios Rentzos, *Ching-Hsiang Ho, Emir Turajlic

Department of Electronic and Computer Engineering, Brunel University, Uxbridge, Middlesex, UK UB8 3PH

*Fortune Institute of Technology, Kaohsiung, Taiwan 842, R.O.C.

(Qin.Yan, Saeed.Vaseghi, Dimitrios.Rentzos, Emir.Turajlic)@brunel.ac.uk, *ch.ho@center.fjtc.edu.tw

ABSTRACT

This paper presents an analysis of the acoustic correlates of the differences between British, Australian and American English accents. The differences that characterise accents in speech can be divided into two parts: (a) phonetic differences and (b) acoustic differences. This paper focuses on the acoustic correlates of accents, including formants and their trajectories, pitch trajectory, pitch accent, pitch nucleus, duration and speaking rate. The acoustics of accents are modelled and estimated using two-dimensional HMMs of formants and a pitch model such as the Rise/Fall/Connect (RFC) model. The differences between British, Broad Australian and General American English accents are discussed. The Australian accent has a lower first formant (F1) but a higher second formant (F2) than the British and American accents. The second formant is considered the most sensitive to accent identity. British speakers have the largest pitch frequency range and the largest initial pitch rise and final pitch fall rates in utterances. The Australian accent exhibits significant elongation of vowels and the lowest speaking rate of the three accents. The differences in acoustic correlates across accents are used to morph the accent of a source speaker towards a target accent.

1. INTRODUCTION

Accent is one of the most fascinating aspects of speech acoustics. An accent is a distinctive manner of pronunciation, usually associated with a community of people who share a common regional or social/cultural background. Accents are dynamic: they evolve over time under the influence of large-scale immigration and of social and cultural trends. J.C. Wells [1] provides an excellent introduction to the linguistic structures of the accents of the English language. In [2,3] Watson and Harrington describe a comparative analysis of the formants of British English, Australian English and New Zealand English, and of three subclasses of Australian English, namely Broad, General and Cultivated Australian English.

Accent is a relatively under-explored aspect of automatic speech recognition (ASR) and text-to-speech (TTS) systems, although accent variation is considered one of the dominant causes of degraded speech recognition performance. In [4] Humphries improves speech recognition accuracy by automatically modelling the accent-specific pronunciations of the input speech. In [5] Arslan and Hansen present an acoustic analysis of a number of foreign accents in American English.

Phonetic parameters
  Substitution, insertion, deletion: pronunciation differences from the 'standard' phonetic transcription

Formant correlates
  Formants and their trajectories: the 2nd formant is the most sensitive to accent

Intonation correlates
  F0 range: range of pitch
  F0 utterance slope: pitch slope across the utterance
  Pitch nucleus: prominent (stressed) point within an intonation group (tone unit)
  Initial pitch rising: first pitch slope of a narrative utterance
  Final pitch lowering: final falling pitch slope of a narrative utterance
  Final pitch rising: final rising pitch slope of a narrative utterance

Duration and speaking-rate correlates
  Speaking rate: phonemes or words per second
  Phoneme duration: vowel elongation and complete pronunciation both affect duration

Table 1: A list of phonetic and acoustic correlates of accent.


[Figure 1 diagram, processing blocks: speaker-independent speech database (UK, AU, US); feature extraction; pitch trajectory estimator; HMM training; forced-alignment segmentation; pitch pattern analysis]

This paper presents an outline of an accent model in terms of the linguistic and acoustic correlates of an accent. The experiments focus on three major accents of the English language, namely British, Broad Australian and American. The speech databases used (Table 2) are ANDOSL for the Australian accent, WSJ for the American accent and WSJCAM0 for the British accent. The hidden Markov models of speech are trained on 39-dimensional feature vectors made up of Mel-frequency cepstral coefficients (MFCCs) and energy, together with their first (delta) and second (acceleration) derivatives. For segmentation and labelling of speech, left-to-right triphone HMMs with 3 states and 20 Gaussian mixture components per state are used.
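The 39-dimensional vector consists of static coefficients plus their first and second time derivatives. The NumPy sketch below shows how such a vector can be assembled. It assumes 13 static coefficients per frame (12 MFCCs plus energy, a common HTK-style configuration; the paper states only the total of 39) and the standard regression formula with a window of ±2 frames and replicated edge frames. The function names are hypothetical.

```python
import numpy as np


def regression_deltas(feats: np.ndarray, window: int = 2) -> np.ndarray:
    """Regression-based time derivative of a (frames, dims) feature matrix.

    d_t = sum_{k=1..W} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..W} k^2),
    with the first and last frames replicated beyond the edges.
    """
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, window + 1))
    deltas = np.zeros(feats.shape, dtype=float)
    for k in range(1, window + 1):
        # Row window + t of `padded` holds frame t (edge-clamped).
        forward = padded[window + k : window + k + num_frames]
        backward = padded[window - k : window - k + num_frames]
        deltas += k * (forward - backward)
    return deltas / denom


def stack_39(static: np.ndarray) -> np.ndarray:
    """Append delta and acceleration coefficients to 13 static coefficients per frame."""
    delta = regression_deltas(static)
    accel = regression_deltas(delta)
    return np.hstack([static, delta, accel])


if __name__ == "__main__":
    static = np.random.default_rng(0).normal(size=(100, 13))  # stand-in for real MFCC+energy frames
    print(stack_39(static).shape)
```

The acceleration coefficients are simply the same regression applied to the deltas, which is why one helper covers both.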

2. OVERVIEW OF ACCENT MODELLING

[Figure 1 diagram, model blocks: formant 2D HMMs; duration and speaking-rate models; pitch trajectory model]

Figure 1: An overview of acoustic modelling of accents.

The structural differences between accents can be divided into two broad parts: (a) differences in phonetic transcription and (b) differences in acoustic correlates. Table 1 lists the features that represent the space of an accent, with a brief comment on each set of correlates. Figure 1 illustrates the feature extraction and modelling process used to quantify the correlates of an accent in terms of formants, pitch and duration. The formants of each phoneme are modelled by a 2D HMM [6,7], which combines a 1-D HMM along the time dimension with a 1-D HMM along the frequency dimension. Along the frequency axis, each state of a formant 2D HMM models the distribution of one formant of the phoneme. Pitch trajectories are used to model the broad intonation pattern of an accent, and duration and speaking rate are modelled by conventional speech HMMs.
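Durations and speaking rates are read off the segment boundaries produced by HMM forced alignment. The sketch below shows the core of that step: a Viterbi alignment of frames to a known left-to-right state sequence. It assumes per-frame state log-likelihoods are already available (the input matrix is hypothetical) and treats all transitions as equally likely, so each state either repeats or advances by one. It illustrates the technique and is not HTK's implementation.

```python
import numpy as np


def forced_align(loglik: np.ndarray) -> list[tuple[int, int]]:
    """Viterbi alignment of T frames to S left-to-right states.

    loglik[t, s] is the log-likelihood of frame t under state s. Every state is
    visited in order and occupies at least one frame. Returns one
    (start_frame, end_frame_exclusive) pair per state.
    """
    num_frames, num_states = loglik.shape
    if num_frames < num_states:
        raise ValueError("need at least one frame per state")
    score = np.full((num_frames, num_states), -np.inf)
    advanced = np.zeros((num_frames, num_states), dtype=bool)
    score[0, 0] = loglik[0, 0]
    for t in range(1, num_frames):
        for s in range(num_states):
            stay = score[t - 1, s]
            move = score[t - 1, s - 1] if s > 0 else -np.inf
            advanced[t, s] = move > stay  # did the best path enter state s at frame t?
            score[t, s] = max(stay, move) + loglik[t, s]
    # Backtrack from the last state at the last frame, recording entry frames.
    starts = [0] * num_states
    s = num_states - 1
    for t in range(num_frames - 1, 0, -1):
        if advanced[t, s]:
            starts[s] = t
            s -= 1
    ends = starts[1:] + [num_frames]
    return list(zip(starts, ends))


if __name__ == "__main__":
    ll = np.full((10, 3), -5.0)
    ll[0:3, 0] = ll[3:7, 1] = ll[7:10, 2] = 0.0
    for start, end in forced_align(ll):
        print(start, end, f"{(end - start) * 10} ms")  # assumes a 10 ms frame shift
```

In a full system the state sequence is the concatenation of the triphone models of the known transcription, and the phone boundaries are the boundaries of each model's first and last states.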

3. PHONETIC CORRELATES OF ACCENTS

The most dominant aspect of accents is often manifested in the differences in pronunciation recorded in the phonetic dictionary. These phonetic transcription differences can be broadly categorised into two classes:

(a) Differences in the phonemic alphabet systems, i.e. in the number or identity of phonemes. For example, for automatic speech processing, the British accent as transcribed by Cambridge University's BEEP dictionary has five extra vowels, /ax ea ia ua oh/, compared with American English as transcribed by Carnegie Mellon University's CMU dictionary. Australian English has distinctive vowels such as /æi/ in place of /ai/ and /æu/ in place of /au/.

(b) Differences in phonetic realisation, including phoneme substitution, deletion and insertion. Table 3 gives some examples of phonetic differences between American, Australian and British accents.

Database (accent)      No. speakers (f/m)   No. sentences
ANDOSL (Australian)    18/18                7200
WSJ (American)         36/38                9438
WSJCAM0 (British)      40/46                9476

Table 2: Database configurations.

Word        American    British     Australian
John        ʤʌn         ʤɔn         ʤɔn
day         dei         dei         dæi
immediate   ɪˈmi:ʤʌt    iˈmi:ʤət    əˈmi:di:ət
chassis     ʃæsɪ        ʃæsɪ        ʃæzi:

Table 3: Examples of differences in phonetic realisation.
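Realisation differences of type (b) can be counted by aligning two transcriptions of the same word with a minimum edit distance and attributing each edit to a substitution, deletion or insertion. The sketch below uses unit costs for all three operations (an assumption; a practical system might weight substitutions by phonetic similarity) and a hypothetical one-token-per-phone split of the "chassis" entries in Table 3.

```python
from dataclasses import dataclass


@dataclass
class EditCounts:
    substitutions: int = 0
    deletions: int = 0
    insertions: int = 0


def align_counts(reference: list[str], other: list[str]) -> EditCounts:
    """Count the substitutions, deletions and insertions that turn reference into other."""
    n, m = len(reference), len(other)
    # cost[i][j] = edit distance between reference[:i] and other[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (reference[i - 1] != other[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Backtrack along one optimal path, attributing each step to an operation.
    counts = EditCounts()
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and reference[i - 1] != other[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + mismatch:
            counts.substitutions += int(mismatch)
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts.deletions += 1
            i -= 1
        else:
            counts.insertions += 1
            j -= 1
    return counts


if __name__ == "__main__":
    american = ["ʃ", "æ", "s", "ɪ"]
    australian = ["ʃ", "æ", "z", "i:"]
    print(align_counts(american, australian))
```

Summing these counts over a pronunciation dictionary gives a rough measure of how far two accents' transcriptions diverge.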

4. ACOUSTIC CORRELATES OF ACCENTS

4.1 Formant Correlates of Accent

Formants are the main sources of the acoustic 'colouring' of speech. Part of the acoustics of an accent is due to differences in the distribution of the formant frequencies of vowels and diphthongs. In [2,3] the differences between the formant distributions of the vowels of Australian and New Zealand English are shown. In this paper, experiments are conducted on an enlarged set of accents.

Figure 2: Comparison of the average first (F1) and second (F2) formants of the vowels AA, AE, AH, AO, EH, ER, IH, IY, OH, UW and UH in Australian, British and American English (female speakers).

Figure 3: Formant space of Australian, British and American English (female speakers).

Figure 4: Average formant trajectories (female speakers) of British, Australian and American English after alignment. The x-axis is normalised time; the y-axis is frequency (Hz).

Formant frequencies are estimated with an improved LPC-based method and 2-D HMMs [6,7]. The average first and second formants of American, British and Australian English are shown in Figure 2. Across the accents, the second formants of vowels span the largest frequency range, up to 2 kHz, and also show some of the largest formant-frequency differences. It is interesting to note that the second formant, which is most affected by accent, also coincides with a frequency band to which hearing is relatively sensitive.
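The improved LPC method and its 2-D HMM post-processing [6,7] are not detailed here. For orientation only, the sketch below shows the basic textbook LPC step that such methods build on. It fits an all-pole model by the autocorrelation method (Levinson-Durbin recursion) and reads formant candidates from the angles of the complex poles. The function names, the rule-of-thumb LPC order and the 400 Hz bandwidth threshold are assumptions.

```python
import numpy as np


def lpc_coefficients(frame: np.ndarray, order: int) -> np.ndarray:
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns the prediction-error filter [1, a1, ..., ap] of A(z).
    """
    windowed = frame * np.hamming(len(frame))
    full = np.correlate(windowed, windowed, mode="full")
    r = full[len(windowed) - 1 : len(windowed) + order]  # autocorrelation at lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / error  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        error *= 1.0 - k * k
    return a


def formant_candidates(frame, fs, order=None, max_bandwidth_hz=400.0):
    """Formant candidate frequencies (Hz) from the angles of the LPC poles.

    Poles in the upper half-plane give one frequency per conjugate pair;
    broad poles (bandwidth above max_bandwidth_hz) are discarded.
    """
    if order is None:
        order = 2 + int(fs / 1000)  # rule-of-thumb LPC order (assumption)
    a = lpc_coefficients(np.asarray(frame, dtype=float), order)
    poles = np.roots(a)
    poles = poles[np.imag(poles) > 0]
    freqs = np.angle(poles) * fs / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(poles)) * fs / np.pi
    keep = (freqs > 90.0) & (bandwidths < max_bandwidth_hz)
    return np.sort(freqs[keep])
```

Applied frame by frame (for example, 20 to 30 ms frames), this yields raw candidates; deciding which candidate is F1, F2 and so on is the job that the HMM-based tracking addresses.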

Figure 5: The hierarchical prosodic structure across the pitch contour. The unit of the x-axis is 10 ms and the unit of the y-axis is Hz. 'T' denotes tone units, 'E' pitch events and 'A' pitch accents [6].

On average, the second formants of Australian vowels are 11% higher than those of British vowels and 8% higher than those of American vowels. The first formant of vowels varies within a 1 kHz frequency range, and experimental results show that Australian vowels have a lower first formant than British and American vowels. The third and fourth formants differ by some 10% across accents. Figure 3 illustrates the formant space of the three major English accents [7]. Some features noted from Figure 3 are:

• Raising of the vowels /ae/ and /eh/ in Australian.
• Fronting of the open vowel /aa/ and of the high vowels /uw/ and /uh/ in Australian.
• Fronting and raising of the vowel /er/ in Australian.
• The vowels /iy/, /eh/ and /ae/ are closer in Australian.

In particular, the positions of some vowels, such as /ih/, /uw/, /ae/ and /eh/, in the F1/F2 plane shift significantly across accents.

Figure 4 shows the time-normalised average formant trajectories of British, Australian and American English. The acoustic target point of each vowel's formant trajectory is marked as the point where the formant reaches its steadiest value ("a static spectral slice" [8]). Figure 4 shows that Australian vowels reach their target points later than British and American vowels.
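Time normalisation and the target-point rule can be sketched as follows. The sketch assumes each formant track is a 1-D array sampled at a fixed frame rate. Linear resampling onto a fixed number of points and taking the target as the frame with the smallest smoothed rate of change are simple stand-ins chosen for illustration, since the exact alignment procedure is not specified. The function names are hypothetical.

```python
import numpy as np


def time_normalise(track: np.ndarray, num_points: int = 20) -> np.ndarray:
    """Linearly resample a formant track onto num_points equally spaced times in [0, 1]."""
    original_times = np.linspace(0.0, 1.0, len(track))
    new_times = np.linspace(0.0, 1.0, num_points)
    return np.interp(new_times, original_times, track)


def target_point(track: np.ndarray, smooth: int = 3) -> int:
    """Index of the frame where the formant is steadiest (smallest smoothed |slope|)."""
    slope = np.abs(np.gradient(np.asarray(track, dtype=float)))
    kernel = np.ones(smooth) / smooth
    smoothed = np.convolve(slope, kernel, mode="same")
    return int(np.argmin(smoothed))


def average_trajectory(tracks: list[np.ndarray], num_points: int = 20) -> np.ndarray:
    """Average several tokens of one vowel after time normalisation."""
    return np.mean([time_normalise(t, num_points) for t in tracks], axis=0)


if __name__ == "__main__":
    # Made-up F2 track: a transition, a steady portion, then an offglide.
    f2 = np.concatenate([np.linspace(1000, 1800, 10), np.full(5, 1800.0),
                         np.linspace(1800, 1500, 6)[1:]])
    idx = target_point(f2)
    print(idx, idx / (len(f2) - 1))  # frame index and its normalised time
```

Comparing the normalised time of the target point across accents is one way to quantify the delayed Australian targets visible in Figure 4.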

4.2 Pitch Intonation Correlates of Accent

Much of an accent is conveyed by intonation, which in turn is largely conveyed by the pitch trajectory. The pitch trajectory is therefore regarded as one of the most important correlates of accent. In [9] Cruttenden points out that the initial rise or the final fall/rise in pitch is an indicator of different accent types. In particular, Scottish and Northern Irish English tend to use a final pitch rise in declarative utterances.

The Rise/Fall/Connect (RFC) model [6,10] is employed to describe the differences in pitch contour across accents, with the pitch contour modelled by Legendre polynomial functions.
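To illustrate the Legendre-polynomial description, a pitch segment can be projected onto the first few Legendre polynomials over normalised time [-1, 1]. The constant, linear and quadratic coefficients then summarise the segment's mean level, overall slope and curvature. The sketch uses NumPy's `numpy.polynomial.legendre.legfit`; the polynomial degree, the 10 ms frame shift and the conversion of the linear coefficient to Hz/sec are assumptions, not the exact parameterisation of [6,10].

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_pitch_params(f0_hz: np.ndarray, frame_shift_s: float = 0.01, degree: int = 2):
    """Least-squares Legendre fit of an F0 segment over normalised time [-1, 1].

    Returns (coefficients, slope_hz_per_s). Because P1(x) = x, the linear
    coefficient c1 is the change in F0 over half of the normalised interval;
    dividing by half the segment duration converts it to Hz per second.
    """
    x = np.linspace(-1.0, 1.0, len(f0_hz))
    coeffs = legendre.legfit(x, f0_hz, degree)
    duration_s = (len(f0_hz) - 1) * frame_shift_s
    slope_hz_per_s = coeffs[1] / (duration_s / 2.0)
    return coeffs, slope_hz_per_s


if __name__ == "__main__":
    t = np.arange(51) * 0.01                     # 0.5 s at an assumed 10 ms frame shift
    f0 = 120.0 + 50.0 * t + 30.0 * (t - 0.25) ** 2  # made-up rising, slightly curved contour
    coeffs, slope = legendre_pitch_params(f0)
    print(np.round(coeffs, 2), round(slope, 1))
```

Because the Legendre polynomials are orthogonal on a symmetric grid, the slope estimate does not change when a curvature term is added, which makes the coefficients convenient per-segment features for comparing accents.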

The pitch correlate of intonation has the following hierarchical structure [6] (Figure 5):

1) Pitch accent: either a pitch rise or a pitch fall.
2) Pitch event: a combination of a pitch rise and a pitch fall (or vice versa).
3) Tone unit: a continuous sequence of several pitch events within a phrase or utterance.
4) Utterance: the composite trajectory of the pitch signal across a number of tone units in an utterance.
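The four-level hierarchy maps naturally onto nested records. The following is a minimal Python sketch. The class and field names are hypothetical, and choosing the event with the largest total pitch excursion as a tone unit's nucleus is a simplifying assumption that stands in for the perceptual prominence discussed in the text. It is not the criterion used in [6].

```python
from dataclasses import dataclass, field


@dataclass
class PitchAccent:
    """A single pitch rise (positive amplitude) or fall (negative amplitude)."""
    start_s: float
    duration_s: float
    amplitude_hz: float

    @property
    def is_rise(self) -> bool:
        return self.amplitude_hz > 0


@dataclass
class PitchEvent:
    """A rise followed by a fall, or a fall followed by a rise."""
    accents: list[PitchAccent]

    def excursion_hz(self) -> float:
        return sum(abs(a.amplitude_hz) for a in self.accents)


@dataclass
class ToneUnit:
    events: list[PitchEvent]

    def nucleus_index(self) -> int:
        # Assumption: the most prominent event is the one with the largest excursion.
        excursions = [e.excursion_hz() for e in self.events]
        return excursions.index(max(excursions))

    def nucleus(self) -> PitchEvent:
        return self.events[self.nucleus_index()]


@dataclass
class Utterance:
    tone_units: list[ToneUnit] = field(default_factory=list)


if __name__ == "__main__":
    small = PitchEvent([PitchAccent(0.0, 0.1, 20.0), PitchAccent(0.1, 0.1, -15.0)])
    big = PitchEvent([PitchAccent(0.3, 0.2, 60.0), PitchAccent(0.5, 0.2, -70.0)])
    unit = ToneUnit([small, big])
    print(unit.nucleus_index(), unit.nucleus().excursion_hz())
```

Storing the nucleus index per tone unit also gives the nucleus position directly, which is one of the accent-dependent quantities discussed next.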

A tone unit must contain at least one prominent pitch event. This prominent pitch event is called the pitch nucleus, and it carries the major part of the intonation information. The magnitude and relative position of the nucleus are correlated with accent: different accents tend to place the nucleus in different positions [1].

Accent        Female avg. pitch   Female range (3*std)   Male avg. pitch   Male range (3*std)
British       184                 3*37.30                121               3*29.93
Australian    194                 3*21.21                119               3*17.98
American      195                 3*21.78                121               3*17.17

Table 4: Average pitch and pitch frequency range (Hz) of British, Australian and American accents.

Accent        Female   Male
British       13.37    8.53
Australian    14.77    8.89
American      14.91    9.07

Table 5: Pitch utterance slope (Hz/sec).

Accent        Female   Male
British       74.77    60.94
Australian    44.36    31.6
American      58.72    46.31

Table 6: Initial rise pitch rate (Hz/sec).

Accent        Female   Male
British       244      146
Australian    68       88
American      225      116

Table 7: Final lowering pitch rate (Hz/sec).

Figure 6: Broad pitch intonation pattern in accent modelling, shown as frequency against time for the utterance "This warning / now represents / a minority view". The thick line is the pitch contour; the annotated features are the initial pitch rise, pitch range, pitch sentence slope, pitch nucleus and final pitch fall.

For accent modelling, only the broad intonation patterns are modelled in this paper (Figure 6). The parameters considered are the pitch range, the rate of initial pitch rise in an utterance, the rate of final pitch fall and the pitch slope across an utterance.

Table 4 gives the estimated average pitch frequency and pitch range of male and female speakers of British, Australian and American accents. There is no significant difference in the average value of F0 across the three accents, which implies that accent does not significantly affect average pitch frequency; average pitch frequency can therefore be considered a more speaker-dependent parameter. Table 4 also shows that female speakers have a larger pitch frequency range than male speakers. Both male and female British speakers have the largest pitch frequency range of the three accents: on average, the pitch frequency ranges of Australian and American speakers are about 41% and 43% smaller, respectively, than that of British speakers. Australian and American speakers have similar pitch frequency ranges. Pitch frequency range is believed to be closely related to the intonation patterns employed by different accents.
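The range percentages can be checked against Table 4. The sketch below assumes that they express how much smaller the Australian and American ranges are relative to the British range, averaged over female and male speakers. Under this reading the Table 4 values give about 41.5% and 42.1%, close to the quoted figures; the exact averaging used for the quoted figures is not stated.

```python
# Standard deviations (Hz) behind the pitch ranges in Table 4. Each range is
# 3 * std, so the factor of 3 cancels in the ratios below.
PITCH_STD = {
    "British": {"female": 37.30, "male": 29.93},
    "Australian": {"female": 21.21, "male": 17.98},
    "American": {"female": 21.78, "male": 17.17},
}


def range_reduction_vs_british(accent: str) -> float:
    """Mean over genders of (British range - accent range) / British range."""
    ref = PITCH_STD["British"]
    other = PITCH_STD[accent]
    ratios = [(ref[g] - other[g]) / ref[g] for g in ("female", "male")]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    for accent in ("Australian", "American"):
        pct = 100 * range_reduction_vs_british(accent)
        print(f"{accent}: range {pct:.1f}% smaller than British")
```

Expressed the other way round, the British range is roughly 1.7 times the Australian or American range.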

British speakers make extensive use of the low-rise tone in non-final intonation groups (tone units), producing an oratorical and formal speaking style, whereas Americans prefer the high-rise tone, a more casual style. There is also an increasing tendency among Australians to use the high-rise tone [9].

Figure 7: Comparison of the average durations (s) of Australian, British and American vowels (aa, ae, ah, ao, aw, ay, eh, er, ey, ih, iy, ow, oy, uh, uw).

Figure 8: Comparison of the average durations (s) of British, Australian and American consonants (b, ch, d, dh, f, g, hh, jh, k, l, m, n, ng, p, r, s, sh, t, th, v, w, y, z, zh).

Table 5 presents the pitch utterance slope. Female speakers consistently have a larger pitch contour slope than male speakers, regardless of accent. American speakers have a slightly sharper slope than British and Australian speakers, with differences of 8% and 3% respectively.

Tables 6 and 7 display the initial pitch rise rates and the final pitch lowering rates, respectively, of British, American and Australian accents. Female speakers consistently have a steeper pitch rate than male speakers in the initial rising part of the pitch contour. British speakers have the steepest initial pitch rise of the three accents; the Australian and American rates are about 44.1% and 22.7% lower than the British rate. This coincides with British speakers having the largest pitch frequency range of the three. Similarly, British speakers have the sharpest final fall rate in pitch compared with Australian and American speakers. The results show that British speakers tend to have steeper pitch rise and fall rates than American speakers. Furthermore, American speakers tend to have a lower pitch in the final words of sentences than British speakers [11].

4.3 Phoneme Duration Pattern and Speaking Rate of Accent

Accents are also partly conveyed by vowel duration and speaking rate. Duration statistics are estimated from segmentation boundaries obtained with HTK. Figures 7 and 8 display the average vowel and consonant durations of the three accents. On average, Australian vowels are 27% and 25% longer than those of the British and American accents respectively. In particular, diphthongs such as /ao/ and /ow/ in Australian are over 33% longer than those in British, and over 30% longer than those in American. British speakers produce shorter vowels at the beginnings and ends of sentences than American speakers [11]; this may be because British speakers tend to pronounce the last syllable quickly. As for consonants, Australian consonants are 5% and 6% longer than American and British consonants respectively. The nasal consonants /m n ng/, unvoiced stops /p t k/ and semivowels /w l r/ are longer in American than in British, whereas the affricates /jh ch/ are shorter in American than in British. Australian has longer /dh hh/ and shorter /z/ than British and American.
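Per-phone averages of the kind plotted in Figures 7 and 8 can be accumulated from HTK-style label files, in which each line gives a start time, an end time (both in units of 100 ns) and a label. The minimal parser below assumes one phone label per line with no score or auxiliary fields and no master label file (MLF) wrapper. The sample labels are made up for illustration.

```python
from collections import defaultdict

HTK_TIME_UNIT_S = 1e-7  # HTK label times are expressed in 100 ns units


def mean_durations(label_text: str) -> dict[str, float]:
    """Mean duration in seconds per label, from 'start end label' lines."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for line in label_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # this simple sketch skips blank or time-less lines
        start, end, label = int(fields[0]), int(fields[1]), fields[2]
        totals[label] += (end - start) * HTK_TIME_UNIT_S
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}


SAMPLE = """\
0 1500000 sil
1500000 2300000 d
2300000 4100000 ey
4100000 5000000 d
5000000 6200000 ey
"""

if __name__ == "__main__":
    for phone, dur in sorted(mean_durations(SAMPLE).items()):
        print(f"{phone}: {dur:.3f} s")
```

Averaging the per-file results over each accent's database, phone by phone, gives the comparison shown in the duration figures.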

Figure 9: Distribution of speaking rate in American, British and Australian accents.

Speaking rate (number/sec)   Phone   Word
British                      12.1    3.64
American                     11.6    3.1
Australian                   10.8    2.8

Table 8: Speaking rates of British, American and Australian accents.

Input \ Model   British model   American model   Australian model
British         12.8            29.3             34.9
American        30.6            8.8              29.94
Australian      33.1            27.3             7.28

Table 9: Word error rate (%) of cross-accent speech recognition between British, American and Australian accents.

Speaking rate describes speaking speed and can be measured by the number of syllables or words spoken per unit time and by the duration of pauses. Our analysis finds that the average speaking rate varies across accents. Table 8 and Figure 9 present the average speaking rates of the British, Australian and American accents and their distributions. The speaking rates of Australian and American English are respectively 23% and 15% lower than that of British English. That Australian English has the lowest speaking rate is also consistent with the elongated durations of Australian vowels. Unlike pitch rate, speaking rate is not consistently higher for female speakers than for male speakers across the three accents.
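The 23% and 15% figures correspond to the word rates in Table 8; measured in phones per second the gaps are smaller, about 11% and 4%. A short check:

```python
# Average speaking rates from Table 8 (units per second).
RATES = {
    "British": {"phone": 12.1, "word": 3.64},
    "American": {"phone": 11.6, "word": 3.1},
    "Australian": {"phone": 10.8, "word": 2.8},
}


def percent_slower(accent: str, unit: str, reference: str = "British") -> float:
    """Percentage by which the accent's rate falls below the reference accent's rate."""
    ref = RATES[reference][unit]
    return 100.0 * (ref - RATES[accent][unit]) / ref


if __name__ == "__main__":
    for unit in ("word", "phone"):
        for accent in ("Australian", "American"):
            print(f"{accent} {unit} rate: {percent_slower(accent, unit):.1f}% below British")
```

The smaller phone-rate gap suggests that part of the word-rate difference reflects longer segments and pauses rather than fewer phones per word, though the tables alone do not separate these effects.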

There is also an apparent correlation between speaking rate and the accuracy of automatic speech recognition. In matched-accent conditions (Table 9), American and Australian English achieve respectively 31% and 43% fewer word recognition errors than British English. Australian, with the slowest average speaking rate, gives the best recognition performance of the three accents. This may be because a slower speaking rate leads to more consistent pronunciation and hence better recognition results.

4.4 Accent Conversion

The differences in acoustic correlates across accents are used to morph the accents of spoken sentences among the British, Australian and American accent speech databases. The changes in the acoustic correlates of accent are carried out with the VoiceMorph software developed in our laboratory, which can change each acoustic correlate of a voice independently. Initial experiments mapped the phonetic, formant, pitch and duration correlates of accent between British, Australian and American English. The results show that acoustic correlates are an effective description of accent for changing the perceived accent of a voice from a source accent to a target accent.
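VoiceMorph itself is not described in this paper. As a rough illustration of mapping correlates from a source to a target accent, the sketch below applies three simple transformations. F0 undergoes a mean-and-variance mapping using the female statistics of Table 4. F2 is scaled by a single global ratio, and a time-stretch factor is computed from the word rates of Table 8. The function names, the linear-Hz F0 mapping and the use of single global factors are all assumptions for illustration, not the method used in this work.

```python
import numpy as np

# Female F0 statistics from Table 4: (mean Hz, std Hz); each range is 3 * std.
F0_STATS = {
    "British": (184.0, 37.30),
    "Australian": (194.0, 21.21),
    "American": (195.0, 21.78),
}
# Word speaking rates (words/s) from Table 8.
WORD_RATE = {"British": 3.64, "American": 3.1, "Australian": 2.8}


def map_f0(f0_hz, source: str, target: str) -> np.ndarray:
    """Mean-and-variance mapping of voiced F0 values (Hz) from source to target accent.

    Frames with F0 == 0 are treated as unvoiced and left at 0.
    """
    mu_s, sd_s = F0_STATS[source]
    mu_t, sd_t = F0_STATS[target]
    f0 = np.asarray(f0_hz, dtype=float)
    out = np.zeros_like(f0)
    voiced = f0 > 0
    out[voiced] = mu_t + (f0[voiced] - mu_s) * (sd_t / sd_s)
    return out


def scale_f2(f2_hz, ratio: float) -> np.ndarray:
    """Scale second-formant values by one global target/source ratio (hypothetical)."""
    return np.asarray(f2_hz, dtype=float) * ratio


def duration_factor(source: str, target: str) -> float:
    """Time-stretch factor that moves the source word rate to the target word rate."""
    return WORD_RATE[source] / WORD_RATE[target]


if __name__ == "__main__":
    contour = [0.0, 170.0, 200.0, 230.0, 0.0]  # made-up British F0 frames (0 = unvoiced)
    print(map_f0(contour, "British", "Australian"))
    # 11% average Australian-over-British F2 difference, applied as a global ratio.
    print(scale_f2([1200.0, 1900.0], 1.11))
    print(duration_factor("British", "Australian"))  # stretch durations by this factor
```

A more faithful conversion would apply vowel-specific formant mappings and phone-specific duration ratios, as suggested by the per-vowel differences in Figures 2, 3 and 7.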

5. CONCLUSION

This paper presents a comparative analysis of the acoustic correlates of accents. Differences between accents are classified into two categories: phonetic transcriptions and acoustic correlates. Acoustic correlates of accent, such as formants, pitch, duration and speaking rate, are analysed for three major English accents. These acoustic correlates proved effective in accent morphing. Further applications of the analysis include accent recognition and accent compensation in automatic speech recognition, which are subjects of further investigation.

7. REFERENCES

[1] Wells J.C., Accents of English, vols. 1, 2 and 3, Cambridge University Press (1982).

[2] Harrington J., Cox F., Evans Z., "An Acoustic Phonetic Study of Broad, General, and Cultivated Australian English Vowels", Australian Journal of Linguistics 17, pp. 155-184 (1997).

[3] Watson C.I., Harrington J., Evans Z., "An Acoustic Comparison between New Zealand and Australian English Vowels", Australian Journal of Linguistics (1996).

[4] Humphries J., "Accent Modelling and Adaptation in Automatic Speech Recognition", PhD thesis, Cambridge University (1997).

[5] Arslan L., Hansen J., "A Study of Temporal Features and Frequency Characteristics in American English Foreign Accent", Journal of the Acoustical Society of America, vol. 102(1), pp. 28-40 (1997).

[6] Ho C.-H., "Speaker Modelling for Voice Conversion", PhD thesis, Department of Electronic and Computer Engineering, Brunel University (2001).

[7] Yan Q., Vaseghi S., "Analysis, Modelling and Synthesis of Formants of British, American and Australian Accents", ICASSP, Hong Kong, pp. 712-715 (2003).

[8] Harrington J., Cassidy S., "Dynamic and Target Theories of Vowel Classification: Evidence from Monophthongs and Diphthongs in Australian English", Language and Speech 37, pp. 357-373 (1994).

[9] Cruttenden A., Intonation, Cambridge University Press (1997).

[10] Taylor P., "Analysis and Synthesis of Intonation using the Tilt Model", Journal of the Acoustical Society of America, vol. 107(3), pp. 1697-1714 (2000).

[11] Yan Q., Vaseghi S., "A Comparative Analysis of UK and US English Accents in Recognition and Synthesis", ICASSP (2002).

6. ACKNOWLEDGEMENTS

We would like to thank the UK's EPSRC for funding project no. GR/M98036.
