
Prosody and Syntax in
Corpus Based Analysis of
Spoken English

by

Simon Christopher Arnfield

The University of Leeds
School of Computer Studies

December 14, 1994

Submitted in accordance with the requirements
for the degree of Doctor of Philosophy.

The candidate confirms that the work submitted is his own and that
appropriate credit has been given where reference has been made to
the work of others.


Abstract

This thesis attempts to show that it can be productive to analyse English prosody in terms of
syntax. Although differing prosodies are possible for a fixed syntax, it is demonstrated that an
utterance's syntax can be used to generate an underlying "baseline" prosody regardless of the
actual words, semantics or context. In order to analyse this, a British English spoken corpus is
needed which has both syntactic and prosodic information. Such a corpus (the Spoken English
Corpus (SEC), now known as the Machine Readable Spoken English Corpus (MARSEC)) is used
to calculate a number of statistical measures relating the prosodic (specifically the tonic stress
mark annotations) and the syntactic (specifically the part of speech tags) information.

This thesis explores the mapping between these two kinds of information. Models are devised
which implement the mappings and select, from the search space of possible annotations,
those with the highest scores. The mapping is applied in the models for prediction of stress
and prosodic annotations in new (part of speech tagged) text.

The models are used to demonstrate that there is a clear relationship between parts of speech
and the prosodic annotations in the Spoken English Corpus. The models may be exploited to
generate stress and prosodic annotations for text-to-speech applications in order to increase the
intelligibility and naturalness of the synthesized speech.


Contents

1 Introduction
1.1 Introduction
1.2 Motivation
1.3 Applications
1.3.1 Speech Synthesis
1.3.2 Speech Recognition and Understanding
1.4 Overview

2 Background
2.1 Introduction
2.2 Prosody
2.2.1 Definition of prosody
2.2.2 Auditory versus Acoustic Prosody
2.2.3 Functions of prosody
2.2.4 Representation of prosody
2.3 Syntax
2.3.1 Definition of Syntax
2.4 Approaches To Natural Language Processing
2.5 Spoken English Corpora and Prosodic Annotation
2.5.1 London-Lund Corpus
2.5.2 Polytechnic of Wales Corpus
2.5.3 SEC/MARSEC
2.6 Computational Use of Prosody
2.6.1 Speech Recognition and Understanding
2.6.2 Speech Synthesis

3 Relating Prosody and Word Class
3.1 Introduction
3.2 A Source of Data
3.3 A Need For Processing
3.4 Cross Referencing
3.5 By-Products
3.6 Summary

4 Preliminary Statistical Analysis
4.1 Introduction
4.2 Prosodic Annotation Statistics
4.2.1 Prosodic mark frequencies
4.2.2 Tone Unit lengths
4.2.3 Prosodic mark bigram frequencies
4.3 Cross-Reference Statistics
4.3.1 Co-occurrence tables
4.3.2 Ignoring Higher-Level Syntactic structures
4.3.3 Clustering word classes
4.4 Summary

5 Automatic Stress Annotation
5.1 Introduction
5.2 Stress Prediction
5.2.1 Search Mechanism
5.2.2 Scoring
5.2.3 Performance Measures
5.2.4 Context
5.2.5 Boundary Conditions
5.3 Performance
5.4 Improvements
5.5 Summary

6 Automatic Prosodic Annotation
6.1 Introduction
6.2 Expanding the Model
6.3 Model Design
6.3.1 Choice of Prosodic Marks
6.3.2 Estimation of Probabilities
6.3.3 The Model
6.3.4 Composite Model
6.4 Model Assessment
6.4.1 Performance Statistics
6.5 Summary

7 Conclusions and Future Work
7.1 Introduction
7.2 Review
7.3 Performance Measures
7.3.1 Tone Unit lengths in the Model
7.3.2 Analysis of Models
7.3.3 Word Class Models
7.3.4 Prosodic Mark Models
7.4 Future Work
7.4.1 Conversion to ToBI
7.4.2 Additional Constraints
7.4.3 Speech Synthesis
7.4.4 Parameter Improvement
7.5 General Conclusions

A SEC and MARSEC
A.1 Introduction
A.2 The Spoken English Corpus
A.2.1 History
A.2.2 Categories
A.3 MARSEC

B Syntactic Tagging of SEC
B.1 Introduction
B.2 Word Class Tags
B.3 Phrase/Clause Tags

C Testing Data
C.1 Corpus Texts: Category M
C.1.1 Section M02
C.1.2 Section M03
C.1.3 Section M04
C.1.4 Section M05
C.1.5 Section M07
C.1.6 Section M08
C.1.7 Section M09
C.2 Prediction Results
C.2.1 Extract from section M05

D Word-Class / TSM Co-occurrence figures
D.1 Tonic Stress Mark Frequencies
D.2 Word Class Frequencies
D.3 Tag/Tone Co-occurrences

E Punctuation and Boundaries

F Source Code
F.1 symbolify.c
F.2 ttalign.c
F.3 collate-tu.c
F.4 align-parse.c
F.5 splittule.c
F.6 transition.c
F.7 transgroups.c
F.8 segment.c
F.9 probability.c
F.10 probability3.c
F.11 probabilityc.c

Bibliography

List of Tables

2.1 Prosodic marks used in the SEC/MARSEC
4.1 Categories of the corpus used for analysis
4.2 Prosodic mark bigram frequencies
4.3 Co-occurrence table for 64 most frequent word classes
5.1 Stress Transition Table
5.2 Probability of a tone unit boundary following a stressed or unstressed word
5.3 Probability of a stressed or unstressed word following a Tone Unit boundary
5.4 Performance statistics for stress prediction model: percentage of words which are correctly stressed/unstressed in comparison to the two expert annotations and overall
5.5 Performance statistics for stress prediction model: percentage of completely correct tone units in comparison to the two expert annotations (BJW: Briony Williams, and GOK: Gerry Knowles) and overall (ALL)
5.6 Word classes in the groups
5.7 Performance statistics for stress prediction model using group transition probabilities
6.1 Performance scores for the training categories
6.2 Scoring relationship between predicted and annotated prosodic marks
6.3 Performance scores for the test category of the corpus
7.1 Word class tags with frequencies of 50 or greater showing percentage of correct predictions (when compared with the corpus annotations) for the stress prediction model (SPM) and the prosodic mark prediction model (PPM)
7.2 Prosodic marks showing prediction percentages for the composite prosody prediction model
A.1 Categories in the SEC/MARSEC
A.2 Sections in the SEC/MARSEC
B.1 Phrase and Clause labels
E.1 Punctuation/Tone Unit Boundary Co-occurrence Table

List of Figures

2.1 Waveform for the vowel e with one cycle or pitch period marked
3.1 Example of Prosodic annotation format
3.2 Example of Treebank format
3.3 Example output from cross-referencing from section B02
4.1 Frequency of prosodic marks
4.2 Relative frequencies of tone-unit lengths in terms of numbers of: words with tonic stress marks, words with prosodic marks, and words
4.3 Hierarchical clustering of 64 most frequent word classes
7.1 Relative frequencies of tone-unit lengths produced by the model in terms of numbers of: words with tonic stress marks, words with prosodic marks, and words
A.1 Diagram showing waveform, fundamental frequency, RMS energy, segmental, prosodic and treebank transcriptions


Acknowledgements

I would like to thank my supervisors Eric Atwell and Peter Roach for their support, advice, wisdom
and ideas. Gratitude must also go to my colleagues in the Speech Laboratory and in the School of
Computer Studies who helped me with a great many things.

I would also like to thank my family and friends, especially my wife Debra and my friend Dean
Brown, who have put up with my rantings over the last few years and gave me their confidence
and enthusiasm to continue when things looked bleak.

This research was funded by a Science and Engineering Research Council (SERC) research
studentship.


Chapter 1

Introduction

1.1 Introduction

This thesis investigates the relationship between parts of speech (or word class) and prosodic annotations,
specifically in the Spoken English Corpus. A probabilistic approach is taken to describe
the mapping from word class annotations to prosodic annotations. This mapping quantifies the
relationship.

It is shown that a strong relationship exists between word class tags and stress (see chapter 5),
but to a lesser extent there is also a relationship with prosodic annotations (see chapter 6). Models
developed to demonstrate the relationship achieve over 91% agreement with the original corpus
annotations for stress prediction and over 65% agreement for prosody (stress accent) prediction.

The principal aim in studying the relationship is to assess whether either can be of any use in
determining the other in speech synthesis, speech recognition and speech understanding applications.


1.2 Motivation

Lea [Lea80] (pp. 172-174) gives results of an experiment where five listeners were presented with
255 carefully designed spoken sentences. The listeners were asked to mark each syllable as either
stressed, unstressed or reduced. Lea collated their results and presented them as relative stress
level ordered according to syntactic category (see his figure 8-2, p. 174, in [Lea80]).

His experiment showed that articles, conjunctions, and prepositions were on average judged
reduced. Possessive determiners, relatives, copulatives, auxiliary verbs and pronouns were on
average unstressed, and main verbs, adjectives, sentence adverbs, nouns, quantifiers and command
verbs were on average stressed.

This evidence shows that there is a relationship between stress and word class. Within this
thesis similar data is analysed, but on a much larger scale, since there are over 1400 sentences and
the stress distinctions extend to cover a wide variety of stress accents. Such a task would not be
possible without the use of a machine readable corpus such as the Spoken English Corpus, the like
of which has not been available until recent years.

The results presented in chapters 4, 5, 6 and 7 confirm and expand upon Lea's results.

1.3 Applications

Information relating parts of speech to prosody has useful applications for speech recognition and
understanding and for speech synthesis.

1.3.1 Speech Synthesis

In speech synthesis we have a one-to-many relationship between the string of word class tags
for the words in an utterance and all the possible prosodic patterns that can be used with that
utterance.

It is apparent that context will affect the choice of prosodic patterns beyond the scope derivable
from word class information. Consider the utterance "Peter isn't here". The tonic (or main) stress
accent may be placed on any word to effect different emphasis and attitudinal circumstances. Viz:

1. PETER isn't here.

2. Peter ISN'T here.

3. Peter isn't HERE.

Here the capitals indicate the word taking the stress accent. The first might be uttered
if everyone except Peter were present and if he were expected to be. The second might be uttered
to correct a person's misconception about the presence of Peter. The final utterance might be
made if we expected Peter to be somewhere when he is not.

It will be realised that the possible variations in prosody have multiple purposes, covering
aspects of attitude, emphasis, given/new information and grammatical structure.

In this thesis we are only concerned with the information contained within the parts of speech
and how this relates to prosody, and hence only the last of the above aspects will have a major
bearing on this work. However, there is a certain amount of other information implicit within the
structure of the parts of speech.

Without the support of contextual information to distinguish between the above choices, the
best that can be aimed for in speech synthesis is a discourse neutral pattern, or a "baseline" pattern,
which would be the standard or default pronunciation in the absence of any major contextual
effects.

So, the main application of relating prosody and word class tags for text-to-speech
synthesis is in producing neutral prosodic patterns which, of course, may be modified by higher
level context or semantic processes. It is beyond the scope of this thesis to cover the realisation
of these prosodic patterns acoustically.

1.3.2 Speech Recognition and Understanding

In speech recognition the uses of prosody are more limited, because identifying prosodic information
in an utterance will not help significantly with the low-level identification of words or their word
classes. However, disambiguation between similar sounding words based upon pitch accents or
stress is a realistic possibility. For example, "which" and "witch" sound the same but will most
likely have different prosody. The word "which" (word class tag DDQ, a `wh-' determiner without
`-ever') is most likely to be unstressed, though it may be stressed approximately 1/3 of the time.
The word "witch" (word class tag NN1, a singular common noun) is highly likely to be stressed,
with a variety of stress accents. Thus by determining the presence of stress we can postulate that
"witch" is more likely, and by combining this with evidence from surrounding words for probable
syntactic structures we can remove the ambiguity. The data presented in chapter 4 and appendix D
is most useful for this task.
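As a minimal sketch of how such evidence might be combined, the following C fragment applies
Bayes' rule to the figures above. Only the 1/3 stress rate for DDQ comes from the text; the stress
rate assumed for NN1 and the equal priors are illustrative assumptions, not corpus figures, and a
real recogniser would also weigh language-model evidence from the surrounding words.

    #include <stdio.h>

    /* Sketch: disambiguating the homophones "which" (DDQ) and "witch" (NN1)
     * given that the word was heard as stressed. P(stressed | DDQ) = 1/3 is
     * taken from the text; the other figures are illustrative assumptions. */
    int main(void)
    {
        double p_stress_given_which = 1.0 / 3.0;  /* from the text */
        double p_stress_given_witch = 0.9;        /* assumed */
        double p_which = 0.5, p_witch = 0.5;      /* equal priors, for illustration */

        /* Bayes' rule: P(word | stressed) is proportional to
         * P(stressed | word) * P(word). */
        double num_which = p_stress_given_which * p_which;
        double num_witch = p_stress_given_witch * p_witch;

        printf("P(which | stressed) = %.2f\n", num_which / (num_which + num_witch));
        printf("P(witch | stressed) = %.2f\n", num_witch / (num_which + num_witch));
        return 0;
    }

With these figures the posterior for "witch" rises to about 0.73, matching the intuition above;
unequal priors drawn from context would shift the balance accordingly.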

The most useful application of prosody is likely to be in speech understanding. The identification
of stress or stress accents within an utterance may be useful in predicting deeper semantic
features such as given/new information, mood, use of irony or sarcasm, etc.

1.4 Overview

This thesis is presented in five main chunks, corresponding to chapters 3, 4, 5, 6 and 7. First, however,
a review of background information is given in chapter 2.

Chapter 3 deals with the problem of processing the data in the Spoken English Corpus in such
a way that statistical information may be gathered automatically. It describes an algorithm and
software that was devised to cross-reference the prosodic annotations with the word class
tag information.

Chapter 4 uses the results of this cross-reference to extract various statistical measures of
prosody and word class information, specifically the co-occurrence frequencies of each word class
with each prosodic mark. That is, the number of times that words with a given word class tag are
annotated in the corpus with each of the prosodic marks.
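As an illustration of the kind of counting involved, the following C sketch tallies co-occurrences
from a handful of hypothetical (wordtag, prosodic mark) pairs. The three-item tag and mark
inventories and the sample data are toy stand-ins, not the CLAWS tagset or the SEC annotation
scheme.

    #include <stdio.h>
    #include <string.h>

    /* Sketch: counting word class / prosodic mark co-occurrences.
     * The tiny inventories and the sample data are illustrative only. */
    static const char *tags[]  = { "NN1", "VB0", "DDQ" };
    static const char *marks[] = { "fall", "rise", "none" };

    int main(void)
    {
        /* One hypothetical (tag, mark) pair per corpus word. */
        static const char *pairs[][2] = {
            { "NN1", "fall" }, { "DDQ", "none" }, { "VB0", "none" },
            { "NN1", "fall" }, { "NN1", "rise" }, { "DDQ", "fall" },
        };
        int counts[3][3] = { { 0 } };

        for (size_t p = 0; p < sizeof pairs / sizeof pairs[0]; p++)
            for (int t = 0; t < 3; t++)
                for (int m = 0; m < 3; m++)
                    if (!strcmp(pairs[p][0], tags[t]) &&
                        !strcmp(pairs[p][1], marks[m]))
                        counts[t][m]++;

        for (int t = 0; t < 3; t++) {
            printf("%-4s", tags[t]);
            for (int m = 0; m < 3; m++)
                printf(" %s=%d", marks[m], counts[t][m]);
            printf("\n");
        }
        return 0;
    }

Each row of the printed table is the co-occurrence profile of one word class, which is exactly the
form of data the models of chapters 5 and 6 are built from.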

Chapter 5 describes and develops a model, using the co-occurrence frequencies, that can predict
with high accuracy which words (corresponding to their word class tags) should be stressed (as
opposed to left unstressed) in the speaking of an utterance. The predictions are made only on the
basis of the word class tags; the actual words are not relevant to the operation of the model.

Chapter 6 builds upon the success of the model, refining it to include stress accents in its
predictions instead of just making the stressed/unstressed distinction.

Chapter 7 presents a review and analysis of the models which demonstrates their potential
and limitations. It demonstrates how well each word class tag is modelled and how well each
prosodic mark is modelled. It shows that although stress prediction works well, the prosodic mark
prediction model is somewhat constrained in that it does not model the difference between stress
accents well: it has a predisposition to make use of the fall stress accent. Finally, future work
possibilities are identified.


Chapter 2

Background

2.1 Introduction

This chapter provides background information on prosody and syntax, Spoken English corpora
and the computational uses of prosody.

2.2 Prosody

The term prosody comes from Ancient Greek, and the study of it dates from that time if not
earlier. Robins [Rob67] says:

    Apollonius's son, Herodian, is best known for his work on Greek accentuation
    ... covering the field of the prosodiai ... The prosodiai were described in more detail
    by later scholiasts and came to include the distinctive pitch levels symbolized by the
    accent marks on written words ... It is interesting to see the Greek word prosodia
    covering very much the range of phonetic phenomena to which the term prosody has
    been applied ...

But Crystal [Cry69] (pp. 20-21) points out that no major work was done on prosody until the 16th
century:


    It is generally agreed that the earliest discussion of melody in spoken English is
    that of John Hart in his Orthographie, and The opening of the unreasonable writing of
    our Inglish toung. The former, published in 1569, has a long section on intonation (see
    Danielsson, 1955, pp. 199-201) ... In The opening of the unreasonable writing of our
    Inglish toung (1551, §§164-5; see Danielsson, 1955, pp. 147 ff.), there is an attempt to
    outline the nature of stress in English ...

Following Hart's work, Crystal (p. 22) says, was:

    Butler (1633), who provides the first connected discussion of the two main English
    tunes ... [and Flint (1740)] which involved reference to stress, there was no specific
    study until Steele (1775) and Walker (1787).

A review of the history of English prosody is given in Crystal [Cry69], sections 2.5 to 2.7.
However, there is not much work of relevance to this thesis until after the late fifties, when systems
of tonetic stress marks and the tone unit were given substance by O'Connor and Arnold [OA61].
The particular emphasis of this thesis is on machine readable corpora, which have not existed until
very recent years.

2.2.1 Definition of prosody

The terms used in the study of prosody are often ambiguous and confusing (for example, British
researchers tend to use the term prosodic whereas American researchers tend to use the term
suprasegmental). In fact some researchers tend to use prosody as a term synonymous with
intonation, or with intonation and intensity. The purpose of this section is to provide definitions of
the main terms, particularly those relevant to this work. In all cases British English is the only
language under consideration unless stated otherwise.

Prosody

Prosody is the term used to describe those features of speech that are considered to be nonsegmental,
usually agreed to include intonation, stress, loudness, rhythm, tempo and voice quality.
These contrast with segmental aspects of speech such as words, syllables or phonemes. Prosody
is often used as an alternative term to intonation. Hence prosody is the superordinate term used
to cover all of the above aspects, which will be described individually in the sections below.

In the work presented here prosody is used primarily to indicate stress and intonation, and to
a lesser extent loudness. No consideration is given to aspects of rhythm, tempo and voice quality.

Suprasegmental/nonsegmental

According to Roach [Roa92] (p. 105), this is "A term invented to refer to aspects of sound such as
intonation that did not seem to be properties of individual segments ... much British work has preferred
to use the term prosodic instead." Crystal [CQ64] (p. 341) also lists plurisegmental and superfix
as alternative terms.

Intonation

Intonation is the pattern of changing pitch of voice (or melody) over an utterance, used to convey
information. However, as Roach (p. 56) points out, it is often used in a "broader and more popular
sense, [as] equivalent to prosody, where variations in such things as voice quality, tempo and
loudness are included."

Intonation is used by speakers to convey emotions and attitudes, and (p. 57) "interesting relationships
exist in English between intonation and grammar", suggesting that intonation also plays
a role in guiding the listener through the structure of the utterance.

Descriptive frameworks have been developed to describe the changing pitch movements. The
tone unit is considered to be the basic unit of prosody and intonation, and is the approach most
widely used in Britain. See section 2.2.4.

Pitch

A sound with a periodicity is said to have a pitch. Speech is considered to be periodic, although
strictly speaking there are minor variations between cycles. Pitch is sometimes considered equivalent
to fundamental frequency (F0), but this is not the case: the fundamental frequency is the
acoustic counterpart to pitch. Pitch is a complex auditory perception. It is possible to perceive a
change in pitch when the fundamental frequency is fixed but signal intensity is slightly varied; an
increase in intensity produces a drop in pitch, as noted by Lehiste (p. 67) [Leh70].

Figure 2.1: Waveform for the vowel e with one cycle or pitch period marked.

Under most conditions pitch remains closely related to the fundamental frequency. Figure 2.1
shows the waveform for a vowel with one cycle or pitch period indicated. The more cycles there
are per second, the higher the perceived pitch. The relationship is not linear, however, and the mel
scale relates pitch to frequency (see Lehiste p. 65). Significant changes or contrasts in pitch give
rise to pitch accents.
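For concreteness, a commonly cited formulation of the mel scale (O'Shaughnessy's; the earlier
experimental scale Lehiste discusses differs in detail) is

    m = 2595 log10(1 + f / 700)

where f is the frequency in hertz and m the pitch in mels. A 1000 Hz tone comes out at roughly
1000 mels, but 8000 Hz maps to only about 2840 mels, reflecting the compression of perceived
pitch at higher frequencies.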

Accent

Accent refers to a prominence, sometimes called a pitch accent. This is distinct from stress in that
stress is more generally used to refer to other types of prominence, including loudness, length and
sound quality.

Accent may also refer to a particular way of pronouncing. For example, two English people
may say the same sentence but one may speak with Received Pronunciation whilst the other may
speak with a broad Yorkshire accent.

Stress

Stress is the term given to any form of prominence of syllables. Sentence stress is the most
prominent word in a sentence, and word stress is the stress pattern on the syllables within a word.
The position of stress within a word can determine its meaning, for example REfuse and reFUSE.

Stress is a complex topic and not completely understood. Components of stress include pitch
prominence, increased articulatory effort, loudness, and syllable lengthening.

The number of degrees (or levels) of stress is a subject of disagreement, yet it seems that no
more than three levels (unstressed, weakly stressed, and strongly stressed) are necessary.
See Fudge [Fud84] for more information.

Loudness

Roach (p. 68) has the following to say on loudness:

    [Loudness is the] auditory impression of the amount of energy in sounds. We all use
    greater loudness to overcome difficult communication conditions ... and to give strong
    emphasis to what we are saying, and it is clear that individuals differ from each other
    in the natural loudness level of their normal speaking voice. Loudness plays a relatively
    small role in the stressing of syllables, but it seems that in general we do not make
    very much use of loudness contrasts in speaking.

Crystal (p. 215) adds that it corresponds "to some degree with the acoustic feature of intensity
(measured in decibels (dB))" and he goes on to note that "other factors than intensity may affect
our sensation of loudness, e.g. increasing the frequency of vocal cord vibrations ..."

Rhythm

The timing and distribution of events in speech. Roach (pp. 93-94) notes:

    An extreme view (though quite a common one) is that English speech has a rhythm
    that allows us to divide it up into more or less equal intervals of time called `feet',
    each of which begins with a stressed syllable: this is called the stress-timed rhythm
    hypothesis.

Crystal (p. 307) defines rhythm as "the perceived regularity of prominent units in speech".

Speech Rate/Tempo

Simply the speed of speaking or rate of articulation, measured in, for example, syllables per minute.
Speech rate can be modified (from a speaker's normal speech rate, within certain ranges) for
semantic or emotional/attitudinal effects.

Pitch Range/Tessitura

The extent in pitch (between lowest and highest pitch) which a speaker usually uses in normal
speech. This may be extended or shifted for semantic or emotional/attitudinal effects.

Key

Crystal (p. 200) defines key in the following way:

    A term used by some sociolinguists as part of a classification of variations in spoken
    interaction: it refers to the tone, manner, or spirit in which a speech-act is carried out,
    e.g. the contrast between mock and serious styles of activity ...

However, Roach (p. 61) says that key "has generally been used simply to indicate a rough location
within the pitch range" and that the terms high key and low key have been used to describe the
fact that sometimes a speaker will make more use of the higher or lower part of their pitch range,
usually as a result of the emotional content of what they may be saying. See Brazil et al. [BCJ80],
chapter 2, for a more comprehensive description.

Voice Quality

Distinctive characteristics of a person's speech, such as breathiness or creakiness, are aspects of voice
quality. Speakers do, however, introduce variations in voice quality for particular purposes, for
example speaking in a soft voice to indicate sympathy or a harsh voice to show anger. Voice
quality is beyond the scope of this thesis, but see Laver [Lav80, Lav72] for a more authoritative
description of voice quality.

Juncture

Crystal (p. 197) classifies juncture as "phonetic boundary features which demarcate grammatical
units ..." and Roach (p. 60) describes it as the "way one sound is attached to its neighbours"
and "where one found in continuous speech phonetic effects that would usually be found preceding
or following a pause, the phonological element of juncture would be postulated". Crystal
demonstrates with an example:

    Word-division, for example, can be signalled by a complex of pitch, stress, length and
    other features, as in the potential contrast between that stuff and that's tough.

Roach lists some other examples: cart rack/car track, pea stalks/peace talks, great ape/grey tape.

2.2.2 Auditory versus Acoustic Prosody

Research into prosody usually falls into two camps: auditory (or impressionistic/subjective) and
acoustic. Acoustic research concentrates upon physical phenomena that can be measured
and would, for intonation for example, concentrate upon acoustic correlates
such as fundamental frequency, whereas an auditory approach would be more concerned with
perceptual phenomena such as pitch rises and falls.

For computer based work it is difficult to follow an auditory approach because of the problems
inherent in modelling complex perceptual effects. For example, it is relatively easy to measure F0,
but it is not possible to measure pitch, since it is a perception. See [Leh70, Cry69, Lav80, BCJ80,
Fud84] for relevant work.

2.2.3 Functions of prosody

The functions of prosody are diverse, and opinions are divided between emotion and attitude signalling
versus grammatical and lexical information. See [Cru86, Cry69], Laver [Lav94] (pp. 494-498)
and Roach [Roa91] (p. 163). It seems reasonable that prosody performs both functions; tone unit
boundaries, for instance, correspond with some major syntactic constituents. See section 2.6. Lea [Lea80] has
shown that there is a relationship between word class and stress. Prosody also directs attention
or focus, and can signal new or old information as well as contrasting, correcting and echoing
information. For example, see section 1.3.1.


2.2.4 Representation of prosody

An old transcription system, often referred to as "interlinear tonetic", represents intonation
as a sequence of dots or dashes (with curves representing stress accents) between two horizontal
lines (see [Roa91, Cru86] for examples). This is not a convenient representation scheme for
machine readable processing of intonation, though.

Cruttenden [Cru86] comments:

    there have been two alternative approaches to the analysis of English: the older British
    `whole tune' approach which describes the overall tunes associated with sentences ...
    and which does not therefore have a concept of nucleus and nuclear tone ... [and the]
    American approach involving pitch levels and terminal junctures.

The standard British approach has been the tone unit, with tonic stress movement markers
placed upon the syllables upon which a pitch accent starts. The structure of a tone unit is defined
as

    (pre-head) (head) tonic stress (tail)

The tonic stress is the only element that is required, the others being optional. Laver [Lav94] (p. 492)
says:

    Any legitimate utterance of English is made up of one or more intonational phrases,
    and each intonational phrase contains one intonational nucleus at which one of the
    possible nuclear tones is chosen.

If any words precede the tonic stress they comprise the pre-head; any words which are stressed
(but without accent) form the head. Words following the tonic stress are the tail. Typical tonic
stress accents include rises, falls, fall-rises, and levels.
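As an invented illustration (not an example from the corpus), take the tone unit "we could see
the LIGHThouse", with the tonic stress accent on "LIGHT-". The unaccented opening words "we
could" form the pre-head; "see the", beginning with the stressed but unaccented "see", forms the
head; "LIGHT-" carries the tonic stress; and the remaining "-house" is the tail.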

In British writing, another transcription system indicates the tune in a tone unit by a number
at the start of each tone unit, with diacritics to indicate variations.

One of the problems with the above system, comments Cruttenden [Cru86] (pp. 63-64), is the
confounding effect of the various distinctions of pitch range. Recent work, he says, has explored
the use of only two level tones (High and Low). He goes on to describe autosegmental intonation
(see his section 3.8.1).

Cruttenden reports that the model presented by Pierrehumbert (1980) uses "an LP-type metrical
representation of the text ... and secondly a tune represented by a sequence of high (H) and
low (L) tones ... the model constructs an underlying representation for tunes of English intonation
and a set of rules which transmutes such tunes into actual patterns of fundamental frequency".
Refer to section 3.9 of Cruttenden [Cru86] for a summary.

Another transcription, commonly used until recently in American writing, used one of a number
of pitch levels at crucial change points in a contour.

Crystal [Cry69] defines the most comprehensive prosodic and paralinguistic transcription, which
has been used in the London-Lund Corpus of Spoken English (see below).

2.3 Syntax

Although the emphasis of this thesis is upon the automatic generation of prosodic annotations
from their relations to word class, it is necessary to include a few definitions concerning syntax.

2.3.1 Definition of Syntax

Syntax is the grammatical arrangement of words, showing their connection and relation, or a set of
rules defining how words may be combined.

Part of Speech

The part of speech of a word is its grammatical identity, such as noun, verb, adjective, conjunction,
etc. Most dictionaries will list the parts of speech for words, although in corpus based natural
language processing there are usually many subdivisions of the basic classes given above. For
example, in the system used throughout this thesis there are 29 divisions within the noun class.

Some words may have more than one part of speech, which gives them different meanings in
different circumstances. For example, "record" as a verb is "to make a copy of something", whilst
as a noun it is "a disc of plastic from which music may be played".

Wordtag

A wordtag is a symbol used to annotate a word in a sentence (usually contained within a corpus)
to indicate the part of speech of that word. For example, NP1 is the wordtag for a singular proper
noun.

The symbols for wordtags vary amongst authors, but in all cases within this thesis the word
tags used are from the CLAWS4 (see section 2.5.3) part of speech tagging system. See appendix B
for a list of the tags used.

Note that some of the wordtags are very highly constrained; for example, the wordtag VBZ is
only used for is or 's.

Parsetree

A parse tree is a tree-like structure that shows the inter-relationship between parts of speech
and shows phrase and clause structures within a sentence. In this example wordtags immediately
follow the word they belong to, with a separating underscore character. Thus right_JJ is the word
right with wordtag JJ, which means adjective in this context. For example (SEC section A09,
sentence 9):

    [N They_PPHS2 N][V are_VBR [J right_JJ J]
    [Ti to_TO be_VB0 [J sceptical_JJ J]Ti]V]

The square brackets indicate phrase and clause structures and come in matching pairs, so the [J
bracket before the word right matches the J] bracket just after it. In this case They is a noun
phrase, right and sceptical are adjective phrases, to be sceptical is a clause with infinitive
head, and are right to be sceptical is a verb phrase. It can be seen that with the nested
matching brackets the above sequence could be drawn as a tree. For a full list of phrase and clause
labels refer to appendix B.
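A minimal sketch of how this notation might be scanned mechanically follows (bracket tokens are
space-separated here for simplicity; this illustrates the format only, not the cross-referencing
software described in chapter 3):

    #include <stdio.h>
    #include <string.h>

    /* Sketch: scanning the bracketed treebank notation shown above. */
    int main(void)
    {
        char line[] = "[N They_PPHS2 N] [V are_VBR [J right_JJ J] "
                      "[Ti to_TO be_VB0 [J sceptical_JJ J] Ti] V]";
        int depth = 0;

        for (char *tok = strtok(line, " "); tok; tok = strtok(NULL, " ")) {
            size_t n = strlen(tok);
            char *us = strchr(tok, '_');

            if (tok[0] == '[') {                /* opening bracket, e.g. [N */
                printf("%*sopen  %s\n", depth * 2, "", tok + 1);
                depth++;
            } else if (tok[n - 1] == ']') {     /* closing bracket, e.g. N] */
                tok[n - 1] = '\0';
                depth--;
                printf("%*sclose %s\n", depth * 2, "", tok);
            } else if (us) {                    /* word_TAG token */
                *us = '\0';
                printf("%*sword  %s  tag %s\n", depth * 2, "", tok, us + 1);
            }
        }
        return 0;
    }

The indentation of the output mirrors the bracket nesting, which is exactly the sense in which the
sequence "could be drawn as a tree".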

Treebank

A treebank is simply a collection of parse trees for each of the sentences in a text.

2.4 Approaches To Natural Language Process<strong>in</strong>g<br />

In this section I will briey outl<strong>in</strong>e some <strong>of</strong> the wider rang<strong>in</strong>g approaches taken to natural language<br />

process<strong>in</strong>g (NLP) which dier from the probabilistic corpus{based approach taken <strong>in</strong> this thesis.<br />

Roger Schank argued that whatever <strong>in</strong>formation is encoded <strong>in</strong> the organisation <strong>of</strong> language can<br />

be extracted directly without build<strong>in</strong>g an <strong>in</strong>termediate representation(p.32)[MA91]. That is, it is<br />

possible to process language without a system <strong>of</strong> grammar.<br />

Gazdar's et al Generalised Phrase Structure Grammar (GPSG 1 )[GKPS85], on the other h<strong>and</strong>,<br />

conta<strong>in</strong>s specic rules about possible <strong>and</strong> legal structures <strong>of</strong> the k<strong>in</strong>d which Schank avoids. However,<br />

approaches such as Gazdar's also conta<strong>in</strong> mapp<strong>in</strong>gs between syntactic rules <strong>and</strong> rules for<br />

semantic <strong>in</strong>terpretation. So, it is easy to view any structure syntactically <strong>and</strong> semantically.<br />

Both <strong>of</strong> these approaches to NLP use a system <strong>of</strong> rules. Rules for represent<strong>in</strong>g legal <strong>and</strong> illegal<br />

structures <strong>and</strong> rules for mapp<strong>in</strong>g between <strong>in</strong>terpretations <strong>of</strong> structures.<br />

Computer programm<strong>in</strong>g languages such as LISP <strong>and</strong> PROLOG (with its unication mechanism)<br />

have been <strong>in</strong>strumental <strong>in</strong> the design <strong>of</strong> systems such as GPSG. See Gazdar <strong>and</strong> Mellish's<br />

book on NLP <strong>in</strong> PROLOG[GM89] for an example.<br />

Representation of meaning was approached by Schank by defining a set of "atomic" primitives in terms of which everything else was defined. His theory of conceptual dependency was an attempt to find a minimal set of semantic primitives which can be used for the interpretation of all natural language texts. He qualified this by arguing that any two texts which have the same meaning should be represented in the same way. For example:

John loves Mary.



Mary is loved by John.

should both have the same representation 2. See, for example, Schank[Sch73]. Wilks's[Wil78] theory of preference semantics is similar but uses a much larger set of primitives.

2 Some linguists, however, would question whether these sentences are actually the same.

In contrast to the symbolic rule-based models mentioned above, connexionist approaches such as that of Waltz[Wal89] are based upon neural network models of language. Waltz lists the advantages of connexionism as:

Connectionist systems exhibit non-trivial learning ... [they] can be made fault-tolerant and error-correcting, degrading gracefully for cases not encountered previously. ... Connectionist architectures also scale well ...

He goes on to point out that:

In contrast, systems based on logic, unification and exact matching are inevitably brittle.

Similar to neural networks are Hidden Markov Models (HMMs) (see Rabiner[Rab90]), which are also used in non-symbolic, non-rule based approaches to NLP.

There are, then, two main approaches to NLP:

1. rule-based symbolic systems

2. non-symbolic connexionist or stochastic systems.

The work in this thesis, whilst it uses neither neural networks nor HMMs, falls into the latter category.

2.5 Spoken English Corpora and Prosodic Annotation

There are three main prosodic annotation schemes used in machine readable speech corpora: the system used in the SEC (standard British prosodic annotation), or variations on it, which is based upon O'Connor and Arnold[OA61]; the system used in the LLC, which follows Crystal[Cry69]; and the ToBI system[SBP+92], derived from Pierrehumbert[Pie87]. See below for information on these corpora. The reader is referred to the relevant references for information regarding the annotation information in corpora other than the SEC.

2.5.1 London-Lund Corpus

The London-Lund Corpus of Spoken English (LLC) derives from two projects: the Survey of English Usage at University College London, launched in 1959 by Randolph Quirk, and the Survey of Spoken English, launched in 1975 by Jan Svartvik at Lund University.

The LLC contains written as well as spoken material, including surreptitiously recorded material. Texts have been analysed grammatically and have a prosodic/paralinguistic analysis (following Crystal's conventions[Cry69]) which is held on typed cards. Only a fraction of this prosodic and paralinguistic analysis is available in a machine readable form. Greenbaum and Svartvik[Sva90] state that:

The basic prosodic features marked in the full transcription are tone unit boundaries, the location of the nucleus (ie the peak of greatest prominence in a tone unit), the direction of the nuclear tone, varying lengths of pauses, and varying degrees of stress. Other features comprise varying degrees of loudness and tempo (eg allegro, clipped, drawled), modifications in voice quality (pitch range, rhythmicality and tension), and paralinguistic features such as whisper and creak. Indications are given of overlap in the utterances of speakers. The full transcription and the grammatical analysis are available only on the slips at the Survey of English Usage at University College London.

The (machine readable) reduced transcription includes tone units, onsets, nuclei, nuclear tone direction (falls, rises etc.), boosters, pauses (2 degrees), and stress (2 levels).



2.5.2 Polytechnic of Wales Corpus

The Polytechnic of Wales Corpus (POW)[FP80] is not really suitable for analysis within this thesis since it is neither machine readable (being collected in the days prior to word processors) nor of suitable material, since it is a corpus of child speech. It is, however, worth a mention because it contains prosodic annotations as well as a full grammatical analysis.

It is worth noting that the original recordings have been rescued by Clive Souter and have now been copied to digital audio tape (DAT). The four volumes of transcripts may one day be scanned to make the POW corpus machine readable.

2.5.3 SEC/MARSEC

The Spoken English Corpus (SEC)[KT88] (also known as the Lancaster/IBM Spoken English Corpus) was compiled between 1984 and 1985 at the Unit for Computer Research on the English Language (UCREL), University of Lancaster, and the Speech Research Group at IBM UK Scientific Centre, Winchester. As the name implies, the SEC is a corpus of spoken British English (taken mainly from BBC Radio 4 broadcasts) which is available as lexicographically transcribed texts (with and without punctuation), as part of speech annotated texts, and as prosodically annotated texts. All annotations were produced manually with the exception of the part of speech annotated texts, which are semi-automatically produced. In addition, as a parallel resource, there is a treebank version of the corpus. The SEC is available through the International Computer Archive of Modern English (ICAME).

Together these form a very rich source of information and all are in a machine readable format. A potential drawback, but one solved in chapter 4, is that the differing versions of the corpus exist as separate entities that have evolved independently; they are not related to each other except by the fact that they cover the same speech material. See appendix A for a description of the contents of the corpus. For more comprehensive information refer to the original corpus documentation[KT88].

The corpus size is approximately 52,000 words, which is quite small by the standards of modern computer corpora. Although this may be a problem, the corpus is so richly annotated as to make it a very desirable resource.

The Machine Readable Spoken English Corpus (MARSEC)[RKVA94, GAR92] is an extension to the SEC in which the original acoustic data of the corpus has been digitised and made available on CDROM. To complement this, fundamental frequency, RMS energy, time-aligned segmental transcription and syllabic divisions have been added.

As a direct result of work done in this thesis, cross-references between the prosodic, part of speech, treebank, syllabic, and segmental transcriptions have been produced which form part of the Leeds (UNIX/waves based) version of the corpus, although this has not yet been released. The cross-reference allows direct links to be made from any point in the corpus between any two (or more) versions of the corpus annotations (including acoustic signal, F0 and RMS energy).

The part of speech annotations in the corpus were assigned at the University of Lancaster using their CLAWS[GLS87, Atw83] tagging program, which was first developed between 1981 and 1983 at the Universities of Lancaster, Oslo and Bergen.

The prosodic annotations were produced manually by two expert transcribers 3 using a system based upon O'Connor and Arnold[OA61]. The corpus was annotated with 16 prosodic marks. The " and # symbols could be used in conjunction with any of the high or low level TSMs. Of the tone unit boundaries, only one transcriber used the hesitation boundary. A major boundary existed where there was a pause; a minor boundary where there was a boundary without a pause. Hesitation boundaries were placed in instances where there was a pause but one would not normally expect to find a boundary. See table 2.1.

3 Dr. B. Williams and Dr. G. Knowles

2.6 Computational Use of Prosody

Even though prosody is an integral part of the speech act and conveys several types of information, it has hardly been exploited in computational systems such as speech synthesis and recognition. Waibel[Wai90] claims that



" higher than predictable pitch<br />

# lower than predictable pitch<br />

low level<br />

low fall<br />

low rise<br />

low fall{rise<br />

low rise{fall<br />

high level<br />

high fall<br />

high rise<br />

high fall{rise<br />

high rise{fall<br />

stressed but unaccented<br />

k major tone unit boundary<br />

j m<strong>in</strong>or tone unit boundary<br />

* hesitation tone unit boundary<br />

Table 2.1: Prosodic marks used <strong>in</strong> the SEC/MARSEC<br />

To this day, the prosodic cues in the speech signal, duration, rhythm, intensity, pitch and stress, are frequently being ignored in the implementation of speech recognition systems.

and that

Several attempts at using prosodic cues in speech recognition systems have mostly been limited to aiding syntactic analysis by hypothesizing phrase or clause boundaries (from pitch excursions) and/or hypothesizing phonemically reliable parts of the utterance ("islands of reliability") from the amount of stress signal.

Klatt has also commented on the little use made of prosody[Kla80, Kla90]:

While relatively little use has been made of prosodic information in most recognition systems described to date, some ideas for prosodic analysis have been proposed and tested (Lea, Medress, and Skinner, 1975).

Again the same opinion is expressed by Lea[Lea80]:

If there is one aspect of the information in the speech signal that seems promising and yet "untapped", it is the "suprasegmental" information ...



He goes on to document several prosodic correlates of linguistic structures that have potentially useful applications in computer speech technologies.

2.6.1 Speech Recognition and Understanding

This section demonstrates that although some work has been done on integrating prosody into speech recognition and understanding systems, there is still much to be gained.

It is commonly agreed that speech recognition can be improved by use of prosodic information. "Prosodic cues (fundamental frequency, segmental duration, and intensity contour) suggest a stress pattern for the incoming syllable string and thus could assist in lexical hypothesization" said Klatt[Kla80, Kla90]. Three types of prosodic knowledge source (duration and rhythm, stress, and intensity) were investigated by Waibel[Wai90] for use in a speech recognition system, and he showed that "dramatic overall improvements" were attained when they were used in combination with a speaker-independent phonetic word hypothesizer.

Longuet-Higgins[LH85] comments upon the possible uses that intonation may be put to for speech understanding. In particular he suggests that contrastive pitch movements may be used to identify the relative importance of words in an utterance and thus indicate emphasis or new information. Similarly, on the subject of prosodics in speech understanding systems, Woods[Woo85] says that in a speech understanding system it is necessary to have:

the ability to use cues such as intonation and rhythm to predict the possible syntactic structure of an utterance or to confirm or reject a proposed syntactic structure.

To date none of the speech recognition systems currently available commercially makes significant use of prosodic information.

2.6.2 Speech Synthesis

Prosody is an important and integral part of speech, and designers of speech synthesis systems have tried to model aspects of it such as, for example, rhythm[Isa85] and melody[Ste85], although the latter notes that (for French) "the gain in intelligibility is almost zero" but that the speech will be much more "pleasant". He also notes that "not all authors agree that prosody is directly related to the syntactic structure of a sentence".

The Klatt system (see [Kla87]) has a prosodic processing stage where prosodic contours are applied to the synthesised utterance taking into account syntactic structure. Word classes have the potential to affect the fundamental frequency contour, nine such classes actually being distinguished from one another. The relative height of the peak deflection of the F0 contour depends upon the level given to each word class. This, however, does not correspond to the way in which prosodic annotations are given to speech, that is: in terms of rises, falls and stresses.

There appears to be no system that can do the conversion between prosodic annotations (whether of SEC, LLC or ToBI type) and the acoustic signal (however, see chapter 3 of 't Hart[tHCC90]). Indeed it is not even known if there is a real acoustic correlate or if the transcriptions are auditory phenomena. Should such a process become available through the work of others, the work presented in chapters 5 and 6 will be able to generate annotations similar to those in the SEC and so become a link in the path from text-to-speech.



Chapter 3

Relating Prosody and Word Class

3.1 Introduction

Comparison of variations in prosody and word class annotations requires that one has both kinds of information for the same utterances. This information is provided in the SEC, but not in a unified way. This chapter addresses the problem of how to cross-reference between the prosodic and the word class annotations.

3.2 A Source of Data

With any statistical analysis a body of data is needed. For the task in hand this data must be speech which has been lexically transcribed, tagged with word classes and annotated by experts with prosodic information. Ideally the acoustic data should be available to allow reference between the annotations and the original speech. This is also preferable for future work in which the relationships between the prosodic annotations and fundamental frequency, RMS energy, and duration could be investigated[RAng, GAR92]. The data must also be in a machine readable form since the task of coding such information is highly time consuming.

There are many speech corpora available but few (at that time) offered the requirements of being: machine readable, British English, having word class and prosodic annotations, and providing acoustic data. The only obvious candidate, the Spoken English Corpus (SEC)[KT88, GAR92, Kno88], was, by chance, being updated jointly by Leeds University and Lancaster University to become the Machine Readable Spoken English Corpus (MARSEC)[RKVA94]. Other corpora are ruled out for various reasons: the Polytechnic of Wales Corpus (POW) is a corpus of child speech which has some prosodic annotations but is not in a machine readable form, and the London-Lund Corpus (LLC) has some sections of speech with prosodic annotations, however these are mostly not in a machine readable form and, due to the source of the data, the acoustic recordings are not available. The standard corpus does not have word class tags, although these now exist for some parts.

MARSEC provides in machine readable form the acoustic data along with fundamental frequency and RMS energy traces, syllabic divisions and segmental time alignment, as well as all the data that is contained within the SEC (see appendix A), along with the ability to cross-reference between forms of data. The MARSEC project was not due to finish until the end of this work, hence the cross-referencing described below has proved to be a useful addition to that project. Other speech corpora are discussed in chapter 2. The SEC and MARSEC are ideal sources for this work.

3.3 A Need For Processing

In an ideal world the information in the SEC would have been easily extractable; that is, prosodic information should be directly comparable to word class information and vice versa. The original structure of the SEC does not allow for this: word class information and prosodic information were held in parallel versions of the corpus text.

There are a number of differences between the two representations of the corpus texts that hold information on word classes and prosody. A manual of information to accompany the SEC Corpus[KT88] (sic) explains that, in general, transcriptions were produced from recordings of speech. These (unpunctuated) transcriptions were then used in conjunction with the recordings to produce the prosodic annotations. Upon occasion it was necessary to alter the transcription to allow for the appropriate placement of prosodic tone marks. For example, if a prosodic marker were to be placed on the final syllable of the word "19" it would be necessary to rewrite "19" as "nineteen". This generally happened with numbers and acronyms, but only where it was necessary, not in every instance.

Meanwhile the transcriptions were being punctuated and then tagged with word classes using the CLAWS tagging program. Modifications that were made to allow prosodic markers to be placed were not also applied to the punctuated or tagged annotations. Likewise, the representation of the word classes required changes to be made to the tagged annotation which were not propagated to other annotations, namely the treatment of enclitics such as "won't" (section A01 line 18) and "don't" (section A11 line 43), which are expanded into "will n't" and "do n't" (notice that this is not merely a case of splitting the two parts of the enclitic, so the process is not easily reversed without domain knowledge). This is to allow the insertion of the appropriate word classes. Compounds (which may be several words long, hyphenated or not) may be classified as a single word class and are therefore treated as a single word. Possessives such as "England's" (section A01 line 28) are split like enclitics into "England" and "'s", although it should be noted that the tagging scheme used in the original SEC (CLAWS1) did not do this and this is only applicable to the tagging scheme used to tag the parsed annotated corpus (CLAWS4). Finally, minor differences were introduced by changes of case for some words (which, the author understands, were caused by some rule-based mechanism in CLAWS).

Two very obvious differences between the annotations are that the tagged annotation contains punctuation and word classes whereas the prosodic annotation contains tone-unit boundaries and prosodic tone and stress markers.

The annotations were preceded by headers detailing the source, speaker(s) and other details. Comments enclosed within square brackets detailed such information as speaker changes or extracts omitted. The original SEC tagged files retained this information and tagged it as if it were a part of the original script, which it is not. Care must, therefore, be taken to ensure that these comments are not taken as data.

Since the treebank 1 contained word class information and phrase bracket information it was desirable to allow for use of this data, yet attempts to match the word class data and the treebank data with the prosodic data proved too arduous because of changes that had been made (for better or worse) to the word class data. This meant that although it was possible to match the word class or treebank data to the prosodic data, it was difficult to resolve the differences in punctuation between the word class and treebank versions. A decision was made to drop the word class data and use the treebank data exclusively. There is a distinct advantage to this since the treebank was tagged using CLAWS4, a more advanced tagging system than that used to tag the word class version.

1 The treebank is a further version of the corpus containing word classes (assigned using CLAWS4) and parse trees.

All these factors give good reason why it has not been an easy task to cross-reference data in the prosodically annotated version of the corpus with the word class information in the corpus. They also indicate that a fairly complex cross-referencing process is necessary. This is detailed in the remainder of this chapter.

3.4 Cross Referencing

The structure of the corpus and the problems described above meant it was not possible to answer questions such as "what word class has a given stressed word in the corpus been tagged with" or "list all the proper nouns sorted according to the type of prosodic marks they have". Cross-referencing between word class and prosodic information was an essential step. To this end software was written that would match up, word by word, the two data sources. Before the cross-referencing can begin it is necessary to preprocess the corpus data to transform it into a useful format. Figure 3.1 shows an extract of the prosodic annotation format and figure 3.2 shows an extract of the treebank annotation. See appendix F for the source code.



The preprocessing (accomplished with UNIX 2 tools) produces two files from each of these formats. The first two files, produced from the prosodic annotation, contain one word per line (here we define "word" as any sequence of characters delimited by whitespace). In one file the words retain their prosodic annotations and in the other these are removed, just leaving the word. Tone unit boundaries are treated as words in this context and appear on a separate line. Comments contained in square brackets are discarded. The second two files, produced from the treebank, are similar: one file contains each word (separated from the word class tag) and the other contains the word class. Phrase brackets and sentence numbers are discarded. Punctuation is treated as a word in these files (a punctuation symbol being given the word class tag of itself). As a further aid to the following stage the case of letters in the two word files is converted into lowercase where appropriate.

This preprocessing stage is doubly useful in that it not only simplifies the cross-referencing stage but also insulates the body of the process from the potentially variable annotation formats. Hence by changing the preprocessing stage this software may be used with alternative annotation formats. Before the treebank format was adopted as a source of word classes the word-class tagged version of the SEC was used; this change was accomplished with a minor change to the preprocessing stage.

2 UNIX is a registered trademark of UNIX System Laboratories.
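The following minimal Python sketch illustrates the preprocessing of the prosodic annotation. It is illustrative only (the actual software is in appendix F); the file handling, the stand-in set of mark characters, and the treatment of the boundary symbols k, j and * are assumptions:

import sys

BOUNDARIES = {"k", "j", "*"}   # major, minor and hesitation tone unit boundaries
MARK_CHARS = '"#\\/^v='        # stand-ins for the TSM glyphs of table 2.1

def preprocess(infile, annotated_out, words_out):
    """Split the prosodic annotation into two one-word-per-line files:
    one keeping the prosodic marks, one with bare lowercased words.
    A 'word' is any whitespace-delimited token; boundaries count as words."""
    with open(infile) as src, \
         open(annotated_out, "w") as ann, \
         open(words_out, "w") as bare:
        for line in src:
            line = line.strip()
            if line.startswith("[") and line.endswith("]"):
                continue                      # square-bracket comments are discarded
            for token in line.split():
                ann.write(token + "\n")       # word with its prosodic annotation
                if token in BOUNDARIES:
                    bare.write(token + "\n")  # a boundary appears on its own line
                else:
                    bare.write(token.strip(MARK_CHARS).lower() + "\n")

if __name__ == "__main__":
    preprocess(sys.argv[1], sys.argv[2], sys.argv[3])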

The next stage is the use of the program ttalign (see section F.2). The program works like this: the corresponding entries are read from the two files produced by the preprocessing stage from the prosodic annotation, and similarly for the two files produced from the treebank. The program then compares the word from the prosodic annotation with the word from the treebank. If these words match, an output line is generated from the input. As a special case, punctuation is treated as equivalent to tone unit boundaries where they coincide (remember that tone unit boundaries only exist in the prosodic annotation and punctuation only exists in the treebank). Two other possibilities are that the prosodic annotation word is a tone unit boundary and the treebank word is the next word after the boundary, in which case a filler symbol ({TU}) is inserted to match with the tone unit boundary on the output line; or, alternatively, that the treebank word is punctuation and the prosodic annotation word is the next word after the punctuation, in which case a filler symbol ({PN}) is inserted to match the punctuation on the output line. The symbol {TU} stands for "a tone unit boundary that does not coincide with any punctuation" and the symbol {PN} stands for "a punctuation symbol that does not coincide with a tone unit boundary".

If none of these situations holds, that is, if the two words do not match and neither is a punctuation symbol or a tone unit boundary, then the mismatch handling routine is called. In the mismatch handling routine an output line is generated from the mismatching entries and new input is read. If these words now match then it is assumed that the previous mismatch was an error in the corpus or that the words were different representations of the same thing, e.g. "nineteen" and "19". In this case another output line is generated and processing continues as normal.

If the new input words do not match, a two way lookahead stage is entered. In this the prosodic annotation word is compared with the next few treebank words, and the treebank word is simultaneously compared with the next few prosodic annotation words, until a match is found in either search or until a fixed distance ahead has been viewed. If a match is found then, depending upon which stream the match is found in, the program assumes that the mismatch must have been caused by either an enclitic or a compound word. As previously noted, enclitics such as "won't" are represented as "will n't" in the treebank. The software will match "won't" with "will" but will insert a filler symbol ({EN}) meaning enclitic, which will be matched with "n't". For compounds, for example "search and destroy" (section a10 line 43), the software will match "search" with "search and destroy" and will insert a filler symbol ({CP}) meaning compound word, which will be matched with the next two lines: "and" and "destroy".
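The central loop of such an aligner can be sketched in Python as follows. This is a simplification written for illustration (ttalign itself is in section F.2); the symbol sets, the window size and the exact mismatch handling are assumptions:

def align(pros, tree, window=4):
    """Simplified word aligner in the spirit of ttalign.
    pros: words from the prosodic annotation (boundaries k/j/* included).
    tree: words from the treebank (punctuation included; a compound is a
    single entry containing spaces, an enclitic is split over two entries).
    Yields (prosodic, treebank) pairs with {TU}/{PN}/{EN}/{CP} fillers."""
    BOUND, PUNCT = {"k", "j", "*"}, set(".,;:?!")
    i = j = 0
    while i < len(pros) and j < len(tree):
        p, t = pros[i], tree[j]
        if p == t or (p in BOUND and t in PUNCT):
            yield p, t                      # direct match, or boundary/punctuation coincide
            i, j = i + 1, j + 1
        elif p in BOUND:
            yield p, "{TU}"                 # boundary with no punctuation opposite
            i += 1
        elif t in PUNCT:
            yield "{PN}", t                 # punctuation with no boundary opposite
            j += 1
        elif i + 1 < len(pros) and pros[i + 1] in tree[j + 1:j + 1 + window]:
            # enclitic (or a one-for-one mismatch such as "nineteen"/"19"):
            # match the heads, then pad the prosodic side with {EN}
            yield p, t
            j += 1
            while tree[j] != pros[i + 1]:
                yield "{EN}", tree[j]       # e.g. {EN} against "n't"
                j += 1
            i += 1
        else:
            # compound: one treebank entry spans several prosodic words;
            # match the first word, pad the treebank side with {CP}
            yield p, t
            for _ in t.split()[1:]:
                i += 1
                yield pros[i], "{CP}"       # e.g. "and", "destroy" against {CP}
            i, j = i + 1, j + 1

Run over the streams of figures 3.1 and 3.2, a loop of this kind produces paired output in the manner of figure 3.3.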

It is, of course, possible that the two way lookahead will not turn up a match. In this case an interactive mode is entered that allows the user to specify one of the four filler symbols ({TU}, {PN}, {EN}, and {CP}); this is repeated until the input streams are again in synchronisation (i.e. the prosodic annotation word matches the treebank word). This is a very rare occurrence and it was not justifiable to add the extra complexity necessary to handle the conditions under which it occurred. It happened when a compound word was followed by a tone unit boundary which did not match punctuation, or an enclitic was followed by punctuation which did not match a tone unit boundary. Since the lookahead does not check for punctuation or tone unit boundaries it can easily get confused, thinking that, say, the tone unit boundary is part of the compound. For this reason, and to save adding recursive levels of lookahead, the interactive mode is entered. When synchronisation is achieved the user exits this mode and the program continues as normal.

The whole process is repeated for each input line until the data are exhausted. The output from ttalign is not the end of the cross-reference process: two further stages are necessary. A post-processing stage handles the tricky problem of what to do if more than one punctuation symbol coincides with a tone unit boundary. The ttalign program will only match the first such punctuation symbol with the tone unit boundary; the remaining punctuation symbols will be tagged with the {PN} filler symbol, which means that the punctuation does not match a tone unit boundary. This is wrong, and to fix it the program collate-tu (see section F.3) will find such instances and convert the {PN} fillers into {CTU} filler symbols, which stand for collated tone unit (a single tone unit boundary that matches multiple punctuation symbols). Which of the punctuation symbols actually matches the tone unit boundary (as opposed to the filler symbol) is determined by an order of precedence. This usually only occurs with quotes and brackets, which are given less precedence than the other punctuation symbols. With hindsight it would have been preferable to use two or three filler symbols, such as {CTU1}, {CTU2}, and {CTU3}, used respectively with the three different types of tone unit boundary. As it stands it is not possible to tell (without reference to the context) whether punctuation aligned with the {CTU} filler was at a major, minor or hesitation tone unit boundary. This does not have any effect on the research presented here but may bear on automatic segmentation of the text into tone units.
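A sketch of this collation step, assuming the aligned output is a sequence of (prosodic, treebank) pairs as in the sketch above (the real program is collate-tu, section F.3):

def collate_tu(pairs):
    """Convert {PN} fillers that directly follow a matched tone unit
    boundary into {CTU} fillers (punctuation collated with that boundary)."""
    out = []
    after_boundary = False
    for p, t in pairs:
        if p in {"k", "j", "*"} and t != "{TU}":
            after_boundary = True            # boundary matched with punctuation
        elif p == "{PN}" and after_boundary:
            p = "{CTU}"                      # extra punctuation at the same boundary
        else:
            after_boundary = False
        out.append((p, t))
    return out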

The final stage of the cross-referencing uses the word classes to guide the program align-parse (see section F.4) while it re-inserts the treebank phrase brackets. Although the phrase brackets are not actually used here, it is hoped that future research may be able to improve on results by using the contextual knowledge embodied within them. Figure 3.3 shows some example output from the alignment process (here without the phrase brackets). The first column contains the word class (see appendix B for an explanation of the codes used), the second column contains the prosodically annotated word and the third column contains the lexical word. Notice how tone unit boundaries and punctuation are handled. These are not "words", and some researchers choose to add an extra field for information such as this, placing the punctuation next to the word which it follows. This approach has not been adopted because of the additional complexity of processing the data.

3.5 By-Products

The above has also been useful in the production of cross-referencing data in the MARSEC project[GAR92]. The ability to relate temporal information to the treebank is provided as a direct by-product of this software. Prosodic annotation words are located in a cross-reference produced by the above algorithm, the location within the parsetree is identified, and a cross-reference table is produced.

3.6 Summary

The chosen data source, the SEC, contains word class, parsetree and prosodic information which could not directly be cross-referenced. A semi-intelligent algorithm was used to produce cross-referencing between these annotations which coped with representational differences with little domain knowledge. The results may be used to make direct comparisons between word class and prosodic annotations.


[001 SPOKEN ENGLISH CORPUS TEXT A01]
[In Perspective]
[Rosemary Hartill]
[Broadcast notes: Radio 4, 07.45 a.m., 24th November, 1984]
[Transcriber: BJW]

#Good morning k "more news about the Reverend Sun Myung Moon j founder of the Uni fication Church j who's currently in jail j for tax evasion k "he was a warded an honorary de gree last week j by the Roman Catholic Uni versity of la Plata j in Buenos Aires j Argen tina k in an nouncing the a ward in New York j the rector of the uni versity j #Dr Nicholas Argen tato j de scribed Mr Moon as j a prophet of our time k

Figure 3.1: Example of prosodic annotation format.

SA01 1 v
SA01 2 v
[N Good JJ morning NN1 N] . .
SA01 3 v
[N More DAR news NN1 [P about II [N the AT Reverend NNS1 Sun NP1 Myung NP1 Moon NP1 , , [N founder NN1 [P of IO [N the AT Unification NN1 church NN1 N]P]N] , , [Fr [N who PNQS N][V 's VBZ currently RR [P in II [N jail NN1 N]P][P for IF [N tax NN1 evasion NN1 N]P]V]Fr]N]P]N] : : [N he PPHS1 N][V was VBDZ awarded VVN [N an AT1 honorary JJ degree NN1 N][Nr last MD week NNT1 Nr][P by II [N the AT [ Roman JJ Catholic JJ ] University NNL1 [P of IO [N la &FW Plata NP1 N]P][P in II [N Buenos NP1 Aires NP1 , , Argentina NP1 N]P]N]P]V] . .
SA01 4 v
[P In II [Tg announcing VVG [N the AT award NN1 N][P in II [N New NP1 York NP1 N]P]Tg]P] , , [N the AT rector NNS1 [P of IO [N the AT university NNL1 N]P] , , [N Dr NNSB1 Nicholas NP1 Argentato NP1 N]N] , , [V described VVD [N Mr NNSB1 Moon NP1 N][P as II [N a AT1 prophet NN1 [P of IO [N our APP$ time NN1 N]P]N]P]V] . .

Figure 3.2: Example of treebank format.



NNJ     BBC             bbc
NN1     news            news
{TU}    j               {TONE-UNIT}
II      at              at
MC      eight           eight
RA      o' clock        o'clock
{TU}    j               {TONE-UNIT}
II      on              on
NPD1    Saturday        saturday
,       j               ,
AT      the             the
MD      twenty- second  twenty-second
IO      of              of
NP1     June            june
.       k               .
DD1     #this           this
VBZ     is              is
NP1     Brian           brian
NP1     Perkins         perkins
.       k               .

Figure 3.3: Example output from cross-referencing, from section B02.



Chapter 4

Preliminary Statistical Analysis

4.1 Introduction

This chapter will explain how statistics were extracted from the corpus and how these statistics may be used to estimate probabilities for the co-occurrence between prosodic stress marks and word classes, and probabilities for sequences of prosodic stress marks and word classes. Together these probabilities are used in later chapters to build up a probabilistic grammar of prosody that may be generated from the word classes for an utterance. Other statistics gathered are used to provide descriptions of the corpus annotations. These are used to provide operational parameters for the context in which the grammar will work and provide information on the factors of the prosodic annotation that are not covered by the grammar.
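As a sketch of the kind of estimation involved (the notation here is illustrative, not the thesis's own): writing f(m, c) for the number of times prosodic mark m co-occurs with word class c in the cross-referenced corpus, and f(m', m) for the number of times mark m immediately follows mark m', the relative frequency estimates are

    P(m | c) = f(m, c) / f(c)        P(m | m') = f(m', m) / f(m')

where f(c) and f(m') are the corresponding marginal counts.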

In this statistical analysis of the corpus there were two aims:

1. to extract information descriptive of the prosodic annotations;

2. to extract information helpful in relating the prosodic annotation to the syntactic annotation.

The analysis results described and presented here are extracted from the prosodic annotation and the cross-reference between the prosody and syntax produced in the previous chapter. Only a subsection of the corpus was used for these analyses because the corpus is comprised of a number of different speech styles (see appendix D) and there is a likelihood that unusual prosodic styles, such as those found in the poetry examples, may produce spurious results in the statistics. A subsection of the corpus (of approximately two thirds) was selected, roughly corresponding to a report style. This subsection comprised the categories listed in table 4.1. Category M was reserved for testing purposes.

Category   Style                                    #words   % corpus
A          Commentary                                 9066      17%
B          News Broadcast                             5235      10%
C          Lecture type I (general audience)          4471       8%
D          Lecture type II (restricted audience)      7451      14%
F          Magazine Style Reports                     4710       9%
M          Miscellaneous                              3352       6%

Table 4.1: Categories of the corpus used for analysis

Categories omitted were: Religious (E), Fiction (G), Poetry (H), Dialogue (J), and Propaganda (K).

4.2 Prosodic Annotation Statistics

Although there is little new information presented in these statistics, they are useful in providing descriptions of the annotations in the corpus and can serve as a metric to compare synthesized annotations against. Useful statistics to look at are the relative frequency of each of the prosodic annotations (see appendix D), the lengths of tone units, and the frequencies of prosodic mark bigrams.

4.2.1 Prosodic mark frequencies

Figure 4.1 shows the relative frequency of each of the prosodic marks (including unstressed) used in the annotation of the SEC. Unstressed words account for 47.1%, whereas rises, falls, fall-rises and level tones account for 40%. The remaining 12.9% is largely made up of the class stressed but unaccented. Rise-fall tones are so rare that they are negligible for our purposes. The one or two instances that actually do occur in the corpus were omitted, hence the count of zero in the histogram.

Mark                   Frequency
Unstressed                 12801
High Fall                   3511
Stress (unaccented)         2564
Low Fall                    2297
Low Rise                    1528
High Rise                   1511
High Fall Rise              1200
Low Level                   1158
High Level                   342
Low Fall Rise                261
High Rise Fall                 8
Low Rise Fall                  0

Figure 4.1: Frequency of prosodic marks

4.2.2 Tone Unit lengths

[Figure 4.2: a line graph of frequency (0 to 3300) against tone-unit length (0 to 10 words).]

Figure 4.2: Relative frequencies of tone-unit lengths in terms of numbers of: words with tonic stress marks; words with prosodic marks; and words.

The length of tone units (presented in figure 4.2) was calculated by counting the number of words with a prosodic mark (the dotted line), by counting the number of words with tonic stress marks only (the dashed line), and by counting the total number of words (the solid line). It is, of course, not meaningful to have a tone unit with zero words, but it is possible to have a tone unit with no TSM or stressed words. The total length of tone units (in terms of words) extends to over 35 words but this trails off to a frequency below 10 after a length of about 15 words. The average length of a tone unit is about 4 words.
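These length counts are straightforward to derive from the one-word-per-line prosodic file of chapter 3; a sketch, under the same illustrative conventions as before:

from collections import Counter

def tone_unit_lengths(words, boundaries=("k", "j", "*")):
    """Distribution of tone-unit lengths (in words) from the
    one-word-per-line prosodic stream, boundaries included."""
    lengths = Counter()
    n = 0
    for w in words:
        if w in boundaries:
            lengths[n] += 1   # a boundary closes the current tone unit
            n = 0
        else:
            n += 1
    return lengths
    # mean length: sum(k * f for k, f in lengths.items()) / sum(lengths.values())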

Of these measures of length, only the first two are useful for comparison with the same measures taken from any prosody synthesis model that is not concerned with the problem of segmentation into tone units. The models presented in chapters 5 and 6 are only concerned with synthesis of stress and prosodic mark annotations and not prosodic boundaries; these are taken as given. Under those conditions the number of words in a tone unit will not change, as the same tone unit boundaries are used.

4.2.3 Prosodic mark bigram frequencies

                                fall-rise   fall   rise   stressed   unstressed   boundary
fall-rise (high or low)               34      40     20        128          329       2044
fall (high or low)                    44     304    216       1072         2454       4834
rise (high or low)                    10      74     25        701          854       1162
stressed (level or unaccented)       953    2381    666       2436         5297       3767
unstressed                          1235    4939   1524       8210         8453       1463
boundary (k, j or *)                 319    1185    375       2953         8437        N/A

Table 4.2: Prosodic mark bigram frequencies (rows give the first mark of the pair, columns the mark immediately following)

The figures in table 4.2 show the absolute number of instances of each of the bigrams. So, for example, the frequency of a fall (either high or low) being immediately followed by a tone unit boundary is 4834 instances. For reasons applicable to the models developed in chapters 5 and 6, the bigram figures presented here group low and high tones together, as well as grouping the stressed but unaccented mark together with the low and high level tones. This group is often referred to here simply as stressed. The unstressed element refers to words (not syllables) that have no stress or prosodic mark annotated. It is interesting to note that 60.6% of tone units are ended with a falling, rising or fall-rise tone and only 11% with an unstressed word, supporting the view that tonic stress comes at the end of a tone unit.

It is interesting to note that none of the cells is zero. This shows that a probabilistic approach to language modelling is essential. A traditional generative approach to developing a "grammar of prosodic marks" following Chomsky[Cho57] would involve defining a set of rules to generate all and only "legal" prosodic mark sequences, and disallowing "illegal" sequences. Since all pair combinations are legal, a rule-based grammar derived from the bigrams alone would not be sufficient.
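Collecting bigram counts of this kind, and turning them into conditional probabilities, is a short computation over the grouped mark sequence; a sketch (the function and variable names are mine):

from collections import Counter

def bigram_counts(marks):
    """Counts of adjacent pairs in a sequence of grouped prosodic
    marks, e.g. ["stressed", "fall", "boundary", "unstressed", ...]."""
    return Counter(zip(marks, marks[1:]))

def conditional(counts, first):
    """Relative-frequency estimate of P(next | first) from the counts."""
    total = sum(n for (a, _), n in counts.items() if a == first)
    return {b: n / total for (a, b), n in counts.items() if a == first}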

4.3 Cross-Reference Statistics

The statistics here provide evidence for what has commonly been believed about the relationship between prosodic marks and word classes. In particular, they quantify the relationship and, by use of a larger set of word classifications than is normally considered, add detail. The relationship between punctuation and tone unit boundaries is also covered, by showing the frequency of co-occurrence of the marks in each annotation.

4.3.1 Co-occurrence tables

Of particular interest to this research is the frequency of co-occurrence of word classes with prosodic marks. The prosodic marks (known as Tonic Stress Marks or TSMs) fall on the syllable upon which a pitch movement starts, and the behaviour of the prosody between TSMs may be predicted[KT88]. The TSMs therefore encapsulate the essential changes in the prosody.

The SEC (as noted in chapter 2) marks every stressed syllable with a TSM, with the assumption that the final TSM in a tone unit is the tonic stress. It is this approach to the annotation that makes it possible to produce the co-occurrence table for word classes and TSMs. This could not be done with other prosodic annotation schemes such as ToBI[SBP+92], which have a rigid grammar from which individual elements cannot be extracted while ignoring the context from which they came. Of course a co-occurrence table (or any other statistical model) cannot readily be extracted from prosodic annotations unless these are machine-readable. The POW corpus was collated before word processing software was available, so the only "models" we can build from the POW must be based on intelligent observation of the four volumes of printed transcripts. From the cross-reference produced in chapter 3 this table is easy to produce: for each word class a count is made of each time it occurs with each of the possible TSMs plus the annotations of stressed but unaccented and unstressed(1).
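The construction just described amounts to a two-dimensional frequency count. A minimal sketch follows, assuming the cross-reference is available as (word class, prosodic mark) pairs; the pairs shown are invented examples, not corpus data.

    from collections import defaultdict

    # Aligned (word class tag, prosodic annotation) pairs from the
    # cross-reference. The pairs below are invented for illustration.
    aligned = [("NP1", "fall"), ("II", "unstressed"), ("NN2", "stressed"),
               ("NP1", "unstressed"), ("NN2", "fall")]

    # cooc[word_class][mark] = number of co-occurrences
    cooc = defaultdict(lambda: defaultdict(int))
    for tag, mark in aligned:
        cooc[tag][mark] += 1

    print(dict(cooc["NP1"]))  # {'fall': 1, 'unstressed': 1}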

One problem with this approach is that some multisyllabic words have more than one prosodic mark within them (for example "unscientific", section a01 line 64). Words with multiple prosodic marks comprise 1.2% of the sub-corpus used for the cross-reference, which is small enough to ignore, so it is assumed that all words will have only one TSM. This is an important assumption since it overcomes the difficulties associated with the fact that prosody and stress are usually syllable based but word class tagging is word based. Compound words are often treated as a single word (for example "battle-marked" (section a02 line 12) is not treated as two separate words "battle" and "marked" with word classes NN1 and JJ but as a single word with word class JJ) and cases such as this often disagree with the assumption. This work does not attempt to address this problem. Further work on the relationships between prosody and compound words looks like it would be a fruitful avenue to explore, but is outside the bounds investigated here.

Another problem is the use of ↑ and ↓, which may be used either on their own or in combination with other TSMs. This increases the number of possible (if not actually used) prosodic marks substantially, yet ↑ and ↓ occur very infrequently. The approach here has been to assume that the effect of ↑ and ↓ is highly semantic, contextual or pragmatic in nature, and hence they have been ignored. Thus there is no distinction made between TSMs marked with a high or low reset and those without.

Looking at the frequencies of TSMs (figure 4.1) it was noticed that rise-fall tones are also very infrequent. So infrequent that statistics gleaned from the few instances are likely to be in error; hence they have likewise been ignored.

In summary: with 168 different word classes occurring in the sub-corpus sample and a possible 34 different types of prosodic annotation per syllable there is a major problem with regard to sample sizes. In order to alleviate this problem certain prosodic annotation marks are ignored. These are the higher and lower than expected pitch level markers (↑ and ↓), and rise-falls, which together account for less than 0.6% of the data; ignoring them reduces the number of prosodic marks to ten. Even with this precaution many cells in the co-occurrence table will be zero, and correspondingly distribution probabilities based on the frequencies will be in error.

1. N.B. From here on I will not normally make the distinction between TSMs and stressed but unaccented and unstressed annotations.

A table of frequencies is produced (see table D.3). Presented in table 4.3 are the frequencies for the 64 most frequent word classes.

The same statistics can also be calculated for tone unit boundaries and punctuation symbols; see table E.1.

Although the phrase brackets are also aligned with the word classes and prosodic annotations, they are not actually used in the mappings defined here. Atwell[Atw94] has some findings which support the view that phrase brackets do not give much more information than the word class tags alone.

4.3.2 Ignoring Higher-Level Syntactic Structures

There are two good reasons to ignore the parse tree information available in the SEC treebank.

Firstly, there exist accurate word class taggers (e.g. CLAWS) but no parsers of equivalent accuracy[Atw93, Atw94, ASO88, O'D93, Wee94]. It would be much more useful to be able to predict prosody from word class tags alone, since natural language processing systems can predict these with confidence, whereas accurate parse trees cannot be generated automatically.

Secondly, parse trees are large structures in comparison with individual prosodic marks, and the SEC is not large enough to provide enough examples of given parse tree structures (or sub-structures) in correlation with prosodic mark sequences. It is very doubtful that any progress would be made from use of such limited information within the framework of this research.

This is why this thesis concentrates upon mapping only word class tags to prosody. Higher level syntactic structures would most likely only be useful for providing contextual or semantic information, or for segmenting the utterance into tone units, all of which are outside the scope of this work.



4.3.3 Clustering word classes

Using the data in the co-occurrence table it is possible to perform a prosody-based hierarchical clustering(2) of these word classes. Clustering of non-parametric n-dimensional data is a difficult task and many distance metrics exist (see Hughes[Hug94]). It is highly likely that an improvement on the clustering presented here would be possible. Advanced clustering techniques, however, are beyond the scope of this research.

By drawing a vertical line through the arcs in figure 4.3 the word classes may be divided into a number of groups. A line at the far left would give one group, slightly to the right would give two groups, etc. There would be n groups where the vertical line cuts through n horizontal lines; all the arcs to the right of the cut bracket all the word classes in each group. For example with two groups there would be word classes APP$ to DDQ in one group and DB to RG in the other group.

This leaves the problem of which group each of the other word classes (not in this figure) belongs to. Low frequency word classes have poorly defined co-occurrence vectors, which makes it difficult to know which cluster such a word class would group with. The best that can be achieved is to inspect the groups for patterns between the word classes. For example one cluster near the bottom of figure 4.3 has the word classes NN1, NN2, NNL1, NNT2, NNU and NNT1. A clear pattern exists here, and similar low frequency tags such as NNL2 could be added to this cluster.

2. The clustering was kindly performed for me by John Hughes using a technique described in Hughes[Hug94].
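For reference, an agglomerative clustering of this kind can be reproduced with standard tools. The sketch below is a minimal illustration using SciPy, not the procedure of Hughes[Hug94]; the co-occurrence vectors are invented stand-ins, not the corpus figures.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    tags = ["NN1", "NN2", "AT", "IO"]
    # Rows: per-tag co-occurrence probabilities with a few prosodic marks
    # (normalised frequencies). These numbers are invented for illustration.
    vectors = np.array([
        [0.60, 0.25, 0.15],   # NN1: mostly stressed
        [0.55, 0.30, 0.15],   # NN2: similar to NN1
        [0.05, 0.05, 0.90],   # AT: mostly unstressed
        [0.02, 0.03, 0.95],   # IO: mostly unstressed
    ])

    # Repeatedly merge the closest vectors (average-link clustering).
    tree = linkage(vectors, method="average", metric="euclidean")

    # Cutting the dendrogram to give two groups separates the noun-like
    # tags from the article/preposition-like tags.
    groups = fcluster(tree, t=2, criterion="maxclust")
    print(dict(zip(tags, groups)))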

4.4 Summary

In this chapter consideration has been given to the extraction of various measures to aid the mapping between word class and prosodic annotations in the Spoken English Corpus. In particular the range and frequency of the prosodic marks has been presented, along with their frequency within tone units. Prosodic mark bigram frequencies have been extracted which may be used to calculate likelihood scores for sequences of prosodic marks. It was noted that these figures indicate that, in common with general belief, the first word in a tone unit is usually unstressed and the last word in a tone unit usually carries the tonic stress.

A mechanism was described to cluster word classes into groups based upon their similarity of co-occurrence with prosodic marks. This has led to the idea of prosodically orientated word class groups.

The statistics so assembled here are used in chapter 5 to devise a model to generate stress patterns from word class tagged text.


Tag   [nine stress-mark columns: the TSMs plus stressed but unaccented]   U/str
APP$ 1 0 2 9 1 8 0 7 6 303<br />

VBDR 0 0 0 2 0 6 0 1 1 90<br />

VHD 0 0 0 2 1 6 0 0 3 75<br />

CC 5 1 1 12 1 18 0 6 57 683<br />

II 7 1 11 25 13 47 1 25 128 1847<br />

IW 0 0 0 4 1 5 0 3 12 157<br />

PPHS1 0 0 1 5 1 3 0 1 5 116<br />

VHZ 1 0 0 3 1 2 0 1 5 79<br />

VBZ 2 1 3 8 1 6 0 8 10 229<br />

EX 0 0 0 0 1 2 0 0 6 79<br />

PNQS 0 1 0 1 1 2 0 0 4 78<br />

PPIS2 0 0 0 0 0 4 0 1 5 96<br />

VBDZ 0 0 0 2 3 4 0 1 11 261<br />

VBN 0 1 0 1 2 1 0 0 5 104<br />

IF 1 0 0 1 1 0 0 0 11 254<br />

PPH1 0 0 1 1 0 0 0 0 16 245<br />

VB0 1 0 1 2 0 1 0 0 9 153<br />

AT 0 0 0 18 1 19 0 3 42 2016<br />

PPY 0 0 0 0 1 0 0 1 1 68<br />

AT1 1 0 0 1 1 3 0 1 12 739<br />

CST 0 0 0 0 0 0 0 0 8 260<br />

II22 0 0 0 0 1 0 0 0 0 70<br />

IO 0 0 0 1 0 0 0 0 7 896<br />

TO 0 0 0 0 0 0 0 0 3 410<br />

CCB 2 0 0 0 0 3 0 1 23 140<br />

NNSB1 2 0 0 1 1 3 0 0 12 60<br />

CSA 1 0 0 4 1 8 0 0 7 72<br />

VM 2 0 5 20 9 19 0 10 17 218<br />

PPHS2 1 0 0 5 2 2 1 6 7 97<br />

VBR 0 0 1 8 4 3 0 2 5 108<br />

VH0 1 0 0 3 3 4 0 5 13 123<br />

CS 3 1 0 12 3 17 0 0 30 92<br />

ICS 1 0 1 9 7 9 0 5 24 65<br />

DDQ 3 0 1 10 7 11 0 1 22 106<br />

DB 0 1 9 18 5 19 2 14 14 8<br />

NP1 91 52 129 191 131 191 37 155 222 117<br />

JJ 68 15 87 263 200 354 37 177 356 145<br />

VVN 67 8 62 107 96 115 16 56 152 60<br />

JB 7 0 3 23 18 28 1 20 40 24<br />

NN 14 3 13 23 14 23 3 6 45 28<br />

VV0 67 7 47 104 90 105 9 47 185 144<br />

RL 8 1 16 13 10 15 3 4 26 15<br />

RR 41 5 28 157 65 130 16 83 132 117<br />

RT 12 2 8 10 7 15 1 11 23 22<br />

DD 0 0 1 13 9 20 0 17 14 23<br />

MD 3 0 4 18 22 27 3 26 20 29<br />

XX 0 1 4 21 6 22 0 8 10 22<br />

MC 12 2 22 78 63 108 4 35 73 50<br />

MC1 4 1 2 10 11 22 0 12 18 18<br />

NNS1 3 3 3 8 9 15 2 5 31 13<br />

VVD 13 10 21 53 47 85 2 28 147 67<br />

VVG 21 3 6 54 51 71 5 20 105 42<br />

VVZ 11 2 9 20 21 33 1 4 54 40<br />

NN1 379 80 373 610 285 348 106 359 810 275<br />

NN2 160 26 172 236 134 130 45 124 364 141<br />

NNL1 15 4 20 23 13 11 2 8 26 19<br />

NNT2 12 1 13 21 4 4 1 7 15 14<br />

NNJ 7 5 6 20 12 3 7 8 30 19<br />

NNT1 15 3 19 32 13 15 4 12 62 35<br />

DD1 5 1 4 38 6 44 1 31 42 113<br />

MF 3 0 2 13 1 10 0 1 16 24<br />

NNO 3 0 2 9 7 4 0 10 14 26<br />

RP 14 4 24 33 11 18 4 9 24 66<br />

RG 0 1 0 14 6 22 0 5 19 52<br />

Table 4.3: Co-occurrence table for the 64 most frequent word classes (final column: unstressed).



[Dendrogram; leaf labels, top to bottom: APP$, VBDR, VHD, CC, II, IW, PPHS1, VHZ, VBZ, EX, PNQS, PPIS2, VBDZ, VBN, IF, PPH1, VB0, AT, PPY, AT1, CST, II22, IO, TO, CCB, NNSB1, CSA, VM, PPHS2, VBR, VH0, CS, ICS, DDQ, DB, NP1, JJ, VVN, JB, NN, VV0, RL, RR, RT, DD, MD, XX, MC, MC1, NNS1, VVD, VVG, VVZ, NN1, NN2, NNL1, NNT2, NNJ, NNT1, DD1, MF, NNO, RP, RG]

Figure 4.3: Hierarchical clustering of the 64 most frequent word classes.


Chapter 5

Automatic Stress Annotation

5.1 Introduction

In this chapter it will be shown that the placement of stresses(1) on words in an utterance may be largely predicted from word class classifications. This chapter is largely based upon the work presented in Arnfield[AA93].

Chapter 6 will expand on the ideas presented here to show how prosodic annotations may be calculated from word classes. The intention, as pointed out in section 1.3.1, is not to produce predictions that exactly match the annotations in the corpus, but to generate annotations that will act as a baseline annotation which may be built upon by semantic and contextual processes.

It is possible for a sentence to be stressed in different ways in different texts (contexts). A predictor based on sentence syntax, without any model of "text grammar" or inter-sentential cohesion, cannot hope to work perfectly. This leads to a problem of evaluation: if the predicted stress is different from that in the corpus, it need not necessarily be wrong.

There are a number of problems associated with this task (which are examined below):

1. For the purposes of clarity in writing I will refer to a stressed syllable or word here as one that has either a Tonic Stress Mark or is classified as being stressed but unaccented. Unstressed refers to words or syllables which have no prosodic annotation. (In some cases words are annotated solely with pitch resets. These are ignored in this research and such a word would be considered unstressed.) In general I refer to stress marks as meaning either of the above, and not a syllable which is stressed.


- Stresses are normally associated with syllables, not words.
- How does one decide which syllable stress will fall on?
- Enclitic words have more than one word class.
- Compound words have a single word class but multiple words.

A basic assumption of this research has been that it will only deal with at most one stress mark per word. This is not always the case in reality, but by examining the corpus it can be seen that this assumption holds for 98.8% of the words in the sections of the corpus used (see section 4.1). The assumption is such a useful one to make because it simplifies the mapping problem to mapping between a single word class (in most cases) and a single stress mark per word. In words that feature more than one stress mark, the most prominent stress takes precedence in this analysis.

The second problem follows on from the first: if a stress mark is to be placed (upon a syllable) within a word, which syllable should it go on? This is not a problem tackled by this research because of a second underlying assumption: that the stress mark will go on the syllable marked as carrying the primary stress in a dictionary. According to Fudge[Fud84]

    in English, the syllable singled out in a given word is nearly always the same one, irrespective of the context.

He notes two types of exceptions: (i) cases where the word has not been properly perceived by the hearer, and (ii) certain types of phrase which require a shift in word-stress. The problem would therefore be solved by a combination of dictionary lookup, using a machine readable dictionary or lexicon (such as the forthcoming edition of The English Pronouncing Dictionary[RH95]), and a rule-based approach as covered by Fudge. Several machine readable dictionaries and lexical databases commonly used for natural language processing research (e.g. LDOCE, OALD, Collins English Dictionary) include stress assignment information.

Enclitics (which are pronounced as single words) have more than one word class because they are formed from two (or more) words that become joined. For example "can't" is "can not" and "won't" is "will not". Two possibilities exist to deal with mapping between two (or more) word classes and a single word: (i) treat enclitics as a special case in the stress prediction stage, or (ii) treat them as a special case in the stage which places stress marks on syllables. The difference is either some additional complexity in the prediction model, or some simple rule-based approach to placing more than one stress mark on the same syllable. Because of the limited data on enclitics available in the corpus, the former approach would be difficult to implement and may have the effect of blurring the word classifications of enclitics. To avoid this potential problem the latter approach is assumed: a prediction is made for each part of the enclitic, and these are combined at a later stage with the rule that the most prominent stress mark assigned to any part of the enclitic is taken as the stress mark for the whole enclitic. If it is a multisyllabic enclitic, the stress mark will be placed on the primarily stressed syllable as indicated above.

Compounds, unlike enclitics, suffer from the problem that a single word classification is given for a phrase that may contain several words. For example "search-and-destroy" may be classified with a single word class. This will give a single stress mark prediction, but in reality more than one word may be stressed. In addition it is difficult to say which word should take the main stress. In effect what is needed is a lexicon of compound words that lists primary stress and gives rules for assignment of stress marks to the other words/syllables. Compound words are a special and difficult case that are not dealt with here; it would be outside the scope of this research to attempt to handle them effectively. Hence, within the constraints of the predictor model used, it will not be possible to predict stress marks for compound word sequences. However, Fudge[Fud84] (chapter 5) gives rules to deal with compounds.

5.2 Stress Prediction

The study (described in this chapter and extended in chapter 6) concentrates upon building a stochastic grammar model of stress based upon word class and the prosodic mark co-occurrence table.

If we can collect a number of "measures" of the relationship between word classes and prosodic marks, we can combine these measures together. Each differing measure of likelihood of relationship forms a constituent of the overall measure of relationship. Using a number of such constituents to relate one entity (such as word class) to another (prosody) is what Atwell[Atw83] has called constituent likelihood.

In the case of this research the measures are probabilities (or estimates of probabilities) of co-occurrence between word class and prosodic marks.

In the model developed here the distinction between different tones will be ignored and only two types of annotation (stress markers and unstressed markers) will be considered. Stress markers are considered to be any of the tonic stress marks or stressed but unaccented marks, whereas unstressed markers are considered to exist on words which have no prosodic marks whatsoever or only have the marks ↑ or ↓ (although if these exist in combination with a TSM the word is considered stressed). Since, with the assumptions made above, there will be no more than a single stress occurring within each word, one can look at a sequence of words as a sequence of stress and unstress markers. Each word is either stressed (i.e. contains a stressed syllable) or unstressed (no syllable is stressed). Since this is a binary string, for a sequence of n words there are 2^n possible sequences.

As an example: a three word utterance ("at Ford motors" (section B04 line 51)) would have eight possible stress patterns, where each word is in one of two states.

    1  at   Ford   motors
    2  at   Ford   *motors
    3  at   *Ford  motors
    4  at   *Ford  *motors
    5  *at  Ford   motors
    6  *at  Ford   *motors
    7  *at  *Ford  motors
    8  *at  *Ford  *motors

Here * indicates (some type of) stress on the word (others being unstressed). If we wish to assign


stress to the appropriate words in an utterance, we need to find which of the possible 2^n sequences are valid or acceptable. One way of doing this is to assign a score to each sequence and pick the highest scoring sequence as the pattern for the utterance. The scores are designed such that the "baseline" stress patterns generate the best scores.

5.2.1 Search Mechanism

For limited sizes of n the number of sequences is small enough to do a global search. If n were large, or if we were dealing with more than two possible annotation types for each word, then it would be necessary to use an alternative search methodology to cut down on the computational load. For example, for a 10 word sequence with two annotation types there are only 1024 possibilities, but if there were, say, five annotation types this number rises to 5^10 = 9,765,625, which is too large to search exhaustively in reasonable time. Alternative search possibilities exist (see section 6.3.4) where more annotation types are considered.

5.2.2 Scoring

How does one assign scores to the sequences? Various factors appear to be relevant, including semantics, pragmatics, word class and context, and clearly all of these could be used to provide components of the score. Since the aim of this research is to demonstrate a relationship between syntax and prosody, we will only use measures derived from word class and context, and will not attempt to implement any measures based upon semantics or other relevant factors.

One could try the following formula for scoring each possible sequence of length w words, where the function a_n gives the stress-state (or annotation) of word n (i.e. either stressed or unstressed), the function w_n gives the word class of word n in the sequence, and the function S(p, q) gives the probability (or likelihood) that word class p would have annotation q:

    score = \prod_{n=1}^{w} S(w_n, a_n)    (5.1)

That is: if we know the word class for each word in our utterance, and if we know the probability of each word class being associated with a stressed (or unstressed) word, then we can multiply all the probabilities together to give a value we call the score. (It is not actually necessary to use probabilities; any measure of likelihood would be acceptable. But using probabilities (in reality an estimate of the probability) means that all the measures of likelihood will always be in the range 0.0 to 1.0, and hence one does not have to worry about overflow errors when using very long sequences. Underflow errors can, of course, occur, but this is easy to handle since any sequence score that underflows is going to be very unlikely and therefore will not be a candidate.)
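A minimal sketch of this exhaustive scorer, assuming the per-class stress probabilities are already available (the values used below are taken from the worked example that follows):

    from itertools import product

    # P(stressed | word class); the unstressed probability is the complement.
    # Values for II, NP1 and NN2 follow the worked example below.
    P_STRESSED = {"II": 0.12, "NP1": 0.91, "NN2": 0.92}

    def score(tags, pattern):
        """Equation 5.1: product of per-word annotation probabilities.
        pattern is a sequence of 'S'/'U' states, one per word."""
        total = 1.0
        for tag, state in zip(tags, pattern):
            p = P_STRESSED[tag]
            total *= p if state == "S" else 1.0 - p
        return total

    def best_pattern(tags):
        # Global search over all 2^n binary stress patterns.
        return max(product("US", repeat=len(tags)),
                   key=lambda pattern: score(tags, pattern))

    print(best_pattern(["II", "NP1", "NN2"]))  # ('U', 'S', 'S')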

By summing frequencies for all stress marks in the tonic stress/word class co-occurrence table for each word class, the two values (the frequency of being stressed and the frequency of being unstressed) can be used to calculate probabilities of each word class being stressed or unstressed. For NP1 (see table D.3) we get the frequency of NP1 being unstressed at 117, and the frequency of NP1 being stressed at

    91 + 52 + 129 + 191 + 131 + 191 + 37 + 155 + 222 = 1199

The probability of NP1 being unstressed is then

    117 / (117 + 1199) = 0.09

and the probability of NP1 being stressed is

    1199 / (117 + 1199) = 0.91

Given that the word class is known for each word in the utterance(2), we can now state the likelihood of stress being present on each word. As an example consider:

2. It is assumed that this information may be derived automatically using a tagging system such as CLAWS.


    Word                              at      Ford    motors
    Word class                        II      NP1     NN2
    Probability of being stressed     0.12    0.91    0.92
    Probability of being unstressed   0.88    0.09    0.08

This means, for example, that the word class NP1 (singular proper noun) has a 91% chance of being stressed and a 9% chance of being unstressed. Then for each of the possible stress sequences the scoring would be as follows (S represents a stressed annotation, U represents an unstressed annotation, so USS means that "at" will be unstressed whilst both "Ford" and "motors" will be stressed):

    pattern   calculation               score
    UUU       0.88 × 0.09 × 0.08    =   0.006
    UUS       0.88 × 0.09 × 0.92    =   0.073
    USU       0.88 × 0.91 × 0.08    =   0.064
    USS       0.88 × 0.91 × 0.92    =   0.737
    SUU       0.12 × 0.09 × 0.08    =   0.001
    SUS       0.12 × 0.09 × 0.92    =   0.010
    SSU       0.12 × 0.91 × 0.08    =   0.009
    SSS       0.12 × 0.91 × 0.92    =   0.101

Using this simple scoring scheme reasonable results are achieved (in the order of half of the predictions matching the corpus annotation). In this case "at *Ford *motors" is the winning sequence by a long way; however the next best sequences "*at *Ford *motors", "at Ford *motors" and "at *Ford motors" are all plausible.

5.2.3 Performance Measures

An important consideration is now evident: how does one rate the performance of the annotations? That is: how does this generated annotation compare with real speech? This is a difficult question, but using the corpus as a guide, two possibilities are (see the sketch after this list):

- calculate the percentage of words whose annotation marks are the same as those annotated in the corpus;
- calculate the number of (tone unit) sequences that are entirely the same as those annotated in the corpus.
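Both measures reduce to simple counting over aligned predicted and corpus annotations. A minimal sketch, assuming annotations are given per tone unit as 'S'/'U' strings (the example data is invented):

    def word_accuracy(pred_units, corpus_units):
        """Percentage of words whose predicted mark matches the corpus."""
        pairs = [(p, c) for pu, cu in zip(pred_units, corpus_units)
                 for p, c in zip(pu, cu)]
        return 100.0 * sum(p == c for p, c in pairs) / len(pairs)

    def tone_unit_accuracy(pred_units, corpus_units):
        """Percentage of tone units predicted entirely correctly."""
        hits = sum(pu == cu for pu, cu in zip(pred_units, corpus_units))
        return 100.0 * hits / len(pred_units)

    # Invented example: two tone units, one wrong word in the second.
    pred   = ["USS", "USU"]
    corpus = ["USS", "USS"]
    print(word_accuracy(pred, corpus))       # 83.3...
    print(tone_unit_accuracy(pred, corpus))  # 50.0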

The latter performance measure was used initially, since synthesis worked on a sequence of several words at once, and one error in a tone unit might upset a listener's perception of normality. However this performance measure proved to be too inflexible to register minor improvements in performance (the main reason for use of such a measure is to assess the function of the model). Indeed the majority of errors that indicated poor performance were likely to be unpredictable without additional information such as semantics (after all, one would not expect 100% performance from this type of model). Such errors overshadowed those errors that could have been improved upon within the constraints of the model. It was also discovered that sequences that contained errors were only wrong in one place 69% of the time. The change to the former performance measure brought understandably higher percentage figures but, most importantly, provided better insights into how changes affected the synthesis results. Neither of these measures is wholly satisfactory, because they do not take account of the fact that alternative annotations to those given in the corpus may be acceptable. The matter will be returned to later.

5.2.4 Context

Something to notice about this simple scheme is that it takes no account of the order of the words. Hence "at Ford motors" (II NP1 NN2) and "Police in Yorkshire" (NN2 II NP1) will attain the same scores regardless of the word order, and regardless of whether the change in order dramatically changes the way that each word would be stressed (it is unlikely to in this case). There is strong reason to suggest that the order of stress annotations is important: as noted in section 4.2.3, there is a tendency for a TSM to come at the end of a tone unit, and it is relatively likely for an unstressed word to appear at the start of a tone unit. A refinement to formula 5.1 can take account of the sequence order. To calculate this we can use the probability of a stress occurring at each word and the probability of the stress sequence. Ideally a value is needed that represents the relative likelihood (or probability) of each sequence of word classes and stress annotations. However these values are difficult to extract, because the corpus is not large enough to provide enough examples of each word class sequence in different stress patterns to give reliable probability or likelihood measures.

Although it is not possible to extract likelihood measures for any arbitrary sequence of word classes, it is possible to approximate this by using likelihoods for shorter sequences and overlapping them. Making use of fixed length short sequences also considerably simplifies calculation of the score.

For example the sequence II NP1 NN2 could be divided into the two sequences II NP1 and NP1 NN2. The likelihood for II NP1 NN2 can be estimated as the product of the likelihoods for the sequences II NP1 and NP1 NN2. Two-symbol sequences like this are known as bigrams(3).

The new scoring metric, given in equation 5.2, includes components for both the likelihood of stresses occurring with word classes and the transition likelihoods for bigram sequences of word classes with specified stress annotations:

    score = \prod_{n=1}^{w} S(w_n, a_n) \times \prod_{m=2}^{w} B(w_{m-1}, a_{m-1}, w_m, a_m)    (5.2)

where S(p, q), w_n and a_n are as before, and B(p, q, p', q') is the probability (or likelihood) of the bigram of word class p followed by word class p', where p has annotation q and p' has annotation q'.
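Extending the earlier sketch with the bigram term of equation 5.2; the transition values here use the single-group simplification introduced below, so B depends only on the two stress states (values from table 5.1). Again this is a minimal illustration rather than the thesis implementation, and it reproduces the score table below to within the rounding of the quoted probabilities.

    from itertools import product

    P_STRESSED = {"II": 0.12, "NP1": 0.91, "NN2": 0.92}

    # Single-group simplification of B: the transition probability depends
    # only on the stress states, not the word classes (table 5.1).
    TRANS = {("U", "U"): 0.17, ("U", "S"): 0.39,
             ("S", "U"): 0.21, ("S", "S"): 0.23}

    def score(tags, pattern):
        """Equation 5.2: state probabilities times transition probabilities."""
        total = 1.0
        for tag, state in zip(tags, pattern):
            p = P_STRESSED[tag]
            total *= p if state == "S" else 1.0 - p
        for prev, cur in zip(pattern, pattern[1:]):
            total *= TRANS[(prev, cur)]
        return total

    tags = ["II", "NP1", "NN2"]
    for pattern in product("US", repeat=len(tags)):
        print("".join(pattern), f"{score(tags, pattern):.2e}")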

Bigram Probabilities

For the bigram probabilities four tables are needed, one for each of the possible stress transitions:

- unstressed → unstressed
- unstressed → stressed
- stressed → unstressed
- stressed → stressed

Each table has to hold the likelihoods for the transition from any of the 168 word classes to any other word class.

3. A bigram is a sequence of two symbols, such as two word class tags that follow each other in a text. Trigrams, or more generally n-grams, are three or n symbol sequences.

Four tables of 168 × 168 cell locations still need very many more samples than are available in the corpus of approximately 28,000 words(4) to produce reasonable likelihood measures. It is clear that even these values cannot be extracted. One (possibly extreme) solution to this problem (but see section 5.4) is to assume that all word classes behave similarly in these bigram probabilities. In all likelihood many word classes will behave similarly, and their likelihoods could be combined to give groups of word classes that occur similarly with regard to stress pattern/word class orders. The extreme case of assuming that all word classes behave similarly (which is equivalent to a single group) means that only four values (one per table) need to be estimated. In fact in this approach word classes are ignored. This is a simplification of B(p, q, p', q') such that p and p' are irrelevant and are ignored, or more generally all word classes are mapped to the same group. The grouping idea will be returned to later. These four probabilities can be calculated from the corpus prosodic annotations and are shown in table 5.1.

    first \ second   unstressed   stressed
    unstressed       0.17         0.39
    stressed         0.21         0.23

Table 5.1: Stress Transition Table.

This means that, for a bigram where the first word is unstressed and the second word is stressed, the probability is 0.39.

4. Section 4.1 lists the categories of the corpus used for this research. Note that some sections of these categories are also omitted because of problems between the various corpus versions.
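Estimating these four values is again a counting exercise over the aligned annotations. A minimal sketch, assuming each tone unit is available as a string of 'S'/'U' word states (the example data is invented), normalising over all four cells jointly as in table 5.1:

    from collections import Counter
    from itertools import pairwise

    # Each string is one tone unit: per-word stress states ('S'/'U').
    # The data here is invented for illustration.
    tone_units = ["USS", "UUSUS", "USUS"]

    counts = Counter(pair for unit in tone_units for pair in pairwise(unit))
    total = sum(counts.values())

    # Normalise the four cells to estimate the table 5.1 probabilities.
    for pair, freq in sorted(counts.items()):
        print(pair, round(freq / total, 2))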


For a given stress pattern the following probabilities can now be laid out, where the transition (bigram) probabilities appear between words and the state probabilities (the probabilities of word classes being stressed or unstressed) are listed below each word class. This example is for the specific stress annotation "at *Ford *motors".

    word                        at        Ford      motors
    word tag                    II        NP1       NN2
    transition probability          0.39      0.23
    state probability           0.88      0.91      0.92

The product of all these probabilities is calculated (using equation 5.2), giving the overall likelihood for this utterance. This is repeated for all possible patterns and the highest scoring pattern is selected, giving the "most likely" binary pattern.

Here are the new values for the example (note that although the order of most sequences is the same, UUS and USU are reversed):

    pattern   score
    UUU       1.7 × 10^-4
    UUS       4.8 × 10^-3
    USU       5.2 × 10^-3
    USS       6.6 × 10^-2
    SUU       3.6 × 10^-5
    SUS       8.2 × 10^-4
    SSU       4.3 × 10^-4
    SSS       5.3 × 10^-3

5.2.5 Boundary Conditions

Until now the subject of boundary considerations has been ignored. It is possible to calculate bigram frequencies from the corpus prosodic annotations for stress annotations after and before prosodic boundaries (see tables 5.2 and 5.3) and incorporate these into the calculations. Assuming that the sequences processed with the model are whole tone units (bounded on either side by a tone unit boundary), the additional probabilities (a tone unit boundary followed by either a stressed or an unstressed annotation, dependent upon the stress state of the first word, and likewise for the last word in the tone unit) can be applied by multiplying them together with the value given by formula 5.2.

                  TU boundary
    stressed      0.1781
    unstressed    0.0139

Table 5.2: Probability of a tone unit boundary following a stressed or unstressed word.

                  stressed   unstressed
    TU boundary   0.0690     0.1231

Table 5.3: Probability of a stressed or unstressed word following a tone unit boundary.

For the example sequence "|| at *Ford *motors ||" the score given above (6.6 × 10^-2) would be multiplied by 0.1231 for the "|| at" bigram and by 0.1781 for the "*motors ||" bigram.

This model performs no kind of boundary prediction. In the results presented below the original boundaries were taken from the corpus prosodic annotations. Boundary conditions do affect the results (there is a greater tendency for a stress at the end of a tone unit than at the beginning), but the models in this research are not concerned with boundary prediction or modelling, even though their use here does improve the model's performance. Initially a distinction was made between boundary types, but this was later dropped and all boundary types are treated the same, because there was no significant difference attained by making the distinction.
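Incorporating the boundary terms only adds two factors per tone unit. A minimal self-contained illustration, with the probability values taken from tables 5.2 and 5.3:

    # P(first word state | tone unit boundary precedes), table 5.3, and
    # P(boundary follows | last word state), table 5.2.
    AFTER_BOUNDARY = {"S": 0.0690, "U": 0.1231}
    BEFORE_BOUNDARY = {"S": 0.1781, "U": 0.0139}

    def with_boundaries(pattern, base_score):
        """Apply the tone unit boundary terms to an equation 5.2 score."""
        return AFTER_BOUNDARY[pattern[0]] * base_score * BEFORE_BOUNDARY[pattern[-1]]

    # "|| at Ford motors ||" with pattern USS and base score 6.6e-2:
    print(with_boundaries("USS", 6.6e-2))  # ~1.45e-3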

5.3 Performance

To assess the accuracy of the model (see source code in section F.9) it was applied to the training data in the corpus(5), and the predicted stress patterns were compared with those transcribed by the experts.

5. It is usual to use testing data to assess a model's usefulness, but this is not the purpose here: here we assess how well the model matches the training data. Testing data would then be used to compare with the training data results, to see if the model worked as well for unseen data. Category M was reserved for checking the model's generality, and results for this are presented in chapter 6.


Category Speech Style %BJW %GOK %ALL<br />

A Commentary 88 94 91<br />

B News Broadcasts 90 95 92<br />

C Lecture(general) 88 94 90<br />

D Lecture(specialist) 92 95 93<br />

F Magaz<strong>in</strong>e Report<strong>in</strong>g 85 94 90<br />

Table 5.4: Performance statistics for stress prediction model. Percentage <strong>of</strong> words which are<br />

correctly stressed/unstressed <strong>in</strong> comparison to the two expert annotations <strong>and</strong> overall.<br />

Category Speech Style %BJW %GOK %ALL<br />

A Commentary 50 74 64<br />

B News Broadcasts 54 82 69<br />

C Lecture(general) 55 75 66<br />

D Lecture(specialist) 66 79 73<br />

F Magaz<strong>in</strong>e Report<strong>in</strong>g 43 76 64<br />

Table 5.5: Performance statistics for stress prediction model. Percentage <strong>of</strong> completely correct<br />

tone units <strong>in</strong> comparison to the two expert annotations (BJW: Briony Williams, <strong>and</strong> GOK: Gerry<br />

Knowles) <strong>and</strong> overall (ALL).<br />

The results are summarized in table 5.4. The performance of the model is good, averaging over 90% agreement with the annotations in the corpus for a range of speech styles; it seems especially good with the more formal speech styles in the specialist lecture and news categories.

It is interesting to note that, using the second performance measure listed earlier, the results in table 5.5 were attained. The interesting point is not that the values are lower (as this would be expected) but that the values for the different transcribers are significantly different.

The values in the last three columns show the percentages of correct predictions for tone units (average length 4 words) in each category. The percentage correct is the percentage of tone units whose predicted stress pattern completely agrees with that in the transcription. There is an average 24% difference between the accuracy of the model when applied to each of the transcribers' sections of the corpus. This does not mean that one transcriber was better or worse than the other, as both transcribers are experts in the field and were approaching the same task. However it does go some way to illuminating the difficulty in producing prosodic transcriptions. Prosodic factors are complex perceptions, and a person's perceptions are flavoured by interests, attitudes, environment etc. It is reasonable to expect a level of "perceptual variability" between any two transcriptions. It is not surprising, therefore, that one transcriber had a tendency to produce transcriptions more consistent with syntactic structure (he having been working in that area for many years), whilst the other transcriber produced transcriptions more consistent with acoustic measures of speech (she working on speech and intonation synthesis at that time). Since the model is based on syntactic class it is clear why the percentage figures for GOK's transcribed sections were larger. This difference is amplified by the metric used.

5.4 Improvements

As suggested earlier, clustering word classes into groups with similar probabilities will enable us to estimate word class/stress state transition probabilities for groups of word classes, and thus side-step the problem of low sample sizes. If the assumption made above, that word class is not important for these probabilities, is wrong, then the estimated group-based probabilities will perform better than the non-group-based probabilities used before. The way in which the groups are formed is the subject of the following.

There are potentially many ways to group word classes. Initially they were grouped upon the ratio of frequencies of stressed to unstressed instances of each word class. That is, the probabilities of each word class being stressed and unstressed were used to plot a point on a graph. All points lie on the line

    y = 1 - x   where 0 ≤ x ≤ 1

An arbitrary thirteen groups were produced by dividing this line at arbitrary points, and the group stress transition probabilities were extracted for each group from the aligned prosodic and syntactic annotations (this produced four tables of 13 × 13 = 676 cells). This was achieved by summing the transition frequencies for each word class, then adding together the frequencies for each word class in each group, translating between word classes and groups. These transition probabilities were then used in place of the probabilities in the stress transition table: each word class is mapped onto its appropriate group and the group-to-group transition probabilities used. Initial word class state probabilities were used as before, as opposed to calculating group state probabilities.

The group probabilities made no new mistakes and, when tested upon a sample of 50 previously erroneously predicted sequences, corrected 16% of them. This translates to an overall 3%-4% improvement, thus demonstrating that grouping word classes to aid estimation of transition probabilities improved the model. However, the expected improvements were not forthcoming when the testing was scaled up.

This failure prompted a closer examination of the groupings. There is a problem with grouping word classes in this way: consider a very low frequency word class (for example NNSA1, which has one instance in the sub-corpus sample used). A word class with only one instance will be placed at either end of the stressed/unstressed continuum, and no account is taken of the behaviour of other word classes that might be similar. This is true for other low frequency word classes: it is quite possible for a single extra example of the word class to shift which group it would be placed in.

A different grouping scheme could be to use the transition probabilities themselves as a guide to group formation, but this suffers from all the problems mentioned earlier and is therefore not practical. A more realistic scheme is to use the word class/prosodic mark co-occurrence frequencies as a guide (see section 4.3.3).

To avoid the problem of low frequency word classes affecting the groupings, the clustering was performed on the 64 most frequent word classes. This was an arbitrary cut-off point. The remaining word classes were then placed into groups using the similarity of word classes already in the groups as a guide; this was performed somewhat ad hoc. The clustering into groups was performed using the method described in Hughes[Hug94], where the distance between two word classes, as defined by the distance between the vectors of prosodic mark co-occurrence figures for each word class, was calculated and the closest word classes were merged. The vector for each word class contained the probabilities (i.e. normalised frequencies) for co-occurrence with unstressed words, levelly stressed words, stressed but unaccented words and words with stress accent.


1 2 3 4 5 6 7 8 9<br />

&FW EX AT CCB DA &FO NNS ND1 DD1<br />

APP$ IF AT1 CF DA1 DD NNS1 NN1 DD121<br />

CC PNQO BTO CS DA2 MC NNS2 NN121 DD122<br />

CC31 PNQS BTO21 CS21 DA2R MC{MC VVD NN122 DD2<br />

CC32 PPH1 BTO22 CS22 DAR MC1 VVG NN2 DD21<br />

CC33 PPIO1 CSN CSA DAT MC2 VVZ NNJ DD22<br />

II PPIO2 CST CSW DB MD NNJ1 DD221<br />

IW PPIS1 II22 DDQ DB2 XX NNJ2 DD222<br />

PN PPIS2 II31 DDQ$ JA NNL1 II21<br />

PN1 VB0 II33 DDQV JB NNL2 II32<br />

PN121 VBDZ IO ICS JBR NNT1 MF<br />

PN122 VBN PPY LE JJ NNT2 NNO<br />

PP$ TO NNSA1 JJR PPX1 NNO2<br />

PPHO1 NNSB1 JJT PPX121 NNU<br />

PPHS1 PPHO2 NN PPX122 NNU1<br />

UH PPHS2 NP PPX2 NNU2<br />

VBDR VBR NP1 PPX221 NNU21<br />

VBG VH0 NP2 PPX222 NNU22<br />

VBM VM RA NPD1<br />

VBZ VM21 RL NPM1<br />

VD0 VM22 RL21 REX<br />

VDD VMK RL22 REX21<br />

VDG RR REX22<br />

VDN RR21 RG<br />

VDZ RR22 RG21<br />

VHD RR31 RG22<br />

VHG RR32 RGA<br />

VHN RR33 RGQ<br />

VHZ RRQ RGQV<br />

RRQV<br />

RGR<br />

RRR<br />

RGT<br />

RRT<br />

RP<br />

RT<br />

VV0<br />

VVN<br />

ZZ1<br />

Table 5.6: Word classes in the groups.

This is repeated until all word classes are merged together. From the resulting distances a dendritic diagram can be produced (see figure 4.3). By cutting vertically through the lines of the dendritic diagram, the word classes may be divided into a number of groups. This was done to yield nine groups. The word classes in each group are presented in table 5.6.

After all word classes have been placed into a group it is possible to calculate the transition probabilities. This was performed using the two programs transition and transgroups; see sections F.6 and F.7 respectively.


Category  Speech Style          %Correct
A         Commentary            92
B         News Broadcasts       93
C         Lecture (general)     92
D         Lecture (specialist)  93
F         Magazine Reporting    93

Table 5.7: Performance statistics for the stress prediction model using group transition probabilities.

The first program calculates the absolute transition frequencies for each word class (that is, the number of times each word class/prosodic mark pair is followed by another word class/prosodic mark pair), and the second maps the word classes (and their associated frequencies) into groups before normalising the frequencies into the range 0 to 1, giving the estimates of probability.

As before, only two stress states are of concern here: stressed and unstressed. However, transitions to and from tone unit boundaries are also calculated, so a special group is assigned to the role of representing tone unit boundaries, although it is clearly not possible for a tone unit boundary to be stressed or unstressed. For convenience and simplicity they are always treated as unstressed by convention, even though this has no meaning.
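A sketch of this estimation, representing the aligned corpus as a sequence of (word class, stress state) pairs with tone unit boundaries included as items. The names are illustrative, not those of the transition/transgroups programs in the appendices.

```python
from collections import Counter, defaultdict

# Boundaries are treated as unstressed by convention, in their own group.
BOUNDARY = ("TU-BOUNDARY", "unstressed")

def transition_probabilities(aligned, group_of):
    """Estimate P((group', mark') | (group, mark)) from bigram frequencies.

    aligned: sequence of (word_class, mark) pairs, with "|" standing for a
    tone unit boundary; group_of maps a word class to its group.
    """
    states = [BOUNDARY if wc == "|" else (group_of[wc], mark)
              for wc, mark in aligned]
    counts = defaultdict(Counter)
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    # Normalise the absolute frequencies into the range 0 to 1 to give
    # the estimates of probability.
    return {s: {t: n / sum(c.values()) for t, n in c.items()}
            for s, c in counts.items()}
```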

This produced results (see table 5.7) equivalent to those shown in table 5.4, demonstrating that clustering can produce results at least as good as those produced earlier, in this case slightly higher.

5.5 Summary

With performances as high as 95% for some categories (see table 5.4, categories B and D), it is obvious that the model performs very well. A performance of 100% correct would be remarkable, and it is clear that the model's performance is approaching a limit in terms of the improvements possible using just word class and bigram frequencies.

It is highly likely (but outside the scope of this work) that contextual information (such as given information) could improve performance.

The results are encouraging enough to try to expand the model to predict more than two types of prosodic mark; that is, to expand the model such that it produces a potential prosodic annotation for a sequence of word classes. In addition this will quantify the relationship between word class and stress accents. This is the subject of the next chapter.



Chapter 6

Automatic Prosodic Annotation

6.1 Introduction

Chapter 5 described a model for predicting stress patterns for word class tagged text. This chapter concentrates upon extending that model to make predictions about stress accents¹. The basic function of the model is as described in the previous chapter; for a background description of the mathematics and functioning of the model, refer to chapter 5.

6.2 Expanding the Model

The stress prediction model (SPM) developed in chapter 5 showed that there is a good degree of relationship between word class and stress, such that for over 91% of word classes the stressed versus unstressed state of the word may be predicted. That is, a probabilistic model has been derived which can generate acceptable levels of stress annotation from word class sequences. This shows that stress is closely related to word class: a widely held belief, but one that had not until now been quantified in such a way, owing to the unavailability of hand annotated machine-readable corpora such as the SEC.

¹ Whether a stress has a rising, falling or level pitch.



This chapter extends the model to show that this relationship also holds, to a lesser extent, for stress accents. That is, it shows that there is a relationship between word class and pitch accents, and that to a certain degree stress accents can (reasonably accurately) be generated from word classes.

Three points, however, need to be made. Firstly, the degree to which it is reasonable to expect such a model to function accurately will be less than for the previous model. This is partly due to the increased diversity of the possible annotations, and partly due to the fact that the stress prediction model is a more general case than that of stress accent prediction, so in predicting accents we are refining the model's behaviour.

Secondly, as has repeatedly been pointed out, it is unreasonable to attempt to deduce an accurate statistically based model from a corpus as small as the SEC. That is not to say it is impossible, but any models will be subject to deficiencies in the same areas where the corpus suffers from deficiencies. For example, the tonic stress mark for rise-fall is so infrequent in the SEC as to make it impossible to model statistically. The same is true for certain word classes, e.g. NP2 and REX. In general it is still possible to model a category (prosodic or syntactic) accurately despite low frequency if and only if it is highly constrained: the few examples are then enough to illustrate the category's behaviour definitively.

Finally, stress seems to play an important role in giving information about utterance (or sentence) structure, whereas stress accents also play important roles in giving semantic and contextual information. The effects of these will not be modellable within the constraints of this research.

For these reasons the model developed here does not attempt to reproduce the exact set of annotations with which the corpus is annotated, but instead uses a subset, as described and justified in section 6.3.1 below. This brings up the issue of how to compare the two annotation schemes. Section 6.4 describes an alternative metric² to that used previously (i.e. an exact match with the corpus annotations) that attempts to overcome these problems.

There is, and continues to be, a real lack of ability to assess the general performance of any such model, since it is beyond current capability to know what comprises a good prosodic annotation for a given word class sequence (unless, of course, one is an expert in prosodic annotations³). An expert transcriber, or a linguist who had worked with prosody, would have an intuitive feel for which transcriptions were acceptable or natural. No system has yet been invented that can do this. In addition, no system has been invented that can generate speech with the appropriate stresses and intonations from such transcriptions (thus performing the inverse role of the transcriber).

One possibility (see section 6.4) would be to synthesize utterances, or to manipulate pre-recorded utterances with the desired changes in fundamental frequency, intensity, duration and other prosodic features to reflect the stress accent predictions, and to submit these to listening tests whereby subjects assess the naturalness of each synthesized utterance. Attractive though this is, it is not possible at present because of a lack of knowledge of how (and if) the prosodic annotations relate to acoustic features such as fundamental frequency, intensity etc.

² None of the metrics used for assessment are wholly satisfactory.
³ Unlike the author.

6.3 Model Design

The model used in this chapter works on the same principles as the stress prediction model, but with two important differences: firstly, the range of annotation symbols is increased from stressed and unstressed to incorporate rises and falls (see below); secondly, transition probabilities are estimated from groupings of word classes using their prosodic similarity (in the ultimately described model).

There are two basic details to consider in the design of the model: what range of word classes the model should handle, and which aspects of the prosodic annotation will be modelled. There are also two sets of parameters that need to be estimated from the corpus: the probability of co-occurrence between word classes and prosodic marks, and the transition probabilities of a pair of word classes with appropriate prosodic marks.

The choice of which word classes to model is largely dictated by those word classes that will be in the input stream. However, as pointed out above, it would be impossible to model some word classes because of their low frequency. The action taken was to attempt to model all word classes, however erroneous that may be for some of them. In attempting to model such word classes the overall performance of the model will be reduced. It should be noted, however, that by their very nature these word classes hardly occur, and so each badly modelled word class will not have a significant impact upon the performance.

A related factor becomes obvious when we want to assess whether changes to the model are improvements or not. Since most errors of the sort caused by poor modelling occur in those word classes with ill-defined probabilities (i.e. those with low frequencies), performance changes are unlikely to be large and hence will be difficult to assess. It would be quite justifiable to ignore the annotations on some of the word classes output from the model (if they were in the poorly defined class), but this would leave the problem of side effects over transitions between these word classes and their neighbours. In order not to complicate the analyses more than is necessary, this is not done. Performance figures are given for the more frequently occurring word classes; see section 7.3.3.

6.3.1 Choice of Prosodic Marks

The corpus has a wide range of prosodic marks: high and low variants of the tonic stress marks (fall, rise, fall-rise, rise-fall and level), the stressed-but-unaccented mark and unstressed, together with the high and low reset modifier symbols. It is unreasonable to expect this model to perform well with so many possibilities, for the reasons given above, and the search space would become unreasonably large; this must be reduced. Dropping the high/low distinction (as in Roach [Roa91]) and the rise-fall mark, which is very infrequent, reduces the set of marks to six; high and low resets are generally ignored in these models. The stressed-but-unaccented mark has been described as ill-defined⁴, and a decision was made to merge the low and high level tones with it to give a new mark (denoted by the same symbol, but meaning simply stress or level stress). This results in five possible prosodic classes: rise, fall, fall-rise, stressed and unstressed. Unstressed words will continue to carry no mark.

⁴ Gerry Knowles, in private conversation.



For convenience we will continue to use the original symbols to represent these prosodic marks, but it must be remembered that these symbols no longer distinguish between high and low levels of marks.

6.3.2 Estimation of Probabilities

Estimating a model of stresses or tonic stress marks statistically from a corpus would normally require enough data for each entity to be modelled. As pointed out in section 5.2.4, the SEC is not a large corpus, and direct methods of estimating co-occurrence probabilities are prone to error. However, in section 5.2.4 an assumption was made that all word classes behave similarly with regard to stress transition likelihoods.

This is not a valid assumption (though it was a useful way to simplify the calculation of the likelihoods), because improved results may be achieved by grouping word classes in terms of their behaviour and estimating likelihoods for each group (see below on transition probabilities). If there is a sufficient number of examples of the different word classes in each group (which thus imposes some constraints upon how groups are composed), the likelihoods of co-occurrence between word classes (or, more accurately, groups) and tonic stress marks can be estimated with reasonable accuracy. Section 4.3.3 discusses grouping word classes using the 64 most frequent as a guide; the remaining word classes, which are of low frequency, can be inserted into groups based upon the similarity of the word class types. It should be noted that the clustering of word classes into groups performed in section 4.3.3 is based not upon the transition likelihoods but on the co-occurrence likelihoods; however, the enhancements to the model provided here assume that word classes with similar co-occurrence likelihoods will also have similar transition likelihoods. This is described in more detail in section 5.4.

Estimation of State Probabilities

It would be possible to use the state probabilities for the group associated with each word class. This would mean that low frequency word classes would use an "improved" set of probabilities, but at the expense of the high frequency word classes. Since these latter word classes are the most frequent, it is desirable to have them as accurate as possible. The decision made here is to use the state probabilities derived from the co-occurrence table directly, individually for each word class. An alternative would be to use the group state probabilities for low frequency word classes only; the problem is that it is still not clear to which group a low frequency word class should belong, and consequently this approach has not been explored, although there is reason to suspect some improvement in performance.
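A minimal sketch of the direct estimation chosen here, assuming `cooc` maps each word class to its prosodic mark frequencies (the name is illustrative):

```python
def state_probabilities(cooc):
    """State probabilities taken directly from the co-occurrence table,
    individually for each word class (no group smoothing)."""
    return {wc: {mark: n / sum(marks.values()) for mark, n in marks.items()}
            for wc, marks in cooc.items()}
```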

Estimation of Transition Probabilities

The estimation of the transition probabilities can be attempted in three ways. Firstly, ignore word class and use prosodic mark transition probabilities; this is equivalent to the special case of a single word class group. Secondly, the transition probabilities for each and every word class could be estimated from the alignment data produced in chapter 3; this is equivalent to having groups with a single word class in each. Finally, a compromise could be struck using a small number of groups formed on the basis of similarity. Although there is good reason to expect this compromise to be the most beneficial, the construction of groups is still an uncertain process and has a significant impact upon results, as can be seen from the two approaches taken in section 5.4. The estimation of probabilities is fraught with low sample size problems: each group must contain a sufficient number of word classes with a representative sample of transitions. For this reason the transition probabilities are estimated on the single group basis, which has been shown to work, if not optimally. It is also important not to over-complicate the model initially. In section 6.3.4 the concept of a composite model is introduced. This model uses aspects of the model developed in the previous chapter along with the model developed here, which, of course, leads to two sets of transition probabilities. The transition probabilities for the first stage of the composite model use the groupings described in section 5.4, and those for the latter stage use the single group transition probabilities mentioned above.



6.3.3 The Model

The formula for the model is very similar to that given in equation 5.2. There are, however, now more annotation types: the five marks given in section 6.3.1. As in the original stress prediction model (SPM), transition probabilities are used with the assumption that word class is not important (i.e. the extreme case of one group).

The source code for this model is given in section F.10. The program is very computationally intensive because of the exhaustive search of the very large search space. In an attempt to limit computation time, the length of the utterances that it will handle is limited to 15 words (and/or tone unit boundaries). In most situations, however, utterances (divided by tone unit boundaries) will be much shorter than this. There is no special requirement that the utterance be bounded by tone unit boundaries, but it is worth noting that this will affect performance, since the presence of a tone unit boundary will affect the placement of tones, given the inclusion of TSM/tone unit boundary bigram likelihood constraints (see section 5.2.5). In particular, it may be found that TSMs will not be placed at the end of what would be a tone unit if a tone unit boundary is omitted.
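The search itself can be sketched as follows; this is an illustration of the approach, not the appendix program. Every assignment of the five marks to the words is scored by the product of its state and transition probabilities, and the highest-scoring assignment wins.

```python
from itertools import product

MARKS = ("fall", "rise", "fall-rise", "stressed", "unstressed")
MAX_LENGTH = 15  # utterances longer than this are not handled

def best_annotation(word_classes, state_p, trans_p):
    """Exhaustively search the space of prosodic annotations.

    state_p[wc][mark] is the word class/mark co-occurrence probability;
    trans_p[(mark, mark')] is the single-group transition probability
    (word class ignored, i.e. the extreme case of one group).
    """
    assert len(word_classes) <= MAX_LENGTH
    best, best_score = None, 0.0
    for marks in product(MARKS, repeat=len(word_classes)):
        score = 1.0
        for i, (wc, mark) in enumerate(zip(word_classes, marks)):
            score *= state_p[wc].get(mark, 0.0)
            if i > 0:
                score *= trans_p.get((marks[i - 1], mark), 0.0)
        if score > best_score:
            best, best_score = marks, score
    return best, best_score
```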

Boundary Considerations

In a real life application of the model it would be necessary to predict the locations of the tone unit boundaries. This is beyond the scope of this research, but it cannot be entirely ignored. The model makes no serious attempt to model tone unit boundaries other than assuming that punctuation gives rise to a boundary. As punctuation marks are classed as lexical items and assigned their own syntactic word tags in the SEC (and most other tagged corpora), this amounts to an appropriate mapping between syntax and prosody at a basic level. It is a very rough and ready rule which affects the performance of the model, and it is only described as a short term solution, since much work is being done in this area.

Table E.1 quantifies the relationship between punctuation and tone unit boundaries. It should be noted that this rule will miss approximately 52% of the boundaries, and approximately 9% of the boundaries generated will be inserted where they would otherwise not have existed. It is undoubtable that this would affect the accuracy of the model in some cases. In the results presented here and elsewhere in this thesis, the original tone unit boundaries (as transcribed in the corpus) were used for the assessment of the model; punctuation would only be used in situations where tone unit boundaries were not available.
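The stopgap rule amounts to no more than the following sketch. The tag set is a placeholder; in practice the test would check for the SEC's own punctuation tags.

```python
PUNCTUATION_TAGS = {",", ".", ";", ":", "!", "?"}  # illustrative tag set

def insert_boundaries(tagged_words):
    """Replace punctuation tokens with tone unit boundaries ("|").
    Misses boundaries not signalled by punctuation (about 52%) and
    inserts some spurious ones (about 9% of those generated)."""
    return [("|", "TU-BOUNDARY") if tag in PUNCTUATION_TAGS else (word, tag)
            for word, tag in tagged_words]
```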

6.3.4 Composite Model

The major failing of the model above is that it performs badly on the stress/unstress distinction. The stress prediction model, however, achieves a high success rate in this very area (over 91% on average). It is therefore desirable to try to capitalise on this.

A composite model (i.e. a model that combines both the model described above and the stress prediction model) may be devised by making use of the mechanism of the model developed in chapter 5 to select a number of candidate sequences and using these as input to the prosody prediction model. In this approach the stress prediction model is applied to utterances and the top few sequences are selected. Analysis of the model's performance has shown that the "correct" or most acceptable stress pattern was usually present in the top 5 patterns. The prosodic mark prediction model then operates on the search space defined by the top few patterns: the stressed words are allowed to vary between each of the accent marks, and the unstressed words are clamped as unstressed.

For example, the utterance "at Ford motors" as processed by the SPM gives performance scores as listed in section 5.2.4, with the winning sequence being "at Ford motors". This sequence would be constrained to the possibilities listed below, with "at" clamped as unstressed and the stressed words free to vary over the accent marks. Remember that the stressed mark has different meanings in the SPM and PPM.

    at Ford motors

In this case the winning sequence was "at Ford motors", which is considered to be equivalent to the corpus annotation "| at Ford motors |".

Of course the SPM is not perfect, and its errors will be passed on to the second phase of the model. A better solution would be more accurate estimates of the model parameters (the state and transition probabilities), as mentioned elsewhere.

Alternative search methods such as simulated annealing are not guaranteed to find the global minimum, whereas the above mechanism is highly unlikely to miss the global minimum and achieves a very large reduction in the search space; it is therefore reasonably hard to beat. There seems little justification in attempting to implement alternative search methodologies.
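A sketch of the composite scheme, assuming `spm_top_patterns` returns the k best stressed/unstressed patterns from the stress prediction model and `ppm_score` scores a full annotation under the prosody prediction model (both names are illustrative):

```python
from itertools import product

ACCENTS = ("fall", "rise", "fall-rise", "stressed")

def composite_predict(word_classes, spm_top_patterns, ppm_score, k=5):
    """Phase 1: take the top k stress patterns from the SPM.
    Phase 2: search only annotations consistent with those patterns,
    clamping unstressed words and letting stressed words range over
    the accent marks."""
    best, best_score = None, 0.0
    for pattern in spm_top_patterns(word_classes, k):
        slots = [ACCENTS if state == "stressed" else ("unstressed",)
                 for state in pattern]
        for marks in product(*slots):
            score = ppm_score(word_classes, marks)
            if score > best_score:
                best, best_score = marks, score
    return best
```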



Category  Speech Style          %Correct
A         Commentary            64
B         News Broadcasts       77
C         Lecture (general)     59
D         Lecture (specialist)  63
F         Magazine Reporting    61

Table 6.1: Performance scores for the training categories.

Search Space Reduction

The non-composite model described above suffers from a high computational load, which makes it unsuitable for real time applications: the search space is too large to search in reasonable time given current computing power. Although the global search mechanism could be replaced with an alternative, there is no need: the composite model described in section 6.3.4 massively reduces the computational load.

For example, a ten word utterance would have a search space of 5^10 patterns, compared with the composite model's search space of the order of 2^10 + 100, on the very reasonable assumptions that the composite model selects 5 patterns from the first phase to pass to the second phase and that, on average, half of the words in the utterance will be unstressed (5 words × 4 prosodic marks × 5 patterns = 100). We note that 2^10 + 100 ≪ 5^10.
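The reduction is easy to verify directly:

```python
>>> 5 ** 10        # full search space for a ten word utterance
9765625
>>> 2 ** 10 + 100  # composite: 2^10 SPM patterns + 5 words * 4 marks * 5 patterns
1124
```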

As noted in section 4.2.2, and by reference to figure 4.2, the average length of a tone unit is about 4 words. This would mean a search space of fewer than 130 patterns, which is easily searched in real time. A computer capable of 1 Mflop (and performing no other work) could achieve of the order of 1000 tone units per second, equivalent to more than 4000 words. For an assumed speech rate of five words per second this would account for approximately 1/8% of the computing power.

Performance Improvements

The performance statistics for the training categories of the model are presented in table 6.1. Results for the testing sections are given in table 6.3.



There is a whole range of alternatives that may be used to attempt to improve the performance of the model. One subset of these alternatives changes the transition probabilities. In addition to the bigrams, it is possible to calculate tri-gram probabilities for prosodic marks (these probabilities are for prosodic marks only, not for word class and prosodic mark combinations, since it has previously been noted that the latter cannot reliably be estimated from the corpus). The addition of these tri-gram probabilities does provide a small increase in the overall performance of the model (0.18%), but not one of any real significance.

Varying the number of sequences passed from the first phase to the second makes no difference to the performance of the model. It would be possible to vary the number of sequences passed on dynamically, based upon the closeness of the scores for each sequence in the first phase; this could further reduce the search space. The top two or three sequences are the most that is required.

Running the model with either the initial state probabilities or the transition probabilities alone reduces its performance to just over half of what is achieved when using both.

6.4 Model Assessment

The stress prediction model developed in chapter 5 could predict one of two possible prosodic marks for each word class. The metric used to assess how well the model worked was to count the number of times that each predicted mark matched the actual mark (or equivalent actual mark) in the corpus. This was quite acceptable. In the case of the model developed in this chapter the situation is more complex, since the model can predict multiple types of mark. Using the above metric would disadvantage the model, since it is clear that an error of one type may be worse than an error of another type. In fact, what may be construed as an error (in that it does not match exactly the annotation in the corpus) may be an acceptable prediction (in that a listener would not object to the naturalness of the utterance).

In an attempt to alleviate this problem, each predicted mark is scored (from 0 to 1) depending upon its similarity to the annotation given in the corpus, as opposed to the score being 1 for an exact match and 0 for a mismatch. For example, if the corpus annotation was a fall-rise and the model predicted a fall, this would be given a higher score than if the model predicted an unstressed word. The scores were given according to the figures in table 6.2: a predicted tonal mark, for instance, scores 0.15 if the corpus word were annotated as stressed, but 0.0 if the corpus word were annotated as unstressed. In all cases a score of 1.0 is given where an exact match is achieved. The final score for a section can be converted to a percentage by dividing the score by the number of words.
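A sketch of the graded metric, using the values of table 6.2; the mark names stand in for the original symbols, and the orientation (rows index the corpus annotation, columns the prediction) is reconstructed from the worked example above.

```python
MARKS = ("fall", "rise", "fall-rise", "stressed", "unstressed")
SCORES = [  # SCORES[actual][predicted], from table 6.2
    [1.00, 0.25, 0.50, 0.50, 0.15],
    [0.25, 1.00, 0.75, 0.50, 0.15],
    [0.50, 0.50, 1.00, 0.50, 0.15],
    [0.15, 0.15, 0.15, 1.00, 0.25],
    [0.00, 0.00, 0.00, 0.15, 1.00],
]

def section_score(predicted, actual):
    """Score a section's predicted annotation against the corpus one,
    returning a percentage (total score divided by the number of words)."""
    total = sum(SCORES[MARKS.index(a)][MARKS.index(p)]
                for p, a in zip(predicted, actual))
    return 100.0 * total / len(actual)
```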

It should be acknowledged that these scores have been allocated somewhat arbitrarily, although it would be possible to choose the scores so that the model's behaviour seemed to be improved. This has not been attempted, and in the example performance statistics given in table 6.3 the percentage correct metric is also given for comparison. Scoring, or evaluation, is a notoriously difficult problem in natural language processing in general; see, for example, [Wee94, Lyo94, BGL93] for discussions of the range of different and contradictory metrics of parsing success used in corpus-based grammatical analysis systems.

As will be noted when viewing the performance statistics in table 6.3, the percentage scores are higher than the percentage correct. This is to be expected given the way the scores are calculated. Although the figures seem similar, it should be realised that the percentage scores aim to give a better idea of how good the annotations are, and should be more sensitive to changes in the model than the percentage correct. They have only really been useful when comparing differing versions of the PPM. A more thoughtful and insightful assignment of the values in table 6.2 would perhaps provide more sensitivity and improvement in annotation assessment, but this requires expertise in the area of prosodic tone labelling of speech.

The only really satisfactory way to assess the "goodness" of prosodic annotations is to listen to speech spoken in a way that reflects the annotations. A listening test experiment would present subjects with a variety of different utterances with differing prosody, and the subjects would give a subjective opinion of the "naturalness" of each utterance. The only difficulty with this approach lies in producing the utterances. Three possibilities are given below.



            fall   rise   fall-rise  stressed  u/stress
fall        1.00   0.25     0.50       0.50      0.15
rise        0.25   1.00     0.75       0.50      0.15
fall-rise   0.50   0.50     1.00       0.50      0.15
stressed    0.15   0.15     0.15       1.00      0.25
u/stress    0.00   0.00     0.00       0.15      1.00

Table 6.2: Scoring relationship between predicted and annotated prosodic marks (rows: corpus annotation; columns: predicted mark).

1. Speak the utterances following the prosodic annotations. This is not an easy task, and requires expert ability not readily available.

2. Modify the original corpus recordings (available on the MARSEC CD-ROM) using either the SOLA/PSOLA⁵ algorithms or re-synthesis techniques. This requires the specification of intensity, fundamental frequency and duration contours derived from the prosodic annotation. It is not clear how to do this, or indeed whether there really is a relationship between the two which can be captured in an accurate computational model.

3. Finally, a speech synthesizer could be used which allows specification of syllable durations, rising and falling tones and relative syllable loudness. There are, however, no such synthesizers readily available, as most that employ any form of intonation control tend to be restricted to either a falling or a rising tune over each utterance. Listeners may deem the output "unnatural" even if the predicted annotations are an exact match for the MARSEC markup, because of poor synthesis rather than poor prosody prediction.

   The Klatt synthesizer allows the specification of all the necessary parameters, but this requires that prosodic annotations are converted into fundamental frequency and intensity contours and syllable durations, which leads to the same problems as above.

The use of a listening test has distinct advantages: it gets away from the use of corpus-based scoring metrics, which have been criticised [Wee94]. It also allows for the fact that some annotations may be acceptable in ways differing from those in the corpus, and there may well be reason why the corpus annotation is not typical or would not be acceptable in a general case. Context may have forced a change in the prosody in the corpus which would not ordinarily have happened, for example where the speaker wishes to contrast or correct something that the listener has misheard: "Peter isn't here" versus "Peter isn't here" (differing only in which word carries the accent).

⁵ These two algorithms are digital signal processing techniques that allow the duration of a sound to be lengthened or shortened without changing the pitch (SOLA), and the pitch of a sound to be changed without changing its length (PSOLA).

6.4.1 Performance Statistics

The model performs quite well, especially when it is realised that chance performance would be 20% (each output symbol may be one of five possibilities), compared with 50% for the stress prediction model. The overall score percentage is 1621.25 / (1420 + 763) × 100 = 74.27%, and the overall percentage correct is 1420 / (1420 + 763) × 100 = 65.05%; here the score percentage is calculated as score / (right + wrong). See table 6.3.
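These overall figures follow directly from the totals of table 6.3:

```python
>>> right, wrong, score = 1420, 763, 1621.25
>>> round(100 * score / (right + wrong), 2)  # overall score %
74.27
>>> round(100 * right / (right + wrong), 2)  # overall correct %
65.05
```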

The majority of test sections have good results, over 60% correct. It is worth noting that the best performance, of nearly 74% correct, is achieved on the longest section (m05). The sections (in decreasing order of performance) cover the following topics:

m05  Nelson Mandela speech
m04  Programme News
m09  Programme News
m08  Weather Forecast
m03  Weather Forecast
m02  Motoring News
m07  Travel Roundup

It is clear that the model performs better for some types of speech, and perhaps speaker. Above, the formal speech about Nelson Mandela and the programme and weather news are given by professional speakers, who are more likely to speak with more accurate structure, pronunciation and intonation than would be expected in casual speech.



Section  # right  # wrong   Score   Score %  Correct %
m02        113      88      133.45   66.39     56.22
m03         86      57      102.70   71.82     60.14
m04        196     109      225.45   73.92     64.26
m05        548     196      597.65   80.33     73.66
m07        103      88      127.20   66.60     53.93
m08         90      59      108.05   72.52     60.40
m09        284     166      326.75   72.61     63.11
total     1420     763     1621.25   74.27     65.05

Table 6.3: Performance scores for the test category of the corpus.

6.5 Summary

In this chapter the stress prediction model developed in chapter 5 has been expanded to make predictions for a range of prosodic stress marks. It was found that such a model had a very high computational load and was not very successful; its main problem area seemed to be the stress/unstress distinction. However, the stress prediction model does this task well, and so a composite model was created. The composite model uses the stress prediction model to select a number of stress patterns for the utterance from the search space and passes these few patterns on to the prosody prediction model, which clamps unstressed words as such and only allows stressed words to vary between the different stress accents. It was found that this produced a significant improvement in performance with, on average, 65% of all prosodic marks matching the prosodic mark in the corpus. This result was also seen in the training data presented in table 6.1.

Chapter 7 goes on to analyse the function of the model in terms of how well it performs for each prosodic mark and for each word class.



Chapter 7

Conclusions and Future Work

7.1 Introduction

This chapter first presents a review of the main points covered in this thesis. It then analyses the models' performance in terms of how well each word class and prosodic mark is modelled. Finally, comments upon future investigations are given.

7.2 Review

In chapter 3 a need to cross reference between the prosodic and word class annotated versions of the Spoken English Corpus was identified, and a semi-automatic, semi-intelligent tool was devised to enable this. The tool produced an aligned version of the corpus with both word class and prosodic annotations¹, from which it was possible to extract statistics of co-occurrence between the two, as described in chapter 4. The tool is general enough to be exploited in a similar way to relate any other types of corpus annotation, and it has been used to provide alignment information in the Machine Readable Spoken English Corpus [GAR92].

In chapters 5 and 6, two models were developed using statistics extracted from the corpus.

¹ Treebank parse trees are also aligned, even though these were not used in experiments.



The development of these models was designed to demonstrate in a quantifiable way the extent of the relationship between the prosodic annotations and the word class tags in the Spoken English Corpus. The models use two sets of parameters calculated from the cross reference between the prosodic and word class versions of the corpus: the co-occurrence likelihoods (the likelihood that a prosodic mark occurs on a word with a given word class) and the bigram likelihoods (the likelihood that a prosodic mark/word class combination is followed by another specific prosodic mark/word class combination).

As far as has been possible within the limitations of the size of the corpus, these likelihoods have been estimated, although various alternatives have been considered to cope with the low frequency of some entities. The most important of these techniques is the clustering of word classes into groups on a prosodic basis. This has given rise to the concept of prosodically orientated word class groups, and it has been shown that these groups can perform as well, if not better, in estimating bigram likelihoods.

The stress prediction model and the prosody prediction model have been tested with new, unseen text, and both have demonstrated that they can produce good levels of annotation, corresponding 91% and 65% of the time respectively with the annotations within the corpus. These models may be used as a stage in a text-to-speech system for the low level "baseline" assignment of stress and prosody annotations. Higher level processes could use these as a starting point for the assignment of context and semantics dependent prosody before generating the prosodic annotations acoustically.

7.3 Performance Measures

In order to evaluate the models in more detail it is necessary to look at performance figures for individual elements within them. This section presents and comments on several such elements, including the frequency of prosodic marks within tone units, the performance on individual word classes and the differences between predicted and actual prosodic marks. The latter two show in which areas the models' performance varies, and show in a quantitative way the relationship between word class and stress accents.

7.3.1 Tone Unit Lengths in the Model

Although the model makes no attempt to generate tone unit boundaries (other than assuming that punctuation indicates a boundary), the number of tonic stresses and stressed words in tone units will differ between the model and the transcribed corpus. If the same tone unit boundaries are used to segment the synthesized prosody, we can compare the graphs shown in figure 4.2 with the equivalent graphs in figure 7.1. It should be noted that the boundaries used are not exactly the same, since boundaries inserted due to punctuation have not been removed; this accounts for approximately 330 extra tone unit boundaries. It is also important to realise that the model would probably behave differently if these boundaries had not been present in its input, or if the tone unit boundaries inserted to allow this comparison had been present in its input. This accounts for the rightward spreading of the words curve, and does not have a significant impact upon the other two curves.

Note that the model predicts a large number of tone units with 0, 1 or 2 stress accents (TSMs), in comparison with the original data, which has large numbers of tone units with 1, 2 or 3 stress accents. This indicates that the model underpredicts many of the stress accents, although it is not possible to tell what percentage of stress accents are present only as a function of context and semantics.

7.3.2 Analysis of Models

There is a real need to be able to assess the "goodness" of stress patterns. Clearly there are differences between the experts' transcriptions, but to what extent do errors go towards making an utterance unintelligible? Is it more wrong to miss a stress, to insert one or to transpose it? It is envisaged that these questions could be answered by listening tests, as described in section 6.4.



Tag     SPM%    PPM%
APP$    91.46   89.76
AT      97.65   95.79
AT1     98.86   97.14
CC      93.09   86.68
CCB     93.71   81.44
CS      69.80   57.96
CSA     79.78   75.27
CST     99.24   96.99
DB      91.57   44.32
DD      76.29   45.36
DD1     65.06   36.04
DD2     56.76   36.00
DDQ     72.73   66.46
EX      96.51   89.77
ICS     58.77   53.33
IF      98.81   94.38
II      90.25   87.04
II21    61.76   40.85
II22    98.53   98.59
IO      99.88   98.87
IW      88.89   84.83
JB      88.64   48.66
JJ      92.30   52.43
JJT     87.76   31.37
MC      89.79   52.34
MC1     81.25   41.84
MD      80.99   41.40
MF      61.84   40.79
NN      85.03   42.44
NN1     92.88   35.96
NN2     90.97   35.29
NNJ     85.71   36.43
NNL1    85.61   40.56
NNO     66.22   45.33
NNS1    86.02   54.84
NNSB1   78.48   69.62
NNT1    82.41   35.07
NNT2    89.41   36.96
NP1     91.99   43.82
PPH1    99.19   93.13
PPHS1   92.06   87.69
PPHS2   82.73   80.67
PPIS1   86.21   77.97
PPIS2   95.15   90.57
PPY     97.14   95.77
RG      64.15   49.15
RL      88.29   38.60
RP      67.53   35.61
RR      86.00   42.49
RR21    94.44   80.00
RR22    92.59   33.33
RRQ     54.41   50.00
RT      79.59   42.59
TO     100.00   99.27
VB0     92.22   91.62
VBDR    89.80   89.90
VBDZ    94.66   92.88
VBN     91.26   91.23
VBR     87.97   85.44
VBZ     91.83   90.15
VH0     84.80   83.15
VHD     89.89   87.23
VHZ     89.83   88.43
VM      76.99   75.82
VV0     82.64   47.09
VVD     87.59   55.79
VVG     89.76   54.74
VVN     91.82   48.57
VVZ     78.65   51.55
XX      45.51   27.61

Table 7.1: Word class tags with frequencies of 50 or greater, showing the percentage of correct predictions (when compared with the corpus annotations) for the stress prediction model (SPM) and the prosodic mark prediction model (PPM).



[Figure 7.1: Relative frequencies of tone-unit lengths produced by the model, in terms of numbers of: words with tonic stress marks; words with prosodic marks; and words. Axes: frequency against length of tone unit (0 to 10).]

7.3.3 Word Class Models

The figures in table 7.1 show, for both models, how well the predictions match the corpus for each word class. Word classes with frequencies of less than fifty have not been presented: they are poorly modelled due to insufficient data, and they are not of primary concern since their low frequency means that they have a low impact upon the models.

With some exceptions, the PPM performs just as well as the SPM on all word classes except determiners (DB-DDQ), adjectives (JB-JJ), numbers (MC-MF), nouns (NN-NP1), adverbs (RG-RT), lexical verbs (VV0-VVZ) and not or n't (XX). The performance on XX is probably poor due to its frequent inclusion in enclitics. It is no great surprise to discover that those word classes upon which both models perform well are those that are mostly unstressed. Remember that for the SPM a chance result would be 50% correct, but for the PPM a chance result would be 20%; for the PPM most predictions are between 15% and 75% higher than chance. However, high performance was achieved for possessives (APP$), articles (AT-AT1), conjunctions (CC-CST), existential there (EX), prepositions (ICS-IW), it, he/she, they, I, we, you (PPH1-PPY), the infinitive marker to (TO), and the verbs be, were, was, been, are, is, have, had, has, can/may/would etc. (VB0-VM).

            fall   rise   fall-rise  stressed  u/stress
fall       48.86   2.87     1.03      38.88      8.36
rise       66.46   2.14     0.22      27.26      3.37
fall-rise  44.54   2.02     1.74      43.34      8.36
stressed   21.62   1.44     1.09      59.90     15.95
u/stress    3.68   0.12     0.27       9.95     85.97

Table 7.2: Prosodic marks showing prediction percentages for the composite prosody prediction model (rows: corpus annotation; columns: predicted mark).

Referring to tables 7.1 and 4.3, it will be noted that for those word classes on which the PPM performs poorly there is widespread use of the differing TSMs. For example, DD1 scores 36% in table 7.1, and table 4.3 shows that although it is mainly unstressed it also co-occurs with four TSMs with roughly equal likelihood. RP also scores approximately 36%, and table 4.3 shows that all 10 prosodic marks are plausible for it.

Whereas the models are very good at determining which words should be unstressed and which should have a stress accent, the PPM is not able to choose the correct stress accent for all nouns, adjectives, verbs (VV0-VVZ), adverbs and determiners. This is not really surprising, since these word classes are those most related to the context and semantics of the utterance.

7.3.4 Prosodic Mark Models

The values presented in table 7.2 show the percentage of times that the prediction for each prosodic mark matches the corpus or not. For example, a fall is predicted as stressed 38.88% of the time. The ratios, for each of the prosodic marks, of the number of times they occur in the corpus to the number of times they were predicted by the model are as follows:



            actual : predicted
Fall          2333 : 3245
Rise           653 : 173
Fall-rise     1089 : 114
Stress        4237 : 4825
Unstress      7336 : 7291

Although there is a reasonable degree of accuracy for stress and unstress, rises and fall-rises are very poorly modelled, instead being replaced with fall or stress marks. This is possibly due to the high frequency of the bigram of a fall followed by a tone unit boundary, which may bias TSMs before a boundary away from being a rise or fall-rise. The fall mark has been largely over-predicted, and to a lesser extent the stress mark also. The model performs best for the unstress, stress and fall marks. In chapter 2 it was noted that the number of levels of stress that it is usually useful to distinguish between is three (unstressed, weakly stressed and strongly stressed). The model seems to add weight to this claim. Since fall, rise and fall-rise are all "strong" stresses, it is perhaps not surprising that the model does not perform well in distinguishing between them and that one dominates over the others. In the case of the fall-rise there is an approximately equal split between the fall and stress marks which take its place; for the rise mark there is a 2:1 split between the fall and stress marks. We can note that falls and rises are "stronger" stress marks than fall-rises.

The results of the PPM suggest that the placement of stress accents is predictable from structure and word class information but that the direction of the stress accent is not.

7.4 Future Work

During this research I have had many ideas which it has not been possible to investigate, either for lack of time or, mainly, because they do not belong within this research but are interesting offshoots. These ideas are presented below in no particular order.



7.4.1 Conversion to ToBI

ToBI [SBP+92, BA93] (Tones and Break Indices) is a modern system for transcribing prosodic and intonation patterns in English and is attracting a lot of interest. Unlike previous prosodic systems, ToBI is being developed by a large number of speech scientists with explicit application to the annotation of machine readable spoken English corpora.

Roach [Roa94] has already provided a means of converting between the annotation scheme used in the SEC and ToBI.

ToBI has two clear advantages. Firstly it has a grammar, in the sense that some "sentences" of annotations are not allowed, and secondly it does not have unclear areas such as the stressed but unaccented mark used in the SEC. A useful development to the work presented here would be to convert the annotations produced to the ToBI system. This would allow for a phase of weeding out illegal ToBI sequences from all those possibilities presented by the model, and would decrease the likelihood of the model producing unacceptable annotations. Given the amount of interest in ToBI this would be a useful task to undertake, especially if the model were ever to be used in a text-to-speech system.
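As a rough illustration of what such a conversion involves, the Python sketch below maps SEC-style marks onto ToBI-style labels and discards candidate sequences that a ToBI grammar would reject. The mapping table and the grammar check here are purely illustrative placeholders; the real SEC-to-ToBI correspondence is the one defined by Roach [Roa94].

    # Illustrative only: the real SEC-to-ToBI mapping is defined by Roach [Roa94].
    SEC_TO_TOBI = {
        "high fall": "H*+L",   # assumed correspondence, for illustration
        "low rise":  "L*+H",   # assumed correspondence, for illustration
        "unstress":  None,     # no accent label
    }

    def is_legal(tobi):
        # Placeholder for a proper ToBI grammar check; here we only
        # require that the sequence carries at least one accent.
        return any(label is not None for label in tobi)

    def convert(sec_marks):
        # Convert a candidate annotation, weeding it out if it contains
        # an unknown mark or maps onto an illegal ToBI sequence.
        if any(m not in SEC_TO_TOBI for m in sec_marks):
            return None
        tobi = [SEC_TO_TOBI[m] for m in sec_marks]
        return tobi if is_legal(tobi) else None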

7.4.2 Additional Constraints

The focus of this thesis has been upon the relationships between prosodic annotations and word class. It has often been noted that prosody serves more purposes than signifying structure. We can therefore expect marked improvements (especially in the prediction of stress accents) from the inclusion of factors other than word class. Researchers in the field of Natural Language Processing have been developing semantic taggers for corpora, that is, systems that can annotate corpora with semantic information derived from them [JA94]. As has been previously pointed out in this thesis, semantics plays an important role in the structure of prosody. The inclusion of semantic information in the prediction would be most advantageous in that, for example, given information could be signified prosodically. At a lesser level, parse tree structures could provide additional constraints.



It has also been noted by Knowles (at a seminar given at the Leeds University Linguistics Department, 1994) that given information plays an important role in stress assignment.

A further level of constraints could be imposed by rule in certain circumstances, for example stress accent assignment in compound nouns or noun phrases; a sketch of such a filter is given below. Constraints of this kind are discussed in chapter 5 of Fudge [Fud84].
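As a minimal sketch of a rule of this kind, assuming part-of-speech tags and candidate annotations as lists of mark names (the rule itself is an illustrative simplification in the spirit of Fudge [Fud84], not a rule taken from it):

    def compound_noun_filter(tags, candidate):
        # Reject a candidate annotation that accents the second element of
        # a noun-noun compound while leaving the first unstressed.
        for i in range(len(tags) - 1):
            if tags[i].startswith("NN") and tags[i + 1].startswith("NN"):
                if candidate[i] == "unstress" and \
                   candidate[i + 1] in ("fall", "rise", "fall-rise"):
                    return False
        return True

    # Usage: keep only the rule-consistent candidates before re-ranking.
    # survivors = [c for c in candidates if compound_noun_filter(tags, c)]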

7.4.3 Speech Synthesis

There is at present no method for automatically synthesizing speech with appropriate pitch, intensity and syllable durations for a given prosodic annotation. Such a system, had it existed, would have been most useful in assessing the acceptability of the annotations produced by the model (see section 6.4).

There are very few text-to-speech systems that generate speech from phonemes and allow the inclusion of prosodic annotations or pitch movement indications. One such system (the speech synthesizer built into the Commodore Amiga personal computer, believed to be modelled on the DECtalk system) allows a specification of stress level following each vowel, ranging from 1 to 9. An attempt was made to convert between the annotations produced by the model and the stress levels accepted by the synthesizer, but there was too little range of control: a general rise-fall contour was imposed by the system outside the user's control, and the stress level number merely perturbs this contour upwards by a varying amount. There is therefore no easy way of indicating a falling tone.

SOLA/PSOLA

An alternative to speech synthesis would be to use real speech and adjust it with the SOLA and PSOLA algorithms to change its length and pitch contour. These are digital signal processing techniques that allow a segment of speech to be changed in duration without affecting the pitch of the utterance (SOLA), and allow the pitch of the speech to be changed without affecting the duration of the speech (PSOLA). They can be used to give any utterance any desired pitch contour and syllable duration. Intensity is modifiable by scaling the waveform with an intensity contour.

It is, however, very difficult for the novice to know exactly how to realise the prosodic annotations in terms of F0 and intensity contours and syllable durations. There is also the additional problem of correctly interpreting between the acoustic measures (F0 and intensity) and perceived pitch.

There is a very clear need for work on relating F0, intensity and duration to prosodic annotations.

7.4.4 Parameter Improvement

The model is defined in terms of the state probabilities and the transition probabilities. A large portion of this research has concentrated upon methods to estimate these parameters directly from the corpus. Improvements in the estimation of these parameters will improve the accuracy of the model, hence any means to perform this would be desirable.

There are other iterative methods that may be exploited to improve the parameters. One method would be to impose random variations in the parameters and observe how these affect the overall performance of the model; a sketch of this is given below. Iteratively this would allow improvements to be made to the poorly estimated parameters, by keeping those changes that improved performance. Many other methods exist; see for example Statistical Language Learning [Cha93].
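A minimal sketch of the perturb-and-keep idea, assuming a score function that runs the model over held-out data and returns its accuracy (both the parameter representation and the function names here are placeholders, not those of the thesis):

    import random

    def perturb_and_keep(params, score, rounds=1000, scale=0.01):
        # Stochastic hill climbing over the model parameters: apply a small
        # random perturbation and keep it only when performance improves.
        best, best_score = dict(params), score(params)
        for _ in range(rounds):
            trial = dict(best)
            key = random.choice(list(trial))
            trial[key] = max(0.0, trial[key] + random.gauss(0.0, scale))
            # A real system would renormalise the probabilities here.
            trial_score = score(trial)
            if trial_score > best_score:
                best, best_score = trial, trial_score
        return best, best_score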

7.5 General Conclusions

This whole thesis starts from the assumption that the prosodic annotations in the Spoken English Corpus do have some correlation to the acoustic signal, i.e. some "realisation" beyond the perception of the human annotators. It is beyond the scope of this thesis to define this prosodic-acoustic correlation exactly, but if and when others do so, then this thesis has mapped out the link between prosodic and syntactic tags, and so will constitute a further link in the chain relating acoustic signals to syntactic analysis.



Appendix A

SEC and MARSEC

A.1 Introduction

This appendix provides an outline description of the data and its organisation in the Spoken English Corpus (SEC) and the Machine Readable Spoken English Corpus (MARSEC).

A.2 The Spoken English Corpus

A.2.1 History

The SEC was the product of a three-year project funded by IBM UK and carried out at the University of Lancaster by Knowles et al. [KT88], with the aim of providing a corpus of data for the analysis of intonation.

The corpus comprises 52,673 words of text recorded principally from BBC Radio 4 broadcasts, and covers a diversity of categories in line with the conventions of the LOB and BROWN corpora.

A.2.2 Categories

The SEC is divided into 11 categories, each category featuring a different variety of speech style. These categories are listed in table A.1 with their size in words and as a percentage of the whole corpus.

Each category is divided into a number of sections, each section comprising a single recording. Table A.2 shows the section numbers (which begin with their category letter), their duration in minutes and seconds, and the approximate number of words in each section (accurate word counts can sometimes be debatable).

A.3 MARSEC

The MARSEC was produced from a two-year ESRC project held jointly by Lancaster University and Leeds University [RKVA94]. The main difference between the SEC and the MARSEC is that the acoustic data has been added to the corpus in the form of a CD-ROM, and fundamental frequency and RMS energy have been calculated. The MARSEC also has the advantage of bringing together all the available corpus material, along with a segmental time alignment of the data and a mechanism for cross-referencing (see Section 3.5).

The MARSEC update of the SEC therefore brings together the following information:

- Digitized recordings of the speech
- Fundamental frequency
- RMS energy
- Segmental time alignment with syllabic divisions
- Prosodic annotation with time alignment
- Orthographic text
- Part of speech annotation
- Parse trees

Figure A.1 shows an example of all the available information for an example utterance. For more detailed information see the references given above.



Category  Style                                   #words  % corpus
A         Commentary                                9066       17%
B         News Broadcast                            5235       10%
C         Lecture type I (general audience)         4471        8%
D         Lecture type II (restricted audience)     7451       14%
E         Religious (including liturgy)             1503        3%
F         Magazine Style Reports                    4710        9%
G         Fiction                                   7299       14%
H         Poetry                                    1292        2%
J         Dialogue                                  6826       13%
K         Propaganda                                1432        3%
M         Miscellaneous                             3352        6%

Table A.1: Categories in the SEC/MARSEC

Section   Time   #Words
A01      15:00      793
A02       4:28      734
A03       4:01      620
A04       5:41      977
A05       4:48      804
A06       4:32      828
A07       3:54      716
A08       4:08      618
A09       5:12      787
A10       4:26      800
A11       4:15      785
A12       4:05      604
B01       9:32     1722
B02       9:40     1720
B03       5:00      940
B04       5:00      853
C01      30:00     4471
D01      19:00     2410
D02      19:00     2434
D03      19:00     2607
E01       6:48      915
E02       4:30      588
F01       3:48      671
F02       3:32      667
F03       4:54      850
F04      13:16     2522
G01      20:00     3163
G02       8:56     1221
G03       2:39      442
G04       5:30      810
G05       9:20     1663
H01       1:41      248
H02       2:03      286
H03       1:00      157
H04       2:59      405
H05       1:17      196
J01       7:58     1674
J02       1:31      279
J03       2:04      375
J04       0:27       74
J05       1:28      277
J06      24:00     4147
K01       4:32      798
K02       4:09      634
M01       0:41       93
M02       1:10      200
M03       0:48      140
M04       1:40      298
M05       4:33      738
M06       7:05     1112
M07       1:06      187
M08       0:47      143
M09       2:24      441

Table A.2: Sections in the SEC/MARSEC



[N a AT1 tiny JJ minority NN1 [P in II [N Argentina NP1 N]P]N]

Figure A.1: Diagram showing waveform, fundamental frequency, RMS energy, segmental, prosodic and treebank transcriptions (only the treebank transcription is reproducible here; the other panels are graphical).



Appendix B

Syntactic Tagging of SEC

B.1 Introduction

This appendix provides a list of the part of speech word class tags used in the version of CLAWS (CLAWS4) with which the SEC treebank version was annotated.

B.2 Word Class Tags

Wordtag   Definition
!         punctuation tag - exclamation mark
"         punctuation tag - quotation marks
$         genitive suffix (' or 's)
&FO       formula
&FW       foreign word
(         punctuation tag - left bracket
)         punctuation tag - right bracket
,         punctuation tag - comma
-         punctuation tag - dash
.         punctuation tag - full stop
...       punctuation tag - ellipsis
:         punctuation tag - colon
;         punctuation tag - semicolon
?         punctuation tag - question mark
APP$      possessive pre-nominal pronoun: my, your, our
AT        neutral article: the, no
AT1       singular article: a, every
BTO       before-infinitive marker: in order, so as (before to)
BTO21     idiom tag
BTO22     idiom tag
CC        general co-ordinating conjunction
CC31      idiom tag
CC32      idiom tag
CC33      idiom tag
CCB       co-ordinating conjunction but
CF        semi-co-ordinating conjunction: so, then, yet
CS        general subordinating conjunction
CS21      idiom tag
CS22      idiom tag
CSA       as as conjunction
CSN       than as conjunction
CST       that as conjunction
CSW       whether as conjunction
DA        neutral after-determiner capable of pronominal function: such
DA1       singular after-determiner: little, much
DA2       plural after-determiner: few, several, many
DA2R      comparative plural after-determiner: fewer
DAR       comparative neutral after-determiner: more, less
DAT       superlative neutral after-determiner: most, least
DB        before-determiner (capable of pronominal fn.): half, all
DB2       plural before-determiner: both (without and)
DD        neutral determiner capable of pronominal function: any, some
DD1       singular determiner: this, that, another
DD121     idiom tag
DD122     idiom tag
DD2       plural determiner: these, those
DD21      idiom tag
DD22      idiom tag
DD221     idiom tag
DD222     idiom tag
DDQ       `wh-' determiner without `-ever': what, which
DDQ$      possessive `wh-' determiner: whose
DDQV      `wh-ever' determiner: whatsoever, whichever
EX        existential there
ICS       preposition-conjunction of time: after, before, since
IF        for as preposition
II        general preposition
II21      idiom tag
II22      idiom tag
II31      idiom tag
II32      idiom tag
II33      idiom tag
IO        of as preposition
IW        with, without as preposition
JA        predicative adjective: tantamount, afraid, asleep
JB        attributive adjective: late, model in a model prisoner
JBR       attributive comparative adjective: upper, outer
JJ        general adjective
JJR       general comparative adjective: older, better, stronger
JJT       general superlative adjective: oldest, best
LE        leading co-ordinator: either (before or)
MC        cardinal: two, 6, 2.34



MC-MC     hyphenated number: 1770-1827
MC1       singular cardinal number: one, 1
MC2       plural cardinal number: threes, 3s
MD        ordinal number: second, 2nd, last
MF        fraction neutral for number: two-thirds
ND1       singular noun of direction: west
NN        common noun neutral for number: sheep, cod
NN1       singular common noun: book, girl
NN121     idiom tag
NN122     idiom tag
NN2       plural common noun: books, girls
NNJ       organization noun neutral for number: Company, group
NNJ1      singular organization noun: conference, Church
NNJ2      plural organization noun: groups, councils
NNL1      singular locative noun: island, Street
NNL2      plural locative noun: islands, streets
NNO       numeral noun, neutral for number agreement: dozen, hundred
NNO2      plural numeral noun: hundreds, millions
NNS1      singular titular noun: Mrs, President
NNS2      plural titular noun: Presidents
NNSA1     following abbrev. singular titular noun: M.A.
NNSB1     preceding abbrev. singular titular noun: Prof.
NNT1      singular temporal noun: day, week, year
NNT2      plural temporal noun: days, weeks, years
NNU       abbreviated unit of measurement neutral for number: in., kg
NNU1      singular unit of measurement: inch, kilo
NNU2      plural unit of measurement: ins., feet
NNU21     idiom tag
NNU22     idiom tag
NP        proper noun neutral for number: Andes, Indies
NP1       singular proper noun: London, Frederick
NP2       plural proper noun: Americas
NPD1      singular weekday noun: Thursday
NPM1      singular month noun: October
PN        indefinite pronoun neutral for number: none
PN1       singular indefinite pronoun: anybody, everyone, one as pronoun
PN121     idiom tag
PN122     idiom tag
PNQO      objective `wh-' pronoun without `-ever': whom
PNQS      `wh-' pronoun without `-ever': who, that
PP$       nominal possessive personal pronoun: mine, yours
PPH1      it
PPHO1     him, her
PPHO2     them
PPHS1     he, she
PPHS2     they
PPIO1     me
PPIO2     us
PPIS1     I
PPIS2     we
PPX1      singular reflexive personal pronoun: yourself, itself
PPX121    idiom tag
PPX122    idiom tag
PPX2      plural reflexive personal pronoun: ourselves, themselves



PPX221    idiom tag
PPX222    idiom tag
PPY       you
RA        adverb after nominal head: else, galore
REX       adverb apposition-introducer: namely, e.g.
REX21     idiom tag
REX22     idiom tag
RG        degree adverb: very, so, too
RG21      idiom tag
RG22      idiom tag
RGA       post-adjectival / adverbial degree adverb: enough, indeed
RGQ       `wh-' degree adverb without `-ever': how
RGQV      `wh-ever' degree adverb: however
RGR       comparative degree adverb: more, less
RGT       superlative degree adverb: most, least
RL        locative adverb: here, there
RL21      idiom tag
RL22      idiom tag
RP        prepositional adverb which is also particle
RR        general adverb
RR21      idiom tag
RR22      idiom tag
RR31      idiom tag
RR32      idiom tag
RR33      idiom tag
RRQ       non-degree `wh-' adverb without `-ever': where, when, why
RRQV      non-degree `wh-ever' adverb: wherever, whenever, however
RRR       comparative general adverb: better, longer
RRT       superlative general adverb: best, longest
RT        nominal adverb of time: now, then
TO        infinitive marker to
UH        interjection: hello, no
VB0       base form be
VBDR      imperfect indicative were
VBDZ      was
VBG       being
VBM       am, 'm
VBN       been
VBR       are, 're
VBZ       is, 's
VD0       base form do
VDD       did
VDG       doing
VDN       done
VDZ       does
VH0       base form have
VHD       had, 'd (preterite)
VHG       having
VHN       had (past participle)
VHZ       has, 's
VM        modal auxiliary: can, may, would
VM21      idiom tag
VM22      idiom tag
VMK       modal catenative: ought, used



VV0       lexical verb, base form: eat, request
VVD       lexical verb, preterite: ate, requested
VVG       "-ing" present participle of lexical verb: giving
VVN       past participle of lexical verb: given
VVZ       3rd singular form of verb: eats, requests
XX        not, n't
ZZ1       singular letter of the alphabet

B.3 Phrase/Clause Tags

Table B.1 presents the list of phrase/clause node labels used in the treebank version of the SEC. Note that some node labels will sometimes occur with a `&' or `+' suffix in order to show the co-ordination of phrases or clauses.

Label   Definition
F       finite clause, divided into:
Fa        adverbial clause
Fc        comparative clause
Fn        noun clause
Fr        relative clause
G       genitive phrase
J       adjectival phrase
N       noun phrase
Nr      temporal adverbial noun phrase
P       prepositional phrase
S       independent sentence (sentential conjunct)
Si      interpolated or appended sentence
T       non-finite clause, divided into:
Tg        clause with present-participle head
Ti        clause with infinitive head
Tn        clause with past-participle head
V       verb phrase (sequence of auxiliary & main verbs, excl. object, complement, etc.)

Table B.1: Phrase and Clause labels



Appendix C

Testing Data

This appendix presents the text of those sections of category M that were used as testing data, the results of which are presented in table 6.3. Also presented are the annotations produced by the model.

C.1 Corpus Texts: Category M

Below are some of the prosodically annotated texts from Category M. Category M is the miscellaneous section and the texts are therefore of a variety of styles. Not all of the texts were used for testing: M01 was omitted from the test set because it is a short poetry reading by John Betjeman and is therefore in an unusual style, and M06 was omitted because of a technical problem associated with alignment. It was not deemed necessary to have the complete category.

C.1.1 Section M02

046 SPOKEN ENGLISH CORPUS TEXT M02
Motoring News
Speaker: male
Broadcast notes: Radio 4, 8.55 a.m., January 18th, 1987



Transcriber: BJW

in spite of the low | of the slow thaw | conditions are probably more " dangerous on the roads this morning | because yesterday's slush and " snow | is this morning's ice || in Kent | the A two six four is still closed at " Blackham | as are many side roads || and about a dozen of the more isolated villages | remain cut off || apart from the weather | gas and water main repairs | together with | scheduled weekend work | are also going to affect the roads today || and in London | expect long delays in " Chiswick || where the westbound elevated section of the M 4 | is closed for most of today | and only a single eastbound lane is open || in Lancashire | northbound traffic on the M 6 | will be restricted to the centre lane only | between junctions thirty-one | and thirty-two from " Preston || to the M fifty-five intersection until 4 p.m. || and you can expect long queues there || at 10 o'clock this morning | the Thames Valley police are diverting all northbound traffic off the M 1 || and are leaving only one southbound lane open | between junctions fourteen and fifteen | while a rather complicated recovery operation is being carried out || so expect very long delays there || that's between junctions fourteen | and fifteen | on the M 1 this morning ||

C.1.2 Section M03

047 SPOKEN ENGLISH CORPUS TEXT M03
Weather Forecast
Speaker: male
Broadcast notes: Radio 4, January 18th, 1987
Transcriber: BJW



now the weather forecast | #until dawn tomorrow || over England and Wales | many places will be cloudy but dry || but in parts of Cornwall and west Wales | a little light rain is likely | #which will extend into parts of Cumbria overnight || eastern Scotland | will have a dry | cloudy day | followed by rain or sleet | tonight || over Northern Ireland | and western Scotland | rain will extend from the southwest | reaching Northern Ireland | early this morning | and western Scotland soon after || outbreaks of rain | #will persist | #into the night || but after midnight | they're expected to clear | from Northern Ireland || eastern areas will again be cold | but in the west it'll become less cold than recently || and the outlook for Monday and Tuesday | mainly | dry and cold in the south east | but remaining areas will be milder | with | a little rain || "so things are improving slightly there ||

C.1.3 Section M04

048 SPOKEN ENGLISH CORPUS TEXT M04
Programme News
Speaker: male
Broadcast notes: Radio 4, January 18th, 1987
Transcriber: GOK

now let's look at our programmes | coming up on * Radio 4 this morning || well | in a moment | in *"two-and-a-half minutes or thereabouts | there's the news | followed | by our browse | through the Sunday papers || then at 9.15 | Alistair Cooke | presents this week's * Letter from America || our morning service | at 9.30 | comes this week | from Enfield | in Middlesex || and it's followed | at 10.15 | by * another chance | to catch up with the week's goings on at Ambridge || Margaret Howard | will be here with her * Pick of the Week | at a quarter past eleven | and this week | we'll be hearing about * camel wrestling in Turkey | nights spent in the Great Pyramids at Cairo | Hamlet | in Elsinore | and tales | from wildest Canada | and Ecuador || add some * do-it-yourself Gilbert and Sullivan | and a reminder of the comedy of Al Reid | and that's our Pick of the Week | at eleven fifteen || the castaway | in Desert Island Discs | one hour later | is silly-ass actor Jeremy Lloyd | who's also known | as a scriptwriter || he's one half of the writing team | that created * Are You Being Served | and * 'Allo 'Allo || he'll be chatting about * chatting about his career | to Michael Parkinson | and picking his eight favourite records | at "a quarter past twelve || and that takes us to # lunchtime | and * The World at One || finally | I've just time | for a word about our Sunday feature | which is a poetry programme | with a difference | Gardens of Eden | by Micheline Wonder || Maureen Lipman plays Eve | Adam's first wife | according to the New Testament | and * Miriam Margolis | plays Lillith | who is Adam's first wife | according to the alphabet | of Ben Seurat || their meeting involves | a kind of life swap | Lillith | journeys to the Old Testament | and takes tea with the Lord | while Eve decides | she's had enough | of being everyone's mother ||

C.1.4 Section M05

049 SPOKEN ENGLISH CORPUS TEXT M05
Nelson Mandela speech
Speaker: Colin Lyas
Recorded at MSU, University of Lancaster
Transcriber: BJW

your Royal Highness and Chancellor | "it is my privilege to present to you | on behalf of the Senate | the name of Dr Nelson Mandela || the imprisoned leader of the African National Congress | as one eminently worthy of the degree of Doctor of Laws || Dr Mandela's life | has been devoted to the effort to secure | for all the citizens of his native South Africa | regardless of their colour | certain simple yet basic rights || the most fundamental of which is the right of each of those who must obey the law | to an equal voice within the political system | under which the law is created || those who live in countries | where the basic rights | that Dr Mandela seeks for his people have been won | owe at the very least | a duty of sympathy | for those who have no such rights || and they owe too | a duty of respect to those who | like Dr Mandela | have unceasingly striven in the face of hardship and danger | to claim those rights || but in addition to what is owed to Dr Mandela by the citizens of any free nation | any university in a free society | owes him a special tribute || for in his speeches and writings | Dr Mandela has unswervingly asserted the centrality of an open education | to the cultural life of any nation || he has insisted | as the Charter of the African National Congress puts it | that the doors of learning and culture | shall be open || he has emphasised | that in a healthy society | young scholars are to be thought of as a credit to their nation | and not merely as a threat to its rulers || he has insisted upon the profoundly liberalising effects | of the meeting of the world's peoples | in open institutions of learning || and to use his own words | he has resoundingly affirmed | that for centuries universities have served | as centres | for the dissemination of learning and knowledge | to all students | irrespective of colour and creed || in multi-racial societies he continues | they serve as the centres for the development of the cultural and spiritual aspects | of the life of the people || "the Charter of this university commands | that no test related to sex | race | colour | or creed | shall be imposed upon any person | in order to entitle him | to be admitted | as a member | teacher | or student || "for the members of this university | this charter enshrines a victorious principle || and the fruits of that victory can immediately be seen | in the international community of scholars | that has graduated here today || their presence has enriched this university and this country || and many will return home | to enhance their own nations || but those who live by the principles of such charters as our own | owe a special duty of testimony | to those | for whom the fight to achieve a recognition of those principles | has not been won || whose allegiance to the principle of an open educational system | in an open society | is a confession and a proclamation to be paid for | in the coin of imprisonment | separation | and even death || and Nelson Mandela has of course been willing to pay that price || "your Royal Highness and Chancellor || at all times | there have been women and men | whose lives and words have taken on a special meaning | to innumerably many of their fellow human beings || their lives embody | and their words articulate | the legitimate aspirations of the deprived | the suffering | and the slighted || Nelson Mandela | has become one such || and I can think of no better way to commend him to you | than to use his own closing words | spoken in court | at the end of his final trial | when he was indeed facing the possibility of a sentence of death || during my life I have dedicated myself to the struggle of the African people || I have fought against white domination | and I have fought against black domination || I have cherished the ideal of a democratic and free society || in which all persons live together | in harmony | and with equal opportunities || it is an ideal which I hope to live for and achieve | but if needs be | it is an ideal for which I am prepared to die || a life that has indeed been lived in the spirit of these ideals | cannot but command our respect || and I therefore present to you the name of Nelson Mandela | alas in absentia | as one eminently worthy | of the award of the degree | of Doctor of Laws | honoris causa ||



C.1.5 Section M07

051 SPOKEN ENGLISH CORPUS TEXT M07
Travel Roundup
Speaker: male
Broadcast notes: Radio 4, 8.55 a.m., January 25th, 1987
Transcriber: GOK

" ve to n<strong>in</strong>e now j <strong>and</strong> er * here's this morn<strong>in</strong>g's travel roundup j weekend<br />

engi neer<strong>in</strong>g work j will cause problems j for both<br />

road j <strong>and</strong> rail travellers to day<br />

I'm a fraid k de lays can be ex pected j on the M8 <strong>in</strong> Glasgow j where there are lane<br />

closures <strong>in</strong> " both di rections j between K<strong>in</strong>gston Bridge j <strong>and</strong> Char<strong>in</strong>g Cross<br />

tunnels k at junction 10 j the on ramp is closed j from Barty Beeth road j<br />

to gether with two westbound lanes j on the<br />

motorway itself k at " Manchester j<br />

work on the M 6 2 j has closed the nearside lane j <strong>and</strong> hard shoulder j <strong>of</strong> both<br />

carriageways j at junction seven teen j to wards Prestwich k <strong>and</strong> there are also j<br />

eastbound lane closures j on the M5 6 lead<strong>in</strong>g from Cheshire j be tween junction<br />

3 <strong>and</strong> 4 j at Altr<strong>in</strong>cham k near Worcester j both carriageways <strong>of</strong> the M 5 are still<br />

closed j follow<strong>in</strong>g overnight work j be tween junctions 5<strong>and</strong> 6 j <strong>and</strong> di versions<br />

are a long the A3 8 j until 10 this morn<strong>in</strong>g k on the " railways j engi neer<strong>in</strong>g work j<br />

is widespread j with buses operat<strong>in</strong>g <strong>in</strong> stead <strong>of</strong> tra<strong>in</strong>s j on some routes k eastern<br />

region tra<strong>in</strong>s j will be de layed j on the London K<strong>in</strong>g's Cross to Peterborough<br />

l<strong>in</strong>e j as will western region services j on the Padd<strong>in</strong>gton j to Exeter route k I<br />

hope you get there <strong>in</strong> the end k<br />

C.1.6 Section M08

052 SPOKEN ENGLISH CORPUS TEXT M08
Weather forecast
Speaker: male
Broadcast notes: Radio 4, January 25th, 1987
Transcriber: GOK

here's the weather forecast for the United Kingdom until | dawn to#morrow || in southern counties | of England and Wales | it'll be dull | but * early patches of mist and drizzle | will # clear during this morning || over " northern England | and Northern Ireland | it'll stay mainly cloudy | but dry | during today | and tonight || in southern " Scotland after early sunshine in places | it'll be mostly cloudy | but dry | although tonight there" may be some light rain || northern Scotland | will have occasional light rain | which will be followed during the day | by colder | but still * mainly cloudy weather | with a few sleet and snow showers || temperatures today | will be * much as yesterday | except in northern Scotland | where it'll turn noticeably colder || and the outlook for Monday and Tuesday | it'll be rather cold | in most places | with south western areas | staying dry | but elsewhere | some light rain or sleet | is likely ||

C.1.7 Section M09

053 SPOKEN ENGLISH CORPUS TEXT M09
Programme News
Speaker: male, Margaret Howard
Broadcast notes: Radio 4, January 25th, 1987
Transcriber: BJW

we'll just check | some of today's programmes | on Radio 4 | " for you | for our Morning Service | at half past nine | we'll join the congregation in the parish church of St Faith in Great Cosby || and then at a quarter past ten | some Ambridge bell-ringing to enjoy || #just one highlight in the lives of the Archers this week || after that Margaret Howard | with her selection | of listening highlights | from the past week's broadcasting ||

CHANGE OF SPEAKER: MARGARET HOWARD

"the chief constable of Greater Manchester | James Anderton has been much in the news this week || many interpretations have been put on his statements about being an instrument of " God || on Pick of the Week | you can hear | what he actually said || Nigel Hawthorne | temporarily deserts Yes Prime Minister | for a new role | on One Man and His Dog || he plays the part of a five year old border collie || we recall the day that King George the Sixth | kept the Empire waiting || David Frost | elicits a cure for snoring | and we take a ride out with a Bicester | in pursuit of the fox ||

CHANGE OF SPEAKER: MALE

Margaret Howard || that's all in Pick of the Week | at a quarter past eleven || an hour later | Baroness Ryder of Warsaw | joins Michael Parkinson | for a stint on the old desert island || the Baroness heads the Sue Ryder foundation | which looks after the sick and disabled in many parts of the world || and in past years has frequently driven lorries full of medical supplies and provisions | to distressed areas of central Europe || all she has to do today | is to pick her eight desert island discs so | join her | and enjoy her musical selection | at a quarter past | #twelve || "this evening at a quarter past six | the first of two Actuality profiles | of some men and women recently recruited | "to do | voluntary overseas service || over the next two weeks | we'll join them | as they're prepared for | not only the excitement of living in strange and exotic climes | but also | the down side of things ||

SPEECH EXTRACT OMITTED

well they all sound cheerful enough | but for most VSO recruits | the training period is a mixture of fears | fantasies | and expectations || as you can find out through Actuality tonight at a quarter past six || and just a reminder | that it's Burns night || #if you're not planning to go out for a Burns night supper | then stay in and enjoy The Miller's Reel | which begins at a quarter past seven | this evening || The Miller's Reel | takes the form of a love story | woven | from the letters | poems | and songs of Robert Burns | and features the singing of Jean Redpath and Rod Patterson || that's The Miller's Reel | especially for Burns night | # here on Radio 4 ||

C.2 Prediction Results

C.2.1 Extract from section M05

The text presented below is an extract from section M05. This extract was not produced automatically: the output of the model is not in this format but is a list of symbols, one per word, signifying the prosodic annotation, plus symbols for the tone unit boundaries, which are not in fact predicted by the model. Instead a simple rule is used (sketched below): a major boundary is generated by major punctuation symbols (such as full stop, colon, semi-colon, exclamation mark, and question mark) and a minor boundary by commas.

In this prediction a single stress symbol covers any non-accented stress, as well as the class stressed but unaccented, which is not well defined. Rises, falls and fall-rises are not distinguished here between high and low.



your Royal Highness and Chancellor || it is my privilege to present to you | on behalf of the Senate | the name of Dr Nelson Mandela || the imprisoned leader of the African National Congress | as one eminently worthy of the degree of Doctor of Laws¹ || Dr Mandela's life † has been devoted to the effort to secure † for all the citizens of his native South Africa † regardless of their colour | certain simple yet basic rights || the most fundamental of which is the right of each of those who must obey the law † to an equal voice within the political system † under which the law is created ||

¹ The lack of a fall here is probably due to the lack of a predicted tone unit boundary at the end of the predicted stretch. A limit is placed on how many words are worked on at once; in this case the algorithm just missed the end-of-sentence full stop that would have given rise to a tone unit boundary. Other missing boundaries are denoted by †.



Appendix D

Word-Class / TSM Co-occurrence Figures

The figures presented in this appendix were calculated from the sub-corpus sample used throughout this research.

D.1 Tonic Stress Mark Frequencies

Absolute Frequency   ASCII Symbol
12801                @   unstress
 3511                *
 2564                `
 2297                ~
 1528
 1511                `/
 1200                \
 1158                ,
  342                \ ,
  261                /
    8                /`



D.2 Word Class Frequencies.<br />

Freq. Tag<br />

3536 NN1<br />

2067 AT<br />

2066 II<br />

1673 JJ<br />

1516 NN2<br />

1299 NP1<br />

884 IO<br />

800 VV0<br />

773 CC<br />

771 RR<br />

734 AT1<br />

725 VVN<br />

470 VVD<br />

440 MC<br />

409 TO<br />

375 VVG<br />

332 APP$<br />

300 VM<br />

283 DD1<br />

281 VBDZ<br />

267 IF<br />

266 CST<br />

262 PPH1<br />

258 VBZ<br />

208 NNT1<br />

205 RP<br />

192 VVZ<br />

178 IW<br />

172 NN<br />

167 VB0<br />

167 CCB<br />

164 JB<br />

158 DDQ<br />

157 CS<br />

152 VH0<br />

150 MD<br />

141 NNL1<br />

130 VBR<br />

130 PPHS1<br />

120 ICS<br />

119 PPHS2<br />

118 RG<br />

117 NNJ<br />


114 VBN<br />

110 RL<br />

108 RT<br />

106 PPIS2<br />

99 VBDR<br />

98 MC1<br />

97 DD<br />

93 XX<br />

93 NNS1<br />

93 CSA<br />

92 NNT2<br />

91 VHZ<br />

88 EX<br />

88 DB<br />

87 PNQS<br />

84 VHD<br />

79 NNSB1<br />

75 NNO<br />

75 DD2<br />

73 MF<br />

72 RRQ<br />

71 PPY<br />

71 II22<br />

71 II21<br />

60 RR22<br />

60 RR21<br />

59 PPIS1<br />

51 JJT<br />

49 NNU2<br />

46 DA<br />

42 RRR<br />

42 PPHO2<br />

42 DAR<br />

42 DA2<br />

38 RGR<br />

38 PN1<br />

37 JJR<br />

34 CF<br />

32 VDD<br />

32 NPM1<br />

31 REX22<br />

31 REX21<br />

29 PPX1<br />


28 VBG<br />

28 RA<br />

28 NNU21<br />

27 NNU22<br />

27 NNJ2<br />

26 ZZ1<br />

26 UH<br />

25 VD0<br />

25 DA1<br />

25 &FW<br />

24 CSN<br />

23 RGT<br />

23 PPIO2<br />

23 ND1<br />

18 NNL2<br />

18 DAT<br />

18 CS22<br />

18 CS21<br />

17 PPHO1<br />

17 NNU1<br />

17 NNJ1<br />

16 LE<br />

15 DB2<br />

14 CSW<br />

13 VDZ<br />

12 PPX2<br />

12 NNS2<br />

12 DD222<br />

12 DD221<br />

11 VHG<br />

11 MC2<br />

10 PPIO1<br />

10 NNU<br />

10 II33<br />

10 II32<br />

10 II31<br />

9 NPD1<br />

8 VHN<br />

8 RR33<br />

8 RR32<br />

8 RR31<br />

8 RGQ<br />

8 NP<br />


7 VM21<br />

7 RGA<br />

7 JA<br />

6 VDG<br />

6 PP$<br />

5 VDN<br />

4 PN<br />

4 NNO2<br />

4 MC{MC<br />

4 DDQV<br />

4 DD22<br />

4 DD21<br />

4 BTO22<br />

4 BTO21<br />

3 RL22<br />

3 RL21<br />

3 DDQ$<br />

3 DD122<br />

3 DD121<br />

2 RG22<br />

2 RG21<br />

2 PN122<br />

2 PN121<br />

2 NP2<br />

2 NN122<br />

2 NN121<br />

2 JBR<br />

2 &FO<br />

1 VBM<br />

1 RRT<br />

1 RRQV<br />

1 RGQV<br />

1 PPX122<br />

1 PPX121<br />

1 PNQO<br />

1 NNSA1<br />

1 NNS<br />

1 CC33<br />

1 CC32<br />

1 CC31<br />

D.3 Tag/Tone Co-occurrences

The following table shows all the occurring word-class tags along with their frequencies of
co-occurrence with the TSMs low rise, high rise, low fall, high fall, low level, high level,
low fall-rise, high fall-rise, stressed but unaccented, and unstressed.
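For example, in the table below the row for AT (articles) shows 2,016 of its 2,099 occurrences
as unstressed, whereas for NN1 (singular common nouns) only 275 of 3,625 occurrences are
unstressed.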



Tag       ,    /    \    `    _    ~    \,   `/   Str  U/str

&FO 0 0 0 0 0 0 1 0 1 0<br />

&FW 1 0 6 2 1 5 1 1 3 5<br />

APP$ 1 0 2 9 1 8 0 7 6 303<br />

AT 0 0 0 18 1 19 0 3 42 2016<br />

AT1 1 0 0 1 1 3 0 1 12 739<br />

BTO21 0 0 0 0 1 0 0 0 1 2<br />

BTO22 0 0 0 0 1 0 0 0 0 3<br />

CC 5 1 1 12 1 18 0 6 57 683<br />

CC31 0 0 0 0 0 0 0 0 0 1<br />

CC32 0 0 0 1 0 0 0 0 0 0<br />

CC33 0 0 0 0 0 0 0 0 0 1<br />

CCB 2 0 0 0 0 3 0 1 23 140<br />

CF 4 0 0 1 0 3 1 1 5 19<br />

CS 3 1 0 12 3 17 0 0 30 92<br />

CS21 0 0 0 4 0 4 0 1 4 5<br />

CS22 0 0 0 1 1 0 0 1 1 14<br />

CSA 1 0 0 4 1 8 0 0 7 72<br />

CSN 0 0 0 0 0 0 0 0 0 24<br />

CST 0 0 0 0 0 0 0 0 8 260<br />

CSW 1 0 0 4 1 0 0 0 7 1<br />

DA 1 0 1 2 6 8 0 4 7 18<br />

DA1 1 0 0 2 2 6 0 2 7 6<br />

DA2 2 1 2 4 2 8 0 7 6 10<br />

DAR 1 1 3 6 5 6 0 3 6 11<br />

DAT 2 0 0 2 0 3 3 6 1 1<br />

DB 0 1 9 18 5 19 2 14 14 8<br />

DB2 0 0 0 1 0 4 0 6 2 2<br />

DD 0 0 1 13 9 20 0 17 14 23<br />

DD1 5 1 4 38 6 44 1 31 42 113<br />

DD121 0 0 0 0 0 0 0 0 0 3<br />

DD122 0 0 0 1 1 0 0 1 0 0<br />

DD2 5 0 1 3 7 14 0 4 12 29<br />

DD21 0 0 0 0 0 0 0 0 0 4<br />

DD22 0 0 0 1 0 0 0 0 1 2<br />

DD221 0 0 0 0 0 0 0 0 0 12<br />

DD222 0 0 0 3 0 2 0 1 3 3<br />

DDQ 3 0 1 10 7 11 0 1 22 106<br />

DDQ$ 0 0 0 0 0 0 0 0 0 3<br />


DDQV 0 0 0 3 0 0 0 1 0 0<br />

EX 0 0 0 0 1 2 0 0 6 79<br />

ICS 1 0 1 9 7 9 0 5 24 65<br />

IF 1 0 0 1 1 0 0 0 11 254<br />

II 7 1 11 25 13 47 1 25 128 1847<br />

II21 1 0 0 9 4 7 1 6 16 26<br />

II22 0 0 0 0 1 0 0 0 0 70<br />

II31 0 0 0 0 0 0 0 0 0 11<br />

II32 0 0 1 2 2 1 0 1 3 1<br />

II33 0 0 0 0 0 0 0 0 0 11<br />

IO 0 0 0 1 0 0 0 0 7 896<br />

IW 0 0 0 4 1 5 0 3 12 157<br />

JA 1 0 1 3 0 1 0 0 1 0<br />

JB 7 0 3 23 18 28 1 20 40 24<br />

JBR 0 0 0 0 0 1 0 1 0 0<br />

JJ 68 15 87 263 200 354 37 177 356 145<br />

JJR 2 0 2 5 4 8 1 11 1 3<br />

JJT 1 1 2 12 3 9 0 14 3 6<br />

LE 0 0 0 3 1 0 0 1 7 4<br />

MC 12 2 22 78 63 108 4 35 73 50<br />

MC-MC 0 0 0 1 0 2 0 0 0 1<br />

MC1 4 1 2 10 11 22 0 12 18 18<br />

MC2 3 0 1 3 0 0 1 1 2 0<br />

MD 3 0 4 18 22 27 3 26 20 29<br />

MF 3 0 2 13 1 10 0 1 16 24<br />

ND1 1 0 5 4 2 2 0 1 5 3<br />

NN 14 3 13 23 14 23 3 6 45 28<br />

NN1 379 80 373 610 285 348 106 359 810 275<br />

NN121 0 0 1 0 1 0 0 0 0 0<br />

NN122 0 0 1 0 0 0 0 0 1 0<br />

NN2 160 26 172 236 134 130 45 124 364 141<br />

NNJ 7 5 6 20 12 3 7 8 30 19<br />

NNJ1 1 2 3 2 2 2 0 1 3 1<br />

NNJ2 3 0 1 4 3 4 1 2 7 2<br />

NNL1 15 4 20 23 13 11 2 8 26 19<br />

NNL2 0 0 2 3 6 2 1 0 2 2<br />

NNO 3 0 2 9 7 4 0 10 14 26<br />

NNO2 0 0 0 3 0 0 0 0 1 0<br />

NNS 1 0 0 0 0 0 0 0 0 0<br />

NNS1 3 3 3 8 9 15 2 5 31 13<br />

NNS2 2 0 1 2 0 0 1 3 2 1<br />

NNSA1 0 1 0 0 0 0 0 0 0 0<br />

NNSB1 2 0 0 1 1 3 0 0 12 60<br />


NNT1 15 3 19 32 13 15 4 12 62 35<br />

NNT2 12 1 13 21 4 4 1 7 15 14<br />

NNU 1 0 0 1 1 0 0 1 3 3<br />

NNU1 2 2 1 3 0 1 2 1 3 2<br />

NNU2 2 1 13 9 4 2 1 3 10 4<br />

NNU21 0 0 0 0 0 0 0 0 1 27<br />

NNU22 0 0 4 6 1 0 1 2 8 5<br />

NP 0 0 0 1 1 1 0 0 1 4<br />

NP1 91 52 129 191 131 191 37 155 222 117<br />

NP2 0 0 0 2 0 0 0 0 0 0<br />

NPD1 0 0 2 3 3 1 0 0 0 0<br />

NPM1 1 0 7 9 1 1 2 6 3 2<br />

PN 0 0 0 3 0 1 0 0 0 0<br />

PN1 4 0 0 7 4 6 1 7 4 5<br />

PN121 0 0 0 0 1 0 1 0 0 0<br />

PN122 0 0 0 0 0 0 0 0 0 2<br />

PNQO 0 0 0 0 0 0 0 0 1 0<br />

PNQS 0 1 0 1 1 2 0 0 4 78<br />

PP$ 0 0 0 1 0 0 0 0 0 5<br />

PPH1 0 0 1 1 0 0 0 0 16 245<br />

PPHO1 1 0 1 0 0 1 0 0 2 12<br />

PPHO2 0 0 0 0 0 0 0 1 1 40<br />

PPHS1 0 0 1 5 1 3 0 1 5 116<br />

PPHS2 1 0 0 5 2 2 1 6 7 97<br />

PPIO1 0 0 0 1 0 1 0 0 1 7<br />

PPIO2 0 0 0 0 0 0 0 0 0 23<br />

PPIS1 0 0 0 1 1 2 0 3 6 46<br />

PPIS2 0 0 0 0 0 4 0 1 5 96<br />

PPX1 1 0 5 6 2 1 4 4 7 1<br />

PPX121 0 0 0 0 0 0 0 0 2 0<br />

PPX122 0 0 1 0 0 0 0 0 1 0<br />

PPX2 1 0 1 0 2 0 1 4 1 2<br />

PPY 0 0 0 0 1 0 0 1 1 68<br />

RA 2 0 4 3 0 3 1 2 7 6<br />

REX21 0 0 0 0 0 0 0 0 1 30<br />

REX22 3 1 1 1 3 0 0 1 20 1<br />

RG 0 1 0 14 6 22 0 5 19 52<br />

RG21 0 0 0 0 0 0 0 0 0 2<br />

RG22 0 0 0 0 0 0 0 0 0 2<br />

RGA 0 0 1 2 1 0 0 0 2 1<br />

RGQ 1 0 0 0 1 1 0 0 1 4<br />

RGQV 0 0 0 0 0 1 0 0 0 0<br />

RGR 0 0 0 1 5 6 0 3 3 20<br />

RGT 1 0 0 1 0 2 0 1 5 13<br />


RL 8 1 16 13 10 15 3 4 26 15<br />

RL21 0 0 0 0 0 0 0 0 0 3<br />

RL22 0 1 0 1 0 0 0 1 0 0<br />

RP 14 4 24 33 11 18 4 9 24 66<br />

RR 41 5 28 157 65 130 16 83 132 117<br />

RR21 0 0 0 0 1 1 0 0 7 51<br />

RR22 7 1 3 14 2 5 0 6 16 6<br />

RR31 0 0 0 0 0 0 0 0 3 5<br />

RR32 0 0 2 4 1 0 0 0 0 1<br />

RR33 0 0 0 0 1 0 0 0 0 7<br />

RRQ 0 0 0 11 3 6 0 2 9 41<br />

RRQV 0 0 0 0 0 0 0 0 0 1<br />

RRR 3 0 7 3 6 7 1 6 8 1<br />

RRT 0 0 0 0 0 1 0 0 0 0<br />

RT 12 2 8 10 7 15 1 11 23 22<br />

TO 0 0 0 0 0 0 0 0 3 410<br />

UH 3 2 1 2 1 5 0 0 4 8<br />

VB0 1 0 1 2 0 1 0 0 9 153<br />

VBDR 0 0 0 2 0 6 0 1 1 90<br />

VBDZ 0 0 0 2 3 4 0 1 11 261<br />

VBG 0 0 0 0 0 2 0 0 4 22<br />

VBM 0 0 0 1 0 0 0 0 0 0<br />

VBN 0 1 0 1 2 1 0 0 5 104<br />

VBR 0 0 1 8 4 3 0 2 5 108<br />

VBZ 2 1 3 8 1 6 0 8 10 229<br />

VD0 0 0 1 5 1 2 0 8 1 7<br />

VDD 1 0 2 4 1 8 0 0 3 14<br />

VDG 0 0 0 0 1 1 1 0 2 1<br />

VDN 0 0 1 0 2 0 0 0 0 2<br />

VDZ 0 0 0 2 1 2 0 0 3 5<br />

VH0 1 0 0 3 3 4 0 5 13 123<br />

VHD 0 0 0 2 1 6 0 0 3 75<br />

VHG 0 0 0 0 2 1 0 0 2 6<br />

VHN 0 0 0 2 1 0 0 0 2 3<br />

VHZ 1 0 0 3 1 2 0 1 5 79<br />

VM 2 0 5 20 9 19 0 10 17 218<br />

VM21 0 0 0 2 0 2 0 0 2 1<br />

VV0 67 7 47 104 90 105 9 47 185 144<br />

VVD 13 10 21 53 47 85 2 28 147 67<br />

VVG 21 3 6 54 51 71 5 20 105 42<br />

VVN 67 8 62 107 96 115 16 56 152 60<br />

VVZ 11 2 9 20 21 33 1 4 54 40<br />

XX 0 1 4 21 6 22 0 8 10 22<br />



Appendix E<br />

Punctuation <strong>and</strong> Boundaries<br />

Table E.1 shows the frequency of co-occurrence of punctuation with tone unit boundaries. {CTU} is
a symbol that means more than one punctuation symbol matched the same boundary; the second
or third punctuation symbols are marked with it. Punctuation that does not match any
tone unit boundary is marked with the {PN} symbol, and tone unit boundaries that do not match
punctuation are marked with the {TU} symbol.

Analysis of the unusual cases, such as the four full stops not coinciding with tone unit boundaries,
often shows that the cases are questionable. In general the only punctuation symbols that do not
match tone unit boundaries are quotes and commas.



TAG     {PN}     ||      |     ^   {CTU}
-          1     21     69     0     14
!          1     13      1     0      0
?          1     40      2     1      1
.          4   1045     57     0     16
;          1     68     38     0      0
:          5     74     45     3      2
,        255    169   1425    13      6
(          2      6     26     0      1
)          3      0     19     0     13
"         56     18     44    14    102
...        1      2      1     0      1
{TU}     N/A     55   3503    74    N/A

Table E.1: Punctuation/Tone Unit Boundary Co-occurrence Table.
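The strength of the correspondence is clear from the table: 1,045 of the 1,122 full stops (over
93 per cent) coincide with a major boundary, while commas are the least reliable cue, with 255 of
1,868 matching no boundary at all.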



Appendix F<br />

Source Code<br />

F.1 symbolify.c<br />

This program changes the symbols used for the prosodic annotation to the more iconic ASCII scheme
devised during this research. There were found to be two formats of prosodic annotation: one which used
(generally unused) characters above ASCII 128, and another that used the sequence #xxx, where xxx is
the ASCII value of the character used in the first format. The original annotation system used a specially
modified character set that represented each of the prosodic symbols under a PC-based system, but since
character sets like this are not generally available to most UNIX users this program remaps the symbols
into standard ASCII characters.

/******************************************************************************\
  symbolify
  - changes numbers to symbols in a SEC prosody transcription.

  AUTHOR:
  symbolify.c (c) Copyright January 1992, Simon Arnfield. All Rights Reserved.

  SYNOPSIS:
  symbolify [-h] prosody-filename

  DESCRIPTION:
  Gets prosody information from a SEC prosody transcription and converts to
  ASCII symbols.  Handles two different formats of prosody transcription:
  select the second format with the -h switch.
  The first format uses byte codes to represent the different tonic stress
  marks; the second format uses the string code #xxx to represent TSM symbols
  as listed below.

  String  Octal  Symbol  Symbol name
  #161    241    `/      high fall-rise
  #162    242    /`      high rise-fall
  #246    366    ,\      low rise-fall  --changed from 240 because of ^ problem
  #247    367    \,      low fall-rise
  #171    253    ,       low rise
  #172    254    /       high rise
  #174    256    `       high fall
  #173    255    \       low fall
  #163    243    ~       high level
                 _       low level
  #248    370    *       level stress (also #249, 371)
  #165    245    >       high reset
  #166    246    <       low reset
  #240    360    ^       hesitation TU boundary (see low rise-fall)
                 |       minor tone unit boundary
                 ||      major tone unit boundary

  FILES:

  REFERENCES:

  BUGS:

  PROGRAM MODIFICATION HISTORY
  Date     | By  | Vers | Comments
  ---------+-----+------+------------------------------------------------------
  11/02/92 | ScA | 1.0  | Created original code from extract-prosody.c
  28/04/92 | ScA | 1.1  | Made alterations to numbers/symbols with new info
           |     |      | of errors made in transcription.
  03/07/92 | ScA | 1.2  | Added handling of byte codes and switch to select which
\******************************************************************************/

/*----------------------------*\
|  INCLUDES AND DEFINITIONS    |
\*----------------------------*/

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define HASH      1
#define CHARACTER 2

/*-------------------------*\
|  FUNCTIONS DEFINITIONS    |
\*-------------------------*/

void error_exit(msg)
char *msg;
{
	printf("%s\n", msg);
	exit(1);
}

void main(argc, argv)
int argc;
char *argv[];
{
	FILE *tag1;
	int fnameno = 1, ftype = CHARACTER;
	char cc[4];
	unsigned char c;

	if (argc != 2 && argc != 3)
		error_exit("Usage: symbolify [-h] prosody-transcription-file");
	if (argc == 3) {
		fnameno = 2;
		if (!strcmp(argv[1], "-h"))
			ftype = HASH;
		else
			error_exit("Invalid switch");
	}
	if (!(tag1 = fopen(argv[fnameno], "r")))
		error_exit("Filename error!");

	while (!feof(tag1)) {
		c = getc(tag1);
		switch (ftype) {
		case HASH:
			if (c != '#')
				putchar(c);
			if (c == '#') {
				fscanf(tag1, "%3s", cc);
				if (!strcmp(cc, "161")) printf("`/");
				if (!strcmp(cc, "162")) printf("/`");
				if (!strcmp(cc, "246")) printf(",\\");
				if (!strcmp(cc, "247")) printf("\\,");
				if (!strcmp(cc, "171")) printf(",");
				if (!strcmp(cc, "172")) printf("/");
				if (!strcmp(cc, "173")) printf("\\");
				if (!strcmp(cc, "174")) printf("`");
				if (!strcmp(cc, "163")) printf("~");
				if (!strcmp(cc, "165")) printf(">");
				if (!strcmp(cc, "166")) printf("<");
			}
			break;
		case CHARACTER:
			switch ((int)c) {
			case 161: printf("`/");  break;
			case 162: printf("/`");  break;
			case 247: printf("\\,"); break;
			case 246: printf(",\\"); break;
			case 171: printf(",");   break;
			case 172: printf("/");   break;
			case 173: printf("\\");  break;
			case 174: printf("`");   break;
			case 163: printf("~");   break;
			case 165: printf(">");   break;
			case 166: printf("<");   break;
			/* the tail of this listing was truncated in the source; */
			/* the default case below is reconstructed: any other    */
			/* byte is copied through unchanged.                     */
			default:  putchar(c);    break;
			}
			break;
		}
	}
	fclose(tag1);
}


F.2 ttalign.c<br />

See chapter 3 for a description <strong>of</strong> this program.<br />

/******************************************************************************\<br />

ttalign<br />

- matches two word column files <strong>and</strong> associated tag <strong>and</strong> tone files<br />

AUTHOR:<br />

ttalign.c (c) Copyright November 1991, Simon Arnfield. All Rights Reserved.<br />

SYNOPSIS:<br />

ttalign tag-col-file tag-word-col-file tone-col-file tone-word-col-file<br />

DESCRIPTION:<br />

There are four input files derived from the tagged and prosodic versions

<strong>of</strong> the sec with the follow<strong>in</strong>g csh script:<br />

# PROCESS_TAG_PROS. This script processes the vertical tag <strong>and</strong> prosodic<br />

# files given <strong>in</strong> the <strong>in</strong>put <strong>and</strong> outputs four files pros.[12] vtag.[12]<br />

# ready for match<strong>in</strong>g. Format: process_tag_pros tag-file pros-file<br />

# e.g. process_tag_pros M01 M01.b<br />

# tag file h<strong>and</strong>l<strong>in</strong>g...<br />

awk '$4=="-----" {next} \<br />

$5=="("&&$6=="@" {b=1} \<br />

$5=="[" {b=1} \<br />

{if(b==0) pr<strong>in</strong>t $4,$5} \<br />

$5==")"&&$6=="@" {b=0} \<br />

$5=="]" {b=0}' b=0 \<br />

/usr/export/home/sca/work/corpus/tag/$1 | tee process.vtag \<br />

| cut -f1 -d" " >vtag.1<br />

cat process.vtag| cut -f2 -d" " | tr A-Z a-z >vtag.2<br />

rm process.vtag<br />

# prosody file h<strong>and</strong>l<strong>in</strong>g...<br />

symbolify /usr/export/home/sca/work/corpus/pros/$2 \<br />

| awk '{a=substr($0,1,1)} {if (a!="[") pr<strong>in</strong>t $0}' \<br />

| tr -s ' ' '\012' \<br />

| awk 'length!=0 {pr<strong>in</strong>t $0}' \<br />

| tee pros.1 \<br />

| tr -d '`,/\\~_*@' | tr A-Z a-z >pros.2<br />

This program attempts to match the words <strong>in</strong> the two files<br />

pros.2 <strong>and</strong> vtag.2 <strong>and</strong> when it does so, pr<strong>in</strong>ts out<br />

the associated entry <strong>in</strong> the files pros.1 <strong>and</strong> vtag.1<br />

The result is a list <strong>of</strong> words, the word with its tone marked on it,<br />

<strong>and</strong> the part-<strong>of</strong>-speech that the word has <strong>in</strong> the corpus. These values<br />

may then be used to calculate probabilities <strong>of</strong> co-occurrence <strong>of</strong><br />

tags <strong>and</strong> tones.<br />

Several problems arise <strong>in</strong> this however, for example differences <strong>in</strong><br />

case between words <strong>in</strong> the vertical tag files <strong>and</strong> prosody files.<br />

Words such as "don't" "won't" "it's" called enclitics, are treated as<br />

two words "do + n't" "will + n't" <strong>in</strong> the vtag files but as s<strong>in</strong>gle words<br />

<strong>in</strong> the prosody file, mean<strong>in</strong>g that the vertical output format will have<br />



to have a blank entry for one column. Similar problems occur with compound
nouns, where "mother-in-law" takes only one entry in the vtag file but
may be marked as three lines (if hyphens are omitted) in the prosody file.

In addition it is possible to have a tone unit boundary (| or || or ^) that

does not co-<strong>in</strong>cide with a punctuation symbol <strong>and</strong> vice-versa. These have<br />

to add new blank entries <strong>in</strong> the appropriate columns.<br />

New tags used are enclosed <strong>in</strong> {}.<br />

TAG-TAGs TONE-TAGs WORD-TAGs Usage.<br />

{PN}<br />

no tone unit match<strong>in</strong>g PuNctuation.<br />

{TU}<br />

{TONE-UNIT} no punc match<strong>in</strong>g Tone Unit<br />

{CP} {COMPOUND} l<strong>in</strong>es follow<strong>in</strong>g ComPound-nouns<br />

{EN}<br />

l<strong>in</strong>es follow<strong>in</strong>g ENclitics<br />

Because, <strong>in</strong> some circumstances a bracket or quote will be next to<br />

some punctuation such as a comma or full stop there is a need for<br />

a post-process<strong>in</strong>g phase to re-organise which punc symbol gets matched<br />

to a tone-unit if one occurs. In these cases the bracket or quote is<br />

given the symbol {CTU}. eg:<br />

JJ philo*sophical philosophical<br />

NN position position<br />

*' {PN} '<br />

NN /naturalism naturalism<br />

**' | '<br />

, {PN} ,<br />

CC <strong>and</strong> <strong>and</strong><br />

will become:<br />

JJ philo*sophical philosophical<br />

NN position position<br />

*' {PN} '<br />

NN /naturalism naturalism<br />

**' {CTU} '<br />

, | ,<br />

CC <strong>and</strong> <strong>and</strong><br />

FILES:<br />

~sca/src/sec/symbolify.c<br />

~sca/src/sec/collate-tu.c<br />

~sca/work/corpus/pros/*<br />

~sca/work/corpus/tag/*<br />

These are similar to the orig<strong>in</strong>al files <strong>in</strong> the corpus <strong>in</strong> /bjw /gok /dup<br />

for the prosody, and /vtag for the tag, except that prosody files are

re-organised <strong>in</strong>to one directory, names are changed, errors are corrected.<br />

REFERENCES:<br />

A manual <strong>of</strong> <strong>in</strong>formation to accompany the SEC <strong>Corpus</strong>.<br />

L.J.Taylor, Dr. G.Knowles, 1988, Lancaster University.<br />

BUGS:<br />

Does not detect eof properly, resulting in erroneous lines at the end of
output files.

PROGRAM MODIFICATION HISTORY<br />

Date | By | Vers | Comments<br />

----------+-----+------+--------------------------------------------------------<br />

04/11/91 | ScA | 1.0 | Created orig<strong>in</strong>al code<br />

26/02/92 | ScA | 1.1 | Made h<strong>and</strong>l<strong>in</strong>g <strong>of</strong> enclitics, punctuation, tone units<br />

| | | <strong>and</strong> compound nouns.<br />



02/03/92 | ScA | 1.2 | Several improvements, added h<strong>and</strong>l<strong>in</strong>g <strong>of</strong> ( ) *' **'<br />

06/03/92 | ScA | 1.3 | fixed some bugs <strong>in</strong> <strong>in</strong>put h<strong>and</strong>l<strong>in</strong>g.<br />

06/04/92 | ScA | 1.4 | changed {} to different symbols for diff situations.

29/04/92 | ScA | 1.5 | removed {MW}, {BR}, made process<strong>in</strong>g simpler<br />

| | | by adding post-processing phase

04/05/92 | ScA | 1.6 | tried to fix e<strong>of</strong> bug <strong>and</strong> multiple mismatch l<strong>in</strong>es.<br />

12/07/92 | ScA | 2.0 | added <strong>in</strong>teractive fix<strong>in</strong>g <strong>of</strong> mismatches.<br />

12/08/92 | ScA | 2.1 | f<strong>in</strong>ished <strong>in</strong>teractive fix<strong>in</strong>g rout<strong>in</strong>es.<br />

\******************************************************************************/<br />

/*----------------------------*\<br />

| INCLUDES AND DEFINITIONS |<br />

\*----------------------------*/<br />

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/*-------------------*\<br />

| GLOBAL VARIABLES |<br />

\*-------------------*/<br />

char ntag[20][40], ntone[20][40],<br />

ntag_word[20][40], ntone_word[20][40]<br />

<strong>in</strong>t b1 = 0, b2 = 0, b3 = 0, b4 = 0<br />

/*-------------------------*\<br />

| FUNCTIONS DEFINITIONS |<br />

\*-------------------------*/<br />

void pr<strong>in</strong>t(str1, str2, str3)<br />

char *str1, *str2, *str3<br />

{<br />

pr<strong>in</strong>tf("%-8s%-24s%-24s\n", str1, str2, str3)<br />

fpr<strong>in</strong>tf(stderr, "%-8s%-24s%-24s\n", str1, str2, str3)<br />

}<br />

char *read_tag1(file)<br />

/* provides tag <strong>in</strong>put from the tag files. */<br />

FILE *file<br />

{<br />

char <strong>in</strong>put[40]<br />

<strong>in</strong>t i<br />

if (b1 == 0)<br />

if (fe<strong>of</strong>(file))<br />

strcpy(<strong>in</strong>put, "")<br />

else<br />

fscanf(file, "%s", <strong>in</strong>put)<br />

else {<br />

strcpy(<strong>in</strong>put, ntag[1])<br />

		for (i = 2; i <= b1; i++)	/* shift buffer down; tail of this function reconstructed */
			strcpy(ntag[i - 1], ntag[i]);
		b1--;
	}
	return (input);
}


char *read_tag2(file)<br />

/* provides word <strong>in</strong>put from the tag files. */<br />

FILE *file<br />

{<br />

char <strong>in</strong>put[40]<br />

<strong>in</strong>t i<br />

if (b2 == 0)<br />

if (fe<strong>of</strong>(file))<br />

strcpy(<strong>in</strong>put, "")<br />

else<br />

fscanf(file, "%s", <strong>in</strong>put)<br />

else {<br />

strcpy(<strong>in</strong>put, ntag_word[1])<br />

		for (i = 2; i <= b2; i++)	/* reconstructed as in read_tag1 */
			strcpy(ntag_word[i - 1], ntag_word[i]);
		b2--;
	}
	return (input);
}

/* (the corresponding read_pros1() and read_pros2() functions, and the  */
/* opening of the handle_unmatched() buffer-filling loop, are missing   */
/* from this listing in the source; the do below is reconstructed)      */

	do {
		if ((++bb1) > b1)

fscanf(tag1, "%s", ntag[++b1])<br />

if ((++bb2) > b2)<br />

fscanf(tag2, "%s", ntag_word[++b2])<br />

if ((++bb3) > b3)<br />

fscanf(pros1, "%s", ntone[++b3])<br />

if ((++bb4) > b4)<br />

fscanf(pros2, "%s", ntone_word[++b4])<br />

} while ((b1 < 10) && (b3 < 10) && strcmp(ntag_word[bb2],<br />

tone_word) && strcmp(ntone_word[bb4], tag_word) && !fe<strong>of</strong>(tag1) &&<br />

!fe<strong>of</strong>(pros1))<br />

if ((punc || tu) || (strcmp(ntag_word[b2], tone_word) &&<br />

strcmp(ntone_word[b4], tag_word))) {<br />



/* if tag is a punctuation symbol or if tone is a tone unit */

/* then it is possible that the previous mismatch is part <strong>of</strong> */<br />

/* an enclitic or compound followed by punctuation or a tone */<br />

/* unit the PUNC or TU may then be found to match a TU or PUNC */<br />

/* later on this is where the automatch<strong>in</strong>g gets complicated */<br />

/* so, ask the user for assistance <strong>in</strong>teractively. Note that we */<br />

/* ignore the results of the bit above - as far as we are */

/* concerned here we only want the buffers full so we can give */

/* the user some context. */<br />

/* <strong>in</strong>teractive fixit */<br />

<strong>in</strong>t i, f<strong>in</strong>ished = 0<br />

char *comm<strong>and</strong>[256]<br />

do {<br />

fpr<strong>in</strong>tf(stderr, "FIXITMODE: Tone next(1-TU,2-CP).<br />

Tags next(3-EN,4-PN). 5:Exit\n")<br />

fpr<strong>in</strong>tf(stderr, " 0# %-8s%-24s%-24s\n",<br />

tag, tone, tag_word)<br />

for (i=1,j=1(i


				break

case 3:<br />

pr<strong>in</strong>t(tag, "{EN}", tag_word)<br />

strcpy(tag, ntag[1])<br />

strcpy(tag_word, ntag_word[1])<br />

			for (i = 2; i <= b1; i++)	/* reconstructed */
				strcpy(ntag[i - 1], ntag[i]);

/* (the remaining interactive cases, the end of handle_unmatched(), and */
/* the opening of main() are missing from this listing in the source)   */

void main(argc, argv)
int argc;
char *argv[];
{


char tag[40], tone[40], tag_word[40], pros_word[40]<br />

<strong>in</strong>t punc = 0, tu = 0, new_data = 0<br />

if (argc != 5) {<br />

fpr<strong>in</strong>tf(stderr, "usage: ttalign tag tag-word tone tone-word\n<br />

Output produced by PROCESS_TAG_PROS")<br />

exit(1)<br />

}<br />

if (!(tag1 = fopen(argv[1], "r"))) {<br />

fpr<strong>in</strong>tf(stderr, "Can't open %s\n", argv[1])<br />

exit(1)<br />

}<br />

if (!(tag2 = fopen(argv[2], "r"))) {<br />

fpr<strong>in</strong>tf(stderr, "Can't open %s\n", argv[2])<br />

exit(1)<br />

}<br />

if (!(pros1 = fopen(argv[3], "r"))) {<br />

fpr<strong>in</strong>tf(stderr, "Can't open %s\n", argv[3])<br />

exit(1)<br />

}<br />

if (!(pros2 = fopen(argv[4], "r"))) {<br />

fpr<strong>in</strong>tf(stderr, "Can't open %s\n", argv[4])<br />

exit(1)<br />

}<br />

while (!fe<strong>of</strong>(tag1) && !fe<strong>of</strong>(pros1)) {<br />

if (!new_data) {<br />

strcpy(tag, read_tag1(tag1))<br />

strcpy(tag_word, read_tag2(tag2))<br />

strcpy(tone, read_pros1(pros1))<br />

strcpy(pros_word, read_pros2(pros2))<br />

}<br />

new_data = 0<br />

if (strstr(".,:!*-...*'**'()", tag))<br />

punc = 1<br />

else<br />

punc = 0<br />

if (strstr("||^|", tone))<br />

tu = 1<br />

else<br />

tu = 0<br />

if (!strcmp(pros_word, tag_word) || (punc && tu))<br />

pr<strong>in</strong>t(tag, tone, tag_word)<br />

else {<br />

if (!punc && !tu) /* neither punc or tu <strong>and</strong> don't match */<br />

h<strong>and</strong>le_unmatched(tag, tone, tag_word,<br />

pros_word, tag1, tag2, pros1, pros2)<br />

if (punc && !tu) /* punctuation but no tu boundary */ {<br />

pr<strong>in</strong>t(tag, "{PN}", tag_word)<br />

if (!fe<strong>of</strong>(tag1)) {<br />

strcpy(tag, read_tag1(tag1))<br />

strcpy(tag_word, read_tag2(tag2))<br />

new_data = 1<br />

}<br />

}<br />

if (!punc && tu) /* tu boundary but no punctuation */ {<br />

pr<strong>in</strong>t("{TU}", tone, "{TONE-UNIT}")<br />

if (!fe<strong>of</strong>(pros1)) {<br />



}<br />

}<br />

}<br />

}<br />

strcpy(tone, read_pros1(pros1))<br />

strcpy(pros_word, read_pros2(pros2))<br />

new_data = 1<br />

}<br />

fclose(pros2)<br />

fclose(pros1)<br />

fclose(tag2)<br />

fclose(tag1)<br />

/*-------*\<br />

| END |<br />

\*-------*/<br />

F.3 collate-tu.c<br />

This program is used as a post-processing phase to ttalign and specifically handles cases where there
are multiple punctuation symbols that coincide with a tone unit boundary. As well as a little shuffling
to ensure that the tone unit boundary is aligned with the primary type of punctuation (viz. brackets
and quotes are less likely to give rise to a boundary than, say, a full stop or comma), it marks the other
punctuation symbols as matching a boundary, whereas ttalign would have left the punctuation marked as
punctuation that does not match a boundary, which would be incorrect.

/******************************************************************************\<br />

collate-tu - collects together punctuation under one TU, where appropriate<br />

AUTHOR:<br />

collate-tu.c (c) Copyright May 1992, Simon Arnfield. All Rights Reserved.<br />

SYNOPSIS:<br />

collate-tu <br />

DESCRIPTION:<br />

Because, <strong>in</strong> some circumstances a bracket or quote will be next to<br />

some punctuation such as a comma or full stop there is a need for<br />

a post-process<strong>in</strong>g phase to re-organise which punc symbol gets matched<br />

to a tone-unit if one occurs. In these cases the bracket or quote is<br />

given the symbol {CTU}. eg:<br />

JJ philo*sophical philosophical<br />

NN position position<br />

*' {PN} '<br />

NN /naturalism naturalism<br />

**' | '<br />

, {PN} ,<br />

CC <strong>and</strong> <strong>and</strong><br />

will become:<br />

JJ philo*sophical philosophical<br />

NN position position<br />

*' {PN} '<br />

NN /naturalism naturalism<br />



**' {CTU} '<br />

, | ,<br />

CC <strong>and</strong> <strong>and</strong><br />

PROGRAM MODIFICATION HISTORY<br />

Date | By | Vers | Comments<br />

----------+-----+------+--------------------------------------------------------<br />

04/05/92 | ScA | 1.0 | Created orig<strong>in</strong>al code<br />

\******************************************************************************/<br />

/*----------------------------*\<br />

| INCLUDES AND DEFINITIONS |<br />

\*----------------------------*/<br />

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/*-------------------------*\<br />

| FUNCTIONS DEFINITIONS |<br />

\*-------------------------*/<br />

void ma<strong>in</strong>(argc, argv)<br />

<strong>in</strong>t argc<br />

char *argv[]<br />

{<br />

FILE * file<br />

char pwrd1[8], pwrd2[24], pwrd3[24]<br />

char wrd1[8], wrd2[24], wrd3[24]<br />

<strong>in</strong>t p_is_tu = 0, is_pn = 0, flag = 0<br />

if (argc != 2) {<br />

pr<strong>in</strong>tf("usage: collate-tu \n")<br />

exit(1)<br />

}<br />

if (!(file = fopen(argv[1], "r"))) {<br />

pr<strong>in</strong>tf("Can't open %s\n", argv[1])<br />

exit(1)<br />

}<br />

fscanf(file, "%s", pwrd1)<br />

fscanf(file, "%s", pwrd2)<br />

fscanf(file, "%s", pwrd3)<br />

while (!fe<strong>of</strong>(file)) {<br />

fscanf(file, "%s", wrd1)<br />

fscanf(file, "%s", wrd2)<br />

fscanf(file, "%s", wrd3)<br />

if (strstr("|^||", pwrd2))<br />

p_is_tu = 1<br />

else<br />

p_is_tu = 0<br />

if (!strcmp("{PN}", wrd2))<br />

is_pn = 1<br />

else<br />

is_pn = 0<br />



flag = 0<br />

if ((!strcmp("'", pwrd1) || !strcmp(")", pwrd1)) &&<br />

p_is_tu && is_pn) {<br />

pr<strong>in</strong>tf("%-8s%-24s%-24s\n", pwrd1, "{CTU}",<br />

pwrd3)<br />

pr<strong>in</strong>tf("%-8s%-24s%-24s\n", wrd1, pwrd2,<br />

wrd3)<br />

fscanf(file, "%s", pwrd1)<br />

fscanf(file, "%s", pwrd2)<br />

fscanf(file, "%s", pwrd3)<br />

flag = 1<br />

}<br />

if (!flag && p_is_tu && is_pn) {<br />

pr<strong>in</strong>tf("%-8s%-24s%-24s\n", pwrd1, pwrd2,<br />

pwrd3)<br />

pr<strong>in</strong>tf("%-8s%-24s%-24s\n", wrd1, "{CTU}",<br />

wrd3)<br />

fscanf(file, "%s", pwrd1)<br />

fscanf(file, "%s", pwrd2)<br />

fscanf(file, "%s", pwrd3)<br />

while (!strcmp("{PN}", pwrd2)) {<br />

pr<strong>in</strong>tf("%-8s%-24s%-24s\n", pwrd1,<br />

"{CTU}", pwrd3)<br />

fscanf(file, "%s", pwrd1)<br />

fscanf(file, "%s", pwrd2)<br />

fscanf(file, "%s", pwrd3)<br />

}<br />

flag = 1<br />

}<br />

}<br />

if (!flag) {<br />

pr<strong>in</strong>tf("%-8s%-24s%-24s\n", pwrd1, pwrd2,<br />

pwrd3)<br />

strcpy(pwrd1, wrd1)<br />

strcpy(pwrd2, wrd2)<br />

strcpy(pwrd3, wrd3)<br />

}<br />

}<br />

fclose(file)<br />

/*-------*\<br />

| END |<br />

\*-------*/<br />

F.4 align-parse.c<br />

align-parse takes the output from ttalign and the parse tree file from which tags were taken, and inserts
the phrase brackets at the appropriate place in the align file. That is, it produces a file with the phrase
brackets and tags aligned with the prosodic words and tone unit boundaries. This somewhat trivial task is
complicated by the need to check for tone unit boundaries, or the absence of one, and to insert lines appropriately.

/******************************************************************************\<br />



align-parse - aligns sec parsetree with output from ttalign<br />

AUTHOR:<br />

align-parse (c) Copyright November 1991, Simon Arnfield. All Rights Reserved.<br />

SYNOPSIS:<br />

align-parse<br />

parse-file tag-tone-file<br />

DESCRIPTION:<br />

Aligns a sec parsetree with the output produced by ttalign. That is it<br />

produces a file conta<strong>in</strong><strong>in</strong>g phrase brackets <strong>and</strong> words aligned with tone-unit<br />

boundaries <strong>and</strong> words <strong>in</strong> the output from the tag-tone alignment.<br />

IDIOSYNCRASIES:<br />

FILES:<br />

REFERENCES:<br />

A manual <strong>of</strong> <strong>in</strong>formation to accompany the SEC <strong>Corpus</strong>.<br />

L.J.Taylor, Dr. G.Knowles, 1988, Lancaster University.<br />

BUGS:<br />

PROGRAM MODIFICATION HISTORY<br />

Date | By | Vers | Comments<br />

----------+-----+------+--------------------------------------------------------<br />

07/03/92 | ScA | 1.0 | Created orig<strong>in</strong>al code<br />

06/08/92 | ScA | 1.1 | changed because ttalign is now used with treebank<br />

17/08/92 | ScA | 1.2 | added recognition <strong>of</strong> compounds as these will skew<br />

| output <strong>in</strong> same way as TU would, if l<strong>in</strong>es are matched.<br />

| | | data. Therefore word-tags are same.<br />

\******************************************************************************/<br />

/*----------------------------*\<br />

| INCLUDES AND DEFINITIONS |<br />

\*----------------------------*/<br />

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/*-------------------------*\<br />

| FUNCTIONS DEFINITIONS |<br />

\*-------------------------*/<br />

void ma<strong>in</strong>(argc, argv)<br />

<strong>in</strong>t argc<br />

char *argv[]<br />

{<br />

FILE * tag1, *tag2<br />

char <strong>in</strong>put1[80], <strong>in</strong>put1b[40], <strong>in</strong>2a[32], <strong>in</strong>2b[32], <strong>in</strong>2c[32]<br />

if (argc != 3) {<br />

pr<strong>in</strong>tf("usage: align-parse parse-file tag-tone-file\n")<br />

exit(1)<br />

}<br />



if (!(tag1 = fopen(argv[1], "r"))) {<br />

pr<strong>in</strong>tf("Can't open %s\n", argv[1])<br />

exit(1)<br />

}<br />

if (!(tag2 = fopen(argv[2], "r"))) {<br />

pr<strong>in</strong>tf("Can't open %s\n", argv[2])<br />

exit(1)<br />

}<br />

fscanf(tag1, "%s", <strong>in</strong>put1)<br />

fscanf(tag2, "%s", <strong>in</strong>2a)<br />

fscanf(tag2, "%s", <strong>in</strong>2b)<br />

fscanf(tag2, "%s", <strong>in</strong>2c)<br />

while (!fe<strong>of</strong>(tag1) && !fe<strong>of</strong>(tag2)) {<br />

/* if <strong>in</strong>2a = {TU} then must <strong>in</strong>sert blank l<strong>in</strong>e <strong>in</strong> col 1 */<br />

if (!strcmp(<strong>in</strong>2a, "{TU}")) {<br />

pr<strong>in</strong>tf("%-40s%-6s%-16s%-18s\n", "", <strong>in</strong>2a,<br />

<strong>in</strong>2c, <strong>in</strong>2b)<br />

fscanf(tag2, "%s", <strong>in</strong>2a)<br />

fscanf(tag2, "%s", <strong>in</strong>2b)<br />

fscanf(tag2, "%s", <strong>in</strong>2c)<br />

}<br />

/* if <strong>in</strong>2a = {CP} treat as {TU} */<br />

/* BUT can have sequences <strong>of</strong> more than one {CP} */<br />

while (!strcmp(<strong>in</strong>2a, "{CP}")) {<br />

pr<strong>in</strong>tf("%-40s%-6s%-16s%-18s\n", "", <strong>in</strong>2a,<br />

<strong>in</strong>2c, <strong>in</strong>2b)<br />

fscanf(tag2, "%s", <strong>in</strong>2a)<br />

fscanf(tag2, "%s", <strong>in</strong>2b)<br />

fscanf(tag2, "%s", <strong>in</strong>2c)<br />

}<br />

		/* if input1 has [ or ] in it then add next line, */
		/* otherwise align with input2                    */

if (strstr(<strong>in</strong>put1, "[") || strstr(<strong>in</strong>put1, "]")) {<br />

fscanf(tag1, "%s", <strong>in</strong>put1b)<br />

if (strstr(<strong>in</strong>put1b, "[") || strstr(<strong>in</strong>put1b,<br />

"]")) {<br />

strcat(<strong>in</strong>put1, <strong>in</strong>put1b)<br />

fscanf(tag1, "%s", <strong>in</strong>put1b)<br />

} else<br />

pr<strong>in</strong>tf("%-40s%-6s%-16s%-18s\n",<br />

<strong>in</strong>put1, <strong>in</strong>2a, <strong>in</strong>2c, <strong>in</strong>2b)<br />

} else<br />

pr<strong>in</strong>tf("%-40s%-6s%-16s%-18s\n", "", <strong>in</strong>2a,<br />

<strong>in</strong>2c, <strong>in</strong>2b)<br />

}<br />

fscanf(tag1, "%s", <strong>in</strong>put1)<br />

fscanf(tag2, "%s", <strong>in</strong>2a)<br />

fscanf(tag2, "%s", <strong>in</strong>2b)<br />

fscanf(tag2, "%s", <strong>in</strong>2c)<br />

}<br />

fclose(tag2)<br />

fclose(tag1)<br />



/*-------*\

| END |<br />

\*-------*/<br />

F.5 splittufile.c

This program provided a front end to probabilityc by splitting the align files produced by ttalign into many
smaller files and generating a C-shell script to execute probabilityc on these files. This is used for testing
probabilityc and for producing results. It is not intended as a real front end to the model, only a front
end to allow testing. Splitting the input into lots of files is very dirty programming, but it allows recovery of
a run in case the machine goes down whilst the lengthy calculations take place. A nicer front end should
be written for use with the model for synthesis purposes.

/* splittufile.c Copyright Simon Arnfield August 1993 */
/* this program converts align files to many files suitable for use with    */
/* probabilityc.c; it also generates (on stdout) a script to process these. */
/* the align files must first have been processed with the following script */
/* cat ~/work/corpus/align/a01.align|grep -v "{TU}"| grep -v "{CTU}"| \     */
/* grep -v '^$ {EN}' |grep -v "XXXX"|splittufile|csh >a01.results           */
/* we break the input either at punctuation or when the file is MAXWRD long */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define MAXWRD 12

main()
{
	char bufw[MAXWRD][99], buft[MAXWRD][99], filename[9], word[99],
	     tag[99], tmp[99];
	int pos = 0, i, filenum = 1;
	FILE *file;

	sprintf(filename, "file%d", filenum);
	while (!feof(stdin)) {
		scanf("%s %s %*s", tag, word);
		/* merge adjacent punctuation symbols into a single entry */
		if (pos >= 1 && strstr("!(),-.:...'", tag) &&
		    strstr("!(),-.:...'", buft[pos - 1])) {
			pos--;
			strcpy(tmp, buft[pos]);
			strcat(tmp, tag);
			strcpy(tag, tmp);
			strcpy(tmp, bufw[pos]);
			strcat(tmp, word);
			strcpy(word, tmp);
		}
		strcpy(buft[pos], tag);
		strcpy(bufw[pos], word);
		pos++;
		if ((pos == MAXWRD || strstr("!(),-.:...'", tag)) && pos > 3) {
			if ((file = fopen(filename, "w")) == 0) {
				fprintf(stderr, "couldn't open file %s\n", filename);
				exit(1);
			}
			fprintf(file, "%d\n", pos);
			for (i = 0; i < pos; i++)
				fprintf(file, "%s %s\n", buft[i], bufw[i]);
			fclose(file);
			/* emit one script line per chunk; this line and the rest  */
			/* of the listing were truncated in the source, so the     */
			/* command and the reset below are reconstructed.          */
			printf("probabilityc <%s\n", filename);
			pos = 0;
			sprintf(filename, "file%d", ++filenum);
		}
	}
}

F.6 transitions.c

This program reads the aligned tag/prosody stream produced by ttalign and counts, for each ordered
pair of adjacent word-class tags, the frequency of each combination of stressed and unstressed states.
The four resulting 187 by 187 count tables, written to standard output, form the transitions.table
taken as input by transgroups (section F.7).

/*----------------------------*\


| INCLUDES AND DEFINITIONS |<br />

\*----------------------------*/<br />

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/*-------------------*\<br />

| GLOBAL VARIABLES |<br />

\*-------------------*/<br />

char tags[187][7] = {<br />

"&FO", "&FW", "APP$", "AT", "AT1", "BTO", "BTO21", "BTO22",<br />

"CC", "CC31", "CC32", "CC33", "CCB", "CF", "CS", "CS21",<br />

"CS22", "CSA", "CSN", "CST", "CSW", "DA", "DA1", "DA2",<br />

"DA2R", "DAR", "DAT", "DB", "DB2", "DD", "DD1", "DD121",<br />

"DD122", "DD2", "DD21", "DD22", "DD221", "DD222", "DDQ",<br />

"DDQ$", "DDQV", "EX", "ICS", "IF", "II", "II21", "II22",<br />

"II31", "II32", "II33", "IO", "IW", "JA", "JB", "JBR",<br />

"JJ", "JJR", "JJT", "LE", "MC", "MC-MC", "MC1", "MC2",<br />

"MD", "MF", "ND1", "NN", "NN1", "NN121", "NN122", "NN2",<br />

"NNJ", "NNJ1", "NNJ2", "NNL1", "NNL2", "NNO", "NNO2",<br />

"NNS", "NNS1", "NNS2", "NNSA1", "NNSB1", "NNT1", "NNT2",<br />

"NNU", "NNU1", "NNU2", "NNU21", "NNU22", "NP", "NP1",<br />

"NP2", "NPD1", "NPM1", "PN", "PN1", "PN121", "PN122",<br />

"PNQO", "PNQS", "PP$", "PPH1", "PPHO1", "PPHO2", "PPHS1",<br />

"PPHS2", "PPIO1", "PPIO2", "PPIS1", "PPIS2", "PPX1",<br />

"PPX121", "PPX122", "PPX2", "PPX221", "PPX222", "PPY",<br />

"RA", "REX", "REX21", "REX22", "RG", "RG21", "RG22",<br />

"RGA", "RGQ", "RGQV", "RGR", "RGT", "RL", "RL21", "RL22",<br />

"RP", "RR", "RR21", "RR22", "RR31", "RR32", "RR33", "RRQ",<br />

"RRQV", "RRR", "RRT", "RT", "TO", "UH", "VB0", "VBDR",<br />

"VBDZ", "VBG", "VBM", "VBN", "VBR", "VBZ", "VD0", "VDD",<br />

"VDG", "VDN", "VDZ", "VH0", "VHD", "VHG", "VHN", "VHZ",<br />

"VM", "VM21", "VM22", "VMK", "VV0", "VVD", "VVG", "VVN",<br />

"VVZ", "XX", "ZZ1", "{CP}", ".", ":", "", "!", "",<br />

",", "-", "(", ")", "'"}<br />

<strong>in</strong>t trUU[190][190], trUS[190][190], trSU[190][190], trSS[190][190]<br />

/*-------------------------*\<br />

| FUNCTIONS DEFINITIONS |<br />

\*-------------------------*/<br />

void main()
{
	char tag[7], prosody[40], word[40];
	int i, j, t1 = -1, t2 = -1, s1, s2;

	for (i = 0; i < 190; i++) {		/* initialise the full 190x190 arrays */
		for (j = 0; j < 190; j++) {
			trUU[i][j] = 0;
			trUS[i][j] = 0;
			trSU[i][j] = 0;
			trSS[i][j] = 0;
		}
	}

	/* assume first tag is ok - we don't check if it IS 0-187 */
	scanf("%s %s %s", tag, prosody, word);
	for (i = 0; i < 187; i++)
		if (!strcmp(tag, tags[i]))
			t1 = i;
	for (s1 = 0, j = 0; j < strlen(prosody); j++)
		if (prosody[j] == '*' || prosody[j] == ',' || prosody[j] == '/' ||
		    prosody[j] == '\\' || prosody[j] == '`' ||
		    prosody[j] == '~' || prosody[j] == '_') {
			s1 = 1;
			break;
		}

	while (!feof(stdin)) {
		scanf("%s %s %s", tag, prosody, word);
		for (t2 = -1, i = 0; i < 187; i++)
			if (!strcmp(tags[i], tag))
				t2 = i;
		if (t2 == -1) {
			printf("Error: %s\n", tag);
			scanf("%s %s %s", tag, prosody, word);
			for (t2 = -1, i = 0; i < 187; i++)
				if (!strcmp(tags[i], tag))
					t2 = i;
		}
		for (s2 = 0, j = 0; j < strlen(prosody); j++)
			if (prosody[j] == '*' || prosody[j] == ',' ||
			    prosody[j] == '/' || prosody[j] == '\\' ||
			    prosody[j] == '`' || prosody[j] == '~' ||
			    prosody[j] == '_')
				s2 = 1;
		if (s1 == 0 && s2 == 0)
			trUU[t1][t2]++;
		if (s1 == 0 && s2 == 1)
			trUS[t1][t2]++;
		if (s1 == 1 && s2 == 0)
			trSU[t1][t2]++;
		if (s1 == 1 && s2 == 1)
			trSS[t1][t2]++;
		t1 = t2;
		s1 = s2;
	}
	for (i = 0; i < 187; i++) {
		for (j = 0; j < 187; j++)
			printf("%d ", trUU[i][j]);
		printf("\n");
	}
	for (i = 0; i < 187; i++) {
		for (j = 0; j < 187; j++)
			printf("%d ", trUS[i][j]);
		printf("\n");
	}
	for (i = 0; i < 187; i++) {
		for (j = 0; j < 187; j++)
			printf("%d ", trSU[i][j]);
		printf("\n");
	}
	for (i = 0; i < 187; i++) {
		for (j = 0; j < 187; j++)
			printf("%d ", trSS[i][j]);
		printf("\n");
	}
}

/*-------*\<br />

| END |<br />

\*-------*/<br />

F.7 transgroups.c<br />

transgroups has hard-wired into it the group definitions (i.e. which tags belong to which groups). It takes
as input the transitions.table produced by the transitions program. It uses the group definitions and the
transitions.table to produce the group-to-group transition probabilities: that is, the probability that a tag
in group G1 will have stress state S1 and be followed by a tag in group G2 which has stress state S2. S1
and S2, in this case, are either stressed or unstressed. The resulting probabilities are used in probabilityc.
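Written out, for groups G1 and G2 and stress states S1, S2 in {U, S}, the probability computed
below is

    P(S1, S2 | G1, G2) = f(G1,S1; G2,S2) /
        [ f(G1,U; G2,U) + f(G1,U; G2,S) + f(G1,S; G2,U) + f(G1,S; G2,S) ]

where f(G1,S1; G2,S2) is the total number of times any tag in G1 carrying state S1 is immediately
followed by any tag in G2 carrying state S2, summed over all tag pairs in the two groups. This
restates the normalisation by T performed at the end of the listing.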

/******************************************************************************\<br />

transgroups - adds up values <strong>in</strong> transitions.table for given group transitions<br />

AUTHOR:<br />

transgroups.c (c) Copyright May 1993, Simon Arnfield. All Rights Reserved.<br />

SYNOPSIS:<br />

 transgroups < transitions.table
\******************************************************************************/

/*----------------------------*\

| INCLUDES AND DEFINITIONS |<br />

\*----------------------------*/<br />

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#def<strong>in</strong>e NUMGROUPS 10<br />

/*-------------------------*\<br />

| FUNCTIONS DEFINITIONS |<br />

\*-------------------------*/<br />

<strong>in</strong>t whichgroup(tg) /* returns group number for tag number tg */<br />

<strong>in</strong>t tg<br />

{<br />

switch (tg) {<br />

case 177: case 178: case 179: case 180: case 181:<br />

case 182: case 183: case 184: case 185: case 186:<br />

return(0)<br />

case 1: case 2: case 8: case 9: case 10:<br />

case 11: case 44: case 51: case 95: case 96:<br />

case 97: case 98: case 101: case 103: case 105:<br />

case 146: case 148: case 150: case 151: case 154:<br />

case 155: case 156: case 157: case 158: case 159:<br />

case 161: case 162: case 163: case 164:<br />

return(1)<br />

case 41: case 43: case 99: case 100: case 102:<br />

case 107: case 108: case 109: case 110: case 147:<br />

case 149: case 152:<br />

return(2)<br />

case 3: case 4: case 5: case 6: case 7:<br />

case 18: case 19: case 46: case 47: case 49:<br />

case 50: case 117: case 145:<br />

return(3)<br />

case 12: case 13: case 14: case 15: case 16:<br />

case 17: case 20: case 38: case 39: case 40:<br />

case 42: case 58: case 81: case 82: case 104:<br />

case 106: case 153: case 160: case 165: case 166:<br />

case 167: case 168:<br />

return(4)<br />

case 21: case 22: case 23: case 24: case 25:<br />

case 26: case 27: case 28: case 52: case 53:<br />

case 54: case 55: case 56: case 57: case 66:<br />

case 90: case 91: case 92: case 118: case 130:<br />

case 131: case 132: case 134: case 135: case 136:<br />

case 137: case 138: case 139: case 140: case 141:<br />

case 142: case 143: case 144: case 169: case 172:<br />

case 175:<br />

return(5)<br />



case 0: case 29: case 59: case 60: case 61:<br />

case 62: case 63: case 174:<br />

return(6)<br />

case 78: case 79: case 80: case 170: case 171:<br />

case 173:<br />

return(7)<br />

case 65: case 67: case 68: case 69: case 70:<br />

case 71: case 72: case 73: case 74: case 75:<br />

case 83: case 84: case 111: case 112: case 113:<br />

case 114: case 115: case 116: case 176:<br />

return(8)<br />

case 30: case 31: case 32: case 33: case 34:<br />

case 35: case 36: case 37: case 45: case 48:<br />

case 64: case 76: case 77: case 85: case 86:<br />

case 87: case 88: case 89: case 93: case 94:<br />

case 119: case 120: case 121: case 122: case 123:<br />

case 124: case 125: case 126: case 127: case 128:<br />

case 129: case 133:<br />

return(9)<br />

}<br />

}<br />

void ma<strong>in</strong>()<br />

{<br />

<strong>in</strong>t i, j, val, g1, g2,<br />

gpUU[NUMGROUPS][NUMGROUPS], gpUS[NUMGROUPS][NUMGROUPS],<br />

gpSU[NUMGROUPS][NUMGROUPS], gpSS[NUMGROUPS][NUMGROUPS]<br />

float T = 0<br />

for (i = 0 i < NUMGROUPS i++) {<br />

for (j = 0 j < NUMGROUPS j++) {<br />

gpUU[i][j] = 0<br />

gpUS[i][j] = 0<br />

gpSU[i][j] = 0<br />

gpSS[i][j] = 0<br />

}<br />

}<br />

for (i = 0 i < 187 i++)<br />

for (j = 0 j < 187 j++) {<br />

scanf("%d", &val)<br />

g1 = whichgroup(i)<br />

g2 = whichgroup(j)<br />

gpUU[g1][g2] +=val<br />

}<br />

for (i = 0 i < 187 i++)<br />

for (j = 0 j < 187 j++) {<br />

scanf("%d", &val)<br />

g1 = whichgroup(i)<br />

g2 = whichgroup(j)<br />

gpUS[g1][g2] +=val<br />



}<br />

for (i = 0 i < 187 i++)<br />

for (j = 0 j < 187 j++) {<br />

scanf("%d", &val)<br />

g1 = whichgroup(i)<br />

g2 = whichgroup(j)<br />

gpSU[g1][g2] +=val<br />

}<br />

for (i = 0 i < 187 i++)<br />

for (j = 0 j < 187 j++) {<br />

scanf("%d", &val)<br />

g1 = whichgroup(i)<br />

g2 = whichgroup(j)<br />

gpSS[g1][g2] +=val<br />

}<br />

for (i = 0 i < NUMGROUPS i++) {<br />

for (j = 0 j < NUMGROUPS j++) {<br />

T = gpUU[i][j] + gpUS[i][j] + gpSU[i][j]<br />

+ gpSS[i][j]<br />

pr<strong>in</strong>tf("%5.4f,", (float)gpUU[i][j] / T)<br />

}<br />

pr<strong>in</strong>tf("\n")<br />

}<br />

for (i = 0 i < NUMGROUPS i++) {<br />

for (j = 0 j < NUMGROUPS j++) {<br />

T = gpUU[i][j] + gpUS[i][j] + gpSU[i][j]<br />

+ gpSS[i][j]<br />

pr<strong>in</strong>tf("%5.4f,", (float)gpUS[i][j] / T)<br />

}<br />

pr<strong>in</strong>tf("\n")<br />

}<br />

for (i = 0 i < NUMGROUPS i++) {<br />

for (j = 0 j < NUMGROUPS j++) {<br />

T = gpUU[i][j] + gpUS[i][j] + gpSU[i][j]<br />

+ gpSS[i][j]<br />

pr<strong>in</strong>tf("%5.4f,", (float)gpSU[i][j] / T)<br />

}<br />

pr<strong>in</strong>tf("\n")<br />

}<br />

for (i = 0 i < NUMGROUPS i++) {<br />

for (j = 0 j < NUMGROUPS j++) {<br />

T = gpUU[i][j] + gpUS[i][j] + gpSU[i][j]<br />

+ gpSS[i][j]<br />

pr<strong>in</strong>tf("%5.4f,", (float)gpSS[i][j] / T)<br />

}<br />

pr<strong>in</strong>tf("\n")<br />

}<br />

}<br />

/*-------*\<br />

| END |<br />

\*-------*/<br />



F.8 segment.c<br />

This program reads standard input and segments the text input file into tone units using three tone unit
boundary markers. Initially this is based upon punctuation; further research should lead to a more
refined segmentation algorithm that makes use of phrase boundaries and rules. As this program stands,
punctuation is mapped into tone unit boundary symbols, with the exception of quotes, which are mapped
into nothing.
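As an illustration (the input sentence is invented, not corpus text), feeding the line

    Hello, world. Goodbye!

through segment yields, up to spacing,

    Hello | world || Goodbye ||

since the comma is classed as a minor boundary, the full stop and exclamation mark as major
boundaries, and the punctuation characters themselves are not echoed.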

/******************************************************************************\<br />

segment.c - segment text file <strong>in</strong>to tone-units.<br />

AUTHOR:<br />

(c) Copyright July 1992, Simon Arnfield. All Rights Reserved.<br />

SYNOPSIS:<br />

segment<br />

DESCRIPTION:<br />

reads st<strong>and</strong>ard <strong>in</strong>put <strong>and</strong> segments the text <strong>in</strong>put file <strong>in</strong>to tone-units<br />

us<strong>in</strong>g three tone-unit boundary markers. Initially this is based upon<br />

 punctuation. Punctuation is classed either as a minor TU boundary,
 a major TU boundary or nothing. Where appropriate multiple punctuation
 exists one TU is produced. Precedences: || first, then ^, then |, then none.

Distributions are taken from frequency <strong>of</strong> co-occurrence as given by<br />

results from ttalign.<br />

none open/close quotes<br />

m<strong>in</strong>or , ( ) hyphen<br />

major . ! :<br />

FILES:<br />

REFERENCES:<br />

BUGS:<br />

deletes apostrophes because it can't tell them from open-quotes.

PROGRAM MODIFICATION HISTORY<br />

Date | By | Vers | Comments<br />

----------+-----+------+--------------------------------------------------------<br />

3/07/92 | ScA | 1.0 | Created orig<strong>in</strong>al code<br />

19/07/92 | ScA | | f<strong>in</strong>ished version 1.0<br />

| | |<br />

\******************************************************************************/<br />

/*----------------------------*\<br />

| INCLUDES AND DEFINITIONS |<br />

\*----------------------------*/<br />

#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>

#def<strong>in</strong>e NOTALPHA 0<br />



#def<strong>in</strong>e ALPHA 1<br />

#def<strong>in</strong>e NOTU 0<br />

#def<strong>in</strong>e MINORTU 1<br />

#def<strong>in</strong>e MAJORTU 2<br />

/*-------------------------*\<br />

| FUNCTIONS DEFINITIONS |<br />

\*-------------------------*/<br />

void ma<strong>in</strong>()<br />

{<br />

	int tu = NOTU, ch = NOTALPHA;
	char c, lc = '\0';

	while (!feof(stdin)) {

c = getchar()<br />

if (isalnum(c) || isspace(c))<br />

ch = ALPHA<br />

else<br />

ch = NOTALPHA<br />

if (c == '\'' && isalnum(lc))<br />

ch = ALPHA<br />

if (strstr(",()-", &c) && tu == NOTU)<br />

tu = MINORTU<br />

if (strstr(".!:", &c))<br />

tu = MAJORTU<br />

if (ch == ALPHA) {<br />

if (tu == MINORTU)<br />

pr<strong>in</strong>tf("| ")<br />

if (tu == MAJORTU)<br />

pr<strong>in</strong>tf("|| ")<br />

tu = NOTU<br />

putchar(c)<br />

lc = c<br />

}<br />

	}
}

/*-------*\<br />

| END |<br />

\*-------*/<br />

F.9 probability.c<br />

Probability is the original stress prediction model described in chapter 5. It makes no use of
word-class/prosodic-mark bigram frequencies, but uses prosodic-mark bigram frequencies.
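To make the use of these figures concrete, the following fragment is an illustrative sketch written
for this appendix; it is not taken from probability.c, and the function name, the 2x2 bigram table
and the reading of columns 0 and 2 of the probs array are assumptions. It scores one candidate
bi-state stress assignment as the product of per-word-class stress probabilities and stress-state
bigram probabilities:

/* Illustrative sketch only, not part of the thesis software: score one    */
/* candidate stress assignment s[0..n-1] (0 = unstressed, 1 = stressed)    */
/* for the word-class sequence t[0..n-1].  probs[tag][0] and probs[tag][2] */
/* are read as P(unstressed | tag) and P(stressed | tag), as in the table  */
/* below; bigram[a][b] is a hypothetical 2x2 table of prosodic-mark        */
/* (stress-state) transition probabilities.                                */
double score_assignment(int *t, int *s, int n,
                        float probs[][3], float bigram[2][2])
{
	int i;
	double p = probs[t[0]][s[0] ? 2 : 0];

	for (i = 1; i < n; i++)
		p *= probs[t[i]][s[i] ? 2 : 0] * bigram[s[i - 1]][s[i]];
	return p;
}

A model of this shape evaluates such a score for each candidate assignment and keeps the best; the
NUMBEST constant in the listing below suggests that only the single highest-scoring assignment is
retained, and chapter 5 describes how the actual model organises the search.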

/* probability.c copyright Simon Arnfield 15th January 1993 */<br />

/* compile with lc -Lm -DBISTATE probability.c for bi-stress model */<br />

/* compile with lc -Lm -DTRISTATE probability.c for tri-stress model */<br />

#include <stdio.h>
#include <string.h>
#include <math.h>

#def<strong>in</strong>e MAXWORDS 20<br />

#def<strong>in</strong>e NUMTAGS 168<br />



#def<strong>in</strong>e NUMBEST 1<br />

char tags[NUMTAGS][7] ={<br />

"&FO", "&FW", "APP$", "AT", "AT1", "BTO21",<br />

"BTO22","CC", "CC31", "CC32", "CC33", "CCB",<br />

"CF", "CS", "CS21", "CS22", "CSA", "CSN",<br />

"CST", "CSW", "DA", "DA1", "DA2", "DAR",<br />

"DAT", "DB", "DB2", "DD", "DD1", "DD121",<br />

"DD122", "DD2","DD21", "DD22", "DD221","DD222",<br />

"DDQ", "DDQ$","DDQV", "EX", "ICS", "IF", "II",<br />

"II21", "II22","II31", "II32", "II33", "IO",<br />

"IW", "JA", "JB", "JBR", "JJ", "JJR",<br />

"JJT", "LE", "MC", "MC-MC", "MC1", "MC2",<br />

"MD", "MF", "ND1", "NN", "NN1", "NN121",<br />

"NN122","NN2", "NNJ", "NNJ1", "NNJ2", "NNL1",<br />

"NNL2", "NNO", "NNO2", "NNS", "NNS1", "NNS2",<br />

"NNSA1","NNSB1", "NNT1", "NNT2","NNU", "NNU1",<br />

"NNU2", "NNU21", "NNU22","NP", "NP1", "NP2",<br />

"NPD1", "NPM1","PN", "PN1", "PN121","PN122",<br />

"PNQO", "PNQS","PP$", "PPH1", "PPHO1","PPHO2",<br />

"PPHS1","PPHS2","PPIO1","PPIO2","PPIS1","PPIS2",<br />

"PPX1", "PPX121","PPX122","PPX2","PPY","RA",<br />

"REX21","REX22","RG", "RG21", "RG22", "RGA",<br />

"RGQ", "RGQV", "RGR", "RGT", "RL", "RL21",<br />

"RL22", "RP", "RR", "RR21", "RR22", "RR31",<br />

"RR32", "RR33", "RRQ", "RRQV", "RRR", "RRT",<br />

"RT", "TO", "UH", "VB0", "VBDR", "VBDZ",<br />

"VBG", "VBM", "VBN", "VBR", "VBZ", "VD0",<br />

"VDD", "VDG", "VDN", "VDZ", "VH0", "VHD",<br />

"VHG", "VHN", "VHZ", "VM", "VM21", "VV0",<br />

"VVD", "VVG", "VVN", "VVZ", "XX"}<br />

/**** probabilities for bi-stress model ie Q A ****/<br />

float probs[NUMTAGS][3] = {<br />

0.0000, 0.00, 1.0000,<br />

0.2000, 0.00, 0.8000,<br />

0.8991, 0.00, 0.1009,<br />

0.9605, 0.00, 0.0395,<br />

0.9749, 0.00, 0.0250,<br />

0.5000, 0.00, 0.5000,<br />

0.7500, 0.00, 0.2500,<br />

0.8712, 0.00, 0.1288,<br />

1.0000, 0.00, 0.0000,<br />

0.0000, 0.00, 1.0000,<br />

1.0000, 0.00, 0.0000,<br />

0.8284, 0.00, 0.1716,<br />

0.5588, 0.00, 0.4412,<br />

0.5823, 0.00, 0.4177,<br />

0.2778, 0.00, 0.7222,<br />

0.7778, 0.00, 0.2223,<br />

0.7742, 0.00, 0.2258,<br />

1.0000, 0.00, 0.0000,<br />

0.9701, 0.00, 0.0299,<br />

0.0714, 0.00, 0.9286,<br />

0.3830, 0.00, 0.6170,<br />

0.2308, 0.00, 0.7692,<br />

0.2381, 0.00, 0.7619,<br />

143


0.2619, 0.00, 0.7381,<br />

0.0556, 0.00, 0.9445,<br />

0.0889, 0.00, 0.9112,<br />

0.1333, 0.00, 0.8666,<br />

0.2371, 0.00, 0.7629,<br />

0.3965, 0.00, 0.6035,<br />

1.0000, 0.00, 0.0000,<br />

0.0000, 0.00, 1.0000,<br />

0.3867, 0.00, 0.6133,<br />

1.0000, 0.00, 0.0000,<br />

0.5000, 0.00, 0.5000,<br />

1.0000, 0.00, 0.0000,<br />

0.2500, 0.00, 0.7500,<br />

0.6584, 0.00, 0.3416,<br />

1.0000, 0.00, 0.0000,<br />

0.0000, 0.00, 1.0000,<br />

0.8977, 0.00, 0.1023,<br />

0.5372, 0.00, 0.4628,<br />

0.9478, 0.00, 0.0522,<br />

0.8774, 0.00, 0.1226,<br />

0.3714, 0.00, 0.6286,<br />

0.9859, 0.00, 0.0141,<br />

1.0000, 0.00, 0.0000,<br />

0.0909, 0.00, 0.9091,<br />

1.0000, 0.00, 0.0000,<br />

0.9912, 0.00, 0.0088,<br />

0.8626, 0.00, 0.1373,<br />

0.0000, 0.00, 1.0000,<br />

0.1463, 0.00, 0.8537,<br />

0.0000, 0.00, 1.0000,<br />

0.0852, 0.00, 0.9148,<br />

0.0811, 0.00, 0.9189,<br />

0.1176, 0.00, 0.8823,<br />

0.2500, 0.00, 0.7500,<br />

0.1119, 0.00, 0.8881,<br />

0.2500, 0.00, 0.7500,<br />

0.1837, 0.00, 0.8164,<br />

0.0000, 0.00, 1.0000,<br />

0.1908, 0.00, 0.8092,<br />

0.3429, 0.00, 0.6572,<br />

0.1304, 0.00, 0.8696,<br />

0.1628, 0.00, 0.8372,<br />

0.0759, 0.00, 0.9241,<br />

0.0000, 0.00, 1.0000,<br />

0.0000, 0.00, 1.0000,<br />

0.0920, 0.00, 0.9080,<br />

0.1624, 0.00, 0.8376,<br />

0.0588, 0.00, 0.9412,<br />

0.0741, 0.00, 0.9260,<br />

0.1348, 0.00, 0.8653,<br />

0.1111, 0.00, 0.8889,<br />

0.3467, 0.00, 0.6534,<br />

0.0000, 0.00, 1.0000,<br />

0.0000, 0.00, 1.0000,<br />

0.1413, 0.00, 0.8587,<br />

0.0833, 0.00, 0.9167,<br />

0.0000, 0.00, 1.0000,<br />

0.7595, 0.00, 0.2405,<br />

144


0.1667, 0.00, 0.8333,<br />

0.1522, 0.00, 0.8478,<br />

0.3000, 0.00, 0.7000,<br />

0.1176, 0.00, 0.8824,<br />

0.0816, 0.00, 0.9184,<br />

0.9643, 0.00, 0.0357,<br />

0.1852, 0.00, 0.8148,<br />

0.5000, 0.00, 0.5000,<br />

0.0889, 0.00, 0.9111,<br />

0.0000, 0.00, 1.0000,<br />

0.0000, 0.00, 1.0000,<br />

0.0625, 0.00, 0.9376,<br />

0.0000, 0.00, 1.0000,<br />

0.1316, 0.00, 0.8685,<br />

0.0000, 0.00, 1.0000,<br />

1.0000, 0.00, 0.0000,<br />

0.0000, 0.00, 1.0000,<br />

0.8966, 0.00, 0.1035,<br />

0.8333, 0.00, 0.1667,<br />

0.9316, 0.00, 0.0684,<br />

0.7059, 0.00, 0.2941,<br />

0.9524, 0.00, 0.0476,<br />

0.8788, 0.00, 0.1212,<br />

0.8017, 0.00, 0.1984,<br />

0.7000, 0.00, 0.3000,<br />

1.0000, 0.00, 0.0000,<br />

0.7797, 0.00, 0.2203,<br />

0.9057, 0.00, 0.0944,<br />

0.0323, 0.00, 0.9677,<br />

0.0000, 0.00, 1.0000,<br />

0.0000, 0.00, 1.0000,<br />

0.1667, 0.00, 0.8333,<br />

0.9577, 0.00, 0.0423,<br />

0.2143, 0.00, 0.7857,<br />

0.9677, 0.00, 0.0323,<br />

0.0323, 0.00, 0.9678,<br />

0.4370, 0.00, 0.5631,<br />

1.0000, 0.00, 0.0000,<br />

1.0000, 0.00, 0.0000,<br />

0.1429, 0.00, 0.8571,<br />

0.5000, 0.00, 0.5000,<br />

0.0000, 0.00, 1.0000,<br />

0.5263, 0.00, 0.4736,<br />

0.5652, 0.00, 0.4348,<br />

0.1351, 0.00, 0.8648,<br />

1.0000, 0.00, 0.0000,<br />

0.0000, 0.00, 1.0000,<br />

0.3188, 0.00, 0.6811,<br />

0.1512, 0.00, 0.8488,<br />

0.8500, 0.00, 0.1500,<br />

0.1000, 0.00, 0.9000,<br />

0.6250, 0.00, 0.3750,<br />

0.1250, 0.00, 0.8750,<br />

0.8750, 0.00, 0.1250,<br />

0.5694, 0.00, 0.4306,<br />

1.0000, 0.00, 0.0000,<br />

0.0238, 0.00, 0.9762,<br />

0.0000, 0.00, 1.0000,<br />

145


0.1982, 0.00, 0.8018,<br />

0.9927, 0.00, 0.0073,<br />

0.3077, 0.00, 0.6923,<br />

0.9162, 0.00, 0.0838,<br />

0.9000, 0.00, 0.1000,<br />

0.9255, 0.00, 0.0745,<br />

0.7857, 0.00, 0.2143,<br />

0.0000, 0.00, 1.0000,<br />

0.9123, 0.00, 0.0878,<br />

0.8244, 0.00, 0.1756,<br />

0.8545, 0.00, 0.1455,<br />

0.2800, 0.00, 0.7200,<br />

0.4242, 0.00, 0.5757,<br />

0.1667, 0.00, 0.8333,<br />

0.4000, 0.00, 0.6000,<br />

0.3846, 0.00, 0.6154,<br />

0.8092, 0.00, 0.1908,<br />

0.8621, 0.00, 0.1379,<br />

0.5455, 0.00, 0.4545,<br />

0.3750, 0.00, 0.6250,<br />

0.8587, 0.00, 0.1413,<br />

0.7267, 0.00, 0.2734,<br />

0.1429, 0.00, 0.8571,<br />

0.1789, 0.00, 0.8211,<br />

0.1416, 0.00, 0.8584,<br />

0.1111, 0.00, 0.8889,<br />

0.0812, 0.00, 0.9188,<br />

0.2051, 0.00, 0.7948,<br />

0.2340, 0.00, 0.7660}<br />

float bigrams[5][5] = {<br />

0.1016, 0.0000, 0.2397, 0.0024, 0.0115, /* QQ -- QA QT QI */<br />

0.0000, 0.0000, 0.0000, 0.0000, 0.0000, /* -- -- -- -- -- */<br />

0.1305, 0.0000, 0.1442, 0.0404, 0.1377, /* AQ -- AA AT AI */<br />

0.0226, 0.0000, 0.0202, 0.0000, 0.0000, /* TQ -- TA -- -- */<br />

0.1005, 0.0000, 0.0488, 0.0000, 0.0000} /* IQ -- IA -- -- */<br />

/**** end <strong>of</strong> probabilities for bi-stress model ie Q A ****/<br />

ma<strong>in</strong>()<br />

{<br />

<strong>in</strong>t i, j, k, l, pos, done, w, tustart, tuend, numberstates<br />

double value, bigvalue[NUMBEST]<br />

<strong>in</strong>t state[MAXWORDS], bigstate[NUMBEST][MAXWORDS], stress[MAXWORDS]<br />

float sentence[MAXWORDS][3]<br />

char c, word[MAXWORDS][30], wordtag[MAXWORDS][7]<br />

/* read <strong>in</strong> tu data from st<strong>and</strong>ard <strong>in</strong>put */<br />

if ((c = getc(std<strong>in</strong>)) == 'T')<br />

tustart = 3<br />

else<br />

tustart = 4<br />

if (c != 'I' && tustart == 4)<br />

pr<strong>in</strong>tf("Invalid tustart character. Assum<strong>in</strong>g I\n")<br />

146


if ((c = getc(std<strong>in</strong>)) == 'T')<br />

tuend = 3<br />

else<br />

tuend = 4<br />

if (c != 'I' && tuend == 4)<br />

pr<strong>in</strong>tf("Invalid tuend character. Assum<strong>in</strong>g I\n")<br />

scanf("%d", &w) /* get number <strong>of</strong> words */<br />

if (w < 0 || w > 40) {<br />

pr<strong>in</strong>tf("Number <strong>of</strong> words(%d) too many. Exit<strong>in</strong>g\n",<br />

w)<br />

exit(1)<br />

}<br />

for (i = 0 i < w i++) {<br />

scanf("%s %s", wordtag[i], word[i])<br />

/* f<strong>in</strong>d tag <strong>in</strong> tags <strong>and</strong> set sentence[w][0..2] appropriately */<br />

for (j = 0 j < NUMTAGS j++)<br />

if (!strcmp(tags[j], wordtag[i])) {/* found the right tag */<br />

for (k = 0 k < 3 k++)<br />

sentence[i][k] = probs[j][k]<br />

break<br />

}<br />

if (j == NUMTAGS) {<br />

pr<strong>in</strong>tf("Invalid tag(%s). Exit<strong>in</strong>g\n", wordtag[i])<br />

exit(1)<br />

}<br />

}<br />

/* number <strong>of</strong> possible states 3^w */<br />

numberstates = pow(3.0, (double)w)<br />

for (j = 0 j < NUMBEST j++)<br />

bigvalue[j] = 0.0 /* reset best values */<br />

for (j = 0 j < w j++)<br />

state[j] = 0 /* set up first state */<br />

/* pr<strong>in</strong>t out words <strong>and</strong> tags */<br />

for (k = 0, j = 0 j < w j++) {<br />

k += strlen(word[j]) + strlen(wordtag[j]) + 2<br />

#ifdef SINGLELINEOUTPUT<br />

if (k > 70) {<br />

pr<strong>in</strong>tf("\n")<br />

k = 0<br />

}<br />

#endif<br />

pr<strong>in</strong>tf("%s=%s ", word[j], wordtag[j])<br />

/* assume unstressed, unless otherwise set below */<br />

stress[j] = 0<br />

for (i = 0 i < strlen(word[j]) i++)<br />

if (word[j][i] == ',' || word[j][i] == '/'<br />

|| word[j][i] == '`' || word[j][i] == '\\'<br />

|| word[j][i] == '_' ||<br />

word[j][i] == '~' || word[j][i] == '*')<br />

stress[j] = 2<br />

}<br />

#ifdef SINGLELINEOUTPUT<br />

pr<strong>in</strong>tf("\n")<br />

#endif<br />

147


for (i = 0 i < numberstates i++) {<br />

for (value = 1, j = 0 j < w j++)<br />

value *= sentence[j][state[j]]<br />

for (j = 1 j < w j++)<br />

value *= bigrams[state[j-1]][state[j]]<br />

value *= bigrams[tustart][state[0]] * bigrams[state[w-1]][tuend]<br />

/* keep track <strong>of</strong> the top NUMBEST most probable sequences */<br />

for (j = 0 value < bigvalue[j] && j < NUMBEST<br />

j++)<br />

<br />

if (j < NUMBEST) {<br />

/* shuffle other values to make room */<br />

for (k = NUMBEST - 1 k > j k--) {<br />

bigvalue[k] = bigvalue[k-1]<br />

for (l = 0 l < w l++)<br />

bigstate[k][l] = bigstate[k-1][l]<br />

}<br />

bigvalue[j] = value<br />

for (k = 0 k < w k++)<br />

bigstate[j][k] = state[k]<br />

}<br />

}<br />

pos = 0 /* update state[] to next state */<br />

do {<br />

if (++state[pos] == 3) {<br />

state[pos] = 0<br />

pos++<br />

done = 0<br />

} else<br />

done = 1<br />

} while (pos < w && !done)<br />

for (i = 0 i < NUMBEST i++) {<br />

pr<strong>in</strong>tf("%g ", bigvalue[i])<br />

if (tustart == 3)<br />

pr<strong>in</strong>tf("T")<br />

else<br />

pr<strong>in</strong>tf("I")<br />

k = 0<br />

for (j = 0 j < w j++) {<br />

if (stress[j] != bigstate[i][j])<br />

k++<br />

switch (bigstate[i][j]) {<br />

case 0:<br />

pr<strong>in</strong>tf("Q")<br />

break<br />

case 1:<br />

pr<strong>in</strong>tf("S")<br />

break<br />

case 2:<br />

pr<strong>in</strong>tf("A")<br />

break<br />

}<br />

}<br />

148


}<br />

}<br />

if (tuend == 3)<br />

pr<strong>in</strong>tf("T ")<br />

else<br />

pr<strong>in</strong>tf("I ")<br />

if (k == 0)<br />

pr<strong>in</strong>tf("CORRECT\n")<br />

else<br />

pr<strong>in</strong>tf("ERROR(%d)\n", k)<br />

F.10 probability3.c

Probability3 is the original prosodic mark prediction model described in chapter 6.
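In the same notation as for probability.c (again reconstructed from the code rather than quoted from chapter 6): each word takes one of five prosodic states, rise (R), fall (F), fall-rise (V), stressed (S) or unstressed (U); tone-unit boundaries are pinned to a sixth state, and the remaining 5^{w-tub} assignments are scored exhaustively as

    score(s_1,...,s_w) = \prod_{j=1}^{w} P(s_j | t_j) \prod_{j=2}^{w} B(s_{j-1}, s_j)

with only the single best assignment kept.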

/* probability.c copyright Simon Arnfield 15th January 1993 */
/* probability3.c (version 4) copyright 25/5/93, 15/7/93 */
/* cc -o probability3 probability3.c -lm */

/* the three include file names were lost in extraction; those below are
   inferred from the calls used */
#include <stdio.h>
#include <string.h>
#include <math.h>

#define MAXWORDS 15
#define NUMTAGS 168

char tags[NUMTAGS][7] = {
    "&FO",   "&FW",   "APP$",  "AT",    "AT1",   "BTO21",
    "BTO22", "CC",    "CC31",  "CC32",  "CC33",  "CCB",
    "CF",    "CS",    "CS21",  "CS22",  "CSA",   "CSN",
    "CST",   "CSW",   "DA",    "DA1",   "DA2",   "DAR",
    "DAT",   "DB",    "DB2",   "DD",    "DD1",   "DD121",
    "DD122", "DD2",   "DD21",  "DD22",  "DD221", "DD222",
    "DDQ",   "DDQ$",  "DDQV",  "EX",    "ICS",   "IF",    "II",
    "II21",  "II22",  "II31",  "II32",  "II33",  "IO",
    "IW",    "JA",    "JB",    "JBR",   "JJ",    "JJR",
    "JJT",   "LE",    "MC",    "MC-MC", "MC1",   "MC2",
    "MD",    "MF",    "ND1",   "NN",    "NN1",   "NN121",
    "NN122", "NN2",   "NNJ",   "NNJ1",  "NNJ2",  "NNL1",
    "NNL2",  "NNO",   "NNO2",  "NNS",   "NNS1",  "NNS2",
    "NNSA1", "NNSB1", "NNT1",  "NNT2",  "NNU",   "NNU1",
    "NNU2",  "NNU21", "NNU22", "NP",    "NP1",   "NP2",
    "NPD1",  "NPM1",  "PN",    "PN1",   "PN121", "PN122",
    "PNQO",  "PNQS",  "PP$",   "PPH1",  "PPHO1", "PPHO2",
    "PPHS1", "PPHS2", "PPIO1", "PPIO2", "PPIS1", "PPIS2",
    "PPX1",  "PPX121","PPX122","PPX2",  "PPY",   "RA",
    "REX21", "REX22", "RG",    "RG21",  "RG22",  "RGA",
    "RGQ",   "RGQV",  "RGR",   "RGT",   "RL",    "RL21",
    "RL22",  "RP",    "RR",    "RR21",  "RR22",  "RR31",
    "RR32",  "RR33",  "RRQ",   "RRQV",  "RRR",   "RRT",
    "RT",    "TO",    "UH",    "VB0",   "VBDR",  "VBDZ",
    "VBG",   "VBM",   "VBN",   "VBR",   "VBZ",   "VD0",
    "VDD",   "VDG",   "VDN",   "VDZ",   "VH0",   "VHD",
    "VHG",   "VHN",   "VHZ",   "VM",    "VM21",  "VV0",
    "VVD",   "VVG",   "VVN",   "VVZ",   "XX"};

/**** probabilities for pent-stress model ie Rise Fall Vfallrise Str Ustr ****/
float probs[NUMTAGS][5] = {
    0.0001, 0.0001, 0.1443, 0.0357, 0.0001,
    0.1869, 0.5785, 0.2886, 0.3213, 0.1090,
    0.1869, 0.7954, 1.0101, 0.5355, 6.6042,
    0.0001, 1.3015, 0.4329, 2.2135, 43.9407,
    0.1869, 0.0723, 0.1443, 0.5712, 16.1072,
    0.0001, 0.0001, 0.0001, 0.0714, 0.0436,
    0.0001, 0.0001, 0.0001, 0.0357, 0.0654,
    1.1215, 0.9400, 0.8658, 2.7133, 14.8867,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0218,
    0.0001, 0.0723, 0.0001, 0.0001, 0.0001,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0218,
    0.3738, 0.0001, 0.1443, 0.9282, 3.0514,
    0.7477, 0.0723, 0.2886, 0.2856, 0.4141,
    0.7477, 0.8677, 0.0001, 1.7851, 2.0052,
    0.0001, 0.2892, 0.1443, 0.2856, 0.1090,
    0.0001, 0.0723, 0.1443, 0.0714, 0.3051,
    0.1869, 0.2892, 0.0001, 0.5712, 1.5693,
    0.0001, 0.0001, 0.0001, 0.0001, 0.5231,
    0.0001, 0.0001, 0.0001, 0.2856, 5.6670,
    0.1869, 0.2892, 0.0001, 0.2856, 0.0218,
    0.1869, 0.2169, 0.5772, 0.7497, 0.3923,
    0.1869, 0.1446, 0.2886, 0.5355, 0.1308,
    0.5607, 0.4338, 1.0101, 0.5712, 0.2180,
    0.3738, 0.6508, 0.4329, 0.6069, 0.2398,
    0.3738, 0.1446, 1.2987, 0.1428, 0.0218,
    0.1869, 1.9523, 2.3088, 1.3567, 0.1744,
    0.0001, 0.0723, 0.8658, 0.2142, 0.0436,
    0.0001, 1.0123, 2.4531, 1.5352, 0.5013,
    1.1215, 3.0369, 4.6176, 3.2845, 2.4629,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0654,
    0.0001, 0.0723, 0.1443, 0.0357, 0.0001,
    0.9346, 0.2892, 0.5772, 1.1782, 0.6321,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0872,
    0.0001, 0.0723, 0.0001, 0.0357, 0.0436,
    0.0001, 0.0001, 0.0001, 0.0001, 0.2616,
    0.0001, 0.2169, 0.1443, 0.1785, 0.0654,
    0.5607, 0.7954, 0.1443, 1.4281, 2.3104,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0654,
    0.0001, 0.2169, 0.1443, 0.0001, 0.0001,
    0.0001, 0.0001, 0.0001, 0.3213, 1.7219,
    0.1869, 0.7231, 0.7215, 1.4281, 1.4167,
    0.1869, 0.0723, 0.0001, 0.4284, 5.5362,
    1.4953, 2.6030, 3.7518, 6.7119, 40.2572,
    0.1869, 0.6508, 1.0101, 0.9639, 0.5667,
    0.0001, 0.0001, 0.0001, 0.0357, 1.5257,
    0.0001, 0.0001, 0.0001, 0.0001, 0.2398,
    0.0001, 0.2169, 0.1443, 0.2142, 0.0218,
    0.0001, 0.0001, 0.0001, 0.0001, 0.2398,
    0.0001, 0.0723, 0.0001, 0.2499, 19.5292,
    0.0001, 0.2892, 0.4329, 0.6426, 3.4220,
    0.1869, 0.2892, 0.0001, 0.0714, 0.0001,
    1.3084, 1.8800, 3.0303, 3.0703, 0.5231,
    0.0001, 0.0001, 0.1443, 0.0357, 0.0001,
    15.5140, 25.3073, 30.8802, 32.4884, 3.1604,
    0.3738, 0.5061, 1.7316, 0.4641, 0.0654,
    0.3738, 1.0123, 2.0202, 0.5355, 0.1308,
    0.0001, 0.2169, 0.1443, 0.2856, 0.0872,
    2.6168, 7.2307, 5.6277, 8.7112, 1.0898,
    0.0001, 0.0723, 0.0001, 0.0714, 0.0218,
    0.9346, 0.8677, 1.7316, 1.8208, 0.3923,
    0.5607, 0.2892, 0.2886, 0.0714, 0.0001,
    0.5607, 1.5907, 4.1847, 2.4634, 0.6321,
    0.5607, 1.0846, 0.1443, 0.9639, 0.5231,
    0.1869, 0.6508, 0.1443, 0.3213, 0.0654,
    3.1776, 2.6030, 1.2987, 2.9275, 0.6103,
    85.7944, 71.0774, 67.0996, 51.5173, 5.9939,
    0.0001, 0.0723, 0.0001, 0.0357, 0.0001,
    0.0001, 0.0723, 0.0001, 0.0357, 0.0001,
    34.7664, 29.5011, 24.3867, 22.4206, 3.0732,
    2.2430, 1.8800, 2.1645, 1.6066, 0.4141,
    0.5607, 0.3615, 0.1443, 0.2499, 0.0218,
    0.5607, 0.3615, 0.4329, 0.4998, 0.0436,
    3.5514, 3.1092, 1.4430, 1.7851, 0.4141,
    0.0001, 0.3615, 0.1443, 0.3570, 0.0436,
    0.5607, 0.7954, 1.4430, 0.8925, 0.5667,
    0.0001, 0.2169, 0.0001, 0.0357, 0.0001,
    0.1869, 0.0001, 0.0001, 0.0001, 0.0001,
    1.1215, 0.7954, 1.0101, 1.9636, 0.2833,
    0.3738, 0.2169, 0.5772, 0.0714, 0.0218,
    0.1869, 0.0001, 0.0001, 0.0001, 0.0001,
    0.3738, 0.0723, 0.0001, 0.5712, 1.3078,
    3.3645, 3.6876, 2.3088, 3.2131, 0.7629,
    2.4299, 2.4584, 1.1544, 0.8211, 0.3051,
    0.1869, 0.0723, 0.1443, 0.1428, 0.0654,
    0.7477, 0.2892, 0.4329, 0.1428, 0.0436,
    0.5607, 1.5907, 0.5772, 0.5712, 0.0872,
    0.0001, 0.0001, 0.0001, 0.0357, 0.5885,
    0.0001, 0.7231, 0.4329, 0.3213, 0.1090,
    0.0001, 0.0723, 0.0001, 0.1071, 0.0872,
    26.7290, 23.1381, 27.7056, 19.4216, 2.5501,
    0.0001, 0.1446, 0.0001, 0.0001, 0.0001,
    0.0001, 0.3615, 0.0001, 0.1428, 0.0001,
    0.1869, 1.1569, 1.1544, 0.1785, 0.0436,
    0.0001, 0.2169, 0.0001, 0.0357, 0.0001,
    0.7477, 0.5061, 1.1544, 0.4998, 0.1090,
    0.0001, 0.0001, 0.1443, 0.0357, 0.0001,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0436,
    0.0001, 0.0001, 0.0001, 0.0357, 0.0001,
    0.1869, 0.0723, 0.0001, 0.2499, 1.7001,
    0.0001, 0.0723, 0.0001, 0.0001, 0.1090,
    0.0001, 0.1446, 0.0001, 0.5712, 5.3400,
    0.1869, 0.0723, 0.0001, 0.1071, 0.2616,
    0.0001, 0.0001, 0.1443, 0.0357, 0.8718,
    0.0001, 0.4338, 0.1443, 0.3213, 2.5283,
    0.1869, 0.3615, 1.0101, 0.3927, 2.1142,
    0.0001, 0.0723, 0.0001, 0.0714, 0.1526,
    0.0001, 0.0001, 0.0001, 0.0001, 0.5013,
    0.0001, 0.0723, 0.4329, 0.3213, 1.0026,
    0.0001, 0.0001, 0.1443, 0.3213, 2.0924,
    0.1869, 0.7954, 1.1544, 0.3570, 0.0218,
    0.0001, 0.0001, 0.0001, 0.0714, 0.0001,
    0.0001, 0.0723, 0.0001, 0.0357, 0.0001,
    0.1869, 0.0723, 0.7215, 0.1071, 0.0436,
    0.0001, 0.0001, 0.1443, 0.0714, 1.4821,
    0.3738, 0.5061, 0.4329, 0.3570, 0.1308,
    0.0001, 0.0001, 0.0001, 0.0357, 0.6539,
    0.7477, 0.1446, 0.1443, 0.8211, 0.0218,
    0.1869, 1.0123, 0.7215, 1.6780, 1.1334,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0436,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0436,
    0.0001, 0.2169, 0.0001, 0.1071, 0.0218,
    0.1869, 0.0001, 0.0001, 0.1071, 0.0872,
    0.0001, 0.0001, 0.0001, 0.0357, 0.0001,
    0.0001, 0.0723, 0.4329, 0.4998, 0.4359,
    0.1869, 0.0723, 0.1443, 0.2499, 0.2833,
    1.6822, 2.0969, 1.0101, 1.8208, 0.3269,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0654,
    0.1869, 0.0723, 0.1443, 0.0001, 0.0001,
    3.3645, 4.1215, 1.8759, 1.8922, 1.4385,
    8.5981, 13.3767, 14.2857, 11.6744, 2.5501,
    0.0001, 0.0001, 0.0001, 0.3213, 1.1116,
    1.4953, 1.2292, 0.8658, 0.8211, 0.1308,
    0.0001, 0.0001, 0.0001, 0.1071, 0.1090,
    0.0001, 0.4338, 0.0001, 0.0357, 0.0218,
    0.0001, 0.0001, 0.0001, 0.0357, 0.1526,
    0.0001, 0.7954, 0.2886, 0.6426, 0.8936,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0218,
    0.5607, 0.7231, 1.0101, 0.7497, 0.0218,
    0.0001, 0.0001, 0.0001, 0.0357, 0.0001,
    2.6168, 1.3015, 1.7316, 1.6066, 0.4795,
    0.0001, 0.0001, 0.0001, 0.1071, 8.9364,
    0.9346, 0.2169, 0.0001, 0.3570, 0.1744,
    0.1869, 0.2169, 0.0001, 0.3570, 3.3348,
    0.0001, 0.1446, 0.1443, 0.2499, 1.9616,
    0.0001, 0.1446, 0.1443, 0.6426, 5.6888,
    0.0001, 0.0001, 0.0001, 0.2142, 0.4795,
    0.0001, 0.0723, 0.0001, 0.0001, 0.0001,
    0.1869, 0.0723, 0.0001, 0.2856, 2.2668,
    0.0001, 0.6508, 0.2886, 0.4284, 2.3540,
    0.5607, 0.7954, 1.1544, 0.6069, 4.9913,
    0.0001, 0.4338, 1.1544, 0.1428, 0.1526,
    0.1869, 0.4338, 0.0001, 0.4284, 0.3051,
    0.0001, 0.0001, 0.1443, 0.1428, 0.0218,
    0.0001, 0.0723, 0.0001, 0.0714, 0.0436,
    0.0001, 0.1446, 0.0001, 0.2142, 0.1090,
    0.1869, 0.2169, 0.7215, 0.7140, 2.6809,
    0.0001, 0.1446, 0.0001, 0.3570, 1.6347,
    0.0001, 0.0001, 0.0001, 0.1785, 0.1308,
    0.0001, 0.1446, 0.0001, 0.1071, 0.0654,
    0.1869, 0.2169, 0.1443, 0.2856, 1.7219,
    0.3738, 1.8077, 1.4430, 1.6066, 4.7515,
    0.0001, 0.1446, 0.0001, 0.1428, 0.0218,
    13.8318, 10.9183, 8.0808, 13.5666, 3.1386,
    4.2991, 5.3507, 4.3290, 9.9607, 1.4603,
    4.4860, 4.3384, 3.6075, 8.1042, 0.9154,
    14.0187, 12.2198, 10.3896, 12.9597, 1.3078,
    2.4299, 2.0969, 0.7215, 3.8558, 0.8718,
    0.1869, 1.8077, 1.1544, 1.3567, 0.4795};

float bigram[6][6] = {
    0.0000, 0.0040, 0.0148, 0.0047, 0.0372, 0.1060,  /* TUB */
    0.1313, 0.0022, 0.0025, 0.0013, 0.0082, 0.0212,  /* rise */
    0.0903, 0.0008, 0.0057, 0.0040, 0.0200, 0.0458,  /* fall */
    0.0685, 0.0007, 0.0043, 0.0015, 0.0413, 0.0503,  /* V fallrise */
    0.0405, 0.0102, 0.0257, 0.0072, 0.0262, 0.0570,  /* stress */
    0.0095, 0.0080, 0.0318, 0.0098, 0.0530, 0.0545}; /* unstress */

/***
 * (commented out in the original:)
 * 0.000, 0.024, 0.089, 0.028, 0.223, 0.636,    TUB
 * 0.788, 0.013, 0.015, 0.008, 0.049, 0.127,    Rise
 * 0.542, 0.005, 0.034, 0.024, 0.120, 0.275,    Fall
 * 0.411, 0.004, 0.026, 0.009, 0.248, 0.302,    V fallrise
 * 0.243, 0.061, 0.154, 0.043, 0.157, 0.342,    Stress
 * 0.057, 0.048, 0.191, 0.059, 0.318, 0.327     Unstress
 *   TUB   Rise   Fall   V     Str    Ustr
 ***/

main()
{
    int i, j, k, pos, done, w, numberstates, s, tub = 0;
    double value, bigvalue, v1, v2;
    short int state[MAXWORDS], bigstate[MAXWORDS];
    float sentence[MAXWORDS][6];
    char wordtag[MAXWORDS][7], prosody[MAXWORDS][24];

    /**printf("\nNumber of tags >")**/
    scanf("%d", &w);    /* get number of words */
    if (w < 0 || w > MAXWORDS) {
        /*printf("Number of words(%d) not suitable. Exiting\n",w)*/
        exit(0);
    }
    for (j = 0; j < w; j++)
        state[j] = 1;    /* set up first state; range from 1...1 to 5...5 */
    bigvalue = 0.0;      /* reset best values */
    /**printf("Expecting %d tags (and or tone unit boundaries)\n",w)**/
    for (i = 0; i < w; i++) {
        /**printf("Tag & word %d(of %d)>",i+1,w)**/
        scanf("%s %s", wordtag[i], prosody[i]);
        for (j = 0; j < NUMTAGS; j++)
            if (!strcmp(tags[j], wordtag[i])) {    /* found the right tag */
                sentence[i][0] = 0.0;              /* TUB */
                sentence[i][1] = probs[j][0];      /* RISE */
                sentence[i][2] = probs[j][1];      /* FALL */
                sentence[i][3] = probs[j][2];      /* FALL-RISE */
                sentence[i][4] = probs[j][3];      /* STRESSED */
                sentence[i][5] = probs[j][4];      /* UNSTRESSED */
                break;
            }
        if (!strcmp("{CP}", wordtag[i])) {    /* ie it is a compound */
            /*printf("A Compound - assuming probably stressed\n")*/
            sentence[i][0] = 0.0;
            sentence[i][1] = 0.25;    /* treat as almost */
            sentence[i][2] = 0.25;
            sentence[i][3] = 0.1;
            sentence[i][4] = 0.399;   /* always stressed */
            sentence[i][5] = 0.001;
        } else if (j == NUMTAGS) {
            /* the "else" is an assumption: brace placement was lost at a
               page break, and without it a {CP} compound (for which
               j == NUMTAGS also holds) would fall through into this TUB
               branch */
            /*printf("Assuming %s is a TUB - restricting search space\n",
                prosody[i])*/
            tub++;
            sentence[i][0] = 1.0;    /* a tu is always a tu */
            sentence[i][1] = 0.0;
            sentence[i][2] = 0.0;
            sentence[i][3] = 0.0;
            sentence[i][4] = 0.0;
            sentence[i][5] = 0.0;
            state[i] = 0;            /* prevent state changing a tu */
        }
    }
    /* number of possible states 5^w */
    numberstates = pow(5.0, (double)(w - tub));
    /* but TUBs don't alter hence reduced search space */
    /**printf("Processing %d states\n",numberstates)**/
    for (i = 0; i < numberstates; i++) {
        value = 1.0;
        v1 = 1.0;
        v2 = 1.0;
        for (j = 0; j < w; j++)
            v1 *= (double)sentence[j][state[j]];
        for (j = 1; j < w; j++)
            v2 *= (double)bigram[state[j-1]][state[j]];
        value = v1 * v2;
        if (value > bigvalue) {
            /**printf("BEST SO FAR:")**/
            bigvalue = value;
            for (k = 0; k < w; k++) {
                bigstate[k] = state[k];    /* save new best state */
                /**switch(state[k]) {
                   case 0: printf("|"); break;
                   case 1: printf("R"); break;
                   case 2: printf("F"); break;
                   case 3: printf("V"); break;
                   case 4: printf("S"); break;
                   case 5: printf("U");
                   }**/
            }
            /**printf(" %e\n",value)**/
        }
        pos = w - 1;    /* update state[] to next state */
        do {
            done = 1;    /* by default have done unless changed below */
            if (state[pos] == 0)
                pos--;    /* don't change tub state */
            if (pos >= 0) {                  /* don't go past end */
                if (++state[pos] == 6) {     /* if state increases to 6 */
                    state[pos] = 1;          /* reset to 1 */
                    pos--;                   /* and point to next word */
                    done = 0;                /* and say we haven't done */
                }
            }
        } while (pos >= 0 && !done);
    }
    for (i = 0; i < w; i++)
        printf("%s=%s ", wordtag[i], prosody[i]);
    printf("\nPredicted: ");
    for (i = 0; i < w; i++) {
        switch (bigstate[i]) {
        case 0: printf("|"); break;
        case 1: printf("R"); break;
        case 2: printf("F"); break;
        case 3: printf("V"); break;
        case 4: printf("S"); break;
        case 5: printf("U");
        }
    }
    printf("\nShould Be: ");
    for (i = 0; i < w; i++) {
        if (bigstate[i] == 0)
            s = 0;    /* tu boundary */
        else
            for (s = 5, j = 0; j < strlen(prosody[i]); j++) {
                if ((prosody[i][j] == ',' || prosody[i][j] == '/') && s >= 4)
                    s = 1;
                if ((prosody[i][j] == '\\' || prosody[i][j] == '`') && s >= 4)
                    s = 2;
                if ((prosody[i][j] == ',' || prosody[i][j] == '/') && s == 2)
                    s = 3;
                if ((prosody[i][j] == '*' || prosody[i][j] == '_'
                        || prosody[i][j] == '~') && s == 5)
                    s = 4;
                /* one more mark test stood here; its character was lost in
                   extraction ('?' below is only a placeholder):
                   if ((prosody[i][j] == '?') && s == 5) s = 4;  */
            }
        switch (s) {
        case 0: printf("|"); break;
        case 1: printf("R"); break;
        case 2: printf("F"); break;
        case 3: printf("V"); break;
        case 4: printf("S"); break;
        case 5: printf("U");
        }
    }
    printf("\n");
}

F.11 probabilityc.c

Probabilityc is the composite model for prosodic mark prediction. See chapters 5 and 6.
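In outline (a summary reconstructed from the listing, not text quoted from the chapters): the program first searches the 2^w stressed/unstressed sequences, scoring each by the per-tag probabilities prb2U/prb2S together with the group transition tables transUU, transUS, transSU and transSS and, for longer sentences, the stress trigram table, and keeps the NUMBEST best sequences. Each kept sequence is then refined: boundary and unstressed positions are held fixed while every stressed position is re-searched over rise/fall/fall-rise/stress using the five-state tag probabilities and the bigram tables, and the single highest-scoring annotation overall is printed.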

/* probabilityc.c copyright Simon Arnfield 15th January 1993 */
/* multistate-probabilities.c (version 4) copyright 25/5/93 */
/* probability3.c (version 4) copyright 25/5/93, 15/7/93 */
/* probabilityc.c (composite model) copyright 2/8/93 */
/* cc -o probabilityc probabilityc.c -lm */

/* the three include file names were lost in extraction; those below are
   inferred from the calls used */
#include <stdio.h>
#include <string.h>
#include <math.h>

#define MAXWORDS 20
#define NUMTAGS 187
#define NUMBEST 2

struct tagtype {
    char tagname[7];
    int group;
    float prb2U, prb2S;
    float prb5R, prb5F, prb5V, prb5S, prb5U;
};

struct tagtype tags[187] = {
    "&FO", 6, 0.0025, 0.9975, 0.0001, 0.0001, 0.1443, 0.0357, 0.0001,
    "&FW", 1, 0.2000, 0.8000, 0.1869, 0.5785, 0.2886, 0.3213, 0.1090, /* not sure about group */
    "APP$", 1, 0.8991, 0.1009, 0.1869, 0.7954, 1.0101, 0.5355, 6.6042,
    "AT", 3, 0.9605, 0.0395, 0.0001, 1.3015, 0.4329, 2.2135, 43.9407,
    "AT1", 3, 0.9749, 0.0250, 0.1869, 0.0723, 0.1443, 0.5712, 16.1072,
    "BTO", 3, 0.5000, 0.5000, 0.0001, 0.0001, 0.0001, 0.1000, 0.1000, /* no data */
    "BTO21", 3, 0.5000, 0.5000, 0.0001, 0.0001, 0.0001, 0.0714, 0.0436,
    "BTO22", 3, 0.7500, 0.2500, 0.0001, 0.0001, 0.0001, 0.0357, 0.0654,
    "CC", 1, 0.8712, 0.1288, 1.1215, 0.9400, 0.8658, 2.7133, 14.8867,
    "CC31", 1, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.0218,
    "CC32", 1, 0.0025, 0.9975, 0.0001, 0.0723, 0.0001, 0.0001, 0.0001,
    "CC33", 1, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.0218,
    "CCB", 4, 0.8284, 0.1716, 0.3738, 0.0001, 0.1443, 0.9282, 3.0514,
    "CF", 4, 0.5588, 0.4412, 0.7477, 0.0723, 0.2886, 0.2856, 0.4141,
    "CS", 4, 0.5823, 0.4177, 0.7477, 0.8677, 0.0001, 1.7851, 2.0052,
    "CS21", 4, 0.2778, 0.7222, 0.0001, 0.2892, 0.1443, 0.2856, 0.1090,
    "CS22", 4, 0.7778, 0.2223, 0.0001, 0.0723, 0.1443, 0.0714, 0.3051,
    "CSA", 4, 0.7742, 0.2258, 0.1869, 0.2892, 0.0001, 0.5712, 1.5693,
    "CSN", 3, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.5231,
    "CST", 3, 0.9701, 0.0299, 0.0001, 0.0001, 0.0001, 0.2856, 5.6670,
    "CSW", 4, 0.0714, 0.9286, 0.1869, 0.2892, 0.0001, 0.2856, 0.0218,
    "DA", 5, 0.3830, 0.6170, 0.1869, 0.2169, 0.5772, 0.7497, 0.3923,
    "DA1", 5, 0.2308, 0.7692, 0.1869, 0.1446, 0.2886, 0.5355, 0.1308,
    "DA2", 5, 0.2381, 0.7619, 0.5607, 0.4338, 1.0101, 0.5712, 0.2180,
    "DA2R", 5, 0.5000, 0.5000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, /* no data */
    "DAR", 5, 0.2619, 0.7381, 0.3738, 0.6508, 0.4329, 0.6069, 0.2398,
    "DAT", 5, 0.0556, 0.9445, 0.3738, 0.1446, 1.2987, 0.1428, 0.0218,
    "DB", 5, 0.0889, 0.9112, 0.1869, 1.9523, 2.3088, 1.3567, 0.1744,
    "DB2", 5, 0.1333, 0.8666, 0.0001, 0.0723, 0.8658, 0.2142, 0.0436,
    "DD", 6, 0.2371, 0.7629, 0.0001, 1.0123, 2.4531, 1.5352, 0.5013,
    "DD1", 9, 0.3965, 0.6035, 1.1215, 3.0369, 4.6176, 3.2845, 2.4629,
    "DD121", 9, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.0654,
    "DD122", 9, 0.0025, 0.9975, 0.0001, 0.0723, 0.1443, 0.0357, 0.0001,
    "DD2", 9, 0.3867, 0.6133, 0.9346, 0.2892, 0.5772, 1.1782, 0.6321,
    "DD21", 9, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.0872,
    "DD22", 9, 0.5000, 0.5000, 0.0001, 0.0723, 0.0001, 0.0357, 0.0436,
    "DD221", 9, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.2616,
    "DD222", 9, 0.2500, 0.7500, 0.0001, 0.2169, 0.1443, 0.1785, 0.0654,
    "DDQ", 4, 0.6584, 0.3416, 0.5607, 0.7954, 0.1443, 1.4281, 2.3104,
    "DDQ$", 4, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.0654,
    "DDQV", 4, 0.0025, 0.9975, 0.0001, 0.2169, 0.1443, 0.0001, 0.0001,
    "EX", 2, 0.8977, 0.1023, 0.0001, 0.0001, 0.0001, 0.3213, 1.7219,
    "ICS", 4, 0.5372, 0.4628, 0.1869, 0.7231, 0.7215, 1.4281, 1.4167,
    "IF", 2, 0.9478, 0.0522, 0.1869, 0.0723, 0.0001, 0.4284, 5.5362,
    "II", 1, 0.8774, 0.1226, 1.4953, 2.6030, 3.7518, 6.7119, 40.2572,
    "II21", 9, 0.3714, 0.6286, 0.1869, 0.6508, 1.0101, 0.9639, 0.5667,
    "II22", 3, 0.9859, 0.0141, 0.0001, 0.0001, 0.0001, 0.0357, 1.5257,
    "II31", 3, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.2398,
    "II32", 9, 0.0909, 0.9091, 0.0001, 0.2169, 0.1443, 0.2142, 0.0218,
    "II33", 3, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.2398,
    "IO", 3, 0.9912, 0.0088, 0.0001, 0.0723, 0.0001, 0.2499, 19.5292,
    "IW", 1, 0.8626, 0.1373, 0.0001, 0.2892, 0.4329, 0.6426, 3.4220,
    "JA", 5, 0.0025, 0.9975, 0.1869, 0.2892, 0.0001, 0.0714, 0.0001,
    "JB", 5, 0.1463, 0.8537, 1.3084, 1.8800, 3.0303, 3.0703, 0.5231,
    "JBR", 5, 0.0025, 0.9975, 0.0001, 0.0001, 0.1443, 0.0357, 0.0001,
    "JJ", 5, 0.0852, 0.9148, 15.5140, 25.3073, 30.8802, 32.4884, 3.1604,
    "JJR", 5, 0.0811, 0.9189, 0.3738, 0.5061, 1.7316, 0.4641, 0.0654,
    "JJT", 5, 0.1176, 0.8823, 0.3738, 1.0123, 2.0202, 0.5355, 0.1308,
    "LE", 4, 0.2500, 0.7500, 0.0001, 0.2169, 0.1443, 0.2856, 0.0872,
    "MC", 6, 0.1119, 0.8881, 2.6168, 7.2307, 5.6277, 8.7112, 1.0898,
    "MC-MC", 6, 0.2500, 0.7500, 0.0001, 0.0723, 0.0001, 0.0714, 0.0218,
    "MC1", 6, 0.1837, 0.8164, 0.9346, 0.8677, 1.7316, 1.8208, 0.3923,
    "MC2", 6, 0.0025, 0.9975, 0.5607, 0.2892, 0.2886, 0.0714, 0.0001,
    "MD", 6, 0.1908, 0.8092, 0.5607, 1.5907, 4.1847, 2.4634, 0.6321,
    "MF", 9, 0.3429, 0.6572, 0.5607, 1.0846, 0.1443, 0.9639, 0.5231,
    "ND1", 8, 0.1304, 0.8696, 0.1869, 0.6508, 0.1443, 0.3213, 0.0654,
    "NN", 5, 0.1628, 0.8372, 3.1776, 2.6030, 1.2987, 2.9275, 0.6103,
    "NN1", 8, 0.0759, 0.9241, 85.7944, 71.0774, 67.0996, 51.5173, 5.9939,
    "NN121", 8, 0.0025, 0.9975, 0.0001, 0.0723, 0.0001, 0.0357, 0.0001,
    "NN122", 8, 0.0025, 0.9975, 0.0001, 0.0723, 0.0001, 0.0357, 0.0001,
    "NN2", 8, 0.0920, 0.9080, 34.7664, 29.5011, 24.3867, 22.4206, 3.0732,
    "NNJ", 8, 0.1624, 0.8376, 2.2430, 1.8800, 2.1645, 1.6066, 0.4141,
    "NNJ1", 8, 0.0588, 0.9412, 0.5607, 0.3615, 0.1443, 0.2499, 0.0218,
    "NNJ2", 8, 0.0741, 0.9260, 0.5607, 0.3615, 0.4329, 0.4998, 0.0436,
    "NNL1", 8, 0.1348, 0.8653, 3.5514, 3.1092, 1.4430, 1.7851, 0.4141,
    "NNL2", 8, 0.1111, 0.8889, 0.0001, 0.3615, 0.1443, 0.3570, 0.0436,
    "NNO", 9, 0.3467, 0.6534, 0.5607, 0.7954, 1.4430, 0.8925, 0.5667,
    "NNO2", 9, 0.0025, 0.9975, 0.0001, 0.2169, 0.0001, 0.0357, 0.0001,
    "NNS", 7, 0.0025, 0.9975, 0.1869, 0.0001, 0.0001, 0.0001, 0.0001,
    "NNS1", 7, 0.1413, 0.8587, 1.1215, 0.7954, 1.0101, 1.9636, 0.2833,
    "NNS2", 7, 0.0833, 0.9167, 0.3738, 0.2169, 0.5772, 0.0714, 0.0218,
    "NNSA1", 4, 0.0025, 0.9975, 0.1869, 0.0001, 0.0001, 0.0001, 0.0001,
    "NNSB1", 4, 0.7595, 0.2405, 0.3738, 0.0723, 0.0001, 0.5712, 1.3078,
    "NNT1", 8, 0.1667, 0.8333, 3.3645, 3.6876, 2.3088, 3.2131, 0.7629,
    "NNT2", 8, 0.1522, 0.8478, 2.4299, 2.4584, 1.1544, 0.8211, 0.3051,
    "NNU", 9, 0.3000, 0.7000, 0.1869, 0.0723, 0.1443, 0.1428, 0.0654,
    "NNU1", 9, 0.1176, 0.8824, 0.7477, 0.2892, 0.4329, 0.1428, 0.0436,
    "NNU2", 9, 0.0816, 0.9184, 0.5607, 1.5907, 0.5772, 0.5712, 0.0872,
    "NNU21", 9, 0.9643, 0.0357, 0.0001, 0.0001, 0.0001, 0.0357, 0.5885,
    "NNU22", 9, 0.1852, 0.8148, 0.0001, 0.7231, 0.4329, 0.3213, 0.1090,
    "NP", 5, 0.5000, 0.5000, 0.0001, 0.0723, 0.0001, 0.1071, 0.0872,
    "NP1", 5, 0.0889, 0.9111, 26.7290, 23.1381, 27.7056, 19.4216, 2.5501,
    "NP2", 5, 0.0025, 0.9975, 0.0001, 0.1446, 0.0001, 0.0001, 0.0001,
    "NPD1", 9, 0.0025, 0.9975, 0.0001, 0.3615, 0.0001, 0.1428, 0.0001,
    "NPM1", 9, 0.0625, 0.9376, 0.1869, 1.1569, 1.1544, 0.1785, 0.0436,
    "PN", 1, 0.0025, 0.9975, 0.0001, 0.2169, 0.0001, 0.0357, 0.0001, /* not sure about group */
    "PN1", 1, 0.1316, 0.8685, 0.7477, 0.5061, 1.1544, 0.4998, 0.1090, /* not sure about group */
    "PN121", 1, 0.0025, 0.9975, 0.0001, 0.0001, 0.1443, 0.0357, 0.0001, /* not sure about group */
    "PN122", 1, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.0436, /* not sure about group */
    "PNQO", 2, 0.0025, 0.9975, 0.0001, 0.0001, 0.0001, 0.0357, 0.0001,
    "PNQS", 2, 0.8966, 0.1035, 0.1869, 0.0723, 0.0001, 0.2499, 1.7001,
    "PP$", 1, 0.8333, 0.1667, 0.0001, 0.0723, 0.0001, 0.0001, 0.1090,
    "PPH1", 2, 0.9316, 0.0684, 0.0001, 0.1446, 0.0001, 0.5712, 5.3400,
    "PPHO1", 1, 0.7059, 0.2941, 0.1869, 0.0723, 0.0001, 0.1071, 0.2616,
    "PPHO2", 4, 0.9524, 0.0476, 0.0001, 0.0001, 0.1443, 0.0357, 0.8718,
    "PPHS1", 1, 0.8788, 0.1212, 0.0001, 0.4338, 0.1443, 0.3213, 2.5283,
    "PPHS2", 4, 0.8017, 0.1984, 0.1869, 0.3615, 1.0101, 0.3927, 2.1142,
    "PPIO1", 2, 0.7000, 0.3000, 0.0001, 0.0723, 0.0001, 0.0714, 0.1526,
    "PPIO2", 2, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.5013,
    "PPIS1", 2, 0.7797, 0.2203, 0.0001, 0.0723, 0.4329, 0.3213, 1.0026,
    "PPIS2", 2, 0.9057, 0.0944, 0.0001, 0.0001, 0.1443, 0.3213, 2.0924,
    "PPX1", 8, 0.0323, 0.9677, 0.1869, 0.7954, 1.1544, 0.3570, 0.0218,
    "PPX121", 8, 0.0025, 0.9975, 0.0001, 0.0001, 0.0001, 0.0714, 0.0001,
    "PPX122", 8, 0.0025, 0.9975, 0.0001, 0.0723, 0.0001, 0.0357, 0.0001,
    "PPX2", 8, 0.1667, 0.8333, 0.1869, 0.0723, 0.7215, 0.1071, 0.0436,
    "PPX221", 8, 0.5000, 0.5000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, /* no data */
    "PPX222", 8, 0.5000, 0.5000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, /* no data */
    "PPY", 3, 0.9577, 0.0423, 0.0001, 0.0001, 0.1443, 0.0714, 1.4821,
    "RA", 5, 0.2143, 0.7857, 0.3738, 0.5061, 0.4329, 0.3570, 0.1308,
    "REX", 9, 0.5000, 0.5000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, /* no data */
    "REX21", 9, 0.9677, 0.0323, 0.0001, 0.0001, 0.0001, 0.0357, 0.6539,
    "REX22", 9, 0.0323, 0.9678, 0.7477, 0.1446, 0.1443, 0.8211, 0.0218,
    "RG", 9, 0.4370, 0.5631, 0.1869, 1.0123, 0.7215, 1.6780, 1.1334,
    "RG21", 9, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.0436,
    "RG22", 9, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.0436,
    "RGA", 9, 0.1429, 0.8571, 0.0001, 0.2169, 0.0001, 0.1071, 0.0218,
    "RGQ", 9, 0.5000, 0.5000, 0.1869, 0.0001, 0.0001, 0.1071, 0.0872,
    "RGQV", 9, 0.0025, 0.9975, 0.0001, 0.0001, 0.0001, 0.0357, 0.0001,
    "RGR", 9, 0.5263, 0.4736, 0.0001, 0.0723, 0.4329, 0.4998, 0.4359,
    "RGT", 9, 0.5652, 0.4348, 0.1869, 0.0723, 0.1443, 0.2499, 0.2833,
    "RL", 5, 0.1351, 0.8648, 1.6822, 2.0969, 1.0101, 1.8208, 0.3269,
    "RL21", 5, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.0654,
    "RL22", 5, 0.0025, 0.9975, 0.1869, 0.0723, 0.1443, 0.0001, 0.0001,
    "RP", 9, 0.3188, 0.6811, 3.3645, 4.1215, 1.8759, 1.8922, 1.4385,
    "RR", 5, 0.1512, 0.8488, 8.5981, 13.3767, 14.2857, 11.6744, 2.5501,
    "RR21", 5, 0.8500, 0.1500, 0.0001, 0.0001, 0.0001, 0.3213, 1.1116,
    "RR22", 5, 0.1000, 0.9000, 1.4953, 1.2292, 0.8658, 0.8211, 0.1308,
    "RR31", 5, 0.6250, 0.3750, 0.0001, 0.0001, 0.0001, 0.1071, 0.1090,
    "RR32", 5, 0.1250, 0.8750, 0.0001, 0.4338, 0.0001, 0.0357, 0.0218,
    "RR33", 5, 0.8750, 0.1250, 0.0001, 0.0001, 0.0001, 0.0357, 0.1526,
    "RRQ", 5, 0.5694, 0.4306, 0.0001, 0.7954, 0.2886, 0.6426, 0.8936,
    "RRQV", 5, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.0218,
    "RRR", 5, 0.0238, 0.9762, 0.5607, 0.7231, 1.0101, 0.7497, 0.0218,
    "RRT", 5, 0.0025, 0.9975, 0.0001, 0.0001, 0.0001, 0.0357, 0.0001,
    "RT", 5, 0.1982, 0.8018, 2.6168, 1.3015, 1.7316, 1.6066, 0.4795,
    "TO", 3, 0.9927, 0.0073, 0.0001, 0.0001, 0.0001, 0.1071, 8.9364,
    "UH", 1, 0.3077, 0.6923, 0.9346, 0.2169, 0.0001, 0.3570, 0.1744, /* not sure about group */
    "VB0", 2, 0.9162, 0.0838, 0.1869, 0.2169, 0.0001, 0.3570, 3.3348,
    "VBDR", 1, 0.9000, 0.1000, 0.0001, 0.1446, 0.1443, 0.2499, 1.9616,
    "VBDZ", 2, 0.9255, 0.0745, 0.0001, 0.1446, 0.1443, 0.6426, 5.6888,
    "VBG", 1, 0.7857, 0.2143, 0.0001, 0.0001, 0.0001, 0.2142, 0.4795,
    "VBM", 1, 0.0025, 0.9975, 0.0001, 0.0723, 0.0001, 0.0001, 0.0001,
    "VBN", 2, 0.9123, 0.0878, 0.1869, 0.0723, 0.0001, 0.2856, 2.2668,
    "VBR", 4, 0.8244, 0.1756, 0.0001, 0.6508, 0.2886, 0.4284, 2.3540,
    "VBZ", 1, 0.8545, 0.1455, 0.5607, 0.7954, 1.1544, 0.6069, 4.9913,
    "VD0", 1, 0.2800, 0.7200, 0.0001, 0.4338, 1.1544, 0.1428, 0.1526, /* maybe should be in group 2 */
    "VDD", 1, 0.4242, 0.5757, 0.1869, 0.4338, 0.0001, 0.4284, 0.3051,
    "VDG", 1, 0.1667, 0.8333, 0.0001, 0.0001, 0.1443, 0.1428, 0.0218,
    "VDN", 1, 0.4000, 0.6000, 0.0001, 0.0723, 0.0001, 0.0714, 0.0436, /* maybe should be in group 2 */
    "VDZ", 1, 0.3846, 0.6154, 0.0001, 0.1446, 0.0001, 0.2142, 0.1090,
    "VH0", 4, 0.8092, 0.1908, 0.1869, 0.2169, 0.7215, 0.7140, 2.6809,
    "VHD", 1, 0.8621, 0.1379, 0.0001, 0.1446, 0.0001, 0.3570, 1.6347,
    "VHG", 1, 0.5455, 0.4545, 0.0001, 0.0001, 0.0001, 0.1785, 0.1308,
    "VHN", 1, 0.3750, 0.6250, 0.0001, 0.1446, 0.0001, 0.1071, 0.0654, /* maybe should be in group 2 */
    "VHZ", 1, 0.8587, 0.1413, 0.1869, 0.2169, 0.1443, 0.2856, 1.7219,
    "VM", 4, 0.7267, 0.2734, 0.3738, 1.8077, 1.4430, 1.6066, 4.7515,
    "VM21", 4, 0.1429, 0.8571, 0.0001, 0.1446, 0.0001, 0.1428, 0.0218,
    "VM22", 4, 0.5000, 0.5000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, /* no data */
    "VMK", 4, 0.5000, 0.5000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, /* no data */
    "VV0", 5, 0.1789, 0.8211, 13.8318, 10.9183, 8.0808, 13.5666, 3.1386,
    "VVD", 7, 0.1416, 0.8584, 4.2991, 5.3507, 4.3290, 9.9607, 1.4603,
    "VVG", 7, 0.1111, 0.8889, 4.4860, 4.3384, 3.6075, 8.1042, 0.9154,
    "VVN", 5, 0.0812, 0.9188, 14.0187, 12.2198, 10.3896, 12.9597, 1.3078,
    "VVZ", 7, 0.2051, 0.7948, 2.4299, 2.0969, 0.7215, 3.8558, 0.8718,
    "XX", 6, 0.2340, 0.7660, 0.1869, 1.8077, 1.1544, 1.3567, 0.4795,
    "ZZ1", 5, 0.0417, 0.9583, 0.1250, 0.4167, 0.0417, 0.3750, 0.0417,
    "{CP}", 8, 0.9000, 0.1000, 0.2500, 0.2500, 0.1000, 0.3000, 0.1000,
    ".", 0, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
    ":", 0, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
    "", 0, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, /* tag name lost in extraction */
    "!", 0, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
    "", 0, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, /* tag name lost in extraction */
    ",", 0, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
    "-", 0, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
    "(", 0, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
    ")", 0, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
    "'", 0, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000};

/* group-to-group transition weights: as used in main() below,
   trans<XY>[g1][g2] weights a word of group g1 in state X followed by a
   word of group g2 in state Y (U = unstressed, S = stressed) */
float transUU[10][10] = {
    1.0000, 0.8538, 0.9589, 0.9754, 0.7731, 0.2040, 0.1466, 0.1473, 0.0353, 0.4330,
    0.4907, 0.6492, 0.8084, 0.8437, 0.4118, 0.0866, 0.1250, 0.0495, 0.0232, 0.2447,
    0.8333, 0.7733, 0.9067, 0.8786, 0.6439, 0.0651, 0.1064, 0.1006, 0.0312, 0.1200,
    0.9767, 0.7451, 0.9583, 0.9808, 0.7921, 0.0829, 0.0811, 0.0390, 0.0314, 0.3193,
    0.7241, 0.5031, 0.6804, 0.5864, 0.5027, 0.0816, 0.1395, 0.0824, 0.0167, 0.0976,
    0.0432, 0.0646, 0.0773, 0.1107, 0.0561, 0.0185, 0.0093, 0.0200, 0.0044, 0.0675,
    0.0463, 0.0750, 0.1667, 0.1026, 0.3846, 0.1140, 0.0385, 0.0000, 0.0140, 0.0163,
    0.0614, 0.0711, 0.1250, 0.1126, 0.1333, 0.0000, 0.0256, 0.0000, 0.0106, 0.0631,
    0.0592, 0.0769, 0.1036, 0.0883, 0.0787, 0.0240, 0.0652, 0.0037, 0.0162, 0.0345,
    0.0495, 0.3468, 0.1143, 0.3497, 0.2500, 0.0373, 0.1163, 0.0000, 0.0252, 0.1074};

float transUS[10][10] = {
    0.0000, 0.1462, 0.0411, 0.0246, 0.2269, 0.7960, 0.8534, 0.8527, 0.9647, 0.5670,
    0.0000, 0.1672, 0.0599, 0.0154, 0.4118, 0.8524, 0.7372, 0.8960, 0.8971, 0.6011,
    0.0000, 0.1700, 0.0800, 0.0347, 0.2879, 0.9118, 0.8085, 0.8742, 0.9688, 0.8400,
    0.0000, 0.2549, 0.0417, 0.0082, 0.1980, 0.9058, 0.9189, 0.9610, 0.9467, 0.6807,
    0.0000, 0.2147, 0.0731, 0.0500, 0.2350, 0.7814, 0.4302, 0.8353, 0.7333, 0.6829,
    0.0000, 0.0094, 0.0000, 0.0033, 0.0561, 0.2319, 0.2710, 0.1200, 0.1274, 0.2068,
    0.0000, 0.0000, 0.0556, 0.0000, 0.1538, 0.3161, 0.2308, 0.0909, 0.1821, 0.1138,
    0.0000, 0.0133, 0.0000, 0.0033, 0.0000, 0.1937, 0.1795, 0.1111, 0.2979, 0.1441,
    0.0000, 0.0071, 0.0000, 0.0000, 0.0140, 0.1226, 0.1739, 0.0787, 0.1542, 0.1207,
    0.0000, 0.0289, 0.0000, 0.0061, 0.0500, 0.4191, 0.3953, 0.0625, 0.4137, 0.6694};

float transSU[10][10] = {
    0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
    0.5093, 0.1672, 0.1317, 0.1401, 0.1765, 0.0165, 0.0929, 0.0149, 0.0232, 0.1011,
    0.1667, 0.0567, 0.0133, 0.0867, 0.0682, 0.0063, 0.0851, 0.0126, 0.0000, 0.0200,
    0.0233, 0.0000, 0.0000, 0.0110, 0.0099, 0.0063, 0.0000, 0.0000, 0.0037, 0.0000,
    0.2759, 0.2577, 0.2466, 0.3591, 0.2404, 0.0375, 0.3721, 0.0706, 0.0333, 0.1707,
    0.9568, 0.8642, 0.8969, 0.8826, 0.7296, 0.1277, 0.1869, 0.1950, 0.1051, 0.4135,
    0.9537, 0.8250, 0.7778, 0.8974, 0.4615, 0.1813, 0.0962, 0.0909, 0.2129, 0.4959,
    0.9386, 0.8044, 0.8250, 0.8742, 0.8333, 0.1309, 0.1538, 0.1111, 0.0638, 0.3604,
    0.9408, 0.8281, 0.8679, 0.9085, 0.7303, 0.2091, 0.2609, 0.1873, 0.1571, 0.4483,
    0.9505, 0.5838, 0.8571, 0.6442, 0.7000, 0.1328, 0.1395, 0.3750, 0.2014, 0.0661};

float transSS[10][10] = {
    0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
    0.0000, 0.0164, 0.0000, 0.0009, 0.0000, 0.0445, 0.0449, 0.0396, 0.0565, 0.0532,
    0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0168, 0.0000, 0.0126, 0.0000, 0.0200,
    0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0050, 0.0000, 0.0000, 0.0181, 0.0000,
    0.0000, 0.0245, 0.0000, 0.0045, 0.0219, 0.0995, 0.0581, 0.0118, 0.2167, 0.0488,
    0.0000, 0.0618, 0.0258, 0.0033, 0.1582, 0.6218, 0.5327, 0.6650, 0.7631, 0.3122,
    0.0000, 0.1000, 0.0000, 0.0000, 0.0000, 0.3886, 0.6346, 0.8182, 0.5910, 0.3740,
    0.0000, 0.1111, 0.0500, 0.0099, 0.0333, 0.6754, 0.6410, 0.7778, 0.6277, 0.4324,
    0.0000, 0.0879, 0.0286, 0.0032, 0.1770, 0.6442, 0.5000, 0.7303, 0.6725, 0.3966,
    0.0000, 0.0405, 0.0286, 0.0000, 0.0000, 0.4108, 0.3488, 0.5625, 0.3597, 0.1570};

float bigram[5][5] = {
    0.0000, 0.0591, 0.2698, 0.0879, 0.5832,
    0.8212, 0.0393, 0.0362, 0.0096, 0.0936,
    0.5949, 0.0158, 0.1164, 0.0543, 0.2185,
    0.5241, 0.0060, 0.0442, 0.0145, 0.4112,
    0.2697, 0.1007, 0.2665, 0.0725, 0.2906};

/* floats above produced from normalising across each line of data below:
      0,  774, 3535, 1151, 7640,
   2131,  102,   94,   25,  243,
   5309,  141, 1039,  485, 1950,
   1481,   17,  125,   41, 1162,
   4180, 1561, 4130, 1124, 4505
      T     R     F     V     S    */
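/* added note (worked check of the normalisation): the R row of the raw
   data sums to 2131 + 102 + 94 + 25 + 243 = 2595, and 2131/2595 = 0.8212,
   the first entry of the R row of bigram[][] above. */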

float bigram5[6][6] = {
    0.0000, 0.0240, 0.0893, 0.0283, 0.2225, 0.6358,
    0.7877, 0.0131, 0.0154, 0.0077, 0.0493, 0.1268,
    0.5417, 0.0049, 0.0341, 0.0242, 0.1201, 0.2750,
    0.4112, 0.0035, 0.0262, 0.0088, 0.2481, 0.3022,
    0.2430, 0.0615, 0.1536, 0.0430, 0.1572, 0.3417,
    0.0567, 0.0478, 0.1913, 0.0590, 0.3179, 0.3273};

/* floats above produced from normalising across each line of data below:
      0,  319, 1185,  375, 2953, 8437,
   2044,   34,   40,   20,  128,  329,
   4834,   44,  304,  216, 1072, 2454,
   1162,   10,   74,   25,  701,  854,
   3767,  953, 2381,  666, 2436, 5297,
   1463, 1235, 4939, 1524, 8210, 8453
    TUB  RISE  FALL  R-F   STR  USTR    */
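/* added note (worked check): the RISE row of the raw data sums to
   2044 + 34 + 40 + 20 + 128 + 329 = 2595, and 2044/2595 = 0.7877, the
   first entry of the RISE row of bigram5[][] above. */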

float trigram[3][3][3] ={<br />

0.0261, 0.0808, 0.0050,<br />

0.0722, 0.0748, 0.0818,<br />

0.0112, 0.0080, 0.0019,<br />

0.0429, 0.0711, 0.0145,<br />

0.0262, 0.0304, 0.0743,<br />

0.0981, 0.0558, 0.0160,<br />

0.0429, 0.0768, 0.0016,<br />

0.0301, 0.0258, 0.0138,<br />

0.0120, 0.0058, 0.0000}<br />

ma<strong>in</strong>()<br />

{<br />

<strong>in</strong>t i, j, k, l, pos, done, f<strong>in</strong>ished, w, numberstates, s,<br />

tub = 0, p1, p2, tri1, tri2, tri3<br />

double value, val1, val2, bigvalue, tp<br />

short <strong>in</strong>t state[MAXWORDS], beststates[NUMBEST][MAXWORDS]<br />

short <strong>in</strong>t grp[MAXWORDS], bigstate[MAXWORDS]<br />

float sentence2[MAXWORDS][2], sentence6[MAXWORDS][6]<br />

float beststatevals[NUMBEST]<br />

char wordtag[MAXWORDS][7], prosody[MAXWORDS][24]<br />

for (i = 0 i < NUMBEST i++)<br />

165


eststatevals[i] =0 /* clear best states */<br />

scanf("%d", &w) /* get number <strong>of</strong> words */<br />

if (w < 0 || w > MAXWORDS) {<br />

pr<strong>in</strong>tf("Number <strong>of</strong> words(%d) not suitable. Exit<strong>in</strong>g\n",<br />

w)<br />

exit(0)<br />

}<br />

for (i = 0 i < w i++) {<br />

scanf("%s %s", wordtag[i], prosody[i])<br />

for (j = 0 j < NUMTAGS j++)<br />

if (!strcmp(tags[j].tagname, wordtag[i])) {<br />

sentence2[i][0] = tags[j].prb2U<br />

sentence2[i][1] = tags[j].prb2S<br />

grp[i] = tags[j].group<br />

break<br />

}<br />

}<br />

if (j == NUMTAGS) {/* unknown tag assume multiple punc */<br />

sentence2[i][0] = 1.0<br />

sentence2[i][1] = 0.0<br />

grp[i] = 0<br />

}<br />

/* number <strong>of</strong> possible states 2^w */<br />

numberstates = pow(2.0, (double)w)<br />

for (j = 0 j < w j++)<br />

state[j] = 0 /* set up first state */<br />

for (i = 0 i < numberstates i++) {<br />

val1 = 1.0<br />

/* product <strong>of</strong> prob tag be<strong>in</strong>g <strong>in</strong> its state */<br />

for (j = 0 j < w j++)<br />

val1 *= sentence2[j][state[j]]<br />

val2 = 1.0 /* group transition probabilities */<br />

for (j = 1 j < w j++) {<br />

if (state[j-1] == 0 && state[j] == 0)<br />

tp = transUU[grp[j-1]][grp[j]]<br />

if (state[j-1] == 0 && state[j] == 1)<br />

tp = transUS[grp[j-1]][grp[j]]<br />

if (state[j-1] == 1 && state[j] == 0)<br />

tp = transSU[grp[j-1]][grp[j]]<br />

if (state[j-1] == 1 && state[j] == 1)<br />

tp = transSS[grp[j-1]][grp[j]]<br />

val2 *= tp<br />

}<br />

if (w > 3)<br />

for (j = 2 j < w j++) { /* stress trigram probs */<br />

if (grp[j-2] == 0)<br />

tri1 = 2<br />

else<br />

tri1 = state[j-2]<br />

if (grp[j-1] == 0)<br />

166


}<br />

tri2 = 2<br />

else<br />

tri2 = state[j-1]<br />

if (grp[j-0] == 0)<br />

tri3 = 2<br />

else<br />

tri3 = state[j]<br />

val2 *= trigram[tri1][tri2][tri3]<br />

value = val1 * val2<br />

/* keep track <strong>of</strong> the top NUMBEST most probable sequences */<br />

for (j = 0 value < beststatevals[j] && j < NUMBEST<br />

j++)<br />

<br />

if (j < NUMBEST) {<br />

for (k = NUMBEST - 1 k > j k--) {/* make room */<br />

beststatevals[k] = beststatevals[k-1]<br />

for (l = 0 l < w l++)<br />

beststates[k][l] = beststates[k-1][l]<br />

}<br />

beststatevals[j] = value<br />

for (k = 0 k < w k++)<br />

beststates[j][k] = state[k]<br />

}<br />

}<br />

pos = 0 /* update state[] to next state */<br />

do {<br />

if (++state[pos] == 2) {<br />

state[pos] = 0<br />

pos++<br />

done = 0<br />

} else<br />

done = 1<br />

} while (pos < w && !done)<br />

/* NOW to use the NUMBEST sequences held <strong>in</strong> */<br />

/* beststates[NUMBEST][MAXWORDS] to predict the TSMs */<br />

bigvalue = 0.0 /* reset best value */<br />

/* do this for each state from above */<br />

for (l = NUMBEST - 1 l >= 0 l--) {<br />

for (j = 0 j < w j++) /* setup state */<br />

if (beststates[l][j] == 0)<br />

state[j] = 5 /* 0=unstr -> 5*/<br />

else<br />

state[j] = 1 /* 1=stressed -> 1 */<br />

for (i = 0 i < w i++) { /* setup probability lattice */<br />

for (j = 0 j < NUMTAGS j++)<br />

if (!strcmp(tags[j].tagname, wordtag[i])) {<br />

sentence6[i][0] = 0.0 /* TUB */<br />

sentence6[i][1] = tags[j].prb5R<br />

sentence6[i][2] = tags[j].prb5F<br />

167


}<br />

sentence6[i][3] = tags[j].prb5V<br />

sentence6[i][4] = tags[j].prb5S<br />

sentence6[i][5] = tags[j].prb5U<br />

if (sentence6[i][1] + sentence6[i][2]<br />

+ sentence6[i][3] + sentence6[i][4]<br />

+ sentence6[i][5] == 0 ) {<br />

sentence6[i][0]<br />

= 1.0 /* if all probs = 0 */<br />

state[i] = 0 /* then must be punc */<br />

}<br />

break<br />

}<br />

if (j == NUMTAGS) {<br />

sentence6[i][0] = 1.0 /* TUB */<br />

sentence6[i][1] = 0.0<br />

sentence6[i][2] = 0.0<br />

sentence6[i][3] = 0.0<br />

sentence6[i][4] = 0.0<br />

sentence6[i][5] = 0.0<br />

state[i] = 0<br />

}<br />

f<strong>in</strong>ished = 0<br />

while (!f<strong>in</strong>ished) {<br />

value = 1.0<br />

val1 = 1.0<br />

val2 = 1.0<br />

for (j = 0 j < w j++) /* <strong>in</strong>itial state probs */<br />

val1 *= (double)sentence6[j][state[j]]<br />

for (j = 1 j < w j++) /* all state trans probs */<br />

val2 *= (double)bigram5[state[j-1]][state[j]]<br />

/* non stress state trans probs */<br />

for (p1 = 0, j = 0 j < w j++)<br />

if (state[j] != 5 && state[j] !=<br />

0)<br />

p1++ /* count no. <strong>of</strong> stresses */<br />

if (p1 >= 2) {<br />

/* mult state-to-state trans probs ignor<strong>in</strong>g unstr */<br />

for (p1 = 0 state[p1] != 5 && state[p1]<br />

!= 0 p1++)<br />

/* f<strong>in</strong>d first stress*/<br />

for (p2 = (p1 + 1) state[p2] !=<br />

5 && state[p2] != 0 && p2 < w p2++) {<br />

if (p2 < w)<br />

val2 *= (double)bigram[state[p1]][state[p2]]<br />

p1 = p2<br />

}<br />

}<br />

/* <strong>in</strong>itial state probs * transition state probs */<br />

value = val1 * val2<br />

if (value > bigvalue) {<br />

/*pr<strong>in</strong>tf("BEST SO FAR:")*/<br />

bigvalue = value<br />

for (k = 0 k < w k++) {<br />

bigstate[k] = state[k] /* save new best state */<br />

168


}<br />

/*switch(state[k]) {<br />

case 0:<br />

pr<strong>in</strong>tf("|")<br />

break<br />

case 1:<br />

pr<strong>in</strong>tf("R")<br />

break<br />

case 2:<br />

pr<strong>in</strong>tf("F")<br />

break<br />

case 3:<br />

pr<strong>in</strong>tf("V")<br />

break<br />

case 4:<br />

pr<strong>in</strong>tf("S")<br />

break<br />

case 5:<br />

pr<strong>in</strong>tf("U")<br />

}*/<br />

}<br />

/*pr<strong>in</strong>tf(" %e\n",value)*/<br />

}<br />

pos = w - 1 /* update state[] to next state */<br />

do {<br />

done = 1 /* by default have done unless changed below */<br />

if (state[pos] == 0 || state[pos]<br />

== 5) {/* don't change tub or unstressed state */<br />

pos--<br />

done = 0<br />

} else if (pos >= 0) {/* don't go past end */<br />

if (++state[pos] == 5) {/* if state <strong>in</strong>cs to 5=unstressed */<br />

state[pos] =1 /* reset to 1 */<br />

pos-- /* <strong>and</strong> po<strong>in</strong>t to next word */<br />

done = 0 /* <strong>and</strong> say we haven't done */<br />

}<br />

}<br />

if (pos < 0)<br />

f<strong>in</strong>ished = 1 /* ie have done all comb<strong>in</strong>ations */<br />

} while (pos >= 0 && !done)<br />

}<br />

        for (i = 0; i < w; i++)
            printf("%s=%s ", wordtag[i], prosody[i]);
        printf("\nPredicted: ");
        for (i = 0; i < w; i++) {
            switch (bigstate[i]) {
            case 0:
                printf("|");
                break;
            case 1:
                printf("R");
                break;
            case 2:
                printf("F");
                break;
            case 3:
                printf("V");
                break;
            case 4:
                printf("S");
                break;
            case 5:
                printf("U");
            }
        }
        printf("\nShould Be: ");
        for (i = 0; i < w; i++) {
            if (bigstate[i] == 0)
                s = 0;  /* tu boundary */
            else
                for (s = 5, j = 0; j < strlen(prosody[i]); j++) {
                    if ((prosody[i][j] == ',' || prosody[i][j] == '/') && s >= 4)
                        s = 1;
                    if ((prosody[i][j] == '\\' || prosody[i][j] == '`') && s >= 4)
                        s = 2;
                    if ((prosody[i][j] == ',' || prosody[i][j] == '/') && s == 2)
                        s = 3;
                    if ((prosody[i][j] == '*' || prosody[i][j] == '_'
                         || prosody[i][j] == '~') && s == 5)
                        s = 4;
                    /* one further stress-marking character was tested here,
                       but it has been lost from this copy of the listing:
                    if ((prosody[i][j] == '') && s == 5)
                        s = 4;
                    */
                }
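            /* Decoding above: s = 1 for a rise (',' or '/'), s = 2 for a
               fall ('\\' or '`'), s = 3 when a rise follows a fall
               (fall-rise), s = 4 for a stress mark ('*', '_' or '~'),
               and s = 5 (unstressed) otherwise. */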

            switch (s) {
            case 0:
                printf("|");
                break;
            case 1:
                printf("R");
                break;
            case 2:
                printf("F");
                break;
            case 3:
                printf("V");
                break;
            case 4:
                printf("S");
                break;
            case 5:
                printf("U");
            }
        }
        printf("\n");
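The loop labelled "update state[] to next state" above enumerates every combination of the four stress classes over the stressed words, odometer-style, while boundary and unstressed words keep their states fixed. The following self-contained sketch (not part of the original listing; the word count W and the example state vector are hypothetical) isolates that enumeration so it can be run and inspected on its own:

    #include <stdio.h>

    #define W 4

    int main(void)
    {
        /* 0 = tone-unit boundary and 5 = unstressed are held fixed;
           1..4 are the four stress classes R, F, V, S, which are cycled. */
        int state[W] = {0, 1, 5, 1};    /* hypothetical example sentence */
        int pos, done, finished = 0;

        while (!finished) {
            for (pos = 0; pos < W; pos++)       /* visit one combination */
                printf("%d", state[pos]);
            printf("\n");

            pos = W - 1;                        /* advance, odometer-style */
            do {
                done = 1;
                if (state[pos] == 0 || state[pos] == 5) {
                    pos--;                      /* fixed state: carry left */
                    done = 0;
                } else if (++state[pos] == 5) { /* wrapped past S */
                    state[pos] = 1;             /* reset to R and carry */
                    pos--;
                    done = 0;
                }
                if (pos < 0)
                    finished = 1;               /* all combinations visited */
            } while (pos >= 0 && !done);
        }
        return 0;
    }

This prints the sixteen combinations for the two stressed words in the example. In the program above, each combination visited by the same mechanism is scored as value = val1 * val2, and the best-scoring sequence is retained in bigstate[].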


