
Prosody and Syntax in
Corpus Based Analysis of
Spoken English

by

Simon Christopher Arnfield

The University of Leeds
School of Computer Studies

December 14, 1994

Submitted in accordance with the requirements
for the degree of Doctor of Philosophy.

The candidate confirms that the work submitted is his own and that
appropriate credit has been given where reference has been made to
the work of others.


Abstract

This thesis attempts to show that it can be productive to analyse English prosody in terms of
syntax. Although differing prosodies are possible for a fixed syntax, it is demonstrated that an
utterance's syntax can be used to generate an underlying "baseline" prosody regardless of the
actual words, semantics or context. In order to analyse this, a British English spoken corpus is
needed which has both syntactic and prosodic information. Such a corpus (the Spoken English
Corpus (SEC), now known as the Machine Readable Spoken English Corpus (MARSEC)) is used
to calculate a number of statistical measures relating the prosodic (specifically the tonic stress
mark annotations) and the syntactic (specifically the part of speech tags) information.

This thesis explores the mapping between these two kinds of information. Models are devised
which implement the mappings and select, from the search space of possible annotations,
those with the highest scores. The mapping is applied in the models for prediction of stress
and prosodic annotations in new (part of speech tagged) text.

The models are used to demonstrate that there is a clear relationship between parts of speech
and the prosodic annotations in the Spoken English Corpus. The models may be exploited to
generate stress and prosodic annotations for text-to-speech applications in order to increase the
intelligibility and naturalness of the synthesized speech.


Contents

1 Introduction
1.1 Introduction
1.2 Motivation
1.3 Applications
1.3.1 Speech Synthesis
1.3.2 Speech Recognition and Understanding
1.4 Overview

2 Background
2.1 Introduction
2.2 Prosody
2.2.1 Definition of prosody
2.2.2 Auditory versus Acoustic Prosody
2.2.3 Functions of prosody
2.2.4 Representation of prosody
2.3 Syntax
2.3.1 Definition of Syntax
2.4 Approaches To Natural Language Processing
2.5 Spoken English Corpora and Prosodic Annotation
2.5.1 London-Lund Corpus
2.5.2 Polytechnic of Wales Corpus
2.5.3 SEC/MARSEC
2.6 Computational Use of Prosody
2.6.1 Speech Recognition and Understanding
2.6.2 Speech Synthesis

3 Relating Prosody and Word Class
3.1 Introduction
3.2 A Source of Data
3.3 A Need For Processing
3.4 Cross Referencing
3.5 By-Products
3.6 Summary

4 Preliminary Statistical Analysis
4.1 Introduction
4.2 Prosodic Annotation Statistics
4.2.1 Prosodic mark frequencies
4.2.2 Tone Unit lengths
4.2.3 Prosodic mark bigram frequencies
4.3 Cross-Reference Statistics
4.3.1 Co-occurrence tables
4.3.2 Ignoring Higher-Level Syntactic structures
4.3.3 Clustering word classes
4.4 Summary

5 Automatic Stress Annotation
5.1 Introduction
5.2 Stress Prediction
5.2.1 Search Mechanism
5.2.2 Scoring
5.2.3 Performance Measures
5.2.4 Context
5.2.5 Boundary Conditions
5.3 Performance
5.4 Improvements
5.5 Summary

6 Automatic Prosodic Annotation
6.1 Introduction
6.2 Expanding the Model
6.3 Model Design
6.3.1 Choice of Prosodic Marks
6.3.2 Estimation of Probabilities
6.3.3 The Model
6.3.4 Composite Model
6.4 Model Assessment
6.4.1 Performance Statistics
6.5 Summary

7 Conclusions and Future Work
7.1 Introduction
7.2 Review
7.3 Performance Measures
7.3.1 Tone Unit lengths in the Model
7.3.2 Analysis of Models
7.3.3 Word Class Models
7.3.4 Prosodic Mark Models
7.4 Future Work
7.4.1 Conversion to ToBI
7.4.2 Additional Constraints
7.4.3 Speech Synthesis
7.4.4 Parameter Improvement
7.5 General Conclusions

A SEC and MARSEC
A.1 Introduction
A.2 The Spoken English Corpus
A.2.1 History
A.2.2 Categories
A.3 MARSEC

B Syntactic Tagging of SEC
B.1 Introduction
B.2 Word Class Tags
B.3 Phrase/Clause Tags

C Testing Data
C.1 Corpus Texts: Category M
C.1.1 Section M02
C.1.2 Section M03
C.1.3 Section M04
C.1.4 Section M05
C.1.5 Section M07
C.1.6 Section M08
C.1.7 Section M09
C.2 Prediction Results
C.2.1 Extract from section M05

D Word-Class / TSM Co-occurrence figures
D.1 Tonic Stress Mark Frequencies
D.2 Word Class Frequencies
D.3 Tag/Tone Co-occurrences

E Punctuation and Boundaries

F Source Code
F.1 symbolify.c
F.2 ttalign.c
F.3 collate-tu.c
F.4 align-parse.c
F.5 splittule.c
F.6 transition.c
F.7 transgroups.c
F.8 segment.c
F.9 probability.c
F.10 probability3.c
F.11 probabilityc.c

Bibliography

List of Tables

2.1 Prosodic marks used in the SEC/MARSEC
4.1 Categories of the corpus used for analysis
4.2 Prosodic mark bigram frequencies
4.3 Co-occurrence table for 64 most frequent word classes
5.1 Stress Transition Table
5.2 Probability of a tone unit boundary following a stressed or unstressed word
5.3 Probability of a stressed or unstressed word following a Tone Unit boundary
5.4 Performance statistics for stress prediction model: percentage of words which are correctly stressed/unstressed in comparison to the two expert annotations and overall
5.5 Performance statistics for stress prediction model: percentage of completely correct tone units in comparison to the two expert annotations (BJW: Briony Williams, and GOK: Gerry Knowles) and overall (ALL)
5.6 Word classes in the groups
5.7 Performance statistics for stress prediction model using group transition probabilities
6.1 Performance scores for the training categories
6.2 Scoring relationship between predicted and annotated prosodic marks
6.3 Performance scores for the test category of the corpus
7.1 Word class tags with frequencies of 50 or greater showing percentage of correct predictions (when compared with the corpus annotations) for the stress prediction model (SPM) and the prosodic mark prediction model (PPM)
7.2 Prosodic marks showing prediction percentages for the composite prosody prediction model
A.1 Categories in the SEC/MARSEC
A.2 Sections in the SEC/MARSEC
B.1 Phrase and Clause labels
E.1 Punctuation/Tone Unit Boundary Co-occurrence Table

List of Figures

2.1 Waveform for the vowel e with one cycle or pitch period marked
3.1 Example of Prosodic annotation format
3.2 Example of Treebank format
3.3 Example output from cross-referencing from section B02
4.1 Frequency of prosodic marks
4.2 Relative frequencies of tone-unit lengths in terms of numbers of: words with tonic stress marks, words with prosodic marks, and words
4.3 Hierarchical clustering of 64 most frequent word classes
7.1 Relative frequencies of tone-unit lengths produced by the model in terms of numbers of: words with tonic stress marks, words with prosodic marks, and words
A.1 Diagram showing waveform, fundamental frequency, RMS energy, segmental, prosodic and treebank transcriptions


Acknowledgements

I would like to thank my supervisors Eric Atwell and Peter Roach for their support, advice, wisdom
and ideas. Gratitude must also go to my colleagues in the Speech Laboratory and in the School of
Computer Studies who helped me with a great many things.

I would also like to thank my family and friends, especially my wife Debra and my friend Dean
Brown, who have put up with my rantings over the last few years and gave me their confidence
and enthusiasm to continue when things looked bleak.

This research was funded by a Science and Engineering Research Council (SERC) research
studentship.


Chapter 1

Introduction

1.1 Introduction

This thesis investigates the relationship between parts of speech (or word class) and prosodic annotations,
specifically in the Spoken English Corpus. A probabilistic approach is taken to describe
the mapping from word class annotations to prosodic annotations. This mapping quantifies the
relationship.

It is shown that a strong relationship exists between word class tags and stress (see chapter 5),
but to a lesser extent there is also a relationship with prosodic annotations (see chapter 6). Models
developed to demonstrate the relationship achieve over 91% agreement with the original corpus
annotations for stress prediction and over 65% agreement for prosody (stress accent) prediction.

The principal aim in studying the relationship is to assess whether either can be of any use in
determining the other in speech synthesis, speech recognition and speech understanding applications.


1.2 Motivation

Lea [Lea80] (pp. 172-174) gives results of an experiment where five listeners were presented with
255 carefully designed spoken sentences. The listeners were asked to mark each syllable as either
stressed, unstressed or reduced. Lea collated their results and presented them as relative stress
level ordered according to syntactic category (see his figure 8-2, p. 174, in [Lea80]).

His experiment showed that articles, conjunctions, and prepositions were on average judged
reduced. Possessive determiners, relatives, copulatives, auxiliary verbs and pronouns were on
average unstressed, and main verbs, adjectives, sentence adverbs, nouns, quantifiers and command
verbs were on average stressed.

This evidence shows that there is a relationship between stress and word class. Within this
thesis similar data is analysed, but on a much larger scale, since there are over 1400 sentences and
the stress distinctions extend to cover a wide variety of stress accents. Such a task would not be
possible without the use of a machine readable corpus such as the Spoken English Corpus, the like
of which has not been available until recent years.

The results presented in chapters 4, 5, 6 and 7 confirm and expand upon Lea's results.

1.3 Applications

Information relating parts of speech to prosody has useful applications for speech recognition and
understanding and for speech synthesis.

1.3.1 Speech Synthesis

In speech synthesis we have a one-to-many relationship between the string of word class tags
for the words in an utterance and all the possible prosodic patterns that can be used with that
utterance.

It is apparent that context will affect the choice of prosodic patterns beyond the scope derivable
from word class information. Consider the utterance "Peter isn't here". The tonic (or main) stress
accent may be placed on any word to effect different emphasis and attitudinal circumstances. Viz:

1. PETER isn't here.

2. Peter ISN'T here.

3. Peter isn't HERE.

Here the capitals indicate the word taking the stress accent. The first might be uttered
if everyone except Peter were present and if he were expected to be. The second might be uttered
to correct a person's misconception about the presence of Peter. The final utterance might be
made if we expected Peter to be somewhere when he is not.

It will be realised that the possible variations in prosody have multiple purposes, covering
aspects of attitude, emphasis, given/new information and grammatical structure.

In this thesis we are only concerned with the information contained within the parts of speech
and how this relates to prosody, and hence only the last of the above aspects will have a major
bearing on this work. However, there is a certain amount of other information implicit within the
structure of the parts of speech.

Without the support of contextual information to distinguish between the above choices, the
best that can be aimed for in speech synthesis is a discourse neutral pattern, or a "baseline" pattern,
which would be the standard or default pronunciation in the absence of any major contextual
effects.

So, the main application of relating prosody and word class tags for text-to-speech
synthesis is in producing neutral prosodic patterns which, of course, may be modified by higher
level context or semantic processes. It is beyond the scope of this thesis to cover the realisation
of these prosodic patterns acoustically.

1.3.2 Speech Recognition and Understanding

In speech recognition the uses of prosody are more limited, because identifying prosodic information
in an utterance will not help significantly with the low-level identification of words or their word
classes. However, disambiguation between similar sounding words based upon pitch accents or
stress is a realistic possibility. For example, "which" and "witch" sound the same but will most
likely have different prosody. The word "which" (word class tag DDQ, a `wh-' determiner without
`-ever') is most likely to be unstressed, though it may be stressed approximately 1/3 of the time.
The word "witch" (word class tag NN1, a singular common noun) is highly likely to be stressed,
with a variety of stress accents. Thus by determining the presence of stress we can postulate that
"witch" is more likely, and by combining this with evidence from surrounding words for probable
syntactic structures we can remove the ambiguity. The data presented in chapter 4 and appendix D
is most useful for this task.
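As a minimal sketch of how such evidence might be combined, the following C fragment applies
Bayes' rule to the figures above. Only the 1/3 stress rate for DDQ comes from the text; the stress
rate assumed for NN1 and the equal priors are illustrative assumptions, not corpus figures, and a
real recogniser would also weigh language-model evidence from the surrounding words.

    #include <stdio.h>

    /* Sketch: disambiguating the homophones "which" (DDQ) and "witch" (NN1)
     * given that the word was heard as stressed. P(stressed | DDQ) = 1/3 is
     * taken from the text; the other figures are illustrative assumptions. */
    int main(void)
    {
        double p_stress_given_which = 1.0 / 3.0;  /* from the text */
        double p_stress_given_witch = 0.9;        /* assumed */
        double p_which = 0.5, p_witch = 0.5;      /* equal priors, for illustration */

        /* Bayes' rule: P(word | stressed) is proportional to
         * P(stressed | word) * P(word). */
        double num_which = p_stress_given_which * p_which;
        double num_witch = p_stress_given_witch * p_witch;

        printf("P(which | stressed) = %.2f\n", num_which / (num_which + num_witch));
        printf("P(witch | stressed) = %.2f\n", num_witch / (num_which + num_witch));
        return 0;
    }

With these figures the posterior for "witch" rises to about 0.73, matching the intuition above;
unequal priors drawn from context would shift the balance accordingly.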

The most useful application of prosody is likely to be in speech understanding. The identification
of stress or stress accents within an utterance may be useful in predicting deeper semantic
features such as given/new information, mood, use of irony or sarcasm, etc.

1.4 Overview

This thesis is presented in five main chunks, corresponding to chapters 3, 4, 5, 6 and 7. First, however,
a review of background information is given in chapter 2.

Chapter 3 deals with the problem of processing the data in the Spoken English Corpus in such
a way that statistical information may be gathered automatically. It describes an algorithm and
software that was devised to cross-reference the prosodic annotations with the word class
tag information.

Chapter 4 uses the results of this cross-reference to extract various statistical measures of
prosody and word class information, specifically the co-occurrence frequencies of each word class
with each prosodic mark. That is, the number of times that words with a given word class tag are
annotated in the corpus with each of the prosodic marks.
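As an illustration of the kind of counting involved, the following C sketch tallies co-occurrences
from a handful of hypothetical (wordtag, prosodic mark) pairs. The three-item tag and mark
inventories and the sample data are toy stand-ins, not the CLAWS tagset or the SEC annotation
scheme.

    #include <stdio.h>
    #include <string.h>

    /* Sketch: counting word class / prosodic mark co-occurrences.
     * The tiny inventories and the sample data are illustrative only. */
    static const char *tags[]  = { "NN1", "VB0", "DDQ" };
    static const char *marks[] = { "fall", "rise", "none" };

    int main(void)
    {
        /* One hypothetical (tag, mark) pair per corpus word. */
        static const char *pairs[][2] = {
            { "NN1", "fall" }, { "DDQ", "none" }, { "VB0", "none" },
            { "NN1", "fall" }, { "NN1", "rise" }, { "DDQ", "fall" },
        };
        int counts[3][3] = { { 0 } };

        for (size_t p = 0; p < sizeof pairs / sizeof pairs[0]; p++)
            for (int t = 0; t < 3; t++)
                for (int m = 0; m < 3; m++)
                    if (!strcmp(pairs[p][0], tags[t]) &&
                        !strcmp(pairs[p][1], marks[m]))
                        counts[t][m]++;

        for (int t = 0; t < 3; t++) {
            printf("%-4s", tags[t]);
            for (int m = 0; m < 3; m++)
                printf(" %s=%d", marks[m], counts[t][m]);
            printf("\n");
        }
        return 0;
    }

Each row of the printed table is the co-occurrence profile of one word class, which is exactly the
form of data the models of chapters 5 and 6 are built from.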

Chapter 5 describes and develops a model, using the co-occurrence frequencies, that can predict
with high accuracy which words (corresponding to their word class tags) should be stressed (as
opposed to left unstressed) in the speaking of an utterance. The predictions are made only on the
basis of the word class tags; the actual words are not relevant to the operation of the model.

Chapter 6 builds upon the success of the model, refining it to include stress accents in its
predictions instead of just making the stressed/unstressed distinction.

Chapter 7 presents a review and analysis of the models which demonstrates their potential
and limitations. It demonstrates how well each word class tag is modelled and how well each
prosodic mark is modelled. It shows that although stress prediction works well, the prosodic mark
prediction model is somewhat constrained in that it does not model the difference between stress
accents well: it has a predisposition to make use of the fall stress accent. Finally, future work
possibilities are identified.


Chapter 2

Background

2.1 Introduction

This chapter provides background information on prosody and syntax, Spoken English corpora
and the computational uses of prosody.

2.2 Prosody

The term prosody comes from Ancient Greek, and the study of it dates from that time if not
earlier. Robins [Rob67] says:

    Apollonius's son, Herodian, is best known for his work on Greek accentuation
    ... covering the field of the prosodiai ... The prosodiai were described in more detail
    by later scholiasts and came to include the distinctive pitch levels symbolized by the
    accent marks on written words ... It is interesting to see the Greek word prosodia
    covering very much the range of phonetic phenomena to which the term prosody has
    been applied ...

But Crystal [Cry69] (pp. 20-21) points out that no major work was done on prosody until the 16th
century:


    It is generally agreed that the earliest discussion of melody in spoken English is
    that of John Hart in his Orthographie, and The opening of the unreasonable writing of
    our Inglish toung. The former, published in 1569, has a long section on intonation (see
    Danielsson, 1955, pp. 199-201) ... In The opening of the unreasonable writing of our
    Inglish toung (1551, §§164-5; see Danielsson, 1955, pp. 147 ff.), there is an attempt to
    outline the nature of stress in English ...

Following Hart's work, Crystal (p. 22) says, was:

    Butler (1633), who provides the first connected discussion of the two main English
    tunes ... [and Flint (1740)] which involved reference to stress, there was no specific
    study until Steele (1775) and Walker (1787).

A review of the history of English prosody is given in Crystal [Cry69], sections 2.5 to 2.7.
However, there is not much work of relevance to this thesis until after the late fifties, when systems
of tonetic stress marks and the tone unit were given substance by O'Connor and Arnold [OA61].
The particular emphasis of this thesis is on machine readable corpora, which have not existed until
very recent years.

2.2.1 Definition of prosody

The terms used in the study of prosody are often ambiguous and confusing (for example, British
researchers tend to use the term prosodic whereas American researchers tend to use the term
suprasegmental). In fact some researchers tend to use prosody as a term synonymous with
intonation, or with intonation and intensity. The purpose of this section is to provide definitions of
the main terms, particularly those relevant to this work. In all cases British English is the only
language under consideration unless stated otherwise.

Prosody

Prosody is the term used to describe those features of speech that are considered to be nonsegmental,
usually agreed to include intonation, stress, loudness, rhythm, tempo and voice quality.
These contrast with segmental aspects of speech such as words, syllables or phonemes. Prosody
is often used as an alternative term to intonation. Hence prosody is the superordinate term used
to cover all of the above aspects, which will be described individually in the sections below.

In the work presented here prosody is used primarily to indicate stress and intonation, and to
a lesser extent loudness. No consideration is given to aspects of rhythm, tempo and voice quality.

Suprasegmental/nonsegmental

According to Roach [Roa92] (p. 105), this is "A term invented to refer to aspects of sound such as
intonation that did not seem to be properties of individual segments ... much British work has preferred
to use the term prosodic instead." Crystal [CQ64] (p. 341) also lists plurisegmental and superfix
as alternative terms.

Intonation

Intonation is the pattern of changing pitch of voice (or melody) over an utterance, used to convey
information. However, as Roach (p. 56) points out, it is often used in a "broader and more popular
sense, [as] equivalent to prosody, where variations in such things as voice quality, tempo and
loudness are included."

Intonation is used by speakers to convey emotions and attitudes, and (p. 57) "interesting relationships
exist in English between intonation and grammar", suggesting that intonation also plays
a role in guiding the listener through the structure of the utterance.

Descriptive frameworks have been developed to describe the changing pitch movements. The
tone unit is considered to be the basic unit of prosody and intonation, and is the approach most
widely used in Britain. See section 2.2.4.

Pitch

A sound with a periodicity is said to have a pitch. Speech is considered to be periodic, although
strictly speaking there are minor variations between cycles. Pitch is sometimes considered equivalent
to fundamental frequency (F0), but this is not the case: the fundamental frequency is the
acoustic counterpart to pitch. Pitch is a complex auditory perception. It is possible to perceive a
change in pitch when the fundamental frequency is fixed but signal intensity is slightly varied; an
increase in intensity produces a drop in pitch, as noted by Lehiste (p. 67) [Leh70].

Figure 2.1: Waveform for the vowel e with one cycle or pitch period marked.

Under most conditions pitch remains closely related to the fundamental frequency. Figure 2.1
shows the waveform for a vowel with one cycle or pitch period indicated. The more cycles there
are per second, the higher the perceived pitch. The relationship is not linear, however, and the mel
scale relates pitch to frequency (see Lehiste p. 65). Significant changes or contrasts in pitch give
rise to pitch accents.
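For concreteness, a commonly cited formulation of the mel scale (O'Shaughnessy's; the earlier
experimental scale Lehiste discusses differs in detail) is

    m = 2595 log10(1 + f / 700)

where f is the frequency in hertz and m the pitch in mels. A 1000 Hz tone comes out at roughly
1000 mels, but 8000 Hz maps to only about 2840 mels, reflecting the compression of perceived
pitch at higher frequencies.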

Accent

Accent refers to a prominence, sometimes called a pitch accent. This is distinct from stress in that
stress is more generally used to refer to other types of prominence, including loudness, length and
sound quality.

Accent may also refer to a particular way of pronouncing. For example, two English people
may say the same sentence but one may speak with Received Pronunciation whilst the other may
speak with a broad Yorkshire accent.

Stress

Stress is the term given to any form of prominence of syllables. Sentence stress is the most
prominent word in a sentence, and word stress is the stress pattern on the syllables within a word.
The position of stress within a word can determine its meaning, for example REfuse and reFUSE.

Stress is a complex topic and not completely understood. Components of stress include pitch
prominence, increased articulatory effort, loudness, and syllable lengthening.

The number of degrees (or levels) of stress is a subject of disagreement, yet it seems that no
more than three levels (unstressed, weakly stressed, and strongly stressed) are necessary.
See Fudge [Fud84] for more information.

Loudness

Roach (p. 68) has the following to say on loudness:

    [Loudness is the] auditory impression of the amount of energy in sounds. We all use
    greater loudness to overcome difficult communication conditions ... and to give strong
    emphasis to what we are saying, and it is clear that individuals differ from each other
    in the natural loudness level of their normal speaking voice. Loudness plays a relatively
    small role in the stressing of syllables, but it seems that in general we do not make
    very much use of loudness contrasts in speaking.

Crystal (p. 215) adds that it corresponds "to some degree with the acoustic feature of intensity
(measured in decibels (dB))" and he goes on to note that "other factors than intensity may affect
our sensation of loudness, e.g. increasing the frequency of vocal cord vibrations ..."

Rhythm

The timing and distribution of events in speech. Roach (pp. 93-94) notes:

    An extreme view (though quite a common one) is that English speech has a rhythm
    that allows us to divide it up into more or less equal intervals of time called `feet',
    each of which begins with a stressed syllable: this is called the stress-timed rhythm
    hypothesis.

Crystal (p. 307) defines rhythm as "the perceived regularity of prominent units in speech".

Speech Rate/Tempo

Simply the speed of speaking or rate of articulation, measured in, for example, syllables per minute.
Speech rate can be modified (from a speaker's normal speech rate, within certain ranges) for
semantic or emotional/attitudinal effects.

Pitch Range/Tessitura

The extent in pitch (between lowest and highest pitch) which a speaker usually uses in normal
speech. This may be extended or shifted for semantic or emotional/attitudinal effects.

Key

Crystal (p. 200) defines key in the following way:

    A term used by some sociolinguists as part of a classification of variations in spoken
    interaction: it refers to the tone, manner, or spirit in which a speech-act is carried out,
    e.g. the contrast between mock and serious styles of activity ...

However, Roach (p. 61) says that key "has generally been used simply to indicate a rough location
within the pitch range" and that the terms high key and low key have been used to describe the
fact that sometimes a speaker will make more use of the higher or lower part of their pitch range,
usually as a result of the emotional content of what they may be saying. See Brazil et al. [BCJ80],
chapter 2, for a more comprehensive description.

Voice Quality

Distinctive characteristics of a person's speech, such as breathiness or creakiness, are aspects of voice
quality. Speakers do, however, introduce variations in voice quality for particular purposes, for
example speaking in a soft voice to indicate sympathy or a harsh voice to show anger. Voice
quality is beyond the scope of this thesis, but see Laver [Lav80, Lav72] for a more authoritative
description of voice quality.

Juncture

Crystal (p. 197) classifies juncture as "phonetic boundary features which demarcate grammatical
units ..." and Roach (p. 60) describes it as the "way one sound is attached to its neighbours"
and "where one found in continuous speech phonetic effects that would usually be found preceding
or following a pause, the phonological element of juncture would be postulated". Crystal
demonstrates with an example:

    Word-division, for example, can be signalled by a complex of pitch, stress, length and
    other features, as in the potential contrast between that stuff and that's tough.

Roach lists some other examples: cart rack/car track, pea stalks/peace talks, great ape/grey tape.

2.2.2 Auditory versus Acoustic Prosody

Research into prosody usually falls into two camps: auditory (or impressionistic/subjective) and
acoustic. Acoustic research concentrates upon physical phenomena that can be measured
and would, for intonation for example, concentrate upon acoustic correlates
such as fundamental frequency, whereas an auditory approach would be more concerned with
perceptual phenomena such as pitch rises and falls.

For computer based work it is difficult to follow an auditory approach because of the problems
inherent in modelling complex perceptual effects. For example, it is relatively easy to measure F0,
but it is not possible to measure pitch, since it is a perception. See [Leh70, Cry69, Lav80, BCJ80,
Fud84] for relevant work.

2.2.3 Functions of prosody

The functions of prosody are diverse, and opinions are divided between emotion and attitude signalling
versus grammatical and lexical information. See [Cru86, Cry69], Laver [Lav94] (pp. 494-498)
and Roach [Roa91] (p. 163). It seems reasonable that prosody performs both functions; tone unit
boundaries, for instance, correspond with some major syntactic constituents. See section 2.6. Lea [Lea80] has
shown that there is a relationship between word class and stress. Prosody also directs attention
or focus, and can signal new or old information as well as contrasting, correcting and echoing
information. For example, see section 1.3.1.


2.2.4 Representation of prosody

An old transcription system, often referred to as "interlinear tonetic", represents intonation
as a sequence of dots or dashes (with curves representing stress accents) between two horizontal
lines (see [Roa91, Cru86] for examples). This is not a convenient representation scheme for
machine readable processing of intonation, though.

Cruttenden [Cru86] comments:

    there have been two alternative approaches to the analysis of English: the older British
    `whole tune' approach which describes the overall tunes associated with sentences ...
    and which does not therefore have a concept of nucleus and nuclear tone ... [and the]
    American approach involving pitch levels and terminal junctures.

The standard British approach has been the tone unit, with tonic stress movement markers
placed upon the syllables upon which a pitch accent starts. The structure of a tone unit is defined
as

    (pre-head) (head) tonic stress (tail)

The tonic stress is the only element that is required, the others being optional. Laver [Lav94] (p. 492)
says:

    Any legitimate utterance of English is made up of one or more intonational phrases,
    and each intonational phrase contains one intonational nucleus at which one of the
    possible nuclear tones is chosen.

If any words precede the tonic stress they comprise the pre-head; any words which are stressed
(but without accent) form the head. Words following the tonic stress are the tail. Typical tonic
stress accents include rises, falls, fall-rises, and levels.
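As an invented illustration (not an example from the corpus), take the tone unit "we could see
the LIGHThouse", with the tonic stress accent on "LIGHT-". The unaccented opening words "we
could" form the pre-head; "see the", beginning with the stressed but unaccented "see", forms the
head; "LIGHT-" carries the tonic stress; and the remaining "-house" is the tail.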

In British writing, another transcription system indicates the tune in a tone unit by a number
at the start of each tone unit, with diacritics to indicate variations.

One of the problems with the above system, comments Cruttenden [Cru86] (pp. 63-64), is the
confounding effect of the various distinctions of pitch range. Recent work, he says, has explored
the use of only two level tones (High and Low). He goes on to describe autosegmental intonation
(see his section 3.8.1).

Cruttenden reports that the model presented by Pierrehumbert (1980) uses "an LP-type metrical
representation of the text ... and secondly a tune represented by a sequence of high (H) and
low (L) tones ... the model constructs an underlying representation for tunes of English intonation
and a set of rules which transmutes such tunes into actual patterns of fundamental frequency".
Refer to section 3.9 of Cruttenden [Cru86] for a summary.

Another transcription, commonly used until recently in American writing, used one of a number
of pitch levels at crucial change points in a contour.

Crystal [Cry69] defines the most comprehensive prosodic and paralinguistic transcription, which
has been used in the London-Lund Corpus of Spoken English (see below).

2.3 Syntax

Although the emphasis of this thesis is upon the automatic generation of prosodic annotations
from their relations to word class, it is necessary to include a few definitions concerning syntax.

2.3.1 Definition of Syntax

Syntax is the grammatical arrangement of words, showing their connection and relation, or a set of
rules defining how words may be combined.

Part of Speech

The part of speech of a word is its grammatical identity, such as noun, verb, adjective, conjunction,
etc. Most dictionaries will list the parts of speech for words, although in corpus based natural
language processing there are usually many subdivisions of the basic classes given above. For
example, in the system used throughout this thesis there are 29 divisions within the noun class.

Some words may have more than one part of speech, which gives them different meanings in
different circumstances. For example, "record" as a verb is "to make a copy of something", whilst
as a noun it is "a disc of plastic from which music may be played".

Wordtag

A wordtag is a symbol used to annotate a word in a sentence (usually contained within a corpus)
to indicate the part of speech of that word. For example, NP1 is the wordtag for a singular proper
noun.

The symbols for wordtags vary amongst authors, but in all cases within this thesis the word
tags used are from the CLAWS4 (see section 2.5.3) part of speech tagging system. See appendix B
for a list of the tags used.

Note that some of the wordtags are very highly constrained; for example, the wordtag VBZ is
only used for is or 's.

Parsetree

A parse tree is a tree-like structure that shows the inter-relationship between parts of speech
and shows phrase and clause structures within a sentence. In this example wordtags immediately
follow the word they belong to, with a separating underscore character. Thus right_JJ is the word
right with wordtag JJ, which means adjective in this context. For example (SEC section A09,
sentence 9):

    [N They_PPHS2 N][V are_VBR [J right_JJ J]
    [Ti to_TO be_VB0 [J sceptical_JJ J]Ti]V]

The square brackets indicate phrase and clause structures and come in matching pairs, so the [J
bracket before the word right matches the J] bracket just after it. In this case They is a noun
phrase, right and sceptical are adjective phrases, to be sceptical is a clause with infinitive
head, and are right to be sceptical is a verb phrase. It can be seen that with the nested
matching brackets the above sequence could be drawn as a tree. For a full list of phrase and clause
labels refer to appendix B.
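A minimal sketch of how this notation might be scanned mechanically follows (bracket tokens are
space-separated here for simplicity; this illustrates the format only, not the cross-referencing
software described in chapter 3):

    #include <stdio.h>
    #include <string.h>

    /* Sketch: scanning the bracketed treebank notation shown above. */
    int main(void)
    {
        char line[] = "[N They_PPHS2 N] [V are_VBR [J right_JJ J] "
                      "[Ti to_TO be_VB0 [J sceptical_JJ J] Ti] V]";
        int depth = 0;

        for (char *tok = strtok(line, " "); tok; tok = strtok(NULL, " ")) {
            size_t n = strlen(tok);
            char *us = strchr(tok, '_');

            if (tok[0] == '[') {                /* opening bracket, e.g. [N */
                printf("%*sopen  %s\n", depth * 2, "", tok + 1);
                depth++;
            } else if (tok[n - 1] == ']') {     /* closing bracket, e.g. N] */
                tok[n - 1] = '\0';
                depth--;
                printf("%*sclose %s\n", depth * 2, "", tok);
            } else if (us) {                    /* word_TAG token */
                *us = '\0';
                printf("%*sword  %s  tag %s\n", depth * 2, "", tok, us + 1);
            }
        }
        return 0;
    }

The indentation of the output mirrors the bracket nesting, which is exactly the sense in which the
sequence "could be drawn as a tree".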

Treebank

A treebank is simply a collection of parse trees for each of the sentences in a text.

2.4 Approaches To Natural Language Process<strong>in</strong>g<br />

In this section I will briey outl<strong>in</strong>e some <strong>of</strong> the wider rang<strong>in</strong>g approaches taken to natural language<br />

process<strong>in</strong>g (NLP) which dier from the probabilistic corpus{based approach taken <strong>in</strong> this thesis.<br />

Roger Schank argued that whatever <strong>in</strong>formation is encoded <strong>in</strong> the organisation <strong>of</strong> language can<br />

be extracted directly without build<strong>in</strong>g an <strong>in</strong>termediate representation(p.32)[MA91]. That is, it is<br />

possible to process language without a system <strong>of</strong> grammar.<br />

Gazdar's et al Generalised Phrase Structure Grammar (GPSG 1 )[GKPS85], on the other h<strong>and</strong>,<br />

conta<strong>in</strong>s specic rules about possible <strong>and</strong> legal structures <strong>of</strong> the k<strong>in</strong>d which Schank avoids. However,<br />

approaches such as Gazdar's also conta<strong>in</strong> mapp<strong>in</strong>gs between syntactic rules <strong>and</strong> rules for<br />

semantic <strong>in</strong>terpretation. So, it is easy to view any structure syntactically <strong>and</strong> semantically.<br />

Both <strong>of</strong> these approaches to NLP use a system <strong>of</strong> rules. Rules for represent<strong>in</strong>g legal <strong>and</strong> illegal<br />

structures <strong>and</strong> rules for mapp<strong>in</strong>g between <strong>in</strong>terpretations <strong>of</strong> structures.<br />

Computer programm<strong>in</strong>g languages such as LISP <strong>and</strong> PROLOG (with its unication mechanism)<br />

have been <strong>in</strong>strumental <strong>in</strong> the design <strong>of</strong> systems such as GPSG. See Gazdar <strong>and</strong> Mellish's<br />

book on NLP <strong>in</strong> PROLOG[GM89] for an example.<br />

Representation of meaning was approached by Schank by defining a set of "atomic" primitives in terms of which everything else was defined. His theory of conceptual dependency was an attempt to find a minimal set of semantic primitives which can be used for the interpretation of all natural language texts. He qualified this by arguing that any two texts which have the same meaning should be represented in the same way. For example:

John loves Mary.



Mary is loved by John.

should both have the same representation 2. See, for example, Schank[Sch73]. Wilks's[Wil78] theory of preference semantics is similar but uses a much larger set of primitives.

2 Some linguists, however, would question whether these sentences are actually the same.

In contrast to the symbolic rule-based models mentioned above, connexionist approaches such as that of Waltz[Wal89] are based upon neural network models of language. Waltz lists the advantages of connexionism as:

Connectionist systems exhibit non-trivial learning ... [they] can be made fault-tolerant and error-correcting, degrading gracefully for cases not encountered previously. ... Connectionist architectures also scale well ...

He goes on to point out that:

In contrast, systems based on logic, unification and exact matching are inevitably brittle.

Similar to neural networks are Hidden Markov Models (HMMs) (see Rabiner[Rab90]), which are also used in non-symbolic, non-rule based approaches to NLP.

There are, then, two main approaches to NLP:

1. rule-based symbolic systems

2. non-symbolic connexionist or stochastic systems.

The work in this thesis, whilst it uses neither neural networks nor HMMs, falls into the latter category.

2.5 Spoken English Corpora and Prosodic Annotation

There are three main prosodic annotation schemes used in machine readable speech corpora: the system used in the SEC (standard British prosodic annotation), or variations on it, which is based upon O'Connor and Arnold[OA61]; the system used in the LLC, which follows Crystal[Cry69]; and the ToBI system[SBP+92], derived from Pierrehumbert[Pie87]. See below for information on these corpora. The reader is referred to the relevant references for information regarding the annotation information in corpora other than the SEC.

2.5.1 London-Lund Corpus

The London-Lund Corpus of Spoken English (LLC) derives from two projects: the Survey of English Usage at University College London, launched in 1959 by Randolph Quirk, and the Survey of Spoken English, launched in 1975 by Jan Svartvik at Lund University.

The LLC contains written as well as spoken material, including surreptitiously recorded material. Texts have been analysed grammatically and have a prosodic/paralinguistic analysis (following Crystal's conventions[Cry69]) which is held on typed cards. Only a fraction of this prosodic and paralinguistic analysis is available in a machine readable form. Greenbaum and Svartvik[Sva90] state that:

The basic prosodic features marked in the full transcription are tone unit boundaries, the location of the nucleus (ie the peak of greatest prominence in a tone unit), the direction of the nuclear tone, varying lengths of pauses, and varying degrees of stress. Other features comprise varying degrees of loudness and tempo (eg allegro, clipped, drawled), modifications in voice quality (pitch range, rhythmicality and tension), and paralinguistic features such as whisper and creak. Indications are given of overlap in the utterances of speakers. The full transcription and the grammatical analysis are available only on the slips at the Survey of English Usage at University College London.

The (machine readable) reduced transcription includes tone units, onsets, nuclei, nuclear tone direction (falls, rises etc.), boosters, pauses (2 degrees), and stress (2 levels).



2.5.2 Polytechnic of Wales Corpus

The Polytechnic of Wales Corpus (POW)[FP80] is not really suitable for analysis within this thesis since it is neither machine readable (being collected in the days prior to word processors) nor of suitable material, since it is a corpus of child speech. It is, however, worth a mention because it contains prosodic annotations as well as a full grammatical analysis.

It is worth noting that the original recordings have been rescued by Clive Souter and have now been copied to digital audio tape (DAT). The four volumes of transcripts may one day be scanned to make the POW corpus machine readable.

2.5.3 SEC/MARSEC

The Spoken English Corpus (SEC)[KT88] (also known as the Lancaster/IBM Spoken English Corpus) was compiled between 1984 and 1985 at the Unit for Computer Research on the English Language (UCREL), University of Lancaster, and the Speech Research Group at IBM UK Scientific Centre, Winchester. As the name implies, the SEC is a corpus of spoken British English (taken mainly from BBC Radio 4 broadcasts) which is available as lexicographically transcribed texts (with and without punctuation), as part of speech annotated texts, and as prosodically annotated texts. All annotations were produced manually with the exception of the part of speech annotated texts, which are semi-automatically produced. In addition, as a parallel resource, there is a treebank version of the corpus. The SEC is available through the International Computer Archive of Modern English (ICAME).

Together these form a very rich source of information and all are in a machine readable format. A potential drawback, but one solved in chapter 4, is that the differing versions of the corpus exist as separate entities that have evolved independently; they are not related to each other except by the fact that they cover the same speech material. See appendix A for a description of the contents of the corpus. For more comprehensive information refer to the original corpus documentation[KT88].

The corpus size is approximately 52,000 words, which is quite small by the standards of modern computer corpora. Although this may be a problem, the corpus is so richly annotated as to make it a very desirable resource.

The Machine Readable Spoken English Corpus (MARSEC)[RKVA94, GAR92] is an extension to the SEC in which the original acoustic data of the corpus has been digitised and made available on CDROM. To complement this, fundamental frequency, RMS energy, time-aligned segmental transcription and syllabic divisions have been added.

As a direct result of work done in this thesis, cross-references between the prosodic, part of speech, treebank, syllabic, and segmental transcriptions have been produced which form part of the Leeds (UNIX/waves based) version of the corpus, although this has not yet been released. The cross-reference allows direct links to be made from any point in the corpus between any two (or more) versions of the corpus annotations (including acoustic signal, F0 and RMS energy).

The part of speech annotations in the corpus were assigned at the University of Lancaster using their CLAWS[GLS87, Atw83] tagging program, which was first developed between 1981 and 1983 at the Universities of Lancaster, Oslo and Bergen.

The prosodic annotations were produced manually by two expert transcribers 3 using a system based upon O'Connor and Arnold[OA61]. The corpus was annotated with 16 prosodic marks. The " and # symbols could be used in conjunction with any of the high or low level TSMs. Of the tone unit boundaries, only one transcriber used the hesitation boundary. A major boundary existed where there was a pause; a minor boundary where there was a boundary without a pause. Hesitation boundaries were placed in instances where there was a pause but one would not normally expect to find a boundary. See table 2.1.

3 Dr. B. Williams and Dr. G. Knowles

2.6 Computational Use of Prosody

Even though prosody is an integral part of the speech act and conveys several types of information, it has hardly been exploited in computational systems such as speech synthesis and recognition. Waibel[Wai90] claims that



" higher than predictable pitch<br />

# lower than predictable pitch<br />

low level<br />

low fall<br />

low rise<br />

low fall{rise<br />

low rise{fall<br />

high level<br />

high fall<br />

high rise<br />

high fall{rise<br />

high rise{fall<br />

stressed but unaccented<br />

k major tone unit boundary<br />

j m<strong>in</strong>or tone unit boundary<br />

* hesitation tone unit boundary<br />

Table 2.1: Prosodic marks used <strong>in</strong> the SEC/MARSEC<br />

To this day, the prosodic cues in the speech signal, duration, rhythm, intensity, pitch and stress, are frequently being ignored in the implementation of speech recognition systems.

and that

Several attempts at using prosodic cues in speech recognition systems have mostly been limited to aiding syntactic analysis by hypothesizing phrase or clause boundaries (from pitch excursions) and/or hypothesizing phonemically reliable parts of the utterance ("islands of reliability") from the amount of stress signal.

Klatt has also commented on the little use made of prosody[Kla80, Kla90]:

While relatively little use has been made of prosodic information in most recognition systems described to date, some ideas for prosodic analysis have been proposed and tested (Lea, Medress, and Skinner, 1975).

Again the same opinion is expressed by Lea[Lea80]:

If there is one aspect of the information in the speech signal that seems promising and yet "untapped", it is the "suprasegmental" information ...



He goes on to document several prosodic correlates of linguistic structures that have potentially useful applications in computer speech technologies.

2.6.1 Speech Recognition and Understanding

This section demonstrates that although some work has been done on integrating prosody into speech recognition and understanding systems, there is still much to be gained.

It is commonly agreed that speech recognition can be improved by use of prosodic information. "Prosodic cues (fundamental frequency, segmental duration, and intensity contour) suggest a stress pattern for the incoming syllable string and thus could assist in lexical hypothesization" said Klatt[Kla80, Kla90]. Three types of prosodic knowledge source (duration and rhythm, stress, and intensity) were investigated by Waibel[Wai90] for use in a speech recognition system, and he showed that "dramatic overall improvements" were attained when they were used in combination with a speaker-independent phonetic word hypothesizer.

Longuet-Higgins[LH85] comments upon the possible uses that intonation may be put to for speech understanding. In particular he suggests that contrastive pitch movements may be used to identify the relative importance of words in an utterance and thus indicate emphasis or new information. Similarly, on the subject of prosodics in speech understanding systems, Woods[Woo85] says that in a speech understanding system it is necessary to have:

the ability to use cues such as intonation and rhythm to predict the possible syntactic structure of an utterance or to confirm or reject a proposed syntactic structure.

To date none of the speech recognition systems currently available commercially makes significant use of prosodic information.

2.6.2 Speech Synthesis

Prosody is an important and integral part of speech, and designers of speech synthesis systems have tried to model aspects of it such as, for example, rhythm[Isa85] and melody[Ste85], although the latter notes that (for French) "the gain in intelligibility is almost zero" but that the speech will be much more "pleasant". He also notes that "not all authors agree that prosody is directly related to the syntactic structure of a sentence".

The Klatt system (see [Kla87]) has a prosodic processing stage where prosodic contours are applied to the synthesised utterance taking into account syntactic structure. Word classes have the potential to affect the fundamental frequency contour, nine such classes actually being distinguished from one another. The relative height of the peak deflection of the F0 contour depends upon the level given to each word class. This, however, does not correspond to the way in which prosodic annotations are given to speech, that is: in terms of rises, falls and stresses.

There appears to be no system that can do the conversion between prosodic annotations (whether of SEC, LLC or ToBI type) and the acoustic signal (however, see chapter 3 of 't Hart[tHCC90]). Indeed it is not even known if there is a real acoustic correlate or if the transcriptions are auditory phenomena. Should such a process become available through the work of others, the work presented in chapters 5 and 6 will be able to generate annotations similar to those in the SEC and so become a link in the path from text-to-speech.



Chapter 3

Relating Prosody and Word Class

3.1 Introduction

Comparison of variations in prosody and word class annotations requires that one has both kinds of information for the same utterances. This information is provided in the SEC, but not in a unified way. This chapter addresses the problem of how to cross-reference between the prosodic and the word class annotations.

3.2 A Source of Data

With any statistical analysis a body of data is needed. For the task in hand this data must be speech which has been lexically transcribed, tagged with word classes and annotated by experts with prosodic information. Ideally the acoustic data should be available to allow reference between the annotations and the original speech. This is also preferable for future work in which the relationships between the prosodic annotations and fundamental frequency, RMS energy, and duration could be investigated[RAng, GAR92]. The data must also be in a machine readable form since the task of coding such information is highly time consuming.

There are many speech corpora available but few (at that time) offered the requirements of being: machine readable, British English, having word class and prosodic annotations, and providing acoustic data. The only obvious candidate, the Spoken English Corpus (SEC)[KT88, GAR92, Kno88], was, by chance, being updated jointly by Leeds University and Lancaster University to become the Machine Readable Spoken English Corpus (MARSEC)[RKVA94]. Other corpora are ruled out for various reasons: the Polytechnic of Wales Corpus (POW) is a corpus of child speech which has some prosodic annotations but is not in a machine readable form, and the London-Lund Corpus (LLC) has some sections of speech with prosodic annotations, however these are mostly not in a machine readable form and, due to the source of the data, the acoustic recordings are not available. The standard corpus does not have word class tags, although these now exist for some parts.

MARSEC provides in machine readable form the acoustic data along with fundamental frequency and RMS energy traces, syllabic divisions and segmental time alignment, as well as all the data that is contained within the SEC (see appendix A), along with the ability to cross-reference between forms of data. The MARSEC project was not due to finish until the end of this work, hence the cross-referencing described below has proved to be a useful addition to that project. Other speech corpora are discussed in chapter 2. The SEC and MARSEC are ideal sources for this work.

3.3 A Need For Processing

In an ideal world the information in the SEC would have been easily extractable; that is, prosodic information should be directly comparable to word class information and vice versa. The original structure of the SEC does not allow for this: word class information and prosodic information were held in parallel versions of the corpus text.

There are a number of differences between the two representations of the corpus texts that hold information on word classes and prosody. A manual of information to accompany the SEC Corpus[KT88] (sic) explains that, in general, transcriptions were produced from recordings of speech. These (unpunctuated) transcriptions were then used in conjunction with the recordings to produce the prosodic annotations. Upon occasion it was necessary to alter the transcription to allow for the appropriate placement of prosodic tone marks. For example, if a prosodic marker were to be placed on the final syllable of the word "19" it would be necessary to rewrite "19" as "nineteen". This generally happened with numbers and acronyms, but only where it was necessary, not in every instance.

Meanwhile the transcriptions were being punctuated and then tagged with word classes using the CLAWS tagging program. Modifications that were made to allow prosodic markers to be placed were not also applied to the punctuated or tagged annotations. Likewise, the representation of the word classes required changes to be made to the tagged annotation which were not propagated to other annotations, namely the treatment of enclitics such as "won't" (section A01 line 18) and "don't" (section A11 line 43), which are expanded into "will n't" and "do n't" (notice that this is not merely a case of splitting the two parts of the enclitic, so the process is not easily reversed without domain knowledge). This is to allow the insertion of the appropriate word classes. Compounds (which may be several words long, hyphenated or not) may be classified as a single word class and are therefore treated as a single word. Possessives such as "England's" (section A01 line 28) are split like enclitics into "England" and "'s", although it should be noted that the tagging scheme used in the original SEC (CLAWS1) did not do this and this is only applicable to the tagging scheme used to tag the parsed annotated corpus (CLAWS4). Finally, minor differences were introduced by changes of case for some words (which, the author understands, were caused by some rule-based mechanism in CLAWS).

Two very obvious differences between the annotations are that the tagged annotation contains punctuation and word classes whereas the prosodic annotation contains tone-unit boundaries and prosodic tone and stress markers.

The annotations were preceded by headers detailing the source, speaker(s) and other details. Comments enclosed within square brackets detailed such information as speaker changes or extracts omitted. The original SEC tagged files retained this information and tagged it as if it were a part of the original script, which it is not. Care must, therefore, be taken to ensure that these comments are not taken as data.

Since the treebank 1 contained word class information and phrase bracket information it was desirable to allow for use of this data, yet attempts to match the word class data and the treebank data with the prosodic data proved too arduous because of changes that had been made (for better or worse) to the word class data. This meant that although it was possible to match the word class or treebank data to the prosodic data, it was difficult to resolve the differences in punctuation between the word class and treebank versions. A decision was made to drop the word class data and use the treebank data exclusively. There is a distinct advantage to this since the treebank was tagged using CLAWS4, a more advanced tagging system than that used to tag the word class version.

1 The treebank is a further version of the corpus containing word classes (assigned using CLAWS4) and parse trees.

All these factors give good reason why it has not been an easy task to cross-reference data in the prosodically annotated version of the corpus with the word class information in the corpus. They also indicate that a fairly complex cross-referencing process is necessary. This is detailed in the remainder of this chapter.

3.4 Cross Referencing

The structure of the corpus and the problems described above meant it was not possible to answer questions such as "what word class has a given stressed word in the corpus been tagged with" or "list all the proper nouns sorted according to the type of prosodic marks they have". Cross-referencing between word class and prosodic information was an essential step. To this end software was written that would match up, word by word, the two data sources. Before the cross-referencing can begin it is necessary to preprocess the corpus data to transform it into a useful format. Figure 3.1 shows an extract of the prosodic annotation format and figure 3.2 shows an extract of the treebank annotation. See appendix F for the source code.



The preprocessing (accomplished with UNIX 2 tools) produces two files from each of these formats. The first two files, produced from the prosodic annotation, contain one word per line (here we define "word" as any sequence of characters delimited by whitespace). In one file the words retain their prosodic annotations and in the other these are removed, just leaving the word. Tone unit boundaries are treated as words in this context and appear on a separate line. Comments contained in square brackets are discarded. The second two files, produced from the treebank, are similar: one file contains each word (separated from the word class tag) and the other contains the word class. Phrase brackets and sentence numbers are discarded. Punctuation is treated as a word in these files (a punctuation symbol being given the word class tag of itself). As a further aid to the following stage the case of letters in the two word files is converted into lowercase where appropriate.

This preprocessing stage is doubly useful in that it not only simplifies the cross-referencing stage but also insulates the body of the process from the potentially variable annotation formats. Hence by changing the preprocessing stage this software may be used with alternative annotation formats. Before the treebank format was adopted as a source of word classes the word-class tagged version of the SEC was used; this change was accomplished with a minor change to the preprocessing stage.

2 UNIX is a registered trademark of UNIX System Laboratories.
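The following minimal Python sketch illustrates the preprocessing of the prosodic annotation. It is illustrative only (the actual software is in appendix F); the file handling, the stand-in set of mark characters, and the treatment of the boundary symbols k, j and * are assumptions:

import sys

BOUNDARIES = {"k", "j", "*"}   # major, minor and hesitation tone unit boundaries
MARK_CHARS = '"#\\/^v='        # stand-ins for the TSM glyphs of table 2.1

def preprocess(infile, annotated_out, words_out):
    """Split the prosodic annotation into two one-word-per-line files:
    one keeping the prosodic marks, one with bare lowercased words.
    A 'word' is any whitespace-delimited token; boundaries count as words."""
    with open(infile) as src, \
         open(annotated_out, "w") as ann, \
         open(words_out, "w") as bare:
        for line in src:
            line = line.strip()
            if line.startswith("[") and line.endswith("]"):
                continue                      # square-bracket comments are discarded
            for token in line.split():
                ann.write(token + "\n")       # word with its prosodic annotation
                if token in BOUNDARIES:
                    bare.write(token + "\n")  # a boundary appears on its own line
                else:
                    bare.write(token.strip(MARK_CHARS).lower() + "\n")

if __name__ == "__main__":
    preprocess(sys.argv[1], sys.argv[2], sys.argv[3])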

The next stage is the use of the program ttalign (see section F.2). The program works like this: the corresponding entries are read from the two files produced by the preprocessing stage from the prosodic annotation, and similarly for the two files produced from the treebank. The program then compares the word from the prosodic annotation with the word from the treebank. If these words match, an output line is generated from the input. As a special case, punctuation is treated as equivalent to tone unit boundaries where they coincide (remember that tone unit boundaries only exist in the prosodic annotation and punctuation only exists in the treebank). Two other possibilities are that the prosodic annotation word is a tone unit boundary and the treebank word is the next word after the boundary, in which case a filler symbol ({TU}) is inserted to match with the tone unit boundary on the output line; or, alternatively, that the treebank word is punctuation and the prosodic annotation word is the next word after the punctuation, in which case a filler symbol ({PN}) is inserted to match the punctuation on the output line. The symbol {TU} stands for "a tone unit boundary that does not coincide with any punctuation" and the symbol {PN} stands for "a punctuation symbol that does not coincide with a tone unit boundary".

If none of these situations holds, that is, if the two words do not match and neither is a punctuation symbol or a tone unit boundary, then the mismatch handling routine is called. In the mismatch handling routine an output line is generated from the mismatching entries and new input is read. If these words now match then it is assumed that the previous mismatch was an error in the corpus or that the words were different representations of the same thing, e.g. "nineteen" and "19". In this case another output line is generated and processing continues as normal.

If the new input words do not match, a two way lookahead stage is entered. In this the prosodic annotation word is compared with the next few treebank words, and the treebank word is simultaneously compared with the next few prosodic annotation words, until a match is found in either search or until a fixed distance ahead has been viewed. If a match is found then, depending upon which stream the match is found in, the program assumes that the mismatch must have been caused by either an enclitic or a compound word. As previously noted, enclitics such as "won't" are represented as "will n't" in the treebank. The software will match "won't" with "will" but will insert a filler symbol ({EN}) meaning enclitic, which will be matched with "n't". For compounds, for example "search and destroy" (section a10 line 43), the software will match "search" with "search and destroy" and will insert a filler symbol ({CP}) meaning compound word, which will be matched with the next two lines: "and" and "destroy".
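The central loop of such an aligner can be sketched in Python as follows. This is a simplification written for illustration (ttalign itself is in section F.2); the symbol sets, the window size and the exact mismatch handling are assumptions:

def align(pros, tree, window=4):
    """Simplified word aligner in the spirit of ttalign.
    pros: words from the prosodic annotation (boundaries k/j/* included).
    tree: words from the treebank (punctuation included; a compound is a
    single entry containing spaces, an enclitic is split over two entries).
    Yields (prosodic, treebank) pairs with {TU}/{PN}/{EN}/{CP} fillers."""
    BOUND, PUNCT = {"k", "j", "*"}, set(".,;:?!")
    i = j = 0
    while i < len(pros) and j < len(tree):
        p, t = pros[i], tree[j]
        if p == t or (p in BOUND and t in PUNCT):
            yield p, t                      # direct match, or boundary/punctuation coincide
            i, j = i + 1, j + 1
        elif p in BOUND:
            yield p, "{TU}"                 # boundary with no punctuation opposite
            i += 1
        elif t in PUNCT:
            yield "{PN}", t                 # punctuation with no boundary opposite
            j += 1
        elif i + 1 < len(pros) and pros[i + 1] in tree[j + 1:j + 1 + window]:
            # enclitic (or a one-for-one mismatch such as "nineteen"/"19"):
            # match the heads, then pad the prosodic side with {EN}
            yield p, t
            j += 1
            while tree[j] != pros[i + 1]:
                yield "{EN}", tree[j]       # e.g. {EN} against "n't"
                j += 1
            i += 1
        else:
            # compound: one treebank entry spans several prosodic words;
            # match the first word, pad the treebank side with {CP}
            yield p, t
            for _ in t.split()[1:]:
                i += 1
                yield pros[i], "{CP}"       # e.g. "and", "destroy" against {CP}
            i, j = i + 1, j + 1

Run over the streams of figures 3.1 and 3.2, a loop of this kind produces paired output in the manner of figure 3.3.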

It is, of course, possible that the two way lookahead will not turn up a match. In this case an interactive mode is entered that allows the user to specify one of the four filler symbols ({TU}, {PN}, {EN}, and {CP}); this is repeated until the input streams are again in synchronisation (i.e. the prosodic annotation word matches the treebank word). This is a very rare occurrence and it was not justifiable to add the extra complexity necessary to handle the conditions under which it occurred. It happened when a compound word was followed by a tone unit boundary which did not match punctuation, or an enclitic was followed by punctuation which did not match a tone unit boundary. Since the lookahead does not check for punctuation or tone unit boundaries it can easily get confused, thinking that, say, the tone unit boundary is part of the compound. For this reason, and to save adding recursive levels of lookahead, the interactive mode is entered. When synchronisation is achieved the user exits this mode and the program continues as normal.

The whole process is repeated for each input line until the data are exhausted. The output from ttalign is not the end of the cross-reference process: two further stages are necessary. A post-processing stage handles the tricky problem of what to do if more than one punctuation symbol coincides with a tone unit boundary. The ttalign program will only match the first such punctuation symbol with the tone unit boundary; the remaining punctuation symbols will be tagged with the {PN} filler symbol, which means that the punctuation does not match a tone unit boundary. This is wrong, and to fix it the program collate-tu (see section F.3) will find such instances and convert the {PN} fillers into {CTU} filler symbols, which stand for collated tone unit (a single tone unit boundary that matches multiple punctuation symbols). Which of the punctuation symbols actually matches the tone unit boundary (as opposed to the filler symbol) is determined by an order of precedence. This usually only occurs with quotes and brackets, which are given less precedence than the other punctuation symbols. With hindsight it would have been preferable to use two or three filler symbols, such as {CTU1}, {CTU2}, and {CTU3}, used respectively with the three different types of tone unit boundary. As it stands it is not possible to tell (without reference to the context) whether punctuation aligned with the {CTU} filler was at a major, minor or hesitation tone unit boundary. This does not have any effect on the research presented here but may bear on automatic segmentation of the text into tone units.
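A sketch of this collation step, assuming the aligned output is a sequence of (prosodic, treebank) pairs as in the sketch above (the real program is collate-tu, section F.3):

def collate_tu(pairs):
    """Convert {PN} fillers that directly follow a matched tone unit
    boundary into {CTU} fillers (punctuation collated with that boundary)."""
    out = []
    after_boundary = False
    for p, t in pairs:
        if p in {"k", "j", "*"} and t != "{TU}":
            after_boundary = True            # boundary matched with punctuation
        elif p == "{PN}" and after_boundary:
            p = "{CTU}"                      # extra punctuation at the same boundary
        else:
            after_boundary = False
        out.append((p, t))
    return out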

The final stage of the cross-referencing uses the word classes to guide the program align-parse (see section F.4) while it re-inserts the treebank phrase brackets. Although the phrase brackets are not actually used here, it is hoped that future research may be able to improve on results by using the contextual knowledge embodied within them. Figure 3.3 shows some example output from the alignment process (here without the phrase brackets). The first column contains the word class (see appendix B for an explanation of the codes used), the second column contains the prosodically annotated word and the third column contains the lexical word. Notice how tone unit boundaries and punctuation are handled. These are not "words", and some researchers choose to add an extra field for information such as this, placing the punctuation next to the word which it follows. This approach has not been adopted because of the additional complexity of processing the data.

3.5 By-Products

The above has also been useful in the production of cross-referencing data in the MARSEC project[GAR92]. The ability to relate temporal information to the treebank is provided as a direct by-product of this software. Prosodic annotation words are located in a cross-reference produced by the above algorithm, the location within the parsetree is identified, and a cross-reference table is produced.

3.6 Summary

The chosen data source, the SEC, contains word class, parsetree and prosodic information which could not directly be cross-referenced. A semi-intelligent algorithm was used to produce cross-referencing between these annotations which coped with representational differences with little domain knowledge. The results may be used to make direct comparisons between word class and prosodic annotations.


[001 SPOKEN ENGLISH CORPUS TEXT A01]
[In Perspective]
[Rosemary Hartill]
[Broadcast notes: Radio 4, 07.45 a.m., 24th November, 1984]
[Transcriber: BJW]

#Good morning k "more news about the Reverend Sun Myung Moon j founder of the Uni fication Church j who's currently in jail j for tax evasion k "he was a warded an honorary de gree last week j by the Roman Catholic Uni versity of la Plata j in Buenos Aires j Argen tina k in an nouncing the a ward in New York j the rector of the uni versity j #Dr Nicholas Argen tato j de scribed Mr Moon as j a prophet of our time k

Figure 3.1: Example of prosodic annotation format.

SA01 1 v
SA01 2 v
[N Good JJ morning NN1 N] . .
SA01 3 v
[N More DAR news NN1 [P about II [N the AT Reverend NNS1 Sun NP1 Myung NP1 Moon NP1 , , [N founder NN1 [P of IO [N the AT Unification NN1 church NN1 N]P]N] , , [Fr [N who PNQS N][V 's VBZ currently RR [P in II [N jail NN1 N]P][P for IF [N tax NN1 evasion NN1 N]P]V]Fr]N]P]N] : : [N he PPHS1 N][V was VBDZ awarded VVN [N an AT1 honorary JJ degree NN1 N][Nr last MD week NNT1 Nr][P by II [N the AT [ Roman JJ Catholic JJ ] University NNL1 [P of IO [N la &FW Plata NP1 N]P][P in II [N Buenos NP1 Aires NP1 , , Argentina NP1 N]P]N]P]V] . .
SA01 4 v
[P In II [Tg announcing VVG [N the AT award NN1 N][P in II [N New NP1 York NP1 N]P]Tg]P] , , [N the AT rector NNS1 [P of IO [N the AT university NNL1 N]P] , , [N Dr NNSB1 Nicholas NP1 Argentato NP1 N]N] , , [V described VVD [N Mr NNSB1 Moon NP1 N][P as II [N a AT1 prophet NN1 [P of IO [N our APP$ time NN1 N]P]N]P]V] . .

Figure 3.2: Example of treebank format.



NNJ     BBC             bbc
NN1     news            news
{TU}    j               {TONE-UNIT}
II      at              at
MC      eight           eight
RA      o' clock        o'clock
{TU}    j               {TONE-UNIT}
II      on              on
NPD1    Saturday        saturday
,       j               ,
AT      the             the
MD      twenty- second  twenty-second
IO      of              of
NP1     June            june
.       k               .
DD1     #this           this
VBZ     is              is
NP1     Brian           brian
NP1     Perkins         perkins
.       k               .

Figure 3.3: Example output from cross-referencing, from section B02.



Chapter 4

Preliminary Statistical Analysis

4.1 Introduction

This chapter will explain how statistics were extracted from the corpus and how these statistics may be used to estimate probabilities for the co-occurrence between prosodic stress marks and word classes, and probabilities for sequences of prosodic stress marks and word classes. Together these probabilities are used in later chapters to build up a probabilistic grammar of prosody that may be generated from the word classes for an utterance. Other statistics gathered are used to provide descriptions of the corpus annotations. These are used to provide operational parameters for the context in which the grammar will work and provide information on the factors of the prosodic annotation that are not covered by the grammar.
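As a sketch of the kind of estimation involved (the notation here is illustrative, not the thesis's own): writing f(m, c) for the number of times prosodic mark m co-occurs with word class c in the cross-referenced corpus, and f(m', m) for the number of times mark m immediately follows mark m', the relative frequency estimates are

    P(m | c) = f(m, c) / f(c)        P(m | m') = f(m', m) / f(m')

where f(c) and f(m') are the corresponding marginal counts.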

In this statistical analysis of the corpus there were two aims:

1. to extract information descriptive of the prosodic annotations;

2. to extract information helpful in relating the prosodic annotation to the syntactic annotation.

The analysis results described and presented here are extracted from the prosodic annotation and the cross-reference between the prosody and syntax produced in the previous chapter. Only a subsection of the corpus was used for these analyses because the corpus is comprised of a number of different speech styles (see appendix D) and there is a likelihood that unusual prosodic styles, such as those found in the poetry examples, may produce spurious results in the statistics. A subsection of the corpus (of approximately two thirds) was selected, roughly corresponding to a report style. This subsection comprised the categories listed in table 4.1. Category M was reserved for testing purposes.

Category   Style                                    #words   % corpus
A          Commentary                                 9066      17%
B          News Broadcast                             5235      10%
C          Lecture type I (general audience)          4471       8%
D          Lecture type II (restricted audience)      7451      14%
F          Magazine Style Reports                     4710       9%
M          Miscellaneous                              3352       6%

Table 4.1: Categories of the corpus used for analysis

Categories omitted were: Religious (E), Fiction (G), Poetry (H), Dialogue (J), and Propaganda (K).

4.2 Prosodic Annotation Statistics

Although there is little new information presented in these statistics, they are useful in providing descriptions of the annotations in the corpus and can serve as a metric to compare synthesized annotations against. Useful statistics to look at are the relative frequency of each of the prosodic annotations (see appendix D), the lengths of tone units, and the frequencies of prosodic mark bigrams.

4.2.1 Prosodic mark frequencies

Figure 4.1 shows the relative frequency of each of the prosodic marks (including unstressed) used in the annotation of the SEC. Unstressed words account for 47.1%, whereas rises, falls, fall-rises and level tones account for 40%. The remaining 12.9% is largely made up of the class stressed but unaccented. Rise-fall tones are so rare that they are negligible for our purposes. The one or two instances that actually do occur in the corpus were omitted, hence the count of zero in the histogram.

Mark                   Frequency
Unstressed                 12801
High Fall                   3511
Stress (unaccented)         2564
Low Fall                    2297
Low Rise                    1528
High Rise                   1511
High Fall Rise              1200
Low Level                   1158
High Level                   342
Low Fall Rise                261
High Rise Fall                 8
Low Rise Fall                  0

Figure 4.1: Frequency of prosodic marks

4.2.2 Tone Unit lengths

[Figure 4.2: a line graph of frequency (0 to 3300) against tone-unit length (0 to 10 words).]

Figure 4.2: Relative frequencies of tone-unit lengths in terms of numbers of: words with tonic stress marks; words with prosodic marks; and words.

The length of tone units (presented in figure 4.2) was calculated by counting the number of words with a prosodic mark (the dotted line), by counting the number of words with tonic stress marks only (the dashed line), and by counting the total number of words (the solid line). It is, of course, not meaningful to have a tone unit with zero words, but it is possible to have a tone unit with no TSM or stressed words. The total length of tone units (in terms of words) extends to over 35 words but this trails off to a frequency below 10 after a length of about 15 words. The average length of a tone unit is about 4 words.
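These length counts are straightforward to derive from the one-word-per-line prosodic file of chapter 3; a sketch, under the same illustrative conventions as before:

from collections import Counter

def tone_unit_lengths(words, boundaries=("k", "j", "*")):
    """Distribution of tone-unit lengths (in words) from the
    one-word-per-line prosodic stream, boundaries included."""
    lengths = Counter()
    n = 0
    for w in words:
        if w in boundaries:
            lengths[n] += 1   # a boundary closes the current tone unit
            n = 0
        else:
            n += 1
    return lengths
    # mean length: sum(k * f for k, f in lengths.items()) / sum(lengths.values())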

Of these measures of length, only the first two are useful for comparison with the same measures taken from any prosody synthesis model that is not concerned with the problem of segmentation into tone units. The models presented in chapters 5 and 6 are only concerned with synthesis of stress and prosodic mark annotations and not prosodic boundaries; these are taken as given. Under those conditions the number of words in a tone unit will not change, as the same tone unit boundaries are used.

4.2.3 Prosodic mark bigram frequencies

                                fall-rise   fall   rise   stressed   unstressed   boundary
fall-rise (high or low)               34      40     20        128          329       2044
fall (high or low)                    44     304    216       1072         2454       4834
rise (high or low)                    10      74     25        701          854       1162
stressed (level or unaccented)       953    2381    666       2436         5297       3767
unstressed                          1235    4939   1524       8210         8453       1463
boundary (k, j or *)                 319    1185    375       2953         8437        N/A

Table 4.2: Prosodic mark bigram frequencies (rows give the first mark of the pair, columns the mark immediately following)

The figures in table 4.2 show the absolute number of instances of each of the bigrams. So, for example, the frequency of a fall (either high or low) being immediately followed by a tone unit boundary is 4834 instances. For reasons applicable to the models developed in chapters 5 and 6, the bigram figures presented here group low and high tones together, as well as grouping the stressed but unaccented mark together with the low and high level tones. This group is often referred to here simply as stressed. The unstressed element refers to words (not syllables) that have no stress or prosodic mark annotated. It is interesting to note that 60.6% of tone units are ended with a falling, rising or fall-rise tone and only 11% with an unstressed word, supporting the view that tonic stress comes at the end of a tone unit.

It is interesting to note that none of the cells is zero. This shows that a probabilistic approach to language modelling is essential. A traditional generative approach to developing a "grammar of prosodic marks" following Chomsky[Cho57] would involve defining a set of rules to generate all and only "legal" prosodic mark sequences, and disallowing "illegal" sequences. Since all pair combinations are legal, a rule-based grammar derived from the bigrams alone would not be sufficient.
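Collecting bigram counts of this kind, and turning them into conditional probabilities, is a short computation over the grouped mark sequence; a sketch (the function and variable names are mine):

from collections import Counter

def bigram_counts(marks):
    """Counts of adjacent pairs in a sequence of grouped prosodic
    marks, e.g. ["stressed", "fall", "boundary", "unstressed", ...]."""
    return Counter(zip(marks, marks[1:]))

def conditional(counts, first):
    """Relative-frequency estimate of P(next | first) from the counts."""
    total = sum(n for (a, _), n in counts.items() if a == first)
    return {b: n / total for (a, b), n in counts.items() if a == first}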

4.3 Cross-Reference Statistics

The statistics here provide evidence for what has commonly been believed about the relationship between prosodic marks and word classes. In particular, they quantify the relationship and, by use of a larger set of word classifications than is normally considered, add detail. The relationship between punctuation and tone unit boundaries is also covered, by showing the frequency of co-occurrence of the marks in each annotation.

4.3.1 Co-occurrence tables

Of particular interest to this research is the frequency of co-occurrence of word classes with prosodic marks. The prosodic marks (known as Tonic Stress Marks or TSMs) fall on the syllable upon which a pitch movement starts, and the behaviour of the prosody between TSMs may be predicted[KT88]. The TSMs therefore encapsulate the essential changes in the prosody.

The SEC (as noted in chapter 2) marks every stressed syllable with a TSM, with the assumption that the final TSM in a tone unit is the tonic stress. It is this approach to the annotation that makes it possible to produce the co-occurrence table for word classes and TSMs. This could not be done with other prosodic annotation schemes such as ToBI[SBP+92], which have a rigid grammar from which individual elements cannot be extracted while ignoring the context from which they came. Of course a co-occurrence table (or any other statistical model) cannot readily be extracted from prosodic annotations unless these are machine-readable. The POW corpus was collated before word processing software was available, so the only "models" we can build from the POW must be based on intelligent observation of the four volumes of printed transcripts. From the cross-reference produced in chapter 3 this table is easy to produce: for each word class a count is made of each time it occurs with each of the possible TSMs plus the annotations of stressed but unaccented and unstressed(1).
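The construction just described amounts to a two-dimensional frequency count. A minimal sketch follows, assuming the cross-reference is available as (word class, prosodic mark) pairs; the pairs shown are invented examples, not corpus data.

    from collections import defaultdict

    # Aligned (word class tag, prosodic annotation) pairs from the
    # cross-reference. The pairs below are invented for illustration.
    aligned = [("NP1", "fall"), ("II", "unstressed"), ("NN2", "stressed"),
               ("NP1", "unstressed"), ("NN2", "fall")]

    # cooc[word_class][mark] = number of co-occurrences
    cooc = defaultdict(lambda: defaultdict(int))
    for tag, mark in aligned:
        cooc[tag][mark] += 1

    print(dict(cooc["NP1"]))  # {'fall': 1, 'unstressed': 1}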

One problem with this approach is that some multisyllabic words have more than one prosodic mark within them (for example "unscientific", section a01 line 64). Words with multiple prosodic marks comprise 1.2% of the sub-corpus used for the cross-reference, which is small enough to ignore, so it is assumed that all words will have only one TSM. This is an important assumption since it overcomes the difficulties associated with the fact that prosody and stress are usually syllable based but word class tagging is word based. Compound words are often treated as a single word (for example "battle-marked" (section a02 line 12) is not treated as two separate words "battle" and "marked" with word classes NN1 and JJ but as a single word with word class JJ) and cases such as this often disagree with the assumption. This work does not attempt to address this problem. Further work on the relationships between prosody and compound words looks like it would be a fruitful avenue to explore, but is outside the bounds investigated here.

Another problem is the use of ↑ and ↓, which may be used either on their own or in combination with other TSMs. This increases the number of possible (if not actually used) prosodic marks substantially, yet ↑ and ↓ occur very infrequently. The approach here has been to assume that the effect of ↑ and ↓ is highly semantic, contextual or pragmatic in nature, and hence they have been ignored. Thus there is no distinction made between TSMs marked with a high or low reset and those without.

Looking at the frequencies of TSMs (figure 4.1) it was noticed that rise-fall tones are also very infrequent. So infrequent that statistics gleaned from the few instances are likely to be in error; hence they have likewise been ignored.

In summary: with 168 different word classes occurring in the sub-corpus sample and a possible 34 different types of prosodic annotation per syllable there is a major problem with regard to sample sizes. In order to alleviate this problem certain prosodic annotation marks are ignored. These are the higher and lower than expected pitch level markers (↑ and ↓), and rise-falls, which together account for less than 0.6% of the data; ignoring them reduces the number of prosodic marks to ten. Even with this precaution many cells in the co-occurrence table will be zero, and correspondingly distribution probabilities based on the frequencies will be in error.

1. N.B. From here on I will not normally make the distinction between TSMs and stressed but unaccented and unstressed annotations.

A table of frequencies is produced (see table D.3). Presented in table 4.3 are the frequencies for the 64 most frequent word classes.

The same statistics can also be calculated for tone unit boundaries and punctuation symbols; see table E.1.

Although the phrase brackets are also aligned with the word classes and prosodic annotations, they are not actually used in the mappings defined here. Atwell[Atw94] has some findings which support the view that phrase brackets do not give much more information than the word class tags alone.

4.3.2 Ignoring Higher-Level Syntactic Structures

There are two good reasons to ignore the parse tree information available in the SEC treebank.

Firstly, there exist accurate word class taggers (e.g. CLAWS) but no parsers of equivalent accuracy[Atw93, Atw94, ASO88, O'D93, Wee94]. It would be much more useful to be able to predict prosody from word class tags alone, since natural language processing systems can predict these with confidence, whereas accurate parse trees cannot be generated automatically.

Secondly, parse trees are large structures in comparison with individual prosodic marks, and the SEC is not large enough to provide enough examples of given parse tree structures (or sub-structures) in correlation with prosodic mark sequences. It is very doubtful that any progress would be made from use of such limited information within the framework of this research.

This is why this thesis concentrates upon mapping only word class tags to prosody. Higher level syntactic structures would most likely only be useful for providing contextual or semantic information, or for segmenting the utterance into tone units, all of which are outside the scope of this work.



4.3.3 Clustering word classes

Using the data in the co-occurrence table it is possible to perform a prosody-based hierarchical clustering(2) of these word classes. Clustering of non-parametric n-dimensional data is a difficult task and many distance metrics exist (see Hughes[Hug94]). It is highly likely that an improvement on the clustering presented here would be possible. Advanced clustering techniques, however, are beyond the scope of this research.

By drawing a vertical line through the arcs in figure 4.3 the word classes may be divided into a number of groups. A line at the far left would give one group, slightly to the right would give two groups, etc. There would be n groups where the vertical line cuts through n horizontal lines; all the arcs to the right of the cut bracket all the word classes in each group. For example with two groups there would be word classes APP$ to DDQ in one group and DB to RG in the other group.

This leaves the problem of which group each of the other word classes (not in this figure) belongs to. Low frequency word classes have poorly defined co-occurrence vectors, which makes it difficult to know which cluster such a word class would group with. The best that can be achieved is to inspect the groups for patterns between the word classes. For example one cluster near the bottom of figure 4.3 has the word classes NN1, NN2, NNL1, NNT2, NNU and NNT1. A clear pattern exists here, and similar low frequency tags such as NNL2 could be added to this cluster.

2. The clustering was kindly performed for me by John Hughes using a technique described in Hughes[Hug94].
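For reference, an agglomerative clustering of this kind can be reproduced with standard tools. The sketch below is a minimal illustration using SciPy, not the procedure of Hughes[Hug94]; the co-occurrence vectors are invented stand-ins, not the corpus figures.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    tags = ["NN1", "NN2", "AT", "IO"]
    # Rows: per-tag co-occurrence probabilities with a few prosodic marks
    # (normalised frequencies). These numbers are invented for illustration.
    vectors = np.array([
        [0.60, 0.25, 0.15],   # NN1: mostly stressed
        [0.55, 0.30, 0.15],   # NN2: similar to NN1
        [0.05, 0.05, 0.90],   # AT: mostly unstressed
        [0.02, 0.03, 0.95],   # IO: mostly unstressed
    ])

    # Repeatedly merge the closest vectors (average-link clustering).
    tree = linkage(vectors, method="average", metric="euclidean")

    # Cutting the dendrogram to give two groups separates the noun-like
    # tags from the article/preposition-like tags.
    groups = fcluster(tree, t=2, criterion="maxclust")
    print(dict(zip(tags, groups)))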

4.4 Summary

In this chapter consideration has been given to the extraction of various measures to aid the mapping between word class and prosodic annotations in the Spoken English Corpus. In particular the range and frequency of the prosodic marks has been presented, along with their frequency within tone units. Prosodic mark bigram frequencies have been extracted which may be used to calculate likelihood scores for sequences of prosodic marks. It was noted that these figures indicate that, in common with general belief, the first word in a tone unit is usually unstressed and the last word in a tone unit usually carries the tonic stress.

A mechanism was described to cluster word classes into groups based upon their similarity of co-occurrence with prosodic marks. This has led to the idea of prosodically orientated word class groups.

The statistics so assembled here are used in chapter 5 to devise a model to generate stress patterns from word class tagged text.


Tag   [nine stress-mark columns: the TSMs plus stressed but unaccented]   U/str
APP$ 1 0 2 9 1 8 0 7 6 303<br />

VBDR 0 0 0 2 0 6 0 1 1 90<br />

VHD 0 0 0 2 1 6 0 0 3 75<br />

CC 5 1 1 12 1 18 0 6 57 683<br />

II 7 1 11 25 13 47 1 25 128 1847<br />

IW 0 0 0 4 1 5 0 3 12 157<br />

PPHS1 0 0 1 5 1 3 0 1 5 116<br />

VHZ 1 0 0 3 1 2 0 1 5 79<br />

VBZ 2 1 3 8 1 6 0 8 10 229<br />

EX 0 0 0 0 1 2 0 0 6 79<br />

PNQS 0 1 0 1 1 2 0 0 4 78<br />

PPIS2 0 0 0 0 0 4 0 1 5 96<br />

VBDZ 0 0 0 2 3 4 0 1 11 261<br />

VBN 0 1 0 1 2 1 0 0 5 104<br />

IF 1 0 0 1 1 0 0 0 11 254<br />

PPH1 0 0 1 1 0 0 0 0 16 245<br />

VB0 1 0 1 2 0 1 0 0 9 153<br />

AT 0 0 0 18 1 19 0 3 42 2016<br />

PPY 0 0 0 0 1 0 0 1 1 68<br />

AT1 1 0 0 1 1 3 0 1 12 739<br />

CST 0 0 0 0 0 0 0 0 8 260<br />

II22 0 0 0 0 1 0 0 0 0 70<br />

IO 0 0 0 1 0 0 0 0 7 896<br />

TO 0 0 0 0 0 0 0 0 3 410<br />

CCB 2 0 0 0 0 3 0 1 23 140<br />

NNSB1 2 0 0 1 1 3 0 0 12 60<br />

CSA 1 0 0 4 1 8 0 0 7 72<br />

VM 2 0 5 20 9 19 0 10 17 218<br />

PPHS2 1 0 0 5 2 2 1 6 7 97<br />

VBR 0 0 1 8 4 3 0 2 5 108<br />

VH0 1 0 0 3 3 4 0 5 13 123<br />

CS 3 1 0 12 3 17 0 0 30 92<br />

ICS 1 0 1 9 7 9 0 5 24 65<br />

DDQ 3 0 1 10 7 11 0 1 22 106<br />

DB 0 1 9 18 5 19 2 14 14 8<br />

NP1 91 52 129 191 131 191 37 155 222 117<br />

JJ 68 15 87 263 200 354 37 177 356 145<br />

VVN 67 8 62 107 96 115 16 56 152 60<br />

JB 7 0 3 23 18 28 1 20 40 24<br />

NN 14 3 13 23 14 23 3 6 45 28<br />

VV0 67 7 47 104 90 105 9 47 185 144<br />

RL 8 1 16 13 10 15 3 4 26 15<br />

RR 41 5 28 157 65 130 16 83 132 117<br />

RT 12 2 8 10 7 15 1 11 23 22<br />

DD 0 0 1 13 9 20 0 17 14 23<br />

MD 3 0 4 18 22 27 3 26 20 29<br />

XX 0 1 4 21 6 22 0 8 10 22<br />

MC 12 2 22 78 63 108 4 35 73 50<br />

MC1 4 1 2 10 11 22 0 12 18 18<br />

NNS1 3 3 3 8 9 15 2 5 31 13<br />

VVD 13 10 21 53 47 85 2 28 147 67<br />

VVG 21 3 6 54 51 71 5 20 105 42<br />

VVZ 11 2 9 20 21 33 1 4 54 40<br />

NN1 379 80 373 610 285 348 106 359 810 275<br />

NN2 160 26 172 236 134 130 45 124 364 141<br />

NNL1 15 4 20 23 13 11 2 8 26 19<br />

NNT2 12 1 13 21 4 4 1 7 15 14<br />

NNJ 7 5 6 20 12 3 7 8 30 19<br />

NNT1 15 3 19 32 13 15 4 12 62 35<br />

DD1 5 1 4 38 6 44 1 31 42 113<br />

MF 3 0 2 13 1 10 0 1 16 24<br />

NNO 3 0 2 9 7 4 0 10 14 26<br />

RP 14 4 24 33 11 18 4 9 24 66<br />

RG 0 1 0 14 6 22 0 5 19 52<br />

Table 4.3: Co-occurrence table for the 64 most frequent word classes (final column: unstressed).



[Dendrogram; leaf labels, top to bottom: APP$, VBDR, VHD, CC, II, IW, PPHS1, VHZ, VBZ, EX, PNQS, PPIS2, VBDZ, VBN, IF, PPH1, VB0, AT, PPY, AT1, CST, II22, IO, TO, CCB, NNSB1, CSA, VM, PPHS2, VBR, VH0, CS, ICS, DDQ, DB, NP1, JJ, VVN, JB, NN, VV0, RL, RR, RT, DD, MD, XX, MC, MC1, NNS1, VVD, VVG, VVZ, NN1, NN2, NNL1, NNT2, NNJ, NNT1, DD1, MF, NNO, RP, RG]

Figure 4.3: Hierarchical clustering of the 64 most frequent word classes.


Chapter 5

Automatic Stress Annotation

5.1 Introduction

In this chapter it will be shown that the placement of stresses(1) on words in an utterance may be largely predicted from word class classifications. This chapter is largely based upon the work presented in Arnfield[AA93].

Chapter 6 will expand on the ideas presented here to show how prosodic annotations may be calculated from word classes. The intention, as pointed out in section 1.3.1, is not to produce predictions that exactly match the annotations in the corpus, but to generate annotations that will act as a baseline annotation which may be built upon by semantic and contextual processes.

It is possible for a sentence to be stressed in different ways in different texts (contexts). A predictor based on sentence syntax, without any model of "text grammar" or inter-sentential cohesion, cannot hope to work perfectly. This leads to a problem of evaluation: if the predicted stress is different from that in the corpus, it need not necessarily be wrong.

There are a number of problems associated with this task (which are examined below):

1. For the purposes of clarity in writing I will refer to a stressed syllable or word here as one that has either a Tonic Stress Mark or is classified as being stressed but unaccented. Unstressed refers to words or syllables which have no prosodic annotation. (In some cases words are annotated solely with pitch resets. These are ignored in this research and such a word would be considered unstressed.) In general I refer to stress marks as meaning either of the above, and not a syllable which is stressed.


- Stresses are normally associated with syllables, not words.
- How does one decide which syllable stress will fall on?
- Enclitic words have more than one word class.
- Compound words have a single word class but multiple words.

A basic assumption of this research has been that it will only deal with at most one stress mark per word. This is not always the case in reality, but by examining the corpus it can be seen that this assumption holds for 98.8% of the words in the sections of the corpus used (see section 4.1). The assumption is such a useful one to make because it simplifies the mapping problem to mapping between a single word class (in most cases) and a single stress mark per word. In words that feature more than one stress mark, the most prominent stress takes precedence in this analysis.

The second problem follows on from the first: if a stress mark is to be placed (upon a syllable) within a word, which syllable should it go on? This is not a problem tackled by this research because of a second underlying assumption: that the stress mark will go on the syllable marked as carrying the primary stress in a dictionary. According to Fudge[Fud84]

    in English, the syllable singled out in a given word is nearly always the same one, irrespective of the context.

He notes two types of exceptions: (i) cases where the word has not been properly perceived by the hearer, and (ii) certain types of phrase which require a shift in word-stress. The problem would therefore be solved by a combination of dictionary lookup, using a machine readable dictionary or lexicon (such as the forthcoming edition of The English Pronouncing Dictionary[RH95]), and a rule-based approach as covered by Fudge. Several machine readable dictionaries and lexical databases commonly used for natural language processing research (e.g. LDOCE, OALD, Collins English Dictionary) include stress assignment information.

Enclitics (which are pronounced as single words) have more than one word class because they are formed from two (or more) words that become joined. For example "can't" is "can not" and "won't" is "will not". Two possibilities exist to deal with mapping between two (or more) word classes and a single word: (i) treat enclitics as a special case in the stress prediction stage, or (ii) treat them as a special case in the stage which places stress marks on syllables. The difference is either some additional complexity in the prediction model, or some simple rule-based approach to placing more than one stress mark on the same syllable. Because of the limited data on enclitics available in the corpus, the former approach would be difficult to implement and may have the effect of blurring the word classifications of enclitics. To avoid this potential problem the latter approach is assumed: a prediction is made for each part of the enclitic, and these are combined at a later stage with the rule that the most prominent stress mark assigned to any part of the enclitic is taken as the stress mark for the whole enclitic. If it is a multisyllabic enclitic, the stress mark will be placed on the primarily stressed syllable as indicated above.

Compounds, unlike enclitics, suffer from the problem that a single word classification is given for a phrase that may contain several words. For example "search-and-destroy" may be classified with a single word class. This will give a single stress mark prediction, but in reality more than one word may be stressed. In addition it is difficult to say which word should take the main stress. In effect what is needed is a lexicon of compound words that lists primary stress and gives rules for assignment of stress marks to the other words/syllables. Compound words are a special and difficult case that are not dealt with here; it would be outside the scope of this research to attempt to handle them effectively. Hence, within the constraints of the predictor model used, it will not be possible to predict stress marks for compound word sequences. However, Fudge[Fud84] (chapter 5) gives rules to deal with compounds.

5.2 Stress Prediction

The study (described in this chapter and extended in chapter 6) concentrates upon building a stochastic grammar model of stress based upon word class and the prosodic mark co-occurrence table.

If we can collect a number of "measures" of the relationship between word classes and prosodic marks, we can combine these measures together. Each differing measure of likelihood of relationship forms a constituent of the overall measure of relationship. Using a number of such constituents to relate one entity (such as word class) to another (prosody) is what Atwell[Atw83] has called constituent likelihood.

In the case of this research the measures are probabilities (or estimates of probabilities) of co-occurrence between word class and prosodic marks.

In the model developed here the distinction between different tones will be ignored and only two types of annotation (stress markers and unstressed markers) will be considered. Stress markers are considered to be any of the tonic stress marks or stressed but unaccented marks, whereas unstressed markers are considered to exist on words which have no prosodic marks whatsoever or only have the marks ↑ or ↓ (although if these exist in combination with a TSM the word is considered stressed). Since, with the assumptions made above, there will be no more than a single stress occurring within each word, one can look at a sequence of words as a sequence of stress and unstress markers. Each word is either stressed (i.e. contains a stressed syllable) or unstressed (no syllable is stressed). Since this is a binary string, for a sequence of n words there are 2^n possible sequences.

As an example: a three word utterance ("at Ford motors" (section B04 line 51)) would have eight possible stress patterns, where each word is in one of two states.

    1  at   Ford   motors
    2  at   Ford   *motors
    3  at   *Ford  motors
    4  at   *Ford  *motors
    5  *at  Ford   motors
    6  *at  Ford   *motors
    7  *at  *Ford  motors
    8  *at  *Ford  *motors

Here * indicates (some type of) stress on the word (others being unstressed). If we wish to assign


stress to the appropriate words in an utterance, we need to find which of the possible 2^n sequences are valid or acceptable. One way of doing this is to assign a score to each sequence and pick the highest scoring sequence as the pattern for the utterance. The scores are designed such that the "baseline" stress patterns generate the best scores.

5.2.1 Search Mechanism

For limited sizes of n the number of sequences is small enough to do a global search. If n were large, or if we were dealing with more than two possible annotation types for each word, then it would be necessary to use an alternative search methodology to cut down on the computational load. For example, for a 10 word sequence with two annotation types there are only 1024 possibilities, but if there were, say, five annotation types this number rises to 5^10 = 9,765,625, which is too large to search exhaustively in reasonable time. Alternative search possibilities exist (see section 6.3.4) where more annotation types are considered.

5.2.2 Scoring

How does one assign scores to the sequences? Various factors appear to be relevant, including semantics, pragmatics, word class and context, and clearly all of these could be used to provide components of the score. Since the aim of this research is to demonstrate a relationship between syntax and prosody, we will only use measures derived from word class and context, and will not attempt to implement any measures based upon semantics or other relevant factors.

One could try the following formula for scoring each possible sequence of length w words, where the function a_n gives the stress-state (or annotation) of word n (i.e. either stressed or unstressed), the function w_n gives the word class of word n in the sequence, and the function S(p, q) gives the probability (or likelihood) that word class p would have annotation q:

    score = \prod_{n=1}^{w} S(w_n, a_n)    (5.1)

That is: if we know the word class for each word in our utterance, and if we know the probability of each word class being associated with a stressed (or unstressed) word, then we can multiply all the probabilities together to give a value we call the score. (It is not actually necessary to use probabilities; any measure of likelihood would be acceptable. But using probabilities (in reality an estimate of the probability) means that all the measures of likelihood will always be in the range 0.0 to 1.0, and hence one does not have to worry about overflow errors when using very long sequences. Underflow errors can, of course, occur, but this is easy to handle since any sequence score that underflows is going to be very unlikely and therefore will not be a candidate.)
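A minimal sketch of this exhaustive scorer, assuming the per-class stress probabilities are already available (the values used below are taken from the worked example that follows):

    from itertools import product

    # P(stressed | word class); the unstressed probability is the complement.
    # Values for II, NP1 and NN2 follow the worked example below.
    P_STRESSED = {"II": 0.12, "NP1": 0.91, "NN2": 0.92}

    def score(tags, pattern):
        """Equation 5.1: product of per-word annotation probabilities.
        pattern is a sequence of 'S'/'U' states, one per word."""
        total = 1.0
        for tag, state in zip(tags, pattern):
            p = P_STRESSED[tag]
            total *= p if state == "S" else 1.0 - p
        return total

    def best_pattern(tags):
        # Global search over all 2^n binary stress patterns.
        return max(product("US", repeat=len(tags)),
                   key=lambda pattern: score(tags, pattern))

    print(best_pattern(["II", "NP1", "NN2"]))  # ('U', 'S', 'S')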

By summing frequencies for all stress marks in the tonic stress/word class co-occurrence table for each word class, the two values (the frequency of being stressed and the frequency of being unstressed) can be used to calculate probabilities of each word class being stressed or unstressed. For NP1 (see table D.3) we get the frequency of NP1 being unstressed at 117, and the frequency of NP1 being stressed at

    91 + 52 + 129 + 191 + 131 + 191 + 37 + 155 + 222 = 1199

The probability of NP1 being unstressed is then

    117 / (117 + 1199) = 0.09

and the probability of NP1 being stressed is

    1199 / (117 + 1199) = 0.91

Given that the word class is known for each word in the utterance(2), we can now state the likelihood of stress being present on each word. As an example consider:

2. It is assumed that this information may be derived automatically using a tagging system such as CLAWS.


    Word                              at      Ford    motors
    Word class                        II      NP1     NN2
    Probability of being stressed     0.12    0.91    0.92
    Probability of being unstressed   0.88    0.09    0.08

This means, for example, that the word class NP1 (singular proper noun) has a 91% chance of being stressed and a 9% chance of being unstressed. Then for each of the possible stress sequences the scoring would be as follows (S represents a stressed annotation, U represents an unstressed annotation, so USS means that "at" will be unstressed whilst both "Ford" and "motors" will be stressed):

    pattern   calculation               score
    UUU       0.88 × 0.09 × 0.08    =   0.006
    UUS       0.88 × 0.09 × 0.92    =   0.073
    USU       0.88 × 0.91 × 0.08    =   0.064
    USS       0.88 × 0.91 × 0.92    =   0.737
    SUU       0.12 × 0.09 × 0.08    =   0.001
    SUS       0.12 × 0.09 × 0.92    =   0.010
    SSU       0.12 × 0.91 × 0.08    =   0.009
    SSS       0.12 × 0.91 × 0.92    =   0.101

Using this simple scoring scheme reasonable results are achieved (in the order of half of the predictions matching the corpus annotation). In this case "at *Ford *motors" is the winning sequence by a long way; however the next best sequences "*at *Ford *motors", "at Ford *motors" and "at *Ford motors" are all plausible.

5.2.3 Performance Measures

An important consideration is now evident: how does one rate the performance of the annotations? That is: how does this generated annotation compare with real speech? This is a difficult question, but using the corpus as a guide, two possibilities are (see the sketch after this list):

- calculate the percentage of words whose annotation marks are the same as those annotated in the corpus;
- calculate the number of (tone unit) sequences that are entirely the same as those annotated in the corpus.
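Both measures reduce to simple counting over aligned predicted and corpus annotations. A minimal sketch, assuming annotations are given per tone unit as 'S'/'U' strings (the example data is invented):

    def word_accuracy(pred_units, corpus_units):
        """Percentage of words whose predicted mark matches the corpus."""
        pairs = [(p, c) for pu, cu in zip(pred_units, corpus_units)
                 for p, c in zip(pu, cu)]
        return 100.0 * sum(p == c for p, c in pairs) / len(pairs)

    def tone_unit_accuracy(pred_units, corpus_units):
        """Percentage of tone units predicted entirely correctly."""
        hits = sum(pu == cu for pu, cu in zip(pred_units, corpus_units))
        return 100.0 * hits / len(pred_units)

    # Invented example: two tone units, one wrong word in the second.
    pred   = ["USS", "USU"]
    corpus = ["USS", "USS"]
    print(word_accuracy(pred, corpus))       # 83.3...
    print(tone_unit_accuracy(pred, corpus))  # 50.0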

The latter performance measure was used initially, since synthesis worked on a sequence of several words at once, and one error in a tone unit might upset a listener's perception of normality. However this performance measure proved to be too inflexible to register minor improvements in performance (the main reason for use of such a measure is to assess the function of the model). Indeed the majority of errors that indicated poor performance were likely to be unpredictable without additional information such as semantics (after all, one would not expect 100% performance from this type of model). Such errors overshadowed those errors that could have been improved upon within the constraints of the model. It was also discovered that sequences that contained errors were only wrong in one place 69% of the time. The change to the former performance measure brought understandably higher percentage figures but, most importantly, provided better insights into how changes affected the synthesis results. Neither of these measures is wholly satisfactory, because they do not take account of the fact that alternative annotations to those given in the corpus may be acceptable. The matter will be returned to later.

5.2.4 Context

Something to notice about this simple scheme is that it takes no account of the order of the words. Hence "at Ford motors" (II NP1 NN2) and "Police in Yorkshire" (NN2 II NP1) will attain the same scores regardless of the word order, and regardless of whether the change in order dramatically changes the way that each word would be stressed (it is unlikely to in this case). There is strong reason to suggest that the order of stress annotations is important: as noted in section 4.2.3, there is a tendency for a TSM to come at the end of a tone unit, and it is relatively likely for an unstressed word to appear at the start of a tone unit. A refinement to formula 5.1 can take account of the sequence order. To calculate this we can use the probability of a stress occurring at each word and the probability of the stress sequence. Ideally a value is needed that represents the relative likelihood (or probability) of each sequence of word classes and stress annotations. However these values are difficult to extract, because the corpus is not large enough to provide enough examples of each word class sequence in different stress patterns to give reliable probability or likelihood measures.

Although it is not possible to extract likelihood measures for any arbitrary sequence of word classes, it is possible to approximate this by using likelihoods for shorter sequences and overlapping them. Making use of fixed length short sequences also considerably simplifies calculation of the score.

For example the sequence II NP1 NN2 could be divided into the two sequences II NP1 and NP1 NN2. The likelihood for II NP1 NN2 can be estimated as the product of the likelihoods for the sequences II NP1 and NP1 NN2. Two-symbol sequences like this are known as bigrams(3).

The new scoring metric, given in equation 5.2, includes components for both the likelihood of stresses occurring with word classes and the transition likelihoods for bigram sequences of word classes with specified stress annotations:

    score = \prod_{n=1}^{w} S(w_n, a_n) \times \prod_{m=2}^{w} B(w_{m-1}, a_{m-1}, w_m, a_m)    (5.2)

where S(p, q), w_n and a_n are as before, and B(p, q, p', q') is the probability (or likelihood) of the bigram of word class p followed by word class p', where p has annotation q and p' has annotation q'.
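Extending the earlier sketch with the bigram term of equation 5.2; the transition values here use the single-group simplification introduced below, so B depends only on the two stress states (values from table 5.1). Again this is a minimal illustration rather than the thesis implementation, and it reproduces the score table below to within the rounding of the quoted probabilities.

    from itertools import product

    P_STRESSED = {"II": 0.12, "NP1": 0.91, "NN2": 0.92}

    # Single-group simplification of B: the transition probability depends
    # only on the stress states, not the word classes (table 5.1).
    TRANS = {("U", "U"): 0.17, ("U", "S"): 0.39,
             ("S", "U"): 0.21, ("S", "S"): 0.23}

    def score(tags, pattern):
        """Equation 5.2: state probabilities times transition probabilities."""
        total = 1.0
        for tag, state in zip(tags, pattern):
            p = P_STRESSED[tag]
            total *= p if state == "S" else 1.0 - p
        for prev, cur in zip(pattern, pattern[1:]):
            total *= TRANS[(prev, cur)]
        return total

    tags = ["II", "NP1", "NN2"]
    for pattern in product("US", repeat=len(tags)):
        print("".join(pattern), f"{score(tags, pattern):.2e}")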

Bigram Probabilities

For the bigram probabilities four tables are needed, one for each of the possible stress transitions:

- unstressed → unstressed
- unstressed → stressed
- stressed → unstressed
- stressed → stressed

Each table has to hold the likelihoods for the transition from any of the 168 word classes to any other word class.

3. A bigram is a sequence of two symbols, such as two word class tags that follow each other in a text. Trigrams, or more generally n-grams, are three or n symbol sequences.

Four tables of 168 × 168 cell locations still need very many more samples than are available in the corpus of approximately 28,000 words(4) to produce reasonable likelihood measures. It is clear that even these values cannot be extracted. One (possibly extreme) solution to this problem (but see section 5.4) is to assume that all word classes behave similarly in these bigram probabilities. In all likelihood many word classes will behave similarly, and their likelihoods could be combined to give groups of word classes that occur similarly with regard to stress pattern/word class orders. The extreme case of assuming that all word classes behave similarly (which is equivalent to a single group) means that only four values (one per table) need to be estimated. In fact in this approach word classes are ignored. This is a simplification of B(p, q, p', q') such that p and p' are irrelevant and are ignored, or more generally all word classes are mapped to the same group. The grouping idea will be returned to later. These four probabilities can be calculated from the corpus prosodic annotations and are shown in table 5.1.

    first \ second   unstressed   stressed
    unstressed       0.17         0.39
    stressed         0.21         0.23

Table 5.1: Stress Transition Table.

This means that, for a bigram where the first word is unstressed and the second word is stressed, the probability is 0.39.

4. Section 4.1 lists the categories of the corpus used for this research. Note that some sections of these categories are also omitted because of problems between the various corpus versions.
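Estimating these four values is again a counting exercise over the aligned annotations. A minimal sketch, assuming each tone unit is available as a string of 'S'/'U' word states (the example data is invented), normalising over all four cells jointly as in table 5.1:

    from collections import Counter
    from itertools import pairwise

    # Each string is one tone unit: per-word stress states ('S'/'U').
    # The data here is invented for illustration.
    tone_units = ["USS", "UUSUS", "USUS"]

    counts = Counter(pair for unit in tone_units for pair in pairwise(unit))
    total = sum(counts.values())

    # Normalise the four cells to estimate the table 5.1 probabilities.
    for pair, freq in sorted(counts.items()):
        print(pair, round(freq / total, 2))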


For a given stress pattern the following probabilities can now be laid out, where the transition (bigram) probabilities appear between words and the state probabilities (the probabilities of word classes being stressed or unstressed) are listed below each word class. This example is for the specific stress annotation "at *Ford *motors".

    word                        at        Ford      motors
    word tag                    II        NP1       NN2
    transition probability          0.39      0.23
    state probability           0.88      0.91      0.92

The product of all these probabilities is calculated (using equation 5.2), giving the overall likelihood for this utterance. This is repeated for all possible patterns and the highest scoring pattern is selected, giving the "most likely" binary pattern.

Here are the new values for the example (note that although the order of most sequences is the same, UUS and USU are reversed):

    pattern   score
    UUU       1.7 × 10^-4
    UUS       4.8 × 10^-3
    USU       5.2 × 10^-3
    USS       6.6 × 10^-2
    SUU       3.6 × 10^-5
    SUS       8.2 × 10^-4
    SSU       4.3 × 10^-4
    SSS       5.3 × 10^-3

5.2.5 Boundary Conditions

Until now the subject of boundary considerations has been ignored. It is possible to calculate bigram frequencies from the corpus prosodic annotations for stress annotations after and before prosodic boundaries (see tables 5.2 and 5.3) and incorporate these into the calculations. Assuming that the sequences processed with the model are whole tone units (bounded on either side by a tone unit boundary), the additional probabilities (a tone unit boundary followed by either a stressed or an unstressed annotation, dependent upon the stress state of the first word, and likewise for the last word in the tone unit) can be applied by multiplying them together with the value given by formula 5.2.

                  TU boundary
    stressed      0.1781
    unstressed    0.0139

Table 5.2: Probability of a tone unit boundary following a stressed or unstressed word.

                  stressed   unstressed
    TU boundary   0.0690     0.1231

Table 5.3: Probability of a stressed or unstressed word following a tone unit boundary.

For the example sequence "|| at *Ford *motors ||" the score given above (6.6 × 10^-2) would be multiplied by 0.1231 for the "|| at" bigram and by 0.1781 for the "*motors ||" bigram.

This model performs no kind of boundary prediction. In the results presented below the original boundaries were taken from the corpus prosodic annotations. Boundary conditions do affect the results (there is a greater tendency for a stress at the end of a tone unit than at the beginning), but the models in this research are not concerned with boundary prediction or modelling, even though their use here does improve the model's performance. Initially a distinction was made between boundary types, but this was later dropped and all boundary types are treated the same, because there was no significant difference attained by making the distinction.
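Incorporating the boundary terms only adds two factors per tone unit. A minimal self-contained illustration, with the probability values taken from tables 5.2 and 5.3:

    # P(first word state | tone unit boundary precedes), table 5.3, and
    # P(boundary follows | last word state), table 5.2.
    AFTER_BOUNDARY = {"S": 0.0690, "U": 0.1231}
    BEFORE_BOUNDARY = {"S": 0.1781, "U": 0.0139}

    def with_boundaries(pattern, base_score):
        """Apply the tone unit boundary terms to an equation 5.2 score."""
        return AFTER_BOUNDARY[pattern[0]] * base_score * BEFORE_BOUNDARY[pattern[-1]]

    # "|| at Ford motors ||" with pattern USS and base score 6.6e-2:
    print(with_boundaries("USS", 6.6e-2))  # ~1.45e-3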

5.3 Performance

To assess the accuracy of the model (see source code in section F.9) it was applied to the training data in the corpus(5), and the predicted stress patterns were compared with those transcribed by the experts.

5. It is usual to use testing data to assess a model's usefulness, but this is not the purpose here: here we assess how well the model matches the training data. Testing data would then be used to compare with the training data results, to see if the model worked as well for unseen data. Category M was reserved for checking the model's generality, and results for this are presented in chapter 6.


Category Speech Style %BJW %GOK %ALL<br />

A Commentary 88 94 91<br />

B News Broadcasts 90 95 92<br />

C Lecture(general) 88 94 90<br />

D Lecture(specialist) 92 95 93<br />

F Magaz<strong>in</strong>e Report<strong>in</strong>g 85 94 90<br />

Table 5.4: Performance statistics for stress prediction model. Percentage <strong>of</strong> words which are<br />

correctly stressed/unstressed <strong>in</strong> comparison to the two expert annotations <strong>and</strong> overall.<br />

Category Speech Style %BJW %GOK %ALL<br />

A Commentary 50 74 64<br />

B News Broadcasts 54 82 69<br />

C Lecture(general) 55 75 66<br />

D Lecture(specialist) 66 79 73<br />

F Magaz<strong>in</strong>e Report<strong>in</strong>g 43 76 64<br />

Table 5.5: Performance statistics for stress prediction model. Percentage <strong>of</strong> completely correct<br />

tone units <strong>in</strong> comparison to the two expert annotations (BJW: Briony Williams, <strong>and</strong> GOK: Gerry<br />

Knowles) <strong>and</strong> overall (ALL).<br />

The results are summarized in table 5.4. The performance of the model is good, averaging over 90% agreement with the annotations in the corpus for a range of speech styles; it seems especially good with the more formal speech styles in the specialist lecture and news categories.

It is interesting to note that, using the second performance measure listed earlier, the results in table 5.5 were attained. The interesting point is not that the values are lower (as this would be expected) but that the values for the different transcribers are significantly different.

The values in the last three columns show the percentages of correct predictions for tone units (average length 4 words) in each category. The percentage correct is the percentage of tone units whose predicted stress pattern completely agrees with that in the transcription. There is an average 24% difference between the accuracy of the model when applied to each of the transcribers' sections of the corpus. This does not mean that one transcriber was better or worse than the other, as both transcribers are experts in the field and were approaching the same task. However it does go some way to illuminating the difficulty in producing prosodic transcriptions. Prosodic factors are complex perceptions, and a person's perceptions are flavoured by interests, attitudes, environment etc. It is reasonable to expect a level of "perceptual variability" between any two transcriptions. It is not surprising, therefore, that one transcriber had a tendency to produce transcriptions more consistent with syntactic structure (he having been working in that area for many years), whilst the other transcriber produced transcriptions more consistent with acoustic measures of speech (she working on speech and intonation synthesis at that time). Since the model is based on syntactic class it is clear why the percentage figures for GOK's transcribed sections were larger. This difference is amplified by the metric used.

5.4 Improvements

As suggested earlier, clustering word classes into groups with similar probabilities will enable us to estimate word class/stress state transition probabilities for groups of word classes, and thus side-step the problem of low sample sizes. If the assumption made above, that word class is not important for these probabilities, is wrong, then the estimated group-based probabilities will perform better than the non-group-based probabilities used before. The way in which the groups are formed is the subject of the following.

There are potentially many ways to group word classes. Initially they were grouped upon the ratio of frequencies of stressed to unstressed instances of each word class. That is, the probabilities of each word class being stressed and unstressed were used to plot a point on a graph. All points lie on the line

    y = 1 - x   where 0 ≤ x ≤ 1

An arbitrary thirteen groups were produced by dividing this line at arbitrary points, and the group stress transition probabilities were extracted for each group from the aligned prosodic and syntactic annotations (this produced four tables of 13 × 13 = 676 cells). This was achieved by summing the transition frequencies for each word class, then adding together the frequencies for each word class in each group, translating between word classes and groups. These transition probabilities were then used in place of the probabilities in the stress transition table: each word class is mapped onto its appropriate group and the group-to-group transition probabilities used. Initial word class state probabilities were used as before, as opposed to calculating group state probabilities.

The group probabilities made no new mistakes and, when tested upon a sample of 50 previously erroneously predicted sequences, corrected 16% of them. This translates to an overall 3%-4% improvement, thus demonstrating that grouping word classes to aid estimation of transition probabilities improved the model. However, the expected improvements were not forthcoming when the testing was scaled up.

This failure prompted a closer examination of the groupings. There is a problem with grouping word classes in this way: consider a very low frequency word class (for example NNSA1, which has one instance in the sub-corpus sample used). A word class with only one instance will be placed at either end of the stressed/unstressed continuum, and no account is taken of the behaviour of other word classes that might be similar. This is true for other low frequency word classes: it is quite possible for a single extra example of the word class to shift which group it would be placed in.

A different grouping scheme could be to use the transition probabilities themselves as a guide to group formation, but this suffers from all the problems mentioned earlier and is therefore not practical. A more realistic scheme is to use the word class/prosodic mark co-occurrence frequencies as a guide (see section 4.3.3).

To avoid the problem of low frequency word classes affecting the groupings, the clustering was performed on the 64 most frequent word classes. This was an arbitrary cut-off point. The remaining word classes were then placed into groups using the similarity of word classes already in the groups as a guide; this was performed somewhat ad hoc. The clustering into groups was performed using the method described in Hughes[Hug94], where the distance between two word classes, as defined by the distance between the vectors of prosodic mark co-occurrence figures for each word class, was calculated and the closest word classes were merged. The vector for each word class contained the probabilities (i.e. normalised frequencies) for co-occurrence with unstressed words, levelly stressed words, stressed but unaccented words and words with stress accent.


1 2 3 4 5 6 7 8 9<br />

&FW EX AT CCB DA &FO NNS ND1 DD1<br />

APP$ IF AT1 CF DA1 DD NNS1 NN1 DD121<br />

CC PNQO BTO CS DA2 MC NNS2 NN121 DD122<br />

CC31 PNQS BTO21 CS21 DA2R MC{MC VVD NN122 DD2<br />

CC32 PPH1 BTO22 CS22 DAR MC1 VVG NN2 DD21<br />

CC33 PPIO1 CSN CSA DAT MC2 VVZ NNJ DD22<br />

II PPIO2 CST CSW DB MD NNJ1 DD221<br />

IW PPIS1 II22 DDQ DB2 XX NNJ2 DD222<br />

PN PPIS2 II31 DDQ$ JA NNL1 II21<br />

PN1 VB0 II33 DDQV JB NNL2 II32<br />

PN121 VBDZ IO ICS JBR NNT1 MF<br />

PN122 VBN PPY LE JJ NNT2 NNO<br />

PP$ TO NNSA1 JJR PPX1 NNO2<br />

PPHO1 NNSB1 JJT PPX121 NNU<br />

PPHS1 PPHO2 NN PPX122 NNU1<br />

UH PPHS2 NP PPX2 NNU2<br />

VBDR VBR NP1 PPX221 NNU21<br />

VBG VH0 NP2 PPX222 NNU22<br />

VBM VM RA NPD1<br />

VBZ VM21 RL NPM1<br />

VD0 VM22 RL21 REX<br />

VDD VMK RL22 REX21<br />

VDG RR REX22<br />

VDN RR21 RG<br />

VDZ RR22 RG21<br />

VHD RR31 RG22<br />

VHG RR32 RGA<br />

VHN RR33 RGQ<br />

VHZ RRQ RGQV<br />

RRQV<br />

RGR<br />

RRR<br />

RGT<br />

RRT<br />

RP<br />

RT<br />

VV0<br />

VVN<br />

ZZ1<br />

Table 5.6: Word classes in the groups.

This is repeated until all word classes are merged together. From the resulting distances a dendritic diagram can be produced (see figure 4.3). By cutting vertically through the lines of the dendritic diagram, the word classes may be divided into a number of groups. This was done to yield nine groups. The word classes in each group are presented in table 5.6.

After all word classes have been placed into a group it is possible to calculate the transition probabilities. This was performed using the two programs transition and transgroups; see sections F.6 and F.7 respectively.


Category  Speech Style          %Correct
A         Commentary            92
B         News Broadcasts       93
C         Lecture (general)     92
D         Lecture (specialist)  93
F         Magazine Reporting    93

Table 5.7: Performance statistics for the stress prediction model using group transition probabilities.

The first program calculates the absolute transition frequencies for each word class (that is, the number of times each word class/prosodic mark pair is followed by another word class/prosodic mark pair), and the second maps the word classes (and their associated frequencies) into groups before normalising the frequencies into the range 0 to 1, giving the estimates of probability.

As before, only two stress states are of concern here: stressed and unstressed. However, transitions to and from tone unit boundaries are also calculated, so a special group is assigned to the role of representing tone unit boundaries, although it is clearly not possible for a tone unit boundary to be stressed or unstressed. For convenience and simplicity they are always treated as unstressed by convention, even though this has no meaning.
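A sketch of this estimation, representing the aligned corpus as a sequence of (word class, stress state) pairs with tone unit boundaries included as items. The names are illustrative, not those of the transition/transgroups programs in the appendices.

```python
from collections import Counter, defaultdict

# Boundaries are treated as unstressed by convention, in their own group.
BOUNDARY = ("TU-BOUNDARY", "unstressed")

def transition_probabilities(aligned, group_of):
    """Estimate P((group', mark') | (group, mark)) from bigram frequencies.

    aligned: sequence of (word_class, mark) pairs, with "|" standing for a
    tone unit boundary; group_of maps a word class to its group.
    """
    states = [BOUNDARY if wc == "|" else (group_of[wc], mark)
              for wc, mark in aligned]
    counts = defaultdict(Counter)
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    # Normalise the absolute frequencies into the range 0 to 1 to give
    # the estimates of probability.
    return {s: {t: n / sum(c.values()) for t, n in c.items()}
            for s, c in counts.items()}
```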

This produced results (see table 5.7) equivalent to those shown in table 5.4, demonstrating that clustering can produce results at least as good as those produced earlier, in this case slightly higher.

5.5 Summary

With performances as high as 95% for some categories (see table 5.4, categories B and D), it is obvious that the model performs very well. A performance of 100% correct would be remarkable, and it is clear that the model's performance is approaching a limit in terms of the improvements possible using just word class and bigram frequencies.

It is highly likely (but outside the scope of this work) that contextual information (such as given information) could improve performance.

The results are encouraging enough to try to expand the model to predict more than two types of prosodic mark; that is, to expand the model such that it produces a potential prosodic annotation for a sequence of word classes. In addition this will quantify the relationship between word class and stress accents. This is the subject of the next chapter.



Chapter 6

Automatic Prosodic Annotation

6.1 Introduction

Chapter 5 described a model for predicting stress patterns for word class tagged text. This chapter concentrates upon extending that model to make predictions about stress accents¹. The basic function of the model is as described in the previous chapter; for a background description of the mathematics and functioning of the model, refer to chapter 5.

6.2 Expanding the Model

The stress prediction model (SPM) developed in chapter 5 showed that there is a good degree of relationship between word class and stress, such that for over 91% of word classes the stressed versus unstressed state of the word may be predicted. That is, a probabilistic model has been derived which can generate acceptable levels of stress annotation from word class sequences. This shows that stress is closely related to word class: a widely held belief, but one that had not until now been quantified in such a way, owing to the unavailability of hand annotated machine-readable corpora such as the SEC.

¹ Whether a stress has a rising, falling or level pitch.



This chapter extends the model to show that this relationship also holds, to a lesser extent, for stress accents. That is, it shows that there is a relationship between word class and pitch accents, and that to a certain degree stress accents can (reasonably accurately) be generated from word classes.

Three points, however, need to be made. Firstly, the degree to which it is reasonable to expect such a model to function accurately will be less than for the previous model. This is partly due to the increased diversity of the possible annotations, and partly due to the fact that the stress prediction model is a more general case than that of stress accent prediction, so in predicting accents we are refining the model's behaviour.

Secondly, as has repeatedly been pointed out, it is unreasonable to attempt to deduce an accurate statistically based model from a corpus as small as the SEC. That is not to say it is impossible, but any models will be subject to deficiencies in the same areas where the corpus suffers from deficiencies. For example, the tonic stress mark for rise-fall is so infrequent in the SEC as to make it impossible to model statistically. The same is true for certain word classes, e.g. NP2 and REX. In general it is still possible to model a category (prosodic or syntactic) accurately despite low frequency if and only if it is highly constrained: the few examples are then enough to illustrate the category's behaviour definitively.

Finally, stress seems to play an important role in giving information about utterance (or sentence) structure, whereas stress accents also play important roles in giving semantic and contextual information. The effects of these will not be modellable within the constraints of this research.

For these reasons the model developed here does not attempt to reproduce the exact set of annotations with which the corpus is annotated, but instead uses a subset, as described and justified in section 6.3.1 below. This brings up the issue of how to compare the two annotation schemes. Section 6.4 describes an alternative metric² to that used previously (i.e. an exact match with the corpus annotations) that attempts to overcome these problems.

There is, and continues to be, a real lack of ability to assess the general performance of any such model, since it is beyond current capability to know what comprises a good prosodic annotation for a given word class sequence (unless, of course, one is an expert in prosodic annotations³). An expert transcriber, or a linguist who had worked with prosody, would have an intuitive feel for which transcriptions were acceptable or natural. No system has yet been invented that can do this. In addition, no system has been invented that can generate speech with the appropriate stresses and intonations from such transcriptions (thus performing the inverse role of the transcriber).

One possibility (see section 6.4) would be to synthesize utterances, or to manipulate pre-recorded utterances with the desired changes in fundamental frequency, intensity, duration and other prosodic features to reflect the stress accent predictions, and to submit these to listening tests whereby subjects assess the naturalness of each synthesized utterance. Attractive though this is, it is not possible at present because of a lack of knowledge of how (and if) the prosodic annotations relate to acoustic features such as fundamental frequency, intensity etc.

² None of the metrics used for assessment are wholly satisfactory.
³ Unlike the author.

6.3 Model Design

The model used in this chapter works on the same principles as the stress prediction model, but with two important differences: firstly, the range of annotation symbols is increased from stressed and unstressed to incorporate rises and falls (see below); secondly, transition probabilities are estimated from groupings of word classes using their prosodic similarity (in the ultimately described model).

There are two basic details to consider in the design of the model: what range of word classes the model should handle, and which aspects of the prosodic annotation will be modelled. There are also two sets of parameters that need to be estimated from the corpus: the probability of co-occurrence between word classes and prosodic marks, and the transition probabilities of a pair of word classes with appropriate prosodic marks.

The choice of which word classes to model is largely dictated by those word classes that will be in the input stream. However, as pointed out above, it would be impossible to model some word classes because of their low frequency. The action taken was to attempt to model all word classes, however erroneous that may be for some of them. In attempting to model such word classes the overall performance of the model will be reduced. It should be noted, however, that by their very nature these word classes hardly occur, and so each badly modelled word class will not have a significant impact upon the performance.

A related factor becomes obvious when we want to assess whether changes to the model are improvements or not. Since most errors of the sort caused by poor modelling occur in those word classes with ill-defined probabilities (i.e. those with low frequencies), performance changes are unlikely to be large and hence will be difficult to assess. It would be quite justifiable to ignore the annotations on some of the word classes output from the model (if they were in the poorly defined class), but this would leave the problem of side effects over transitions between these word classes and their neighbours. In order not to complicate the analyses more than is necessary, this is not done. Performance figures are given for the more frequently occurring word classes; see section 7.3.3.

6.3.1 Choice of Prosodic Marks

The corpus has a wide range of prosodic marks: high and low variants of the tonic stress marks (fall, rise, fall-rise, rise-fall and level), the stressed-but-unaccented mark and unstressed, together with the high and low reset modifier symbols. It is unreasonable to expect this model to perform well with so many possibilities, for the reasons given above, and the search space would become unreasonably large; this must be reduced. Dropping the high/low distinction (as in Roach [Roa91]) and the rise-fall mark, which is very infrequent, reduces the set of marks to six; high and low resets are generally ignored in these models. The stressed-but-unaccented mark has been described as ill-defined⁴, and a decision was made to merge the low and high level tones with it to give a new mark (denoted by the same symbol, but meaning simply stress or level stress). This results in five possible prosodic classes: rise, fall, fall-rise, stressed and unstressed. Unstressed words will continue to carry no mark.

⁴ Gerry Knowles, in private conversation.



For convenience we will continue to use the original symbols to represent these prosodic marks, but it must be remembered that these symbols no longer distinguish between high and low levels of marks.

6.3.2 Estimation of Probabilities

Estimating a model of stresses or tonic stress marks statistically from a corpus would normally require enough data for each entity to be modelled. As pointed out in section 5.2.4, the SEC is not a large corpus, and direct methods of estimating co-occurrence probabilities are prone to error. However, in section 5.2.4 an assumption was made that all word classes behave similarly with regard to stress transition likelihoods.

This is not a valid assumption (though it was a useful way to simplify the calculation of the likelihoods), because improved results may be achieved by grouping word classes in terms of their behaviour and estimating likelihoods for each group (see below on transition probabilities). If there is a sufficient number of examples of the different word classes in each group (which thus imposes some constraints upon how groups are composed), the likelihoods of co-occurrence between word classes (or, more accurately, groups) and tonic stress marks can be estimated with reasonable accuracy. Section 4.3.3 discusses grouping word classes using the 64 most frequent as a guide; the remaining word classes, which are of low frequency, can be inserted into groups based upon the similarity of the word class types. It should be noted that the clustering of word classes into groups performed in section 4.3.3 is based not upon the transition likelihoods but on the co-occurrence likelihoods; however, the enhancements to the model provided here assume that word classes with similar co-occurrence likelihoods will also have similar transition likelihoods. This is described in more detail in section 5.4.

Estimation of State Probabilities

It would be possible to use the state probabilities for the group associated with each word class. This would mean that low frequency word classes would use an "improved" set of probabilities, but at the expense of the high frequency word classes. Since these latter word classes are the most frequent, it is desirable to have them as accurate as possible. The decision made here is to use the state probabilities derived from the co-occurrence table directly, individually for each word class. An alternative would be to use the group state probabilities for low frequency word classes only; the problem is that it is still not clear to which group a low frequency word class should belong, and consequently this approach has not been explored, although there is reason to suspect some improvement in performance.
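A minimal sketch of the direct estimation chosen here, assuming `cooc` maps each word class to its prosodic mark frequencies (the name is illustrative):

```python
def state_probabilities(cooc):
    """State probabilities taken directly from the co-occurrence table,
    individually for each word class (no group smoothing)."""
    return {wc: {mark: n / sum(marks.values()) for mark, n in marks.items()}
            for wc, marks in cooc.items()}
```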

Estimation of Transition Probabilities

The estimation of the transition probabilities can be attempted in three ways. Firstly, ignore word class and use prosodic mark transition probabilities; this is equivalent to the special case of a single word class group. Secondly, the transition probabilities for each and every word class could be estimated from the alignment data produced in chapter 3; this is equivalent to having groups with a single word class in each. Finally, a compromise could be struck using a small number of groups formed on the basis of similarity. Although there is good reason to expect this compromise to be the most beneficial, the construction of groups is still an uncertain process and has a significant impact upon results, as can be seen from the two approaches taken in section 5.4. The estimation of probabilities is fraught with low sample size problems: each group must contain a sufficient number of word classes with a representative sample of transitions. For this reason the transition probabilities are estimated on the single group basis, which has been shown to work, if not optimally. It is also important not to over-complicate the model initially. In section 6.3.4 the concept of a composite model is introduced. This model uses aspects of the model developed in the previous chapter along with the model developed here, which, of course, leads to two sets of transition probabilities. The transition probabilities for the first stage of the composite model use the groupings described in section 5.4, and those for the latter stage use the single group transition probabilities mentioned above.



6.3.3 The Model

The formula for the model is very similar to that given in equation 5.2. There are, however, now more annotation types: the five marks given in section 6.3.1. As in the original stress prediction model (SPM), transition probabilities are used with the assumption that word class is not important (i.e. the extreme case of one group).

The source code for this model is given in section F.10. The program is very computationally intensive because of the exhaustive search of the very large search space. In an attempt to limit computation time, the length of the utterances that it will handle is limited to 15 words (and/or tone unit boundaries). In most situations, however, utterances (divided by tone unit boundaries) will be much shorter than this. There is no special requirement that the utterance be bounded by tone unit boundaries, but it is worth noting that this will affect performance, since the presence of a tone unit boundary will affect the placement of tones, given the inclusion of TSM/tone unit boundary bigram likelihood constraints (see section 5.2.5). In particular, it may be found that TSMs will not be placed at the end of what would be a tone unit if a tone unit boundary is omitted.
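The search itself can be sketched as follows; this is an illustration of the approach, not the appendix program. Every assignment of the five marks to the words is scored by the product of its state and transition probabilities, and the highest-scoring assignment wins.

```python
from itertools import product

MARKS = ("fall", "rise", "fall-rise", "stressed", "unstressed")
MAX_LENGTH = 15  # utterances longer than this are not handled

def best_annotation(word_classes, state_p, trans_p):
    """Exhaustively search the space of prosodic annotations.

    state_p[wc][mark] is the word class/mark co-occurrence probability;
    trans_p[(mark, mark')] is the single-group transition probability
    (word class ignored, i.e. the extreme case of one group).
    """
    assert len(word_classes) <= MAX_LENGTH
    best, best_score = None, 0.0
    for marks in product(MARKS, repeat=len(word_classes)):
        score = 1.0
        for i, (wc, mark) in enumerate(zip(word_classes, marks)):
            score *= state_p[wc].get(mark, 0.0)
            if i > 0:
                score *= trans_p.get((marks[i - 1], mark), 0.0)
        if score > best_score:
            best, best_score = marks, score
    return best, best_score
```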

Boundary Considerations

In a real life application of the model it would be necessary to predict the locations of the tone unit boundaries. This is beyond the scope of this research, but it cannot be entirely ignored. The model makes no serious attempt to model tone unit boundaries other than assuming that punctuation gives rise to a boundary. As punctuation marks are classed as lexical items and assigned their own syntactic word tags in the SEC (and most other tagged corpora), this amounts to an appropriate mapping between syntax and prosody at a basic level. It is a very rough and ready rule which affects the performance of the model, and it is only described as a short term solution, since much work is being done in this area.

Table E.1 quantifies the relationship between punctuation and tone unit boundaries. It should be noted that this rule will miss approximately 52% of the boundaries, and approximately 9% of the boundaries generated will be inserted where they would otherwise not have existed. It is undoubtable that this would affect the accuracy of the model in some cases. In the results presented here and elsewhere in this thesis, the original tone unit boundaries (as transcribed in the corpus) were used for the assessment of the model; punctuation would only be used in situations where tone unit boundaries were not available.
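The stopgap rule amounts to no more than the following sketch. The tag set is a placeholder; in practice the test would check for the SEC's own punctuation tags.

```python
PUNCTUATION_TAGS = {",", ".", ";", ":", "!", "?"}  # illustrative tag set

def insert_boundaries(tagged_words):
    """Replace punctuation tokens with tone unit boundaries ("|").
    Misses boundaries not signalled by punctuation (about 52%) and
    inserts some spurious ones (about 9% of those generated)."""
    return [("|", "TU-BOUNDARY") if tag in PUNCTUATION_TAGS else (word, tag)
            for word, tag in tagged_words]
```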

6.3.4 Composite Model

The major failing of the model above is that it performs badly on the stress/unstress distinction. The stress prediction model, however, achieves a high success rate in this very area (over 91% on average). It is therefore desirable to try to capitalise on this.

A composite model (i.e. a model that combines both the model described above and the stress prediction model) may be devised by making use of the mechanism of the model developed in chapter 5 to select a number of candidate sequences and using these as input to the prosody prediction model. In this approach the stress prediction model is applied to utterances and the top few sequences are selected. Analysis of the model's performance has shown that the "correct" or most acceptable stress pattern was usually present in the top 5 patterns. The prosodic mark prediction model then operates on the search space defined by the top few patterns: the stressed words are allowed to vary between each of the accent marks, and the unstressed words are clamped as unstressed.

For example, the utterance "at Ford motors" as processed by the SPM gives performance scores as listed in section 5.2.4, with the winning sequence being "at Ford motors". This sequence would be constrained to the possibilities listed below, with "at" clamped as unstressed and the stressed words free to vary over the accent marks. Remember that the stressed mark has different meanings in the SPM and PPM.

    at Ford motors

In this case the winning sequence was "at Ford motors", which is considered to be equivalent to the corpus annotation "| at Ford motors |".

Of course the SPM is not perfect, and its errors will be passed on to the second phase of the model. A better solution would be more accurate estimates of the model parameters (the state and transition probabilities), as mentioned elsewhere.

Alternative search methods such as simulated annealing are not guaranteed to find the global minimum, whereas the above mechanism is highly unlikely to miss the global minimum and achieves a very large reduction in the search space; it is therefore reasonably hard to beat. There seems little justification in attempting to implement alternative search methodologies.
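A sketch of the composite scheme, assuming `spm_top_patterns` returns the k best stressed/unstressed patterns from the stress prediction model and `ppm_score` scores a full annotation under the prosody prediction model (both names are illustrative):

```python
from itertools import product

ACCENTS = ("fall", "rise", "fall-rise", "stressed")

def composite_predict(word_classes, spm_top_patterns, ppm_score, k=5):
    """Phase 1: take the top k stress patterns from the SPM.
    Phase 2: search only annotations consistent with those patterns,
    clamping unstressed words and letting stressed words range over
    the accent marks."""
    best, best_score = None, 0.0
    for pattern in spm_top_patterns(word_classes, k):
        slots = [ACCENTS if state == "stressed" else ("unstressed",)
                 for state in pattern]
        for marks in product(*slots):
            score = ppm_score(word_classes, marks)
            if score > best_score:
                best, best_score = marks, score
    return best
```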



Category  Speech Style          %Correct
A         Commentary            64
B         News Broadcasts       77
C         Lecture (general)     59
D         Lecture (specialist)  63
F         Magazine Reporting    61

Table 6.1: Performance scores for the training categories.

Search Space Reduction

The non-composite model described above suffers from a high computational load, which makes it unsuitable for real time applications: the search space is too large to search in reasonable time given current computing power. Although the global search mechanism could be replaced with an alternative, there is no need: the composite model described in section 6.3.4 massively reduces the computational load.

For example, a ten word utterance would have a search space of 5^10 patterns, compared with the composite model's search space of the order of 2^10 + 100, on the very reasonable assumptions that the composite model selects 5 patterns from the first phase to pass to the second phase and that, on average, half of the words in the utterance will be unstressed (5 words × 4 prosodic marks × 5 patterns = 100). We note that 2^10 + 100 ≪ 5^10.
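The reduction is easy to verify directly:

```python
>>> 5 ** 10        # full search space for a ten word utterance
9765625
>>> 2 ** 10 + 100  # composite: 2^10 SPM patterns + 5 words * 4 marks * 5 patterns
1124
```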

As noted in section 4.2.2, and by reference to figure 4.2, the average length of a tone unit is about 4 words. This would mean a search space of fewer than 130 patterns, which is easily searched in real time. A computer capable of 1 Mflop (and performing no other work) could achieve of the order of 1000 tone units per second, equivalent to more than 4000 words. For an assumed speech rate of five words per second this would account for approximately 1/8% of the computing power.

Performance Improvements

The performance statistics for the training categories of the model are presented in table 6.1. Results for the testing sections are given in table 6.3.



There is a whole range of alternatives that may be used to attempt to improve the performance of the model. One subset of these alternatives changes the transition probabilities. In addition to the bigrams, it is possible to calculate tri-gram probabilities for prosodic marks (these probabilities are for prosodic marks only, not for word class and prosodic mark combinations, since it has previously been noted that the latter cannot reliably be estimated from the corpus). The addition of these tri-gram probabilities does provide a small increase in the overall performance of the model (0.18%), but not one of any real significance.

Varying the number of sequences passed from the first phase to the second makes no difference to the performance of the model. It would be possible to vary the number of sequences passed on dynamically, based upon the closeness of the scores for each sequence in the first phase; this could further reduce the search space. The top two or three sequences are the most that is required.

Running the model with either the initial state probabilities or the transition probabilities alone reduces its performance to just over half of what is achieved when using both.

6.4 Model Assessment

The stress prediction model developed in chapter 5 could predict one of two possible prosodic marks for each word class. The metric used to assess how well the model worked was to count the number of times that each predicted mark matched the actual mark (or equivalent actual mark) in the corpus. This was quite acceptable. In the case of the model developed in this chapter the situation is more complex, since the model can predict multiple types of mark. Using the above metric would disadvantage the model, since it is clear that an error of one type may be worse than an error of another type. In fact, what may be construed as an error (in that it does not match exactly the annotation in the corpus) may be an acceptable prediction (in that a listener would not object to the naturalness of the utterance).

In an attempt to alleviate this problem, each predicted mark is scored (from 0 to 1) depending upon its similarity to the annotation given in the corpus, as opposed to the score being 1 for an exact match and 0 for a mismatch. For example, if the corpus annotation was a fall-rise and the model predicted a fall, this would be given a higher score than if the model predicted an unstressed word. The scores were given according to the figures in table 6.2: a predicted tonal mark, for instance, scores 0.15 if the corpus word were annotated as stressed, but 0.0 if the corpus word were annotated as unstressed. In all cases a score of 1.0 is given where an exact match is achieved. The final score for a section can be converted to a percentage by dividing the score by the number of words.
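A sketch of the graded metric, using the values of table 6.2; the mark names stand in for the original symbols, and the orientation (rows index the corpus annotation, columns the prediction) is reconstructed from the worked example above.

```python
MARKS = ("fall", "rise", "fall-rise", "stressed", "unstressed")
SCORES = [  # SCORES[actual][predicted], from table 6.2
    [1.00, 0.25, 0.50, 0.50, 0.15],
    [0.25, 1.00, 0.75, 0.50, 0.15],
    [0.50, 0.50, 1.00, 0.50, 0.15],
    [0.15, 0.15, 0.15, 1.00, 0.25],
    [0.00, 0.00, 0.00, 0.15, 1.00],
]

def section_score(predicted, actual):
    """Score a section's predicted annotation against the corpus one,
    returning a percentage (total score divided by the number of words)."""
    total = sum(SCORES[MARKS.index(a)][MARKS.index(p)]
                for p, a in zip(predicted, actual))
    return 100.0 * total / len(actual)
```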

It should be acknowledged that these scores have been allocated somewhat arbitrarily, although it would be possible to choose the scores so that the model's behaviour seemed to be improved. This has not been attempted, and in the example performance statistics given in table 6.3 the percentage correct metric is also given for comparison. Scoring, or evaluation, is a notoriously difficult problem in natural language processing in general; see, for example, [Wee94, Lyo94, BGL93] for discussions of the range of different and contradictory metrics of parsing success used in corpus-based grammatical analysis systems.

As will be noted when viewing the performance statistics in table 6.3, the percentage scores are higher than the percentage correct. This is to be expected given the way the scores are calculated. Although the figures seem similar, it should be realised that the percentage scores aim to give a better idea of how good the annotations are, and should be more sensitive to changes in the model than the percentage correct. They have only really been useful when comparing differing versions of the PPM. A more thoughtful and insightful assignment of the values in table 6.2 would perhaps provide more sensitivity and improvement in annotation assessment, but this requires expertise in the area of prosodic tone labelling of speech.

The only really satisfactory way to assess the "goodness" of prosodic annotations is to listen to speech spoken in a way that reflects the annotations. A listening test experiment would present subjects with a variety of different utterances with differing prosody, and the subjects would give a subjective opinion of the "naturalness" of each utterance. The only difficulty with this approach lies in producing the utterances. Three possibilities are given below.



            fall   rise   fall-rise  stressed  u/stress
fall        1.00   0.25     0.50       0.50      0.15
rise        0.25   1.00     0.75       0.50      0.15
fall-rise   0.50   0.50     1.00       0.50      0.15
stressed    0.15   0.15     0.15       1.00      0.25
u/stress    0.00   0.00     0.00       0.15      1.00

Table 6.2: Scoring relationship between predicted and annotated prosodic marks (rows: corpus annotation; columns: predicted mark).

1. Speak the utterances following the prosodic annotations. This is not an easy task, and requires expert ability not readily available.

2. Modify the original corpus recordings (available on the MARSEC CD-ROM) using either the SOLA/PSOLA⁵ algorithms or re-synthesis techniques. This requires the specification of intensity, fundamental frequency and duration contours derived from the prosodic annotation. It is not clear how to do this, or indeed whether there really is a relationship between the two which can be captured in an accurate computational model.

3. Finally, a speech synthesizer could be used which allows specification of syllable durations, rising and falling tones and relative syllable loudness. There are, however, no such synthesizers readily available, as most that employ any form of intonation control tend to be restricted to either a falling or a rising tune over each utterance. Listeners may deem the output "unnatural" even if the predicted annotations are an exact match for the MARSEC markup, because of poor synthesis rather than poor prosody prediction.

   The Klatt synthesizer allows the specification of all the necessary parameters, but this requires that prosodic annotations are converted into fundamental frequency and intensity contours and syllable durations, which leads to the same problems as above.

The use of a listening test has distinct advantages: it gets away from the use of corpus-based scoring metrics, which have been criticised [Wee94]. It also allows for the fact that some annotations may be acceptable in ways differing from those in the corpus, and there may well be reason why the corpus annotation is not typical or would not be acceptable in a general case. Context may have forced a change in the prosody in the corpus which would not ordinarily have happened, for example where the speaker wishes to contrast or correct something that the listener has misheard: "Peter isn't here" versus "Peter isn't here" (differing only in which word carries the accent).

⁵ These two algorithms are digital signal processing techniques that allow the duration of a sound to be lengthened or shortened without changing the pitch (SOLA), and the pitch of a sound to be changed without changing its length (PSOLA).

6.4.1 Performance Statistics

The model performs quite well, especially when it is realised that chance performance would be 20% (each output symbol may be one of five possibilities), compared with 50% for the stress prediction model. The overall score percentage is 1621.25 / (1420 + 763) × 100 = 74.27%, and the overall percentage correct is 1420 / (1420 + 763) × 100 = 65.05%; here the score percentage is calculated as score / (right + wrong). See table 6.3.
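These overall figures follow directly from the totals of table 6.3:

```python
>>> right, wrong, score = 1420, 763, 1621.25
>>> round(100 * score / (right + wrong), 2)  # overall score %
74.27
>>> round(100 * right / (right + wrong), 2)  # overall correct %
65.05
```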

The majority of test sections have good results, over 60% correct. It is worth noting that the best performance, of nearly 74% correct, is achieved on the longest section (m05). The sections (in decreasing order of performance) cover the following topics:

m05  Nelson Mandela speech
m04  Programme News
m09  Programme News
m08  Weather Forecast
m03  Weather Forecast
m02  Motoring News
m07  Travel Roundup

It is clear that the model performs better for some types of speech, and perhaps speaker. Above, the formal speech about Nelson Mandela and the programme and weather news are given by professional speakers, who are more likely to speak with more accurate structure, pronunciation and intonation than would be expected in casual speech.



Section  # right  # wrong   Score   Score %  Correct %
m02        113      88      133.45   66.39     56.22
m03         86      57      102.70   71.82     60.14
m04        196     109      225.45   73.92     64.26
m05        548     196      597.65   80.33     73.66
m07        103      88      127.20   66.60     53.93
m08         90      59      108.05   72.52     60.40
m09        284     166      326.75   72.61     63.11
total     1420     763     1621.25   74.27     65.05

Table 6.3: Performance scores for the test category of the corpus.

6.5 Summary

In this chapter the stress prediction model developed in chapter 5 has been expanded to make predictions for a range of prosodic stress marks. It was found that such a model had a very high computational load and was not very successful; its main problem area seemed to be the stress/unstress distinction. However, the stress prediction model does this task well, and so a composite model was created. The composite model uses the stress prediction model to select a number of stress patterns for the utterance from the search space and passes these few patterns on to the prosody prediction model, which clamps unstressed words as such and only allows stressed words to vary between the different stress accents. It was found that this produced a significant improvement in performance with, on average, 65% of all prosodic marks matching the prosodic mark in the corpus. This result was also seen in the training data presented in table 6.1.

Chapter 7 goes on to analyse the function of the model in terms of how well it performs for each prosodic mark and for each word class.



Chapter 7

Conclusions and Future Work

7.1 Introduction

This chapter first presents a review of the main points covered in this thesis. It then analyses the models' performance in terms of how well each word class and prosodic mark is modelled. Finally, comments upon future investigations are given.

7.2 Review

In chapter 3 a need to cross reference between the prosodic and word class annotated versions of the Spoken English Corpus was identified, and a semi-automatic, semi-intelligent tool was devised to enable this. The tool produced an aligned version of the corpus with both word class and prosodic annotations¹, from which it was possible to extract statistics of co-occurrence between the two, as described in chapter 4. The tool is general enough to be exploited in a similar way to relate any other types of corpus annotation, and it has been used to provide alignment information in the Machine Readable Spoken English Corpus [GAR92].

In chapters 5 and 6, two models were developed using statistics extracted from the corpus.

¹ Treebank parse trees are also aligned, even though these were not used in experiments.



The development of these models was designed to demonstrate in a quantifiable way the extent of the relationship between the prosodic annotations and the word class tags in the Spoken English Corpus. The models use two sets of parameters calculated from the cross reference between the prosodic and word class versions of the corpus: the co-occurrence likelihoods (the likelihood that a prosodic mark occurs on a word with a given word class) and the bigram likelihoods (the likelihood that a prosodic mark/word class combination is followed by another specific prosodic mark/word class combination).

As far as has been possible within the limitations of the size of the corpus, these likelihoods have been estimated, although various alternatives have been considered to cope with the low frequency of some entities. The most important of these techniques is the clustering of word classes into groups on a prosodic basis. This has given rise to the concept of prosodically orientated word class groups, and it has been shown that these groups can perform as well, if not better, in estimating bigram likelihoods.

The stress prediction model and the prosody prediction model have been tested with new, unseen text, and both have demonstrated that they can produce good levels of annotation, corresponding 91% and 65% of the time respectively with the annotations within the corpus. These models may be used as a stage in a text-to-speech system for the low level "baseline" assignment of stress and prosody annotations. Higher level processes could use these as a starting point for the assignment of context and semantics dependent prosody before generating the prosodic annotations acoustically.

7.3 Performance Measures

In order to evaluate the models in more detail it is necessary to look at performance figures for individual elements within them. This section presents and comments on several such elements, including the frequency of prosodic marks within tone units, the performance on individual word classes and the differences between predicted and actual prosodic marks. The latter two show in which areas the models' performance varies, and show in a quantitative way the relationship between word class and stress accents.

7.3.1 Tone Unit Lengths in the Model

Although the model makes no attempt to generate tone unit boundaries (other than assuming that punctuation indicates a boundary), the number of tonic stresses and stressed words in tone units will differ between the model and the transcribed corpus. If the same tone unit boundaries are used to segment the synthesized prosody, we can compare the graphs shown in figure 4.2 with the equivalent graphs in figure 7.1. It should be noted that the boundaries used are not exactly the same, since boundaries inserted due to punctuation have not been removed; this accounts for approximately 330 extra tone unit boundaries. It is also important to realise that the model would probably behave differently if these boundaries had not been present in its input, or if the tone unit boundaries inserted to allow this comparison had been present in its input. This accounts for the rightward spreading of the words curve, and does not have a significant impact upon the other two curves.

Note that the model predicts a large number of tone units with 0, 1 or 2 stress accents (TSMs), in comparison with the original data, which has large numbers of tone units with 1, 2 or 3 stress accents. This indicates that the model underpredicts many of the stress accents, although it is not possible to tell what percentage of stress accents are present only as a function of context and semantics.

7.3.2 Analysis of Models

There is a real need to be able to assess the "goodness" of stress patterns. Clearly there are differences between the experts' transcriptions, but to what extent do errors go towards making an utterance unintelligible? Is it more wrong to miss a stress, to insert one or to transpose it? It is envisaged that these questions could be answered by listening tests, as described in section 6.4.



Tag     SPM%    PPM%
APP$    91.46   89.76
AT      97.65   95.79
AT1     98.86   97.14
CC      93.09   86.68
CCB     93.71   81.44
CS      69.80   57.96
CSA     79.78   75.27
CST     99.24   96.99
DB      91.57   44.32
DD      76.29   45.36
DD1     65.06   36.04
DD2     56.76   36.00
DDQ     72.73   66.46
EX      96.51   89.77
ICS     58.77   53.33
IF      98.81   94.38
II      90.25   87.04
II21    61.76   40.85
II22    98.53   98.59
IO      99.88   98.87
IW      88.89   84.83
JB      88.64   48.66
JJ      92.30   52.43
JJT     87.76   31.37
MC      89.79   52.34
MC1     81.25   41.84
MD      80.99   41.40
MF      61.84   40.79
NN      85.03   42.44
NN1     92.88   35.96
NN2     90.97   35.29
NNJ     85.71   36.43
NNL1    85.61   40.56
NNO     66.22   45.33
NNS1    86.02   54.84
NNSB1   78.48   69.62
NNT1    82.41   35.07
NNT2    89.41   36.96
NP1     91.99   43.82
PPH1    99.19   93.13
PPHS1   92.06   87.69
PPHS2   82.73   80.67
PPIS1   86.21   77.97
PPIS2   95.15   90.57
PPY     97.14   95.77
RG      64.15   49.15
RL      88.29   38.60
RP      67.53   35.61
RR      86.00   42.49
RR21    94.44   80.00
RR22    92.59   33.33
RRQ     54.41   50.00
RT      79.59   42.59
TO     100.00   99.27
VB0     92.22   91.62
VBDR    89.80   89.90
VBDZ    94.66   92.88
VBN     91.26   91.23
VBR     87.97   85.44
VBZ     91.83   90.15
VH0     84.80   83.15
VHD     89.89   87.23
VHZ     89.83   88.43
VM      76.99   75.82
VV0     82.64   47.09
VVD     87.59   55.79
VVG     89.76   54.74
VVN     91.82   48.57
VVZ     78.65   51.55
XX      45.51   27.61

Table 7.1: Word class tags with frequencies of 50 or greater, showing the percentage of correct predictions (when compared with the corpus annotations) for the stress prediction model (SPM) and the prosodic mark prediction model (PPM).



[Figure 7.1: Relative frequencies of tone-unit lengths produced by the model, in terms of numbers of: words with tonic stress marks; words with prosodic marks; and words. Axes: frequency against length of tone unit (0 to 10).]

7.3.3 Word Class Models

The figures in table 7.1 show, for both models, how well the predictions match the corpus for each word class. Word classes with frequencies of less than fifty have not been presented: they are poorly modelled due to insufficient data, and they are not of primary concern since their low frequency means that they have a low impact upon the models.

With some exceptions, the PPM performs just as well as the SPM on all word classes except determiners (DB-DDQ), adjectives (JB-JJ), numbers (MC-MF), nouns (NN-NP1), adverbs (RG-RT), lexical verbs (VV0-VVZ) and not or n't (XX). The performance on XX is probably poor due to its frequent inclusion in enclitics. It is no great surprise to discover that those word classes upon which both models perform well are those that are mostly unstressed. Remember that for the SPM a chance result would be 50% correct, but for the PPM a chance result would be 20%; for the PPM most predictions are between 15% and 75% higher than chance. However, high performance was achieved for possessives (APP$), articles (AT-AT1), conjunctions (CC-CST), existential there (EX), prepositions (ICS-IW), it, he/she, they, I, we, you (PPH1-PPY), the infinitive marker to (TO), and the verbs be, were, was, been, are, is, have, had, has, can/may/would etc. (VB0-VM).

            fall   rise   fall-rise  stressed  u/stress
fall       48.86   2.87     1.03      38.88      8.36
rise       66.46   2.14     0.22      27.26      3.37
fall-rise  44.54   2.02     1.74      43.34      8.36
stressed   21.62   1.44     1.09      59.90     15.95
u/stress    3.68   0.12     0.27       9.95     85.97

Table 7.2: Prosodic marks showing prediction percentages for the composite prosody prediction model (rows: corpus annotation; columns: predicted mark).

Referring to tables 7.1 and 4.3, it will be noted that for those word classes on which the PPM performs poorly there is widespread use of the differing TSMs. For example, DD1 scores 36% in table 7.1, and table 4.3 shows that although it is mainly unstressed it also co-occurs with four TSMs with roughly equal likelihood. RP also scores approximately 36%, and table 4.3 shows that all 10 prosodic marks are plausible for it.

Whereas the models are very good at determining which words should be unstressed and which should have a stress accent, the PPM is not able to choose the correct stress accent for all nouns, adjectives, verbs (VV0-VVZ), adverbs and determiners. This is not really surprising, since these word classes are those most related to the context and semantics of the utterance.

7.3.4 Prosodic Mark Models

The values presented in table 7.2 show the percentage of times that the prediction for each prosodic mark matches the corpus or not. For example, a fall is predicted as stressed 38.88% of the time. The ratios, for each of the prosodic marks, of the number of times they occur in the corpus to the number of times they were predicted by the model are as follows:



            actual : predicted
Fall          2333 : 3245
Rise           653 : 173
Fall-rise     1089 : 114
Stress        4237 : 4825
Unstress      7336 : 7291

Although there is a reasonable degree of accuracy for stress and unstress, rises and fall-rises are very poorly modelled, instead being replaced with fall or stress marks. This is possibly due to the high frequency of the bigram of a fall followed by a tone unit boundary, which may bias TSMs before a boundary away from being a rise or fall-rise. The fall mark has been largely over-predicted, and to a lesser extent the stress mark also. The model performs best for the unstress, stress and fall marks. In chapter 2 it was noted that the number of levels of stress that it is usually useful to distinguish between is three (unstressed, weakly stressed and strongly stressed). The model seems to add weight to this claim. Since fall, rise and fall-rise are all "strong" stresses, it is perhaps not surprising that the model does not perform well in distinguishing between them and that one dominates over the others. In the case of the fall-rise there is an approximately equal split between the fall and stress marks which take its place; for the rise mark there is a 2:1 split between the fall and stress marks. We can note that falls and rises are "stronger" stress marks than fall-rises.

The results of the PPM suggest that the placement of stress accents is predictable from structure and word class information but that the direction of the stress accent is not.

7.4 Future Work

During this research I have had many ideas which it has not been possible to investigate, either for lack of time or, mainly, because they do not belong within this research but are interesting offshoots. These ideas are presented below in no particular order.



7.4.1 Conversion to ToBI

ToBI [SBP+92, BA93] (Tones and Break Indices) is a modern system for transcribing prosodic and intonation patterns in English and is attracting a lot of interest. Unlike previous prosodic systems, ToBI is being developed by a large number of speech scientists with explicit application to the annotation of machine readable spoken English corpora.

Roach [Roa94] has already provided a means of converting between the annotation scheme used in the SEC and ToBI.

ToBI has two clear advantages. Firstly it has a grammar, in the sense that some "sentences" of annotations are not allowed, and secondly it does not have unclear areas such as the stressed but unaccented mark used in the SEC. A useful development to the work presented here would be to convert the annotations produced to the ToBI system. This would allow for a phase of weeding out illegal ToBI sequences from all those possibilities presented by the model, and would decrease the likelihood of the model producing unacceptable annotations. Given the amount of interest in ToBI this would be a useful task to undertake, especially if the model were ever to be used in a text-to-speech system.
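As a rough illustration of what such a conversion involves, the Python sketch below maps SEC-style marks onto ToBI-style labels and discards candidate sequences that a ToBI grammar would reject. The mapping table and the grammar check here are purely illustrative placeholders; the real SEC-to-ToBI correspondence is the one defined by Roach [Roa94].

    # Illustrative only: the real SEC-to-ToBI mapping is defined by Roach [Roa94].
    SEC_TO_TOBI = {
        "high fall": "H*+L",   # assumed correspondence, for illustration
        "low rise":  "L*+H",   # assumed correspondence, for illustration
        "unstress":  None,     # no accent label
    }

    def is_legal(tobi):
        # Placeholder for a proper ToBI grammar check; here we only
        # require that the sequence carries at least one accent.
        return any(label is not None for label in tobi)

    def convert(sec_marks):
        # Convert a candidate annotation, weeding it out if it contains
        # an unknown mark or maps onto an illegal ToBI sequence.
        if any(m not in SEC_TO_TOBI for m in sec_marks):
            return None
        tobi = [SEC_TO_TOBI[m] for m in sec_marks]
        return tobi if is_legal(tobi) else None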

7.4.2 Additional Constraints

The focus of this thesis has been upon the relationships between prosodic annotations and word class. It has often been noted that prosody serves more purposes than signifying structure. We can therefore expect marked improvements (especially in the prediction of stress accents) from the inclusion of factors other than word class. Researchers in the field of Natural Language Processing have been developing semantic taggers for corpora, that is, systems that can annotate corpora with semantic information derived from them [JA94]. As has been previously pointed out in this thesis, semantics plays an important role in the structure of prosody. The inclusion of semantic information in the prediction would be most advantageous in that, for example, given information could be signified prosodically. At a lesser level, parse tree structures could provide additional constraints.



It has also been noted by Knowles (at a seminar given at the Leeds University Linguistics Department, 1994) that given information plays an important role in stress assignment.

A further level of constraints could be imposed by rule in certain circumstances, for example stress accent assignment in compound nouns or noun phrases; a sketch of such a filter is given below. Constraints of this kind are discussed in chapter 5 of Fudge [Fud84].
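As a minimal sketch of a rule of this kind, assuming part-of-speech tags and candidate annotations as lists of mark names (the rule itself is an illustrative simplification in the spirit of Fudge [Fud84], not a rule taken from it):

    def compound_noun_filter(tags, candidate):
        # Reject a candidate annotation that accents the second element of
        # a noun-noun compound while leaving the first unstressed.
        for i in range(len(tags) - 1):
            if tags[i].startswith("NN") and tags[i + 1].startswith("NN"):
                if candidate[i] == "unstress" and \
                   candidate[i + 1] in ("fall", "rise", "fall-rise"):
                    return False
        return True

    # Usage: keep only the rule-consistent candidates before re-ranking.
    # survivors = [c for c in candidates if compound_noun_filter(tags, c)]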

7.4.3 Speech Synthesis

There is at present no method for automatically synthesizing speech with appropriate pitch, intensity and syllable durations for a given prosodic annotation. Such a system, had it existed, would have been most useful in assessing the acceptability of the annotations produced by the model (see section 6.4).

There are very few text-to-speech systems that generate speech from phonemes and allow the inclusion of prosodic annotations or pitch movement indications. One such system (the speech synthesizer built into the Commodore Amiga personal computer, believed to be modelled on the DECtalk system) allows a specification of stress level following each vowel, ranging from 1 to 9. An attempt was made to convert between the annotations produced by the model and the stress levels accepted by the synthesizer, but there was too little range of control: a general rise-fall contour was imposed by the system outside the user's control, and the stress level number merely perturbs this contour upwards by a varying amount. There is therefore no easy way of indicating a falling tone.

SOLA/PSOLA

An alternative to speech synthesis would be to use real speech and adjust it with the SOLA and PSOLA algorithms to change its length and pitch contour. These are digital signal processing techniques that allow a segment of speech to be changed in duration without affecting the pitch of the utterance (SOLA), and allow the pitch of the speech to be changed without affecting the duration of the speech (PSOLA). They can be used to give any utterance any desired pitch contour and syllable duration. Intensity is modifiable by scaling the waveform with an intensity contour.

It is, however, very difficult for the novice to know exactly how to realise the prosodic annotations in terms of F0 and intensity contours and syllable durations. There is also the additional problem of correctly interpreting between the acoustic measures (F0 and intensity) and perceived pitch.

There is a very clear need for work on relating F0, intensity and duration to prosodic annotations.

7.4.4 Parameter Improvement

The model is defined in terms of the state probabilities and the transition probabilities. A large portion of this research has concentrated upon methods to estimate these parameters directly from the corpus. Improvements in the estimation of these parameters will improve the accuracy of the model, hence any means to perform this would be desirable.

There are other iterative methods that may be exploited to improve the parameters. One method would be to impose random variations in the parameters and observe how these affect the overall performance of the model; a sketch of this is given below. Iteratively this would allow improvements to be made to the poorly estimated parameters, by keeping those changes that improved performance. Many other methods exist; see for example Statistical Language Learning [Cha93].
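A minimal sketch of the perturb-and-keep idea, assuming a score function that runs the model over held-out data and returns its accuracy (both the parameter representation and the function names here are placeholders, not those of the thesis):

    import random

    def perturb_and_keep(params, score, rounds=1000, scale=0.01):
        # Stochastic hill climbing over the model parameters: apply a small
        # random perturbation and keep it only when performance improves.
        best, best_score = dict(params), score(params)
        for _ in range(rounds):
            trial = dict(best)
            key = random.choice(list(trial))
            trial[key] = max(0.0, trial[key] + random.gauss(0.0, scale))
            # A real system would renormalise the probabilities here.
            trial_score = score(trial)
            if trial_score > best_score:
                best, best_score = trial, trial_score
        return best, best_score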

7.5 General Conclusions

This whole thesis starts from the assumption that the prosodic annotations in the Spoken English Corpus do have some correlation to the acoustic signal, i.e. some "realisation" beyond the perception of the human annotators. It is beyond the scope of this thesis to define this prosodic-acoustic correlation exactly, but if and when others do so, then this thesis has mapped out the link between prosodic and syntactic tags, and so will constitute a further link in the chain relating acoustic signals to syntactic analysis.



Appendix A

SEC and MARSEC

A.1 Introduction

This appendix provides an outline description of the data and its organisation in the Spoken English Corpus (SEC) and the Machine Readable Spoken English Corpus (MARSEC).

A.2 The Spoken English Corpus

A.2.1 History

The SEC was the product of a three-year project funded by IBM UK and carried out at the University of Lancaster by Knowles et al. [KT88], with the aim of providing a corpus of data for the analysis of intonation.

The corpus comprises 52,673 words of text recorded principally from BBC Radio 4 broadcasts, and covers a diversity of categories in line with the conventions of the LOB and BROWN corpora.

A.2.2 Categories

The SEC is divided into 11 categories, each category featuring a different variety of speech style. These categories are listed in table A.1 with their size in words and as a percentage of the whole corpus.

Each category is divided into a number of sections, each section comprising a single recording. Table A.2 shows the section numbers (which begin with their category letter), their duration in minutes and seconds, and the approximate number of words in each section (accurate word counts can sometimes be debatable).

A.3 MARSEC

The MARSEC was produced from a two-year ESRC project held jointly by Lancaster University and Leeds University [RKVA94]. The main difference between the SEC and the MARSEC is that the acoustic data has been added to the corpus in the form of a CD-ROM, and fundamental frequency and RMS energy have been calculated. The MARSEC also has the advantage of bringing together all the available corpus material, along with a segmental time alignment of the data and a mechanism for cross-referencing (see Section 3.5).

The MARSEC update of the SEC therefore brings together the following information:

- Digitized recordings of the speech
- Fundamental frequency
- RMS energy
- Segmental time alignment with syllabic divisions
- Prosodic annotation with time alignment
- Orthographic text
- Part of speech annotation
- Parse trees

Figure A.1 shows an example of all the available information for an example utterance. For more detailed information see the references given above.



Category  Style                                   #words  % corpus
A         Commentary                                9066       17%
B         News Broadcast                            5235       10%
C         Lecture type I (general audience)         4471        8%
D         Lecture type II (restricted audience)     7451       14%
E         Religious (including liturgy)             1503        3%
F         Magazine Style Reports                    4710        9%
G         Fiction                                   7299       14%
H         Poetry                                    1292        2%
J         Dialogue                                  6826       13%
K         Propaganda                                1432        3%
M         Miscellaneous                             3352        6%

Table A.1: Categories in the SEC/MARSEC

Section   Time   #Words
A01      15:00      793
A02       4:28      734
A03       4:01      620
A04       5:41      977
A05       4:48      804
A06       4:32      828
A07       3:54      716
A08       4:08      618
A09       5:12      787
A10       4:26      800
A11       4:15      785
A12       4:05      604
B01       9:32     1722
B02       9:40     1720
B03       5:00      940
B04       5:00      853
C01      30:00     4471
D01      19:00     2410
D02      19:00     2434
D03      19:00     2607
E01       6:48      915
E02       4:30      588
F01       3:48      671
F02       3:32      667
F03       4:54      850
F04      13:16     2522
G01      20:00     3163
G02       8:56     1221
G03       2:39      442
G04       5:30      810
G05       9:20     1663
H01       1:41      248
H02       2:03      286
H03       1:00      157
H04       2:59      405
H05       1:17      196
J01       7:58     1674
J02       1:31      279
J03       2:04      375
J04       0:27       74
J05       1:28      277
J06      24:00     4147
K01       4:32      798
K02       4:09      634
M01       0:41       93
M02       1:10      200
M03       0:48      140
M04       1:40      298
M05       4:33      738
M06       7:05     1112
M07       1:06      187
M08       0:47      143
M09       2:24      441

Table A.2: Sections in the SEC/MARSEC



[N a AT1 tiny JJ minority NN1 [P in II [N Argentina NP1 N]P]N]

Figure A.1: Diagram showing waveform, fundamental frequency, RMS energy, segmental, prosodic and treebank transcriptions (only the treebank transcription is reproducible here; the other panels are graphical).



Appendix B

Syntactic Tagging of SEC

B.1 Introduction

This appendix provides a list of the part of speech word class tags used in the version of CLAWS (CLAWS4) with which the SEC treebank version was annotated.

B.2 Word Class Tags

Wordtag   Definition
!         punctuation tag - exclamation mark
"         punctuation tag - quotation marks
$         genitive suffix (' or 's)
&FO       formula
&FW       foreign word
(         punctuation tag - left bracket
)         punctuation tag - right bracket
,         punctuation tag - comma
-         punctuation tag - dash
.         punctuation tag - full stop
...       punctuation tag - ellipsis
:         punctuation tag - colon
;         punctuation tag - semicolon
?         punctuation tag - question mark
APP$      possessive pre-nominal pronoun: my, your, our
AT        neutral article: the, no
AT1       singular article: a, every
BTO       before-infinitive marker: in order, so as (before to)
BTO21     idiom tag
BTO22     idiom tag
CC        general co-ordinating conjunction
CC31      idiom tag
CC32      idiom tag
CC33      idiom tag
CCB       co-ordinating conjunction but
CF        semi-co-ordinating conjunction: so, then, yet
CS        general subordinating conjunction
CS21      idiom tag
CS22      idiom tag
CSA       as as conjunction
CSN       than as conjunction
CST       that as conjunction
CSW       whether as conjunction
DA        neutral after-determiner capable of pronominal function: such
DA1       singular after-determiner: little, much
DA2       plural after-determiner: few, several, many
DA2R      comparative plural after-determiner: fewer
DAR       comparative neutral after-determiner: more, less
DAT       superlative neutral after-determiner: most, least
DB        before-determiner (capable of pronominal fn.): half, all
DB2       plural before-determiner: both (without and)
DD        neutral determiner capable of pronominal function: any, some
DD1       singular determiner: this, that, another
DD121     idiom tag
DD122     idiom tag
DD2       plural determiner: these, those
DD21      idiom tag
DD22      idiom tag
DD221     idiom tag
DD222     idiom tag
DDQ       `wh-' determiner without `-ever': what, which
DDQ$      possessive `wh-' determiner: whose
DDQV      `wh-ever' determiner: whatsoever, whichever
EX        existential there
ICS       preposition-conjunction of time: after, before, since
IF        for as preposition
II        general preposition
II21      idiom tag
II22      idiom tag
II31      idiom tag
II32      idiom tag
II33      idiom tag
IO        of as preposition
IW        with, without as preposition
JA        predicative adjective: tantamount, afraid, asleep
JB        attributive adjective: late, model in a model prisoner
JBR       attributive comparative adjective: upper, outer
JJ        general adjective
JJR       general comparative adjective: older, better, stronger
JJT       general superlative adjective: oldest, best
LE        leading co-ordinator: either (before or)
MC        cardinal: two, 6, 2.34



MC-MC     hyphenated number: 1770-1827
MC1       singular cardinal number: one, 1
MC2       plural cardinal number: threes, 3s
MD        ordinal number: second, 2nd, last
MF        fraction neutral for number: two-thirds
ND1       singular noun of direction: west
NN        common noun neutral for number: sheep, cod
NN1       singular common noun: book, girl
NN121     idiom tag
NN122     idiom tag
NN2       plural common noun: books, girls
NNJ       organization noun neutral for number: Company, group
NNJ1      singular organization noun: conference, Church
NNJ2      plural organization noun: groups, councils
NNL1      singular locative noun: island, Street
NNL2      plural locative noun: islands, streets
NNO       numeral noun, neutral for number agreement: dozen, hundred
NNO2      plural numeral noun: hundreds, millions
NNS1      singular titular noun: Mrs, President
NNS2      plural titular noun: Presidents
NNSA1     following abbrev. singular titular noun: M.A.
NNSB1     preceding abbrev. singular titular noun: Prof.
NNT1      singular temporal noun: day, week, year
NNT2      plural temporal noun: days, weeks, years
NNU       abbreviated unit of measurement neutral for number: in., kg
NNU1      singular unit of measurement: inch, kilo
NNU2      plural unit of measurement: ins., feet
NNU21     idiom tag
NNU22     idiom tag
NP        proper noun neutral for number: Andes, Indies
NP1       singular proper noun: London, Frederick
NP2       plural proper noun: Americas
NPD1      singular weekday noun: Thursday
NPM1      singular month noun: October
PN        indefinite pronoun neutral for number: none
PN1       singular indefinite pronoun: anybody, everyone, one as pronoun
PN121     idiom tag
PN122     idiom tag
PNQO      objective `wh-' pronoun without `-ever': whom
PNQS      `wh-' pronoun without `-ever': who, that
PP$       nominal possessive personal pronoun: mine, yours
PPH1      it
PPHO1     him, her
PPHO2     them
PPHS1     he, she
PPHS2     they
PPIO1     me
PPIO2     us
PPIS1     I
PPIS2     we
PPX1      singular reflexive personal pronoun: yourself, itself
PPX121    idiom tag
PPX122    idiom tag
PPX2      plural reflexive personal pronoun: ourselves, themselves



PPX221    idiom tag
PPX222    idiom tag
PPY       you
RA        adverb after nominal head: else, galore
REX       adverb apposition-introducer: namely, e.g.
REX21     idiom tag
REX22     idiom tag
RG        degree adverb: very, so, too
RG21      idiom tag
RG22      idiom tag
RGA       post-adjectival / adverbial degree adverb: enough, indeed
RGQ       `wh-' degree adverb without `-ever': how
RGQV      `wh-ever' degree adverb: however
RGR       comparative degree adverb: more, less
RGT       superlative degree adverb: most, least
RL        locative adverb: here, there
RL21      idiom tag
RL22      idiom tag
RP        prepositional adverb which is also particle
RR        general adverb
RR21      idiom tag
RR22      idiom tag
RR31      idiom tag
RR32      idiom tag
RR33      idiom tag
RRQ       non-degree `wh-' adverb without `-ever': where, when, why
RRQV      non-degree `wh-ever' adverb: wherever, whenever, however
RRR       comparative general adverb: better, longer
RRT       superlative general adverb: best, longest
RT        nominal adverb of time: now, then
TO        infinitive marker to
UH        interjection: hello, no
VB0       base form be
VBDR      imperfect indicative were
VBDZ      was
VBG       being
VBM       am, 'm
VBN       been
VBR       are, 're
VBZ       is, 's
VD0       base form do
VDD       did
VDG       doing
VDN       done
VDZ       does
VH0       base form have
VHD       had, 'd (preterite)
VHG       having
VHN       had (past participle)
VHZ       has, 's
VM        modal auxiliary: can, may, would
VM21      idiom tag
VM22      idiom tag
VMK       modal catenative: ought, used



VV0       lexical verb, base form: eat, request
VVD       lexical verb, preterite: ate, requested
VVG       "-ing" present participle of lexical verb: giving
VVN       past participle of lexical verb: given
VVZ       3rd singular form of verb: eats, requests
XX        not, n't
ZZ1       singular letter of the alphabet

B.3 Phrase/Clause Tags

Table B.1 presents the list of phrase/clause node labels used in the treebank version of the SEC. Note that some node labels will sometimes occur with a `&' or `+' suffix in order to show the co-ordination of phrases or clauses.

Label   Definition
F       finite clause, divided into:
Fa        adverbial clause
Fc        comparative clause
Fn        noun clause
Fr        relative clause
G       genitive phrase
J       adjectival phrase
N       noun phrase
Nr      temporal adverbial noun phrase
P       prepositional phrase
S       independent sentence (sentential conjunct)
Si      interpolated or appended sentence
T       non-finite clause, divided into:
Tg        clause with present-participle head
Ti        clause with infinitive head
Tn        clause with past-participle head
V       verb phrase (sequence of auxiliary & main verbs, excl. object, complement, etc.)

Table B.1: Phrase and Clause labels



Appendix C

Testing Data

This appendix presents the text of those sections of category M that were used as testing data, the results of which are presented in table 6.3. Also presented are the annotations produced by the model.

C.1 Corpus Texts: Category M

Below are some of the prosodically annotated texts from Category M. Category M is the miscellaneous section and the texts are therefore of a variety of styles. Not all of the texts were used for testing: M01 was omitted from the test set because it is a short poetry reading by John Betjeman and is therefore in an unusual style, and M06 was omitted because of a technical problem associated with alignment. It was not deemed necessary to have the complete category.

C.1.1 Section M02

046 SPOKEN ENGLISH CORPUS TEXT M02
Motoring News
Speaker: male
Broadcast notes: Radio 4, 8.55 a.m., January 18th, 1987



Transcriber: BJW

in spite of the low | of the slow thaw | conditions are probably more " dangerous on the roads this morning | because yesterday's slush and " snow | is this morning's ice || in Kent | the A two six four is still closed at " Blackham | as are many side roads || and about a dozen of the more isolated villages | remain cut off || apart from the weather | gas and water main repairs | together with | scheduled weekend work | are also going to affect the roads today || and in London | expect long delays in " Chiswick || where the westbound elevated section of the M 4 | is closed for most of today | and only a single eastbound lane is open || in Lancashire | northbound traffic on the M 6 | will be restricted to the centre lane only | between junctions thirty-one | and thirty-two from " Preston || to the M fifty-five intersection until 4 p.m. || and you can expect long queues there || at 10 o'clock this morning | the Thames Valley police are diverting all northbound traffic off the M 1 || and are leaving only one southbound lane open | between junctions fourteen and fifteen | while a rather complicated recovery operation is being carried out || so expect very long delays there || that's between junctions fourteen | and fifteen | on the M 1 this morning ||

C.1.2 Section M03

047 SPOKEN ENGLISH CORPUS TEXT M03
Weather Forecast
Speaker: male
Broadcast notes: Radio 4, January 18th, 1987
Transcriber: BJW



now the weather forecast | #until dawn tomorrow || over England and Wales | many places will be cloudy but dry || but in parts of Cornwall and west Wales | a little light rain is likely | #which will extend into parts of Cumbria overnight || eastern Scotland | will have a dry | cloudy day | followed by rain or sleet | tonight || over Northern Ireland | and western Scotland | rain will extend from the southwest | reaching Northern Ireland | early this morning | and western Scotland soon after || outbreaks of rain | #will persist | #into the night || but after midnight | they're expected to clear | from Northern Ireland || eastern areas will again be cold | but in the west it'll become less cold than recently || and the outlook for Monday and Tuesday | mainly | dry and cold in the south east | but remaining areas will be milder | with | a little rain || "so things are improving slightly there ||

C.1.3 Section M04

048 SPOKEN ENGLISH CORPUS TEXT M04
Programme News
Speaker: male
Broadcast notes: Radio 4, January 18th, 1987
Transcriber: GOK

now let's look at our programmes | coming up on * Radio 4 this morning || well | in a moment | in *"two-and-a-half minutes or thereabouts | there's the news | followed | by our browse | through the Sunday papers || then at 9.15 | Alistair Cooke | presents this week's * Letter from America || our morning service | at 9.30 | comes this week | from Enfield | in Middlesex || and it's followed | at 10.15 | by * another chance | to catch up with the week's goings on at Ambridge || Margaret Howard | will be here with her * Pick of the Week | at a quarter past eleven | and this week | we'll be hearing about * camel wrestling in Turkey | nights spent in the Great Pyramids at Cairo | Hamlet | in Elsinore | and tales | from wildest Canada | and Ecuador || add some * do-it-yourself Gilbert and Sullivan | and a reminder of the comedy of Al Reid | and that's our Pick of the Week | at eleven fifteen || the castaway | in Desert Island Discs | one hour later | is silly-ass actor Jeremy Lloyd | who's also known | as a scriptwriter || he's one half of the writing team | that created * Are You Being Served | and * 'Allo 'Allo || he'll be chatting about * chatting about his career | to Michael Parkinson | and picking his eight favourite records | at "a quarter past twelve || and that takes us to # lunchtime | and * The World at One || finally | I've just time | for a word about our Sunday feature | which is a poetry programme | with a difference | Gardens of Eden | by Micheline Wonder || Maureen Lipman plays Eve | Adam's first wife | according to the New Testament | and * Miriam Margolis | plays Lillith | who is Adam's first wife | according to the alphabet | of Ben Seurat || their meeting involves | a kind of life swap | Lillith | journeys to the Old Testament | and takes tea with the Lord | while Eve decides | she's had enough | of being everyone's mother ||

C.1.4 Section M05

049 SPOKEN ENGLISH CORPUS TEXT M05
Nelson Mandela speech
Speaker: Colin Lyas
Recorded at MSU, University of Lancaster
Transcriber: BJW

your Royal Highness and Chancellor | "it is my privilege to present to you | on behalf of the Senate | the name of Dr Nelson Mandela || the imprisoned leader of the African National Congress | as one eminently worthy of the degree of Doctor of Laws || Dr Mandela's life | has been devoted to the effort to secure | for all the citizens of his native South Africa | regardless of their colour | certain simple yet basic rights || the most fundamental of which is the right of each of those who must obey the law | to an equal voice within the political system | under which the law is created || those who live in countries | where the basic rights | that Dr Mandela seeks for his people have been won | owe at the very least | a duty of sympathy | for those who have no such rights || and they owe too | a duty of respect to those who | like Dr Mandela | have unceasingly striven in the face of hardship and danger | to claim those rights || but in addition to what is owed to Dr Mandela by the citizens of any free nation | any university in a free society | owes him a special tribute || for in his speeches and writings | Dr Mandela has unswervingly asserted the centrality of an open education | to the cultural life of any nation || he has insisted | as the Charter of the African National Congress puts it | that the doors of learning and culture | shall be open || he has emphasised | that in a healthy society | young scholars are to be thought of as a credit to their nation | and not merely as a threat to its rulers || he has insisted upon the profoundly liberalising effects | of the meeting of the world's peoples | in open institutions of learning || and to use his own words | he has resoundingly affirmed | that for centuries universities have served | as centres | for the dissemination of learning and knowledge | to all students | irrespective of colour and creed || in multi-racial societies he continues | they serve as the centres for the development of the cultural and spiritual aspects | of the life of the people || "the Charter of this university commands | that no test related to sex | race | colour | or creed | shall be imposed upon any person | in order to entitle him | to be admitted | as a member | teacher | or student || "for the members of this university | this charter enshrines a victorious principle || and the fruits of that victory can immediately be seen | in the international community of scholars | that has graduated here today || their presence has enriched this university and this country || and many will return home | to enhance their own nations || but those who live by the principles of such charters as our own | owe a special duty of testimony | to those | for whom the fight to achieve a recognition of those principles | has not been won || whose allegiance to the principle of an open educational system | in an open society | is a confession and a proclamation to be paid for | in the coin of imprisonment | separation | and even death || and Nelson Mandela has of course been willing to pay that price || "your Royal Highness and Chancellor || at all times | there have been women and men | whose lives and words have taken on a special meaning | to innumerably many of their fellow human beings || their lives embody | and their words articulate | the legitimate aspirations of the deprived | the suffering | and the slighted || Nelson Mandela | has become one such || and I can think of no better way to commend him to you | than to use his own closing words | spoken in court | at the end of his final trial | when he was indeed facing the possibility of a sentence of death || during my life I have dedicated myself to the struggle of the African people || I have fought against white domination | and I have fought against black domination || I have cherished the ideal of a democratic and free society || in which all persons live together | in harmony | and with equal opportunities || it is an ideal which I hope to live for and achieve | but if needs be | it is an ideal for which I am prepared to die || a life that has indeed been lived in the spirit of these ideals | cannot but command our respect || and I therefore present to you the name of Nelson Mandela | alas in absentia | as one eminently worthy | of the award of the degree | of Doctor of Laws | honoris causa ||



C.1.5 Section M07

051 SPOKEN ENGLISH CORPUS TEXT M07
Travel Roundup
Speaker: male
Broadcast notes: Radio 4, 8.55 a.m., January 25th, 1987
Transcriber: GOK

" ve to n<strong>in</strong>e now j <strong>and</strong> er * here's this morn<strong>in</strong>g's travel roundup j weekend<br />

engi neer<strong>in</strong>g work j will cause problems j for both<br />

road j <strong>and</strong> rail travellers to day<br />

I'm a fraid k de lays can be ex pected j on the M8 <strong>in</strong> Glasgow j where there are lane<br />

closures <strong>in</strong> " both di rections j between K<strong>in</strong>gston Bridge j <strong>and</strong> Char<strong>in</strong>g Cross<br />

tunnels k at junction 10 j the on ramp is closed j from Barty Beeth road j<br />

to gether with two westbound lanes j on the<br />

motorway itself k at " Manchester j<br />

work on the M 6 2 j has closed the nearside lane j <strong>and</strong> hard shoulder j <strong>of</strong> both<br />

carriageways j at junction seven teen j to wards Prestwich k <strong>and</strong> there are also j<br />

eastbound lane closures j on the M5 6 lead<strong>in</strong>g from Cheshire j be tween junction<br />

3 <strong>and</strong> 4 j at Altr<strong>in</strong>cham k near Worcester j both carriageways <strong>of</strong> the M 5 are still<br />

closed j follow<strong>in</strong>g overnight work j be tween junctions 5<strong>and</strong> 6 j <strong>and</strong> di versions<br />

are a long the A3 8 j until 10 this morn<strong>in</strong>g k on the " railways j engi neer<strong>in</strong>g work j<br />

is widespread j with buses operat<strong>in</strong>g <strong>in</strong> stead <strong>of</strong> tra<strong>in</strong>s j on some routes k eastern<br />

region tra<strong>in</strong>s j will be de layed j on the London K<strong>in</strong>g's Cross to Peterborough<br />

l<strong>in</strong>e j as will western region services j on the Padd<strong>in</strong>gton j to Exeter route k I<br />

hope you get there <strong>in</strong> the end k<br />

C.1.6 Section M08

052 SPOKEN ENGLISH CORPUS TEXT M08
Weather forecast
Speaker: male
Broadcast notes: Radio 4, January 25th, 1987
Transcriber: GOK

here's the weather forecast for the United Kingdom until | dawn to#morrow || in southern counties | of England and Wales | it'll be dull | but * early patches of mist and drizzle | will # clear during this morning || over " northern England | and Northern Ireland | it'll stay mainly cloudy | but dry | during today | and tonight || in southern " Scotland after early sunshine in places | it'll be mostly cloudy | but dry | although tonight there" may be some light rain || northern Scotland | will have occasional light rain | which will be followed during the day | by colder | but still * mainly cloudy weather | with a few sleet and snow showers || temperatures today | will be * much as yesterday | except in northern Scotland | where it'll turn noticeably colder || and the outlook for Monday and Tuesday | it'll be rather cold | in most places | with south western areas | staying dry | but elsewhere | some light rain or sleet | is likely ||

C.1.7 Section M09

053 SPOKEN ENGLISH CORPUS TEXT M09
Programme News
Speaker: male, Margaret Howard
Broadcast notes: Radio 4, January 25th, 1987
Transcriber: BJW

we'll just check | some of today's programmes | on Radio 4 | " for you | for our Morning Service | at half past nine | we'll join the congregation in the parish church of St Faith in Great Cosby || and then at a quarter past ten | some Ambridge bell-ringing to enjoy || #just one highlight in the lives of the Archers this week || after that Margaret Howard | with her selection | of listening highlights | from the past week's broadcasting ||

CHANGE OF SPEAKER: MARGARET HOWARD

"the chief constable of Greater Manchester | James Anderton has been much in the news this week || many interpretations have been put on his statements about being an instrument of " God || on Pick of the Week | you can hear | what he actually said || Nigel Hawthorne | temporarily deserts Yes Prime Minister | for a new role | on One Man and His Dog || he plays the part of a five year old border collie || we recall the day that King George the Sixth | kept the Empire waiting || David Frost | elicits a cure for snoring | and we take a ride out with a Bicester | in pursuit of the fox ||

CHANGE OF SPEAKER: MALE

Margaret Howard || that's all in Pick of the Week | at a quarter past eleven || an hour later | Baroness Ryder of Warsaw | joins Michael Parkinson | for a stint on the old desert island || the Baroness heads the Sue Ryder foundation | which looks after the sick and disabled in many parts of the world || and in past years has frequently driven lorries full of medical supplies and provisions | to distressed areas of central Europe || all she has to do today | is to pick her eight desert island discs so | join her | and enjoy her musical selection | at a quarter past | #twelve || "this evening at a quarter past six | the first of two Actuality profiles | of some men and women recently recruited | "to do | voluntary overseas service || over the next two weeks | we'll join them | as they're prepared for | not only the excitement of living in strange and exotic climes | but also | the down side of things ||

SPEECH EXTRACT OMITTED

well they all sound cheerful enough | but for most VSO recruits | the training period is a mixture of fears | fantasies | and expectations || as you can find out through Actuality tonight at a quarter past six || and just a reminder | that it's Burns night || #if you're not planning to go out for a Burns night supper | then stay in and enjoy The Miller's Reel | which begins at a quarter past seven | this evening || The Miller's Reel | takes the form of a love story | woven | from the letters | poems | and songs of Robert Burns | and features the singing of Jean Redpath and Rod Patterson || that's The Miller's Reel | especially for Burns night | # here on Radio 4 ||

C.2 Prediction Results

C.2.1 Extract from section M05

The text presented below is an extract from section M05. This extract was not produced automatically: the output of the model is not in this format but is a list of symbols, one per word, signifying the prosodic annotation, plus symbols for the tone unit boundaries, which are not in fact predicted by the model. Instead a simple rule is used (sketched below): a major boundary is generated by major punctuation symbols (such as full stop, colon, semi-colon, exclamation mark, and question mark) and a minor boundary by commas.

In this prediction a single stress symbol covers any non-accented stress, as well as the class stressed but unaccented, which is not well defined. Rises, falls and fall-rises are not distinguished here between high and low.



your Royal Highness and Chancellor || it is my privilege to present to you | on behalf of the Senate | the name of Dr Nelson Mandela || the imprisoned leader of the African National Congress | as one eminently worthy of the degree of Doctor of Laws¹ || Dr Mandela's life † has been devoted to the effort to secure † for all the citizens of his native South Africa † regardless of their colour | certain simple yet basic rights || the most fundamental of which is the right of each of those who must obey the law † to an equal voice within the political system † under which the law is created ||

¹ The lack of a fall here is probably due to the lack of a predicted tone unit boundary at the end of the predicted stretch. A limit is placed on how many words are worked on at once; in this case the algorithm just missed the end-of-sentence full stop that would have given rise to a tone unit boundary. Other missing boundaries are denoted by †.



Appendix D

Word-Class / TSM Co-occurrence Figures

The figures presented in this appendix were calculated from the sub-corpus sample used throughout this research.

D.1 Tonic Stress Mark Frequencies

Absolute Frequency   ASCII Symbol
12801                @   unstress
 3511                *
 2564                `
 2297                ~
 1528
 1511                `/
 1200                \
 1158                ,
  342                \ ,
  261                /
    8                /`



D.2 Word Class Frequencies.<br />

Freq. Tag<br />

3536 NN1<br />

2067 AT<br />

2066 II<br />

1673 JJ<br />

1516 NN2<br />

1299 NP1<br />

884 IO<br />

800 VV0<br />

773 CC<br />

771 RR<br />

734 AT1<br />

725 VVN<br />

470 VVD<br />

440 MC<br />

409 TO<br />

375 VVG<br />

332 APP$<br />

300 VM<br />

283 DD1<br />

281 VBDZ<br />

267 IF<br />

266 CST<br />

262 PPH1<br />

258 VBZ<br />

208 NNT1<br />

205 RP<br />

192 VVZ<br />

178 IW<br />

172 NN<br />

167 VB0<br />

167 CCB<br />

164 JB<br />

158 DDQ<br />

157 CS<br />

152 VH0<br />

150 MD<br />

141 NNL1<br />

130 VBR<br />

130 PPHS1<br />

120 ICS<br />

119 PPHS2<br />

118 RG<br />

117 NNJ<br />


114 VBN<br />

110 RL<br />

108 RT<br />

106 PPIS2<br />

99 VBDR<br />

98 MC1<br />

97 DD<br />

93 XX<br />

93 NNS1<br />

93 CSA<br />

92 NNT2<br />

91 VHZ<br />

88 EX<br />

88 DB<br />

87 PNQS<br />

84 VHD<br />

79 NNSB1<br />

75 NNO<br />

75 DD2<br />

73 MF<br />

72 RRQ<br />

71 PPY<br />

71 II22<br />

71 II21<br />

60 RR22<br />

60 RR21<br />

59 PPIS1<br />

51 JJT<br />

49 NNU2<br />

46 DA<br />

42 RRR<br />

42 PPHO2<br />

42 DAR<br />

42 DA2<br />

38 RGR<br />

38 PN1<br />

37 JJR<br />

34 CF<br />

32 VDD<br />

32 NPM1<br />

31 REX22<br />

31 REX21<br />

29 PPX1<br />


28 VBG<br />

28 RA<br />

28 NNU21<br />

27 NNU22<br />

27 NNJ2<br />

26 ZZ1<br />

26 UH<br />

25 VD0<br />

25 DA1<br />

25 &FW<br />

24 CSN<br />

23 RGT<br />

23 PPIO2<br />

23 ND1<br />

18 NNL2<br />

18 DAT<br />

18 CS22<br />

18 CS21<br />

17 PPHO1<br />

17 NNU1<br />

17 NNJ1<br />

16 LE<br />

15 DB2<br />

14 CSW<br />

13 VDZ<br />

12 PPX2<br />

12 NNS2<br />

12 DD222<br />

12 DD221<br />

11 VHG<br />

11 MC2<br />

10 PPIO1<br />

10 NNU<br />

10 II33<br />

10 II32<br />

10 II31<br />

9 NPD1<br />

8 VHN<br />

8 RR33<br />

8 RR32<br />

8 RR31<br />

8 RGQ<br />

8 NP<br />


7 VM21<br />

7 RGA<br />

7 JA<br />

6 VDG<br />

6 PP$<br />

5 VDN<br />

4 PN<br />

4 NNO2<br />

4 MC{MC<br />

4 DDQV<br />

4 DD22<br />

4 DD21<br />

4 BTO22<br />

4 BTO21<br />

3 RL22<br />

3 RL21<br />

3 DDQ$<br />

3 DD122<br />

3 DD121<br />

2 RG22<br />

2 RG21<br />

2 PN122<br />

2 PN121<br />

2 NP2<br />

2 NN122<br />

2 NN121<br />

2 JBR<br />

2 &FO<br />

1 VBM<br />

1 RRT<br />

1 RRQV<br />

1 RGQV<br />

1 PPX122<br />

1 PPX121<br />

1 PNQO<br />

1 NNSA1<br />

1 NNS<br />

1 CC33<br />

1 CC32<br />

1 CC31<br />

D.3 Tag/Tone Co-occurrences

The following table shows all the occurring word-class tags along with their frequencies of
co-occurrence with the TSMs low rise, high rise, low fall, high fall, low level, high level,
low fall-rise, high fall-rise, stressed but unaccented, and unstressed.
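For example, in the table below the row for AT (articles) shows 2,016 of its 2,099 occurrences
as unstressed, whereas for NN1 (singular common nouns) only 275 of 3,625 occurrences are
unstressed.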



Tag       ,    /    \    `    _    ~    \,   `/   Str  U/str

&FO 0 0 0 0 0 0 1 0 1 0<br />

&FW 1 0 6 2 1 5 1 1 3 5<br />

APP$ 1 0 2 9 1 8 0 7 6 303<br />

AT 0 0 0 18 1 19 0 3 42 2016<br />

AT1 1 0 0 1 1 3 0 1 12 739<br />

BTO21 0 0 0 0 1 0 0 0 1 2<br />

BTO22 0 0 0 0 1 0 0 0 0 3<br />

CC 5 1 1 12 1 18 0 6 57 683<br />

CC31 0 0 0 0 0 0 0 0 0 1<br />

CC32 0 0 0 1 0 0 0 0 0 0<br />

CC33 0 0 0 0 0 0 0 0 0 1<br />

CCB 2 0 0 0 0 3 0 1 23 140<br />

CF 4 0 0 1 0 3 1 1 5 19<br />

CS 3 1 0 12 3 17 0 0 30 92<br />

CS21 0 0 0 4 0 4 0 1 4 5<br />

CS22 0 0 0 1 1 0 0 1 1 14<br />

CSA 1 0 0 4 1 8 0 0 7 72<br />

CSN 0 0 0 0 0 0 0 0 0 24<br />

CST 0 0 0 0 0 0 0 0 8 260<br />

CSW 1 0 0 4 1 0 0 0 7 1<br />

DA 1 0 1 2 6 8 0 4 7 18<br />

DA1 1 0 0 2 2 6 0 2 7 6<br />

DA2 2 1 2 4 2 8 0 7 6 10<br />

DAR 1 1 3 6 5 6 0 3 6 11<br />

DAT 2 0 0 2 0 3 3 6 1 1<br />

DB 0 1 9 18 5 19 2 14 14 8<br />

DB2 0 0 0 1 0 4 0 6 2 2<br />

DD 0 0 1 13 9 20 0 17 14 23<br />

DD1 5 1 4 38 6 44 1 31 42 113<br />

DD121 0 0 0 0 0 0 0 0 0 3<br />

DD122 0 0 0 1 1 0 0 1 0 0<br />

DD2 5 0 1 3 7 14 0 4 12 29<br />

DD21 0 0 0 0 0 0 0 0 0 4<br />

DD22 0 0 0 1 0 0 0 0 1 2<br />

DD221 0 0 0 0 0 0 0 0 0 12<br />

DD222 0 0 0 3 0 2 0 1 3 3<br />

DDQ 3 0 1 10 7 11 0 1 22 106<br />

DDQ$ 0 0 0 0 0 0 0 0 0 3<br />


DDQV 0 0 0 3 0 0 0 1 0 0<br />

EX 0 0 0 0 1 2 0 0 6 79<br />

ICS 1 0 1 9 7 9 0 5 24 65<br />

IF 1 0 0 1 1 0 0 0 11 254<br />

II 7 1 11 25 13 47 1 25 128 1847<br />

II21 1 0 0 9 4 7 1 6 16 26<br />

II22 0 0 0 0 1 0 0 0 0 70<br />

II31 0 0 0 0 0 0 0 0 0 11<br />

II32 0 0 1 2 2 1 0 1 3 1<br />

II33 0 0 0 0 0 0 0 0 0 11<br />

IO 0 0 0 1 0 0 0 0 7 896<br />

IW 0 0 0 4 1 5 0 3 12 157<br />

JA 1 0 1 3 0 1 0 0 1 0<br />

JB 7 0 3 23 18 28 1 20 40 24<br />

JBR 0 0 0 0 0 1 0 1 0 0<br />

JJ 68 15 87 263 200 354 37 177 356 145<br />

JJR 2 0 2 5 4 8 1 11 1 3<br />

JJT 1 1 2 12 3 9 0 14 3 6<br />

LE 0 0 0 3 1 0 0 1 7 4<br />

MC 12 2 22 78 63 108 4 35 73 50<br />

MC-MC 0 0 0 1 0 2 0 0 0 1<br />

MC1 4 1 2 10 11 22 0 12 18 18<br />

MC2 3 0 1 3 0 0 1 1 2 0<br />

MD 3 0 4 18 22 27 3 26 20 29<br />

MF 3 0 2 13 1 10 0 1 16 24<br />

ND1 1 0 5 4 2 2 0 1 5 3<br />

NN 14 3 13 23 14 23 3 6 45 28<br />

NN1 379 80 373 610 285 348 106 359 810 275<br />

NN121 0 0 1 0 1 0 0 0 0 0<br />

NN122 0 0 1 0 0 0 0 0 1 0<br />

NN2 160 26 172 236 134 130 45 124 364 141<br />

NNJ 7 5 6 20 12 3 7 8 30 19<br />

NNJ1 1 2 3 2 2 2 0 1 3 1<br />

NNJ2 3 0 1 4 3 4 1 2 7 2<br />

NNL1 15 4 20 23 13 11 2 8 26 19<br />

NNL2 0 0 2 3 6 2 1 0 2 2<br />

NNO 3 0 2 9 7 4 0 10 14 26<br />

NNO2 0 0 0 3 0 0 0 0 1 0<br />

NNS 1 0 0 0 0 0 0 0 0 0<br />

NNS1 3 3 3 8 9 15 2 5 31 13<br />

NNS2 2 0 1 2 0 0 1 3 2 1<br />

NNSA1 0 1 0 0 0 0 0 0 0 0<br />

NNSB1 2 0 0 1 1 3 0 0 12 60<br />


NNT1 15 3 19 32 13 15 4 12 62 35<br />

NNT2 12 1 13 21 4 4 1 7 15 14<br />

NNU 1 0 0 1 1 0 0 1 3 3<br />

NNU1 2 2 1 3 0 1 2 1 3 2<br />

NNU2 2 1 13 9 4 2 1 3 10 4<br />

NNU21 0 0 0 0 0 0 0 0 1 27<br />

NNU22 0 0 4 6 1 0 1 2 8 5<br />

NP 0 0 0 1 1 1 0 0 1 4<br />

NP1 91 52 129 191 131 191 37 155 222 117<br />

NP2 0 0 0 2 0 0 0 0 0 0<br />

NPD1 0 0 2 3 3 1 0 0 0 0<br />

NPM1 1 0 7 9 1 1 2 6 3 2<br />

PN 0 0 0 3 0 1 0 0 0 0<br />

PN1 4 0 0 7 4 6 1 7 4 5<br />

PN121 0 0 0 0 1 0 1 0 0 0<br />

PN122 0 0 0 0 0 0 0 0 0 2<br />

PNQO 0 0 0 0 0 0 0 0 1 0<br />

PNQS 0 1 0 1 1 2 0 0 4 78<br />

PP$ 0 0 0 1 0 0 0 0 0 5<br />

PPH1 0 0 1 1 0 0 0 0 16 245<br />

PPHO1 1 0 1 0 0 1 0 0 2 12<br />

PPHO2 0 0 0 0 0 0 0 1 1 40<br />

PPHS1 0 0 1 5 1 3 0 1 5 116<br />

PPHS2 1 0 0 5 2 2 1 6 7 97<br />

PPIO1 0 0 0 1 0 1 0 0 1 7<br />

PPIO2 0 0 0 0 0 0 0 0 0 23<br />

PPIS1 0 0 0 1 1 2 0 3 6 46<br />

PPIS2 0 0 0 0 0 4 0 1 5 96<br />

PPX1 1 0 5 6 2 1 4 4 7 1<br />

PPX121 0 0 0 0 0 0 0 0 2 0<br />

PPX122 0 0 1 0 0 0 0 0 1 0<br />

PPX2 1 0 1 0 2 0 1 4 1 2<br />

PPY 0 0 0 0 1 0 0 1 1 68<br />

RA 2 0 4 3 0 3 1 2 7 6<br />

REX21 0 0 0 0 0 0 0 0 1 30<br />

REX22 3 1 1 1 3 0 0 1 20 1<br />

RG 0 1 0 14 6 22 0 5 19 52<br />

RG21 0 0 0 0 0 0 0 0 0 2<br />

RG22 0 0 0 0 0 0 0 0 0 2<br />

RGA 0 0 1 2 1 0 0 0 2 1<br />

RGQ 1 0 0 0 1 1 0 0 1 4<br />

RGQV 0 0 0 0 0 1 0 0 0 0<br />

RGR 0 0 0 1 5 6 0 3 3 20<br />

RGT 1 0 0 1 0 2 0 1 5 13<br />


RL 8 1 16 13 10 15 3 4 26 15<br />

RL21 0 0 0 0 0 0 0 0 0 3<br />

RL22 0 1 0 1 0 0 0 1 0 0<br />

RP 14 4 24 33 11 18 4 9 24 66<br />

RR 41 5 28 157 65 130 16 83 132 117<br />

RR21 0 0 0 0 1 1 0 0 7 51<br />

RR22 7 1 3 14 2 5 0 6 16 6<br />

RR31 0 0 0 0 0 0 0 0 3 5<br />

RR32 0 0 2 4 1 0 0 0 0 1<br />

RR33 0 0 0 0 1 0 0 0 0 7<br />

RRQ 0 0 0 11 3 6 0 2 9 41<br />

RRQV 0 0 0 0 0 0 0 0 0 1<br />

RRR 3 0 7 3 6 7 1 6 8 1<br />

RRT 0 0 0 0 0 1 0 0 0 0<br />

RT 12 2 8 10 7 15 1 11 23 22<br />

TO 0 0 0 0 0 0 0 0 3 410<br />

UH 3 2 1 2 1 5 0 0 4 8<br />

VB0 1 0 1 2 0 1 0 0 9 153<br />

VBDR 0 0 0 2 0 6 0 1 1 90<br />

VBDZ 0 0 0 2 3 4 0 1 11 261<br />

VBG 0 0 0 0 0 2 0 0 4 22<br />

VBM 0 0 0 1 0 0 0 0 0 0<br />

VBN 0 1 0 1 2 1 0 0 5 104<br />

VBR 0 0 1 8 4 3 0 2 5 108<br />

VBZ 2 1 3 8 1 6 0 8 10 229<br />

VD0 0 0 1 5 1 2 0 8 1 7<br />

VDD 1 0 2 4 1 8 0 0 3 14<br />

VDG 0 0 0 0 1 1 1 0 2 1<br />

VDN 0 0 1 0 2 0 0 0 0 2<br />

VDZ 0 0 0 2 1 2 0 0 3 5<br />

VH0 1 0 0 3 3 4 0 5 13 123<br />

VHD 0 0 0 2 1 6 0 0 3 75<br />

VHG 0 0 0 0 2 1 0 0 2 6<br />

VHN 0 0 0 2 1 0 0 0 2 3<br />

VHZ 1 0 0 3 1 2 0 1 5 79<br />

VM 2 0 5 20 9 19 0 10 17 218<br />

VM21 0 0 0 2 0 2 0 0 2 1<br />

VV0 67 7 47 104 90 105 9 47 185 144<br />

VVD 13 10 21 53 47 85 2 28 147 67<br />

VVG 21 3 6 54 51 71 5 20 105 42<br />

VVN 67 8 62 107 96 115 16 56 152 60<br />

VVZ 11 2 9 20 21 33 1 4 54 40<br />

XX 0 1 4 21 6 22 0 8 10 22<br />



Appendix E<br />

Punctuation <strong>and</strong> Boundaries<br />

Table E.1 shows the frequency of co-occurrence of punctuation with tone unit boundaries. {CTU} is
a symbol that means more than one punctuation symbol matched the same boundary; the second
or third punctuation symbols are marked with it. Punctuation that does not match any
tone unit boundary is marked with the {PN} symbol, and tone unit boundaries that do not match
punctuation are marked with the {TU} symbol.

Analysis of the unusual cases, such as the four full stops not coinciding with tone unit boundaries,
often shows that the cases are questionable. In general the only punctuation symbols that do not
match tone unit boundaries are quotes and commas.



TAG     {PN}     ||      |     ^   {CTU}
-          1     21     69     0     14
!          1     13      1     0      0
?          1     40      2     1      1
.          4   1045     57     0     16
;          1     68     38     0      0
:          5     74     45     3      2
,        255    169   1425    13      6
(          2      6     26     0      1
)          3      0     19     0     13
"         56     18     44    14    102
...        1      2      1     0      1
{TU}     N/A     55   3503    74    N/A

Table E.1: Punctuation/Tone Unit Boundary Co-occurrence Table.
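The strength of the correspondence is clear from the table: 1,045 of the 1,122 full stops (over
93 per cent) coincide with a major boundary, while commas are the least reliable cue, with 255 of
1,868 matching no boundary at all.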



Appendix F<br />

Source Code<br />

F.1 symbolify.c<br />

This program changes the symbols used for the prosodic annotation to the more iconic ASCII scheme
devised during this research. There were found to be two formats of prosodic annotation: one which used
(generally unused) characters above ASCII 128, and another that used the sequence #xxx, where xxx is
the ASCII value of the character used in the first format. The original annotation system used a specially
modified character set that represented each of the prosodic symbols under a PC-based system, but since
character sets like this are not generally available to most UNIX users this program remaps the symbols
into standard ASCII characters.

/******************************************************************************\
  symbolify
  - changes numbers to symbols in a SEC prosody transcription.

  AUTHOR:
  symbolify.c (c) Copyright January 1992, Simon Arnfield. All Rights Reserved.

  SYNOPSIS:
  symbolify [-h] prosody-filename

  DESCRIPTION:
  Gets prosody information from a SEC prosody transcription and converts to
  ASCII symbols.  Handles two different formats of prosody transcription:
  select the second format with the -h switch.
  The first format uses byte codes to represent the different tonic stress
  marks; the second format uses the string code #xxx to represent TSM symbols
  as listed below.

  String  Octal  Symbol  Symbol name
  #161    241    `/      high fall-rise
  #162    242    /`      high rise-fall
  #246    366    ,\      low rise-fall  --changed from 240 because of ^ problem
  #247    367    \,      low fall-rise
  #171    253    ,       low rise
  #172    254    /       high rise
  #174    256    `       high fall
  #173    255    \       low fall
  #163    243    ~       high level
                 _       low level
  #248    370    *       level stress (also #249, 371)
  #165    245    >       high reset
  #166    246    <       low reset
  #240    360    ^       hesitation TU boundary (see low rise-fall)
                 |       minor tone unit boundary
                 ||      major tone unit boundary

  FILES:

  REFERENCES:

  BUGS:

  PROGRAM MODIFICATION HISTORY
  Date     | By  | Vers | Comments
  ---------+-----+------+------------------------------------------------------
  11/02/92 | ScA | 1.0  | Created original code from extract-prosody.c
  28/04/92 | ScA | 1.1  | Made alterations to numbers/symbols with new info
           |     |      | of errors made in transcription.
  03/07/92 | ScA | 1.2  | Added handling of byte codes and switch to select which
\******************************************************************************/

/*----------------------------*\
|  INCLUDES AND DEFINITIONS    |
\*----------------------------*/

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define HASH      1
#define CHARACTER 2

/*-------------------------*\
|  FUNCTIONS DEFINITIONS    |
\*-------------------------*/

void error_exit(msg)
char *msg;
{
	printf("%s\n", msg);
	exit(1);
}

void main(argc, argv)
int argc;
char *argv[];
{
	FILE *tag1;
	int fnameno = 1, ftype = CHARACTER;
	char cc[4];
	unsigned char c;

	if (argc != 2 && argc != 3)
		error_exit("Usage: symbolify [-h] prosody-transcription-file");
	if (argc == 3) {
		fnameno = 2;
		if (!strcmp(argv[1], "-h"))
			ftype = HASH;
		else
			error_exit("Invalid switch");
	}
	if (!(tag1 = fopen(argv[fnameno], "r")))
		error_exit("Filename error!");

	while (!feof(tag1)) {
		c = getc(tag1);
		switch (ftype) {
		case HASH:
			if (c != '#')
				putchar(c);
			if (c == '#') {
				fscanf(tag1, "%3s", cc);
				if (!strcmp(cc, "161")) printf("`/");
				if (!strcmp(cc, "162")) printf("/`");
				if (!strcmp(cc, "246")) printf(",\\");
				if (!strcmp(cc, "247")) printf("\\,");
				if (!strcmp(cc, "171")) printf(",");
				if (!strcmp(cc, "172")) printf("/");
				if (!strcmp(cc, "173")) printf("\\");
				if (!strcmp(cc, "174")) printf("`");
				if (!strcmp(cc, "163")) printf("~");
				if (!strcmp(cc, "165")) printf(">");
				if (!strcmp(cc, "166")) printf("<");
			}
			break;
		case CHARACTER:
			switch ((int)c) {
			case 161: printf("`/");  break;
			case 162: printf("/`");  break;
			case 247: printf("\\,"); break;
			case 246: printf(",\\"); break;
			case 171: printf(",");   break;
			case 172: printf("/");   break;
			case 173: printf("\\");  break;
			case 174: printf("`");   break;
			case 163: printf("~");   break;
			case 165: printf(">");   break;
			case 166: printf("<");   break;
			/* the tail of this listing was truncated in the source; */
			/* the default case below is reconstructed: any other    */
			/* byte is copied through unchanged.                     */
			default:  putchar(c);    break;
			}
			break;
		}
	}
	fclose(tag1);
}


F.2 ttalign.c<br />

See chapter 3 for a description <strong>of</strong> this program.<br />

/******************************************************************************\<br />

ttalign<br />

- matches two word column files <strong>and</strong> associated tag <strong>and</strong> tone files<br />

AUTHOR:<br />

ttalign.c (c) Copyright November 1991, Simon Arnfield. All Rights Reserved.<br />

SYNOPSIS:<br />

ttalign tag-col-file tag-word-col-file tone-col-file tone-word-col-file<br />

DESCRIPTION:<br />

There are four input files derived from the tagged and prosodic versions

<strong>of</strong> the sec with the follow<strong>in</strong>g csh script:<br />

# PROCESS_TAG_PROS. This script processes the vertical tag <strong>and</strong> prosodic<br />

# files given <strong>in</strong> the <strong>in</strong>put <strong>and</strong> outputs four files pros.[12] vtag.[12]<br />

# ready for match<strong>in</strong>g. Format: process_tag_pros tag-file pros-file<br />

# e.g. process_tag_pros M01 M01.b<br />

# tag file h<strong>and</strong>l<strong>in</strong>g...<br />

awk '$4=="-----" {next} \<br />

$5=="("&&$6=="@" {b=1} \<br />

$5=="[" {b=1} \<br />

{if(b==0) pr<strong>in</strong>t $4,$5} \<br />

$5==")"&&$6=="@" {b=0} \<br />

$5=="]" {b=0}' b=0 \<br />

/usr/export/home/sca/work/corpus/tag/$1 | tee process.vtag \<br />

| cut -f1 -d" " >vtag.1<br />

cat process.vtag| cut -f2 -d" " | tr A-Z a-z >vtag.2<br />

rm process.vtag<br />

# prosody file h<strong>and</strong>l<strong>in</strong>g...<br />

symbolify /usr/export/home/sca/work/corpus/pros/$2 \<br />

| awk '{a=substr($0,1,1)} {if (a!="[") pr<strong>in</strong>t $0}' \<br />

| tr -s ' ' '\012' \<br />

| awk 'length!=0 {pr<strong>in</strong>t $0}' \<br />

| tee pros.1 \<br />

| tr -d '`,/\\~_*@' | tr A-Z a-z >pros.2<br />

This program attempts to match the words <strong>in</strong> the two files<br />

pros.2 <strong>and</strong> vtag.2 <strong>and</strong> when it does so, pr<strong>in</strong>ts out<br />

the associated entry <strong>in</strong> the files pros.1 <strong>and</strong> vtag.1<br />

The result is a list <strong>of</strong> words, the word with its tone marked on it,<br />

<strong>and</strong> the part-<strong>of</strong>-speech that the word has <strong>in</strong> the corpus. These values<br />

may then be used to calculate probabilities <strong>of</strong> co-occurrence <strong>of</strong><br />

tags <strong>and</strong> tones.<br />

Several problems arise <strong>in</strong> this however, for example differences <strong>in</strong><br />

case between words <strong>in</strong> the vertical tag files <strong>and</strong> prosody files.<br />

Words such as "don't" "won't" "it's" called enclitics, are treated as<br />

two words "do + n't" "will + n't" <strong>in</strong> the vtag files but as s<strong>in</strong>gle words<br />

<strong>in</strong> the prosody file, mean<strong>in</strong>g that the vertical output format will have<br />



to have a blank entry for one column. Similar problems occur with compound
nouns, where "mother-in-law" takes only one entry in the vtag file but
may be marked as three lines (if hyphens are omitted) in the prosody file.

In addition it is possible to have a tone unit boundary (| or || or ^) that

does not co-<strong>in</strong>cide with a punctuation symbol <strong>and</strong> vice-versa. These have<br />

to add new blank entries <strong>in</strong> the appropriate columns.<br />

New tags used are enclosed <strong>in</strong> {}.<br />

TAG-TAGs TONE-TAGs WORD-TAGs Usage.<br />

{PN}<br />

no tone unit match<strong>in</strong>g PuNctuation.<br />

{TU}<br />

{TONE-UNIT} no punc match<strong>in</strong>g Tone Unit<br />

{CP} {COMPOUND} l<strong>in</strong>es follow<strong>in</strong>g ComPound-nouns<br />

{EN}<br />

l<strong>in</strong>es follow<strong>in</strong>g ENclitics<br />

Because, <strong>in</strong> some circumstances a bracket or quote will be next to<br />

some punctuation such as a comma or full stop there is a need for<br />

a post-process<strong>in</strong>g phase to re-organise which punc symbol gets matched<br />

to a tone-unit if one occurs. In these cases the bracket or quote is<br />

given the symbol {CTU}. eg:<br />

JJ philo*sophical philosophical<br />

NN position position<br />

*' {PN} '<br />

NN /naturalism naturalism<br />

**' | '<br />

, {PN} ,<br />

CC <strong>and</strong> <strong>and</strong><br />

will become:<br />

JJ philo*sophical philosophical<br />

NN position position<br />

*' {PN} '<br />

NN /naturalism naturalism<br />

**' {CTU} '<br />

, | ,<br />

CC <strong>and</strong> <strong>and</strong><br />

FILES:<br />

~sca/src/sec/symbolify.c<br />

~sca/src/sec/collate-tu.c<br />

~sca/work/corpus/pros/*<br />

~sca/work/corpus/tag/*<br />

These are similar to the orig<strong>in</strong>al files <strong>in</strong> the corpus <strong>in</strong> /bjw /gok /dup<br />

for the prosody, and /vtag for the tag, except that prosody files are

re-organised <strong>in</strong>to one directory, names are changed, errors are corrected.<br />

REFERENCES:<br />

A manual <strong>of</strong> <strong>in</strong>formation to accompany the SEC <strong>Corpus</strong>.<br />

L.J.Taylor, Dr. G.Knowles, 1988, Lancaster University.<br />

BUGS:<br />

Does not detect eof properly, resulting in erroneous lines at the end of
output files.

PROGRAM MODIFICATION HISTORY<br />

Date | By | Vers | Comments<br />

----------+-----+------+--------------------------------------------------------<br />

04/11/91 | ScA | 1.0 | Created orig<strong>in</strong>al code<br />

26/02/92 | ScA | 1.1 | Made h<strong>and</strong>l<strong>in</strong>g <strong>of</strong> enclitics, punctuation, tone units<br />

| | | <strong>and</strong> compound nouns.<br />



02/03/92 | ScA | 1.2 | Several improvements, added h<strong>and</strong>l<strong>in</strong>g <strong>of</strong> ( ) *' **'<br />

06/03/92 | ScA | 1.3 | fixed some bugs <strong>in</strong> <strong>in</strong>put h<strong>and</strong>l<strong>in</strong>g.<br />

06/04/92 | ScA | 1.4 | changed {} to different symbols for diff situations.

29/04/92 | ScA | 1.5 | removed {MW}, {BR}, made process<strong>in</strong>g simpler<br />

| | | by adding post-processing phase

04/05/92 | ScA | 1.6 | tried to fix e<strong>of</strong> bug <strong>and</strong> multiple mismatch l<strong>in</strong>es.<br />

12/07/92 | ScA | 2.0 | added <strong>in</strong>teractive fix<strong>in</strong>g <strong>of</strong> mismatches.<br />

12/08/92 | ScA | 2.1 | f<strong>in</strong>ished <strong>in</strong>teractive fix<strong>in</strong>g rout<strong>in</strong>es.<br />

\******************************************************************************/<br />

/*----------------------------*\<br />

| INCLUDES AND DEFINITIONS |<br />

\*----------------------------*/<br />

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/*-------------------*\<br />

| GLOBAL VARIABLES |<br />

\*-------------------*/<br />

char ntag[20][40], ntone[20][40],<br />

ntag_word[20][40], ntone_word[20][40]<br />

<strong>in</strong>t b1 = 0, b2 = 0, b3 = 0, b4 = 0<br />

/*-------------------------*\<br />

| FUNCTIONS DEFINITIONS |<br />

\*-------------------------*/<br />

void pr<strong>in</strong>t(str1, str2, str3)<br />

char *str1, *str2, *str3<br />

{<br />

pr<strong>in</strong>tf("%-8s%-24s%-24s\n", str1, str2, str3)<br />

fpr<strong>in</strong>tf(stderr, "%-8s%-24s%-24s\n", str1, str2, str3)<br />

}<br />

char *read_tag1(file)<br />

/* provides tag <strong>in</strong>put from the tag files. */<br />

FILE *file<br />

{<br />

char <strong>in</strong>put[40]<br />

<strong>in</strong>t i<br />

if (b1 == 0)<br />

if (fe<strong>of</strong>(file))<br />

strcpy(<strong>in</strong>put, "")<br />

else<br />

fscanf(file, "%s", <strong>in</strong>put)<br />

else {<br />

strcpy(<strong>in</strong>put, ntag[1])<br />

		for (i = 2; i <= b1; i++)	/* shift buffer down; tail of this function reconstructed */
			strcpy(ntag[i - 1], ntag[i]);
		b1--;
	}
	return (input);
}


char *read_tag2(file)<br />

/* provides word <strong>in</strong>put from the tag files. */<br />

FILE *file<br />

{<br />

char <strong>in</strong>put[40]<br />

<strong>in</strong>t i<br />

if (b2 == 0)<br />

if (fe<strong>of</strong>(file))<br />

strcpy(<strong>in</strong>put, "")<br />

else<br />

fscanf(file, "%s", <strong>in</strong>put)<br />

else {<br />

strcpy(<strong>in</strong>put, ntag_word[1])<br />

		for (i = 2; i <= b2; i++)	/* reconstructed as in read_tag1 */
			strcpy(ntag_word[i - 1], ntag_word[i]);
		b2--;
	}
	return (input);
}

/* (the corresponding read_pros1() and read_pros2() functions, and the  */
/* opening of the handle_unmatched() buffer-filling loop, are missing   */
/* from this listing in the source; the do below is reconstructed)      */

	do {
		if ((++bb1) > b1)

fscanf(tag1, "%s", ntag[++b1])<br />

if ((++bb2) > b2)<br />

fscanf(tag2, "%s", ntag_word[++b2])<br />

if ((++bb3) > b3)<br />

fscanf(pros1, "%s", ntone[++b3])<br />

if ((++bb4) > b4)<br />

fscanf(pros2, "%s", ntone_word[++b4])<br />

} while ((b1 < 10) && (b3 < 10) && strcmp(ntag_word[bb2],<br />

tone_word) && strcmp(ntone_word[bb4], tag_word) && !fe<strong>of</strong>(tag1) &&<br />

!fe<strong>of</strong>(pros1))<br />

if ((punc || tu) || (strcmp(ntag_word[b2], tone_word) &&<br />

strcmp(ntone_word[b4], tag_word))) {<br />



/* if tag is a punctuation symbol or if tone is a tone unit */

/* then it is possible that the previous mismatch is part <strong>of</strong> */<br />

/* an enclitic or compound followed by punctuation or a tone */<br />

/* unit the PUNC or TU may then be found to match a TU or PUNC */<br />

/* later on this is where the automatch<strong>in</strong>g gets complicated */<br />

/* so, ask the user for assistance <strong>in</strong>teractively. Note that we */<br />

/* ignore the results of the bit above - as far as we are */

/* concerned here we only want the buffers full so we can give */

/* the user some context. */<br />

/* <strong>in</strong>teractive fixit */<br />

<strong>in</strong>t i, f<strong>in</strong>ished = 0<br />

char *comm<strong>and</strong>[256]<br />

do {<br />

fpr<strong>in</strong>tf(stderr, "FIXITMODE: Tone next(1-TU,2-CP).<br />

Tags next(3-EN,4-PN). 5:Exit\n")<br />

fpr<strong>in</strong>tf(stderr, " 0# %-8s%-24s%-24s\n",<br />

tag, tone, tag_word)<br />

for (i=1,j=1(i


				break

case 3:<br />

pr<strong>in</strong>t(tag, "{EN}", tag_word)<br />

strcpy(tag, ntag[1])<br />

strcpy(tag_word, ntag_word[1])<br />

			for (i = 2; i <= b1; i++)	/* reconstructed */
				strcpy(ntag[i - 1], ntag[i]);

/* (the remaining interactive cases, the end of handle_unmatched(), and */
/* the opening of main() are missing from this listing in the source)   */

void main(argc, argv)
int argc;
char *argv[];
{


char tag[40], tone[40], tag_word[40], pros_word[40]<br />

<strong>in</strong>t punc = 0, tu = 0, new_data = 0<br />

if (argc != 5) {<br />

fpr<strong>in</strong>tf(stderr, "usage: ttalign tag tag-word tone tone-word\n<br />

Output produced by PROCESS_TAG_PROS")<br />

exit(1)<br />

}<br />

if (!(tag1 = fopen(argv[1], "r"))) {<br />

fpr<strong>in</strong>tf(stderr, "Can't open %s\n", argv[1])<br />

exit(1)<br />

}<br />

if (!(tag2 = fopen(argv[2], "r"))) {<br />

fpr<strong>in</strong>tf(stderr, "Can't open %s\n", argv[2])<br />

exit(1)<br />

}<br />

if (!(pros1 = fopen(argv[3], "r"))) {<br />

fpr<strong>in</strong>tf(stderr, "Can't open %s\n", argv[3])<br />

exit(1)<br />

}<br />

if (!(pros2 = fopen(argv[4], "r"))) {<br />

fpr<strong>in</strong>tf(stderr, "Can't open %s\n", argv[4])<br />

exit(1)<br />

}<br />

while (!fe<strong>of</strong>(tag1) && !fe<strong>of</strong>(pros1)) {<br />

if (!new_data) {<br />

strcpy(tag, read_tag1(tag1))<br />

strcpy(tag_word, read_tag2(tag2))<br />

strcpy(tone, read_pros1(pros1))<br />

strcpy(pros_word, read_pros2(pros2))<br />

}<br />

new_data = 0<br />

if (strstr(".,:!*-...*'**'()", tag))<br />

punc = 1<br />

else<br />

punc = 0<br />

if (strstr("||^|", tone))<br />

tu = 1<br />

else<br />

tu = 0<br />

if (!strcmp(pros_word, tag_word) || (punc && tu))<br />

pr<strong>in</strong>t(tag, tone, tag_word)<br />

else {<br />

if (!punc && !tu) /* neither punc or tu <strong>and</strong> don't match */<br />

h<strong>and</strong>le_unmatched(tag, tone, tag_word,<br />

pros_word, tag1, tag2, pros1, pros2)<br />

if (punc && !tu) /* punctuation but no tu boundary */ {<br />

pr<strong>in</strong>t(tag, "{PN}", tag_word)<br />

if (!fe<strong>of</strong>(tag1)) {<br />

strcpy(tag, read_tag1(tag1))<br />

strcpy(tag_word, read_tag2(tag2))<br />

new_data = 1<br />

}<br />

}<br />

if (!punc && tu) /* tu boundary but no punctuation */ {<br />

pr<strong>in</strong>t("{TU}", tone, "{TONE-UNIT}")<br />

if (!fe<strong>of</strong>(pros1)) {<br />



}<br />

}<br />

}<br />

}<br />

strcpy(tone, read_pros1(pros1))<br />

strcpy(pros_word, read_pros2(pros2))<br />

new_data = 1<br />

}<br />

fclose(pros2)<br />

fclose(pros1)<br />

fclose(tag2)<br />

fclose(tag1)<br />

/*-------*\<br />

| END |<br />

\*-------*/<br />

F.3 collate-tu.c<br />

This program is used as a post-processing phase to ttalign and specifically handles cases where there
are multiple punctuation symbols that coincide with a tone unit boundary. As well as a little shuffling
to ensure that the tone unit boundary is aligned with the primary type of punctuation (viz. brackets
and quotes are less likely to give rise to a boundary than, say, a full stop or comma), it marks the other
punctuation symbols as matching a boundary, whereas ttalign would have left the punctuation marked as
punctuation that does not match a boundary, which would be incorrect.

/******************************************************************************\<br />

collate-tu - collects together punctuation under one TU, where appropriate<br />

AUTHOR:<br />

collate-tu.c (c) Copyright May 1992, Simon Arnfield. All Rights Reserved.<br />

SYNOPSIS:<br />

collate-tu <br />

DESCRIPTION:<br />

Because, <strong>in</strong> some circumstances a bracket or quote will be next to<br />

some punctuation such as a comma or full stop there is a need for<br />

a post-process<strong>in</strong>g phase to re-organise which punc symbol gets matched<br />

to a tone-unit if one occurs. In these cases the bracket or quote is<br />

given the symbol {CTU}. eg:<br />

JJ philo*sophical philosophical<br />

NN position position<br />

*' {PN} '<br />

NN /naturalism naturalism<br />

**' | '<br />

, {PN} ,<br />

CC <strong>and</strong> <strong>and</strong><br />

will become:<br />

JJ philo*sophical philosophical<br />

NN position position<br />

*' {PN} '<br />

NN /naturalism naturalism<br />



**' {CTU} '<br />

, | ,<br />

CC <strong>and</strong> <strong>and</strong><br />

PROGRAM MODIFICATION HISTORY<br />

Date | By | Vers | Comments<br />

----------+-----+------+--------------------------------------------------------<br />

04/05/92 | ScA | 1.0 | Created orig<strong>in</strong>al code<br />

\******************************************************************************/<br />

/*----------------------------*\<br />

| INCLUDES AND DEFINITIONS |<br />

\*----------------------------*/<br />

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/*-------------------------*\<br />

| FUNCTIONS DEFINITIONS |<br />

\*-------------------------*/<br />

void ma<strong>in</strong>(argc, argv)<br />

<strong>in</strong>t argc<br />

char *argv[]<br />

{<br />

FILE * file<br />

char pwrd1[8], pwrd2[24], pwrd3[24]<br />

char wrd1[8], wrd2[24], wrd3[24]<br />

<strong>in</strong>t p_is_tu = 0, is_pn = 0, flag = 0<br />

if (argc != 2) {<br />

pr<strong>in</strong>tf("usage: collate-tu \n")<br />

exit(1)<br />

}<br />

if (!(file = fopen(argv[1], "r"))) {<br />

pr<strong>in</strong>tf("Can't open %s\n", argv[1])<br />

exit(1)<br />

}<br />

fscanf(file, "%s", pwrd1)<br />

fscanf(file, "%s", pwrd2)<br />

fscanf(file, "%s", pwrd3)<br />

while (!fe<strong>of</strong>(file)) {<br />

fscanf(file, "%s", wrd1)<br />

fscanf(file, "%s", wrd2)<br />

fscanf(file, "%s", wrd3)<br />

if (strstr("|^||", pwrd2))<br />

p_is_tu = 1<br />

else<br />

p_is_tu = 0<br />

if (!strcmp("{PN}", wrd2))<br />

is_pn = 1<br />

else<br />

is_pn = 0<br />



flag = 0<br />

if ((!strcmp("'", pwrd1) || !strcmp(")", pwrd1)) &&<br />

p_is_tu && is_pn) {<br />

pr<strong>in</strong>tf("%-8s%-24s%-24s\n", pwrd1, "{CTU}",<br />

pwrd3)<br />

pr<strong>in</strong>tf("%-8s%-24s%-24s\n", wrd1, pwrd2,<br />

wrd3)<br />

fscanf(file, "%s", pwrd1)<br />

fscanf(file, "%s", pwrd2)<br />

fscanf(file, "%s", pwrd3)<br />

flag = 1<br />

}<br />

if (!flag && p_is_tu && is_pn) {<br />

pr<strong>in</strong>tf("%-8s%-24s%-24s\n", pwrd1, pwrd2,<br />

pwrd3)<br />

pr<strong>in</strong>tf("%-8s%-24s%-24s\n", wrd1, "{CTU}",<br />

wrd3)<br />

fscanf(file, "%s", pwrd1)<br />

fscanf(file, "%s", pwrd2)<br />

fscanf(file, "%s", pwrd3)<br />

while (!strcmp("{PN}", pwrd2)) {<br />

pr<strong>in</strong>tf("%-8s%-24s%-24s\n", pwrd1,<br />

"{CTU}", pwrd3)<br />

fscanf(file, "%s", pwrd1)<br />

fscanf(file, "%s", pwrd2)<br />

fscanf(file, "%s", pwrd3)<br />

}<br />

flag = 1<br />

}<br />

}<br />

if (!flag) {<br />

pr<strong>in</strong>tf("%-8s%-24s%-24s\n", pwrd1, pwrd2,<br />

pwrd3)<br />

strcpy(pwrd1, wrd1)<br />

strcpy(pwrd2, wrd2)<br />

strcpy(pwrd3, wrd3)<br />

}<br />

}<br />

fclose(file)<br />

/*-------*\<br />

| END |<br />

\*-------*/<br />

F.4 align-parse.c<br />

align-parse takes the output from ttalign and the parse tree file from which tags were taken, and inserts
the phrase brackets at the appropriate place in the align file. That is, it produces a file with the phrase
brackets and tags aligned with the prosodic words and tone unit boundaries. This somewhat trivial task is
complicated by the need to check for tone unit boundaries, or the absence of one, and to insert lines appropriately.

/******************************************************************************\<br />



align-parse - aligns sec parsetree with output from ttalign<br />

AUTHOR:<br />

align-parse (c) Copyright November 1991, Simon Arnfield. All Rights Reserved.<br />

SYNOPSIS:<br />

align-parse<br />

parse-file tag-tone-file<br />

DESCRIPTION:<br />

Aligns a sec parsetree with the output produced by ttalign. That is it<br />

produces a file conta<strong>in</strong><strong>in</strong>g phrase brackets <strong>and</strong> words aligned with tone-unit<br />

boundaries <strong>and</strong> words <strong>in</strong> the output from the tag-tone alignment.<br />

IDIOSYNCRASIES:<br />

FILES:<br />

REFERENCES:<br />

A manual <strong>of</strong> <strong>in</strong>formation to accompany the SEC <strong>Corpus</strong>.<br />

L.J.Taylor, Dr. G.Knowles, 1988, Lancaster University.<br />

BUGS:<br />

PROGRAM MODIFICATION HISTORY<br />

Date | By | Vers | Comments<br />

----------+-----+------+--------------------------------------------------------<br />

07/03/92 | ScA | 1.0 | Created orig<strong>in</strong>al code<br />

06/08/92 | ScA | 1.1 | changed because ttalign is now used with treebank<br />

17/08/92 | ScA | 1.2 | added recognition <strong>of</strong> compounds as these will skew<br />

| output <strong>in</strong> same way as TU would, if l<strong>in</strong>es are matched.<br />

| | | data. Therefore word-tags are same.<br />

\******************************************************************************/<br />

/*----------------------------*\<br />

| INCLUDES AND DEFINITIONS |<br />

\*----------------------------*/<br />

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/*-------------------------*\<br />

| FUNCTIONS DEFINITIONS |<br />

\*-------------------------*/<br />

void ma<strong>in</strong>(argc, argv)<br />

<strong>in</strong>t argc<br />

char *argv[]<br />

{<br />

FILE * tag1, *tag2<br />

char <strong>in</strong>put1[80], <strong>in</strong>put1b[40], <strong>in</strong>2a[32], <strong>in</strong>2b[32], <strong>in</strong>2c[32]<br />

if (argc != 3) {<br />

pr<strong>in</strong>tf("usage: align-parse parse-file tag-tone-file\n")<br />

exit(1)<br />

}<br />



if (!(tag1 = fopen(argv[1], "r"))) {<br />

pr<strong>in</strong>tf("Can't open %s\n", argv[1])<br />

exit(1)<br />

}<br />

if (!(tag2 = fopen(argv[2], "r"))) {<br />

pr<strong>in</strong>tf("Can't open %s\n", argv[2])<br />

exit(1)<br />

}<br />

fscanf(tag1, "%s", <strong>in</strong>put1)<br />

fscanf(tag2, "%s", <strong>in</strong>2a)<br />

fscanf(tag2, "%s", <strong>in</strong>2b)<br />

fscanf(tag2, "%s", <strong>in</strong>2c)<br />

while (!fe<strong>of</strong>(tag1) && !fe<strong>of</strong>(tag2)) {<br />

/* if <strong>in</strong>2a = {TU} then must <strong>in</strong>sert blank l<strong>in</strong>e <strong>in</strong> col 1 */<br />

if (!strcmp(<strong>in</strong>2a, "{TU}")) {<br />

pr<strong>in</strong>tf("%-40s%-6s%-16s%-18s\n", "", <strong>in</strong>2a,<br />

<strong>in</strong>2c, <strong>in</strong>2b)<br />

fscanf(tag2, "%s", <strong>in</strong>2a)<br />

fscanf(tag2, "%s", <strong>in</strong>2b)<br />

fscanf(tag2, "%s", <strong>in</strong>2c)<br />

}<br />

/* if <strong>in</strong>2a = {CP} treat as {TU} */<br />

/* BUT can have sequences <strong>of</strong> more than one {CP} */<br />

while (!strcmp(<strong>in</strong>2a, "{CP}")) {<br />

pr<strong>in</strong>tf("%-40s%-6s%-16s%-18s\n", "", <strong>in</strong>2a,<br />

<strong>in</strong>2c, <strong>in</strong>2b)<br />

fscanf(tag2, "%s", <strong>in</strong>2a)<br />

fscanf(tag2, "%s", <strong>in</strong>2b)<br />

fscanf(tag2, "%s", <strong>in</strong>2c)<br />

}<br />

		/* if input1 has [ or ] in it then add next line, */
		/* otherwise align with input2                    */

if (strstr(<strong>in</strong>put1, "[") || strstr(<strong>in</strong>put1, "]")) {<br />

fscanf(tag1, "%s", <strong>in</strong>put1b)<br />

if (strstr(<strong>in</strong>put1b, "[") || strstr(<strong>in</strong>put1b,<br />

"]")) {<br />

strcat(<strong>in</strong>put1, <strong>in</strong>put1b)<br />

fscanf(tag1, "%s", <strong>in</strong>put1b)<br />

} else<br />

pr<strong>in</strong>tf("%-40s%-6s%-16s%-18s\n",<br />

<strong>in</strong>put1, <strong>in</strong>2a, <strong>in</strong>2c, <strong>in</strong>2b)<br />

} else<br />

pr<strong>in</strong>tf("%-40s%-6s%-16s%-18s\n", "", <strong>in</strong>2a,<br />

<strong>in</strong>2c, <strong>in</strong>2b)<br />

}<br />

fscanf(tag1, "%s", <strong>in</strong>put1)<br />

fscanf(tag2, "%s", <strong>in</strong>2a)<br />

fscanf(tag2, "%s", <strong>in</strong>2b)<br />

fscanf(tag2, "%s", <strong>in</strong>2c)<br />

}<br />

fclose(tag2)<br />

fclose(tag1)<br />



/*-------*\

| END |<br />

\*-------*/<br />

F.5 splittufile.c

This program provided a front end to probabilityc by splitting the align files produced by ttalign into many
smaller files and generating a C-shell script to execute probabilityc on these files. This is used for testing
probabilityc and for producing results. It is not intended as a real front end to the model, only a front
end to allow testing. Splitting the input into lots of files is very dirty programming, but it allows recovery of
a run in case the machine goes down whilst the lengthy calculations take place. A nicer front end should
be written for use with the model for synthesis purposes.

/* splittufile.c Copyright Simon Arnfield August 1993 */
/* this program converts align files to many files suitable for use with    */
/* probabilityc.c; it also generates (on stdout) a script to process these. */
/* the align files must first have been processed with the following script */
/* cat ~/work/corpus/align/a01.align|grep -v "{TU}"| grep -v "{CTU}"| \     */
/* grep -v '^$ {EN}' |grep -v "XXXX"|splittufile|csh >a01.results           */
/* we break the input either at punctuation or when the file is MAXWRD long */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define MAXWRD 12

main()
{
	char bufw[MAXWRD][99], buft[MAXWRD][99], filename[9], word[99],
	     tag[99], tmp[99];
	int pos = 0, i, filenum = 1;
	FILE *file;

	sprintf(filename, "file%d", filenum);
	while (!feof(stdin)) {
		scanf("%s %s %*s", tag, word);
		/* merge adjacent punctuation symbols into a single entry */
		if (pos >= 1 && strstr("!(),-.:...'", tag) &&
		    strstr("!(),-.:...'", buft[pos - 1])) {
			pos--;
			strcpy(tmp, buft[pos]);
			strcat(tmp, tag);
			strcpy(tag, tmp);
			strcpy(tmp, bufw[pos]);
			strcat(tmp, word);
			strcpy(word, tmp);
		}
		strcpy(buft[pos], tag);
		strcpy(bufw[pos], word);
		pos++;
		if ((pos == MAXWRD || strstr("!(),-.:...'", tag)) && pos > 3) {
			if ((file = fopen(filename, "w")) == 0) {
				fprintf(stderr, "couldn't open file %s\n", filename);
				exit(1);
			}
			fprintf(file, "%d\n", pos);
			for (i = 0; i < pos; i++)
				fprintf(file, "%s %s\n", buft[i], bufw[i]);
			fclose(file);
			/* emit one script line per chunk; this line and the rest  */
			/* of the listing were truncated in the source, so the     */
			/* command and the reset below are reconstructed.          */
			printf("probabilityc <%s\n", filename);
			pos = 0;
			sprintf(filename, "file%d", ++filenum);
		}
	}
}

F.6 transitions.c

This program reads the aligned tag/prosody stream produced by ttalign and counts, for each ordered
pair of adjacent word-class tags, the frequency of each combination of stressed and unstressed states.
The four resulting 187 by 187 count tables, written to standard output, form the transitions.table
taken as input by transgroups (section F.7).

/*----------------------------*\


| INCLUDES AND DEFINITIONS |<br />

\*----------------------------*/<br />

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/*-------------------*\<br />

| GLOBAL VARIABLES |<br />

\*-------------------*/<br />

char tags[187][7] = {<br />

"&FO", "&FW", "APP$", "AT", "AT1", "BTO", "BTO21", "BTO22",<br />

"CC", "CC31", "CC32", "CC33", "CCB", "CF", "CS", "CS21",<br />

"CS22", "CSA", "CSN", "CST", "CSW", "DA", "DA1", "DA2",<br />

"DA2R", "DAR", "DAT", "DB", "DB2", "DD", "DD1", "DD121",<br />

"DD122", "DD2", "DD21", "DD22", "DD221", "DD222", "DDQ",<br />

"DDQ$", "DDQV", "EX", "ICS", "IF", "II", "II21", "II22",<br />

"II31", "II32", "II33", "IO", "IW", "JA", "JB", "JBR",<br />

"JJ", "JJR", "JJT", "LE", "MC", "MC-MC", "MC1", "MC2",<br />

"MD", "MF", "ND1", "NN", "NN1", "NN121", "NN122", "NN2",<br />

"NNJ", "NNJ1", "NNJ2", "NNL1", "NNL2", "NNO", "NNO2",<br />

"NNS", "NNS1", "NNS2", "NNSA1", "NNSB1", "NNT1", "NNT2",<br />

"NNU", "NNU1", "NNU2", "NNU21", "NNU22", "NP", "NP1",<br />

"NP2", "NPD1", "NPM1", "PN", "PN1", "PN121", "PN122",<br />

"PNQO", "PNQS", "PP$", "PPH1", "PPHO1", "PPHO2", "PPHS1",<br />

"PPHS2", "PPIO1", "PPIO2", "PPIS1", "PPIS2", "PPX1",<br />

"PPX121", "PPX122", "PPX2", "PPX221", "PPX222", "PPY",<br />

"RA", "REX", "REX21", "REX22", "RG", "RG21", "RG22",<br />

"RGA", "RGQ", "RGQV", "RGR", "RGT", "RL", "RL21", "RL22",<br />

"RP", "RR", "RR21", "RR22", "RR31", "RR32", "RR33", "RRQ",<br />

"RRQV", "RRR", "RRT", "RT", "TO", "UH", "VB0", "VBDR",<br />

"VBDZ", "VBG", "VBM", "VBN", "VBR", "VBZ", "VD0", "VDD",<br />

"VDG", "VDN", "VDZ", "VH0", "VHD", "VHG", "VHN", "VHZ",<br />

"VM", "VM21", "VM22", "VMK", "VV0", "VVD", "VVG", "VVN",<br />

"VVZ", "XX", "ZZ1", "{CP}", ".", ":", "", "!", "",<br />

",", "-", "(", ")", "'"}<br />

<strong>in</strong>t trUU[190][190], trUS[190][190], trSU[190][190], trSS[190][190]<br />

/*-------------------------*\<br />

| FUNCTIONS DEFINITIONS |<br />

\*-------------------------*/<br />

void main()
{
	char tag[7], prosody[40], word[40];
	int i, j, t1 = -1, t2 = -1, s1, s2;

	for (i = 0; i < 190; i++) {		/* initialise the full 190x190 arrays */
		for (j = 0; j < 190; j++) {
			trUU[i][j] = 0;
			trUS[i][j] = 0;
			trSU[i][j] = 0;
			trSS[i][j] = 0;
		}
	}

	/* assume first tag is ok - we don't check if it IS 0-187 */
	scanf("%s %s %s", tag, prosody, word);
	for (i = 0; i < 187; i++)
		if (!strcmp(tag, tags[i]))
			t1 = i;
	for (s1 = 0, j = 0; j < strlen(prosody); j++)
		if (prosody[j] == '*' || prosody[j] == ',' || prosody[j] == '/' ||
		    prosody[j] == '\\' || prosody[j] == '`' ||
		    prosody[j] == '~' || prosody[j] == '_') {
			s1 = 1;
			break;
		}

	while (!feof(stdin)) {
		scanf("%s %s %s", tag, prosody, word);
		for (t2 = -1, i = 0; i < 187; i++)
			if (!strcmp(tags[i], tag))
				t2 = i;
		if (t2 == -1) {
			printf("Error: %s\n", tag);
			scanf("%s %s %s", tag, prosody, word);
			for (t2 = -1, i = 0; i < 187; i++)
				if (!strcmp(tags[i], tag))
					t2 = i;
		}
		for (s2 = 0, j = 0; j < strlen(prosody); j++)
			if (prosody[j] == '*' || prosody[j] == ',' ||
			    prosody[j] == '/' || prosody[j] == '\\' ||
			    prosody[j] == '`' || prosody[j] == '~' ||
			    prosody[j] == '_')
				s2 = 1;
		if (s1 == 0 && s2 == 0)
			trUU[t1][t2]++;
		if (s1 == 0 && s2 == 1)
			trUS[t1][t2]++;
		if (s1 == 1 && s2 == 0)
			trSU[t1][t2]++;
		if (s1 == 1 && s2 == 1)
			trSS[t1][t2]++;
		t1 = t2;
		s1 = s2;
	}
	for (i = 0; i < 187; i++) {
		for (j = 0; j < 187; j++)
			printf("%d ", trUU[i][j]);
		printf("\n");
	}
	for (i = 0; i < 187; i++) {
		for (j = 0; j < 187; j++)
			printf("%d ", trUS[i][j]);
		printf("\n");
	}
	for (i = 0; i < 187; i++) {
		for (j = 0; j < 187; j++)
			printf("%d ", trSU[i][j]);
		printf("\n");
	}
	for (i = 0; i < 187; i++) {
		for (j = 0; j < 187; j++)
			printf("%d ", trSS[i][j]);
		printf("\n");
	}
}

/*-------*\<br />

| END |<br />

\*-------*/<br />

F.7 transgroups.c<br />

transgroups has hard-wired into it the group definitions (i.e. which tags belong to which groups). It takes
as input the transitions.table produced by the transitions program. It uses the group definitions and the
transitions.table to produce the group-to-group transition probabilities: that is, the probability that a tag
in group G1 will have stress state S1 and be followed by a tag in group G2 which has stress state S2. S1
and S2, in this case, are either stressed or unstressed. The resulting probabilities are used in probabilityc.
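Written out, for groups G1 and G2 and stress states S1, S2 in {U, S}, the probability computed
below is

    P(S1, S2 | G1, G2) = f(G1,S1; G2,S2) /
        [ f(G1,U; G2,U) + f(G1,U; G2,S) + f(G1,S; G2,U) + f(G1,S; G2,S) ]

where f(G1,S1; G2,S2) is the total number of times any tag in G1 carrying state S1 is immediately
followed by any tag in G2 carrying state S2, summed over all tag pairs in the two groups. This
restates the normalisation by T performed at the end of the listing.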

/******************************************************************************\<br />

transgroups - adds up values <strong>in</strong> transitions.table for given group transitions<br />

AUTHOR:<br />

transgroups.c (c) Copyright May 1993, Simon Arnfield. All Rights Reserved.<br />

SYNOPSIS:<br />

 transgroups < transitions.table
\******************************************************************************/

/*----------------------------*\

| INCLUDES AND DEFINITIONS |<br />

\*----------------------------*/<br />

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#def<strong>in</strong>e NUMGROUPS 10<br />

/*-------------------------*\<br />

| FUNCTIONS DEFINITIONS |<br />

\*-------------------------*/<br />

<strong>in</strong>t whichgroup(tg) /* returns group number for tag number tg */<br />

<strong>in</strong>t tg<br />

{<br />

switch (tg) {<br />

case 177: case 178: case 179: case 180: case 181:<br />

case 182: case 183: case 184: case 185: case 186:<br />

return(0)<br />

case 1: case 2: case 8: case 9: case 10:<br />

case 11: case 44: case 51: case 95: case 96:<br />

case 97: case 98: case 101: case 103: case 105:<br />

case 146: case 148: case 150: case 151: case 154:<br />

case 155: case 156: case 157: case 158: case 159:<br />

case 161: case 162: case 163: case 164:<br />

return(1)<br />

case 41: case 43: case 99: case 100: case 102:<br />

case 107: case 108: case 109: case 110: case 147:<br />

case 149: case 152:<br />

return(2)<br />

case 3: case 4: case 5: case 6: case 7:<br />

case 18: case 19: case 46: case 47: case 49:<br />

case 50: case 117: case 145:<br />

return(3)<br />

case 12: case 13: case 14: case 15: case 16:<br />

case 17: case 20: case 38: case 39: case 40:<br />

case 42: case 58: case 81: case 82: case 104:<br />

case 106: case 153: case 160: case 165: case 166:<br />

case 167: case 168:<br />

return(4)<br />

case 21: case 22: case 23: case 24: case 25:<br />

case 26: case 27: case 28: case 52: case 53:<br />

case 54: case 55: case 56: case 57: case 66:<br />

case 90: case 91: case 92: case 118: case 130:<br />

case 131: case 132: case 134: case 135: case 136:<br />

case 137: case 138: case 139: case 140: case 141:<br />

case 142: case 143: case 144: case 169: case 172:<br />

case 175:<br />

return(5)<br />



case 0: case 29: case 59: case 60: case 61:<br />

case 62: case 63: case 174:<br />

return(6)<br />

case 78: case 79: case 80: case 170: case 171:<br />

case 173:<br />

return(7)<br />

case 65: case 67: case 68: case 69: case 70:<br />

case 71: case 72: case 73: case 74: case 75:<br />

case 83: case 84: case 111: case 112: case 113:<br />

case 114: case 115: case 116: case 176:<br />

return(8)<br />

case 30: case 31: case 32: case 33: case 34:<br />

case 35: case 36: case 37: case 45: case 48:<br />

case 64: case 76: case 77: case 85: case 86:<br />

case 87: case 88: case 89: case 93: case 94:<br />

case 119: case 120: case 121: case 122: case 123:<br />

case 124: case 125: case 126: case 127: case 128:<br />

case 129: case 133:<br />

return(9)<br />

}<br />

}<br />

void ma<strong>in</strong>()<br />

{<br />

<strong>in</strong>t i, j, val, g1, g2,<br />

gpUU[NUMGROUPS][NUMGROUPS], gpUS[NUMGROUPS][NUMGROUPS],<br />

gpSU[NUMGROUPS][NUMGROUPS], gpSS[NUMGROUPS][NUMGROUPS]<br />

float T = 0<br />

for (i = 0 i < NUMGROUPS i++) {<br />

for (j = 0 j < NUMGROUPS j++) {<br />

gpUU[i][j] = 0<br />

gpUS[i][j] = 0<br />

gpSU[i][j] = 0<br />

gpSS[i][j] = 0<br />

}<br />

}<br />

for (i = 0 i < 187 i++)<br />

for (j = 0 j < 187 j++) {<br />

scanf("%d", &val)<br />

g1 = whichgroup(i)<br />

g2 = whichgroup(j)<br />

gpUU[g1][g2] +=val<br />

}<br />

for (i = 0 i < 187 i++)<br />

for (j = 0 j < 187 j++) {<br />

scanf("%d", &val)<br />

g1 = whichgroup(i)<br />

g2 = whichgroup(j)<br />

gpUS[g1][g2] +=val<br />



}<br />

for (i = 0 i < 187 i++)<br />

for (j = 0 j < 187 j++) {<br />

scanf("%d", &val)<br />

g1 = whichgroup(i)<br />

g2 = whichgroup(j)<br />

gpSU[g1][g2] +=val<br />

}<br />

for (i = 0 i < 187 i++)<br />

for (j = 0 j < 187 j++) {<br />

scanf("%d", &val)<br />

g1 = whichgroup(i)<br />

g2 = whichgroup(j)<br />

gpSS[g1][g2] +=val<br />

}<br />

for (i = 0 i < NUMGROUPS i++) {<br />

for (j = 0 j < NUMGROUPS j++) {<br />

T = gpUU[i][j] + gpUS[i][j] + gpSU[i][j]<br />

+ gpSS[i][j]<br />

pr<strong>in</strong>tf("%5.4f,", (float)gpUU[i][j] / T)<br />

}<br />

pr<strong>in</strong>tf("\n")<br />

}<br />

for (i = 0 i < NUMGROUPS i++) {<br />

for (j = 0 j < NUMGROUPS j++) {<br />

T = gpUU[i][j] + gpUS[i][j] + gpSU[i][j]<br />

+ gpSS[i][j]<br />

pr<strong>in</strong>tf("%5.4f,", (float)gpUS[i][j] / T)<br />

}<br />

pr<strong>in</strong>tf("\n")<br />

}<br />

for (i = 0 i < NUMGROUPS i++) {<br />

for (j = 0 j < NUMGROUPS j++) {<br />

T = gpUU[i][j] + gpUS[i][j] + gpSU[i][j]<br />

+ gpSS[i][j]<br />

pr<strong>in</strong>tf("%5.4f,", (float)gpSU[i][j] / T)<br />

}<br />

pr<strong>in</strong>tf("\n")<br />

}<br />

for (i = 0 i < NUMGROUPS i++) {<br />

for (j = 0 j < NUMGROUPS j++) {<br />

T = gpUU[i][j] + gpUS[i][j] + gpSU[i][j]<br />

+ gpSS[i][j]<br />

pr<strong>in</strong>tf("%5.4f,", (float)gpSS[i][j] / T)<br />

}<br />

pr<strong>in</strong>tf("\n")<br />

}<br />

}<br />

/*-------*\<br />

| END |<br />

\*-------*/<br />



F.8 segment.c<br />

This program reads standard input and segments the text input file into tone units using three tone unit
boundary markers. Initially this is based upon punctuation; further research should lead to a more
refined segmentation algorithm that makes use of phrase boundaries and rules. As this program stands,
punctuation is mapped into tone unit boundary symbols, with the exception of quotes, which are mapped
into nothing.
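As an illustration (the input sentence is invented, not corpus text), feeding the line

    Hello, world. Goodbye!

through segment yields, up to spacing,

    Hello | world || Goodbye ||

since the comma is classed as a minor boundary, the full stop and exclamation mark as major
boundaries, and the punctuation characters themselves are not echoed.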

/******************************************************************************\<br />

segment.c - segment text file <strong>in</strong>to tone-units.<br />

AUTHOR:<br />

(c) Copyright July 1992, Simon Arnfield. All Rights Reserved.<br />

SYNOPSIS:<br />

segment<br />

DESCRIPTION:<br />

reads st<strong>and</strong>ard <strong>in</strong>put <strong>and</strong> segments the text <strong>in</strong>put file <strong>in</strong>to tone-units<br />

us<strong>in</strong>g three tone-unit boundary markers. Initially this is based upon<br />

 punctuation. Punctuation is classed either as a minor TU boundary,
 a major TU boundary or nothing. Where appropriate multiple punctuation
 exists one TU is produced. Precedences: || first, then ^, then |, then none.

Distributions are taken from frequency <strong>of</strong> co-occurrence as given by<br />

results from ttalign.<br />

none open/close quotes<br />

m<strong>in</strong>or , ( ) hyphen<br />

major . ! :<br />

FILES:<br />

REFERENCES:<br />

BUGS:<br />

deletes apostrophes because it can't tell them from open-quotes.

PROGRAM MODIFICATION HISTORY<br />

Date | By | Vers | Comments<br />

----------+-----+------+--------------------------------------------------------<br />

3/07/92 | ScA | 1.0 | Created orig<strong>in</strong>al code<br />

19/07/92 | ScA | | f<strong>in</strong>ished version 1.0<br />

| | |<br />

\******************************************************************************/<br />

/*----------------------------*\<br />

| INCLUDES AND DEFINITIONS |<br />

\*----------------------------*/<br />

#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>

#def<strong>in</strong>e NOTALPHA 0<br />



#def<strong>in</strong>e ALPHA 1<br />

#def<strong>in</strong>e NOTU 0<br />

#def<strong>in</strong>e MINORTU 1<br />

#def<strong>in</strong>e MAJORTU 2<br />

/*-------------------------*\<br />

| FUNCTIONS DEFINITIONS |<br />

\*-------------------------*/<br />

void ma<strong>in</strong>()<br />

{<br />

	int tu = NOTU, ch = NOTALPHA;
	char c, lc = '\0';

	while (!feof(stdin)) {

c = getchar()<br />

if (isalnum(c) || isspace(c))<br />

ch = ALPHA<br />

else<br />

ch = NOTALPHA<br />

if (c == '\'' && isalnum(lc))<br />

ch = ALPHA<br />

if (strstr(",()-", &c) && tu == NOTU)<br />

tu = MINORTU<br />

if (strstr(".!:", &c))<br />

tu = MAJORTU<br />

if (ch == ALPHA) {<br />

if (tu == MINORTU)<br />

pr<strong>in</strong>tf("| ")<br />

if (tu == MAJORTU)<br />

pr<strong>in</strong>tf("|| ")<br />

tu = NOTU<br />

putchar(c)<br />

lc = c<br />

}<br />

	}
}

/*-------*\<br />

| END |<br />

\*-------*/<br />

F.9 probability.c<br />

Probability is the original stress prediction model described in chapter 5. It makes no use of
word-class/prosodic-mark bigram frequencies, but uses prosodic-mark bigram frequencies.
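To make the use of these figures concrete, the following fragment is an illustrative sketch written
for this appendix; it is not taken from probability.c, and the function name, the 2x2 bigram table
and the reading of columns 0 and 2 of the probs array are assumptions. It scores one candidate
bi-state stress assignment as the product of per-word-class stress probabilities and stress-state
bigram probabilities:

/* Illustrative sketch only, not part of the thesis software: score one    */
/* candidate stress assignment s[0..n-1] (0 = unstressed, 1 = stressed)    */
/* for the word-class sequence t[0..n-1].  probs[tag][0] and probs[tag][2] */
/* are read as P(unstressed | tag) and P(stressed | tag), as in the table  */
/* below; bigram[a][b] is a hypothetical 2x2 table of prosodic-mark        */
/* (stress-state) transition probabilities.                                */
double score_assignment(int *t, int *s, int n,
                        float probs[][3], float bigram[2][2])
{
	int i;
	double p = probs[t[0]][s[0] ? 2 : 0];

	for (i = 1; i < n; i++)
		p *= probs[t[i]][s[i] ? 2 : 0] * bigram[s[i - 1]][s[i]];
	return p;
}

A model of this shape evaluates such a score for each candidate assignment and keeps the best; the
NUMBEST constant in the listing below suggests that only the single highest-scoring assignment is
retained, and chapter 5 describes how the actual model organises the search.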

/* probability.c copyright Simon Arnfield 15th January 1993 */<br />

/* compile with lc -Lm -DBISTATE probability.c for bi-stress model */<br />

/* compile with lc -Lm -DTRISTATE probability.c for tri-stress model */<br />

#include <stdio.h>
#include <string.h>
#include <math.h>

#def<strong>in</strong>e MAXWORDS 20<br />

#def<strong>in</strong>e NUMTAGS 168<br />



#def<strong>in</strong>e NUMBEST 1<br />

char tags[NUMTAGS][7] ={<br />

"&FO", "&FW", "APP$", "AT", "AT1", "BTO21",<br />

"BTO22","CC", "CC31", "CC32", "CC33", "CCB",<br />

"CF", "CS", "CS21", "CS22", "CSA", "CSN",<br />

"CST", "CSW", "DA", "DA1", "DA2", "DAR",<br />

"DAT", "DB", "DB2", "DD", "DD1", "DD121",<br />

"DD122", "DD2","DD21", "DD22", "DD221","DD222",<br />

"DDQ", "DDQ$","DDQV", "EX", "ICS", "IF", "II",<br />

"II21", "II22","II31", "II32", "II33", "IO",<br />

"IW", "JA", "JB", "JBR", "JJ", "JJR",<br />

"JJT", "LE", "MC", "MC-MC", "MC1", "MC2",<br />

"MD", "MF", "ND1", "NN", "NN1", "NN121",<br />

"NN122","NN2", "NNJ", "NNJ1", "NNJ2", "NNL1",<br />

"NNL2", "NNO", "NNO2", "NNS", "NNS1", "NNS2",<br />

"NNSA1","NNSB1", "NNT1", "NNT2","NNU", "NNU1",<br />

"NNU2", "NNU21", "NNU22","NP", "NP1", "NP2",<br />

"NPD1", "NPM1","PN", "PN1", "PN121","PN122",<br />

"PNQO", "PNQS","PP$", "PPH1", "PPHO1","PPHO2",<br />

"PPHS1","PPHS2","PPIO1","PPIO2","PPIS1","PPIS2",<br />

"PPX1", "PPX121","PPX122","PPX2","PPY","RA",<br />

"REX21","REX22","RG", "RG21", "RG22", "RGA",<br />

"RGQ", "RGQV", "RGR", "RGT", "RL", "RL21",<br />

"RL22", "RP", "RR", "RR21", "RR22", "RR31",<br />

"RR32", "RR33", "RRQ", "RRQV", "RRR", "RRT",<br />

"RT", "TO", "UH", "VB0", "VBDR", "VBDZ",<br />

"VBG", "VBM", "VBN", "VBR", "VBZ", "VD0",<br />

"VDD", "VDG", "VDN", "VDZ", "VH0", "VHD",<br />

"VHG", "VHN", "VHZ", "VM", "VM21", "VV0",<br />

"VVD", "VVG", "VVN", "VVZ", "XX"}<br />

/**** probabilities for bi-stress model ie Q A ****/<br />

float probs[NUMTAGS][3] = {<br />

0.0000, 0.00, 1.0000,<br />

0.2000, 0.00, 0.8000,<br />

0.8991, 0.00, 0.1009,<br />

0.9605, 0.00, 0.0395,<br />

0.9749, 0.00, 0.0250,<br />

0.5000, 0.00, 0.5000,<br />

0.7500, 0.00, 0.2500,<br />

0.8712, 0.00, 0.1288,<br />

1.0000, 0.00, 0.0000,<br />

0.0000, 0.00, 1.0000,<br />

1.0000, 0.00, 0.0000,<br />

0.8284, 0.00, 0.1716,<br />

0.5588, 0.00, 0.4412,<br />

0.5823, 0.00, 0.4177,<br />

0.2778, 0.00, 0.7222,<br />

0.7778, 0.00, 0.2223,<br />

0.7742, 0.00, 0.2258,<br />

1.0000, 0.00, 0.0000,<br />

0.9701, 0.00, 0.0299,<br />

0.0714, 0.00, 0.9286,<br />

0.3830, 0.00, 0.6170,<br />

0.2308, 0.00, 0.7692,<br />

0.2381, 0.00, 0.7619,<br />

143


0.2619, 0.00, 0.7381,<br />

0.0556, 0.00, 0.9445,<br />

0.0889, 0.00, 0.9112,<br />

0.1333, 0.00, 0.8666,<br />

0.2371, 0.00, 0.7629,<br />

0.3965, 0.00, 0.6035,<br />

1.0000, 0.00, 0.0000,<br />

0.0000, 0.00, 1.0000,<br />

0.3867, 0.00, 0.6133,<br />

1.0000, 0.00, 0.0000,<br />

0.5000, 0.00, 0.5000,<br />

1.0000, 0.00, 0.0000,<br />

0.2500, 0.00, 0.7500,<br />

0.6584, 0.00, 0.3416,<br />

1.0000, 0.00, 0.0000,<br />

0.0000, 0.00, 1.0000,<br />

0.8977, 0.00, 0.1023,<br />

0.5372, 0.00, 0.4628,<br />

0.9478, 0.00, 0.0522,<br />

0.8774, 0.00, 0.1226,<br />

0.3714, 0.00, 0.6286,<br />

0.9859, 0.00, 0.0141,<br />

1.0000, 0.00, 0.0000,<br />

0.0909, 0.00, 0.9091,<br />

1.0000, 0.00, 0.0000,<br />

0.9912, 0.00, 0.0088,<br />

0.8626, 0.00, 0.1373,<br />

0.0000, 0.00, 1.0000,<br />

0.1463, 0.00, 0.8537,<br />

0.0000, 0.00, 1.0000,<br />

0.0852, 0.00, 0.9148,<br />

0.0811, 0.00, 0.9189,<br />

0.1176, 0.00, 0.8823,<br />

0.2500, 0.00, 0.7500,<br />

0.1119, 0.00, 0.8881,<br />

0.2500, 0.00, 0.7500,<br />

0.1837, 0.00, 0.8164,<br />

0.0000, 0.00, 1.0000,<br />

0.1908, 0.00, 0.8092,<br />

0.3429, 0.00, 0.6572,<br />

0.1304, 0.00, 0.8696,<br />

0.1628, 0.00, 0.8372,<br />

0.0759, 0.00, 0.9241,<br />

0.0000, 0.00, 1.0000,<br />

0.0000, 0.00, 1.0000,<br />

0.0920, 0.00, 0.9080,<br />

0.1624, 0.00, 0.8376,<br />

0.0588, 0.00, 0.9412,<br />

0.0741, 0.00, 0.9260,<br />

0.1348, 0.00, 0.8653,<br />

0.1111, 0.00, 0.8889,<br />

0.3467, 0.00, 0.6534,<br />

0.0000, 0.00, 1.0000,<br />

0.0000, 0.00, 1.0000,<br />

0.1413, 0.00, 0.8587,<br />

0.0833, 0.00, 0.9167,<br />

0.0000, 0.00, 1.0000,<br />

0.7595, 0.00, 0.2405,<br />

144


0.1667, 0.00, 0.8333,<br />

0.1522, 0.00, 0.8478,<br />

0.3000, 0.00, 0.7000,<br />

0.1176, 0.00, 0.8824,<br />

0.0816, 0.00, 0.9184,<br />

0.9643, 0.00, 0.0357,<br />

0.1852, 0.00, 0.8148,<br />

0.5000, 0.00, 0.5000,<br />

0.0889, 0.00, 0.9111,<br />

0.0000, 0.00, 1.0000,<br />

0.0000, 0.00, 1.0000,<br />

0.0625, 0.00, 0.9376,<br />

0.0000, 0.00, 1.0000,<br />

0.1316, 0.00, 0.8685,<br />

0.0000, 0.00, 1.0000,<br />

1.0000, 0.00, 0.0000,<br />

0.0000, 0.00, 1.0000,<br />

0.8966, 0.00, 0.1035,<br />

0.8333, 0.00, 0.1667,<br />

0.9316, 0.00, 0.0684,<br />

0.7059, 0.00, 0.2941,<br />

0.9524, 0.00, 0.0476,<br />

0.8788, 0.00, 0.1212,<br />

0.8017, 0.00, 0.1984,<br />

0.7000, 0.00, 0.3000,<br />

1.0000, 0.00, 0.0000,<br />

0.7797, 0.00, 0.2203,<br />

0.9057, 0.00, 0.0944,<br />

0.0323, 0.00, 0.9677,<br />

0.0000, 0.00, 1.0000,<br />

0.0000, 0.00, 1.0000,<br />

0.1667, 0.00, 0.8333,<br />

0.9577, 0.00, 0.0423,<br />

0.2143, 0.00, 0.7857,<br />

0.9677, 0.00, 0.0323,<br />

0.0323, 0.00, 0.9678,<br />

0.4370, 0.00, 0.5631,<br />

1.0000, 0.00, 0.0000,<br />

1.0000, 0.00, 0.0000,<br />

0.1429, 0.00, 0.8571,<br />

0.5000, 0.00, 0.5000,<br />

0.0000, 0.00, 1.0000,<br />

0.5263, 0.00, 0.4736,<br />

0.5652, 0.00, 0.4348,<br />

0.1351, 0.00, 0.8648,<br />

1.0000, 0.00, 0.0000,<br />

0.0000, 0.00, 1.0000,<br />

0.3188, 0.00, 0.6811,<br />

0.1512, 0.00, 0.8488,<br />

0.8500, 0.00, 0.1500,<br />

0.1000, 0.00, 0.9000,<br />

0.6250, 0.00, 0.3750,<br />

0.1250, 0.00, 0.8750,<br />

0.8750, 0.00, 0.1250,<br />

0.5694, 0.00, 0.4306,<br />

1.0000, 0.00, 0.0000,<br />

0.0238, 0.00, 0.9762,<br />

0.0000, 0.00, 1.0000,<br />

145


0.1982, 0.00, 0.8018,<br />

0.9927, 0.00, 0.0073,<br />

0.3077, 0.00, 0.6923,<br />

0.9162, 0.00, 0.0838,<br />

0.9000, 0.00, 0.1000,<br />

0.9255, 0.00, 0.0745,<br />

0.7857, 0.00, 0.2143,<br />

0.0000, 0.00, 1.0000,<br />

0.9123, 0.00, 0.0878,<br />

0.8244, 0.00, 0.1756,<br />

0.8545, 0.00, 0.1455,<br />

0.2800, 0.00, 0.7200,<br />

0.4242, 0.00, 0.5757,<br />

0.1667, 0.00, 0.8333,<br />

0.4000, 0.00, 0.6000,<br />

0.3846, 0.00, 0.6154,<br />

0.8092, 0.00, 0.1908,<br />

0.8621, 0.00, 0.1379,<br />

0.5455, 0.00, 0.4545,<br />

0.3750, 0.00, 0.6250,<br />

0.8587, 0.00, 0.1413,<br />

0.7267, 0.00, 0.2734,<br />

0.1429, 0.00, 0.8571,<br />

0.1789, 0.00, 0.8211,<br />

0.1416, 0.00, 0.8584,<br />

0.1111, 0.00, 0.8889,<br />

0.0812, 0.00, 0.9188,<br />

0.2051, 0.00, 0.7948,<br />

0.2340, 0.00, 0.7660}<br />

float bigrams[5][5] = {<br />

0.1016, 0.0000, 0.2397, 0.0024, 0.0115, /* QQ -- QA QT QI */<br />

0.0000, 0.0000, 0.0000, 0.0000, 0.0000, /* -- -- -- -- -- */<br />

0.1305, 0.0000, 0.1442, 0.0404, 0.1377, /* AQ -- AA AT AI */<br />

0.0226, 0.0000, 0.0202, 0.0000, 0.0000, /* TQ -- TA -- -- */<br />

0.1005, 0.0000, 0.0488, 0.0000, 0.0000} /* IQ -- IA -- -- */<br />

/**** end <strong>of</strong> probabilities for bi-stress model ie Q A ****/<br />

ma<strong>in</strong>()<br />

{<br />

<strong>in</strong>t i, j, k, l, pos, done, w, tustart, tuend, numberstates<br />

double value, bigvalue[NUMBEST]<br />

<strong>in</strong>t state[MAXWORDS], bigstate[NUMBEST][MAXWORDS], stress[MAXWORDS]<br />

float sentence[MAXWORDS][3]<br />

char c, word[MAXWORDS][30], wordtag[MAXWORDS][7]<br />

/* read <strong>in</strong> tu data from st<strong>and</strong>ard <strong>in</strong>put */<br />

if ((c = getc(std<strong>in</strong>)) == 'T')<br />

tustart = 3<br />

else<br />

tustart = 4<br />

if (c != 'I' && tustart == 4)<br />

pr<strong>in</strong>tf("Invalid tustart character. Assum<strong>in</strong>g I\n")<br />

146


if ((c = getc(std<strong>in</strong>)) == 'T')<br />

tuend = 3<br />

else<br />

tuend = 4<br />

if (c != 'I' && tuend == 4)<br />

pr<strong>in</strong>tf("Invalid tuend character. Assum<strong>in</strong>g I\n")<br />

scanf("%d", &w) /* get number <strong>of</strong> words */<br />

if (w < 0 || w > 40) {<br />

pr<strong>in</strong>tf("Number <strong>of</strong> words(%d) too many. Exit<strong>in</strong>g\n",<br />

w)<br />

exit(1)<br />

}<br />

for (i = 0 i < w i++) {<br />

scanf("%s %s", wordtag[i], word[i])<br />

/* f<strong>in</strong>d tag <strong>in</strong> tags <strong>and</strong> set sentence[w][0..2] appropriately */<br />

for (j = 0 j < NUMTAGS j++)<br />

if (!strcmp(tags[j], wordtag[i])) {/* found the right tag */<br />

for (k = 0 k < 3 k++)<br />

sentence[i][k] = probs[j][k]<br />

break<br />

}<br />

if (j == NUMTAGS) {<br />

pr<strong>in</strong>tf("Invalid tag(%s). Exit<strong>in</strong>g\n", wordtag[i])<br />

exit(1)<br />

}<br />

}<br />

/* number <strong>of</strong> possible states 3^w */<br />

numberstates = pow(3.0, (double)w)<br />

for (j = 0 j < NUMBEST j++)<br />

bigvalue[j] = 0.0 /* reset best values */<br />

for (j = 0 j < w j++)<br />

state[j] = 0 /* set up first state */<br />

/* pr<strong>in</strong>t out words <strong>and</strong> tags */<br />

for (k = 0, j = 0 j < w j++) {<br />

k += strlen(word[j]) + strlen(wordtag[j]) + 2<br />

#ifdef SINGLELINEOUTPUT<br />

if (k > 70) {<br />

pr<strong>in</strong>tf("\n")<br />

k = 0<br />

}<br />

#endif<br />

pr<strong>in</strong>tf("%s=%s ", word[j], wordtag[j])<br />

/* assume unstressed, unless otherwise set below */<br />

stress[j] = 0<br />

for (i = 0 i < strlen(word[j]) i++)<br />

if (word[j][i] == ',' || word[j][i] == '/'<br />

|| word[j][i] == '`' || word[j][i] == '\\'<br />

|| word[j][i] == '_' ||<br />

word[j][i] == '~' || word[j][i] == '*')<br />

stress[j] = 2<br />

}<br />

#ifdef SINGLELINEOUTPUT<br />

pr<strong>in</strong>tf("\n")<br />

#endif<br />

147


for (i = 0 i < numberstates i++) {<br />

for (value = 1, j = 0 j < w j++)<br />

value *= sentence[j][state[j]]<br />

for (j = 1 j < w j++)<br />

value *= bigrams[state[j-1]][state[j]]<br />

value *= bigrams[tustart][state[0]] * bigrams[state[w-1]][tuend]<br />

/* keep track <strong>of</strong> the top NUMBEST most probable sequences */<br />

for (j = 0 value < bigvalue[j] && j < NUMBEST<br />

j++)<br />

<br />

if (j < NUMBEST) {<br />

/* shuffle other values to make room */<br />

for (k = NUMBEST - 1 k > j k--) {<br />

bigvalue[k] = bigvalue[k-1]<br />

for (l = 0 l < w l++)<br />

bigstate[k][l] = bigstate[k-1][l]<br />

}<br />

bigvalue[j] = value<br />

for (k = 0 k < w k++)<br />

bigstate[j][k] = state[k]<br />

}<br />

}<br />

pos = 0 /* update state[] to next state */<br />

do {<br />

if (++state[pos] == 3) {<br />

state[pos] = 0<br />

pos++<br />

done = 0<br />

} else<br />

done = 1<br />

} while (pos < w && !done)<br />

for (i = 0 i < NUMBEST i++) {<br />

pr<strong>in</strong>tf("%g ", bigvalue[i])<br />

if (tustart == 3)<br />

pr<strong>in</strong>tf("T")<br />

else<br />

pr<strong>in</strong>tf("I")<br />

k = 0<br />

for (j = 0 j < w j++) {<br />

if (stress[j] != bigstate[i][j])<br />

k++<br />

switch (bigstate[i][j]) {<br />

case 0:<br />

pr<strong>in</strong>tf("Q")<br />

break<br />

case 1:<br />

pr<strong>in</strong>tf("S")<br />

break<br />

case 2:<br />

pr<strong>in</strong>tf("A")<br />

break<br />

}<br />

}<br />

148


}<br />

}<br />

if (tuend == 3)<br />

pr<strong>in</strong>tf("T ")<br />

else<br />

pr<strong>in</strong>tf("I ")<br />

if (k == 0)<br />

pr<strong>in</strong>tf("CORRECT\n")<br />

else<br />

pr<strong>in</strong>tf("ERROR(%d)\n", k)<br />

F.10 probability3.c

Probability3 is the original prosodic mark prediction model described in chapter 6.
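In the same notation as for probability.c (again reconstructed from the code rather than quoted from chapter 6): each word takes one of five prosodic states, rise (R), fall (F), fall-rise (V), stressed (S) or unstressed (U); tone-unit boundaries are pinned to a sixth state, and the remaining 5^{w-tub} assignments are scored exhaustively as

    score(s_1,...,s_w) = \prod_{j=1}^{w} P(s_j | t_j) \prod_{j=2}^{w} B(s_{j-1}, s_j)

with only the single best assignment kept.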

/* probability.c copyright Simon Arnfield 15th January 1993 */
/* probability3.c (version 4) copyright 25/5/93, 15/7/93 */
/* cc -o probability3 probability3.c -lm */

/* the three include file names were lost in extraction; those below are
   inferred from the calls used */
#include <stdio.h>
#include <string.h>
#include <math.h>

#define MAXWORDS 15
#define NUMTAGS 168

char tags[NUMTAGS][7] = {
    "&FO",   "&FW",   "APP$",  "AT",    "AT1",   "BTO21",
    "BTO22", "CC",    "CC31",  "CC32",  "CC33",  "CCB",
    "CF",    "CS",    "CS21",  "CS22",  "CSA",   "CSN",
    "CST",   "CSW",   "DA",    "DA1",   "DA2",   "DAR",
    "DAT",   "DB",    "DB2",   "DD",    "DD1",   "DD121",
    "DD122", "DD2",   "DD21",  "DD22",  "DD221", "DD222",
    "DDQ",   "DDQ$",  "DDQV",  "EX",    "ICS",   "IF",    "II",
    "II21",  "II22",  "II31",  "II32",  "II33",  "IO",
    "IW",    "JA",    "JB",    "JBR",   "JJ",    "JJR",
    "JJT",   "LE",    "MC",    "MC-MC", "MC1",   "MC2",
    "MD",    "MF",    "ND1",   "NN",    "NN1",   "NN121",
    "NN122", "NN2",   "NNJ",   "NNJ1",  "NNJ2",  "NNL1",
    "NNL2",  "NNO",   "NNO2",  "NNS",   "NNS1",  "NNS2",
    "NNSA1", "NNSB1", "NNT1",  "NNT2",  "NNU",   "NNU1",
    "NNU2",  "NNU21", "NNU22", "NP",    "NP1",   "NP2",
    "NPD1",  "NPM1",  "PN",    "PN1",   "PN121", "PN122",
    "PNQO",  "PNQS",  "PP$",   "PPH1",  "PPHO1", "PPHO2",
    "PPHS1", "PPHS2", "PPIO1", "PPIO2", "PPIS1", "PPIS2",
    "PPX1",  "PPX121","PPX122","PPX2",  "PPY",   "RA",
    "REX21", "REX22", "RG",    "RG21",  "RG22",  "RGA",
    "RGQ",   "RGQV",  "RGR",   "RGT",   "RL",    "RL21",
    "RL22",  "RP",    "RR",    "RR21",  "RR22",  "RR31",
    "RR32",  "RR33",  "RRQ",   "RRQV",  "RRR",   "RRT",
    "RT",    "TO",    "UH",    "VB0",   "VBDR",  "VBDZ",
    "VBG",   "VBM",   "VBN",   "VBR",   "VBZ",   "VD0",
    "VDD",   "VDG",   "VDN",   "VDZ",   "VH0",   "VHD",
    "VHG",   "VHN",   "VHZ",   "VM",    "VM21",  "VV0",
    "VVD",   "VVG",   "VVN",   "VVZ",   "XX"};

/**** probabilities for pent-stress model ie Rise Fall Vfallrise Str Ustr ****/
float probs[NUMTAGS][5] = {
    0.0001, 0.0001, 0.1443, 0.0357, 0.0001,
    0.1869, 0.5785, 0.2886, 0.3213, 0.1090,
    0.1869, 0.7954, 1.0101, 0.5355, 6.6042,
    0.0001, 1.3015, 0.4329, 2.2135, 43.9407,
    0.1869, 0.0723, 0.1443, 0.5712, 16.1072,
    0.0001, 0.0001, 0.0001, 0.0714, 0.0436,
    0.0001, 0.0001, 0.0001, 0.0357, 0.0654,
    1.1215, 0.9400, 0.8658, 2.7133, 14.8867,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0218,
    0.0001, 0.0723, 0.0001, 0.0001, 0.0001,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0218,
    0.3738, 0.0001, 0.1443, 0.9282, 3.0514,
    0.7477, 0.0723, 0.2886, 0.2856, 0.4141,
    0.7477, 0.8677, 0.0001, 1.7851, 2.0052,
    0.0001, 0.2892, 0.1443, 0.2856, 0.1090,
    0.0001, 0.0723, 0.1443, 0.0714, 0.3051,
    0.1869, 0.2892, 0.0001, 0.5712, 1.5693,
    0.0001, 0.0001, 0.0001, 0.0001, 0.5231,
    0.0001, 0.0001, 0.0001, 0.2856, 5.6670,
    0.1869, 0.2892, 0.0001, 0.2856, 0.0218,
    0.1869, 0.2169, 0.5772, 0.7497, 0.3923,
    0.1869, 0.1446, 0.2886, 0.5355, 0.1308,
    0.5607, 0.4338, 1.0101, 0.5712, 0.2180,
    0.3738, 0.6508, 0.4329, 0.6069, 0.2398,
    0.3738, 0.1446, 1.2987, 0.1428, 0.0218,
    0.1869, 1.9523, 2.3088, 1.3567, 0.1744,
    0.0001, 0.0723, 0.8658, 0.2142, 0.0436,
    0.0001, 1.0123, 2.4531, 1.5352, 0.5013,
    1.1215, 3.0369, 4.6176, 3.2845, 2.4629,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0654,
    0.0001, 0.0723, 0.1443, 0.0357, 0.0001,
    0.9346, 0.2892, 0.5772, 1.1782, 0.6321,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0872,
    0.0001, 0.0723, 0.0001, 0.0357, 0.0436,
    0.0001, 0.0001, 0.0001, 0.0001, 0.2616,
    0.0001, 0.2169, 0.1443, 0.1785, 0.0654,
    0.5607, 0.7954, 0.1443, 1.4281, 2.3104,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0654,
    0.0001, 0.2169, 0.1443, 0.0001, 0.0001,
    0.0001, 0.0001, 0.0001, 0.3213, 1.7219,
    0.1869, 0.7231, 0.7215, 1.4281, 1.4167,
    0.1869, 0.0723, 0.0001, 0.4284, 5.5362,
    1.4953, 2.6030, 3.7518, 6.7119, 40.2572,
    0.1869, 0.6508, 1.0101, 0.9639, 0.5667,
    0.0001, 0.0001, 0.0001, 0.0357, 1.5257,
    0.0001, 0.0001, 0.0001, 0.0001, 0.2398,
    0.0001, 0.2169, 0.1443, 0.2142, 0.0218,
    0.0001, 0.0001, 0.0001, 0.0001, 0.2398,
    0.0001, 0.0723, 0.0001, 0.2499, 19.5292,
    0.0001, 0.2892, 0.4329, 0.6426, 3.4220,
    0.1869, 0.2892, 0.0001, 0.0714, 0.0001,
    1.3084, 1.8800, 3.0303, 3.0703, 0.5231,
    0.0001, 0.0001, 0.1443, 0.0357, 0.0001,
    15.5140, 25.3073, 30.8802, 32.4884, 3.1604,
    0.3738, 0.5061, 1.7316, 0.4641, 0.0654,
    0.3738, 1.0123, 2.0202, 0.5355, 0.1308,
    0.0001, 0.2169, 0.1443, 0.2856, 0.0872,
    2.6168, 7.2307, 5.6277, 8.7112, 1.0898,
    0.0001, 0.0723, 0.0001, 0.0714, 0.0218,
    0.9346, 0.8677, 1.7316, 1.8208, 0.3923,
    0.5607, 0.2892, 0.2886, 0.0714, 0.0001,
    0.5607, 1.5907, 4.1847, 2.4634, 0.6321,
    0.5607, 1.0846, 0.1443, 0.9639, 0.5231,
    0.1869, 0.6508, 0.1443, 0.3213, 0.0654,
    3.1776, 2.6030, 1.2987, 2.9275, 0.6103,
    85.7944, 71.0774, 67.0996, 51.5173, 5.9939,
    0.0001, 0.0723, 0.0001, 0.0357, 0.0001,
    0.0001, 0.0723, 0.0001, 0.0357, 0.0001,
    34.7664, 29.5011, 24.3867, 22.4206, 3.0732,
    2.2430, 1.8800, 2.1645, 1.6066, 0.4141,
    0.5607, 0.3615, 0.1443, 0.2499, 0.0218,
    0.5607, 0.3615, 0.4329, 0.4998, 0.0436,
    3.5514, 3.1092, 1.4430, 1.7851, 0.4141,
    0.0001, 0.3615, 0.1443, 0.3570, 0.0436,
    0.5607, 0.7954, 1.4430, 0.8925, 0.5667,
    0.0001, 0.2169, 0.0001, 0.0357, 0.0001,
    0.1869, 0.0001, 0.0001, 0.0001, 0.0001,
    1.1215, 0.7954, 1.0101, 1.9636, 0.2833,
    0.3738, 0.2169, 0.5772, 0.0714, 0.0218,
    0.1869, 0.0001, 0.0001, 0.0001, 0.0001,
    0.3738, 0.0723, 0.0001, 0.5712, 1.3078,
    3.3645, 3.6876, 2.3088, 3.2131, 0.7629,
    2.4299, 2.4584, 1.1544, 0.8211, 0.3051,
    0.1869, 0.0723, 0.1443, 0.1428, 0.0654,
    0.7477, 0.2892, 0.4329, 0.1428, 0.0436,
    0.5607, 1.5907, 0.5772, 0.5712, 0.0872,
    0.0001, 0.0001, 0.0001, 0.0357, 0.5885,
    0.0001, 0.7231, 0.4329, 0.3213, 0.1090,
    0.0001, 0.0723, 0.0001, 0.1071, 0.0872,
    26.7290, 23.1381, 27.7056, 19.4216, 2.5501,
    0.0001, 0.1446, 0.0001, 0.0001, 0.0001,
    0.0001, 0.3615, 0.0001, 0.1428, 0.0001,
    0.1869, 1.1569, 1.1544, 0.1785, 0.0436,
    0.0001, 0.2169, 0.0001, 0.0357, 0.0001,
    0.7477, 0.5061, 1.1544, 0.4998, 0.1090,
    0.0001, 0.0001, 0.1443, 0.0357, 0.0001,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0436,
    0.0001, 0.0001, 0.0001, 0.0357, 0.0001,
    0.1869, 0.0723, 0.0001, 0.2499, 1.7001,
    0.0001, 0.0723, 0.0001, 0.0001, 0.1090,
    0.0001, 0.1446, 0.0001, 0.5712, 5.3400,
    0.1869, 0.0723, 0.0001, 0.1071, 0.2616,
    0.0001, 0.0001, 0.1443, 0.0357, 0.8718,
    0.0001, 0.4338, 0.1443, 0.3213, 2.5283,
    0.1869, 0.3615, 1.0101, 0.3927, 2.1142,
    0.0001, 0.0723, 0.0001, 0.0714, 0.1526,
    0.0001, 0.0001, 0.0001, 0.0001, 0.5013,
    0.0001, 0.0723, 0.4329, 0.3213, 1.0026,
    0.0001, 0.0001, 0.1443, 0.3213, 2.0924,
    0.1869, 0.7954, 1.1544, 0.3570, 0.0218,
    0.0001, 0.0001, 0.0001, 0.0714, 0.0001,
    0.0001, 0.0723, 0.0001, 0.0357, 0.0001,
    0.1869, 0.0723, 0.7215, 0.1071, 0.0436,
    0.0001, 0.0001, 0.1443, 0.0714, 1.4821,
    0.3738, 0.5061, 0.4329, 0.3570, 0.1308,
    0.0001, 0.0001, 0.0001, 0.0357, 0.6539,
    0.7477, 0.1446, 0.1443, 0.8211, 0.0218,
    0.1869, 1.0123, 0.7215, 1.6780, 1.1334,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0436,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0436,
    0.0001, 0.2169, 0.0001, 0.1071, 0.0218,
    0.1869, 0.0001, 0.0001, 0.1071, 0.0872,
    0.0001, 0.0001, 0.0001, 0.0357, 0.0001,
    0.0001, 0.0723, 0.4329, 0.4998, 0.4359,
    0.1869, 0.0723, 0.1443, 0.2499, 0.2833,
    1.6822, 2.0969, 1.0101, 1.8208, 0.3269,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0654,
    0.1869, 0.0723, 0.1443, 0.0001, 0.0001,
    3.3645, 4.1215, 1.8759, 1.8922, 1.4385,
    8.5981, 13.3767, 14.2857, 11.6744, 2.5501,
    0.0001, 0.0001, 0.0001, 0.3213, 1.1116,
    1.4953, 1.2292, 0.8658, 0.8211, 0.1308,
    0.0001, 0.0001, 0.0001, 0.1071, 0.1090,
    0.0001, 0.4338, 0.0001, 0.0357, 0.0218,
    0.0001, 0.0001, 0.0001, 0.0357, 0.1526,
    0.0001, 0.7954, 0.2886, 0.6426, 0.8936,
    0.0001, 0.0001, 0.0001, 0.0001, 0.0218,
    0.5607, 0.7231, 1.0101, 0.7497, 0.0218,
    0.0001, 0.0001, 0.0001, 0.0357, 0.0001,
    2.6168, 1.3015, 1.7316, 1.6066, 0.4795,
    0.0001, 0.0001, 0.0001, 0.1071, 8.9364,
    0.9346, 0.2169, 0.0001, 0.3570, 0.1744,
    0.1869, 0.2169, 0.0001, 0.3570, 3.3348,
    0.0001, 0.1446, 0.1443, 0.2499, 1.9616,
    0.0001, 0.1446, 0.1443, 0.6426, 5.6888,
    0.0001, 0.0001, 0.0001, 0.2142, 0.4795,
    0.0001, 0.0723, 0.0001, 0.0001, 0.0001,
    0.1869, 0.0723, 0.0001, 0.2856, 2.2668,
    0.0001, 0.6508, 0.2886, 0.4284, 2.3540,
    0.5607, 0.7954, 1.1544, 0.6069, 4.9913,
    0.0001, 0.4338, 1.1544, 0.1428, 0.1526,
    0.1869, 0.4338, 0.0001, 0.4284, 0.3051,
    0.0001, 0.0001, 0.1443, 0.1428, 0.0218,
    0.0001, 0.0723, 0.0001, 0.0714, 0.0436,
    0.0001, 0.1446, 0.0001, 0.2142, 0.1090,
    0.1869, 0.2169, 0.7215, 0.7140, 2.6809,
    0.0001, 0.1446, 0.0001, 0.3570, 1.6347,
    0.0001, 0.0001, 0.0001, 0.1785, 0.1308,
    0.0001, 0.1446, 0.0001, 0.1071, 0.0654,
    0.1869, 0.2169, 0.1443, 0.2856, 1.7219,
    0.3738, 1.8077, 1.4430, 1.6066, 4.7515,
    0.0001, 0.1446, 0.0001, 0.1428, 0.0218,
    13.8318, 10.9183, 8.0808, 13.5666, 3.1386,
    4.2991, 5.3507, 4.3290, 9.9607, 1.4603,
    4.4860, 4.3384, 3.6075, 8.1042, 0.9154,
    14.0187, 12.2198, 10.3896, 12.9597, 1.3078,
    2.4299, 2.0969, 0.7215, 3.8558, 0.8718,
    0.1869, 1.8077, 1.1544, 1.3567, 0.4795};

float bigram[6][6] = {
    0.0000, 0.0040, 0.0148, 0.0047, 0.0372, 0.1060,  /* TUB */
    0.1313, 0.0022, 0.0025, 0.0013, 0.0082, 0.0212,  /* rise */
    0.0903, 0.0008, 0.0057, 0.0040, 0.0200, 0.0458,  /* fall */
    0.0685, 0.0007, 0.0043, 0.0015, 0.0413, 0.0503,  /* V fallrise */
    0.0405, 0.0102, 0.0257, 0.0072, 0.0262, 0.0570,  /* stress */
    0.0095, 0.0080, 0.0318, 0.0098, 0.0530, 0.0545}; /* unstress */

/***
 * (commented out in the original:)
 * 0.000, 0.024, 0.089, 0.028, 0.223, 0.636,    TUB
 * 0.788, 0.013, 0.015, 0.008, 0.049, 0.127,    Rise
 * 0.542, 0.005, 0.034, 0.024, 0.120, 0.275,    Fall
 * 0.411, 0.004, 0.026, 0.009, 0.248, 0.302,    V fallrise
 * 0.243, 0.061, 0.154, 0.043, 0.157, 0.342,    Stress
 * 0.057, 0.048, 0.191, 0.059, 0.318, 0.327     Unstress
 *   TUB   Rise   Fall   V     Str    Ustr
 ***/

main()
{
    int i, j, k, pos, done, w, numberstates, s, tub = 0;
    double value, bigvalue, v1, v2;
    short int state[MAXWORDS], bigstate[MAXWORDS];
    float sentence[MAXWORDS][6];
    char wordtag[MAXWORDS][7], prosody[MAXWORDS][24];

    /**printf("\nNumber of tags >")**/
    scanf("%d", &w);    /* get number of words */
    if (w < 0 || w > MAXWORDS) {
        /*printf("Number of words(%d) not suitable. Exiting\n",w)*/
        exit(0);
    }
    for (j = 0; j < w; j++)
        state[j] = 1;    /* set up first state; range from 1...1 to 5...5 */
    bigvalue = 0.0;      /* reset best values */
    /**printf("Expecting %d tags (and or tone unit boundaries)\n",w)**/
    for (i = 0; i < w; i++) {
        /**printf("Tag & word %d(of %d)>",i+1,w)**/
        scanf("%s %s", wordtag[i], prosody[i]);
        for (j = 0; j < NUMTAGS; j++)
            if (!strcmp(tags[j], wordtag[i])) {    /* found the right tag */
                sentence[i][0] = 0.0;              /* TUB */
                sentence[i][1] = probs[j][0];      /* RISE */
                sentence[i][2] = probs[j][1];      /* FALL */
                sentence[i][3] = probs[j][2];      /* FALL-RISE */
                sentence[i][4] = probs[j][3];      /* STRESSED */
                sentence[i][5] = probs[j][4];      /* UNSTRESSED */
                break;
            }
        if (!strcmp("{CP}", wordtag[i])) {    /* ie it is a compound */
            /*printf("A Compound - assuming probably stressed\n")*/
            sentence[i][0] = 0.0;
            sentence[i][1] = 0.25;    /* treat as almost */
            sentence[i][2] = 0.25;
            sentence[i][3] = 0.1;
            sentence[i][4] = 0.399;   /* always stressed */
            sentence[i][5] = 0.001;
        } else if (j == NUMTAGS) {
            /* the "else" is an assumption: brace placement was lost at a
               page break, and without it a {CP} compound (for which
               j == NUMTAGS also holds) would fall through into this TUB
               branch */
            /*printf("Assuming %s is a TUB - restricting search space\n",
                prosody[i])*/
            tub++;
            sentence[i][0] = 1.0;    /* a tu is always a tu */
            sentence[i][1] = 0.0;
            sentence[i][2] = 0.0;
            sentence[i][3] = 0.0;
            sentence[i][4] = 0.0;
            sentence[i][5] = 0.0;
            state[i] = 0;            /* prevent state changing a tu */
        }
    }
    /* number of possible states 5^w */
    numberstates = pow(5.0, (double)(w - tub));
    /* but TUBs don't alter hence reduced search space */
    /**printf("Processing %d states\n",numberstates)**/
    for (i = 0; i < numberstates; i++) {
        value = 1.0;
        v1 = 1.0;
        v2 = 1.0;
        for (j = 0; j < w; j++)
            v1 *= (double)sentence[j][state[j]];
        for (j = 1; j < w; j++)
            v2 *= (double)bigram[state[j-1]][state[j]];
        value = v1 * v2;
        if (value > bigvalue) {
            /**printf("BEST SO FAR:")**/
            bigvalue = value;
            for (k = 0; k < w; k++) {
                bigstate[k] = state[k];    /* save new best state */
                /**switch(state[k]) {
                   case 0: printf("|"); break;
                   case 1: printf("R"); break;
                   case 2: printf("F"); break;
                   case 3: printf("V"); break;
                   case 4: printf("S"); break;
                   case 5: printf("U");
                   }**/
            }
            /**printf(" %e\n",value)**/
        }
        pos = w - 1;    /* update state[] to next state */
        do {
            done = 1;    /* by default have done unless changed below */
            if (state[pos] == 0)
                pos--;    /* don't change tub state */
            if (pos >= 0) {                  /* don't go past end */
                if (++state[pos] == 6) {     /* if state increases to 6 */
                    state[pos] = 1;          /* reset to 1 */
                    pos--;                   /* and point to next word */
                    done = 0;                /* and say we haven't done */
                }
            }
        } while (pos >= 0 && !done);
    }
    for (i = 0; i < w; i++)
        printf("%s=%s ", wordtag[i], prosody[i]);
    printf("\nPredicted: ");
    for (i = 0; i < w; i++) {
        switch (bigstate[i]) {
        case 0: printf("|"); break;
        case 1: printf("R"); break;
        case 2: printf("F"); break;
        case 3: printf("V"); break;
        case 4: printf("S"); break;
        case 5: printf("U");
        }
    }
    printf("\nShould Be: ");
    for (i = 0; i < w; i++) {
        if (bigstate[i] == 0)
            s = 0;    /* tu boundary */
        else
            for (s = 5, j = 0; j < strlen(prosody[i]); j++) {
                if ((prosody[i][j] == ',' || prosody[i][j] == '/') && s >= 4)
                    s = 1;
                if ((prosody[i][j] == '\\' || prosody[i][j] == '`') && s >= 4)
                    s = 2;
                if ((prosody[i][j] == ',' || prosody[i][j] == '/') && s == 2)
                    s = 3;
                if ((prosody[i][j] == '*' || prosody[i][j] == '_'
                        || prosody[i][j] == '~') && s == 5)
                    s = 4;
                /* one more mark test stood here; its character was lost in
                   extraction ('?' below is only a placeholder):
                   if ((prosody[i][j] == '?') && s == 5) s = 4;  */
            }
        switch (s) {
        case 0: printf("|"); break;
        case 1: printf("R"); break;
        case 2: printf("F"); break;
        case 3: printf("V"); break;
        case 4: printf("S"); break;
        case 5: printf("U");
        }
    }
    printf("\n");
}

F.11 probabilityc.c

Probabilityc is the composite model for prosodic mark prediction. See chapters 5 and 6.
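In outline (a summary reconstructed from the listing, not text quoted from the chapters): the program first searches the 2^w stressed/unstressed sequences, scoring each by the per-tag probabilities prb2U/prb2S together with the group transition tables transUU, transUS, transSU and transSS and, for longer sentences, the stress trigram table, and keeps the NUMBEST best sequences. Each kept sequence is then refined: boundary and unstressed positions are held fixed while every stressed position is re-searched over rise/fall/fall-rise/stress using the five-state tag probabilities and the bigram tables, and the single highest-scoring annotation overall is printed.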

/* probabilityc.c copyright Simon Arnfield 15th January 1993 */
/* multistate-probabilities.c (version 4) copyright 25/5/93 */
/* probability3.c (version 4) copyright 25/5/93, 15/7/93 */
/* probabilityc.c (composite model) copyright 2/8/93 */
/* cc -o probabilityc probabilityc.c -lm */

/* the three include file names were lost in extraction; those below are
   inferred from the calls used */
#include <stdio.h>
#include <string.h>
#include <math.h>

#define MAXWORDS 20
#define NUMTAGS 187
#define NUMBEST 2

struct tagtype {
    char tagname[7];
    int group;
    float prb2U, prb2S;
    float prb5R, prb5F, prb5V, prb5S, prb5U;
};

struct tagtype tags[187] = {
    "&FO", 6, 0.0025, 0.9975, 0.0001, 0.0001, 0.1443, 0.0357, 0.0001,
    "&FW", 1, 0.2000, 0.8000, 0.1869, 0.5785, 0.2886, 0.3213, 0.1090, /* not sure about group */
    "APP$", 1, 0.8991, 0.1009, 0.1869, 0.7954, 1.0101, 0.5355, 6.6042,
    "AT", 3, 0.9605, 0.0395, 0.0001, 1.3015, 0.4329, 2.2135, 43.9407,
    "AT1", 3, 0.9749, 0.0250, 0.1869, 0.0723, 0.1443, 0.5712, 16.1072,
    "BTO", 3, 0.5000, 0.5000, 0.0001, 0.0001, 0.0001, 0.1000, 0.1000, /* no data */
    "BTO21", 3, 0.5000, 0.5000, 0.0001, 0.0001, 0.0001, 0.0714, 0.0436,
    "BTO22", 3, 0.7500, 0.2500, 0.0001, 0.0001, 0.0001, 0.0357, 0.0654,
    "CC", 1, 0.8712, 0.1288, 1.1215, 0.9400, 0.8658, 2.7133, 14.8867,
    "CC31", 1, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.0218,
    "CC32", 1, 0.0025, 0.9975, 0.0001, 0.0723, 0.0001, 0.0001, 0.0001,
    "CC33", 1, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.0218,
    "CCB", 4, 0.8284, 0.1716, 0.3738, 0.0001, 0.1443, 0.9282, 3.0514,
    "CF", 4, 0.5588, 0.4412, 0.7477, 0.0723, 0.2886, 0.2856, 0.4141,
    "CS", 4, 0.5823, 0.4177, 0.7477, 0.8677, 0.0001, 1.7851, 2.0052,
    "CS21", 4, 0.2778, 0.7222, 0.0001, 0.2892, 0.1443, 0.2856, 0.1090,
    "CS22", 4, 0.7778, 0.2223, 0.0001, 0.0723, 0.1443, 0.0714, 0.3051,
    "CSA", 4, 0.7742, 0.2258, 0.1869, 0.2892, 0.0001, 0.5712, 1.5693,
    "CSN", 3, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.5231,
    "CST", 3, 0.9701, 0.0299, 0.0001, 0.0001, 0.0001, 0.2856, 5.6670,
    "CSW", 4, 0.0714, 0.9286, 0.1869, 0.2892, 0.0001, 0.2856, 0.0218,
    "DA", 5, 0.3830, 0.6170, 0.1869, 0.2169, 0.5772, 0.7497, 0.3923,
    "DA1", 5, 0.2308, 0.7692, 0.1869, 0.1446, 0.2886, 0.5355, 0.1308,
    "DA2", 5, 0.2381, 0.7619, 0.5607, 0.4338, 1.0101, 0.5712, 0.2180,
    "DA2R", 5, 0.5000, 0.5000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, /* no data */
    "DAR", 5, 0.2619, 0.7381, 0.3738, 0.6508, 0.4329, 0.6069, 0.2398,
    "DAT", 5, 0.0556, 0.9445, 0.3738, 0.1446, 1.2987, 0.1428, 0.0218,
    "DB", 5, 0.0889, 0.9112, 0.1869, 1.9523, 2.3088, 1.3567, 0.1744,
    "DB2", 5, 0.1333, 0.8666, 0.0001, 0.0723, 0.8658, 0.2142, 0.0436,
    "DD", 6, 0.2371, 0.7629, 0.0001, 1.0123, 2.4531, 1.5352, 0.5013,
    "DD1", 9, 0.3965, 0.6035, 1.1215, 3.0369, 4.6176, 3.2845, 2.4629,
    "DD121", 9, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.0654,
    "DD122", 9, 0.0025, 0.9975, 0.0001, 0.0723, 0.1443, 0.0357, 0.0001,
    "DD2", 9, 0.3867, 0.6133, 0.9346, 0.2892, 0.5772, 1.1782, 0.6321,
    "DD21", 9, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.0872,
    "DD22", 9, 0.5000, 0.5000, 0.0001, 0.0723, 0.0001, 0.0357, 0.0436,
    "DD221", 9, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.2616,
    "DD222", 9, 0.2500, 0.7500, 0.0001, 0.2169, 0.1443, 0.1785, 0.0654,
    "DDQ", 4, 0.6584, 0.3416, 0.5607, 0.7954, 0.1443, 1.4281, 2.3104,
    "DDQ$", 4, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.0654,
    "DDQV", 4, 0.0025, 0.9975, 0.0001, 0.2169, 0.1443, 0.0001, 0.0001,
    "EX", 2, 0.8977, 0.1023, 0.0001, 0.0001, 0.0001, 0.3213, 1.7219,
    "ICS", 4, 0.5372, 0.4628, 0.1869, 0.7231, 0.7215, 1.4281, 1.4167,
    "IF", 2, 0.9478, 0.0522, 0.1869, 0.0723, 0.0001, 0.4284, 5.5362,
    "II", 1, 0.8774, 0.1226, 1.4953, 2.6030, 3.7518, 6.7119, 40.2572,
    "II21", 9, 0.3714, 0.6286, 0.1869, 0.6508, 1.0101, 0.9639, 0.5667,
    "II22", 3, 0.9859, 0.0141, 0.0001, 0.0001, 0.0001, 0.0357, 1.5257,
    "II31", 3, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.2398,
    "II32", 9, 0.0909, 0.9091, 0.0001, 0.2169, 0.1443, 0.2142, 0.0218,
    "II33", 3, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.2398,
    "IO", 3, 0.9912, 0.0088, 0.0001, 0.0723, 0.0001, 0.2499, 19.5292,
    "IW", 1, 0.8626, 0.1373, 0.0001, 0.2892, 0.4329, 0.6426, 3.4220,
    "JA", 5, 0.0025, 0.9975, 0.1869, 0.2892, 0.0001, 0.0714, 0.0001,
    "JB", 5, 0.1463, 0.8537, 1.3084, 1.8800, 3.0303, 3.0703, 0.5231,
    "JBR", 5, 0.0025, 0.9975, 0.0001, 0.0001, 0.1443, 0.0357, 0.0001,
    "JJ", 5, 0.0852, 0.9148, 15.5140, 25.3073, 30.8802, 32.4884, 3.1604,
    "JJR", 5, 0.0811, 0.9189, 0.3738, 0.5061, 1.7316, 0.4641, 0.0654,
    "JJT", 5, 0.1176, 0.8823, 0.3738, 1.0123, 2.0202, 0.5355, 0.1308,
    "LE", 4, 0.2500, 0.7500, 0.0001, 0.2169, 0.1443, 0.2856, 0.0872,
    "MC", 6, 0.1119, 0.8881, 2.6168, 7.2307, 5.6277, 8.7112, 1.0898,
    "MC-MC", 6, 0.2500, 0.7500, 0.0001, 0.0723, 0.0001, 0.0714, 0.0218,
    "MC1", 6, 0.1837, 0.8164, 0.9346, 0.8677, 1.7316, 1.8208, 0.3923,
    "MC2", 6, 0.0025, 0.9975, 0.5607, 0.2892, 0.2886, 0.0714, 0.0001,
    "MD", 6, 0.1908, 0.8092, 0.5607, 1.5907, 4.1847, 2.4634, 0.6321,
    "MF", 9, 0.3429, 0.6572, 0.5607, 1.0846, 0.1443, 0.9639, 0.5231,
    "ND1", 8, 0.1304, 0.8696, 0.1869, 0.6508, 0.1443, 0.3213, 0.0654,
    "NN", 5, 0.1628, 0.8372, 3.1776, 2.6030, 1.2987, 2.9275, 0.6103,
    "NN1", 8, 0.0759, 0.9241, 85.7944, 71.0774, 67.0996, 51.5173, 5.9939,
    "NN121", 8, 0.0025, 0.9975, 0.0001, 0.0723, 0.0001, 0.0357, 0.0001,
    "NN122", 8, 0.0025, 0.9975, 0.0001, 0.0723, 0.0001, 0.0357, 0.0001,
    "NN2", 8, 0.0920, 0.9080, 34.7664, 29.5011, 24.3867, 22.4206, 3.0732,
    "NNJ", 8, 0.1624, 0.8376, 2.2430, 1.8800, 2.1645, 1.6066, 0.4141,
    "NNJ1", 8, 0.0588, 0.9412, 0.5607, 0.3615, 0.1443, 0.2499, 0.0218,
    "NNJ2", 8, 0.0741, 0.9260, 0.5607, 0.3615, 0.4329, 0.4998, 0.0436,
    "NNL1", 8, 0.1348, 0.8653, 3.5514, 3.1092, 1.4430, 1.7851, 0.4141,
    "NNL2", 8, 0.1111, 0.8889, 0.0001, 0.3615, 0.1443, 0.3570, 0.0436,
    "NNO", 9, 0.3467, 0.6534, 0.5607, 0.7954, 1.4430, 0.8925, 0.5667,
    "NNO2", 9, 0.0025, 0.9975, 0.0001, 0.2169, 0.0001, 0.0357, 0.0001,
    "NNS", 7, 0.0025, 0.9975, 0.1869, 0.0001, 0.0001, 0.0001, 0.0001,
    "NNS1", 7, 0.1413, 0.8587, 1.1215, 0.7954, 1.0101, 1.9636, 0.2833,
    "NNS2", 7, 0.0833, 0.9167, 0.3738, 0.2169, 0.5772, 0.0714, 0.0218,
    "NNSA1", 4, 0.0025, 0.9975, 0.1869, 0.0001, 0.0001, 0.0001, 0.0001,
    "NNSB1", 4, 0.7595, 0.2405, 0.3738, 0.0723, 0.0001, 0.5712, 1.3078,
    "NNT1", 8, 0.1667, 0.8333, 3.3645, 3.6876, 2.3088, 3.2131, 0.7629,
    "NNT2", 8, 0.1522, 0.8478, 2.4299, 2.4584, 1.1544, 0.8211, 0.3051,
    "NNU", 9, 0.3000, 0.7000, 0.1869, 0.0723, 0.1443, 0.1428, 0.0654,
    "NNU1", 9, 0.1176, 0.8824, 0.7477, 0.2892, 0.4329, 0.1428, 0.0436,
    "NNU2", 9, 0.0816, 0.9184, 0.5607, 1.5907, 0.5772, 0.5712, 0.0872,
    "NNU21", 9, 0.9643, 0.0357, 0.0001, 0.0001, 0.0001, 0.0357, 0.5885,
    "NNU22", 9, 0.1852, 0.8148, 0.0001, 0.7231, 0.4329, 0.3213, 0.1090,
    "NP", 5, 0.5000, 0.5000, 0.0001, 0.0723, 0.0001, 0.1071, 0.0872,
    "NP1", 5, 0.0889, 0.9111, 26.7290, 23.1381, 27.7056, 19.4216, 2.5501,
    "NP2", 5, 0.0025, 0.9975, 0.0001, 0.1446, 0.0001, 0.0001, 0.0001,
    "NPD1", 9, 0.0025, 0.9975, 0.0001, 0.3615, 0.0001, 0.1428, 0.0001,
    "NPM1", 9, 0.0625, 0.9376, 0.1869, 1.1569, 1.1544, 0.1785, 0.0436,
    "PN", 1, 0.0025, 0.9975, 0.0001, 0.2169, 0.0001, 0.0357, 0.0001, /* not sure about group */
    "PN1", 1, 0.1316, 0.8685, 0.7477, 0.5061, 1.1544, 0.4998, 0.1090, /* not sure about group */
    "PN121", 1, 0.0025, 0.9975, 0.0001, 0.0001, 0.1443, 0.0357, 0.0001, /* not sure about group */
    "PN122", 1, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.0436, /* not sure about group */
    "PNQO", 2, 0.0025, 0.9975, 0.0001, 0.0001, 0.0001, 0.0357, 0.0001,
    "PNQS", 2, 0.8966, 0.1035, 0.1869, 0.0723, 0.0001, 0.2499, 1.7001,
    "PP$", 1, 0.8333, 0.1667, 0.0001, 0.0723, 0.0001, 0.0001, 0.1090,
    "PPH1", 2, 0.9316, 0.0684, 0.0001, 0.1446, 0.0001, 0.5712, 5.3400,
    "PPHO1", 1, 0.7059, 0.2941, 0.1869, 0.0723, 0.0001, 0.1071, 0.2616,
    "PPHO2", 4, 0.9524, 0.0476, 0.0001, 0.0001, 0.1443, 0.0357, 0.8718,
    "PPHS1", 1, 0.8788, 0.1212, 0.0001, 0.4338, 0.1443, 0.3213, 2.5283,
    "PPHS2", 4, 0.8017, 0.1984, 0.1869, 0.3615, 1.0101, 0.3927, 2.1142,
    "PPIO1", 2, 0.7000, 0.3000, 0.0001, 0.0723, 0.0001, 0.0714, 0.1526,
    "PPIO2", 2, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.5013,
    "PPIS1", 2, 0.7797, 0.2203, 0.0001, 0.0723, 0.4329, 0.3213, 1.0026,
    "PPIS2", 2, 0.9057, 0.0944, 0.0001, 0.0001, 0.1443, 0.3213, 2.0924,
    "PPX1", 8, 0.0323, 0.9677, 0.1869, 0.7954, 1.1544, 0.3570, 0.0218,
    "PPX121", 8, 0.0025, 0.9975, 0.0001, 0.0001, 0.0001, 0.0714, 0.0001,
    "PPX122", 8, 0.0025, 0.9975, 0.0001, 0.0723, 0.0001, 0.0357, 0.0001,
    "PPX2", 8, 0.1667, 0.8333, 0.1869, 0.0723, 0.7215, 0.1071, 0.0436,
    "PPX221", 8, 0.5000, 0.5000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, /* no data */
    "PPX222", 8, 0.5000, 0.5000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, /* no data */
    "PPY", 3, 0.9577, 0.0423, 0.0001, 0.0001, 0.1443, 0.0714, 1.4821,
    "RA", 5, 0.2143, 0.7857, 0.3738, 0.5061, 0.4329, 0.3570, 0.1308,
    "REX", 9, 0.5000, 0.5000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, /* no data */
    "REX21", 9, 0.9677, 0.0323, 0.0001, 0.0001, 0.0001, 0.0357, 0.6539,
    "REX22", 9, 0.0323, 0.9678, 0.7477, 0.1446, 0.1443, 0.8211, 0.0218,
    "RG", 9, 0.4370, 0.5631, 0.1869, 1.0123, 0.7215, 1.6780, 1.1334,
    "RG21", 9, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.0436,
    "RG22", 9, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.0436,
    "RGA", 9, 0.1429, 0.8571, 0.0001, 0.2169, 0.0001, 0.1071, 0.0218,
    "RGQ", 9, 0.5000, 0.5000, 0.1869, 0.0001, 0.0001, 0.1071, 0.0872,
    "RGQV", 9, 0.0025, 0.9975, 0.0001, 0.0001, 0.0001, 0.0357, 0.0001,
    "RGR", 9, 0.5263, 0.4736, 0.0001, 0.0723, 0.4329, 0.4998, 0.4359,
    "RGT", 9, 0.5652, 0.4348, 0.1869, 0.0723, 0.1443, 0.2499, 0.2833,
    "RL", 5, 0.1351, 0.8648, 1.6822, 2.0969, 1.0101, 1.8208, 0.3269,
    "RL21", 5, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.0654,
    "RL22", 5, 0.0025, 0.9975, 0.1869, 0.0723, 0.1443, 0.0001, 0.0001,
    "RP", 9, 0.3188, 0.6811, 3.3645, 4.1215, 1.8759, 1.8922, 1.4385,
    "RR", 5, 0.1512, 0.8488, 8.5981, 13.3767, 14.2857, 11.6744, 2.5501,
    "RR21", 5, 0.8500, 0.1500, 0.0001, 0.0001, 0.0001, 0.3213, 1.1116,
    "RR22", 5, 0.1000, 0.9000, 1.4953, 1.2292, 0.8658, 0.8211, 0.1308,
    "RR31", 5, 0.6250, 0.3750, 0.0001, 0.0001, 0.0001, 0.1071, 0.1090,
    "RR32", 5, 0.1250, 0.8750, 0.0001, 0.4338, 0.0001, 0.0357, 0.0218,
    "RR33", 5, 0.8750, 0.1250, 0.0001, 0.0001, 0.0001, 0.0357, 0.1526,
    "RRQ", 5, 0.5694, 0.4306, 0.0001, 0.7954, 0.2886, 0.6426, 0.8936,
    "RRQV", 5, 0.9975, 0.0025, 0.0001, 0.0001, 0.0001, 0.0001, 0.0218,
    "RRR", 5, 0.0238, 0.9762, 0.5607, 0.7231, 1.0101, 0.7497, 0.0218,
    "RRT", 5, 0.0025, 0.9975, 0.0001, 0.0001, 0.0001, 0.0357, 0.0001,
    "RT", 5, 0.1982, 0.8018, 2.6168, 1.3015, 1.7316, 1.6066, 0.4795,
    "TO", 3, 0.9927, 0.0073, 0.0001, 0.0001, 0.0001, 0.1071, 8.9364,
    "UH", 1, 0.3077, 0.6923, 0.9346, 0.2169, 0.0001, 0.3570, 0.1744, /* not sure about group */
    "VB0", 2, 0.9162, 0.0838, 0.1869, 0.2169, 0.0001, 0.3570, 3.3348,
    "VBDR", 1, 0.9000, 0.1000, 0.0001, 0.1446, 0.1443, 0.2499, 1.9616,
    "VBDZ", 2, 0.9255, 0.0745, 0.0001, 0.1446, 0.1443, 0.6426, 5.6888,
    "VBG", 1, 0.7857, 0.2143, 0.0001, 0.0001, 0.0001, 0.2142, 0.4795,
    "VBM", 1, 0.0025, 0.9975, 0.0001, 0.0723, 0.0001, 0.0001, 0.0001,
    "VBN", 2, 0.9123, 0.0878, 0.1869, 0.0723, 0.0001, 0.2856, 2.2668,
    "VBR", 4, 0.8244, 0.1756, 0.0001, 0.6508, 0.2886, 0.4284, 2.3540,
    "VBZ", 1, 0.8545, 0.1455, 0.5607, 0.7954, 1.1544, 0.6069, 4.9913,
    "VD0", 1, 0.2800, 0.7200, 0.0001, 0.4338, 1.1544, 0.1428, 0.1526, /* maybe should be in group 2 */
    "VDD", 1, 0.4242, 0.5757, 0.1869, 0.4338, 0.0001, 0.4284, 0.3051,
    "VDG", 1, 0.1667, 0.8333, 0.0001, 0.0001, 0.1443, 0.1428, 0.0218,
    "VDN", 1, 0.4000, 0.6000, 0.0001, 0.0723, 0.0001, 0.0714, 0.0436, /* maybe should be in group 2 */
    "VDZ", 1, 0.3846, 0.6154, 0.0001, 0.1446, 0.0001, 0.2142, 0.1090,
    "VH0", 4, 0.8092, 0.1908, 0.1869, 0.2169, 0.7215, 0.7140, 2.6809,
    "VHD", 1, 0.8621, 0.1379, 0.0001, 0.1446, 0.0001, 0.3570, 1.6347,
    "VHG", 1, 0.5455, 0.4545, 0.0001, 0.0001, 0.0001, 0.1785, 0.1308,
    "VHN", 1, 0.3750, 0.6250, 0.0001, 0.1446, 0.0001, 0.1071, 0.0654, /* maybe should be in group 2 */
    "VHZ", 1, 0.8587, 0.1413, 0.1869, 0.2169, 0.1443, 0.2856, 1.7219,
    "VM", 4, 0.7267, 0.2734, 0.3738, 1.8077, 1.4430, 1.6066, 4.7515,
    "VM21", 4, 0.1429, 0.8571, 0.0001, 0.1446, 0.0001, 0.1428, 0.0218,
    "VM22", 4, 0.5000, 0.5000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, /* no data */
    "VMK", 4, 0.5000, 0.5000, 0.2000, 0.2000, 0.2000, 0.2000, 0.2000, /* no data */
    "VV0", 5, 0.1789, 0.8211, 13.8318, 10.9183, 8.0808, 13.5666, 3.1386,
    "VVD", 7, 0.1416, 0.8584, 4.2991, 5.3507, 4.3290, 9.9607, 1.4603,
    "VVG", 7, 0.1111, 0.8889, 4.4860, 4.3384, 3.6075, 8.1042, 0.9154,
    "VVN", 5, 0.0812, 0.9188, 14.0187, 12.2198, 10.3896, 12.9597, 1.3078,
    "VVZ", 7, 0.2051, 0.7948, 2.4299, 2.0969, 0.7215, 3.8558, 0.8718,
    "XX", 6, 0.2340, 0.7660, 0.1869, 1.8077, 1.1544, 1.3567, 0.4795,
    "ZZ1", 5, 0.0417, 0.9583, 0.1250, 0.4167, 0.0417, 0.3750, 0.0417,
    "{CP}", 8, 0.9000, 0.1000, 0.2500, 0.2500, 0.1000, 0.3000, 0.1000,
    ".", 0, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
    ":", 0, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
    "", 0, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, /* tag name lost in extraction */
    "!", 0, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
    "", 0, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, /* tag name lost in extraction */
    ",", 0, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
    "-", 0, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
    "(", 0, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
    ")", 0, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
    "'", 0, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000};

/* group-to-group transition weights: as used in main() below,
   trans<XY>[g1][g2] weights a word of group g1 in state X followed by a
   word of group g2 in state Y (U = unstressed, S = stressed) */
float transUU[10][10] = {
    1.0000, 0.8538, 0.9589, 0.9754, 0.7731, 0.2040, 0.1466, 0.1473, 0.0353, 0.4330,
    0.4907, 0.6492, 0.8084, 0.8437, 0.4118, 0.0866, 0.1250, 0.0495, 0.0232, 0.2447,
    0.8333, 0.7733, 0.9067, 0.8786, 0.6439, 0.0651, 0.1064, 0.1006, 0.0312, 0.1200,
    0.9767, 0.7451, 0.9583, 0.9808, 0.7921, 0.0829, 0.0811, 0.0390, 0.0314, 0.3193,
    0.7241, 0.5031, 0.6804, 0.5864, 0.5027, 0.0816, 0.1395, 0.0824, 0.0167, 0.0976,
    0.0432, 0.0646, 0.0773, 0.1107, 0.0561, 0.0185, 0.0093, 0.0200, 0.0044, 0.0675,
    0.0463, 0.0750, 0.1667, 0.1026, 0.3846, 0.1140, 0.0385, 0.0000, 0.0140, 0.0163,
    0.0614, 0.0711, 0.1250, 0.1126, 0.1333, 0.0000, 0.0256, 0.0000, 0.0106, 0.0631,
    0.0592, 0.0769, 0.1036, 0.0883, 0.0787, 0.0240, 0.0652, 0.0037, 0.0162, 0.0345,
    0.0495, 0.3468, 0.1143, 0.3497, 0.2500, 0.0373, 0.1163, 0.0000, 0.0252, 0.1074};

float transUS[10][10] = {
    0.0000, 0.1462, 0.0411, 0.0246, 0.2269, 0.7960, 0.8534, 0.8527, 0.9647, 0.5670,
    0.0000, 0.1672, 0.0599, 0.0154, 0.4118, 0.8524, 0.7372, 0.8960, 0.8971, 0.6011,
    0.0000, 0.1700, 0.0800, 0.0347, 0.2879, 0.9118, 0.8085, 0.8742, 0.9688, 0.8400,
    0.0000, 0.2549, 0.0417, 0.0082, 0.1980, 0.9058, 0.9189, 0.9610, 0.9467, 0.6807,
    0.0000, 0.2147, 0.0731, 0.0500, 0.2350, 0.7814, 0.4302, 0.8353, 0.7333, 0.6829,
    0.0000, 0.0094, 0.0000, 0.0033, 0.0561, 0.2319, 0.2710, 0.1200, 0.1274, 0.2068,
    0.0000, 0.0000, 0.0556, 0.0000, 0.1538, 0.3161, 0.2308, 0.0909, 0.1821, 0.1138,
    0.0000, 0.0133, 0.0000, 0.0033, 0.0000, 0.1937, 0.1795, 0.1111, 0.2979, 0.1441,
    0.0000, 0.0071, 0.0000, 0.0000, 0.0140, 0.1226, 0.1739, 0.0787, 0.1542, 0.1207,
    0.0000, 0.0289, 0.0000, 0.0061, 0.0500, 0.4191, 0.3953, 0.0625, 0.4137, 0.6694};

float transSU[10][10] = {
    0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
    0.5093, 0.1672, 0.1317, 0.1401, 0.1765, 0.0165, 0.0929, 0.0149, 0.0232, 0.1011,
    0.1667, 0.0567, 0.0133, 0.0867, 0.0682, 0.0063, 0.0851, 0.0126, 0.0000, 0.0200,
    0.0233, 0.0000, 0.0000, 0.0110, 0.0099, 0.0063, 0.0000, 0.0000, 0.0037, 0.0000,
    0.2759, 0.2577, 0.2466, 0.3591, 0.2404, 0.0375, 0.3721, 0.0706, 0.0333, 0.1707,
    0.9568, 0.8642, 0.8969, 0.8826, 0.7296, 0.1277, 0.1869, 0.1950, 0.1051, 0.4135,
    0.9537, 0.8250, 0.7778, 0.8974, 0.4615, 0.1813, 0.0962, 0.0909, 0.2129, 0.4959,
    0.9386, 0.8044, 0.8250, 0.8742, 0.8333, 0.1309, 0.1538, 0.1111, 0.0638, 0.3604,
    0.9408, 0.8281, 0.8679, 0.9085, 0.7303, 0.2091, 0.2609, 0.1873, 0.1571, 0.4483,
    0.9505, 0.5838, 0.8571, 0.6442, 0.7000, 0.1328, 0.1395, 0.3750, 0.2014, 0.0661};

float transSS[10][10] = {
    0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
    0.0000, 0.0164, 0.0000, 0.0009, 0.0000, 0.0445, 0.0449, 0.0396, 0.0565, 0.0532,
    0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0168, 0.0000, 0.0126, 0.0000, 0.0200,
    0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0050, 0.0000, 0.0000, 0.0181, 0.0000,
    0.0000, 0.0245, 0.0000, 0.0045, 0.0219, 0.0995, 0.0581, 0.0118, 0.2167, 0.0488,
    0.0000, 0.0618, 0.0258, 0.0033, 0.1582, 0.6218, 0.5327, 0.6650, 0.7631, 0.3122,
    0.0000, 0.1000, 0.0000, 0.0000, 0.0000, 0.3886, 0.6346, 0.8182, 0.5910, 0.3740,
    0.0000, 0.1111, 0.0500, 0.0099, 0.0333, 0.6754, 0.6410, 0.7778, 0.6277, 0.4324,
    0.0000, 0.0879, 0.0286, 0.0032, 0.1770, 0.6442, 0.5000, 0.7303, 0.6725, 0.3966,
    0.0000, 0.0405, 0.0286, 0.0000, 0.0000, 0.4108, 0.3488, 0.5625, 0.3597, 0.1570};

float bigram[5][5] = {
    0.0000, 0.0591, 0.2698, 0.0879, 0.5832,
    0.8212, 0.0393, 0.0362, 0.0096, 0.0936,
    0.5949, 0.0158, 0.1164, 0.0543, 0.2185,
    0.5241, 0.0060, 0.0442, 0.0145, 0.4112,
    0.2697, 0.1007, 0.2665, 0.0725, 0.2906};

/* floats above produced from normalising across each line of data below:
      0,  774, 3535, 1151, 7640,
   2131,  102,   94,   25,  243,
   5309,  141, 1039,  485, 1950,
   1481,   17,  125,   41, 1162,
   4180, 1561, 4130, 1124, 4505
      T     R     F     V     S    */
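/* added note (worked check of the normalisation): the R row of the raw
   data sums to 2131 + 102 + 94 + 25 + 243 = 2595, and 2131/2595 = 0.8212,
   the first entry of the R row of bigram[][] above. */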

float bigram5[6][6] = {
    0.0000, 0.0240, 0.0893, 0.0283, 0.2225, 0.6358,
    0.7877, 0.0131, 0.0154, 0.0077, 0.0493, 0.1268,
    0.5417, 0.0049, 0.0341, 0.0242, 0.1201, 0.2750,
    0.4112, 0.0035, 0.0262, 0.0088, 0.2481, 0.3022,
    0.2430, 0.0615, 0.1536, 0.0430, 0.1572, 0.3417,
    0.0567, 0.0478, 0.1913, 0.0590, 0.3179, 0.3273};

/* floats above produced from normalising across each line of data below:
      0,  319, 1185,  375, 2953, 8437,
   2044,   34,   40,   20,  128,  329,
   4834,   44,  304,  216, 1072, 2454,
   1162,   10,   74,   25,  701,  854,
   3767,  953, 2381,  666, 2436, 5297,
   1463, 1235, 4939, 1524, 8210, 8453
    TUB  RISE  FALL  R-F   STR  USTR    */
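/* added note (worked check): the RISE row of the raw data sums to
   2044 + 34 + 40 + 20 + 128 + 329 = 2595, and 2044/2595 = 0.7877, the
   first entry of the RISE row of bigram5[][] above. */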

float trigram[3][3][3] ={<br />

0.0261, 0.0808, 0.0050,<br />

0.0722, 0.0748, 0.0818,<br />

0.0112, 0.0080, 0.0019,<br />

0.0429, 0.0711, 0.0145,<br />

0.0262, 0.0304, 0.0743,<br />

0.0981, 0.0558, 0.0160,<br />

0.0429, 0.0768, 0.0016,<br />

0.0301, 0.0258, 0.0138,<br />

0.0120, 0.0058, 0.0000}<br />

ma<strong>in</strong>()<br />

{<br />

<strong>in</strong>t i, j, k, l, pos, done, f<strong>in</strong>ished, w, numberstates, s,<br />

tub = 0, p1, p2, tri1, tri2, tri3<br />

double value, val1, val2, bigvalue, tp<br />

short <strong>in</strong>t state[MAXWORDS], beststates[NUMBEST][MAXWORDS]<br />

short <strong>in</strong>t grp[MAXWORDS], bigstate[MAXWORDS]<br />

float sentence2[MAXWORDS][2], sentence6[MAXWORDS][6]<br />

float beststatevals[NUMBEST]<br />

char wordtag[MAXWORDS][7], prosody[MAXWORDS][24]<br />

for (i = 0 i < NUMBEST i++)<br />

165


eststatevals[i] =0 /* clear best states */<br />

scanf("%d", &w) /* get number <strong>of</strong> words */<br />

if (w < 0 || w > MAXWORDS) {<br />

pr<strong>in</strong>tf("Number <strong>of</strong> words(%d) not suitable. Exit<strong>in</strong>g\n",<br />

w)<br />

exit(0)<br />

}<br />

for (i = 0 i < w i++) {<br />

scanf("%s %s", wordtag[i], prosody[i])<br />

for (j = 0 j < NUMTAGS j++)<br />

if (!strcmp(tags[j].tagname, wordtag[i])) {<br />

sentence2[i][0] = tags[j].prb2U<br />

sentence2[i][1] = tags[j].prb2S<br />

grp[i] = tags[j].group<br />

break<br />

}<br />

}<br />

if (j == NUMTAGS) {/* unknown tag assume multiple punc */<br />

sentence2[i][0] = 1.0<br />

sentence2[i][1] = 0.0<br />

grp[i] = 0<br />

}<br />

/* number <strong>of</strong> possible states 2^w */<br />

numberstates = pow(2.0, (double)w)<br />

for (j = 0 j < w j++)<br />

state[j] = 0 /* set up first state */<br />

for (i = 0 i < numberstates i++) {<br />

val1 = 1.0<br />

/* product <strong>of</strong> prob tag be<strong>in</strong>g <strong>in</strong> its state */<br />

for (j = 0 j < w j++)<br />

val1 *= sentence2[j][state[j]]<br />

val2 = 1.0 /* group transition probabilities */<br />

for (j = 1 j < w j++) {<br />

if (state[j-1] == 0 && state[j] == 0)<br />

tp = transUU[grp[j-1]][grp[j]]<br />

if (state[j-1] == 0 && state[j] == 1)<br />

tp = transUS[grp[j-1]][grp[j]]<br />

if (state[j-1] == 1 && state[j] == 0)<br />

tp = transSU[grp[j-1]][grp[j]]<br />

if (state[j-1] == 1 && state[j] == 1)<br />

tp = transSS[grp[j-1]][grp[j]]<br />

val2 *= tp<br />

}<br />

if (w > 3)<br />

for (j = 2 j < w j++) { /* stress trigram probs */<br />

if (grp[j-2] == 0)<br />

tri1 = 2<br />

else<br />

tri1 = state[j-2]<br />

if (grp[j-1] == 0)<br />

166


}<br />

tri2 = 2<br />

else<br />

tri2 = state[j-1]<br />

if (grp[j-0] == 0)<br />

tri3 = 2<br />

else<br />

tri3 = state[j]<br />

val2 *= trigram[tri1][tri2][tri3]<br />

value = val1 * val2<br />

/* keep track <strong>of</strong> the top NUMBEST most probable sequences */<br />

for (j = 0 value < beststatevals[j] && j < NUMBEST<br />

j++)<br />

<br />

if (j < NUMBEST) {<br />

for (k = NUMBEST - 1 k > j k--) {/* make room */<br />

beststatevals[k] = beststatevals[k-1]<br />

for (l = 0 l < w l++)<br />

beststates[k][l] = beststates[k-1][l]<br />

}<br />

beststatevals[j] = value<br />

for (k = 0 k < w k++)<br />

beststates[j][k] = state[k]<br />

}<br />

}<br />

pos = 0 /* update state[] to next state */<br />

do {<br />

if (++state[pos] == 2) {<br />

state[pos] = 0<br />

pos++<br />

done = 0<br />

} else<br />

done = 1<br />

} while (pos < w && !done)<br />

/* NOW to use the NUMBEST sequences held <strong>in</strong> */<br />

/* beststates[NUMBEST][MAXWORDS] to predict the TSMs */<br />

bigvalue = 0.0 /* reset best value */<br />

/* do this for each state from above */<br />

for (l = NUMBEST - 1 l >= 0 l--) {<br />

for (j = 0 j < w j++) /* setup state */<br />

if (beststates[l][j] == 0)<br />

state[j] = 5 /* 0=unstr -> 5*/<br />

else<br />

state[j] = 1 /* 1=stressed -> 1 */<br />

for (i = 0 i < w i++) { /* setup probability lattice */<br />

for (j = 0 j < NUMTAGS j++)<br />

if (!strcmp(tags[j].tagname, wordtag[i])) {<br />

sentence6[i][0] = 0.0 /* TUB */<br />

sentence6[i][1] = tags[j].prb5R<br />

sentence6[i][2] = tags[j].prb5F<br />

167


}<br />

sentence6[i][3] = tags[j].prb5V<br />

sentence6[i][4] = tags[j].prb5S<br />

sentence6[i][5] = tags[j].prb5U<br />

if (sentence6[i][1] + sentence6[i][2]<br />

+ sentence6[i][3] + sentence6[i][4]<br />

+ sentence6[i][5] == 0 ) {<br />

sentence6[i][0]<br />

= 1.0 /* if all probs = 0 */<br />

state[i] = 0 /* then must be punc */<br />

}<br />

break<br />

}<br />

if (j == NUMTAGS) {<br />

sentence6[i][0] = 1.0 /* TUB */<br />

sentence6[i][1] = 0.0<br />

sentence6[i][2] = 0.0<br />

sentence6[i][3] = 0.0<br />

sentence6[i][4] = 0.0<br />

sentence6[i][5] = 0.0<br />

state[i] = 0<br />

}<br />

f<strong>in</strong>ished = 0<br />

while (!f<strong>in</strong>ished) {<br />

value = 1.0<br />

val1 = 1.0<br />

val2 = 1.0<br />

for (j = 0 j < w j++) /* <strong>in</strong>itial state probs */<br />

val1 *= (double)sentence6[j][state[j]]<br />

for (j = 1 j < w j++) /* all state trans probs */<br />

val2 *= (double)bigram5[state[j-1]][state[j]]<br />

/* non stress state trans probs */<br />

for (p1 = 0, j = 0 j < w j++)<br />

if (state[j] != 5 && state[j] !=<br />

0)<br />

p1++ /* count no. <strong>of</strong> stresses */<br />

if (p1 >= 2) {<br />

/* mult state-to-state trans probs ignor<strong>in</strong>g unstr */<br />

for (p1 = 0 state[p1] != 5 && state[p1]<br />

!= 0 p1++)<br />

/* f<strong>in</strong>d first stress*/<br />

for (p2 = (p1 + 1) state[p2] !=<br />

5 && state[p2] != 0 && p2 < w p2++) {<br />

if (p2 < w)<br />

val2 *= (double)bigram[state[p1]][state[p2]]<br />

p1 = p2<br />

}<br />

}<br />

/* <strong>in</strong>itial state probs * transition state probs */<br />

value = val1 * val2<br />

if (value > bigvalue) {<br />

/*pr<strong>in</strong>tf("BEST SO FAR:")*/<br />

bigvalue = value<br />

for (k = 0 k < w k++) {<br />

bigstate[k] = state[k] /* save new best state */<br />

168


}<br />

/*switch(state[k]) {<br />

case 0:<br />

pr<strong>in</strong>tf("|")<br />

break<br />

case 1:<br />

pr<strong>in</strong>tf("R")<br />

break<br />

case 2:<br />

pr<strong>in</strong>tf("F")<br />

break<br />

case 3:<br />

pr<strong>in</strong>tf("V")<br />

break<br />

case 4:<br />

pr<strong>in</strong>tf("S")<br />

break<br />

case 5:<br />

pr<strong>in</strong>tf("U")<br />

}*/<br />

}<br />

/*pr<strong>in</strong>tf(" %e\n",value)*/<br />

}<br />

pos = w - 1 /* update state[] to next state */<br />

do {<br />

done = 1 /* by default have done unless changed below */<br />

if (state[pos] == 0 || state[pos]<br />

== 5) {/* don't change tub or unstressed state */<br />

pos--<br />

done = 0<br />

} else if (pos >= 0) {/* don't go past end */<br />

if (++state[pos] == 5) {/* if state <strong>in</strong>cs to 5=unstressed */<br />

state[pos] =1 /* reset to 1 */<br />

pos-- /* <strong>and</strong> po<strong>in</strong>t to next word */<br />

done = 0 /* <strong>and</strong> say we haven't done */<br />

}<br />

}<br />

if (pos < 0)<br />

f<strong>in</strong>ished = 1 /* ie have done all comb<strong>in</strong>ations */<br />

} while (pos >= 0 && !done)<br />

}<br />

        for (i = 0; i < w; i++)
            printf("%s=%s ", wordtag[i], prosody[i]);
        printf("\nPredicted: ");
        for (i = 0; i < w; i++) {
            switch (bigstate[i]) {
            case 0:
                printf("|");
                break;
            case 1:
                printf("R");
                break;
            case 2:
                printf("F");
                break;
            case 3:
                printf("V");
                break;
            case 4:
                printf("S");
                break;
            case 5:
                printf("U");
            }
        }
        printf("\nShould Be: ");
        for (i = 0; i < w; i++) {
            if (bigstate[i] == 0)
                s = 0;  /* tu boundary */
            else
                for (s = 5, j = 0; j < strlen(prosody[i]); j++) {
                    if ((prosody[i][j] == ',' || prosody[i][j] == '/') && s >= 4)
                        s = 1;
                    if ((prosody[i][j] == '\\' || prosody[i][j] == '`') && s >= 4)
                        s = 2;
                    if ((prosody[i][j] == ',' || prosody[i][j] == '/') && s == 2)
                        s = 3;
                    if ((prosody[i][j] == '*' || prosody[i][j] == '_'
                         || prosody[i][j] == '~') && s == 5)
                        s = 4;
                    /* one further stress-marking character was tested here,
                       but it has been lost from this copy of the listing:
                    if ((prosody[i][j] == '') && s == 5)
                        s = 4;
                    */
                }
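            /* Decoding above: s = 1 for a rise (',' or '/'), s = 2 for a
               fall ('\\' or '`'), s = 3 when a rise follows a fall
               (fall-rise), s = 4 for a stress mark ('*', '_' or '~'),
               and s = 5 (unstressed) otherwise. */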

            switch (s) {
            case 0:
                printf("|");
                break;
            case 1:
                printf("R");
                break;
            case 2:
                printf("F");
                break;
            case 3:
                printf("V");
                break;
            case 4:
                printf("S");
                break;
            case 5:
                printf("U");
            }
        }
        printf("\n");
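The loop labelled "update state[] to next state" above enumerates every combination of the four stress classes over the stressed words, odometer-style, while boundary and unstressed words keep their states fixed. The following self-contained sketch (not part of the original listing; the word count W and the example state vector are hypothetical) isolates that enumeration so it can be run and inspected on its own:

    #include <stdio.h>

    #define W 4

    int main(void)
    {
        /* 0 = tone-unit boundary and 5 = unstressed are held fixed;
           1..4 are the four stress classes R, F, V, S, which are cycled. */
        int state[W] = {0, 1, 5, 1};    /* hypothetical example sentence */
        int pos, done, finished = 0;

        while (!finished) {
            for (pos = 0; pos < W; pos++)       /* visit one combination */
                printf("%d", state[pos]);
            printf("\n");

            pos = W - 1;                        /* advance, odometer-style */
            do {
                done = 1;
                if (state[pos] == 0 || state[pos] == 5) {
                    pos--;                      /* fixed state: carry left */
                    done = 0;
                } else if (++state[pos] == 5) { /* wrapped past S */
                    state[pos] = 1;             /* reset to R and carry */
                    pos--;
                    done = 0;
                }
                if (pos < 0)
                    finished = 1;               /* all combinations visited */
            } while (pos >= 0 && !done);
        }
        return 0;
    }

This prints the sixteen combinations for the two stressed words in the example. In the program above, each combination visited by the same mechanism is scored as value = val1 * val2, and the best-scoring sequence is retained in bigstate[].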


