On the Boundaries of Phonology and Phonetics

Sponsored by 

Nederlandse Vereniging voor Fonetische Wetenschappen 

Center for Language and Cognition Groningen 

Stichting Groninger Universiteitsfonds 

Department of Linguistics, University of Groningen 

1st edition, January 2004

2nd edition, February 2004

ISBN 90 367 1930 5

UNIVERSITY OF GRONINGEN 

On the Boundaries of Phonology and Phonetics

Edited by 

Dicky Gilbers 

Maartje Schreuder 

Nienke Knevel 

To honour Tjeerd de Graaf

Contents 

On the Boundaries of Phonology and Phonetics

The Editors: Dicky Gilbers, Maartje Schreuder and Nienke Knevel

Tjeerd de Graaf

Markus Bergmann, Nynke de Graaf and Hidetoshi Shiraishi

Tseard de Graaf

Translated by Jurjen van der Kooi

Boundary Tones in Dutch: Phonetic or Phonological Contrasts?

Vincent J. van Heuven

The Position of Frisian in the Germanic Language Area

Charlotte Gooskens and Wilbert Heeringa

Learning Phonotactics with Simple Processors

John Nerbonne and Ivilin Stoianov

Weak Interactions

Tamás Bíró

Prosodic Acquisition: a Comparison of Two Theories

Angela Grimm

Base-Identity and the Noun-Verb Asymmetry in Nivkh

Hidetoshi Shiraishi

The Influence of Speech Rate on Rhythm Patterns

Maartje Schreuder and Dicky Gilbers

List of Addresses

On the Boundaries of Phonology and Phonetics 

The Editors: Dicky Gilbers, Maartje Schreuder and 

Nienke Knevel 

In this volume a collection of papers is presented in which the boundaries 

of phonology and phonetics are explored. In current phonological research, 

the distinction between phonology, as the study of sound systems of 

languages, and phonetics, as the study of the characteristics of human 

(speech) sound making, seems to be blurred. 

Consider an example of the phonological process of /l/-substitution as 

exemplified in the data in Table 1. 

Table 1. /l/ substitutions

Historical Dutch data:

/l/          →  [w]
alt/olt         oud          'old'
kalt/kolt       koud         'cold'
schoo[l]        schoo[w]     'school'

First Language Acquisition data (Dutch):

/l/          →  [w]
hallo           ha[w]o       'hello'
lief            [w]ief       'sweet'
blauw           b[w]auw      'blue'

In phonology, the substitution segment is expected to be a minimal 

deviation from the target segment. For example, boot ‘boat’ could be 

realized as [pot], but not as [lot], since the target /b/ and the output [l] differ 

in too many dimensions. In other words, sound substitutions should be 

characterized more commonly by single feature changes than by several 

feature changes. The widely attested substitution of /l/ by [w], however, 

cannot be accounted for adequately as a minimal deviation from the target 

based on articulatorily defined features, as shown in Figure 1.


Figure 1. /l/-substitutions

/l/       →    [w]
+ son          + son
+ cons         - cons
+ cont         + cont
+ lat          - lat
- lab          + lab
+ ant          - ant
+ cor          - cor
- high         + high
- back         + back
- round        + round
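
The size of this deviation can be made concrete by counting the feature values on which the two segments disagree. The following Python sketch is an illustration only: the feature values are copied from Figure 1, while the function name `feature_distance` and the dictionary encoding are assumptions made for this example and do not come from any published analysis.

```python
# Feature specifications copied from Figure 1 (True = '+', False = '-').
# The dictionary encoding is an assumption made for this illustration.
L_FEATURES = {
    "son": True, "cons": True, "cont": True, "lat": True, "lab": False,
    "ant": True, "cor": True, "high": False, "back": False, "round": False,
}
W_FEATURES = {
    "son": True, "cons": False, "cont": True, "lat": False, "lab": True,
    "ant": False, "cor": False, "high": True, "back": True, "round": True,
}


def feature_distance(a, b):
    """Return the sorted list of features on which two segments differ."""
    return sorted(f for f in a if a[f] != b[f])


differing = feature_distance(L_FEATURES, W_FEATURES)
print(len(differing), "of", len(L_FEATURES), "features differ:", differing)
```

Under this encoding only [son] and [cont] are shared, so the articulatory representation treats /l/ → [w] as a change in eight of the ten features listed.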

From an acoustic point of view, liquid-glide alternations can be described 

as minimal changes. The differences between the individual glides and 

liquids can be related to their relative second and third formant locus 

frequencies. Ainsworth and Paliwal (1984) found that in a perceptual-identification

experiment liquids such as [l], which have a mid F2 locus

frequency, were classified as [w] if the stimulus had a low F2 locus frequency and

as [j] if it had a high F2 locus frequency.

3160 Hz          w  w  w  l  l  l  l  j  j  j
   ↑             w  w  w  l  l  l  l  j  j  j
F3 locus freq.   w  w  w  r  r  r  l  j  j  j
   ↓             w  w  w  r  r  r  j  j  j  j
1540 Hz          w  w  r  r  r  r  r  j  j  j

                 760 Hz  ←  F2 locus freq.  →  2380 Hz

Figure 2. Typical set of responses obtained from listening to glide/liquid-vowel 

synthetic stimuli (after Ainsworth & Paliwal, 1984 (simplified)) 

Based on these acoustic characteristics, liquid-glide substitutions can be 

described as a minimal change from the target, which cannot be done in the 

phonological representation of these sounds. Obviously, phonology needs


phonetic information to explain a phonological process of this kind (cf. 

Gilbers, 2002). 
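
The acoustic account can be sketched as a simple lookup in the response grid of Figure 2. The Python fragment below is a hedged illustration: the response labels and the axis end points are taken from Figure 2, but the assumption that the grid points are evenly spaced along both axes, and the function names `nearest_index` and `perceived`, are introduced here for the example only.

```python
# Response grid copied from Figure 2; the first row is the highest F3 locus.
GRID = [
    "wwwlllljjj",  # F3 locus = 3160 Hz
    "wwwlllljjj",
    "wwwrrrljjj",
    "wwwrrrjjjj",
    "wwrrrrrjjj",  # F3 locus = 1540 Hz
]
F2_MIN, F2_MAX = 760.0, 2380.0   # axis end points from Figure 2
F3_MIN, F3_MAX = 1540.0, 3160.0


def nearest_index(value, low, high, steps):
    """Map a frequency to the nearest of `steps` grid points.

    Even spacing between the end points is an assumption of this sketch.
    """
    clipped = min(max(value, low), high)
    return round((clipped - low) / (high - low) * (steps - 1))


def perceived(f2, f3):
    """Return the response label of the grid cell nearest to (f2, f3)."""
    col = nearest_index(f2, F2_MIN, F2_MAX, len(GRID[0]))
    row_from_bottom = nearest_index(f3, F3_MIN, F3_MAX, len(GRID))
    return GRID[len(GRID) - 1 - row_from_bottom][col]


# Sweep the F2 locus from low to high while keeping the F3 locus high.
print([perceived(f2, 3160) for f2 in (760, 1480, 2380)])
```

With a high F3 locus, moving only the F2 locus takes the response from [w] through [l] to [j], which is the sense in which the liquid-glide substitution is a minimal acoustic change.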

Now consider the Dutch process of schwa insertion as exhibited in 

Table 2. 

Table 2. schwa insertion in Dutch 

helm  [hɛləm]  'helmet'      darm  [dɑrəm]  'intestine'
half  [hɑləf]  'half'        durf  [dʏrəf]  'courage'
melk  [mɛlək]  'milk'        hark  [hɑrək]  'rake'

not in: vals 'out of tune', hals 'neck', hart 'heart', start 'start' 

Schwa may be inserted between a liquid /l,r/ and a non-homorganic 

consonant (i.e. a consonant that differs in place of articulation from /l,r/) at

the end of a syllable. Therefore, schwa may be inserted between coronal /l/ 

or /r/ and non-coronal /m/, /f/, /k/, etc. Schwa is not allowed, however, 

between /l/ or /r/ and a coronal obstruent /s/ or /t/. Now, Dutch has at least 

two different varieties of /r/: an alveolar [r] and a uvular [ʀ]. Since there is

no functional difference between realizations such as [rɑt] and [ʀɑt] for rat

'rat', however, there is only one phoneme /r/ in the Dutch system with its

allophones [r] and [ʀ]. Interestingly, even Dutch speakers with a uvular [ʀ]

do not show schwa insertion between their [ʀ] and non-homorganic coronal

obstruent /s/ or /t/. The process of schwa insertion, apparently, takes place 

before the phonetic level of actual realization of segments, i.e. on the 

abstract phonological level, where /r/, /s/ and /t/ share their place feature 

[coronal]. Synchronically, the process can only be described in a 

phonological way, even though it may have had a phonetic - articulatory - 

base originally. We assume that uvular [ʀ] is a later variant of Dutch /r/

than coronal [r], just as the even younger, recently observed allophonic

variant [ɹ] in Western Dutch dialects: raar 'strange' realized as [ra:ɹ]. These

allophones date from times when the process of schwa insertion between 

non-homorganic, syllable-final liquid-consonant clusters was already 

'fossilized' in the Dutch system. 
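
The phonological condition on schwa insertion can be sketched as a check on abstract place features. The Python fragment below is an illustration under stated assumptions: the text only says that /m/, /f/ and /k/ are non-coronal, so the labels "labial" and "dorsal", the `PLACE` table and the function name `allows_schwa_insertion` are hypothetical choices made for this example.

```python
# Phonological (not phonetic) place features. /r/ is coronal here regardless
# of whether a speaker realizes it as alveolar or uvular, as argued above.
# The labels "labial" and "dorsal" are assumptions made for this sketch.
PLACE = {
    "l": "coronal", "r": "coronal", "s": "coronal", "t": "coronal",
    "m": "labial", "f": "labial", "k": "dorsal",
}
LIQUIDS = {"l", "r"}


def allows_schwa_insertion(cluster):
    """True if a syllable-final liquid + consonant cluster is non-homorganic."""
    liquid, consonant = cluster
    if liquid not in LIQUIDS:
        return False
    return PLACE[liquid] != PLACE[consonant]


for word, cluster in [("helm", "lm"), ("hark", "rk"), ("hals", "ls"), ("hart", "rt")]:
    print(word, allows_schwa_insertion(cluster))
```

Because the check consults the phoneme /r/ and not its surface realization, it gives the same answer for speakers with an alveolar and with a uvular /r/, in line with the data in Table 2.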

The above-mentioned two accounts of phonological processes indicate 

the way many phonologists approach their research objects nowadays. 

More and more the distinction between phonology and phonetics is 

challenged in attempts to provide adequate accounts of the phonological 

phenomena. In this way, the phonologists of the so-called CLCG Klankleer


group in Groningen study the phonology-phonetics interface, whereas other 

members of the group cross the boundaries of phonology and phonetics by 

combining the study of sound patterns with dialectology, computational 

linguistics, musicology, first language acquisition or ethnolinguistics. 

The Center for Language and Cognition Groningen (CLCG) is a 

research institute within the Faculty of Arts of the University of Groningen. 

It comprises most of the linguistic research that is being carried out within 

the Faculty of Arts. One of the research groups of CLCG is this 'Klankleer' 

group (Phonology and Phonetics), which focuses on the structure and 

contents of the sounds of language. 

This volume of papers by members of the Klankleer group is dedicated 

to Tjeerd de Graaf, who was the coordinator of this group from 1999 until 

2003. It does not mean that Tjeerd no longer participates in the group, 

because he still supervises two PhD projects. These projects by Hidetoshi 

Shiraishi and Markus Bergmann combine phonetics and phonology with 

ethnolinguistics. As mentioned above, the research of most members of the 

group involves combinations of different (linguistic) areas. Wilbert 

Heeringa, Charlotte Gooskens and Roberto Bolognesi apply phonetics to 

the study of dialectology. Nanne Streekstra is one of the first linguists in 

our group who was interested in the phonology-phonetics interface. Wouter 

Jansen's work is exemplary for this so-called 'laboratory phonology'. He 

provides acoustic studies of voicing assimilation in obstruent clusters in 

Germanic languages. Maartje Schreuder and Dicky Gilbers combine 

phonetics and phonology with areas beyond linguistics, such as music 

theory. Former member Klarien van der Linde and Angela Grimm study 

first language acquisition, whereas Wander Lowie studies second language 

acquisition. Finally, Tjeerd de Graaf started his academic life as a

researcher in theoretical physics and then switched to phonetics; his main

interest is now in ethnolinguistics. This homo universalis also plays piano

and oboe and speaks nine different languages. This Festschrift, however, is 

dedicated to the phonetician Tjeerd de Graaf. The papers cover a wide 

range of topics varying from ethnolinguistics to computational linguistics 

and from first language acquisition to dialectology. The common 

denominator is that all researchers work on the boundaries of phonology 

and phonetics. 

Vincent van Heuven, a guest author from the University of Leiden,

wonders whether certain distinctions in the speech signal are phonological 

or phonetic. He investigates whether different prosodic boundary tones 

form a continuum or whether they are categorical. He finds a categorical


division between low (declarative) and non-low tones, but within the non-low

category the cross-over from continuation to question is rather gradual. 

Charlotte Gooskens and Wilbert Heeringa measured linguistic distances 

between Frisian dialects and the other Germanic languages in order to get 

an impression of the effect of genetic relationship and language contact on 

the position of the modern Frisian language on the Germanic language 

map. Wilbert is a member of the CLCG group 'Computational Linguistics'. 

John Nerbonne participates as head of CLCG. His paper with Ivilin 

Stoianov explores the learning of phonotactics in neural networks, in 

particular the so-called Simple Recurrent Networks (SRNs). SRNs provide 

a valuable means of exploring what information in the linguistic signal 

could in principle be acquired by a very primitive learning mechanism. 

Tamás Bíró, who is also a member of 'Computational Linguistics' and 

interested in phonology, claims that the types of interactions between 

languages can be extremely diverse, depending on a number of factors. The 

paper analyses three case studies, namely the influence of Yiddish on 

Hungarian, Modern Hebrew and Esperanto. 

Angela Grimm discusses a number of empirical and theoretical 

problems with respect to two models of prosodic acquisition: a template 

mapping model and a prosodic hierarchy model. Both models assume that 

the acquisition of word prosody is guided by universal prosodic principles. 

Toshi Shiraishi discusses phonological asymmetries between nominal 

and verbal stems of Nivkh, a minority language spoken on the island of 

Sakhalin. These asymmetries are observed in two phonological phenomena: 

consonant alternation and final fricative devoicing. Though the 

asymmetries themselves look very different on the surface, Toshi's paper 

makes explicit that they are subject to a common generalization, Base- 

Identity. 

Maartje Schreuder and Dicky Gilbers wondered whether the influence 

of a higher speech rate leads to adjustment of the rhythmic pattern, as it 

does in music, or just to 'phonetic compression' with preservation of the 

phonological structure. An example of an item they examined is the Dutch 

word perfèctioníst, which can get the rhythmic structure pèrfectioníst in fast 

tempo. The results indeed showed a preference for restructured rhythms in 

fast speech. 

With this very diverse collection of papers, we hope to present the 

phonetician Tjeerd de Graaf a representative selection of the current 

activities of his CLCG-Klankleer group.


In the 1970s and 1980s Tjeerd's phonetic research stood miles away

from the feature geometries and grid representations that were customary in 

phonology. He used to make sonagrams, i.e. visual displays of sound 

spectrograms, of e.g. [p�], [si] and [r�]. But when the violin string of his 

sonagraph broke, he wasn't able to do phonetic research anymore and that is 

when ethnolinguistics stole his heart. Nowadays, it is much easier to do 

phonetic analyses on the computer using programs, such as PRAAT 

(Boersma and Weenink, 1992-2003). Whereas phonetics and phonology

have grown apart from each other since they were established as two distinct

disciplines of linguistics at the First International Congress of Linguists 

(The Hague 1928), current laboratory phonological research may even 

suggest that phonetics and phonology coincide. However, as shown in the 

two examples in this introductory paper, /l/-substitution and schwa insertion,

the role of both disciplines is still distinguishable. That does not 

alter the fact that co-operation between phoneticians and phonologists must 

be an integral part of the study of sound patterns. Some sound phenomena, 

such as ethnolinguistic and dialect differences or acquisition data, can only 

be explained adequately if both phonological and phonetic characteristics 

of sounds are considered. 

University of Groningen, January 2004 

This volume was presented to Tjeerd de Graaf on January 30, 2004 at the 

workshop 'On the Boundaries of Phonology and Phonetics'. The CLCG and 

the Department of Linguistics of the University of Groningen, 'de 

Nederlandse Vereniging voor Fonetische Wetenschappen' and GUF 

(Stichting Groninger Universiteitsfonds) sponsored this workshop. Keynote 

speakers were Vincent van Heuven and Carlos Gussenhoven. 

References 

Ainsworth, W.A. & K.K. Paliwal (1984). Correlation between the production 

and perception of the English glides /w,r,l,j/. Journal of 

Phonetics, 12: 237-243. 

Boersma, Paul, and David Weenink (1992-2003). PRAAT, phonetics by 

computer. Available at http://www.praat.org. University of 

Amsterdam.


Gilbers, D.G. (2002). Conflicting phonologically based and phonetically based 

constraints in the analysis of /l/-substitutions. In: M. Beers, P. 

Jongmans & A. Wijnands (eds). Netwerk Eerste Taalverwerving, 

Net-bulletin 2001. Leiden, 22-40.

Tjeerd de Graaf 

Markus Bergmann, Nynke de Graaf and Hidetoshi 

Shiraishi 

Tjeerd de Graaf was born on January 27th 1938 in Leeuwarden, the capital 

of the province Fryslân in the Netherlands. Fryslân is the largest of several 

regions on the North Sea where Frisian is spoken, a West Germanic 

language whose genetically closest relative is English. 

Tjeerd’s parents were both Frisians, and at home they spoke exclusively 

Frisian. Like most other children in Fryslân at that time, Tjeerd grew up

bilingually. His first native language was Frisian, and at school he learned 

Dutch, the official language of the Netherlands. 

The coexistence of Frisian at home and Dutch at school was Tjeerd’s 

first experience in a fascinating world of different languages. For Tjeerd, 

the difference between the two languages had a very illustrative spatial 

implication: when he and the other children in his neighborhood went to 

school in the mornings, there was a railway crossing along the way. Once 

they had crossed it they stopped speaking Frisian and switched to Dutch, 

their official school language. 

At the age of 18, in 1956, Tjeerd graduated from the Leeuwarden High 

School and became interested in languages. His other big passion was the 

science of physics and astronomy. The oldest planetarium in the world is 

located in Franeker, an old academic place in Fryslân. Intrigued by the laws 

governing space and time, Tjeerd studied physics at the University of 

Groningen from 1956 to 1963. In 1963 he received his master’s degree in 

science (Doctoraal examen) in theoretical physics, a combination of 

physics, mathematics and astronomy. From 1963 until 1969 he continued 

as a research associate at the Institute of Theoretical Physics at the 

University of Groningen. 

Tjeerd was already a “polyglot” at that time, speaking not only Frisian 

and Dutch, but also German, English and French. Other languages would 

follow. In the former Soviet Union the study of astronomical sciences was 

enjoying an era of superiority. Tjeerd understood that learning Russian and 

other East European languages would be the key to enter the field of 

scientific knowledge. Along with his studies in theoretical physics, he also

enrolled in the study of Slavic languages. The new technologies and their

application for future research fascinated him. In 1967 he received his 

Master of Arts degree (Kandidaatsexamen) in Slavic languages and 

computer linguistics. In the meantime, after having obtained his MS, he 

continued his research in theoretical physics, combined with a study abroad 

in Poland, where he lived for half a year and mastered the language. 

By 1969, he finished his dissertation entitled “Aspects of Neutrino 

Astrophysics”. 

The cover page of Tjeerd’s dissertation in Theoretical Physics in 1969 

Tjeerd’s unquenchable thirst for knowledge led him to England together with

his wife Nynke and their children where they spent a year from 1970 to 

1971 and where he worked as a research associate at the Institute of 

Theoretical Physics at the University of Cambridge. 

Upon their return to Groningen, Tjeerd became assistant professor in 

physics at the Institute of Astronomy, a post he held until 1975. This was to 

be a turning point in his professional career when he decided to switch to 

his second passion, namely the study of languages. One of his dissertational 

theses dealt with the question of how exactly a person’s identity could be

defined by his or her speech. This thesis symbolically defined one of 

Tjeerd’s later linguistic interests: the aspects of spoken language, the study 

of phonetics. 

In 1975, Tjeerd became associate professor at the Institute of Phonetic 

Sciences, Department of Linguistics, University of Groningen. 

Being a native bilingual in Frisian and Dutch, Tjeerd was aware of the 

numerous phonetic differences between the languages. Having studied


many other languages as well, Tjeerd understood how important phonetic 

descriptions are not only for theoretical linguistics, but also in learning and 

teaching foreign languages. 

Language coexistence and language change would become another focal 

point of his research. In most regions of the world, people are bilingual or 

even multilingual. Language variety appears both in space and time. 

Listening to radio programs or TV broadcasts dating back ten or twenty 

years reveals a distinct difference in speech compared with today’s

custom of speaking. It is still the same language, the same place, and yet 

the speech is not the same as before. Not only the lexicon of a language 

changes but also the manner in which people speak, their pronunciation and 

intonation. This is an extremely intriguing topic for a person interested in 

languages and their varieties. 

Tjeerd started to trace the oldest recordings of spoken examples of 

languages. He analyzed Frisian recordings from the province of Fryslân as 

well as recordings from North and East Frisian regions. Recordings of the 

spoken language of former times are not only a historically important 

heritage, but they also offer valuable information pertaining to language 

shift processes. A practical problem with the oldest sound recordings is that 

they were made on wax cylinders and their quality decreases tremendously 

every time they are listened to. Tjeerd was aware of the fact that one of the 

main tasks was to transfer these recordings to modern media in order to 

preserve them. In the beginning of the 1990s, together with Japanese 

colleagues, Tjeerd started to investigate the possibilities of preserving old 

language recordings via modern audio technology. At that time, Tjeerd 

acquired yet another language, namely Japanese. 

Tjeerd working on wax cylinders with old recordings of Dutch


Tjeerd started to contact the most important sound archives of the world, 

which are in Vienna, Berlin, and St. Petersburg. Through his collaboration 

with the sound archive of the Academy of Sciences in St. Petersburg in the 

1990s, he renewed his contact with Russia, which had begun with his 

studies of Slavic languages in the 1960s. 

After 1990, the world had experienced dramatic changes. The Iron 

Curtain had disappeared and Russia had once again opened her “Window to 

the West”. When Tjeerd came back to St. Petersburg in the 1990s, he was 

immediately fascinated by this city he had visited for the first time some 

twenty years earlier, when it was still known as Leningrad. As a Frisian and a

Dutchman, he felt at home there. The picturesque canals and paths along 

the wide boulevards reminded him of his home region. This was no 

coincidence: Czar Peter the Great, some 300 years ago, had chosen Holland 

as the model for his new capital. 

In the following years, Tjeerd organized joint projects with the Russian 

Academy of Sciences and St. Petersburg State University to preserve and 

transfer old Russian sound recordings onto modern digital audio media. 

Research on a vast collection of the most various sound recordings 

resulting from many linguistic field work expeditions from the end of the 

19th and 20th centuries served as an incentive for several projects related to

different languages spoken in Russia. 

Tjeerd initiated research projects on the language spoken by the

Mennonites, a group of people in Siberia, who had originally come from 

regions in the Northern Netherlands and Germany and still speak the 

language of their ancestors – in fact a language with great similarities to the 

modern dialects spoken in North-Germany and northern parts of the 

Netherlands. The Dutch press even reported that “Siberians speak

Gronings”. 

Languages do not only divide people of different nations, but also build 

a bridge between them. Tjeerd showed this with his research work. Even in 

far-away Siberia there are people speaking almost the same language as in 

Groningen. When planning his expeditions, Tjeerd was concerned with 

both scientific aims and the organization of humanitarian aid from 

Groningen to the Siberian villages he visited. 

Language as a cultural heritage became the core of Tjeerd’s linguistic 

activities. With his bilingual origin, he set the perfect example. Throughout 

his life, he showed that each individual can contribute to the survival of a 

language. With his Frisian wife Nynke, whom he met in his student years, 

Tjeerd used to converse in Dutch. After their parents had passed away, they


decided to switch to Frisian. They personally experienced how a language 

slowly starts to become extinct if the children do not carry on the language. 

This attitude defined Tjeerd’s successive research activities in Russia. 

Subsequent projects, which he now coordinated, had two goals:

documentation of endangered languages, and revitalizing and preserving 

them for future generations. In the following projects, both aspects – 

preservation and further development – were present. Tjeerd made several 

expeditions, among others to Yakutia and the Island of Sakhalin, where he 

and other linguists recorded the speech of the local indigenous peoples. 

Tjeerd de Graaf with a group of speakers of indigenous languages of the Island of 

Sakhalin in the Far East of Russia: Uiltas and Nivkhs, in the 1990s. 

In the second half of the 1990s, Tjeerd coordinated several projects with 

institutions throughout the Russian Federation funded by the Netherlands

Organization for Scientific Research and the EU INTAS organization in 

Brussels. 

His main goal was to make young people aware of their unique 

linguistic heritage and stimulate them in supporting minority and regional 

languages. In 1998, Tjeerd was appointed Knight in the Order of the Dutch 

Lion for his research and contribution in support of the preservation and 

construction of databases for the minority languages in Russia. Later that 

same year Tjeerd was awarded an honorary doctorate at the University of 

St. Petersburg for his contribution in the joint language preservation 

projects.


Tjeerd de Graaf is appointed Doctor Honoris Causa at the University of 

St. Petersburg, November 1998. 

Tjeerd retired from the University of Groningen in 2003 and vacated

the chair of the coordinator of the 'Klankleer' (Phonology and Phonetics)

group of CLCG (Center for Language and Cognition Groningen). Therefore,

his colleagues compiled this Festschrift exhibiting a diversity of research 

subjects on the boundaries of phonology and phonetics. 

It is not a goodbye to our former coordinator. Tjeerd's passionate 

engagement for languages and linguistic projects continues. Since his 

retirement he has been an active honorary member of the Frisian Academy in

Leeuwarden and he is still in contact with the University of St. Petersburg 

for future research projects. That means more than enough commitments 

for Tjeerd combined with his role as a grandfather for his five 

grandchildren. Tjeerd’s enthusiasm is an inspiration for other researchers

and the young generation to continue his research. 

Publications by Tjeerd de Graaf 

1966 

The Annihilation of a Neutrino-antineutrino Pair into Photons and the 

Neutrino Density in the Universe. (With H.A.Tolhoek). Nuclear physics, 

81: 596 and 99: 695. 

Neutrinoprocessen en Neutrino-astronomie [Neutrino Processes and 

Neutrino Astronomy]. Internal Report IR 68, Natuurkundig Laboratorium 

Groningen. 58 pp.


1968 

De Rol van het Neutrino in de Astrofysica [The Role of the Neutrino in 

Astrophysics]. Nederlands tijdschrift voor natuurkunde, 34: 329. 

Phase Factors in Discrete Symmetry Operations. (With H.A.Tolhoek). 

Intern Rapport IR 85, Natuurkundig Laboratorium Groningen, 96 pp. 

Detectie van Neutrino's uit de Zon [Detection of Solar Neutrinos]. 

Nederlands tijdschrift voor natuurkunde, 34: 357. 

1969 

Phase Factors in Quantum Field Theory. Physica, 43: 142. 

Muonen uit Kosmische Straling: het Utah Experiment [Muons from 

Cosmic Radiation: the Utah Experiment]. (With J. van Klinken). 

Nederlands tijdschrift voor natuurkunde, 36: 301. 

Aspects of Neutrino Astrophysics. Dissertation University of Groningen. 

Groningen. 119 pp. 

Syllabus Beknopte Theoretische Natuurkunde [Syllabus Summary of 

Theoretical Physics]. Natuurkundig Laboratorium Groningen, 190 pp. 

1970 

On a Cosmic Background of Low-energy Neutrinos. Astronomy and 

Astrophysics, 5: 335. 

Neutrino Processes in the Lepton Era of the Universe. Lettere al Nuovo 

Cimento, 4: 638. 

Cosmological Neutrinos. Proceedings of the Cortona Meeting on 

Astrophysical Aspects of the Weak Interactions, 81. 

1971 

Nucleaire Astrofysica in het Laboratorium [Laboratory Nuclear 

Astrophysics]. Nederlands tijdschrift voor natuurkunde, 38: 107. 

The Astrophysical Importance of Heavy Leptons. Lettere al Nuovo 

Cimento, 2: 979. 

1972 

Lecture Notes on Nuclear Astrophysics. Scuola Normale Superiore, Pisa, 

45 pp. 

The Lepton Era of the Big Bang. Proceedings of the Europhysics 

Conference Neutrino'72. Budapest, 167.


1973 

Neutrinos in the Universe. Vistas in Astronomy, 15: 161. 

1974 

Nuclear Processes in the Early Universe. VIth International Seminar on

Nuclear Reactions in the Cosmos. Leningrad, 329. 

Kernenergie in de Kosmos [Nuclear Energy in the Cosmos]. Atoomenergie 

en haar Toepassingen, 81. 

De Heliumabundantie in het Heelal [The Helium Abundance in the 

Universe]. (With W.J. Weeber). Nederlands tijdschrift voor natuurkunde, 

40: 183. 

1977 

De Computer en de Faculteit der Letteren [The Computer and the Faculty 

of Arts]. Informatiebulletin Computercommissie FdL. Groningen, 38 pp. 

1978 

Vowel Analysis with the Fast Fourier Transform. Acustica, 41: 41 

Ienlûd, twa lûden, twalûden [Monophthongs, Two Sounds, Diphthongs]. 

(with G.L. Meinsma). Us Wurk, 27: 81. 

Analyse de voyelles avec des méthodes digitales [Vowel Analysis with 

Digital Methods]. Actes des 9èmes Journées d'Etude sur la Parole. 

Lannion, 233. 

Linear Prediction in Speech Research. Prace XXV Seminarium Otwartego z 

Akustyki. Poznań, 19. 

1979 

Het kenmerk bij hoge gespannen vokalen [The Feature 

in High Tense Vowels]. (With N.Streekstra). TABU, 8: 40. 

De Computer en Fonetisch Onderzoek [The Computer and Phonetic 

Research]. Informatiebulletin Computercommissie FdL. Groningen, 5 pp. 

Vowel Analysis with Linear Prediction. Proceedings of the 9th 

International Congress of Phonetic Sciences. Copenhagen, 265. 

Digital Methods for the Analysis of Speech. Proceedings of the 7th 

Colloquium on Acoustics. Budapest, 289. 

1980 

Phonetic Aspects of Breaking in West Frisian. (With P.Tiersma). 

Phonetica, 37: 109.


De brekking fan sintralisearjende twalûden yn it Frysk [Breaking of 

Centralizing Diphthongs in Frisian]. (With G.L. Meinsma). Us Wurk, 29: 

131. 

Vannak-e Diftongusok a Magyar Köznyelvben? [Are there Diphthongs in 

Standard Hungarian?]. (With A.D. Kylstra). Nyelvtudományi Közlemények, 

82: 313. 

Applications of Linear Predictive Coding in Speech Analysis. Proceedings 

of the Symposium on Speech Acoustics, 57. 

1981 

Wiskundige Modellen in het Spraakonderzoek [Mathematical Models in 

Speech Research]. Wiskundige Modellen: Cursusboek Stichting TELEAC, 

165. 

Syllabegrenzen en Fonetische Experimentatie [Syllable Boundaries and 

Phonetic Experiments]. GLOT, Tijdschrift voor Taalwetenschap, 4: 229. 

Book Review of: Metrical Myths – An Experimental-Phonetic 

Investigation into the Production and Perception of Metrical Speech. 

Spectator, 10: 385. 

1982 

Vowel Contrast Reduction in Japanese Compared to Dutch. (With F.J. 

Koopmans-van Beinum). Proceedings of the Institute of Phonetic Sciences.

Amsterdam, 7: 27. 

A Sociophonetic Study of Language Change. Proceedings of the 13th

International Conference of Linguistics. Tokyo, 602. 

1983 

Phonetic Sciences in the Netherlands, Past and Present. (With other 

authors). Publication of the Netherlands Association for Phonetic Sciences. 

Dordrecht, 32 pp. 

On the Reliability of the Intraoral Measuring of Subglottal Pressure. (With 

G.L.J. Nieboer and H.K. Schutte). Proceedings of the 10th International

Congress of Phonetic Sciences. Utrecht, 367. 

Phonetic Aspects of Vowels and Breaking of Diphthongs. Fifth 

International Phonology Meeting. Eisenstadt, 98. 

Vowel Contrast Reduction in Finnish, Hungarian and Other Languages. 

Dritte Tagung für Uralische Phonologie. Eisenstadt, 11.


1984 

Vowel Contrast Reduction in Terms of Acoustic System Contrast. (With 

F.J. Koopmans-van Beinum). Proceedings of the Institute of Phonetic 

Sciences. Amsterdam, 8: 41. 

Vokaalduur en Breking van Diftongen in het Fries [Vowel Duration and 

Breaking of Diphthongs in Frisian]. Verslagen van de Nederlandse 

Vereniging voor Fonetische Wetenschappen, 54. 

The Acoustic System Contrast and Vowel Contrast Reduction in Various 

Languages. Proceedings of the 23rd Acoustic Conference on Physiological

and Psychological Acoustics. Madrid, 76. 

Vowel Data Bases. (With A. Bladon and M. O'Kane). Speech

Communication, 3: 169. 

Nederlandse Leerboeken voor de Fonetiek van het Engels [Dutch Teaching 

Methods on the Phonetics of English]. (With A. van Essen and J.

Posthumus). Toegepaste Taalwetenschap in Artikelen, 20: 123-154. 

1985 

Phonetic Aspects of the Frisian Vowel System. NOWELE, 5: 23-42.

Review of: Spreken en Verstaan, een nieuwe Inleiding tot de Experimentele 

Fonetiek [Speaking and Understanding, A New Introduction to 

Experimental Phonetics]. (By S. Nooteboom and A. Cohen). Logopedie en

Foniatrie, 57: 106. 

De Groninger Button [The Groningen Button]. (With G.L.J. Nieboer and 

H.K. Schutte). Verslagen van de Nederlandse Vereniging voor Fonetische 

Wetenschappen, 57-62. 

1986 

Sandhi Phenomena in West Frisian. (With G. van der Meer). Sandhi 

Phenomena in the Languages of Europe. Berlin, 301-328. 

Review of: The Production of Speech. (By P.F. MacNeilage). Studies in 

Language, 10: 273-277. 

Production of Different Types of Esophageal Voice Related to the Quality 

and the Intensity of the Sound Produced. Folia Phoniatrica, 38: 292. 

De Uitspraak van het Nederlands door Buitenlanders [The Pronunciation of 

Dutch by Foreigners]. Logopedie en Foniatrie, 58: 343-349. 

Sociophonetic Aspects of Frisian. Friser Studier IV/V. Odense, 3-21. 

Een contrastief fonetisch onderzoek Japans-Nederlands [A Contrastive 

Phonetic Research Japanese-Dutch]. Verslagen van de Nederlandse 

Vereniging voor Fonetische Wetenschappen, 15-24.


1987 

The Retrieval of Dialect Material from Old Phonographic Wax Cylinders. 

Proceedings of the Workshop on “New Methods in Dialectology”. 

Amsterdam, 117-125. 

Acoustic and Physiological Properties of the Laryngeal and Alaryngeal 

(Esophageal) Voice. Proceedings of the XXXIVth Open Seminar on 

Acoustics. Wrocław, 10-16. 

A Contrastive Study of Japanese and Dutch. Proceedings of the XIth 

International Congress of Phonetic Sciences. Tallinn, 124-128. 

1988 

His Master's Voice: Herkenning van de Spraakmaker [His Master’s Voice: 

Recognition of the Speech Producer]. TER SPRAKE: SPRAAK als 

betekenisvol geluid in 36 thematische hoofdstukken. Dordrecht, 200-208. 

Book Review: Fonetiek en Fonologie [Phonetics and Phonology]. (By R. 

Collier and F.G. Droste). Logopedie en Foniatrie, 60: 195. 

The Frisian Language in America. (With T. Anema and H. Schatz). 

NOWELE, 6: 91-108. 

Esophageal Voice Quality Judgements by Means of the Semantic 

Differential. (With G.L.J. Nieboer and H.K. Schutte). Journal of Phonetics, 

16: 417-436. 

Book Review: Sprechererkennung [Speaker Recognition]. (By Hermann J. 

Künzel). Journal of Phonetics, 16: 459-463. 

1989 

Reconstruction, Signal Enhancement and Storage of Sound Material in 

Japan. Proceedings of the 2nd International Conference on Japanese 

Information in Science, Technology and Commerce. Berlin, 367-374. 

Aerodynamic and Psycho-acoustic Properties of Esophageal Voice 

Production. (With G.L.J. Nieboer and H.K. Schutte). Proceedings of the 

Conference on Speech Research '89. Budapest, 53-58. 

A Data Base of Old Sound Material. Proceedings of the ESCA Workshop 

on Speech Input/Output Assessment and Speech Data Bases. Noordwijk, 

2.14.1-5. 

1990 

Een contrastief fonetisch onderzoek, in het bijzonder Japans-Nederlands 

[Contrastive Phonetic Research, in Particular Japanese-Dutch]. Neerlandica 

Wratislaviensia IV. Wrocław, 140-148.


Book Review: To Siberia and Russian America, Three Centuries of Russian 

Eastward Expansion. Circumpolar Journal, 7: 41-46. 

New Technologies in Sound Reconstruction and their Applications to the 

Study of the Smaller Languages of Asia. Proceedings of the IVth 

International Symposium “Uralische Phonologie”. Hamburg, 15-19. 

GARASU-GLAS: Fonetische contrasten Japans-Nederlands [GARASU-GLAS: 

Phonetic Contrasts Japanese-Dutch]. TABU. Bulletin voor 

Taalwetenschap, 20: 49-57. 

1991 

Aerodynamic and Phonetic Properties of Voice Production with the 

Groningen Button. TENK jaarboek, 91-97. 

Laser-beam Technology in Diachronic Phonetic Research and 

Ethnolinguistic Field Work. Proceedings of the XIIth International 

Congress of Phonetic Sciences. Amsterdam, 114-118. 

Laut aus Wachs: Der Übergang von stoffgebundenen zum elektronischen 

und optischen Informationstransport [Sound from Wax: The Transition 

from Material-Bound to Electronic and Optic Information Transport]. TU 

International. Berlin, 14/15: 63-66. 

1992 

The Languages of Sakhalin. Small Languages and Small Language 

Communities: News, Notes, and Comments. International Journal of the 

Sociology of Languages, 94: 185-200. 

Dutch Encounters with Sakhalin and with the Ainu People. Proceedings of 

the International Conference 125th anniversary of the birth of Bronisław 

Piłsudski. Sapporo, 108-137. 

The Ethnolinguistic Situation on the Island of Sakhalin. Circumpolar 

Journal, 6: 32-58. 

Aerodynamische en fonetische eigenschappen van verschillende soorten 

slokdarmstem [Aerodynamic and Phonetic Features of Different Kinds of 

Esophageal Voice]. (With G.L.J. Nieboer and H.K. Schutte). Klinische 

Fysica, 8: 64-66. 

The Dutch Role in the Border Area between Japan and Russia. Round Table 

Conference “The Territorial Problem in Russo-Japanese Relations”. 

Moscow, 20-26. 

De Taal der Mennonieten [The Language of the Mennonites]. Syllabus 

NOMES Symposium Groningen, 42 pp.


1993 

Saharin ni okeru shoosuu minzoku no gengo jookyoo [The Status of 

Minority Languages on Sakhalin]. (With K. Murasaki). Japanese Scientific 

Monthly, 46: 18-24. 

The Ethnolinguistic Situation on the Island of Sakhalin. Ethnic minorities 

on Sakhalin. Yokohama, 13-32. 

Vstrechi gollandtsev c Sakhalinom i Ainami [Meetings of the Dutch with 

Sakhalin and the Ainu Population]. Proceedings of the International 

Conference “B.O. Pilsudski - issledovatel' narodov Sakhalina”. Yuzhno- 

Sakhalinsk, 92-99. 

De taal der Mennonieten in Siberië en hun relatie met Nederland [The 

Language of the Siberian Mennonites and their Relation with the 

Netherlands]. (With R. Nieuweboer). Doopsgezinde Bijdragen, 19: 175- 

189. 

Languages and Cultures of the Arctic Region in the Former Soviet Union. 

(With R. Nieuweboer). Circumpolar Journal, 1-2: 29-42. 

1994 

The Dutch Role in the Border Area between Japan and Russia. 

Circumpolar Journal, 3-4: 1-12. 

Nederlands in Siberië [Dutch in Siberia]. (With R. Nieuweboer). TABU 

Taalkundig Bulletin, 24: 65-75. 

The Language of the West Siberian Mennonites. (With R. Nieuweboer). 

RASK, Internationalt tidsskrift for sprog og kommunikation, 1: 47-63. 

1995 

Het territoriale geschil tussen Japan en Rusland over de Koerilen [The 

Territorial Dispute between Japan and Russia about the Kuril Islands]. 

(With I. van Oosteroom). Internationale Spectator, 49: 41-46. 

Dutch Encounters with Sakhalin and with the Ainu People. Linguistic and 

�� , 35-61. 

The Language of the West Siberian Mennonites. (With R. Nieuweboer). 

Proceedings of the XIIIth Congress of Phonetic Sciences. Stockholm, 4: 

180-184. 

Pitch Stereotypes in the Netherlands and Japan. (With R. van Bezooijen 

and T. Otake). Proceedings of the XIIIth Congress of Phonetic Sciences. 

Stockholm, 680-684. 

The Reconstruction of Acoustic Data on the Ethnic Minorities of Siberia. 

Proceedings of the International Conference on “The Indigenous Peoples


of Siberia: Studies of Endangered Languages and Cultures”. Novosibirsk, 

1: 381-383. 

1996 

Book Review of: Joshua A. Fishman, Yiddish: Turning to Life. Studies in 

Language, 20,1: 191-196. 

Language Minorities in the Sakha Republic (Yakutia). Report Nagoya City 

University. Nagoya, 165-179. 

Dutch Encounters with the Peoples of Eastern Asia. A Frisian and Germanic 

Miscellany, published in Honour of Nils Århammar on his Sixty-Fifth 

Birthday. Odense, 377-386. 

Dutch Immigrants in Siberia? The Language of the Mennonites. Charisteria 

viro doctissimo Přemysl Janota oblata, Acta Universitatis Carolinae 

Philologica. Prague, 75-86. 

Archives of the Languages of Russia. (With L.V. Bondarko). Reports on the 

INTAS Project No. 94-4758. St.-Petersburg, 120 pp. 

1997 

The Reconstruction of Acoustic Data and Minority Languages in Russia. 

Proceedings of the 2nd International Congress of Dialectologists and 

Geolinguists. Amsterdam, 44-54. 

Language and Culture of the Russian Mennonites. Around Peter the Great. 

Three Centuries of Russian-Dutch Relations. Groningen, 132-142. 

Resten van het Jiddisch in Groningen en Sint-Petersburg [Remnants of the 

Yiddish Language in Groningen and Saint-Petersburg]. VDW-berichten, 

Vereniging voor Dialectwetenschap, 1: 6-7. 

The Reconstruction of Acoustic Data and the Study of Language Minorities 

in Russia. Language Minorities and Minority Language. �� 

1998 

Linguistic Databases and Language Minorities around the North Pacific 

Rim. Lecture on the Occasion of the Doctorate Honoris Causa, St.- 

Petersburg, 14 pp. 

Linguistic Databases: A Link between Archives and Users. Journal of the 

International Association of Sound Archives, 27-34.


1999 

Russian-Yiddish: Phonetic Aspects of Language Interference. (With N. 

Svetozarova, Yu. Kleiner and R. Nieuweboer). Proceedings of the 14th 

International Congress of Phonetic Sciences. San Francisco, 1397-1401. 

Language Contact and Sound Archives in Russia. (With L. Bondarko). 

Proceedings of the 14th International Congress of Phonetic Sciences. San 

Francisco, 1401-1404. 

Lingvisticheskie bazy dannykh i yazykovye men’shinstva po obeim storonam 

severnogo tikho-okeanskogo poyasa [Linguistic databases and language 

minorities at both sides of the North-Pacific Rim]. Yazyk i rechevaya 

deyatel’nost’, 2: 8-18. 

2000 

Scientific Links between Russia and The Netherlands: A Study of the 

Languages and Cultures in the Russian Federation. Proceedings of the 

Conference on the Netherlands and the Russian North. Arkhangelsk. To be 

published. 

The Language of the Siberian Mennonites. (With R. Nieuweboer). New 

Insights in Germanic Linguistics II. Frankfurt am Main, 21-34. 

2001 

Nivkh and Kashaya: Two endangered Languages in Contact with Russian and 

English. Materialy mezhdunarodnoy konferentsii “100 let eksperimental’noy 

fonetike v Rossii”. St.-Petersburg, 78-83. 

Data on the Languages of Russia from Historical Documents, Sound Archives 

and Fieldwork Expeditions. Recording and Restoration of Minority 

Languages, Sakhalin Ainu and Nivkh, ELPR Report A2-009. Kyoto, 13-37. 

Kashaya Pomo and the Russian Influence around the North Pacific. Materials 

�� 

�� 

Scholarly Heritage. Kraków, 385-395. 

2002 

Yazyk i etnos [Language and Ethnos]. (With A.S. Gerd and M. Savijärvi). 

Texts and Comments on Balto-Finnic and Northwestern Archaic Russian 

Dialects. St.-Petersburg, 206 pp. 

Voices from Tundra and Taiga: Endangered Languages in Russia on the 

Internet. Conference Handbook on Endangered Languages. Kyoto, 57-79.


Phonetic Aspects of the Frisian Language and the Use of Sound Archives. 

Problemy i metody eksperimental’no-foneticheskikh issledovaniy. St.- 

Peterburg, 52-57. 

Voices from the Shtetl: The Past and Present of the Yiddish Language in 

Russia. Final Report NWO Russian-Dutch Research Cooperation. 

Groningen, 143 pp. 

The Use of Sound Archives in the Study of Endangered Languages. Music 

Archiving in the World, Papers Presented at the Conference on the 

Occasion of the 100th Anniversary of the Berlin Phonogramm-Archiv. 

Berlin, 101-107. 

The Use of Acoustic Databases and Fieldwork for the Study of the 

Endangered Languages of Russia. Proceedings of the International LREC 

Workshop on Resources and Tools in Field Linguistics. Las Palmas, 29.1-4 

(CD-ROM). 

Yiddish in St.-Petersburg: The Last Sounds of a Language. Proceedings of 

the Conference “Klezmer, Klassik, jiddisches Lied. Jüdische Musik-Kultur 

in Osteuropa.”. Potsdam. To be published. 

2003 

Yazyki severnoy i vostochnoy Tartarii – o yazykovykh svedeniyakh v 

knige N. Vitsena [The Languages of North and East Tartary – About the 

Linguistic Data in the Book of N. Witsen]. (With M. Bergmann). 

Proceedings of the Conference on General Linguistics. St.-Petersburg. To 

be published. 

Description of Minority Languages in Russia on the Basis of Historical 

Data and Fieldwork. Proceedings of the XVIth International Congress of 

Linguists. Prague. To be published. 

Voices of Tundra and Taiga: Data on Minority Languages in Russia from 

Historical Data and Fieldwork. Proceedings of the Conference “Formation 

of Educational Programs Aimed at a New Type of Humanitarian Education 

in Siberian Polyethnic Society”. Novosibirsk. To be published. 

Endangered Languages in Europe and Siberia: State of the Art, Needs and 

Solutions. International Expert Meeting on UNESCO Programme 

“Safeguarding of Endangered Languages”. Paris. To be published. 

Presentation of the UNESCO Document “Language Vitality and 

Endangerment”. Focus on Linguistic Diversity in the New Europe. 

European Bureau for Lesser Used Languages, Brussels. To be published.

Tseard de Graaf 

Frisian translation by Jurjen van der Kooi 

Tseard de Graaf was born on 27 January 1938 in Leeuwarden, the capital of
the Dutch province of Friesland, the largest region along the North Sea
coast where Frisian is spoken, a West Germanic language whose closest
relative is English.

Tseard's parents were both Frisian, and only Frisian was spoken at home.
His first language was therefore Frisian; at school he learned Dutch, the
official language of the Netherlands.

Frisian was not yet taught at school in those days, and the children had to
learn Dutch there. Through the coexistence of Frisian at home and Dutch at
school, Tseard first became acquainted with the fascinating world of
different languages. For Tseard the distinction between the two languages
also had a distinctly spatial aspect. When he and the other children from
his neighbourhood went to school in the morning, they had to cross the
railway line. Once across, they switched from Frisian to Dutch, the
language of the school.

He was 18 years old when he took his final secondary-school examination in
1956, and he developed an interest in languages. His other great loves were
physics and astronomy. The oldest planetarium in the world is in Franeker,
the old university town of Friesland. Captivated by the laws that govern
time and space, Tseard studied physics at the University of Groningen from
1956 to 1963. In that final year he took his doctoraal examination in
theoretical physics, a combination of physics, mathematics and astronomy,
and after that he was a research associate at the Institute for Theoretical
Physics of the University of Groningen until 1969.

Tseard was already a ‘polyglot’ by then. He spoke not only Frisian and
Dutch, but also German, English and French. Other languages would follow.
In the Soviet Union the study of astronomy was at a superior level at the
time. Tseard saw this and learned Russian and other Eastern European
languages because they opened the way to new insights in those fields of
science. Alongside his studies in theoretical physics he attended courses
in Slavic languages. He was fascinated by the new technologies and their
possible applications in future research, and in 1967 he took his
kandidaats examination in Slavic languages and computational linguistics.
Meanwhile, after his doctoraal in physics, he had continued his research in
theoretical physics, which he combined with half a year of study in Poland,
where he also learned Polish.

In 1969 he completed his dissertation, entitled “Aspects of Neutrino
Astrophysics”.

The title page of Tseard's dissertation in theoretical physics, 1969 

His unquenchable thirst for knowledge took Tseard to England, where he
spent a year with his family from 1970 to 1971 and worked as a researcher
at the Institute of Theoretical Physics of the University of Cambridge.

Back in Groningen, Tseard became a university staff member in physics at
the Institute of Astronomy, a post he held until 1975. That year marked a
turning point in his academic career, because he decided to follow the path
of his second great love, the study of languages. One of the propositions
accompanying his dissertation concerned the question of how precisely a
person's identity can be defined by his or her language. That proposition
can be seen as a symbolic signpost to his later path in linguistics, which
would lead him to the study of aspects of spoken language, to phonetics.

In 1975 Tseard joined the staff of the Institute of Phonetics of the
Faculty of Arts of the University of Groningen.

Having been bilingual (Frisian-Dutch) from an early age, Tseard had a keen
eye for the countless phonetic differences between the two languages. And
because he had studied other languages, he knew how important phonetic
descriptions are, not only for theoretical linguistics but equally for the
learning and teaching of foreign languages.


The coexistence of languages and language change would become further
focal points of his research. All over the world people are bilingual or
even multilingual. There is language variation in space and in time. If one
listens to radio programmes or television broadcasts from ten or twenty
years ago, one hears a spoken language different from the one current
today. It is still the same language and the same place, and yet the
language is no longer the same. Not only the lexicon of a language changes,
but also people's way of speaking, the pronunciation and the intonation: an
exceptionally interesting subject for anyone with an interest in languages
and their variability.

Tseard began with a search for the oldest recordings of spoken language on
sound carriers. He analysed material not only from West Frisian
(Westerlauwersk) but also from North and East Frisia. Such recordings of
earlier spoken language are not only important historical heritage; they
also provide valuable information about processes of language change. A
practical problem with these oldest sound recordings is that they were made
on wax cylinders and that their quality deteriorates rapidly each time they
are played. Tseard realized that it is of the utmost importance to transfer
these recordings to modern sound carriers so that they are preserved. In
the early 1990s Tseard began, together with colleagues from Japan, to
investigate the possibilities for doing so. During that period he mastered
yet another language, Japanese, which he learned to speak fluently.

Tseard at work on wax cylinders containing older Dutch 

Tseard sought contact with the most important sound archives in the world,
those in Vienna, Berlin and St. Petersburg. Through his collaboration in
the 1990s with the sound archive of the Academy of Sciences in the last of
these cities, he renewed the ties with Russia that had begun in the 1960s
with his study of the Slavic languages.

Since 1990 the world has changed dramatically. The Iron Curtain is gone and
Russia has once again opened its ‘Window on the West’. When Tseard returned
to St. Petersburg after 1990, he was immediately fascinated by this city,
which he had first seen some 20 years earlier, when it was still called
Leningrad. As a Frisian and a Dutchman he felt at home there. The
picturesque canals and the paths along the wide boulevards reminded him of
home. That was no coincidence: some 300 years earlier Tsar Peter the Great
had chosen Holland as the model for his new capital.

Tseard now organized joint projects with the Russian Academy of Sciences
and St. Petersburg State University. The aim was to preserve old Russian
sound recordings and to transfer them to modern digital audio media.
Research on a large collection of all kinds of sound recordings, the result
of numerous linguistic expeditions from the end of the 19th century and
from the 20th century, set in motion new projects on various languages
spoken in Russia.

He himself set out to investigate the language of the Siberian Mennonites,
whose origins are to be sought in the northern Netherlands and Germany and
who still speak the language of their ancestors: in fact a language that
closely resembles the dialects of present-day northern Germany and the
northern parts of the Netherlands. The Dutch press even ran the headline
“Siberians speak Gronings”.

Languages not only separate peoples and nations; they also build bridges
between them. Tseard showed this, too, in his research. Even in far-off
Siberia there are people who have roughly the same language as the people
in the province of Groningen. In planning his expeditions Tseard thought
not only of science but also of the people: he also organized humanitarian
aid from Groningen for the villages he visited in Siberia.

Language as cultural heritage became the core of his linguistic activities.
Thanks to his bilingual background he was able to set an excellent example.
All his life he has shown that every individual can contribute to the
survival of a language. With his wife Nynke, whom he had met in his student
days and with whom he had long spoken only Dutch, he now agreed to switch
to Frisian. Nynke is herself Frisian, and after the death of her parents
they saw for themselves how a language gradually begins to die out when the
children do not pass it on.

This attitude to language became the guideline for Tseard's subsequent
research activities in Russia. The projects he coordinated from that moment
on were given two objectives: not only to document endangered languages,
but also to revitalize them and maintain them for future generations.
Tseard took part in several expeditions, among others to Yakutia and the
island of Sakhalin, where he and other linguists recorded the languages of
the local peoples.

Tseard de Graaf with speakers of languages of the island of Sakhalin in the
Russian Far East: Uiltas and Nivkhs (1990s) 

In the second half of the 1990s Tseard coordinated various projects with
institutes throughout the Russian Federation; funding was provided by Dutch
scientific organizations and by INTAS of the EU.

Above all he wanted to impress upon young people that their language is a
unique heritage and that they should support minority and regional
languages. In 1998 Tseard was appointed Knight in the Order of the
Netherlands Lion for his research into the minority languages of Russia,
his efforts for their preservation, and the setting up of databases for
them. Later that same year he received an honorary doctorate from the
University of St. Petersburg for his contributions to the joint language
preservation projects.


Tseard de Graaf receiving an honorary doctorate at the University of
St. Petersburg, November 1998. 

In 2003 Tseard had to retire and give up his position as coordinator of the
department 'Klanklear' (Phonology and Phonetics) of the CLCG (Center for
Language and Cognition Groningen) of the University of Groningen. On that
occasion his colleagues compiled this festschrift, with a variety of
contributions on research in the border areas of phonology and phonetics.

Nevertheless, this is no farewell to our former coordinator. Tseard's
passionate involvement in languages and linguistic projects has remained.
Since his retirement he has been active as an honorary staff member of the
Fryske Akademy in Leeuwarden, and his contact with the University of
St. Petersburg about forthcoming research projects has also continued. This
means that Tseard, besides his role as grandfather to his five
grandchildren, still has plenty to do. His enthusiasm is an incentive for
(future) researchers to continue the research he has set in motion.

Boundary Tones in Dutch: Phonetic or Phonological 

Contrasts? 

Vincent J. van Heuven 

1. Introduction 1 

1.1. Linguistic categorization of sound 

A basic problem of linguistic phonetics is to explain how the infinite 

variety of speech sounds in actual utterances can be described with finite 

means, such that they can be dealt with in the grammar, i.e. phonology, of a 

language. The crucial concept that was developed to cope with this 

reduction problem is the sound category, or – when applied to the 

description of segmental phenomena – the phoneme. This is best conceived 

of as an abstract category that contains all possible sounds that are mutually 

interchangeable in the context of a minimal word pair. That is, substitution 

of one token (allophone) of a phoneme for another does not yield a 

different word (i.e., a string of sounds with a different lexical meaning). 2 

The phonemes in a language differ from one another along a finite 

number of phonetic dimensions, such as degree of voicing, degree of 

noisiness, degree of nasality, degree of openness, degree of backness, 

degree of rounding, etc. Each phonetic dimension, in turn, is subdivided 

into a small number (two to four) of phonologically functional categories, 

such as voiced/voiceless, (half)closed/(half)open, front/central/back, etc. 

Phonetic dimensions generally have multiple acoustical correlates. For 

instance, degree of voicing correlates with a multitude of acoustic cues 

such as voice onset time, duration of preceding vowel, steepness of 

intensity decay and of formant bends in preceding vowel, duration of 

intervocalic (near) silence, duration and intensity of noise burst, steepness 

of intensity attack and formant bends of following vowel. These acoustic 

properties typically co-vary in preferred patterns, but may be manipulated


independently through speech synthesis. When non-typical (‘conflicting’) 

combinations of parameter values are generated in the laboratory, some 

cues prove to be more influential than others; so-called ‘cue trading 

relationships’ have been established for many phonemic contrasts. In 

Dutch, for instance, vowel quality (acoustically defined by F1 and F2, i.e., 

the centre frequencies of the lowest two resonances in the vocal tract) and 

vowel duration were found to be equally influential in cuing the tense/lax contrast 

between /aː/ and /ɑ/: a duller vowel quality (lower F1 and F2 values), 

normally cuing /ɑ/, could be compensated for by increasing the 

duration of the vowel so that native listeners still perceive /aː/ (and vice 

versa; van Heuven, 1986). 

Categorization of sounds may proceed along several possible lines. 

First, many differences between sounds are simply too small to be heard at 

all: these are subliminal. The scientific discipline of psycho-acoustics 

provides a huge literature on precisely what differences between sounds 

can and cannot be heard with the naked ear. Moreover, research has shown 

that the human hearing mechanism (and that of mammals in general) has 

developed specific sensitivities to certain differences between sounds and is 

relatively deaf to others. These predilections have been shown to be present 

at birth (probably even in utero), and need not be acquired through 

learning. However, human categorization of sound is further shaped by 

exposure to language. As age progresses from infancy to adulthood, sound 

differences that were still above threshold shortly after birth quickly lose 

their distinctivity. An important concept in this context is the notion of 

categorical perception. This notion is best explained procedurally in terms 

of a laboratory experiment. 

Imagine a minimal word pair such as English back ~ pack. One 

important difference between these two tokens is that the onset of voicing 

in back is more or less coincident with the plosive release, whilst the voice 

onset in pack does not start until some 50 ms after the release. It is not too 

difficult in the laboratory to create a series of exemplars by interpolating 

the voice onset time of a prototypical back (0-ms delay) and that of a 

prototypical pack (70-ms delay) in steps of, say, 10 ms, so that we now 

have an 8-step continuum ranging over 0, 10, 20, 30, 40, 50, 60, and 70 ms. 

These eight exemplars are shuffled in random order and played to an 

audience of native English listeners for identification as either back or pack 

(forced choice). The 0-ms voice delay token will naturally come out with 

exclusively back-responses (0% pack); the 70-ms token will have 100% 

pack-responses. But what results will be obtained for the intermediate


exemplars? If the 10-ms changes in voice delay are perceived continuously, 

one would predict a constant, gradual increase in %-pack responses for 

each 10-ms increment in the delay. I.e., when the stimulus increment (from 

left to right) is plotted against the response increment (from bottom to top), 

the psychometric function (the line that captures the stimulus-response 

relationship) is essentially a straight line (open symbols in Figure 1B). The 

typical outcome of experiments with voiced/voiceless continua, however, is 

non-continuous. For the first part of the continuum all exemplars are 

perceived as back-tokens, whereas the rightmost two or three exemplars are 

near-unanimously perceived as pack. Only for one or two exemplars in the 

middle of the continuum do we observe uncertainty on the part of the 

listener: here the distribution of responses is more or less ambiguous 

between back and pack. The psychometric function for this so-called 

categorical perception is sigmoid, i.e., has the shape of an S (big solid 

symbols in Figure 1B). In the idealized case of perfect categorical 

perception we would, in fact, expect to see a step-function jumping 

abruptly from (almost) 0 to (almost) 100% pack-responses somewhere 

along the continuum (thin black line with small solid symbols in Figure 

1B). 
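The three response patterns just described can be made concrete with a small numerical sketch. The Python fragment below is illustrative only: the logistic form chosen for the sigmoid, its slope value and all function names are assumptions introduced here, not part of the experiments discussed. Only the 0-70 ms continuum comes from the text, and the 35-ms boundary is the value shown in Figure 1B.

```python
import math

# VOT continuum from the text: 0 to 70 ms in 10-ms steps (8 exemplars).
vot_steps = list(range(0, 80, 10))

def continuous(vot, lo=0.0, hi=70.0):
    """Straight-line psychometric function: %pack rises evenly with VOT."""
    return 100.0 * (vot - lo) / (hi - lo)

def categorical(vot, boundary=35.0, slope=0.25):
    """S-shaped function, modelled here as a logistic curve.
    The slope value is an arbitrary assumption chosen for illustration."""
    return 100.0 / (1.0 + math.exp(-slope * (vot - boundary)))

def ideal_step(vot, boundary=35.0):
    """Idealized step function: 0% pack below the boundary, 100% above it."""
    return 100.0 if vot > boundary else 0.0

for vot in vot_steps:
    print(f"{vot:2d} ms  continuous={continuous(vot):5.1f}  "
          f"categorical={categorical(vot):5.1f}  step={ideal_step(vot):5.1f}")
```

A steeper assumed slope moves the logistic curve towards the step function; a shallower one moves it towards the straight line.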

The category boundary (at 35-ms VOT in Figure 1B) is defined as the 

(interpolated) point along the stimulus axis where the distribution of 

responses is completely ambiguous, i.e., 50-50%. For a well-defined crossover 

from one category to the other there should be a point along the 

stimulus axis where 75% of the responses agree on one category, and a 

second point where there is 75%-agreement on the other category. The 

uncertainty margin is defined in absolute terms as the distance along the 

stimulus axis between the two 75%-points; equivalent relative measures 

can be derived from the steepness of the psychometric function (e.g. the 

slope coefficient or the standard deviation of the cumulative normal 

distribution fitted to the data points).
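As a hedged illustration of these definitions, the sketch below estimates the category boundary and the uncertainty margin by linear interpolation between neighbouring data points. The response percentages are invented for the example, and the helper name `crossing` is hypothetical. A 75% agreement on back is treated as 25% pack-responses, so the margin is the distance between the 25% and 75% pack points.

```python
def crossing(xs, ys, level):
    """Linearly interpolate the stimulus value at which the response
    percentage first reaches `level`. Assumes ys does not decrease."""
    points = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= level <= y1 and y1 > y0:
            return x0 + (level - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError(f"responses never reach {level}%")

# Hypothetical identification data: % pack-responses per VOT step.
vot_ms = [0, 10, 20, 30, 40, 50, 60, 70]
pct_pack = [0, 2, 5, 30, 70, 95, 98, 100]

# Category boundary: the 50-50% point on the stimulus axis.
boundary = crossing(vot_ms, pct_pack, 50)

# Uncertainty margin: distance between the two 75%-agreement points,
# i.e. between 25% and 75% pack-responses.
margin = crossing(vot_ms, pct_pack, 75) - crossing(vot_ms, pct_pack, 25)

print(f"boundary = {boundary:.1f} ms, uncertainty margin = {margin:.1f} ms")
```

Linear interpolation is the simplest choice; fitting a cumulative normal distribution, as mentioned in the text, would yield the relative measures instead.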


Figure 1. Panel A. Hypothetical discrimination function for physically same and 

different pairs of stimuli (one-step difference) reflecting categorical 

perception. Panel B. Illustration of continuous (open squares) versus 

categorical (big solid squares) perception in the identification and 

discrimination paradigm. The thin line with small squares represents the 

ideal step function that should be obtained when categorical perception is 

absolute. Category boundary and uncertainty margin are indicated 

(further, see text). 

Although a pronounced sigmoid function (such as the one drawn in Figure 

1B) is a clear sign of categorical perception, researchers have always been 

reluctant to consider it definitive proof. Listeners, when forced to, tend to 

split any continuum down the middle. For a continuum to be perceived 

categorically, therefore, two conditions should be met:


- results of an identification experiment should show a clear sigmoid 

function, and 

- the discrimination function should show a local peak for stimuli 

straddling the category boundary. 

The discrimination function is determined in a separate experiment in 

which either (i) identical or (ii) adjacent tokens along the stimulus 

continuum are presented pair-wise. Listeners then decide for each pair 

whether the two tokens are ‘same’ or ‘different’. Two kinds of error may 

occur in a discrimination task: 

- a physically different pair may be heard as ‘same’, and 

- a pair of identical tokens may be called ‘different’. 

The results of a discrimination task are best expressed as the percentage of 

correct decisions obtained for a ‘different’ stimulus pair minus the 

percentage of errors for ‘same’ pairs constructed from these stimuli (the 

latter percentage is often called the response bias). In the case of true 

categorical perception the discrimination scores show a pronounced peak 

for the stimulus pair straddling the category boundary, whilst all other pairs 

are discriminated at or only little above chance level (see panel A in Figure 

1). Physically different sounds that fall in the same perceptual category are 

hard to discriminate. In the case of continuous perception, there is no local 

peak in the discrimination function. 
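The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the analysis script used for the experiments: the function names and the response percentages are invented, and averaging the false-alarm rates of the two 'same' pairs built from the members of a 'different' pair is an assumption about how the bias correction is applied.

```python
def discrimination_scores(hits, false_alarms):
    """Bias-corrected discrimination score per one-step pair.

    hits[k]         -- % 'different' responses to the pair (k+1, k+2)
    false_alarms[k] -- % 'different' responses to the identical pair (k+1, k+1)
    """
    scores = []
    for k, hit in enumerate(hits):
        # Assumption: the bias for a pair is the mean false-alarm rate
        # of the two 'same' pairs constructed from its members.
        bias = (false_alarms[k] + false_alarms[k + 1]) / 2.0
        scores.append(hit - bias)
    return scores


def local_peaks(scores):
    """Indices of scores that exceed both neighbours (edges are ignored)."""
    return [i for i in range(1, len(scores) - 1)
            if scores[i] > scores[i - 1] and scores[i] > scores[i + 1]]


# Invented percentages for a nine-step continuum (eight one-step pairs).
hits = [30, 80, 30, 25, 20, 40, 30, 25]
false_alarms = [20] * 9

scores = discrimination_scores(hits, false_alarms)
peaks = local_peaks(scores)  # index i refers to the pair (i+1, i+2)
```

A local peak in the corrected scores at a pair that straddles the identification boundary is what the second condition for categorical perception requires; a flat function would point to continuous perception.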

1.2. Categorical nature of intonational contrasts 

By intonation or speech melody we mean the pattern of rises and falls in 

the time-course of the pitch of spoken sentences. Melodic patterns in 

speech vary systematically across languages, and even within languages 

across dialects. The cross-linguistic differences can be parameterized and 

described in much the same way as has been done for the segmentals in 

language: a set of distinctive features defines an inventory of abstract units, 

which can be organized in higher-order units subject to wellformedness 

constraints. Moreover, intonational contrasts are used to perform 

grammatical functions that can also be expressed by lexico-syntactic 

means, such as turning statements into questions, and putting constituents 

in focus. For these reasons it has become widely accepted that intonation is


part of the linguistic system (Ladd, 1996: 8). Yet, there have always been 

adherents of the view that speech melody should be considered as 

something outside the realm of linguistics proper, i.e., that intonation is a 

paralinguistic phenomenon at best, to be treated on a par with the 

expression of attitudes or emotions. Typically, the communication of 

emotions (such as anger, fear, joy, surprise) or of attitudes (such as 

sarcasm) is non-categorical: the speaker shows himself more or less angry, 

fearful, or sarcastic in a continuous, gradient fashion. 

A relatively recent insight, therefore, is that a division should be made 

in melodic phenomena occurring in speech between linguistic versus 

paralinguistic contrasts. Obviously, only the former but not the latter type 

of phenomena should be described by the grammar and explained by 

linguistic theory. This, however, raises the question of how linguistic and 

paralinguistic phenomena can be told apart within the realm 

of speech melody. 3 Ladd & Morton (1997) were the first to suggest that the 

traditional diagnostic for categorical perception should be applicable to 

intonational categories in much the same ways as it works for segmental 

contrasts. Only if a peak in the discrimination function is found for adjacent 

members on a tone continuum straddling a boundary between tonal 

categories, are the categories part of the linguistic system, i.e., phonological 

categories. If no categorical perception of the tone categories can be 

established, the categories are ‘just’ the extremes of a paralinguistic or 

phonetic tonal continuum. Ladd & Morton tested the traditional diagnostic 

on a tone continuum between normal and emphatic accent in English and 

noted that it failed. This – to me – indicates that the contrast is not part of 

the phonology of English. 

Remijsen & van Heuven (1999, 2003) tested the traditional diagnostic 

on a tone continuum between ‘L%’ and ‘H%’ in Dutch, and showed that 

indeed there was a discrimination peak for adjacent members along the 

continuum straddling the boundary – indicating that the ‘L%’ and ‘H%’ 

categories are part of the phonology of Dutch. At the same time, however, 

we had to take recourse to listener-individual normalization of the category 

boundary, a complication that is not generally needed when dealing with 

contrasts in the segmental phonology. 4 

Van Heuven & Kirsner (2002) suggested that the relatively weak 

categorical effects in Remijsen & van Heuven could have been the result of 

an incorrect subdivision of the ‘L%’ to ‘H%’ tone range. Van Heuven & 

Kirsner (2002) showed that Dutch listeners were perfectly able to 

categorize a range of final pitches between low and high in terms of three


categories, functionally denoted as command intonation, continuation, and 

question. However, we did not run the full diagnostic involving both 

identification and discrimination procedures. Moreover, Van Heuven & 

Kirsner forced their listeners to choose between three response alternatives, 

viz. command, conditional and question. Although the extremes of the 

range, i.e. command versus question are unchallenged categories, it may 

well be the case that the conditional is not necessarily distinct from the 

question type. After all, in the grammar developed by ‘t Hart, Collier & 

Cohen (1990) any type of non-low terminal pitch falls into the same 

category, indicating non-finality. It occurred to us that we should take the 

precaution to run the experiment several times, using different response 

alternatives, such that two separate binary (‘command’ ~ ‘no command’ 

and ‘question’ ~ ‘no question’) response sets as well as the ternary response 

set (‘command’ ~ ‘conditional’ ~ ‘question’) were used by the same set of 

listeners. If the intermediate ‘conditional’ response category does constitute 

a clearly defined notion in the listeners’ minds, the binary and ternary 

divisions of the stimulus range should converge on the category boundaries. 

The present paper seeks to remedy the infelicities of Van Heuven & 

Kirsner (2002). However, before I deal with the experiments, it is necessary 

to introduce the inventory of the domain-final boundary configurations that 

can be found in Dutch. 

1.3. Dutch domain-final boundary tones 

Over the past decades a major research effort has been spent on the formal 

description of the sentence melody of Dutch. In the present paper we 

concentrate on one small part of the intonation system of Dutch: the options 

that are available to the speaker to terminate an intonation phrase. It has 

become customary to model the intonation system of a language as a 

hierarchically organized structure in which the tonal primitives (or ‘atoms’) 

are combined into tonal configurations, which in turn combine into 

intonation phrases. One or more of such intonation phrases are combined 

into an utterance, which may combine with other utterances to form a 

prosodic paragraph. The intonation phrase (henceforth IP), then, is situated 

roughly in the middle of the prosodic hierarchy. Note that a short utterance 

may consist of just one IP. An IP is characterized as a stretch of speech 

between two IP boundaries, i.e., a break in the segment string that is 

signaled by a pause (physical interruption of the sound stream), by pre-boundary lengthening 

and/or by a boundary-marking tone. If the boundary 

is sentence medial, then yet another IP must follow in order to finish the 

utterance. 

The first explicit and experimentally verified grammar of Dutch 

intonation was developed at the Institute for Perception Research at 

Eindhoven (‘t Hart et al., 1990; Rietveld & van Heuven, 2001: 263-270). 

This grammar models the sentence melody of Dutch as a system of two 

gently declining reference lines, nominally 6 semitones (half an octave) 

apart, between which the pitch rises and falls in a limited number of 

patterns. The grammar provides for three different ways in which an IP 

may be terminated: (i) on the low reference line (‘0’), (ii) on the high 

reference line (‘∅’), or (iii) by executing a steep pitch rise (‘2’). Although 

the grammar is not completely explicit on this point, it appears that the 

offset of rise ‘2’ may exceed the level of the high reference line, 

specifically when the rise starts at the high reference line. The grammar 

then allows IPs to end at three different pitches: low, high, and extra high. 
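As a worked illustration of the 6-semitone spacing: the interval between two frequencies in semitones is 12 times the base-2 logarithm of their ratio, so half an octave corresponds to a frequency ratio of about 1.41. The sketch below uses hypothetical function names and an arbitrary 100-Hz value for the low reference line; it is not part of the grammar itself.

```python
import math


def semitones(f_low_hz, f_high_hz):
    """Interval between two frequencies in semitones (12 per octave)."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


def shift_by_semitones(f_hz, st):
    """Frequency reached by moving st semitones away from f_hz."""
    return f_hz * 2.0 ** (st / 12.0)


low_line = 100.0                               # Hz, arbitrary illustration
high_line = shift_by_semitones(low_line, 6.0)  # half an octave higher
```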

A more recent account of Dutch intonation is given by Gussenhoven and 

co-workers (Gussenhoven, Rietveld & Terken, 1999; Rietveld & van 

Heuven, 2001: 270-277). This model is constructed along the principles 

adopted by autosegmental intonologists, in which a sentence melody is 

basically a sequence of tonal targets of two types: ‘H’ (high) and ‘L’ (low). 

The ToDI system (Transcription of Dutch Intonation), which is an 

inventory of tonal configurations for surface-level transcriptions of Dutch 

sentence melodies using the autosegmental H/L notation format, provides 

three symbols for marking IP boundaries: (i) ‘L%’, i.e., the final pitch 

target extends below the baseline, (ii) ‘%’, i.e., the absence of a tonal IP 

boundary marker, and (iii) ‘H%’, i.e., the final pitch is higher than the 

preceding pitch. 5 For details of the ToDI transcription system I refer to the 

ToDI website (www.lands.kun.nl/todi) or to Rietveld & van Heuven (2001: 

399-401). 

Remijsen & van Heuven (1999, 2003) report an experiment which 

sought to establish the perceptual boundary between sentence-final 

statement and question intonation. They did this by varying the pitch 

configuration on the utterance-final syllable of the verb-less phrase De 

Dennenlaan(?) ‘Pine Lane(?)’ between a fall and a steep rise in eleven 

perceptually equal steps. Listeners were then asked to decide for each of the 

eleven pitch patterns whether they perceived it as a statement or a question. 

At the time we tacitly assumed that the continuum spanned just two 

pragmatic categories, i.e. statement versus question, and that there was no


relevant intermediate category that could be interpreted as ‘non-finality’. In 

fact, Kirsner & van Heuven (1996) suggested a single abstract meaning for 

the non-low tonal category: ‘appeal (by the speaker to the hearer)’, asking 

for the hearer’s continued attention or for a verbal response to a question or 

a non-verbal compliance with a request. However, Caspers (1998) 

suggested that there is a functional difference between the non-tonal 

boundary (‘%’) following an earlier ‘H*’ target and the high boundary 

(‘H%’) following an earlier ‘H*’. She synthesized stimuli in which the 

terminal pitch after the accent-marking ‘H*’ was followed by either ‘H%’ 

(where the final pitch was raised further) or just % (where the pitch 

remained high but level after the accent). Her results indicate that listeners 

unequivocally expect the speaker to continue after the ‘H* ... %’ 

configuration, in contradistinction to the ‘H* ... H%’ pattern, for which the 

responses were equally divided between ‘same speaker will continue’ and 

‘interlocutor will take over (with a response)’. 

Note that the ‘%’ tone-less boundary as studied by Caspers is found 

only after a preceding H* accent. Strictly speaking, then, the ‘%’ boundary 

cannot be used as an intermediate category in between ‘L%’ and ‘H%’ 

when the preceding pitch is low. After ‘L’, any rise in pitch, whether strong 

or intermediate, is a perceptually relevant change in pitch, which must be 

coded by an ‘H%’ target. On the other hand, this formal constraint is in the 

way of an attractive generalization which would allow us to view the high 

level pitch (‘H* ... %’) pattern as a surface realization of the ‘H*L...%’ 

pattern from which the L target has been deleted – in much the same way as 

was suggested by Haan (2002) in order to account for the functional 

similarity between the ‘H*...H%’ and the ‘H*L…H%’ interrogative 

patterns, as exemplified in Figure 2. 

Figure 2. Underlying tonal shape (dotted) and surface realization after ‘L’-deletion 

(solid) of an ‘H*L … H%’ sequence. 

There seems to be a mismatch between the functions expressed by Caspers’ 

‘%’ and ‘H%’ after ‘H*’. If we assume an iconic relationship between the


terminal pitch of the utterance and the degree of submissiveness of the 

speaker towards the hearer, then we would reason that ‘H%’ should make 

more of an appeal to the hearer (expressing greater submissiveness) than 

just ‘%’. On the other hand, answering a question seems a bigger favor on 

the part of the hearer than merely waiting for the speaker to continue the 

utterance. It could be the case, of course, that even the highest terminal 

pitches used by Caspers were not high enough to elicit unambiguous ‘other 

speaker will take over’ (i.e. ‘question’) responses. Also, it is unclear if the 

unambiguous ‘same speaker will continue’ response crucially depends on a 

flat stretch of high declination (as is the case after an ‘H*’ accent) or if any 

terminal pitch of intermediate height would yield the same response. 

In Caspers’ analysis the ‘%’ boundary – and arguably an ‘L … H%’ 

sequence with a moderately high terminal pitch – unambiguously signals 

continuation. This category would then be expected to be firmly 

represented in the listener’s cognitive system. Varying the terminal pitch 

from low to extremely high should then elicit two well-defined categories and one ill-defined category: 

(i) unambiguous statement for low pitches, (ii) unambiguous continuation 

for intermediate terminal pitches, and (iii) a poorly defined or non-unique 

interrogative category, which is also compatible with a continuation 

reading. 

At this time, then, we do not know whether two or three formal tone 

categories should be postulated in IP-final position. It seems that the status 

of ‘L%’ as a linguistic category is unchallenged but the non-low part of the 

IP-final tone range is very much a matter of debate. Does the non-low part 

of the range form a continuum expressing lesser or greater appeal by the 

speaker in a paralinguistic manner, or should this part of the range be split 

into two discrete phonological categories, each expressing a distinct 

meaning of its own (i.e. ‘continuation’ ~ ‘question’), or, even worse, into 

two categories of which one is specific for ‘continuation’ and the other 

underspecified and compatible with both ‘question’ and ‘continuation’? 

These meanings, and a possible way of testing the categorical nature of 

tonal contrasts expressing them, are the topic of the next section. 

1.4. Clause typing 

Dutch, like any other language, has lexico-syntactic means to express a 

range of clause types, such as statement, command, exclamation and 

question. Although the lexico-syntactic means are generally adequate and


sufficient to express the speaker’s pragmatic intention to the hearer, several 

– if not all – clause types are supported by prosodic means, specifically by 

appropriate intonation patterns. In fact, exceptional situations may arise 

where there is no lexico-syntactic differentiation between the clause types, 

and where the speaker’s intention can only be recovered from melodic 

cues. For the purposes of the present experiment we have looked for a 

situation in which the three prosodic categories may serve as the only cue to 

a ternary choice among clause types, so that prosody will be exploited to 

the utmost, and the listener’s choice will not be co-determined by lexical 

and/or syntactic cues. Such a situation may be obtained in a V1 sentence, 

where the finite verb has been moved into the sentence-initial position. 6 In 

the sentence Neemt u de trein naar Wageningen ‘Take you the train to 

Wageningen’ the lexico-syntactic information is compatible with at least 

three interpretations: 7 

- A polite imperative (Kirsner, van Heuven & Caspers, 1998) 

- A conditional clause similar in meaning to ‘If you take the train to 

Wageningen ...’ 

- A yes/no question ‘Do you take the train to Wageningen?’ 

Which of the three readings the speaker intends is expressed 

through prosody only. In setting up the experiment we assumed that there is 

no fundamental difference in the speech melody between a statement and a 

command in Dutch. 8 Using a range of terminal pitch patterns on the single 

phrase Neemt u de trein naar Wageningen, we can determine the category 

boundaries between command (for statement), conditional (for 

continuation), and question without any interfering differences in lexico-syntactic 

structure. 

We may conclude this introduction by summarizing the research 

questions that we will address: 

1. Are the domain-final boundaries ‘L%’ ~ ‘%’ ~ ‘H%’ contiguous 

categories along a single tonal dimension? 

2. Is there a one-to-one correspondence between ‘L%’ and ‘command’, 

‘%’ and ‘conditional’, and ‘H%’ and ‘question’? 

3. Where are the category boundaries – if any – along the continuum 

between (i) ‘L%’ and ‘%’ and (ii) between ‘%’ and ‘H%’? 

4. Are the category boundaries at the same positions along the stimulus 

range irrespective of the binary versus ternary response mode?


5. Are both boundaries truly categorical in the sense that there are 

discrimination peaks for adjacent stimulus pairs straddling the category 

boundaries? 

2. Methods 

2.1. Stimuli 

A male native speaker of standard Dutch read the sentence Neemt u de trein 

naar WAgeningen? with a single ‘H*L’ accent on the first syllable of 

Wageningen. The utterance was recorded onto digital audio tape (DAT) 

using a Sennheiser MKH 416 unidirectional condenser microphone, 

transferred to computer disk (16 kHz, 16 bits) and digitally processed using 

the Praat speech processing software (Boersma & Weenink, 1996; Boersma 

& van Heuven, 2001). The intonation pattern of the utterance was stylized 

by hand as a sequence of straight lines in the ERB x linear time 

representation. Nine intonationally different versions were then generated 

using the PSOLA analysis-resynthesis technique (e.g. Moulines & Verhelst, 

1995; Rietveld & van Heuven, 2001: 379-380) implemented in the Praat 

software. The nine versions were identical up to and including the ‘H*L’ 

configuration on Wageningen. From that point onwards the nine versions 

diverged into two falls and seven rises. The terminal frequencies of the nine 

versions were chosen to be perceptually equidistant, i.e., the difference 

between any two adjacent terminal frequencies was equal in terms of the 

ERB scale. 9 The terminal pitch of version 1 equaled 80 Hz; the increment 

in the terminal frequency for each following version was 0.25 ERB. The 

nine pitch patterns are shown in Figure 3.
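The construction of the nine terminal frequencies can be sketched as follows. The Hz-to-ERB conversion used here, E = 16.7 log10(1 + f/165.4), is a commonly used formula that is assumed for the purpose of illustration; the definition actually used is the one given in note 9, which may differ. Function names are hypothetical.

```python
import math


def hz_to_erb(f_hz):
    """Assumed conversion from Hz to ERB-rate: 16.7 * log10(1 + f/165.4)."""
    return 16.7 * math.log10(1.0 + f_hz / 165.4)


def erb_to_hz(e):
    """Inverse of hz_to_erb."""
    return 165.4 * (10.0 ** (e / 16.7) - 1.0)


def terminal_frequencies(start_hz=80.0, step_erb=0.25, n_versions=9):
    """Terminal F0 (Hz) of each version, equidistant on the ERB scale."""
    start_erb = hz_to_erb(start_hz)
    return [erb_to_hz(start_erb + i * step_erb) for i in range(n_versions)]


freqs = terminal_frequencies()
for version, f in enumerate(freqs, start=1):
    print(f"version {version}: {f:.1f} Hz")
```

Under this assumed formula the steps are equal in ERB but grow slightly in Hz towards the top of the continuum, which is the point of choosing a perceptual scale.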

[Figure 3 plot: pitch contours of the nine versions against Time (s), with the syllables neemt u de trein naar WA ge ni ngen marked along the time axis.] 

Figure 3. Steps 1 through 9 along resynthesized continuum differing in terminal F0 

by 0.25 ERB increments. Intensity contour (dB) and segmentation (by 

syllables) are indicated. 

2.2. Tasks and experimental procedures 

For the discrimination task, which was the first task imposed on the 

subjects, we followed Ladd and Morton (1997) in using the AX 

discrimination paradigm. Stimuli were presented in pairs that were either 

the same or one step apart on the continuum. In the latter case, the second 

can be higher or lower than the first (hereafter AB and BA, respectively). 

The eight AB stimulus types ran from pair {1,2} to {8,9}; the eight 

corresponding BA types from {2,1} to {9,8}. This yielded 9 identical pairs 

and 2 x 8 = 16 different pairs, i.e., a set of 25 trials in all. The set 

was presented to each listener four times in 

different random orders, preceded by five practice trials. Stimuli within 

pairs were separated by a 500-ms silence; the pause between pairs was 

3000 ms. A short warning tone was sounded after every tenth trial. 
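The composition of the discrimination test can be made concrete with a short sketch. The function names and the use of a seeded random generator are illustrative assumptions; the practice trials and the timing of the pairs are left out.

```python
import random


def discrimination_trials(n_steps=9):
    """The 25 AX trial types: 9 identical (AA), 8 AB and 8 BA pairs."""
    same = [(k, k) for k in range(1, n_steps + 1)]
    ab = [(k, k + 1) for k in range(1, n_steps)]   # second member higher
    ba = [(k + 1, k) for k in range(1, n_steps)]   # second member lower
    return same + ab + ba


def make_blocks(n_blocks=4, seed=0):
    """Return n_blocks independently shuffled copies of the trial set."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    blocks = []
    for _ in range(n_blocks):
        block = discrimination_trials()
        rng.shuffle(block)
        blocks.append(block)
    return blocks


blocks = make_blocks()
total_trials = sum(len(block) for block in blocks)  # 4 x 25 test trials
```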

For the identification task listeners responded to individual stimuli from 

the 9-step continuum by classifying each either in terms of a binary or a 

ternary choice: 

1. ‘Command’ ~ ‘no command’. In one task the listeners were instructed 

to decide for each stimulus whether they interpreted it as a command or 

not. 

2. ‘Question’ ~ ‘no question’. An alternative task involved the decision 

whether the stimulus sounded like a question or not.


3. ‘Command’ ~ ‘condition’ ~ ‘question’. The third task was identical to 

the task imposed in van Heuven & Kirsner (2002). 

Half of the listeners first performed task (1), the other half of the listeners 

began with task (2). Task (3) was always the last identification procedure in 

the array of tests. For each task, the set of nine stimuli was presented five 

times to each listener, in different random orders, and preceded by five 

practice items, yielding sets of 50 identification stimuli per task. 

Twenty native Dutch listeners, ten males and ten females, took part in 

the experiment on a voluntary basis. Participants were university students 

or members of their families. None of them reported any perceptual 

deficiencies. 

The experiments were run with small groups of subjects, who listened to 

the stimuli at a comfortable loudness level over Quad ESL-63 electrostatic 

loudspeakers, while seated in a sound-treated lecture room. Subjects 

marked their responses on printed answer sheets provided to them, always 

taking the discrimination task first and the identification tasks last. 

3. Results 

3.1. Identification 

Figures 4 and 5 present the results obtained in the binary identification 

tasks, i.e., the forced choice between ‘command’ ~ ‘no command’ (Figure 

4) and between ‘question’ ~ ‘no question’ (Figure 5).

[Figure 4 plot: Identifications "command" (%) against Stimulus step (1 to 9).] 

Figure 4. Percent ‘command’ responses as a function of stimulus step (terminal F0 

increments in 0.25 ERB steps) in a binary identification task (‘command’ 

~ ‘no command’). 

The psychometric function for the ‘command’ responses is very steep. The 

category boundary between ‘command’ and ‘no command’ is located at 

stimulus step 2.7, and the margin of uncertainty runs between 2.2 and 3.7, 

i.e., a cross-over from 75% to 25% ‘command’ responses is effected by an 

increase in the terminal pitch of the stimulus of 1.5 steps (i.e., 0.375 ERB). 
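How a category boundary and a margin of uncertainty of this kind can be read off an identification function is sketched below. The sketch uses linear interpolation between adjacent stimulus steps as a simple stand-in for fitting a cumulative normal distribution, and the response percentages are invented for illustration; they are not the data plotted in Figure 4.

```python
def crossing(steps, pct, level):
    """Interpolated step at which a falling response curve crosses `level`.

    Returns None when the curve never crosses the level.
    """
    points = list(zip(steps, pct))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 >= level > y1:
            return x0 + (y0 - level) * (x1 - x0) / (y0 - y1)
    return None


def boundary_and_margin(steps, pct):
    """Category boundary (50% point) and uncertainty margin (75% to 25%)."""
    return {
        "boundary": crossing(steps, pct, 50.0),
        "margin_low": crossing(steps, pct, 75.0),
        "margin_high": crossing(steps, pct, 25.0),
    }


# Invented percentages of 'command' responses for steps 1 through 9.
steps = [1, 2, 3, 4, 5, 6, 7, 8, 9]
pct_command = [100, 100, 50, 0, 0, 0, 0, 0, 0]

result = boundary_and_margin(steps, pct_command)
```

The same function returns None for a listener whose responses never reach the criterion level, which is one way of flagging listeners for whom no cross-over can be established.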

[Figure 5 plot: Identifications "question" (%) against Stimulus step (1 to 9).] 

Figure 5. Percent ‘question’ responses as a function of stimulus step (terminal F0 

increments in 0.25 ERB steps) in a binary identification task (‘question’ 

~ ‘no question’). 


A complete cross-over is also found for the ‘question’ ~ ‘no question’ task. 

The category boundary is located at a stimulus value of 3.6, whilst the 

margin of uncertainty runs between 2.3 and 4.9, i.e., an interval of 2.6 

increments of 0.25 ERB. We may note that the category boundaries in the 

‘command’ and the ‘question’ tasks do not coincide, but are separated 

along the stimulus axis by almost a complete step: 2.7 versus 3.6, i.e., 0.9 

step. Note, once more, that none of the subjects had been alerted to the 

possible existence of an intermediate category between ‘command’ and 

‘question’. Therefore, the emergence of the interval between the 

‘command’ and the ‘question’ boundaries might be taken in justification of 

such an intermediate category. 

Let us now turn to the results of the ternary identification task in which 

all the listeners who had already responded to the stimuli were now 

required to classify the nine stimulus types as either ‘command’, 

‘conditional subclause’ or ‘question’. These results are shown in Figure 6. 

[Figure 6 plot: Identification (%) against Stimulus step (1 to 9), with separate curves for 'command', 'continuation' and 'question'.] 

Figure 6. Ternary identification of stimuli as ‘command’, ‘conditional clause’ or 

‘question’. Category boundaries are indicated. 

The boundary between the ‘command’ and ‘continuation’ categories is at 

2.8; this is hardly different from the ‘command’ ~ ‘no command’ boundary 

that was found in the binary response task. This, then, would seem to be a 

very robust boundary, showing that at least ‘command’ intonation has well-defined 

linguistic status. The boundary between ‘continuation’ and 

‘question’ is less clearly defined. Also, the maximum scores in these two


categories are around 80% rather than 90% or more. Although there is no 

ambiguity in the listeners’ minds whether a stimulus is a command or 

something else, the choice between ‘continuation’ and ‘question’ seems 

more ambiguous leaving room for a minority response in the order of 20%. 

This would indicate to us that we are dealing here with a continuum rather 

than with a dichotomy. Finally, we may note that the (soft) category 

boundary between ‘continuation’ and ‘question’ is located at a stimulus 

value of 7.2. The boundary, then, that sets off ‘question’ from ‘no question’ 

responses proves very unstable: there is a shift from the binary response 

task (3.6) to the ternary task (7.2) of no less than 3.6 points along the 

stimulus continuum. 

It would seem, then, that the ‘command’ category is highly stable and 

well-established in the minds of the listeners. The ‘question’ boundary, 

however, is rather poorly defined, as a result of several circumstances. The 

cross-over points for the ‘question’ category of individual listeners vary 

over a wide range of stimulus values, i.e., between steps 2.2 and 8.5, 

with a fairly even spread of values in between these extremes. Moreover, 

for two listeners no cross-over to the ‘question’ category could be found at 

all; these listeners never gave the ‘question’ response in more than 75% of the cases. 

Also, some listeners have extremely sharp cross-overs to the ‘question’ 

category, but others show large margins of uncertainty. 

3.2. Discrimination 

Figure 7 presents the mean percentage of successfully discriminated stimuli 

that were actually different (hereafter ‘hits’), and the percentage of false 

alarms, i.e. ‘different’ responses to (identical) AA stimuli. The false-alarm 

rate is roughly 20% across the entire stimulus continuum. This value can be 

seen as a bias for responding ‘different’. Generally, an increment of 0.25 

ERB is discriminated above the 20% bias level, with the exception of the 

difference between stimulus steps 5 and 6. The discrimination function 

shows two local peaks. The first one is very large, and is located between 

stimulus steps 2 and 3. This peak obviously coincides with the stable 

category boundary found between ‘command’ and the non-command 

responses (whether binary or ternary). A much smaller second 

discrimination peak may be observed between stimulus steps 6 and 7, 

a location that may well reflect the rather poorly defined category 

boundary between ‘continuation’ and ‘question’.

[Figure 7 plot: "Different" judgments (%) against Stimulus step, with "same" pairs at steps 1 to 9 and "diff" pairs at 1-2 to 8-9.] 

Figure 7. Percent ‘different’ judgments to nine identical stimulus pairs (false 

alarms) and eight pairs differing by one step (hits). 

4. Conclusions and discussion 

Let us now try to formulate answers to the research questions that we asked 

in section 1.4. The first two questions, which I will attempt to answer 

together, asked whether the domain-final boundary tones are contiguous 

categories along a single tonal dimension, and map onto the command, 

continuation and question meaning in a one-to-one fashion. The results of 

our experiments clearly indicate that this is indeed the case. Our listeners 

had no difficulty in using the three response alternatives provided to them. 

When the terminal pitch was lower than the preceding pivot point in the 

contour the responses were almost unanimously for ‘command’. When the 

IP-final pitch was higher than the preceding pivot point, the incidence of 

‘continuation’ responses increased up to and including step 4, and 

decreased for higher terminal pitches, which were identified as questions 

more readily the higher the terminal pitch. Although there was always 

some ambiguity between the ‘continuation’ and ‘question’ alternatives, the 

results clearly indicate that ‘continuation’ is signaled by moderate final 

pitch, and question by (extra) high pitch. 

The latter finding corresponds with our suggestion that asking a 

question involves a higher degree of appeal by the speaker to the hearer than 

asking for the listener’s continued attention. We may also note that our result 

clashes with Caspers (1998). She found that the intermediate final pitch (or 

high level pitch in her experiment) was unambiguously identified as 

continuation; extra high final pitch ambiguously coded either continuation 

or question. Comparison of Caspers’ and our own results is hazardous since 

the utterance-final tone configurations differ, not so much at the underlying 

tone level, but at the surface. It seems to me that the discrepancy between 

Caspers’ and our own findings can be resolved if we accept the possibility 

that Caspers’ extra high terminal pitch was simply not high enough to elicit 

the 80% ‘question’ responses that we got in our experiment. 

The results so far concur with van Heuven & Kirsner (2002). However, 

we may now go on to consider the third, fourth and fifth question, which 

asked where the category boundaries are located along the final pitch 

continuum between ‘L%’, ‘%’ and ‘H%’, in the binary and ternary response 

tasks, and to what extent the boundaries coincide with a peak in the 

discrimination function. 

The results obtained in the binary (‘command’ ~ ‘no command’) and 

ternary (‘command’ ~ ‘continuation’ ~ ‘question’) identification tasks are 

virtually the same, yielding nearly the same location of the boundary (at steps 2.7 and 2.8, respectively) 

separating the ‘command’ category from the rest of the stimulus 

continuum. However, a very unstable boundary is found in the binary 

‘question’ ~ ‘no question’ task (at step 3.6), which is reflected in the poorly 

defined boundary separating the ‘continuation’ and ‘question’ categories in 

the ternary response task (at step 7.2). Moreover, we have seen that the 

category boundary between ‘command’ and ‘no command’ coincides with 

a huge peak in the discrimination function. Although there is a modest local 

maximum in the discrimination function that may be associated with a 

boundary between ‘continuation’ and ‘question’, this peak is not very 

convincing. 

I take these findings as evidence that there is a linguistic, or 

phonological, categorization of the IP-final boundary tone continuum in 

just two types, which is best characterized as low and non-low. The low 

boundary tone signals dominance or superiority on the part of the speaker. 

This is the boundary tone that is suited for issuing statements and 

commands. The non-low boundary tone signals subservience of the speaker 

to the hearer; the speaker appeals to the hearer for his continued attention or 

for an answer to a question. 

The non-low part of the boundary opposition, however, represents a 

gradient, paralinguistic continuum between a moderate appeal (asking for


the hearer’s continued attention) and a stronger appeal (asking the hearer 

for a verbal reply to a question). Here the lower terminal pitches are 

associated with weaker degrees of appeal (or subservience), and the higher 

levels with strong appeal, but in a continuous, gradient, non-phonological 

manner. 

Our results indicate that earlier findings reported by Remijsen & van 

Heuven (1999, 2003) are to be viewed with caution. We now know that the 

proper task to be imposed on listeners should not be to decide whether the 

stimulus is a statement (or a command) versus a question. If binary 

response alternatives are required, then the categories should be ‘statement’ 

versus ‘no statement’ but a better procedure would be to ask the listener to 

respond by choosing from three categories: ‘statement’ (equivalent to 

‘command’ in our experiments) ~ ‘continuation’ ~ ‘question’. Had such 

precautions been taken by Remijsen & van Heuven, their category 

boundary would have been much better defined with less listener-individual 

variation. 

Methodologically, we argue that the classical identification-cum-discrimination 

paradigm is a useful diagnostic tool in intonation research 

which allows linguists to decide experimentally whether a melodic contrast 

is categorical and therefore part of the phonology, or continuously gradient 

and therefore phonetic or even paralinguistic. 

Notes 

1 The experiments reported in this chapter were run by Susanne Strik and Josien 

Klink in partial fulfillment of the course requirements for the Experimental 

Phonetics Seminar taught by the Linguistics Programme at University of 

Leiden. 

2 This commutation procedure is best viewed as a mental experiment; when the 

exchange is implemented through actual digital tape splicing, the result is more 

often than not an uninterpretable stream of sound. 

3 The nature of the distinction between intonational categories is problematic for 

a further reason: inter-listener agreement on the identity of intonational events 

is low (Pitrelli et al., 1994), particularly in comparison with the self-evident 

consensus on segmental distinctions. This lack of consistency has led Taylor

(1998) to reject a basic principle of (intonational) phonology, namely its 

categorical nature. With respect to methodology, researchers tend to act as 

expert listeners, linking contours that sound distinct to pragmatic meaning in an


intuitive fashion. Accordingly, inter-researcher agreement may be low, too (e.g. 

Caspers, 1998). 

4 Nevertheless, large between-listener variability has been reported, for instance, in 

the cuing of the voiced/voiceless contrast by the duration of the pre-burst silent 

interval: the boundary was at 70 ms for subject #1 and over 100 ms for subject #7 

(Slis & Cohen, 1969). These results are commented on by Nooteboom & Cohen 

(1976: 84) as follows: ‘Although the cross-over from /d/ to /t/ proceeds rather 

gradually when averaged over all listeners, the boundary is quite sharply defined for 

individual listeners’ (my translation, VH). 

5 The ‘%’ sign following the tone letter (as in ‘L%’, ‘H%’) denotes a domain-final

boundary; domain-initial boundaries are coded by the ‘%’ sign preceding 

a tone letter (as in ‘%L’, ‘%H’). A ‘%’ sign unaccompanied by a tone letter 

may only occur in domain-final positions, where it is phonetically coded by a 

physical pause and/or pre-boundary lengthening only. 

6 It has been argued by structuralists at least as far back as Merckens (1960) that 

V1 (‘verb first’) is directly opposed to V2 (‘verb second’) in signaling, for

example, ‘non-assertion’ rather than ‘assertion’, since neither a command nor a 

question nor a condition expresses an ongoing state of affairs. 

7 A sequence like Neemt u de trein naar Wageningen might in addition be 

interpretable as a topic-drop-sentence (e.g. [Dan/Daar] neemt u de trein naar 

Wageningen ‘[Then/There] you take the train to Wageningen’, analogous to 

Doen we! ‘We'll do [it]’ or Weet ik! ‘[That] I know’). Although this added

interpretation (with a ‘deleted’ element) is theoretically possible, we believe 

that it was highly unlikely under the controlled conditions of the experiment. 

Furthermore, none of the experimental subjects volunteered the information 

that we had forgotten such an extra interpretation. 

8 This position does not exclude the possibility that statement and imperative are 

subtly different in their paralinguistic use of prosody. For instance, the overall 

pitch of the imperative may be lower, and it may be said with greater loudness 

and larger/higher pitch excursions on the accented syllables. This does not 

invalidate our claim that both statements and imperatives are coded by the ‘L%’ 

terminal boundary. 

9 The ERB scale (Equivalent Rectangular Bandwidth) is currently held to be the 

most satisfactory psychophysical conversion for pitch intervals in human 

speech (Hermes & van Gestel, 1991; Ladd & Terken, 1995). The conversion 

from Hertz (f) to ERB (E) is achieved by a simple formula:

E = 16.6 * log (1 + f / 165.4).
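The conversion in note 9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, and the logarithm is assumed to be base 10, which the note does not state explicitly.

```python
import math

def hertz_to_erb(f_hz: float) -> float:
    """Convert a frequency in Hertz to ERB using the formula in note 9.

    Assumption: 'log' in the note is taken to mean the base-10 logarithm.
    """
    if f_hz < 0:
        raise ValueError("frequency must be non-negative")
    return 16.6 * math.log10(1.0 + f_hz / 165.4)

if __name__ == "__main__":
    # Print a few conversions for frequencies in the speech pitch range.
    for f in (100.0, 150.0, 200.0, 300.0):
        print(f, round(hertz_to_erb(f), 3))
```

Because the scale is compressive, equal steps in Hertz correspond to smaller ERB steps at higher frequencies.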


References 

Boersma, P. and Heuven, V.J. van (2001). Speak and unSpeak with Praat. Glot 

International, 5: 341-347. 

Boersma, P. and Weenink, D. (1996). Praat, a System for Doing Phonetics by 

Computer. Report of the Institute of Phonetic Sciences 

Amsterdam, 132. 

Caspers, J. (1998). Who’s Next? The Melodic Marking of Question vs.

Continuation in Dutch. Language and Speech, 41: 375-398. 

Gussenhoven, C., Rietveld, T. and Terken, J.M.B. (1999). Transcription of 

Dutch Intonation. http://lands.let.kun.nl/todi.

Haan, J. (2002). Speaking of questions. An Exploration of Dutch Question 

Intonation. LOT Dissertation Series, nr. 52, Utrecht: LOT. 

Hart, J. 't, Collier, R. and Cohen, A. (1990). A Perceptual Study of Intonation. 

An Experimental-phonetic Approach to Speech Perception. 

Cambridge: Cambridge University Press. 

Hermes, D.J. and Gestel, J.C. van (1991). The Frequency Scale of Speech

Intonation. Journal of the Acoustical Society of America, 90: 97-102.

Heuven, V.J. van (1986). Some acoustic characteristics and perceptual 

consequences of foreign accent in Dutch spoken by Turkish 

immigrant workers. In: J. van Oosten, J.F. Snapper (eds.) Dutch 

Linguistics at Berkeley, papers presented at the Dutch Linguistics 

Colloquium held at the University of California, Berkeley on 

November 9th, 1985, Berkeley: The Dutch Studies Program, U.C. 

Berkeley, 67-84. 

Heuven, V.J. van and Kirsner, R.S. (2002). Interaction of tone and particle in 

the signaling of clause type in Dutch. In: H. Broekhuis, P. Fikkert 

(eds.). Linguistics in the Netherlands 2002, Amsterdam 

/Philadelphia: John Benjamins, 73-84. 

Kirsner, R.S. and Heuven, V.J. van (1996). Boundary Tones and the Semantics 

of the Dutch Final Particles hè, hoor, zeg and joh. In: M. den 

Dikken, C. Cremers, eds., Linguistics in the Netherlands 1996, 

Amsterdam/Philadelphia: John Benjamins, 133-146. 

Kirsner, R.S., Heuven, V.J. van, and Caspers, J. (1998). From Request to 

Command: An Exploratory Experimental Study of Grammatical 

Form, Intonation, and Pragmatic Particle in Dutch Imperatives. 

In: R. van Bezooijen, R. Kager, eds., Linguistics in the 

Netherlands 1998. Amsterdam/Philadelphia: John Benjamins, 

135-148.


Ladd, D.R. (1996). Intonational phonology. Cambridge: Cambridge University 

Press. 

Ladd, D.R. and Morton, R. (1997). The perception of intonational emphasis: 

continuous or categorical? Journal of Phonetics, 25: 313-342. 

Ladd, D.R. and Terken, J.M.B. (1995). Modelling intra- and inter-speaker pitch 

range variation. Proceedings of the 13th International Congress of

Phonetic Sciences, Stockholm, 2: 386-389. 

Merckens, P.J. (1960). De plaats van de persoonsvorm: een verwaarloosd codeteken 

[The position of the finite verb: a neglected code sign]. De 

nieuwe taalgids, 53: 248-54. 

Moulines, E. and Verhelst, E. (1995). Time-domain and frequency-domain

techniques for prosodic modification of speech. In: W.B. Kleijn

and K.K. Paliwal, eds., Speech coding and synthesis. Amsterdam: 

Elsevier Science, 519-555. 

Nooteboom, S.G. and Cohen, A. (1976). Spreken en verstaan. Een inleiding tot 

de experimentele fonetiek [Speaking and understanding. An 

introduction to experimental phonetics], Assen: van Gorcum. 

Pitrelli, J.F., Beckman, M.E. and Hirschberg, J. (1994). Evaluation of prosodic 

transcription reliability in the ToBI framework. Proceedings of 

the 3rd International Conference on Spoken Language 

Processing, Yokohama, 1: 123-126. 

Remijsen, A.C. and Heuven, V.J. van (1999). Gradient and categorical pitch 

dimensions in Dutch: Diagnostic test. Proceedings of the 14th

International Congress of Phonetic Sciences, San Francisco, 

1865-1868. 

Remijsen, A.C. and Heuven, V.J. van (2003). Linguistic versus paralinguistic 

status of prosodic contrasts, the case of high and low pitch in 

Dutch. In: J.M. van de Weijer, V.J. van Heuven, H.G. van der 

Hulst (eds.): The phonological spectrum. Volume II: 

Suprasegmental structure. Current Issues in Linguistic Theory nr. 

235. Amsterdam/Philadelphia: John Benjamins, 225-246. 

Rietveld, A.C.M. and Heuven, V.J. van (2001). Algemene Fonetiek [General 

Phonetics]. Bussum: Coutinho. 

Slis, I.H. and Cohen, A. (1969). On the complex regulating the voiced-voiceless

distinction, Language and Speech, 80-102: 137-155. 

Taylor, P. (1998). Analysis and synthesis of intonation using the TILT model. 

Unpublished manuscript, Centre for Speech Technology 

Research, University of Edinburgh.

The Position of Frisian in the Germanic Language 

Area 

Charlotte Gooskens and Wilbert Heeringa 

1. Introduction 

Among the Germanic varieties the Frisian varieties in the Dutch province 

of Friesland have their own position. The Frisians are proud of their 

language and more than 350,000 inhabitants of the province of Friesland 

speak Frisian every day. Heeringa (2004) shows that among the dialects in 

the Dutch language area the Frisian varieties are most distant with respect 

to standard Dutch. This may justify the fact that Frisian is recognized as a 

second official language in the Netherlands. In addition to Frisian, in some 

towns and on some islands a mixed variety is used which is an intermediate 

form between Frisian and Dutch. The variety spoken in the Frisian towns is 

known as Town Frisian 1 . 

The Frisian language has existed for more than 2000 years. Genetically 

the Frisian dialects are most closely related to the English language. 

However, historical events have caused the English and the Frisian 

language to diverge, while Dutch and Frisian have converged. The 

linguistic distance to the other Germanic languages has also altered in the 

course of history due to different degrees of linguistic contact. As a result 

traditional genetic trees do not give an up-to-date representation of the 

distance between the modern Germanic languages. 

In the present investigation we measured linguistic distances between 

Frisian and the other Germanic languages in order to get an impression of 

the effect of genetic relationship and language contact for the position of 

the modern Frisian language on the Germanic language map. We included 

six Frisian varieties and one Town Frisian variety in the investigation. 

Furthermore, eight Germanic standard languages were taken into account. 

Using this material, we firstly wished to obtain a hierarchical classification 

of the Germanic varieties. From this classification the position of (Town)

Frisian became clear. Secondly, we ranked all varieties with respect to each 

of the standard Germanic languages as well as to (Town) Frisian. The 

rankings showed the position of (Town) Frisian with respect to the standard 

languages and the position of the standard languages with respect to 

(Town) Frisian. 

In order to obtain a classification of varieties and establish rankings, we 

needed a tool that can measure linguistic distances between the varieties. 

Bolognesi and Heeringa (2002) investigated the position of Sardinian 

dialects with respect to different Romance languages using the Levenshtein 

distance, an algorithm with which distances between word pronunciations 

are calculated. In our investigation we used the same methodology. 

In Section 2, we will present the traditional ideas about the genetic 

relationship between the Germanic languages and discuss the relationship 

between Frisian and the other Germanic languages. At the end of the 

section we will discuss the expected outcome of the linguistic distance 

measurements between Frisian and the other Germanic languages. In 

Section 3 the data sources are described and in Section 4 the method for 

measuring linguistic distances between the language varieties is presented. 

The results are presented in Section 5 and discussed in Section 6.

2. Frisian and the Germanic languages 

2.1. History and classification of the Germanic languages 2 

The Germanic branch of the Indo-European languages has a large number 

of speakers, approximately 450 million native speakers, partly due to the 

colonization of many parts of the world. However, the number of different 

languages within the Germanic group is rather limited. Depending on the 

definition of what counts as a language there are about 12 different 

languages. Traditionally, they are divided into three subgroups: East 

Germanic (Gothic, which is no longer a living language), North Germanic 

(Icelandic, Faeroese, Norwegian, Danish, and Swedish), and West 

Germanic (English, German, Dutch, Afrikaans, Yiddish, and Frisian). 

Some of these languages are so similar that they are only considered 

independent languages because of their position as standardized languages


spoken within the limits of a state. This goes for the languages of the 

Scandinavian countries, Swedish, Danish and Norwegian, which are 

mutually intelligible. Other languages consist of dialects which are in fact 

so different that they are no longer mutually intelligible but are still 

considered one language because of standardization. Northern and southern 

German dialects are an example of this situation. 

Figure 1. The genetic tree of Germanic languages.

In Figure 1, a traditional Germanic genetic tree is shown. We constructed 

this tree on the basis of data in the literature. The tree gives just a rough 

division, and linguistic distances should not be derived from this tree. It is 

commonly assumed that the Germanic languages originate from the 

southern Scandinavian and the northern German region. After the migration 

of the Goths to the Balkans towards the end of the pre-Christian era, North- 

West Germanic remained uniform till the 5th century AD, after which a 

split between North and West Germanic occurred owing to dialectal 

variation and the departure of the Anglo-Saxons from the Continent and the 

colonization of Jutland. 

During the Viking Age, speakers of North Germanic settled in a large 

geographic area, which eventually led to the five modern languages (see 

above). Of these languages, Icelandic (and to a lesser degree Faeroese), 

which is based on the language of southwestern Norway where the settlers 

came from, can be considered the most conservative language (Sandøy, 

1994). Of the three mainland Scandinavian languages, Danish has moved


farthest away from the common Scandinavian roots due to influences from 

the south. 

The parentage of the West Germanic languages is less clear. Different 

tribal groups representing different dialect groups spread across the area, 

which eventually resulted in the modern language situation. Historically 

Frisian and English both belong to the Ingwaeonic branch of the West 

Germanic language group. Originally the Frisian speech community 

extended from the present Danish-German border along the coast to the 

French-Belgian border in the south. However, expansion from Saxons and 

Franconians from the east and the south throughout the medieval period 

resulted in a loss of large Frisian areas and a division into three mutually 

intelligible varieties: West Frisian (spoken in the northern Dutch province 

of Friesland by more than 350,000 people), East Frisian or Saterlandic 

(spoken by a thousand speakers in three villages west of Bremen) and 

North Frisian (spoken by fewer than ten thousand people on the islands on

the north-western coast of Germany). 

The English language came into being as a result of immigrations of 

tribal Anglo-Saxon groups from the North Sea coast during the fifth and 

sixth centuries. Whereas other insular Germanic varieties are in general 

rather conservative, the English insularity lacked this conservatism. English 

is considered most closely related to Frisian on every linguistic level due to 

their common ancestorship and to continued language contact over the 

North Sea. 

The German language is spoken in many European countries in a large 

number of dialects and varieties, which can be divided into Low German 

and High German. Yiddish, too, can be regarded as a German variety. 

Dutch is mainly based on the western varieties of the low Franconian area 

but low Saxon and Frisian elements are also found in this standard 

language. Scholars disagree about the precise position of Dutch and Low 

German in the language tree. They can be traced back to a common root 

often referred to as the Ingwaeonic language group, but are often grouped 

together with High German as a separate West Germanic group. This 

grouping with High German might be the best representation of the modern 

language situation given that the individual dialects spoken in the area in 

fact form a dialect continuum. Afrikaans, finally, is a contemporary West 

Germanic language, developed from seventeenth century Dutch as a result 

of colonization, but with influences from African languages.


2.2. The relationship between Frisian and the other Germanic languages

This short outline of the relationships among the Germanic languages 

shows that English is the language which is genetically closest to Frisian, 

and still today English is considered to be most similar to Frisian. For 

example, The Columbia Encyclopedia (2001) says: “Of all foreign

languages, [Frisian] is most like English”. Pei (1966, p. 34) summarizes the 

situation as follows: “Frisian, a variant of Dutch spoken along the Dutch 

and German North Sea coast, is the foreign speech that comes closest to 

modern English, as shown by the rhyme: ‘Good butter and good cheese is 

good English and good Fries’”. This rhyme refers to the fact that the words 

for butter and cheese are almost the same in the two languages. However, 

in the course of history, contact with other Germanic languages has caused 

Frisian to converge to these languages. The Frisians have a long history of 

trade and in early medieval times they were one of the leading trading 

nations in Europe due to their strategic geographic position close to major 

trade routes along the rivers and the North Sea. Also, the Vikings and the 

English were frequent visitors of the Frisian language area. This intensive 

contact with both English and the North Germanic languages, especially 

Danish, resulted in linguistic exchanges (see Feitsma, 1963; Miedema, 

1966; Wadstein, 1933). Later in history, the Frisian language was 

especially influenced by the Dutch language (which itself contains many 

Frisian elements). For a long period, Frisian was stigmatized as a peasant 

language and due to the weak social position of the Frisian language in the 

Dutch community it was often suppressed, resulting in a strong Dutch 

impact on the Frisian language. Nowadays, Dutch as the language of the 

administration still has a large influence on the media and there has been 

substantial immigration of Dutch speaking people to Friesland. However, 

the provincial government has decided to promote Frisian at all levels in 

the society. 

When investigating the position of the Frisian language within the 

Germanic language group, there are clearly two forces which should be 

taken into account. On the one hand, Frisian and English are genetically 

closely related and share sound changes which do not occur in the other 

Germanic languages. This yields the expectation that the linguistic distance 

between these two languages is relatively small. On the other hand, the 

close contact with Dutch makes it plausible that the Dutch and the Frisian 

languages have converged. Also the distance to Danish might be smaller 

than expected from the traditional division of Germanic into a North


Germanic and a West Germanic branch at an early stage because of the 

intensive contacts in the past. 

3. Data sources 

In this section, we will first give a short characterization of the language 

varieties and the speakers who were recorded for our investigation. Next, 

we will present the nature of the recordings and the transcriptions which 

formed the basis for linguistic distance measurements. 

3.1. Language varieties 

Since our main interest was the Frisian language and its linguistic position 

within the Germanic language group we wished to represent this language 

as well as possible. For this reason, we included seven Frisian varieties, 

spread over the Frisian language area. Furthermore, our material contained 

eight Germanic standard languages. First, we will describe the Frisian 

varieties and next the standard languages. 

As far as the Frisian varieties are concerned, we chose varieties from 

different parts of the province, both from the coastal area and from the 

inland. The varieties are spoken in different dialect areas according to the 

traditional classification (see below) and they represent different stages of 

conservatism. The precise choice of the seven varieties was determined by 

speaker availability for recordings in our vicinity and at the Fryske 

Akademy in Leeuwarden. In Figure 2, the geographical position of the 

seven Frisian language varieties in the province of Friesland is shown. 

Due to the absence of major geographical barriers, the Frisian language 

area is relatively uniform. The major dialectal distinctions are primarily 

phonological. Traditionally, three main dialect areas are distinguished (see 

e.g. Hof, 1933; Visser, 1997): Klaaifrysk (clay Frisian) in the west, 

Wâldfrysk (forest Frisian) in the east and Súdwesthoeksk (southwest 

quarter) in the southwest. In our material Klaaifrysk is represented by the 

dialects of Oosterbierum and Hijum, Wâldfrysk by Wetsens and 

Westergeest, and Súdwesthoeksk by Tjerkgaast. Hindeloopen is in the area 

of Súdwesthoeksk. However, this dialect represents a highly conservative 

area. The phonological distance between Hindeloopen and the main 

dialects is substantial (van der Veen, 2001). Finally, our material contains


the variety spoken in Leeuwarden (see note 1). This is an example of Town 

Frisian, which is also spoken in other cities of Friesland. Town Frisian is a 

Dutch dialect strongly influenced by Frisian but stripped of the most 

characteristic Frisian elements (Goossens, 1977). 

[Map of Friesland; labelled locations: Oosterbierum, Hindeloopen, Hijum, Leeuwarden, Tjerkgaast, Wetsens, Westergeest]

Figure 2. The geographical position of the seven Frisian language varieties in the 

province of Friesland. 

In addition to the Frisian dialects, the following eight standard languages 

were included: Icelandic, Faroese, Norwegian, Swedish, Danish, English, 

Dutch, and German. We had meant to include all standard Germanic 

languages in our material. However, due to practical limitations a few 

smaller languages were not included. 

As for Norwegian, there is no official standard variety. The varieties 

spoken around the capital of Oslo in the southeast, however, are often 

considered to represent the standard language. We based the present 

investigation on prior research on Norwegian dialects (see Heeringa and 

Gooskens, 2003; Gooskens and Heeringa, submitted), and we chose the 

recording which to Norwegians sounded most standard, namely the 

Lillehammer recording 3 . It was our aim to select standard speakers from all 

countries, but it is possible that the speech of some speakers contains slight 

regional influences. The speakers from Iceland, the Faroe Islands and


Sweden spoke the standard varieties of the capitals. The Danish speaker 

came from Jutland, the German speaker from Kiel, the English speaker 

from Birmingham and the Dutch speaker had lived at different places in the 

Netherlands, including a long period in the West during adolescence. 

3.2. Phonetic transcriptions 

The speakers all read aloud translations of the same text, namely the fable 

‘The North Wind and the Sun’. This text has often been used for phonetic 

investigations; see for example The International Phonetic Association 

(1949 and 1999) where the same text has been transcribed in a large 

number of different languages. A database of Norwegian transcriptions of 

the same text has been compiled by J. Almberg (see note 3). As mentioned 

in the previous section, we only used the transcription of Lillehammer from 

this database. In future, we would like to investigate the relations between 

Norwegian and other Germanic varieties, using the greater part of the 

transcriptions in this database. Therefore, our new transcriptions should be 

as comparable as possible with the existing Norwegian ones. To ensure 

this, our point of departure was the Norwegian text. This text consists of 91 

words (58 different words) which were used to calculate Levenshtein 

distances (see Section 4). The text was translated word for word from 

Norwegian into each of the Germanic language varieties. We are aware of 

the fact that this may result in less natural speech: sentences were often 

syntactically wrong. However, it guarantees that for each of the 58 words a 

translation was obtained. The words were not recorded as a word list, but as 

sentences. Therefore in the new recordings words appear in a similar 

context as in the Norwegian varieties. This ensures that the influence of 

assimilation phenomena on the results is as comparable as possible. 

Most new recordings were transcribed phonetically by one of the 

authors. To ensure consistency with the existing Norwegian transcriptions, 

our new transcriptions were corrected by J. Almberg, the transcriber of the 

Norwegian recordings. In most cases we incorporated the corrections. The 

transcription of the Faroese language was completely done by J. Almberg. 

The transcriptions were made in IPA as well as in X-SAMPA (eXtended 

Speech Assessment Methods Phonetic Alphabet). This is a machine-readable

phonetic alphabet, which is also readable by people. Basically, it

maps IPA-symbols to the 7 bit printable ASCII/ANSI characters 4 . The


transcriptions were used to calculate the linguistic distances between 

varieties (see Section 4). 

4. Measuring distances between varieties 

In 1995 Kessler introduced the use of the Levenshtein distance as a tool for

measuring linguistic distances between language varieties. The Levenshtein 

distance is a string edit distance measure and Kessler applied this algorithm 

to the comparison of Irish dialects. Later on, this approach was applied by 

Nerbonne, Heeringa, Van den Hout, Van der Kooi, Otten, and Van de Vis 

(1996) to Dutch dialects. They assumed that distances between all possible 

pairs of segments are the same. E.g. the distance between an [�] and an [e] 

is the same as the distance between the [�] and [�]. Both Kessler (1995) and 

Nerbonne and Heeringa (1997) also experimented with more refined 

versions of the Levenshtein algorithm in which gradual segment distances 

were used which were found on the basis of the feature systems of 

Hoppenbrouwers (1988) and Vieregge et al. (1984).
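The equal-cost variant described above, in which every pair of distinct segments counts the same, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name, the unit cost of 1 for insertions, deletions and substitutions, and the example segment lists are all assumptions.

```python
from typing import Callable, Sequence

def levenshtein(
    a: Sequence[str],
    b: Sequence[str],
    sub_cost: Callable[[str, str], float] = lambda x, y: 0.0 if x == y else 1.0,
    indel_cost: float = 1.0,
) -> float:
    """Minimum total cost of turning segment sequence a into b.

    By default all substitutions between distinct segments cost the same
    (assumed unit cost). Passing another sub_cost function would allow
    gradual segment distances instead.
    """
    # prev[j] holds the cost of aligning the processed prefix of a with b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,          # delete x
                curr[j - 1] + indel_cost,      # insert y
                prev[j - 1] + sub_cost(x, y),  # substitute x by y (or match)
            ))
        prev = curr
    return prev[-1]

if __name__ == "__main__":
    # Hypothetical segment lists, used only to exercise the function.
    print(levenshtein(["m", "E", "l", "k"], ["m", "I", "l", "k"]))
```

Keeping the substitution cost as a parameter means the same routine can later be given graded costs, such as distances derived from feature systems or spectrograms.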

In this paper we use an implementation of the Levenshtein distance in

which the sound distances are derived by comparing spectrograms. In

Section 4.1 we account for the use of spectral distances

and explain how we calculate them. Comparisons are made on the basis of 

the audiotape The Sounds of the International Phonetic Alphabet (Wells 

and House, 1995). In Section 4.2 we describe the Levenshtein distance and 

explain how spectral distances can be used in this algorithm. 

4.1. Gradual segment distances 

When acquiring language, children learn to pronounce sounds by listening 

to the pronunciation of their parents or other people. The acoustic signal 

seems to be sufficient to find the articulation which is needed to realize the 

sound. Acoustically, speech is just a series of changes in air pressure, 

quickly following each other. A spectrogram is a “graph with frequency on 

the vertical axis and time on the horizontal axis, with the darkness of the 

graph at any point representing the intensity of the sound” (Trask, 1996, p. 

328). 

In this section we present the use of spectrograms for finding segment 

distances. Segment distances can also be found on the basis of phonological


or phonetic feature systems. However, we prefer the use of acoustic 

representations since they are based on physical measurements. In Potter, 

Kopp and Green’s (1947) Visible Speech, spectrograms are shown for all 

common English sounds (see pp. 54-56). Looking at the spectrograms we 

already see which sounds are similar and which are not. We assume that 

visible (dis)similarity between spectrograms reflects perceptual 

(dis)similarity between segments to some extent. In Figure 3 the 

spectrograms of some sounds are shown as pronounced by John Wells on 

the audiotape The Sounds of the International Phonetic Alphabet (Wells 

and House, 1995). The spectrograms are made with the computer program 

PRAAT 5 . 

Figure 3. Spectrograms of some sounds pronounced by John Wells. The upper

panels show [i] (left) and [e] (right); the lower panels show [p] (left)

and [s] (right).

4.1.1. Samples 

For finding spectrogram distances between all IPA segments we need 

samples of one or more speakers for each of them. We found the samples 

on the tape The Sounds of the International Phonetic Alphabet on which all


IPA sounds are pronounced by John Wells and Jill House. On the tape the 

vowels are pronounced in isolation. The consonants are sometimes 

preceded, and always followed by an [a]. We cut out the part preceding the 

[a], or the part between the [a]’s. We realize that the pronunciation of 

sounds depends on their context. Since we use samples of vowels 

pronounced in isolation and samples of consonants selected from a limited 

context, our approach is a simplification of reality. However, Stevens 

(1998, p. 557) observes that 

“by limiting the context, it was possible to specify rather precisely the 

articulatory aspects of the utterances and to develop models for estimating 

the acoustic patterns from the articulation”. 

The burst in a plosive of the IPA inventory is always preceded by a period 

of silence (voiceless plosives) or a period of murmur (voiced plosives). 

When a voiceless plosive is not preceded by an [a], it is not clear how long 

the period of silence which really belongs to the sounds lasts. Therefore we 

always cut out each plosive in such a way that the time span from the 

beginning to the middle of the burst is equal to 90 ms. Among the plosives 

which were preceded by an [a] or which are voiced (so that the real time of 

the start-up phase can be found) we found no sounds with a period of 

silence or murmur which was clearly shorter than 90 ms. 
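The cutting rule for plosives described above can be expressed as a small calculation. The sketch below is only an illustration: the function name and the idea of marking the burst by its onset and offset times (in seconds) are assumptions, while the 90 ms lead comes from the text.

```python
def plosive_cut_start(burst_onset_s: float, burst_offset_s: float,
                      lead_s: float = 0.090) -> float:
    """Return the time at which a plosive sample should start.

    The start is chosen so that the span from the start of the sample
    to the middle of the burst equals lead_s (90 ms by default).
    """
    if burst_offset_s < burst_onset_s:
        raise ValueError("burst offset must not precede burst onset")
    burst_mid = (burst_onset_s + burst_offset_s) / 2.0
    start = burst_mid - lead_s
    if start < 0:
        raise ValueError("recording does not contain enough material before the burst")
    return start

if __name__ == "__main__":
    # Hypothetical burst located between 0.50 s and 0.52 s in a recording.
    print(plosive_cut_start(0.50, 0.52))
```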

In voiceless plosives, the burst is followed by an [h]-like sound before 

the following vowel starts. A consequence of including this part in the 

samples is that bursts often do not match when comparing two voiceless 

plosives. However, since aspiration is a characteristic property of voiceless 

sounds, we retained aspiration in the samples. In general, when comparing 

two voiced plosives, the bursts match. When comparing a voiceless plosive 

and a voiced plosive, the bursts do not match. 

To keep trills comparable to each other, we always cut three periods, 

even when the original samples contained more periods. When there were 

more periods, the most regular looking sequence of three periods was cut. 

The Levenshtein algorithm also requires a definition of ‘silence’. To get 

a sample of ‘silence’ we cut a small silent part on the IPA tape. This 

ensures that silence has approximately the same background noise as the

other sounds. 

To make the samples as comparable as possible, all vowel and extracted 

consonant samples are monotonized on the mean pitch of the 28 

concatenated vowels. The mean pitch of John Wells was 128 Hertz; the


mean pitch of Jill House was 192 Hertz. In order to monotonize the 

samples the pitch contours were changed to flat lines. The volume was not 

normalized because volume contains too much segment-specific 

information. For example, the [v] is characteristically louder than 

the [f]. 

4.1.2. Acoustic representation 

In the most common type of spectrogram the linear Hertz frequency scale is 

used. The difference between 100 Hz and 200 Hz is the same as the 

difference between 1000 Hz and 1100 Hz. However, our perception of 

frequency is non-linear. We hear the difference between 100 Hz and 200 

Hz as an octave interval, but also the difference between 1000 Hz and 2000 

Hz is perceived as an octave. Our ear evaluates frequency differences not 

absolutely, but relatively, namely in a logarithmic manner. Therefore, in the 

Barkfilter, the Bark-scale is used which is roughly linear below 1000 Hz 

and roughly logarithmic above 1000 Hz (Zwicker and Feldtkeller, 1967). 
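
The non-linearity can be illustrated with a small sketch. The conversion formula below is an assumption: it is the one documented for PRAAT's hertzToBark function, and the text does not state which approximation the Barkfilter applies internally. The function name hertz_to_bark is ours.

```python
import math

def hertz_to_bark(f_hz: float) -> float:
    """Approximate Bark value for a frequency in Hertz.

    Assumption: the formula documented for PRAAT's hertzToBark,
    7 * ln(f/650 + sqrt(1 + (f/650)^2)).
    """
    x = f_hz / 650.0
    return 7.0 * math.log(x + math.sqrt(1.0 + x * x))

# The same 100 Hz step covers less of the Bark scale at higher frequencies.
low_step = hertz_to_bark(200) - hertz_to_bark(100)
high_step = hertz_to_bark(1100) - hertz_to_bark(1000)
```

Under this formula a step of 100 Hz near 100 Hz spans roughly one Bark, whereas the same step near 1000 Hz spans clearly less.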

In the commonly used type of spectrogram the power spectral density is 

represented per frequency per time. The power spectral density is the power 

per unit of frequency as a function of the frequency. In the Barkfilter the 

power spectral density is expressed in decibels (dB). “The decibel scale is 

a way of expressing sound amplitude that is better correlated with 

perceived loudness” (Johnson, 1997, p. 53). The decibel scale is a 

logarithmic scale. Multiplying the sound pressure ten times corresponds to 

an increase of 20 dB. On a decibel scale intensities are expressed relative to 

the auditory threshold. The auditory threshold of 0.00002 Pa corresponds 

with 0 dB (Rietveld and Van Heuven, 1997, p. 199). 
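
As a worked illustration of the decibel scale described above, the following sketch converts sound pressure to dB relative to the auditory threshold of 0.00002 Pa. The function name pressure_to_db is ours; the relation used is the standard definition of sound pressure level, 20 times the base-10 logarithm of the pressure ratio.

```python
import math

P_REF = 0.00002  # auditory threshold in Pa, as given in the text

def pressure_to_db(pressure_pa: float) -> float:
    """Sound pressure level in dB relative to the auditory threshold."""
    return 20.0 * math.log10(pressure_pa / P_REF)

threshold_db = pressure_to_db(P_REF)         # the threshold itself maps to 0 dB
tenfold_db = pressure_to_db(10.0 * P_REF)    # ten times the pressure
```

Multiplying the pressure by ten adds 20 dB, in line with the statement above.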

A Barkfilter is created from a sound by band filtering in the frequency 

domain with a bank of filters. In PRAAT the lowest band has a central 

frequency of 1 Bark per default, and each band has a width of 1 Bark. 

There are 24 bands, corresponding to the first 24 critical bands of hearing 

as found along the basilar membrane (Zwicker and Fastl, 1990). A critical 

band is an area within which two tones influence each other’s perceptibility 

(Rietveld and Van Heuven, 1997). Due to the Bark-scale the higher bands 

summarize a wider frequency range than the lower bands. 

In PRAAT we used the default settings when using the Barkfilter. The 

sound signal is probed each 0.005 seconds with an analysis window of 

0.015 seconds. Other settings may give different results, but since it was


not a priori obvious which results are optimal, we restricted ourselves to the 

default settings. In Figure 4 Barkfilters for some segments are shown. 

Figure 4. Barkfilter spectrograms of some sounds pronounced by John Wells. 

The upper panels show the [i] (left) and the [e] (right); the lower 

panels show the [p] (left) and the [s] (right). 

4.1.3. Comparison 

In this section, we explain the comparison of segments in order to get 

distances between segments that will be used in the Levenshtein distance 

measure. In a Barkfilter, the intensities of frequencies are given for a range 

of times. A spectrum contains the intensities of frequencies at one time. 

The smaller the time step, the more spectra there are in the acoustic 

representation. We consistently used the same time step for all samples. 

It appears that the duration of the segment samples varies. This may be 

explained by variation in speech rate. Duration is also a sound-specific 

property. E.g., a plosive is shorter than a vowel. The result is that the 

number of spectra per segment may vary, although for each segment the 

same time step was used. Since we want to normalize the speech rate and 

regard segments as linguistic units, we made sure that two segments get the 

same number of spectra when they are compared to each other.


When comparing one segment of m spectra with another segment of n 

spectra, each of the m elements is duplicated n times, and each of the n 

elements is duplicated m times. So both segments get a length of m × n. 
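
The duplication step can be sketched as follows. This is a minimal illustration, assuming that a segment is represented as a list of spectra; the function names stretch and equalize are ours.

```python
def stretch(spectra, factor):
    """Repeat every spectrum `factor` times, preserving the original order."""
    return [s for s in spectra for _ in range(factor)]

def equalize(seg_a, seg_b):
    """Give two segments the same number of spectra (m * n each)."""
    m, n = len(seg_a), len(seg_b)
    return stretch(seg_a, n), stretch(seg_b, m)

# Toy example: a segment of 2 spectra against a segment of 3 spectra.
a = [[1.0], [2.0]]
b = [[5.0], [6.0], [7.0]]
a_long, b_long = equalize(a, b)   # both now contain 2 * 3 = 6 spectra
```

Duplication keeps the relative timing within each segment intact, so corresponding positions of the two stretched segments can be compared pairwise.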

In order to find the distance between two sounds, the Euclidean distance 

is calculated between each pair of corresponding spectra, one from each of 

the sounds. Assume two spectra e1 and e2, each with n frequencies; then the 

Euclidean distance is: 

$$d(e_1, e_2) = \sqrt{\sum_{i=1}^{n} \left(e_{1i} - e_{2i}\right)^2}$$

Equation 1. Euclidean distance 

The distance between two segments is equal to the sum of the spectrum 

distances divided by the number of spectra. In this way we found that the 

greatest distance occurs between the [a] and ‘silence’. We regard this 

maximum distance as 100%. Other segment distances are divided by this 

maximum and multiplied by 100. This yields segment distances expressed 

in percentages. Word distances and distances between varieties which are 

based on them may also be given in terms of percentages. 
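
The computation of a segment distance can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the study: spectra are plain lists of intensities, and the names euclidean, segment_distance and as_percentage are ours.

```python
import math

def euclidean(e1, e2):
    """Euclidean distance between two spectra with the same number of frequencies."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))

def segment_distance(seg_a, seg_b):
    """Mean spectrum distance after stretching both segments to m * n spectra."""
    m, n = len(seg_a), len(seg_b)
    a = [s for s in seg_a for _ in range(n)]   # each of the m spectra n times
    b = [s for s in seg_b for _ in range(m)]   # each of the n spectra m times
    return sum(euclidean(x, y) for x, y in zip(a, b)) / (m * n)

def as_percentage(distance, max_distance):
    """Express a distance relative to the maximum distance, e.g. [a] versus 'silence'."""
    return 100.0 * distance / max_distance

# Toy spectra with two frequencies each (invented values).
d = segment_distance([[0.0, 0.0], [0.0, 0.0]], [[3.0, 4.0]])
```

In the toy example every pair of corresponding spectra lies 5 units apart, so the segment distance is 5 as well.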

In perception, small differences in pronunciation may play a relatively 

strong role in comparison with larger differences. Therefore we used 

logarithmic segment distances. The effect of using logarithmic distances is 

that small distances are weighed relatively more heavily than large 

distances. Since the logarithm of 0 is not defined, and the logarithm of 1 is 

0, distances are increased by 1 before the logarithm is calculated. To obtain 

percentages, we calculate ln(distance + 1) / ln(maximum distance + 1). 
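
A minimal sketch of the logarithmic transformation follows. The function name log_distance is ours, and the input values are invented for illustration; the result is a proportion between 0 and 1, which can be multiplied by 100 to obtain a percentage.

```python
import math

def log_distance(distance, max_distance):
    """Logarithmic segment distance as a proportion of the maximum distance."""
    return math.log(distance + 1.0) / math.log(max_distance + 1.0)

small = log_distance(10.0, 100.0)    # a tenth of the maximum on the linear scale
large = log_distance(100.0, 100.0)   # the maximum itself
```

A linear distance of one tenth of the maximum is mapped to roughly half of the maximum, which shows how small distances gain weight relative to large ones.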

4.1.4. Suprasegmentals and diacritics 

The sounds on the tape The Sounds of the International Phonetic Alphabet 

are pronounced without suprasegmentals and diacritics. However, a 

restricted set of suprasegmentals and diacritics can be processed in our 

system. 

Length marks and syllabification are processed by changing the 

transcription beforehand. In the X-SAMPA transcription, extra-short


segments are kept unchanged, sounds with no length indication are 

doubled, half long sounds are trebled, and long sounds are quadrupled. 

Syllabic sounds are treated as long sounds, so they are quadrupled. 
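
The treatment of length can be sketched as follows. The sketch assumes that the transcription has already been parsed into (segment, length) pairs; the length labels and the function name expand are hypothetical and do not reflect actual X-SAMPA notation.

```python
# Number of copies per length category, following the description above.
REPEATS = {
    "extra-short": 1,
    "normal": 2,
    "half-long": 3,
    "long": 4,
    "syllabic": 4,   # syllabic sounds are treated as long sounds
}

def expand(transcription):
    """Replace each (segment, length) pair by the corresponding number of copies."""
    expanded = []
    for segment, length in transcription:
        expanded.extend([segment] * REPEATS[length])
    return expanded

result = expand([("a", "long"), ("t", "normal"), ("@", "extra-short")])
```

After this step the Levenshtein algorithm needs no separate notion of length: a long vowel simply aligns with more copies than a short one.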

When processing the diacritics voiceless and/or voiced, we assume that a 

voiced voiceless segment (e.g. [t̬]) and a voiceless voiced segment (e.g. [d̥]) 

are intermediate pronunciations of a voiceless segment ([t]) and a voiced 

segment ([d]). Therefore we calculate the distance between a segment x and 

a segment y marked as voiced as the average of the distance between x and y and the 

distance between x and the voiced counterpart of y. Similarly, the distance 

between a segment x and a segment y marked as voiceless is calculated as the mean of 

the distance between x and y and the distance between x and the voiceless 

counterpart of y. For voiced sounds which have no voiceless counterpart 

(the sonorants), or for voiceless sounds which have no voiced counterpart 

(the glottal stop) the sound itself is used. 

The diacritic apical is only processed for the [s] and the [z]. We 

calculate the distance between [s̺] and e.g. [f] as the average of the distance 

between [s] and [f] and the distance between [ʃ] and [f]. Similarly, the distance 

between [z̺] and e.g. [v] is calculated as the mean of the distance between [z] 

and [v] and the distance between [ʒ] and [v]. 

The thought behind the way in which the diacritic nasal is processed is 

that a nasal sound is more or less intermediate between its non-nasal 

version and the [n]. We calculate the distance between a segment x and a 

nasal segment y as the average of the distance between x and y and the 

distance between x and [n]. 
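
The averaging scheme for the voicing and nasal diacritics can be sketched as follows. The function names and the toy distance values are ours and purely illustrative; dist stands for any function that returns the distance between two plain segments.

```python
def mean(a, b):
    return (a + b) / 2.0

def dist_to_voicing_marked(dist, x, y, counterpart):
    """Distance between x and a segment y carrying a voicing diacritic.

    `counterpart` is the counterpart of y with the opposite voicing,
    or y itself when no such counterpart exists.
    """
    return mean(dist(x, y), dist(x, counterpart))

def dist_to_nasalized(dist, x, y):
    """Distance between x and a nasalized segment y: halfway between y and [n]."""
    return mean(dist(x, y), dist(x, "n"))

# Invented distances in percentages, for illustration only.
toy = {("s", "t"): 40.0, ("s", "d"): 60.0, ("s", "a"): 90.0, ("s", "n"): 70.0}

def toy_dist(a, b):
    return toy[(a, b)]

voiced_t = dist_to_voicing_marked(toy_dist, "s", "t", "d")   # mean of 40 and 60
nasal_a = dist_to_nasalized(toy_dist, "s", "a")              # mean of 90 and 70
```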

4.2. Levenshtein distance 

Using the Levenshtein distance, two dialects are compared by comparing 

the pronunciation of a word in the first dialect with the pronunciation of the 

same word in the second. It is determined how one pronunciation is 

changed into the other by inserting, deleting or substituting sounds. 

Weights are assigned to these three operations. In the simplest form of the 

algorithm, all operations have the same cost, e.g. 1. Assume afternoon is 

pronounced as [ˌæəftəˈnʉːn] in the dialect of Savannah, Georgia, and as 

[ˌæftərˈnuːn] in the dialect of Lancaster, Pennsylvania 6 . Changing one 

pronunciation into the other can be done as in Table 1 (ignoring 

suprasegmentals and diacritics for this moment) 7 :


Table 1. Changing one pronunciation into another using a minimal set of 

operations. 

æəftənʉn    delete ə      1
æftənʉn     insert r      1
æftərnʉn    subst. ʉ/u    1
æftərnun
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
                          3

In fact many sequences of operations map [ˌæəftəˈnʉːn] to [ˌæftərˈnuːn]. The 

power of the Levenshtein algorithm is that it always finds the cost of the 

cheapest mapping. 
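
The simplest form of the algorithm, with a cost of 1 for every operation, can be sketched as follows. This is a standard dynamic-programming formulation and not necessarily the implementation used in the study. The segment lists are our reading of the afternoon example with stress and length marks left out, and the vowel symbols in them are assumed.

```python
def levenshtein(source, target):
    """Minimal number of insertions, deletions and substitutions (unit costs)."""
    prev = list(range(len(target) + 1))   # cost of building target prefixes from nothing
    for i, s in enumerate(source, start=1):
        curr = [i]                        # cost of deleting the first i source segments
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1,              # delete s
                curr[j - 1] + 1,          # insert t
                prev[j - 1] + (s != t),   # substitute s by t, or match at no cost
            ))
        prev = curr
    return prev[-1]

savannah = ["æ", "ə", "f", "t", "ə", "n", "ʉ", "n"]
lancaster = ["æ", "f", "t", "ə", "r", "n", "u", "n"]
cost = levenshtein(savannah, lancaster)
```

For these two pronunciations the function returns 3, the cost of one deletion, one insertion and one substitution.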

Comparing pronunciations in this way, the distance between longer 

pronunciations will generally be greater than the distance between shorter 

pronunciations. The longer the pronunciation, the greater the chance for 

differences with respect to the corresponding pronunciation in another 

variety. Because this does not accord with the idea that words are linguistic 

units, the sum of the operations is divided by the length of the longest 

alignment which gives the minimum cost. The longest alignment has the 

greatest number of matches. In our example we have the following 

alignment: 

Table 2. Alignment which gives the minimal cost. The alignment corresponds 

with Table 1. 

æ  ə  f  t  ə  -  n  ʉ  n
æ  -  f  t  ə  r  n  u  n
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
   1           1     1

The total cost of 3 (1+1+1) is now divided by the length of 9. This gives a 

word distance of 0.33 or 33%. 
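
The normalization can be sketched by letting the dynamic programme keep track of the alignment length alongside the cost. The sketch below is one possible formulation under unit costs, not necessarily the one used in the study; the function name and the segment lists (our reading of the example, with assumed vowel symbols) are ours.

```python
def normalized_levenshtein(source, target):
    """Return (cost, alignment_length, cost / alignment_length) with unit costs.

    Among all alignments with minimal cost, the longest one is used.
    """
    # Each cell holds (cost, -length): comparing tuples prefers a lower cost
    # first and, among equal costs, a longer alignment.
    prev = [(j, -j) for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        curr = [(i, -i)]
        for j, t in enumerate(target, start=1):
            candidates = [
                (prev[j][0] + 1, prev[j][1] - 1),                  # deletion
                (curr[j - 1][0] + 1, curr[j - 1][1] - 1),          # insertion
                (prev[j - 1][0] + (s != t), prev[j - 1][1] - 1),   # substitution or match
            ]
            curr.append(min(candidates))
        prev = curr
    cost, negative_length = prev[-1]
    length = -negative_length
    return cost, length, (cost / length if length else 0.0)

savannah = ["æ", "ə", "f", "t", "ə", "n", "ʉ", "n"]
lancaster = ["æ", "f", "t", "ə", "r", "n", "u", "n"]
cost, length, word_distance = normalized_levenshtein(savannah, lancaster)
```

For the example this gives a cost of 3, an alignment length of 9 and a word distance of one third, in agreement with Table 2.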

In Section 4.1.3 we explained how distances between segments can be 

found using spectrograms. This makes it possible to refine our Levenshtein 

algorithm by using the spectrogram distances as operation weights. Now 

the cost of insertions, deletions and substitutions is not always equal to 1, 

but varies, i.e., it is equal to the spectrogram distance between the segment


and ‘silence’ (insertions and deletions) or between two segments 

(substitution). 

To reckon with syllabification in words, the Levenshtein algorithm is 

adapted so that only a vowel may match with a vowel, a consonant with a 

consonant, the [j] or [w] with a vowel (or opposite), the [i] or [u] with a 

consonant (or opposite), and a central vowel (in our research only the 

schwa) with a sonorant (or opposite). In this way unlikely matches (e.g. a 

[p] with a [a]) are prevented. 
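
The weighted variant with the matching restriction can be sketched as follows. Everything specific in the sketch is an assumption made for illustration: the vowel and sonorant sets are toy inventories, the symbol "0" stands for 'silence', and seg_dist is any function returning a segment distance. A forbidden match is given infinite cost, so that the algorithm falls back on a deletion plus an insertion.

```python
import math

VOWELS = {"a", "e", "i", "o", "u", "ə"}        # toy inventory (assumption)
SONORANTS = {"m", "n", "l", "r", "j", "w"}     # toy inventory (assumption)

def may_match(a, b):
    """Decide whether two segments may be aligned with each other."""
    a_vowel, b_vowel = a in VOWELS, b in VOWELS
    if a_vowel == b_vowel:
        return True                    # vowel with vowel, consonant with consonant
    vowel, consonant = (a, b) if a_vowel else (b, a)
    if consonant in {"j", "w"}:        # [j] and [w] may match a vowel
        return True
    if vowel in {"i", "u"}:            # [i] and [u] may match a consonant
        return True
    return vowel == "ə" and consonant in SONORANTS   # schwa with a sonorant

def weighted_levenshtein(source, target, seg_dist, silence="0"):
    """Levenshtein distance with segment distances as operation weights."""
    prev = [0.0]
    for t in target:
        prev.append(prev[-1] + seg_dist(silence, t))        # insertions only
    for s in source:
        curr = [prev[0] + seg_dist(s, silence)]              # deletions only
        for j, t in enumerate(target, start=1):
            sub = prev[j - 1] + seg_dist(s, t) if may_match(s, t) else math.inf
            curr.append(min(
                prev[j] + seg_dist(s, silence),              # delete s
                curr[j - 1] + seg_dist(silence, t),          # insert t
                sub,                                         # substitute, if allowed
            ))
        prev = curr
    return prev[-1]
```

With a toy distance function in which a substitution costs 0.5 and an insertion or deletion costs 0.6, aligning [p] with [a] costs 1.2 because the direct match is blocked, whereas aligning [i] with [j] costs 0.5.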

In our research we used 58 different words. When a word occurred in 

the text more than once, the mean over the different pronunciations was 

used. So when comparing two dialects we get 58 Levenshtein distances. 

Now the dialect distance is equal to the sum of 58 Levenshtein distances 

divided by 58. When the word distances are presented in terms of 

percentages, the dialect distance will also be presented in terms of 

percentages. All distances between the 15 language varieties are arranged 

in a 15 × 15 matrix. 
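
The aggregation from word distances to a distance matrix can be sketched as follows. The function names are ours, and pair_distance stands for any function that returns the dialect distance between two varieties.

```python
def dialect_distance(word_distances):
    """Mean of the word distances between two varieties (58 words in the study)."""
    return sum(word_distances) / len(word_distances)

def distance_matrix(varieties, pair_distance):
    """Symmetric matrix of dialect distances with zeros on the diagonal."""
    n = len(varieties)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = pair_distance(varieties[i], varieties[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix

# Toy example with two invented word distances.
example = dialect_distance([0.25, 0.75])
```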

5. Results 

The results of the Levenshtein distance measurements are analyzed in two 

ways. First, on the basis of the distance matrix we applied hierarchical 

cluster analysis (see Section 5.1). The goal of clustering is to identify the 

main groups. The groups are called clusters. Clusters may consist of 

subclusters, and subclusters may in turn consist of subsubclusters, etc. The 

result is a hierarchically structured tree in which the dialects are the leaves 

(Jain and Dubes, 1988). Several alternatives exist. We used the Unweighted 

Pair Group Method using Arithmetic averages (UPGMA), since 

dendrograms generated by this method reflected distances which correlated 

most strongly with the original Levenshtein distances (r=0.9832), see Sokal 

and Rohlf (1962). 
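
UPGMA can be sketched in a few lines. The sketch below is a plain reimplementation for illustration, not the software used in the study, and it does not compute the correlation with the original distances. The distances are those between the five North Germanic varieties as listed in Table 4; the function names are ours.

```python
from itertools import combinations

# Pairwise distances (%) between the North Germanic varieties, from Table 4.
D = {
    frozenset(("Danish", "Swedish")): 47.0,
    frozenset(("Danish", "Norwegian")): 43.8,
    frozenset(("Swedish", "Norwegian")): 43.4,
    frozenset(("Icelandic", "Faroese")): 54.1,
    frozenset(("Icelandic", "Swedish")): 58.7,
    frozenset(("Icelandic", "Norwegian")): 62.6,
    frozenset(("Icelandic", "Danish")): 62.7,
    frozenset(("Faroese", "Swedish")): 53.6,
    frozenset(("Faroese", "Norwegian")): 57.2,
    frozenset(("Faroese", "Danish")): 58.5,
}

def average_linkage(c1, c2):
    """UPGMA distance: mean over all pairs of leaves drawn from the two clusters."""
    return sum(D[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

def upgma(leaves):
    """Return the merge history as (cluster_a, cluster_b, height) tuples."""
    clusters = [(leaf,) for leaf in leaves]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: average_linkage(*pair))
        merges.append((a, b, average_linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

merges = upgma(["Danish", "Swedish", "Norwegian", "Icelandic", "Faroese"])
```

On these five varieties Swedish and Norwegian are merged first, Danish joins them next, Icelandic and Faroese form a second cluster, and the two clusters are joined last.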

Second, we ranked all varieties in order of relationship with the standard 

languages, Frisian and Town Frisian (see Section 5.2). When ranking with 

relation to Frisian, we looked at the average over all Frisian dialects. Since 

the ratings with respect to each of the Frisian varieties individually were 

very similar, averaging was justified.


5.1. The classification of the Germanic languages 

Looking at the clusters of language varieties in Figure 5 we note that our 

results reflect the traditional classification of the Germanic languages to a 

large extent (see Figure 1). On the highest level there is a division between 

English and the other Germanic languages. When we examine the group of 

other Germanic languages, we find a clear division between the North 

Germanic languages and the West Germanic languages. Within the North 

Germanic group, we see a clear division between the Scandinavian 

languages (Danish, Norwegian and Swedish) on the one hand and the 

Faroese and Icelandic on the other hand. In the genetic tree (see Figure 1), 

Norwegian is clustered with Icelandic and Faroese. However, due to the 

isolated position of Iceland and the Faroes and intensive language contact 

between Norway and the rest of Scandinavia, modern Norwegian has 

become very similar to the modern languages of Denmark and Sweden. All 

varieties spoken in the Netherlands, including the Frisian varieties, cluster 

together, and German clusters more closely to these varieties than English. 

Figure 5. Dendrogram showing the clustering of the 15 language varieties in our 

study. The scale distance shows average Levenshtein distances in 

percentages. 

All Frisian dialects form a cluster. This clustering corresponds well with 

the traditional classification as sketched in Section 3.1. The dialects of 

Hijum and Oosterbierum belong to Klaaifrysk and these dialects form a 

cluster. The Wâldfrysk dialects of Westergeest and Wetsens also cluster 

together. The Levenshtein distance between the four dialects is small,


ranging from 19.6% between Hijum and Oosterbierum to 23.8% between 

Oosterbierum and Westergeest. Also the Súdwesthoeksk dialects, 

represented by the Tjerkgaast dialect, are rather close to the Klaaifrysk and 

Wâldfrysk dialects (distances between 21.6% and 26.4%). The highly 

conservative dialect of Hindeloopen is more deviant from the other dialects 

(distances between 29.8% and 32.5%) and this is also the case for the Town 

Frisian dialect of Leeuwarden which is more similar to Dutch (20.3%) than 

to Frisian (between 32.3% and 35.8%) which confirms the characterization 

of Town Frisian by Kloeke (1927) as ‘Dutch in Frisian mouth’. 

5.2. The relationship between Frisian and the other Germanic languages 

From Tables 3 and 4 it is possible to determine the distance between all 

Germanic standard languages. We are especially interested in the position 

of Frisian within the Germanic language group. For this purpose the mean 

distance over the 6 Frisian dialects (excluding the dialect of Leeuwarden 

which is considered Dutch) has been added. This makes it possible to treat 

Frisian as one language. Examining the column which shows the ranking 

with respect to Frisian, we find that Dutch is most similar to Frisian (a 

mean distance of 38.7%). Clearly the intensive contact with Dutch during 

history has had a great impact on the distance between the two languages. 

Moreover, German appears to be closer to Frisian than any other language 

outside the Netherlands. Looking at the ranking with respect to Dutch, it 

appears that Town Frisian is most similar (Leeuwarden 20.3%), followed 

by the Frisian varieties (average of 38.7%). Next, German is most similar, 

due to common historical roots and continuous contact (a distance of 

53.3%). 
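
The mean distance between Dutch and Frisian can be recomputed from the individual values in the Dutch column of Table 3. The short check below uses only those published values; the variable names are ours.

```python
# Distances (%) between Dutch and the six Frisian dialects, from Table 3.
dutch_to_frisian = {
    "Hindeloopen": 37.5,
    "Westergeest": 37.7,
    "Wetsens": 38.3,
    "Tjerkgaast": 38.5,
    "Hijum": 38.9,
    "Oosterbierum": 41.3,
}

mean_frisian = sum(dutch_to_frisian.values()) / len(dutch_to_frisian)

# Ranking of the Frisian dialects by their distance to Dutch, closest first.
ranking = sorted(dutch_to_frisian, key=dutch_to_frisian.get)
```

Rounded to one decimal, the mean equals the 38.7% reported for Frisian as a whole, and Hindeloopen heads the ranking.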

As discussed in the introduction, Friesland has a long history of 

language contact with the Scandinavian countries, and traces of 

Scandinavian influences can be found in the Frisian language. The impact 

of this contact is reflected in our results only to a limited extent. 

Remarkably, the distances to the mainland Scandinavian languages 

(Danish, Norwegian and Swedish) are smaller (between 60.7% and 63.3%) 

than to English (65.3%) even though the Frisian language is genetically 

more closely related to English than to the Scandinavian languages (see Section 2.1).


Table 3. Ranked Levenshtein distances in percentages between each of the five 

West Germanic languages and the other language varieties in the 

investigation. The Frisian column is based on the mean over the six Frisian 

dialects and therefore has no entries for the individual Frisian dialects. 

Frisian         Leeuwarden      Dutch           English         German
                Dutch     20.3  Leeuw     20.3  Hindel    63.1  Dutch     53.3
                Wetsens   32.3  Hindel    37.5  Wetsens   64.4  Leeuw     54.2
                Westerg   32.7  Westerg   37.7  Dutch     64.7  Hindel    56.2
                Frisian   34.2  Wetsens   38.3  Swedish   64.9  Westerg   56.9
                Oosterb   34.3  Tjerkg    38.5  Leeuw     65.1  Oosterb   57.2
                Hindel    34.9  Frisian   38.7  Tjerkg    65.2  Tjerkg    57.3
Leeuw     34.2  Tjerkg    35.3  Hijum     38.9  Frisian   65.3  Frisian   57.3
Dutch     38.7  Hijum     35.8  Oosterb   41.3  Hijum     65.8  Hijum     57.5
German    57.3  German    54.2  German    53.3  Westerg   65.8  Wetsens   58.6
Swedish   60.7  Swedish   59.2  Swedish   60.9  Danish    66.7  Swedish   61.0
Norweg    60.9  Norweg    60.0  Norweg    61.4  Faroese   67.1  Danish    63.5
Danish    63.3  Danish    61.1  Danish    63.4  Oosterb   67.2  Norweg    64.0
English   65.3  English   65.1  English   64.7  German    68.1  Faroese   67.1
Faroese   67.7  Faroese   67.5  Faroese   66.1  Norweg    68.6  English   68.1
Icelandic 70.0  Icelandic 69.6  Icelandic 69.2  Icelandic 69.1  Icelandic 68.5

Table 4. Ranked Levenshtein distances in percentages between each of the five 

North Germanic languages and the other language varieties in the 

investigation. 

Danish          Swedish         Norwegian       Icelandic       Faroese
Norweg    43.8  Norweg    43.4  Swedish   43.4  Faroese   54.1  Swedish   53.6
Swedish   47.0  Danish    47.0  Danish    43.8  Swedish   58.7  Icelandic 54.1
Faroese   58.5  Faroese   53.6  Faroese   57.2  Norweg    62.6  Norweg    57.2
Leeuw     61.1  Icelandic 58.7  Westerg   59.6  Danish    62.7  Danish    58.5
Westerg   62.2  Hindel    59.2  Leeuw     60.0  German    68.5  Dutch     66.1
Wetsens   62.3  Leeuw     59.2  Hindel    60.2  Tjerkg    69.1  Hindel    67.0
Icelandic 62.7  Westerg   59.6  Tjerkg    60.6  English   69.1  English   67.1
Hijum     62.9  Tjerkg    60.0  Wetsens   60.7  Dutch     69.2  German    67.1
Frisian   63.3  Frisian   60.7  Frisian   60.9  Leeuw     69.6  Westerg   67.4
Hindel    63.4  Dutch     60.9  Dutch     61.4  Hijum     69.8  Leeuw     67.5
Dutch     63.4  German    61.0  Oosterb   61.9  Frisian   70.0  Tjerkg    67.5
German    63.5  Wetsens   61.1  Hijum     62.6  Wetsens   70.1  Frisian   67.5
Tjerkg    63.8  Oosterb   61.4  Icelandic 62.6  Hindel    70.1  Oosterb   67.7
Oosterb   65.2  Hijum     62.7  German    64.0  Oosterb   70.3  Wetsens   68.1
English   66.7  English   64.9  English   68.6  Westerg   70.3  Hijum     68.2


So, when looking at the results from a Frisian perspective, the close genetic 

relationship with English is not reflected in our results. Of the Germanic 

languages in our investigation, only Icelandic and Faroese are less similar 

to Frisian than English. However, when looking at the results from an 

English perspective, we discover that of all Germanic language varieties in 

our material the Frisian dialect of Hindeloopen is most similar to English. 

As mentioned before, this dialect is highly conservative and furthermore it 

is spoken in a coastal place, which provides for easy contact with England. 

Also the Frisian dialect of Wetsens is more similar to English than the 

remaining Germanic languages. The other Frisian varieties are found 

elsewhere in the middle of the ranking. Among the non-Frisian varieties, 

Dutch appears to be most similar to English. However, all Germanic 

languages, including Frisian and Dutch, show a large linguistic distance to 

English, all distances being above 60%. The development of the English 

language has thus clearly taken place independently from the other 

Germanic languages, which can be explained by the strong influence from 

non-Germanic languages, especially French. 

Also Icelandic shows a large distance to all other Germanic languages 

(from 54.1% to 70.0%), but in the Icelandic case this is explained by the 

conservative nature of this language rather than by language contact 

phenomena. Faroese is somewhat less conservative, but still shows rather 

large distances to the other languages (between 53.6% and 67.7%). The 

distances between the other Nordic languages are smaller (between 43.4% 

and 47%), as was expected given that the three Scandinavian languages are 

mutually intelligible. 

6. Conclusions and discussion 

Overall, the classification of the Germanic languages resulting from our 

distance measurements supports our predictions. This goes for the 

classification of the Frisian dialects and also for the rest of the Germanic 

languages. We interpret this as a confirmation of the suitability of our 

material showing that it is possible to measure Levenshtein distances on the 

basis of whole texts with assimilation phenomena typical of connected 

speech and with a rather limited number of words. 

The aim of the present investigation was to get an impression of the 

position of the Frisian language in the Germanic language area on the basis 

of quantitative data. The fact that Frisian is genetically most closely related


to English yields the expectation that these two languages may still be 

linguistically similar. However, the distance between English and the 

Frisian dialects is large. We can thus conclude that the close genetic 

relationship between English and Frisian is not reflected in the linguistic 

distances between the modern languages. Geographical and historical 

circumstances have caused the two languages to drift apart linguistically. 

Frisian has been strongly influenced by Dutch whereas English has been 

influenced by other languages, especially French. 

It would have been interesting to include these languages in our 

material. This would have given an impression of their impact on the 

English language. At the same time it would also have given us the 

opportunity to test the Levenshtein method on a larger language family than 

the Germanic family with its relatively closely related languages. It would 

also be interesting to include Old English in our material since this would 

give us an impression of how modern Frisian is related to the English 

language at a time when it had only recently separated from the common 

Anglo-Saxon roots to which also Old Frisian belonged. 

For many centuries Frisian has been under strong influence from 

Dutch and the Frisian and Dutch language areas share a long common 

history. It therefore does not come as a surprise that Dutch is the Germanic 

language most similar to the language varieties spoken in Friesland. 

It may be surprising that the linguistic distances between Dutch and the 

Frisian dialects are smaller than the distances between the Scandinavian 

languages (a mean difference of 6%). Scandinavian languages are known to 

be mutually intelligible. This means that when, for example, a Swede and a 

Dane meet, they mostly communicate each in their own language. This 

kind of communication, which is known as semi-communication (Haugen, 

1966), is not typical in the communication between Dutch-speaking and 

Frisian-speaking citizens in the Netherlands. The two languages are 

considered so different that it is not possible for a Dutch-speaking person to 

understand Frisian and consequently the Frisian interlocutor will have to 

speak Dutch to a non-Frisian person. Our results raise the question whether 

semi-communication would also be possible in a Dutch-Frisian situation. If 

this is not the case, we may explain this by linguistic and non-linguistic 

differences between the Frisian-Dutch situation and the Scandinavian 

situation. The Levenshtein distance processes lexical, phonetic and 

morphological differences. All three types are present in our transcription, 

since word lists are derived from running texts. Syntactic characteristics are 

completely excluded from the analysis. It might be the case that certain


characteristics play a larger role for the Levenshtein distances than 

desirable in the case of the Scandinavian languages if we were to use the 

method for explaining mutual intelligibility. For example, it is well known 

among speakers of the Scandinavian languages that many words 

end in an ‘a’ in Swedish while ending in an ‘e’ in Danish. Probably people 

use this knowledge in an inter-Scandinavian situation. However, this 

difference is included in the Levenshtein distances between Swedish and 

Danish. It is possible that Frisian-Dutch differences are less predictable or 

less well-known by speakers of the two languages. It is also possible that 

the difference in communication in the Netherlands and in Scandinavia 

should be sought at the extra-linguistic level. Scandinavian research on 

semi-communication has shown that the willingness to understand and the 

belief that it is possible to communicate play a large role for mutual 

intelligibility between speakers of closely related languages. 

Staying with the Scandinavian languages, it should be noted that the 

mainland Scandinavian languages are in fact closer to Frisian than English, 

even though the Scandinavian languages belong genetically to another 

Germanic branch than English and Frisian. This can probably be explained 

by intensive contacts between Frisians and Scandinavians for many 

centuries. However, the common idea among some speakers of Frisian and 

Scandinavian that the two languages are so close that they are almost 

mutually intelligible is not confirmed by our results, at least not as far as 

the standard Scandinavian languages are concerned. Probably this popular 

idea is built on the fact that a few frequent words are identical in Frisian 

and Scandinavian. It is possible, however, that this picture would change if 

we included more Danish dialects in our material. For example, it 

seems to be relatively easy for fishermen from Friesland to speak to their 

colleagues from the west coast of Denmark. Part of the explanation might 

also be that fishermen share a common vocabulary of professional terms. 

Also the frequent contact and a strong motivation to communicate 

successfully are likely to be important factors. 

As we mentioned in the introduction, among dialects in the Netherlands 

and Flanders, the Frisian varieties are most deviant from Standard Dutch. 

However, among the varieties which are recognized as languages in the 

Germanic language area, Frisian is most similar to Dutch. The smallest 

distance between two languages, apart from Frisian, was found between 

Norwegian and Swedish: 43.4%. The distance between Frisian and Dutch is 

smaller: 38.7%. The Town Frisian variety of the capital of Friesland 

(Leeuwarden) has a distance of only 20.3% to Dutch. Although the


recognition of Frisian as the second official language in the Netherlands is right 

in our opinion, we found that the current linguistic position of Frisian 

provides too little foundation for becoming independent from the 

Netherlands, as some Frisians may wish 8 . 

Acknowledgements 

This research would have been impossible without informants who were 

willing to translate the story of ‘the Northwind and the Sun’. We wish to 

thank G. Blom (Hindeloopen), J. Spoelstra (Hijum) and W. Visser 

(Oosterbierum). All of them are affiliated with the Fryske Akademy in 

Leeuwarden. We also thank S. van Dellen (Wetsens), T. de Graaf 

(Leeuwarden), F. Postma (Tjerkgaast) and O. Vries (Westergeest), all of 

them employees of the University of Groningen. We thank J. Allen 

(England), A. Mikaelsdóttir (Iceland), Vigdis Petersen (the Faroes), R. 

Kraayenbrink (the Netherlands), K. Sjöberg (Sweden) and R. Schmidt 

(Germany). We are also very grateful to Jørn Almberg for making available 

the recording of Lillehammer (Norway). The recordings and transcriptions 

of the Frisian varieties were made by the second author, and those of the 

standard languages (except Norway and the Faroes) by the first author. The 

transcriptions were subsequently checked by Jørn Almberg, whom we thank 

for correcting our transcriptions. Furthermore, we wish to 

express our gratitude to Peter Kleiweg for his software for creating the map 

(Figure 2) and visualizing the dendrogram (Figure 5). Finally we thank 

Maartje Schreuder for reading an earlier version of this article and giving 

useful comments and Angeliek van Hout for reviewing our English. 

Notes 

1 Dr. Tjeerd de Graaf, the central figure in this volume, was born in Leeuwarden, 

the capital of Friesland. Leeuwarden is one of the places where Town Frisian is 

spoken. Tjeerd de Graaf is a native speaker of this dialect, but later on he also 

learned (standard) Frisian. The Leeuwarden speaker in the present investigation 

was Tjeerd de Graaf (see Section 3.1). 

2 Most of this section is based on König and Van der Auwera (1994). 

3 The Lillehammer recording can be found at http://www.ling.hf.ntnu.no/nos/ 

together with 52 recordings of other Norwegian dialects.


4 Since our material included two toneme languages, Swedish and Norwegian, 

also the two tonemes I and II were transcribed. For the other varieties primary 

stress was noted. Stress and tonemes were, however, not included for 

calculation of linguistic distances. 

5 The program PRAAT is a free public-domain program developed by Paul 

Boersma and David Weenink at the Institute of Phonetic Sciences of the 

University of Amsterdam and available at http://www.fon.hum.uva.nl/praat. 

6 The data is taken from the Linguistic Atlas of the Middle and South Atlantic 

States (LAMSAS) and available via: http://hyde.park.uga.edu/lamsas/. 

7 The example should not be interpreted as a historical reconstruction of the way 

in which one pronunciation changed into another. From that point of view it 

may be more obvious to show how [ˌæftərˈnuːn] changed into [ˌæəftəˈnʉːn]. We 

just show that the distance between two arbitrary pronunciations is found on the 

basis of the least costly set of operations mapping one pronunciation into 

another. 

8 Tjeerd de Graaf has never taken such an extreme position. Possibly speakers of 

Town Frisian have a more moderate opinion towards this issue since Town 

Frisian is more closely related to standard Dutch, as appeared in Figure 5 and 

Table 3. 

References 

Bolognesi, R. and W. Heeringa (2002). De invloed van dominante talen op het 

lexicon en de fonologie van Sardische dialecten. In: D. Bakker, 

T. Sanders, R. Schoonen and Per van der Wijst (eds.). 

Gramma/TTT: tijdschrift voor taalwetenschap. Nijmegen 

University Press, Nijmegen, 9 (1): 45-84. 

Feitsma, T. (1963). Sproglige berøringer mellem Frisland og Skandinavien. 

Sprog og kultur, 23: 97-121. 

Gooskens, Ch. and W. Heeringa (submitted). Perceptive Evaluation of 

Levenshtein Dialect Distance Measurements Using Norwegian 

Dialect Data. (submitted to Language Variation and Change). 

Goossens, J. (1977). Inleiding tot de Nederlandse Dialectologie. Wolters- 

Noordhoff, Groningen. 

Haugen, E. (1966). Semicommunication: The Language Gap in Scandinavia. 

Sociological Inquiry, 36 (2): 280-297. 

Heeringa, W. (2004). Measuring Dialect Pronunciation Differences using 

Levenshtein Distance. Doctoral dissertation. University of 

Groningen.


Heeringa, W. and C. Gooskens (2003). Norwegian Dialects Examined 

Perceptually and Acoustically. In: J. Nerbonne and W. 

Kretzschmar (eds.). Computers and the Humanities. Kluwer 

Academic Publishers, Dordrecht, 37 (3): 293-315. 

Hof, J. J. (1933). Friesche Dialectgeographie. ‘s Gravenhage (Noord- en Zuid- 

Nederlandse Dialectbibliotheek 3). 

Hoppenbrouwers, C. and G. Hoppenbrouwers (1988). De featurefrequentie 

methode en de classificatie van Nederlandse dialecten. TABU: 

Bulletin voor Taalwetenschap, 18 (2): 51-92. 

Jain, A.K. and R.C. Dubes (1988). Algorithms for Clustering Data. Prentice 

Hall, Englewood Cliffs, New Jersey. 

Johnson, K. (1997). Acoustic and Auditory Phonetics. Blackwell Publishers, 

Cambridge etc. 

Kessler, B. (1995). Computational dialectology in Irish Gaelic. In: Proceedings 

of the 7th Conference of the European Chapter of the Association 

for Computational Linguistics. EACL, Dublin, 60-67. 

Kloeke, G. G. (1927). De Hollandsche expansie in de zestiende en zeventiende 

eeuw en haar weerspiegeling in de hedendaagsche 

Nederlandsche dialecten. Nijhoff, ‘s-Gravenhage. 

König, E. and J. van der Auwera (1994). eds. The Germanic Languages. 

Routledge, London. 

Miedema, H.T.J. (1966). Van York naar Jorwerd. Enkele problemen uit de 

Friese taalgeschiedenis. J.B. Wolters, Groningen. 

Nerbonne, J., W. Heeringa, E. van den Hout, P. van der Kooi, S. Otten, and W. 

van de Vis, (1996). Phonetic Distance between Dutch dialects. 

In: G. Durieux, W. Daelemans, and S. Gillis (eds.). CLIN VI, 

Papers from the sixth CLIN meeting. Antwerpen. University of 

Antwerp, Center for Dutch Language and Speech, 185-202. 

Nerbonne, J. and W. Heeringa (1997). Measuring dialect distances 

phonetically. In: J. Coleman (ed.). Workshop on Computational 

Phonology. Madrid, 11-18. 

Pei, M. (1966). The story of language. Allen & Unwin, London. 

Potter, R.K., G.A. Kopp and H.C. Green (1947). Visible Speech. The Bell 

Telephone Laboratories Series. Van Nostrand, New York. 

Rietveld, A.C.M. and V.J. Van Heuven (1997). Algemene fonetiek. Coutinho, 

Bussum. 

Sandøy, H. (1994). Utan kontakt og endring? In: U.-B. Kotsinas and J. 

Helgander (eds.). Dialektkontakt, språkkontakt och 

språkförändring i Norden. Almqvist & Wiksell International, 

Stockholm, 38-51. 

Sokal, R.R. and F.J. Rohlf (1962). The comparison of dendrograms by 

objective methods. Taxon, 11: 33-40.


Stevens, K.N. (1998). Acoustic Phonetics. MIT Press, Cambridge. 

The Columbia Encyclopedia (2001). www.bartleby.com/65/fr/Frisianl.html 

The International Phonetic Association (1949). The principles of the 

International Phonetic Association: being a description of the 

International Phonetic Alphabet and the manner of using it, 

illustrated by texts in 51 languages. International Phonetic 

Association, London. 

The International Phonetic Association (1999). Handbook of the International 

Phonetic Association: a guide to the use of the International 

Phonetic Alphabet. Cambridge University Press, Cambridge. 

Trask, R.L. (1996). A Dictionary of Phonetics and Phonology. Routledge, 

London and New York. 

Van der Veen, K. F. (2001). West Frisian Dialectology and Dialects. In: H. H. 

Munske (ed.). Handbook of Frisian Studies. Niemeyer, 

Tübingen, 83-98. 

Vieregge, W. H., A.C.M. Rietveld and C. Jansen (1984). A distinctive feature 

based system for the evaluation of segmental transcription in 

Dutch. In: M.P.R. van den Broecke and A. Cohen. Proceedings 

of the 10 th International Congress of Phonetic Sciences. Foris 

Publications, Dordrecht and Cinnaminson, 654-659. 

Visser, W. (1997). The syllable in Frisian. Holland Academic Graphics, The 

Hague. 

Wadstein, E. (1933). On the Relations between Scandinavians and Frisians in 

Early Times. University of London, London. 

Wells, J. and J. House (1995). The sounds of the International Phonetic 

Alphabet. UCL, London. 

Zwicker, E. and H. Fastl (1990). Psychoacoustics and Models. Springer Verlag, 

Berlin. 

Zwicker, E. and R. Feldtkeller (1967). Das Ohr als Nachrichtemfänger. 

Monographien der elektrischen Nachrichtentechnik. 19, 2 nd 

revised edition. Hirzel, Stuttgart.

Learning Phonotactics with Simple Processors 

John Nerbonne and Ivilin Stoianov 

Abstract 

This paper explores the learning of phonotactics in neural networks. 

Experiments are conducted on the complete set of over 5,000 Dutch 

monosyllables extracted from CELEX, and the results are shown to be 

accurate within 5% error. Extensive comparisons to human phonotactic 

learning conclude the paper. We focus on whether phonotactics can be 

effectively learned and how the learning which is induced compares to 

human behavior. 

1. Introduction 


Phonotactics concerns the organization of the phonemes in words and 

syllables. The phonotactic rules of a language constrain how phonemes combine in order 

to form larger linguistic units (syllables and words) in that language (Laver, 

1994). For example, Cohen, Ebeling & van Holk (1972) describe the 

phoneme combinations possible in Dutch, which will be the language in 

focus in this study. 

Phonotactic rules are implicit in natural languages so that humans 

require no explicit instruction about which combinations are allowed and 

which are not. An explicit phonotactic grammar can of course be abstracted 

from the words in a language, but this is an activity linguists engage in, not 

language learners in general. Children normally learn a language's 

phonotactics in their early language development and probably update it 

only slightly once they have mastered the language. 

Most work on language acquisition has arisen in linguistics and 

psychology, and that work employs mechanisms that have been developed 

for language, typically discrete symbol-manipulation systems. 

Phonotactics in particular has been modeled with n-gram models, Finite


State Machines, Inductive Logic Programming, etc. (Tjong Kim Sang, 

1998; Konstantopoulos, 2003). Such approaches are effective, but a 

cognitive scientist may ask whether the same success could be possible 

using less custom-made tools. The brain, viewed as a computational 

machine, exploits other principles, which have been modeled in the 

approach known as Parallel Distributed Processing (PDP), which was 

thoroughly described in the seminal work of Rumelhart & McClelland 

(1986). Computational models inspired by the brain structure and neural 

processing principles are Neural Networks (NNs), also known as 

connectionist models. 

Learning phonotactic grammars is not an easy problem, especially when 

one restricts one's attention to cognitively plausible models. Since 

languages are experienced and produced dynamically, we need to focus on 

the processing of sequences, which complicates the learning task. The 

history of research in connectionist language learning shows both successes 

and failures even when one concentrates on simpler structures, such as 

phonotactics (Stoianov, Nerbonne & Bouma, 1998; Stoianov & Nerbonne, 

2000; Tjong Kim Sang, 1995; Tjong Kim Sang & Nerbonne, 1999; Pacton, 

Perruchet, Fayol & Cleeremans, 2001). 

This paper will attack phonotactics learning with models that have no 

specifically linguistic knowledge encoded a priori. The models naturally do 

have a bias, viz., toward extracting local conditioning factors for 

phonotactics, but we maintain that this is a natural bias for many sorts of 

sequential behavior, not only linguistic processing. A first-order Discrete 

Time Recurrent Neural Network (DTRNN) (Carrasco, Forcada & Neco, 

1999; Tsoi & Back, 1997) will be used: the Simple Recurrent Network 

(SRN) (Elman, 1988). SRNs have been applied to different language 

problems (Elman, 1991; Christiansen & Chater, 1999; Lawrence, Giles & 

Fong, 1995), including learning phonotactics (Shillcock, Levy, Lindsey, 

Cairns & Chater, 1993; Shillcock, Cairns, Chater & Levy, 1997). With 

respect to phonotactics, we have also contributed reports (Stoianov et al., 

1998; Stoianov & Nerbonne, 2000; Stoianov, 1998). 

SRNs have been shown capable of representing regular languages 

(Omlin & Giles, 1996; Carrasco et al., 1999). Kaplan & Kay (1994) 

demonstrated that the apparently context-sensitive rules that are standardly 

found in phonological descriptions can in fact be expressed within the more 

restrictive formalism of regular relations. We begin thus with a device 

which is in principle capable of representing the needed patterns.


We then simulate the language learning task by training networks to 

produce context-dependent predictions. We also show how the continuous 

predictions of trained SRNs - likelihoods that a particular token can follow 

the current context - can be transformed into more useful discrete 

predictions, or, alternatively, string recognitions. 

In spite of the above claims about representability, the 

Back-Propagation (BP) and Back-Propagation Through Time (BPTT) learning 

algorithms used to train SRNs do not always find optimal solutions - SRNs 

that produce only correct context-dependent successors or recognize only 

strings from the training language. Hence, section 3 focuses on the practical 

demonstration that a realistic language learning task may be simulated by 

an SRN. We evaluate the network learning from different perspectives - 

grammar learning, phonotactics learning, and language recognition. The 

last two methods need one language-specific parameter - a threshold - that 

distinguishes successors/words allowed in the training language. This 

threshold is found with a post-training procedure, but it could also be 

sought interactively during training. 

Finally, section 4 assesses the networks from linguistic and 

psycholinguistic perspectives: a static analysis extracts acquired linguistic 

knowledge from network weights, and the network performance is 

compared to humans' in a lexical decision task. The network performance, 

in particular the distribution of errors as a function of string position, will 

be compared to alternative construals of Dutch syllabic structure - 

following a suggestion from discussions of psycholinguistic experiments 

about English syllables (Kessler & Treiman, 1997). 

1.1. Motivations for a Phonotactic Device 

This section will review standard arguments that demonstrate the cognitive 

and practical importance of phonotactics. English phonotactic rules such as: 

‘/s/ may precede, but not follow /t/ syllable-initially’ 

(ignoring loanwords such as `tsar' and `tse-tse') may be adduced by judging 

the well-formedness of sequences of letters/phonemes, taken as words in 

the language, e.g. /st�p/ vs. */ts�p/. There may also be cases judged to be of 

intermediate acceptability. So, even if all of the following are English 

words:


/m��/ `mother', /f��/ `father', /s�st��/ `sister' 

None of the following are, however: 

*/m��/, */f��/, */tss��/ 

None of these sound like English words. However, the following 

sequences: 

/m��/, /fu��/, /s�nt��/ 

"sound" much more like English, even if they mean nothing and therefore 

are not genuine English words. We suspect that, e.g., /s�nt��/ 'santer', could 

be used to name a new object or a concept. 

This simple example shows that we have a feeling for word structure, 

even if we lack explicit knowledge of it. Given the huge variety of words, it is more 

efficient to put this knowledge into a compact form - a set of phonotactic 

rules. These rules would state which phonemic sequences sound correct and 

which do not. In this same vein, second language learners experience a 

period when they recognize that certain phonemic combinations (words) 

belong to the language they learn without knowing the meaning of these 

words. 

Convincing psycholinguistic evidence that we make use of phonotactics 

comes from studying the information sources used in word segmentation 

(McQueen, 1998). In a variety of experiments, this author shows that word 

boundary locations are likely to be signaled by phonotactics. The author 

rules out the possibility that other sources of information, such as prosodic 

cues, syllabic structure and lexemes, are sufficient for segmentation. 

Similarly, Treiman & Zukowski (1990) had shown earlier that phonotactics 

play an important role in the syllabification process. According to 

McQueen (1998), phonotactic and metrical cues play complementary roles 

in the segmentation process. In accordance with this, some researchers have 

elaborated on a model for word segmentation: the Possible Word 

Constraints Model (Norris, McQueen, Cutler & Butterfield, 1997), in which 

likely word-boundary locations are marked by phonotactics, metrical cues, 

etc., and in which they are further fixed by using lexicon-specific 

knowledge.


Exploiting the specific phonotactics of Japanese, Dupoux, Pallier, 

Kakehi & Mehler (2001) conducted an experiment with Japanese listeners 

who heard stimuli that contained illegal consonant clusters. The listeners 

tended to hear an acoustically absent vowel that brought their perception 

into line with Japanese phonotactics. The authors were able to rule out 

lexical influences as a putative source for the perception of the illusory 

vowel, which suggests that speech perception must use phonotactic 

information directly. 

Further justification for the postulation of a neurobiological device that 

encodes phonotactics comes from neurolinguistic and neuroimaging 

studies. It is widely accepted that the neuronal structure of Broca’s area (in 

the brain's left frontal lobe) is used for language processing, and more 

specifically that it represents a general sequential device (Stowe, Wijers, 

Willemsen, Reuland, Paans & Vaalburg, 1994; Reilly, 2002). A general 

sequential processor capable of working at the phonemic level would be a 

plausible realization of a neuronal phonotactic device. 

Besides cognitive modeling, there are also a number of practical 

problems that would benefit from effective phonotactic processing. In 

speech recognition, for example, a number of hypotheses that explain the 

speech signal are created, from which the impossible sound combinations 

have to be filtered out before further processing. This exemplifies a lexical 

decision task, in which a model is trained on a language L and then tests 

whether a given string belongs to L. In such a task a phonotactic device 

would be of use. Another important problem in speech recognition is word 

segmentation. Speech is continuous, but we divide it into psychologically 

significant units such as words and syllables. As noted above, there are a 

number of cues that we can use to distinguish these elements - prosodic 

markers, context, but also phonotactics. Similarly to the former problem, an 

intuitive strategy here is to split the phonetic/phonemic stream at the points 

of violation of phonotactic constraints (see Shillcock et al. (1997) and 

Cairns, Shillcock, Chater & Levy (1997) for connectionist modeling). 

Similarly, the constraints on the letters forming words in written languages 

(graphotactics) are useful in word processing applications, for example, 

spell-checking. 

There is another, more speculative aspect to investigating phonotactics. 

Searching for an explanation of the structure of the natural languages, 

Carstairs-McCarthy presented in his recent book (1999) an analogy 

between syllable structure and sentence structure. He argues that sentences 

and syllables have a similar type of structure. Therefore, if we find a proper


mechanism for learning the syllabic structures, we might apply a similar 

mechanism to learning syntax as well. Of course, syntax is much more 

complex and more challenging, but if Carstairs-McCarthy is right, the basic 

principles of both devices might be the same. 

2. Simple Recurrent Networks 

This section will briefly present Simple Recurrent Networks (Elman, 1988; 

Robinson & Fallside, 1988) and will review earlier studies of sequential, 

especially phonotactic learning. Detailed descriptions of the SRN 

processing mechanisms and the Back-Propagation Through Time learning 

algorithm that is used to train the model are available elsewhere (Stoianov, 

2001; Haykin, 1994), and will be reviewed only superficially. 

Figure 1. Learning phonotactics with the SRNs. If the training data set contains the 

words /nɛt#/, /nɛts#/ and /nɛtʋɛrk#/, then after the network has processed 

a left context /nɛ/, the reaction to an input token /t/ will be active neurons 

corresponding to the symbol '#' and the phonemes /s/ and /ʋ/. 

Simple Recurrent Networks (SRNs) were invented to encode simple 

artificial grammars, as an extension of the Multilayer Perceptron 

(Rumelhart, Hinton & Williams, 1986) with an extra input - a context layer 

that holds the hidden layer activations at the previous processing cycle. 

After training, Elman (1988) conducted investigations on how context


evolves in time. The analysis showed graded encoding of the input 

sequence: similar items presented to the input were clustered at close, but 

different, shifting positions. That is, the network discovered and implicitly 

represented in a distributed way the rules of the grammar generating the 

training sequences. This is noteworthy, because the rules for context were 

not encoded, but rather acquired through experience. The capacity of SRNs 

to learn simple artificial languages was further explored in a number of 

studies (Cleeremans, Servan-Schreiber & McClelland, 1989; Gasser, 1992). 

SRNs have the structure shown in Figure 1. They operate as follows: 

Input sequences S_I are presented to the input layer, one element S_I(t) at a 

time. The purpose of the input layer is just to transfer activation to the 

hidden layer through a weight matrix. The hidden layer in turn copies its 

activations after every step to the context layer, which provides an 

additional input to the hidden layer - i.e., information about the past, after a 

brief delay. Finally, the hidden layer neurons output their signal through a 

second weight matrix to the output layer neurons. The activation of the 

latter is interpreted as the product of the network. Since the activation of 

the hidden layer depends both on its previous state (the context) and on the 

current input, SRNs have the theoretical capacity to be sensitive to the 

entire history of the input sequence. However, practical limitations restrict 

the time span of the context information to at most 10-15 steps 

(Christiansen & Chater, 1999). The size of the layers does not restrict the 

range of temporal sensitivity. 
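
The processing cycle just described can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in the experiments: the class name, the logistic activation function, the omission of bias units, and the use of plain lists instead of a matrix library are all choices made here for brevity. The default r = 0.1 echoes the weight-initialization range reported in section 3.1.

```python
import math
import random


def sigmoid(x):
    """Logistic activation (an assumption; the text does not name the function)."""
    return 1.0 / (1.0 + math.exp(-x))


class SimpleRecurrentNetwork:
    """Forward pass of an SRN: input + context -> hidden -> output."""

    def __init__(self, n_input, n_hidden, n_output, r=0.1, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            # Small random weights drawn uniformly from (-r, +r).
            return [[rng.uniform(-r, r) for _ in range(cols)] for _ in range(rows)]

        self.w_input = matrix(n_hidden, n_input)      # input layer   -> hidden layer
        self.w_context = matrix(n_hidden, n_hidden)   # context layer -> hidden layer
        self.w_output = matrix(n_output, n_hidden)    # hidden layer  -> output layer
        self.context = [0.0] * n_hidden               # hidden activations at t-1

    def reset(self):
        """Clear the context before a new sequence is presented."""
        self.context = [0.0] * len(self.context)

    def step(self, x):
        """Process one input vector and return the output activations."""
        hidden = []
        for row_in, row_ctx in zip(self.w_input, self.w_context):
            net_input = sum(w * v for w, v in zip(row_in, x))
            net_input += sum(w * v for w, v in zip(row_ctx, self.context))
            hidden.append(sigmoid(net_input))
        output = [sigmoid(sum(w * h for w, h in zip(row, hidden)))
                  for row in self.w_output]
        self.context = hidden  # copied to the context layer for the next step
        return output
```

Presenting the same input vector twice in a row yields two different outputs, because the second call also sees the hidden state left behind by the first; this is the sense in which the network is sensitive to the history of the sequence.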

The network operates in two working regimens - supervised training and 

network use. In the latter, the network is presented the sequential input data 

S_I(t) and computes the output N(t) using contextual information. The 

training regimen involves the same sort of processing as network use and 

also includes a second, training step, which compares the network reaction 

N(t) to the desired one S_T(t), and which uses the difference to adjust the 

network behavior in a way that improves future network performance on 

the same data. 

The two most popular supervised learning algorithms used to train 

SRNs are the standard Back-Propagation algorithm (Rumelhart et al., 1986) 

and the Back-Propagation Through Time algorithm (Haykin, 1994). While 

the former is simpler because it uses information from one previous time 

step only (the context activation, the current network activations, and 

error), the latter trains the network faster, because it collects errors from all 

time steps during which the network processes the current sequence and 

therefore it adjusts the weights more precisely. However, the BPTT


learning algorithm is also cognitively less plausible, since the collection of 

the time-spanning information requires mechanisms specific to 

symbolic methods. Nevertheless, this compromise allows more extensive 

research, and without it the problems discussed below would require much 

longer training time when using standard computers for simulations. 

Therefore, in the experiments reported here the BPTT learning algorithm 

will be used. In brief, it works in the following way: the network reaction to 

a given input sequence is compared to the desired target sequence at every 

time step and an error is computed. The network activation and error at 

each step are kept in a stack. When the whole sequence is processed, the 

error is propagated back through space (the layers) and time, and weight-updating 

values are computed. Then, the network weights are adjusted with 

the values computed in this way. 

2.1. Learning Phonotactics with SRNs 

Dell, Juliano & Govindjee (1993) showed that words could be described 

not only with symbolic approaches, using word structure and content, but 

also by a connectionist approach. In this early study of learning word 

structure with neural nets (NNs), the authors trained SRNs to predict the 

phoneme that follows the current input phoneme, given context 

information. The data sets contained 100 - 500 English words. An 

important issue in their paper is the analysis and modeling of a number of 

speech-error phenomena, which were taken as strong support for parallel 

distributed processing (PDP) models, in particular SRNs. Some of these 

phenomena were: phonological movement errors (reading list - leading 

list), manner errors (department - jepartment), phonotactic regularity 

violations (dorm - dlorm), consonant-vowel category confusions and initial 

consonant omissions (cluster-initial consonants dropping as when `stop' is 

mispronounced [t�p]). 

Aiming at segmentation of continuous phonetic input, Shillcock et al. 

(1997) and Cairns et al. (1997) trained SRNs with a version of the BPTT 

learning algorithm on English phonotactics. They used 2 million 

phonological segments derived from a transcribed speech corpus and 

encoded with a vector containing nine phonological features. The neural 

network was presented a single phoneme at a time and was trained to 

produce the previous, the current and the next phonemes. The output 

corresponding to the predicted phoneme was matched against the following


phoneme, measuring cross-entropy; this produced a varying error signal 

with occasional peaks corresponding to word boundaries. The SRN 

reportedly learned to reproduce the current phoneme and the previous one, 

but was poor at predicting the following phoneme. Correspondingly, the 

segmentation performance was quite modest, predicting only about one-fifth 

of the word boundaries correctly, but it was more successful in 

predicting syllable boundaries. It was significantly improved by adding 

other cues such as prosodic information. This means that phonotactics 

might be used alone for syllable detection, but polysyllabic word detection 

needs extra cues. 

In another connectionist study on phonological regularities, Rodd (1997) 

trained SRNs on 602 Turkish words; the networks were trained to predict 

the following phonemes. Analyzing the hidden layer representations 

developed during the training, the author found that hidden units came to 

correspond to graded detectors for natural phonological classes such as 

vowels, consonants, voiced stops and front and back vowels. This is further 

evidence that NN models can capture important properties of the data they 

have been trained on without any prior knowledge, based only on statistical 

co-occurrences. 

Learning the graphotactics and phonotactics of Dutch monosyllables 

with connectionist models was first explored by Tjong Kim Sang (1995) 

and Tjong Kim Sang & Nerbonne (1999), who trained SRNs to predict 

graphemes/phonemes based on preceding segments. The data was 

orthogonally encoded, that is, for each phoneme or grapheme there was 

exactly one neuron activated at the input and output layers (see below 3.1). 

To test the knowledge learned by the network, Tjong Kim Sang and 

Nerbonne tested whether the activations of the neurons corresponding to the 

expected symbols were greater than a threshold, defined as the lowest 

activation of an expected symbol in any correct sequence of the training data. 

This resulted in almost perfect acceptance of unseen Dutch words 

(generalization), but also in negligible discrimination with respect to (ill-formed) 

random strings. The authors concluded that “SRNs are unfit for 

processing our data set” (Tjong Kim Sang & Nerbonne, 1999). 
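
The threshold test described above can be made concrete with a small sketch. It is a hedged illustration rather than the original procedure: the function names are invented here, `toy_predict` is a hypothetical lookup table standing in for a trained network, ordinary letters replace phoneme symbols, and the comparison uses "greater than or equal" so that the training word which defines the threshold is itself accepted.

```python
def successor_activations(word, predict):
    """Activation assigned to each symbol of `word`, given its left context."""
    return [predict(word[:i]).get(word[i], 0.0) for i in range(len(word))]


def lowest_training_activation(training_words, predict):
    """Threshold: the weakest activation of an expected symbol on correct data."""
    return min(min(successor_activations(w, predict)) for w in training_words)


def accepts(word, predict, threshold):
    """Accept a string only if every expected symbol reaches the threshold."""
    return all(a >= threshold for a in successor_activations(word, predict))


def toy_predict(context):
    """Hypothetical stand-in for a trained network: context -> activations."""
    table = {
        "": {"n": 0.9, "t": 0.1},
        "n": {"e": 0.8},
        "ne": {"t": 0.9},
        "net": {"#": 0.6, "s": 0.3},
        "nets": {"#": 0.9},
    }
    return table.get(context, {})
```

With the training words "net#" and "nets#", the weakest expected activation is the 0.3 assigned to "s" after "net", so that value becomes the threshold. Because a single weakly predicted training symbol fixes the threshold for the whole language, a threshold chosen this way tends to be permissive.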

These early works on learning phonotactics with SRNs prompted the 

work reported here. First, Stoianov et al. (1998) demonstrated that the 

SRNs in Tjong Kim Sang and Nerbonne's work were learning phonotactics 

rather better than those authors had realized. By analyzing the error as a 

function of the acceptance threshold, Stoianov et al. (1998) were able to 

demonstrate the existence of thresholds successful at both the acceptance of


well-formed data and the rejection of ill-formed data (see below 3.6.2 for a 

description of how we determine such thresholds). The interval of high-performing 

thresholds is narrow, which is why earlier work had not 

identified it (see Figure 2 on how narrow the window is). More recently, 

Stoianov & Nerbonne (2000) have studied the performance of SRNs from a 

cognitive perspective, attending to the errors produced by the network and 

to the extent to which they correlate with the performance of humans on related 

lexical decision tasks. The current article ties these two strands of work 

together and presents them systematically. 

3. Experiments 

The challenge in connectionist modeling is not only developing theoretical 

frameworks, but also obtaining the most from the network models during 

experimentation. This section focuses on experiments on learning the 

phonotactics of Dutch syllables with Simple Recurrent Networks and 

discusses a number of related problems. It will be followed by a study on 

the network behavior from a linguistic point of view. 

3.1. Some implementation decisions 

SRNs were presented in section 2. A first implementation decision 

concerns how sounds are to be represented. A simple orthogonal strategy is 

to choose a vector of n neurons to represent n phonemes, to assign each 

phoneme (e.g. /�/) to a neuron (e.g., neuron 5 in a sequence of 45), and then 

to activate that one neuron and deactivate all the others whenever the 

phoneme is to be represented (so a /�/ is represented by four deactivated 

neurons, a single activated one, and then forty more deactivated neurons). 

This orthogonal strategy makes no assumptions about phonemes being 

naturally grouped into classes on the basis of linguistic features such as 

consonant/vowel status, voicing, place of articulation, etc. An alternative 

strategy exploits such features by assigning each feature to a neuron and 

then representing a phoneme via a translation of its feature description into 

a sequence of corresponding neural activations. 
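
The orthogonal strategy amounts to what is often called one-hot encoding. The following sketch illustrates it under assumptions of our own: the placeholder labels "p1" to "p44" stand in for the 44 Dutch phonemes, whose actual order in the network is not given in the text, and '#' is appended as the end-of-word symbol to reach 45 positions.

```python
def build_inventory(phonemes, end_of_word="#"):
    """Fix an order for the symbols; a symbol's position is its neuron."""
    return list(phonemes) + [end_of_word]


def one_hot(symbol, inventory):
    """Activate the single neuron assigned to `symbol`, deactivate the rest."""
    vector = [0.0] * len(inventory)
    vector[inventory.index(symbol)] = 1.0
    return vector


def encode_word(word, inventory):
    """Encode a word (a sequence of symbols) as a sequence of vectors."""
    return [one_hot(symbol, inventory) for symbol in word]


# Placeholder labels only: 44 phoneme slots plus the end-of-word symbol.
INVENTORY = build_inventory([f"p{i}" for i in range(1, 45)])
```

Encoding the fifth symbol, `one_hot("p5", INVENTORY)`, gives four deactivated positions, one activated position, and forty further deactivated positions. No two symbols share an active neuron, so the encoding carries no information about phonological similarity.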

In phonotactics learning, the input encoding method might be feature-based 

or orthogonal, but the output decoding should be orthogonal in order 

to obtain a simple prediction of successors, and to avoid a bias induced


from the peculiarities of the feature encoding scheme used. The input 

encoding chosen was also orthogonal, which requires the network to 

discover natural classes of phonemes by itself. 

The orthogonal encoding implies that we need as many neurons as we 

have phonemes, plus one for the end-of-word '#' symbol. That is, the input 

and output layers will have 45 neurons. However, it is usually difficult to 

choose the right size of the hidden layer for a particular learning problem. 

That size is rather indirectly related to the learning task and encoding 

chosen (as a subcomponent of the learning task). A linguistic bias in the 

encoding scheme, e.g., feature-based encoding, would simplify the learning 

task and decrease the number of hidden neurons required to learn it 

(Stoianov, 2001). Intuition tells us that hidden layers that are too small lead 

to an overly crude representation of the problem and larger error. Larger 

hidden layers, on the other hand, increase the chance that the network 

wanders aimlessly because the space of possibilities it needs to traverse is 

too large. Therefore, we sought an effective size in a pragmatic fashion. 

Starting with a plausible size, we compared its performance to nets with 

double and half the number of neurons in the hidden layer. We repeated in 

the direction of the better behavior, keeping track of earlier bounds in order 

to home in on an appropriate size. In this way we settled on a range of 20-80 

neurons in the hidden layer, and we continued experimentation on 

phonotactic learning using only nets of this size. 

However, even given the right size of the hidden layer, the training will 

not always result in an optimal weight set W* since the network learning is 

nondeterministic - each network training process depends on a number of 

stochastic variables, e.g., initial network weights and the order of 

presentation of examples. Therefore, in order to produce more successful 

learning, several SRNs with different initial weights were trained in a pool 

(group). 

The back-propagation learning algorithm is controlled by two main 

parameters - a learning coefficient η and a smoothing parameter α. The first 

one controls the speed of the learning and is usually set within the range 

(0.1…0.3). It is advisable to choose a smaller value when the hidden layer 

is larger. Also, this parameter may vary in time, starting with a larger initial 

value that decreases progressively in time (as suggested in Kuan, Hornik & 

White (1994) for the learning algorithm to improve its chances at attaining 

a global minimum in error). Intuitively, such a schedule helps the network 

to locate approximately the region containing the global minimum and later to 

make more precise steps in searching for that minimum (Haykin, 1994; 

Reed & Marks, 1999). The smoothing parameter α will be set to 0.7, 

which also allows the network to escape from local minima during the 

search walk over the error surface. 
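
The interplay of the two parameters can be illustrated with a generic gradient-descent update. This is a sketch under assumptions, not the training code used here: it reads the smoothing parameter as a conventional momentum term, the names `eta`, `alpha`, and `tau` are ours, and the particular decay formula in `decayed_rate` is one common choice that the text does not specify.

```python
def decayed_rate(eta0, epoch, tau=10.0):
    """Learning coefficient that starts at eta0 and decreases over time.

    The form eta0 / (1 + epoch / tau) is an assumed schedule for illustration.
    """
    return eta0 / (1.0 + epoch / tau)


def momentum_step(weights, grads, velocity, eta, alpha=0.7):
    """One weight update: a gradient step smoothed by the previous update.

    delta(t) = alpha * delta(t-1) - eta * gradient
    """
    new_velocity = [alpha * v - eta * g for v, g in zip(velocity, grads)]
    new_weights = [w + dv for w, dv in zip(weights, new_velocity)]
    return new_weights, new_velocity
```

Because each update retains a fraction `alpha` of the previous one, the search keeps moving across small bumps in the error surface instead of stopping at the first shallow minimum it meets.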

The training process also depends on the initial values of the weights. 

They are set to random values drawn from an interval (-r...+r). It is also 

important to find a proper value for r, since large initial weight values will 

produce chaotic network behavior, impeding the training. We used r = 0.1. 

The SRNs used for this problem are schematically represented in Fig. 1, 

where the SRN reaction to an input /t/ following the left context /nɛ/, after training on an 

exemplary set containing the sequences /nɛt#/, /nɛts#/, /nɛtʋɛrk#/, is given. 

For this particular database, the network has experienced the tokens '#', /s/ 

and /ʋ/ as possible successors to /nɛt/ during training and therefore it will 

activate them in response to this input sequence. 

3.2. Linguistic Data - Dutch syllables 

A database L_M of all Dutch monosyllables - 5,580 words - was extracted 

from the CELEX (1993) lexical database. CELEX is a difficult data source 

because it contains many rare and foreign words among its approximately 

350,000 Dutch lexical entries, which additionally complicate the learning 

task. Filtering out non-typical words is a formidable task and one which 

might introduce experimenter prejudice, and therefore all monosyllables 

were used. The monosyllables have a mean length of 4.1 (σ = 0.94; min = 2; 

max = 8) tokens and are built from a set of 44 phonemes plus one extra 

symbol representing space (#) used as a filler specifying end-of-word. 

The main dataset is split into a training (L_1) and a testing (L_2) database 

in a proportion of approximately 85% to 15%. The training database will be 

used to train a Simple Recurrent Network and the testing one will be used 

for evaluating the success of word recognition. Negative data also will be 

created for test purposes. The complete database L_M will be used for some 

parts of evaluation. 
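
A split of this kind might be produced as follows. The sketch assumes a simple random partition; the text does not say how words were assigned to the two sets, and the function name and seed are illustrative only.

```python
import random


def split_lexicon(words, train_fraction=0.85, seed=0):
    """Randomly partition a word list into training and testing parts."""
    shuffled = list(words)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Applied to a list of 5,580 items, this yields 4,743 training words and 837 testing words, with no word in both sets.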

In language modeling it is important to explore the frequencies of word 

occurrences which naturally bias humans' linguistic performance. If a 

model is trained on data in proportion to its empirical frequency, this 

focuses the learning on the more frequent words and thus improves the 

performance of the model. This also makes feasible a comparison of the 

model's performance with that of humans performing various linguistic 

tasks, such as a lexical decision task. For these reasons, we used the word


frequencies given in the CELEX database. Because the frequencies vary 

greatly ([0...100,000]), we presented training data items in proportion to 

the natural logarithms of their frequencies, in accordance with standard 

practice (Plaut, McClelland, Seidenberg & Patterson, 1996). This approach 

resulted in frequencies in a range of [1...12]. 
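The compression of raw frequencies into the range [1...12] can be sketched as follows. This is only an illustration: the exact rounding rule is not stated in the text, so the function below assumes ln(f + 1) rounded to the nearest integer with a floor of 1, and the word labels and counts are hypothetical.

```python
import math

def presentations_per_epoch(raw_frequency: int) -> int:
    """Map a raw corpus frequency to a per-epoch presentation count.

    Assumption (not from the paper): ln(f + 1), rounded to the nearest
    integer, with a floor of 1 so that every word is presented at least once.
    """
    return max(1, round(math.log(raw_frequency + 1)))

# Hypothetical raw frequencies; the real values come from CELEX.
toy_frequencies = {"word_a": 0, "word_b": 150, "word_c": 100_000}
schedule = {w: presentations_per_epoch(f) for w, f in toy_frequencies.items()}
```

Under this assumed rule, a word with raw frequency 0 is presented once per epoch and a word with raw frequency 100,000 twelve times, which matches the reported range.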

3.3. Difficulty 

One way to characterize the complexity of the training set is to compute the 

entropy of the distribution of successors, for every available left context. 

The entropy of a language L viewed as a stochastic process measures the 

average surprise value associated with each element (Mitchell, 1997). In 

our case, the language is a set of words and the elements are phonemes, 

hence the appropriate entropy measures the average surprise value for 

phonemes c preceded by a context s. Entropy is measured for a given 

distribution, which in our case is the set of all possible successors. We 

compute entropy Entr(s) for a given context s with (1): 

$$\mathrm{Entr}(s) = -\sum_{c \in \Sigma} p(c)\,\log_2 p(c)$$

Equation 1. Entropy 

where Σ is the alphabet of segment symbols, and p(c) the probability of 

segment c in the given context s. Then the average entropy for all 

available contexts s∈L, 

weighted with their frequencies, will be the measure of the complexity of 

the words. The smaller this measure, the less difficult are the words. The 

maximal possible value for one context would be log2(45), that is, 5.49, and 

this would only obtain for the unlikely case that each phoneme was equally 

likely in that context. The actual average value of the entropy measured for 

the Dutch monosyllables is 2.24 (σ = 1.32). The minimal value was 0.0, and 

the maximal value was 3.96. These values may be interpreted as follows: 

The minimal value of 0.0 means that there are left contexts with only one 

possible successor (log2(1) = 0). A maximal value of 3.96 means that there 

is one context which is as unpredictable as one in which 2^3.96 ≈ 16 

successors were equally likely. The mean entropy is 2.24, which is to say 

that on average about 4.7 (2^2.24) phonemes follow a given left context.
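The difficulty measure described in this section can be sketched in a few lines. The sketch makes several assumptions that are not in the text: words are plain strings with one character per phoneme, contexts are weighted by the number of word types passing through them (the logarithmic token frequencies are ignored), and prediction starts after the first phoneme.

```python
import math
from collections import Counter, defaultdict

def successor_counts(words, end="#"):
    """Count successors for every non-empty left context in `words`."""
    counts = defaultdict(Counter)
    for word in words:
        symbols = tuple(word) + (end,)
        for i in range(1, len(symbols)):
            counts[symbols[:i]][symbols[i]] += 1
    return counts

def entropy(counter):
    """Entropy in bits of the successor distribution held in `counter`."""
    total = sum(counter.values())
    return -sum((n / total) * math.log2(n / total) for n in counter.values())

def mean_entropy(words):
    """Average context entropy, weighted by how often each context occurs."""
    counts = successor_counts(words)
    weights = {ctx: sum(c.values()) for ctx, c in counts.items()}
    total = sum(weights.values())
    return sum(weights[ctx] * entropy(c) for ctx, c in counts.items()) / total

toy_words = ["net", "nets", "nek"]   # toy lexicon, one character per phoneme
counts = successor_counts(toy_words)
```

In this toy lexicon the context "net" has two equally likely successors ('#' and 's') and therefore an entropy of one bit, whereas the context "n" has a single successor and an entropy of zero.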


3.4. Negative Data 

We noted above that negative data is also necessary for evaluation. Since 

we are interested in models that discriminate more precisely the strings 

from L (the Dutch syllables), the negative data for the following 

experiments will be biased toward L. 

Three negative testing sets were generated and used. First, a set R_M 

containing strings with syllabic form [C]_{0...3}V[C]_{0...4}, based on the empirical 

observation that the Dutch monosyllables have up to three onset (word-initial) 

consonants and up to four coda (word-final) consonants. The second 

group consists of three subsets of R_M: {R_M^1, R_M^2, R_M^{3+}}, with fixed distances of 

the random strings to any existing Dutch word of 1, 2, and 3+ phonemes, 

respectively (measured by edit distance (Nerbonne, Heeringa & Kleiweg, 

1999)). Controlling for the distance to any training word allows us to assess 

more precisely the performance of the model. Finally, a third group consists of 

random strings built of concatenations of n-grams picked randomly from 

Dutch monosyllables. In particular, two sets - R_N^2 and R_N^3 - were randomly 

generated, based on bigrams and trigrams, respectively. 

The latter sets are the most "difficult" ones, especially R_N^3, 

because it consists of strings that are closest to Dutch. They are also useful 

for the comparison of SRN methods to n-gram modeling. The 

corresponding n-gram models will always wrongly recognize these random 

strings as words from the language. Where the connectionist predictor 

recognizes them as non-words, it outperforms the corresponding n-gram 

models, which are considered as benchmark models for prediction tasks 

such as phonotactics learning. 
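Why a bigram model cannot reject strings of this kind can be illustrated with a small sketch. The construction below is an assumption about how such strings might be built (overlapping bigrams chained from word boundary to word boundary); the toy lexicon and all function names are hypothetical.

```python
import random

def bigrams_of(word, boundary="#"):
    """Set of bigrams in `word`, including the word-boundary bigrams."""
    symbols = boundary + word + boundary
    return {symbols[i:i + 2] for i in range(len(symbols) - 1)}

def attested_bigrams(words):
    seen = set()
    for w in words:
        seen |= bigrams_of(w)
    return seen

def bigram_model_accepts(string, bigrams):
    """A categorical bigram model accepts a string iff all its bigrams are attested."""
    return bigrams_of(string) <= bigrams

def random_bigram_string(bigrams, rng, boundary="#", max_len=8):
    """Chain attested bigrams from boundary to boundary; None if too long."""
    current, out = boundary, []
    for _ in range(max_len + 1):
        options = sorted(b[1] for b in bigrams if b[0] == current)
        current = rng.choice(options)
        if current == boundary:
            return "".join(out) or None
        out.append(current)
    return None

lexicon = ["net", "nets", "bas", "bakt", "tak"]   # toy lexicon
bigrams = attested_bigrams(lexicon)
rng = random.Random(0)
candidates = {random_bigram_string(bigrams, rng) for _ in range(200)}
non_words = sorted(s for s in candidates if s and s not in lexicon)
```

Every string in `non_words` consists of attested bigrams only, so the bigram model accepts all of them by construction; any of these strings that another model rejects is a point gained over the bigram benchmark.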

3.5. Training 

This section reports on network training. We will add a few more details 

about the training procedure, then we will present pilot experiments aimed 

at determining the hidden layer size. The later parts will analyze the 

network performance.

3.5.1. Procedure 


The networks were trained in a pool on the same problem, and 

independently of each other, with the BPTT learning algorithm. The 

training of each individual network was organized in epochs, in the course 

of which the whole training data set is presented in accordance with the 

word frequencies. The total of the logarithm of the frequencies in the 

training database L_M^1 is about 11,000, which is also the number of 

presentations of sequences per epoch, drawn in a random order. Next, for 

each word, the corresponding sequence of phonemes is presented to the 

input, one at a time, followed by the end-of-sequence marker `#'. Each time 

step is completed by copying the hidden layer activations to the context 

layer, which is used in the following step. 
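The presentation of one word to the network can be sketched as below. This shows only the forward pass of a simple recurrent network; the BPTT weight update is omitted. The one-hot input encoding, the logistic activation function, the absence of bias terms, the reset of the context layer at the start of each word, and the random untrained weights are all assumptions made for the illustration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def srn_step(x, context, w_in, w_ctx, w_out):
    """One forward step: returns (output activations, new context)."""
    hidden = [
        sigmoid(sum(wi * xi for wi, xi in zip(w_in[j], x))
                + sum(wc * cj for wc, cj in zip(w_ctx[j], context)))
        for j in range(len(w_in))
    ]
    output = [sigmoid(sum(wo * h for wo, h in zip(row, hidden))) for row in w_out]
    return output, hidden[:]   # the hidden activations become the next context

def run_sequence(symbols, alphabet, weights, n_hidden):
    context = [0.0] * n_hidden   # assumption: context is reset for each word
    outputs = []
    for s in symbols:
        x = [1.0 if a == s else 0.0 for a in alphabet]   # one-hot input
        out, context = srn_step(x, context, *weights)
        outputs.append(out)
    return outputs

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
alphabet = list("netks#")   # toy alphabet
n_hidden = 4
weights = (
    random_matrix(n_hidden, len(alphabet), rng),   # input -> hidden
    random_matrix(n_hidden, n_hidden, rng),        # context -> hidden
    random_matrix(len(alphabet), n_hidden, rng),   # hidden -> output
)
outputs = run_sequence("net#", alphabet, weights, n_hidden)
```

Each element of `outputs` holds one activation per symbol of the alphabet and is read as the network's prediction of the next symbol given the sequence presented so far.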

The parameters of the learning algorithm were as follows: the learning 

coefficient η started at 0.3 and dropped by 30% each epoch, finishing at 

0.001; the momentum (smoothing) term α = 0.7. The networks required 30 

epochs to complete training. After this point, very little improvement is 

noted. 
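One reading of the learning-rate schedule is sketched below, under the assumption that "finishing at 0.001" means the coefficient is multiplied by 0.7 after each epoch and is not allowed to fall below 0.001. The function name and its parameters are hypothetical.

```python
def learning_rate_schedule(start=0.3, decay=0.30, floor=0.001, epochs=30):
    """Per-epoch learning coefficients: geometric decay with a lower bound."""
    rates, eta = [], start
    for _ in range(epochs):
        rates.append(eta)
        eta = max(floor, eta * (1.0 - decay))
    return rates

rates = learning_rate_schedule()
```

With these values the coefficient reaches the floor after roughly 16 epochs, since 0.3 × 0.7^16 is just below 0.001, and remains there for the rest of training.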

3.5.2. Pilot experiments 

Pilot experiments aiming at searching for the most appropriate hidden layer 

size were done with 20, 40 and 80 hidden neurons. In order to avoid other 

nondeterminism which comes from the random selection of negative data, 

during the pilot experiments the network was tested solely on its ability to 

distinguish admissible from inadmissible successors. Those experiments 

were done with a small pool of three networks, each of them trained for 30 

epochs, which resulted in approximately 330,000 word presentations or 

1,300,000 segments. The total number of individual word presentations 

ranged from 30 to 300, according to the individual word frequencies. The 

results of the training are given in Table 1, under the group of columns 

"Optimal phonotactics". In the course of the training, the networks typically 

started with a sharp error drop to about 13%, which soon turned into a very 

slow decrease (see Table 2, left 3 columns). 

The training of the three pools with hidden layer size 20, 40 and 80, 

resulted in networks with similar performance, with the largest network 

performing best. Additional experiments with SRNs with 100 hidden 

neurons resulted in larger errors than a network with 80 hidden neurons, so


that we settled experimentally on 80 hidden neurons as the likely optimal 

size. It is clear that this procedure is rough, and that one needs to be on 

guard against premature concentration on one size model. 

Table 1. Results of a pilot study on phonotactics learning by SRNs with 20, 40, 
and 80 (rows) hidden neurons. Each network is independently trained on 
language L_M three times (columns). The performance is measured (left 3 
columns) using the error in predicting the next phoneme, and (right 3 
columns) using the L2 (semi-Euclidean) distance between the empirical 
context-dependent predictions and the network predictions for each 
context in the tree. Those two methods do not depend on randomly 
chosen negative data. 

                    Optimal Phonotactics         ||SRN_L, T_L||_L2
Hidd. Layer Size    SRN1     SRN2     SRN3       SRN1     SRN2     SRN3
20                  10.57%   10.65%   10.57%     0.0643   0.0642   0.0642
40                  10.44%   10.51%   10.44%     0.0637   0.0637   0.0637
80                  10.00%    9.97%   10.02%     0.0634   0.0634   0.0632

Table 2. A typical shape of the SRN error during training. The error drops sharply 

in the beginning and then slowly decreases to convergence. 

Epoch        1      2-4    5-10   11-15   16-30
Error (%)    15.0   12.0   10.8   10.7    10.5

3.6. Evaluation 

The performance of a neural predictor trained on phonotactics may be 

evaluated with different methods, depending on the particular task the 

network is applied to. In this section we evaluate the neural networks 

performing best during the pilot studies. 

3.6.1. Likelihoods 

The direct outcome of training the sequential prediction task is learning the 

successors' distribution. This will therefore be used as a basic evaluation 

method: the empirical context-dependent successor distribution P_L^s(C) will 

be matched against the network context-dependent predictions NP_L^s(C). For


this purpose, the output of the network will be normalized and matched 

against the distribution in the language data. This procedure resulted in a 

mean L2 (semi-Euclidean) distance of 0.063 - 0.064, where the optimal 

value would be zero (see Table 1, right 3 columns). 2 These values are close 

to optimal but baseline models (completely random networks) also result in 

approximately 0.085 L2 distance. 

3.6.2. Phonotactic Constraints 

To evaluate the network's success in becoming sensitive to phonotactic 

constraints, we first need to judge how well it predicts individual 

phonemes. For this purpose we seek a threshold above which phonemes are 

predicted to be admissible and below which they are predicted to be 

inadmissible. This is done empirically - we perform a binary search for an 

optimal threshold, i.e. the threshold θ* that minimizes the network error 

E(θ). The classification obtained in this fashion constitutes the network's 

predictions about phonotactics. 
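The threshold optimization can be sketched as follows. Note that the sketch replaces the binary search mentioned above with a plain scan over all observed activation values, which is simpler to state and gives the same result on small data. The network error is taken to be the mean of the false-negative and false-positive rates; the activation values and labels are invented for the illustration.

```python
def network_error(threshold, activations, admissible):
    """Mean of false-negative and false-positive rates at `threshold`.

    A successor counts as accepted when its activation exceeds the threshold.
    Assumes at least one admissible and one inadmissible successor.
    """
    fn = sum(1 for a, ok in zip(activations, admissible) if ok and a <= threshold)
    fp = sum(1 for a, ok in zip(activations, admissible) if not ok and a > threshold)
    n_pos = sum(admissible)
    n_neg = len(admissible) - n_pos
    return 0.5 * (fn / n_pos + fp / n_neg)

def best_threshold(activations, admissible):
    """Scan the observed activations and return the error-minimizing one."""
    candidates = sorted(set(activations))
    return min(candidates, key=lambda t: network_error(t, activations, admissible))

# Invented toy data: output activations and whether the successor is admissible.
toy_activations = [0.30, 0.05, 0.02, 0.01, 0.004, 0.001]
toy_admissible = [True, True, True, False, False, False]
theta = best_threshold(toy_activations, toy_admissible)
```

Raising the threshold trades false positives for false negatives, so the mean of the two rates has its minimum where the two error curves balance.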

We now turn to evaluating the network's predictions: the method to 

evaluate the network from this point of view compares the context-dependent 

network predictions with the corresponding empirical 

distributions. For this purpose, the method described by Stoianov (2001) 

will be used. The algorithm traverses a trie (Aho, Hopcroft & Ullman, 

1983: 163-169), which is a tree representing the vocabulary where initial 

segments are the first branches. Words are paths through this data structure. 

The algorithm computes the performance at the optimal threshold 

determined using the procedure described in the last paragraph, i.e., at the 

threshold which determines which phonemes are admissible and which 

inadmissible (see also 2.1). This approach compares the actual distribution 

with the learned distribution, and we normally use the complete database L_M 

for training and testing. 
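The trie used for this evaluation can be sketched with nested dictionaries. This is a generic illustration of the data structure, not the implementation of Stoianov (2001); the function names and the toy lexicon are hypothetical.

```python
def build_trie(words, end="#"):
    """Nested-dict trie; every word is a path that ends in the `end` symbol."""
    root = {}
    for word in words:
        node = root
        for symbol in list(word) + [end]:
            node = node.setdefault(symbol, {})
    return root

def admissible_successors(trie, context):
    """Symbols that follow `context` in at least one word of the lexicon."""
    node = trie
    for symbol in context:
        node = node[symbol]
    return set(node)

def contexts(trie, prefix=()):
    """Yield every non-empty left context that has at least one successor."""
    for symbol, child in trie.items():
        path = prefix + (symbol,)
        if child:
            yield path
            yield from contexts(child, path)

toy_trie = build_trie(["net", "nets", "nek"])   # toy lexicon
```

Traversing `contexts(toy_trie)` visits each left context once; at each one, the set returned by `admissible_successors` is what the thresholded network output would be compared against.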

Figure 2 shows the error of SRN_80^1 (the first network with 80 hidden neurons) at different values of the threshold. 

The optimal threshold searching procedure resulted in 6.0% erroneous 

phoneme prediction at a threshold of 0.0175. This means that if we want to 

predict phonemes with this SRN, they would be accepted as allowed 

successors if the activations of the corresponding neurons are higher than 

0.0175.


3.6.3. Word Recognition 

Using an SRN trained on phoneme prediction as a word recognizing device 

shifts the focus from phoneme prediction to sequence classification. We 

wish to see whether it can classify sequences of phonemes into well-formed 

words on the one hand and ill-formed non-words on the other. To do this 

we need to translate the phoneme (prediction) values into sequence values. 

We do this by taking the sum of the phoneme error values for the sequence 

of phonemes in the string, normalized to correct for length effects. But to 

translate this sum into a classification, we again need to determine an 

acceptability threshold, and we use a variant of the same empirical 

optimization described above. The threshold arrived at for this purpose is 

slightly lower than the optimal threshold from the previous algorithm. This 

means that the network accepts more phonemes, which, however, is 

compensated for by the fact that a string is accepted only if all its phonemes 

are predicted. In string recognition it is better to increase the phoneme 

acceptance rate, because the chance to detect a non-word is larger when 

more tokens are tested. 

Figure 2. SRN error (in %) as a function of the threshold θ. The False Negative 

Error increases as the threshold increases because more and more 

admissible phonemes are incorrectly rejected. At the same time, the False 

Positive Error decreases because fewer unwanted successors are falsely 

accepted. The mean of those two errors is the network error, which finds 

its minimum of 6.0% at threshold θ* = 0.0175. Notice that the optimal 

threshold is limited to a small range. This illustrates how critical the 

exact setting of the threshold is for good performance.


Since the performance measure here is the mean percentage of correctly 

recognized monosyllables and correctly rejected random strings, we 

incorporate both in seeking the optimal threshold. The negative data is as 

described above in 3.4. Concerning the positive data, this approach allows 

us to test the generalization capacity of the model, so that the training L_M^1 

and testing L_M^2 subsets may be used here - the first for training the model 

and evaluating it during training, and the second to test the generalization 

capacity of the trained network. 

Once we determine the optimal sequence-acceptance threshold (0.016), 

we obtain 5% error on the positive training dataset L_M^1 and the negative 

strings from R_M, where the error varied by 0.5% depending on the random data 

set generated. 

The model was tested further on the second group of negative data sets. 

As expected, strings which are more unlike Dutch resulted in smaller error. 

Performance on random strings from R_M^{3+} is almost perfect. In the opposite 

case, the strings close to real words (from R_M^1) resulted in larger error. 

The generalization capabilities of the network were tested on the L_M^2 

positive data, unseen during training. The error on this test set was about 

6%. An explanation of the increase of the error will be presented later, 

when the error will be studied by varying its properties. 

Another interesting issue is how SRN performance compares to other 

known models, e.g. n-grams. The trained SRN definitely outperformed 

bigrams and trigrams, which was shown by testing the trained SRNs on the 

non-words from the R_N^2 and R_N^3 sets, yielding 19% and 35% error, respectively. 

This means that the SRN correctly rejected four out of five non-word 

strings composed of correct bigrams and two out of three non-word strings 

made of trigrams. To clarify, note that bigram models would have 100% 

error on R_N^2, and trigram models 100% error on R_N^3. 

4. Network Analysis 

The distributed representations in Neural Networks prevent the analysis of 

generalizations in trained models by simple observation, which symbolic 

learning methods allow. Smaller NNs may be analyzed to some extent by 

examination, but for larger networks this is practically impossible. 

It is possible, however, to analyze trained networks to extract abstract 

knowledge about their behavior. Elman (1988), for example, trained an 

SRN to learn sentences and then analyzed the hidden layer activations of


that SRN in various contexts, from which he showed that the network had 

internally developed syntactical categories. Similarly, we trained SRNs on 

phonotactics (Stoianov et al., 1998), and then analyzed the network 

statically, by viewing the weight vectors of each neuron as pattern 

classifiers. We showed that the SRN had induced generalizations about 

phonetic categories. We follow that earlier work in order to study network 

behavior, and we present the results of this study in the first subsection. 

Another approach to the analysis of connectionist models assumes that 

they are black boxes and examines the variation of network performance 

while varying some properties of the data (Plaut et al., 1996; Stoianov, 

Stowe & Nerbonne, 1999). For example, one can vary word frequency, 

length, etc., and study the network error. When modeling human cognitive 

functions with this approach one can compare the behavior of the cognitive 

system and its artificial models. For example, in phonotactic modeling, one 

can compare results from psycholinguistic studies of a lexical decision task 

with the network reaction. This will be subject of study in the rest of the 

section. 

4.1. Weight Analysis 

The neurons of a neural network act as pattern classifiers. The inputs 

selectively activate one or another neuron, depending on the weight 

vectors. This means that information about network structure may be 

extracted from the weight vectors. 

In this section we will present a cluster analysis of the neurons in the 

output layer. For that purpose, the mean weight vectors of the output layer 

of one of the networks - SRN_40^2 (from Table 1) - were clustered using a 

minimum variance (Ward's) method, and each vector in the resulting 

dendrogram was labeled with the phoneme it corresponds to. 3 The resulting 

diagram is shown in Figure 3.


Figure 3. Cluster analysis of the vector of the output neurons, labeled with the 

phonemes they correspond to. The weight vectors are split into clusters 

which roughly correspond to existing phonetic categories.


We can see that the weight vectors (and correspondingly, the phonemes) 

cluster into some well-known major natural classes - vowels (in the bottom) 

and consonants (the upper part). The vowels are split into two major 

categories: low vowels and semi-low, front vowels (/��, �, a, e/), and high, 

back ones. The latter, in turn, are clustered into round+ and round- classes. 

Consonants appear to be categorized in a way less congruent with 

phonetics. But here, too, some established groups are distinguished. The 

first subgroup contains non-coronal consonants (/f, k, m, p, x/) with the 

exceptions of /l/ and /n/. Another subgroup contains voiced obstruents (/�, 

d, �, ��/). The delimiter '#' is also clustered as a consonant, in a group with 

/t/, which is also natural. The upper part of the figure seems to contain 

phonemes from different groups, but we can recognize that most of these 

phonemes are quite rare in Dutch monosyllables, e.g., /�/, perhaps because 

they have been 'loaned' from other languages, e.g. /g/. 

4.2. Functional analysis 

We may also study NNs by examining their performance as a function of 

factors such as word frequency, similarity neighborhood, and word length. 

Such an analysis relates computational language modeling to 

psycholinguistics, and we submit that it is useful to compare the models' 

performance with humans'. In this section we introduce several factors 

which have played a role in psycholinguistic theorizing. We then examine 

the performance of our model as a function of these factors. 

4.2.1. Psycholinguistic Factors 

Frequency is one of the most thoroughly investigated characteristics of 

words that affect performance. Numerous previous studies have 

demonstrated that the ease and the time with which spoken words are 

recognized are monotonically related to the experienced frequency of 

words in the language environment (Luce, Pisoni & Goldinger, 1990; Plaut 

et al., 1996). The general tendencies found are that the more frequent words 

are, the faster and more precisely they are recognized. 

Our perception of a word is likewise known to depend on its similarity 

to other words. The similarity neighborhood of a word is defined as the 

collection of words that are phonetically similar to it. Some neighborhoods


are dense with many phonetically similar words while others are sparse 

with few. 

The so-called Coltheart-N measure of a word w counts the number of 

words that might be produced by replacing a single letter of w with some 

other. We modify this concept slightly to make it sensitive to similarity of 

sub-syllabic elements, so that we regard words as similar when they share 

two of the subsyllabic elements - onset, nucleus and coda. Empty onsets or 

codas are counted as the same. The word neighborhood is computed by 

counting the number of the similar words. If implemented precisely, the 

complexity of the measuring process just explained is high, so we reduce it 

by probing for sub-syllables rather than for units of variable size, starting 

from a single phoneme. This simplifies and speeds up processing. The 

neighborhood sizes in the corpus we used ranged from 0 to 77, with a mean 

of μ = 30 and σ = 13. 

For example, the phonological neighborhood of the Dutch word broeds 

/bruts/ is given below. Note that the neighborhood contains only Dutch 

words. 

/brɪts/, /brots/, /bruj/, /brujt/, /bruk/, /brur/, /brus/, /brut/, /buts/, /kuts/, 

/puts/, /tuts/ 

These represent the pronunciations of Brits `British', broods `bread' 

(gen.sg.), broei `brew', broeit `brew' (3rd. sg.), broek `pants', broer 

`brother', broes `spray nozzle', broed `brood', boots `boots' (Eng. loan), 

koets `coach', poets `clean' and toets `test'. Among the words with very poor 

neighborhood are /��/ schwung, /b�rts/ boards, /��jnt/ joint, and 

/sk��rs/ squares, all of which are of foreign origin. Words such as /hɛk/ 

hek, /bɑs/ bas, /lɑxt/ lacht, and /bɑkt/ bakt have large neighborhoods. 
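The modified neighborhood measure can be sketched as follows, assuming that each word is already split into onset, nucleus and coda and that "sharing two of the sub-syllabic elements" means exactly two identical elements. The type and function names are hypothetical, and the last lexicon entry is invented to provide a non-neighbor.

```python
from typing import NamedTuple

class Syllable(NamedTuple):
    onset: str     # empty string for an empty onset
    nucleus: str
    coda: str      # empty string for an empty coda

def are_neighbors(a: Syllable, b: Syllable) -> bool:
    """True when exactly two of onset, nucleus and coda are identical."""
    return sum(x == y for x, y in zip(a, b)) == 2

def neighborhood_size(word: Syllable, lexicon) -> int:
    return sum(are_neighbors(word, other) for other in lexicon)

broeds = Syllable("br", "u", "ts")
toy_lexicon = [
    broeds,
    Syllable("br", "o", "ts"),   # broods: same onset and coda
    Syllable("br", "u", "k"),    # broek: same onset and nucleus
    Syllable("b", "u", "ts"),    # boots: same nucleus and coda
    Syllable("", "a", "l"),      # invented non-neighbor with an empty onset
]
size = neighborhood_size(broeds, toy_lexicon)
```

Because the comparison is made on three pre-segmented elements rather than on substrings of variable size, each pair of words costs a constant number of comparisons.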

It is still controversial how similarity neighborhood influences cognitive 

processes (Balota, Paul & Spieler, 1999). Intuitively, it seems likely that 

words with larger neighborhoods are easier to access due to many similar 

items, but from another perspective these words might be more difficult to 

access due to the nearby competitors and longer selection process. 

However, in the more specific lexical decision task, the overall activity of 

many candidates has been shown to facilitate lexical decisions, so we will 

look for the same effect here. 

The property word length might affect performance in the lexical 

decision task in two different ways. On one hand, longer words provide 

more evidence since more phonemes are available to decide whether the


input sequence is a word so that we expect higher precision for longer 

words, and lower precision for particularly short words. On the other hand, 

network error accumulating in iteration increases the error in phoneme 

predictions at later positions, which in turn will increase the overall error 

for longer words. For these reasons we expect U-shaped patterns of error as 

word length increases. Such a pattern was observed in a study on modeling 

grapheme-to-phoneme conversion with SRNs (Stoianov et al., 1999). Static 

NNs are less likely (than dynamic models such as SRNs) to produce such 

patterns. 

So far we have presented three main characteristics of the individual 

words, which we expect to affect the performance of the model. However, a 

statistical correlation analysis (bivariate Spearman test) showed that they 

are not independent, which means that an analysis of the influence of any 

single factor should control for the rest. In particular, there is high negative 

correlation between word neighborhood and word length (r = -0.476), 

smaller positive correlation between neighborhood and frequency (r = 

0.223), and very small negative correlation between frequency and word 

length (r = -0.107). Because of the large amount of data all these 

coefficients are significant at the 0.001 level. 

Finally, it will be useful to seek a correlate in the simulation for reaction 

time, which psycholinguists are particularly fond of using as a probe to 

understanding linguistic structure. Perhaps we can find an SRN correlate to 

Reaction Time (RT) for the lexical decision task in network confidence, 

i.e., the amount of evidence that the test string is a word from the training 

language. The less confident the network, the slower the reaction, which 

can be implemented with a lateral inhibition (Haykin, 1994; Plaut et al., 

1996). The network confidence for a given word might be expressed as the 

product of the activations of the neurons corresponding to the phonemes of 

that word. A similar measure, which we call uncertainty U, is the negative 

sum of (output) neuron activation logarithms, normalized with respect to 

word length |w| (2). Note that U varies inversely with confidence. Less 

certain sequences get higher (positive) scores.

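$$U(w) = -\frac{1}{|w|} \sum_{i=1}^{|w|} \log a_i$$

Equation 2. Uncertainty U of a word w, where a_i is the activation of the output neuron corresponding to the i-th phoneme of w. 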
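The relation between confidence and uncertainty can be sketched as below. The natural logarithm is an assumption, since the base is not stated, and the activation values are invented.

```python
import math

def confidence(activations):
    """Product of the activations of the neurons for the word's phonemes."""
    return math.prod(activations)

def uncertainty(activations):
    """Negative sum of log activations, normalized by word length |w|."""
    return -sum(math.log(a) for a in activations) / len(activations)

confident_word = [0.9, 0.8, 0.9]    # invented activations
uncertain_word = [0.2, 0.1, 0.3]
```

Since the logarithm of a product is the sum of the logarithms, U equals the negative log of the confidence divided by |w|, so a word with lower confidence always receives a higher U than an equally long word with higher confidence.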


To analyze the influence of these parameters, the network scores and U-values 

were recorded for each monosyllabic word at the optimal threshold 

θ* = 0.016. The data was then submitted to the statistical package SPSS for 

analysis of variance using SPSS's General Linear Model (GLM). When 

analyzing network score, the analysis revealed main effects of all three 

parameters discussed above: word neighborhood size (F = 18.4; p < 

0.0001), word frequency (F = 19.2; p < 0.0001), word length (F = 11.5; p < 

0.0001). There was also interaction between neighborhood size and the 

other parameters: the interaction with word frequency had an F-score of 6.6 

and the interaction of the neighborhood with word length had an F-score of 

4.9, both significant at 0.0001 level. Table 3 summarizes the findings. Error 

decreases both as neighborhood size and as frequency increases, and error 

dependent on length shows the predicted U-shaped form (Table 3c). 

Table 3. Effect of (a) frequency, (b) neighborhood density and (c) length on word 

uncertainty U and word error. 

a. 
Frequency       Low    Mid    High
U               2.30   2.20   2.18
Error (%)       8.6    4.1    1.5

b. 
Neighb. size    Low    Mid    High
U               2.62   2.30   2.21
Error (%)       12.7   3.9    0.8

c. 
Length          Low    Mid    High
U               2.63   2.20   2.13
Error (%)       5.2    4.4    13.1


Analysis of variance on the U-values revealed similar dependencies. There 

were main effects of word neighborhood size (F = 58.2; p < 0.0001), word 

frequency (F = 45.9; p < 0.0001), word length (F = 137.5; p < 0.0001), as 

well as the earlier observed interactions between neighborhood density and 

the other two variables: word length (F = 10.4; p < 0.001) and frequency (F 

= 5.235; p < 0.005). 

The frequency pattern of error and uncertainty variance was expected, 

given the increased evidence to the network for more frequent words. The 

displayed length effect showed that the influence of error gained in 

recursion is weaker than the effect of stronger evidence for longer words. 

Also, the pattern of performance when varying neighborhood density 

confirmed the hypothesis of the lexical decision literature that larger 

neighborhoods make it easier for words to be recognized as such. 

4.3. Syllabic structure 

Phonotactic constraints might hint at how the stream of phonemes is 

organized in the language processing system. The popular phoneme, 

syllable and word entities may not be the only units that we use for lexical 

access and production. There are suggestions that, in addition, some sub-syllabic 

elements are involved in those processes, that is, the syllables 

might not have a linear structure, but more complex representations (Kessler 

& Treiman, 1997). For that purpose, we will analyze how the phoneme 

prediction error at a threshold of 0.016 - where the network resulted in best 

word recognition - is located within words with respect to the following 

sub-syllabic elements - onset, nucleus and coda. The particular hypothesis 

that will be tested is whether Dutch monosyllables follow the structure 

below that was found in English as well (Kessler & Treiman, 1997). 

( Onset - Rhyme (Nucleus - Coda) ) 

The distribution of phoneme error within words (Table 4a) shows that the 

network makes more mistakes at the beginning than at the end of words, 

where the SRN becomes more confident in its decision. This could be 

explained by increasing contextual information that more severely 

restricts possible phonemic combinations. A more precise analysis of the 

error position in the onset, the nucleus and the coda further reveals other 

interesting phenomena (Table 4b).


Table 4. Distribution of phoneme prediction error at a threshold of 0.016 by (a) 
phoneme position within words and (b) phoneme position within sub-syllables. 
Word and Onset positions start from 2, because the prediction 
starts after the first phoneme. 

a. 
Word Position        2     3     4     5     6     7     8
Error (%)            4.3   1.7   1.4   0.6   0.3   0.3   0.00

b. 
Sub-syllables        Onset        Nucleus   Coda
Relative Position    2     3      1         1     2     3     4
Error (%)            2.6   0.0    4.5       1.0   1.5   2.0   2.6

First, error within the coda increases at the coda's end. We attribute this to 

error accumulated toward the end of the words, as was predicted earlier. 

The mean entropy in the coda (1.32; σ = 0.87) is smaller than the mean 

entropy in the onset (1.53; σ = 0.78), where we do not observe such effects. 

So looser constraints are not the reason for the relatively greater error in the 

coda. Next, the error at the transition onset-nucleus is much higher than the 

error at the surrounding positions, which means that the break between 

onset and rhyme (the conjunction nucleus-coda) is significant. This 

distribution is also consistent with the statistical finding that the entropy is 

larger in the body (the transition point onset-nucleus) (3.45; σ = 0.39), than 

in the rhyme (1.94; σ = 1.21). All these data support the hypothesis that onset 

and rhyme play significant roles in lexical access and that the syllabic 

structure confirmed for English by Kessler & Treiman (1997) is valid for 

Dutch, too. 

5. Conclusions 

Phonotactic constraints restrict the way phonemes combine in order to form 

words. These constraints are empirical and can be abstracted from the 

lexicon - either by extracting rules directly, or via models of that lexicon. 

Existing language models are usually based on abstract symbolic methods, 

which provide good tools for studying such knowledge. But linguistic 

research from a connectionist perspective can provide a fresh perspective 

about language because the brain and artificial neural networks share 

principles of computations and data representations.


Connectionist language modeling, however, is a challenging task. 

Neural networks use distributed processing and continuous computations, 

while languages have a discrete, symbolic nature. This means that some 

special tools are necessary if one is to model linguistic problems with 

connectionist models. The research reported in this paper attempted to 

provide answers to two basic questions: first, whether phonotactic learning 

is possible at all in connectionist systems, which had been doubted earlier 

(Tjong Kim Sang, 1995; Tjong Kim Sang, 1998). In the case of a positive 

answer, the second question is how NN performance compares to human 

ability. In order to draw this comparison, we needed to extract the 

phonotactic knowledge from a network which has learned the sequential 

structure. We proposed several ways of doing this. 

Section 3 studied the first question. Even if there are theoretical results 

demonstrating that NNs have the needed finite-state capacity for 

phonotactic processing, there are practical limitations, so that we needed 

experimental support to demonstrate the practical capability of SRNs to 

learn phonotactics. A key to solving the problems of earlier investigators 

was to focus on finding a threshold that optimally discriminated the 

continuous neuron activations with respect to phoneme acceptance and 

rejection simultaneously. The threshold range at which the network 

achieves good discrimination is very small (see Figure 2), which illustrates 

how critical the exact setting of the threshold is. We also suggested that this 

threshold might be computed interactively, after processing each symbol, 

which is cognitively plausible, but we postpone a demonstration of this to 

another paper. 

The network performance on word recognition - word acceptance rate of 

95% and random string rejection rate of 95% at a threshold of 0.016 - 

competes with the scores of symbolic techniques such as Inductive Logic 

Programming and Hidden Markov Models (Tjong Kim Sang, 1998), both 

of which reflect low-level human processing architecture with less fidelity. 

Section 4 addressed the second question of how other linguistic 

knowledge encoded into the networks can be extracted. Two approaches 

were used. Section 4.1 clustered the weights of the network, revealing that 

the network has independently become sensitive to established phonetic 

categories. 

We went on to analyze how various factors which have been shown to 

play a role in human performance find their counterparts in the network's 

performance. Psycholinguistics has shown, for example, that the ease and the

time with which spoken words are recognized are monotonically related to


the frequency of words in language experience (Luce et al., 1990). The 

model likewise reflected the importance of neighborhood density in 

facilitating word recognition, which we speculated stems from the 

supportive evidence which more similar patterns lend to the words in their 

neighborhood. Whenever network and human subjects exhibit a similar 

sensitivity to well-established parameters, we see a confirmation of the 

plausibility of the architecture chosen. 

Finally, the distribution of the errors within the words showed another 

linguistically interesting result. In particular, the network tended to err 

more often at the transition onset-nucleus - which is also typical for 

transitions between adjacent words in the speech stream and used for 

speech segmentation. By analogy, we can conclude from this that the

nucleus-coda unit - the rhyme - is a significant linguistic unit for the Dutch 

language, a result suggested earlier for English (Kessler & Treiman, 1997). 

We wind up this conclusion with one disclaimer and a repetition of the 

central claim. We have not claimed that SRNs are the only (connectionist) 

model capable of dynamic processing, nor that they are biologically the 

most plausible neural network. Our central claim is to have demonstrated 

that relatively simple connectionist mechanisms have the capacity to model 

and learn phonotactic structure. 

Notes 

1 

The authors are particularly pleased to offer this piece to a Festschrift honoring 

Dr. Dr. h.c. Tjeerd de Graaf, who graciously agreed to cooperate in the 

supervision of Stoianov's Ph.D. project 1997-2001 at the University of 

Groningen. Even if Tjeerd is best known for his more recent work on 

descriptive linguistics, minority languages and language documentation, his 

early training in physics and earlier research on acoustic phonetics made him 

one of the best-suited supervisors for projects such as the one reported on here 

involving advanced learning algorithms. Tjeerd's sympathy with Eastern 

European languages and cultures is visceral and might have led him to agree in 

any case, but we particularly appreciated his phonetic acumen. 

2 

The distance is related to Euclidean, but more exactly the distance between the 

two n-dimensional vectors is


3

The cluster analysis in Figure 3 was produced by programs written by Peter

Kleiweg, available at http://www.let.rug.nl/alfa. 

References 

Aho, Alfred, John Hopcroft & Jeffrey Ullman (1983). Data Structures and 

Algorithms. Addison Wesley. 

Balota, David, Stephen Paul & Daniel Spieler (1999). Attentional control of 

lexical processing pathways during word recognition and 

reading. In: S. Garrod & M. Pickering (eds). Studies in cognition: 

Language processing. UCL Press, London, England, 15-57. 

Cairns, Paul, R. Shillcock, Nick Chater & Joe Levy (1997). Bootstrapping word 

boundaries: A bottom-up corpus-based approach to speech 

segmentation. Cognitive Psychology, 33(2): 111-153. 

Carrasco, Rafael, Mikel Forcada & Ramon Neco (1999). Stable encoding of 

finite-state machines in discrete-time recurrent neural networks 

with sigmoid units. Neural Computation, 12(9): 2129-2174. 

Carstairs-McCarthy, Andrew (1999). The Origins of Complex Language. 

Oxford Univ Press. 

CELEX (1993). The CELEX Lexical Data Base (cd-rom), Linguistic Data 

Consortium. http://www.kun.nl/celex. 

Christiansen, Morton H. & Nick Chater (1999). Toward a connectionist model 

of recursion in human linguistic performance. Cognitive Science, 

23: 157-205. 

Cleeremans, A., D. Servan-Schreiber & J.L. McClelland (1989). Finite state 

automata and simple recurrent networks. Neural Computation, 

1(3): 372-381. 

Cohen, A., C. Ebeling & A.G.F. van Holk (1972). Fonologie van het 

Nederlands en het Fries. Martinus Nijhoff, The Hague. 

Dell, Gary, Cornell Juliano & Anita Govindjee (1993). Structure and content in 

language production: A theory of frame constraints in 

phonological speech errors. Cognitive Science, 17: 145-195. 

Dupoux, Emmanuel, Christophe Pallier, Kazuhiko Kakehi & Jacques Mehler 

(2001). New evidence for prelexical phonological processing in 

word recognition. Language and Cognitive Processes, 5(16): 

491-505. 

Elman, Jeffrey L. (1988). Finding structure in time. Technical Report 9901, 

Center for Research in Language, UCSD, CA.


Elman, Jeffrey L. (1991). Distributed representations, simple recurrent 

networks, and grammatical structure. Machine Learning, 7(2/3): 

195-226. 

Gasser, Michael (1992). Learning distributed representations for syllables. In: 

Proc. of 14th Annual Conference of Cognitive Science Society, 

396-401.

Haykin, Simon (1994). Neural Networks. Macmillan Publ, NJ.

Kaplan, Ronald & Martin Kay (1994). Regular models of phonological rule 

systems. Computational Linguistics, 20/3: 331-378. 

Kessler, Brett & Rebecca Treiman (1997). Syllable structure and the 

distribution of phonemes in English syllables. Journal of Memory 

and Language, 37: 295-311. 

Konstantopoulos, Stasinos (2003). Using Inductive Logic Programming to 

Learn Local Linguistic Structures. PhD thesis, Rijksuniversiteit 

Groningen. 

Kuan, Chung-Ming, Kurt Hornik & Halbert White (1994). A convergence 

result for learning in recurrent neural networks. Neural 

Computation, 6: 420-440. 

Laver, John (1994). Principles of Phonetics. Cambridge University Press, 

Cambridge. 

Lawrence, Steve, C. Lee Giles & S. Fong (1995). On the applicability of neural 

networks and machine learning methodologies to natural 

language processing. Technical report, Univ. of Maryland. 

Luce, Paul L., David B. Pisoni & Steven D. Goldinger (1990). Similarity 

neighborhoods of spoken words. In: G. T. M. Altmann (ed.). 

Cognitive Models of Speech Processing. A Bradford Book, 

Cambridge, Massachusetts, USA, 122-147. 

McQueen, James (1998). Segmentation of continuous speech using 

phonotactics. Journal of Memory and Language, 39: 21-46. 

Mitchell, Thomas (1997). Machine Learning. McGraw Hill College. 

Nerbonne, John, Wilbert Heeringa & Peter Kleiweg (1999). Edit distance and 

dialect proximity. In: D. Sankoff & J. Kruskal (eds). Time Warps, 

String Edits and Macromolecules: The Theory and Practice of 

Sequence Comparison, 2nd ed. CSLI, Stanford, CA, v-xv.

Norris, D., J.M. McQueen, A. Cutler & S. Butterfield (1997). The possible-word

constraint in the segmentation of continuous speech. 

Cognitive Psychology, 34: 191-243. 

Omlin, Christian W. & C. Lee Giles (1996). Constructing deterministic finite-state

automata in recurrent neural networks. Journal of the ACM, 

43(6): 937-972.


Pacton, S., P. Perruchet, M. Fayol & A. Cleeremans (2001). Implicit learning in 

real world context: The case of orthographic regularities. Journal 

of Experimental Psychology: General, 130(3): 401-426. 

Plaut, D.C., J. McClelland, M. Seidenberg & K. Patterson (1996). 

Understanding normal and impaired word reading: 

Computational principles in quasi-regular domains. 

Psychological Review, 103: 56-115. 

Reed, Russell D. & Robert J. Marks II (1999). Neural Smithing. MIT Press, 

Cambridge, MA. 

Reilly, Ronan (2002). The relationship between object manipulation and 

language development in Broca's area: A connectionist 

simulation of Greenfield's hypothesis. Behavioral and Brain 

Sciences, 25: 145-153. 

Robinson, A. J. & F. Fallside (1988). Static and dynamic error propagation 

networks with application to speech coding. In: D. Z. Anderson 

(ed.). Neural Information Processing Systems. American Institute 

of Physics, NY. 

Rodd, Jennifer (1997). Recurrent neural-network learning of phonological 

regularities in Turkish. In: Proc. of Int. Conf. on Computational 

Natural Language Learning. Madrid, 97-106. 

Rumelhart, David E. & James A. McClelland (1986). Parallel Distributed 

Processing: Explorations of the Microstructure of Cognition. The 

MIT Press, Cambridge, MA. 

Rumelhart, D.E., G.E. Hinton & R.J. Williams (1986). Learning internal 

representations by error propagation. In: D. E. Rumelhart & J. A. 

McClelland (eds.). Parallel Distributed Processing: Explorations 

of the Microstructure of Cognition, Volume 1, Foundations. The

MIT Press, Cambridge, MA, 318-363. 

Shillcock, Richard, Paul Cairns, Nick Chater & Joe Levy (1997). Statistical and 

connectionist modelling of the development of speech 

segmentation. In: Broeder & Murre (eds.). Models of Language 

Learning. MIT Press. 

Shillcock, Richard, Joe Levy, Geoff Lindsey, Paul Cairns & Nick Chater 

(1993). Connectionist modelling of phonological space. In: T. M.

Ellison & J. Scobbie (eds.). Computational Phonology. 

Edinburgh Working Papers in Cognitive Science, Edinburgh, 8: 

179-195.

Stoianov, Ivilin Peev (1998). Tree-based analysis of simple recurrent network 

learning. In: 36th Annual Meeting of the Association for

Computational Linguistics and 17th Int. Conference on

Computational Linguistics. Vol. 2, Montreal, Canada, 1502-1504.


Stoianov, Ivilin Peev (2001). Connectionist Lexical Modelling. PhD thesis, 

Rijksuniversiteit Groningen. 

Stoianov, Ivilin Peev & John Nerbonne (2000). Exploring phonotactics with 

simple recurrent networks. In: F. van Eynde, I. Schuurman & N. 

Schelkens (eds.). Computational Linguistics in the Netherlands, 

1998. Rodopi, Amsterdam, NL, 51-68. 

Stoianov, Ivilin Peev, John Nerbonne & Huub Bouma (1998). Modelling the 

phonotactic structure of natural language words with simple 

recurrent networks. In: P.-A. Coppen, H. van Halteren & L. 

Teunissen (eds.). Computational Linguistics in the Netherlands, 

1997. Rodopi, Amsterdam, NL, 77-96. 

Stoianov, Ivilin Peev, Laurie Stowe & John Nerbonne (1999). Connectionist 

learning to read aloud and correlation to human data. In: 21st 

Annual Meeting of the Cognitive Science Society, Vancouver, 

Canada. Lawrence Erlbaum Ass., London, 706-711. 

Stowe, Laurie, Anton Wijers, A. Willemsen, Eric Reuland, A. Paans & Wim 

Vaalburg (1994). Pet studies of language: An assessment of the 

reliability of the technique. Journal of Psycholinguistic Research, 

23(6): 499-527. 

Tjong Kim Sang, Erik (1995). The limitations of modeling finite state

grammars with simple recurrent networks. In: Proceedings of the 

5th Computational Linguistics in The Netherlands, 133-143. 

Tjong Kim Sang, Erik (1998). Machine Learning of Phonotactics. PhD thesis,

Rijksuniversiteit Groningen. 

Tjong Kim Sang, Erik & John Nerbonne (1999). Learning simple phonotactics. 

In: Proceedings of the Workshop on Neural, Symbolic, and 

Reinforcement Methods for Sequence Processing, Machine 

Learning Workshop at IJCAI '99, 41-46. 

Treiman, R. & A. Zukowski (1990). Toward an understanding of English 

syllabification. Journal of Memory and Language, 34: 66-85. 

Tsoi, Ah Chung & Andrew Back (1997). Discrete time recurrent neural 

network architectures: A unifying review. Neurocomputing, 15: 

183-223.

Weak Interactions 

Yiddish influence in Hungarian, Esperanto and 

Modern Hebrew 

Tamás Bíró 

When I arrived in Groningen, I was introduced to Tjeerd de Graaf as 

somebody speaking Hungarian. Then it turned out that both of us were 

interested in Yiddish. Furthermore, we shared the fact that we started our 

scientific life within physics, although, unlike Tjeerd, I have not worked as 

a physicist since my graduation. Nevertheless, as a second year physics 

student I received a research question from the late leading Hungarian 

physicist George Marx that was also somehow related to Tjeerd’s earlier 

research topic, neutrino astrophysics. 

Neutrinos are funny particles. They are extremely light, if they have any 

mass at all. 1 Therefore, they cannot interact through gravitation. Because

they do not have any electrical charge either, electromagnetic interaction is 

also unknown to them. The only way they can interact with the universe is 

the so-called weak interaction, one of the four fundamental forces. 2 

Nowadays physicists spend inconceivable sums building

gigantic, underground basins containing millions of liters of heavy water 

just to try to detect a few neutrinos per year out of the very intense stream 

of neutrinos flowing constantly from the Sun and going through the Earth, 

that is, us. Even though they almost never interact with regular material, 

through weak interaction they play a fundamental role both in shaping what 

the universe looks like and in the Sun’s energy production. Therefore our 

life would not be possible without neutrinos and without weak interaction. 

Something similar happens in ethnolinguistics. The interaction between 

two languages may not always be very salient, and it cannot necessarily be 

explained by the most famous types of interactions. A weak interaction in 

linguistics might be an interaction which is not acknowledged by the 

speakers’ community, for instance for ideological reasons.

In the present paper I shall present three cases of weak interaction 

between languages, understood in this sense, namely Yiddish affecting

Hungarian, Modern Hebrew (Israeli Hebrew) and Esperanto. All the stories 

take place in the late nineteenth or early twentieth century, when a new or 

modernized language had to be created. We shall observe what kind of 

interactions took place under which conditions. A model for interactions 

combined with the better understanding of the social-historical setting will 

enable us to do so. 

1. Language interactions within a given socio-historical setting 

1.1. Modeling interactions 

In physics, the interaction between two bodies depends on three factors: the 

two “eligibilities” of the parties to interact, as well as their distance. For 

gravity and electromagnetism, the formula probably familiar from high-school

physics states that the force is proportional to the product of the 

“eligibilities” - mass or electric charge - of the two bodies, divided by the 

square of their distance. In other words, the higher the two masses (or 

electric charges) and the smaller the distance, the stronger the interaction. 

For Newton, who first formulated this law, gravity was a long-range

interaction. Modern physics has completed this picture by introducing

exchange particles intermediating between the interacting bodies. 3 That

way, contemporary science has also incorporated the view of Newton’s

opponents, who argued that only short-range interactions are possible.
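
As a hedged illustration of the proportionality described above, the two high-school laws can be written out explicitly. The symbols $G$, $k_e$, $m_i$, $q_i$ and $r$ are the conventional textbook ones, not notation used in this paper. The generic symbols $e_1$, $e_2$ and $d$ for the two “eligibilities” and the distance are introduced here purely for illustration.

```latex
F_{\text{gravity}} = G\,\frac{m_1 m_2}{r^2},
\qquad
F_{\text{Coulomb}} = k_e\,\frac{|q_1 q_2|}{r^2},
\qquad
\text{in general}\quad F \propto \frac{e_1\, e_2}{d^2}.
```

Doubling either eligibility doubles the force, whereas doubling the distance reduces it to a quarter.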

To transplant this image, vaguely, into the phenomenon of language 

interaction, we have to identify the eligibilities of the two interacting 

languages, their distance and the exchange particles. In fact, we can do that 

even on two levels. On a purely linguistic level, one can easily point to 

words and grammatical phenomena - “exchange particles” - wandering 

from language to language. But it would be harder to identify in general the 

properties of the phenomena and of the given languages that make the 

interaction more probable or less probable. 

The sociolinguistic level is more promising for such an approach. In this 

case, the human beings are the exchange particles: people who leave one 

linguistic community in order to join a new one. By the very fact of their 

moves, they affect their new language by a linguistic quantum. The closer 

the two language communities, the more people will act as an exchange


particle. Here distance should be understood not only based on geography, 

but on the intensity of the social network, as well. Thus, the more people 

wander to the target community, the more linguistic impulse is brought to 

the second language and therefore the stronger the interaction. Note that the 

physical analogy is not complete, since the symmetry of action and reaction 

is not guaranteed for interacting languages. 

The three cases to be discussed share the feature that the role of the 

carriers of the interaction is played by late nineteenth century Eastern 

European Jews. In order to understand the historical background, we have 

to recall what is called Haskala or Jewish Enlightenment. 

1.2. The Haskala 

By the late eighteenth century, the French and German Aufklärung had

raised, on the one side, the question whether to emancipate and integrate -

or assimilate - the Jewish population and, on the other, an increasing wish

among Jews to join European culture. Although in the second half of the siècle des

lumières there were only a few Jewish intellectuals who articulated these 

ideas, most of them belonging to the circle of the philosopher Moses 

Mendelssohn (1729-1786) in Berlin, the next decades witnessed the 

acculturation of a growing segment of the Jewish population in the German 

territories, as well as within the Austrian Empire. The eighteenth century 

Berlin Haskala is called the first stage of the Jewish Enlightenment, 

whereas the early nineteenth century social and cultural developments 

represent its second stage. 

What the first two stages of the Haskala yielded was the inclusion of a Jewish

color on the contemporary Western European cultural palette. “Jewish” was 

understood exclusively as one possible faith within the list of European 

religions, and nothing more than a religious conviction. An enlightened Jew 

was supposed to fully master the educated standard variant of the language 

of the society he lived in (Hochdeutsch, in most cases), without any

“Jewish-like” feature. Propagating the knowledge of Hochdeutsch and 

rolling back Jüdischdeutsch had already been the program of Moses 

Mendelssohn when he began writing a modern targum 4 of the Bible, the 

Biur. Further, the same Jew was expected to fully master the contemporary 

European culture, including classical languages, sciences and arts. The only 

sphere in which this Jew could express his or her being Jewish was the 

diminished and Europeanized arena of religious life. Diminished, because


of a secularization of life style; and Europeanized, due to the inclusion of 

philosophical ideals of the Enlightenment together with aesthetic models of 

Romanticism. The traditional religious duty of constantly learning the

traditional texts with the traditional methods was sublimated into the 

scholarly movement of the Wissenschaft des Judentums. 

The picture changed dramatically in the middle of the nineteenth 

century, when the Haskala, in its third stage, reached the Eastern European 

Jewry, including Jews in Poland and Lithuania (under Russian 

government), Eastern Hungary, and Rumania. Here the Jewish population 

was far denser, whereas the surrounding society was far behind Western 

Europe in the process of social and economic development. In fact,

Jews would play an important role in the modernization of those areas. 

Therefore, several people of Jewish origin could take the initiative and 

invent absolutely new alternatives to the social constructs that people had 

been living with so far. 

One type of those social alternatives still preserved the idea of the 

earlier Haskala according to which Jews should become and remain an 

organic part of the universal human culture. These alternatives proposed 

thus some forms of revolutionary change to all of humankind, as was

the case in the different types of socialist movements, in which Jews 

unquestionably played an important role. Esperantism also belongs here, 

for its father, Ludwig Zamenhof, was a Polish-Lithuanian Jew proposing an

alternative to national language as another social construct. 

The second type of radical answer that Eastern European Jews gave to 

the emergence of Enlightenment in the underdeveloped Eastern European 

milieu was creating a new kind of Jewish society. Recall that there was a 

dense Jewish population living within a society that itself did not represent 

a modern model to which most Jews wished to acculturate. Different 

streams of this type of answer emerged, although they did not mutually 

exclude each other. Many varieties of political activism, such as early 

forms of Zionism, political Zionism, territorialism or cultural autonomism, 

embody one level of creating an autonomous Jewish society. 

The birth of a new Jewish secular culture, including literature, 

newspapers or Klezmer music is another one. The question then arose 

whether the language of this new secular culture should be Yiddish - and 

thus a standardized, literary version of Yiddish was to be developed - or 

Hebrew - and therefore a renewal of the Hebrew language was required. In 

the beginning, this point was not such an enormous matter of dispute as it 

would later develop into, when “Hebraists”, principally connected with


Zionism, confronted “Yiddishists”, generally claiming a cultural and / or 

political autonomy within Eastern Europe. It is the irony of history that the 

far more naïve and seemingly unrealistic ideology, calling for the revival of 

an almost unspoken language in the distant Palestine, was the one that later 

would become reality. 

1.3. Language interactions in the Haskala 

Let us now return to our model of language interactions. As we have seen, 

the intensity of the interaction depends on the number of “exchange

particles” - language-changing individuals - that is, on a kind of “distance”

measured in the social network, and furthermore on the “eligibility” of the

languages to transmit and to adopt features. We shall now confront this

model with the linguistic reality of the different stages of the Haskala. 

Concerning the first stage, when only a handful of followers of Moses 

Mendelssohn rejected the Jüdischdeutsch and started speaking 

Hochdeutsch, our model will correctly predict that the number of exchange 

particles is insufficient to affect German in a perceptible way. 

The number of exchange particles increases dramatically when we reach 

the first half of the nineteenth century. However, the people changing 

language more or less consciously adopted the idea of their original idiom 

being an unclean and corrupt version of the target language. Consequently, 

by nature their language change consisted of not bringing any influence on 

the target language with them. By applying our vague physical model to 

this situation, we might say that although the two languages were indeed 

close - from the viewpoints of geography, linguistic similarity and social 

contacts - Hochdeutsch was not “eligible” enough to be seriously affected.
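
Purely as a toy sketch of the vague model applied above, the three ingredients can be combined multiplicatively, in the spirit of the “product of eligibilities” from physics. The multiplicative form, the class `Contact`, the function `interaction_strength` and every number below are assumptions made for illustration. The paper itself proposes no quantitative formula.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    """One contact situation; every field is an illustrative assumption."""
    name: str
    carriers: int       # number of language-changing individuals
    quantum: float      # influence each individual brings along
    eligibility: float  # readiness of the target language to adopt features (0..1)


def interaction_strength(c: Contact) -> float:
    # Assumed multiplicative form: if there are no carriers, no carried
    # influence, or no eligibility, the interaction vanishes.
    return c.carriers * c.quantum * c.eligibility


# Placeholder values only: few carriers in stage 1, many carriers but an
# "ineligible" target language in stage 2.
stage1 = Contact("stage 1", carriers=10, quantum=1.0, eligibility=0.5)
stage2 = Contact("stage 2", carriers=10_000, quantum=1.0, eligibility=0.0)
for c in (stage1, stage2):
    print(c.name, interaction_strength(c))
```

With these placeholder values both situations yield a weak or vanishing interaction, for different reasons: too few carriers in the first case, no eligibility in the second.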

What happened in the third stage of the Haskala? The following three 

case studies represent three possibilities. The first one, the influence of 

Yiddish on Hungarian, was actually a case where some elements of stage 2 

Haskala were still present. The emancipation of the Jews was closely 

related to their assimilation into the Hungarian society, culture and 

language. As Jews wished to become an equal part of that society, let us 

call this case type e. Each of the many people brings only a very “light” 

quantum of influence, similarly to the very little mass, if any, of the 

electron neutrinos. The type mu designates a case when Jews migrated to a 

newly created Jewish “land, language and culture”, namely to Modern 

Hebrew. Here fewer people carry possibly more “weight”, which is why they


can be paralleled by the heavier muon neutrinos. In the third case, that is 

the birth of Esperanto, only one person of Jewish cultural background 

wished to transform the entire world, with a total rejection of reference to

any form of Jewishness, at least on a conscious level (type tau, referring to 

the probably heaviest type of neutrinos). 

2. Three examples of weak interaction 

2.1. Type e: Yiddish and Hungarian 

Nineteenth century Hungary was situated on the border of Western 

European Jewry, affected already by the first two stages of Haskala, and 

Eastern European Jewry, which would be reached only by its third phase. 

From the second half of the previous century onward, the Jewish 

immigration from Bohemia and Moravia had been importing a rather 

urbanized population speaking Western Yiddish, or even Jüdischdeutsch, 

whereas Eastern Yiddish speaking Galician Jews inhabiting Eastern 

Hungary represented the westernmost branch of Eastern European Jewry. 

Not only were the linguistic features of the two groups strikingly different, 

but also their social, economic and cultural background. 

In the social and economic fields, Hungary met a first wave of 

modernization in the 1830s and 1840s, which is referred to as the reform 

age, reaching its peak in the 1848-49 revolution. After the so-called 

Compromise with Austria in 1867, the consequence of which was the

creation of the Austro-Hungarian Empire with a dualistic system, the most 

urbanized parts of the country showed an especially remarkable economic 

and cultural growth. 

Parallel to the phenomenon of general modernization, the Jewish 

population underwent a similar process to the one we have already seen 

apropos of the French and German Jewry that had gone through these 

social changes fifty years earlier. The second quarter of the century already 

witnessed a few Jewish thinkers, mainly rabbis arriving from Germany or

Bohemia, and bringing modern ideals with them. Yet, their effect cannot be 

perceived on a larger social scale before the last third of the century. 

A few differences should, however, be noted between German and 

Hungarian Haskala. First, for the larger society into which Hungarian Jews


wished to integrate, Enlightenment was not so much the consequence of the 

embourgeoisement as its catalyst. Enormous heterogeneities in the

degree of development could be found within the country, in social as

well as economic terms. This general picture was paralleled by a

heterogeneous distribution of Eastern and Western type of Jewry. Thus, 

even if the most Europeanized Jews may have wished, they could not 

disown their pre-Haskala coreligionists living close to them. 

Moreover, the modern Hungarian society and culture had to be created 

in spite of the Austrian occupation. Social constructs underwent huge 

changes, and any group of people identifying themselves as Hungarian - 

and not Austrian - could influence the new shapes of society and culture. 

Immigrants from all directions played a fundamental role in laying down 

the bases of modern Hungarian urban culture. These are the circumstances 

under which most of the Jews chose the Hungarian, rather than the German 

or Yiddish culture and language. This decision was far from being evident. 

Even most of the orthodoxy adopted Hungarian, though more slowly and 

while simultaneously keeping Yiddish.

By putting together the pieces, we obtain an image in which the 

dynamically changing Hungarian culture and society is searching for new,

modern forms, and is ready to integrate foreign influences - as long as the 

carriers identify themselves as new Hungarians. Further, a major part of the 

Jewish population is seeking its place in this new society, wants to adopt 

the new culture, but is still strongly connected - often against its will - to 

the pre-Haskala Jewry living not so far from them. Consequently, we have 

both a high “eligibility” for being influenced on the part of the Hungarian 

language, and a large number of “exchange particles” flowing from Yiddish 

to Hungarian. 5 

What is the outcome of such a situation? Let us consider a few examples 

of Yiddishisms in Hungarian. I shall distinguish between three registers that 

Yiddishisms entered considerably: the Jewish sociolect of Hungarian, argot 

(slang), and standard Hungarian. 

The vocabulary of Hungarian speaking Jews unsurprisingly includes a 

large number of words specific to domains of Jewish culture and religion. 

In some cases only phonological assimilation takes place. The 

Hungarian phonological system lacks a short /a/, and the short counterpart

of /aː/ is /ɒ/. Therefore the Yiddish word […] (‘Rosh Ha-shana, name

of the Jewish New Year’, from Hebrew […], i.e. […] in

standard Hungarian Ashkenazi pronunciation) becomes optionally

[…]. Although the original Yiddish pronunciation […] is still


possible, the latter emphasizes the foreign origin of the word. An analogous 

example is the word barchesz ([…] or […], ‘chala, a special bread

used on Shabbat and holidays’), which is clearly of Yiddish origin, but is

unknown outside Hungary; it may have belonged to the vocabulary of 

Hungarian Yiddish. 

Other words immediately underwent Hungarian morphological 

processes. In fact, it is a well known phenomenon in many languages of the 

world that borrowed verbs, unlike borrowed nouns, cannot be integrated 

directly into the vocabulary of a given language. This is the case in words 

like lejnol (‘to read the Torah-scroll in the synagogue’), lejnolás (‘the 

reading of the Torah-scroll’) as well as snóder (‘money given as donation’), 

snóderol (‘to donate money, especially after the public Torah-reading’), 

snóderolás (‘the act of money donation’). In the first case, the Yiddish verb 

leyenen (‘idem’) 6 was borrowed and one of the two most frequent 

denominal verbal suffixes, -l, was added. 7 The word lejnolás is the nomen 

actionis formed with the suffix -ás. The expression tfilint légol (‘to put on 

the phylacteries’) originates from German and Yiddish legen, and has gone 

through the same processes. For snóderol, Hungarian borrows a Yiddish 

noun, 8 which then serves as the base of further derivations. 
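
The derivational chain just described can be sketched as plain string concatenation. This is only an illustration built from the forms quoted in the text. The function names `verbalize` and `action_noun` are hypothetical, the stem `lejn` is assumed from the quoted verb, and the linking vowel -o- is hard-coded from the attested examples rather than computed by Hungarian vowel harmony.

```python
def verbalize(stem: str) -> str:
    """Attach the verbal suffix -l, with the linking vowel seen in the examples."""
    # Assumption: the linking vowel is always "o", as in lejnol and snóderol.
    # Vowel harmony and stem alternations are deliberately not modelled.
    return stem + "ol"


def action_noun(verb: str) -> str:
    """Form the nomen actionis with the suffix -ás."""
    return verb + "ás"


# Only stems whose derived forms are quoted in the text.
for stem in ("lejn", "snóder"):
    verb = verbalize(stem)
    print(stem, "->", verb, "->", action_noun(verb))
```

The sketch reproduces only the attested forms lejnol, lejnolás, snóderol and snóderolás; it should not be read as a general account of Hungarian suffixation.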

The Jewish sociolect of Hungarian includes further lexical items, which 

do not belong to the domain of religious practice or Jewish culture. One 

such word is unberufn (‘without calling [the devil]’), which should be 

added out of superstition to any positive statement that the speaker hopes to 

remain true in the future. For instance: ‘My child grows in beauty, 

unberufn’ (Blau-Láng, 1995:66). Nowadays, many people of the generation 

born after World War II and raised already in an almost non-Yiddish 

speaking milieu judge this expression as having nothing to do with 

superstition, but qualifying a situation as surprisingly good, like ‘You don’t 

say so! It’s incredible!’ and definitely including also some irony. 9 Others of 

that generation say in the same surprising-ironic context: “My grandma 

would have said: unberufn…”, even if Grandma had used that word in a 

slightly different way. This second meaning of unberufn clearly lacks any 

reference to superstition, since the same people would use another 

expression (lekopogom) to say ‘touch wood! knock on wood!’. 

Unlike the previous interjections, the adjective betámt (‘nice, intelligent, 

smart, sweet, lovely’) already enters the “real” syntax of the target 

language, even if morphological and phonological changes have not taken 

place yet - which happened in the case of lejnol and snóderol. The word 

betámt consists of the Hebrew root taam (‘taste’), together with the


Germanic verbal prefix be- and past participle ending –t. The resulting 

word denotes a person who “has some taste”: somebody who has some 

characteristic traits, who is interesting, who has style and some sense of 

humour, who is kind, polite, and so on. It is typically used by “Yiddishe

mammes” describing the groom they wish their daughter had. 

So far, we have seen examples where the language changing population 

has kept its original expression to denote something that could be best 

expressed using items of their old vocabulary. This Jewish sociolect has 

become an organic part of modern Hungarian, acknowledged, and partially 

known by many non-Jewish speakers, as well. But do we also find 

influences of Yiddish outside of the Jewish sociolect? 

The register that is the most likely to be affected under such 

circumstances is probably always slang: it is non-conformist by definition, 

and, therefore, it is the least conservative. Slang is also the field where 

social norms, barriers and older prejudices play the least role. This may be 

the reason why Hungarian slang created in the nineteenth century borrowed 

so much from the languages of two socially marginal groups: the Gipsy 

(Roma) languages and Yiddish. In contemporary Hungarian slang, one can 

find well-known words of Yiddish origin such as: kóser (‘kosher’, 

meaning ‘good’ in slang); tré (‘bad, crappy, grotty’, from Hebrew-Yiddish- 

Hungarian tréfli ‘ritually unclean, non-kosher food’); majré (‘fear, dread, 

rabbit fever’, from Hebrew mora ‘fear’ > Ashkenazi […] > Yiddish 

moyre […] > Hungarian […]), further derived to majrézik (‘to fear, 

to be afraid of sg.’); szajré (‘swag, loot, hot stuff’, from Hebrew sehora 

[…] et al., 1967-76). An interesting 

construction is stikában, meaning ‘on the sly, in secret, quietly’. Its origin is 

the Aramaic-Hebrew noun […] ‘remaining silent’, which receives a 

Hungarian inessive case ending, meaning ‘in’. 

Through slang, some of the Yiddish words have then infiltrated into the 

standard language and become quasi-standard. Thus, the word haver - from 

the Hebrew chaver ‘friend’ - is used nowadays as an informal synonym for 

a ‘good acquaintance, a friend’. Similarly, dafke means in spoken 

Hungarian ‘For all that! Only out of spite!’. Furthermore, there are words 

of Yiddish origin which did not enter Hungarian through slang, but 

through cultural interaction: macesz (‘matzo, unleavened bread’, from 

Hebrew matzot, plural form of matza; its ending clearly shows that the 

word arrived in Hungarian through Yiddish) or sólet (‘tsholent’, a typically 

Hungarian Jewish bean dish, popular among non-Jews, too). 10


To summarize, the large number of “exchange particles”, that is, Jewish 

people gradually changing their language from Yiddish to Hungarian, has 

affected the target language in three ways. One of them has been the 

creation of a special Jewish sociolect. This was not a secret language 

though, and non-Jews have borrowed quite a few expressions. This fact led 

to the second kind of influence, namely the large number of Yiddish 

words entering slang. Some of these words have even infiltrated into the 

relatively more informal registers of the standard language. The third 

way is cultural interaction: the exchange of cultural goods - for instance 

in the field of gastronomy - has inevitably resulted in the exchange of the 

vocabulary designating those goods. 

2.2. Type µ: Yiddish and Modern Hebrew 

The fruit of Western European Haskala in the field of science was the birth 

of Wissenschaft des Judentums. The Jewish scholars belonging to this 

group aimed to introduce modern approaches when dealing with traditional 

texts, Jewish history, and so forth. Their approach contrasted with traditional 

rabbinical activity in the same way as the romanticist cantorial compositions 

by Salomon Sulzer and Louis Lewandowski contrasted with traditional synagogal 

music: modernists aimed to produce cultural goods that were esteemed by 

the modern society, both by Jews and the recipient country. A further 

motivation of the Wissenschaft des Judentums was to expose the values of 

post-Biblical Jewish culture, and to present them as an organic part of 

universal culture: by emancipating the Jewish past, they hoped also to be 

emancipated by contemporary society. 

This background illuminates why early Haskala honored Hebrew so 

much - the language of the contribution par excellence of the Jewish 

nation to universal culture, which is the Hebrew Bible, and a language that 

had been long studied by Christian Hebraists. It also explains why Yiddish, the 

supposed jargon of the uneducated Jews and a corrupt version of German, 

was so much scorned at the same time. 

Although the goal of the earlier phases of Haskala was to promote the 

literary language of the recipient country among Jews, that is practically 

Hochdeutsch, and Hebrew was principally only the object of scholarly 

study, still some attempts were made to use the language in modern 

domains, at least for some restricted purposes. After a few pioneering 

experiments to establish Hebrew newspapers in the middle of the


eighteenth century, the Hebrew literary quarterly Ha-Meassef appeared as 

early as 1784 (Sáenz-Badillos, 1993:267). 

However, it was not until the middle of the next century, when 

Haskala reached Russia, that the need to revive the Hebrew language was 

really articulated. As already discussed, the major reasons for this switch 

were that the Jewish population did not see the underdeveloped 

surrounding society as a model to which they wanted to assimilate; the 

Russian society and policy did not show any real sign of wanting to 

emancipate and integrate Jews, either; furthermore, the huge Jewish 

population reached the critical mass required to develop something in itself. 

The summation of these factors led to the idea of seeing Jewry as a separate 

nation in its modern sense. A further factor reinforcing Jewish national 

feelings both in Eastern and Western Europe was the emergence of modern 

political anti-Semitism in the 1870s in the West, accompanied by events 

such as the huge Russian pogroms in 1881, the blood libel of Tiszaeszlár, 

Hungary (1882-3) or the Dreyfus affair in France (starting in 1894). 

The claims following from this idea were that the Jewish nation has the 

right to have a country - in Palestine or elsewhere, but at least it should 

receive some local autonomy - , and also that the Jewish nation must have 

its own national language. The two major candidates for the Jewish 

national language were Yiddish and Hebrew, although German was not out 

of the competition, either (cf. e.g. Shur, 1979:VII-VIII). 

The first wave of attempts to revive Hebrew consisted mainly of purists, 

seeing Biblical Hebrew as the most precious layer of the language: some of 

them went so far as to prefer creating very complicated expressions 

to designate modern concepts, rather than using non-Biblical vocabulary. 

The fruits of this early period are among others the first regular Hebrew 

weekly, Ha-Maggid (1856), the first modern play by D. Zamoscz (1851), 

novels by A. Mapu, as well as works of S. J. Abramowitsch (Mendele 

Moykher Seforim), who can be considered one of the founders of both 

modern Hebrew and modern Yiddish literature. 

The real upswing was observable in the last quarter of the century, 

especially after the 1881 pogroms, and when Haskala had reached the 

broadest masses, as well. Traditionally, the publication of Eliezer Ben- 

Yehuda’s article in 1879 entitled ‘A burning question’ is considered to be 

the opening of the new era (Sáenz-Badillos, 1993:269). Ben-Yehuda (1858- 

1922) has been portrayed as the hero of the revival: he moved to Jerusalem 

in 1881, where he forced himself and his family to speak Hebrew. To speak 

a language, that is, to produce everyday, spontaneous sentences “in real-time”, 

in a language that had been mostly used for writing and reading and 

only in restricted domains. His son, Ithamar (1882-1943), was the first 

person after millennia who grew up in an exclusively Hebrew-speaking 

environment. Ben-Yehuda constantly introduced new words designating 

weekday concepts, while he was editing a newspaper and working on his 

monumental Thesaurus, which incorporated material from ancient and 

medieval literature. In 1890, he founded the Va’ad ha-Lashon (‘Language 

Committee’), the forerunner of the Hebrew Language Academy, thereby 

creating a quasi-official institution for language planning. 

However, Shur (1979) has argued against an overestimation of Ben- 

Yehuda’s role. Out of Fishman’s five stages of language planning (in Shur, 

1979) (1. code selection; 2. ideologization of the choice; 3. codification; 4. 

elaboration and modernization; 5. standardization, i.e. the acceptance by the 

community), Ben-Yehuda was salient especially in codification and 

elaboration, as well as in vitalization, which was also necessary under the 

given circumstances. But for socio-political reasons, he did not have much 

influence on the initial language choice and its ideologization, as well as on 

the final acceptance of the codified and elaborated standard. 

It is clear that Yiddish was the mother tongue, or one of the main 

languages for a major fraction of the members of the Va’ad ha-Lashon, 

including Ben-Yehuda himself. Moreover, people with Yiddish as first 

language represented an important part of the speaker community of the 

old-new tongue in the first half of the twentieth century. Yiddish was not 

scorned anymore, as it had been a century before, but it was not considered 

as a major source for language reform, either. Especially for the later 

generations, Yiddish would symbolize the Diaspora left behind by the 

Zionist movement. 

Yiddish-speaking “exchange particles” dominated the community, much 

more than in the Hungarian case. Yet, a very conscious ideology required 

changing the previous ethnic language to the old-new national language, 

especially after the 1913-14 “Language Quarrel”, wherein the defenders of 

Hebrew defeated those of German and Yiddish (Shur, 1979:VII-VIII, X). 

This ideology was actively present in almost each and every individual who 

had chosen to move to the Land of Israel in a given period - contrary to the 

European case, where the ideology of changing the language was explicit only 

in the cultural elite. Further, the language change was not slow and gradual, 

but drastic in the life of the people emigrating to Palestine, combined with a 

simultaneous radical change in geographical location, social structure and 

lifestyle. What phenomena would this constellation involve?


Yiddish influence on Modern Hebrew vocabulary has been investigated 

by - among others - Haim Blanc. For instance, the Modern Hebrew 

interjection davka (approx. ‘necessarily, for all that’) is clearly a 

Hebraisation of Yiddish dafke, of Hebrew origin itself, and mentioned also 

in relation to Hungarian. Similarly, kumzitz ‘get-together, picnic, 

campfire’ undoubtedly originates from the Yiddish expression ‘come [and] 

sit down!’, since only in Yiddish do we find [u] in the verb ‘to come’. 

However, the expression was probably coined in Hebrew, as standard 

Yiddish dictionaries do not mention it. One can easily imagine the early 

pioneers sitting around a campfire in the first kibbutzim, chatting in a 

mixture of Yiddish and Hebrew, and inviting their comrades to join them. 

Nissan Netzer (1988) analyses the use of the Modern Hebrew verb 

firgen and the corresponding de-verbal noun firgun. Officially, the word is 

still not considered to belong to the language, for it is not attested in any 

dictionary of Hebrew that I know. Definitions for this word I have found on 

the Internet are: “the ability to allow someone else to enjoy if his or her 

enjoyment does not hurt one,” and “to treat favorably, with equanimity, to 

bear no grudge or jealousy against somebody,” and also “to be delighted at 

the success of the other”. The word can be traced back to Yiddish farginen 

‘not begrudge, not envy, indulge’. As Netzer has demonstrated, there is a 

linguistic gap in Hebrew, for the expressions darash et tovato shel… or lo 

hayta eno tsara be- that should bear that meaning are cumbersome, 

circuitous, overly sophisticated in style and seem to cloud the true 

linguistic message. Therefore, they were not accepted by the linguistic 

community. When a leading Hebrew linguistics professor used the Yiddish 

equivalent in the early sixties, the situation made the listeners of an 

academic lecture smile, because at that time the Yiddishism was considered 

to be a folk idiom that would finally withdraw in favor of a “real Hebrew 

expression”. However, firgen had become more and more accepted 

in daily conversation and even in journalistic writings by the eighties. 11 

This example has led us to the issue of the sociolinguistic status of 

Yiddish words in Modern Hebrew. Ora Schwarzwald (1995) shows that the 

vocabulary of the most used classical texts, such as the Hebrew Bible and 

liturgy, has become the base of Modern Hebrew, in all its registers. 

Furthermore, loanwords of European languages are also used both in 

formal and non-formal language. However, from less esteemed languages, 

such as Jewish languages (e.g. Yiddish and Ladino), as well as Arabic, 

words would infiltrate primarily into lower registers and everyday informal 

speech.


For instance, chevre ‘friends’ is used mainly when informally addressing 

a group of people, and it is a borrowing of the similar word in 

Yiddish (khevre ‘gang, bunch of friends, society’). The latter obviously 

comes from Hebrew chevra ‘society, company, gathering’, whose root is 

chaver ‘friend’, a well-known word for speakers of Hungarian and Dutch 

(gabber), too. The originally Hebrew word has thus returned to Modern 

Hebrew, while keeping the phonological traces of its trajectory. Also note the 

minor shifts in the semantics during the two borrowings. 

Another example of Yiddish influence on informal speech is the use of 

the -le diminutive suffix: abale from aba ‘dad’, Sarale ‘little Sarah’, 

Chanale ‘little Hannah’, and so forth. Observe that the suffix follows the 

Hebrew word, whereas in Yiddish one would expect Sorele and Chanele. 

Thus, the influence of Yiddish on Modern Hebrew is indeed similar to 

its influence on Hungarian: lower registers and informal speech constitute 

one of the channels through which this interaction takes place. To make the 

similarity even more prominent, we can point to two further channels, shared 

by the Modern Hebrew case and the Hungarian case. As in 

Hungarian, the designation of goods of general culture, such as food names 

(beygelach ‘bagels or pretzels’), represents a domain for word borrowings. 

Moreover, Yiddish loan words, or Hebrew words with a Yiddish or 

Ashkenazi pronunciation are likely to appear in religious vocabulary (e.g. 

rebe ‘Chasidic charismatic leader’); typically in the sociolect of religious 

groups (especially within the ultra-orthodox society), and in the language 

used by secular Israelis to mock the stereotypically Yiddish-speaking ultra-orthodox 

Jews (e.g. dos ‘an ultra-orthodox person’, from Hebrew dat 

‘religion’; vus-vus-im ‘the Ashkenazi ultra-orthodox Jews’, who often say 

Vus? Vus? ‘What? What?’ followed by the Hebrew plural ending -im). 

2.3. Type τ: Yiddish and Esperanto 

Esperanto emerged in the very same context as Modern Hebrew. Its creator, 

Lazar Ludwik Zamenhof (1859-1917), was born one year after Eliezer Ben-Yehuda, 

likewise into a Jewish family living in a small Lithuanian town, 

whose population was composed of Russian, Polish and Lithuanian people, 

but was dominated by a Jewish majority. The Litvak (Lithuanian-Jewish) 

Haskala background of both men encouraged traditional Jewish education 

combined with studies in a secular Gymnasium; both of them went on to


study medicine. Following the 1881 wave of pogroms, in the year in which 

Ben-Yehuda moved to Jerusalem, Zamenhof published an article calling for 

mass emigration to a Jewish homeland. For a few years, he became one of 

the first activists of the early Zionist movement Hovevei Tzion (“Lovers of 

Zion”). Berdichevsky (1986) points out the similarities even in the 

mentality and the physical appearance of Zamenhof and Ben-Yehuda. 

Nevertheless, two key differences should be pointed out. The first one is 

Zamenhof’s pragmatism. In his 1881 article, Zamenhof imagined the 

Jewish homeland to be in the western part of the United States, a relatively 

unsettled area in those days, which would have aroused much less sensitivity 

on all sides. Furthermore, Zamenhof shared the skepticism of many of 

his contemporaries about the feasibility of reviving the Hebrew language. 

According to the anecdote, Theodor Herzl said once that he could not buy 

even a train ticket in Hebrew. Leading Jewish writers, such as Mendele 

Moykher Seforim, oscillated between writing in Yiddish and in Hebrew; 

both of these languages called for the establishment of a modern, secular 

literary tongue. The young and pragmatic Zamenhof chose to reform 

Yiddish, the language with millions of native speakers, whereas the first 

native speaker of Modern Hebrew, the son of Ben-Yehuda, had not yet been 

born. 

In his early years, Zamenhof wrote a comprehensive Yiddish grammar 

(completed in 1879, partially published in 1909 in the Vilna Journal, Lebn 

un Vissenschaft, and fully published only in 1982). He argued for the 

modernization of the language and fought for the use of the Latin alphabet, 

instead of the Hebrew one. How is it possible then that a few years later 

Zamenhof changed his mind, and switched to Esperanto (1887)? 

Here the second key difference comes into the picture. Ben-Yehuda was 

sent by his orthodox family to a yeshiva (traditional school teaching mainly 

the Talmud), where one of the rabbis secretly introduced him to the 

revolutionary ideas of the Haskala. By contrast, Zamenhof’s father and 

grandfather were enlightened high-school teachers of Western languages 

(French and German). For him, being Jewish probably meant a universal 

mission to make the world a better place for the whole of humankind. This 

idea originates from eighteenth-century German Haskala philosophers 

claiming that Judaism is the purest embodiment so far existing of 

universal morality and of the faith of Pure Reason; even today a major part 

of Jews worldwide perceive Judaism this way. 

Zamenhof did not therefore content himself with the goal of creating a 

Jewish national language. For him, similarly to his semi-secularized


coreligionists joining the socialist movement in the same decades, unifying 

the human race and building a new world order presented the solution for - 

among others - the problems of the oppressed Eastern European Jewry. 

And also the other way around: the secular messianic idea of the unification 

of the dispersed and oppressed Jews into a Jewish nation was just one step 

away from the secular messianic idea of the unification of the whole of 

mankind into a supra-national unit. This explains not only the motivations 

of Zamenhof himself, but also why Jews played such an important role in 

the pre-World War II Esperanto movement in Central and Eastern Europe 

(Berdichevsky, 1986:60). Whereas socialists fought for a social-economic 

liberation of the oppressed, Zamenhof spoke about the liberation of 

humans from cultural and linguistic barriers. It is not a coincidence that 

the twentieth-century history of the Esperantist movement was so much 

intermingled with that of the socialist movements. 

Zamenhof’s initiative was to create a language that would be equally 

distant from and equally close to each ethnic language, so that each human 

being would have an equal chance of using this bridge connecting cultures and 

people. Hence Zamenhof created a vocabulary and a grammar using 

elements of languages he knew: Russian (the language his father spoke 

at home and the language of his high school), German and French (the 

languages his father and grandfather were teachers of), Polish (the language 

of his non-Jewish fellow children), Latin and Greek (from high school), as 

well as English and Italian. Note that the resulting language, similarly to 

most artificial languages, is inherently European and Indo-European in its 

character, though extremely simplified. 

However, one should not forget that Zamenhof’s native tongue was 

Yiddish: this was the language he used with his schoolmates in the Jewish 

primary school (kheyder, cf. Piron, 1984), and most of his life he kept 

contact with circles where Yiddish was alive. So one would wonder why 

Yiddish is not mentioned overtly among the source languages of Esperanto. 

Seeing Zamenhof’s former devotion to the Jewish cause and the Yiddish 

language, as well as his later remark that Yiddish is a language similar to 

any other (in Homo Sum (1901), cf. Piron (1984:17) and Berdichevsky 

(1986:70)), the possibility that he despised “the corrupt version of German” 

or that he felt shame at his Yiddish origins is out of the question. 

The challenging task now is to find at least covert influences of Yiddish 

on Esperanto. 

As strange as it may sound, a considerable literature has been devoted to 

etymology within Esperanto linguistics. One of the biggest mysteries is the


morpheme edz. As a root, it means ‘married person’ (edzo ‘husband’; 

edzino ‘wife’, by adding the feminine suffix -in-). As a suffix, it turns 

the word’s meaning into the wife or husband of the stem: lavistino ‘washerwoman’ 

vs. lavistinedzo ‘washerwoman’s husband’; doktoro ‘doctor’ vs. 

doktoredzino ‘doctor’s wife’. Hungarian Esperantists have tried to use this 

suffix to translate the Hungarian suffix -né (‘wife of…’, e.g.: Deákné ‘wife 

of Deák, Mrs. Deák’; cf. Goldin (1982:28)). The phonemic content of the 

morpheme is not similar to any word with related meaning in any of the 

languages that Zamenhof might have taken into consideration. 

Zamenhof himself wrote in a letter to Émile Boirac that the morpheme 

was the result of backformation, and that originally it was a bound form 

(Goldin, 1982:22f). Boirac suggested in 1913 the following reconstruction: 

if the German Kronprinz (‘heir apparent’) became kronprinco in Esperanto, 

while Kronprinzessin (‘wife of a crown prince’, note the double feminine 

ending: the French feminine suffix -esse is followed by the Germanic 

feminine -in) turns to kronprincedzino, then the ending -edzin- can be 

identified as ‘a woman legally bound to a man’. By removing the feminine 

suffix -in-, we obtain the morpheme -edz-. Goldin adds to this theory that 

the morphemes es and ec had already been used with other meanings, which 

is why the surprising [dz] combination appeared. Summarizing, the 

etymology of the Esperanto morpheme edz would be the French feminine 

ending -esse, which had been reanalyzed with a different meaning due to 

the additional feminine suffix in German. 

However, this is not the end of the story. Other alternatives have also been 

proposed. Waringhien and others (in Goldin, 1982) have brought forward 

the idea that the word serving as the base of backformation was the Yiddish 

word rebetsin (‘wife of a rabbi’). In fact, this word can be reanalyzed as 

reb+edz+in, and we obtain the edz morpheme using the same logic as 

above. Goldin’s counterargument that the Yiddish word is actually rebetsn 

with a syllabic [n] is not at all convincing: old Yiddish spelling often uses 

the letter yod to designate a schwa, or even more the syllabicity of [… and …], 

similarly to the letter e in German spelling, as in wissen. Consequently, I can 

indeed accept the idea that a pre-YIVO spelling rebetsin was in the mind of 

Zamenhof. 

Piron (1984) adds further cases of possible Yiddish influence. In words 

taken from German, the affricate [pf] always changes to [f]: German pfeifen 

‘to whistle’ became Esperanto fajfi. This coincides with Yiddish fayfn. 

However, one is not compelled to point to Yiddish as the origin of this word: 

the reason can simply be that the affricate [pf] is too typical of German, not

occurring in any other languages that served “officially” as examples for 

Zamenhof. In other words, [pf] was not seen as universal enough. But what 

about the consonant clusters […], […], […], which are also characteristic 

solely of German (and of Yiddish)? May the solution be that while [pf] 

becomes [f] in Yiddish, these clusters are unchanged; therefore, Zamenhof 

felt less discomfort with regard to the latter clusters than with regard to [pf], 

which truly occurs exclusively in German? I do not believe that we can do 

more than speculate about the different unconscious factors acting within a 

person more than a hundred years ago. The only claim we can make is that 

some of these factors must have been related to Yiddish, as expected from 

the fact that Yiddish was one of the major tongues of Zamenhof. 

In the field of semantics, Piron cites the distinction in Esperanto 

between landa (‘national, related to a given country’, adjective formed 

from lando ‘country’) as opposed to nacia (‘national, related to a given 

nation’, adjective from nacio ‘nation’). This differentiation exists in 

Yiddish (landish and natsional), but not in any other languages that 

Zamenhof might have taken into consideration. Piron also argues against 

the possible claim that this is not a Yiddish influence, rather an inner 

development related to the inner logic of Esperanto. 

The most evident example of Piron is Esperanto superjaro ‘leap year’, a 

compound of super ‘on’ and jaro ‘year’. No known language uses the 

preposition on or above to express this concept. However, Yiddish has 

iberyor for ‘leap year’, from Hebrew ibbur (‘making pregnant’), the term 

used in rabbinic literature for intercalating an extra month and making the 

year a leap year (e.g. Tosefta Sanhedrin 2:1-7). On the other hand, iber also 

means ‘above’ in Yiddish, which explains the strange expression in 

Esperanto. I do not know if Zamenhof realized that the Yiddish expression 

iberyor is not related to German über, but this is probably not relevant. 

Let us summarize this section. Yiddish influence on Esperanto is a case 

where there is only one exchange particle - in the first order approximation, 

at least, since we have not dealt with the possible influences related to the 

numerous later speakers of Esperanto of Yiddish background. However, this 

one particle had a huge impact on the language for a very obvious reason. 

Even if he did not overtly acknowledge that Yiddish had played a role in 

creating Esperanto, it is possible to discover the - either consciously hidden 

or unconscious - traces of Yiddish. 

Did Zamenhof want to deny that he had also used Yiddish, as a building 

block of Esperanto? Perhaps because his goal was indeed to create a 

universal, supra-national language, and not the language of the Jewish


nation? Or, alternatively, was this influence unconscious? I do not dare to 

give an answer. 

3. Conclusion 

In linguistics, we could define weak interaction as an interaction that is not 

overtly acknowledged. No one would deny the influence of the French-speaking 

ruling class on medieval English, or the impact of the Slavic 

neighbors on Hungarian. But sometimes, conscious factors hide the effect. 

Yet, weak interactions are as crucial for the development of a language, as 

the nuclear processes emitting neutrinos in the core of the Sun that produce 

the energy which is vital for us. 

We have seen three cases of weak interaction between languages. In 

fact, all three stories were about the formative phase of a new or 

modernized language, in the midst of late nineteenth-century Eastern 

European Jewry. In the cases of Yiddish influencing Hungarian and Modern 

Hebrew, the number of “exchange particles”, that is, the number of initially 

Yiddish-speaking people joining the new language community, was 

extremely high: roughly one tenth of the Hungarian-speaking population in 

nineteenth-century Hungary, and probably above 50% of the Jews living in 

early twentieth-century Palestine. Nonetheless, in both cases we encounter 

an ideology promoting the new language and disfavoring Yiddish. 

Because the level of consciousness of this ideology seems to be 

inversely proportional to the ratio of “exchange particles” - stronger in 

Palestine than in Hungary - , the two factors cancel each other out, and we 

find similar phenomena. For instance, Yiddish has affected first and 

foremost lower registers, which are less censored by society; therefrom it 

infiltrates into informal standard language. Additional trends are Yiddish 

words entering specific domains, such as gastronomy or Jewish religious 

practice. It is essential to note, however, that not all concepts that are new in 

the target culture are expressed by their original Yiddish word: many new 

expressions in these domains have been coined in Hungarian and Modern 

Hebrew, and accepted by the language community. 

The third case that we have examined is different. Zamenhof was a 

single person, but as the creator of Esperanto, he had an enormous 

influence on the new language. The influence of Yiddish was again weak in 

the sense that it was not overtly admitted; however, we could present 

examples where the native tongue of Zamenhof influenced the new


language. We could have cited, as the articles mentioned have done, 

numerous further instances where the influence of Yiddish cannot be 

proven directly, since the given phenomenon could have been taken from other 

languages as well; however, one can hypothesize that Yiddish played - 

consciously or unconsciously - a reinforcing role in Zamenhof’s decisions. 

I do hope that I have been able to prove to the reader that seemingly 

very remote fields, such as physics, social history and linguistics, can be 

interconnected, at least for the sake of a thought experiment. Furthermore, 

“exchange particles” in the field of science, and Tjeerd is certainly among 

them, have hopefully brought at least some weak interaction among the 

different disciplines. 

Notes 

1 According to http://cupp.oulu.fi/neutrino/nd-mass.html, the mass of the 

electron neutrino (νe) is less than 2.2 eV, the mass of the muon neutrino (νµ) 

does not exceed 170 keV, while the mass of the tau neutrino (ντ) is reported to 

be below 15.5 MeV. For the sake of comparison, the mass of an electron is 511 

keV, while the mass of a proton is almost 940 MeV. 

2 Physical phenomena are thought to be reducible to four fundamental forces. 

These are gravity, electromagnetism, weak interaction and strong interaction. 

The last two play a role in sub-atomic physics. 

3 The photons (particles of the light) are the exchange particles for the 

electromagnetic interaction; the hypothetical gravitons should transmit gravitation; 

in the case of the weak interaction, the W+, W- and Z vector bosons play 

that role; whereas the strong interaction is mediated by pions. 

4 Targumim (plural of targum) are the Jewish Aramaic versions of the Hebrew 

Bible from the late antiquity, including also many commentaries beside the 

pure translation. The same way as late antiquity Jews created the commented 

translation of the Holy Scriptures to their native tongue and using their way of 

thinking, Moses Mendelssohn expected his version of the Bible to fit the 

modern way of thinking and the “correct language” of its future readers. 

Obviously, the Biur should first have to fulfil its previous task, namely to teach 

the modern way of thinking and the “correct tongue” to the first generation of 

its readers. Interestingly enough, script was not such a major issue for Mendelssohn 

as “language purity”, so he wrote Hochdeutsch in Hebrew characters, in 

order to better disseminate his work among the Jewish population. 

5 I assume that the formative phase of modern Dutch society and culture in the 

17th and 18th centuries is comparable to that of 19th-century Hungary; even more

so is the role of Jewry in both countries, as a group which was simultaneously

integrating into the new society and also forming it. In both cases, the presence 

of the continuous spectrum from the pre-Haskala Yid to the self-modernizing 

Israelite led to a gradual, though determined giving up of the Yiddish language. 

This socio-historical parallelism could partially explain why phenomena of 

Yiddish influence on Dutch are often similar to those on Hungarian.

Concerning Dutch-Jewish linguistic interactions, readers interested in Jewish 

aspects of Papiamentu, a creole language spoken in the Netherlands Antilles, 

are referred to Richard E. Wood’s article in Jewish Language Review 3 

(1983):15-18. 

6 The etymology of the Yiddish word itself is also interesting. The origin is the 

late Latin or Old French root [��] ‘to read’ (cf. to Latin lego, legere, modern 

French je lis, lire), which was borrowed by the Jews living in early medieval 

Western Europe. The latter would then change their language to Old High 

German, the ancestor of Yiddish. At some point, the meaning of the Old French 

word was restricted to the public reading of the Torah-scroll in the synagogue. 

7 Compare sí ‘ski’ > síel ‘to ski’, tűz ‘fire’ > tüzel ‘to fire’; also: printel ‘to

print with a computer printer’. It is extremely surprising that the word lejnol 

does not follow vowel harmony, one would expect * lejnel. Even though the [�] 

sound can be transparent for vowel harmony, this fact is not enough to explain 

the word lejnol. Probably the dialectal Yiddish laynen was originally borrowed, 

and this form served as the base for word formation, before the official Yiddish 

form leynen influenced the Hungarian word. Some people still say lájnol. 

8 When being called to the Torah during the public reading, one recites a 

blessing, the text of which says: “He Who blessed our forefathers Abraham, 

Isaac and Jacob, may He bless [the name of the person] because he has come up 

to the Torah / who has promised to contribute to charity on behalf of… etc.” 

The part of the text ‘who has promised’ sounds in the Ashkenazi pronunciation 

[�� ]. This is most probably the source of the word snóder, after vowel 

in the unstressed last syllable has become a schwa, a process that is crucial for 

understanding the Yiddishization of Hebrew words. The exciting part of the 

story is that the proclitic [��] (‘that’) was kept together with the following finite 

verbal form ([��] ‘he promised’), and they were reanalysed as one word. 

9 When I asked people about the meaning of unberufn on the mailing list 2nd- 

Generation-Jews-Hungary@yahoogroups.com, somebody reported that her 

non-Jewish grandmother also used to say unberufn with a similar meaning. 

10 Other Hungarian words of Hebrew origin do not come from Yiddish, as shown 

by their non-Ashkenazi pronunciation: Tóra ([��] ‘Torah’, as opposed to its

Yiddish counterpart Toyre) or rabbi (and not rov or rebe). Words like behemót 

(‘big hulking fellow’), originally from Biblical Hebrew behema (‘cattle’, plural:


behemot; appearing also as a proper name both in Jewish and in Christian 

mythology) should be rather traced back to Christian Biblical tradition. 

11 Note that the word has kept its original word-initial [f], without transforming it

into [p], which would have been predicted by Hebrew phonology. Although

this is a remarkable fact for Netzer, it turns out that almost no word borrowed

by Modern Hebrew changes its initial [f] to [p], not even verbs that have

had to undergo morpho-phonological processes (e.g. fibrek, from English to

fabricate). The only exception I have found in dictionaries is the colloquial 

form pilosofiya for filosofiya ‘philosophy’, as well as the verb formed from it, 

pilsef ‘to philosophise’. Furthermore, it can be argued that pilosofiya is not even 

a modern borrowing. The only reason why one would still expect firgen 

�� 

�� 

�� Г �� 

�� 

�� ]. On the other hand, one 

� 

may claim that /f/ and /p/ should be considered as distinct phonemes in Modern

Hebrew, even if no proposed minimal pair that I know of is really convincing. 

References 

�� A Magyar Nyelv Történeti- 

Etimológiai Szótára [The Historical-Etymological Dictionary of 

The Hungarian Language]. Akadémiai Kiadó, Budapest. 

Berdichevsky, Norman (1986). Zamenhof and Esperanto. Ariel, A Review of 

Arts and Letters in Israel, 64: 58-71. 

Blau Henrik, Károly Láng (1995). Szájról-szájra, Magyar-jiddis 

�� 2 . 

Goldin, Bernard (1982). The Supposed Yiddish Origin of the Esperanto

Morpheme edz. Jewish Language Review, 2: 21-33.

Graaf, Tjeerd, de (1969). Aspects of neutrino astrophysics. Wolters-Noordhoff 

nv, Groningen. 

Netzer, Nissan (1988). “Fargen” - Employing a Yiddish Root to Bridge a 

Linguistic Gap in the Hebrew Language (in Hebrew, with 

English abstract). Hebrew Computational Linguistics, 26: 49-58. 

Piron, Claude (1984). Contribution à l’étude des apports du yidiche à 

l’espéranto. Jewish Language Review, 4: 15-29.

Sáenz-Badillos, Angel (1993). A History of the Hebrew Language. University 

Press, Cambridge. 

Schwarzwald, Ora (Rodrigue) (1995). The Components of the Modern Hebrew 

Lexicon: The Influence of Hebrew Classical Sources, Jewish


Languages and Other Foreign Languages on Modern Hebrew (in 

Hebrew, with English abstract). Hebrew Linguistics, 39: 79-90. 

Shur, Shimon (1979). Language Innovation and Socio-political Setting: The 

Case of Modern Hebrew. Hebrew Computational Linguistics, 15: 

IV-XIII.

Prosodic Acquisition: a Comparison of Two 

Theories 

Angela Grimm 


During language development, children’s word productions are subject to a

variety of prosodic processes, e.g. syllable deletions, syllable additions

and stress shift. Using current phonological theory, investigators have

explained these production patterns in a number of different ways.

In this paper, I review two approaches to the development of word 

stress: Fikkert’s (1994) theory of trochaic template mapping and Demuth & 

Fee’s (1995) prosodic hierarchy account. Both theories assume that 

children build up the prosodic representation of words step-by-step, starting 

with the smallest unit and ending with an adult-like representation. I argue 

that both theories are problematic because they overgenerate certain 

structures (e.g. level stress), but that the model of Demuth & Fee can better 

account for the data presented so far. 

This paper is organized as follows: since it is crucial in both theories,

section 2 briefly introduces the basic assumptions of the prosodic

hierarchy. In section 3, I give a survey of Fikkert’s (1994) model of

stress development and Demuth & Fee’s (1995) model based on the

prosodic hierarchy. In section 4, I discuss the problems arising with the

models, and section 5 concludes.

2. The prosodic hierarchy of words 

The prosodic hierarchy up to the word level consists of four constituents. 

The lowest element of the prosodic hierarchy is the mora (µ). Since there 

are often no segmental slots in moraic models, the mora has a double 

function as the unit of syllable weight and as the unique sub-syllabic 

constituent. The moraic level is dominated by the syllabic level (σ), and 

syllables are parsed into feet (F) at the foot level above. The highest unit is

the prosodic word (Wd) which directly dominates the foot level (see Figure 

1): 

Prosodic word (Wd) 

Foot (F) 

Syllable (σ) 

Mora (µ) 

Figure 1. The prosodic hierarchy (Selkirk, 1980)

Syllables differ with respect to the number of moras they contain. Light 

syllables contain one mora, while heavy syllables contain at least two. The 

tendency of languages to assign stress to heavy syllables is expressed by 

the Weight-to-Stress-Principle (WSP). In a parametric approach to word 

stress (cf. Hayes, 1995), languages either respect this principle (quantity-sensitive

languages) or do not (quantity-insensitive languages).

The next constituent of the prosodic organization above the syllable 

level is the foot. Ideally, the foot is binary branching which implies that it 

should consist of two moras or of two syllables. Thus, a binary foot can be 

monosyllabic if it contains two moras (e.g. �� ‘duck’) or it can be 

disyllabic if it consists of two syllables or two moras (e.g. �� ‘papa’). 

The head constituent of the foot receives stress. 
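
The weight and binarity conditions just described can be made concrete with a small sketch. It assumes, purely for illustration, that syllables are written as CV skeletons (e.g. "CVC"), that every V slot contributes one mora, and that a coda consonant contributes one mora only if the language treats codas as moraic. The function names are hypothetical and not taken from the literature.

```python
from typing import List


def mora_count(syllable: str, coda_is_moraic: bool = True) -> int:
    """Count the moras of a CV skeleton such as 'CV', 'CVV' or 'CVC'.

    Assumptions: onset consonants are weightless, each V is one mora,
    and each coda C is one mora only when coda_is_moraic is True.
    """
    first_vowel = syllable.find("V")
    if first_vowel == -1:
        raise ValueError(f"no vowel in syllable skeleton: {syllable!r}")
    rhyme = syllable[first_vowel:]  # everything from the nucleus onwards
    codas = rhyme.count("C") if coda_is_moraic else 0
    return rhyme.count("V") + codas


def is_heavy(syllable: str, coda_is_moraic: bool = True) -> bool:
    """Light syllables have one mora, heavy syllables at least two."""
    return mora_count(syllable, coda_is_moraic) >= 2


def is_binary_foot(syllables: List[str], coda_is_moraic: bool = True) -> bool:
    """A foot is binary if it has two syllables, or one bimoraic syllable."""
    if len(syllables) == 2:
        return True
    return len(syllables) == 1 and mora_count(syllables[0], coda_is_moraic) == 2
```

Under these assumptions a single CVC or CVV syllable forms a binary foot, as does a CV.CV sequence, whereas a lone CV syllable does not.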

The prosodic word is the domain of stress application. It can also 

coincide with a single foot. Because the foot size is the smallest shape a 

prosodic word can have, it is called Minimal Word. Many languages have 

restrictions such that content words must not be smaller than the minimal 

word. There is ample evidence that the minimal word restriction also 

governs the shape of the early words in language acquisition (Demuth & 

Fee, 1995; Demuth, 1996; Fikkert, 1994; Ota, 2001). 

A very important principle of the prosodic hierarchy is the Strict Layer 

Hypothesis (Selkirk, 1984) which demands that layers must not be skipped, 

i.e. that a given prosodic constituent(n-1) is contained in the constituent(n) 

immediately above. Furthermore, it requires that constituents have one and 

only one head, which implies that there is always a difference in 

prominence among the elements forming a given prosodic unit.
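
The two requirements of the Strict Layer Hypothesis as stated here (no skipped layers, exactly one head per constituent) can be checked mechanically. The following is a minimal sketch under the assumption that a prosodic structure is given as a tree whose nodes are labelled with one of the four layers; the class and function names are illustrative only, and trees may be cut off above the mora level to keep examples short.

```python
from dataclasses import dataclass, field
from typing import List

LAYERS = ["mora", "syllable", "foot", "word"]  # lowest to highest


@dataclass
class Node:
    layer: str
    head: bool = False  # True if this node is the head of its parent
    children: List["Node"] = field(default_factory=list)


def obeys_strict_layering(node: Node) -> bool:
    """True if no layer is skipped and every branching node has one head."""
    if not node.children:
        return True  # truncated trees are accepted as they are
    level = LAYERS.index(node.layer)
    if sum(1 for child in node.children if child.head) != 1:
        return False  # zero heads or more than one head
    return all(
        LAYERS.index(child.layer) == level - 1 and obeys_strict_layering(child)
        for child in node.children
    )


# A word consisting of one trochaic foot: well-formed.
trochee = Node("word", children=[
    Node("foot", head=True, children=[
        Node("syllable", head=True),
        Node("syllable"),
    ]),
])

# A word that directly dominates a syllable skips the foot layer.
skipped = Node("word", children=[Node("syllable", head=True)])

# A word with two equally prominent feet has two heads.
two_heads = Node("word", children=[
    Node("foot", head=True),
    Node("foot", head=True),
])
```

In this encoding, `obeys_strict_layering(trochee)` holds, while the other two structures are rejected: the first because a layer is skipped, the second because prominence is not differentiated at the word level.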


3. The acquisition of word stress: two current models 

3.1. Fikkert (1994) 

Fikkert’s study of Dutch children is the most detailed research on stress 

acquisition to date. Fikkert mainly focused on disyllabic words and argued 

for the foot as the basic unit of development. 

Although Fikkert’s model is based on Dutch, she claims that the 

trochaic template is universal in child language since it is the only quantity-insensitive

foot in the typology of Hayes (1991). Thus, children should not

show sensitivity to syllable weight at the earliest stages of prosodic 

acquisition. The postulation of a universal foot template implies that the 

child always makes reference to the foot level in the word productions. 

Consequently, it is a foot, not a syllable, that is being truncated in forms like

those below:

�� 

�� 

Example 1. 

child form adult target gloss 

‘balloon’

‘holiday’ 

Fikkert assumes that the output a child produces is directed by the mapping 

of a melody template onto a trochaic template via prosodic circumscription. 

Based on phenomena such as truncation, stress shift and epenthesis, four 

different stages of prosodic development are postulated. 

Stage 1 

According to Fikkert, the child circumscribes the stressed syllable of the 

adult form together with its segmental material and maps it onto a trochaic 

template. The presumed representation of the child is given in Figure 2 (‘S’ 

denotes the prominent position and ‘W’ the non-prominent position within 

the foot):


Wd 

F 

σS σW 

�� 

Figure 2. The prosodic representation at stage 1

Prosodic circumscription forces the child to divide the input into two parts, 

the kernel (i.e. the stressed syllable) and the residue. In the mapping 

process, the kernel (��) is mapped onto the strong position in the 

prosodic template. The residue (/��/) becomes truncated because there are 

no empty positions in the template. The mapping onto the trochaic template 

accounts for the fact that, if the result of prosodic circumscription is a 

monosyllabic foot, sometimes a syllable is added to obtain a disyllabic

output, for example �� instead of ��. 
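
The stage 1 mechanism described above (circumscription of the stressed syllable, mapping onto the strong position, truncation of the residue, optional filling of the weak position) can be sketched as follows. This is only an illustration of the logic, not Fikkert's own formalization: syllables are given as orthographic placeholders rather than transcriptions, and the filler syllable is a hypothetical parameter.

```python
from typing import List, Optional, Tuple

Syllable = Tuple[str, bool]  # (segmental material, carries main stress?)


def circumscribe(adult_form: List[Syllable]) -> Tuple[str, List[str]]:
    """Split the adult form into the kernel (stressed syllable) and residue."""
    stressed = [text for text, is_stressed in adult_form if is_stressed]
    if len(stressed) != 1:
        raise ValueError("expected exactly one main-stressed syllable")
    residue = [text for text, is_stressed in adult_form if not is_stressed]
    return stressed[0], residue


def stage1_output(adult_form: List[Syllable],
                  filler: Optional[str] = None) -> List[str]:
    """Map the kernel onto the strong position of a trochaic template.

    The residue is discarded because the template offers no position for
    it. If a filler syllable is supplied, it occupies the weak position.
    """
    kernel, _residue = circumscribe(adult_form)
    template = [kernel]  # strong position
    if filler is not None:
        template.append(filler)  # weak position
    return template


# Orthographic placeholder for a finally stressed disyllable.
ballon = [("ba", False), ("lon", True)]
```

With this input, the sketch keeps only the stressed syllable, and adds a second syllable only when a filler is passed in, mirroring the two output types mentioned in the text.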

Stage 2 

At stage 2, the child circumscribes a trochaic foot. Thus, if the prosodic 

circumscription already results in a trochee as in ��/ ‘holiday’, the 

trochee remains unchanged in the output and appears as ��. Words 

consisting of more than a single foot are circumscribed differently. Fikkert 

argues that the child selects the next stressed syllable to the left in addition 

to the stressed final syllable. For instance, Dutch /�� ‘crocodile’ 

should be realized as �� because the ultimate, main stressed 

syllable and the antepenultimate, secondary stressed syllable are kept. The 

disyllabic representation is then mapped onto the trochaic template 

resulting in a trochaic pattern. Since the production template still consists 

of one single trochaic foot, stress shifts to the initial syllable. The 

representation of the child is depicted in Figure 3:


Wd 

F 

σS σW 

�� 

�� 


Stage 3 

At stage 3, the productions are extended to two feet. According to Fikkert, 

the children have noticed that the target words can consist of more than a 

single foot. She claims that her subjects realized two syllables of the target 

word with equal prominence (level stress). However, her argument for the 

level stress stage is rather weak: she stipulates that the children have to 

produce two equally stressed feet because they are unable to realize stress 

at word level. 

The prosodic representation at stage 3 is depicted in Figure 4 below: 

Wd 

F F 

σS σW σS σW 

�� 


Since the trochaic foot still governs the productions, weak positions in the 

template can be filled with extra syllables.


Stage 4 

The representations are now adult-like. The word level stress has been 

acquired and the child is able to operate at the level of the prosodic word. 

3.2. Demuth & Fee (1995) 

Demuth & Fee propose a more abstract approach which, although primarily

based on data from English-acquiring children, aims to capture prosodic

development universally. The basic assumption in Demuth & Fee’s model

is that prosodic development proceeds along the prosodic hierarchy (see Figure

1). In contrast to Fikkert, Demuth & Fee avoid the notions of prosodic

circumscription and trochaic template mapping. According to them,

sensitivity to the moraic structure of the mother tongue is present

from the onset of word production. They distinguish between the

following stages: 

Stage 1 

The first stage is characterized by sub-minimal (monomoraic) words. The 

productions consist of a single CV-syllable and there are no vowel length 

distinctions yet. Thus, the phonological representation of the words is also

CV. 

Stage 2 

At stage 2, children realize words of foot-size (Minimal Words). Stage 2 is 

characterized by three successive sub-stages: at the beginning, the foot is

disyllabic as for example in �� ‘papa’. Second, as soon as the child is 

able to produce coda consonants the foot can also have a monosyllabic 

form, e.g. �� ‘duck’. Third, the vowel length distinction becomes 

phonemic. The child is now aware of the fact that the stressed syllable of 

Dutch �� ‘banana’ has to be realized with a long vowel ��, while 

in �� ‘giraffe’ the second vowel remains short (examples from Robin, 

see Fikkert, 1994). Demuth & Fee assume a direct relationship between 

distinctive vowel length and the appearance of coda consonants. Thus, a 

CVV structure counts as sub-minimal, and a CVVCVV structure as 

minimal as long as the child does not produce coda consonants.
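
The dependence of word minimality on the presence of coda consonants can be illustrated with a short sketch. It assumes CV skeletons as input, treats vowel length as non-distinctive (one mora per syllable) until codas are produced, and simplifies by treating any disyllabic production as a single foot. The function names and category labels are hypothetical.

```python
from typing import List


def moras(syllable: str, codas_acquired: bool) -> int:
    """Moras of one CV skeleton under the assumptions stated above."""
    if not codas_acquired:
        return 1  # vowel length is not yet distinctive
    rhyme = syllable[syllable.index("V"):]  # drop the onset
    return rhyme.count("V") + rhyme.count("C")


def classify(production: List[str], codas_acquired: bool) -> str:
    """Label a child production relative to the Minimal Word."""
    if not production:
        raise ValueError("empty production")
    total = sum(moras(s, codas_acquired) for s in production)
    if len(production) == 1 and total < 2:
        return "sub-minimal"
    if len(production) <= 2:
        return "minimal word"
    return "larger than a minimal word"
```

On these assumptions, `classify(["CVV"], codas_acquired=False)` yields a sub-minimal word and `classify(["CVV", "CVV"], codas_acquired=False)` a minimal word, while a single CVC or CVV syllable counts as minimal once codas are produced.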

Stage 3 


Beyond the minimal word stage, syllable structure can be more complex 

and words can have a larger size than a single foot. This is also the stage 

where the greatest progress in the development of word stress is

predicted. The child seems to become aware that feet have to be stressed 

and that there are language-specific stress rules. Demuth & Fee do not 

assume a trochaic template. However, they adopt Fikkert’s assumption of 

an obligatory intermediate stage of level stress where two feet are produced 

with primary stress. 

At the end of stage 3, children acquire stress at the word level and they 

realize one primary stress per word. 

Stage 4 

At the final stage, extrametrical (i.e. unfooted) syllables are permitted. 

Children at this stage operate at the level of the prosodic word. 

4. Discussion of the models 

Although both models can explain a number of frequently observed 

patterns like syllable deletions and word size restrictions, they face a

number of empirical and theoretical problems.

First, Fikkert and Demuth & Fee assume that the prosodic development 

proceeds bottom-up, i.e. from a lower level of representation (the foot or 

the mora) to the top of the prosodic hierarchy (the prosodic word). Children 

invariably have to pass through one stage before they can go to the next. For

example, multisyllabic words like ‘elephant’ or ‘crocodile’ have to show a 

level stress pattern before they can be produced adult-like. 

Fikkert explicitly points to that fact. Missing evidence in her data is 

explained by the recording modalities or is due to the fact that a given stage 

took a very short time. Demuth & Fee, in contrast, are not explicit with 

respect to the ordering of the stages. However, they claim that prosodic 

development proceeds along the prosodic hierarchy. Since in the prosodic 

hierarchy one constituent strictly dominates the constituent below, stages 

cannot vary with respect to their temporal order. According to the models, 

the following realizations for /�� ‘crocodile’ of Jule, a girl 

acquiring German, should be chronologically impossible (data from my 

own corpus):


Example 2.

child form age description

�� (1;08,12) the main stressed syllable is realized

�� (1;08,29) a foot with final stress is realized

�� (1;10,14) level stress emerges

As the examples illustrate, level stress can occur after a finally stressed 

variant of the target word was produced, contrary to the predictions of the 

models. Such an acquisition order provides empirical evidence against level 

stress as an obligatory component of prosodic development. Additional 

empirical support comes from the data of English-acquiring children

examined by Kehoe & Stoel-Gammon (1997) who also could not find a 

systematic emergence of level stress. 

Level stress as assumed in the models above is problematic also from a 

grammatical point of view: the representation intended to create level stress 

(see Figure 4 above) essentially violates the strict layer hypothesis because 

the two feet are not correctly bound into the prosodic word. The problem

is that the strict layer hypothesis can never be satisfied by such a representation

because there is no gradation in prominence at the word level. According to 

prosodic theory, two equally stressed feet must not occur within a single 

prosodic word: 

*Wd 

FS FS 


�� 

Figure 5. The ill-formed representation of the prosodic hierarchy as implied by

Fikkert (1994) and Demuth & Fee (1995) 

Both models remain vague with respect to the source of level stress: it is 

unclear how the stages of level stress fit the assumption that prosodic

development is directed by universal prosodic principles. Since they do not

discuss the possibility of a child-specific representation, the representation

according to the prosodic hierarchy should look as illustrated in Figure 6:


Wd Wd 

FS 

FS 


�� 

Figure 6. A prosodic representation that incorporates the requirements of the

prosodic hierarchy and that allows for level stress 

The representation in Figure 6 admits the co-occurrence of two equally 

stressed feet because every foot projects its own prosodic word. The 

drawback is that this assumption is ad hoc. There is no motivation for 

separating a single prosodic word like Dutch /�� ‘crocodile’ into 

two prosodic words. In addition, it is an open question which factors could 

trigger the merging of the two prosodic words into a single one later. 

Another problem is that the models described above are primarily based 

on truncation patterns in multisyllabic words. This is problematic from a

methodological point of view because it is presupposed that the truncation 

of syllables is exclusively triggered by prosodic size restrictions. Recent 

evidence, however, suggests that segmental properties of syllables can also 

affect the truncation rate. For example, syllables with sonorant onsets seem 

to be more prone to truncation than syllables with obstruent onsets (Kehoe 

& Stoel-Gammon 1997). 

A comparison of both models suggests that the predictions of the 

template mapping model of Fikkert (1994) are sometimes too strong. Thus, 

the prosodic hierarchy model of Demuth & Fee (1995) seems to be superior 

because of its greater flexibility. First of all, it prevents Fikkert’s circular 

process of assigning a trochaic structure via prosodic circumscription that 

actually should be created by the foot template. Furthermore, the prosodic 

hierarchy model allows for more variability in the productions of children. 

For example, it allows for the co-occurrence of monosyllabic and disyllabic 

feet in contrast to Fikkert’s model that only proposes disyllabic trochees for 

a very long period of time. As the data of children acquiring English 

suggest, there are doubts about Fikkert’s view that the disyllabic trochee is the

unique representation at the early stages (Kehoe & Stoel-Gammon, 1997;


Salidis & Johnson, 1997). Moreover, Fikkert predicts a systematic stress 

shift to the left in disyllabic iambs, a pattern that still needs empirical 

evaluation. It is also possible that stress shift is rather the result of a 

complex interplay of factors like edge preferences, weight sensitivity and 

segmental factors than of a simple template mapping mechanism. If this is 

true, stress shift can be bidirectional to the left or to the right, depending on 

the relative importance of the factors involved. 

Fikkert’s model is more detailed than the model of Demuth & Fee. It is 

best elaborated for stages 1 and 2. With respect to the later stages, she

remains somewhat inconsistent. For example, she strongly argues for the 

foot as the relevant prosodic unit, but already at stage 2 the syllable, not the 

foot, becomes the target of circumscription: 

“[...] the child realises both syllables of the target word. However, stress 

falls on the first syllable. The segmental material of both syllables of the 

adult word is taken out and mapped onto the child’s trochaic template [...]” 

(p. 210). 

Fikkert also considers the possibility of circumscribing a foot. She 

concludes that the children circumscribe syllables because the surviving 

syllables do not constitute a foot in the adult word. But the examples she 

presents (p. 211) do form two feet within a weight-sensitive model, with 

each foot containing at least two moras (Example 3): 

�� 

�� 

�� 

�� 

Example 3. 

child form adult target gloss 

‘elephant’ 

‘pelican’ 

‘locomotive’ 

‘farm’ 

Fikkert cannot account for this fact because she exclusively assumes 

weight-insensitive trochees at stage 2. Demuth & Fee’s model, in contrast, 

would allow for the retention of the foot as the relevant unit since it 

assumes sensitivity to syllable weight with the emergence of the foot 

structure. 

Demuth & Fee, however, have difficulty explaining the stress shift to

the first syllable in the examples above for two reasons: first, recent

evidence suggests that the relationship between distinctive vowel length

and the emergence of coda consonants is not as categorical as they claim. In

an examination of Fikkert’s data, Salidis & Johnson (1997) found that,


contrary to their English-acquiring child, vowel length was not

controlled by the Dutch children even if they correctly produced coda

consonants. If, in turn, children cannot control vowel length appropriately,

they cannot assign two moras to a long vowel. The authors relate the 

divergence between the languages to the impact of vowel quantity on the 

stress pattern: in English, the long vowels in (C)VV(C) syllables count as 

heavy and thus attract stress, contrary to Dutch which rather relies on the 

open-closed distinction. In Dutch, a (C)VC syllable counts as heavy, while 

a (C)VV does not. Thus, a learner of Dutch presumably does not rely on 

vowel length as an indicator for stress, while it is crucial for a learner of 

English to identify the relationship between vowel quantity and stress. 

Second, suppose that the absence of the vowel length distinction is an artifact

of the investigation and that children have mastered the vowel length

distinctions once they produce bimoraic feet. Even then, neither universal nor

language-specific constraints could account for the fact that the superheavy 

finals lose their primary stress in favor of the less heavy initial syllables because

Dutch follows the universal generalization that a (C)VCC (e.g. /��/) or a 

(C)VVC (e.g. /��/) syllable is heavier than a (C)VV syllable (/��/,/��/). 

The observation that Dutch children need more time to acquire vowel 

length distinctions indicates that language-specific properties may influence 

the prosodic representation in a more detailed way than assumed so far. 

Thus, further empirical work is needed to shed light on the interplay of 

universal principles and language-specific conditions in prosodic 

development. 

5. Conclusion 

In the present paper, two models of prosodic development are introduced 

and examined. As they show, the acquisition of word prosody largely

conforms to the prosodic hierarchy in such a way that universal prosodic

constituents such as the foot or the mora govern children’s word productions.

This is essential in both models. However, it has turned out in the

discussion that there are empirical, theoretical and methodological

shortcomings. Common problems of both accounts are the absence of 

empirical and theoretical motivation of level stress and the reliance on 

truncations as the primary diagnostics of prosodic development. 

In sum, the evidence so far rather supports the prosodic hierarchy model 

of Demuth & Fee (1995) because it is more flexible than Fikkert’s template 

mapping model.


References 

Demuth, K. (1996). The prosodic structure of early words. In: J. Morgan & K. 

Demuth (eds.) From signal to syntax: Bootstrapping from speech 

to grammar in early acquisition. Lawrence Erlbaum Associates, 

Hillsdale, N.J., 171-184. 

Demuth, K. & Fee, J. (1995). Minimal words in early phonological

development. Ms., Brown University and Dalhousie University. 

Fikkert, P.M. (1994). On the acquisition of prosodic structure. Holland Institute 

of Generative Linguistics, Dordrecht.

Hayes, B. (1991). Metrical stress theory: principles and case studies. Ms, 

UCLA. 

Hayes, B. (1995). Metrical stress theory. University of Chicago Press, Chicago.

Kehoe, M. & Stoel-Gammon, C. (1997). The acquisition of prosodic structure:

An investigation of current accounts of children’s prosodic

development. Language, 73 (1): 113-144.

Ota, M. (2001). Phonological Theory and the Development of Prosodic

Structure: Evidence from Child Japanese. Available at

http://www.ling.ed.ac.uk/~mits/downloadables.shtml

Salidis, J. & Johnson, J.S. (1997). The production of minimal words: A

longitudinal case study of phonological development. Language

Acquisition, 6 (1): 1-36.

Selkirk, E. (1980). The role of prosodic categories in English word stress. 

Linguistic Inquiry, 11: 563-605. 

Selkirk, E. (1984). Phonology and Syntax: The relation between Sound and 

Structure. MIT Press, Cambridge, MA.

Base-Identity and the Noun-Verb Asymmetry in 

Nivkh 

Hidetoshi Shiraishi 


1.1. Background 

Morphologically complex words often exhibit phonological similarities 

with the morphologically related base forms from which they are derived.

In a number of cases, these similarities yield a marked phonological pattern 

given the general rules or phonotactics of the language (Kenstowicz, 1996; 

Burzio, 1997, 2002 etc.). In Optimality Theory (OT), similarity between

existing words is captured by Output-to-Output (OO) correspondence 

constraints (Burzio, 1996, 2002; Kenstowicz, 1996, 1997; Benua, 1997ab; 

Ito and Mester, 1997; Steriade, 2000 etc.). The marked phonological 

pattern arises when similarity between words takes priority over the 

canonical phonology of the language. OT expresses this situation by 

ranking OO-correspondence constraints above phonological markedness 

constraints. OO-correspondence constraints evaluate the output candidates 

and select the one which is most similar to the base. 

Since the base plays a crucial role in computing the phonology of its 

derivatives, it is important to identify the correct surface form as the base. 

Many authors have observed that OO-constraints have access to the base 

only if the latter occurs as an independent word (Kenstowicz, 1996; Benua, 

1997a; Ito and Mester, 1997). 1 Consider the s-voicing observed in the 

northern dialects of Italian. In these dialects, s and z are in complementary 

distribution. Z appears intervocalically, when the flanking vowels belong to 

the same phonological word (examples from Kenstowicz, 1996: 373-374).

1.1 

a. a[z]ola ‘button hole’ 

a[z]ilo ‘nursery school’ 

ca[z]-a ‘house’ 

ca[z]-ina ‘house - diminutive’ 

b. lo [s]apevo ‘I knew it’ 

telefonati [s]i ‘having called each other’ 

The distribution of s-voicing in lexical items containing a prefix is more 

complicated. When the target precedes the boundary, s-voicing applies 

(1.2a). But when the target follows the boundary, s-voicing may or may not 

apply, even if the structural description of s-voicing is met (1.2b, c). 

1.2 

a. di[z]-onesto ‘dishonest’ 

di[z]-uguale ‘unequal’

b. re-[z]istenza ‘resistance’ 

pre-[z]entire ‘to have a presentiment’ 

c. a-[s]ociale ‘asocial’ 

bi-[s]essuale ‘bisexual’ 

pre-[s]entire ‘to hear in advance’ 

The unexpected blocking of s-voicing in 1.2c is in sharp contrast with the 

items in 1.2b where z surfaces intervocalically, following the phonological 

norm of the language. Nespor and Vogel (1986) pointed out that the crucial 

difference between the items in 1.2b and 1.2c lies in the lexical status of the 

stem to which the prefix is attached; in 1.2c the stem occurs as an 

independent word (sociale, sessuale, etc.) whereas in 1.2b it does not 

(*sistenza, etc.). Following this view, Kenstowicz (1996) claimed that there 

is a lexico-morphological pressure from the independently occurring stem

for its derivative to surface in as similar a form as possible. The presence of such an

independently occurring immediate constituent is thus crucial in computing 

the phonology of a morphologically complex item. Kenstowicz dubbed this 

generalization Base-Identity; the base forces its derivative to be formally as 

similar as possible in order to “improve the transparency of morphological 

relationships between words and enhance lexical access” (Kenstowicz, 

1996: 372).


1.3 Base-Identity: Given an input structure [X Y] output candidates are 

evaluated for how well they match [X] and [Y] if the latter occur as 

independent words. (Kenstowicz, 1996: 372) 
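
The interaction between Base-Identity and s-voicing can be illustrated with a toy evaluation in which Base-Identity outranks the markedness constraint against intervocalic [s]. The sketch below is an assumption-laden simplification: candidates are orthographic strings of the form prefix-stem with z standing for [z], the lexicon of independently occurring stems is a three-word toy list based on the examples in 1.2, and the constraint names and functions are hypothetical.

```python
import re
from typing import Callable, List

# Toy lexicon: stems assumed to occur as independent words.
LEXICON = {"sociale", "sessuale", "onesto"}

Constraint = Callable[[str, str], int]  # (candidate, input stem) -> violations


def base_identity(candidate: str, stem: str) -> int:
    """One violation per stem segment that differs from the base,
    counted only if the stem occurs as an independent word."""
    if stem not in LEXICON:
        return 0
    candidate_stem = candidate.split("-", 1)[1]
    return sum(a != b for a, b in zip(candidate_stem, stem))


def no_intervocalic_s(candidate: str, stem: str) -> int:
    """One violation per voiceless s flanked by vowels inside the word."""
    word = candidate.replace("-", "")
    return len(re.findall(r"(?<=[aeiou])s(?=[aeiou])", word))


def evaluate(candidates: List[str], stem: str,
             ranking: List[Constraint]) -> str:
    """Return the candidate with the best violation profile.

    Tuples are compared position by position, so a violation of a
    higher-ranked constraint outweighs any number of lower-ranked ones.
    """
    return min(candidates, key=lambda c: tuple(con(c, stem) for con in ranking))


RANKING = [base_identity, no_intervocalic_s]
```

With this ranking, `evaluate(["a-sociale", "a-zociale"], "sociale", RANKING)` selects the form with [s], because sociale is an independent word, whereas `evaluate(["re-sistenza", "re-zistenza"], "sistenza", RANKING)` selects the voiced form, because no base is available. Reversing the ranking would voice the s in both cases.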

The languages in East Asia provide an interesting test for this 

generalization. Languages such as Korean or Japanese show a systematic

difference in the composition of verbs and nouns; while verbal stems 

always surface with a morphological extension, nominal stems may surface 

without such an extension. This means that complex words formed from a 

nominal stem always have an independently occurring base to which they 

phonologically should conform, whereas verbal derivatives lack such a 

base and hence should not show such conformity. This prediction is borne 

out in Korean in which derivatives of nominal and verbal stems are subject 

to different phonology (Kenstowicz, 1996; see section 2.3 below). In this

paper, I discuss another language of East Asia, Nivkh, which also has an 

asymmetric composition of nouns and verbs like Korean and Japanese. I 

will focus on two phonological phenomena, Consonant Alternation and 

Final Fricative Devoicing and show that both phenomena exhibit 

asymmetries between nominal and verbal phonology. I will discuss each 

case in detail and argue that Base-Identity is the driving force of these 

asymmetries. 

The article is organized as follows. I will start with a descriptive sketch 

of Consonant Alternation (section 2.1) and then illustrate the exceptional 

behavior of nominal stems as a case of noun-verb asymmetry (section 2.2). 

While most previous works, including my own, somehow stipulated the 

asymmetric behavior of nominal and verbal stems, I will argue that Base- 

Identity provides a superior analysis which is free from such a stipulation. 

Section 3 discusses the second phenomenon, Final Fricative Devoicing. I 

will illustrate the asymmetric behavior of fricative-final nominal and verbal 

stems when followed by a suffix. The pattern of asymmetry is as in CA: 

while verbal phonology is subject to canonical phonology, nominal 

phonology is not. Section 4 concludes. 

1.2. About Nivkh 

Nivkh (also called Gilyak) is a language isolate spoken by the Nivkh 

people, who live on the island of Sakhalin and in the lower reaches of the 

Amur River in the Russian Far East. The language has four dialects and the


major discrepancy is between the Amur dialect, spoken in the Amur area on 

the continent and the west coast of north Sakhalin, and the Sakhalin dialect 

spoken on the east coast of Sakhalin. Nivkh is listed in the UNESCO Red 

Book on endangered languages as being seriously endangered. According 

to the census of 1989, the percentage of speakers is 23.3% of the total 

population of 4,681. 2 This article concerns the phonology of the Amur 

dialect spoken by the continental Nivkh. All the examples are from the 

following sources, unless otherwise mentioned: Krejnovich (1937), and 

Saveleva and Taksami (1970). 

2. Consonant Alternation 

2.1. A descriptive sketch 

I will first outline the segmental inventory of Nivkh. 

2.1 Consonantal inventory of Nivkh 

(I) aspirated plosives p� t� c� k� q� 

(II) non-aspirated plosives p t c k q 

(III) voiceless fricatives f r� s x � 

(IV) voiced fricatives v r 3 z � � 

nasals m n � � 

lateral l 

glides j h 

2.2 Vowels 

i � u 

e o 

a 

Consonant Alternation (henceforth CA) is a phonological process which 

changes the feature [continuant] in obstruents when they are placed in 

certain phonological and morphosyntactic contexts. Descriptively, CA 

consists of two processes: spirantization, in which a plosive changes to a 

fricative, and hardening, in which a fricative changes to a plosive. 

Laryngeal features are also relevant since aspirated plosives only alternate


with voiceless fricatives and non-aspirated plosives with voiced fricatives, 

i.e. the alternation is strictly between the obstruents of rows (I) and (III), or 

(II) and (IV). 4, 5 

2.3 Spirantization: (I) > (III), (II) > (IV) 

a. (I) > (III) mac�a [r�]om (< t�om) ‘fat of a seal’ 

seal fat 

c�ol�i [�]os (< q�os ) ‘neck of a 

reindeer neck reindeer’ 

b. (II) > (IV) p�eq [v]��x (< p��x ) ‘chicken soup’ 

chicken soup 

mac�a [z]us (< cus) ‘meat of a seal’ 

seal meat 

2.4 Hardening: (III) > (I), (IV) > (II) 

a. (III) > (I) c�x�f [q�]a- (< �a-) ‘to shoot a bear’ 

bear shoot 

cus [t�]a- (< r�a-) ‘to bake meat’ 

meat bake 

b. (IV) > (II) tux [k]e- (< �e-) ‘to take an axe’ 

axe take 

p�n�nx [t]�u- (< r�u-) ‘to teach one's 

one's sister teach sister’ 

The phonological contexts of spirantization and hardening are in 

complementary distribution. Spirantization takes place when the target 

(plosive) follows a vowel, a glide, or a plosive (2.5). There is no 

spirantization when the target follows a fricative or a nasal (2.6). 

2.5 Spirantization Preceding segment 

Vowel mac�a [r�]om ‘fat of a seal’ 

Glide k��nraj [r�]om ‘fat of a duck’ 

k��nraj [v]��x ‘duck soup’ 

Plosive �t [r�]om ‘fat of a species of 

duck’ 

amsp [v]��x ‘soup of a species 

of seal’


2.6 No spirantization 

Fricative c�x�f t�om ‘bear fat’ 

c�x�f p��x ‘bear soup’ 

Nasal k�e� t�i ‘sun ray’ 

rum d�f ‘Rum(person)’s house’ 

On the other hand, hardening occurs when the target (fricative) follows 

either a fricative or a nasal (2.7). When a segment other than fricative 

precedes the target, hardening does not occur (2.8). 

2.7 Hardening Preceding segment 

Fricative cx�f [q�]a- (< �a-) ‘to shoot a bear’ 

lovr� [c]osq-(< zosq-) ‘to break a spoon’ 

Nasal qan [d]�u- 6 (


whereas hardening activates when a fricative is in the input. In the past, 

many approaches have overlooked this generalization and described the 

rules as if they had independent structural goals. This is not the case. 
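
The complementary distribution of the two processes can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not part of the analysis: the function name `consonant_alternation`, the segment-class labels, and the restriction to three lenis pairs (p/v, t/r, c/z, all attested in the examples above) are choices made for the illustration. Aspirated plosives, voiceless fricatives, post-nasal voicing (note 6) and the behavior of nouns discussed in section 2.2 are left out.

```python
# Lenis plosive -> voiced fricative pairs attested in examples 2.3b, 2.4b and 2.7.
# (Assumption: only these three pairs are modelled.)
SPIRANTIZE = {"p": "v", "t": "r", "c": "z"}
# Hardening is the inverse mapping (voiced fricative -> lenis plosive).
HARDEN = {fric: plos for plos, fric in SPIRANTIZE.items()}

SPIRANTIZING_CONTEXTS = {"vowel", "glide", "plosive"}
HARDENING_CONTEXTS = {"fricative", "nasal"}


def consonant_alternation(preceding_class: str, target: str) -> str:
    """Return the surface form of a morpheme-initial obstruent.

    preceding_class is the class of the final segment of the preceding
    morpheme; target is the initial obstruent of the following one.
    """
    if preceding_class in SPIRANTIZING_CONTEXTS:
        # A plosive spirantizes; a fricative is left unchanged.
        return SPIRANTIZE.get(target, target)
    if preceding_class in HARDENING_CONTEXTS:
        # A fricative hardens; a plosive is left unchanged.
        return HARDEN.get(target, target)
    raise ValueError(f"unknown segment class: {preceding_class!r}")


print(consonant_alternation("vowel", "c"))      # c after a vowel-final word
print(consonant_alternation("fricative", "z"))  # z after a fricative
print(consonant_alternation("fricative", "p"))  # plosive after a fricative
```

Because both branches leave a segment untouched when it already has the required value of [continuant], the two processes never compete: each context admits only one kind of output.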

Let us now move to the morphosyntactic conditioning. CA targets a 

segment at the left edge of a derived morphosyntactic unit in the presence 

of a preceding segment. CA applies cyclically to every left edge of a 

morpho-syntactic unit until the maximal projection (NP, VP) is reached. 

2.10 Means of derivation 

Prefixation p�-[r�]u (< t�u) ‘one’s own sledge’ 

REF-sledge 

Postposition t��x-tox ‘towards the top’ 

top-ALL 

tu-rox ‘towards a lake’ 

qan-dox ‘towards a dog’ 

Reduplication t�k[r�]�k- ‘to be silent’ 

(Sakhalin dialect, Hattori, 1962: 107) 

NP formation mac�a [r�]om ‘fat of a seal’ 

VP formation cx�f [q�]a- (< �a-) ‘to shoot a bear’ 

On the other hand, CA never targets segments in a non-derived 

environment, nor does it apply across XP boundary, as shown in 2.11 and 

2.12, respectively. 

2.11 CA does not apply in non-derived environment 

utku *ut[�]u ‘man’ 

n��s *n��[c�] ‘teeth’ 

2.12 No CA across XP boundary (subject-predicate) 

e�l� r�o- ‘The child holds (something)’ 

= [NPe�l�] [VPr�o-] (‘child’ is subject) 

Example 2.13 below differs minimally from example 2.12 above with 

respect to the application of CA. In 2.13, CA applies since the noun 

is the object of the following predicate, so that the two words together 

form a VP.


2.13 

e�l� [t�]o- ‘(Someone) holds the child’ 

= [VP[NPe�l�][V t�o-]] (‘child’ is object) 

2.2. The spirantization – hardening asymmetry 

There is one environment in which the regular pattern of CA as depicted 

above fails to apply. Nouns beginning with a fricative never undergo 

hardening. In such a case, the structural goal of CA (2.9) is not achieved. In 

this context the otherwise illicit fricative-fricative or nasal-fricative 

sequence appears. 

2.14 

a. t�ulv vo *t�ulv [b]o ‘winter village’ 

winter village 

b. c��r vox *c��r [b]ox 'a hill covered with grass' 

grass hill 

c. t�f r�� *t�f [t�]� ‘entrance door’ 

house door 

d. t�e� vaqi *t�e� [b]aqi ‘coal box’ 

coal box 

Previous works have either described this context as an exception to CA, or 

have not discussed it. In most cases, these works simply stipulated that a) nouns 

do not undergo hardening, or alternatively b) only transitive verbs undergo 

hardening. Once stated as a condition this way, the application of hardening 

to nouns can indeed be avoided. However, adding such a condition (in 

either form) to a phonological rule pairs prosodic phonology with specific 

category labels (transitive verb, noun), which is unlikely to occur in natural 

languages (Nespor and Vogel, 1986; Selkirk, 1986 etc.). 7 But most 

critically, it is explanatorily unsatisfying; why should hardening be 

restricted to transitive verbs (or alternatively, why should nouns be an 

exception to hardening)? No literature provides a satisfactory answer to this 

question. 

The tacit assumption prevailing in the previous works is that the input to 

CA is the citation form, i.e. the form that appears in isolation. Following 

this assumption, the transitive verbs ought to undergo hardening since they 

initiate with a fricative in the citation form. However, there is no a priori


reason that the citation form should be the underlying form. In Shiraishi 

(2000), I defended the position that the citation form of these transitive 

verbs cannot be the underlying form, if we want to advocate a 

phonologically plausible analysis for the observed spirantization-hardening 

asymmetry. The lack of hardening in nouns could be interpreted as 

evidence that CA consists solely of spirantization, without hardening. I 

argued that transitive verbs of Nivkh initiate with a plosive at the 

underlying level, instead of a fricative that appears in the citation form. 

Beginning with a plosive, transitive verbs now undergo spirantization in the 

same way as nouns do. 8, 9 

2.15 

                  Previous analyses                     Shiraishi (2000) 
                  VP 'shoot a bear'   NP 'bird soup'    VP 'shoot a bear'   NP 'bird soup' 
Underlying form   cx�f �a-            p�eq p��x         cx�f q�a-           p�eq p��x 
Spirantization    not applicable      p�eq [v]��x       blocked             p�eq [v]��x 
Hardening         cx�f [q�]a-         not applicable 
Surface form      cx�f q�a-           p�eq v��x         cx�f q�a-           p�eq v��x 

The analysis in Shiraishi (2000) leaves hardening out of the list of 

phonological processes; nouns do not undergo hardening since there is no 

hardening in the phonology of the language. 

2.16 

                  Previous analyses   Shiraishi (2000) 
Underlying form   t�ulv vo            t�ulv vo 
Spirantization    not applicable      not applicable 
Hardening         t�ulv [b]o 
Surface form      �t�ulv bo           t�ulv vo 

(�: incorrect output) 

This analysis is free from category-specific specification in the structural 

description of the rule, which was inevitable in the previous analyses. 

Although this analysis explains nicely why fricative-initial nouns never 

undergo hardening in Nivkh, it is not without problems. First, it


manipulates the underlying form of a specific lexical category (transitive 

verb) in order to explain phonologically exceptional behavior. Although 

such a 'prespecification' at the underlying level is not an uncommon way to 

approach phonological exceptions (cf. Inkelas, Orgun and Zoll, 1997 

amongst others), such an approach does not explain why only this 

particular class of words needs to undergo such manipulation. Since 

prespecification puts unpredictable information into the lexicon, it is a 

strong descriptive device which leaves little space for phonological 

generalizations. Contrary to what seems to be the case at first glance, the 

analytical gain of Shiraishi (2000) over previous analyses is not so 

obvious. One may rightly ask what the real difference is between claiming 

that a) nouns are exceptions to hardening (previous analyses) and claiming 

that b) transitive verbs undergo spirantization because they begin with 

plosives underlyingly (Shiraishi 2000). In other words, it 

remains an arbitrary choice that only transitive verbs, and not other 

categories, undergo prespecification. 

Secondly, the relationship between the underlying form and the citation 

form is obscured in transitive verbs. By positing a form other than the 

citation form as the underlying form, the citation form would always be 

derived from the underlying form by some morphological operation. That 

is, Shiraishi (2000) created an asymmetry in the morpholexical make-up 

of nominal and verbal stems. 

2.17 

Nominal stem Verbal stem 

Underlying form p��x q�a- 

Surface form p��x �a- 

In fact, this asymmetry describes the historical path of derivation of 

transitive verbs (Jakobson, 1957; Austerlitz, 1977). On synchronic grounds, 

however, it is highly doubtful whether such a morphological operation can 

be justified. 

In the next section I propose an alternative approach to the 

spirantization-hardening (or noun-transitive verb) asymmetry, which makes 

use neither of prespecification nor of information about category labels. 

Instead, I will argue that a correspondence relation between output forms 

plays a decisive role in distinguishing the phonological behavior of the two 

groups. On this approach, nothing needs to be stipulated in order to 

derive the surface form; it follows naturally from the phonological 

principles of the language. 

2.3. Noun-verb asymmetry as Base-Identity 

In Nivkh, verbal and nominal stems differ from each other in one crucial 

morphological aspect; verbal stems must always end in a morphological 

extension but nominal stems do not. Or put differently, verbal stems never 

surface in isolation, whereas nominal stems do. This means that bare verbal 

stems cannot function as citation forms. Usually, the form with an 

infinitival suffix (-d�, -t�) provides the citation form. 

2.18 

Stem                         /�a/ ‘to shoot~’   /r�o/ ‘to take’ 
Infinitive (citation form)   �a-d�              r�o-d� 
‘when~’                      �a-�an             r�o-�an 

2.19 

Stem                         /vo/ ‘village’     /�ota/ ‘town’ 
Citation form                vo                 �ota 
Allative                     vo-rox             �ota-rox 

As mentioned in section 1, independent forms often exercise special 

influence on the realization of morphologically related forms in derived 

contexts. For instance, in certain varieties of English the existence of the 

form condense guarantees that the vowel of the second syllable in the 

morphologically related word condensation does not reduce to a schwa. 

2.20 

co�nd[�]nsa�tion co�mp[�]nsa�tion 

cond[��]nse co�mp[�]nsa�te 

Phonology would expect the unstressed vowel of condensation to surface 

with a schwa, as is the case with the structurally similar compensation. The 

usual explanation for this asymmetry is that the vowel reduction in 

condensation is blocked by virtue of the existence of the morphologically 

related form condense, which appears with a full vowel [�] (Chomsky and


Halle, 1968: 110-116). On the other hand, compensation lacks such a 

morphologically related form with a full vowel. Hence the unstressed vowel 

reduces to a schwa, following the phonological norm of the language. 

Another example comes from Korean. In Korean a stem-final consonant 

cluster surfaces only when it is followed by a vowel-initial suffix. In 

combination with a consonant-initial suffix, the cluster is simplified to a 

single consonant (Kenstowicz, 1996: 375). 

2.21 

Stem /kaps/ ‘price’ /talk/ ‘chicken’ 

Citation form kap tak 

Nominative kaps-i talk-i 

Comitative kap-k'wa tak-k'wa 

In the speech of the younger generation in Seoul, however, simplification overapplies 

to contexts where a vowel-initial suffix follows the stem. 

2.22 

Nominative kap-i tak-i 

Interestingly, this overgeneralization does not apply to verbal stems. Here 

the consonant cluster surfaces. 

2.23 

Stem              /�ps/ ‘not have’   /palk/ ‘be bright’ 
Past-informal     �ps-�ss-�          palk-�ss-� 
                  (*�p-�ss-�)        (*pak-�ss-�) 
Non-past-formal   �p-t'a             pak-t'a 

Kenstowicz attributed the absence of cluster simplification in verbal 

stems to the lack of corresponding citation forms. As in Nivkh, 

verbal stems in Korean never appear in isolation; they must always 

appear with an inflectional ending. In contrast, nominal stems are free to 

appear without any inflectional ending, so they exercise strong influence on 

the realization of their derivatives. Verbal stems, on the other hand, surface 

with consonant clusters since there is no isolated counterpart that 

forces conformity to it. This is an instance of Base-Identity, which requires 

forms in derived contexts to be formally similar to the base. This is the


generalization captured in the Base-Identity constraint of Kenstowicz (1.3), 

repeated below. 

2.24 (=1.3) Base-Identity: Given an input structure [X Y] output 

candidates are evaluated for how well they match [X] and [Y] 

if the latter occur as independent words. (Kenstowicz, 1996: 

372) 

We can account for the noun-verb asymmetry in Korean using Base- 

Identity as a high-ranked constraint. By ranking Base-Identity above a 

faithfulness constraint which prohibits deletion of a segment in the input 

(MAX), nominal stems surface with a single consonant in concordance 

with the base. 

2.25 

/kaps+i/  base: kap    Base-Identity   *CLUSTER   MAX 
   kapsi               *! 
☞  kapi                                           * 

Base-Identity is vacuously satisfied in verbal stems. Since there is no base 

to which verbal stems should conform, verbal stems exhibit canonical 

phonology. Consonant clusters surface only if a vowel-initial suffix 

follows, elsewhere they are simplified. A phonological markedness 

constraint *CLUSTER penalizes every output candidate containing a triconsonantal 

cluster. 

2.26 

/�ps+�ss+�/  base: ø   Base-Identity   *CLUSTER   MAX 
☞  �ps-�ss-� 
   �p-�ss-�                                       *! 

2.27 

/�ps-t'a/  base: ø     Base-Identity   *CLUSTER   MAX 
   �ps-t'a                             *! 
☞  �p-t'a                                         * 

The noun-verb asymmetry of hardening in Nivkh is strikingly similar to the 

case of Korean. As in Korean, verbal stems of Nivkh are not allowed to 

surface in isolation; they always require a morpho-syntactic extension 

(2.18). This is in contrast to nominal stems, which may surface in isolation 

(2.19). The difference is reflected directly in their phonological behavior; 

verbal stems undergo hardening, nominal stems do not. In the next section I 

will show how this analysis formally works. 
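
The evaluation procedure behind the Korean tableaux 2.25 to 2.27 can be sketched in Python. The sketch is illustrative only and rests on several assumptions: the function `evaluate`, the dictionary layout, and the descriptive labels used for the verbal candidates are hypothetical; the constraints are treated as a strict ranking in the order Base-Identity, *CLUSTER, MAX; and the violation marks are entered by hand as I read them from the tableaux rather than computed from the forms.

```python
RANKING = ["Base-Identity", "*CLUSTER", "MAX"]  # highest-ranked first


def evaluate(candidates, ranking=RANKING):
    """Return the candidate whose violation profile is best under the ranking.

    candidates maps a candidate label to a dict of constraint -> number of
    violations; constraints not listed count as zero violations.
    """
    def profile(label):
        return tuple(candidates[label].get(c, 0) for c in ranking)

    # Tuples compare left to right, so a violation of a higher-ranked
    # constraint is always worse than any number of lower-ranked ones.
    return min(candidates, key=profile)


# 2.25: nominal stem /kaps+i/, base kap.
noun = {
    "kapsi": {"Base-Identity": 1},
    "kapi": {"MAX": 1},
}
# 2.27: verbal stem before consonant-initial -t'a, no base.
verb_before_consonant = {
    "cluster kept": {"*CLUSTER": 1},
    "cluster simplified": {"MAX": 1},
}
# 2.26: verbal stem before a vowel-initial suffix, no base.
verb_before_vowel = {
    "cluster kept": {},
    "cluster simplified": {"MAX": 1},
}

print(evaluate(noun))
print(evaluate(verb_before_consonant))
print(evaluate(verb_before_vowel))
```

Since the verbal candidates never incur a Base-Identity mark, their outcome is decided entirely by *CLUSTER and MAX, which is what "vacuously satisfied" amounts to in practice.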

2.4. Base-Identity blocks hardening 

I assume that the phonological markedness constraint that induces 

hardening is the Obligatory Contour Principle (OCP) [fric]. 10 OCP [fric] 

prohibits adjacent fricatives. Base-Identity, as defined in the previous 

section, prefers output candidates which are similar to the base. With the 

ranking Base-Identity >> OCP [fric], we obtain the desired output; 

hardening does not apply to nominal stems. 

2.28 

/tulv vo/  base: vo    Base-Identity   OCP [fric]   IDENT [cont] 
☞  tulv vo                             * 
   tulv bo             *!                           * 

Base-Identity is satisfied vacuously in verbal stems since they lack a base. 

Being free from Base-Identity, an initial fricative now hardens to a plosive 

in order to circumvent an OCP violation.

2.29 

/c�x�f �a-/  base: ø   Base-Identity   OCP   IDENT [cont] 
   c�x�f �a-                           *! 
☞  c�x�f [q�]a-                              * 

Since Base-Identity refers to the base and not to the input, this ranking 

always derives the correct output regardless of the input value. This is 

illustrated in the tableau below in which the verbal stem initiates with a 

plosive in the input (cf. Shiraishi, 2000). 

2.30 

/c�x�f q�a-/  base: ø  Base-Identity   OCP   IDENT [cont] 
☞  c�x�f q�a- 
   c�x�f [�]a-                         *!    * 

The present analysis correctly derives the observed output regardless of the 

input. There is thus no prespecification, in which input strings are fixed to 

take a particular form. Nor does it make use of information of category 

labels, a condition that was inevitable in previous descriptions in order to 

let hardening apply appropriately. The current analysis makes a totally 

different claim. There is no exception to the hardening rule (nominal 

stems), nor should the specific undergoer (verbal stems) be prespecified at 

the underlying level. Rather, the asymmetry of nominal and verbal stems 

follows from the existence of a base, which is an independent fact of the 

language. By making use of such morpho-lexical information, the current 

analysis accounts for the noun-verb asymmetry without appealing to 

language-specific stipulations. 
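
The independence of the outcome from the input can be checked mechanically. The following Python sketch is a simplified model of tableaux 2.28 to 2.30 under stated assumptions: candidates are reduced to the class of the stem-initial obstruent ('fricative' or 'plosive'), the preceding word is taken to end in a fricative, the post-nasal context (note 10) is ignored, and the names `violations` and `winner` are hypothetical.

```python
def violations(candidate_initial, input_initial, base_initial, preceding_final):
    """Violation profile ordered as Base-Identity >> OCP [fric] >> IDENT [cont]."""
    # Base-Identity: only relevant if an independently occurring base exists.
    base_identity = int(base_initial is not None and candidate_initial != base_initial)
    # OCP [fric]: adjacent fricatives across the word boundary.
    ocp_fric = int(preceding_final == "fricative" and candidate_initial == "fricative")
    # IDENT [cont]: the candidate differs from the input in [continuant].
    ident_cont = int(candidate_initial != input_initial)
    return (base_identity, ocp_fric, ident_cont)


def winner(input_initial, base_initial, preceding_final="fricative"):
    """Pick the optimal stem-initial obstruent class, 'fricative' or 'plosive'."""
    candidates = ("fricative", "plosive")
    return min(
        candidates,
        key=lambda c: violations(c, input_initial, base_initial, preceding_final),
    )


# 2.28: noun with a fricative-initial base (vo 'village').
print(winner("fricative", base_initial="fricative"))
# 2.29: verb, fricative in the input, no base.
print(winner("fricative", base_initial=None))
# 2.30: verb, plosive in the input, no base.
print(winner("plosive", base_initial=None))
```

In this model the verbal stem surfaces with a plosive whichever input is chosen, while the nominal stem keeps its fricative because the base outranks the OCP.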

3. Final Fricative Devoicing 

Base-Identity plays a crucial role in another phonological phenomenon of 

Nivkh. In this section, I will discuss such a case.


3.1. Distribution of laryngeal features 

As in Danish, a full contrast of laryngeal features in Nivkh obstruents is 

realized only in stem-initial position, which is the most prominent 

position, as in many other languages (cf. Beckman, 1996). In other 

positions, laryngeal features do not exercise a phonemic contrast and the 

feature value at the surface level is predictable from the context (Jakobson, 

1957: 83). In principle, non-prominent (stem-medial and final) positions 

only allow non-aspirated plosives and voiced fricatives. Aspirated plosives 

and voiceless fricatives, on the other hand, are excluded from these 

positions. Following Jakobson (1957), I will call them the lenis and fortis 

series, respectively. 

3.1 

Lenis obstruents    non-aspirated plosives : p t c k q 
                    voiced fricatives      : v r z � � 
Fortis obstruents   aspirated plosives     : p� t� c� k� q� 
                    voiceless fricatives   : f r s x � 

3.2 

pal ‘forest’        �t�k ‘father’ 
p�al ‘floor’        �k�n ‘mother’ 
ra-d� ‘to drink’    ova ‘flour’ 
r�a-d� ‘to bake’    muvi ‘porridge’ 
                    eri ‘river’ 

There are two exceptional contexts in which a voiceless fricative appears in 

a non-prominent position: i) when preceding a plosive, and/or ii) before an 

I[ntonational] P[hrase] boundary (Jakobson, 1957: 83). 

3.3 

a. esqa-d� ‘to hate’ 

taft� ‘salt’ 

kins ‘evil spirit’ 

kins k�u-d� ‘to kill an evil spirit’ 

c�x�f ‘bear’ 

c�x�f k�u-d� ‘to kill a bear’ 

als ‘berry’ 

als p�e- ‘to pick berries’

b. nivx ‘human’ 

erx ‘to him/her’ 


The examples in 3.3b indicate that it is only the absolute final position that 

matters; the fricative second from the right appears as voiced. In Nivkh, 

there are no words ending in consecutive voiceless fricatives, indicating 

that voicelessness is required only for the very last fricative in an IP. I 

assume this to be due to a restriction which I will call Final Fricative 

Devoicing (FFD). FFD targets every final fricative within an IP. 

Stem-final voiceless fricatives appear as voiced, however, as soon as the 

above-mentioned conditions are removed. Thus if a stem-final fricative is 

embedded in an IP, i.e. not final in the domain, and if it is not adjacent to a 

plosive it becomes voiced (3.4a). This is in concordance with the 

phonotactics of stem-medial fricatives which are always voiced (3.4b) 

unless adjacent to a plosive. This distribution is not surprising since stem-medial 

fricatives are expected not to coincide with an IP-boundary. 

3.4 

a. [kinz it-]I ‘go insane’ 

[c�x�v l�j-]I 

'to kill a bear' 

[alz �a-]I 

‘to pick berry’ 

b. ezmu- ‘to like~’ 

urla ‘good’ 

pa�la ‘red’ 

Outside of these two contexts, only lenis obstruents appear in non-prominent 

positions. Apparently, lenis obstruents have more distributional 

freedom than fortis obstruents, indicating their unmarked status in the 

phonology of Nivkh. Since non-prominent positions are predictably 

occupied by lenis obstruents, I assume that obstruents in these positions are 

unspecified for laryngeal features in the underlying form. Unless context-sensitive 

requirements contravene, obstruents without laryngeal 

specifications surface as lenis, the unmarked obstruent of the language.


3.2. Base-Identity in suffixation 

Having discussed the unmarked nature of the lenis obstruents, we are now 

ready to look at the way FFD interacts with Base-Identity. Such a case 

arises when a suffix attaches to a fricative-final stem. 

Like stem-medial and final positions, the initial obstruent of a suffix 

does not exhibit a laryngeal contrast, indicating that it is a non-prominent 

position. Except for a few exceptional cases, only lenis obstruents are 

allowed. 11 

3.5 

-tox/rox/dox allative (case suffix) 

-�u/gu/ku plural 

-t�/d� infinitive 

-gu/ku causative 

When a suffix is attached to a stem, the redundant [+voice] specification of the stem-final 

segment spreads to the initial obstruent of the suffix. 

3.6 

ra-d�        ‘to drink-INF’ 
pil-d�       ‘big-INF’ 
amam-d�      ‘to walk-INF’ 
ifk-t�       ‘to harness-INF’ 
jup-t�       ‘to bind-INF’ 
ro-gu-d�     ‘to help-CAU-INF’ 
l�t-ku-d�    ‘to do-CAU-INF’ 
c�am-gu      ‘shaman-PL’ 
c�am-dox     ‘shaman-ALL’ 

There is an interesting discrepancy between fricative-final nominal and 

verbal stems in this context; following a verbal stem, the initial segment of 

a suffix is always voiced (3.7a), while following a nominal stem, it is 

always voiceless (3.7b). 

3.7 

a. fuv-d�        ‘to blow/to saw-INF’ 
   i�-d�         ‘to kill-INF’ 
   t�v�-d�       ‘to go inside the house-INF’ 
   jar-d�        ‘to feed-INF’ 
   roz-gu-d�     ‘to divide-CAU-INF’ 
   t�mz-gu-d�    ‘drop-CAU-INF’ 
b. kins-ku       ‘evil spirit-PL’ 
   c�x�f-ku      ‘bear-PL’ 
   or�r�-ku      ‘Uilta-PL’ 
   t�f-tox       ‘house-ALL’ 
   ti�r�-tox     ‘wood-ALL’ 

The reason for this discrepancy is not immediately clear. In particular, the 

final voiceless fricative of nominal stems is a mystery. Once a suffix is 

attached, it is no longer in the context of FFD, so nothing prevents it from 

appearing as the unmarked voiced fricative. In fact, this is the case with 

verbal stems; final fricatives of verbal stems are systematically voiced 

(3.7a). The other context-sensitive requirement, namely position before 

a plosive, cannot be the reason either since these suffixes have a voiced 

variant, which surfaces when following a (redundant) [+voice] segment 

(3.6, 3.7a). The derivatives of verbal stems in 3.7a show that the initial 

plosive of these suffixes can accommodate a (preceding) voiced fricative, 

unlike plosives in a stem. But in fact, this option is not adopted in nominal 

stems. In short, these context-sensitive requirements cannot explain the 

different behavior of final fricatives in nominal and verbal stems. 

Under Base-Identity, however, such a discrepancy is explicable. Recall 

that nominal and verbal stems have different morpho-lexical compositions. 

Nominal stems can surface without any morphological ending, making the 

last fricative a target of FFD. In contrast, the final fricative of a verbal stem is 

always followed by a morphological extension, making it irrelevant to 

FFD. Since Base-Identity claims that derivatives should phonologically 

conform to the base, nominal derivatives conform to their base, which ends 

in a voiceless fricative (due to FFD). This is not the case, however, for 

verbal stems since they have no base and therefore are not subject to such 

pressure. As a consequence, verbal stems undergo canonical phonology and 

fricatives in non-prominent positions do appear as lenis, the unmarked 

obstruents of the language. 
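
The reasoning of this section can be made concrete with a small Python sketch. It is a simplification under stated assumptions, not a full implementation of the analysis: stems are written with a lenis final fricative purely as a notational convenience (the text treats these segments as laryngeally unspecified), only the pairs z/s and v/f are covered, only fricative-final stems are handled, and the function names and the `has_base` flag are hypothetical.

```python
# Voiced -> voiceless fricative pairs attested in 3.3, 3.4 and 3.7.
DEVOICE = {"z": "s", "v": "f"}


def final_fricative_devoicing(form: str) -> str:
    """Devoice a fricative in absolute final position (FFD)."""
    if form and form[-1] in DEVOICE:
        return form[:-1] + DEVOICE[form[-1]]
    return form


def suffixed(stem: str, voiced_suffix: str, voiceless_suffix: str, has_base: bool) -> str:
    """Attach a suffix to a fricative-final stem written with a lenis final."""
    if has_base:
        # Nominal stem: the derivative copies the isolated form, which ends
        # in a voiceless fricative because of FFD; the suffix is voiceless too.
        return final_fricative_devoicing(stem) + "-" + voiceless_suffix
    # Verbal stem: no base, so the lenis fricative surfaces and its
    # [+voice] value spreads to the suffix.
    return stem + "-" + voiced_suffix


print(final_fricative_devoicing("alz"))                  # isolated noun
print(suffixed("kinz", "gu", "ku", has_base=True))     # noun + plural
print(suffixed("roz", "gu", "ku", has_base=False))     # verb + causative
```

The only difference between the two branches is whether an isolated form exists to be copied; no reference to the labels noun and verb is needed beyond that.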

Finally, it is important to note that reference to laryngeal specifications 

using Input-to-Output correspondence constraints is not a viable option in 

this context. Recall that there is no laryngeal contrast in stem-final position 

in Nivkh. A phonological theory which minimizes the specification of 

predictable features in underlying representations, which is the one adopted


here, makes it impossible for Input-to-Output constraints to refer to the 

voiceless status of stem-final fricatives. 12 Thus their voicelessness should 

come from somewhere else. According to the current analysis it originates 

from the base, the independently occurring isolated form. 

4. Conclusion 

In this paper I have discussed phonological asymmetries between nominal 

and verbal stems of Nivkh, as observed in two phonological phenomena, 

CA and FFD. Though the asymmetries themselves look very different on 

the surface, this article has made explicit that they are subject to a common 

generalization, Base-Identity. Given the asymmetric composition of nouns 

and verbs, Base-Identity makes two predictions: i) nominal and verbal 

derivatives exhibit different phonological patterns, and ii) it is the nominal 

stem which exhibits the non-canonical phonology given the strong pressure 

from the base. Both predictions were borne out in the phonological 

phenomena discussed above. The base plays a decisive role in computing 

the phonology of nominal and verbal derivatives in both CA and FFD. As 

for CA, the current analysis correctly predicts that nominal derivatives 

accommodate the otherwise illicit segmental sequence (fricative-fricative, 

nasal-fricative), while verbal derivatives do not. This analysis is superior to 

previous accounts since it makes no direct use of the notion of exception, 

which was inevitable in previous works. Rather, the suggested analysis 

relates the asymmetry in phonology to the compositional asymmetry 

between nouns and verbs. 

As for FFD, nominal derivatives show conformity to their base in 

ending in a voiceless fricative. Verbal stems, on the other hand, do not show 

such conformity since they lack a base. Unlike nominal derivatives, the 

stem-final fricatives of verbal derivatives appear as lenis, following the 

canonical phonology of Nivkh. Base-Identity provides us with the 

mechanism underlying the noun-verb asymmetry, and it correctly predicts 

their phonological behavior with respect to the canonical phonology of the 

language.

Acknowledgements 


I would like to thank Dicky Gilbers, Angela Grimm, Maartje Schreuder, 

Jeroen van de Weijer and the audiences of ULCL Phonology meeting at 

Leiden (27-05-2003) and TABU dag (20-06-2003, Groningen) for 

comments on parts of this article. I bear all responsibility for errors. 

Notes 

1 

“…identity effects will come into play only to the extent that the immediate 

constituents composing the complex structure constitute independently 

occurring outputs…(Kenstowicz 1996: 373)”, “The base of an OO-correspondence 

relation is a licit output word, which is both morphologically 

and phonologically well-formed (Benua 1997a: 29)”, “The bound form of a 

stem is segmentally identical with its corresponding free form (Ito and Mester 

1997: 431)”. 

2 

See www.let.rug.nl/~toshi/ for more information. 

3 

The rhotic r of Nivkh is classified here and elsewhere in the literature (e.g. 

Trubetzkoj 1939) as a voiced fricative since it patterns as such in the CA 

system. Its voiceless r� counterpart is an apical trill containing portions without 

vocal cord vibration (Ladefoged and Maddieson 1996: 236). 

4 

Regarding this nature of CA, one may postulate a single laryngeal feature 

(rather than two) for both plosives and fricatives, e.g. [+spread glottis] for both 

aspirated plosives and voiceless fricatives. Such an analysis is proposed by 

Jakobson (1957) and Blevins (1993). See also section 3 below. 

5 

Segments that underwent CA are put in square brackets. Abbreviations are: 

ALL= allative, asp= aspiration, I=Intonational phrase, INF=infinitive, NP = 

noun phrase, PL= plural, VP = verb phrase, XP = maximal projection. 

6 

The alternation (r >) t > d is due to post-nasal voicing. 

7 

CA exhibits aspects of prosodic phonology (I am using this term to contrast 

with lexical phonology); it is sensitive to pause insertions and to speech rate. I 

would classify it as a P-structure rule in the terminology of Selkirk (1986). P-structure 

rules exhibit phonological properties of prosodic phonology, yet they 

are sensitive to syntactic bracketing (Selkirk 1986). 

8 

This line of analysis has antecedents. Amongst them are: Kenstowicz and 

Kisseberth (1979), Rushchakov (1981), Kaisse (1985), and Blevins (1993). 

Interestingly, Lev Shternberg, the pioneer of Nivkh study, assumed plosive-initial 

forms to be the input to transitive structures, as well (Shternberg 1908).


9 

Spirantization and hardening are not ordered relative to each other in the 

tableau below. 

10 

Post-nasal context requires different markedness constraint but I omit it from 

the discussion below. See Shiraishi (2000) for details. 

11 

Following a velar or a uvular plosive, the initial velar of a suffix appears as [x], 

spirantizing the former at the same time: �t�x-xu


Ito, J. and A. Mester (1997). Correspondence and Compositionality: The Gagyo 

Variation in Japanese Phonology. In: I.Roca (ed.), 419-462. 

Jakobson, R. (1957). Notes on Gilyak. Roman Jakobson. Selected Writings II. 

Word and language. Mouton, The Hague and Paris, 72-102. 

Kaisse, E. (1985). Connected Speech. Academic Press, Orlando. 

Kenstowicz, M. (1996). Base-Identity and Uniform Exponence: Alternatives to 

Cyclicity. In: J. Durand and B. Laks (eds.), 365-395. 

Kenstowicz, M. (1997). Uniform exponence: Exemplification and extension. 

In: V. Miglio and B. Moren (eds.), 139-155. 

Kenstowicz, M. and C. Kisseberth. (1979). Generative Phonology: description 

and theory. Academic Press, New York. 

Krejnovich, E. (1937). Fonetika nivxskogo (giljackogo) jazyka [Phonetics of the 

Nivkh (Gilyak) language]. Uchpedgiz, Moskva - Leningrad. 

Ladefoged, P. and I. Maddieson. (1996). The Sounds of the World’s Languages. 

Blackwell, Oxford. 

Miglio, V. and B. Moren (eds.) (1997). University of Maryland Working 

Papers in Linguistics, vol.5. 

Nespor, M. and I. Vogel (1986). Prosodic Phonology. Foris, Dordrecht.

Roca, I. (ed.) (1997). Derivations and Constraints in Phonology. Clarendon 

Press, Oxford. 

Rushchakov, V. (1981). Akusticheskie xarakteristiki soglasnyx nivxskogo 

jazyka (avtoreferat). Ph.D.dissertation, Akademija Nauk CCCP, 

Leningradskoe otdelenie instituta jazykoznanija. 

Savel’eva,V. and C.Taksami. (1970). Nivxsko-russkij slovar. [Nivkh-Russian 

dictionary] Sovetskaja Enciklopedija, Moskva. 

Selkirk, E. (1986). On derived domains in sentence phonology. Phonology

Yearbook, 3: 371-405. 

Shiraishi, H. (2000). Nivkh consonant alternation does not involve hardening. 

Journal of Chiba University Eurasian Society. No.3. 89-119 

(Also available at www.let.rug.nl/˜toshi/list_of_publication.htm). 

Abridged version has appeared in the Proceedings of the 120th

meeting of the Japanese Society of Linguists, 42-47. 

Shternberg, L. (1908). Materialy po izucheniju gilijackogo jazyka i fol’klora. 

In: Obrachy narodnoj slovesnosti. Vol. 1, Part I. Imper. 

Akademii Nauk, St.Petersburg. 

Steriade, D. (2000). Paradigm Uniformity and the Phonetics-Phonology 

boundary. In: M. Broe and J. Pierrehumbert (eds.). Papers in 

Laboratory Phonology 5. Cambridge University Press, 

Cambridge, 313-334 

Trubetzkoj, N. (1939). Grundzüge der Phonologie. Travaux du Cercle

Linguistique de Prague, Prague.

The Influence of Speech Rate on Rhythm Patterns 

Maartje Schreuder and Dicky Gilbers 


The topic of this paper is how rhythmic variability in speech can be 

accounted for both phonologically and phonetically. The question is 

whether a higher speech rate leads to adjustment of the phonological 

structure, or just to 'phonetic compression', i.e. shortening and merging of 

vowels and consonants, with preservation of the phonological structure. We 

claim that the melodic content of a phonological domain is indeed adjusted 

optionally when the speech rate increases. In other words, every speech rate 

has its own preferred register, in terms of Optimality Theory (Prince and 

Smolensky, 1993) its own ranking of constraints. 

We will investigate prosodic variability as part of our main research 

project, which involves a comparison of the analyses of music and 

language. Our ultimate aim is to provide evidence for the assumption that 

every temporal behavior is structured similarly (cf. Liberman, 1975). 

Gilbers and Schreuder (to appear) show that Optimality Theory owes a lot 

to the constraint-based music theory of Lerdahl and Jackendoff (1983). 

Based on the great similarities between language and music we claim that 

musical knowledge can help in solving linguistic issues. 

In this paper, we will show that clashes are avoided in allegro tempo. In 

both language and music distances between beats are enlarged, i.e. there 

appears to be more melodic content between beats. To illustrate this, we ran 

a pilot experiment in which we elicited fast speech. As expected, speech 

rate plays a role in rhythmic variability. 

The paper is organized as follows. Section 2 introduces the data of the

experiment. Section 3 presents the phonological framework of Optimality

Theory and the different rankings for andante and allegro speech. Section 4

describes the method of the experiment, and section 5 gives the auditive and

acoustic analyses together with the results. The final section discusses the

perspectives of our analysis.

2. Data 

We will discuss three types of rhythmic variability in Dutch. The first we 

will call “stress shifts to the right”; the second “stress shifts to the left” and 

the third “beat reduction”. In the first type as exemplified in stúdietòelage 

(s w s w w) ‘study grant’, we assume that this compound can be realized as 

stúdietoelàge (s w w s w) in allegro speech. Perfèctioníst (w s w s) is an 

example of “stress shift to the left” and we expect a realization pèrfectioníst 

(s w w s) in allegro speech. The last type does not concern a stress shift, but 

a stress reduction. In zùidàfrikáans (s s w s) ‘South African’ compounding 

of zuid and afrikaans results in a stress clash. In fast speech this clash is 

avoided by means of reducing the second beat: zùidafrikáans (s w w s). 

Table 1 shows a selection of our data. 

Table 1. Data 

Type 1: stress shift to the right (andante: s w s w w; allegro: s w w s w) 

stu die toe la ge ‘study grant’ 

weg werp aan ste ker ‘disposable lighter’ 

ka mer voor zit ter ‘chairman of the House of Parliament’ 

Type 2: stress shift to the left (andante: w s w s; allegro: s w w s) 

per fec tio nist ‘perfectionist’ 

a me ri kaan ‘American’ 

vi ri li teit ‘virility’ 

Type 3: beat reduction (andante: s s w s; allegro: s w w s) 

zuid a fri kaans ‘South African’ 

schier mon nik oog ‘name of an island’ 

gre go ri aans ‘Gregorian’ 

The different rhythmic patterns are accounted for phonologically within the 

framework of OT. 

3. Framework and phonological analysis 

The mechanism of constraint interaction, the essential characteristic of OT, 

is also used in the generative theory of tonal music (Lerdahl and 

Jackendoff, 1983). In both frameworks, constraint satisfaction determines 

grammaticality and in both frameworks the constraints are potentially


conflicting and soft, which means violable. Violation, however, is only 

allowed if it leads to satisfaction of a more important, higher ranked 

constraint. The great similarities between these theoretical frameworks 

make comparison and interdisciplinary research possible. 

For example, restructuring rhythm patterns as a consequence of a higher 

playing rate is a very common phenomenon in music. In Figure 1 we give 

an example of re-/misinterpretation of rhythm in accelerated or sloppy 

playing. 

Dotted notes rhythm → triplet rhythm

Figure 1. Rhythmic restructuring in music 

In Figure 1, the “dotted notes rhythm” (left of the arrow) is played as a 

triplet rhythm (right of the arrow). In the dotted notes rhythm the second 

note has a duration which is three times as long as the third, and in the 

triplet rhythm the second note is twice as long as the third. In fast playing it 

is easier to have equal durations between note onsets. Clashes are thus 

avoided and one tries to distribute the notes, the melodic content, over the 

measures as evenly as possible, even if this implies a restructuring of the 

rhythmic pattern. To ensure that the beats do not come too close to each 

other in fast playing, the distances are enlarged, thus avoiding a staccato-like

rhythm. In short, in fast tempos the musical equivalents of the 

Obligatory Contour Principle (OCP), a prohibition on adjacency of 

identical elements in language (McCarthy, 1986), become more important. 

We claim that - just as in music - the allegro patterns in all the different 

types of data in Table 1 are caused by clash avoidance. There is a 

preference for beats that are more evenly distributed over the phrase. The 

different structures can be described phonologically as a conflict between 

markedness constraints, such as FOOT REPULSION (*ΣΣ) (Kager, 1994), and

OUTPUT - OUTPUT CORRESPONDENCE constraints (cf. Burzio, 1998) within 

the framework of OT. FOOT REPULSION prohibits adjacent feet and 

consequently prefers a structure in which feet are separated from each other 

by an unparsed syllable. This constraint is in conflict with PARSE-σ, which


demands that every syllable is part of a foot. OUTPUT - OUTPUT 

CORRESPONDENCE compares the structure of a phonological word with the 

structure of its individual parts. For example, in a word such as fototoestel 

'photo camera', OUTPUT - OUTPUT CORRESPONDENCE demands that the 

rhythmic structure of its part tóestel 'camera' with a stressed first syllable is 

reflected in the rhythmic structure of the output. In other words, OUTPUT - 

OUTPUT CORRESPONDENCE prefers fótotòestel, with secondary stress on toe, 

to fótotoestèl, with secondary stress on stel. 

Whereas the normal patterns in andante speech satisfy OUTPUT - 

OUTPUT CORRESPONDENCE, the preference for triplet patterns in fast speech 

is accounted for by means of dominance of the markedness constraint, 

FOOT REPULSION, as illustrated in Table 2. 2 

Table 2. Rhythmic restructuring in language 

a. ranking in andante speech:

fototoestel         | OUTPUT - OUTPUT CORRESPONDENCE | *ΣΣ | PARSE-σ
☞ (fóto)(tòestel)   |                                | *   |
  (fóto)toe(stèl)   | *!                             |     | *

b. ranking in allegro speech:

fototoestel         | *ΣΣ | OUTPUT - OUTPUT CORRESPONDENCE | PARSE-σ
  (fóto)(tòestel)   | *!  |                                |
☞ (fóto)toe(stèl)   |     | *                              | *

Dutch is described as a trochaic language (Neijt and Zonneveld, 1982). 

Table 2a shows a preference for an alternating rhythm. The dactyl pattern 

as preferred in Table 2b, however, is a very common rhythmic pattern of 

prosodic words in languages such as Estonian and Cayuvava: every strong 

syllable alternates with two weak syllables (cf. Kager, 1994). We assume 

that the rhythm grammar, i.e. constraint ranking, of Dutch allegro speech


resembles the grammar of these languages. In the next section we will 

explore whether we can find empirical evidence for our hypothesis. 
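
The evaluation in Table 2 can be sketched in a few lines of code. This is a minimal illustration only: the labels OO_CORR, FOOT_REPULSION and PARSE_SYL and the function name optimal are illustrative shorthand, the violation counts are read off Table 2, and we assume that PARSE-σ is ranked lowest in both tempos.

```python
# Violation counts per candidate, read off Table 2.
# The constraint labels are illustrative shorthand, not the authors' notation.
CANDIDATES = {
    "(fóto)(tòestel)": {"OO_CORR": 0, "FOOT_REPULSION": 1, "PARSE_SYL": 0},
    "(fóto)toe(stèl)": {"OO_CORR": 1, "FOOT_REPULSION": 0, "PARSE_SYL": 1},
}

# Rankings from highest to lowest constraint.
# Assumption: PARSE-σ is lowest ranked in both tempos.
ANDANTE = ["OO_CORR", "FOOT_REPULSION", "PARSE_SYL"]
ALLEGRO = ["FOOT_REPULSION", "OO_CORR", "PARSE_SYL"]


def optimal(candidates, ranking):
    """Return the candidate with the best violation profile.

    Strict domination is modelled by comparing the violation counts
    lexicographically, in the order given by the ranking.
    """
    return min(
        candidates,
        key=lambda cand: tuple(candidates[cand][c] for c in ranking),
    )


print(optimal(CANDIDATES, ANDANTE))  # (fóto)(tòestel)
print(optimal(CANDIDATES, ALLEGRO))  # (fóto)toe(stèl)
```

Reranking FOOT REPULSION above OUTPUT - OUTPUT CORRESPONDENCE is the only change between the two calls, which is the sense in which each tempo is taken to have its own grammar.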

4. Method 

To find out whether people indeed prefer triplet patterns in allegro speech, 

we ran a pilot experiment in which we tried to elicit fast speech. Six 

subjects participated in a multiple-choice quiz in which they competed with each

other in answering twenty simple questions as quickly as possible. In this 

way, we expected them to speak fast without concentrating too much on 

their own speech. In Table 3 one of the quiz items is depicted. 

Table 3. Quiz item 

Q4 President Bush is een typische ‘President Bush is a typical ’ 

A1 intellectueel ‘intellectual’ 

A2 amerikaan ‘American’ 

A3 taalkundige ‘linguist’ 

We categorized the obtained data as allegro speech. As a second task the 

subjects were asked to read out the answers at a normal speaking rate 

embedded in the sentence ik spreek nu het woord … uit 'now I pronounce 

the word … '. This normal speaking rate generally means that the subjects 

will produce the words at a rate of approximately 180 words per minute, 

which we categorize as andante speech. All data were recorded on minidisk 

in a soundproof studio and normalized in CoolEdit in order to improve the 

signal-noise (S/N) ratio. Normalizing to 100% yields an S/N ratio 

approaching 0 dB. 

Six trained listeners judged the data auditively and indicated where they 

perceived secondary stress. After this auditive analysis the data were 

phonetically analyzed in PRAAT (Boersma and Weenink, 1992). We 

compared the andante and allegro data by measuring duration, pitch, 

intensity, spectral balance and rhythmic timing (Sluijter, 1995; Couper- 

Kuhlen, 1993; Cummins & Port, 1998; Quené & Port, 2002; a.o.). Sluijter 

claims that duration and spectral balance, in that order, are the main

correlates of primary stress. In our experiment, we are concerned with 

secondary stress.


For the duration measurements, the rhymes of the relevant syllables 

were observed. For example, in the allegro style answer A2 amerikaan in 

Table 3, we measured the first two rhymes and compared the values in 

msec. with the values for the same rhymes at the andante rate. In order to

make this comparison valid, we equalized the total durations of both

realizations by multiplying the duration of the allegro realization by a so-called

'acceleration factor', i.e. the duration of the andante version divided by the 

duration of the allegro version. According to Eefting and Rietveld (1989) 

and Rietveld and Van Heuven (1997), the just noticeable difference for 

duration is 4.5%. If the difference in duration between the andante and the

allegro realization did not exceed this threshold, we considered the 

realizations as examples of the same speech rate and neglected them for 

further analysis. 
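
The duration normalization described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the function names and the example durations are hypothetical, and we assume that the difference between the two realizations is expressed relative to the andante duration.

```python
JND_DURATION = 0.045  # just noticeable difference for duration (4.5%)


def acceleration_factor(andante_total, allegro_total):
    """Duration of the andante version divided by that of the allegro version."""
    return andante_total / allegro_total


def normalized_allegro(allegro_rhyme, andante_total, allegro_total):
    """Scale an allegro rhyme duration so that both word durations are equal."""
    return allegro_rhyme * acceleration_factor(andante_total, allegro_total)


def exceeds_jnd(andante_total, allegro_total, jnd=JND_DURATION):
    """True if the two realizations differ by more than the threshold.

    Assumption: the difference is taken relative to the andante duration.
    """
    return abs(andante_total - allegro_total) / andante_total > jnd


# Hypothetical word durations in seconds, for illustration only.
andante, allegro = 0.90, 0.72
print(acceleration_factor(andante, allegro))
print(normalized_allegro(0.080, andante, allegro))
print(exceeds_jnd(andante, allegro))
```

A pair for which exceeds_jnd returns False would be treated as two realizations at the same speech rate and left out of the further analysis.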

For the pitch measurements, we took the value in Hz in the middle of 

the vowel. The just noticeable difference for pitch is 2.5% ('t Hart et al.,

1990). For the intensity measurements, we registered the mean value in dB 

of the whole syllable. 

The next parameter we considered concerns spectral balance. Sluijter 

(1995) claims that the spectral balance of the vowel of a stressed syllable is 

characterized by more perceived loudness in the higher frequency region, 

because of the changes in the source spectrum due to a more pulse-like 

shape of the glottal waveform. The vocal effort, which is used for stress, 

generates a strongly asymmetrical glottal pulse. As a result of the shortened 

closing phase, there is an increase of intensity around the four formants in 

the frequency region above 500 Hz. Following Sluijter (1995) we compared 

the differences in intensity of the higher and lower frequencies of the 

relevant syllables in both tempos. 

Finally, we considered rhythmic timing. The idea is that the beats in 

speech are separated from each other at an approximately equal distance 

independent of the speech rate. In other words, a speaker more or less 

follows an imaginary metronome. If he/she speaks faster, more melodic 

content will be placed between beats, which results in a shift of secondary 

stress. This hypothesis will be confirmed if the distance between the 

stressed syllables in the andante realization of an item, e.g. stu and toe in 

studietoelage, approximates the distance between the stressed syllables in 

the allegro realization of the same item, e.g. stu and la. If the quotient of the 

andante beat interval duration divided by the allegro beat interval duration 

approximates 1, we expect perceived restructuring.

5. Results 

5.1. Auditive analysis 


Before we can present an auditive analysis of the data, we have to find out 

whether or not the quiz design was successful. The results show that the 

quiz indeed triggers faster speech by all subjects. Figure 2 shows their 

acceleration factors. Subjects 1, 2 and 4 turned out to be the best 

accelerating speakers, whereas subjects 3, 5 and 6 showed less difference in 

duration between andante and allegro realizations. The mean acceleration 

factor for the three fast speakers is 1.31, whereas the mean acceleration 

factor for the three slow speakers is 1.13. 

[Figure 2: bar chart of the acceleration factor (vertical axis, 1 to 1.4) for subjects p1 to p6.]

Figure 2. Acceleration factors of all subjects

Figure 3 shows the mean durations of the items at both speech rates. It 

shows that the best accelerating speakers are also the fastest speakers. We 

expect to find more restructured patterns for these speakers, mainly subjects 

1 and 4, in comparison to the slower speakers, such as subjects 3 and 6.


[Figure 3: mean word durations in seconds (vertical axis, 0.65 to 1.15) for subjects p1 to p6, at andante and allegro rate.]

Figure 3. Mean word durations

Figure 4 shows that most subjects prefer patterns in which from a 

phonological point of view markedness constraints dominate the 

correspondence constraints at both rates for right and left shift data, but not 

for beat reduction data. There are slightly more restructured patterns in 

allegro tempo, although the differences are quite small. 

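[Figure 4: number of items (vertical axis, 0 to 40) per type for all subjects: right shifts (N=42), left shifts (N=36) and beat reductions (N=42); categories: not shifted andante, shifted andante, shifted allegro, not shifted allegro.]

Figure 4. All subjects: Number of restructured items per type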

When we consider the results of two fast subjects, subjects 1 and 4, separately, we

observe a stronger preference for restructuring in allegro speech and no 

restructuring in andante speech, as shown in Figure 5. In other words, the 

fast subjects display both a greater difference in word durations in andante 

and allegro speech, and more variability in their speech patterns due to 

tempo than the slow subjects do. 

[Figure 5: number of items (vertical axis, 0 to 16) per type for the fast subjects: right shifts (N=14), left shifts (N=12) and beat reductions (N=14); categories: not shifted andante, shifted andante, shifted allegro, not shifted allegro.]

Figure 5. Fast subjects: Number of restructured items per type

Obviously, the preference for restructuring the rhythmic pattern in allegro 

speech is not an absolute preference. Sometimes restructuring does not take 

place in allegro speech, but on the other hand restructured patterns also 

show up in andante speech. 3 Some items were realized with the same 

rhythmic pattern irrespective of the tempo. Therefore, we also looked at the 

word pairs with a different rhythmic pattern in both tempos for each 

subject. We observe that the relatively fast speakers p1, p2 and p4, show 

the expected pattern according to our hypothesis, which means that they 

show a restructured pattern in allegro tempo, as shown in Figure 6 for the 

right shifts. 

[Figure 6: number of right-shift word pairs (vertical axis, 0 to 4) per subject (p1 to p6), split into expected combinations and counterexamples.]

Figure 6. Right Shifts: Expected combinations

Two of the relatively slow speakers, p3 and p6, show one counterexample 

each, where the subject prefers the restructured patterns in andante tempo. 

The other slow speaker, P5, displays no different patterns in andante and


allegro at all. Clearly, we have two different groups of speakers and this 

observation strengthens our claim that restructuring relates to speech rate. 

Some items, such as hobbywerkruimte (Type 1) 'hobby room', never 

show a stress shift and other items, such as viriliteit (Type 2) ‘virility’, 

prefer the shifted pattern in both tempos for all subjects. Possibly, the 

syllable structure plays an important role; open syllables seem to lose stress 

more easily than closed ones. 

5.2. Acoustic analysis 

In the current state of phonological research, embodied in e.g. laboratory 

phonology, much value is set on acoustic evidence for phonological 

analyses. Studies such as Sluijter (1995) and Sluijter and Van Heuven 

(1996) provide acoustic correlates for primary stress. In our study we are 

concerned with beat reduction and secondary stress shifts and we wonder 

whether or not the same acoustic correlates hold for secondary stress. 

Shattuck-Hufnagel et al. (1994) and Cooper and Eady (1986) do not find

acoustic correlates of rhythmic stress at all. They claim that it is not entirely 

clear which acoustic correlates are appropriate to measure, since these 

correlates are dependent on the relative strength of the syllables of an 

utterance. The absolute values of a single syllable can hardly be compared 

without reference to their context and the intonation pattern of the complete 

phrase. Huss (1978) claims that some cases of perceived rhythmic stress 

shift may be perceptual rather than acoustic in nature. Grabe and Warren 

(1995) also suggest that stress shifts can only be perceived in rhythmic 

contexts. In isolation, the prominence patterns are unlikely to be judged 

reliably. In the remainder of this paper we try to find out if we can support 

one of these lines of reasoning. In other words, are we able to support our 

perceived rhythmic variability with a phonetic analysis? Therefore, we 

measured the duration, pitch, intensity, spectral balance and rhythmic 

timing of the relevant syllables as realized by subject P1. 

Because Dutch is a quantity-sensitive language, the duration of the 

relevant syllable rhymes was considered. Onsets do not contribute to the 

weight of a syllable. In Figure 7, the duration analysis is shown for Type 2 

data (left shifts). The four columns indicate, respectively, the duration of 

the rhyme of the first and second syllable in andante speech, and the 

duration of the first and second one in allegro speech. According to Sluijter 

(1995), duration is the main correlate of primary stress. As a starting point,


we adopt her claim for our analysis of secondary stress. Our measurements 

would confirm our hypothesis and our auditive analysis, if the second 

column were higher than the first one and if the fourth column were lower 

than the third one. In that case, the subject would realize a word such as 

perfectionist as perfèctioníst in andante tempo and as pèrfectioníst in 

allegro tempo. 

In the andante tempo, three out of six items show the dominant 

correspondence pattern and in the allegro tempo, four out of six items show 

the dominant markedness pattern. That is hardly a preference and it does 

not confirm our auditive analysis of the same data. Furthermore, if we 

consider the word pairs with different patterns, there is only one pair that 

has the ideal ratio: the patterns of amerikaan. 

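[Figure 7: duration of the rhymes in seconds (vertical axis, 0 to 0.16) per item (perfectionist, amerikaan, piraterij, verbaliseren, banaliteit, viriliteit); four columns per item: andante r1, andante r2, allegro r1, allegro r2.]

Figure 7. Duration (Left Shifts by Subject P1)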

If duration does not enable us to confirm our auditive findings, maybe pitch 

is the main stress correlate for this speaker. However, pitch measurements 

reveal the same fuzzy result as the duration measurements. Again, only one 

pattern confirms the auditive analysis. This time it is not the item 

amerikaan, but the item perfectionist. Moreover, the differences in pitch in 

this item do not exceed the threshold of the 2.5%, which is the just 

noticeable difference for pitch. We also analyzed the mean intensity value 

of the relevant vowels without recognizable patterns between allegro and 

andante style. These results support the analyses of Sluijter (1995) and 

Sluijter and Van Heuven (1996), who also claim that the intensity 

parameter does not contribute much to the perception of stress. 

Next, we considered the spectral balance. In order to rule out the 

influence of the other parameters, we monotonized the data for volume and 

pitch. Then we selected the relevant vowels and analyzed them as a 

cochleagram in PRAAT. The cochleagram simulates the way the basilar

membrane functions, in other words the way in which we perceive sounds. 

In Figure 8 we show two cochleagrams of the vowel [a] in the fourth 

syllable of, respectively, stúdietòelage 'study grant' (Type 1) in andante 

tempo and stúdietoelàge in allegro tempo. This item was taken from a pre-study.

The allegro data show the expected increased perceived loudness in 

the higher frequencies, indicated by means of shades of gray; the darker 

gray the more perceived loudness. 

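[Figure 8: two cochleagrams (vertical axis in Bark, 0 to 25; horizontal axis: time in s), with time axes running from 0 to 0.169371 s and from 0 to 0.143209 s.]

Figure 8. Cochleagrams of [a] in studietoelage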

The right cochleagram (stressed [a]) in Figure 8 shows increased perceived 

loudness in the regions of approximately 5 to 22 Bark in the allegro version 

of [a] in comparison with the left cochleagram (unstressed [a]). This 

confirms the results of the study of primary stress in Sluijter (1995). If we 

convert this perceptive, almost logarithmic, Bark scale into its linear 

counterpart, the Hertz scale, this area correlates with the frequency region 

of 3 to 10 kHz. 

In order to measure perceived secondary stress, we will measure the 

relative loudness in the different frequency regions in Phon. 4 According to 

Sluijter (1995) stressed vowels have increased loudness above 500 Hz 

compared to the same vowel in an unstressed position. This can be shown if 

we take a point in time from both cochleagrams in Figure 8 in which the F1 

reaches its highest value (following Sluijter, 1995). In Figure 9 the values 

in Phon are depicted for these points and plotted against the Bark values in 

25 steps. 

[Figure 9: loudness in Phon (vertical axis, 20 to 60) against Bark (horizontal axis, 1 to 25) for stressed and unstressed [a]; lines for andante and allegro.]

Figure 9. Loudness in Phon

The white line in Figure 9 indicates the pattern of the allegro stressed [a] in 

studietoelage and the black line indicates the pattern of the andante 

unstressed [a]. We see increased loudness in the region of 13 to 21 Bark, 

which correlates with the most sensitive region of our ear. The mean Phon 

value in Figure 9 between 5 and 21 Bark is 43.6 Phon for the andante 

unstressed [a] and 47.4 Phon for the allegro stressed [a]; a mean difference 

of 3.8 Phon. 

Now, let us see whether or not we can find similar results for our subject 

P1. Figure 10 shows that the spectral balance confirms the leftward stress 

shift we perceived in the allegro realization of amerikaan. The first syllable 

vowel in allegro tempo is characterized by more loudness in the higher 

frequency regions than its andante counterpart. In the second syllable vowel 

it is just the other way around. 

[Figure 10: loudness in Phon against Bark (1 to 25) in two panels, "Stressed and unstressed [a] in [a]merikaan" and "Stressed and unstressed [e] in a[me]rikaan", each comparing andante and allegro.]

Figure 10. Spectral balance comparison of the first two vowels of amerikaan

Unfortunately, not all spectral balance data confirm our auditive analysis. 

For example, we claimed that the pitch analysis of the stress shift in 

perfectionist did confirm our auditive analysis. Therefore, we expected 

more loudness in the allegro realization of the first vowel and less loudness 

in the allegro realization of the second vowel, but it appeared that there is 

relatively more loudness in the andante realization of per. This result 

contradicts our auditive and our pitch analysis. 

We have to conclude that the different phonetic analyses contradict each 

other. Sometimes the perceived stress shift is characterized by a longer 

duration of the stressed syllable; sometimes a relatively higher pitch 

characterizes it. The results of our spectral balance analysis show that the 

differences in loudness pattern with differences in duration. In our 

perceived stress shift in allegro perfectionist, pitch turned out to be the 

decisive correlate, whereas duration and spectral balance measurements 

indicated no shift at all. On the other hand, the perceived shift in allegro 

amerikaan was confirmed by the duration and spectral balance analyses 

together, whereas pitch measurements indicated the opposite pattern. For 

most perceived stress shifts, however, the acoustic correlates did not give 

any clue. 

Finally, we will consider whether the perception of restructuring 

depends on rhythmic timing. Just as in music, speech can be divided into a 

melodic string and a rhythmic string as partly independent entities. With 

respect to speech, the melodic string seems to be more flexible than the 

rhythmic one. Imagine that the rhythm constitutes a kind of metronome 

pulse to which the melodic content has to be aligned. The listener expects 

prominent syllables to occur with beats. This behavior is formulated as the 

Equal Spacing Constraint: prominent vowel onsets are attracted to 

periodically spaced temporal locations (Couper-Kuhlen, 1993; Cummins & 

Port, 1998; Quené & Port, 2002; a.o.). Dependent on speech rate the 

number of intervening syllables between beats may differ. Suppose the beat 

interval is constant at 300 msec., there will be more linguistic material in 

between in allegro speech, e.g. the two syllables die and toe in 

stúdietoelàge, than in andante speech, e.g. only one syllable die in 

stúdietòelage. 

If indeed the perception of secondary stress shifts depends on rhythmic 

timing, i.e. the beat interval between prominent syllables in andante and 

allegro speech is approximately equal, then we expect that the duration

quotient of the interval between, for example, stu and toe in the andante 

realization of studietoelage and stu and la in the allegro realization 

approximates 1.


In our pre-study, the interval between the vowel onsets of the first and 

third syllable in studietoelage (andante) is 0.358 sec, whereas the interval 

between the first and the fourth syllable in the allegro realization of the 

same word is 0.328 sec. This means that the duration quotient is 1.091, 

which indeed approximates 1. In other words, this example supports the 

idea of the Equal Spacing Constraint. 
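
The quotient for this pre-study item can be recomputed as a check. The sketch below uses the two intervals reported above; the function names and the tolerance of 0.1 for "approximates 1" are hypothetical, since no explicit threshold is given.

```python
def beat_interval_quotient(andante_interval, allegro_interval):
    """Andante beat interval divided by allegro beat interval (both in seconds)."""
    return andante_interval / allegro_interval


def approximates_one(quotient, tolerance=0.1):
    """Hypothetical criterion: the quotient lies within `tolerance` of 1."""
    return abs(quotient - 1.0) <= tolerance


# Intervals for studietoelage in the pre-study: syllables 1 to 3 (andante)
# and syllables 1 to 4 (allegro).
q = beat_interval_quotient(0.358, 0.328)
print(round(q, 3))          # 1.091
print(approximates_one(q))  # True
```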

Does the same result hold for our present data? We measured the beat 

intervals between all possible stress placement sites for all six subjects. 

Figure 11 depicts the duration quotients for subject 1. Figure 12 shows the 

beat intervals of the same data: the interval between the first and the third

syllable as well as that between the first and the fourth syllable, at both

speech rates. We expect restructuring for those data in which the line of the

first to third syllable interval (andante (black line)) coincides with the line 

of the first to fourth syllable interval (allegro (white line)). 

[Figure 11: quotient of the andante and allegro beat intervals (vertical axis, 0.5 to 1.1) for the right-shift items studietoelage, wegwerpaansteker, trimesterindeling, kamervoorzitter, hobbywerkruimte, gemeente-inschrijving and winkelopheffing.]

Figure 11. Quotient beat intervals of Subject P1

[Figure 12: beat intervals in seconds (vertical axis, 0 to 0.9) for the same items; lines for andante s1-3, andante s1-4, allegro s1-4 and allegro s1-3.]

Figure 12. Beat intervals of Subject P1


Figures 11 and 12 indicate that the relevant beat intervals of the items

1, 4 and 7, studietoelage 'study grant', kamervoorzitter 'chairman of the 

House of Parliament' and winkelopheffing 'closing down of a shop', 

respectively, coincide. In other words, we expect to hear restructuring in 

exactly these three items. 

Unfortunately, our auditive analysis indicates only attested 

combinations of restructuring in items 2 and 6: wegwerpaansteker 

'disposable lighter' and gemeente-inschrijving 'municipal registration', 

respectively. Obviously, rhythmic timing is not the decisive characteristic 

of perceived restructuring in allegro speech either. 

6. Discussion and Conclusion 

In section 3, we presented our phonological account of the restructuring

within the framework of OT. Our main conclusion is that phonetic 

compression cannot be the sole explanation of the different rhythm 

patterns. Although the results cannot really confirm our hypothesis that 

there are different grammars, i.e. constraint rankings for different rates of 

speaking, there seems to be something that relates to speech rate. The fast 

speakers display different grammars, i.e. constraint rankings, for different 

rates of speaking. In their andante tempo, correspondence constraints 

prevail, whereas in allegro tempo markedness constraints dominate the 

correspondence ones. These preferences resemble the preferences of 

andante and allegro music. In both disciplines clashes are avoided in 

allegro tempo by means of enlarging the distances between beats. 

In section 5, we attempted to confirm our phonological account with a 

phonetic analysis. Unfortunately, the phonetic correlates of stress - 

duration, pitch, intensity and spectral balance - do not show the expected 

and perceived differences in rhythm patterns in all pairs. Sluijter (1995) 

found out that duration is the main correlate of primary stress with spectral 

balance as an important second characteristic. In our analysis, however, 

neither differences in duration nor differences in spectral balance could 

identify secondary stress. Therefore, we have to conclude that our analysis 

supports earlier work by Shattuck-Hufnagel et al. (1994), Cooper and Eady 

(1986), Huss (1978) and Grabe and Warren (1995), who all claim that 

acoustic evidence for secondary stress cannot be found unambiguously. 

Although we did find some differences in duration, spectral balance or 

pitch, these differences were not systematically found in all pairs in which


we perceived rhythmic variability. Finally, we discussed rhythmic timing as 

a cue for variable patterns. However, the hypothesis that the duration 

between prominent syllables is approximately equal in both andante and 

allegro speech was not confirmed by the auditive analysis of the data. It 

seems that rhythmic restructuring is more a matter of perception than of 

production. At this point, the question remains: are we fooled by our brains 

and is there no phonetic correlate of the perceived phonological stress shifts 

in the acoustic signal or do we have to conclude that the real phonetic 

correlate of secondary stress has yet to be found? 

Notes 

1 This paper is an extension of our paper "Restructuring the melodic content of 

feet", which has been submitted to the proceedings of the 9th International Phonology 

Meeting: Structure and melody, Vienna 2002. We wish to thank Grzegorz 

Dogil, Hidetoshi Shiraishi, the participants of the 9th International 

Phonology Meeting, Vienna 2002, and the participants of the 11th Manchester 

Phonology Meeting, Manchester 2003, for their useful comments. We are also 

grateful to Sible Andringa, Nynke van den Bergh, Gerlof Bouma, John Hoeks, 

Jack Hoeksema, Wander Lowie, Dirk-Bart den Ouden, Joanneke Prenger, 

Ingeborg Prinsen and Femke Wester for participating in our experiment. We 

especially thank Wilbert Heeringa and Hugo Quené for supplying us with the 

PRAAT scripts that we could use for our spectral balance and rhythmic timing 

analyses. 

2 For reasons of clarity, we abstract from constraints such as FOOTBINARITY 

(FTBIN) and WEIGHT-TO-STRESS PRINCIPLE in Table 2. Although these 

constraints play an important role in the Dutch stress system (cf. Gilbers & 

Jansen, 1996), the conflict between OUTPUT-OUTPUT CORRESPONDENCE and 

FOOT REPULSION is essential for our present analysis. 

3 With respect to the phonological analysis of the data, we suggest a random 

ranking of weighted correspondence and markedness constraints. By weighting 

constraints, we adopt an OT variant that more or less resembles the 

analyses in OT’s predecessor Harmonic Grammar (cf. Legendre, Miyata & 

Smolensky, 1990). Note that we do not opt for a co-phonology for allegro-style 

speech in our analysis. In a co-phonology, the output of the andante-style ranking 

is input or base for the allegro-style ranking. We opt for a random ranking with 

different preferences for allegro and andante speech, because our data show 

variable rhythmic structures at both rates. Both rankings evaluate the same input 

form.


4 The perceived loudness of a tone depends on its frequency. The phon unit 

is defined using a 1 kHz tone and the decibel scale. A pure sine tone at any 

frequency with a loudness level of 100 phon is as loud as a pure tone of 100 dB at 1 kHz 

(Rietveld and Van Heuven, 1997: 199). We are most sensitive to frequencies 

around 3 kHz. The hearing threshold rises rapidly near the lower and upper 

frequency limits, which are about 20 Hz and 16 kHz, respectively. 

References 

Boersma, Paul, and David Weenink (1992-2002). PRAAT, phonetics by 

computer. Available at http://www.praat.org, University of 

Amsterdam. 

Burzio, Luigi (1998). Multiple Correspondence. Lingua, 104: 79-109. 

Cooper, W., and J. Eady (1986). Metrical phonology in speech production. 

Journal of Memory and Language, 25: 369-384. 

Couper-Kuhlen, Elizabeth (1993). English speech rhythm: form and function in 

everyday verbal interaction. Benjamins, Amsterdam. 

Cummins, Fred, and Robert Port (1998). Rhythmic constraints on stress timing 

in English. Journal of Phonetics, 26(2): 145-171. 

Eefting, Wieke, and Toni Rietveld (1989). Just noticeable differences of 

articulation rate at sentence level. Speech Communication, 8: 

355-361. 

Gilbers, Dicky, and Wouter Jansen (1996). Klemtoon en ritme in Optimality 

Theory, deel 1: hoofd-, neven-, samenstellings- en 

woordgroepsklemtoon in het Nederlands [Stress and rhythm in 

Optimality Theory, part 1: primary stress, secondary stress, 

compound stress and phrasal stress in Dutch]. TABU, 26(2): 53-101. 

Gilbers, Dicky, and Maartje Schreuder (to appear). Language and Music in 

Optimality Theory. Proceedings of the 7th International 

Congress on Musical Signification 2001, Imatra, Finland. 

Extended manuscript available as ROA-571. 

Grabe, Esther, and Paul Warren (1995). Stress shift: do speakers do it or do 

listeners hear it? In: Connell, Bruce and Amalia Arvaniti (eds.). 

Phonology and phonetic evidence. Papers in Laboratory 

Phonology IV. 

Hart, Johan 't, René Collier, and Antonie Cohen (1990). A perceptual study of 

intonation. An experimental-phonetic approach to speech 

melody. Cambridge University Press, Cambridge.


Huss, V. (1978). English word stress in the postnuclear position. Phonetica, 35: 

86-105. 

Kager, René (1994). Ternary rhythm in alignment theory. ROA-35. 

Legendre, Geraldine, Yoshiro Miyata, and Paul Smolensky (1990). Harmonic 

Grammar - A formal multi-level connectionist theory of 

linguistic wellformedness: An application. In: Proceedings of 

the Twelfth Annual Meeting of the Cognitive Science Society, 

884-891. 

Lerdahl, Fred, and Ray Jackendoff (1983). A Generative Theory of Tonal 

Music. The MIT Press, Cambridge, Massachusetts, London, 

England. 

Liberman, Mark (1975). The Intonational System of English. Garland, New 

York and London. 

McCarthy, John J. (1986). OCP Effects: Gemination and antigemination. 

Linguistic Inquiry, 17: 207-263. 

Neijt, Anneke, and Wim Zonneveld (1982). Metrische fonologie - De 

representatie van klemtoon in Nederlandse monomorfematische 

woorden. [Metrical phonology – The representation of stress in 

Dutch monomorphemic words] De nieuwe Taalgids, 75: 527-547. 

Prince, Alan, and Paul Smolensky (1993). Optimality Theory: constraint 

interaction in generative grammar. Ms., ROA-537. 

Quené, Hugo, and Robert F. Port (2002). Rhythmical factors in stress shift. 

Paper presented at the 38th Meeting of the Chicago Linguistic 

Society, Chicago. 

Rietveld, Toni, and Vincent van Heuven (1997). Algemene Fonetiek. [General 

Phonetics]. Dick Coutinho, Bussum. 

Schreuder, Maartje, and Dicky Gilbers (submitted). Restructuring the melodic 

content of feet. In: Proceedings of the 9th International 

Phonology Meeting 2002, Vienna, Austria. 

Shattuck-Hufnagel, Stephanie, Mari Ostendorf, and Ken Ross (1994). Stress 

shift and early pitch accent placement in lexical items in 

American English. Journal of Phonetics, 22: 357-388. 

Sluijter, Agaath (1995). Phonetic Correlates of Stress and Accent. HIL 

dissertations 15, Leiden University. 

Sluijter, Agaath, and Vincent van Heuven (1996). Spectral balance as an 

acoustic correlate of linguistic stress. Journal of the Acoustical 

Society of America, 100(4): 2471-2485.

List of Addresses 

Drs. Markus Bergmann 

University of Groningen, Faculty of Arts, Department of Linguistics 

Oude Kijk in 't Jatstraat 26, 9712 EK Groningen, The Netherlands 

+31 50 3635982, M.Bergmann@let.rug.nl 

Drs. Tamás Bíró 

University of Groningen, Faculty of Arts, Department of Computational 

Linguistics 


+31 50 3636852, Birot@let.rug.nl 

Dr. Dicky Gilbers 



+31 50 3635983, D.G.Gilbers@let.rug.nl 

Dr. Charlotte Gooskens 

University of Groningen, Faculty of Arts, Department of Scandinavian 

Languages and Cultures 


+31 50 3635827, C.S.Gooskens@let.rug.nl 

Dr. Dr. Tjeerd de Graaf and Drs. Nynke de Graaf 



+31 50 3635982, T.de.Graaf@let.rug.nl 

Drs. Angela Grimm 



+31 50 3635920, A.Grimm@let.rug.nl 

Dr. Ing. Wilbert Heeringa 


Linguistics 


+31 50 3635970, W.J.Heeringa@let.rug.nl

Prof. Dr. Vincent J. van Heuven 

University of Leiden, Faculty of Arts, Department of Linguistics 

Van Wijkplaats 4, 2311 BX Leiden, The Netherlands 

+31 71 5272105, V.J.J.P.van.Heuven@let.leidenuniv.nl 

Nienke Knevel 

p/a University of Groningen, Faculty of Arts, Department of Linguistics 


+31 50 3635983, N.B.Knevel@student.rug.nl 

Dr. Jurjen van der Kooi 

University of Groningen, Faculty of Arts, Department of Frisian 


+31 50 3635966, J.van.der.Kooi@let.rug.nl 

Prof. Dr. Ir. John Nerbonne 


Linguistics 


+31 50 3635815, J.Nerbonne@let.rug.nl 

Drs. Maartje Schreuder 



+31 50 3635920, M.J.Schreuder@let.rug.nl 

Drs. Hidetoshi Shiraishi 



+31 50 3635982, H.Shiraishi@let.rug.nl 

Dr. Ivilin Stoianov 

University of Padova, Department of General Psychology 

Via Venezia 8, 35100 AS Padova, Italy 

+39 049 8276676, Ivilin.Stoianov@unipd.it
