25.08.2015

In the Beginning was Information

$$H_{\text{syll}} = 8.6\ \text{bits/syllable} \tag{16}$$

The average number of letters per syllable is 3.03, so that

$$H_3 = \frac{8.6}{3.03} = 2.84\ \text{bits/letter} \tag{17}$$

W. Fucks [F9] investigated the number of syllables per word and found interesting frequency distributions that yield characteristic values for different languages.

The frequency distribution of the number of syllables per word is shown in Figure 36 for some languages. These distributions were obtained from fiction texts. Individual books show small differences, but the overall result does not change. In English, 71.5 % of all words are monosyllabic, 19.4 % are bisyllabic, 6.8 % consist of three syllables, 1.6 % have four, etc. The corresponding values for German, for one to eight syllables, are 55.6 %, 30.8 %, 9.38 %, 3.35 %, 0.71 %, 0.14 %, 0.2 %, and 0.01 %.

For English, German, and Greek the frequency distribution peaks at one syllable, but the mode for Arabic, Latin, and Turkish is two syllables (Figure 36). In Figure 37 the entropy $H_S \equiv H_{\text{syllable}}$ is plotted against the average number of syllables per word for various languages. Of the investigated languages, English has the smallest average number of syllables per word, namely 1.4064, followed by German (1.634), Esperanto (1.895), Arabic (2.1036), Greek (2.1053), etc. The average ordinate values for the syllable entropy $H_{\text{syllable}}$ of the different languages were found by means of equation (9), but note that here the probabilities of occurrence of monosyllabic, bisyllabic, etc. words were used for $p_i$. The value $H_{\text{syllable}} = 1.51$ found for German should not be compared with the value derived from equation (16), because a different method of computation is used.

3. Words: Statistical investigations of German showed that only 322 different words account for half of all the words in written text [K4]. Using these words, it follows from equation (9) that the word entropy is $H_{\text{word}} = 4.5$ bits/word. When only the 16 most frequently used words, which already make up 20 % of a text, are considered, $H_{\text{word}}$ is found to be 1.237 bits per word. When all words are considered, we obtain the estimated 1.6 bits per letter given in equation (14). The average length of German words is 5.53 letters, so the average information content is $5.53 \times 1.6 = 8.85$ bits per word.

It should now be clear that certain characteristics of a language may be described in terms of values derived from Shannon’s theory of information. These values are purely statistical in nature and tell us nothing about the grammar of the language or the contents of a text. Just as

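As a rough sketch, the syllable entropy discussed above can be computed by applying equation (9) to a distribution of syllables per word. Equation (9) is not reproduced in this excerpt, so the code assumes it is Shannon's $H = -\sum_i p_i \log_2 p_i$. The function names `entropy_bits` and `mean_syllables` are hypothetical, and the German percentages are the ones quoted in the text. The percentages are normalized because, as quoted, they add up to slightly more than 100 %. The method behind Figure 37 may differ in detail, so the printed values are illustrative only.

```python
import math

# Share of German words (in %) with 1, 2, ..., 8 syllables, as quoted in the text.
GERMAN_SYLLABLES_PER_WORD = [55.6, 30.8, 9.38, 3.35, 0.71, 0.14, 0.2, 0.01]


def entropy_bits(weights):
    """Shannon entropy H = -sum(p_i * log2(p_i)) in bits (assumed form of equation (9)).

    The weights are normalized first, so rounded percentages that do not add
    up to exactly 100 can be passed in directly. Zero weights contribute
    nothing, since p * log2(p) tends to 0 as p tends to 0.
    """
    total = sum(weights)
    h = 0.0
    for w in weights:
        if w > 0:
            p = w / total
            h -= p * math.log2(p)
    return h


def mean_syllables(weights):
    """Average syllables per word; weights[k - 1] is the share of k-syllable words."""
    total = sum(weights)
    return sum(k * w for k, w in enumerate(weights, start=1)) / total


if __name__ == "__main__":
    print(f"H_syllable (German): {entropy_bits(GERMAN_SYLLABLES_PER_WORD):.3f} bits")
    print(f"mean syllables/word: {mean_syllables(GERMAN_SYLLABLES_PER_WORD):.3f}")
```

The same two functions work for any language once its syllable-count percentages are supplied.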

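The unit conversions in equation (17) and in the bits-per-word estimate are a simple ratio and a simple product. The short sketch below spells them out with the figures quoted in the text; the helper names `bits_per_letter` and `bits_per_word` are hypothetical.

```python
def bits_per_letter(bits_per_syllable, letters_per_syllable):
    """Convert an entropy per syllable into an entropy per letter, as in equation (17)."""
    return bits_per_syllable / letters_per_syllable


def bits_per_word(bits_per_letter_value, letters_per_word):
    """Scale an entropy per letter up to an average information content per word."""
    return bits_per_letter_value * letters_per_word


if __name__ == "__main__":
    # Figures quoted in the text: 8.6 bits/syllable, 3.03 letters/syllable,
    # 1.6 bits/letter (equation (14)) and 5.53 letters per German word.
    print(f"{bits_per_letter(8.6, 3.03):.2f} bits/letter")  # 2.84 bits/letter
    print(f"{bits_per_word(1.6, 5.53):.2f} bits/word")      # 8.85 bits/word
```

Multiplying averages in this way treats the per-letter entropy as the same for short and long words, so it is an averaging shortcut rather than an exact identity.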

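The two word statistics mentioned above, coverage (how many distinct words make up half of a text) and word entropy via equation (9), can be estimated from a list of tokens roughly as follows. This is a sketch under stated assumptions: the function names and the tiny German sample sentence are hypothetical, tokenization is a naive whitespace split, and equation (9) is again assumed to be $H = -\sum_i p_i \log_2 p_i$. Figures such as 322 words or 4.5 bits/word come from large corpora and cannot be reproduced from a toy example.

```python
import math
from collections import Counter


def words_needed_for_coverage(tokens, fraction):
    """Smallest number of distinct words whose occurrences make up at least
    `fraction` of all tokens, taking the most frequent words first."""
    counts = Counter(tokens)
    target = fraction * len(tokens)
    covered = 0
    for n, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered >= target:
            return n
    return len(counts)


def word_entropy_bits(tokens):
    """Word entropy H_word = -sum(p_i * log2(p_i)) over relative word frequencies."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical toy corpus; a meaningful estimate needs a large text sample.
    text = "der hund und die katze und der vogel und die maus"
    tokens = text.split()
    print(words_needed_for_coverage(tokens, 0.5))
    print(f"{word_entropy_bits(tokens):.3f} bits/word")
```

Restricting `tokens` to the most frequent words before calling `word_entropy_bits` mirrors, in spirit, the text's calculation over a limited vocabulary, although the source does not say exactly how its restricted probabilities were normalized.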