In the Beginning was Information
a) the factor n, which indicates that the information content is directly proportional to the number of symbols used. This is totally inadequate for describing real information. If, for example, somebody uses a spate of words without really saying anything, then Shannon would rate the information content as very large, because of the great number of letters employed. On the other hand, if an expert expresses the actual meaning concisely, his “message” is accorded a very small information content.

b) the variable H, expressed in equation (6) as a summation over the available set of elementary symbols. H refers to the different frequency distributions of the letters and thus describes a general characteristic of the language being used. If two languages A and B use the same alphabet (e.g. the Latin alphabet), then H will be larger for A when its letters are more evenly distributed, i.e. closer to an equal distribution. When all symbols occur with exactly the same frequency, H attains its maximum value, H = lb N.

An equal distribution is an exceptional case. We consider the case where all symbols can occur with equal probability, e.g. when noughts and ones appear with the same frequency, as for random binary signals. The probability that two given symbols (e.g. G, G) appear directly one after the other is p²; but the information content I is doubled because of the logarithmic relationship. The information content of an arbitrarily long sequence of n symbols from an available supply (e.g. the alphabet), when the probability of all symbols is identical, i.e. p_1 = p_2 = ... = p_N = p, is found from equation (5) to be:

           n
  I_tot =  ∑  lb(1/p_i) = n × lb(1/p) = −n × lb p.           (7)
          i=1

If all N symbols may occur with the same frequency, then the probability is p = 1/N. If this value is substituted in equation (7), we have the important equation

  I_tot = n × lb N = n × H.                                   (8)

2. The average information content of one single symbol in a sequence: If the symbols of a long sequence occur with differing probabilities (e.g. the sequence of letters in an English text), then we are interested in the average information content of each symbol in this sequence, or the average for the language itself. In other words: what is the average information content in this case, with respect to the average uncertainty of a single symbol?
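The relationships above can be sketched in a few lines of Python. This is an illustrative example, not from the book itself: the function names and the sample frequency distributions are chosen here for demonstration. “lb” denotes the base-2 logarithm, so values come out in bits.

```python
# Illustrative sketch of equations (7)/(8) and of the average information
# content (entropy) per symbol; function names and sample data are
# assumptions made for this example, not taken from the original text.
from math import log2


def total_information_equal(n: int, N: int) -> float:
    """Equation (8): I_tot = n x lb N for a sequence of n symbols drawn
    from a supply of N equally probable symbols (p = 1/N)."""
    return n * log2(N)


def average_information(probs: list[float]) -> float:
    """Average information content per symbol,
    H = sum over i of p_i x lb(1/p_i),
    i.e. the quantity asked about in point 2 above."""
    return sum(p * log2(1 / p) for p in probs if p > 0)


# 100 random binary digits (N = 2, equal distribution): lb 2 = 1 bit each.
print(total_information_equal(100, 2))                  # 100.0 bits in total

# An equal distribution maximizes H: for N = 4, H = lb 4 = 2 bits/symbol ...
print(average_information([0.25, 0.25, 0.25, 0.25]))    # 2.0

# ... while an uneven distribution over the same 4 symbols gives H < lb N.
print(average_information([0.5, 0.25, 0.125, 0.125]))   # 1.75
```

The last two calls show the point made under b): the more uneven the frequency distribution, the smaller H becomes relative to its maximum lb N.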
