The Arimoto-Blahut Algorithm for Calculation of Channel Capacity

\[
J(Q,\Phi) \le \log\Big(\sum_i r(i)\Big), \qquad r(i) = \exp\Big[\sum_j P(j/i)\log\Phi(i/j)\Big],
\]
with equality iff $Q(i) = r(i)/\sum_k r(k)$.

The double maximum in (4) can be taken in either order. Therefore,
\[
C = \max_\Phi \max_Q J(Q,\Phi) = \max_\Phi \log\Big(\sum_i r(i)\Big) = \max_\Phi \log\Big(\sum_i \exp\sum_j P(j/i)\log\Phi(i/j)\Big). \qquad (5)
\]

The point of introducing $J(Q,\Phi)$ is that $C$ can be calculated through the double maximization procedure (4), each step of which can be solved in closed form, as given in Lemmas 1 and 2. By contrast, the maximization of $J(Q)$ with respect to $Q$ has no closed-form solution.

We can doubly maximize $J(Q,\Phi)$ in either order. For example, let us fix $Q$ to some initial value $Q_0$. By Lemma 1, $J(Q_0,\Phi)$ is maximized over $\Phi$ by $\Phi = \Phi_0$, where
\[
\Phi_0(i/j) = \frac{P(j/i)\,Q_0(i)}{\sum_k P(j/k)\,Q_0(k)}.
\]
Now consider $J(Q,\Phi_0)$ for $\Phi_0$ fixed and vary $Q$ to produce a maximum. By Lemma 2, that maximum is
\[
J(Q_1,\Phi_0) = \log\Big(\sum_i r(i)\Big) = \log\sum_i \exp\sum_j P(j/i)\log\Phi_0(i/j),
\]
attained for $Q(i) = Q_1(i) = r(i)/\sum_k r(k)$. Then, for that $Q_1$, the maximum of $J(Q_1,\Phi)$ is attained at $\Phi = \Phi_1$, as given in Lemma 1 for $Q_1$ and $P(j/i)$. Fix $\Phi_1$ and maximize $J(Q,\Phi_1)$ at $Q = Q_2$, as given in Lemma 2. Now maximize $J(Q_2,\Phi)$, and so on. At each step $J(Q,\Phi)$ does not decrease, and it eventually approaches the capacity $C$.

So we can now formulate the Arimoto-Blahut algorithm to accomplish the double maximization in (4).
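The two closed-form updates described above can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names (`posterior`, `J`, `input_update`), the example channel, and the convention `P[i][j] = P(j/i)` are all assumptions made for illustration, natural logarithms are used throughout, and every output is assumed to be reachable from some input.

```python
import math

def posterior(Q, P):
    """Lemma 1 update: Phi[i][j] = Q(i) P(j/i) / sum_k Q(k) P(j/k)."""
    n, m = len(P), len(P[0])
    out = [sum(Q[k] * P[k][j] for k in range(n)) for j in range(m)]
    return [[Q[i] * P[i][j] / out[j] for j in range(m)] for i in range(n)]

def J(Q, Phi, P):
    """J(Q, Phi) = sum_i sum_j Q(i) P(j/i) ln(Phi(i/j) / Q(i)), in nats."""
    total = 0.0
    for i, qi in enumerate(Q):
        for j, pji in enumerate(P[i]):
            if qi > 0 and pji > 0:  # zero-probability terms contribute nothing
                total += qi * pji * math.log(Phi[i][j] / qi)
    return total

def input_update(Phi, P):
    """Lemma 2 update: returns Q(i) = r(i) / sum_k r(k) and ln(sum_k r(k))."""
    r = [math.exp(sum(p * math.log(Phi[i][j])
                      for j, p in enumerate(row) if p > 0))
         for i, row in enumerate(P)]
    total = sum(r)
    return [x / total for x in r], math.log(total)

# Hypothetical binary asymmetric channel; row i holds P(j/i).
P = [[0.9, 0.1], [0.2, 0.8]]
Q0 = [0.3, 0.7]
Phi0 = posterior(Q0, P)           # maximizes J(Q0, .)
Q1, bound = input_update(Phi0, P) # maximizes J(., Phi0)
Phi1 = posterior(Q1, P)           # maximizes J(Q1, .)
print(J(Q0, Phi0, P), J(Q1, Phi0, P), J(Q1, Phi1, P))
```

The three printed values should form a nondecreasing sequence, and the middle one should agree with `bound`, the value $\log\sum_i r(i)$ from Lemma 2.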

Algorithm

Here $l$ is the iteration index.

1. Set $l = 0$ and choose an initial set of input probabilities $Q_0(i) > 0$, all $i$.

2. Compute
\[
\Phi_l(i/j) = \frac{Q_l(i)\,P(j/i)}{\sum_k Q_l(k)\,P(j/k)}, \quad \text{all } i, j,
\]
\[
r_l(i) = \exp\sum_j P(j/i)\ln\Phi_l(i/j),
\]
\[
Q_{l+1}(i) = \frac{r_l(i)}{\sum_k r_l(k)},
\]
\[
J(Q_{l+1},\Phi_l) = \ln\Big(\sum_i r_l(i)\Big).
\]

3. Set $l = l + 1$ and go to 2.

This algorithm carries out the alternating maximization already explained, but it does not say how to stop the recursion. The question now is how we know when we are close to capacity. To answer it we have to introduce one more lemma.

For any $l$, $J(Q_{l+1},\Phi_l) = \ln\big(\sum_i r_l(i)\big) \le C$. Let us define
\[
c_l(i) \equiv \frac{r_l(i)}{Q_l(i)}.
\]
Then
\[
J(Q_{l+1},\Phi_l) = \ln\Big(\sum_i Q_l(i)\,c_l(i)\Big),
\]
so $J(Q_{l+1},\Phi_l)$ is the logarithm of the average of the $c_l(i)$'s.

Lemma 3
\[
C \le \max_i \ln c_l(i).
\]
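Steps 1 to 3, together with the two quantities $\ln\sum_i r_l(i)$ and $\max_i \ln c_l(i)$, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than a reference implementation: the function name `blahut_arimoto_steps`, the fixed pass count `steps`, and the convention `P[i][j] = P(j/i)` are hypothetical choices, and the code assumes $Q_0(i) > 0$ for all $i$.

```python
import math

def blahut_arimoto_steps(P, Q0, steps):
    """Run `steps` passes of step 2; record (lower, upper) bounds in nats."""
    n, m = len(P), len(P[0])
    Q = list(Q0)
    history = []
    for _ in range(steps):
        # Output probabilities sum_k Q_l(k) P(j/k), the denominator of Phi_l.
        out = [sum(Q[k] * P[k][j] for k in range(n)) for j in range(m)]
        # r_l(i) = exp sum_j P(j/i) ln Phi_l(i/j); terms with P(j/i) = 0 vanish.
        r = [math.exp(sum(P[i][j] * math.log(Q[i] * P[i][j] / out[j])
                          for j in range(m) if P[i][j] > 0))
             for i in range(n)]
        total = sum(r)
        lower = math.log(total)                                # J(Q_{l+1}, Phi_l)
        upper = max(math.log(r[i] / Q[i]) for i in range(n))   # max_i ln c_l(i)
        history.append((lower, upper))
        Q = [x / total for x in r]                             # Q_{l+1}
    return Q, history

# Hypothetical example: binary symmetric channel with crossover 0.1.
P = [[0.9, 0.1], [0.1, 0.9]]
Q, history = blahut_arimoto_steps(P, [0.3, 0.7], 50)
for lower, upper in history[:5]:
    print(lower, upper)
```

For this symmetric example the capacity is known in closed form, $\ln 2 + 0.1\ln 0.1 + 0.9\ln 0.9$ nats, so each printed pair can be compared against it directly.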

Therefore, channel capacity $C$ is bounded from below and above as follows:
\[
\ln\Big(\sum_i Q_l(i)\,c_l(i)\Big) \le C \le \max_i \ln c_l(i).
\]
In order to find $C$ within accuracy $\epsilon$, stop the iteration when
\[
\max_i \ln c_l(i) - \ln\Big(\sum_i Q_l(i)\,c_l(i)\Big) < \epsilon,
\]
or, equivalently,
\[
J(Q_{l+1},\Phi_l) > \max_i \ln c_l(i) - \epsilon.
\]

Insert in the algorithm:

2′. Calculate
\[
T = \max_i \ln\frac{r_l(i)}{Q_l(i)} - J(Q_{l+1},\Phi_l).
\]
If $T \ge \epsilon$, go to 3. If $T < \epsilon$, go to 4.

4. Stop and take $C = J(Q_{l+1},\Phi_l)$, which is within $\epsilon$ of capacity.

It is interesting to note that $\ln c_l(i) = I(i;Y/Q_l)$, the average information that the output ensemble $Y$ gives about the input event $i$ when the input probabilities are $Q_l$. Recall that if
\[
I(i;Y/Q) = \gamma \ \text{ for all } i \text{ s.t. } Q(i) > 0, \qquad I(i;Y/Q) \le \gamma \ \text{ for all } i \text{ s.t. } Q(i) = 0
\]
(the Kuhn-Tucker conditions), then $Q$ achieves capacity: $J(Q) = I(i;Y/Q) = C$ for every $i$ with $Q(i) > 0$, and $C = \gamma$. So, when we are nearing capacity $C$, the information given by the output ensemble $Y$ about each input event of nonzero probability is nearing equality, since

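\[
\sum_i Q_l(i)\ln c_l(i) \le \ln\sum_i Q_l(i)\,c_l(i) \le C \le \max_i \ln c_l(i)
\]
by the concavity (convex ∩ property) of $\ln$. So we have
\[
\sum_i Q_l(i)\,I(i;Y/Q_l) \le C \le \max_i I(i;Y/Q_l),
\]
that is, $C$ lies between the average and the maximum of $I(i;Y/Q_l)$ over the inputs.

We now prove the lemmas.

Proof of Lemma 1:

Let $P(j) = \sum_k Q(k)P(j/k)$ and $P(i/j) = Q(i)P(j/i)/P(j)$. Using $\ln x \le x - 1$,
\[
J(Q,\Phi) - J(Q) = \sum_i\sum_j P(j/i)\,Q(i)\log\frac{\Phi(i/j)}{P(i/j)}
\]
\[
\le \sum_i\sum_j P(j/i)\,Q(i)\Big[\frac{\Phi(i/j)}{P(i/j)} - 1\Big]
\]
\[
= \sum_i\sum_j \big[P(j)\,\Phi(i/j) - P(j/i)\,Q(i)\big] = 1 - 1 = 0,
\]
with equality iff $\Phi(i/j) = P(i/j)$ for all $i$ and $j$.

Proof of Lemma 2: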

\[
J(Q,\Phi) = \sum_i\sum_j P(j/i)\,Q(i)\log\frac{\Phi(i/j)}{Q(i)}
\]
\[
= \sum_i\sum_j P(j/i)\,Q(i)\log\frac{1}{Q(i)} + \sum_i\sum_j P(j/i)\,Q(i)\log\Phi(i/j)
\]
\[
= \sum_i Q(i)\log\frac{1}{Q(i)} + \sum_i Q(i)\sum_j P(j/i)\log\Phi(i/j)
\]
\[
= \sum_i Q(i)\log\frac{\exp\big(\sum_j P(j/i)\log\Phi(i/j)\big)}{Q(i)}
\]
\[
= \sum_i Q(i)\log\frac{r(i)}{Q(i)}, \qquad r(i) = \exp\sum_j P(j/i)\log\Phi(i/j),
\]
\[
= \sum_i Q(i)\log\Big(\sum_k r(k)\Big) + \sum_i Q(i)\log\frac{r(i)/\sum_k r(k)}{Q(i)}
\]
\[
\le \log\Big(\sum_k r(k)\Big).
\]
The last step results from $\ln x \le x - 1$ applied in the second sum, with equality iff $Q(i) = r(i)/\sum_k r(k)$.

Proof of Lemma 3.

Suppose $Q^*$ achieves capacity. Then
\[
C = \sum_i\sum_j P(j/i)\,Q^*(i)\log\frac{P(j/i)}{P^*(j)}, \qquad P^*(j) = \sum_k P(j/k)\,Q^*(k),
\]
\[
C = \sum_i\sum_j P(j/i)\,Q^*(i)\log\Big[\frac{P(j/i)}{P_l(j)}\times\frac{P_l(j)}{P^*(j)}\Big],
\]
where
\[
P_l(j) = \sum_k P(j/k)\,Q_l(k)
\]
are the output probabilities for $Q_l(k)$. Continuing,

\[
C = \sum_i\sum_j P(j/i)\,Q^*(i)\log\frac{P_l(j)}{P^*(j)} + \sum_i\sum_j P(j/i)\,Q^*(i)\log\frac{P(j/i)}{P_l(j)}
\]
\[
= \sum_j P^*(j)\log\frac{P_l(j)}{P^*(j)} + \sum_i Q^*(i)\sum_j P(j/i)\log\frac{P(j/i)}{P_l(j)}.
\]
Using $\ln x \le x - 1$, the first term is $\le 0$. In the second term, we can overbound the inner sum
\[
\sum_j P(j/i)\log\frac{P(j/i)}{P_l(j)}
\]
by its maximum over $i$. Therefore,
\[
C \le \max_i \sum_j P(j/i)\log\frac{P(j/i)}{P_l(j)} = \max_i I(i;Y/Q_l).
\]
Recall
\[
r_l(i) = \exp\sum_j P(j/i)\ln\Phi_l(i/j) = \exp\sum_j P(j/i)\ln\frac{Q_l(i)\,P(j/i)}{P_l(j)},
\]
\[
\ln c_l(i) = \ln r_l(i) - \ln Q_l(i) = \sum_j P(j/i)\ln\frac{Q_l(i)\,P(j/i)}{P_l(j)} - \sum_j P(j/i)\ln Q_l(i)
\]
\[
= \sum_j P(j/i)\ln\frac{P(j/i)}{P_l(j)} = I(i;Y/Q_l).
\]
Therefore, the conclusion of the lemma can be expressed as
\[
C \le \max_i \ln c_l(i),
\]
as given.

Note the condition for equality. We must have $P_l(j) = P^*(j)$, or $Q_l(i) = Q^*(i)$ for nonzero $Q^*(i)$, and $I(i;Y/Q_l)$ must take the same maximum value for all $i$ such that $Q_l(i) > 0$. These conditions are both necessary and sufficient for equality. So we have corroborated the Kuhn-Tucker conditions.
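The identity $\ln c_l(i) = I(i;Y/Q_l)$ gives an equivalent way to write the iteration, since $r_l(i) = Q_l(i)\exp I(i;Y/Q_l)$. The sketch below uses that form together with the stopping quantity $T = \max_i \ln c_l(i) - J(Q_{l+1},\Phi_l)$ and then inspects the Kuhn-Tucker conditions at termination. It is only an illustration: the names `info_per_input` and `capacity`, the uniform starting point, the `max_iter` guard, and the example channels are assumptions not taken from the notes, and `P[i][j]` stands for $P(j/i)$.

```python
import math

def info_per_input(P, Q):
    """I(i; Y/Q) = sum_j P(j/i) ln(P(j/i) / P_l(j)) for each input i, in nats."""
    n, m = len(P), len(P[0])
    out = [sum(Q[k] * P[k][j] for k in range(n)) for j in range(m)]
    return [sum(P[i][j] * math.log(P[i][j] / out[j])
                for j in range(m) if P[i][j] > 0)
            for i in range(n)]

def capacity(P, eps=1e-9, max_iter=10_000):
    """Iterate until T = max_i ln c_l(i) - J(Q_{l+1}, Phi_l) < eps."""
    n = len(P)
    Q = [1.0 / n] * n                     # any Q_0(i) > 0 is allowed
    for _ in range(max_iter):
        info = info_per_input(P, Q)       # ln c_l(i)
        r = [Q[i] * math.exp(info[i]) for i in range(n)]
        total = sum(r)
        lower = math.log(total)           # J(Q_{l+1}, Phi_l), lower bound on C
        T = max(info) - lower             # gap between the two bounds
        if T < eps:
            return lower, Q, info
        Q = [x / total for x in r]        # Q_{l+1}
    raise RuntimeError("tolerance not reached within max_iter")

# Hypothetical binary asymmetric channel.
C, Q, info = capacity([[0.9, 0.1], [0.2, 0.8]])
print(C, Q, info)
```

At termination the entries of `info` for inputs with nonzero probability should be nearly equal, which is the Kuhn-Tucker condition corroborated above. For a binary erasure channel with erasure probability 0.25, written here as `[[0.75, 0.25, 0.0], [0.0, 0.25, 0.75]]`, the returned value can be compared with the known capacity of $0.75\ln 2$ nats.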
