Thomas M. Cover, Information Theory (English textbook): answers to end-of-chapter exercises

2.2 Entropy of functions. Let $X$ be a random variable taking on a finite number of values. What is the (general) inequality relationship between $H(X)$ and $H(Y)$ if
(a) $Y = 2^X$?
(b) $Y = \cos X$?

Solution: Let $y = g(x)$. Then
$$ p(y) = \sum_{x:\,g(x)=y} p(x). $$
Consider any set of $x$'s that map onto a single $y$. For this set,
$$ \sum_{x:\,g(x)=y} p(x)\log p(x) \le \sum_{x:\,g(x)=y} p(x)\log p(y) = p(y)\log p(y), $$
since $\log$ is a monotone increasing function and $p(x) \le \sum_{x:\,g(x)=y} p(x) = p(y)$. Extending this argument to the entire range of $X$ (and $Y$), we obtain
$$ H(X) = -\sum_x p(x)\log p(x) \ge -\sum_y p(y)\log p(y) = H(Y), $$
with equality iff $g$ is one-to-one with probability one.
(a) $Y = 2^X$ is one-to-one, and hence the entropy, which is just a function of the probabilities, does not change, i.e., $H(X) = H(Y)$.
(b) $Y = \cos X$ is not necessarily one-to-one. Hence all that we can say is that $H(X) \ge H(Y)$, with equality if the cosine function is one-to-one on the range of $X$.
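As a quick numerical illustration (not part of the original solution), the sketch below pushes an arbitrary distribution through the one-to-one map $2^x$ and through $\cos x$ and compares the entropies; the support and probabilities are made up for the example.

```python
from collections import defaultdict
from math import log2, cos

def entropy(pmf):
    """Entropy in bits of a pmf given as {value: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def push_forward(pmf, g):
    """Distribution of Y = g(X)."""
    py = defaultdict(float)
    for x, p in pmf.items():
        py[g(x)] += p
    return dict(py)

# Arbitrary distribution; note cos(-1) == cos(1), so cos merges two values here.
px = {-1: 0.25, 0: 0.25, 1: 0.25, 2: 0.25}

H_X = entropy(px)
H_2X = entropy(push_forward(px, lambda x: 2.0 ** x))            # one-to-one
H_cos = entropy(push_forward(px, lambda x: round(cos(x), 9)))   # not one-to-one here

print(H_X, H_2X, H_cos)   # H(X) == H(2^X) = 2.0, H(cos X) = 1.5 < H(X)
```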

2.16 Example of joint entropy. Let $p(x,y)$ be given by

              Y = 0    Y = 1
    X = 0      1/3      1/3
    X = 1       0       1/3

Find
(a) $H(X)$, $H(Y)$.
(b) $H(X|Y)$, $H(Y|X)$.
(c) $H(X,Y)$.
(d) $H(Y) - H(Y|X)$.
(e) $I(X;Y)$.
(f) Draw a Venn diagram for the quantities in (a) through (e).

Solution:
[Fig. 1: Venn diagram relating $H(X)$, $H(Y)$, $H(X|Y)$, $H(Y|X)$, $I(X;Y)$ and $H(X,Y)$; figure not reproduced.]
(a) $H(X) = \frac{2}{3}\log\frac{3}{2} + \frac{1}{3}\log 3 = 0.918$ bits $= H(Y)$.
(b) $H(X|Y) = \frac{1}{3}H(X|Y=0) + \frac{2}{3}H(X|Y=1) = 0.667$ bits $= H(Y|X)$.
(c) $H(X,Y) = 3 \times \frac{1}{3}\log 3 = 1.585$ bits.
(d) $H(Y) - H(Y|X) = 0.918 - 0.667 = 0.251$ bits.
(e) $I(X;Y) = H(Y) - H(Y|X) = 0.251$ bits.
(f) See Fig. 1.
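The numbers in (a)-(e) can be checked mechanically; a minimal Python sketch (the helper names are ad hoc, not from the text):

```python
from math import log2

# Joint distribution p(x, y) from the problem
p = {(0, 0): 1/3, (0, 1): 1/3, (1, 0): 0.0, (1, 1): 1/3}

def H(pmf):
    """Entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(q * log2(q) for q in pmf.values() if q > 0)

px = {x: sum(q for (a, _), q in p.items() if a == x) for x in (0, 1)}
py = {y: sum(q for (_, b), q in p.items() if b == y) for y in (0, 1)}

H_X, H_Y, H_XY = H(px), H(py), H(p)
H_X_given_Y = H_XY - H_Y                 # chain rule
H_Y_given_X = H_XY - H_X
I_XY = H_X + H_Y - H_XY

print(round(H_X, 3), round(H_Y, 3))                   # 0.918 0.918
print(round(H_X_given_Y, 3), round(H_Y_given_X, 3))   # 0.667 0.667
print(round(H_XY, 3))                                 # 1.585
print(round(I_XY, 3))   # 0.252 exactly; 0.251 when computed from the rounded entropies
```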

2.29 Inequalities. Let $X$, $Y$ and $Z$ be joint random variables. Prove the following inequalities and find conditions for equality.
(a) $H(X,Y|Z) \ge H(X|Z)$
(b) $I(X,Y;Z) \ge I(X;Z)$
(c) $H(X,Y,Z) - H(X,Y) \le H(X,Z) - H(X)$
(d) $I(X;Z|Y) \ge I(Z;Y|X) - I(Z;Y) + I(X;Z)$

Solution:
(a) Using the chain rule for conditional entropy,
$$ H(X,Y|Z) = H(X|Z) + H(Y|X,Z) \ge H(X|Z), $$
with equality iff $H(Y|X,Z) = 0$, that is, when $Y$ is a function of $X$ and $Z$.
(b) Using the chain rule for mutual information,
$$ I(X,Y;Z) = I(X;Z) + I(Y;Z|X) \ge I(X;Z), $$
with equality iff $I(Y;Z|X) = 0$, that is, when $Y$ and $Z$ are conditionally independent given $X$.
(c) Using first the chain rule for entropy and then the definition of conditional mutual information,
$$ H(X,Y,Z) - H(X,Y) = H(Z|X,Y) \le H(Z|X) = H(X,Z) - H(X), $$
with equality iff $I(Y;Z|X) = 0$, that is, when $Y$ and $Z$ are conditionally independent given $X$.
(d) Using the chain rule for mutual information twice,
$$ I(X;Z|Y) + I(Z;Y) = I(X,Y;Z) = I(Z;Y|X) + I(X;Z), $$
and therefore $I(X;Z|Y) = I(Z;Y|X) - I(Z;Y) + I(X;Z)$, so this inequality is actually an equality in all cases.
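A numeric sanity check of (a)-(d) on an arbitrary randomly generated joint pmf (purely illustrative; the distribution below is not from the text):

```python
from itertools import product
from math import log2
import random

random.seed(0)

# Random joint pmf over binary X, Y, Z
outcomes = list(product((0, 1), repeat=3))
weights = [random.random() for _ in outcomes]
p = {o: w / sum(weights) for o, w in zip(outcomes, weights)}

def H(idx):
    """Joint entropy in bits of the selected coordinates, e.g. (0,) for X, (0, 1) for (X, Y)."""
    marg = {}
    for o, q in p.items():
        key = tuple(o[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + q
    return -sum(q * log2(q) for q in marg.values() if q > 0)

X, Y, Z = 0, 1, 2
I_XY_Z = H((X, Y)) + H((Z,)) - H((X, Y, Z))
I_X_Z = H((X,)) + H((Z,)) - H((X, Z))
I_XZ_given_Y = H((X, Y)) + H((Y, Z)) - H((X, Y, Z)) - H((Y,))
I_ZY_given_X = H((X, Z)) + H((X, Y)) - H((X, Y, Z)) - H((X,))
I_ZY = H((Z,)) + H((Y,)) - H((Y, Z))

assert H((X, Y, Z)) - H((Z,)) >= H((X, Z)) - H((Z,)) - 1e-12        # (a)
assert I_XY_Z >= I_X_Z - 1e-12                                       # (b)
assert H((X, Y, Z)) - H((X, Y)) <= H((X, Z)) - H((X,)) + 1e-12       # (c)
assert abs(I_XZ_given_Y - (I_ZY_given_X - I_ZY + I_X_Z)) < 1e-12     # (d), an equality
print("all four relations hold")
```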

4.5 Entropy rates of Markov chains.
(a) Find the entropy rate of the two-state Markov chain with transition matrix
$$ P = \begin{pmatrix} 1-p_{01} & p_{01} \\ p_{10} & 1-p_{10} \end{pmatrix}. $$
(b) What values of $p_{01}$, $p_{10}$ maximize the rate of part (a)?
(c) Find the entropy rate of the two-state Markov chain with transition matrix
$$ P = \begin{pmatrix} 1-p & p \\ 1 & 0 \end{pmatrix}. $$
(d) Find the maximum value of the entropy rate of the Markov chain of part (c). We expect that the maximizing value of $p$ should be less than $1/2$, since the 0 state permits more information to be generated than the 1 state.

Solution:
(a) The stationary distribution is easily calculated:
$$ \mu_0 = \frac{p_{10}}{p_{01}+p_{10}}, \qquad \mu_1 = \frac{p_{01}}{p_{01}+p_{10}}. $$
Therefore the entropy rate is
$$ H(X_2|X_1) = \mu_0 H(p_{01}) + \mu_1 H(p_{10}) = \frac{p_{10}H(p_{01}) + p_{01}H(p_{10})}{p_{01}+p_{10}}. $$
(b) The entropy rate is at most 1 bit because the process has only two states. This rate can be achieved if (and only if) $p_{01} = p_{10} = 1/2$, in which case the process is actually i.i.d. with $\Pr(X_i = 0) = \Pr(X_i = 1) = 1/2$.
(c) As a special case of the general two-state Markov chain, the entropy rate is
$$ H(X_2|X_1) = \mu_0 H(p) + \mu_1 H(1) = \frac{H(p)}{p+1}. $$
(d) By straightforward calculus, we find that the maximum value of the entropy rate of part (c) occurs for $p = (3-\sqrt{5})/2 = 0.382$. The maximum value is
$$ \frac{H(p)}{p+1} = \log_2\frac{1+\sqrt{5}}{2} = 0.694 \text{ bits}, $$
the logarithm of the golden ratio.
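A short numerical check of parts (c) and (d): a grid search stands in for the calculus argument, using the entropy-rate formula derived above.

```python
from math import log2, sqrt

def Hb(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def entropy_rate(p01, p10):
    """Entropy rate of the two-state chain of part (a)."""
    mu0 = p10 / (p01 + p10)          # stationary distribution
    mu1 = p01 / (p01 + p10)
    return mu0 * Hb(p01) + mu1 * Hb(p10)

# Part (c): p10 = 1, so the rate is H(p)/(1+p); part (d): maximize over p.
best_rate, best_p = max((entropy_rate(p, 1.0), p)
                        for p in (i / 10000 for i in range(1, 10000)))

print(round(best_p, 3))                    # ~0.382 = (3 - sqrt(5))/2
print(round(best_rate, 3))                 # ~0.694
print(round(log2((1 + sqrt(5)) / 2), 3))   # 0.694, log2 of the golden ratio
```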

5.4 Huffman coding. Consider the random variable
$$ X = \begin{pmatrix} x_1 & x_2 & x_3 & x_4 & x_5 & x_6 & x_7 \\ 0.49 & 0.26 & 0.12 & 0.04 & 0.04 & 0.03 & 0.02 \end{pmatrix}. $$
(a) Find a binary Huffman code for $X$.
(b) Find the expected code length for this encoding.
(c) Find a ternary Huffman code for $X$.

Solution:
(a) The Huffman tree for this distribution is not reproduced here; one resulting binary Huffman code is
$$ x_1 \to 0,\; x_2 \to 10,\; x_3 \to 110,\; x_4 \to 11100,\; x_5 \to 11101,\; x_6 \to 11110,\; x_7 \to 11111. $$
(b) The expected length of the codewords for the binary Huffman code is
$$ E(l) = \sum_i p_i l_i = 0.49(1) + 0.26(2) + 0.12(3) + (0.04+0.04+0.03+0.02)(5) = 2.02 \text{ bits}. $$
(c) The ternary Huffman tree (also not reproduced) merges the three least likely symbols at each step; one resulting ternary code is
$$ x_1 \to 0,\; x_2 \to 1,\; x_3 \to 20,\; x_4 \to 21,\; x_5 \to 220,\; x_6 \to 221,\; x_7 \to 222, $$
with expected length 1.34 ternary symbols.
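A sketch of the binary Huffman construction for parts (a) and (b), using a heap in place of the missing tree figure; the helper name and tie-breaking are my own choices.

```python
import heapq
from math import isclose

probs = {'x1': 0.49, 'x2': 0.26, 'x3': 0.12, 'x4': 0.04,
         'x5': 0.04, 'x6': 0.03, 'x7': 0.02}

def binary_huffman(pmf):
    """Return {symbol: codeword} built by repeatedly merging the two least likely nodes."""
    # Heap entries: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(pmf.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: '0' + w for s, w in c0.items()}
        merged.update({s: '1' + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

code = binary_huffman(probs)
L = sum(probs[s] * len(w) for s, w in code.items())
print(code)
print(L)                  # expected length 2.02 bits
assert isclose(L, 2.02)
```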

5.9 Optimal code lengths that require one bit above entropy. The source coding theorem shows that the optimal code for a random variable $X$ has an expected length less than $H(X)+1$. Give an example of a random variable for which the expected length of the optimal code is close to $H(X)+1$, i.e., for any $\epsilon > 0$, construct a distribution for which the optimal code has $L > H(X) + 1 - \epsilon$.

Solution: There is a trivial example that requires almost 1 bit above its entropy. Let $X$ be a binary random variable with probability of $X = 1$ close to 1. Then the entropy of $X$ is close to 0, but the length of its optimal code is 1 bit, which is almost 1 bit above its entropy.
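A quick numeric check of the claim, with an arbitrary choice of $\Pr\{X=1\}$:

```python
from math import log2

p = 0.9999                           # Pr{X = 1}, chosen close to 1
H = -p * log2(p) - (1 - p) * log2(1 - p)
L = 1                                # any code for a binary source uses at least 1 bit/symbol
print(round(H, 4))                   # ~0.0015 bits
print(round(L - H, 4))               # ~0.9985: almost one full bit above the entropy
```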

5.25 Shannon code. Consider the following method for generating a code for a random variable $X$ which takes on $m$ values $\{1, 2, \ldots, m\}$ with probabilities $p_1, p_2, \ldots, p_m$. Assume that the probabilities are ordered so that $p_1 \ge p_2 \ge \cdots \ge p_m$. Define
$$ F_i = \sum_{k=1}^{i-1} p_k, $$
the sum of the probabilities of all symbols less than $i$. Then the codeword for $i$ is the number $F_i \in [0,1)$ rounded off to $l_i$ bits, where $l_i = \lceil \log\frac{1}{p_i} \rceil$.
(a) Show that the code constructed by this process is prefix-free and the average length satisfies $H(X) \le L < H(X) + 1$.
(b) Construct the code for the probability distribution (0.5, 0.25, 0.125, 0.125).

Solution:
(a) Since $l_i = \lceil \log\frac{1}{p_i} \rceil$, we have
$$ \log\frac{1}{p_i} \le l_i < \log\frac{1}{p_i} + 1, $$
which implies that
$$ H(X) \le L = \sum_i p_i l_i < H(X) + 1. $$
By the choice of $l_i$, we have $2^{-l_i} \le p_i < 2^{-(l_i-1)}$. Thus $F_j$, $j > i$, differs from $F_i$ by at least $2^{-l_i}$, and will therefore differ from $F_i$ in at least one place in the first $l_i$ bits of the binary expansion of $F_i$. Thus the codeword for $j$, $j > i$, which has length $l_j \ge l_i$, differs from the codeword for $i$ at least once in the first $l_i$ places. Thus no codeword is a prefix of any other codeword.
(b) We build the following table:

    Symbol   Probability   F_i (decimal)   F_i (binary)   l_i   Codeword
    1        0.5           0.0             0.0            1     0
    2        0.25          0.5             0.10           2     10
    3        0.125         0.75            0.110          3     110
    4        0.125         0.875           0.111          3     111
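A direct implementation of the construction in the problem statement, reproducing the table in (b):

```python
from math import ceil, log2

def shannon_code(probs):
    """Codewords: F_i truncated to l_i = ceil(log2(1/p_i)) bits.
    probs must be listed in decreasing order."""
    code, F = [], 0.0
    for p in probs:
        l = ceil(log2(1 / p))
        bits, frac = [], F
        for _ in range(max(l, 1)):        # first l bits of the binary expansion of F_i
            frac *= 2
            bit = int(frac)
            bits.append(str(bit))
            frac -= bit
        code.append(''.join(bits))
        F += p
    return code

print(shannon_code([0.5, 0.25, 0.125, 0.125]))   # ['0', '10', '110', '111']
```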

3.5 AEP. Let $X_1, X_2, \ldots$ be independent, identically distributed random variables drawn according to the probability mass function $p(x)$, $x \in \{1, 2, \ldots, m\}$. Thus $p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^n p(x_i)$. We know that $-\frac{1}{n}\log p(X_1, X_2, \ldots, X_n) \to H(X)$ in probability. Let $q(x_1, x_2, \ldots, x_n) = \prod_{i=1}^n q(x_i)$, where $q$ is another probability mass function on $\{1, 2, \ldots, m\}$.
(a) Evaluate $\lim\, -\frac{1}{n}\log q(X_1, X_2, \ldots, X_n)$, where $X_1, X_2, \ldots$ are i.i.d. $\sim p(x)$.

Solution: Since $X_1, X_2, \ldots, X_n$ are i.i.d., so are $q(X_1), q(X_2), \ldots, q(X_n)$, and hence we can apply the strong law of large numbers to obtain
$$ \lim\, -\frac{1}{n}\log q(X_1, X_2, \ldots, X_n) = \lim\, -\frac{1}{n}\sum_{i=1}^n \log q(X_i) = -E[\log q(X)] \quad \text{w.p. } 1 $$
$$ = -\sum_x p(x)\log q(x) = \sum_x p(x)\log\frac{p(x)}{q(x)} - \sum_x p(x)\log p(x) = D(p\|q) + H(p). $$
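A Monte Carlo check of the limit, with two arbitrary pmfs standing in for $p$ and $q$:

```python
import random
from math import log2

random.seed(1)

p = [0.5, 0.3, 0.2]        # pmf generating the data (arbitrary example)
q = [0.2, 0.5, 0.3]        # mismatched pmf used to score it

H_p = -sum(pi * log2(pi) for pi in p)
D_pq = sum(pi * log2(pi / qi) for pi, qi in zip(p, q))

n = 200_000
xs = random.choices(range(3), weights=p, k=n)
empirical = -sum(log2(q[x]) for x in xs) / n   # -(1/n) log q(X_1, ..., X_n)

print(round(empirical, 3), round(D_pq + H_p, 3))   # both close to 1.808
```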

8.1 Preprocessing the output. One is given a communication channel with transition probabilities $p(y|x)$ and channel capacity $C = \max_{p(x)} I(X;Y)$. A helpful statistician preprocesses the output by forming $\tilde{Y} = g(Y)$. He claims that this will strictly improve the capacity.
(a) Show that he is wrong.
(b) Under what conditions does he not strictly decrease the capacity?

Solution:
(a) The statistician calculates $\tilde{Y} = g(Y)$. Since $X \to Y \to \tilde{Y}$ forms a Markov chain, we can apply the data processing inequality. Hence for every distribution on $x$,
$$ I(X;\tilde{Y}) \le I(X;Y). $$
Let $\tilde{p}(x)$ be the distribution on $x$ that maximizes $I(X;\tilde{Y})$. Then
$$ \tilde{C} = \max_{p(x)} I(X;\tilde{Y}) = I(X;\tilde{Y})\big|_{p(x)=\tilde{p}(x)} \le I(X;Y)\big|_{p(x)=\tilde{p}(x)} \le \max_{p(x)} I(X;Y) = C. $$
Thus the statistician is wrong, and processing the output does not increase capacity.
(b) We have equality in the above sequence of inequalities only if we have equality in the data processing inequality, i.e., for the distribution $\tilde{p}(x)$ that maximizes $I(X;\tilde{Y})$, $X \to \tilde{Y} \to Y$ also forms a Markov chain.

8.3 An additive noise channel. Find the channel capacity of the following discrete memoryless channel: $Y = X + Z$, where $\Pr\{Z = 0\} = \Pr\{Z = a\} = \frac{1}{2}$. The alphabet for $X$ is $\mathcal{X} = \{0, 1\}$. Assume that $Z$ is independent of $X$. Observe that the channel capacity depends on the value of $a$.

Solution: A sum channel: $Y = X + Z$, $X \in \{0,1\}$, $Z \in \{0,a\}$. We have to distinguish various cases depending on the value of $a$.

$a = 0$: In this case, $Y = X$ and $\max I(X;Y) = 1$. Hence the capacity is 1 bit per transmission.

$a \ne 0, \pm 1$: In this case, $Y$ has four possible values $0, 1, a, 1+a$. Knowing $Y$, we know the $X$ which was sent, and hence $H(X|Y) = 0$. Hence the capacity is also 1 bit per transmission.

$a = 1$: In this case, $Y$ has three possible output values, $0, 1, 2$, and the channel is identical to the binary erasure channel with erasure probability $f = 1/2$. The capacity of this channel is $1 - f = 1/2$ bit per transmission.

$a = -1$: This is similar to the case $a = 1$, and the capacity is also $1/2$ bit per transmission.
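A brute-force check of the cases: maximize $I(X;Y)$ over a grid of input distributions (the grid resolution and the example values of $a$ are arbitrary):

```python
from math import log2

def mutual_information(px1, a):
    """I(X;Y) for Y = X + Z, Pr{X=1} = px1, Z uniform on {0, a} independent of X."""
    px = {0: 1 - px1, 1: px1}
    pz = {}
    for z in (0, a):
        pz[z] = pz.get(z, 0.0) + 0.5
    py, pxy = {}, {}
    for x, pxv in px.items():
        for z, pzv in pz.items():
            y = x + z
            py[y] = py.get(y, 0.0) + pxv * pzv
            pxy[(x, y)] = pxy.get((x, y), 0.0) + pxv * pzv
    H = lambda d: -sum(v * log2(v) for v in d.values() if v > 0)
    return H(px) + H(py) - H(pxy)

def capacity(a, grid=1000):
    return max(mutual_information(i / grid, a) for i in range(1, grid))

for a in (0, 0.5, 1, -1):
    print(a, round(capacity(a), 3))   # 1.0, 1.0, 0.5, 0.5
```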

8.5 Channel capacity. Consider the discrete memoryless channel $Y = X + Z \pmod{11}$, where $Z$ takes the values 1, 2, 3 each with probability $1/3$, and $X \in \{0, 1, \ldots, 10\}$. Assume that $Z$ is independent of $X$.
(a) Find the capacity.
(b) What is the maximizing $p^*(x)$?

Solution: The capacity of the channel is $C = \max_{p(x)} I(X;Y)$, and
$$ I(X;Y) = H(Y) - H(Y|X) = H(Y) - H(Z) \le \log 11 - \log 3, $$
which is attained when $Y$ has a uniform distribution, which occurs when $X$ has a uniform distribution.
(a) The capacity of the channel is $C = \log\frac{11}{3}$ bits per transmission.
(b) The capacity is achieved by a uniform distribution on the inputs: $p(X = i) = \frac{1}{11}$ for $i = 0, 1, \ldots, 10$.
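A quick check that the uniform input attains $\log\frac{11}{3}$ bits:

```python
from math import log2

mod = 11
pz = {1: 1/3, 2: 1/3, 3: 1/3}
px = {i: 1/mod for i in range(mod)}      # uniform input, claimed optimal

py = {}
for x, pxv in px.items():
    for z, pzv in pz.items():
        y = (x + z) % mod
        py[y] = py.get(y, 0.0) + pxv * pzv

H = lambda d: -sum(v * log2(v) for v in d.values() if v > 0)
I = H(py) - H(pz)                        # I(X;Y) = H(Y) - H(Y|X) = H(Y) - H(Z)

print(round(I, 4), round(log2(11 / 3), 4))   # both 1.8745
```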

8.12 Time-varying channels. Consider a time-varying discrete memoryless channel. Let $Y_1, Y_2, \ldots, Y_n$ be conditionally independent given $X_1, X_2, \ldots, X_n$, with conditional distribution given by $p(y|x) = \prod_{i=1}^n p_i(y_i|x_i)$, where the channel at time $i$ is a binary symmetric channel with crossover probability $p_i$. Let $X = (X_1, X_2, \ldots, X_n)$, $Y = (Y_1, Y_2, \ldots, Y_n)$. Find $\max_{p(x)} I(X;Y)$.

Solution:
$$ I(X;Y) = H(Y) - H(Y|X) = H(Y) - \sum_{i=1}^n H(Y_i|Y_1, \ldots, Y_{i-1}, X) = H(Y) - \sum_{i=1}^n H(Y_i|X_i) $$
$$ \le \sum_{i=1}^n H(Y_i) - \sum_{i=1}^n H(Y_i|X_i) = \sum_{i=1}^n I(X_i;Y_i) \le \sum_{i=1}^n \big(1 - H(p_i)\big), $$
with equality if $X_1, X_2, \ldots, X_n$ are chosen i.i.d. uniform on $\{0,1\}$. Hence
$$ \max_{p(x)} I(X;Y) = \sum_{i=1}^n \big(1 - H(p_i)\big). $$
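Assuming, as above, that the $i$-th use is a binary symmetric channel with crossover probability $p_i$, the maximum is a sum of per-use capacities; the crossover values below are arbitrary examples:

```python
from math import log2

def Hb(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

crossovers = [0.05, 0.1, 0.25, 0.5]               # example p_1, ..., p_n
capacity = sum(1 - Hb(p) for p in crossovers)     # max I(X;Y) = sum_i (1 - H(p_i))
print(round(capacity, 3))                         # ~1.433; the p = 0.5 use contributes 0
```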

10.2 A channel with two independent looks at $Y$. Let $Y_1$ and $Y_2$ be conditionally independent and conditionally identically distributed given $X$.
(a) Show that $I(X;Y_1,Y_2) = 2I(X;Y_1) - I(Y_1;Y_2)$.
(b) Conclude that the capacity of the channel $X \to (Y_1,Y_2)$ is less than twice the capacity of the channel $X \to Y_1$.

Solution:
(a)
$$ I(X;Y_1,Y_2) = H(Y_1,Y_2) - H(Y_1,Y_2|X) $$
$$ = H(Y_1) + H(Y_2) - I(Y_1;Y_2) - H(Y_1|X) - H(Y_2|X) $$
(since $Y_1$ and $Y_2$ are conditionally independent given $X$)
$$ = I(X;Y_1) + I(X;Y_2) - I(Y_1;Y_2) = 2I(X;Y_1) - I(Y_1;Y_2), $$
where the last step uses the fact that $Y_1$ and $Y_2$ are conditionally identically distributed given $X$.
(b) The capacity of the single-look channel $X \to Y_1$ is
$$ C_1 = \max_{p(x)} I(X;Y_1). $$
The capacity of the channel $X \to (Y_1,Y_2)$ is
$$ C_2 = \max_{p(x)} I(X;Y_1,Y_2) = \max_{p(x)} \big( 2I(X;Y_1) - I(Y_1;Y_2) \big) \le \max_{p(x)} 2I(X;Y_1) = 2C_1. $$
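A numeric verification of identity (a) for one concrete choice: $X$ uniform on $\{0,1\}$ observed twice, conditionally independently, through the same binary symmetric channel (the channel and its parameters are my own example, not from the problem):

```python
from itertools import product
from math import log2

eps, px1 = 0.1, 0.5          # BSC crossover probability and Pr{X = 1}
p = {}
for x, y1, y2 in product((0, 1), repeat=3):
    pt = px1 if x else 1 - px1
    for y in (y1, y2):       # Y1, Y2 conditionally independent given X
        pt *= eps if y != x else 1 - eps
    p[(x, y1, y2)] = pt

def H(idx):
    marg = {}
    for o, q in p.items():
        k = tuple(o[i] for i in idx)
        marg[k] = marg.get(k, 0.0) + q
    return -sum(q * log2(q) for q in marg.values() if q > 0)

I_X_Y1Y2 = H((0,)) + H((1, 2)) - H((0, 1, 2))
I_X_Y1 = H((0,)) + H((1,)) - H((0, 1))
I_Y1_Y2 = H((1,)) + H((2,)) - H((1, 2))

print(round(I_X_Y1Y2, 6), round(2 * I_X_Y1 - I_Y1_Y2, 6))   # equal, verifying (a)
```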

10.3 The two-look Gaussian channel. Consider the ordinary Shannon Gaussian channel with two correlated looks at $X$, i.e., $Y = (Y_1, Y_2)$, where
$$ Y_1 = X + Z_1, \qquad Y_2 = X + Z_2, $$
with a power constraint $P$ on $X$, and $(Z_1, Z_2) \sim \mathcal{N}(0, K)$, where
$$ K = \begin{pmatrix} N & \rho N \\ \rho N & N \end{pmatrix}. $$
Find the capacity $C$ for
(a) $\rho = 1$
(b) $\rho = 0$
(c) $\rho = -1$

Solution: It is clear that the input distribution that maximizes the capacity is $X \sim \mathcal{N}(0, P)$. Evaluating the mutual information for this distribution,
$$ C = \max I(X;Y_1,Y_2) = h(Y_1,Y_2) - h(Y_1,Y_2|X) = h(Y_1,Y_2) - h(Z_1,Z_2|X) = h(Y_1,Y_2) - h(Z_1,Z_2). $$
Now since $(Z_1,Z_2) \sim \mathcal{N}(0,K)$, we have
$$ h(Z_1,Z_2) = \frac{1}{2}\log\big((2\pi e)^2 |K|\big) = \frac{1}{2}\log\big((2\pi e)^2 N^2(1-\rho^2)\big). $$
Since $Y_1 = X + Z_1$ and $Y_2 = X + Z_2$, we have
$$ (Y_1,Y_2) \sim \mathcal{N}(0, K_Y), \qquad K_Y = \begin{pmatrix} P+N & P+\rho N \\ P+\rho N & P+N \end{pmatrix}, $$
and
$$ h(Y_1,Y_2) = \frac{1}{2}\log\big((2\pi e)^2 |K_Y|\big) = \frac{1}{2}\log\Big((2\pi e)^2 \big(N^2(1-\rho^2) + 2PN(1-\rho)\big)\Big). $$
Hence
$$ C = h(Y_1,Y_2) - h(Z_1,Z_2) = \frac{1}{2}\log\left(1 + \frac{2P}{N(1+\rho)}\right). $$
(a) $\rho = 1$. In this case, $C = \frac{1}{2}\log\left(1 + \frac{P}{N}\right)$, which is the capacity of a single-look channel.
(b) $\rho = 0$. In this case, $C = \frac{1}{2}\log\left(1 + \frac{2P}{N}\right)$, which corresponds to using twice the power in a single look. The capacity is the same as the capacity of the channel $X \to \frac{Y_1+Y_2}{2}$.
(c) $\rho = -1$. In this case, $C = \infty$, which is not surprising, since if we add $Y_1$ and $Y_2$ we can recover $X$ exactly.
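The closed-form capacity can be evaluated directly; $P$ and $N$ below are arbitrary example values:

```python
from math import log2, inf

def two_look_capacity(P, N, rho):
    """C = 1/2 log2(1 + 2P / (N (1 + rho))) for the two-look Gaussian channel."""
    if rho == -1:
        return inf
    return 0.5 * log2(1 + 2 * P / (N * (1 + rho)))

P, N = 10.0, 1.0
print(two_look_capacity(P, N, 1.0), 0.5 * log2(1 + P / N))      # rho = 1: single-look capacity
print(two_look_capacity(P, N, 0.0), 0.5 * log2(1 + 2 * P / N))  # rho = 0: twice the power, one look
print(two_look_capacity(P, N, -1.0))                            # rho = -1: infinite capacity
```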

10.4 Parallel channels and waterfilling. Consider a pair of parallel Gaussian channels, i.e.,
$$ \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} + \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}, \qquad \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} \sim \mathcal{N}\left(0, \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}\right), $$
and there is a power constraint $E(X_1^2 + X_2^2) \le 2P$. Assume that $\sigma_1^2 > \sigma_2^2$. At what power does the channel stop behaving like a single channel with noise variance $\sigma_2^2$, and begin behaving like a pair of channels?

Solution: We will put all the signal power into the channel with less noise until the total power of noise plus signal in that channel equals the noise power in the other channel. After that, we will split any additional power evenly between the two channels. Thus the combined channel begins to behave like a pair of parallel channels when the signal power is equal to the difference of the two noise powers, i.e., when $2P = \sigma_1^2 - \sigma_2^2$.
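A small water-filling sketch for two channels, showing the threshold $2P = \sigma_1^2 - \sigma_2^2$ at which both channels start receiving power (the noise powers below are arbitrary):

```python
def waterfill(total_power, n1, n2):
    """Water-filling split of total_power (= 2P) over noise powers n1 > n2."""
    if total_power <= n1 - n2:
        return 0.0, total_power                  # single-channel regime: all power on channel 2
    extra = total_power - (n1 - n2)
    return extra / 2, (n1 - n2) + extra / 2      # both channels filled to the same water level

n1, n2 = 4.0, 1.0                                # sigma_1^2 > sigma_2^2
for total in (1.0, 3.0, 5.0):
    p1, p2 = waterfill(total, n1, n2)
    print(total, (p1, p2), (p1 + n1, p2 + n2))
# Up to total power n1 - n2 = 3, channel 1 gets nothing; beyond that the water
# levels p1 + n1 and p2 + n2 are equal, i.e., a true pair of parallel channels.
```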
