ENGG5301 Information Theory 2025 Midterm Exam P3: Causal Encoding

During the exam I went off in the wrong direction on part (1), and seeing GPT-5's answer afterwards was infuriating. Failing to spot the inequality meant losing (1), (2), and (3) in a row, which left me with no attempt at (4) and (5) either.

My inequality skills, rotten since middle school, struck again. Then again, I never had much appetite for grinding the textbook or exercises in this course, so the result matched expectations.

3. (19 pts) Causal Encoding

Consider an i.i.d. sequence \(X_1,...,X_n\) (\(n\ge2\)) where each \(X_i\) follows a common distribution \(p_X\) over a finite set \(\mathcal{X}\). An encoder encodes it into a sequence \(M_1,...,M_n\) (not necessarily i.i.d., \(M_i\) can take values over any finite set \(\mathcal{M}\)), by looking at \(X_1,...,X_n\) and producing \(M_1,...,M_n\) one by one.

We say that \(M_1,...,M_n\) is a causal encoding of \(X_1,...,X_n\) if these two conditions are satisfied:
i) \(M_i\) is a function of \((X_1,...,X_i)\) for each \(i=1,...,n\) (i.e., the encoder does not look into the future when outputting \(M_i)\), and
ii) \(X_i\) is a function of \((M_1,...,M_i)\) for each \(i=1,...,n\) (i.e., a decoder can recover \(X_i\) using \(M_1,...,M_i\) without looking into the future).

Moreover, we say that \(M_1,...,M_n\) is an optimal causal encoding of \(X_1,...,X_n\) if it is a causal encoding of \(X_1,...,X_n,\) and the total entropy \(\sum_{i=1}^{n}H(M_i)\) is the smallest possible among causal encodings.


(a) (4 pts) Total Entropy of Optimal Causal Encoding

What is the total entropy \(\sum_{i=1}^{n}H(M_i)\) of an optimal causal encoding? Express your answer in terms of \(p_X.\)

Ans: Since each \(X_i\) is a function of \((M_1,...,M_i)\) by condition (ii), the whole sequence \((X_1,...,X_n)\) is a function of \((M_1,...,M_n)\). Hence

\[\sum_{i=1}^{n}H(M_i) \ge H(M_1,...,M_n) \ge H(X_1,...,X_n) = nH(p_X). \]

where the first inequality is the subadditivity of entropy. Equality holds throughout when \(M_i=X_i\) for each \(i\). Hence, the total entropy \(\sum_{i=1}^{n}H(M_i)\) of an optimal causal encoding is \(nH(p_X)\).
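
As a sanity check, here is a minimal enumeration sketch in Python (the distribution \(p_X\), the helper names, and the second encoder are illustrative assumptions, not part of the exam). It confirms that the identity encoding meets the bound \(nH(p_X)\) exactly, while another causal encoding, a running sum mod \(|\mathcal{X}|\) (also causally decodable via \(X_i=(M_i-M_{i-1})\bmod|\mathcal{X}|\)), strictly exceeds it for a non-uniform source.

```python
import itertools
import math
from collections import Counter

def entropy(pmf):
    """Shannon entropy in bits of a probability mass function."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

p_X = {0: 0.5, 1: 0.25, 2: 0.25}   # hypothetical example distribution
n, m = 3, len(p_X)

def total_marginal_entropy(encode):
    """Compute sum_i H(M_i) for a deterministic causal encoder M_i = encode(x_1..x_i)."""
    dists = [Counter() for _ in range(n)]
    for xs in itertools.product(p_X, repeat=n):    # all |X|^n sequences
        pr = math.prod(p_X[x] for x in xs)         # i.i.d. product probability
        for i in range(n):
            dists[i][encode(xs[:i + 1])] += pr
    return sum(entropy(d.values()) for d in dists)

identity = lambda prefix: prefix[-1]        # M_i = X_i (optimal)
running  = lambda prefix: sum(prefix) % m   # M_i = (X_1+...+X_i) mod m (causal, suboptimal)

print(n * entropy(p_X.values()))            # lower bound n*H(p_X) = 4.5
print(total_marginal_entropy(identity))     # 4.5, meets the bound
print(total_marginal_entropy(running))      # ≈ 4.66, strictly above the bound
```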


(b) (4 pts) Mutual Independence

(True or False) Is it true that for every choice of \(n, p_X\) and optimal causal encoding \(M_1,...,M_n,\) the random variables \(M_1,...,M_n\) must be mutually independent? Prove your assertion. (Hint: You may use the fact that \(M_1,...,M_n\) are mutually independent if \(H(M_1,...,M_n)=\sum_{i=1}^{n}H(M_i).)\)

Ans: True. If \(M_1,...,M_n\) is an optimal causal encoding, then by part (a), \(\sum_{i=1}^{n}H(M_i)=nH(p_X)\). The chain of inequalities from part (a) is therefore tight, so equality holds throughout; in particular,

\[H(M_1,...,M_n)=\sum_{i=1}^{n}H(M_i). \]

This implies that \(M_1,...,M_n\) are mutually independent.


(c) (4 pts) Conditional Entropy \(H(M_i|X_i)\)

(True or False) Is it true that for every choice of \(n, p_X\) and optimal causal encoding \(M_1,...,M_n\) we must have \(H(M_i|X_i)=0\) for all \(i=1,...,n?\) Prove your assertion.

Ans: False. Consider \(p_X\) being \(\mathrm{Bern}(1/2)\) and \(M_i=X_1\oplus\cdots\oplus X_i\). Since \(X_1=M_1\) and \(X_i=M_i\oplus M_{i-1}\) for \(i\ge2,\) this is a causal encoding. Since \(M_i\sim \mathrm{Bern}(1/2)\), we have \(\sum_{i=1}^{n}H(M_i)=n=nH(p_X),\) and hence this is an optimal causal encoding. Nevertheless, for \(i\ge2\), given \(X_i\) the value \(M_i=(X_1\oplus\cdots\oplus X_{i-1})\oplus X_i\) is still uniform, because \(X_1\oplus\cdots\oplus X_{i-1}\sim \mathrm{Bern}(1/2)\) is independent of \(X_i\); hence \(H(M_i|X_i)=1\) for all \(i\ge2\).
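
For small \(n\) the counterexample can be verified exhaustively. The sketch below (plain Python; the variable names are mine) enumerates all \(2^n\) equally likely sequences under the XOR encoding and checks \(H(M_i)\), \(H(M_i|X_i)\), and, as a bonus, the mutual independence claimed in part (b).

```python
import itertools
import math
from collections import Counter

def entropy(counts):
    """Entropy in bits of an empirical distribution given by counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

n = 4
rows = []
for xs in itertools.product([0, 1], repeat=n):    # all 2^n sequences, equally likely
    ms, acc = [], 0
    for x in xs:
        acc ^= x                                   # M_i = X_1 xor ... xor X_i
        ms.append(acc)
    rows.append((xs, tuple(ms)))

for i in range(n):
    h_m  = entropy(Counter(m[i] for _, m in rows).values())          # H(M_i)
    h_mx = entropy(Counter((m[i], x[i]) for x, m in rows).values())  # H(M_i, X_i)
    h_x  = entropy(Counter(x[i] for x, _ in rows).values())          # H(X_i)
    print(f"i={i+1}: H(M_i)={h_m:.1f}, H(M_i|X_i)={h_mx - h_x:.1f}")
    # prints H(M_i|X_i) = 0.0 for i=1 and 1.0 for i >= 2

h_joint = entropy(Counter(m for _, m in rows).values())              # H(M_1,...,M_n)
print(abs(h_joint - n) < 1e-9)  # True: joint entropy = sum of marginals -> independent
```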


(d) (3 pts) Expected Length \(\le H(p_X)+1\) (Prefix-Free)

Suppose now we require each \(M_i\) to be a prefix-free bit sequence, that is, we require \(M_i\in\mathcal{M}\) where \(\mathcal{M}\) is a finite subset of \(\{0,1\}^{*}\) that satisfies the prefix-free property (i.e., for every \(m_1,m_2\in\mathcal{M}\), if \(m_1\ne m_2\), then \(m_1\) is not a prefix of \(m_2\)). Is it true that for every \(p_X\) and \(n\), we can find a (not necessarily optimal) causal encoding \(M_1,...,M_n\) such that

\[\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n}L(M_i)\right]\le H(p_X)+1, \]

where \(L(M_i)\) is the length of \(M_i\)? Prove your assertion.

Ans: True. Let \(f:\mathcal{X}\rightarrow\{0,1\}^{*}\) be a Huffman code for \(p_X\) and take \(M_i=f(X_i)\). This is a causal encoding: \(M_i\) is a function of \(X_i\) alone, and \(X_i\) is recoverable from \(M_i\) since \(f\) is injective. By the standard bound on Huffman codes, \(\mathbb{E}[L(M_i)] < H(p_X)+1\). Hence, \(\mathbb{E}[(1/n)\sum_{i=1}^{n}L(M_i)]\le H(p_X)+1.\)
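
For concreteness, here is a minimal Huffman sketch in Python (the example distribution and the helper name `huffman_code` are hypothetical, not part of the exam), checking \(\mathbb{E}[L]<H(p_X)+1\):

```python
import heapq
import itertools
import math

def huffman_code(pmf):
    """Build a binary Huffman code; pmf maps symbols to probabilities."""
    tie = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in pmf.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # two least probable groups
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

p_X = {"a": 0.5, "b": 0.3, "c": 0.2}   # example distribution
code = huffman_code(p_X)
H = -sum(p * math.log2(p) for p in p_X.values())
EL = sum(p * len(code[s]) for s, p in p_X.items())
print(code)                  # e.g. {'a': '0', 'c': '10', 'b': '11'}
print(H, EL, EL < H + 1)     # H ≈ 1.485, E[L] = 1.5, True
```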


(e) (4 pts) Expected Length \(\le H(p_X)+\epsilon\) (Prefix-Free)

(True or False. Warning: this part is difficult.) Suppose now we require each \(M_i\) to be a prefix-free bit sequence as in part (d). Is it true that for every \(p_X\) and \(\epsilon>0\) we can find \(n\) and a (not necessarily optimal) causal encoding \(M_1,...,M_n\) such that

\[\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n}L(M_i)\right]\le H(p_X)+\epsilon, \]

where \(L(M_i)\) is the length of \(M_i\)? Prove your assertion.

Ans: False. We will argue that no causal encoding can attain a smaller \(\mathbb{E}[(1/n)\sum_{i=1}^{n}L(M_i)]\) than the per-symbol Huffman strategy \(M_i=f(X_i)\) from part (d).

Assume to the contrary that there is another causal encoding \(\tilde{M}_1,...,\tilde{M}_n\) with

\[\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n}L(\tilde{M}_i)\right] < \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n}L(M_i)\right]. \]

This implies that there exists \(k\) such that \(\mathbb{E}[L(\tilde{M}_k)] < \mathbb{E}[L(M_k)]\), since otherwise the average on the left could not be smaller. By the tower property,

\[\mathbb{E}[L(M_k)] > \mathbb{E}[L(\tilde{M}_k)] = \mathbb{E}[\mathbb{E}[L(\tilde{M}_k)|X_1,...,X_{k-1}]], \]

and hence there exists \(x_1,...,x_{k-1}\) such that

\[\mathbb{E}[L(\tilde{M}_k)|X_1=x_1,...,X_{k-1}=x_{k-1}] < \mathbb{E}[L(M_k)]. \]

Since \(\tilde{M}_1,...,\tilde{M}_n\) is a causal encoding, we can write \(\tilde{M}_k=\phi(X_1,...,X_k)\) and \(X_k=\psi(\tilde{M}_1,...,\tilde{M}_k)\) for some functions \(\phi, \psi\).

Condition on the event \(X_1=x_1,...,X_{k-1}=x_{k-1}\). Since \(X_1,...,X_n\) are i.i.d., the conditional distribution of \(X_k\) given this event is still \(p_X\). Conditional on this event, \(\tilde{M}_k=\phi(x_1,...,x_{k-1},X_k)\) and \(X_k=\psi(\tilde{m}_1,...,\tilde{m}_{k-1},\tilde{M}_k)\), where \(\tilde{m}_1,...,\tilde{m}_{k-1}\) are fixed values (they depend only on \(x_1,...,x_{k-1}\)) by the definition of causal encoding.

It follows that the map \(x_k\mapsto\phi(x_1,...,x_{k-1},x_k)\) is injective, and hence prefix-free, since it takes values in a set \(\mathcal{M}\) satisfying the prefix-free property. Thus \(x_k\mapsto\phi(x_1,...,x_{k-1},x_k)\) is a prefix-free code for the distribution \(p_X\), and by the optimality of the Huffman code its conditional expected length is at least \(\mathbb{E}[L(M_k)]\), contradicting the strict inequality above.

Therefore, the strategy of part (d) achieves the smallest \(\mathbb{E}[(1/n)\sum_{i=1}^{n}L(M_i)]\).

To conclude that the answer is False, take any \(p_X\) for which the Huffman expected length \(\mathbb{E}[L(M_i)]\) is strictly greater than \(H(p_X)\), for example \(p_X=\mathrm{Bern}(1/3)\), where \(\mathbb{E}[L(M_i)]=1>H(1/3)\approx0.918\). Then every causal encoding has \(\mathbb{E}[(1/n)\sum_{i=1}^{n}L(\tilde{M}_i)]\ge\mathbb{E}[L(M_1)]>H(p_X)\), so the requirement \(\le H(p_X)+\epsilon\) fails for every \(n\) whenever \(0<\epsilon<\mathbb{E}[L(M_1)]-H(p_X).\)
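
A quick numeric check of the \(\mathrm{Bern}(1/3)\) example (a sketch; the constants are easy to verify by hand):

```python
import math

# For p_X = Bern(1/3), any injective prefix-free code on a 2-symbol alphabet
# needs two distinct nonempty codewords, so E[L] >= 1 bit per symbol.
p = 1 / 3
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
E_L = 1.0  # Huffman code {0, 1}: both codewords have length 1
print(f"H(1/3) = {H:.4f} bits")            # ≈ 0.9183
print(f"gap = E[L] - H = {E_L - H:.4f}")   # ≈ 0.0817; any eps below this fails
```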

posted @ 2025-10-30 17:45 Cold_Chair