ENGG5301 Information Theory 2025 Midterm Exam P3: Causal Encoding
The problem is reconstructed from memory; the solution was written by GPT-5.
During the exam I went off in the wrong direction on part (a), and seeing GPT-5's answer afterwards was infuriating: I couldn't come up with the key inequality, so (a), (b), (c) fell one after another, and as a result I didn't even attempt (d) and (e).
My inequality skills, rotten ever since middle school, struck again. Then again, I never really had the motivation to grind the textbook and exercises for this course, so this outcome was expected.
Problem
Let random variables \(X_1,\dots,X_n\) be i.i.d. with distribution \(p_X\).
We define an encoding sequence \(M_1,\dots,M_n\) subject to the following causality constraints:
- \(M_i\) is a function of \(X_1,\dots,X_i\), i.e., \(M_i = f_i(X_1,\dots,X_i)\);
- \(X_i\) is a function of \(M_1,\dots,M_i\), i.e., \(X_i = g_i(M_1,\dots,M_i)\).

Answer the following:
- (a) Find the minimum value of \(\sum_{i=1}^n H(M_i)\) in terms of \(p_X\).
- (b) Show that, under the optimal encoding, \(M_1,\dots,M_n\) are mutually independent.
- (c) Show that, under the optimal encoding, \(H(M_i \mid X_i) = 0\).
- (d) If we drop optimality but require causal encoding and a prefix-free constraint, show that there exists an encoding such that
\[\mathbb{E}\!\left(\frac{1}{n}\sum_{i=1}^n L(M_i)\right) < H(p_X) + 1.\]
- (e) Under the same causal and prefix-free constraints, show that for any \(\varepsilon>0\), there exists an encoding such that
\[\mathbb{E}\!\left(\frac{1}{n}\sum_{i=1}^n L(M_i)\right) < H(p_X) + \varepsilon.\]
Solution
(a) Minimum of \(\sum_i H(M_i)\) under optimal encoding
Result:
\[\min \sum_{i=1}^n H(M_i) = n\,H(p_X).\]
Proof:
Since each \(X_i\) is a function of \(M_1,\dots,M_i\), the whole sequence \(X^n\) can be recovered losslessly from \(M^n=(M_1,\dots,M_n)\). By the independence bound on entropy and this lossless reconstruction,
\[\sum_{i=1}^n H(M_i) \;\ge\; H(M_1,\dots,M_n) \;\ge\; H(X_1,\dots,X_n).\]
Since \(X_1,\dots,X_n\) are i.i.d.,
\[H(X_1,\dots,X_n) = n\,H(p_X).\]
Thus,
\[\sum_{i=1}^n H(M_i) \;\ge\; n\,H(p_X).\]
Choosing \(M_i = f(X_i)\) with \(f\) injective (e.g., \(M_i = X_i\)), i.e., encoding each symbol separately, achieves \(H(M_i)=H(p_X)\) and satisfies both causality and recoverability, hence the bound is tight and the minimum is \(n\,H(p_X)\).
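As a quick numerical sanity check (my own addition, not part of the original solution; the distribution \(p_X\) below is made up), the sketch compares \(\sum_i H(M_i)\) for the per-symbol encoding \(M_i = X_i\) against a redundant but still causal and decodable scheme \(M_1 = X_1\), \(M_i = (X_{i-1}, X_i)\): the former meets the \(n\,H(p_X)\) bound exactly, the latter exceeds it.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector (or array)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Hypothetical example distribution p_X (any pmf works).
p_X = np.array([0.5, 0.25, 0.125, 0.125])
n = 4
H = entropy(p_X)

# Per-symbol encoding M_i = X_i: each H(M_i) = H(p_X), so the sum is n*H(p_X).
sum_per_symbol = n * H

# A redundant but still causal and decodable scheme: M_1 = X_1 and
# M_i = (X_{i-1}, X_i) for i >= 2; each such pair has entropy 2*H(p_X).
H_pair = entropy(np.outer(p_X, p_X))   # joint entropy of two i.i.d. symbols
sum_redundant = H + (n - 1) * H_pair

print(f"n * H(p_X)           = {sum_per_symbol:.4f} bits  (the lower bound, achieved)")
print(f"redundant scheme sum = {sum_redundant:.4f} bits  (strictly larger)")
```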
(b) Independence of \(M_1,\dots,M_n\) under optimal encoding
Result:
Under the optimal encoding, \(M_1,\dots,M_n\) are mutually independent (and thus pairwise independent).
Proof:
Equality in part (a) forces the first inequality in the chain to be tight, i.e.,
\[\sum_{i=1}^n H(M_i) = H(M_1,\dots,M_n).\]
Equality between the sum of marginal entropies and the joint entropy holds if and only if the variables are mutually independent.
Hence, the optimal \(M_1,\dots,M_n\) are independent.
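A small illustration of the equality condition used above (again my own, with a made-up joint pmf): the sum of marginal entropies matches the joint entropy only when the pair is independent.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# A dependent joint pmf for (M_1, M_2) (made-up numbers), and the
# independent pmf with the same marginals.
joint_dep = np.array([[0.4, 0.1],
                      [0.1, 0.4]])
joint_ind = np.outer(joint_dep.sum(axis=1), joint_dep.sum(axis=0))

for name, J in (("dependent", joint_dep), ("independent", joint_ind)):
    H_sum = entropy(J.sum(axis=1)) + entropy(J.sum(axis=0))
    H_joint = entropy(J)
    print(f"{name:11s}: H(M_1)+H(M_2) = {H_sum:.4f},  H(M_1,M_2) = {H_joint:.4f}")
# The two quantities agree only in the independent case.
```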
(c) Determinism: \(H(M_i\mid X_i)=0\) under optimal encoding
Result:
Under the optimal scheme, \(M_i\) is a deterministic function of \(X_i\).
Proof:
From the reconstruction condition \(X_i = g_i(M_1,\dots,M_i)\),
\[H(X_i \mid M_1,\dots,M_i) = 0.\]
Since \(X_i\) is independent of the previous messages \((M_1,\dots,M_{i-1})\) (they are functions of \(X_1,\dots,X_{i-1}\) only),
\[H(X_i \mid M_1,\dots,M_{i-1}) = H(X_i).\]
Hence the mutual information satisfies
\[I(X_i; M_i \mid M_1,\dots,M_{i-1}) = H(X_i \mid M_1,\dots,M_{i-1}) - H(X_i \mid M_1,\dots,M_i) = H(X_i).\]
On the other hand,
\[I(X_i; M_i \mid M_1,\dots,M_{i-1}) = H(M_i \mid M_1,\dots,M_{i-1}) - H(M_i \mid X_i, M_1,\dots,M_{i-1}) \;\le\; H(M_i \mid M_1,\dots,M_{i-1}) \;\le\; H(M_i).\]
Summing over \(i\) recovers \(\sum_i H(M_i) \ge n\,H(p_X)\), so under the optimal encoding every inequality above is tight; in particular \(H(M_i \mid M_1,\dots,M_{i-1}) = H(X_i)\), and therefore
\[H(M_i \mid X_i, M_1,\dots,M_{i-1}) = H(M_i \mid M_1,\dots,M_{i-1}) - I(X_i; M_i \mid M_1,\dots,M_{i-1}) = 0.\]
Because \(M_i\) is independent of \((M_1,\dots,M_{i-1})\), it follows that
\[H(M_i \mid X_i) = 0.\]
Therefore, each \(M_i\) is a deterministic function of \(X_i\).
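To get some intuition for why optimality pushes \(M_i\) to depend on \(X_i\) alone, here is a toy comparison I added (a hypothetical biased binary source, not from the original solution): the "mixing" scheme \(M_1 = X_1\), \(M_2 = X_1 \oplus X_2\) is causal and decodable but has \(H(M_2 \mid X_2) > 0\), and it pays for that with \(H(M_1) + H(M_2) > 2H(p_X)\) whenever \(p \neq 1/2\).

```python
import math

def h_bin(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.1                      # hypothetical Bernoulli(p) source
H_X = h_bin(p)

# Per-symbol scheme for n = 2: M_1 = X_1, M_2 = X_2.
sum_per_symbol = 2 * H_X

# Mixing scheme: M_1 = X_1, M_2 = X_1 XOR X_2. Still causal and decodable
# (X_2 = M_1 XOR M_2), but M_2 is not a function of X_2 alone:
# H(M_2 | X_2) = H(X_1) > 0, and M_2 ~ Bernoulli(2p(1-p)).
q = 2 * p * (1 - p)
sum_mixing = H_X + h_bin(q)

print(f"2 * H(p_X)          = {sum_per_symbol:.4f} bits")
print(f"mixing scheme total = {sum_mixing:.4f} bits  (strictly larger for p != 1/2)")
print(f"H(M_2 | X_2)        = {H_X:.4f} bits for the mixing scheme")
```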
(d) Existence of a causal, prefix-free encoding with average length \(< H(p_X)+1\)
We want to construct \(M_1,\dots,M_n\) satisfying causality and the prefix-free constraint, such that
\[\mathbb{E}\!\left(\frac{1}{n}\sum_{i=1}^n L(M_i)\right) < H(p_X) + 1.\]
Construction (symbol-by-symbol / Huffman coding):
- Since \(X_1,\dots,X_n\) are i.i.d., we can encode each symbol separately.
- Build a prefix-free code \(C\) (e.g., a Huffman code) for \(p_X\).
- For each \(i\), set
\[M_i = C(X_i).\]
This clearly satisfies both constraints:
- \(M_i\) depends only on \(X_i\) (and trivially on \(X_1,\dots,X_{i-1}\)), so \(M_i = f_i(X_1,\dots,X_i)\).
- \(X_i\) can be decoded from \(M_i\) alone (the code is prefix-free, hence uniquely decodable), so \(X_i = g_i(M_1,\dots,M_i)\).
Average length: by the standard bound for Huffman coding,
\[\mathbb{E}[L(M_i)] = \mathbb{E}\big[L(C(X_i))\big] < H(p_X) + 1 \quad\text{for every } i.\]
Hence
\[\mathbb{E}\!\left(\frac{1}{n}\sum_{i=1}^n L(M_i)\right) = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[L(M_i)] < H(p_X) + 1.\]
✅ This proves part (d).
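As a concrete check of the bound used in part (d) (my own sketch; the example distribution and the helper huffman_lengths are made up for illustration), the code below builds a binary Huffman code for an example \(p_X\) and verifies \(H(p_X) \le \mathbb{E}[L(C(X))] < H(p_X) + 1\).

```python
import heapq
import numpy as np

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given pmf."""
    # Heap entries: (probability, tie-breaker, symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:          # every symbol in the merged subtree gets one bit deeper
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

p_X = [0.45, 0.25, 0.15, 0.10, 0.05]   # hypothetical source distribution
L = huffman_lengths(p_X)
avg_len = sum(p * l for p, l in zip(p_X, L))
H = entropy(p_X)
print(f"H(p_X) = {H:.4f},  E[L(C(X))] = {avg_len:.4f}")
assert H <= avg_len < H + 1            # the standard Huffman bound used in part (d)
```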
(e) Existence of a causal, prefix-free encoding with average length \(< H(p_X)+\varepsilon\)
To achieve \(H(p_X)+\varepsilon\), we need block coding.
- Treat the whole sequence \(X^n=(X_1,\dots,X_n)\) as a block.
- Use an \(n\)-symbol prefix-free code \(C_n\) (e.g., arithmetic coding), which satisfies
\[\mathbb{E}\big[L(C_n(X^n))\big] < H(X_1,\dots,X_n) + 2.\]
Since \(X_1,\dots,X_n\) are i.i.d.,
\[H(X_1,\dots,X_n) = n\,H(p_X).\]
- Implement \(C_n\) sequentially (arithmetic coding can emit its output bits progressively as the source symbols arrive).
- Let \(M_1,\dots,M_n\) be the incremental outputs, so that the concatenation satisfies
\[M_1 M_2 \cdots M_n = C_n(X^n), \qquad \sum_{i=1}^n L(M_i) = L\big(C_n(X^n)\big).\]
- Causality is then preserved:
  - \(M_i\) depends only on \(X_1,\dots,X_i\);
  - \(X_i\) can be recovered from \(M_1,\dots,M_i\).
- Average length per symbol:
\[\mathbb{E}\!\left(\frac{1}{n}\sum_{i=1}^n L(M_i)\right) = \frac{1}{n}\,\mathbb{E}\big[L(C_n(X^n))\big] < H(p_X) + \frac{2}{n}.\]
- By choosing \(n > 2/\varepsilon\), we guarantee
\[\mathbb{E}\!\left(\frac{1}{n}\sum_{i=1}^n L(M_i)\right) < H(p_X) + \varepsilon.\]
✅ This proves part (e).
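To make the shrinking per-symbol overhead in part (e) concrete, here is a small sketch I added; it uses Shannon codeword lengths \(\lceil \log_2(1/p(x^k)) \rceil\) on blocks of \(k\) symbols instead of arithmetic coding, and it ignores the sequential-emission detail, but it shows the same mechanism: the per-symbol expected length stays below \(H(p_X) + 1/k\), so it approaches \(H(p_X)\) as the block length grows.

```python
import itertools
import math

p_X = [0.7, 0.2, 0.1]        # hypothetical source distribution
H = -sum(p * math.log2(p) for p in p_X)
print(f"H(p_X) = {H:.4f} bits/symbol")

for k in (1, 2, 4, 8):
    # Shannon code on blocks of k symbols: each block x^k gets a codeword of
    # length ceil(log2(1/p(x^k))). These lengths satisfy Kraft's inequality,
    # so a prefix-free code with these lengths exists, and
    # E[L] < H(X^k) + 1 = k*H(p_X) + 1.
    block_probs = (math.prod(t) for t in itertools.product(p_X, repeat=k))
    per_symbol = sum(q * math.ceil(-math.log2(q)) for q in block_probs) / k
    print(f"k = {k}: E[L]/symbol = {per_symbol:.4f}  (< H(p_X) + 1/{k} = {H + 1/k:.4f})")
```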