ENGG5301 Information Theory 2025 Midterm Exam P3: Causal Encoding

The problem is reconstructed from memory; the solution was written by GPT-5.

During the exam I went down the wrong path on part (1), and seeing GPT-5's answer afterwards made me furious: failing to think of the key inequality sank parts (1)(2)(3) in a row, and as a result I never got to (4) and (5).

My inequality skills, rotten since middle school, struck again; then again, I never really had the motivation to grind the textbook/exercises for this course, so this was about what I expected.

Problem

Let random variables \(X_1,\dots,X_n\) be i.i.d. with distribution \(p_X\).

We define an encoding sequence \(M_1,\dots,M_n\) subject to the following causality constraints:

  • \(M_i\) is a function of \(X_1,\dots,X_i\), i.e., \(M_i = f_i(X_1,\dots,X_i)\);
  • \(X_i\) is a function of \(M_1,\dots,M_i\), i.e., \(X_i = g_i(M_1,\dots,M_i)\).

Answer the following:

  1. Find the minimum value of \(\sum_{i=1}^n H(M_i)\) in terms of \(p_X\).

  2. Show that, under the optimal encoding, \(M_1,\dots,M_n\) are mutually independent.

  3. Show that, under the optimal encoding, \(H(M_i \mid X_i) = 0\).

  4. If we drop optimality but require causal encoding and a prefix-free constraint, show that there exists an encoding such that

    \[\mathbb{E}\!\left(\frac{1}{n}\sum_{i=1}^n L(M_i)\right) < H(p_X) + 1. \]

  5. Under the same causal and prefix-free constraints, show that for any \(\varepsilon>0\), there exists an encoding such that

    \[\mathbb{E}\!\left(\frac{1}{n}\sum_{i=1}^n L(M_i)\right) < H(p_X) + \varepsilon. \]


Solution

(a) Minimum of \(\sum_i H(M_i)\) under optimal encoding

Result:

\[\min \sum_{i=1}^n H(M_i) = n\,H(p_X). \]

Proof:

Since every \(X_i\) is recovered losslessly from \(M_1,\dots,M_i\), the whole sequence \(X_1,\dots,X_n\) is a function of \(M_1,\dots,M_n\); combining this with the subadditivity of entropy,

\[H(X_1,\dots,X_n) \le H(M_1,\dots,M_n) \le \sum_{i=1}^n H(M_i). \]

Since \(X_1,\dots,X_n\) are i.i.d.,

\[H(X_1,\dots,X_n) = n H(X). \]

Thus,

\[\sum_{i=1}^n H(M_i) \ge n H(X). \]

Choosing \(M_i = X_i\), i.e., encoding each symbol separately, achieves \(H(M_i)=H(X)\) and satisfies both the causality and recoverability constraints, hence the bound is tight.
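
As a quick numerical sanity check, here is a minimal Python sketch (assuming, purely for illustration, a binary source with \(P(X=1)=0.3\) and \(n=2\)) comparing \(\sum_i H(M_i)\) with \(nH(X)\) for the per-symbol scheme \(M_i = X_i\) and for a deliberately wasteful causal scheme.

```python
# A minimal numerical check of the bound in part (a). Assumptions (for
# illustration only): a binary source with P(X = 1) = 0.3 and n = 2.
from itertools import product
from math import log2

p = {0: 0.7, 1: 0.3}  # p_X

def H(dist):
    """Shannon entropy (bits) of a pmf given as {outcome: probability}."""
    return -sum(q * log2(q) for q in dist.values() if q > 0)

# Joint pmf of the i.i.d. pair (X1, X2).
joint = {(x1, x2): p[x1] * p[x2] for x1, x2 in product(p, p)}

def pmf_of(f):
    """pmf of the message f(X1, X2) induced by the joint pmf."""
    out = {}
    for (x1, x2), q in joint.items():
        m = f(x1, x2)
        out[m] = out.get(m, 0.0) + q
    return out

n_H = 2 * H(p)  # H(X1, X2) = 2 H(X) for an i.i.d. pair

# Scheme A (optimal): M_i = X_i, i.e., encode each symbol separately.
sum_A = H(pmf_of(lambda x1, x2: x1)) + H(pmf_of(lambda x1, x2: x2))

# Scheme B (causal and lossless but wasteful): M1 = X1, M2 = (X1, X2).
sum_B = H(pmf_of(lambda x1, x2: x1)) + H(pmf_of(lambda x1, x2: (x1, x2)))

print(f"n H(X)        = {n_H:.4f}")
print(f"sum H(M_i), A = {sum_A:.4f}  (meets the lower bound with equality)")
print(f"sum H(M_i), B = {sum_B:.4f}  (strictly above the lower bound)")
```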


(b) Independence of \(M_1,\dots,M_n\) under optimal encoding

Result:
Under the optimal encoding, \(M_1,\dots,M_n\) are mutually independent (and thus pairwise independent).

Proof:

Equality in part (a) requires that

\[\sum_{i=1}^n H(M_i) = H(M_1,\dots,M_n). \]

Equality between the sum of marginal entropies and the joint entropy holds if and only if the variables are mutually independent.
Hence, the optimal \(M_1,\dots,M_n\) are independent.
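
For two messages the equality condition can also be seen numerically: the gap \(H(M_1)+H(M_2)-H(M_1,M_2)\) is exactly \(I(M_1;M_2)\), which vanishes precisely when the messages are independent. The sketch below uses two arbitrary example joint pmfs.

```python
# A small numerical illustration of the equality condition in part (b), for
# two messages: H(M1) + H(M2) - H(M1, M2) = I(M1; M2), which is zero exactly
# when M1 and M2 are independent. Both joint pmfs are arbitrary examples.
from math import log2

def H(dist):
    return -sum(q * log2(q) for q in dist.values() if q > 0)

def gap(joint):
    """H(M1) + H(M2) - H(M1, M2), computed from a joint pmf over pairs."""
    m1, m2 = {}, {}
    for (a, b), q in joint.items():
        m1[a] = m1.get(a, 0.0) + q
        m2[b] = m2.get(b, 0.0) + q
    return H(m1) + H(m2) - H(joint)

independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
correlated  = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

print(f"independent messages: gap = {gap(independent):.4f}")  # 0.0
print(f"correlated messages:  gap = {gap(correlated):.4f}")   # > 0
```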


(c) Determinism: \(H(M_i\mid X_i)=0\) under optimal encoding

Result:
Under the optimal scheme, \(M_i\) is a deterministic function of \(X_i\).

Proof:

From the reconstruction condition,

\[H(X_i\mid M_1,\dots,M_i) = 0. \]

Since \(M_1,\dots,M_{i-1}\) are functions of \(X_1,\dots,X_{i-1}\), which are independent of \(X_i\) (the source is i.i.d.), \(X_i\) is independent of the previous messages, so

\[H(X_i\mid M_1,\dots,M_{i-1}) = H(X_i). \]

Hence the conditional mutual information satisfies

\[I(X_i; M_i \mid M_1,\dots,M_{i-1}) = H(X_i \mid M_1,\dots,M_{i-1}) - H(X_i \mid M_1,\dots,M_i) = H(X_i) - 0 = H(X_i). \]

On the other hand,

\[I(X_i; M_i \mid M_1,\dots,M_{i-1}) = H(M_i \mid M_1,\dots,M_{i-1}) - H(M_i \mid X_i, M_1,\dots,M_{i-1}). \]

Under the optimal encoding the messages are independent (part (b)), so \(H(M_i\mid M_1,\dots,M_{i-1}) = H(M_i)\). Comparing the two expressions for the mutual information gives \(H(M_i) \ge H(X_i)\) for every \(i\); since \(\sum_i H(M_i) = n H(X)\), this forces \(H(M_i) = H(X_i)\) for every \(i\). Substituting \(H(M_i\mid M_1,\dots,M_{i-1}) = H(X_i)\) into the second expression yields

\[H(M_i\mid X_i, M_1,\dots,M_{i-1}) = 0. \]

Because \(M_i\) is independent of \((M_1,\dots,M_{i-1})\), it follows that

\[H(M_i\mid X_i)=0. \]

Therefore, each \(M_i\) is a deterministic function of \(X_i\).
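
The conditional entropies appearing in this argument can be evaluated numerically for one concrete optimal scheme. The sketch below (assuming, for illustration only, the per-symbol scheme \(M_i = X_i\) with a binary source, \(P(X=1)=0.3\), \(n=2\)) computes them from the joint pmf of \((X_1,X_2,M_1,M_2)\); it illustrates the quantities for this particular scheme only.

```python
# Numerical illustration of the conditional entropies used in part (c), for
# one particular optimal scheme (an assumption for illustration): M_i = X_i
# with a binary source, P(X = 1) = 0.3, n = 2.
from itertools import product
from math import log2

p = {0: 0.7, 1: 0.3}

# Joint pmf of (X1, X2, M1, M2) with M1 = X1 and M2 = X2.
# Coordinate indices: 0 -> X1, 1 -> X2, 2 -> M1, 3 -> M2.
joint = {(x1, x2, x1, x2): p[x1] * p[x2] for x1, x2 in product(p, p)}

def H_cond(target_idx, given_idx):
    """H(target | given), where the arguments are coordinate index lists."""
    groups = {}
    for outcome, q in joint.items():
        key = tuple(outcome[i] for i in given_idx)
        val = tuple(outcome[i] for i in target_idx)
        groups.setdefault(key, {})
        groups[key][val] = groups[key].get(val, 0.0) + q
    h = 0.0
    for sub in groups.values():
        total = sum(sub.values())
        for q in sub.values():
            if q > 0:
                h -= q * log2(q / total)
    return h

print("H(X2 | M1, M2) =", round(H_cond([1], [2, 3]), 4))  # 0     (reconstruction)
print("H(X2 | M1)     =", round(H_cond([1], [2]), 4))     # H(X)  (independence)
print("H(M2 | X2, M1) =", round(H_cond([3], [1, 2]), 4))  # 0
print("H(M2 | X2)     =", round(H_cond([3], [1]), 4))     # 0     (the claim in (c))
```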


(d) Existence of a causal, prefix-free encoding with average length \(< H(p_X)+1\)

We want to construct \(M_1,\dots,M_n\) satisfying causality and a prefix-free constraint, such that

\[\mathbb{E}\Big[\frac{1}{n}\sum_{i=1}^n L(M_i)\Big] < H(p_X)+1. \]

Construction (symbol-by-symbol / Huffman coding):

  1. Since \(X_1,\dots,X_n\) are i.i.d., we can encode each symbol separately.
  2. Build a prefix-free code \(C\) (e.g., Huffman code) for \(p_X\).
  3. For each \(i\), set

\[M_i = C(X_i). \]

This clearly satisfies the causality constraints:

  • \(M_i\) depends only on \(X_i\) (and trivially on \(X_1,\dots,X_{i-1}\))
    \(\Rightarrow M_i = f_i(X_1,\dots,X_i)\);
  • \(X_i\) can be decoded from \(M_i\) alone
    \(\Rightarrow X_i = g_i(M_1,\dots,M_i)\).

Average length:

\[\mathbb{E}[L(M_i)] = \sum_x p_X(x)\, L(C(x)) =: \bar{L}(C), \quad \forall i. \]

Hence

\[\mathbb{E}\Big[\frac{1}{n}\sum_{i=1}^n L(M_i)\Big] = \bar{L}(C) < H(p_X)+1 \]

by the standard bound for Huffman coding.

✅ This proves part (d).
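
For concreteness, here is a minimal Python sketch of this construction (the four-symbol distribution \(p_X\) is an arbitrary example, and the Huffman routine is a generic textbook implementation, not tied to any particular library): build the code once and check \(H(p_X) \le \mathbb{E}[L(M_i)] < H(p_X)+1\).

```python
# A minimal sketch of the per-symbol construction in part (d): build a Huffman
# code for p_X and verify H(p_X) <= E[L(M_i)] < H(p_X) + 1. The distribution
# over four symbols is an arbitrary example; the Huffman routine is a generic
# textbook implementation.
import heapq
from math import log2

p_X = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}

def huffman(p):
    """Return a prefix-free binary code {symbol: codeword} for the pmf p."""
    # heap items: (probability, tie-breaker, partial code for the subtree)
    heap = [(q, i, {x: ""}) for i, (x, q) in enumerate(p.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        q0, _, c0 = heapq.heappop(heap)
        q1, _, c1 = heapq.heappop(heap)
        merged = {x: "0" + w for x, w in c0.items()}
        merged.update({x: "1" + w for x, w in c1.items()})
        heapq.heappush(heap, (q0 + q1, count, merged))
        count += 1
    return heap[0][2]

code = huffman(p_X)                      # M_i = code[X_i] for every i
H = -sum(q * log2(q) for q in p_X.values())
avg_len = sum(q * len(code[x]) for x, q in p_X.items())

print("code      :", code)
print(f"H(p_X)    = {H:.4f}")
print(f"E[L(M_i)] = {avg_len:.4f}   (H <= E[L] < H + 1)")
```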


(e) Existence of a causal, prefix-free encoding with average length \(< H(p_X)+\varepsilon\)

To achieve \(H(p_X)+\varepsilon\), we need block coding.

  1. Treat the whole sequence \(X^n=(X_1,\dots,X_n)\) as a block.
  2. Use an \(n\)-symbol prefix-free code \(C_n\) (e.g., arithmetic coding), which satisfies

\[H(X^n) \le \mathbb{E}[L(C_n(X^n))] < H(X^n)+1. \]

Since the \(X_i\) are i.i.d.,

\[H(X^n) = n H(p_X). \]

  3. Implement \(C_n\) sequentially (arithmetic coding can emit its output bits as the source symbols arrive):
  • Let \(M_1,\dots,M_n\) be the incremental outputs, so that

\[M_1 M_2 \dots M_n = C_n(X^n). \]

  • Then causality is preserved:
    • \(M_i\) depends on \(X_1,\dots,X_i\).
    • \(X_i\) can be recovered from \(M_1,\dots,M_i\).
  4. Average length per symbol:

\[\mathbb{E}\Big[\frac{1}{n}\sum_{i=1}^n L(M_i)\Big] = \frac{1}{n} \mathbb{E}[L(C_n(X^n))] < \frac{n H(p_X)+1}{n} = H(p_X) + \frac{1}{n}. \]

  5. By choosing \(n > 1/\varepsilon\), we guarantee

\[\mathbb{E}\Big[\frac{1}{n}\sum_{i=1}^n L(M_i)\Big] < H(p_X) + \varepsilon. \]

✅ This proves part (e).
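
A self-contained numerical illustration of the \(H(p_X)+1/n\) behaviour: instead of arithmetic coding, the sketch below assigns each \(k\)-symbol block the Shannon code length \(\lceil \log_2(1/p(x^k)) \rceil\) (a prefix-free code with these lengths exists by Kraft's inequality), purely to keep the code short; the source distribution and block lengths are arbitrary examples.

```python
# A self-contained numerical illustration of the H(p_X) + 1/n behaviour in
# part (e). Instead of arithmetic coding, each k-symbol block is assigned the
# Shannon code length ceil(log2(1/p(block))); a prefix-free code with these
# lengths exists by Kraft's inequality. The source distribution and block
# lengths below are arbitrary examples.
from itertools import product
from math import ceil, log2

p_X = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}
H = -sum(q * log2(q) for q in p_X.values())
print(f"H(p_X) = {H:.4f}")

for k in (1, 2, 4, 8):
    # expected Shannon-code length for one block of k i.i.d. symbols
    exp_len = 0.0
    for block in product(p_X, repeat=k):
        q = 1.0
        for x in block:
            q *= p_X[x]
        exp_len += q * ceil(log2(1.0 / q))
    print(f"k = {k}: E[L]/k = {exp_len / k:.4f}   (< H + 1/k = {H + 1/k:.4f})")
```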
