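10-2 Huffman Coding
Huffman coding is a variable-length prefix code algorithm based on character frequencies. Its core idea is that frequent characters get short codes and rare characters get long codes. This minimizes the total encoded length.
Huffman coding is a greedy algorithm: at each step it merges the two nodes with the lowest frequencies. The codes it produces have the prefix property, meaning no character's code is a prefix of another character's code. Decoding is therefore unambiguous.
Huffman coding is widely used in file compression (e.g. ZIP, gzip), image compression (e.g. JPEG), and multimedia coding.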
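Prefix Codes
A fixed-length code assigns the same number of bits to every character. For example, 4 characters need 2 bits each:

| Character | Fixed-length code |
|---|---|
| A | 00 |
| B | 01 |
| C | 10 |
| D | 11 |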
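If A appears far more often than D in a text, the fixed-length code wastes space. Huffman coding avoids this by using variable-length codes:

| Character | Frequency | Huffman code |
|---|---|---|
| A | 5 | 0 |
| B | 2 | 10 |
| C | 1 | 110 |
| D | 1 | 111 |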
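The prefix property has a direct consequence for decoding. A decoder that reads 0 knows immediately that the character is A, and one that reads 10 knows it is B. No codeword is the beginning of another, so there is no ambiguity.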
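The prefix property can be checked mechanically. The following is a minimal sketch. The function `is_prefix_free` is our own and is not part of the implementations below. It relies on one fact about sorting: after the codewords are sorted lexicographically, any codeword that is a prefix of another is also a prefix of the codeword immediately after it.

```python
def is_prefix_free(codes):
    """Return True if no codeword in `codes` is a prefix of another."""
    words = sorted(codes.values())
    # After sorting, a codeword that prefixes any other codeword also
    # prefixes its immediate successor, so checking adjacent pairs suffices.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


huffman = {"A": "0", "B": "10", "C": "110", "D": "111"}
print(is_prefix_free(huffman))                # True
print(is_prefix_free({"A": "0", "B": "01"}))  # False: "0" is a prefix of "01"
```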
Comparing encoded lengths for "AABACABAD" (A=5, B=2, C=1, D=1):

```
Fixed-length code: 9 characters × 2 bits = 18 bits
Huffman code:      5×1 + 2×2 + 1×3 + 1×3 = 15 bits
Savings:           3 bits (16.7%)
```
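The same comparison can be written as a short Python sketch. It assumes the fixed-length code uses ⌈log₂ k⌉ bits for an alphabet of k distinct characters, as in the 2-bit table above. The variable names are illustrative.

```python
import math

text = "AABACABAD"
codes = {"A": "0", "B": "10", "C": "110", "D": "111"}  # Huffman table from above

# Fixed-length: every character costs ceil(log2(k)) bits, k = alphabet size
k = len(set(text))
fixed_bits = len(text) * math.ceil(math.log2(k))

# Huffman: each character costs the length of its own codeword
huffman_bits = sum(len(codes[c]) for c in text)

saved = fixed_bits - huffman_bits
print(fixed_bits, huffman_bits, f"{saved / fixed_bits:.1%}")  # 18 15 16.7%
```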
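Building the Huffman Tree
Huffman coding generates its codes by building a Huffman tree. The construction works as follows:
1. Make each character a leaf node whose weight is its frequency
2. Take the two nodes with the lowest frequencies and merge them into a new node (frequency = sum of the two children's frequencies)
3. Put the new node back into the set
4. Repeat steps 2-3 until only one node (the root) remains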
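The four steps map directly onto a min-priority queue. This sketch only records the merges and does not build nodes. `merge_trace` is a hypothetical helper name. The insertion counter is our own tie-breaking rule: among equal frequencies, whichever entry joined the queue first is taken first. We chose this rule so that the trace matches the worked example below.

```python
import heapq
from itertools import count


def merge_trace(freq):
    """Run the Huffman merge loop and record each (first, second, merged) step."""
    order = count()  # tie-breaker: among equal weights, older entries pop first
    heap = [(f, next(order), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    steps = []
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)  # lowest weight
        f2, _, b = heapq.heappop(heap)  # second lowest
        name = "Root" if not heap else f"N{len(steps) + 1}"  # last merge = root
        steps.append((f"{a}({f1})", f"{b}({f2})", f"{name}({f1 + f2})"))
        heapq.heappush(heap, (f1 + f2, next(order), name))
    return steps


for step in merge_trace({"C": 1, "D": 1, "B": 2, "A": 5}):
    print(step)
# ('C(1)', 'D(1)', 'N1(2)')
# ('B(2)', 'N1(2)', 'N2(4)')
# ('N2(4)', 'A(5)', 'Root(9)')
```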
Take the character frequencies A=5, B=2, C=1, D=1 as an example:

```
Initial nodes (sorted by frequency):
C(1)  D(1)  B(2)  A(5)

Step 1: merge C(1) and D(1) → N1(2)
B(2)  N1(2)  A(5)

      N1(2)
     /     \
  C(1)     D(1)

Step 2: merge B(2) and N1(2) → N2(4)
N2(4)  A(5)

      N2(4)
     /     \
  B(2)     N1(2)
          /     \
       C(1)     D(1)

Step 3: merge N2(4) and A(5) → Root(9)

      Root(9)
     /       \
  A(5)       N2(4)
            /     \
         B(2)     N1(2)
                 /     \
              C(1)     D(1)
```
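Start at the root and label every left edge 0 and every right edge 1. The labels along the path from the root to a leaf form that character's code. Which child is drawn on the left is an arbitrary choice: it changes the bits but not the code lengths. This tree puts A on the left.

```
A: 0     (left)
B: 10    (right → left)
C: 110   (right → right → left)
D: 111   (right → right → right)
```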
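The labeling rule fits in a few lines. This sketch writes the Step 3 tree as nested `(left, right)` tuples with characters as leaves. That representation is our own simplification and differs from the node types used by the implementations below.

```python
# The Step 3 tree as nested (left, right) pairs; leaves are characters.
tree = ("A", ("B", ("C", "D")))


def assign_codes(node, prefix="", codes=None):
    """Walk the tree, appending '0' when going left and '1' when going right."""
    if codes is None:
        codes = {}
    if isinstance(node, str):  # leaf: the path so far is its code
        codes[node] = prefix
    else:
        left, right = node
        assign_codes(left, prefix + "0", codes)
        assign_codes(right, prefix + "1", codes)
    return codes


print(assign_codes(tree))  # {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
```

Swapping the children at the root changes the bits but not the lengths. For example, `assign_codes((("B", ("C", "D")), "A"))` gives A the code `1`, and every code length stays the same.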
The Encoding Process
With the code table in hand, the original text is converted into bits one character at a time:
```
Text: A A B A C A B A D
A → 0
A → 0
B → 10
A → 0
C → 110
A → 0
B → 10
A → 0
D → 111
Encoded: 0 0 10 0 110 0 10 0 111
       = 001001100100111 (15 bits)
```
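The hand trace above can be reproduced in a few lines of Python. The second function goes one step beyond the implementations below, which keep the result as a string of '0'/'1' characters: it packs the bits into bytes. The function names, the most-significant-bit-first order, and the zero padding are our own choices for illustration. Real formats such as DEFLATE define their own bit order and padding rules.

```python
codes = {"A": "0", "B": "10", "C": "110", "D": "111"}


def encode(text, codes):
    """Concatenate the codeword of each character, left to right."""
    return "".join(codes[ch] for ch in text)


def pack_bits(bits):
    """Pack a '0'/'1' string into bytes, MSB first, zero-padding the last byte.

    Returns the bytes plus the number of valid bits, which a decoder
    needs in order to ignore the padding.
    """
    padded = bits + "0" * (-len(bits) % 8)
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return data, len(bits)


bits = encode("AABACABAD", codes)
print(bits, len(bits))  # 001001100100111 15
data, nbits = pack_bits(bits)
print(data.hex(), nbits)  # 264e 15
```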
C++ Implementation

```cpp
// Requires C++17 (structured bindings)
#include <iostream>
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

struct HuffmanNode {
    char ch;
    int freq;
    HuffmanNode* left;
    HuffmanNode* right;
    HuffmanNode(char c, int f) : ch(c), freq(f), left(nullptr), right(nullptr) {}
};

struct CompareNode {
    bool operator()(const HuffmanNode* a, const HuffmanNode* b) const {
        return a->freq > b->freq;  // Min-heap: smallest freq on top
    }
};

// Build Huffman tree from frequency table
HuffmanNode* buildHuffmanTree(const unordered_map<char, int>& freq) {
    priority_queue<HuffmanNode*, vector<HuffmanNode*>, CompareNode> pq;
    for (const auto& [ch, f] : freq)
        pq.push(new HuffmanNode(ch, f));
    while (pq.size() > 1) {
        HuffmanNode* left = pq.top(); pq.pop();   // lowest frequency
        HuffmanNode* right = pq.top(); pq.pop();  // second lowest
        HuffmanNode* parent = new HuffmanNode('\0', left->freq + right->freq);
        parent->left = left;
        parent->right = right;
        pq.push(parent);
    }
    return pq.top();
}

// Generate code table by traversing the tree
void generateCodes(HuffmanNode* node, const string& code,
                   unordered_map<char, string>& codes) {
    if (!node) return;
    if (!node->left && !node->right) {  // Leaf node
        codes[node->ch] = code;
        return;
    }
    generateCodes(node->left, code + "0", codes);
    generateCodes(node->right, code + "1", codes);
}

// Release every node allocated with new
void freeTree(HuffmanNode* node) {
    if (!node) return;
    freeTree(node->left);
    freeTree(node->right);
    delete node;
}

// Encode text using Huffman codes
string encode(const string& text, const unordered_map<char, string>& codes) {
    string result;
    for (char c : text)
        result += codes.at(c);
    return result;
}

int main() {
    string text = "AABACABAD";
    // Count frequencies
    unordered_map<char, int> freq;
    for (char c : text) freq[c]++;
    // Build tree and generate codes
    HuffmanNode* root = buildHuffmanTree(freq);
    unordered_map<char, string> codes;
    generateCodes(root, "", codes);
    // Print code table (unordered_map iteration order is unspecified)
    cout << "Huffman Codes:" << endl;
    for (const auto& [ch, code] : codes)
        cout << " " << ch << " (freq=" << freq[ch] << "): " << code << endl;
    // Encode
    string encoded = encode(text, codes);
    cout << "\nOriginal text: " << text << endl;
    cout << "Encoded: " << encoded << " (" << encoded.length() << " bits)" << endl;
    cout << "Fixed-length: " << text.length() * 2 << " bits" << endl;  // 4 symbols → 2 bits each
    freeTree(root);
    return 0;
}
```
C Implementation

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct HuffmanNode {
    char ch;
    int freq;
    struct HuffmanNode* left;
    struct HuffmanNode* right;
} HuffmanNode;

// Min-heap for Huffman nodes
typedef struct {
    HuffmanNode** data;
    int size;
    int capacity;
} MinHeap;

MinHeap* heap_create(int capacity) {
    MinHeap* h = (MinHeap*)malloc(sizeof(MinHeap));
    h->data = (HuffmanNode**)malloc(capacity * sizeof(HuffmanNode*));
    h->size = 0;
    h->capacity = capacity;
    return h;
}

void heap_free(MinHeap* h) {
    free(h->data);
    free(h);
}

void heap_swap(HuffmanNode** a, HuffmanNode** b) {
    HuffmanNode* tmp = *a; *a = *b; *b = tmp;
}

// Append at the end, then sift up while smaller than the parent
void heap_push(MinHeap* h, HuffmanNode* node) {
    int i = h->size++;
    h->data[i] = node;
    while (i > 0) {
        int parent = (i - 1) / 2;
        if (h->data[i]->freq >= h->data[parent]->freq) break;
        heap_swap(&h->data[i], &h->data[parent]);
        i = parent;
    }
}

// Remove the minimum: move the last element to the root, then sift down
HuffmanNode* heap_pop(MinHeap* h) {
    HuffmanNode* top = h->data[0];
    h->data[0] = h->data[--h->size];
    int i = 0;
    while (1) {
        int smallest = i;
        int left = 2 * i + 1, right = 2 * i + 2;
        if (left < h->size && h->data[left]->freq < h->data[smallest]->freq)
            smallest = left;
        if (right < h->size && h->data[right]->freq < h->data[smallest]->freq)
            smallest = right;
        if (smallest == i) break;
        heap_swap(&h->data[i], &h->data[smallest]);
        i = smallest;
    }
    return top;
}

HuffmanNode* node_create(char ch, int freq) {
    HuffmanNode* n = (HuffmanNode*)malloc(sizeof(HuffmanNode));
    n->ch = ch;
    n->freq = freq;
    n->left = n->right = NULL;
    return n;
}

// Release every node of the tree
void free_tree(HuffmanNode* node) {
    if (!node) return;
    free_tree(node->left);
    free_tree(node->right);
    free(node);
}

// Code table storage, indexed by (unsigned char)
char code_table[256][256];
int code_lengths[256];

void generate_codes(HuffmanNode* node, char* buffer, int depth) {
    if (!node) return;
    if (!node->left && !node->right) {  // Leaf: buffer holds its code
        buffer[depth] = '\0';
        strcpy(code_table[(unsigned char)node->ch], buffer);
        code_lengths[(unsigned char)node->ch] = depth;
        return;
    }
    buffer[depth] = '0';
    generate_codes(node->left, buffer, depth + 1);
    buffer[depth] = '1';
    generate_codes(node->right, buffer, depth + 1);
}

int main() {
    const char* text = "AABACABAD";
    int len = strlen(text);
    // Count frequencies
    int freq[256] = {0};
    for (int i = 0; i < len; i++)
        freq[(unsigned char)text[i]]++;
    // Build Huffman tree
    MinHeap* heap = heap_create(256);
    for (int i = 0; i < 256; i++)
        if (freq[i] > 0)
            heap_push(heap, node_create((char)i, freq[i]));
    while (heap->size > 1) {
        HuffmanNode* left = heap_pop(heap);
        HuffmanNode* right = heap_pop(heap);
        HuffmanNode* parent = node_create('\0', left->freq + right->freq);
        parent->left = left;
        parent->right = right;
        heap_push(heap, parent);
    }
    HuffmanNode* root = heap_pop(heap);
    heap_free(heap);
    // Generate codes
    memset(code_table, 0, sizeof(code_table));
    char buffer[256];
    generate_codes(root, buffer, 0);
    // Print code table
    printf("Huffman Codes:\n");
    for (int i = 0; i < 256; i++) {
        if (freq[i] > 0)
            printf(" %c (freq=%d): %s\n", (char)i, freq[i], code_table[i]);
    }
    // Encode
    printf("\nOriginal text: %s\n", text);
    printf("Encoded: ");
    int total_bits = 0;
    for (int i = 0; i < len; i++) {
        printf("%s", code_table[(unsigned char)text[i]]);
        total_bits += code_lengths[(unsigned char)text[i]];
    }
    printf(" (%d bits)\n", total_bits);
    printf("Fixed-length: %d bits\n", len * 2);
    free_tree(root);
    return 0;
}
```
Python Implementation

```python
import heapq
from collections import defaultdict


class HuffmanNode:
    def __init__(self, ch=None, freq=0, left=None, right=None):
        self.ch = ch  # None for internal nodes
        self.freq = freq
        self.left = left
        self.right = right

    def __lt__(self, other):  # heapq orders nodes by frequency
        return self.freq < other.freq


def build_huffman_tree(freq):
    heap = [HuffmanNode(ch=ch, freq=f) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        parent = HuffmanNode(freq=left.freq + right.freq, left=left, right=right)
        heapq.heappush(heap, parent)
    return heap[0]


def generate_codes(node, code="", codes=None):
    if codes is None:
        codes = {}
    if node is None:
        return codes
    if node.ch is not None:  # Leaf node
        codes[node.ch] = code
        return codes
    generate_codes(node.left, code + "0", codes)
    generate_codes(node.right, code + "1", codes)
    return codes


def encode(text, codes):
    return "".join(codes[c] for c in text)


text = "AABACABAD"
# Count frequencies
freq = defaultdict(int)
for c in text:
    freq[c] += 1
# Build tree and generate codes
root = build_huffman_tree(freq)
codes = generate_codes(root)
print("Huffman Codes:")
for ch in sorted(codes):
    print(f" {ch} (freq={freq[ch]}): {codes[ch]}")
encoded = encode(text, codes)
print(f"\nOriginal text: {text}")
print(f"Encoded: {encoded} ({len(encoded)} bits)")
print(f"Fixed-length: {len(text) * 2} bits")
```
Go Implementation

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
	"strings"
)

type HuffmanNode struct {
	ch    byte
	freq  int
	left  *HuffmanNode
	right *HuffmanNode
}

// MinHeap implements heap.Interface, ordered by frequency
type MinHeap []*HuffmanNode

func (h MinHeap) Len() int           { return len(h) }
func (h MinHeap) Less(i, j int) bool { return h[i].freq < h[j].freq }
func (h MinHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }

func (h *MinHeap) Push(x interface{}) {
	*h = append(*h, x.(*HuffmanNode))
}

func (h *MinHeap) Pop() interface{} {
	old := *h
	n := len(old)
	item := old[n-1]
	*h = old[:n-1]
	return item
}

func buildHuffmanTree(freq map[byte]int) *HuffmanNode {
	h := &MinHeap{}
	heap.Init(h)
	for ch, f := range freq {
		heap.Push(h, &HuffmanNode{ch: ch, freq: f})
	}
	for h.Len() > 1 {
		left := heap.Pop(h).(*HuffmanNode)
		right := heap.Pop(h).(*HuffmanNode)
		parent := &HuffmanNode{freq: left.freq + right.freq, left: left, right: right}
		heap.Push(h, parent)
	}
	return heap.Pop(h).(*HuffmanNode)
}

func generateCodes(node *HuffmanNode, code string, codes map[byte]string) {
	if node == nil {
		return
	}
	if node.left == nil && node.right == nil { // Leaf node
		codes[node.ch] = code
		return
	}
	generateCodes(node.left, code+"0", codes)
	generateCodes(node.right, code+"1", codes)
}

// encode concatenates each byte's code; strings.Builder avoids
// copying the whole result on every append
func encode(text string, codes map[byte]string) string {
	var sb strings.Builder
	for i := 0; i < len(text); i++ {
		sb.WriteString(codes[text[i]])
	}
	return sb.String()
}

func main() {
	text := "AABACABAD"
	// Count frequencies
	freq := make(map[byte]int)
	for i := 0; i < len(text); i++ {
		freq[text[i]]++
	}
	// Build tree and generate codes
	root := buildHuffmanTree(freq)
	codes := make(map[byte]string)
	generateCodes(root, "", codes)
	// Map iteration order is random in Go, so print the codes in sorted order
	keys := make([]byte, 0, len(codes))
	for ch := range codes {
		keys = append(keys, ch)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	fmt.Println("Huffman Codes:")
	for _, ch := range keys {
		fmt.Printf(" %c (freq=%d): %s\n", ch, freq[ch], codes[ch])
	}
	encoded := encode(text, codes)
	fmt.Printf("\nOriginal text: %s\n", text)
	fmt.Printf("Encoded: %s (%d bits)\n", encoded, len(encoded))
	fmt.Printf("Fixed-length: %d bits\n", len(text)*2)
}
```
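All four programs print output of the same form, with some parts fixed and others not:
- Fixed: the code lengths (A: 1, B: 2, C: 3, D: 3 bits) and the 15-bit total are the same every time.
- Variable: the exact bit patterns depend on how each priority queue breaks ties between equal frequencies. In Go they also depend on the random map iteration order. The bits can therefore differ between languages and between runs.

In every implementation the node popped first becomes the left child. A(5) therefore always ends up on the right and receives the code 1, not the 0 shown in the hand-drawn tree. The Python version, for example, prints:

```
Huffman Codes:
 A (freq=5): 1
 B (freq=2): 00
 C (freq=1): 010
 D (freq=1): 011

Original text: AABACABAD
Encoded: 110010101001011 (15 bits)
Fixed-length: 18 bits
```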
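Tie-breaking makes the bit patterns implementation-dependent. Formats that must agree on a code therefore often transmit only the code lengths and rebuild a canonical Huffman code from them; DEFLATE does this. The sketch below follows the usual canonical rule:
- Sort symbols by code length, then by symbol.
- Each code is the previous code plus one, shifted left whenever the length grows.

The function name is ours. Applied to the lengths A=1, B=2, C=3, D=3, it reproduces the table A:0, B:10, C:110, D:111 used throughout this section.

```python
def canonical_codes(lengths):
    """Assign canonical Huffman codes from a {symbol: code_length} map."""
    code = 0
    prev_len = 0
    codes = {}
    # Shorter codes first; symbols of equal length in sorted order.
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len  # append zeros when the length grows
        codes[sym] = format(code, f"0{length}b")
        code += 1
        prev_len = length
    return codes


print(canonical_codes({"A": 1, "B": 2, "C": 3, "D": 3}))
# {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
```

Only the lengths need to be shared. Any tree with these lengths, such as the one the Python program above builds, maps to the same canonical table.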
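The Decoding Process
Decoding walks down the Huffman tree from the root, reading the encoded bits one at a time:
- on a 0, move to the left subtree
- on a 1, move to the right subtree
- on reaching a leaf, output its character and return to the root to continue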
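Taking the encoding 001001100100111 as an example:

```
0 → left → leaf A                          output: A
0 → left → leaf A                          output: A
1 → right, 0 → left → leaf B               output: B
0 → left → leaf A                          output: A
1 → right, 1 → right, 0 → left → leaf C    output: C
0 → left → leaf A                          output: A
1 → right, 0 → left → leaf B               output: B
0 → left → leaf A                          output: A
1 → right, 1 → right, 1 → right → leaf D   output: D

Decoded: AABACABAD
```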
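Because the code is prefix-free, decoding can also be done without the tree:
1. Accumulate bits until they match a codeword.
2. Emit that character.
3. Start over.

This is a minimal sketch of that alternative. The function name and the error on leftover bits are our own choices.

```python
def decode_with_table(bits, codes):
    """Decode a '0'/'1' string using only the codeword table."""
    lookup = {code: ch for ch, code in codes.items()}  # codeword -> character
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # prefix property: the first match is the right one
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError(f"trailing bits do not form a codeword: {current!r}")
    return "".join(out)


codes = {"A": "0", "B": "10", "C": "110", "D": "111"}
print(decode_with_table("001001100100111", codes))  # AABACABAD
```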
C++ Implementation

```cpp
// Requires C++17 (structured bindings)
#include <iostream>
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

struct HuffmanNode {
    char ch;
    int freq;
    HuffmanNode* left;
    HuffmanNode* right;
    HuffmanNode(char c, int f) : ch(c), freq(f), left(nullptr), right(nullptr) {}
};

struct CompareNode {
    bool operator()(const HuffmanNode* a, const HuffmanNode* b) const {
        return a->freq > b->freq;  // Min-heap: smallest freq on top
    }
};

HuffmanNode* buildTree(const unordered_map<char, int>& freq) {
    priority_queue<HuffmanNode*, vector<HuffmanNode*>, CompareNode> pq;
    for (const auto& [ch, f] : freq)
        pq.push(new HuffmanNode(ch, f));
    while (pq.size() > 1) {
        HuffmanNode* left = pq.top(); pq.pop();
        HuffmanNode* right = pq.top(); pq.pop();
        HuffmanNode* parent = new HuffmanNode('\0', left->freq + right->freq);
        parent->left = left;
        parent->right = right;
        pq.push(parent);
    }
    return pq.top();
}

void generateCodes(HuffmanNode* node, const string& code,
                   unordered_map<char, string>& codes) {
    if (!node) return;
    if (!node->left && !node->right) { codes[node->ch] = code; return; }
    generateCodes(node->left, code + "0", codes);
    generateCodes(node->right, code + "1", codes);
}

// Decode by traversing the tree bit by bit
string decode(HuffmanNode* root, const string& encoded) {
    string result;
    HuffmanNode* current = root;
    for (char bit : encoded) {
        if (bit == '0') current = current->left;
        else current = current->right;
        if (!current->left && !current->right) {  // Reached a leaf
            result += current->ch;
            current = root;  // Reset to root for next character
        }
    }
    return result;
}

// Release every node allocated with new
void freeTree(HuffmanNode* node) {
    if (!node) return;
    freeTree(node->left);
    freeTree(node->right);
    delete node;
}

int main() {
    string text = "AABACABAD";
    unordered_map<char, int> freq;
    for (char c : text) freq[c]++;
    HuffmanNode* root = buildTree(freq);
    unordered_map<char, string> codes;
    generateCodes(root, "", codes);
    // Encode
    string encoded;
    for (char c : text) encoded += codes[c];
    cout << "Encoded: " << encoded << endl;
    // Decode
    string decoded = decode(root, encoded);
    cout << "Decoded: " << decoded << endl;
    cout << "Match: " << (text == decoded ? "yes" : "no") << endl;
    freeTree(root);
    return 0;
}
```
C Implementation

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct HuffmanNode {
    char ch;
    int freq;
    struct HuffmanNode* left;
    struct HuffmanNode* right;
} HuffmanNode;

typedef struct {
    HuffmanNode** data;
    int size;
    int capacity;
} MinHeap;

MinHeap* heap_create(int cap) {
    MinHeap* h = malloc(sizeof(MinHeap));
    h->data = malloc(cap * sizeof(HuffmanNode*));
    h->size = 0; h->capacity = cap;
    return h;
}

void heap_push(MinHeap* h, HuffmanNode* node) {
    int i = h->size++;
    h->data[i] = node;
    while (i > 0) {  // sift up
        int p = (i - 1) / 2;
        if (h->data[i]->freq >= h->data[p]->freq) break;
        HuffmanNode* tmp = h->data[i]; h->data[i] = h->data[p]; h->data[p] = tmp;
        i = p;
    }
}

HuffmanNode* heap_pop(MinHeap* h) {
    HuffmanNode* top = h->data[0];
    h->data[0] = h->data[--h->size];
    int i = 0;
    while (1) {  // sift down
        int s = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < h->size && h->data[l]->freq < h->data[s]->freq) s = l;
        if (r < h->size && h->data[r]->freq < h->data[s]->freq) s = r;
        if (s == i) break;
        HuffmanNode* tmp = h->data[i]; h->data[i] = h->data[s]; h->data[s] = tmp;
        i = s;
    }
    return top;
}

HuffmanNode* build_tree(const int* freq) {
    MinHeap* h = heap_create(256);
    for (int i = 0; i < 256; i++)
        if (freq[i] > 0) {
            HuffmanNode* n = malloc(sizeof(HuffmanNode));
            n->ch = (char)i; n->freq = freq[i]; n->left = n->right = NULL;
            heap_push(h, n);
        }
    while (h->size > 1) {
        HuffmanNode* left = heap_pop(h);
        HuffmanNode* right = heap_pop(h);
        HuffmanNode* parent = malloc(sizeof(HuffmanNode));
        parent->ch = '\0'; parent->freq = left->freq + right->freq;
        parent->left = left; parent->right = right;
        heap_push(h, parent);
    }
    HuffmanNode* root = heap_pop(h);
    free(h->data); free(h);
    return root;
}

// Release every node of the tree
void free_tree(HuffmanNode* node) {
    if (!node) return;
    free_tree(node->left);
    free_tree(node->right);
    free(node);
}

char code_table[256][256];
int code_lens[256];

void gen_codes(HuffmanNode* node, char* buf, int depth) {
    if (!node) return;
    if (!node->left && !node->right) {
        buf[depth] = '\0';
        strcpy(code_table[(unsigned char)node->ch], buf);
        code_lens[(unsigned char)node->ch] = depth;
        return;
    }
    buf[depth] = '0'; gen_codes(node->left, buf, depth + 1);
    buf[depth] = '1'; gen_codes(node->right, buf, depth + 1);
}

// Decode by traversing the tree: '0' goes left, '1' goes right
void decode(HuffmanNode* root, const char* encoded, char* output) {
    int out_idx = 0;
    HuffmanNode* cur = root;
    for (int i = 0; encoded[i]; i++) {
        cur = (encoded[i] == '0') ? cur->left : cur->right;
        if (!cur->left && !cur->right) {  // Leaf: emit and restart at root
            output[out_idx++] = cur->ch;
            cur = root;
        }
    }
    output[out_idx] = '\0';
}

int main() {
    const char* text = "AABACABAD";
    int len = strlen(text);
    int freq[256] = {0};
    for (int i = 0; i < len; i++) freq[(unsigned char)text[i]]++;
    HuffmanNode* root = build_tree(freq);
    memset(code_table, 0, sizeof(code_table));
    memset(code_lens, 0, sizeof(code_lens));
    char buf[256];
    gen_codes(root, buf, 0);
    // Encode
    char encoded[1024] = "";
    for (int i = 0; i < len; i++)
        strcat(encoded, code_table[(unsigned char)text[i]]);
    printf("Encoded: %s\n", encoded);
    // Decode
    char decoded[1024];
    decode(root, encoded, decoded);
    printf("Decoded: %s\n", decoded);
    printf("Match: %s\n", strcmp(text, decoded) == 0 ? "yes" : "no");
    free_tree(root);
    return 0;
}
```
Python Implementation

```python
import heapq
from collections import defaultdict


class HuffmanNode:
    def __init__(self, ch=None, freq=0, left=None, right=None):
        self.ch = ch  # None for internal nodes
        self.freq = freq
        self.left = left
        self.right = right

    def __lt__(self, other):  # heapq orders nodes by frequency
        return self.freq < other.freq


def build_tree(freq):
    h = [HuffmanNode(ch=ch, freq=f) for ch, f in freq.items()]
    heapq.heapify(h)
    while len(h) > 1:
        left, right = heapq.heappop(h), heapq.heappop(h)
        heapq.heappush(h, HuffmanNode(freq=left.freq + right.freq,
                                      left=left, right=right))
    return h[0]


def gen_codes(node, code="", codes=None):
    if codes is None:
        codes = {}
    if node is None:
        return codes
    if node.ch is not None:
        codes[node.ch] = code
        return codes
    gen_codes(node.left, code + "0", codes)
    gen_codes(node.right, code + "1", codes)
    return codes


def decode(root, encoded):
    result = []
    current = root
    for bit in encoded:
        current = current.left if bit == "0" else current.right
        if current.ch is not None:  # Leaf
            result.append(current.ch)
            current = root
    return "".join(result)


text = "AABACABAD"
freq = defaultdict(int)
for c in text:
    freq[c] += 1
root = build_tree(freq)
codes = gen_codes(root)
encoded = "".join(codes[c] for c in text)
print(f"Encoded: {encoded}")
decoded = decode(root, encoded)
print(f"Decoded: {decoded}")
print(f"Match: {'yes' if text == decoded else 'no'}")
```
Go Implementation

```go
package main

import (
	"container/heap"
	"fmt"
	"strings"
)

type HuffmanNode struct {
	ch    byte
	freq  int
	left  *HuffmanNode
	right *HuffmanNode
}

// MinHeap implements heap.Interface, ordered by frequency
type MinHeap []*HuffmanNode

func (h MinHeap) Len() int             { return len(h) }
func (h MinHeap) Less(i, j int) bool   { return h[i].freq < h[j].freq }
func (h MinHeap) Swap(i, j int)        { h[i], h[j] = h[j], h[i] }
func (h *MinHeap) Push(x interface{}) { *h = append(*h, x.(*HuffmanNode)) }

func (h *MinHeap) Pop() interface{} {
	old := *h
	n := len(old)
	item := old[n-1]
	*h = old[:n-1]
	return item
}

func buildTree(freq map[byte]int) *HuffmanNode {
	h := &MinHeap{}
	heap.Init(h)
	for ch, f := range freq {
		heap.Push(h, &HuffmanNode{ch: ch, freq: f})
	}
	for h.Len() > 1 {
		left := heap.Pop(h).(*HuffmanNode)
		right := heap.Pop(h).(*HuffmanNode)
		heap.Push(h, &HuffmanNode{freq: left.freq + right.freq, left: left, right: right})
	}
	return heap.Pop(h).(*HuffmanNode)
}

func genCodes(node *HuffmanNode, code string, codes map[byte]string) {
	if node == nil {
		return
	}
	if node.left == nil && node.right == nil {
		codes[node.ch] = code
		return
	}
	genCodes(node.left, code+"0", codes)
	genCodes(node.right, code+"1", codes)
}

// decode walks the tree from the root: '0' goes left, '1' goes right,
// and reaching a leaf emits its byte and restarts at the root
func decode(root *HuffmanNode, encoded string) string {
	var result []byte
	current := root
	for i := 0; i < len(encoded); i++ {
		if encoded[i] == '0' {
			current = current.left
		} else {
			current = current.right
		}
		if current.left == nil && current.right == nil {
			result = append(result, current.ch)
			current = root
		}
	}
	return string(result)
}

func main() {
	text := "AABACABAD"
	freq := make(map[byte]int)
	for i := 0; i < len(text); i++ {
		freq[text[i]]++
	}
	root := buildTree(freq)
	codes := make(map[byte]string)
	genCodes(root, "", codes)
	// Encode (strings.Builder avoids repeated string copies)
	var sb strings.Builder
	for i := 0; i < len(text); i++ {
		sb.WriteString(codes[text[i]])
	}
	encoded := sb.String()
	fmt.Printf("Encoded: %s\n", encoded)
	// Decode and compare with the original
	decoded := decode(root, encoded)
	fmt.Printf("Decoded: %s\n", decoded)
	match := "no"
	if text == decoded {
		match = "yes"
	}
	fmt.Printf("Match: %s\n", match)
}
```
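The Decoded and Match lines are the same for every implementation. The Encoded bits depend on how ties were broken while building the tree. The Python version, for example, prints:

```
Encoded: 110010101001011
Decoded: AABACABAD
Match: yes
```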
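Computing the Encoded Length
For a given frequency distribution, the total number of bits in the Huffman encoding can be computed directly from the tree:

```
Total bits = Σ (character frequency × code length)
           = Σ (leaf frequency × leaf depth)
```

For A=5, B=2, C=1, D=1:

```
Total bits = 5 × 1   (A: 0)
           + 2 × 2   (B: 10)
           + 1 × 3   (C: 110)
           + 1 × 3   (D: 111)
           = 5 + 4 + 3 + 3
           = 15 bits
```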
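The same total can also be read off the merge process. Each merge creates an internal node whose frequency is the sum of the leaf frequencies in its subtree. The merge also adds one bit to the code of every one of those leaves. The total encoded length is therefore the sum of the frequencies of all internal nodes, including the root:

```
N1(2)   = C + D          → contributes 2 bits
N2(4)   = B + C + D      → contributes 4 bits
Root(9) = A + B + C + D  → contributes 9 bits
Total   = 2 + 4 + 9      = 15 bits
```

This matches the weighted path length: the sum of (frequency × depth) over all leaves.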
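Because of this, the optimal total can be computed with a priority queue of plain numbers and no tree at all. The following is a minimal sketch; the function name `huffman_cost` is ours. It checks the merge-sum view against the weighted path length using the code lengths from the tree above.

```python
import heapq


def huffman_cost(freqs):
    """Total encoded length = sum of the weights created by every merge.

    Each merge adds one bit to every leaf beneath the new node, so summing
    the merged weights (root included) gives the total without building a tree.
    """
    heap = list(freqs)
    heapq.heapify(heap)
    cost = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        cost += merged  # 2, then 4, then 9 for the example
        heapq.heappush(heap, merged)
    return cost


freq = {"A": 5, "B": 2, "C": 1, "D": 1}
code_len = {"A": 1, "B": 2, "C": 3, "D": 3}  # leaf depths from the tree above
print(huffman_cost(freq.values()))                      # 15
print(sum(f * code_len[ch] for ch, f in freq.items()))  # 15
```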
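Optimality Proof
Huffman coding is an optimal prefix code: among all prefix codes for a given set of character frequencies, it produces the shortest total encoded length.
The optimality rests on the following properties:
- Greedy choice: there is an optimal code in which the two lowest-frequency characters have codes of the same, maximum length. Start from any optimal tree and swap these two characters with the two deepest leaves. The swap moves the rarer characters deeper, so it never increases the total length.
- The two lowest-frequency characters can therefore be placed as siblings at the deepest level of the tree, so their codes differ only in the last bit.
- Optimal substructure: merge these two characters into one whose frequency is their sum. Take an optimal tree for this smaller problem and split the merged leaf back into the two characters. The result is an optimal tree for the original problem.

Therefore the greedy strategy of the Huffman algorithm, always merging the two lowest-frequency nodes, yields a globally optimal code.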
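The argument above is a proof sketch. For a small alphabet the claim can also be checked by brute force. The sketch below relies on the Kraft inequality: a prefix code with codeword lengths l_1 to l_k exists if and only if Σ 2^(−l_i) ≤ 1. It tries every length assignment of up to k − 1 bits, which is the maximum depth of a full binary tree with k leaves, and keeps the cheapest. The function name is ours.

```python
from fractions import Fraction
from itertools import product


def best_prefix_code_cost(freqs):
    """Brute-force minimum of sum(f * l) over all valid prefix-code lengths."""
    k = len(freqs)
    max_len = k - 1  # a full binary tree with k leaves has depth <= k - 1
    best = None
    for lengths in product(range(1, max_len + 1), repeat=k):
        # Kraft inequality: a prefix code with these lengths exists iff sum <= 1
        if sum(Fraction(1, 2 ** l) for l in lengths) <= 1:
            cost = sum(f * l for f, l in zip(freqs, lengths))
            if best is None or cost < best:
                best = cost
    return best


print(best_prefix_code_cost([5, 2, 1, 1]))  # 15, the same as the Huffman total
```

The search is exponential in k, so it only serves as a sanity check for tiny alphabets. It assumes at least two distinct characters.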
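Properties of Huffman Coding
Compression Performance
| Measure | Formula | Notes |
|---|---|---|
| Total encoded length | Σ fi × li | fi = frequency of character i, li = its code length |
| Average code length | Σ pi × li | pi = probability of character i |
| Compression ratio | total encoded length / (n × ⌈log₂k⌉) | n = number of characters, k = alphabet size; the denominator is the fixed-length cost |
| Theoretical lower bound | H = -Σ pi × log₂(pi) | Shannon entropy |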
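The average code length of a Huffman code satisfies:
H ≤ average code length < H + 1
where H is the Shannon entropy of the source. In other words, Huffman coding spends less than 1 bit per character more than the theoretical optimum.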
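For the running example these quantities can be computed directly. This minimal sketch uses the empirical probabilities of "AABACABAD" (pi = fi / 9) and the code lengths from the tree above. It shows the bound H ≤ L < H + 1 holding with H ≈ 1.658 and L = 15/9 ≈ 1.667 bits per character.

```python
import math

freq = {"A": 5, "B": 2, "C": 1, "D": 1}
code_len = {"A": 1, "B": 2, "C": 3, "D": 3}  # from the Huffman tree above

n = sum(freq.values())                     # 9 characters
p = {ch: f / n for ch, f in freq.items()}  # empirical probabilities

entropy = -sum(pi * math.log2(pi) for pi in p.values())  # H, bits/char
avg_len = sum(p[ch] * code_len[ch] for ch in freq)       # L, bits/char
# Compression ratio: Huffman bits over fixed-length bits (ceil(log2 k) each)
ratio = (avg_len * n) / (n * math.ceil(math.log2(len(freq))))

print(f"H = {entropy:.3f}, L = {avg_len:.3f}, ratio = {ratio:.3f}")
# H = 1.658, L = 1.667, ratio = 0.833
```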
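Prefix Property
Huffman codes are always prefix codes: no character's code is a prefix of another character's code. This follows from the tree structure. Every character sits at a leaf, so the path from the root to one leaf never passes through another leaf.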
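Tree Structure
| Property | Description |
|---|---|
| Binary tree | Every internal node has exactly two children |
| Full tree | Every merge joins exactly two nodes, so no node has a single child; with k distinct characters the tree has k − 1 internal nodes (one per merge) |
| Leaves | Every character corresponds to a leaf |
| Internal nodes | Exist only to give the tree its structure; they do not correspond to any character |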
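Uniqueness of the Code
For the same frequency distribution, the Huffman tree is not necessarily unique:
- Swapping the left and right children of any internal node changes the bit patterns but not the code lengths, so optimality is unaffected
- When several nodes have equal frequencies, the merge order can differ, which may even change the code lengths
- All resulting Huffman trees nevertheless have the same total encoded length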
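As an illustration, take the made-up frequencies a=1, b=1, c=2, d=2. After a and b are merged into a node of weight 2, three nodes tie at weight 2. The second merge may pair c with d, or pair the (a, b) node with c. The two choices lead to different code lengths with the same total, which this short check confirms. The variable names are ours.

```python
freq = {"a": 1, "b": 1, "c": 2, "d": 2}

# Choice 1: merge (a,b) and (c,d), then join them -> every code has length 2
lengths_1 = {"a": 2, "b": 2, "c": 2, "d": 2}
# Choice 2: merge (a,b) with c, then join with d -> lengths 3, 3, 2, 1
lengths_2 = {"a": 3, "b": 3, "c": 2, "d": 1}


def total_bits(lengths):
    """Total encoded length for the given code lengths."""
    return sum(freq[ch] * lengths[ch] for ch in freq)


print(total_bits(lengths_1), total_bits(lengths_2))  # 12 12
```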
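Applications
| Application | Description |
|---|---|
| DEFLATE compression | One of the core algorithms behind ZIP / gzip (combined with LZ77) |
| JPEG image compression | Huffman-codes the quantized DCT coefficients |
| MP3 audio compression | Huffman-codes the quantized spectral data |
| PNG images | Image data is compressed with DEFLATE, which uses Huffman coding (PNG's filters are a separate prediction step) |
| Communication protocols | Reduces the amount of data transmitted; for example, HPACK header compression in HTTP/2 uses a static Huffman code |
| Text compression | General-purpose compression based on character frequencies |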
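Comparison with Other Codes
| Coding method | Type | Optimality | Complexity |
|---|---|---|---|
| Fixed-length code | Fixed length | No compression | O(1) |
| Huffman coding | Variable-length prefix code | Optimal among per-character codes | O(n log n) to build the tree (n = number of distinct characters) |
| Arithmetic coding | Variable-length stream code | Approaches the entropy limit | O(n) |
| Shannon-Fano | Variable-length prefix code | Near-optimal | O(n log n) |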
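Huffman coding is the optimal prefix code at the character level, but arithmetic coding can compress better. Huffman coding must give every character a whole number of bits, so it can waste up to nearly 1 bit per character. Arithmetic coding encodes many characters together as a single number and can get closer to the theoretical entropy limit.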
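To see where the gap comes from, consider a made-up source with just two symbols: x with probability 0.9 and y with probability 0.1. Huffman coding must give each symbol a whole codeword, so it spends 1 bit per symbol however skewed the probabilities are. The entropy, by contrast, is much lower:

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


probs = [0.9, 0.1]
huffman_avg = 1.0  # two symbols -> codes "0" and "1", 1 bit each
print(f"entropy = {shannon_entropy(probs):.3f} bits/symbol, "
      f"Huffman = {huffman_avg} bits/symbol")
# entropy = 0.469 bits/symbol, Huffman = 1.0 bits/symbol
```

On long inputs from this source, an arithmetic coder can approach the entropy of about 0.469 bits per symbol. Huffman coding can narrow the gap only by treating blocks of several symbols as single characters.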
