3-10 Radix Tree (Compact Trie)
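A radix tree is a compressed version of the Trie (prefix tree). A standard Trie gives every character its own node, so a chain of nodes that each have only one child and end no word wastes space. A radix tree merges each such chain into a single edge, which stores a string fragment instead of a single character. Radix trees are closely related to PATRICIA tries, and the two names are often used interchangeably (strictly, PATRICIA is the binary, radix-2 case).
As an example, insert "romane", "romanus", "romulus", "rubens" and compare a standard Trie with a radix tree.
In the standard Trie, "romane" and "romanus" share the prefix "roman", yet every character still occupies its own node: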
(root)
└── r
    ├── o
    │   └── m
    │       ├── a
    │       │   └── n
    │       │       ├── e (E)
    │       │       └── u
    │       │           └── s (E)
    │       └── u
    │           └── l
    │               └── u
    │                   └── s (E)
    └── u
        └── b
            └── e
                └── n
                    └── s (E)
The radix tree compresses each single-child chain into one edge:
(root)
└── "r"
    ├── "om"
    │   ├── "an"
    │   │   ├── "e" (E)
    │   │   └── "us" (E)
    │   └── "ulus" (E)
    └── "ubens" (E)
Here (E) marks a node at which a complete word ends.
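One way to see the saving: in a fully compressed radix tree, a prefix survives as a node only if a word ends there or the words branch there (at least two different next characters); every other prefix is absorbed into an edge. The sketch below counts both kinds of node straight from the word list. The helper names `trie_node_count` and `radix_node_count` are illustrative, not part of the original code, and the radix count assumes the tree is fully compressed.

```python
# Sketch: non-root node counts of a standard Trie vs. a radix tree.
# Assumption: a prefix is a radix-tree node iff a word ends there or
# at least two different characters follow it in the word set.

def prefixes(words):
    """All distinct non-empty prefixes of the words."""
    return {w[:i] for w in words for i in range(1, len(w) + 1)}

def trie_node_count(words):
    # a standard Trie has one node per distinct non-empty prefix
    return len(prefixes(words))

def radix_node_count(words):
    word_set = set(words)
    count = 0
    for p in prefixes(words):
        # distinct characters that follow p in some longer word
        next_chars = {w[len(p)] for w in words
                      if len(w) > len(p) and w.startswith(p)}
        if p in word_set or len(next_chars) >= 2:
            count += 1
    return count

words = ["romane", "romanus", "romulus", "rubens"]
print(trie_node_count(words))   # 17
print(radix_node_count(words))  # 7: r, om, an, e, us, ulus, ubens
```

Both numbers match the two diagrams: 17 single-character nodes against 7 edge-labelled nodes.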
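Node definition
A radix-tree node stores an edge label (a string fragment), a list of children, and a flag recording whether a word ends at the node.
C++ implementation
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
struct RadixNode
{
    std::string edge;                  // edge label (string fragment)
    std::vector<RadixNode*> children;  // child nodes
    bool isEndOfWord;                  // true if a word ends at this node
    RadixNode(const std::string& e = "", bool end = false)
        : edge(e), isEndOfWord(end) {}
};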
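C implementation
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h>
#define MAX_EDGE 256      // maximum edge-label length, including '\0'
#define MAX_CHILDREN 128  // fixed child capacity per node
typedef struct RadixNode
{
    char edge[MAX_EDGE];                       // edge label
    struct RadixNode* children[MAX_CHILDREN];  // child pointers
    int childCount;                            // number of children in use
    bool isEndOfWord;                          // true if a word ends here
} RadixNode;
RadixNode* createNode(const char* edge, bool isEnd)
{
    RadixNode* node = (RadixNode*)malloc(sizeof(RadixNode));
    if (node == NULL)
    {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    // copy the label, truncating it if it is too long
    strncpy(node->edge, edge, MAX_EDGE - 1);
    node->edge[MAX_EDGE - 1] = '\0';
    node->childCount = 0;
    node->isEndOfWord = isEnd;
    return node;
}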
Python implementation
class RadixNode:
    def __init__(self, edge="", is_end=False):
        self.edge = edge      # edge label
        self.children = []    # list of RadixNode
        self.is_end = is_end  # end-of-word marker
Go implementation
type RadixNode struct {
	edge     string       // edge label
	children []*RadixNode // list of child nodes
	isEnd    bool         // end-of-word marker
}
Go defines the node as a struct: edge holds the edge-label fragment, children is a slice of child pointers, and isEnd marks the end of a word.
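Helper function: longest common prefix
Both insertion and search need the length of the longest common prefix of two strings.
C++ implementation
int commonPrefixLength(const std::string& a, const std::string& b)
{
    // compare character by character, up to the shorter length
    int len = (int)std::min(a.length(), b.length());
    int i = 0;
    while (i < len && a[i] == b[i])
    {
        i++;
    }
    return i;
}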
C implementation
int commonPrefixLength(const char* a, const char* b)
{
    // stop at the first mismatch or at the end of either string
    int i = 0;
    while (a[i] && b[i] && a[i] == b[i])
    {
        i++;
    }
    return i;
}
Python implementation
def common_prefix_length(a, b):
    i = 0
    while i < len(a) and i < len(b) and a[i] == b[i]:
        i += 1
    return i
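Go implementation
func commonPrefixLength(a, b string) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	i := 0
	for i < n && a[i] == b[i] {
		i++
	}
	return i
}
The Go version takes the shorter of the two lengths with len, compares the strings byte by byte, and returns the length of the common prefix.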
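In Python, the standard library already provides a character-wise longest common prefix: `os.path.commonprefix`. Despite living in `os.path`, it compares its arguments one character at a time rather than by path component, so the helper can shrink to one line. This is an optional shortcut, not a replacement for the loops shown for each language.

```python
import os.path

def common_prefix_length(a, b):
    # os.path.commonprefix compares character by character on any strings,
    # so the length of its result is the longest-common-prefix length
    return len(os.path.commonprefix([a, b]))

print(common_prefix_length("card", "care"))  # 3
print(common_prefix_length("cat", "dog"))    # 0
print(common_prefix_length("do", "dogma"))   # 2
```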
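Insertion
Insertion has to handle three cases, depending on how the word relates to the children of the current node:
- Case 1: no common prefix. No child shares even the first character, so the whole remaining word is added as a new child of the current node.
- Case 2: partial match. The word matches only part of a child's edge label, so that child is split: a new node takes the common prefix, and the old child (keeping the rest of its label and its subtree) becomes its child. The rest of the word, if any, becomes a second child; otherwise the new node itself is marked as a word end.
- Case 3: full match of the edge label. If characters remain, recurse into the child with them; otherwise mark the child as a word end.
Take inserting "car", "card", "care", "cat" in that order:
Insert "car":  root → [car (E)]
Insert "card": root → [car (E) → [d (E)]]  ("car" matches fully; the remainder "d" becomes a new child)
Insert "care": root → [car (E) → [d (E), e (E)]]  ("car" matches fully; "e" shares no prefix with "d", so it becomes a sibling of "d")
Insert "cat":  root → [ca → [r (E) → [d (E), e (E)], t (E)]]  ("cat" shares only "ca" with "car", so the "car" edge is split into "ca" + "r")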
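To check the walkthrough mechanically, the sketch below replays the four insertions and prints the tree in the same bracket notation after each step. It is a variant of the section's insert, not a transcription of it: children are assumed to be kept in a dict keyed by the first character of their edge label (valid because a radix tree never has two children starting with the same character), and `Node` and `show` are illustrative names.

```python
# Sketch: replay the insertion walkthrough step by step.
# Assumption: children are a dict keyed by the first char of their edge.

class Node:
    def __init__(self, edge="", is_end=False):
        self.edge = edge
        self.is_end = is_end
        self.children = {}  # first char of edge -> Node

def insert(node, word):
    while word:
        child = node.children.get(word[0])
        if child is None:                # case 1: no common prefix
            node.children[word[0]] = Node(word, True)
            return
        k = 0                            # length of the common prefix
        while k < min(len(word), len(child.edge)) and word[k] == child.edge[k]:
            k += 1
        if k < len(child.edge):          # case 2: split the edge at k
            split = Node(child.edge[:k])
            child.edge = child.edge[k:]
            split.children[child.edge[0]] = child
            node.children[word[0]] = split
            child = split
        node, word = child, word[k:]     # case 3: descend with the rest
    node.is_end = True                   # the word ends at this node

def show(node):
    """Render node's children as e.g. [car (E) → [d (E)]]."""
    parts = []
    for child in node.children.values():
        text = child.edge + (" (E)" if child.is_end else "")
        if child.children:
            text += " → " + show(child)
        parts.append(text)
    return "[" + ", ".join(parts) + "]"

root = Node()
for w in ["car", "card", "care", "cat"]:
    insert(root, w)
    print(f'insert "{w}": root → {show(root)}')
```

The last line printed is `insert "cat": root → [ca → [r (E) → [d (E), e (E)], t (E)]]`, the same shape as the final step of the walkthrough. Keying children by first character turns the child scan into a single dict lookup; the list-based implementations below scan siblings instead.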
C++ implementation
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
struct RadixNode
{
    std::string edge;
    std::vector<RadixNode*> children;
    bool isEndOfWord;
    RadixNode(const std::string& e = "", bool end = false)
        : edge(e), isEndOfWord(end) {}
};
int commonPrefixLength(const std::string& a, const std::string& b)
{
    int len = (int)std::min(a.length(), b.length());
    int i = 0;
    while (i < len && a[i] == b[i])
    {
        i++;
    }
    return i;
}
void insert(RadixNode* root, const std::string& word)
{
    if (word.empty())
    {
        root->isEndOfWord = true;
        return;
    }
    // find the (at most one) child whose edge shares a prefix with word
    for (size_t i = 0; i < root->children.size(); i++)
    {
        RadixNode* child = root->children[i];
        int cpLen = commonPrefixLength(word, child->edge);
        if (cpLen == 0) continue;
        if (cpLen == (int)child->edge.length() && cpLen == (int)word.length())
        {
            // exact match: mark as end of word
            child->isEndOfWord = true;
            return;
        }
        if (cpLen == (int)child->edge.length())
        {
            // edge fully matched and word is longer: recurse into child
            insert(child, word.substr(cpLen));
            return;
        }
        // partial match: split this node
        // the split node takes the common prefix
        RadixNode* splitNode = new RadixNode(word.substr(0, cpLen), false);
        // the old child keeps the rest of its edge and its subtree
        child->edge = child->edge.substr(cpLen);
        splitNode->children.push_back(child);
        // replace the old child in place (keeps sibling order)
        root->children[i] = splitNode;
        if (cpLen < (int)word.length())
        {
            // the rest of the word becomes a new leaf
            splitNode->children.push_back(new RadixNode(word.substr(cpLen), true));
        }
        else
        {
            // the word ends exactly at the split point
            splitNode->isEndOfWord = true;
        }
        return;
    }
    // no common prefix found: add new child
    root->children.push_back(new RadixNode(word, true));
}
// collect all words in the tree (depth-first)
void collectWords(RadixNode* node, std::string prefix, std::vector<std::string>& result)
{
    std::string current = prefix + node->edge;
    if (node->isEndOfWord)
    {
        result.push_back(current);
    }
    for (RadixNode* child : node->children)
    {
        collectWords(child, current, result);
    }
}
int main()
{
    RadixNode* root = new RadixNode();
    std::vector<std::string> words = {"car", "card", "care", "careful", "cat", "do", "dog", "dogma"};
    for (const auto& w : words)
    {
        insert(root, w);
        std::cout << "Inserted: " << w << "\n";
    }
    std::vector<std::string> allWords;
    for (RadixNode* child : root->children)
    {
        collectWords(child, "", allWords);
    }
    std::cout << "\nAll words in radix tree:\n";
    for (const auto& w : allWords)
    {
        std::cout << " " << w << "\n";
    }
    return 0;
}
Running this program prints:
Inserted: car
Inserted: card
Inserted: care
Inserted: careful
Inserted: cat
Inserted: do
Inserted: dog
Inserted: dogma
All words in radix tree:
car
card
care
careful
cat
do
dog
dogma
Go implementation
package main
import "fmt"
type RadixNode struct {
	edge     string
	children []*RadixNode
	isEnd    bool
}
func commonPrefixLength(a, b string) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	i := 0
	for i < n && a[i] == b[i] {
		i++
	}
	return i
}
func insert(root *RadixNode, word string) {
	if word == "" {
		root.isEnd = true
		return
	}
	for i, child := range root.children {
		cpLen := commonPrefixLength(word, child.edge)
		if cpLen == 0 {
			continue
		}
		if cpLen == len(child.edge) && cpLen == len(word) {
			// exact match
			child.isEnd = true
			return
		}
		if cpLen == len(child.edge) {
			// word is longer, recurse
			insert(child, word[cpLen:])
			return
		}
		// partial match: split node
		split := &RadixNode{edge: word[:cpLen]}
		child.edge = child.edge[cpLen:]
		split.children = append(split.children, child)
		if cpLen < len(word) {
			split.children = append(split.children, &RadixNode{edge: word[cpLen:], isEnd: true})
		} else {
			split.isEnd = true
		}
		root.children[i] = split
		return
	}
	// no common prefix: new child
	root.children = append(root.children, &RadixNode{edge: word, isEnd: true})
}
func collectWords(node *RadixNode, prefix string) []string {
	current := prefix + node.edge
	var result []string
	if node.isEnd {
		result = append(result, current)
	}
	for _, child := range node.children {
		result = append(result, collectWords(child, current)...)
	}
	return result
}
func main() {
	root := &RadixNode{}
	words := []string{"car", "card", "care", "careful", "cat", "do", "dog", "dogma"}
	for _, w := range words {
		insert(root, w)
		fmt.Printf("Inserted: %s\n", w)
	}
	var allWords []string
	for _, child := range root.children {
		allWords = append(allWords, collectWords(child, "")...)
	}
	fmt.Println("\nAll words in radix tree:")
	for _, w := range allWords {
		fmt.Printf(" %s\n", w)
	}
}
Running this program prints:
Inserted: car
Inserted: card
Inserted: care
Inserted: careful
Inserted: cat
Inserted: do
Inserted: dog
Inserted: dogma
All words in radix tree:
car
card
care
careful
cat
do
dog
dogma
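The Go version manages children with a []*RadixNode slice. commonPrefixLength computes the common-prefix length; insert handles exact match, full match of the edge label (recurse into the child), partial match (split the node), and no common prefix (add a new child). collectWords gathers all words with a depth-first traversal.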
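Search
Search walks down from the root, matching one edge label at a time. Since no two children of a node start with the same character, at most one child can share a prefix with the remaining word. The word is absent if no child matches or if a child's edge label is matched only partially (which also covers a word that ends in the middle of an edge); when the word is used up exactly at a node, the result is that node's end-of-word flag.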
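The same rule can be written as a loop instead of recursion. The sketch below assumes the list-of-children layout used in this section, but gives the node constructor an extra `children` argument (an illustrative addition) so the tree for "car", "card", "care", "cat" from the insertion walkthrough can be built by hand.

```python
# Sketch: iterative search. The extra `children` constructor argument
# is an assumption that only serves to build the example tree by hand.

class RadixNode:
    def __init__(self, edge="", is_end=False, children=None):
        self.edge = edge
        self.is_end = is_end
        self.children = children or []

def search(root, word):
    node = root
    while word:
        # at most one child can start with word[0]
        child = next((c for c in node.children if c.edge[0] == word[0]), None)
        if child is None or not word.startswith(child.edge):
            return False  # no matching edge, or the edge only partly matches
        node, word = child, word[len(child.edge):]
    return node.is_end    # word used up exactly at a node

# root → [ca → [r (E) → [d (E), e (E)], t (E)]]
root = RadixNode(children=[
    RadixNode("ca", children=[
        RadixNode("r", True, [RadixNode("d", True), RadixNode("e", True)]),
        RadixNode("t", True),
    ]),
])
for w in ["car", "care", "ca", "c", "cow"]:
    print(w, search(root, w))
```

"ca" reaches a node that is not a word end, and "c" ends inside the "ca" edge, so both report False, as does "cow".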
C++ implementation
bool search(RadixNode* root, const std::string& word)
{
    if (word.empty())
    {
        return root->isEndOfWord;
    }
    for (RadixNode* child : root->children)
    {
        int cpLen = commonPrefixLength(word, child->edge);
        if (cpLen == 0) continue;
        if (cpLen < (int)child->edge.length())
        {
            // edge label doesn't fully match
            return false;
        }
        // edge label fully matched, continue with remainder
        return search(child, word.substr(cpLen));
    }
    return false;
}
C implementation
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h>
#define MAX_EDGE 256
#define MAX_CHILDREN 128
#define MAX_WORDS 100
#define MAX_WORD_LEN 256
typedef struct RadixNode
{
    char edge[MAX_EDGE];
    struct RadixNode* children[MAX_CHILDREN];
    int childCount;
    bool isEndOfWord;
} RadixNode;
RadixNode* createNode(const char* edge, bool isEnd)
{
    RadixNode* node = (RadixNode*)malloc(sizeof(RadixNode));
    if (node == NULL)
    {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    strncpy(node->edge, edge, MAX_EDGE - 1);
    node->edge[MAX_EDGE - 1] = '\0';
    node->childCount = 0;
    node->isEndOfWord = isEnd;
    return node;
}
int commonPrefixLength(const char* a, const char* b)
{
    int i = 0;
    while (a[i] && b[i] && a[i] == b[i]) i++;
    return i;
}
void insert(RadixNode* root, const char* word)
{
    if (word[0] == '\0')
    {
        root->isEndOfWord = true;
        return;
    }
    for (int i = 0; i < root->childCount; i++)
    {
        RadixNode* child = root->children[i];
        int cpLen = commonPrefixLength(word, child->edge);
        if (cpLen == 0) continue;
        int edgeLen = (int)strlen(child->edge);
        if (cpLen == edgeLen && word[cpLen] == '\0')
        {
            // exact match
            child->isEndOfWord = true;
            return;
        }
        if (cpLen == edgeLen)
        {
            // edge fully matched: recurse with the rest of the word
            insert(child, word + cpLen);
            return;
        }
        // partial match: the split node takes the common prefix
        RadixNode* splitNode = createNode("", false);
        memcpy(splitNode->edge, word, cpLen);
        splitNode->edge[cpLen] = '\0';
        // drop the common prefix from the old child's edge (moves the '\0' too)
        memmove(child->edge, child->edge + cpLen, edgeLen - cpLen + 1);
        splitNode->children[splitNode->childCount++] = child;
        if (word[cpLen] != '\0')
        {
            // the rest of the word becomes a new leaf
            splitNode->children[splitNode->childCount++] = createNode(word + cpLen, true);
        }
        else
        {
            // the word ends exactly at the split point
            splitNode->isEndOfWord = true;
        }
        root->children[i] = splitNode;  // replace the old child in place
        return;
    }
    // no common prefix: add a new child
    if (root->childCount >= MAX_CHILDREN)
    {
        fprintf(stderr, "too many children\n");
        exit(EXIT_FAILURE);
    }
    root->children[root->childCount++] = createNode(word, true);
}
bool search(RadixNode* root, const char* word)
{
    if (word[0] == '\0')
    {
        return root->isEndOfWord;
    }
    for (int i = 0; i < root->childCount; i++)
    {
        RadixNode* child = root->children[i];
        int cpLen = commonPrefixLength(word, child->edge);
        if (cpLen == 0) continue;
        if (cpLen < (int)strlen(child->edge))
        {
            // edge label only partially matched
            return false;
        }
        return search(child, word + cpLen);
    }
    return false;
}
void collectWords(RadixNode* node, const char* prefix, char result[][MAX_WORD_LEN], int* count)
{
    char current[2 * MAX_WORD_LEN];
    snprintf(current, sizeof(current), "%s%s", prefix, node->edge);
    if (node->isEndOfWord && *count < MAX_WORDS)
    {
        strncpy(result[*count], current, MAX_WORD_LEN - 1);
        result[*count][MAX_WORD_LEN - 1] = '\0';  // strncpy may not terminate
        (*count)++;
    }
    for (int i = 0; i < node->childCount; i++)
    {
        collectWords(node->children[i], current, result, count);
    }
}
int main()
{
    // nodes are not freed in this short demo
    RadixNode* root = createNode("", false);
    const char* words[] = {"car", "card", "care", "careful", "cat", "do", "dog", "dogma"};
    int n = sizeof(words) / sizeof(words[0]);
    for (int i = 0; i < n; i++)
    {
        insert(root, words[i]);
        printf("Inserted: %s\n", words[i]);
    }
    char allWords[MAX_WORDS][MAX_WORD_LEN];
    int count = 0;
    for (int i = 0; i < root->childCount; i++)
    {
        collectWords(root->children[i], "", allWords, &count);
    }
    printf("\nAll words in radix tree:\n");
    for (int i = 0; i < count; i++)
    {
        printf(" %s\n", allWords[i]);
    }
    // search tests
    printf("\nSearch 'car': %s\n", search(root, "car") ? "Found" : "Not found");
    printf("Search 'care': %s\n", search(root, "care") ? "Found" : "Not found");
    printf("Search 'cow': %s\n", search(root, "cow") ? "Found" : "Not found");
    return 0;
}
Running this program prints:
Inserted: car
Inserted: card
Inserted: care
Inserted: careful
Inserted: cat
Inserted: do
Inserted: dog
Inserted: dogma
All words in radix tree:
car
card
care
careful
cat
do
dog
dogma
Search 'car': Found
Search 'care': Found
Search 'cow': Not found
Python implementation
class RadixNode:
    def __init__(self, edge="", is_end=False):
        self.edge = edge
        self.children = []
        self.is_end = is_end
def common_prefix_length(a, b):
    i = 0
    while i < len(a) and i < len(b) and a[i] == b[i]:
        i += 1
    return i
def insert(root, word):
    if not word:
        root.is_end = True
        return
    for i, child in enumerate(root.children):
        cp_len = common_prefix_length(word, child.edge)
        if cp_len == 0:
            continue
        if cp_len == len(child.edge) and cp_len == len(word):
            # exact match
            child.is_end = True
            return
        if cp_len == len(child.edge):
            # word is longer, recurse
            insert(child, word[cp_len:])
            return
        # partial match: split node
        split = RadixNode(word[:cp_len], False)
        child.edge = child.edge[cp_len:]
        split.children.append(child)
        if cp_len < len(word):
            new_node = RadixNode(word[cp_len:], True)
            split.children.append(new_node)
        else:
            split.is_end = True
        root.children[i] = split
        return
    # no common prefix: new child
    root.children.append(RadixNode(word, True))
def search(root, word):
    if not word:
        return root.is_end
    for child in root.children:
        cp_len = common_prefix_length(word, child.edge)
        if cp_len == 0:
            continue
        if cp_len < len(child.edge):
            # edge label only partially matched
            return False
        return search(child, word[cp_len:])
    return False
def collect_words(node, prefix=""):
    current = prefix + node.edge
    result = []
    if node.is_end:
        result.append(current)
    for child in node.children:
        result.extend(collect_words(child, current))
    return result
def main():
    root = RadixNode()
    words = ["car", "card", "care", "careful", "cat", "do", "dog", "dogma"]
    for w in words:
        insert(root, w)
        print(f"Inserted: {w}")
    all_words = []
    for child in root.children:
        all_words.extend(collect_words(child))
    print("\nAll words in radix tree:")
    for w in all_words:
        print(f" {w}")
    print(f"\nSearch 'car': {'Found' if search(root, 'car') else 'Not found'}")
    print(f"Search 'care': {'Found' if search(root, 'care') else 'Not found'}")
    print(f"Search 'cow': {'Found' if search(root, 'cow') else 'Not found'}")
if __name__ == "__main__":
    main()
Running this program prints:
Inserted: car
Inserted: card
Inserted: care
Inserted: careful
Inserted: cat
Inserted: do
Inserted: dog
Inserted: dogma
All words in radix tree:
car
card
care
careful
cat
do
dog
dogma
Search 'car': Found
Search 'care': Found
Search 'cow': Not found
Go implementation
package main
import "fmt"
type RadixNode struct {
	edge     string
	children []*RadixNode
	isEnd    bool
}
func commonPrefixLength(a, b string) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	i := 0
	for i < n && a[i] == b[i] {
		i++
	}
	return i
}
func insert(root *RadixNode, word string) {
	if word == "" {
		root.isEnd = true
		return
	}
	for i, child := range root.children {
		cpLen := commonPrefixLength(word, child.edge)
		if cpLen == 0 {
			continue
		}
		if cpLen == len(child.edge) && cpLen == len(word) {
			// exact match
			child.isEnd = true
			return
		}
		if cpLen == len(child.edge) {
			// word is longer, recurse
			insert(child, word[cpLen:])
			return
		}
		// partial match: split node
		split := &RadixNode{edge: word[:cpLen], isEnd: false}
		child.edge = child.edge[cpLen:]
		split.children = append(split.children, child)
		if cpLen < len(word) {
			newNode := &RadixNode{edge: word[cpLen:], isEnd: true}
			split.children = append(split.children, newNode)
		} else {
			split.isEnd = true
		}
		root.children[i] = split
		return
	}
	// no common prefix: new child
	root.children = append(root.children, &RadixNode{edge: word, isEnd: true})
}
func search(root *RadixNode, word string) bool {
	if word == "" {
		return root.isEnd
	}
	for _, child := range root.children {
		cpLen := commonPrefixLength(word, child.edge)
		if cpLen == 0 {
			continue
		}
		if cpLen < len(child.edge) {
			// edge label only partially matched
			return false
		}
		return search(child, word[cpLen:])
	}
	return false
}
func collectWords(node *RadixNode, prefix string, result *[]string) {
	current := prefix + node.edge
	if node.isEnd {
		*result = append(*result, current)
	}
	for _, child := range node.children {
		collectWords(child, current, result)
	}
}
// found turns a search result into the printed label.
func found(ok bool) string {
	if ok {
		return "Found"
	}
	return "Not found"
}
func main() {
	root := &RadixNode{}
	words := []string{"car", "card", "care", "careful", "cat", "do", "dog", "dogma"}
	for _, w := range words {
		insert(root, w)
		fmt.Printf("Inserted: %s\n", w)
	}
	var allWords []string
	for _, child := range root.children {
		collectWords(child, "", &allWords)
	}
	fmt.Println("\nAll words in radix tree:")
	for _, w := range allWords {
		fmt.Printf(" %s\n", w)
	}
	fmt.Printf("\nSearch 'car': %s\n", found(search(root, "car")))
	fmt.Printf("Search 'care': %s\n", found(search(root, "care")))
	fmt.Printf("Search 'cow': %s\n", found(search(root, "cow")))
}
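The Go version manipulates the radix tree through pointers. insert handles three cases: exact match, full match of the edge label (recurse into the child), and partial match (split the node). collectWords gathers all words recursively. Note that slicing a Go string, as in word[cpLen:], yields a substring that shares the original bytes, so no copy is made.
Running this program prints: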
Inserted: car
Inserted: card
Inserted: care
Inserted: careful
Inserted: cat
Inserted: do
Inserted: dog
Inserted: dogma
All words in radix tree:
car
card
care
careful
cat
do
dog
dogma
Search 'car': Found
Search 'care': Found
Search 'cow': Not found
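Deletion
To delete a word:
- Follow the path for the word, as in search. If the path does not exist, or its last node is not marked as a word end, the word is not in the tree.
- Set that node's isEndOfWord to false.
- If the node now has no children, remove it from its parent.
- If a non-root node that is not a word end is left with exactly one child (the node just unmarked, or its parent after a leaf was removed), merge it with that child by concatenating their edge labels.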
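A useful consequence of these rules is that, after a deletion, the tree has the same shape it would have had if the deleted word had never been inserted. The sketch below checks that on the example word list. It is self-contained and assumes children are stored in a dict keyed by the first character of their edge label; `Node`, `delete` and `shape` are illustrative names, and the cleanup of a child (drop it if it is an empty leaf, merge it if it has one child) happens in its parent's frame, so the root is never merged.

```python
# Sketch: deletion with leaf removal and merging, checked against a
# tree rebuilt without the deleted words. Dict-of-children layout assumed.

class Node:
    def __init__(self, edge="", is_end=False):
        self.edge = edge
        self.is_end = is_end
        self.children = {}  # first char of edge -> Node

def insert(node, word):
    while word:
        child = node.children.get(word[0])
        if child is None:  # no child shares a prefix: new leaf
            node.children[word[0]] = Node(word, True)
            return
        k = 0
        while k < min(len(word), len(child.edge)) and word[k] == child.edge[k]:
            k += 1
        if k < len(child.edge):  # split the edge at the first mismatch
            split = Node(child.edge[:k])
            child.edge = child.edge[k:]
            split.children[child.edge[0]] = child
            node.children[word[0]] = split
            child = split
        node, word = child, word[k:]
    node.is_end = True

def delete(node, word):
    """Remove word below node; return True if it was present."""
    if not word:
        if not node.is_end:
            return False
        node.is_end = False
        return True
    child = node.children.get(word[0])
    if child is None or not word.startswith(child.edge):
        return False
    if not delete(child, word[len(child.edge):]):
        return False
    if not child.is_end and not child.children:
        del node.children[word[0]]          # empty leaf: unlink it
    elif not child.is_end and len(child.children) == 1:
        (only,) = child.children.values()   # merge child with its only child
        child.edge += only.edge
        child.is_end = only.is_end
        child.children = only.children
    return True

def shape(node):
    """Order-independent snapshot: {edge: (is_end, subtree)}."""
    return {c.edge: (c.is_end, shape(c)) for c in node.children.values()}

words = ["car", "card", "care", "careful", "cat", "do", "dog", "dogma"]
root = Node()
for w in words:
    insert(root, w)
delete(root, "cat")  # "ca" keeps one child "r" and merges into "car"
delete(root, "dog")  # "g" keeps one child "ma" and merges into "gma"

fresh = Node()
for w in words:
    if w not in ("cat", "dog"):
        insert(fresh, w)
print(shape(root) == shape(fresh))  # True
```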
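C++ implementation
// Absorb node's only child into node: concatenate the edge labels and take
// over the child's end-of-word flag and subtree.
void mergeWithOnlyChild(RadixNode* node)
{
    RadixNode* onlyChild = node->children[0];
    node->edge += onlyChild->edge;
    node->isEndOfWord = onlyChild->isEndOfWord;
    node->children = onlyChild->children;
    delete onlyChild;
}
// Remove word from the subtree below node, where parent is node's parent.
// Call it as remove(root, root, word); the node != parent checks keep the
// root from ever being deleted or merged.
bool remove(RadixNode* parent, RadixNode* node, const std::string& word)
{
    if (word.empty())
    {
        if (!node->isEndOfWord) return false;
        node->isEndOfWord = false;
        if (node == parent) return true;  // empty word stored at the root
        if (node->children.empty())
        {
            // leaf: unlink it from parent and free it
            auto& kids = parent->children;
            kids.erase(std::remove(kids.begin(), kids.end(), node), kids.end());
            delete node;
        }
        else if (node->children.size() == 1)
        {
            // no word ends here any more and only one child remains: merge
            mergeWithOnlyChild(node);
        }
        return true;
    }
    for (size_t i = 0; i < node->children.size(); i++)
    {
        RadixNode* child = node->children[i];
        int cpLen = commonPrefixLength(word, child->edge);
        if (cpLen == 0) continue;
        if (cpLen < (int)child->edge.length()) return false;
        bool result = remove(node, child, word.substr(cpLen));
        // a leaf below may have been removed: if node (not the root) is left
        // with one child and is not a word end, merge it with that child
        if (result && node->children.size() == 1 && !node->isEndOfWord && node != parent)
        {
            mergeWithOnlyChild(node);
        }
        return result;
    }
    return false;
}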
Python implementation
def remove(node, word):
    """Remove word from the radix tree rooted at node; return True if it was present."""
    if not word:
        if not node.is_end:
            return False
        node.is_end = False
        return True
    for i, child in enumerate(node.children):
        cp_len = common_prefix_length(word, child.edge)
        if cp_len == 0:
            continue
        if cp_len < len(child.edge):
            return False
        result = remove(child, word[cp_len:])
        # clean up the child here, in its parent's frame, so the root is never merged
        if result and not child.is_end:
            if not child.children:
                # child became an empty leaf: drop it
                node.children.pop(i)
            elif len(child.children) == 1:
                # child has one child and no word ends here: merge them
                only = child.children[0]
                child.edge += only.edge
                child.is_end = only.is_end
                child.children = only.children
        return result
    return False
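Go implementation
// mergeWithOnlyChild absorbs node's only child into node.
func mergeWithOnlyChild(node *RadixNode) {
	only := node.children[0]
	node.edge += only.edge
	node.isEnd = only.isEnd
	node.children = only.children
}
// remove deletes word from the subtree below node, where parent is node's parent.
// Call it as remove(root, root, word) so the root is never deleted or merged.
func remove(parent *RadixNode, node *RadixNode, word string) bool {
	if word == "" {
		if !node.isEnd {
			return false
		}
		node.isEnd = false
		if node == parent {
			return true // empty word stored at the root
		}
		if len(node.children) == 0 {
			// leaf: unlink it from the parent
			for i, child := range parent.children {
				if child == node {
					parent.children = append(parent.children[:i], parent.children[i+1:]...)
					break
				}
			}
		} else if len(node.children) == 1 {
			// no word ends here any more and only one child remains: merge
			mergeWithOnlyChild(node)
		}
		return true
	}
	for _, child := range node.children {
		cpLen := commonPrefixLength(word, child.edge)
		if cpLen == 0 {
			continue
		}
		if cpLen < len(child.edge) {
			return false
		}
		result := remove(node, child, word[cpLen:])
		// if node (not the root) is left with one child and is not a word end, merge
		if result && len(node.children) == 1 && !node.isEnd && node != parent {
			mergeWithOnlyChild(node)
		}
		return result
	}
	return false
}
The Go deletion takes pointers to both the parent and the current node (call it as remove(root, root, word)). A node that becomes an empty leaf is removed from its parent, and a non-root node that no longer ends a word and has a single child is merged with that child.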
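Properties of radix trees
| Property | Radix tree | Standard Trie |
|---|---|---|
| Space | O(N), N = total length of all strings; at most 2n − 1 non-root nodes for n strings | O(N × σ) with a child array per node, σ = alphabet size |
| Search time | O(L), L = length of the search string | O(L) |
| Insertion time | O(L) | O(L) |
| Single-child chains | Compressed into one edge | One node per character |
| Node count | Fewer | More |
The time bounds count character comparisons. With the unsorted child lists used in this section, choosing the next child can also scan up to σ siblings at each node on the path.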
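Advantages of the radix tree:
- Uses less space than a standard Trie, because single-child chains are eliminated
- Same asymptotic search cost as a Trie (O(L) character comparisons) while visiting fewer nodes
- Well suited to long strings that share many common prefixes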
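Applications of the radix tree:
- IP routing tables: the Linux kernel's IPv4 routing table (fib_trie) uses an LC-trie, a path- and level-compressed relative of the radix tree
- Autocomplete: lower memory use than a standard Trie
- Data compression: dictionary structures in some Lempel-Ziv variants
- Dictionaries: compact storage of large sets of strings with common prefixes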
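For autocomplete, the query is "all words that start with a given prefix". The only subtlety compared with search is that the prefix may end in the middle of an edge label, in which case every word below that edge matches. The sketch below assumes the list-of-children layout of this section plus an illustrative `children` constructor argument for building the example tree by hand; `complete` and `collect` are hypothetical helper names.

```python
# Sketch: prefix completion on a radix tree (helper names are illustrative).

class RadixNode:
    def __init__(self, edge="", is_end=False, children=None):
        self.edge = edge
        self.is_end = is_end
        self.children = children or []

def collect(node, prefix):
    """All words in node's subtree; prefix is the text spelled above node."""
    word = prefix + node.edge
    out = [word] if node.is_end else []
    for child in node.children:
        out.extend(collect(child, word))
    return out

def complete(root, prefix):
    node, spelled, rest = root, "", prefix
    while rest:
        child = next((c for c in node.children if c.edge[0] == rest[0]), None)
        if child is None:
            return []
        if child.edge.startswith(rest):
            # prefix ends inside (or exactly at the end of) this edge
            return collect(child, spelled)
        if not rest.startswith(child.edge):
            return []  # edge and prefix disagree
        spelled += child.edge
        rest = rest[len(child.edge):]
        node = child
    return collect(node, "")  # empty prefix: every word

# tree for car, card, care, careful, cat, do, dog, dogma
root = RadixNode(children=[
    RadixNode("ca", children=[
        RadixNode("r", True, [
            RadixNode("d", True),
            RadixNode("e", True, [RadixNode("ful", True)]),
        ]),
        RadixNode("t", True),
    ]),
    RadixNode("do", True, [RadixNode("g", True, [RadixNode("ma", True)])]),
])
print(complete(root, "car"))    # ['car', 'card', 'care', 'careful']
print(complete(root, "caref"))  # ['careful'] (prefix ends inside "ful")
print(complete(root, "cow"))    # []
```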
