Trie (Prefix Tree) Explained

Introduction

Reference: https://leetcode.cn/problems/implement-trie-prefix-tree/description/

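Basic Version (Insert and Lookup)

Structure

A Trie, also called a prefix tree or dictionary tree, is a rooted tree in which each node contains the following fields:

An array of pointers to child nodes, children. If the input is limited to lowercase English letters, for example, the array has length 26: children[0] corresponds to a, children[1] to b, ..., and children[25] to z.
A boolean field isEnd, which indicates whether this node marks the end of a string.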
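
To make the index mapping concrete, here is a minimal Python sketch of a node with these two fields. It assumes the input is limited to lowercase a-z. The class name TrieNode and the helper child_index are made up for this illustration and are not part of the implementations in this post.

```python
class TrieNode:
    """One trie node: 26 child slots plus an end-of-string flag."""

    def __init__(self):
        self.children = [None] * 26  # children[0] -> 'a', ..., children[25] -> 'z'
        self.isEnd = False           # True if some inserted string ends here


def child_index(ch: str) -> int:
    """Map a lowercase letter to its slot in children (hypothetical helper)."""
    index = ord(ch) - ord("a")
    if not 0 <= index < 26:
        # Outside the assumed alphabet; a fixed 26-slot array cannot hold it.
        raise ValueError(f"unsupported character: {ch!r}")
    return index


print(child_index("a"), child_index("b"), child_index("z"))  # 0 1 25
```

A fixed-size array gives constant-time child lookup at the cost of 26 slots per node. If the alphabet is large or sparse, a map from character to child is a common alternative.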

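Inserting a String

Insertion starts at the root of the trie. For the child node that corresponds to the current character, there are two cases:

The child exists. Follow the pointer to the child and continue with the next character.
The child does not exist. Create a new child node, store it in the matching slot of the children array, then follow the pointer to it and continue with the next character.

Repeat these steps until the last character of the string has been processed, then mark the current node as the end of a string.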

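Looking Up a Prefix

The lookup starts at the root of the trie. For the child node that corresponds to the current character, there are two cases:

The child exists. Follow the pointer to the child and continue with the next character.
The child does not exist. The trie does not contain this prefix, so return a null pointer.

Repeat these steps until a null pointer is returned or the last character of the prefix has been processed.

If the walk reaches the end of the prefix, the prefix exists in the trie. If, in addition, isEnd is true on the node reached at the end of the prefix, the trie contains that exact string.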
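
The difference between "the prefix exists" and "the string exists" can be seen in a short Python sketch. This is an illustration under stated assumptions, not the implementation given later in the post: it keeps children in a dict instead of a 26-slot array, and the names Node, insert, and find_node are chosen only for this example.

```python
class Node:
    def __init__(self):
        self.children = {}   # char -> Node (dict instead of a fixed array)
        self.isEnd = False   # True if an inserted string ends at this node


def insert(root: Node, word: str) -> None:
    node = root
    for ch in word:
        # Create the child on first use, then step into it.
        node = node.children.setdefault(ch, Node())
    node.isEnd = True


def find_node(root: Node, prefix: str):
    """Return the node reached by prefix, or None if the walk leaves the trie."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return None
    return node


root = Node()
insert(root, "apple")

hit = find_node(root, "app")
print(hit is not None)                 # True: "app" is a prefix of a stored string
print(hit is not None and hit.isEnd)   # False: "app" itself was never inserted
print(find_node(root, "apx") is None)  # True: the walk fails at 'x'
```

Both checks share the same walk. Only the final test differs: a prefix query needs a non-null node, while an exact-match query also needs isEnd to be true.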

Implementation

C

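#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Trie {
    struct Trie* children[26];  // children[i] is the child for letter 'a' + i
    bool isEnd;                 // true if an inserted word ends at this node
} Trie;

// Allocate a node with no children that does not mark the end of a word
Trie* trieCreate() {
    Trie* ret = malloc(sizeof(Trie));
    memset(ret->children, 0, sizeof(ret->children));
    ret->isEnd = false;
    return ret;
}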

void trieInsert(Trie* obj, char* word) {
    int n = strlen(word);
    for (int i = 0; i < n; i++) {
        int ch = word[i] - 'a';
        if (obj->children[ch] == NULL) {
            obj->children[ch] = trieCreate();
        }
        obj = obj->children[ch];
    }
    obj->isEnd = true;
}

bool trieSearch(Trie* obj, char* word) {
    int n = strlen(word);
    for (int i = 0; i < n; i++) {
        int ch = word[i] - 'a';
        if (obj->children[ch] == NULL) {
            return false;
        }
        obj = obj->children[ch];
    }
    return obj->isEnd;
}

bool trieStartsWith(Trie* obj, char* prefix) {
    int n = strlen(prefix);
    for (int i = 0; i < n; i++) {
        int ch = prefix[i] - 'a';
        if (obj->children[ch] == NULL) {
            return false;
        }
        obj = obj->children[ch];
    }
    return true;
}

void trieFree(Trie* obj) {
    for (int i = 0; i < 26; i++) {
        if (obj->children[i]) {
            trieFree(obj->children[i]);
        }
    }
    free(obj);
}

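This code implements a trie for storing and looking up strings.

It first defines a struct Trie containing a pointer array children that holds the node's child nodes. The array has length 26, one slot per lowercase English letter. The isEnd field marks whether the node is the end of a word.

It then implements the following functions for operating on the trie:

trieCreate(): creates a new trie node. It allocates memory for a Trie struct, uses memset to set every entry of children to NULL, and sets isEnd to false.
trieInsert(): inserts a word. It walks the characters of the word in order, creates a new node whenever the current node has no child for the current character, and finally marks the last node as the end of a word.
trieSearch(): checks whether a word is stored. Like insertion, it walks the characters of the word. It returns false as soon as a child is missing, and otherwise returns the isEnd flag of the last node.
trieStartsWith(): checks whether any stored word starts with the given prefix. It walks the prefix the same way as trieSearch() and returns true if every character has a matching node.
trieFree(): frees the trie. It recursively frees every child subtree and then frees the node itself.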

Example usage:

int main() {
    // Create a trie
    Trie* root = trieCreate();

    // Insert some words
    trieInsert(root, "hello");
    trieInsert(root, "world");
    trieInsert(root, "openai");

    // Look up words and print the results
    printf("Lookup results:\n");
    printf("hello is in the trie: %s\n", trieSearch(root, "hello") ? "yes" : "no");
    printf("world is in the trie: %s\n", trieSearch(root, "world") ? "yes" : "no");
    printf("open is in the trie: %s\n", trieSearch(root, "open") ? "yes" : "no");
    printf("ai is in the trie: %s\n", trieSearch(root, "ai") ? "yes" : "no");

    // Free the trie
    trieFree(root);

    return 0;
}

Java

class Trie {
    private Trie[] children;
    private boolean isEnd;

    public Trie() {
        children = new Trie[26];
        isEnd = false;
    }
    
    public void insert(String word) {
        Trie node = this;
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            int index = ch - 'a';
            if (node.children[index] == null) {
                node.children[index] = new Trie();
            }
            node = node.children[index];
        }
        node.isEnd = true;
    }
    
    public boolean search(String word) {
        Trie node = searchPrefix(word);
        return node != null && node.isEnd;
    }
    
    public boolean startsWith(String prefix) {
        return searchPrefix(prefix) != null;
    }

    private Trie searchPrefix(String prefix) {
        Trie node = this;
        for (int i = 0; i < prefix.length(); i++) {
            char ch = prefix.charAt(i);
            int index = ch - 'a';
            if (node.children[index] == null) {
                return null;
            }
            node = node.children[index];
        }
        return node;
    }
}

Python

class Trie:
    def __init__(self):
        self.children = [None] * 26
        self.isEnd = False
    
    def searchPrefix(self, prefix: str) -> "Trie":
        node = self
        for ch in prefix:
            ch = ord(ch) - ord("a")
            if not node.children[ch]:
                return None
            node = node.children[ch]
        return node

    def insert(self, word: str) -> None:
        node = self
        for ch in word:
            ch = ord(ch) - ord("a")
            if not node.children[ch]:
                node.children[ch] = Trie()
            node = node.children[ch]
        node.isEnd = True

    def search(self, word: str) -> bool:
        node = self.searchPrefix(word)
        return node is not None and node.isEnd

    def startsWith(self, prefix: str) -> bool:
        return self.searchPrefix(prefix) is not None

Improved Version (Insert, Delete, Update, Lookup)

Structure

typedef struct Trie {
    int path;
    int end;
    struct Trie* children[ALPHABET_SIZE];
} Trie;

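In this Trie struct, the new path field records how many paths pass through the current node. More precisely, path is the number of inserted words (duplicates included) whose path from the root passes through this node, which is the number of stored words that start with the prefix ending at this node. This is what lets us count the occurrences of a prefix quickly. The end field replaces isEnd and counts how many inserted words end at this node, so the same word can be stored more than once.

As an example, suppose the trie stores the words {"ab", "abc", "abd", "abcd", "bcd"}. The corresponding trie looks like this: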

        root
        /  \
       a    b
      /      \
     b        c
    / \        \
   c   d        d
  /          
 d

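For example, the node 'c' on the left branch has path = 2, because two stored words start with the prefix ending at that node: "abc" and "abcd".

With the path field, counting the occurrences of a prefix takes a single walk along the prefix, with no need to traverse the whole trie. This is useful for features such as autocomplete and word-frequency statistics.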
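
The path values in this example can be checked with a small Python sketch. It is only an illustration of the counting idea under the same lowercase a-z assumption. The class name CountingTrie and the method name count_prefix are invented here and do not appear in the C code in this post.

```python
class CountingTrie:
    def __init__(self):
        self.path = 0                # inserted words whose path passes through this node
        self.end = 0                 # inserted words that end exactly at this node
        self.children = [None] * 26  # lowercase a-z only (assumption)

    def insert(self, word: str) -> None:
        node = self
        for ch in word:
            i = ord(ch) - ord("a")
            if node.children[i] is None:
                node.children[i] = CountingTrie()
            node = node.children[i]
            node.path += 1           # one more word passes through this node
        node.end += 1

    def count_prefix(self, prefix: str) -> int:
        """Number of inserted words that start with prefix."""
        node = self
        for ch in prefix:
            node = node.children[ord(ch) - ord("a")]
            if node is None:
                return 0
        return node.path


trie = CountingTrie()
for w in ["ab", "abc", "abd", "abcd", "bcd"]:
    trie.insert(w)

print(trie.count_prefix("ab"))   # 4: ab, abc, abd, abcd
print(trie.count_prefix("abc"))  # 2: abc, abcd
print(trie.count_prefix("b"))    # 1: bcd
print(trie.count_prefix("x"))    # 0: no stored word starts with x
```

The cost of a prefix count is proportional to the length of the prefix, regardless of how many words are stored.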

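Deleting a String

First confirm that the string is stored in the trie, for example by searching for it. If it is not stored, return false without changing anything.
Otherwise, start at the root and follow the characters of the string in order. For the child node of each character:
- Decrease the child's path by 1, because one fewer word now passes through it.
- If path drops to 0, no remaining word uses this child, so the child and everything below it can be freed and the parent's pointer cleared. The deletion is complete, so return true.
- If path is still greater than 0, other words share this node. Move to the child and continue with the next character.
If the last character is processed without freeing anything, decrease the end count of the final node by 1 to record that one fewer copy of the string ends there, then return true.

This way the deletion always moves along the path of the string and updates the path and end fields as it goes, which keeps the trie consistent.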
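
Deletion with path and end counters can be sketched in Python as follows. This is a simplified illustration, not a translation of the C code: children are kept in a dict for brevity, the class name CountingTrie is chosen for the example, and pruning relies on Python's garbage collector where C needs an explicit recursive free.

```python
class CountingTrie:
    def __init__(self):
        self.path = 0       # inserted words passing through this node
        self.end = 0        # inserted words ending at this node
        self.children = {}  # char -> CountingTrie (dict used for brevity)

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, CountingTrie())
            node.path += 1
        node.end += 1

    def search(self, word):
        node = self
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.end > 0

    def delete(self, word):
        """Remove one occurrence of word; return False if it is not stored."""
        if not self.search(word):
            return False
        node = self
        for ch in word:
            child = node.children[ch]
            child.path -= 1
            if child.path == 0:
                # No remaining word uses this child, so drop its whole subtree.
                del node.children[ch]
                return True
            node = child
        node.end -= 1
        return True


trie = CountingTrie()
for w in ["abc", "abc", "abcd"]:
    trie.insert(w)

print(trie.delete("abc"), trie.search("abc"))    # True True  (one copy remains)
print(trie.delete("abc"), trie.search("abc"))    # True False
print(trie.delete("abcd"), trie.search("abcd"))  # True False (whole path pruned)
print(trie.delete("abcd"))                       # False: nothing left to delete
print(trie.children)                             # {}
```

Checking with search before modifying anything matters: without it, a failed deletion could leave path counters decremented along part of the path.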

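Updating a String

Delete the old string first, then insert the new one. (This may not be the most efficient approach; I have not yet found a better one.)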

Implementation

C

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

#define ALPHABET_SIZE 26

// Trie node: path = words passing through this node, end = words ending here
typedef struct Trie {
    int path;
    int end;
    struct Trie* children[ALPHABET_SIZE];
} Trie;

// Create a new trie node (returns NULL if allocation fails)
Trie* createNode() {
    Trie* node = (Trie*)malloc(sizeof(Trie));
    if (node) {
        node->path = 0;
        node->end = 0;
        for (int i = 0; i < ALPHABET_SIZE; i++) {
            node->children[i] = NULL;
        }
    }
    return node;
}

// Insert a word into the trie (assumes the word contains only lowercase a-z)
void insert(Trie* root, const char* word) {
    if (!root || !word) return;

    Trie* node = root;
    while (*word != '\0') {
        int index = *word - 'a';
        if (!node->children[index]) {
            node->children[index] = createNode();
        }
        node = node->children[index];
        node->path++;
        word++;
    }
    node->end++;
}

// Check whether a word is stored in the trie
bool search(Trie* root, const char* word) {
    if (!root || !word) return false;

    Trie* node = root;
    while (*word != '\0') {
        int index = *word - 'a';
        if (!node->children[index]) {
            return false;
        }
        node = node->children[index];
        word++;
    }
    return node->end != 0;
}

// Count how many stored words start with the given prefix
int preSearch(Trie* root, const char* pre) {
    if (!root || !pre) return 0;

    Trie* node = root;
    while (*pre != '\0') {
        int index = *pre - 'a';
        if (!node->children[index]) {
            return 0;
        }
        node = node->children[index];
        pre++;
    }
    return node->path;
}

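void trieFree(Trie* root);  // defined below; declared here so delete() can call it

// Delete a word from the trie; if the word was inserted several times, remove only one copy
bool delete(Trie* root, const char* word) {
    if (!root || !word) return false;
    if (!search(root, word)) return false;

    Trie* node = root;
    while (*word != '\0') {
        int index = *word - 'a';
        if (node->children[index]->path-- == 1) {
            // No other word passes through this child: free its whole subtree,
            // not just the node, so that its descendants are not leaked
            trieFree(node->children[index]);
            node->children[index] = NULL;
            return true;
        }
        node = node->children[index];
        word++;
    }
    node->end--;
    return true;
}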

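// Replace one stored word with another; does nothing if the old word is not stored
void update(Trie* root, const char* old, const char* new) {
    // Delete the old word first
    if (!delete(root, old)) return;
    // Then insert the new word
    insert(root, new);
}

// Free the whole trie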
void trieFree(Trie* root) {
    for (int i = 0; i < ALPHABET_SIZE; ++i) {
        if (root->children[i]) {
            trieFree(root->children[i]);
        }
    }
    free(root);
}

Example usage:

int main() {
    Trie* root = createNode();

    // Test words
    const char* str1 = "ab";
    const char* str2 = "abc";
    const char* str3 = "abd";
    const char* str4 = "abcd";
    const char* str5 = "bcd";

    insert(root, str1);
    insert(root, str2);
    insert(root, str2);  // "abc" is inserted twice on purpose
    insert(root, str3);
    insert(root, str4);
    insert(root, str5);

    printf("%s is in the trie: %s\n", str2, search(root, str2) ? "yes" : "no");

    // Only one of the two copies of "abc" is removed, so search() still finds it;
    // report the outcome from the return value of delete() instead
    printf("delete %s: %s\n", str2, delete(root, str2) ? "succeeded" : "failed");
    printf("%s is in the trie: %s\n", str2, search(root, str2) ? "yes" : "no");

    printf("words with prefix %s: %d\n", str1, preSearch(root, str1));
    printf("words with prefix %s: %d\n", str2, preSearch(root, str2));

    printf("delete %s: %s\n", str3, delete(root, str3) ? "succeeded" : "failed");

    printf("words with prefix %s: %d\n", str1, preSearch(root, str1));
    printf("words with prefix %s: %d\n", str2, preSearch(root, str2));

    update(root, str2, str5);
    printf("%s is in the trie: %s\n", str2, search(root, str2) ? "yes" : "no");
    printf("words with prefix %s: %d\n", str5, preSearch(root, str5));

    // Free the trie
    trieFree(root);

    return 0;
}

posted @ 2024-04-22 14:44 岸南
