Trie

Trie in Java

1. What Is a Trie?

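A trie, also called a prefix tree or word-search tree, is a tree-shaped data structure for efficiently storing and retrieving the keys of a set of strings. Its core idea is sharing common prefixes, which reduces redundant storage and speeds up search.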

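2. Core Properties of a Trie

Prefix sharing: strings with the same prefix share the same path in the tree.
Fast lookup: the cost of a search depends only on the key length L, not on how many keys are stored.
Multi-pattern matching: prefix matching comes naturally, and the structure extends to wildcard and fuzzy search.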
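To make prefix sharing concrete, here is a minimal sketch. The class and method names (PrefixSharingDemo, insert, countNodes) are hypothetical, and it uses a HashMap-based node instead of the array node shown in section 4. It inserts three words that share the prefix "car" and counts the nodes created.

```java
import java.util.HashMap;
import java.util.Map;

public class PrefixSharingDemo {
    // Minimal node for this sketch: children keyed by character
    static class Node {
        Map<Character, Node> children = new HashMap<>();
        boolean isEnd;
    }

    // Insert a word, creating only the nodes not already on its path
    static void insert(Node root, String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isEnd = true;
    }

    // Count all nodes below the given node
    static int countNodes(Node node) {
        int count = 0;
        for (Node child : node.children.values()) {
            count += 1 + countNodes(child);
        }
        return count;
    }

    public static void main(String[] args) {
        Node root = new Node();
        int totalChars = 0;
        for (String w : new String[] {"car", "cart", "care"}) {
            insert(root, w);
            totalChars += w.length();
        }
        // prints "characters: 11, nodes: 5"
        System.out.println("characters: " + totalChars + ", nodes: " + countNodes(root));
    }
}
```

The three words contain 11 characters in total, but only 5 nodes are created below the root, because the path c-a-r is stored once and shared.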

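3. Trie Structure

Each node consists of two parts:

Child array: usually sized to the alphabet (e.g., 26 for lowercase English letters).
End flag: marks whether the current node is the end of some string.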

4. Implementing a Basic Trie in Java

4.1 Node Definition

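class TrieNode {
    // Package-private (not private) so that the separate Trie class can access these fields
    TrieNode[] children;  // child array, one slot per letter
    boolean isEnd;        // whether this node ends a word

    TrieNode() {
        children = new TrieNode[26]; // assumes lowercase letters 'a'-'z' only
        isEnd = false;
    }
}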

4.2 Insertion

class Trie {
    private TrieNode root;

    public Trie() {
        root = new TrieNode();
    }

    // Insert a word
    public void insert(String word) {
        TrieNode node = root;
        for (char c : word.toCharArray()) {
            int index = c - 'a'; // maps 'a'..'z' to 0..25; other characters would go out of bounds
            if (node.children[index] == null) {
                node.children[index] = new TrieNode();
            }
            node = node.children[index];
        }
        node.isEnd = true; // mark the end of the word
    }
}

4.3 Search

The following methods are members of the Trie class above.

public boolean search(String word) {
    TrieNode node = searchPrefix(word);
    return node != null && node.isEnd;
}

// Follow the path for prefix; return its last node, or null if the path breaks off
private TrieNode searchPrefix(String prefix) {
    TrieNode node = root;
    for (char c : prefix.toCharArray()) {
        int index = c - 'a';
        if (node.children[index] == null) {
            return null;
        }
        node = node.children[index];
    }
    return node;
}

4.4 Prefix Check

public boolean startsWith(String prefix) {
    return searchPrefix(prefix) != null;
}

5. Time Complexity

Operation	Time complexity
Insert	O(L), where L is the string length
Search	O(L)
Prefix check	O(L)

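6. Classic Applications of Tries

6.1 Autocomplete

Principle: given a prefix, walk to the node for that prefix, then traverse its subtree and collect every word that starts with the prefix.

Code snippet (methods of the Trie class):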

// Requires: import java.util.ArrayList; import java.util.List;
public List<String> getSuggestions(String prefix) {
    List<String> suggestions = new ArrayList<>();
    TrieNode node = searchPrefix(prefix); // null if no word has this prefix
    dfs(node, new StringBuilder(prefix), suggestions);
    return suggestions;
}

// Depth-first traversal in alphabetical order; path holds the characters from the root to node
private void dfs(TrieNode node, StringBuilder path, List<String> result) {
    if (node == null) return;
    if (node.isEnd) {
        result.add(path.toString());
    }
    for (char c = 'a'; c <= 'z'; c++) {
        int index = c - 'a';
        if (node.children[index] != null) {
            path.append(c);
            dfs(node.children[index], path, result);
            path.deleteCharAt(path.length() - 1); // backtrack
        }
    }
}

6.2 Word Frequency Counting

Modified node: add a counter to each node.

class TrieNode {
    TrieNode[] children = new TrieNode[26];
    int count; // number of times the word ending at this node was inserted
}

// Update count during insertion
public void insert(String word) {
    // ...original logic...
    node.count++; // increment the count at the word's final node
}
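A self-contained sketch of the counting variant, assuming lowercase 'a'-'z' input as in section 4. The class name WordFrequencyTrie and the query method countOf are hypothetical; here count > 0 also serves as the end-of-word marker, so no separate isEnd flag is needed.

```java
public class WordFrequencyTrie {
    // Node with a counter instead of a boolean end flag (assumes 'a'-'z' only)
    private static class Node {
        Node[] children = new Node[26];
        int count; // how many times a word ending here was inserted
    }

    private final Node root = new Node();

    public void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            int index = c - 'a';
            if (node.children[index] == null) {
                node.children[index] = new Node();
            }
            node = node.children[index];
        }
        node.count++; // increment at the word's final node
    }

    // Hypothetical helper: how many times word was inserted (0 if never)
    public int countOf(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children[c - 'a'];
            if (node == null) {
                return 0; // path breaks off: word never inserted
            }
        }
        return node.count; // 0 if word is only a prefix of inserted words
    }

    public static void main(String[] args) {
        WordFrequencyTrie trie = new WordFrequencyTrie();
        for (String w : "the cat saw the other cat and the dog".split(" ")) {
            trie.insert(w);
        }
        System.out.println(trie.countOf("the")); // 3
        System.out.println(trie.countOf("cat")); // 2
        System.out.println(trie.countOf("ca"));  // 0: only a prefix
    }
}
```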

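6.3 Sensitive-Word Filtering

Principle: build a trie from the list of sensitive words, then scan the text against it to quickly detect whether the text contains any of them.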
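A minimal sketch of that scan, with hypothetical names (SensitiveWordFilter, addWord, containsAny, mask). From every text position it walks the trie as far as the characters allow and records the longest sensitive word found, so the cost is O(n·k) for text length n and longest word length k. A single-pass alternative is an Aho-Corasick automaton, which is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

public class SensitiveWordFilter {
    private static class Node {
        Map<Character, Node> children = new HashMap<>();
        boolean isEnd; // a sensitive word ends here
    }

    private final Node root = new Node();

    public void addWord(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isEnd = true;
    }

    // Length of the longest sensitive word starting at text[start], or 0 if none
    private int longestMatchAt(String text, int start) {
        Node node = root;
        int longest = 0;
        for (int j = start; j < text.length(); j++) {
            node = node.children.get(text.charAt(j));
            if (node == null) {
                break; // no sensitive word continues with this character
            }
            if (node.isEnd) {
                longest = j - start + 1;
            }
        }
        return longest;
    }

    public boolean containsAny(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (longestMatchAt(text, i) > 0) {
                return true;
            }
        }
        return false;
    }

    // Replace every character covered by a sensitive word with '*'
    public String mask(String text) {
        char[] out = text.toCharArray();
        for (int i = 0; i < text.length(); i++) {
            int len = longestMatchAt(text, i);
            for (int k = i; k < i + len; k++) {
                out[k] = '*';
            }
        }
        return new String(out);
    }

    public static void main(String[] args) {
        SensitiveWordFilter filter = new SensitiveWordFilter();
        filter.addWord("bad");
        filter.addWord("badword");
        System.out.println(filter.containsAny("a good day"));   // false
        System.out.println(filter.mask("this badword is bad")); // this ******* is ***
    }
}
```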

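7. Trie Variants and Optimizations

7.1 Compressed Trie (Radix Tree)

Core idea: merge each chain of single-child nodes into one edge labeled with a string, reducing the height of the tree.
Use cases: long strings that share many common prefixes (e.g., URL routing).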
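A minimal radix-tree sketch, assuming string edge labels with each node's children keyed by the first character of the child's label. The names RadixTree, insert, contains and size are hypothetical, and deletion is omitted. On a partial match, insert splits the existing edge at the point where the labels diverge.

```java
import java.util.HashMap;
import java.util.Map;

public class RadixTree {
    private static class Node {
        String label; // edge label from the parent to this node
        Map<Character, Node> children = new HashMap<>(); // keyed by first char of child label
        boolean isEnd;
        Node(String label) { this.label = label; }
    }

    private final Node root = new Node("");

    public void insert(String word) {
        Node node = root;
        int i = 0;
        while (i < word.length()) {
            Node child = node.children.get(word.charAt(i));
            if (child == null) {
                // No edge starts with this character: put the whole remaining suffix on one edge
                Node leaf = new Node(word.substring(i));
                leaf.isEnd = true;
                node.children.put(word.charAt(i), leaf);
                return;
            }
            // Length of the common prefix of the child's label and the rest of the word
            String label = child.label;
            int j = 0;
            while (j < label.length() && i + j < word.length()
                    && label.charAt(j) == word.charAt(i + j)) {
                j++;
            }
            if (j < label.length()) {
                // Partial match: split the edge into label[0, j) and label[j, end)
                Node mid = new Node(label.substring(0, j));
                child.label = label.substring(j);
                mid.children.put(child.label.charAt(0), child);
                node.children.put(mid.label.charAt(0), mid);
                child = mid;
            }
            node = child;
            i += j;
        }
        node.isEnd = true;
    }

    public boolean contains(String word) {
        Node node = root;
        int i = 0;
        while (i < word.length()) {
            Node child = node.children.get(word.charAt(i));
            if (child == null || !word.startsWith(child.label, i)) {
                return false; // no edge, or the edge label does not match the rest of the word
            }
            node = child;
            i += child.label.length();
        }
        return node.isEnd;
    }

    // Number of nodes below the root
    public int size() { return countNodes(root); }

    private static int countNodes(Node n) {
        int count = 0;
        for (Node child : n.children.values()) {
            count += 1 + countNodes(child);
        }
        return count;
    }

    public static void main(String[] args) {
        RadixTree tree = new RadixTree();
        for (String w : new String[] {"romane", "romanus", "romulus"}) {
            tree.insert(w);
        }
        System.out.println(tree.contains("romanus")); // true
        System.out.println(tree.contains("roman"));   // false: only a shared prefix
        System.out.println(tree.size());              // 5: rom, an, e, us, ulus
    }
}
```

For these three words an uncompressed trie would need 12 nodes below the root, while the radix tree needs 5.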

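7.2 Double-Array Trie

Advantage: encodes the tree in two integer arrays (usually called base and check), greatly reducing memory usage.
Disadvantage: complex to implement, and insertions and deletions are slow because entries may have to be relocated.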

7.3 Tries over Arbitrary Character Sets

Improvement: replace the child array with a Map<Character, TrieNode>.

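// Requires: import java.util.HashMap; import java.util.Map;
class TrieNode {
    Map<Character, TrieNode> children = new HashMap<>(); // entries only for characters actually present
    boolean isEnd;
}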
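A minimal sketch of the Map-based node in use, with hypothetical names (UnicodeTrie, find). Because children are looked up by key rather than by c - 'a', Chinese characters, uppercase letters and digits all work without any index mapping.

```java
import java.util.HashMap;
import java.util.Map;

public class UnicodeTrie {
    static class TrieNode {
        Map<Character, TrieNode> children = new HashMap<>();
        boolean isEnd;
    }

    private final TrieNode root = new TrieNode();

    public void insert(String word) {
        TrieNode node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new TrieNode());
        }
        node.isEnd = true;
    }

    // Node reached by following prefix, or null if the path breaks off
    private TrieNode find(String prefix) {
        TrieNode node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return null;
            }
        }
        return node;
    }

    public boolean search(String word) {
        TrieNode node = find(word);
        return node != null && node.isEnd;
    }

    public boolean startsWith(String prefix) {
        return find(prefix) != null;
    }

    public static void main(String[] args) {
        UnicodeTrie trie = new UnicodeTrie();
        trie.insert("字典树");
        trie.insert("字典");
        trie.insert("Trie");
        System.out.println(trie.search("字典"));    // true
        System.out.println(trie.search("字"));      // false
        System.out.println(trie.startsWith("字"));  // true
        System.out.println(trie.search("trie"));    // false: matching is case-sensitive
    }
}
```

Java's char is a UTF-16 code unit, so characters outside the Basic Multilingual Plane (most emoji, for example) are stored as two edges. Matching whole words still works, but a variant keyed on word.codePoints() with a Map<Integer, TrieNode> would keep one edge per character.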

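8. Comparison with Other Data Structures

Data structure	Insert/lookup time	Best suited for
Hash table	O(1) on average (plus O(L) to hash a string key)	Exact-match lookup
Red-black tree	O(log n) key comparisons	Ordered data, range queries
Trie	O(L)	Prefix matching, multi-pattern search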

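9. In Practice: A Trie with Wildcard Support

Problem Statement

Design a trie that supports the wildcard character '.', which matches any single character (similar to LeetCode 211).

Implementation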

import java.util.*;

class TrieNode {
    Map<Character, TrieNode> children = new HashMap<>();
    boolean isEndOfWord; // whether this node ends a complete word
}

public class Trie {
    private final TrieNode root;

    public Trie() {
        root = new TrieNode();
    }

    // Insert a word
    public void insert(String word) {
        TrieNode current = root;
        for (char c : word.toCharArray()) {
            // Create the child node only if this character is missing
            current = current.children.computeIfAbsent(c, k -> new TrieNode());
        }
        current.isEndOfWord = true; // mark the end of the word
    }

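    // Check whether a word exists; '.' matches any single character
    public boolean search(String word) {
        return search(root, word, 0);
    }

    private boolean search(TrieNode current, String word, int index) {
        if (index == word.length()) {
            return current.isEndOfWord; // only reaching the end of a complete word counts as a match
        }
        char c = word.charAt(index);
        if (c == '.') {
            // Wildcard: try every child and succeed if any branch matches the rest
            for (TrieNode child : current.children.values()) {
                if (search(child, word, index + 1)) {
                    return true;
                }
            }
            return false;
        }
        TrieNode next = current.children.get(c);
        return next != null && search(next, word, index + 1); // missing character: word not in the trie
    }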

    // Check whether any inserted word starts with prefix
    public boolean startsWith(String prefix) {
        TrieNode current = root;
        for (char c : prefix.toCharArray()) {
            if (!current.children.containsKey(c)) {
                return false;
            }
            current = current.children.get(c);
        }
        return true; // every character of the prefix matched
    }

    // Delete a word (recursively); words that are not present are ignored
    public void delete(String word) {
        delete(root, word, 0);
    }

    // Returns true if the caller may remove current from its parent's children
    private boolean delete(TrieNode current, String word, int index) {
        if (index == word.length()) {
            if (!current.isEndOfWord) {
                return false; // the word is not in the trie
            }
            current.isEndOfWord = false; // clear the end-of-word mark
            return current.children.isEmpty(); // a node with no children can be removed by its parent
        }

        char c = word.charAt(index);
        TrieNode nextNode = current.children.get(c);
        if (nextNode == null) {
            return false; // character missing: nothing to delete
        }

        boolean shouldDelete = delete(nextNode, word, index + 1);

        // After the recursive call returns, check whether the child can be removed
        if (shouldDelete) {
            current.children.remove(c);
            // This node can go too only if it now has no children and does not end another word
            return current.children.isEmpty() && !current.isEndOfWord;
        }

        return false;
    }

    public static void main(String[] args) {
        Trie trie = new Trie();
        Scanner sc = new Scanner(System.in);
        int n = sc.nextInt(); // number of words to insert
        int m = sc.nextInt(); // number of words to query
        int d = sc.nextInt(); // number of words to delete
        sc.nextLine(); // consume the rest of the first line

        // Insert words
        for (int i = 0; i < n; i++) {
            String word = sc.nextLine();
            trie.insert(word);
        }

        // Query words
        for (int i = 0; i < m; i++) {
            String word = sc.nextLine();
            System.out.println(trie.search(word) ? "Y" : "N");
        }

        // Delete words
        for (int i = 0; i < d; i++) {
            String word = sc.nextLine();
            trie.delete(word);
        }

        // Query again: reads m more words from the input
        System.out.println("After deletion:");
        for (int i = 0; i < m; i++) {
            String word = sc.nextLine();
            System.out.println(trie.search(word) ? "Y" : "N");
        }

        sc.close();
    }
}

Posted on 2025-03-05 20:50 by 咋还没来
