3-12 B-Tree

B-Tree

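A B-tree is a self-balancing multi-way search tree, proposed by Rudolf Bayer and Edward McCreight in 1970. In a binary search tree each node has at most two children. A B-tree node, by contrast, can hold multiple keys and multiple children, which makes the structure particularly well suited to organizing data on external storage such as disks.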

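The core design goal of the B-tree is to reduce the number of disk I/O operations. Because each node stores many keys, the whole tree becomes short and wide, and the path from the root to a leaf is short. Visiting one node corresponds to one disk read, so a lookup needs far fewer disk I/Os.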

A B-tree uses a minimum degree t (t ≥ 2) to define the allowed size of each node:

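  • Every node has at most 2t - 1 keys and, except for the root, at least t - 1 keys
  • Every internal node has at most 2t children and, except for the root, at least t children
  • All leaf nodes are at the same depth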

Below is an example B-tree with t = 2 (each node holds at most 3 keys, and every node except the root holds at least 1):

              [10 | 20]
             /    |     \
          [5]   [15]   [25 | 30]

  • The root [10 | 20] has 2 keys and 3 children
  • The leaves [5], [15], and [25 | 30] are all on the same level
  • The keys within each node are in ascending order

B-Tree Properties

A B-tree must satisfy the following properties:

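  1. All leaves are at the same depth. Every path from the root to a leaf has the same length, which keeps the cost of a lookup predictable.
  2. The root has at least 1 key (unless the tree is empty) and at most 2t - 1 keys. The root is the only node allowed to have fewer than t - 1 keys.
  3. Every non-root node has at least t - 1 and at most 2t - 1 keys. This keeps nodes from becoming too sparse, so the tree cannot degenerate into something like a linked list.
  4. An internal node with n keys has exactly n + 1 children. The n keys divide the key range into n + 1 intervals, one per child.
  5. The keys within a node are in ascending order, key[0] < key[1] < ... < key[n-1], so a node can be searched with either binary search or a linear scan.
  6. Keys and children are related as follows. Suppose a node has n keys k[0], k[1], ..., k[n-1] and n + 1 children c[0], c[1], ..., c[n]. Then every key x in subtree c[i] satisfies k[i-1] < x < k[i]. At the two ends the interval is open: keys in c[0] are less than k[0], and keys in c[n] are greater than k[n-1].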
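The following Python function is a sketch of how the six properties can be checked mechanically. It walks a tree and tests each property in turn. The Node class (where a node with no children counts as a leaf) and the name check_btree are illustrative assumptions, not part of the article's code. Applied to the sample tree drawn earlier, the checker accepts t = 2 but rejects t = 3, because that tree's one-key leaves fall below the t - 1 = 2 minimum.

```python
class Node:
    """Hypothetical minimal node: a node with no children is treated as a leaf."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

    @property
    def leaf(self):
        return not self.children


def check_btree(root, t):
    """Return True if the tree rooted at `root` satisfies properties 1-6 for minimum degree t."""
    leaf_depths = set()

    def visit(node, depth, lo, hi, is_root):
        n = len(node.keys)
        # properties 2 and 3: key-count bounds (the root may have as few as 1 key)
        min_keys = 1 if is_root else t - 1
        if not (min_keys <= n <= 2 * t - 1):
            return False
        # property 5: keys strictly increasing inside the node
        if any(a >= b for a, b in zip(node.keys, node.keys[1:])):
            return False
        # property 6: every key lies in the open interval (lo, hi) inherited from the parent
        if any((lo is not None and k <= lo) or (hi is not None and k >= hi) for k in node.keys):
            return False
        if node.leaf:
            leaf_depths.add(depth)  # property 1 is checked after the walk
            return True
        # property 4: an internal node with n keys has n + 1 children
        if len(node.children) != n + 1:
            return False
        bounds = [lo] + node.keys + [hi]
        return all(visit(c, depth + 1, bounds[i], bounds[i + 1], False)
                   for i, c in enumerate(node.children))

    return visit(root, 0, None, None, True) and len(leaf_depths) == 1


example = Node([10, 20], [Node([5]), Node([15]), Node([25, 30])])
print(check_btree(example, 2))  # True
print(check_btree(example, 3))  # False: [5] and [15] have fewer than t - 1 = 2 keys
```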

For t = 3, the concrete ranges are:

Parameter                     Minimum                     Maximum
Keys per node                 2 (non-root) or 1 (root)    5
Children per internal node    3 (non-root) or 2 (root)    6
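Every bound in the table follows directly from t. Below is a minimal Python helper that derives these bounds for any t ≥ 2. The name node_bounds is hypothetical, and the root's 1-key minimum assumes a non-empty tree.

```python
def node_bounds(t):
    """Return (min, max) bounds implied by minimum degree t, keyed by what they bound."""
    if t < 2:
        raise ValueError("minimum degree t must be at least 2")
    return {
        "keys per non-root node": (t - 1, 2 * t - 1),
        "keys in the root": (1, 2 * t - 1),             # assumes a non-empty tree
        "children per non-root internal node": (t, 2 * t),
        "children of an internal root": (2, 2 * t),
    }


for t in (2, 3):
    print(f"t = {t}")
    for what, (lo, hi) in node_bounds(t).items():
        print(f"  {what:<36} min {lo}, max {hi}")
```

For t = 3 the output reproduces the table above. For t = 2 the result is the familiar 2-3-4 tree, with 1 to 3 keys and 2 to 4 children per internal node.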

Node Definition

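A B-tree node differs from a binary search tree node. It must store several keys and several child pointers, and it must also record how many keys it currently holds and whether it is a leaf.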

C++ Node Definition

#include <iostream>
#include <vector>
using namespace std;

// B-Tree node definition
struct BTreeNode {
    vector<int> keys;              // sorted keys in this node
    vector<BTreeNode*> children;   // child pointers
    int t;                         // minimum degree
    bool leaf;                     // true if leaf node

    // constructor
    BTreeNode(int _t, bool _leaf) : t(_t), leaf(_leaf) {}

    // utility: check if node is full (has 2t-1 keys)
    bool isFull() const {
        // cast avoids a signed/unsigned comparison between size() and int
        return (int)keys.size() == 2 * t - 1;
    }
};

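The C++ version stores keys and child pointers in vector dynamic arrays, so the current key count is simply keys.size(). The t and leaf fields control node behavior, and isFull() checks whether the node is full.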

C Node Definition

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

// B-Tree node definition
typedef struct BTreeNode {
    int* keys;                    // array of keys
    struct BTreeNode** children;  // array of child pointers
    int t;                        // minimum degree
    int n;                        // current number of keys
    bool leaf;                    // true if leaf node
} BTreeNode;

// create a new node
BTreeNode* createNode(int t, bool leaf) {
    BTreeNode* node = (BTreeNode*)malloc(sizeof(BTreeNode));
    node->t = t;
    node->leaf = leaf;
    node->n = 0;
    node->keys = (int*)malloc((2 * t - 1) * sizeof(int));
    node->children = (BTreeNode**)malloc(2 * t * sizeof(BTreeNode*));
    for (int i = 0; i < 2 * t; i++)
        node->children[i] = NULL;
    return node;
}

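The C version allocates arrays with a fixed capacity (2t-1 keys and 2t child pointers) and keeps an extra field n for the current number of keys. createNode allocates the memory and initializes the node, setting every child pointer to NULL.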

Python Node Definition

class BTreeNode:
    def __init__(self, t, leaf):
        # keys and children stored as lists
        self.keys = []          # sorted keys in this node
        self.children = []      # child pointers
        self.t = t              # minimum degree
        self.leaf = leaf        # true if leaf node

    def is_full(self):
        # check if node has maximum number of keys
        return len(self.keys) == 2 * self.t - 1

The Python version stores keys and children in lists and is the most concise. The is_full() method checks whether the node is full, meaning it holds 2t-1 keys.

Go Node Definition

package main

// BTreeNode represents a node in the B-Tree
type BTreeNode struct {
    keys     []int        // sorted keys in this node
    children []*BTreeNode // child pointers
    t        int          // minimum degree
    leaf     bool         // true if leaf node
}

// newBTreeNode creates a new B-Tree node
func newBTreeNode(t int, leaf bool) *BTreeNode {
    return &BTreeNode{
        keys:     []int{},
        children: []*BTreeNode{},
        t:        t,
        leaf:     leaf,
    }
}

// isFull checks if node has maximum number of keys (2t-1)
func (n *BTreeNode) isFull() bool {
    return len(n.keys) == 2*n.t-1
}

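The Go version stores keys and child pointers in slices. The isFull() method checks whether the node is full, and the factory function newBTreeNode creates and initializes a node.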


Search

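Searching a B-tree is similar to searching a binary search tree. The difference is that the search must look through several keys within each node. The steps are: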

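  1. Scan the keys of the current node from the first one and find the first position i whose key is greater than or equal to the target
  2. If keys[i] equals the target, the search succeeds
  3. Otherwise, if the current node is a leaf, the search fails
  4. Otherwise, recursively search child c[i]

Searching for 15: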

              [10 | 20]          ← 15 > 10, keep scanning; 15 < 20, stop at i = 1 and descend into c[1]
             /    |     \
          [5]   [15]   [25 | 30]
                  ↑ 15 is found in this node; the search succeeds

C++ Search Implementation

// search for key k in subtree rooted at this node
BTreeNode* search(BTreeNode* node, int k, int* index) {
    int i = 0;
    // find first key >= k
    while (i < (int)node->keys.size() && k > node->keys[i])
        i++;

    // found the key
    if (i < (int)node->keys.size() && node->keys[i] == k) {
        *index = i;
        return node;
    }

    // key not found and this is a leaf
    if (node->leaf)
        return nullptr;

    // recurse into appropriate child
    return search(node->children[i], k, index);
}

C Search Implementation

// search for key k in subtree rooted at node
BTreeNode* search(BTreeNode* node, int k, int* index) {
    int i = 0;
    // find first key >= k
    while (i < node->n && k > node->keys[i])
        i++;

    // found the key
    if (i < node->n && node->keys[i] == k) {
        *index = i;
        return node;
    }

    // key not found and this is a leaf
    if (node->leaf)
        return NULL;

    // recurse into appropriate child
    return search(node->children[i], k, index);
}

Python Search Implementation

def search(node, k):
    # find first key >= k
    i = 0
    while i < len(node.keys) and k > node.keys[i]:
        i += 1

    # found the key
    if i < len(node.keys) and node.keys[i] == k:
        return node, i

    # key not found and this is a leaf
    if node.leaf:
        return None, -1

    # recurse into appropriate child
    return search(node.children[i], k)

Go Search Implementation

// search searches for key k in subtree rooted at node
func search(node *BTreeNode, k int) (*BTreeNode, int) {
    if node == nil {
        return nil, -1
    }
    i := 0
    // find first key >= k
    for i < len(node.keys) && k > node.keys[i] {
        i++
    }

    // found the key
    if i < len(node.keys) && node.keys[i] == k {
        return node, i
    }

    // key not found and this is a leaf
    if node.leaf {
        return nil, -1
    }

    // recurse into appropriate child
    return search(node.children[i], k)
}

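At each level the search scans at most 2t - 1 keys, and the tree height is O(log_t n), so the overall time complexity is O(t * log_t n). In practice t is a fixed constant (for example 50 to 100), so searches are very efficient.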
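Property 5 notes that a node can also be searched with binary search. The following Python sketch implements that variant. It uses the standard bisect.bisect_left to find the first key >= k within a node in O(log t) time, which brings the total cost down to O(log t * log_t n) = O(log n). The simplified Node class and the function name search_bisect are hypothetical. As an extra, the function counts visited nodes, which in the disk model above equals the number of block reads.

```python
from bisect import bisect_left


class Node:
    """Hypothetical minimal node: a node with no children is treated as a leaf."""
    def __init__(self, keys, children=None):
        self.keys = keys                   # sorted keys
        self.children = children or []     # empty list for a leaf


def search_bisect(node, k):
    """Return (node, index, nodes_visited); node is None and index is -1 if k is absent."""
    visited = 0
    while node is not None:
        visited += 1                       # one node read ~ one disk I/O
        i = bisect_left(node.keys, k)      # first position with keys[i] >= k, in O(log t)
        if i < len(node.keys) and node.keys[i] == k:
            return node, i, visited
        if not node.children:              # reached a leaf without finding k
            return None, -1, visited
        node = node.children[i]            # descend into the child covering k
    return None, -1, visited


root = Node([10, 20], [Node([5]), Node([15]), Node([25, 30])])
print(search_bisect(root, 15)[1:])  # (0, 2): found at index 0 after reading 2 nodes
print(search_bisect(root, 26)[1:])  # (-1, 2): absent, still only 2 nodes read
```

The loop is iterative rather than recursive, but it follows exactly the same root-to-leaf path as the recursive versions above.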


Split

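Splitting is the core operation that keeps a B-tree balanced. When a node is full (it holds 2t - 1 keys), it is split into two nodes with t - 1 keys each, and its middle key moves up into the parent.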

Steps:

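  1. Take the middle key of the full node, keys[t-1] (the t-th key, at index t-1)
  2. Create a new node and move the right half of the original node's keys into it, together with the right half of its children if it is an internal node
  3. Insert the middle key into the parent
  4. Insert the new node into the parent as a new child, immediately to the right of the original node

Splitting a full node (t = 3, 5 keys):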

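Before the split:                 After the split:
  parent: [30]                      parent: [20 | 30]
           |                               /    |
  child:  [10|15|20|25|28]          [10|15]  [25|28]
                 ↑
     middle key 20 moves up into the parent

(The parent's child to the right of 30 is unaffected and omitted.)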
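The following Python sketch runs the split on the exact keys from the diagram. It assumes a minimal Node class in which a node with no children is a leaf. The parent's right child [35|40] is a hypothetical addition, included only so that the parent has a complete set of children.

```python
class Node:
    """Hypothetical minimal node: a node with no children is treated as a leaf."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def split_child(x, i, t):
    """Split the full child x.children[i] (2t-1 keys) around its middle key keys[t-1]."""
    y = x.children[i]
    assert len(y.keys) == 2 * t - 1, "only a full node can be split"
    mid = y.keys[t - 1]
    z = Node(y.keys[t:], y.children[t:])                 # right half: t-1 keys, t children
    y.keys, y.children = y.keys[:t - 1], y.children[:t]  # left half stays in y
    x.children.insert(i + 1, z)                          # z becomes y's right neighbour
    x.keys.insert(i, mid)                                # middle key moves up


full = Node([10, 15, 20, 25, 28])
right = Node([35, 40])                 # hypothetical sibling right of 30
parent = Node([30], [full, right])
split_child(parent, 0, 3)
print(parent.keys)                          # [20, 30]
print([c.keys for c in parent.children])    # [[10, 15], [25, 28], [35, 40]]
```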

C++ Split Implementation

// split child y of node x at index i
// y must be full (has 2t-1 keys) when this is called
void splitChild(BTreeNode* x, int i, BTreeNode* y) {
    // create new node z with same t and leaf status as y
    BTreeNode* z = new BTreeNode(y->t, y->leaf);

    // copy the last (t-1) keys from y to z
    for (int j = 0; j < y->t - 1; j++)
        z->keys.push_back(y->keys[j + y->t]);

    // copy the last t children from y to z (if not leaf)
    if (!y->leaf) {
        for (int j = 0; j < y->t; j++)
            z->children.push_back(y->children[j + y->t]);
    }

    // save the middle key before trimming y
    int midKey = y->keys[y->t - 1];

    // keep only the first (t-1) keys and, for an internal node, the first t children in y
    y->keys.resize(y->t - 1);
    if (!y->leaf)
        y->children.resize(y->t);

    // insert z as a child of x at position i+1
    x->children.insert(x->children.begin() + i + 1, z);

    // move the middle key of y up into x at position i
    x->keys.insert(x->keys.begin() + i, midKey);
}

C Split Implementation

// split child y of node x at index i
// y must be full (has 2t-1 keys) when this is called
void splitChild(BTreeNode* x, int i, BTreeNode* y) {
    // create new node z with same t and leaf status as y
    BTreeNode* z = createNode(y->t, y->leaf);
    z->n = y->t - 1;

    // copy the last (t-1) keys from y to z
    for (int j = 0; j < y->t - 1; j++)
        z->keys[j] = y->keys[j + y->t];

    // copy the last t children from y to z (if not leaf)
    if (!y->leaf) {
        for (int j = 0; j < y->t; j++)
            z->children[j] = y->children[j + y->t];
    }

    // update y's key count
    y->n = y->t - 1;

    // shift x's children right to make room for z
    for (int j = x->n; j >= i + 1; j--)
        x->children[j + 1] = x->children[j];

    // insert z as a child of x at position i+1
    x->children[i + 1] = z;

    // shift x's keys right to make room for middle key
    for (int j = x->n - 1; j >= i; j--)
        x->keys[j + 1] = x->keys[j];

    // move middle key of y up to x
    x->keys[i] = y->keys[y->t - 1];
    x->n++;
}

Python Split Implementation

def split_child(x, i):
    # y is the full child to split
    y = x.children[i]
    t = y.t

    # create new node z with same t and leaf status as y
    z = BTreeNode(t, y.leaf)

    # copy the last (t-1) keys from y to z
    z.keys = y.keys[t:]       # keys[t] to keys[2t-2]
    mid_key = y.keys[t - 1]   # middle key to promote

    # copy the last t children from y to z (if not leaf)
    if not y.leaf:
        z.children = y.children[t:]

    # trim y to keep only first (t-1) keys and t children
    y.keys = y.keys[:t - 1]
    if not y.leaf:
        y.children = y.children[:t]

    # insert z as a child of x at position i+1
    x.children.insert(i + 1, z)

    # insert middle key into x at position i
    x.keys.insert(i, mid_key)

Go Split Implementation

// splitChild splits full child y of node x at index i
// y must be full (has 2t-1 keys) when this is called
func splitChild(x *BTreeNode, i int, y *BTreeNode) {
    // create new node z with same t and leaf status as y
    z := newBTreeNode(y.t, y.leaf)

    // copy the last (t-1) keys from y to z
    z.keys = append(z.keys, y.keys[y.t:]...)
    midKey := y.keys[y.t-1]

    // copy the last t children from y to z (if not leaf)
    if !y.leaf {
        z.children = append(z.children, y.children[y.t:]...)
    }

    // trim y to keep only first (t-1) keys and t children
    y.keys = y.keys[:y.t-1]
    if !y.leaf {
        y.children = y.children[:y.t]
    }

    // insert z as a child of x at position i+1
    x.children = append(x.children, nil)
    copy(x.children[i+2:], x.children[i+1:])
    x.children[i+1] = z

    // insert middle key into x at position i
    x.keys = append(x.keys, 0)
    copy(x.keys[i+1:], x.keys[i:])
    x.keys[i] = midKey
}

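A split moves O(t) keys and pointers among a constant number of nodes (the full node, the new node, and the parent), so its time complexity is O(t). A split does not change the height of the tree. The tree grows taller by one level only when the root itself is split.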
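This claim can be checked empirically. The sketch below uses a compact Python B-tree with the same proactive-splitting insertion as the article. The BTree class, and the convention that its insert method returns whether the root was split, are hypothetical. The demo inserts 1 to 50 with t = 2 and, after every insertion, asserts that the height grew exactly when the root was split.

```python
from bisect import bisect_right, insort


class Node:
    def __init__(self, leaf):
        self.keys, self.children, self.leaf = [], [], leaf


class BTree:
    """Hypothetical compact B-tree (proactive splitting), used only to watch the height."""

    def __init__(self, t):
        self.t, self.root = t, Node(leaf=True)

    def height(self):
        # edges from the root down to a leaf (all leaves share this depth)
        h, node = 0, self.root
        while not node.leaf:
            h, node = h + 1, node.children[0]
        return h

    def _split(self, x, i):
        # split the full child x.children[i] around its middle key keys[t-1]
        t, y = self.t, x.children[i]
        z = Node(y.leaf)
        z.keys, z.children = y.keys[t:], y.children[t:]
        x.keys.insert(i, y.keys[t - 1])
        x.children.insert(i + 1, z)
        y.keys, y.children = y.keys[:t - 1], y.children[:t]

    def insert(self, k):
        """Insert k; return True if this insertion split the root."""
        root_split = len(self.root.keys) == 2 * self.t - 1
        if root_split:
            new_root = Node(leaf=False)
            new_root.children.append(self.root)
            self.root = new_root
            self._split(new_root, 0)
        x = self.root
        while not x.leaf:
            i = bisect_right(x.keys, k)          # child whose key range contains k
            if len(x.children[i].keys) == 2 * self.t - 1:
                self._split(x, i)                # proactive split of a full child
                if k > x.keys[i]:
                    i += 1
            x = x.children[i]
        insort(x.keys, k)                        # insert into the leaf, keeping order
        return root_split


tree = BTree(t=2)
for k in range(1, 51):
    before = tree.height()
    root_split = tree.insert(k)
    # the height changes, by exactly one level, only when the root was split
    assert tree.height() == before + (1 if root_split else 0)
print("final height:", tree.height())
```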


Insertion

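B-tree insertion uses a proactive splitting strategy. While descending from the root to find the insertion position, the algorithm splits any full node it encounters before entering it. This guarantees that whenever a key is promoted, the parent has room to hold it.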

Steps:

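  1. If the root is full, split it first; the tree grows by one level
  2. Starting from the root, descend toward the appropriate leaf
  3. Whenever the child about to be entered is full, split it first
  4. On reaching the leaf, insert the new key at the correct position so the keys stay sorted

Insertion sequence (t = 3): 10, 20, 5, 6, 12, 30, 7, 17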

Step 1: insert 10                 Step 2: insert 20
     [10]                         [10 | 20]

Step 3: insert 5                  Step 4: insert 6
   [5 | 10 | 20]               [5 | 6 | 10 | 20]

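Step 5: insert 12 (the root now holds 2t - 1 = 5 keys and is full; no split yet)
   [5 | 6 | 10 | 12 | 20]

Step 6: insert 30 (the root is full, so it is split first: middle key 10 moves up
into a new root, then 30 goes into the right child)
        [10]
       /    \
    [5|6]  [12|20|30]

Step 7: insert 7
        [10]
       /    \
   [5|6|7] [12|20|30]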

Step 8: insert 17
         [10]
        /    \
    [5|6|7] [12|17|20|30]
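The trace can be reproduced with a short Python sketch of the same proactive-splitting insertion. Here bisect_right and insort from the standard bisect module take the place of the hand-written shifting loops. The function levels is a hypothetical helper that takes a snapshot of the keys, level by level, after each step.

```python
from bisect import bisect_right, insort


class Node:
    def __init__(self, leaf):
        self.keys, self.children, self.leaf = [], [], leaf


def split_child(x, i, t):
    # split the full child y = x.children[i] around its middle key y.keys[t-1]
    y = x.children[i]
    z = Node(y.leaf)
    z.keys, z.children = y.keys[t:], y.children[t:]
    x.keys.insert(i, y.keys[t - 1])
    x.children.insert(i + 1, z)
    y.keys, y.children = y.keys[:t - 1], y.children[:t]


def insert(root, k, t):
    """Proactive-split insertion; returns the (possibly new) root."""
    if len(root.keys) == 2 * t - 1:          # full root: split first, height grows by 1
        new_root = Node(leaf=False)
        new_root.children.append(root)
        split_child(new_root, 0, t)
        root = new_root
    x = root
    while not x.leaf:
        i = bisect_right(x.keys, k)          # same child choice as the article's loop
        if len(x.children[i].keys) == 2 * t - 1:
            split_child(x, i, t)
            if k > x.keys[i]:
                i += 1
        x = x.children[i]
    insort(x.keys, k)                        # leaf insertion keeps keys sorted
    return root


def levels(root):
    """Copy of the keys level by level (breadth-first, left to right)."""
    out, frontier = [], [root]
    while frontier:
        out.append([list(n.keys) for n in frontier])
        frontier = [c for n in frontier for c in n.children]
    return out


root = Node(leaf=True)
for step, k in enumerate([10, 20, 5, 6, 12, 30, 7, 17], start=1):
    root = insert(root, k, t=3)
    print(f"step {step}: insert {k}: {levels(root)}")
```

Step 5 prints a single full root, [[[5, 6, 10, 12, 20]]]. The first split appears only at step 6, which prints [[[10]], [[5, 6], [12, 20, 30]]].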

C++ Insertion Implementation

// insert key k into non-full node x
void insertNonFull(BTreeNode* x, int k) {
    int i = (int)x->keys.size() - 1;

    if (x->leaf) {
        // insert key into leaf node at correct position
        x->keys.push_back(0);  // placeholder
        while (i >= 0 && x->keys[i] > k) {
            x->keys[i + 1] = x->keys[i];
            i--;
        }
        x->keys[i + 1] = k;
    } else {
        // find child to recurse into
        while (i >= 0 && x->keys[i] > k)
            i--;
        i++;

        // split child if full
        if (x->children[i]->isFull()) {
            splitChild(x, i, x->children[i]);
            if (k > x->keys[i])
                i++;
        }
        insertNonFull(x->children[i], k);
    }
}

// main insert function
BTreeNode* insert(BTreeNode* root, int k, int t) {
    if (root == nullptr) {
        root = new BTreeNode(t, true);
        root->keys.push_back(k);
        return root;
    }

    // if root is full, split and grow the tree
    if (root->isFull()) {
        BTreeNode* s = new BTreeNode(t, false);
        s->children.push_back(root);
        splitChild(s, 0, root);
        // determine which child to insert into
        int i = 0;
        if (s->keys[0] < k)
            i++;
        insertNonFull(s->children[i], k);
        return s;  // new root
    }

    insertNonFull(root, k);
    return root;
}

C Insertion Implementation

// insert key k into non-full node x
void insertNonFull(BTreeNode* x, int k) {
    int i = x->n - 1;

    if (x->leaf) {
        // shift keys right to make room for k
        while (i >= 0 && x->keys[i] > k) {
            x->keys[i + 1] = x->keys[i];
            i--;
        }
        x->keys[i + 1] = k;
        x->n++;
    } else {
        // find child to recurse into
        while (i >= 0 && x->keys[i] > k)
            i--;
        i++;

        // split child if full
        if (x->children[i]->n == 2 * x->t - 1) {
            splitChild(x, i, x->children[i]);
            if (k > x->keys[i])
                i++;
        }
        insertNonFull(x->children[i], k);
    }
}

// main insert function — returns new root if tree grew
BTreeNode* insert(BTreeNode* root, int k, int t) {
    if (root == NULL) {
        root = createNode(t, true);
        root->keys[0] = k;
        root->n = 1;
        return root;
    }

    // if root is full, split and grow the tree
    if (root->n == 2 * t - 1) {
        BTreeNode* s = createNode(t, false);
        s->children[0] = root;
        splitChild(s, 0, root);
        // determine which child to insert into
        int i = 0;
        if (s->keys[0] < k)
            i++;
        insertNonFull(s->children[i], k);
        return s;  // new root
    }

    insertNonFull(root, k);
    return root;
}

Python Insertion Implementation

def insert_non_full(x, k):
    i = len(x.keys) - 1

    if x.leaf:
        # insert key into leaf at correct position
        x.keys.append(0)  # placeholder
        while i >= 0 and x.keys[i] > k:
            x.keys[i + 1] = x.keys[i]
            i -= 1
        x.keys[i + 1] = k
    else:
        # find child to recurse into
        while i >= 0 and x.keys[i] > k:
            i -= 1
        i += 1

        # split child if full
        if x.children[i].is_full():
            split_child(x, i)
            if k > x.keys[i]:
                i += 1
        insert_non_full(x.children[i], k)


def insert(root, k, t):
    if root is None:
        root = BTreeNode(t, True)
        root.keys.append(k)
        return root

    # if root is full, split and grow the tree
    if root.is_full():
        s = BTreeNode(t, False)
        s.children.append(root)
        split_child(s, 0)
        # determine which child to insert into
        i = 0
        if s.keys[0] < k:
            i += 1
        insert_non_full(s.children[i], k)
        return s  # new root

    insert_non_full(root, k)
    return root

Go Insertion Implementation

// insertNonFull inserts key k into non-full node x
func insertNonFull(x *BTreeNode, k int) {
    i := len(x.keys) - 1

    if x.leaf {
        // insert key into leaf at correct position
        x.keys = append(x.keys, 0)
        for i >= 0 && x.keys[i] > k {
            x.keys[i+1] = x.keys[i]
            i--
        }
        x.keys[i+1] = k
    } else {
        // find child to recurse into
        for i >= 0 && x.keys[i] > k {
            i--
        }
        i++

        // split child if full
        if x.children[i].isFull() {
            splitChild(x, i, x.children[i])
            if k > x.keys[i] {
                i++
            }
        }
        insertNonFull(x.children[i], k)
    }
}

// bTreeInsert inserts key k into the B-tree and returns the new root if the tree grew
func bTreeInsert(root *BTreeNode, k, t int) *BTreeNode {
    if root == nil {
        root = newBTreeNode(t, true)
        root.keys = append(root.keys, k)
        return root
    }

    // if root is full, split and grow the tree
    if root.isFull() {
        s := newBTreeNode(t, false)
        s.children = append(s.children, root)
        splitChild(s, 0, root)
        // determine which child to insert into
        i := 0
        if s.keys[0] < k {
            i++
        }
        insertNonFull(s.children[i], k)
        return s // new root
    }

    insertNonFull(root, k)
    return root
}

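Proactive splitting ensures that no node reached during the descent is full. An insertion therefore completes in a single pass from the root to a leaf, with no backtracking. Insertion takes O(t * log_t n) time, where t is the cost of scanning or splitting a node and log_t n is the height of the tree.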


Traversal

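B-tree traversal works like an inorder traversal of a binary search tree. For each key in a node, it first visits the subtree to the left of the key and then the key itself. After the last key, it visits the rightmost subtree.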

Traversal order:
Node:        [ k0 | k1 | k2 ]
Children:    c0   c1   c2   c3

Visit order: c0 → k0 → c1 → k1 → c2 → k2 → c3

This traversal produces all keys in ascending order.
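In Python, the same visiting order can be expressed as a generator. This lets a caller consume keys lazily, for example stopping after the smallest few, instead of printing every key. The sketch below assumes a simplified Node class in which a node with no children is a leaf; both the class and the name inorder are hypothetical.

```python
class Node:
    """Hypothetical minimal node: a node with no children is treated as a leaf."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def inorder(node):
    """Yield keys in ascending order: c0, k0, c1, k1, ..., k(n-1), cn."""
    if node is None:
        return
    for i, key in enumerate(node.keys):
        if node.children:
            yield from inorder(node.children[i])   # left subtree of key i
        yield key
    if node.children:
        yield from inorder(node.children[-1])      # rightmost subtree c[n]


root = Node([10], [Node([5, 6, 7]), Node([12, 17, 20, 30])])
print(list(inorder(root)))   # [5, 6, 7, 10, 12, 17, 20, 30]
print(next(inorder(root)))   # 5: the minimum, without visiting the rest of the tree
```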

C++ Traversal Implementation

// inorder traversal of subtree rooted at node
void traverse(BTreeNode* node) {
    if (node == nullptr) return;

    int n = node->keys.size();
    for (int i = 0; i < n; i++) {
        // visit left child before key
        if (!node->leaf)
            traverse(node->children[i]);
        cout << node->keys[i] << " ";
    }
    // visit the rightmost child
    if (!node->leaf)
        traverse(node->children[n]);
}

C Traversal Implementation

// inorder traversal of subtree rooted at node
void traverse(BTreeNode* node) {
    if (node == NULL) return;

    for (int i = 0; i < node->n; i++) {
        // visit left child before key
        if (!node->leaf)
            traverse(node->children[i]);
        printf("%d ", node->keys[i]);
    }
    // visit the rightmost child
    if (!node->leaf)
        traverse(node->children[node->n]);
}

Python Traversal Implementation

def traverse(node):
    if node is None:
        return

    for i in range(len(node.keys)):
        # visit left child before key
        if not node.leaf:
            traverse(node.children[i])
        print(node.keys[i], end=' ')
    # visit the rightmost child
    if not node.leaf:
        traverse(node.children[len(node.keys)])

Go Traversal Implementation

// traverse performs inorder traversal of subtree rooted at node
func traverse(node *BTreeNode) {
    if node == nil {
        return
    }

    for i := 0; i < len(node.keys); i++ {
        // visit left child before key
        if !node.leaf {
            traverse(node.children[i])
        }
        fmt.Printf("%d ", node.keys[i])
    }
    // visit the rightmost child
    if !node.leaf {
        traverse(node.children[len(node.keys)])
    }
}

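Traversal visits every node exactly once and outputs every key exactly once, so its time complexity is O(n), where n is the total number of keys.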


Complete Implementation

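Below are complete B-tree implementations in C++, C, Python, and Go, each covering search, insertion, and traversal. Each uses minimum degree t = 3 and inserts the sequence 10, 20, 5, 6, 12, 30, 7, 17.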

C++ Complete Implementation

#include <iostream>
#include <vector>
using namespace std;

// B-Tree node
struct BTreeNode {
    vector<int> keys;
    vector<BTreeNode*> children;
    int t;
    bool leaf;

    BTreeNode(int _t, bool _leaf) : t(_t), leaf(_leaf) {}

    bool isFull() {
        return (int)keys.size() == 2 * t - 1;
    }
};

// split full child y of node x at index i
void splitChild(BTreeNode* x, int i, BTreeNode* y) {
    BTreeNode* z = new BTreeNode(y->t, y->leaf);

    // copy last (t-1) keys from y to z
    for (int j = 0; j < y->t - 1; j++)
        z->keys.push_back(y->keys[j + y->t]);

    // copy last t children from y to z
    if (!y->leaf) {
        for (int j = 0; j < y->t; j++)
            z->children.push_back(y->children[j + y->t]);
    }

    // middle key to promote
    int midKey = y->keys[y->t - 1];

    // trim y to keep first (t-1) keys
    y->keys.resize(y->t - 1);
    if (!y->leaf)
        y->children.resize(y->t);

    // insert z as child and midKey into x
    x->children.insert(x->children.begin() + i + 1, z);
    x->keys.insert(x->keys.begin() + i, midKey);
}

// insert key into non-full node
void insertNonFull(BTreeNode* x, int k) {
    int i = (int)x->keys.size() - 1;

    if (x->leaf) {
        // find position and insert
        x->keys.push_back(0);
        while (i >= 0 && x->keys[i] > k) {
            x->keys[i + 1] = x->keys[i];
            i--;
        }
        x->keys[i + 1] = k;
    } else {
        while (i >= 0 && x->keys[i] > k)
            i--;
        i++;

        if (x->children[i]->isFull()) {
            splitChild(x, i, x->children[i]);
            if (k > x->keys[i])
                i++;
        }
        insertNonFull(x->children[i], k);
    }
}

// insert key into B-tree, return new root if tree grew
BTreeNode* insert(BTreeNode* root, int k, int t) {
    if (root == nullptr) {
        root = new BTreeNode(t, true);
        root->keys.push_back(k);
        return root;
    }

    if (root->isFull()) {
        BTreeNode* s = new BTreeNode(t, false);
        s->children.push_back(root);
        splitChild(s, 0, root);
        int i = (s->keys[0] < k) ? 1 : 0;
        insertNonFull(s->children[i], k);
        return s;
    }

    insertNonFull(root, k);
    return root;
}

// search for key in subtree rooted at node
BTreeNode* search(BTreeNode* node, int k, int* idx) {
    int i = 0;
    while (i < (int)node->keys.size() && k > node->keys[i])
        i++;

    if (i < (int)node->keys.size() && node->keys[i] == k) {
        *idx = i;
        return node;
    }

    if (node->leaf)
        return nullptr;

    return search(node->children[i], k, idx);
}

// inorder traversal
void traverse(BTreeNode* node) {
    if (node == nullptr) return;

    int n = (int)node->keys.size();
    for (int i = 0; i < n; i++) {
        if (!node->leaf)
            traverse(node->children[i]);
        cout << node->keys[i] << " ";
    }
    if (!node->leaf)
        traverse(node->children[n]);
}

int main() {
    BTreeNode* root = nullptr;
    int t = 3;

    int keys[] = {10, 20, 5, 6, 12, 30, 7, 17};
    for (int k : keys)
        root = insert(root, k, t);

    cout << "Traversal: ";
    traverse(root);
    cout << endl;

    int idx;
    BTreeNode* result = search(root, 6, &idx);
    cout << "Search 6: " << (result ? "Found" : "Not found") << endl;

    result = search(root, 15, &idx);
    cout << "Search 15: " << (result ? "Found" : "Not found") << endl;

    return 0;
}

C Complete Implementation

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

typedef struct BTreeNode {
    int* keys;
    struct BTreeNode** children;
    int t;
    int n;
    bool leaf;
} BTreeNode;

// create a new node
BTreeNode* createNode(int t, bool leaf) {
    BTreeNode* node = (BTreeNode*)malloc(sizeof(BTreeNode));
    node->t = t;
    node->leaf = leaf;
    node->n = 0;
    node->keys = (int*)malloc((2 * t - 1) * sizeof(int));
    node->children = (BTreeNode**)malloc(2 * t * sizeof(BTreeNode*));
    for (int i = 0; i < 2 * t; i++)
        node->children[i] = NULL;
    return node;
}

// split full child y of node x at index i
void splitChild(BTreeNode* x, int i, BTreeNode* y) {
    BTreeNode* z = createNode(y->t, y->leaf);
    z->n = y->t - 1;

    // copy last (t-1) keys from y to z
    for (int j = 0; j < y->t - 1; j++)
        z->keys[j] = y->keys[j + y->t];

    // copy last t children from y to z
    if (!y->leaf) {
        for (int j = 0; j < y->t; j++)
            z->children[j] = y->children[j + y->t];
    }

    // update y's key count
    y->n = y->t - 1;

    // shift x's children right
    for (int j = x->n; j >= i + 1; j--)
        x->children[j + 1] = x->children[j];

    x->children[i + 1] = z;

    // shift x's keys right
    for (int j = x->n - 1; j >= i; j--)
        x->keys[j + 1] = x->keys[j];

    // promote middle key
    x->keys[i] = y->keys[y->t - 1];
    x->n++;
}

// insert key into non-full node
void insertNonFull(BTreeNode* x, int k) {
    int i = x->n - 1;

    if (x->leaf) {
        // shift keys right and insert
        while (i >= 0 && x->keys[i] > k) {
            x->keys[i + 1] = x->keys[i];
            i--;
        }
        x->keys[i + 1] = k;
        x->n++;
    } else {
        while (i >= 0 && x->keys[i] > k)
            i--;
        i++;

        if (x->children[i]->n == 2 * x->t - 1) {
            splitChild(x, i, x->children[i]);
            if (k > x->keys[i])
                i++;
        }
        insertNonFull(x->children[i], k);
    }
}

// insert key into B-tree, return new root if tree grew
BTreeNode* insert(BTreeNode* root, int k, int t) {
    if (root == NULL) {
        root = createNode(t, true);
        root->keys[0] = k;
        root->n = 1;
        return root;
    }

    if (root->n == 2 * t - 1) {
        BTreeNode* s = createNode(t, false);
        s->children[0] = root;
        splitChild(s, 0, root);
        int i = (s->keys[0] < k) ? 1 : 0;
        insertNonFull(s->children[i], k);
        return s;
    }

    insertNonFull(root, k);
    return root;
}

// search for key in subtree
BTreeNode* search(BTreeNode* node, int k, int* idx) {
    int i = 0;
    while (i < node->n && k > node->keys[i])
        i++;

    if (i < node->n && node->keys[i] == k) {
        *idx = i;
        return node;
    }

    if (node->leaf)
        return NULL;

    return search(node->children[i], k, idx);
}

// inorder traversal
void traverse(BTreeNode* node) {
    if (node == NULL) return;

    for (int i = 0; i < node->n; i++) {
        if (!node->leaf)
            traverse(node->children[i]);
        printf("%d ", node->keys[i]);
    }
    if (!node->leaf)
        traverse(node->children[node->n]);
}

int main() {
    BTreeNode* root = NULL;
    int t = 3;

    int keys[] = {10, 20, 5, 6, 12, 30, 7, 17};
    int n = sizeof(keys) / sizeof(keys[0]);
    for (int i = 0; i < n; i++)
        root = insert(root, keys[i], t);

    printf("Traversal: ");
    traverse(root);
    printf("\n");

    int idx;
    BTreeNode* result = search(root, 6, &idx);
    printf("Search 6: %s\n", result ? "Found" : "Not found");

    result = search(root, 15, &idx);
    printf("Search 15: %s\n", result ? "Found" : "Not found");

    return 0;
}

Python Complete Implementation

class BTreeNode:
    def __init__(self, t, leaf):
        self.keys = []
        self.children = []
        self.t = t
        self.leaf = leaf

    def is_full(self):
        return len(self.keys) == 2 * self.t - 1


def split_child(x, i):
    y = x.children[i]
    t = y.t
    z = BTreeNode(t, y.leaf)

    # copy last (t-1) keys from y to z
    z.keys = y.keys[t:]
    mid_key = y.keys[t - 1]

    # copy last t children from y to z
    if not y.leaf:
        z.children = y.children[t:]

    # trim y
    y.keys = y.keys[:t - 1]
    if not y.leaf:
        y.children = y.children[:t]

    # insert z and mid_key into x
    x.children.insert(i + 1, z)
    x.keys.insert(i, mid_key)


def insert_non_full(x, k):
    i = len(x.keys) - 1

    if x.leaf:
        x.keys.append(0)
        while i >= 0 and x.keys[i] > k:
            x.keys[i + 1] = x.keys[i]
            i -= 1
        x.keys[i + 1] = k
    else:
        while i >= 0 and x.keys[i] > k:
            i -= 1
        i += 1

        if x.children[i].is_full():
            split_child(x, i)
            if k > x.keys[i]:
                i += 1
        insert_non_full(x.children[i], k)


def insert(root, k, t):
    if root is None:
        root = BTreeNode(t, True)
        root.keys.append(k)
        return root

    if root.is_full():
        s = BTreeNode(t, False)
        s.children.append(root)
        split_child(s, 0)
        i = 0
        if s.keys[0] < k:
            i += 1
        insert_non_full(s.children[i], k)
        return s

    insert_non_full(root, k)
    return root


def search(node, k):
    i = 0
    while i < len(node.keys) and k > node.keys[i]:
        i += 1

    if i < len(node.keys) and node.keys[i] == k:
        return node, i

    if node.leaf:
        return None, -1

    return search(node.children[i], k)


def traverse(node):
    if node is None:
        return

    for i in range(len(node.keys)):
        if not node.leaf:
            traverse(node.children[i])
        print(node.keys[i], end=' ')
    if not node.leaf:
        traverse(node.children[len(node.keys)])


if __name__ == '__main__':
    root = None
    t = 3

    for k in [10, 20, 5, 6, 12, 30, 7, 17]:
        root = insert(root, k, t)

    print("Traversal: ", end='')
    traverse(root)
    print()

    node, idx = search(root, 6)
    print(f"Search 6: {'Found' if node else 'Not found'}")

    node, idx = search(root, 15)
    print(f"Search 15: {'Found' if node else 'Not found'}")

Go Complete Implementation

package main

import "fmt"

// BTreeNode represents a node in the B-Tree
type BTreeNode struct {
    keys     []int
    children []*BTreeNode
    t        int
    leaf     bool
}

func newBTreeNode(t int, leaf bool) *BTreeNode {
    return &BTreeNode{
        keys:     []int{},
        children: []*BTreeNode{},
        t:        t,
        leaf:     leaf,
    }
}

func (n *BTreeNode) isFull() bool {
    return len(n.keys) == 2*n.t-1
}

// splitChild splits full child y of node x at index i
func splitChild(x *BTreeNode, i int, y *BTreeNode) {
    z := newBTreeNode(y.t, y.leaf)

    // copy last (t-1) keys from y to z
    z.keys = append(z.keys, y.keys[y.t:]...)
    midKey := y.keys[y.t-1]

    // copy last t children from y to z
    if !y.leaf {
        z.children = append(z.children, y.children[y.t:]...)
    }

    // trim y to keep first (t-1) keys and t children
    y.keys = y.keys[:y.t-1]
    if !y.leaf {
        y.children = y.children[:y.t]
    }

    // insert z as a child of x at position i+1
    x.children = append(x.children, nil)
    copy(x.children[i+2:], x.children[i+1:])
    x.children[i+1] = z

    // insert middle key into x at position i
    x.keys = append(x.keys, 0)
    copy(x.keys[i+1:], x.keys[i:])
    x.keys[i] = midKey
}

// insertNonFull inserts key k into non-full node x
func insertNonFull(x *BTreeNode, k int) {
    i := len(x.keys) - 1

    if x.leaf {
        // find position and insert
        x.keys = append(x.keys, 0)
        for i >= 0 && x.keys[i] > k {
            x.keys[i+1] = x.keys[i]
            i--
        }
        x.keys[i+1] = k
    } else {
        for i >= 0 && x.keys[i] > k {
            i--
        }
        i++

        if x.children[i].isFull() {
            splitChild(x, i, x.children[i])
            if k > x.keys[i] {
                i++
            }
        }
        insertNonFull(x.children[i], k)
    }
}

// bTreeInsert inserts key into B-tree, returns new root if tree grew
func bTreeInsert(root *BTreeNode, k, t int) *BTreeNode {
    if root == nil {
        root = newBTreeNode(t, true)
        root.keys = append(root.keys, k)
        return root
    }

    if root.isFull() {
        s := newBTreeNode(t, false)
        s.children = append(s.children, root)
        splitChild(s, 0, root)
        i := 0
        if s.keys[0] < k {
            i++
        }
        insertNonFull(s.children[i], k)
        return s
    }

    insertNonFull(root, k)
    return root
}

// search searches for key k in subtree rooted at node
func search(node *BTreeNode, k int) (*BTreeNode, int) {
    if node == nil {
        return nil, -1
    }
    i := 0
    for i < len(node.keys) && k > node.keys[i] {
        i++
    }

    if i < len(node.keys) && node.keys[i] == k {
        return node, i
    }

    if node.leaf {
        return nil, -1
    }

    return search(node.children[i], k)
}

// traverse performs inorder traversal of subtree rooted at node
func traverse(node *BTreeNode) {
    if node == nil {
        return
    }

    for i := 0; i < len(node.keys); i++ {
        if !node.leaf {
            traverse(node.children[i])
        }
        fmt.Printf("%d ", node.keys[i])
    }
    if !node.leaf {
        traverse(node.children[len(node.keys)])
    }
}

func main() {
    var root *BTreeNode
    t := 3

    keys := []int{10, 20, 5, 6, 12, 30, 7, 17}
    for _, k := range keys {
        root = bTreeInsert(root, k, t)
    }

    fmt.Print("Traversal: ")
    traverse(root)
    fmt.Println()

    node, _ := search(root, 6)
    if node != nil {
        fmt.Println("Search 6: Found")
    } else {
        fmt.Println("Search 6: Not found")
    }

    node, _ = search(root, 15)
    if node != nil {
        fmt.Println("Search 15: Found")
    } else {
        fmt.Println("Search 15: Not found")
    }
}

Running any of the programs prints:

Traversal: 5 6 7 10 12 17 20 30
Search 6: Found
Search 15: Not found

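All four versions produce the same output. The traversal output 5 6 7 10 12 17 20 30 lists the inserted keys in ascending order, confirming that the B-tree keeps its keys ordered. The search for 6 succeeds. The search for 15 returns Not found because 15 was never inserted.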

After inserting 10, 20, 5, 6, 12, 30, 7, 17 (t = 3), the final tree is:

                [10]
               /    \
          [5|6|7]  [12|17|20|30]

  • The root [10] has 1 key and 2 children
  • The left child [5|6|7] has 3 keys
  • The right child [12|17|20|30] has 4 keys
  • All leaves are on the same level (depth 1)

B-Tree Properties (Summary)

Through multi-way branching and its balance constraints, a B-tree organizes data on external storage efficiently.

Time Complexity

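Operation    Time complexity   Notes
Search       O(log n)          Scans O(t) keys per level over O(log_t n) levels, i.e. O(t * log_t n), which is O(log n) for a fixed t
Insert       O(log n)          One downward pass with proactive splitting, no backtracking
Delete       O(log n)          Requires merging nodes or borrowing keys from a sibling
Traverse     O(n)              Visits every key exactly once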

Disk Access Advantage

The B-tree's greatest advantage is that it reduces disk I/O. The design principle is to make one node exactly the size of one disk block (typically 4 KB):

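  • Reading one node = one disk I/O
  • The tree height is h = O(log_t n), so a lookup needs only about h disk reads
  • With t = 100, one million records fit in a tree about 3 levels tall, i.e. about 3 disk reads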

By comparison, even a perfectly balanced binary search tree holding one million records is about 20 levels tall and needs about 20 disk reads.
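These height figures can be made precise. A B-tree of height h (counting edges) has at least 2t^h - 1 keys. The root holds at least 1 key, and the 2t^(i-1) nodes at each depth i ≥ 1 hold at least t - 1 keys each. A binary tree with L levels, in turn, holds at most 2^L - 1 nodes. The following Python sketch evaluates both bounds for n = 1,000,000 using exact integer arithmetic; the function names are hypothetical.

```python
def max_btree_height(n, t):
    """Largest possible height h (edges from root to leaf) of a B-tree with n >= 1 keys.

    Uses n >= 2 * t**h - 1: the root holds >= 1 key and every other node >= t - 1 keys.
    """
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:   # could the tree still be one level taller?
        h += 1
    return h


def min_bst_levels(n):
    """Fewest levels any binary search tree with n nodes can have: smallest L with 2**L - 1 >= n."""
    levels = 0
    while 2 ** levels - 1 < n:
        levels += 1
    return levels


n = 1_000_000
print(max_btree_height(n, 100) + 1)  # 3: at most 3 levels, so at most 3 node reads per lookup
print(min_bst_levels(n))             # 20: even a perfectly balanced BST needs 20 levels
```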

Practical Applications

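Application        Description
MongoDB            Uses B-trees as its default index structure (the WiredTiger engine uses B+ trees)
Database indexes   Relational databases widely use B-tree or B+ tree indexes
HDFS               Block management in the Hadoop Distributed File System
NTFS               Directory indexing in the Windows file system
ext4               HTree directory indexing in the Linux file system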

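B-trees are closely related to B+ trees. A B+ tree stores all data in the leaves and uses internal nodes only to hold routing keys. Its leaves are linked together in a list, which makes it better suited to range queries. Most database systems actually use B+ trees rather than the original B-tree.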

posted @ 2026-04-16 20:59  游翔