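B-Tree
A B-tree is a self-balancing multi-way search tree, introduced by Rudolf Bayer and Edward McCreight in 1970. Unlike a binary search tree, where each node has at most two children, a B-tree node can hold several keys and several children, which makes it well suited to organizing data on disk and other external storage.
The core design goal is to reduce the number of disk I/O operations. Storing many keys per node makes the tree short and wide, so the path from the root to a leaf is shorter. Since visiting a node corresponds to one disk read, a lookup needs far fewer disk I/Os.
A B-tree uses a minimum degree t to define how many keys and children a node may hold:
- Every node has at most 2t - 1 keys and at least t - 1 keys (the root is exempt from the minimum).
- Every internal node has at most 2t children and at least t children (the root is exempt from the minimum).
- All leaf nodes are at the same depth.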
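Below is an example B-tree with t = 2 (each non-root node holds between 1 and 3 keys):
        [10 | 20]
       /    |    \
    [5]   [15]   [25 | 30]
- The root [10 | 20] has 2 keys and 3 children.
- The leaves [5], [15], and [25 | 30] are all on the same level.
- The keys inside each node are in ascending order.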
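Properties of a B-Tree
A B-tree must satisfy the following properties:
- All leaves are at the same depth: the path from the root to any leaf has the same length, which keeps lookup cost predictable.
- The root has at least 1 key and at most 2t - 1 keys: the root is the only node allowed to have fewer than t - 1 keys.
- Every non-root node has at least t - 1 and at most 2t - 1 keys: nodes never become too sparse, so the tree cannot degenerate into a linked list.
- An internal node with n keys has n + 1 children: the keys partition the key range of the children into n + 1 intervals.
- Keys within a node are in ascending order: key[0] < key[1] < ... < key[n-1], so a node can be searched with binary or linear search.
- Relationship between children and keys: if a node has n keys k[0], k[1], ..., k[n-1] and n + 1 children c[0], c[1], ..., c[n], then every key x in subtree c[i] satisfies k[i-1] < x < k[i]. At the two ends only one bound applies: x < k[0] for c[0] and x > k[n-1] for c[n].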
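The properties above can be checked mechanically. The following is a minimal sketch rather than part of the article's implementation: `Node`, `_require`, and `check_btree` are hypothetical names, and a leaf is assumed to be a node with an empty `children` list.

```python
class Node:
    """Hypothetical minimal node: a leaf is a node with no children."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def _require(condition, message):
    if not condition:
        raise ValueError(message)


def check_btree(node, t, lo=None, hi=None, is_root=True):
    """Validate the subtree at node and return the depth of its leaves."""
    _require(t >= 2, "minimum degree t must be at least 2")
    n = len(node.keys)
    _require(n <= 2 * t - 1, "node has more than 2t - 1 keys")
    _require(n >= (1 if is_root else t - 1), "node has too few keys")
    _require(all(a < b for a, b in zip(node.keys, node.keys[1:])),
             "keys are not strictly ascending")
    # Every key must lie strictly between the separators inherited from above.
    _require(lo is None or node.keys[0] > lo, "key below lower bound")
    _require(hi is None or node.keys[-1] < hi, "key above upper bound")
    if not node.children:
        return 0
    _require(len(node.children) == n + 1, "internal node needs n + 1 children")
    bounds = [lo] + node.keys + [hi]
    depths = {
        check_btree(child, t, bounds[i], bounds[i + 1], is_root=False)
        for i, child in enumerate(node.children)
    }
    _require(len(depths) == 1, "leaves are at different depths")
    return depths.pop() + 1


tree = Node([10, 20], [Node([5]), Node([15]), Node([25, 30])])
print(check_btree(tree, t=2))  # prints 1: the leaves sit one level below the root
```

The same tree is rejected for t = 3, because the leaves [5] and [15] hold fewer than t - 1 = 2 keys.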
With t = 3, the concrete ranges are:
| Parameter | Minimum | Maximum |
|---|---|---|
| Keys per node | 2 (non-root) or 1 (root) | 5 |
| Children per internal node | 3 (non-root) or 2 (root) | 6 |
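The table can be derived for any minimum degree. This is a small sketch under the definitions above; `capacity_bounds` is a hypothetical helper name, not something the article defines.

```python
def capacity_bounds(t, is_root=False):
    """Return ((min_keys, max_keys), (min_children, max_children)).

    The children bounds apply to internal nodes only.
    """
    if t < 2:
        raise ValueError("minimum degree t must be at least 2")
    min_keys = 1 if is_root else t - 1
    max_keys = 2 * t - 1
    # An internal node with n keys always has n + 1 children.
    return (min_keys, max_keys), (min_keys + 1, max_keys + 1)


print(capacity_bounds(3))                # ((2, 5), (3, 6))
print(capacity_bounds(3, is_root=True))  # ((1, 5), (2, 6))
```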
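Node Definition
Unlike a binary search tree node, a B-tree node stores several keys and several child pointers, and also records the current number of keys and whether it is a leaf.
C++ node definition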
#include <iostream>
#include <vector>
using namespace std;
// B-Tree node definition
struct BTreeNode {
vector<int> keys; // sorted keys in this node
vector<BTreeNode*> children; // child pointers
int t; // minimum degree
bool leaf; // true if leaf node
// constructor
BTreeNode(int _t, bool _leaf) : t(_t), leaf(_leaf) {}
// utility: check if node is full (has 2t-1 keys)
bool isFull() {
return (int)keys.size() == 2 * t - 1; // cast avoids a signed/unsigned comparison
}
};
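The C++ version stores keys and child pointers in vector dynamic arrays. The t and leaf fields control node behavior, and isFull() reports whether the node is full.
C node definition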
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
// B-Tree node definition
typedef struct BTreeNode {
int* keys; // array of keys
struct BTreeNode** children; // array of child pointers
int t; // minimum degree
int n; // current number of keys
bool leaf; // true if leaf node
} BTreeNode;
// create a new node
BTreeNode* createNode(int t, bool leaf) {
BTreeNode* node = (BTreeNode*)malloc(sizeof(BTreeNode));
node->t = t;
node->leaf = leaf;
node->n = 0;
node->keys = (int*)malloc((2 * t - 1) * sizeof(int));
node->children = (BTreeNode**)malloc(2 * t * sizeof(BTreeNode*));
for (int i = 0; i < 2 * t; i++)
node->children[i] = NULL;
return node;
}
The C version uses fixed-size arrays (capacity 2t-1 keys and 2t children) and maintains an extra field n for the current number of keys. createNode allocates and initializes a node.
Python node definition
class BTreeNode:
    def __init__(self, t, leaf):
        # keys and children stored as lists
        self.keys = []        # sorted keys in this node
        self.children = []    # child pointers
        self.t = t            # minimum degree
        self.leaf = leaf      # true if leaf node

    def is_full(self):
        # check if node has maximum number of keys
        return len(self.keys) == 2 * self.t - 1
The Python version stores keys and children in lists and is the most concise. is_full() reports whether the node is full (holds 2t-1 keys).
Go node definition
package main
import "fmt"
// BTreeNode represents a node in the B-Tree
type BTreeNode struct {
keys []int // sorted keys in this node
children []*BTreeNode // child pointers
t int // minimum degree
leaf bool // true if leaf node
}
// newBTreeNode creates a new B-Tree node
func newBTreeNode(t int, leaf bool) *BTreeNode {
return &BTreeNode{
keys: []int{},
children: []*BTreeNode{},
t: t,
leaf: leaf,
}
}
// isFull checks if node has maximum number of keys (2t-1)
func (n *BTreeNode) isFull() bool {
return len(n.keys) == 2*n.t-1
}
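The Go version stores keys and child pointers in slices and uses the method isFull() to check whether a node is full. The factory function newBTreeNode creates and initializes a node.
Search
Searching a B-tree is similar to searching a binary search tree, except that each node holds several keys to compare against. The steps are:
- Starting from the first key of the current node, scan linearly to the first position whose key is greater than or equal to the target.
- If the key at that position equals the target, the search succeeds.
- If the current node is a leaf and the key was not found, the search fails.
- Otherwise, recurse into the corresponding child.
Searching for 15:
        [10 | 20]              ← 15 > 10, keep comparing; 15 < 20, descend into child c[1]
       /    |    \
    [5]   [15]   [25 | 30]     ← 15 is found in [15], the search succeeds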
C++ search implementation
// search for key k in subtree rooted at this node
BTreeNode* search(BTreeNode* node, int k, int* index) {
int i = 0;
// find first key >= k
while (i < (int)node->keys.size() && k > node->keys[i])
i++;
// found the key
if (i < (int)node->keys.size() && node->keys[i] == k) {
*index = i;
return node;
}
// key not found and this is a leaf
if (node->leaf)
return nullptr;
// recurse into appropriate child
return search(node->children[i], k, index);
}
C search implementation
// search for key k in subtree rooted at node
BTreeNode* search(BTreeNode* node, int k, int* index) {
int i = 0;
// find first key >= k
while (i < node->n && k > node->keys[i])
i++;
// found the key
if (i < node->n && node->keys[i] == k) {
*index = i;
return node;
}
// key not found and this is a leaf
if (node->leaf)
return NULL;
// recurse into appropriate child
return search(node->children[i], k, index);
}
Python search implementation
def search(node, k):
    # find first key >= k
    i = 0
    while i < len(node.keys) and k > node.keys[i]:
        i += 1
    # found the key
    if i < len(node.keys) and node.keys[i] == k:
        return node, i
    # key not found and this is a leaf
    if node.leaf:
        return None, -1
    # recurse into appropriate child
    return search(node.children[i], k)
Go search implementation
// search searches for key k in subtree rooted at node
func search(node *BTreeNode, k int) (*BTreeNode, int) {
if node == nil {
return nil, -1
}
i := 0
// find first key >= k
for i < len(node.keys) && k > node.keys[i] {
i++
}
// found the key
if i < len(node.keys) && node.keys[i] == k {
return node, i
}
// key not found and this is a leaf
if node.leaf {
return nil, -1
}
// recurse into appropriate child
return search(node.children[i], k)
}
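Search scans at most 2t - 1 keys per level and the tree height is O(log_t n), so the total time is O(t * log_t n). Replacing the linear scan inside a node with binary search lowers this to O(log t * log_t n) = O(log n). Since t is usually a modest constant (for example 50 to 100) and the number of nodes visited equals the height, search is efficient in practice.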
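Because the keys inside a node are sorted, the linear scan can be swapped for a binary search. The sketch below uses `bisect_left` from Python's standard library; the simplified `BTreeNode` constructor and the name `search_bisect` are assumptions made for this example and differ from the node class defined earlier.

```python
from bisect import bisect_left


class BTreeNode:
    """Simplified node for this example: built directly from keys and children."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []
        self.leaf = not self.children


def search_bisect(node, k):
    """Iterative search that uses binary search inside each node."""
    while True:
        i = bisect_left(node.keys, k)  # first index with keys[i] >= k
        if i < len(node.keys) and node.keys[i] == k:
            return node, i
        if node.leaf:
            return None, -1
        node = node.children[i]


root = BTreeNode([10], [BTreeNode([5, 6, 7]), BTreeNode([12, 17, 20, 30])])
found, index = search_bisect(root, 17)
print(found.keys, index)       # [12, 17, 20, 30] 1
print(search_bisect(root, 15)) # (None, -1)
```

The loop replaces the recursion of the versions above; the number of nodes visited is unchanged, only the work inside each node drops from O(t) to O(log t).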
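Split
Splitting is the core operation that keeps a B-tree balanced. When a node is full (holds 2t - 1 keys), it is split into two nodes of t - 1 keys each, and the middle key is promoted into the parent.
The steps are:
- Take the middle key keys[t-1] of the full node (the t-th key, at index t-1).
- Create a new node and move the right half of the original node's keys and children into it.
- Insert the middle key into the parent.
- Insert the new node into the parent as a new child.
Splitting a full node (t = 3, 5 keys); the parent's child to the right of 30 is omitted:
Before:                          After:
Parent:  [30]                    Parent:  [20 | 30]
          |                               /    |
Child:   [10|15|20|25|28]           [10|15]  [25|28]
                ↑
The middle key 20 is promoted into the parent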
C++ split implementation
// split child y of node x at index i
// y must be full (has 2t-1 keys) when this is called
void splitChild(BTreeNode* x, int i, BTreeNode* y) {
    // create new node z with same t and leaf status as y
    BTreeNode* z = new BTreeNode(y->t, y->leaf);
    // copy the last (t-1) keys from y to z
    for (int j = 0; j < y->t - 1; j++)
        z->keys.push_back(y->keys[j + y->t]);
    // copy the last t children from y to z (if not leaf)
    if (!y->leaf) {
        for (int j = 0; j < y->t; j++)
            z->children.push_back(y->children[j + y->t]);
    }
    // save the middle key before y is trimmed; reading it afterwards
    // would index past the end of y->keys
    int midKey = y->keys[y->t - 1];
    // trim y to its first (t-1) keys and first t children
    y->keys.resize(y->t - 1);
    if (!y->leaf)
        y->children.resize(y->t);
    // insert z as a child of x at position i+1
    x->children.insert(x->children.begin() + i + 1, z);
    // move middle key of y up to x
    x->keys.insert(x->keys.begin() + i, midKey);
}
C split implementation
// split child y of node x at index i
// y must be full (has 2t-1 keys) when this is called
void splitChild(BTreeNode* x, int i, BTreeNode* y) {
// create new node z with same t and leaf status as y
BTreeNode* z = createNode(y->t, y->leaf);
z->n = y->t - 1;
// copy the last (t-1) keys from y to z
for (int j = 0; j < y->t - 1; j++)
z->keys[j] = y->keys[j + y->t];
// copy the last t children from y to z (if not leaf)
if (!y->leaf) {
for (int j = 0; j < y->t; j++)
z->children[j] = y->children[j + y->t];
}
// update y's key count
y->n = y->t - 1;
// shift x's children right to make room for z
for (int j = x->n; j >= i + 1; j--)
x->children[j + 1] = x->children[j];
// insert z as a child of x at position i+1
x->children[i + 1] = z;
// shift x's keys right to make room for middle key
for (int j = x->n - 1; j >= i; j--)
x->keys[j + 1] = x->keys[j];
// move middle key of y up to x
x->keys[i] = y->keys[y->t - 1];
x->n++;
}
Python split implementation
def split_child(x, i):
    # y is the full child to split
    y = x.children[i]
    t = y.t
    # create new node z with same t and leaf status as y
    z = BTreeNode(t, y.leaf)
    # copy the last (t-1) keys from y to z
    z.keys = y.keys[t:]        # keys[t] to keys[2t-2]
    mid_key = y.keys[t - 1]    # middle key to promote
    # copy the last t children from y to z (if not leaf)
    if not y.leaf:
        z.children = y.children[t:]
    # trim y to keep only first (t-1) keys and t children
    y.keys = y.keys[:t - 1]
    if not y.leaf:
        y.children = y.children[:t]
    # insert z as a child of x at position i+1
    x.children.insert(i + 1, z)
    # insert middle key into x at position i
    x.keys.insert(i, mid_key)
Go split implementation
// splitChild splits full child y of node x at index i
// y must be full (has 2t-1 keys) when this is called
func splitChild(x *BTreeNode, i int, y *BTreeNode) {
// create new node z with same t and leaf status as y
z := newBTreeNode(y.t, y.leaf)
// copy the last (t-1) keys from y to z
z.keys = append(z.keys, y.keys[y.t:]...)
midKey := y.keys[y.t-1]
// copy the last t children from y to z (if not leaf)
if !y.leaf {
z.children = append(z.children, y.children[y.t:]...)
}
// trim y to keep only first (t-1) keys and t children
y.keys = y.keys[:y.t-1]
if !y.leaf {
y.children = y.children[:y.t]
}
// insert z as a child of x at position i+1
x.children = append(x.children, nil)
copy(x.children[i+2:], x.children[i+1:])
x.children[i+1] = z
// insert middle key into x at position i
x.keys = append(x.keys, 0)
copy(x.keys[i+1:], x.keys[i:])
x.keys[i] = midKey
}
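A split moves data only among the full node, its new sibling, and the parent, so it takes O(t) time. Splitting a non-root node does not change the height of the tree; the tree grows by one level only when the root is split.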
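The index arithmetic of a split is easy to get wrong, so it can help to look at it on plain lists first. This is an illustrative sketch only; `split_keys` is a hypothetical helper that works on a key list rather than on tree nodes.

```python
def split_keys(keys, t):
    """Split a full key list into (left, median, right)."""
    if len(keys) != 2 * t - 1:
        raise ValueError("node is not full")
    # left keeps indices 0..t-2, the median sits at index t-1,
    # right takes indices t..2t-2
    return keys[:t - 1], keys[t - 1], keys[t:]


left, median, right = split_keys([10, 15, 20, 25, 28], t=3)
parent_keys = [30]
parent_keys.insert(0, median)  # the full child was at child index i = 0
print(left, median, right, parent_keys)  # [10, 15] 20 [25, 28] [20, 30]
```

Both halves end up with exactly t - 1 keys, the minimum a non-root node may hold.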
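Insertion
B-tree insertion uses proactive splitting: while descending from the root toward the insertion point, any full node encountered is split first. This guarantees that the parent always has room for a key promoted from below.
The steps are:
- If the root is full, split it first; the tree grows by one level.
- Starting from the root, descend toward the appropriate leaf.
- Whenever the next child on the path is full, split it before descending.
- On reaching the leaf, insert the new key at its sorted position.
Insertion sequence (t = 3, at most 5 keys per node): 10, 20, 5, 6, 12, 30, 7, 17
Step 1: insert 10      [10]
Step 2: insert 20      [10 | 20]
Step 3: insert 5       [5 | 10 | 20]
Step 4: insert 6       [5 | 6 | 10 | 20]
Step 5: insert 12      [5 | 6 | 10 | 12 | 20]   (the root is now full)
Step 6: insert 30 (the root is full, so it is split first; 30 then goes into the right child)
          [10]
         /    \
     [5|6]    [12|20|30]
Step 7: insert 7
          [10]
         /    \
   [5|6|7]    [12|20|30]
Step 8: insert 17
          [10]
         /    \
   [5|6|7]    [12|17|20|30]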
C++ insertion implementation
// insert key k into non-full node x
void insertNonFull(BTreeNode* x, int k) {
int i = (int)x->keys.size() - 1; // cast first so an empty node yields -1
if (x->leaf) {
// insert key into leaf node at correct position
x->keys.push_back(0); // placeholder
while (i >= 0 && x->keys[i] > k) {
x->keys[i + 1] = x->keys[i];
i--;
}
x->keys[i + 1] = k;
} else {
// find child to recurse into
while (i >= 0 && x->keys[i] > k)
i--;
i++;
// split child if full
if (x->children[i]->isFull()) {
splitChild(x, i, x->children[i]);
if (k > x->keys[i])
i++;
}
insertNonFull(x->children[i], k);
}
}
// main insert function
BTreeNode* insert(BTreeNode* root, int k, int t) {
if (root == nullptr) {
root = new BTreeNode(t, true);
root->keys.push_back(k);
return root;
}
// if root is full, split and grow the tree
if (root->isFull()) {
BTreeNode* s = new BTreeNode(t, false);
s->children.push_back(root);
splitChild(s, 0, root);
// determine which child to insert into
int i = 0;
if (s->keys[0] < k)
i++;
insertNonFull(s->children[i], k);
return s; // new root
}
insertNonFull(root, k);
return root;
}
C insertion implementation
// insert key k into non-full node x
void insertNonFull(BTreeNode* x, int k) {
int i = x->n - 1;
if (x->leaf) {
// shift keys right to make room for k
while (i >= 0 && x->keys[i] > k) {
x->keys[i + 1] = x->keys[i];
i--;
}
x->keys[i + 1] = k;
x->n++;
} else {
// find child to recurse into
while (i >= 0 && x->keys[i] > k)
i--;
i++;
// split child if full
if (x->children[i]->n == 2 * x->t - 1) {
splitChild(x, i, x->children[i]);
if (k > x->keys[i])
i++;
}
insertNonFull(x->children[i], k);
}
}
// main insert function — returns new root if tree grew
BTreeNode* insert(BTreeNode* root, int k, int t) {
if (root == NULL) {
root = createNode(t, true);
root->keys[0] = k;
root->n = 1;
return root;
}
// if root is full, split and grow the tree
if (root->n == 2 * t - 1) {
BTreeNode* s = createNode(t, false);
s->children[0] = root;
splitChild(s, 0, root);
// determine which child to insert into
int i = 0;
if (s->keys[0] < k)
i++;
insertNonFull(s->children[i], k);
return s; // new root
}
insertNonFull(root, k);
return root;
}
Python insertion implementation
def insert_non_full(x, k):
    i = len(x.keys) - 1
    if x.leaf:
        # insert key into leaf at correct position
        x.keys.append(0)  # placeholder
        while i >= 0 and x.keys[i] > k:
            x.keys[i + 1] = x.keys[i]
            i -= 1
        x.keys[i + 1] = k
    else:
        # find child to recurse into
        while i >= 0 and x.keys[i] > k:
            i -= 1
        i += 1
        # split child if full
        if x.children[i].is_full():
            split_child(x, i)
            if k > x.keys[i]:
                i += 1
        insert_non_full(x.children[i], k)

def insert(root, k, t):
    if root is None:
        root = BTreeNode(t, True)
        root.keys.append(k)
        return root
    # if root is full, split and grow the tree
    if root.is_full():
        s = BTreeNode(t, False)
        s.children.append(root)
        split_child(s, 0)
        # determine which child to insert into
        i = 0
        if s.keys[0] < k:
            i += 1
        insert_non_full(s.children[i], k)
        return s  # new root
    insert_non_full(root, k)
    return root
Go insertion implementation
// insertNonFull inserts key k into non-full node x
func insertNonFull(x *BTreeNode, k int) {
i := len(x.keys) - 1
if x.leaf {
// insert key into leaf at correct position
x.keys = append(x.keys, 0)
for i >= 0 && x.keys[i] > k {
x.keys[i+1] = x.keys[i]
i--
}
x.keys[i+1] = k
} else {
// find child to recurse into
for i >= 0 && x.keys[i] > k {
i--
}
i++
// split child if full
if x.children[i].isFull() {
splitChild(x, i, x.children[i])
if k > x.keys[i] {
i++
}
}
insertNonFull(x.children[i], k)
}
}
// bTreeInsert inserts key k into B-tree, returns new root if tree grew
func bTreeInsert(root *BTreeNode, k, t int) *BTreeNode {
if root == nil {
root = newBTreeNode(t, true)
root.keys = append(root.keys, k)
return root
}
// if root is full, split and grow the tree
if root.isFull() {
s := newBTreeNode(t, false)
s.children = append(s.children, root)
splitChild(s, 0, root)
// determine which child to insert into
i := 0
if s.keys[0] < k {
i++
}
insertNonFull(s.children[i], k)
return s // new root
}
insertNonFull(root, k)
return root
}
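Proactive splitting ensures that every node entered during the descent is not full, so an insertion completes in a single pass from root to leaf with no backtracking. Insertion takes O(t * log_t n) time, where t is the cost of scanning or splitting one node and log_t n is the tree height.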
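To see exactly when the root split happens, the tree can be dumped level by level after each insertion. The sketch below is a compact restatement of the insertion logic that uses `bisect` for positioning; `levels` and `trace` are hypothetical helper names, and the iterative `insert_non_full` is a rewrite for brevity, not the version listed above.

```python
from bisect import bisect_right, insort


class BTreeNode:
    def __init__(self, t, leaf):
        self.t, self.leaf = t, leaf
        self.keys, self.children = [], []

    def is_full(self):
        return len(self.keys) == 2 * self.t - 1


def split_child(x, i):
    y, t = x.children[i], x.t
    z = BTreeNode(t, y.leaf)
    mid = y.keys[t - 1]
    z.keys, y.keys = y.keys[t:], y.keys[:t - 1]
    if not y.leaf:
        z.children, y.children = y.children[t:], y.children[:t]
    x.children.insert(i + 1, z)
    x.keys.insert(i, mid)


def insert_non_full(x, k):
    while not x.leaf:
        i = bisect_right(x.keys, k)       # child whose range contains k
        if x.children[i].is_full():
            split_child(x, i)             # split before descending
            if k > x.keys[i]:
                i += 1
        x = x.children[i]
    insort(x.keys, k)                     # sorted insert into the leaf


def insert(root, k, t):
    if root is None:
        root = BTreeNode(t, True)
    elif root.is_full():
        s = BTreeNode(t, False)           # new root, tree grows by one level
        s.children.append(root)
        split_child(s, 0)
        root = s
    insert_non_full(root, k)
    return root


def levels(root):
    """Return the keys of every node, grouped by depth."""
    out, frontier = [], [root]
    while frontier:
        out.append([list(node.keys) for node in frontier])
        frontier = [c for node in frontier for c in node.children]
    return out


def trace(keys, t):
    """Insert keys one by one and record a snapshot after each insertion."""
    root, snapshots = None, []
    for k in keys:
        root = insert(root, k, t)
        snapshots.append(levels(root))
    return snapshots


sequence = [10, 20, 5, 6, 12, 30, 7, 17]
for k, snapshot in zip(sequence, trace(sequence, t=3)):
    print(k, snapshot)
```

With t = 3, the snapshot after inserting 12 is still a single full node; the split appears in the snapshot for 30, which is the first insertion that finds the root full.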
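Traversal
B-tree traversal generalizes the inorder traversal of a binary search tree: for each key in a node, first visit the subtree to its left, then the key itself; after the last key, visit the rightmost subtree.
Traversal order:
Node:        [k0 | k1 | k2]
Children:    c0   c1   c2   c3
Visit order: c0 → k0 → c1 → k1 → c2 → k2 → c3
The result is all keys in ascending order.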
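The visit order can also be expressed as a generator, which yields keys lazily instead of printing them. This is a sketch with a simplified, hypothetical `BTreeNode` constructor (keys and children passed in directly) and a hypothetical function name `iter_keys`.

```python
class BTreeNode:
    """Simplified node for this example: a leaf has an empty children list."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def iter_keys(node):
    """Yield every key in the subtree rooted at node in ascending order."""
    if node is None:
        return
    for i, key in enumerate(node.keys):
        if node.children:
            yield from iter_keys(node.children[i])  # subtree left of key i
        yield key
    if node.children:
        yield from iter_keys(node.children[-1])     # rightmost subtree


root = BTreeNode([10], [BTreeNode([5, 6, 7]), BTreeNode([12, 17, 20, 30])])
print(list(iter_keys(root)))  # [5, 6, 7, 10, 12, 17, 20, 30]
```

A generator lets the caller stop early, for example after the first few keys, without walking the rest of the tree.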
C++ traversal implementation
// inorder traversal of subtree rooted at node
void traverse(BTreeNode* node) {
if (node == nullptr) return;
int n = node->keys.size();
for (int i = 0; i < n; i++) {
// visit left child before key
if (!node->leaf)
traverse(node->children[i]);
cout << node->keys[i] << " ";
}
// visit the rightmost child
if (!node->leaf)
traverse(node->children[n]);
}
C traversal implementation
// inorder traversal of subtree rooted at node
void traverse(BTreeNode* node) {
if (node == NULL) return;
for (int i = 0; i < node->n; i++) {
// visit left child before key
if (!node->leaf)
traverse(node->children[i]);
printf("%d ", node->keys[i]);
}
// visit the rightmost child
if (!node->leaf)
traverse(node->children[node->n]);
}
Python traversal implementation
def traverse(node):
    if node is None:
        return
    for i in range(len(node.keys)):
        # visit left child before key
        if not node.leaf:
            traverse(node.children[i])
        print(node.keys[i], end=' ')
    # visit the rightmost child
    if not node.leaf:
        traverse(node.children[len(node.keys)])
Go traversal implementation
// traverse performs inorder traversal of subtree rooted at node
func traverse(node *BTreeNode) {
if node == nil {
return
}
for i := 0; i < len(node.keys); i++ {
// visit left child before key
if !node.leaf {
traverse(node.children[i])
}
fmt.Printf("%d ", node.keys[i])
}
// visit the rightmost child
if !node.leaf {
traverse(node.children[len(node.keys)])
}
}
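Traversal visits every node exactly once, so it takes O(n) time, where n is the total number of keys.
Complete Implementation
Below are complete B-tree implementations in C++, C, Python, and Go, covering search, insertion, and traversal. They use minimum degree t = 3 and the insertion sequence 10, 20, 5, 6, 12, 30, 7, 17.
C++ complete implementation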
#include <iostream>
#include <vector>
using namespace std;
// B-Tree node
struct BTreeNode {
vector<int> keys;
vector<BTreeNode*> children;
int t;
bool leaf;
BTreeNode(int _t, bool _leaf) : t(_t), leaf(_leaf) {}
bool isFull() {
return (int)keys.size() == 2 * t - 1;
}
};
// split full child y of node x at index i
void splitChild(BTreeNode* x, int i, BTreeNode* y) {
BTreeNode* z = new BTreeNode(y->t, y->leaf);
// copy last (t-1) keys from y to z
for (int j = 0; j < y->t - 1; j++)
z->keys.push_back(y->keys[j + y->t]);
// copy last t children from y to z
if (!y->leaf) {
for (int j = 0; j < y->t; j++)
z->children.push_back(y->children[j + y->t]);
}
// middle key to promote
int midKey = y->keys[y->t - 1];
// trim y to keep first (t-1) keys
y->keys.resize(y->t - 1);
if (!y->leaf)
y->children.resize(y->t);
// insert z as child and midKey into x
x->children.insert(x->children.begin() + i + 1, z);
x->keys.insert(x->keys.begin() + i, midKey);
}
// insert key into non-full node
void insertNonFull(BTreeNode* x, int k) {
int i = (int)x->keys.size() - 1;
if (x->leaf) {
// find position and insert
x->keys.push_back(0);
while (i >= 0 && x->keys[i] > k) {
x->keys[i + 1] = x->keys[i];
i--;
}
x->keys[i + 1] = k;
} else {
while (i >= 0 && x->keys[i] > k)
i--;
i++;
if (x->children[i]->isFull()) {
splitChild(x, i, x->children[i]);
if (k > x->keys[i])
i++;
}
insertNonFull(x->children[i], k);
}
}
// insert key into B-tree, return new root if tree grew
BTreeNode* insert(BTreeNode* root, int k, int t) {
if (root == nullptr) {
root = new BTreeNode(t, true);
root->keys.push_back(k);
return root;
}
if (root->isFull()) {
BTreeNode* s = new BTreeNode(t, false);
s->children.push_back(root);
splitChild(s, 0, root);
int i = (s->keys[0] < k) ? 1 : 0;
insertNonFull(s->children[i], k);
return s;
}
insertNonFull(root, k);
return root;
}
// search for key in subtree rooted at node
BTreeNode* search(BTreeNode* node, int k, int* idx) {
int i = 0;
while (i < (int)node->keys.size() && k > node->keys[i])
i++;
if (i < (int)node->keys.size() && node->keys[i] == k) {
*idx = i;
return node;
}
if (node->leaf)
return nullptr;
return search(node->children[i], k, idx);
}
// inorder traversal
void traverse(BTreeNode* node) {
if (node == nullptr) return;
int n = (int)node->keys.size();
for (int i = 0; i < n; i++) {
if (!node->leaf)
traverse(node->children[i]);
cout << node->keys[i] << " ";
}
if (!node->leaf)
traverse(node->children[n]);
}
int main() {
BTreeNode* root = nullptr;
int t = 3;
int keys[] = {10, 20, 5, 6, 12, 30, 7, 17};
for (int k : keys)
root = insert(root, k, t);
cout << "Traversal: ";
traverse(root);
cout << endl;
int idx;
BTreeNode* result = search(root, 6, &idx);
cout << "Search 6: " << (result ? "Found" : "Not found") << endl;
result = search(root, 15, &idx);
cout << "Search 15: " << (result ? "Found" : "Not found") << endl;
return 0;
}
C complete implementation
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
typedef struct BTreeNode {
int* keys;
struct BTreeNode** children;
int t;
int n;
bool leaf;
} BTreeNode;
// create a new node
BTreeNode* createNode(int t, bool leaf) {
BTreeNode* node = (BTreeNode*)malloc(sizeof(BTreeNode));
node->t = t;
node->leaf = leaf;
node->n = 0;
node->keys = (int*)malloc((2 * t - 1) * sizeof(int));
node->children = (BTreeNode**)malloc(2 * t * sizeof(BTreeNode*));
for (int i = 0; i < 2 * t; i++)
node->children[i] = NULL;
return node;
}
// split full child y of node x at index i
void splitChild(BTreeNode* x, int i, BTreeNode* y) {
BTreeNode* z = createNode(y->t, y->leaf);
z->n = y->t - 1;
// copy last (t-1) keys from y to z
for (int j = 0; j < y->t - 1; j++)
z->keys[j] = y->keys[j + y->t];
// copy last t children from y to z
if (!y->leaf) {
for (int j = 0; j < y->t; j++)
z->children[j] = y->children[j + y->t];
}
// update y's key count
y->n = y->t - 1;
// shift x's children right
for (int j = x->n; j >= i + 1; j--)
x->children[j + 1] = x->children[j];
x->children[i + 1] = z;
// shift x's keys right
for (int j = x->n - 1; j >= i; j--)
x->keys[j + 1] = x->keys[j];
// promote middle key
x->keys[i] = y->keys[y->t - 1];
x->n++;
}
// insert key into non-full node
void insertNonFull(BTreeNode* x, int k) {
int i = x->n - 1;
if (x->leaf) {
// shift keys right and insert
while (i >= 0 && x->keys[i] > k) {
x->keys[i + 1] = x->keys[i];
i--;
}
x->keys[i + 1] = k;
x->n++;
} else {
while (i >= 0 && x->keys[i] > k)
i--;
i++;
if (x->children[i]->n == 2 * x->t - 1) {
splitChild(x, i, x->children[i]);
if (k > x->keys[i])
i++;
}
insertNonFull(x->children[i], k);
}
}
// insert key into B-tree, return new root if tree grew
BTreeNode* insert(BTreeNode* root, int k, int t) {
if (root == NULL) {
root = createNode(t, true);
root->keys[0] = k;
root->n = 1;
return root;
}
if (root->n == 2 * t - 1) {
BTreeNode* s = createNode(t, false);
s->children[0] = root;
splitChild(s, 0, root);
int i = (s->keys[0] < k) ? 1 : 0;
insertNonFull(s->children[i], k);
return s;
}
insertNonFull(root, k);
return root;
}
// search for key in subtree
BTreeNode* search(BTreeNode* node, int k, int* idx) {
int i = 0;
while (i < node->n && k > node->keys[i])
i++;
if (i < node->n && node->keys[i] == k) {
*idx = i;
return node;
}
if (node->leaf)
return NULL;
return search(node->children[i], k, idx);
}
// inorder traversal
void traverse(BTreeNode* node) {
if (node == NULL) return;
for (int i = 0; i < node->n; i++) {
if (!node->leaf)
traverse(node->children[i]);
printf("%d ", node->keys[i]);
}
if (!node->leaf)
traverse(node->children[node->n]);
}
int main() {
BTreeNode* root = NULL;
int t = 3;
int keys[] = {10, 20, 5, 6, 12, 30, 7, 17};
int n = sizeof(keys) / sizeof(keys[0]);
for (int i = 0; i < n; i++)
root = insert(root, keys[i], t);
printf("Traversal: ");
traverse(root);
printf("\n");
int idx;
BTreeNode* result = search(root, 6, &idx);
printf("Search 6: %s\n", result ? "Found" : "Not found");
result = search(root, 15, &idx);
printf("Search 15: %s\n", result ? "Found" : "Not found");
return 0;
}
Python complete implementation
class BTreeNode:
    def __init__(self, t, leaf):
        self.keys = []
        self.children = []
        self.t = t
        self.leaf = leaf

    def is_full(self):
        return len(self.keys) == 2 * self.t - 1

def split_child(x, i):
    y = x.children[i]
    t = y.t
    z = BTreeNode(t, y.leaf)
    # copy last (t-1) keys from y to z
    z.keys = y.keys[t:]
    mid_key = y.keys[t - 1]
    # copy last t children from y to z
    if not y.leaf:
        z.children = y.children[t:]
    # trim y
    y.keys = y.keys[:t - 1]
    if not y.leaf:
        y.children = y.children[:t]
    # insert z and mid_key into x
    x.children.insert(i + 1, z)
    x.keys.insert(i, mid_key)

def insert_non_full(x, k):
    i = len(x.keys) - 1
    if x.leaf:
        x.keys.append(0)
        while i >= 0 and x.keys[i] > k:
            x.keys[i + 1] = x.keys[i]
            i -= 1
        x.keys[i + 1] = k
    else:
        while i >= 0 and x.keys[i] > k:
            i -= 1
        i += 1
        if x.children[i].is_full():
            split_child(x, i)
            if k > x.keys[i]:
                i += 1
        insert_non_full(x.children[i], k)

def insert(root, k, t):
    if root is None:
        root = BTreeNode(t, True)
        root.keys.append(k)
        return root
    if root.is_full():
        s = BTreeNode(t, False)
        s.children.append(root)
        split_child(s, 0)
        i = 0
        if s.keys[0] < k:
            i += 1
        insert_non_full(s.children[i], k)
        return s
    insert_non_full(root, k)
    return root

def search(node, k):
    i = 0
    while i < len(node.keys) and k > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == k:
        return node, i
    if node.leaf:
        return None, -1
    return search(node.children[i], k)

def traverse(node):
    if node is None:
        return
    for i in range(len(node.keys)):
        if not node.leaf:
            traverse(node.children[i])
        print(node.keys[i], end=' ')
    if not node.leaf:
        traverse(node.children[len(node.keys)])

if __name__ == '__main__':
    root = None
    t = 3
    for k in [10, 20, 5, 6, 12, 30, 7, 17]:
        root = insert(root, k, t)
    print("Traversal: ", end='')
    traverse(root)
    print()
    node, idx = search(root, 6)
    print(f"Search 6: {'Found' if node else 'Not found'}")
    node, idx = search(root, 15)
    print(f"Search 15: {'Found' if node else 'Not found'}")
Go complete implementation
package main
import "fmt"
// BTreeNode represents a node in the B-Tree
type BTreeNode struct {
keys []int
children []*BTreeNode
t int
leaf bool
}
func newBTreeNode(t int, leaf bool) *BTreeNode {
return &BTreeNode{
keys: []int{},
children: []*BTreeNode{},
t: t,
leaf: leaf,
}
}
func (n *BTreeNode) isFull() bool {
return len(n.keys) == 2*n.t-1
}
// splitChild splits full child y of node x at index i
func splitChild(x *BTreeNode, i int, y *BTreeNode) {
z := newBTreeNode(y.t, y.leaf)
// copy last (t-1) keys from y to z
z.keys = append(z.keys, y.keys[y.t:]...)
midKey := y.keys[y.t-1]
// copy last t children from y to z
if !y.leaf {
z.children = append(z.children, y.children[y.t:]...)
}
// trim y to keep first (t-1) keys and t children
y.keys = y.keys[:y.t-1]
if !y.leaf {
y.children = y.children[:y.t]
}
// insert z as a child of x at position i+1
x.children = append(x.children, nil)
copy(x.children[i+2:], x.children[i+1:])
x.children[i+1] = z
// insert middle key into x at position i
x.keys = append(x.keys, 0)
copy(x.keys[i+1:], x.keys[i:])
x.keys[i] = midKey
}
// insertNonFull inserts key k into non-full node x
func insertNonFull(x *BTreeNode, k int) {
i := len(x.keys) - 1
if x.leaf {
// find position and insert
x.keys = append(x.keys, 0)
for i >= 0 && x.keys[i] > k {
x.keys[i+1] = x.keys[i]
i--
}
x.keys[i+1] = k
} else {
for i >= 0 && x.keys[i] > k {
i--
}
i++
if x.children[i].isFull() {
splitChild(x, i, x.children[i])
if k > x.keys[i] {
i++
}
}
insertNonFull(x.children[i], k)
}
}
// bTreeInsert inserts key into B-tree, returns new root if tree grew
func bTreeInsert(root *BTreeNode, k, t int) *BTreeNode {
if root == nil {
root = newBTreeNode(t, true)
root.keys = append(root.keys, k)
return root
}
if root.isFull() {
s := newBTreeNode(t, false)
s.children = append(s.children, root)
splitChild(s, 0, root)
i := 0
if s.keys[0] < k {
i++
}
insertNonFull(s.children[i], k)
return s
}
insertNonFull(root, k)
return root
}
// search searches for key k in subtree rooted at node
func search(node *BTreeNode, k int) (*BTreeNode, int) {
if node == nil {
return nil, -1
}
i := 0
for i < len(node.keys) && k > node.keys[i] {
i++
}
if i < len(node.keys) && node.keys[i] == k {
return node, i
}
if node.leaf {
return nil, -1
}
return search(node.children[i], k)
}
// traverse performs inorder traversal of subtree rooted at node
func traverse(node *BTreeNode) {
if node == nil {
return
}
for i := 0; i < len(node.keys); i++ {
if !node.leaf {
traverse(node.children[i])
}
fmt.Printf("%d ", node.keys[i])
}
if !node.leaf {
traverse(node.children[len(node.keys)])
}
}
func main() {
var root *BTreeNode
t := 3
keys := []int{10, 20, 5, 6, 12, 30, 7, 17}
for _, k := range keys {
root = bTreeInsert(root, k, t)
}
fmt.Print("Traversal: ")
traverse(root)
fmt.Println()
node, _ := search(root, 6)
if node != nil {
fmt.Println("Search 6: Found")
} else {
fmt.Println("Search 6: Not found")
}
node, _ = search(root, 15)
if node != nil {
fmt.Println("Search 15: Found")
} else {
fmt.Println("Search 15: Not found")
}
}
Running the program prints:
Traversal: 5 6 7 10 12 17 20 30
Search 6: Found
Search 15: Not found
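All four versions print the same output. The traversal output 5 6 7 10 12 17 20 30 is the inserted keys in ascending order, which confirms the ordering property. The search for 6 succeeds, while the search for 15 reports Not found because 15 was never inserted.
After inserting 10, 20, 5, 6, 12, 30, 7, 17 (t = 3), the final tree is:
          [10]
         /    \
   [5|6|7]    [12|17|20|30]
- The root [10] has 1 key and 2 children.
- The left child [5|6|7] has 3 keys.
- The right child [12|17|20|30] has 4 keys.
- All leaves are on the same level (depth 1).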
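Properties of a B-Tree (Summary)
A B-tree combines multi-way branching with balance constraints to organize data efficiently on external storage.
Time Complexity
| Operation | Time complexity | Notes |
|---|---|---|
| Search | O(t * log_t n) | Scans O(t) keys per level across O(log_t n) levels; O(log n) with binary search inside each node |
| Insert | O(t * log_t n) | Single downward pass with proactive splitting, no backtracking |
| Delete | O(t * log_t n) | Requires merging nodes or borrowing keys from sibling nodes |
| Traverse | O(n) | Visits every key exactly once |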
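Disk Access Advantage
The main advantage of a B-tree is fewer disk I/Os. The design principle is to make one node the size of one disk block (commonly 4 KB):
- Reading one node costs one disk I/O.
- The tree height is h = O(log_t n), so a lookup needs about h disk reads.
- With t = 100, storing 1 million records needs a tree of only about 3 levels, that is, about 3 disk reads.
By comparison, a balanced binary search tree holding 1 million records has about 20 levels and needs about 20 disk reads.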
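The figures of 3 and 20 levels can be reproduced with a short calculation. The sketch below assumes the standard lower bound that a B-tree of height h (root at height 0) with minimum degree t holds at least 2 * t^h - 1 keys; `max_btree_levels` and `min_bst_levels` are hypothetical helper names.

```python
def max_btree_levels(n, t):
    """Worst-case number of levels of a B-tree with n >= 1 keys, t >= 2."""
    # A tree of height h (root at height 0) holds at least 2 * t**h - 1 keys,
    # so grow h while that minimum still fits into n.
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h + 1  # levels = height + 1


def min_bst_levels(n):
    """Levels of a perfectly balanced binary search tree with n >= 1 keys."""
    return n.bit_length()  # equals floor(log2(n)) + 1


n = 1_000_000
print(max_btree_levels(n, t=100))  # 3
print(min_bst_levels(n))           # 20
```

Integer arithmetic is used throughout to avoid floating-point rounding near exact powers.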
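Real-World Applications
| Application | Description |
|---|---|
| MongoDB | Uses B-tree indexes by default (the WiredTiger engine's implementation is often described as a B+ tree) |
| Database indexes | Relational databases widely use B-tree or B+ tree indexes |
| HDFS | Block management in the Hadoop Distributed File System |
| NTFS | Directory indexes in the Windows file system |
| ext4 | HTree directory indexes in the Linux file system |
B-trees are closely related to B+ trees: a B+ tree stores all data in the leaves, keeps only routing keys in internal nodes, and links the leaves in a list, which suits range queries. Most database systems actually use B+ trees rather than the original B-tree.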
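To illustrate why linked leaves help range queries, here is a minimal sketch of a scan over a B+ tree's leaf chain. It models only the leaf level: `Leaf`, `link_leaves`, and `range_scan` are hypothetical names, and the scan starts at the leftmost leaf for simplicity, whereas a real B+ tree would first descend from the root to the leaf that contains the lower bound.

```python
class Leaf:
    """Hypothetical B+ tree leaf: sorted keys plus a pointer to the right sibling."""
    def __init__(self, keys):
        self.keys = keys
        self.next = None


def link_leaves(leaves):
    """Chain the leaves left to right and return the leftmost one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]


def range_scan(leaf, lo, hi):
    """Yield all keys k with lo <= k <= hi by following sibling pointers."""
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return          # keys are sorted, nothing further can match
            if k >= lo:
                yield k
        leaf = leaf.next


first = link_leaves([Leaf([5, 6, 7]), Leaf([10, 12]), Leaf([17, 20, 30])])
print(list(range_scan(first, 6, 17)))  # [6, 7, 10, 12, 17]
```

Once the starting leaf is known, the scan never goes back up through internal nodes, which is the property the paragraph above refers to.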
