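3-8 Closed Hash Tables, using Buckets
Closed hash tables with buckets are one strategy for resolving hash collisions. Open addressing probes the array one slot at a time looking for a free position. A bucketed hash table instead makes every slot a fixed-size "bucket" that can hold several elements. Once a bucket is full, any further colliding element is placed in a shared overflow area. The scheme therefore counts as a closed hash table, because all elements live in preallocated arrays rather than linked lists, while the buckets absorb colliding elements in batches.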
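Core Concepts
- Bucket: each slot of the hash table is a fixed-size array that can hold several keys. A bucket has a count field recording how many keys it currently stores.
- Bucket size: the maximum number of keys a bucket can hold, written BUCKET_SIZE. It is usually small (for example 2 or 3).
- Overflow area: when a bucket is full (count == BUCKET_SIZE), a new key that maps to it cannot go into that bucket and is stored in the overflow area instead. The overflow area is organized like a bucket but has its own capacity, OVERFLOW_SIZE.
- Hash function: maps a key to a bucket index, for example hash(key) = key % TABLE_SIZE.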
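The following trace inserts keys into a table with TABLE_SIZE = 7 and BUCKET_SIZE = 2:
Insert 10: hash(10) = 10 % 7 = 3 -> bucket[3] is empty, store in bucket[3][0]
Insert 22: hash(22) = 22 % 7 = 1 -> bucket[1] is empty, store in bucket[1][0]
Insert 31: hash(31) = 31 % 7 = 3 -> bucket[3] is not full, store in bucket[3][1]
Insert 4: hash(4) = 4 % 7 = 4 -> bucket[4] is empty, store in bucket[4][0]
Insert 15: hash(15) = 15 % 7 = 1 -> bucket[1] is not full, store in bucket[1][1]
Insert 28: hash(28) = 28 % 7 = 0 -> bucket[0] is empty, store in bucket[0][0]
Insert 17: hash(17) = 17 % 7 = 3 -> bucket[3] is full, store in overflow[0]
Insert 88: hash(88) = 88 % 7 = 4 -> bucket[4] is not full, store in bucket[4][1]
Resulting layout: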
Bucket[0]: [28, -] count=1
Bucket[1]: [22, 15] count=2
Bucket[2]: [ -, -] count=0
Bucket[3]: [10, 31] count=2
Bucket[4]: [ 4, 88] count=2
Bucket[5]: [ -, -] count=0
Bucket[6]: [ -, -] count=0
Overflow: [17, -] count=1
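In this example, 10 and 31 both map to bucket 3 and exactly fill it (BUCKET_SIZE = 2). When 17 also maps to bucket 3, the bucket is already full, so 17 can only go into the overflow area. 22 and 15 fill bucket 1, and 4 and 88 fill bucket 4.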
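The trace above can be replayed with a short sketch. It assumes distinct keys, hash(key) = key % table_size, and plain Python lists for the buckets and the overflow area; the function name build_layout is hypothetical, and duplicate checks are omitted for brevity.

```python
def build_layout(keys, table_size=7, bucket_size=2):
    """Replay insertions in order; return (buckets, overflow). Assumes distinct keys."""
    buckets = [[] for _ in range(table_size)]  # one list per bucket
    overflow = []                               # shared overflow area
    for key in keys:
        bucket = buckets[key % table_size]      # home bucket of this key
        if len(bucket) < bucket_size:
            bucket.append(key)                  # room left in the home bucket
        else:
            overflow.append(key)                # home bucket full: spill over
    return buckets, overflow


buckets, overflow = build_layout([10, 22, 31, 4, 15, 28, 17, 88])
for i, bucket in enumerate(buckets):
    print(f"Bucket[{i}]: {bucket}")
print(f"Overflow: {overflow}")
```

Only 17 spills, because it is the only key that arrives after its home bucket (bucket 3) is already full.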
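Data Structure Definitions
The core data structure of a bucketed hash table has two parts:
- Bucket: a fixed-size key array plus a counter recording how many keys the bucket currently holds.
- HashTable: an array of buckets plus a separate overflow area with its own capacity, OVERFLOW_SIZE.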
// C++ Bucket and HashTable definitions
const int TABLE_SIZE = 7; // Number of buckets
const int BUCKET_SIZE = 2; // Slots per bucket
const int OVERFLOW_SIZE = 10; // Overflow bucket capacity
struct Bucket {
int keys[BUCKET_SIZE]; // Array storing keys in this bucket
int count; // Number of keys currently stored
Bucket() : count(0) {
for (int i = 0; i < BUCKET_SIZE; i++)
keys[i] = -1; // -1 indicates empty slot
}
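};
// Overflow area: its capacity is OVERFLOW_SIZE, independent of BUCKET_SIZE
struct OverflowArea {
int keys[OVERFLOW_SIZE]; // -1 indicates empty slot
OverflowArea() {
for (int i = 0; i < OVERFLOW_SIZE; i++)
keys[i] = -1;
}
};
class HashTable {
private:
Bucket buckets[TABLE_SIZE]; // Main bucket array
OverflowArea overflow; // Overflow area with OVERFLOW_SIZE slots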
int overflowCount; // Items in overflow
int hashFunction(int key) {
return key % TABLE_SIZE;
}
public:
HashTable() : overflowCount(0) {}
void insert(int key);
bool search(int key);
void remove(int key);
void display();
};
// C Bucket and HashTable definitions
#define TABLE_SIZE 7
#define BUCKET_SIZE 2
#define OVERFLOW_SIZE 10
typedef struct {
int keys[BUCKET_SIZE]; // Array storing keys
int count; // Number of keys stored
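} Bucket;
// Overflow area: its capacity is OVERFLOW_SIZE, independent of BUCKET_SIZE
typedef struct {
int keys[OVERFLOW_SIZE]; // Overflow keys
} OverflowArea;
typedef struct {
Bucket buckets[TABLE_SIZE]; // Main bucket array
OverflowArea overflow; // Overflow area with OVERFLOW_SIZE slots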
int overflowCount; // Items in overflow
} HashTable;
# Python Bucket and HashTable definitions
TABLE_SIZE = 7
BUCKET_SIZE = 2
OVERFLOW_SIZE = 10
class Bucket:
    """A fixed-size bucket that holds multiple keys."""
    def __init__(self, size=BUCKET_SIZE):
        self.keys = [-1] * size  # -1 indicates empty slot
        self.count = 0
class HashTable:
    """Hash table using closed hashing with buckets and overflow."""
    def __init__(self):
        self.buckets = [Bucket() for _ in range(TABLE_SIZE)]
        self.overflow = Bucket(OVERFLOW_SIZE)  # overflow area with OVERFLOW_SIZE slots
        self.overflow_count = 0
// Go Bucket and HashTable definitions
package main
const TABLE_SIZE = 7
const BUCKET_SIZE = 2
const OVERFLOW_SIZE = 10
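// Bucket is a fixed-size bucket that stores several keys
type Bucket struct {
keys []int // key array; -1 marks an empty slot
count int // number of keys currently stored
}
// HashTable is a bucketed hash table with a main bucket array and an overflow bucket
type HashTable struct {
buckets []*Bucket // main bucket array
overflow *Bucket // overflow bucket
overflowCount int // number of keys in the overflow area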
}
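C++ defines the bucket with struct Bucket, which holds an int keys[BUCKET_SIZE] array and an int count counter; class HashTable holds the bucket array buckets[TABLE_SIZE] and the overflow area overflow. C uses typedef struct to define the equivalent structures. Python wraps the key list and counter in a Bucket class, and the HashTable class manages the list of buckets and the overflow bucket. Go defines both as structs: Bucket holds an []int slice and a count counter, and HashTable manages a slice of bucket pointers and a pointer to the overflow bucket. In every version the overflow area must have room for OVERFLOW_SIZE keys, which is why Python builds it with Bucket(OVERFLOW_SIZE) and Go with newBucket(OVERFLOW_SIZE).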
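Insertion
Insertion proceeds as follows:
- Compute the target bucket index: bucketIndex = key % TABLE_SIZE.
- Reject the key as a duplicate if it is already stored, either in the target bucket or in the overflow area.
- If the bucket is not full (count < BUCKET_SIZE), append the key to the bucket.
- Otherwise, append the key to the overflow area, or report failure if the overflow area is also full.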
// C++ Insert: add a key to the bucket hash table
void HashTable::insert(int key) {
int idx = hashFunction(key);
Bucket& b = buckets[idx];
// Check if key already exists in the bucket
for (int i = 0; i < b.count; i++) {
if (b.keys[i] == key) {
cout << "Key " << key << " already exists" << endl;
return;
}
}
// The key may also be in overflow (e.g. after a deletion freed a slot here)
for (int i = 0; i < overflowCount; i++) {
if (overflow.keys[i] == key) {
cout << "Key " << key << " already exists" << endl;
return;
}
}
// Try to insert into the main bucket
if (b.count < BUCKET_SIZE) {
b.keys[b.count] = key;
b.count++;
return;
}
// Bucket is full, insert into overflow
if (overflowCount < OVERFLOW_SIZE) {
overflow.keys[overflowCount] = key;
overflowCount++;
} else {
cout << "Overflow is full, cannot insert " << key << endl;
}
}
// C Insert: add a key to the bucket hash table
void insert(HashTable* ht, int key) {
int idx = key % TABLE_SIZE;
Bucket* b = &ht->buckets[idx];
// Check if key already exists in the bucket
for (int i = 0; i < b->count; i++) {
if (b->keys[i] == key) {
printf("Key %d already exists\n", key);
return;
}
}
// The key may also be in overflow (e.g. after a deletion freed a slot here)
for (int i = 0; i < ht->overflowCount; i++) {
if (ht->overflow.keys[i] == key) {
printf("Key %d already exists\n", key);
return;
}
}
// Try to insert into the main bucket
if (b->count < BUCKET_SIZE) {
b->keys[b->count] = key;
b->count++;
return;
}
// Bucket is full, insert into overflow
if (ht->overflowCount < OVERFLOW_SIZE) {
ht->overflow.keys[ht->overflowCount] = key;
ht->overflowCount++;
} else {
printf("Overflow is full, cannot insert %d\n", key);
}
}
# Python Insert: add a key to the bucket hash table
def insert(self, key):
    idx = key % TABLE_SIZE
    b = self.buckets[idx]
    # Check if key already exists in the bucket
    for i in range(b.count):
        if b.keys[i] == key:
            print(f"Key {key} already exists")
            return
    # The key may also be in overflow (e.g. after a deletion freed a slot here)
    for i in range(self.overflow_count):
        if self.overflow.keys[i] == key:
            print(f"Key {key} already exists")
            return
    # Try to insert into the main bucket
    if b.count < BUCKET_SIZE:
        b.keys[b.count] = key
        b.count += 1
        return
    # Bucket is full, insert into overflow
    if self.overflow_count < OVERFLOW_SIZE:
        self.overflow.keys[self.overflow_count] = key
        self.overflow_count += 1
    else:
        print(f"Overflow is full, cannot insert {key}")
// Go Insert: add a key to the bucket hash table
func (ht *HashTable) insert(key int) {
idx := key % TABLE_SIZE
b := ht.buckets[idx]
// Check whether the key is already in the main bucket
for i := 0; i < b.count; i++ {
if b.keys[i] == key {
fmt.Printf("Key %d already exists\n", key)
return
}
}
// The key may also be in overflow (e.g. after a deletion freed a slot here)
for i := 0; i < ht.overflowCount; i++ {
if ht.overflow.keys[i] == key {
fmt.Printf("Key %d already exists\n", key)
return
}
}
// Main bucket not full: insert directly
if b.count < BUCKET_SIZE {
b.keys[b.count] = key
b.count++
return
}
// Main bucket full: insert into the overflow area
if ht.overflowCount < OVERFLOW_SIZE {
ht.overflow.keys[ht.overflowCount] = key
ht.overflowCount++
} else {
fmt.Printf("Overflow is full, cannot insert %d\n", key)
}
}
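Insertion first checks whether the key is already present, so the same key is never stored twice. If the bucket is not full, the key is written right after the bucket's last used slot in the keys array. If the bucket is full, the key goes into the overflow area. The overflow area has a fixed capacity as well, and insertion fails once it is full.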
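One subtlety of the duplicate check: a key that spilled into the overflow area stays there even if a later deletion frees a slot in its home bucket, so checking only the home bucket could let the same key be stored twice. The sketch below checks both regions before inserting. It is a simplified, hypothetical BucketTable that uses Python lists instead of the keys/count arrays above, with the same constants as the running example.

```python
TABLE_SIZE, BUCKET_SIZE, OVERFLOW_SIZE = 7, 2, 10


class BucketTable:
    """Simplified bucket table (lists instead of fixed arrays) for illustration."""

    def __init__(self):
        self.buckets = [[] for _ in range(TABLE_SIZE)]
        self.overflow = []

    def contains(self, key):
        # A key lives either in its home bucket or in the shared overflow area.
        return key in self.buckets[key % TABLE_SIZE] or key in self.overflow

    def insert(self, key):
        """Return True if the key was added, False if duplicate or no room."""
        if self.contains(key):            # check BOTH regions first
            return False
        bucket = self.buckets[key % TABLE_SIZE]
        if len(bucket) < BUCKET_SIZE:
            bucket.append(key)
            return True
        if len(self.overflow) < OVERFLOW_SIZE:
            self.overflow.append(key)
            return True
        return False                      # overflow area is full

    def remove(self, key):
        for region in (self.buckets[key % TABLE_SIZE], self.overflow):
            if key in region:
                i = region.index(key)
                region[i] = region[-1]    # overwrite with the last element
                region.pop()
                return True
        return False


t = BucketTable()
for k in [10, 31, 17]:    # 17 spills into overflow because bucket 3 is full
    t.insert(k)
t.remove(10)              # frees a slot in bucket 3
print(t.insert(17))       # False: 17 is still stored in the overflow area
```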
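Search
Search proceeds as follows:
- Compute the target bucket index: bucketIndex = key % TABLE_SIZE.
- Compare the key against every key stored in that bucket.
- If the bucket does not contain it, continue searching in the overflow area.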
// C++ Search: find a key in the bucket hash table
bool HashTable::search(int key) {
int idx = hashFunction(key);
Bucket& b = buckets[idx];
// Search in the main bucket
for (int i = 0; i < b.count; i++) {
if (b.keys[i] == key) {
return true;
}
}
// Search in overflow
for (int i = 0; i < overflowCount; i++) {
if (overflow.keys[i] == key) {
return true;
}
}
return false;
}
// C Search: find a key in the bucket hash table
int search(HashTable* ht, int key) {
int idx = key % TABLE_SIZE;
Bucket* b = &ht->buckets[idx];
// Search in the main bucket
for (int i = 0; i < b->count; i++) {
if (b->keys[i] == key) {
return 1;
}
}
// Search in overflow
for (int i = 0; i < ht->overflowCount; i++) {
if (ht->overflow.keys[i] == key) {
return 1;
}
}
return 0;
}
# Python Search: find a key in the bucket hash table
def search(self, key):
    idx = key % TABLE_SIZE
    b = self.buckets[idx]
    # Search in the main bucket
    for i in range(b.count):
        if b.keys[i] == key:
            return True
    # Search in overflow
    for i in range(self.overflow_count):
        if self.overflow.keys[i] == key:
            return True
    return False
// Go Search: find a key in the bucket hash table
func (ht *HashTable) search(key int) bool {
idx := key % TABLE_SIZE
b := ht.buckets[idx]
// Search the main bucket
for i := 0; i < b.count; i++ {
if b.keys[i] == key {
return true
}
}
// Search the overflow area
for i := 0; i < ht.overflowCount; i++ {
if ht.overflow.keys[i] == key {
return true
}
}
return false
}
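Search first compares the key, one by one, against the entries of the bucket its hash value selects. Because buckets are small (typically 2 to 3 slots), this step takes constant time. If the key is not found there, the overflow area has to be scanned; an unsuccessful search always scans all of it. The size of the overflow area therefore directly affects search performance.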
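To make the cost concrete, the following sketch counts the key comparisons the search procedure performs: home bucket first, then the overflow area. The helper name search_cost is hypothetical, and the layout is the running example written as Python lists.

```python
def search_cost(buckets, overflow, key, table_size=7):
    """Return (found, comparisons) for a bucket-then-overflow search."""
    comparisons = 0
    for k in buckets[key % table_size]:   # at most BUCKET_SIZE comparisons
        comparisons += 1
        if k == key:
            return True, comparisons
    for k in overflow:                    # linear scan of the overflow area
        comparisons += 1
        if k == key:
            return True, comparisons
    return False, comparisons


# Layout from the running example (TABLE_SIZE = 7, BUCKET_SIZE = 2).
buckets = [[28], [22, 15], [], [10, 31], [4, 88], [], []]
overflow = [17]
for key in (31, 17, 99):
    print(key, search_cost(buckets, overflow, key))
```

A miss such as 99 (home bucket 1) pays for its full bucket plus the entire overflow area, which is why a growing overflow area hurts unsuccessful searches first.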
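Deletion
Deletion proceeds as follows:
- First look for the key in its target bucket.
- If it is found, overwrite its position with the bucket's last element and decrement the count. Filling the hole from the end avoids leaving a gap in the middle of the bucket.
- If the bucket does not contain it, look for it in the overflow area and delete it there, again filling the hole from the end.
- If the overflow area does not contain it either, report that the key does not exist.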
// C++ Remove: delete a key from the bucket hash table
void HashTable::remove(int key) {
int idx = hashFunction(key);
Bucket& b = buckets[idx];
// Search in the main bucket
for (int i = 0; i < b.count; i++) {
if (b.keys[i] == key) {
// Replace with last element to avoid gaps
b.keys[i] = b.keys[b.count - 1];
b.keys[b.count - 1] = -1;
b.count--;
cout << "Deleted key " << key << " from bucket " << idx << endl;
return;
}
}
// Search in overflow
for (int i = 0; i < overflowCount; i++) {
if (overflow.keys[i] == key) {
overflow.keys[i] = overflow.keys[overflowCount - 1];
overflow.keys[overflowCount - 1] = -1;
overflowCount--;
cout << "Deleted key " << key << " from overflow" << endl;
return;
}
}
cout << "Key " << key << " not found" << endl;
}
// C Remove: delete a key from the bucket hash table
void removeKey(HashTable* ht, int key) {
int idx = key % TABLE_SIZE;
Bucket* b = &ht->buckets[idx];
// Search in the main bucket
for (int i = 0; i < b->count; i++) {
if (b->keys[i] == key) {
b->keys[i] = b->keys[b->count - 1];
b->keys[b->count - 1] = -1;
b->count--;
printf("Deleted key %d from bucket %d\n", key, idx);
return;
}
}
// Search in overflow
for (int i = 0; i < ht->overflowCount; i++) {
if (ht->overflow.keys[i] == key) {
ht->overflow.keys[i] = ht->overflow.keys[ht->overflowCount - 1];
ht->overflow.keys[ht->overflowCount - 1] = -1;
ht->overflowCount--;
printf("Deleted key %d from overflow\n", key);
return;
}
}
printf("Key %d not found\n", key);
}
# Python Remove: delete a key from the bucket hash table
def remove(self, key):
    idx = key % TABLE_SIZE
    b = self.buckets[idx]
    # Search in the main bucket
    for i in range(b.count):
        if b.keys[i] == key:
            # Replace with last element to avoid gaps
            b.keys[i] = b.keys[b.count - 1]
            b.keys[b.count - 1] = -1
            b.count -= 1
            print(f"Deleted key {key} from bucket {idx}")
            return
    # Search in overflow
    for i in range(self.overflow_count):
        if self.overflow.keys[i] == key:
            self.overflow.keys[i] = self.overflow.keys[self.overflow_count - 1]
            self.overflow.keys[self.overflow_count - 1] = -1
            self.overflow_count -= 1
            print(f"Deleted key {key} from overflow")
            return
    print(f"Key {key} not found")
// Go Remove: delete a key from the bucket hash table
func (ht *HashTable) remove(key int) {
idx := key % TABLE_SIZE
b := ht.buckets[idx]
// Look in the main bucket
for i := 0; i < b.count; i++ {
if b.keys[i] == key {
// Overwrite with the last element to avoid a gap
b.keys[i] = b.keys[b.count-1]
b.keys[b.count-1] = -1
b.count--
fmt.Printf("Deleted key %d from bucket %d\n", key, idx)
return
}
}
// Look in the overflow area
for i := 0; i < ht.overflowCount; i++ {
if ht.overflow.keys[i] == key {
ht.overflow.keys[i] = ht.overflow.keys[ht.overflowCount-1]
ht.overflow.keys[ht.overflowCount-1] = -1
ht.overflowCount--
fmt.Printf("Deleted key %d from overflow\n", key)
return
}
}
fmt.Printf("Key %d not found\n", key)
}
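Deletion uses the "overwrite with the last element" strategy: the freed position is filled with the bucket's last element, and then the count is decremented. The elements of a bucket therefore always occupy a contiguous prefix of its array with no gaps in between, so search never has to skip empty slots. This is cleaner than the lazy deletion used by open addressing, because the order of elements within a bucket does not matter; all that matters is that they stay in the same bucket.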
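The swap-with-last step can be isolated as a small function over a fixed-size array plus a count, mirroring the Bucket layout (-1 marks an unused slot). The function name swap_remove and the capacity of 4 in the usage lines are assumptions for illustration.

```python
def swap_remove(keys, count, key):
    """Remove key from keys[0:count] in place; return the new count (unchanged if absent)."""
    for i in range(count):
        if keys[i] == key:
            keys[i] = keys[count - 1]  # move the last live key into the hole
            keys[count - 1] = -1       # clear the now-unused tail slot
            return count - 1
    return count


bucket = [3, 10, 17, -1]              # three keys that all hash to 3 (capacity 4 assumed)
count = swap_remove(bucket, 3, 3)
print(bucket, count)                  # [17, 10, -1, -1] 2
```

Only one element moves regardless of bucket size, at the price of not preserving insertion order inside the bucket.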
Displaying the Hash Table
The display operation walks through every bucket and the overflow area and prints the keys stored in each.
// C++ Display: print the entire hash table
void HashTable::display() {
for (int i = 0; i < TABLE_SIZE; i++) {
cout << "Bucket[" << i << "]: ";
if (buckets[i].count == 0) {
cout << "empty";
} else {
for (int j = 0; j < buckets[i].count; j++) {
cout << buckets[i].keys[j];
if (j < buckets[i].count - 1) cout << ", ";
}
}
cout << " (count=" << buckets[i].count << ")" << endl;
}
cout << "Overflow: ";
if (overflowCount == 0) {
cout << "empty";
} else {
for (int j = 0; j < overflowCount; j++) {
cout << overflow.keys[j];
if (j < overflowCount - 1) cout << ", ";
}
}
cout << " (count=" << overflowCount << ")" << endl;
}
// C Display: print the entire hash table
void display(HashTable* ht) {
for (int i = 0; i < TABLE_SIZE; i++) {
printf("Bucket[%d]: ", i);
if (ht->buckets[i].count == 0) {
printf("empty");
} else {
for (int j = 0; j < ht->buckets[i].count; j++) {
printf("%d", ht->buckets[i].keys[j]);
if (j < ht->buckets[i].count - 1) printf(", ");
}
}
printf(" (count=%d)\n", ht->buckets[i].count);
}
printf("Overflow: ");
if (ht->overflowCount == 0) {
printf("empty");
} else {
for (int j = 0; j < ht->overflowCount; j++) {
printf("%d", ht->overflow.keys[j]);
if (j < ht->overflowCount - 1) printf(", ");
}
}
printf(" (count=%d)\n", ht->overflowCount);
}
# Python Display: print the entire hash table
def display(self):
    for i in range(TABLE_SIZE):
        b = self.buckets[i]
        if b.count == 0:
            items = "empty"
        else:
            items = ", ".join(str(b.keys[j]) for j in range(b.count))
        print(f"Bucket[{i}]: {items} (count={b.count})")
    if self.overflow_count == 0:
        items = "empty"
    else:
        items = ", ".join(
            str(self.overflow.keys[j]) for j in range(self.overflow_count)
        )
    print(f"Overflow: {items} (count={self.overflow_count})")
// Go Display: print the entire hash table
func (ht *HashTable) display() {
for i := 0; i < TABLE_SIZE; i++ {
b := ht.buckets[i]
fmt.Printf("Bucket[%d]: ", i)
if b.count == 0 {
fmt.Print("empty")
} else {
for j := 0; j < b.count; j++ {
fmt.Print(b.keys[j])
if j < b.count-1 {
fmt.Print(", ")
}
}
}
fmt.Printf(" (count=%d)\n", b.count)
}
fmt.Print("Overflow: ")
if ht.overflowCount == 0 {
fmt.Print("empty")
} else {
for j := 0; j < ht.overflowCount; j++ {
fmt.Print(ht.overflow.keys[j])
if j < ht.overflowCount-1 {
fmt.Print(", ")
}
}
}
fmt.Printf(" (count=%d)\n", ht.overflowCount)
}
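The display operation visits the buckets in order, prints every valid key in each bucket together with its count, and finally prints the contents of the overflow area. Empty buckets are shown as "empty". This format makes it easy to see how elements are distributed across buckets and how much of the overflow area is in use.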
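A variant of the display can also show unused slots, which makes each bucket's remaining capacity visible at a glance, similar to the layout diagram at the start of this section. The function render and its exact output format are hypothetical; buckets are given as Python lists of stored keys.

```python
def render(buckets, overflow, bucket_size=2):
    """Return display lines; '-' marks an unused slot in a bucket."""
    lines = []
    for i, bucket in enumerate(buckets):
        cells = [str(k) for k in bucket] + ["-"] * (bucket_size - len(bucket))
        lines.append(f"Bucket[{i}]: [{', '.join(cells)}] count={len(bucket)}")
    lines.append(f"Overflow: [{', '.join(map(str, overflow))}] count={len(overflow)}")
    return lines


for line in render([[28], [22, 15], [], [10, 31], [4, 88], [], []], [17]):
    print(line)
```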
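Complete Implementation
Below are complete implementations of the bucketed hash table with insert, search, delete, and display, each packaged as a standalone program. Each program inserts 10, 22, 31, 4, 15, 28, 17, 88 in order, then runs searches and deletions and displays the table again.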
#include <iostream>
using namespace std;
const int TABLE_SIZE = 7;
const int BUCKET_SIZE = 2;
const int OVERFLOW_SIZE = 10;
struct Bucket {
int keys[BUCKET_SIZE];
int count;
Bucket() : count(0) {
for (int i = 0; i < BUCKET_SIZE; i++)
keys[i] = -1;
}
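};
// Overflow area: its capacity is OVERFLOW_SIZE, independent of BUCKET_SIZE
struct OverflowArea {
int keys[OVERFLOW_SIZE];
OverflowArea() {
for (int i = 0; i < OVERFLOW_SIZE; i++)
keys[i] = -1;
}
};
class HashTable {
private:
Bucket buckets[TABLE_SIZE];
OverflowArea overflow;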
int overflowCount;
int hashFunction(int key) {
return key % TABLE_SIZE;
}
public:
HashTable() : overflow(), overflowCount(0) {}
void insert(int key) {
int idx = hashFunction(key);
Bucket& b = buckets[idx];
// Check if key already exists in the bucket
for (int i = 0; i < b.count; i++) {
if (b.keys[i] == key) {
cout << "Key " << key << " already exists" << endl;
return;
}
}
// The key may also be in overflow (e.g. after a deletion freed a slot here)
for (int i = 0; i < overflowCount; i++) {
if (overflow.keys[i] == key) {
cout << "Key " << key << " already exists" << endl;
return;
}
}
if (b.count < BUCKET_SIZE) {
b.keys[b.count] = key;
b.count++;
return;
}
// Bucket full, insert into overflow
if (overflowCount < OVERFLOW_SIZE) {
overflow.keys[overflowCount] = key;
overflowCount++;
} else {
cout << "Overflow is full, cannot insert " << key << endl;
}
}
bool search(int key) {
int idx = hashFunction(key);
Bucket& b = buckets[idx];
for (int i = 0; i < b.count; i++) {
if (b.keys[i] == key) return true;
}
for (int i = 0; i < overflowCount; i++) {
if (overflow.keys[i] == key) return true;
}
return false;
}
void remove(int key) {
int idx = hashFunction(key);
Bucket& b = buckets[idx];
for (int i = 0; i < b.count; i++) {
if (b.keys[i] == key) {
b.keys[i] = b.keys[b.count - 1];
b.keys[b.count - 1] = -1;
b.count--;
cout << "Deleted key " << key << " from bucket " << idx << endl;
return;
}
}
for (int i = 0; i < overflowCount; i++) {
if (overflow.keys[i] == key) {
overflow.keys[i] = overflow.keys[overflowCount - 1];
overflow.keys[overflowCount - 1] = -1;
overflowCount--;
cout << "Deleted key " << key << " from overflow" << endl;
return;
}
}
cout << "Key " << key << " not found" << endl;
}
void display() {
for (int i = 0; i < TABLE_SIZE; i++) {
cout << "Bucket[" << i << "]: ";
if (buckets[i].count == 0) {
cout << "empty";
} else {
for (int j = 0; j < buckets[i].count; j++) {
cout << buckets[i].keys[j];
if (j < buckets[i].count - 1) cout << ", ";
}
}
cout << " (count=" << buckets[i].count << ")" << endl;
}
cout << "Overflow: ";
if (overflowCount == 0) {
cout << "empty";
} else {
for (int j = 0; j < overflowCount; j++) {
cout << overflow.keys[j];
if (j < overflowCount - 1) cout << ", ";
}
}
cout << " (count=" << overflowCount << ")" << endl;
}
};
int main() {
HashTable ht;
// Insert keys
cout << "=== Inserting ===" << endl;
int keys[] = {10, 22, 31, 4, 15, 28, 17, 88};
for (int k : keys) {
cout << "Insert " << k << ": hash(" << k << ") = " << k % 7 << endl;
ht.insert(k);
}
// Display
cout << "\n=== Hash Table ===" << endl;
ht.display();
// Search
cout << "\n=== Search ===" << endl;
int targets[] = {31, 17, 15, 99};
for (int t : targets) {
if (ht.search(t)) {
cout << "Key " << t << " found" << endl;
} else {
cout << "Key " << t << " not found" << endl;
}
}
// Delete
cout << "\n=== Delete ===" << endl;
ht.remove(22);
ht.remove(17);
ht.remove(99);
// Display after deletion
cout << "\n=== After Deletion ===" << endl;
ht.display();
return 0;
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define TABLE_SIZE 7
#define BUCKET_SIZE 2
#define OVERFLOW_SIZE 10
typedef struct {
int keys[BUCKET_SIZE];
int count;
} Bucket;
// Overflow area: its capacity is OVERFLOW_SIZE, independent of BUCKET_SIZE
typedef struct {
int keys[OVERFLOW_SIZE];
} OverflowArea;
typedef struct {
Bucket buckets[TABLE_SIZE];
OverflowArea overflow;
int overflowCount;
} HashTable;
void initBucket(Bucket* b) {
b->count = 0;
for (int i = 0; i < BUCKET_SIZE; i++)
b->keys[i] = -1;
}
HashTable* createHashTable() {
HashTable* ht = (HashTable*)malloc(sizeof(HashTable));
if (!ht) exit(1);
for (int i = 0; i < TABLE_SIZE; i++)
initBucket(&ht->buckets[i]);
// The overflow keys array is embedded in the struct; just mark every slot empty
for (int i = 0; i < OVERFLOW_SIZE; i++)
ht->overflow.keys[i] = -1;
ht->overflowCount = 0;
return ht;
}
void destroyHashTable(HashTable* ht) {
free(ht);
}
void insert(HashTable* ht, int key) {
int idx = key % TABLE_SIZE;
Bucket* b = &ht->buckets[idx];
for (int i = 0; i < b->count; i++) {
if (b->keys[i] == key) {
printf("Key %d already exists\n", key);
return;
}
}
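// The key may also be in overflow (e.g. after a deletion freed a slot here)
for (int i = 0; i < ht->overflowCount; i++) {
if (ht->overflow.keys[i] == key) {
printf("Key %d already exists\n", key);
return;
}
}
if (b->count < BUCKET_SIZE) {
b->keys[b->count] = key;
b->count++;
return;
}
// Bucket full, insert into overflow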
if (ht->overflowCount < OVERFLOW_SIZE) {
ht->overflow.keys[ht->overflowCount] = key;
ht->overflowCount++;
} else {
printf("Overflow is full, cannot insert %d\n", key);
}
}
int search(HashTable* ht, int key) {
int idx = key % TABLE_SIZE;
Bucket* b = &ht->buckets[idx];
for (int i = 0; i < b->count; i++) {
if (b->keys[i] == key) return 1;
}
for (int i = 0; i < ht->overflowCount; i++) {
if (ht->overflow.keys[i] == key) return 1;
}
return 0;
}
void removeKey(HashTable* ht, int key) {
int idx = key % TABLE_SIZE;
Bucket* b = &ht->buckets[idx];
for (int i = 0; i < b->count; i++) {
if (b->keys[i] == key) {
b->keys[i] = b->keys[b->count - 1];
b->keys[b->count - 1] = -1;
b->count--;
printf("Deleted key %d from bucket %d\n", key, idx);
return;
}
}
for (int i = 0; i < ht->overflowCount; i++) {
if (ht->overflow.keys[i] == key) {
ht->overflow.keys[i] = ht->overflow.keys[ht->overflowCount - 1];
ht->overflow.keys[ht->overflowCount - 1] = -1;
ht->overflowCount--;
printf("Deleted key %d from overflow\n", key);
return;
}
}
printf("Key %d not found\n", key);
}
void displayHashTable(HashTable* ht) {
for (int i = 0; i < TABLE_SIZE; i++) {
printf("Bucket[%d]: ", i);
if (ht->buckets[i].count == 0) {
printf("empty");
} else {
for (int j = 0; j < ht->buckets[i].count; j++) {
printf("%d", ht->buckets[i].keys[j]);
if (j < ht->buckets[i].count - 1) printf(", ");
}
}
printf(" (count=%d)\n", ht->buckets[i].count);
}
printf("Overflow: ");
if (ht->overflowCount == 0) {
printf("empty");
} else {
for (int j = 0; j < ht->overflowCount; j++) {
printf("%d", ht->overflow.keys[j]);
if (j < ht->overflowCount - 1) printf(", ");
}
}
printf(" (count=%d)\n", ht->overflowCount);
}
int main() {
HashTable* ht = createHashTable();
printf("=== Inserting ===\n");
int keys[] = {10, 22, 31, 4, 15, 28, 17, 88};
for (int i = 0; i < 8; i++) {
printf("Insert %d: hash(%d) = %d\n", keys[i], keys[i], keys[i] % 7);
insert(ht, keys[i]);
}
printf("\n=== Hash Table ===\n");
displayHashTable(ht);
printf("\n=== Search ===\n");
int targets[] = {31, 17, 15, 99};
for (int i = 0; i < 4; i++) {
if (search(ht, targets[i])) {
printf("Key %d found\n", targets[i]);
} else {
printf("Key %d not found\n", targets[i]);
}
}
printf("\n=== Delete ===\n");
removeKey(ht, 22);
removeKey(ht, 17);
removeKey(ht, 99);
printf("\n=== After Deletion ===\n");
displayHashTable(ht);
destroyHashTable(ht);
return 0;
}
TABLE_SIZE = 7
BUCKET_SIZE = 2
OVERFLOW_SIZE = 10
class Bucket:
    """A fixed-size bucket that holds multiple keys."""
    def __init__(self, size=BUCKET_SIZE):
        self.keys = [-1] * size
        self.count = 0
class HashTable:
    """Hash table using closed hashing with buckets and overflow."""
    def __init__(self):
        self.buckets = [Bucket() for _ in range(TABLE_SIZE)]
        self.overflow = Bucket(OVERFLOW_SIZE)
        self.overflow_count = 0
    def _hash(self, key):
        return key % TABLE_SIZE
    def insert(self, key):
        idx = self._hash(key)
        b = self.buckets[idx]
        # Check if key already exists in the bucket
        for i in range(b.count):
            if b.keys[i] == key:
                print(f"Key {key} already exists")
                return
        # The key may also be in overflow (e.g. after a deletion freed a slot here)
        for i in range(self.overflow_count):
            if self.overflow.keys[i] == key:
                print(f"Key {key} already exists")
                return
        if b.count < BUCKET_SIZE:
            b.keys[b.count] = key
            b.count += 1
            return
        # Bucket full, insert into overflow
        if self.overflow_count < OVERFLOW_SIZE:
            self.overflow.keys[self.overflow_count] = key
            self.overflow_count += 1
        else:
            print(f"Overflow is full, cannot insert {key}")
    def search(self, key):
        idx = self._hash(key)
        b = self.buckets[idx]
        for i in range(b.count):
            if b.keys[i] == key:
                return True
        for i in range(self.overflow_count):
            if self.overflow.keys[i] == key:
                return True
        return False
    def remove(self, key):
        idx = self._hash(key)
        b = self.buckets[idx]
        for i in range(b.count):
            if b.keys[i] == key:
                b.keys[i] = b.keys[b.count - 1]
                b.keys[b.count - 1] = -1
                b.count -= 1
                print(f"Deleted key {key} from bucket {idx}")
                return
        for i in range(self.overflow_count):
            if self.overflow.keys[i] == key:
                self.overflow.keys[i] = self.overflow.keys[self.overflow_count - 1]
                self.overflow.keys[self.overflow_count - 1] = -1
                self.overflow_count -= 1
                print(f"Deleted key {key} from overflow")
                return
        print(f"Key {key} not found")
    def display(self):
        for i in range(TABLE_SIZE):
            b = self.buckets[i]
            if b.count == 0:
                items = "empty"
            else:
                items = ", ".join(str(b.keys[j]) for j in range(b.count))
            print(f"Bucket[{i}]: {items} (count={b.count})")
        if self.overflow_count == 0:
            items = "empty"
        else:
            items = ", ".join(
                str(self.overflow.keys[j]) for j in range(self.overflow_count)
            )
        print(f"Overflow: {items} (count={self.overflow_count})")
if __name__ == "__main__":
    ht = HashTable()
    print("=== Inserting ===")
    for k in [10, 22, 31, 4, 15, 28, 17, 88]:
        print(f"Insert {k}: hash({k}) = {k % 7}")
        ht.insert(k)
    print("\n=== Hash Table ===")
    ht.display()
    print("\n=== Search ===")
    for t in [31, 17, 15, 99]:
        if ht.search(t):
            print(f"Key {t} found")
        else:
            print(f"Key {t} not found")
    print("\n=== Delete ===")
    ht.remove(22)
    ht.remove(17)
    ht.remove(99)
    print("\n=== After Deletion ===")
    ht.display()
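The Go version defines the bucket and the hash table as structs. Bucket holds a fixed-capacity key array (implemented as a slice) and a counter; HashTable manages the bucket array and the overflow bucket. Deletion overwrites the removed key with the bucket's last element, so no holes appear inside a bucket.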
package main
import "fmt"
const TABLE_SIZE = 7
const BUCKET_SIZE = 2
const OVERFLOW_SIZE = 10
// Bucket is a fixed-size bucket that stores several keys
type Bucket struct {
keys []int // key array; -1 marks an empty slot
count int // number of keys currently stored
}
func newBucket(size int) *Bucket {
b := &Bucket{
keys: make([]int, size),
count: 0,
}
for i := range b.keys {
b.keys[i] = -1
}
return b
}
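// HashTable is a bucketed hash table with a main bucket array and an overflow bucket
type HashTable struct {
buckets []*Bucket // main bucket array
overflow *Bucket // overflow bucket
overflowCount int // number of keys in the overflow area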
}
func newHashTable() *HashTable {
ht := &HashTable{
buckets: make([]*Bucket, TABLE_SIZE),
overflow: newBucket(OVERFLOW_SIZE),
overflowCount: 0,
}
for i := range ht.buckets {
ht.buckets[i] = newBucket(BUCKET_SIZE)
}
return ht
}
func hashFunc(key int) int {
return key % TABLE_SIZE
}
// insert adds a key to the table; when the home bucket is full the key goes to the overflow area
func (ht *HashTable) insert(key int) {
idx := hashFunc(key)
b := ht.buckets[idx]
// Check whether the key is already in the main bucket
for i := 0; i < b.count; i++ {
if b.keys[i] == key {
fmt.Printf("Key %d already exists\n", key)
return
}
}
// The key may also be in overflow (e.g. after a deletion freed a slot here)
for i := 0; i < ht.overflowCount; i++ {
if ht.overflow.keys[i] == key {
fmt.Printf("Key %d already exists\n", key)
return
}
}
// Main bucket not full: insert directly
if b.count < BUCKET_SIZE {
b.keys[b.count] = key
b.count++
return
}
// Main bucket full: insert into the overflow area
if ht.overflowCount < OVERFLOW_SIZE {
ht.overflow.keys[ht.overflowCount] = key
ht.overflowCount++
} else {
fmt.Printf("Overflow is full, cannot insert %d\n", key)
}
}
// search looks up a key, first in its home bucket and then in the overflow area
func (ht *HashTable) search(key int) bool {
idx := hashFunc(key)
b := ht.buckets[idx]
for i := 0; i < b.count; i++ {
if b.keys[i] == key {
return true
}
}
for i := 0; i < ht.overflowCount; i++ {
if ht.overflow.keys[i] == key {
return true
}
}
return false
}
// remove deletes a key, overwriting it with the last element to avoid holes
func (ht *HashTable) remove(key int) {
idx := hashFunc(key)
b := ht.buckets[idx]
// Look in the main bucket
for i := 0; i < b.count; i++ {
if b.keys[i] == key {
b.keys[i] = b.keys[b.count-1]
b.keys[b.count-1] = -1
b.count--
fmt.Printf("Deleted key %d from bucket %d\n", key, idx)
return
}
}
// Look in the overflow area
for i := 0; i < ht.overflowCount; i++ {
if ht.overflow.keys[i] == key {
ht.overflow.keys[i] = ht.overflow.keys[ht.overflowCount-1]
ht.overflow.keys[ht.overflowCount-1] = -1
ht.overflowCount--
fmt.Printf("Deleted key %d from overflow\n", key)
return
}
}
fmt.Printf("Key %d not found\n", key)
}
// display prints the contents of every bucket and the overflow area
func (ht *HashTable) display() {
for i := 0; i < TABLE_SIZE; i++ {
b := ht.buckets[i]
fmt.Printf("Bucket[%d]: ", i)
if b.count == 0 {
fmt.Print("empty")
} else {
for j := 0; j < b.count; j++ {
fmt.Print(b.keys[j])
if j < b.count-1 {
fmt.Print(", ")
}
}
}
fmt.Printf(" (count=%d)\n", b.count)
}
fmt.Print("Overflow: ")
if ht.overflowCount == 0 {
fmt.Print("empty")
} else {
for j := 0; j < ht.overflowCount; j++ {
fmt.Print(ht.overflow.keys[j])
if j < ht.overflowCount-1 {
fmt.Print(", ")
}
}
}
fmt.Printf(" (count=%d)\n", ht.overflowCount)
}
func main() {
ht := newHashTable()
// Insert keys
fmt.Println("=== Inserting ===")
keys := []int{10, 22, 31, 4, 15, 28, 17, 88}
for _, k := range keys {
fmt.Printf("Insert %d: hash(%d) = %d\n", k, k, k%7)
ht.insert(k)
}
// Display the hash table
fmt.Println("\n=== Hash Table ===")
ht.display()
// Search for keys
fmt.Println("\n=== Search ===")
targets := []int{31, 17, 15, 99}
for _, t := range targets {
if ht.search(t) {
fmt.Printf("Key %d found\n", t)
} else {
fmt.Printf("Key %d not found\n", t)
}
}
// Delete keys
fmt.Println("\n=== Delete ===")
ht.remove(22)
ht.remove(17)
ht.remove(99)
// Display after deletion
fmt.Println("\n=== After Deletion ===")
ht.display()
}
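The Go version manages the bucket array as a slice of *Bucket pointers, and the newBucket constructor initializes each bucket's key array (-1 marks an empty slot). Go slices can grow with append, but here each slice is created once with make and then only indexed, so a bucket's capacity stays fixed at the size passed to the constructor. Deletion overwrites the removed position with the bucket's last element, keeping the elements contiguous; this is cleaner than lazy deletion in open addressing.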
Each of the four programs prints the following output:
=== Inserting ===
Insert 10: hash(10) = 3
Insert 22: hash(22) = 1
Insert 31: hash(31) = 3
Insert 4: hash(4) = 4
Insert 15: hash(15) = 1
Insert 28: hash(28) = 0
Insert 17: hash(17) = 3
Insert 88: hash(88) = 4
=== Hash Table ===
Bucket[0]: 28 (count=1)
Bucket[1]: 22, 15 (count=2)
Bucket[2]: empty (count=0)
Bucket[3]: 10, 31 (count=2)
Bucket[4]: 4, 88 (count=2)
Bucket[5]: empty (count=0)
Bucket[6]: empty (count=0)
Overflow: 17 (count=1)
=== Search ===
Key 31 found
Key 17 found
Key 15 found
Key 99 not found
=== Delete ===
Deleted key 22 from bucket 1
Deleted key 17 from overflow
Key 99 not found
=== After Deletion ===
Bucket[0]: 28 (count=1)
Bucket[1]: 15 (count=1)
Bucket[2]: empty (count=0)
Bucket[3]: 10, 31 (count=2)
Bucket[4]: 4, 88 (count=2)
Bucket[5]: empty (count=0)
Bucket[6]: empty (count=0)
Overflow: empty (count=0)
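From the output we can see:
- Insertion: after inserting {10, 22, 31, 4, 15, 28, 17, 88}, 10 and 31 both map to bucket 3 and exactly fill it (BUCKET_SIZE = 2). 22 and 15 fill bucket 1, 4 and 88 fill bucket 4, and 28 is alone in bucket 0. 17 also maps to bucket 3, but bucket 3 is full, so 17 is placed in the overflow area.
- Search: 31 is found directly in bucket 3. 17 is not in bucket 3, so the search continues in the overflow area and finds it there. 15 is found in bucket 1. 99 maps to bucket 1 (99 % 7 = 1) and is found in neither bucket 1 nor the overflow area.
- Deletion: deleting 22 finds it in bucket 1 and overwrites it with bucket 1's last element, 15, so bucket 1 becomes [15] (count=1). Deleting 17 finds and removes it from the overflow area, which becomes empty.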
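Performance Analysis
The tables below summarize the performance characteristics of the bucketed hash table. The average-case bounds assume the hash function spreads keys evenly, so that buckets rarely fill up and the overflow area stays short:
| Operation | Average time complexity | Worst-case time complexity |
|---|---|---|
| Insert | O(1) | O(n) (scanning the overflow area) |
| Search | O(1) | O(n) (scanning the overflow area) |
| Delete | O(1) | O(n) (scanning the overflow area) |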
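Trade-offs of the bucket size:
| Bucket size | Advantages | Disadvantages |
|---|---|---|
| Small (1 to 2) | High memory utilization, little space wasted per bucket | Collisions overflow easily, putting pressure on the overflow area |
| Medium (3 to 4) | Good balance between collision handling and space | Slightly more comparisons per search and delete within a bucket |
| Large (8+) | Collisions very rarely overflow | Empty buckets waste a lot of space |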
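The effect of bucket size on overflow can be explored with a small experiment. The sketch below keeps the total main-area capacity fixed (a hypothetical 1200 slots) while varying the bucket size, inserts 600 distinct pseudo-random keys (load factor 0.5), and counts how many spill into the overflow area. Because total slots are held constant, it shows only the overflow side of the trade-off; the exact counts depend on the seed and key distribution and are illustrative only. The function name spilled is hypothetical.

```python
import random


def spilled(keys, main_slots, bucket_size):
    """Count keys that overflow when main_slots are split into buckets of bucket_size."""
    table_size = main_slots // bucket_size   # fewer, larger buckets as bucket_size grows
    fill = [0] * table_size                  # keys currently held by each bucket
    overflow = 0
    for key in keys:
        i = key % table_size
        if fill[i] < bucket_size:
            fill[i] += 1
        else:
            overflow += 1
    return overflow


rng = random.Random(42)                      # fixed seed: repeatable experiment
keys = rng.sample(range(1_000_000), 600)     # 600 distinct keys for 1200 main slots
for bucket_size in (1, 2, 4, 8):
    print(bucket_size, spilled(keys, 1200, bucket_size))
```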
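Strategies for handling the overflow area:
- Fixed-size overflow bucket: the approach used in this article. Simple and direct, but once the overflow bucket is full no new element can be inserted.
- Growable overflow area: the overflow area is expanded dynamically when it fills up. Flexible, but it adds implementation complexity.
- Multi-level overflow: the overflow area itself is organized as a hash table, forming a two-level hashing structure.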
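A growable overflow area can be sketched by doubling the backing array whenever it fills, instead of rejecting the key. The class GrowableOverflow below is hypothetical; in the C version the analogous step would be a realloc of a heap-allocated keys array, which is not shown here. Doubling keeps appends amortized O(1), but it does not shorten the linear scan that search and delete perform over the overflow area.

```python
class GrowableOverflow:
    """Hypothetical overflow area that doubles its capacity instead of rejecting keys."""

    def __init__(self, capacity=2):
        self.keys = [-1] * capacity  # preallocated slots, -1 = unused
        self.count = 0               # number of slots in use

    def append(self, key):
        if self.count == len(self.keys):
            # Full: double the capacity (at least one slot) rather than fail
            self.keys.extend([-1] * max(1, len(self.keys)))
        self.keys[self.count] = key
        self.count += 1

    def __contains__(self, key):
        # Only the first `count` slots hold live keys
        return key in self.keys[:self.count]


area = GrowableOverflow(capacity=2)
for k in (17, 24, 31, 38, 45):      # all hash to bucket 3 when TABLE_SIZE = 7
    area.append(k)
print(area.count, len(area.keys))   # 5 8
```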
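Comparison with separate chaining and open addressing:
| Property | Separate chaining | Open addressing | Buckets (this article) |
|---|---|---|---|
| Collision handling | Colliding elements stored in linked lists | Probe sequence searches for a free slot | Array inside the bucket + overflow area |
| Capacity per slot | Unlimited (linked list) | 1 element | BUCKET_SIZE elements |
| Extra space | A pointer per node | No extra overhead | Fixed array per bucket |
| Deletion | Unlink the list node directly | Requires lazy deletion | Overwrite directly, no empty-slot problem |
| Cache behavior | Poor (list nodes not contiguous) | Best (contiguous array) | Good (contiguous within a bucket) |
| Load factor limit | None | Must be < 1 | Limited by the overflow area size |
| Worst-case search | O(n) | O(n) | O(n) (scanning the overflow area) |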
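Key points:
- The essence of the bucket method: it is a compromise between separate chaining and open addressing. It handles collisions with fixed-size arrays instead of linked lists, which avoids the cost of dynamic memory allocation, and colliding elements stay together in one bucket instead of being spread along a probe sequence across the array.
- The overflow area is the performance bottleneck: when the hash function distributes keys unevenly or the load factor is too high, many elements spill into the overflow area, and search and deletion degrade to linear scans. The bucket method therefore suits workloads with evenly distributed keys and a moderate load factor.
- Space efficiency: the bucket method preallocates TABLE_SIZE * BUCKET_SIZE + OVERFLOW_SIZE key slots. When the load factor is low, much of the bucket space is wasted; space utilization is about n / (TABLE_SIZE * BUCKET_SIZE + OVERFLOW_SIZE).
- Advantage in deletion: unlike lazy deletion in open addressing, the bucket method overwrites a deleted key directly with the last element, so DELETED markers never accumulate. This gives the bucket method an edge over open addressing in workloads with frequent deletions.
- Suitable scenarios: the bucket method is particularly suitable for embedded or real-time systems where the number of keys is known in advance, the memory layout must be compact, and dynamic memory allocation (such as linked-list nodes) is undesirable. Choosing a suitable bucket size strikes a good balance between space utilization and collision-handling efficiency.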
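The space-utilization formula can be checked against the running example. The sketch also computes a second quantity that follows from the insertion logic: if every key hashes to the same bucket, that bucket can reach only its own BUCKET_SIZE slots plus the OVERFLOW_SIZE overflow slots, so a table holding fewer than BUCKET_SIZE + OVERFLOW_SIZE keys can always accept another distinct key. Both function names are hypothetical.

```python
def utilization(n, table_size=7, bucket_size=2, overflow_size=10):
    """Fraction of preallocated key slots in use: n / (T*B + O)."""
    return n / (table_size * bucket_size + overflow_size)


def guaranteed_inserts(bucket_size=2, overflow_size=10):
    """Distinct-key inserts that always succeed, even if every key hashes to one bucket."""
    return bucket_size + overflow_size


print(utilization(8))          # running example: 8 / (7*2 + 10) = 8 / 24, about 0.33
print(guaranteed_inserts())    # 2 + 10 = 12
```

With the example parameters, 24 slots are preallocated, yet an adversarial key set can make insertion fail after only 12 keys.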
