3-8 Closed Hash Tables Using Buckets

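A closed hash table with buckets is one strategy for handling hash collisions. Open addressing probes the array slot by slot for a free position. A bucketed hash table works differently: each slot is a fixed-size "bucket" that can hold several elements. Once a bucket is full, further colliding elements go into a shared overflow area. The scheme is still a closed hash table, because every element lives in a preallocated array rather than in linked lists. The buckets, in turn, absorb collisions several keys at a time.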

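Core concepts

  • Bucket: each slot of the hash table is a fixed-size array that can store several keys. A bucket also has a count field recording how many keys it currently holds.
  • Bucket size: the maximum number of keys a bucket can hold, written BUCKET_SIZE. It is usually small (for example 2 or 3).
  • Overflow area: when a bucket is full (count == BUCKET_SIZE), a new key that maps to it cannot fit there and is stored in the overflow area instead. The overflow area is organized like a bucket (a key array plus a count), but it has its own capacity, OVERFLOW_SIZE.
  • Hash function: maps a key to a bucket index, for example hash(key) = key % TABLE_SIZE. The code in this article assumes non-negative integer keys, for two reasons:
    - -1 is used as the empty-slot marker.
    - In C, C++ and Go, `%` returns a negative result for a negative key, which would produce an invalid bucket index.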

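The following trace shows the insertion process for a bucket hash table with TABLE_SIZE = 7 and BUCKET_SIZE = 2:

Insert 10:  hash(10) = 10 % 7 = 3  -> bucket[3] empty, store in bucket[3][0]
Insert 22:  hash(22) = 22 % 7 = 1  -> bucket[1] empty, store in bucket[1][0]
Insert 31:  hash(31) = 31 % 7 = 3  -> bucket[3] not full, store in bucket[3][1]
Insert  4:  hash(4)  = 4 % 7  = 4  -> bucket[4] empty, store in bucket[4][0]
Insert 15:  hash(15) = 15 % 7 = 1  -> bucket[1] not full, store in bucket[1][1]
Insert 28:  hash(28) = 28 % 7 = 0  -> bucket[0] empty, store in bucket[0][0]
Insert 17:  hash(17) = 17 % 7 = 3  -> bucket[3] full, store in overflow[0]
Insert 88:  hash(88) = 88 % 7 = 4  -> bucket[4] not full, store in bucket[4][1]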

Bucket[0]: [28,  -]   count=1
Bucket[1]: [22, 15]   count=2
Bucket[2]: [ -,  -]   count=0
Bucket[3]: [10, 31]   count=2
Bucket[4]: [ 4, 88]   count=2
Bucket[5]: [ -,  -]   count=0
Bucket[6]: [ -,  -]   count=0
Overflow:  [17,  -]   count=1

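In this example, 10 and 31 both map to bucket 3 and exactly fill it (BUCKET_SIZE = 2). When 17 also maps to bucket 3, the bucket is already full, so 17 can only go into the overflow area. 22 and 15 fill bucket 1, and 4 and 88 fill bucket 4.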
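The trace can be replayed with a minimal Python sketch, assuming the same TABLE_SIZE = 7 and BUCKET_SIZE = 2. To stay short, it simplifies several things:
- It uses plain lists that hold only occupied slots.
- It skips the duplicate check.
- It treats the overflow area as unbounded.

`place_keys` is a hypothetical helper, not part of the article's implementation.

```python
# Sketch: replay the insertion trace with plain Python lists.
# Assumptions: TABLE_SIZE = 7, BUCKET_SIZE = 2, distinct keys, and an
# unbounded overflow list (the OVERFLOW_SIZE limit is ignored here).
TABLE_SIZE = 7
BUCKET_SIZE = 2


def place_keys(keys):
    """Return (buckets, overflow) after inserting keys in order."""
    buckets = [[] for _ in range(TABLE_SIZE)]
    overflow = []
    for key in keys:
        idx = key % TABLE_SIZE          # hash(key) = key % TABLE_SIZE
        if len(buckets[idx]) < BUCKET_SIZE:
            buckets[idx].append(key)    # room left in the home bucket
        else:
            overflow.append(key)        # home bucket full: spill over
    return buckets, overflow


buckets, overflow = place_keys([10, 22, 31, 4, 15, 28, 17, 88])
for i, b in enumerate(buckets):
    print(f"Bucket[{i}]: {b}")
print(f"Overflow:  {overflow}")         # Overflow:  [17]
```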


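Data structure definitions

The core data structure of a bucket hash table has two parts:

  1. Bucket: a fixed-size key array plus a counter recording how many keys the bucket currently holds.
  2. Hash table (HashTable): an array of buckets plus one extra overflow area.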
// C++ Bucket and HashTable definitions
const int TABLE_SIZE = 7;    // Number of buckets
const int BUCKET_SIZE = 2;   // Slots per bucket
const int OVERFLOW_SIZE = 10; // Overflow bucket capacity

struct Bucket {
    int keys[BUCKET_SIZE];   // Array storing keys in this bucket
    int count;               // Number of keys currently stored

    Bucket() : count(0) {
        for (int i = 0; i < BUCKET_SIZE; i++)
            keys[i] = -1;    // -1 indicates empty slot
    }
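};

// The overflow area needs OVERFLOW_SIZE slots. A plain Bucket only has
// BUCKET_SIZE slots, so overflow.keys[overflowCount] would run past its end.
struct OverflowArea {
    int keys[OVERFLOW_SIZE];  // Keys that did not fit in their home bucket

    OverflowArea() {
        for (int i = 0; i < OVERFLOW_SIZE; i++)
            keys[i] = -1;     // -1 indicates empty slot
    }
};

class HashTable {
private:
    Bucket buckets[TABLE_SIZE];  // Main bucket array
    OverflowArea overflow;       // Overflow area (OVERFLOW_SIZE slots)
    int overflowCount;           // Items in overflow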

    int hashFunction(int key) {
        return key % TABLE_SIZE;
    }

public:
    HashTable() : overflowCount(0) {}
    void insert(int key);
    bool search(int key);
    void remove(int key);
    void display();
};
// C Bucket and HashTable definitions
#define TABLE_SIZE 7
#define BUCKET_SIZE 2
#define OVERFLOW_SIZE 10

typedef struct {
    int keys[BUCKET_SIZE];   // Array storing keys
    int count;               // Number of keys stored
} Bucket;

typedef struct {
    Bucket buckets[TABLE_SIZE];  // Main bucket array
    Bucket overflow;             // Overflow bucket
    int overflowCount;           // Items in overflow
} HashTable;
# Python Bucket and HashTable definitions
TABLE_SIZE = 7
BUCKET_SIZE = 2
OVERFLOW_SIZE = 10

class Bucket:
    """A fixed-size bucket that holds multiple keys."""
    def __init__(self, size=BUCKET_SIZE):
        self.keys = [-1] * size  # -1 indicates empty slot
        self.count = 0

class HashTable:
    """Hash table using closed hashing with buckets and overflow."""
    def __init__(self):
        self.buckets = [Bucket() for _ in range(TABLE_SIZE)]
        self.overflow = Bucket(OVERFLOW_SIZE)
        self.overflow_count = 0
// Go Bucket and HashTable definitions
package main

const TABLE_SIZE = 7
const BUCKET_SIZE = 2
const OVERFLOW_SIZE = 10

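// Bucket is a fixed-size bucket that stores several keys
type Bucket struct {
    keys  []int // Key array; -1 marks an empty slot
    count int   // Number of keys currently stored
}

// HashTable is a bucket hash table with a main bucket array and an overflow bucket
type HashTable struct {
    buckets       []*Bucket // Main bucket array
    overflow      *Bucket   // Overflow bucket
    overflowCount int       // Number of keys in the overflow area
}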
}

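Each language defines the structures as follows:
- C++ defines a bucket with struct Bucket, which holds an int keys[BUCKET_SIZE] array and an int count counter. class HashTable holds the bucket array buckets[TABLE_SIZE] and the overflow area overflow.
- C defines the equivalent structures with typedef struct.
- Python wraps the key list and counter in a Bucket class; a HashTable class manages the list of buckets and the overflow bucket.
- Go uses structs: Bucket contains an []int slice and a count counter, and HashTable manages a slice of bucket pointers plus a pointer to the overflow bucket.

In C and C++ the overflow area cannot simply be another Bucket. A Bucket's array has only BUCKET_SIZE slots, while up to OVERFLOW_SIZE keys can overflow, so the overflow area needs its own array of OVERFLOW_SIZE slots. Python and Go avoid the problem by passing OVERFLOW_SIZE when they construct the overflow bucket.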
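One way to make these definitions precise is to write down the invariants they are meant to maintain. The Python sketch below repeats the Python Bucket/HashTable shapes and adds `check_invariants`, a hypothetical helper that is not part of the article. It reports three kinds of problem:
- a stored key that does not hash to its bucket;
- a non-empty slot past count;
- a key stored twice.

Like the rest of the article, it assumes non-negative integer keys.

```python
TABLE_SIZE = 7
BUCKET_SIZE = 2
OVERFLOW_SIZE = 10
EMPTY = -1  # the article's marker for an unused slot


class Bucket:
    def __init__(self, size=BUCKET_SIZE):
        self.keys = [EMPTY] * size
        self.count = 0


class HashTable:
    def __init__(self):
        self.buckets = [Bucket() for _ in range(TABLE_SIZE)]
        self.overflow = Bucket(OVERFLOW_SIZE)
        self.overflow_count = 0


def check_invariants(ht):
    """Hypothetical checker: return violated invariants ([] means consistent)."""
    problems = []
    seen = set()
    for i, b in enumerate(ht.buckets):
        if not 0 <= b.count <= BUCKET_SIZE:
            problems.append(f"bucket {i}: bad count {b.count}")
            continue
        for k in b.keys[:b.count]:
            # Every live key must hash to the bucket that holds it
            if k == EMPTY or k % TABLE_SIZE != i:
                problems.append(f"bucket {i}: key {k} misplaced")
            if k in seen:
                problems.append(f"duplicate key {k}")
            seen.add(k)
        # Live keys are contiguous, so every slot past count must be empty
        if any(k != EMPTY for k in b.keys[b.count:]):
            problems.append(f"bucket {i}: non-empty slot past count")
    if not 0 <= ht.overflow_count <= len(ht.overflow.keys):
        problems.append(f"bad overflow count {ht.overflow_count}")
    else:
        for k in ht.overflow.keys[:ht.overflow_count]:
            if k in seen:
                problems.append(f"duplicate key {k}")
            seen.add(k)
    return problems


ht = HashTable()
ht.buckets[3].keys[:2] = [10, 31]   # both keys hash to bucket 3
ht.buckets[3].count = 2
print(check_invariants(ht))          # []
```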


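Insertion

Insertion proceeds as follows:

  1. Compute the target bucket index with the hash function: bucketIndex = key % TABLE_SIZE.
  2. Check whether the key is already stored, both in the target bucket and in the overflow area. If it is, reject it as a duplicate.
  3. If the bucket is not full (count < BUCKET_SIZE), append the key to the bucket.
  4. If the bucket is full, put the key into the overflow area, or report failure if the overflow area is also full.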
// C++ Insert: add a key to the bucket hash table
void HashTable::insert(int key) {
    int idx = hashFunction(key);
    Bucket& b = buckets[idx];

    // Check if key already exists in the bucket
    for (int i = 0; i < b.count; i++) {
        if (b.keys[i] == key) {
            cout << "Key " << key << " already exists" << endl;
            return;
        }
    }

    // Also check the overflow area: the key may be there if it was inserted
    // while this bucket was full and a slot has since been freed
    for (int i = 0; i < overflowCount; i++) {
        if (overflow.keys[i] == key) {
            cout << "Key " << key << " already exists" << endl;
            return;
        }
    }

    // Try to insert into the main bucket
    if (b.count < BUCKET_SIZE) {
        b.keys[b.count] = key;
        b.count++;
        return;
    }

    // Bucket is full, insert into overflow
    if (overflowCount < OVERFLOW_SIZE) {
        overflow.keys[overflowCount] = key;
        overflowCount++;
    } else {
        cout << "Overflow is full, cannot insert " << key << endl;
    }
}
// C Insert: add a key to the bucket hash table
void insert(HashTable* ht, int key) {
    int idx = key % TABLE_SIZE;
    Bucket* b = &ht->buckets[idx];

    // Check if key already exists in the bucket
    for (int i = 0; i < b->count; i++) {
        if (b->keys[i] == key) {
            printf("Key %d already exists\n", key);
            return;
        }
    }

    // Also check the overflow area: the key may be there if it was inserted
    // while this bucket was full and a slot has since been freed
    for (int i = 0; i < ht->overflowCount; i++) {
        if (ht->overflow.keys[i] == key) {
            printf("Key %d already exists\n", key);
            return;
        }
    }

    // Try to insert into the main bucket
    if (b->count < BUCKET_SIZE) {
        b->keys[b->count] = key;
        b->count++;
        return;
    }

    // Bucket is full, insert into overflow
    if (ht->overflowCount < OVERFLOW_SIZE) {
        ht->overflow.keys[ht->overflowCount] = key;
        ht->overflowCount++;
    } else {
        printf("Overflow is full, cannot insert %d\n", key);
    }
}
# Python Insert: add a key to the bucket hash table
def insert(self, key):
    idx = key % TABLE_SIZE
    b = self.buckets[idx]

    # Check if key already exists in the bucket
    for i in range(b.count):
        if b.keys[i] == key:
            print(f"Key {key} already exists")
            return

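    # Also check the overflow area: the key may be there if it was inserted
    # while this bucket was full and a slot has since been freed
    for i in range(self.overflow_count):
        if self.overflow.keys[i] == key:
            print(f"Key {key} already exists")
            return

    # Try to insert into the main bucket
    if b.count < BUCKET_SIZE:
        b.keys[b.count] = key
        b.count += 1
        return

    # Bucket is full, insert into overflow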
    if self.overflow_count < OVERFLOW_SIZE:
        self.overflow.keys[self.overflow_count] = key
        self.overflow_count += 1
    else:
        print(f"Overflow is full, cannot insert {key}")
// Go Insert: add a key to the bucket hash table
func (ht *HashTable) insert(key int) {
    idx := key % TABLE_SIZE
    b := ht.buckets[idx]

    // Check whether the key already exists in the main bucket
    for i := 0; i < b.count; i++ {
        if b.keys[i] == key {
            fmt.Printf("Key %d already exists\n", key)
            return
        }
    }

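    // Also check the overflow area: the key may be there if it was inserted
    // while this bucket was full and a slot has since been freed
    for i := 0; i < ht.overflowCount; i++ {
        if ht.overflow.keys[i] == key {
            fmt.Printf("Key %d already exists\n", key)
            return
        }
    }

    // Main bucket not full: insert directly
    if b.count < BUCKET_SIZE {
        b.keys[b.count] = key
        b.count++
        return
    }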

    // Main bucket full: insert into the overflow area
    if ht.overflowCount < OVERFLOW_SIZE {
        ht.overflow.keys[ht.overflowCount] = key
        ht.overflowCount++
    } else {
        fmt.Printf("Overflow is full, cannot insert %d\n", key)
    }
}

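Insertion first checks whether the key is already present, in its bucket and in the overflow area, so that no key is stored twice. Checking only the bucket is not enough. A key that went to the overflow area while its bucket was full could otherwise be inserted a second time into the bucket, once a deletion frees a slot there. If the bucket has room, the key is written into keys[count] and count is incremented; otherwise the key goes into the overflow area. The overflow area has a fixed capacity too, and insertion fails once it is full.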
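A compact, list-based Python sketch shows why insertion must look in the overflow area before filling a free bucket slot. `TinyTable` is a hypothetical class, not the article's HashTable; it ignores the overflow capacity and uses lists that store only occupied slots. The scenario runs in four steps:
1. Keys 10, 31 and 17 all hash to bucket 3, so 17 spills into the overflow area.
2. Deleting 10 frees a slot in bucket 3.
3. A second insert of 17 must still be rejected.
4. The stored copy count for 17 stays at 1.

```python
# Sketch (hypothetical TinyTable, not the article's classes): insert must
# check the overflow area for duplicates before using a free bucket slot.
TABLE_SIZE = 7
BUCKET_SIZE = 2


class TinyTable:
    def __init__(self):
        self.buckets = [[] for _ in range(TABLE_SIZE)]
        self.overflow = []

    def insert(self, key):
        """Return True if inserted, False if the key was already present."""
        b = self.buckets[key % TABLE_SIZE]
        # The duplicate check covers BOTH places a key can live
        if key in b or key in self.overflow:
            return False
        if len(b) < BUCKET_SIZE:
            b.append(key)
        else:
            self.overflow.append(key)   # overflow capacity ignored here
        return True

    def remove(self, key):
        """Delete key by moving the last key of its area into the hole."""
        b = self.buckets[key % TABLE_SIZE]
        for area in (b, self.overflow):
            if key in area:
                i = area.index(key)
                area[i] = area[-1]   # last key fills the hole
                area.pop()           # drop the now-duplicated tail
                return True
        return False

    def count(self, key):
        """How many copies of key are stored (should never exceed 1)."""
        b = self.buckets[key % TABLE_SIZE]
        return b.count(key) + self.overflow.count(key)


t = TinyTable()
for k in (10, 31, 17):   # all hash to bucket 3; 17 spills to overflow
    t.insert(k)
t.remove(10)             # bucket 3 now has a free slot
print(t.insert(17))      # False: 17 is still in the overflow area
print(t.count(17))       # 1
```

Had `insert` checked only the bucket, the second call would have stored 17 in bucket 3 as well, and a later delete of 17 would leave a stale copy behind in the overflow area.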


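Search

Search proceeds as follows:

  1. Compute the target bucket index bucketIndex = key % TABLE_SIZE.
  2. Scan every key stored in that bucket for a match.
  3. If the key is not in the bucket, continue searching the overflow area.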
// C++ Search: find a key in the bucket hash table
bool HashTable::search(int key) {
    int idx = hashFunction(key);
    Bucket& b = buckets[idx];

    // Search in the main bucket
    for (int i = 0; i < b.count; i++) {
        if (b.keys[i] == key) {
            return true;
        }
    }

    // Search in overflow
    for (int i = 0; i < overflowCount; i++) {
        if (overflow.keys[i] == key) {
            return true;
        }
    }

    return false;
}
// C Search: find a key in the bucket hash table
int search(HashTable* ht, int key) {
    int idx = key % TABLE_SIZE;
    Bucket* b = &ht->buckets[idx];

    // Search in the main bucket
    for (int i = 0; i < b->count; i++) {
        if (b->keys[i] == key) {
            return 1;
        }
    }

    // Search in overflow
    for (int i = 0; i < ht->overflowCount; i++) {
        if (ht->overflow.keys[i] == key) {
            return 1;
        }
    }

    return 0;
}
# Python Search: find a key in the bucket hash table
def search(self, key):
    idx = key % TABLE_SIZE
    b = self.buckets[idx]

    # Search in the main bucket
    for i in range(b.count):
        if b.keys[i] == key:
            return True

    # Search in overflow
    for i in range(self.overflow_count):
        if self.overflow.keys[i] == key:
            return True

    return False
// Go Search: find a key in the bucket hash table
func (ht *HashTable) search(key int) bool {
    idx := key % TABLE_SIZE
    b := ht.buckets[idx]

    // Search the main bucket
    for i := 0; i < b.count; i++ {
        if b.keys[i] == key {
            return true
        }
    }

    // Search the overflow area
    for i := 0; i < ht.overflowCount; i++ {
        if ht.overflow.keys[i] == key {
            return true
        }
    }

    return false
}

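Search first compares the key with each entry of its home bucket. Because buckets are small (typically 2 to 3 slots), this step takes constant time. If the key is not found there, the overflow area must be scanned. In the worst case the whole overflow area is examined, so the number of keys in the overflow area directly determines the cost of unsuccessful searches, and of successful searches for keys that overflowed.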
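To make that cost concrete, `search_cost` is a hypothetical Python helper that counts key comparisons for the two-phase search. It runs on the table state from the worked example (lists hold only occupied slots).

```python
# Sketch: count key comparisons made by the bucket-then-overflow search.
TABLE_SIZE = 7


def search_cost(buckets, overflow, key):
    """Return (found, comparisons) for the two-phase search."""
    comparisons = 0
    for k in buckets[key % TABLE_SIZE]:   # phase 1: home bucket only
        comparisons += 1
        if k == key:
            return True, comparisons
    for k in overflow:                     # phase 2: linear scan of overflow
        comparisons += 1
        if k == key:
            return True, comparisons
    return False, comparisons


# Table state after the eight inserts of the worked example
buckets = [[28], [22, 15], [], [10, 31], [4, 88], [], []]
overflow = [17]
for key in (31, 17, 15, 99):
    print(key, search_cost(buckets, overflow, key))
# 31 (True, 2)
# 17 (True, 3)
# 15 (True, 2)
# 99 (False, 3)
```

Both the overflowed key 17 and the missing key 99 pay for the full overflow scan. With only one overflowed key that is cheap, but the scan grows with every key that spills over.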


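Deletion

Deletion proceeds as follows:

  1. First look for the key in the target bucket.
  2. If it is found, overwrite its position with the bucket's last element and decrement the count. This "fill from the end" approach never leaves a hole in the middle of the bucket.
  3. If the key is not in the bucket, look for it in the overflow area and delete it there, again filling from the end.
  4. If the key is not in the overflow area either, report that it does not exist.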
// C++ Remove: delete a key from the bucket hash table
void HashTable::remove(int key) {
    int idx = hashFunction(key);
    Bucket& b = buckets[idx];

    // Search in the main bucket
    for (int i = 0; i < b.count; i++) {
        if (b.keys[i] == key) {
            // Replace with last element to avoid gaps
            b.keys[i] = b.keys[b.count - 1];
            b.keys[b.count - 1] = -1;
            b.count--;
            cout << "Deleted key " << key << " from bucket " << idx << endl;
            return;
        }
    }

    // Search in overflow
    for (int i = 0; i < overflowCount; i++) {
        if (overflow.keys[i] == key) {
            overflow.keys[i] = overflow.keys[overflowCount - 1];
            overflow.keys[overflowCount - 1] = -1;
            overflowCount--;
            cout << "Deleted key " << key << " from overflow" << endl;
            return;
        }
    }

    cout << "Key " << key << " not found" << endl;
}
// C Remove: delete a key from the bucket hash table
void removeKey(HashTable* ht, int key) {
    int idx = key % TABLE_SIZE;
    Bucket* b = &ht->buckets[idx];

    // Search in the main bucket
    for (int i = 0; i < b->count; i++) {
        if (b->keys[i] == key) {
            b->keys[i] = b->keys[b->count - 1];
            b->keys[b->count - 1] = -1;
            b->count--;
            printf("Deleted key %d from bucket %d\n", key, idx);
            return;
        }
    }

    // Search in overflow
    for (int i = 0; i < ht->overflowCount; i++) {
        if (ht->overflow.keys[i] == key) {
            ht->overflow.keys[i] = ht->overflow.keys[ht->overflowCount - 1];
            ht->overflow.keys[ht->overflowCount - 1] = -1;
            ht->overflowCount--;
            printf("Deleted key %d from overflow\n", key);
            return;
        }
    }

    printf("Key %d not found\n", key);
}
# Python Remove: delete a key from the bucket hash table
def remove(self, key):
    idx = key % TABLE_SIZE
    b = self.buckets[idx]

    # Search in the main bucket
    for i in range(b.count):
        if b.keys[i] == key:
            # Replace with last element to avoid gaps
            b.keys[i] = b.keys[b.count - 1]
            b.keys[b.count - 1] = -1
            b.count -= 1
            print(f"Deleted key {key} from bucket {idx}")
            return

    # Search in overflow
    for i in range(self.overflow_count):
        if self.overflow.keys[i] == key:
            self.overflow.keys[i] = self.overflow.keys[self.overflow_count - 1]
            self.overflow.keys[self.overflow_count - 1] = -1
            self.overflow_count -= 1
            print(f"Deleted key {key} from overflow")
            return

    print(f"Key {key} not found")
// Go Remove: delete a key from the bucket hash table
func (ht *HashTable) remove(key int) {
    idx := key % TABLE_SIZE
    b := ht.buckets[idx]

    // Search the main bucket
    for i := 0; i < b.count; i++ {
        if b.keys[i] == key {
            // Overwrite with the last element to avoid a hole
            b.keys[i] = b.keys[b.count-1]
            b.keys[b.count-1] = -1
            b.count--
            fmt.Printf("Deleted key %d from bucket %d\n", key, idx)
            return
        }
    }

    // Search the overflow area
    for i := 0; i < ht.overflowCount; i++ {
        if ht.overflow.keys[i] == key {
            ht.overflow.keys[i] = ht.overflow.keys[ht.overflowCount-1]
            ht.overflow.keys[ht.overflowCount-1] = -1
            ht.overflowCount--
            fmt.Printf("Deleted key %d from overflow\n", key)
            return
        }
    }

    fmt.Printf("Key %d not found\n", key)
}

删除操作采用"末尾元素覆盖"策略:将被删除位置用桶中最后一个元素填充,然后减少计数。这种方式保证桶中的元素始终连续存储,不会产生中间空洞,搜索时无需跳过空位。这比开放地址法的懒删除(Lazy Deletion)更干净,因为桶内的元素顺序无关紧要——只要它们都在同一个桶里即可。


Displaying the hash table

The display operation walks over every bucket and the overflow area and prints the keys stored in each.

// C++ Display: print the entire hash table
void HashTable::display() {
    for (int i = 0; i < TABLE_SIZE; i++) {
        cout << "Bucket[" << i << "]: ";
        if (buckets[i].count == 0) {
            cout << "empty";
        } else {
            for (int j = 0; j < buckets[i].count; j++) {
                cout << buckets[i].keys[j];
                if (j < buckets[i].count - 1) cout << ", ";
            }
        }
        cout << "   (count=" << buckets[i].count << ")" << endl;
    }

    cout << "Overflow:   ";
    if (overflowCount == 0) {
        cout << "empty";
    } else {
        for (int j = 0; j < overflowCount; j++) {
            cout << overflow.keys[j];
            if (j < overflowCount - 1) cout << ", ";
        }
    }
    cout << "   (count=" << overflowCount << ")" << endl;
}
// C Display: print the entire hash table
void display(HashTable* ht) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        printf("Bucket[%d]: ", i);
        if (ht->buckets[i].count == 0) {
            printf("empty");
        } else {
            for (int j = 0; j < ht->buckets[i].count; j++) {
                printf("%d", ht->buckets[i].keys[j]);
                if (j < ht->buckets[i].count - 1) printf(", ");
            }
        }
        printf("   (count=%d)\n", ht->buckets[i].count);
    }

    printf("Overflow:   ");
    if (ht->overflowCount == 0) {
        printf("empty");
    } else {
        for (int j = 0; j < ht->overflowCount; j++) {
            printf("%d", ht->overflow.keys[j]);
            if (j < ht->overflowCount - 1) printf(", ");
        }
    }
    printf("   (count=%d)\n", ht->overflowCount);
}
# Python Display: print the entire hash table
def display(self):
    for i in range(TABLE_SIZE):
        b = self.buckets[i]
        if b.count == 0:
            items = "empty"
        else:
            items = ", ".join(str(b.keys[j]) for j in range(b.count))
        print(f"Bucket[{i}]: {items}   (count={b.count})")

    if self.overflow_count == 0:
        items = "empty"
    else:
        items = ", ".join(
            str(self.overflow.keys[j]) for j in range(self.overflow_count)
        )
    print(f"Overflow:   {items}   (count={self.overflow_count})")
// Go Display: print the entire hash table
func (ht *HashTable) display() {
    for i := 0; i < TABLE_SIZE; i++ {
        b := ht.buckets[i]
        fmt.Printf("Bucket[%d]: ", i)
        if b.count == 0 {
            fmt.Print("empty")
        } else {
            for j := 0; j < b.count; j++ {
                fmt.Print(b.keys[j])
                if j < b.count-1 {
                    fmt.Print(", ")
                }
            }
        }
        fmt.Printf("   (count=%d)\n", b.count)
    }

    fmt.Print("Overflow:   ")
    if ht.overflowCount == 0 {
        fmt.Print("empty")
    } else {
        for j := 0; j < ht.overflowCount; j++ {
            fmt.Print(ht.overflow.keys[j])
            if j < ht.overflowCount-1 {
                fmt.Print(", ")
            }
        }
    }
    fmt.Printf("   (count=%d)\n", ht.overflowCount)
}

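Display walks through the buckets in order, printing every valid key and the count of each bucket, and finally prints the overflow area. Empty buckets are shown as "empty". This format makes it easy to see how keys are distributed across the buckets and how much of the overflow area is in use.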


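Complete implementation

Below are complete bucket hash table implementations. Each combines insertion, search, deletion and display into a standalone program. It inserts 10, 22, 31, 4, 15, 28, 17, 88 in order, then runs searches and deletions and displays the table again.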

#include <iostream>
using namespace std;

const int TABLE_SIZE = 7;
const int BUCKET_SIZE = 2;
const int OVERFLOW_SIZE = 10;

struct Bucket {
    int keys[BUCKET_SIZE];
    int count;

    Bucket() : count(0) {
        for (int i = 0; i < BUCKET_SIZE; i++)
            keys[i] = -1;
    }
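};

// Overflow area: needs OVERFLOW_SIZE slots, more than a Bucket provides
struct OverflowArea {
    int keys[OVERFLOW_SIZE];

    OverflowArea() {
        for (int i = 0; i < OVERFLOW_SIZE; i++)
            keys[i] = -1;
    }
};

class HashTable {
private:
    Bucket buckets[TABLE_SIZE];
    OverflowArea overflow;
    int overflowCount;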

    int hashFunction(int key) {
        return key % TABLE_SIZE;
    }

public:
    HashTable() : overflow(), overflowCount(0) {}

    void insert(int key) {
        int idx = hashFunction(key);
        Bucket& b = buckets[idx];

        // Check if key already exists in the bucket
        for (int i = 0; i < b.count; i++) {
            if (b.keys[i] == key) {
                cout << "Key " << key << " already exists" << endl;
                return;
            }
        }

        // Also check the overflow area for the key
        for (int i = 0; i < overflowCount; i++) {
            if (overflow.keys[i] == key) {
                cout << "Key " << key << " already exists" << endl;
                return;
            }
        }

        if (b.count < BUCKET_SIZE) {
            b.keys[b.count] = key;
            b.count++;
            return;
        }

        // Bucket full, insert into overflow
        if (overflowCount < OVERFLOW_SIZE) {
            overflow.keys[overflowCount] = key;
            overflowCount++;
        } else {
            cout << "Overflow is full, cannot insert " << key << endl;
        }
    }

    bool search(int key) {
        int idx = hashFunction(key);
        Bucket& b = buckets[idx];

        for (int i = 0; i < b.count; i++) {
            if (b.keys[i] == key) return true;
        }
        for (int i = 0; i < overflowCount; i++) {
            if (overflow.keys[i] == key) return true;
        }
        return false;
    }

    void remove(int key) {
        int idx = hashFunction(key);
        Bucket& b = buckets[idx];

        for (int i = 0; i < b.count; i++) {
            if (b.keys[i] == key) {
                b.keys[i] = b.keys[b.count - 1];
                b.keys[b.count - 1] = -1;
                b.count--;
                cout << "Deleted key " << key << " from bucket " << idx << endl;
                return;
            }
        }

        for (int i = 0; i < overflowCount; i++) {
            if (overflow.keys[i] == key) {
                overflow.keys[i] = overflow.keys[overflowCount - 1];
                overflow.keys[overflowCount - 1] = -1;
                overflowCount--;
                cout << "Deleted key " << key << " from overflow" << endl;
                return;
            }
        }

        cout << "Key " << key << " not found" << endl;
    }

    void display() {
        for (int i = 0; i < TABLE_SIZE; i++) {
            cout << "Bucket[" << i << "]: ";
            if (buckets[i].count == 0) {
                cout << "empty";
            } else {
                for (int j = 0; j < buckets[i].count; j++) {
                    cout << buckets[i].keys[j];
                    if (j < buckets[i].count - 1) cout << ", ";
                }
            }
            cout << "   (count=" << buckets[i].count << ")" << endl;
        }
        cout << "Overflow:   ";
        if (overflowCount == 0) {
            cout << "empty";
        } else {
            for (int j = 0; j < overflowCount; j++) {
                cout << overflow.keys[j];
                if (j < overflowCount - 1) cout << ", ";
            }
        }
        cout << "   (count=" << overflowCount << ")" << endl;
    }
};

int main() {
    HashTable ht;

    // Insert keys
    cout << "=== Inserting ===" << endl;
    int keys[] = {10, 22, 31, 4, 15, 28, 17, 88};
    for (int k : keys) {
        cout << "Insert " << k << ": hash(" << k << ") = " << k % 7 << endl;
        ht.insert(k);
    }

    // Display
    cout << "\n=== Hash Table ===" << endl;
    ht.display();

    // Search
    cout << "\n=== Search ===" << endl;
    int targets[] = {31, 17, 15, 99};
    for (int t : targets) {
        if (ht.search(t)) {
            cout << "Key " << t << " found" << endl;
        } else {
            cout << "Key " << t << " not found" << endl;
        }
    }

    // Delete
    cout << "\n=== Delete ===" << endl;
    ht.remove(22);
    ht.remove(17);
    ht.remove(99);

    // Display after deletion
    cout << "\n=== After Deletion ===" << endl;
    ht.display();

    return 0;
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 7
#define BUCKET_SIZE 2
#define OVERFLOW_SIZE 10

typedef struct {
    int keys[BUCKET_SIZE];
    int count;
} Bucket;

typedef struct {
    int keys[OVERFLOW_SIZE];   // Overflow needs its own, larger capacity
} OverflowArea;

typedef struct {
    Bucket buckets[TABLE_SIZE];
    OverflowArea overflow;
    int overflowCount;
} HashTable;

void initBucket(Bucket* b) {
    b->count = 0;
    for (int i = 0; i < BUCKET_SIZE; i++)
        b->keys[i] = -1;
}

HashTable* createHashTable(void) {
    HashTable* ht = (HashTable*)malloc(sizeof(HashTable));
    if (!ht) exit(1);
    for (int i = 0; i < TABLE_SIZE; i++)
        initBucket(&ht->buckets[i]);
    // The overflow array is embedded in the struct, so no extra malloc is needed
    for (int i = 0; i < OVERFLOW_SIZE; i++)
        ht->overflow.keys[i] = -1;
    ht->overflowCount = 0;
    return ht;
}

void destroyHashTable(HashTable* ht) {
    free(ht);
}

void insert(HashTable* ht, int key) {
    int idx = key % TABLE_SIZE;
    Bucket* b = &ht->buckets[idx];

    for (int i = 0; i < b->count; i++) {
        if (b->keys[i] == key) {
            printf("Key %d already exists\n", key);
            return;
        }
    }

    // Also check the overflow area for the key
    for (int i = 0; i < ht->overflowCount; i++) {
        if (ht->overflow.keys[i] == key) {
            printf("Key %d already exists\n", key);
            return;
        }
    }

    if (b->count < BUCKET_SIZE) {
        b->keys[b->count] = key;
        b->count++;
        return;
    }

    // Bucket full, insert into overflow
    if (ht->overflowCount < OVERFLOW_SIZE) {
        ht->overflow.keys[ht->overflowCount] = key;
        ht->overflowCount++;
    } else {
        printf("Overflow is full, cannot insert %d\n", key);
    }
}

int search(HashTable* ht, int key) {
    int idx = key % TABLE_SIZE;
    Bucket* b = &ht->buckets[idx];

    for (int i = 0; i < b->count; i++) {
        if (b->keys[i] == key) return 1;
    }
    for (int i = 0; i < ht->overflowCount; i++) {
        if (ht->overflow.keys[i] == key) return 1;
    }
    return 0;
}

void removeKey(HashTable* ht, int key) {
    int idx = key % TABLE_SIZE;
    Bucket* b = &ht->buckets[idx];

    for (int i = 0; i < b->count; i++) {
        if (b->keys[i] == key) {
            b->keys[i] = b->keys[b->count - 1];
            b->keys[b->count - 1] = -1;
            b->count--;
            printf("Deleted key %d from bucket %d\n", key, idx);
            return;
        }
    }

    for (int i = 0; i < ht->overflowCount; i++) {
        if (ht->overflow.keys[i] == key) {
            ht->overflow.keys[i] = ht->overflow.keys[ht->overflowCount - 1];
            ht->overflow.keys[ht->overflowCount - 1] = -1;
            ht->overflowCount--;
            printf("Deleted key %d from overflow\n", key);
            return;
        }
    }

    printf("Key %d not found\n", key);
}

void displayHashTable(HashTable* ht) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        printf("Bucket[%d]: ", i);
        if (ht->buckets[i].count == 0) {
            printf("empty");
        } else {
            for (int j = 0; j < ht->buckets[i].count; j++) {
                printf("%d", ht->buckets[i].keys[j]);
                if (j < ht->buckets[i].count - 1) printf(", ");
            }
        }
        printf("   (count=%d)\n", ht->buckets[i].count);
    }
    printf("Overflow:   ");
    if (ht->overflowCount == 0) {
        printf("empty");
    } else {
        for (int j = 0; j < ht->overflowCount; j++) {
            printf("%d", ht->overflow.keys[j]);
            if (j < ht->overflowCount - 1) printf(", ");
        }
    }
    printf("   (count=%d)\n", ht->overflowCount);
}

int main() {
    HashTable* ht = createHashTable();

    printf("=== Inserting ===\n");
    int keys[] = {10, 22, 31, 4, 15, 28, 17, 88};
    for (int i = 0; i < 8; i++) {
        printf("Insert %d: hash(%d) = %d\n", keys[i], keys[i], keys[i] % 7);
        insert(ht, keys[i]);
    }

    printf("\n=== Hash Table ===\n");
    displayHashTable(ht);

    printf("\n=== Search ===\n");
    int targets[] = {31, 17, 15, 99};
    for (int i = 0; i < 4; i++) {
        if (search(ht, targets[i])) {
            printf("Key %d found\n", targets[i]);
        } else {
            printf("Key %d not found\n", targets[i]);
        }
    }

    printf("\n=== Delete ===\n");
    removeKey(ht, 22);
    removeKey(ht, 17);
    removeKey(ht, 99);

    printf("\n=== After Deletion ===\n");
    displayHashTable(ht);

    destroyHashTable(ht);
    return 0;
}
TABLE_SIZE = 7
BUCKET_SIZE = 2
OVERFLOW_SIZE = 10

class Bucket:
    """A fixed-size bucket that holds multiple keys."""
    def __init__(self, size=BUCKET_SIZE):
        self.keys = [-1] * size
        self.count = 0

class HashTable:
    """Hash table using closed hashing with buckets and overflow."""
    def __init__(self):
        self.buckets = [Bucket() for _ in range(TABLE_SIZE)]
        self.overflow = Bucket(OVERFLOW_SIZE)
        self.overflow_count = 0

    def _hash(self, key):
        return key % TABLE_SIZE

    def insert(self, key):
        idx = self._hash(key)
        b = self.buckets[idx]

        # Check if key already exists in the bucket
        for i in range(b.count):
            if b.keys[i] == key:
                print(f"Key {key} already exists")
                return

        # Also check the overflow area for the key
        for i in range(self.overflow_count):
            if self.overflow.keys[i] == key:
                print(f"Key {key} already exists")
                return

        if b.count < BUCKET_SIZE:
            b.keys[b.count] = key
            b.count += 1
            return

        # Bucket full, insert into overflow
        if self.overflow_count < OVERFLOW_SIZE:
            self.overflow.keys[self.overflow_count] = key
            self.overflow_count += 1
        else:
            print(f"Overflow is full, cannot insert {key}")

    def search(self, key):
        idx = self._hash(key)
        b = self.buckets[idx]

        for i in range(b.count):
            if b.keys[i] == key:
                return True
        for i in range(self.overflow_count):
            if self.overflow.keys[i] == key:
                return True
        return False

    def remove(self, key):
        idx = self._hash(key)
        b = self.buckets[idx]

        for i in range(b.count):
            if b.keys[i] == key:
                b.keys[i] = b.keys[b.count - 1]
                b.keys[b.count - 1] = -1
                b.count -= 1
                print(f"Deleted key {key} from bucket {idx}")
                return

        for i in range(self.overflow_count):
            if self.overflow.keys[i] == key:
                self.overflow.keys[i] = self.overflow.keys[self.overflow_count - 1]
                self.overflow.keys[self.overflow_count - 1] = -1
                self.overflow_count -= 1
                print(f"Deleted key {key} from overflow")
                return

        print(f"Key {key} not found")

    def display(self):
        for i in range(TABLE_SIZE):
            b = self.buckets[i]
            if b.count == 0:
                items = "empty"
            else:
                items = ", ".join(str(b.keys[j]) for j in range(b.count))
            print(f"Bucket[{i}]: {items}   (count={b.count})")

        if self.overflow_count == 0:
            items = "empty"
        else:
            items = ", ".join(
                str(self.overflow.keys[j]) for j in range(self.overflow_count)
            )
        print(f"Overflow:   {items}   (count={self.overflow_count})")

if __name__ == "__main__":
    ht = HashTable()

    print("=== Inserting ===")
    for k in [10, 22, 31, 4, 15, 28, 17, 88]:
        print(f"Insert {k}: hash({k}) = {k % 7}")
        ht.insert(k)

    print("\n=== Hash Table ===")
    ht.display()

    print("\n=== Search ===")
    for t in [31, 17, 15, 99]:
        if ht.search(t):
            print(f"Key {t} found")
        else:
            print(f"Key {t} not found")

    print("\n=== Delete ===")
    ht.remove(22)
    ht.remove(17)
    ht.remove(99)

    print("\n=== After Deletion ===")
    ht.display()

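The Go version defines the bucket and the hash table with structs. Bucket holds a fixed-size key array (implemented as a slice) and a counter; HashTable manages the bucket array and the overflow bucket. Deletion overwrites the removed slot with the last element, so no holes appear inside a bucket.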

package main

import "fmt"

const TABLE_SIZE = 7
const BUCKET_SIZE = 2
const OVERFLOW_SIZE = 10

// Bucket is a fixed-size bucket that stores several keys
type Bucket struct {
	keys  []int // Key array; -1 marks an empty slot
	count int   // Number of keys currently stored
}

func newBucket(size int) *Bucket {
	b := &Bucket{
		keys:  make([]int, size),
		count: 0,
	}
	for i := range b.keys {
		b.keys[i] = -1
	}
	return b
}

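// HashTable is a bucket hash table with a main bucket array and an overflow bucket
type HashTable struct {
	buckets       []*Bucket // Main bucket array
	overflow      *Bucket   // Overflow bucket
	overflowCount int       // Number of keys in the overflow area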
}

func newHashTable() *HashTable {
	ht := &HashTable{
		buckets:       make([]*Bucket, TABLE_SIZE),
		overflow:      newBucket(OVERFLOW_SIZE),
		overflowCount: 0,
	}
	for i := range ht.buckets {
		ht.buckets[i] = newBucket(BUCKET_SIZE)
	}
	return ht
}

func hashFunc(key int) int {
	return key % TABLE_SIZE
}

// insert adds a key to the bucket hash table; if the bucket is full, the key goes to the overflow area
func (ht *HashTable) insert(key int) {
	idx := hashFunc(key)
	b := ht.buckets[idx]

	// Check whether the key already exists in the main bucket
	for i := 0; i < b.count; i++ {
		if b.keys[i] == key {
			fmt.Printf("Key %d already exists\n", key)
			return
		}
	}

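	// Also check the overflow area: the key may be there if it was inserted
	// while this bucket was full and a slot has since been freed
	for i := 0; i < ht.overflowCount; i++ {
		if ht.overflow.keys[i] == key {
			fmt.Printf("Key %d already exists\n", key)
			return
		}
	}

	// Main bucket not full: insert directly
	if b.count < BUCKET_SIZE {
		b.keys[b.count] = key
		b.count++
		return
	}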

	// Main bucket full: insert into the overflow area
	if ht.overflowCount < OVERFLOW_SIZE {
		ht.overflow.keys[ht.overflowCount] = key
		ht.overflowCount++
	} else {
		fmt.Printf("Overflow is full, cannot insert %d\n", key)
	}
}

// search looks for a key in the main bucket first, then in the overflow area
func (ht *HashTable) search(key int) bool {
	idx := hashFunc(key)
	b := ht.buckets[idx]

	for i := 0; i < b.count; i++ {
		if b.keys[i] == key {
			return true
		}
	}
	for i := 0; i < ht.overflowCount; i++ {
		if ht.overflow.keys[i] == key {
			return true
		}
	}
	return false
}

// remove deletes a key, overwriting it with the last element to avoid holes
func (ht *HashTable) remove(key int) {
	idx := hashFunc(key)
	b := ht.buckets[idx]

	// Search the main bucket
	for i := 0; i < b.count; i++ {
		if b.keys[i] == key {
			b.keys[i] = b.keys[b.count-1]
			b.keys[b.count-1] = -1
			b.count--
			fmt.Printf("Deleted key %d from bucket %d\n", key, idx)
			return
		}
	}

	// Search the overflow area
	for i := 0; i < ht.overflowCount; i++ {
		if ht.overflow.keys[i] == key {
			ht.overflow.keys[i] = ht.overflow.keys[ht.overflowCount-1]
			ht.overflow.keys[ht.overflowCount-1] = -1
			ht.overflowCount--
			fmt.Printf("Deleted key %d from overflow\n", key)
			return
		}
	}

	fmt.Printf("Key %d not found\n", key)
}

// display prints the contents of every bucket and of the overflow area
func (ht *HashTable) display() {
	for i := 0; i < TABLE_SIZE; i++ {
		b := ht.buckets[i]
		fmt.Printf("Bucket[%d]: ", i)
		if b.count == 0 {
			fmt.Print("empty")
		} else {
			for j := 0; j < b.count; j++ {
				fmt.Print(b.keys[j])
				if j < b.count-1 {
					fmt.Print(", ")
				}
			}
		}
		fmt.Printf("   (count=%d)\n", b.count)
	}

	fmt.Print("Overflow:   ")
	if ht.overflowCount == 0 {
		fmt.Print("empty")
	} else {
		for j := 0; j < ht.overflowCount; j++ {
			fmt.Print(ht.overflow.keys[j])
			if j < ht.overflowCount-1 {
				fmt.Print(", ")
			}
		}
	}
	fmt.Printf("   (count=%d)\n", ht.overflowCount)
}

func main() {
	ht := newHashTable()

	// Insert keys
	fmt.Println("=== Inserting ===")
	keys := []int{10, 22, 31, 4, 15, 28, 17, 88}
	for _, k := range keys {
		fmt.Printf("Insert %d: hash(%d) = %d\n", k, k, k%7)
		ht.insert(k)
	}

	// Display the hash table
	fmt.Println("\n=== Hash Table ===")
	ht.display()

	// Search for keys
	fmt.Println("\n=== Search ===")
	targets := []int{31, 17, 15, 99}
	for _, t := range targets {
		if ht.search(t) {
			fmt.Printf("Key %d found\n", t)
		} else {
			fmt.Printf("Key %d not found\n", t)
		}
	}

	// Delete keys
	fmt.Println("\n=== Delete ===")
	ht.remove(22)
	ht.remove(17)
	ht.remove(99)

	// Display after deletion
	fmt.Println("\n=== After Deletion ===")
	ht.display()
}

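Go manages the bucket array as a slice of *Bucket pointers, and the newBucket constructor initializes each bucket's key array to -1 (empty). Each bucket's slice is allocated once with make and the code never appends to it, so a bucket's capacity is fixed by the size passed to newBucket. That is a property of how this code uses the slice, not a language restriction, since Go slices can grow with append. Deletion overwrites the removed position with the last element, which keeps each bucket's keys contiguous and avoids the tombstone markers that lazy deletion in open addressing relies on.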

Running the program produces the following output:

=== Inserting ===
Insert 10: hash(10) = 3
Insert 22: hash(22) = 1
Insert 31: hash(31) = 3
Insert 4: hash(4) = 4
Insert 15: hash(15) = 1
Insert 28: hash(28) = 0
Insert 17: hash(17) = 3
Insert 88: hash(88) = 4

=== Hash Table ===
Bucket[0]: 28   (count=1)
Bucket[1]: 22, 15   (count=2)
Bucket[2]: empty   (count=0)
Bucket[3]: 10, 31   (count=2)
Bucket[4]: 4, 88   (count=2)
Bucket[5]: empty   (count=0)
Bucket[6]: empty   (count=0)
Overflow:   17   (count=1)

=== Search ===
Key 31 found
Key 17 found
Key 15 found
Key 99 not found

=== Delete ===
Deleted key 22 from bucket 1
Deleted key 17 from overflow
Key 99 not found

=== After Deletion ===
Bucket[0]: 28   (count=1)
Bucket[1]: 15   (count=1)
Bucket[2]: empty   (count=0)
Bucket[3]: 10, 31   (count=2)
Bucket[4]: 4, 88   (count=2)
Bucket[5]: empty   (count=0)
Bucket[6]: empty   (count=0)
Overflow:   empty   (count=0)

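As the output shows:

  • After inserting {10, 22, 31, 4, 15, 28, 17, 88}, 10 and 31 both map to bucket 3 and exactly fill it (BUCKET_SIZE = 2). 22 and 15 fill bucket 1, 4 and 88 fill bucket 4, and 28 is alone in bucket 0. 17 also maps to bucket 3, but bucket 3 is already full, so 17 is placed in the overflow area.
  • The searches behave as follows:
    - 31 is found directly in bucket 3.
    - 17 is not in bucket 3 and is then found in the overflow area.
    - 15 is found in bucket 1.
    - 99 (99 % 7 = 1) is found neither in bucket 1 nor in the overflow area.
  • Deleting 22 finds it in bucket 1 and overwrites it with bucket 1's last element, 15, so bucket 1 becomes [15] (count=1). Deleting 17 finds and removes it from the overflow area, which becomes empty. Deleting 99 reports that the key is not found.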

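Performance analysis

The table below summarizes the time complexity of each operation. The average case assumes that the hash function spreads keys evenly and the load factor is moderate, so few keys overflow. In the worst case, every operation scans the whole overflow area. That area is bounded by OVERFLOW_SIZE, but it can hold a large share of the n stored keys when keys cluster in a few buckets.

| Operation | Average time | Worst-case time |
| --- | --- | --- |
| Insert | O(1) | O(n) (overflow area scanned for duplicates) |
| Search | O(1) | O(n) (overflow area scanned) |
| Delete | O(1) | O(n) (overflow area scanned) |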
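The O(1) average-case rows rely on few keys overflowing. A rough, hypothetical experiment (not part of the article's code) checks this:
- It inserts seeded pseudo-random distinct keys into a table with arbitrary sizes TABLE_SIZE = 101 and BUCKET_SIZE = 2.
- It treats the overflow area as unbounded.
- It reports how many keys spill over at several load factors, taking load factor to mean n / (TABLE_SIZE * BUCKET_SIZE).

The exact counts depend on the seed.

```python
# Rough experiment (not a benchmark): overflow size versus load factor.
# Assumptions: uniformly random distinct non-negative keys, unbounded
# overflow, TABLE_SIZE = 101 and BUCKET_SIZE = 2 chosen arbitrarily.
import random


def overflow_after(n_keys, table_size=101, bucket_size=2, seed=1):
    """Insert n_keys random keys; return how many landed in the overflow."""
    rng = random.Random(seed)
    keys = rng.sample(range(1_000_000), n_keys)   # distinct keys
    fill = [0] * table_size                       # keys per bucket
    spilled = 0
    for k in keys:
        idx = k % table_size
        if fill[idx] < bucket_size:
            fill[idx] += 1
        else:
            spilled += 1
    return spilled


slots = 101 * 2
for alpha in (0.25, 0.5, 0.75, 1.0):
    n = int(alpha * slots)
    print(f"load {alpha:.2f}: {n} keys, {overflow_after(n)} in overflow")
```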

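Trade-offs in bucket size:

| Bucket size | Advantages | Disadvantages |
| --- | --- | --- |
| Small (1 to 2) | High memory utilization; little space wasted per bucket | Collisions overflow easily, putting pressure on the overflow area |
| Medium (3 to 4) | Good balance between collision handling and space | Slightly longer linear scan inside each bucket |
| Large (8+) | Keys rarely overflow | Empty and partly filled buckets waste a lot of space |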
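A similar rough experiment can explore the bucket-size trade-off:
- The total number of bucket slots stays fixed at an arbitrary 240 while BUCKET_SIZE varies.
- 180 seeded pseudo-random keys are inserted, a load factor of 0.75.
- The experiment counts how many of them overflow.

`spill_count` is a hypothetical helper and the overflow area is treated as unbounded. Results depend on the seed, so no output is claimed here.

```python
# Rough experiment: same total slot count, different bucket sizes.
# Assumptions: uniformly random distinct keys, unbounded overflow area.
import random


def spill_count(table_size, bucket_size, keys):
    """Return how many keys would go to the overflow area."""
    fill = [0] * table_size
    spilled = 0
    for k in keys:
        idx = k % table_size
        if fill[idx] < bucket_size:
            fill[idx] += 1
        else:
            spilled += 1
    return spilled


TOTAL_SLOTS = 240
keys = random.Random(7).sample(range(1_000_000), 180)   # load factor 0.75
for bucket_size in (1, 2, 4, 8):
    table_size = TOTAL_SLOTS // bucket_size
    print(f"BUCKET_SIZE={bucket_size}: TABLE_SIZE={table_size}, "
          f"overflow={spill_count(table_size, bucket_size, keys)}")
```

Fewer overflows with larger buckets come at the cost of scanning up to BUCKET_SIZE keys inside the home bucket on every operation.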

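Strategies for handling the overflow area:

  • Fixed-size overflow bucket: the approach used in this article. It is simple and direct, but once the overflow bucket is full no more keys can be inserted.
  • Growable overflow area: the overflow area is enlarged dynamically when it fills up. This is flexible but makes the implementation more complex.
  • Multi-level overflow: the overflow area is itself organized as a hash table, forming a two-level hash structure.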
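A growable overflow area, the second strategy above, might look like the following Python sketch. `GrowableOverflow` is a hypothetical class that mimics a fixed-capacity C array. When the array is full, it allocates one of twice the capacity and copies the live keys across, roughly what a C version might do with realloc. It is not the article's implementation.

```python
# Sketch: a growable overflow area that doubles its capacity when full.
EMPTY = -1


class GrowableOverflow:
    def __init__(self, capacity=2):
        self.keys = [EMPTY] * capacity
        self.count = 0

    def append(self, key):
        if self.count == len(self.keys):
            # Full: allocate twice the capacity (at least 1) and copy live keys
            bigger = [EMPTY] * max(1, 2 * len(self.keys))
            bigger[:self.count] = self.keys[:self.count]
            self.keys = bigger
        self.keys[self.count] = key
        self.count += 1

    def contains(self, key):
        """Linear scan over the live keys only."""
        return key in self.keys[:self.count]


ov = GrowableOverflow(capacity=2)
for k in (17, 24, 38):
    ov.append(k)
print(ov.count, len(ov.keys))   # 3 4
```

Doubling keeps the amortized cost of an append constant. Insertion then never fails for lack of overflow space, but the linear scan of the overflow area during search grows with it.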

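Comparison with separate chaining and open addressing:

| Property | Separate chaining | Open addressing | Buckets (this article) |
| --- | --- | --- | --- |
| Collision handling | Colliding keys stored in linked lists | Probe sequence searches for a free slot | Array inside each bucket plus an overflow area |
| Capacity per slot | Unlimited (linked list) | 1 key | BUCKET_SIZE keys |
| Extra space | One pointer per node | No per-key pointer overhead | A count per bucket, plus unused slots in partly filled buckets |
| Deletion | Unlink the list node directly | Typically lazy deletion (tombstones) | Overwrite with the last key; no empty-slot problem |
| Cache behavior | Poor (nodes not contiguous) | Best (contiguous array) | Good (keys within a bucket are contiguous) |
| Load-factor limit | None | Must be < 1 | Limited by the overflow area: inserts can fail once it is full, even if other buckets still have room |
| Worst-case search | O(n) | O(n) | O(n) (scan of the overflow area) |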

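Key points:

  • What the bucket method is: a compromise between separate chaining and open addressing. It handles collisions with fixed-size arrays instead of linked lists, which avoids the cost of dynamic memory allocation. Colliding keys also sit together in one bucket instead of being scattered across the table, as they are with probing schemes such as quadratic probing or double hashing, which gives good locality.
  • The overflow area is the performance bottleneck: when the hash function distributes keys unevenly or the load factor is too high, many keys spill into the overflow area, and search and delete degrade into linear scans. The bucket method therefore suits workloads with evenly distributed keys and a moderate load factor.
  • Space efficiency: the method reserves TABLE_SIZE * BUCKET_SIZE + OVERFLOW_SIZE key slots, plus one count per bucket. When the load factor is low, much of the space inside the buckets is wasted. Space utilization is roughly n / (TABLE_SIZE * BUCKET_SIZE + OVERFLOW_SIZE).
  • Advantage in deletion: unlike lazy deletion in open addressing, deletion here overwrites the removed key with the last key of its bucket, so DELETED markers never accumulate. This gives the bucket method an edge over tombstone-based open addressing in delete-heavy workloads.
  • Suitable scenarios: the bucket method is particularly well suited to embedded or real-time systems where the size of the key set is known in advance, memory layout must be compact, and dynamic allocation (such as linked-list nodes) is undesirable. Choosing an appropriate bucket size strikes a good balance between space utilization and collision-handling efficiency.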
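Plugging the demo's sizes into the utilization formula from the space-efficiency point gives a concrete figure. The Python sketch counts key slots only and ignores the per-bucket count fields, an assumption also noted in the code.

```python
# Worked example of n / (TABLE_SIZE * BUCKET_SIZE + OVERFLOW_SIZE) for the
# demo table. Counts key slots only; per-bucket count fields are ignored.
TABLE_SIZE = 7
BUCKET_SIZE = 2
OVERFLOW_SIZE = 10


def utilization(n_keys):
    total_slots = TABLE_SIZE * BUCKET_SIZE + OVERFLOW_SIZE   # 14 + 10 = 24
    return n_keys / total_slots


print(f"{utilization(8):.1%}")   # after the 8 inserts: 33.3%
print(f"{utilization(6):.1%}")   # after deleting 22 and 17: 25.0%
```

Even the fully populated demo uses only a third of the reserved slots. Most of the waste sits in the empty buckets 2, 5 and 6 and in the largely unused overflow area.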
posted @ 2026-04-16 17:52  游翔