3-7 Closed Hash Table (Open Addressing)
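A closed hash table (closed hashing), also known as open addressing, is a classic strategy for resolving hash collisions. Its core idea is the opposite of an open hash table (separate chaining): every element is stored directly in the hash table's array, and no linked lists are used. When the hash function maps a key to a slot that is already occupied (a collision), the table follows a probe sequence through the array until it finds a free slot. The name "open addressing" reflects this: an element's final position is not fixed by its hash value alone and may be any address reached along the probe sequence.
This article focuses on the most basic scheme, linear probing, and briefly introduces quadratic probing and double hashing.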
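Core concepts
- Hash function: maps a key to an array index, for example `hash(key) = key % TABLE_SIZE`.
- Hash collision: two different keys hash to the same index, so the second key finds its slot already occupied.
- Probing: the process of searching the array, according to a fixed rule, for the next usable slot after a collision.
- Load factor: written `α = n / m`, where n is the number of stored elements and m is the array size. Each slot holds at most one element, so α can never exceed 1, and in practice it is kept well below 1: without at least one empty slot, a probe loop has no natural stopping point and must be bounded explicitly.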
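The following trace shows insertions into a closed hash table with TABLE_SIZE = 7:
```
Insert 10: hash(10) = 10 % 7 = 3 -> slot [3] is empty, store directly
Insert 22: hash(22) = 22 % 7 = 1 -> slot [1] is empty, store directly
Insert 31: hash(31) = 31 % 7 = 3 -> slot [3] is occupied, probe slot [4], empty, store
Insert 4:  hash(4)  = 4 % 7  = 4 -> slot [4] is occupied, probe slot [5], empty, store
Insert 15: hash(15) = 15 % 7 = 1 -> slot [1] is occupied, probe slot [2], empty, store
Insert 28: hash(28) = 28 % 7 = 0 -> slot [0] is empty, store directly

Index: [0] [1] [2] [3] [4] [5] [6]
Value:  28  22  15  10  31   4
```
In this example, 10, 22, and 28 go straight into the slots given by their hash values. 31 collides with 10 (both map to slot 3) and linear probing places it in slot 4. 4 maps to slot 4, which 31 already holds, so it is probed to slot 5. 15 collides with 22 (both map to slot 1) and is probed to slot 2. Slot 6 remains empty.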
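Linear probing
Linear probing is the simplest form of open addressing. On a collision, it starts from the slot given by the hash value and checks the following slots one by one (wrapping around to the start when it reaches the end of the array) until it finds an empty slot.
The probe formula is:
index = (hash(key) + i) % TABLE_SIZE, i = 0, 1, 2, ...
Here i is the probe step: i = 0 checks the home slot, i = 1 checks the next slot, and so on.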
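The probe formula can be sketched as a small generator. This is only an illustration under the article's assumptions (integer keys, `hash(key) = key % TABLE_SIZE`); the helper name `linear_probe_sequence` is hypothetical.

```python
def linear_probe_sequence(key, table_size):
    """Yield the slots linear probing visits for key, in order (i = 0, 1, 2, ...)."""
    home = key % table_size          # slot for i = 0
    for i in range(table_size):      # at most one full cycle
        yield (home + i) % table_size


print(list(linear_probe_sequence(31, 7)))  # [3, 4, 5, 6, 0, 1, 2]
```

The sequence wraps around the end of the array and visits every slot exactly once, which is why a linear-probing loop can be bounded by one full cycle.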
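Step-by-step walkthrough
With TABLE_SIZE = 7, insert 10, 22, 31, 4, 15, 28 in that order:
Insert 10: hash(10) = 10 % 7 = 3. Slot 3 is empty, so 10 is stored there.
```
[0] [1] [2] [3] [4] [5] [6]
             10
```
Insert 22: hash(22) = 22 % 7 = 1. Slot 1 is empty, so 22 is stored there.
```
[0] [1] [2] [3] [4] [5] [6]
     22      10
```
Insert 31: hash(31) = 31 % 7 = 3. Slot 3 is already held by 10 (collision). Linear probing checks slot 4, which is empty, and stores 31 there.
```
[0] [1] [2] [3] [4] [5] [6]
     22      10  31
```
Insert 4: hash(4) = 4 % 7 = 4. Slot 4 is already held by 31 (collision). Linear probing checks slot 5, which is empty, and stores 4 there.
```
[0] [1] [2] [3] [4] [5] [6]
     22      10  31   4
```
Insert 15: hash(15) = 15 % 7 = 1. Slot 1 is already held by 22 (collision). Linear probing checks slot 2, which is empty, and stores 15 there.
```
[0] [1] [2] [3] [4] [5] [6]
     22  15  10  31   4
```
Insert 28: hash(28) = 28 % 7 = 0. Slot 0 is empty, so 28 is stored there.
```
[0] [1] [2] [3] [4] [5] [6]
 28  22  15  10  31   4
```
As the trace shows, 10 and 31 both hash to slot 3, so 31 is pushed to slot 4; 4, whose home slot is 4, is in turn pushed to slot 5. This effect, where one collision triggers further collisions and occupied slots merge into long contiguous runs, is called primary clustering and is the main drawback of linear probing.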
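To make the cost of clustering visible, the walkthrough can be replayed while counting how many slots each insertion examines. This is a simplified sketch (no deletion, `None` marks a free slot); `insert_with_trace` is a hypothetical helper, not part of the article's implementation.

```python
def insert_with_trace(table, key):
    """Insert key by linear probing; return (slot used, number of slots examined)."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i) % m
        if table[slot] is None:
            table[slot] = key
            return slot, i + 1
    raise RuntimeError("table is full")


table = [None] * 7
for key in [10, 22, 31, 4, 15, 28]:
    slot, probes = insert_with_trace(table, key)
    print(f"insert {key}: home {key % 7}, stored in slot {slot} after {probes} probe(s)")

print(table)  # [28, 22, 15, 10, 31, 4, None]
```

After these six insertions, slots 0 through 5 form one contiguous run, so any further key has to walk to slot 6; a key whose home slot is 0 would examine all 7 slots.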
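Quadratic probing and double hashing
Linear probing is simple but suffers badly from primary clustering. This section briefly introduces two improved probing methods.
Quadratic probing
Quadratic probing offsets each probe by a quadratic function of the step number, which mitigates the primary clustering of linear probing:
index = (hash(key) + c1*i + c2*i^2) % TABLE_SIZE, i = 0, 1, 2, ...
The most common form uses c1 = 0, c2 = 1:
index = (hash(key) + i^2) % TABLE_SIZE
The probe order is h, h+1, h+4, h+9, h+16, ... (each taken modulo TABLE_SIZE).
Quadratic probing effectively reduces primary clustering: because the offsets grow, colliding keys are scattered instead of piling up in adjacent slots. It can still suffer from secondary clustering, where different keys that hash to the same home slot follow exactly the same probe sequence. In addition, quadratic probing does not guarantee that every slot is visited: with a prime TABLE_SIZE and the i^2 form, the sequence reaches only about half of the slots, so an insertion is guaranteed to find a free slot only while the load factor is at most 0.5.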
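The limited coverage of the `i^2` form is easy to check by listing the distinct slots a probe sequence reaches. A minimal sketch, assuming `c1 = 0, c2 = 1`; the helper name `quadratic_probe_slots` is hypothetical.

```python
def quadratic_probe_slots(home, table_size):
    """Return the distinct slots reached by (home + i*i) % table_size, in visit order."""
    seen = []
    for i in range(table_size):
        slot = (home + i * i) % table_size
        if slot not in seen:
            seen.append(slot)
    return seen


print(quadratic_probe_slots(3, 7))   # [3, 4, 0, 5]
print(quadratic_probe_slots(0, 11))  # [0, 1, 4, 9, 5, 3]
```

For the prime sizes 7 and 11 the sequence reaches 4 and 6 slots respectively, that is (m + 1) / 2 of them, which is why such tables are usually kept at most half full.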
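Double hashing
Double hashing uses a second hash function to compute the probe step, so keys that share a home slot usually follow different probe sequences:
index = (hash1(key) + i * hash2(key)) % TABLE_SIZE
For example:
hash1(key) = key % TABLE_SIZE
hash2(key) = 1 + (key % (TABLE_SIZE - 1)) // step function, always >= 1
The probe order is h1, h1+h2, h1+2*h2, h1+3*h2, ... (each taken modulo TABLE_SIZE).
Because the probe sequence depends on the key itself, two keys with the same hash1 value most likely have different hash2 values and therefore follow different probe paths. This greatly reduces clustering; among the common open addressing schemes, double hashing comes closest to the behavior of ideal uniform hashing. Its drawbacks are that two hash functions must be computed, that hash2 must never return 0 (otherwise the probe never moves), and that hash2(key) should be coprime with TABLE_SIZE (automatic when TABLE_SIZE is prime) so that the sequence can reach every slot.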
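The following sketch lists the full probe sequences of two keys that share a home slot, using the example `hash1` and `hash2` above. It assumes non-negative integer keys; the helper name `double_hash_sequence` is hypothetical.

```python
def double_hash_sequence(key, table_size):
    """Return the slots visited by double hashing for i = 0 .. table_size - 1."""
    h1 = key % table_size
    h2 = 1 + (key % (table_size - 1))   # step size, always >= 1
    return [(h1 + i * h2) % table_size for i in range(table_size)]


print(double_hash_sequence(10, 7))  # [3, 1, 6, 4, 2, 0, 5]
print(double_hash_sequence(31, 7))  # [3, 5, 0, 2, 4, 6, 1]
```

Both keys start at slot 3 but diverge immediately, and because 7 is prime each sequence visits all seven slots.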
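Data structure definition
A closed hash table stores all elements directly in the array. Unlike an open hash table, it has to distinguish three slot states:
- EMPTY: the slot has never been used. Reaching an EMPTY slot during a search means the key is definitely not in the table.
- OCCUPIED: the slot currently stores a key.
- DELETED: the slot once stored a key that has since been removed. A search that reaches a DELETED slot must not stop and has to keep probing.
The DELETED state is needed because resetting a slot directly to EMPTY would make elements further along the same probe chain (those pushed past this slot by collisions) unreachable by search. This strategy is called lazy deletion.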
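The effect can be shown with a stripped-down search over two parallel lists. This is an illustrative sketch rather than the article's class; the function name `find` and the hand-built tables are assumptions made for the demonstration.

```python
EMPTY, OCCUPIED, DELETED = 0, 1, 2


def find(keys, flags, key):
    """Linear-probing search over parallel key/state lists."""
    m = len(keys)
    home = key % m
    for i in range(m):
        slot = (home + i) % m
        if flags[slot] == EMPTY:
            return False              # an EMPTY slot ends the probe chain
        if flags[slot] == OCCUPIED and keys[slot] == key:
            return True
    return False


# Slots 3 and 4 held 10 and 31; both keys have home slot 3 (table size 7).
keys = [0, 0, 0, 10, 31, 0, 0]

# Removing 10 by resetting its slot to EMPTY hides 31 from the search.
naive = [EMPTY, EMPTY, EMPTY, EMPTY, OCCUPIED, EMPTY, EMPTY]
print(find(keys, naive, 31))  # False

# Marking the slot DELETED keeps the probe chain intact.
lazy = [EMPTY, EMPTY, EMPTY, DELETED, OCCUPIED, EMPTY, EMPTY]
print(find(keys, lazy, 31))   # True
```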
```cpp
// C++ HashTable class definition
const int TABLE_SIZE = 7;

enum SlotState { EMPTY, OCCUPIED, DELETED };

class HashTable {
private:
    int* table;        // Array storing keys
    SlotState* flags;  // Array storing slot states
    int capacity;      // Number of slots
    int count;         // Number of stored elements

    int hashFunction(int key) {
        return key % capacity;
    }

public:
    HashTable(int size = TABLE_SIZE);
    ~HashTable();
    void insert(int key);
    bool search(int key);
    void remove(int key);
    void display();
};
```
```c
// C HashTable struct definition
#define TABLE_SIZE 7

typedef enum { EMPTY, OCCUPIED, DELETED } SlotState;

typedef struct {
    int* table;        // Array storing keys
    SlotState* flags;  // Array storing slot states
    int capacity;      // Number of slots
    int count;         // Number of stored elements
} HashTable;
```
```python
# Python HashTable class definition
TABLE_SIZE = 7

# Slot states
EMPTY = 0
OCCUPIED = 1
DELETED = 2


class HashTable:
    """Hash table using open addressing with linear probing."""

    def __init__(self, size=TABLE_SIZE):
        self.capacity = size
        self.count = 0
        self.table = [0] * size      # Array storing keys
        self.flags = [EMPTY] * size  # Array storing slot states
```
```go
// Go HashTable struct definition
package main

import "fmt"

const TABLE_SIZE = 7

const (
    EMPTY    = 0
    OCCUPIED = 1
    DELETED  = 2
)

// HashTable represents a closed hash table using open addressing
type HashTable struct {
    table    []int // slice storing keys
    flags    []int // slice storing slot states: EMPTY / OCCUPIED / DELETED
    capacity int   // number of slots
    count    int   // number of stored elements
}

func newHashTable(size int) *HashTable {
    return &HashTable{
        capacity: size,
        count:    0,
        table:    make([]int, size),
        flags:    make([]int, size), // zero-initialized, all EMPTY
    }
}

func (ht *HashTable) hash(key int) int {
    return key % ht.capacity
}
```
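C++ represents the three states with the enum SlotState; int* table stores the keys and SlotState* flags stores the state of each slot. C defines the equivalent structure with typedef enum. Python uses the constants EMPTY = 0, OCCUPIED = 1, DELETED = 2 and two lists for keys and states. Go defines the three state constants with const and stores keys and states in two []int slices; the zero value of the slice elements, 0, happens to equal EMPTY, so no explicit initialization is needed. All four versions compute the hash as key % capacity and therefore assume non-negative keys: in C, C++, and Go the remainder of a negative key is negative and would index outside the array.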
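Insertion
Insertion proceeds as follows:
- Compute the start index `index = key % capacity` with the hash function.
- Probe linearly from `index`. If an OCCUPIED slot already holds the same key, the key exists and nothing is inserted.
- Remember the first DELETED slot passed on the way, but keep probing: the key may still be stored further along the sequence.
- Stop at the first EMPTY slot, since the key cannot be stored beyond it. Store the key in the remembered DELETED slot if there is one, otherwise in the EMPTY slot, and mark that slot OCCUPIED.
- If a full cycle finds neither the key nor a reusable slot, the table is full and the insertion fails.
```cpp
// C++ Insert: add a key to the hash table
void HashTable::insert(int key) {
    int index = hashFunction(key);
    int startIndex = index;
    int target = -1;  // First reusable slot seen, -1 if none yet
    do {
        if (flags[index] == OCCUPIED && table[index] == key) {
            return;  // Key already exists, no duplicate insertion
        }
        if (flags[index] == EMPTY) {
            if (target == -1) target = index;
            break;  // Key cannot be stored beyond an EMPTY slot
        }
        if (flags[index] == DELETED && target == -1) {
            target = index;  // Remember it, keep looking for the key
        }
        // Probe next slot
        index = (index + 1) % capacity;
    } while (index != startIndex);
    if (target == -1) {
        cout << "Hash table is full, cannot insert " << key << endl;
        return;
    }
    table[target] = key;
    flags[target] = OCCUPIED;
    count++;
}
```
```c
// C Insert: add a key to the hash table
void insert(HashTable* ht, int key) {
    int index = key % ht->capacity;
    int startIndex = index;
    int target = -1;  // First reusable slot seen, -1 if none yet
    do {
        if (ht->flags[index] == OCCUPIED && ht->table[index] == key) {
            return;  // Key already exists
        }
        if (ht->flags[index] == EMPTY) {
            if (target == -1) target = index;
            break;  // Key cannot be stored beyond an EMPTY slot
        }
        if (ht->flags[index] == DELETED && target == -1) {
            target = index;  // Remember it, keep looking for the key
        }
        index = (index + 1) % ht->capacity;
    } while (index != startIndex);
    if (target == -1) {
        printf("Hash table is full, cannot insert %d\n", key);
        return;
    }
    ht->table[target] = key;
    ht->flags[target] = OCCUPIED;
    ht->count++;
}
```
```python
# Python Insert: add a key to the hash table
def insert(self, key):
    index = key % self.capacity
    start_index = index
    target = None  # First reusable slot seen
    while True:
        if self.flags[index] == OCCUPIED and self.table[index] == key:
            return  # Key already exists
        if self.flags[index] == EMPTY:
            if target is None:
                target = index
            break  # Key cannot be stored beyond an EMPTY slot
        if self.flags[index] == DELETED and target is None:
            target = index  # Remember it, keep looking for the key
        # Probe next slot
        index = (index + 1) % self.capacity
        if index == start_index:
            break
    if target is None:
        print(f"Hash table is full, cannot insert {key}")
        return
    self.table[target] = key
    self.flags[target] = OCCUPIED
    self.count += 1
```
```go
// Go Insert: add a key to the hash table
func (ht *HashTable) insert(key int) {
    idx := ht.hash(key)
    startIdx := idx
    target := -1 // first reusable slot seen, -1 if none yet
    for {
        if ht.flags[idx] == OCCUPIED && ht.table[idx] == key {
            return // key already exists, no duplicate insertion
        }
        if ht.flags[idx] == EMPTY {
            if target == -1 {
                target = idx
            }
            break // key cannot be stored beyond an EMPTY slot
        }
        if ht.flags[idx] == DELETED && target == -1 {
            target = idx // remember it, keep looking for the key
        }
        // Probe next slot
        idx = (idx + 1) % ht.capacity
        if idx == startIdx {
            break
        }
    }
    if target == -1 {
        fmt.Printf("Hash table is full, cannot insert %d\n", key)
        return
    }
    ht.table[target] = key
    ht.flags[target] = OCCUPIED
    ht.count++
}
```
Insertion probes linearly from the home slot. It skips OCCUPIED slots holding other keys, remembers the first DELETED slot it passes, and stops at the first EMPTY slot, where the key is known to be absent; the key then goes into the remembered DELETED slot if there is one, otherwise into the EMPTY slot. Writing into a DELETED slot before the scan finishes could store a second copy of a key that sits further along the probe sequence. C++ and C use a do-while loop, Python uses while True, and Go uses an infinite for loop with break; all four are logically equivalent and probe at most one full cycle. If the key already exists, the insertion is skipped to avoid duplicates.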
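Search
A search proceeds as follows:
- Compute the start index `index = key % capacity`.
- Probe linearly: an OCCUPIED slot with a matching key means the key is found; an EMPTY slot ends the search (the key is definitely absent); a DELETED slot is skipped and probing continues.
- If a full cycle finds nothing, the key is not in the table.
The key difference between the two non-occupied states: on DELETED the search must continue, because the target key may have been pushed further along by collisions.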
```cpp
// C++ Search: find a key, return true if found
bool HashTable::search(int key) {
    int index = hashFunction(key);
    int startIndex = index;
    do {
        if (flags[index] == EMPTY) {
            // Empty slot means key definitely not in table
            return false;
        }
        if (flags[index] == OCCUPIED && table[index] == key) {
            return true;  // Found the key
        }
        // OCCUPIED with different key or DELETED: keep probing
        index = (index + 1) % capacity;
    } while (index != startIndex);
    return false;  // Full loop, key not found
}
```
```c
// C Search: find a key, return 1 if found, 0 otherwise
int search(HashTable* ht, int key) {
    int index = key % ht->capacity;
    int startIndex = index;
    do {
        if (ht->flags[index] == EMPTY) {
            return 0;
        }
        if (ht->flags[index] == OCCUPIED && ht->table[index] == key) {
            return 1;
        }
        index = (index + 1) % ht->capacity;
    } while (index != startIndex);
    return 0;
}
```
```python
# Python Search: find a key, return True if found
def search(self, key):
    index = key % self.capacity
    start_index = index
    while True:
        if self.flags[index] == EMPTY:
            # Empty slot means key definitely not in table
            return False
        if self.flags[index] == OCCUPIED and self.table[index] == key:
            return True
        # OCCUPIED with different key or DELETED: keep probing
        index = (index + 1) % self.capacity
        if index == start_index:
            break
    return False
```
```go
// Go Search: find a key, return true if found
func (ht *HashTable) search(key int) bool {
    idx := ht.hash(key)
    startIdx := idx
    for {
        if ht.flags[idx] == EMPTY {
            // Empty slot means key definitely not in table
            return false
        }
        if ht.flags[idx] == OCCUPIED && ht.table[idx] == key {
            return true // Found the key
        }
        // OCCUPIED with different key or DELETED: keep probing
        idx = (idx + 1) % ht.capacity
        if idx == startIdx {
            break
        }
    }
    return false // Full loop, key not found
}
```
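A search can conclude that the key is absent as soon as it reaches an EMPTY slot: had the key been inserted, its insertion would have followed the same probe sequence and used this slot instead of passing it. A DELETED slot gives no such guarantee, because the target key may have been inserted while that slot was still occupied and therefore sits further along the sequence.
Deletion
Deletion cannot simply reset the slot to EMPTY. Doing so would make the elements further along the same probe chain (keys pushed past this slot by collisions) unreachable, because a search would stop early at the EMPTY slot. Deletion therefore has to be lazy: the slot is marked DELETED instead of EMPTY.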
```cpp
// C++ Remove: delete a key using lazy deletion
void HashTable::remove(int key) {
    int index = hashFunction(key);
    int startIndex = index;
    do {
        if (flags[index] == EMPTY) {
            // Key not found
            cout << "Key " << key << " not found" << endl;
            return;
        }
        if (flags[index] == OCCUPIED && table[index] == key) {
            // Found the key, mark as DELETED
            flags[index] = DELETED;
            count--;
            cout << "Deleted key " << key << endl;
            return;
        }
        index = (index + 1) % capacity;
    } while (index != startIndex);
    cout << "Key " << key << " not found" << endl;
}
```
```c
// C Remove: delete a key using lazy deletion
void removeKey(HashTable* ht, int key) {
    int index = key % ht->capacity;
    int startIndex = index;
    do {
        if (ht->flags[index] == EMPTY) {
            printf("Key %d not found\n", key);
            return;
        }
        if (ht->flags[index] == OCCUPIED && ht->table[index] == key) {
            ht->flags[index] = DELETED;
            ht->count--;
            printf("Deleted key %d\n", key);
            return;
        }
        index = (index + 1) % ht->capacity;
    } while (index != startIndex);
    printf("Key %d not found\n", key);
}
```
```python
# Python Remove: delete a key using lazy deletion
def remove(self, key):
    index = key % self.capacity
    start_index = index
    while True:
        if self.flags[index] == EMPTY:
            print(f"Key {key} not found")
            return
        if self.flags[index] == OCCUPIED and self.table[index] == key:
            # Mark as DELETED instead of EMPTY
            self.flags[index] = DELETED
            self.count -= 1
            print(f"Deleted key {key}")
            return
        index = (index + 1) % self.capacity
        if index == start_index:
            break
    print(f"Key {key} not found")
```
```go
// Go Remove: delete a key using lazy deletion
func (ht *HashTable) remove(key int) {
    idx := ht.hash(key)
    startIdx := idx
    for {
        if ht.flags[idx] == EMPTY {
            // Key not found
            fmt.Printf("Key %d not found\n", key)
            return
        }
        if ht.flags[idx] == OCCUPIED && ht.table[idx] == key {
            // Found the key, mark as DELETED
            ht.flags[idx] = DELETED
            ht.count--
            fmt.Printf("Deleted key %d\n", key)
            return
        }
        idx = (idx + 1) % ht.capacity
        if idx == startIdx {
            break
        }
    }
    fmt.Printf("Key %d not found\n", key)
}
```
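Deletion probes the same way as search: it follows the probe sequence until it finds the target key. It then neither frees memory nor clears the stored value; it only changes the state flag to DELETED. This keeps the probe chain intact and also leaves the slot available for reuse by later insertions.
Complete implementation
The following complete closed hash table implementations use linear probing and include insert, search, delete, and display operations, each assembled into a standalone program.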
```cpp
#include <iostream>
using namespace std;

const int TABLE_SIZE = 7;

enum SlotState { EMPTY, OCCUPIED, DELETED };

class HashTable {
private:
    int* table;
    SlotState* flags;
    int capacity;
    int count;

    int hashFunction(int key) {
        return key % capacity;  // Assumes non-negative keys
    }

public:
    HashTable(int size = TABLE_SIZE) : capacity(size), count(0) {
        table = new int[capacity];
        flags = new SlotState[capacity];
        for (int i = 0; i < capacity; i++) {
            flags[i] = EMPTY;
        }
    }

    ~HashTable() {
        delete[] table;
        delete[] flags;
    }

    void insert(int key) {
        int index = hashFunction(key);
        int startIndex = index;
        int target = -1;  // First reusable slot seen, -1 if none yet
        do {
            if (flags[index] == OCCUPIED && table[index] == key) {
                return;  // Duplicate
            }
            if (flags[index] == EMPTY) {
                if (target == -1) target = index;
                break;  // Key cannot be stored beyond an EMPTY slot
            }
            if (flags[index] == DELETED && target == -1) {
                target = index;  // Remember it, keep looking for the key
            }
            index = (index + 1) % capacity;
        } while (index != startIndex);
        if (target == -1) {
            cout << "Hash table is full, cannot insert " << key << endl;
            return;
        }
        table[target] = key;
        flags[target] = OCCUPIED;
        count++;
    }

    bool search(int key) {
        int index = hashFunction(key);
        int startIndex = index;
        do {
            if (flags[index] == EMPTY) {
                return false;
            }
            if (flags[index] == OCCUPIED && table[index] == key) {
                return true;
            }
            index = (index + 1) % capacity;
        } while (index != startIndex);
        return false;
    }

    void remove(int key) {
        int index = hashFunction(key);
        int startIndex = index;
        do {
            if (flags[index] == EMPTY) {
                cout << "Key " << key << " not found" << endl;
                return;
            }
            if (flags[index] == OCCUPIED && table[index] == key) {
                flags[index] = DELETED;
                count--;
                cout << "Deleted key " << key << endl;
                return;
            }
            index = (index + 1) % capacity;
        } while (index != startIndex);
        cout << "Key " << key << " not found" << endl;
    }

    void display() {
        for (int i = 0; i < capacity; i++) {
            cout << "[" << i << "] ";
            if (flags[i] == EMPTY) {
                cout << "EMPTY";
            } else if (flags[i] == DELETED) {
                cout << "DELETED";
            } else {
                cout << table[i];
            }
            cout << endl;
        }
    }
};

int main() {
    HashTable ht;

    // Insert keys
    cout << "=== Inserting ===" << endl;
    int keys[] = {10, 22, 31, 4, 15, 28};
    for (int k : keys) {
        cout << "Insert " << k << ": hash(" << k << ") = " << k % 7 << endl;
        ht.insert(k);
    }

    // Display
    cout << "\n=== Hash Table ===" << endl;
    ht.display();

    // Search
    cout << "\n=== Search ===" << endl;
    int targets[] = {31, 17, 15};
    for (int t : targets) {
        if (ht.search(t)) {
            cout << "Key " << t << " found" << endl;
        } else {
            cout << "Key " << t << " not found" << endl;
        }
    }

    // Delete
    cout << "\n=== Delete ===" << endl;
    ht.remove(22);
    ht.remove(17);

    // Display after deletion
    cout << "\n=== After Deletion ===" << endl;
    ht.display();
    return 0;
}
```
```c
#include <stdio.h>
#include <stdlib.h>

#define TABLE_SIZE 7

typedef enum { EMPTY, OCCUPIED, DELETED } SlotState;

typedef struct {
    int* table;
    SlotState* flags;
    int capacity;
    int count;
} HashTable;

HashTable* createHashTable(int size) {
    HashTable* ht = (HashTable*)malloc(sizeof(HashTable));
    if (!ht) { exit(1); }
    ht->capacity = size;
    ht->count = 0;
    ht->table = (int*)malloc(size * sizeof(int));
    ht->flags = (SlotState*)malloc(size * sizeof(SlotState));
    if (!ht->table || !ht->flags) { exit(1); }
    for (int i = 0; i < size; i++) {
        ht->flags[i] = EMPTY;
    }
    return ht;
}

void destroyHashTable(HashTable* ht) {
    free(ht->table);
    free(ht->flags);
    free(ht);
}

void insert(HashTable* ht, int key) {
    int index = key % ht->capacity;  /* Assumes non-negative keys */
    int startIndex = index;
    int target = -1;  /* First reusable slot seen, -1 if none yet */
    do {
        if (ht->flags[index] == OCCUPIED && ht->table[index] == key) {
            return;  /* Duplicate */
        }
        if (ht->flags[index] == EMPTY) {
            if (target == -1) target = index;
            break;  /* Key cannot be stored beyond an EMPTY slot */
        }
        if (ht->flags[index] == DELETED && target == -1) {
            target = index;  /* Remember it, keep looking for the key */
        }
        index = (index + 1) % ht->capacity;
    } while (index != startIndex);
    if (target == -1) {
        printf("Hash table is full, cannot insert %d\n", key);
        return;
    }
    ht->table[target] = key;
    ht->flags[target] = OCCUPIED;
    ht->count++;
}

int search(HashTable* ht, int key) {
    int index = key % ht->capacity;
    int startIndex = index;
    do {
        if (ht->flags[index] == EMPTY) {
            return 0;
        }
        if (ht->flags[index] == OCCUPIED && ht->table[index] == key) {
            return 1;
        }
        index = (index + 1) % ht->capacity;
    } while (index != startIndex);
    return 0;
}

void removeKey(HashTable* ht, int key) {
    int index = key % ht->capacity;
    int startIndex = index;
    do {
        if (ht->flags[index] == EMPTY) {
            printf("Key %d not found\n", key);
            return;
        }
        if (ht->flags[index] == OCCUPIED && ht->table[index] == key) {
            ht->flags[index] = DELETED;
            ht->count--;
            printf("Deleted key %d\n", key);
            return;
        }
        index = (index + 1) % ht->capacity;
    } while (index != startIndex);
    printf("Key %d not found\n", key);
}

void display(HashTable* ht) {
    for (int i = 0; i < ht->capacity; i++) {
        printf("[%d] ", i);
        if (ht->flags[i] == EMPTY) {
            printf("EMPTY");
        } else if (ht->flags[i] == DELETED) {
            printf("DELETED");
        } else {
            printf("%d", ht->table[i]);
        }
        printf("\n");
    }
}

int main() {
    HashTable* ht = createHashTable(TABLE_SIZE);

    printf("=== Inserting ===\n");
    int keys[] = {10, 22, 31, 4, 15, 28};
    for (int i = 0; i < 6; i++) {
        printf("Insert %d: hash(%d) = %d\n", keys[i], keys[i], keys[i] % 7);
        insert(ht, keys[i]);
    }

    printf("\n=== Hash Table ===\n");
    display(ht);

    printf("\n=== Search ===\n");
    int targets[] = {31, 17, 15};
    for (int i = 0; i < 3; i++) {
        if (search(ht, targets[i])) {
            printf("Key %d found\n", targets[i]);
        } else {
            printf("Key %d not found\n", targets[i]);
        }
    }

    printf("\n=== Delete ===\n");
    removeKey(ht, 22);
    removeKey(ht, 17);

    printf("\n=== After Deletion ===\n");
    display(ht);

    destroyHashTable(ht);
    return 0;
}
```
```python
TABLE_SIZE = 7

EMPTY = 0
OCCUPIED = 1
DELETED = 2


class HashTable:
    """Hash table using open addressing with linear probing."""

    def __init__(self, size=TABLE_SIZE):
        self.capacity = size
        self.count = 0
        self.table = [0] * size
        self.flags = [EMPTY] * size

    def _hash(self, key):
        return key % self.capacity

    def insert(self, key):
        index = self._hash(key)
        start_index = index
        target = None  # First reusable slot seen
        while True:
            if self.flags[index] == OCCUPIED and self.table[index] == key:
                return  # Duplicate
            if self.flags[index] == EMPTY:
                if target is None:
                    target = index
                break  # Key cannot be stored beyond an EMPTY slot
            if self.flags[index] == DELETED and target is None:
                target = index  # Remember it, keep looking for the key
            index = (index + 1) % self.capacity
            if index == start_index:
                break
        if target is None:
            print(f"Hash table is full, cannot insert {key}")
            return
        self.table[target] = key
        self.flags[target] = OCCUPIED
        self.count += 1

    def search(self, key):
        index = self._hash(key)
        start_index = index
        while True:
            if self.flags[index] == EMPTY:
                return False
            if self.flags[index] == OCCUPIED and self.table[index] == key:
                return True
            index = (index + 1) % self.capacity
            if index == start_index:
                break
        return False

    def remove(self, key):
        index = self._hash(key)
        start_index = index
        while True:
            if self.flags[index] == EMPTY:
                print(f"Key {key} not found")
                return
            if self.flags[index] == OCCUPIED and self.table[index] == key:
                self.flags[index] = DELETED
                self.count -= 1
                print(f"Deleted key {key}")
                return
            index = (index + 1) % self.capacity
            if index == start_index:
                break
        print(f"Key {key} not found")

    def display(self):
        for i in range(self.capacity):
            state = {EMPTY: "EMPTY", OCCUPIED: str(self.table[i]), DELETED: "DELETED"}
            print(f"[{i}] {state[self.flags[i]]}")


if __name__ == "__main__":
    ht = HashTable()

    print("=== Inserting ===")
    for k in [10, 22, 31, 4, 15, 28]:
        print(f"Insert {k}: hash({k}) = {k % 7}")
        ht.insert(k)

    print("\n=== Hash Table ===")
    ht.display()

    print("\n=== Search ===")
    for t in [31, 17, 15]:
        if ht.search(t):
            print(f"Key {t} found")
        else:
            print(f"Key {t} not found")

    print("\n=== Delete ===")
    ht.remove(22)
    ht.remove(17)

    print("\n=== After Deletion ===")
    ht.display()
```
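Go defines the slot state constants (EMPTY/OCCUPIED/DELETED) with const and stores keys and states in two slices. Unlike the enums of C/C++, Go expresses the states with iota or explicit constants (explicit constants are used here). The linear probing loop is written with a for statement and is logically equivalent to the do-while loop in C.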
```go
package main

import "fmt"

const TABLE_SIZE = 7

const (
    EMPTY    = 0
    OCCUPIED = 1
    DELETED  = 2
)

// HashTable is a closed hash table that resolves collisions
// with open addressing (linear probing).
type HashTable struct {
    capacity int
    count    int
    table    []int // slice storing keys
    flags    []int // slot states: EMPTY / OCCUPIED / DELETED
}

func newHashTable(size int) *HashTable {
    return &HashTable{
        capacity: size,
        count:    0,
        table:    make([]int, size),
        flags:    make([]int, size), // zero value is 0, i.e. EMPTY
    }
}

// hash assumes non-negative keys.
func (ht *HashTable) hash(key int) int {
    return key % ht.capacity
}

// insert adds a key to the table, resolving collisions with linear probing.
func (ht *HashTable) insert(key int) {
    idx := ht.hash(key)
    startIdx := idx
    target := -1 // first reusable slot seen, -1 if none yet
    for {
        if ht.flags[idx] == OCCUPIED && ht.table[idx] == key {
            return // key already exists, no duplicate insertion
        }
        if ht.flags[idx] == EMPTY {
            if target == -1 {
                target = idx
            }
            break // key cannot be stored beyond an EMPTY slot
        }
        if ht.flags[idx] == DELETED && target == -1 {
            target = idx // remember it, keep looking for the key
        }
        idx = (idx + 1) % ht.capacity
        if idx == startIdx {
            break
        }
    }
    if target == -1 {
        fmt.Printf("Hash table is full, cannot insert %d\n", key)
        return
    }
    ht.table[target] = key
    ht.flags[target] = OCCUPIED
    ht.count++
}

// search reports whether the key is in the table.
func (ht *HashTable) search(key int) bool {
    idx := ht.hash(key)
    startIdx := idx
    for {
        if ht.flags[idx] == EMPTY {
            return false // an empty slot means the key is definitely absent
        }
        if ht.flags[idx] == OCCUPIED && ht.table[idx] == key {
            return true
        }
        idx = (idx + 1) % ht.capacity
        if idx == startIdx {
            break
        }
    }
    return false
}

// remove deletes a key using lazy deletion (the slot is marked DELETED).
func (ht *HashTable) remove(key int) {
    idx := ht.hash(key)
    startIdx := idx
    for {
        if ht.flags[idx] == EMPTY {
            fmt.Printf("Key %d not found\n", key)
            return
        }
        if ht.flags[idx] == OCCUPIED && ht.table[idx] == key {
            ht.flags[idx] = DELETED
            ht.count--
            fmt.Printf("Deleted key %d\n", key)
            return
        }
        idx = (idx + 1) % ht.capacity
        if idx == startIdx {
            break
        }
    }
    fmt.Printf("Key %d not found\n", key)
}

// display prints every slot and its state.
func (ht *HashTable) display() {
    for i := 0; i < ht.capacity; i++ {
        fmt.Printf("[%d] ", i)
        switch ht.flags[i] {
        case EMPTY:
            fmt.Print("EMPTY")
        case DELETED:
            fmt.Print("DELETED")
        default:
            fmt.Print(ht.table[i])
        }
        fmt.Println()
    }
}

func main() {
    ht := newHashTable(TABLE_SIZE)

    // Insert keys
    fmt.Println("=== Inserting ===")
    keys := []int{10, 22, 31, 4, 15, 28}
    for _, k := range keys {
        fmt.Printf("Insert %d: hash(%d) = %d\n", k, k, k%7)
        ht.insert(k)
    }

    // Display the table
    fmt.Println("\n=== Hash Table ===")
    ht.display()

    // Search for keys
    fmt.Println("\n=== Search ===")
    targets := []int{31, 17, 15}
    for _, t := range targets {
        if ht.search(t) {
            fmt.Printf("Key %d found\n", t)
        } else {
            fmt.Printf("Key %d not found\n", t)
        }
    }

    // Delete keys
    fmt.Println("\n=== Delete ===")
    ht.remove(22)
    ht.remove(17)

    // Display after deletion
    fmt.Println("\n=== After Deletion ===")
    ht.display()
}
```
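Go defines the three slot states with const and stores keys and states in two []int slices. The zero value of the slice elements is 0, which happens to equal EMPTY, so no explicit initialization is needed. Deletion marks the slot DELETED rather than EMPTY, which preserves the probe chain and matches the lazy deletion strategy of the C/C++ versions.
Running the program prints: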
```
=== Inserting ===
Insert 10: hash(10) = 3
Insert 22: hash(22) = 1
Insert 31: hash(31) = 3
Insert 4: hash(4) = 4
Insert 15: hash(15) = 1
Insert 28: hash(28) = 0

=== Hash Table ===
[0] 28
[1] 22
[2] 15
[3] 10
[4] 31
[5] 4
[6] EMPTY

=== Search ===
Key 31 found
Key 17 not found
Key 15 found

=== Delete ===
Deleted key 22
Key 17 not found

=== After Deletion ===
[0] 28
[1] DELETED
[2] 15
[3] 10
[4] 31
[5] 4
[6] EMPTY
```
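As the output shows:
- After inserting `{10, 22, 31, 4, 15, 28}`, the slot layout matches the earlier step-by-step walkthrough exactly: 10 is in slot 3, 22 in slot 1, 31 collides with 10 and is probed to slot 4, 4 finds slot 4 occupied and is probed to slot 5, 15 collides with 22 and is probed to slot 2, and 28 is in slot 0.
- Searching for `31` starts at slot 3, skips 10, and finds the key in slot 4. Searching for 17 starts at slot 3 (17 % 7 = 3) and probes until the empty slot 6, which proves the key is absent. Searching for 15 starts at slot 1, skips 22, and finds the key in slot 2.
- After `22` is deleted, slot 1 is marked DELETED rather than EMPTY, so a later search for 15 correctly skips slot 1 and continues to slot 2.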
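Performance analysis
The table below compares the three open addressing strategies:
| Property | Linear probing | Quadratic probing | Double hashing |
|---|---|---|---|
| Probe formula | (h + i) % m | (h + i^2) % m | (h1 + i*h2) % m |
| Clustering | Severe primary clustering | Milder secondary clustering | Almost none |
| Cache behavior | Best (sequential memory access) | Fairly good | Worse (scattered access) |
| Implementation complexity | Simplest | Moderate | More involved |
| Slot coverage | Always reaches every slot | Reaches about half the slots when m is prime (i^2 form) | Reaches every slot when h2 and m are coprime |
| Insert / unsuccessful search (expected probes) | about (1 + 1/(1 - α)^2) / 2 | about 1 / (1 - α) | about 1 / (1 - α) |
| Successful search (expected probes) | about (1 + 1/(1 - α)) / 2 | about (1/α) ln(1/(1 - α)) | about (1/α) ln(1/(1 - α)) |
Here α is the load factor, α = n / m. The probe counts are the classical approximations for random keys; the quadratic probing and double hashing columns use the uniform hashing estimate, which quadratic probing only approximates.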
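To get a feel for how quickly probing degrades, the classical textbook approximations can be evaluated at a few load factors: for linear probing, about (1 + 1/(1 - α)) / 2 probes for a successful search and (1 + 1/(1 - α)^2) / 2 for an unsuccessful one; under ideal uniform hashing (which double hashing approaches), about (1/α) ln(1/(1 - α)) and 1/(1 - α). The sketch below only evaluates these formulas; it is not a measurement, and the function names are hypothetical.

```python
import math


def linear_probing_probes(alpha):
    """Approximate expected probes for linear probing: (successful, unsuccessful)."""
    hit = 0.5 * (1 + 1 / (1 - alpha))
    miss = 0.5 * (1 + 1 / (1 - alpha) ** 2)
    return hit, miss


def uniform_hashing_probes(alpha):
    """Approximate expected probes under uniform hashing: (successful, unsuccessful)."""
    hit = (1 / alpha) * math.log(1 / (1 - alpha))
    miss = 1 / (1 - alpha)
    return hit, miss


for alpha in (0.5, 0.7, 0.9):
    lp_hit, lp_miss = linear_probing_probes(alpha)
    uh_hit, uh_miss = uniform_hashing_probes(alpha)
    print(f"alpha={alpha}: linear {lp_hit:.2f}/{lp_miss:.2f}, "
          f"uniform {uh_hit:.2f}/{uh_miss:.2f}")
```

At α = 0.5 an unsuccessful linear-probing search costs about 2.5 probes; at α = 0.9 the estimate rises to about 50.5, which is why the load factor is normally kept well below 1.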
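Key notes:
- Load factor limit: the load factor of an open addressing table can never exceed 1, and a completely full table leaves probing with no empty slot to stop at. In practice the load factor is usually kept at or below roughly 0.5 to 0.7; as α approaches 1, the number of probes rises sharply and performance degrades quickly.
- Primary clustering: the biggest problem of linear probing. Once a contiguous run of slots is occupied (a cluster), any new key that hashes anywhere into the run walks to its end and is appended there, so the cluster keeps growing. As a result the probe count grows much faster than the load factor.
- Effect of lazy deletion: DELETED slots are not counted in the element count, yet they still lengthen probe sequences, so the number of EMPTY slots (the ones that let a search stop) is smaller than `(capacity - count)`. With frequent inserts and deletes, DELETED slots accumulate and probing slows down. The remedies are a periodic cleanup (rehash) or reusing DELETED slots on insertion (the implementation in this article already does the latter).
- Resizing (rehashing): when the load factor exceeds a threshold, allocate a larger array (typically twice the size) and re-insert every element into it. All DELETED markers disappear in the process and probing efficiency is restored.
- Space complexity: O(m), where m is the array size. Unlike an open hash table, a closed hash table has no per-element pointer overhead; the data sits in one contiguous array, which is friendlier to the cache.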
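The article's implementations do not include resizing, so the following is only a sketch of how a rehash might look for the parallel key/state lists used above. The function name `rehash` and the new capacity of 17 are assumptions chosen for illustration, and the sketch assumes the new capacity exceeds the number of live keys.

```python
EMPTY, OCCUPIED, DELETED = 0, 1, 2


def rehash(keys, flags, new_capacity):
    """Build fresh key/state lists of size new_capacity from the live keys only."""
    new_keys = [0] * new_capacity
    new_flags = [EMPTY] * new_capacity
    for key, flag in zip(keys, flags):
        if flag != OCCUPIED:
            continue                          # drop EMPTY and DELETED slots
        slot = key % new_capacity
        while new_flags[slot] == OCCUPIED:    # linear probing in the new array
            slot = (slot + 1) % new_capacity
        new_keys[slot] = key
        new_flags[slot] = OCCUPIED
    return new_keys, new_flags


# State after the demo program: 22 was deleted, slot 6 was never used.
keys = [28, 22, 15, 10, 31, 4, 0]
flags = [OCCUPIED, DELETED, OCCUPIED, OCCUPIED, OCCUPIED, OCCUPIED, EMPTY]

new_keys, new_flags = rehash(keys, flags, 17)
print(new_flags.count(DELETED))   # 0
print(new_flags.count(OCCUPIED))  # 5
```

Only live keys are copied, so the rebuilt table contains no DELETED markers and every key sits at or near its new home slot.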
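Comparison with the open hash table (closed addressing):
| Property | Open hash table (separate chaining) | Closed hash table (open addressing) |
|---|---|---|
| Collision handling | Colliding elements are stored in a linked list | Probe for the next free slot inside the array |
| Load factor | May exceed 1.0 | Cannot exceed 1.0 |
| Deletion | Remove the list node directly | Typically requires lazy deletion |
| Cache behavior | Worse (list nodes are not contiguous) | Better (data lives in a contiguous array) |
| Memory overhead | One extra pointer per node | No extra pointers |
| Worst case | O(n) (all keys collide into one bucket) | O(n) (probing a full cycle) |
| Typical use | Unknown element count, frequent deletions | Fixed-size elements, tight memory, cache-friendly access |
In practice, Python's dict and Rust's HashMap use open addressing (since Python 3.6, dict uses a refined variant of it). C++'s std::unordered_map and Java's HashMap use separate chaining. Which scheme to choose depends on the use case: if keys are small and memory should be compact and cache-friendly, open addressing is the better fit; if the number of elements varies widely or deletions are frequent, separate chaining is more flexible.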
