3-7 Closed Hash Table (Open Addressing)


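A closed hash table (Closed Hashing), also known as open addressing (Open Addressing), is a classic strategy for resolving hash collisions (Hash Collision). Its core idea is the opposite of an open hash table (separate chaining): every element is stored directly in the hash table array itself, and no linked lists are used. When the slot that the hash function maps a key to is already occupied, that is, when a collision occurs, the table follows a probe sequence (Probe Sequence) through the array until it finds a free slot. The name "open addressing" reflects this: an element's final position is not fixed by its hash value and may be any address along the probe sequence.

This article focuses on the most basic scheme, linear probing (Linear Probing), and briefly introduces quadratic probing (Quadratic Probing) and double hashing (Double Hashing).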

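Core Concepts

  • Hash function (Hash Function): maps a key to an array index, for example hash(key) = key % TABLE_SIZE. Keys are assumed to be non-negative integers throughout; in C, C++, and Go the % operator returns a negative result for a negative key.
  • Collision (Collision): two different keys hash to the same index, so the second key finds its slot already occupied.
  • Probing (Probing): the process of searching the array, according to a fixed rule, for the next available slot after a collision.
  • Load factor (Load Factor): α = n / m, where n is the number of stored elements and m is the array size. Because every element lives in the array, α can never exceed 1, and in practice the table is kept below full (α < 1) so that an unsuccessful probe is guaranteed to reach an empty slot and stop.

The following trace shows the insertions into a closed hash table with TABLE_SIZE = 7: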

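Insert 10:  hash(10) = 10 % 7 = 3  -> slot [3] is empty, store directly
Insert 22:  hash(22) = 22 % 7 = 1  -> slot [1] is empty, store directly
Insert 31:  hash(31) = 31 % 7 = 3  -> slot [3] is occupied, probe slot [4], empty, store
Insert  4:  hash(4)  = 4 % 7  = 4  -> slot [4] is occupied, probe slot [5], empty, store
Insert 15:  hash(15) = 15 % 7 = 1  -> slot [1] is occupied, probe slot [2], empty, store
Insert 28:  hash(28) = 28 % 7 = 0  -> slot [0] is empty, store directly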

Index:  [0]  [1]  [2]  [3]  [4]  [5]  [6]
Value:   28   22   15   10   31    4

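In this example, 10, 22, and 28 go straight into the slots given by their hash values. 31 collides with 10 (both map to slot 3), and linear probing finds slot 4. 4 maps to slot 4, which 31 already holds, so it probes on to slot 5. 15 collides with 22 (both map to slot 1) and lands in slot 2.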


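Linear Probing

Linear probing (Linear Probing) is the simplest form of open addressing. When a collision occurs, it starts at the slot given by the hash value and checks the following slots one by one, wrapping around to the beginning at the end of the array, until it finds an empty slot.

The probe formula is:

index = (hash(key) + i) % TABLE_SIZE,    i = 0, 1, 2, ...

Here i is the probe step: i = 0 checks the original hash position, i = 1 checks the next position, and so on.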
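The formula can be made concrete with a few lines of Python. This is only an illustrative sketch: the helper name linear_probe_sequence is hypothetical and not part of the implementation later in the article, and keys are assumed to be non-negative integers.

```python
def linear_probe_sequence(key, table_size):
    """Return the slots visited by linear probing for i = 0 .. table_size - 1."""
    home = key % table_size  # same hash function as in the text; assumes key >= 0
    # Step i visits (home + i) % table_size, wrapping around at the end.
    return [(home + i) % table_size for i in range(table_size)]


print(linear_probe_sequence(31, 7))  # [3, 4, 5, 6, 0, 1, 2]
```

After table_size steps every slot has been visited exactly once, which is why linear probing always finds a free slot when one exists.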

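Step-by-Step Walkthrough

Take TABLE_SIZE = 7 and insert 10, 22, 31, 4, 15, 28 in that order.

Insert 10: hash(10) = 10 % 7 = 3. Slot 3 is empty, so 10 is stored there.

[0]    [1]    [2]    [3]    [4]    [5]    [6]
                     10

Insert 22: hash(22) = 22 % 7 = 1. Slot 1 is empty, so 22 is stored there.

[0]    [1]    [2]    [3]    [4]    [5]    [6]
       22            10

Insert 31: hash(31) = 31 % 7 = 3. Slot 3 is already occupied by 10 (collision). Linear probing checks slot 4, which is empty, so 31 is stored there.

[0]    [1]    [2]    [3]    [4]    [5]    [6]
       22            10     31

Insert 4: hash(4) = 4 % 7 = 4. Slot 4 is already occupied by 31 (collision). Linear probing checks slot 5, which is empty, so 4 is stored there.

[0]    [1]    [2]    [3]    [4]    [5]    [6]
       22            10     31     4

Insert 15: hash(15) = 15 % 7 = 1. Slot 1 is already occupied by 22 (collision). Linear probing checks slot 2, which is empty, so 15 is stored there.

[0]    [1]    [2]    [3]    [4]    [5]    [6]
       22     15     10     31     4

Insert 28: hash(28) = 28 % 7 = 0. Slot 0 is empty, so 28 is stored there.

[0]    [1]    [2]    [3]    [4]    [5]    [6]
28     22     15     10     31     4

As the trace shows, because 10 and 31 both map to slot 3, 31 is pushed to slot 4; 4, which belongs in slot 4, is in turn pushed to slot 5. This effect, where one collision triggers a chain of further collisions and occupied slots grow into contiguous runs, is called primary clustering (Primary Clustering), and it is the main drawback of linear probing.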
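To see the cost of clustering in numbers, the walkthrough can be replayed while counting how many slots each insertion has to inspect. The function below is a simplified sketch written for this illustration (the name insert_all is an assumption); it handles neither duplicates nor deletions, unlike the full implementation later in the article.

```python
def insert_all(keys, table_size):
    """Insert distinct non-negative keys with linear probing.

    Returns the final table and the number of slots inspected per insertion.
    """
    if len(keys) > table_size:
        raise ValueError("more keys than slots")
    table = [None] * table_size
    probes = []
    for key in keys:
        index = key % table_size
        steps = 1  # the home slot counts as the first probe
        while table[index] is not None:
            index = (index + 1) % table_size  # linear step with wrap-around
            steps += 1
        table[index] = key
        probes.append(steps)
    return table, probes


table, probes = insert_all([10, 22, 31, 4, 15, 28], 7)
print(table)   # [28, 22, 15, 10, 31, 4, None]
print(probes)  # [1, 1, 2, 2, 2, 1]
```

Note that 4 needs two probes even though no other key shares its hash value: it pays for the collision between 10 and 31.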


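Quadratic Probing and Double Hashing

Linear probing is simple, but it suffers badly from primary clustering. This section briefly introduces two probing methods that improve on it.

Quadratic Probing

Quadratic probing makes the probe offset a quadratic function of the step number i, which breaks up the contiguous runs that linear probing creates:

index = (hash(key) + c1*i + c2*i^2) % TABLE_SIZE,    i = 0, 1, 2, ...

The most common form uses c1 = 0, c2 = 1:

index = (hash(key) + i^2) % TABLE_SIZE

The probe order is h, h+1, h+4, h+9, h+16, ... (each taken modulo TABLE_SIZE).

Quadratic probing effectively reduces primary clustering: the gap between successive probes keeps growing, so colliding keys do not pile up in adjacent slots. It can still produce secondary clustering (Secondary Clustering), because different keys that hash to the same position follow exactly the same probe sequence. In addition, quadratic probing is not guaranteed to visit every slot. With c1 = 0, c2 = 1 and a prime TABLE_SIZE, only the first ⌈TABLE_SIZE / 2⌉ probes are guaranteed to hit distinct slots, so an insertion is certain to succeed only while the table is at most half full.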
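The limited reach of quadratic probing is easy to check by enumeration. The following sketch lists the distinct slots that the i^2 form can ever reach from a given home slot; the function name quadratic_probe_slots is hypothetical and used only for this illustration.

```python
def quadratic_probe_slots(home, table_size):
    """Distinct slots reached by (home + i*i) % table_size, in first-visit order."""
    seen = []
    # i and i + table_size give the same slot, so i < table_size covers everything.
    for i in range(table_size):
        slot = (home + i * i) % table_size
        if slot not in seen:
            seen.append(slot)
    return seen


print(quadratic_probe_slots(3, 7))  # [3, 4, 0, 5]
print(quadratic_probe_slots(0, 8))  # [0, 1, 4]
```

With the prime size 7, four of the seven slots are reachable, which matches ⌈7 / 2⌉ = 4. With the non-prime size 8, only three slots are reachable.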

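Double Hashing

Double hashing uses a second hash function to compute the probe step, so that keys with the same starting slot can still follow different probe sequences:

index = (hash1(key) + i * hash2(key)) % TABLE_SIZE

For example:

hash1(key) = key % TABLE_SIZE
hash2(key) = 1 + (key % (TABLE_SIZE - 1))    // step function, always >= 1

The probe order is h1, h1+h2, h1+2*h2, h1+3*h2, ... (each taken modulo TABLE_SIZE).

Because the step depends on the key itself, two keys with the same hash1 value will most likely have different hash2 values and therefore different probe paths. This greatly reduces clustering, and of the three strategies described here double hashing comes closest to the ideal of uniform hashing. The costs are that two hash functions must be evaluated, that hash2 must never return 0 (otherwise the probe never leaves its starting slot), and that hash2(key) should be coprime with TABLE_SIZE (automatic when TABLE_SIZE is prime) so that the sequence reaches every slot.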
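As a small illustration, the two example functions above can be used to print the full probe sequences of 10 and 31, which both start at slot 3 when TABLE_SIZE = 7. The helper name double_hash_sequence is an assumption made for this sketch.

```python
def double_hash_sequence(key, table_size):
    """Slots visited by double hashing for i = 0 .. table_size - 1."""
    h1 = key % table_size              # starting slot
    h2 = 1 + (key % (table_size - 1))  # step size, always in 1 .. table_size - 1
    return [(h1 + i * h2) % table_size for i in range(table_size)]


print(double_hash_sequence(10, 7))  # [3, 1, 6, 4, 2, 0, 5]
print(double_hash_sequence(31, 7))  # [3, 5, 0, 2, 4, 6, 1]
```

Both keys start at slot 3 but separate immediately, and because 7 is prime each sequence is a permutation of all seven slots.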


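Data Structure Definition

A closed hash table stores all of its elements directly in the array. Unlike an open hash table, it has to distinguish three slot states:

  • EMPTY: the slot has never been used. Reaching an EMPTY slot during a search means the key is definitely not in the table.
  • OCCUPIED: the slot currently stores a key.
  • DELETED: the slot once stored a key that has since been deleted. A search that reaches a DELETED slot must not stop; it has to keep probing.

The DELETED state is needed because resetting a deleted slot to EMPTY would make the keys behind it unreachable, namely the keys that collided and were pushed to later positions on the same probe sequence. Marking the slot instead of clearing it is the lazy deletion (Lazy Deletion) strategy.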
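The effect is easy to reproduce on the example table. The sketch below is a stand-alone illustration, not the article's implementation: find and demo are hypothetical names, and the table is set up by hand with 10 in slot 3 and 31 pushed to slot 4.

```python
EMPTY, OCCUPIED, DELETED = 0, 1, 2


def find(table, flags, key):
    """Linear-probing search that stops at the first EMPTY slot."""
    m = len(table)
    index = key % m
    for _ in range(m):
        if flags[index] == EMPTY:
            return False
        if flags[index] == OCCUPIED and table[index] == key:
            return True
        index = (index + 1) % m
    return False


def demo(mark):
    """Delete 10 by writing `mark` into its slot, then search for 31."""
    table = [0, 0, 0, 10, 31, 0, 0]
    flags = [EMPTY, EMPTY, EMPTY, OCCUPIED, OCCUPIED, EMPTY, EMPTY]
    flags[3] = mark
    return find(table, flags, 31)


print(demo(EMPTY))    # False: the search stops at slot 3 and never sees 31
print(demo(DELETED))  # True: the search skips slot 3 and finds 31 in slot 4
```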

// C++ HashTable class definition
const int TABLE_SIZE = 7;

enum SlotState { EMPTY, OCCUPIED, DELETED };

class HashTable {
private:
    int* table;          // Array storing keys
    SlotState* flags;    // Array storing slot states
    int capacity;        // Number of slots
    int count;           // Number of stored elements

    int hashFunction(int key) {
        return key % capacity;
    }

public:
    HashTable(int size = TABLE_SIZE);
    ~HashTable();
    void insert(int key);
    bool search(int key);
    void remove(int key);
    void display();
};
// C HashTable struct definition
#define TABLE_SIZE 7

typedef enum { EMPTY, OCCUPIED, DELETED } SlotState;

typedef struct {
    int* table;          // Array storing keys
    SlotState* flags;    // Array storing slot states
    int capacity;        // Number of slots
    int count;           // Number of stored elements
} HashTable;
# Python HashTable class definition
TABLE_SIZE = 7

# Slot states
EMPTY = 0
OCCUPIED = 1
DELETED = 2

class HashTable:
    """Hash table using open addressing with linear probing."""
    def __init__(self, size=TABLE_SIZE):
        self.capacity = size
        self.count = 0
        self.table = [0] * size        # Array storing keys
        self.flags = [EMPTY] * size    # Array storing slot states
// Go HashTable struct definition
package main

import "fmt"

const TABLE_SIZE = 7

const (
	EMPTY    = 0
	OCCUPIED = 1
	DELETED  = 2
)

// HashTable represents a closed hash table using open addressing
type HashTable struct {
	table    []int // slice storing keys
	flags    []int // slice storing slot states: EMPTY / OCCUPIED / DELETED
	capacity int   // number of slots
	count    int   // number of stored elements
}

func newHashTable(size int) *HashTable {
	return &HashTable{
		capacity: size,
		count:    0,
		table:    make([]int, size),
		flags:    make([]int, size), // zero-initialized, all EMPTY
	}
}

func (ht *HashTable) hash(key int) int {
	return key % ht.capacity
}

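C++ uses the enum SlotState to represent the three states, with int* table storing the keys and SlotState* flags storing the state of each slot. C defines the equivalent structure with typedef enum. Python uses the constants EMPTY = 0, OCCUPIED = 1, DELETED = 2 and two lists, one for keys and one for states. Go defines the three state constants with const and stores keys and states in two []int slices; make zero-initializes the elements, and 0 is exactly EMPTY, so no explicit initialization is needed.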


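Insertion

Insertion (Insertion) proceeds as follows:

  1. Compute the starting position with the hash function: index = key % capacity.
  2. Probe linearly from index, checking each slot in turn and skipping every OCCUPIED slot that holds a different key. If an OCCUPIED slot already holds the key, stop without inserting.
  3. Store the key in the first EMPTY or DELETED slot on the probe sequence and mark that slot OCCUPIED.
  4. If every slot is occupied (the table is full), the insertion fails.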
// C++ Insert: add a key to the hash table
void HashTable::insert(int key) {
    int index = hashFunction(key);
    int startIndex = index;
    int firstFree = -1;  // First reusable (DELETED or EMPTY) slot seen so far

    do {
        if (flags[index] == OCCUPIED && table[index] == key) {
            // Key already exists, no duplicate insertion
            return;
        }
        if (flags[index] == EMPTY) {
            // End of the probe chain: the key is not in the table
            if (firstFree == -1) firstFree = index;
            break;
        }
        if (flags[index] == DELETED && firstFree == -1) {
            // Remember the DELETED slot, but keep probing: the key may
            // still be stored further along the chain
            firstFree = index;
        }
        // Probe next slot
        index = (index + 1) % capacity;
    } while (index != startIndex);

    if (firstFree == -1) {
        cout << "Hash table is full, cannot insert " << key << endl;
        return;
    }
    table[firstFree] = key;
    flags[firstFree] = OCCUPIED;
    count++;
}
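// C Insert: add a key to the hash table
void insert(HashTable* ht, int key) {
    int index = key % ht->capacity;
    int startIndex = index;
    int firstFree = -1;  // First reusable (DELETED or EMPTY) slot seen so far

    do {
        if (ht->flags[index] == OCCUPIED && ht->table[index] == key) {
            return;  // Key already exists
        }
        if (ht->flags[index] == EMPTY) {
            // End of the probe chain: the key is not in the table
            if (firstFree == -1) firstFree = index;
            break;
        }
        if (ht->flags[index] == DELETED && firstFree == -1) {
            // Remember the DELETED slot but keep probing for a duplicate
            firstFree = index;
        }
        index = (index + 1) % ht->capacity;
    } while (index != startIndex);

    if (firstFree == -1) {
        printf("Hash table is full, cannot insert %d\n", key);
        return;
    }
    ht->table[firstFree] = key;
    ht->flags[firstFree] = OCCUPIED;
    ht->count++;
}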
# Python Insert: add a key to the hash table
def insert(self, key):
    index = key % self.capacity
    start_index = index
    first_free = None  # First reusable (DELETED or EMPTY) slot seen so far

    while True:
        if self.flags[index] == OCCUPIED and self.table[index] == key:
            # Key already exists
            return
        if self.flags[index] == EMPTY:
            # End of the probe chain: the key is not in the table
            if first_free is None:
                first_free = index
            break
        if self.flags[index] == DELETED and first_free is None:
            # Remember the DELETED slot but keep probing for a duplicate
            first_free = index
        # Probe next slot
        index = (index + 1) % self.capacity
        if index == start_index:
            break

    if first_free is None:
        print(f"Hash table is full, cannot insert {key}")
        return
    self.table[first_free] = key
    self.flags[first_free] = OCCUPIED
    self.count += 1
// Go Insert: add a key to the hash table
func (ht *HashTable) insert(key int) {
	idx := ht.hash(key)
	startIdx := idx
	firstFree := -1 // First reusable (DELETED or EMPTY) slot seen so far

	for {
		if ht.flags[idx] == OCCUPIED && ht.table[idx] == key {
			// Key already exists, no duplicate insertion
			return
		}
		if ht.flags[idx] == EMPTY {
			// End of the probe chain: the key is not in the table
			if firstFree == -1 {
				firstFree = idx
			}
			break
		}
		if ht.flags[idx] == DELETED && firstFree == -1 {
			// Remember the DELETED slot but keep probing for a duplicate
			firstFree = idx
		}
		// Probe next slot
		idx = (idx + 1) % ht.capacity
		if idx == startIdx {
			break
		}
	}

	if firstFree == -1 {
		fmt.Printf("Hash table is full, cannot insert %d\n", key)
		return
	}
	ht.table[firstFree] = key
	ht.flags[firstFree] = OCCUPIED
	ht.count++
}

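Insertion probes linearly from the hash position. It moves on past every OCCUPIED slot that holds a different key, and the key ends up in the first EMPTY or DELETED slot along the probe sequence. C++ and C use a do-while loop, Python uses while True, and Go uses an infinite for {} loop with break; all of these are logically equivalent, and each probes at most one full cycle. If the key already exists, the insertion is skipped so that no duplicate is stored.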


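Search

Search (Search) proceeds as follows:

  1. Compute the starting position index = key % capacity.
  2. Probe linearly: an OCCUPIED slot with a matching key means the key is found; an EMPTY slot ends the search (the key is definitely absent); a DELETED slot is skipped and probing continues.
  3. If a full cycle finds nothing, the key is not in the table.

The key difference: a DELETED slot must not stop the search, because the target key may have been pushed to a later position by a collision.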

// C++ Search: find a key, return true if found
bool HashTable::search(int key) {
    int index = hashFunction(key);
    int startIndex = index;

    do {
        if (flags[index] == EMPTY) {
            // Empty slot means key definitely not in table
            return false;
        }
        if (flags[index] == OCCUPIED && table[index] == key) {
            return true;  // Found the key
        }
        // OCCUPIED with different key or DELETED: keep probing
        index = (index + 1) % capacity;
    } while (index != startIndex);

    return false;  // Full loop, key not found
}
// C Search: find a key, return 1 if found, 0 otherwise
int search(HashTable* ht, int key) {
    int index = key % ht->capacity;
    int startIndex = index;

    do {
        if (ht->flags[index] == EMPTY) {
            return 0;
        }
        if (ht->flags[index] == OCCUPIED && ht->table[index] == key) {
            return 1;
        }
        index = (index + 1) % ht->capacity;
    } while (index != startIndex);

    return 0;
}
# Python Search: find a key, return True if found
def search(self, key):
    index = key % self.capacity
    start_index = index

    while True:
        if self.flags[index] == EMPTY:
            # Empty slot means key definitely not in table
            return False
        if self.flags[index] == OCCUPIED and self.table[index] == key:
            return True
        # OCCUPIED with different key or DELETED: keep probing
        index = (index + 1) % self.capacity
        if index == start_index:
            break

    return False
// Go Search: find a key, return true if found
func (ht *HashTable) search(key int) bool {
	idx := ht.hash(key)
	startIdx := idx

	for {
		if ht.flags[idx] == EMPTY {
			// Empty slot means key definitely not in table
			return false
		}
		if ht.flags[idx] == OCCUPIED && ht.table[idx] == key {
			return true // Found the key
		}
		// OCCUPIED with different key or DELETED: keep probing
		idx = (idx + 1) % ht.capacity
		if idx == startIdx {
			break
		}
	}

	return false // Full loop, key not found
}

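Reaching an EMPTY slot is enough to conclude that the key is absent: had the key ever been inserted along this probe sequence, the insertion would have taken this slot rather than moving past it. A DELETED slot gives no such guarantee, because the target key may have been inserted while that slot was still occupied and therefore sits further along the sequence.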


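Deletion

Deletion (Deletion) cannot simply mark the slot EMPTY. Doing so would make the keys behind the deleted position unreachable (the keys that collisions pushed to later slots), because a search would stop early at the EMPTY slot. Lazy deletion (Lazy Deletion) is therefore required: the slot is marked DELETED rather than EMPTY.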

// C++ Remove: delete a key using lazy deletion
void HashTable::remove(int key) {
    int index = hashFunction(key);
    int startIndex = index;

    do {
        if (flags[index] == EMPTY) {
            // Key not found
            cout << "Key " << key << " not found" << endl;
            return;
        }
        if (flags[index] == OCCUPIED && table[index] == key) {
            // Found the key, mark as DELETED
            flags[index] = DELETED;
            count--;
            cout << "Deleted key " << key << endl;
            return;
        }
        index = (index + 1) % capacity;
    } while (index != startIndex);

    cout << "Key " << key << " not found" << endl;
}
// C Remove: delete a key using lazy deletion
void removeKey(HashTable* ht, int key) {
    int index = key % ht->capacity;
    int startIndex = index;

    do {
        if (ht->flags[index] == EMPTY) {
            printf("Key %d not found\n", key);
            return;
        }
        if (ht->flags[index] == OCCUPIED && ht->table[index] == key) {
            ht->flags[index] = DELETED;
            ht->count--;
            printf("Deleted key %d\n", key);
            return;
        }
        index = (index + 1) % ht->capacity;
    } while (index != startIndex);

    printf("Key %d not found\n", key);
}
# Python Remove: delete a key using lazy deletion
def remove(self, key):
    index = key % self.capacity
    start_index = index

    while True:
        if self.flags[index] == EMPTY:
            print(f"Key {key} not found")
            return
        if self.flags[index] == OCCUPIED and self.table[index] == key:
            # Mark as DELETED instead of EMPTY
            self.flags[index] = DELETED
            self.count -= 1
            print(f"Deleted key {key}")
            return
        index = (index + 1) % self.capacity
        if index == start_index:
            break

    print(f"Key {key} not found")
// Go Remove: delete a key using lazy deletion
func (ht *HashTable) remove(key int) {
	idx := ht.hash(key)
	startIdx := idx

	for {
		if ht.flags[idx] == EMPTY {
			// Key not found
			fmt.Printf("Key %d not found\n", key)
			return
		}
		if ht.flags[idx] == OCCUPIED && ht.table[idx] == key {
			// Found the key, mark as DELETED
			ht.flags[idx] = DELETED
			ht.count--
			fmt.Printf("Deleted key %d\n", key)
			return
		}
		idx = (idx + 1) % ht.capacity
		if idx == startIdx {
			break
		}
	}

	fmt.Printf("Key %d not found\n", key)
}

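The probing logic of deletion mirrors that of search: it follows the probe sequence until it finds the target key. Once found, no memory is released and the stored value is not cleared; only the state flag changes to DELETED. This keeps the probe chain intact and also leaves a reusable slot for insertion, since a DELETED slot can be overwritten with a new key.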


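Complete Implementation

The complete closed hash table implementations below use linear probing and combine insertion, search, deletion, and display into programs that run on their own.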

#include <iostream>
using namespace std;

const int TABLE_SIZE = 7;

enum SlotState { EMPTY, OCCUPIED, DELETED };

class HashTable {
private:
    int* table;
    SlotState* flags;
    int capacity;
    int count;

    int hashFunction(int key) {
        return key % capacity;
    }

public:
    HashTable(int size = TABLE_SIZE) : capacity(size), count(0) {
        table = new int[capacity];
        flags = new SlotState[capacity];
        for (int i = 0; i < capacity; i++) {
            flags[i] = EMPTY;
        }
    }

    ~HashTable() {
        delete[] table;
        delete[] flags;
    }

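    void insert(int key) {
        int index = hashFunction(key);
        int startIndex = index;
        int firstFree = -1;  // First reusable (DELETED or EMPTY) slot seen

        do {
            if (flags[index] == OCCUPIED && table[index] == key) {
                return;  // Duplicate
            }
            if (flags[index] == EMPTY) {
                // End of the probe chain: the key is not in the table
                if (firstFree == -1) firstFree = index;
                break;
            }
            if (flags[index] == DELETED && firstFree == -1) {
                // Remember the DELETED slot but keep probing for a duplicate
                firstFree = index;
            }
            index = (index + 1) % capacity;
        } while (index != startIndex);

        if (firstFree == -1) {
            cout << "Hash table is full, cannot insert " << key << endl;
            return;
        }
        table[firstFree] = key;
        flags[firstFree] = OCCUPIED;
        count++;
    }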

    bool search(int key) {
        int index = hashFunction(key);
        int startIndex = index;

        do {
            if (flags[index] == EMPTY) {
                return false;
            }
            if (flags[index] == OCCUPIED && table[index] == key) {
                return true;
            }
            index = (index + 1) % capacity;
        } while (index != startIndex);

        return false;
    }

    void remove(int key) {
        int index = hashFunction(key);
        int startIndex = index;

        do {
            if (flags[index] == EMPTY) {
                cout << "Key " << key << " not found" << endl;
                return;
            }
            if (flags[index] == OCCUPIED && table[index] == key) {
                flags[index] = DELETED;
                count--;
                cout << "Deleted key " << key << endl;
                return;
            }
            index = (index + 1) % capacity;
        } while (index != startIndex);

        cout << "Key " << key << " not found" << endl;
    }

    void display() {
        for (int i = 0; i < capacity; i++) {
            cout << "[" << i << "] ";
            if (flags[i] == EMPTY) {
                cout << "EMPTY";
            } else if (flags[i] == DELETED) {
                cout << "DELETED";
            } else {
                cout << table[i];
            }
            cout << endl;
        }
    }
};

int main() {
    HashTable ht;

    // Insert keys
    cout << "=== Inserting ===" << endl;
    int keys[] = {10, 22, 31, 4, 15, 28};
    for (int k : keys) {
        cout << "Insert " << k << ": hash(" << k << ") = " << k % 7 << endl;
        ht.insert(k);
    }

    // Display
    cout << "\n=== Hash Table ===" << endl;
    ht.display();

    // Search
    cout << "\n=== Search ===" << endl;
    int targets[] = {31, 17, 15};
    for (int t : targets) {
        if (ht.search(t)) {
            cout << "Key " << t << " found" << endl;
        } else {
            cout << "Key " << t << " not found" << endl;
        }
    }

    // Delete
    cout << "\n=== Delete ===" << endl;
    ht.remove(22);
    ht.remove(17);

    // Display after deletion
    cout << "\n=== After Deletion ===" << endl;
    ht.display();

    return 0;
}
#include <stdio.h>
#include <stdlib.h>

#define TABLE_SIZE 7

typedef enum { EMPTY, OCCUPIED, DELETED } SlotState;

typedef struct {
    int* table;
    SlotState* flags;
    int capacity;
    int count;
} HashTable;

HashTable* createHashTable(int size) {
    HashTable* ht = (HashTable*)malloc(sizeof(HashTable));
    if (!ht) { exit(1); }
    ht->capacity = size;
    ht->count = 0;
    ht->table = (int*)malloc(size * sizeof(int));
    ht->flags = (SlotState*)malloc(size * sizeof(SlotState));
    if (!ht->table || !ht->flags) { exit(1); }
    for (int i = 0; i < size; i++) {
        ht->flags[i] = EMPTY;
    }
    return ht;
}

void destroyHashTable(HashTable* ht) {
    free(ht->table);
    free(ht->flags);
    free(ht);
}

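void insert(HashTable* ht, int key) {
    int index = key % ht->capacity;
    int startIndex = index;
    int firstFree = -1;  // First reusable (DELETED or EMPTY) slot seen

    do {
        if (ht->flags[index] == OCCUPIED && ht->table[index] == key) {
            return;  // Duplicate
        }
        if (ht->flags[index] == EMPTY) {
            // End of the probe chain: the key is not in the table
            if (firstFree == -1) firstFree = index;
            break;
        }
        if (ht->flags[index] == DELETED && firstFree == -1) {
            // Remember the DELETED slot but keep probing for a duplicate
            firstFree = index;
        }
        index = (index + 1) % ht->capacity;
    } while (index != startIndex);

    if (firstFree == -1) {
        printf("Hash table is full, cannot insert %d\n", key);
        return;
    }
    ht->table[firstFree] = key;
    ht->flags[firstFree] = OCCUPIED;
    ht->count++;
}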

int search(HashTable* ht, int key) {
    int index = key % ht->capacity;
    int startIndex = index;

    do {
        if (ht->flags[index] == EMPTY) {
            return 0;
        }
        if (ht->flags[index] == OCCUPIED && ht->table[index] == key) {
            return 1;
        }
        index = (index + 1) % ht->capacity;
    } while (index != startIndex);

    return 0;
}

void removeKey(HashTable* ht, int key) {
    int index = key % ht->capacity;
    int startIndex = index;

    do {
        if (ht->flags[index] == EMPTY) {
            printf("Key %d not found\n", key);
            return;
        }
        if (ht->flags[index] == OCCUPIED && ht->table[index] == key) {
            ht->flags[index] = DELETED;
            ht->count--;
            printf("Deleted key %d\n", key);
            return;
        }
        index = (index + 1) % ht->capacity;
    } while (index != startIndex);

    printf("Key %d not found\n", key);
}

void display(HashTable* ht) {
    for (int i = 0; i < ht->capacity; i++) {
        printf("[%d] ", i);
        if (ht->flags[i] == EMPTY) {
            printf("EMPTY");
        } else if (ht->flags[i] == DELETED) {
            printf("DELETED");
        } else {
            printf("%d", ht->table[i]);
        }
        printf("\n");
    }
}

int main() {
    HashTable* ht = createHashTable(TABLE_SIZE);

    printf("=== Inserting ===\n");
    int keys[] = {10, 22, 31, 4, 15, 28};
    for (int i = 0; i < 6; i++) {
        printf("Insert %d: hash(%d) = %d\n", keys[i], keys[i], keys[i] % 7);
        insert(ht, keys[i]);
    }

    printf("\n=== Hash Table ===\n");
    display(ht);

    printf("\n=== Search ===\n");
    int targets[] = {31, 17, 15};
    for (int i = 0; i < 3; i++) {
        if (search(ht, targets[i])) {
            printf("Key %d found\n", targets[i]);
        } else {
            printf("Key %d not found\n", targets[i]);
        }
    }

    printf("\n=== Delete ===\n");
    removeKey(ht, 22);
    removeKey(ht, 17);

    printf("\n=== After Deletion ===\n");
    display(ht);

    destroyHashTable(ht);
    return 0;
}
TABLE_SIZE = 7

EMPTY = 0
OCCUPIED = 1
DELETED = 2

class HashTable:
    """Hash table using open addressing with linear probing."""
    def __init__(self, size=TABLE_SIZE):
        self.capacity = size
        self.count = 0
        self.table = [0] * size
        self.flags = [EMPTY] * size

    def _hash(self, key):
        return key % self.capacity

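    def insert(self, key):
        index = self._hash(key)
        start_index = index
        first_free = None  # First reusable (DELETED or EMPTY) slot seen

        while True:
            if self.flags[index] == OCCUPIED and self.table[index] == key:
                return  # Duplicate
            if self.flags[index] == EMPTY:
                # End of the probe chain: the key is not in the table
                if first_free is None:
                    first_free = index
                break
            if self.flags[index] == DELETED and first_free is None:
                # Remember the DELETED slot but keep probing for a duplicate
                first_free = index
            index = (index + 1) % self.capacity
            if index == start_index:
                break

        if first_free is None:
            print(f"Hash table is full, cannot insert {key}")
            return
        self.table[first_free] = key
        self.flags[first_free] = OCCUPIED
        self.count += 1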

    def search(self, key):
        index = self._hash(key)
        start_index = index

        while True:
            if self.flags[index] == EMPTY:
                return False
            if self.flags[index] == OCCUPIED and self.table[index] == key:
                return True
            index = (index + 1) % self.capacity
            if index == start_index:
                break

        return False

    def remove(self, key):
        index = self._hash(key)
        start_index = index

        while True:
            if self.flags[index] == EMPTY:
                print(f"Key {key} not found")
                return
            if self.flags[index] == OCCUPIED and self.table[index] == key:
                self.flags[index] = DELETED
                self.count -= 1
                print(f"Deleted key {key}")
                return
            index = (index + 1) % self.capacity
            if index == start_index:
                break

        print(f"Key {key} not found")

    def display(self):
        for i in range(self.capacity):
            state = {EMPTY: "EMPTY", OCCUPIED: str(self.table[i]), DELETED: "DELETED"}
            print(f"[{i}] {state[self.flags[i]]}")

if __name__ == "__main__":
    ht = HashTable()

    print("=== Inserting ===")
    for k in [10, 22, 31, 4, 15, 28]:
        print(f"Insert {k}: hash({k}) = {k % 7}")
        ht.insert(k)

    print("\n=== Hash Table ===")
    ht.display()

    print("\n=== Search ===")
    for t in [31, 17, 15]:
        if ht.search(t):
            print(f"Key {t} found")
        else:
            print(f"Key {t} not found")

    print("\n=== Delete ===")
    ht.remove(22)
    ht.remove(17)

    print("\n=== After Deletion ===")
    ht.display()

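The Go version defines the slot-state constants (EMPTY/OCCUPIED/DELETED) with const and stores keys and states in two slices. Go has no enum type like C/C++; states are expressed with iota or, as here, with explicit constants. The linear probing loop is written with a for statement and is logically equivalent to the do-while loop in C.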

package main

import "fmt"

const TABLE_SIZE = 7

const (
	EMPTY    = 0
	OCCUPIED = 1
	DELETED  = 2
)

// HashTable is a closed hash table that resolves collisions with open addressing (linear probing)
type HashTable struct {
	capacity int
	count    int
	table    []int // slice storing keys
	flags    []int // slice storing slot states: EMPTY / OCCUPIED / DELETED
}

func newHashTable(size int) *HashTable {
	return &HashTable{
		capacity: size,
		count:    0,
		table:    make([]int, size),
		flags:    make([]int, size), // zero-initialized, i.e. all EMPTY
	}
}

func (ht *HashTable) hash(key int) int {
	return key % ht.capacity
}

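// Insert adds a key to the hash table, resolving collisions with linear probing
func (ht *HashTable) insert(key int) {
	idx := ht.hash(key)
	startIdx := idx
	firstFree := -1 // first reusable (DELETED or EMPTY) slot seen so far

	for {
		if ht.flags[idx] == OCCUPIED && ht.table[idx] == key {
			return // key already exists, no duplicate insertion
		}
		if ht.flags[idx] == EMPTY {
			// End of the probe chain: the key is not in the table
			if firstFree == -1 {
				firstFree = idx
			}
			break
		}
		if ht.flags[idx] == DELETED && firstFree == -1 {
			// Remember the DELETED slot but keep probing for a duplicate
			firstFree = idx
		}
		idx = (idx + 1) % ht.capacity
		if idx == startIdx {
			break
		}
	}

	if firstFree == -1 {
		fmt.Printf("Hash table is full, cannot insert %d\n", key)
		return
	}
	ht.table[firstFree] = key
	ht.flags[firstFree] = OCCUPIED
	ht.count++
}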

// Search looks up a key and reports whether it was found
func (ht *HashTable) search(key int) bool {
	idx := ht.hash(key)
	startIdx := idx

	for {
		if ht.flags[idx] == EMPTY {
			return false // an EMPTY slot means the key is definitely absent
		}
		if ht.flags[idx] == OCCUPIED && ht.table[idx] == key {
			return true
		}
		idx = (idx + 1) % ht.capacity
		if idx == startIdx {
			break
		}
	}

	return false
}

// Remove deletes a key using lazy deletion (marks the slot DELETED)
func (ht *HashTable) remove(key int) {
	idx := ht.hash(key)
	startIdx := idx

	for {
		if ht.flags[idx] == EMPTY {
			fmt.Printf("Key %d not found\n", key)
			return
		}
		if ht.flags[idx] == OCCUPIED && ht.table[idx] == key {
			ht.flags[idx] = DELETED
			ht.count--
			fmt.Printf("Deleted key %d\n", key)
			return
		}
		idx = (idx + 1) % ht.capacity
		if idx == startIdx {
			break
		}
	}

	fmt.Printf("Key %d not found\n", key)
}

// Display prints every slot of the hash table together with its state
func (ht *HashTable) display() {
	for i := 0; i < ht.capacity; i++ {
		fmt.Printf("[%d] ", i)
		switch ht.flags[i] {
		case EMPTY:
			fmt.Print("EMPTY")
		case DELETED:
			fmt.Print("DELETED")
		default:
			fmt.Print(ht.table[i])
		}
		fmt.Println()
	}
}

func main() {
	ht := newHashTable(TABLE_SIZE)

	// Insert keys
	fmt.Println("=== Inserting ===")
	keys := []int{10, 22, 31, 4, 15, 28}
	for _, k := range keys {
		fmt.Printf("Insert %d: hash(%d) = %d\n", k, k, k%7)
		ht.insert(k)
	}

	// Display the hash table
	fmt.Println("\n=== Hash Table ===")
	ht.display()

	// Search for keys
	fmt.Println("\n=== Search ===")
	targets := []int{31, 17, 15}
	for _, t := range targets {
		if ht.search(t) {
			fmt.Printf("Key %d found\n", t)
		} else {
			fmt.Printf("Key %d not found\n", t)
		}
	}

	// Delete keys
	fmt.Println("\n=== Delete ===")
	ht.remove(22)
	ht.remove(17)

	// Display after deletion
	fmt.Println("\n=== After Deletion ===")
	ht.display()
}

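Go defines the three slot states with const and stores keys and states in []int slices. Elements created by make start at 0, which is exactly EMPTY, so no explicit initialization is needed. Deletion marks the slot DELETED rather than EMPTY, which keeps probe chains intact; this is the same lazy deletion strategy as in the C/C++ versions.

Running the program prints: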

=== Inserting ===
Insert 10: hash(10) = 3
Insert 22: hash(22) = 1
Insert 31: hash(31) = 3
Insert 4: hash(4) = 4
Insert 15: hash(15) = 1
Insert 28: hash(28) = 0

=== Hash Table ===
[0] 28
[1] 22
[2] 15
[3] 10
[4] 31
[5] 4
[6] EMPTY

=== Search ===
Key 31 found
Key 17 not found
Key 15 found

=== Delete ===
Deleted key 22
Key 17 not found

=== After Deletion ===
[0] 28
[1] DELETED
[2] 15
[3] 10
[4] 31
[5] 4
[6] EMPTY

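The output shows that:

  • After inserting {10, 22, 31, 4, 15, 28}, the slot layout matches the step-by-step walkthrough exactly. 10 is in slot 3, 22 is in slot 1, 31 collided with 10 and was probed to slot 4, 4 found slot 4 occupied and was probed to slot 5, 15 collided with 22 and was probed to slot 2, and 28 is in slot 0.
  • Searching for 31 starts at slot 3, skips 10, and finds the key in slot 4. Searching for 17 starts at slot 3 (17 % 7 = 3) and probes until it reaches the empty slot 6, so the key is reported as absent. Searching for 15 starts at slot 1, skips 22, and finds the key in slot 2.
  • After 22 is deleted, slot 1 is marked DELETED rather than EMPTY, so a later search for 15 correctly skips slot 1 and continues to slot 2.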

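Performance Analysis

The table below compares the three open addressing strategies:

Property | Linear Probing | Quadratic Probing | Double Hashing
--- | --- | --- | ---
Probe formula | (h + i) % m | (h + i^2) % m | (h1 + i*h2) % m
Clustering | Severe primary clustering (Primary Clustering) | Milder secondary clustering (Secondary Clustering) | Almost none
Cache behavior | Best (sequential memory access) | Good | Worse (scattered access)
Implementation complexity | Simplest | Moderate | More complex
Slot coverage | Always reaches every slot | May miss slots; with prime m only the first ⌈m/2⌉ probes are guaranteed distinct | Reaches every slot when h2 is coprime with m
Insert / unsuccessful search (expected probes) | ≈ (1 + 1/(1-α)^2) / 2 | Close to double hashing, slightly higher | ≈ 1 / (1 - α)
Successful search (expected probes) | ≈ (1 + 1/(1-α)) / 2 | Close to double hashing, slightly higher | ≈ (1/α) · ln(1/(1-α))

Here α is the load factor (Load Factor), α = n / m. The probe counts are the classical approximations for a table that contains no DELETED slots.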
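To get a feeling for how quickly the cost grows with α, the classical approximations for linear probing and for double hashing (the latter under the idealized uniform hashing assumption) can be evaluated directly. This is a sketch of the textbook estimates, not a measurement of the implementation above, and the function names are assumptions made for this example.

```python
import math


def linear_probing_probes(alpha):
    """Expected probes (successful, unsuccessful) for linear probing."""
    hit = 0.5 * (1 + 1 / (1 - alpha))
    miss = 0.5 * (1 + 1 / (1 - alpha) ** 2)
    return hit, miss


def double_hashing_probes(alpha):
    """Expected probes (successful, unsuccessful) under uniform hashing."""
    hit = math.log(1 / (1 - alpha)) / alpha
    miss = 1 / (1 - alpha)
    return hit, miss


for alpha in (0.5, 0.7, 0.9):
    lin_hit, lin_miss = linear_probing_probes(alpha)
    dbl_hit, dbl_miss = double_hashing_probes(alpha)
    print(f"alpha={alpha}: linear {lin_hit:.2f}/{lin_miss:.2f}, "
          f"double {dbl_hit:.2f}/{dbl_miss:.2f}")
```

At α = 0.5 an unsuccessful search costs about 2.5 probes with linear probing and 2 with double hashing; at α = 0.9 the estimates are about 50.5 and 10, which is why the load factor is kept well below 1.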

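Key notes:

  • Load factor limit: the table must keep α < 1, that is, at least one EMPTY slot, so that an unsuccessful probe has somewhere to stop; the implementations above additionally bound every loop to one full cycle. In practice the load factor is usually kept at or below roughly 0.5 to 0.7. As α approaches 1, the number of probes rises sharply and performance degrades just as sharply.
  • Primary clustering: the biggest problem of linear probing. Once a contiguous run of slots is occupied (a cluster), any new key that hashes anywhere into the run has to walk to the end of the run, which makes the cluster even longer. As a result the probe count grows much faster than the load factor.
  • Impact of lazy deletion: a DELETED slot is not counted in count, but it does not become EMPTY again either, so the number of EMPTY slots that can end a probe is smaller than (capacity - count). With frequent insertions and deletions, DELETED slots accumulate and probing slows down. The remedies are to clean up periodically (Rehash) and to reuse DELETED slots on insertion (the implementation in this article already does the latter).
  • Resizing (Rehashing): when the load factor exceeds a threshold, allocate a larger array (typically twice the original size) and rehash every element into it. Resizing drops all DELETED marks, so probing efficiency is restored.
  • Space complexity: O(m), where m is the array size. Unlike an open hash table, a closed hash table has no per-element pointer overhead (only one state flag per slot), and the data sits in a contiguous array, which is friendlier to the cache.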
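The rehashing step can be sketched as a stand-alone function over the same table/flags representation used in this article. The function name rehash and its signature are assumptions for illustration; a real implementation would do this inside the class and choose the new capacity according to its own growth policy.

```python
EMPTY, OCCUPIED, DELETED = 0, 1, 2


def rehash(table, flags, new_capacity):
    """Copy all live keys into a fresh table, dropping DELETED marks."""
    live = sum(1 for state in flags if state == OCCUPIED)
    if new_capacity <= live:
        raise ValueError("new capacity must exceed the number of stored keys")
    new_table = [0] * new_capacity
    new_flags = [EMPTY] * new_capacity
    for key, state in zip(table, flags):
        if state != OCCUPIED:
            continue  # EMPTY and DELETED slots carry no data
        index = key % new_capacity
        # The new table has no DELETED slots, so plain linear probing suffices.
        while new_flags[index] == OCCUPIED:
            index = (index + 1) % new_capacity
        new_table[index] = key
        new_flags[index] = OCCUPIED
    return new_table, new_flags


# State of the example table after deleting 22 (slot 1 is DELETED).
old_table = [28, 22, 15, 10, 31, 4, 0]
old_flags = [OCCUPIED, DELETED, OCCUPIED, OCCUPIED, OCCUPIED, OCCUPIED, EMPTY]
new_table, new_flags = rehash(old_table, old_flags, 14)
print(new_flags.count(DELETED))  # 0
```

In the doubled table the five remaining keys hash to slots 0, 1, 10, 3, and 4 without any collision, and no DELETED slot is left to lengthen later probes.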

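Comparison with the open hash table (closed addressing):

Property | Open hash table (Separate Chaining) | Closed hash table (Open Addressing)
--- | --- | ---
Collision handling | Colliding elements are stored in a linked list | Probe for the next free slot inside the array
Load factor | May exceed 1.0 | At most 1.0; kept below 1.0 in practice
Deletion | Remove the list node directly | Requires lazy deletion (Lazy Deletion)
Cache behavior | Worse (list nodes are not contiguous) | Better (data lives in a contiguous array)
Memory overhead | One extra pointer per node | No extra pointers
Worst case | O(n) (all keys collide into one bucket) | O(n) (probe a full cycle)
Typical use | Unpredictable element counts, frequent deletion | Fixed-size elements, tight memory, cache-friendly access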

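In real-world libraries, Python's dict and Rust's HashMap use open addressing (Python 3.6+ uses a refined, more compact open addressing layout), while C++'s std::unordered_map and Java's HashMap use separate chaining. Which scheme to choose depends on the use case: when keys are small and memory should be compact and cache-friendly, open addressing is the better fit; when the number of elements varies widely or deletions are frequent, separate chaining is more flexible.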

posted @ 2026-04-16 17:51  游翔