Redis Hash

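A Redis hash is a map between string fields and string values, which makes it particularly well suited to storing objects.
Each hash can store up to 2^32 - 1 field-value pairs (more than 4 billion).
It is similar to a row in a database table, or to a Map in Java or a dict in Python.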

Demo

1. Set a field value (like assigning an attribute of an object)

HSET user:1000 name "Alice"
HSET user:1000 age 30

Output:

127.0.0.1:6379> HSET user:1000 name "Alice"
(integer) 1
127.0.0.1:6379> HSET user:1000 age 30
(integer) 1

2. Set multiple fields at once

HSET user:1000 name "Alice" age 30 city "Beijing"

Since Redis 4.0, HSET accepts multiple field-value pairs, so HMSET is deprecated in favor of HSET, although it still works.

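Output (HSET returns the number of newly added fields; name and age already exist from step 1, so only city is counted):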

127.0.0.1:6379> HSET user:1000 name "Alice" age 30 city "Beijing"
(integer) 1

3. Get a field value

HGET user:1000 name

Output:

127.0.0.1:6379> HGET user:1000 name
"Alice"

4. Get multiple field values

HMGET user:1000 name age

Output:

127.0.0.1:6379> HMGET user:1000 name age
1) "Alice"
2) "30"

5. Get all fields and values

HGETALL user:1000

Output:

127.0.0.1:6379> HGETALL user:1000
1) "name"
2) "Alice"
3) "age"
4) "30"
5) "city"
6) "Beijing"

6. Get all field names / all values

HKEYS user:1000
HVALS user:1000

Output:

127.0.0.1:6379> HKEYS user:1000
1) "name"
2) "age"
3) "city"
127.0.0.1:6379> HVALS user:1000
1) "Alice"
2) "30"
3) "Beijing"

7. Check whether a field exists

HEXISTS user:1000 age

Output:

127.0.0.1:6379> HEXISTS user:1000 age
(integer) 1

8. Delete a field

HDEL user:1000 city

Output:

127.0.0.1:6379> HDEL user:1000 city
(integer) 1

9. Get the number of fields

HLEN user:1000

Output:

127.0.0.1:6379> HLEN user:1000
(integer) 2

10. Increment / decrement a field value (when the value is an integer or a float)

HINCRBY user:1000 age 1
HINCRBYFLOAT product:123 price 0.5

Output (a negative amount decrements; product:123 had no price field yet, so it starts from 0):

127.0.0.1:6379> HINCRBY user:1000 age 1
(integer) 31
127.0.0.1:6379> HINCRBYFLOAT product:123 price 0.5
"0.5"

Data structure

The Redis hash table is a core data structure: it implements not only the Hash type but also the many dictionaries used inside Redis.

A Redis hash table is defined by the dict struct, which holds two hash tables (needed for incremental rehashing) plus some metadata: dict.h:106-121

struct dict {
    dictType *type;

    dictEntry **ht_table[2];
    unsigned long ht_used[2];

    long rehashidx; /* rehashing not in progress if rehashidx == -1 */

    /* Keep small vars at end for optimal (minimal) struct padding */
    unsigned pauserehash : 15; /* If >0 rehashing is paused */

    unsigned useStoredKeyApi : 1; /* See comment of storedHashFunction above */
    signed char ht_size_exp[2]; /* exponent of size. (size = 1<<exp) */
    int16_t pauseAutoResize;  /* If >0 automatic resizing is disallowed (<0 indicates coding error) */
    void *metadata[];
};

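Field	Type	Description
`type`	`dictType*`	Points to the set of callbacks for this dict (hash function, key comparison, etc.), in the style of the strategy pattern.
`ht_table[2]`	`dictEntry**`	The two bucket arrays; both are in use during a rehash. Older versions had a `dictht ht[2]` array of structs, whose members are now split into the separate `ht_table`, `ht_used` and `ht_size_exp` arrays.
`ht_used[2]`	`unsigned long`	Number of elements currently stored in each table.
`rehashidx`	`long`	Index of the next bucket to migrate; `-1` means no rehash is in progress.
`pauserehash`	`unsigned :15`	When > 0, rehashing is paused (used, for example, by safe iterators).
`useStoredKeyApi`	`unsigned :1`	Whether the "stored key" API is used (see the comment on `storedHashFunction` in dict.h).
`ht_size_exp[2]`	`signed char`	Size exponent of each table (`size = 1 << exp`); storing the exponent instead of the size saves space.
`pauseAutoResize`	`int16_t`	When > 0, automatic expansion/shrinking is disallowed (a negative value indicates a coding error).
`metadata[]`	`void*`	Flexible array member holding extra per-dict data (e.g. statistics or module-defined data).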
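Because `ht_size_exp` stores only the exponent, the table length and the bucket mask are derived from it on demand. A minimal sketch of that arithmetic, assuming macros modeled on the `DICTHT_SIZE` / `DICTHT_SIZE_MASK` macros that appear in the excerpts in this article; the `SKETCH_` names, `bucket_index`, and the convention that `-1` means "no table allocated" are assumptions of this sketch, not quotes from dict.h:

```c
#include <assert.h>
#include <stdint.h>

/* Modeled on DICTHT_SIZE / DICTHT_SIZE_MASK (assumption: exp == -1 means
 * that no table is allocated, so size and mask are both 0). */
#define SKETCH_HT_SIZE(exp) ((exp) == -1 ? 0UL : 1UL << (exp))
#define SKETCH_HT_MASK(exp) ((exp) == -1 ? 0UL : SKETCH_HT_SIZE(exp) - 1)

/* Hypothetical helper: because the size is always a power of two,
 * "hash modulo size" reduces to a single AND with size - 1. */
static unsigned long bucket_index(uint64_t hash, signed char exp) {
    return (unsigned long)(hash & SKETCH_HT_MASK(exp));
}
```

With exp = 2 the table has 4 buckets, and a hash of 13 lands in bucket 13 & 3 = 1. Keeping every size a power of two is what lets one AND replace a modulo on each lookup.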

Each table is an array of buckets, and each bucket is a pointer to a chain of dictEntry nodes: dict.c:45-54

struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;     /* Next entry in the same hash bucket. */
};

Three encodings

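Encoding	Description	Characteristics
listpack	Default encoding; compact storage for small hashes	Highly compact and memory-efficient, slightly slower operations
hashtable	Used once a hash has many fields or large values	Fast operations and a flexible structure, but more memory
LISTPACK_EX	(added in 7.4.0) listpack plus field-expiration support	Stores per-field expiration data while keeping the compact layout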

Conditions that trigger an encoding conversion

Conversion logic: hashTypeTryConversion

void hashTypeTryConversion(redisDb *db, robj *o, robj **argv, int start, int end) {
    int i;
    size_t sum = 0;

    if (o->encoding != OBJ_ENCODING_LISTPACK && o->encoding != OBJ_ENCODING_LISTPACK_EX)
        return;

    /* We guess that most of the values in the input are unique, so
     * if there are enough arguments we create a pre-sized hash, which
     * might over allocate memory if there are duplicates. */
    size_t new_fields = (end - start + 1) / 2;
    if (new_fields > server.hash_max_listpack_entries) {
        hashTypeConvert(o, OBJ_ENCODING_HT, &db->hexpires);
        dictExpand(o->ptr, new_fields);
        return;
    }

    for (i = start; i <= end; i++) {
        if (!sdsEncodedObject(argv[i]))
            continue;
        size_t len = sdslen(argv[i]->ptr);
        if (len > server.hash_max_listpack_value) {
            hashTypeConvert(o, OBJ_ENCODING_HT, &db->hexpires);
            return;
        }
        sum += len;
    }
    if (!lpSafeToAdd(hashTypeListpackGetLp(o), sum))
        hashTypeConvert(o, OBJ_ENCODING_HT, &db->hexpires);
}

1. Initial encoding (listpack)

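A newly created hash uses the listpack encoding by default (Redis 6 used ziplist; from Redis 7 on, listpack is used throughout). It stays in listpack as long as:

the number of fields <= hash-max-listpack-entries (default 512)
and every field and value is <= hash-max-listpack-value (default 64 bytes)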

2. Conditions for converting to hashtable

The hash is converted from listpack to hashtable as soon as any of these holds:

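The number of fields exceeds hash-max-listpack-entries (default 512)
Any field or value is longer than hash-max-listpack-value (default 64 bytes)
Adding the new data would push the listpack past its safe total size (the lpSafeToAdd check in the code above)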
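The size rules above can be modeled as a single predicate. A simplified sketch assuming the default thresholds; `needs_hashtable` and its parameters are hypothetical, every field in the command is counted as new (the real code also over-counts on purpose when pre-sizing), and the `lpSafeToAdd` total-size check from `hashTypeTryConversion` is left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Defaults of hash-max-listpack-entries / hash-max-listpack-value. */
#define MAX_LISTPACK_ENTRIES 512
#define MAX_LISTPACK_VALUE   64

/* Hypothetical helper: should a listpack hash that already holds
 * `current_fields` fields switch to a hashtable after an HSET whose
 * arguments are field, value, field, value, ... ? */
static bool needs_hashtable(size_t current_fields, const char **argv, size_t argc) {
    assert(argc % 2 == 0);                    /* field/value pairs only */
    if (current_fields + argc / 2 > MAX_LISTPACK_ENTRIES)
        return true;                          /* too many fields */
    for (size_t i = 0; i < argc; i++) {
        if (strlen(argv[i]) > MAX_LISTPACK_VALUE)
            return true;                      /* a field or value is too long */
    }
    return false;
}
```

The 65-byte value in the example below trips the second check, which is why `user:1` becomes a hashtable even though it holds only two fields.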

3. Conditions for converting to LISTPACK_EX (added in 7.4.0)

When an expiration time is set on any field of a listpack-encoded hash (field-level expiration), the encoding is converted to LISTPACK_EX:

LISTPACK_EX adds expiration bookkeeping on top of listpack
It remains a compact structure, but also supports per-field TTLs

For example

1. Initial state: listpack

127.0.0.1:6379> HSET user:1 name "Alice" age 20
(integer) 2
127.0.0.1:6379> OBJECT ENCODING user:1
"listpack"

2. More than 512 fields, or any field or value longer than 64 bytes → converts to hashtable (the new value below is 65 bytes long)

127.0.0.1:6379> HSET user:1 name "qqweroitueronadionnisdnogfiogndfoignohniuas;hjuiengudoinoihujnion" age 20
(integer) 0
127.0.0.1:6379> OBJECT ENCODING user:1
"hashtable"

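3. Set a field expiration → converts to LISTPACK_EX

user:1 is already a hashtable after step 2, so this step uses a fresh listpack-encoded key:

HSET user:2 token "abc123"
HEXPIRE user:2 60 FIELDS 1 token                  (Redis 7.4+)
HSETEX user:2 EX 60 FIELDS 1 token "abc123"       (alternative on Redis 8.0+: sets the value and the TTL in one command)
→ the encoding of user:2 becomes LISTPACK_EX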

Resolving hash collisions

Redis resolves collisions by chaining: keys that map to the same bucket form a linked list: dict.c:769-811

dictEntry *dictFindByHash(dict *d, const void *key, const uint64_t hash) {
    dictEntry *he;
    uint64_t idx, table;

    if (dictSize(d) == 0) return NULL; /* dict is empty */

    idx = hash & DICTHT_SIZE_MASK(d->ht_size_exp[0]);
    keyCmpFunc cmpFunc = dictGetKeyCmpFunc(d);

    /* Rehash the hash table if needed */
    _dictRehashStepIfNeeded(d,idx);

    /* Check if we can use the compare function with length to avoid recomputing length of key always */
    keyCmpFuncWithLen cmpFuncWithLen = d->type->keyCompareWithLen;
    keyLenFunc keyLenFunc = d->type->keyLen;
    const int has_len_fn = (keyLenFunc != NULL && cmpFuncWithLen != NULL);
    const size_t key_len = has_len_fn ? keyLenFunc(d,key) : 0;
    for (table = 0; table <= 1; table++) {
        if (table == 0 && (long)idx < d->rehashidx) continue;
        idx = hash & DICTHT_SIZE_MASK(d->ht_size_exp[table]);

        /* Prefetch the bucket at the calculated index */
        redis_prefetch_read(&d->ht_table[table][idx]);

        he = d->ht_table[table][idx];
        while(he) {
            void *he_key = dictGetKey(he);

            /* Prefetch the next entry to improve cache efficiency */
            redis_prefetch_read(dictGetNext(he));
            if (key == he_key || (has_len_fn ?
                cmpFuncWithLen(d, key, key_len, he_key, keyLenFunc(d,he_key)) :
                cmpFunc(d, key, he_key)))
            {
                return he;
            }
            he = dictGetNext(he);
        }
        /* Use unlikely to optimize branch prediction for the common case */
        if (unlikely(!dictIsRehashing(d))) return NULL;
    }
    return NULL;
}

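As the code shows, a lookup proceeds as follows:

Take the key's hash (dictFindByHash receives it from the caller)
AND the hash with the table's size mask to get the bucket index
Walk that bucket's chain, comparing each entry's key
If the key is not in table 0 and a rehash is in progress, search table 1 as well; buckets of table 0 below rehashidx are skipped because they have already been migrated
Return the matching entry, or NULL if there is none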
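A stripped-down chained lookup that follows those steps, including the second table during a rehash. The `entry` / `sketch_dict` types and `sketch_find` are hypothetical simplifications of `dictEntry` / `dict`: string keys compared with `strcmp`, the hash passed in by the caller, and no prefetching or custom compare callbacks:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified entry: string key, integer value, chain link. */
typedef struct entry {
    const char *key;
    long val;
    struct entry *next;          /* next entry in the same bucket */
} entry;

/* Hypothetical simplified dict: two tables, as during a rehash. */
typedef struct {
    entry **table[2];            /* table[1] is only used while rehashing */
    signed char exp[2];          /* size = 1 << exp, -1 when unallocated */
    long rehashidx;              /* -1 when no rehash is in progress */
} sketch_dict;

#define MASK(exp) ((exp) == -1 ? 0UL : (1UL << (exp)) - 1)

static entry *sketch_find(sketch_dict *d, const char *key, uint64_t hash) {
    for (int t = 0; t <= 1; t++) {
        if (d->table[t] == NULL) break;
        unsigned long idx = hash & MASK(d->exp[t]);
        /* Buckets of table 0 below rehashidx were already moved to table 1. */
        if (t == 0 && (long)idx < d->rehashidx) continue;
        for (entry *he = d->table[t][idx]; he != NULL; he = he->next)
            if (strcmp(he->key, key) == 0) return he;
        if (d->rehashidx == -1) break;   /* not rehashing: table 1 is unused */
    }
    return NULL;
}
```

Two keys whose hashes differ only above the mask bits (5 and 1 in a 4-bucket table) end up in the same chain and are told apart only by the key comparison.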

Rehash

Redis rehashes incrementally: instead of rehashing the whole table at once, it spreads the work over many operations to avoid long blocking pauses.

1. Trigger conditions

An expansion is triggered when the table's load factor (elements / buckets) exceeds a threshold: dict.c:1541-1573

int dictExpandIfNeeded(dict *d) {
    /* Incremental rehashing already in progress. Return. */
    // Already rehashing: return immediately
    if (dictIsRehashing(d)) return DICT_OK;

    /* If the hash table is empty expand it to the initial size. */
    // Empty table: allocate the initial size
    if (DICTHT_SIZE(d->ht_size_exp[0]) == 0) {
        dictExpand(d, DICT_HT_INITIAL_SIZE);
        return DICT_OK;
    }

    /* If we reached the 1:1 ratio, and we are allowed to resize the hash
     * table (global setting) or we should avoid it but the ratio between
     * elements/buckets is over the "safe" threshold, we resize doubling
     * the number of buckets. */
    // Load factor = used / size; expand once it reaches 1 (default policy)
    if ((dict_can_resize == DICT_RESIZE_ENABLE &&
         d->ht_used[0] >= DICTHT_SIZE(d->ht_size_exp[0])) ||
        (dict_can_resize != DICT_RESIZE_FORBID &&
         d->ht_used[0] >= dict_force_resize_ratio * DICTHT_SIZE(d->ht_size_exp[0])))
    {
        if (dictTypeResizeAllowed(d, d->ht_used[0] + 1))
            dictExpand(d, d->ht_used[0] + 1);
        return DICT_OK;
    }
    return DICT_ERR;
}

/* Expand the hash table if needed */
static void _dictExpandIfNeeded(dict *d) {
    /* Automatic resizing is disallowed. Return */
    if (d->pauseAutoResize > 0) return;

    dictExpandIfNeeded(d);
}

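If a rehash is already in progress, return without starting another one.
If the table is empty, expand it to the initial size (nothing is actually rehashed).
With DICT_RESIZE_ENABLE (automatic resizing allowed), expand once the load factor reaches 1 (used >= size).
With DICT_RESIZE_AVOID, expand only once the load factor reaches dict_force_resize_ratio (4 by default); with DICT_RESIZE_FORBID, never expand here.
The requested size is used + 1, and dictTypeResizeAllowed() must also allow it (it consults the dict type's optional resizeAllowed callback).
_dictExpandIfNeeded skips the whole check while pauseAutoResize > 0.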
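The decision above, written as a standalone predicate. This is a sketch, not Redis code: `should_expand` and `resize_policy` are hypothetical names, and the `resize_allowed` flag stands in for whatever `dictTypeResizeAllowed()` returns:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three dict_can_resize policies. */
typedef enum { RESIZE_ENABLE, RESIZE_AVOID, RESIZE_FORBID } resize_policy;

#define FORCE_RESIZE_RATIO 4     /* dict_force_resize_ratio */

static bool should_expand(resize_policy policy, bool rehashing,
                          unsigned long used, unsigned long size,
                          bool resize_allowed) {
    if (rehashing) return false;         /* a rehash is already running */
    if (size == 0) return true;          /* empty: allocate the initial table */
    bool over = (policy == RESIZE_ENABLE && used >= size) ||
                (policy != RESIZE_FORBID && used >= FORCE_RESIZE_RATIO * size);
    return over && resize_allowed;
}
```

Under ENABLE a 4-bucket table starts growing on the insert that finds 4 entries; under AVOID the same table waits until it holds 16. The comments in dict.c explain that Redis relaxes resizing this way while a child process is saving, to avoid moving memory around under copy-on-write.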

When this logic runs

dictAdd
dictReplace
dictAddRaw

The automatic expansion check runs only when a key is inserted or replaced.

Related parameters:

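Parameter	Meaning
`dict_can_resize`	Whether automatic resizing is allowed (ENABLE / AVOID / FORBID)
`dict_force_resize_ratio`	4 by default (`static unsigned int dict_force_resize_ratio = 4;`); the load-factor multiple required before resizing when resizing should be avoided
`pauseAutoResize`	When > 0, automatic resizing of this dict is paused
`DICTHT_SIZE(exp)`	Actual table size: 2 to the power exp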

2. The rehash process

A rehash goes through the following steps:

1. Initialization

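The table is resized according to its current usage, to balance performance and memory use.
A new hash table is created whose size is a power of two: usually twice the current size when expanding, or a smaller power of two when shrinking: dict.c:226-246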

int _dictResize(dict *d, unsigned long size, int* malloc_failed)
{
    if (malloc_failed) *malloc_failed = 0;

    /* We can't rehash twice if rehashing is ongoing. */
    assert(!dictIsRehashing(d));

    /* the new hash table */
    dictEntry **new_ht_table;
    unsigned long new_ht_used;
    signed char new_ht_size_exp = _dictNextExp(size);

    /* Detect overflows */
    size_t newsize = DICTHT_SIZE(new_ht_size_exp);
    if (newsize < size || newsize * sizeof(dictEntry*) < newsize)
        return DICT_ERR;

    /* Rehashing to the same table size is not useful. */
    if (new_ht_size_exp == d->ht_size_exp[0]) return DICT_ERR;

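    /* Allocate the new hash table and initialize all pointers to NULL */
    /* ... remainder omitted in this excerpt: the new table is installed as
     * ht_table[1] and rehashidx is set to 0 to start rehashing ... */
}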
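`_dictResize` turns the requested size into a power-of-two exponent with `_dictNextExp`. Since `dictExpandIfNeeded` requests `used + 1` buckets right when `used` has reached the table size, rounding up to the next power of two doubles the table. A sketch of that rounding, assuming an initial exponent of 2 (4 buckets); `next_exp` is a hypothetical name and uses a simple loop rather than a count-leading-zeros intrinsic:

```c
#include <assert.h>
#include <limits.h>

#define INITIAL_EXP 2            /* assumed initial table size: 1 << 2 = 4 */

/* Smallest exponent e with (1UL << e) >= size, never below INITIAL_EXP. */
static signed char next_exp(unsigned long size) {
    if (size >= (unsigned long)LONG_MAX)
        return (signed char)(sizeof(long) * CHAR_BIT - 1);   /* clamp */
    signed char e = INITIAL_EXP;
    while ((1UL << e) < size) e++;
    return e;
}
```

For example, a full 4-bucket table asks for 5 buckets and gets exponent 3 (8 buckets), which is the "double the size" behavior described above.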

2. Incremental migration

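Each call moves the entries of at most n non-empty buckets from the old table (ht[0]) to the new one (ht[1]). To avoid blocking, Redis quietly does a little of this work on every add, delete, lookup, or update: dict.c:383-421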

int dictRehash(dict *d, int n) {
    // To avoid spinning on empty buckets, visit at most n * 10 of them.
    int empty_visits = n*10; /* Max number of empty buckets to visit. */
    unsigned long s0 = DICTHT_SIZE(d->ht_size_exp[0]);
    unsigned long s1 = DICTHT_SIZE(d->ht_size_exp[1]);
    // dict_can_resize is a global switch that controls resizing
    // DICT_RESIZE_FORBID: resizing is forbidden
    // !dictIsRehashing(d): not currently rehashing, so return immediately
    if (dict_can_resize == DICT_RESIZE_FORBID || !dictIsRehashing(d)) return 0;
    /* If dict_can_resize is DICT_RESIZE_AVOID, we want to avoid rehashing. 
     * - If expanding, the threshold is dict_force_resize_ratio which is 4.
     * - If shrinking, the threshold is 1 / (HASHTABLE_MIN_FILL * dict_force_resize_ratio) which is 1/32. */
    // Guard that holds back rehashing under DICT_RESIZE_AVOID
    if (dict_can_resize == DICT_RESIZE_AVOID && 
        ((s1 > s0 && s1 < dict_force_resize_ratio * s0) || // expanding by less than 4x (dict_force_resize_ratio = 4)
         (s1 < s0 && s0 < HASHTABLE_MIN_FILL * dict_force_resize_ratio * s1))) // or shrinking by less than 32x: do not rehash
    {
        return 0;
    }

    while(n-- && d->ht_used[0] != 0) {
        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */

        // The bucket at rehashidx must lie within the old table (ht[0]).
        assert(DICTHT_SIZE(d->ht_size_exp[0]) > (unsigned long)d->rehashidx);

        // Skip empty buckets, counting them against the empty-visit budget.
        while(d->ht_table[0][d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }

        // Non-empty bucket: rehashEntriesInBucketAtIndex moves its entries to the new table.
        /* Move all the keys in this bucket from the old to the new hash HT */
        rehashEntriesInBucketAtIndex(d, d->rehashidx);

        // Then advance rehashidx to the next bucket.
        d->rehashidx++;
    }

    // Check whether the rehash has completed
    return !dictCheckRehashingCompleted(d);
}

3. Completing the rehash

Once every key has been moved, the old table is freed and the new table becomes the main table: dict.c:368-381

/* This checks if we already rehashed the whole table and if more rehashing is required */
static int dictCheckRehashingCompleted(dict *d) {
    // The old table still has entries, so migration is not finished: return 0.
    if (d->ht_used[0] != 0) return 0;
    
    // Run the custom hook, if any (an extension point for the dict's user)
    if (d->type->rehashingCompleted) d->type->rehashingCompleted(d);
    // Free the old ht[0]
    zfree(d->ht_table[0]);
    // Move the new ht[1] into ht[0], making it the main table
    /* Copy the new ht onto the old one */
    d->ht_table[0] = d->ht_table[1];
    d->ht_used[0] = d->ht_used[1];
    d->ht_size_exp[0] = d->ht_size_exp[1];
    _dictReset(d, 1); // Clear ht[1]
    d->rehashidx = -1;  // Mark rehashing as finished
    return 1;
}
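The migration loop and the final table swap fit together in one small function. A simplified sketch under stated assumptions: `node`, `sketch_dict`, and `sketch_rehash` are hypothetical, each node caches its hash instead of recomputing it, and the `dict_can_resize` guards of `dictRehash` are omitted:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simplified entry: caches its hash so it can be re-bucketed. */
typedef struct node {
    uint64_t hash;
    struct node *next;           /* next node in the same bucket */
} node;

/* Hypothetical simplified dict with the two tables used during a rehash. */
typedef struct {
    node **table[2];
    unsigned long size[2];       /* always a power of two */
    unsigned long used[2];
    long rehashidx;              /* -1 when no rehash is in progress */
} sketch_dict;

/* Moves at most n non-empty buckets from table 0 to table 1, visiting at
 * most n * 10 empty buckets, then promotes table 1 once table 0 is empty.
 * Returns 1 while more work remains, 0 when done (like dictRehash()). */
static int sketch_rehash(sketch_dict *d, int n) {
    int empty_visits = n * 10;
    if (d->rehashidx == -1) return 0;

    while (n-- && d->used[0] != 0) {
        assert(d->size[0] > (unsigned long)d->rehashidx);
        /* Skip empty buckets, but stop for now after too many. */
        while (d->table[0][d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        /* Re-bucket the whole chain into table 1, pushing at the head. */
        node *he = d->table[0][d->rehashidx];
        while (he != NULL) {
            node *next = he->next;
            unsigned long idx = (unsigned long)(he->hash & (d->size[1] - 1));
            he->next = d->table[1][idx];
            d->table[1][idx] = he;
            d->used[0]--;
            d->used[1]++;
            he = next;
        }
        d->table[0][d->rehashidx] = NULL;
        d->rehashidx++;
    }
    if (d->used[0] != 0) return 1;

    /* Completion: free the old bucket array and promote table 1. */
    free(d->table[0]);
    d->table[0] = d->table[1];
    d->size[0] = d->size[1];
    d->used[0] = d->used[1];
    d->table[1] = NULL;
    d->size[1] = 0;
    d->used[1] = 0;
    d->rehashidx = -1;
    return 0;
}
```

Calling it with n = 1 on each dictionary operation is what spreads the cost of a resize across many commands instead of paying it all at once.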

3. Controlling rehash progress

Redis controls the progress of a rehash in the following ways:

One rehash step per operation: dict.c:447-457

/* This function performs just a step of rehashing, and only if hashing has
 * not been paused for our hash table. When we have iterators in the
 * middle of a rehashing we can't mess with the two hash tables otherwise
 * some elements can be missed or duplicated.
 *
 * This function is called by common lookup or update operations in the
 * dictionary so that the hash table automatically migrates from H1 to H2
 * while it is actively used. */
static void _dictRehashStep(dict *d) {
    if (d->pauserehash == 0) dictRehash(d,1);
}

Active rehashing: the server's periodic tasks also perform a bounded amount of rehash work, independent of client commands: dict.c:430-445

/* Rehash in us+"delta" microseconds. The value of "delta" is larger
 * than 0, and is smaller than 1000 in most cases. The exact upper bound
 * depends on the running time of dictRehash(d,100).*/
int dictRehashMicroseconds(dict *d, uint64_t us) {
    if (d->pauserehash > 0) return 0;

    monotime timer;
    elapsedStart(&timer);
    int rehashes = 0;

    while(dictRehash(d,100)) {
        rehashes += 100;
        if (elapsedUs(timer) >= us) break;
    }
    return rehashes;
}
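The same time-boxed pattern, detached from the dict: run batches of 100 steps and read the clock only between batches. A sketch assuming a POSIX monotonic clock; `rehash_for_us`, `rehash_step_fn`, and the toy `countdown_step` are hypothetical names:

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime / CLOCK_MONOTONIC */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in microseconds. */
static uint64_t now_us(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Hypothetical stand-in for dictRehash(d, n): nonzero while work remains. */
typedef int (*rehash_step_fn)(void *ctx, int n);

/* Same shape as dictRehashMicroseconds(): batches of 100 steps, with the
 * clock checked only after each batch. */
static int rehash_for_us(rehash_step_fn step, void *ctx, uint64_t budget_us) {
    uint64_t start = now_us();
    int rehashes = 0;
    while (step(ctx, 100)) {
        rehashes += 100;
        if (now_us() - start >= budget_us) break;
    }
    return rehashes;
}

/* Toy step function: decrements *ctx and reports work left until it hits 0. */
static int countdown_step(void *ctx, int n) {
    int *left = ctx;
    (void)n;
    return --*left > 0;
}
```

Because the clock is checked only after a batch, the budget can be exceeded by up to one batch, which matches the "us + delta" wording in the comment above.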

Configuration: rehash behavior can be controlled in the configuration file: redis.conf:2038-2056

activerehashing yes

The rehash call chain in Redis

Let's walk through the call chain with an example: as elements keep being added to Redis, a dictionary (dict) expands automatically and goes through a rehash.

Triggering automatic expansion

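When a new element is added (for example, a hash field via HSET), dictAdd is eventually called to insert it into the dictionary.
dictAdd calls dictAddRaw, which in turn calls _dictExpandIfNeeded to check whether an expansion is needed.
_dictExpandIfNeeded calls dictExpandIfNeeded, which checks the load factor and decides whether to expand.
If an expansion is needed, dictExpand is called.
dictExpand eventually calls _dictResize to create the new hash table.
_dictResize creates the new table (ht[1]) and sets rehashidx to 0, marking the start of the rehash.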

Performing the rehash

1. Automatic rehash during dictionary operations

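Every lookup, insertion, or deletion on a dictionary tries to perform one rehash step.
It does so by calling _dictRehashStepIfNeeded.
That function checks whether a rehash is in progress and, if so, performs one step.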

2. Rehash in the server's periodic tasks

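The Redis server also performs rehash work in its periodic tasks.
When activerehashing is set to yes (the default), Redis actively rehashes by calling kvstoreIncrementallyRehash. This covers the keyspace dictionaries (redis.conf describes the feature as helping to rehash "the main Redis hash table"); the dicts behind individual Hash values are rehashed only incrementally, as they are accessed.
The function performs as many rehash steps as fit within a time limit (INCREMENTAL_REHASHING_THRESHOLD_US).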

Special optimizations in the hash table

The Redis hash table implements several special optimizations:

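Dictionary types: the dictType struct defines per-dictionary behavior such as the hash function and the key comparison function: server.c:556-565
Iterators: both safe and unsafe iterators are provided; a safe iterator allows the dictionary to be modified during iteration: dict.c:1068-1111
Scanning: dictScan traverses a large hash table incrementally with a cursor, without blocking: dict.c:1334-1418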
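The core of `dictScan` is its cursor, which advances in reverse-binary order: the high bits of the bucket index are incremented first. A sketch of just that increment, using a plain loop for bit reversal; `rev_bits` and `next_cursor` are hypothetical names, and the real function also handles both tables during a rehash:

```c
#include <assert.h>
#include <limits.h>

/* Reverses the bit order of an unsigned long (simple loop version). */
static unsigned long rev_bits(unsigned long v) {
    unsigned long r = 0;
    for (unsigned i = 0; i < sizeof v * CHAR_BIT; i++) {
        r = (r << 1) | (v & 1UL);
        v >>= 1;
    }
    return r;
}

/* One cursor advance for a table with the given size mask: set the bits
 * above the mask, then increment the cursor as a bit-reversed number. */
static unsigned long next_cursor(unsigned long v, unsigned long mask) {
    v |= ~mask;
    v = rev_bits(v);
    v++;
    return rev_bits(v);
}
```

For an 8-bucket table (mask 7) the cursor visits 0, 4, 2, 6, 1, 5, 3, 7 and then returns to 0. When a table doubles, bucket i splits into i and i + size; visiting in this order means a cursor obtained before a resize is still valid afterwards, so every element present for the whole scan is returned at least once.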

Small-hash optimization

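For small hashes, Redis uses a more compact structure (listpack, formerly ziplist) to save memory.
Once a hash grows past the configured limits (too many fields, or a field or value that is too long), it is automatically converted to the standard hashtable encoding.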
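Why a compact layout is acceptable for small hashes: with at most a few hundred entries, a linear scan over adjacent field/value pairs is cheap, and no per-entry pointers or bucket array are needed. An illustration only, not the real listpack byte format; `flat_hget` is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fields and values sit next to each other in one flat array
 * (field0, value0, field1, value1, ...), so a lookup is an O(n) scan. */
static const char *flat_hget(const char **pairs, size_t npairs, const char *field) {
    for (size_t i = 0; i < npairs; i++)
        if (strcmp(pairs[2 * i], field) == 0)
            return pairs[2 * i + 1];
    return NULL;                 /* field not present */
}
```

This is the trade-off summarized in the encodings table: contiguous storage saves memory, while lookups cost O(n) instead of O(1), which only pays off while n stays small.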

Appendix

posted @ 2025-05-01 02:18 Eiffelzero
