Redis Set

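  • A Redis Set is an unordered collection of String values. Members are unique, so a set never holds duplicate data.
  • A set object is encoded as intset, listpack, or hashtable.
  • With the hashtable encoding, adding, removing, and looking up a member are O(1) on average. The compact intset and listpack encodings are only used for small sets, where a binary search or a linear scan is still cheap.
  • A set can hold at most 2^32 - 1 members (4294967295, more than 4 billion members per set).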

Demo

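1. Add elements

SADD tags "redis"
SADD tags "nosql" "database"

  • A Set deduplicates automatically: adding the same value several times stores it only once.

Output:

127.0.0.1:6379> SADD tags "redis"
(integer) 1
127.0.0.1:6379> SADD tags "nosql" "database"
(integer) 2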
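
The integer that SADD returns is the number of members that were actually new. A minimal sketch of that counting rule, using a plain Python set as a stand-in for the Redis key (the `sadd` helper is hypothetical and does not talk to a server):

```python
def sadd(members: set, *values: str) -> int:
    """Add values to the set and return how many were not already present."""
    added = 0
    for value in values:
        if value not in members:
            members.add(value)
            added += 1
    return added


tags = set()
first = sadd(tags, "redis")
second = sadd(tags, "nosql", "database")
third = sadd(tags, "redis")  # duplicate: nothing is added
print(first, second, third, len(tags))  # 1 2 0 3
```

Adding "redis" a second time returns 0 and leaves the set at three members, which is the deduplication described above.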

2. Get all elements

SMEMBERS tags

Output:

127.0.0.1:6379> SMEMBERS tags
1) "redis"
2) "nosql"
3) "database"

3. Check whether an element exists

SISMEMBER tags "redis"

Output:

127.0.0.1:6379> SISMEMBER tags "redis"
(integer) 1

4. Get the number of elements

SCARD tags

Output:

127.0.0.1:6379> SCARD tags
(integer) 3

5. Randomly pop one or more elements (removes them)

SPOP tags         # pop one random element
SPOP tags 2       # pop two random elements

Output:

127.0.0.1:6379> SPOP tags
"database"
127.0.0.1:6379> SPOP tags 2
1) "redis"
2) "nosql"

6. Randomly get elements without removing them

SRANDMEMBER tags        # get 1 random element
SRANDMEMBER tags 3      # get 3 random elements without removing them

Output:

127.0.0.1:6379> SADD tags "redis"
(integer) 1
127.0.0.1:6379> SADD tags "nosql" "database"
(integer) 2
127.0.0.1:6379> SRANDMEMBER tags
"database"
127.0.0.1:6379> SRANDMEMBER tags 3
1) "redis"
2) "nosql"
3) "database"

7. Remove a specific element

SREM tags "nosql"

Output:

127.0.0.1:6379> SREM tags "nosql"
(integer) 1

8. Intersection

SINTER group:math group:english

Output:

127.0.0.1:6379> SADD group:math 1 3 5 7 9 2 4 5
(integer) 7
127.0.0.1:6379> SADD group:english 2 4 6 8 0
(integer) 5
127.0.0.1:6379> SINTER group:math group:english
1) "2"
2) "4"

9. Union

SUNION group:math group:english

Output:

127.0.0.1:6379> SUNION group:math group:english
 1) "0"
 2) "1"
 3) "2"
 4) "3"
 5) "4"
 6) "5"
 7) "6"
 8) "7"
 9) "8"
10) "9"

10. Difference

SDIFF group:math group:english

Output:

127.0.0.1:6379> SDIFF group:math group:english
1) "1"
2) "3"
3) "5"
4) "7"
5) "9"

11. Store the result of a set operation in a new set

SINTERSTORE common group:math group:english

Output:

127.0.0.1:6379> SINTERSTORE common group:math group:english
(integer) 2

127.0.0.1:6379> SMEMBERS common
1) "2"
2) "4"
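
The results of SINTER, SUNION, and SDIFF above can be cross-checked without a server. The sketch below replays the same data with Python's built-in set type, which is only a stand-in for the Redis keys; the variable names are made up for illustration.

```python
# Same members as the SADD calls in the demo; the duplicate 5 is dropped.
group_math = {1, 3, 5, 7, 9, 2, 4, 5}
group_english = {2, 4, 6, 8, 0}

common = group_math & group_english     # SINTER / SINTERSTORE
union = group_math | group_english      # SUNION
only_math = group_math - group_english  # SDIFF

print(len(group_math))    # 7
print(sorted(common))     # [2, 4]
print(sorted(union))      # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(sorted(only_math))  # [1, 3, 5, 7, 9]
```

The results are sorted only for display; Redis makes no ordering promise for set replies.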

Data encoding

A Redis Set supports three encodings:

1. Integer set (intset): used when the set contains only integers and has few elements

127.0.0.1:6379> sadd set1 1 2 3 0 -1
(integer) 5
127.0.0.1:6379> object encoding set1
"intset"

2. Compact list (listpack): used when the set is small but is not made up purely of integers

127.0.0.1:6379> sadd set1 abc
(integer) 1
127.0.0.1:6379> object encoding set1
"listpack"

3. Hash table (hashtable): used when the set is large or does not fit either of the first two encodings

for i in $(seq 1 200); do
    redis-cli SADD myset "element_$i"
done

127.0.0.1:6379> Object encoding myset
"hashtable"

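Encoding selection

When a Set is created, the encoding is chosen automatically from the type of the first value and the expected size: t_set.c:24-41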

/* Factory method to return a set that *can* hold "value". When the object has
 * an integer-encodable value, an intset will be returned. Otherwise a listpack
 * or a regular hash table.
 *
 * The size hint indicates approximately how many items will be added which is
 * used to determine the initial representation. */
robj *setTypeCreate(sds value, size_t size_hint) {
    if (isSdsRepresentableAsLongLong(value,NULL) == C_OK && size_hint <= server.set_max_intset_entries)
        return createIntsetObject();
    if (size_hint <= server.set_max_listpack_entries)
        return createSetListpackObject();

    /* We may oversize the set by using the hint if the hint is not accurate,
     * but we will assume this is acceptable to maximize performance. */
    robj *o = createSetObject();
    dictExpand(o->ptr, size_hint);
    return o;
}

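1. Check whether the value is an integer and the expected size is small

if (isSdsRepresentableAsLongLong(value,NULL) == C_OK &&
    size_hint <= server.set_max_intset_entries)
  • Only the value passed in is checked. If it can be converted to a long long, the set may turn out to be a pure integer set.

  • The estimated number of elements must also be within the configured threshold (512 by default).

  • If both hold, the intset encoding is used.

Advantage: compact storage, and lookups use binary search, so performance is good.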

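2. If the value is a string and the expected size is still small, use listpack

if (size_hint <= server.set_max_listpack_entries)
  • Redis 7.2 added listpack support for Sets (a compact, sequentially stored structure that saves memory; it is not kept sorted).

  • If the integer condition for intset is not met but the expected size is within the configured value (128 by default), listpack is used.

Advantage: very high memory efficiency, suited to very small data sets.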

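3. Otherwise use the dict (hash table) structure

robj *o = createSetObject();
dictExpand(o->ptr, size_hint);
  • Sets that are expected to be large go straight to dict.
  • dictExpand preallocates capacity from the hint, which avoids frequent rehashing later.

Advantage: the most stable insert/delete/lookup performance, suited to medium and large sets.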
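
The three branches of setTypeCreate can be condensed into a small decision function. This is a sketch under stated assumptions, not the Redis implementation: the thresholds are the default configuration values, and `is_int64_string` is a hypothetical, approximate stand-in for isSdsRepresentableAsLongLong.

```python
SET_MAX_INTSET_ENTRIES = 512    # default of set-max-intset-entries
SET_MAX_LISTPACK_ENTRIES = 128  # default of set-max-listpack-entries


def is_int64_string(value: str) -> bool:
    """Approximation: canonical decimal text of a signed 64-bit integer."""
    try:
        number = int(value)
    except ValueError:
        return False
    return str(number) == value and -2**63 <= number < 2**63


def choose_initial_encoding(first_value: str, size_hint: int) -> str:
    """Mirror the order of the checks in setTypeCreate."""
    if is_int64_string(first_value) and size_hint <= SET_MAX_INTSET_ENTRIES:
        return "intset"
    if size_hint <= SET_MAX_LISTPACK_ENTRIES:
        return "listpack"
    return "hashtable"


print(choose_initial_encoding("1", 5))      # intset
print(choose_initial_encoding("abc", 1))    # listpack
print(choose_initial_encoding("abc", 200))  # hashtable
print(choose_initial_encoding("1", 600))    # hashtable
```

The last call shows that an integer first value does not guarantee an intset: a size hint above 512 also exceeds the listpack limit, so the set starts as a hashtable.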

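Configuration

Option                     Default   Purpose
set-max-intset-entries     512       Largest member count an intset may reach before it is converted to a hashtable
set-max-listpack-entries   128       Largest member count a listpack may reach before it is converted to a hashtable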

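Automatic encoding conversion

As the contents of a set change, Redis converts between encodings automatically:

intset conversion conditions

  • When a non-integer element is added, the set is converted to listpack or hashtable
  • When the number of elements exceeds server.set_max_intset_entries, the set is converted to hashtable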
/* Converts intset to HT if it contains too many entries. */
static void maybeConvertIntset(robj *subject) {
    serverAssert(subject->encoding == OBJ_ENCODING_INTSET);
    if (intsetLen(subject->ptr) > intsetMaxEntries())
        setTypeConvert(subject,OBJ_ENCODING_HT);
}
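
An intset keeps its members in a sorted array, which is why lookups can use binary search and why the length check above is all that is needed to decide on a conversion. The sketch below models that with a sorted Python list and the standard bisect module; `intset_add` and `needs_hashtable` are hypothetical names, and the threshold is the default of set-max-intset-entries.

```python
import bisect

SET_MAX_INTSET_ENTRIES = 512  # default of set-max-intset-entries


def intset_add(values: list, number: int) -> bool:
    """Insert number into the sorted list; return False if already present."""
    pos = bisect.bisect_left(values, number)
    if pos < len(values) and values[pos] == number:
        return False
    values.insert(pos, number)
    return True


def needs_hashtable(values: list) -> bool:
    """Same comparison as the length check in maybeConvertIntset."""
    return len(values) > SET_MAX_INTSET_ENTRIES


members = []
for number in (3, 1, 2, 0, -1, 3):
    intset_add(members, number)
print(members)  # [-1, 0, 1, 2, 3]
```

Finding the slot is a binary search, but the insert still has to shift the tail of the array. That cost is only acceptable for small sets, which is one reason for capping the entry count.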

listpack conversion conditions

  • When the set already holds server.set_max_listpack_entries elements and another one is added
  • When the length of the new element exceeds server.set_max_listpack_value

if (lpLength(lp) < server.set_max_listpack_entries &&
    len <= server.set_max_listpack_value &&
    lpSafeToAdd(lp, len))
{
    if (str == tmpbuf) {
        /* This came in as integer so we can avoid parsing it again.
         * TODO: Create and use lpFindInteger; don't go via string. */
        lp = lpAppendInteger(lp, llval);
    } else {
        lp = lpAppend(lp, (unsigned char*)str, len);
    }
    set->ptr = lp;
} else {
    /* Size limit is reached. Convert to hashtable and add. */
    setTypeConvertAndExpand(set, OBJ_ENCODING_HT, lpLength(lp) + 1, 1);
    serverAssert(dictAdd(set->ptr,sdsnewlen(str,len),NULL) == DICT_OK);
}

Conversion implementation

The conversion logic is implemented by the setTypeConvert and setTypeConvertAndExpand functions
t_set.c:475-542

/* Convert the set to specified encoding. The resulting dict (when converting
 * to a hash table) is presized to hold the number of elements in the original
 * set. */
void setTypeConvert(robj *setobj, int enc) {
    setTypeConvertAndExpand(setobj, enc, setTypeSize(setobj), 1);
}

/* Converts a set to the specified encoding, pre-sizing it for 'cap' elements.
 * The 'panic' argument controls whether to panic on OOM (panic=1) or return
 * C_ERR on OOM (panic=0). If panic=1 is given, this function always returns
 * C_OK. */
int setTypeConvertAndExpand(robj *setobj, int enc, unsigned long cap, int panic) {
    setTypeIterator *si;
    serverAssertWithInfo(NULL,setobj,setobj->type == OBJ_SET &&
                             setobj->encoding != enc);

    if (enc == OBJ_ENCODING_HT) {
        dict *d = dictCreate(&setDictType);
        sds element;

        /* Presize the dict to avoid rehashing */
        if (panic) {
            dictExpand(d, cap);
        } else if (dictTryExpand(d, cap) != DICT_OK) {
            dictRelease(d);
            return C_ERR;
        }

        /* To add the elements we extract integers and create redis objects */
        si = setTypeInitIterator(setobj);
        while ((element = setTypeNextObject(si)) != NULL) {
            serverAssert(dictAdd(d,element,NULL) == DICT_OK);
        }
        setTypeReleaseIterator(si);

        freeSetObject(setobj); /* frees the internals but not setobj itself */
        setobj->encoding = OBJ_ENCODING_HT;
        setobj->ptr = d;
    } else if (enc == OBJ_ENCODING_LISTPACK) {
        /* Preallocate the minimum two bytes per element (enc/value + backlen) */
        size_t estcap = cap * 2;
        if (setobj->encoding == OBJ_ENCODING_INTSET && setTypeSize(setobj) > 0) {
            /* If we're converting from intset, we have a better estimate. */
            size_t s1 = lpEstimateBytesRepeatedInteger(intsetMin(setobj->ptr), cap);
            size_t s2 = lpEstimateBytesRepeatedInteger(intsetMax(setobj->ptr), cap);
            estcap = max(s1, s2);
        }
        unsigned char *lp = lpNew(estcap);
        char *str;
        size_t len;
        int64_t llele;
        si = setTypeInitIterator(setobj);
        while (setTypeNext(si, &str, &len, &llele) != -1) {
            if (str != NULL)
                lp = lpAppend(lp, (unsigned char *)str, len);
            else
                lp = lpAppendInteger(lp, llele);
        }
        setTypeReleaseIterator(si);

        freeSetObject(setobj); /* frees the internals but not setobj itself */
        setobj->encoding = OBJ_ENCODING_LISTPACK;
        setobj->ptr = lp;
    } else {
        serverPanic("Unsupported set conversion");
    }
    return C_OK;
}
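
Parameter   Meaning
setobj      The set object to convert (robj *)
enc         Target encoding (e.g. OBJ_ENCODING_HT)
cap         Estimated number of elements (used to presize the dict or to estimate the listpack size)
panic       Whether to panic when memory runs out (1 means an OOM aborts the server)
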
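1. Converting to the hash table encoding (OBJ_ENCODING_HT)
  1. Create an empty dict

    dict *d = dictCreate(&setDictType);

    • Build an empty dict with setDictType (which defines the comparison function, hash function, and so on).
  2. Presize it (to avoid rehashing later)

    if (panic) {
        dictExpand(d, cap);
    } else if (dictTryExpand(d, cap) != DICT_OK) {
        dictRelease(d);
        return C_ERR;
    }

    • In panic mode, dictExpand is called directly and a failed allocation crashes the server.
    • Otherwise dictTryExpand is used, and a failure returns an error code gracefully.
  3. Iterate over the original set and insert into the dict

    si = setTypeInitIterator(setobj);
    while ((element = setTypeNextObject(si)) != NULL) {
        serverAssert(dictAdd(d,element,NULL) == DICT_OK);
    }
    setTypeReleaseIterator(si);

    • A setTypeIterator walks the elements of the original set and inserts them one by one into the new dict.
  4. Replace the original data structure

    freeSetObject(setobj); // frees the internals of the original structure but keeps the robj itself
    setobj->encoding = OBJ_ENCODING_HT;
    setobj->ptr = d;
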
2. Converting to listpack (OBJ_ENCODING_LISTPACK)
size_t estcap = cap * 2;
  • Initial estimate of the listpack capacity: the minimum of two bytes per element, so this is a lower bound.
  1. Refine the estimate (if the source is an intset)

    if (setobj->encoding == OBJ_ENCODING_INTSET && setTypeSize(setobj) > 0) {
        size_t s1 = lpEstimateBytesRepeatedInteger(intsetMin(setobj->ptr), cap);
        size_t s2 = lpEstimateBytesRepeatedInteger(intsetMax(setobj->ptr), cap);
        estcap = max(s1, s2);
    }

    • intsetMin and intsetMax are used to estimate the encoded size of the integers once they are moved into a listpack.
  2. Create an empty listpack and insert the data

    unsigned char *lp = lpNew(estcap);
    si = setTypeInitIterator(setobj);
    while (setTypeNext(si, &str, &len, &llele) != -1) {
        if (str != NULL)
            lp = lpAppend(lp, (unsigned char *)str, len);
        else
            lp = lpAppendInteger(lp, llele);
    }
    setTypeReleaseIterator(si);

    • For a string, lpAppend is called

    • For an integer, lpAppendInteger is called

  3. Replace the original structure

    freeSetObject(setobj);
    setobj->encoding = OBJ_ENCODING_LISTPACK;
    setobj->ptr = lp;
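
The panic argument only changes what happens when presizing fails. The sketch below models that control flow for the hash table branch. Everything in it is hypothetical: `alloc_limit` simulates an allocator that refuses large requests, `OutOfMemory` stands in for the server panic, and a Python set plays the role of the dict.

```python
class OutOfMemory(Exception):
    """Stand-in for the server panicking on a failed allocation."""


def convert_to_hashtable(elements, cap, panic, alloc_limit):
    """Return (ok, table), following the panic / try-expand split."""
    if cap > alloc_limit:  # simulated presize failure
        if panic:
            raise OutOfMemory(f"cannot presize for {cap} elements")
        return False, None  # like returning C_ERR
    table = set()
    for element in elements:
        key = str(element)  # integers are stored as strings in the dict
        assert key not in table  # mirrors serverAssert(dictAdd(...) == DICT_OK)
        table.add(key)
    return True, table


ok, table = convert_to_hashtable([1, 2, "abc"], cap=3, panic=False, alloc_limit=10)
print(ok, sorted(table))  # True ['1', '2', 'abc']
print(convert_to_hashtable([1], cap=100, panic=False, alloc_limit=10))  # (False, None)
```

With panic=1 the caller never has to check the return value, which is how setTypeConvert uses it. With panic=0 a caller can decline an oversized request without taking the server down.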
    

Implementation of the main operations

1. Adding elements (SADD)

t_set.c:588-613

void saddCommand(client *c) {
    robj *set;
    int j, added = 0;

    set = lookupKeyWrite(c->db,c->argv[1]);
    if (checkType(c,set,OBJ_SET)) return;
    
    if (set == NULL) {
        set = setTypeCreate(c->argv[2]->ptr, c->argc - 2);
        dbAdd(c->db,c->argv[1],set);
    } else {
        setTypeMaybeConvert(set, c->argc - 2);
    }

    for (j = 2; j < c->argc; j++) {
        if (setTypeAdd(set,c->argv[j]->ptr)) added++;
    }
    if (added) {
        unsigned long size = setTypeSize(set);
        updateKeysizesHist(c->db, getKeySlot(c->argv[1]->ptr), OBJ_SET, size - added, size);
        signalModifiedKey(c,c->db,c->argv[1]);
        notifyKeyspaceEvent(NOTIFY_SET,"sadd",c->argv[1],c->db->id);
    }
    server.dirty += added;
    addReplyLongLong(c,added);
}

t_set.c:95-213

/* Add the specified sds value into a set.
 *
 * If the value was already member of the set, nothing is done and 0 is
 * returned, otherwise the new element is added and 1 is returned. */
int setTypeAdd(robj *subject, sds value) {
    return setTypeAddAux(subject, value, sdslen(value), 0, 1);
}

/* Add member. This function is optimized for the different encodings. The
 * value can be provided as an sds string (indicated by passing str_is_sds =
 * 1), as string and length (str_is_sds = 0) or as an integer in which case str
 * is set to NULL and llval is provided instead.
 *
 * Returns 1 if the value was added and 0 if it was already a member. */
int setTypeAddAux(robj *set, char *str, size_t len, int64_t llval, int str_is_sds) {
    char tmpbuf[LONG_STR_SIZE];
    if (!str) {
        if (set->encoding == OBJ_ENCODING_INTSET) {
            uint8_t success = 0;
            set->ptr = intsetAdd(set->ptr, llval, &success);
            if (success) maybeConvertIntset(set);
            return success;
        }
        /* Convert int to string. */
        len = ll2string(tmpbuf, sizeof tmpbuf, llval);
        str = tmpbuf;
        str_is_sds = 0;
    }

    serverAssert(str);
    if (set->encoding == OBJ_ENCODING_HT) {
        /* Avoid duping the string if it is an sds string. */
        sds sdsval = str_is_sds ? (sds)str : sdsnewlen(str, len);
        dict *ht = set->ptr;
        void *position = dictFindPositionForInsert(ht, sdsval, NULL);
        if (position) {
            /* Key doesn't already exist in the set. Add it but dup the key. */
            if (sdsval == str) sdsval = sdsdup(sdsval);
            dictInsertAtPosition(ht, sdsval, position);
        } else if (sdsval != str) {
            /* String is already a member. Free our temporary sds copy. */
            sdsfree(sdsval);
        }
        return (position != NULL);
    } else if (set->encoding == OBJ_ENCODING_LISTPACK) {
        unsigned char *lp = set->ptr;
        unsigned char *p = lpFirst(lp);
        if (p != NULL)
            p = lpFind(lp, p, (unsigned char*)str, len, 0);
        if (p == NULL) {
            /* Not found.  */
            if (lpLength(lp) < server.set_max_listpack_entries &&
                len <= server.set_max_listpack_value &&
                lpSafeToAdd(lp, len))
            {
                if (str == tmpbuf) {
                    /* This came in as integer so we can avoid parsing it again.
                     * TODO: Create and use lpFindInteger; don't go via string. */
                    lp = lpAppendInteger(lp, llval);
                } else {
                    lp = lpAppend(lp, (unsigned char*)str, len);
                }
                set->ptr = lp;
            } else {
                /* Size limit is reached. Convert to hashtable and add. */
                setTypeConvertAndExpand(set, OBJ_ENCODING_HT, lpLength(lp) + 1, 1);
                serverAssert(dictAdd(set->ptr,sdsnewlen(str,len),NULL) == DICT_OK);
            }
            return 1;
        }
    } else if (set->encoding == OBJ_ENCODING_INTSET) {
        long long value;
        if (string2ll(str, len, &value)) {
            uint8_t success = 0;
            set->ptr = intsetAdd(set->ptr,value,&success);
            if (success) {
                maybeConvertIntset(set);
                return 1;
            }
        } else {
            /* Check if listpack encoding is safe not to cross any threshold. */
            size_t maxelelen = 0, totsize = 0;
            unsigned long n = intsetLen(set->ptr);
            if (n != 0) {
                size_t elelen1 = sdigits10(intsetMax(set->ptr));
                size_t elelen2 = sdigits10(intsetMin(set->ptr));
                maxelelen = max(elelen1, elelen2);
                size_t s1 = lpEstimateBytesRepeatedInteger(intsetMax(set->ptr), n);
                size_t s2 = lpEstimateBytesRepeatedInteger(intsetMin(set->ptr), n);
                totsize = max(s1, s2);
            }
            if (intsetLen((const intset*)set->ptr) < server.set_max_listpack_entries &&
                len <= server.set_max_listpack_value &&
                maxelelen <= server.set_max_listpack_value &&
                lpSafeToAdd(NULL, totsize + len))
            {
                /* In the "safe to add" check above we assumed all elements in
                 * the intset are of size maxelelen. This is an upper bound. */
                setTypeConvertAndExpand(set, OBJ_ENCODING_LISTPACK,
                                        intsetLen(set->ptr) + 1, 1);
                unsigned char *lp = set->ptr;
                lp = lpAppend(lp, (unsigned char *)str, len);
                lp = lpShrinkToFit(lp);
                set->ptr = lp;
                return 1;
            } else {
                setTypeConvertAndExpand(set, OBJ_ENCODING_HT,
                                        intsetLen(set->ptr) + 1, 1);
                /* The set *was* an intset and this value is not integer
                 * encodable, so dictAdd should always work. */
                serverAssert(dictAdd(set->ptr,sdsnewlen(str,len),NULL) == DICT_OK);
                return 1;
            }
        }
    } else {
        serverPanic("Unknown set encoding");
    }
    return 0;
}

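1: Inserting in integer form (str == NULL)

if (!str) {
    if (set->encoding == OBJ_ENCODING_INTSET) {
        ...
        return success;
    }
    // otherwise convert the integer to a string
    len = ll2string(tmpbuf, sizeof tmpbuf, llval);
    str = tmpbuf;
    str_is_sds = 0;
}
  • If the value is passed as an integer (no string), an intset insert is tried first.
  • If the set is not an intset, the integer is converted to a string and falls through to the common logic below.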

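2: The current encoding is hashtable (OBJ_ENCODING_HT)

sds sdsval = str_is_sds ? (sds)str : sdsnewlen(str, len);
dict *ht = set->ptr;
void *position = dictFindPositionForInsert(ht, sdsval, NULL);
if (position) {
    ...
}
  • Build sdsval, reusing str when it is already an sds so that no extra copy is made for the lookup.
  • If the key does not exist yet (position != NULL), insert the new element.
  • Otherwise the element already exists; if sdsval was a temporary sds, it is freed.

Return value: whether the insert succeeded (that is, whether the element was new).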

3: The current encoding is listpack

unsigned char *lp = set->ptr;
unsigned char *p = lpFirst(lp);
if (p != NULL)
    p = lpFind(lp, p, (unsigned char*)str, len, 0);

  • Look the element up in the listpack.
  • If it is not there, try to insert it.
if (lpLength(lp) < server.set_max_listpack_entries &&
    len <= server.set_max_listpack_value &&
    lpSafeToAdd(lp, len))
{
    ...
} else {
    setTypeConvertAndExpand(set, OBJ_ENCODING_HT, lpLength(lp) + 1, 1);
    ...
}
  • If the listpack is still within its limits, insert into it.
  • Otherwise convert to hashtable and then insert.

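4: The current encoding is intset and the value arrives as a string

if (string2ll(str, len, &value)) {
    ...
} else {
    ...
}
  • Try to convert the string to an integer. On success, follow the intset insert path.
  • Otherwise, the element length and the set size decide whether the set can be converted to listpack; if not, it is converted straight to hashtable.
if (intsetLen((const intset*)set->ptr) < server.set_max_listpack_entries &&
    len <= server.set_max_listpack_value &&
    maxelelen <= server.set_max_listpack_value &&
    lpSafeToAdd(NULL, totsize + len))
{
    setTypeConvertAndExpand(set, OBJ_ENCODING_LISTPACK, ...);
    ...
} else {
    setTypeConvertAndExpand(set, OBJ_ENCODING_HT, ...);
    ...
}
  • Conditions for converting to listpack:
    • The current number of elements is below the entry limit
    • Neither the new element nor the longest existing integer exceeds the per-element length limit of listpack
    • The addition is safe: lpSafeToAdd accepts the estimated total size, so the listpack stays within its size limit
  • If any condition fails, convert to hashtable.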
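
Putting the four branches together, the encoding of a set only ever moves from intset to listpack or hashtable, and from listpack to hashtable. The model below is a simplified sketch of those transitions, not the real data structures. Its assumptions: members are kept in one Python set of strings, the thresholds are the default configuration values, the lpSafeToAdd size check is left out, and `ModelSet` and `_is_int` are hypothetical names. An empty model starts as intset, which gives the same outcome as setTypeCreate for small size hints.

```python
SET_MAX_INTSET_ENTRIES = 512    # default of set-max-intset-entries
SET_MAX_LISTPACK_ENTRIES = 128  # default of set-max-listpack-entries
SET_MAX_LISTPACK_VALUE = 64     # default of set-max-listpack-value


def _is_int(value: str) -> bool:
    """Rough stand-in for string2ll: canonical signed 64-bit decimal only."""
    try:
        number = int(value)
    except ValueError:
        return False
    return str(number) == value and -2**63 <= number < 2**63


class ModelSet:
    def __init__(self):
        self.encoding = "intset"
        self.members = set()  # every member is stored as a string

    def add(self, value: str) -> int:
        """Return 1 if value was added, 0 if it was already a member."""
        if value in self.members:
            return 0
        if self.encoding == "intset":
            if _is_int(value):
                self.members.add(value)
                if len(self.members) > SET_MAX_INTSET_ENTRIES:
                    self.encoding = "hashtable"
                return 1
            # Non-integer value: the intset has to be converted first.
            longest = max((len(m) for m in self.members), default=0)
            if (len(self.members) < SET_MAX_LISTPACK_ENTRIES
                    and len(value) <= SET_MAX_LISTPACK_VALUE
                    and longest <= SET_MAX_LISTPACK_VALUE):
                self.encoding = "listpack"
            else:
                self.encoding = "hashtable"
        elif self.encoding == "listpack":
            if not (len(self.members) < SET_MAX_LISTPACK_ENTRIES
                    and len(value) <= SET_MAX_LISTPACK_VALUE):
                self.encoding = "hashtable"
        self.members.add(value)
        return 1


set1 = ModelSet()
for v in ("1", "2", "3", "0", "-1"):
    set1.add(v)
print(set1.encoding)  # intset
set1.add("abc")
print(set1.encoding)  # listpack

myset = ModelSet()
for i in range(1, 201):
    myset.add(f"element_{i}")
print(myset.encoding)  # hashtable
```

The usage lines replay the set1 and myset examples from the data encoding section. In this model myset switches to hashtable when the 129th member is added, because the listpack already holds 128 entries at that point.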

2. Removing elements (SREM)

t_set.c:615-645

void sremCommand(client *c) {
    robj *set;
    int j, deleted = 0, keyremoved = 0;

    if ((set = lookupKeyWriteOrReply(c,c->argv[1],shared.czero)) == NULL ||
        checkType(c,set,OBJ_SET)) return;

    unsigned long oldSize = setTypeSize(set);

    for (j = 2; j < c->argc; j++) {
        if (setTypeRemove(set,c->argv[j]->ptr)) {
            deleted++;
            if (setTypeSize(set) == 0) {
                dbDelete(c->db,c->argv[1]);
                keyremoved = 1;
                break;
            }
        }
    }
    if (deleted) {
        
        updateKeysizesHist(c->db, getKeySlot(c->argv[1]->ptr), OBJ_SET, oldSize, oldSize - deleted);
        signalModifiedKey(c,c->db,c->argv[1]);
        notifyKeyspaceEvent(NOTIFY_SET,"srem",c->argv[1],c->db->id);
        if (keyremoved)
            notifyKeyspaceEvent(NOTIFY_GENERIC,"del",c->argv[1],
                                c->db->id);
        server.dirty += deleted;
    }
    addReplyLongLong(c,deleted);
}

t_set.c:215-266

/* Deletes a value provided as an sds string from the set. Returns 1 if the
 * value was deleted and 0 if it was not a member of the set. */
int setTypeRemove(robj *setobj, sds value) {
    return setTypeRemoveAux(setobj, value, sdslen(value), 0, 1);
}

/* Remove a member. This function is optimized for the different encodings. The
 * value can be provided as an sds string (indicated by passing str_is_sds =
 * 1), as string and length (str_is_sds = 0) or as an integer in which case str
 * is set to NULL and llval is provided instead.
 *
 * Returns 1 if the value was deleted and 0 if it was not a member of the set. */
int setTypeRemoveAux(robj *setobj, char *str, size_t len, int64_t llval, int str_is_sds) {
    char tmpbuf[LONG_STR_SIZE];
    if (!str) {
        if (setobj->encoding == OBJ_ENCODING_INTSET) {
            int success;
            setobj->ptr = intsetRemove(setobj->ptr,llval,&success);
            return success;
        }
        len = ll2string(tmpbuf, sizeof tmpbuf, llval);
        str = tmpbuf;
        str_is_sds = 0;
    }

    if (setobj->encoding == OBJ_ENCODING_HT) {
        sds sdsval = str_is_sds ? (sds)str : sdsnewlen(str, len);
        int deleted = (dictDelete(setobj->ptr, sdsval) == DICT_OK);
        if (sdsval != str) sdsfree(sdsval); /* free temp copy */
        return deleted;
    } else if (setobj->encoding == OBJ_ENCODING_LISTPACK) {
        unsigned char *lp = setobj->ptr;
        unsigned char *p = lpFirst(lp);
        if (p == NULL) return 0;
        p = lpFind(lp, p, (unsigned char*)str, len, 0);
        if (p != NULL) {
            lp = lpDelete(lp, p, NULL);
            setobj->ptr = lp;
            return 1;
        }
    } else if (setobj->encoding == OBJ_ENCODING_INTSET) {
        long long llval;
        if (string2ll(str, len, &llval)) {
            int success;
            setobj->ptr = intsetRemove(setobj->ptr,llval,&success);
            if (success) return 1;
        }
    } else {
        serverPanic("Unknown set encoding");
    }
    return 0;
}

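1. Handle the str == NULL case (the value is an integer)

if (!str) {
    if (setobj->encoding == OBJ_ENCODING_INTSET) {
        ...
        return success;
    }
    len = ll2string(tmpbuf, sizeof tmpbuf, llval);
    str = tmpbuf;
    str_is_sds = 0;
}
  • If the value is passed as an integer (no string):
    • If the encoding is intset, intsetRemove is called directly.
    • Otherwise the integer is converted to a string (stored in the temporary tmpbuf) and falls through to the common logic below.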

2. The encoding is hashtable (OBJ_ENCODING_HT)

sds sdsval = str_is_sds ? (sds)str : sdsnewlen(str, len);
int deleted = (dictDelete(setobj->ptr, sdsval) == DICT_OK);
if (sdsval != str) sdsfree(sdsval); /* free temp copy */
return deleted;
  • Build the key as an sds.
  • Call dictDelete to remove the key from the dict.
  • If a temporary sds was created, free it.
  • Return whether the delete succeeded (1 if the member was removed, 0 if it was not in the set).

3. The encoding is listpack

unsigned char *lp = setobj->ptr;
unsigned char *p = lpFirst(lp);
if (p == NULL) return 0;
p = lpFind(lp, p, (unsigned char*)str, len, 0);
if (p != NULL) {
    lp = lpDelete(lp, p, NULL);
    setobj->ptr = lp;
    return 1;
}
  • Scan the listpack for the element.
  • If it is found, remove it with lpDelete.
  • Update the object's pointer and return success.

4. The encoding is intset (the value arrives as a string)

long long llval;
if (string2ll(str, len, &llval)) {
    ...
}
  • Try to parse the string as an integer.
  • On success, call intsetRemove.
  • Return 1 if the delete succeeded.
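
At the command level, sremCommand adds one rule on top of setTypeRemove: once the last member is gone, the key itself is deleted and the loop stops. A minimal sketch of that behaviour, with a Python dict standing in for the database (the `srem` helper is hypothetical):

```python
def srem(db: dict, key: str, *members: str) -> int:
    """Remove members from db[key]; delete the key once the set is empty."""
    target = db.get(key)
    if target is None:
        return 0  # missing key: reply 0, like shared.czero
    deleted = 0
    for member in members:
        if member in target:
            target.remove(member)
            deleted += 1
            if not target:
                del db[key]  # like dbDelete: an empty set is never kept
                break
    return deleted


db = {"tags": {"redis", "nosql", "database"}}
print(srem(db, "tags", "nosql"))                       # 1
print(srem(db, "tags", "missing"))                     # 0
print(srem(db, "tags", "redis", "database", "extra"))  # 2
print("tags" in db)                                    # False
```

Stopping early is safe because any remaining arguments cannot be members of a set that no longer exists.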

3. Checking membership (SISMEMBER)

t_set.c:708-718

void sismemberCommand(client *c) {
    robj *set;

    if ((set = lookupKeyReadOrReply(c,c->argv[1],shared.czero)) == NULL ||
        checkType(c,set,OBJ_SET)) return;

    if (setTypeIsMember(set,c->argv[2]->ptr))
        addReply(c,shared.cone);
    else
        addReply(c,shared.czero);
}

t_set.c:268-307

/* Check if an sds string is a member of the set. Returns 1 if the value is a
 * member of the set and 0 if it isn't. */
int setTypeIsMember(robj *subject, sds value) {
    return setTypeIsMemberAux(subject, value, sdslen(value), 0, 1);
}


/* Membership checking optimized for the different encodings. The value can be
 * provided as an sds string (indicated by passing str_is_sds = 1), as string
 * and length (str_is_sds = 0) or as an integer in which case str is set to NULL
 * and llval is provided instead.
 *
 * Returns 1 if the value is a member of the set and 0 if it isn't. */
int setTypeIsMemberAux(robj *set, char *str, size_t len, int64_t llval, int str_is_sds) {
    char tmpbuf[LONG_STR_SIZE];
    if (!str) {
        if (set->encoding == OBJ_ENCODING_INTSET)
            return intsetFind(set->ptr, llval);
        len = ll2string(tmpbuf, sizeof tmpbuf, llval);
        str = tmpbuf;
        str_is_sds = 0;
    }


    if (set->encoding == OBJ_ENCODING_LISTPACK) {
        unsigned char *lp = set->ptr;
        unsigned char *p = lpFirst(lp);
        return p && lpFind(lp, p, (unsigned char*)str, len, 0);
    } else if (set->encoding == OBJ_ENCODING_INTSET) {
        long long llval;
        return string2ll(str, len, &llval) && intsetFind(set->ptr, llval);
    } else if (set->encoding == OBJ_ENCODING_HT && str_is_sds) {
        return dictFind(set->ptr, (sds)str) != NULL;
    } else if (set->encoding == OBJ_ENCODING_HT) {
        sds sdsval = sdsnewlen(str, len);
        int result = dictFind(set->ptr, sdsval) != NULL;
        sdsfree(sdsval);
        return result;
    } else {
        serverPanic("Unknown set encoding");
    }
}

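1. Handle the str == NULL case (the value is an integer)

if (!str) {
    if (set->encoding == OBJ_ENCODING_INTSET)
        return intsetFind(set->ptr, llval);
    len = ll2string(tmpbuf, sizeof tmpbuf, llval);
    str = tmpbuf;
    str_is_sds = 0;
}
  • If the value is passed as an integer:
    • For the intset encoding, look it up directly with llval.
    • Otherwise convert the integer to a string and continue with the common logic in the next step.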

2. Handle each encoding

OBJ_ENCODING_LISTPACK
unsigned char *lp = set->ptr;
unsigned char *p = lpFirst(lp);
return p && lpFind(lp, p, (unsigned char*)str, len, 0);
  • Get the first element p, then use lpFind to search for the target string.
  • Returns non-zero if found, 0 otherwise.

OBJ_ENCODING_INTSET
long long llval;
return string2ll(str, len, &llval) && intsetFind(set->ptr, llval);
  • Try to convert the string to an integer.
  • On success, use intsetFind to check whether that integer is present.

OBJ_ENCODING_HT, and str is already an sds
return dictFind(set->ptr, (sds)str) != NULL;
  • Call dictFind directly with the existing sds pointer, avoiding a copy.

OBJ_ENCODING_HT, but str is not an sds
sds sdsval = sdsnewlen(str, len);
int result = dictFind(set->ptr, sdsval) != NULL;
sdsfree(sdsval);
return result;
  • Create a temporary sds object sdsval.
  • Use dictFind to check whether it exists.
  • Free the temporary object once the lookup is done.
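
The per-encoding dispatch can be modelled in a few lines. In this sketch a Python list of strings stands in for a listpack (linear scan), a sorted list of integers for an intset (binary search), and a Python set for the hash table. The `is_member` function is hypothetical, and the `str(number) != value` test is only an approximation of how strict string2ll is about its input.

```python
import bisect


def is_member(encoding: str, data, value: str) -> bool:
    """Membership test that follows the same branches as setTypeIsMemberAux."""
    if encoding == "listpack":
        return value in data  # linear scan over the entries
    if encoding == "intset":
        try:
            number = int(value)
        except ValueError:
            return False  # a non-integer can never be in an intset
        if str(number) != value:
            return False  # reject non-canonical forms such as "02"
        pos = bisect.bisect_left(data, number)
        return pos < len(data) and data[pos] == number
    if encoding == "hashtable":
        return value in data  # hash lookup
    raise ValueError("unknown set encoding")


print(is_member("intset", [1, 2, 3], "2"))           # True
print(is_member("intset", [1, 2, 3], "02"))          # False
print(is_member("listpack", ["1", "abc"], "abc"))    # True
print(is_member("hashtable", {"redis"}, "nosql"))    # False
```

For an intset the string is parsed before any search happens, so a value that is not a valid integer is rejected without touching the array.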

4. Popping a random element (SPOP)

t_set.c:965-1006

void spopCommand(client *c) {
    unsigned long size;
    robj *set, *ele;

    if (c->argc == 3) {
        spopWithCountCommand(c);
        return;
    } else if (c->argc > 3) {
        addReplyErrorObject(c,shared.syntaxerr);
        return;
    }

    /* Make sure a key with the name inputted exists, and that it's type is
     * indeed a set */
    if ((set = lookupKeyWriteOrReply(c,c->argv[1],shared.null[c->resp]))
         == NULL || checkType(c,set,OBJ_SET)) return;

    size = setTypeSize(set);
    updateKeysizesHist(c->db, getKeySlot(c->argv[1]->ptr), OBJ_SET, size, size-1);

    /* Pop a random element from the set */
    ele = setTypePopRandom(set);

    notifyKeyspaceEvent(NOTIFY_SET,"spop",c->argv[1],c->db->id);

    /* Replicate/AOF this command as an SREM operation */
    rewriteClientCommandVector(c,3,shared.srem,c->argv[1],ele);

    /* Add the element to the reply */
    addReplyBulk(c, ele);
    decrRefCount(ele);

    /* Delete the set if it's empty */
    if (setTypeSize(set) == 0) {
        dbDelete(c->db,c->argv[1]);
        notifyKeyspaceEvent(NOTIFY_GENERIC,"del",c->argv[1],c->db->id);
    }

    /* Set has been modified */
    signalModifiedKey(c,c->db,c->argv[1]);
    server.dirty++;
}

t_set.c:434-461

/* Pops a random element and returns it as an object. */
robj *setTypePopRandom(robj *set) {
    robj *obj;
    if (set->encoding == OBJ_ENCODING_LISTPACK) {
        /* Find random and delete it without re-seeking the listpack. */
        unsigned int i = 0;
        unsigned char *p = lpNextRandom(set->ptr, lpFirst(set->ptr), &i, 1, 1);
        unsigned int len = 0; /* initialize to silence warning */
        long long llele = 0; /* initialize to silence warning */
        char *str = (char *)lpGetValue(p, &len, &llele);
        if (str)
            obj = createStringObject(str, len);
        else
            obj = createStringObjectFromLongLong(llele);
        set->ptr = lpDelete(set->ptr, p, NULL);
    } else {
        char *str;
        size_t len = 0;
        int64_t llele = 0;
        int encoding = setTypeRandomElement(set, &str, &len, &llele);
        if (str)
            obj = createStringObject(str, len);
        else
            obj = createStringObjectFromLongLong(llele);
        setTypeRemoveAux(set, str, len, llele, encoding == OBJ_ENCODING_HT);
    }
    return obj;
}

1. OBJ_ENCODING_LISTPACK

unsigned char *p = lpNextRandom(set->ptr, lpFirst(set->ptr), &i, 1, 1);
  • lpFirst(set->ptr): gets the first entry in the listpack.
  • lpNextRandom(...): starting from that position, returns the position p of a randomly chosen element.
char *str = (char *)lpGetValue(p, &len, &llele);
  • Read the content of the random element p (it may be a string or an integer):
    • String: str != NULL
    • Integer: str == NULL, and the value is stored in llele
if (str)
    obj = createStringObject(str, len);
else
    obj = createStringObjectFromLongLong(llele);
  • Create a Redis string object according to the type.
set->ptr = lpDelete(set->ptr, p, NULL);
  • Remove the element from the listpack.

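2. Other encodings (hashtable, intset)

int encoding = setTypeRandomElement(set, &str, &len, &llele);
  • setTypeRandomElement picks a random element and returns the encoding of the set.
if (str)
    obj = createStringObject(str, len);
else
    obj = createStringObjectFromLongLong(llele);
  • Build the string object (again distinguishing string from integer).
setTypeRemoveAux(set, str, len, llele, encoding == OBJ_ENCODING_HT);
  • Call setTypeRemoveAux to delete the element (it handles intset, hashtable, and so on). The last argument marks str as an sds only when the set is hashtable-encoded.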
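
Because the popped member is chosen at random, spopCommand rewrites the command to SREM before it is propagated, so replicas and the AOF remove the same member. The sketch below models the single-element case. The `spop` helper, the dict used as the database, and the returned command list are hypothetical; only the remove, rewrite, and delete-if-empty steps follow the code above.

```python
import random


def spop(db: dict, key: str, rng: random.Random):
    """Pop one random member; return (member, command to propagate)."""
    members = db.get(key)
    if members is None:
        return None, None  # missing key: null reply, nothing to propagate
    ele = rng.choice(sorted(members))  # sorted only to make seeded runs repeatable
    members.remove(ele)
    if not members:
        del db[key]  # the key is dropped once the set is empty
    return ele, ["SREM", key, ele]


rng = random.Random(0)
db = {"tags": {"redis", "nosql", "database"}}
ele, propagated = spop(db, "tags", rng)
print(propagated == ["SREM", "tags", ele])  # True
print(len(db["tags"]))                      # 2
```

Propagating SREM instead of SPOP keeps every copy of the data set identical even though each node would otherwise make its own random choice.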

Appendix

  1. Official documentation
  2. Runoob (菜鸟教程) tutorial: Set
posted @ 2025-05-01 02:54  Eiffelzero