Redis List
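- A Redis list is a simple list of strings kept in insertion order. You can add elements at the head (left) or at the tail (right).
- A list can hold up to 2^32 - 1 elements (4,294,967,295, more than 4 billion per list).
- A Redis list behaves like a double-ended linked list: elements can be inserted and removed at both the left (head) and the right (tail), so it can serve as a queue or as a stack.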
Demo
1. Insert at the left (head push)
LPUSH task_queue "task1"
LPUSH task_queue "task2"
- The list is now: task2 → task1
Result:
127.0.0.1:6379> LPUSH task_queue "task1"
(integer) 1
127.0.0.1:6379> LPUSH task_queue "task2"
(integer) 2
2. Insert at the right (tail push)
RPUSH task_queue "task3"
- The list is now: task2 → task1 → task3
Result:
127.0.0.1:6379> RPUSH task_queue "task3"
(integer) 3
3. Pop from the left (head pop)
LPOP task_queue
- Removes and returns task2
Result:
127.0.0.1:6379> LPOP task_queue
"task2"
4. Pop from the right (tail pop)
RPOP task_queue
- Removes and returns task3
Result:
127.0.0.1:6379> RPOP task_queue
"task3"
5. View all elements of the list
LRANGE task_queue 0 -1
- The range [start, end] is inclusive at both ends; 0 -1 selects every element
Result:
127.0.0.1:6379> LRANGE task_queue 0 -1
1) "task1"
6. Get the list length
LLEN task_queue
Result:
127.0.0.1:6379> LLEN task_queue
(integer) 1
7. Get an element by index
LINDEX task_queue 0
Result:
127.0.0.1:6379> LINDEX task_queue 0
"task1"
8. Set the value at an index
LSET task_queue 0 "task-new"
Result:
127.0.0.1:6379> LSET task_queue 0 "task-new"
OK
9. Remove elements by value (match and delete)
LREM task_queue 1 "task-new"
- Removes 1 element whose value is task-new
Result:
127.0.0.1:6379> LREM task_queue 1 "task-new"
(integer) 1
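10. Trim the list (keep only a range)
LTRIM task_queue 0 2
- Keeps the elements at indexes 0 to 2 and deletes everything else
Result (the list is empty after the previous step, so it is refilled first):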
127.0.0.1:6379> LPUSH task_queue "task1"
(integer) 1
127.0.0.1:6379> LPUSH task_queue "task2"
(integer) 2
127.0.0.1:6379> LPUSH task_queue "task3"
(integer) 3
127.0.0.1:6379> LPUSH task_queue "task4"
(integer) 4
127.0.0.1:6379> LTRIM task_queue 0 2
OK
127.0.0.1:6379> LRANGE task_queue 0 -1
1) "task4"
2) "task3"
3) "task2"
11. Blocking pop (waiting on a queue)
BLPOP task_queue 5
- If task_queue is empty, the call blocks for up to 5 seconds waiting for data
Result:
127.0.0.1:6379> BLPOP task_queue 5
1) "task_queue"
2) "task4"
Data encodings
- Listpack (older versions used ziplist; listpack is its successor, with a more compact layout and better performance)
- Quicklist
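Encoding conversion
- Initial state: listpack
When you create a new list in the Redis version this walkthrough follows, it starts out as a listpack, for example:
127.0.0.1:6379> LPUSH mylist "a"
(integer) 1
127.0.0.1:6379> OBJECT ENCODING mylist
"listpack"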
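- Conditions for automatic conversion to quicklist
Redis converts a listpack to a quicklist in the following cases:
- Condition 1: list-max-listpack-size is exceeded
- list-max-listpack-size is a Redis configuration option. A positive value caps the number of entries per listpack; a negative value selects a size cap (-1 = 4 KB, -2 = 8 KB, and so on up to -5 = 64 KB). The default is -2 (8 KB), although this may depend on the version.
- Once a single listpack's byte size or entry count exceeds the threshold, Redis triggers the conversion:
for i in {1..2000}; do redis-cli LPUSH mylist "abc"; done
127.0.0.1:6379> OBJECT ENCODING mylist
"quicklist"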
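- Condition 2: the elements about to be added would push the listpack over the threshold
- Even if the current listpack is still within the limit, Redis converts it to a quicklist up front when the pending elements would exceed it. The check adds the byte size and count of the incoming elements to the current totals before anything is inserted.
- Conversion function: t_list.c:22-25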
static void listTypeTryConvertListpack(robj *o, robj **argv, int start, int end,
                                       beforeConvertCB fn, void *data)
{
    serverAssert(o->encoding == OBJ_ENCODING_LISTPACK);

    size_t add_bytes = 0;
    size_t add_length = 0;

    if (argv) {
        for (int i = start; i <= end; i++) {
            if (!sdsEncodedObject(argv[i]))
                continue;
            add_bytes += sdslen(argv[i]->ptr);
        }
        add_length = end - start + 1;
    }

    if (quicklistNodeExceedsLimit(server.list_max_listpack_size,
                                  lpBytes(o->ptr) + add_bytes,
                                  lpLength(o->ptr) + add_length))
    {
        /* Invoke callback before conversion. */
        if (fn) fn(data);

        quicklist *ql = quicklistNew(server.list_max_listpack_size,
                                     server.list_compress_depth);

        /* Append listpack to quicklist if it's not empty, otherwise release it. */
        if (lpLength(o->ptr))
            quicklistAppendListpack(ql, o->ptr);
        else
            lpFree(o->ptr);
        o->ptr = ql;
        o->encoding = OBJ_ENCODING_QUICKLIST;
    }
}
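The threshold test that drives this conversion can be sketched in isolation. The block below is a simplified model, not the Redis implementation: `node_limit` and `exceeds_limit` are hypothetical names, and the size classes are taken from the documented meaning of the negative list-max-listpack-size values. The real check also applies a safety cap on byte size when a count limit is used, which is omitted here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: translate a list-max-listpack-size value ("fill")
 * into a byte limit or an entry-count limit. Negative values pick a size
 * class, non-negative values pick a maximum number of entries. */
static void node_limit(int fill, size_t *size_limit, unsigned int *count_limit) {
    static const size_t size_classes[] = {4096, 8192, 16384, 32768, 65536};
    assert(size_limit != NULL && count_limit != NULL);
    *size_limit = SIZE_MAX;
    *count_limit = UINT_MAX;
    if (fill >= 0) {
        /* A node must be able to hold at least one entry. */
        *count_limit = (fill == 0) ? 1u : (unsigned int)fill;
    } else {
        if (fill < -5) fill = -5;              /* clamp to the largest class */
        *size_limit = size_classes[-fill - 1]; /* -1 -> 4 KB ... -5 -> 64 KB */
    }
}

/* Hypothetical helper: would a listpack holding new_bytes bytes and
 * new_count entries exceed the configured limit? */
static int exceeds_limit(int fill, size_t new_bytes, unsigned int new_count) {
    size_t size_limit;
    unsigned int count_limit;
    node_limit(fill, &size_limit, &count_limit);
    if (size_limit != SIZE_MAX) return new_bytes > size_limit;
    return new_count > count_limit;
}
```

With the default of -2, `exceeds_limit(-2, 8192, 1)` is false and `exceeds_limit(-2, 8193, 1)` is true, which is why the 2000-element loop above ends up as a quicklist.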
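- Conditions for automatic conversion from quicklist back to listpack
When a list shrinks, Redis also tries to fall back to the more memory-efficient listpack:
- Condition: the quicklist has exactly one node, that node is a PACKED (listpack) node, and its size and entry count are within the limit. When the conversion is triggered by shrinking, the limit is halved (½ * list-max-listpack-size).
- In other words, once the list shrinks to a single, sufficiently small quicklistNode, Redis converts it back to a listpack. Halving the limit keeps a list that hovers around the threshold from converting back and forth.
- Conversion function: t_list.c:66-94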
static void listTypeTryConvertQuicklist(robj *o, int shrinking,
                                        beforeConvertCB fn, void *data)
{
    serverAssert(o->encoding == OBJ_ENCODING_QUICKLIST);

    size_t sz_limit;
    unsigned int count_limit;
    quicklist *ql = o->ptr;

    /* A quicklist can be converted to listpack only if it has only one packed node. */
    if (ql->len != 1 || ql->head->container != QUICKLIST_NODE_CONTAINER_PACKED)
        return;

    /* Check the length or size of the quicklist is below the limit. */
    quicklistNodeLimit(server.list_max_listpack_size, &sz_limit, &count_limit);
    if (shrinking) {
        sz_limit /= 2;
        count_limit /= 2;
    }
    if (ql->head->sz > sz_limit || ql->count > count_limit) return;

    /* Invoke callback before conversion. */
    if (fn) fn(data);

    /* Extract the listpack from the unique quicklist node,
     * then reset it and release the quicklist. */
    o->ptr = ql->head->entry;
    ql->head->entry = NULL;
    quicklistRelease(ql);
    o->encoding = OBJ_ENCODING_LISTPACK;
}
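The two conversion directions together form a small state machine with hysteresis. The sketch below models only the byte-size side of it; `enc_t` and `next_encoding` are hypothetical names, and the entry-count limit and the PACKED-container check from the real function are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ENC_LISTPACK, ENC_QUICKLIST } enc_t;

/* Hypothetical model of the conversion rules, byte-size limit only.
 *   cur       current encoding
 *   bytes     total bytes of list data
 *   nodes     number of quicklist nodes (ignored for a listpack)
 *   limit     byte limit derived from list-max-listpack-size
 *   shrinking non-zero when the check follows a removal */
static enc_t next_encoding(enc_t cur, size_t bytes, size_t nodes,
                           size_t limit, int shrinking) {
    assert(limit > 0);
    if (cur == ENC_LISTPACK)
        return bytes > limit ? ENC_QUICKLIST : ENC_LISTPACK;
    /* Only a quicklist with a single node can be turned back. */
    if (nodes != 1) return ENC_QUICKLIST;
    size_t back_limit = shrinking ? limit / 2 : limit;
    return bytes > back_limit ? ENC_QUICKLIST : ENC_LISTPACK;
}
```

With an 8192-byte limit, a listpack becomes a quicklist above 8192 bytes, but a shrinking single-node quicklist only returns to listpack at 4096 bytes or less, so sizes between the two values do not cause repeated conversions.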
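ziplist, listpack, and quicklist
ziplist (compressed list)
Data structure
ziplist is an early compact in-memory data structure in Redis, designed for storing small amounts of data.
The overall layout of a ziplist is as follows: ziplist.c:14-35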
* The general layout of the ziplist is as follows:
*
* <zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>
*
* NOTE: all fields are stored in little endian, if not specified otherwise.
*
* <uint32_t zlbytes> is an unsigned integer to hold the number of bytes that
* the ziplist occupies, including the four bytes of the zlbytes field itself.
* This value needs to be stored to be able to resize the entire structure
* without the need to traverse it first.
*
* <uint32_t zltail> is the offset to the last entry in the list. This allows
* a pop operation on the far side of the list without the need for full
* traversal.
*
* <uint16_t zllen> is the number of entries. When there are more than
* 2^16-2 entries, this value is set to 2^16-1 and we need to traverse the
* entire list to know how many items it holds.
*
* <uint8_t zlend> is a special entry representing the end of the ziplist.
* Is encoded as a single byte equal to 255. No other normal entry starts
* with a byte set to the value of 255.
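Meaning of each field:
- zlbytes: unsigned integer holding the number of bytes the whole ziplist occupies
- zltail: offset of the last entry, which makes operations at the tail cheap
- zllen: number of entries
- zlend: special end marker (0xFF)
Every ziplist entry has the following structure: ziplist.c:45-47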
* So a complete entry is stored like this:
*
* <prevlen> <encoding> <entry-data>
Meaning of each field:
- prevlen: length of the previous entry
- encoding: type and length of the current entry's data
- entry-data: the actual data
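The main drawback of ziplist: cascading updates
The biggest drawback of ziplist is the "cascade update" problem, which can push some operations to O(N²) time in the worst case.
How cascading updates happen
In a ziplist, every entry contains a prevlen field that stores the length of the previous entry. This makes it possible to traverse the list from back to front. However, inserting or deleting an entry may require the prevlen field of the following entries to be updated.
prevlen is encoded as follows:
- If the previous entry is shorter than 254 bytes, prevlen takes 1 byte
- If the previous entry is 254 bytes or longer, prevlen takes 5 bytes (1 marker byte + 4 bytes of length) ziplist.c:55-60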
* The length of the previous entry, <prevlen>, is encoded in the following way:
* If this length is smaller than 254 bytes, it will only consume a single
* byte representing the length as an unsigned 8 bit integer. When the length
* is greater than or equal to 254, it will consume 5 bytes. The first byte is
* set to 254 (FE) to indicate a larger value is following. The remaining 4
* bytes take the length of the previous entry as value.
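The rule quoted above is small enough to write out. The following is a minimal sketch, not the ziplist source: `store_prevlen` and `load_prevlen` are hypothetical names, the 4-byte length is written little endian as the layout comment describes, and the special meaning of 255 as the end marker is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIG_PREVLEN 254 /* marker byte, same value as ZIP_BIG_PREVLEN */

/* Write the prevlen field for a previous entry of `len` bytes into buf
 * (when buf is not NULL) and return how many bytes the field occupies. */
static size_t store_prevlen(unsigned char *buf, uint32_t len) {
    if (len < BIG_PREVLEN) {
        if (buf) buf[0] = (unsigned char)len;
        return 1;
    }
    if (buf) {
        buf[0] = BIG_PREVLEN;                      /* "4 more bytes follow" */
        buf[1] = (unsigned char)(len & 0xff);      /* little endian length */
        buf[2] = (unsigned char)((len >> 8) & 0xff);
        buf[3] = (unsigned char)((len >> 16) & 0xff);
        buf[4] = (unsigned char)((len >> 24) & 0xff);
    }
    return 5;
}

/* Read a prevlen field back; *field_size receives 1 or 5. */
static uint32_t load_prevlen(const unsigned char *buf, size_t *field_size) {
    assert(buf != NULL && field_size != NULL);
    if (buf[0] < BIG_PREVLEN) {
        *field_size = 1;
        return buf[0];
    }
    *field_size = 5;
    return (uint32_t)buf[1] | ((uint32_t)buf[2] << 8) |
           ((uint32_t)buf[3] << 16) | ((uint32_t)buf[4] << 24);
}
```

A previous entry of 253 bytes costs 1 byte of prevlen, while one of 254 bytes costs 5. That 4-byte jump is what a cascading update propagates.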
Related function: __ziplistCascadeUpdate
ziplist.c:750-770
unsigned char *__ziplistCascadeUpdate(unsigned char *zl, unsigned char *p) {
zlentry cur;
size_t prevlen, prevlensize, prevoffset; /* Informat of the last changed entry. */
size_t firstentrylen; /* Used to handle insert at head. */
size_t rawlen, curlen = intrev32ifbe(ZIPLIST_BYTES(zl));
size_t extra = 0, cnt = 0, offset;
size_t delta = 4; /* Extra bytes needed to update a entry's prevlen (5-1). */
unsigned char *tail = zl + intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl));
/* Empty ziplist */
if (p[0] == ZIP_END) return zl;
zipEntry(p, &cur); /* no need for "safe" variant since the input pointer was validated by the function that returned it. */
firstentrylen = prevlen = cur.headersize + cur.len;
prevlensize = zipStorePrevEntryLength(NULL, prevlen);
prevoffset = p - zl;
p += prevlen;
/* Iterate ziplist to find out how many extra bytes do we need to update it. */
while (p[0] != ZIP_END) {
assert(zipEntrySafe(zl, curlen, p, &cur, 0));
...
A real test case
The Redis test code contains a concrete example that exercises the performance of cascading updates: ziplist.c:2514-2526
printf("Stress __ziplistCascadeUpdate:\n");
{
char data[ZIP_BIG_PREVLEN];
zl = ziplistNew();
iteration = accurate ? 100000 : 100;
for (int i = 0; i < iteration; i++) {
zl = ziplistPush(zl, (unsigned char*)data, ZIP_BIG_PREVLEN-4, ZIPLIST_TAIL);
}
unsigned long long start = usec();
zl = ziplistPush(zl, (unsigned char*)data, ZIP_BIG_PREVLEN-3, ZIPLIST_HEAD);
printf("Done. usec=%lld\n\n", usec()-start);
zfree(zl);
}
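This test builds a ziplist with many entries, each pushed with ZIP_BIG_PREVLEN-4 bytes of data, so that every entry ends up close to but under 254 bytes. It then inserts an entry with ZIP_BIG_PREVLEN-3 bytes of data at the head, which triggers a cascading update.
A more detailed test case: ziplist.c:2528-2607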
printf("Edge cases of __ziplistCascadeUpdate:\n");
{
/* Inserting a entry with data length greater than ZIP_BIG_PREVLEN-4
* will leads to cascade update. */
size_t s1 = ZIP_BIG_PREVLEN-4, s2 = ZIP_BIG_PREVLEN-3;
zl = ziplistNew();
zlentry e[4] = {{.prevrawlensize = 0, .prevrawlen = 0, .lensize = 0,
.len = 0, .headersize = 0, .encoding = 0, .p = NULL}};
zl = insertHelper(zl, 'a', s1, ZIPLIST_ENTRY_HEAD(zl));
verify(zl, e);
assert(e[0].prevrawlensize == 1 && e[0].prevrawlen == 0);
assert(compareHelper(zl, 'a', s1, 0));
ziplistRepr(zl);
/* No expand. */
zl = insertHelper(zl, 'b', s1, ZIPLIST_ENTRY_HEAD(zl));
verify(zl, e);
assert(e[0].prevrawlensize == 1 && e[0].prevrawlen == 0);
assert(compareHelper(zl, 'b', s1, 0));
assert(e[1].prevrawlensize == 1 && e[1].prevrawlen == strEntryBytesSmall(s1));
assert(compareHelper(zl, 'a', s1, 1));
ziplistRepr(zl);
/* Expand(tail included). */
zl = insertHelper(zl, 'c', s2, ZIPLIST_ENTRY_HEAD(zl));
verify(zl, e);
assert(e[0].prevrawlensize == 1 && e[0].prevrawlen == 0);
assert(compareHelper(zl, 'c', s2, 0));
assert(e[1].prevrawlensize == 5 && e[1].prevrawlen == strEntryBytesSmall(s2));
assert(compareHelper(zl, 'b', s1, 1));
assert(e[2].prevrawlensize == 5 && e[2].prevrawlen == strEntryBytesLarge(s1));
assert(compareHelper(zl, 'a', s1, 2));
ziplistRepr(zl);
/* Expand(only previous head entry). */
zl = insertHelper(zl, 'd', s2, ZIPLIST_ENTRY_HEAD(zl));
verify(zl, e);
assert(e[0].prevrawlensize == 1 && e[0].prevrawlen == 0);
assert(compareHelper(zl, 'd', s2, 0));
assert(e[1].prevrawlensize == 5 && e[1].prevrawlen == strEntryBytesSmall(s2));
assert(compareHelper(zl, 'c', s2, 1));
assert(e[2].prevrawlensize == 5 && e[2].prevrawlen == strEntryBytesLarge(s2));
assert(compareHelper(zl, 'b', s1, 2));
assert(e[3].prevrawlensize == 5 && e[3].prevrawlen == strEntryBytesLarge(s1));
assert(compareHelper(zl, 'a', s1, 3));
ziplistRepr(zl);
/* Delete from mid. */
unsigned char *p = ziplistIndex(zl, 2);
zl = ziplistDelete(zl, &p);
verify(zl, e);
assert(e[0].prevrawlensize == 1 && e[0].prevrawlen == 0);
assert(compareHelper(zl, 'd', s2, 0));
assert(e[1].prevrawlensize == 5 && e[1].prevrawlen == strEntryBytesSmall(s2));
assert(compareHelper(zl, 'c', s2, 1));
assert(e[2].prevrawlensize == 5 && e[2].prevrawlen == strEntryBytesLarge(s2));
assert(compareHelper(zl, 'a', s1, 2));
ziplistRepr(zl);
zfree(zl);
}
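This test walks through a cascading update step by step:
- It first builds a ziplist with a few entries of size s1
- It then inserts an entry of size s2 (above the critical size)
- This expands the prevlen field of the following entries from 1 byte to 5 bytes
- The test asserts that prevrawlensize of each affected entry has been updated to 5
Impact
In the worst case, cascading updates lead to O(N²) time:
- Suppose there are N entries, each close to but under 254 bytes
- An entry is inserted at the head, and this new first entry is at least 254 bytes long
- The second entry's prevlen therefore grows from 1 byte to 5 bytes
- The second entry's total size increases, so the third entry's prevlen needs to be updated as well
- The update keeps cascading and can affect all N entries
- If every step reallocates and shifts the whole ziplist, each update costs O(N) (the __ziplistCascadeUpdate excerpt shown earlier first iterates to work out how many extra bytes are needed in total)
- The total time complexity is then O(N²)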
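The chain reaction in the list above can be simulated on entry sizes alone. This is a toy model with a hypothetical function name, `cascade_count`: it assumes every existing entry currently uses a 1-byte prevlen and tracks only total entry sizes, not the bytes themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model. sizes[i] is the total size in bytes of existing
 * entry i, including a prevlen field that currently takes 1 byte.
 * A new entry of new_head_size bytes is inserted in front of sizes[0].
 * The function grows every entry whose prevlen must widen from 1 to 5
 * bytes and returns how many entries were touched. */
static size_t cascade_count(size_t *sizes, size_t n, size_t new_head_size) {
    assert(sizes != NULL || n == 0);
    size_t prev = new_head_size;
    size_t grown = 0;
    for (size_t i = 0; i < n; i++) {
        if (prev < 254) break; /* a 1-byte prevlen still fits: cascade stops */
        sizes[i] += 4;         /* prevlen widens from 1 byte to 5 bytes */
        grown++;
        prev = sizes[i];       /* the grown entry may push the next one */
    }
    return grown;
}
```

With four entries of 253 bytes and a 254-byte head insert, all four entries grow. If one entry in the middle is small (say 100 bytes), it absorbs the extra 4 bytes and the cascade stops right after it.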
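listpack
listpack is an improved version of ziplist that removes the cascading update problem by changing the encoding. An entry no longer stores the length of the previous entry; it stores its own length, so a size change in one entry never forces a neighbor to be rewritten.
Basic structure of a listpack
A listpack is laid out as follows:
<total_bytes><num_elements><element_1><element_2>...<element_N><LP_EOF>
where:
- total_bytes: 4 bytes, the total size of the listpack in bytes
- num_elements: 2 bytes, the number of elements in the listpack
- element_X: the individual elements
- LP_EOF: the end marker (1 byte)
Element encodings in listpack
listpack supports several encodings so that values of different types and sizes are stored efficiently: listpack.c:33-73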
#define LP_ENCODING_7BIT_UINT 0
#define LP_ENCODING_7BIT_UINT_MASK 0x80
#define LP_ENCODING_IS_7BIT_UINT(byte) (((byte)&LP_ENCODING_7BIT_UINT_MASK)==LP_ENCODING_7BIT_UINT)
#define LP_ENCODING_7BIT_UINT_ENTRY_SIZE 2
#define LP_ENCODING_6BIT_STR 0x80
#define LP_ENCODING_6BIT_STR_MASK 0xC0
#define LP_ENCODING_IS_6BIT_STR(byte) (((byte)&LP_ENCODING_6BIT_STR_MASK)==LP_ENCODING_6BIT_STR)
#define LP_ENCODING_13BIT_INT 0xC0
#define LP_ENCODING_13BIT_INT_MASK 0xE0
#define LP_ENCODING_IS_13BIT_INT(byte) (((byte)&LP_ENCODING_13BIT_INT_MASK)==LP_ENCODING_13BIT_INT)
#define LP_ENCODING_13BIT_INT_ENTRY_SIZE 3
#define LP_ENCODING_12BIT_STR 0xE0
#define LP_ENCODING_12BIT_STR_MASK 0xF0
#define LP_ENCODING_IS_12BIT_STR(byte) (((byte)&LP_ENCODING_12BIT_STR_MASK)==LP_ENCODING_12BIT_STR)
#define LP_ENCODING_16BIT_INT 0xF1
#define LP_ENCODING_16BIT_INT_MASK 0xFF
#define LP_ENCODING_IS_16BIT_INT(byte) (((byte)&LP_ENCODING_16BIT_INT_MASK)==LP_ENCODING_16BIT_INT)
#define LP_ENCODING_16BIT_INT_ENTRY_SIZE 4
#define LP_ENCODING_24BIT_INT 0xF2
#define LP_ENCODING_24BIT_INT_MASK 0xFF
#define LP_ENCODING_IS_24BIT_INT(byte) (((byte)&LP_ENCODING_24BIT_INT_MASK)==LP_ENCODING_24BIT_INT)
#define LP_ENCODING_24BIT_INT_ENTRY_SIZE 5
#define LP_ENCODING_32BIT_INT 0xF3
#define LP_ENCODING_32BIT_INT_MASK 0xFF
#define LP_ENCODING_IS_32BIT_INT(byte) (((byte)&LP_ENCODING_32BIT_INT_MASK)==LP_ENCODING_32BIT_INT)
#define LP_ENCODING_32BIT_INT_ENTRY_SIZE 6
#define LP_ENCODING_64BIT_INT 0xF4
#define LP_ENCODING_64BIT_INT_MASK 0xFF
#define LP_ENCODING_IS_64BIT_INT(byte) (((byte)&LP_ENCODING_64BIT_INT_MASK)==LP_ENCODING_64BIT_INT)
#define LP_ENCODING_64BIT_INT_ENTRY_SIZE 10
#define LP_ENCODING_32BIT_STR 0xF0
#define LP_ENCODING_32BIT_STR_MASK 0xFF
#define LP_ENCODING_IS_32BIT_STR(byte) (((byte)&LP_ENCODING_32BIT_STR_MASK)==LP_ENCODING_32BIT_STR)
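To see how these masks partition the first byte of an element, the sketch below classifies a byte using the same constants. `lp_encoding_name` is a hypothetical helper, not part of listpack; the 0xFF end-of-listpack value is added for completeness.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: name the encoding selected by an element's first
 * byte. The shorter prefixes must be tested first, as the masks imply. */
static const char *lp_encoding_name(unsigned char byte) {
    if ((byte & 0x80) == 0x00) return "7bit-uint"; /* 0xxxxxxx */
    if ((byte & 0xC0) == 0x80) return "6bit-str";  /* 10xxxxxx */
    if ((byte & 0xE0) == 0xC0) return "13bit-int"; /* 110xxxxx */
    if ((byte & 0xF0) == 0xE0) return "12bit-str"; /* 1110xxxx */
    switch (byte) {                                /* 1111xxxx: exact match */
    case 0xF0: return "32bit-str";
    case 0xF1: return "16bit-int";
    case 0xF2: return "24bit-int";
    case 0xF3: return "32bit-int";
    case 0xF4: return "64bit-int";
    case 0xFF: return "eof";
    default:   return "unknown";
    }
}
```

For example, the byte 0x05 is a 7-bit unsigned integer (the value 5 itself), and 0x83 starts a string whose 6-bit length is 3.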
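Each element is stored in a listpack in this format:
<encoding><data><backlen>
where:
- encoding: encodes the data type and length
- data: the actual data
- backlen: the length of this element's encoding and data, stored at the end so the list can be traversed from back to front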
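A backlen field has to be readable starting from its last byte. The sketch below shows one way to do that with 7 payload bits per byte; `backlen_encode` and `backlen_decode` are hypothetical names, and the exact size boundaries of the real listpack code may differ slightly from this simplified version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode len into buf using 7 bits per byte, most significant group first.
 * Every byte except the first carries the 0x80 flag, meaning "more bytes
 * to the left". Returns the number of bytes written (1 to 5). */
static size_t backlen_encode(unsigned char *buf, uint64_t len) {
    assert(buf != NULL);
    size_t n = 1;
    for (uint64_t t = len >> 7; t != 0; t >>= 7) n++;
    assert(n <= 5); /* this sketch supports lengths below 2^35 */
    for (size_t i = 0; i < n; i++) {
        unsigned char b = (unsigned char)((len >> (7 * (n - 1 - i))) & 0x7f);
        if (i != 0) b |= 0x80;
        buf[i] = b;
    }
    return n;
}

/* Decode by walking backward from `last`, the final byte of the field. */
static uint64_t backlen_decode(const unsigned char *last) {
    assert(last != NULL);
    uint64_t val = 0;
    int shift = 0;
    for (;;) {
        val |= (uint64_t)(*last & 0x7f) << shift;
        if (!(*last & 0x80)) break; /* first byte of the field reached */
        shift += 7;
        last--;
    }
    return val;
}
```

A reader positioned at the start of element k+1 can step one byte back, decode the backlen of element k, and jump to where element k begins, without consulting any other element.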
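How listpack solves cascading updates
The cascading update problem in ziplist comes from every entry storing the length of the previous entry. When an entry grows from under 254 bytes to 254 bytes or more, the prevlen field of the next entry must expand from 1 byte to 5 bytes, and that growth can in turn propagate to the entries after it as a chain reaction.
listpack avoids this with the following design choices:
- Backward length (backlen) instead of a previous-entry length
Unlike ziplist, a listpack element does not store the length of the previous element. It stores its own length at its own end (called backlen). When one element changes size, no other element has to be rewritten.
- Variable-length backlen encoding
backlen uses a variable-length encoding of 1 to 5 bytes depending on the element's size. Because the field belongs to the element itself, a change in its width never forces a change inside a neighboring element.
- Independent elements
Because each element carries everything needed to parse it (encoding, data, and length), elements barely depend on one another. Inserting or deleting only requires shifting the following elements, not modifying their internal structure.
listpack implementation details
Creating a new listpack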
unsigned char *lpNew(size_t capacity)
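This function creates a new listpack with an optional initial capacity. It allocates the memory and initializes the listpack header and end marker. listpack.h:36
Inserting elements
listpack offers several ways to insert elements: listpack.h:40-49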
unsigned char *lpInsertString(unsigned char *lp, unsigned char *s, uint32_t slen,
unsigned char *p, int where, unsigned char **newp);
unsigned char *lpInsertInteger(unsigned char *lp, long long lval,
unsigned char *p, int where, unsigned char **newp);
unsigned char *lpPrepend(unsigned char *lp, unsigned char *s, uint32_t slen);
unsigned char *lpPrependInteger(unsigned char *lp, long long lval);
unsigned char *lpAppend(unsigned char *lp, unsigned char *s, uint32_t slen);
unsigned char *lpAppendInteger(unsigned char *lp, long long lval);
unsigned char *lpReplace(unsigned char *lp, unsigned char **p, unsigned char *s, uint32_t slen);
unsigned char *lpReplaceInteger(unsigned char *lp, unsigned char **p, long long lval);
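Take lpInsert as an example. It is the core insertion routine in listpack, and the other insertion functions are built on top of it. When inserting an element, it:
- computes the space the new element needs
- reallocates the listpack's memory
- moves the following elements
- writes the new element at the target position
- updates the listpack header
Unlike ziplist, this process never triggers a cascading update, because each element is independent and does not rely on length information stored in other elements.
Batch insertion
listpack also supports batch insertion, which inserts several elements at once and reduces the number of reallocations: listpack.h:53-55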
unsigned char *lpBatchAppend(unsigned char *lp, listpackEntry *entries, unsigned long len);
unsigned char *lpBatchInsert(unsigned char *lp, unsigned char *p, int where,
listpackEntry *entries, unsigned int len, unsigned char **newp);
Batch insertion is implemented as follows: listpack.c:1121-1230
unsigned char *lpBatchInsert(unsigned char *lp, unsigned char *p, int where,
listpackEntry *entries, unsigned int len,
unsigned char **newp)
{
assert(where == LP_BEFORE || where == LP_AFTER);
assert(entries != NULL && len > 0);
struct listpackInsertEntry {
int enctype;
uint64_t enclen;
unsigned char intenc[LP_MAX_INT_ENCODING_LEN];
unsigned char backlen[LP_MAX_BACKLEN_SIZE];
unsigned long backlen_size;
};
uint64_t addedlen = 0; /* The encoded length of the added elements. */
struct listpackInsertEntry tmp[3]; /* Encoded entries */
struct listpackInsertEntry *enc = tmp;
if (len > sizeof(tmp) / sizeof(struct listpackInsertEntry)) {
/* If 'len' is larger than local buffer size, allocate on heap. */
enc = zmalloc(len * sizeof(struct listpackInsertEntry));
}
/* If we need to insert after the current element, we just jump to the
* next element (that could be the EOF one) and handle the case of
* inserting before. So the function will actually deal with just one
* case: LP_BEFORE. */
if (where == LP_AFTER) {
p = lpSkip(p);
where = LP_BEFORE;
ASSERT_INTEGRITY(lp, p);
}
for (unsigned int i = 0; i < len; i++) {
listpackEntry *e = &entries[i];
if (e->sval) {
/* Calling lpEncodeGetType() results into the encoded version of the
* element to be stored into 'intenc' in case it is representable as
* an integer: in that case, the function returns LP_ENCODING_INT.
* Otherwise, if LP_ENCODING_STR is returned, we'll have to call
* lpEncodeString() to actually write the encoded string on place
* later.
*
* Whatever the returned encoding is, 'enclen' is populated with the
* length of the encoded element. */
enc[i].enctype = lpEncodeGetType(e->sval, e->slen,
enc[i].intenc, &enc[i].enclen);
} else {
enc[i].enctype = LP_ENCODING_INT;
lpEncodeIntegerGetType(e->lval, enc[i].intenc, &enc[i].enclen);
}
addedlen += enc[i].enclen;
/* We need to also encode the backward-parsable length of the element
* and append it to the end: this allows to traverse the listpack from
* the end to the start. */
enc[i].backlen_size = lpEncodeBacklen(enc[i].backlen, enc[i].enclen);
addedlen += enc[i].backlen_size;
}
uint64_t old_listpack_bytes = lpGetTotalBytes(lp);
uint64_t new_listpack_bytes = old_listpack_bytes + addedlen;
if (new_listpack_bytes > UINT32_MAX) return NULL;
/* Store the offset of the element 'p', so that we can obtain its
* address again after a reallocation. */
unsigned long poff = p-lp;
unsigned char *dst = lp + poff; /* May be updated after reallocation. */
/* Realloc before: we need more room. */
if (new_listpack_bytes > old_listpack_bytes &&
new_listpack_bytes > lp_malloc_size(lp)) {
if ((lp = lp_realloc(lp,new_listpack_bytes)) == NULL) return NULL;
dst = lp + poff;
}
/* Setup the listpack relocating the elements to make the exact room
* we need to store the new ones. */
memmove(dst+addedlen,dst,old_listpack_bytes-poff);
for (unsigned int i = 0; i < len; i++) {
listpackEntry *ent = &entries[i];
if (newp)
*newp = dst;
if (enc[i].enctype == LP_ENCODING_INT)
memcpy(dst, enc[i].intenc, enc[i].enclen);
else
lpEncodeString(dst, ent->sval, ent->slen);
dst += enc[i].enclen;
memcpy(dst, enc[i].backlen, enc[i].backlen_size);
dst += enc[i].backlen_size;
}
/* Update header. */
uint32_t num_elements = lpGetNumElements(lp);
if (num_elements != LP_HDR_NUMELE_UNKNOWN) {
if ((int64_t) len > (int64_t) LP_HDR_NUMELE_UNKNOWN - (int64_t) num_elements)
lpSetNumElements(lp, LP_HDR_NUMELE_UNKNOWN);
else
lpSetNumElements(lp,num_elements + len);
}
lpSetTotalBytes(lp,new_listpack_bytes);
if (enc != tmp) lp_free(enc);
return lp;
}
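The function computes the total size of all elements to insert up front and then performs a single reallocation and a single memmove, which makes inserting many elements considerably cheaper.
Traversing elements
listpack supports both front-to-back and back-to-front traversal: listpack.h:66-70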
unsigned char *lpFirst(unsigned char *lp);
unsigned char *lpLast(unsigned char *lp);
unsigned char *lpNext(unsigned char *lp, unsigned char *p);
unsigned char *lpNextWithBytes(unsigned char *lp, unsigned char *p, const size_t lpbytes);
unsigned char *lpPrev(unsigned char *lp, unsigned char *p);
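Because every element carries its own length information, traversal in either direction is efficient and does not depend on information stored in other elements.
quicklist
quicklist is a composite data structure in Redis that combines the advantages of a linked list with those of a compact data structure.
Basic structure of a quicklist
The quicklist structure is defined as follows: quicklist.h:107-116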
/* quicklist is a 40 byte struct (on 64-bit systems) describing a quicklist.
* 'count' is the number of total entries.
* 'len' is the number of quicklist nodes.
* 'compress' is: 0 if compression disabled, otherwise it's the number
* of quicklistNodes to leave uncompressed at ends of quicklist.
* 'fill' is the user-requested (or default) fill factor.
* 'bookmarks are an optional feature that is used by realloc this struct,
* so that they don't consume memory when not used. */
typedef struct quicklist {
quicklistNode *head;
quicklistNode *tail;
unsigned long count; /* total count of all entries in all listpacks */
unsigned long len; /* number of quicklistNodes */
signed int fill : QL_FILL_BITS; /* fill factor for individual nodes */
unsigned int compress : QL_COMP_BITS; /* depth of end nodes not to compress;0=off */
unsigned int bookmark_count: QL_BM_BITS;
quicklistBookmark bookmarks[];
} quicklist;
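Each quicklist contains:
- head and tail pointers (head and tail)
- the total number of elements (count)
- the number of nodes (len)
- the fill factor (fill)
- the compression depth (compress)
- bookmark fields (bookmark_count and bookmarks)
Each node in a quicklist (quicklistNode) is defined as follows: quicklist.h:47-59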
/* quicklistNode is a 32 byte struct describing a listpack for a quicklist.
* We use bit fields keep the quicklistNode at 32 bytes.
* count: 16 bits, max 65536 (max lp bytes is 65k, so max count actually < 32k).
* encoding: 2 bits, RAW=1, LZF=2.
* container: 2 bits, PLAIN=1 (a single item as char array), PACKED=2 (listpack with multiple items).
* recompress: 1 bit, bool, true if node is temporary decompressed for usage.
* attempted_compress: 1 bit, boolean, used for verifying during testing.
* dont_compress: 1 bit, boolean, used for preventing compression of entry.
* extra: 9 bits, free for future use; pads out the remainder of 32 bits */
typedef struct quicklistNode {
struct quicklistNode *prev;
struct quicklistNode *next;
unsigned char *entry;
size_t sz; /* entry size in bytes */
unsigned int count : 16; /* count of items in listpack */
unsigned int encoding : 2; /* RAW==1 or LZF==2 */
unsigned int container : 2; /* PLAIN==1 or PACKED==2 */
unsigned int recompress : 1; /* was this node previous compressed? */
unsigned int attempted_compress : 1; /* node can't compress; too small */
unsigned int dont_compress : 1; /* prevent compression of entry that will be used later */
unsigned int extra : 9; /* more bits to steal for future usage */
} quicklistNode;
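Each node contains:
- previous and next pointers (prev and next)
- a data pointer (entry)
- the data size in bytes (sz)
- the number of elements (count)
- the encoding (encoding): RAW or LZF-compressed
- the container type (container): PLAIN or PACKED
- other compression-related flags
quicklist implementation details
Creating a quicklist
The code that creates a new quicklist is: quicklist.c:165-170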
/* Create a new quicklist with some default parameters. */
quicklist *quicklistNew(int fill, int compress) {
quicklist *quicklist = quicklistCreate();
quicklistSetOptions(quicklist, fill, compress);
return quicklist;
}
This function creates an empty quicklist and sets its fill factor (fill) and compression depth (compress).
Adding elements
quicklist supports adding elements at the head: quicklist.c:583-605
/* Add new entry to head node of quicklist.
*
* Returns 0 if used existing head.
* Returns 1 if new head created. */
int quicklistPushHead(quicklist *quicklist, void *value, size_t sz) {
quicklistNode *orig_head = quicklist->head;
if (unlikely(isLargeElement(sz, quicklist->fill))) {
__quicklistInsertPlainNode(quicklist, quicklist->head, value, sz, 0);
return 1;
}
if (likely(
_quicklistNodeAllowInsert(quicklist->head, quicklist->fill, sz))) {
quicklist->head->entry = lpPrepend(quicklist->head->entry, value, sz);
quicklistNodeUpdateSz(quicklist->head);
} else {
quicklistNode *node = quicklistCreateNode();
node->entry = lpPrepend(lpNew(0), value, sz);
quicklistNodeUpdateSz(node);
_quicklistInsertNodeBefore(quicklist, quicklist->head, node);
}
quicklist->count++;
quicklist->head->count++;
return (orig_head != quicklist->head);
}
It also supports adding elements at the tail: quicklist.c:611-632
/* Add new entry to tail node of quicklist.
*
* Returns 0 if used existing tail.
* Returns 1 if new tail created. */
int quicklistPushTail(quicklist *quicklist, void *value, size_t sz) {
quicklistNode *orig_tail = quicklist->tail;
if (unlikely(isLargeElement(sz, quicklist->fill))) {
__quicklistInsertPlainNode(quicklist, quicklist->tail, value, sz, 1);
return 1;
}
if (likely(
_quicklistNodeAllowInsert(quicklist->tail, quicklist->fill, sz))) {
quicklist->tail->entry = lpAppend(quicklist->tail->entry, value, sz);
quicklistNodeUpdateSz(quicklist->tail);
} else {
quicklistNode *node = quicklistCreateNode();
node->entry = lpAppend(lpNew(0), value, sz);
quicklistNodeUpdateSz(node);
_quicklistInsertNodeAfter(quicklist, quicklist->tail, node);
}
quicklist->count++;
quicklist->tail->count++;
return (orig_tail != quicklist->tail);
}
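Based on the fill factor, these functions decide whether to add the element to the existing head or tail node or to create a new node. An element that is too large is stored in its own PLAIN node; otherwise it goes into a PACKED node (that is, a listpack).
Compression
quicklist supports LZF compression to save memory: quicklist.h:139-142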
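The "reuse the tail node or create a new one" decision can be shown with a toy structure. Everything in this sketch is hypothetical (`toylist`, `toynode`, `NODE_CAP`): nodes hold a fixed number of ints instead of a listpack, and large-element PLAIN nodes and compression are not modeled.

```c
#include <assert.h>
#include <stdlib.h>

#define NODE_CAP 2 /* hypothetical per-node entry limit (a positive fill) */

typedef struct toynode {
    struct toynode *prev, *next;
    int items[NODE_CAP];
    unsigned int count;
} toynode;

typedef struct toylist {
    toynode *head, *tail;
    unsigned long count; /* total entries across all nodes */
    unsigned long len;   /* number of nodes */
} toylist;

/* Append a value. Returns 1 if a new tail node was created, 0 if the
 * existing tail node had room. */
static int toy_push_tail(toylist *l, int value) {
    assert(l != NULL);
    int created = 0;
    if (l->tail == NULL || l->tail->count == NODE_CAP) {
        toynode *n = calloc(1, sizeof(*n));
        assert(n != NULL);
        n->prev = l->tail;
        if (l->tail) l->tail->next = n;
        else l->head = n;
        l->tail = n;
        l->len++;
        created = 1;
    }
    l->tail->items[l->tail->count++] = value;
    l->count++;
    return created;
}

/* Release all nodes and reset the list to empty. */
static void toy_free(toylist *l) {
    assert(l != NULL);
    toynode *n = l->head;
    while (n) {
        toynode *next = n->next;
        free(n);
        n = next;
    }
    l->head = l->tail = NULL;
    l->count = l->len = 0;
}
```

Pushing five values into an empty list with a capacity of two per node yields three nodes, and only the pushes that start a new node return 1, mirroring the return convention of quicklistPushTail.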
/* quicklist node encodings */
#define QUICKLIST_NODE_ENCODING_RAW 1
#define QUICKLIST_NODE_ENCODING_LZF 2
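The compression depth controls which nodes get compressed. Typically the actively used nodes at the head and tail stay uncompressed, while the nodes in the middle are compressed to save memory.
Merging nodes
After elements are deleted, quicklist tries to merge adjacent nodes to stay efficient: quicklist.c:865-895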
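As a rough illustration of the depth rule, the predicate below says whether the node at a given position falls in the compressible middle section. `node_is_compressed` is a hypothetical name, and this is a simplification: the real quicklist also leaves alone nodes that are too small to be worth compressing or that are flagged not to be compressed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate: with `depth` nodes left uncompressed at each end
 * of a quicklist of `len` nodes, is the node at position `index` (0 = head)
 * eligible for compression? A depth of 0 disables compression. */
static int node_is_compressed(size_t index, size_t len, unsigned int depth) {
    assert(index < len);
    if (depth == 0) return 0;
    /* Written as index + depth < len to avoid unsigned underflow. */
    return index >= depth && index + depth < len;
}
```

With five nodes and a depth of 1, nodes 1 to 3 are eligible and nodes 0 and 4 stay raw; with a depth of 2, only node 2 is eligible.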
/* Given two nodes, try to merge their listpacks.
*
* This helps us not have a quicklist with 3 element listpacks if
* our fill factor can handle much higher levels.
*
* Note: 'a' must be to the LEFT of 'b'.
*
* After calling this function, both 'a' and 'b' should be considered
* unusable. The return value from this function must be used
* instead of re-using any of the quicklistNode input arguments.
*
* Returns the input node picked to merge against or NULL if
* merging was not possible. */
REDIS_STATIC quicklistNode *_quicklistListpackMerge(quicklist *quicklist,
quicklistNode *a,
quicklistNode *b) {
D("Requested merge (a,b) (%u, %u)", a->count, b->count);
quicklistDecompressNode(a);
quicklistDecompressNode(b);
if ((lpMerge(&a->entry, &b->entry))) {
/* We merged listpacks! Now remove the unused quicklistNode. */
quicklistNode *keep = NULL, *nokeep = NULL;
if (!a->entry) {
nokeep = a;
keep = b;
} else if (!b->entry) {
nokeep = b;
keep = a;
}
keep->count = lpLength(keep->entry);
quicklistNodeUpdateSz(keep);
keep->recompress = 0; /* Prevent 'keep' from being recompressed if
* it becomes head or tail after merging. */
nokeep->count = 0;
__quicklistDelNode(quicklist, nokeep);
quicklistCompress(quicklist, keep);
return keep;
} else {
/* else, the merge returned NULL and nothing changed. */
return NULL;
}
}
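Why quicklist is needed
listpack fixes the cascading update problem of ziplist, but it still has limitations:
- Size: a listpack is one contiguous block of memory. With a lot of data it needs one large allocation, which can lead to memory fragmentation or allocation failure.
- Operation cost: for a very large listpack, a modification may have to reallocate and shift the whole block, which is slow.
- Memory use: a large listpack is hard to reclaim and reuse piece by piece.
quicklist addresses these problems by spreading the data across many smaller listpack nodes:
- Balance between memory efficiency and operation cost: each node is a small listpack, which keeps the compact layout while avoiding huge allocations.
- Compression: inactive nodes can be compressed to save more memory.
- Flexible memory management: only the modified node needs to be reallocated, not the whole structure.
Summary
- ziplist: the early compact list, suited to small amounts of data, but prone to cascading updates
- listpack: the improved successor of ziplist, with a more efficient encoding and no cascading updates
- quicklist: a hybrid of a doubly linked list and listpack that balances flexibility and memory efficiency
Implementation of the main list operations
1. Adding elements (LPUSH/RPUSH)
Elements can be added at the head or the tail of the list: t_list.c:496-514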
/* LPUSH <key> <element> [<element> ...] */
void lpushCommand(client *c) {
pushGenericCommand(c,LIST_HEAD,0);
}
/* RPUSH <key> <element> [<element> ...] */
void rpushCommand(client *c) {
pushGenericCommand(c,LIST_TAIL,0);
}
/* LPUSHX <key> <element> [<element> ...] */
void lpushxCommand(client *c) {
pushGenericCommand(c,LIST_HEAD,1);
}
/* RPUSHX <key> <element> [<element> ...] */
void rpushxCommand(client *c) {
pushGenericCommand(c,LIST_TAIL,1);
}
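All of them call one core function: pushGenericCommand
t_list.c:463-494
Command | Arguments | Behavior |
---|---|---|
LPUSH | LIST_HEAD, 0 | Inserts one or more elements at the head of the list. If the key does not exist, a new list is created. |
RPUSH | LIST_TAIL, 0 | Inserts one or more elements at the tail of the list. If the key does not exist, a new list is created. |
LPUSHX | LIST_HEAD, 1 | Inserts at the head, but only if the key already exists; otherwise nothing is done. |
RPUSHX | LIST_TAIL, 1 | Inserts at the tail, but only if the key already exists; otherwise nothing is done. |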
/* Implements LPUSH/RPUSH/LPUSHX/RPUSHX.
* 'xx': push if key exists. */
void pushGenericCommand(client *c, int where, int xx) {
unsigned long llen;
int j;
// Look up the key in the current database (write variant of the lookup).
robj *lobj = lookupKeyWrite(c->db, c->argv[1]);
// If the key exists but is not a list, reply with a type error.
if (checkType(c,lobj,OBJ_LIST)) return;
// The key does not exist: decide whether pushing is allowed.
if (!lobj) {
// For LPUSHX/RPUSHX (xx == 1) on a missing key, reply (integer) 0 and stop.
if (xx) {
addReply(c, shared.czero);
return;
}
// Otherwise create a new listpack-encoded list object and add it to the database.
lobj = createListListpackObject();
dbAdd(c->db,c->argv[1],lobj);
}
// Check whether the listpack must be converted to a quicklist.
listTypeTryConversionAppend(lobj,c->argv,2,c->argc-1,NULL,NULL);
// Insert the elements one by one.
for (j = 2; j < c->argc; j++) {
// Push each element to the head or tail of the list object, depending on 'where'.
listTypePush(lobj,c->argv[j],where);
// Each push increments server.dirty, which counts changes to the dataset.
server.dirty++;
}
// Compute the list length after the push and reply with it.
llen = listTypeLength(lobj);
addReplyLongLong(c, llen);
// Signal the modification and publish the keyspace event.
char *event = (where == LIST_HEAD) ? "lpush" : "rpush";
signalModifiedKey(c,c->db,c->argv[1]);
notifyKeyspaceEvent(NOTIFY_LIST,event,c->argv[1],c->db->id);
updateKeysizesHist(c->db, getKeySlot(c->argv[1]->ptr), OBJ_LIST, llen - (c->argc - 2), llen);
}
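- client *c: the client command context.
- where: the insertion position, LIST_HEAD or LIST_TAIL.
- xx: whether this is an X variant (LPUSHX / RPUSHX), that is, push only if the key already exists.
2. Removing elements (LPOP/RPOP)
Elements can be removed from the head or the tail of the list: t_list.c:174-201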
robj *listTypePop(robj *subject, int where) {
robj *value = NULL;
if (subject->encoding == OBJ_ENCODING_QUICKLIST) {
long long vlong;
int ql_where = where == LIST_HEAD ? QUICKLIST_HEAD : QUICKLIST_TAIL;
if (quicklistPopCustom(subject->ptr, ql_where, (unsigned char **)&value,
NULL, &vlong, listPopSaver)) {
if (!value)
value = createStringObjectFromLongLong(vlong);
}
} else if (subject->encoding == OBJ_ENCODING_LISTPACK) {
unsigned char *p;
unsigned char *vstr;
int64_t vlen;
unsigned char intbuf[LP_INTBUF_SIZE];
p = (where == LIST_HEAD) ? lpFirst(subject->ptr) : lpLast(subject->ptr);
if (p) {
vstr = lpGet(p, &vlen, intbuf);
value = createStringObject((char*)vstr, vlen);
subject->ptr = lpDelete(subject->ptr, p, NULL);
}
} else {
serverPanic("Unknown list encoding");
}
return value;
}
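Function signature
robj *listTypePop(robj *subject, int where)
- subject: the Redis object representing the list (type OBJ_LIST).
- where: the side to pop from, LIST_HEAD or LIST_TAIL.
- Return value: the popped value as a Redis string object (robj *), or NULL if nothing was popped.
1. quicklist encoding (used for larger lists)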
if (subject->encoding == OBJ_ENCODING_QUICKLIST) {
...
quicklistPopCustom(subject->ptr, ql_where, (unsigned char **)&value,
NULL, &vlong, listPopSaver);
...
}
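- quicklistPopCustom holds the core pop logic and extracts the value from the node at the chosen end.
- If the popped entry is a string, the listPopSaver callback builds a Redis string object from the raw bytes, and that object ends up in value.
- If the popped entry is integer-encoded, the number is written to vlong and value stays NULL.
- So when value == NULL after a successful pop, the entry was an integer, and a Redis string object is created from the long long value.
2. listpack encoding (used for small lists)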
else if (subject->encoding == OBJ_ENCODING_LISTPACK) {
...
p = (where == LIST_HEAD) ? lpFirst(subject->ptr) : lpLast(subject->ptr);
if (p) {
vstr = lpGet(p, &vlen, intbuf);
value = createStringObject((char*)vstr, vlen);
subject->ptr = lpDelete(subject->ptr, p, NULL);
}
}
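- lpFirst / lpLast returns a pointer to the first or last element.
- lpGet reads the data, which is then wrapped in a Redis string object.
- lpDelete removes the element and updates the listpack itself.
3. Unknown encoding: panic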
else {
serverPanic("Unknown list encoding");
}
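- This branch should never be reached in normal operation. Hitting it suggests corrupted memory or a corrupted object.
3. Reading an element (LINDEX)
An element can be fetched by its index: t_list.c:578-605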
/* LINDEX <key> <index> */
void lindexCommand(client *c) {
robj *o = lookupKeyReadOrReply(c,c->argv[1],shared.null[c->resp]);
if (o == NULL || checkType(c,o,OBJ_LIST)) return;
long index;
if ((getLongFromObjectOrReply(c, c->argv[2], &index, NULL) != C_OK))
return;
listTypeIterator *iter = listTypeInitIterator(o,index,LIST_TAIL);
listTypeEntry entry;
unsigned char *vstr;
size_t vlen;
long long lval;
if (listTypeNext(iter,&entry)) {
vstr = listTypeGetValue(&entry,&vlen,&lval);
if (vstr) {
addReplyBulkCBuffer(c, vstr, vlen);
} else {
addReplyBulkLongLong(c, lval);
}
} else {
addReplyNull(c);
}
listTypeReleaseIterator(iter);
}
1. Look up the key and check its type
robj *o = lookupKeyReadOrReply(c,c->argv[1],shared.null[c->resp]);
if (o == NULL || checkType(c,o,OBJ_LIST)) return;
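- Reads the key from the database.
- If the key does not exist, a null reply is sent; if it exists but is not a list, a type error is sent. In both cases the command returns immediately.
2. Parse the index argument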
long index;
if ((getLongFromObjectOrReply(c, c->argv[2], &index, NULL) != C_OK))
return;
- Converts the argument to a long index.
- Negative values are supported (for example, -1 is the last element).
3. Create the iterator
listTypeIterator *iter = listTypeInitIterator(o,index,LIST_TAIL);
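- Initializes an iterator starting at index.
- listTypeInitIterator takes care of positive and negative indexes and of the starting position, and it supports both quicklist and listpack internally.
4. Iterate and read the value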
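The index convention that LINDEX exposes can be written as a tiny helper. This is a sketch of the observable behavior only, under a hypothetical name (`resolve_index`); it is not how listTypeInitIterator is implemented internally.

```c
#include <assert.h>

/* Hypothetical helper: map a LINDEX-style index onto a position in
 * [0, len). Negative indexes count from the tail (-1 is the last element).
 * Returns -1 when the index is out of range, the case in which LINDEX
 * replies with null. */
static long resolve_index(long index, long len) {
    assert(len >= 0);
    if (index < 0) index += len;
    if (index < 0 || index >= len) return -1;
    return index;
}
```

For a list of three elements, index -1 resolves to position 2 and index -3 to position 0, while 3 and -4 are out of range.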
listTypeEntry entry;
unsigned char *vstr;
size_t vlen;
long long lval;
if (listTypeNext(iter,&entry)) {
vstr = listTypeGetValue(&entry,&vlen,&lval);
...
}
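- listTypeNext advances the iterator and positions it on the target entry.
- listTypeGetValue reads the entry's content:
- if the entry is string-encoded, it fills in vstr / vlen;
- if the entry is integer-encoded, it fills in lval.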
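5. Build the reply
if (vstr) {
addReplyBulkCBuffer(c, vstr, vlen);
} else {
addReplyBulkLongLong(c, lval);
}
- Builds the RESP reply: a bulk string for string values, or the integer formatted as a bulk string.
6. Release the iterator
listTypeReleaseIterator(iter);
- The iterator is released explicitly to avoid a memory leak.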
4. Inserting an element (LINSERT)
A new element can be inserted before or after a given element: t_list.c:516-569
/* LINSERT <key> (BEFORE|AFTER) <pivot> <element> */
void linsertCommand(client *c) {
int where;
robj *subject;
listTypeIterator *iter;
listTypeEntry entry;
int inserted = 0;
if (strcasecmp(c->argv[2]->ptr,"after") == 0) {
where = LIST_TAIL;
} else if (strcasecmp(c->argv[2]->ptr,"before") == 0) {
where = LIST_HEAD;
} else {
addReplyErrorObject(c,shared.syntaxerr);
return;
}
if ((subject = lookupKeyWriteOrReply(c,c->argv[1],shared.czero)) == NULL ||
checkType(c,subject,OBJ_LIST)) return;
/* We're not sure if this value can be inserted yet, but we cannot
* convert the list inside the iterator. We don't want to loop over
* the list twice (once to see if the value can be inserted and once
* to do the actual insert), so we assume this value can be inserted
* and convert the listpack to a regular list if necessary. */
listTypeTryConversionAppend(subject,c->argv,4,4,NULL,NULL);
/* Seek pivot from head to tail */
iter = listTypeInitIterator(subject,0,LIST_TAIL);
const size_t object_len = sdslen(c->argv[3]->ptr);
while (listTypeNext(iter,&entry)) {
if (listTypeEqual(&entry,c->argv[3],object_len)) {
listTypeInsert(&entry,c->argv[4],where);
inserted = 1;
break;
}
}
listTypeReleaseIterator(iter);
if (inserted) {
signalModifiedKey(c,c->db,c->argv[1]);
notifyKeyspaceEvent(NOTIFY_LIST,"linsert",
c->argv[1],c->db->id);
server.dirty++;
unsigned long ll = listTypeLength(subject);
updateKeysizesHist(c->db, getKeySlot(c->argv[1]->ptr), OBJ_LIST, ll-1, ll);
} else {
/* Notify client of a failed insert */
addReplyLongLong(c,-1);
return;
}
addReplyLongLong(c,listTypeLength(subject));
}
1. Parse the arguments
if (strcasecmp(c->argv[2]->ptr,"after") == 0) {
where = LIST_TAIL;
} else if (strcasecmp(c->argv[2]->ptr,"before") == 0) {
where = LIST_HEAD;
}
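- Determines whether to insert before or after the pivot; the direction is represented by LIST_HEAD (before) or LIST_TAIL (after). Any other keyword is a syntax error.
2. Look up the key and check its type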
subject = lookupKeyWriteOrReply(c,c->argv[1],shared.czero)
- If the key does not exist, the reply is 0.
- If the key exists but is not a list, the reply is a type error.
3. Convert the encoding (if needed)
listTypeTryConversionAppend(subject,c->argv,4,4,NULL,NULL);
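- At this point it is not yet known whether the insert will succeed, but Redis still runs the conversion check first, because:
- the list cannot be converted while it is being iterated, and
- Redis does not want to traverse the list twice (once to check whether the value can be inserted and once to insert it), so it assumes the element will be inserted and converts the listpack to a quicklist beforehand if the new element would exceed the limit.
4. Find the pivot element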
iter = listTypeInitIterator(subject,0,LIST_TAIL);
while (listTypeNext(iter,&entry)) {
if (listTypeEqual(&entry,c->argv[3],object_len)) {
listTypeInsert(&entry,c->argv[4],where);
inserted = 1;
break;
}
}
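- Traverses the list from head to tail.
- On the first element equal to the pivot, listTypeInsert is called to insert the new value.
- The flag inserted = 1 is set and the loop stops.
5. Release the iterator and build the reply
- If the insert succeeded:
- signal that the key was modified (signalModifiedKey);
- publish a keyspace event (for the notification mechanism);
- update the list length statistics;
- reply with the new list length.
- If the insert failed (pivot not found):
- reply -1.
Why convert up front?
- Calling listTypeTryConversionAppend before the iteration avoids having to change the encoding in the middle of a traversal. The price is that a list may be converted even when the pivot is never found and nothing is inserted; in exchange, the insert path stays single-pass and simple.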