Redis 6.0.5 ziplist reading notes, part 1: the compressed list (ziplist) source comments, translated
***********************************************************************
/* The ziplist is a specially encoded dually linked list that is designed
* to be very memory efficient. It stores both strings and integer values,
* where integers are encoded as actual integers instead of a series of
* characters. It allows push and pop operations on either side of the list
* in O(1) time. However, because every operation requires a reallocation of
* the memory used by the ziplist, the actual complexity is related to the
* amount of memory used by the ziplist.
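The ziplist (compressed list) is a specially encoded doubly linked list designed to be very memory efficient. It stores both strings and integers,
and integers are kept in their binary form rather than as a sequence of digit characters. Pushes and pops at either end take O(1) time.
However, every operation has to reallocate the memory used by the ziplist, so the actual cost depends on how much memory the ziplist occupies.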
* ----------------------------------------------------------------------------
* ZIPLIST OVERALL LAYOUT
* ======================
* The general layout of the ziplist is as follows:
In general, a ziplist is laid out as follows:
* <zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>
* NOTE: all fields are stored in little endian, if not specified otherwise.
Unless stated otherwise, every field is stored in little-endian order.
* <uint32_t zlbytes> is an unsigned integer to hold the number of bytes that
* the ziplist occupies, including the four bytes of the zlbytes field itself.
* This value needs to be stored to be able to resize the entire structure
* without the need to traverse it first.
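zlbytes is a 32-bit unsigned integer holding the total number of bytes the ziplist occupies, including the 4 bytes of zlbytes itself.
Storing this value lets the whole structure be resized without traversing the list first.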
* <uint32_t zltail> is the offset to the last entry in the list. This allows
* a pop operation on the far side of the list without the need for full
* traversal.
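zltail is a 32-bit unsigned integer holding the offset of the last entry, measured from the start of the ziplist.
It lets a pop at the far end of the list run without traversing the whole list.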
* <uint16_t zllen> is the number of entries. When there are more than
* 2^16-2 entries, this value is set to 2^16-1 and we need to traverse the
* entire list to know how many items it holds.
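zllen is a 16-bit unsigned integer holding the number of entries. Once there are more than 2^16-2 entries the field is pinned at 2^16-1,
and the real count can only be found by traversing the whole list.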
* <uint8_t zlend> is a special entry representing the end of the ziplist.
* Is encoded as a single byte equal to 255. No other normal entry starts
* with a byte set to the value of 255.
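zlend is an 8-bit unsigned integer that acts as a special entry marking the end of the ziplist.
It is a single byte with the value 255. No ordinary entry may start with the byte 255, otherwise it would be mistaken for the end marker.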
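The header fields above can be read back with a few lines of C. This is only a sketch: the `zl_header` struct and the `zl_parse_header` function are hypothetical names made up for these notes and are not part of the Redis source. The sketch reads each field byte by byte in little-endian order and checks that the buffer ends with the 255 marker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZL_HEADER_SIZE 10u /* zlbytes (4) + zltail (4) + zllen (2) */
#define ZL_END 0xFFu       /* value of the zlend byte */

/* Hypothetical view of the three header fields; not a Redis type. */
typedef struct {
    uint32_t zlbytes; /* total bytes, header and end marker included */
    uint32_t zltail;  /* offset of the last entry from the start */
    uint16_t zllen;   /* entry count, or 65535 when it must be counted */
} zl_header;

/* Read little-endian values byte by byte so host endianness is irrelevant. */
static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t read_le16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Parse the header of a ziplist held in zl[0..len-1].
 * Returns 0 on success, -1 if the buffer is too short, zlbytes does not
 * match len, or the last byte is not the end marker. */
int zl_parse_header(const unsigned char *zl, size_t len, zl_header *out) {
    assert(zl != NULL && out != NULL);
    if (len < ZL_HEADER_SIZE + 1) return -1;
    out->zlbytes = read_le32(zl);
    out->zltail = read_le32(zl + 4);
    out->zllen = read_le16(zl + 8);
    if (out->zlbytes != len) return -1;
    if (zl[len - 1] != ZL_END) return -1;
    return 0;
}
```

Because zlbytes and zltail sit at fixed offsets, both the total size and the position of the last entry are available without touching any entry.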
* ZIPLIST ENTRIES
* ===============
* Every entry in the ziplist is prefixed by metadata that contains two pieces
* of information. First, the length of the previous entry is stored to be
* able to traverse the list from back to front. Second, the entry encoding is
* provided. It represents the entry type, integer or string, and in the case
* of strings it also represents the length of the string payload.
* So a complete entry is stored like this:
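Every entry is prefixed by metadata carrying two pieces of information.
First, the length of the previous entry, so the list can be traversed from back to front.
Second, the entry's encoding, which gives the entry type (integer or string);
for a string it also gives the length of the string payload.
So a complete entry is stored like this: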
* <prevlen> <encoding> <entry-data>
* Sometimes the encoding represents the entry itself, like for small integers
* as we'll see later. In such a case the <entry-data> part is missing, and we
* could have just:
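Sometimes the encoding is the entry itself, as with the small integers described later.
In that case the <entry-data> part is absent and only these two parts remain: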
* <prevlen> <encoding>
* The length of the previous entry, <prevlen>, is encoded in the following way:
* If this length is smaller than 254 bytes, it will only consume a single
* byte representing the length as an unsigned 8 bit integer. When the length
* is greater than or equal to 254, it will consume 5 bytes. The first byte is
* set to 254 (FE) to indicate a larger value is following. The remaining 4
* bytes take the length of the previous entry as value.
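The length of the previous entry, <prevlen>, is encoded as follows:
If the length is less than 254 bytes, it takes a single byte holding an unsigned 8-bit integer.
If the length is 254 or more, it takes 5 bytes. The first byte is set to 254 (0xFE) to signal that a larger value follows.
The remaining 4 bytes hold the length of the previous entry.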
* So practically an entry is encoded in the following way:
So in practice an entry is encoded in one of two ways:
* <prevlen from 0 to 253> <encoding> <entry>
* Or alternatively if the previous entry length is greater than 253 bytes
* the following encoding is used:
Or, when the previous entry is longer than 253 bytes, this form is used:
* 0xFE <4 bytes unsigned little endian prevlen> <encoding> <entry>
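The two <prevlen> forms can be sketched in C as follows. The function names `zl_encode_prevlen` and `zl_decode_prevlen` are hypothetical helpers written for these notes, not Redis functions. The decoder assumes the caller has already checked that the first byte is not 255 (the end marker).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZL_BIG_PREVLEN 0xFEu /* marker byte that introduces the 5-byte form */

/* Write <prevlen> into dst, which must have room for 5 bytes.
 * Returns the number of bytes written: 1 or 5. */
size_t zl_encode_prevlen(unsigned char *dst, uint32_t prevlen) {
    assert(dst != NULL);
    if (prevlen < ZL_BIG_PREVLEN) {
        dst[0] = (unsigned char)prevlen; /* 0..253 fits in one byte */
        return 1;
    }
    dst[0] = (unsigned char)ZL_BIG_PREVLEN;
    dst[1] = (unsigned char)(prevlen & 0xFF); /* little endian, low byte first */
    dst[2] = (unsigned char)((prevlen >> 8) & 0xFF);
    dst[3] = (unsigned char)((prevlen >> 16) & 0xFF);
    dst[4] = (unsigned char)((prevlen >> 24) & 0xFF);
    return 5;
}

/* Read <prevlen> from src. Stores the previous entry length in *prevlen and
 * returns the size of the field itself: 1 or 5.
 * Assumes src[0] is not 0xFF, i.e. src points at an entry, not at zlend. */
size_t zl_decode_prevlen(const unsigned char *src, uint32_t *prevlen) {
    assert(src != NULL && prevlen != NULL);
    if (src[0] < ZL_BIG_PREVLEN) {
        *prevlen = src[0];
        return 1;
    }
    *prevlen = (uint32_t)src[1] | ((uint32_t)src[2] << 8) |
               ((uint32_t)src[3] << 16) | ((uint32_t)src[4] << 24);
    return 5;
}
```

A length of 253 still fits the one-byte form, while 254 is the first value that needs the 0xFE marker plus four bytes.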
* The encoding field of the entry depends on the content of the
* entry. When the entry is a string, the first 2 bits of the encoding first
* byte will hold the type of encoding used to store the length of the string,
* followed by the actual length of the string. When the entry is an integer
* the first 2 bits are both set to 1. The following 2 bits are used to specify
* what kind of integer will be stored after this header. An overview of the
* different types and encodings is as follows. The first byte is always enough
* to determine the kind of entry.
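The encoding field depends on the content of the entry. When the entry is a string,
the first 2 bits of the first encoding byte say which form is used to store the string length,
and the actual length of the string follows. When the entry is an integer, the first 2 bits are both set to 1,
and the next 2 bits specify what kind of integer is stored after this header. An overview of the types and encodings follows.
The first byte is always enough to determine the kind of entry.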
* |00pppppp| - 1 byte
* String value with length less than or equal to 63 bytes (6 bits).
* "pppppp" represents the unsigned 6 bit length.
A string of at most 63 bytes; "pppppp" is the unsigned 6-bit length.
* |01pppppp|qqqqqqqq| - 2 bytes
* String value with length less than or equal to 16383 bytes (14 bits).
* IMPORTANT: The 14 bit number is stored in big endian.
A string of at most 16383 bytes, with the length held in 14 bits.
Important: the 14-bit number is stored in big endian.
* |10000000|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt| - 5 bytes
* String value with length greater than or equal to 16384 bytes.
* Only the 4 bytes following the first byte represents the length
* up to 32^2-1. The 6 lower bits of the first byte are not used and
* are set to zero.
* IMPORTANT: The 32 bit number is stored in big endian.
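A string of 16384 bytes or more. The length is held in the 4 bytes after the first byte and can reach 2^32-1 (the "32^2-1" in the original comment is presumably a typo).
The low 6 bits of the first byte are unused and set to zero. As with the 14-bit form, this 32-bit number is stored in big endian.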
* |11000000| - 3 bytes
* Integer encoded as int16_t (2 bytes).
An integer stored as int16_t: 1 encoding byte plus 2 data bytes.
* |11010000| - 5 bytes
* Integer encoded as int32_t (4 bytes).
An integer stored as int32_t: 1 encoding byte plus 4 data bytes.
* |11100000| - 9 bytes
* Integer encoded as int64_t (8 bytes).
An integer stored as int64_t: 1 encoding byte plus 8 data bytes.
* |11110000| - 4 bytes
* Integer encoded as 24 bit signed (3 bytes).
An integer stored as a 24-bit signed value: 1 encoding byte plus 3 data bytes.
* |11111110| - 2 bytes
* Integer encoded as 8 bit signed (1 byte).
An integer stored as an 8-bit signed value: 1 encoding byte plus 1 data byte.
* |1111xxxx| - (with xxxx between 0000 and 1101) immediate 4 bit integer.
* Unsigned integer from 0 to 12. The encoded value is actually from
* 1 to 13 because 0000 and 1111 can not be used, so 1 should be
* subtracted from the encoded 4 bit value to obtain the right value.
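Here xxxx is a 4-bit immediate holding an unsigned integer from 0 to 12. The encoded value actually runs from 1 to 13 (0001 to 1101),
because 0000 and 1111 cannot be used: 11110000 is already taken by the 24-bit integer type above, and 11111111 is all ones, which marks the end of the ziplist.
(11111110 is likewise taken by the 8-bit integer type, which is why the range stops at 1101.) So subtract 1 from the encoded 4-bit value to get the real value.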
* |11111111| - End of ziplist special entry.
|11111111| is the special entry that marks the end of the ziplist.
*
* Like for the ziplist header, all the integers are represented in little
* endian byte order, even when this code is compiled in big endian systems.
As in the ziplist header, all integers are stored in little-endian byte order, even when the code is compiled on a big-endian system.
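The encoding table above can be turned into a small decoder. The sketch below is an illustration written for these notes: `zl_decoded`, `zl_kind` and `zl_decode_encoding` are hypothetical names, not Redis identifiers, and the code assumes the buffer is long enough for whatever the first byte announces. String lengths are read in big endian and integer payloads in little endian, as the comment specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ZL_INVALID = 0, ZL_STRING, ZL_INTEGER } zl_kind;

/* Hypothetical decoded form of the <encoding> field; not a Redis type. */
typedef struct {
    zl_kind kind;
    size_t enc_len;   /* bytes taken by the <encoding> field itself */
    uint32_t str_len; /* payload length when kind == ZL_STRING */
    int64_t value;    /* decoded number when kind == ZL_INTEGER */
} zl_decoded;

/* Read an nbytes-wide (1..8) little-endian two's complement integer. */
static int64_t read_le_signed(const unsigned char *p, unsigned nbytes) {
    uint64_t u = 0, mask, sign;
    unsigned i;
    for (i = 0; i < nbytes; i++) u |= (uint64_t)p[i] << (8 * i);
    mask = (nbytes == 8) ? UINT64_MAX : (((uint64_t)1 << (8 * nbytes)) - 1);
    sign = (uint64_t)1 << (8 * nbytes - 1);
    if (u & sign) return -(int64_t)(~u & mask) - 1; /* negative value */
    return (int64_t)u;
}

static zl_decoded make_int(int64_t value) {
    zl_decoded d = { ZL_INTEGER, 1, 0, 0 };
    d.value = value;
    return d;
}

/* Decode the <encoding> field that starts at p. */
zl_decoded zl_decode_encoding(const unsigned char *p) {
    zl_decoded d = { ZL_INVALID, 0, 0, 0 };
    unsigned char b;
    assert(p != NULL);
    b = p[0];
    if ((b & 0xC0) == 0x00) {        /* |00pppppp| */
        d.kind = ZL_STRING; d.enc_len = 1;
        d.str_len = b & 0x3F;
    } else if ((b & 0xC0) == 0x40) { /* |01pppppp|qqqqqqqq|, big endian */
        d.kind = ZL_STRING; d.enc_len = 2;
        d.str_len = ((uint32_t)(b & 0x3F) << 8) | p[1];
    } else if ((b & 0xC0) == 0x80) { /* |10000000| + 4 bytes, big endian */
        d.kind = ZL_STRING; d.enc_len = 5;
        d.str_len = ((uint32_t)p[1] << 24) | ((uint32_t)p[2] << 16) |
                    ((uint32_t)p[3] << 8) | p[4];
    } else if (b == 0xC0) {          /* int16_t */
        d = make_int(read_le_signed(p + 1, 2));
    } else if (b == 0xD0) {          /* int32_t */
        d = make_int(read_le_signed(p + 1, 4));
    } else if (b == 0xE0) {          /* int64_t */
        d = make_int(read_le_signed(p + 1, 8));
    } else if (b == 0xF0) {          /* 24-bit signed */
        d = make_int(read_le_signed(p + 1, 3));
    } else if (b == 0xFE) {          /* 8-bit signed */
        d = make_int(read_le_signed(p + 1, 1));
    } else if (b >= 0xF1 && b <= 0xFD) { /* immediate 0..12 */
        d = make_int((int64_t)(b & 0x0F) - 1);
    }
    return d; /* 0xFF and any other pattern stay ZL_INVALID */
}
```

Only the first byte is inspected to choose a branch, which matches the statement that the first byte is always enough to determine the kind of entry.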
* EXAMPLES OF ACTUAL ZIPLISTS
* ===========================
* The following is a ziplist containing the two elements representing
* the strings "2" and "5". It is composed of 15 bytes, that we visually
* split into sections:
Below is a ziplist holding two elements, the strings "2" and "5". It is 15 bytes long in total,
shown here split into sections for readability:
*  [0f 00 00 00] [0c 00 00 00] [02 00] [00 f3] [02 f6] [ff]
*        |             |          |       |       |     |
*     zlbytes        zltail    entries   "2"     "5"   end
*
* The first 4 bytes represent the number 15, that is the number of bytes
* the whole ziplist is composed of. The second 4 bytes are the offset
* at which the last ziplist entry is found, that is 12, in fact the
* last entry, that is "5", is at offset 12 inside the ziplist.
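The first 4 bytes hold the number 15, the total byte count of the ziplist.
The next 4 bytes hold the offset at which the last entry is found, here 12:
the last entry, "5", does sit at offset 12 from the start of the ziplist.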
* The next 16 bit integer represents the number of elements inside the
* ziplist, its value is 2 since there are just two elements inside.
The following 16-bit integer is the number of elements in the ziplist. Its value is 2 because there are exactly two elements.
* Finally "00 f3" is the first entry representing the number 2. It is
* composed of the previous entry length, which is zero because this is
* our first entry, and the byte F3 which corresponds to the encoding
* |1111xxxx| with xxxx between 0001 and 1101. We need to remove the "F"
* higher order bits 1111, and subtract 1 from the "3", so the entry value
* is "2". The next entry has a prevlen of 02, since the first entry is
* composed of exactly two bytes. The entry itself, F6, is encoded exactly
* like the first entry, and 6-1 = 5, so the value of the entry is 5.
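Finally, "00 f3" is the first entry, representing the number 2. It consists of the previous-entry length, which is 0 because this is the first entry,
followed by the byte F3, which matches the encoding |1111xxxx| with xxxx between 0001 and 1101. Strip the high 4 bits (1111) and subtract 1 from the remaining 3,
and the entry value is 2. The next entry has a prevlen of 02, because the first entry is exactly two bytes long.
The entry itself, F6, is encoded just like the first one: 6-1=5, so its value is 5.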
* Finally the special entry FF signals the end of the ziplist.
Last, the special entry FF marks the end of the ziplist.
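To check the walkthrough above, the 15-byte example can be traversed front to back. The sketch below uses hypothetical helpers (`prevlen_size`, `body_size`, `zl_walk`) written for these notes and assumes a well-formed buffer: it does only minimal bounds checking and is not how Redis itself validates a ziplist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of the <prevlen> field starting at p: 1 byte, or 5 after 0xFE. */
static size_t prevlen_size(const unsigned char *p) {
    return p[0] < 0xFE ? 1 : 5;
}

/* Size of <encoding> plus <entry-data> starting at p, or 0 if the first
 * byte is not a valid encoding. Assumes the length bytes are readable. */
static size_t body_size(const unsigned char *p) {
    unsigned char b = p[0];
    if ((b & 0xC0) == 0x00) return 1 + (size_t)(b & 0x3F);
    if ((b & 0xC0) == 0x40) return 2 + (((size_t)(b & 0x3F) << 8) | p[1]);
    if ((b & 0xC0) == 0x80) {
        uint32_t n = ((uint32_t)p[1] << 24) | ((uint32_t)p[2] << 16) |
                     ((uint32_t)p[3] << 8) | p[4];
        return 5 + (size_t)n;
    }
    if (b == 0xC0) return 3;               /* int16_t */
    if (b == 0xD0) return 5;               /* int32_t */
    if (b == 0xE0) return 9;               /* int64_t */
    if (b == 0xF0) return 4;               /* 24-bit signed */
    if (b == 0xFE) return 2;               /* 8-bit signed */
    if (b >= 0xF1 && b <= 0xFD) return 1;  /* immediate 0..12 */
    return 0;
}

/* Walk the entries of zl[0..len-1] from front to back.
 * Returns the number of entries, or -1 if the walk does not end exactly
 * on the 0xFF marker in the last byte. *last_off receives the offset of
 * the last entry (left at 0 when the list is empty). */
long zl_walk(const unsigned char *zl, size_t len, size_t *last_off) {
    size_t off = 10; /* skip zlbytes, zltail and zllen */
    long count = 0;
    assert(zl != NULL && last_off != NULL);
    *last_off = 0;
    while (off < len && zl[off] != 0xFF) {
        size_t psz = prevlen_size(zl + off);
        size_t bsz;
        if (off + psz >= len) return -1;
        bsz = body_size(zl + off + psz);
        if (bsz == 0) return -1;
        *last_off = off;
        off += psz + bsz;
        count++;
    }
    return (len > 0 && off == len - 1) ? count : -1;
}
```

Run on the example bytes, the walk visits the entries at offsets 10 and 12 and stops on FF at offset 14, so the last-entry offset agrees with the zltail value of 12.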
* Adding another element to the above string with the value "Hello World"
* allows us to show how the ziplist encodes small strings. We'll just show
* the hex dump of the entry itself. Imagine the bytes as following the
* entry that stores "5" in the ziplist above:
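Appending another element with the value "Hello World" to the ziplist above shows how small strings are encoded.
Only the hex dump of the new entry is shown; picture these bytes following the entry that stores "5" in the ziplist above: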
* [02] [0b] [48 65 6c 6c 6f 20 57 6f 72 6c 64]
* The first byte, 02, is the length of the previous entry. The next
* byte represents the encoding in the pattern |00pppppp| that means
* that the entry is a string of length <pppppp>, so 0B means that
* an 11 bytes string follows. From the third byte (48) to the last (64)
* there are just the ASCII characters for "Hello World".
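The first byte, 02, is the length of the previous entry. The next byte uses the pattern |00pppppp|,
meaning the entry is a string of length <pppppp>, so 0B says that an 11-byte string follows.
From the third byte (48) to the last (64) are simply the ASCII characters of "Hello World".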
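The "Hello World" entry can be reproduced with a short writer for the simplest case. `zl_write_small_string` is a hypothetical helper written for these notes, not a Redis function; it handles only a one-byte <prevlen> (below 254) and the |00pppppp| string form (at most 63 bytes), and refuses anything else.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write <prevlen> <encoding> <entry-data> for a short string into dst.
 * Only covers prevlen < 254 and len <= 63, so both header fields take
 * one byte each. Returns the bytes written, or 0 if the input is out of
 * range or dst (capacity cap) is too small. */
size_t zl_write_small_string(unsigned char *dst, size_t cap,
                             unsigned prevlen, const char *s, size_t len) {
    assert(dst != NULL && s != NULL);
    if (prevlen >= 254 || len > 63 || cap < len + 2) return 0;
    dst[0] = (unsigned char)prevlen; /* one-byte <prevlen> */
    dst[1] = (unsigned char)len;     /* top bits 00 give |00pppppp| */
    memcpy(dst + 2, s, len);         /* raw string bytes, no terminator */
    return len + 2;
}
```

Calling it with a prevlen of 2 and the 11-byte string "Hello World" produces 13 bytes: 02, 0b, then the ASCII characters shown in the dump above.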
* ----------------------------------------------------------------------------