Redis 6.0.5 Rax reading notes (1): data structures and some helper functions

/* Representation of a radix tree as implemented in this file, that contains
 * the strings "foo", "foobar" and "footer" after the insertion of each
 * word. When the node represents a key inside the radix tree, we write it
 * between [], otherwise it is written between ().
 * Note: the three strings are inserted one after another; [] marks a node
 * that represents a key, () marks a node that does not.
 *
 * This is the vanilla representation:
 *              (f) ""
 *                \
 *                (o) "f"
 *                  \
 *                  (o) "fo"
 *                    \
 *                  [t   b] "foo"  <- this node is a key
 *                  /     \
 *         "foot" (e)     (a) "foob"
 *                /         \
 *      "foote" (r)         (r) "fooba"
 *              /             \
 *    "footer" []   key        [] "foobar" key
 *
 * However, this implementation implements a very common optimization where
 * successive nodes having a single child are "compressed" into the node
 * itself as a string of characters, each representing a next-level child,
 * and only the link to the node representing the last character node is
 * provided inside the representation (that is, only the last character of
 * the string has a child link). So the above representation is turned
 * into:
 *                  ["foo"] ""  <- compressed node: only the last character links to a child, the intermediate nodes are gone
 *                     |
 *                  [t   b] "foo"
 *                  /     \
 *        "foot" ("er")    ("ar") "foob"  <- compressed nodes
 *                 /          \
 *       "footer" []          [] "foobar"
 *
 * However this optimization makes the implementation a bit more complex.
 * For instance if a key "first" is added in the above radix tree, a
 * "node splitting" operation is needed, since the "foo" prefix is no longer
 * composed of nodes having a single child one after the other. This is the
 * above tree and the resulting node splitting after this event happens:
 * Note: once "first" is inserted, the "foo" prefix is no longer a chain of
 * single-child nodes, so the compressed node has to be split:
 *                    (f) ""
 *                    /
 *                 (i o) "f"  <- the split happens here
 *                 /   \
 *      "fi"  ("rst")  (o) "fo"
 *              /        \
 *    "first" []       [t   b] "foo"
 *                     /     \
 *           "foot" ("er")    ("ar") "foob"
 *                    /          \
 *          "footer" []          [] "foobar"
 *
 * Similarly after deletion, if a new chain of nodes having a single child
 * is created (the chain must also not include nodes that represent keys),
 * it must be compressed back into a single node.
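 * Note: deletion is the mirror case. A newly formed chain of single-child
 * nodes, none of which represents a key, must be merged back into one
 * compressed node.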
 */
#define RAX_NODE_MAX_SIZE ((1<<29)-1)
typedef struct raxNode {
    uint32_t iskey:1;     /* Does this node contain a key? */
    uint32_t isnull:1;    /* Associated value is NULL (don't store it). A value
                           * pointer is stored only when iskey=1 and isnull=0. */
    uint32_t iscompr:1;   /* Node is compressed. */
    uint32_t size:29;     /* Number of children, or compressed string len. */
    /* Data layout is as follows:
     *
     * If node is not compressed we have 'size' bytes, one for each child
     * character, and 'size' raxNode pointers, pointing to each child node.
     * Note how the character is not stored in the children but in the
     * edge of the parents:
     *
     * [header iscompr=0][abc][a-ptr][b-ptr][c-ptr](value-ptr?)
     *
     * Legend: node header with iscompr=0, the edge characters, one child
     * pointer per character, then the value pointer if there is one.
     *
     * If node is compressed (iscompr bit is 1) the node has 1 child.
     * In that case the 'size' bytes of the string stored immediately at
     * the start of the data section, represent a sequence of successive
     * nodes linked one after the other, for which only the last one in
     * the sequence is actually represented as a node, and pointed to by
     * the current compressed node.
     *
     * [header iscompr=1][xyz][z-ptr](value-ptr?)
     *
     * Legend: node header with iscompr=1, the compressed string, a single
     * child pointer belonging to the last character, then the value pointer
     * if there is one.
     *
     * Both compressed and not compressed nodes can represent a key
     * with associated data in the radix tree at any level (not just terminal
     * nodes).
     *
     * If the node has an associated key (iskey=1) and is not NULL
     * (isnull=0), then after the raxNode pointers pointing to the
     * children, an additional value pointer is present (as you can see
     * in the representation above as "value-ptr" field).
     */
    unsigned char data[]; /* Node payload: characters, padding, child pointers, optional value pointer. */
} raxNode;
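The layout rules above can be checked with a few lines of C. This is only a minimal sketch: the struct is copied from the listing, while the `sketch_*` helpers are hypothetical names introduced for illustration and are not part of rax.c. It assumes a compiler that packs the four bit-fields into a single 32-bit word, which is what the "four bytes header" remark in the source relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy of the node header from the listing above. */
typedef struct raxNode {
    uint32_t iskey:1;
    uint32_t isnull:1;
    uint32_t iscompr:1;
    uint32_t size:29;
    unsigned char data[];
} raxNode;

/* Hypothetical helper: number of bytes in front of the data section. */
static size_t sketch_header_bytes(void) {
    return offsetof(raxNode, data);
}

/* Hypothetical helper: how many child pointers the node stores.
 * A compressed node always has exactly one, whatever its string length. */
static size_t sketch_child_ptrs(const raxNode *n) {
    assert(n != NULL);
    return n->iscompr ? 1 : (size_t)n->size;
}

/* Hypothetical helper: 1 if a value pointer follows the child pointers. */
static int sketch_has_value_ptr(const raxNode *n) {
    assert(n != NULL);
    return n->iskey && !n->isnull;
}
```

For a non-compressed node with the characters "abc", `sketch_child_ptrs` gives 3; for a compressed node holding "xyz" it gives 1, matching the two layout diagrams in the comment.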
/* The radix tree itself needs only three fields: the pointer to the head
 * node, the number of elements (keys), and the number of nodes. */

typedef struct rax {
    raxNode *head;
    uint64_t numele;
    uint64_t numnodes;
} rax;
*********************************************************************************** 
/* ------------------------- raxStack functions --------------------------
 * The raxStack is a simple stack of pointers that is capable of switching
 * from using a stack-allocated array to dynamic heap once a given number of
 * items are reached. It is used in order to retain the list of parent nodes
 * while walking the radix tree in order to implement certain operations that
 * need to navigate the tree upward.
 * ------------------------------------------------------------------------- */

#define RAX_STACK_STATIC_ITEMS 32

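/* raxStack functions.
 * The stack is a simple stack of pointers. It starts out on a fixed array
 * embedded in the raxStack structure (so no heap allocation is needed) and
 * switches to heap memory once that array is full. It records the parent
 * nodes visited while walking down the tree, so that operations that have
 * to move back up the tree can do so. */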
/* Initialize the stack. */
static inline void raxStackInit(raxStack *ts) {
    ts->stack = ts->static_items;          /* Start on the preallocated static array. */
    ts->items = 0;                         /* No items yet. */
    ts->maxitems = RAX_STACK_STATIC_ITEMS; /* Initial capacity: 32. */
    ts->oom = 0;
}
/* Push an item into the stack, returns 1 on success, 0 on out of memory. */
static inline int raxStackPush(raxStack *ts, void *ptr) {
    if (ts->items == ts->maxitems) { /* The stack is full. */
        if (ts->stack == ts->static_items) { /* Still on the static array. */
            /* First growth: allocate a heap array twice as large. */
            ts->stack = rax_malloc(sizeof(void*)*ts->maxitems*2);
            if (ts->stack == NULL) { /* Allocation failed. */
                ts->stack = ts->static_items; /* Keep using the static array. */
                ts->oom = 1; /* Record the OOM condition. */
                errno = ENOMEM;
                return 0;
            }
            /* Success: copy the existing items into the new array. */
            memcpy(ts->stack,ts->static_items,sizeof(void*)*ts->maxitems);
        } else {
            /* Already on the heap: grow the existing allocation. */
            void **newalloc = rax_realloc(ts->stack,sizeof(void*)*ts->maxitems*2);
            if (newalloc == NULL) { /* Allocation failed, old array still valid. */
                ts->oom = 1;
                errno = ENOMEM;
                return 0;
            }
            /* realloc preserves the contents (and may move the block), so no
             * manual copy is needed. */
            ts->stack = newalloc;
        }
        ts->maxitems *= 2; /* On success the capacity doubles. */
    }
    ts->stack[ts->items] = ptr; /* Store the new item. */
    ts->items++;
    return 1;
}
/* Pop an item from the stack, the function returns NULL if there are no
 * items to pop. */
static inline void *raxStackPop(raxStack *ts) {
    if (ts->items == 0) return NULL; /* Nothing to pop. */
    ts->items--;                     /* Decrement first... */
    return ts->stack[ts->items];     /* ...then return the former top item. */
}
/* Return the stack item at the top of the stack without actually consuming
 * it (the item count is not decremented). */
static inline void *raxStackPeek(raxStack *ts) {
    if (ts->items == 0) return NULL; /* Nothing to look at. */
    return ts->stack[ts->items-1];   /* The top item, left in place. */
}
/* Free the stack in case we used heap allocation; the static array needs no
 * free. */
static inline void raxStackFree(raxStack *ts) {
    if (ts->stack != ts->static_items) rax_free(ts->stack);
}
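The switch from the static array to the heap is easiest to see by driving the stack past 32 items. The following is a minimal sketch under two assumptions that are not shown in the listing: the `raxStack` structure is reconstructed here from the fields the functions access, and the `rax_malloc` / `rax_realloc` / `rax_free` hooks are mapped directly to the C library. `sketch_roundtrip` is a hypothetical driver written for this note.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define RAX_STACK_STATIC_ITEMS 32
/* Assumption: allocator hooks map straight to the C library. */
#define rax_malloc malloc
#define rax_realloc realloc
#define rax_free free

/* Reconstructed from the fields used by the functions in the listing. */
typedef struct raxStack {
    void **stack;            /* Points to static_items or to a heap array. */
    size_t items, maxitems;  /* Items stored and current capacity. */
    void *static_items[RAX_STACK_STATIC_ITEMS];
    int oom;                 /* Set once a push failed for lack of memory. */
} raxStack;

static inline void raxStackInit(raxStack *ts) {
    ts->stack = ts->static_items;
    ts->items = 0;
    ts->maxitems = RAX_STACK_STATIC_ITEMS;
    ts->oom = 0;
}

static inline int raxStackPush(raxStack *ts, void *ptr) {
    if (ts->items == ts->maxitems) {
        if (ts->stack == ts->static_items) {
            ts->stack = rax_malloc(sizeof(void*)*ts->maxitems*2);
            if (ts->stack == NULL) {
                ts->stack = ts->static_items;
                ts->oom = 1;
                errno = ENOMEM;
                return 0;
            }
            memcpy(ts->stack,ts->static_items,sizeof(void*)*ts->maxitems);
        } else {
            void **newalloc = rax_realloc(ts->stack,sizeof(void*)*ts->maxitems*2);
            if (newalloc == NULL) {
                ts->oom = 1;
                errno = ENOMEM;
                return 0;
            }
            ts->stack = newalloc;
        }
        ts->maxitems *= 2;
    }
    ts->stack[ts->items] = ptr;
    ts->items++;
    return 1;
}

static inline void *raxStackPop(raxStack *ts) {
    if (ts->items == 0) return NULL;
    ts->items--;
    return ts->stack[ts->items];
}

static inline void raxStackFree(raxStack *ts) {
    if (ts->stack != ts->static_items) rax_free(ts->stack);
}

/* Hypothetical driver: push n pointers, report through *used_heap whether
 * the stack left the static array, then pop everything and check that the
 * items come back in LIFO order. Returns 1 if every check passed. */
static int sketch_roundtrip(size_t n, int *used_heap) {
    raxStack ts;
    int ok = 1;
    int *vals = malloc(sizeof(int) * (n ? n : 1));
    assert(used_heap != NULL);
    if (vals == NULL) return 0;
    raxStackInit(&ts);
    for (size_t i = 0; i < n; i++) {
        if (!raxStackPush(&ts, &vals[i])) { ok = 0; break; }
    }
    *used_heap = (ts.stack != ts.static_items);
    for (size_t i = n; ok && i > 0; i--) {
        if (raxStackPop(&ts) != &vals[i-1]) ok = 0;
    }
    if (ok && raxStackPop(&ts) != NULL) ok = 0; /* Empty stack pops NULL. */
    raxStackFree(&ts);
    free(vals);
    return ok;
}
```

Pushing exactly 32 items stays on the static array, because the capacity check runs before each insertion; the 33rd push is the one that triggers the first heap allocation.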

/* ----------------------------------------------------------------------------
 * Radix tree implementation
 * --------------------------------------------------------------------------*/
*********************************************************************************** 
/* Return the padding needed in the characters section of a node having size
 * 'nodesize'. The padding is needed to store the child pointers to aligned
 * addresses. Note that we add 4 to the node size because the node has a four
 * bytes header. */
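/* Note: the padding makes the child pointers start at an address aligned to
 * sizeof(void*), which is 8 bytes on a 64-bit build. The 4 added to the node
 * size accounts for the 4-byte node header, that is, these four bit-fields: */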
    uint32_t iskey:1;     /* Does this node contain a key? */
    uint32_t isnull:1;    /* Associated value is NULL (don't store it). */
    uint32_t iscompr:1;   /* Node is compressed. */
    uint32_t size:29;     /* Number of children, or compressed string len. */
    
#define raxPadding(nodesize) ((sizeof(void*)-((nodesize+4) % sizeof(void*))) & (sizeof(void*)-1))
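A short worked example may make the padding formula easier to follow. The macro below is copied from the listing; `sketch_first_ptr_offset` is a hypothetical helper added for illustration. On a platform with 8-byte pointers, a node of size 3 needs 1 padding byte (4 + 3 = 7, rounded up to 8), size 4 needs 0 (4 + 4 = 8), and size 5 needs 7 (4 + 5 = 9, rounded up to 16). The trailing `& (sizeof(void*)-1)` is what turns a full pointer width of padding into 0 when the data is already aligned.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from the listing above. */
#define raxPadding(nodesize) ((sizeof(void*)-((nodesize+4) % sizeof(void*))) & (sizeof(void*)-1))

/* Hypothetical helper: byte offset, counted from the start of the node, at
 * which the first child pointer is stored. It is the 4-byte header plus the
 * characters plus the padding, and must be a multiple of the pointer size. */
static size_t sketch_first_ptr_offset(size_t nodesize) {
    size_t off = 4 + nodesize + raxPadding(nodesize);
    assert(off % sizeof(void*) == 0);
    return off;
}
```

Note that the macro does not parenthesize its argument, so it is safest to pass it a plain variable or field, as rax.c does.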
***********************************************************************************
/* Return the current total size of the node. Note that the second line
 * computes the padding after the string of characters, needed in order to
 * save pointers to aligned addresses. */
/* Note: total = header + characters + padding + child pointers + value
 * pointer (the last term only when iskey is set and isnull is not). */
#define raxNodeCurrentLength(n) ( \
    sizeof(raxNode)+(n)->size+ \
    raxPadding((n)->size)+ \
    ((n)->iscompr ? sizeof(raxNode*) : sizeof(raxNode*)*(n)->size)+ \
    (((n)->iskey && !(n)->isnull)*sizeof(void*)) \
) 
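To see the terms of the length macro add up, here is a minimal sketch that evaluates it for a header built on the fly. The struct and both macros are copied from the listing; `sketch_node_len` is a hypothetical helper. With 8-byte pointers, a non-compressed key node with 3 children and a value occupies 4 + 3 + 1 + 3*8 + 8 = 40 bytes, while a compressed non-key node holding a 3-character string occupies 4 + 3 + 1 + 8 = 16 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from the listing above. */
typedef struct raxNode {
    uint32_t iskey:1;
    uint32_t isnull:1;
    uint32_t iscompr:1;
    uint32_t size:29;
    unsigned char data[];
} raxNode;

#define raxPadding(nodesize) ((sizeof(void*)-((nodesize+4) % sizeof(void*))) & (sizeof(void*)-1))

#define raxNodeCurrentLength(n) ( \
    sizeof(raxNode)+(n)->size+ \
    raxPadding((n)->size)+ \
    ((n)->iscompr ? sizeof(raxNode*) : sizeof(raxNode*)*(n)->size)+ \
    (((n)->iskey && !(n)->isnull)*sizeof(void*)) \
)

/* Hypothetical helper: total length of a node with the given header bits.
 * Only the header is needed, because the macro never reads the payload. */
static size_t sketch_node_len(unsigned iscompr, unsigned size,
                              unsigned iskey, unsigned isnull) {
    raxNode hdr = { .iskey = iskey, .isnull = isnull,
                    .iscompr = iscompr, .size = size };
    size_t len = raxNodeCurrentLength(&hdr);
    assert(len % sizeof(void*) == 0); /* Nodes end on a pointer boundary. */
    return len;
}
```

The difference between a compressed and a non-compressed node of the same `size` is always `(size - 1)` pointers, since the compressed form keeps a single child pointer.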
*********************************************************************************** 
/* Return the pointer to the last child pointer in a node. For the compressed
 * nodes this is the only child pointer. */
#define raxNodeLastChildPtr(n) ((raxNode**) ( \
    ((char*)(n)) +              /* Byte-wise arithmetic from the node start. */ \
    raxNodeCurrentLength(n) -   /* Total node length in bytes. */ \
    sizeof(raxNode*) -          /* Step back over the last child pointer. */ \
    (((n)->iskey && !(n)->isnull) ? sizeof(void*) : 0) /* And over the value pointer, if present. */ \
))
*********************************************************************************** 
/* Return the pointer to the first child pointer. */
#define raxNodeFirstChildPtr(n) ((raxNode**) ( \
    (n)->data +                 /* Start of the data section. */ \
    (n)->size +                 /* Skip the characters... */ \
    raxPadding((n)->size)))     /* ...and the padding: the first child pointer starts here. */
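The two child-pointer macros can be cross-checked against each other: the first one counts forward from the data section, the second one counts backward from the end of the node. The sketch below assumes the definitions from the listing (copied here so the block compiles on its own) and uses two hypothetical helpers, `sketch_new_node` and `sketch_ptr_span`, that are not part of rax.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Definitions copied from the listing above. */
typedef struct raxNode {
    uint32_t iskey:1;
    uint32_t isnull:1;
    uint32_t iscompr:1;
    uint32_t size:29;
    unsigned char data[];
} raxNode;

#define raxPadding(nodesize) ((sizeof(void*)-((nodesize+4) % sizeof(void*))) & (sizeof(void*)-1))

#define raxNodeCurrentLength(n) ( \
    sizeof(raxNode)+(n)->size+ \
    raxPadding((n)->size)+ \
    ((n)->iscompr ? sizeof(raxNode*) : sizeof(raxNode*)*(n)->size)+ \
    (((n)->iskey && !(n)->isnull)*sizeof(void*)) \
)

#define raxNodeLastChildPtr(n) ((raxNode**) ( \
    ((char*)(n)) + \
    raxNodeCurrentLength(n) - \
    sizeof(raxNode*) - \
    (((n)->iskey && !(n)->isnull) ? sizeof(void*) : 0) \
))

#define raxNodeFirstChildPtr(n) ((raxNode**) ( \
    (n)->data + \
    (n)->size + \
    raxPadding((n)->size)))

/* Hypothetical helper: allocate a zeroed node large enough for the given
 * header bits. The caller releases it with free(). */
static raxNode *sketch_new_node(unsigned iscompr, unsigned size, unsigned iskey) {
    raxNode hdr = { .iskey = iskey, .isnull = 0, .iscompr = iscompr, .size = size };
    raxNode *n = calloc(1, raxNodeCurrentLength(&hdr));
    if (n == NULL) return NULL;
    n->iskey = iskey;
    n->iscompr = iscompr;
    n->size = size;
    return n;
}

/* Hypothetical helper: distance, in pointers, between the first and the
 * last child pointer slot of a node. */
static ptrdiff_t sketch_ptr_span(raxNode *n) {
    assert(n != NULL);
    return raxNodeLastChildPtr(n) - raxNodeFirstChildPtr(n);
}
```

For a non-compressed node with 3 children the span is 2 (slots 0, 1 and 2), with or without a value pointer; for a compressed node it is 0, because the first and the last child pointer are the same slot.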
***********************************************************************************
posted on 2021-01-11 20:27 by 子虚乌有