Redis 6.0.5 dict reading notes, part 8: the curious dictScan function

We have already gone through the basic functions. Next we look at the remaining ones, and they turn out to be rather curious.
******************************************************************
/* dictScan() is used to iterate over the elements of a dictionary.
In short: dictScan walks over all the elements of the dictionary, one step per call.
 * Iterating works the following way:
The iteration proceeds as follows:
 * 1) Initially you call the function using a cursor (v) value of 0.
 * 2) The function performs one step of the iteration, and returns the
 *    new cursor value you must use in the next call.
 * 3) When the returned cursor is 0, the iteration is complete.
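1) Call the function with a cursor whose initial value is 0.
2) Each call performs a single step of the iteration and returns a new cursor, which you pass to the next call.
3) A returned cursor of 0 means the iteration has finished.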
 * The function guarantees all elements present in the
 * dictionary get returned between the start and end of the iteration.
 * However it is possible some elements get returned multiple times.
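The function guarantees that every element that is present in the dictionary for the whole iteration, from start to end, is returned.
Some elements, however, may be returned more than once.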
 * For every element returned, the callback argument 'fn' is
 * called with 'privdata' as first argument and the dictionary entry
 * 'de' as second argument.
For each returned element the callback fn is invoked with 'privdata' as its first argument and the dictionary entry 'de' as its second.
 * HOW IT WORKS.
How does the algorithm work?
 * The iteration algorithm was designed by Pieter Noordhuis.
The algorithm was designed by Pieter Noordhuis.
 * The main idea is to increment a cursor starting from the higher order
 * bits. That is, instead of incrementing the cursor normally, the bits
 * of the cursor are reversed, then the cursor is incremented, and finally
 * the bits are reversed again.
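The main idea is to increment the cursor starting from its high-order bits instead of the usual low-order bits:
the bits of the cursor are reversed, the cursor is incremented, and the bits are reversed back.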
 * This strategy is needed because the hash table may be resized between
 * iteration calls.
This strategy is needed because the hash table may be resized between two calls of the iteration.
 * dict.c hash tables are always power of two in size, and they
 * use chaining, so the position of an element in a given table is given
 * by computing the bitwise AND between Hash(key) and SIZE-1
 * (where SIZE-1 is always the mask that is equivalent to taking the rest
 *  of the division between the Hash of the key and SIZE).
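Hash tables in dict.c always have a power-of-two size and use chaining,
so the position of an element in a given table is the bitwise AND of the key's hash and SIZE-1.
(SIZE-1 is the mask; the AND is equivalent to taking the remainder of the hash divided by SIZE.)
That is: hash(key) & (SIZE-1) == hash(key) % SIZE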
 * For example if the current hash table size is 16, the mask is
 * (in binary) 1111. The position of a key in the hash table will always be
 * the last four bits of the hash output, and so forth.
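For example, if the current hash table size is 16, the mask is binary 1111 (decimal 15),
and the position of a key is always the last four bits of its hash. The same holds for tables of other sizes.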

 * WHAT HAPPENS IF THE TABLE CHANGES IN SIZE?
What happens if the size of the hash table changes?
 * If the hash table grows, elements can go anywhere in one multiple of
 * the old bucket: for example let's say we already iterated with
 * a 4 bit cursor 1100 (the mask is 1111 because hash table size = 16).
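If the hash table grows, the elements of an old bucket are spread over the new buckets whose low-order bits equal the old bucket index. For example, say we have already iterated with the 4-bit cursor 1100
(the mask is binary 1111 because the table size is 16).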
 * If the hash table will be resized to 64 elements, then the new mask will
 * be 111111. The new buckets you obtain by substituting in ??1100
 * with either 0 or 1 can be targeted only by keys we already visited
 * when scanning the bucket 1100 in the smaller hash table.
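Suppose the hash table is then resized to 64 buckets, so the new mask is 63 (111111).
The new buckets have the form ??1100, where each ? is either 0 or 1.
Those buckets can only receive keys that we already visited when scanning bucket 1100 in the smaller table.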
 * By iterating the higher bits first, because of the inverted counter, the
 * cursor does not need to restart if the table size gets bigger. It will
 * continue iterating using cursors without '1100' at the end, and also
 * without any other combination of the final 4 bits already explored.
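Because the counter is reversed and the high-order bits are iterated first, the cursor does not have to restart when the table grows.
It keeps iterating and skips the cursors that end in 1100 as well as every other combination of the last 4 bits that was already explored.
(Why does this work? Take the cursor 0011, whose bit reversal is the counter value 1100. If the table doubles, the cursor is read as 00011.
Since the scan starts from 0, we have already visited 0000, 1000, 0100, 1100, 0010, 1010,
0110, 1110, 0001, 1001, 0101 and 1101; 0011 is the current position, and 1011, 0111 and 1111 are still to come.
After the growth the scan continues from 00011. Counting starts from the high-order bit, so every bucket already visited,
with a 0 or a 1 added as the new highest bit, comes before 00011 in the reversed order, and none of those positions has to be visited again.
The cursors still to be visited are 00011, 10011, 01011, 11011, 00111, 10111, 01111, 11111 and finally 00000,
that is 3 --> 13 --> b --> 1b --> 7 --> 17 --> f --> 1f --> 0
This continues the earlier sequence without a gap, so nothing is visited twice.)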
 * Similarly when the table size shrinks over time, for example going from
 * 16 to 8, if a combination of the lower three bits (the mask for size 8
 * is 111) were already completely explored, it would not be visited again
 * because we are sure we tried, for example, both 0111 and 1111 (all the
 * variations of the higher bit) so we don't need to test it again.
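Likewise when the table shrinks, for example from 16 to 8: if a combination of the lower three bits has already been completely explored (the mask for size 8 is 111),
it is not visited again, because we are sure we tried, for example, both 0111 and 1111 (they differ only in the highest bit), so there is no need to test it again.
(Why does this work? Take the cursor 0011 again, whose bit reversal is 1100. If the table shrinks, the cursor is read as 011.
Since the scan starts from 0, we have already visited 0000, 1000, 0100, 1100, 0010, 1010,
0110, 1110, 0001, 1001, 0101 and 1101; 0011 is the current position, and 1011, 0111 and 1111 are still to come.
If the table shrinks at this point, the remaining cursors are 011, 111 and then 0, and nothing is repeated.
If instead we had already advanced to 1011 when the table shrank, the next bucket to visit would be 011.
The elements of the old bucket 0011 were already visited and now live in bucket 011, so they are returned again. This is how duplicate elements can appear.)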
 * WAIT... YOU HAVE *TWO* TABLES DURING REHASHING!
Wait... during rehashing you have two tables!
 * Yes, this is true, but we always iterate the smaller table first, then
 * we test all the expansions of the current cursor into the larger
 * table. For example if the current cursor is 101 and we also have a
 * larger table of size 16, we also test (0)101 and (1)101 inside the larger
 * table. This reduces the problem back to having only one table, where
 * the larger one, if it exists, is just an expansion of the smaller one.
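Yes, that is true, but we always iterate the smaller table first and then visit every expansion of the current cursor in the larger table.
For example, if the current cursor is 101 and there is also a larger table of size 16, we visit (0)101 and (1)101 in the larger table as well.
This reduces the problem to the single-table case: the larger table, if it exists, is just an expansion of the smaller one.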
 * LIMITATIONS
Limitations
 * This iterator is completely stateless, and this is a huge advantage,
 * including no additional memory used.
The iterator is completely stateless, so it needs no additional memory, which is a huge advantage.
 * The disadvantages resulting from this design are:
The drawbacks of this design are:
 * 1) It is possible we return elements more than once. However this is usually
 *    easy to deal with in the application level.
 * 2) The iterator must return multiple elements per call, as it needs to always
 *    return all the keys chained in a given bucket, and all the expansions, so
 *    we are sure we don't miss keys moving during rehashing.
 * 3) The reverse cursor is somewhat hard to understand at first, but this
 *    comment is supposed to help.
 */
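// 1) The function may return the same element more than once, but this is easy to handle at the application level (by deduplicating).
// 2) The iterator has to return multiple elements per call, because it must return every key chained in a given bucket plus all of its expansions (the expansions seen during rehashing),
//    which is how we make sure no key is missed while keys move during rehashing.
// 3) The reversed cursor is hard to understand at first, but the comment above is meant to help.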
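The cursor order and the resize behaviour described in the comment can be tried outside Redis. The following is a minimal sketch, not the dict.c code: `rev_bits`, `bucket_index`, `next_cursor`, `scan_order` and `resize_check` are names made up for this note, and the model assumes that every hash value below the larger table size stands for exactly one key and that the resize happens atomically between two calls (real Redis rehashes incrementally with two tables).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Reverse all bits of an unsigned long (the width must be a power of two). */
static unsigned long rev_bits(unsigned long v) {
    unsigned long s = CHAR_BIT * sizeof(v);
    unsigned long mask = ~0UL;
    while ((s >>= 1) > 0) {
        mask ^= (mask << s);
        v = ((v >> s) & mask) | ((v << s) & ~mask);
    }
    return v;
}

/* Bucket index of a hash in a table whose size is a power of two. */
static unsigned long bucket_index(unsigned long hash, unsigned long size) {
    return hash & (size - 1);
}

/* One cursor step for a table whose index mask is `mask` (size - 1). */
static unsigned long next_cursor(unsigned long v, unsigned long mask) {
    v |= ~mask;       /* bits outside the mask become 1; reversed, they pass the carry on */
    v = rev_bits(v);
    v++;              /* +1 on the reversed value: the carry runs from high bit to low bit */
    return rev_bits(v);
}

/* Store the order in which bucket indexes are visited in out[] (may be NULL).
 * Returns the number of calls needed until the cursor is 0 again. */
static int scan_order(unsigned long mask, unsigned long *out, int cap) {
    unsigned long v = 0;
    int n = 0;
    do {
        if (out != NULL && n < cap) out[n] = v & mask;
        n++;
        v = next_cursor(v, mask);
    } while (v != 0);
    return n;
}

/* Toy model: scan `steps` buckets with old_mask, switch to new_mask between
 * two calls, then finish the scan. Returns -1 if some key was never returned,
 * otherwise the number of keys that were returned more than once. */
static int resize_check(unsigned long old_mask, unsigned long new_mask, int steps) {
    unsigned long big = (old_mask > new_mask ? old_mask : new_mask) + 1;
    int seen[64] = {0};
    unsigned long v = 0, mask = old_mask, h;
    int calls = 0, dups = 0;

    assert(big <= 64);                  /* size limit of this toy model */
    do {
        if (calls == steps) mask = new_mask;
        for (h = 0; h < big; h++)       /* "return" every key in the current bucket */
            if (bucket_index(h, mask + 1) == (v & mask)) seen[h]++;
        v = next_cursor(v, mask);
        calls++;
    } while (v != 0);
    for (h = 0; h < big; h++) {
        if (seen[h] == 0) return -1;
        if (seen[h] > 1) dups++;
    }
    return dups;
}
```

With a mask of 15, `scan_order` yields 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15, the same order as in the notes above. In this model `resize_check(15, 31, 12)` (grow when the cursor is 0011) and `resize_check(15, 7, 12)` (shrink at the same point) report no duplicates, while `resize_check(15, 7, 13)` (shrink when the cursor is 1011) reports one key returned twice.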
unsigned long dictScan(dict *d,
                       unsigned long v,
                       dictScanFunction *fn,
                       dictScanBucketFunction* bucketfn,
                       void *privdata)
{
    dictht *t0, *t1;
    const dictEntry *de, *next;
    unsigned long m0, m1;

    if (dictSize(d) == 0) return 0;  // no elements: return 0 right away, the scan is over

    /* Having a safe iterator means no rehashing can happen, see _dictRehashStep.
     * This is needed in case the scan callback tries to do dictFind or alike. */
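    // A safe iterator means no rehashing may happen in the meantime; see _dictRehashStep.
    // This is needed in case the scan callback calls dictFind or something similar.
    d->iterators++; // while this is non-zero _dictRehashStep does nothing; rehashing steps resume once it is back to 0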

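    if (!dictIsRehashing(d)) { // not rehashing: there is only one table
        t0 = &(d->ht[0]);   // address of the table
        m0 = t0->sizemask;  // its mask

        /* Emit entries at cursor */ // visit the bucket the cursor points to
        if (bucketfn) bucketfn(privdata, &t0->table[v & m0]);  // call the per-bucket callback, if one was passed in
        de = t0->table[v & m0]; // first entry of the bucket selected by the cursor v
        while (de) {
            next = de->next;  // fetch the next entry up front
            fn(privdata, de);  // call the per-entry callback
            de = next; // move on to the next entry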
        }

        /* Set unmasked bits so incrementing the reversed cursor
         * operates on the masked bits */
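        // Set the bits outside the mask so that incrementing the reversed cursor acts on the masked bits. An example:
        // assume v=3ul, m0=15ul

        v |= ~m0;  //fffffffffffffff3
        /* Increment the reverse cursor */
        // an addition that carries from the high bit toward the low bit: 00->10->01->11
        v = rev(v);  //cfffffffffffffff  0011->1100  3->c
        v++;         //d000000000000000 =  cfffffffffffffff + 1; this is the step the header comment describes
        v = rev(v);  //000000000000000b  d->b   1101 -> 1011
// These few lines are the core that the header comment calls hard to understand, and they are elegant.
// Below is an equivalent "forward" version that makes them easier to follow.
//unsigned long m = t->sizemask; // work on a copy so the table's mask is not modified
//int count = 0;
//while(m)
//{
//    count++;
//    m >>= 1;
//}
//int movebits = sizeof(v)*8-count; // movebits is the number of bits outside the mask: 64-4
// The above shows how movebits is obtained. Now the forward explanation itself,
// with the same initial values as above: t->sizemask=15, v=3ul, so movebits=60
//r2 = v; r2 <<=movebits;   r2=3, 3<<60  ->  3000000000000000   v is a reversed value, so first move it up past the bits outside the mask
//kk = rev(r2);   3000000000000000 -> c   0011->1100   reverse it to get a normal counter
//kk++;          c->d   1100 + 1 = 1101                an ordinary +1
//r2=rev(kk);    d-> b000000000000000    1101->1011    reverse it back
//r2 >>=movebits;  b000000000000000>>60 = b            drop the bits outside the mask again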
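    } else { // rehashing in progress: there are two tables
        t0 = &d->ht[0];  // first table
        t1 = &d->ht[1];  // second table

        /* Make sure t0 is the smaller and t1 is the bigger table */
        // make sure t0 is smaller than t1
        if (t0->size > t1->size) { // compare the bucket counts; if t0 is larger, swap the two
            t0 = &d->ht[1];
            t1 = &d->ht[0];
        }

        m0 = t0->sizemask; // masks of both tables
        m1 = t1->sizemask;

        /* Emit entries at cursor */ // same as above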
        if (bucketfn) bucketfn(privdata, &t0->table[v & m0]);
        de = t0->table[v & m0];
        while (de) {
            next = de->next;
            fn(privdata, de);
            de = next;
        }

        /* Iterate over indices in larger table that are the expansion
         * of the index pointed to by the cursor in the smaller table */
        // the cursors for the larger table are the expansions of the smaller table's cursor
        do {
            /* Emit entries at cursor */
            if (bucketfn) bucketfn(privdata, &t1->table[v & m1]);
            de = t1->table[v & m1]; // since m1 > m0, the entries of one small-table bucket are spread over (m1+1)/(m0+1) buckets here
            while (de) {
                next = de->next;
                fn(privdata, de);
                de = next;
            }

            /* Increment the reverse cursor not covered by the smaller mask.*/
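            // Growing spreads the entries of one bucket over several buckets, so all of those buckets have to be visited. For example,
            // going from 0011 to 00011, bucket 10011 must be visited as well; with a larger expansion,
            // say from 0011 to 000011, there are more of them: 100011, 010011 and 110011.
            // The number of additional buckets to visit is (m1+1)/(m0+1)-1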
            v |= ~m1;
            v = rev(v);
            v++;
            v = rev(v);
            /* Continue while bits covered by mask difference is non-zero */
        } while (v & (m0 ^ m1)); // v & (m0 ^ m1) stays non-zero while the (m1+1)/(m0+1)-1 additional expanded buckets are being visited
    }

    /* undo the ++ at the top */
    d->iterators--; // decrement the iterator count again
    return v;
}
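The two cursor updates in the function body can also be checked on bare indexes. This is only a sketch under assumptions: `rev_bits`, `next_cursor`, `next_cursor_shift`, `variants_agree` and `scan_expansions` are illustrative names, not dict.c functions. `next_cursor_shift` is the shift-based "forward" variant from the notes inside the function, and `scan_expansions` copies the do/while loop of the rehashing branch but records bucket indexes instead of calling callbacks.

```c
#include <assert.h>
#include <limits.h>

/* Reverse all bits of an unsigned long (the width must be a power of two). */
static unsigned long rev_bits(unsigned long v) {
    unsigned long s = CHAR_BIT * sizeof(v);
    unsigned long mask = ~0UL;
    while ((s >>= 1) > 0) {
        mask ^= (mask << s);
        v = ((v >> s) & mask) | ((v << s) & ~mask);
    }
    return v;
}

/* Cursor step written like the single-table branch of dictScan. */
static unsigned long next_cursor(unsigned long v, unsigned long mask) {
    v |= ~mask;
    v = rev_bits(v);
    v++;
    return rev_bits(v);
}

/* Shift-based variant: move the index to the top bits, reverse it into a
 * normal counter, add 1, reverse back, and move it down again. */
static unsigned long next_cursor_shift(unsigned long v, unsigned long mask) {
    int bits = 0, movebits;
    unsigned long m, k;

    assert(mask != 0);                            /* at least 2 buckets */
    for (m = mask; m != 0; m >>= 1) bits++;       /* number of bits in the mask */
    movebits = (int)(CHAR_BIT * sizeof(v)) - bits;
    k = rev_bits(v << movebits);                  /* reversed index as a plain number */
    k++;
    return rev_bits(k) >> movebits;
}

/* Returns 1 if both variants agree for every cursor of a table with this mask. */
static int variants_agree(unsigned long mask) {
    unsigned long v;
    for (v = 0; v <= mask; v++)
        if (next_cursor(v, mask) != next_cursor_shift(v, mask)) return 0;
    return 1;
}

/* Loop of the rehashing branch: record the large-table buckets visited for
 * cursor v (m0 = small mask, m1 = large mask) and return the next cursor. */
static unsigned long scan_expansions(unsigned long v, unsigned long m0,
                                     unsigned long m1, unsigned long *out,
                                     int cap, int *count) {
    int n = 0;
    do {
        if (n < cap) out[n] = v & m1;
        n++;
        v |= ~m1;
        v = rev_bits(v);
        v++;
        v = rev_bits(v);
    } while (v & (m0 ^ m1));   /* stop once the bits between the two masks wrap to 0 */
    *count = n;
    return v;
}
```

For the cursor 0011 with masks 15 and 63, `scan_expansions` records the buckets 000011, 100011, 010011 and 110011 (3, 35, 19, 51) and returns 1011, which is the same cursor a single table of size 16 would move to next.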


/* Finds the dictEntry reference by using pointer and pre-calculated hash.
 * oldkey is a dead pointer and should not be accessed.
 * the hash value should be provided using dictGetHash.
 * no string / key comparison is performed.
 * return value is the reference to the dictEntry if found, or NULL if not found. */
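// Finds the reference to an entry from a pointer and a precomputed hash. oldkey (the oldptr argument) is a dead pointer: it must not be dereferenced, only compared.
// The hash value passed in should come from dictGetHash.
// No string/key comparison is performed; only pointer values are compared.
// Returns a reference to the entry if it is found, or NULL if it is not.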
dictEntry **dictFindEntryRefByPtrAndHash(dict *d, const void *oldptr, uint64_t hash) {
    dictEntry *he, **heref;
    unsigned long idx, table;

    if (dictSize(d) == 0) return NULL; /* dict is empty */
    for (table = 0; table <= 1; table++) {
        idx = hash & d->ht[table].sizemask;
        heref = &d->ht[table].table[idx];
        he = *heref;
        while(he) {
            if (oldptr==he->key) // found: return the reference to the entry
                return heref;
            heref = &he->next;
            he = *heref;
        }
        if (!dictIsRehashing(d)) return NULL; // when not rehashing, only the first table needs to be searched
    }
    return NULL;
}
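Returning a `dictEntry **` rather than a `dictEntry *` means the caller gets the address of the link that points at the entry, so it can replace or unlink the entry without tracking the predecessor. A small sketch of that idiom on a bare chain follows; the `entry` type and the functions `find_ref_by_ptr` and `unlink_at` are hypothetical and only illustrate the pattern.

```c
#include <assert.h>
#include <stddef.h>

/* A chained entry, reduced to what the example needs. */
typedef struct entry {
    const void *key;
    struct entry *next;
} entry;

/* Return the address of the link that points at the entry whose key pointer
 * equals ptr, or NULL. Keys are compared by address and never dereferenced. */
static entry **find_ref_by_ptr(entry **bucket, const void *ptr) {
    entry **ref = bucket;
    while (*ref != NULL) {
        if ((*ref)->key == ptr) return ref;
        ref = &(*ref)->next;     /* step to the link inside the current entry */
    }
    return NULL;
}

/* Unlink the entry that a reference points at and return it. */
static entry *unlink_at(entry **ref) {
    entry *e;
    assert(ref != NULL && *ref != NULL);
    e = *ref;
    *ref = e->next;              /* works for the bucket head and for inner links alike */
    e->next = NULL;
    return e;
}
```

The same reference works whether the entry is the first in its bucket (the reference is the bucket slot itself) or further down the chain (the reference is the `next` field of the predecessor).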
posted on 2020-08-20 19:42 子虚乌有