Redis 6.0.5 dict reading notes 3: table expansion and incremental rehashing when adding elements

******************************************************************
In the previous note we added elements to the dict. New elements need somewhere to live,
so today we start by looking at how Redis expands the hash table.

/* Expand the hash table if needed */
static int _dictExpandIfNeeded(dict *d)
{
    /* Incremental rehashing already in progress. Return. */
    if (dictIsRehashing(d)) return DICT_OK; // a rehash is already in progress, so do not expand

    /* If the hash table is empty expand it to the initial size. */
    /* Note: an empty table is initialized to DICT_HT_INITIAL_SIZE (4) buckets. */
    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);

    /* If we reached the 1:1 ratio, and we are allowed to resize the hash
     * table (global setting) or we should avoid it but the ratio between
     * elements/buckets is over the "safe" threshold, we resize doubling
     * the number of buckets. */
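    /* Note: once elements and buckets reach a 1:1 ratio we double the table if
     * resizing is allowed. If resizing is not allowed, we still double once the
     * elements/buckets ratio exceeds the safety threshold
     * dict_force_resize_ratio (5). A bucket can hold several elements because
     * colliding entries are chained in a linked list. */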
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||  d->ht[0].used/d->ht[0].size > dict_force_resize_ratio))
    {
        return dictExpand(d, d->ht[0].used*2); // the new size is at least twice the number of stored elements
    }
    return DICT_OK;
}
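
The expansion test can be exercised in isolation. The following is a minimal sketch, not Redis code: should_expand is a hypothetical helper that takes the counters as plain arguments, and force_resize_ratio stands in for dict_force_resize_ratio with the value 5 quoted in the notes. One detail it makes visible is that used/size is an integer division, so with resizing disabled the forced expansion only triggers once used reaches 6 times size.

```c
#include <assert.h>

/* Hypothetical stand-in for dict_force_resize_ratio (5 in the notes). */
static const unsigned int force_resize_ratio = 5;

/* Returns 1 when a table with `used` elements and `size` buckets would be
 * doubled, mirroring the condition in _dictExpandIfNeeded. `can_resize`
 * plays the role of the global dict_can_resize flag. */
static int should_expand(unsigned long used, unsigned long size, int can_resize)
{
    assert(size > 0); /* the empty-table case is handled separately */
    return used >= size &&
           (can_resize || used / size > force_resize_ratio);
}
```

With can_resize set to 0 and 4 buckets, 23 elements still give 23/4 == 5, which is not greater than 5, so nothing happens until the 24th element.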
******************************************************************
/* Expand or create the hash table */
int dictExpand(dict *d, unsigned long size)
{
    /* the size is invalid if it is smaller than the number of
     * elements already inside the hash table */
    /* Note: fail if a rehash is in progress or if the table already holds
     * more elements than the requested size. */
    if (dictIsRehashing(d) || d->ht[0].used > size)
        return DICT_ERR;

    dictht n; /* the new hash table */
    unsigned long realsize = _dictNextPower(size); // round the size up to a power of two

    /* Rehashing to the same table size is not useful. */
    if (realsize == d->ht[0].size) return DICT_ERR;

    /* Allocate the new hash table and initialize all pointers to NULL */
    n.size = realsize;  // total number of buckets
    n.sizemask = realsize-1; // bit mask that turns a hash value into a bucket index
    n.table = zcalloc(realsize*sizeof(dictEntry*)); // one zeroed dictEntry* slot per bucket
    n.used = 0;

    /* Is this the first initialization? If so it's not really a rehashing
     * we just set the first hash table so that it can accept keys. */
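     /* Note: there are two cases here, first-time initialization and a real
      * rehash. For first-time initialization we simply install the new table
      * as ht[0] so that it can accept keys. */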
    if (d->ht[0].table == NULL) {
        d->ht[0] = n;
        return DICT_OK;
    }
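    // Otherwise this is a real rehash: install the new table as ht[1] for incremental rehashing
    /* Prepare a second hash table for incremental rehashing */
    d->ht[1] = n;
    d->rehashidx = 0; // rehashidx >= 0 marks a rehash in progress, starting at bucket 0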
    return DICT_OK;
}
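
Because the bucket count is always a power of two, sizemask = size-1 has all of its low bits set, so `hash & sizemask` picks a bucket without a division. A minimal sketch of that idea follows; bucket_index is a hypothetical helper written for illustration, not a Redis function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map a hash value to a bucket index for a table whose
 * size is a power of two. For such sizes, hash & (size - 1) == hash % size. */
static unsigned long bucket_index(uint64_t hash, unsigned long size)
{
    assert(size > 0 && (size & (size - 1)) == 0); /* power of two only */
    unsigned long sizemask = size - 1;            /* e.g. size 8 -> mask 0b111 */
    return (unsigned long)(hash & sizemask);
}
```

This is also why doubling the table is convenient: the new mask simply exposes one more bit of the same hash value.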

******************************************************************
/* Our hash table capability is a power of two */
static unsigned long _dictNextPower(unsigned long size)
{
    unsigned long i = DICT_HT_INITIAL_SIZE;
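    // LONG_MAX is the largest value of type long
    // LLONG_MAX is the largest value of type long long
    if (size >= LONG_MAX) return LONG_MAX + 1LU; // beyond the signed range, so the result is computed as unsigned
    while(1) {  // keep doubling until i >= size; i starts at 4 (2^2) and only doubles, so the result is always a power of two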
        if (i >= size)
            return i;
        i *= 2;
    }
}
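
The rounding rule is easy to check with a few values. The sketch below restates the loop as a standalone function; next_power and INITIAL_SIZE are illustrative names (INITIAL_SIZE uses the value 4 given for DICT_HT_INITIAL_SIZE in these notes). The early return is presumably there so that the doubling can never wrap around to 0 for very large requests, which is an interpretation rather than something stated in the source.

```c
#include <assert.h>
#include <limits.h>

#define INITIAL_SIZE 4 /* same value as DICT_HT_INITIAL_SIZE in the notes */

/* Smallest power of two that is >= size, and never below INITIAL_SIZE. */
static unsigned long next_power(unsigned long size)
{
    unsigned long i = INITIAL_SIZE;
    if (size >= (unsigned long)LONG_MAX) return (unsigned long)LONG_MAX + 1UL;
    while (i < size) i *= 2;       /* doubling keeps i a power of two */
    assert((i & (i - 1)) == 0);    /* postcondition: exactly one bit set */
    return i;
}
```

A request for 5 buckets is therefore served with 8, and a request for 8 stays at 8.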

******************************************************************
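Now let's look at how Redis performs incremental rehashing.
There are two entry points: one bounded by time, the other by a number of steps.
We start with the step-based one: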
******************************************************************
/* This function performs just a step of rehashing, and only if there are
 * no safe iterators bound to our hash table. When we have iterators in the
 * middle of a rehashing we can't mess with the two hash tables otherwise
 * some element can be missed or duplicated.
 *
 * This function is called by common lookup or update operations in the
 * dictionary so that the hash table automatically migrates from H1 to H2
 * while it is actively used. */
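/* Note: this is a single rehashing step. d->iterators counts the safe
 * iterators currently bound to the dict, so the step is skipped while any of
 * them is active; moving entries between the two tables during such an
 * iteration could cause elements to be missed or returned twice. */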
static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1); // dictRehash does the work; the argument 1 means migrate at most one bucket
}
******************************************************************
/* Performs N steps of incremental rehashing. Returns 1 if there are still
 * keys to move from the old to the new hash table, otherwise 0 is returned.
 *
 * Note that a rehashing step consists in moving a bucket (that may have more
 * than one key as we use chaining) from the old to the new hash table, however
 * since part of the hash table may be composed of empty spaces, it is not
 * guaranteed that this function will rehash even a single bucket, since it
 * will visit at max N*10 empty buckets in total, otherwise the amount of
 * work it does would be unbound and the function may block for a long time. */
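/* Note: this performs N steps of incremental rehashing. A return value of 1
 * means keys still have to be moved from the old table to the new one; 0
 * means the migration is complete.
 *
 * One step moves one whole bucket, which may hold several keys because
 * colliding entries are chained. Since at most N*10 empty buckets are visited
 * per call, and parts of the table may be empty, a call is not guaranteed to
 * move even one bucket. Without that cap, the scan for the next non-empty
 * bucket would be unbounded and the function could block for a long time. */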
int dictRehash(dict *d, int n) {
    int empty_visits = n*10; /* Max number of empty buckets to visit. */
    if (!dictIsRehashing(d)) return 0;  // not rehashing, nothing to do

    while(n-- && d->ht[0].used != 0) { // stop after n buckets or once every element has been moved
        dictEntry *de, *nextde;

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        /* Note: elements remain to be moved, so rehashidx must still be below
         * the old table size; anything else means the state is corrupted. */
        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        
        while(d->ht[0].table[d->rehashidx] == NULL) {
        // skip empty buckets until a non-empty one is found; give up after visiting n*10 empty buckets to bound the blocking time
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        de = d->ht[0].table[d->rehashidx]; // first entry of the non-empty bucket

        /* Move all the keys in this bucket from the old to the new hash HT */
        while(de) { // keep going until every entry in this bucket has been moved
            uint64_t h;

            nextde = de->next;  // save the next entry before relinking de
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask; // bucket index in the new table

            de->next = d->ht[1].table[h];  // chain the current head of the target bucket in ht[1] behind de
            d->ht[1].table[h] = de; // de becomes the new head: de -> existing chain

            d->ht[0].used--; // one element fewer in ht[0]
            d->ht[1].used++; // one element more in ht[1]
            de = nextde; // advance; the loop ends when the chain is exhausted
        }
        d->ht[0].table[d->rehashidx] = NULL;  // clear the old bucket
        d->rehashidx++; // one more bucket migrated
    }

    /* Check if we already rehashed the whole table... */
    if (d->ht[0].used == 0) {
        zfree(d->ht[0].table); // the old bucket array is no longer needed
        d->ht[0] = d->ht[1]; 
        // struct assignment is a shallow copy: only the table pointer is copied, not the bucket array
        _dictReset(&d->ht[1]); // reset ht[1] without freeing its table, which ht[0] now owns
        d->rehashidx = -1; // mark rehashing as finished
        return 0;  // migration complete
    }

    /* More to rehash... */
    return 1;
}
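
To watch the bucket-by-bucket migration on a small scale, here is a toy model of the same loop. It is a sketch under simplifying assumptions, not Redis code: keys are plain integers, the hash is the key itself, and toy_dict, table, entry, table_init, table_add and toy_rehash are all hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model: integer keys, identity hash, two tables of chained buckets. */
typedef struct entry { unsigned long key; struct entry *next; } entry;
typedef struct table { entry **buckets; unsigned long size, used; } table;
typedef struct toy_dict { table ht[2]; long rehashidx; } toy_dict;

static void table_init(table *t, unsigned long size)
{
    t->buckets = calloc(size, sizeof(entry *));
    assert(t->buckets != NULL);
    t->size = size;
    t->used = 0;
}

/* Insert at the head of the bucket chain; size must be a power of two. */
static void table_add(table *t, unsigned long key)
{
    entry *e = malloc(sizeof(*e));
    assert(e != NULL);
    unsigned long h = key & (t->size - 1);
    e->key = key;
    e->next = t->buckets[h];
    t->buckets[h] = e;
    t->used++;
}

/* Move up to n non-empty buckets from ht[0] to ht[1], giving up after
 * n*10 empty buckets. Returns 1 if work remains, 0 when finished. */
static int toy_rehash(toy_dict *d, int n)
{
    int empty_visits = n * 10;
    if (d->rehashidx == -1) return 0;
    while (n-- && d->ht[0].used != 0) {
        while (d->ht[0].buckets[d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        entry *e = d->ht[0].buckets[d->rehashidx];
        while (e) {                       /* move the whole chain */
            entry *next = e->next;
            unsigned long h = e->key & (d->ht[1].size - 1);
            e->next = d->ht[1].buckets[h];
            d->ht[1].buckets[h] = e;
            d->ht[0].used--;
            d->ht[1].used++;
            e = next;
        }
        d->ht[0].buckets[d->rehashidx++] = NULL;
    }
    if (d->ht[0].used == 0) {             /* done: ht[1] becomes ht[0] */
        free(d->ht[0].buckets);
        d->ht[0] = d->ht[1];
        d->ht[1] = (table){0};
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}
```

With keys 1, 5, 2 and 3 in a 4-bucket table and an 8-bucket target, the first single step skips empty bucket 0 and moves bucket 1 (keys 1 and 5), which land in buckets 1 and 5 of the new table. Two more single steps finish the migration.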
******************************************************************
Next, the time-based migration:

/* Rehash for an amount of time between ms milliseconds and ms+1 milliseconds */
/* Note: rehashing runs in batches until roughly ms milliseconds have elapsed. */
int dictRehashMilliseconds(dict *d, int ms) {
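    long long start = timeInMilliseconds(); // start time in milliseconds
    int rehashes = 0;

    while(dictRehash(d,100)) { // ask for 100 buckets per batch; fewer may move because the empty-bucket scan is capped
        rehashes += 100; // count 100 per batch, so this is an upper bound on buckets actually moved
        if (timeInMilliseconds()-start > ms) break; // time budget used up, stop
    }
    return rehashes; // total counted across batches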
}
This is the function that reads the current time:
long long timeInMilliseconds(void) {
    struct timeval tv; // declared in <sys/time.h>
    // struct timeval {
    //     __time_t tv_sec;        /* Seconds. */
    //     __suseconds_t tv_usec;  /* Microseconds. */
    // };
    gettimeofday(&tv,NULL); // read the current time
    return (((long long)tv.tv_sec)*1000)+(tv.tv_usec/1000); // convert seconds and microseconds to milliseconds
}
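
The time-budgeted loop can be tried without a real dict. In the sketch below, fake_rehash_batch and remaining_batches are hypothetical stand-ins for dictRehash(d,100) and the pending work, while now_ms and rehash_for_ms follow the two functions above. It shows one consequence of the counting scheme: the batch that finishes the job returns 0, so it is not added to the total.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Current wall-clock time in milliseconds, as in timeInMilliseconds. */
static long long now_ms(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return ((long long)tv.tv_sec) * 1000 + tv.tv_usec / 1000;
}

/* Hypothetical work source: number of batches still to be done. */
static int remaining_batches = 0;

/* Does one batch and returns 1 if more work remains afterwards, 0 if not. */
static int fake_rehash_batch(void)
{
    if (remaining_batches > 0) remaining_batches--;
    return remaining_batches > 0;
}

/* Run batches until the work is done or more than ms milliseconds passed. */
static int rehash_for_ms(int ms)
{
    assert(ms >= 0);
    long long start = now_ms();
    int rehashes = 0;
    while (fake_rehash_batch()) {
        rehashes += 100;                    /* counted per batch, not per bucket */
        if (now_ms() - start > ms) break;   /* budget exhausted */
    }
    return rehashes;
}
```

With four batches of pending work and a generous budget, the loop body runs three times and the function reports 300, even though all four batches were executed.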
posted on 2020-08-11 17:36 子虚乌有