Redis 6.0.5 evict reading notes: memory eviction policies

***************************************************************************************************************************
/* ----------------------------------------------------------------------------
 * Data structures  数据结构
 * --------------------------------------------------------------------------*/
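/* LRU: Least Recently Used. Keys are ranked by how long they have gone
 *      without being accessed.
 * LFU: Least Frequently Used. Keys are ranked by how often they have been
 *      accessed. */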
/* To improve the quality of the LRU approximation we take a set of keys
 * that are good candidate for eviction across freeMemoryIfNeeded() calls.
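 * (Note: to improve the LRU approximation, a pool of good eviction
 * candidates is kept across successive freeMemoryIfNeeded() calls.)
 * Entries inside the eviction pool are taken ordered by idle time, putting
 * greater idle times to the right (ascending order).
 * (Note: the pool is sorted by idle time in ascending order, so the entry
 * that has been idle the longest sits at the right end.)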
 * When an LFU policy is used instead, a reverse frequency indication is used
 * instead of the idle time, so that we still evict by larger value (larger
 * inverse frequency means to evict keys with the least frequent accesses).
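 * (Note: under LFU the score is an inverse frequency instead of an idle
 * time, so the largest value is still evicted first; a large inverse
 * frequency means the key is accessed least frequently.)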
 * Empty entries have the key pointer set to NULL. */
/* Note: an empty pool entry has its key pointer set to NULL. */
#define EVPOOL_SIZE 16
#define EVPOOL_CACHED_SDS_SIZE 255
struct evictionPoolEntry {      /* One entry of the eviction pool. */
    unsigned long long idle;    /* Object idle time (inverse frequency for LFU) */
    sds key;                    /* Key name. */
    sds cached;                 /* Cached SDS object for key name. */
    int dbid;                   /* Key DB number. */
};

static struct evictionPoolEntry *EvictionPoolLRU; /* points to the array of pool entries */

/* ----------------------------------------------------------------------------
 * Implementation of eviction, aging and LRU
 * --------------------------------------------------------------------------*/

/* Return the LRU clock, based on the clock resolution. This is a time
 * in a reduced-bits format that can be used to set and check the
 * object->lru field of redisObject structures. */
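/* Note: returns the LRU clock at the configured clock resolution. The value
 * is used to set and check the lru field of redisObject, a reduced-bits
 * (24-bit) timestamp. */
unsigned int getLRUClock(void) {
    /* Milliseconds since the epoch divided by the LRU clock resolution, then
     * masked with LRU_CLOCK_MAX: only the bits that fit the lru field are
     * kept, so higher bits are dropped and the clock wraps around. */
    return (mstime()/LRU_CLOCK_RESOLUTION) & LRU_CLOCK_MAX;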
}
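As a rough illustration of the masking in getLRUClock(), the sketch below reduces a millisecond timestamp to a 24-bit tick count. The SKETCH_* constants and sketch_lru_clock() are hypothetical stand-ins defined only for this example; the 24-bit width is the one mentioned for the lru field, while the 1000 ms resolution is an assumption about the default value of LRU_CLOCK_RESOLUTION, which this listing does not show.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; not taken from the listing. */
#define SKETCH_LRU_BITS 24
#define SKETCH_LRU_CLOCK_MAX ((1u << SKETCH_LRU_BITS) - 1)
#define SKETCH_LRU_CLOCK_RESOLUTION 1000 /* milliseconds per clock tick */

/* Hypothetical helper: reduce a millisecond timestamp to an LRU clock
 * value by dividing by the resolution and keeping the low 24 bits. */
unsigned int sketch_lru_clock(uint64_t now_ms) {
    uint64_t ticks = now_ms / SKETCH_LRU_CLOCK_RESOLUTION;
    unsigned int clock = (unsigned int)(ticks & SKETCH_LRU_CLOCK_MAX);
    assert(clock <= SKETCH_LRU_CLOCK_MAX);
    return clock;
}
```

Under these assumed values the clock wraps after 2^24 ticks of one second each, which is roughly 194 days.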

/* This function is used to obtain the current LRU clock.
 * If the current resolution is lower than the frequency we refresh the
 * LRU clock (as it should be in production servers) we return the
 * precomputed value, otherwise we need to resort to a system call. */
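/* Note: returns the current LRU clock. When the server refreshes
 * server.lruclock at least once per LRU clock tick (the expected case on
 * production servers) the precomputed value is returned; otherwise a system
 * call is needed. */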
unsigned int LRU_CLOCK(void) {
    unsigned int lruclock;
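    /* 1000/server.hz is the refresh period in milliseconds; if it is no
     * longer than one LRU clock tick, the cached value is fresh enough. */
    if (1000/server.hz <= LRU_CLOCK_RESOLUTION) {
        lruclock = server.lruclock; /* use the cached clock */
    } else {
        lruclock = getLRUClock(); /* otherwise compute it from the system time */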
    }
    return lruclock;
}
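The freshness test in LRU_CLOCK() can be tried in isolation. This is a minimal sketch: sketch_cached_clock_is_fresh() is a hypothetical helper, and both the hz values and the resolution values passed to it are example inputs, not settings read from a real server.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 when a clock cached once per refresh
 * period (1000/hz milliseconds, integer division as in the listing) is
 * at least as fine-grained as the LRU clock resolution, 0 otherwise. */
int sketch_cached_clock_is_fresh(int hz, int resolution_ms) {
    assert(hz > 0); /* avoid division by zero */
    return (1000 / hz <= resolution_ms) ? 1 : 0;
}
```

If the resolution is 1000 ms and hz is an integer of at least 1 (both assumptions here), the condition always holds, so the cached value would always be used.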

/* Given an object returns the min number of milliseconds the object was never
 * requested, using an approximated LRU algorithm. */
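/* Note: returns a lower bound, in milliseconds, on how long the object has
 * gone without being requested, based on the approximated LRU clock. */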
unsigned long long estimateObjectIdleTime(robj *o) {
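    unsigned long long lruclock = LRU_CLOCK(); /* current LRU clock value */
    if (lruclock >= o->lru) { /* the clock has not wrapped since the last access */
        return (lruclock - o->lru) * LRU_CLOCK_RESOLUTION; /* ticks to milliseconds */
    } else {
        /* The clock is behind the stored value, so it has wrapped. Assume it
         * wrapped exactly once and add the remainder of the period. */
        return (lruclock + (LRU_CLOCK_MAX - o->lru)) *
                    LRU_CLOCK_RESOLUTION;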
    }
}
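The two branches of estimateObjectIdleTime() can be exercised with plain integers. The sketch below is an illustration only: sketch_idle_ms() and the SKETCH_* constants are hypothetical, and the 24-bit maximum and 1000 ms resolution are assumed values.

```c
#include <assert.h>

/* Assumed values for illustration only. */
#define SKETCH_LRU_CLOCK_MAX ((1u << 24) - 1)
#define SKETCH_LRU_CLOCK_RESOLUTION 1000ULL /* milliseconds per tick */

/* Hypothetical helper: idle time in milliseconds given the current clock
 * and the clock value stored in the object, assuming at most one wrap. */
unsigned long long sketch_idle_ms(unsigned int now_clock, unsigned int obj_lru) {
    assert(now_clock <= SKETCH_LRU_CLOCK_MAX && obj_lru <= SKETCH_LRU_CLOCK_MAX);
    if (now_clock >= obj_lru) {
        /* No wrap: a plain difference. */
        return (unsigned long long)(now_clock - obj_lru) * SKETCH_LRU_CLOCK_RESOLUTION;
    }
    /* Wrapped: ticks left until the maximum plus ticks since zero. */
    return ((unsigned long long)now_clock + (SKETCH_LRU_CLOCK_MAX - obj_lru)) *
           SKETCH_LRU_CLOCK_RESOLUTION;
}
```

With a stored value of SKETCH_LRU_CLOCK_MAX - 10 and a current value of 5, the wrapped branch yields 15 ticks although 16 ticks have passed modulo 2^24, which is consistent with the result being described as a minimum.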

/* freeMemoryIfNeeded() gets called when 'maxmemory' is set on the config
 * file to limit the max memory used by the server, before processing a
 * command.
 * (Note: the call happens before each command is processed, to check
 * memory usage against the configured 'maxmemory' limit.)
 * The goal of the function is to free enough memory to keep Redis under the
 * configured memory limit.
 * (Note: the goal is to free enough memory to stay under that limit.)
 * The function starts calculating how many bytes should be freed to keep
 * Redis under the limit, and enters a loop selecting the best keys to
 * evict accordingly to the configured policy.
 * (Note: it first computes how many bytes must be freed, then loops,
 * picking the best key to evict under the configured policy on each pass.)
 * If all the bytes needed to return back under the limit were freed the
 * function returns C_OK, otherwise C_ERR is returned, and the caller
 * should block the execution of commands that will result in more memory
 * used by the server.
 * (Note: C_OK means enough bytes were freed. On C_ERR the caller should
 * refuse to run commands that would make the server use more memory.)
 * ------------------------------------------------------------------------
 *
 * LRU approximation algorithm
 *
 * Redis uses an approximation of the LRU algorithm that runs in constant
 * memory. Every time there is a key to expire, we sample N keys (with
 * N very small, usually in around 5) to populate a pool of best keys to
 * evict of M keys (the pool size is defined by EVPOOL_SIZE).
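 * (Note: the approximation runs in constant memory. Each time a key has to
 * be evicted, N keys are sampled (N is small, about 5) and used to populate
 * a pool of the M best eviction candidates, where M is EVPOOL_SIZE.)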
 * The N keys sampled are added in the pool of good keys to expire (the one
 * with an old access time) if they are better than one of the current keys
 * in the pool.
 * (Note: a sampled key enters the pool only if it is a better candidate,
 * meaning it has an older access time, than a key already in the pool.)
 * After the pool is populated, the best key we have in the pool is expired.
 * However note that we don't remove keys from the pool when they are deleted
 * so the pool may contain keys that no longer exist.
 * (Note: once the pool is populated, its best key is evicted. Deleted keys
 * are not removed from the pool, so it may hold keys that no longer exist.)
 * When we try to evict a key, and all the entries in the pool don't exist
 * we populate it again. This time we'll be sure that the pool has at least
 * one key that can be evicted, if there is at least one key that can be
 * evicted in the whole database. */
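/* Note: if none of the pool entries still exist when we try to evict, the
 * pool is populated again. After that it is guaranteed to hold at least one
 * evictable key, provided the dataset has at least one evictable key. */
/* Create a new eviction pool. */
/* Note: allocates the EVPOOL_SIZE entries and preallocates the cached SDS
 * buffer of each entry. */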
void evictionPoolAlloc(void) {
    struct evictionPoolEntry *ep; /* array of pool entries */
    int j;

    ep = zmalloc(sizeof(*ep)*EVPOOL_SIZE); /* space for all EVPOOL_SIZE entries */
    for (j = 0; j < EVPOOL_SIZE; j++) { /* initialize every entry */
        ep[j].idle = 0;
        ep[j].key = NULL;
        ep[j].cached = sdsnewlen(NULL,EVPOOL_CACHED_SDS_SIZE);
        ep[j].dbid = 0;
    }
    EvictionPoolLRU = ep; /* publish the pool through the file-level pointer */
}

/* This is an helper function for freeMemoryIfNeeded(), it is used in order
 * to populate the evictionPool with a few entries every time we want to
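 * expire a key. Keys with idle time greater than one of the current
 * keys are added. Keys are always added if there are free entries.
 * (Note: the original comment says "smaller" here, but the code below skips
 * a key only when the pool is full and the key's idle time does not exceed
 * that of the leftmost entry, so "greater" matches the behavior.)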
 * We insert keys on place in ascending order, so keys with the smaller
 * idle time are on the left, and keys with the higher idle time on the
 * right. */
/* Note: keys are inserted in place in ascending order: smaller idle times
 * on the left, larger idle times on the right. */
void evictionPoolPopulate(int dbid, dict *sampledict, dict *keydict, struct evictionPoolEntry *pool) {
    int j, k, count;
    dictEntry *samples[server.maxmemory_samples];

    /* Randomly sample entries from the dict: server.maxmemory_samples is
     * the number requested, count is the number actually returned. */
    count = dictGetSomeKeys(sampledict,samples,server.maxmemory_samples);
    for (j = 0; j < count; j++) {
        unsigned long long idle;
        sds key;
        robj *o;
        dictEntry *de;

        de = samples[j];
        key = dictGetKey(de);

        /* If the dictionary we are sampling from is not the main
         * dictionary (but the expires one) we need to lookup the key
         * again in the key dictionary to obtain the value object. */
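        /* Note: when sampling from the expires dict rather than the main
         * dict, the key has to be looked up again in the main dict to get
         * its value object. (volatile-* policies evict from the expires
         * dict, allkeys-* policies from the main dict.)
         * TTL means "time to live". volatile-ttl evicts the keys closest to
         * expiring and needs only the expires dict, so it skips this lookup;
         * every other pooled policy needs the value object. */
        if (server.maxmemory_policy != MAXMEMORY_VOLATILE_TTL) {
            if (sampledict != keydict) de = dictFind(keydict, key);
            o = dictGetVal(de); /* the value object */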
        }

        /* Calculate the idle time according to the policy. This is called
         * idle just because the code initially handled LRU, but is in fact
         * just a score where an higher score means better candidate. */
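        /* Note: the score is called "idle" only because the code originally
         * handled LRU alone; it is really a generic score where higher
         * means a better eviction candidate. */
        if (server.maxmemory_policy & MAXMEMORY_FLAG_LRU) { /* LRU policy */
            idle = estimateObjectIdleTime(o); /* time since the last access, in ms */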
        } else if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
            /* When we use an LRU policy, we sort the keys by idle time
             * so that we expire keys starting from greater idle time.
             * However when the policy is an LFU one, we have a frequency
             * estimation, and we want to evict keys with lower frequency
             * first. So inside the pool we put objects using the inverted
             * frequency subtracting the actual frequency to the maximum
             * frequency of 255. */
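            /* Note: LRU sorts by idle time and evicts the largest first. LFU
             * has a frequency estimate and must evict the least frequent
             * key first, so the pool stores 255 minus the actual frequency
             * (the inverted frequency). */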
            idle = 255-LFUDecrAndReturn(o);
        } else if (server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL) {
            /* In this case the sooner the expire the better. */
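            /* Note: same idea as above. Subtracting the expire time from
             * ULLONG_MAX inverts the order, so keys that expire later get
             * smaller scores and are evicted later. */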
            idle = ULLONG_MAX - (long)dictGetVal(de);
        } else {
            serverPanic("Unknown eviction policy in evictionPoolPopulate()");
        }

        /* Insert the element inside the pool.
         * First, find the first empty bucket or the first populated
         * bucket that has an idle time smaller than our idle time. */
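        /* Note: the loop stops at the first empty bucket, or at the first
         * populated bucket whose idle time is not smaller than ours. */
        k = 0;
        while (k < EVPOOL_SIZE &&
               pool[k].key &&
               pool[k].idle < idle) k++;
        if (k == 0 && pool[EVPOOL_SIZE-1].key != NULL) {
            /* Note: our element has the smallest idle time of all and the
             * pool is full, so it cannot be inserted. */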
            /* Can't insert if the element is < the worst element we have
             * and there are no empty buckets. */
            continue;
        } else if (k < EVPOOL_SIZE && pool[k].key == NULL) {
            /* Inserting into empty position. No setup needed before insert. */
        } else {
            /* Inserting in the middle. Now k points to the first element
             * greater than the element to insert.  */
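            /* Note: k now indexes the first entry whose idle time is not
             * smaller than ours, or equals EVPOOL_SIZE if every pooled entry
             * is smaller. */
            if (pool[EVPOOL_SIZE-1].key == NULL) { /* the pool is not full */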
                /* Free space on the right? Insert at k shifting
                 * all the elements from k to end to the right. */
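                /* Note: there is room on the right, so everything from k
                 * onward shifts one slot right to free slot k. */
                /* Save SDS before overwriting. */
                sds cached = pool[EVPOOL_SIZE-1].cached; /* keep the buffer of the slot about to be overwritten */
                memmove(pool+k+1,pool+k,
                    sizeof(pool[0])*(EVPOOL_SIZE-k-1));
                pool[k].cached = cached; /* reuse the saved buffer in slot k */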
            } else {
                /* No free space on right? Insert at k-1 */
                /* Note: k cannot be 0 here. This branch requires
                 * pool[EVPOOL_SIZE-1].key != NULL, and the case of k == 0
                 * with a full pool was already rejected above, so k >= 1. */
                k--;
                /* Shift all elements on the left of k (included) to the
                 * left, so we discard the element with smaller idle time. */
                sds cached = pool[0].cached; /* Save SDS before overwriting. */
                /* Note: if key does not alias the cached buffer, it was
                 * allocated separately and must be freed here, otherwise it
                 * would leak. */
                if (pool[0].key != pool[0].cached) sdsfree(pool[0].key);
                memmove(pool,pool+1,sizeof(pool[0])*k); /* shift left */
                pool[k].cached = cached; /* reuse the saved buffer */
            }
        }

        /* Try to reuse the cached SDS string allocated in the pool entry,
         * because allocating and deallocating this object is costly
         * (according to the profiler, not my fantasy. Remember:
         * premature optimizbla bla bla bla. */
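        /* Note: the cached buffer is reused because, according to the
         * profiler, allocating and freeing the key string on every insert
         * is costly. */
        int klen = sdslen(key);
        if (klen > EVPOOL_CACHED_SDS_SIZE) { /* too long for the cached buffer: allocate a new SDS */
            pool[k].key = sdsdup(key);
        } else {
            memcpy(pool[k].cached,key,klen+1); /* copy the key and its terminator into the buffer */
            sdssetlen(pool[k].cached,klen); /* set the SDS length */
            pool[k].key = pool[k].cached; /* key aliases the cached buffer */
        }
        pool[k].idle = idle; /* the score of this key */
        pool[k].dbid = dbid; /* the DB the key lives in */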
    }
}
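The insertion logic of evictionPoolPopulate() is easier to follow on a toy pool. The following is a simplified sketch, not the original code: it uses a pool of 4 entries, replaces SDS key names with small integers (-1 marks an empty slot), and drops the cached-buffer handling. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_POOL_SIZE 4 /* assumed small size for illustration */

struct sketch_entry {
    unsigned long long idle; /* score: higher means better candidate */
    int key;                 /* stand-in for the key name; -1 means empty */
};

void sketch_pool_init(struct sketch_entry *pool) {
    for (int j = 0; j < SKETCH_POOL_SIZE; j++) {
        pool[j].idle = 0;
        pool[j].key = -1;
    }
}

/* Insert keeping ascending order. Returns 1 if inserted, 0 if rejected. */
int sketch_pool_insert(struct sketch_entry *pool, int key, unsigned long long idle) {
    assert(key >= 0);
    int k = 0;
    /* Skip populated entries with a smaller score. */
    while (k < SKETCH_POOL_SIZE && pool[k].key != -1 && pool[k].idle < idle) k++;
    if (k == 0 && pool[SKETCH_POOL_SIZE-1].key != -1) {
        return 0; /* full pool, and not better than the smallest entry */
    } else if (k < SKETCH_POOL_SIZE && pool[k].key == -1) {
        /* Empty slot: nothing to move. */
    } else if (pool[SKETCH_POOL_SIZE-1].key == -1) {
        /* Room on the right: shift entries from k one slot right. */
        memmove(pool+k+1, pool+k, sizeof(pool[0])*(SKETCH_POOL_SIZE-k-1));
    } else {
        /* Full: drop the leftmost (smallest) entry and shift left. */
        k--;
        memmove(pool, pool+1, sizeof(pool[0])*k);
    }
    pool[k].key = key;
    pool[k].idle = idle;
    return 1;
}
```

Feeding scores 30, 10, 20, 40 into an empty pool leaves it sorted as 10, 20, 30, 40; a further score of 5 is rejected, while 25 evicts the leftmost entry and lands between 20 and 30.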

/* ----------------------------------------------------------------------------
 * LFU (Least Frequently Used) implementation.

 * We have 24 total bits of space in each object in order to implement
 * an LFU (Least Frequently Used) eviction policy, since we re-use the
 * LRU field for this purpose.
 * (Note: no extra space is needed; LFU reuses the same 24 bits of each
 * object that the LRU clock would otherwise occupy.)
 * We split the 24 bits into two fields:
 *
 *          16 bits      8 bits
 *     +----------------+--------+
 *     + Last decr time | LOG_C  |
 *     +----------------+--------+
 *
 * LOG_C is a logarithmic counter that provides an indication of the access
 * frequency. However this field must also be decremented otherwise what used
 * to be a frequently accessed key in the past, will remain ranked like that
 * forever, while we want the algorithm to adapt to access pattern changes.
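 * (Note: LOG_C is a logarithmic counter that indicates access frequency. It
 * has to decay over time, or a key that was hot in the past would stay
 * highly ranked forever and the algorithm could not adapt to changing
 * access patterns.)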
 * So the remaining 16 bits are used in order to store the "decrement time",
 * a reduced-precision Unix time (we take 16 bits of the time converted
 * in minutes since we don't care about wrapping around) where the LOG_C
 * counter is halved if it has an high value, or just decremented if it
 * has a low value.
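 * (Note: the other 16 bits hold the last decrement time, a reduced-precision
 * Unix time in minutes; wrap-around of the 16-bit value is tolerated. This
 * comment describes halving a high counter and decrementing a low one, but
 * the LFUDecrAndReturn() listing in these notes simply subtracts one per
 * elapsed decay period.)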
 * New keys don't start at zero, in order to have the ability to collect
 * some accesses before being trashed away, so they start at COUNTER_INIT_VAL.
 * The logarithmic increment performed on LOG_C takes care of COUNTER_INIT_VAL
 * when incrementing the key, so that keys starting at COUNTER_INIT_VAL
 * (or having a smaller value) have a very high chance of being incremented
 * on access.
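 * (Note: new keys start at COUNTER_INIT_VAL rather than zero so they can
 * collect some accesses before becoming eviction candidates. The logarithmic
 * increment takes that initial value into account, so a counter at or below
 * COUNTER_INIT_VAL is very likely to be incremented on access.)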
 * During decrement, the value of the logarithmic counter is halved if
 * its current value is greater than two times the COUNTER_INIT_VAL, otherwise
 * it is just decremented by one.
 * (Note: as described here, decay halves the counter when it is greater
 * than twice COUNTER_INIT_VAL and otherwise decrements it by one.)
 * --------------------------------------------------------------------------*/
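The 16 + 8 bit layout described in the comment above can be sketched with a few shifts and masks. The helper names are hypothetical; only the field widths come from the comment, and the unpacking mirrors the `>> 8` and `& 255` operations that appear in LFUDecrAndReturn().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack a 16-bit minute timestamp (high bits) and an
 * 8-bit logarithmic counter (low bits) into a 24-bit value. */
uint32_t sketch_lfu_pack(uint16_t ldt_minutes, uint8_t counter) {
    return ((uint32_t)ldt_minutes << 8) | counter;
}

/* Hypothetical helper: extract the last decrement time (high 16 bits). */
uint16_t sketch_lfu_ldt(uint32_t lru) {
    assert(lru <= 0xFFFFFFu); /* the field is only 24 bits wide */
    return (uint16_t)(lru >> 8);
}

/* Hypothetical helper: extract the logarithmic counter (low 8 bits). */
uint8_t sketch_lfu_counter(uint32_t lru) {
    return (uint8_t)(lru & 255);
}
```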

/* Return the current time in minutes, just taking the least significant
 * 16 bits. The returned time is suitable to be stored as LDT (last decrement
 * time) for the LFU implementation. */
/* Note: returns the current time in minutes, keeping only the low 16 bits,
 * which is the format stored as LDT (last decrement time). */
unsigned long LFUGetTimeInMinutes(void) {
    return (server.unixtime/60) & 65535; 
}

/* Given an object last access time, compute the minimum number of minutes
 * that elapsed since the last access. Handle overflow (ldt greater than
 * the current 16 bits minutes time) considering the time as wrapping
 * exactly once. */
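/* Note: computes the minimum number of minutes elapsed since the last
 * access. If ldt is greater than the current 16-bit minute value the clock
 * has wrapped; the number of wraps is unknown, so exactly one is assumed. */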
unsigned long LFUTimeElapsed(unsigned long ldt) {
    unsigned long now = LFUGetTimeInMinutes();
    if (now >= ldt) return now-ldt; /* no wrap-around */
    return 65535-ldt+now; /* wrapped an unknown number of times; count it as one */
}
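A small sketch makes the 16-bit minute clock and its wrap handling concrete. The functions below are hypothetical stand-ins that take the time as a parameter instead of reading server.unixtime, so they can be checked with fixed inputs.

```c
#include <assert.h>

/* Hypothetical helper: Unix seconds to minutes, keeping the low 16 bits. */
unsigned long sketch_minutes16(unsigned long unix_seconds) {
    return (unix_seconds / 60) & 65535;
}

/* Hypothetical helper: minutes elapsed between ldt and now, treating
 * now < ldt as exactly one wrap of the 16-bit minute clock. */
unsigned long sketch_elapsed(unsigned long now, unsigned long ldt) {
    assert(now <= 65535 && ldt <= 65535);
    if (now >= ldt) return now - ldt;
    return 65535 - ldt + now;
}
```

A 16-bit minute counter wraps every 65536 minutes, about 45.5 days. For ldt = 65530 and now = 3 the wrapped branch returns 8, one less than the 9 minutes that have passed modulo 65536, so the result errs on the low side.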

/* Logarithmically increment a counter. The greater is the current counter value
 * the less likely is that it gets really incremented. Saturate it at 255. */
/* Note: the larger the counter already is, the less likely an access is to
 * increment it; the counter saturates at 255. */
uint8_t LFULogIncr(uint8_t counter) {
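    if (counter == 255) return 255; /* already saturated */
    double r = (double)rand()/RAND_MAX; /* random value in [0,1] */
    double baseval = counter - LFU_INIT_VAL; /* distance above the initial value */
    if (baseval < 0) baseval = 0; /* at or below the initial value p is 1, so an increment is almost certain */
    /* server.lfu_log_factor is initialized to 10. The probability of an
     * increment, p = 1/(baseval*lfu_log_factor+1), shrinks as the counter
     * grows. */
    double p = 1.0/(baseval*server.lfu_log_factor+1);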
    if (r < p) counter++;
    return counter;
}
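To get a feel for how quickly the increment probability in LFULogIncr() falls off, the sketch below computes p without the random draw. sketch_incr_probability() is a hypothetical helper; the initial value of 5 used in the examples is an assumption, and the factor of 10 is the value mentioned in the notes above.

```c
#include <assert.h>

/* Hypothetical helper: probability that one access increments the counter,
 * p = 1 / (max(counter - init_val, 0) * log_factor + 1). */
double sketch_incr_probability(unsigned char counter, int init_val, int log_factor) {
    assert(log_factor >= 0);
    double baseval = (double)counter - init_val;
    if (baseval < 0) baseval = 0;
    return 1.0 / (baseval * log_factor + 1);
}
```

With these example parameters a counter at the initial value is incremented with probability 1, one step above it the probability is 1/11, and at counter 105 it is 1/1001, so on average about a thousand accesses are needed for the next increment.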

/* If the object decrement time is reached decrement the LFU counter but
 * do not update LFU fields of the object, we update the access time
 * and counter in an explicit way when the object is really accessed.
 * And we will times halve the counter according to the times of
 * elapsed time than server.lfu_decay_time.
 * Return the object frequency counter.
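 * (Note: when the decay time has been reached the counter is lowered, but
 * the LFU fields of the object are not written back; the access time and
 * counter are updated explicitly only when the object is really accessed.
 * In the code below the counter drops by one for each elapsed period of
 * server.lfu_decay_time minutes. The function returns the resulting
 * frequency counter.)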
 * This function is used in order to scan the dataset for the best object
 * to fit: as we check for the candidate, we incrementally decrement the
 * counter of the scanned objects if needed. */
/* Note: used while scanning for the best eviction candidate; the counter of
 * each examined object is decayed on the fly as needed. */
unsigned long LFUDecrAndReturn(robj *o) {
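    unsigned long ldt = o->lru >> 8; /* high 16 bits: last decrement time */
    unsigned long counter = o->lru & 255; /* low 8 bits: logarithmic counter */
    /* Decay applies only when server.lfu_decay_time is non-zero. (The decay
     * is fairly fast, which helps explain the non-zero initial value for
     * new keys: without it they would be evicted very soon.) */
    unsigned long num_periods = server.lfu_decay_time ? LFUTimeElapsed(ldt) / server.lfu_decay_time : 0;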
    if (num_periods)
        counter = (num_periods > counter) ? 0 : counter - num_periods;
    return counter;
}
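The decay arithmetic of LFUDecrAndReturn(), and the inverted score that evictionPoolPopulate() derives from it, can be checked with fixed numbers. This is a minimal sketch with hypothetical helpers that take the elapsed minutes and the decay time as parameters instead of reading them from the object and the server configuration.

```c
#include <assert.h>

/* Hypothetical helper: subtract one from the counter per elapsed decay
 * period, never going below zero. A decay_time of 0 disables decay. */
unsigned long sketch_decay(unsigned long counter, unsigned long elapsed_minutes,
                           unsigned long decay_time) {
    assert(counter <= 255);
    unsigned long periods = decay_time ? elapsed_minutes / decay_time : 0;
    if (periods)
        counter = (periods > counter) ? 0 : counter - periods;
    return counter;
}

/* Hypothetical helper: the pool score used for LFU, where a lower
 * frequency gives a higher score. */
unsigned long sketch_lfu_score(unsigned long decayed_counter) {
    return 255 - decayed_counter;
}
```

For example, a counter of 10 that has not been touched for 3 minutes with a decay time of 1 minute decays to 7 and gets a pool score of 248.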

/* ----------------------------------------------------------------------------
 * The external API for eviction: freeMemoryIfNeeded() is called by the
 * server when there is data to add in order to make space if needed.
 * (Note: this is the eviction entry point used by the rest of the server.)
 * --------------------------------------------------------------------------*/

/* We don't want to count AOF buffers and slaves output buffers as
 * used memory: the eviction should use mostly data size. This function
 * returns the sum of AOF and slaves buffer. */
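/* Note: AOF buffers and replica output buffers are excluded from the memory
 * that counts against maxmemory, because eviction should be driven mainly
 * by data size. This function returns the combined size of those buffers. */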
size_t freeMemoryGetNotCountedMemory(void) {
    size_t overhead = 0;
    int slaves = listLength(server.slaves); /* number of replicas */

    if (slaves) {
        listIter li;
        listNode *ln;

        listRewind(server.slaves,&li); /* start at the head of the list */
        while((ln = listNext(&li))) { /* sum the output buffer usage of every replica */
            client *slave = listNodeValue(ln); 
            overhead += getClientOutputBufferMemoryUsage(slave); 
        }
    }
    if (server.aof_state != AOF_OFF) { /* AOF enabled: add the AOF buffers too */
        overhead += sdsalloc(server.aof_buf)+aofRewriteBufferSize();
    }
    return overhead;
}

/* Get the memory status from the point of view of the maxmemory directive:
 * if the memory used is under the maxmemory setting then C_OK is returned.
 * Otherwise, if we are over the memory limit, the function returns
 * C_ERR.
 * (Note: the state is judged against the maxmemory directive: C_OK when
 * usage is within the limit, C_ERR when it is over.)
 * The function may return additional info via reference, only if the
 * pointers to the respective arguments is not NULL. Certain fields are
 * populated only when C_ERR is returned:
 * (Note: extra information is returned through the pointer arguments, and
 * only for pointers that are not NULL. Some fields are filled in only when
 * C_ERR is returned.)
 *  'total'     total amount of bytes used.
 *              (Populated both for C_ERR and C_OK)
 *
 *  'logical'   the amount of memory used minus the slaves/AOF buffers.
 *              (Populated when C_ERR is returned)
 *
 *  'tofree'    the amount of memory that should be released
 *              in order to return back into the memory limits.
 *              (Populated when C_ERR is returned)
 *
 *  'level'     this usually ranges from 0 to 1, and reports the amount of
 *              memory currently used. May be > 1 if we are over the memory
 *              limit.
 *              (Populated both for C_ERR and C_OK)
 */
int getMaxmemoryState(size_t *total, size_t *logical, size_t *tofree, float *level) {
    size_t mem_reported, mem_used, mem_tofree;

    /* Check if we are over the memory usage limit. If we are not, no need
     * to subtract the slaves output buffers. We can just return ASAP. */
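    /* Note: if total usage is within the limit there is no need to subtract
     * the replica and AOF buffers, so the function can return early. */
    mem_reported = zmalloc_used_memory(); /* total memory in use */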
    if (total) *total = mem_reported;

    /* We may return ASAP if there is no need to compute the level. */
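    /* Note: the early return is possible only when the caller did not ask
     * for the usage ratio. */
    int return_ok_asap = !server.maxmemory || mem_reported <= server.maxmemory; /* no limit set, or usage within the limit */
    if (return_ok_asap && !level) return C_OK; /* !level: no ratio requested */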

    /* Remove the size of slaves output buffers and AOF buffer from the
     * count of used memory. */
    /* Note: mem_used is total usage minus the replica output buffers and
     * the AOF buffers. */
    mem_used = mem_reported;
    size_t overhead = freeMemoryGetNotCountedMemory();
    mem_used = (mem_used > overhead) ? mem_used-overhead : 0;

    /* Compute the ratio of memory usage. */
    if (level) {
        if (!server.maxmemory) { /* no limit: report a ratio of 0 */
            *level = 0;
        } else {
            *level = (float)mem_used / (float)server.maxmemory;
        }
    }

    if (return_ok_asap) return C_OK;

    /* Check if we are still over the memory limit. */
    /* Note: the first check used total memory; this one excludes the
     * replica and AOF buffers. */
    if (mem_used <= server.maxmemory) return C_OK;

    /* Compute how much memory we need to free. */
    mem_tofree = mem_used - server.maxmemory;

    if (logical) *logical = mem_used; /* memory counted against the limit */
    if (tofree) *tofree = mem_tofree; /* bytes to free to get back under the limit */

    return C_ERR; 
}
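The decision logic of getMaxmemoryState() can be condensed into a pure function for experimentation. The sketch below is a simplification under stated assumptions: the memory figures are passed in as parameters rather than read from the allocator and the server, the 'total' and 'logical' outputs are left out, and SKETCH_OK / SKETCH_ERR are hypothetical stand-ins for C_OK / C_ERR.

```c
#include <assert.h>
#include <stddef.h>

enum { SKETCH_OK = 0, SKETCH_ERR = -1 }; /* stand-ins for C_OK / C_ERR */

/* Hypothetical helper: 'reported' is total memory in use, 'overhead' is the
 * replica/AOF buffer memory that is not counted, 'maxmemory' is the limit
 * (0 means no limit). Output pointers may be NULL. */
int sketch_maxmemory_state(size_t reported, size_t overhead, size_t maxmemory,
                           size_t *tofree, float *level) {
    size_t used = (reported > overhead) ? reported - overhead : 0;

    /* The usage ratio is reported for both outcomes. */
    if (level) *level = maxmemory ? (float)used / (float)maxmemory : 0;

    /* First check: total memory, before subtracting the buffers. */
    if (!maxmemory || reported <= maxmemory) return SKETCH_OK;
    /* Second check: memory that actually counts against the limit. */
    if (used <= maxmemory) return SKETCH_OK;

    /* Over the limit: report how many bytes must be freed. */
    if (tofree) *tofree = used - maxmemory;
    return SKETCH_ERR;
}
```

With a limit of 1000 bytes, 1200 bytes reported and 300 bytes of buffers, the counted usage is 900 and the result is SKETCH_OK; with 1500 reported and 200 of buffers the result is SKETCH_ERR with 300 bytes to free.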

/* This function is periodically called to see if there is memory to free
 * according to the current "maxmemory" settings. In case we are over the
 * memory limit, the function will try to free some memory to return back
 * under the limit.
 * (Note: the function checks the current "maxmemory" setting and, when the
 * server is over the limit, tries to free memory to get back under it.)
 * The function returns C_OK if we are under the memory limit or if we
 * were over the limit, but the attempt to free memory was successful.
 * Otherwise if we are over the memory limit, but not enough memory
 * was freed to return back under the limit, the function returns C_ERR. */
/* Note: C_OK covers two cases, already under the limit or brought back
 * under it by eviction; C_ERR means eviction could not free enough. */
int freeMemoryIfNeeded(void) {
    int keys_freed = 0;
    /* By default replicas should ignore maxmemory
     * and just be masters exact copies. */
    /* Note: with server.repl_slave_ignore_maxmemory set, a replica skips
     * eviction and just mirrors its master. */
    if (server.masterhost && server.repl_slave_ignore_maxmemory) return C_OK;

    size_t mem_reported, mem_tofree, mem_freed;
    mstime_t latency, eviction_latency, lazyfree_latency;
    long long delta;
    int slaves = listLength(server.slaves); /* number of replicas */
    int result = C_ERR;

    /* When clients are paused the dataset should be static not just from the
     * POV of clients not being able to write, but also from the POV of
     * expires and evictions of keys not being performed. */
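    /* Note: while clients are paused the dataset must stay unchanged, so
     * neither writes nor expirations or evictions are performed. */
    if (clientsArePaused()) return C_OK; /* clients paused: do nothing */
    if (getMaxmemoryState(&mem_reported,NULL,&mem_tofree,NULL) == C_OK) /* within the limit */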
        return C_OK;

    mem_freed = 0;

    latencyStartMonitor(latency); /* start the latency monitor */
    if (server.maxmemory_policy == MAXMEMORY_NO_EVICTION) /* noeviction policy */
        goto cant_free; /* We need to free memory, but policy forbids. */

    while (mem_freed < mem_tofree) { /* keep evicting until enough memory has been freed */
        int j, k, i;
        static unsigned int next_db = 0;
        sds bestkey = NULL;
        int bestdbid;
        redisDb *db;
        dict *dict;
        dictEntry *de;
        /* Pooled policies: any LRU or LFU variant, or volatile-ttl. */
        if (server.maxmemory_policy & (MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_LFU) ||
            server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL) 
        {
            struct evictionPoolEntry *pool = EvictionPoolLRU;

            while(bestkey == NULL) {
                unsigned long total_keys = 0, keys;

                /* We don't want to make local-db choices when expiring keys,
                 * so to start populate the eviction pool sampling keys from
                 * every DB. */
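                /* Note: eviction is global rather than per DB, so keys are
                 * sampled from every DB to fill the pool. */
                for (i = 0; i < server.dbnum; i++) {
                    db = server.db+i;
                    /* allkeys-* policies sample the main dict, volatile-*
                     * policies the expires dict. */
                    dict = (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) ?
                            db->dict : db->expires;
                    if ((keys = dictSize(dict)) != 0) { /* non-empty: sample into the pool */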
                        evictionPoolPopulate(i, dict, db->dict, pool);
                        total_keys += keys;
                    }
                }
                if (!total_keys) break; /* No keys to evict. */

                /* Go backward from best to worst element to evict. */
                /* Note: the pool is ascending, so the rightmost entry has
                 * the highest score (longest idle time) and is tried first. */
                for (k = EVPOOL_SIZE-1; k >= 0; k--) {
                    if (pool[k].key == NULL) continue;
                    bestdbid = pool[k].dbid;

                    if (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) { /* allkeys-* policy */
                        de = dictFind(server.db[pool[k].dbid].dict,
                            pool[k].key); /* look up in the main dict */
                    } else {
                        de = dictFind(server.db[pool[k].dbid].expires,
                            pool[k].key); /* look up in the expires dict */
                    }

                    /* Remove the entry from the pool. */
                    if (pool[k].key != pool[k].cached) /* the cached buffer itself is kept for reuse */
                        sdsfree(pool[k].key);
                    pool[k].key = NULL;
                    pool[k].idle = 0;

                    /* If the key exists, is our pick. Otherwise it is
                     * a ghost and we need to try the next element. */
                    /* Note: a "ghost" is a pool entry whose key is no longer
                     * present in the dict. */
                    if (de) {
                        bestkey = dictGetKey(de);
                        break;
                    } else {
                        /* Ghost... Iterate again. */
                    }
                }
            }
        }

        /* volatile-random and allkeys-random policy */
        /* (a random key from the expires dict, or a random key from the
         * main dict) */
        else if (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM ||
                 server.maxmemory_policy == MAXMEMORY_VOLATILE_RANDOM)
        {
            /* When evicting a random key, we try to evict a key for
             * each DB, so we use the static 'next_db' variable to
             * incrementally visit all DBs. */
            /* Note: the static next_db makes successive evictions rotate
             * through all the DBs. */
            for (i = 0; i < server.dbnum; i++) {
                j = (++next_db) % server.dbnum;
                db = server.db+j;
                dict = (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM) ? /* main dict or expires dict */
                        db->dict : db->expires;
                if (dictSize(dict) != 0) { /* the dict has keys */
                    de = dictGetRandomKey(dict); /* pick one at random */
                    bestkey = dictGetKey(de);
                    bestdbid = j;
                    break;
                }
            }
        }

        /* Finally remove the selected key. */
        if (bestkey) {
            db = server.db+bestdbid;
            robj *keyobj = createStringObject(bestkey,sdslen(bestkey)); /* key object used for the delete */
            propagateExpire(db,keyobj,server.lazyfree_lazy_eviction); /* propagate the deletion to replicas and the AOF */
            /* We compute the amount of memory freed by db*Delete() alone.
             * It is possible that actually the memory needed to propagate
             * the DEL in AOF and replication link is greater than the one
             * we are freeing removing the key, but we can't account for
             * that otherwise we would never exit the loop.
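             * (Note: only the memory released by db*Delete() itself is
             * measured. Propagating the DEL to the AOF and the replication
             * link may use more memory than the delete frees, but counting
             * that could keep the loop from ever terminating.)
             * AOF and Output buffer memory will be freed eventually so
             * we only care about memory used by the key space. */
            /* Note: AOF and output buffer memory is temporary, so only the
             * memory used by the keyspace is counted. */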
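            delta = (long long) zmalloc_used_memory(); /* memory in use before the delete */
            latencyStartMonitor(eviction_latency); /* time the delete */
            if (server.lazyfree_lazy_eviction) /* lazy eviction enabled */
                dbAsyncDelete(db,keyobj); /* delete asynchronously */
            else
                dbSyncDelete(db,keyobj); /* delete synchronously */
            signalModifiedKey(NULL,db,keyobj); /* tell interested parties that the key was modified */
            latencyEndMonitor(eviction_latency); /* stop timing */
            latencyAddSampleIfNeeded("eviction-del",eviction_latency); /* record the latency sample if needed */
            delta -= (long long) zmalloc_used_memory(); /* subtract memory in use after the delete */
            mem_freed += delta; /* accumulate the bytes freed */
            server.stat_evictedkeys++; /* one more evicted key */
            notifyKeyspaceEvent(NOTIFY_EVICTED, "evicted",
                keyobj, db->id); /* emit the eviction event */
            decrRefCount(keyobj); /* drop the reference to the temporary key object */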
            keys_freed++;

            /* When the memory to free starts to be big enough, we may
             * start spending so much time here that is impossible to
             * deliver data to the slaves fast enough, so we force the
             * transmission here inside the loop. */
            /* Note: flushing inside the loop sends data to the replicas a
             * little at a time, instead of letting a large amount pile up
             * during a long eviction run. */
            if (slaves) flushSlavesOutputBuffers();

            /* Normally our stop condition is the ability to release
             * a fixed, pre-computed amount of memory. However when we
             * are deleting objects in another thread, it's better to
             * check, from time to time, if we already reached our target
             * memory, since the "mem_freed" amount is computed only
             * across the dbAsyncDelete() call, while the thread can
             * release the memory all the time. */
            /* Note: mem_freed only measures the dbAsyncDelete() call, while
             * the background thread keeps freeing memory, so with lazy
             * eviction the real state is rechecked every 16 evicted keys. */
            if (server.lazyfree_lazy_eviction && !(keys_freed % 16)) {
                if (getMaxmemoryState(NULL,NULL,NULL,NULL) == C_OK) { /* target reached */
                    /* Let's satisfy our stop condition. */
                    mem_freed = mem_tofree;
                }
            }
        } else {
            goto cant_free; /* nothing to free... */
        }
    }
    result = C_OK;

cant_free:
    /* We are here if we are not able to reclaim memory. There is only one
     * last thing we can try: check if the lazyfree thread has jobs in queue
     * and wait... */
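    /* Note: as a last resort, wait for pending lazy-free jobs and recheck
     * the memory state. */
    if (result != C_OK) {
        latencyStartMonitor(lazyfree_latency); /* start timing the wait */
        while(bioPendingJobsOfType(BIO_LAZY_FREE)) { /* loop while lazy-free jobs are pending, until memory is back under the limit */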
            if (getMaxmemoryState(NULL,NULL,NULL,NULL) == C_OK) {
                result = C_OK;
                break;
            }
            usleep(1000);
        }
        latencyEndMonitor(lazyfree_latency); /* stop timing */
        latencyAddSampleIfNeeded("eviction-lazyfree",lazyfree_latency); /* record the sample if it exceeds the configured threshold */
    }
    latencyEndMonitor(latency);
    latencyAddSampleIfNeeded("eviction-cycle",latency);
    return result;
}
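The right-to-left pool scan inside freeMemoryIfNeeded(), including the handling of "ghost" entries, can be modeled on a toy pool. The sketch below is an illustration only: keys are small integers used as indexes into an exists[] array that simulates the dataset, -1 marks an empty slot, and all sketch_* names are hypothetical.

```c
#include <assert.h>

#define SKETCH_SCAN_SIZE 4 /* assumed small pool size for illustration */

struct sketch_slot {
    int key;                 /* stand-in for the key name; -1 means empty */
    unsigned long long idle; /* score; the pool is sorted ascending */
};

/* Hypothetical helper: walk the pool from the best candidate (rightmost)
 * to the worst, clearing each visited slot. Returns the first key that
 * still exists, or -1 if every populated slot held a ghost. */
int sketch_pick_victim(struct sketch_slot *pool, const int *exists) {
    for (int k = SKETCH_SCAN_SIZE-1; k >= 0; k--) {
        if (pool[k].key == -1) continue;
        int key = pool[k].key;
        assert(key >= 0);
        /* The slot is cleared whether or not the key still exists. */
        pool[k].key = -1;
        pool[k].idle = 0;
        if (exists[key]) return key;
        /* Ghost: the key is gone, so try the next slot to the left. */
    }
    return -1;
}
```

In the real loop a result equivalent to -1 leads to another round of sampling, as long as the dicts being sampled still contain keys.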

/* This is a wrapper for freeMemoryIfNeeded() that only really calls the
 * function if right now there are the conditions to do so safely:
 * (Note: the wrapper only calls freeMemoryIfNeeded() when both conditions
 * below hold; otherwise it returns C_OK without evicting.)
 * - There must be no script in timeout condition.
 * - Nor are we loading data right now.
 *
 */
int freeMemoryIfNeededAndSafe(void) {
    if (server.lua_timedout || server.loading) return C_OK;
    return freeMemoryIfNeeded();
}
***************************************************************************************************************************
Posted on 2021-09-02 18:03 by 子虚乌有