Redis Persistence And Key Eviction
Persistence
To disable persistence, also delete the existing dump.rdb first; otherwise it will be loaded at the next startup.
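A minimal sketch of doing this at runtime, assuming the dump file sits in the configured working directory (dir):
redis-cli config set save ""   # clear all save conditions, disabling automatic RDB snapshots
rm dump.rdb                    # remove the old dump so it is not loaded on the next startup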
Commands
# Save an RDB snapshot synchronously; blocks the Redis process
save
# Save an RDB snapshot in the background: Redis fork()s a child process that does the persistence work
bgsave
# Get the Unix timestamp of the last successful RDB save
LASTSAVE
# Save an RDB snapshot while shutting down the server
shutdown save
# Trigger an AOF rewrite (keeps the AOF file from growing too large)
BGREWRITEAOF
# Inspect persistence status
INFO Persistence
# Save and reload the dataset without stopping the server
debug reload
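A common pattern is to trigger a background save and poll LASTSAVE until the timestamp advances; a sketch with illustrative output values:
lastsave
# (integer) 1700000000   <- timestamp of the last successful save
bgsave
# Background saving started
lastsave
# (integer) 1700000042   <- a larger value means the background save completed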
RDB
The RDB file stores a binary snapshot of the dataset.
Configuration
# Sync data to disk (via bgsave, which does not block the main process) after the given number of
# changes within the given time window; multiple conditions can be combined.
# The lines below sync when: 1 change in 900 s, 10 changes in 300 s, or 10000 changes in 60 s.
# Comment out all save lines to disable snapshotting entirely.
save 900 1
save 300 10
save 60 10000
# If RDB snapshots are enabled and the latest background save failed, Redis stops accepting all writes by default
stop-writes-on-bgsave-error yes
# Compress the dump with LZF (default yes); disabling saves CPU time but produces a much larger file
rdbcompression yes
# Append a CRC64 checksum to the snapshot; costs roughly 10% performance, disable it for maximum speed
rdbchecksum yes
# Delete RDB files used only for replication sync (safe removal of old RDBs)
rdb-del-sync-files no
# Dump file name, default dump.rdb
dbfilename dump.rdb
# Working directory: where the file named by dbfilename is stored; this must be a directory, not a file name
dir ./
Workflow:
- A save condition is met, or SAVE/BGSAVE is executed
- The main process forks a child process (copy-on-write)
- The child writes the in-memory dataset to a temporary RDB file
- When the write completes, the temporary file replaces the old RDB file
- On failure, the error is logged (and writes are rejected if stop-writes-on-bgsave-error is set)
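The fork/temp-file sequence above can be observed through INFO Persistence; a sample session (field values illustrative):
bgsave
# Background saving started
info persistence
# rdb_bgsave_in_progress:1    <- the forked child is still writing the temporary file
# rdb_last_bgsave_status:ok   <- result of the last completed background save
# rdb_last_save_time:1700000042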
AOF
The AOF file stores the write commands themselves.
Configuration
# Enable AOF persistence; disabled by default
appendonly yes | no
# AOF fsync policy
# always: fsync after every write; zero data loss but low performance, not recommended
# everysec: fsync the buffer once per second (the default and recommended setting); good accuracy and
#           performance; at most 1 second of data is lost on a sudden crash
# no: let the operating system decide when to fsync; the timing is not under Redis's control
appendfsync always | everysec | no
# Write timestamp annotations into the AOF
aof-timestamp-enabled no
# Growth percentage that triggers an automatic rewrite (0 disables automatic AOF rewrites)
auto-aof-rewrite-percentage 100
# Minimum file size for an automatic rewrite (files under 64mb are never auto-rewritten)
auto-aof-rewrite-min-size 64mb
# When the AOF is truncated at the end, load as much as possible instead of refusing to start
aof-load-truncated yes
# While a background job (RDB save or AOF rewrite) is running, behave as appendfsync no
no-appendfsync-on-rewrite yes
# AOF file name, default appendonly.aof; appendonly-<port>.aof is a common convention
appendfilename "appendonly.aof"
# Directory for AOF files; keeping it the same as the RDB directory is fine
dir ./
Workflow:
- Write commands are appended to the aof_buf buffer
- The buffer is synced to disk according to the appendfsync policy:
- always: on every write
- everysec: batched once per second (the default)
- no: left to the operating system
- When the rewrite conditions are met, BGREWRITEAOF is triggered
- A child process builds a new AOF file (the minimal command set that reproduces the current dataset)
- When the new file is complete, it replaces the old one
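AOF can be toggled and rewritten at runtime; a sketch with illustrative output:
config set appendonly yes
# OK   <- enables AOF; Redis first produces a full rewrite as the new base
bgrewriteaof
# Background append only file rewriting started
info persistence
# aof_rewrite_in_progress:1     <- the child is building the new AOF file
# aof_last_bgrewrite_status:ok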
Hybrid Persistence
# Hybrid persistence (enabled by default since Redis 5.0)
aof-use-rdb-preamble yes
Workflow:
- The full dataset is stored in RDB format as the head of the AOF file
- Subsequent incremental writes are appended in AOF format
- On loading, the RDB part is loaded first, then the AOF commands are replayed
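With the preamble enabled, the base file begins with the RDB magic string. A quick check, assuming the Redis 7 multi-part AOF layout (the exact file name comes from the manifest in appendonlydir):
head -c 5 appendonlydir/appendonly.aof.1.base.rdb
# REDIS   <- RDB header, i.e. the full snapshot stored as the AOF preamble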
When AOF and RDB are both enabled, Redis loads the AOF file on startup; the RDB file is only loaded when AOF is disabled.
Key Eviction
expires storage structures
typedef struct redisDb { // https://github.com/redis/redis/blob/7.4.2/src/server.h#L968
    kvstore *keys;                  /* The keyspace for this DB */
    kvstore *expires;               /* Timeout of keys with a timeout set */
                                    /* expires dict: key -> millisecond expiration timestamp (backs the TTL mechanism) */
    ebuckets hexpires;              /* Hash expiration DS. Single TTL per hash (of next min field to expire) */
                                    /* per-hash expiration buckets: speed up hash-field expiration checks (HEXPIRE command) */
    dict *blocking_keys;            /* Keys with clients waiting for data (BLPOP)*/
    dict *blocking_keys_unblock_on_nokey; /* Keys with clients waiting for data,
                                           * and should be unblocked if key is deleted (XREADEDGROUP).
                                           * This is a subset of blocking_keys*/
    dict *ready_keys;               /* Blocked keys that received a PUSH */
    dict *watched_keys;             /* WATCHED keys for MULTI/EXEC CAS */
    int id;                         /* Database ID */
    long long avg_ttl;              /* Average TTL, just for stats */
                                    /* average TTL statistic, feeds eviction decisions */
    unsigned long expires_cursor;   /* Cursor of the active expire cycle. */
                                    /* cursor of the active expire cycle, enables incremental scanning */
    list *defrag_later;             /* List of key names to attempt to defrag one by one, gradually. */
} redisDb;

struct _kvstore { // https://github.com/redis/redis/blob/7.4.2/src/kvstore.c#L30
    int flags;
    dictType dtype;
    dict **dicts;
    long long num_dicts;
    long long num_dicts_bits;
    list *rehashing;                /* List of dictionaries in this kvstore that are currently rehashing. */
    int resize_cursor;              /* Cron job uses this cursor to gradually resize dictionaries (only used if num_dicts > 1). */
    int allocated_dicts;            /* The number of allocated dicts. */
    int non_empty_dicts;            /* The number of non-empty dicts. */
    unsigned long long key_count;   /* Total number of keys in this kvstore. */
                                    /* current key total, consulted when deciding whether eviction must run */
    unsigned long long bucket_count; /* Total number of buckets in this kvstore across dictionaries. */
    unsigned long long *dict_size_index; /* Binary indexed tree (BIT) that describes cumulative key frequencies up until given dict-index. */
    size_t overhead_hashtable_lut;  /* The overhead of all dictionaries. */
    size_t overhead_hashtable_rehashing; /* The overhead of dictionaries rehashing. */
};

struct dict { // https://github.com/redis/redis/blob/7.4.2/src/dict.h#L96
    dictType *type;
    dictEntry **ht_table[2];
    unsigned long ht_used[2];
    long rehashidx;                 /* rehashing not in progress if rehashidx == -1 */
    /* Keep small vars at end for optimal (minimal) struct padding */
    unsigned pauserehash : 15;      /* If >0 rehashing is paused */
    unsigned useStoredKeyApi : 1;   /* See comment of storedHashFunction above */
    signed char ht_size_exp[2];     /* exponent of size. (size = 1<<exp) */
    int16_t pauseAutoResize;        /* If >0 automatic resizing is disallowed (<0 indicates coding error) */
    void *metadata[];
};

struct dictEntry { // https://github.com/redis/redis/blob/7.4.2/src/dict.c#L45
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;         /* Next entry in the same hash bucket. */
};

typedef struct {
    void *key;
    dictEntry *next;
} dictEntryNoValue;

#define LRU_BITS 24
struct redisObject { // https://github.com/redis/redis/blob/7.4.2/src/server.h#L903
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS;          /* LRU time (relative to global lru_clock) or
                                     * LFU data (least significant 8 bits frequency
                                     * and most significant 16 bits access time). */
    int refcount;
    void *ptr;
};
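The db->expires dict can be exercised directly from redis-cli: setting, reading, and removing a timeout corresponds to inserting, reading, and deleting an entry in that dict (output values illustrative):
set k v
# OK                  <- k lives in db->keys
expire k 100
# (integer) 1         <- an absolute ms expiration timestamp for k is stored in db->expires
pttl k
# (integer) 99987     <- remaining time, computed from that stored timestamp
persist k
# (integer) 1         <- the db->expires entry is removed; k itself stays in db->keys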
Deletion strategies
Timed deletion: when a key's expiration is set, create a matching timer that deletes the key the moment it expires.
Lazy deletion (Lazy Expiration): a passive expiration check; whenever a key is accessed, check whether it has expired and needs deleting.
keyStatus expireIfNeeded(redisDb *db, robj *key, int flags) { // https://github.com/redis/redis/blob/7.4.2/src/db.c#L1974
    if (server.lazy_expire_disabled) return KEY_VALID; // lazy expiration disabled: treat the key as valid
    if (!keyIsExpired(db,key)) return KEY_VALID;       // key has not expired yet
    if (server.masterhost != NULL) {
        /* Replication: replicas do not delete expired keys themselves (they wait for the DEL
         * propagated by the master), unless deletion is forced, e.g. a write running on a
         * writable replica. */
        if (server.current_client && (server.current_client->flags & CLIENT_MASTER)) return KEY_VALID;
        if (!(flags & EXPIRE_FORCE_DELETE_EXPIRED)) return KEY_EXPIRED;
    }
    if (flags & EXPIRE_AVOID_DELETE_EXPIRED) return KEY_EXPIRED; // caller explicitly asked not to delete: only report the expired state
    if (isPausedActionsWithUpdate(PAUSE_ACTION_EXPIRE)) return KEY_EXPIRED; // expiration paused (e.g. during failover)
    int static_key = key->refcount == OBJ_STATIC_REFCOUNT; // static keys (refcount == OBJ_STATIC_REFCOUNT) must first be copied into a heap-allocated object
    if (static_key) key = createStringObject(key->ptr, sdslen(key->ptr));
    deleteExpiredKeyAndPropagate(db,key); // delete the expired key and propagate the deletion
    if (static_key) decrRefCount(key);    // release the temporary key object
    return KEY_DELETED;
}
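Lazy expiration is easy to observe from the outside: the key is only reclaimed when something touches it. A sketch (timing and output illustrative):
set k v px 100
# OK
# ... wait at least 100 ms ...
get k
# (nil)   <- this read ran expireIfNeeded, which deleted the logically expired key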
Periodic deletion (Active Expire): a proactive expiration scan driven by the time-event loop; at intervals it samples a batch of keys, checks them, and deletes the expired ones.
void activeExpireCycle(int type) { // https://github.com/redis/redis/blob/7.4.2/src/expire.c#L187
    /* Adjust the running parameters according to the configured expire
     * effort. The default effort is 1, and the maximum configurable effort
     * is 10. */
    unsigned long
    effort = server.active_expire_effort-1, /* Rescale from 0 to 9. */
    config_keys_per_loop = ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP +
                           ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP/4*effort, // keys sampled per loop
    config_cycle_fast_duration = ACTIVE_EXPIRE_CYCLE_FAST_DURATION +
                                 ACTIVE_EXPIRE_CYCLE_FAST_DURATION/4*effort, // fast-cycle duration
    config_cycle_slow_time_perc = ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC +
                                  2*effort, // CPU time percentage
    config_cycle_acceptable_stale = ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE-
                                    effort; // acceptable percentage of stale keys

    /* This function has some global state in order to continue the work
     * incrementally across calls. */
    static unsigned int current_db = 0; /* Next DB to test. */
    static int timelimit_exit = 0;      /* Time limit hit in previous call? */
    static long long last_fast_cycle = 0; /* When last fast cycle ran. */

    int j, iteration = 0;
    int dbs_per_call = CRON_DBS_PER_CALL; // number of DBs handled per call
    int dbs_performed = 0;                // number of DBs handled so far
    long long start = ustime(), timelimit, elapsed;

    if (isPausedActionsWithUpdate(PAUSE_ACTION_EXPIRE)) return; // is expiration paused?

    if (type == ACTIVE_EXPIRE_CYCLE_FAST) { // fast mode
        /* Run the fast cycle only if the stale-key percentage is too high */
        if (!timelimit_exit &&
            server.stat_expired_stale_perc < config_cycle_acceptable_stale)
            return;
        /* Rate-limit how often the fast cycle may run */
        if (start < last_fast_cycle + (long long)config_cycle_fast_duration*2)
            return;
        last_fast_cycle = start;
    }

    /* Compute the max number of DBs to visit in this call */
    if (dbs_per_call > server.dbnum || timelimit_exit)
        dbs_per_call = server.dbnum;

    /* CPU time limit derived from the configured time percentage */
    timelimit = config_cycle_slow_time_perc*1000000/server.hz/100;
    timelimit_exit = 0;
    if (timelimit <= 0) timelimit = 1;

    if (type == ACTIVE_EXPIRE_CYCLE_FAST)
        timelimit = config_cycle_fast_duration; /* fast-mode time limit, in microseconds. */

    /* Accumulate some global stats as we expire keys, to have some idea
     * about the number of keys that are already logically expired, but still
     * existing inside the database. */
    long total_sampled = 0; // total keys sampled
    long total_expired = 0; // total keys deleted

    /* Try to smoke-out bugs (server.also_propagate should be empty here) */
    serverAssert(server.also_propagate.numops == 0);

    /* Stop iteration when one of the following conditions is met:
     * 1) We have checked a sufficient number of databases with expiration time.
     * 2) The time limit has been exceeded.
     * 3) All databases have been traversed. */
    for (j = 0; dbs_performed < dbs_per_call && timelimit_exit == 0 && j < server.dbnum; j++) {
        /* Scan callback data including expired and checked count per iteration. */
        expireScanData data;
        data.ttl_sum = 0;
        data.ttl_samples = 0;

        redisDb *db = server.db+(current_db % server.dbnum);
        data.db = db;

        int db_done = 0; /* The scan of the current DB is done? */
        int update_avg_ttl_times = 0, repeat = 0;

        /* Increment the DB now so we are sure if we run out of time
         * in the current DB we'll restart from the next. This allows to
         * distribute the time evenly across DBs. */
        current_db++;

        /* Interleaving hash-field expiration with key expiration. Better
         * call it before handling expired keys because HFE DS is optimized for
         * active expiration */
        activeExpireHashFieldCycle(type);

        if (kvstoreSize(db->expires))
            dbs_performed++;

        /* Continue to expire if at the end of the cycle there are still
         * a big percentage of keys to expire, compared to the number of keys
         * we scanned. The percentage, stored in config_cycle_acceptable_stale
         * is not fixed, but depends on the Redis configured "expire effort". */
        do { // per-DB scan loop
            unsigned long num;
            iteration++;

            /* If there is nothing to expire try next DB ASAP. */
            if ((num = kvstoreSize(db->expires)) == 0) {
                db->avg_ttl = 0;
                break;
            }
            data.now = mstime();

            /* The main collection cycle. Scan through keys among keys
             * with an expire set, checking for expired ones. */
            data.sampled = 0;
            data.expired = 0;

            if (num > config_keys_per_loop)
                num = config_keys_per_loop;

            /* Here we access the low level representation of the hash table
             * for speed concerns: this makes this code coupled with dict.c,
             * but it hardly changed in ten years.
             *
             * Note that certain places of the hash table may be empty,
             * so we want also a stop condition about the number of
             * buckets that we scanned. However scanning for free buckets
             * is very fast: we are in the cache line scanning a sequential
             * array of NULL pointers, so we can scan a lot more buckets
             * than keys in the same time. */
            long max_buckets = num*20;
            long checked_buckets = 0;

            int origin_ttl_samples = data.ttl_samples;

            while (data.sampled < num && checked_buckets < max_buckets) { // cursor-based scan over the expires dict
                db->expires_cursor = kvstoreScan(db->expires, db->expires_cursor, -1,
                                                 expireScanCallback,
                                                 isExpiryDictValidForSamplingCb,
                                                 &data);
                if (db->expires_cursor == 0) { // scan of this DB is complete
                    db_done = 1;
                    break;
                }
                checked_buckets++;
            }
            total_expired += data.expired;
            total_sampled += data.sampled;

            /* If find keys with ttl not yet expired, we need to update the average TTL stats once. */
            if (data.ttl_samples - origin_ttl_samples > 0) update_avg_ttl_times++;

            /* We don't repeat the cycle for the current database if the db is done
             * for scanning or an acceptable number of stale keys (logically expired
             * but yet not reclaimed). */
            repeat = db_done ? 0 : (data.sampled == 0 || (data.expired * 100 / data.sampled) > config_cycle_acceptable_stale);

            /* We can't block forever here even if there are many keys to
             * expire. So after a given amount of microseconds return to the
             * caller waiting for the other active expire cycle. */
            if ((iteration & 0xf) == 0 || !repeat) { /* Update the average TTL stats every 16 iterations or about to exit. */
                /* Update the average TTL stats for this database,
                 * because this may reach the time limit. */
                if (data.ttl_samples) {
                    long long avg_ttl = data.ttl_sum / data.ttl_samples;

                    /* Do a simple running average with a few samples.
                     * We just use the current estimate with a weight of 2%
                     * and the previous estimate with a weight of 98%. */
                    if (db->avg_ttl == 0) {
                        db->avg_ttl = avg_ttl;
                    } else {
                        /* The origin code is as follow.
                         * for (int i = 0; i < update_avg_ttl_times; i++)
                         *     db->avg_ttl = (db->avg_ttl/50)*49 + (avg_ttl/50);
                         * We can convert the loop into a sum of a geometric progression.
                         * db->avg_ttl = db->avg_ttl * pow(0.98, update_avg_ttl_times) +
                         *               avg_ttl / 50 * (pow(0.98, update_avg_ttl_times - 1) + ... + 1)
                         *             = db->avg_ttl * pow(0.98, update_avg_ttl_times) +
                         *               avg_ttl * (1 - pow(0.98, update_avg_ttl_times))
                         *             = avg_ttl + (db->avg_ttl - avg_ttl) * pow(0.98, update_avg_ttl_times)
                         * Notice that update_avg_ttl_times is between 1 and 16, we use a constant
                         * table to accelerate the calculation of pow(0.98, update_avg_ttl_times). */
                        db->avg_ttl = avg_ttl + (db->avg_ttl - avg_ttl) * avg_ttl_factor[update_avg_ttl_times - 1];
                    }
                    update_avg_ttl_times = 0;
                    data.ttl_sum = 0;
                    data.ttl_samples = 0;
                }
                if ((iteration & 0xf) == 0) { /* check time limit every 16 iterations. */
                    elapsed = ustime()-start;
                    if (elapsed > timelimit) {
                        timelimit_exit = 1;
                        server.stat_expired_time_cap_reached_count++;
                        break;
                    }
                }
            }
        } while (repeat); // repeat scanning this DB while the stale percentage stays too high
    }

    elapsed = ustime()-start;
    server.stat_expire_cycle_time_used += elapsed;
    latencyAddSampleIfNeeded("expire-cycle",elapsed/1000);

    /* Update our estimate of keys existing but yet to be expired.
     * Running average with this sample accounting for 5%. */
    double current_perc;
    if (total_sampled) {
        current_perc = (double)total_expired/total_sampled;
    } else
        current_perc = 0;
    server.stat_expired_stale_perc = (current_perc*0.05)+
                                     (server.stat_expired_stale_perc*0.95);
}
- Dual-mode scanning: fast mode (ACTIVE_EXPIRE_CYCLE_FAST) runs with a very short time budget on every event-loop iteration (from beforeSleep); slow mode (ACTIVE_EXPIRE_CYCLE_SLOW) runs from serverCron (server.hz times per second, 10 by default) and scans more thoroughly
- Dynamic tuning: the active_expire_effort parameter (1-10) scales how much CPU the cycle may consume, and the scan intensity adapts to the observed stale-key ratio (server.stat_expired_stale_perc)
- Time-sensitive control: a strict time budget (timelimit), with the limit checked every 16 iterations (the & 0xf bitmask optimization)
- Statistics upkeep: the per-DB average TTL (db->avg_ttl) is updated by exponential smoothing, and a global stale-key percentage (stat_expired_stale_perc) is maintained
- Replication coordination: replicas do not actively delete expired keys (they rely on DEL commands propagated from the master), with special handling for writable replicas
The cycle balances performance against resource usage; it can be tuned through the related redis.conf parameters (such as hz and active-expire-effort), as sketched below.
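A minimal sketch of those knobs, shown with their default values:
hz 10                    # serverCron frequency; drives how often the slow expire cycle runs
active-expire-effort 1   # 1-10; higher values sample more keys and allow a larger CPU budget per cycle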
By default, Redis combines two of these: lazy and periodic deletion.
Eviction Policies
When memory usage reaches the configured limit, some keys must be deleted. processCommand invokes performEvictions to do this.
Maximum memory configuration
If no memory limit is configured, or it is set to 0, the default is 3 GB on 32-bit systems and unlimited on 64-bit systems. The unit is bytes.
# Maximum memory Redis may use. At startup Redis loads the persisted data (RDB/AOF) into memory;
# once the limit is reached it tries to remove expired or soon-to-expire keys according to
# maxmemory-policy, and if the limit is still exceeded afterwards, writes are rejected while
# reads continue to work.
# (The legacy Redis VM mechanism, which kept keys in memory and swapped values to disk, was
# removed from Redis long ago and is no longer relevant.)
# maxmemory <bytes>
# 3GB = 3*1024*1024*1024 = 3221225472
maxmemory 3221225472
Eviction policy configuration
# LRU (Least Recently Used): evict what was used least recently; LFU (Least Frequently Used): evict what is used least often
# volatile-xxx: consider only keys with an expiration set (server.db[i].expires)
# allkeys-xxx: consider the whole keyspace (server.db[i].keys)
# volatile-lru -> evict keys with an expiration set, using approximate LRU
# allkeys-lru -> evict any key, using approximate LRU (a common choice)
# volatile-lfu -> evict keys with an expiration set, using approximate LFU
# allkeys-lfu -> evict any key, using approximate LFU
# volatile-random -> evict a random key among those with an expiration set
# allkeys-random -> evict a random key
# volatile-ttl -> evict the keys closest to expiring (minimal TTL)
# noeviction -> evict nothing; return an error on writes (the default)
maxmemory-policy noeviction
# Number of candidates sampled per eviction. Redis never scans the whole keyspace for candidates,
# which would badly hurt read/write performance; instead it samples random keys.
# LRU, LFU and minimal-TTL are approximate algorithms (to save memory), not exact ones. By default
# Redis checks 5 keys and evicts the best candidate among them (e.g. the least recently used).
# The sample size trades speed for precision: the default of 5 gives good results, 10 is very close
# to true LRU but costs more CPU, 3 is faster but less accurate.
maxmemory-samples 5
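To watch a policy in action: shrink maxmemory, pick a policy, write until the limit is hit, then check the eviction counter (values illustrative):
config set maxmemory 2mb
config set maxmemory-policy allkeys-lru
# ... write keys until the limit is exceeded ...
info stats
# evicted_keys:123   <- grows as performEvictions removes keys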
Changes made from the command line take effect immediately, without restarting Redis, but they are lost after a Redis restart.
# Set maxmemory to 100 MB
config set maxmemory 104857600
config set maxmemory-policy allkeys-lru
# Inspect the current settings
config get maxmemory
# maxmemory_human: configured limit, used_memory_human: memory in use
info memory
# List every configuration parameter
config get *
The choice of eviction policy can be informed by the monitoring output of the INFO command: check the cache hit and miss counters and tune to match the workload, as in the example below.
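For example, the hit rate can be derived from two INFO stats counters (numbers illustrative):
info stats
# keyspace_hits:10000
# keyspace_misses:500
# hit rate = 10000 / (10000 + 500) ≈ 95.2%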
https://redis.io/topics/persistence
https://redis.io/commands/expire
https://redis.io/topics/lru-cache
