Redis 6.0? Multithreading? What on earth happened?

Background

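Redis has always run in single-threaded mode, where "single-threaded" refers to network I/O and command execution. Version 6.0, released this year (2020), adds multiple threads to handle network I/O (read, write) and command parsing.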

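Pros and cons of the single-threaded model

Most readers probably know these already, so here is a brief recap.

Pros:

  • Operations are purely in memory, so the CPU is not the bottleneck; running several processes is an easy way to use multiple CPUs
  • No multithreaded synchronization to worry about, which is developer-friendly
  • Command execution is naturally atomic
  • I/O multiplexing handles a large number of connections and avoids the cost of thread context switches

Cons:

  • A time-consuming operation blocks everything else
  • A single instance cannot make full use of a multi-core CPU (read/write still need the CPU to copy data between kernel space and user space)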

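A brief introduction to the Redis network I/O model

Redis uses I/O multiplexing to manage many network connections, and the code is organized around the Reactor pattern.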

image-20201204184422679

The main thread is an event loop.

image-20201204190457369

A quick look at the source code:

/* State of an event based program */
typedef struct aeEventLoop {
    int maxfd;   /* highest file descriptor currently registered */
    int setsize; /* max number of file descriptors tracked */
    long long timeEventNextId;
    time_t lastTime;     /* Used to detect system clock skew */
    aeFileEvent *events; /* Registered events */
    aeFiredEvent *fired; /* Fired events */
    aeTimeEvent *timeEventHead;
    int stop;
    void *apidata; /* This is used for polling API specific data */
    aeBeforeSleepProc *beforesleep;
    aeBeforeSleepProc *aftersleep;
    int flags;
} aeEventLoop;

struct redisServer {
    /* ... other fields omitted ... */
    aeEventLoop *el;
};

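The el field holds the state of the event loop. Its void *apidata member stores data specific to the I/O multiplexing API. Redis wraps several multiplexing functions (select, epoll, kqueue, and evport) and picks one at compile time based on the platform.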

ae.c:

/* Include the best multiplexing layer supported by this system.
 * The following should be ordered by performances, descending. */
#ifdef HAVE_EVPORT
#include "ae_evport.c"
#else
    #ifdef HAVE_EPOLL
    #include "ae_epoll.c"
    #else
        #ifdef HAVE_KQUEUE
        #include "ae_kqueue.c"
        #else
        #include "ae_select.c"
        #endif
    #endif
#endif

FileEvent

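A FileEvent is simply a network I/O event. Read and write handler functions are bound to an fd, and when I/O multiplexing reports the fd as ready, the bound handler is called.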

/* File event structure */
typedef struct aeFileEvent {
    int mask; /* one of AE_(READABLE|WRITABLE|BARRIER) */
    aeFileProc *rfileProc;
    aeFileProc *wfileProc;
    void *clientData;
} aeFileEvent;

TimeEvent

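A TimeEvent is a timer event. Each timer is bound to a handler function, and the timing itself cleverly reuses the blocking timeout parameter of the I/O multiplexing call that fetches ready events. Suppose the nearest timer is due in 100 ms: the timeout of the polling call (select, for example) is set to 100 ms, which gives a timer that is not especially precise. The nearest timer is found by a linear scan, which is O(n). A structure such as a skip list could bring this down to O(log n); presumably the author expected the number of timers to stay small and did not optimize for it.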

/* Time event structure */
typedef struct aeTimeEvent {
    long long id; /* time event identifier. */
    long when_sec; /* seconds */
    long when_ms; /* milliseconds */
    aeTimeProc *timeProc;
    aeEventFinalizerProc *finalizerProc;
    void *clientData;
    struct aeTimeEvent *prev;
    struct aeTimeEvent *next;
    int refcount; /* refcount to prevent timer events from being
  		   * freed in recursive time event calls. */
} aeTimeEvent;

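Multithreading introduced in 6.0

Configuration for threaded I/O

First, the parameters and comments about multithreading in the 6.0 configuration file: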

################################ THREADED I/O #################################

# Redis is mostly single threaded, however there are certain threaded
# operations such as UNLINK, slow I/O accesses and other things that are
# performed on side threads.
#
# Now it is also possible to handle Redis clients socket reads and writes
# in different I/O threads. Since especially writing is so slow, normally
# Redis users use pipelining in order to speed up the Redis performances per
# core, and spawn multiple instances in order to scale more. Using I/O
# threads it is possible to easily speedup two times Redis without resorting
# to pipelining nor sharding of the instance.
#
# By default threading is disabled, we suggest enabling it only in machines
# that have at least 4 or more cores, leaving at least one spare core.
# Using more than 8 threads is unlikely to help much. We also recommend using
# threaded I/O only if you actually have performance problems, with Redis
# instances being able to use a quite big percentage of CPU time, otherwise
# there is no point in using this feature.
#
# So for instance if you have a four cores boxes, try to use 2 or 3 I/O
# threads, if you have a 8 cores, try to use 6 threads. In order to
# enable I/O threads use the following configuration directive:
#
# io-threads 4
#
# Setting io-threads to 1 will just use the main thread as usual.
# When I/O threads are enabled, we only use threads for writes, that is
# to thread the write(2) syscall and transfer the client buffers to the
# socket. However it is also possible to enable threading of reads and
# protocol parsing using the following configuration directive, by setting
# it to yes:
#
# io-threads-do-reads no
#
# Usually threading reads doesn't help much.
#
# NOTE 1: This configuration directive cannot be changed at runtime via
# CONFIG SET. Aso this feature currently does not work when SSL is
# enabled.
#
# NOTE 2: If you want to test the Redis speedup using redis-benchmark, make
# sure you also run the benchmark itself in threaded mode, using the
# --threads option to match the number of Redis threads, otherwise you'll not
# be able to notice the improvements.

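The points worth noting here:

  • Single-threaded mode is the default
  • Threaded I/O is used for the read and write calls
  • It can easily double the speed without resorting to pipelining or running multiple instances
  • io-threads sets the number of I/O threads
  • With io-threads 1 there is only the main thread; with 2, one extra I/O thread is started, and so on
  • By default only write uses multiple threads
  • io-threads-do-reads controls whether read is threaded as well
  • Threaded I/O usually doesn't help read much
  • The setting is not yet supported when SSL is enabled
  • When benchmarking Redis with threaded I/O, redis-benchmark must also be run with its threads option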

A look at the source

The main source code for the I/O threads is here: networking.c#L2979

Key global variables

pthread_t io_threads[IO_THREADS_MAX_NUM];
pthread_mutex_t io_threads_mutex[IO_THREADS_MAX_NUM];
_Atomic unsigned long io_threads_pending[IO_THREADS_MAX_NUM];
int io_threads_op;      /* IO_THREADS_OP_WRITE or IO_THREADS_OP_READ. */

/* This is the list of clients each thread will serve when threaded I/O is
 * used. We spawn io_threads_num-1 threads, since one is the main thread
 * itself. */
list *io_threads_list[IO_THREADS_MAX_NUM];

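io_threads: the pthread_t handles of the I/O threads

io_threads_mutex: mutexes the main thread uses to stop and resume the I/O threads

io_threads_pending: atomic counters used to synchronize with the main thread. A non-zero io_threads_pending[i] means thread i has been handed that many clients and may start its read/write work; the thread resets the counter to 0 when it is done.

io_threads_op: whether the current operation is a read or a write

io_threads_list: the per-thread client queue; each thread walks its own list and processes the clients in order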

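Key code logic

beforeSleep calls handleClientsWithPendingReadsUsingThreads and handleClientsWithPendingWritesUsingThreads to wake the I/O threads and process reads and writes respectively.

In handleClientsWithPendingReadsUsingThreads you can see that ready clients are distributed evenly (round-robin) across the n I/O threads: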

/* Distribute the clients across N different lists. */
    listIter li;
    listNode *ln;
    listRewind(server.clients_pending_read,&li);
    int item_id = 0;
    while((ln = listNext(&li))) {
        client *c = listNodeValue(ln);
        int target_id = item_id % server.io_threads_num;
        listAddNodeTail(io_threads_list[target_id],c);
        item_id++;
    }

The I/O threads are then woken by setting io_threads_pending. With io-threads 4, io-threads - 1 = 3 extra threads are started, because the main thread also acts as an I/O thread: it handles the clients in io_threads_list[0].

/* Give the start condition to the waiting threads, by setting the
     * start condition atomic var. */
    io_threads_op = IO_THREADS_OP_READ;
    for (int j = 1; j < server.io_threads_num; j++) {
        int count = listLength(io_threads_list[j]);
        io_threads_pending[j] = count;
    }

/* Also use the main thread to process a slice of clients. */
    listRewind(io_threads_list[0],&li);
    while((ln = listNext(&li))) {
        client *c = listNodeValue(ln);
        readQueryFromClient(c->conn);
    }
    listEmpty(io_threads_list[0]);

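Once the main thread has finished its own share of the I/O, it busy-waits until the other I/O threads have completed their reads, and only then executes commands. By this point the data has been read and the commands parsed in the I/O threads; the main thread alone executes them, which preserves the atomicity of command execution.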

/* Wait for all the other threads to end their work. */
    while(1) {
        unsigned long pending = 0;
        for (int j = 1; j < server.io_threads_num; j++)
            pending += io_threads_pending[j];
        if (pending == 0) break;
    }
    if (tio_debug) printf("I/O READ All threads finshed\n");

    /* Run the list of clients again to process the new buffers. */
    while(listLength(server.clients_pending_read)) {
        ln = listFirst(server.clients_pending_read);
        client *c = listNodeValue(ln);
        c->flags &= ~CLIENT_PENDING_READ;
        listDelNode(server.clients_pending_read,ln);

        if (c->flags & CLIENT_PENDING_COMMAND) {
            c->flags &= ~CLIENT_PENDING_COMMAND;
            if (processCommandAndResetClient(c) == C_ERR) {
                /* If the client is no longer valid, we avoid
                 * processing the client later. So we just go
                 * to the next. */
                continue;
            }
        }
        processInputBuffer(c);
    }

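The logic of an I/O thread lives in IOThreadMain:

The thread loops waiting for io_threads_pending to become non-zero. Spinning forever would saturate a CPU, so the spin is bounded, and the mutex io_threads_mutex lets the main thread pause an I/O thread by making it block in pthread_mutex_lock.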

 /* Wait for start */
        for (int j = 0; j < 1000000; j++) {
            if (io_threads_pending[id] != 0) break;
        }

        /* Give the main thread a chance to stop this thread. */
        if (io_threads_pending[id] == 0) {
            pthread_mutex_lock(&io_threads_mutex[id]);
            pthread_mutex_unlock(&io_threads_mutex[id]);
            continue;
        }

Next, io_threads_op determines whether the thread performs a read or a write:

/* Process: note that the main thread will never touch our list
         * before we drop the pending count to 0. */
        listIter li;
        listNode *ln;
        listRewind(io_threads_list[id],&li);
        while((ln = listNext(&li))) {
            client *c = listNodeValue(ln);
            if (io_threads_op == IO_THREADS_OP_WRITE) {
                writeToClient(c,0);
            } else if (io_threads_op == IO_THREADS_OP_READ) {
                readQueryFromClient(c->conn);
            } else {
                serverPanic("io_threads_op value is unknown");
            }
        }
        listEmpty(io_threads_list[id]);
        io_threads_pending[id] = 0;

Finally, a look at the threads of the running process

With io-threads 4 there are 3 extra I/O threads:

top -Hp 17339

image-20201207115032168

Flow of the threaded I/O mode

image-20201207113755497

Performance test

The test uses redis-benchmark, the benchmarking tool that ships with Redis.

Comparison chart

image-20201207162113321

Detailed data

Redis 6.0.9, with 4 I/O threads

Multithreaded: ./redis-benchmark -c 1000 -n 1000000 --threads 4 --csv

Single-threaded: ./redis-benchmark -c 1000 -n 1000000 --csv

Table (requests per second; "reads on/off" is the io-threads-do-reads setting):

cmd                                 4 threads, reads on   4 threads, reads off  1 thread
PING_INLINE                         472589.81             363240.09             215610.17
PING_BULK                           515198.34             423908.44             213766.56
SET                                 442673.75             372162.25             213401.62
GET                                 476644.41             400320.28             212901.84
INCR                                460829.47             389559.81             214408.23
LPUSH                               399520.56             346500.34             220896.84
RPUSH                               430292.62             358680.03             217391.31
LPOP                                404203.72             344946.53             222024.86
RPOP                                399680.25             333111.25             215517.25
SADD                                450856.66             363372.09             216590.86
HSET                                399680.25             333111.25             217344.06
SPOP                                486854.94             405350.62             213401.62
ZADD                                415627.62             333222.28             217912.39
ZPOPMIN                             444049.72             402900.88             216122.77
LPUSH (needed to benchmark LRANGE)  410677.62             342114.25             218914.19
LRANGE_100 (first 100 elements)     113869.28             110168.56             75483.09
LRANGE_300 (first 300 elements)     45687.13              44081.99              27139.99
LRANGE_500 (first 450 elements)     31991.81              31406.05              20085.56
LRANGE_600 (first 600 elements)     24688.31              23973.34              15635.75
MSET (10 keys)                      226244.34             200240.30             175500.17

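Summary

Since 6.0, Redis has added multithreading for network I/O. The I/O threads only handle read, command parsing, and write; command execution still happens in the main thread and remains atomic.
With four I/O threads, GET and SET run at roughly 2x the single-threaded throughput when both writes and reads are threaded (about 2.07x for SET and 2.24x for GET in the table above), and at about 1.7x to 1.9x when only writes are threaded (1.74x for SET, 1.88x for GET).

One last question to think about: why doesn't threading the reads bring a more noticeable speedup?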

posted @ 2020-12-07 16:55  Deaglepc