15 Oct 16:41:28.146 # Connection with slave 10.72.26.55:6379 lost.
15 Oct 16:41:28.999 * Slave asks for synchronization
15 Oct 16:41:28.999 * Unable to partial resync with the slave for lack of backlog (Slave request was: 152340118946214).
15 Oct 16:41:28.999 * Starting BGSAVE for SYNC
15 Oct 16:41:29.447 * Background saving started by pid 11357
15 Oct 16:41:57.325 * DB saved on disk
15 Oct 16:41:57.555 * RDB: 231 MB of memory used by copy-on-write
15 Oct 16:41:57.980 * Background saving terminated with success
15 Oct 16:42:31.739 * Synchronization with slave succeeded
15 Oct 16:43:01.021 # Client id=6082455 addr=slave_host:55308 fd=329 name= age=93 idle=1 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=10657 omem=2504780296 events=rw cmd=replconf scheduled to be closed ASAP for overcoming of output buffer limits.
15 Oct 16:43:01.141 # Connection with master lost.
15 Oct 16:43:01.141 * Caching the disconnected master state.
15 Oct 16:43:01.213 * Connecting to MASTER masterhost:6379
15 Oct 16:43:01.213 * MASTER <-> SLAVE sync started
15 Oct 16:43:01.213 * Non blocking connect for SYNC fired the event.
15 Oct 16:43:01.572 * Master replied to PING, replication can continue...
15 Oct 16:43:01.599 * Trying a partial resynchronization (request cbc213a279fde141211f65d436595e4ed64198fa:152342150944513).
15 Oct 16:43:01.602 * Full resync from master: cbc213a279fde141211f65d436595e4ed64198fa:152344338348685
15 Oct 16:43:01.602 * Discarding previously cached master state.
15 Oct 16:43:30.326 * MASTER <-> SLAVE sync: receiving 1308737462 bytes from master
15 Oct 16:43:59.846 * MASTER <-> SLAVE sync: Flushing old data
15 Oct 16:44:01.534 * MASTER <-> SLAVE sync: Loading DB in memory
15 Oct 16:44:22.590 * MASTER <-> SLAVE sync: Finished with success
15 Oct 16:44:22.600 # Connection with master lost.
15 Oct 16:44:22.600 * Caching the disconnected master state.
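The master's log shows that the connection to the slave was forcibly closed. The slave's output buffer (omem=2504780296 bytes, about 2.5 GB) exceeded the configured output buffer limits. Here is the relevant section of the self-documenting configuration file (redis.conf) shipped with Redis 2.8: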
# client-output-buffer-limit <class> <hard limit> <soft limit> <soft seconds>
#
# A client is immediately disconnected once the hard limit is reached, or if
# the soft limit is reached and remains reached for the specified number of
# seconds (continuously).
# So for instance if the hard limit is 32 megabytes and the soft limit is
# 16 megabytes / 10 seconds, the client will get disconnected immediately
# if the size of the output buffers reach 32 megabytes, but will also get
# disconnected if the client reaches 16 megabytes and continuously overcomes
# the limit for 10 seconds.
#
# By default normal clients are not limited because they don't receive data
# without asking (in a push way), but just after a request, so only
# asynchronous clients may create a scenario where data is requested faster
# than it can read.
#
# Instead there is a default limit for pubsub and slave clients, since
# subscribers and slaves receive data in a push fashion.
#
# Both the hard or the soft limit can be disabled by setting them to zero.
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
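256mb is the hard limit: as soon as the output buffer grows beyond 256mb, the connection is closed. "64mb 60" is the soft limit: if the output buffer stays above 64mb continuously for 60 seconds, the connection is closed as well.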
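The hard/soft rule can be modeled as a small per-client state machine. The Python below is a simplified sketch for illustration, not Redis's own code. The names parse_memory, BufferLimit and OutputBufferMonitor are hypothetical. It assumes the redis.conf unit convention, where k/m/g are powers of 1000 and kb/mb/gb are powers of 1024.

```python
from dataclasses import dataclass
from typing import Optional

# redis.conf memory units: k/m/g are powers of 1000, kb/mb/gb powers of 1024.
_UNITS = {
    "k": 1000, "kb": 1024,
    "m": 1000 ** 2, "mb": 1024 ** 2,
    "g": 1000 ** 3, "gb": 1024 ** 3,
}

def parse_memory(value: str) -> int:
    """Convert a redis.conf memory value such as '256mb' into bytes (hypothetical helper)."""
    v = value.strip().lower()
    for suffix, factor in _UNITS.items():
        if v.endswith(suffix):
            return int(v[: -len(suffix)]) * factor
    return int(v)  # plain number of bytes

@dataclass
class BufferLimit:
    hard: int          # bytes; 0 disables the hard limit
    soft: int          # bytes; 0 disables the soft limit
    soft_seconds: int  # how long the soft limit may be exceeded continuously

class OutputBufferMonitor:
    """Simplified model of the hard/soft output buffer check for one client."""

    def __init__(self, limit: BufferLimit):
        self.limit = limit
        self.soft_since: Optional[float] = None  # when the soft limit was first exceeded

    def should_disconnect(self, used: int, now: float) -> bool:
        hard_hit = self.limit.hard > 0 and used >= self.limit.hard
        soft_hit = self.limit.soft > 0 and used >= self.limit.soft
        if not soft_hit:
            # Falling back under the soft limit resets the timer: the overrun must be continuous.
            self.soft_since = None
        elif self.soft_since is None:
            # First time over the soft limit: start the clock, do not disconnect yet.
            self.soft_since = now
            soft_hit = False
        else:
            soft_hit = now - self.soft_since > self.limit.soft_seconds
        return hard_hit or soft_hit

if __name__ == "__main__":
    slave_default = BufferLimit(parse_memory("256mb"), parse_memory("64mb"), 60)
    monitor = OutputBufferMonitor(slave_default)
    # omem value taken from the master log line above
    print(monitor.should_disconnect(2504780296, now=0.0))  # True: above the 256mb hard limit
```

Applied to the omem value from the master log, the model flags the slave connection at once, because 2504780296 bytes is far above the 256mb (268435456-byte) hard limit. A buffer that sits between 64mb and 256mb is only flagged after it has stayed there for more than 60 seconds without interruption.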
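When the number of connections spikes and the data volume is large, the default values are no longer enough for master-slave replication. The slave gets disconnected and keeps asking the master for a new synchronization. The master keeps running BGSAVE and sending the RDB file to the slave, and the two end up in an endless loop. In the logs above the partial resync fails for lack of backlog, so each reconnect falls back to a full resync. We therefore have to size client-output-buffer-limit slave according to how the slave is actually used. After adjusting it, replication completes normally.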
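One rough way to choose a value is to estimate how much the master must buffer for the slave during one full sync. That estimate is the replication write rate multiplied by the length of the sync window. This approach is an assumption of this sketch, not a rule from the Redis documentation. The helpers window_seconds and suggest_slave_limit are hypothetical, and the write rate has to be measured on your own master. The 2x headroom is an arbitrary choice.

```python
import math
from datetime import datetime

MB = 1024 * 1024  # redis.conf "mb" unit

def window_seconds(start: str, end: str) -> float:
    """Seconds between two 'HH:MM:SS.mmm' timestamps copied from a Redis log (hypothetical helper)."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

def suggest_slave_limit(write_bytes_per_sec: float, window: float, headroom: float = 2.0) -> str:
    """Heuristic sizing for 'client-output-buffer-limit slave' (an assumption, not official guidance)."""
    # Rough peak: every write during the full-sync window is queued for the slave.
    peak = write_bytes_per_sec * window
    soft_mb = math.ceil(peak / MB)             # soft limit = expected peak, rounded up to whole mb
    hard_mb = math.ceil(peak * headroom / MB)  # hard limit = peak with extra margin
    soft_seconds = math.ceil(window)           # tolerate an overrun lasting about one sync window
    return f"client-output-buffer-limit slave {hard_mb}mb {soft_mb}mb {soft_seconds}"

if __name__ == "__main__":
    # Slave-side full resync window from the log above.
    w = window_seconds("16:43:01.602", "16:44:22.590")
    # 20 mb/s is a hypothetical write rate, not a value from this incident.
    print(suggest_slave_limit(20 * MB, w))  # client-output-buffer-limit slave 3240mb 1620mb 81
```

The example takes the slave-side window from the log: full resync at 16:43:01.602, finished at 16:44:22.590, about 81 s. With an assumed write rate of 20 mb/s it suggests 3240mb 1620mb 81. The soft limit is set to the expected peak, so it only trips on an overrun that outlasts a normal sync. The line can go into redis.conf, or its value part ("slave 3240mb 1620mb 81") can be applied at runtime with CONFIG SET client-output-buffer-limit. The master must also have enough spare memory to hold a buffer of that size.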