Source Code Analysis of Broker Disk Flushing in RocketMQ

The end of the previous post [Source Code Analysis of Broker Message Storage in RocketMQ] briefly mentioned CommitLog flushing. This post builds directly on that one.

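For CommitLog flushing, the Broker starts a thread that continuously writes buffered content to disk (the CommitLog files). There are two modes: asynchronous flushing and synchronous flushing.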

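Asynchronous flushing can be further divided into two paths:
① buffer in mappedByteBuffer -> write to disk (synchronous flushing also uses this path)
② buffer in writeBuffer -> commit to fileChannel -> write to disk (the case, mentioned earlier, where the in-memory byte buffer, i.e. transientStorePool, is enabled)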

The two CommitLog flush modes:

public enum FlushDiskType {
    SYNC_FLUSH,
    ASYNC_FLUSH
}

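That is, synchronous and asynchronous. Synchronous flushing is implemented by GroupCommitService and asynchronous flushing by FlushRealTimeService (both are subclasses of FlushCommitLogService). Asynchronous flushing is the default.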

In asynchronous mode, if the in-memory byte buffer is enabled, CommitRealTimeService is started in addition to FlushRealTimeService.
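The selection rules above fit in a few lines of code. The sketch below only mirrors the branching described in the text (and in handleDiskFlush further down). The names FlushServicePlan and servicesFor are hypothetical, and service names are returned as plain strings. This is an illustration, not RocketMQ code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: which background flush services run for a given configuration.
enum FlushDiskType { SYNC_FLUSH, ASYNC_FLUSH }

final class FlushServicePlan {
    static List<String> servicesFor(FlushDiskType type, boolean transientStorePoolEnable) {
        List<String> services = new ArrayList<>();
        if (type == FlushDiskType.SYNC_FLUSH) {
            // Synchronous mode: one service batches flush requests and forces them to disk.
            services.add("GroupCommitService");
        } else {
            // Asynchronous mode: a periodic flusher...
            services.add("FlushRealTimeService");
            if (transientStorePoolEnable) {
                // ...plus a committer that moves writeBuffer data into the FileChannel.
                services.add("CommitRealTimeService");
            }
        }
        return services;
    }
}
```

With the default ASYNC_FLUSH and the buffer disabled, only the periodic flusher runs.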

Synchronous flushing:

The GroupCommitService thread runs:

public void run() {
    CommitLog.log.info(this.getServiceName() + " service started");

    while (!this.isStopped()) {
        try {
            this.waitForRunning(10);
            this.doCommit();
        } catch (Exception e) {
            CommitLog.log.warn(this.getServiceName() + " service has exception. ", e);
        }
    }

    // Under normal circumstances shutdown, wait for the arrival of the
    // request, and then flush
    try {
        Thread.sleep(10);
    } catch (InterruptedException e) {
        CommitLog.log.warn("GroupCommitService Exception, ", e);
    }

    synchronized (this) {
        this.swapRequests();
    }

    this.doCommit();

    CommitLog.log.info(this.getServiceName() + " service end");
}

The doCommit call inside the loop performs the flushing, over and over.

The doCommit method:

private void doCommit() {
    synchronized (this.requestsRead) {
        if (!this.requestsRead.isEmpty()) {
            for (GroupCommitRequest req : this.requestsRead) {
                // There may be a message in the next file, so a maximum of
                // two times the flush
                boolean flushOK = false;
                for (int i = 0; i < 2 && !flushOK; i++) {
                    flushOK = CommitLog.this.mappedFileQueue.getFlushedWhere() >= req.getNextOffset();

                    if (!flushOK) {
                        CommitLog.this.mappedFileQueue.flush(0);
                    }
                }

                req.wakeupCustomer(flushOK);
            }

            long storeTimestamp = CommitLog.this.mappedFileQueue.getStoreTimestamp();
            if (storeTimestamp > 0) {
                CommitLog.this.defaultMessageStore.getStoreCheckpoint().setPhysicMsgTimestamp(storeTimestamp);
            }

            this.requestsRead.clear();
        } else {
            // Because of individual messages is set to not sync flush, it
            // will come to this process
            CommitLog.this.mappedFileQueue.flush(0);
        }
    }
}

GroupCommitService manages two Lists:

private volatile List<GroupCommitRequest> requestsWrite = new ArrayList<GroupCommitRequest>();
private volatile List<GroupCommitRequest> requestsRead = new ArrayList<GroupCommitRequest>();

A GroupCommitRequest wraps an offset:

private final long nextOffset;

Now we need the handleDiskFlush method mentioned at the end of the previous post:

public void handleDiskFlush(AppendMessageResult result, PutMessageResult putMessageResult, MessageExt messageExt) {
    // Synchronization flush
    if (FlushDiskType.SYNC_FLUSH == this.defaultMessageStore.getMessageStoreConfig().getFlushDiskType()) {
        final GroupCommitService service = (GroupCommitService) this.flushCommitLogService;
        if (messageExt.isWaitStoreMsgOK()) {
            GroupCommitRequest request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes());
            service.putRequest(request);
            boolean flushOK = request.waitForFlush(this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
            if (!flushOK) {
                log.error("do groupcommit, wait for flush failed, topic: " + messageExt.getTopic() + " tags: " + messageExt.getTags()
                    + " client address: " + messageExt.getBornHostString());
                putMessageResult.setPutMessageStatus(PutMessageStatus.FLUSH_DISK_TIMEOUT);
            }
        } else {
            service.wakeup();
        }
    }
    // Asynchronous flush
    else {
        if (!this.defaultMessageStore.getMessageStoreConfig().isTransientStorePoolEnable()) {
            flushCommitLogService.wakeup();
        } else {
            commitLogService.wakeup();
        }
    }
}

This method is called after the Broker has received a message from a Producer and finished writing it into the ByteBuffer.

In SYNC_FLUSH mode, the method takes WroteOffset and WroteBytes from the AppendMessageResult and adds them to get nextOffset, wraps nextOffset in a GroupCommitRequest, and then adds that request to the requestsWrite List through GroupCommitService's putRequest method.
The putRequest method:

public synchronized void putRequest(final GroupCommitRequest request) {
    synchronized (this.requestsWrite) {
        this.requestsWrite.add(request);
    }
    if (hasNotified.compareAndSet(false, true)) {
        waitPoint.countDown(); // notify
    }
}

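After adding to the List, it uses a CAS to flip the atomic Boolean hasNotified from false to true and, if the CAS succeeds, calls waitPoint.countDown() to wake the flush thread. Both matter below.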

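Because this is synchronous flushing, the caller then uses GroupCommitRequest's waitForFlush method to wait, up to the sync flush timeout, for the flush that covers this record to complete.
Asynchronous flushing, in contrast, only wakes the flush task through wakeup and does not wait. That is the difference between the two.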
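The wait/wake handshake between the thread handling the put and the flush thread can be sketched with a one-shot java.util.concurrent.CountDownLatch. This is a minimal stand-in for GroupCommitRequest under that assumption. The class name SketchCommitRequest is hypothetical, and RocketMQ's actual fields may differ between versions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for GroupCommitRequest: the put-handling thread waits on it,
// and the flush thread wakes it once it has tried to flush up to nextOffset.
final class SketchCommitRequest {
    private final long nextOffset;
    private final CountDownLatch latch = new CountDownLatch(1);
    private volatile boolean flushOK = false;

    SketchCommitRequest(long nextOffset) {
        this.nextOffset = nextOffset;
    }

    long getNextOffset() {
        return nextOffset;
    }

    // Flush thread: record the outcome, then release the waiter.
    void wakeupCustomer(boolean flushOK) {
        this.flushOK = flushOK;
        latch.countDown();
    }

    // Put-handling thread: returns false on timeout, which the caller would
    // report as FLUSH_DISK_TIMEOUT.
    boolean waitForFlush(long timeoutMillis) {
        try {
            latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
            return flushOK;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

flushOK is written before countDown, so a waiter released by the latch sees the final value. If the timeout expires first, the flag is still false.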

Back in doCommit, notice that it operates on the requestsRead List, while the record was just added to requestsWrite.
The link between the two is the waitForRunning method called in run:

protected void waitForRunning(long interval) {
    if (hasNotified.compareAndSet(true, false)) {
        this.onWaitEnd();
        return;
    }

    //entry to wait
    waitPoint.reset();

    try {
        waitPoint.await(interval, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
        log.error("Interrupted", e);
    } finally {
        hasNotified.set(false);
        this.onWaitEnd();
    }
}

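waitForRunning first tries a CAS on hasNotified from true to false. If it succeeds (putRequest has already signalled since the last round), onWaitEnd is called and the method returns immediately. If it fails, waitPoint is reset and the thread blocks in await until the putRequest method described above wakes it with countDown. In other words, once a message sent by a Producer has been buffered and handleDiskFlush has been called, the flush thread is woken to do its work. Without any wakeup, the flush thread still resumes after the interval timeout (10 ms in run). In both cases the finally block calls onWaitEnd.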

Next, the onWaitEnd method:

protected void onWaitEnd() {
    this.swapRequests();
}

private void swapRequests() {
    List<GroupCommitRequest> tmp = this.requestsWrite;
    this.requestsWrite = this.requestsRead;
    this.requestsRead = tmp;
}

As you can see, this swaps the two Lists.

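This is a neat trick. If you know the JVM, it probably reminds you of the copying algorithm used for the young generation; more generally, it is the classic double-buffering pattern.
While the flush thread is blocked, records accumulate in requestsWrite. When the flush thread wakes up, it first swaps requestsWrite and requestsRead, so it now reads the records from requestsRead. Meanwhile requestsWrite becomes the empty List, and new records are added to it. The cycle then repeats.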

In doCommit, when requestsRead is non-empty, requestsRead.clear() is called at the end, which confirms the description above.
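The swap can be isolated as a small double buffer. The sketch below assumes a single flusher thread and guards put and swap with one lock, which is simpler than the synchronization in GroupCommitService. The class RequestDoubleBuffer and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the requestsWrite/requestsRead swap (double buffering).
// Producers only touch requestsWrite; the flusher only iterates requestsRead.
final class RequestDoubleBuffer<T> {
    private List<T> requestsWrite = new ArrayList<>();
    private List<T> requestsRead = new ArrayList<>();

    // Producer side (cf. putRequest).
    synchronized void put(T request) {
        requestsWrite.add(request);
    }

    // Flusher side (cf. swapRequests): new requests now go to the empty list.
    synchronized void swap() {
        List<T> tmp = requestsWrite;
        requestsWrite = requestsRead;
        requestsRead = tmp;
    }

    // Flusher side (cf. doCommit): process the batch, then clear it so that the
    // next swap hands producers an empty list. Only the flusher thread calls
    // swap() and drain(), so requestsRead is not shared with producers here.
    int drain(Consumer<T> handler) {
        int n = requestsRead.size();
        for (T req : requestsRead) {
            handler.accept(req);
        }
        requestsRead.clear();
        return n;
    }
}
```

Requests put after a swap wait for the next round. This is why a slow batch never blocks producers for longer than a single list append.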

Now let's look closely at how the flush is performed:

for (GroupCommitRequest req : this.requestsRead) {
    // There may be a message in the next file, so a maximum of
    // two times the flush
    boolean flushOK = false;
    for (int i = 0; i < 2 && !flushOK; i++) {
        flushOK = CommitLog.this.mappedFileQueue.getFlushedWhere() >= req.getNextOffset();

        if (!flushOK) {
            CommitLog.this.mappedFileQueue.flush(0);
        }
    }

    req.wakeupCustomer(flushOK);
}

Iterating over requestsRead yields the nextOffset wrapped by each GroupCommitRequest.

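flushedWhere records the offset reached by the last completed flush. If it is greater than or equal to nextOffset, everything before nextOffset is already on disk and no flush is needed; otherwise mappedFileQueue's flush method is called. The loop tries at most twice because, as the code comment says, the message may lie in the next file, and one MappedFileQueue.flush call only covers the single MappedFile that contains flushedWhere.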
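To see why a second attempt can be needed, here is a toy model in which each flush call advances flushedWhere only to the end of the current file. It assumes a fixed file size and a single writer. ToyFlushQueue is hypothetical. The sketch checks flushedWhere after each flush attempt, a small deviation from the loop quoted above, which checks before each flush.

```java
// Hypothetical toy model of MappedFileQueue.flush(0): each call flushes only the
// file containing flushedWhere, up to that file's written position.
final class ToyFlushQueue {
    private final long fileSize;
    private long writtenWhere = 0; // global offset written so far, across all files
    private long flushedWhere = 0; // global offset flushed so far

    ToyFlushQueue(long fileSize) {
        this.fileSize = fileSize;
    }

    void append(long bytes) {
        writtenWhere += bytes;
    }

    long getFlushedWhere() {
        return flushedWhere;
    }

    // Returns true when nothing new was flushed (mirrors "result = where == this.flushedWhere").
    boolean flush() {
        long fileFrom = (flushedWhere / fileSize) * fileSize; // start of the current file
        long where = Math.min(writtenWhere, fileFrom + fileSize);
        boolean result = where == flushedWhere;
        flushedWhere = where;
        return result;
    }

    // At most two flush attempts per request, as in doCommit.
    boolean flushUpTo(long nextOffset) {
        boolean flushOK = flushedWhere >= nextOffset;
        for (int i = 0; i < 2 && !flushOK; i++) {
            flush();
            flushOK = flushedWhere >= nextOffset;
        }
        return flushOK;
    }
}
```

With 1024-byte files, a message written at [1024, 1124) needs one flush to finish file 0 and a second to reach 1124 in file 1.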

The MappedFileQueue flush method:

public boolean flush(final int flushLeastPages) {
    boolean result = true;
    MappedFile mappedFile = this.findMappedFileByOffset(this.flushedWhere, this.flushedWhere == 0);
    if (mappedFile != null) {
        long tmpTimeStamp = mappedFile.getStoreTimestamp();
        int offset = mappedFile.flush(flushLeastPages);
        long where = mappedFile.getFileFromOffset() + offset;
        result = where == this.flushedWhere;
        this.flushedWhere = where;
        if (0 == flushLeastPages) {
            this.storeTimestamp = tmpTimeStamp;
        }
    }

    return result;
}

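First, findMappedFileByOffset uses flushedWhere (the offset after the last completed flush) to locate the MappedFile that maps the corresponding CommitLog file.
MappedFile and its operations have been covered many times in my earlier posts, so I won't repeat that here.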

Once the MappedFile is found, its flush method is called.

The MappedFile flush method:

public int flush(final int flushLeastPages) {
    if (this.isAbleToFlush(flushLeastPages)) {
        if (this.hold()) {
            int value = getReadPosition();

            try {
                //We only append data to fileChannel or mappedByteBuffer, never both.
                if (writeBuffer != null || this.fileChannel.position() != 0) {
                    this.fileChannel.force(false);
                } else {
                    this.mappedByteBuffer.force();
                }
            } catch (Throwable e) {
                log.error("Error occurred when force data to disk.", e);
            }

            this.flushedPosition.set(value);
            this.release();
        } else {
            log.warn("in flush, hold failed, flush offset = " + this.flushedPosition.get());
            this.flushedPosition.set(getReadPosition());
        }
    }
    return this.getFlushedPosition();
}

First, the isAbleToFlush method:

private boolean isAbleToFlush(final int flushLeastPages) {
    int flush = this.flushedPosition.get();
    int write = getReadPosition();

    if (this.isFull()) {
        return true;
    }

    if (flushLeastPages > 0) {
        return ((write / OS_PAGE_SIZE) - (flush / OS_PAGE_SIZE)) >= flushLeastPages;
    }

    return write > flush;
}

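Here flush is the position after the last completed flush, and write is the position up to which message data has been written.
When flushLeastPages is greater than 0, the check is: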

return ((write / OS_PAGE_SIZE) - (flush / OS_PAGE_SIZE)) >= flushLeastPages;

This decides whether the page requirement is met. OS_PAGE_SIZE is 4 KB, i.e. one page is 4 KB.

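Because this is synchronous flushing, flushLeastPages is 0, so there is no page requirement: any unflushed data triggers a flush. In asynchronous flushing, however, flushLeastPages defaults to 4, so the asynchronous flush writes the buffer to the file only once roughly 4 (pages) * 4 KB (page size) = 16 KB of new data has accumulated (counted in whole pages), unless the file is full.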
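To make the page arithmetic concrete, here is the isAbleToFlush decision as a standalone method. The isFull flag is passed in as a parameter, and the class name PageThreshold is hypothetical. Integer division compares page indexes rather than byte counts, so "16 KB" is an approximation.

```java
// Standalone copy of the isAbleToFlush decision; names are hypothetical.
final class PageThreshold {
    static final int OS_PAGE_SIZE = 1024 * 4; // 4 KB

    static boolean isAbleToFlush(int flush, int write, int flushLeastPages, boolean isFull) {
        if (isFull) {
            return true; // a full file is always flushed
        }
        if (flushLeastPages > 0) {
            // Compare page indexes, not byte counts.
            return ((write / OS_PAGE_SIZE) - (flush / OS_PAGE_SIZE)) >= flushLeastPages;
        }
        return write > flush; // flushLeastPages == 0: any unflushed byte is enough
    }
}
```

With flushLeastPages = 4, writing from 0 to 16383 is not enough (page index 3), but 16384 is. If flush sits at 4095, the end of page 0, the same write position 16384 passes the check after only 12289 new bytes.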

Back in MappedFile's flush method, once isAbleToFlush has checked the write requirement:

int value = getReadPosition();
try {
    //We only append data to fileChannel or mappedByteBuffer, never both.
    if (writeBuffer != null || this.fileChannel.position() != 0) {
        this.fileChannel.force(false);
    } else {
        this.mappedByteBuffer.force();
    }
} catch (Throwable e) {
    log.error("Error occurred when force data to disk.", e);
}

this.flushedPosition.set(value);

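getReadPosition first returns the position up to which message data has been written. In synchronous flushing, writeBuffer is null and nothing has been written through fileChannel, so the else branch runs: mappedByteBuffer.force() uses JDK NIO to write the data cached in mappedByteBuffer into the CommitLog file.
Finally, flushedPosition is updated.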

Back in MappedFileQueue's flush method, after MappedFile's flush completes, flushedWhere is updated as well.

At this point the buffered data has been persisted, and the synchronous flush is complete.

Asynchronous flushing:

① FlushCommitLogService:

public void run() {
    CommitLog.log.info(this.getServiceName() + " service started");

    while (!this.isStopped()) {
        boolean flushCommitLogTimed = CommitLog.this.defaultMessageStore.getMessageStoreConfig().isFlushCommitLogTimed();

        int interval = CommitLog.this.defaultMessageStore.getMessageStoreConfig().getFlushIntervalCommitLog();
        int flushPhysicQueueLeastPages = CommitLog.this.defaultMessageStore.getMessageStoreConfig().getFlushCommitLogLeastPages();

        int flushPhysicQueueThoroughInterval =
            CommitLog.this.defaultMessageStore.getMessageStoreConfig().getFlushCommitLogThoroughInterval();

        boolean printFlushProgress = false;

        // Print flush progress
        long currentTimeMillis = System.currentTimeMillis();
        if (currentTimeMillis >= (this.lastFlushTimestamp + flushPhysicQueueThoroughInterval)) {
            this.lastFlushTimestamp = currentTimeMillis;
            flushPhysicQueueLeastPages = 0;
            printFlushProgress = (printTimes++ % 10) == 0;
        }

        try {
            if (flushCommitLogTimed) {
                Thread.sleep(interval);
            } else {
                this.waitForRunning(interval);
            }

            if (printFlushProgress) {
                this.printFlushProgress();
            }

            long begin = System.currentTimeMillis();
            CommitLog.this.mappedFileQueue.flush(flushPhysicQueueLeastPages);
            long storeTimestamp = CommitLog.this.mappedFileQueue.getStoreTimestamp();
            if (storeTimestamp > 0) {
                CommitLog.this.defaultMessageStore.getStoreCheckpoint().setPhysicMsgTimestamp(storeTimestamp);
            }
            long past = System.currentTimeMillis() - begin;
            if (past > 500) {
                log.info("Flush data to disk costs {} ms", past);
            }
        } catch (Throwable e) {
            CommitLog.log.warn(this.getServiceName() + " service has exception. ", e);
            this.printFlushProgress();
        }
    }

    // Normal shutdown, to ensure that all the flush before exit
    boolean result = false;
    for (int i = 0; i < RETRY_TIMES_OVER && !result; i++) {
        result = CommitLog.this.mappedFileQueue.flush(0);
        CommitLog.log.info(this.getServiceName() + " service shutdown, retry " + (i + 1) + " times " + (result ? "OK" : "Not OK"));
    }

    this.printFlushProgress();

    CommitLog.log.info(this.getServiceName() + " service end");
}

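flushCommitLogTimed: whether to use timed flushing (plain sleep instead of waitForRunning)
interval: flush interval, default 500 ms
flushPhysicQueueLeastPages: minimum number of pages per flush, default 4
flushPhysicQueueThoroughInterval: interval between thorough flushes, default 10 s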

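First, lastFlushTimestamp (the time of the last thorough flush) + flushPhysicQueueThoroughInterval is compared with the current time to decide whether a thorough flush is due. If it is, flushPhysicQueueLeastPages is set to 0.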

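Next, the thread waits, depending on flushCommitLogTimed:
when flushCommitLogTimed is true, it sleeps for interval (500 ms);
when flushCommitLogTimed is false, it calls waitForRunning, which blocks for at most 500 ms and can be woken earlier by the wakeup call in handleDiskFlush.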

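Finally, as in synchronous flushing, mappedFileQueue's flush method is called.
The difference is that flushPhysicQueueLeastPages decides whether this is a thorough flush or one that waits for the 4-page (about 16 KB) threshold.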
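The thorough-flush decision can be pulled out into a small policy object. The sketch assumes time is passed in explicitly so it can be tested; the class ThoroughFlushPolicy and its method are hypothetical. Because lastFlushTimestamp starts at 0, the first round is always thorough, as in the loop above.

```java
// Hypothetical extraction of the "thorough flush" decision in the async flush loop.
final class ThoroughFlushPolicy {
    private final int leastPages;          // cf. flushCommitLogLeastPages, default 4
    private final long thoroughIntervalMs; // cf. flushCommitLogThoroughInterval, default 10 s
    private long lastFlushTimestamp = 0;

    ThoroughFlushPolicy(int leastPages, long thoroughIntervalMs) {
        this.leastPages = leastPages;
        this.thoroughIntervalMs = thoroughIntervalMs;
    }

    // Returns the flushLeastPages value to pass to MappedFileQueue.flush this round.
    int leastPagesFor(long nowMillis) {
        if (nowMillis >= lastFlushTimestamp + thoroughIntervalMs) {
            lastFlushTimestamp = nowMillis;
            return 0; // thorough flush: flush whatever is dirty
        }
        return leastPages; // normal round: wait for the page threshold
    }
}
```

With the defaults, roughly one round every 10 seconds flushes everything. The other rounds flush only when 4 pages have accumulated.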

② CommitRealTimeService
This mode works together with FlushCommitLogService.

The CommitRealTimeService run method:

public void run() {
    CommitLog.log.info(this.getServiceName() + " service started");
    while (!this.isStopped()) {
        int interval = CommitLog.this.defaultMessageStore.getMessageStoreConfig().getCommitIntervalCommitLog();

        int commitDataLeastPages = CommitLog.this.defaultMessageStore.getMessageStoreConfig().getCommitCommitLogLeastPages();

        int commitDataThoroughInterval =
            CommitLog.this.defaultMessageStore.getMessageStoreConfig().getCommitCommitLogThoroughInterval();

        long begin = System.currentTimeMillis();
        if (begin >= (this.lastCommitTimestamp + commitDataThoroughInterval)) {
            this.lastCommitTimestamp = begin;
            commitDataLeastPages = 0;
        }

        try {
            boolean result = CommitLog.this.mappedFileQueue.commit(commitDataLeastPages);
            long end = System.currentTimeMillis();
            if (!result) {
                this.lastCommitTimestamp = end; // result = false means some data committed.
                //now wake up flush thread.
                flushCommitLogService.wakeup();
            }

            if (end - begin > 500) {
                log.info("Commit data to file costs {} ms", end - begin);
            }
            this.waitForRunning(interval);
        } catch (Throwable e) {
            CommitLog.log.error(this.getServiceName() + " service has exception. ", e);
        }
    }

    boolean result = false;
    for (int i = 0; i < RETRY_TIMES_OVER && !result; i++) {
        result = CommitLog.this.mappedFileQueue.commit(0);
        CommitLog.log.info(this.getServiceName() + " service shutdown, retry " + (i + 1) + " times " + (result ? "OK" : "Not OK"));
    }
    CommitLog.log.info(this.getServiceName() + " service end");
}

The logic is similar to FlushCommitLogService; only the parameters differ slightly:

interval: commit interval, default 200 ms
commitDataLeastPages: minimum number of pages per commit, default 4
commitDataThoroughInterval: interval between thorough commits, default 200 ms

The main difference from FlushCommitLogService is that it calls mappedFileQueue's commit method:

public boolean commit(final int commitLeastPages) {
    boolean result = true;
    MappedFile mappedFile = this.findMappedFileByOffset(this.committedWhere, this.committedWhere == 0);
    if (mappedFile != null) {
        int offset = mappedFile.commit(commitLeastPages);
        long where = mappedFile.getFileFromOffset() + offset;
        result = where == this.committedWhere;
        this.committedWhere = where;
    }

    return result;
}

This closely mirrors mappedFileQueue's flush method, except that the MappedFile is located by committedWhere.

It then calls MappedFile's commit method:

public int commit(final int commitLeastPages) {
    if (writeBuffer == null) {
        //no need to commit data to file channel, so just regard wrotePosition as committedPosition.
        return this.wrotePosition.get();
    }
    if (this.isAbleToCommit(commitLeastPages)) {
        if (this.hold()) {
            commit0(commitLeastPages);
            this.release();
        } else {
            log.warn("in commit, hold failed, commit offset = " + this.committedPosition.get());
        }
    }

    // All dirty data has been committed to FileChannel.
    if (writeBuffer != null && this.transientStorePool != null && this.fileSize == this.committedPosition.get()) {
        this.transientStorePool.returnBuffer(writeBuffer);
        this.writeBuffer = null;
    }

    return this.committedPosition.get();
}

Again, this is very similar to MappedFile's flush method: after isAbleToCommit checks the page threshold, commit0 is called.

The MappedFile commit0 method:

protected void commit0(final int commitLeastPages) {
    int writePos = this.wrotePosition.get();
    int lastCommittedPosition = this.committedPosition.get();

    if (writePos - this.committedPosition.get() > 0) {
        try {
            ByteBuffer byteBuffer = writeBuffer.slice();
            byteBuffer.position(lastCommittedPosition);
            byteBuffer.limit(writePos);
            this.fileChannel.position(lastCommittedPosition);
            this.fileChannel.write(byteBuffer);
            this.committedPosition.set(writePos);
        } catch (Throwable e) {
            log.error("Error occurred when commit data to FileChannel.", e);
        }
    }
}

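As explained in [Source Code Analysis of Broker Message Storage in RocketMQ], in this mode messages are first buffered in writeBuffer instead of mappedByteBuffer.
Here we can clearly see that the content of writeBuffer from lastCommittedPosition (the last commit position) to writePos (the end of the buffered messages) is written into fileChannel at the same position, but it is not yet forced to disk.
After the write to fileChannel, committedPosition is updated.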
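The commit step can be reproduced with plain JDK NIO. This hypothetical CommitSketch copies writeBuffer[committedPosition, wrotePosition) into a FileChannel at the same offset and returns the new committed position. Like the RocketMQ code, it assumes writeBuffer's own position stays at 0 (appends go through slices), so slice() covers the whole buffer. The hasRemaining loop is an addition for safety and is not in the original.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical re-creation of MappedFile.commit0. There is no force() here, so the
// bytes reach the OS page cache but are not yet guaranteed to be on disk.
final class CommitSketch {
    static int commit0(ByteBuffer writeBuffer, FileChannel fileChannel,
                       int committedPosition, int wrotePosition) throws IOException {
        if (wrotePosition - committedPosition <= 0) {
            return committedPosition; // nothing new to commit
        }
        // slice() shares the content but has its own position/limit,
        // so writeBuffer itself is left untouched.
        ByteBuffer byteBuffer = writeBuffer.slice();
        byteBuffer.position(committedPosition);
        byteBuffer.limit(wrotePosition);
        fileChannel.position(committedPosition);
        while (byteBuffer.hasRemaining()) {
            fileChannel.write(byteBuffer);
        }
        return wrotePosition; // the new committedPosition
    }
}
```

Calling it twice, first up to 4 bytes and then up to 6, appends the two ranges to the file back to back. A third call with no new data is a no-op.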

Back in the commit method, after the data has been written to fileChannel, it checks whether committedPosition has reached fileSize, i.e. whether all of writeBuffer's content has been committed.

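If everything has been committed, writeBuffer is recycled through transientStorePool's returnBuffer method.
transientStorePool is essentially a pool built on a double-ended queue (Deque) of ByteBuffers; it is created and held by DefaultMessageStore.
TransientStorePool: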

public class TransientStorePool {
    private static final InternalLogger log = InternalLoggerFactory.getLogger(LoggerName.STORE_LOGGER_NAME);

    private final int poolSize;
    private final int fileSize;
    private final Deque<ByteBuffer> availableBuffers;
    private final MessageStoreConfig storeConfig;

    public TransientStorePool(final MessageStoreConfig storeConfig) {
        this.storeConfig = storeConfig;
        this.poolSize = storeConfig.getTransientStorePoolSize();
        this.fileSize = storeConfig.getMapedFileSizeCommitLog();
        this.availableBuffers = new ConcurrentLinkedDeque<>();
    }
    ......
}

The returnBuffer method:

public void returnBuffer(ByteBuffer byteBuffer) {
    byteBuffer.position(0);
    byteBuffer.limit(fileSize);
    this.availableBuffers.offerFirst(byteBuffer);
}

This makes it clear that the byteBuffer is reset and put back into the pool for reuse.
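The pool's borrow/return cycle can be sketched with a ConcurrentLinkedDeque of direct ByteBuffers allocated up front. Only returnBuffer mirrors the code quoted above. The class name SketchBufferPool, the constructor shape, and borrowBuffer (including returning null when the pool is empty) are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical pool of reusable off-heap write buffers (cf. TransientStorePool).
final class SketchBufferPool {
    private final int fileSize;
    private final Deque<ByteBuffer> availableBuffers = new ConcurrentLinkedDeque<>();

    SketchBufferPool(int poolSize, int fileSize) {
        this.fileSize = fileSize;
        for (int i = 0; i < poolSize; i++) {
            availableBuffers.offer(ByteBuffer.allocateDirect(fileSize));
        }
    }

    // Hand out a buffer from the head, or null if the pool is exhausted (assumed behavior).
    ByteBuffer borrowBuffer() {
        return availableBuffers.pollFirst();
    }

    // Same steps as returnBuffer above: reset position/limit, push to the head.
    void returnBuffer(ByteBuffer byteBuffer) {
        byteBuffer.position(0);
        byteBuffer.limit(fileSize);
        availableBuffers.offerFirst(byteBuffer);
    }

    int availableCount() {
        return availableBuffers.size();
    }
}
```

Because this sketch both returns and borrows at the head, the most recently returned buffer is reused first.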

Back to MappedFileQueue's commit method:

public boolean commit(final int commitLeastPages) {
    boolean result = true;
    MappedFile mappedFile = this.findMappedFileByOffset(this.committedWhere, this.committedWhere == 0);
    if (mappedFile != null) {
        int offset = mappedFile.commit(commitLeastPages);
        long where = mappedFile.getFileFromOffset() + offset;
        result = where == this.committedWhere;
        this.committedWhere = where;
    }

    return result;
}

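After mappedFile's commit, comparing where with committedWhere tells whether anything was actually written to fileChannel: result is false only when data really was committed.
committedWhere is then updated and result is returned.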

Back in CommitRealTimeService's run method, result is checked after the commit.
Only when data was actually written to fileChannel does it call flushCommitLogService's wakeup method, which wakes the FlushCommitLogService flush thread.
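The inverted meaning of result (false means progress) is easy to misread, so here is a tiny model of the commit-then-wake loop. CommitThenFlush, runOnce, and the flushWakeups counter are hypothetical. The counter stands in for flushCommitLogService.wakeup().

```java
// Hypothetical sketch: the committer wakes the flusher only when
// MappedFileQueue.commit reports progress, i.e. returns false.
final class CommitThenFlush {
    private long committedWhere = 0;
    int flushWakeups = 0;

    long getCommittedWhere() {
        return committedWhere;
    }

    // Mirrors MappedFileQueue.commit: true means committedWhere did NOT move.
    boolean commit(long fileFromOffset, int committedPositionInFile) {
        long where = fileFromOffset + committedPositionInFile;
        boolean result = where == committedWhere;
        committedWhere = where;
        return result;
    }

    // One iteration of the CommitRealTimeService loop body.
    void runOnce(long fileFromOffset, int committedPositionInFile) {
        boolean result = commit(fileFromOffset, committedPositionInFile);
        if (!result) {
            flushWakeups++; // stands in for flushCommitLogService.wakeup()
        }
    }
}
```

Rounds that commit nothing leave committedWhere unchanged and do not wake the flusher.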

The only difference from the FlushCommitLogService path analyzed earlier is in MappedFile's flush method:

if (writeBuffer != null || this.fileChannel.position() != 0) {
    this.fileChannel.force(false);
} else {
    this.mappedByteBuffer.force();
}

Previously, with the in-memory byte buffer disabled, the content of mappedByteBuffer was forced to disk.
Now it is finally fileChannel's turn.

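Note the condition: writeBuffer is not null, or fileChannel's position is not 0.
writeBuffer becomes null once TransientStorePool has reclaimed it. By then commit0 has already moved fileChannel's position past 0, so the fileChannel.force branch is still taken for that file.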

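So with the in-memory byte buffer enabled, the data is buffered twice (first in writeBuffer, then in fileChannel) before it is written to disk.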
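Both paths can be exercised end to end with plain JDK NIO. The sketch below writes to a temporary file either directly through a MappedByteBuffer (path ①) or by staging in a direct buffer and calling FileChannel.write (path ②). It then forces the file using the same condition as MappedFile.flush. The class ForcePathDemo and the 4096-byte mapping are hypothetical choices for the demo.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo of the two force paths used by MappedFile.flush.
final class ForcePathDemo {
    // Returns which path was taken: "fileChannel" or "mappedByteBuffer".
    static String writeAndForce(Path file, byte[] data, boolean useWriteBuffer) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
            // Mapping READ_WRITE beyond the current size grows the file to 4096 bytes.
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            ByteBuffer writeBuffer = null;
            if (useWriteBuffer) {
                // Path 2: stage in an off-heap buffer, then commit through FileChannel.write.
                writeBuffer = ByteBuffer.allocateDirect(4096);
                writeBuffer.put(data);
                writeBuffer.flip();
                channel.position(0);
                while (writeBuffer.hasRemaining()) {
                    channel.write(writeBuffer);
                }
            } else {
                // Path 1: write straight into the memory-mapped region.
                mapped.put(data);
            }
            // Same rule as MappedFile.flush: data goes through exactly one of the two.
            if (writeBuffer != null || channel.position() != 0) {
                channel.force(false);
                return "fileChannel";
            } else {
                mapped.force();
                return "mappedByteBuffer";
            }
        }
    }
}
```

In path ②, the channel position has moved past 0 after the write. This is the same fact that keeps RocketMQ on the fileChannel branch after writeBuffer has been returned to the pool.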

This completes the whole process of Broker message persistence and disk flushing.

posted @ 2019-08-07 00:35 松饼人 Views (1185) Comments (0)
