4. Eventual-Consistency Transaction Scheme Based on Transactional MQ -- 2. Status Check of RocketMQ Transactional Messages

3.2.3 Checking the transaction status

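As the end-transaction source code shows, in the first (prepared) phase a transactional message is written to the commitlog and dispatched to the RMQ_SYS_TRANS_HALF_TOPIC queue. Once the prepared message has been sent successfully, the producer calls executeLocalTransaction to run the local transaction and obtain its state. Because the business transaction may not have committed yet when endTransaction runs afterwards, it is advisable for the local transaction method to return the unknown state (LocalTransactionState.UNKNOW), in which case ending the transaction does nothing. The periodic transaction status check then obtains a definite state (commit or rollback) from the producer.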
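
The recommendation above can be sketched from the producer's point of view. This is a minimal, self-contained model and not RocketMQ client code: the enum only mirrors the constant names of RocketMQ's LocalTransactionState, while OrderTxListener, markFinished, the status map, and the String transaction id are hypothetical stand-ins for a real listener implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Local stand-in that mirrors the constant names of RocketMQ's LocalTransactionState.
enum LocalTransactionState { COMMIT_MESSAGE, ROLLBACK_MESSAGE, UNKNOW }

// Hypothetical listener that defers the verdict to the broker's periodic check.
final class OrderTxListener {
    // Business-side record of finished local transactions: txId -> committed?
    private final Map<String, Boolean> finished = new ConcurrentHashMap<>();

    // Runs right after the half message is stored. The business transaction
    // may still be in flight, so the state is reported as UNKNOW.
    LocalTransactionState executeLocalTransaction(String txId) {
        return LocalTransactionState.UNKNOW;
    }

    // Called by the business code once its own transaction has ended.
    void markFinished(String txId, boolean committed) {
        finished.put(txId, committed);
    }

    // Runs when the broker checks back; answers from the recorded outcome.
    LocalTransactionState checkLocalTransaction(String txId) {
        Boolean committed = finished.get(txId);
        if (committed == null) {
            return LocalTransactionState.UNKNOW; // still undecided, check again later
        }
        return committed ? LocalTransactionState.COMMIT_MESSAGE
                         : LocalTransactionState.ROLLBACK_MESSAGE;
    }

    public static void main(String[] args) {
        OrderTxListener listener = new OrderTxListener();
        System.out.println(listener.executeLocalTransaction("tx-1"));
        listener.markFinished("tx-1", true);
        System.out.println(listener.checkLocalTransaction("tx-1"));
    }
}
```

The point of the design is that the definite answer always comes from a durable business record, so a check that arrives before the local transaction ends simply yields another UNKNOW.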

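RocketMQ runs a TransactionalMessageCheckService thread that periodically scans the messages in the RMQ_SYS_TRANS_HALF_TOPIC topic, picks one producer from the message's producerGroup, and asks it for the state of the local transaction. (The default check interval is 1 min.)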

The class hierarchy of TransactionalMessageCheckService, the transactional message check service, is as follows:

[Figure: image-20230531202046530]

TransactionalMessageCheckService is a field of BrokerController. It is created during initialization in brokerController.initialTransaction and is started through brokerController.start -> startProcessorByHa -> this.transactionalMessageCheckService.start().

……………… Broker side: starting the transaction status check ………………
CHECK1 TransactionalMessageCheckService.start: starting the check service
public abstract class ServiceThread implements Runnable {

    public void start() {
        log.info("Try to start service thread:{} started:{} lastThread:{}", getServiceName(), started.get(), thread);
        if (!started.compareAndSet(false, true)) {
            return;
        }
        stopped = false;
        // Run this TransactionalMessageCheckService as a service on a dedicated thread and start it
        this.thread = new Thread(this, getServiceName());
        this.thread.setDaemon(isDaemon);
        this.thread.start();
    }

Next, TransactionalMessageCheckService.run executes:

public class TransactionalMessageCheckService extends ServiceThread {

    @Override
    public void run() {
        log.info("Start transaction check service thread!");
        // Read the transaction check interval from the broker configuration
        long checkInterval = brokerController.getBrokerConfig().getTransactionCheckInterval();
        // CH1 isStopped
        while (!this.isStopped()) {
            // Keep running while the service thread has not been marked as stopped
            // CH2 waitForRunning
            this.waitForRunning(checkInterval);
        }
        log.info("End transaction check service thread!");
    }
ServiceThread
    protected volatile boolean stopped = false;
    protected final CountDownLatch2 waitPoint = new CountDownLatch2(1); // resettable latch with a count of 1
    protected volatile AtomicBoolean hasNotified = new AtomicBoolean(false);

    public boolean isStopped() {
        return stopped;
    }

    protected void waitForRunning(long interval) {
        // case 1: hasNotified is true, a notification is already pending
        if (hasNotified.compareAndSet(true, false)) {
            // CH3 onWaitEnd: invokes the transaction check TransactionalMessageService().check
            this.onWaitEnd();
            return;
        }
        // case 2: hasNotified is false, no notification is pending (or case 1 has already reset it to false)
        waitPoint.reset(); // reset the latch

        try {
            // Block the current thread until waitPoint.countDown wakes it or the check interval elapses
            waitPoint.await(interval, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            log.error("Interrupted", e);
        } finally {
            hasNotified.set(false); // always reset hasNotified to false at the end
            // CH3 onWaitEnd
            this.onWaitEnd();
        }
    }
TransactionalMessageCheckService
    @Override
    protected void onWaitEnd() {
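        // Transaction timeout: a half message is checked only once it is older than this timeout (born time + timeout < current time); otherwise it is left for a later round
        long timeout = brokerController.getBrokerConfig().getTransactionTimeOut();
        // Maximum number of checks (a message that exceeds it is discarded)
        int checkMax = brokerController.getBrokerConfig().getTransactionCheckMax();
        long begin = System.currentTimeMillis();
        log.info("Begin to check prepare message, begin time:{}", begin);
        // CHECK2 TransactionalMessageService().check: invoke the check of the transactional message service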
        this.brokerController.getTransactionalMessageService().check(timeout, checkMax, this.brokerController.getTransactionalMessageCheckListener());
        log.info("End to check prepare message, consumed time:{}", System.currentTimeMillis() - begin);
    }

Here, the check service TransactionalMessageCheckService relies on AtomicBoolean hasNotified (the notification flag) and CountDownLatch2 waitPoint (a resettable latch) to implement wait/notify. The logic is:

TransactionalMessageCheckService.run(){
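case 1: hasNotified is true; reset it to false,
  	then call the check TransactionalMessageService().check;
case 2: hasNotified is false; waitPoint.await(interval) blocks the thread for up to the check interval and also returns early if the thread is woken by wakeup,
		then likewise call the check TransactionalMessageService().check;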
}
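
The two cases can be tried out in isolation with the simplified model below. It is only a sketch of the pattern: PeriodicWorker and its runs counter are hypothetical names, and because java.util.concurrent.CountDownLatch cannot be reset, the sketch swaps in a fresh latch where ServiceThread calls reset() on its CountDownLatch2.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of the ServiceThread wait/notify pattern.
final class PeriodicWorker {
    private final AtomicBoolean hasNotified = new AtomicBoolean(false);
    private volatile CountDownLatch waitPoint = new CountDownLatch(1);
    final AtomicInteger runs = new AtomicInteger(); // how often onWaitEnd ran

    // Ask the worker to run its task without waiting for the full interval.
    void wakeup() {
        if (hasNotified.compareAndSet(false, true)) {
            waitPoint.countDown();
        }
    }

    void waitForRunning(long intervalMillis) {
        // case 1: a wakeup arrived earlier, so run immediately
        if (hasNotified.compareAndSet(true, false)) {
            onWaitEnd();
            return;
        }
        // case 2: block until the interval elapses or wakeup() is called
        waitPoint = new CountDownLatch(1); // stands in for waitPoint.reset()
        try {
            waitPoint.await(intervalMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            hasNotified.set(false);
            onWaitEnd();
        }
    }

    // Stand-in for the call to TransactionalMessageService().check
    void onWaitEnd() {
        runs.incrementAndGet();
    }

    public static void main(String[] args) {
        PeriodicWorker worker = new PeriodicWorker();
        worker.wakeup();
        worker.waitForRunning(60_000); // returns at once because of the pending wakeup
        worker.waitForRunning(20);     // returns after the 20 ms interval
        System.out.println("runs = " + worker.runs.get());
    }
}
```

Either way the task runs once per call to waitForRunning; the flag only decides whether the thread sleeps first.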
CHECK2 TransactionalMessageService().check: the body of the periodic check
TransactionalMessageServiceImpl
  
    @Override
    public void check(long transactionTimeout, int transactionCheckMax,
        AbstractTransactionalMessageCheckListener listener) {
        try { //1
            String topic = MixAll.RMQ_SYS_TRANS_HALF_TOPIC;//RMQ_SYS_TRANS_HALF_TOPIC
            // Fetch all message queues of the half topic RMQ_SYS_TRANS_HALF_TOPIC
            Set<MessageQueue> msgQueues = transactionalMessageBridge.fetchMessageQueues(topic);

            // Process the half message queues one by one
            for (MessageQueue messageQueue : msgQueues) { //2 
                long startTime = System.currentTimeMillis();
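                // For this half queue, get the matching queue of RMQ_SYS_TRANS_OP_HALF_TOPIC, which records the messages already processed
                MessageQueue opQueue = getOpQueue(messageQueue);
                // Consume offset of the half queue
                long halfOffset = transactionalMessageBridge.fetchConsumeOffset(messageQueue);
                // Consume offset of the op queue
                long opOffset = transactionalMessageBridge.fetchConsumeOffset(opQueue);

                // Offsets of op messages that are already done
                List<Long> doneOpOffset = new ArrayList<>();
                // Processed entries: half queue offset -> op queue offset
                HashMap<Long, Long> removeMap = new HashMap<>();
                // fillOpRemoveMap: starting from the current progress, pull 32 messages from the op queue so that later steps can tell whether a half message was already processed (committed or rolled back); if it was, no check request needs to be sent to the producer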
                PullResult pullResult = fillOpRemoveMap(removeMap, opQueue, opOffset, halfOffset, doneOpOffset);
                if (null == pullResult) {
                    log.error("The queue={} check msgOffset={} with opOffset={} failed, pullResult is null",
                        messageQueue, halfOffset, opOffset);
                    continue;
                }
                // single thread
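                int getMessageNullCount = 1; // number of times a null message was fetched
                long newOffset = halfOffset; // new consume offset of the half queue
                long i = halfOffset; // offset of the half message currently examined
                while (true) { //3 
                    // If checking this queue exceeds the time limit (60 seconds by default), leave the rest to the next scheduled round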
                    if (System.currentTimeMillis() - startTime > MAX_PROCESS_TIME_LIMIT) {
                        log.info("Queue={} process time reach max={}", messageQueue, MAX_PROCESS_TIME_LIMIT);
                        break;
                    }
                    // The half message at offset i was already processed; the offsets are incremented below and the while loop moves on to the next message
                    if (removeMap.containsKey(i)) {
                        log.info("Half offset {} has been committed/rolled back", i);
                        removeMap.remove(i);
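                    } else {//4    not processed yet
                        // Fetch the half message at offset i from the RMQ_SYS_TRANS_HALF_TOPIC queue
                        GetResult getResult = getHalfMsg(messageQueue, i);
                        MessageExt msgExt = getResult.getMsg();
                        // The half message is null: pull again in the next while iteration, up to the default retry limit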
                        if (msgExt == null) {
                            if (getMessageNullCount++ > MAX_RETRY_COUNT_WHEN_HALF_NULL) {
                                break;
                            }
                            if (getResult.getPullResult().getPullStatus() == PullStatus.NO_NEW_MSG) {
                                log.debug("No new msg, the miss offset={} in={}, continue check={}, pull result={}", i,
                                    messageQueue, getMessageNullCount, getResult.getPullResult());
                                break;
                            } else {
                                log.info("Illegal offset, the miss offset={} in={}, continue check={}, pull result={}",
                                    i, messageQueue, getMessageNullCount, getResult.getPullResult());
                                i = getResult.getPullResult().getNextBeginOffset();
                                newOffset = i;
                                continue;
                            }
                        }

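                        // The half message is not null
                        /** Decide whether the current message must be discarded or skipped:
                        discard: the message exceeded the maximum number of checks and is dropped; the count grows by 1 per check, 15 at most by default;
                        skip: the transactional message is older than the file retention time (72 hours by default) and is skipped;
                        both cases move past the current message, i.e. the offset is incremented by 1 */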
                        if (needDiscard(msgExt, transactionCheckMax) || needSkip(msgExt)) {
                            listener.resolveDiscardMsg(msgExt);
                            newOffset = i + 1;
                            i++;
                            continue;
                        }
                        if (msgExt.getStoreTimestamp() >= startTime) {
                            log.debug("Fresh stored. the miss offset={}, check it later, store={}", i,
                                new Date(msgExt.getStoreTimestamp()));
                            break;
                        }

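                        // Age of the message, measured from its born timestamp
                        long valueOfCurrentMinusBorn = System.currentTimeMillis() - msgExt.getBornTimestamp();
                        // checkImmunityTime: the immunity window during which a transactional message is not checked. Rationale: a half message should not be checked right after it is sent; the check of the local transaction starts only after this period
                        // transactionTimeout: the transaction timeout. If the time of the last message pulled from the op queue minus the check start time exceeds transactionTimeout, a check request is sent regardless of checkImmunityTime
                        long checkImmunityTime = transactionTimeout;
                        // Per-message immunity time set by the user, in seconds (null by default); when present it replaces transactionTimeout as checkImmunityTime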
                        String checkImmunityTimeStr = msgExt.getUserProperty(MessageConst.PROPERTY_CHECK_IMMUNITY_TIME_IN_SECONDS);
                        if (null != checkImmunityTimeStr) {
                            checkImmunityTime = getImmunityTime(checkImmunityTimeStr, transactionTimeout);
                            if (valueOfCurrentMinusBorn < checkImmunityTime) {
                                if (checkPrepareQueueOffset(removeMap, doneOpOffset, msgExt)) {
                                    newOffset = i + 1;
                                    i++;
                                    continue;
                                }
                            }
                        } else {
                            // If the message age is below the immunity time, stop processing this queue (break) and check again in a later round
                            if ((0 <= valueOfCurrentMinusBorn) && (valueOfCurrentMinusBorn < checkImmunityTime)) {
                                log.debug("New arrived, the miss offset={}, check it later checkImmunity={}, born={}", i,
                                    checkImmunityTime, new Date(msgExt.getBornTimestamp()));
                                break;
                            }
                        }
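                        // The batch of up to 32 op messages pulled above
                        List<MessageExt> opMsg = pullResult.getMsgFoundList();
                        // Decide whether the current message needs a check; two main cases:
                        // 1 no op messages were pulled (null) and the message age exceeds checkImmunityTime;
                        // 2 op messages were pulled and the born time of the last one minus the check start time exceeds transactionTimeout; a check is then made regardless of checkImmunityTime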
                        boolean isNeedCheck = (opMsg == null && valueOfCurrentMinusBorn > checkImmunityTime)
                            || (opMsg != null && (opMsg.get(opMsg.size() - 1).getBornTimestamp() - startTime > transactionTimeout))
                            || (valueOfCurrentMinusBorn <= -1);

                        if (isNeedCheck) {
                            // CHECK3 putBackHalfMsgQueue: a message that needs a check is first written to RMQ_SYS_TRANS_HALF_TOPIC again
                            if (!putBackHalfMsgQueue(msgExt, i)) {
                                continue;
                            }
                            // CHECK4 resolveHalfMsg: send the check request asynchronously from a thread pool
                            listener.resolveHalfMsg(msgExt);
                        } else {
                            // If it cannot yet be decided whether a check is needed, pull the next 32 op messages and evaluate again
                            pullResult = fillOpRemoveMap(removeMap, opQueue, pullResult.getNextBeginOffset(), halfOffset, doneOpOffset);
                            log.info("The miss offset:{} in messageQueue:{} need to get more opMsg, result is:{}", i,
                                messageQueue, pullResult);
                            continue;
                        }
                    } //4 
                    newOffset = i + 1;
                    i++;
                } //3 
                if (newOffset != halfOffset) {
                    // Update the consume offset of the half queue
                    transactionalMessageBridge.updateConsumeOffset(messageQueue, newOffset);
                }
                long newOpOffset = calculateOpOffset(doneOpOffset, opOffset);
                if (newOpOffset != opOffset) {
                    // Update the consume offset of the op queue
                    transactionalMessageBridge.updateConsumeOffset(opQueue, newOpOffset);
                }
            } //2
        } catch (Exception e) {   //1
            e.printStackTrace();
            log.error("Check error", e);
        }

    }

The code above is the periodic check performed by TransactionalMessageServiceImpl. There is a lot of it, so the following goes through it step by step:

step1 Fetch all queues of the RMQ_SYS_TRANS_HALF_TOPIC topic
            String topic = MixAll.RMQ_SYS_TRANS_HALF_TOPIC;//RMQ_SYS_TRANS_HALF_TOPIC
            // Fetch all message queues of the half topic RMQ_SYS_TRANS_HALF_TOPIC
            Set<MessageQueue> msgQueues = transactionalMessageBridge.fetchMessageQueues(topic);
forstep1 Loop over the queues
            // Process the half message queues one by one
            for (MessageQueue messageQueue : msgQueues) { //2 
                long startTime = System.currentTimeMillis();
                // For this half queue, get the matching queue of RMQ_SYS_TRANS_OP_HALF_TOPIC, which records the messages already processed
                MessageQueue opQueue = getOpQueue(messageQueue);
                // Consume offset of the half queue
                long halfOffset = transactionalMessageBridge.fetchConsumeOffset(messageQueue);
                // Consume offset of the op queue
                long opOffset = transactionalMessageBridge.fetchConsumeOffset(opQueue);

For each RMQ_SYS_TRANS_HALF_TOPIC queue, the matching op queue is obtained, along with the consume offsets of both queues.

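forstep2 Pull 32 messages from the op queue
                // Offsets of op messages that are already done
                List<Long> doneOpOffset = new ArrayList<>();
                // Processed entries: half queue offset -> op queue offset
                HashMap<Long, Long> removeMap = new HashMap<>();
                // fillOpRemoveMap: starting from the current progress, pull 32 messages from the op queue so that later steps can tell whether a half message was already processed (committed or rolled back); if it was, no check request needs to be sent to the producer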
                PullResult pullResult = fillOpRemoveMap(removeMap, opQueue, opOffset, halfOffset, doneOpOffset);
                if (null == pullResult) {
                    log.error("The queue={} check msgOffset={} with opOffset={} failed, pullResult is null",
                        messageQueue, halfOffset, opOffset);
                    continue;
                }

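Up to 32 already-processed (committed or rolled back) messages are pulled from the op queue so that each half message can be tested for prior processing, which cuts down on unnecessary check requests.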
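
How one pulled batch might be sorted into removeMap and doneOpOffset can be sketched as follows. The body of fillOpRemoveMap is not shown in this article, so the splitting rule used here (op records that point below the current half offset are simply marked done, the rest go into removeMap) is an assumption, and OpRecord and fill are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OpRemoveMapSketch {
    // Hypothetical op record: its own offset in the op queue plus the
    // half queue offset it settles (commit or rollback).
    static final class OpRecord {
        final long opOffset;
        final long halfOffset;

        OpRecord(long opOffset, long halfOffset) {
            this.opOffset = opOffset;
            this.halfOffset = halfOffset;
        }
    }

    // Sorts one pulled batch of op records into the two collections.
    static void fill(List<OpRecord> batch, long currentHalfOffset,
                     Map<Long, Long> removeMap, List<Long> doneOpOffset) {
        for (OpRecord op : batch) {
            if (op.halfOffset < currentHalfOffset) {
                // The scan already passed that half message; the op record is spent.
                doneOpOffset.add(op.opOffset);
            } else {
                // The half message is still ahead; remember that it is settled.
                removeMap.put(op.halfOffset, op.opOffset);
            }
        }
    }

    public static void main(String[] args) {
        List<OpRecord> batch = new ArrayList<>();
        batch.add(new OpRecord(0, 3)); // settles half offset 3, already passed
        batch.add(new OpRecord(1, 7)); // settles half offset 7, still ahead
        Map<Long, Long> removeMap = new HashMap<>();
        List<Long> done = new ArrayList<>();
        fill(batch, 5, removeMap, done);
        System.out.println("removeMap=" + removeMap + " doneOpOffset=" + done);
    }
}
```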

whilestep1 Message not yet processed and null
                while (true) { //3 
                    // If checking this queue exceeds the time limit (60 seconds by default), leave the rest to the next scheduled round
                    if (System.currentTimeMillis() - startTime > MAX_PROCESS_TIME_LIMIT) {
                        log.info("Queue={} process time reach max={}", messageQueue, MAX_PROCESS_TIME_LIMIT);
                        break;
                    }
                    // The half message at offset i was already processed; the offsets are incremented below and the while loop moves on to the next message
                    if (removeMap.containsKey(i)) {
                        log.info("Half offset {} has been committed/rolled back", i);
                        removeMap.remove(i);
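                    } else {//4    not processed yet
                        // Fetch the half message at offset i from the RMQ_SYS_TRANS_HALF_TOPIC queue
                        GetResult getResult = getHalfMsg(messageQueue, i);
                        MessageExt msgExt = getResult.getMsg();
                        // The half message is null: pull again in the next while iteration, up to the default retry limit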
                        if (msgExt == null) {
                            if (getMessageNullCount++ > MAX_RETRY_COUNT_WHEN_HALF_NULL) {
                                break;
                            }
                            if (getResult.getPullResult().getPullStatus() == PullStatus.NO_NEW_MSG) {
                                log.debug("No new msg, the miss offset={} in={}, continue check={}, pull result={}", i,
                                    messageQueue, getMessageNullCount, getResult.getPullResult());
                                break;
                            } else {
                                log.info("Illegal offset, the miss offset={} in={}, continue check={}, pull result={}",
                                    i, messageQueue, getMessageNullCount, getResult.getPullResult());
                                i = getResult.getPullResult().getNextBeginOffset();
                                newOffset = i;
                                continue;
                            }
                        }

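If the message has already been processed, nothing more is done with it: its entry is removed from removeMap, the offset is advanced, and the loop moves on to the next message.

If it has not been processed and the half message at the current position is null, the null-retry counter is incremented. The loop ends once the retry limit is exceeded or the pull status is NO_NEW_MSG; otherwise it jumps to the next begin offset reported by the pull result and continues from there.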

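whilestep2 Message not yet processed and not null
                        // The half message is not null
                        /** Decide whether the current message must be discarded or skipped:
                        discard: the message exceeded the maximum number of checks and is dropped; the count grows by 1 per check, 15 at most by default;
                        skip: the transactional message is older than the file retention time (72 hours by default) and is skipped;
                        both cases move past the current message, i.e. the offset is incremented by 1 */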
                        if (needDiscard(msgExt, transactionCheckMax) || needSkip(msgExt)) {
                            listener.resolveDiscardMsg(msgExt);
                            newOffset = i + 1;
                            i++;
                            continue;
                        }
                        if (msgExt.getStoreTimestamp() >= startTime) {
                            log.debug("Fresh stored. the miss offset={}, check it later, store={}", i,
                                new Date(msgExt.getStoreTimestamp()));
                            break;
                        }

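                        // Age of the message, measured from its born timestamp
                        long valueOfCurrentMinusBorn = System.currentTimeMillis() - msgExt.getBornTimestamp();
                        // checkImmunityTime: the immunity window during which a transactional message is not checked. Rationale: a half message should not be checked right after it is sent; the check of the local transaction starts only after this period
                        // transactionTimeout: the transaction timeout. If the time of the last message pulled from the op queue minus the check start time exceeds transactionTimeout, a check request is sent regardless of checkImmunityTime
                        long checkImmunityTime = transactionTimeout;
                        // Per-message immunity time set by the user, in seconds (null by default); when present it replaces transactionTimeout as checkImmunityTime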
                        String checkImmunityTimeStr = msgExt.getUserProperty(MessageConst.PROPERTY_CHECK_IMMUNITY_TIME_IN_SECONDS);
                        if (null != checkImmunityTimeStr) {
                            checkImmunityTime = getImmunityTime(checkImmunityTimeStr, transactionTimeout);
                            if (valueOfCurrentMinusBorn < checkImmunityTime) {
                                if (checkPrepareQueueOffset(removeMap, doneOpOffset, msgExt)) {
                                    newOffset = i + 1;
                                    i++;
                                    continue;
                                }
                            }
                        } else {
                            // If the message age is below the immunity time, stop processing this queue (break) and check again in a later round
                            if ((0 <= valueOfCurrentMinusBorn) && (valueOfCurrentMinusBorn < checkImmunityTime)) {
                                log.debug("New arrived, the miss offset={}, check it later checkImmunity={}, born={}", i,
                                    checkImmunityTime, new Date(msgExt.getBornTimestamp()));
                                break;
                            }
                        }

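Decide whether the current message needs to be discarded or skipped:

discard: the message has exceeded the maximum number of checks and is dropped; the count grows by 1 per check, 15 at most by default;
skip: the transactional message is older than the file retention time (72 hours by default) and is skipped;
both cases move past the current message, i.e. the offset is incremented by 1.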
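
The discard and skip rules can be written down as two small predicates. The bodies of needDiscard and needSkip are not shown in this article, so the sketch below is an assumption about their shape: HalfMsg is a hypothetical reduced view of a message, and the exact boundary of the discard test (reaching the maximum versus exceeding it) is a guess.

```java
final class DiscardOrSkipSketch {
    // Hypothetical reduced view of a half message.
    static final class HalfMsg {
        int checkTimes;           // how many checks were already sent
        final long bornTimestamp; // when the producer created the message

        HalfMsg(int checkTimes, long bornTimestamp) {
            this.checkTimes = checkTimes;
            this.bornTimestamp = bornTimestamp;
        }
    }

    // Discard once the check budget is used up; otherwise count this round.
    static boolean needDiscard(HalfMsg msg, int transactionCheckMax) {
        if (msg.checkTimes >= transactionCheckMax) {
            return true;
        }
        msg.checkTimes++;
        return false;
    }

    // Skip when the message is older than the file retention window.
    static boolean needSkip(HalfMsg msg, long nowMillis, int fileReservedHours) {
        long age = nowMillis - msg.bornTimestamp;
        return age > fileReservedHours * 3600L * 1000L;
    }

    public static void main(String[] args) {
        HalfMsg msg = new HalfMsg(14, 0L);
        System.out.println("discard on check 15: " + needDiscard(msg, 15));
        System.out.println("discard on check 16: " + needDiscard(msg, 15));
        System.out.println("skip after 73 hours: " + needSkip(msg, 73L * 3600L * 1000L, 72));
    }
}
```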

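Next come the settings used to decide whether the current message needs a check:

1 checkImmunityTime: the immunity window during which a transactional message is not checked. Rationale: a half message should not be checked right after it is sent; the check of the local transaction starts only after this period.
2 transactionTimeout: the transaction timeout. If the time of the last message pulled from the op queue minus the check start time exceeds transactionTimeout, a check request is sent regardless of checkImmunityTime.
3 PROPERTY_CHECK_IMMUNITY_TIME_IN_SECONDS: the per-message immunity time set by the user, in seconds (null by default); when present it replaces transactionTimeout as checkImmunityTime.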

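These three parameters determine whether a transactional message is checked:

A check is required in either of two cases: no op messages were pulled and the message age valueOfCurrentMinusBorn > checkImmunityTime, or the born time of the last processed message pulled from the op queue minus the check start time > transactionTimeout (regardless of checkImmunityTime).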
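
How the immunity window is resolved from these parameters can be sketched as below. The seconds-to-milliseconds conversion follows from the property name; the fallback to transactionTimeout for an unparseable value is an assumption, since getImmunityTime itself is not shown in this article, and resolve and isImmune are hypothetical names.

```java
final class ImmunityTimeSketch {
    // Resolves the immunity window in milliseconds.
    // userSeconds is the raw per-message property value and may be null.
    static long resolve(String userSeconds, long transactionTimeoutMillis) {
        if (userSeconds == null) {
            return transactionTimeoutMillis;
        }
        try {
            return Long.parseLong(userSeconds.trim()) * 1000L;
        } catch (NumberFormatException e) {
            // Unparseable value: fall back to the broker-wide timeout (assumed).
            return transactionTimeoutMillis;
        }
    }

    // True while the message is too young to be checked.
    static boolean isImmune(long ageMillis, long immunityMillis) {
        return ageMillis >= 0 && ageMillis < immunityMillis;
    }

    public static void main(String[] args) {
        long brokerTimeout = 6_000L;
        long window = resolve("10", brokerTimeout);
        System.out.println("window = " + window + " ms");
        System.out.println("immune at 5 s: " + isImmune(5_000L, window));
        System.out.println("immune at 12 s: " + isImmune(12_000L, window));
    }
}
```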

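whilestep3 Decide whether a check is needed (isNeedCheck)
                        // The batch of up to 32 op messages pulled above
                        List<MessageExt> opMsg = pullResult.getMsgFoundList();
                        // Decide whether the current message needs a check; two main cases:
                        // 1 no op messages were pulled (null) and the message age exceeds checkImmunityTime;
                        // 2 op messages were pulled and the born time of the last one minus the check start time exceeds transactionTimeout; a check is then made regardless of checkImmunityTime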
                        boolean isNeedCheck = (opMsg == null && valueOfCurrentMinusBorn > checkImmunityTime)
                            || (opMsg != null && (opMsg.get(opMsg.size() - 1).getBornTimestamp() - startTime > transactionTimeout))
                            || (valueOfCurrentMinusBorn <= -1);

                        if (isNeedCheck) {
                            // CHECK3 putBackHalfMsgQueue: a message that needs a check is first written to RMQ_SYS_TRANS_HALF_TOPIC again
                            if (!putBackHalfMsgQueue(msgExt, i)) {
                                continue;
                            }
                            // CHECK4 resolveHalfMsg: send the check request asynchronously from a thread pool
                            listener.resolveHalfMsg(msgExt);
                        } else {
                            // If it cannot yet be decided whether a check is needed, pull the next 32 op messages and evaluate again
                            pullResult = fillOpRemoveMap(removeMap, opQueue, pullResult.getNextBeginOffset(), halfOffset, doneOpOffset);
                            log.info("The miss offset:{} in messageQueue:{} need to get more opMsg, result is:{}", i,
                                messageQueue, pullResult);
                            continue;
                        }
                    } //4 
                    newOffset = i + 1;
                    i++;
                } //3 

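The parameters above decide whether a check is needed:

// Decide whether the current message needs a check; two main cases:
//   1 no op messages were pulled (null) and the message age exceeds checkImmunityTime;
//   2 op messages were pulled and the born time of the last one minus the check start time exceeds transactionTimeout; a check is then made regardless of checkImmunityTime

If a check is needed, the following two operations run:

CHECK3 putBackHalfMsgQueue: write the half message to RMQ_SYS_TRANS_HALF_TOPIC again
CHECK4 resolveHalfMsg: send the check request asynchronously from a thread pool

If it cannot yet be decided whether a check is needed, the next 32 messages are pulled from the op queue, and the next iteration of the while loop evaluates the current message again.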
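
The isNeedCheck expression is easier to read as a pure function with named sub-conditions. The sketch below restates the expression from the source; NeedCheckSketch and its parameter names are hypothetical, and the reading of the third clause as a born timestamp that lies in the future (for example because of clock skew between hosts) is an interpretation.

```java
final class NeedCheckSketch {
    // lastOpBornTimestamp is null when no op messages were pulled.
    static boolean isNeedCheck(Long lastOpBornTimestamp, long startTime,
                               long ageMillis, long checkImmunityTime,
                               long transactionTimeout) {
        // Case 1: nothing in the op batch and the message outlived its immunity window.
        boolean noOpAndOldEnough =
            lastOpBornTimestamp == null && ageMillis > checkImmunityTime;
        // Case 2: the op batch already reaches past the transaction timeout.
        boolean opBatchPastTimeout =
            lastOpBornTimestamp != null
                && lastOpBornTimestamp - startTime > transactionTimeout;
        // Case 3: the born timestamp lies in the future.
        boolean bornInFuture = ageMillis <= -1;
        return noOpAndOldEnough || opBatchPastTimeout || bornInFuture;
    }

    public static void main(String[] args) {
        long start = 1_000L;
        System.out.println(isNeedCheck(null, start, 7_000L, 6_000L, 6_000L));
        System.out.println(isNeedCheck(2_000L, start, 7_000L, 6_000L, 6_000L));
    }
}
```

When the function returns false, the source pulls the next op batch and evaluates the same message again, so a false result means "not decidable yet" rather than "never check".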

whilestep4 Update the half and op queue offsets
                if (newOffset != halfOffset) {
                    // Update the consume offset of the half queue
                    transactionalMessageBridge.updateConsumeOffset(messageQueue, newOffset);
                }
                long newOpOffset = calculateOpOffset(doneOpOffset, opOffset);
                if (newOpOffset != opOffset) {
                    // Update the consume offset of the op queue
                    transactionalMessageBridge.updateConsumeOffset(opQueue, newOpOffset);
                }
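
The op offset can only move forward over a gap-free run of finished op messages. The sketch below shows one way calculateOpOffset could do that; its body is not shown in this article, so the logic is an assumption and OpOffsetSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class OpOffsetSketch {
    // Advances oldOffset across the consecutive run of done offsets that starts at it.
    static long calculateOpOffset(List<Long> doneOpOffset, long oldOffset) {
        List<Long> sorted = new ArrayList<>(doneOpOffset);
        Collections.sort(sorted);
        long newOffset = oldOffset;
        for (long done : sorted) {
            if (done == newOffset) {
                newOffset++;
            } else {
                break; // a gap: everything after it has to wait
            }
        }
        return newOffset;
    }

    public static void main(String[] args) {
        // 10, 11 and 12 are done; 13 is missing, so 14 cannot be passed yet.
        long next = calculateOpOffset(Arrays.asList(12L, 10L, 11L, 14L), 10L);
        System.out.println("new op offset = " + next);
    }
}
```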

The two check-related methods in whilestep3 deserve a closer look:

CHECK3 putBackHalfMsgQueue: writing the message to be checked into the half queue again, and why
TransactionalMessageServiceImpl

   private boolean putBackHalfMsgQueue(MessageExt msgExt, long offset) {
        // Write the message to be checked into the RMQ_SYS_TRANS_HALF_TOPIC queue again
        PutMessageResult putMessageResult = putBackToHalfQueueReturnResult(msgExt);
        if (putMessageResult != null
            && putMessageResult.getPutMessageStatus() == PutMessageStatus.PUT_OK) {
            msgExt.setQueueOffset(
                putMessageResult.getAppendMessageResult().getLogicsOffset());
            msgExt.setCommitLogOffset(
                putMessageResult.getAppendMessageResult().getWroteOffset());
            msgExt.setMsgId(putMessageResult.getAppendMessageResult().getMsgId());
            log.debug(
                "Send check message, the offset={} restored in queueOffset={} "
                    + "commitLogOffset={} "
                    + "newMsgId={} realMsgId={} topic={}",
                offset, msgExt.getQueueOffset(), msgExt.getCommitLogOffset(), msgExt.getMsgId(),
                msgExt.getUserProperty(MessageConst.PROPERTY_UNIQ_CLIENT_MESSAGE_ID_KEYIDX),
                msgExt.getTopic());
            return true;
        } else {
            log.error(
                "PutBackToHalfQueueReturnResult write failed, topic: {}, queueId: {}, "
                    + "msgId: {}",
                msgExt.getTopic(), msgExt.getQueueId(), msgExt.getMsgId());
            return false;
        }
    }

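A message that needs a check is thus written to the RMQ_SYS_TRANS_HALF_TOPIC queue again, and the new copy is given the latest offsets. The purpose and benefits are as follows:
1 Once a message is found to need a check, the check request is sent asynchronously from a thread pool (listener.resolveHalfMsg). Because the call is asynchronous, the broker cannot know whether the check succeeded. It therefore appends the message to the commitlog again under RMQ_SYS_TRANS_HALF_TOPIC and keeps advancing the progress of the half and op queues. When the scan of the half queue later reaches the new copy, the op queue tells whether the message has been processed in the meantime;

2 The second reason is that RocketMQ stores messages sequentially, which is efficient. Going back to update half queue entries that were already scanned, once the asynchronous check returns a result, would hurt performance.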
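
The append-and-advance idea can be modelled with an in-memory list. This is an illustration only: PutBackSketch, HalfMsg, and putBackAndAdvance are hypothetical names, the list stands in for the half queue, and carrying an incremented check counter in the new copy is an assumption about how the check count survives the put-back.

```java
import java.util.ArrayList;
import java.util.List;

final class PutBackSketch {
    // Hypothetical reduced half message.
    static final class HalfMsg {
        final String txId;
        final int checkTimes;

        HalfMsg(String txId, int checkTimes) {
            this.txId = txId;
            this.checkTimes = checkTimes;
        }
    }

    final List<HalfMsg> halfQueue = new ArrayList<>(); // append-only, like the commitlog
    long consumeOffset = 0;                            // scan progress in the half queue

    // Handles the message at consumeOffset that needs a check: append a copy
    // at the tail, move the scan forward, and return the offset of the copy.
    long putBackAndAdvance() {
        HalfMsg old = halfQueue.get((int) consumeOffset);
        halfQueue.add(new HalfMsg(old.txId, old.checkTimes + 1));
        consumeOffset++;
        return halfQueue.size() - 1L;
    }

    public static void main(String[] args) {
        PutBackSketch queue = new PutBackSketch();
        queue.halfQueue.add(new HalfMsg("tx-1", 0));
        long copyOffset = queue.putBackAndAdvance();
        System.out.println("copy stored at " + copyOffset
            + ", scan resumes at " + queue.consumeOffset);
    }
}
```

Nothing already written is modified; the undecided message simply reappears further down the queue, where a later scan meets it again.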

CHECK4 AbstractTransactionalMessageCheckListener.resolveHalfMsg: the thread pool sends the check request asynchronously
AbstractTransactionalMessageCheckListener

    public void resolveHalfMsg(final MessageExt msgExt) {
        executorService.execute(new Runnable() {
            @Override
            public void run() {
                try {
                    // Send the check request
                    sendCheckMessage(msgExt);
                } catch (Exception e) {
                    LOGGER.error("Send check message error!", e);
                }
            }
        });
    }

    public void sendCheckMessage(MessageExt msgExt) throws Exception {
        // Build the request header for checking the transaction state
        CheckTransactionStateRequestHeader checkTransactionStateRequestHeader = new CheckTransactionStateRequestHeader();
        checkTransactionStateRequestHeader.setCommitLogOffset(msgExt.getCommitLogOffset());
        checkTransactionStateRequestHeader.setOffsetMsgId(msgExt.getMsgId());
        checkTransactionStateRequestHeader.setMsgId(msgExt.getUserProperty(MessageConst.PROPERTY_UNIQ_CLIENT_MESSAGE_ID_KEYIDX));
        checkTransactionStateRequestHeader.setTransactionId(checkTransactionStateRequestHeader.getMsgId());
        checkTransactionStateRequestHeader.setTranStateTableOffset(msgExt.getQueueOffset());
        msgExt.setTopic(msgExt.getUserProperty(MessageConst.PROPERTY_REAL_TOPIC));
        msgExt.setQueueId(Integer.parseInt(msgExt.getUserProperty(MessageConst.PROPERTY_REAL_QUEUE_ID)));
        msgExt.setStoreSize(0);
        // Use the message's producerGroup to pick one producer to send the request to
        String groupId = msgExt.getProperty(MessageConst.PROPERTY_PRODUCER_GROUP);
        Channel channel = brokerController.getProducerManager().getAvaliableChannel(groupId);
        if (channel != null) {
            // Send the check request
            brokerController.getBroker2Client().checkProducerTransactionState(groupId, channel, checkTransactionStateRequestHeader, msgExt);
        } else {
            LOGGER.warn("Check transaction failed, channel is null. groupId={}", groupId);
        }
    }

Here a producer is selected from the producer group by its groupId, and the check request is sent to it.

The request is then sent through Broker2Client:
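
One plausible way to pick a producer from a group is round-robin over its connections, skipping the ones that are not usable. The selection policy inside getAvaliableChannel is not shown in this article, so the sketch below is an assumption, and ProducerPickSketch, Conn, and pick are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class ProducerPickSketch {
    // Hypothetical stand-in for a producer connection (a Netty Channel in the broker).
    static final class Conn {
        final String clientId;
        final boolean active;

        Conn(String clientId, boolean active) {
            this.clientId = clientId;
            this.active = active;
        }
    }

    private final AtomicInteger counter = new AtomicInteger();

    // Returns the next active connection in round-robin order, or null if none is usable.
    Conn pick(List<Conn> group) {
        if (group == null || group.isEmpty()) {
            return null;
        }
        int start = Math.floorMod(counter.getAndIncrement(), group.size());
        for (int i = 0; i < group.size(); i++) {
            Conn candidate = group.get((start + i) % group.size());
            if (candidate.active) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ProducerPickSketch picker = new ProducerPickSketch();
        List<Conn> group = Arrays.asList(
            new Conn("p-a", true), new Conn("p-b", false), new Conn("p-c", true));
        for (int i = 0; i < 4; i++) {
            System.out.println(picker.pick(group).clientId);
        }
    }
}
```

Any producer of the group may receive the check, which is why every instance in a producer group needs access to the same record of local transaction outcomes.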

public class Broker2Client {
    private static final InternalLogger log = InternalLoggerFactory.getLogger(LoggerName.BROKER_LOGGER_NAME);
    private final BrokerController brokerController;

    public Broker2Client(BrokerController brokerController) {
        this.brokerController = brokerController;
    }

    public void checkProducerTransactionState(
        final String group,
        final Channel channel,
        final CheckTransactionStateRequestHeader requestHeader,
        final MessageExt messageExt) throws Exception {
        RemotingCommand request =
            RemotingCommand.createRequestCommand(RequestCode.CHECK_TRANSACTION_STATE, requestHeader);
        request.setBody(MessageDecoder.encode(messageExt, false));
        try {
            // Send the check request (one-way)
            this.brokerController.getRemotingServer().invokeOneway(channel, request, 10);
        } catch (Exception e) {
            log.error("Check transaction failed because invoke producer exception. group={}, msgId={}", group, messageExt.getMsgId(), e.getMessage());
        }
    }
……………… Producer side: responding to the transaction status check ………………

After receiving the request, the producer handles the check.

[Figure: image-20230601142714826]

CHECK5 ClientRemotingProcessor.processRequest

The request is handled by the ClientRemotingProcessor processor:

public class ClientRemotingProcessor implements NettyRequestProcessor {
    private final InternalLogger log = ClientLogger.getLog();
    private final MQClientInstance mqClientFactory;


    @Override
    public RemotingCommand processRequest(ChannelHandlerContext ctx,
        RemotingCommand request) throws RemotingCommandException {
        switch (request.getCode()) {
            //check
            case RequestCode.CHECK_TRANSACTION_STATE:
                return this.checkTransactionState(ctx, request);
            case RequestCode.NOTIFY_CONSUMER_IDS_CHANGED:
                return this.notifyConsumerIdsChanged(ctx, request);
            case RequestCode.RESET_CONSUMER_CLIENT_OFFSET:
                return this.resetOffset(ctx, request);
            case RequestCode.GET_CONSUMER_STATUS_FROM_CLIENT:
                return this.getConsumeStatus(ctx, request);

            case RequestCode.GET_CONSUMER_RUNNING_INFO:
                return this.getConsumerRunningInfo(ctx, request);

            case RequestCode.CONSUME_MESSAGE_DIRECTLY:
                return this.consumeMessageDirectly(ctx, request);
            default:
                break;
        }
        return null;
    }
  
      public RemotingCommand checkTransactionState(ChannelHandlerContext ctx,
        RemotingCommand request) throws RemotingCommandException {
        final CheckTransactionStateRequestHeader requestHeader =
            (CheckTransactionStateRequestHeader) request.decodeCommandCustomHeader(CheckTransactionStateRequestHeader.class);
        final ByteBuffer byteBuffer = ByteBuffer.wrap(request.getBody());
        // Decode the message
        final MessageExt messageExt = MessageDecoder.decode(byteBuffer);
        if (messageExt != null) {
            String transactionId = messageExt.getProperty(MessageConst.PROPERTY_UNIQ_CLIENT_MESSAGE_ID_KEYIDX);
            if (null != transactionId && !"".equals(transactionId)) {
                messageExt.setTransactionId(transactionId);
            }
            // Get the name of the producer group
            final String group = messageExt.getProperty(MessageConst.PROPERTY_PRODUCER_GROUP);
            if (group != null) {
                // Get a producer from the producer group
                MQProducerInner producer = this.mqClientFactory.selectProducer(group);
                if (producer != null) {
                    final String addr = RemotingHelper.parseChannelRemoteAddr(ctx.channel());
//CHECK6 producer.checkTransactionState
                    producer.checkTransactionState(addr, messageExt, requestHeader);
                } else {
                    log.debug("checkTransactionState, pick producer by group[{}] failed", group);
                }
            } else {
                log.warn("checkTransactionState, pick producer group failed");
            }
        } else {
            log.warn("checkTransactionState, decode message failed");
        }

        return null;
    }
CHECK6 producer.checkTransactionState
 @Override
    public void checkTransactionState(final String addr, final MessageExt msg,
        final CheckTransactionStateRequestHeader header) {
        Runnable request = new Runnable() {
            private final String brokerAddr = addr;
            private final MessageExt message = msg;
            private final CheckTransactionStateRequestHeader checkRequestHeader = header;
            private final String group = DefaultMQProducerImpl.this.defaultMQProducer.getProducerGroup();

            @Override
            public void run() {
                TransactionCheckListener transactionCheckListener = DefaultMQProducerImpl.this.checkListener();
                TransactionListener transactionListener = getCheckListener();
                if (transactionCheckListener != null || transactionListener != null) {
                    LocalTransactionState localTransactionState = LocalTransactionState.UNKNOW;
                    Throwable exception = null;
                    try {
                        if (transactionCheckListener != null) {
                            // Run the local transaction status check transactionCheckListener.checkLocalTransactionState(message)
                            localTransactionState = transactionCheckListener.checkLocalTransactionState(message);
                        } else if (transactionListener != null) {
                            log.debug("Used new check API in transaction message");
                            localTransactionState = transactionListener.checkLocalTransaction(message);
                        } else {
                            log.warn("CheckTransactionState, pick transactionListener by group[{}] failed", group);
                        }
                    } catch (Throwable e) {
                        log.error("Broker call checkTransactionState, but checkLocalTransactionState exception", e);
                        exception = e;
                    }

                    this.processTransactionState(
                        localTransactionState,
                        group,
                        exception);
                } else {
                    log.warn("CheckTransactionState, pick transactionCheckListener by group[{}] failed", group);
                }
            }

            private void processTransactionState(
                final LocalTransactionState localTransactionState,
                final String producerGroup,
                final Throwable exception) {
                final EndTransactionRequestHeader thisHeader = new EndTransactionRequestHeader();
                thisHeader.setCommitLogOffset(checkRequestHeader.getCommitLogOffset());
                thisHeader.setProducerGroup(producerGroup);
                thisHeader.setTranStateTableOffset(checkRequestHeader.getTranStateTableOffset());
                thisHeader.setFromTransactionCheck(true);

                String uniqueKey = message.getProperties().get(MessageConst.PROPERTY_UNIQ_CLIENT_MESSAGE_ID_KEYIDX);
                if (uniqueKey == null) {
                    uniqueKey = message.getMsgId();
                }
                thisHeader.setMsgId(uniqueKey);
                thisHeader.setTransactionId(checkRequestHeader.getTransactionId());
                switch (localTransactionState) {
                    case COMMIT_MESSAGE:
                        thisHeader.setCommitOrRollback(MessageSysFlag.TRANSACTION_COMMIT_TYPE);
                        break;
                    case ROLLBACK_MESSAGE:
                        thisHeader.setCommitOrRollback(MessageSysFlag.TRANSACTION_ROLLBACK_TYPE);
                        log.warn("when broker check, client rollback this transaction, {}", thisHeader);
                        break;
                    case UNKNOW:
                        thisHeader.setCommitOrRollback(MessageSysFlag.TRANSACTION_NOT_TYPE);
                        log.warn("when broker check, client does not know this transaction state, {}", thisHeader);
                        break;
                    default:
                        break;
                }

                String remark = null;
                if (exception != null) {
                    remark = "checkLocalTransactionState Exception: " + RemotingHelper.exceptionSimpleDesc(exception);
                }

                try {
                  // Send the resolved transaction state (commit, rollback, or unknown) back to the broker
                    DefaultMQProducerImpl.this.mQClientFactory.getMQClientAPIImpl().endTransactionOneway(brokerAddr, thisHeader, remark,
                        3000);
                } catch (Exception e) {
                    log.error("endTransactionOneway exception", e);
                }
            }
        };

      // The state check itself is handed to the dedicated check thread pool
        this.checkExecutor.submit(request);
    }
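
The run() method above delegates the actual decision to transactionListener.checkLocalTransaction(message), which is business code. Below is a minimal sketch of how such a callback might answer the broker, assuming the local transaction records its outcome in a durable log keyed by transaction id. LocalState, TxRecord and LocalTxLog are hypothetical stand-ins written for this example, not RocketMQ classes, and the in-memory map only simulates a database table.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the three states a check can return (not the RocketMQ enum).
enum LocalState { COMMIT_MESSAGE, ROLLBACK_MESSAGE, UNKNOW }

// Hypothetical outcome written by the business transaction.
enum TxRecord { COMMITTED, ROLLED_BACK }

class LocalTxLog {
    // Assumption: in a real system this would be a table updated in the same
    // database transaction as the business data; a map simulates it here.
    private final Map<String, TxRecord> records = new ConcurrentHashMap<>();

    void record(String transactionId, TxRecord outcome) {
        records.put(transactionId, outcome);
    }

    // Plays the role of checkLocalTransaction: answer from recorded state only.
    LocalState check(String transactionId) {
        TxRecord outcome = records.get(transactionId);
        if (outcome == null) {
            // No record yet: the local transaction may still be in flight, so stay undecided.
            return LocalState.UNKNOW;
        }
        return outcome == TxRecord.COMMITTED
            ? LocalState.COMMIT_MESSAGE
            : LocalState.ROLLBACK_MESSAGE;
    }
}
```

Returning UNKNOW when no record exists keeps the callback from guessing: the message is only resolved once the local outcome is actually known.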

The transaction check is handed to a thread pool on the producer side:

    this.checkExecutor = new ThreadPoolExecutor(
                producer.getCheckThreadPoolMinSize(),
                producer.getCheckThreadPoolMaxSize(),
                1000 * 60,
                TimeUnit.MILLISECONDS,
                this.checkRequestQueue);
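
To make the constructor arguments above concrete, here is a small self-contained sketch of a pool built the same way: a core size, a maximum size, a 60-second keep-alive expressed in milliseconds, and a bounded work queue. The sizes and the queue capacity are illustrative assumptions, and CheckPoolSketch is a hypothetical name rather than RocketMQ code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class CheckPoolSketch {
    // Illustrative values only; the real ones come from the producer's configuration.
    static final int MIN_THREADS = 1;
    static final int MAX_THREADS = 1;
    static final int QUEUE_CAPACITY = 2000;

    static ThreadPoolExecutor newCheckExecutor() {
        // Bounded queue: pending check requests wait here until a worker is free.
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>(QUEUE_CAPACITY);
        return new ThreadPoolExecutor(
            MIN_THREADS,
            MAX_THREADS,
            1000 * 60,
            TimeUnit.MILLISECONDS,
            queue);
    }

    // Submits n trivial "check" tasks and returns how many completed.
    static int runChecks(int n) {
        ThreadPoolExecutor executor = newCheckExecutor();
        AtomicInteger done = new AtomicInteger();
        Runnable task = done::incrementAndGet;
        for (int i = 0; i < n; i++) {
            executor.submit(task);
        }
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for checks", e);
        }
        return done.get();
    }
}
```

Because this constructor takes no rejection handler, ThreadPoolExecutor falls back to its default AbortPolicy: once all workers are busy and the bounded queue is full, submit throws RejectedExecutionException instead of blocking the caller.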

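In the flow above, if the local check returns commit, the producer sends the broker a command to commit the transaction;

if the local check returns rollback, the producer sends the broker a rollback command;

if the result is unknown, the producer still sends the request, flagged TRANSACTION_NOT_TYPE, but the broker ignores it: the message is neither committed nor rolled back and stays in the half-message queue, where it remains subject to the periodic check.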
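
The three outcomes can be illustrated with a toy model of the half-message queue. This is a deliberately simplified sketch under stated assumptions, not the broker's implementation: HalfMessageModel and CheckResult are hypothetical names, and the model simply removes entries from a map, whereas the real broker has its own storage mechanics for half messages.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the three answers a check can produce.
enum CheckResult { COMMIT, ROLLBACK, UNKNOWN }

class HalfMessageModel {
    // Half (prepared) messages, keyed by transaction id.
    private final Map<String, String> half = new LinkedHashMap<>();
    // Messages that have become visible to consumers.
    private final List<String> delivered = new ArrayList<>();

    void prepare(String txId, String body) {
        half.put(txId, body);
    }

    void endTransaction(String txId, CheckResult result) {
        switch (result) {
            case COMMIT: {
                // Commit: the half message becomes a normal, consumable message.
                String body = half.remove(txId);
                if (body != null) {
                    delivered.add(body);
                }
                break;
            }
            case ROLLBACK:
                // Rollback: the half message is discarded and never delivered.
                half.remove(txId);
                break;
            case UNKNOWN:
            default:
                // Unknown: nothing changes, so a later check can try again.
                break;
        }
    }

    boolean isPending(String txId) {
        return half.containsKey(txId);
    }

    List<String> delivered() {
        return delivered;
    }
}
```

Only the committed message reaches consumers; the rolled-back one disappears, and the undecided one is still pending for the next check.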

With that, the processing of a transactional message is essentially complete.

image-20230601144059268
posted @ 2023-06-12 18:36  LBJboy