Spark Streaming 'numRecords must not be negative' (II)

Ever since upgrading Spark from 1.3 to 1.6, Kafka streaming problems have kept cropping up. Here is the latest one.
The jobs use Kafka DirectStream to read data from a topic and then process it. One test job had been stopped for a few days; when it was restarted, it threw kafka.common.OffsetOutOfRangeException. This post records the analysis of the exception and how it was resolved.

Exception analysis

Taken literally, the exception says a Kafka topic offset is out of range. The job uses Kafka DirectStream, and after each batch is processed successfully it commits the corresponding offsets to ZooKeeper. Just like an array index, an offset can be out of range at either the head or the tail, as shown below.
[Figure: offset out-of-range diagram]

  • Head out of range: the offset saved in ZooKeeper is older than the offset of the oldest message still present in the topic (zk_offset < earliest_offset);
  • Tail out of range: the offset saved in ZooKeeper is newer than the offset of the newest message in the topic (zk_offset > last_offset). A code sketch of both checks follows this list.
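A minimal sketch of the two checks, assuming zkOffset, earliestOffset, and lastOffset have already been fetched (the variable names are illustrative, not taken from the job):

    // zkOffset: offset committed to ZooKeeper; earliestOffset / lastOffset: queried from the brokers
    boolean headOutOfRange = zkOffset < earliestOffset; // the data was already deleted by retention
    boolean tailOutOfRange = zkOffset > lastOffset;     // the offset points past the newest message
    boolean zkOffsetValid = !headOutOfRange && !tailOutOfRange;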

Since the code already uses the approach from the earlier post, tail out-of-range is impossible, so head out-of-range was the suspect.
What could push the offset out of range at the head?
The Kafka broker configuration had been changed to keep messages for only 24 hours:

log.retention.hours=24 (the minimum age of a log file to be eligible for deletion)

So unconsumed data in Kafka must have been deleted by the broker, leaving the offset stored in ZooKeeper to the left of the offset of the oldest surviving message: an offset that used to be valid had become invalid.

Verifying the hypothesis

  1. Change the Kafka broker's retention time to 2 minutes.
    In the config file
    kafka/config/server.properties
    change log.retention.hours=168 -> log.retention.minutes=2,
    then restart Kafka.
  2. Use zk shell commands to read the zk_offset saved by the job (a sample session follows this list).
  3. Stop the Spark Streaming Kafka DirectStream job.
  4. Send data to the Kafka topic and wait a while (more than two minutes).
  5. Restart the streaming job; the exception is reproduced.
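For step 2, a zk shell session might look like the following. This is only a sketch: the ZooKeeper host is a placeholder, and the /consumers/... path assumes the job commits offsets under the conventional consumer offset path; substitute the actual group, topic, and partition:

    zkCli.sh -server zkhost:2181
    get /consumers/<group>/offsets/<topic>/<partition>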

The reproduction confirms the cause: because of the log.retention.hours setting, the Kafka broker deleted some of the topic's data, and within the retention window the streaming job never consumed the messages that were about to be deleted, so the offset in ZooKeeper ended up to the left of earliest_offset and the exception was thrown.

Solution

The obvious fix is to make the streaming job consume the topic promptly, keeping the consumption lag below the log.retention.* setting.
A better fix, though, is to let the job keep running even when this happens: whenever zk_offset < earliest_offset is detected, correct zk_offset to a valid value.
This follows the same approach as the earlier post on resolving Spark Streaming 'numRecords must not be negative'.
Code:

package com.frey.v1.utils.kafka;

import com.google.common.collect.Lists;
import com.google.common.collect.Maps;
import kafka.api.PartitionOffsetRequestInfo;
import kafka.cluster.Broker;
import kafka.common.TopicAndPartition;
import kafka.javaapi.*;
import kafka.javaapi.consumer.SimpleConsumer;

import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * KafkaOffsetTool
 *
 * @author Shuai YUAN
 * @date 2016/4/11
 */
public class KafkaOffsetTool {

  private static KafkaOffsetTool instance;
  final int TIMEOUT = 100000;
  final int BUFFERSIZE = 64 * 1024;

  private KafkaOffsetTool() {
  }

  public static synchronized KafkaOffsetTool getInstance() {
    if (instance == null) {
      instance = new KafkaOffsetTool();
    }
    return instance;
  }

  /**
   * Get the latest (last) offset for every partition of the given topics.
   *
   * @param brokerList comma-separated host:port list
   * @param topics topics to query
   * @param groupId consumer group id, used as the client name
   * @return map from TopicAndPartition to its latest offset
   */
  public Map<TopicAndPartition, Long> getLastOffset(String brokerList, List<String> topics,
      String groupId) {

    Map<TopicAndPartition, Long> topicAndPartitionLongMap = Maps.newHashMap();

    Map<TopicAndPartition, Broker> topicAndPartitionBrokerMap =
        KafkaOffsetTool.getInstance().findLeader(brokerList, topics);

    for (Map.Entry<TopicAndPartition, Broker> topicAndPartitionBrokerEntry : topicAndPartitionBrokerMap
        .entrySet()) {
      // get leader broker
      Broker leaderBroker = topicAndPartitionBrokerEntry.getValue();

      SimpleConsumer simpleConsumer = new SimpleConsumer(leaderBroker.host(), leaderBroker.port(),
          TIMEOUT, BUFFERSIZE, groupId);

      long readOffset = getTopicAndPartitionLastOffset(simpleConsumer,
          topicAndPartitionBrokerEntry.getKey(), groupId);

      topicAndPartitionLongMap.put(topicAndPartitionBrokerEntry.getKey(), readOffset);

    }

    return topicAndPartitionLongMap;

  }

  /**
   * Get the earliest available offset for every partition of the given topics.
   *
   * @param brokerList comma-separated host:port list
   * @param topics topics to query
   * @param groupId consumer group id, used as the client name
   * @return map from TopicAndPartition to its earliest offset
   */
  public Map<TopicAndPartition, Long> getEarliestOffset(String brokerList, List<String> topics,
      String groupId) {

    Map<TopicAndPartition, Long> topicAndPartitionLongMap = Maps.newHashMap();

    Map<TopicAndPartition, Broker> topicAndPartitionBrokerMap =
        KafkaOffsetTool.getInstance().findLeader(brokerList, topics);

    for (Map.Entry<TopicAndPartition, Broker> topicAndPartitionBrokerEntry : topicAndPartitionBrokerMap
        .entrySet()) {
      // get leader broker
      Broker leaderBroker = topicAndPartitionBrokerEntry.getValue();

      SimpleConsumer simpleConsumer = new SimpleConsumer(leaderBroker.host(), leaderBroker.port(),
          TIMEOUT, BUFFERSIZE, groupId);

      long readOffset = getTopicAndPartitionEarliestOffset(simpleConsumer,
          topicAndPartitionBrokerEntry.getKey(), groupId);

      topicAndPartitionLongMap.put(topicAndPartitionBrokerEntry.getKey(), readOffset);

    }

    return topicAndPartitionLongMap;

  }

  /**
   * Find the leader broker for every partition of the given topics.
   *
   * @param brokerList comma-separated host:port list
   * @param topics topics to query
   * @return map from TopicAndPartition to its leader broker
   */
  private Map<TopicAndPartition, Broker> findLeader(String brokerList, List<String> topics) {
    // get broker's url array
    String[] brokerUrlArray = getBorkerUrlFromBrokerList(brokerList);
    // get broker's port map
    Map<String, Integer> brokerPortMap = getPortFromBrokerList(brokerList);

    // create array list of TopicAndPartition
    Map<TopicAndPartition, Broker> topicAndPartitionBrokerMap = Maps.newHashMap();

    for (String broker : brokerUrlArray) {

      SimpleConsumer consumer = null;
      try {
        // new instance of simple Consumer
        consumer = new SimpleConsumer(broker, brokerPortMap.get(broker), TIMEOUT, BUFFERSIZE,
            "leaderLookup" + new Date().getTime());

        TopicMetadataRequest req = new TopicMetadataRequest(topics);

        TopicMetadataResponse resp = consumer.send(req);

        List<TopicMetadata> metaData = resp.topicsMetadata();

        for (TopicMetadata item : metaData) {
          for (PartitionMetadata part : item.partitionsMetadata()) {
            TopicAndPartition topicAndPartition =
                new TopicAndPartition(item.topic(), part.partitionId());
            topicAndPartitionBrokerMap.put(topicAndPartition, part.leader());
          }
        }
      } catch (Exception e) {
        e.printStackTrace();
      } finally {
        if (consumer != null)
          consumer.close();
      }
    }
    return topicAndPartitionBrokerMap;
  }

  /**
   * Get the last (latest) offset of a single topic-partition.
   *
   * @param consumer SimpleConsumer connected to the partition's leader
   * @param topicAndPartition the partition to query
   * @param clientName client id sent with the request
   * @return the latest offset, or 0 on error
   */
  private long getTopicAndPartitionLastOffset(SimpleConsumer consumer,
      TopicAndPartition topicAndPartition, String clientName) {
    Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo =
        new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();

    requestInfo.put(topicAndPartition, new PartitionOffsetRequestInfo(
        kafka.api.OffsetRequest.LatestTime(), 1));

    OffsetRequest request = new OffsetRequest(
        requestInfo, kafka.api.OffsetRequest.CurrentVersion(),
        clientName);

    OffsetResponse response = consumer.getOffsetsBefore(request);

    if (response.hasError()) {
      System.out.println("Error fetching offset data from the broker. Reason: "
          + response.errorCode(topicAndPartition.topic(), topicAndPartition.partition()));
      return 0;
    }
    long[] offsets = response.offsets(topicAndPartition.topic(), topicAndPartition.partition());
    return offsets[0];
  }

  /**
   * Get the earliest available offset of a single topic-partition.
   *
   * @param consumer SimpleConsumer connected to the partition's leader
   * @param topicAndPartition the partition to query
   * @param clientName client id sent with the request
   * @return the earliest offset, or 0 on error
   */
  private long getTopicAndPartitionEarliestOffset(SimpleConsumer consumer,
      TopicAndPartition topicAndPartition, String clientName) {
    Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo =
        new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();

    requestInfo.put(topicAndPartition, new PartitionOffsetRequestInfo(
        kafka.api.OffsetRequest.EarliestTime(), 1));

    OffsetRequest request = new OffsetRequest(
        requestInfo, kafka.api.OffsetRequest.CurrentVersion(),
        clientName);

    OffsetResponse response = consumer.getOffsetsBefore(request);

    if (response.hasError()) {
      System.out.println("Error fetching offset data from the broker. Reason: "
          + response.errorCode(topicAndPartition.topic(), topicAndPartition.partition()));
      return 0;
    }
    long[] offsets = response.offsets(topicAndPartition.topic(), topicAndPartition.partition());
    return offsets[0];
  }

  /**
   * Extract the broker hostnames from a comma-separated broker list.
   *
   * @param brokerlist comma-separated host:port list
   * @return array of broker hostnames
   */
  private String[] getBorkerUrlFromBrokerList(String brokerlist) {
    String[] brokers = brokerlist.split(",");
    for (int i = 0; i < brokers.length; i++) {
      brokers[i] = brokers[i].split(":")[0];
    }
    return brokers;
  }

  /**
   * Map each broker hostname to its port, parsed from the broker list.
   *
   * @param brokerlist comma-separated host:port list
   * @return map from hostname to port
   */
  private Map<String, Integer> getPortFromBrokerList(String brokerlist) {
    Map<String, Integer> map = new HashMap<String, Integer>();
    String[] brokers = brokerlist.split(",");
    for (String item : brokers) {
      String[] itemArr = item.split(":");
      if (itemArr.length > 1) {
        map.put(itemArr[0], Integer.parseInt(itemArr[1]));
      }
    }
    return map;
  }

  public static void main(String[] args) {
    List<String> topics = Lists.newArrayList();
    topics.add("bsa_sys_tmp");
//    topics.add("bugfix");
    Map<TopicAndPartition, Long> topicAndPartitionLongMap =
        KafkaOffsetTool.getInstance().getEarliestOffset("bsa142:9092,bsa143:9092", topics,
            "com.nsfocus.bsa.setl");

    for (Map.Entry<TopicAndPartition, Long> entry : topicAndPartitionLongMap.entrySet()) {
      System.out.println(entry.getKey().topic() + "-" + entry.getKey().partition() + ":" + entry.getValue());
    }
  }
}
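Running this main() prints one line per partition in the form topic-partition:earliestOffset, which makes it easy to compare against the offsets stored in ZooKeeper.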

Core offset-correction code:

/** offset correction begins */

    // latest offsets
    Map<TopicAndPartition, Long> latestTopicAndPartitionLongMap =
        KafkaOffsetTool.getInstance().getLastOffset(kafkaParams.get("metadata.broker.list"),
            Lists.newArrayList(topicsSet), kafkaParams.get(Constants.KAFKA_CONSUMER_GROUP_ID));

    // earliest offsets
    Map<TopicAndPartition, Long> earliestTopicAndPartitionLongMap =
        KafkaOffsetTool.getInstance().getEarliestOffset(kafkaParams.get("metadata.broker.list"),
            Lists.newArrayList(topicsSet), kafkaParams.get(Constants.KAFKA_CONSUMER_GROUP_ID));


    for (Map.Entry<TopicAndPartition, Long> topicAndPartitionLongEntry : fromOffsets.entrySet()) {

      long zkOffset = topicAndPartitionLongEntry.getValue();
      long latestOffset = latestTopicAndPartitionLongMap.get(topicAndPartitionLongEntry.getKey());
      long earliestOffset = earliestTopicAndPartitionLongMap.get(topicAndPartitionLongEntry.getKey());
      // zkOffset is outside the range of available message offsets
      if (zkOffset > latestOffset || zkOffset < earliestOffset) {
        // reset the offset to earliestOffset
        logger.warn("Correcting offset: " + zkOffset + " -> " + earliestOffset);
        topicAndPartitionLongEntry.setValue(earliestOffset);
      }
    }
    /** offset correction ends */
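For context, here is a minimal sketch of how the corrected fromOffsets map would typically feed into the direct stream in Spark 1.6 (jssc, kafkaParams, and fromOffsets come from the surrounding job; the String key/value types and the message handler are assumptions, not taken from the original job):

import kafka.message.MessageAndMetadata;
import kafka.serializer.StringDecoder;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.kafka.KafkaUtils;

    JavaInputDStream<String> messages = KafkaUtils.createDirectStream(
        jssc,                                      // JavaStreamingContext
        String.class, String.class,                // key / value classes
        StringDecoder.class, StringDecoder.class,  // key / value decoders
        String.class,                              // record type produced by the handler
        kafkaParams,
        fromOffsets,                               // the corrected offsets from above
        new Function<MessageAndMetadata<String, String>, String>() {
          @Override
          public String call(MessageAndMetadata<String, String> mmd) {
            return mmd.message();                  // keep only the message payload
          }
        });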
posted on 2016-12-10 15:37 by 邹天得