Kafka 0.8: The Multiple Log Directory Mechanism

In Kafka 0.7.2, log.dir was defined as follows:

log.dir (default: none): Specifies the root directory in which all log data is kept.

In Kafka 0.8, log.dir was renamed to log.dirs. The official documentation describes it as follows:

log.dirs (default: /tmp/kafka-logs): A comma-separated list of one or more directories in which Kafka data is stored. Each new partition that is created will be placed in the directory which currently has the fewest partitions.

Starting with 0.8, multiple log directories can be configured, separated by commas. This is very useful in real deployments, because it lets a single broker spread its data across multiple disks.
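For example, a broker with two data disks might be configured like this (the paths below are hypothetical):

log.dirs=/data/disk1/kafka-logs,/data/disk2/kafka-logs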

Let's walk through the source code to see how multiple log directories actually work.

1. When the broker starts, it loads the configuration file given on the command line and passes the resulting Properties object to a KafkaConfig object:

object Kafka extends Logging {

  def main(args: Array[String]): Unit = {
    try {
      // args(0) is the path to the server.properties file
      val props = Utils.loadProps(args(0))
      val serverConfig = new KafkaConfig(props)
      // ...
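This is the same properties-file path that bin/kafka-server-start.sh config/server.properties hands through as args(0).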

 

2. KafkaConfig parses the log.dirs string, splitting it on commas into a sequence of paths. The split uses the regex "\\s*,\\s*", so any whitespace around the commas is ignored. Note the fallback chain: if log.dirs is not set it falls back to log.dir, and finally to /tmp/kafka-logs:

 /* the directories in which the log data is kept */
  val logDirs = Utils.parseCsvList(props.getString("log.dirs", props.getString("log.dir", "/tmp/kafka-logs")))
  require(logDirs.size > 0)
  /**
   * Parse a comma separated string into a sequence of strings.
   * Whitespace surrounding the comma will be removed.
   */
  def parseCsvList(csvList: String): Seq[String] = {
    if(csvList == null || csvList.isEmpty)
      Seq.empty[String]
    else {
      csvList.split("\\s*,\\s*").filter(v => !v.equals(""))
    }
  }
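As a quick illustration, here is a minimal, self-contained sketch of the same parsing behavior (the paths are made up):

object CsvListDemo {
  def parseCsvList(csvList: String): Seq[String] =
    if (csvList == null || csvList.isEmpty)
      Seq.empty[String]
    else
      csvList.split("\\s*,\\s*").filter(v => !v.equals("")).toList

  def main(args: Array[String]): Unit = {
    // whitespace around the commas is stripped by the regex
    println(parseCsvList("/data/disk1/kafka-logs , /data/disk2/kafka-logs"))
    // prints: List(/data/disk1/kafka-logs, /data/disk2/kafka-logs)
  }
}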

 

3. When constructing the LogManager, KafkaServer maps each configured path to a java.io.File, passing in Array(File(dir_path_1), File(dir_path_2), ...):

    new LogManager(logDirs = config.logDirs.map(new File(_)).toArray,
                   topicConfigs = configs,
                   defaultConfig = defaultLogConfig,
                   cleanerConfig = cleanerConfig,
                   flushCheckMs = config.logFlushSchedulerIntervalMs,
                   flushCheckpointMs = config.logFlushOffsetCheckpointIntervalMs,
                   retentionCheckMs = config.logCleanupIntervalMs,
                   scheduler = kafkaScheduler,
                   time = time)

 

4. LogManager first validates the given directories: it ensures there are no duplicates in the list, creates any directory that does not exist, and checks that each path is a readable directory:

  /**
   * Create and check validity of the given directories, specifically:
   * <ol>
   * <li> Ensure that there are no duplicates in the directory list
   * <li> Create each directory if it doesn't exist
   * <li> Check that each path is a readable directory 
   * </ol>
   */
  private def createAndValidateLogDirs(dirs: Seq[File]) {
    if(dirs.map(_.getCanonicalPath).toSet.size < dirs.size)
      throw new KafkaException("Duplicate log directory found: " + logDirs.mkString(", "))
    for(dir <- dirs) {
      if(!dir.exists) {
        info("Log directory '" + dir.getAbsolutePath + "' not found, creating it.")
        val created = dir.mkdirs()
        if(!created)
          throw new KafkaException("Failed to create data directory " + dir.getAbsolutePath)
      }
      if(!dir.isDirectory || !dir.canRead)
        throw new KafkaException(dir.getAbsolutePath + " is not a readable log directory.")
    }
  }
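Because the duplicate check compares canonical paths, it also catches the same directory spelled two different ways. A small standalone sketch (with made-up paths):

import java.io.File

object DuplicateDirDemo {
  def main(args: Array[String]): Unit = {
    // "." segments (and symlinks) are resolved by getCanonicalPath,
    // so these two entries collapse to the same canonical path
    val dirs = Seq(new File("/data/disk1/kafka-logs"),
                   new File("/data/disk1/./kafka-logs"))
    if (dirs.map(_.getCanonicalPath).toSet.size < dirs.size)
      println("Duplicate log directory found: " + dirs.mkString(", "))
  }
}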

 

5. LogManager then acquires a file lock on every log directory, to prevent other processes from operating on them:

  /**
   * Lock all the given directories
   */
  private def lockLogDirs(dirs: Seq[File]): Seq[FileLock] = {
    dirs.map { dir =>
      val lock = new FileLock(new File(dir, LockFile))
      if(!lock.tryLock())
        throw new KafkaException("Failed to acquire lock on file .lock in " + lock.file.getParentFile.getAbsolutePath + 
                               ". A Kafka instance in another process or thread is using this directory.")
      lock
    }
  }
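Kafka's FileLock class is a thin wrapper around java.nio file locking. A minimal sketch of the underlying mechanism (the path is hypothetical):

import java.io.{File, RandomAccessFile}

object LockDemo {
  def main(args: Array[String]): Unit = {
    val lockFile = new File("/tmp/kafka-logs", ".lock")
    lockFile.getParentFile.mkdirs()
    val channel = new RandomAccessFile(lockFile, "rw").getChannel
    // tryLock() returns null when another process already holds the lock
    val lock = channel.tryLock()
    if (lock == null)
      throw new RuntimeException("Another process has locked " + lockFile.getAbsolutePath)
    println("Acquired lock on " + lockFile.getAbsolutePath)
    lock.release()
    channel.close()
  }
}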

 

6. Using the recovery-point-offset-checkpoint file in each directory, LogManager recovers and loads the partition logs under that directory:

  /**
   * Recover and load all logs in the given data directories
   */
  private def loadLogs(dirs: Seq[File]) {
    for(dir <- dirs) {
      val recoveryPoints = this.recoveryPointCheckpoints(dir).read
      /* load the logs */
      val subDirs = dir.listFiles()
      if(subDirs != null) {
        // on a clean shutdown, Kafka writes a marker file named .kafka_cleanshutdown
        // into the data directory; if that file is present at the next startup,
        // the logs in this directory can skip the recovery process
        val cleanShutDownFile = new File(dir, Log.CleanShutdownFile)
        if(cleanShutDownFile.exists())
          info("Found clean shutdown file. Skipping recovery for all logs in data directory '%s'".format(dir.getAbsolutePath))
        for(dir <- subDirs) {
          if(dir.isDirectory) {
            info("Loading log '" + dir.getName + "'")
            val topicPartition = Log.parseTopicPartitionName(dir.getName)
            val config = topicConfigs.getOrElse(topicPartition.topic, defaultConfig)
            val log = new Log(dir, 
                              config,
                              recoveryPoints.getOrElse(topicPartition, 0L),
                              scheduler,
                              time)
            val previous = this.logs.put(topicPartition, log)
            if(previous != null)
              throw new IllegalArgumentException("Duplicate log directories found: %s, %s!".format(log.dir.getAbsolutePath, previous.dir.getAbsolutePath))
          }
        }
        cleanShutDownFile.delete()
      }
    }
  }
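The recovery-point-offset-checkpoint file itself is plain text: a version number, an entry count, and then one "topic partition offset" triple per line. A minimal sketch of reading it (path and error handling simplified):

import scala.io.Source

object CheckpointDemo {
  def main(args: Array[String]): Unit = {
    val lines = Source.fromFile("/data/disk1/kafka-logs/recovery-point-offset-checkpoint")
                      .getLines().toList
    val version = lines(0).toInt        // 0 in this Kafka version
    val expectedSize = lines(1).toInt   // number of entries that follow
    val recoveryPoints = lines.drop(2).map { line =>
      val Array(topic, partition, offset) = line.split(" ")
      (topic, partition.toInt) -> offset.toLong
    }.toMap
    require(recoveryPoints.size == expectedSize)
    recoveryPoints.foreach(println)
  }
}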

 

7. When a new log needs to be created, it is placed in the directory that currently contains the fewest logs; the comment in the source explains this well. Note that the balancing is done by partition count, not by bytes on disk:

  /**
   * Choose the next directory in which to create a log. Currently this is done
   * by calculating the number of partitions in each directory and then choosing the
   * data directory with the fewest partitions.
   */
  private def nextLogDir(): File = {
    if(logDirs.size == 1) {
      logDirs(0)
    } else {
      // count the number of logs in each parent directory (including 0 for empty directories)
      val logCounts = allLogs.groupBy(_.dir.getParent).mapValues(_.size)
      val zeros = logDirs.map(dir => (dir.getPath, 0)).toMap
      // merging in the zeros map gives directories that hold no logs a count of 0
      var dirCounts = (zeros ++ logCounts).toBuffer
    
      // choose the directory with the least logs in it
      val leastLoaded = dirCounts.sortBy(_._2).head
      new File(leastLoaded._1)
    }
  }
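To see the selection in action, here is a standalone sketch with made-up directories and counts:

object NextLogDirDemo {
  def main(args: Array[String]): Unit = {
    val logDirs = Seq("/data/disk1/kafka-logs", "/data/disk2/kafka-logs")
    // pretend disk1 already holds 3 partition logs and disk2 holds none
    val logCounts = Map("/data/disk1/kafka-logs" -> 3)
    val zeros = logDirs.map(dir => (dir, 0)).toMap
    val dirCounts = (zeros ++ logCounts).toSeq
    val leastLoaded = dirCounts.sortBy(_._2).head
    println(leastLoaded._1) // prints: /data/disk2/kafka-logs
  }
}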

 

posted @ 2015-01-22 16:17 cruze_lee