Troubleshooting a Parquet Issue with Spark and Hive

Troubleshooting the Parquet exception
The exception is as follows:

Caused by: parquet.hadoop.MemoryManager$1: New Memory allocation 1044273 bytes is smaller than the minimum allocation size of 1048576 bytes.
        at parquet.hadoop.MemoryManager.updateAllocation(MemoryManager.java:125)
        at parquet.hadoop.MemoryManager.addWriter(MemoryManager.java:82)
        at parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:104)
        at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:303)
        at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:267)
        at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.<init>(ParquetRecordWriterWrapper.java:65)
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getParquerRecordWriterWrapper(MapredParquetOutputFormat.java:125)
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getHiveRecordWriter(MapredParquetOutputFormat.java:114)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:261)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:246)
        ... 19 more

The code that throws it, in parquet.hadoop.MemoryManager.updateAllocation:

private void updateAllocation() {
    // ... other code elided
    for (Map.Entry<InternalParquetRecordWriter, Long> entry : writerList.entrySet()) {
      long newSize = (long) Math.floor(entry.getValue() * scale);
      if(scale < 1.0 && minMemoryAllocation > 0 && newSize < minMemoryAllocation) {
          throw new ParquetRuntimeException(String.format("New Memory allocation %d bytes" +
          " is smaller than the minimum allocation size of %d bytes.",
              newSize, minMemoryAllocation)){};
      }
      entry.getKey().setRowGroupSizeThreshold(newSize);
      LOG.debug(String.format("Adjust block size from %,d to %,d for writer: %s",
            entry.getValue(), newSize, entry.getKey()));
    }
  }

The check fires when three conditions hold: scale is less than 1, minMemoryAllocation is greater than 0, and the newly scaled allocation is smaller than minMemoryAllocation (1048576 bytes in the stack trace above, i.e. the 1 MB default). On the face of it there are two simple fixes:
1. Set minMemoryAllocation to 0, which disables the check.
2. Set minMemoryAllocation to something smaller than the scaled allocation; the exception message reports the computed size, which gives you a reference value.

minMemoryAllocation is controlled by the parquet.memory.min.chunk.size property, which I set with:

hiveContext.setConf("parquet.memory.min.chunk.size", (1024 * 32).toString)

Here I chose 32 KB, and the problem went away.
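
For option 1, the same property set to 0 disables the minimum-size check entirely. A minimal sketch using the same HiveContext (whether 0 is wise depends on how small you are willing to let row groups shrink):

    // Option 1: a minimum of 0 switches off the check in MemoryManager entirely.
    hiveContext.setConf("parquet.memory.min.chunk.size", "0")

In the Hive CLI, SET parquet.memory.min.chunk.size=0; should have the same effect, since Hive forwards the property into the job configuration.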

Digging deeper, though, scale < 1 is the other key condition for this exception. Why does Parquet check it at all?

Let's first look at how scale is computed, in the same updateAllocation method:

    long totalAllocations = 0;
    double scale;
    for (Long allocation : writerList.values()) {
      totalAllocations += allocation;
    }
    if (totalAllocations <= totalMemoryPool) {
      scale = 1.0;
    } else {
      scale = (double) totalMemoryPool / totalAllocations;
      LOG.warn(String.format(
          "Total allocation exceeds %.2f%% (%,d bytes) of heap memory\n" +
          "Scaling row group sizes to %.2f%% for %d writers",
          100*memoryPoolRatio, totalMemoryPool, 100*scale, writerList.size()));
    }

This shows that scale drops below 1 when the memory requested by all writers exceeds totalMemoryPool, which is a fixed fraction (memoryPoolRatio, 0.95 by default) of the JVM heap; the requested total is the sum of the per-writer allocations in writerList.
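
To get a feel for the scaling, here is a worked example with invented numbers (nothing below comes from the actual failing job):

    // Invented numbers: 4 GB heap, default pool ratio 0.95, 40 writers of 128 MB each.
    val totalMemoryPool  = ((4L << 30) * 0.95).toLong                   // ≈ 3.8 GB for all writers
    val blockSize        = 128L << 20                                   // requested row group size per writer
    val totalAllocations = 40 * blockSize                               // 5 GB requested in total
    val scale            = totalMemoryPool.toDouble / totalAllocations  // ≈ 0.76
    val newSize          = math.floor(blockSize * scale).toLong         // ≈ 97 MB per writer

With enough writers, newSize keeps shrinking until it falls below minMemoryAllocation and the exception fires. So where do these allocations come from? What exactly is writerList?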


  /**
   * Add a new writer and its memory allocation to the memory manager.
   * @param writer the new created writer
   * @param allocation the requested buffer size
   */
  synchronized void addWriter(InternalParquetRecordWriter writer, Long allocation) {
    Long oldValue = writerList.get(writer);
    if (oldValue == null) {
      writerList.put(writer, allocation);
    } else {
      throw new IllegalArgumentException("[BUG] The Parquet Memory Manager should not add an " +
          "instance of InternalParquetRecordWriter more than once. The Manager already contains " +
          "the writer: " + writer);
    }
    updateAllocation();
  }

Adding a writer adds its requested allocation to the total, and addWriter immediately calls updateAllocation, so the exception surfaces the moment one writer too many registers. When does a new writer get created?

  public ParquetRecordWriter(
      ParquetFileWriter w,
      WriteSupport<T> writeSupport,
      MessageType schema,
      Map<String, String> extraMetaData,
      long blockSize, int pageSize,
      BytesCompressor compressor,
      int dictionaryPageSize,
      boolean enableDictionary,
      boolean validating,
      WriterVersion writerVersion,
      MemoryManager memoryManager) {
    internalWriter = new InternalParquetRecordWriter<T>(w, writeSupport, schema,
        extraMetaData, blockSize, pageSize, compressor, dictionaryPageSize, enableDictionary,
        validating, writerVersion);
    this.memoryManager = checkNotNull(memoryManager, "memoryManager");
    memoryManager.addWriter(internalWriter, blockSize);
  }

That constructor is invoked from ParquetOutputFormat.getRecordWriter, which also creates the shared MemoryManager and reads the relevant configuration:

  public RecordWriter<Void, T> getRecordWriter(Configuration conf, Path file, CompressionCodecName codec)
        throws IOException, InterruptedException {
    final WriteSupport<T> writeSupport = getWriteSupport(conf);

    CodecFactory codecFactory = new CodecFactory(conf);
    long blockSize = getLongBlockSize(conf);
    if (INFO) LOG.info("Parquet block size to " + blockSize);
    int pageSize = getPageSize(conf);
    if (INFO) LOG.info("Parquet page size to " + pageSize);
    int dictionaryPageSize = getDictionaryPageSize(conf);
    if (INFO) LOG.info("Parquet dictionary page size to " + dictionaryPageSize);
    boolean enableDictionary = getEnableDictionary(conf);
    if (INFO) LOG.info("Dictionary is " + (enableDictionary ? "on" : "off"));
    boolean validating = getValidation(conf);
    if (INFO) LOG.info("Validation is " + (validating ? "on" : "off"));
    WriterVersion writerVersion = getWriterVersion(conf);
    if (INFO) LOG.info("Writer version is: " + writerVersion);

    WriteContext init = writeSupport.init(conf);
    ParquetFileWriter w = new ParquetFileWriter(conf, init.getSchema(), file);
    w.start();

    float maxLoad = conf.getFloat(ParquetOutputFormat.MEMORY_POOL_RATIO,
        MemoryManager.DEFAULT_MEMORY_POOL_RATIO);
    long minAllocation = conf.getLong(ParquetOutputFormat.MIN_MEMORY_ALLOCATION,
        MemoryManager.DEFAULT_MIN_MEMORY_ALLOCATION);
    if (memoryManager == null) {
      memoryManager = new MemoryManager(maxLoad, minAllocation);
    } else if (memoryManager.getMemoryPoolRatio() != maxLoad) {
      LOG.warn("The configuration " + MEMORY_POOL_RATIO + " has been set. It should not " +
          "be reset by the new value: " + maxLoad);
    }

    return new ParquetRecordWriter<T>(
        w,
        writeSupport,
        init.getSchema(),
        init.getExtraMetaData(),
        blockSize, pageSize,
        codecFactory.getCompressor(codec, pageSize),
        dictionaryPageSize,
        enableDictionary,
        validating,
        writerVersion,
        memoryManager);
  }
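
For reference, the three knobs read above can all be set from the same HiveContext as earlier; the keys are parquet-mr's own, and the values below are illustrative rather than recommendations:

    // Per-writer requested row group size: the blockSize handed to addWriter.
    hiveContext.setConf("parquet.block.size", (128 * 1024 * 1024).toString)
    // Fraction of the heap shared by all Parquet writers in the task JVM.
    hiveContext.setConf("parquet.memory.pool.ratio", "0.95")
    // Minimum allowed scaled allocation: the subject of this post.
    hiveContext.setConf("parquet.memory.min.chunk.size", (32 * 1024).toString)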

So as more and more writers are created, the total requested memory exceeds the pool and scale falls below 1. In our case the job used dynamic partitioning, and the number of partitions created (hence writers opened) turned out far larger than we had expected, which is what triggered the exception.
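
For illustration, a dynamic-partition insert of the kind that bit us; the table and column names here are made up:

    hiveContext.setConf("hive.exec.dynamic.partition", "true")
    hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
    // One Parquet writer is opened per distinct partition value a task encounters,
    // so a high-cardinality partition column multiplies the writer count.
    hiveContext.sql(
      "INSERT OVERWRITE TABLE events PARTITION (dt) " +
      "SELECT id, payload, dt FROM staging_events")

But why does the MemoryManager scale every writer down rather than simply failing? The answer lies in what newSize is ultimately used for: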

entry.getKey().setRowGroupSizeThreshold(newSize);

The scaled value ends up as each writer's rowGroupSizeThreshold, which InternalParquetRecordWriter consults when deciding whether to flush:

  private void checkBlockSizeReached() throws IOException {
    if (recordCount >= recordCountForNextMemCheck) { // checking the memory size is relatively expensive, so let's not do it for every record.
      long memSize = columnStore.getBufferedSize();
      if (memSize > rowGroupSizeThreshold) {
        LOG.info(format("mem size %,d > %,d: flushing %,d records to disk.", memSize, rowGroupSizeThreshold, recordCount));
        flushRowGroupToStore();
        initStore();
        recordCountForNextMemCheck = min(max(MINIMUM_RECORD_COUNT_FOR_CHECK, recordCount / 2), MAXIMUM_RECORD_COUNT_FOR_CHECK);
      } else {
        float recordSize = (float) memSize / recordCount;
        recordCountForNextMemCheck = min(
            max(MINIMUM_RECORD_COUNT_FOR_CHECK, (recordCount + (long)(rowGroupSizeThreshold / recordSize)) / 2), // will check halfway
            recordCount + MAXIMUM_RECORD_COUNT_FOR_CHECK // will not look more than max records ahead
            );
        if (DEBUG) LOG.debug(format("Checked mem at %,d will check again at: %,d ", recordCount, recordCountForNextMemCheck));
      }
    }
  }

When the buffered size exceeds the threshold, the current row group is flushed to disk, which keeps the writers from collectively exhausting the heap. The more writers there are, the smaller each writer's threshold, and the smaller the row groups it writes.
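
As an aside, the scheduling of the next check rewards a worked example. Assuming parquet-mr's MINIMUM_RECORD_COUNT_FOR_CHECK = 100 and MAXIMUM_RECORD_COUNT_FOR_CHECK = 10000, with invented sizes:

    // Invented sizes: 10,000 records buffered in 1 MB, threshold 128 MB.
    val recordCount = 10000L
    val memSize     = 1L << 20
    val recordSize  = memSize.toFloat / recordCount                        // ≈ 105 bytes per record
    val threshold   = 128L << 20
    val halfway     = (recordCount + (threshold / recordSize).toLong) / 2  // ≈ 645,000
    val next        = math.min(math.max(100L, halfway), recordCount + 10000L)
    // next = 20,000 here: the "no more than max records ahead" cap wins.

So the check is cheap on average: it never runs more often than every 100 records and never schedules itself more than 10,000 records ahead.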

To summarize:
When Hive writes Parquet with dynamic partitioning, every new partition opens a new writer, and every added writer triggers a rescale of all writers' rowGroupSizeThreshold so that the total stays inside the memory pool and the job avoids running out of memory. With enough partitions, the scaled threshold drops below parquet.memory.min.chunk.size, and the MemoryManager throws the exception we started with.
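
Besides lowering parquet.memory.min.chunk.size, reducing how many writers are open at once also sidesteps the rescaling. One common approach, sketched with the same made-up table as before, is to route each partition's rows to a single task with DISTRIBUTE BY:

    // Each task now writes only the partition values hashed to it, so far
    // fewer Parquet writers are open per JVM at any one time.
    hiveContext.sql(
      "INSERT OVERWRITE TABLE events PARTITION (dt) " +
      "SELECT id, payload, dt FROM staging_events DISTRIBUTE BY dt")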

posted on 2017-04-17 11:13 by luckuan1985