Spark Source Code Reading 05: Memory Management

Memory Management

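The memory management discussed here is mainly the memory management of the Executor. In the formulas below, 60% is the default of spark.memory.fraction and 0.5 is the default of spark.memory.storageFraction.

Storage memory: cached data & broadcast variables, (total memory - 300M) x 60% x 0.5

Execution memory: operations performed during Shuffle, (total memory - 300M) x 60% x 0.5

Other memory: system & RDD metadata, (total memory - 300M) x 40%

Reserved memory: 300M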
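The formulas above can be worked through in a few lines of Scala. This is only a minimal sketch of the arithmetic, not Spark's own code: the object name ExecutorMemoryRegions, the case class Regions and the method split are hypothetical names introduced for illustration, and the ratios are the defaults quoted above.

```scala
// Hypothetical helper for illustration only; not part of Spark.
object ExecutorMemoryRegions {
  // Reserved memory from the notes above (300M), expressed in bytes.
  val ReservedBytes: Long = 300L * 1024 * 1024

  final case class Regions(storage: Long, execution: Long, other: Long, reserved: Long)

  // memoryFraction = 60% and storageFraction = 0.5 are the ratios quoted above.
  def split(
      totalBytes: Long,
      memoryFraction: Double = 0.6,
      storageFraction: Double = 0.5): Regions = {
    require(totalBytes > ReservedBytes, "total memory must exceed the reserved 300M")
    val usable = totalBytes - ReservedBytes          // total memory - 300M
    val unified = (usable * memoryFraction).toLong   // storage + execution
    val storage = (unified * storageFraction).toLong // storage share of the unified region
    // Execution and "other" take the remainders, so the four parts always add up to totalBytes.
    Regions(storage, unified - storage, usable - unified, ReservedBytes)
  }
}
```

Calling ExecutorMemoryRegions.split(4L * 1024 * 1024 * 1024) gives the four region sizes for a 4 GiB heap. Computing execution and other memory as remainders avoids losing bytes to rounding.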

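Memory that is not managed by the Java virtual machine is called off-heap memory. The JVM can neither manage nor free it, so the application controls it directly, which is more flexible but also less safe.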

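When storage or execution runs short of memory, each side can take over space from the other. The storage data that gets squeezed out is evicted or spilled to disk according to its storage level, so cached data can be lost even under normal operation. For this reason cache must never cut the lineage.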

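Remember that execution memory can borrow without giving back. If stored data is lost it can be recomputed, but if data held in execution memory is lost, the computed result will be wrong.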
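The asymmetry described above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not Spark's UnifiedMemoryManager: the class ToyUnifiedMemory and all of its members are hypothetical, and it ignores per-task accounting, block-level eviction and spilling.

```scala
// Toy model only; every name here is hypothetical and not part of Spark.
final class ToyUnifiedMemory(val maxMemory: Long, val storageRegionSize: Long) {
  private var storageUsed = 0L
  private var executionUsed = 0L

  def storage: Long = storageUsed
  def execution: Long = executionUsed
  private def free: Long = maxMemory - storageUsed - executionUsed

  /** Storage may only take free memory; it never evicts execution. Returns true on success. */
  def acquireStorage(bytes: Long): Boolean =
    if (bytes <= free) { storageUsed += bytes; true } else false

  /**
   * Execution takes free memory first, then evicts cached data that storage
   * holds beyond its own region. Returns the number of bytes actually granted.
   */
  def acquireExecution(bytes: Long): Long = {
    val shortfall = math.max(bytes - free, 0L)
    val borrowedByStorage = math.max(storageUsed - storageRegionSize, 0L)
    val evicted = math.min(shortfall, borrowedByStorage)
    storageUsed -= evicted // cached data is dropped; it can be recomputed from lineage
    val granted = math.min(bytes, free)
    executionUsed += granted
    granted
  }
}
```

In this model storage can grow into unused execution space, but once execution needs the memory back the cached data is evicted, while a storage request is simply refused when execution already occupies the space.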


Source Code

Locate org.apache.spark.SparkEnv#create and scroll down to memoryManager, which is the memory manager

val useLegacyMemoryManager = conf.getBoolean("spark.memory.useLegacyMode", false)
val memoryManager: MemoryManager =
  if (useLegacyMemoryManager) {
    new StaticMemoryManager(conf, numUsableCores)  // static memory management
  } else {
    UnifiedMemoryManager(conf, numUsableCores)  // unified memory management
  }

Click through to UnifiedMemoryManager

def apply(conf: SparkConf, numCores: Int): UnifiedMemoryManager = {
  val maxMemory = getMaxMemory(conf)
  new UnifiedMemoryManager(
    conf,
    maxHeapMemory = maxMemory,
    onHeapStorageRegionSize =
      (maxMemory * conf.getDouble("spark.memory.storageFraction", 0.5)).toLong,
    numCores = numCores)
}

Going from UnifiedMemoryManager to its parent class MemoryManager, you can see the code related to memory management

/**
 * An abstract memory manager that enforces how memory is shared between execution and storage.
 *
 * In this context, execution memory refers to that used for computation in shuffles, joins,
 * sorts and aggregations, while storage memory refers to that used for caching and propagating
 * internal data across the cluster. There exists one MemoryManager per JVM.
 */
private[spark] abstract class MemoryManager(
    conf: SparkConf,
    numCores: Int,
    onHeapStorageMemory: Long,
    onHeapExecutionMemory: Long) extends Logging {

  // -- Methods related to memory allocation policies and bookkeeping ------------------------------

  @GuardedBy("this")
  protected val onHeapStorageMemoryPool = new StorageMemoryPool(this, MemoryMode.ON_HEAP)
  @GuardedBy("this")
  protected val offHeapStorageMemoryPool = new StorageMemoryPool(this, MemoryMode.OFF_HEAP)
  @GuardedBy("this")
  protected val onHeapExecutionMemoryPool = new ExecutionMemoryPool(this, MemoryMode.ON_HEAP)
  @GuardedBy("this")
  protected val offHeapExecutionMemoryPool = new ExecutionMemoryPool(this, MemoryMode.OFF_HEAP)

  onHeapStorageMemoryPool.incrementPoolSize(onHeapStorageMemory)
  onHeapExecutionMemoryPool.incrementPoolSize(onHeapExecutionMemory)

  protected[this] val maxOffHeapMemory = conf.get(MEMORY_OFFHEAP_SIZE)
  protected[this] val offHeapStorageMemory =
    (maxOffHeapMemory * conf.getDouble("spark.memory.storageFraction", 0.5)).toLong

  offHeapExecutionMemoryPool.incrementPoolSize(maxOffHeapMemory - offHeapStorageMemory)
  offHeapStorageMemoryPool.incrementPoolSize(offHeapStorageMemory)
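The initial pool sizing in the excerpt above can be mimicked with plain Scala. This is a minimal sketch, assuming a simplified pool that only tracks its size: ToyPool and ToyPoolSizing are hypothetical stand-ins, not Spark's StorageMemoryPool or ExecutionMemoryPool, and only the 0.5 default of spark.memory.storageFraction is taken from the source.

```scala
// Hypothetical stand-in for a memory pool; it only tracks its own size.
final class ToyPool {
  private var size = 0L
  def poolSize: Long = size
  def incrementPoolSize(delta: Long): Unit = {
    require(delta >= 0, "delta must be non-negative")
    size += delta
  }
}

object ToyPoolSizing {
  /** Sizes the four pools the same way the excerpt does and returns their sizes by name. */
  def initialPools(
      onHeapStorage: Long,
      onHeapExecution: Long,
      maxOffHeap: Long,
      storageFraction: Double = 0.5): Map[String, Long] = {
    val onHeapStoragePool = new ToyPool
    val onHeapExecutionPool = new ToyPool
    val offHeapStoragePool = new ToyPool
    val offHeapExecutionPool = new ToyPool

    // On-heap sizes are passed in by the caller, as in the MemoryManager constructor.
    onHeapStoragePool.incrementPoolSize(onHeapStorage)
    onHeapExecutionPool.incrementPoolSize(onHeapExecution)

    // Off-heap memory is split by the storage fraction; execution gets the remainder.
    val offHeapStorage = (maxOffHeap * storageFraction).toLong
    offHeapExecutionPool.incrementPoolSize(maxOffHeap - offHeapStorage)
    offHeapStoragePool.incrementPoolSize(offHeapStorage)

    Map(
      "onHeapStorage" -> onHeapStoragePool.poolSize,
      "onHeapExecution" -> onHeapExecutionPool.poolSize,
      "offHeapStorage" -> offHeapStoragePool.poolSize,
      "offHeapExecution" -> offHeapExecutionPool.poolSize)
  }
}
```

Because the off-heap execution pool receives whatever the storage share leaves over, the two off-heap pools always add up to the configured off-heap size, even when the fraction does not divide it evenly.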
posted @ 2022-12-12 12:48  黄一洋