Spark Source Code Reading 05: Memory Management
Memory Management
The memory management discussed here is mainly that of the Executor.
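Storage memory: cached data and broadcast variables, (total memory - 300 MB) × 60% × 0.5
Execution memory: operations during shuffle, (total memory - 300 MB) × 60% × 0.5
Other memory: system data and RDD metadata, (total memory - 300 MB) × 40%
Reserved memory: 300 MB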
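The split above can be worked through with a few lines of Scala. This is only a sketch: `MemoryRegions`, `Regions` and `compute` are names made up for illustration, and the 0.6 and 0.5 factors are assumed to be the defaults of `spark.memory.fraction` and `spark.memory.storageFraction`.

```scala
// Sketch of the on-heap region sizes listed above; not Spark's own code.
// Assumption: memoryFraction = 0.6 and storageFraction = 0.5 (the defaults).
object MemoryRegions {
  val ReservedBytes: Long = 300L * 1024 * 1024 // reserved memory: 300 MB

  final case class Regions(storage: Long, execution: Long, other: Long, reserved: Long)

  def compute(
      totalBytes: Long,
      memoryFraction: Double = 0.6,
      storageFraction: Double = 0.5): Regions = {
    require(totalBytes > ReservedBytes, "total memory must exceed the reserved 300 MB")
    val usable = totalBytes - ReservedBytes
    val unified = (usable * memoryFraction).toLong   // storage + execution
    val storage = (unified * storageFraction).toLong // storage share of the unified region
    val execution = unified - storage                // execution gets the remainder
    val other = usable - unified                     // the remaining 40%
    Regions(storage, execution, other, ReservedBytes)
  }
}
```

Calling `MemoryRegions.compute(1024L * 1024 * 1024)` gives the four regions for a 1 GiB Executor heap; by construction they always add up to the total passed in.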
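Memory that is not managed by the Java virtual machine is called off-heap memory. The JVM cannot manage or release it, so the application controls it directly, which is more flexible but also less safe.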
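When either storage or execution runs short of memory, it takes over space from the other side, and the data that gets squeezed out is evicted or spilled to disk according to its storage level. Cached data can therefore be lost even under normal conditions, which is why caching must never cut the lineage: a lost block has to remain recomputable.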
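Remember that execution memory can borrow without giving back: storage cannot evict execution to reclaim what it lent. The reason is that lost storage data can be recomputed, whereas if data held in execution memory were lost, the computed result would be wrong.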
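The asymmetry between the two sides can be shown with a toy model. This is a deliberately simplified sketch, not Spark's implementation: `ToyUnifiedMemory` and its methods are hypothetical names, sizes are plain byte counts, and it ignores storage levels, spilling and storage evicting its own older blocks.

```scala
// Toy model of the borrowing rules described above; not Spark's actual code.
final class ToyUnifiedMemory(val total: Long, val storageRegion: Long) {
  private var storageUsed = 0L
  private var executionUsed = 0L

  def storage: Long = storageUsed
  def execution: Long = executionUsed
  private def free: Long = total - storageUsed - executionUsed

  /** Execution uses free memory first, then evicts cached data that storage
    * holds beyond its own region. Returns the number of bytes granted. */
  def acquireExecution(bytes: Long): Long = {
    val shortfall = math.max(0L, bytes - free)
    val evictable = math.max(0L, storageUsed - storageRegion)
    storageUsed -= math.min(shortfall, evictable) // evict only what is needed
    val granted = math.min(bytes, free)
    executionUsed += granted
    granted
  }

  /** Storage may only take free memory; it never evicts execution. */
  def acquireStorage(bytes: Long): Boolean =
    if (bytes <= free) { storageUsed += bytes; true } else false
}
```

With `total = 100` and `storageRegion = 50`, caching 80 bytes and then requesting 50 bytes for execution pushes storage back down to 50. In the opposite order, once execution holds 80 bytes, a 30-byte cache request is simply refused.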
Source code
Open org.apache.spark.SparkEnv#create and scroll down to memoryManager, which is the memory manager:
val useLegacyMemoryManager = conf.getBoolean("spark.memory.useLegacyMode", false)
val memoryManager: MemoryManager =
  if (useLegacyMemoryManager) {
    new StaticMemoryManager(conf, numUsableCores) // static memory management
  } else {
    UnifiedMemoryManager(conf, numUsableCores) // unified memory management
  }
Step into UnifiedMemoryManager:
def apply(conf: SparkConf, numCores: Int): UnifiedMemoryManager = {
  val maxMemory = getMaxMemory(conf)
  new UnifiedMemoryManager(
    conf,
    maxHeapMemory = maxMemory,
    onHeapStorageRegionSize =
      (maxMemory * conf.getDouble("spark.memory.storageFraction", 0.5)).toLong,
    numCores = numCores)
}
UnifiedMemoryManager extends MemoryManager, which is where the memory-management code can be seen:
/**
 * An abstract memory manager that enforces how memory is shared between execution and storage.
 *
 * In this context, execution memory refers to that used for computation in shuffles, joins,
 * sorts and aggregations, while storage memory refers to that used for caching and propagating
 * internal data across the cluster. There exists one MemoryManager per JVM.
 */
private[spark] abstract class MemoryManager(
    conf: SparkConf,
    numCores: Int,
    onHeapStorageMemory: Long,
    onHeapExecutionMemory: Long) extends Logging {

  // -- Methods related to memory allocation policies and bookkeeping ------------------------------

  @GuardedBy("this")
  protected val onHeapStorageMemoryPool = new StorageMemoryPool(this, MemoryMode.ON_HEAP)
  @GuardedBy("this")
  protected val offHeapStorageMemoryPool = new StorageMemoryPool(this, MemoryMode.OFF_HEAP)
  @GuardedBy("this")
  protected val onHeapExecutionMemoryPool = new ExecutionMemoryPool(this, MemoryMode.ON_HEAP)
  @GuardedBy("this")
  protected val offHeapExecutionMemoryPool = new ExecutionMemoryPool(this, MemoryMode.OFF_HEAP)

  onHeapStorageMemoryPool.incrementPoolSize(onHeapStorageMemory)
  onHeapExecutionMemoryPool.incrementPoolSize(onHeapExecutionMemory)

  protected[this] val maxOffHeapMemory = conf.get(MEMORY_OFFHEAP_SIZE)
  protected[this] val offHeapStorageMemory =
    (maxOffHeapMemory * conf.getDouble("spark.memory.storageFraction", 0.5)).toLong

  offHeapExecutionMemoryPool.incrementPoolSize(maxOffHeapMemory - offHeapStorageMemory)
  offHeapStorageMemoryPool.incrementPoolSize(offHeapStorageMemory)
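
The last four lines of the excerpt split the off-heap budget the same way as the on-heap one: storage gets `storageFraction` of it, and execution gets whatever is left. A minimal standalone sketch of that arithmetic follows; `OffHeapSplit` and `split` are illustrative names, not part of Spark.

```scala
// Sketch of the off-heap split performed in the MemoryManager excerpt above.
// Assumption: storageFraction defaults to 0.5, as in the quoted source.
object OffHeapSplit {
  /** Returns (storageBytes, executionBytes) for a given off-heap budget. */
  def split(maxOffHeapMemory: Long, storageFraction: Double = 0.5): (Long, Long) = {
    val storage = (maxOffHeapMemory * storageFraction).toLong // truncated toward zero
    (storage, maxOffHeapMemory - storage)                     // execution takes the rest
  }
}
```

Because the storage size is truncated and execution is computed by subtraction, the two pools always add up to the configured off-heap size, with any rounding remainder landing on the execution side.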
