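Notes on a JVM Parameter Tuning Session
Introduction
This article shares lessons learned from tuning JVM parameters for a Java application deployed on k8s, along with practical experience using Prometheus in production.
Symptoms
A Java application pod deployed on k8s restarted in the early hours of the morning. The restart reason was OOM (Out Of Memory) killed, even though the JVM parameters capped the heap at 1G and the pod's memory limit was 1.5G.
JVM configuration: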
-Xmx1024M -Xms1024M
k8s Deployment configuration snippet:
resources:
  limits:
    memory: "1.5G"
    cpu: "2000m"
  requests:
    memory: "1.5G"
    cpu: "50m"
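To make the starting budget concrete, here is a minimal sketch of the arithmetic implied by these two settings. It assumes Kubernetes reads "1.5G" as a decimal quantity (1.5 × 10^9 bytes) and that -Xmx1024M means 1024 MiB. The class and method names are illustrative only.

```java
public class PodMemoryBudget {
    static final long MIB = 1024L * 1024L;

    /** Bytes left in the pod for everything outside the Java heap. */
    static long nonHeapHeadroomBytes(long podLimitBytes, long maxHeapBytes) {
        return podLimitBytes - maxHeapBytes;
    }

    public static void main(String[] args) {
        long podLimit = 1_500_000_000L; // memory: "1.5G", decimal suffix (assumed)
        long maxHeap = 1024L * MIB;     // -Xmx1024M
        long headroom = nonHeapHeadroomBytes(podLimit, maxHeap);

        System.out.printf("pod limit:        %d MiB%n", podLimit / MIB);
        System.out.printf("max heap:         %d MiB%n", maxHeap / MIB);
        System.out.printf("non-heap headroom: %d MiB%n", headroom / MIB);
    }
}
```

Under these assumptions only about 406 MiB remains for metaspace, thread stacks, the code cache, direct buffers, and other native allocations once the heap is fully committed.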
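Checking Grafana:
The old generation kept growing, and once it reached a certain size the pod was OOM-killed. The restarts happened while a scheduled job was running. Young GC was normal, and STW pause times were normal as well. Not a single Full GC ever occurred.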

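Solution
The solution first. The root cause of the behavior above is that non-heap memory plus heap memory exceeded the pod's memory limit, which triggered the OOM kill. After optimization and testing, the JVM parameters were adjusted to: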
-XX:NativeMemoryTracking=summary
-XX:+UseContainerSupport
-XX:InitialRAMPercentage=35.0
-XX:MaxRAMPercentage=35.0
-XX:MinRAMPercentage=35.0
-XX:MaxDirectMemorySize=50m
-Xss512k
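Parameter notes:
-XX:NativeMemoryTracking=summary: enables Native Memory Tracking. Once enabled, jcmd <pid> VM.native_memory prints native memory statistics.
-XX:+UseContainerSupport: enables container support so the JVM detects the container's limits automatically (on by default since JDK 10).
-XX:InitialRAMPercentage: sets the initial heap size as a percentage of total memory.
-XX:MaxRAMPercentage: sets the maximum heap size as a percentage of total memory.
-XX:MinRAMPercentage: despite the name, this does not set a minimum heap size. It sets the maximum heap percentage used when the JVM runs with a small amount of memory.
-XX:MaxDirectMemorySize: limits the size of direct memory.
-Xss512k: sets the thread stack size. The default is 1MB on 64-bit Linux.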
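One way to confirm what these flags resolve to is to ask the running JVM. The sketch below compares the heap size a percentage would yield with what the JVM reports, and lists the non-heap memory pools exposed through java.lang.management. The 1.5 × 10^9 byte limit and the class name are assumptions for illustration. The JVM aligns heap sizes, and Runtime.maxMemory() can differ slightly from the configured maximum, so the two numbers should be close rather than identical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class HeapSettingsCheck {
    /** Heap size a given RAM percentage would yield, before any JVM alignment. */
    static long expectedHeapBytes(long containerLimitBytes, double ramPercentage) {
        return (long) (containerLimitBytes * ramPercentage / 100.0);
    }

    /** Maximum heap the running JVM reports. */
    static long actualMaxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        long limit = 1_500_000_000L; // assumed container limit for "1.5G"
        System.out.println("expected max heap (bytes): " + expectedHeapBytes(limit, 35.0));
        System.out.println("actual max heap (bytes):   " + actualMaxHeapBytes());

        // Non-heap pools such as Metaspace and the code heap segments.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (pool.getType() == MemoryType.NON_HEAP && usage != null) {
                System.out.println(pool.getName() + " committed (bytes): " + usage.getCommitted());
            }
        }
    }
}
```

Running it inside the container with the same flags as the application shows whether the percentage settings took effect. The pools listed here cover only part of the native footprint, which is why Native Memory Tracking is still needed for the full picture.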
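Troubleshooting process
1. The first attempt was on the code side: memory usage was optimized and unnecessary Lists were removed. This helped somewhat, but after a while the pod was still OOM-killed.
2. The GC charts showed not a single Full GC, which is unusual and suggests the container was killed before the heap ever filled up. The -XX:MaxRAMPercentage value was then lowered step by step and tested each time; only at 35% did the OOM kills stop.
3. Native memory analysis: add the -XX:NativeMemoryTracking=summary parameter and run jcmd <pid> VM.native_memory. The output is as follows: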
Native Memory Tracking:
(Omitting categories weighting less than 1KB)
Total: reserved=2327934KB, committed=1026082KB
malloc: 222020KB #1294826
mmap: reserved=2105914KB, committed=804062KB
- Java Heap (reserved=430080KB, committed=430080KB)
(mmap: reserved=430080KB, committed=430080KB)
- Class (reserved=1058182KB, committed=39750KB)
(classes #41852)
( instance classes #39902, array classes #1950)
(malloc=9606KB #158280) (peak=9664KB #158273)
(mmap: reserved=1048576KB, committed=30144KB)
( Metadata: )
( reserved=262144KB, committed=206656KB)
( used=203462KB)
( waste=3194KB =1.55%)
( Class space:)
( reserved=1048576KB, committed=30144KB)
( used=26349KB)
( waste=3795KB =12.59%)
- Thread (reserved=99870KB, committed=11394KB)
(thread #98)
(stack: reserved=99586KB, committed=11110KB)
(malloc=171KB #587) (peak=202KB #663)
(arena=113KB #193) (peak=4525KB #141)
- Code (reserved=257472KB, committed=122344KB)
(malloc=9784KB #41244) (peak=9784KB #41245)
(mmap: reserved=247688KB, committed=112560KB)
- GC (reserved=1431KB, committed=1431KB)
(malloc=23KB #79) (peak=1464KB #441)
(mmap: reserved=1408KB, committed=1408KB)
- Compiler (reserved=1168KB, committed=1168KB)
(malloc=1004KB #2662) (peak=1077KB #2672)
(arena=165KB #5) (peak=56457KB #13)
- Internal (reserved=118973KB, committed=118969KB)
(malloc=118933KB #78736) (peak=118940KB #78823)
(mmap: reserved=40KB, committed=36KB)
- Other (reserved=10316KB, committed=10316KB)
(malloc=10316KB #88) (peak=11034KB #93)
- Symbol (reserved=38714KB, committed=38714KB)
(malloc=35989KB #968166) (peak=35989KB #968167)
(arena=2725KB #1) (at peak)
- Native Memory Tracking (reserved=20519KB, committed=20519KB)
(malloc=287KB #5184) (at peak)
(tracking overhead=20232KB)
- Shared class space (reserved=16384KB, committed=12060KB)
(mmap: reserved=16384KB, committed=12060KB)
- Arena Chunk (reserved=174KB, committed=174KB)
(malloc=174KB #283) (peak=59297KB #1646)
- Module (reserved=8121KB, committed=8121KB)
(malloc=8121KB #26192) (at peak)
- Safepoint (reserved=8KB, committed=8KB)
(mmap: reserved=8KB, committed=8KB)
- Synchronization (reserved=864KB, committed=864KB)
(malloc=864KB #8627) (at peak)
- Serviceability (reserved=3KB, committed=3KB)
(malloc=3KB #13) (at peak)
- Metaspace (reserved=265651KB, committed=210163KB)
(malloc=3507KB #4648) (at peak)
(mmap: reserved=262144KB, committed=206656KB)
- String Deduplication (reserved=1KB, committed=1KB)
(malloc=1KB #8) (at peak)
- Object Monitors (reserved=4KB, committed=4KB)
(malloc=4KB #18) (peak=23KB #112)
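Part of this memory is not visible in monitoring: categories such as GC, Internal, Symbol, and Thread do not show up in Grafana, so leave enough headroom for them when setting the memory limit.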
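In the output above, committed memory outside the Java heap is roughly 596,000 KB (1026082 KB total minus 430080 KB of heap), which is more than the heap itself. A minimal sketch for extracting that number from an NMT summary follows. The class name is hypothetical, and the parser assumes the default KB scale and the "- Category (reserved=...KB, committed=...KB)" line layout shown above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NmtSummary {
    // Matches category lines such as "- Thread (reserved=99870KB, committed=11394KB)".
    // Detail lines starting with "(" and the "Total:" line are ignored.
    private static final Pattern CATEGORY = Pattern.compile(
        "(?m)^[ \\t]*-[ \\t]+(.+?)[ \\t]+\\(reserved=(\\d+)KB, committed=(\\d+)KB\\)");

    /** Maps each NMT category name to its committed size in KB. */
    static Map<String, Long> committedKbByCategory(String nmtOutput) {
        Map<String, Long> result = new LinkedHashMap<>();
        Matcher m = CATEGORY.matcher(nmtOutput);
        while (m.find()) {
            result.put(m.group(1), Long.parseLong(m.group(3)));
        }
        return result;
    }

    /** Sums committed KB over every category except the Java heap. */
    static long nonHeapCommittedKb(Map<String, Long> committedKb) {
        long sum = 0;
        for (Map.Entry<String, Long> e : committedKb.entrySet()) {
            if (!e.getKey().equals("Java Heap")) {
                sum += e.getValue();
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        // Reads the jcmd output from standard input.
        String text = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        Map<String, Long> committed = committedKbByCategory(text);
        committed.forEach((name, kb) -> System.out.println(name + ": " + kb + " KB"));
        System.out.println("non-heap committed: " + nonHeapCommittedKb(committed) + " KB");
    }
}
```

On JDK 11 or later it can be fed directly, for example with jcmd <pid> VM.native_memory | java NmtSummary.java. Adding the non-heap figure to the configured maximum heap gives a rough lower bound for the pod memory limit.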
