Netty 直接内存(堆外内存)溢出分析

问题描述

通过压测,发现系统最后会停止响应,不再接收新的请求。

查看日志,发现有如下申请直接内存错误。

io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 2130706439, max: 2147483648)
        at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:742) ~[netty-all-4.1.47.Final.jar!/:4.1.47.Final]
        at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:697) ~[netty-all-4.1.47.Final.jar!/:4.1.47.Final]
        at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:758) ~[netty-all-4.1.47.Final.jar!/:?]
        at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:734) ~[netty-all-4.1.47.Final.jar!/:?]
        at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:245) ~[netty-all-4.1.47.Final.jar!/:?]
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:215) ~[netty-all-4.1.47.Final.jar!/:?]
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:147) ~[netty-all-4.1.47.Final.jar!/:?]
        at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:356) ~[netty-all-4.1.47.Final.jar!/:?]
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187) ~[netty-all-4.1.47.Final.jar!/:?]
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178) ~[netty-all-4.1.47.Final.jar!/:?]
        at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:139) ~[netty-all-4.1.47.Final.jar!/:?]
        at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114) ~[netty-all-4.1.47.Final.jar!/:?]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:147) [netty-all-4.1.47.Final.jar!/:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) [netty-all-4.1.47.Final.jar!/:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650) [netty-all-4.1.47.Final.jar!/:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576) [netty-all-4.1.47.Final.jar!/:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-all-4.1.47.Final.jar!/:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-all-4.1.47.Final.jar!/:?]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-all-4.1.47.Final.jar!/:4.1.47.Final]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.47.Final.jar!/:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]

问题分析

堆内存

首先怀疑是堆内存泄露,通过jmap命令查看堆内存使用情况,发现使用正常。Xmx设置是8G,只使用了954M

如果确认是堆溢出,使用命令jmap -dump:format=b,file=heap2.hprof 28624将堆快照dump出来,再使用MAT/JProfiler等工具分析,同时可以使用jConsole和JVisualVM监控堆内存分配情况。

jmap -heap 17667

Attaching to process ID 17667, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.191-b12

using thread-local object allocation.
Garbage-First (G1) GC with 4 thread(s)

Heap Configuration:
   MinHeapFreeRatio         = 40
   MaxHeapFreeRatio         = 70
   MaxHeapSize              = 8589934592 (8192.0MB)
   NewSize                  = 1363144 (1.2999954223632812MB)
   MaxNewSize               = 5152702464 (4914.0MB)
   OldSize                  = 5452592 (5.1999969482421875MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 268435456 (256.0MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 2097152 (2.0MB)

Heap Usage:
G1 Heap:
   regions  = 4096
   capacity = 8589934592 (8192.0MB)
   used     = 1001361104 (954.9723663330078MB)
   free     = 7588573488 (7237.027633666992MB)
   11.657377518713474% used
G1 Young Generation:
Eden Space:
   regions  = 54
   capacity = 1847590912 (1762.0MB)
   used     = 113246208 (108.0MB)
   free     = 1734344704 (1654.0MB)
   6.129398410896708% used
Survivor Space:
   regions  = 0
   capacity = 0 (0.0MB)
   used     = 0 (0.0MB)
   free     = 0 (0.0MB)
   0.0% used
G1 Old Generation:
   regions  = 429
   capacity = 1113587712 (1062.0MB)
   used     = 888114896 (846.9723663330078MB)
   free     = 225472816 (215.0276336669922MB)
   79.75257686751486% used

43044 interned Strings occupying 5008680 bytes.

堆外内存

分析堆外内存。通过设置-Xmx参数只能限制java进程中Java Heap的内存大小。而java进程的内存是由Java Heap、Class、Thread、Code、GC、Internal、Symbol、Native Memory Tracking、unknown这几部分组成。
比如当系统网络请求过大时,Internal部分的内存会显著上升,占用大量内存。
实际场景中,要结合top,jmap,NMT工具对java进程的内存进行分析。

jcmd配置NMT使用。NMT的全称是Native Memory Tracker ,是一个本地内存跟踪工具。
常用来分析JVM的内存使用情况。NMT功能默认关闭,可以通过以下方式开启:

-XX:NativeMemoryTracking=[off | summary | detail]

配置项 说明
off 默认配置
summary 只收集汇总信息
detail 收集每次调用的信息
注意,根据Java官方文档,开启NMT会有5%-10%的性能损耗;

如果想JVM退出时打印退出时的内存使用情况,可以通过如下配置项:

-XX:+UnlockDiagnosticVMOptions -XX:+PrintNMTStatistics

通过jcmd查看堆NMT内存、类内存、内部内存等详细信息

jcmd 17667  VM.native_memory detail scale=MB >temp.txt

17667:

Native Memory Tracking:

Total: reserved=15306MB, committed=11529MB
-                 Java Heap (reserved=8192MB, committed=5678MB)
                            (mmap: reserved=8192MB, committed=5678MB) 
 
-                     Class (reserved=1146MB, committed=138MB)
                            (classes #23198)
                            (malloc=4MB #56498) 
                            (mmap: reserved=1142MB, committed=134MB) 
 
-                    Thread (reserved=461MB, committed=461MB)
                            (thread #458)
                            (stack: reserved=459MB, committed=459MB)
                            (malloc=2MB #2292) 
                            (arena=1MB #912)
 
-                      Code (reserved=265MB, committed=118MB)
                            (malloc=21MB #25211) 
                            (mmap: reserved=244MB, committed=97MB) 
 
-                        GC (reserved=409MB, committed=315MB)
                            (malloc=73MB #151121) 
                            (mmap: reserved=336MB, committed=243MB) 
 
-                  Compiler (reserved=2MB, committed=2MB)
                            (malloc=2MB #3347) 
 
-                  Internal (reserved=4778MB, committed=4778MB)
                            (malloc=4778MB #218361) 
 
-                    Symbol (reserved=27MB, committed=27MB)
                            (malloc=23MB #249386) 
                            (arena=4MB #1)
 
-    Native Memory Tracking (reserved=11MB, committed=11MB)
                            (tracking overhead=11MB)
 
-                   Unknown (reserved=16MB, committed=0MB)
                            (mmap: reserved=16MB, committed=0MB) 
 
Virtual memory map:
 
[0x00000005c0000000 - 0x00000007c0000000] reserved 8192MB for Java Heap from
    [0x00007fce6c249efb] ReservedSpace::initialize(unsigned long, unsigned long, bool, char*, unsigned long, bool)+0xab
    [0x00007fce6c24a4e2] ReservedHeapSpace::ReservedHeapSpace(unsigned long, unsigned long, bool, char*)+0x52
    [0x00007fce6c216f65] Universe::reserve_heap(unsigned long, unsigned long)+0x65
    [0x00007fce6bd6e681] G1CollectedHeap::initialize()+0x151

发现Internal区域很大。可以通过如下方式,对比内存变化。

设置基线,并观察堆内存变化(包括堆内和堆外)

一般对于堆外内存缓慢增长直到爆炸的情况来说,可以先设一个基线jcmd pid VM.native_memory baseline。

然后等放一段时间后再去看看内存增长的情况,通过jcmd pid VM.native_memory detail.diff(summary.diff)做一下 summary 或者 detail 级别的 diff。这边堆外内存我们重点关注 Internal 的内存增长,如果增长十分明显的话那就是有问题了。

jcmd 17667 VM.native_memory baseline
jcmd 13527 VM.native_memory scale=MB detail.diff   --包含具体的内存分配信息
jcmd 13527 VM.native_memory scale=MB summary.diff  --只包含内存概要信息

通过如上步骤基本确认是堆外内存泄露了。为了进一步监控直接内存,可以使用如下代码,在启动时加载。


package netty.util;

import java.lang.reflect.Field;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.util.ReflectionUtils;

import io.netty.util.internal.PlatformDependent;
/*直接内存监控*/
public class DirectMemoryReporter {
    private static final Logger LOGGER = LoggerFactory.getLogger(DirectMemoryReporter.class);

    private static final int _1K = 1024;

    private static final String BUSINESS_KEY = "netty_direct_memory";

    private AtomicLong directMemory;

    public void init() {
        Field field = ReflectionUtils.findField(PlatformDependent.class, "DIRECT_MEMORY_COUNTER");
        field.setAccessible(true);
        try {
            directMemory = ((AtomicLong) field.get(PlatformDependent.class));
            startReport();
        } catch (IllegalAccessException e) {
            LOGGER.error("netty_direct_memory error", e);
        }
    }

    public void startReport() {

        Runnable runnable = () -> {
            doReport();
        };
        ScheduledExecutorService service = Executors.newSingleThreadScheduledExecutor();
        // 第二个参数为首次执行的延时时间,第三个参数为定时执行的间隔时间
        service.scheduleAtFixedRate(runnable, 30, 1, TimeUnit.SECONDS);
    }

    private void doReport() {
        try {
            int memoryInKB = (int) (directMemory.get());
            LOGGER.info("{}:{}KB", BUSINESS_KEY, memoryInKB/1024);
        } catch (Exception e) {
            LOGGER.error("netty_direct_memory error", e);
        }
    }
}

可以通过new DirectMemoryReporter().init()启动。之后每隔一秒可以在日志中看到直接内存的占用情况。

2021-03-02 09:58:59,073|[INFO]|[pool-11-thread-1]|[]|xxx.DirectMemoryReporter|netty_direct_memory:2032MB(2130706439B)

通过监控,发现随着压测,直接内存占用会不断上涨,直到涨到Xmx设置的值(如果不设置-XX:MaxDirectMemorySize默认与Xmx相同)

定位

上面基本确认了是堆外直接内存泄露问题,下面要具体定位是哪里泄露的,需要使用到pmap(查看内存分配情况)、gdb(根据可能的内存泄露地址解析出实际内容)、strace(分析线程的内存分配情况)、jstack(定位线程的具体信息)

注意,除了pmap出来的信息大小比较小,strace和jstack的日志可能比较大,以当前定位为例,最终jstack日志6G、strace日志1G,传到本地,文本编辑器打不开。因此尽量在linux环境下进行操作。

1、启动内存pmap、内存分配strace、进程jstack跟踪

先使用pmap -X pid >pmap1.txt 保存当前内存情况,同时启动strace和jstack跟踪

当跑性能跑出溢出后,再使用pmap -X pid >pmap2.txt 保存溢出时的内存分配

pmap

pmap -X pid >> pmap1.txt

strace

在压测前执行该命令,并保持会话

strace -o 日志 -p pid -Ff -Tt -e trace=brk,mmap,munmap

jstack

通过脚本test.sh,每隔一段时间打印一次线程快照

#!/bin/bash

while [ 1 ];
do
    jstack -l 16067 >> /opt/jstack_0301.txt
    sleep 10
done

./test.sh &

通过对比找出可能的泄露的内存地址

image-20210302112208575

通过分析

7f4674483000 rw-p 00000000 00:00 0 1575296 1574868 1574868 1535980 1574868 0 0

这一行增加的数据很大

2、gdb获取可能泄露的内存

通过偏移计算,使用gdb获取可能泄露的内存的内容,此步需要确认泄露的内存地址,如果内容中有时间,将时间计算出来

gdb

结束地址可以直接使用10进制设置偏移,如起始地址是0x7f4674483000,偏移1G 即偏移1x1024x1024x1024(1073741824),则结束地址为0x7f4674483000+1073741824,注意地址要以0x开头,pmap出来的地址省略了0x

gdb attach pid
c
dump memory 文件名 开始地址 结束地址

打开文件就可以看到解析的内存内容

可以通过vi/hexdump命令查看

下面是实例

[root@host-x-x-x-x ~]# gdb attach 16067
GNU gdb (GDB) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
···省略
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
--Type <RET> for more, q to quit, c to continue without paging--c   #此处输入c,并回车,则进入gdb

(gdb) dump memory out.bin 0x7f4674483000 0x7f4674483000+1073741824 #解析对应内存内容,一般需要计算偏移,如64M
(gdb) q #输入q,退出
A debugging session is active.

	Inferior 1 [process 16067] will be detached.

Quit anyway? (y or n) y #输入y,退出
Detaching from program: /root/jdk-8u191/bin/java, process 16067
[Inferior 1 (process 16067) detached]

解析后的文件名为out.bin,打开后可以查看解析后的内容

[root@host-x-x-x-x ~]# ls -alh out.bin
-rw------- 1 root root 1.0G Mar  2 11:01 out.bin
[root@host-10-33-144-227 ~]# vi out.bin
 ^@^@^@^@^@^@^@^@^B^P^@^A^@^@^@^@^@^@^@^A^@^@^@^@^G^@^@ó^@^@^@^@DATA^M
      2 xxx@xxx^M
      3 To: xxx^M
      4 Message-ID: <651317388.574761.1614644538528.JavaMail.root@host-xxx>^M
      5 Subject: =?UTF-8?B?5rWL6K+V6YKu5Lu2?=^M
      6 MIME-Version: 1.0^M
      7 Content-Type: multipart/mixed; ^M
      8         boundary="----=_Part_574760_1068795689.1614644538528"^M
      9 ^M
     10 ------=_Part_574760_1068795689.1614644538528^M
     11 Content-Type: text/plain; charset=UTF-8^M
     12 Content-Transfer-Encoding: base64^M
     13 ^M
     14 6L+Z5piv5LiA5p2h5rWL6K+V6YKu5Lu277yBClRoaXMgaXMgYSB0ZXN0IGVtYWlsIQDATA^
[root@host-10-33-144-227 ~]# hexdump -C out.bin #使用十六进制查看
00219420  41 55 54 48 20 4c 4f 47  49 4e 0d 0a 40 68 6d 61  |AUTH LOGIN..@xx|
00219430  69 6c 2e 68 75 61 77 65  69 2e 63 6f 6d 0d 0a 54  |xx.xxx.com..T|
00219440  6f 3a 20 6c 68 6e 39 34  31 34 40 68 6d 61 69 6c  |o: xxx|
00219450  2e 68 75 61 77 65 69 2e  63 6f 6d 0d 0a 4d 65 73  |.xxx.com..Mes|
00219460  73 61 67 65 2d 49 44 3a  20 3c 31 33 38 38 32 34  |sage-ID: <138824|
00219470  33 34 39 2e 35 38 31 31  31 33 2e 31 36 31 34 36  |349.581113.16146|
00219480  34 34 39 35 33 32 32 30  2e 4a 61 76 61 4d 61 69  |44953220.JavaMai|
00219490  6c 2e 72 6f 6f 74 40 68  6f 73 74 2d 31 30 2d 33  |l.root@host-10-3|
002194a0  33 2d 31 34 34 2d 31 39  30 3e 0d 0a 53 75 62 6a  |3-144-190>..Subj|
002194b0  65 63 74 3a 20 3d 3f 55  54 46 2d 38 3f 42 3f 35  |ect: =?UTF-8?B?5|
002194c0  72 57 4c 36 4b 2b 56 36  59 4b 75 35 4c 75 32 3f  |rWL6K+V6YKu5Lu2?|
002194d0  3d 0d 0a 4d 49 4d 45 2d  56 65 72 73 69 6f 6e 3a  |=..MIME-Version:|
002194e0  20 31 2e 30 0d 0a 43 6f  6e 74 65 6e 74 2d 54 79  | 1.0..Content-Ty|
002194f0  70 65 3a 20 6d 75 6c 74  69 70 61 72 74 2f 6d 69  |pe: multipart/mi|
00219500  78 65 64 3b 20 0d 0a 09  62 6f 75 6e 64 61 72 79  |xed; ...boundary|
00219510  3d 22 2d 2d 2d 2d 3d 5f  50 61 72 74 5f 35 38 31  |="----=_Part_581|
00219520  31 31 32 5f 37 38 38 34  33 32 36 31 34 2e 31 36  |112_788432614.16|
00219530  31 34 36 34 34 39 35 33  32 32 30 22 0d 0a 0d 0a  |14644953220"....|
00219540  2d 2d 2d 2d 2d 2d 3d 5f  50 61 72 74 5f 35 38 31  |------=_Part_581|
00219550  31 31 32 5f 37 38 38 34  33 32 36 31 34 2e 31 36  |112_788432614.16|
00219560  31 34 36 34 34 39 35 33  32 32 30 0d 0a 43 6f 6e  |14644953220..Con|
00219570  74 65 6e 74 2d 54 79 70  65 3a 20 74 65 78 74 2f  |tent-Type: text/|
00219580  70 6c 61 69 6e 3b 20 63  68 61 72 73 65 74 3d 55  |plain; charset=U|
00219590  54 46 2d 38 0d 0a 43 6f  6e 74 65 6e 74 2d 54 72  |TF-8..Content-Tr|
002195a0  61 6e 73 66 65 72 2d 45  6e 63 6f 64 69 6e 67 3a  |ansfer-Encoding:|
002195b0  20 62 61 73 65 36 34 0d  0a 0d 0a 36 4c 2b 5a 35  | base64....6L+Z5|
002195c0  70 69 76 35 4c 69 41 35  70 32 68 35 72 57 4c 36  |piv5LiA5p2h5rWL6|
002195d0  4b 2b 56 36 59 4b 75 35  4c 75 32 37 37 79 42 43  |K+V6YKu5Lu277yBC|
002195e0  6c 52 6f 61 58 4d 67 61  58 4d 67 59 53 42 30 5a  |lRoaXMgaXMgYSB0Z|
002195f0  58 4e 30 49 47 56 74 59  57 6c 73 49 51 3d 3d 0d  |XN0IGVtYWlsIQ==.|
00219600  0a 2d 2d 2d 2d 2d 2d 3d  5f 50 61 72 74 5f 35 38  |.------=_Part_58|
00219610  31 31 31 32 5f 37 38 38  34 33 32 36 31 34 2e 31  |1112_788432614.1|
00219620  36 31 34 36 34 34 39 35  33 32 32 30 2d 2d 0d 0a  |614644953220--..|
00219630  2e 0d 0a 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00219640  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00219820  45 48 4c 4f 20 68 6f 73  74 2d 31 30 2d 33 33 2d  |EHLO host-xxx-|
00219830  31 34 34 2d 31 39 30 0d  0a 00 00 00 00 00 00 00  |xx-xx.........|
00219840  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*

通过内容,确认数据时间(注意到秒,后面3位毫秒值去掉),如

[root@host-x-x-x-x ~]# date -d @1614625596
Tue Mar  2 03:06:36 CST 2021

3、strace查看申请内存的pid

通过strace日志,查找上一步得到的泄露的内存地址是哪个进程申请的,通过搜索内存地址,可以查看是否该地址只被该进行mmap没有munmap,如果没有进行释放,则怀疑是其导致内存泄露

通过上面分析,内存0x7f4674483000确认发生泄露,查找泄露内存的线程

[root@host-x-x-x-x opt]# grep "0x7f4674483000" strace_0301.txt 
25897 08:22:40.071076 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4674483000 <0.000524>

由查找结果,发现线程25897只使用了mmap申请内存(大小为16M),未进行释放munmap,存在泄漏

通过申请内存的大小16781312,我们可以过滤一下

[root@host-10-33-144-227 opt]# grep 16781312 strace_0301.txt |wc -l
96
[root@host-10-33-144-227 opt]# grep 16781312 strace_0301.txt 
25900 22:48:40.293407 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46d34e2000 <0.001026>
25898 22:48:40.779514 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46d24e1000 <0.000931>
25895 22:48:42.072311 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25894 22:48:43.126104 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46d04df000 <0.000846>
25896 22:48:43.453893 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25899 22:48:54.240661 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25893 22:48:55.871836 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25897 22:48:58.918825 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46cc4db000 <0.000926>
25899 23:40:52.070042 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25900 23:40:52.076028 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46ca4d9000 <0.001435>
25894 23:40:53.138579 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25895 23:40:53.993883 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25897 23:40:54.021415 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46c74d6000 <0.000961>
25896 23:40:55.046603 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46c64d5000 <0.001035>
25898 23:41:01.209776 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46c54d4000 <0.001205>
25893 23:41:38.545986 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25898 00:32:43.584853 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25899 00:32:45.662573 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25900 00:32:46.247013 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25896 00:32:47.807808 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46c04cf000 <0.000738>
25894 00:32:48.105049 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46bf4ce000 <0.000832>
25895 00:32:48.646481 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46be4cd000 <0.000840>
25897 00:32:51.705296 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25893 00:33:31.796557 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25898 01:24:53.214635 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25899 01:24:54.268791 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25900 01:24:55.509481 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46b94c8000 <0.001084>
25894 01:24:55.975972 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25897 01:24:56.313316 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46b74c6000 <0.000877>
25896 01:24:57.998530 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46b64c5000 <0.001148>
25895 01:25:02.973712 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25893 01:25:39.658599 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46b44c3000 <0.000751>
25898 02:16:08.550683 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46b34c2000 <0.000872>
25894 02:16:09.253646 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46b24c1000 <0.000981>
25900 02:16:09.868975 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25897 02:16:10.299096 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46b04bf000 <0.000774>
25895 02:16:10.898468 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46af4be000 <0.000921>
25896 02:16:11.490967 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46ae4bd000 <0.000816>
25899 02:16:16.064957 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46ad4bc000 <0.001132>
25893 02:16:56.362348 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25898 03:07:37.304262 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46ab4ba000 <0.000832>
25899 03:07:37.985234 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46aa4b9000 <0.000840>
25895 03:07:40.860608 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46a94b8000 <0.001047>
25900 03:07:41.419694 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25896 03:07:43.400255 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25897 03:07:43.572092 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46a64b5000 <0.001165>
25894 03:07:49.068627 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46a54b4000 <0.001139>
25893 03:08:22.889127 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25899 04:00:37.946945 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46a34b2000 <0.001101>
25900 04:00:41.183475 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25898 04:00:41.470902 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46a14b0000 <0.000641>
25896 04:00:42.676048 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25894 04:00:43.223683 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f469f4ae000 <0.001021>
25897 04:00:45.400673 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f469e4ad000 <0.000564>
25895 04:00:46.895991 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25893 04:01:24.135505 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25898 04:52:44.120418 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f469b4aa000 <0.000958>
25894 04:52:46.038504 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f469a4a9000 <0.001086>
25900 04:52:46.404094 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46994a8000 <0.001060>
25896 04:52:46.780690 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25895 04:52:47.319693 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25897 04:52:47.591897 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25899 04:52:54.528811 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46954a4000 <0.000924>
25893 04:53:27.404271 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25899 05:44:25.134003 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25898 05:44:26.525414 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25900 05:44:27.165630 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25894 05:44:28.995850 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f469049f000 <0.000962>
25896 05:44:31.020882 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25897 05:44:31.657577 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25895 05:44:33.198010 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f468d49c000 <0.000796>
25893 05:45:08.843821 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25898 06:37:21.368773 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f468b49a000 <0.000512>
25899 06:37:23.990442 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f468a499000 <0.000969>
25894 06:37:24.215510 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4689498000 <0.001021>
25897 06:37:26.174882 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25895 06:37:26.414563 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4687496000 <0.000920>
25900 06:37:26.507630 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25896 06:37:39.847076 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25893 06:37:51.091554 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25894 07:31:27.933343 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25896 07:31:29.163123 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4682491000 <0.000441>
25898 07:31:29.324907 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4681490000 <0.000298>
25900 07:31:31.294559 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25893 07:31:31.443325 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25899 07:31:39.104532 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f467e48d000 <0.000974>
25895 07:31:43.078283 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25897 07:31:52.607503 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f467c48b000 <0.001193>
25898 08:22:19.669250 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f467b48a000 <0.000921>
25900 08:22:20.093744 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f467a489000 <0.000832>
25896 08:22:22.625889 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4679488000 <0.001093>
25893 08:22:23.076138 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25894 08:22:27.951371 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25895 08:22:38.615218 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25899 08:22:38.734819 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4675484000 <0.000974>
25897 08:22:40.071076 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4674483000 <0.000524>

申请的总内存为1536.375MB(16781312*96=1611005952),发现与第2步中分析可能泄露的内存

7f4674483000 rw-p 00000000 00:00 0 1575296 1574868 1574868 1535980 1574868 0 0

中的大小1575296/1024=1538.375MB,基本相等。由此推断,这些内存均是泄露的内存

4、jstack日志查找线程

通过上一步骤获取的pid 25897,转换16进制后6529(`printf "%x\n" 25897),到jstack日志中查找,通过对比上一步骤的时间与jstack日志的时间及线程号,找到其对应的线程(使用的是线程池,会重复利用)

[root@host-x-x-x-x opt]# printf "%x\n" 25897
6529
[root@host-x-x-x-x opt]# grep 6529 -C 10 jstack_0301.txt |more
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- None

"WorkerPetalGroup-9-5" #236 prio=5 os_prio=0 tid=0x0000000002d4b000 nid=0x6529 waiting for monitor entry [0x00007f47142cc000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.activemq.ActiveMQSession.send(ActiveMQSession.java:1906)
	- waiting to lock <0x000000009bfd68b8> (a java.lang.Object)
	at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:288)
	at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:223)
	at org.apache.activemq.ActiveMQMessageProducerSupport.send(ActiveMQMessageProducerSupport.java:269)
	at org.apache.james.queue.activemq.ActiveMQMailQueue.produceMail(ActiveMQMailQueue.java:203)
	at org.apache.james.queue.jms.JMSMailQueue.enQueue(JMSMailQueue.java:275)
	at org.apache.james.queue.api.MailQueue.enQueue(MailQueue.java:96)
	at org.apache.james.queue.jms.JMSMailQueue.enQueue(JMSMailQueue.java:304)
--
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- None

"WorkerPetalGroup-9-5" #236 prio=5 os_prio=0 tid=0x0000000002d4b000 nid=0x6529 waiting for monitor entry [0x00007f47142cc000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.activemq.ActiveMQSession.send(ActiveMQSession.java:1906)
	- waiting to lock <0x000000009bfd68b8> (a java.lang.Object)
	at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:288)
	at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:223)
	at org.apache.activemq.ActiveMQMessageProducerSupport.send(ActiveMQMessageProducerSupport.java:269)
	at org.apache.james.queue.activemq.ActiveMQMailQueue.produceMail(ActiveMQMailQueue.java:203)
	at org.apache.james.queue.jms.JMSMailQueue.enQueue(JMSMailQueue.java:275)
	at org.apache.james.queue.api.MailQueue.enQueue(MailQueue.java:96)
	at org.apache.james.queue.jms.JMSMailQueue.enQueue(JMSMailQueue.java:304)
--
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)

可以发现该线程WorkerPetalGroup-9-5为我们自定义的Netty线程,为了进一步确认线程,需要根据时间,如22:48:58

25897 22:48:58.918825 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46cc4db000 <0.000926>

在jstack的日志中搜索2021-03-01 22:48:5,因为jstack每10秒打一次,从下面的搜索结果向下搜索线程 ,当前行号是8093667

找到如下内容

8114644 "WorkerPetalGroup-9-5" #236 prio=5 os_prio=0 tid=0x0000000002d4b000 nid=0x6529 runnable [0x00007f47142cd000]
8114645    java.lang.Thread.State: RUNNABLE
8114646         at sun.nio.ch.NativeThread.current(Native Method)
8114647         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:326)
8114648         - locked <0x000000009a3d1f10> (a java.lang.Object)
8114649         - locked <0x000000009a3d1ef0> (a java.lang.Object)
8114650         at io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:253)
8114651         at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1133)
8114652         at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:350)
8114653         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148)
8114654         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
8114655         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
8114656         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
8114657         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
8114658         at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
8114659         at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
8114660         at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
8114661         at java.lang.Thread.run(Thread.java:748)
8114662 
8114663    Locked ownable synchronizers:
8114664         - None

通过以上样例发现,内存的申请都是调用了

io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)

在代码中查找

io.netty.channel.nio.AbstractNioByteChannel.NioByteUnsafe#read

发现在该方法中申请了内存

5、分析代码

由于netty的堆外内存回收依赖于ByteBuf的引用计数器,可以通过refCnt()方法获取,由于继承了io.netty.channel.SimpleChannelInboundHandler#channelRead,该方法会调用释放方法释放堆外内存(当引用计数为0时才会释放)

   @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
        boolean release = true;
        try {
            if (acceptInboundMessage(msg)) {
                @SuppressWarnings("unchecked")
                I imsg = (I) msg;
                channelRead0(ctx, imsg);
            } else {
                release = false;
                ctx.fireChannelRead(msg);
            }
        } finally {
            if (autoRelease && release) {
                ReferenceCountUtil.release(msg);
            }
        }
    }

通过在第15行打断点,观察msg在调用ReferenceCountUtil.release(msg);前和调用后的refCnt()的值,如果调用后该值仍不为0,则会导致堆外内存无法释放。可以在代码中查看是否使用了msg.retain()方法(该方法会使引用计数+1)。

其他

自定义Netty线程名称

由于有组件也依赖于Netty,为了区分线程,需要设置业务线程名称,如下

 // 指定线程池并指定线程名称
bossGroup = new NioEventLoopGroup(new DefaultThreadFactory("BossPetalGroup"));
workerGroup = new NioEventLoopGroup(ioWorker, new DefaultThreadFactory("WorkerPetalGroup"));

涉及的工具

1、可以使用JVisualVM/JConsole监控JVM资源
2、堆可以使用MAT(MemoryAnalyzer Tool)分析泄露堆内存,线程资源;使用JProfiler分析对象实例信息
3、堆外内存需要使用NMT(Native Memory Tracing)结合jcmd命令(比jmap更强大)
4、内存分析可以使用Linux工具pmap结合strace跟踪内存分配,内存内容使用gdb进行解析
5、线程使用jstack结合4中的工具分析
6、使用jps查看java进程,jinfo -flags查看JVM参数信息
7、通过free -g查看系统内存情况
8、netstat -aon|grep port查看端口情况
9、jstat -gcutil 31974 1000 10查看JVM GC情况

Netty防止内存泄露常识

InBound Message
在AbstractNioByteChannel.NioByteUnsafe.read() 处创建了ByteBuf并调用 pipeline.fireChannelRead(byteBuf) 送入Handler链。

根据上面的谁最后谁负责原则,每个Handler对消息可能有三种处理方式

  • 对原消息不做处理,调用 ctx.fireChannelRead(msg)把原消息往下传,那不用做什么释放。
  • 将原消息转化为新的消息并调用 ctx.fireChannelRead(newMsg)往下传,那必须把原消息release掉。
  • 如果已经不再调用ctx.fireChannelRead(msg)传递任何消息,那更要把原消息release掉。

假设每一个Handler都把消息往下传,Handler并也不知道谁是启动Netty时所设定的Handler链的最后一员,所以Netty在Handler链的最末补了一个TailHandler,如果此时消息仍然是ReferenceCounted类型就会被release掉。

Netty之有效规避内存泄漏
Java – Debugging Native Memory Leaks

本博客用于技术学习,所有资源都来源于网络,部分是转发,部分是个人总结。欢迎共同学习和转载,转载请在醒目位置标明原文。如有侵权,请留言告知,及时撤除。
posted @ 2021-03-02 19:56  风动静泉  阅读(1836)  评论(0编辑  收藏  举报