spring-boot-starter-data-redis OutOfDirectMemoryError 堆外内存泄漏

一、现象

   起因:应用经常重启,JVM 内存是比较稳定,查看错误日志发现是堆外内存泄漏,而且观看相关的 trace,只有这个用户会导致OutOfDirectMemoryError

继续排查现象

  1. 查看 redis 出口流量远没有这么大
  2. 查看对应堆栈的数据的 key,也木有这么大。900KB 左右(这个很大,也是促成现象的原因之一)
  3. 该用户的处理很频繁

我们的spring-boot-starter-data-redis使用的5.1.4.RELEASE版本的lettuce-core连接redis。luttuce 使用 netty,而netty中的
io.netty.util.internal.PlatformDependent#DIRECT_MEMORY_COUNTER
记录着当前使用的堆外内存大小。然后我们使用 arthas 查看重启之后一段时间的内存大小

getstatic io.netty.util.internal.PlatformDependent DIRECT_MEMORY_COUNTER -x 1

然后再隔一段时间,发现它不会回收。感觉问题是出在这里了。一直增长导致OutOfDirectMemoryError

二、原因

问:为什么会重启?
答:堆外内存过大导致超过限制重启

问:为什么堆外内存会泄漏?
答:因为一直增长不回收,最终导致 OutOfDirectMemoryError

问:为什么只有这个店铺会?这个答案在后面揭晓

三、如何快速处理

  本着减少影响面,第一时间把这个用户从线上环境剥离到其他环境,这样其他用户就不会受到影响,而且排查起来也方便。

四、如何复现

  那么我们来梳理一下复现的条件

  1. 某用户的特殊大key配置
  2. 较高的频率获取缓存
  3. 5.1.4.RELEASE版本的lettuce-core
    那么看起来没有什么特殊的,可以放本地redis然后起项目复现一波。
    @Test
    public void testOOD() {
        ExecutorService executorService = Executors.newFixedThreadPool(10);

        for (int i = 0; i < 10; i++) {
            executorService.execute(() -> {
                System.out.println(cacheTest.selectxxxxList(param).size());
            });
        }

        System.in.read();
    }
    
    @Cacheable(cacheNames = "cache:settings", key = "#param")
    public List<SettingsDTO> selectSettingsList(Long param) {
        // 已经在redis内提前插入数据了,所以这里是空
       return Lists.newArrayList();
    }

果然,稳定复现

本地测试 -XX:MaxDirectMemorySize=5m

2022-03-05 18:12:50.272  WARN 51615 --- [ioEventLoop-4-1] io.lettuce.core.protocol.CommandHandler  : [null, io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 2097152 byte(s) of direct memory (used: 3155342, max: 5242880), io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 2097152 byte(s) of direct memory (used: 3155342, max: 5242880)] Unexpected exception during request: [null, io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 2097152 byte(s) of direct memory (used: 3155342, max: 5242880), io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 2097152 byte(s) of direct memory (used: 3155342, max: 5242880)]
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:688)
	at com.berserk.CacheTest$$EnhancerBySpringCGLIB$$65af824b.selectSettingsList(<generated>)
	at com.berserk.ChatListenerTest.lambda$test$0(ChatListenerTest.java:33)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: io.lettuce.core.RedisException: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 2097152 byte(s) of direct memory (used: 3155342, max: 5242880)
	at io.lettuce.core.LettuceFutures.awaitOrCancel(LettuceFutures.java:129)
	at io.lettuce.core.FutureSyncInvocationHandler.handleInvocation(FutureSyncInvocationHandler.java:69)
	at io.lettuce.core.internal.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:80)
	at com.sun.proxy.$Proxy82.get(Unknown Source)
	at org.springframework.data.redis.connection.lettuce.LettuceStringCommands.get(LettuceStringCommands.java:66)
	... 19 more
Caused by: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 2097152 byte(s) of direct memory (used: 3155342, max: 5242880)
	at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:655)
	at io.netty.util.internal.PlatformDependent.reallocateDirectNoCleaner(PlatformDependent.java:628)
	at io.netty.buffer.UnpooledUnsafeNoCleanerDirectByteBuf.reallocateDirect(UnpooledUnsafeNoCleanerDirectByteBuf.java:34)
	at io.netty.buffer.UnpooledUnsafeNoCleanerDirectByteBuf.capacity(UnpooledUnsafeNoCleanerDirectByteBuf.java:51)
	at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:299)
	at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:278)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1103)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1096)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1087)
	at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:554)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:677)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:612)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:529)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:491)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	... 1 more

五、问题根源

核心流程如下:

  产生问题的代码在io.lettuce.core.protocol.CommandHandler#decode(io.netty.channel.ChannelHandlerContext, io.netty.buffer.ByteBuf)

核心代码如下:

    
    
protected void decode(ChannelHandlerContext ctx, ByteBuf buffer) throws InterruptedException {
......
    
    //判断是否可解码(deque不为空 && buffer可读---buffer的writerIndex > readerIndex)
        while (canDecode(buffer)) {

            // 查询最先入栈的command
            RedisCommand<?, ?, ?> command = stack.peek();
            if (debugEnabled) {
                logger.debug("{} Stack contains: {} commands", logPrefix(), stack.size());
            }

            pristine = false;

            try {
            	// 解码
                if (!decode(ctx, buffer, command)) {
                    return;
                }
            } catch (Exception e) {

                ctx.close();
                throw e;
            }

           ...

                if (canComplete(command)) {
                // 解析成功则把最开始的cmd给出栈
                    stack.poll();

                    try {
                    	// 通知并返回业务线程
                        complete(command);
                    } catch (Exception e) {
                        logger.warn("{} Unexpected exception during request: {}", logPrefix, e.toString(), e);
                    }
                }
           

            afterDecode(ctx, command);
        }

        //重置读写下标,writerIndex = readerIndex = 0,
    //或者writerIndex -= readerIndex,readerIndex= 0
        if (buffer.refCnt() != 0) {
            buffer.discardReadBytes();
        }
    }

 /**
     * 判断是否可以解码
     *
     * @param buffer
     * @return
     */
    protected boolean canDecode(ByteBuf buffer) {
        return !stack.isEmpty() && buffer.isReadable();
    }
    
    
    

  由图上可以看出,使用的是同一个channel,还有同一个ByteBuf,这里问题是出现在步骤5中,如上decode的代码中得出一个极端场景
当canDecode(buffer)一直为true(deque不为空 && buffer的writerIndex > readerIndex)时,此时就不会重置读写下标,导致buffer会一直扩容。再结合上我们这个用户并发还挺高,所以一直居高不下,回收不了。

六、如何解决

目前是有两个方案:

  1. 改为jedis
  2. 升级lettuce到5.2.0.RELEASE及之后的版本,至于为什么5.2.0.RELEASE不会出现这种消费不过来就OutOfDirectMemoryError嘛。那是因为它在decode的时候,如果发现不能decode的话,就会强制执行discardReadBytes重置读写下标,也就是在,就不会去叠加开辟堆外内存。
  <dependency>
            <groupId>io.lettuce</groupId>
            <artifactId>lettuce-core</artifactId>
            <version>5.2.0.RELEASE</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-redis</artifactId>
            <exclusions>
                <exclusion>
                    <groupId>io.lettuce</groupId>
                    <artifactId>lettuce-core</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

https://github.com/redis/lettuce/issues/906
这是在github上面的相似issues,事实上也是这位老哥后面pr修复了。

七、后续优化

  1. 减少无效字段
  2. 多重缓存
posted @ 2025-01-22 15:06  昔吾  阅读(195)  评论(0)    收藏  举报