Uploading the heap dump to Alibaba Cloud OSS after a program OOM in K8S (Part 2)

Using a Sidecar in K8S

Create a secret to store the Alibaba Cloud RAM user's OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET, which the traeAliOSS command needs.

kubectl create secret generic aliyun-credentials \
  --from-literal=OSS_ACCESS_KEY_ID=aaaaaaaaaaaa \
  --from-literal=OSS_ACCESS_KEY_SECRET=bbbbbbbbbbbb

Create the deployment. The memory-leak-test image is the one I use to trigger an OOM; it is a Java program that was also written with Trae.

# The configuration below uses a Sidecar; the newly added parts are marked with comments
cat deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: memory-leak-test
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: memory-leak-test
  template:
    metadata:
      labels:
        app: memory-leak-test
    spec:
      terminationGracePeriodSeconds: 10 
      imagePullSecrets:
      - name: harbor

      # New: a volume used to share the directory that holds the dump
      volumes:
      - name: shared-tmp
        emptyDir: {}

      containers:
      - name: memory-leak-test
        image: harbor.klvchen.com/tmp/memory-leak-test:12
        resources:
          limits:
            cpu: 1
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 128Mi
        livenessProbe:
          tcpSocket:
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10

        # New: mount the shared volume at the directory where the dump is written
        volumeMounts:
        - name: shared-tmp
          mountPath: /tmp
      
      # New: configuration of the Sidecar container
      - name: tmp-watcher-sidecar
        image: harbor.klvchen.com/library/alpine-ossutil:0.1
        env:
        - name: OSS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: aliyun-credentials
              key: OSS_ACCESS_KEY_ID
        - name: OSS_ACCESS_KEY_SECRET
          valueFrom:
            secretKeyRef:
              name: aliyun-credentials
              key: OSS_ACCESS_KEY_SECRET
        - name: ALI_OSS_BUCKET
          value: klvchen-test
        volumeMounts:
        - name: shared-tmp
          mountPath: /tmp

# Start the service
kubectl apply -f deployment.yaml
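
The script that runs inside the alpine-ossutil Sidecar image is not shown in this post. As a rough idea of what such a watcher might look like, here is a minimal sketch. It assumes that ossutil in the image is already configured with the credentials and the endpoint, and that the object key layout (hostname plus file name) is acceptable. The names upload_finished_dumps, UPLOAD_CMD, SETTLE_SECONDS, WATCH_DIR and RUN_WATCHER are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Hypothetical Sidecar watcher; not the actual script shipped in alpine-ossutil.

WATCH_DIR="${WATCH_DIR:-/tmp}"                    # the shared emptyDir mount
UPLOAD_CMD="${UPLOAD_CMD:-upload_with_ossutil}"   # can be replaced by a stub for local tests

# Upload one file with ossutil. Assumes ossutil already knows the credentials
# and the endpoint; the object key layout is an assumption of this sketch.
upload_with_ossutil() {
  ossutil cp "$1" "oss://${ALI_OSS_BUCKET}/$(hostname)/$(basename "$1")"
}

# Print the size of file $1 in bytes.
file_size() {
  wc -c < "$1" | tr -d ' '
}

# Upload every *.dump file in directory $1 whose size has stopped changing,
# then rename it so that it is not uploaded a second time.
upload_finished_dumps() {
  dir="$1"
  for f in "$dir"/*.dump; do
    [ -f "$f" ] || continue                 # the glob matched nothing
    before=$(file_size "$f")
    sleep "${SETTLE_SECONDS:-5}"
    after=$(file_size "$f")
    [ "$before" = "$after" ] || continue    # the JVM is probably still writing
    if "$UPLOAD_CMD" "$f"; then
      mv "$f" "$f.uploaded"
    fi
  done
}

# The polling loop only starts when explicitly requested.
if [ "${RUN_WATCHER:-0}" = "1" ]; then
  while true; do
    upload_finished_dumps "$WATCH_DIR"
    sleep 10
  done
fi
```

Comparing the file size before and after a short pause is a simple way to avoid uploading a half-written dump; a production script might prefer inotify or a longer settle time for large heaps.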

Notes:

1. The business container (memory-leak-test in my case) must be started with "-XX:+HeapDumpOnOutOfMemoryError" and "-XX:HeapDumpPath=/tmp/memory-leak-test.dump", so that a dump is generated on OOM and written to /tmp, the directory mounted in the Deployment.

Below is my Dockerfile for memory-leak-test, for reference:
cat Dockerfile
FROM harbor.junengcloud.com/openfaas/openjdk:11
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai  /etc/localtime && echo "Asia/Shanghai" >/etc/timezone
ADD memory-leak-test-1.0-SNAPSHOT.jar memory-leak-test.jar
ENTRYPOINT ["java", "-Xmx128m", "-Xms128m", "-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=/tmp/memory-leak-test.dump", "-Djava.security.egd=file:/dev/./urandom", "-Dentry.timezone=GMT+08", "-jar", "/memory-leak-test.jar"]
EXPOSE 8000

2. The directory mounted into the business container in the deployment must match "-XX:HeapDumpPath=/tmp/memory-leak-test.dump", and the dump file name must end with .dump.
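
The two notes above can be checked mechanically before building the image. The following is only a sketch: heap_dump_path and dump_path_ok are hypothetical helper names, and it assumes the dump is written directly into the mounted directory rather than into a subdirectory.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the JVM arguments and the mount path.

# Print the value of -XX:HeapDumpPath found in the given argument list.
heap_dump_path() {
  for arg in "$@"; do
    case "$arg" in
      -XX:HeapDumpPath=*)
        echo "${arg#-XX:HeapDumpPath=}"
        return 0
        ;;
    esac
  done
  return 1   # the flag is missing
}

# Succeed when dump path $1 sits directly inside mount directory $2
# and its file name ends with .dump.
dump_path_ok() {
  [ "$(dirname "$1")" = "${2%/}" ] || return 1
  case "$(basename "$1")" in
    *.dump) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `dump_path_ok "$(heap_dump_path java -XX:HeapDumpPath=/tmp/memory-leak-test.dump -jar /memory-leak-test.jar)" /tmp` succeeds, while a path under /data or a file name ending in .hprof fails.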

Testing

memory-leak-test is set up so that a request to /leak starts the memory leak.
After a while, the program runs into an OOM.
Log in to Alibaba Cloud OSS and you can see that the dump has been uploaded successfully.
As a next step, a DingTalk alert can be configured to notify the relevant people when an OOM occurs.
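
One possible shape for such an alert, sketched under several assumptions: a DingTalk custom robot has been created, its webhook URL is available in a hypothetical DINGTALK_WEBHOOK environment variable, the robot's security setting (keyword, signature or IP allow-list) accepts the message, and curl is installed in the Sidecar image. The function names are made up for this example, and the payload builder only handles single-line text.

```shell
#!/bin/sh
# Hypothetical DingTalk notification helper for the Sidecar.

# Build the JSON body of a text message; escapes backslashes and double quotes.
dingtalk_payload() {
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"msgtype":"text","text":{"content":"%s"}}' "$escaped"
}

# Post the message to the robot webhook stored in DINGTALK_WEBHOOK.
send_dingtalk_alert() {
  curl -sS -X POST "$DINGTALK_WEBHOOK" \
    -H 'Content-Type: application/json' \
    -d "$(dingtalk_payload "$1")"
}
```

The Sidecar could call `send_dingtalk_alert "OOM dump uploaded from $(hostname)"` right after a successful upload.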

posted @ 2026-01-08 16:57  klvchen