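Notes on a JVM crash during thread startup
An executor node crashed. There was no Java crash log, only a core file left by the operating system. Inspecting the core file with gdb gave the stack shown in the figure below: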

get_klass_by_name_impl is described in the source comments as: "Implementation methods for loading and constant pool access."
The purpose of the lock: "We have to lock the cpool to keep the oop from being resolved while we are accessing it."
trans_and_fence simply calls transition_and_fence, which is described as follows: "transition_and_fence must be used on any thread state transition where there might not be a Java call stub on the stack, in particular on Windows where the Structured Exception Handler is set up in the call stub. os::write_memory_serialize_page() can fault and we can't recover from it on Windows without a SEH in place."
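My guess is that the JVM hit an error while loading class information as the thread started, and the executor process was then killed by the operating system.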
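Searching turned up a similar report, where the failure also occurred while loading class information after the JVM started a thread. The error stack reported in that case is shown below: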

The fix suggested there is to start the program with either -Xnoclassgc or -XX:-ClassUnloading.
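After adding one of these options it is worth confirming that the running JVM actually received it. The following is a minimal sketch, not part of the original investigation: the class name ClassUnloadingFlagCheck and the helper classUnloadingDisabled are hypothetical, and the check only does an exact string match on the arguments reported by RuntimeMXBean.getInputArguments(), so it does not detect a later option that overrides an earlier one.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports whether the JVM was started with one of the
// two options discussed above. It inspects the argument list only and does
// not query the effective value of the underlying VM flag.
public class ClassUnloadingFlagCheck {

    // Returns true if the argument list contains -Xnoclassgc or -XX:-ClassUnloading.
    public static boolean classUnloadingDisabled(List<String> jvmArgs) {
        return jvmArgs.contains("-Xnoclassgc")
                || jvmArgs.contains("-XX:-ClassUnloading");
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not the application's main arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Class unloading disabled by option: "
                + classUnloadingDisabled(jvmArgs));
    }
}
```

Running it as, for example, java -Xnoclassgc ClassUnloadingFlagCheck should list the option among the JVM arguments.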
The Oracle documentation describes -Xnoclassgc as follows:
Disables garbage collection (GC) of classes. This can save some GC time, which shortens interruptions during the application run.
When you specify -Xnoclassgc at startup, the class objects in the application will be left untouched during GC and will always be considered live. This can result in more memory being permanently occupied which, if not used carefully, will throw an out of memory exception.
The documentation has no entry for -XX:-ClassUnloading, however. The closest option I found is:
-XX:+CMSClassUnloadingEnabled
Enables class unloading when using the concurrent mark-sweep (CMS) garbage collector. This option is enabled by default. To disable class unloading for the CMS garbage collector, specify -XX:-CMSClassUnloadingEnabled.
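With either of these options in effect, GC no longer reclaims class objects. This can waste memory or even cause a memory leak, and in severe cases it can fill up the permanent generation (on Java 8 and later, class metadata lives in Metaspace instead).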
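Because of this risk, it helps to watch how many classes the JVM is holding once class unloading is turned off. The sketch below is an illustration under assumptions rather than something taken from the original investigation: the class name ClassLoadingStats, the snapshot helper, and the one-second polling interval are all hypothetical choices. It uses the standard ClassLoadingMXBean to print the loaded, total loaded, and unloaded class counts.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical monitor: prints class loading counters so that steady growth
// of loaded classes can be spotted when class unloading is disabled.
public class ClassLoadingStats {

    // Formats the current counters reported by the JVM's ClassLoadingMXBean.
    public static String snapshot() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return String.format("loaded=%d totalLoaded=%d unloaded=%d",
                bean.getLoadedClassCount(),       // classes currently loaded
                bean.getTotalLoadedClassCount(),  // classes loaded since JVM start
                bean.getUnloadedClassCount());    // classes unloaded since JVM start
    }

    public static void main(String[] args) throws InterruptedException {
        // Poll a few times; the count and interval are arbitrary for this sketch.
        for (int i = 0; i < 5; i++) {
            System.out.println(snapshot());
            Thread.sleep(1000);
        }
    }
}
```

If the loaded count keeps climbing while the unloaded count stays flat, the application is probably generating or reloading classes that are never reclaimed, and the memory cost of these options should be reconsidered.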
