JDK tools之jps和jstack诊断Java程序

大部分Java开发者可能知道有这么个工具,但是没怎么用过,每次还得百度一下。我也是之一 -_-!!。

每次遇到👇下面的问题,头是不是有点大:

  • OutOfMemoryError
  • 程序突然卡顿死锁
  • 内存泄露

个人感觉jps、jstack、jmap、visualVM、jConsole、MAT都是解决这些问题的利器,今天先介绍jps、jstack。

1、jps,先man一下看看jps是干什么的

NAME
       jps - Java Virtual Machine Process Status Tool
DESCRIPTION
       The  jps  tool  lists  the  instrumented  HotSpot Java Virtual Machines (JVMs) on the target system. The tool is limited to reporting  information on JVMs for which it has the access permissions.

从名字就大概知道jps是查看Java进程信息的工具,跟LInux/Unix下的ps(process status)类似,只是jps只针对Java程序。

使用语法如下:

SYNOPSIS
       jps [options] [hostid]

主要参数:

-q 不输出类名、Jar名和传入main方法的参数
-m 输出传入main方法的参数
-l 输出main类或Jar的全限名
-v 输出传入JVM的参数

例如:

➜  ~ jps -m -l
2281 /Users/xxxxxx/Applications/data-integration-7.0/launcher/pentaho-application-launcher-7.0.0.0-25.jar -lib ./../libswt/osx64/ 
2312 sun.tools.jps.Jps -m -l
➜  ~ 

包括jps在内一共有两个Java进程,第一列是pid,也是后面我们要用到的信息。

当然上面的信息还可以用ps命令来查询ps  -ef或ps aux,前者Linux下的标准语法,后者是BSD风格的语法。一样可以查到第二列的pid。

➜  ~ ps -ef | grep java
  501  2281  2261   0 11:48PM ttys000    0:52.81 /Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home/bin/java -XstartOnFirstThread -Xms1024m -Xmx2048m -XX:MaxPermSize=256m -Dhttps.protocols=TLSv1,TLSv1.1,TLSv1.2 -Djava.library.path=./../libswt/osx64/ -DKETTLE_HOME= -DKETTLE_REPOSITORY= -DKETTLE_USER= -DKETTLE_PASSWORD= -DKETTLE_PLUGIN_PACKAGES= -DKETTLE_LOG_SIZE_LIMIT= -DKETTLE_JNDI_ROOT= -jar /Users/xxxxxx/Applications/data-integration-7.0/launcher/pentaho-application-launcher-7.0.0.0-25.jar -lib ./../libswt/osx64/ 
  501  2361  2285   0 11:51PM ttys001    0:00.00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn java
➜  ~ 

2、jstack

NAME
       jstack - stack trace
DESCRIPTION
       jstack  prints  Java  stack  traces  of  Java threads for a given Java process or core file or a remote debug server. 
For each Java frame, the full class name, method name, 'bci' (byte code index) and line number, if available, are printed.
With the -m option,jstack prints both Java and native frames of all threads along with the 'pc' (program counter).
For each native frame, the closest native symbol to 'pc', if available, is printed. C++ mangled names are not demangled.
To demangle C++ names, the output of this command may be piped to c++filt.

说白了jstack就是查看当前Java程序内线程详细堆栈信息的工具。

使用语法如下:

SYNOPSIS
       jstack [ option ] pid
       jstack [ option ] executable core
       jstack [ option ] [server-id@]remote-hostname-or-IP

主要参数:

Options:
    -F  to force a thread dump. Use when jstack <pid> does not respond (process is hung)(强制线程转储,此时线程会被挂起)
    -m  to print both java and native frames (mixed mode)(打印Java和native接口的堆栈信息)
    -l  long listing. Prints additional information about locks(打印关于锁的详细信息,如果有线程死锁可以使用jstack -l pid查看锁或资源的持有信息)
    -h or -help to print this help message

 为了配合实战,写了一个会发生死锁的多线程程序代码如下:

public class MyThread extends Thread {
    private Object object1;
    private Object object2;

    public MyThread(Object object1, Object object2, String threadName) {
        this.object1 = object1;
        this.object2 = object2;
        setName(threadName);
    }

    @Override
    public void run() {
        String name = Thread.currentThread().getName();
        System.out.println(name + " acquiring lock on " + object1);
        synchronized (object1) {
            System.out.println(name + " acquired lock on " + object1);
            sleep();
            System.out.println(name + " acquiring lock on " + object2);
            synchronized (object2) {
                System.out.println(name + " acquired lock on " + object2);
                sleep();
            }
            System.out.println(name + " released lock on " + object2);
        }
        System.out.println(name + " released lock on " + object2);
        System.out.println(name + " finished execution.");
    }

    private void sleep() {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

main方法如下:

public class MainTest {
    public static void main(String[] args) {
        Object object1 = new Object();
        Object object2 = new Object();
        Object object3 = new Object();
        MyThread myThread1 = new MyThread(object1, object2, "myThread1");
        MyThread myThread2 = new MyThread(object2, object3, "myThread2");
        MyThread myThread3 = new MyThread(object3, object1, "myThread3");

        myThread1.start();
        myThread2.start();
        myThread3.start();
    }
}

运行程序后输出如下:

[root@localhost MutipleThreadTest]# java MainTest
myThread1 acquiring lock on java.lang.Object@796523ab
myThread3 acquiring lock on java.lang.Object@4335236e
myThread2 acquiring lock on java.lang.Object@5f08edd0
myThread3 acquired lock on java.lang.Object@4335236e
myThread1 acquired lock on java.lang.Object@796523ab
myThread2 acquired lock on java.lang.Object@5f08edd0
myThread3 acquiring lock on java.lang.Object@796523ab
myThread2 acquiring lock on java.lang.Object@4335236e
myThread1 acquiring lock on java.lang.Object@5f08edd0

从上面信息可知myThread1、myThread2、myThread3三个线程分别拿到一把锁,然后等待其他线程持有的锁。
由于没有任何一个线程释放锁于是整个程序进入死锁的状态。
现场的生产情况往往会更复杂,需要借助工具定位分析发生死锁的位置和原因,下面上jstack分析一下当前的堆栈信息。
上面的介绍已经知道如果有线程死锁可以使用`jstack -l pid`查看锁或资源的持有信息

[root@localhost ~]# jstack -l 3661
2018-01-09 17:24:15
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00007f03a0001000 nid=0xeab waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
    - None

"DestroyJavaVM" prio=10 tid=0x00007f03d4008800 nid=0xe4e waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
    - None

"myThread3" prio=10 tid=0x00007f03d40b5800 nid=0xe5e waiting for monitor entry [0x00007f03d05f4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at MyThread.run(MyThread.java:26)
    - waiting to lock <0x00000000d784b978> (a java.lang.Object)
    - locked <0x00000000d784b998> (a java.lang.Object)

   Locked ownable synchronizers:
    - None

"myThread2" prio=10 tid=0x00007f03d40b4000 nid=0xe5d waiting for monitor entry [0x00007f03d06f5000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at MyThread.run(MyThread.java:26)
    - waiting to lock <0x00000000d784b998> (a java.lang.Object)
    - locked <0x00000000d784b988> (a java.lang.Object)

   Locked ownable synchronizers:
    - None

"myThread1" prio=10 tid=0x00007f03d40b1800 nid=0xe5c waiting for monitor entry [0x00007f03d07f6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at MyThread.run(MyThread.java:26)
    - waiting to lock <0x00000000d784b988> (a java.lang.Object)
    - locked <0x00000000d784b978> (a java.lang.Object)

   Locked ownable synchronizers:
    - None

"Service Thread" daemon prio=10 tid=0x00007f03d4097800 nid=0xe5a runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
    - None

"C2 CompilerThread1" daemon prio=10 tid=0x00007f03d4095000 nid=0xe59 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
    - None

"C2 CompilerThread0" daemon prio=10 tid=0x00007f03d4092000 nid=0xe58 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
    - None

"Signal Dispatcher" daemon prio=10 tid=0x00007f03d408f800 nid=0xe57 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
    - None

"Finalizer" daemon prio=10 tid=0x00007f03d406f000 nid=0xe55 in Object.wait() [0x00007f03d0dfc000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000d7804858> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
    - locked <0x00000000d7804858> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

   Locked ownable synchronizers:
    - None

"Reference Handler" daemon prio=10 tid=0x00007f03d406d000 nid=0xe54 in Object.wait() [0x00007f03d0efd000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000d7804470> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:503)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
    - locked <0x00000000d7804470> (a java.lang.ref.Reference$Lock)

   Locked ownable synchronizers:
    - None

"VM Thread" prio=10 tid=0x00007f03d4068800 nid=0xe53 runnable 

"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007f03d401e000 nid=0xe4f runnable 

"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007f03d4020000 nid=0xe50 runnable 

"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007f03d4022000 nid=0xe51 runnable 

"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007f03d4024000 nid=0xe52 runnable 

"VM Periodic Task Thread" prio=10 tid=0x00007f03d40a2000 nid=0xe5b waiting on condition 

JNI global references: 114


Found one Java-level deadlock:
=============================
"myThread3":
  waiting to lock monitor 0x00007f03ac0062c8 (object 0x00000000d784b978, a java.lang.Object),
  which is held by "myThread1"
"myThread1":
  waiting to lock monitor 0x00007f0388002258 (object 0x00000000d784b988, a java.lang.Object),
  which is held by "myThread2"
"myThread2":
  waiting to lock monitor 0x00007f03ac004e28 (object 0x00000000d784b998, a java.lang.Object),
  which is held by "myThread3"

Java stack information for the threads listed above:
===================================================
"myThread3":
    at MyThread.run(MyThread.java:26)
    - waiting to lock <0x00000000d784b978> (a java.lang.Object)
    - locked <0x00000000d784b998> (a java.lang.Object)
"myThread1":
    at MyThread.run(MyThread.java:26)
    - waiting to lock <0x00000000d784b988> (a java.lang.Object)
    - locked <0x00000000d784b978> (a java.lang.Object)
"myThread2":
    at MyThread.run(MyThread.java:26)
    - waiting to lock <0x00000000d784b998> (a java.lang.Object)
    - locked <0x00000000d784b988> (a java.lang.Object)

Found 1 deadlock.

简单分析如下:

  • 以上结果末尾可知jstack已经检测到了死锁。
  • "myThread3"在等待"myThread1"释放资源、"myThread1"在等待"myThread2"释放资源、"myThread2"在等待"myThread3"释放资源。简单说就是资源循环依赖了导致了死锁。
  • 现实的情况往往不会发生结构如此简单的死锁,很多情况都是等待网络IO、数据读写等耗时操作,导致线程占用的资源不能释放,从而导致程序假死、卡顿等情况。
  • 比如之前遇到过的一个场景saiku每次刷新缓存的时候都要卡很久,最后通过jstack发现每次刷缓存都会去拿olap数据库连接,并且加锁了。必须等刷缓存的相关操作执行完才能恢复,这其中就包含执行一系列的select COUNT(*)操作,而恰好存在这比样的大表4千万多万行数据,一个select COUNT(*)要执行90多秒,这就悲剧了,全部刷完需要8分钟左右系统才能恢复

堆栈中一些常见值解析

线程状态(java.lang.Thread.State)

  • NEW:至今尚未启动的线程的状态。 
  • RUNNABLE:可运行线程的线程状态。处于可运行状态的某一线程正在 Java 虚拟机中运行,但它可能正在等待操作系统中的其他资源,比如处理器。
  • BLOCKED:受阻塞并且正在等待monitor lock的某一线程的线程状态。处于受阻塞状态的某一线程正在等待monitor lock,以便进入一个同步的块/方法,或者在调用 Object.wait 之后再次进入同步的块/方法。 比如等待执行synchronized代码块或者使用synchronized标记的方法。
  • WAITING:某一等待线程的线程状态。某一线程因为调用下列方法之一而处于等待状态。
    • 不带超时值的 Object.wait
    • 不带超时值的 Thread.join
    • LockSupport.park 处于等待状态的线程正等待另一个线程,以执行特定操作。 例如,已经在某一对象上调用了 Object.wait() 的线程正等待另一个线程,以便在该对象上调用 Object.notify() 或 Object.notifyAll()。已经调用了 Thread.join() 的线程正在等待指定线程终止。 
  • TIMED_WAITING:具有指定等待时间的某一等待线程的线程状态。某一线程因为调用以下带有指定正等待时间的方法之一而处于定时等待状态: 
    • Thread.sleep
    • 带有超时值的 Object.wait
    • 带有超时值的 Thread.join
    • LockSupport.parkNanos
    • LockSupport.parkUntil
  • TERMINATED:已终止线程的线程状态。线程已经结束执行。 

各种线程ID

  • prio:线程的优先级;
  • tid:java Thread id;
  • nid:native线程的id, 非常关键,后面再使用jstack时补充;

使用top -Hp pid查看进程的线程状态时线程的id就对应这里的nid,在堆栈中nid是使用16进制表示的因此可以使用如下工具做简单的转换

printf "%x\n" 21742

 

实战一下,找当前个人博客系统中最运行时间最长的线程

首先找到博客进程id

$ ps -ef | grep java

结果为:

zhangsan   27952     1  4  2017 ?        1-07:19:12 java -Xms128m -Xmx128m -Dfile.encoding=UTF-8 -jar tale-least.jar

查看当前线程信息

 

$ top -Hp 27952

top - 23:32:06 up 59 days, 10:06,  2 users,  load average: 0.06, 0.04, 0.01
Tasks:  16 total,   1 running,  15 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.7%us,  3.0%sy,  0.0%ni, 91.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1019988k total,   907552k used,   112436k free,   151028k buffers
Swap:        0k total,        0k used,        0k free,   387324k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                       
27965 zhangsan    20   0 2116m 213m  13m R  7.3 21.4   1727:05 java                                                           
27954 zhangsan    20   0 2116m 213m  13m S  0.7 21.4 133:27.59 java                                                           
27952 zhangsan    20   0 2116m 213m  13m S  0.0 21.4   0:00.00 java                                                           
27953 zhangsan    20   0 2116m 213m  13m S  0.0 21.4   0:00.22 java                                                           
27955 zhangsan    20   0 2116m 213m  13m S  0.0 21.4   0:00.06 java                                                           
27956 zhangsan    20   0 2116m 213m  13m S  0.0 21.4   0:00.55 java                                                           
27957 zhangsan    20   0 2116m 213m  13m S  0.0 21.4   0:00.00 java                                                           
27958 zhangsan    20   0 2116m 213m  13m S  0.0 21.4   0:28.53 java                                                           
27959 zhangsan    20   0 2116m 213m  13m S  0.0 21.4   0:09.86 java                                                           
27960 zhangsan    20   0 2116m 213m  13m S  0.0 21.4   0:00.00 java                                                           
27961 zhangsan    20   0 2116m 213m  13m S  0.0 21.4  13:00.98 java                                                           
27962 zhangsan    20   0 2116m 213m  13m S  0.0 21.4   0:00.57 java                                                           
27969 zhangsan    20   0 2116m 213m  13m S  0.0 21.4   2:04.10 java                                                           
27970 zhangsan    20   0 2116m 213m  13m S  0.0 21.4   0:49.48 java                                                           
27971 zhangsan    20   0 2116m 213m  13m S  0.0 21.4   2:16.69 java                                                           
18940 zhangsan    20   0 2116m 213m  13m S  0.0 21.4   0:00.00 java

 

发现线程pid 27954运行时间最长,转换成16进制

$ printf "%x\n" 27954
6d32

使用jstack该线程的堆栈信息

$ jstack 27952 | grep 6d32 | grep -v grep
"VM Thread" os_prio=0 tid=0x00007fc6ec094000 nid=0x6d32 runnable

发现是Java VM相关的线程怪不得运行时间最长,这里根据实际情况鉴别VM系统线程和自己的业务线程

 

posted @ 2018-01-09 23:46  海阔天空990  阅读(1149)  评论(0编辑  收藏  举报