If you cant explain it simply, you dont understand it well enough

【转】processD

转【http://blog.kevac.org/2013/02/uninterruptible-sleep-d-state.html】

Uninterruptible sleep (D state)
Sometimes you will see processes on your linux box that are in D state as shown by ps, top, htop or similar. D means uninterruptible sleep. As opposed to normal sleep, you can't do anything with these processes (i.e. kill them).
Additional information could be found on LWN.

Usually this means that process is stuck reading or writing something to disk or NFS and remote site does not answer. But not always. I'v stumbled upon processes that periodically jump into D state and back. These are processes under heavy network load. Strange...

While not been successful in finding out what is going on yet, I would like to share pointers how to dig deeper into these processes.
ps
Familiar tool ps can tell us in what function inside kernel our process is sleeping.

ps -eo ppid,pid,user,stat,pcpu,comm,wchan:32

Adding some bash magic will help us dealing with process that is constantly jumping between D, S and R states:

for x in $(seq 10000); do ps -eo ppid,pid,user,stat,pcpu,comm,wchan:32 | grep D | grep laccess | awk '{print $7}' | grep -v "-"; done;

lock_sock_nested
lock_sock_nested
^C

Ok, so our process is sleeping in kernel function called lock_sock_nested.
sysrq
We can force linux kernel to dump backtraces for processes that are in D state. Information could be seen via dmesg utility or in /var/log/messages file.

echo w > /proc/sysrq-trigger

Again, with additional bash magic our command looks like this:

echo w > /proc/sysrq-trigger; dmesg -c | less;

This is how output looks like:

[688596.998782] SysRq : Show Blocked State
[688596.999054] task PC stack pid father
[688596.999101] laccessd D 00000000ffffffff 0 26940 25632 0x00000000
[688596.999104] ffff88090885fdd8 0000000000000086 ffff8800282d3700 ffff88125a5b2380
[688596.999107] 0000000000013700 ffff88090885ffd8 0000000000013700 ffff88090885ffd8
[688596.999109] 0000000000013700 0000000000013700 0000000000013700 0000000000013700
[688596.999111] Call Trace:
[688596.999113] Inexact backtrace:
[688596.999113]
[688596.999122] [] ? try_to_wake_up+0x1be/0x400
[688596.999128] [] ? schedule_timeout+0x143/0x240
[688596.999133] [] ? process_timeout+0x0/0x10
[688596.999136] [] ? ep_poll+0x153/0x240
[688596.999139] [] ? default_wake_function+0x0/0x10
[688596.999141] [] ? sys_epoll_wait+0xd4/0xe0
[688596.999146] [] ? system_call_fastpath+0x16/0x1b

posted @ 2017-02-17 11:55  zk47  阅读(600)  评论(0编辑  收藏  举报

I am a stupid bird, and I need to work hard