代码改变世界

Linux 日志报错 xxx blocked for more than 120 seconds

2016-08-18 12:23  潇湘隐者  阅读(17088)  评论(0编辑  收藏  举报

   

    监控作业发现一台服务器(Red Hat Enterprise Linux Server release 5.7)从凌晨1:32开始,有一小段时间无法响应,数据库也连接不上,后面又正常了。早上检查了监听日志,并没有发现错误信息。但是检查告警日志,发现有下面错误信息:

Thread 1 advanced to log sequence 19749 (LGWR switch)
  Current log# 2 seq# 19749 mem# 0: /u01/oradata/epps/redo02.log
  Current log# 2 seq# 19749 mem# 1: /u03/oradata/epps/redo021.log
Wed Aug 17 01:37:08 2016
kkjcre1p: unable to spawn jobq slave process 
Wed Aug 17 01:37:08 2016
Errors in file /u01/app/oracle/admin/epps/bdump/epps_cjq0_4094.trc:
 
Wed Aug 17 01:56:44 2016
kkjcre1p: unable to spawn jobq slave process 
Wed Aug 17 01:56:44 2016
Errors in file /u01/app/oracle/admin/epps/bdump/epps_cjq0_4094.trc:

clip_image001

more /u01/app/oracle/admin/epps/bdump/epps_cjq0_4094.trc
u01/app/oracle/admin/epps/bdump/epps_cjq0_4094.trc
racle Database 10g Release 10.2.0.4.0 - 64bit Production
RACLE_HOME = /u01/app/oracle/product/10.2.0/db_1
ystem name:    Linux
ode name:      ceglnx01
elease:        2.6.32-200.13.1.el5uek
ersion:        #1 SMP Wed Jul 27 21:02:33 EDT 2011
achine:        x86_64
nstance name: epps
edo thread mounted by this instance: 1
racle process number: 17
nix process pid: 4094, image: oracle@ceglnx01 (CJQ0)
 
** 2016-08-13 01:52:16.178
** SERVICE NAME:(SYS$BACKGROUND) 2016-08-13 01:52:16.155
** SESSION ID:(177.9) 2016-08-13 01:52:16.155
aited for process J000 to be spawned for 60 seconds
** 2016-08-13 01:52:26.187
aited for process J000 to be spawned for 70 seconds
** 2016-08-13 01:52:36.197
aited for process J000 to be spawned for 80 seconds
** 2016-08-13 01:52:46.207
aited for process J000 to be spawned for 90 seconds
** 2016-08-13 01:52:56.217
aited for process J000 to be spawned for 100 seconds
** 2016-08-13 01:53:06.227
aited for process J000 to be spawned for 110 seconds
** 2016-08-13 01:57:15.512
** 2016-08-14 01:30:52.159
aited for process J000 to be spawned for 60 seconds
** 2016-08-14 01:31:02.186
aited for process J000 to be spawned for 70 seconds
** 2016-08-14 01:31:12.196
aited for process J000 to be spawned for 80 seconds
** 2016-08-14 01:31:22.206
aited for process J000 to be spawned for 90 seconds
** 2016-08-14 01:31:32.217
aited for process J000 to be spawned for 100 seconds
** 2016-08-14 01:31:42.227
aited for process J000 to be spawned for 110 seconds
** 2016-08-14 01:35:36.155
** 2016-08-14 01:44:46.866
aited for process J000 to be spawned for 60 seconds
** 2016-08-14 01:44:56.876
aited for process J000 to be spawned for 70 seconds
** 2016-08-14 01:45:06.886
aited for process J000 to be spawned for 80 seconds
** 2016-08-14 01:45:16.896
aited for process J000 to be spawned for 90 seconds
** 2016-08-14 01:45:26.906
aited for process J000 to be spawned for 100 seconds
** 2016-08-14 01:45:36.916
aited for process J000 to be spawned for 110 seconds
** 2016-08-14 01:55:49.977
** 2016-08-14 01:58:22.156
aited for process J000 to be spawned for 60 seconds
** 2016-08-14 01:58:32.166
aited for process J000 to be spawned for 70 seconds
** 2016-08-14 01:58:42.176
aited for process J000 to be spawned for 80 seconds
** 2016-08-14 01:58:52.187
aited for process J000 to be spawned for 90 seconds
** 2016-08-14 01:59:02.196
aited for process J000 to be spawned for 100 seconds
** 2016-08-14 01:59:12.206
aited for process J000 to be spawned for 110 seconds
** 2016-08-14 02:17:55.867
** 2016-08-16 01:30:52.151
aited for process J000 to be spawned for 60 seconds
** 2016-08-16 01:31:02.180
aited for process J000 to be spawned for 70 seconds
** 2016-08-16 01:31:12.191
aited for process J000 to be spawned for 80 seconds
** 2016-08-16 01:31:22.202
aited for process J000 to be spawned for 90 seconds
** 2016-08-16 01:31:32.213
aited for process J000 to be spawned for 100 seconds
** 2016-08-16 01:31:42.223
aited for process J000 to be spawned for 110 seconds
** 2016-08-16 01:35:09.610
** 2016-08-16 01:51:50.369
aited for process J000 to be spawned for 60 seconds
** 2016-08-16 01:52:00.379
aited for process J000 to be spawned for 70 seconds
** 2016-08-16 01:52:10.389
aited for process J000 to be spawned for 80 seconds
** 2016-08-16 01:52:20.399
aited for process J000 to be spawned for 90 seconds
** 2016-08-16 01:52:30.409
aited for process J000 to be spawned for 100 seconds
** 2016-08-16 01:52:40.419
aited for process J000 to be spawned for 110 seconds
** 2016-08-16 01:56:11.022
** 2016-08-17 01:22:21.154
aited for process J000 to be spawned for 60 seconds
** 2016-08-17 01:22:31.190
aited for process J000 to be spawned for 70 seconds
** 2016-08-17 01:22:41.199
aited for process J000 to be spawned for 80 seconds
** 2016-08-17 01:22:51.209
aited for process J000 to be spawned for 90 seconds
** 2016-08-17 01:23:01.219
aited for process J000 to be spawned for 100 seconds
** 2016-08-17 01:23:11.229
aited for process J000 to be spawned for 110 seconds
** 2016-08-17 01:37:08.240
** 2016-08-17 01:38:08.401
aited for process J000 to be spawned for 60 seconds
** 2016-08-17 01:38:18.412
aited for process J000 to be spawned for 70 seconds
** 2016-08-17 01:38:28.421
aited for process J000 to be spawned for 80 seconds
** 2016-08-17 01:38:38.431
aited for process J000 to be spawned for 90 seconds
** 2016-08-17 01:38:48.441
aited for process J000 to be spawned for 100 seconds
** 2016-08-17 01:38:58.451
aited for process J000 to be spawned for 110 seconds
** 2016-08-17 01:56:44.526

 

几乎都是1:30左右发生的,很有意思的是错误出现的时间(工厂系统管理员反馈Symantec BE 收取RMAN备份的那个作业开始异常的时间也刚刚是13号出现,白天我去测试作业收带,发现又都是正常的,当时一直按照官方文档https://www.veritas.com/support/en_US/article.TECH73015在排除、验证),现在看来很有可能Symantec BE作业的异常也是这个引起的。检查了RMAN的备份记录,几乎在12:00到12:40直接就已经操作完成了。 J000一般为作业的进程。检查了一下,仅仅是一个系统收集统计信息的作业在这个时间点运行。暂且看不出什么问题。

关于Waited for process xxx to be spawned for xxx seconds, 在oracle metalink看到下面有意思的资料:

 

You need to review both the traditional as well as incident trace files to understand the load on the system and major waits on the database. Typically
 
The incident trace file will show you the database wide waits that the child process encountering when coming up.
The traditional trace file will show you the details of the load on the machine (below ones):
Load average
Memory consumption
Output of PS (process state)
Output of GDB (to view the function stack)
In order to investigate the issue, it might also help to understand the general stages of the process startup:
 
Queued phase
Forking Phase
Execution Phase
Initialization phase
In general, the forking and execution phases are directly linked to system load and resources. To check what phase the process startup is, open the traditional trace file (not incident) and look for the wording "Waited for process"
 
Waited for process XYZ to be spawned for nnn seconds
Waited for process XYZ to initialize for nnn seconds
 
If the message contains "to be spawned", it means the process is still at queued or forking phase (1 & 2)
If the message contains "to initialize", it means the process is at execution or Initialize phase (3 & 4)
 
Other useful information can be obtained from the trace files:
 
Open the traditional trace file and review the section which starts with 'Process diagnostic dump for' - 'load average','memory information','process state - ps output' and also 'gdb output' will provide initial insight on the load on the system.
Open the incident trace file and find the section 'PROCESS STATE' and within that section look for 'Current Wait Stack'. This will provide the database wide events that the child process encountered and may provide clues and generic direction on how to proceed.
 
Why does the error occur? - Potential causes and solutions/actions
 
The root cause of this issue mainly falls under the following 2 categories.
Contention among processes: The process which is coming up might require resources that are actually being contended for by different other processes and sometimes the parent process itself might contend for the same resource as the child process (indirectly).
OS and network level issues: The machine on which this is happening might be CPU/memory saturated and this may delay the process spawning. Network latency when your storage is on a network file system may also delay the process spawning.
Some of the common known issues and potential solutions are listed below:
 
Lack of OS resources or incorrect configuration
This error may be observed due to lack of OS resources or incorrect configuration, typically memory or swap space may be insufficient to spawn a new process. 
Accordingly, the following checks may help to identify the issue:
Check OS Error Log file for the time when the error is generated
OS Messages log can provide an indication whether there is a problem with the Operating System Itself
* AIX: the output of the "errpt" command and the "errpt -a" command 
* Linux: /var/log/messages
* Solaris: /var/adm/messages
* HP-UX: /var/adm/syslog/syslog.log
* Tru64: /var/adm/messages
* Windows: Save Application Log and System Log as txt files Using Event Viewer
Run HCVE script 
The HCVE script verifies whether OS resources are set as recommended by Oracle. Instructions on how to download and run the script are outlined in Document 250262.1.
Please note, that the script will only check whether your system is configured as per the recommended 'minimum' values. Depending on your environment these values may not be sufficient though.
Run OS Watcher 
OS Watcher is an Oracle tool that allows you to monitor the system from an OS perspective. Document 301137.1 outlines the usage of this tool.
Check the defined user limitation (ulimit) settings (UNIX-only)
Check the ulimit settings as the oracle user (or the owner of the oracle software) using 
# ulimit -a
Minimum values can be found in Document 169706.1.
Please note, that the values mentioned in the note are bare minimum values for a standard environment. Depending on your environment setup, you may need to increase these values accordingly.
Recommended actions:
 
Review the resource-related suggestions as reported from the output of the HCVE script and make changes accordingly. The following 2 articles may help in understanding these suggestion better:
Document 169706.1: Oracle Database on Unix AIX,HP-UX,Linux,Mac OS X,Solaris,Tru64 Unix Operating Systems Installation and Configuration Requirements Quick Reference (8.0.5 to 11.2)
Document 225349.1: Address Windowing Extensions (AWE) or VLM on Windows Platforms
(Typically on windows-system with more than 4Gb of RAM, enabling the /3GB switch in the boot.ini is required)
Check the define user limitation (ulimit) settings (UNIX-only)
 
ASLR LINUX feature is being used
ASLR is a feature that is activated by default on some of the newer LINUX distributions. It is designed to load shared memory objects in random addresses. In Oracle, multiple processes map a shared memory object at the same address across the processes. With ASLR turned on, Oracle cannot guarantee the availability of this shared memory address. This conflict in the address space means that a process trying to attach a shared memory object to a specific address may not be able to do so, resulting in a failure in SHMAT subroutine.
This problem is mainly reported in Redhat 5 and Oracle 11.2.0.2. You can verify whether ASLR is being used as follows:
# /sbin/sysctl -a | grep randomize
kernel.randomize_va_space = 1
-> If the parameter is set to any value other than 0 then ASLR is in use.
 
Recommended actions:
Disable the use of the ASLR Feature. 
On Redhat 5, to permanently disable ASLR, add/modify this parameter in /etc/sysctl.conf
 
kernel.randomize_va_space=0
kernel.exec-shield=0
 
You need to reboot, for kernel.exec-shield parameter to take effect. 
Note that both kernel parameters are required for ASLR to be switched off.
 
Incorrect Database Settings
Check whether PGA_AGGREGATE_TARGET is set to TRUE/FALSE
The parameter PGA_AGGREGATE_TARGET is a numeric value and not a Boolean value and therefore must be set to a number for it to function correctly.
Check whether PRE_PAGE_SGA is set to TRUE
PRE_PAGE_SGA instructs Oracle to read the entire SGA into active memory at instance startup. Operating system page table entries are then prebuilt for each page of the SGA. This setting can increase the amount of time necessary for instance startup, but it is likely to decrease the amount of time necessary for Oracle to reach its full performance capacity after startup. PRE_PAGE_SGA can increase the process startup duration, because every process that starts must access every page in the SGA, this can cause the PMON process to take longer to start and exceed the timeout which is by default 120 seconds causing the instance startup to fail.
Check output of SQL> select * from v$resource_limit;
V$resource_limit dynamic view provides the details of resources like sessions, processes, locks etc. It has the initialization values for the resources, maximum limit reached after the last database startup and current utilization of the resource.
Recommended actions:
 
Properly set PGA_AGGREGATE_TARGET to a numeric value.
Set PRE_PAGE_SGA to FALSE (recommended).
Check if limits were reached and accordingly increase the value of the resource in concern.
 
Other Causes or Known Issues
Other potential causes known to cause issues could be
Too much activity on your machine
NFS latency issues
Disk latency issue (that affects I/O)
Network latency
 
Recommended actions:
 
Since almost all these reasons are not related to the RDBMS itself, it is recommended you involve your Network, Storage and System Administrators in this investigation. 

     

也就是说J000这个进程老是在forking phase要花很长时间排队。上面资料也介绍了发生这种错误的场景,我们先检查一下系统资源是否存在不足的情况。

# sar -r

clip_image002

如上所示,内存各项指标不大,CPU利用率在这个时间段确实有所上升。但是总体来说,还是非常低的。

$sar -u

clip_image003

 

于是我们检查系统的message日志,如下所示,发现这个时间点出现了"INFO: task bdi-default:40 blocked for more than 120 seconds"错误(当然13-17号,只有17号出现了这个错误):

Aug 17 01:32:13 localhost kernel: INFO: task bdi-default:40 blocked for more than 120 seconds.
Aug 17 01:32:13 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 17 01:32:13 localhost kernel: bdi-default   D 0000000000000000     0    40      2 0x00000000
Aug 17 01:32:13 localhost kernel:  ffff88042e7b5bf0 0000000000000046 0000000000000000 ffff88042e5a8080
Aug 17 01:32:13 localhost kernel:  ffff88042e7b2340 ffffffff81aa84c0 ffff88042e7b2710 0000000547046b50
Aug 17 01:32:13 localhost kernel:  0000000000000001 0000000000000000 0000000000000000 7fffffffffffffff
Aug 17 01:32:13 localhost kernel: Call Trace:
Aug 17 01:32:13 localhost kernel:  [<ffffffff814527e9>] schedule_timeout+0x36/0xe7
Aug 17 01:32:13 localhost kernel:  [<ffffffff8102bc30>] ? flat_send_IPI_mask+0x11/0x13
Aug 17 01:32:13 localhost kernel:  [<ffffffff8104367e>] ? need_resched+0x23/0x2d
Aug 17 01:32:13 localhost kernel:  [<ffffffff8145265b>] wait_for_common+0xb7/0x12c
Aug 17 01:32:13 localhost kernel:  [<ffffffff8104cf2f>] ? default_wake_function+0x0/0x19
Aug 17 01:32:13 localhost kernel:  [<ffffffff81452773>] wait_for_completion+0x1d/0x1f
Aug 17 01:32:13 localhost kernel:  [<ffffffff81076aab>] kthread_create+0x9c/0x143
Aug 17 01:32:13 localhost kernel:  [<ffffffff810edce9>] ? bdi_start_fn+0x0/0xd1
Aug 17 01:32:13 localhost kernel:  [<ffffffff810654cb>] ? lock_timer_base+0x2b/0x4f
Aug 17 01:32:13 localhost kernel:  [<ffffffff81065562>] ? try_to_del_timer_sync+0x73/0x81
Aug 17 01:32:13 localhost kernel:  [<ffffffff81065589>] ? del_timer_sync+0x19/0x26
Aug 17 01:32:13 localhost kernel:  [<ffffffff8145286a>] ? schedule_timeout+0xb7/0xe7
Aug 17 01:32:13 localhost kernel:  [<ffffffff81065596>] ? process_timeout+0x0/0x10
Aug 17 01:32:13 localhost kernel:  [<ffffffff810edf8b>] bdi_forker_task+0x1d1/0x263
Aug 17 01:32:13 localhost kernel:  [<ffffffff8103feee>] ? __wake_up_common+0x46/0x79
Aug 17 01:32:13 localhost kernel:  [<ffffffff810eddba>] ? bdi_forker_task+0x0/0x263
Aug 17 01:32:13 localhost kernel:  [<ffffffff810eddba>] ? bdi_forker_task+0x0/0x263
Aug 17 01:32:13 localhost kernel:  [<ffffffff81076a07>] kthread+0x6e/0x76
Aug 17 01:32:13 localhost kernel:  [<ffffffff81012dea>] child_rip+0xa/0x20
Aug 17 01:32:13 localhost kernel:  [<ffffffff81076999>] ? kthread+0x0/0x76
Aug 17 01:32:13 localhost kernel:  [<ffffffff81012de0>] ? child_rip+0x0/0x20
Aug 17 01:34:13 localhost kernel: INFO: task bdi-default:40 blocked for more than 120 seconds.
Aug 17 01:34:13 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 17 01:34:13 localhost kernel: bdi-default   D 0000000000000000     0    40      2 0x00000000
Aug 17 01:34:13 localhost kernel:  ffff88042e7b5bf0 0000000000000046 0000000000000000 ffff88042e5a8080
Aug 17 01:34:13 localhost kernel:  ffff88042e7b2340 ffffffff81aa84c0 ffff88042e7b2710 0000000547046b50
Aug 17 01:34:13 localhost kernel:  0000000000000001 0000000000000000 0000000000000000 7fffffffffffffff
Aug 17 01:34:13 localhost kernel: Call Trace:
Aug 17 01:34:13 localhost kernel:  [<ffffffff814527e9>] schedule_timeout+0x36/0xe7
Aug 17 01:34:13 localhost kernel:  [<ffffffff8102bc30>] ? flat_send_IPI_mask+0x11/0x13
Aug 17 01:34:13 localhost kernel:  [<ffffffff8104367e>] ? need_resched+0x23/0x2d
Aug 17 01:34:13 localhost kernel:  [<ffffffff8145265b>] wait_for_common+0xb7/0x12c
Aug 17 01:34:13 localhost kernel:  [<ffffffff8104cf2f>] ? default_wake_function+0x0/0x19
Aug 17 01:34:13 localhost kernel:  [<ffffffff81452773>] wait_for_completion+0x1d/0x1f
Aug 17 01:34:13 localhost kernel:  [<ffffffff81076aab>] kthread_create+0x9c/0x143
Aug 17 01:34:13 localhost kernel:  [<ffffffff810edce9>] ? bdi_start_fn+0x0/0xd1
Aug 17 01:34:13 localhost kernel:  [<ffffffff810654cb>] ? lock_timer_base+0x2b/0x4f
Aug 17 01:34:13 localhost kernel:  [<ffffffff81065562>] ? try_to_del_timer_sync+0x73/0x81
Aug 17 01:34:13 localhost kernel:  [<ffffffff81065589>] ? del_timer_sync+0x19/0x26
Aug 17 01:34:13 localhost kernel:  [<ffffffff8145286a>] ? schedule_timeout+0xb7/0xe7
Aug 17 01:34:13 localhost kernel:  [<ffffffff81065596>] ? process_timeout+0x0/0x10
Aug 17 01:34:13 localhost kernel:  [<ffffffff810edf8b>] bdi_forker_task+0x1d1/0x263
Aug 17 01:34:13 localhost kernel:  [<ffffffff8103feee>] ? __wake_up_common+0x46/0x79
Aug 17 01:34:13 localhost kernel:  [<ffffffff810eddba>] ? bdi_forker_task+0x0/0x263
Aug 17 01:34:13 localhost kernel:  [<ffffffff810eddba>] ? bdi_forker_task+0x0/0x263
Aug 17 01:34:13 localhost kernel:  [<ffffffff81076a07>] kthread+0x6e/0x76
Aug 17 01:34:13 localhost kernel:  [<ffffffff81012dea>] child_rip+0xa/0x20
Aug 17 01:34:13 localhost kernel:  [<ffffffff81076999>] ? kthread+0x0/0x76
Aug 17 01:34:13 localhost kernel:  [<ffffffff81012de0>] ? child_rip+0x0/0x20
Aug 17 01:36:13 localhost kernel: INFO: task bdi-default:40 blocked for more than 120 seconds.
Aug 17 01:36:13 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

clip_image004

 

关于bdi-defualt进程,我从Linux内核进程详解之二:bdi-default这篇文章摘取了下面一段:

bdi,即是backing device info的缩写,根据英文单词全称可知其通指备用存储设备相关描述信息,这在内核代码里用一个结构体backing_dev_info来表示:http://lxr.linux.no/#linux+v2.6.38.8/include/linux/backing-dev.h#L62

bdi,备用存储设备,简单点说就是能够用来存储数据的设备,而这些设备存储的数据能够保证在计算机电源关闭时也不丢失。这样说来,软盘存储设备、光驱存储设备、USB存储设备、硬盘存储设备都是所谓的备用存储设备(后面都用bdi来指示),而内存显然不是,具体看下面这个链接:http://www.gordonschools.aberdeenshire.sch.uk/Departments/Computing/StandardGrade/SystemsWeb/6BackingStorage.htm

相对于内存来说,bdi设备(比如最常见的硬盘存储设备)的读写速度是非常慢的,因此为了提高系统整体性能,Linux系统对bdi设备的读写内容进行了缓冲,那些读写的数据会临时保存在内存里,以避免每次都直接操作bdi设备,但这就需要在一定的时机(比如每隔5秒、脏数据达到的一定的比率等)把它们同步到bdi设备,否则长久的呆在内存里容易丢失(比如机器突然宕机、重启),而进行间隔性同步工作的进程之前名叫pdflush,但后来在Kernel 2.6.2x/3x(没注意具体是哪个小版本的改动,比如:http://kernelnewbies.org/Linux_2_6_35#head-57d43d498509746df08f48b1f040d20d88d2b984,http://lwn.net/Articles/396757/)对此进行了优化改进,产生有多个内核进程,bdi-default、flush-x:y等

 

检查确认如下

# ps -ef | grep bdi-default | grep 40
root        40     2  0 Jan17 ?        00:00:31 [bdi-default]
root      4026   752  0 17:07 pts/3    00:00:00 grep bdi-default

 

看来确实是这个bdi-default进程的异常引起的。接下来就需要查证为什么出现“task bdi-default:40 blocked for more than 120 seconds”,关于这个网上有很多资料:

 

由于bug引起的(与我们的环境不符,直接pass)

Slow writes to ext4 partition - INFO: task flush-253:7:2137 blocked for more than 120 seconds

 

我自己也遇到过几次有关这样的案例,不过是不同的进程引起的,也有所区别。

TNS-12518 & Linux Error:32:Broken pipe

Linux服务器宕机案例第二则

 

在官方资料有介绍,但是我无订阅账号,无法查看具体内容。

https://access.redhat.com/solutions/31453

 

System becomes unresponsive with message “INFO: task : blocked for more than 120 seconds里面有如下介绍

INFO: task <process>:<pid> blocked for more than 120 seconds

What is the reason behind above messages and what kind of information is needed to troubleshoot them?

  • These messages typically mean that the system is experiencing disk or memory congestion and processes are being starved of available resources.
  • These messages serve as a warning that something may not be operating optimally. They do not necessarily indicate a serious problem and any blocked processes should eventually proceed when the system recovers.
  • To troubleshoot further, generate a vmcore
  • If the hung task is a third-party application, please engage the application vendor in the issue meanwhile.
  • If the hung task messages are known to be erroneous, they can be disabled. This is not advised, and it will not prevent any hangs. To disable the messages, use

sysctl kernel.hung_task_timeout_secs=0

  • If it causes the crash and the causes are known to be erroneous, please disable it by putting the below in /etc/sysctl.conf. Once have the below value in the file, please run ‘sysctl -p’ to apply the change.

kernel.hung_task_panic = 0

Root Cause:

The khungtaskd kernel thread monitors the process states and checks if there are any processes stuck in uninterruptible state for the period set in “kernel.hung_task_timeout_secs” sysctl parameter (default is 120 seconds). If so it will log the message along with a call trace of the process that is blocked. The hung task watchdog will by default only report the first 10 instances of detected hung tasks and then will disable reporting. This is to prevent the system log from being flooded with messages. This means that even though no more blocked task messages are being reported there may still be processes becoming stuck for long periods

Cautions:

  • If hung_task_panic is enabled, this can cause false positives to unnecessarily panic your machine.
    • Although usually, a task being blocked for > 2 minutes is a good indication that something has gone wrong inside the kernel, there are circumstances in which a task can be legitimately blocked for this long, for example:
      • flushing large amounts of file system data to a slow device duringumount() or sync() can take a longer than the defaulthung_task_timeout_secs
      • while the default hung_task_timeout_secs is 2x the default IO timeout value, different types of devices and different environments can change the default IO timeout value causing false hung task event triggers as well as panics if hung_task_panic is enabled.
  • As a general rule, the hung_task_timeout_secs should be set to a value larger than the longest expected IO timeout value to prevent false triggering of the stall task logic and its event output within messages. This is especially true ifhung_task_panic is enabled as false triggering of the stalled task logic will cause induced kernel panics.

 

打算先重启观察一下,如果不行再试试设置kernel.hung_task_panic = 0,结果昨天重启过后,一切正常了。真怀疑是VMware环境在某些条件下触发的一个bug!

 

参考资料:

http://www.lenky.info/archives/2012/02/1125

https://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/

http://www.cnblogs.com/kerrycode/p/4164838.html

http://www.cnblogs.com/kerrycode/p/5653586.html

https://linuxraj.wordpress.com/2015/11/17/system-becomes-unresponsive-with-message-info-task-blocked-for-more-than-120-seconds/

https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=359621643018467&id=1989698.1&displayIndex=7&_afrWindowMode=0&_adf.ctrl-state=s96s1un90_182#title1

https://bugzilla.redhat.com/show_bug.cgi?id=652262