Clearing semaphores caused Zabbix to shut itself down

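A few days ago I deployed a Zabbix proxy and Zabbix agentd on overseas UCloud machines. Early the next morning an email arrived saying zabbix_proxy had died. When I logged in, I found that on one of the two machines both the proxy and the agentd were down, while the other machine was fine. The logs showed: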

zabbix_agentd [12977]: [file:'cpustat.c',line:235] lock failed: [22] Invalid argument
 12976:20150305:022001.966 One child process died (PID:12977,exitcode/signal:255). Exiting ...
 12976:20150305:022003.967 Zabbix Agent stopped. Zabbix 2.0.13 (revision 48919).
 
zabbix_proxy [12970]: [file:'selfmon.c',line:341] lock failed: [22] Invalid argument
zabbix_proxy [12972]: [file:'selfmon.c',line:341] lock failed: [22] Invalid argument
zabbix_proxy [12973]: [file:'selfmon.c',line:341] lock failed: [22] Invalid argument
 12951:20150305:022001.362 One child process died (PID:12970,exitcode/signal:255). Exiting ...
 12951:20150305:022003.365 syncing history data...
zabbix_proxy [12951]: [file:'dbcache.c',line:2196] lock failed: [22] Invalid argument

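My first guess was that some crontab script had deleted something, and sure enough, the cause was deleted semaphores. The "[22] Invalid argument" in the logs is EINVAL, which semop(2) returns when the semaphore set it is given no longer exists, so every Zabbix process that tried to take a lock failed and the daemons exited. (For an introduction to semaphores, see the excellent blog post "ipcs介绍" / "Introduction to ipcs".) The deletion script was: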

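#!/bin/sh
# Cron cleanup script as found: removes EVERY semaphore set on the host,
# including the ones zabbix_proxy and zabbix_agentd use for their locks.
for semid in `ipcs -s | cut -f2 -d" "`
do
    ipcrm -s $semid
done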
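Before scheduling a loop like this, a read-only preview of what it would hit makes the problem obvious. The following is a minimal sketch, assuming the Linux util-linux `ipcs -s` column layout (key, semid, owner, perms, nsems); `list_sems` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Sketch; assumes Linux util-linux `ipcs -s` columns: key semid owner perms nsems.
# list_sems (hypothetical helper) reads `ipcs -s` text on stdin and prints
# "semid owner" for each set. The banner, the column header and blank lines
# are skipped because their second field is not a numeric semid.
list_sems() {
    awk '$2 ~ /^[0-9]+$/ { print $2, $3 }'
}

# Read-only preview of what a delete-everything loop would remove.
# Nothing is deleted here.
if command -v ipcs >/dev/null 2>&1; then
    ipcs -s | list_sems
fi
```

Any line owned by zabbix in this output is a set the loop above would remove.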

Deleting everything this indiscriminately was bound to break something, so I added a filter condition:

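#!/bin/sh
# Skip every set whose `ipcs -s` line mentions zabbix (its owner column when
# the Zabbix daemons run as user zabbix), and keep only numeric ids so the
# banner and header lines are never passed to ipcrm.
for semid in `ipcs -s | grep -v zabbix | cut -f2 -d" " | grep '^[0-9][0-9]*$'`
do
    ipcrm -s "$semid"
done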
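Excluding zabbix only protects what you remember to exclude, and `grep -v zabbix` protects nothing if the daemons run under a different account, since the owner column is the only text field it can match. A more defensive variant removes only the sets owned by the account that actually leaks them, compares the owner column exactly rather than by substring, and defaults to a dry run. This is a sketch under assumptions: `TARGET_USER`, `DRY_RUN`, `sems_owned_by` and `cleanup` are hypothetical names, and the column layout is the Linux util-linux one.

```shell
#!/bin/sh
# Sketch: remove only semaphore sets owned by ONE named user.
# TARGET_USER is a hypothetical placeholder for the account that leaks
# semaphores; DRY_RUN=1 (the default) only prints what would be removed.
# Assumes Linux util-linux `ipcs -s` columns: key semid owner perms nsems.
TARGET_USER=${TARGET_USER:-someapp}
DRY_RUN=${DRY_RUN:-1}

# Print the semid of every set whose owner column equals $1 exactly.
sems_owned_by() {
    awk -v user="$1" '$2 ~ /^[0-9]+$/ && $3 == user { print $2 }'
}

cleanup() {
    ipcs -s | sems_owned_by "$TARGET_USER" | while read -r semid; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "would remove semid $semid (owner $TARGET_USER)"
        else
            ipcrm -s "$semid"
        fi
    done
}

if command -v ipcs >/dev/null 2>&1; then
    cleanup
fi
```

Running it once with the default dry run shows exactly which ids would go; only after reviewing that output would you set `DRY_RUN=0` for the cron job.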

After running the script again, everything is fine ^_^

posted @ 2015-03-05 17:20  forilen