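Monitoring HP server RAID disk health with Zabbix
This setup addresses failed RAID disks on HP servers. It amounts to inspecting the disk status once a day so that disk problems are found and handled promptly.
The system environment is CentOS 7.9.
Download location: https://downloads.linux.hpe.com/SDR/repo/mcp/centos/7/x86_64/current/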
| wget https://downloads.linux.hpe.com/SDR/repo/mcp/centos/7/x86_64/current/ssacli-5.10-44.0.x86_64.rpm |
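Install HP's official tool (ssacli)
| rpm -ivh ssacli-5.10-44.0.x86_64.rpm |
Alternatively, skip the download and install directly from the URL with the following command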
| rpm -ivh https://downloads.linux.hpe.com/SDR/repo/mcp/centos/7/x86_64/current/ssacli-5.10-44.0.x86_64.rpm |
Command to view the disk status
| ssacli ctrl slot=0 pd all show status |
| physicaldrive 1I:3:1 (port 1I:box 3:bay 1, 300 GB): OK |
| physicaldrive 1I:3:2 (port 1I:box 3:bay 2, 300 GB): OK |
| physicaldrive 1I:3:3 (port 1I:box 3:bay 3, 300 GB): OK |
| physicaldrive 1I:3:4 (port 1I:box 3:bay 4, 300 GB): OK |
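The status output is one line per drive, so a small filter can count the drives whose state is anything other than OK. The following is a minimal sketch that assumes the output format shown above; the function name `count_not_ok` is hypothetical and not part of ssacli, and the second sample line was edited to `Failed` purely for illustration.

```shell
#!/bin/bash
# Hypothetical helper: count physical drives whose status is not "OK".
# Reads the output of `ssacli ctrl slot=0 pd all show status` on stdin.
count_not_ok() {
    # Look only at physicaldrive lines; the status is the last field.
    awk '/physicaldrive/ && $NF != "OK" { n++ } END { print n + 0 }'
}

# Made-up sample: the second drive is marked Failed for illustration only.
sample='physicaldrive 1I:3:1 (port 1I:box 3:bay 1, 300 GB): OK
physicaldrive 1I:3:2 (port 1I:box 3:bay 2, 300 GB): Failed'

printf '%s\n' "$sample" | count_not_ok   # prints 1
```

On a real host the function would be fed by `ssacli ctrl slot=0 pd all show status | count_not_ok`. Unlike a plain `grep Failed`, this counts every non-OK state, which may or may not be what you want.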
Configure zabbix_agentd.conf
| echo "UserParameter=diskbad.count[*],/usr/local/sbin/diskinfo_z.sh" >> /etc/zabbix/zabbix_agentd.conf |
Add a cron job that runs the check once a day
| 01 01 * * * /usr/sbin/ssacli ctrl slot=0 pd all show status | grep Failed|wc -l > /usr/local/sbin/diskinfo.txt |
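With a plain `>` redirect the shell truncates the data file before ssacli has finished, so the agent could read an empty file in that window. One possible variant, sketched below, writes to a temporary file and renames it into place. This is an assumption-laden sketch rather than the method used in this post: the function name `collect_failed_count`, the `--run` switch, and the `SSACLI`/`OUT` variables are all hypothetical.

```shell
#!/bin/bash
# Sketch of a cron wrapper; the defaults mirror the paths used in this post.
SSACLI="${SSACLI:-/usr/sbin/ssacli}"
OUT="${OUT:-/usr/local/sbin/diskinfo.txt}"

collect_failed_count() {
    local tmp
    # Temp file in the same directory, so the final mv is a simple rename.
    tmp="$(mktemp "${OUT}.XXXXXX")" || return 1
    # grep -c prints the number of matching lines, like `grep | wc -l`.
    # `|| true` because grep exits non-zero when nothing matches.
    "$SSACLI" ctrl slot=0 pd all show status | grep -c Failed > "$tmp" || true
    # mktemp creates the file with mode 0600; the agent user must be able to read it.
    chmod 644 "$tmp"
    mv "$tmp" "$OUT"
}

# Only collect when called with --run, for example from cron.
if [ "${1:-}" = "--run" ]; then
    collect_failed_count
fi
```

If this were saved as, say, `/usr/local/sbin/collect_diskinfo.sh` (a hypothetical path), the cron entry would become `01 01 * * * /usr/local/sbin/collect_diskinfo.sh --run`.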
Add the check script
| cat > /usr/local/sbin/diskinfo_z.sh <<EOF |
| #!/bin/bash |
| cat /usr/local/sbin/diskinfo.txt |
| EOF |
Make the script executable
| chmod +x /usr/local/sbin/diskinfo_z.sh |
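The script above prints whatever is in the data file, so a missing or garbled file gives the agent an empty or non-numeric value. A slightly more defensive variant could look like the sketch below. The function name `read_count` and the choice of `-1` as a "no valid data" value are assumptions made for illustration; with a trigger that fires on any non-zero value, `-1` would also raise an alert.

```shell
#!/bin/bash
# Hypothetical variant of diskinfo_z.sh: only print the file if it holds a plain number.
read_count() {
    local file="${1:-/usr/local/sbin/diskinfo.txt}"
    if [ -r "$file" ] && grep -qE '^[0-9]+$' "$file"; then
        cat "$file"
    else
        # Assumption: -1 signals that no valid data is available.
        echo -1
    fi
}
```

Calling `read_count` with no argument at the end of the script would read the default path used in this post.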
Configure the Zabbix item

Configure the trigger

Restart the agent service
| systemctl restart zabbix-agent |
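After the restart it can help to confirm that the agent resolves the new key before waiting for the server to poll it. The sketch below uses the agent's `-t` (test) option, which evaluates a single item key and exits; the wrapper name `check_key` is hypothetical and only adds a check that the binary is on the PATH.

```shell
#!/bin/bash
# Hypothetical helper: ask the local agent to evaluate one item key and print the result.
check_key() {
    local key="$1"
    if ! command -v zabbix_agentd >/dev/null 2>&1; then
        echo "zabbix_agentd not found in PATH" >&2
        return 1
    fi
    # -t evaluates a single item key using the agent configuration, then exits.
    zabbix_agentd -t "$key"
}
```

For this setup the call would be `check_key diskbad.count`. From the Zabbix server side, `zabbix_get -s <agent-ip> -k diskbad.count` is the usual equivalent, assuming `zabbix_get` is installed there.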
Test the result by simulating one failed disk
| echo 1 > /usr/local/sbin/diskinfo.txt |

The alert arrived in the DingTalk group, so the test succeeded.

Appendix: monitoring template zbx_export_templates.xml
| <zabbix_export> |
| <version>5.0</version> |
| <date>2023-03-23T08:00:53Z</date> |
| <groups> |
| <group> |
| <name>Linux servers</name> |
| </group> |
| </groups> |
| <templates> |
| <template> |
| <template>HPdiskbakcount</template> |
| <name>HP server disk health monitoring</name> |
| <description>HP server disk health monitoring</description> |
| <groups> |
| <group> |
| <name>Linux servers</name> |
| </group> |
| </groups> |
| <items> |
| <item> |
| <name>diskbadcount</name> |
| <key>diskbad.count</key> |
| <description>HP server disk health monitoring; the data is updated once a day</description> |
| <valuemap> |
| <name>diskbadcount</name> |
| </valuemap> |
| <triggers> |
| <trigger> |
| <expression>{last(,30)}&lt;&gt;0</expression> |
| <name>Physical disk status abnormal; check and handle as soon as possible</name> |
| <priority>HIGH</priority> |
| <description>Physical disk status abnormal</description> |
| <manual_close>YES</manual_close> |
| </trigger> |
| </triggers> |
| </item> |
| </items> |
| </template> |
| </templates> |
| <graphs> |
| <graph> |
| <name>HP server disk health</name> |
| <graph_items> |
| <graph_item> |
| <sortorder>1</sortorder> |
| <color>1A7C11</color> |
| <item> |
| <host>HPdiskbakcount</host> |
| <key>diskbad.count</key> |
| </item> |
| </graph_item> |
| </graph_items> |
| </graph> |
| </graphs> |
| <value_maps> |
| <value_map> |
| <name>diskbadcount</name> |
| <mappings> |
| <mapping> |
| <value>0</value> |
| <newvalue>OK</newvalue> |
| </mapping> |
| <mapping> |
| <value>1</value> |
| <newvalue>Failed1</newvalue> |
| </mapping> |
| <mapping> |
| <value>2</value> |
| <newvalue>Failed2</newvalue> |
| </mapping> |
| </mappings> |
| </value_map> |
| </value_maps> |
| </zabbix_export> |
