Monitoring VMware vSphere (VCSA) with Prometheus
[1] My recommended setup
Download the package from the project page: https://github.com/pryorda/vmware_exporter
# Decompress the archive, then import it as a Docker image
gzip -d vmware_exporter-0.18.3.tar.gz
cat vmware_exporter-0.18.3.tar | docker import - vmware_exporter
Alternatively, pull the image with Docker. To configure registry mirrors first, edit the daemon configuration with vim /etc/docker/daemon.json:
{
  "registry-mirrors": [
    "https://docker.registry.cyou",
    "https://docker-cf.registry.cyou",
    "https://dockercf.jsdelivr.fyi",
    "https://docker.jsdelivr.fyi",
    "https://dockertest.jsdelivr.fyi",
    "https://mirror.aliyuncs.com",
    "https://dockerproxy.com",
    "https://mirror.baidubce.com",
    "https://docker.m.daocloud.io",
    "https://docker.nju.edu.cn",
    "https://docker.mirrors.sjtug.sjtu.edu.cn",
    "https://docker.mirrors.ustc.edu.cn",
    "https://mirror.iscas.ac.cn",
    "https://docker.rainbond.cc",
    "https://docker.xuanyuan.me",
    "https://doublezonline.cloud",
    "https://dislabaiot.xyz",
    "https://docker.fxxk.dedyn.io"
  ]
}
systemctl daemon-reload
systemctl restart docker
docker pull pryorda/vmware_exporter
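I. Running vmware_exporter on Linux
(1) Running it directly
vmware_exporter is stateless, so it needs no persistent storage.
It can be run directly in either of two ways: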
# Run with Docker
docker run -d -p 9272:9272 \
  -e VSPHERE_USER=administrator@192.168.31.200 \
  -e VSPHERE_PASSWORD=<password> \
  -e VSPHERE_HOST=<host> \
  -e VSPHERE_IGNORE_SSL=True \
  -e VSPHERE_SPECS_SIZE=2000 \
  --name vmware_exporter pryorda/vmware_exporter
# Run natively on Linux (requires Python >= 3.6; not demonstrated here)
1. Install from source or via pip. The Docker command above is preferred.
   $ python setup.py install
   $ pip install vmware_exporter
2. Create config.yml based on the configuration section. Some variables can be passed as environment variables.
3. Run the exporter:
   $ vmware_exporter -c /path/to/your/config
4. Go to http://localhost:9272/metrics?vsphere_host=vcenter.company.com to see metrics.
To test, open http://192.168.31.200:9272/metrics, that is, port 9272 on the Docker host. You can map a different port if you prefer.
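A quick way to confirm that the endpoint returns vSphere data rather than only the exporter's own process metrics is to count the samples whose names start with vmware_. The sketch below assumes curl is installed; the helper name count_vmware_metrics is made up for this example.

```shell
#!/usr/bin/env bash
# Count samples whose metric name starts with "vmware_" in Prometheus
# text-format input read from stdin. Comment lines (# HELP / # TYPE)
# start with "#" and are therefore not counted.
count_vmware_metrics() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps the
  # function usable under "set -e" while still printing the count.
  grep -c '^vmware_' || true
}

# Usage (assumes the exporter from the steps above is running):
#   curl -s http://192.168.31.200:9272/metrics | count_vmware_metrics
```

A count of 0 usually points to a connection or credential problem between the exporter and vCenter, so the container logs are the next place to look.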
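(2) Passing connection parameters through a configuration file
# Write the connection settings to an env file
mkdir -p /data/vmware && cd /data/vmware
cat > vmware_config.env <<eof
VSPHERE_USER=administrator@vsphere.local
VSPHERE_PASSWORD=xxx
VSPHERE_HOST=xxx
VSPHERE_IGNORE_SSL=TRUE
VSPHERE_SPECS_SIZE=2000
eof
# VSPHERE_USER      vCenter user name
# VSPHERE_PASSWORD  vCenter password (replace xxx)
# VSPHERE_HOST      vCenter address (replace xxx)

# Start the container
docker run -itd -p 9272:9272 --name vmware_exporter --env-file /data/vmware/vmware_config.env pryorda/vmware_exporter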
To test, open http://192.168.31.200:9272/metrics, that is, port 9272 on the Docker host. You can map a different port if you prefer.
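Because a missing or empty connection variable only shows up once the container is running, it can help to check the env file first. The following is a minimal sketch; the function name check_env_file is hypothetical, and treating exactly these three variables as required is an assumption based on the connection settings used in this post.

```shell
#!/usr/bin/env bash
# Verify that an env file defines the three connection variables with
# non-empty values. Prints each problem to stderr and returns 1 if any
# variable is missing or empty, 0 otherwise.
check_env_file() {
  local file="$1"
  local missing=0
  local key
  for key in VSPHERE_HOST VSPHERE_USER VSPHERE_PASSWORD; do
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "missing or empty: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_env_file /data/vmware/vmware_config.env && echo "env file looks complete"
```

The check only looks at the shape of the file. It does not verify that the credentials are accepted by vCenter.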
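(3) A closer look at vmware_exporter environment variables and parameters
-e VSPHERE_USER=administrator@192.168.31.200 -- user name
-e VSPHERE_PASSWORD=<password> -- password
-e VSPHERE_HOST=<host> -- vCenter host name or IP address
-e VSPHERE_IGNORE_SSL=True -- skip SSL certificate verification
-e VSPHERE_SPECS_SIZE=2000 -- size of the specs list used by the stats query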
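Environment variable reference
| Variable | Default | Description |
|---|---|---|
| VSPHERE_HOST | (none) | Address of the vSphere server to connect to |
| VSPHERE_USER | (none) | User name for the connection |
| VSPHERE_PASSWORD | (none) | Password for the connection |
| VSPHERE_SPECS_SIZE | 5000 | Size of the specs list for the query stats function |
| VSPHERE_IGNORE_SSL | False | Ignore the SSL certificate on the connection to the vSphere host |
| VSPHERE_FETCH_CUSTOM_ATTRIBUTES | False | Set to true to collect object custom attributes as metric labels |
| VSPHERE_FETCH_TAGS | False | Set to true to collect object tags as metric labels |
| VSPHERE_FETCH_ALARMS | False | Fetch triggered alarms for objects and, for hosts, hardware alarms as well |
| VSPHERE_COLLECT_HOSTS | True | Set to false to disable collection of host metrics |
| VSPHERE_COLLECT_DATASTORES | True | Set to false to disable collection of datastore metrics |
| VSPHERE_COLLECT_VMS | True | Set to false to disable collection of virtual machine metrics |
| VSPHERE_COLLECT_VMGUESTS | True | Set to false to disable collection of virtual machine guest metrics |
| VSPHERE_COLLECT_SNAPSHOTS | True | Set to false to disable collection of snapshot metrics |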
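As an illustration of how the collector switches combine with the connection settings, here is a sketch of an env file for a lighter profile that keeps host, datastore, and VM metrics but turns off the guest and snapshot collectors. The profile and the function name print_light_profile are invented for this example, and the xxx values are placeholders to replace.

```shell
#!/usr/bin/env bash
# Print env-file lines for a hypothetical "light" profile.
# Connection values are placeholders, not real credentials.
print_light_profile() {
  cat <<'EOF'
VSPHERE_HOST=xxx
VSPHERE_USER=administrator@vsphere.local
VSPHERE_PASSWORD=xxx
VSPHERE_IGNORE_SSL=True
VSPHERE_COLLECT_VMGUESTS=False
VSPHERE_COLLECT_SNAPSHOTS=False
EOF
}

# Usage:
#   print_light_profile > /data/vmware/vmware_config.env
```

Keep in mind that disabling snapshot collection also removes the data behind any snapshot-based alert rules, so those rules would never fire with this profile.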
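VIII. Grafana setup
Prometheus now has the data, so the next step is to import a dashboard template:
https://grafana.com/grafana/dashboards/11243
I currently use dashboard 11243. I have not found a better one and plan to customize one myself later.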
IX. Alertmanager alert configuration
Alertmanager was set up by following the article below.
Add the rule file:
# Mind the formatting: a malformed rule file prevents Prometheus from starting
vim /etc/prometheus/rules/vmware_exporter.yaml

groups:
- name: vmware status
  rules:
  - alert: HighNumberOfSnapshots  # too many snapshots
    expr: vmware_vm_snapshots > 5
    for: 30m
    labels:
      severity: warning
    annotations:
      summary: High Number of Snapshots (instance {{ $labels.instance }})
      description: "High snapshots number on {{ $labels.instance }}: {{ $value }}\n Num = {{ $value }}\n VMware_Name = {{ $labels.vm_name }}"
  - alert: VirtualMachineMemoryWarning  # VM memory warning (80% up to 90%)
    expr: vmware_vm_mem_usage_average / 100 >= 80 and vmware_vm_mem_usage_average / 100 < 90
    for: 30m
    labels:
      severity: warning
    annotations:
      summary: Virtual Machine Memory Warning (instance {{ $labels.instance }})
      description: "High memory usage on {{ $labels.instance }}: {{ $value | printf \"%.2f\" }}%\n VALUE = {{ $value }}\n VMware_Name = {{ $labels.vm_name }}"
  - alert: VirtualMachineMemoryCritical  # VM memory critical (90% and above)
    expr: vmware_vm_mem_usage_average / 100 >= 90
    for: 30m
    labels:
      severity: error
    annotations:
      summary: Virtual Machine Memory Critical (instance {{ $labels.instance }})
      description: "High memory usage on {{ $labels.instance }}: {{ $value | printf \"%.2f\" }}%\n VALUE = {{ $value }}\n VMware_Name = {{ $labels.vm_name }}"
  - alert: OutdatedSnapshots  # stale snapshots (age in days)
    expr: (time() - vmware_vm_snapshot_timestamp_seconds) / (60 * 60 * 24) >= 90
    for: 30m
    labels:
      severity: warning
    annotations:
      summary: Outdated Snapshots (instance {{ $labels.instance }})
      description: "Outdated snapshots on {{ $labels.instance }}: {{ $value | printf \"%.0f\" }} days\n VALUE = {{ $value }}\n VMware_Name = {{ $labels.vm_name }}"
  - alert: EsxiHostMemoryCritical  # ESXi host memory usage percentage
    expr: ((vmware_host_memory_usage / vmware_host_memory_max) * 100) > 50
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: Esxi Host Memory Warning (instance {{ $labels.instance }})
      description: "High ESXi host memory usage on {{ $labels.instance }}: {{ $value | printf \"%.0f\" }} %\n VALUE = {{ $value }}\n VMware_Name = {{ $labels.host_name }}"
  - alert: EsxiHostCPUCritical  # ESXi host CPU usage percentage
    expr: ((vmware_host_cpu_usage / vmware_host_cpu_max) * 100) > 50
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: Esxi Host CPU Warning (instance {{ $labels.instance }})
      description: "High ESXi host CPU usage on {{ $labels.instance }}: {{ $value | printf \"%.0f\" }} %\n VALUE = {{ $value }}\n VMware_Name = {{ $labels.host_name }}"
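The arithmetic behind the snapshot-age and memory rules can be checked by hand with a small sketch. Two assumptions are baked in and are not confirmed by the exporter's documentation here: the raw vmware_vm_mem_usage_average value is in hundredths of a percent (which is why the rules divide by 100), and the warning band is meant to run from 80% up to, but not including, the 90% critical threshold. The function names are made up for this example.

```shell
#!/usr/bin/env bash
# Age of a snapshot in whole days, mirroring
#   (time() - vmware_vm_snapshot_timestamp_seconds) / (60 * 60 * 24)
# Shell integer division truncates; PromQL keeps the fraction.
snapshot_age_days() {
  local now="$1"
  local created="$2"
  echo $(( (now - created) / 86400 ))
}

# Classify a raw vmware_vm_mem_usage_average sample.
# ASSUMPTION: the raw value is in hundredths of a percent.
# Severity names follow the labels used in the rules (warning / error).
mem_severity() {
  local pct=$(( $1 / 100 ))
  if [ "$pct" -ge 90 ]; then
    echo error
  elif [ "$pct" -ge 80 ]; then
    echo warning
  else
    echo ok
  fi
}

# Usage:
#   snapshot_age_days "$(date +%s)" 1700000000
#   mem_severity 8500
```

Truncating to whole percent does not change the classification, because a value is at or above an integer threshold exactly when its integer part is.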
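I added two rules of my own here, alerting on ESXi host CPU and memory usage. The exporter exposes many metrics, so add alerts that fit your own needs; they do not have to match mine.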
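Since a malformed rule file keeps Prometheus from starting, it is worth validating the file after every change. The sketch below wraps promtool check rules, the checker that ships with Prometheus; the wrapper name validate_rules is hypothetical, and it assumes promtool is on the PATH.

```shell
#!/usr/bin/env bash
# Validate a Prometheus rule file with promtool before restarting or
# reloading Prometheus. Returns 127 if promtool is not installed,
# otherwise promtool's own exit status.
validate_rules() {
  local rules_file="$1"
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found in PATH" >&2
    return 127
  fi
  promtool check rules "$rules_file"
}

# Usage:
#   validate_rules /etc/prometheus/rules/vmware_exporter.yaml
```

promtool checks syntax and expression validity only. Whether a threshold actually matches your environment still has to be judged from the data.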
