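1, Categories of monitoring metrics:
Hardware monitoring: temperature, hardware faults, etc.
System monitoring: CPU, memory, disk, NIC traffic, TCP states, process count
Application monitoring: Nginx, Tomcat, PHP, MySQL, Redis, etc.
Log monitoring: system logs, service logs, access logs, error logs
Security monitoring: WAF, sensitive-file monitoring
API monitoring: availability, request counts, response time
Business monitoring: for an e-commerce site, for example, orders per minute, new user registrations, active users, and the effect of promotional campaigns
Traffic analysis: derive user information from traffic, such as geographic location, visits to a given page, and time spent on a page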
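2, Prometheus provides a large number of official and third-party exporters:
https://prometheus.io/docs/instrumenting/exporters/
Entries marked (official) are developed by the Prometheus project.
Entries without (official) are developed by the community.
By default Prometheus collects data in pull mode: the server scrapes metrics from its targets over HTTP. This is also the officially recommended approach.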
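To make the pull model concrete, here is a minimal sketch of what an exporter does: it serves plain-text metrics on /metrics and waits for the server to scrape them. It uses only the Python standard library rather than an official client library, and the metric names, values, and port 8000 are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(values):
    """Render {metric_name: number} as text exposition lines (gauges only)."""
    lines = []
    for name, value in sorted(values.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    # Hypothetical sample values; a real exporter would read them
    # from the service it monitors on every scrape.
    values = {"demo_queue_length": 3, "demo_temperature_celsius": 41.5}

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(self.values).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on the given (assumed) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() and adding the host and port as a target under scrape_configs would let the Prometheus server pull these values at each scrape_interval.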
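3, Prometheus components and architecture:
Prometheus Server: scrapes metrics, stores them as time series data, and provides a query interface
Client Library: client libraries for instrumenting application code
Push Gateway: short-term storage for metrics. Mainly used for short-lived jobs, which push their metrics to the Pushgateway; the Prometheus Server then pulls them from the Pushgateway.
Exporters: collect metrics from existing third-party services and expose them as metrics endpoints
Alertmanager: handles alerts
Web UI: a simple web console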
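The Pushgateway flow described above can be sketched as follows: a short-lived job builds an HTTP request that pushes its metrics to the gateway under /metrics/job/<job>. This is only an illustration; the gateway address (port 9091 is the Pushgateway default, but nothing in these notes confirms one is running on this host), the job name, and the metric name are all assumptions.

```python
import urllib.request
from urllib.parse import quote


def build_push_request(gateway, job, metrics_text, method="POST"):
    """Build (but do not send) a push request for one job's metrics."""
    url = f"{gateway.rstrip('/')}/metrics/job/{quote(job, safe='')}"
    return urllib.request.Request(
        url,
        data=metrics_text.encode("utf-8"),
        method=method,
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


# Hypothetical usage at the end of a batch job:
#   req = build_push_request("http://192.168.0.11:9091", "backup",
#                            "backup_last_success_timestamp_seconds 1\n")
#   urllib.request.urlopen(req, timeout=5)
```

Prometheus would then scrape the Pushgateway like any other target, so the job itself does not need to stay alive until the next scrape.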
4, Download the server binary package:
prometheus-2.6.1.linux-amd64.tar.gz
[root@centos7 prometheus-2.6.1.linux-amd64]# ./prometheus --help
5, Start the Prometheus server
[root@centos7 -amd64]# ./prometheus --config.file="./prometheus.yml"
6, Check the configuration file syntax:
[root@centos7 prometheus]# ./promtool check config prometheus.yml
7, Configure the server to start via systemctl:
[root@centos7 ~]# cat /usr/lib/systemd/system/prometheus.service
[Unit]
Description=prometheus
[Service]
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml
Restart=on-failure
[Install]
WantedBy=multi-user.target
[root@centos7 ~]#
relabel_configs: allows any target and its labels to be modified before the scrape
What is relabeling for?
Renaming labels
Deleting labels
Filtering targets
8, Monitor the Prometheus server itself:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
  - static_configs:
    - targets:
rule_files:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['192.168.0.11:9090']
9, Add a custom label:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
  - static_configs:
    - targets:
rule_files:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['192.168.0.11:9090']
      labels:
        idc: bj
10, Query by the custom label:
process_cpu_seconds_total
Result:
Element Value
process_cpu_seconds_total{idc="bj",instance="192.168.0.11:9090",job="prometheus"} 0.34
process_cpu_seconds_total{instance="192.168.0.11:9090",job="prometheus"} 3.14
process_cpu_seconds_total{idc="bj"}
Result:
process_cpu_seconds_total{idc="bj",instance="192.168.0.11:9090",job="prometheus"} 0.86
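The same label-filtered query can also be issued outside the web UI through the server's HTTP API at /api/v1/query. The sketch below builds the request URL and, optionally, fetches the result; the function names are illustrative, and it assumes the server from these notes is reachable at 192.168.0.11:9090.

```python
import json
import urllib.request
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def run_instant_query(base_url, promql, timeout=5):
    """Send the query and return the list under data.result."""
    url = instant_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["data"]["result"]


# Hypothetical usage against a running server:
#   run_instant_query("http://192.168.0.11:9090",
#                     'process_cpu_seconds_total{idc="bj"}')
```

Each element of the returned list carries a metric object holding the labels and a value pair of timestamp and sample value.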
11, Rename a label
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
  - static_configs:
    - targets:
rule_files:
scrape_configs:
  - job_name: 'bj'
    static_configs:
    - targets: ['192.168.0.11:9090']
    relabel_configs:
    - action: replace
      source_labels: ['job']
      regex: (.*)          # matched against the value of the job label: bj
      replacement: $1      # $1 is whatever (.*) captured
      target_label: idc    # writes the value to idc, so job='bj' also appears as idc='bj'; the job label itself is kept
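To show what the replace action does to a target's label set, here is a simplified simulation in Python. It is a sketch of the documented behavior (join the source label values with the separator, match the regex against the whole string, expand the replacement, write the target label), not Prometheus's actual code; Prometheus uses RE2 syntax, so only simple patterns behave identically under Python's re module.

```python
import re


def relabel_replace(labels, source_labels, target_label,
                    regex="(.*)", replacement="$1", separator=";"):
    """Return a new label dict after a simplified 'replace' relabel step."""
    # Source label values are concatenated with the separator.
    value = separator.join(labels.get(name, "") for name in source_labels)
    # The regex is anchored: it must match the entire concatenated value.
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # no match: labels are left unchanged
    # Translate Prometheus-style $1 references into Python's \g<1> form.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


before = {"job": "bj", "instance": "192.168.0.11:9090"}
after = relabel_replace(before, ["job"], "idc")
print(after)
```

The output keeps job and adds idc with the same value, which is why removing the old label requires a separate step.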
12, Delete a label
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
  - static_configs:
    - targets:
rule_files:
scrape_configs:
  - job_name: 'bj'
    static_configs:
    - targets: ['192.168.0.11:9090']
    relabel_configs:
    - action: replace
      source_labels: ['job']
      regex: (.*)
      replacement: $1
      target_label: idc
    - action: labeldrop
      regex: job
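Unlike replace, labeldrop matches its regex against label names rather than values and removes every label whose name matches. A minimal simulation, again a sketch rather than Prometheus's implementation, with a hypothetical label set:

```python
import re


def labeldrop(labels, regex):
    """Return a copy of labels without the names that fully match regex."""
    return {name: value for name, value in labels.items()
            if re.fullmatch(regex, name) is None}


labels = {"job": "bj", "idc": "bj", "instance": "192.168.0.11:9090"}
print(labeldrop(labels, "job"))
```

Because the match is anchored, regex: job removes only the label named job; a label such as job_name would survive.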
13, File-based service discovery
[root@centos7 prometheus]# cat /usr/local/prometheus/sd_config/test.yml
- targets: ['192.168.0.11:9090']
[root@centos7 prometheus]#
[root@centos7 prometheus]# cat prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
  - static_configs:
    - targets:
rule_files:
scrape_configs:
  - job_name: 'bj'
    file_sd_configs:
    - files: ['/usr/local/prometheus/sd_config/*.yml']
      refresh_interval: 5s
[root@centos7 prometheus]#
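Since Prometheus re-reads the files matched by file_sd_configs, targets can be managed by a script that rewrites those files. The sketch below renders a target list in the same shape as test.yml and replaces the file atomically so the server never sees a half-written file. The helper names are hypothetical, and the simple quoting assumes targets and label values contain no single quotes.

```python
import os
import tempfile


def render_file_sd(targets, labels=None):
    """Render one target group as YAML text for file_sd_configs."""
    lines = ["- targets:"]
    for target in targets:
        lines.append(f"  - '{target}'")
    if labels:
        lines.append("  labels:")
        for name, value in sorted(labels.items()):
            lines.append(f"    {name}: '{value}'")
    return "\n".join(lines) + "\n"


def write_file_sd(path, targets, labels=None):
    """Write the target file via a temp file plus rename (atomic on POSIX)."""
    directory = os.path.dirname(os.path.abspath(path))
    # The .tmp suffix keeps the temp file outside the *.yml glob.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(render_file_sd(targets, labels))
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file as 0600
    os.replace(tmp_path, path)
```

For example, write_file_sd("/usr/local/prometheus/sd_config/test.yml", ["192.168.0.11:9090"], {"idc": "bj"}) would produce a file the configuration above picks up at the next refresh.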
14, Monitor Linux servers with node_exporter
Running the node_exporter executable is all it takes to start the exporter; by default it listens on port 9100.
[root@centos7 node]# cat /etc/systemd/system/node.service
[Unit]
Description=node
[Service]
Restart=on-failure
ExecStart=/usr/local/node/node_exporter
[Install]
WantedBy=multi-user.target
[root@centos7 node]#
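Once the service is running, a quick way to confirm it works is to fetch http://<host>:9100/metrics and read a few samples. The sketch below fetches and parses the text format into a dict; it is deliberately simplified (it assumes no trailing timestamps on sample lines), and the host in the usage comment is an assumption.

```python
import urllib.request


def parse_metrics(text):
    """Map each series (name plus labels) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(url, timeout=5):
    """Download and parse a metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Hypothetical usage, assuming node_exporter runs on this host:
#   fetch_metrics("http://192.168.0.11:9100/metrics")["node_load1"]
```

Adding the same host and port as a target under scrape_configs then brings these series into Prometheus.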