ELK: Configuring Sentinl, a Visual Monitoring and Alerting Plugin for Kibana

  References:
  https://www.bbsmax.com/A/gGdXbgXmJ4/
  https://www.deathearth.com/333.html
  https://www.cnblogs.com/amyzhu/p/10193557.html

  Once ELK is up and running, how do you raise alerts from the data it collects? One option is the Sentinl plugin.

  I. Environment

  1. System environment

 

  2. Software versions

java 1.8.0_171
elasticsearch 6.2.4
kibana 6.2.4

  II. Installation

  1. Install ELK

  Not covered here.

  2. Install the Sentinl plugin

  Download the plugin release that matches your ELK version (6.2.4 in this case) from:

  https://github.com/sirensolutions/sentinl/releases/

/usr/share/kibana/bin/kibana-plugin install file:///nas/nas/softs/elk/6.2.4/sentinl-v6.2.4-1.zip 

  After installation, check that the plugin has been installed.

 

   To configure email, append the following to the end of the Kibana configuration file /etc/kibana/kibana.yml:

sentinl:
  settings:
    email:
      active: true
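      user: xxx@xxx.com # email address
      password: xxxx       # mailbox password or client authorization code
      host: smtp.exmail.qq.com # outgoing (SMTP) mail server
      ssl: true   # set to match your mail server; if false, change port to 25. Alibaba Cloud blocks port 25, so use SSL there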
      port: 465
    report:
      active: true
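Before restarting Kibana, it can save a round trip to check the mail credentials outside Sentinl. The sketch below is not part of Sentinl: it uses Python's standard smtplib with the same host, user and password as kibana.yml, and encodes the port rule from the ssl comment (465 with SSL, 25 without). The function names are hypothetical, and a plain port-25 connection without STARTTLS may be refused by servers that require encryption.

```python
import smtplib


def smtp_port(ssl: bool) -> int:
    """Port rule from the kibana.yml comment: 465 with SSL, 25 without."""
    return 465 if ssl else 25


def check_smtp_login(host: str, user: str, password: str,
                     ssl: bool = True, timeout: float = 10.0) -> None:
    """Hypothetical helper (not part of Sentinl): log in with the kibana.yml values.

    Raises smtplib.SMTPException or OSError if the connection or login fails.
    """
    port = smtp_port(ssl)
    if ssl:
        server = smtplib.SMTP_SSL(host, port, timeout=timeout)  # implicit TLS, e.g. port 465
    else:
        server = smtplib.SMTP(host, port, timeout=timeout)      # plain SMTP, e.g. port 25
    with server:  # sends QUIT when the block exits
        server.login(user, password)
```

Usage: `check_smtp_login("smtp.exmail.qq.com", "xxx@xxx.com", "xxxx")` with your real address and password or authorization code; if it returns without an exception, the same values should work in kibana.yml.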


  Restart Kibana:

systemctl restart kibana

   Open elasticsearch-head and you should see a new index whose name begins with watcher_alarms.
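Without elasticsearch-head, the same check can be scripted against the `_cat/indices` API. A minimal sketch, assuming Elasticsearch listens on http://localhost:9200 without authentication; both helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def alarm_index_names(cat_indices_json: str) -> list:
    """Extract index names from a `_cat/indices?format=json` response body."""
    return sorted(row["index"] for row in json.loads(cat_indices_json))


def fetch_alarm_indices(es_url: str = "http://localhost:9200") -> list:
    """List indices matching watcher_alarms* (assumes a local ES without auth)."""
    url = f"{es_url}/_cat/indices/watcher_alarms*?format=json"
    with urlopen(url, timeout=10) as resp:
        return alarm_index_names(resp.read().decode("utf-8"))
```

Usage: `print(fetch_alarm_indices())`; an empty list suggests the alarm index has not been created yet.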


  The Kibana menu now has a Sentinl entry.

  Create a new watcher.


   Once the watcher is saved, you can edit it further or test it.

 

   Click to run a test.

 

   Check the resulting alarm records.

 

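   In the Advanced tab you can edit the watcher definition directly to set the query and the alert condition. A fairly complete definition looks like this: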

{
  "actions": {
    "Email_alarm_773206d5-2977-465e-882d-762a7d69fe68": {
      "name": "Email alarm",
      "throttle_period": "15m",
      "email": {
        "priority": "low",
        "stateless": false,
        "body": "Find error log {{payload.hits.total}}", # email body: the number of log entries matching the keyword
        "to": "xxx@xxx.com",           # recipient, set as needed
        "from": "xxx@xxx.com"         # sender: the mailbox configured in kibana.yml
      }
    }
  },
  "input": {
    "search": {
      "request": {
        "index": [
          "system-log-*"                  # index name
        ],
        "body": {
          "query": {
            "bool": {
              "must": [
                {
                  "range": {
                    "@timestamp": {                # time field to match on
                      "gte": "now-5m/m",          # greater than or equal to now minus 5 minutes
                      "lte": "now/m",                 # less than or equal to now
                      "format": "epoch_millis"
                    }
                  }
                }
              ],
              "filter": [
                {
                  "multi_match": {
                    "type": "best_fields",
                    "query": "error",                  # match log entries containing the keyword error
                    "lenient": true
                  }
                }
              ]
            }
          },
          "size": 0,
          "aggs": {
            "dateAgg": {
              "date_histogram": {
                "field": "@timestamp",
                "time_zone": "Asia/Shanghai",
                "interval": "1m",
                "min_doc_count": 1
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "script": {
      "script": "payload.hits.total>1"                # run the actions when there is more than 1 match
    }
  },
  "trigger": {
    "schedule": {
      "later": "every 5 minutes"                      # run every five minutes
    }
  },
  "disable": false,
  "report": false,
  "title": "system-log error log alert",
  "wizard": {},
  "save_payload": false,
  "spy": false,
  "impersonate": false
}

   PS: The comments were added only to make the example easier to follow; the actual configuration must not contain comments.
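Because the Advanced editor expects plain JSON, an annotated copy like the one above needs its `#` comments removed before pasting. A minimal sketch (hypothetical helpers, not part of Sentinl) that drops `#` comments only when they appear outside string literals, so values such as an address containing `#` survive, and then validates the result with `json.loads`:

```python
import json


def strip_hash_comments(text: str) -> str:
    """Drop '#' comments that appear outside JSON string literals (hypothetical helper)."""
    out = []
    in_string = False   # currently inside "..."
    escaped = False     # previous char was a backslash inside a string
    in_comment = False  # skipping until end of line
    for ch in text:
        if in_comment:
            if ch == "\n":
                in_comment = False
                out.append(ch)  # keep the line break
            continue
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
            out.append(ch)
        elif ch == "#":
            in_comment = True
        else:
            out.append(ch)
    return "".join(out)


def to_plain_json(annotated: str) -> str:
    """Strip comments, check that the result parses, and re-emit it for the Advanced tab."""
    watcher = json.loads(strip_hash_comments(annotated))
    return json.dumps(watcher, indent=2, ensure_ascii=False)
```

For example, save the annotated version to a file (any name) and pass its contents to `to_plain_json`; a `json.JSONDecodeError` points at whatever is still not valid JSON.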

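  This watcher checks whether the keyword error appears in the corresponding logs within the last five minutes; if it appears more than once, an email alert is sent.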
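What "the last five minutes" covers depends on how Elasticsearch rounds date math in range queries: with `gte`, `now-5m/m` is rounded down to the start of the minute, while with `lte`, `now/m` is rounded up to the last millisecond of the current minute. A small sketch of the resulting bounds, assuming a single time zone throughout (the function names are hypothetical):

```python
from datetime import datetime, timedelta


def minute_floor(t: datetime) -> datetime:
    """Round down to the start of the minute, like the /m suffix."""
    return t.replace(second=0, microsecond=0)


def watcher_window(now: datetime, minutes: int = 5):
    """Approximate the [gte, lte] bounds of "now-5m/m" .. "now/m" (hypothetical helper)."""
    start = minute_floor(now - timedelta(minutes=minutes))  # gte: rounds down
    # lte: rounds up to the last millisecond of the current minute
    end = minute_floor(now) + timedelta(minutes=1) - timedelta(milliseconds=1)
    return start, end
```

For a run at 14:25:30, the window is 14:20:00 to 14:25:59.999. With a five-minute schedule, consecutive windows can therefore overlap by up to about a minute, so an error logged near a boundary may be counted in two runs; the `throttle_period` limits how often the email is actually sent.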

  To trigger the alert, redirect the word error into the corresponding log a few times.
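The same test in Python, roughly equivalent to redirecting `echo` output into the log file. The log path is a placeholder: use whichever file is shipped into the `system-log-*` index. Write at least two lines, because the condition is `hits.total > 1`, and make sure they are indexed inside the watcher's five-minute window. The helper name is hypothetical.

```python
from datetime import datetime


def append_error_lines(path: str, count: int = 3) -> None:
    """Append `count` lines containing the keyword "error" to a log file (hypothetical helper)."""
    if count < 2:
        # The watcher condition is payload.hits.total > 1, so one line is not enough.
        raise ValueError("write at least 2 lines to satisfy hits.total > 1")
    with open(path, "a", encoding="utf-8") as log:
        for i in range(count):
            log.write(f"{datetime.now().isoformat()} sentinl test error {i + 1}\n")
```

Usage: `append_error_lines("/path/to/your.log")`, then wait for the next scheduled run or trigger a test run from the Sentinl UI.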

  The resulting email looks like this:
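The email body is a mustache-style template filled from the search response (the payload), and the action runs only when the condition script evaluates to true. To preview what the email will say for a given result, here is a minimal Python imitation of those two steps; it is not Sentinl's code, supports only simple `{{ dotted.path }}` placeholders, and the function names are hypothetical.

```python
import re


def render(template: str, payload: dict) -> str:
    """Replace {{ payload.x.y }} placeholders with payload values (tiny mustache subset)."""
    def lookup(match):
        value = {"payload": payload}
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)


def condition_met(payload: dict, threshold: int = 1) -> bool:
    """Python equivalent of the condition script "payload.hits.total>1"."""
    return payload["hits"]["total"] > threshold
```

For the CPU watcher below, whose condition is `payload.hits.total >=1`, the equivalent call would be `condition_met(payload, threshold=0)`.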

 

   Next, here is a watcher configuration that alerts on CPU usage:

{
  "actions": {
    "HTML_email_alarm_5fbf1925-81fc-4d73-a37e-b6ac8b9bfc06": {
      "name": "HTML email alarm",
      "throttle_period": "1m",
      "email_html": {
        "html": "Number of times CPU usage exceeded 10% in the last five minutes: {{ payload.hits.total }}",
        "priority": "low",
        "stateless": false,
        "to": "xxx@xxx.com",
        "from": "xxx@xxx.com"
      }
    }
  },
  "input": {
    "search": {
      "request": {
        "index": [
          "metricbeat-*"
        ],
        "body": {
          "query": {
            "bool": {
              "filter": [
                {
                  "range": {
                    "system.cpu.total.pct": {
                      "gt": 0.1
                    }
                  }
                }
              ],
              "must": [
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-5m/m",
                      "lte": "now/m",
                      "format": "epoch_millis"
                    }
                  }
                }
              ]
            }
          },
          "size": 0,
          "aggs": {
            "dateAgg": {
              "date_histogram": {
                "field": "@timestamp",
                "time_zone": "Europe/Amsterdam",
                "interval": "1m",
                "min_doc_count": 1
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "script": {
      "script": "payload.hits.total >=1"
    }
  },
  "trigger": {
    "schedule": {
      "later": "every 5 minutes"
    }
  },
  "disable": false,
  "report": false,
  "title": "metricbeat",
  "wizard": {},
  "save_payload": true,
  "spy": false,
  "impersonate": false
}

   This watcher alerts when CPU usage exceeds 10%. system.cpu.total.pct is a floating-point fraction, so "greater than 0.1" means "greater than 10%".
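When changing the threshold, it is easy to get the scale wrong. A small sketch (hypothetical helper) that turns a percentage into the range filter used above. The `cores` parameter is an assumption worth checking against your own data: as far as I know, Metricbeat's `system.cpu.total.pct` is not normalized by core count, so on a multi-core host it can exceed 1.0, whereas `system.cpu.total.norm.pct` (if enabled in Metricbeat) is normalized.

```python
def cpu_range_filter(percent: float, cores: int = 1,
                     field: str = "system.cpu.total.pct") -> dict:
    """Build the watcher's CPU range filter, e.g. 10% -> {"gt": 0.1} (hypothetical helper).

    `cores` scales the threshold for a non-normalized field on multi-core hosts
    (an assumption; keep it at 1 for a normalized field such as system.cpu.total.norm.pct).
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return {"range": {field: {"gt": percent / 100 * cores}}}
```

`cpu_range_filter(10)` reproduces the filter in the watcher above; on a 4-core host with the non-normalized field, `cpu_range_filter(10, cores=4)` would compare against roughly 0.4 instead.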


posted @ 2019-09-29 14:25  minseo