Custom DingTalk Alerts with FastAPI: Monitoring Java Services

Configuration file: config.yaml

global:
  dingtalk_webhook: "https://oapi.dingtalk.com/robot/send?access_token=b18fc3a58fd6ab1c153eae423b5d92c3edf90164736190d2e4361d8ffbfc76be"
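  keyword: "Monitoring"  # must match the custom keyword set in the DingTalk robot's security settings
  check_interval: 60  # in seconds
  environment: "Production"
  severity: "Critical"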

nodes:
  - name: "Order Service"
    url: "http://10.0.0.1:8080/actuator/health"
  - name: "Payment Service"
    url: "http://10.0.0.2:8080/actuator/health"
  - name: "Inventory Service"
    url: "http://10.0.0.3:8080/actuator/health"
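
monitor.py reads config["global"] and config["nodes"] by direct indexing, so a missing key only shows up as a KeyError when the module is imported. A minimal sketch of a pre-flight check, assuming the parsed YAML is passed in as a plain dict; `validate_config` and `REQUIRED_GLOBAL_KEYS` are hypothetical names that are not part of the original script:

```python
from urllib.parse import urlparse

# Keys that monitor.py reads from the "global" section without a default (assumption
# based on the script; check_interval is optional there and defaults to 60).
REQUIRED_GLOBAL_KEYS = ("dingtalk_webhook", "keyword", "environment", "severity")


def validate_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means the config looks usable."""
    problems = []

    global_conf = config.get("global") or {}
    for key in REQUIRED_GLOBAL_KEYS:
        if not global_conf.get(key):
            problems.append(f"global.{key} is missing or empty")

    interval = global_conf.get("check_interval", 60)
    if isinstance(interval, bool) or not isinstance(interval, int) or interval <= 0:
        problems.append("global.check_interval must be a positive integer")

    nodes = config.get("nodes") or []
    if not nodes:
        problems.append("nodes must contain at least one entry")
    for i, node in enumerate(nodes):
        if not node.get("name"):
            problems.append(f"nodes[{i}].name is missing")
        parsed = urlparse(node.get("url") or "")
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            problems.append(f"nodes[{i}].url is not a valid http(s) URL")

    return problems
```

One way to use it would be to call `validate_config(config)` right after `yaml.safe_load` and stop with a clear message if the returned list is not empty.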

Full script: monitor.py

# -*- coding: utf-8 -*-
# pip3 install fastapi uvicorn httpx apscheduler pyyaml
from fastapi import FastAPI
import httpx
import logging
import yaml
from urllib.parse import urlparse
from apscheduler.schedulers.background import BackgroundScheduler
from datetime import datetime

app = FastAPI()

# Logging configuration
logging.basicConfig(level=logging.INFO)

# Status cache: the last known state of each node (True = healthy)
node_status_map = {}

# Load the configuration
with open("./config.yaml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

GLOBAL_CONF = config["global"]
NODES = config["nodes"]

# Global parameters
WEBHOOK = GLOBAL_CONF["dingtalk_webhook"]
KEYWORD = GLOBAL_CONF["keyword"]
ENVIRONMENT = GLOBAL_CONF["environment"]
SEVERITY = GLOBAL_CONF["severity"]
INTERVAL = GLOBAL_CONF.get("check_interval", 60)


def now_str():
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")


def send_dingtalk_markdown_alert(title: str, content: str):
    """Send a markdown message to the DingTalk robot webhook."""
    headers = {"Content-Type": "application/json"}
    data = {
        "msgtype": "markdown",
        "markdown": {
            "title": f"{KEYWORD} - {title}",
            "text": content
        }
    }
    try:
        response = httpx.post(WEBHOOK, json=data, headers=headers, timeout=5.0)
        logging.info(f"[DingTalk] {title}: {response.status_code} {response.text}")
    except Exception as e:
        logging.error(f"Failed to send DingTalk message: {e}")


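def check_node(node):
    """Check one node and notify DingTalk only when its state changes."""
    name = node["name"]
    url = node["url"]
    # Nodes that have not been checked yet are assumed healthy
    last_status = node_status_map.get(name, True)

    parsed = urlparse(url)
    host = parsed.hostname or "unknown"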

    try:
        response = httpx.get(url, timeout=5.0)
        json_data = response.json()
        status = json_data.get("status", "UNKNOWN")

        if status != "UP":
            if last_status:
                content = f"""
### {KEYWORD} - Alert 🚨
- **Alert name**: {name} health check alert
- **Environment**: {ENVIRONMENT}
- **Severity**: {SEVERITY}
- **Host**: {host}
- **Service**: {name}
- **Details**: service status is abnormal; current status is `{status}`
- **Alert time**: {now_str()}
                """
                send_dingtalk_markdown_alert(f"{name} service unhealthy", content)
            node_status_map[name] = False
        else:
            if not last_status:
                content = f"""
### {KEYWORD} - Recovery ✅
- **Alert name**: {name} health check alert
- **Environment**: {ENVIRONMENT}
- **Severity**: {SEVERITY}
- **Host**: {host}
- **Service**: {name}
- **Details**: service is back to `UP`
- **Recovery time**: {now_str()}
                """
                send_dingtalk_markdown_alert(f"{name} recovered", content)
            node_status_map[name] = True
    except Exception as e:
        logging.error(f"[{name}] health check failed: {e}")
        if last_status:
            content = f"""
### {KEYWORD} - Alert 🚨
- **Alert name**: {name} health check failure
- **Environment**: {ENVIRONMENT}
- **Severity**: {SEVERITY}
- **Host**: {host}
- **Service**: {name}
- **Details**: health check request failed; error: `{e}`
- **Alert time**: {now_str()}
            """
            send_dingtalk_markdown_alert(f"{name} check failed", content)
        node_status_map[name] = False


def monitor_all_nodes():
    for node in NODES:
        check_node(node)


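# Start the scheduled job. It starts in every process that imports this module,
# so run uvicorn with a single worker to avoid duplicate alerts.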
scheduler = BackgroundScheduler()
scheduler.add_job(monitor_all_nodes, 'interval', seconds=INTERVAL)
scheduler.start()


@app.get("/")
def root():
    return {"message": f"Monitoring {len(NODES)} services, running..."}
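
check_node sends a message only when a node's state flips, which is why a service that stays down does not produce a new alert on every interval. The rule can be sketched in isolation as follows; `decide_notification` and `replay` are hypothetical helpers written for illustration and do not appear in monitor.py:

```python
from typing import Dict, List, Optional


def decide_notification(last_up: bool, now_up: bool) -> Optional[str]:
    """Return "alert", "recovery", or None for two consecutive check results."""
    if last_up and not now_up:
        return "alert"      # healthy -> unhealthy: notify once
    if not last_up and now_up:
        return "recovery"   # unhealthy -> healthy: notify once
    return None             # state unchanged: stay silent


def replay(name: str, results: List[bool], status_map: Dict[str, bool]) -> List[str]:
    """Feed a sequence of check results through the status map and collect the events."""
    events = []
    for now_up in results:
        # Unseen nodes are assumed healthy, mirroring node_status_map.get(name, True)
        last_up = status_map.get(name, True)
        event = decide_notification(last_up, now_up)
        if event:
            events.append(event)
        status_map[name] = now_up
    return events


if __name__ == "__main__":
    # up, down, down, up, up, down
    print(replay("orders", [True, False, False, True, True, False], {}))
```

Because the status map lives in process memory, a restart of the monitor resets every node to "healthy", so a service that is already down at startup is reported again on the first check.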

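Starting the service:

Make sure the dependencies are installed (the script uses apscheduler and does not use requests):

pip install fastapi uvicorn httpx apscheduler pyyaml

Then run the FastAPI application: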

uvicorn monitor:app --host 0.0.0.0 --port 8000
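
To try the monitor without a real Java service, a stand-in health endpoint can help. The sketch below uses only the Python standard library and answers /actuator/health with a fixed status; `make_health_server` and `start_in_background` are hypothetical names, and the 200/503 mapping follows Spring Boot Actuator's default behaviour for UP and DOWN rather than anything stated in the script:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_health_server(status: str = "UP", host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Build a stub server that answers /actuator/health with a fixed status."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/actuator/health":
                self.send_error(404)
                return
            body = json.dumps({"status": status}).encode("utf-8")
            # Mimic Actuator's defaults: 200 for UP, 503 for anything else
            self.send_response(200 if status == "UP" else 503)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the console quiet

    # port=0 lets the OS pick a free port; read it back from server.server_address
    return HTTPServer((host, port), HealthHandler)


def start_in_background(server: HTTPServer) -> threading.Thread:
    """Serve requests on a daemon thread so the caller is not blocked."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread
```

For a manual test, running `make_health_server("DOWN", port=8080).serve_forever()` and pointing one node's url in config.yaml at http://127.0.0.1:8080/actuator/health should lead to a single alert on the next check, and restarting the stub with "UP" should lead to a recovery message.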
Posted 2025-04-11 17:27 by 蒲公英PGY