Monitoring the status of services registered in Eureka with Prometheus

 

1. Write eureka_exporter.py in Python

Install the dependencies first:

pip install flask requests prometheus_client

 

from flask import Flask, Response
import requests
import prometheus_client
from prometheus_client import Gauge
from prometheus_client.core import CollectorRegistry

app = Flask(__name__)



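# Set up the metric: one gauge series per Eureka instance (1 = UP, 0 = any other status).
# A dedicated registry is used so that only this metric is exposed.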
registry = CollectorRegistry(auto_describe=False)
downgrade = Gauge('eureka', 'eureka', ['instanceId','hostName', 'app', 'status', 'port'], registry=registry)

@app.route("/metrics")
def metrics():
    downgrade.clear()
    # Eureka REST endpoint that lists every registered application.
    # Adjust to your own server; a default local setup would be http://localhost:8761/eureka/apps/
    eureka_url = "http://10.20.11.138:9500/eureka/apps/"

    try:
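        # Ask for JSON explicitly; the timeout keeps a hung Eureka server from blocking the scrape
        response = requests.get(
            eureka_url,
            headers={"Content-Type": "application/json", "Accept": "application/json"},
            timeout=5,
        )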

        if response.status_code == 200:
            apps = response.json()['applications']['application']

            # Walk every application and set one gauge sample per instance
            # (the loop variable is named `application` to avoid shadowing the Flask `app` object)
            for application in apps:
                app_name = application['name']
                for instance in application['instance']:
                    status = instance['status']
                    # 1 means UP; any other status (DOWN, OUT_OF_SERVICE, ...) is reported as 0
                    downgrade.labels(
                        instanceId=instance['instanceId'],
                        hostName=instance['hostName'],
                        app=app_name,
                        status=status,
                        port=instance['port']['$'],
                    ).set(1 if status == 'UP' else 0)
        else:
            print(f"Failed to fetch data from Eureka. Status code: {response.status_code}")
    except Exception as e:
        print(f"Error fetching data from Eureka: {e}")

    metrics_page = prometheus_client.generate_latest(registry)
    return Response(metrics_page, mimetype=prometheus_client.CONTENT_TYPE_LATEST)


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000)
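
The parsing step inside metrics() can also be pulled out into a pure function, which makes it testable without a running Eureka server. The following is a minimal sketch and not part of the exporter above: the function name flatten_instances and the sample payload are made up for illustration. The dict-versus-list handling rests on an assumption that some Eureka versions serialize a single instance as an object rather than a one-element list; check the JSON your own server returns.

```python
def flatten_instances(payload):
    """Turn an Eureka /eureka/apps JSON payload into (labels, value) pairs."""
    apps = payload["applications"]["application"]
    if isinstance(apps, dict):  # assumption: a lone application may arrive as an object
        apps = [apps]
    samples = []
    for application in apps:
        instances = application["instance"]
        if isinstance(instances, dict):  # assumption: same for a lone instance
            instances = [instances]
        for instance in instances:
            labels = {
                "instanceId": instance["instanceId"],
                "hostName": instance["hostName"],
                "app": application["name"],
                "status": instance["status"],
                "port": str(instance["port"]["$"]),
            }
            # Same convention as the exporter: 1 for UP, 0 for anything else
            samples.append((labels, 1 if instance["status"] == "UP" else 0))
    return samples


# Hypothetical payload in the shape the exporter expects
sample_payload = {
    "applications": {
        "application": [
            {
                "name": "ORDER-SERVICE",
                "instance": [
                    {"instanceId": "10.0.0.5:order-service:8080", "hostName": "10.0.0.5",
                     "status": "UP", "port": {"$": 8080}},
                    {"instanceId": "10.0.0.6:order-service:8080", "hostName": "10.0.0.6",
                     "status": "DOWN", "port": {"$": 8080}},
                ],
            },
            {
                "name": "USER-SERVICE",
                # A single instance given as an object instead of a list
                "instance": {"instanceId": "10.0.0.7:user-service:8081", "hostName": "10.0.0.7",
                             "status": "UP", "port": {"$": 8081}},
            },
        ]
    }
}

for labels, value in flatten_instances(sample_payload):
    print(labels["app"], labels["instanceId"], value)
```

Inside metrics(), each pair could then be applied with downgrade.labels(**labels).set(value).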

 

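2. Configure Prometheus:
Edit prometheus.yml and add a scrape_configs entry for the exporter. You need to set job_name, scrape_interval, metrics_path, and targets. Note that the target is the exporter from step 1, not the Eureka Server itself. For example, if eureka_exporter.py runs on 10.124.129.42:8000 and exposes the /metrics endpoint, the configuration might look like this: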

- job_name: 'eureka_exporter'
  scrape_interval: 10s
  metrics_path: '/metrics'
  static_configs:
    - targets: ['10.124.129.42:8000']
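
Before pointing Prometheus at the exporter, it can help to confirm that /metrics returns what you expect. Below is a rough sketch of such a check. The parse_samples helper and the example text are illustrative only, and the regular expressions cover just the simple labelled lines this exporter produces, not the full Prometheus exposition format.

```python
import re

# Matches lines such as: eureka{app="X",status="UP"} 1.0
SAMPLE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
# Matches one key="value" pair, allowing backslash escapes inside the value
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return (metric_name, labels_dict, value) for every labelled sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Made-up output; against a live exporter you could instead read the text with
# urllib.request.urlopen("http://10.124.129.42:8000/metrics").read().decode()
example_text = '''# HELP eureka eureka
# TYPE eureka gauge
eureka{app="ORDER-SERVICE",hostName="10.0.0.5",instanceId="10.0.0.5:order-service:8080",port="8080",status="UP"} 1.0
eureka{app="ORDER-SERVICE",hostName="10.0.0.6",instanceId="10.0.0.6:order-service:8080",port="8080",status="DOWN"} 0.0
'''

for name, labels, value in parse_samples(example_text):
    print(name, labels["app"], labels["status"], value)
```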

 

3. Alerting rules

- name: Eureka_Rules
  rules:

  # Alert when a service instance registered in Eureka is down
  - alert: EurekaDown
    expr: eureka==0
    for: 5s  
    labels:
      severity: critical
    annotations:
      summary: "{{ $labels.app }} instance down on {{ $labels.hostName }}"
      description: "{{ $labels.app }} instance {{ $labels.hostName }} is reported by Eureka with status {{ $labels.status }}."
  - alert: EurekaInstanceCountBelow2
    expr: count by (app) (eureka{job="eureka_exporter"}) < 2
    for: 5s
    labels:
      severity: critical
    annotations:
      summary: "{{ $labels.app }} has fewer than 2 instances"
      description: "{{ $labels.app }} has fewer than 2 instances registered in Eureka."
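
To see which series the two expressions above select, here is a small Python emulation on made-up samples. It only approximates PromQL for this specific case: the function names and data are hypothetical, and the job matcher is ignored. Note that count by (app) counts every series for an app regardless of its value, so an app with one UP and one DOWN instance still counts as 2.

```python
from collections import Counter

# Hypothetical (labels, value) samples as the exporter would expose them
samples = [
    ({"app": "ORDER-SERVICE", "hostName": "10.0.0.5", "status": "UP"}, 1),
    ({"app": "ORDER-SERVICE", "hostName": "10.0.0.6", "status": "DOWN"}, 0),
    ({"app": "USER-SERVICE", "hostName": "10.0.0.7", "status": "UP"}, 1),
]


def eureka_down(samples):
    """Emulates `eureka == 0`: keep the series whose value is 0."""
    return [labels for labels, value in samples if value == 0]


def instance_count_below(samples, threshold=2):
    """Emulates `count by (app) (eureka) < 2`: count series per app, keep the small groups."""
    counts = Counter(labels["app"] for labels, _ in samples)
    return {app: n for app, n in counts.items() if n < threshold}


print(eureka_down(samples))
print(instance_count_below(samples))
```

With this data the first rule would fire for the ORDER-SERVICE instance on 10.0.0.6, and the second for USER-SERVICE, which has only one registered instance.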

 

posted @ 2025-03-10 10:30  py哥