Leo Zhang

[Problem] kube-prometheus cannot monitor resources in a custom namespace

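Background: several services expose JMX metrics, and a new Service was created to select the pods that have monitoring enabled. The selector label is jmx=prometheus and the namespace is a custom one, jmbymt. The Service's endpoints look correct, and metrics can be fetched through it.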

[root@ymt36 tmo]# kubectl -n jmbymt describe svc jmxprometheus
Name:              jmxprometheus
Namespace:         jmbymt
Labels:            jmx=prometheus
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"jmx":"prometheus"},"name":"jmxprometheus","namespace":"jmbymt"...
Selector:          jmx=prometheus
Type:              ClusterIP
IP:                10.87.234.23
Port:              jmx  8013/TCP
TargetPort:        8013/TCP
Endpoints:         10.20.234.148:8013,10.20.234.149:8013,10.20.234.164:8013 + 7 more...
Session Affinity:  None
Events:            <none>

[root@ymt36 tmo]# curl http://10.87.234.23:8013/metrics
# HELP jmx_exporter_build_info A metric with a constant '1' value labeled with the version of the JMX exporter.
# TYPE jmx_exporter_build_info gauge
jmx_exporter_build_info{version="0.13.0",name="jmx_prometheus_javaagent",} 1.0
# HELP jvm_gc_collection_seconds Time spent in a given JVM garbage collector in seconds.
# TYPE jvm_gc_collection_seconds summary
jvm_gc_collection_seconds_count{gc="Copy",} 8180.0
jvm_gc_collection_seconds_sum{gc="Copy",} 70.006
jvm_gc_collection_seconds_count{gc="MarkSweepCompact",} 5.0
jvm_gc_collection_seconds_sum{gc="MarkSweepCompact",} 0.337
# HELP jmx_config_reload_failure_total Number of times configuration have failed to be reloaded.
# TYPE jmx_config_reload_failure_total counter
jmx_config_reload_failure_total 0.0
# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 12033.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total 12087.0
# HELP jvm_classes_unloaded_total The total number of classes that have been unloaded since the JVM has started execution
# TYPE jvm_classes_unloaded_total counter
jvm_classes_unloaded_total 54.0
# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap",} 1.30044704E8
jvm_memory_bytes_used{area="nonheap",} 1.22438832E8

Add a Prometheus scrape job:

- job_name: jmxprometheus
  honor_labels: false
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - jmbymt
  scrape_interval: 30s
  relabel_configs:
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_jmx
    regex: prometheus
  - action: keep
    source_labels:
    - __meta_kubernetes_endpoint_port_name
    regex: jmx
  - source_labels:
    - __meta_kubernetes_endpoint_address_target_kind
    - __meta_kubernetes_endpoint_address_target_name
    separator: ;
    regex: Node;(.*)
    replacement: ${1}
    target_label: node
  - source_labels:
    - __meta_kubernetes_endpoint_address_target_kind
    - __meta_kubernetes_endpoint_address_target_name
    separator: ;
    regex: Pod;(.*)
    replacement: ${1}
    target_label: pod
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: service
  - source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: job
    replacement: ${1}
  - source_labels:
    - __meta_kubernetes_service_label_jmx
    target_label: job
    regex: (.+)
    replacement: ${1}
  - target_label: endpoint
    replacement: jmx
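The post does not say how this job was loaded into Prometheus. With kube-prometheus, one possible route is the Prometheus Operator's additionalScrapeConfigs field, which reads extra scrape jobs from a Secret. The following is a minimal sketch, assuming the job above is saved in a local file; the helper name create_scrape_secret, the Secret name additional-scrape-configs, and the key prometheus-additional.yaml are illustrative choices, not taken from the post.

```shell
# Create a Secret holding extra scrape jobs for the Prometheus Operator.
# $1: local file containing the scrape job(s)
# $2: namespace of the Prometheus custom resource (default: monitoring)
# The Secret name and key below are assumptions; any names work as long as
# the Prometheus custom resource references the same ones.
create_scrape_secret() {
  file="$1"
  ns="${2:-monitoring}"
  kubectl -n "$ns" create secret generic additional-scrape-configs \
    --from-file=prometheus-additional.yaml="$file"
}
```

Usage would be `create_scrape_secret jmx-job.yaml`, followed by referencing the Secret from the Prometheus custom resource under `spec.additionalScrapeConfigs` with `name: additional-scrape-configs` and `key: prometheus-additional.yaml`.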

The Prometheus logs show the following errors:

[root@cicd ~]# kubectl -n monitoring logs -f -l prometheus=k8s --all-containers=true --max-log-requests=2
error: you are attempting to follow 6 log streams, but maximum allowed concurrency is 2, use --max-log-requests to increase the limit
[root@cicd ~]# kubectl -n monitoring logs -f -l prometheus=k8s -c prometheus --max-log-requests=2
level=error ts=2020-12-01T08:22:28.047Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:263: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"jmbymt\""
level=error ts=2020-12-01T08:22:29.047Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:265: Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"jmbymt\""
level=error ts=2020-12-01T08:22:29.049Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:264: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"jmbymt\""
level=error ts=2020-12-01T08:22:29.049Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:263: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"jmbymt\""
level=error ts=2020-12-01T08:22:30.050Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:265: Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"jmbymt\""
level=error ts=2020-12-01T08:22:30.051Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:264: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"jmbymt\""
level=error ts=2020-12-01T08:22:30.051Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:263: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"jmbymt\""
level=error ts=2020-12-01T08:22:31.053Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:265: Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"jmbymt\""
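The "forbidden" errors above can be cross-checked directly against the API server with kubectl auth can-i, impersonating the service account named in the log. This is a minimal sketch: the helper name check_prometheus_access is hypothetical, and it assumes the current kubectl user is allowed to impersonate service accounts (cluster admins normally are).

```shell
# Ask the API server whether a service account may list the resources that
# Prometheus service discovery needs in a given namespace.
# $1: namespace to check (for example jmbymt)
# $2: service account as <namespace>:<name> (default: monitoring:prometheus-k8s)
check_prometheus_access() {
  ns="$1"
  sa="${2:-monitoring:prometheus-k8s}"
  for resource in endpoints services pods; do
    # "kubectl auth can-i" prints "yes" or "no" and exits non-zero on "no";
    # "|| true" keeps the loop going when the shell runs with "set -e".
    answer=$(kubectl auth can-i list "$resource" -n "$ns" \
      --as="system:serviceaccount:${sa}" 2>/dev/null || true)
    printf '%s\t%s\n' "$resource" "${answer:-unknown}"
  done
}
```

Running `check_prometheus_access jmbymt` prints one line per resource. Based on the log above, all three would be expected to report "no" until the permissions are changed.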

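[Solution] Change the cluster access permissions for Prometheus

The errors show that the service account prometheus-k8s in the monitoring namespace is not allowed to list endpoints, pods, or services in jmbymt. Granting get, list, and watch on these resources in the prometheus-k8s ClusterRole fixes it: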

[root@ymt36 custom]# cat prometheus-clusterRole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- nonResourceURLs:
  - /metrics
  verbs:
  - get
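The ClusterRole above grants read access in every namespace. Where that is broader than wanted, a namespace-scoped Role and RoleBinding can grant the same access in jmbymt only. The following is a sketch of that alternative, not the method used in this post: the helper name render_prometheus_rbac is hypothetical, and reusing prometheus-k8s as the Role and RoleBinding name is an assumption.

```shell
# Print a Role and RoleBinding that give the Prometheus service account
# read access to services, endpoints and pods in a single namespace.
# $1: target namespace (for example jmbymt)
# $2: namespace of the prometheus-k8s service account (default: monitoring)
render_prometheus_rbac() {
  ns="$1"
  sa_ns="${2:-monitoring}"
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s
  namespace: ${ns}
rules:
- apiGroups: [""]
  resources: ["services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-k8s
  namespace: ${ns}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: ${sa_ns}
EOF
}
```

The output can be reviewed first and then applied with `render_prometheus_rbac jmbymt | kubectl apply -f -`. The trade-off is that every additional namespace needs its own Role and RoleBinding, whereas the ClusterRole covers them all at once.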

GitHub issue: https://github.com/prometheus-operator/prometheus-operator/issues/2155#issuecomment-441002864

Author: Leozhanggg

Source: https://www.cnblogs.com/leozhanggg/p/14069375.html

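Copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original must appear in a prominent place on the page; otherwise the author reserves the right to pursue legal liability.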

Posted on 2020-12-01 17:03 by LeoZhanggg