Enabling the Ceph dashboard and monitoring Ceph cluster status with Prometheus

ceph dashboard

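Dashboard overview

The Ceph dashboard is a web interface for viewing the status of a running Ceph cluster and configuring its features. Early Ceph releases relied on third-party dashboard components.

Enabling the dashboard module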

https://docs.ceph.com/en/mimic/mgr/

https://docs.ceph.com/en/latest/mgr/dashboard/

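https://packages.debian.org/unstable/ceph-mgr-dashboard (version 15 has dependencies that must be resolved separately)

Ceph mgr is a modular component whose modules can be enabled or disabled individually. The module commands below are run on the ceph-deploy server.

Newer releases ship the dashboard as a separate package, and it must be installed on the mgr nodes. Installing it elsewhere fails with the following error: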

The following packages have unmet dependencies:

ceph-mgr-dashboard : Depends: ceph-mgr (= 15.2.13-1~bpo10+1) but it is not going to be installed

E: Unable to correct problems, you have held broken packages.

Install ceph-mgr-dashboard on the ceph-mgr nodes

root@ceph-mgr1:~# apt-cache madison ceph-mgr-dashboard
ceph-mgr-dashboard | 16.2.10-1bionic | https://mirrors.tuna.tsinghua.edu.cn/ceph/debian-pacific bionic/main amd64 Packages
ceph-mgr-dashboard | 16.2.10-1bionic | https://mirrors.tuna.tsinghua.edu.cn/ceph/debian-pacific bionic/main i386 Packages

root@ceph-mgr1:~# apt install ceph-mgr-dashboard
root@ceph-mgr2:~# apt install ceph-mgr-dashboard

On the ceph-deploy node

cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module -h   # show help for ceph mgr module

cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module ls   # list all ceph mgr modules
{
    "always_on_modules": [
        "balancer",
        "crash",
        "devicehealth",
        "orchestrator",
        "pg_autoscaler",
        "progress",
        "rbd_support",
        "status",
        "telemetry",
        "volumes"
    ],
    "enabled_modules": [    # modules already enabled; dashboard is not among them
        "iostat",
        "nfs",
        "restful"
    ],
    "disabled_modules": [    # disabled modules
        {
            "name": "alerts",
            "can_run": true,    # whether the module can be enabled
            "error_string": "",
            "module_options": {
                "interval": {
                    "name": "interval",
                    "type": "secs",
                    "level": "advanced",

Enable the dashboard module

cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module enable dashboard

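Note: the dashboard is not reachable right after the module is enabled. First either disable SSL or configure SSL, and set the listening address.

Configuring the dashboard module

Disable SSL for the Ceph dashboard as follows:

# disable SSL for the dashboard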
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ssl false

Method 1: configure the dashboard for the mgr1 node only

# set the dashboard listening address for ceph-mgr1 to that node's IP
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ceph-mgr1/server_addr 172.16.100.38

# set the dashboard listening port on ceph-mgr1 to 9009
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ceph-mgr1/server_port 9009

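Method 2 (recommended): apply the settings to all mgr instances. If the mgr service on mgr1 (172.16.100.38) goes down, the dashboard can then be reached through another mgr node, which makes the dashboard highly available.

# set the dashboard listening address for all mgr instances (no mgr name in the key)
# note: a mgr that does not own this IP may fail to bind it; 0.0.0.0 avoids that
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/server_addr 172.16.100.38

# set the dashboard listening port for all mgr instances to 9009
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/server_port 9009

This walkthrough uses method 2. After changing the settings, restart the module so that it loads them: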

cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module disable dashboard
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module enable dashboard
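
The method 2 steps above can be collected into one helper. This is a minimal sketch rather than part of the original procedure: the function name configure_dashboard and the CEPH variable are assumptions introduced here, so that the commands can be previewed with CEPH="echo ceph" before they touch a cluster.

```shell
#!/usr/bin/env bash
# Sketch only: configure_dashboard and CEPH are hypothetical names.
# Set CEPH="echo ceph" to print the commands instead of running them.
CEPH="${CEPH:-ceph}"

configure_dashboard() {
    local addr="$1" port="$2"
    # Serve plain HTTP, as in the steps above.
    $CEPH config set mgr mgr/dashboard/ssl false
    # No mgr name in the key, so the values apply to every mgr instance.
    $CEPH config set mgr mgr/dashboard/server_addr "$addr"
    $CEPH config set mgr mgr/dashboard/server_port "$port"
    # Restart the module so that it picks up the new settings.
    $CEPH mgr module disable dashboard
    $CEPH mgr module enable dashboard
}
```

For a dry run, set CEPH="echo ceph" and call configure_dashboard 172.16.100.38 9009; the five commands are printed in order without being executed.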

Check the Ceph status

cephadmin@ceph-deploy:~/ceph-cluster$ ceph -s
  cluster:
    id:     5372c074-edf7-45dd-b635-16422165c17c
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 107m)
    mgr: ceph-mgr2(active, since 18m), standbys: ceph-mgr1
    mds: 2/2 daemons up, 2 standby
    osd: 20 osds: 20 up (since 6h), 20 in (since 7d)
    rgw: 2 daemons active (2 hosts, 1 zones)
 
  data:
    volumes: 1/1 healthy
    pools:   12 pools, 337 pgs
    objects: 304 objects, 68 MiB
    usage:   1.1 GiB used, 2.0 TiB / 2.0 TiB avail
    pgs:     337 active+clean

If the following error appears, check that the mgr service is running normally; restarting the mgr service may clear it:

Module 'dashboard' has failed: error('No socket could be created',)

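The first time the dashboard module is enabled, it can take a few minutes to start. Wait, then verify on the mgr1 node.

If nothing is listening on port 9009 on mgr1 after a long wait, restart the mgr service manually: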

root@ceph-mgr1:~# ss -lntup|grep 9009
root@ceph-mgr1:~# systemctl restart ceph-mgr@ceph-mgr1.service

If restarting ceph-mgr@ceph-mgr1.service fails with the following error:

Dec 21 16:23:50 ceph-mgr1 systemd[1]: ceph-mgr@ceph-mgr1.service: Start request repeated too quickly.
Dec 21 16:23:50 ceph-mgr1 systemd[1]: ceph-mgr@ceph-mgr1.service: Failed with result 'start-limit-hit'.
Dec 21 16:23:50 ceph-mgr1 systemd[1]: Failed to start Ceph cluster manager daemon.

Edit the ceph-mgr@.service unit file and comment out the start-limit interval:

root@ceph-mgr1:/var/log/ceph# vim /lib/systemd/system/ceph-mgr@.service
#StartLimitInterval=30min
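
An edited unit file only takes effect after systemd reloads it, and a unit that has hit its start limit keeps its counter until it is reset. The following sketch strings those steps together. The function name restart_mgr_after_unit_edit and the SYSTEMCTL variable are assumptions introduced for illustration; daemon-reload, reset-failed and restart are standard systemctl subcommands.

```shell
#!/usr/bin/env bash
# Sketch only: restart_mgr_after_unit_edit and SYSTEMCTL are hypothetical names.
# Set SYSTEMCTL="echo systemctl" to preview the commands.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

restart_mgr_after_unit_edit() {
    local unit="ceph-mgr@$1.service"
    # Make systemd re-read the edited unit file.
    $SYSTEMCTL daemon-reload
    # Clear the failed state and the start rate-limit counter.
    $SYSTEMCTL reset-failed "$unit"
    $SYSTEMCTL restart "$unit"
}
```

On the mgr1 node this would be called as restart_mgr_after_unit_edit ceph-mgr1.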

Open the dashboard in a browser at the mgr1 node address, 172.16.100.38:9009.

Stop the mgr service on mgr1 to verify that the dashboard is highly available:

root@ceph-mgr1:~# lsof -i :9009
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
ceph-mgr 16144 ceph   32u  IPv4 159123      0t0  TCP ceph-mgr1.example.local:9009 (LISTEN)

root@ceph-mgr1:~# systemctl stop ceph-mgr@ceph-mgr1.service 

root@ceph-mgr1:~# lsof -i :9009

root@ceph-mgr1:~# ceph -s

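Check that the dashboard port is listening on mgr2, then open 172.16.100.39:9009 on the mgr2 node.

The dashboard loads successfully.

Setting the dashboard account and password

Method 1 (recommended): read the password from a file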

cephadmin@ceph-deploy:~/ceph-cluster$ touch pass.txt

cephadmin@ceph-deploy:~/ceph-cluster$ echo "123456" > pass.txt 

cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard set-login-credentials lxh -i pass.txt 
******************************************************************
***          WARNING: this command is deprecated.              ***
*** Please use the ac-user-* related commands to manage users. ***
******************************************************************
Username and password updated
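
The deprecation warning above points to the ac-user-* commands. A minimal sketch of the equivalent call follows; the function name create_dashboard_admin and the CEPH variable are assumptions introduced here, while ac-user-create with -i and the administrator role is the form the warning refers to. A weak password such as the one in pass.txt may be rejected if a password policy is enforced.

```shell
#!/usr/bin/env bash
# Sketch only: create_dashboard_admin and CEPH are hypothetical names.
# Set CEPH="echo ceph" to preview the command.
CEPH="${CEPH:-ceph}"

create_dashboard_admin() {
    local user="$1" passfile="$2"
    # Keep the password file readable by its owner only.
    chmod 600 "$passfile"
    # Create the user with the built-in administrator role.
    $CEPH dashboard ac-user-create "$user" -i "$passfile" administrator
}
```

With the file from method 1, the call would be create_dashboard_admin lxh pass.txt.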

Method 2: pass the username and password directly

This method is deprecated in Ceph Pacific 16.x.

Command format:

dashboard set-login-credentials <username> <password>    Set the login credentials
Create the user and set the password:
cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard set-login-credentials lxh 123456

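Dashboard HTTPS (SSL) configuration

Accessing the dashboard over SSL requires a certificate, which can be generated with the ceph command or with the openssl command. In production, it is advisable to put an nginx reverse proxy in front of the dashboard and configure HTTPS on nginx.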

https://docs.ceph.com/en/latest/mgr/dashboard/

Ceph self-signed certificate

1. Create a self-signed certificate with ceph dashboard

cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard create-self-signed-cert 
Self-signed certificate created

2. Enable SSL for the dashboard and set the HTTPS port to 9443 (the default is 8443)

cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ssl true
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ssl_server_port 9443

3. Restart the module to load the configuration

cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module disable dashboard
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module enable dashboard

4. Check the mgr dashboard service

cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr services
{
    "dashboard": "https://172.16.100.39:9443/"
}

5. Check the Ceph status

cephadmin@ceph-deploy:~/ceph-cluster$ ceph -s
  cluster:
    id:     5372c074-edf7-45dd-b635-16422165c17c
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 94m)
    mgr: ceph-mgr2(active, since 115s), standbys: ceph-mgr1
    mds: 2/2 daemons up, 2 standby
    osd: 20 osds: 20 up (since 9h), 20 in (since 7d)
    rgw: 2 daemons active (2 hosts, 1 zones)
 
  data:
    volumes: 1/1 healthy
    pools:   12 pools, 337 pgs
    objects: 306 objects, 68 MiB
    usage:   1.2 GiB used, 2.0 TiB / 2.0 TiB avail
    pgs:     337 active+clean

6. Verify access in a browser.
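
The URL reported by ceph mgr services in step 4 can also be checked from the command line. The sketch below extracts it from the JSON with sed; the function name dashboard_url is an assumption introduced here, and the pattern assumes the single-line "dashboard" entry shown in step 4.

```shell
#!/usr/bin/env bash
# Sketch only: dashboard_url is a hypothetical helper name.
# Reads `ceph mgr services` JSON on stdin and prints the dashboard URL.
dashboard_url() {
    sed -n 's/.*"dashboard": *"\([^"]*\)".*/\1/p'
}

# Usage on a live cluster (not run here):
#   url="$(ceph mgr services | dashboard_url)"
#   curl -k -s -o /dev/null -w '%{http_code}\n' "$url"
```

The -k option tells curl to skip certificate verification, which is needed here because the certificate is self-signed.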

Ceph monitoring

Monitoring Ceph nodes with Prometheus

https://prometheus.io/

Deploying node_exporter

Deploy node_exporter on every node of the Ceph cluster:

root@ceph-node1:/usr/local# tar xf node_exporter-1.3.1.linux-amd64.tar.gz
root@ceph-node1:/usr/local# mv node_exporter-1.3.1.linux-amd64 node_exporter

Create the systemd unit file

root@ceph-node1:/usr/local# vim /etc/systemd/system/node-exporter.service
[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
ExecStart=/usr/local/node_exporter/node_exporter

[Install]
WantedBy=multi-user.target

Start node_exporter

root@ceph-node1:/usr/local# systemctl daemon-reload && systemctl restart node-exporter && systemctl enable node-exporter.service

Verify node_exporter on each node
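
Besides opening each node's port 9100 in a browser, the exporters can be probed in a loop. This is a sketch under the assumption that curl is installed; the function name check_exporters is introduced here for illustration, and /metrics is the path node_exporter serves by default.

```shell
#!/usr/bin/env bash
# Sketch only: check_exporters is a hypothetical helper name.
# Prints "<target> up" or "<target> DOWN" for each host:port argument.
check_exporters() {
    local t
    for t in "$@"; do
        # -f makes curl fail on HTTP errors; give up after 3 seconds.
        if curl -fsS --max-time 3 "http://$t/metrics" >/dev/null 2>&1; then
            echo "$t up"
        else
            echo "$t DOWN"
        fi
    done
}
```

For the nodes in this walkthrough the call would be check_exporters 172.16.100.31:9100 172.16.100.32:9100 172.16.100.33:9100 172.16.100.34:9100.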

Scraping node_exporter from the Prometheus server

Add a scrape job for the ceph-node hosts

root@prometheus:/usr/local/prometheus# vim prometheus.yml
  - job_name: "ceph-node"
    static_configs:
      - targets: ["172.16.100.31:9100","172.16.100.32:9100","172.16.100.33:9100","172.16.100.34:9100"]

# restart Prometheus
root@prometheus:/usr/local/prometheus# systemctl restart prometheus
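
An indentation mistake in prometheus.yml prevents Prometheus from starting, so it can help to validate the file before restarting. The sketch below assumes that promtool sits next to prometheus.yml in /usr/local/prometheus, as it does in the release tarball; the function name reload_prometheus is introduced here for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: reload_prometheus is a hypothetical helper name.
# Validates the config with promtool and restarts only if it passes.
reload_prometheus() {
    local dir="${1:-/usr/local/prometheus}"
    # Abort without restarting if promtool is missing or the config is invalid.
    "$dir/promtool" check config "$dir/prometheus.yml" || return 1
    systemctl restart prometheus
}
```

Calling reload_prometheus with no argument uses the directory from this walkthrough.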

Verify on the Prometheus server

Monitoring Ceph services with Prometheus

The Ceph manager includes a prometheus module that listens on port 9283 on every manager node and serves the collected metrics to Prometheus over HTTP. https://docs.ceph.com/en/mimic/mgr/prometheus/?highlight=prometheus

Enabling the prometheus module

Enable the prometheus module on the mgr nodes:

cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module enable prometheus

Verify that the module is enabled:

cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr module ls |less
{
    "always_on_modules": [
        "balancer",
        "crash",
        "devicehealth",
        "orchestrator",
        "pg_autoscaler",
        "progress",
        "rbd_support",
        "status",
        "telemetry",
        "volumes"
    ],
    "enabled_modules": [
        "dashboard",
        "iostat",
        "nfs",
        "prometheus",
        "restful"
    ],
    "disabled_modules": [
        {
            "name": "alerts",
            "can_run": true,
            "error_string": "",
            "module_options": {
                "interval": {
                    "name": "interval",
                    "type": "secs",

Verify that the port is listening on the mgr node:

root@ceph-mgr1:~# ss -lntup | grep 9283
tcp   LISTEN  0       5                          *:9283                 *:*      users:(("ceph-mgr",pid=1247,fd=36))

View the mgr metrics in a browser
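
The same metrics page can be read from a terminal. The sketch below pulls the cluster health value out of the metrics text; the function name ceph_health_from_metrics is an assumption introduced here, and it assumes the module exposes an unlabelled ceph_health_status sample, where 0 is understood to correspond to HEALTH_OK.

```shell
#!/usr/bin/env bash
# Sketch only: ceph_health_from_metrics is a hypothetical helper name.
# Reads Prometheus text-format metrics on stdin and prints the value
# of the first unlabelled ceph_health_status sample.
ceph_health_from_metrics() {
    awk '$1 == "ceph_health_status" { print $2; exit }'
}

# Usage on a live cluster (not run here):
#   curl -s http://172.16.100.38:9283/metrics | ceph_health_from_metrics
```

Comment lines such as # HELP and # TYPE are skipped because their first field is #.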

Configuring Prometheus to scrape the data

Add a scrape job for the mgr metrics

root@prometheus:/usr/local/prometheus# vim prometheus.yml
  - job_name: "ceph-mgr"
    static_configs:
      - targets: ["172.16.100.38:9283"]

# restart Prometheus
root@prometheus:/usr/local/prometheus# systemctl restart prometheus

Verify on the Prometheus server