部署Ganglia集群

　　　　　　　　　　　　　　　　　　部署Ganglia集群

　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　作者：尹正杰

一.实验环境说明

　　首先介绍一下我的Hadoop测试集群，采用的CentOS 7.6，其角色分配如下：
　　　　[nn]
　　　　hadoop101.yinzhengjie.com

　　　　[snn]
　　　　hadoop105.yinzhengjie.com

　　　　[dn]
　　　　hadoop102.yinzhengjie.com
　　　　hadoop103.yinzhengjie.com
　　　　hadoop104.yinzhengjie.com

　　综上所述，为了充分利用集群资源，于是我决定在hadoop105.yinzhengjie.com节点作为承载Ganglia服务器的节点。这意味着gmetad，gweb和rrdtool守护程序都将在这个服务器节点上运行。

二.安装Ganglia软件包

1>.在hadoop105.yinzhengjie.com节点上安装epel扩展源(因为CentOs默认源是没有Ganglia源的)

[root@hadoop105.yinzhengjie.com ~]# yum -y install epel-release  # 如下图所示，如果安装扩展源成功后，我们就可以看到有关ganglia的软件源啦~

2>.在hadoop105.yinzhengjie.com节点安装gmetad和gmod软件包

[root@hadoop105.yinzhengjie.com ~]# yum -y install ganglia-gmetad.x86_64 ganglia-gmond.x86_64


温馨提示：
　　通常ganglia-gmetad.x86_64软件包安装在服务端即可，而ganglia-gmond.x86_64软件包安装在客户端。
　　但由于我的hadoop105.yinzhengjie.com有Hadoop相关进程需要监控，因此在本机我除了安装ganglia-gmetad.x86_64软件包外，还安装了ganglia-gmond.x86_64。

3>.在其他hadoop集群节点安装gmod软件包

[root@hadoop101.yinzhengjie.com ~]# ansible all -m shell -a 'yum -y install epel-release'  # 让所有节点安装扩展源，否则无法安装Ganglia软件包。
[root@hadoop101.yinzhengjie.com ~]#
[root@hadoop101.yinzhengjie.com ~]# ansible all -m shell -a 'yum -y install ganglia-gmond.x86_64'  # 让hadoop集群的素有节点安装gmod进程。
[root@hadoop101.yinzhengjie.com ~]#

三.配置gmetad管理端

1>.备份配置文件(/etc/ganglia/gmetad.conf)

[root@hadoop105.yinzhengjie.com ~]# wc -l /etc/ganglia/gmetad.conf  # 很明显，默认的配置文件信息量还是蛮大的，有240多行文字描述，不过大多数都是注释信息
240 /etc/ganglia/gmetad.conf
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# egrep -v "^#|^$" /etc/ganglia/gmetad.conf  # 默认启用的就以下几个参数
data_source "my cluster" localhost
setuid_username ganglia
case_sensitive_hostnames 0
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# cp /etc/ganglia/gmetad.conf /etc/ganglia/gmetad.conf-`date +%F`  # 在做接下的操作之前，一定要执行该步骤，先把原生的配置文件备份一份，而后再去修改gmetad配置文件哟~
[root@hadoop105.yinzhengjie.com ~]#

2>.编辑"/etc/ganglia/gmetad.conf"配置文件

[root@hadoop105.yinzhengjie.com ~]# vim /etc/ganglia/gmetad.conf
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# egrep -v "^#|^$" /etc/ganglia/gmetad.conf
data_source "yinzhengjie-hadoop" hadoop105.yinzhengjie.com:8649  hadoop102.yinzhengjie.com:8649 hadoop103.yinzhengjie.com:8649
gridname "Yinzhengjie's Hadoop Cluster Monitoring"  
setuid_username ganglia  
xml_port 8651  
interactive_port 8652  
rrd_rootdir "/yinzhengjie/data/ganglia/rrds"  
case_sensitive_hostnames 0  
[root@hadoop105.yinzhengjie.com ~]# 

上述相关参数说明:
　　data_source:
　　　　语法格式如下:
　　　　　　data_source "my cluster" [polling interval] address1:port addreses2:port ...
　　　　关键字"data_source"后面必须紧跟标识源的唯一字符串，然后是可选的轮询间隔（以秒为单位）。平均将在此间隔轮询源。如果省略轮询间隔，则假定为15秒。再之后就是为数据源提供服务的计算机的列表。
　　　　需要注意的是:
　　　　　　(1)指定标识源可以理解为集群的名称，gmond端必须和这里配置的一致，特别是在多播模式中，它们依赖于此进行识别。
　　　　　　(2)为数据源提供服务的计算机的列表(使用空格隔开)，格式为"ip:port"或者"hostname:prot"，如果未指定端口，则假定为8649（默认gmond端口）。默认值：没有默认值

　　gridname：
　　　　指定web端网格的名称。上面的所有数据源都将被包装在具有此名称的网格标记中。默认值为:"unspecified"。
　　　　这里的网格实际上就是data_source指定的集群。

　　setuid_username：
　　　　若不指定则启动gmetad进程的默认用户nobody，ganglia用户是我们在安装软件包时自动创建出来的。

　　xml_port:
　　　　指定XML收集汇总的交互端口，可以telnet该端口来获取XML格式的数据。默认端口号是8651，无需配置，当然，你也可以自定义端口号。

　　interactive_port:
　　　　指定web端获取数据的端口，该端口在配置gweb时需要指定，若不指定默认端口号是8652，无需配置，当然，你也可以自定义端口号。

　　rrd_rootdir:
　　　　指定RRD数据库的存储路径。gmetad在收集到监控数据后，会将其更新到RRD数据库中。需要注意的是，该目录路径对于运行Ganglia用户要有写权限哟~

　　case_sensitive_hostnames
　　　　禁用区分大小写功能(当ganglia版本大于3.2默认值为0)

温馨提示:
　　建议仔细阅读配置文件，因为原生的配置文件有240行，里面含有对各个参数丰富的注释信息，基本上每个参数的作用，默认值均有相关解释哟~

3>.创建rrd数据库的存储路径(若不创建启动gmetad服务时可能会抛出错误哟)

[root@hadoop105.yinzhengjie.com ~]# mkdir -pv /yinzhengjie/data/ganglia/rrds
mkdir: created directory ‘/yinzhengjie/data/ganglia’
mkdir: created directory ‘/yinzhengjie/data/ganglia/rrds’
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# chown ganglia:ganglia -R /yinzhengjie/data/ganglia/  # 创建目录成功后别忘记将该目录路径授权给启动ganglia服务的守护进程用户，我定义的文件是让ganglia用户来启动。

四.配置gmond客户端

1>.备份配置文件(/etc/ganglia/gmond.conf)

[root@hadoop101.yinzhengjie.com ~]# wc -l /etc/ganglia/gmond.conf  # 很明显，客户端的默认配置文件内容也不少哟~
379 /etc/ganglia/gmond.conf
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# cp  /etc/ganglia/gmond.conf /etc/ganglia/gmond.conf-`date +%F`  # 为了保险起见，还是建议把配置文件备份一份，这样便于前后对比你所改的内容。

2>.编辑"/etc/ganglia/gmond.conf"配置文件

[root@hadoop101.yinzhengjie.com ~]# vim /etc/ganglia/gmond.conf
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# cat /etc/ganglia/gmond.conf
/* This configuration is as close to 2.5.x default behavior as possible
   The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  daemonize = yes
  setuid = yes
  user = ganglia
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 20 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  # By default gmond will use reverse DNS resolution when displaying your hostname
  # Uncommeting following value will override that value.
  # override_hostname = "mywebserver.domain.com"
  # If you are not using multicast this value should be set to something other than 0.
  # Otherwise if you restart aggregator gmond you will get empty graphs. 60 seconds is reasonable
  send_metadata_interval = 0 /*secs */

}

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
 
  name = "yinzhengjie-hadoop"
 
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
udp_send_channel {
  #bind_hostname = yes # Highly recommended, soon to be default.
                       # This option tells gmond to use a source address
                       # that resolves to the machine's hostname.  Without
                       # this, the metrics may appear to come from any
                       # interface and the DNS names associated with
                       # those IPs will be used to create the RRDs.
  # mcast_join = 239.2.11.71
  
  host = hadoop105.yinzhengjie.com

  port = 8649
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  # mcast_join = 239.2.11.71
  port = 8649

  bind = hadoop101.yinzhengjie.com

  retry_bind = true
  # Size of the UDP buffer. If you are handling lots of metrics you really
  # should bump it up to e.g. 10MB or even higher.
  # buffer = 10485760
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8649
  # If you want to gzip XML output
  gzip_output = no
}

/* Channel to receive sFlow datagrams */
#udp_recv_channel {
#  port = 6343
#}

/* Optional sFlow settings */
#sflow {
# udp_port = 6343
# accept_vm_metrics = yes
# accept_jvm_metrics = yes
# multiple_jvm_instances = no
# accept_http_metrics = yes
# multiple_http_instances = no
# accept_memcache_metrics = yes
# multiple_memcache_instances = no
#}

/* Each metrics module that is referenced by gmond must be specified and
   loaded. If the module has been statically linked with gmond, it does
   not require a load path. However all dynamically loadable modules must
   include a load path. */
modules {
  module {
    name = "core_metrics"
  }
  module {
    name = "cpu_module"
    path = "modcpu.so"
  }
  module {
    name = "disk_module"
    path = "moddisk.so"
  }
  module {
    name = "load_module"
    path = "modload.so"
  }
  module {
    name = "mem_module"
    path = "modmem.so"
  }
  module {
    name = "net_module"
    path = "modnet.so"
  }
  module {
    name = "proc_module"
    path = "modproc.so"
  }
  module {
    name = "sys_module"
    path = "modsys.so"
  }
}

/* The old internal 2.5.x metric array has been replaced by the following
   collection_group directives.  What follows is the default behavior for
   collecting and sending metrics that is as close to 2.5.x behavior as
   possible. */

/* This collection group will cause a heartbeat (or beacon) to be sent every
   20 seconds.  In the heartbeat is the GMOND_STARTED data which expresses
   the age of the running gmond. */
collection_group {
  collect_once = yes
  time_threshold = 20
  metric {
    name = "heartbeat"
  }
}

/* This collection group will send general info about this host*/
collection_group {
  collect_every = 60
  time_threshold = 60
  metric {
    name = "cpu_num"
    title = "CPU Count"
  }
  metric {
    name = "cpu_speed"
    title = "CPU Speed"
  }
  metric {
    name = "mem_total"
    title = "Memory Total"
  }
  metric {
    name = "swap_total"
    title = "Swap Space Total"
  }
  metric {
    name = "boottime"
    title = "Last Boot Time"
  }
  metric {
    name = "machine_type"
    title = "Machine Type"
  }
  metric {
    name = "os_name"
    title = "Operating System"
  }
  metric {
    name = "os_release"
    title = "Operating System Release"
  }
  metric {
    name = "location"
    title = "Location"
  }
}

/* This collection group will send the status of gexecd for this host
   every 300 secs.*/
/* Unlike 2.5.x the default behavior is to report gexecd OFF. */
collection_group {
  collect_once = yes
  time_threshold = 300
  metric {
    name = "gexec"
    title = "Gexec Status"
  }
}

/* This collection group will collect the CPU status info every 20 secs.
   The time threshold is set to 90 seconds.  In honesty, this
   time_threshold could be set significantly higher to reduce
   unneccessary  network chatter. */
collection_group {
  collect_every = 20
  time_threshold = 90
  /* CPU status */
  metric {
    name = "cpu_user"
    value_threshold = "1.0"
    title = "CPU User"
  }
  metric {
    name = "cpu_system"
    value_threshold = "1.0"
    title = "CPU System"
  }
  metric {
    name = "cpu_idle"
    value_threshold = "5.0"
    title = "CPU Idle"
  }
  metric {
    name = "cpu_nice"
    value_threshold = "1.0"
    title = "CPU Nice"
  }
  metric {
    name = "cpu_aidle"
    value_threshold = "5.0"
    title = "CPU aidle"
  }
  metric {
    name = "cpu_wio"
    value_threshold = "1.0"
    title = "CPU wio"
  }
  metric {
    name = "cpu_steal"
    value_threshold = "1.0"
    title = "CPU steal"
  }
  /* The next two metrics are optional if you want more detail...
     ... since they are accounted for in cpu_system.
  metric {
    name = "cpu_intr"
    value_threshold = "1.0"
    title = "CPU intr"
  }
  metric {
    name = "cpu_sintr"
    value_threshold = "1.0"
    title = "CPU sintr"
  }
  */
}

collection_group {
  collect_every = 20
  time_threshold = 90
  /* Load Averages */
  metric {
    name = "load_one"
    value_threshold = "1.0"
    title = "One Minute Load Average"
  }
  metric {
    name = "load_five"
    value_threshold = "1.0"
    title = "Five Minute Load Average"
  }
  metric {
    name = "load_fifteen"
    value_threshold = "1.0"
    title = "Fifteen Minute Load Average"
  }
}

/* This group collects the number of running and total processes */
collection_group {
  collect_every = 80
  time_threshold = 950
  metric {
    name = "proc_run"
    value_threshold = "1.0"
    title = "Total Running Processes"
  }
  metric {
    name = "proc_total"
    value_threshold = "1.0"
    title = "Total Processes"
  }
}

/* This collection group grabs the volatile memory metrics every 40 secs and
   sends them at least every 180 secs.  This time_threshold can be increased
   significantly to reduce unneeded network traffic. */
collection_group {
  collect_every = 40
  time_threshold = 180
  metric {
    name = "mem_free"
    value_threshold = "1024.0"
    title = "Free Memory"
  }
  metric {
    name = "mem_shared"
    value_threshold = "1024.0"
    title = "Shared Memory"
  }
  metric {
    name = "mem_buffers"
    value_threshold = "1024.0"
    title = "Memory Buffers"
  }
  metric {
    name = "mem_cached"
    value_threshold = "1024.0"
    title = "Cached Memory"
  }
  metric {
    name = "swap_free"
    value_threshold = "1024.0"
    title = "Free Swap Space"
  }
}

collection_group {
  collect_every = 40
  time_threshold = 300
  metric {
    name = "bytes_out"
    value_threshold = 4096
    title = "Bytes Sent"
  }
  metric {
    name = "bytes_in"
    value_threshold = 4096
    title = "Bytes Received"
  }
  metric {
    name = "pkts_in"
    value_threshold = 256
    title = "Packets Received"
  }
  metric {
    name = "pkts_out"
    value_threshold = 256
    title = "Packets Sent"
  }
}

/* Different than 2.5.x default since the old config made no sense */
collection_group {
  collect_every = 1800
  time_threshold = 3600
  metric {
    name = "disk_total"
    value_threshold = 1.0
    title = "Total Disk Space"
  }
}

collection_group {
  collect_every = 40
  time_threshold = 180
  metric {
    name = "disk_free"
    value_threshold = 1.0
    title = "Disk Space Available"
  }
  metric {
    name = "part_max_used"
    value_threshold = 1.0
    title = "Maximum Disk Space Used"
  }
}

include ("/etc/ganglia/conf.d/*.conf")

[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]#

[root@hadoop101.yinzhengjie.com ~]# vim /etc/ganglia/gmond.conf

　　各字段解释如下:
　　globals字段:
　　　　daemonize：
　　　　　　是否后台运行，默认值为yes。
　　　　 setuid:
　　　　　　是否设置运行用户的UID，在windows操作系统下建议设置为no。
　　　　user:
　　　　　　设置运行ganglia服务的用户名称，若不设置默认为nobody，官方已经显式帮我们设置为ganglia用户，该用户必须在当前操作系统中存在，而在安装gmod服务时，会自动帮咱们创建出该用户哟。
　　　　debug_level:
　　　　　　设置调试级别，默认值为0，表示不输出任何日志。 
　　　　max_udp_msg_len:
　　　　　　最大UDP消息长度，默认值为1472。
　　　　mute:
　　　　　　是否发送监控数据到其他节点，设置为yes则表示本节点不再发送自己的监控数据到其他节点，默认值为no。
　　　　deaf:
　　　　　　表示是否接受其他节点发送过来的监控数据，设置为yes则表示本节点不再接受其他节点发送来的监控数据，默认值为no。
　　　　allow_extra_data:
　　　　　　是否接受额外的数据信息，默认值为yes，当设置为no时可以有效的节省带宽，但也意味着不再接受额外的数据。
　　　　host_dmax:
　　　　　　默认值是86400秒，即主机在1天内过期（从web界面删除）。若设置为0，则永不删除主机信息。
　　　　host_tmax:
　　　　　　指定TMAX的时间长度，默认值是20秒，TMAX的属性我也不是很清楚，但我发现在"/etc/ganglia/gmetad.conf"配置文件中关于"data_source"关键字的注释信息中有提到了它。
　　　　cleanup_threshold:
　　　　　　设置gmod清理过期数据的时间，默认300秒。
　　　　gexec: 
　　　　　　当设置为yes时，运行执行gexec Job，默认值为no。
　　　　send_metadata_interval:
　　　　　　默认值为0秒，如果不使用多播，则应将此值设置为0以外的值。否则，如果重新启动aggregator gmond，将得到空的图。60秒是合理的。
　　　　　　换句话说，在单播环境中，如果将该值设置为0，当某个节点的gmod重启后，gmod的获取节点(即gmetad)将不再接受该节点的数据，如果设置大于0，可以保证gmod节点关闭或重启后，在设定的阈值时间内gmetad节点可以重新获取gmod发送的数据。


　　cluster字段:
　　　　name:
　　　　　　定义集群的名称，该名称必须和"/etc/ganglia/gmetad.conf"配置文件中的"data_source"关键字指定的唯一标识符同名，默认值为"unspecified"
　　　　owner :
　　　　　　默认值为"unspecified"，无需修改，
　　　　latlong:
　　　　　　默认值为"unspecified"，无需修改。
　　　　url：
　　　　　　默认值为"unspecified"，无需修改。 


　　host 字段:
　　　　location:
　　　　　　默认值为 "unspecified"，无需修改。
　　
　　udp_send_channel字段:
　　　　mcast_join ：
　　　　　　默认值是一个D类地址，即239.2.11.71。在网络环境比较复杂的情况下建议使用单播模式，即使用host来定义。
　　　　host:
　　　　　　指定单播地址。该地址指定的是gmetad服务器的地址。
　　　　port:
　　　　　　指定gmod的UDP监听端口，默认值为8649
　　　　ttl：
      　　指定UDP发送通道的ttl，默认值为1，无需修改。

　　udp_recv_channel字段
　　　　mcast_join：
　　　　　　默认值是一个D类地址，即239.2.11.71，如果udp_send_channle字段使用的是单播地址(即host字段)，则建议注释该字段。
　　　　port:
　　　　　　指定本机端口，默认值为8649
　　　　bind:
　　　　　　指定绑定本机的地址，hadoop101.yinzhengjie.com。
　　　　retry_bind:
　　　　　　尝试重试绑定，默认值为true，无需修改。
　　　　buffer:
　　　　　　指定UDP的buffer大小，默认是10485760字节(即10MB)，无需修改。
　　　　
　　tcp_accept_channel 字段
　　　　port ：
　　　　　　指定TCP的监听端口，默认值为 8649
　　　　gzip_output ：
　　　　　　是否启用gzip压缩，默认值为no。

3>.将"/etc/ganglia/gmond.conf"配置文件分发到其它集群节点

[root@hadoop101.yinzhengjie.com ~]# ansible all -m copy -a "src=/etc/ganglia/gmond.conf dest=/etc/ganglia/gmond.conf"

温馨提示：
　　需要注意的是，将hadoop101.yinzhengjie.com的配置拷贝到集群其他节点，如果我们在bind写死主机名的话需要一个一个手动登录到所有节点逐一去修改哟~因此为了省事情建议写成"0.0.0.0"。

五.配置gweb端

1>.在hadoop105.yinzhengjie.com节点上安装gweb相关组件

[root@hadoop105.yinzhengjie.com ~]# yum -y install nginx php-fpm ganglia-web

温馨提示:

nginx:
　　指的就是咱们使用的web服务器，只不过它仅能处理静态数据。

php-fpm:
　　它可以处理PHP相关的程序，因此必须安装该包。

ganglia-web:
　　主要是安装ganglia所需要的web文件，安装该包成功后，你会发现多出来了一个"/usr/share/ganglia"目录，我们无需关心该目录，不需要做任何操作，到时候在nginx的配置文件指定root目录为它即可。
　　当然你也可以安装该包。如下图所示，可以去官网手动下载(连接地址:https://sourceforge.net/projects/ganglia/files/)ganglia-web文件，它是一个tar包。下载后解压并配置相应权限即可。
　　博主建议还是基于yum方式安装即可，因为无需做更多配置直接就一步到位啦~

2>.配置nginx的配置文件

[root@hadoop105.yinzhengjie.com ~]# vim /etc/nginx/nginx.conf  # 编辑主配置文件，主要修改日志格式为JSON格式，而后确认子配置文件在哪个路径下配置(即关注"include"关键字)。
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# egrep -v "^#|^$" /etc/nginx/nginx.conf
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;
include /usr/share/nginx/modules/*.conf;
events {
    worker_connections 1024;
}
http {
    log_format my_access_json '{"@timestamp":"$time_iso8601",' '"host":"$server_addr",' '"clientip":"$remote_addr",' '"size":$body_bytes_sent,' '"responsetime":$request_time,' '"upstreamtim
e":"$upstream_response_time",' '"upstreamhost":"$upstream_addr",' '"http_host":"$host",' '"uri":"$uri",' '"domain":"$host",' '"xff":"$http_x_forwarded_for",' '"referer":"$http_referer",' '"tcp_xff":"$proxy_protocol_addr",' '"http_user_agent":"$http_user_agent",' '"status":"$status"}';    
　　 access_log  /var/log/nginx/access.log  my_access_json;
    sendfile            on;
    tcp_nopush          on;
    tcp_nodelay         on;
    keepalive_timeout   65;
    types_hash_max_size 2048;
    include             /etc/nginx/mime.types;
    default_type        application/octet-stream;
    # Load modular configuration files from the /etc/nginx/conf.d directory.
    # See http://nginx.org/en/docs/ngx_core_module.html#include
    # for more information.
    include /etc/nginx/conf.d/*.conf;
}
[root@hadoop105.yinzhengjie.com ~]#

[root@hadoop105.yinzhengjie.com ~]# vim /etc/nginx/nginx.conf # 编辑主配置文件，主要修改日志格式为JSON格式，而后确认子配置文件在哪个路径下配置(即关注"include"关键字)。

[root@hadoop105.yinzhengjie.com ~]# vim /etc/nginx/conf.d/ganglia.conf  # 配置ganglia的root目录
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# cat /etc/nginx/conf.d/ganglia.conf 
server {
    listen       80 default_server;
    server_name  _;
    # 注意:"/usr/share/ganglia"无需手动创建，它在安装"ganglia-web"包时自动生成的哟~
    root         /usr/share/ganglia;
    index    index.php;
    include /etc/nginx/default.d/*.conf;

    location / {
    }

    location ~ \.php$ {
    fastcgi_pass   127.0.0.1:9000;
    fastcgi_index  index.php;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    include        fastcgi_params;
    }

    error_page 404 /404.html;
        location = /40x.html {
    }

    error_page 500 502 503 504 /50x.html;
        location = /50x.html {
    }
}
[root@hadoop105.yinzhengjie.com ~]#

[root@hadoop105.yinzhengjie.com ~]# vim /etc/nginx/conf.d/ganglia.conf # 配置ganglia的root目录

六.启动服务

1>.启动nginx服务

[root@hadoop105.yinzhengjie.com ~]# systemctl start php-fpm
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# systemctl status php-fpm
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# systemctl enable php-fpm
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# systemctl start nginx
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# systemctl status nginx
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# systemctl enable nginx
[root@hadoop105.yinzhengjie.com ~]#

2>.启动gmetad

[root@hadoop105.yinzhengjie.com ~]# systemctl restart gmetad
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# systemctl status gmetad
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# systemctl enable gmetad
[root@hadoop105.yinzhengjie.com ~]#

3>.启动gmond

[root@hadoop105.yinzhengjie.com ~]# systemctl start gmond
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# systemctl status gmond
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# systemctl enable gmond
 

[root@hadoop101.yinzhengjie.com ~]# ansible all -m shell -a 'systemctl start gmond'  # 其他节点也需要启动gmond哟~
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# ansible all -m shell -a 'systemctl enable gmond'

七.搭建集群可能出现问题

　　如下图所示，gmetad已经接受来自gmod节点的数据，但始终无法在web界面展示，但相比大家也看到了，关于CPU的信息的确在web界面是获取到了，但其他的信息就没有显示了，这一点让我很困惑！

　　我这篇笔记是是根据"https://www.cnblogs.com/yinzhengjie/p/9798739.html"笔记而来，只不过把搭建过程更详细化了。

　　希望你没有遇到和我一样的困惑，先留个截图在这里吧，若以后解决了在把解决方案写下。(我初步怀疑可能是某些依赖包未安装导致web页面无法显示采集来的数据信息)。

　　Ganglia的仅能起到监控作用，无法发出警报功能，若想要发出警报信息可以借助Nagios发出警报信息哟~

　　Nagios是一个开源监控系统，可以帮助我们检验系统的运行状况，是一个非常好的警报和监视工具。可以使用Nagios来监测家禽资源和应用程序的状态以及CPU，磁盘和内存等系统资源。

　　虽然Ganglia主要用来收集和跟踪指标，但Nagios可以依靠其内置的通知系统发警报。

　　Nagios支持以下功能:
　　　　(1)获取关于集群基础设施的最新信息;
　　　　(2)生成故障报警;
　　　　(3)检测潜在问题;
　　　　(4)监控资源可用性;

　　博主推荐使用监控，警报功能一体的开源监控系统，比如国外的zabbix服务，如果您的集群在10000台以内的话，分布式zabbix监控系统应该是hold住的。如果集群规模较大可以考虑使用Open Falcon。

posted @ 2020-10-24 01:32 JasonYin2020 阅读(553) 评论(0) 收藏举报

刷新页面返回顶部

Jason Yin

部署Ganglia集群

公告