Monitoring Alerts on a Big Data Platform

Big Data Platform Monitoring UI and Reports

Viewing platform status through the web UI

If host-name mapping has not been configured, replace the hostname in the URL with the node's IP address.
Address: http://master:8088/cluster/nodes

(screenshot: YARN cluster nodes page)
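The same node list can also be pulled programmatically: the ResourceManager exposes a REST API at the same address, which is convenient for scripted monitoring. A minimal sketch (the curl call assumes the master:8088 address above; the JSON fragment is illustrative sample data, not output captured from this cluster):

```shell
# On a live cluster, node status is available as JSON from the RM REST API:
#   curl -s 'http://master:8088/ws/v1/cluster/nodes'
# The response can be filtered with grep; demonstrated here on a sample fragment:
cat <<'EOF' > /tmp/nodes_sample.json
{"nodes":{"node":[{"nodeHostName":"slave1","state":"RUNNING"},
{"nodeHostName":"slave2","state":"RUNNING"}]}}
EOF
grep -o '"state":"[A-Z]*"' /tmp/nodes_sample.json
```

Each RUNNING entry corresponds to a healthy NodeManager shown on the page above.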

Viewing Hadoop status through the web UI

Address: http://master:50070

Hadoop's running status:

(screenshot: HDFS NameNode overview page)

Menu functions:

1) Overview: view Hadoop's start time, version, NameNode journal status, NameNode storage status, and other information;

2) Datanodes: view running and stopped DataNodes;

3) Datanode Volume Failures: view DataNodes with failed volumes;

4) Snapshot: view snapshot creation and deletion information;

5) Startup Progress: view startup progress information;

6) Browse the file system: browse files and directories in HDFS;

7) Logs: view Hadoop's NameNode, resource manager, and other logs.
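The Overview information is also exposed as JSON through the NameNode's /jmx endpoint, which is handy for automated checks. A sketch, assuming the master:50070 address used in this setup (the sample JSON fragment is illustrative, not captured from this cluster):

```shell
# On a live cluster:
#   curl -s 'http://master:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo'
# Individual fields can then be extracted with grep; shown on a sample fragment:
cat <<'EOF' > /tmp/nninfo_sample.json
{"beans":[{"name":"Hadoop:service=NameNode,name=NameNodeInfo",
"SoftwareVersion":"2.7.1","Safemode":""}]}
EOF
grep -o '"SoftwareVersion":"[^"]*"' /tmp/nninfo_sample.json
```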

Detailed Hadoop summary information:

(screenshot: Hadoop summary page)

Monitoring platform resource status through the web UI

Monitoring YARN status through the web UI

Address: http://master:8088/cluster

(screenshot: YARN cluster applications page)

Viewing MapReduce run logs in Hadoop

Address: http://master:8088/logs/

(screenshot: Hadoop log directory listing)

Viewing MapReduce logs requires the JobHistory process to be started first.

Start it as the hadoop user:

[root@master ~]# cd /usr/local/src/hadoop/sbin
[root@master sbin]#  ./mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/src/hadoop/logs/mapred-root-historyserver-master.out

Address: http://master:8088/cluster

Since there is currently no running job, the web page is empty, so let's try a word count: run the WordCount example to compute the frequency of each word in a data file:

[hadoop@master ~]$ cd /usr/local/src/hadoop/
[hadoop@master hadoop]$ ls
bin  dfs  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share  tmp
[hadoop@master hadoop]$ hdfs dfs -put ~/data.txt /input
put: `/input/data.txt': File exists			# the file already exists; upload it if it does not
[hadoop@master hadoop]$ hdfs dfs -ls /input
Found 1 items
-rw-r--r--   3 hadoop supergroup         46 2023-04-19 10:49 /input/data.txt

# if an /output directory already exists under /, delete it first
[hadoop@master hadoop]$ hdfs dfs -rm -r -f /output
23/05/08 17:41:04 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /output

# run the word count
[hadoop@master hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input/data.txt /output
23/05/08 18:12:12 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.88.10:8032
23/05/08 18:12:13 INFO input.FileInputFormat: Total input paths to process : 1
23/05/08 18:12:13 INFO mapreduce.JobSubmitter: number of splits:1
23/05/08 18:12:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1683539521530_0005
23/05/08 18:12:13 INFO impl.YarnClientImpl: Submitted application application_1683539521530_0005
23/05/08 18:12:13 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1683539521530_0005/
23/05/08 18:12:13 INFO mapreduce.Job: Running job: job_1683539521530_0005
23/05/08 18:12:20 INFO mapreduce.Job: Job job_1683539521530_0005 running in uber mode : false
23/05/08 18:12:20 INFO mapreduce.Job:  map 0% reduce 0%
23/05/08 18:12:26 INFO mapreduce.Job:  map 100% reduce 0%
23/05/08 18:12:32 INFO mapreduce.Job:  map 100% reduce 100%
23/05/08 18:12:32 INFO mapreduce.Job: Job job_1683539521530_0005 completed successfully
23/05/08 18:12:33 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=63
		FILE: Number of bytes written=231009
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=144
		HDFS: Number of bytes written=41
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
........
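Once the job completes, the result can be read back from HDFS, and the computation itself is the classic word-frequency pipeline. A sketch (the hdfs command assumes the /output path used above; the sample words are made up, since the contents of data.txt are not shown):

```shell
# On the cluster, read the WordCount result:
#   hdfs dfs -cat /output/part-r-00000
# WordCount computes per-word frequencies, the same as this local pipeline:
printf 'hello world\nhello hadoop\n' | tr ' ' '\n' | sort | uniq -c | sort -rn
# "hello" is counted twice, the other words once
```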

Viewing the job in the browser

(screenshot: YARN application list)

(screenshot: application details page)

Monitoring HDFS status through the web UI

Address: http://master:50070

(screenshot: HDFS overview page)

Opening directories and downloading files in the HDFS browser works as follows.

Opening a directory

Enter the directory name (e.g. hbase) in the path bar and click the "Go!" button or press Enter to open it:

(screenshot: HDFS file browser)

Downloading a file from HDFS

(screenshot: HDFS file download dialog)
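Browsing and downloading can also be scripted through the WebHDFS REST API instead of the browser. A sketch, assuming the master:50070 address used above and WebHDFS enabled (the default in Hadoop 2.x):

```shell
# List a directory (on a live cluster):
#   curl -s 'http://master:50070/webhdfs/v1/input?op=LISTSTATUS'
# Download a file, following the redirect to a DataNode:
#   curl -L -o data.txt 'http://master:50070/webhdfs/v1/input/data.txt?op=OPEN'
# The URL layout is /webhdfs/v1/<hdfs-path>?op=<OPERATION>:
path=/input/data.txt
echo "http://master:50070/webhdfs/v1${path}?op=OPEN"
```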

Monitoring HBase status through the web UI

The web UI addresses are master:60010, slave1:60010, and slave2:60010.

Before visiting them, HBase must be started, and starting HBase requires ZooKeeper to be running first; otherwise HBase will fail to start.

# start ZooKeeper first
[hadoop@master ~]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

# start HBase
[hadoop@master ~]$ start-hbase.sh 
starting master, logging to /usr/local/src/hbase/logs/hbase-hadoop-master-master.out
slave2: starting regionserver, logging to /usr/local/src/hbase/logs/hbase-hadoop-regionserver-slave2.out
slave1: starting regionserver, logging to /usr/local/src/hbase/logs/hbase-hadoop-regionserver-slave1.out
[hadoop@master ~]$ jps
2437 NameNode
2789 ResourceManager
4518 HMaster
2631 SecondaryNameNode
4639 Jps

HBase web UI home page

The HBase home page is at http://master:60010.

The HBase main UI menus include Table Details, Local Logs, Log Level, Debug Dump, Metrics Dump, and HBase Configuration.

(screenshot: HBase master home page)

To view HBase table information, click Table Details in the menu bar to list all tables.

(screenshot: HBase Table Details page)

Under Tables, click System Tables to view the system tables, mainly metadata and namespaces.

(screenshot: HBase system tables)

Monitoring platform alerts and log information

Viewing host logs on the platform

[hadoop@master ~]$ cd /var/log/
[hadoop@master log]$ ll
total 2212
drwxr-xr-x. 2 root  root     204 Mar  1 09:48 anaconda
drwx------. 2 root  root      23 Mar  1 09:49 audit
-rw-------. 1 root  root    8943 May  8 17:48 boot.log
-rw-------  1 root  root   70827 Mar 15 17:13 boot.log-20230315
-rw-------  1 root  root   17941 Mar 25 18:21 boot.log-20230325
-rw-------  1 root  root    8553 Mar 29 10:36 boot.log-20230329
-rw-------  1 root  root   17758 Apr 12 14:06 boot.log-20230412
-rw-------  1 root  root    8699 Apr 19 10:24 boot.log-20230419
-rw-------  1 root  root   25902 May  6 10:34 boot.log-20230506
-rw-------  1 root  root    8901 May  8 17:33 boot.log-20230508
-rw-------  1 root  utmp       0 May  6 10:34 btmp
-rw-------  1 root  utmp       0 Apr 12 14:06 btmp-20230506
-rw-------  1 root  root    2479 May  8 20:01 cron
-rw-------  1 root  root    1588 Mar 25 18:21 cron-20230325
-rw-------  1 root  root    3170 Apr 12 14:06 cron-20230412
-rw-------  1 root  root    1464 Apr 19 10:24 cron-20230419
-rw-------  1 root  root    1919 May  6 10:34 cron-20230506
-rw-r--r--  1 root  root  122829 May  8 17:48 dmesg
-rw-r--r--  1 root  root  122808 May  8 16:40 dmesg.old
-rw-r--r--. 1 root  root       0 Mar  1 09:49 firewalld
-rw-r--r--. 1 root  root     193 Mar  1 09:45 grubby_prune_debug
-rw-r--r--. 1 root  root  292292 May  8 20:30 lastlog
-rw-------  1 root  root       0 May  6 10:34 maillog
-rw-------  1 root  root     190 Mar 25 17:17 maillog-20230325
-rw-------  1 root  root       0 Mar 25 18:21 maillog-20230412
-rw-------  1 root  root       0 Apr 12 14:06 maillog-20230419
-rw-------  1 root  root       0 Apr 19 10:24 maillog-20230506
-rw-------  1 root  root  272392 May  8 20:30 messages
-rw-------  1 root  root  268597 Mar 25 18:16 messages-20230325
-rw-------  1 root  root  402646 Apr 12 14:01 messages-20230412
-rw-------  1 root  root  134516 Apr 19 10:24 messages-20230419
-rw-------  1 root  root  393753 May  6 10:20 messages-20230506
-rw-r--r--  1 mysql mysql  89848 May  8 19:40 mysqld.log
drwxr-xr-x. 2 root  root       6 Mar  1 09:48 rhsm
-rw-------  1 root  root   11316 May  8 20:30 secure
-rw-------  1 root  root    4476 Mar 25 18:16 secure-20230325
-rw-------  1 root  root   15924 Apr 12 13:54 secure-20230412
-rw-------  1 root  root    5308 Apr 19 10:24 secure-20230419
-rw-------  1 root  root    5328 May  6 10:18 secure-20230506
-rw-------  1 root  root       0 May  6 10:34 spooler
-rw-------  1 root  root       0 Mar 15 17:13 spooler-20230325
-rw-------  1 root  root       0 Mar 25 18:21 spooler-20230412
-rw-------  1 root  root       0 Apr 12 14:06 spooler-20230419
-rw-------  1 root  root       0 Apr 19 10:24 spooler-20230506
-rw-------. 1 root  root       0 Mar  1 09:45 tallylog
drwxr-xr-x. 2 root  root      23 Mar  1 09:49 tuned
-rw-r--r--. 1 root  root   29439 May  8 17:48 vmware-vgauthsvc.log.0
-rw-r--r--. 1 root  root   45878 May  8 19:40 vmware-vmsvc.log
-rw-rw-r--. 1 root  utmp   57600 May  8 20:25 wtmp
-rw-------. 1 root  root    2332 May  6 10:59 yum.log

Viewing the kernel and general message log (/var/log/messages)

The kernel and general message log is an aggregation of many processes' log output. Switch to root and view it with cat or tail; here head is used:

[root@master ~]# head -n 5 /var/log/messages
May  6 10:34:01 master rsyslogd: [origin software="rsyslogd" swVersion="8.24.0" x-pid="882" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
May  6 10:35:36 master su: (to hadoop) root on pts/0
May  6 10:51:05 master su: (to root) root on pts/0
May  6 10:59:17 master yum[2309]: Installed: 2:vim-filesystem-7.4.629-8.el7_9.x86_64
May  6 10:59:18 master yum[2309]: Installed: 2:vim-common-7.4.629-8.el7_9.x86_64

Viewing the scheduled-task log (/var/log/cron)

This file records the creation and execution of crontab scheduled tasks.

[root@master ~]# head -n 5 /var/log/cron
May  6 10:34:01 master run-parts(/etc/cron.daily)[2092]: finished logrotate
May  6 10:34:01 master run-parts(/etc/cron.daily)[2080]: starting man-db.cron
May  6 10:34:03 master run-parts(/etc/cron.daily)[2199]: finished man-db.cron
May  6 10:34:03 master anacron[1148]: Job `cron.daily' terminated
May  6 10:54:01 master anacron[1148]: Job `cron.weekly' started

Viewing the boot log (/var/log/dmesg)

This plain-text file records hardware (device) information and can also be viewed with the dmesg command. Since the file is fairly long, head is used to show the first five lines:

[root@master ~]# head -n 5 /var/log/dmesg
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.10.0-862.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC) ) #1 SMP Fri Apr 20 16:44:24 UTC 2018
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-862.el7.x86_64 root=/dev/mapper/centos-root ro rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8

Viewing the mail system log (/var/log/maillog)

This log records the activity of every email sent to or from the system. It can be used to see which mail tool a user used and which system data was sent to:

[root@master ~]# head -n 5 /var/log/maillog

Viewing user login logs

These logs record user logins to and logouts from the Linux system, including the username, login terminal, login time, source host, and the processes in use.

The following files record logins, logouts, and related system events:

1) /var/log/lastlog: the most recent login event for each user

2) /var/log/wtmp: user logins/logouts and system boot/shutdown events

3) /var/run/utmp: details of every currently logged-in user

4) /var/log/secure: security events related to user authentication

lastlog lists each user's most recent login

lastlog reads /var/log/lastlog and shows the login name, port, last login time, and so on:

[root@master ~]# cd /var/log/
[root@master log]# lastlog
Username         Port     From             Latest
root             pts/0    192.168.88.1     Tue May  9 08:36:07 +0800 2023
bin                                        **Never logged in**
daemon                                     **Never logged in**
adm                                        **Never logged in**
lp                                         **Never logged in**
sync                                       **Never logged in**
shutdown                                   **Never logged in**
halt                                       **Never logged in**
mail                                       **Never logged in**
operator                                   **Never logged in**
games                                      **Never logged in**
ftp                                        **Never logged in**
nobody                                     **Never logged in**
systemd-network                            **Never logged in**
dbus                                       **Never logged in**
polkitd                                    **Never logged in**
sshd                                       **Never logged in**
postfix                                    **Never logged in**
hadoop           pts/1                     Mon May  8 20:30:43 +0800 2023
mysql                                      **Never logged in**
ntp                                        **Never logged in**

last lists current and past logins

By default, last reads /var/log/wtmp. Its output includes the username, terminal, login source, start time, end time, and duration. Note that the final line shows when the wtmp file's records begin. A different file can be specified with last -f, e.g. /var/log/btmp or /var/run/utmp.

[root@master log]# last
root     pts/0        192.168.88.1     Tue May  9 08:36   still logged in   
reboot   system boot  3.10.0-862.el7.x Tue May  9 08:33 - 08:43  (00:09)    
root     pts/1        192.168.88.1     Mon May  8 20:25 - 20:41  (00:15)    
root     pts/1        192.168.88.1     Mon May  8 19:42 - 20:25  (00:42)    
root     pts/0        192.168.88.1     Mon May  8 17:49 - crash  (14:44)    
reboot   system boot  3.10.0-862.el7.x Mon May  8 17:48 - 08:43  (14:55)    
root     pts/1        192.168.88.1     Mon May  8 17:04 - down   (00:43)    
root     pts/0        192.168.88.1     Mon May  8 16:48 - 17:47  (00:59)    
reboot   system boot  3.10.0-862.el7.x Mon May  8 16:39 - 17:48  (01:08)    
root     pts/1        192.168.88.1     Sat May  6 11:31 - 11:34  (00:02)    
root     pts/0        192.168.88.1     Sat May  6 11:29 - 11:34  (00:04)    
root     pts/0        192.168.88.1     Sat May  6 10:00 - 11:29  (01:29)    
reboot   system boot  3.10.0-862.el7.x Sat May  6 09:59 - 11:34  (01:35)    
root     pts/0        192.168.88.1     Tue May  2 13:18 - crash (3+20:41)   
reboot   system boot  3.10.0-862.el7.x Tue May  2 13:17 - 11:34 (3+22:16)   
root     tty1                          Wed Apr 19 11:04 - 11:04  (00:00)    
reboot   system boot  3.10.0-862.el7.x Wed Apr 19 11:03 - 11:04  (00:00)    
root     pts/0        192.168.88.1     Wed Apr 19 09:47 - 11:01  (01:13)    
reboot   system boot  3.10.0-862.el7.x Wed Apr 19 09:44 - 11:02  (01:17)    
root     pts/0        192.168.88.1     Wed Apr 12 13:54 - 14:53  (00:58)    
root     pts/0        192.168.88.1     Wed Apr 12 13:16 - 13:54  (00:38)    
reboot   system boot  3.10.0-862.el7.x Wed Apr 12 13:15 - 11:02 (6+21:47)   
root     pts/0        192.168.88.1     Thu Apr  6 22:28 - 23:02  (00:34)    
reboot   system boot  3.10.0-862.el7.x Thu Apr  6 22:16 - 23:03  (00:46)    
root     pts/0        192.168.88.1     Wed Mar 29 09:24 - 10:43  (01:19)    
reboot   system boot  3.10.0-862.el7.x Wed Mar 29 09:22 - 23:03 (8+13:40)   
root     pts/0        192.168.88.1     Sat Mar 25 18:51 - 18:53  (00:02)    
root     pts/0        192.168.88.1     Sat Mar 25 17:27 - 18:51  (01:23)    
reboot   system boot  3.10.0-862.el7.x Sat Mar 25 17:26 - 23:03 (12+05:36)  
root     pts/0        192.168.88.1     Sat Mar 25 17:17 - 17:25  (00:07)    
reboot   system boot  3.10.0-862.el7.x Sat Mar 25 17:17 - 23:03 (12+05:45)  
hadoop   pts/1        master           Wed Mar 15 16:08 - 16:08  (00:00)    
root     pts/0        192.168.88.1     Wed Mar 15 16:05 - 17:35  (01:29)    
reboot   system boot  3.10.0-862.el7.x Wed Mar 15 16:04 - 17:36  (01:31)    
root     pts/0        192.168.88.1     Wed Mar 15 15:56 - down   (00:06)    
reboot   system boot  3.10.0-862.el7.x Wed Mar 15 15:55 - 16:03  (00:07)    
root     pts/0        192.168.88.1     Wed Mar 15 10:50 - 10:57  (00:06)    
reboot   system boot  3.10.0-862.el7.x Wed Mar 15 10:49 - 10:57  (00:07)    
root     pts/0        192.168.88.1     Sun Mar  5 10:26 - 11:52  (01:25)    
reboot   system boot  3.10.0-862.el7.x Sun Mar  5 10:25 - 10:57 (10+00:32)  
root     pts/0        192.168.88.1     Sun Mar  5 09:24 - crash  (01:00)    
reboot   system boot  3.10.0-862.el7.x Sun Mar  5 09:22 - 10:57 (10+01:34)  
root     pts/0        192.168.88.1     Thu Mar  2 17:30 - 17:34  (00:04)    
reboot   system boot  3.10.0-862.el7.x Thu Mar  2 17:25 - 17:34  (00:09)    
root     pts/0        192.168.88.1     Wed Mar  1 09:59 - crash (1+07:25)   
reboot   system boot  3.10.0-862.el7.x Wed Mar  1 09:57 - 17:34 (1+07:36)   
root     pts/0        192.168.88.1     Wed Mar  1 09:56 - down   (00:00)    
root     tty1                          Wed Mar  1 09:52 - 09:57  (00:04)    
reboot   system boot  3.10.0-862.el7.x Wed Mar  1 09:49 - 09:57  (00:07)    

wtmp begins Wed Mar  1 09:49:55 2023

Use last -f /var/run/utmp to view the utmp file:

[root@master log]# last -f /var/run/utmp 
root     pts/0        192.168.88.1     Tue May  9 08:36   still logged in   
reboot   system boot  3.10.0-862.el7.x Tue May  9 08:33 - 08:45  (00:11)    

utmp begins Tue May  9 08:33:54 2023

lastb lists failed login attempts

lastb works exactly like last, except that it reads /var/log/btmp by default:

[root@master log]# lastb

btmp begins Sat May  6 10:34:01 2023

SSH login activity can be inspected through the Linux security log /var/log/secure; reading this file requires root privileges:

[root@master log]# cat /var/log/secure
May  6 10:35:36 master su: pam_unix(su-l:session): session opened for user hadoop by root(uid=0)
May  6 10:51:05 master su: pam_unix(su-l:session): session opened for user root by root(uid=1000)
May  6 11:02:08 master su: pam_unix(su-l:session): session opened for user hadoop by root(uid=0)
May  6 11:29:37 master su: pam_unix(su-l:session): session closed for user hadoop
May  6 11:29:37 master sshd[1112]: pam_unix(sshd:session): session closed for user root
May  6 11:29:37 master su: pam_unix(su-l:session): session closed for user root
May  6 11:29:37 master su: pam_unix(su-l:session): session closed for user hadoop
May  6 11:29:37 master su: pam_unix(su-l:session): session closed for user root
May  6 11:29:37 master su: pam_unix(su-l:session): session closed for user hadoop
May  6 11:29:42 master sshd[2462]: Accepted password for root from 192.168.88.1 port 3913 ssh2
May  6 11:29:42 master sshd[2462]: pam_unix(sshd:session): session opened for user root by (uid=0)
..................
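Failed SSH attempts are recorded in the same file as "Failed password" lines, so a grep filter makes a quick audit. A sketch, demonstrated on sample entries (the failed-login line is made up for illustration; the live command is the commented one):

```shell
# On the host (as root):
#   grep 'Failed password' /var/log/secure
# Demonstrated on sample entries:
cat <<'EOF' > /tmp/secure_sample
May  6 11:29:42 master sshd[2462]: Accepted password for root from 192.168.88.1 port 3913 ssh2
May  6 11:30:05 master sshd[2470]: Failed password for invalid user test from 192.168.88.1 port 3920 ssh2
EOF
grep 'Failed password' /tmp/secure_sample
```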

Viewing log information for Hadoop MapReduce jobs

Every Mapper and Reducer in Hadoop has the following three types of logs:

1) stdout: output from System.out.println() is directed to this file

2) stderr: output from System.err.println() is directed to this file

3) syslog: log4j output is directed to this file; stack traces of any unhandled exceptions raised during job execution appear in syslog

Entering http://master:19888/jobhistory in the browser's address bar shows summary information about jobs, but note that the JobHistory process must be started first:

[hadoop@master ~]$ start-all.sh 
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.out
192.168.88.30: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.out
192.168.88.20: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.out
192.168.88.20: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave1.out
192.168.88.30: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
[hadoop@master ~]$ jps
1312 NameNode
1509 SecondaryNameNode
1670 ResourceManager
1931 Jps
[hadoop@master ~]$ cd /usr/local/src/hadoop/sbin/
[hadoop@master sbin]$ ./mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/src/hadoop/logs/mapred-hadoop-historyserver-master.out

(screenshot: JobHistory server job list)

Click the Job ID.

(screenshot: job overview page)

Click the "1" link under Maps to view the detailed Mapper information:

(screenshot: map task list)

Now the logs of a specific Mapper instance can be viewed by clicking Logs:

(screenshot: log page reporting that aggregation is not enabled)

The error shown above appears because log aggregation is not enabled; add the following to yarn-site.xml to enable it:

[hadoop@master ~]$ cd /usr/local/src/hadoop/etc/hadoop
[hadoop@master hadoop]$ vi yarn-site.xml
<property>
 	<name>yarn.log-aggregation-enable</name>
 	<value>true</value>
</property>
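For the setting to take effect, YARN (and the JobHistory server) must be restarted; aggregated logs can then also be fetched from the command line with `yarn logs -applicationId <app-id>`. A retention period can optionally be configured alongside it; a sketch (the 7-day value is an arbitrary example, not from the original setup):

```xml
<!-- Optional: keep aggregated logs for 7 days (value in seconds). -->
<property>
 	<name>yarn.log-aggregation.retain-seconds</name>
 	<value>604800</value>
</property>
```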

Viewing Hadoop logs through the web UI

Click FINISHED in the left-hand menu to show jobs that have finished running.

(screenshot: finished applications list)

To view log information through the Hadoop web UI, open http://master:50070 in a browser and click Utilities --> Logs.

(screenshot: NameNode log directory listing)

Viewing Hadoop logs from the command line

[hadoop@master ~]$ cd /usr/local/src/hadoop/logs
[hadoop@master logs]$ ll
total 3844
-rw-rw-r-- 1 hadoop hadoop 1556048 May  9 10:22 hadoop-hadoop-namenode-master.log
-rw-rw-r-- 1 hadoop hadoop     716 May  9 10:16 hadoop-hadoop-namenode-master.out
-rw-rw-r-- 1 hadoop hadoop     716 May  9 09:56 hadoop-hadoop-namenode-master.out.1
-rw-rw-r-- 1 hadoop hadoop     716 Apr 12 14:51 hadoop-hadoop-namenode-master.out.1.COMPLETED
-rw-rw-r-- 1 hadoop hadoop     716 May  9 09:51 hadoop-hadoop-namenode-master.out.2
-rw-rw-r-- 1 hadoop hadoop     716 Apr 12 13:25 hadoop-hadoop-namenode-master.out.2.COMPLETED
-rw-rw-r-- 1 hadoop hadoop     716 May  9 09:39 hadoop-hadoop-namenode-master.out.3
-rw-rw-r-- 1 hadoop hadoop     716 Apr  6 22:41 hadoop-hadoop-namenode-master.out.3.COMPLETED
-rw-rw-r-- 1 hadoop hadoop     716 May  9 09:22 hadoop-hadoop-namenode-master.out.4
-rw-rw-r-- 1 hadoop hadoop     716 Mar 25 18:41 hadoop-hadoop-namenode-master.out.4.COMPLETED
-rw-rw-r-- 1 hadoop hadoop     716 May  9 09:16 hadoop-hadoop-namenode-master.out.5
-rw-rw-r-- 1 hadoop hadoop     716 Mar 25 18:30 hadoop-hadoop-namenode-master.out.5.COMPLETED
-rw-rw-r-- 1 hadoop hadoop    4965 Apr 19 10:56 hadoop-hadoop-namenode-master.out.COMPLETED
-rw-rw-r-- 1 hadoop hadoop  421330 May  9 10:17 hadoop-hadoop-secondarynamenode-master.log
-rw-rw-r-- 1 hadoop hadoop  191397 Apr 19 11:02 hadoop-hadoop-secondarynamenode-master.log.COMPLETED
-rw-rw-r-- 1 hadoop hadoop     716 May  9 10:16 hadoop-hadoop-secondarynamenode-master.out
-rw-rw-r-- 1 hadoop hadoop     716 May  9 09:56 hadoop-hadoop-secondarynamenode-master.out.1
-rw-rw-r-- 1 hadoop hadoop     716 Apr 12 14:51 hadoop-hadoop-secondarynamenode-master.out.1.COMPLETED
-rw-rw-r-- 1 hadoop hadoop     716 May  9 09:51 hadoop-hadoop-secondarynamenode-master.out.2
-rw-rw-r-- 1 hadoop hadoop     716 Apr 12 13:25 hadoop-hadoop-secondarynamenode-master.out.2.COMPLETED
-rw-rw-r-- 1 hadoop hadoop     716 May  9 09:39 hadoop-hadoop-secondarynamenode-master.out.3
-rw-rw-r-- 1 hadoop hadoop     716 Apr  6 22:42 hadoop-hadoop-secondarynamenode-master.out.3.COMPLETED
-rw-rw-r-- 1 hadoop hadoop     716 May  9 09:22 hadoop-hadoop-secondarynamenode-master.out.4
-rw-rw-r-- 1 hadoop hadoop     716 Mar 25 18:42 hadoop-hadoop-secondarynamenode-master.out.4.COMPLETED
-rw-rw-r-- 1 hadoop hadoop     716 May  9 09:16 hadoop-hadoop-secondarynamenode-master.out.5
-rw-rw-r-- 1 hadoop hadoop     716 Mar 25 18:30 hadoop-hadoop-secondarynamenode-master.out.5.COMPLETED
-rw-rw-r-- 1 hadoop hadoop     716 Apr 19 10:24 hadoop-hadoop-secondarynamenode-master.out.COMPLETED
...............

Viewing HBase logs

HBase provides log file viewing in its web UI. Open http://master:60010 in a browser to show the HBase main page.

(screenshot: HBase web UI home page)

Click the "Local Logs" menu to open the list of HBase logs.

(screenshot: HBase local logs list)

Viewing Hive logs

Hive logs are stored in /tmp/hadoop. On the command line, change to that directory and run ll to list the Hive logs:

[hadoop@master ~]$ cd /tmp/hadoop
[hadoop@master hadoop]$ ll
total 312
-rw-rw-r-- 1 hadoop hadoop 314209 May  8 17:03 hive.log
-rw-rw-r-- 1 hadoop hadoop   1019 May  8 17:00 stderr
[hadoop@master hadoop]$ head -5 hive.log 
2023-05-08T17:00:19,657 INFO  [main]: hwi.HWIServer (HWIServer.java:main(131)) - HWI is starting up
2023-05-08T17:00:21,325 INFO  [main]: mortbay.log (Slf4jLog.java:info(67)) - Logging to org.apache.logging.slf4j.Log4jLogger@4145bad8 via org.mortbay.log.Slf4jLog
2023-05-08T17:00:21,358 INFO  [main]: mortbay.log (Slf4jLog.java:info(67)) - jetty-6.1.26
2023-05-08T17:00:21,465 WARN  [main]: mortbay.log (Slf4jLog.java:warn(76)) - Can't reuse /tmp/Jetty_0_0_0_0_9999_hive.hwi.2.0.0.war__hwi__p3f5if, using /tmp/Jetty_0_0_0_0_9999_hive.hwi.2.0.0.war__hwi__p3f5if_4307487751968339939
2023-05-08T17:00:21,466 INFO  [main]: mortbay.log (Slf4jLog.java:info(67)) - Extract /usr/local/src/hive/lib/hive-hwi-2.0.0.war to /tmp/Jetty_0_0_0_0_9999_hive.hwi.2.0.0.war__hwi__p3f5if_4307487751968339939/webapp

Viewing platform alert information

Viewing host alerts

Linux log files are stored in /var/log. Alerts on a Linux host can be examined with journalctl, the systemd log management tool shipped with CentOS 7; it reads from the systemd journal, which collects the same messages that rsyslog writes to files such as messages:

[root@master ~]# cd /var/log/
[root@master log]# journalctl -p err..alert
-- Logs begin at Tue 2023-05-09 08:33:49 CST, end at Tue 2023-05-09 10:34:16 CST. --
May 09 08:33:49 localhost.localdomain kernel: Detected CPU family 17h model 104
May 09 08:33:49 localhost.localdomain kernel: Warning: AMD Processor - this hardware has not undergone up
May 09 08:33:49 localhost.localdomain kernel: sd 2:0:0:0: [sda] Assuming drive cache: write through
May 09 08:34:02 master kernel: piix4_smbus 0000:00:07.3: SMBus Host Controller not enabled!
May 09 08:34:16 master systemd[1]: Failed to start Postfix Mail Transport Agent.

journalctl can also query a given service's alerts by its process ID:

[root@master log]# journalctl _PID=<PID> -p err

Viewing Hadoop alerts

[root@master log]# cd /usr/local/src/hadoop/logs/
[root@master logs]# ll
total 3924
-rw-rw-r-- 1 hadoop hadoop 1593552 May  9 10:37 hadoop-hadoop-namenode-master.log
-rw-rw-r-- 1 hadoop hadoop     716 May  9 10:16 hadoop-hadoop-namenode-master.out
-rw-rw-r-- 1 hadoop hadoop     716 May  9 09:56 hadoop-hadoop-namenode-master.out.1
-rw-rw-r-- 1 hadoop hadoop     716 Apr 12 14:51 hadoop-hadoop-namenode-master.out.1.COMPLETED
-rw-rw-r-- 1 hadoop hadoop     716 May  9 09:51 hadoop-hadoop-namenode-master.out.2
-rw-rw-r-- 1 hadoop hadoop     716 Apr 12 13:25 hadoop-hadoop-namenode-master.out.2.COMPLETED
-rw-rw-r-- 1 hadoop hadoop     716 May  9 09:39 hadoop-hadoop-namenode-master.out.3
-rw-rw-r-- 1 hadoop hadoop     716 Apr  6 22:41 hadoop-hadoop-namenode-master.out.3.COMPLETED
.....................

Viewing HBase alerts

The HBase web UI provides querying and setting of log alert levels. Open the http://master:60010/logLevel page in a browser.

(screenshot: HBase Log Level page)

To query a log's alert level, enter the log name and click the "Get Log Level" button; the level is displayed. For example, querying the level of hadoop-hadoop-namenode-master.log:

(screenshot: Get Log Level result)

The alert level of hadoop-hadoop-namenode-master.log is INFO. To change it to WARN, enter Log: hadoop-hadoop-namenode-master.log and Level: WARN in the second form, click the "Set Log Level" button, and query the log's level again after setting it.

(screenshot: Set Log Level result)

Querying log alert entries

[root@master logs]# cd /usr/local/src/hbase/logs
[root@master logs]#  tail -10f hbase-hadoop-master-master.log |grep INFO
2023-05-09 10:31:51,706 INFO  [master,16000,1683599183843_ChoreService_1] master.RegionStates: Transition {9bffc61846e344cb34473dbb877e9e92 state=OPEN, ts=1683599215578, server=slave1,16020,1683599199173} to {9bffc61846e344cb34473dbb877e9e92 state=PENDING_CLOSE, ts=1683599511705, server=slave1,16020,1683599199173}
2023-05-09 10:31:51,804 INFO  [AM.ZK.Worker-pool2-t15] master.RegionStates: Transition {9bffc61846e344cb34473dbb877e9e92 state=PENDING_CLOSE, ts=1683599511705, server=slave1,16020,1683599199173} to {9bffc61846e344cb34473dbb877e9e92 state=CLOSED, ts=1683599511804, server=slave1,16020,1683599199173}
2023-05-09 10:31:51,804 INFO  [AM.ZK.Worker-pool2-t15] master.AssignmentManager: Setting node as OFFLINED in ZooKeeper for region {ENCODED => 9bffc61846e344cb34473dbb877e9e92, NAME => 'scores,,1683548174984.9bffc61846e344cb34473dbb877e9e92.', STARTKEY => '', ENDKEY => ''}
2023-05-09 10:31:51,804 INFO  [AM.ZK.Worker-pool2-t15] master.RegionStates: Transition {9bffc61846e344cb34473dbb877e9e92 state=CLOSED, ts=1683599511804, server=slave1,16020,1683599199173} to {9bffc61846e344cb34473dbb877e9e92 state=OFFLINE, ts=1683599511804, server=slave1,16020,1683599199173}
2023-05-09 10:31:51,817 INFO  [AM.ZK.Worker-pool2-t15] master.AssignmentManager: Assigning scores,,1683548174984.9bffc61846e344cb34473dbb877e9e92. to slave2,16020,1683599198956
2023-05-09 10:31:51,817 INFO  [AM.ZK.Worker-pool2-t15] master.RegionStates: Transition {9bffc61846e344cb34473dbb877e9e92 state=OFFLINE, ts=1683599511804, server=slave1,16020,1683599199173} to {9bffc61846e344cb34473dbb877e9e92 state=PENDING_OPEN, ts=1683599511817, server=slave2,16020,1683599198956}
2023-05-09 10:31:52,020 INFO  [AM.ZK.Worker-pool2-t17] master.RegionStates: Transition {9bffc61846e344cb34473dbb877e9e92 state=PENDING_OPEN, ts=1683599511817, server=slave2,16020,1683599198956} to {9bffc61846e344cb34473dbb877e9e92 state=OPENING, ts=1683599512020, server=slave2,16020,1683599198956}
2023-05-09 10:31:52,678 INFO  [AM.ZK.Worker-pool2-t18] master.RegionStates: Transition {9bffc61846e344cb34473dbb877e9e92 state=OPENING, ts=1683599512020, server=slave2,16020,1683599198956} to {9bffc61846e344cb34473dbb877e9e92 state=OPEN, ts=1683599512677, server=slave2,16020,1683599198956}
2023-05-09 10:31:52,709 INFO  [AM.ZK.Worker-pool2-t20] master.RegionStates: Offlined 9bffc61846e344cb34473dbb877e9e92 from slave1,16020,1683599199173

Viewing Hive alerts

Hive's log files are stored under /tmp/hadoop; change to that directory and inspect the logs:

[root@master logs]# cd /tmp/hadoop
[root@master hadoop]# tail -10f hive.log |grep INFO
2023-05-08T17:03:38,040 INFO  [724085490@qtp-1094523823-2]: metastore.ObjectStore (ObjectStore.java:setConf(301)) - Initialized ObjectStore
2023-05-08T17:03:38,389 INFO  [724085490@qtp-1094523823-2]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles_core(586)) - Added admin role in metastore
2023-05-08T17:03:38,392 INFO  [724085490@qtp-1094523823-2]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles_core(595)) - Added public role in metastore
2023-05-08T17:03:38,458 INFO  [724085490@qtp-1094523823-2]: metastore.HiveMetaStore (HiveMetaStore.java:addAdminUsers_core(635)) - No user is added in admin role, since config is empty
2023-05-08T17:03:38,586 INFO  [724085490@qtp-1094523823-2]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(669)) - 0: get_all_databases
2023-05-08T17:03:38,755 INFO  [724085490@qtp-1094523823-2]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hadoop	ip=unknown-ip-addr	cmd=get_all_databases	
2023-05-08T17:03:38,765 INFO  [724085490@qtp-1094523823-2]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(669)) - 0: Shutting down the object store...
2023-05-08T17:03:38,765 INFO  [724085490@qtp-1094523823-2]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hadoop	ip=unknown-ip-addr	cmd=Shutting down the object store...	
2023-05-08T17:03:38,766 INFO  [724085490@qtp-1094523823-2]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(669)) - 0: Metastore shutdown complete.
2023-05-08T17:03:38,766 INFO  [724085490@qtp-1094523823-2]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hadoop	ip=unknown-ip-addr	cmd=Metastore shutdown complete.	

[root@master hadoop]#  tail -10f stderr |grep ERROR
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.