
EXAchk: the Exadata Health Check Tool

2021-02-02 17:00  AlfredZhao

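This article summarizes the key steps from the MOS note Oracle Exadata Database Machine EXAchk (Doc ID 1070954.1).
Note: health checks are generally expected to be run with the latest available EXAchk version.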

1. Check the current version

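Exachk is now bundled with Trace File Analyzer (TFA) in the Autonomous Health Framework (AHF), so installing the current AHF release provides the latest Exachk and TFA. The version can therefore be checked either through tfactl or directly with exachk: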
tfactl version -all
exachk -v

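If the reported version is not the latest, download the latest available Exachk release and upgrade as described in step 2. Otherwise, go straight to step 3 and run the health check.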
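The comparison can be scripted. The following is a minimal sketch, assuming the `exachk -v` output has the form `EXACHK  VERSION: 20.4.0_20201214` as shown later in this article; the helper names and the `target` value are illustrative, and the latest version still has to be looked up in the MOS note by hand. It relies on `sort -V` from GNU coreutils.

```shell
# Extract the version number from `exachk -v` style output read on stdin,
# e.g. "EXACHK  VERSION: 20.4.0_20201214" -> "20.4.0".
exachk_version_from_output() {
    sed -n 's/^EXACHK[[:space:]]*VERSION:[[:space:]]*\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed (exit status 0) only when version $1 is strictly older than $2.
version_is_older() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Demo on a sample line; on a real node use: exachk -v | exachk_version_from_output
installed=$(printf 'EXACHK  VERSION: 20.4.0_20201214\n' | exachk_version_from_output)
target="20.4.0"   # assumption: latest version, copied manually from the MOS note

if version_is_older "$installed" "$target"; then
    echo "upgrade needed: $installed -> $target"
else
    echo "no upgrade needed: $installed"
fi
```

Using a version-aware sort avoids the trap of plain string comparison, where 20.10.0 would sort before 20.4.0.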

To find the root directory of the AHF/Exachk installation, run:

# cat /etc/oracle.ahf.loc
/opt/oracle.ahf
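In a script, the same lookup might be wrapped as follows. This is only a sketch: the function name `ahf_root` is hypothetical, and it assumes the location file contains a single line with the install path, as in the output above.

```shell
# Print the AHF install root recorded in the location file.
# $1: path to the location file (defaults to /etc/oracle.ahf.loc)
ahf_root() {
    loc_file="${1:-/etc/oracle.ahf.loc}"
    if [ -r "$loc_file" ]; then
        # Assumption: the first line of the file is the install path.
        head -n 1 "$loc_file"
    else
        echo "AHF location file not found: $loc_file" >&2
        return 1
    fi
}

# Demo against a temporary file so the sketch runs on any machine;
# on an Exadata node, call ahf_root with no argument.
tmp_loc=$(mktemp)
printf '/opt/oracle.ahf\n' > "$tmp_loc"
echo "AHF root: $(ahf_root "$tmp_loc")"
rm -f "$tmp_loc"
```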

In addition, if the full AHF installation succeeded, Exachk should be scheduled to run the exatier1 profile daily at 02:00. You can verify the autorun configuration with:

# exachk -get all -id autostart_client_exatier1

2. Upgrade the version

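Download the latest available version; the MOS note mentioned above has the download link. At the time of writing, the latest available version is v20.4. Then unzip it: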
# unzip AHF-LINUX_v20.4.0.zip 
Archive:  AHF-LINUX_v20.4.0.zip
  inflating: README.txt              
  inflating: ahf_setup   

Run the upgrade:

# ./ahf_setup -ahf_loc /opt -data_dir <ORACLE_BASE of Grid owner>
# ./ahf_setup -ahf_loc /opt -data_dir /u01/app/grid

The actual run looks like this:

[root@dbm08dbadm01 ~]# ./ahf_setup -ahf_loc /opt -data_dir /u01/app/grid

AHF Installer for Platform Linux Architecture x86_64

AHF Installation Log : /tmp/ahf_install_204000_252391_2021_02_02-15_08_40.log

Starting Autonomous Health Framework (AHF) Installation

AHF Version: 20.4.0 Build Date: 202012141017

AHF is already installed at /opt/oracle.ahf

Installed AHF Version: 20.2.3 Build Date: 202010121848

Do you want to upgrade AHF [Y]|N : y

AHF will also be installed/upgraded on these Cluster Nodes :

1. dbm08dbadm02

The AHF Location and AHF Data Directory must exist on the above nodes
AHF Location : /opt/oracle.ahf
AHF Data Directory : /u01/app/grid/oracle.ahf/data

Do you want to install/upgrade AHF on Cluster Nodes ? [Y]|N : y

Upgrading /opt/oracle.ahf

Shutting down AHF Services
Shutting down TFA
Removed symlink /etc/systemd/system/graphical.target.wants/oracle-tfa.service.
Removed symlink /etc/systemd/system/multi-user.target.wants/oracle-tfa.service.
. . . . . 
. . . 
Successfully shutdown TFA..


Starting AHF Services
Starting TFA..
Created symlink from /etc/systemd/system/multi-user.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
Created symlink from /etc/systemd/system/graphical.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
Waiting up to 100 seconds for TFA to be started..
. . . . . 
Successfully started TFA Process..
. . . . . 
TFA Started and listening for commands
No new directories were added to TFA

INFO: Starting exachk scheduler in background. Details for the process can be found at /u01/app/grid/oracle.ahf/data/dbm08dbadm01/diag/exachk/compliance_start_020221_151142.log


AHF upgrade completed on dbm08dbadm01

Upgrading AHF on Remote Nodes :

AHF will be installed on dbm08dbadm02, Please wait.

Please Enter the password for dbm08dbadm02 : 

Is password same for all the nodes? [Y]|N : y

Upgrading AHF on dbm08dbadm02 :

[dbm08dbadm02] Copying AHF Installer

[dbm08dbadm02] Running AHF Installer

AHF is sucessfully upgraded to latest version

.--------------------------------------------------------------------.
| Host         | TFA Version | TFA Build ID         | Upgrade Status |
+--------------+-------------+----------------------+----------------+
| dbm08dbadm01 |  20.4.0.0.0 | 20400020201214101756 | UPGRADED       |
| dbm08dbadm02 |  20.4.0.0.0 | 20400020201214101756 | UPGRADED       |
'--------------+-------------+----------------------+----------------'

Moving /tmp/ahf_install_204000_252391_2021_02_02-15_08_40.log to /u01/app/grid/oracle.ahf/data/dbm08dbadm01/diag/ahf/

[root@dbm08dbadm01 ~]# 
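On clusters with many nodes, the status table at the end of the run can be checked mechanically. The sketch below assumes the console output was saved and that the table keeps the four-column layout shown above; the function name `nodes_not_upgraded` is hypothetical, and any status other than UPGRADED is simply treated as needing attention.

```shell
# Read installer output on stdin and print every host whose
# "Upgrade Status" column is anything other than UPGRADED.
nodes_not_upgraded() {
    awk -F'|' '
        NF >= 5 {
            host = $2; status = $5
            gsub(/^[ \t]+|[ \t]+$/, "", host)     # trim padding
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (host != "Host" && status != "UPGRADED") print host
        }'
}

# Demo with the table rows from the run above.
pending=$(printf '%s\n' \
    '| Host         | TFA Version | TFA Build ID         | Upgrade Status |' \
    '| dbm08dbadm01 |  20.4.0.0.0 | 20400020201214101756 | UPGRADED       |' \
    '| dbm08dbadm02 |  20.4.0.0.0 | 20400020201214101756 | UPGRADED       |' |
    nodes_not_upgraded)

if [ -z "$pending" ]; then
    echo "all nodes report UPGRADED"
else
    echo "nodes needing attention: $pending"
fi
```

Border and separator lines contain no `|` characters, so the `NF >= 5` filter skips them without special handling.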

Query the version again to confirm that the upgrade succeeded:

[root@dbm08dbadm01 ~]# exachk -v
EXACHK  VERSION: 20.4.0_20201214

[root@dbm08dbadm01 ~]# tfactl version -all
TFA Version : 204000
TFA Build ID : 20201214101756
TFA Build Label : TFA_MAIN_GENERIC_201213.1900

EXACHK  VERSION: 20.4.0_20201214


AHF VERSION: 20.4.0

3. Run the exachk health check

Run the health check with the latest version of exachk:
[root@dbm08dbadm01 ~]# which exachk
/usr/bin/exachk
[root@dbm08dbadm01 ~]# exachk
..
UPLOAD [if required] - /u01/app/grid/oracle.ahf/data/dbm08dbadm01/exachk/user_root/output/exachk_dbm08dbadm01_cdb1db1_020221_15276.zip

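Follow the interactive prompts to enter the database information. If SSH equivalence is not configured, you also need to enter the passwords for the cell nodes and switches. When collection finishes, download the zip file to your local machine to review the results, paying particular attention to critical findings.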
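To pick up the most recent result for download, a small helper can locate the newest zip in the output directory. This is a sketch under assumptions: the function name is hypothetical, and the directory layout is generalized from the single path printed in the run above (data directory /u01/app/grid, run as root), so adjust it for other environments.

```shell
# Print the most recently modified exachk_*.zip in directory $1.
# Returns non-zero when the directory holds no such file.
latest_exachk_zip() {
    newest=""
    for f in "$1"/exachk_*.zip; do
        [ -e "$f" ] || continue               # glob matched nothing
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Assumption: output lands under <data_dir>/<short hostname>/exachk/user_root/output,
# as in the UPLOAD line above.
out_dir="/u01/app/grid/oracle.ahf/data/$(uname -n | cut -d. -f1)/exachk/user_root/output"

if zip_file=$(latest_exachk_zip "$out_dir"); then
    echo "latest report: $zip_file"
else
    echo "no exachk zip found in $out_dir"
fi
```

The printed path can then be passed to scp or a similar tool to copy the report to a workstation.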