由锁等待引发的关于数据库的 automatic database maintenance functions的探讨
前两周处理了一个由所等待引发的关于DB2 数据库自动维护功能的探讨,现分享如下:
前端监控系统发来警告:
Alarm Comment: STATE ALARM OF PARAMETER PCTAPPSWAITINGONLOCKS TRIGGERED ON DB2DB_LOCK.DB2V956-RWCSTAGE <=5.26<=
注: PCTAPPSWAITINGONLOCKS 是监控软件Patrol里的一个参数,该参数监控当前数据库所有连接中处于lock-wait的连接数占总连接数的百分比。
收到alert后立即登上该server,查找所等待信息:
[db2udb]:(/usr/home/db2v956)$ db2pd -d RWCSTAGE -wlock Database Partition 0 -- Database RWCSTAGE -- Active -- Up 364 days 05:27:48 -- Date 04/29/2014 02:23:00 Locks being waited on : AppHandl [nod-index] TranHdl Lockname Type Mode Conv Sts CoorEDU AppName AuthID AppID 51024 [000-51024] 2 00000002000000000038000152 Row ..U G 24715 db2bp db2v956 *LOCAL.db2v956.140429053001 51448 [000-51448] 48 00000002000000000038000152 Row ..U W 4282 DB2HMON db2v956 *LOCAL.db2v956.140429055534 [db2udb]:(/usr/home/db2v956)$ db2pd -d RWCSTAGE -wlock -dyn -app>db2pd.log [db2udb]:(/usr/home/db2v956)$db2 get snapshot for application agentid 51024 >agent51024.log [db2udb]:(/usr/home/db2v956)$db2 get snapshot for application agentid 51448 >agent51448.log
第一眼看到了等待的agent叫DB2HMON,相信很多同学已经猜到了该进程是干啥的。由于不敢确认是否就是我们经常看到的health monitor process
所以第一时间抓了下两个agent的snapshot。 通agent51448.log,我找到了该agent的pid,
[db2udb]:(/usr/home/db2v956)$ grep "Process ID" agent51448.log
Process ID of client application = 262646
[db2udb]:(/usr/home/db2v956)$ ps -ef|grep 262646
db2v956 262646 192568 0 Feb 22 - 143:20 db2acd 0
db2v956 459250 475278 0 02:25:35 pts/3 0:00 grep 262646
于是证实了db2hmon就是有db2acd派生出来的agent,既然这样就先来了解点db2hmon的历史吧:
db2acd, autonomic computing daemon hosting the health monitor, automatic maintenance utilities, and the administrative task scheduler. This process was formerly called db2hmon
In DB2 UDB Version 8.1, the db2hmon process was controlled by the HEALTH_MON database manager configuration parameter. When HEALTH_MON was set to ON, a single-threaded independent coordinator process named db2hmon would start. This process would terminate if HEALTH_MON was set to OFF. In DB2 UDB Version 8.2, the db2hmon process is no longer controlled by the HEALTH_MON database manager configuration parameter. Rather, it is a stand-alone process that is part of the database server so when DB2 is started, the db2hmon process starts. db2hmon is a special multi-threaded DB2FMP process that is named db2hmon on UNIX/Linux platforms and DB2FMP on Windows. There are two main functions, health monitor and automatic maintenance, that run in the db2hmon process. The following outlines the activity that occurs in the db2hmon process: 1) If HEALTH_MON=ON, the health monitor starts and health indicators will be evaluated. Automatic maintenance evaluation will also occur to report the need for maintenance for any enable automatic maintenance health indicators (any of db.db_backup_req, db.tb_reorg_req, or db.tb_runstats_req). 2) If the appropriate AUTO_* database configuration parameter is set to ON, automatic maintenance evaluation will occur and automatic utilities will be run within the appropriate policy-defined maintenance windows. 3) If HEALTH_MON=OFF and all AUTO_* database configuration parameters are OFF, the db2hmon process will "wake-up" approximately every 2 hours to check for active databases. For any active database, the db2hmon process will connect, check the in-memory database configuration parameter settings for automatic maintenance, and disconnect.
Note that if the db2hmon process is manually terminated, it will automatically restart. It is a standard database server process that must always
be running.
How to stop the db2acd (db2hmon) process in DB2 Version 9.x Technote (FAQ) Question The db2acd process in DB2 Version 9.x will still run when database configuration parameters HEALTH_MON and AUTO_* are set to OFF Cause DB2® Version 9.x process db2acd is a single process that runs in parallel for both health monitor and automatic maintenance functions. When HEALTH_MON=OFF and AUTO_* database configuration parameters are set to OFF, db2acd will still wakeup approximately every 2 hours to check for active database automatic maintenance. Answer Setting registry variable DB2_FMP_COMM_HEAPSZ to 0 will prevent db2acd from starting. This means no set is created, and as a result, no fenced routines can be invoked. It also means that the health monitor and the automatic database maintenance functionality (such as automatic backups, statistics collection, and REORG) will be disabled since this functionality relies on the fenced routine infrastructure. If DB2_FMP_COMM_HEAPSZ is not (cannot be) set to 0, we cannot disable db2acd completely due to its original design. The db2 system controller (db2sysc process) will restart db2acd process if it is killed and this checking is done every 15 minutes. For more detail on setting DB2 registry variable and DB2 Database Manager Configuration, please see section "Related Information".
了解到以上信息后,看看我的环境吧:
[db2v956@db2udb ~]$ db2level DB21085I Instance "db2v956" uses "64" bits and DB2 code release "SQL09056" with level identifier "06070107". Informational tokens are "DB2 v9.5.0.6", "s100918", "IP23209", and Fix Pack "6a". Product is installed at "/opt/ibm/db2/V9.5". [db2v956@db2udb ~]$ uname -a Linux db2udb 2.6.32-131.0.15.el6.x86_64 #1 SMP Sat Nov 12 15:11:58 CST 2011 x86_64 x86_64 x86_64 GNU/Linux [db2v956@db2udb ~]$
再看实例和数据库配置:
[db2udb]:(/usr/home/db2v956)$ db2 get dbm cfg |grep -i health Monitor health of instance and databases (HEALTH_MON) = OFF [db2udb]:(/usr/home/db2v956)$ db2 get db cfg for RWCSTAGE|grep AUTO_ Auto deletion of recovery objects (AUTO_DEL_REC_OBJ) = OFF Automatic maintenance (AUTO_MAINT) = ON Automatic database backup (AUTO_DB_BACKUP) = OFF Automatic table maintenance (AUTO_TBL_MAINT) = ON Automatic runstats (AUTO_RUNSTATS) = ON Automatic statement statistics (AUTO_STMT_STATS) = OFF Automatic statistics profiling (AUTO_STATS_PROF) = OFF Automatic profile updates (AUTO_PROF_UPD) = OFF Automatic reorganization (AUTO_REORG) = OFF [db2udb]:(/usr/home/db2v956)$
明白了吧,虽然instance的health monitor关了,但数据库还在开着runstats呢。
so far, 我们已经弄明白了等待的agent是db2hmon,是由DB2的核心进程db2sysc大概每两个小时派生出来检查数据库health和自动维护的一个进程,
从上面可以看到,我的库确实开着auto_runstats,那runstats一般情况下不会被阻塞才对啊,我们去看看runstats所需要的lock资源:
RUNSTATS acquires an IX table lock on SYSTABLES and a U lock on the row for the table on which statistics are being gathered at the beginning of RUNSTATS. Operations can still read from SYSTABLES including the row with the U lock. Write operations are also possible, providing they do not occuragainst the row with the U lock. However, another reader or writer will not be able acquire an S lock on SYSTABLES because of RUNSTATS' IX lock.
再去看看持有锁的agent的snapshot里看看,很快得到:
Most recent operation = Run statistics
Most recent operation start timestamp = 04/29/2014 01:49:55.469877
再去crontab里扫描一遍:
[db2udb]:(/usr/home/db2v956)$crontab -l|grep runstat 30 01 * * * /usr/home/db2v956/scripts/runstatsall.ksh 'rwcstage' > /dev/null 2>&1
好吧,原来是数据库的自动runstats在等待我crontab里的runstats。
那解决方案就比较简单了,去掉其一就行了。(如果是初级DBA或者开发DBA,建议保留数据库的自动维护功能)
我的解决方案比较特殊,我把crontab 里的runstats脚本放到周末去执行,周一到周六用数据库的自动runstats,所以这就需要调整下数据库的在线维护窗口和runstats的policy:
先看看数据库默认的维护窗口和策略:
[db2udb]:(/usr/home/db2v956)$ db2 "call sysproc.automaint_get_policyfile( 'MAINTENANCE_WINDOW','Maint_window.xml')" Return Status = 0 [db2udb]:(/usr/home/db2v956)$ db2 "call sysproc.automaint_get_policyfile( 'AUTO_RUNSTATS','Auto_runstats.xml')" Return Status = 0 [db2udb]:(/usr/home/db2v956)$ cd sqllib/tmp [db2udb]:(/usr/home/db2v956/sqllib/tmp)$ ls -tl total 32
-rwxr-xr-x 1 db2fenc1 db2iadm1 218 May 01 21:42 Auto_runstats.xml -rwxr-xr-x 1 db2fenc1 db2iadm1 377 May 01 21:42 Maint_window.xml -rw-r--r-- 1 db2v956 db2iadm1 2243 Nov 08 2012 generate.spf [db2udb]:(/usr/home/db2v956/sqllib/tmp)$ cat Maint_window.xml <?xml version="1.0" encoding="UTF-8"?> <DB2MaintenanceWindows xmlns="http://www.ibm.com/xmlns/prod/db2/autonomic/config" > <!-- Online Maintenance Window --> <OnlineWindow Occurrence="During" startTime="00:00:00" duration="24" > <DaysOfWeek>All</DaysOfWeek> <DaysOfMonth>All</DaysOfMonth> <MonthsOfYear>All</MonthsOfYear> </OnlineWindow> </DB2MaintenanceWindows> [db2udb]:(/usr/home/db2v956/sqllib/tmp)$ cat Auto_runstats.xml <?xml version="1.0" encoding="UTF-8"?> <DB2AutoRunstatsPolicy
xmlns="http://www.ibm.com/xmlns/prod/db2/autonomic/config" > <RunstatsTableScope> <FilterCondition/> </RunstatsTableScope> </DB2AutoRunstatsPolicy> [db2udb]:(/usr/home/db2v956/sqllib/tmp)$
好吧,我让online maintenance 窗口调整为:
[db2v956@db2udb tmp]$ cat Maint_windows.xml <?xml version="1.0" encoding="UTF-8"?> <DB2MaintenanceWindows xmlns="http://www.ibm.com/xmlns/prod/db2/autonomic/config" > <!-- Online Maintenance Window --> <OnlineWindow Occurrence="During" startTime="23:00:00" duration="5"> <DaysOfWeek>Mon Tue Wed Thu Fri Sat</DaysOfWeek> <DaysOfMonth>All</DaysOfMonth> <MonthsOfYear>All</MonthsOfYear> </OnlineWindow> </DB2MaintenanceWindows> [db2v956@db2udb tmp]$ db2 "call sysproc.automaint_set_policyfile( 'MAINTENANCE_WINDOW','Maint_windows.xml')" Return Status = 0 [db2v956@db2udb tmp]$
runstatns的scope不变,crontab的脚本调到周末去运行:
[db2udb]:(/usr/home/db2v956)$crontab -l|grep runstat 30 04 * * 0 /usr/home/db2v956/scripts/runstatsall.ksh 'rwcstage' > /dev/null 2>&1
好吧,问题算是解决了。喝杯coffee去!
浙公网安备 33010602011771号