排序列表堆(SORTHEAP)等参数引起的db2diag.log不断增大一例
服务器配置了
随着业务的增长,数据的增多服务器开始出现异常情况越来越频繁,主要表现死锁的增加,于是继续调整参数,异常情况消失。运行2月以后偶出现死锁现象,于是对数据库进行监控,发现日志文件db2diag.log异常增大,大小到达
PID : 4660 TID : 5372 PROC : db2syscs.exe INSTANCE: DB2 NODE : 000 DB : TIMMS APPHDL : 0-719 APPID: GA FUNCTION: DB2 UDB, oper system services, getPrivateChunksFromOs, probe:100 CALLED : OS, -, VirtualAlloc OSERR : 8 "存储空间不足,无法处理此命令。" MESSAGE : Private memory and/or virtual address space exhausted. Or ulimit is not high enough. DATA #1 : Requested size, PD_TYPE_MEM_REQUESTED_SIZE, 4 bytes 136544256 PID : 4660 TID : 5372 PROC : db2syscs.exe INSTANCE: DB2 NODE : 000 DB : TIMMS APPHDL : 0-719 APPID: GA FUNCTION: DB2 UDB, sort/list services, sqlsAllocateSortMemory, probe:35 DATA #1 : <preformatted> Not enough memory available for a (private) sort heap of size 50000 bufSize = 136527872, bPrivatePool = 1, bPoolHasSpace = 1 Trying smaller size... PID : 4660 TID : 5372 PROC : db2syscs.exe INSTANCE: DB2 NODE : 000 DB : TIMMS APPHDL : 0-719 APPID: GA FUNCTION: DB2 UDB, oper system services, getPrivateChunksFromOs, probe:100 CALLED : OS, -, VirtualAlloc OSERR : 8 "存储空间不足,无法处理此命令。" MESSAGE : Private memory and/or virtual address space exhausted. Or ulimit is not high enough. DATA #1 : Requested size, PD_TYPE_MEM_REQUESTED_SIZE, 4 bytes 117030912 PID : 4660 TID : 5372 PROC : db2syscs.exe INSTANCE: DB2 NODE : 000 DB : TIMMS APPHDL : 0-719 APPID: GA FUNCTION: DB2 UDB, oper system services, getPrivateChunksFromOs, probe:100 CALLED : OS, -, VirtualAlloc OSERR : 8 "存储空间不足,无法处理此命令。" MESSAGE : Private memory and/or virtual address space exhausted. Or ulimit is not high enough. DATA #1 : Requested size, PD_TYPE_MEM_REQUESTED_SIZE, 4 bytes 117030912 PID : 4660 TID : 5372 PROC : db2syscs.exe INSTANCE: DB2 NODE : 000 DB : TIMMS APPHDL : 0-719 APPID: GA FUNCTION: DB2 UDB, sort/list services, sqlsAllocateSortMemory, probe:35 DATA #1 : <preformatted> Not enough memory available for a (private) sort heap of size 50000 bufSize = 117022720, bPrivatePool = 1, bPoolHasSpace = 1 Trying smaller size... |
从上述日志中,可以看到每次日志的写入数据是不一致的,表现在bufSize的值不一致。
第一次:
Not enough memory available for a (private) sort heap of size 50000 bufSize = 136527872, bPrivatePool = 1, bPoolHasSpace = 1 |
第二次:
Not enough memory available for a (private) sort heap of size 50000 bufSize = 117022720, bPrivatePool = 1, bPoolHasSpace = 1 |
从日志中记录中已经可以很直观的看到对于Sortheap大小分配存在严重问题了,于是每有一个查询、更新、插入、删除提交便会出现一次这样的警告,同时也在日志中记录,于是便出现了看到的日志文件不断增大。于是重新对所有的分配进行了一遍计算,发现留给操作系统的内存只有
继续调整参数,主要是调整缓冲池、排序堆、Sql语句堆等的值,最后剩余给操作系统的内存增大至
数据库的性能调优,就意味着必须调整参数,而参数调整就是一个双刃剑,调整没有什么固定的模式可走,只能依靠对业务、数据量的分析,不断的调整、观察、再调整。每个业务系统的参数调整都很难达到最优化配置,不过能满足业务需求足矣,特别对于业务较大的系统。
对于DB2的参数调整时我们可以参考DB2内存模型的文章,在此文中已经对DB2的整个内存模型做了全面的介绍,同时在这边文章中也可以找到对于上述描述的日志文件中的错误的记录,同时也可以很清楚的了解到上面所提到的bufSize的值不断变化的原因所在,作为数据库共享内存的一个组成部分,而数据库共享内存的剩余值也会随操作SQL的变化而变化,在文章中对于这个部分作了很详细的描述:
Sorting is required when no index satisfies the requested ordering of fetched rows, or the optimizer determines that a sort is less expensive than an index scan. There are two kinds of sorts in DB2, private sorts and shared sorts. Private sorts take place in an agent's private agent memory (which we will discuss in the next section); shared shorts take place in the database's database shared memory. For private sorts, the database manager configuration parameter sheapthres specifies an instance-wide soft limit on the total amount of memory that can be consumed by private sorts at any given time. When the total private-sort memory consumption for an instance reaches this limit, the memory allocated for additional incoming private-sort requests will be considerably reduced. You will see the following message in the db2diag.log: "Not enough memory available for a (private) sort heap of size size of sortheap. Trying smaller size..." If you enable intra-partition parallelism or concentrator, DB2 may choose to perform a shared sort if it determines that method to be more efficient than a private sort. If a shared sort is performed, the sort heap for the sort is allocated in database shared memory. The maximum amount of memory used for shared sorts is controlled by the sheapthres_shr database parameter. This is a database-wide hard limit on the total amount of memory consumed by shared sorts at any given time. When this limit is reached, the application requesting the sort will receive the error SQL0955 (rc2). No further shared-sort memory requests will be allowed until the total shared-sort memory consumption falls below the limit specified by sheapthres_shr. The following formula calculates approximately how much memory the database shared memory set requires: Database shared memory = (Main bufferpools + 4 hidden bufferpools + database heap + utility heap + locklist + package cache + catalog cache) + (number of estore pages * 100 bytes) + approx. 10% overhead For databases with intra_parallel enabled or concentrator enable, shared sort memory must be pre-allocated as part of the database shared memory, thus the formula becomes: Database shared memory = (Main bufferpools + 4 hidden bufferpools + database heap + utility heap + locklist + package cache + catalog cache + sheapthres_shr) + (number of estore pages * 100 bytes) + approx. 10% overhead |
几年前曾使用DB2自带的自动调整配置程序,期间因为死锁导致系统经常业务系统当机不断的事情发生,也咨询了几次IBM的工程师,不过都没有得到很好的答案,对于置身于开发、调试环境的人才能更好的知道当前的业务系统所需要的,当然如果有专业的数据库开发和管理人员那是最好,不过很多公司也未必有这样专业的人员去做这些事情。经验就是在不断的失败中得到的。
好久没写了,一向对于文字的组织也不太擅长,所以就凑合看看吧。
请尊重原创。明天我是否还能坚持......................