lock_sga这个雷子炸了

接通知某系统1号机宕机,且尝试重启5分钟又再次宕机
通过VPN登录查看数据库的告警日志
 
 1 Fri Feb  8 20:53:19 2013
 2 Errors in file /oracle/db/admin/ahsmdb/bdump/ahsmdb1_lmon_495646.trc:
 3 ORA-27091: unable to queue I/O
 4 ORA-27072: File I/O error
 5 IBM AIX RISC System/6000 Error: 12: Not enough space
 6 Additional information: 7
 7 Additional information: 2224224
 8 Additional information: -1
 9 Fri Feb  8 20:53:19 2013
10 Errors in file /oracle/db/admin/ahsmdb/bdump/ahsmdb1_lmon_495646.trc:
11 ORA-00202: control file: '+DATA/ahsmdb/controlfile/current.297.702063033'
12 ORA-15080: synchronous I/O operation to a disk failed
13 Fri Feb  8 20:53:21 2013
14 Errors in file /oracle/db/admin/ahsmdb/bdump/ahsmdb1_ckpt_552974.trc:
15 ORA-00206: error in writing (block 3, # blocks 1) of control file
16 ORA-00202: control file: '+DATA/ahsmdb/controlfile/current.296.702063033'
17 ORA-27070: async read/write failed
18 ORA-00206: error in writing (block 3, # blocks 1) of control file
19 ORA-00202: control file: '+DATA/ahsmdb/controlfile/current.297.702063033'
20 ORA-27070: async read/write failed
21 Fri Feb  8 20:53:22 2013
22 Errors in file /oracle/db/admin/ahsmdb/bdump/ahsmdb1_ckpt_552974.trc:
23 ORA-00221: error on write to control file
24 ORA-00206: error in writing (block 3, # blocks 1) of control file
25 ORA-00202: control file: '+DATA/ahsmdb/controlfile/current.296.702063033'
26 ORA-27070: async read/write failed
27 ORA-00206: error in writing (block 3, # blocks 1) of control file
28 ORA-00202: control file: '+DATA/ahsmdb/controlfile/current.297.702063033'
29 ORA-27070: async read/write failed
30 Fri Feb  8 20:53:22 2013
31 CKPT: terminating instance due to error 221
 
从告警日志查看刚开始判断是盘阵的空间不足导致,通过一下命令可以查看ASM管理的盘阵使用情况:
1 $ ps -ef|grep pmon
2   oracle  249868       1   0   Jan 22      -  3:56 asm_pmon_+ASM1
3   oracle 1618094       1   0 15:15:24      -  0:00 ora_pmon_ahsmdb1
4   oracle 1622138 1646698   0 15:42:37  pts/1  0:00 grep pmon
5 $ export ORACLE_SID=+ASM1
6 $ asmcmd
7 ASMCMD> lsdg
8 State           Type    Rebal  Unbal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Name
9 MOUNTED  EXTERN  N      N         512   4096  1048576    409600   290397                0                           290397                  0        DATA/
 
可以查看到盘阵的剩余空间足够,排除该原因之后再看内存使用情况
将数据库起起来后,通过topas观察内存使用变化发现,物理内存使用率慢慢涨到99%,交换区涨到10之后库就挂掉了
所以觉得很奇怪,交换区没有使用完,不应该是因为内存不足原因导致数据库宕机,此时思路陷入僵局。
继续观察多次起库的后台日志,发现在第二次重启库宕机之前有个报错如下:
Sat Feb  9 13:09:48 2013
Errors in file /oracle/db/admin/ahsmdb/bdump/ahsmdb1_psp0_495796.trc:
ORA-27300: OS system dependent operation:fork failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgpspawn5
ORA-27303: additional information: skgpspawn5
该报错是由于申请交换区失败导致,但之前查看交换区使用率是正常的。
 
扭头查看2号机的数据库参数和内存使用情况,一切正常,但sga的一个属性lock_sga=true引起了我的注意
这个属性的意思是想把sga内存锁在物理内存中,这样可以保证数据库缓存的数据都是用的物理内存,而不用交换空间,这样能大大提高数据库的速度,
但这也同样存在一个风险,就是物理内存不足的时候,原有的进程想继续申请物理内存的时候就会报错,提示Not enough space,然后再尝试把1号机数据库起起来看下属性,发现同样是true,再结合之前起库之后不就,物理内存的使用率是99%,也就说明问题了
通过修改pfile,将sga和pga都改小后,重启数据库,观察正常。
 
思:lock_sga=true这个属性只有在物理内存比较充裕的时候才能设,慎用!
posted @ 2017-10-09 09:23  tonnytangy  阅读(1104)  评论(0)    收藏  举报