min_free_kbytes not configured: the entire operating system hangs while the kernel reclaims page cache
1. Incident Overview
A customer runs a three-node RAC database, and the operating system of the third compute node rebooted on its own. The customer wanted to know why. This document analyzes the incident end to end and gives the corresponding recommendations.
2. Fault Analysis
2.1 Analysis of the database alert log. Everything logged before 18:25 is normal output; after that point the log records abnormal messages.
2025-11-03T18:25:10.522108+08:00
requested by (instance=3, osid=694963 (DIA0)), summary=[process state dump request (kjznHA)]
requested by (instance=3, osid=694963 (DIA0)), summary=[process state dump request (kjznHA)]
2025-11-03T18:25:22.199848+08:00
kkjcre1p: unable to spawn jobq slave process
2025-11-03T18:25:22.200027+08:00
Errors in file /oracle/app/oracle/diag/rdbms/emssdb3dg/emssdb33/trace/emssdb33_cjq0_730024.trc:
2025-11-03T18:25:41.687693+08:00
requested by (instance=3, osid=694963 (DIA0)), summary=[process state dump request (kjznHA)]
requested by (instance=3, osid=694963 (DIA0)), summary=[process state dump request (kjznHA)]
2025-11-03T18:25:52.400169+08:00
kkjcre1p: unable to spawn jobq slave process
... (omitted)
2025-11-03T18:31:50.139128+08:00
Errors in file /oracle/app/oracle/diag/rdbms/emssdb3dg/emssdb33/trace/emssdb33_dia0_694963.trc (incident=832169):
ORA-32701: Possible hangs up to hang ID=10 detected
Incident details in: /oracle/app/oracle/diag/rdbms/emssdb3dg/emssdb33/incident/incdir_832169/emssdb33_dia0_694963_i832169.trc
2025-11-03T18:33:05.609713+08:00
Instance termination initiated by instance 1 with reason 1.
Instance 1 received a reconfig event from its cluster manager indicating that this instance is supposed to be down
Please check instance 1's alert log and LMON trace file for more details.
Please also examine the CSS log files.
LMON (ospid: 694965): terminating the instance due to ORA error 481
Cause - 'Instance is being terminated by instance 1 (reason 1 inst 3 uid 0x2e5420c)'
2025-11-03T18:33:05.797059+08:00
opiodr aborting process unknown ospid (525710) as a result of ORA-1092
2025-11-03T18:33:05.798367+08:00
Errors in file /oracle/app/oracle/diag/rdbms/emssdb3dg/emssdb33/trace/emssdb33_ora_525710.trc:
ORA-01092: ORACLE instance terminated. Disconnection forced
2025-11-03T18:33:06.039309+08:00
Errors in file /oracle/app/oracle/diag/rdbms/emssdb3dg/emssdb33/trace/emssdb33_diag_694939.trc:
ORA-27508: IPC error sending a message
... (omitted)
2025-11-03T18:58:09.068188+08:00
Starting ORACLE instance (normal) (OS id: 527122)
The database log shows that at 18:25 the database was already suffering performance problems and node 3 as a whole was hung. Instance 1 then evicted the node 3 instance.
Because the operating system itself was almost completely hung, the instance eviction failed and was escalated to a reboot of the whole operating system.
The message "kkjcre1p: unable to spawn jobq slave process" means the operating system could not create new processes, which is usually tied to a shortage of physical memory.
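As a quick cross-check (a sketch only: the alert log name follows the usual alert_<SID>.log convention and is an assumption, and the meminfo/sar commands show the state at the time they are run, not at 18:25), the spawn failures can be correlated with how hard the kernel is working to find free memory:

# Count the spawn failures (alert log name assumed from the instance name emssdb33)
grep -c "unable to spawn jobq slave process" \
  /oracle/app/oracle/diag/rdbms/emssdb3dg/emssdb33/trace/alert_emssdb33.log

# How much memory the kernel currently considers free or reclaimable
grep -E "^(MemFree|MemAvailable|Cached)" /proc/meminfo

# Page-reclaim pressure: pgscank/s, pgscand/s and pgsteal/s rise sharply when the kernel is scavenging
sar -B 1 5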
2.2 Inspect the operating system logs.
Nov 3 18:26:58 yyjf3 systemd-udevd[755908]: timeout '/usr/lib/udev/scsi_id -g -u -d /dev/dm-183'
Nov 3 18:26:58 yyjf3 systemd-udevd[754210]: timeout '/usr/lib/udev/scsi_id -g -u -d /dev/dm-62'
Nov 3 18:26:58 yyjf3 systemd-udevd[755908]: slow: '/usr/lib/udev/scsi_id -g -u -d /dev/dm-183' [856664]
Nov 3 18:26:58 yyjf3 systemd-udevd[754210]: slow: '/usr/lib/udev/scsi_id -g -u -d /dev/dm-62' [856657]
Nov 3 18:26:58 yyjf3 systemd-udevd[755874]: timeout '/usr/lib/udev/scsi_id -g -u -d /dev/dm-181'
Nov 3 18:26:58 yyjf3 systemd-udevd[755874]: slow: '/usr/lib/udev/scsi_id -g -u -d /dev/dm-181' [856641]
Nov 3 18:26:58 yyjf3 systemd-udevd[756122]: timeout '/usr/lib/udev/scsi_id -g -u -d /dev/dm-163'
Nov 3 18:26:58 yyjf3 systemd-udevd[756122]: slow: '/usr/lib/udev/scsi_id -g -u -d /dev/dm-163' [856645]
Nov 3 18:26:58 yyjf3 systemd-udevd[756013]: timeout '/usr/lib/udev/scsi_id -g -u -d /dev/dm-128'
Nov 3 18:26:58 yyjf3 systemd-udevd[756013]: slow: '/usr/lib/udev/scsi_id -g -u -d /dev/dm-128' [856655]
Nov 3 18:26:58 yyjf3 systemd-udevd[756566]: timeout '/usr/lib/udev/scsi_id -g -u -d /dev/dm-174'
Nov 3 18:26:58 yyjf3 systemd-udevd[756566]: slow: '/usr/lib/udev/scsi_id -g -u -d /dev/dm-174' [856650]
Nov 3 18:26:58 yyjf3 systemd-udevd[756746]: timeout '/usr/lib/udev/scsi_id -g -u -d /dev/dm-171'
... (omitted)
Nov 3 18:27:41 yyjf3 kernel: INFO: task systemd-udevd:32363 blocked for more than 120 seconds.
Nov 3 18:27:41 yyjf3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 3 18:27:41 yyjf3 kernel: systemd-udevd D ffff981c1c59b0c0 0 32363 1 0x00000000
Nov 3 18:27:41 yyjf3 kernel: Call Trace:
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8f167bc9>] schedule+0x29/0x70
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8f1691fd>] rwsem_down_read_failed+0x10d/0x1a0
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8ed869a8>] call_rwsem_down_read_failed+0x18/0x30
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8f166e80>] down_read+0x20/0x40
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8ebf8824>] page_lock_anon_vma_read+0xa4/0x110
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8ebf8a42>] rmap_walk+0x1b2/0x360
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8ebf8e5b>] try_to_unmap+0x8b/0xe0
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8ebf70e0>] ? page_remove_rmap+0x160/0x160
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8ebf6250>] ? invalid_migration_vma+0x30/0x30
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8ebf7930>] ? try_to_unmap_one+0x850/0x850
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8ebf8780>] ? page_get_anon_vma+0xa0/0xa0
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8ebf6220>] ? invalid_mkclean_vma+0x20/0x20
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8ec2469f>] migrate_pages+0x43f/0x790
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8ec21360>] ? migrate_vma_collect.constprop.53+0xe0/0xe0
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8ec2534c>] migrate_misplaced_page+0xcc/0x100
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8ebe8d8d>] do_numa_page+0x19d/0x250
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8ebe9156>] handle_pte_fault+0x316/0xd10
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8ea7a5eb>] ? flush_tlb_mm_range+0x4b/0x140
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8ebf404d>] ? change_protection+0x5cd/0x680
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8ebebc6d>] handle_mm_fault+0x39d/0x9b0
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8f16f5e3>] __do_page_fault+0x203/0x500
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8f16f915>] do_page_fault+0x35/0x90
Nov 3 18:27:41 yyjf3 kernel: [<ffffffff8f16b758>] page_fault+0x28/0x30
... (omitted)
Nov 3 18:32:44 yyjf3 systemd-udevd[756213]: slow: '/usr/lib/udev/scsi_id -g -u -d /dev/dm-90' [856638]
Nov 3 18:32:44 yyjf3 systemd-udevd[754223]: timeout: killing '/usr/lib/udev/scsi_id -g -u -d /dev/dm-134' [856634]
Nov 3 18:32:44 yyjf3 systemd-udevd[754223]: slow: '/usr/lib/udev/scsi_id -g -u -d /dev/dm-134' [856634]
Nov 4 02:42:48 yyjf3 kernel: Initializing cgroup subsys cpuset
Nov 4 02:42:48 yyjf3 kernel: Initializing cgroup subsys cpu
Nov 4 02:42:48 yyjf3 kernel: Initializing cgroup subsys cpuacct
Nov 4 02:42:48 yyjf3 kernel: Linux version 3.10.0-957.el7.x86_64 (mockbuild@x86-040.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Thu Oct 4 20:48:51 UTC 2018
Nov 4 02:42:48 yyjf3 kernel: Command line: BOOT_IMAGE=/vmlinuz-3.10.0-957.el7.x86_64 root=/dev/mapper/rhel-root ro crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet LANG=en_US.UTF-8
Nov 4 02:42:48 yyjf3 kernel: e820: BIOS-provided physical RAM map:
Nov 4 02:42:48 yyjf3 kernel: BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable
Nov 4 02:42:48 yyjf3 kernel: BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved
Nov 4 02:42:48 yyjf3 kernel: BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
Nov 4 02:42:48 yyjf3 kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000506c6fff] usable
Nov 4 02:42:48 yyjf3 kernel: BIOS-e820: [mem 0x00000000506c7000-0x0000000050cc6fff] ACPI NVS
Starting at 18:26 the operating system log turns abnormal: the udev workers time out, and then the kernel itself starts reporting errors. The kernel stack traces show processes stuck while the kernel was migrating pages and reclaiming page cache to free physical memory across the NUMA nodes.
After 18:32 the operating system rebooted automatically.
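The kernel settings behind these messages can be checked directly; a minimal sketch (the commands only read current values, and the comments give the usual RHEL 7 defaults rather than values captured from this host):

# Hung-task detector that produced the "blocked for more than 120 seconds" messages
cat /proc/sys/kernel/hung_task_timeout_secs    # typically 120 on RHEL 7

# Automatic NUMA balancing drives do_numa_page()/migrate_misplaced_page() in the stack above
cat /proc/sys/kernel/numa_balancing 2>/dev/null    # 1 = enabled (present when compiled in)

# Physical NUMA layout and per-node free memory
numactl --hardware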
2.3 Analysis of the OSWatcher (OSW) logs.
vmstat output:

zzz ***Mon Nov 3 18:24:16 CST 2025
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
Linux OSWbb v8.3.2
yyjf3
SNAP_INTERVAL 30
CPU_CORES 192
VCPUS 384
OSWBB_ARCHIVE_DEST /oracle/app/grid/oracle.ahf/data/repository/suptools/yyjf3/oswbb/grid/archive
zzz ***Mon Nov 3 18:59:47 CST 2025
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
20  1  0 289175232 142892 11381036  0  0    20     9     12     74  0  1 98  0  0
 9  0  0 289107392 142892 11403364  0  0 11197  4064 120554  71918  1  1 98  0  0
 9  0  0 289144736 142892 11409940  0  0 11987  3769 116569  77138  1  1 98  0  0

top output:

zzz ***Mon Nov 3 18:24:16 CST 2025
Linux OSWbb v8.3.2
zzz ***Mon Nov 3 18:59:47 CST 2025
Tasks: 5114 total, 6 running, 5106 sleeping, 0 stopped, 2 zombie
%Cpu(s): 0.8 us, 1.1 sy, 0.0 ni, 98.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 79089049+total, 28912688+free, 49021488+used, 11548732 buff/cache
KiB Swap: 33554428 total, 33554428 free, 0 used. 29153315+avail Mem

meminfo output:

zzz ***Mon Nov 3 18:24:16 CST 2025
MemTotal:       790890500 kB
MemFree:          4387368 kB
MemAvailable:   203794060 kB
Buffers:           190616 kB
Cached:         205886336 kB
SwapCached:         31592 kB
Active:          97362036 kB
Inactive:       178229964 kB
Active(anon):    79410092 kB
Inactive(anon):  10386824 kB
Active(file):    17951944 kB
Inactive(file): 167843140 kB
Unevictable:       746344 kB
Mlocked:           746396 kB
SwapTotal:       33554428 kB
SwapFree:        33120760 kB
Dirty:             711908 kB
Writeback:              0 kB
AnonPages:       71154144 kB
Mapped:          11535996 kB
Shmem:           19222396 kB
Slab:            22168812 kB
SReclaimable:    14903508 kB
SUnreclaim:       7265304 kB
KernelStack:       171024 kB
PageTables:       6047376 kB
NFS_Unstable:           0 kB
Bounce:                 0 kB
WritebackTmp:           0 kB
CommitLimit:    193070076 kB
Committed_AS:   118668056 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      3085580 kB
VmallocChunk:   33652245144 kB
HardwareCorrupted:      0 kB
AnonHugePages:    3796992 kB
CmaTotal:               0 kB
CmaFree:                0 kB
HugePages_Total:   230400
HugePages_Free:       238
HugePages_Rsvd:         0
HugePages_Surp:         0
Hugepagesize:        2048 kB
DirectMap4k:      1660940 kB
DirectMap2M:     33619968 kB
DirectMap1G:    770703360 kB

free command output:

              total        used        free      shared  buff/cache   available
Mem:      790890496   487423984      738632      932572   302727880   299187488
Swap:      33554428         264    33554164
The OSW logs show that at 18:24:16 the operating system was already hung: commands such as vmstat and top could no longer complete successfully.
The meminfo output shows that shortly before the failure node 3 had only about 4 GB of free physical memory and roughly 200 GB of file cache. About 450 GB of HugePages are configured, of which only about 0.5 GB is still free, which means the database instance is indeed using HugePages.
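A minimal sketch of the arithmetic behind those HugePages figures, taken straight from the meminfo snapshot above:

# HugePages_Total = 230400, HugePages_Free = 238, Hugepagesize = 2048 kB
echo "$((230400 * 2048 / 1024 / 1024)) GB reserved for HugePages"   # 450 GB
echo "$((238 * 2048 / 1024)) MB of that still unused"               # ~476 MB, i.e. ~0.5 GB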
The free output shows that at the worst point the whole system had only about 700 MB of truly free memory, while the file cache occupied roughly 290 GB.
2.4 Review the operating system kernel settings
2.4.1 The sysctl.conf file
kernel.shmall = 2251799813685247
kernel.shmmax = 335082307584
fs.file-max = 6815744
kernel.sem = 250 32000 100 128
kernel.shmmni = 4096
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
fs.aio-max-nr = 6048576
net.ipv4.ip_local_port_range = 9000 65500
net.ipv4.conf.ib3.rp_filter=2
net.ipv4.conf.ib1.rp_filter=2
kernel.core_pattern = /tmp/cores/core.%e.%p.%h.%t
vm.nr_hugepages = 230400
net.ipv4.ip_default_ttl = 128
2.4.2 The limits.conf file
*      soft    memlock    unlimited
*      hard    memlock    unlimited

grid   soft    nofile     1024
grid   hard    nofile     65536
grid   soft    nproc      40470
grid   hard    nproc      66384
grid   soft    stack      10240
grid   hard    stack      32768

oracle soft    nofile     1024
oracle hard    nofile     65536
oracle soft    nproc      40470
oracle hard    nproc      66384
oracle soft    stack      10240
oracle hard    stack      32768

oracle hard    memlock    263352660
oracle soft    memlock    263497900

grid   hard    memlock    263352660
grid   soft    memlock    263497900

*      soft    nofile     65536
*      hard    nofile     65536
*      soft    core       0
*      hard    core       0
The kernel reserves about 450 GB of HugePages, yet memlock only allows roughly 251 GB of memory to be locked. This can prevent the Oracle instance from locking enough memory to back the whole SGA with HugePages, forcing part of it onto standard 4 kB pages and effectively wasting the reserved HugePages memory.
Judging from the actual state of the system (the cat /proc/meminfo output above), the configured HugePages are not currently being wasted. Even so, it is still strongly recommended to correct this configuration, as sketched below.
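A sketch of the mismatch and one way to size memlock (the suggested value is an assumption, chosen simply to cover the whole HugePages pool; adapt it to site standards):

# HugePages pool: 230400 pages x 2048 kB each
echo $((230400 * 2048))    # 471859200 kB, ~450 GB

# memlock in limits.conf is 263352660 kB (~251 GB), i.e. smaller than the pool.
# Example limits.conf entries sized to cover the pool (assumed values):
#   oracle soft memlock 471859200
#   oracle hard memlock 471859200
#   grid   soft memlock 471859200
#   grid   hard memlock 471859200

# After logging in again as oracle/grid, verify the effective limit:
ulimit -l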
kernel.shmall: the total amount of shared memory, in pages, available system-wide. The usual formula is physical memory size (in bytes) divided by the page size (4096 bytes).
kernel.shmmax: the maximum size of a single shared memory segment. It is currently set to roughly 330 GB, while the database SGA is 450 GB, so the SGA can only be carved into several shared memory segments, which may hurt performance. We recommend setting this value slightly larger than the SGA, as in the sketch below.
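A hedged sketch of the arithmetic, using MemTotal = 790890500 kB from meminfo and the 450 GB SGA (the 500 GB shmmax is an example value, not a prescription):

# kernel.shmall: physical RAM in bytes divided by the 4096-byte page size
echo $((790890500 * 1024 / 4096))    # 197722625 pages

# kernel.shmmax: a single segment slightly larger than the 450 GB SGA, e.g. 500 GB
echo $((500 * 1024 * 1024 * 1024))   # 536870912000 bytes

# Example /etc/sysctl.conf entries (assumed values):
#   kernel.shmall = 197722625
#   kernel.shmmax = 536870912000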
2.5 Check the vm.min_free_kbytes parameter
# sysctl -a |grep min_free_kbytes
vm.min_free_kbytes = 225280
This means the kernel parameter is still at its system default value.
What this parameter does: it sets the minimum amount of free memory (in kB) the system keeps in reserve, so that allocation requests can still be satisfied under memory pressure and out-of-memory (OOM) situations are avoided. When free memory drops below this threshold, the kernel starts aggressively reclaiming file cache and other reclaimable memory to free up physical pages and keep the system running. If the value is set too small, the system may be unable to reclaim memory fast enough under pressure, which can end in exactly the kind of system-wide hang seen here.
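A minimal sketch of raising the parameter (the 4 GB figure is purely illustrative; the right value for this ~790 GB host should be agreed with the OS vendor):

# Current value (this host shows the default, 225280 kB, about 220 MB)
sysctl vm.min_free_kbytes

# Raise it online, e.g. to 4 GB (example value only)
sysctl -w vm.min_free_kbytes=4194304

# Persist the change across reboots
echo "vm.min_free_kbytes = 4194304" >> /etc/sysctl.conf
sysctl -p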
2.6 Comparing the operating system and database instance configuration of the three compute nodes shows that they are identical, so why does only node 3 keep hitting this failure? The reason is that node 3 carries the backup workload: RMAN backs the database up to a local file system. Comparing the OSW logs of the three nodes shows that the file cache on node 3 is far larger than on the other two nodes.
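One way to make that comparison repeatable is to pull the Cached: values out of each node's OSWatcher meminfo archive. A sketch only: the archive root comes from the OSWBB_ARCHIVE_DEST shown above, but the oswmeminfo subdirectory and *.dat naming are assumptions about the OSW layout.

# Peak file-cache values captured on this node (run the same on nodes 1 and 2 and compare)
grep -h "^Cached:" \
  /oracle/app/grid/oracle.ahf/data/repository/suptools/yyjf3/oswbb/grid/archive/oswmeminfo/*.dat \
  | sort -n -k2 | tail -3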
2.7 Putting the whole incident together:
Because RMAN backups routinely run on node 3, a large share of its physical memory was taken up by file cache. Around 18:24 the free physical memory on node 3 ran out just as the workload got busy and processes needed more memory, so the kernel was forced to reclaim the memory held by the file cache. That reclaim was slow, the kernel itself got into trouble, and in the end the entire operating system hung.
3. Recommendations
Adjust the vm.min_free_kbytes kernel parameter so that the operating system always keeps enough free memory in reserve.
Adjust the HugePages settings.
Adjust the kernel.shmall and kernel.shmmax shared memory settings (a consolidated sketch of all three recommendations follows this list).
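A consolidated sketch of the recommended settings; every number below is an example derived from the analysis above and needs to be validated against site standards and Oracle/OS vendor guidance before use:

# /etc/sysctl.conf (example values)
#   vm.min_free_kbytes = 4194304       # keep ~4 GB of memory free for the kernel
#   kernel.shmall      = 197722625     # physical RAM in bytes / 4096
#   kernel.shmmax      = 536870912000  # ~500 GB, slightly above the 450 GB SGA
# Apply with:
sysctl -p

# /etc/security/limits.conf (example values, memlock >= HugePages pool of 471859200 kB)
#   oracle soft memlock 471859200
#   oracle hard memlock 471859200
#   grid   soft memlock 471859200
#   grid   hard memlock 471859200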