[20260427]Questions about using hugepages with pre_page_sga=true in 21c.txt

--//This question about hugepage usage came up a few days ago while I was testing the use_large_pages parameter.

1.Environment:
SYS@book> @ ver2
==============================
PORT_STRING                   : x86_64/Linux 2.4.xx
VERSION                       : 21.0.0.0.0
BANNER                        : Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
BANNER_FULL                   : Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
Version 21.3.0.0.0
BANNER_LEGACY                 : Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
CON_ID                        : 0
PL/SQL procedure successfully completed.

2.The question:
SYS@book> @ hidez pre_page_sga|^use_large_pages
NUM N_HEX CON_ID NAME            DESCRIPTION                                   DEFAULT_VALUE SESSION_VALUE SYSTEM_VALUE ISSES ISSYS_MOD
--- ----- ------ --------------- --------------------------------------------- ------------- ------------- ------------ ----- ---------
180    B4      0 use_large_pages Use large pages if available  TRUE/FALSE/ONLY FALSE         ONLY          ONLY         FALSE FALSE
193    C1      0 pre_page_sga    pre-page sga for process                      TRUE          TRUE          TRUE         FALSE FALSE
--//With pre_page_sga=true and use_large_pages=only, I used to think that starting the instance in this mode touches all hugepages.
--//In 21c, the actual behaviour is different:

$ grep -i hugepages /proc/meminfo
AnonHugePages:     40960 kB
HugePages_Total:     530
HugePages_Free:        7
HugePages_Rsvd:        7
HugePages_Surp:        0
Hugepagesize:       2048 kB
--//HugePages_Rsvd=7, so 7 hugepages have still not been touched. Why?
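--//Here is a small Python sketch for reading these counters. It relies on the semantics described in the Linux kernel's hugetlbpage
--//documentation: HugePages_Free still includes reserved pages, and HugePages_Rsvd counts pages that are committed to a mapping but
--//not yet faulted in. The 21c output above is hard-coded as a sample string.

```python
# Sketch: split the HugePages_* counters from /proc/meminfo into pages that
# have been faulted in (touched) and pages that are only reserved.
# Assumes kernel semantics: HugePages_Free still includes reserved pages,
# and HugePages_Rsvd counts reserved-but-not-yet-faulted pages.

SAMPLE_21C = """\
AnonHugePages:     40960 kB
HugePages_Total:     530
HugePages_Free:        7
HugePages_Rsvd:        7
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""


def parse_hugepages(meminfo_text):
    """Return {counter_name: value} for every HugePages_* line."""
    counters = {}
    for line in meminfo_text.splitlines():
        if line.startswith("HugePages_"):
            name, value = line.split(":", 1)
            counters[name] = int(value.split()[0])
    return counters


def summarize(counters):
    """Derive touched / reserved / committed page counts."""
    total = counters["HugePages_Total"]
    free = counters["HugePages_Free"]
    rsvd = counters["HugePages_Rsvd"]
    touched = total - free            # pages actually faulted in
    return {
        "touched": touched,
        "reserved_untouched": rsvd,
        "committed": touched + rsvd,  # touched plus still-reserved pages
    }


if __name__ == "__main__":
    # On a live Linux box use: text = open("/proc/meminfo").read()
    print(summarize(parse_hugepages(SAMPLE_21C)))
```

--//On the 21c sample, this reports 523 touched pages, 7 reserved-but-untouched pages and 530 committed pages.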

--//11g does not show this behaviour:
SYS@book> @ ver1
PORT_STRING                    VERSION        BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx            11.2.0.4.0     Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

SYS@book> alter system set pre_page_sga=true scope=spfile;
System altered.

SYS@book> shutdown immediate ;
Database closed.
Database dismounted.
ORACLE instance shut down.

SYS@book> startup
ORACLE instance started.
Total System Global Area  801701888 bytes
Fixed Size                  2257520 bytes
Variable Size             285216144 bytes
Database Buffers          507510784 bytes
Redo Buffers                6717440 bytes
Database mounted.
Database opened.

$ grep -i hugepages /proc/meminfo
AnonHugePages:     16384 kB
HugePages_Total:     385
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

$ ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 163840     oracle     640        12582912   24
0x00000000 196609     oracle     640        792723456  24
0x56c108b0 229378     oracle     640        2097152    24

$ pgrep -a oracle  | grep pmon
3235 ora_pmon_book

--//ipcs -m does not show where each shared memory segment starts and ends, so check the addresses another way:
$ cat /proc/3235/maps | grep "rw-s"
60000000-60c00000 rw-s 00000000 00:0c 163840                             /SYSV00000000 (deleted)
60c00000-90000000 rw-s 00000000 00:0c 196609                             /SYSV00000000 (deleted)
90000000-90200000 rw-s 00000000 00:0c 229378                             /SYSV56c108b0 (deleted)
7f0c6dc9d000-7f0c6dc9e000 rw-s 00000000 08:11 35114806                   /u01/app/oracle/product/11.2.0.4/db_1/dbs/hc_book.dat
7f0c6dc9e000-7f0c6dca0000 rw-s 00000000 00:0a 22772                      /[aio] (deleted)
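--//The 3 shared memory segments are indeed contiguous, so there are no gaps between them.
--//All hugepages are allocated. This is completely different from 21c, where a small amount of memory always stays unallocated.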
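--//The contiguity check can be written as a Python sketch. It keeps only the /SYSV... lines of /proc/<pid>/maps and tests whether
--//each segment starts exactly where the previous one ends. The samples are the 11g output above and the 21c output further down,
--//with the whitespace shortened.

```python
# Sketch: are the SysV shared memory mappings of a process contiguous?
# Input is text in the format of /proc/<pid>/maps (copied from the post).

MAPS_11G = """\
60000000-60c00000 rw-s 00000000 00:0c 163840  /SYSV00000000 (deleted)
60c00000-90000000 rw-s 00000000 00:0c 196609  /SYSV00000000 (deleted)
90000000-90200000 rw-s 00000000 00:0c 229378  /SYSV56c108b0 (deleted)
7f0c6dc9e000-7f0c6dca0000 rw-s 00000000 00:0a 22772  /[aio] (deleted)
"""

MAPS_21C = """\
60000000-60a00000 rw-s 00000000 00:0c 0      /SYSV00000000 (deleted)
61000000-a2000000 rw-s 00000000 00:0c 32769  /SYSV00000000 (deleted)
a2000000-a2800000 rw-s 00000000 00:0c 65538  /SYSV00000000 (deleted)
a3000000-a3200000 rw-s 00000000 00:0c 98307  /SYSVafa94c20 (deleted)
"""


def sysv_segments(maps_text):
    """Return sorted (start, end) pairs of mappings whose path is /SYSV..."""
    segments = []
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[5].startswith("/SYSV"):
            start, end = (int(x, 16) for x in fields[0].split("-"))
            segments.append((start, end))
    return sorted(segments)


def is_contiguous(segments):
    """True if every segment starts exactly where the previous one ends."""
    return all(prev[1] == nxt[0] for prev, nxt in zip(segments, segments[1:]))


if __name__ == "__main__":
    print("11g contiguous:", is_contiguous(sysv_segments(MAPS_11G)))
    print("21c contiguous:", is_contiguous(sysv_segments(MAPS_21C)))
```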

--//I asked Kimi. After some step-by-step prompting, it gave this answer:

The real reason: 12c+ allocates the SGA shared memory segments differently

Starting with Oracle 11.2.0.3 / 12c, Oracle changed how it allocates shared memory for the SGA so that HugePages can be used more flexibly:

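11g: the SGA is usually allocated as a single contiguous shared memory segment. With pre_page_sga=true the whole segment is touched in one pass at startup and every page gets mapped, so HugePages_Rsvd is 0.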
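12c+: the SGA is split into multiple shared memory segments. ipcs -m shows several of them, typically one main segment plus a
smaller auxiliary one. Each segment is aligned independently on a HugePage (2MB) boundary, which leaves a few gap pages between
segments. The kernel has already reserved these gap pages (they are counted in Rsvd). However, the touch done by pre_page_sga only
covers the actual data pages. It never reads or writes the gap pages, which exist purely for boundary alignment, so they are left
over as HugePages_Rsvd.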

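The 7 pages (14MB) you are seeing are exactly this multi-segment alignment overhead. It does not occur in 11g's single-segment mode, but it always appears in 12c+'s multi-segment mode.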

--//If that is really the case, the 21c case is easy to verify. Note that in the 11g output of cat /proc/3235/maps | grep "rw-s"
--//above, the 3 shared memory segments are indeed contiguous:
$ ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 0          oracle     600        10485760   57
0x00000000 32769      oracle     600        1090519040 57
0x00000000 65538      oracle     600        8388608    57
0xafa94c20 98307      oracle     600        2097152    57

--//10485760/2/1024/1024 = 5
--//1090519040/2/1024/1024 = 520
--//8388608/2/1024/1024    = 4
--//2097152/2/1024/1024    = 1
--//5+520+4+1 = 530
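--//The same arithmetic as a Python sketch. Each bytes value from ipcs -m is rounded up to whole 2M hugepages (Hugepagesize: 2048 kB),
--//and the results are summed. For the 11g listing earlier it gives 6+378+1=385, which also matches HugePages_Total=385 there.

```python
# Sketch: convert the "bytes" column of `ipcs -m` into 2M hugepages.
HUGEPAGE = 2048 * 1024  # Hugepagesize: 2048 kB

IPCS_21C = """\
key        shmid      owner      perms      bytes      nattch     status
0x00000000 0          oracle     600        10485760   57
0x00000000 32769      oracle     600        1090519040 57
0x00000000 65538      oracle     600        8388608    57
0xafa94c20 98307      oracle     600        2097152    57
"""

IPCS_11G = """\
key        shmid      owner      perms      bytes      nattch     status
0x00000000 163840     oracle     640        12582912   24
0x00000000 196609     oracle     640        792723456  24
0x56c108b0 229378     oracle     640        2097152    24
"""


def pages_per_segment(ipcs_text):
    """Hugepages needed by each segment, rounding partial pages up."""
    pages = []
    for line in ipcs_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("0x"):  # data rows start with the key
            size = int(fields[4])                    # 5th column: bytes
            pages.append(-(-size // HUGEPAGE))       # ceiling division
    return pages


if __name__ == "__main__":
    for label, text in (("21c", IPCS_21C), ("11g", IPCS_11G)):
        pages = pages_per_segment(text)
        print(label, pages, "total =", sum(pages))
```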
--//ipcs -m does not show where each shared memory segment starts and ends, so check the addresses another way:

$ cat /proc/$(pgrep pmon)/maps | grep "rw-s"
60000000-60a00000 rw-s 00000000 00:0c 0                                  /SYSV00000000 (deleted)
61000000-a2000000 rw-s 00000000 00:0c 32769                              /SYSV00000000 (deleted)
a2000000-a2800000 rw-s 00000000 00:0c 65538                              /SYSV00000000 (deleted)
a3000000-a3200000 rw-s 00000000 00:0c 98307                              /SYSVafa94c20 (deleted)
7f3764b20000-7f3764b21000 rw-s 00000000 08:11 18861347                   /u01/app/oracle/dbs/hc_book.dat

--//If the segments leave a few gap pages between them, how large are the gaps?
--//Gap between the shared memory segments on lines 1 and 2:
--//0x61000000-0x60a00000   = 0x600000 = 6291456
--//6291456/2/1024/1024 = 3

--//There is no gap between the segments on lines 2 and 3.

--//There is a gap between the segments on lines 3 and 4:
--//0xa3000000-0xa2800000 = 0x800000 = 8388608
--//8388608/2/1024/1024 = 4
--//3+4 does equal 7. I don't know whether that is a coincidence.

--//Does the bytes column of ipcs -m include the gaps? Check lines 1 and 2:
--//0x60a00000 - 0x60000000 = 0xa00000 = 10485760
--//0xa2000000-0x61000000 = 0x41000000 = 1090519040

--//Clearly it does not. In other words, HugePages_Rsvd=7 is not part of HugePages_Total=530.
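--//The gap and size checks above as a Python sketch. The (start, end) pairs are copied from the 21c maps output, and the sizes come
--//from ipcs -m. It reports the gap after each segment in 2M pages and whether end-start equals the ipcs -m bytes value.

```python
# Sketch: gap after each 21c SysV segment (in 2M pages) and whether the
# address span of each mapping equals the "bytes" value from ipcs -m.
HUGEPAGE = 2 * 1024 * 1024

# (start, end) from /proc/<pid>/maps, sizes from ipcs -m (21c run above)
SEGMENTS = [(0x60000000, 0x60a00000), (0x61000000, 0xa2000000),
            (0xa2000000, 0xa2800000), (0xa3000000, 0xa3200000)]
IPCS_BYTES = [10485760, 1090519040, 8388608, 2097152]


def gap_pages(segments):
    """Unmapped space between consecutive segments, in hugepages."""
    return [(nxt[0] - cur[1]) // HUGEPAGE
            for cur, nxt in zip(segments, segments[1:])]


def spans_match_ipcs(segments, sizes):
    """True if end - start equals the ipcs -m size for every segment."""
    return len(segments) == len(sizes) and all(
        end - start == size for (start, end), size in zip(segments, sizes))


if __name__ == "__main__":
    gaps = gap_pages(SEGMENTS)
    print("gaps:", gaps, "sum =", sum(gaps))
    print("span == ipcs bytes:", spans_match_ipcs(SEGMENTS, IPCS_BYTES))
```

--//Expected result: gaps [3, 0, 4] (sum 7), and every span equals the ipcs -m size.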

--//So Kimi's explanation is correct. But why would Oracle design it this way? Isn't 7*2=14M of memory wasted?

3.Continuing:
--//Going back over my earlier test notes, I found that things look different when the instance is only started to nomount:
SYS@book> startup nomount
ORACLE instance started.

Total System Global Area 1107294056 bytes
Fixed Size                  9684840 bytes
Variable Size             654311424 bytes
Database Buffers          436207616 bytes
Redo Buffers                7090176 bytes

$ grep -i hugepages /proc/meminfo
AnonHugePages:     28672 kB
HugePages_Total:     530
HugePages_Free:       11
HugePages_Rsvd:       11
HugePages_Surp:        0
Hugepagesize:       2048 kB
--//Earlier, HugePages_Rsvd was 7. Here it is 11, so 4 more hugepages are still untouched.

$ ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 1409024    oracle     600        10485760   34
0x00000000 1441793    oracle     600        1090519040 34
0x00000000 1474562    oracle     600        8388608    34
0xafa94c20 1507331    oracle     600        2097152    34

$ cat /proc/$(pgrep pmon)/maps | grep "rw-s"
60000000-60a00000 rw-s 00000000 00:0c 1409024                            /SYSV00000000 (deleted)
61000000-a2000000 rw-s 00000000 00:0c 1441793                            /SYSV00000000 (deleted)
a2000000-a2800000 rw-s 00000000 00:0c 1474562                            /SYSV00000000 (deleted)
a3000000-a3200000 rw-s 00000000 00:0c 1507331                            /SYSVafa94c20 (deleted)
7f9be6252000-7f9be6253000 rw-s 00000000 08:11 18861347                   /u01/app/oracle/dbs/hc_book.dat
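--//The addresses are the same as in the earlier output.
--//The only explanation I have is that the 4 hugepages of the third shared memory segment have not been touched yet.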

SYS@book> alter database mount ;
Database altered.

$ grep -i hugepages /proc/meminfo
AnonHugePages:     63488 kB
HugePages_Total:     530
HugePages_Free:       11
HugePages_Rsvd:       11
HugePages_Surp:        0
Hugepagesize:       2048 kB

$ cat /proc/$(pgrep pmon)/maps | grep "rw-s"
60000000-60a00000 rw-s 00000000 00:0c 1409024                            /SYSV00000000 (deleted)
61000000-a2000000 rw-s 00000000 00:0c 1441793                            /SYSV00000000 (deleted)
a2000000-a2800000 rw-s 00000000 00:0c 1474562                            /SYSV00000000 (deleted)
a3000000-a3200000 rw-s 00000000 00:0c 1507331                            /SYSVafa94c20 (deleted)
7f9be6252000-7f9be6253000 rw-s 00000000 08:11 18861347                   /u01/app/oracle/dbs/hc_book.dat

SYS@book> alter database open ;
Database altered.

$ grep -i hugepages /proc/meminfo
AnonHugePages:    161792 kB
HugePages_Total:     530
HugePages_Free:        7
HugePages_Rsvd:        7
HugePages_Surp:        0
Hugepagesize:       2048 kB

$ cat /proc/$(pgrep pmon)/maps | grep "rw-s"
60000000-60a00000 rw-s 00000000 00:0c 1409024                            /SYSV00000000 (deleted)
61000000-a2000000 rw-s 00000000 00:0c 1441793                            /SYSV00000000 (deleted)
a2000000-a2800000 rw-s 00000000 00:0c 1474562                            /SYSV00000000 (deleted)
a3000000-a3200000 rw-s 00000000 00:0c 1507331                            /SYSVafa94c20 (deleted)
7f9be6252000-7f9be6253000 rw-s 00000000 08:11 18861347                   /u01/app/oracle/dbs/hc_book.dat
--//The Redo Buffers shared memory segment is touched only at the open stage.
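--//The Rsvd values per stage, lined up in a Python sketch. It only does the subtraction: Rsvd drops by 0 at mount and by 4 at open.
--//4 is also the size of the third segment (8388608 bytes) in 2M pages. The match is consistent with the reading that the Redo
--//Buffers segment is touched at open, but it does not prove it.

```python
# Sketch: how many reserved hugepages get touched at each startup stage.
HUGEPAGE = 2 * 1024 * 1024

RSVD_BY_STAGE = [("nomount", 11), ("mount", 11), ("open", 7)]  # from meminfo above
REDO_SEGMENT_BYTES = 8388608  # third segment in ipcs -m (shmid 1474562)


def touched_per_stage(rsvd_by_stage):
    """Drop in HugePages_Rsvd from one stage to the next."""
    return {nxt[0]: cur[1] - nxt[1]
            for cur, nxt in zip(rsvd_by_stage, rsvd_by_stage[1:])}


if __name__ == "__main__":
    print(touched_per_stage(RSVD_BY_STAGE))  # pages touched per stage
    print("redo segment pages:", REDO_SEGMENT_BYTES // HUGEPAGE)
```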

SYS@book> show parameter log_buffer
PARAMETER_NAME TYPE         VALUE
-------------- ------------ ------
log_buffer     big integer  7840K
--//7840K+2048K = 9888K, i.e. the current log_buffer plus another 2M.

$ cat /u01/app/oracle/dbs/initbook.ora
SPFILE='/u01/app/oracle/dbs/spfilebook.ora'
use_large_pages=ONLY
sga_target=1072m
sga_max_size=1072m
log_buffer=9888K
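--//Note: after I changed log_buffer to 9888K, Oracle asked for sga_target to be changed to 1072m.
--//1072-1056 = 16(M), and 16/2 = 8 more 2M hugepages, so vm.nr_hugepages goes from 530 to 538.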
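--//The sizing arithmetic as a Python sketch. The value 1056 comes from the note above. I assume it is the sga_target before the change.

```python
# Sketch of the sizing arithmetic; OLD_SGA_TARGET_MB=1056 is assumed to be
# the sga_target value before log_buffer was raised.
HUGEPAGE_MB = 2
OLD_LOG_BUFFER_KB = 7840
NEW_LOG_BUFFER_KB = OLD_LOG_BUFFER_KB + 2048   # one extra 2M -> 9888K
OLD_SGA_TARGET_MB = 1056                       # assumption, see lead-in
NEW_SGA_TARGET_MB = 1072                       # value Oracle asked for
OLD_NR_HUGEPAGES = 530

extra_pages = (NEW_SGA_TARGET_MB - OLD_SGA_TARGET_MB) // HUGEPAGE_MB
new_nr_hugepages = OLD_NR_HUGEPAGES + extra_pages

if __name__ == "__main__":
    print(NEW_LOG_BUFFER_KB, extra_pages, new_nr_hugepages)
```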

# sysctl -w vm.nr_hugepages=538
vm.nr_hugepages = 538

SYS@book> startup nomount pfile=/u01/app/oracle/dbs/initbook.ora
ORACLE instance started.
Total System Global Area 1124071320 bytes
Fixed Size                  9684888 bytes
Variable Size             654311424 bytes
Database Buffers          436207616 bytes
Redo Buffers               23867392 bytes

$ grep -i hugepages /proc/meminfo
AnonHugePages:     53248 kB
HugePages_Total:     538
HugePages_Free:       19
HugePages_Rsvd:       19
HugePages_Surp:        0
Hugepagesize:       2048 kB

$ ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 262144     oracle     600        10485760   33
0x00000000 294913     oracle     600        1090519040 33
0x00000000 327682     oracle     600        25165824   33
0xafa94c20 360451     oracle     600        2097152    33

$ cat /proc/$(pgrep pmon)/maps | grep "rw-s"
60000000-60a00000 rw-s 00000000 00:0c 262144                             /SYSV00000000 (deleted)
61000000-a2000000 rw-s 00000000 00:0c 294913                             /SYSV00000000 (deleted)
a2000000-a3800000 rw-s 00000000 00:0c 327682                             /SYSV00000000 (deleted)
a4000000-a4200000 rw-s 00000000 00:0c 360451                             /SYSVafa94c20 (deleted)
7fe49a4bf000-7fe49a4c0000 rw-s 00000000 08:11 18861347                   /u01/app/oracle/dbs/hc_book.dat

--//0x61000000-0x60a00000 = 0x600000 = 6291456, 6291456/2/1024/1024 = 3
--//0xa4000000-0xa3800000 = 0x800000 = 8388608 , 8388608/2/1024/1024 = 4
--//The gaps are still 3+4=7 pages.
--//25165824/2/1024/1024 = 12. After setting log_buffer=9888K, this segment unexpectedly grew from 8M to 24M.
--//12+7=19, which matches HugePages_Rsvd=19.

SYS@book> alter database mount ;
Database altered.

SYS@book> alter database open ;
Database altered.

$ grep -i hugepages /proc/meminfo
AnonHugePages:    149504 kB
HugePages_Total:     538
HugePages_Free:       14
HugePages_Rsvd:       14
HugePages_Surp:        0
Hugepagesize:       2048 kB

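--//This time HugePages_Rsvd=14. Can it be explained like this? With log_buffer manually set to 9888K, about 5 2M hugepages are
--//needed (9888/2048 = 4.83, rounded up to 5). The third shared memory segment was given 12 2M hugepages, so 12-5=7 remain untouched, and 7+7=14.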
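--//A Python sketch that checks this hypothesis against the observed values, using the accounting above (the 7 gap pages are counted
--//into Rsvd). It only shows that the numbers 19 (nomount) and 14 (open) are reproduced. It does not show that Oracle really touches
--//only the pages covering log_buffer.

```python
import math

# Sketch: check that the hypothesis reproduces the observed HugePages_Rsvd.
HUGEPAGE_KB = 2048
REDO_SEGMENT_PAGES = 25165824 // (HUGEPAGE_KB * 1024)  # 12 pages (24M)
GAP_PAGES = 3 + 4                                      # gaps from the maps output
LOG_BUFFER_KB = 9888

# nomount: nothing in the redo segment has been touched yet
rsvd_nomount = REDO_SEGMENT_PAGES + GAP_PAGES
# open: assume only the pages covering log_buffer get touched (hypothesis)
log_buffer_pages = math.ceil(LOG_BUFFER_KB / HUGEPAGE_KB)
rsvd_open = (REDO_SEGMENT_PAGES - log_buffer_pages) + GAP_PAGES

if __name__ == "__main__":
    print(rsvd_nomount, log_buffer_pages, rsvd_open)
```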

SYS@book> oradebug setmypid
Statement processed.
SYS@book> oradebug ipc
IPC information written to the trace file

*** 2026-04-27T10:46:04.113268+08:00 (CDB$ROOT(1))
Processing Oradebug command 'ipc'
Dump of unix-generic skgm context
areaflags            00001fb7
realmflags           0003ffff
mapsize              00001000
protectsize          00001000
lcmsize              00001000
seglen               00001000
largestsize  0000040000000000
smallestsize 0000000001000000
stacklimit     0x7ffe2f6ceeb2
stackdir                   -1
mode                      600
magic                acc01ade
 Dump of unix-generic realm handle `/u01/app/oracle/product/21.0.0/dbhome_1book', flags = 00000100
  key 2947107872 actual_key 2947107872 num_areas 4 num_subareas 4
  primary shmid: 360451 primary sanum 3 version 3
  deferred alloc: FALSE (0) def_post_create: FALSE (0) exp_memlock: 1076M
 Area #0 `Fixed Size' containing Subareas 2-2
  Total size 000000000093c798 Minimum Subarea size 00000000
   Area  Subarea    Shmid    Segment Addr    Stable Addr    Actual Addr
      0        2   262144 0x00000060000000 0x00000060000000 0x00000060000000
               Subarea size     Segment size   Req_Protect  Cur_protect
                          000000000093d000 0000000000a00000 default       readwrite
 Area #1 `Variable Size' containing Subareas 0-0
  Total size 0000000041000000 Minimum Subarea size 01000000
   Area  Subarea    Shmid    Segment Addr    Stable Addr    Actual Addr
      1        0   294913 0x00000061000000 0x00000061000000 0x00000061000000
               Subarea size     Segment size   Req_Protect  Cur_protect
                          0000000041000000 0000000041000000 default       readwrite
 Area #2 `Redo Buffers' containing Subareas 1-1
  Total size 00000000016c3000 Minimum Subarea size 00001000
   Area  Subarea    Shmid    Segment Addr    Stable Addr    Actual Addr
      2        1   327682 0x000000a2000000 0x000000a2000000 0x000000a2000000
               Subarea size     Segment size   Req_Protect  Cur_protect
                          00000000016c3000 0000000001800000 default       readwrite
 Area #3 `skgm overhead' containing Subareas 3-3
  Total size 0000000000004000 Minimum Subarea size 00000000
   Area  Subarea    Shmid    Segment Addr    Stable Addr    Actual Addr
      3        3   360451 0x000000a4000000 0x000000a4000000 0x000000a4000000
               Subarea size     Segment size   Req_Protect  Cur_protect
                          0000000000004000 0000000000200000 default       readwrite
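--//One observation from this dump: every Segment Addr is a multiple of 0x1000000 (16M), which is the same value as smallestsize.
--//Every Segment size is also the Subarea size rounded up to 2M. The Python sketch below assumes that each area is sized in whole
--//2M pages and that the next segment starts at the following 16M boundary. This is my guess; the dump does not state it. Under
--//that assumption, the sketch reproduces the actual segment addresses and the 3 and 4 page gaps.

```python
# Sketch (assumption, not stated by Oracle): each SGA area gets a segment sized
# in whole 2M hugepages, and the next segment starts at the next 16M boundary
# (0x1000000, the "smallestsize" value in the dump above).
HUGEPAGE = 0x200000     # 2M
BOUNDARY = 0x1000000    # 16M, assumed placement granularity

# (area name, Subarea size, Segment Addr) copied from the oradebug ipc dump
AREAS = [
    ("Fixed Size",    0x93d000,   0x60000000),
    ("Variable Size", 0x41000000, 0x61000000),
    ("Redo Buffers",  0x16c3000,  0xa2000000),
    ("skgm overhead", 0x4000,     0xa4000000),
]


def align_up(value, boundary):
    """Round value up to a multiple of boundary."""
    return (value + boundary - 1) // boundary * boundary


def predicted_layout(areas):
    """(name, segment size, predicted next start, gap pages) per area."""
    layout = []
    for name, subarea_size, addr in areas:
        seg_size = align_up(subarea_size, HUGEPAGE)
        end = addr + seg_size
        next_start = align_up(end, BOUNDARY)
        layout.append((name, seg_size, next_start, (next_start - end) // HUGEPAGE))
    return layout


if __name__ == "__main__":
    layout = predicted_layout(AREAS)
    # compare the prediction with the actual address of the following area
    for (name, seg_size, next_start, gap), actual in zip(layout, AREAS[1:]):
        print(f"{name:14s} size={seg_size:#x} predicted={next_start:#x} "
              f"actual={actual[2]:#x} gap_pages={gap}")
```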

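4.Summary:
--//HugePages_Rsvd=7 is not part of HugePages_Total=530.
--//Why would Oracle design it this way? Isn't 7*2=14M of memory wasted? Or do the gaps act as a kind of "baffle"? One possible
--//reading is that 2M pages serve as "spacers". The first shared memory segment occupies 5 2M pages and is followed by a gap of 3 2M
--//pages (6M). The third segment occupies 4 2M pages and is followed by a gap of 4 2M pages (8M).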
posted @ 2026-04-28 20:32  lfree