MongoDB "Too many open files" error
1 Symptoms
After startup, mongod runs for a short while as data is inserted and then crashes. The log shows the following:
{"t":{"$date":"2024-04-29T14:48:15.075+08:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn12","msg":"Connection ended","attr":{"remote":"192.168.179.129:52808","connectionId":12,"connectionCount":7}}
{"t":{"$date":"2024-04-29T14:48:15.567+08:00"},"s":"I", "c":"-", "id":20883, "ctx":"conn2","msg":"Interrupted operation as its client disconnected","attr":{"opId":2993}}
{"t":{"$date":"2024-04-29T14:48:15.567+08:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn2","msg":"Connection ended","attr":{"remote":"127.0.0.1:36488","connectionId":2,"connectionCount":6}}
{"t":{"$date":"2024-04-29T14:48:15.575+08:00"},"s":"I", "c":"-", "id":20883, "ctx":"conn10","msg":"Interrupted operation as its client disconnected","attr":{"opId":3007}}
{"t":{"$date":"2024-04-29T14:48:15.575+08:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn11","msg":"Connection ended","attr":{"remote":"192.168.179.129:52806","connectionId":11,"connectionCount":5}}
{"t":{"$date":"2024-04-29T14:48:15.575+08:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn10","msg":"Connection ended","attr":{"remote":"192.168.179.129:52804","connectionId":10,"connectionCount":4}}
{"t":{"$date":"2024-04-29T14:48:16.894+08:00"},"s":"E", "c":"STORAGE", "id":22435, "ctx":"conn9","msg":"WiredTiger error","attr":{"error":24,"message":"[1714373296:894282][16731:0x7f9272f61700], file:collection-62--2378942896950423417.wt, WT_SESSION.open_cursor: __posix_open_file, 815: /data/disk2/mongodb_4_4_29/data/db0/collection-62--2378942896950423417.wt: handle-open: open: Too many open files"}}
{"t":{"$date":"2024-04-29T14:48:16.894+08:00"},"s":"F", "c":"STORAGE", "id":50882, "ctx":"conn9","msg":"Failed to open WiredTiger cursor. This may be due to data corruption","attr":{"uri":"table:collection-62--2378942896950423417","config":"","error":{"code":264,"codeName":"TooManyFilesOpen","errmsg":"24: Too many open files"},"message":"Please read the documentation for starting MongoDB with --repair here: http://dochub.mongodb.org/core/repair"}}
{"t":{"$date":"2024-04-29T14:48:16.894+08:00"},"s":"F", "c":"-", "id":23091, "ctx":"conn9","msg":"Fatal assertion","attr":{"msgid":50882,"file":"src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp","line":109}}
{"t":{"$date":"2024-04-29T14:48:16.894+08:00"},"s":"F", "c":"-", "id":23092, "ctx":"conn9","msg":"\n\n***aborting after fassert() failure\n\n"}
The key error is:
handle-open: open: Too many open files
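- Cause:
This error usually means that MongoDB tried to open more file descriptors, but the operating system limits how many can be open at once and that limit has already been reached. A file descriptor is the resource the operating system uses to track an open file.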
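To see how close a process is to its limit, the number of descriptors it holds can be compared with its soft limit. The sketch below assumes a Linux /proc filesystem; the function name fd_usage is made up for this example, and reading another user's process requires root.

```shell
# fd_usage PID: print how many file descriptors PID has open and its soft limit.
# Assumes Linux /proc; fd_usage is a hypothetical helper name.
fd_usage() {
    pid=$1
    # Each entry in /proc/PID/fd is one open descriptor.
    open=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    # The line looks like "Max open files  1024  4096  files"; field 4 is the soft limit.
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "pid=$pid open=$open soft_limit=$soft"
}
```

For example, fd_usage 41814 reports the values for PID 41814. If open is close to soft_limit, the process is about to hit the error above.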
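2 Verifying the problem:
- Check the current user's limits (ulimit -n is the open-files limit, ulimit -u is the maximum number of user processes)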
ulimit -a
ulimit -u
ulimit -n
The output is as follows:
[root@192 mydata]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 127881
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 65536
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[root@192 mydata]# ulimit -n
65536
[root@192 mydata]# ulimit -u
65536
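If instead these checks show a low limit, for example a line such as
Max open files 1024 4096 files
(this is the layout used by /proc/<pid>/limits; ulimit -a reports the same value on its "open files" line), or if ulimit -n or ulimit -u prints a low value such as
1024
then the system limit is set too low.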
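3 Solutions
3.1 Raise the limit temporarily:
The ulimit command can raise the open-files limit temporarily. For example, ulimit -n 65535 raises the file descriptor limit of the current session to 65535.
This is temporary: it applies only to processes started from this shell and is lost when a new shell is opened.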
ulimit -n 65535
ulimit -u 65535
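A non-root user can raise the soft limit only up to the hard limit, so ulimit -n may fail if the requested value is higher. A minimal sketch that caps the request at the hard limit; the function name raise_soft_nofile is made up for this example.

```shell
# raise_soft_nofile WANT: set the soft open-files limit to WANT, capped at the hard limit.
# raise_soft_nofile is a hypothetical helper name.
raise_soft_nofile() {
    want=$1
    hard=$(ulimit -H -n)
    if [ "$hard" != "unlimited" ] && [ "$want" -gt "$hard" ]; then
        echo "requested $want exceeds hard limit $hard, using $hard" >&2
        want=$hard
    fi
    # Apply the soft limit, then print the value now in effect.
    ulimit -S -n "$want" && ulimit -S -n
}
```

Calling raise_soft_nofile 65535 in the shell that will start mongod prints the soft limit that was actually applied.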
3.2 Check the limits of the running process:
First find the process:
ps -ef |grep mongo
Check the limits of that process (41814 is the PID found above):
cat /proc/41814/limits
Raise the value:
prlimit --pid 41814 --nofile=65535:65535
3.3 Raise the limit permanently:
Edit /etc/security/limits.conf and add or modify the corresponding lines. For example:
mongod soft nofile 65535
mongod hard nofile 65535
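- mongod is the account that starts MongoDB, i.e. the account you are currently logged in as. Log in again for the new limit to take effect.
Note: this value must not exceed 1048576! Otherwise the machine may fail to boot, SSH logins may fail, and so on. Keep this in mind!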
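As far as I can tell, the 1048576 figure matches the default of fs.nr_open, the kernel's per-process ceiling on open files; a nofile limit above that ceiling cannot be applied. The sketch below checks a proposed value before it is written to limits.conf. The function name check_nofile is made up for this example, and the fallback value is an assumption for systems where the file cannot be read.

```shell
# check_nofile VALUE: succeed if VALUE does not exceed the kernel ceiling fs.nr_open.
# check_nofile is a hypothetical helper name.
check_nofile() {
    value=$1
    # Read the ceiling; fall back to the usual default (an assumption) if unreadable.
    ceiling=$(cat /proc/sys/fs/nr_open 2>/dev/null || echo 1048576)
    if [ "$value" -le "$ceiling" ]; then
        echo "ok: $value <= fs.nr_open ($ceiling)"
    else
        echo "too high: $value > fs.nr_open ($ceiling)"
        return 1
    fi
}
```

For example, check_nofile 65535 should report ok on a default kernel, while a value above the ceiling returns a non-zero status.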
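4 The systemd limit
There is another case where the error is identical but the verification step looks different: ulimit -a, ulimit -n, and ulimit -u all report 65535, which is not low, so normally the problem should not occur. Yet the error still appears.
In this case the cause is usually the MongoDB configuration file or a limit imposed by systemd.
4.1 Verifying the problem
Connect with the mongo shell. It prints: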
MongoDB shell version v4.4.29
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("f12a123d-9e03-4bc7-b382-e8bfcc6671db") }
MongoDB server version: 4.4.29
---
The server generated these startup warnings when booting:
2024-04-29T15:17:57.493+08:00: Access control is not enabled for the database. Read and write access to data and configuration is unrestricted
2024-04-29T15:17:57.493+08:00: You are running this process as the root user, which is not recommended
2024-04-29T15:17:57.494+08:00: /sys/kernel/mm/transparent_hugepage/enabled is 'always'. We suggest setting it to 'never'
2024-04-29T15:17:57.494+08:00: /sys/kernel/mm/transparent_hugepage/defrag is 'always'. We suggest setting it to 'never'
2024-04-29T15:17:57.494+08:00: Soft rlimits too low
2024-04-29T15:17:57.494+08:00: currentValue: 1024
2024-04-29T15:17:57.494+08:00: recommendedMinimum: 64000
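Note the warning: Soft rlimits too low, currentValue: 1024, recommendedMinimum: 64000.
This shows that the soft rlimit of the running mongod process really is 1024, not the 65535 configured for the system.
In this situation the problem lies either in the MongoDB configuration file or in the limits systemd applies.
My own configuration file was fine, so the rest of this section covers the systemd limit.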
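The mismatch can also be confirmed without the mongo shell by comparing the shell's own soft limit with that of the running process. This is a rough sketch that assumes a Linux /proc filesystem; the function name compare_nofile is made up for this example.

```shell
# compare_nofile PID: compare this shell's soft open-files limit with PID's.
# Returns non-zero when they differ. compare_nofile is a hypothetical helper name.
compare_nofile() {
    pid=$1
    shell_soft=$(ulimit -S -n)
    # Field 4 of the "Max open files" line is the process's soft limit.
    proc_soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "shell=$shell_soft process=$proc_soft"
    [ "$shell_soft" = "$proc_soft" ]
}
```

For example, compare_nofile "$(pgrep -o -x mongod)" would print something like shell=65535 process=1024 in the situation described here, assuming the process is named mongod.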
4.2 Tracking it down
MongoDB can be started directly with mongod, or as a registered systemd service.
- Direct start:
/data/disk2/mongodb/bin/mongod -f /data/disk2/mongodb/mongodb.conf
- Start through systemd:
sudo service mongodb start
systemctl start mongodb.service
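The warning above appears only when starting through systemd, so the problem lies in how systemd starts MongoDB. Services started by systemd do not go through a login session, so the ulimit and /etc/security/limits.conf settings above do not apply to them.
4.3 Solution
To set the file descriptor limit permanently, add
LimitNOFILE=65536
or, with separate soft and hard limits (systemd's syntax is soft:hard),
LimitNOFILE=65536:131072
to the MongoDB systemd unit file.
Edit the unit file, which is located under /lib/systemd/system: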
cp /lib/systemd/system/mongodb.service mongodb.service
chmod 755 mongodb.service
vim mongodb.service
mv mongodb.service /lib/systemd/system
Add the line to the [Service] section:
[Unit]
Description=mongodb
After=network.target remote-fs.target nss-lookup.target
[Service]
Type=forking
ExecStart=/data/disk2/mongodb_4_4_29/bin/mongod -f /data/disk2/mongodb_4_4_29/mongodb.conf
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/data/disk2/mongodb_4_4_29/bin/mongod --shutdown -f /data/disk2/mongodb_4_4_29/mongodb.conf
PrivateTmp=true
LimitNOFILE=65535
[Install]
WantedBy=multi-user.target
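Run systemctl daemon-reload so that systemd picks up the changed unit file, then start the service through systemd again.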
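As an alternative to editing the unit file in place, systemd also reads drop-in files from a directory named after the unit, which keeps the override separate from the unit file itself. A minimal sketch follows; the function name write_nofile_dropin and the file name limits.conf are choices made for this example.

```shell
# write_nofile_dropin DIR LIMIT: write a drop-in that overrides only LimitNOFILE.
# write_nofile_dropin and the file name limits.conf are hypothetical choices.
write_nofile_dropin() {
    dir=$1
    limit=$2
    mkdir -p "$dir" || return 1
    printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dir/limits.conf"
}
```

For the unit used here, that would be write_nofile_dropin /etc/systemd/system/mongodb.service.d 65535 run as root, followed by systemctl daemon-reload and systemctl restart mongodb.service.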