hengdin

MongoDB "Too many open files" error

1 Problem description

After MongoDB starts and data is inserted, it crashes a short while later. The log shows the following:

{"t":{"$date":"2024-04-29T14:48:15.075+08:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn12","msg":"Connection ended","attr":{"remote":"192.168.179.129:52808","connectionId":12,"connectionCount":7}}
{"t":{"$date":"2024-04-29T14:48:15.567+08:00"},"s":"I",  "c":"-",        "id":20883,   "ctx":"conn2","msg":"Interrupted operation as its client disconnected","attr":{"opId":2993}}
{"t":{"$date":"2024-04-29T14:48:15.567+08:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn2","msg":"Connection ended","attr":{"remote":"127.0.0.1:36488","connectionId":2,"connectionCount":6}}
{"t":{"$date":"2024-04-29T14:48:15.575+08:00"},"s":"I",  "c":"-",        "id":20883,   "ctx":"conn10","msg":"Interrupted operation as its client disconnected","attr":{"opId":3007}}
{"t":{"$date":"2024-04-29T14:48:15.575+08:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn11","msg":"Connection ended","attr":{"remote":"192.168.179.129:52806","connectionId":11,"connectionCount":5}}
{"t":{"$date":"2024-04-29T14:48:15.575+08:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn10","msg":"Connection ended","attr":{"remote":"192.168.179.129:52804","connectionId":10,"connectionCount":4}}
{"t":{"$date":"2024-04-29T14:48:16.894+08:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"conn9","msg":"WiredTiger error","attr":{"error":24,"message":"[1714373296:894282][16731:0x7f9272f61700], file:collection-62--2378942896950423417.wt, WT_SESSION.open_cursor: __posix_open_file, 815: /data/disk2/mongodb_4_4_29/data/db0/collection-62--2378942896950423417.wt: handle-open: open: Too many open files"}}
{"t":{"$date":"2024-04-29T14:48:16.894+08:00"},"s":"F",  "c":"STORAGE",  "id":50882,   "ctx":"conn9","msg":"Failed to open WiredTiger cursor. This may be due to data corruption","attr":{"uri":"table:collection-62--2378942896950423417","config":"","error":{"code":264,"codeName":"TooManyFilesOpen","errmsg":"24: Too many open files"},"message":"Please read the documentation for starting MongoDB with --repair here: http://dochub.mongodb.org/core/repair"}}
{"t":{"$date":"2024-04-29T14:48:16.894+08:00"},"s":"F",  "c":"-",        "id":23091,   "ctx":"conn9","msg":"Fatal assertion","attr":{"msgid":50882,"file":"src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp","line":109}}
{"t":{"$date":"2024-04-29T14:48:16.894+08:00"},"s":"F",  "c":"-",        "id":23092,   "ctx":"conn9","msg":"\n\n***aborting after fassert() failure\n\n"}

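The key error message is:

handle-open: open: Too many open files
  • Cause:

This error usually means that MongoDB tried to open more file descriptors than the operating system allows to be open at the same time, and the limit has already been reached. A file descriptor is the resource the operating system uses to track an open file. In the log above, the failing call is WiredTiger opening a collection data file (collection-62--2378942896950423417.wt).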
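To see how close a process is to that ceiling, the number of descriptors it holds can be compared with its limit. The following is a minimal sketch, assuming a Linux /proc filesystem. The helper name count_open_fds is hypothetical, and the shell's own PID stands in for the mongod PID.

```shell
# Count the entries in a /proc/<pid>/fd directory; each entry is one
# descriptor the process currently holds open.
count_open_fds() {
    # $1: path to an fd directory, e.g. /proc/41814/fd
    ls "$1" | wc -l | tr -d ' '
}

pid=$$   # assumption: the current shell stands in for the mongod PID
echo "open descriptors: $(count_open_fds "/proc/$pid/fd")"
# The row below lists the soft limit, the hard limit and the unit.
grep '^Max open files' "/proc/$pid/limits"
```

Reading /proc/<pid>/fd of a process owned by another user normally requires root.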

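2 Verifying the problem

  1. Check the resource limits of the current user (-n is the maximum number of open files, -u is the maximum number of user processes)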
ulimit -a
ulimit -u 
ulimit -n 

The output looks like this:

[root@192 mydata]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 127881
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65536
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

[root@192 mydata]# ulimit -n
65536
[root@192 mydata]# ulimit -u
65536

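If the check instead shows a low open-files limit, for example a line like the following (this layout, with a soft limit of 1024 and a hard limit of 4096, is what /proc/<pid>/limits prints):

Max open files 1024 4096 files

or if ulimit -n or ulimit -u reports a low value such as

1024
then the limit configured on the system is too low.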
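The check can also be scripted. This is a small sketch, not a definitive test: the function name nofile_ok is hypothetical, and the threshold of 64000 is the recommendedMinimum that MongoDB prints in its "Soft rlimits too low" startup warning.

```shell
# Succeed (exit status 0) when an open-files limit meets a minimum.
# $1: limit as printed by `ulimit -n` (a number or "unlimited")
# $2: minimum wanted
nofile_ok() {
    [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

current=$(ulimit -n)
if nofile_ok "$current" 64000; then
    echo "open files limit $current looks sufficient"
else
    echo "open files limit $current is below 64000"
fi
```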

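3 Solutions

3.1 Temporarily raise the limit:

The ulimit command can raise the open-files limit temporarily. For example, ulimit -n 65535 raises the file descriptor limit of the current session to 65535.

This is temporary: it applies only to the current shell and to processes started from it, so it is lost when a new shell is opened and it does not affect a mongod that is already running.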

ulimit -n 65535
ulimit -u 65535
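Because the limit is inherited by child processes, mongod has to be launched from the shell in which the limit was raised. The sketch below illustrates this under two assumptions: the helper name set_soft_nofile is hypothetical, and the cap at the hard limit reflects that an unprivileged user cannot raise the soft limit above it.

```shell
# Set the soft open-files limit of the current shell, capped at the hard
# limit, which an unprivileged user cannot exceed.
set_soft_nofile() {
    # $1: desired soft limit
    hard=$(ulimit -H -n)
    target=$1
    if [ "$hard" != "unlimited" ] && [ "$target" -gt "$hard" ]; then
        target=$hard
    fi
    ulimit -S -n "$target"
}

# Run in a subshell so the change does not leak into the calling shell.
(
    set_soft_nofile 65535
    echo "soft limit in this subshell: $(ulimit -S -n)"
    # mongod would have to be started from here to inherit the new limit.
)
```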

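3.2 Check the limits of the running process by PID:

First find the mongod process:

ps -ef | grep mongo

Then check that process's limits (41814 is the PID in this example):

cat /proc/41814/limits

Raise the value for the running process (the change lasts only until the process restarts):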

prlimit --pid 41814 --nofile=65535:65535
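The lookup and the check can be combined so that the PID does not have to be copied by hand. A minimal sketch, assuming the process is named exactly mongod and that pgrep is installed; the helper name nofile_limits is hypothetical.

```shell
# Print "soft hard" for the open-files row of a /proc/<pid>/limits
# listing read from standard input.
nofile_limits() {
    awk '/^Max open files/ {print $4, $5}'
}

# pgrep -x matches the exact process name; -o picks the oldest match.
pid=$(pgrep -o -x mongod)
if [ -n "$pid" ]; then
    echo "mongod $pid soft/hard: $(nofile_limits < "/proc/$pid/limits")"
else
    echo "no mongod process found"
fi
```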

3.3 Raise the limit permanently:

Edit /etc/security/limits.conf and add or modify the relevant lines to raise the limit. For example:

mongod soft nofile 65535
mongod hard nofile 65535
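  • mongod is the account that starts MongoDB, that is, the account you are logged in with

Note: this value must not exceed 1048576, which is the default value of the kernel parameter fs.nr_open. A larger value can leave the machine unable to start properly or unable to accept SSH logins, so be careful!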
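Before writing a value into limits.conf, it can be compared with the ceiling on the machine in question. This is a sketch under the assumption that /proc/sys/fs/nr_open is readable (it falls back to 1048576 otherwise); the function name nofile_within_ceiling is hypothetical.

```shell
# Succeed when a proposed nofile value does not exceed the ceiling.
# $1: proposed value for limits.conf
# $2: ceiling, normally the content of /proc/sys/fs/nr_open
nofile_within_ceiling() {
    [ "$1" -le "$2" ]
}

ceiling=$(cat /proc/sys/fs/nr_open 2>/dev/null || echo 1048576)
if nofile_within_ceiling 65535 "$ceiling"; then
    echo "65535 is within the ceiling of $ceiling"
else
    echo "65535 exceeds the ceiling of $ceiling; do not use it"
fi
```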

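2 Limits imposed by systemd

There is a second case in which the error is identical but the verification step gives a different result: ulimit -a, ulimit -n and ulimit -u all report 65535, which is not low, so the problem should not occur. Yet the error still appears.
In this case the cause is usually the MongoDB configuration file or a limit applied by systemd.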

2.1 Verifying the problem

Connect with the mongo shell. It prints the following:

MongoDB shell version v4.4.29
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("f12a123d-9e03-4bc7-b382-e8bfcc6671db") }
MongoDB server version: 4.4.29
---
The server generated these startup warnings when booting: 
        2024-04-29T15:17:57.493+08:00: Access control is not enabled for the database. Read and write access to data and configuration is unrestricted
        2024-04-29T15:17:57.493+08:00: You are running this process as the root user, which is not recommended
        2024-04-29T15:17:57.494+08:00: /sys/kernel/mm/transparent_hugepage/enabled is 'always'. We suggest setting it to 'never'
        2024-04-29T15:17:57.494+08:00: /sys/kernel/mm/transparent_hugepage/defrag is 'always'. We suggest setting it to 'never'
        2024-04-29T15:17:57.494+08:00: Soft rlimits too low
        2024-04-29T15:17:57.494+08:00:         currentValue: 1024
        2024-04-29T15:17:57.494+08:00:         recommendedMinimum: 64000

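Note the warning: Soft rlimits too low, currentValue: 1024, recommendedMinimum: 64000

This shows that the soft rlimit of the running mongod really is 1024, not the 65535 configured on the system.

In this situation the problem may lie in the MongoDB configuration file or in a limit applied by systemd.

My own configuration file was fine, so the rest of this section covers the systemd limit.

2.2 Troubleshooting

MongoDB can be started directly with the mongod command, or as a registered systemd service.

  • Started directly from the command line: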
/data/disk2/mongodb/bin/mongod -f /data/disk2/mongodb/mongodb.conf
  • Started through systemd:
sudo service mongodb start
systemctl start mongodb.service

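The warning above appears only when MongoDB is started through systemd, so the problem lies in how systemd starts it. A systemd service is not started from a login session, so the limits in /etc/security/limits.conf (applied by PAM at login) and any ulimit set in a shell do not reach it; the service gets the limits defined by systemd instead.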
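The difference can be confirmed by comparing the limit of the service's process with the limit of the shell. A minimal sketch, assuming the unit is named mongodb.service as in this post and that systemd reports the correct main PID for it; the helper name show_value is hypothetical.

```shell
# Extract the value from a "Key=value" line, as printed by `systemctl show`.
show_value() {
    cut -d= -f2-
}

# Assumption: the unit is named mongodb.service, as in this post.
pid=$(systemctl show -p MainPID mongodb.service 2>/dev/null | show_value)
if [ -n "$pid" ] && [ "$pid" != "0" ]; then
    echo "service process $pid:"
    grep '^Max open files' "/proc/$pid/limits"
else
    echo "mongodb.service is not running (or systemd is unavailable)"
fi
echo "this shell: $(ulimit -S -n)"
```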

2.3 Solution

You can add the following to the MongoDB systemd service file

LimitNOFILE=65536

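or, to set the soft and hard limits separately (written as soft:hard),

LimitNOFILE=65536:131072

Either form sets the file descriptor limit permanently.

To edit the systemd service file, which is located under /lib/systemd/system: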

cp /lib/systemd/system/mongodb.service mongodb.service
chmod 755 mongodb.service

vim mongodb.service

mv mongodb.service /lib/systemd/system

Add the line to the [Service] section:

[Unit]
Description=mongodb
After=network.target remote-fs.target nss-lookup.target

[Service]
Type=forking
ExecStart=/data/disk2/mongodb_4_4_29/bin/mongod -f /data/disk2/mongodb_4_4_29/mongodb.conf
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/data/disk2/mongodb_4_4_29/bin/mongod --shutdown -f /data/disk2/mongodb_4_4_29/mongodb.conf
PrivateTmp=true
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target

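Run systemctl daemon-reload so that systemd picks up the edited service file, then start MongoDB through systemd again.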
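An alternative to editing the file under /lib/systemd/system, which a package upgrade may overwrite, is a systemd drop-in. This is not what the post above does; it is a sketch of a different technique. The helper name write_nofile_dropin and the file name override.conf are assumptions, and the demonstration writes to a scratch directory rather than to /etc.

```shell
# Write a drop-in that sets LimitNOFILE for a unit.
# $1: drop-in directory, e.g. /etc/systemd/system/mongodb.service.d
# $2: limit value
write_nofile_dropin() {
    mkdir -p "$1" &&
    printf '[Service]\nLimitNOFILE=%s\n' "$2" > "$1/override.conf"
}

# Demonstrate against a scratch directory; on a real host the directory
# would be /etc/systemd/system/mongodb.service.d (needs root), followed by:
#   systemctl daemon-reload
#   systemctl restart mongodb.service
scratch=$(mktemp -d)
write_nofile_dropin "$scratch/mongodb.service.d" 65535
cat "$scratch/mongodb.service.d/override.conf"
```

After the restart, the startup warning in the mongo shell and /proc/<pid>/limits show whether the new limit took effect.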

posted on 2024-05-09 17:42  hengdin