Limiting disk bandwidth and memory with cgroup v2 on Debian

A recent project at work required throttling disk I/O. Throttling disks through the cgroup v1 interface worked fine on CentOS, but on Debian it simply would not take effect no matter what, so I gave up on cgroup v1 and switched to cgroup v2.

Starting with kernel 4.5 the cgroup v2 interface is considered official: it no longer carries the development ("devel") label and can be mounted as the new cgroup2 filesystem type.
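
For reference, such a hierarchy can be mounted by hand (on a systemd-based Debian this normally happens automatically at boot, so the commands below are purely illustrative; /mnt/cgroup2 is an arbitrary mount point):

mkdir -p /mnt/cgroup2
mount -t cgroup2 none /mnt/cgroup2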

Compared with v1, the rules have changed somewhat in v2. Every control group has a cgroup.controllers file listing the controllers its child groups may enable, and a cgroup.subtree_control file that turns those controllers on or off for the children.
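
In other words, a controller must appear in the parent's cgroup.controllers and be switched on through the parent's cgroup.subtree_control before child groups can use it. Roughly (the exact set of controllers depends on the kernel configuration):

cat /sys/fs/cgroup/cgroup.controllers       # controllers available to children of the root
cat /sys/fs/cgroup/cgroup.subtree_control   # controllers currently enabled for them
echo "+io" > /sys/fs/cgroup/cgroup.subtree_control   # enable the io controller for child groups
echo "-io" > /sys/fs/cgroup/cgroup.subtree_control   # disable it again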

1. Modify the kernel boot parameters

Edit the kernel command line in /etc/default/grub, adding systemd.unified_cgroup_hierarchy=1 and cgroup_no_v1=all at the end.

root@node115:~# vim.tiny /etc/default/grub
Add or modify the GRUB_CMDLINE_LINUX line:
GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all"
root@node115:~# update-grub
root@node115:~# reboot

2. Check that it took effect

After rebooting, check the kernel log; messages like the following indicate that the cgroup v1 controllers have been disabled.

root@node115:~# dmesg|grep group
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.3.10-1-pve root=/dev/mapper/vcl-root ro systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all quiet
[    0.108203] Built 1 zonelists, mobility grouping on.  Total pages: 2064227
[    0.108208] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.3.10-1-pve root=/dev/mapper/vcl-root ro systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all quiet
[    0.290723] Disabling cpuset control group subsystem in v1 mounts
[    0.290733] Disabling cpu control group subsystem in v1 mounts
[    0.290737] Disabling cpuacct control group subsystem in v1 mounts
[    0.290742] Disabling io control group subsystem in v1 mounts
[    0.290762] Disabling memory control group subsystem in v1 mounts
[    0.290775] Disabling devices control group subsystem in v1 mounts
[    0.290786] Disabling freezer control group subsystem in v1 mounts
[    0.290789] Disabling net_cls control group subsystem in v1 mounts
[    0.290802] Disabling perf_event control group subsystem in v1 mounts
[    0.290805] Disabling net_prio control group subsystem in v1 mounts
[    0.290808] Disabling hugetlb control group subsystem in v1 mounts
[    0.290810] Disabling pids control group subsystem in v1 mounts
[    0.290813] Disabling rdma control group subsystem in v1 mounts
[    0.290816] *** VALIDATE cgroup1 ***
[    0.290819] *** VALIDATE cgroup2 ***

Check where cgroup2 is mounted:

root@node115:~# mount | grep cgroup
cgroup on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
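
Another quick check is to ask for the filesystem type of /sys/fs/cgroup, which reports cgroup2fs on a unified hierarchy:

stat -fc %T /sys/fs/cgroup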

3. Limit disk read/write bandwidth

Note that these settings do not survive a reboot; to make them persistent, have something re-apply them at boot (see the sketch at the end of this section).

root@node115:~# echo "+io +memory" > /sys/fs/cgroup/cgroup.subtree_control
root@node115:~# ls -l /sys/fs/cgroup/user.slice/
root@node115:~# lsblk -d
NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda    8:0    0   32G  0 disk
sdb    8:16   0  100G  0 disk /mnt/sdb
sdc    8:32   0  100G  0 disk
sdd    8:48   0  100G  0 disk
sde    8:64   0   32G  0 disk
root@node115:~# echo "8:16 wbps=10485760" > /sys/fs/cgroup/user.slice/io.max
root@node115:~# echo "8:16 rbps=10485760" > /sys/fs/cgroup/user.slice/io.max
root@node115:/mnt/sdb# dd if=/dev/zero of=1gfile bs=1M count=1024 conv=fdatasync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 103.251 s, 10.4 MB/s
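
io.max takes one line per device (major:minor) with up to four keys: rbps and wbps limit read/write bytes per second, riops and wiops limit read/write IOs per second, and the special value max means unlimited. The two echo commands above could just as well be written as a single line:

echo "8:16 rbps=10485760 wbps=10485760 riops=max wiops=max" > /sys/fs/cgroup/user.slice/io.max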

To remove the limits:
root@node115:/sys/fs/cgroup/user.slice# echo "8:16 rbps=max" > io.max
root@node115:/sys/fs/cgroup/user.slice# echo "8:16 wbps=max" > io.max
root@node115:/sys/fs/cgroup/user.slice# cat io.max
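
To make the throttle survive reboots, one option is to let systemd own the limit, the same way step 5 below does for memory: the IOReadBandwidthMax=/IOWriteBandwidthMax= properties write the corresponding io.max entries, and when set via systemctl set-property they are persisted as drop-ins under /etc/systemd/system.control/. A sketch, with the device and limit as examples:

systemctl set-property user.slice IOReadBandwidthMax="/dev/sdb 10M"
systemctl set-property user.slice IOWriteBandwidthMax="/dev/sdb 10M"
cat /sys/fs/cgroup/user.slice/io.max

Alternatively, a small oneshot systemd unit or an rc.local-style script can simply re-run the echo commands above at boot.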

4. Verify that the disk bandwidth limits work

Test disk read/write bandwidth with fio; both runs are held to roughly 10 MB/s, confirming that the limits are in effect.

fio sequential read bandwidth:
root@node115:~# fio -filename=/mnt/sdb/testfile -direct=1 -iodepth 1 -thread -rw=read -ioengine=libaio -bs=64k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=mytest
mytest: (g=0): rw=read, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=1
...
fio-2.16
Starting 10 threads
mytest: Laying out IO file(s) (1 file(s) / 2048MB)
Jobs: 10 (f=10): [R(10)] [100.0% done] [10250KB/0KB/0KB /s] [160/0/0 iops] [eta 00m:00s]
mytest: (groupid=0, jobs=10): err= 0: pid=19216: Mon Apr 25 16:53:20 2022
  read : io=615040KB, bw=10249KB/s, iops=160, runt= 60007msec
    slat (usec): min=6, max=458, avg=42.04, stdev=28.85
    clat (usec): min=356, max=175157, avg=62390.91, stdev=47134.65
     lat (usec): min=402, max=175260, avg=62432.94, stdev=47116.45
    clat percentiles (usec):
     |  1.00th=[  644],  5.00th=[  948], 10.00th=[ 1096], 20.00th=[ 1256],
     | 30.00th=[ 1800], 40.00th=[84480], 50.00th=[96768], 60.00th=[98816],
     | 70.00th=[98816], 80.00th=[99840], 90.00th=[101888], 95.00th=[105984],
     | 99.00th=[116224], 99.50th=[121344], 99.90th=[173056], 99.95th=[175104],
     | 99.99th=[175104]
    lat (usec) : 500=0.23%, 750=1.71%, 1000=4.46%
    lat (msec) : 2=24.34%, 4=2.81%, 10=2.86%, 20=1.04%, 50=0.22%
    lat (msec) : 100=48.45%, 250=13.88%
  cpu          : usr=0.04%, sys=0.11%, ctx=9764, majf=0, minf=170
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=9610/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=615040KB, aggrb=10249KB/s, minb=10249KB/s, maxb=10249KB/s, mint=60007msec, maxt=60007msec

Disk stats (read/write):
  sdb: ios=8297/3, merge=1359/0, ticks=24863/74, in_queue=10864, util=7.60%
fio sequential write bandwidth:
root@node115:~# fio -filename=/mnt/sdb/testfile -direct=1 -iodepth 1 -thread -rw=write -ioengine=libaio -bs=64k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=mytest
mytest: (g=0): rw=write, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=1
...
fio-2.16
Starting 10 threads
Jobs: 10 (f=10): [W(10)] [100.0% done] [0KB/10240KB/0KB /s] [0/160/0 iops] [eta 00m:00s]
mytest: (groupid=0, jobs=10): err= 0: pid=20050: Mon Apr 25 16:55:05 2022
  write: io=614720KB, bw=10244KB/s, iops=160, runt= 60007msec
    slat (usec): min=10, max=518, avg=54.92, stdev=34.38
    clat (usec): min=595, max=219293, avg=62408.97, stdev=47014.26
     lat (usec): min=677, max=219329, avg=62463.90, stdev=46990.89
    clat percentiles (usec):
     |  1.00th=[ 1560],  5.00th=[ 1768], 10.00th=[ 1880], 20.00th=[ 2064],
     | 30.00th=[ 2352], 40.00th=[95744], 50.00th=[97792], 60.00th=[98816],
     | 70.00th=[98816], 80.00th=[98816], 90.00th=[98816], 95.00th=[99840],
     | 99.00th=[102912], 99.50th=[102912], 99.90th=[218112], 99.95th=[218112],
     | 99.99th=[220160]
    lat (usec) : 750=0.03%, 1000=0.01%
    lat (msec) : 2=16.94%, 4=20.32%, 10=0.33%, 100=59.19%, 250=3.18%
  cpu          : usr=0.06%, sys=0.10%, ctx=9971, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=9605/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=614720KB, aggrb=10244KB/s, minb=10244KB/s, maxb=10244KB/s, mint=60007msec, maxt=60007msec

Disk stats (read/write):
  sdb: ios=72/8951, merge=0/631, ticks=23/25042, in_queue=3808, util=8.21%

5. Set a memory limit

Limit the user.slice unit to a maximum of 100 MB of memory.

root@node115:~# systemctl show user.slice | grep MemoryLimit
root@node115:~# systemctl set-property user.slice MemoryLimit=100M
root@node115:~# systemctl daemon-reload
root@node115:~# cd /sys/fs/cgroup/user.slice/
root@node115:/sys/fs/cgroup/user.slice# ls
root@node115:/sys/fs/cgroup/user.slice# cat memory.max
104857600

root@node115:~# cat /etc/systemd/system.control/user.slice.d/50-MemoryLimit.conf
# This is a drop-in unit file extension, created via "systemctl set-property"
# or an equivalent operation. Do not edit.
[Slice]
MemoryLimit=104857600
View per-unit resource usage:
root@node115:~# systemd-cgtop
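
MemoryLimit= is systemd's legacy (cgroup v1) setting; on the unified hierarchy systemd translates it into memory.max, which is why memory.max above reads 104857600. The cgroup-v2-native properties are MemoryMax= (memory.max), MemoryHigh= (memory.high) and MemorySwapMax= (memory.swap.max), so the same limit could also be set as:

systemctl set-property user.slice MemoryMax=100M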

6. Verify the process memory limit

Judging by its output, memtest is a small test program that allocates memory in 100 MB steps and prints its progress. First turn off swap for the cgroup so the limit is enforced against RAM alone:
root@node115:/sys/fs/cgroup/user.slice# echo 0 > memory.swap.max
root@node115:~/inode_test# ./memtest
Killed
root@node115:~/inode_test# systemctl set-property user.slice MemoryLimit=200M
root@node115:/sys/fs/cgroup/user.slice# echo 0 > memory.swap.max
root@node115:/sys/fs/cgroup/user.slice# cat memory.swap.max
0
root@node115:~/inode_test# ./memtest
malloc memory 100 MB
Killed
Disable swap system-wide:
root@node115:~/inode_test# swapoff -a
root@node115:~/inode_test# free -mh
              total        used        free      shared  buff/cache   available
Mem:           7.8G        1.6G        5.8G         68M        356M        5.8G
Swap:            0B          0B          0B
root@node115:/sys/fs/cgroup/user.slice# echo 314572800 > memory.max
root@node115:/sys/fs/cgroup/user.slice# cat memory.max
314572800
root@node115:~/inode_test# ./memtest
malloc memory 100 MB
malloc memory 200 MB
Killed
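
The Killed messages come from the kernel OOM killer acting inside the cgroup. Two quick checks (outputs omitted here): /proc/self/cgroup shows that the login shell, and hence ./memtest, runs under user.slice, and the oom_kill counter in memory.events increases by one for each kill.

cat /proc/self/cgroup
cat /sys/fs/cgroup/user.slice/memory.events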