Ceph pool management

I. Pool overview

1. Pool types

- replicated pools
	Replicated pools store each object as multiple full copies; by default the data is kept as 3 replicas.
	The replica count can be changed when the pool is created (and adjusted later).
		
- erasure-coded pools
	Instead of keeping 3 full copies, erasure-coded pools use erasure coding to provide data redundancy while saving storage space.
	
Tip:
	If an analogy helps, replicated pools and erasure-coded pools are roughly comparable to "RAID 1" and "RAID 5" respectively.
	In production, "replicated pools" are chosen in most cases; erasure-coded pools are used far more cautiously.
	
	Reference:
		https://docs.ceph.com/en/latest/rados/operations/pools/#pools

2. Formula for calculating the PG count


Number of OSDs x 100 / pool replica count (osd_pool_default_size, default 3), then round the result to the nearest power of 2 to get the PG count.


Our environment:
	ceph141 has 3 OSD devices:
[root@ceph141 ~]# lsblk 
NAME                                                                                                MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
..
sdb                                                                                                   8:16   0  200G  0 disk 
└─ceph--72aef53e--0a69--4aa5--8be3--9239bc333ec2-osd--block--23387ffb--9b97--4eef--8b77--22b728069b1e
                                                                                                    253:2    0  200G  0 lvm  
sdc                                                                                                   8:32   0  300G  0 disk 
└─ceph--313a6cda--6b9d--4796--9668--8d1da63cd1b4-osd--block--7107cd5e--5a71--46cd--94fe--ab7cc8c779b9
                                                                                                    253:3    0  300G  0 lvm  
sdd                                                                                                   8:48   0  500G  0 disk 
└─ceph--84a0e3f4--7fae--446e--89ad--1bb22b6940ab-osd--block--6cfeedfc--e870--4ab6--bb04--1085c674a9ab
                                                                                                    253:4    0  500G  0 lvm 
...
[root@ceph141 ~]# 


	ceph142 has 3 OSD devices:
[root@ceph142 ~]# lsblk 
NAME                                                                                                MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
...
sdb                                                                                                   8:16   0  200G  0 disk 
└─ceph--223b39c1--89c2--440d--9dce--930450aaad7d-osd--block--183644a5--5af8--4387--b995--51ad8419ba82
                                                                                                    253:2    0  200G  0 lvm  
sdc                                                                                                   8:32   0  300G  0 disk 
└─ceph--72aafaac--5151--49f4--aa4a--b0216f1a33b7-osd--block--674f0f7b--cf54--4813--a486--f92a6d6fe30f
                                                                                                    253:3    0  300G  0 lvm  
sdd                                                                                                   8:48   0  500G  0 disk 
└─ceph--c019c813--5e99--41d2--923b--6c68bc6a87c7-osd--block--636e7599--9338--4b57--989b--d04d1d951322
                                                                                                    253:4    0  500G  0 lvm  
..																									
[root@ceph142 ~]# 


	ceph143 has 2 OSD devices:
[root@ceph143 ~]# lsblk 
NAME                                                                                                MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
...
sdb                                                                                                   8:16   0  200G  0 disk 
└─ceph--2f9b8018--7242--4eae--9b89--454c56222d72-osd--block--e2ee73ae--c94e--4bb8--a0c4--ab24f7654237
                                                                                                    253:2    0  200G  0 lvm  
sdc                                                                                                   8:32   0  300G  0 disk 
└─ceph--a72237c7--f9ec--4228--a3f3--1b4d5625fb62-osd--block--04eb39e9--1dc6--4446--930c--1c2434674b1e
                                                                                                    253:3    0  300G  0 lvm  
...
[root@ceph143 ~]#
	
	
	To summarize, our environment has:
		Number of OSD devices: 8
		Default pool replica count: 3
	
	So the target PG count for us is:
		100 * 8 / 3 ≈ 267
		
	2 to the 8th power is 256.
	2 to the 9th power is 512.
	512 is clearly well above 267, so the recommended pg_num is 256 (the largest power of 2 not exceeding the target).
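	
	The same calculation can be scripted. Below is a minimal bash sketch (the variable names and rounding loop are our own, not part of Ceph; it assumes a working ceph CLI with an admin keyring):
	
	OSDS=$(ceph osd ls | wc -l)        # number of OSDs in the cluster, 8 in our environment
	SIZE=3                             # pool replica count (osd_pool_default_size); adjust if you changed it
	TARGET=$(( OSDS * 100 / SIZE ))    # raw target, ~266 here
	PG=1
	while [ $(( PG * 2 )) -le $TARGET ]; do PG=$(( PG * 2 )); done
	echo "suggested pg_num: $PG"       # prints 256 for 8 OSDs and size 3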
	
	

References:
	https://docs.ceph.com/en/latest/rados/configuration/pool-pg-config-ref/#pool-pg-and-crush-config-reference
	https://docs.ceph.com/en/nautilus/rados/configuration/pool-pg-config-ref/

II. Basic pool management

1. Creating a pool

	1. Create a replicated pool
Syntax:
	ceph osd pool create {pool-name} [{pg-num} [{pgp-num}]] [replicated] \
         [crush-rule-name] [expected-num-objects]

Example:
[root@ceph141 ~]# ceph osd pool create yinzhengjie 128 128 replicated
pool 'yinzhengjie' created
[root@ceph141 ~]# 
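
One follow-up step worth noting (not shown in the transcript above): since the Luminous release, Ceph expects every pool to be tagged with the application that will use it, otherwise "ceph health" may warn about pools with no application enabled. A minimal sketch, assuming this pool will be used by RBD:

	ceph osd pool application enable yinzhengjie rbd    # tag the pool; use cephfs/rgw instead if appropriate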


	2. Create an erasure-coded pool
Syntax:
	ceph osd pool create {pool-name} [{pg-num} [{pgp-num}]]   erasure \
         [erasure-code-profile] [crush-rule-name] [expected_num_objects] [--autoscale-mode=<on,off,warn>]

Example:
[root@ceph141 ~]# ceph osd pool create jasonyin 128 128 erasure
pool 'jasonyin' created
[root@ceph141 ~]# 
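
Note that the erasure-coded pool above reports "size 3" even though no replica count was given: it was created with the default erasure-code profile, which on a Nautilus-era cluster is typically k=2, m=1 (2 data chunks plus 1 coding chunk per object, hence size 3). As a quick check, the profile can be inspected (standard commands; the exact output depends on the release):

	ceph osd erasure-code-profile ls              # list available erasure-code profiles
	ceph osd erasure-code-profile get default     # show k, m, plugin and other parameters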



Tip: the difference between pg_num and pgp_num:
- pg_num:
    The number of PGs created for the pool.
    When PG autoscaling is enabled, the cluster can recommend or automatically adjust each pool's PG count based on expected cluster and pool utilization (see the autoscale-status sketch after the reference link below).
    Reference:
        https://docs.ceph.com/en/latest/rados/operations/placement-groups/#autoscaling-placement-groups

- pgp_num:
	The total number of PGs used for placement purposes. This should equal pg_num, except for a short period while pg_num is being increased or decreased.

Reference:
	https://docs.ceph.com/en/latest/rados/operations/pools/#creating-a-pool
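
When autoscaling is of interest, a convenient way to see what the autoscaler recommends for each pool is the command below (a sketch; it relies on the pg_autoscaler mgr module shipped with Nautilus and later):

	ceph mgr module enable pg_autoscaler     # usually already enabled on recent releases
	ceph osd pool autoscale-status           # shows SIZE, TARGET SIZE, RATE, PG_NUM, NEW PG_NUM, AUTOSCALE per pool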

2. Viewing pools

	1. List pool names
[root@ceph141 ~]# ceph osd pool ls
yinzhengjie
jasonyin
[root@ceph141 ~]# 


	2. List pools with detailed information
[root@ceph141 ~]# ceph osd pool ls detail
pool 1 'yinzhengjie' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 32 flags hashpspool stripe_width 0
pool 2 'jasonyin' erasure size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 36 flags hashpspool stripe_width 8192

[root@ceph141 ~]# 


	3. List pool names together with their pool IDs
[root@ceph141 ~]# ceph osd lspools
1 yinzhengjie
2 jasonyin
[root@ceph141 ~]# 


	4. View pool space usage
[root@ceph141 ~]# rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS  RD WR_OPS  WR USED COMPR UNDER COMPR 
jasonyin   0 B       0      0      0                  0       0        0      0 0 B      0 0 B        0 B         0 B 
yinzhengjie  0 B       0      0      0                  0       0        0      0 0 B      0 0 B        0 B         0 B 

total_objects    0
total_used       7.0 GiB
total_avail      1.9 TiB
total_space      2.0 TiB
[root@ceph141 ~]# 


	5. View I/O statistics for a specific pool
[root@ceph141 ~]# ceph osd pool stats yinzhengjie
pool yinzhengjie id 1
  nothing is going on

[root@ceph141 ~]# 


	6. Pool information also appears in the OSD dump
[root@ceph141 ~]# ceph osd dump  | grep pool
pool 1 'yinzhengjie' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 last_change 511 lfor 0/509/507 flags hashpspool stripe_width 0
pool 2 'jasonyin' erasure size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 36 flags hashpspool stripe_width 8192
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd dump 
epoch 512
fsid 5821e29c-326d-434d-a5b6-c492527eeaad
created 2024-01-31 17:46:11.238910
modified 2024-02-01 11:02:20.375752
flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
crush_version 16
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release nautilus
pool 1 'yinzhengjie' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 last_change 511 lfor 0/509/507 flags hashpspool stripe_width 0
pool 2 'jasonyin' erasure size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 36 flags hashpspool stripe_width 8192
max_osd 7
osd.0 up   in  weight 1 up_from 5 up_thru 372 down_at 0 last_clean_interval [0,0) [v2:10.0.0.141:6800/2833,v1:10.0.0.141:6801/2833] [v2:10.0.0.141:6802/2833,v1:10.0.0.141:6803/2833] exists,up 2e6612cc-fa0e-403b-9ea0-3023e6c536c6
osd.1 up   in  weight 1 up_from 9 up_thru 511 down_at 0 last_clean_interval [0,0) [v2:10.0.0.141:6808/3299,v1:10.0.0.141:6809/3299] [v2:10.0.0.141:6810/3299,v1:10.0.0.141:6811/3299] exists,up ee7ad091-20a7-4600-a94a-9c0281f8e79f
osd.2 up   in  weight 1 up_from 13 up_thru 511 down_at 0 last_clean_interval [0,0) [v2:10.0.0.142:6800/18107,v1:10.0.0.142:6801/18107] [v2:10.0.0.142:6802/18107,v1:10.0.0.142:6803/18107] exists,up 66310a40-46eb-4e47-8706-4ebc455c161d
osd.3 up   in  weight 1 up_from 17 up_thru 511 down_at 0 last_clean_interval [0,0) [v2:10.0.0.142:6808/18572,v1:10.0.0.142:6809/18572] [v2:10.0.0.142:6810/18572,v1:10.0.0.142:6811/18572] exists,up 3003810f-42ee-4a6d-bd5c-8878b9f2a307
osd.4 up   in  weight 1 up_from 21 up_thru 511 down_at 0 last_clean_interval [0,0) [v2:10.0.0.142:6816/19035,v1:10.0.0.142:6817/19035] [v2:10.0.0.142:6818/19035,v1:10.0.0.142:6819/19035] exists,up 0f234c3b-a0b9-4912-a351-f0d39ae93834
osd.5 up   in  weight 1 up_from 25 up_thru 511 down_at 0 last_clean_interval [0,0) [v2:10.0.0.143:6800/12844,v1:10.0.0.143:6801/12844] [v2:10.0.0.143:6802/12844,v1:10.0.0.143:6803/12844] exists,up 4c34a506-2fa0-47ad-9f01-1080d389dcd3
osd.6 up   in  weight 1 up_from 29 up_thru 511 down_at 0 last_clean_interval [0,0) [v2:10.0.0.143:6808/13302,v1:10.0.0.143:6809/13302] [v2:10.0.0.143:6810/13302,v1:10.0.0.143:6811/13302] exists,up 4a6082bc-ba84-41f3-94d9-daff6942517f
[root@ceph141 ~]# 


References:
	https://docs.ceph.com/en/nautilus/rados/operations/pools/
	https://docs.ceph.com/en/latest/rados/operations/pools/#list-pools
	https://docs.ceph.com/en/latest/rados/operations/pools/#showing-pool-statistics
	https://docs.ceph.com/en/nautilus/rados/operations/pools/#get-the-number-of-object-replicas

3. Modifying pool settings

	1. Get a specific pool attribute
[root@ceph141 ~]# ceph osd pool ls detail
pool 1 'yinzhengjie' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 32 flags hashpspool stripe_width 0
pool 2 'jasonyin' erasure size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 36 flags hashpspool stripe_width 8192

[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool get yinzhengjie size
size: 3
[root@ceph141 ~]# 


	2. Set a specific pool attribute
[root@ceph141 ~]# ceph osd pool set yinzhengjie size 1
set pool 1 size to 1
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool get yinzhengjie size
size: 1
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls detail
pool 1 'yinzhengjie' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 37 flags hashpspool stripe_width 0
pool 2 'jasonyin' erasure size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 36 flags hashpspool stripe_width 8192

[root@ceph141 ~]# 


	3. Disable PG autoscaling
[root@ceph141 ~]# ceph osd pool get yinzhengjie pg_autoscale_mode
pg_autoscale_mode: warn
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool set yinzhengjie pg_autoscale_mode off
set pool 1 pg_autoscale_mode to off
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool get yinzhengjie pg_autoscale_mode
pg_autoscale_mode: off
[root@ceph141 ~]# 
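
The command above only affects one existing pool. To change the default mode for newly created pools as well, the cluster-wide option can be set. A sketch, assuming the centralized config store (Mimic and later) and the osd_pool_default_pg_autoscale_mode option (Nautilus and later):

	ceph config set global osd_pool_default_pg_autoscale_mode off
	ceph config get osd osd_pool_default_pg_autoscale_mode       # verify the new default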


	4. Change the PG count
[root@ceph141 ~]# ceph osd pool set yinzhengjie pg_num 16
set pool 1 pg_num to 16
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls detail  # Note: until the target PG count is reached, the pool shows "pg_num_target" and "pgp_num_target" attributes.
pool 1 'yinzhengjie' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 25 pgp_num 24 pg_num_target 16 pgp_num_target 16 last_change 470 lfor 0/470/468 flags hashpspool stripe_width 0
pool 2 'jasonyin' erasure size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 36 flags hashpspool stripe_width 8192

[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls detail
pool 1 'yinzhengjie' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 last_change 509 lfor 0/509/507 flags hashpspool stripe_width 0
pool 2 'jasonyin' erasure size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 36 flags hashpspool stripe_width 8192

[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool get yinzhengjie pg_num 
pg_num: 16
[root@ceph141 ~]# 
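
The change from 128 down to 16 PGs does not happen instantly; PGs are merged gradually in the background. A simple way to follow the progress is to re-check the pool detail and the cluster status periodically (a sketch; watch is a standard Linux utility, not part of Ceph):

	watch -n 5 "ceph osd pool ls detail | grep yinzhengjie"    # pg_num shrinks toward pg_num_target
	ceph -s                                                    # shows the remapping/merging activity while it runs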


	5. Verify the change
[root@ceph141 ~]# ceph osd pool set yinzhengjie size 3 
set pool 1 size to 3
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd dump | grep 'replicated size'
pool 1 'yinzhengjie' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 last_change 511 lfor 0/509/507 flags hashpspool stripe_width 0
[root@ceph141 ~]# 


References:
	https://docs.ceph.com/en/latest/rados/operations/pools/#getting-pool-values
	https://docs.ceph.com/en/latest/rados/operations/pools/#setting-pool-values
	https://docs.ceph.com/en/nautilus/rados/operations/placement-groups/

4. Two protection mechanisms for deleting pools

Tip:
	Once a pool is deleted, all of its data is removed and cannot be recovered.
	For safety, Ceph therefore protects pools with two mechanisms: "nodelete" and "mon_allow_pool_delete".
		- nodelete:
			A per-pool flag. Once a pool is marked with it, the pool cannot be deleted; the default value is false.
			
		- mon_allow_pool_delete:
			A monitor option that tells all mon daemons whether pool deletion is allowed.
			
	In production, for safety, it is recommended to set each pool's nodelete attribute to "true" and keep mon_allow_pool_delete set to false.
	

Examples:
	1. nodelete example
[root@ceph141 ~]# ceph osd pool ls
yinzhengjie
jasonyin
[root@ceph141 ~]# ceph osd pool get yinzhengjie nodelete 
nodelete: false
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool get jasonyin nodelete 
nodelete: false
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool set yinzhengjie nodelete true
set pool 1 nodelete to true
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool get yinzhengjie nodelete 
nodelete: true
[root@ceph141 ~]# 


	2. mon_allow_pool_delete example
[root@ceph141 ~]# ceph osd pool ls
yinzhengjie
jasonyin
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph tell mon.* injectargs --mon_allow_pool_delete=true
mon.ceph141: injectargs:mon_allow_pool_delete = 'true' 
mon.ceph142: injectargs:mon_allow_pool_delete = 'true' 
mon.ceph143: injectargs:mon_allow_pool_delete = 'true' 
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool delete yinzhengjie yinzhengjie  --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must unset nodelete flag for the pool first
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool delete jasonyin jasonyin  --yes-i-really-really-mean-it
pool 'jasonyin' removed
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool ls
yinzhengjie
[root@ceph141 ~]# 


	3. To delete a pool, its nodelete flag must be false AND mon_allow_pool_delete must be true.
[root@ceph141 ~]# ceph tell mon.* injectargs --mon_allow_pool_delete=false
mon.ceph141: injectargs:mon_allow_pool_delete = 'false' 
mon.ceph142: injectargs:mon_allow_pool_delete = 'false' 
mon.ceph143: injectargs:mon_allow_pool_delete = 'false' 
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool delete yinzhengjie yinzhengjie  --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool set yinzhengjie nodelete false
set pool 1 nodelete to false
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool get yinzhengjie nodelete 
nodelete: false
[root@ceph141 ~]# 
[root@ceph141 ~]# ceph osd pool delete yinzhengjie yinzhengjie  --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool
[root@ceph141 ~]# 
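
Note that injectargs only changes the running mon daemons; the value is lost after a mon restart. For a persistent setting, either add mon_allow_pool_delete to ceph.conf under the [mon] section on the mon hosts, or use the centralized config store (Mimic and later). A sketch:

	ceph config set mon mon_allow_pool_delete true    # persisted in the mon config database
	ceph config get mon mon_allow_pool_delete         # verify
	# Or in /etc/ceph/ceph.conf on each mon host, then restart the mons:
	# [mon]
	# mon_allow_pool_delete = true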
