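Problem Description

Description

When a lower-version rbd or CephFS kernel client mounts a higher-version Ceph server, the mount fails with a "feature set mismatch" error that reports a missing feature mask such as 1000000000000, 200000000000000, or 400000000000000.

Pain point: the client is built into the Linux kernel, so it is updated far less often than the upstream server releases.
Yet without upgrading the Ceph server, some features stay unavailable and some bugs cannot be fixed.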

Error Log

$ sudo mount -t ceph 10.10.1.11:/ /mnt/mycephfs -o name=admin,secretfile=/etc/ceph/admin.key; sudo tail /var/log/messages
Fri May  6 22:31:14 MSK 2016
mount error 5 = Input/output error
May  6 22:31:24 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 feature set mismatch, my 103b84a842aca < server's 40103b84a842aca, missing 400000000000000
May  6 22:31:24 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 missing required protocol features
May  6 22:31:34 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 feature set mismatch, my 103b84a842aca < server's 40103b84a842aca, missing 400000000000000
May  6 22:31:34 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 missing required protocol features
May  6 22:31:44 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 feature set mismatch, my 103b84a842aca < server's 40103b84a842aca, missing 400000000000000
May  6 22:31:44 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 missing required protocol features
May  6 22:31:54 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 feature set mismatch, my 103b84a842aca < server's 40103b84a842aca, missing 400000000000000
May  6 22:31:54 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 missing required protocol features
May  6 22:32:04 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 feature set mismatch, my 103b84a842aca < server's 40103b84a842aca, missing 400000000000000
May  6 22:32:04 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 missing required protocol features

> As I guessed I need to switch off the "require_feature_tunables5" to remove the error messages.
>
> Can somebody tell me how to do that?
>
> Many thanks in advance.
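
The "missing" value in each log line is the set of bits that the server mask contains and the client mask lacks. As a quick sanity check, that arithmetic can be reproduced in the shell. This is only a sketch, assuming a POSIX shell with 64-bit arithmetic; missing_features is a hypothetical helper name, not a Ceph tool.

```shell
# Hypothetical helper: print the feature bits (hex) that are present in the
# server mask but absent from the client mask, i.e. server AND NOT client.
# Arguments: client mask, server mask (hex, with or without a 0x prefix).
# The function body runs in a subshell so its variables do not leak.
missing_features() (
    client=$(( 0x${1#0x} ))
    server=$(( 0x${2#0x} ))
    printf '%x\n' "$(( server & ~client ))"
)

# Masks taken from the log above; prints 400000000000000.
missing_features 103b84a842aca 40103b84a842aca
```

The result matches the value the kernel prints after "missing", which is the number to look up in a feature table.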

Feature and Kernel Version Table

Mapping between client and server capabilities

CEPH_FEATURE Table and Kernel Version
You can find the missing feature in this table:

For example, missing 2040000 means that CEPH_FEATURE_CRUSH_TUNABLES (40000) and CEPH_FEATURE_CRUSH_TUNABLES2 (2000000) are missing on the kernel client.
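
A mask that covers several missing features is the sum of single-bit values, so splitting it into bit positions makes the table lookup mechanical. The following is a minimal sketch, assuming a POSIX shell with 64-bit arithmetic; feature_bits is a hypothetical helper name.

```shell
# Hypothetical helper: print the positions of the bits set in a hex feature
# mask, lowest bit first, separated by spaces.
# Argument: the mask in hex, with or without a 0x prefix.
# The function body runs in a subshell so its variables do not leak.
feature_bits() (
    mask=$(( 0x${1#0x} ))
    bit=0
    out=""
    while [ "$bit" -lt 64 ]; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out }$bit"
        fi
        bit=$(( bit + 1 ))
    done
    echo "$out"
)

feature_bits 2040000            # prints: 18 25
feature_bits 400000000000000    # prints: 58
```

The printed positions correspond to the BIT column of the feature table: 18 and 25 are the two CRUSH tunables features named in the example.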

'R': required, 'S': supported, '-X-': feature is new as of this version
Feature                               BIT  HEX              3.8  3.9  3.10 3.14 3.15 3.18 4.1  4.5  4.6
CEPH_FEATURE_NOSRCADDR                1    2                R    R    R    R    R    R    R    R    R
CEPH_FEATURE_SUBSCRIBE2               4    10                                                       -R-
CEPH_FEATURE_RECONNECT_SEQ            6    40                         -R-  R    R    R    R    R    R
CEPH_FEATURE_PGID64                   9    200                   R    R    R    R    R    R    R    R
CEPH_FEATURE_PGPOOL3                  11   800                   R    R    R    R    R    R    R    R
CEPH_FEATURE_OSDENC                   13   2000                  R    R    R    R    R    R    R    R
CEPH_FEATURE_CRUSH_TUNABLES           18   40000            S    S    S    S    S    S    S    S    S
CEPH_FEATURE_MSG_AUTH                 23   800000                                    -S-  S    S    S
CEPH_FEATURE_CRUSH_TUNABLES2          25   2000000               S    S    S    S    S    S    S    S
CEPH_FEATURE_REPLY_CREATE_INODE       27   8000000               S    S    S    S    S    S    S    S
CEPH_FEATURE_OSDHASHPSPOOL            30   40000000              S    S    S    S    S    S    S    S
CEPH_FEATURE_OSD_CACHEPOOL            35   800000000                       -S-  S    S    S    S    S
CEPH_FEATURE_CRUSH_V2                 36   1000000000                      -S-  S    S    S    S    S
CEPH_FEATURE_EXPORT_PEER              37   2000000000                      -S-  S    S    S    S    S
CEPH_FEATURE_OSD_ERASURE_CODES***     38   4000000000
CEPH_FEATURE_OSDMAP_ENC               39   8000000000                           -S-  S    S    S    S
CEPH_FEATURE_CRUSH_TUNABLES3          41   20000000000                          -S-  S    S    S    S
CEPH_FEATURE_OSD_PRIMARY_AFFINITY     41*  20000000000                          -S-  S    S    S    S
CEPH_FEATURE_CRUSH_V4 ****            48   1000000000000                                  -S-  S    S
CEPH_FEATURE_CRUSH_TUNABLES5          58   400000000000000                                     -S-  S
CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING  58*  400000000000000                                     -S-  S

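Solution

Description

The simplest fix is to upgrade the client, but anyone who runs into this problem is evidently someone who cannot upgrade the client.
That leaves only the opposite approach: lowering the capabilities that the server requires.
Taking Ceph Nautilus 14.2.9 as an example,
first show the current tunables: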

$ ceph osd crush show-tunables
{
    "choose_local_tries": 0,
    "choose_local_fallback_tries": 0,
    "choose_total_tries": 50,
    "chooseleaf_descend_once": 1,
    "chooseleaf_vary_r": 1,
    "chooseleaf_stable": 0,
    "straw_calc_version": 1,
    "allowed_bucket_algs": 54,
    "profile": "hammer",
    "optimal_tunables": 0,
    "legacy_tunables": 0,
    "minimum_required_version": "jewel",
    "require_feature_tunables": 1,
    "require_feature_tunables2": 1,
    "has_v2_rules": 0,
    "require_feature_tunables3": 0,
    "has_v3_rules": 0,
    "has_v4_buckets": 1,
    "require_feature_tunables5": 1,
    "has_v5_rules": 0
}

My client kernel is 3.10, so both the 1000000000000 and the 400000000000000 errors are reported.
In the end, the mount only succeeded after turning off both require_feature_tunables5 and has_v4_buckets.
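
To see at a glance which of these two flags are still enabled, the show-tunables output can be filtered. The following is a minimal sketch: blocking_flags is a hypothetical helper name, and the flag list assumes a 3.10 client as in this post, so it would need adjusting for other kernels.

```shell
# Hypothetical helper: read `ceph osd crush show-tunables` JSON on stdin and
# print the flags named in this post that are still set to 1, one per line.
blocking_flags() {
    grep -E '"(require_feature_tunables5|has_v4_buckets)": *1' \
        | sed -E 's/^[[:space:]]*"([^"]+)".*/\1/'
}

# Usage on a live cluster (prints nothing once both flags are 0):
#   ceph osd crush show-tunables | blocking_flags
```

Run against the output shown above, it would list has_v4_buckets and require_feature_tunables5, the two items that still block a 3.10 client.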

Turning off require_feature_tunables5

List the available tunables profiles:

$ ceph osd crush tunables --help
.....
osd crush tunables legacy|argonaut|bobtail|firefly|hammer|jewel|optimal|default

Set the profile to firefly:

ceph osd crush tunables firefly
ceph osd crush reweight-all

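Turning off has_v4_buckets

We found that has_v4_buckets stayed at 1 even after trying every tunables profile.
Eventually another user found that changing every straw2 bucket in the CRUSH map to straw solves it, since straw2 buckets are what the CRUSH_V4 feature (1000000000000) covers.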

# Fetch the CRUSH map (binary format)
$ sudo ceph osd getcrushmap -o crushmap.txt
# Decompile the CRUSH map
$ crushtool -d crushmap.txt -o crushmap-decompile
# Back it up before making changes
$ cp crushmap-decompile bakcrushmap
# Change every straw2 to straw
$ sed -i "s/straw2/straw/g" crushmap-decompile
# Compile the CRUSH map
$ crushtool -c crushmap-decompile -o crushmap-compiled
# Apply the new CRUSH map
$ sudo ceph osd setcrushmap -i crushmap-compiled
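
Between the edit and the compile step, it can help to confirm that no straw2 bucket is left in the decompiled map. The sketch below is a stricter variant of the sed step, not the exact command used in this post: it only rewrites lines of the form "alg straw2", and downgrade_straw2 and count_straw2 are hypothetical helper names.

```shell
# Hypothetical helpers operating on decompiled CRUSH map text from stdin.

# Rewrite only bucket algorithm lines ("alg straw2"), leaving any other
# occurrence of the string untouched.
downgrade_straw2() {
    sed -E 's/^([[:space:]]*alg[[:space:]]+)straw2[[:space:]]*$/\1straw/'
}

# Count the bucket algorithm lines that still use straw2.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_straw2() {
    grep -cE '^[[:space:]]*alg[[:space:]]+straw2[[:space:]]*$' || true
}

# Usage with the file names from the steps above:
#   downgrade_straw2 < bakcrushmap > crushmap-decompile
#   count_straw2 < crushmap-decompile    # 0 means no straw2 bucket remains
```

A count of 0 means the recompiled map should no longer contain any straw2 bucket.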

References

http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client/
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-May/009634.html
https://ceph.io/planet/ceph-的crush算法-straw/
https://blog.csdn.net/tiankai517/article/details/50221931?locationNum=3&fps=1

posted on 2020-12-31 18:55  步孤天