A few tips on using LTTng and logs to analyze Ceph

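LTTng (Linux Trace Toolkit Next Generation) is a software package for tracing the Linux kernel, applications, and libraries. It consists mainly of kernel modules and shared libraries (the latter are used to trace applications and other shared libraries). It is controlled by a session daemon that accepts commands from a command-line interface. The babeltrace project converts trace data into human-readable logs and provides a trace-reading library, libbabeltrace. The Ceph code base embeds a large number of tracepoints that can be traced with LTTng.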

 

Enabling tracing in the configuration

First, install LTTng on Linux with apt or yum.

With apt

 $ sudo apt-get update
 $ sudo apt-get install lttng-tools lttng-modules-dkms babeltrace

 

With yum

$ sudo yum install lttng-tools lttng-ust
$ sudo yum install babeltrace   ## tool for viewing trace results

 

Run ceph daemon /var/run/ceph/ceph-osd.0.asok config show on the command line to see which modules have a tracing option. The rest of this post uses librbd as the example.

"event_tracing": "false"
"osd_function_tracing": "false"
"osd_objectstore_tracing": "false"
"osd_tracing": "false"
"rados_tracing": "false"
"rbd_tracing": "false"
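The full config show output is long, so it can help to filter it down to the tracing switches. The following is a minimal sketch, assuming the command prints a flat JSON object of option names and string values (as the excerpt above suggests); the helper name tracing_options and the SAMPLE text are illustrative only and not part of Ceph.

```python
import json


def tracing_options(config_json: str) -> dict:
    """Return only the options whose name ends in '_tracing'."""
    config = json.loads(config_json)
    return {name: value for name, value in config.items()
            if name.endswith("_tracing")}


# Illustrative input: a cut-down stand-in for the real `config show` output.
SAMPLE = """{
    "event_tracing": "false",
    "osd_tracing": "false",
    "rbd_cache": "false",
    "rbd_tracing": "false"
}"""

if __name__ == "__main__":
    for name, value in sorted(tracing_options(SAMPLE).items()):
        print(f"{name} = {value}")
```

In practice the JSON text would come from the command's standard output, for example by piping it into a script that reads sys.stdin.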

 

Edit the default ceph.conf and set rbd_tracing to true:
~# vim /etc/ceph/ceph.conf

[global]
fsid = xxxxxxxx
public_network = xxxxxx/24
cluster_network = xxxxx/24
mon_initial_members = xxxxxxx, xxxxxx, xxxxxxx
mon_host = xxxxxx,xxxxxx,xxxxxx
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd_pool_default_size=3
osd_pool_default_min_size=2
osd_journal_size=100
filestore_xattr_use_omap=true
osd_pool_default_pg_num=128
osd_pool_default_pgp_num=128
osd_crush_chooseleaf_type=0
rbd_cache=false
rbd_tracing=true ##------- set to true

 

Note: running ceph tell mon.* injectargs "--rbd_tracing false" does not change this setting. You must edit the configuration file and restart the cluster.

 

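Listing the available tracepoints with lttng

LTTng can only trace a process that is currently running. That goes without saying for daemons such as mon and osd; to trace librbd, you must make sure that a process which has loaded librbd.so is running.

 ./rbd_example  ## keeps running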
 lttng list -u

UST events:
-------------

PID: 12039 - Name: ./rbd_example
      lttng_ust_tracelog:TRACE_DEBUG (loglevel: TRACE_DEBUG (14)) (type: tracepoint)
      lttng_ust_tracelog:TRACE_DEBUG_LINE (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      lttng_ust_tracelog:TRACE_DEBUG_FUNCTION (loglevel: TRACE_DEBUG_FUNCTION (12)) (type: tracepoint)
      lttng_ust_tracelog:TRACE_DEBUG_UNIT (loglevel: TRACE_DEBUG_UNIT (11)) (type: tracepoint)
      lttng_ust_tracelog:TRACE_DEBUG_MODULE (loglevel: TRACE_DEBUG_MODULE (10)) (type: tracepoint)
      lttng_ust_tracelog:TRACE_DEBUG_PROCESS (loglevel: TRACE_DEBUG_PROCESS (9)) (type: tracepoint)
      lttng_ust_tracelog:TRACE_DEBUG_PROGRAM (loglevel: TRACE_DEBUG_PROGRAM (8)) (type: tracepoint)
      lttng_ust_tracelog:TRACE_DEBUG_SYSTEM (loglevel: TRACE_DEBUG_SYSTEM (7)) (type: tracepoint)
      lttng_ust_tracelog:TRACE_INFO (loglevel: TRACE_INFO (6)) (type: tracepoint)
      lttng_ust_tracelog:TRACE_NOTICE (loglevel: TRACE_NOTICE (5)) (type: tracepoint)
        .
        .
        .
        .
        .
        .
        .
        .
      librbd:open_image_enter (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      librbd:write_exit (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      librbd:write2_enter (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      librbd:write_enter (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      librbd:read_iterate2_exit (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      librbd:read_iterate2_enter (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      librbd:read_iterate_exit (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      librbd:read_iterate_enter (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      librbd:read_exit (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      librbd:read2_enter (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      librbd:read_enter (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
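Because the listing is long, grouping the event names by provider makes it easier to see what librbd exposes. The sketch below assumes the text layout shown above (provider:event followed by a loglevel field); the function name events_by_provider and the SAMPLE text are hypothetical helpers for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   librbd:write_enter (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
EVENT_RE = re.compile(r"^\s*(\w+):(\w+) \(loglevel:")


def events_by_provider(listing: str) -> dict:
    """Group tracepoint names from `lttng list -u` text by provider."""
    providers = defaultdict(list)
    for line in listing.splitlines():
        match = EVENT_RE.match(line)
        if match:
            providers[match.group(1)].append(match.group(2))
    return dict(providers)


# Illustrative input copied from the listing format above.
SAMPLE = """\
PID: 12039 - Name: ./rbd_example
      lttng_ust_tracelog:TRACE_INFO (loglevel: TRACE_INFO (6)) (type: tracepoint)
      librbd:open_image_enter (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      librbd:write_exit (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      librbd:write_enter (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
"""

if __name__ == "__main__":
    for provider, names in events_by_provider(SAMPLE).items():
        print(provider, len(names), "tracepoints")
```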

 

Tracing and collecting data

mkdir -p traces ## create a directory for the trace output
lttng create -o traces librbd ## create the trace session
lttng enable-event -u 'librbd:*' ## enable the events of interest
lttng add-context -u -t pthread_id ## add thread information
lttng start ## start tracing
# run RBD workload here
lttng stop ## stop tracing

lttng destroy ## destroy the session
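If the same session is recorded repeatedly, the steps above can be wrapped in a small script. This is only a sketch under stated assumptions: the lttng commands are exactly the ones listed above, while the function names session_commands and trace_workload and the workload argument are hypothetical. It stops and destroys the session even when the workload fails.

```python
import os
import subprocess


def session_commands(session="librbd", out_dir="traces", events="librbd:*"):
    """Build the lttng command lines used above (setup and teardown)."""
    setup = [
        ["lttng", "create", "-o", out_dir, session],
        ["lttng", "enable-event", "-u", events],
        ["lttng", "add-context", "-u", "-t", "pthread_id"],
        ["lttng", "start"],
    ]
    teardown = [["lttng", "stop"], ["lttng", "destroy"]]
    return setup, teardown


def trace_workload(workload_argv, out_dir="traces"):
    """Record one run of workload_argv (a hypothetical RBD workload)."""
    os.makedirs(out_dir, exist_ok=True)
    setup, teardown = session_commands(out_dir=out_dir)
    for cmd in setup:
        subprocess.run(cmd, check=True)
    try:
        subprocess.run(workload_argv, check=True)
    finally:
        # Always stop and destroy the session, even if the workload failed.
        for cmd in teardown:
            subprocess.run(cmd, check=False)
```

A call such as trace_workload(["./my_rbd_workload"]) would then record one run, where ./my_rbd_workload stands in for whatever program drives librbd.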

 

Check the traces directory to confirm that trace records were generated.

 

Reading the results with babeltrace

babeltrace traces > result.all

 

[10:17:31.802322370] (+?.?????????) XXXXXXXXX librbd:aio_complete_enter: { cpu_id = 2 }, { pthread_id = 139658738509568 }, { completion = 0x5635FA045920, rval = 0 }
[10:17:31.802361060] (+0.000038690) XXXXXXXXX librbd:aio_get_return_value_enter: { cpu_id = 2 }, { pthread_id = 139658738509568 }, { completion = 0x5635FA045920 }
[10:17:31.802362582] (+0.000001522) XXXXXXXXX librbd:aio_get_return_value_exit: { cpu_id = 2 }, { pthread_id = 139658738509568 }, { retval = 0 }
[10:17:31.802399704] (+0.000037122) XXXXXXXXX librbd:aio_complete_exit: { cpu_id = 2 }, { pthread_id = 139658738509568 }, { }
[10:17:31.802522131] (+0.000122427) XXXXXXXXX librbd:write_exit: { cpu_id = 2 }, { pthread_id = 139659290902208 }, { retval = 10485760 }
.
.
.
.
.

[10:17:34.397260832] (+0.000000840) XXXXXXXXX librbd:aio_get_return_value_exit: { cpu_id = 2 }, { pthread_id = 139658738509568 }, { retval = 0 }
[10:17:34.397273502] (+0.000012670) XXXXXXXXX librbd:aio_complete_exit: { cpu_id = 2 }, { pthread_id = 139658738509568 }, { }
[10:17:34.397364800] (+0.000091298) XXXXXXXXX librbd:write_exit: { cpu_id = 2 }, { pthread_id = 139659290902208 }, { retval = 10485760 }
[10:17:34.650313545] (+0.252948745) XXXXXXXXX librbd:write_enter: { cpu_id = 2 }, { pthread_id = 139659290902208 }, { imagectx = 0x5635FA04B4B0, name = "librbd_test", snap_name = "", read_only = 0, off = 0, buf = 0x5635FA053330, buf_isnull = 0, buf_len = 10485760 }
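One way to use this output is to pair each _enter event with the matching _exit event on the same thread and compute how long the call took. The sketch below assumes the babeltrace text layout shown above with nine-digit fractional seconds and the pthread_id context enabled; the names LINE_RE and call_durations are illustrative. It pairs events by exact name only, so a write2_enter would not be matched with a write_exit, and it does not handle traces that cross midnight.

```python
import re
from collections import defaultdict

# [HH:MM:SS.nnnnnnnnn] (+delta) host provider:name_enter|exit: ... pthread_id = N
LINE_RE = re.compile(
    r"^\[(\d+):(\d+):(\d+)\.(\d{9})\]\s+\(\+[^)]*\)\s+\S+\s+"
    r"(\w+):(\w+)_(enter|exit):.*?pthread_id = (\d+)"
)


def call_durations(lines):
    """Return {provider:call: [durations in ns]} for enter/exit pairs."""
    open_calls = {}               # (thread id, provider, call) -> start time in ns
    durations = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        hours, minutes, seconds, nanos, provider, call, phase, tid = match.groups()
        stamp = ((int(hours) * 3600 + int(minutes) * 60 + int(seconds))
                 * 10**9 + int(nanos))
        key = (tid, provider, call)
        if phase == "enter":
            open_calls[key] = stamp
        elif key in open_calls:   # an exit without a recorded enter is skipped
            durations[f"{provider}:{call}"].append(stamp - open_calls.pop(key))
    return dict(durations)


# The first five records from the output above.
SAMPLE = [
    "[10:17:31.802322370] (+?.?????????) XXXXXXXXX librbd:aio_complete_enter: { cpu_id = 2 }, { pthread_id = 139658738509568 }, { completion = 0x5635FA045920, rval = 0 }",
    "[10:17:31.802361060] (+0.000038690) XXXXXXXXX librbd:aio_get_return_value_enter: { cpu_id = 2 }, { pthread_id = 139658738509568 }, { completion = 0x5635FA045920 }",
    "[10:17:31.802362582] (+0.000001522) XXXXXXXXX librbd:aio_get_return_value_exit: { cpu_id = 2 }, { pthread_id = 139658738509568 }, { retval = 0 }",
    "[10:17:31.802399704] (+0.000037122) XXXXXXXXX librbd:aio_complete_exit: { cpu_id = 2 }, { pthread_id = 139658738509568 }, { }",
    "[10:17:31.802522131] (+0.000122427) XXXXXXXXX librbd:write_exit: { cpu_id = 2 }, { pthread_id = 139659290902208 }, { retval = 10485760 }",
]

if __name__ == "__main__":
    for call, values in call_durations(SAMPLE).items():
        print(call, values)
```

For a real trace, the lines would come from the result.all file written by the babeltrace command above, for example call_durations(open("result.all")).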

 

Enabling logging

To use logging as well, edit ceph.conf in the same way and add the [client] section shown below:

~# vim /etc/ceph/ceph.conf

[global]
fsid = xxxxxxxx
public_network = xxxxxx/24
cluster_network = xxxxx/24
mon_initial_members = xxxxxxx, xxxxxx, xxxxxxx
mon_host = xxxxxx,xxxxxx,xxxxxx
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd_pool_default_size=3
osd_pool_default_min_size=2
osd_journal_size=100
filestore_xattr_use_omap=true
osd_pool_default_pg_num=128
osd_pool_default_pgp_num=128
osd_crush_chooseleaf_type=0
rbd_cache=false
rbd_tracing=true ##------- set to true


[client]
debug rbd = 20 ## subsystem to log and its debug level
debug rados = 20 ## subsystem to log and its debug level
log file = /var/log/ceph/ceph-client.log ## log output location
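Before restarting anything, it can be worth checking that the edited file really contains the intended values. The sketch below uses Python's configparser as a rough stand-in; it is not Ceph's own parser, and the helper name load_settings and the SAMPLE text are assumptions for illustration. Spaces in option names are normalized to underscores so that "debug rbd" and "debug_rbd" compare equal.

```python
import configparser


def load_settings(conf_text: str) -> dict:
    """Parse ceph.conf-style text into {section: {option: value}}."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("#", ";"),  # drop trailing '## ...' comments
        strict=False,
        interpolation=None,
    )
    parser.read_string(conf_text)
    return {
        section: {key.replace(" ", "_"): value
                  for key, value in parser[section].items()}
        for section in parser.sections()
    }


# Illustrative input: a cut-down version of the file shown above.
SAMPLE = """\
[global]
rbd_cache=false
rbd_tracing=true ## set to true

[client]
debug rbd = 20
debug rados = 20
log file = /var/log/ceph/ceph-client.log
"""

if __name__ == "__main__":
    settings = load_settings(SAMPLE)
    print("rbd_tracing:", settings["global"]["rbd_tracing"])
    print("client log file:", settings["client"]["log_file"])
```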

 

Conclusion

Using LTTng together with logs makes it easier to analyze how Ceph operates.

posted on 2021-01-05 18:02  睿江云