Building and Using blktrace

I used blktrace while tuning SSD performance. This post is a record of how I used it.

Contents

  1. Introduction to blktrace
  2. Porting blktrace to aarch64
  3. Usage
  4. Example
  5. Data analysis
  6. Example: using blktrace to analyze poor SSD performance

1. Introduction to blktrace

First we need to know what actually happens to each I/O submitted to the block I/O layer, as shown in the figure below.

image

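blktrace provides the following:

  1. Detailed block-layer information about individual I/Os
  2. A low-overhead kernel tracing mechanism (under relatively heavy I/O load, the impact on application performance is less than 2%)
  3. Configurability:
    1. One or more physical or logical devices can be specified
    2. Events are user-selectable: filters can be applied when events are captured and/or when the output is formatted
  4. Support for both "live" and "playback" tracing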

The overall architecture of blktrace is as follows:

image

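2. Porting blktrace to aarch64

First, a brief introduction to blktrace, blkparse, and btt: blktrace collects the trace data, blkparse parses the collected data, and btt helps with analysis and statistics.

The blktrace options are listed below. The commonly used ones are:

  • -d dev # add a device to trace
  • -o file # specify the output file name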
root@a1000:~# blktrace --help
blktrace: unrecognized option '--help'
Usage: blktrace

-d <dev>             | --dev=<dev>
[ -r <debugfs path>  | --relay=<debugfs path> ]
[ -o <file>          | --output=<file>]
[ -D <dir>           | --output-dir=<dir>
[ -w <time>          | --stopwatch=<time>]
[ -a <action field>  | --act-mask=<action field>]
[ -A <action mask>   | --set-mask=<action mask>]
[ -b <size>          | --buffer-size]
[ -n <number>        | --num-sub-buffers=<number>]
[ -l                 | --listen]
[ -h <hostname>      | --host=<hostname>]
[ -p <port number>   | --port=<port number>]
[ -s                 | --no-sendfile]
[ -I <devs file>     | --input-devs=<devs file>]
[ -v <version>       | --version]
[ -V <version>       | --version]
        -d Use specified device. May also be given last after options
        -r Path to mounted debugfs, defaults to /sys/kernel/debug
        -o File(s) to send output to
        -D Directory to prepend to output file names
        -w Stop after defined time, in seconds
        -a Only trace specified actions. See documentation
        -A Give trace mask as a single value. See documentation
        -b Sub buffer size in KiB (default 512)
        -n Number of sub buffers (default 4)
        -l Run in network listen mode (blktrace server)
        -h Run in network client mode, connecting to the given host
        -p Network port to use (default 8462)
        -s Make the network client NOT use sendfile() to transfer data
        -I Add devices found in <devs file>
        -v Print program version info
        -V Print program version info

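The blkparse options are listed below. The commonly used ones are:

  • -i input # input file containing the trace data
  • -o output # output file; if not given, output goes to stdout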
root@a1000:~# blkparse -h
Usage: blkparse

-i <file>           | --input=<file>
[ -a <action field> | --act-mask=<action field> ]
[ -A <action mask>  | --set-mask=<action mask> ]
[ -b <traces>       | --batch=<traces> ]
[ -d <file>         | --dump-binary=<file> ]
[ -D <dir>          | --input-directory=<dir> ]
[ -f <format>       | --format=<format> ]
[ -F <spec>         | --format-spec=<spec> ]
[ -h                | --hash-by-name ]
[ -o <file>         | --output=<file> ]
[ -O                | --no-text-output ]
[ -q                | --quiet ]
[ -s                | --per-program-stats ]
[ -t                | --track-ios ]
[ -w <time>         | --stopwatch=<time> ]
[ -M                | --no-msgs
[ -v                | --verbose ]
[ -V                | --version ]

        -a Only trace specified actions. See documentation
        -A Give trace mask as a single value. See documentation
        -b stdin read batching
        -d Output file. If specified, binary data is written to file
        -D Directory to prepend to input file names
        -f Output format. Customize the output format. The format field
           identifies can be found in the documentation
        -F Format specification. Can be found in the documentation
        -h Hash processes by name, not pid
        -i Input file containing trace data, or '-' for stdin
        -o Output file. If not given, output is stdout
        -O Do NOT output text data
        -q Quiet. Don't display any stats at the end of the trace
        -s Show per-program io statistics
        -t Track individual ios. Will tell you the time a request took
           to get queued, to get dispatched, and to get completed
        -w Only parse data between the given time interval in seconds.
           If 'start' isn't given, blkparse defaults the start time to 0
        -M Do not output messages to binary file
        -v More verbose for marginal errors
        -V Print program version info

The btt options are listed below. The commonly used one is:

  • -i input # input file
root@a1000:~# btt -h
Usage: btt
[ -a               | --seek-absolute ]
[ -A               | --all-data ]
[ -B <output name> | --dump-blocknos=<output name> ]
[ -d <seconds>     | --range-delta=<seconds> ]
[ -D <dev;...>     | --devices=<dev;...> ]
[ -e <exe,...>     | --exes=<exe,...>  ]
[ -h               | --help ]
[ -i <input name>  | --input-file=<input name> ]
[ -I <output name> | --iostat=<output name> ]
[ -l <output name> | --d2c-latencies=<output name> ]
[ -L <freq>        | --periodic-latencies=<freq> ]
[ -m <output name> | --seeks-per-second=<output name> ]
[ -M <dev map>     | --dev-maps=<dev map>
[ -o <output name> | --output-file=<output name> ]
[ -p <output name> | --per-io-dump=<output name> ]
[ -P <output name> | --per-io-trees=<output name> ]
[ -q <output name> | --q2c-latencies=<output name> ]
[ -Q <output name> | --active-queue-depth=<output name> ]
[ -r               | --no-remaps ]
[ -s <output name> | --seeks=<output name> ]
[ -S <interval>    | --iostat-interval=<interval> ]
[ -t <sec>         | --time-start=<sec> ]
[ -T <sec>         | --time-end=<sec> ]
[ -u <output name> | --unplug-hist=<output name> ]
[ -V               | --version ]
[ -v               | --verbose ]
[ -X               | --easy-parse-avgs ]
[ -z <output name> | --q2d-latencies=<output name> ]
[ -Z               | --do-active

4. Example

  1. Start tracing /dev/nvme0n1p1 in the background, then run a dd read against the device.
root@a1000:~# blktrace -d /dev/nvme0n1p1&
[1] 418
root@a1000:~# dd of=/dev/null if=/dev/nvme0n1p1 bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 0.74305 s, 723 MB/s
root@a1000:~# kill -9 418
[1]+  Killed                  blktrace -d /dev/nvme0n1p1

  2. Use blkparse to parse the file collected by blktrace. Here -d writes a binary file that btt can analyze, and -o writes the parsed text output to nvme_data, which you can inspect yourself.
root@a1000:~# ls -lh nvme0n1p1.blktrace.0
-rw-r--r-- 1 root root 8.0M Jan 18 17:10 nvme0n1p1.blktrace.0
root@a1000:~# blkparse -i nvme0n1p1 -d nvme.blktrace.bin -o nvme_data
Input file nvme0n1p1.blktrace.0 added
Bad magic 0

The raw contents of nvme_data look like this:
image

  3. Analyze nvme.blktrace.bin with btt.
root@a1000:~# btt -i nvme.blktrace.bin
==================== All Devices ====================

            ALL           MIN           AVG           MAX           N
--------------- ------------- ------------- ------------- -----------

Q2Q               0.000151718   0.000393516   0.009215535        2912
Q2G               0.000002172   0.000002721   0.000030020        2912
D2C               0.000366797   0.000482912   0.000821314        2911
Q2C               0.000381274   0.000497561   0.000835603        2911

==================== Device Overhead ====================

       DEV |       Q2G       G2I       Q2M       I2D       D2C
---------- | --------- --------- --------- --------- ---------
 (259,  3) |   0.5470%   0.0000%   0.0000%   0.0000%  97.0557%
---------- | --------- --------- --------- --------- ---------
   Overall |   0.5470%   0.0000%   0.0000%   0.0000%  97.0557%

==================== Device Merge Information ====================

       DEV |       #Q       #D   Ratio |   BLKmin   BLKavg   BLKmax    Total
---------- | -------- -------- ------- | -------- -------- -------- --------
 (259,  3) |     2913     2912     1.0 |      256      255      256   745216

==================== Device Q2Q Seek Information ====================

       DEV |          NSEEKS            MEAN          MEDIAN | MODE
---------- | --------------- --------------- --------------- | ---------------
 (259,  3) |            2913             0.7               0 | 0(2912)
---------- | --------------- --------------- --------------- | ---------------
   Overall |          NSEEKS            MEAN          MEDIAN | MODE
   Average |            2913             0.7               0 | 0(2912)

==================== Device D2D Seek Information ====================

       DEV |          NSEEKS            MEAN          MEDIAN | MODE
---------- | --------------- --------------- --------------- | ---------------
 (259,  3) |            2912             0.7               0 | 0(2911)
---------- | --------------- --------------- --------------- | ---------------
   Overall |          NSEEKS            MEAN          MEDIAN | MODE
   Average |            2912             0.7               0 | 0(2911)

==================== Plug Information ====================

       DEV |    # Plugs # Timer Us  | % Time Q Plugged
---------- | ---------- ----------  | ----------------
 (259,  3) |       2912(         0) |   0.252751036%

       DEV |    IOs/Unp   IOs/Unp(to)
---------- | ----------   ----------
 (259,  2) |        0.0          0.0
 (259,  3) |        1.0          0.0
---------- | ----------   ----------
   Overall |    IOs/Unp   IOs/Unp(to)
   Average |        1.0          0.0

==================== Active Requests At Q Information ====================

       DEV |  Avg Reqs @ Q
---------- | -------------
 (259,  3) |           0.0

==================== I/O Active Period Information ====================

       DEV |     # Live      Avg. Act     Avg. !Act % Live
---------- | ---------- ------------- ------------- ------
 (259,  2) |          0   0.000000000   0.000000000   0.00
 (259,  3) |       1331   0.000716134   0.000144799  83.19
---------- | ---------- ------------- ------------- ------
 Total Sys |       1331   0.000716134   0.000144799  83.19

# Total System
#     Total System : q activity
  0.000006160   0.0
  0.000006160   0.4
  1.145925500   0.4
  1.145925500   0.0

#     Total System : c activity
  0.000537965   0.5
  0.000537965   0.9
  1.145756949   0.9
  1.145756949   0.5

# Per device
#            259,3 : q activity
  0.000006160   1.0
  0.000006160   1.4
  1.145925500   1.4
  1.145925500   1.0

#            259,3 : c activity
  0.000537965   1.5
  0.000537965   1.9
  1.145756949   1.9
  1.145756949   1.5

# Per process
#               dd : q activity
  0.000006160   2.0
  0.000006160   2.4
  1.145925500   2.4
  1.145925500   2.0

#               dd : c activity

#              irq : q activity

#              irq : c activity
  0.000537965   3.5
  0.000537965   3.9
  1.145756949   3.9
  1.145756949   3.5
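
The three steps of this example (capture, parse, summarize) can be sketched as a small script. This is only an illustration, not the commands used in the session above: it uses the -w option from the blktrace help text so the capture stops on its own after a fixed time, and the function name `build_pipeline` and the `basename` parameter are hypothetical. The script only builds and prints the command lines; running them requires root and a real device.

```python
import shlex


def build_pipeline(device, seconds, basename="trace"):
    """Return argv lists for a timed capture, a parse step, and a btt summary.

    Only options shown in the help output above are used:
    blktrace -d/-o/-w, blkparse -i/-d/-o, btt -i.
    """
    capture = ["blktrace", "-d", device, "-o", basename, "-w", str(seconds)]
    # blkparse reads the per-CPU files produced by the capture step and writes
    # both a merged binary file (for btt) and a text file (for manual reading).
    parse = ["blkparse", "-i", basename,
             "-d", f"{basename}.bin", "-o", f"{basename}.txt"]
    summarize = ["btt", "-i", f"{basename}.bin"]
    return [capture, parse, summarize]


if __name__ == "__main__":
    # Assumed values: the device from this post and a 10-second capture window.
    for argv in build_pipeline("/dev/nvme0n1p1", 10, basename="nvme"):
        print(shlex.join(argv))
```

The workload to be measured (dd in this post) would be started while the capture step is running.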

5. Data analysis

image

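The seventh field is not labeled in the figure above. It is the operation type, with the following meanings:

"R" for Read, "W" for Write, "D" for block Discard, "B" for Barrier operation.

The sixth field, Event, is explained below: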

image
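
To make the field positions concrete, here is a minimal sketch that splits one line of blkparse text output into named fields. It assumes the default blkparse output format (device, CPU, sequence number, timestamp, PID, event, operation type, then request details). The sample line is illustrative and is not taken from the trace in this post, and the names `TraceEvent` and `parse_line` are hypothetical.

```python
from typing import NamedTuple, Optional


class TraceEvent(NamedTuple):
    device: str       # field 1: "major,minor"
    cpu: int          # field 2
    sequence: int     # field 3
    timestamp: float  # field 4: seconds since the start of the trace
    pid: int          # field 5
    event: str        # field 6: Q, G, I, M, D, C, ...
    rwbs: str         # field 7: operation type (R, W, D, B, ...)
    detail: str       # remainder, e.g. "sector + blocks [process]"


def parse_line(line: str) -> Optional[TraceEvent]:
    """Parse one event line; return None for summary or malformed lines."""
    parts = line.split(None, 7)
    if len(parts) < 7:
        return None
    try:
        return TraceEvent(parts[0], int(parts[1]), int(parts[2]),
                          float(parts[3]), int(parts[4]), parts[5], parts[6],
                          parts[7].strip() if len(parts) > 7 else "")
    except ValueError:
        # Non-numeric columns: this is not a per-event line.
        return None


if __name__ == "__main__":
    sample = "8,0    3        1     0.000000000   697  G   W 223490 + 8 [kjournald]"
    print(parse_line(sample))
```

Lines from the statistics block that blkparse prints at the end do not have numeric CPU and sequence columns, so they come back as None and can be skipped.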

 Q------->G------------>I--------->M------------------->D----------------------------->C
 |-Q time-|-Insert time-|
 |--------- merge time ------------|-merge with other IO|
 |--------------------scheduler time--------------------|--driver,adapter,storage time--|
 
 |----------------------- await time in iostat output ----------------------------------|

Where:
Q2Q: time between requests sent to the block layer
Q2G: time from a block I/O is queued to the time it gets a request allocated for it
G2I: time from a request is allocated to the time it is inserted into the device's queue
Q2M: time from a block I/O is queued to the time it gets merged with an existing request
I2D: time from a request is inserted into the device's queue to the time it is actually issued to the device
M2D: time from a block I/O is merged with an existing request until the request is issued to the device
D2C: service time of the request by the device
Q2C: total time spent in the block layer for a request
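
As a rough illustration of these definitions, the sketch below derives the per-phase times for a single request from the timestamps of its events. The timestamps are made up for the example, the function name `phase_latencies` is hypothetical, and only the main Q, G, I, D, C path is covered (the merge phases Q2M and M2D are left out).

```python
def phase_latencies(times):
    """Compute phase durations for one request.

    `times` maps an event letter (Q, G, I, D, C) to its timestamp in seconds.
    A phase is omitted when either of its two events is missing.
    """
    phases = [
        ("Q2G", "Q", "G"),
        ("G2I", "G", "I"),
        ("I2D", "I", "D"),
        ("D2C", "D", "C"),
        ("Q2C", "Q", "C"),
    ]
    result = {}
    for name, start, end in phases:
        if start in times and end in times:
            result[name] = times[end] - times[start]
    return result


if __name__ == "__main__":
    # Hypothetical timestamps for one request, not taken from the trace above.
    request = {"Q": 0.000000, "G": 0.000002, "I": 0.000003,
               "D": 0.000010, "C": 0.000500}
    for name, seconds in phase_latencies(request).items():
        print(f"{name}: {seconds * 1e6:.1f} us")
```

In a real trace the events would first have to be grouped per request, for example by device and starting sector, before applying this calculation.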

The data produced by blkparse is still hard to analyze directly. btt summarizes it.
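
One way to read the btt summary from the example: the percentages in the "Device Overhead" table appear to be each phase's share of the total Q2C time. The sketch below checks this against the averages from the "All Devices" table. That interpretation is an assumption, and because the averages are rounded the result only matches the table to a few decimal places. The name `share_of_q2c` is hypothetical.

```python
# Average times in seconds, copied from the "All Devices" table in the example.
BTT_AVG = {"Q2G": 0.000002721, "D2C": 0.000482912, "Q2C": 0.000497561}


def share_of_q2c(averages):
    """Return each phase's average as a percentage of the average Q2C time."""
    total = averages["Q2C"]
    return {name: 100.0 * value / total
            for name, value in averages.items() if name != "Q2C"}


if __name__ == "__main__":
    for name, percent in share_of_q2c(BTT_AVG).items():
        print(f"{name}: {percent:.2f}% of Q2C")
```

With these numbers D2C comes out at roughly 97% and Q2G at roughly 0.5%, close to the 97.0557% and 0.5470% that btt reports. In other words, almost all of the time in this trace is spent in the device itself rather than in the block layer.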

6. Example: using blktrace to analyze poor SSD performance

posted @ 2022-04-23 16:36  yooooooo