
Release DISKSPD 2.2 · microsoft/diskspd · GitHub

 

What is DiskSpd?

DiskSpd is a powerful storage performance testing tool developed by Microsoft, used primarily to measure the I/O performance of disks, storage devices, and systems. It is a command-line tool that lets users simulate different workload patterns (such as sequential read, sequential write, random read, and random write) to evaluate the performance of a disk or storage system. DiskSpd is highly configurable, exposing many parameters so that tests can be tailored precisely.

Why use DiskSpd?

  1. Efficiency: Developed by Microsoft and highly optimized, DiskSpd can drive modern storage systems to their limits and produce accurate performance data.

  2. Powerful features: DiskSpd can simulate many I/O patterns, including sequential read/write, random read/write, and mixed workloads, and supports tuning a large set of test parameters such as block size, thread count, file size, and caching behavior.

  3. Broad support: It runs on multiple Windows versions and across a wide range of hardware configurations and storage media, including hard disks, solid-state drives (SSDs), and other storage devices.

  4. Flexible configuration: DiskSpd's options can express simple or complex test scenarios, helping users understand a storage device's performance characteristics in depth.

  5. Suitable for enterprises and developers: Enterprise IT teams, storage administrators, and developers can all use DiskSpd to test and tune storage systems, or to evaluate how storage hardware behaves under different loads.

Key features and capabilities of DiskSpd

  1. Multiple I/O patterns

    • Sequential read/write
    • Random read/write
    • Mixed workloads (read/write ratio set as a percentage)
  2. Flexible test configuration

    • File size, block size, I/O request size
    • Thread count, number of files
    • Number of outstanding I/O operations per thread
    • Control over disk caching (e.g. enabling or disabling caches)
  3. Performance metrics

    • IOPS (input/output operations per second)
    • Throughput (bandwidth, typically measured in MB/s or GB/s)
    • Latency (time per operation, typically in milliseconds)
  4. Fine-grained control of system resources

    • Customizable disk and CPU affinity for precise control of resource usage during a test.
    • Event-based synchronization and waiting, for precise control of test timing.
  5. Multi-thread and multi-file support

    • Multiple threads can issue concurrent I/O, simulating multi-user or high-concurrency workloads.
    • Multiple test files can be specified to model more complex scenarios.
  6. Logging and reporting

    • DiskSpd produces detailed output recording all performance metrics collected during a test.
    • Results can be post-processed from the command line or with scripts to generate performance reports.
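The three metrics in item 3 are linked by simple arithmetic: throughput equals IOPS times the block size, and by Little's law, average latency is roughly the number of outstanding I/Os divided by IOPS. A minimal Python sketch; the numbers are made-up illustrative values, not real DiskSpd output:

```python
# Arithmetic linking the three metrics DiskSpd reports.
# All numbers below are illustrative, not real DiskSpd output.
BLOCK_SIZE = 4 * 1024   # bytes per I/O, as set with -b4K
iops = 50_000           # hypothetical I/Os per second
outstanding = 32        # queue depth, as set with -o32 (single thread)

# Throughput is IOPS times the block size.
throughput_mb_s = iops * BLOCK_SIZE / (1024 * 1024)

# Little's law: average latency ~= outstanding I/Os / IOPS.
avg_latency_ms = outstanding / iops * 1000

print(f"throughput = {throughput_mb_s:.1f} MB/s")   # 195.3 MB/s
print(f"avg latency = {avg_latency_ms:.2f} ms")     # 0.64 ms
```

This is also a useful sanity check on results: if the reported throughput is far from IOPS × block size, the test was not measuring what was intended.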

DiskSpd use cases

  1. Hardware performance evaluation: DiskSpd is commonly used to measure how hard disks, SSDs, and other storage devices perform under different loads, helping system administrators and hardware engineers choose suitable hardware.

  2. Storage optimization: Database administrators, virtualization specialists, and others can use DiskSpd to assess storage performance and tune storage configuration so that systems run efficiently.

  3. Development and testing: Software developers, especially those working on storage, use DiskSpd to simulate realistic loads and test how applications behave under different storage conditions.

  4. Troubleshooting: DiskSpd can exercise a storage system to reproduce performance bottlenecks or failures, helping to pinpoint performance problems.

Summary: why use DiskSpd?

DiskSpd is a powerful, flexible storage performance testing tool suited to enterprise IT infrastructure, storage administrators, hardware engineers, and developers. It can simulate a variety of workload patterns to evaluate disk and storage system performance end to end, detect potential bottlenecks, aid troubleshooting, and guide storage tuning. Whether for hardware evaluation, performance tuning, or storage system testing, DiskSpd is an extremely useful tool.


 


DiskSpd options grouped by function:

Category                  Option                Description
Basic configuration       -c<size>              create a file of the given size
                          -d<seconds>           test duration in seconds
                          -b<size>              block size (e.g. 4K, 8K)
                          -t<num>               threads per file
                          -o<num>               outstanding (overlapped) I/O operations per thread
                          -r                    random access pattern
                          -a<cpu_list>          bind threads to CPUs (e.g. 0,1 binds to CPU 0 and CPU 1)
Files and data source     -c<size>              create a file of the given size
                          -X<filepath>          configure the test from an XML profile
                          -Z<size>              write from a random-filled buffer of the given size
                          -Zr                   use a freshly randomized buffer for every write I/O
Synchronization           -ys<eventname>        signal the event before the actual run starts (no warmup)
                          -yf<eventname>        signal the event after the actual run finishes (no cooldown)
                          -yr<eventname>        wait on the event before starting the run (including warmup)
                          -yp<eventname>        stop the run when the event is set; CTRL+C is bound to this event
                          -ye<eventname>        set the event and exit
Timers and event tracing  -e<q|c|s>             use the query performance counter (qpc), cycle count, or system timer [default: qpc]
                          -ep                   use paged memory for kernel logging (default: non-paged)
                          -ePROCESS             trace process start and end
                          -eTHREAD              trace thread start and end
                          -eIMAGE_LOAD          trace image load events
                          -eDISK_IO             trace physical disk I/O
                          -eMEMORY_PAGE_FAULTS  trace all page faults
                          -eMEMORY_HARD_FAULTS  trace hard faults only
                          -eNETWORK             trace TCP/IP and UDP/IP send and receive
                          -eREGISTRY            trace registry calls
Caching and optimization  -Sh                   disable all caching (equivalent to -Suw)
                          -Sw                   enable writethrough, bypassing hardware write caching
Random seed and fill      -z[seed]              set the random seed (default: 0)
                          -Zr                   use randomized buffers for write operations
Multi-file, multi-thread  -t<num>               threads per file (multiple files may be tested)
                          -a<cpu_list>          bind threads across multiple CPUs
Advanced                  -X<filepath>          configure the test from an XML profile (multiple targets supported)

Notes:

  1. -c<size> specifies the file size, in bytes or with KB/MB/GB units.
  2. -b<size> sets the block size, typically 4KB, 8KB, or larger.
  3. -t<num> is the number of threads per target; each file can have its own thread count.
  4. -r selects random access (the default is sequential).
  5. -o<num> sets the number of outstanding I/O operations per thread, useful for heavier loads.
  6. The synchronization options (-ys, -yf, -yr, etc.) synchronize the test with external events, typically to control exactly when a run starts and stops.

This table covers most common diskspd usage by functional category, for quick reference when configuring test parameters.
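To make the table actionable, the switches can also be assembled into a command line programmatically, e.g. when scripting a matrix of tests. A small Python sketch; the build_diskspd_cmd helper is hypothetical, not part of DiskSpd:

```python
# Hypothetical helper (not part of DiskSpd): assemble a diskspd
# command line from keyword options taken from the table above.
def build_diskspd_cmd(target, **opts):
    parts = ["diskspd"]
    for flag, value in opts.items():
        # True emits a bare switch like -r; any other value is appended
        # directly to the switch letter, e.g. b="4K" -> -b4K.
        parts.append(f"-{flag}" if value is True else f"-{flag}{value}")
    parts.append(target)
    return " ".join(parts)

# A 4K random-read test, 2 threads, queue depth 32, caching disabled:
print(build_diskspd_cmd("testfile.dat", b="4K", t=2, r=True, o=32, d=10, Sh=True))
# diskspd -b4K -t2 -r -o32 -d10 -Sh testfile.dat
```

Keyword-argument order is preserved (Python 3.7+), so switches appear in the order they are given.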


C:\Users\Administrator\Downloads\DiskSpd 2.2\amd64>diskspd /?

Usage: diskspd [options] target1 [ target2 [ target3 ...] ]
version 2.2.0 (2024/6/3)

Valid targets:
     file_path
     #<physical drive number>
     <drive_letter>:

Sizes, offsets and lengths are specified as integer bytes, or with an
optional suffix of KMGT (KiB/MiB/GiB/TiB) or b (for blocks, see -b).
Examples: 4k = 4096
          with -b4k, 8b = 32768 (8 * 4KiB)

Available options:
  -?                    display usage information
  -:<flags>             experimental behaviors, as a bitmask of flags. current:
                          1 - allow throughput rate limit sleeps >1ms if indicated by rate
  -ag                   group affinity - threads assigned round-robin to CPUs by processor groups, 0 - n.
                          Groups are filled from lowest to highest processor before moving to the next.
                          [default; use -n to disable default affinity]
  -a[g#,]#[,#,...]>     advanced CPU affinity -  threads assigned round-robin to the CPUs stated, in order of
                          specification; g# is the processor group for the following CPUs. If no group is
                          stated, 0 is default. Additional groups/processors can be added, comma separated,
                          on the same or separate -a parameters.
                          Examples: -a0,1,2 and -ag0,0,1,2 are equivalent.
                                    -ag0,0,1,2,g1,0,1,2 specifies the first three CPUs in groups 0 and 1.
                                    -ag0,0,1,2,g1,0,1,2 and -ag0,0,1,2 -ag1,0,1,2 are equivalent.
  -b<size>              IO size, defines the block 'b' for sizes stated in units of blocks [default=64K]
  -B<base>[:length]     bounds; specify range of target to issue IO to - base offset and length
                          (default: IO is issued across the entire target)
  -c<size>              create file targets of the given size. Conflicts with non-file target specifications.
  -C<seconds>           cool down time - duration of the test after measurements finished [default=0s].
  -D<milliseconds>      Capture IOPs statistics in intervals of <milliseconds>; these are per-thread
                          per-target: text output provides IOPs standard deviation, XML provides the full
                          IOPs time series in addition. [default=1000, 1 second].
  -d<seconds>           duration (in seconds) to run test [default=10s]
  -f<size>              maximum target offset to issue IO to (non-inclusive); -Bbase -f(base+length) is the same
                         as -Bbase:length. For example, to test only the first sectors of a disk.
  -f<rst>               open file with one or more additional access hints
                          r : the FILE_FLAG_RANDOM_ACCESS hint
                          s : the FILE_FLAG_SEQUENTIAL_SCAN hint
                          t : the FILE_ATTRIBUTE_TEMPORARY hint
                          [default: none]
  -F<count>             total number of threads (conflicts with -t)
  -g<value>[i]          throughput per-thread per-target throttled to given value; defaults to bytes per millisecond
                          With the optional i qualifier the value is IOPS of the specified block size (-b).
                          Throughput limits cannot be specified when using completion routines (-x)
                          [default: no limit]
  -h                    deprecated, see -Sh
  -i<count>             number of IOs per burst; see -j [default: inactive]
  -j<milliseconds>      interval in <milliseconds> between issuing IO bursts; see -i [default: inactive]
  -I<priority>          Set IO priority to <priority>. Available values are: 1-very low, 2-low, 3-normal (default)
  -l                    Use large pages for IO buffers
  -L                    measure latency statistics
  -n                    disable default affinity (-a)
  -N<vni>               specify the flush mode for memory mapped I/O
                          v : uses the FlushViewOfFile API
                          n : uses the RtlFlushNonVolatileMemory API
                          i : uses RtlFlushNonVolatileMemory without waiting for the flush to drain
                          [default: none]
  -o<count>             number of outstanding I/O requests per target per thread
                          (1=synchronous I/O, unless more than 1 thread is specified with -F)
                          [default=2]
  -O<count>             number of outstanding I/O requests per thread - for use with -F
                          (1=synchronous I/O)
  -p                    start parallel sequential I/O operations with the same offset
                          (ignored if -r is specified, makes sense only with -o2 or greater)
  -P<count>             enable printing a progress dot after each <count> [default=65536]
                          completed I/O operations, counted separately by each thread
  -r[align]             random I/O aligned to [align] byte offsets within the target range (overrides -s)
                          [default alignment=block size (-b)]
  -rd<dist>[params]     specify an non-uniform distribution for random IO in the target
                          [default uniformly random]
                           distributions: pct, abs
                           all:  IO% and %Target/Size are cumulative. If the sum of IO% is less than 100% the
                                 remainder is applied to the remainder of the target. An IO% of 0 indicates a gap -
                                 no IO will be issued to that range of the target.
                           pct : parameter is a combination of IO%/%Target separated by : (colon)
                                 Example: -rdpct90/10:0/10:5/20 specifies 90% of IO in 10% of the target, no IO
                                   next 10%, 5% IO in the next 20% and the remaining 5% of IO in the last 60%
                           abs : parameter is a combination of IO%/Target Size separated by : (colon)
                                 If the actual target size is smaller than the distribution, the relative values of IO%
                                 for the valid elements define the effective distribution.
                                 Example: -rdabs90/10G:0/10G:5/20G specifies 90% of IO in 10GiB of the target, no IO
                                   next 10GiB, 5% IO in the next 20GiB and the remaining 5% of IO in the remaining
                                   capacity of the target. If the target is only 20G, the distribution truncates at
                                   90/10G:0:10G and all IO is directed to the first 10G (equivalent to -f10G).
  -rs<percentage>       percentage of requests which should be issued randomly; -r is used to specify IO alignment.
                          Sequential IO runs are homogeneous when a mixed r/w ratio is specified (-w) and their lengths
                          follow a geometric distribution based on the percentage (chance of next IO being sequential).
  -R[p]<text|xml>       output format. With the p prefix, the input profile (command line or XML) is validated and
                          re-output in the specified format without running load, useful for checking or building
                          complex profiles.
                          [default: text]
  -s[i][align]          stride size of [align] bytes, alignment & offset between operations
                          [default=non-interlocked, default alignment=block size (-b)]
                          By default threads track independent sequential IO offsets starting at base offset of the target.
                          With multiple threads this results in threads overlapping their IOs - see -T to divide
                          them into multiple separate sequential streams on the target.
                          With the optional i qualifier (-si) threads interlock on a shared sequential offset.
                          Interlocked operations may introduce overhead but make it possible to issue a single
                          sequential stream to a target which responds faster than one thread can drive.
                          (ignored if -r specified, -si conflicts with -p, -rs and -T)
  -S[bhmruw]            control caching behavior [default: caching is enabled, no writethrough]
                          non-conflicting flags may be combined in any order; ex: -Sbw, -Suw, -Swu
  -S                    equivalent to -Su
  -Sb                   enable caching (default, explicitly stated)
  -Sh                   equivalent -Suw
  -Sm                   enable memory mapped I/O
  -Su                   disable software caching, equivalent to FILE_FLAG_NO_BUFFERING
  -Sr                   disable local caching, with remote sw caching enabled; only valid for remote filesystems
  -Sw                   enable writethrough (no hardware write caching), equivalent to FILE_FLAG_WRITE_THROUGH or
                          non-temporal writes for memory mapped I/O (-Sm)
  -t<count>             number of threads per target (conflicts with -F)
  -T<offs>              starting separation between I/O operations performed on the same target by different threads
                          [default=0] (starting offset = base target offset + (thread number * <offs>)
                          only applies to -s sequential IO with #threads > 1, conflicts with -r and -si
  -v[s]                 verbose mode - with s, only provide additional summary statistics
  -w<percentage>        percentage of write requests (-w and -w0 are equivalent and result in a read-only workload).
                        absence of this switch indicates 100% reads
                          IMPORTANT: a write test will destroy existing data without a warning
  -W<seconds>           warm up time - duration of the test before measurements start [default=5s]
  -x                    use completion routines instead of I/O Completion Ports
  -X<filepath>          use an XML file to configure the workload. Profile defaults for -W/d/C (durations) and -R/v/z
                          (output format, verbosity and random seed) may be overriden by direct specification.
                          Targets can be defined in XML profiles as template paths of the form *<integer> (*1, *2, ...).
                          When run, specify the paths to substitute for the template paths in order on the command line.
                          The first specified target is *1, second is *2, and so on.
                          Example: diskspd -d60 -Xprof.xml first.bin second.bin (prof.xml using *1 and *2, 60s run)
  -z[seed]              set random seed [with no -z, seed=0; with plain -z, seed is based on system run time]

Write buffers:
  -Z                    zero buffers used for write tests
  -Zr                   per IO random buffers used for write tests - this incurrs additional run-time
                         overhead to create random content and shouln't be compared to results run
                         without -Zr
  -Z<size>              use a <size> buffer filled with random data as a source for write operations.
  -Z<size>,<file>       use a <size> buffer filled with data from <file> as a source for write operations.

  By default, write source buffers are filled with a repeating pattern (0, 1, 2, ..., 255, 0, 1, ...)

Synchronization:
  -ys<eventname>     signals event <eventname> before starting the actual run (no warmup)
                       (creates a notification event if <eventname> does not exist)
  -yf<eventname>     signals event <eventname> after the actual run finishes (no cooldown)
                       (creates a notification event if <eventname> does not exist)
  -yr<eventname>     waits on event <eventname> before starting the run (including warmup)
                       (creates a notification event if <eventname> does not exist)
  -yp<eventname>     stops the run when event <eventname> is set; CTRL+C is bound to this event
                       (creates a notification event if <eventname> does not exist)
  -ye<eventname>     sets event <eventname> and quits

Event Tracing:
  -e<q|c|s>             Use query perf timer (qpc), cycle count, or system timer respectively.
                          [default = q, query perf timer (qpc)]
  -ep                   use paged memory for the NT Kernel Logger [default=non-paged memory]
  -ePROCESS             process start & end
  -eTHREAD              thread start & end
  -eIMAGE_LOAD          image load
  -eDISK_IO             physical disk IO
  -eMEMORY_PAGE_FAULTS  all page faults
  -eMEMORY_HARD_FAULTS  hard faults only
  -eNETWORK             TCP/IP, UDP/IP send & receive
  -eREGISTRY            registry calls


Examples:

Create 8192KB file and run read test on it for 1 second:

  diskspd -c8192K -d1 testfile.dat

Set block size to 4KB, create 2 threads per file, 32 overlapped (outstanding)
I/O operations per thread, disable all caching mechanisms and run block-aligned random
access read test lasting 10 seconds:

  diskspd -b4K -t2 -r -o32 -d10 -Sh testfile.dat

Create two 1GB files, set block size to 4KB, create 2 threads per file, affinitize threads
to CPUs 0 and 1 (each file will have threads affinitized to both CPUs) and run read test
lasting 10 seconds:

  diskspd -c1G -b4K -t2 -d10 -a0,1 testfile1.dat testfile2.dat
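The size grammar described at the top of the help text (integer bytes, an optional K/M/G/T suffix in binary units, or b for multiples of the block size chosen with -b) can be sketched as a small parser. A Python sketch, assuming the help's own examples as the specification:

```python
# Parse DiskSpd-style sizes: plain bytes, binary K/M/G/T suffixes,
# or 'b' for multiples of the block size chosen with -b.
def parse_size(spec, block_size=64 * 1024):   # DiskSpd's default -b is 64K
    spec = spec.strip().lower()
    units = {"k": 1024, "m": 1024**2, "g": 1024**3,
             "t": 1024**4, "b": block_size}
    if spec[-1] in units:
        return int(spec[:-1]) * units[spec[-1]]
    return int(spec)

print(parse_size("4k"))             # 4096
print(parse_size("8b", 4 * 1024))   # 32768, as in the help's -b4k example
```

Note that b is resolved against whatever block size is in effect, which is why 8b means 32768 only when -b4k is also specified.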

 
 
posted @ 2024-12-29 22:52  suv789