Setting Up and Testing a gem5-Based ROCm Development Environment
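As the title of "我买不起AMD MI300X计算卡,也可以学习ROCm吗?" ("I can't afford an AMD MI300X accelerator; can I still learn ROCm?") suggests, not everyone has an AMD GPU accelerator available for ROCm development. A virtual development environment is a convenient and inexpensive alternative.
As "Full System AMD GPU model" points out, gem5 provides full-system GPU simulation. The CPU side is modeled with the KVM CPU, so the host must support KVM.
A gem5-based virtual environment models a heterogeneous platform with an AMD CPU and GPU. Combined with the ROCm disk image built from gem5-resources, it lets HIP programs run on the simulated GPU. gem5-resources builds the Ubuntu disk image and the kernel, which then run inside the virtual machine created by gem5. The host and the gem5 guest share files through the diod service, and gem5term is used to log in to the guest to build and test ROCm code.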
1 Building gem5 and gem5-resources
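gem5 is a modular, discrete-event-driven computer system architecture simulator that is widely used in academic research and industrial development. It supports:
- Cycle-accurate CPU instruction set simulation
- Timing-accurate memory subsystem modeling
- An extensible interconnect simulation framework
Its companion repository, gem5-resources, provides:
1. Prebuilt Linux disk images (kernel 5.10+)
2. Benchmark suites (for example NAS Parallel Benchmarks; SPEC CPU2017 is commercially licensed, so you need your own copy)
3. Hardware validation workloads (such as ARM AMBA AXI4 bus tests)
By standardizing the research environment, the repository substantially lowers the barrier to simulating heterogeneous architectures.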
Install the required build tools:
sudo apt update
sudo apt install build-essential git m4 scons zlib1g zlib1g-dev qemu-system-x86 libpng-dev libcapstone-dev diod \
    libprotobuf-dev protobuf-compiler libprotoc-dev libgoogle-perftools-dev \
    python3-dev python3-pip python3-setuptools libboost-all-dev pkg-config \
    cmake flex bison libhdf5-dev libxml2-dev libnuma-dev numactl \
    python3-venv
sudo pip3 install scons pybind11
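Before starting a long build it can help to confirm that the tools are actually on PATH. The following is a minimal sketch; `missing_tools` and the `REQUIRED` list are hypothetical names, and the mapping from package names to executable names is an assumption that may need adjusting for your distribution.

```python
import shutil


def missing_tools(tools):
    """Return the subset of tool names that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Assumed executable names for some of the packages installed above.
REQUIRED = ["git", "m4", "scons", "cmake", "flex", "bison", "protoc", "diod"]

if __name__ == "__main__":
    missing = missing_tools(REQUIRED)
    print("missing tools:", ", ".join(missing) if missing else "none")
```

An empty result only shows that the executables exist; it does not check library packages such as libhdf5-dev.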
Clone the repository and build with scons:
git clone https://github.com/gem5/gem5.git
cd gem5
scons build/VEGA_X86/gem5.opt -j$(nproc)
cd ..
Clone gem5-resources and build the disk image and kernel:
git clone https://github.com/gem5/gem5-resources.git
cd gem5-resources/src/x86-ubuntu-gpu-ml
./build.sh
Fixing an error during the gem5-resources build. The build may fail because QEMU cannot access KVM:
2025/08/25 19:50:25 packer-plugin-qemu_v1.1.3_x5.0_linux_amd64 plugin: 2025/08/25 19:50:25 Started Qemu. Pid: 229293
2025/08/25 19:50:25 packer-plugin-qemu_v1.1.3_x5.0_linux_amd64 plugin: 2025/08/25 19:50:25 Qemu stderr: Could not access KVM kernel module: Permission denied
2025/08/25 19:50:25 packer-plugin-qemu_v1.1.3_x5.0_linux_amd64 plugin: 2025/08/25 19:50:25 Qemu stderr: qemu-system-x86_64: failed to initialize kvm: Permission denied
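The fix is to add the current user to the kvm group:
# Add the current user to the kvm group
sudo usermod -aG kvm $USER
# Verify that the user is now in the kvm group
groups $USER
# Log out and back in, or run the following to apply the change in the current shell
newgrp kvm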
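To confirm that the permission problem is gone before rerunning the build, the device node can be checked directly. This is a small sketch; `kvm_status` is a hypothetical helper, and it assumes the KVM device lives at the usual /dev/kvm path.

```python
import os


def kvm_status(path="/dev/kvm"):
    """Return 'missing', 'no-permission', or 'ok' for the KVM device node."""
    if not os.path.exists(path):
        return "missing"
    if not os.access(path, os.R_OK | os.W_OK):
        return "no-permission"
    return "ok"


if __name__ == "__main__":
    print("/dev/kvm:", kvm_status())
```

A result of `no-permission` after running usermod usually means the new group membership has not yet taken effect in the current session.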
2 Simulating an AMD CPU/GPU in gem5, with terminal login and host file sharing
2.1 Starting the file-sharing service
Start the diod service:
rm -f /tmp/gem5_9p.sock && sudo diod -f -o "trans=unix,path=/tmp/gem5_9p.sock,port=0" -e /home/lbq/data/rocm
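A quick way to see whether the socket was created is to test the path from Python. This sketch only inspects the file type; `is_unix_socket` is a hypothetical helper and it does not verify that diod is answering 9P requests.

```python
import os
import stat


def is_unix_socket(path):
    """Return True if path exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


if __name__ == "__main__":
    # Socket path taken from the diod command above.
    print(is_unix_socket("/tmp/gem5_9p.sock"))
```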
To expose the shared directory to the guest over 9P, modify gem5/configs/example/gpufs/runfs.py:
diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py
index db2282808a..7c553c8d3d 100644
--- a/configs/example/gpufs/runfs.py
+++ b/configs/example/gpufs/runfs.py
@@ -51,7 +51,7 @@ from ruby import Ruby
 # GPU FS related
 from system.system import makeGpuFSSystem
-
+from m5.objects import PciVirtIO, VirtIO9PDiod, PciHost
 
 def addRunFSOptions(parser):
     parser.add_argument(
@@ -321,6 +321,44 @@ def runGpuFSSystem(args):
     )
 
 
+_real_instantiate = m5.instantiate
+
+def _instantiate_with_9p(*args, **kwargs):
+    root = m5.objects.Root.getInstance()
+    sys = getattr(root, "system", None)
+    if sys and not hasattr(sys, "_virtio9p_added"):
+        viopci = PciVirtIO()
+        viopci.vio = VirtIO9PDiod()
+        viopci.vio.root = "/home/lbq/data/rocm"
+        viopci.vio.socketPath = "/tmp/gem5_9p.sock"
+        sys.viopci = viopci
+
+        host_bridge = next(obj for obj in sys.descendants()
+                           if isinstance(obj, PciHost))
+
+
+        viopci.host = host_bridge
+        viopci.pci_bus = 0
+        viopci.pci_dev = 2
+        viopci.pci_func = 0
+        viopci.pio = sys.iobus.mem_side_ports
+        viopci.dma = sys.iobus.cpu_side_ports
+
+        viopci.VendorID = 0x1AF4
+        viopci.DeviceID = 0x1009
+        viopci.SubClassCode = 0x80
+        viopci.ClassCode = 0xFF
+        viopci.Revision = 0x00
+        viopci.SubsystemID = 0x09
+        viopci.InterruptPin = 1
+        viopci.InterruptLine = 11
+
+        sys._virtio9p_added = True
+    return _real_instantiate(*args, **kwargs)
+
+m5.instantiate = _instantiate_with_9p
+
+
 if __name__ == "__m5_main__":
     # Add gpufs, common, ruby, amdgpu, and gpu tlb args
     parser = argparse.ArgumentParser()
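The patch relies on a wrap-and-replace pattern: it saves the original m5.instantiate, installs a wrapper that attaches the VirtIO 9P device once, and then delegates to the saved function. The sketch below shows the same pattern in plain Python without gem5. `fake_m5`, `fake_system`, and the `_extra_added` marker are stand-ins invented for illustration; they are not gem5 APIs.

```python
from types import SimpleNamespace

calls = []

# Stand-in for the m5 module; only the wrapped attribute matters here.
fake_m5 = SimpleNamespace(instantiate=lambda *a, **kw: calls.append("instantiate"))
# Stand-in for the simulated system object that receives the extra device.
fake_system = SimpleNamespace()

_real_instantiate = fake_m5.instantiate


def _instantiate_with_extra(*args, **kwargs):
    # Attach the extra device only once, guarded by a marker attribute.
    if not hasattr(fake_system, "_extra_added"):
        calls.append("attach-device")
        fake_system._extra_added = True
    # Always delegate to the original function.
    return _real_instantiate(*args, **kwargs)


fake_m5.instantiate = _instantiate_with_extra

fake_m5.instantiate()
fake_m5.instantiate()
print(calls)  # ['attach-device', 'instantiate', 'instantiate']
```

The marker attribute is what keeps a second call from adding the device twice, which mirrors the role of `_virtio9p_added` in the patch.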
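The main components are:
Component | Description | Interface |
---|---|---|
DIOD | Host-side file-sharing daemon | 9P protocol over a Unix domain socket (/tmp/gem5_9p.sock in this setup) |
PciHost | PCI host controller | PCI bus interface |
PciVirtIO | Emulated PCI device | Virtual PCI configuration space |
VirtIO9PDiod | 9P transport engine | VirtIO queues (request/response) |
Guest Kernel | 9P client driver in the guest | 9P file system interface |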
[Figure: flowchart of the file-sharing path between the host and the gem5 guest]
2.2 Installing /dev/gem5_bridge on the host
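/dev/gem5_bridge is the device node created by the gem5_bridge kernel module. It lets programs linked against the m5 library perform address-mode gem5 operations without sudo; the login banner shown in section 2.4 prints a warning when the device is missing.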
Go to gem5/util/gem5_bridge and run make to produce gem5_bridge.ko. On x86, create /dev/gem5_bridge with:
sudo insmod gem5_bridge.ko \
    gem5_bridge_baseaddr=0x7f000000 \
    gem5_bridge_rangesize=0x1000
Check the result with ls /dev/gem5_bridge, or look at the kernel log:
[ 209.473510] gem5_bridge: loading out-of-tree module taints kernel.
[ 209.473515] gem5_bridge: module verification failed: signature and/or required key missing - tainting kernel
[ 209.474615] gem5_bridge_init: SUCCESS!
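The log check can be scripted when the module is loaded as part of a larger setup script. This is a minimal sketch that searches kernel log text for the success message shown above; `bridge_loaded` is a hypothetical helper, and it assumes the message text stays the same across gem5 versions.

```python
def bridge_loaded(kernel_log):
    """Return True if the kernel log contains the gem5_bridge success message."""
    return any(
        "gem5_bridge_init: SUCCESS" in line for line in kernel_log.splitlines()
    )


sample_log = """\
[ 209.473510] gem5_bridge: loading out-of-tree module taints kernel.
[ 209.474615] gem5_bridge_init: SUCCESS!
"""

print(bridge_loaded(sample_log))
```

In practice the input would be the output of dmesg captured as a string.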
2.3 Launching gem5 with the ROCm environment
To keep the environment from exiting automatically once the ROCm application finishes, modify gem5/configs/example/gpufs/mi300.py:
diff --git a/configs/example/gpufs/mi300.py b/configs/example/gpufs/mi300.py
index 08dce8f4c2..85f369c28e 100644
--- a/configs/example/gpufs/mi300.py
+++ b/configs/example/gpufs/mi300.py
@@ -153,7 +153,7 @@ def runMI300GPUFS(
         )
         b64file.write(runscriptStr)
 
-    args.script = tempRunscript
+#    args.script = tempRunscript
 
     # Defaults for CPU
     args.cpu_type = "X86KvmCPU"
Write a simple test program, pytorch_test.py, to verify the environment:
#!/usr/bin/env python3
import subprocess

import torch


def get_gpu_info():
    """Return the first "Marketing Name" line reported by rocminfo, or "N/A"."""
    try:
        rocm_info = subprocess.check_output(["rocminfo"]).decode()
        return [line for line in rocm_info.split("\n") if "Marketing Name" in line][0]
    except Exception:
        return "N/A"


print("=== Extended ROCm check ===")
print(f"PyTorch version: {torch.__version__}")
print(f"Torch HIP version: {torch.version.hip}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU device count: {torch.cuda.device_count()}")
print(f"Current device: {torch.cuda.current_device()}")
print(f"Device name: {torch.cuda.get_device_name(0)}")
print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory/1024**3:.2f} GB")
print(f"ROCm marketing name: {get_gpu_info()}")

# Tensor computation check
x = torch.rand(5, 3).to("cuda")
y = torch.rand(3, 5).to("cuda")
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
z = (x @ y).norm()
end.record()
torch.cuda.synchronize()
print("\nComputation check:")
print(f"Tensor device: {x.device}")
print(f"Norm of the matrix product: {z.item():.6f}")
print(f"CUDA event elapsed time: {start.elapsed_time(end):.2f} ms")
Launch gem5:
./gem5/build/VEGA_X86/gem5.opt ./gem5/configs/example/gpufs/mi300.py --disk-image gem5-resources/src/x86-ubuntu-gpu-ml/disk-image/x86-ubuntu-gpu-ml --kernel gem5-resources/src/x86-ubuntu-gpu-ml/vmlinux-gpu-ml --app ./pytorch_test.py
The startup log follows. It shows that gem5 listens for a serial console connection on port 3456 and a gdb connection on port 7000:
gem5 Simulator System. https://www.gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 version DEVELOP-FOR-25.1
gem5 compiled Aug 25 2025 20:39:59
gem5 started Aug 26 2025 11:07:38
gem5 executing on lbq-hp, pid 7369
command line: ./gem5/build/VEGA_X86/gem5.opt ./gem5/configs/example/gpufs/mi300.py --disk-image gem5-resources/src/x86-ubuntu-gpu-ml/disk-image/x86-ubuntu-gpu-ml --kernel gem5-resources/src/x86-ubuntu-gpu-ml/vmlinux-gpu-ml --app ./pytorch_test.py
warn: Physical memory size specified is 8GiB which is greater than 3GiB. Twice the number of memory controllers would be created.
Global frequency set at 1000000000000 ticks per second
warn: system.workload.acpi_description_table_pointer.rsdt adopting orphan SimObject param 'entries'
warn: No dot file generated. Please install pydot to generate the dot file and pdf.
src/mem/dram_interface.cc:690: warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (4096 Mbytes)
src/sim/kernel_workload.cc:46: info: kernel located at: gem5-resources/src/x86-ubuntu-gpu-ml/vmlinux-gpu-ml
src/base/statistics.hh:279: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
src/cpu/kvm/base.cc:113: info: Using KVM CPU without perf. The stats related to the number of cycles and instructions executed by the KVM CPU will not be updated. The stats should not be used for performance evaluation.
src/base/statistics.hh:279: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
src/mem/dram_interface.cc:690: warn: DRAM device capacity (128 Mbytes) does not match the address range assigned (16384 Mbytes)
src/base/statistics.hh:279: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
0: system.pc.south_bridge.cmos.rtc: Real-time clock set to Sun Jan 1 00:00:00 2012
system.pc.com_1.device: Listening for connections on port 3456
src/base/statistics.hh:279: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
system.remote_gdb: Listening for connections on port 7000
src/dev/intel_8254_timer.cc:128: warn: Reading current count from inactive timer.
Running the simulation
src/cpu/kvm/base.cc:169: info: KVM: Coalesced MMIO disabled by config.
src/cpu/kvm/base.cc:591: hack: Pretending totalOps is equivalent to totalInsts()
src/arch/x86/kvm/x86_cpu.cc:1688: warn: kvm-x86: MSR (0x3a) unsupported by gem5. Skipping.
src/arch/x86/kvm/x86_cpu.cc:1688: warn: kvm-x86: MSR (0x48) unsupported by gem5. Skipping.
...
src/arch/x86/kvm/x86_cpu.cc:1688: warn: kvm-x86: MSR (0xc0010015) unsupported by gem5. Skipping.
src/arch/x86/kvm/x86_cpu.cc:1688: warn: kvm-x86: MSR (0x4b564d05) unsupported by gem5. Skipping.
src/dev/pci/host.cc:171: warn: 00:1f.1: Write to config space on non-existent PCI device
src/dev/pci/host.cc:171: warn: 00:1f.1: Write to config space on non-existent PCI device
src/dev/x86/pc.cc:117: warn: Don't know what interrupt to clear for console.
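The port numbers in the log are worth reading programmatically rather than hard-coding, since they are reported at startup and may differ between runs. The following sketch extracts them from log text with a regular expression; `listening_ports` is a hypothetical helper and the pattern is based only on the two "Listening for connections" lines shown above.

```python
import re

PORT_RE = re.compile(r"(\S+): Listening for connections on port (\d+)")


def listening_ports(log_text):
    """Map each gem5 object name to the port it reports listening on."""
    return {name: int(port) for name, port in PORT_RE.findall(log_text)}


sample = """\
system.pc.com_1.device: Listening for connections on port 3456
system.remote_gdb: Listening for connections on port 7000
"""

print(listening_ports(sample))
```

The value for `system.pc.com_1.device` is the port to pass to gem5term in the next step.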
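2.4 Logging in to the ROCm environment over the serial port
Copy gem5/util/term/gem5term (built by running make in that directory) to /usr/bin, then connect to the serial port: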
gem5term localhost 3456
After logging in:
Ubuntu 24.04.2 LTS gem5 ttyS0
gem5 login: root (automatic login)
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
Can't open /dev/gem5_bridge: No such file or directory
--> Make sure the gem5_bridge device driver has been properly inserted into the kernel.
Otherwise, sudo access required to perform address-mode ops when linking against m5 library.
root@gem5:~# lspci
00:02.0 Unassigned class [ff80]: Red Hat, Inc. Virtio filesystem
00:04.0 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE
00:08.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300X]
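The first lspci entry corresponds to the VirtIO 9P device added in the runfs.py patch: bus 0, device 2, function 0 gives the address 00:02.0, and ClassCode 0xFF with SubClassCode 0x80 gives the class word ff80. Vendor 0x1AF4 and device 0x1009 are the IDs that lspci names "Red Hat, Inc. Virtio filesystem". The sketch below reproduces the two formatted fields; `bdf` and `class_word` are hypothetical helpers written for this illustration.

```python
def bdf(bus, dev, func):
    """Format a PCI address as lspci prints it: bus:device.function."""
    return f"{bus:02x}:{dev:02x}.{func:x}"


def class_word(class_code, subclass_code):
    """Format the class and subclass bytes as the bracketed value in lspci."""
    return f"{class_code:02x}{subclass_code:02x}"


# Values copied from the runfs.py patch.
print(bdf(0, 2, 0), class_word(0xFF, 0x80))  # 00:02.0 ff80
```

If the device does not appear at 00:02.0 in the guest, the 9P mount in the next step has nothing to attach to.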
2.5 Mounting the host directory
Inside the gem5 ROCm guest, mount the host directory:
mkdir /root/share
mount -t 9p -o trans=virtio,version=9p2000.L,aname=/home/lbq/data/rocm gem5 /root/share
The mount then shows up in df:
root@gem5:~/share# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        54G   39G   12G  77% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           1.6G  492K  1.6G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           795M   16K  795M   1% /run/user/0
gem5            703G  106G  561G  16% /root/share
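A script running in the guest can confirm the share is mounted before it tries to read from it. This sketch parses text in the /proc/mounts format and returns the 9p entries; `nine_p_mounts` is a hypothetical helper, and the sample text is made up for illustration (the mount options in it are not copied from a real guest).

```python
def nine_p_mounts(mounts_text):
    """Return (source, mountpoint) pairs for 9p entries in /proc/mounts-style text."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields are: source, mountpoint, filesystem type, options, ...
        if len(fields) >= 3 and fields[2] == "9p":
            result.append((fields[0], fields[1]))
    return result


# Illustrative sample only.
sample = """\
/dev/root / ext4 rw,relatime 0 0
gem5 /root/share 9p rw,relatime,trans=virtio 0 0
"""

print(nine_p_mounts(sample))
```

In the guest the input would come from reading /proc/mounts.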
2.6 Loading the amdgpu driver module
Load the amdgpu driver with the following commands:
export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
export HSA_ENABLE_INTERRUPT=0
export HCC_AMDGPU_TARGET=gfx942
dd if=/root/roms/mi300.rom of=/dev/mem bs=1k seek=768 count=128  # Load the MI300X firmware.
sh /home/gem5/load_amdgpu.sh
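The dd parameters determine where in physical memory the ROM image lands. Working through the arithmetic shows that it is written to the 128 KiB window starting at 0xC0000, which is the conventional legacy option ROM area on x86 PCs. The sketch below only restates the dd arguments as numbers.

```python
BLOCK = 1024         # bs=1k
SEEK_BLOCKS = 768    # seek=768
COUNT_BLOCKS = 128   # count=128

start = SEEK_BLOCKS * BLOCK        # first byte written in /dev/mem
size = COUNT_BLOCKS * BLOCK        # number of bytes written
end = start + size - 1             # last byte written

print(f"start={start:#x} end={end:#x} size={size // 1024} KiB")
# start=0xc0000 end=0xdffff size=128 KiB
```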
Check the GPU status with rocm-smi:
============================================ ROCm System Management Interface ============================================
====================================================== Concise Info ======================================================
Device  Node  IDs              Temp    Power  Partitions          SCLK  MCLK  Fan  Perf  PwrCap       VRAM%  GPU%
              (DID,     GUID)  (Edge)  (Avg)  (Mem, Compute, ID)
==========================================================================================================================
0       1     0x74a1,   36215  N/A     N/A    N/A, SPX, 0         None  None  0%   n/a   Unsupported  1%     Unsupported
==========================================================================================================================
================================================== End of ROCm SMI Log ===================================================
Check the hardware information and driver installation with rocminfo:
ROCk module version 6.12.12 is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.15
Runtime Ext Version: 1.7
System Timestamp Freq.: 0.001000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
XNACK enabled: NO
DMAbuf Support: YES
VMM Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
...
*******
Agent 2
*******
  Name: gfx942
  Uuid: GPU-XX
  Marketing Name: AMD Instinct MI300X
  Vendor Name: AMD
  Feature: KERNEL_DISPATCH
  Profile: BASE_PROFILE
  Float Round Mode: NEAR
  Max Queue Number: 128(0x80)
  Queue Min Size: 64(0x40)
  Queue Max Size: 131072(0x20000)
  Queue Type: MULTI
  Node: 1
  Device Type: GPU
  Cache Info:
  Chip ID: 29857(0x74a1)
  ASIC Revision: 2(0x2)
  Cacheline Size: 64(0x40)
  Max Clock Freq. (MHz): 100
  BDFID: 64
  Internal Node ID: 1
  Compute Unit: 320
  SIMDs per CU: 4
  Shader Engines: 32
  Shader Arrs. per Eng.: 1
  WatchPts on Addr. Ranges:4
  Coherent Host Access: FALSE
  Memory Properties:
  Features: KERNEL_DISPATCH
  Fast F16 Operation: TRUE
  Wavefront Size: 64(0x40)
  Workgroup Max Size: 1024(0x400)
  Workgroup Max Size per Dimension:
    x 1024(0x400)
    y 1024(0x400)
    z 1024(0x400)
  Max Waves Per CU: 32(0x20)
  Max Work-item Per CU: 2048(0x800)
  Grid Max Size: 4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x 4294967295(0xffffffff)
    y 4294967295(0xffffffff)
    z 4294967295(0xffffffff)
  Max fbarriers/Workgrp: 32
  Packet Processor uCode:: 177
  SDMA engine uCode:: 24
  IOMMU Support:: None
  Pool Info:
    Pool 1
      Segment: GLOBAL; FLAGS: COARSE GRAINED
      Size: 16760832(0xffc000) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment: 4KB
      Accessible by all: FALSE
    Pool 2...
  ISA Info:
    ISA 1
      Name: amdgcn-amd-amdhsa--gfx942:sramecc-:xnack-
      Machine Models: HSA_MACHINE_MODEL_LARGE
      Profiles: HSA_PROFILE_BASE
      Default Rounding Mode: NEAR
      Default Rounding Mode: NEAR
      Fast f16: TRUE
      Workgroup Max Size: 1024(0x400)
      Workgroup Max Size per Dimension:
        x 1024(0x400)
        y 1024(0x400)
        z 1024(0x400)
      Grid Max Size: 4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x 4294967295(0xffffffff)
        y 4294967295(0xffffffff)
        z 4294967295(0xffffffff)
      FBarrier Max Size: 32
    ISA 2...
*** Done ***
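The agent name reported by rocminfo (gfx942 here) is the architecture string that later appears in the compiler target option. The sketch below pulls the gfx names out of rocminfo text; `gpu_targets` is a hypothetical helper, and the pattern assumes agent names appear on lines of the form "Name: gfxNNN" as in the output above.

```python
import re


def gpu_targets(rocminfo_text):
    """Return the sorted, de-duplicated gfx agent names found in rocminfo output."""
    return sorted(set(re.findall(r"\bName:\s+(gfx[0-9a-f]+)\b", rocminfo_text)))


sample = """\
Name: gfx942
Marketing Name: AMD Instinct MI300X
Vendor Name: AMD
Name: amdgcn-amd-amdhsa--gfx942:sramecc-:xnack-
"""

print(gpu_targets(sample))  # ['gfx942']
```

The ISA line is deliberately not matched, because its value starts with the target triple rather than the bare gfx name.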
3 Testing ROCm in the gem5 simulated environment
3.1 HIP testing with gem5-resources
Go to gem5-resources/src/gpu/square and modify the Makefile:
diff --git a/src/gpu/square/Makefile b/src/gpu/square/Makefile
index 0e0cf02b..13dcbef4 100644
--- a/src/gpu/square/Makefile
+++ b/src/gpu/square/Makefile
@@ -1,4 +1,4 @@
-HIP_PATH?= /opt/rocm/hip
+HIP_PATH?= /opt/rocm
 HIPCC=$(HIP_PATH)/bin/hipcc
 
 BIN_DIR?= ./bin
@@ -6,7 +6,7 @@ BIN_DIR?= ./bin
 square: $(BIN_DIR)/square
 
 $(BIN_DIR)/square: square.cpp $(BIN_DIR)
-	$(HIPCC) --amdgpu-target=gfx900,gfx902 $(CXXFLAGS) square.cpp -o $(BIN_DIR)/square
+	$(HIPCC) --amdgpu-target=gfx942 --save-temps $(CXXFLAGS) square.cpp -o $(BIN_DIR)/square
 
 $(BIN_DIR):
 	mkdir -p $(BIN_DIR)
The build output is:
make: Warning: File 'Makefile' has modification time 66360 s in the future
mkdir -p ./bin
/opt/rocm/bin/hipcc --amdgpu-target=gfx942 square.cpp -o ./bin/square
Warning: The --amdgpu-target option has been deprecated and will be removed in the future. Use --offload-arch instead.
square.cpp:77:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   77 |     hipDeviceSynchronize();
      |     ^~~~~~~~~~~~~~~~~~~~
1 warning generated when compiling for gfx942.
square.cpp:77:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   77 |     hipDeviceSynchronize();
      |     ^~~~~~~~~~~~~~~~~~~~
1 warning generated when compiling for host.
make: warning: Clock skew detected. Your build may be incomplete.
Run it:
./bin/square
The output is:
info: running on device AMD Instinct MI300X
info: allocate host and device mem ( 7.63 MB)
info: launch 'vector_square' kernel
info: check result
PASSED!
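The 7.63 MB figure in the log can be sanity-checked with a little arithmetic. The sketch below assumes the test uses 1,000,000 single-precision elements and counts two buffers (input and output); neither number is stated in the text, so treat both as assumptions that happen to reproduce the logged value.

```python
N = 1_000_000          # assumed element count (not stated in the text)
BYTES_PER_FLOAT = 4    # 32-bit float
BUFFERS = 2            # assumed: one input buffer and one output buffer

total_mb = BUFFERS * N * BYTES_PER_FLOAT / 1024 / 1024
print(f"{total_mb:.2f} MB")  # 7.63 MB
```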
Disassemble the device object to inspect the instructions:
/opt/rocm-6.4.0/llvm/bin/llvm-objdump -d square-hip-amdgcn-amd-amdhsa-gfx942.o
square-hip-amdgcn-amd-amdhsa-gfx942.o: file format elf64-amdgpu
Disassembly of section .text:
0000000000000000 <.text>:
    s_nop 0 // 000000000000: BF800000
    s_nop 0 // 000000000004: BF800000
    ...
    s_nop 0 // 0000000003F8: BF800000
    s_nop 0 // 0000000003FC: BF800000
Disassembly of section .text._Z13vector_squareIfEvPT_PKS0_m:
0000000000000000 <_Z13vector_squareIfEvPT_PKS0_m>:
    s_load_dword s3, s[0:1], 0x24 // 000000000000: C00200C0 00000024
    s_load_dwordx2 s[8:9], s[0:1], 0x10 // 000000000008: C0060200 00000010
    s_add_u32 s10, s0, 24 // 000000000010: 800A9800
    s_addc_u32 s11, s1, 0 // 000000000014: 820B8001
    v_mov_b32_e32 v1, 0 // 000000000018: 7E020280
    s_waitcnt lgkmcnt(0) // 00000000001C: BF8CC07F
    s_and_b32 s3, s3, 0xffff // 000000000020: 8603FF03 0000FFFF
    s_mul_i32 s2, s2, s3 // 000000000028: 92020302
    v_add_u32_e32 v0, s2, v0 // 00000000002C: 68000002
    v_cmp_gt_u64_e32 vcc, s[8:9], v[0:1] // 000000000030: 7DD80008
    s_and_saveexec_b64 s[4:5], vcc // 000000000034: BE84206A
    s_cbranch_execz 29 // 000000000038: BF88001D <_Z13vector_squareIfEvPT_PKS0_m+0xb0>
    s_load_dword s2, s[10:11], 0x0 // 00000000003C: C0020085 00000000
    s_load_dwordx4 s[4:7], s[0:1], 0x0 // 000000000044: C00A0100 00000000
    s_mov_b32 s1, 0 // 00000000004C: BE810080
    v_lshlrev_b64 v[2:3], 2, v[0:1] // 000000000050: D28F0002 00020082
    s_mov_b64 s[10:11], 0 // 000000000058: BE8A0180
    s_waitcnt lgkmcnt(0) // 00000000005C: BF8CC07F
    s_mul_i32 s0, s2, s3 // 000000000060: 92000302
    s_lshl_b64 s[2:3], s[0:1], 2 // 000000000064: 8E828200
    v_lshl_add_u64 v[4:5], s[6:7], 0, v[2:3] // 000000000068: D2080004 04090006
    global_load_dword v6, v[4:5], off // 000000000070: DC508000 067F0004
    v_lshl_add_u64 v[0:1], v[0:1], 0, s[0:1] // 000000000078: D2080000 00010100
    v_cmp_le_u64_e32 vcc, s[8:9], v[0:1] // 000000000080: 7DD60008
    v_lshl_add_u64 v[4:5], s[4:5], 0, v[2:3] // 000000000084: D2080004 04090004
    v_lshl_add_u64 v[2:3], v[2:3], 0, s[2:3] // 00000000008C: D2080002 00090102
    s_or_b64 s[10:11], vcc, s[10:11] // 000000000094: 878A0A6A
    s_waitcnt vmcnt(0) // 000000000098: BF8C0F70
    v_mul_f32_e32 v6, v6, v6 // 00000000009C: 0A0C0D06
    global_store_dword v[4:5], v6, off // 0000000000A0: DC708000 007F0604
    s_andn2_b64 exec, exec, s[10:11] // 0000000000A8: 89FE0A7E
    s_cbranch_execnz 65518 // 0000000000AC: BF89FFEE <_Z13vector_squareIfEvPT_PKS0_m+0x68>
    s_endpgm // 0000000000B0: BF810000
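The two branch instructions in the listing can be checked by hand. The operand is a signed 16-bit count of 4-byte words, applied relative to the instruction that follows the branch. The sketch below recomputes both targets from the offsets printed in the disassembly; `branch_target` is a hypothetical helper written for this check.

```python
def branch_target(pc, simm16):
    """Byte offset reached by a branch at pc with a 16-bit word offset."""
    if simm16 >= 0x8000:        # interpret the field as a signed 16-bit value
        simm16 -= 0x10000
    return pc + 4 + simm16 * 4  # relative to the next instruction, in dwords


print(hex(branch_target(0x38, 29)))     # 0xb0
print(hex(branch_target(0xAC, 65518)))  # 0x68
```

The results match the `+0xb0` and `+0x68` annotations from llvm-objdump: the first branch skips the loop when no lane is active, and the second (65518 is -18 as a signed value) jumps back to the top of the loop body.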
[Figure: dependency graph of the square binary]
3.2 Testing with hip-tests
hip-tests lives in rocm-systems/projects/hip-tests. Build it by following rocm-systems/projects/hip-tests/README-doc.md:
cd rocm-systems/projects/hip-tests
mkdir -p build; cd build
cmake ../catch/ -DHIP_PLATFORM=amd
make -j$(nproc) build_tests
ctest # run tests
HIP Catch2 standalone test:
hipcc ./catch/unit/memory/hipPointerGetAttributes.cc -I ./catch/include ./catch/hipTestMain/standalone_main.cc -I ./catch/external/Catch2 -o hipPointerGetAttributes
./hipPointerGetAttributes