Setting up a DPDK test environment on QEMU
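This is my second month as an intern at OERV. The RVV group is currently focused on development and optimization in the storage area, so I tried to see whether QEMU could be used to set up a complete DPDK test environment. I got a fairly simple test working, and this post records the steps.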
A reproducible unit test
Test environment:
Ubuntu 24.04 LTS
QEMU 10.0.5
openEuler 24.03 LTS SP1 virtual machine
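Cross-compiling DPDK for RISC-V
DPDK can be cross-compiled by following the official DPDK documentation: https://doc.dpdk.org/guides/linux_gsg/cross_build_dpdk_for_riscv.html
Downloading the DPDK source
Download page: https://core.dpdk.org/download/
DPDK 24.11.2 (LTS) is used here.
Installing the RISC-V cross-compilation tools

    sudo apt update
    sudo apt install crossbuild-essential-riscv64
    sudo apt install meson

Build DPDK as described in the official documentation:

    meson setup riscv64-build-gcc --cross-file config/riscv/riscv64_linux_gcc
    ninja -C riscv64-build-gcc

If the build reports a missing library, install it as prompted.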
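Before copying the build into the virtual machine, it can be worth confirming that the binaries really target RISC-V. The following is a minimal sketch and not part of the DPDK tooling: is_riscv_elf is a hypothetical helper that reads the ELF header with od and checks that the e_machine field equals EM_RISCV (243). It assumes a little-endian ELF file, which is what the riscv64 Linux toolchain produces.

```shell
#!/bin/sh
# Hypothetical helper (not part of DPDK): succeed if the file is a RISC-V ELF.
is_riscv_elf() {
    f="$1"
    # Bytes 0-3 of an ELF file are the magic number 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$f" | tr -d ' \n')
    [ "$magic" = "7f454c46" ] || return 1
    # e_machine is a 16-bit field at offset 18. EM_RISCV is 243 (0xf3),
    # stored as the bytes f3 00 in a little-endian file.
    machine=$(od -An -tx1 -j18 -N2 "$f" | tr -d ' \n')
    [ "$machine" = "f300" ]
}
```

For example, `is_riscv_elf riscv64-build-gcc/app/dpdk-testpmd && echo "RISC-V binary"` should print the message for a successful cross build, while the same check on a native x86 binary should print nothing.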
Besides the DPDK libraries, the DPDK sample applications can also be built. Enter the riscv64-build-gcc directory and run:

    meson configure -Dexamples=all
    ninja

Creating the RISC-V virtual machine
openEuler version: 24.03 LTS SP1. Reference documentation: https://docs.openeuler.org/zh/docs/23.09/docs/Installation/riscv_qemu.html
Downloading the RISC-V virtual machine image
From https://dl-cdn.openeuler.openatom.cn/openEuler-24.03-LTS-SP1/virtual_machine_img/riscv64/ download RISCV_VIRT_CODE.fd, RISCV_VIRT_VARS.fd, openEuler-24.03-LTS-SP1-riscv64.qcow2.xz, and start_vm.sh. The fw_dynamic_oe_penglai.bin and start_vm_penglai.sh files in the same directory add the Penglai feature, which this test does not use, so they do not need to be downloaded.
Configuring a copy-on-write (COW) disk
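With copy-on-write (COW), the original image file is never modified; changes are written to a separate image file. In QEMU this feature is supported only by the QCOW formats. Several disk images can point to the same base image, so multiple configurations can be tested at the same time without damaging the original.
This experiment uses such a setup to avoid modifying the original image.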
Decompress the image archive and create the new image:

    xz -dk openEuler-24.03-LTS-SP1-riscv64.qcow2.xz
    qemu-img create -o backing_file=openEuler-24.03-LTS-SP1-riscv64.qcow2,backing_fmt=qcow2 -f qcow2 openeuler2403ltssp1_riscv64.qcow2

Adjusting the boot parameters
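    ## Configuration
    vcpu=8
    memory=8
    # The COW disk was created in the same directory, which now holds two
    # qcow2 images, so the COW image has to be named explicitly here
    drive=openeuler2403ltssp1_riscv64.qcow2
    fw1="RISCV_VIRT_CODE.fd"
    fw2="RISCV_VIRT_VARS.fd"
    ssh_port=12055

Run the startup script with bash start_vm.sh and the RISC-V virtual machine boots. As the script shows, ssh_port is the forwarded SSH port, 12055 by default; set it to empty to disable forwarding. Besides logging in over SSH, this port is also used to transfer data with scp. I copied the entire DPDK directory into the virtual machine this way (the -r option is needed because a directory is being copied):

    scp -r -P 12055 <source directory on the host> root@localhost:<target directory in the VM>

Testing that RISC-V programs run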
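Running dpdk-helloworld verifies that RISC-V programs can run inside the virtual machine. Because no hugepages have been allocated yet, hugepages must be disabled for this first run.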
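Whether hugepages are available can be checked before choosing the EAL flags. The sketch below is only an illustration: eal_hugepage_flag is a hypothetical helper that reads text in /proc/meminfo format on standard input and prints --no-huge when HugePages_Free is zero or missing.

```shell
#!/bin/sh
# Hypothetical helper: print "--no-huge" when no free hugepages are reported,
# and print nothing when at least one hugepage is free.
eal_hugepage_flag() {
    # /proc/meminfo lines look like "HugePages_Free:     2048".
    free=$(awk '/^HugePages_Free:/ {print $2}')
    [ "${free:-0}" -gt 0 ] || echo "--no-huge"
}
```

A possible use inside the virtual machine is `sudo ./examples/dpdk-helloworld $(eal_hugepage_flag < /proc/meminfo)`, which adds the flag only when it is needed.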
Enter the build root, riscv64-build-gcc here, and run the helloworld sample application that was built:

    sudo ./examples/dpdk-helloworld --no-huge
    # Output:
    EAL: Detected CPU lcores: 8
    EAL: Detected NUMA nodes: 1
    EAL: Static memory layout is selected, amount of reserved memory can be adjusted with -m or --socket-mem
    EAL: Detected static linkage of DPDK
    EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
    EAL: Selected IOVA mode 'VA'
    EAL: TSC using RISC-V rdtime.
    hello from core 1
    hello from core 2
    hello from core 3
    hello from core 4
    hello from core 5
    hello from core 6
    hello from core 7
    hello from core 0

Without disabling hugepages, the output is as follows, showing that hugepage information cannot be obtained:

    sudo ./dpdk/riscv64-build-gcc/examples/dpdk-helloworld
    EAL: Detected CPU lcores: 8
    EAL: Detected NUMA nodes: 1
    EAL: Detected static linkage of DPDK
    EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
    EAL: Selected IOVA mode 'PA'
    EAL: No free 2048 kB hugepages reported on node 0
    EAL: Cannot get hugepage information.
    EAL: PANIC in main():
    Cannot init EAL
    0: ./dpdk/riscv64-build-gcc/examples/dpdk-helloworld (rte_dump_stack+0x56) [2ac91f9b34]
    1: ./dpdk/riscv64-build-gcc/examples/dpdk-helloworld (__rte_panic+0x58) [2ac8f906ac]
    2: ./dpdk/riscv64-build-gcc/examples/dpdk-helloworld (2ac8e7d000+0x1c8c7c) [2ac9045c7c]
    3: /lib64/lp64d/libc.so.6 (3fb31b1000+0x26958) [3fb31d7958]
    4: /lib64/lp64d/libc.so.6 (__libc_start_main+0x74) [3fb31d7a00]
    5: ./dpdk/riscv64-build-gcc/examples/dpdk-helloworld (_start+0x20) [2ac905f878]
    Aborted

Configuring the DPDK runtime environment
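Configuring hugepages
First configure hugepages on the host so that they can be passed to QEMU through its launch parameters. I set aside 8 GB of hugepages, the same as the virtual machine's memory size.

    sudo sh -c "echo 4096 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages"

Check whether the hugepages were configured:

    cat /proc/meminfo | grep Huge
    # Success if HugePages_Total and HugePages_Free are non-zero
    AnonHugePages: 815104 kB
    ShmemHugePages: 1990656 kB
    FileHugePages: 0 kB
    HugePages_Total: 4096
    HugePages_Free: 4096
    HugePages_Rsvd: 0
    HugePages_Surp: 0
    Hugepagesize: 2048 kB
    Hugetlb: 8388608 kB

To pass the hugepages to the virtual machine, add the hugepage setting to the launch command in start_vm.sh: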
    -object memory-backend-file,id=mem,size="$memory"G,mem-path=/dev/hugepages,share=on

The full launch command then becomes:

    cmd="/opt/qemu/bin/qemu-system-riscv64 \
      -nographic -machine virt,pflash0=pflash0,pflash1=pflash1,acpi=off \
      -smp "$vcpu" -m "$memory"G \
      -blockdev node-name=pflash0,driver=file,read-only=on,filename="$fw1" \
      -blockdev node-name=pflash1,driver=file,filename="$fw2" \
      -drive file="$drive",format=qcow2,id=hd0,if=none \
      -object rng-random,filename=/dev/urandom,id=rng0 \
      -device virtio-vga \
      -device virtio-rng-device,rng=rng0 \
      -device virtio-blk-device,drive=hd0 \
      -device virtio-net-device,netdev=usernet \
      -netdev user,id=usernet,hostfwd=tcp::"$ssh_port"-:22 \
      -object memory-backend-file,id=mem,size="$memory"G,mem-path=/dev/hugepages,share=on \
      -device qemu-xhci -usb -device usb-kbd -device usb-tablet"

Start the virtual machine again and configure hugepages inside it. Here I set 4 GB of hugepage memory:

    sudo sh -c "echo 2048 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages"

Do not allocate all of the memory as hugepages. Allocated hugepages are marked as in use, which reduces the available memory; if all memory is turned into hugepages, no memory is left when a new program starts and the system kills processes.
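The values written to nr_hugepages follow directly from the desired size and the hugepage size: pages = size in kB / page size in kB. As a small sketch, with hugepages_for_gib being a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: number of hugepages needed for a size given in GiB.
# The second argument is the hugepage size in kB and defaults to 2048 (2 MB).
hugepages_for_gib() {
    gib="$1"
    page_kb="${2:-2048}"
    # 1 GiB = 1024 * 1024 kB
    echo $(( gib * 1024 * 1024 / page_kb ))
}
```

With 2048 kB pages, `hugepages_for_gib 8` gives the 4096 pages used on the host and `hugepages_for_gib 4` gives the 2048 pages used in the virtual machine.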
Check whether the hugepages were configured:

    cat /proc/meminfo | grep Huge
    AnonHugePages: 2048 kB
    ShmemHugePages: 0 kB
    FileHugePages: 0 kB
    HugePages_Total: 2048
    HugePages_Free: 2048
    HugePages_Rsvd: 0
    HugePages_Surp: 0
    Hugepagesize: 2048 kB
    Hugetlb: 4194304 kB

Running dpdk-helloworld again now exits normally without errors:

    sudo ./examples/dpdk-helloworld
    EAL: Detected CPU lcores: 8
    EAL: Detected NUMA nodes: 1
    EAL: Detected static linkage of DPDK
    EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
    EAL: Selected IOVA mode 'PA'
    EAL: TSC using RISC-V rdtime.
    hello from core 1
    hello from core 5
    hello from core 6
    hello from core 7
    hello from core 2
    hello from core 4
    hello from core 0
    hello from core 3

Emulating PCI NICs
Now list the PCI devices with ./dpdk/usertools/dpdk-devbind.py -s. No usable PCI device is present:

    No 'Network' devices detected
    =============================
    No 'Baseband' devices detected
    ==============================
    No 'Crypto' devices detected
    ============================
    No 'DMA' devices detected
    =========================
    No 'Eventdev' devices detected
    ==============================
    No 'Mempool' devices detected
    =============================
    No 'Compress' devices detected
    ==============================
    No 'Misc (rawdev)' devices detected
    ===================================
    No 'Regex' devices detected
    ===========================
    No 'ML' devices detected
    ========================

PCI NIC devices therefore have to be emulated.
I use tap devices as the backend: create two tap devices with the ip command, assign IP addresses, and bring them up.

    sudo ip tuntap add tap0 mode tap              # create a tap device named tap0
    sudo ip addr add 192.168.31.1/24 dev tap0     # assign an IP address
    sudo ip link set tap0 up                      # bring tap0 up
    sudo ip tuntap add taptest mode tap           # create a tap device named taptest
    sudo ip addr add 192.168.1.1/24 dev taptest   # assign an IP address
    sudo ip link set taptest up                   # bring taptest up

Check the network configuration:

    ifconfig
    tap0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
            inet 192.168.31.1 netmask 255.255.255.0 broadcast 0.0.0.0
            inet6 fe80::6001:ddff:fed8:a341 prefixlen 64 scopeid 0x20<link>
            ether 62:01:dd:d8:a3:41 txqueuelen 1000 (Ethernet)
            RX packets 84 bytes 14172 (14.1 KB)
            RX errors 0 dropped 0 overruns 0 frame 0
            TX packets 52 bytes 7830 (7.8 KB)
            TX errors 0 dropped 68 overruns 0 carrier 0 collisions 0
    taptest: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
            inet 192.168.1.1 netmask 255.255.255.0 broadcast 0.0.0.0
            inet6 fe80::fc4f:40ff:fe09:821a prefixlen 64 scopeid 0x20<link>
            ether fe:4f:40:09:82:1a txqueuelen 1000 (Ethernet)
            RX packets 710 bytes 119814 (119.8 KB)
            RX errors 0 dropped 0 overruns 0 frame 0
            TX packets 128 bytes 15615 (15.6 KB)
            TX errors 0 dropped 191 overruns 0 carrier 0 collisions 0

Add the NICs to the QEMU command. The emulated model here is e1000e:
    -netdev tap,id=net0,ifname=taptest,script=no \
    -device e1000e,netdev=net0 \
    -netdev tap,id=net1,ifname=tap0,script=no \
    -device e1000e,netdev=net1 \

The full QEMU launch command is:

    cmd="/opt/qemu/bin/qemu-system-riscv64 \
      -nographic -machine virt,pflash0=pflash0,pflash1=pflash1,acpi=off \
      -smp "$vcpu" -m "$memory"G \
      -blockdev node-name=pflash0,driver=file,read-only=on,filename="$fw1" \
      -blockdev node-name=pflash1,driver=file,filename="$fw2" \
      -drive file="$drive",format=qcow2,id=hd0,if=none \
      -object rng-random,filename=/dev/urandom,id=rng0 \
      -device virtio-vga \
      -device virtio-rng-device,rng=rng0 \
      -device virtio-blk-device,drive=hd0 \
      -device virtio-net-device,netdev=usernet \
      -netdev user,id=usernet,hostfwd=tcp::"$ssh_port"-:22 \
      -object memory-backend-file,id=mem,size="$memory"G,mem-path=/dev/hugepages,share=on \
      -netdev tap,id=net0,ifname=taptest,script=no \
      -device e1000e,netdev=net0 \
      -netdev tap,id=net1,ifname=tap0,script=no \
      -device e1000e,netdev=net1 \
      -device qemu-xhci -usb -device usb-kbd -device usb-tablet"

Booting with this command reports the following errors:
    [ 2.072294][ T1] e1000e 0000:00:02.0 0000:00:02.0 (uninitialized): Failed to initialize MSI-X interrupts. Falling back to MSI interrupts.
    [ 2.074012][ T1] e1000e 0000:00:02.0 0000:00:02.0 (uninitialized): Failed to initialize MSI interrupts. Falling back to legacy interrupts.
    [ 2.206276][ T1] e1000e 0000:00:03.0 0000:00:03.0 (uninitialized): Failed to initialize MSI-X interrupts. Falling back to MSI interrupts.
    [ 2.206809][ T1] e1000e 0000:00:03.0 0000:00:03.0 (uninitialized): Failed to initialize MSI interrupts. Falling back to legacy interrupts.

Neither MSI-X nor MSI interrupts can be initialized here, possibly because the aia option (Advanced Interrupt Architecture) of -machine virt, with its current setting, cannot emulate these PCIe interrupts (this is discussed later). Running ./dpdk/usertools/dpdk-devbind.py -s again:

    ./dpdk/usertools/dpdk-devbind.py -s
    Network devices using kernel driver
    ===================================
    0000:00:02.0 '82574L Gigabit Network Connection 10d3' if=eth1 drv=e1000e unused=
    0000:00:03.0 '82574L Gigabit Network Connection 10d3' if=eth2 drv=e1000e unused=
    No 'Baseband' devices detected
    ==============================
    No 'Crypto' devices detected
    ============================
    No 'DMA' devices detected
    =========================
    No 'Eventdev' devices detected
    ==============================
    No 'Mempool' devices detected
    =============================
    No 'Compress' devices detected
    ==============================
    No 'Misc (rawdev)' devices detected
    ===================================
    No 'Regex' devices detected
    ===========================
    No 'ML' devices detected
    ========================

Two network devices now appear in the list.
Loading the vfio-pci driver
Of the Linux drivers described in https://doc.dpdk.org/guides/linux_gsg/linux_drivers.html, vfio-pci is the recommended one, so load it:

    sudo modprobe vfio-pci

The ./dpdk/usertools/dpdk-devbind.py command can unbind these two PCI devices from the Linux kernel driver and bind them for use by DPDK. Because the virtual machine was started without an IOMMU, the binding is done in no-IOMMU mode:

    sudo ./dpdk/usertools/dpdk-devbind.py -b=vfio-pci 00:03.0 --noiommu-mode
    sudo ./dpdk/usertools/dpdk-devbind.py -b=vfio-pci 00:02.0 --noiommu-mode

Running ./dpdk/usertools/dpdk-devbind.py -s again shows that the NICs are now bound to a DPDK-compatible driver:

    ./dpdk/usertools/dpdk-devbind.py -s
    Network devices using DPDK-compatible driver
    ============================================
    0000:00:02.0 '82574L Gigabit Network Connection 10d3' drv=vfio-pci unused=
    0000:00:03.0 '82574L Gigabit Network Connection 10d3' drv=vfio-pci unused=
    No 'Baseband' devices detected
    ==============================
    No 'Crypto' devices detected
    ============================
    No 'DMA' devices detected
    =========================
    No 'Eventdev' devices detected
    ==============================
    No 'Mempool' devices detected
    =============================
    No 'Compress' devices detected
    ==============================
    No 'Misc (rawdev)' devices detected
    ===================================
    No 'Regex' devices detected
    ===========================
    No 'ML' devices detected
    ========================

Use testpmd to run the test:
    sudo ./dpdk/riscv64-build-gcc/app/dpdk-testpmd -l 0-3 -n 2 -a 00:02.0 -a 00:03.0 -m 1024 -- -i --port-topology=chained --rxq=2 --txq=2 --auto-start --rxd=64 --txd=64 --nb-cores=2

At the interactive prompt, enter show port stats all to view the forwarding status of the ports.
Test results

    ------- Forward Stats for RX Port= 1/Queue= 0 -> TX Port= 0/Queue= 0 -------
    RX-packets: 2 TX-packets: 2 TX-dropped: 0
    ---------------------- Forward statistics for port 0 ----------------------
    RX-packets: 0 RX-dropped: 0 RX-total: 0
    TX-packets: 2 TX-dropped: 0 TX-total: 2
    ---
    ---------------------- Forward statistics for port 1 ----------------------
    RX-packets: 2 RX-dropped: 0 RX-total: 2
    TX-packets: 0 TX-dropped: 0 TX-total: 0
    ---
    +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
    RX-packets: 2 RX-dropped: 0 RX-total: 2
    TX-packets: 2 TX-dropped: 0 TX-total: 2
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Problems encountered
So far, testing has succeeded only with the setup described above.
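Attempting to enable PCIe interrupts
The documentation at https://www.qemu.org/docs/master/system/riscv/virt.html describes the parameters of the 'virt' generic virtual platform used to emulate RISC-V. Its aia= option selects the interrupt controller defined by the AIA (Advanced Interrupt Architecture) specification. "aia=aplic" selects the APLIC (Advanced Platform Level Interrupt Controller) to handle wired interrupts, while "aia=aplic-imsic" selects the APLIC and the IMSIC (Incoming Message Signaled Interrupt Controller) to handle both wired interrupts and MSIs. When not specified, this option is assumed to be "none", which selects the SiFive PLIC to handle wired interrupts.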
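One way to switch between these settings without editing the launch command by hand is to build the -machine argument from a variable. The following is only a sketch: virt_machine_arg is a hypothetical helper, the three accepted values are the ones listed above, and the remaining machine options are copied from the launch command used earlier in this post.

```shell
#!/bin/sh
# Hypothetical helper: build the value for QEMU's -machine option.
# Accepts the aia values described above and defaults to "none".
virt_machine_arg() {
    aia="${1:-none}"
    case "$aia" in
        none|aplic|aplic-imsic) ;;
        *)
            echo "unsupported aia value: $aia" >&2
            return 1
            ;;
    esac
    echo "virt,pflash0=pflash0,pflash1=pflash1,acpi=off,aia=$aia"
}
```

In start_vm.sh this could be used as `-machine "$(virt_machine_arg aplic-imsic)"`, with the argument changed to aplic or none to compare the behaviors.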
With "aia=aplic-imsic" specified, the virtual machine reports the following at boot:

    [ 2.108610][ T1] xhci_hcd 0000:00:04.0: No msi-x/msi found and no IRQ in BIOS
    [ 2.109226][ T1] xhci_hcd 0000:00:04.0: startup error -22
    [ 2.123974][ T1] xhci_hcd 0000:00:04.0: init 0000:00:04.0 fail, -22
    [ 2.158971][ T1] syscon-poweroff poweroff: pm_power_off already claimed for sbi_srst_power_off
    [ 5.020055][ T1] Warning: unable to open an initial console.
    [ 5.020997][ T1] integrity: Unable to open file: /etc/keys/x509_ima.der (-2)
    [ 5.021182][ T1] integrity: Unable to open file: /etc/keys/x509_evm.der (-2)
    [ 10.024797][ T1] systemd[1]: Failed to start Virtual Console Setup.
    [ 17.148199][ T336] [drm:virtio_gpu_init [virtio_gpu]] *ERROR* failed to find virt queues

This may be because the boot firmware, RISCV_VIRT_CODE.fd and RISCV_VIRT_VARS.fd, contains no PCIe or MSI-X related settings.
When the NICs were emulated with virtio-net-pci instead of e1000e, running the testpmd test printed the following:

    EAL: Detected CPU lcores: 8
    EAL: Detected NUMA nodes: 1
    EAL: Detected static linkage of DPDK
    EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
    EAL: Selected IOVA mode 'PA'
    EAL: VFIO support initialized
    EAL: TSC using RISC-V rdtime.
    EAL: Using IOMMU type 8 (No-IOMMU)
    Interactive-mode selected
    Auto-start selected
    Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa.
    testpmd: create a new mbuf pool <mb_pool_0>: n=171456, size=2176, socket=0
    testpmd: preferred mempool ops selected: ring_mp_mc
    Configuring Port 0 (socket 0)
    EAL: Error disabling MSI-X interrupts for fd 541
    EAL: Error enabling MSI-X interrupts for fd 541
    VIRTIO_DRIVER: virtio_dev_start(): interrupt enable failed
    Fail to start port 0: Input/output error
    Configuring Port 1 (socket 0)
    EAL: Error disabling MSI-X interrupts for fd 545
    EAL: Error enabling MSI-X interrupts for fd 545
    VIRTIO_DRIVER: virtio_dev_start(): interrupt enable failed
    Fail to start port 1: Input/output error
    Done
    Start automatic packet forwarding
    Not all ports were started
    testpmd> start
    Not all ports were started

The program does not start properly: the virtio-net-pci devices cannot start their ports, possibly because virtio-net-pci only emulates PCI (MSI-X) interrupts, whereas e1000e can also emulate legacy interrupts.
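When comparing NIC models like this, it helps to pull the failed ports out of a saved testpmd log automatically. A minimal sketch, assuming the log lines have the "Fail to start port N: ..." form shown above; failed_ports is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read a testpmd log on stdin and print the number of
# every port that testpmd reported as failing to start, one per line.
failed_ports() {
    sed -n 's/^.*Fail to start port \([0-9][0-9]*\):.*$/\1/p'
}
```

For the log above, `failed_ports < testpmd.log` would print 0 and 1 (testpmd.log being whatever file the output was saved to), while a log from the working e1000e setup should produce no output.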
References:
https://doc.dpdk.org/guides/linux_gsg/linux_drivers.html
https://doc.dpdk.org/guides/linux_gsg/cross_build_dpdk_for_riscv.html
https://docs.openeuler.org/zh/docs/23.09/docs/Installation/riscv_qemu.html
https://www.qemu.org/docs/master/system/riscv/virt.html