netmap performance
Reposted from http://mnstory.net/2014/11/netmap-performance/
This test runs on Linux with Intel E1000E NICs.
Target device: three network ports. eth0 and eth1 are both Intel 82574L NICs, used for data forwarding; eth2 is the management port, so the box stays reachable remotely while eth0 and eth1 are forwarding traffic.
Traffic is generated with smartflow. (Topology diagram from the original post omitted.)
Compilation
First, enter the LINUX directory:
# cd netmap-src-dir/LINUX
We test the e1000e driver. Following the patch version-number naming rule, copy the matching e1000e driver patch into the patches directory.
For example, my kernel version is 3.10.0:
# uname -a
Linux host-001e67a1aaf9 3.10.0 #128 SMP x86_64 GNU/Linux
3.10.0 thus maps to 31000; from the files under final-patches, the matching patch file covers the version range 30900--99999.
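The mapping appears to be major*10000 + minor*100 + patch. A minimal sketch of that rule (the function name ver2num is mine, and the formula is inferred from the 3.10.0 -> 31000 example, not taken from netmap documentation):

```shell
# ver2num: map a kernel version string to netmap's patch numbering,
# e.g. 3.10.0 -> 3*10000 + 10*100 + 0 = 31000 (inferred rule, not official)
ver2num() {
    IFS=. read -r major minor patch <<EOF
$1
EOF
    echo $(( major * 10000 + minor * 100 + ${patch:-0} ))
}

ver2num 3.10.0   # 31000, inside the diff--e1000e--30900--99999 range
```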
# mkdir patches
Copy the e1000e patch into the patches directory:
# cp final-patches/diff--e1000e--30900--99999 patches/
When building, point KSRC at the kernel source tree of the target machine:
# make clean; make KSRC=/src/VMP4.0/src/linux
This produces netmap_lin.ko and e1000e/e1000e.ko; copy them to the target machine.
Build pkt-gen and the other examples:
# cd ../examples
# make
Copy the resulting binaries to the target machine as well.
Running the tests
On the first run we hit a problem: starting pkt-gen made the kernel panic. The dmesg output was:
[ 3.562540] {1}[Hardware Error]: APEI generic hardware error status
[ 3.563146] {1}[Hardware Error]: severity: 1, fatal
[ 3.563734] {1}[Hardware Error]: section: 0, severity: 1, fatal
[ 3.564326] {1}[Hardware Error]: flags: 0x01
[ 3.564911] {1}[Hardware Error]: primary
[ 3.565516] {1}[Hardware Error]: section_type: PCIe error
[ 3.566105] {1}[Hardware Error]: port_type: 0, PCIe end point
[ 3.566700] {1}[Hardware Error]: version: 1.16
[ 3.567289] {1}[Hardware Error]: command: 0x4010, status: 0x0547
[ 3.567878] {1}[Hardware Error]: device_id: 0000:00:00.0
[ 3.568464] {1}[Hardware Error]: slot: 0
[ 3.569046] {1}[Hardware Error]: secondary_bus: 0x00
[ 3.569652] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x10d3
[ 3.570246] {1}[Hardware Error]: class_code: 000002
[ 3.570856] Kernel panic - not syncing: Fatal hardware error!
Here vendor_id: 0x8086 and device_id: 0x10d3 identify the vendor and model of the failing device; look the device id up with lspci:
# lspci -nn -vv | grep 10d3
03:00.0 Ethernet controller [0200]: Intel Corporation 82574L Gigabit Network Connection [8086:10d3]
It is exactly the NIC sending the packets. Later, dmesg also showed:
e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:1e:67:a1:aa:f9
e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
e1000e 0000:00:19.0 eth0: MAC: 10, PHY: 11, PBA No: 0100FF-0FF
ACPI Warning: SystemIO range 0x0000000000005000-0x000000000000501f conflicts with OpRegion 0x0000000000005000-0x000000000000500f (\_SB_.PCI0.SBUS.SMBI) (20130517/utaddress-254)
ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
ahci 0000:00:1f.2: version 3.0
e1000e 0000:03:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
e1000e 0000:03:00.0: irq 46 for MSI/MSI-X
DMAR:[fault reason 05] PTE Write access is not set
dmar: DRHD: handling fault status reg 2
dmar: DMAR:[DMA Write] Request device [03:00.0] fault addr 7285ac000
DMAR:[fault reason 05] PTE Write access is not set
irq 48: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O-------------- 3.10.0 #128
Hardware name: Intel Corporation S1200BTL/S1200BTL, BIOS S1200BT.86B.02.00.0035.030220120927 03/02/2012
ffff8807053f5e84 ffff88081e803e28 ffffffff81653b0a ffff88081e803e58
ffffffff810dba9d 0000000000013080 ffff8807053f5e00 0000000000000030
0000000000000000 ffff88081e803ea8 ffffffff810dbf41 00000030669e0b7c
Call Trace:
[] dump_stack+0x19/0x1b
[] __report_bad_irq+0x3d/0xe0
[] note_interrupt+0x1b1/0x200
[] ? cpuidle_enter_state+0x5b/0xe0
[] handle_irq_event_percpu+0xa2/0x1e0
[] handle_irq_event+0x42/0x70
[] handle_edge_irq+0x6f/0x110
[] handle_irq+0x22/0x40
[] do_IRQ+0x5e/0x110
[] common_interrupt+0x6a/0x6a
[] ? cpuidle_enter_state+0x5b/0xe0
[] ? cpuidle_enter_state+0x57/0xe0
[] cpuidle_idle_call+0xbb/0x200
[] arch_cpu_idle+0xe/0x30
[] cpu_startup_entry+0x9a/0x220
[] rest_init+0x77/0x80
[] start_kernel+0x44e/0x45b
[] ? repair_env_string+0x5e/0x5e
[] x86_64_start_reservations+0x2a/0x2c
[] x86_64_start_kernel+0xfd/0x101
handlers:
[] e1000_msix_other [e1000e]
Disabling IRQ #48
The messages above suggest the problem is DMA-related, so check the kernel build options:
# cat .config | grep INTEL_IOMMU
CONFIG_INTEL_IOMMU=y
CONFIG_INTEL_IOMMU_DEFAULT_ON=y
CONFIG_INTEL_IOMMU_FLOPPY_WA=y
The IOMMU is enabled by default, so disable it via the grub boot entry:
# cat /proc/cmdline
BOOT_IMAGE=/firmware/current/package/files/vm crashkernel=400M loglevel=3 elevator=deadline softlockup_panic=1 reboot=force nohz=off intel_iommu=off
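For reference, on a stock grub2 distribution the flag would typically be added like this (this box uses its own boot setup, and paths and commands vary by distro, so this is only an illustrative sketch):

```shell
# Append intel_iommu=off to the kernel command line in /etc/default/grub:
#   GRUB_CMDLINE_LINUX="... intel_iommu=off"
# then regenerate the grub config and reboot:
grub-mkconfig -o /boot/grub/grub.cfg    # or: update-grub (Debian/Ubuntu)
```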
Native Linux test
When testing native Linux forwarding, one problem was that ip_forward was off, so layer-3 traffic did not get through: packets arrived on eth0 but never left through eth1:
# pps.sh eth0 eth1
02:25:03 IFN        RXB       TXB      RXP     TXP
eth0           18661696       507   291589       5
eth1                  0         0        0       0
ip_forward was indeed disabled; enable it:
# cat /proc/sys/net/ipv4/ip_forward
0
# echo 1 > /proc/sys/net/ipv4/ip_forward
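The echo above only lasts until reboot; to persist it on a sysctl.conf-based distro (a standard recipe, not from the original post):

```shell
# runtime toggle, equivalent to echoing into /proc:
sysctl -w net.ipv4.ip_forward=1
# persist across reboots:
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf
sysctl -p    # reload the file now
```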
After that, traffic flows:
# pps.sh eth0 eth1
02:27:51 IFN        RXB       TXB      RXP     TXP
eth0           27094848        95   423357       1
eth1                  0  27058240        0  422789
Netmap vale test
When initializing the environment, bind the NIC interrupts to CPU 1 so they are not disturbed too much on CPU 0. This is wrapped in a script for easy reuse:
lcmdrun()
{
cmd="$*"
echo "$cmd"
eval "$cmd"
}
#bind this NIC's interrupts to the given CPU
w.eth.irqaffinity()
{
eth="$1"
affinity="$2"
if [ "$eth" = "" ]; then
lerror "w.eth.irqaffinity \$eth [\$affinity]";
return;
fi
#merge the multi-line result into one line
lines=`cat /proc/interrupts | grep $eth | awk -F: '{print $2}'`
#lines=' 47
#48
#49'
SAVE_IFS=$IFS
IFS=$'\n'
#if we set IFS=$'\n' and pass lines as args, it become => trim ' 47' ' 48' ' 49'
#if we set IFS=$' ' and pass lines as args, it become =>
#trim '47
#' '48
#' 49
lines=$(trim $lines)
#trim return lines='47 48 49'
IFS=$' '
#use $lines, not "$lines": "$lines" is not split on spaces, but $lines is
for i in $lines; do
i=$(trim $i)
if [ "$affinity" == "" ]; then
#just echo
cmd="cat /proc/irq/$i/smp_affinity"
else
cmd="echo \"$affinity\" > /proc/irq/$i/smp_affinity"
fi
lcmdrun "$cmd"
done
IFS=$SAVE_IFS
}
w.nm.stop()
{
nic=$1
shift
ifs=( $@ )
ifs_nr=$#
#stop interface
for ((i=0; i < $ifs_nr; i++))
do
lcmdrun "ifconfig ${ifs[$i]} down"
done
#remove old modules
lcmdrun rmmod "${nic}.ko"
lcmdrun rmmod "netmap_lin.ko"
}
w.nm.start()
{
nic=$1
shift
ifs=( $@ )
if [ ! -f "${nic}.ko" ]; then
echo "can't find ${nic}.ko"
return 1
fi
if [ ! -f "netmap_lin.ko" ]; then
echo "can't find netmap_lin.ko"
return 1
fi
#clear dmesg
dmesg -c > /dev/null
#set ring slot, set before netmap attach when eth is down
for eth in "${ifs[@]}"; do
lcmdrun "ifconfig $eth down 2>/dev/null"
done
#insert new modules
lcmdrun insmod "netmap_lin.ko"
lcmdrun insmod "${nic}.ko"
for eth in "${ifs[@]}"; do
#"ethtool -G $eth rx 4096 tx 4096"
#start interface
lcmdrun "ifconfig $eth up"
#set irq affinity to cpu1
lcmdrun "w.eth.irqaffinity $eth 2"
#set promisc mode, important
lcmdrun "ifconfig $eth promisc"
#ethtool -K $eth tso off
#ethtool -K $eth gso off
done
#show log
lcmdrun dmesg
lcmdrun "lsmod | grep \"${nic}\""
}
w.nm.restart()
{
w.nm.stop $*
w.nm.start $*
}
w.nm.restart.e1000e()
{
w.nm.restart "e1000e" "eth0" "eth1"
}
Calling w.nm.restart.e1000e is enough to initialize the netmap environment. Once it is initialized, set up the virtual bridge:
# vale-ctl -h vale23:eth0
# vale-ctl -h vale23:eth1
# vale-ctl -l
bdg_ctl [98] bridge:0 port:0 vale23:eth0
bdg_ctl [98] bridge:0 port:1 eth0
bdg_ctl [98] bridge:0 port:2 vale23:eth1
bdg_ctl [98] bridge:0 port:3 eth1
|
At this point eth0 and eth1 are attached to the same bridge, so packets can be forwarded from eth0 to eth1 and vice versa.
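Without external test gear, the pkt-gen example built earlier can also drive a vale switch directly; a rough sketch (the port names gen/sink are arbitrary, and flag spelling follows the pkt-gen in the netmap examples tree, which may differ across versions):

```shell
# attach two extra virtual ports to the same vale switch: one side
# transmits 64-byte frames, the other counts what it receives
./pkt-gen -i vale23:gen -f tx -l 64 &
./pkt-gen -i vale23:sink -f rx
```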
Netmap bridge test
The test above forwards through vale. Netmap can also be used socket-style, polling a descriptor to receive and forward data yourself; it ships example code for this, bridge, which we use for the next test.
At first bridge could not send or receive at all; it turned out promiscuous mode had not been enabled on the ports. Taking eth0 as an example, check with ifconfig eth0 | grep PROMISC; if it is not set, enable it with ifconfig eth0 promisc.
The bridge test itself is simple, just run:
# ./bridge -i netmap:eth0 -i netmap:eth1
Of course, for better performance, also pin this process to CPU 1 (where the NIC interrupts are delivered) and give it real-time priority:
pidexpend()
{
ip=$1;
if [ "$ip" == "" ] ; then
return;
fi
if [ "$ip" -eq "$ip" ] 2>/dev/null; then
echo $ip;
return;
fi
#not digit
pidof $ip | awk '{print $1}'
}
w.cpu.affinity.show()
{
pid=$(pidexpend $1)
if [ "$pid" = "" ]; then
echo "w.cpu.affinity.show (pid|progname)"
return 1
fi
lcmdrun "taskset -p $pid"
lcmdrun "chrt -p $pid"
return 0
}
w.cpu.affinity()
{
#set eth affinity to cpu 1, hex 0x2
pid=$(pidexpend $1)
cpumask=2
if [ "$2" != "" ]; then
cpumask=$2
fi
realtime=90
if [ "$3" != "" ]; then
realtime=$3
fi
if [ "$pid" = "" ]; then
echo "w.cpu.affinity (pid|progname) [\$cpumask=$cpumask] [\$realtime=$realtime]"
return 1
fi
if [ "$cpumask" -eq "$cpumask" ] 2>/dev/null; then
lcmdrun "taskset -p $cpumask $pid"
fi
if [ "$realtime" -eq "$realtime" ] 2>/dev/null; then
lcmdrun "chrt -p -r $realtime $pid"
fi
return 0
}
Once bridge is running, call w.cpu.affinity bridge to set its CPU affinity and real-time priority; then the testing can begin.
Test results
- netmap bridge and netmap vale perform about the same.
- With 64-byte packets, netmap reaches 1.21 Mpps versus 0.83 Mpps for native Linux, a 45% improvement. The official figure for a 10 GbE NIC is 14.88 Mpps, which scales to about 1.488 Mpps on gigabit, 22% above my measurement; I do not know how they tested.
- As the frame size grows, the difference disappears because the wire itself becomes the bottleneck; except with 64-byte frames, CPU utilization never reaches 100%.
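The 14.88 Mpps / 1.488 Mpps figures are simply Ethernet line rate for minimum-size frames: each 64-byte frame occupies 64 + 8 (preamble/SFD) + 12 (inter-frame gap) = 84 bytes on the wire. A quick check of that arithmetic:

```shell
# bytes a minimum-size frame occupies on the wire
wire_bytes=$(( 64 + 8 + 12 ))                 # 84
# gigabit line rate in packets per second: 10^9 bit/s over 84*8 bits/frame
gbe_pps=$(( 1000000000 / (wire_bytes * 8) ))
echo "$gbe_pps"    # 1488095, i.e. ~1.488 Mpps; x10 gives 14.88 Mpps on 10 GbE
```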


