Troubleshooting "Detected Tx Unit Hang"

Goal:

Make skb->data point at memory I have already allocated myself, instead of the data buffer that alloc_skb() would allocate.

Part of the code:

 1             /*
 2              * build a new sk_buff
 3              */
 4             //struct sk_buff *send_skb = kmem_cache_alloc_node(skbuff_head_cache, GFP_ATOMIC & ~__GFP_DMA, NUMA_NO_NODE);
 5             struct sk_buff *send_skb = kmem_cache_alloc(skbuff_head_cache, GFP_ATOMIC & ~__GFP_DMA);
 6 
 7             if (!send_skb) {
 8                 //spin_unlock(&lock);
 9                 return NF_DROP;
10             }
11             
12             //printk("what2\n");
13             memset(send_skb, 0, offsetof(struct sk_buff, tail));
14             atomic_set(&send_skb->users, 2);
15             send_skb->cloned = 0;
16             
17             send_skb->head = mmap_buf + 1024;
18             send_skb->data = mmap_buf + 1024;
19             

On line 18, mmap_buf is memory that was allocated in advance.

With this change, the NIC driver logs the following errors in /var/log/messages:

 1 Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
 2 Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <13>
 3 Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <0>, <1ea>
 4 Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1ea>
 5 Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <0>
 6 Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
 7 Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <15>
 8 Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <1>, <1eb>
 9 Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1eb>
10 Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <1>
11 Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
12 Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <14>
13 Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <0>, <1ea>
14 Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1ea>
15 Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <0>
16 Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
17 Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <4>
18 Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <0>, <1ea>
19 Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1ea>
20 Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <0>
21 Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
22 Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <12>
23 Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <5>, <1ef>
24 Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1ef>
25 Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <5>
26 Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
27 Sep 28 15:36:17 10g-host2 kernel:  Tx Queue             <2>
28 Sep 28 15:36:17 10g-host2 kernel:  TDH, TDT             <2>, <1ec>
29 Sep 28 15:36:17 10g-host2 kernel:  next_to_use          <1ec>
30 Sep 28 15:36:17 10g-host2 kernel:  next_to_clean        <2>
31 Sep 28 15:36:17 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
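
One way to read these dumps: TDH is the transmit descriptor head, advanced by the hardware, and TDT is the tail, advanced by the driver; ixgbe prints both in hexadecimal. The small user-space sketch below counts how many descriptors have been queued but not yet consumed. The helper name pending_descriptors and the ring size of 512 are my assumptions; the log does not show the ring size.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of ixgbe): number of descriptors between
 * the hardware head (TDH) and the driver tail (TDT) on a circular ring.
 * Assumes tdh < ring_size, tdt < ring_size and ring_size != 0.
 */
uint32_t pending_descriptors(uint32_t tdh, uint32_t tdt, uint32_t ring_size)
{
    /* Adding ring_size before subtracting handles wrap-around of the tail. */
    return (tdt + ring_size - tdh) % ring_size;
}
```

Under the 512-entry assumption, queue 13 (TDH 0x0, TDT 0x1ea) and queue 12 (TDH 0x5, TDT 0x1ef) both come out at 490 pending descriptors, which is consistent with the driver continuing to queue packets while the hardware makes almost no progress.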

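After ruling out other possible causes, I traced the problem to how mmap_buf was allocated. Allocating it with vmalloc() was wrong; after switching to kmalloc(), everything worked normally.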

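Section 12.5 of "Linux Kernel Development" (《Linux内核设计与实现》) explains this. The likely reason: the NIC reads the buffer by DMA and needs it to be physically contiguous, whereas vmalloc() guarantees only that the virtual addresses are contiguous.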
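
To make the distinction concrete, here is a user-space toy model, not kernel code. It assumes a 4096-byte page and represents a buffer as an array giving the physical frame number behind each of its virtual pages. The names TOY_PAGE_SIZE, cpu_phys and dma_phys are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u  /* assumed page size for this toy model */

/*
 * Physical address the CPU reaches for byte `off` of the buffer.
 * The CPU translates every page separately through the page table,
 * modelled here as frames[i] = physical frame backing virtual page i.
 */
uint64_t cpu_phys(const uint64_t *frames, size_t off)
{
    return frames[off / TOY_PAGE_SIZE] * TOY_PAGE_SIZE + off % TOY_PAGE_SIZE;
}

/*
 * Physical address a device reaches for the same byte when it is handed
 * only the physical address of the buffer start and adds the offset.
 */
uint64_t dma_phys(const uint64_t *frames, size_t off)
{
    return frames[0] * TOY_PAGE_SIZE + off;
}
```

For a kmalloc()-like layout such as frames {100, 101}, both functions agree at every offset. For a vmalloc()-like layout such as {100, 37}, they diverge as soon as the offset crosses the first page, so in this model the device would read memory that does not belong to the buffer.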

 

posted @ 2014-10-22 11:35  lxgeek