vxworks issue: FTP tansfer slowly

1. 问题现象

调试环境:单板(AM335X+linux) --------(GE以太网)----------- 控制器(AM5728+vxworks)

出错场景:单板作为ftp client(172.16.89.4),控制器作为ftp server(172.16.89.71),单板在进行ftp数据下载的时候会经常出现速度突然变慢,有时会极度卡顿,最恶劣情况会传输失败。

m_Ftp_Connect_Login_Server_Function 
mmmmm "172.16.89.71" 21 
wait for command finished 0 
connecting to the Ftp server... 
"FTP now stateeeeeee:" 1 "主机名查找正在进行中" 
"FTP now stateeeeeee:" 2 "正在尝试连接到主机" 
"FTP now stateeeeeee:" 3 "已实现与主机的连接" 
wait for command finished 1 
m_ftp->currentCommand() 3 "Unknown error" 
connectToHost 
the FTP_server is successfully connected 
logging in... 
"Unknown error" 
wait for command finished 2 
"FTP now stateeeeeee:" 4 "已实现连接和用户登录" 
m_ftp->currentCommand() 4 "Unknown error" 
Login 
the login FTP_server is successful 
11111------------------------------------------------------------------------------------------ 
start!-----------------------------------------------------------!start 
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu "/mmc1:2/robot-folder" 
cd file from server... 
m_ftp->currentCommand() 7 "Unknown error" 
cd 
open file path successfully 
m_ftp->currentCommand() 6 "Unknown error" 
close 
cd file from server... 
m_ftp->currentCommand() 7 "Unknown error" 
cd 
open file path successfully 

...

2. 分析过程

因为单板作为ftp client和其他控制单板传输的过程中未碰到这类问题,所以我们排查重点放在目前的控制器单板侧。

Step 1. 排查系统负载过重

首先怀疑到的是server端在出现卡顿的时候,是不是任务负载过重,造成调度不过来的情况。

首先启动vxworks本地应用,并且开启单板的ftp下载程序。

// 启动应用
cd"/mmc1:1"
ld<pruss_eth.out
prussStart
ld<codesyscontrol.out
PlcStart"CODESYSControl.cfg"

我们通过spy命令查看vxworks的cpu占用率,发现在复现卡顿现象的时候cpu占用率非常低,idle都是百分之九十几:

-> spy
value = 548438048 = 0x20b08020 = ' '
-> 
    NAME         ENTRY       TID      PRI   total % (ticks)  delta % (ticks)
------------ ------------ ----------  ---   ---------------  ---------------
tIsr0        0x3ec07c     0x2027f380    0     0% (       1)    0% (       1)
tJobTask     0x30eba4     0x204f35c0    0     0% (       0)    0% (       0)
tExcTask     0x30e544     0x5461a8      0     0% (       0)    0% (       0)
tLogTask     logTask      0x204f3b08    0     0% (       0)    0% (       0)
tShell0      shellTask    0x20446c80    1     0% (       0)    0% (       0)
tHrfsCommit: hrfsCommitTa 0x20473b10    2     0% (       0)    0% (       0)
tSpyTask     spyComTask   0x20b08020    5     0% (       9)    0% (       3)
tErfTask     0x255664     0x2022aae8   10     0% (       0)    0% (       0)
tVxdbgTask   0x298384     0x20431868   25     0% (       0)    0% (       0)
CAAEventTask 0x108da24    0x20a7c430   30     0% (       0)    0% (       0)
SchedExcepti 0x108da24    0x20a7c020   31     0% (       0)    0% (       0)
tDmaJobTask  jobQueueProc 0x20000640   45     0% (       0)    0% (       0)
tTcfEvents   0x3bba88     0x551668     49     0% (       0)    0% (       0)
tTcf         0x2bfa60     0x20449238   49     0% (       1)    0% (       0)
tTcf         0x2bfa60     0x20455858   49     0% (       0)    0% (       0)
tTcf         0x2bfa60     0x20458a90   49     0% (       0)    0% (       0)
tNet0        ipcomNetTask 0x20232630   50     0% (       0)    0% (       0)
tNetConf     0x1b0974     0x204165c0   50     0% (       0)    0% (       0)
ipftps       ipftps_main  0x20428348   50     0% (       0)    0% (       0)
tAnalysisAge cafe_event_t 0x53dea8     50     0% (       0)    0% (       0)
SchedProcess 0x108da24    0x20a68af0   64     0% (       5)    0% (       2)
xHCD_IH0     usbXhcdInter 0x204db0d0   90     0% (       0)    0% (       0)
xHCD_IH1     usbXhcdInter 0x205d5348   90     0% (       0)    0% (       0)
BlkDrvUdp    0x108da24    0x20a90020   95     0% (       1)    0% (       0)
BlkDrvTcp    0x108da24    0x20a90298   95     0% (       5)    0% (       5)
tiCardMon    0x109924     0x202870d0  100     0% (       0)    0% (       0)
tiCardMon    0x109924     0x202879d8  100     0% (       0)    0% (       0)
BusM A       usbHubThread 0x204c54a0  100     0% (       2)    0% (       0)
BusM B       usbHubThread 0x204c5c08  100     0% (       0)    0% (       0)
BusM C       usbHubThread 0x204eb7a0  100     0% (       2)    0% (       0)
BusM D       usbHubThread 0x205d5050  100     0% (       0)    0% (       0)
tTffsPTask   flPollTask   0x206929b0  100     0% (       0)    0% (       0)
/mmc1XbdSvc  0x30c984     0x2045e010  100     0% (       0)    0% (       0)
SupervisorCy 0x108da24    0x20aa4020  128     0% (       0)    0% (       0)
OPCUAServer  0x108da24    0x20aa4ad8  128     0% (       2)    0% (       0)
tUsb2Clr     0x3d1aa8     0x20692d10  150     0% (       0)    0% (       0)
WebServerClo 0x108da24    0x20a7c788  150     0% (       1)    0% (       0)
CommCtrl     CommControlL 0x20491850  161     0% (       0)    0% (       0)
vnSyncer     vn_sync_task 0x20464988  200     0% (       0)    0% (       0)
miiBusMonito 0x24f3b8     0x204f32a0  252     0% (       3)    0% (       0)
MainLoop     0x108da24    0x20a90510  254     0% (       0)    0% (       0)
KERNEL                                        0% (       4)    0% (       4)
INTERRUPT                                     0% (       0)    0% (       0)
IDLE                                         98% (    2045)   97% (     512)
TOTAL                                        98% (    2081)   97% (     527)

总结:和系统负载无关。

Step 2. 排查文件系统读写速度

会不会是文件系统的读写速度变慢拖累了ftp速度。

vxworks下一共有两个分区,"/mmc1:1"是dosfs文件格式,"/mmc1:2"是hrfs文件格式:

xbdCreatePartition ("/mmc1:1", 2, 50, 0, 0)
dosFsVolFormat ("/mmc1:1", 0, 0)
hrfsFormat ("/mmc1:2", 0ll, 0, 0)

我们测试两个分区的文件读写速度,发现读速度还行,但是写速度非常慢。

[vxWorks *]# ls -l
... 
-rwxrwxrwx  1 0       0             21774591 Jan  1  1980 codesyscontrol.out 

-> timex(cp,"/mmc1:1/codesyscontrol.out","/mmc1:1/codesyscontrol.out.bak")
copying file /mmc1:1/codesyscontrol.out -> /mmc1:1/codesyscontrol.out.bak
timex: time of execution = 17873 +/- 0 (0%) millisecs
value = 54 = 0x36 = '6'
-> 
-> 
-> timex(cp,"/mmc1:2/codesyscontrol.out","/mmc1:2/codesyscontrol.out.bak")
copying file /mmc1:2/codesyscontrol.out -> /mmc1:2/codesyscontrol.out.bak
timex: time of execution = 137 +/- 0 (0%) secs
value = 47 = 0x2f = '/'
-> 

ftp使用的是"/mmc1:2"分区,出现故障的时候只是在进行ftp下载,对文件系统来说是读操作,但是是不是别的地方的写操作阻塞了读操作?

总结:这里也是一个怀疑点,但是后续的其他测试排除了和它相关。

Step 3. 排查网口0、网口1

ftp server端有两个GE网口,ftp下载一直使用的是网口0。
在这里插入图片描述
在卡顿故障复现的时候,单板通过网口0(cpsw0)和AM5728的ftp通讯非常卡顿,这时使用pc作为ftp client通过网口1(cpsw1)和AM5728的ftp通讯非常通畅无问题,把pc换到网口0(cpsw0)ftp传输一样的卡顿。

这一下子就排除掉了很多的因素,说明故障集中在AM5728的网口0传输链上:网卡驱动、网络协议栈、硬件走线都有嫌疑。

总结:缩小定位范围,故障集中在AM5728的网口0传输链上。

Step 4. 排查网卡驱动和网络协议栈

把单板换成通过网口1(cpsw1)和AM5728进行ftp通讯。

使用同样的ftp测试,连续测试一天无问题出现。

因为网口0、网口1的网卡驱动和网络协议栈部分基本一致,所以网口0的硬件走线嫌疑比较大。

总结:进一步缩小范围,走线信号嫌疑较大,但是还缺乏直接证据。

Step 5. 排查网口0的走线信号

我们打算读出网口0的mac层统计看看它的错误统计,如果因为走线信号质量问题产生的各种错误,这里应该能看到错误统计。

因为原生驱动代码没有支持,我们增加相应的函数:

diff --git a/vxbFdtTiCpswEnd.c b/vxbFdtTiCpswEnd.c
index e60c507..4c32f0e 100644 (file)
--- a/vxbFdtTiCpswEnd.c
+++ b/vxbFdtTiCpswEnd.c
@@ -256,6 +257,7 @@ LOCAL char * rxChanErrMsg [] =
 /* common resource which is shared by all port */
 
 LOCAL CPSW_SW_CTRL * pSwCtrl = NULL;
+CPSW_STAT           tmp_cpswStat; 
 
 /* import functions */
 

@@ -2872,6 +2874,70 @@ LOCAL STATUS cpswEndStatsDump
     return OK;
     }
 
+STATUS cpswEndStatsShow
+    (
+    )
+    {
+    int              i;
+    VXB_DEV_ID       pDev = pSwCtrl->pDev;
+       UINT32         * temp = (UINT32 *)&tmp_cpswStat;
+
+    /*
+     * hardware statistic counters are write-to-decrement,
+     * after a read, we write the value read to clear
+     * the counters
+     */
+
+    for (i = 0; i < sizeof (CPSW_STAT) / sizeof (UINT32); i++)
+        {
+        *temp = CSR_READ_4 (pDev, pSwCtrl->statsOffset + i*4);
+        //CSR_WRITE_4 (pDev, pSwCtrl->statsOffset + i*4, *temp);
+        temp++;
+        }
+               
+       (* _func_kprintf)("cpswMacStat rxgood = %d\n", tmp_cpswStat.rxgood);
+       (* _func_kprintf)("cpswMacStat rxbroadcast = %d\n", tmp_cpswStat.rxbroadcast);
+       (* _func_kprintf)("cpswMacStat rxmulticast = %d\n", tmp_cpswStat.rxmulticast);
+       (* _func_kprintf)("cpswMacStat rxpause = %d\n", tmp_cpswStat.rxpause);
+       (* _func_kprintf)("cpswMacStat rxcrcerros = %d\n", tmp_cpswStat.rxcrcerros);
+       (* _func_kprintf)("cpswMacStat rxalignmenterrors = %d\n", tmp_cpswStat.rxalignmenterrors);
+       (* _func_kprintf)("cpswMacStat rxoversized = %d\n", tmp_cpswStat.rxoversized);
+       (* _func_kprintf)("cpswMacStat rxjabber = %d\n", tmp_cpswStat.rxjabber);
+       (* _func_kprintf)("cpswMacStat rxundersized = %d\n", tmp_cpswStat.rxundersized);
+       (* _func_kprintf)("cpswMacStat rxfrags = %d\n", tmp_cpswStat.rxfrags);
+       (* _func_kprintf)("cpswMacStat rxoctets = %d\n\n", tmp_cpswStat.rxoctets);
+
+       (* _func_kprintf)("cpswMacStat txgood = %d\n", tmp_cpswStat.txgood);
+       (* _func_kprintf)("cpswMacStat txbroadcast = %d\n", tmp_cpswStat.txbroadcast);
+       (* _func_kprintf)("cpswMacStat txmulticast = %d\n", tmp_cpswStat.txmulticast);
+       (* _func_kprintf)("cpswMacStat txpause = %d\n", tmp_cpswStat.txpause);
+       (* _func_kprintf)("cpswMacStat txdefered = %d\n", tmp_cpswStat.txdefered);
+       (* _func_kprintf)("cpswMacStat txcollision = %d\n", tmp_cpswStat.txcollision);
+       (* _func_kprintf)("cpswMacStat txsinglecol = %d\n", tmp_cpswStat.txsinglecol);
+       (* _func_kprintf)("cpswMacStat txmulticol = %d\n", tmp_cpswStat.txmulticol);
+       (* _func_kprintf)("cpswMacStat txexceesive = %d\n", tmp_cpswStat.txexceesive);
+       (* _func_kprintf)("cpswMacStat txlatecol = %d\n", tmp_cpswStat.txlatecol);
+       (* _func_kprintf)("cpswMacStat txunderrun = %d\n", tmp_cpswStat.txunderrun);
+       (* _func_kprintf)("cpswMacStat txcariersense = %d\n", tmp_cpswStat.txcariersense);
+       (* _func_kprintf)("cpswMacStat txoctets = %d\n\n", tmp_cpswStat.txoctets);
+
+       (* _func_kprintf)("cpswMacStat sz64octets = %d\n", tmp_cpswStat.sz64octets);
+       (* _func_kprintf)("cpswMacStat sz65_127octets = %d\n", tmp_cpswStat.sz65_127octets);
+       (* _func_kprintf)("cpswMacStat sz128_255octets = %d\n", tmp_cpswStat.sz128_255octets);
+       (* _func_kprintf)("cpswMacStat sz256_511octets = %d\n", tmp_cpswStat.sz256_511octets);
+       (* _func_kprintf)("cpswMacStat sz512_1023octets = %d\n", tmp_cpswStat.sz512_1023octets);
+       (* _func_kprintf)("cpswMacStat sz1024octets = %d\n", tmp_cpswStat.sz1024octets);
+       (* _func_kprintf)("cpswMacStat netoctets = %d\n", tmp_cpswStat.netoctets);
+       (* _func_kprintf)("cpswMacStat rxfifooverrun = %d\n", tmp_cpswStat.rxfifooverrun[0]);
+       (* _func_kprintf)("cpswMacStat rxfifooverrun = %d\n", tmp_cpswStat.rxfifooverrun[1]);
+       (* _func_kprintf)("cpswMacStat rxfifooverrun = %d\n\n", tmp_cpswStat.rxfifooverrun[2]);
+
+       (* _func_kprintf)("cpswMacStat port 0 cpswInDropped = %lld\n", pSwCtrl->port[0]->cpswInDropped);
+       (* _func_kprintf)("cpswMacStat port 1 cpswInDropped = %lld\n\n\n", pSwCtrl->port[1]->cpswInDropped);
+
+    return OK;
+    }
+
 /*******************************************************************************
 *
 * cpswEndIoctl - the driver I/O control routine

在平时的传输中,偶尔就有接收crc error的情况:

-> cpswEndStatsShow
...
cpswEndStatsShow,2900, rxgood = 1023
cpswEndStatsShow,2901, rxbroadcast = 0
cpswEndStatsShow,2902, rxmulticast = 0
cpswEndStatsShow,2903, rxpause = 0
cpswEndStatsShow,2904, rxcrcerros = 17
cpswEndStatsShow,2905, rxalignmenterrors = 0
cpswEndStatsShow,2906, rxoversized = 0
cpswEndStatsShow,2907, rxjabber = 0
cpswEndStatsShow,2908, rxundersized = 0
cpswEndStatsShow,2909, rxfrags = 0
cpswEndStatsShow,2910, rxoctets = 83903

在极端卡顿的情况下,接收crc error大量增加:

-> cpswEndStatsShow
...
cpswEndStatsShow,2900, rxgood = 5544
cpswEndStatsShow,2901, rxbroadcast = 181
cpswEndStatsShow,2902, rxmulticast = 158
cpswEndStatsShow,2903, rxpause = 0
cpswEndStatsShow,2904, rxcrcerros = 109
cpswEndStatsShow,2905, rxalignmenterrors = 0
cpswEndStatsShow,2906, rxoversized = 0
cpswEndStatsShow,2907, rxjabber = 0
cpswEndStatsShow,2908, rxundersized = 0
cpswEndStatsShow,2909, rxfrags = 0
cpswEndStatsShow,2910, rxoctets = 433598

构造内容为全FF的包,发送100个,错误100个:
在这里插入图片描述

-> cpswEndStatsShow
...
cpswEndStatsShow,2900, rxgood = 9844
cpswEndStatsShow,2901, rxbroadcast = 613
cpswEndStatsShow,2902, rxmulticast = 379
cpswEndStatsShow,2903, rxpause = 0
cpswEndStatsShow,2904, rxcrcerros = 272
cpswEndStatsShow,2905, rxalignmenterrors = 0
cpswEndStatsShow,2906, rxoversized = 0
cpswEndStatsShow,2907, rxjabber = 0
cpswEndStatsShow,2908, rxundersized = 0
cpswEndStatsShow,2909, rxfrags = 0
cpswEndStatsShow,2910, rxoctets = 764175

-> cpswEndStatsShow
...
cpswEndStatsShow,2900, rxgood = 9877
cpswEndStatsShow,2901, rxbroadcast = 635
cpswEndStatsShow,2902, rxmulticast = 387
cpswEndStatsShow,2903, rxpause = 0
cpswEndStatsShow,2904, rxcrcerros = 374
cpswEndStatsShow,2905, rxalignmenterrors = 0
cpswEndStatsShow,2906, rxoversized = 0
cpswEndStatsShow,2907, rxjabber = 0
cpswEndStatsShow,2908, rxundersized = 0
cpswEndStatsShow,2909, rxfrags = 0
cpswEndStatsShow,2910, rxoctets = 766815

同样的测试,网口1测试了一天无crc error情况。

总结:的确是硬件走线信号质量问题,造成数据在传输中出错。

Step 6. 网口0降速测试尝试

我们尝试把网口0降速到100M,来测试速度降低是否能消除错误。

因为原生驱动没有类似函数,我们也必须手工编写一个:

diff --git a/vxbFdtTiCpswEnd.c b/vxbFdtTiCpswEnd.c
index 4c32f0e..71afa1a 100644 (file)
--- a/vxbFdtTiCpswEnd.c
+++ b/vxbFdtTiCpswEnd.c
@@ -2938,6 +2946,126 @@ STATUS cpswEndStatsShow
     return OK;
     }
 
+STATUS cpswEndLinkSet
+(
+       UINT32 port,
+       UINT32 mode
+)
+{
+       CPSW_DRV_CTRL * pDrvCtrl;
+       
+       if ((port > 1) || (mode > 4))
+               return -1;
+       
+       pDrvCtrl = pSwCtrl->port[port];
+       
+       switch(mode)
+       {
+               case 0:
+                       (void)miiBusModeSet (pDrvCtrl->cpswMiiDev, IFM_ETHER | IFM_AUTO);
+                       (* _func_kprintf)("set port %d mode: AUTO_SEL\n");
+                       break;
+               case 1:
+                       (void)miiBusModeSet (pDrvCtrl->cpswMiiDev, IFM_ETHER|IFM_100_TX|IFM_FDX);
+                       (* _func_kprintf)("set port %d mode: 100 FDX\n");
+                       break;  
+               case 2:
+                       (void)miiBusModeSet (pDrvCtrl->cpswMiiDev, IFM_ETHER|IFM_100_TX);
+                       (* _func_kprintf)("set port %d mode: 100 HDX\n");
+                       break;
+               case 3:
+                       (void)miiBusModeSet (pDrvCtrl->cpswMiiDev, IFM_ETHER|IFM_10_T|IFM_FDX);
+                       (* _func_kprintf)("set port %d mode: 10 FDX\n");
+                       break;
+               case 4:
+                       (void)miiBusModeSet (pDrvCtrl->cpswMiiDev, IFM_ETHER|IFM_10_T);
+                       (* _func_kprintf)("set port %d mode: 10 HDX\n");
+                       break;
+               default:
+                       break;                  
+       }
+       
+       return OK;
+}

强制设置网口0为100M全双工模式:

-> cpswEndLinkSet(0, 1)

网口0降速为100M模式,测试了半天无crc error情况。

总结:降速能解决crc error的问题,是一个workaround方案。

3. 结论

综上所述,ftp变慢卡顿的根因是硬件信号不稳定造成的大量错包丢包。

目前的workaround方案,可以把网口0降速成100M来使用。

根本的解决办法,需要硬件重新布线重新仿真,单板完工后需要做完备的信号测试,这样高速信号才能有一个扎实稳定的基础。

posted @ 2020-11-01 10:44  pwl999  阅读(110)  评论(0编辑  收藏  举报