Testing and Verifying How the Cache Works

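I recently read Gustavo Duarte's blog post "Cache: A Place for Concealment and Safekeeping" (Chinese translation: http://www.cnblogs.com/xkfz007/archive/2012/10/08/2715163.html) and finally understood how the cache works. I then wanted to measure how cache conflicts affect program performance.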

 

Test machine:

        [test@database2 cache_test]$ uname -a

               Linux database2 2.6.18-308.el5PAE #1 SMP Tue Feb 21 20:46:05 EST 2012 i686 i686 i386 GNU/Linux

 

Caches:

        [root@database2 ~]# dmidecode -t cache

        # dmidecode 2.11

        SMBIOS 2.7 present.

 

        Handle 0x0005, DMI type 7, 19 bytes

        Cache Information

        Socket Designation: L1-Cache

        Configuration: Enabled, Not Socketed, Level 1

        Operational Mode: Write Back

        Location: Internal

        Installed Size: 256 kB

        Maximum Size: 256 kB

        Supported SRAM Types:

                Unknown

        Installed SRAM Type: Unknown

        Speed: Unknown

        Error Correction Type: Parity

        System Type: Unified

        Associativity: 8-way Set-associative

 

        Handle 0x0006, DMI type 7, 19 bytes

        Cache Information

        Socket Designation: L2-Cache

        Configuration: Enabled, Not Socketed, Level 2

        Operational Mode: Write Back

        Location: Internal

        Installed Size: 1024 kB

        Maximum Size: 1024 kB

        Supported SRAM Types:

                Unknown

        Installed SRAM Type: Unknown

        Speed: Unknown

        Error Correction Type: Single-bit ECC

        System Type: Unified

        Associativity: 8-way Set-associative

 

        Handle 0x0007, DMI type 7, 19 bytes

        Cache Information

        Socket Designation: L3-Cache

        Configuration: Enabled, Not Socketed, Level 3

        Operational Mode: Write Back

        Location: Internal

        Installed Size: 10240 kB

        Maximum Size: 10240 kB

        Supported SRAM Types:

                Unknown

        Installed SRAM Type: Unknown

        Speed: Unknown

        Error Correction Type: Single-bit ECC

        System Type: Unified

        Associativity: <OUT OF SPEC>

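The machine has a single CPU with 4 cores and Hyper-Threading, 8 logical CPUs in total. According to /var/log/dmesg, each core has a 32 KB L1 data cache and a 32 KB L1 instruction cache (shared by the core's two hyper-threads). The sizes dmidecode reports above are totals for all four cores: 4 × (32 KB + 32 KB) = 256 KB of L1 and 4 × 256 KB = 1024 KB of L2.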

CPU: L1 I cache: 32K, L1 D cache: 32K

CPU: L2 cache: 256K

CPU: L3 cache: 10240K

CPU: Physical Processor ID: 0

CPU: Processor Core ID: 0

 

I wrote the following code for the test:

 

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>

    #define PAGE 4096  // memory page size in bytes
    #define WAY 8      // the L1 data cache is 8-way set associative
    #define LINE 64    // cache line size in bytes

    #define GROUP 2    // number of groups of WAY pages to touch

    #define TOUCH 100000000  // number of passes over the touched addresses

    // Stores y - x in *result; returns -1 (leaving *result unchanged) if y is earlier than x.
    int timeval_subtract(struct timeval *result, struct timeval *x, struct timeval *y)
    {
        if (x->tv_sec > y->tv_sec)
            return -1;

        if ((x->tv_sec == y->tv_sec) && (x->tv_usec > y->tv_usec))
            return -1;

        result->tv_sec = y->tv_sec - x->tv_sec;
        result->tv_usec = y->tv_usec - x->tv_usec;

        if (result->tv_usec < 0) {  // borrow one second
            result->tv_sec--;
            result->tv_usec += 1000000;
        }

        return 0;
    }

    int main(void)
    {
        volatile int *pI;  // volatile: every store must really be performed, not optimized away
        char *p = malloc(GROUP * WAY * PAGE);
        struct timeval start, stop, diff;

        if (p == NULL) {
            perror("malloc");
            return 1;
        }

        printf("cache touch style 1\n");
        gettimeofday(&start, NULL);
        for (int i = 0; i < TOUCH; i++) {
            for (int j = 0; j < GROUP * WAY; j++) {
                // addresses are PAGE bytes apart, so all map to the same set;
                // more than WAY lines in one set causes conflict misses
                pI = (volatile int *)(p + j * PAGE);
                *pI = j;
            }
        }
        gettimeofday(&stop, NULL);
        timeval_subtract(&diff, &start, &stop);
        // %03ld keeps the leading zeros of the milliseconds (e.g. 4.035, not 4.35)
        printf("Total time: %ld.%03ld s\n", (long)diff.tv_sec, (long)(diff.tv_usec / 1000));

        printf("cache touch style 2\n");
        gettimeofday(&start, NULL);
        for (int i = 0; i < TOUCH; i++) {
            for (int j = 0; j < GROUP * WAY; j++) {
                // the extra j * LINE puts each address in a different set: no conflicts
                pI = (volatile int *)(p + j * PAGE + j * LINE);
                *pI = j;
            }
        }
        gettimeofday(&stop, NULL);
        timeval_subtract(&diff, &start, &stop);
        printf("Total time: %ld.%03ld s\n", (long)diff.tv_sec, (long)(diff.tv_usec / 1000));

        free(p);
        return 0;
    }

 

Compiling and running the program produces the following output:

[test@database2 cache_test]$ ./cache_test

cache touch style 1

Total time: 4.535 s

cache touch style 2

Total time: 1.314 s

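In the first case the loop takes about 4.5 seconds because of cache conflicts: the 16 addresses are 4096 bytes apart, so they all map to the same L1 set, which has only 8 ways, and the lines keep evicting one another.

In the second case there are no conflicts, and the loop takes only about 1.3 seconds.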

 

After changing the GROUP macro to 1 and recompiling, the output is:

[test@database2 cache_test]$ ./cache_test

cache touch style 1

Total time: 0.486 s

cache touch style 2

Total time: 0.664 s

 

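In the first case the 8 addresses still map to a single set, but they now fit exactly into its 8 ways, so there are no conflicts. The time drops from 4.535 s to 0.486 s, far more than the factor of two expected from doing half the work.

In the second case the time drops roughly linearly with the amount of work: half as many accesses take about half as long (from 1.314 s to 0.664 s).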

posted @ 2017-06-13 13:47  已秋