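Testing and Verifying How the Cache Works
I recently read Gustavo Duarte's blog post "Cache: A Place for Concealment and Safekeeping" (Chinese translation: http://www.cnblogs.com/xkfz007/archive/2012/10/08/2715163.html) and finally understood how the CPU cache works. So I wanted to measure how much cache conflicts affect program performance.
Test machine: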
[test@database2 cache_test]$ uname -a
Linux database2 2.6.18-308.el5PAE #1 SMP Tue Feb 21 20:46:05 EST 2012 i686 i686 i386 GNU/Linux
Cache information:
[root@database2 ~]# dmidecode -t cache
# dmidecode 2.11
SMBIOS 2.7 present.
Handle 0x0005, DMI type 7, 19 bytes
Cache Information
Socket Designation: L1-Cache
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 256 kB
Maximum Size: 256 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Parity
System Type: Unified
Associativity: 8-way Set-associative
Handle 0x0006, DMI type 7, 19 bytes
Cache Information
Socket Designation: L2-Cache
Configuration: Enabled, Not Socketed, Level 2
Operational Mode: Write Back
Location: Internal
Installed Size: 1024 kB
Maximum Size: 1024 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Single-bit ECC
System Type: Unified
Associativity: 8-way Set-associative
Handle 0x0007, DMI type 7, 19 bytes
Cache Information
Socket Designation: L3-Cache
Configuration: Enabled, Not Socketed, Level 3
Operational Mode: Write Back
Location: Internal
Installed Size: 10240 kB
Maximum Size: 10240 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Single-bit ECC
System Type: Unified
Associativity: <OUT OF SPEC>
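The machine has a single 4-core CPU with hyper-threading, 8 logical CPUs in total. According to /var/log/dmesg, each logical CPU reports a 32 KB L1 instruction cache and a 32 KB L1 data cache, so the 256 kB L1 and 1024 kB L2 sizes shown by dmidecode are presumably totals over the 4 cores (4 x 64 KB and 4 x 256 KB).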
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 256K
CPU: L3 cache: 10240K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
I wrote the following code to test this:
// Build as C99 or later, e.g. gcc -std=c99 cache_test.c -o cache_test
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>

#define PAGE 4096        // memory page size in bytes
#define WAY 8            // 8-way set associative
#define LINE 64          // cache line size in bytes

#define GROUP 2          // 2 groups of WAY lines

#define TOUCH 100000000  // number of passes over the buffer

// Store y - x in result; return -1 if x is later than y.
int timeval_subtract(struct timeval* result, struct timeval* x, struct timeval* y)
{
    if ( x->tv_sec>y->tv_sec )
        return -1;

    if ( (x->tv_sec==y->tv_sec) && (x->tv_usec>y->tv_usec) )
        return -1;

    result->tv_sec = ( y->tv_sec-x->tv_sec );
    result->tv_usec = ( y->tv_usec-x->tv_usec );

    if (result->tv_usec<0)
    {
        result->tv_sec--;
        result->tv_usec+=1000000;
    }

    return 0;
}

int main(int argc,char **argv)
{
    volatile int *pI;   // volatile so the compiler cannot drop the stores
    char *p = (char*)malloc(GROUP*WAY*PAGE);
    struct timeval start,stop,diff;

    if (p == NULL)
        return 1;

    printf("cache touch style 1\n");
    gettimeofday(&start,0);
    for(int i=0;i<TOUCH;i++)
    {
        for(int j=0;j<GROUP*WAY;j++)
        {
            // all lines map to the same set; more than 8 lines, so they conflict
            pI = (volatile int*)(p + j*PAGE);
            *pI = j;
        }
    }
    gettimeofday(&stop,0);
    timeval_subtract(&diff,&start,&stop);
    // %03ld zero-pads the milliseconds, so 4.035 s is not printed as 4.35
    printf("Total time: %ld.%03ld s\n",(long)diff.tv_sec,(long)(diff.tv_usec/1000));

    printf("cache touch style 2\n");
    gettimeofday(&start,0);
    for(int i=0;i<TOUCH;i++)
    {
        for(int j=0;j<GROUP*WAY;j++)
        {
            // each line goes to a different set, so there is no conflict
            pI = (volatile int*)(p + j*PAGE+j*LINE);
            *pI = j;
        }
    }
    gettimeofday(&stop,0);
    timeval_subtract(&diff,&start,&stop);
    printf("Total time: %ld.%03ld s\n",(long)diff.tv_sec,(long)(diff.tv_usec/1000));
    free(p);
    return 0;
}
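Compiling and running the program gives the following output:
[test@database2 cache_test]$ ./cache_test
cache touch style 1
Total time: 4.535 s
cache touch style 2
Total time: 1.314 s
In the first case all 16 lines compete for the 8 ways of a single set, so the accesses conflict and the run takes about 4.5 seconds.
In the second case the lines are spread over different sets, so there is no conflict and the run takes only 1.3 seconds.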
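After changing the GROUP macro to 1 and recompiling, the output is:
[test@database2 cache_test]$ ./cache_test
cache touch style 1
Total time: 0.486 s
cache touch style 2
Total time: 0.664 s
In the first case there is no longer any conflict, because the 8 lines fit into the 8 ways of one set. Performance improves markedly: the time drops from about 4.5 seconds to about 0.49 seconds.
In the second case the time decreases linearly: half as many stores take about half the time (1.314 seconds down to 0.664 seconds).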