False sharing in multithreaded programs + programming practice (work in progress)

1. Causes of false sharing:

The CPU loads memory into its cache one "line" at a time, so two variables on the same line always travel together.

Getting the cache line size on Linux:

cat /proc/cpuinfo | grep cache_alignment

or cat /sys/devices/system/cpu/cpuN/cache/indexN/coherency_line_size (the same directory holds other detailed CPU cache information)

Maybe: get the cache line size programmatically (via C)
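One possible way to do this from code is sketched below. It is only a sketch: the helper name cache_line_size and the fallback value of 64 bytes are assumptions made for illustration, and _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension that may be missing or report 0 on some systems, which is why the sysfs file mentioned above is tried as a second source (for cpu0 / index0 only).

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <unistd.h>

// Hypothetical helper: returns the cache line size in bytes,
// or `fallback` (assumed 64) when the system does not report one.
std::size_t cache_line_size(std::size_t fallback = 64) {
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    // glibc extension; may return 0 or -1 when the value is unknown.
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (n > 0) return static_cast<std::size_t>(n);
#endif
    // Second source: the sysfs file, here for cpu0 / index0 only.
    std::ifstream f("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size");
    std::size_t v = 0;
    if (f >> v && v > 0) return v;
    return fallback;
}
```

Note that this is a run-time value; padding and alignment need a compile-time constant, so the result is mostly useful for checking that the constant chosen at build time matches the machine.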

Multi-CPU contention: a "hardware write lock", i.e. only one core at a time can own a cache line for writing

Maybe: cache coherence protocols such as "MESI"

2. Example program

Vary the number of threads and observe how the completion time scales, while also watching the CPU load.
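A minimal sketch of such an experiment follows. Each thread increments only its own counter, once with the counters packed next to each other and once with each counter aligned to its own line. The names Packed, Padded, run, the 64-byte line size and the limit of 16 threads are all assumptions for illustration, not part of any library.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

constexpr std::size_t kLine = 64;   // assumed cache line size
constexpr int kMaxThreads = 16;     // assumed upper bound on threads

// Neighbouring counters share a cache line.
struct Packed { std::atomic<std::uint64_t> v{0}; };
// Each counter is aligned (and therefore padded) to its own line.
struct alignas(kLine) Padded { std::atomic<std::uint64_t> v{0}; };

Packed g_packed[kMaxThreads];
Padded g_padded[kMaxThreads];

// Starts `threads` workers; worker i increments counters[i] `iters` times.
// Returns the elapsed wall-clock time in milliseconds.
template <typename Counter>
double run(Counter* counters, int threads, std::uint64_t iters) {
    for (int i = 0; i < threads; ++i) counters[i].v.store(0);
    auto t0 = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i)
        pool.emplace_back([counters, i, iters] {
            for (std::uint64_t k = 0; k < iters; ++k)
                counters[i].v.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& t : pool) t.join();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Calling run(g_packed, n, iters) and run(g_padded, n, iters) for n = 1, 2, 4, ... with a large iters gives the two scaling curves to compare. The actual numbers depend on the machine, so none are claimed here.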

Practice: verify whether the time spent waiting for memory is counted as CPU time.
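One way to approach this check is to measure wall-clock time and process CPU time around the same piece of work and compare them. The sketch below uses std::clock for CPU time; the names Timing and measure are hypothetical. If a single-threaded, memory-bound loop reports a cpu_ms close to its wall_ms, the stalls were charged as CPU time.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>

struct Timing {
    double wall_ms;  // elapsed real time
    double cpu_ms;   // processor time charged to the whole process
};

// Runs `work()` once and reports both clocks for it.
template <typename F>
Timing measure(F&& work) {
    auto w0 = std::chrono::steady_clock::now();
    std::clock_t c0 = std::clock();
    work();
    std::clock_t c1 = std::clock();
    auto w1 = std::chrono::steady_clock::now();
    Timing t;
    t.wall_ms = std::chrono::duration<double, std::milli>(w1 - w0).count();
    t.cpu_ms = 1000.0 * static_cast<double>(c1 - c0) / CLOCKS_PER_SEC;
    return t;
}
```

On POSIX systems std::clock covers all threads of the process, so for a multithreaded run cpu_ms can exceed wall_ms; comparing against wall_ms times the thread count is then the more meaningful check.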

3. Why data ends up on the same cache line

Array elements are contiguous in memory

The linker lays out global and static data close together in memory (in the data segment?)

Struct and C++ object layouts are compact

Two separate objects on the heap happen to be adjacent (especially objects of the same type allocated from a dedicated memory pool or slab allocator)
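Whether two particular objects fall into one of the cases above can be checked by comparing their addresses divided by the line size. The following is a small sketch under the assumption of 64-byte lines and 4-byte int; same_cache_line and g_array are names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uintptr_t kLine = 64;  // assumed cache line size

// True when both addresses fall inside the same kLine-sized block.
inline bool same_cache_line(const void* a, const void* b) {
    return reinterpret_cast<std::uintptr_t>(a) / kLine ==
           reinterpret_cast<std::uintptr_t>(b) / kLine;
}

static_assert(sizeof(int) == 4, "example assumes 4-byte int");

// Contiguous array starting on a line boundary:
// elements 0..15 occupy the first line, 16..31 the second.
alignas(64) int g_array[32];
```

With this layout, same_cache_line(&g_array[0], &g_array[15]) holds while same_cache_line(&g_array[0], &g_array[16]) does not, so threads writing to elements 0 and 15 would contend for one line.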

4. Solutions

1) Reduce accesses to the falsely shared structure: work on the thread's stack or in thread-local storage instead

2) Use alignment + padding to eliminate false sharing

Reminder: CACHE_LINE_SIZE must be defined at compile time

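C++0x syntax: [[ align(CACHE_LINE_SIZE) ]] T data; (the final C++11 standard spells this alignas(CACHE_LINE_SIZE) T data;)

(Unfortunately, at the time of writing GCC did not yet support this alignment syntax; see the list of C++0x features supported by GCC. GCC 4.8 and later accept alignas.)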

GCC's extension: __attribute__(( aligned(CACHE_LINE_SIZE) )) T data;

Practice: a general wrapper that guarantees each wrapped object gets its own cache line: CacheLineStorage<T> (maybe only useful for plain data objects)
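A minimal sketch of what such a CacheLineStorage<T> wrapper could look like is given below, using the C++11 alignas keyword. This is one possible shape, not a finished design: the default of 64 for CACHE_LINE_SIZE is an assumption, and g_counters is only there to illustrate usage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#ifndef CACHE_LINE_SIZE
#define CACHE_LINE_SIZE 64  // assumption; must be known at compile time
#endif

// Aligning the struct also rounds its size up to a multiple of the
// alignment, so consecutive array elements never share a line.
template <typename T>
struct alignas(CACHE_LINE_SIZE) CacheLineStorage {
    T data;
};

static_assert(alignof(CacheLineStorage<int>) == CACHE_LINE_SIZE,
              "wrapper must be line-aligned");
static_assert(sizeof(CacheLineStorage<int>) % CACHE_LINE_SIZE == 0,
              "wrapper must fill whole lines");

// Example: one counter per thread, each on its own line.
CacheLineStorage<long> g_counters[4];
```

Thread i would then read and write g_counters[i].data. Two caveats: before C++17 a plain new expression is not guaranteed to honor the extra alignment, so the sketch uses static storage; and the cost is memory, since every wrapped object occupies at least one full line.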

5. Best practices + summary

1) Minimize frequent reads and writes of structures shared between threads (even when no lock is needed, they still cause cache-line misses)

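2) If a shared structure is unavoidable, make sure the data that different threads access frequently (with at least one of them writing) sits on different cache lines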
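The first practice can be sketched as follows: each thread accumulates into a local variable on its own stack and touches the shared array only once, at the end. The function name sum_locally and the workload (summing 1..n) are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Each of `threads` workers sums 1..n and stores its total in results[i].
std::vector<std::uint64_t> sum_locally(int threads, std::uint64_t n) {
    // Adjacent slots: these would be falsely shared if written in the hot loop.
    std::vector<std::uint64_t> results(threads, 0);
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i)
        pool.emplace_back([&results, i, n] {
            std::uint64_t local = 0;            // lives on this thread's stack
            for (std::uint64_t k = 1; k <= n; ++k) local += k;
            results[i] = local;                 // single write to the shared line
        });
    for (auto& t : pool) t.join();
    return results;
}
```

The shared line is still written by several threads, but only once per thread instead of once per iteration, so the contention no longer sits in the hot loop.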

6. Numbers + profiling methods

Intel Pentium M:

Assuming a 2GHz CPU, the maximum cache-miss time is about 100ns (0.1us), i.e. roughly 200 cycles?

TODO: use a profiling tool to check CPI and the cache miss rate

References:

Eliminate false sharing by Herb Sutter (nice blog and author!)

posted @ 2012-01-11 22:38 PromisE_谢
