Notes on instruction reordering

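The reordering problem: to squeeze more work out of the CPU and keep the execution units inside each processor as busy as possible, the processor may reorder the tasks (code) it executes, provided that the result of the reordered execution matches the result of executing the code in its written order. With multiple threads, however, reordering can change the result of the program (for example, reordering across a control dependency).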

https://xie.infoq.cn/article/680fd531df57856ddcb532914

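Besides the processor, the JIT compiler in common Java runtime environments also reorders instructions, so the generated machine instructions may not follow the order of the bytecode instructions.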

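The JVM can reorder machine instructions according to processor characteristics (multi-level CPU caches, multi-core processors, and so on), so that the instructions better match the way the CPU executes them and get the most out of the machine.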

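In one sentence: reordering rules are generally discussed from the point of view of a single thread. They decide which instructions can run earlier and which can be deferred, so that the CPU pipeline is used fully and execution is more efficient. However the instructions are reordered, the sequential semantics of a single thread must not change. Reordering does not, however, take into account the overall ordering problems it causes when multiple threads are involved.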
https://www.cnblogs.com/ITPower/p/13580691.html
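
To make the multi-threaded problem concrete, here is a minimal C++ sketch of the classic "message passing" pattern. It is not taken from any of the quoted articles; the function name `run_message_passing` and the variable names are made up for illustration. The producer writes a plain variable and then sets a flag. If the two writes (or the two reads in the consumer) were reordered, the consumer could see the flag but read stale data. The release/acquire pair on the flag is what rules that out.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not from the quoted sources): one thread publishes a
// value, the other waits for a flag and then reads the value.
int run_message_passing() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the data
    int observed = -1;

    std::thread producer([&] {
        payload = 42;                                  // (1) write the data
        ready.store(true, std::memory_order_release);  // (2) publish; (1) may not sink below this store
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {  // (3) wait for the flag
            std::this_thread::yield();
        }
        assert(payload == 42);  // (4) may not be hoisted above (3), so the data is visible
        observed = payload;
    });

    producer.join();
    consumer.join();
    return observed;
}
```

If both orderings were weakened to `std::memory_order_relaxed`, each thread would still behave correctly on its own, but the consumer would no longer be guaranteed to see the payload, and the program would contain a data race.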

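Memory ordering is the order in which a CPU accesses main memory. It can be determined by the compiler at compile time or by the CPU at run time. It reflects the reordering of memory operations and out-of-order execution, which make fuller use of the bus bandwidth of different kinds of memory.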
https://zh.wikipedia.org/wiki/内存排序

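In my view, memory-order problems are caused by instruction reordering: reordering changes the order in which memory operations become visible. That is harmless while a single thread runs on its own, but it breaks down once the code runs on multiple cores or threads. This is where the cost of the extra complexity introduced for efficiency shows up.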

http://gavinchou.github.io/summary/c++/memory-ordering/#acquire and relase semantics by preshing

======================== Orders of instruction execution ==================

Source code order: The order in which the memory operations are specified in the source code.
Program order: The order in which the memory operations are specified in the machine code. The program order can differ from the source code order because, based on the language memory model, compilers can reorder instructions as part of the optimization process.
Execution order: The order in which the individual memory-reference instructions are executed on a given CPU. The execution order can differ from the program order due to optimizations based on the specific CPU-implementations’ hardware memory model (e.g., out-of-order execution).
Perceived order: The order in which a CPU perceives its and other CPUs’ memory operations. The perceived order can differ from the execution order due to caching, interconnect, and memory-system optimizations defined by the hardware memory model. On some architectures, different CPUs can perceive the same set of memory operations as occurring in different orders.
https://www.arangodb.com/2021/02/cpp-memory-model-migrating-from-x86-to-arm/
======================== Rules that reordering must follow ==================

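Such hardware must obey the following rules:

(1) For each CPU, from its own point of view, its memory accesses always appear to happen in program order.

(2) If a CPU reorders a given operation (load or store) with another store, one condition must hold: the memory addresses of the two operations must not overlap.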

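(3) Suppose the program order is load A followed by load B. When CPU 0 executes these operations, from its own view of the memory system the order is of course load A, then load B (see rule 1). But how do the other CPUs in the system see CPU 0's operations (picture every CPU as an observer sitting on the bus, constantly watching memory change)? Based on what the earlier articles described, the order of load A and load B is not guaranteed. Adding a read memory barrier changes this. If CPU 0 executes load A, then a read memory barrier, then load B, every CPU on the bus, including CPU 0 itself, sees the memory operations in the order load A, then load B.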

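(4) Again picture every CPU in the system as an observer sitting on the bus, constantly watching memory change. When any given CPU executes a store, every CPU in the system (including itself) can observe it. If the code running on that CPU is split in two by a write memory barrier (wmb), with some stores before the wmb and some after it, then all CPUs observe the same thing: the stores before the wmb complete first, and the stores after the wmb are performed afterwards.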

(5) The memory accesses (loads or stores) that precede a full memory barrier executed by a CPU are observed by every CPU in the system before any CPU sees the results of the memory accesses (loads or stores) that follow the barrier.

https://blog.csdn.net/reliveIT/article/details/106898762
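
As a rough illustration of rules (3) and (4), the sketch below uses the standalone fences of C++11 in the places where a write barrier and a read barrier would go. This is an assumption-laden analogy rather than the kernel's `wmb`/`rmb`: `std::atomic_thread_fence` with release or acquire ordering is somewhat stronger than a pure store-store or load-load barrier, and the function name `run_fenced_publish` is invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical helper: two plain stores are published through a relaxed flag,
// with standalone fences standing in for the write and read barriers.
std::pair<int, int> run_fenced_publish() {
    int a = 0;
    int b = 0;
    std::atomic<bool> flag{false};
    std::pair<int, int> seen(-1, -1);

    std::thread writer([&] {
        a = 1;
        b = 2;
        // Plays the role of the write barrier: the stores above are ordered
        // before the flag store below.
        std::atomic_thread_fence(std::memory_order_release);
        flag.store(true, std::memory_order_relaxed);
    });

    std::thread reader([&] {
        while (!flag.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // Plays the role of the read barrier: the loads below are ordered
        // after the flag load above.
        std::atomic_thread_fence(std::memory_order_acquire);
        assert(a == 1 && b == 2);  // guaranteed by the fence pair
        seen = std::make_pair(a, b);
    });

    writer.join();
    reader.join();
    return seen;
}
```

Removing either fence breaks the guarantee: without the first, the flag may become visible before `a` and `b`; without the second, the reader may load `a` and `b` before it loads the flag.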
======================== Solutions to reordering ==================

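Solving the problems caused by reordering: insert memory barrier instructions of the appropriate type to forbid specific reorderings. Different hardware architectures, however, support different sets of memory barrier instructions. The source article shows the reordering types that common processors allow in a figure.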

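[Summary] (1) The relationship between reordering and visibility: instruction reordering by the CPU causes visibility problems; instruction reordering by the compiler causes visibility problems; and delays in writing to and reading from the CPU cache cause visibility problems that look like reordering (pseudo-reordering). (2) A write barrier forces the data in the StoreBuffer to be flushed to the cache, so that the latest values held in the local StoreBuffer become available. A read barrier forces the InvalidatedQueue to be processed so that stale cache lines are marked invalid and the latest values are read.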

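[Summary] Without read and write barriers, there is a short delay between the StoreBuffer and the cache, and between the InvalidatedQueue and the cache. This causes a brief visibility problem, which shows up as apparent CPU memory reordering (pseudo-reordering). Read and write barriers force the StoreBuffer and the InvalidatedQueue to be processed so that the values in the local cache are up to date.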

https://xie.infoq.cn/article/680fd531df57856ddcb532914
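
The delay introduced by a store buffer is what the classic "store buffering" litmus test exposes: each thread stores to one variable and then loads the other. Without a barrier, both threads may read 0, because each store can still be sitting in its CPU's buffer when the other CPU loads. The sketch below is my own illustration, not code from the article. It uses `std::atomic_thread_fence(std::memory_order_seq_cst)` as the portable C++ full barrier; how that maps onto draining a StoreBuffer is hardware-specific, and `count_both_zero` is an invented name.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the store-buffering litmus test `rounds` times
// and counts how often both threads read 0.
int count_both_zero(int rounds) {
    assert(rounds >= 0);
    int both_zero = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = -1;
        int r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);  // full barrier: store before load
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);  // full barrier: store before load
            r2 = x.load(std::memory_order_relaxed);
        });

        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;  // forbidden while both fences are present
        }
    }
    return both_zero;
}
```

With both fences in place the C++ memory model forbids the `r1 == 0 && r2 == 0` outcome, so the function returns 0. If the fences are removed, that outcome becomes legal and can be observed even on x86, whose store buffer permits store-load reordering.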

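For statements that do not depend on one another, the compiler may rearrange the order in which they execute. This kind of reordering is easy to understand: like CPU instruction reordering, its purpose is to shorten execution time. Under multiple threads, however, it causes visibility problems. For example, the JMM defines reordering rules specifically for the volatile keyword.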

https://xie.infoq.cn/article/680fd531df57856ddcb532914

======================== Ordering guarantees from dependencies ==================

In addition to barriers, these architectures provide the following (implicit) dependencies to enforce orderings:

Address Dependency: There is an address dependency from a read to a program-order-later read or write if the value read by the first instruction is used to compute the address of the second instruction.
Control Dependency: There is a control dependency from a read to a program-order-later read/write if the value read by the first instruction is used to compute the condition of a conditional branch that is program-order-before the second instruction.
Data Dependency: There is a data dependency from a read to a program-order-later write if the value read by the first instruction is used to compute the value written by the second instruction.
The ARMv8 architecture has been revised and now has a multicopy-atomic model.

https://www.arangodb.com/2021/02/cpp-memory-model-migrating-from-x86-to-arm/
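
The address dependency above is the pattern behind publishing a pointer: the reader loads the pointer and then dereferences it, so the address of the second load is computed from the value of the first. The sketch below shows that pattern in C++. It is an illustration under assumptions, not code from the article: `Node` and `run_pointer_publish` are invented names. C++11 introduced `std::memory_order_consume` to expose dependency ordering, but as far as I know current compilers simply treat it as acquire, so the sketch uses `std::memory_order_acquire`, which is portable although stronger than the dependency alone requires.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical payload type for the example.
struct Node {
    int value;
};

// Hypothetical helper: the writer fills in a node and publishes its address;
// the reader loads the pointer and dereferences it.
int run_pointer_publish() {
    Node node{0};
    std::atomic<Node*> head{nullptr};
    int observed = -1;

    std::thread writer([&] {
        node.value = 7;                                 // initialize the payload
        head.store(&node, std::memory_order_release);   // then publish its address
    });

    std::thread reader([&] {
        Node* p = nullptr;
        while ((p = head.load(std::memory_order_acquire)) == nullptr) {
            std::this_thread::yield();
        }
        assert(p == &node);
        observed = p->value;  // the address of this load depends on the value of p
    });

    writer.join();
    reader.join();
    return observed;
}
```

On ARM and POWER the hardware orders the dereference after the pointer load because of the address dependency. The acquire load is there so that the compiler and the language-level memory model give the same guarantee.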

======================== Sequential consistency ==================

In a sequentially consistent memory model, no memory reordering exists.

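Today it is hard to find a modern multi-core device that guarantees sequential consistency at the hardware level. Only something as old as the early 386 was not powerful enough to reorder any memory operations at run time.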

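When programming in a high-level language, sequential consistency becomes an important software memory model. In Java 5 and later, declare shared variables as volatile. In C++11, use the default ordering constraint memory_order_seq_cst for atomic operations. When these are used, the compiler restricts compile-time reordering and inserts CPU-specific instructions that act as the appropriate kind of memory barrier.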
http://dreamrunner.org/blog/2014/06/28/qian-tan-memory-reordering/
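
One way to see what the default `memory_order_seq_cst` buys is Peterson's lock for two threads, which is correct only if the store to a thread's own flag is ordered before its load of the other thread's flag. The sketch below is my own illustration (the names `PetersonLock` and `run_peterson_counter` are invented). It relies on the default ordering of `std::atomic` operations, and uses the lock to protect a plain counter.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical two-thread Peterson lock built on default (seq_cst) atomics.
class PetersonLock {
public:
    void lock(int self) {
        assert(self == 0 || self == 1);
        const int other = 1 - self;
        interested_[self].store(true);  // seq_cst by default
        turn_.store(other);             // give priority to the other thread
        // Wait while the other thread wants the lock and it is its turn.
        while (interested_[other].load() && turn_.load() == other) {
            std::this_thread::yield();
        }
    }

    void unlock(int self) {
        interested_[self].store(false);
    }

private:
    std::atomic<bool> interested_[2] = {{false}, {false}};
    std::atomic<int> turn_{0};
};

// Hypothetical helper: two threads increment a plain counter under the lock.
long run_peterson_counter(int iterations) {
    PetersonLock lock;
    long counter = 0;  // not atomic, protected only by the lock

    auto worker = [&](int self) {
        for (int i = 0; i < iterations; ++i) {
            lock.lock(self);
            ++counter;
            lock.unlock(self);
        }
    };

    std::thread t0(worker, 0);
    std::thread t1(worker, 1);
    t0.join();
    t1.join();
    return counter;
}
```

With sequentially consistent operations the two threads never hold the lock at the same time, so the counter ends at exactly twice the iteration count. Weakening the operations in `lock` to release/acquire would allow the store-load reordering that lets both threads enter together. In real code a `std::mutex` is the right tool; the lock here only illustrates the ordering requirement.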

======================== References ==================

C++ Memory Model: Migrating from X86 to ARM: https://www.arangodb.com/2021/02/cpp-memory-model-migrating-from-x86-to-arm/
memory-ordering: https://gavinchou.github.io/summary/c++/memory-ordering/
Memory ordering (内存排序): https://zh.wikipedia.org/wiki/内存排序
A Tutorial Introduction to the ARM and POWER Relaxed Memory Models: https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
Intel® 64 Architecture Memory Ordering White Paper: http://www.cs.cmu.edu/~410-f10/doc/Intel_Reordering_318147.pdf

posted @ 2021-10-10 20:26 TomStudio