Multiple Issue and Superscalar
多发射 means multiple issue. The 发射 here corresponds to "issue", not "launch" or "emit". issue: supply and distribution of items for use.
Computer Organization 6: Pipelining, Multiple Issue and Superscalar, SIMD, Exceptions - 庞某人 - 博客园
[link] ...The goal of the multiple-issue processors is to allow multiple instructions to issue in a clock cycle. Multiple-issue processors come in three major flavors:
- Very Long Instruction Word (VLIW) processors
- Statically scheduled superscalar processors. Although statically scheduled superscalars issue a varying rather than a fixed number of instructions per clock, they are actually closer in concept to VLIWs, since both approaches rely on the compiler to schedule code for the processor.
- Dynamically scheduled superscalar processors
Multiple issue is the goal and the effect; VLIW and the other two are the means.
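To make "multiple instructions issue in a clock cycle" concrete, here is a toy sketch of in-order issue with a fixed issue width. It is not a model of any real processor: the function name `issue_cycles`, the `(dest, sources)` instruction format, and the rule that an instruction cannot issue in the same cycle as an earlier instruction whose result it reads are all assumptions made for illustration. Result latency, structural hazards, and write-after-write conflicts are ignored.

```python
def issue_cycles(instrs, width):
    """Count cycles needed to issue `instrs` in order, at most `width` per cycle.

    Each instruction is a (dest, sources) pair. An instruction may not issue
    in the same cycle as an earlier instruction that writes one of its sources.
    """
    cycles = 0
    i = 0
    while i < len(instrs):
        written = set()   # registers written by instructions issued this cycle
        issued = 0
        while i < len(instrs) and issued < width:
            dest, srcs = instrs[i]
            if any(s in written for s in srcs):
                break     # depends on this cycle's group: wait for the next cycle
            written.add(dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles


# Hypothetical four-instruction program: the last two depend on earlier results.
program = [
    ("r1", ("r2", "r3")),
    ("r4", ("r5", "r6")),
    ("r7", ("r1", "r4")),
    ("r8", ("r7", "r2")),
]
single_issue = issue_cycles(program, width=1)
dual_issue = issue_cycles(program, width=2)
```

With this program, widening issue from 1 to 2 helps only for the first two instructions, which are independent. The dependent pair at the end still issues one per cycle, which is why the schedule (by the compiler or by hardware) matters.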
Multiple Instruction Issue (washington.edu) is a link to a .pdf (737KB); recommended reading. The following is copied from elsewhere:
Basic five-stage pipeline in a RISC machine (IF = Instruction Fetch, ID = Instruction Decode, EX = Execute, MEM = Memory access, WB = Register write back). The vertical axis is successive instructions; the horizontal axis is time. So in the green column, the earliest instruction is in WB stage, and the latest instruction is undergoing instruction fetch.

...Bypassing is also known as operand forwarding. Suppose the CPU is executing the following piece of code:
SUB r3,r4 -> r10 ; Writes r3 - r4 to r10
AND r10,r3 -> r11 ; Writes r10 & r3 to r11
The instruction fetch and decode stages send the second instruction one cycle after the first. They flow down the pipeline as shown in this diagram:

In a naive pipeline, without hazard consideration, the data hazard progresses as follows:
In cycle 3, the SUB instruction calculates the new value for r10. In the same cycle, the AND operation is decoded, and the value of r10 is fetched from the register file. However, the SUB instruction has not yet written its result to r10. Write-back of this normally occurs in cycle 5 (green box). Therefore, the value read from the register file and passed to the ALU (in the Execute stage of the AND operation, red box) is incorrect.
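The stale read described above can be reproduced in a few lines. This is a hand-unrolled sketch of the two-instruction example, not a general pipeline simulator. The function names and the initial register values are made up for illustration, and a register file that is read in Decode and written in Write-Back is assumed.

```python
def run_sequential(regs):
    """Reference result: execute SUB and then AND one after the other."""
    regs = dict(regs)
    regs["r10"] = regs["r3"] - regs["r4"]
    regs["r11"] = regs["r10"] & regs["r3"]
    return regs


def run_naive_pipeline(regs):
    """SUB r3,r4 -> r10 followed by AND r10,r3 -> r11, with no forwarding."""
    regs = dict(regs)
    # Cycle 3: AND is in Decode and reads its operands from the register file,
    # while SUB is in Execute and only now computes its result.
    and_a, and_b = regs["r10"], regs["r3"]   # r10 still holds the old value
    sub_result = regs["r3"] - regs["r4"]
    # Cycle 4: AND executes with the operands it read in cycle 3.
    and_result = and_a & and_b
    # Cycle 5: SUB writes back. Cycle 6: AND writes back.
    regs["r10"] = sub_result
    regs["r11"] = and_result
    return regs


initial = {"r3": 12, "r4": 5, "r10": 0, "r11": 0}   # arbitrary example values
expected = run_sequential(initial)
naive = run_naive_pipeline(initial)
```

Both runs agree on r10, but r11 differs: the naive pipeline computes the AND from the old r10 rather than from the SUB result.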
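Note: the stages are 1 Fetch, 2 Decode, 3 Execute, 4 Access, 5 Write-Back. One instruction takes 5 clock cycles, so two instructions executed one after the other would take 10. In the example above both instructions finish within 6 cycles (5 + 2 - 1), so pipelining still gives a speedup as long as nothing stalls.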
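The cycle counts in the note above follow from a simple formula: without stalls, the first instruction needs one cycle per stage and every later instruction completes one cycle after its predecessor. A minimal sketch, with function names chosen here for illustration and an ideal stall-free pipeline assumed:

```python
def unpipelined_cycles(n_instructions, n_stages=5):
    """Each instruction runs all stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages=5):
    """Ideal pipeline: a new instruction enters every cycle, no stalls."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


two_unpipelined = unpipelined_cycles(2)   # 2 * 5
two_pipelined = pipelined_cycles(2)       # 5 + 2 - 1
```

For long instruction streams the ratio between the two approaches the number of stages, which is the upper bound on pipeline speedup in this idealized model.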
Instead, we must pass the data that was computed by SUB back to the Execute stage (i.e. to the red circle in the diagram) of the AND operation before it is normally written-back. The solution to this problem is a pair of bypass multiplexers. These multiplexers sit at the end of the decode stage, and their flopped [flopped: captured in a flip-flop, i.e. held in a pipeline register] outputs are the inputs to the ALU. Each multiplexer selects between:
- A register file read port (i.e. the output of the decode stage, as in the naive pipeline): red arrow
- The current register pipeline of the ALU (to bypass by one stage): blue arrow
- The current register pipeline of the access stage (which is either a loaded value or a forwarded ALU result, this provides bypassing of two stages): purple arrow. Note that this requires the data to be passed backwards in time by one cycle. If this occurs, a bubble must be inserted to stall the AND operation until the data is ready.
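The three-way selection listed above can be written as a small function. This is a sketch of the selection logic only. The name `bypass_mux` and the `(dest_register, value)` representation of the pipeline registers are hypothetical, and giving priority to the most recent producer is an assumption (it is the usual choice, since the younger instruction holds the newer value).

```python
def bypass_mux(src, regfile_value, ex_stage, mem_stage):
    """Pick the operand value for source register `src`.

    `ex_stage` and `mem_stage` are (dest_register, value) pairs for the
    instructions one and two stages ahead, or None if the stage is empty.
    """
    if ex_stage is not None and ex_stage[0] == src:
        return ex_stage[1]      # bypass by one stage (blue arrow)
    if mem_stage is not None and mem_stage[0] == src:
        return mem_stage[1]     # bypass by two stages (purple arrow)
    return regfile_value        # no hazard: register file read port (red arrow)


# AND r10,r3 -> r11 directly after SUB r3,r4 -> r10, with r3 = 12 and r4 = 5:
# the register file still holds a stale r10 (0 here), the ALU register holds 7.
operand_r10 = bypass_mux("r10", 0, ("r10", 7), None)
operand_r3 = bypass_mux("r3", 12, ("r10", 7), None)
result = operand_r10 & operand_r3
```

With forwarding in place the AND sees the SUB result, so `result` matches what sequential execution would produce.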
...This NOP is termed a pipeline bubble since it floats in the pipeline, like an air bubble in a water pipe, occupying resources but not producing useful results. The hardware to detect a data hazard and stall the pipeline until the hazard is cleared is called a pipeline interlock.
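A pipeline interlock can be sketched as a check that compares each instruction's sources with the destination of a load immediately before it. The `(opcode, dest, sources)` tuples and the name `insert_bubbles` are assumptions for illustration. The sketch also assumes full forwarding, so the only case that needs a bubble is a value loaded from memory and used by the very next instruction.

```python
NOP = ("nop", None, ())


def insert_bubbles(program):
    """Return the instruction stream with a NOP after each load-use hazard.

    Each instruction is an (opcode, dest, sources) tuple.
    """
    out = []
    for instr in program:
        if out:
            prev_op, prev_dest, _ = out[-1]
            if prev_op == "load" and prev_dest in instr[2]:
                out.append(NOP)   # stall one cycle until the loaded value exists
        out.append(instr)
    return out


# Hypothetical program: the add needs r1 one cycle before the load can supply it.
program = [
    ("load", "r1", ("r2",)),
    ("add", "r3", ("r1", "r4")),
    ("sub", "r5", ("r3", "r1")),
]
stalled = insert_bubbles(program)
```

Only the `add` triggers a bubble. The `sub` reads r3 from the `add`, which forwarding covers without a stall, and by then r1 has already been loaded.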
...Block diagram of a basic uniprocessor-CPU computer. Black lines indicate data flow, whereas red lines indicate control flow; arrows indicate flow directions:
