Advanced FPGA Design: Architecture, Implementation, and Optimization (Architecting Speed)
Speed has three definitions: throughput, latency, and timing.
Throughput: the amount of data processed per clock cycle. Commonly measured in bits per second.
Latency: the time between data input and processed data output. Measured in clock cycles or in time.
Timing: the logic delays between sequential elements. When a design does not "meet timing", the delay of the critical path is greater than the target clock period. The critical path is the longest delay between flip-flops (including combinatorial delay, clk-to-out delay, routing delay, setup time, clock skew, and so on).
High throughput: use a pipelined implementation instead of an iterative one. Unrolling a loop of n iterations increases throughput by a factor of n.
Unrolling an iterative loop increases throughput because it makes pipelining possible: a new input can be accepted every clock cycle instead of once every n cycles. The penalty is an increase in area, because each iteration gets its own logic and registers.
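A minimal sketch of an unrolled loop, assuming the task is to cube an 8-bit input. The module and signal names (power3_pipelined, x, x_power) are hypothetical and chosen only for illustration. An iterative version that reuses a single multiplier would need several clocks per result, whereas this version accepts a new input on every clock.

```verilog
// Hypothetical example: compute x^3 with a fully unrolled 3-stage pipeline.
// A new x is accepted on every clock; each result appears 3 clocks later.
// Results are truncated to 8 bits to keep the sketch small.
module power3_pipelined (
    input  wire       clk,
    input  wire [7:0] x,
    output reg  [7:0] x_power
);
    reg [7:0] x1, x2;   // delayed copies of the input
    reg [7:0] p1, p2;   // partial powers

    always @(posedge clk) begin
        // Stage 1: capture the input
        x1 <= x;
        p1 <= x;
        // Stage 2: x^2
        x2 <= x1;
        p2 <= p1 * x1;
        // Stage 3: x^3
        x_power <= p2 * x2;
    end
endmodule
```

The area cost is visible in the code: two multipliers and five 8-bit registers, where an iterative design could share one multiplier.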
Low latency: pass the data from the input to the output as quickly as possible by minimizing the intermediate processing delays.
Latency can be reduced by removing pipeline registers. The penalty for removing pipeline registers is an increase in combinatorial delay between registers, which lowers the maximum clock frequency. (Compared with the previous example, blocking assignments are used to remove those registers and implement the logic combinationally.)
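A minimal sketch of a low-latency design, assuming the task is to cube an 8-bit input. The names (power3_low_latency, x, x_power) are hypothetical. There are no registers inside the module, so the result is available after the combinational delay of two chained multipliers rather than after several clock cycles.

```verilog
// Hypothetical example: x^3 with no pipeline registers.
// The whole computation is combinational, so there is no clock-cycle
// latency, but both multipliers now sit in a single logic path.
// Results are truncated to 8 bits to keep the sketch small.
module power3_low_latency (
    input  wire [7:0] x,
    output wire [7:0] x_power
);
    wire [7:0] x_squared = x * x;

    assign x_power = x_squared * x;
endmodule
```

If this module sits between two registers, the path between them contains both multipliers, which is the timing penalty described above.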
Timing
Maximum frequency: F_max = 1 / (T_clk-to-q + T_logic + T_setup - T_skew), where T_logic is the logic delay between flip-flops.
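As a worked example, take the following delay values. They are made up for illustration and do not come from any device datasheet: T_clk-to-q = 0.5 ns, T_logic = 3.0 ns, T_setup = 0.4 ns, T_skew = 0.1 ns.

```latex
\begin{aligned}
T_{\min} &= T_{\text{clk-to-q}} + T_{\text{logic}} + T_{\text{setup}} - T_{\text{skew}} \\
         &= 0.5 + 3.0 + 0.4 - 0.1 = 3.8\ \text{ns} \\
F_{\max} &= \frac{1}{T_{\min}} = \frac{1}{3.8\ \text{ns}} \approx 263\ \text{MHz}
\end{aligned}
```

With these numbers the logic delay dominates the clock period, which is why the methods below all aim to shorten the logic path between flip-flops.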
Method 1: Add register layers: adding a register layer improves timing by dividing the critical path into two paths of smaller delay. The cost is one extra clock cycle of latency.
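A minimal sketch of adding a register layer, assuming a hypothetical module that computes a*b + c*d. Without the intermediate registers, the path between flip-flops would contain a multiplier followed by an adder. With them, the longest path contains only one of the two.

```verilog
// Hypothetical example: y = a*b + c*d with an added register layer.
// prod_ab and prod_cd split the multiply-then-add path into two shorter
// paths, at the cost of one extra clock cycle of latency.
module mac_pipelined (
    input  wire        clk,
    input  wire [7:0]  a, b, c, d,
    output reg  [16:0] y
);
    reg [15:0] prod_ab, prod_cd;   // the added register layer

    always @(posedge clk) begin
        // Stage 1: multipliers only
        prod_ab <= a * b;
        prod_cd <= c * d;
        // Stage 2: adder only
        y <= prod_ab + prod_cd;
    end
endmodule
```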
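Method 2: Parallel structures: split one serial operation into smaller operations that can be evaluated in parallel. For example, write X as the concatenation X = {A, B} of an upper half A and a lower half B, each k bits wide. Then X * X = (A * A) * 2^(2k) + (2 * A * B) * 2^k + (B * B), and the three smaller products can be computed at the same time.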
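A minimal sketch of the parallel decomposition for squaring, assuming an 8-bit X split into two 4-bit halves (k = 4). The names (square8_parallel, aa, ab, bb) are hypothetical. Because X = 16*A + B, the square is 256*(A*A) + 32*(A*B) + (B*B), so the shifts below are by 8 and by 5 bits.

```verilog
// Hypothetical example: square an 8-bit value using three 4x4 multipliers
// that operate in parallel, instead of one 8x8 multiplier.
module square8_parallel (
    input  wire        clk,
    input  wire [7:0]  x,
    output reg  [15:0] x_squared
);
    wire [3:0] a = x[7:4];   // upper nibble
    wire [3:0] b = x[3:0];   // lower nibble

    // Three independent partial products, each at most 15 * 15 = 225
    wire [7:0] aa = a * a;
    wire [7:0] ab = a * b;
    wire [7:0] bb = b * b;

    // x^2 = (aa << 8) + (2 * ab << 4) + bb, and 2 * ab << 4 equals ab << 5
    always @(posedge clk)
        x_squared <= {aa, 8'b0} + {ab, 5'b0} + bb;
endmodule
```

Whether this beats a single wide multiplier depends on the target device, since many FPGAs provide dedicated multiplier blocks.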
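Method 3: Flatten logic structures: remove priority encoding where it is not needed. An if/else-if chain synthesizes to prioritized logic in which later conditions sit behind earlier ones. If the control signals are mutually exclusive, independent if statements give the same behavior with a shorter path.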
Original:
always @(posedge clk)
begin
    if (ctrl[0]) rout[0] <= in;
    else if (ctrl[1]) rout[1] <= in;
    else if (ctrl[2]) rout[2] <= in;
    else if (ctrl[3]) rout[3] <= in;
end
Optimized:
always @(posedge clk)
begin
    if (ctrl[0]) rout[0] <= in;
    if (ctrl[1]) rout[1] <= in;
    if (ctrl[2]) rout[2] <= in;
    if (ctrl[3]) rout[3] <= in;
end
Method 4: Register balancing: improves timing by moving combinatorial logic from the critical path to an adjacent path.
Original:
always @(posedge clk) begin
    ra <= A;
    rb <= B;
    rc <= C;
    Sum <= ra + rb + rc;
end
Balanced:
always @(posedge clk) begin
    rABsum <= A + B;
    rc <= C;
    Sum <= rABsum + rc;
end
Method 5: Reorder paths: use this whenever multiple paths combine with the critical path and the combined path can be reordered so that the critical path moves closer to the destination register.
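A minimal sketch of path reordering, assuming the critical path runs from input c through the comparator (c < 8). The module and signal names are hypothetical. In the first module the comparator result is evaluated behind the cond1 test in the priority chain. In the second it is tested first, which moves it closer to the destination register. Both modules select the same value for every input combination.

```verilog
// Hypothetical example, before reordering: the comparator on c feeds the
// second level of the if/else chain.
module reorder_before (
    input  wire       clk,
    input  wire       cond1, cond2,
    input  wire [7:0] a, b, c,
    output reg  [7:0] out
);
    always @(posedge clk)
        if (cond1)                 out <= a;
        else if (cond2 && (c < 8)) out <= b;
        else                       out <= c;
endmodule

// After reordering: cond1 is folded into cond_b up front, so the slow
// comparator result is used in the first test of the chain.
module reorder_after (
    input  wire       clk,
    input  wire       cond1, cond2,
    input  wire [7:0] a, b, c,
    output reg  [7:0] out
);
    wire cond_b = cond2 & ~cond1;

    always @(posedge clk)
        if (cond_b && (c < 8)) out <= b;
        else if (cond1)        out <= a;
        else                   out <= c;
endmodule
```

Whether the rewritten form actually shortens the path depends on the synthesis tool, which may restructure the logic on its own.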
