Alibaba T-Head C906: Features and FMA Instruction Latency and Throughput

| Feature | Description |
|---|---|
| Architecture | RV64GCV |
| Pipeline | 5-stage, in-order |
| Vector Unit | 32 × 128-bit |
| Cache | 32 KB I/D-cache |
| DRAM | DDR3, 2 GB |

Vector Unit: RVV 0.7.1, INT8-64, FP8-64, BFP16, 4 GFLOPS at 1 GHz

I-cache: 2-way set associative, 64 B line size, VIPT, FIFO replacement

D-cache: 4-way set associative, 64 B line size, VIPT, FIFO replacement

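
The cache parameters above determine how many sets each cache has. The following is a minimal sketch of that arithmetic, assuming the 32 KB figure in the table applies to the I-cache and the D-cache separately; the helper name `cache_sets` is hypothetical and only used for illustration.

```python
def cache_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets = capacity / (associativity * line size)."""
    return size_bytes // (ways * line_bytes)


KB = 1024

# Assumption: each cache is 32 KB on its own (not 32 KB shared).
icache_sets = cache_sets(32 * KB, ways=2, line_bytes=64)  # 2-way I-cache
dcache_sets = cache_sets(32 * KB, ways=4, line_bytes=64)  # 4-way D-cache

print(icache_sets, dcache_sets)  # prints: 256 128
```

Under that assumption the I-cache has 256 sets and the D-cache has 128 sets.
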
According to the C906 user manual, the latency of a 32-bit FMA is 4 cycles.

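The throughput (CPI) can also be estimated from the peak figures: 4 GFLOPS / 1 GHz = 4 FLOPs per cycle, and a 128-bit vector holds 128 / 32 = 4 FP32 elements, so 4 / 4 = 1, i.e. a CPI of 1.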
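
The estimate above can be reproduced in a few lines. This is only a sketch of the formula as written in the text, which counts one FLOP per vector element per instruction; counting each FMA as two FLOPs would give a different figure. The function names `fma_cpi` and `independent_chains` are hypothetical. The second helper is an added illustration, not a figure from the manual: it divides latency by CPI to estimate how many independent FMA dependency chains a benchmark loop would need to keep the pipeline busy.

```python
import math


def fma_cpi(peak_gflops: float, clock_ghz: float,
            vector_bits: int, element_bits: int) -> float:
    """CPI estimate following the text: peak / clock / elements per vector."""
    flops_per_cycle = peak_gflops / clock_ghz   # 4 / 1 = 4
    elements = vector_bits // element_bits      # 128 / 32 = 4
    return flops_per_cycle / elements


def independent_chains(latency_cycles: int, cpi: float) -> int:
    """Independent dependency chains needed to hide the FMA latency."""
    return math.ceil(latency_cycles / cpi)


cpi = fma_cpi(4.0, 1.0, vector_bits=128, element_bits=32)
chains = independent_chains(latency_cycles=4, cpi=cpi)

print(cpi, chains)  # prints: 1.0 4
```

With a latency of 4 cycles and a CPI of 1, a loop that accumulates into fewer than 4 independent registers would be limited by latency rather than throughput.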
