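Writing a Processor from Scratch (1): Datapath, Control Unit Design, and Instruction Definitions
While on exchange at UCSB I sat in on ECE154b. The course project was to build a MIPS processor with a cache and branch prediction, which I found really interesting. Now I am going through the whole flow on my own, following the project guideline the professor provided (see https://www.ece.ucsb.edu/~strukov/ece154bSpring2017/project1.pdf). The reference book is "Digital Design and Computer Architecture" by Harris.
The MIPS processor I wrote has a 5-stage pipeline, Fetch -> Decode -> Execute -> Memory -> Writeback, with hazard detection.
//Update 2020.2.7: fixed some bugs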
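1. Instruction set encoding
First we need to decide which instructions our MIPS processor should support and how to encode them. For this 32-bit microprocessor we want to implement at least the following instructions, divided into three types:
R-type instructions: add, addu, sub, subu, and, or, xor, xnor, slt, sltu, mult, multu. Format: OP rd, rs, rt. The 32 bits of an R-type instruction are split into opcode(6), rs(5), rt(5), rd(5), shamt(5), funct(6). The opcode is 6'b0, and the individual instructions are distinguished by the funct field.
I-type instructions: addi, addiu, andi, ori, xori, slti, sltiu, lw, sw, lui, bne, beq. Format: OP rt, rs, imm. The 32 bits of an I-type instruction are split into opcode(6), rs(5), rt(5), immediate(16).
J-type instruction: j. Format: OP LABEL. The 32 bits of a J-type instruction are split into opcode(6), address(26).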
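To make the bit layout concrete, here is a small Python sketch that slices a 32-bit instruction word into these fields. `decode_fields` is a hypothetical helper for checking machine code by hand; it is not part of the Verilog design. Its bit positions match the slices the datapath uses later (`instrD[31:26]`, `instrD[25:21]`, and so on).

```python
# Hypothetical helper (not part of the Verilog design): split an instruction word.
def decode_fields(instr: int) -> dict:
    """Split a 32-bit instruction word into the R/I/J-type fields.

    Which fields are meaningful depends on the format: R-type uses
    op/rs/rt/rd/shamt/funct, I-type uses op/rs/rt/imm, J-type uses op/addr.
    """
    instr &= 0xFFFFFFFF
    return {
        "op": (instr >> 26) & 0x3F,     # bits 31..26
        "rs": (instr >> 21) & 0x1F,     # bits 25..21
        "rt": (instr >> 16) & 0x1F,     # bits 20..16
        "rd": (instr >> 11) & 0x1F,     # bits 15..11
        "shamt": (instr >> 6) & 0x1F,   # bits 10..6
        "funct": instr & 0x3F,          # bits 5..0
        "imm": instr & 0xFFFF,          # bits 15..0 (I-type)
        "addr": instr & 0x3FFFFFF,      # bits 25..0 (J-type)
    }


# R-type word with op=0, rs=1, rt=2, rd=3, shamt=0, funct=0b100000
fields = decode_fields(0x00221820)
print(fields["op"], fields["rs"], fields["rt"], fields["rd"], fields["funct"])
```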
Encoding these instructions gives Table 1.
Table 1: MIPS instruction encoding
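R-type instructions

| operation | OP | rs | rt | rd | shamt | funct |
|---|---|---|---|---|---|---|
| add | 000000 | sssss (reg addr) | ttttt (reg addr) | ddddd (reg addr) | 00000 (unused) | 100000 |
| addu | 000000 | sssss | ttttt | ddddd | 00000 | 100001 |
| sub | 000000 | sssss | ttttt | ddddd | 00000 | 100010 |
| subu | 000000 | sssss | ttttt | ddddd | 00000 | 100011 |
| and | 000000 | sssss | ttttt | ddddd | 00000 | 100100 |
| or | 000000 | sssss | ttttt | ddddd | 00000 | 100101 |
| xor | 000000 | sssss | ttttt | ddddd | 00000 | 100110 |
| xnor | 000000 | sssss | ttttt | ddddd | 00000 | 100111 |
| slt | 000000 | sssss | ttttt | ddddd | 00000 | 101010 |
| sltu | 000000 | sssss | ttttt | ddddd | 00000 | 101011 |
| mult | 000000 | sssss | ttttt | ddddd | 00000 | 000111 |
| multu | 000000 | sssss | ttttt | ddddd | 00000 | 000101 |

I-type instructions

| operation | OP | rs | rt | imm |
|---|---|---|---|---|
| addi | 001000 | sssss (reg addr) | ttttt (reg addr) | imm |
| addiu | 001001 | sssss | ttttt | imm |
| andi | 001100 | sssss | ttttt | imm |
| ori | 001101 | sssss | ttttt | imm |
| xori | 001110 | sssss | ttttt | imm |
| sltiu | 001011 | sssss | ttttt | imm |
| lw | 100011 | sssss | ttttt | imm |
| sw | 101001 | sssss | ttttt | imm |
| lui | 001111 | sssss | ttttt | imm |
| bne | 000101 | sssss | ttttt | imm |
| beq | 000100 | sssss | ttttt | imm |

J-type instructions

| operation | OP | LABEL |
|---|---|---|
| j | 000010 | address |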
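A minimal Python assembler helper can turn hand-written instructions into machine code, for example to feed the `ext_instr` loading port during simulation. It assumes the opcode and funct values in Table 1. `enc_r`, `enc_i`, `enc_j` and the lookup dictionaries are hypothetical names. Some values in Table 1 differ from standard MIPS (for example sw = 101001 here versus 101011 in MIPS32, and the mult/multu funct codes). Output from a stock MIPS assembler will therefore not match this design.

```python
# Hypothetical assembler helpers; opcode / funct values copied from Table 1.
R_FUNCT = {
    "add": 0b100000, "addu": 0b100001, "sub": 0b100010, "subu": 0b100011,
    "and": 0b100100, "or": 0b100101, "xor": 0b100110, "xnor": 0b100111,
    "slt": 0b101010, "sltu": 0b101011, "mult": 0b000111, "multu": 0b000101,
}
I_OP = {
    "addi": 0b001000, "addiu": 0b001001, "andi": 0b001100, "ori": 0b001101,
    "xori": 0b001110, "sltiu": 0b001011, "lw": 0b100011, "sw": 0b101001,
    "lui": 0b001111, "bne": 0b000101, "beq": 0b000100,
}
J_OP = {"j": 0b000010}


def enc_r(name: str, rd: int, rs: int, rt: int) -> int:
    """OP rd, rs, rt  ->  000000 | rs | rt | rd | 00000 | funct"""
    return (rs << 21) | (rt << 16) | (rd << 11) | R_FUNCT[name]


def enc_i(name: str, rt: int, rs: int, imm: int) -> int:
    """OP rt, rs, imm  ->  op | rs | rt | imm[15:0] (negative imm wraps to 16 bits)"""
    return (I_OP[name] << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def enc_j(name: str, addr: int) -> int:
    """OP LABEL  ->  op | address[25:0] (a word address, i.e. byte address >> 2)"""
    return (J_OP[name] << 26) | (addr & 0x3FFFFFF)


program = [
    enc_i("addi", 1, 0, 5),    # $1 = $0 + 5
    enc_i("lw", 2, 0, 4),      # $2 = mem[$0 + 4]
    enc_r("add", 3, 1, 2),     # $3 = $1 + $2
    enc_i("sw", 3, 0, 8),      # mem[$0 + 8] = $3
    enc_j("j", 0),             # jump back to address 0
]
print([f"{w:08x}" for w in program])
```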
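2. ALU design
The ALU module is defined as shown in Figure 1. In my version I also added an output signal that indicates whether the result is zero. It is a combinational circuit with 3 inputs and 2 outputs. We want this ALU to perform every operation except multiplication. Multiplication takes much longer, so it gets its own separate unit to avoid dragging down the pipeline's clock frequency.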

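Figure 1: ALU block diagram (from the project guideline)
The control unit produces the Func signal in Figure 1 from the opcode and funct fields, and this signal tells the ALU which operation to perform. Func[2:0] selects the operation (+, -, &, |, ^, and so on), and Func[3] distinguishes signed from unsigned operations. The code is as follows: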
module alu(
    input      [31:0] in1,
    input      [31:0] in2,
    input      [3:0]  func,       // func[3]: 1 = signed, func[2:0]: operation
    output reg [31:0] aluout);

    always @(*) begin
        case (func[2:0])
            3'b000 : aluout = in1 + in2;
            3'b001 : aluout = in1 - in2;
            3'b010 : aluout = in1 | in2;
            3'b011 : aluout = in1 & in2;
            3'b100 : aluout = in1 ^ in2;
            3'b101 : aluout = ~(in1 ^ in2);
            // slt / sltu: 1 if in1 < in2, signed compare when func[3] = 1
            3'b110 : aluout = func[3] ? {31'b0, $signed(in1) < $signed(in2)}
                                      : {31'b0, in1 < in2};
            3'b111 : aluout = {in2[15:0], 16'b0};   // lui
            default: aluout = 32'b0;
        endcase
    end
endmodule
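A software golden model can help when checking simulation waveforms. The Python sketch below describes the intended ALU behavior. It assumes slt/sltu return 1 when in1 < in2, compared as signed when func[3] = 1 and as unsigned otherwise. `alu_model` and `to_signed32` are hypothetical names.

```python
# Hypothetical golden model of the alu module (not part of the Verilog design).
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement number."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu_model(in1: int, in2: int, func: int) -> int:
    """Expected aluout for a 4-bit func: func[3] = signed flag, func[2:0] = operation."""
    in1 &= MASK32
    in2 &= MASK32
    op = func & 0b111
    signed = (func >> 3) & 1
    if op == 0b000:                      # add / addu
        return (in1 + in2) & MASK32
    if op == 0b001:                      # sub / subu
        return (in1 - in2) & MASK32
    if op == 0b010:                      # or
        return in1 | in2
    if op == 0b011:                      # and
        return in1 & in2
    if op == 0b100:                      # xor
        return in1 ^ in2
    if op == 0b101:                      # xnor
        return ~(in1 ^ in2) & MASK32
    if op == 0b110:                      # slt (signed) / sltu (unsigned)
        if signed:
            return int(to_signed32(in1) < to_signed32(in2))
        return int(in1 < in2)
    return (in2 & 0xFFFF) << 16          # 0b111: lui


print(hex(alu_model(3, 5, 0b0001)))      # 3 - 5 wraps around to 0xfffffffe
```

Driving a Verilog testbench with the same operand/func vectors and comparing `aluout` against this model is a cheap way to catch encoding mistakes.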
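Note that in the MIPS instruction set, signed and unsigned addition and subtraction produce exactly the same bits (add vs addu, addi vs addiu, and so on). The ALU therefore does not need separate signed and unsigned adders. Whether a 32-bit value is signed or unsigned is only a matter of how the compiler (the software) interprets it. We still need different encodings because the signed versions detect overflow (add raises an exception on signed overflow) and the unsigned versions do not. This design does not implement overflow detection; see the official MIPS documentation for details. (Thanks to https://blog.csdn.net/sinat_42483341/article/details/89511856 for the inspiration.)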
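The add/addu point can be checked numerically. In the Python sketch below, `add32` is a hypothetical helper. The 32-bit sum is identical for both instructions. Only the overflow flag depends on reading the operands as signed; add would use it to raise an exception and addu ignores it. Signed overflow happens exactly when both operands have the same sign and the result's sign differs.

```python
# Hypothetical helper illustrating add vs addu (not part of the Verilog design).
MASK32 = 0xFFFFFFFF


def add32(a: int, b: int) -> tuple:
    """Return (sum bits, signed overflow) for a 32-bit addition.

    The sum bits are what both add and addu write back; the overflow
    flag is only meaningful for add.
    """
    a &= MASK32
    b &= MASK32
    s = (a + b) & MASK32
    sign_a, sign_b, sign_s = a >> 31, b >> 31, s >> 31
    overflow = sign_a == sign_b and sign_s != sign_a
    return s, overflow


print(add32(0x7FFFFFFF, 1))   # largest positive int + 1: add would flag overflow
print(add32(0xFFFFFFFF, 1))   # -1 + 1 (or 0xFFFFFFFF + 1 unsigned): no signed overflow
```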
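3. Data memory design
The data memory stands in for a cache. MIPS is a RISC machine with a load/store architecture. Data moves between registers and external memory only through load/store instructions, so when a new program starts its operands must first be brought into the register file with lw. Later in this series a cache module will replace this one, bringing the design as close as possible to a real computer architecture.
This module adds three control signals (the ext_ ports) for writing data from outside, so that data can be preloaded into the data memory during simulation. A data word is 32 bits but each memory cell holds only 8 bits, so the design stores words in big-endian order, with the most significant byte at the lowest address.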
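Suppose the first two 32-bit words are 0x12345678 and 0x87654321. Reads are word aligned and the first word starts at address 0, so the bytes are laid out as follows:

| addr | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| data | 0x12 | 0x34 | 0x56 | 0x78 |
| addr | 4 | 5 | 6 | 7 |
| data | 0x87 | 0x65 | 0x43 | 0x21 |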
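The same layout can be reproduced in Python with `int.to_bytes(..., "big")`. It puts the most significant byte at the lowest address, just like `{data_ram[a], data_ram[a+1], data_ram[a+2], data_ram[a+3]}`. `store_word` and `load_word` are hypothetical helper names for a software model of the 256-byte memory.

```python
# Hypothetical software model of the big-endian byte memory.
def store_word(mem: bytearray, addr: int, word: int) -> None:
    """Big-endian store: the most significant byte goes to the lowest address."""
    mem[addr:addr + 4] = (word & 0xFFFFFFFF).to_bytes(4, "big")


def load_word(mem: bytearray, addr: int) -> int:
    """Big-endian load of the 4 bytes starting at addr."""
    return int.from_bytes(mem[addr:addr + 4], "big")


mem = bytearray(256)            # models reg [7:0] data_ram [255:0]
store_word(mem, 0, 0x12345678)
store_word(mem, 4, 0x87654321)
print([hex(b) for b in mem[0:8]])
```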
module data_memory(
    input  [31:0] a,                        // byte address (word aligned)
    input  [31:0] wd,                       // write data
    input         clk,
    input         we,                       // write enable
    output [31:0] rd,                       // read data
    input  [31:0] ext_data, ext_data_addr,  // preload port for simulation
    input         ext_data_en);

    reg [7:0] data_ram [255:0];             // 256 bytes

    // big-endian, word-aligned combinational read
    assign rd = {data_ram[a], data_ram[a+1], data_ram[a+2], data_ram[a+3]};

    integer n;
    always @(posedge clk) begin
        //if (!rst_n) begin
        //    for (n = 0; n < 256; n = n + 1)
        //        data_ram[n] <= 0;
        //end
        //else
        if (ext_data_en)
            {data_ram[ext_data_addr], data_ram[ext_data_addr+1],
             data_ram[ext_data_addr+2], data_ram[ext_data_addr+3]} <= ext_data;
        if (we)
            {data_ram[a], data_ram[a+1], data_ram[a+2], data_ram[a+3]} <= wd;
    end
endmodule
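This is a sequential module, and it belongs to the fourth stage of the five-stage pipeline (Memory).
4. Instruction memory
This module is the first pipeline stage, Fetch. It has one address input and one instruction output. At the start, the machine code is written into the register array inside the instruction memory. On every rising clock edge the PC advances by 4 to fetch the next instruction. j, bne, and beq instead need a special computation to jump to a specific address; I will cover that part in a later post (I am a serious procrastinator).
Like the data memory, this module has ext_ ports for loading data from outside, and it also stores words in big-endian order.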
module inst_memory(
    input             clk,                        // for writing memory
    input      [31:0] a,                          // instruction address (PC)
    output reg [31:0] rd,                         // instruction read out
    input      [31:0] ext_instr, ext_instr_addr,  // load machine code from outside
    input             ext_instr_en,
    input             start);                     // rd stays 0 until start = 1

    reg [7:0] inst_mem [255:0];                   // 256 bytes, big-endian

    always @(*) begin
        if (start == 1)
            rd = {inst_mem[a], inst_mem[a+1], inst_mem[a+2], inst_mem[a+3]};
        else
            rd = 0;
    end

    integer n;
    always @(posedge clk) begin
        if (ext_instr_en) begin
            inst_mem[ext_instr_addr]   <= ext_instr[31:24];
            inst_mem[ext_instr_addr+1] <= ext_instr[23:16];
            inst_mem[ext_instr_addr+2] <= ext_instr[15:8];
            inst_mem[ext_instr_addr+3] <= ext_instr[7:0];
        end
    end
endmodule
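The address arithmetic behind the next PC can be sketched in Python. It mirrors the datapath's expressions: `pcplus4F = pcF + 4`; a branch target formed from PC+4 plus the sign-extended 16-bit offset shifted left by 2; and `jumpadd = {pcF[31:28], instrD[25:0], 2'b0}`. The function names are hypothetical.

```python
# Hypothetical next-PC helpers (not part of the Verilog design).
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    """Sign-extend a 16-bit immediate to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def pc_plus4(pc: int) -> int:
    """Sequential fetch: the next instruction is 4 bytes later."""
    return (pc + 4) & MASK32


def branch_target(pc_plus4_d: int, imm16: int) -> int:
    """beq/bne target: PC+4 of the branch plus the word offset times 4."""
    return (pc_plus4_d + (sign_extend16(imm16) << 2)) & MASK32


def jump_target(pc: int, addr26: int) -> int:
    """j target: {pc[31:28], addr26, 2'b00}."""
    return (pc & 0xF0000000) | ((addr26 & 0x3FFFFFF) << 2)


print(hex(branch_target(0x10, 0xFFFF)))   # offset -1 word: 0x10 - 4
```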
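5. Register file
This module belongs to the Decode stage. The decoded register numbers select registers, and their values are sent to the ALU and the multiplier. lw and sw also move data between the registers and the cache (data memory). The two read outputs correspond to rs and rt. The register file is written on the falling clock edge, and the read ports are combinational (wire outputs). A value written back in the first half of a cycle can therefore be read in the second half of the same cycle. This avoids a RAW (Read After Write) hazard between the Writeback and Decode stages.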
module reg_file(
    input  [31:0] wd,           // data written in
    input  [4:0]  a3,           // write addr; I-type: [20:16], R-type: [15:11]
    input         clk, reg_wr, rst_n,
    input  [4:0]  a1, a2,       // read addr, [25:21] & [20:16]
    output [31:0] rd1, rd2);    // output data

    reg [31:0] rf [31:0];
    integer i;

    // write on the falling clock edge
    always @(negedge clk or negedge rst_n) begin
        if (!rst_n) begin
            for (i = 0; i < 32; i = i + 1)
                rf[i] <= 0;
        end
        else if (reg_wr)
            rf[a3] <= wd;
    end

    /*always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            rd1 <= 0;
            rd2 <= 0;
        end
        else begin
            rd1 <= rf[a1];
            rd2 <= rf[a2];
        end
    end*/

    // combinational read; register $0 always reads as 0
    assign rd1 = a1 ? rf[a1] : 0;
    assign rd2 = a2 ? rf[a2] : 0;
endmodule
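The behavioral Python model below (a hypothetical `RegFileModel` class) shows the two properties this register file relies on. First, register $0 always reads as 0. Second, the write happens on the falling edge, so a value written back during a cycle is already visible to the combinational read later in that same cycle.

```python
# Hypothetical behavioral model of reg_file (not part of the Verilog design).
class RegFileModel:
    """32 x 32-bit registers; writes on the falling edge, combinational reads."""

    def __init__(self) -> None:
        self.rf = [0] * 32

    def negedge_write(self, a3: int, wd: int, reg_wr: bool) -> None:
        if reg_wr:
            self.rf[a3 & 0x1F] = wd & 0xFFFFFFFF

    def read(self, a1: int, a2: int) -> tuple:
        # register $0 is forced to 0 on the read side, whatever is stored
        return (self.rf[a1] if a1 else 0, self.rf[a2] if a2 else 0)

    def cycle(self, a1: int, a2: int, a3: int, wd: int, reg_wr: bool) -> tuple:
        """One clock cycle: Writeback writes at the falling edge (mid-cycle),
        Decode reads in the second half, so it already sees the new value."""
        self.negedge_write(a3, wd, reg_wr)
        return self.read(a1, a2)


rf = RegFileModel()
print(rf.cycle(a1=3, a2=0, a3=3, wd=7, reg_wr=True))    # $3 is read in the same cycle
print(rf.cycle(a1=0, a2=3, a3=0, wd=99, reg_wr=True))   # writing $0 is not visible
```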
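6. Multiplier design
Multiplication takes a long time, so we move it out of the ALU into a separate unit. Unlike the other operations, signed and unsigned multiplication are not the same and must be distinguished. Their full 64-bit products differ in the upper 32 bits, while the lower 32 bits are identical.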
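`timescale 1ns / 1ps
module multipler(   // for add & sub the circuit is the same for signed and unsigned operands, but for multiplication it is not
    input  [31:0]     a,
    input  [31:0]     b,
    input             clk,     // not used: the multiplier is combinational
    input             sign,    // 1: signed (mult), 0: unsigned (multu)
    output reg [31:0] multout);

    wire signed [31:0] a_sign = a;     // same bits, reinterpreted as signed
    wire signed [31:0] b_sign = b;
    reg  signed [63:0] multout_sign;
    reg         [63:0] multout_unsign;

    always @(*) begin
        multout_sign   = a_sign * b_sign;   // 64-bit signed product
        multout_unsign = a * b;             // 64-bit unsigned product
        multout = sign ? multout_sign[31:0] : multout_unsign[31:0];
    end
endmodule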
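A quick Python check shows what the signed/unsigned distinction actually changes. `mult_full` is a hypothetical helper that computes full 64-bit products. They differ only in the upper 32 bits. The lower 32 bits, which are all a 32-bit `multout` port keeps, are identical. The distinction therefore starts to matter once the design keeps the high half as well, as the HI/LO registers of standard MIPS do.

```python
# Hypothetical check of signed vs unsigned 32x32 multiplication.
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement number."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mult_full(a: int, b: int, signed: bool) -> int:
    """64-bit product bit pattern of two 32-bit operands."""
    if signed:
        product = to_signed32(a) * to_signed32(b)
    else:
        product = (a & MASK32) * (b & MASK32)
    return product & MASK64


a, b = 0xFFFFFFFF, 2          # -1 * 2 as signed, 4294967295 * 2 as unsigned
s = mult_full(a, b, signed=True)
u = mult_full(a, b, signed=False)
print(hex(s >> 32), hex(s & MASK32))   # high / low word of the signed product
print(hex(u >> 32), hex(u & MASK32))   # high / low word of the unsigned product
```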
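7. Datapath design
The modules are connected according to the datapath shown in the figure below. The datapath is also driven by the control unit and by the hazard-handling signals, so we need to reserve ports for them.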

module datapath(
    input         clk,
    input         rst_n,
    input         regwrite,
    input         memtoreg,
    input         memwrite,
    input  [1:0]  branch,
    input  [3:0]  alucontrol,           // first bit for signed
    input         alusrc,
    input         regdst,
    input         jump,
    input         mult_sel,
    // inputs for hazard handling
    input  [1:0]  forwardAE, forwardBE,
    input         forwardAD, forwardBD,
    input         flushD,
    input         stallF, stallD, flushE,
    // outputs for hazard handling
    output [4:0]  rsE, rtE,             // forward RAW
    output [4:0]  rtD,                  // stall RAW
    output [4:0]  rsD,
    output [4:0]  writeregM, writeregW, // forward RAW
    output        regwriteM, regwriteW, // forward RAW
    output        memtoregE,            // stall RAW
    output [4:0]  writeregE,
    output        pcsrcD,               // for branch control
    output        memtoregM, regwriteE,
    // outputs to the control unit
    output [5:0]  op,
    output [5:0]  funct,
    output [31:0] resultW,
    // loading data from outside
    input  [31:0] ext_data, ext_data_addr,
    input         ext_data_en,
    input  [31:0] ext_instr, ext_instr_addr,
    input         ext_instr_en,
    input         start);

    wire [31:0] realoutM, writedataM;

    ///////////////////// Fetch /////////////////////
    wire [31:0] pcF;
    wire [31:0] instrD;
    wire [31:0] pcplus4D;
    wire [31:0] pc_n;
    wire [31:0] pcbranchD;              // branch address
    wire [31:0] instrF, pcplus4F;
    wire [31:0] jumpadd;                // jump address
    assign pcplus4F = pcF + 4;
    wire [31:0] pc_n_temp;
    mux2 #(32) PCBranchMux(pcplus4F, pcbranchD, pcsrcD, pc_n_temp);
    mux2 #(32) PCJumpMux(pc_n_temp, jumpadd, jump, pc_n);
    //assign pc_n = jump ? jumpadd : (pcsrcD ? pcbranchD : pcplus4F);
    Fetch Fetchff(clk, rst_n, stallF, pc_n, pcF);
    inst_memory fetch_mem(clk, pcF, instrF, ext_instr, ext_instr_addr,
                          ext_instr_en, start);    // instruction memory

    ///////////////////// Decode /////////////////////
    Decode Decodeff(clk, rst_n, stallD, flushD, instrF, pcplus4F, instrD, pcplus4D);
    assign op    = instrD[31:26];
    assign funct = instrD[5:0];

    wire regwriteD, memtoregD, memwriteD;
    wire [1:0] branchD;
    wire [4:0] a1, a2;
    wire [3:0] alucontrolD;             // first bit for signed
    wire alusrcD, regdstD, jumpD, mult_selD;
    assign {regwriteD, memtoregD, memwriteD, branchD, alucontrolD, alusrcD, regdstD, jumpD, mult_selD} =
           {regwrite,  memtoreg,  memwrite,  branch,  alucontrol,  alusrc,  regdst,  jump,  mult_sel};
    assign a1 = instrD[25:21];          // rs
    assign a2 = instrD[20:16];          // rt

    wire [31:0] rd1D, rd2D, signimmD;
    wire [4:0]  rdD;
    wire [31:0] signimmE, rd1E, rd2E;
    wire [4:0]  rdE;
    wire memwriteE, alusrcE, regdstE, mult_selE;
    wire [3:0] alucontrolE;
    assign rtD = instrD[20:16];         // write register for I-type
    assign rdD = instrD[15:11];         // write register for R-type
    assign rsD = instrD[25:21];

    wire [31:0] rd1, rd2;
    reg_file reg_file(resultW, writeregW, clk, regwriteW, rst_n, a1, a2, rd1, rd2);
    // forwarding for branches resolved in Decode (control hazard)
    mux2 #(32) CtrHazardMux1(rd1, realoutM, forwardAD, rd1D);
    mux2 #(32) CtrHazardMux2(rd2, realoutM, forwardBD, rd2D);
    wire pcsrcD_temp;
    mux2 #(1) BranchMux1((rd1D == rd2D), (rd1D != rd2D), branchD[0], pcsrcD_temp);
    mux2 #(1) BranchMux2(1'b0, pcsrcD_temp, branchD[1], pcsrcD);
    //assign pcsrcD = branchD[1] ? (branchD[0] ? (rd1D != rd2D) : (rd1D == rd2D)) : 1'b0;
    assign jumpadd = {pcF[31:28], instrD[25:0], 2'b0};
    // the ALU immediate (addi, lw, sw, ...) is only sign-extended;
    // the <<2 applies to the branch offset alone
    assign signimmD  = {{16{instrD[15]}}, instrD[15:0]};
    assign pcbranchD = pcplus4D + (signimmD << 2);

    wire [31:0] srcaE, srcbE;
    wire [31:0] aluoutE, multoutE;
    wire [31:0] writedataE;
    wire [31:0] rd1E_temp, rd2E_temp;

    ///////////////////// Execute /////////////////////
    Execute Executeff(clk, rst_n, flushE, signimmD, rd1D, rd2D, rtD, rdD, rsD,
                      regwriteD, memtoregD, memwriteD, alusrcD, regdstD, mult_selD, alucontrolD,
                      signimmE, rd1E_temp, rd2E_temp, rtE, rdE, rsE, alucontrolE,
                      regwriteE, memtoregE, memwriteE, alusrcE, regdstE, mult_selE);
    // RAW hazard forwarding: 00 register file, 01 Writeback, 10 Memory
    mux3 #(32) RAWHazardMux1(rd1E_temp, resultW, realoutM, forwardAE, rd1E);
    mux3 #(32) RAWHazardMux2(rd2E_temp, resultW, realoutM, forwardBE, rd2E);
    assign srcaE = rd1E;
    mux2 #(32) SrcBMux(rd2E, signimmE, alusrcE, srcbE);
    //assign srcbE = alusrcE ? signimmE : rd2E;
    alu alu(srcaE, srcbE, alucontrolE, aluoutE);
    multipler multipler(srcaE, srcbE, clk, alucontrolE[3], multoutE);  // alucontrolE[3] = 1: signed
    wire [31:0] realoutE;
    wire memwriteM;
    // select with the Execute-stage copy of mult_sel
    mux2 #(32) ALU_MULT_Mux(aluoutE, multoutE, mult_selE, realoutE);
    //assign realoutE = mult_selE ? multoutE : aluoutE;
    assign writedataE = rd2E;
    mux2 #(5) regMux(rtE, rdE, regdstE, writeregE);
    //assign writeregE = regdstE ? rdE : rtE;

    ///////////////////// Memory /////////////////////
    Memory Memoryff(clk, rst_n, regwriteE, memtoregE, memwriteE, realoutE, writedataE, writeregE,
                    regwriteM, memtoregM, memwriteM, realoutM, writedataM, writeregM);
    wire [31:0] readdataM;
    data_memory data_memory(realoutM, writedataM, clk, memwriteM, readdataM,
                            ext_data, ext_data_addr, ext_data_en);
    wire memtoregW;
    wire [31:0] readdataW, realoutW;

    ///////////////////// Writeback /////////////////////
    WriteBack WriteBackff(clk, rst_n, regwriteM, memtoregM, realoutM, writeregM, readdataM,
                          regwriteW, memtoregW, realoutW, writeregW, readdataW);
    mux2 #(32) ResultMux(realoutW, readdataW, memtoregW, resultW);
    //assign resultW = memtoregW ? readdataW : realoutW;
endmodule

`timescale 1ns / 1ps
module Fetch(
    input             clk, rst_n,
    input             enable_n,        // stallF: hold the PC when 1
    input      [31:0] pc_n,
    output reg [31:0] pcF);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            pcF <= 0;
        else if (!enable_n)
            pcF <= pc_n;
    end
endmodule

module Decode(
    input             clk, rst_n,
    input             enable_n, flushD, // stallD, flushD
    input      [31:0] instrF, pcplus4F,
    output reg [31:0] instrD, pcplus4D);
    // flushD is sampled on the clock edge: the hazard unit produces it
    // combinationally, so an asynchronous clear would wipe the instruction
    // currently in Decode in the middle of the cycle
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            instrD   <= 0;
            pcplus4D <= 0;
        end
        else if (flushD) begin
            instrD   <= 0;
            pcplus4D <= 0;
        end
        else if (!enable_n) begin
            instrD   <= instrF;
            pcplus4D <= pcplus4F;
        end
    end
endmodule

module Execute(
    input             clk, rst_n,
    input             flushE,
    input      [31:0] signimmD, rd1D, rd2D,
    input      [4:0]  rtD, rdD, rsD,
    input             regwriteD, memtoregD, memwriteD, alusrcD, regdstD, mult_selD,
    input      [3:0]  alucontrolD,
    output reg [31:0] signimmE, rd1E, rd2E,
    output reg [4:0]  rtE, rdE, rsE,
    output reg [3:0]  alucontrolE,
    output reg        regwriteE, memtoregE, memwriteE, alusrcE, regdstE, mult_selE);
    // flushE is synchronous for the same reason as flushD in Decode
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            {signimmE, rd1E, rd2E, rtE, rdE, rsE} <= 0;
            {regwriteE, memtoregE, memwriteE, alucontrolE, alusrcE, regdstE, mult_selE} <= 0;
        end
        else if (flushE) begin   // turn the instruction entering Execute into a bubble
            {signimmE, rd1E, rd2E, rtE, rdE, rsE} <= 0;
            {regwriteE, memtoregE, memwriteE, alucontrolE, alusrcE, regdstE, mult_selE} <= 0;
        end
        else begin
            signimmE <= signimmD;
            rd1E     <= rd1D;
            rd2E     <= rd2D;
            {regwriteE, memtoregE, memwriteE, alucontrolE, regdstE, mult_selE} <=
                {regwriteD, memtoregD, memwriteD, alucontrolD, regdstD, mult_selD};
            alusrcE  <= alusrcD;
            rtE      <= rtD;
            rdE      <= rdD;
            rsE      <= rsD;
        end
    end
endmodule

module Memory(
    input             clk, rst_n,
    input             regwriteE, memtoregE, memwriteE,
    input      [31:0] realoutE, writedataE,
    input      [4:0]  writeregE,
    output reg        regwriteM,
    output reg        memtoregM, memwriteM,
    output reg [31:0] realoutM, writedataM,
    output reg [4:0]  writeregM);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            {regwriteM, memtoregM, memwriteM} <= 0;
            realoutM   <= 0;
            writedataM <= 0;
            writeregM  <= 0;
        end
        else begin
            {regwriteM, memtoregM, memwriteM} <= {regwriteE, memtoregE, memwriteE};
            realoutM   <= realoutE;
            writedataM <= writedataE;
            writeregM  <= writeregE;
        end
    end
endmodule

module WriteBack(
    input             clk, rst_n,
    input             regwriteM, memtoregM,
    input      [31:0] realoutM,
    input      [4:0]  writeregM,
    input      [31:0] readdataM,
    output reg        regwriteW, memtoregW,
    output reg [31:0] realoutW,
    output reg [4:0]  writeregW,
    output reg [31:0] readdataW);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            regwriteW <= 0;
            memtoregW <= 0;
            readdataW <= 0;
            realoutW  <= 0;
            writeregW <= 0;
        end
        else begin
            regwriteW <= regwriteM;
            memtoregW <= memtoregM;
            readdataW <= readdataM;
            realoutW  <= realoutM;
            writeregW <= writeregM;
        end
    end
endmodule

`timescale 1ns / 1ps
module mux3 #(parameter WIDTH = 8)(
    input      [WIDTH-1:0] a,
    input      [WIDTH-1:0] b,
    input      [WIDTH-1:0] c,
    input      [1:0]       sel,
    output reg [WIDTH-1:0] d);
    always @(*) begin
        case (sel)
            2'b00:   d = a;
            2'b01:   d = b;
            2'b10:   d = c;
            default: d = 0;
        endcase
    end
endmodule

`timescale 1ns / 1ps
module mux2 #(parameter WIDTH = 8)(
    input  [WIDTH-1:0] a,
    input  [WIDTH-1:0] b,
    input              sel,
    output [WIDTH-1:0] c);
    assign c = sel ? b : a;   // sel = 1 selects b
endmodule
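This part already includes the forwarding and stalling used for hazard handling; how hazards are detected will be explained in detail in the next post.
*Differences between writing combinational logic with always@(*) and with assign (https://blog.csdn.net/u010709324/article/details/77967694)
**1. The left-hand side of an assignment in an always block must be a reg. Here that reg is not synthesized into a flip-flop, because the sensitivity list contains no posedge.
**2. An always block is entered only when one of its signals changes. If the block contains only b = 0, b starts out unknown (x). During simulation no signal in the block ever changes, so b is never assigned 0 and stays x.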
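8. Control unit design

The control unit decodes instructions. It consists of two parts: a main decoder and an ALU decoder. The ALU decoder provides the ALU control signals, and the main decoder provides all other control signals. For I-type instructions the main decoder's 4-bit aluop is passed straight through as the ALU control. For R-type instructions the main decoder outputs aluop = 1111, and the ALU decoder derives the ALU control (and mult_sel) from the funct field.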
`timescale 1ns / 1ps
module control_unit(
    input  [5:0] funct,         // instr[5:0]
    input  [5:0] op,            // instr[31:26]
    input        rst_n,
    output       regwrite,      // outputs are driven by the sub-module instances
    output       memtoreg,      // below, so they must be nets, not regs
    output       memwrite,
    output [1:0] branch,
    output [3:0] alucontrol,    // first bit for signed
    output       alusrc,
    output       regdst,
    output       jump,
    output       mult_sel);

    wire [3:0] aluop;
    main_decoder main_decoder(.op(op), .rst_n(rst_n), .aluop(aluop), .regwrite(regwrite),
                              .memtoreg(memtoreg), .memwrite(memwrite), .branch(branch),
                              .alusrc(alusrc), .regdst(regdst), .jump(jump));
    alu_decoder alu_decoder(.funct(funct), .rst_n(rst_n), .aluop(aluop),
                            .mult_sel(mult_sel), .func(alucontrol));
endmodule

`timescale 1ns / 1ps
module main_decoder(
    input  [5:0] op,
    input        rst_n,
    output [3:0] aluop,         // first bit is 1 --> signed; 1111 --> R-type
    output       regwrite,
    output       memtoreg,
    output       memwrite,
    output [1:0] branch,        // 0x --> no branch, 10 --> beq, 11 --> bne
    output       alusrc,
    output       regdst,
    output       jump);

    reg [11:0] control;
    assign {aluop, regwrite, regdst, alusrc, branch, memwrite, memtoreg, jump} = control;

    always @(*) begin
        if (!rst_n)
            control = 12'b0;
        else begin
            case (op)
                6'b000000: control = 12'b1111_110_00000;  // R-type
                6'b001000: control = 12'b1000_101_00000;  // addi
                6'b001001: control = 12'b0000_101_00000;  // addiu
                6'b001100: control = 12'b0011_101_00000;  // andi
                6'b001101: control = 12'b0010_101_00000;  // ori
                6'b001110: control = 12'b0100_101_00000;  // xori
                6'b001011: control = 12'b0110_101_00000;  // sltiu
                6'b100011: control = 12'b0000_101_00010;  // lw
                6'b101001: control = 12'b0000_001_00100;  // sw
                6'b001111: control = 12'b0111_101_00000;  // lui
                6'b000101: control = 12'b0001_000_11000;  // bne
                6'b000100: control = 12'b0001_000_10000;  // beq
                6'b000010: control = 12'b0000_000_00001;  // j
                default:   control = 12'b0;
            endcase
        end
    end
endmodule

`timescale 1ns / 1ps
module alu_decoder(
    input  [5:0] funct,
    input        rst_n,
    input  [3:0] aluop,         // first bit is 1 --> signed
    output       mult_sel,
    output [3:0] func);

    reg [4:0] ctr;
    assign {mult_sel, func} = ctr;

    always @(*) begin
        if (!rst_n)
            ctr = 5'b0;
        else if (aluop == 4'b1111) begin   // R-type instruction
            case (funct)
                6'b100000: ctr = 5'b01000;  // add
                6'b100001: ctr = 5'b00000;  // addu
                6'b100010: ctr = 5'b01001;  // sub
                6'b100011: ctr = 5'b00001;  // subu
                6'b100100: ctr = 5'b00011;  // and (ALU code 011, same as andi)
                6'b100101: ctr = 5'b00010;  // or  (ALU code 010, same as ori)
                6'b100110: ctr = 5'b00100;  // xor
                6'b100111: ctr = 5'b00101;  // xnor
                6'b101010: ctr = 5'b01110;  // slt
                6'b101011: ctr = 5'b00110;  // sltu
                6'b000111: ctr = 5'b11000;  // mult
                6'b000101: ctr = 5'b10000;  // multu
                default:   ctr = 5'b00000;  // full case: avoids inferring a latch
            endcase
        end
        else
            ctr = {1'b0, aluop};
    end
endmodule
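The hand-packed 12-bit control words can be double-checked by unpacking them in Python. The unpacking uses the same field order as `assign {aluop, regwrite, regdst, alusrc, branch, memwrite, memtoreg, jump} = control;`. The table below copies a few entries from `main_decoder`. `CONTROL_WORDS` and `unpack_control` are hypothetical names for this sketch.

```python
# Hypothetical checker for main_decoder's control words (not part of the design).
CONTROL_WORDS = {            # op -> 12-bit control word, copied from main_decoder
    0b000000: 0b1111_110_00000,  # R-type
    0b001000: 0b1000_101_00000,  # addi
    0b100011: 0b0000_101_00010,  # lw
    0b101001: 0b0000_001_00100,  # sw
    0b000101: 0b0001_000_11000,  # bne
    0b000100: 0b0001_000_10000,  # beq
    0b000010: 0b0000_000_00001,  # j
}

# Field names and widths, MSB first, matching the Verilog concatenation
FIELDS = [("aluop", 4), ("regwrite", 1), ("regdst", 1), ("alusrc", 1),
          ("branch", 2), ("memwrite", 1), ("memtoreg", 1), ("jump", 1)]


def unpack_control(op: int) -> dict:
    """Split the control word for op into named signals (unknown op -> all 0)."""
    word = CONTROL_WORDS.get(op, 0)
    signals = {}
    shift = sum(width for _, width in FIELDS)   # 12 bits in total
    for name, width in FIELDS:
        shift -= width
        signals[name] = (word >> shift) & ((1 << width) - 1)
    return signals


print(unpack_control(0b100011))   # lw
```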
