Writing a Processor from Scratch (1): Datapath, Control Unit Design, and Instruction Definitions

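While on exchange at UCSB I audited ECE154b. The course project was to build a MIPS processor with a cache and branch prediction, which I found very interesting. Here I work through the whole flow myself, following the professor's project guideline (see https://www.ece.ucsb.edu/~strukov/ece154bSpring2017/project1.pdf). The reference book is "Digital Design and Computer Architecture" by Harris.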

My MIPS processor has a 5-stage pipeline: Fetch -> Decode -> Execute -> Memory -> Writeback, with hazard detection.

//Updated 2020.2.7: fixed some bugs

1. Instruction set encoding

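First we decide which instructions the MIPS core should implement and fix their encoding. For this 32-bit microprocessor we want at least the following 25 instructions, in three groups:

R-type: add, addu, sub, subu, and, or, xor, xnor, slt, sltu, mult, multu. Assembly format: OP rd, rs, rt. The 32-bit instruction word is split into opcode(6), rs(5), rt(5), rd(5), shamt(5), funct(6), with opcode=6'b0; the funct field tells the individual instructions apart.

I-type: addi, addiu, andi, ori, xori, slti, sltiu, lw, sw, lui, bne, beq. Assembly format: OP rt, rs, imm. The 32-bit instruction word is split into opcode(6), rs(5), rt(5), immediate(16).

J-type: j. Assembly format: OP LABEL. The 32-bit instruction word is split into opcode(6), address(26).

The encodings of these instructions are listed in Table 1.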

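Table 1. MIPS instruction encoding

R-type
instr  OP      rs                rt                rd                shamt             funct
add    000000  sssss (reg addr)  ttttt (reg addr)  ddddd (reg addr)  00000 (not used)  100000
addu   000000  sssss             ttttt             ddddd             00000             100001
sub    000000  sssss             ttttt             ddddd             00000             100010
subu   000000  sssss             ttttt             ddddd             00000             100011
and    000000  sssss             ttttt             ddddd             00000             100100
or     000000  sssss             ttttt             ddddd             00000             100101
xor    000000  sssss             ttttt             ddddd             00000             100110
xnor   000000  sssss             ttttt             ddddd             00000             100111
slt    000000  sssss             ttttt             ddddd             00000             101010
sltu   000000  sssss             ttttt             ddddd             00000             101011
mult   000000  sssss             ttttt             ddddd             00000             000111
multu  000000  sssss             ttttt             ddddd             00000             000101

I-type
instr  op      rs                rt                imm
addi   001000  sssss (reg addr)  ttttt (reg addr)  imm
addiu  001001  sssss             ttttt             imm
andi   001100  sssss             ttttt             imm
ori    001101  sssss             ttttt             imm
xori   001110  sssss             ttttt             imm
sltiu  001011  sssss             ttttt             imm
lw     100011  sssss             ttttt             imm
sw     101001  sssss             ttttt             imm
lui    001111  sssss             ttttt             imm
bne    000101  sssss             ttttt             imm
beq    000100  sssss             ttttt             imm

J-type
instr  OP      LABEL
j      000010  address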
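To make the field layout concrete, here is a minimal simulation sketch that packs one instruction of each format with a Verilog concatenation and then slices the fields back out. The module name encode_demo and the register numbers are made up for illustration and are not part of the design; the field widths and opcode/funct values are the ones from Table 1.

```verilog
`timescale 1ns / 1ps
// Hypothetical demo module, not part of the processor itself.
module encode_demo;
    reg [31:0] instr;
    initial begin
        // R-type "add rd=3, rs=1, rt=2": opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)
        instr = {6'b000000, 5'd1, 5'd2, 5'd3, 5'b00000, 6'b100000}; // 32'h00221820
        $display("add  = %h  rs=%0d rt=%0d rd=%0d funct=%b",
                 instr, instr[25:21], instr[20:16], instr[15:11], instr[5:0]);

        // I-type "addi rt=2, rs=1, imm=-4": opcode(6) rs(5) rt(5) immediate(16)
        instr = {6'b001000, 5'd1, 5'd2, 16'hFFFC};
        $display("addi = %h  rs=%0d rt=%0d imm=%h",
                 instr, instr[25:21], instr[20:16], instr[15:0]);

        // J-type "j" with a 26-bit word address of 4: opcode(6) address(26)
        instr = {6'b000010, 26'd4};
        $display("j    = %h  address=%0d", instr, instr[25:0]);
        $finish;
    end
endmodule
```

The same slices (instr[31:26], instr[25:21], instr[20:16], instr[15:11], instr[5:0]) are what the decode stage later uses to pull op, rs, rt, rd and funct out of the fetched word.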

 

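2. ALU design

As shown in Figure 1, the ALU is a combinational circuit with 3 inputs and 2 outputs (my actual version adds one more output that flags a zero result). We want the ALU to perform every operation except multiplication; multiplication takes much longer, so it gets its own unit to avoid dragging down the pipeline clock frequency.

Figure 1. ALU block diagram (from the project guideline)

The Func signal in Figure 1 comes from the control module, which derives it from opcode and funct, and tells the ALU which operation to perform. Func[2:0] selects among +, -, &, |, ^ and so on; Func[3] distinguishes signed from unsigned operations. The code is as follows: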

module alu(
input [31:0] in1,
input [31:0] in2,
input [3:0] func,
output reg[31:0] aluout);

always @(*) begin
    case (func[2:0]) 
        3'b000 : aluout = in1+in2;
        3'b001 : aluout = in1-in2;
        3'b010 : aluout = in1|in2;
        3'b011 : aluout = in1&in2;
        3'b100 : aluout = in1^in2;
        3'b101 : aluout = ~(in1^in2);
        3'b110 : aluout = func[3] ? {31'b0, $signed(in1) < $signed(in2)} : {31'b0, in1 < in2};//slt (signed) / sltu, sltiu (unsigned)
        3'b111 : aluout = {in2[15:0],16'b0};//lui
        default : aluout = 32'b0;
    endcase
end
endmodule

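Note that in the MIPS instruction set definition, signed and unsigned addition and subtraction (add vs. addu, addi vs. addiu, etc.) produce exactly the same result bits, so the ALU does not need to tell them apart for these operations; whether a 32-bit pattern is read as signed or unsigned is decided by the compiler. The signed and unsigned versions still need separate encodings because the signed ones detect overflow and the unsigned ones do not. This design does not implement overflow detection; see the official MIPS documentation for details. (Thanks to https://blog.csdn.net/sinat_42483341/article/details/89511856 for the insight.)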
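The point about signed versus unsigned operands can be checked with a few lines of simulation. This is only a sketch; the module name signed_demo and the operand values are chosen for illustration. It shows that the sum has one bit pattern regardless of interpretation, while a less-than comparison (as needed by slt and sltu) does depend on it.

```verilog
`timescale 1ns / 1ps
// Hypothetical demo module, not part of the processor itself.
module signed_demo;
    reg [31:0] a, b;
    initial begin
        a = 32'hFFFFFFFF; // -1 if read as signed, 4294967295 if read as unsigned
        b = 32'h00000001;
        // Addition: the 32 result bits are the same for add and addu
        $display("a + b        = %h", a + b);
        // Comparison: the interpretation matters
        $display("signed   a<b = %b", $signed(a) < $signed(b)); // -1 < 1
        $display("unsigned a<b = %b", a < b);                   // 4294967295 < 1
        $finish;
    end
endmodule
```

This is why a single adder can serve add and addu, whereas a set-on-less-than operation needs the Func[3] bit to pick the right comparison.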

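3. Data memory design

The data memory stands in for the cache. MIPS is a RISC machine, meaning registers exchange data with external memory only through load/store instructions, so at the start of a new program the operands must first be loaded into the Register File with lw. In a later stage this part will be replaced by a cache module, to bring the design closer to a real computer architecture.

In this part we add three ports for writing data from outside (prefixed ext_), so that the data memory can be preloaded during simulation. Data words are 32 bits wide but each memory cell holds only 8 bits, so one word spans four cells; this design stores them in big-endian order.

Suppose the first two 32-bit words are 0x12345678 and 0x87654321. They are laid out as follows:

addr 0 1 2 3
data 0x12 0x34 0x56 0x78
addr 4 5 6 7
data 0x87 0x65 0x43 0x21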
module data_memory(
input [31:0] a,
input [31:0] wd,
input clk,
input we,//enable signal
output [31:0] rd,
input [31:0] ext_data,ext_data_addr,
input ext_data_en);

reg [7:0] data_ram [255:0];

assign rd = {data_ram[a],data_ram[a+1],data_ram[a+2],data_ram[a+3]};//word aligned

integer n;
always@(posedge clk) begin
    //if (!rst_n) begin
    //    for(n=0;n<255;n=n+1) 
    //        data_ram[n]<=0;
    //end
//    else
    if(ext_data_en)
        {data_ram[ext_data_addr],data_ram[ext_data_addr+1],data_ram[ext_data_addr+2],data_ram[ext_data_addr+3]} <= ext_data;
    if (we) 
        {data_ram[a],data_ram[a+1],data_ram[a+2],data_ram[a+3]} <= wd;
end
endmodule

Writes in this module are clocked (reads are combinational); it sits in the fourth stage of the five-stage pipeline (Memory).
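The big-endian layout can be tried out in isolation. The sketch below uses a small stand-alone byte array (big_endian_demo and ram are illustrative names, not the data_memory module itself) and the same four-cell concatenation that data_memory uses for reads and writes.

```verilog
`timescale 1ns / 1ps
// Hypothetical demo module, not part of the processor itself.
module big_endian_demo;
    reg [7:0]  ram [0:7];   // eight byte-wide cells
    reg [31:0] word;
    integer i;
    initial begin
        // Store two words; the most significant byte goes to the lowest address
        {ram[0], ram[1], ram[2], ram[3]} = 32'h12345678;
        {ram[4], ram[5], ram[6], ram[7]} = 32'h87654321;
        for (i = 0; i < 8; i = i + 1)
            $display("addr %0d : %h", i, ram[i]);
        // Read the second word back the same way data_memory builds rd
        word = {ram[4], ram[5], ram[6], ram[7]};
        $display("word at addr 4 = %h", word);
        $finish;
    end
endmodule
```

Because each word occupies four consecutive cells, word addresses in this scheme are multiples of 4, which matches the PC advancing by 4 per instruction.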

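4. Instruction memory

This module belongs to the first pipeline stage, Fetch. Apart from the loading ports it has one input (the address) and one output (the instruction). At start-up the machine code is written into the register array inside the instruction memory; on each rising clock edge the address advances by 4 to fetch the next instruction. For j, bne and beq the fetch address has to be computed specially; I will cover that in a later post (I am a chronic procrastinator).

Like the data memory, this module has ext_ ports for loading external data, and it also uses big-endian storage.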

module inst_memory(
input clk,//for writing memory
input [31:0] a,
output reg [31:0] rd,
input [31:0] ext_instr,ext_instr_addr,
input ext_instr_en,
input start);

reg [7:0] inst_mem [255:0];

always@(*)
begin
    if(start == 1)
        rd = {inst_mem[a],inst_mem[a+1],inst_mem[a+2],inst_mem[a+3]};
    else 
        rd = 0;
end

integer n;

always@(posedge clk)
begin
    if(ext_instr_en) begin
        inst_mem[ext_instr_addr]<=ext_instr[31:24];
        inst_mem[ext_instr_addr+1]<=ext_instr[23:16];
        inst_mem[ext_instr_addr+2]<=ext_instr[15:8];
        inst_mem[ext_instr_addr+3]<=ext_instr[7:0];
    end
end
endmodule

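5. Register file

This module belongs to the Decode stage. Decoding selects the registers whose values are sent to the ALU and the multiplier, and lw and sw move data between the registers and the cache. The two read ports output rs and rt. To avoid reading and writing at the same instant, which would create a RAW (Read After Write) hazard, registers are written on the falling clock edge, while the read ports are wire-type combinational logic.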

module reg_file(
input [31:0] wd,//data written in
input [4:0] a3,//write addr, for I instruction: [20:16] for R instruction: [15:11]
input clk,reg_wr,rst_n,
input [4:0] a1,a2,//read addr,[25:21]&[20:16]
output [31:0]rd1,rd2);//output data

reg [31:0] rf [31:0];
integer i;

always@(negedge clk or negedge rst_n) begin
    if (!rst_n) begin 
        for (i=0;i<32;i=i+1) begin rf[i] <= 0; end
    end
    else begin 
        if (reg_wr) begin rf[a3] <= wd; end
    end
end 

/*always@(posedge clk or negedge rst_n) begin
    if(!rst_n) begin
        rd1 <= 0;
        rd2 <= 0;
    end
    else begin
        rd1 <= rf[a1];
        rd2 <= rf[a2];
    end
end*/
assign rd1 = a1?rf[a1]:0; 
assign rd2 = a2?rf[a2]:0; 

endmodule

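6. Multiplier design

Because multiplication is slow, we pull it out of the ALU. Unlike the other operations, signed and unsigned multiplication are not the same operation and must be distinguished (the difference shows up in the upper half of the full 64-bit product; the low 32 bits that this module outputs are the same either way).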

`timescale 1ns / 1ps
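module multipler(//add and sub use the same circuit for signed and unsigned operands; multiplication does not
input [31:0] a,
input [31:0] b,
input clk,
input sign,//1: signed multiply (mult), 0: unsigned multiply (multu)
output reg [31:0]multout);

wire signed [31:0]a_sign,b_sign;
reg signed [31:0]multout_sign;

//signed views of the two operands
assign a_sign = a;
assign b_sign = b;

always@(*) begin
    multout_sign = a_sign*b_sign;
    if (sign) 
        multout = multout_sign;//signed product
    else
        multout = a*b;//unsigned product
end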

endmodule
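To see where signed and unsigned multiplication actually differ, the following sketch computes the full 64-bit product both ways. The module name mult_demo, the 64-bit result registers and the operand values are assumptions made for this illustration; the multiplier above only keeps 32 bits.

```verilog
`timescale 1ns / 1ps
// Hypothetical demo module, not part of the processor itself.
module mult_demo;
    reg        [31:0] a, b;
    reg        [63:0] prod_u;   // full unsigned product
    reg signed [63:0] prod_s;   // full signed product
    initial begin
        a = 32'hFFFFFFFE; // -2 if read as signed
        b = 32'h00000003;
        // The 64-bit target widens the operands: zero-extended when unsigned,
        // sign-extended when both operands are signed
        prod_u = a * b;
        prod_s = $signed(a) * $signed(b);
        $display("unsigned: hi=%h lo=%h", prod_u[63:32], prod_u[31:0]);
        $display("signed  : hi=%h lo=%h", prod_s[63:32], prod_s[31:0]);
        $finish;
    end
endmodule
```

The low words of the two products coincide while the high words do not, so the sign input matters once the design is extended to keep the upper half of the product, as real MIPS does with its HI register.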

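7. Datapath design

We connect the modules according to the datapath diagram. The datapath also needs control inputs from the controller and the hazard unit, so we reserve the corresponding ports.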

module datapath(
input clk,
input rst_n,
input regwrite,
input memtoreg,
input memwrite,
input [1:0] branch,
input [3:0] alucontrol,//first bit for signed
input alusrc,
input regdst,
input jump,
input mult_sel,
//input for hazard handle
input [1:0]forwardAE,forwardBE,
input forwardAD,forwardBD,
input flushD,
input stallF,stallD,flushE,
//output for hazard handle
output [4:0] rsE,rtE,//forward RAW
output [4:0] rtD,//stall RAW
output [4:0] rsD,
output [4:0] writeregM,writeregW,//forward RAW
output regwriteM,regwriteW,//forward RAW
output memtoregE,//stall RAW
output [4:0] writeregE,
output pcsrcD,//for branch control
output memtoregM,regwriteE,
//
output [5:0]op,
output [5:0] funct,
output [31:0] resultW,
//loading data from outside
input [31:0] ext_data,ext_data_addr,
input ext_data_en,
input [31:0] ext_instr,ext_instr_addr,
input ext_instr_en,
input start);


wire [31:0]realoutM,writedataM;
/////////////////////Fetch///////////////////////////////
wire [31:0] pcF;
wire [31:0] instrD;
wire [31:0] pcplus4D;
wire [31:0] pc_n;
wire [31:0] pcbranchD;//branch address
wire [31:0] instrF,pcplus4F;
wire [31:0] jumpadd;//jump address
assign pcplus4F = pcF+4;

wire [31:0] pc_n_temp;

mux2 #(32) PCBranchMux(pcplus4F,pcbranchD,pcsrcD,pc_n_temp);
mux2 #(32) PCJumpMux(pc_n_temp,jumpadd,jump,pc_n);
//assign pc_n = jump?jumpadd:(pcsrcD?pcbranchD:pcplus4F);

Fetch Fetchff(clk,rst_n,stallF,pc_n,pcF);
inst_memory fetch_mem(clk,pcF,instrF,ext_instr,ext_instr_addr,ext_instr_en,start);//instruction memory

////////////////////Decode/////////////////////////
Decode Decodeff(clk,rst_n,stallD,flushD,instrF,pcplus4F,instrD,pcplus4D);

assign op = instrD[31:26];
assign funct = instrD[5:0];
wire regwriteD,memtoregD,memwriteD;
wire [1:0] branchD;
wire [4:0] a1,a2;
wire [3:0] alucontrolD;//first bit for signed
wire alusrcD,regdstD,jumpD,mult_selD;
assign {regwriteD,memtoregD,memwriteD,branchD,alucontrolD,alusrcD,regdstD,jumpD,mult_selD}=
        {regwrite,memtoreg,memwrite,branch,alucontrol,alusrc,regdst,jump,mult_sel};
assign a1 = instrD[25:21];//rs
assign a2 = instrD[20:16];//rt
wire [31:0] rd1D,rd2D,signimmD;//
wire [4:0] rdD;
wire [31:0] signimmE,rd1E,rd2E;
wire [4:0] rdE;
wire memwriteE,alusrcE,regdstE,mult_selE;
wire [3:0] alucontrolE;
assign rtD = instrD[20:16];//for I-type write 
assign rdD = instrD[15:11];//for R-type write
assign rsD = instrD[25:21];

wire [31:0]rd1,rd2;

reg_file reg_file(resultW,writeregW,clk,regwriteW,rst_n,a1,a2,rd1,rd2);

mux2 #(32) CtrHazardMux1(rd1,realoutM,forwardAD,rd1D);
mux2 #(32) CtrHazardMux2(rd2,realoutM,forwardBD,rd2D);//control hazard forward

wire pcsrcD_temp;
mux2 #(1) BranchMux1((rd1D == rd2D),(rd1D != rd2D),branchD[0],pcsrcD_temp);
mux2 #(1) BranchMux2(1'b0,pcsrcD_temp,branchD[1],pcsrcD);
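//assign pcsrcD = branchD[1]?(branchD[0]?(rd1D != rd2D):(rd1D == rd2D)):1'b0;
assign jumpadd = {pcF[31:28],instrD[25:0],2'b0};
assign signimmD = {{16{instrD[15]}},instrD[15:0]};//sign-extended immediate, passed unshifted to Execute (addi, lw, sw, lui ...)
assign pcbranchD = pcplus4D+(signimmD<<2);//only the branch offset is scaled from words to bytes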


wire [31:0] srcaE,srcbE;
wire [31:0] aluoutE, multoutE;
wire [31:0] writedataE;
wire [31:0] rd1E_temp,rd2E_temp;
/////////////////////////Execution///////////////////////////////
Execute Executeff(clk,rst_n,flushE,signimmD,rd1D,rd2D,rtD,rdD,rsD,regwriteD,memtoregD,memwriteD,alusrcD,regdstD,mult_selD,
                    alucontrolD,signimmE,rd1E_temp,rd2E_temp,rtE,rdE,rsE,alucontrolE,
                    regwriteE,memtoregE,memwriteE,alusrcE,regdstE,mult_selE);

mux3 #(32) RAWHazardMux1(rd1E_temp,resultW,realoutM,forwardAE,rd1E);
mux3 #(32) RAWHazardMux2(rd2E_temp,resultW,realoutM,forwardBE,rd2E);//RAW Hazard forward

assign srcaE = rd1E;
mux2 #(32) SrcBMux(rd2E,signimmE,alusrcE,srcbE);
//assign srcbE = alusrcE?signimmE:rd2E;

alu alu(srcaE,srcbE,alucontrolE,aluoutE); 
multipler multipler(srcaE,srcbE,clk,alucontrolE[3],multoutE);//alucontrolE[3]=1: signed

wire [31:0] realoutE;
wire memwriteM;

mux2 #(32) ALU_MULT_Mux(aluoutE,multoutE,mult_selE,realoutE);//select with the Execute-stage copy of mult_sel
//assign realoutE = mult_selE?multoutE:aluoutE;
assign writedataE = rd2E;
mux2 #(5) regMux(rtE,rdE,regdstE,writeregE);
//assign writeregE = regdstE?rdE:rtE;


///////////////////////Memory////////////////////////////////
Memory Memoryff(clk,rst_n,regwriteE,memtoregE,memwriteE,realoutE,writedataE,writeregE,
                regwriteM,memtoregM,memwriteM,realoutM,writedataM,writeregM);

wire [31:0]readdataM;

data_memory data_memory(realoutM,writedataM,clk,memwriteM,readdataM,ext_data,ext_data_addr,ext_data_en);

wire memtoregW;
wire [31:0]readdataW,realoutW;
///////////////////////Writeback/////////////////////////////////
WriteBack WriteBackff(clk,rst_n,regwriteM,memtoregM,realoutM,writeregM,readdataM,
                        regwriteW,memtoregW,realoutW,writeregW,readdataW);
mux2 #(32) ResultMux(realoutW,readdataW,memtoregW,resultW);
//assign resultW = memtoregW?readdataW:realoutW;
endmodule

`timescale 1ns / 1ps
module Fetch(
input clk, rst_n,
input enable_n,
input [31:0] pc_n,
output reg [31:0] pcF);

always@(posedge clk or negedge rst_n)
begin
    if(!rst_n)
        pcF<=0;
    else if(!enable_n)
        pcF<=pc_n;
end
endmodule

module Decode(
input clk,rst_n,
input enable_n, flushD,
input [31:0] instrF,pcplus4F,
output reg [31:0] instrD,pcplus4D);

always@(posedge clk or negedge rst_n or posedge flushD)
begin
    if(!rst_n) begin
        instrD <= 0;
        pcplus4D <= 0;
    end
    else if (flushD) begin
        instrD <= 0;
        pcplus4D <= 0;
    end
    else if (!enable_n) begin
        instrD <= instrF;
        pcplus4D <= pcplus4F;
    end
end
endmodule

module Execute(
input clk,rst_n,
input flushE,
input [31:0] signimmD,rd1D,rd2D,
input [4:0] rtD,rdD,rsD,
input regwriteD,memtoregD,memwriteD,alusrcD,regdstD,mult_selD,
input [3:0]alucontrolD,
output reg [31:0] signimmE,rd1E,rd2E,
output reg [4:0] rtE,rdE,rsE,
output reg [3:0] alucontrolE,
output reg regwriteE,memtoregE,memwriteE,alusrcE,regdstE,mult_selE);

always@(posedge clk,negedge rst_n,posedge flushE)
begin
    if((!rst_n) || flushE) begin
        signimmE <= 0;
        rd1E <= 0;
        rd2E <= 0;
        {regwriteE,memtoregE,memwriteE,alucontrolE,alusrcE,regdstE,mult_selE} <= 0;
        rtE <= 0;
        rdE <= 0;
        rsE <= 0;
    end
    else begin
        signimmE <= signimmD;
        rd1E <= rd1D;
        rd2E <= rd2D;
        {regwriteE,memtoregE,memwriteE,alucontrolE,regdstE,mult_selE} <= 
        {regwriteD,memtoregD,memwriteD,alucontrolD,regdstD,mult_selD};
        alusrcE <= alusrcD;
        rtE <= rtD;
        rdE <= rdD;
        rsE <= rsD;
    end
end
endmodule

module Memory(
input clk,rst_n,
input regwriteE,memtoregE,memwriteE,
input [31:0]realoutE,writedataE,
input [4:0]writeregE,
output reg regwriteM,
output reg memtoregM,memwriteM,
output reg [31:0]realoutM,writedataM,
output reg [4:0]writeregM);

always@(posedge clk or negedge rst_n)
begin
    if(!rst_n) begin
        {regwriteM,memtoregM,memwriteM} <= 0;
        realoutM <= 0;
        writedataM <= 0;
        writeregM <= 0;
    end
    else begin
        {regwriteM,memtoregM,memwriteM} <= {regwriteE,memtoregE,memwriteE};
        realoutM <= realoutE;
        writedataM <= writedataE;
        writeregM <= writeregE;
    end
end
endmodule

module WriteBack(
input clk,rst_n,
input regwriteM,memtoregM,
input [31:0]realoutM,
input [4:0]writeregM,
input [31:0]readdataM,
output reg regwriteW,memtoregW,
output reg [31:0]realoutW,
output reg [4:0]writeregW,
output reg [31:0]readdataW);

always@(posedge clk or negedge rst_n)
begin
    if(!rst_n) begin
        regwriteW <= 0;
        memtoregW <= 0;
        readdataW <= 0;
        realoutW <= 0;
        writeregW <= 0;
    end
    else begin
        regwriteW <= regwriteM;
        memtoregW <= memtoregM;
        readdataW <= readdataM;
        realoutW <= realoutM;
        writeregW <= writeregM;
    end
end
endmodule

`timescale 1ns / 1ps
module mux3 #(parameter WIDTH=8)(
input [WIDTH-1:0]a,
input [WIDTH-1:0]b,
input [WIDTH-1:0]c,
input [1:0]sel,
output reg [WIDTH-1:0]d);

always@(*)
begin
    case(sel)
        2'b00: d=a;
        2'b01: d=b;
        2'b10: d=c;
        default: d=0;
    endcase
end
endmodule

`timescale 1ns / 1ps
module mux2 #(parameter WIDTH=8)(
input [WIDTH-1:0]a,
input [WIDTH-1:0]b,
input sel,
output [WIDTH-1:0]c);

assign c = sel?b:a;

endmodule

This part already contains the forward and stall hooks for hazard handling; how the hazards are detected is covered in detail in the next chapter.
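As a small worked example of the next-PC arithmetic in the Decode stage, the sketch below computes a branch target and a jump target for hand-picked values. It follows the usual MIPS convention (branch offset counted in words relative to PC+4, jump address formed from the upper PC bits and the 26-bit field); the module name next_pc_demo and the test values are assumptions for illustration only.

```verilog
`timescale 1ns / 1ps
// Hypothetical demo module, not part of the processor itself.
module next_pc_demo;
    reg  [31:0] pcplus4, instr;
    // Sign-extend the 16-bit immediate, then scale words to bytes for the branch
    wire [31:0] signimm  = {{16{instr[15]}}, instr[15:0]};
    wire [31:0] pcbranch = pcplus4 + (signimm << 2);
    // Jump target: upper 4 PC bits, 26-bit address field, two zero bits
    wire [31:0] jumpaddr = {pcplus4[31:28], instr[25:0], 2'b00};
    initial begin
        pcplus4 = 32'h00000014;
        instr   = {6'b000100, 5'd1, 5'd2, 16'hFFFD}; // beq with an offset of -3 words
        #1 $display("branch target = %h", pcbranch);  // 0x14 - 12 bytes
        instr   = {6'b000010, 26'd2};                 // j to word address 2
        #1 $display("jump target   = %h", jumpaddr);
        $finish;
    end
endmodule
```

Both targets are byte addresses, so they can be fed straight into the PC register that indexes the byte-wide instruction memory.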

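*Differences between always@(*) and assign for combinational logic (https://blog.csdn.net/u010709324/article/details/77967694)

**1. The left-hand side of an assignment in an always block must be a reg variable, but that reg is not synthesized into a flip-flop, because the sensitivity list contains no posedge.

**2. An always@(*) block runs only when one of the signals it reads changes. If the body is just b=0, b starts out undefined (x); since no signal ever changes during the rest of the simulation, b is never assigned 0 and stays undefined.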
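Note 2 can be reproduced with a few lines. This is a sketch with illustrative names (const_demo, b, c); the exact behavior can vary between tools, and some simulators print a warning that the always block has no sensitivities.

```verilog
`timescale 1ns / 1ps
// Hypothetical demo module, not part of the processor itself.
module const_demo;
    reg  b;
    wire c;
    // The body reads no signal, so in a Verilog-2001 simulator nothing
    // ever triggers this block and b is expected to remain x
    always @(*) b = 1'b0;
    // A continuous assignment drives its value from time 0
    assign c = 1'b0;
    initial begin
        #1 $display("b=%b c=%b", b, c);
        $finish;
    end
endmodule
```

For constant outputs, assign (or a reset branch inside the block) is therefore the safer choice in plain Verilog.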

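8. Controller design

The controller's main job is to decode instructions. It has two parts: the main decoder and the ALU decoder. The ALU decoder supplies the ALU control signals; all other control signals come from the main decoder.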

`timescale 1ns / 1ps
module control_unit(
input [5:0] funct,//instr[5:0]
input [5:0] op,//instr[31:26]
input rst_n,
output regwrite,
output memtoreg,
output memwrite,
output [1:0] branch,
output [3:0] alucontrol,//first bit for signed
output alusrc,
output regdst,
output jump,
output mult_sel);//plain wires: these ports are driven by the decoder instances below

wire [3:0] aluop;

main_decoder main_decoder(.op(op),.rst_n(rst_n),.aluop(aluop),.regwrite(regwrite),
            .memtoreg(memtoreg),.memwrite(memwrite),
            .branch(branch),.alusrc(alusrc),.regdst(regdst),.jump(jump));

alu_decoder alu_decoder(.funct(funct),.rst_n(rst_n),.aluop(aluop),
                        .mult_sel(mult_sel),.func(alucontrol));
                        
endmodule


`timescale 1ns / 1ps
module main_decoder(
input [5:0]op,
input rst_n,
output [3:0]aluop,//first bit is 1 --> signed
output regwrite,
output memtoreg,
output memwrite,
output [1:0]branch,//0x --> no branch, 10->beq, 11->bne
output alusrc,
output regdst,
output jump);

reg [11:0]control;

assign {aluop,regwrite,regdst,alusrc,branch,memwrite,memtoreg,jump}=control;

always@(*) begin
    if(!rst_n)
        control = 12'b0;
    else begin 
        case(op)
            6'b000000: control = 12'b1111_110_00000;//r-type
            6'b001000: control = 12'b1000_101_00000;//addi
            6'b001001: control = 12'b0000_101_00000;//addiu
            6'b001100: control = 12'b0011_101_00000;//andi
            6'b001101: control = 12'b0010_101_00000;//ori
            6'b001110: control = 12'b0100_101_00000;//xori
            6'b001011: control = 12'b0110_101_00000;//sltiu
            6'b100011: control = 12'b0000_101_00010;//lw
            6'b101001: control = 12'b0000_001_00100;//sw
            6'b001111: control = 12'b0111_101_00000;//lui
            6'b000101: control = 12'b0001_000_11000;//bne -
            6'b000100: control = 12'b0001_000_10000;//beq -
            6'b000010: control = 12'b0000_000_00001;//j
            default:control = 12'b0;
        endcase
    end
end
endmodule

`timescale 1ns / 1ps
module alu_decoder(
input [5:0]funct,
input rst_n,
input [3:0]aluop,//first bit is 1 --> signed
output mult_sel,
output [3:0]func);

reg [4:0] ctr;
assign {mult_sel,func} = ctr;

always@(*) begin
    if(!rst_n) 
        ctr = 5'b0;
    else if (aluop == 4'b1111) begin//r-type instruction
        case(funct)
            6'b100000: ctr = 5'b01000;//add
            6'b100001: ctr = 5'b00000;//addu
            6'b100010: ctr = 5'b01001;//sub
            6'b100011: ctr = 5'b00001;//subu
            6'b100100: ctr = 5'b00011;//and (ALU func 3'b011 is &)
            6'b100101: ctr = 5'b00010;//or (ALU func 3'b010 is |)
            6'b100110: ctr = 5'b00100;//xor
            6'b100111: ctr = 5'b00101;//xnor
            6'b101010: ctr = 5'b01110;//slt
            6'b101011: ctr = 5'b00110;//sltu
            6'b000111: ctr = 5'b11000;//mult
            6'b000101: ctr = 5'b10000;//multu
            default: ctr = 5'b0;//unknown funct: avoids inferring a latch
        endcase
    end
    else
        ctr = {1'b0,aluop};
end
endmodule
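To read the 12-bit control words in main_decoder, it helps to unpack one by hand. The sketch below reuses the same field order as the assign in main_decoder (aluop, regwrite, regdst, alusrc, branch, memwrite, memtoreg, jump) and prints the fields for the lw and beq entries; the module name control_word_demo is made up for this illustration.

```verilog
`timescale 1ns / 1ps
// Hypothetical demo module, not part of the processor itself.
module control_word_demo;
    reg  [11:0] control;
    wire [3:0]  aluop;
    wire [1:0]  branch;
    wire regwrite, regdst, alusrc, memwrite, memtoreg, jump;
    // Same packing order as in main_decoder
    assign {aluop, regwrite, regdst, alusrc, branch, memwrite, memtoreg, jump} = control;
    initial begin
        control = 12'b0000_101_00010; // lw
        #1 $display("lw : aluop=%b regwrite=%b regdst=%b alusrc=%b branch=%b memwrite=%b memtoreg=%b jump=%b",
                    aluop, regwrite, regdst, alusrc, branch, memwrite, memtoreg, jump);
        control = 12'b0001_000_10000; // beq
        #1 $display("beq: aluop=%b regwrite=%b regdst=%b alusrc=%b branch=%b memwrite=%b memtoreg=%b jump=%b",
                    aluop, regwrite, regdst, alusrc, branch, memwrite, memtoreg, jump);
        $finish;
    end
endmodule
```

For lw this gives an add in the ALU (aluop 0000), the immediate as second operand (alusrc), a register write to rt (regdst low) and the memory read value as the write-back source (memtoreg).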
posted @ 2019-10-11 01:00  霍比特人长不高