CSE 125/225 Implementing Multiple Module

CSE x25 Lab Assignment 3

Welcome to CSE 125/225! Each lab is formed from one or more parts. Each part is relatively independent, and parts can normally be completed in any order. Each part will teach a concept by implementing multiple modules.

Each lab will be graded on multiple categories: correctness, style/lint, git hygiene, and demonstration. Correctness will be assessed by our autograder. Lint will be assessed by Verilator (make lint). Style and hygiene will be graded by the TAs.

To run the test scripts in this lab, run make test from one of the module directories. This will run (many) tests against your solution in two simulators: Icarus Verilog and Verilator. Both will generate waveform files (.fst) in a directory: run/<test_name and parameter values>/<simulator name>. You will need to run make extraclean after make test to clean your previous output files.

You may use any waveform viewer you would like. The codespace has Surfer installed. It is also an excellent web-based viewer. You can also download GTKWave, which is a bit more finicky.

Each part will have a demonstration component that must be shown to a TA or instructor for credit. We may manually grade style/lint after the assignment deadline. Style checking and linting can feel pedantic, but it is the gold standard in industry.

At any time you can run make help from inside one of the module directories to list the available commands.

Part 1: Memories as LUTs as Programmable Logic

Déjà vu, anyone?

We like to treat FPGAs as a “sea of gates”, but they are not. They are actually made up of discrete elements, like look-up tables (LUTs) (Xilinx/AMD, Lattice), multiplexers (Xilinx/AMD), multipliers (Xilinx/AMD, Lattice), and memories (Xilinx/AMD). The job of the Electronic Design Automation (EDA) toolchain is to synthesize SystemVerilog into these discrete elements and then program them.

In this part, the objective is to use actual FPGA primitives to re-create the logic functions you completed in Lab 1 and 2. In effect, you will do synthesis by hand.

The following is the instantiation template for an AMD/Xilinx LUT6 module:

module LUT6
  #(parameter [63:0] INIT = 64'h0000000000000000)
  (output O
  ,input I0
  ,input I1
  ,input I2
  ,input I3
  ,input I4
  ,input I5);

The Look-up-Table (LUT) operates by using I0 through I5 as a 6-bit address that indexes the bits in the INIT parameter to produce O. For example, if I5 through I0 have the values {1'b0, 1'b0, 1'b0, 1'b1, 1'b0, 1'b0}, O will be the value at index 4 in INIT (0 in this example). Edit/Update: Remember that index 0 of the bit string 2'b01 is 1 (not 0).
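If it helps to see the indexing concretely, here is a small Python model of a LUT. It is only a sketch for checking your understanding, not part of the lab deliverables: the function names lut_lookup and make_init are made up for illustration, and the AND function at the end is an arbitrary example rather than one of the required modules.

```python
def lut_lookup(init, inputs):
    """Return the LUT output for inputs given as [I0, I1, ...]; I0 is the address LSB."""
    address = 0
    for position, bit in enumerate(inputs):
        address |= (bit & 1) << position
    return (init >> address) & 1


def make_init(fn, num_inputs):
    """Build an INIT value by evaluating fn(I0, I1, ...) at every possible address."""
    init = 0
    for address in range(1 << num_inputs):
        bits = [(address >> position) & 1 for position in range(num_inputs)]
        if fn(*bits):
            init |= 1 << address
    return init


# Worked example from the text: {I5..I0} = 000100 selects bit 4 of INIT.
# With INIT = 64'h0, that bit is 0.
assert lut_lookup(0x0000000000000000, [0, 0, 1, 0, 0, 0]) == 0

# Index 0 of the bit string 2'b01 is the rightmost bit, which is 1.
assert lut_lookup(0b01, [0]) == 1

# Hypothetical example: a 4-input LUT that ANDs I0 and I1 and ignores I2 and I3.
and2_init = make_init(lambda i0, i1, i2, i3: i0 & i1, 4)
print(f"16'h{and2_init:04X}")  # prints 16'h8888
```

The same indexing rule applies to narrower LUTs; only the number of address bits and the width of the initialization string change.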

We do not use the Xilinx LUT6 in this lab; we use the SB_LUT4 on your ICE40 FPGA.

This is the SB_LUT4 definition:

module SB_LUT4 (
  output O,
  input I0,
  input I1,
  input I2,
  input I3
);
  parameter [15:0] LUT_INIT = 0;

Like in the Xilinx example, LUT_INIT is the LUT initialization string. The Look-up-Table (LUT) operates by using I0 through I3 as a 4-bit address that indexes the bits in the LUT_INIT parameter to produce O.

Please complete the following parts using the primitives dictated. All of the FPGA primitives are available in the provided folder.

  • xor2: Using only the Lattice SB_LUT4 module, create a 2-input Exclusive-Or module.
  • xnor2: Using only the Lattice SB_LUT4 module, create a 2-input Exclusive-Nor module.
  • mux2: Using only the Lattice SB_LUT4 module, create a 2-input multiplexer module.
  • full_add: Using only the Lattice ICESTORM_LC module, create a full_add module.

The documentation for this module is here (Page 2-2). Another good reference is here.

You will need to produce sum_o using the internal Look-up-Table and inputs I1, I2, and CIN. These inputs also connect to “hardened” carry logic (it looks like a mux in the diagram). These are the relevant lines in the provided module:

wire mux_cin = CIN_CONST ? CIN_SET : CIN;
assign COUT = CARRY_ENABLE ? (I1_pd && I2_pd) || ((I1_pd || I2_pd) && mux_cin) : 1'bx;

LUT_INIT, CIN_SET, and CARRY_ENABLE are the three key parameters to set. All the remaining parameters can be ignored.
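To convince yourself that the hardened carry logic really is the carry of a full adder, you can mirror the two lines above in Python and compare them against integer addition. This is a rough sketch: the function name carry_out is made up, and None is used as a stand-in for 1'bx.

```python
def carry_out(i1, i2, cin, cin_const=0, cin_set=0, carry_enable=1):
    """Mirror the COUT expression from the provided module (None stands in for 1'bx)."""
    mux_cin = cin_set if cin_const else cin
    if not carry_enable:
        return None
    return int((i1 and i2) or ((i1 or i2) and mux_cin))


# Exhaustively compare against ordinary addition of three 1-bit values.
for i1 in (0, 1):
    for i2 in (0, 1):
        for cin in (0, 1):
            total = i1 + i2 + cin
            assert carry_out(i1, i2, cin) == total >> 1   # carry bit
            assert (i1 ^ i2 ^ cin) == total & 1           # sum bit the LUT must produce
```

The second assertion shows the function that the LUT has to implement for sum_o. Which LUT address bits those three signals land on is something to confirm in the provided module, not something this sketch decides.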

  • We are removing this module to simplify Lab 3.

adder: Using only the Xilinx CARRY4 module, create a parameterized adder in adder.sv. You should use a generate for-loop. You will need to handle arbitrary width_p values. This document has a good (English) description of the ports on Page 44.

CARRY4 adds the inputs S[3:0] and DI[3:0], and produces O[3:0]. However, not all adds are 4 bits, so the carry output from each bitwise addition is in CO[3:0]. If you are chaining two CARRY4s together, you will use the MSB of CO (CO[3]) in the first CARRY4 as the input to CI of the second CARRY4. If you are adding two 3-bit values, you will take CO[2] of the first CARRY4 to get the MSB (bit 3) of the addition.

Here is an example of using the CARRY4 (hopefully, this is a big hint):

CARRY4
CARRY4_i
  (.CO(wCarryOut[4*i+4-1:4*i]), // 4-bit carry out
   .O(wResult[4*i+4-1:4*i]), // 4-bit carry chain sum
   .CI(wCascadeIn[i]), // 1-bit carry cascade input
   .CYINIT(1'b0), // 1-bit carry initialization
   .DI(wInputB[4*i+4-1:4*i]), // 4-bit carry-MUX data in
   .S(wInputA[4*i+4-1:4*i])); // 4-bit carry-MUX select input

  • triadder: Sometimes, ripple-carry adders (chaining the c_o from one full adder to c_i in the next full adder) are suboptimal. For example, adding three numbers together requires two ripple-carry adders, and the longest path through the circuit would be through all the carry bits. Fortunately, there’s a “faster” approach.

Implement a 3-way adder without using the Verilog + operator more than once. You should use the full_add module (either the one above, or the one from the previous lab). The key technique here is to use a 3:2 compressor, and then add the resulting 2-bit output using an adder. This is a good reference: link

TL;DR: Use the full_add module as a 3:2 compressor, and then add the resulting two bits to get the final result. This technique generalizes to N inputs if you read further in the link above.
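Here is the 3:2 compressor idea sketched in Python, in case the arithmetic is easier to see outside of hardware. Each bit position acts like an independent full adder with no carry chain between positions; the function names and the width argument are made up for this illustration.

```python
def compress_3_to_2(a, b, c, width):
    """Reduce three width-bit numbers to a partial-sum word and a carry word."""
    mask = (1 << width) - 1
    partial_sum = (a ^ b ^ c) & mask                   # sum output of each full adder
    carries = ((a & b) | (a & c) | (b & c)) & mask     # carry output of each full adder
    return partial_sum, carries


def triadd(a, b, c, width):
    """Add three numbers using a single '+' after the compression step."""
    partial_sum, carries = compress_3_to_2(a, b, c, width)
    # Each carry has twice the weight of its bit position, hence the shift.
    return partial_sum + (carries << 1)


# Exhaustive check for 4-bit operands.
for a in range(16):
    for b in range(16):
        for c in range(16):
            assert triadd(a, b, c, 4) == a + b + c
```

Note that the result can need up to two more bits than the operands, which is worth keeping in mind when sizing the output bus.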

  • shift: Using only the Lattice ICESTORM_LC module, create a parameterized shift register identical to Lab 1. The shift register should shift left, and shift in d_i to the low-order bit, on the positive edge of clk_i, when enable_i == 1.

The documentation for the ICESTORM_LC module is here (Page 2-2). Another good reference is here.

These are the key lines from the provided module:

always @(posedge polarized_clk)
  if (CEN_pu)
    o_reg <= SR_pd ? SET_NORESET : lut_o;

assign O = DFF_ENABLE ? ASYNC_SR ? o_reg_async : o_reg : lut_o;

The key ports for this module are I0, O, CLK, CEN, and SR. The key parameters for this module are: LUT_INIT (How do you use the LUT_INIT to pass through I0, unmodified? How do you use it to make a mux, to select the d_i input?), SET_NORESET (related to reset_val_p), and DFF_ENABLE (parameter for enabling the D flip-flop). NEG_CLK and ASYNC_SR must be left at their default values.
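As a reference for what the finished shift register should do on each clock edge, here is a tiny behavioral model in Python. It is a sketch under assumptions: the function name shift_step is made up, and giving reset priority over enable is my guess at the Lab 1 behavior, so check it against your Lab 1 module.

```python
def shift_step(state, d_i, enable_i, reset_i, width, reset_val=0):
    """Return the register contents after one positive clock edge."""
    mask = (1 << width) - 1
    if reset_i:
        # Assumption: reset wins over enable.
        return reset_val & mask
    if enable_i:
        # Shift left and bring d_i into the low-order bit.
        return ((state << 1) | (d_i & 1)) & mask
    return state


state = shift_step(0, 0, 0, 1, width=4, reset_val=0b0001)   # reset to 0001
state = shift_step(state, 1, 1, 0, width=4)                 # 0011
state = shift_step(state, 0, 1, 0, width=4)                 # 0110
state = shift_step(state, 1, 0, 0, width=4)                 # enable low: unchanged
print(f"{state:04b}")  # prints 0110
```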

Demonstration (All Students):

There is no demonstration for this part.

Part 2: Asynchronous Memories in SystemVerilog

Let’s get some practice with memories. Instead of using the LUTs in the FPGA, let’s use Verilog to describe memories. Since these memories are asynchronous-read, they are synthesized to registers in the actual fabric.

  • ram_1r1w_async: Using behavioral SystemVerilog, create a read-priority (aka read-first) asynchronous memory with 1 write port and 1 read port. It must implement the parameters width_p and depth_p. Read priority means that reads get the old write data when there is an address collision (i.e. the read happens first).

Your asynchronous memory should initialize using the function $readmemh.

When simulating, you will see a warning like this in Icarus: FST warning: array word ram_1r1w_async.ram[10] will conflict with an escaped identifier. This is OK.
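If “read-first” is hard to picture, this Python model may help. It is only a behavioral sketch: the class and method names are invented, and parse_readmemh handles just the simplest file layout (whitespace-separated hex words with // comments), not everything $readmemh accepts.

```python
def parse_readmemh(text):
    """Parse whitespace-separated hex words, ignoring // comments (a simplified subset)."""
    words = []
    for line in text.splitlines():
        line = line.split("//")[0]
        words.extend(int(token.replace("_", ""), 16) for token in line.split())
    return words


class Ram1r1wAsync:
    """Asynchronous-read, synchronous-write memory model."""

    def __init__(self, width, depth, init_words=None):
        self.mask = (1 << width) - 1
        self.mem = [0] * depth
        for address, word in enumerate(init_words or []):
            self.mem[address] = word & self.mask

    def read(self, rd_addr):
        # Combinational read: reflects whatever is stored right now.
        return self.mem[rd_addr]

    def clock(self, wr_valid, wr_addr, wr_data):
        # The write only lands at the clock edge.
        if wr_valid:
            self.mem[wr_addr] = wr_data & self.mask


ram = Ram1r1wAsync(width=8, depth=4, init_words=parse_readmemh("3f 06 // first two\n5b 4f"))
before_edge = ram.read(1)   # collision cycle: the read still sees the old data
ram.clock(1, 1, 0xAA)       # positive edge commits the write
after_edge = ram.read(1)
assert (before_edge, after_edge) == (0x06, 0xAA)
```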

  • hex2ssd: Using your ram_1r1w_async, create a module that converts a 4-bit hexadecimal number into a seven-segment display encoding.

Commit your memory initialization file along with your solution.
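One way to produce the memory initialization file is to generate it with a short script instead of typing the encodings by hand. The sketch below makes several assumptions that you must check against your board: segment a is bit 0 and segment g is bit 6, segments are active-high unless active_low is set, and the glyph shapes for A through F are the common ones. The function names are made up.

```python
# Segments lit for each hexadecimal digit, using the usual a..g naming.
SEGMENTS = {
    0x0: "abcdef", 0x1: "bc",     0x2: "abdeg",   0x3: "abcdg",
    0x4: "bcfg",   0x5: "acdfg",  0x6: "acdefg",  0x7: "abc",
    0x8: "abcdefg", 0x9: "abcdfg", 0xA: "abcefg", 0xB: "cdefg",
    0xC: "adef",   0xD: "bcdeg",  0xE: "adefg",   0xF: "aefg",
}


def encode_digit(digit, active_low=False):
    """Return the 7-bit pattern for one digit (assumed order: bit 0 = a, bit 6 = g)."""
    pattern = 0
    for segment in SEGMENTS[digit]:
        pattern |= 1 << "abcdefg".index(segment)
    return (~pattern & 0x7F) if active_low else pattern


def hex_file_lines(active_low=False):
    """One two-digit hex word per line, in address order 0 through F."""
    return [f"{encode_digit(digit, active_low):02x}" for digit in range(16)]


print("\n".join(hex_file_lines()))
```

Redirect the output to a file and point $readmemh at it. If your display shows inverted or scrambled digits, the bit order or polarity assumption is the first thing to revisit.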

  • kpyd2hex: Using your ram_1r1w_async memory, create a module that converts from a keypad (Row, Column) output to a hexadecimal value.

kpyd_i is the one-hot encoding of the row value in the high-order bits, and the column value in the low-order bits. I think the Icebreaker PMOD pin definitions are swapped from the Digilent PMOD definitions. My solution treats Column 1 as 0001, corresponding to the column with 1/4/7/0, and Row 1 as 0001, corresponding to the row 1/2/3/A.

Commit your memory initialization file along with your solution. You will need to copy both .hex files into this directory for it to compile to your FPGA.

Demonstration (All Students):

Demonstrate your working Keypad to Seven-Segment Display module on the FPGA by instantiating your modules in top.sv. Use your keypad to show which button is being pressed on the seven segment display.

10/23 Notes (Contributed by Raphael):

  • You will need to iteratively select columns to determine which column has a button being pressed. (Do this with a 1-hot shift register/ring counter!)
  • You will need to drive the column pins on the keypad, and it will respond with the row being pressed within that column.
  • The kpyd2hex module takes rows and columns as one-hot values (e.g. 00010001). However, the keypad columns are zero-hot with pull-up resistors. Therefore, the rows are also zero-hot (see datasheet for more info). For example, if button “1” is being pressed and we send 1110 to the keypad, it will respond with 1110.
  • Finally, the keypad glitches if you send too many requests, so you need to slow the 12MHz clock.

Old Notes:
  • You do not need to debounce or edge-detect the buttons.
  • You will need to iteratively select columns to determine which column has a button being pressed. (Do this with a 1-hot shift register!)
  • The kpyd2hex module takes rows and columns as one-hot values (e.g. 00010001). However, the keypad columns are zero-hot with pull-up resistors. Therefore, the rows are also zero-hot.
  • You will need to handle the case where no button is pressed.
  • It is safe to assume we will only press one button at a time in a column.
  • You can use persistence of vision.

Part 3: Elastic Pipelines and FIFOs

We are working our way up to pipelines. There are two types of pipelines: inelastic and elastic (we will cover these in class). You can always wrap inelastic pipelines to create elastic ones. In this lab, you will write an inelastic pipeline stage. Next, you will create an elastic pipeline stage. Finally, you will use your memory (from above) to create a FIFO.

You may use whatever operators and behavioral description you prefer, except you may not use always@(*). You are encouraged to reuse whatever modules you see fit from previous labs or this lab.

  • inelastic: Write an inelastic pipeline stage. When en_i is 1, it should save the data. Otherwise, it should not. When datapath_reset_p == 1, data_o should be reset to 0 if reset_i == 1 at the positive edge of the clock.

You can use /* verilator lint_off WIDTHTRUNC */ around datapath_reset_p to clear the lint warnings.

Note: This should look a lot like writing a DFF.

  • elastic: Write a Mealy elastic pipeline stage. You can think of this as a 1-element FIFO, with a Mealy state machine to improve throughput. The module must be Ready-Valid-& on the input/consumer interface (ready_o and valid_i) and Ready-Valid-& (valid_o and ready_i) on the output/producer interface.

When datapath_reset_p == 1, data_o should be reset to 0 if reset_i == 1 at the positive edge of the clock.

When datapath_gate_p == 1, data_o should only be updated when valid_i == 1. Otherwise, data_o should be updated whenever ready_o == 1. This is a very simple form of “Data Gating”, and is the missing “bit” from the class lecture slides.

10/23 Note: A potentially better way to say the above: When datapath_gate_p == 1, data_o should only be updated when (valid_i & ready_o) == 1. Otherwise, data_o should be updated whenever ready_o == 1.
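To make the data-gating rule concrete, here is a cycle-level Python model of a 1-element elastic stage. Treat it as a sketch under assumptions, not the lecture's design: it assumes the Mealy ready rule ready_o = (not valid_o) or ready_i, it leaves out reset, and the class and argument names are invented.

```python
class ElasticStage:
    """One-element elastic buffer with an optional data-gating rule."""

    def __init__(self, datapath_gate=False):
        self.valid_o = 0
        self.data_o = 0
        self.datapath_gate = datapath_gate

    def ready_o(self, ready_i):
        # Assumed Mealy rule: accept when empty, or when the consumer drains this cycle.
        return int((not self.valid_o) or ready_i)

    def clock(self, valid_i, data_i, ready_i):
        """Advance one positive clock edge."""
        if self.ready_o(ready_i):
            # Gated: capture only on a real transfer. Ungated: capture whenever ready.
            if valid_i or not self.datapath_gate:
                self.data_o = data_i
            self.valid_o = int(bool(valid_i))


gated = ElasticStage(datapath_gate=True)
gated.clock(valid_i=1, data_i=0xAB, ready_i=0)   # stage was empty, so it accepts
gated.clock(valid_i=1, data_i=0xCD, ready_i=0)   # full and not drained: holds 0xAB
gated.clock(valid_i=0, data_i=0xEE, ready_i=1)   # drained; data_o keeps its old value
print(gated.valid_o, hex(gated.data_o))
```

With datapath_gate set to False, the last call would overwrite data_o with 0xEE even though nothing valid arrived, which is exactly the extra toggling that gating avoids.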

  • fifo_1r1w: Using behavioral SystemVerilog, your ram_sync_1r1w module, and any other module you have written, write a First-in-First-Out (FIFO) module. The module must be Ready-Valid-& on the input/consumer interface (ready_o and valid_i) and Ready-Valid-& (valid_o and ready_i) on the output/producer interface. This paper and this google doc have good breakdowns of the interface types.

Demonstration (All Students):

Demonstrate your working FIFO by using it to connect between audio input and output on your FPGA board. You will plug the PMOD I2S2 into PMOD Port B on your board, and then use 3.5mm cables to connect to the Audio I/O ports to/from your computer/speaker. You should set your FIFO to a very small depth (e.g. 2) because the Lattice boards do not have memories that support the ram_1r1w_async pattern. The output must sound the same as the original audio for credit.

What is the maximum value for depth_p that you can use on your FPGA before the toolchains fail to compile?

Part 4: Sinusoid / Fixed Point Representation

Have you ever heard anyone complain about how complicated IEEE 754 floating point is? The problem is that it’s easy to use (in software), until it isn’t: List of Failures from IEEE 754. For this reason, floating point arithmetic isn’t used in many safety-critical applications. For the same reasons, floating point numbers aren’t used in signal processing. Fixed point operations are vastly less complicated than floating point operations, require vastly less area, and are numerically stable.

Fixed point arithmetic follows the same rules as normal two’s-complement arithmetic. In that sense, you already know the basics. The difference is that when two fixed-point numbers are multiplied, the number of fractional digits/bits increases. For example, .5 * .5, which is representable with one fractional digit, produces .25, which needs two fractional digits to represent. In fixed point, the fractional digits represent ½ (.5), ¼ (.25), ⅛ (.125), etc. In the example above, .5 is represented in binary by .1. When you multiply 0.1 and 0.1, the result will be two bits, 0.01 (binary), or .25 (decimal).
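The .5 * .5 example can be reproduced with plain integers, which is all fixed point really is. In this sketch a number is stored as a raw integer plus a count of fractional bits; the helper names are made up for illustration.

```python
def to_fixed(value, frac_bits):
    """Convert a real number to its raw integer representation."""
    return round(value * (1 << frac_bits))


def from_fixed(raw, frac_bits):
    """Convert a raw integer back to the real number it represents."""
    return raw / (1 << frac_bits)


def fixed_mul(a_raw, a_frac_bits, b_raw, b_frac_bits):
    """Multiply the raw integers; the fractional bit counts add."""
    return a_raw * b_raw, a_frac_bits + b_frac_bits


half = to_fixed(0.5, 1)                                  # raw 1, i.e. binary .1
product_raw, product_frac_bits = fixed_mul(half, 1, half, 1)
assert (product_raw, product_frac_bits) == (1, 2)        # binary .01
assert from_fixed(product_raw, product_frac_bits) == 0.25

# Signed values work the same way, as with two's-complement integers.
raw, frac_bits = fixed_mul(to_fixed(-1.5, 4), 4, to_fixed(2.25, 4), 4)
assert from_fixed(raw, frac_bits) == -3.375
```

The multiplication itself is an ordinary integer multiply; only the bookkeeping of where the binary point sits changes.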

I like to handle fractional bits by declaring the fractional bits in the negative range of the bus. For example, wire [11:-4] foo has 12 integer bits and 4 fractional bits. When foo is multiplied by itself, it produces 24 integer bits and 8 fractional bits, or [23:-8]. However, if a [-1:-4] bus is multiplied by a [11:-4] bus, the result is only a [11:-8] bus.

Here are a few good tutorials:

  • sinusoid: Using your ram_1r1w_async memory, create a module that generates a sine wave, turning hexadecimal indices into (signed) 12-bit values. See the demo below for more information.

Since this is an audio challenge, there is no testbench for this part. If you need accommodations, please see the instructor. Commit your memory initialization file (in hex format) along with your solution.

Demonstration (All Students):

Using your counter module from Lab Assignment 1/2 and your sinusoid module above, play a Tuning-A tone on the speakers in the lab with the PMOD I2S2 module. You need to figure out how to generate a tone at 440 Hz, given that the PLL clock runs at 22.591MHz, and the I2S2 accepts a Left channel and a Right channel output at approximately 44.1 kHz. The interface to the I2S2 module is Ready-Valid-&. (Note: Do not use the output of your counter as your clock. All of your logic should run at 22.591MHz.) Implement your solution in sinusoid/top.sv. We will use this link (or similar) to determine if you have succeeded.

The clock frequency in this lab has changed. The signal from the PLL is faster, 22.591MHz, and called clk_o.

In both demonstration folders, top.sv instantiates an I2S2-to-AXI-Streaming module, which drives the I2S2 PMOD. The input and output of this module use a ready/valid handshake. The left and right audio channels are separate wires, but you can concatenate them if you would like (for your FIFO). Drive both for your sinusoid.

top.sv “works out of the box”. You can test that your setup works by connecting your computer to the audio input and connecting the audio output to amplified speakers, i.e. those with a power cable. You will need to instantiate your logic between the interfaces for the demo.
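For the sinusoid memory file and the 440 Hz arithmetic described above, a short script can do the tedious part. This is a sketch under assumptions: the table depth is your choice, the samples are scaled to the largest symmetric 12-bit signed range, each hex word is written as 12-bit two's complement, and all function names are made up.

```python
import math


def sine_table(depth, bits=12):
    """Return one period of a sine wave as two's-complement hex words."""
    amplitude = (1 << (bits - 1)) - 1      # 2047 for 12 bits
    mask = (1 << bits) - 1
    digits = (bits + 3) // 4
    words = []
    for index in range(depth):
        sample = round(amplitude * math.sin(2 * math.pi * index / depth))
        words.append(f"{sample & mask:0{digits}x}")   # masking yields two's complement
    return words


def samples_per_period(sample_rate_hz, tone_hz):
    """How many output samples span one period of the tone."""
    return sample_rate_hz / tone_hz


def clocks_per_sample(clock_hz, sample_rate_hz):
    """How many clock cycles pass between consecutive samples."""
    return clock_hz / sample_rate_hz


print("\n".join(sine_table(depth=16)))
print(samples_per_period(44_100, 440))
print(clocks_per_sample(22_591_000, 44_100))
```

At roughly 44.1 kHz, one period of a 440 Hz tone spans about 100 samples, so the table depth and how far you step through it per sample together set the pitch. Because the ready/valid handshake paces the samples, the handshake, and not a divided clock, decides when your index advances.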

WARNING WARNING WARNING

DO NOT PLUG YOUR HEADPHONES INTO THE AUDIO OUTPUT WHILE THEY ARE IN YOUR EARS. PLAY MUSIC FIRST, ADJUST VOLUME, THEN PUT IN EARS.

Use make bitstream to build the FPGA bitstream (configuration file) and program the FPGA. Your FPGA will need to be plugged into a USB port.

Grading:

  1. Push your completed assignment to your git repository. Only push your modified files!
  2. Submit your assignment through gradescope, and confirm that the autograder runs.
  3. Demonstrate each part to a TA.

This lab will be graded on the following criteria. Weights are available in Canvas.

  1. Correctness: Is the code in git correct? Does it pass the checks in Gradescope?
  2. Lint and Style: Does the solution pass the Verilog Lint Checker run by Gradescope? Are variable names consistent with what is being taught in class? This may seem pedantic, but in industry and open source projects this is standard practice. Hint: use the make lint command to check your code.
  3. Demonstration: Was the code demonstrated to a TA or instructor before the deadline?

The following will also be considered in your final grade:

  1. Language Features: Does the solution use allowed language features? (i.e. Structural vs Behavioral Verilog). Maximum 50% deduction.
  2. Git Hygiene: Does the assignment submission only contain files that are relevant to the assignment? Please, please, please don’t check in files that aren’t part of the submission. Maximum 20% deduction.

Finally, modifying any parts of the test/grading infrastructure without permission will result in a zero on the entire part.
posted @ 2024-10-29 13:28  t82x8z