Computer Organization and Design: Homework Exercises (6)

----------------------This is my own homework. If later students get the same exercises, feel free to use it as a reference and discuss it with me; please do not copy it directly.

Problem 2. Cache Associativity (8 points)

For this question, you will simulate different configurations of an 8-block cache using the following 8-bit memory accesses:

load 100

store 102

store 88

load 120

load 90

a) Simple Questions

i) If the block size is 4 bytes, how large must the cache be?

----------32 bytes (8 blocks x 4 bytes per block).

ii) How many bits will be used for the offset?

----------2 bits will be used for the offset (log2(4) = 2).
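The field widths used throughout this problem follow from the same arithmetic, so a small sketch may help to check them. The function name `field_bits` and its parameters are illustrative only and are not part of the assignment; the sketch assumes all sizes are powers of two.

```python
def field_bits(addr_bits, num_blocks, block_bytes, ways):
    """Return (tag, index, offset) widths in bits for one cache organization.

    Assumes num_blocks, block_bytes and ways are all powers of two.
    """
    offset = block_bytes.bit_length() - 1      # log2(block size)
    num_sets = num_blocks // ways              # blocks are grouped into sets
    index = num_sets.bit_length() - 1          # log2(number of sets)
    tag = addr_bits - index - offset           # whatever is left over
    return tag, index, offset


# 8-bit addresses, 8 blocks of 4 bytes each (the setup of this problem).
for name, ways in [("fully-associative", 8),
                   ("direct-mapped", 1),
                   ("2-way set-associative", 2)]:
    tag, index, offset = field_bits(8, 8, 4, ways)
    print(f"{name}: tag={tag} index={index} offset={offset}")
```

The offset width depends only on the block size, while the tag width grows as associativity increases because fewer bits are spent on the index.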

b) Fully-associative with a 4-byte block size

i) How many bits should be used for the tag?

----------6 bits should be used for the tag (8 address bits - 2 offset bits; a fully-associative cache has no index bits).

ii) Complete the table by simulating the above memory references

Show each entry, and simply cross out blocks as they are overwritten

Block#	Tag	Dirty
0	25	0->1
1	22	1
2	30	0
3
4
5
6
7

c) Direct-mapped with a 4-byte block size

i) How many bits should be used for the tag?

------------3 bits should be used for the tag (8 address bits - 3 index bits for 8 blocks - 2 offset bits).

ii) Complete the table by simulating the above memory references

Show each entry, and simply cross out blocks as they are overwritten

Block/Set#	Tag	Dirty
0
1	3	0->1
2
3
4
5
6	2->3->2	1->0->0
7

d) 2-way Set-associative with a 4-byte block size

i) How many bits should be used for the tag?

------------4 bits should be used for the tag (8 address bits - 2 index bits for 4 sets - 2 offset bits).

ii) Complete the table by simulating the above memory references

Show each entry, and simply cross out blocks as they are overwritten

Set#	Block#	Tag	Dirty
0	0
	1
1	2	6	0->1
	3
2	4	5	1
	5	7	0
3	6
	7
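The three tables in parts b) to d) can be cross-checked with a short simulation. This is a minimal sketch, assuming a write-back, write-allocate cache with LRU replacement; the function name `simulate` is hypothetical. It tracks only the final tag and dirty bit in each set (ordered from least to most recently used), not which physical block number holds each entry.

```python
def simulate(accesses, num_blocks=8, block_bytes=4, ways=1):
    """Simulate a write-back, write-allocate cache with LRU replacement.

    Returns one list per set; each list holds [tag, dirty] entries ordered
    from least recently used to most recently used.
    """
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]
    for op, addr in accesses:
        block = addr // block_bytes                  # drop the offset bits
        index, tag = block % num_sets, block // num_sets
        lines = sets[index]
        line = next((l for l in lines if l[0] == tag), None)
        if line is not None:
            lines.remove(line)                       # hit: re-insert as MRU below
        else:
            if len(lines) == ways:
                lines.pop(0)                         # miss in a full set: evict LRU
            line = [tag, 0]                          # newly allocated block is clean
        if op == "store":
            line[1] = 1                              # stores mark the block dirty
        lines.append(line)
    return sets


accesses = [("load", 100), ("store", 102), ("store", 88),
            ("load", 120), ("load", 90)]
for name, ways in [("fully-associative", 8), ("direct-mapped", 1), ("2-way", 2)]:
    print(name, simulate(accesses, ways=ways))
```

Setting `ways` to 8, 1 or 2 covers the fully-associative, direct-mapped and 2-way cases with the same code, since the first two are just the extreme cases of set associativity.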

e) Matching

Note: Some may have multiple answers and some answers may not be used.

__bd__ fully-associative cache

__ae__ direct-mapped cache

__f__ set-associative cache

a) Fewest connections to memory are needed

b) Provides optimal cache utilization

c) Allows a larger block size to be used

d) Largest Tag overhead

e) Smallest Tag overhead

f) May prevent conflict when two blocks have the same set (N/A to fully-associative caches)

Problem 3. Cache Comparison (8 Points)

EZ-Cache Company has hired you to design their next generation cache for the LC2k. They want to use a 32-byte direct-mapped cache with a 4-byte block size.

a) Using the following sequence of memory accesses, compute the number of cache hits:

load 200

store 204

store 236

load 201

load 208

store 234

load 204

load 239

store 201

Block/Set#	Tag	Dirty
0
1
2	6->7->6
3	6->7->6->7
4	6
5
6
7

Number of Hits: ________1________

b) EZ-Cache believes they can improve the hit rate by either increasing the block size or increasing the associativity

Simulate the above memory accesses for a 32-byte direct-mapped cache with an 8-byte block size

Block/Set#	Tag
0
1	6->7->6->7->6->7->6
2	6
3

Number of Hits: _________1________

Simulate the above memory accesses for a 32-byte 2-way associative cache with a 4-byte block size

Set#	Block#	Tag
0	0	13
	1
1	2
	3
2	4	12
	5	14
3	6	12
	7	14

Number of Hits ________4__________

Which one of the configurations is best for the memory access sequence?

-----------The 2-way set-associative cache with a 4-byte block size (4 hits, versus 1 hit for each of the direct-mapped configurations).
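The three hit counts can be reproduced with a small counter. This is a sketch under stated assumptions: LRU replacement, and stores allocating a block exactly like loads (write-allocate), so loads and stores are not distinguished. The name `count_hits` is made up for illustration.

```python
def count_hits(addresses, cache_bytes, block_bytes, ways):
    """Count hits for an LRU cache; loads and stores are treated alike."""
    num_sets = cache_bytes // block_bytes // ways
    sets = [[] for _ in range(num_sets)]     # per set: tags, LRU first
    hits = 0
    for addr in addresses:
        block = addr // block_bytes
        index, tag = block % num_sets, block // num_sets
        tags = sets[index]
        if tag in tags:
            hits += 1
            tags.remove(tag)                 # refresh its LRU position below
        elif len(tags) == ways:
            tags.pop(0)                      # evict the least recently used tag
        tags.append(tag)
    return hits


trace = [200, 204, 236, 201, 208, 234, 204, 239, 201]
print("direct-mapped, 4-byte blocks:", count_hits(trace, 32, 4, 1))
print("direct-mapped, 8-byte blocks:", count_hits(trace, 32, 8, 1))
print("2-way, 4-byte blocks:        ", count_hits(trace, 32, 4, 2))
```

For this trace the larger block size does not help, because the blocks that thrash (tags 6 and 7) still map to the same direct-mapped entry; only the extra way removes the conflicts.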

Problem 4 (12 points)

You have been given the following two caches which are both byte addressable and use 16 bit memory addresses.

Cache	Cache A	Cache B
Total size (Bytes)	16	16
Block size (Bytes)	4	4
Organization	Fully Associative	Direct Mapped
Replacement policy	LRU	-
Write policy	Allocate on write	Allocate on write

a) The following addresses are referenced in the given order; please put an H for each of the hits and an M for each of the misses for both the caches. Also calculate the hit rate for each cache. An extra column for an infinite size fully-associative cache (also of block size 4 bytes) is given to make the calculation for part (b) easy. [8 pts]

Address(hex)	Address(binary)	Infinite	Cache A	Cache B
0x0000	0000 0000 0000 0000	M	M	M
0x0007	0000 0000 0000 0111	M	M	M
0x0003	0000 0000 0000 0011	H	H	H
0x0009	0000 0000 0000 1001	M	M	M
0x0016	0000 0000 0001 0110	M	M	M
0x0005	0000 0000 0000 0101	H	H	M
0x000D	0000 0000 0000 1101	M	M	M
0x0001	0000 0000 0000 0001	H	M	H
Hit Rate		3/8	2/8	2/8

-----------------(2/8+2/8)*2*8=8 ;

b) For each reference in the previous sequence of references, classify them using one of the four possible labels HIT (if the access is a hit) or COMPULSORY / CAPACITY / CONFLICT (if it’s a miss depending on the type of miss) [4 pts]

Address(hex)	Cache A	Cache B
0x0000	COMPULSORY	COMPULSORY
0x0007	COMPULSORY	COMPULSORY
0x0003	HIT	HIT
0x0009	COMPULSORY	COMPULSORY
0x0016	COMPULSORY	COMPULSORY
0x0005	HIT	CONFLICT
0x000D	COMPULSORY	COMPULSORY
0x0001	CAPACITY	HIT

------------0.25*8*2=4 ;
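The classification rule used in the table can be written down as a short sketch: a miss is COMPULSORY if the block has never been referenced (it also misses in the infinite cache), CAPACITY if a fully-associative LRU cache of the same size also misses, and CONFLICT otherwise. The function name `classify` and its arguments are illustrative assumptions, not part of the problem.

```python
def classify(blocks, num_lines, direct_mapped):
    """Label each block reference as HIT, COMPULSORY, CAPACITY or CONFLICT."""
    seen = set()     # infinite cache: every block referenced so far
    fa = []          # fully-associative LRU cache of the same size, LRU first
    dm = {}          # direct-mapped cache: index -> resident block number
    labels = []
    for b in blocks:
        first_touch = b not in seen
        seen.add(b)

        fa_hit = b in fa                     # reference model for capacity misses
        if fa_hit:
            fa.remove(b)
        elif len(fa) == num_lines:
            fa.pop(0)
        fa.append(b)

        if direct_mapped:
            hit = dm.get(b % num_lines) == b
            dm[b % num_lines] = b
        else:
            hit = fa_hit

        if hit:
            labels.append("HIT")
        elif first_touch:
            labels.append("COMPULSORY")
        elif not fa_hit:
            labels.append("CAPACITY")
        else:
            labels.append("CONFLICT")
    return labels


addresses = [0x0000, 0x0007, 0x0003, 0x0009, 0x0016, 0x0005, 0x000D, 0x0001]
blocks = [a // 4 for a in addresses]         # 4-byte blocks
print("Cache A:", classify(blocks, 4, direct_mapped=False))
print("Cache B:", classify(blocks, 4, direct_mapped=True))
```

Note that the last reference (0x0001) hits in the direct-mapped cache even though the fully-associative cache misses on it, so a direct-mapped cache is not always worse for a given trace.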

Problem 5 (10 points)

The picojoule microprocessor has a byte-addressable ISA and only 64 bytes of memory. It has a 16-byte, 2-way set-associative, write-back, write-allocate cache, and uses a block size of 2 bytes. Each load / store instruction accesses a single byte. The OB0 and OB1 fields in the cache hold the 2 data bytes in a block (1 byte each). Given the following sequence of instructions, update the cache after each instruction. When both ways in a set are invalid and a block has to be allocated, the cache logic puts higher priority on way 0. Use decimal values for the OB0 and OB1 fields and binary for the rest. If a cache block is invalid, you don’t have to fill in anything; if you don’t know the value of a certain field for a valid block, put an X there. The initial empty state of the cache is given. The contents of the following memory locations are known:

M[4]=7

M[14]=11

M[15]=13

M[37]=17

M[45]=19

The instructions (LD is a load and ST is a store) follow:

1: LD R1 ← M[4]

2: LD R2 ← M[37]

3: ST R1 → M[36]

4: ST R2 → M[5]

5: LD R1 ← M[15]

6: LD R2 ← M[14]

7: LD R1 ← M[45]

8: ST R2 → M[44]

9: HALT

Part (a) [8 points]

Initial

	Way 0						Way 1
	V	D	lru	Tag	OB0	OB1	V	D	lru	Tag	OB0	OB1
Set 0	0						0
Set 1	0						0
Set 2	0						0
Set 3	0						0

After instruction 1: LD R1 ← M[4]

	Way 0						Way 1
	V	D	lru	Tag	OB0	OB1	V	D	lru	Tag	OB0	OB1
Set 0	0						0
Set 1	0						0
Set 2	1	0		000	7	X	0
Set 3	0						0

After instruction 2: LD R2 ← M[37]

	Way 0						Way 1
	V	D	lru	Tag	OB0	OB1	V	D	lru	Tag	OB0	OB1
Set 0	0						0
Set 1	0						0
Set 2	1	0	LRU	000	7	X	1	0		100	X	17
Set 3	0						0

After instruction 3: ST R1 → M[36]

	Way 0						Way 1
	V	D	lru	Tag	OB0	OB1	V	D	lru	Tag	OB0	OB1
Set 0	0						0
Set 1	0						0
Set 2	1	0	LRU	000	7	X	1	1		100	7	17
Set 3	0						0

After instruction 4: ST R2 → M[5]

	Way 0						Way 1
	V	D	lru	Tag	OB0	OB1	V	D	lru	Tag	OB0	OB1
Set 0	0						0
Set 1	0						0
Set 2	1	1		000	7	17	1	1	LRU	100	7	17
Set 3	0						0

After instruction 5: LD R1 ← M[15]

	Way 0						Way 1
	V	D	lru	Tag	OB0	OB1	V	D	lru	Tag	OB0	OB1
Set 0	0						0
Set 1	0						0
Set 2	1	1		000	7	17	1	1	LRU	100	7	17
Set 3	1	0		001	11	13	0

After instruction 6: LD R2 ← M[14]

	Way 0						Way 1
	V	D	lru	Tag	OB0	OB1	V	D	lru	Tag	OB0	OB1
Set 0	0						0
Set 1	0						0
Set 2	1	1		000	7	17	1	1	LRU	100	7	17
Set 3	1	0		001	11	13	0

After instruction 7: LD R1 ← M[45]

	Way 0						Way 1
	V	D	lru	Tag	OB0	OB1	V	D	lru	Tag	OB0	OB1
Set 0	0						0
Set 1	0						0
Set 2	1	1	LRU	000	7	17	1	0		101	X	19
Set 3	1	0		001	11	13	0

After instruction 8: ST R2 → M[44]

	Way 0						Way 1
	V	D	lru	Tag	OB0	OB1	V	D	lru	Tag	OB0	OB1
Set 0	0						0
Set 1	0						0
Set 2	1	1	LRU	000	7	17	1	1		101	11	19
Set 3	1	0		001	11	13	0

Part (b):

In total how many bytes are written to memory for executing instruction 1 to 8 (including instruction 8) ? How many more bytes will have to be written to memory after HALT is executed? [2 points]

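------------In total, 2 bytes are written to memory while executing instructions 1 to 8: instruction 7 evicts the dirty block with tag 100 from set 2, and its 2 bytes are written back.

After HALT is executed, 2 x 2 = 4 more bytes will have to be written to memory, because two dirty blocks (tags 000 and 101 in set 2) remain in the cache.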
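The two byte counts can be checked by tracking only the dirty bits. This is a rough sketch, assuming LRU replacement and that a whole 2-byte block is written back whenever a dirty block is evicted or flushed; data values are not modeled, and the name `writeback_bytes` is hypothetical.

```python
def writeback_bytes(trace, num_sets=4, ways=2, block_bytes=2):
    """Return (bytes written back during the trace, dirty bytes left at the end)."""
    sets = [[] for _ in range(num_sets)]     # per set: [tag, dirty], LRU first
    written = 0
    for op, addr in trace:
        block = addr // block_bytes
        index, tag = block % num_sets, block // num_sets
        lines = sets[index]
        line = next((l for l in lines if l[0] == tag), None)
        if line is not None:
            lines.remove(line)               # hit: re-insert as MRU below
        else:
            if len(lines) == ways:
                victim = lines.pop(0)        # evict the LRU block
                written += block_bytes * victim[1]   # only dirty victims cost a write
            line = [tag, 0]                  # write-allocate: fetch a clean block
        if op == "ST":
            line[1] = 1
        lines.append(line)
    pending = sum(block_bytes * l[1] for s in sets for l in s)
    return written, pending


trace = [("LD", 4), ("LD", 37), ("ST", 36), ("ST", 5),
         ("LD", 15), ("LD", 14), ("LD", 45), ("ST", 44)]
during, after_halt = writeback_bytes(trace)
print("written during instructions 1-8:", during)
print("still to be written after HALT: ", after_halt)
```

With a write-back policy the stores themselves never reach memory; traffic is generated only when a dirty block leaves the cache.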

Problem 6 (8 points)

A certain workload having the following instruction mix is run on two processor designs with both having I-Cache and D-Cache.

ADD 10%

NAND 20%

BEQ 25%

SW 15%

LW 30%

Additionally, it is known that I-Cache Hit-rate is 90%, D-Cache Hit-rate is 98%, 45% branches are not taken and 25% of LW instructions are followed by a dependent instruction. The memory takes 75 nano-seconds to access.

a) Assuming the above code is run on a standard LC-2K 5-stage pipeline design processor with forwarding and with branches predicted not taken and clocked at 200MHz, what is the CPI? Show your work. [3 points]

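Clock period: 1 / 200 MHz = 5 ns

Cache miss penalty: 75 ns / 5 ns = 15 cycles

Branch misprediction penalty: 3 cycles; with predict-not-taken, the 55% of branches that are taken are mispredicted

CPI = 1 + 1*0.10*15 (I-Cache misses) + (0.3+0.15)*0.02*15 (D-Cache misses) + 0.3*0.25*1 (load-use stalls) + 0.25*0.55*3 (taken branches) = 1 + 1.5 + 0.135 + 0.075 + 0.4125 = 3.1225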

b) This five stage pipeline is extended to a similar 15 stage pipeline with no additional hazards being introduced. The amount of stall cycles needed for a lw followed by a dependent instruction does not change. The new frequency is 400MHz. Now the same code is run on the 15 stage pipeline where branches are resolved in the 11th stage.

I. What is the new CPI? Show your work. [4 pts]

II. Does this new design result in better performance for this workload? [1 pt]

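I:

Branch misprediction penalty: 11 - 1 = 10 cycles

Clock period: 1 / 400 MHz = 2.5 ns

Cache miss penalty: 75 ns / 2.5 ns = 30 cycles

CPI = 1 + 1*0.1*30 + (0.3+0.15)*0.02*30 + 0.3*0.25*1 + 0.25*0.55*10 = 1 + 3 + 0.27 + 0.075 + 1.375 = 5.72

II:

Average time per instruction = clock period * CPI:

(a): 5 ns * 3.1225 = 15.6125 ns

(b): 2.5 ns * 5.72 = 14.3 ns

Yes, the new design results in better performance for this workload: although its CPI is higher, the shorter clock period gives a lower average time per instruction (14.3 ns versus 15.6125 ns).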
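Both CPI figures come from the same stall model, so they can be recomputed with one function. This is only a sketch of the arithmetic above: the function name `cpi` and its parameters are illustrative, and the workload percentages are hard-coded from the problem statement.

```python
def cpi(clock_mhz, branch_penalty, mem_ns=75.0):
    """Return (CPI, clock period in ns) for the workload of this problem."""
    cycle_ns = 1000.0 / clock_mhz
    miss_penalty = mem_ns / cycle_ns                 # cycles lost per cache miss
    icache = 1.0 * 0.10 * miss_penalty               # every instruction is fetched
    dcache = (0.30 + 0.15) * 0.02 * miss_penalty     # only LW and SW access data
    load_use = 0.30 * 0.25 * 1                       # one stall per dependent LW
    branch = 0.25 * 0.55 * branch_penalty            # taken branches are mispredicted
    return 1 + icache + dcache + load_use + branch, cycle_ns


for name, mhz, penalty in [("5-stage, 200 MHz", 200, 3),
                           ("15-stage, 400 MHz", 400, 10)]:
    value, period = cpi(mhz, penalty)
    print(f"{name}: CPI = {value:.4f}, time per instruction = {value * period:.4f} ns")
```

The comparison has to be made on time per instruction rather than CPI alone, since doubling the clock frequency also doubles the miss penalty measured in cycles.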

Posted 2017-04-03 16:50 by nanashi
