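Textbook Content Summary

This week I studied Chapter 4 of the textbook.

Chapter 4: Processor Architecture

1 Basics:
(1) Processor: executes a sequence of instructions to carry out the required functions

(2) Instruction set architecture (ISA): the instructions a processor supports and their byte-level encodings

(3) An instruction set places requirements across machine models (different models in the same family stay compatible with each other)

(4) Performance is improved by working on different parts of multiple instructions at the same time (pipelining)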

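2 Defining the Y86 architecture

(1) Process: define the instruction set (the operations and their operand sizes), define the register identifiers, and define the encoding (each instruction is translated into a unique binary encoding according to a fixed format, integers are encoded in little-endian order, and the instruction pointer is updated from the current address and the instruction length), paying attention to the details

(2) Instruction execution stops when an exception occurs (a complete design would invoke an exception handler)

(3) Basics of logic gates, memory, and clocking, together with an introduction to the HCL language, whose case expressions list every candidate condition explicitly, much like a C switch statement except that the conditions are evaluated in order and the first one that holds is selected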
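
As a minimal sketch of the little-endian encoding rule in step (1), the C code below encodes `rmmovl %esp,0x12345(%edx)` into bytes. The opcode byte 0x40, the register IDs (%esp = 4, %edx = 2), and the helper names `put_le32` and `encode_rmmovl` are assumptions based on the 32-bit Y86 encoding; they do not come from the notes above.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value least-significant byte first (little-endian). */
static void put_le32(uint8_t *buf, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        buf[i] = (uint8_t)(v >> (8 * i));
}

/* Hypothetical helper: encode rmmovl rA, D(rB) and return its length in bytes.
 * The layout (opcode byte, register byte, 4-byte displacement) is assumed. */
static size_t encode_rmmovl(uint8_t *buf, unsigned rA, unsigned rB, uint32_t disp)
{
    buf[0] = 0x40;                                      /* icode 4, ifun 0 */
    buf[1] = (uint8_t)(((rA & 0xF) << 4) | (rB & 0xF)); /* rA:rB specifier */
    put_le32(buf + 2, disp);                            /* displacement    */
    return 6;
}
```

Calling `encode_rmmovl(code, 4, 2, 0x12345)` fills `code` with `40 42 45 23 01 00`: the displacement 0x00012345 appears with its bytes reversed. The returned length (6) is what the fetch logic would add to the current address to find the next instruction.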
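3 Defining the stages of instruction execution (the same as in the computer architecture textbook)

(1) Fetch, decode, execute, memory access, write back, PC update

(2) Trace each instruction through the stages and name the values that are saved and the values each block produces (such as valC), so that the hardware structure can later be described using these names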
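
To make the stage names concrete, here is a rough sketch in C that traces a single `irmovl` instruction through the six stages. The `Machine` struct, the function names, and the byte layout (opcode byte 0x30, a register byte whose low nibble is rB, then a 4-byte little-endian constant) are assumptions made for illustration, not the textbook's simulator.

```c
#include <stdint.h>

/* Minimal machine state for the sketch: 8 registers, a PC, and byte memory. */
typedef struct {
    uint32_t reg[8];
    uint32_t pc;
    uint8_t  mem[64];
} Machine;

/* Read a 32-bit little-endian word from memory. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Execute one irmovl instruction stage by stage; returns 0 on success. */
static int step_irmovl(Machine *m)
{
    if (m->pc + 6 > sizeof m->mem)
        return -1;                      /* instruction would run off memory */

    /* Fetch: read icode, the register byte, the constant valC; compute valP. */
    uint8_t  icode = m->mem[m->pc] >> 4;
    uint8_t  rB    = m->mem[m->pc + 1] & 0xF;
    uint32_t valC  = read_le32(&m->mem[m->pc + 2]);
    uint32_t valP  = m->pc + 6;
    if (icode != 0x3 || rB >= 8)
        return -1;                      /* not an irmovl this sketch handles */

    /* Decode: irmovl reads no source registers. */
    /* Execute: the ALU computes valE = 0 + valC. */
    uint32_t valE = 0 + valC;
    /* Memory: irmovl does not access data memory. */
    /* Write back: R[rB] = valE. */
    m->reg[rB] = valE;
    /* PC update: PC = valP. */
    m->pc = valP;
    return 0;
}
```

With the bytes `30 F2 04 00 00 00` at address 0 (assumed here to mean `irmovl $4,%edx`), one call leaves 4 in register 2 and moves the PC to 6.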

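4 Hardware structure

SEQ: fully sequential execution. It uses few hardware resources, but the signals must propagate through every stage within one clock cycle, so the clock has to run slowly;

SEQ+: compared with SEQ, the PC is computed at the start of the clock cycle, so the address of the next instruction is determined earlier

PIPE-: pipeline registers are inserted between the stages of SEQ+ and the signals are rearranged; data hazards are handled by stalling

PIPE: building on PIPE-, data hazards can be handled by forwarding, which avoids most stalls and raises throughput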

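5 Pipelining

(1) Controlled by the clock signal, pipelining lets different stages of several instructions execute at the same time; it slightly increases the latency of each instruction but raises throughput

(2) Problems: stages with unequal delays limit the clock rate, and in a very deep pipeline the delay of the inserted pipeline registers eats into the performance gain

(3) Pipelines with feedback (that is, instructions may need to repeat an operation, or use a value or access a memory location or register that another instruction is still updating) give rise to pipeline hazards (data and control)

(4) Solutions to data hazards: stalling (the hardware checks whether a hazard would occur and, if so, inserts a bubble to delay the following instruction), forwarding (a result is passed to where it is needed as soon as it is produced, avoiding the stall), and load/use handling (when the value comes from a memory read that happens late in the pipeline, stalling and forwarding must be combined)

(5) Exception handling

(6) Control logic and mechanisms

Handling ret: stall the pipeline until the ret instruction reaches the write-back stage
Load/use hazard: one stall cycle is needed between the instruction that reads memory and the instruction that uses the value
Mispredicted branch: when a prediction turns out wrong, the instructions fetched along the wrong path are cancelled and fetching resumes at the correct target
Exception: when an exception occurs, later instructions are prevented from updating programmer-visible state, and execution halts once the excepting instruction reaches write-back.
(7) Performance analysis
CPI = 1 + penalty terms (misprediction + return + load/use stall)
Goal: CPI = 1, that is, one instruction completed per cycle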
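
The CPI formula in (7) can be worked through with a small sketch. Each penalty term is taken here as (instruction frequency) x (condition frequency) x (bubbles injected); the `Penalty` struct, the `cpi` function, and the frequencies used below are illustrative assumptions, not measurements from this week's lab.

```c
/* One penalty source: how often the instruction occurs, how often the
 * penalizing condition holds for it, and how many bubbles it injects. */
typedef struct {
    double instr_freq;
    double cond_freq;
    int    bubbles;
} Penalty;

/* CPI = 1.0 + sum of (instruction frequency * condition frequency * bubbles). */
static double cpi(const Penalty *p, int n)
{
    double total = 1.0;
    for (int i = 0; i < n; i++)
        total += p[i].instr_freq * p[i].cond_freq * p[i].bubbles;
    return total;
}
```

For example, assuming loads are 25% of instructions and 20% of them cause a load/use stall (1 bubble), conditional jumps are 20% and 40% are mispredicted (2 bubbles), and returns are 2% and always stall (3 bubbles), the penalties are 0.05 + 0.16 + 0.06 = 0.27, giving a CPI of about 1.27. Under these assumptions, mispredicted branches are the largest term, so better branch prediction would help the most.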

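My understanding of pipelining:

(1)

The pipelining material mainly covers the concept and classification of pipelines. Pipelines are divided into linear and nonlinear ones according to whether they contain feedback loops. Scheduling a nonlinear pipeline essentially means finding a way to run it without conflicts; the textbook uses reservation tables and collision vectors, together with throughput considerations, to work out the best schedule;

(2)
Reading the textbook again, the whole process is easy to follow and clear: executing instructions with pipelining means, overall, getting more work done effectively in a fixed amount of time.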
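
The reservation-table method mentioned in (1) can be sketched as follows. The code derives a collision vector from a reservation table: latency d is forbidden if some stage is used at two time slots that are d apart. The table size (3 stages, 8 time slots) and the function name are assumptions chosen for illustration.

```c
#include <stdint.h>

#define STAGES 3
#define SLOTS  8

/* Bit (d-1) of the result is 1 when starting a new task d cycles after the
 * previous one would make two tasks need the same stage in the same cycle. */
static uint32_t collision_vector(const uint8_t table[STAGES][SLOTS])
{
    uint32_t cv = 0;
    for (int s = 0; s < STAGES; s++)
        for (int t1 = 0; t1 < SLOTS; t1++)
            for (int t2 = t1 + 1; t2 < SLOTS; t2++)
                if (table[s][t1] && table[s][t2])
                    cv |= 1u << (t2 - t1 - 1);   /* latency t2 - t1 is forbidden */
    return cv;
}
```

For a table whose stage 1 is used in slots 0, 5, 7, stage 2 in slots 1, 3, and stage 3 in slots 2, 4, 6, the forbidden latencies are 2, 4, 5, and 7, so the function returns 0x5A (binary 1011010, read from latency 7 down to latency 1). A scheduler would then pick initiation latencies only from the zero bits.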

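Problems encountered in the textbook and their solutions

1. Understanding Y86-64 programs and X86-64 programs

Y86 instruction set

Y86 program

The difference between Y86 and X86 is that Y86 sometimes needs two instructions to do what X86 does with one.

For example, for an X86 instruction such as addl $4,%ecx, the Y86 addl instruction cannot take an immediate operand, so Y86 must first load the immediate into a register with irmovl and then use addl to perform the addition.

Overall, Y86 is a reduced version of X86. Its purpose is to implement a processor with a simple structure and help us understand how processors are designed and implemented.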

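2. What pipeline hazards are

With pipelining, dependencies between adjacent instructions can cause problems.
These dependencies are:

1. Data dependency: the next instruction uses the result computed by the current instruction

2. Control dependency: one instruction determines the location of the next instruction, for example when executing a jump, call, or return.

These dependencies may lead the pipeline to compute incorrect results; such situations are called hazards.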
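
A data dependency between adjacent instructions can be checked mechanically. The sketch below is one possible way to express it in C; the `Instr` struct, the `REG_NONE` value, and both function names are hypothetical and only illustrate the idea, including the load/use case where forwarding alone is not enough.

```c
#include <stdbool.h>

#define REG_NONE 0xF   /* assumed ID meaning "no register" */

/* Hypothetical summary of an instruction, just enough for hazard checks. */
typedef struct {
    bool     is_load;  /* reads memory into dst (for example mrmovl or popl) */
    unsigned dst;      /* destination register, or REG_NONE */
    unsigned srcA;     /* source registers, or REG_NONE */
    unsigned srcB;
} Instr;

/* Data dependency: next reads a register that prev writes. */
static bool data_dependent(const Instr *prev, const Instr *next)
{
    if (prev->dst == REG_NONE)
        return false;
    return next->srcA == prev->dst || next->srcB == prev->dst;
}

/* Load/use hazard: the value comes from memory, so it is available too late
 * to forward to the very next instruction, which must stall one cycle. */
static bool load_use_hazard(const Instr *prev, const Instr *next)
{
    return prev->is_load && data_dependent(prev, next);
}
```

For instance, a load into register 0 followed by an add that reads register 0 is both a data dependency and a load/use hazard, while an irmovl into register 1 followed by the same add is neither.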

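Shiyanlou lab

Building YIS on Shiyanlou
Create two new folders, enter them, and run the commands

wget http://labfile.oss.aliyuncs.com/courses/413/sim.tar
tar -xvf sim.tar

On success, wget reports that the file has been saved


Then just enter the commands: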

Code debugging problems and solutions

1. A study of the Chinese national cryptographic algorithms

SM4

/*
 * SM4 Encryption algorithm (SMS4 algorithm)
 * GM/T 0002-2012 Chinese National Standard ref:http://www.oscca.gov.cn/ 
 * thanks to Xyssl
 * thanks and refers to http://hi.baidu.com/numax/blog/item/80addfefddfb93e4cf1b3e61.html
 * author:goldboar
 * email:goldboar@163.com
 * 2012-4-20
 */

// Test vector 1
// plain: 01 23 45 67 89 ab cd ef fe dc ba 98 76 54 32 10
// key:   01 23 45 67 89 ab cd ef fe dc ba 98 76 54 32 10
// 	   round key and temp computing result:
// 	   rk[ 0] = f12186f9 X[ 0] = 27fad345
// 		   rk[ 1] = 41662b61 X[ 1] = a18b4cb2
// 		   rk[ 2] = 5a6ab19a X[ 2] = 11c1e22a
// 		   rk[ 3] = 7ba92077 X[ 3] = cc13e2ee
// 		   rk[ 4] = 367360f4 X[ 4] = f87c5bd5
// 		   rk[ 5] = 776a0c61 X[ 5] = 33220757
// 		   rk[ 6] = b6bb89b3 X[ 6] = 77f4c297
// 		   rk[ 7] = 24763151 X[ 7] = 7a96f2eb
// 		   rk[ 8] = a520307c X[ 8] = 27dac07f
// 		   rk[ 9] = b7584dbd X[ 9] = 42dd0f19
// 		   rk[10] = c30753ed X[10] = b8a5da02
// 		   rk[11] = 7ee55b57 X[11] = 907127fa
// 		   rk[12] = 6988608c X[12] = 8b952b83
// 		   rk[13] = 30d895b7 X[13] = d42b7c59
// 		   rk[14] = 44ba14af X[14] = 2ffc5831
// 		   rk[15] = 104495a1 X[15] = f69e6888
// 		   rk[16] = d120b428 X[16] = af2432c4
// 		   rk[17] = 73b55fa3 X[17] = ed1ec85e
// 		   rk[18] = cc874966 X[18] = 55a3ba22
// 		   rk[19] = 92244439 X[19] = 124b18aa
// 		   rk[20] = e89e641f X[20] = 6ae7725f
// 		   rk[21] = 98ca015a X[21] = f4cba1f9
// 		   rk[22] = c7159060 X[22] = 1dcdfa10
// 		   rk[23] = 99e1fd2e X[23] = 2ff60603
// 		   rk[24] = b79bd80c X[24] = eff24fdc
// 		   rk[25] = 1d2115b0 X[25] = 6fe46b75
// 		   rk[26] = 0e228aeb X[26] = 893450ad
// 		   rk[27] = f1780c81 X[27] = 7b938f4c
// 		   rk[28] = 428d3654 X[28] = 536e4246
// 		   rk[29] = 62293496 X[29] = 86b3e94f
// 		   rk[30] = 01cf72e5 X[30] = d206965e
// 		   rk[31] = 9124a012 X[31] = 681edf34
// cypher: 68 1e df 34 d2 06 96 5e 86 b3 e9 4f 53 6e 42 46
// 		
// test vector 2
// the same key and plain, encrypted 1000000 times
// plain:  01 23 45 67 89 ab cd ef fe dc ba 98 76 54 32 10
// key:    01 23 45 67 89 ab cd ef fe dc ba 98 76 54 32 10
// cypher: 59 52 98 c7 c6 fd 27 1f 04 02 f8 04 c3 3d 3f 66

#include "sm4.h"
#include <string.h>
#include <stdio.h>

/*
 * 32-bit integer manipulation macros (big endian)
 */
#ifndef GET_ULONG_BE
#define GET_ULONG_BE(n,b,i)                             \
{                                                       \
    (n) = ( (unsigned long) (b)[(i)    ] << 24 )        \
        | ( (unsigned long) (b)[(i) + 1] << 16 )        \
        | ( (unsigned long) (b)[(i) + 2] <<  8 )        \
        | ( (unsigned long) (b)[(i) + 3]       );       \
}
#endif

#ifndef PUT_ULONG_BE
#define PUT_ULONG_BE(n,b,i)                             \
{                                                       \
    (b)[(i)    ] = (unsigned char) ( (n) >> 24 );       \
    (b)[(i) + 1] = (unsigned char) ( (n) >> 16 );       \
    (b)[(i) + 2] = (unsigned char) ( (n) >>  8 );       \
    (b)[(i) + 3] = (unsigned char) ( (n)       );       \
}
#endif

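/*
 * rotate shift left macro definitions
 * (results are masked to 32 bits so they stay correct when unsigned long is 64 bits wide)
 */
#define  SHL(x,n) ((((x) & 0xFFFFFFFF) << (n)) & 0xFFFFFFFF)
#define ROTL(x,n) (SHL((x),(n)) | (((x) & 0xFFFFFFFF) >> (32 - (n))))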

#define SWAP(a,b) { unsigned long t = a; a = b; b = t; t = 0; }

/*
 * Expanded SM4 S-boxes
 * Sbox table: 8 bits input converted to 8 bits output
 */
 
static const unsigned char SboxTable[16][16] = 
{
{0xd6,0x90,0xe9,0xfe,0xcc,0xe1,0x3d,0xb7,0x16,0xb6,0x14,0xc2,0x28,0xfb,0x2c,0x05},
{0x2b,0x67,0x9a,0x76,0x2a,0xbe,0x04,0xc3,0xaa,0x44,0x13,0x26,0x49,0x86,0x06,0x99},
{0x9c,0x42,0x50,0xf4,0x91,0xef,0x98,0x7a,0x33,0x54,0x0b,0x43,0xed,0xcf,0xac,0x62},
{0xe4,0xb3,0x1c,0xa9,0xc9,0x08,0xe8,0x95,0x80,0xdf,0x94,0xfa,0x75,0x8f,0x3f,0xa6},
{0x47,0x07,0xa7,0xfc,0xf3,0x73,0x17,0xba,0x83,0x59,0x3c,0x19,0xe6,0x85,0x4f,0xa8},
{0x68,0x6b,0x81,0xb2,0x71,0x64,0xda,0x8b,0xf8,0xeb,0x0f,0x4b,0x70,0x56,0x9d,0x35},
{0x1e,0x24,0x0e,0x5e,0x63,0x58,0xd1,0xa2,0x25,0x22,0x7c,0x3b,0x01,0x21,0x78,0x87},
{0xd4,0x00,0x46,0x57,0x9f,0xd3,0x27,0x52,0x4c,0x36,0x02,0xe7,0xa0,0xc4,0xc8,0x9e},
{0xea,0xbf,0x8a,0xd2,0x40,0xc7,0x38,0xb5,0xa3,0xf7,0xf2,0xce,0xf9,0x61,0x15,0xa1},
{0xe0,0xae,0x5d,0xa4,0x9b,0x34,0x1a,0x55,0xad,0x93,0x32,0x30,0xf5,0x8c,0xb1,0xe3},
{0x1d,0xf6,0xe2,0x2e,0x82,0x66,0xca,0x60,0xc0,0x29,0x23,0xab,0x0d,0x53,0x4e,0x6f},
{0xd5,0xdb,0x37,0x45,0xde,0xfd,0x8e,0x2f,0x03,0xff,0x6a,0x72,0x6d,0x6c,0x5b,0x51},
{0x8d,0x1b,0xaf,0x92,0xbb,0xdd,0xbc,0x7f,0x11,0xd9,0x5c,0x41,0x1f,0x10,0x5a,0xd8},
{0x0a,0xc1,0x31,0x88,0xa5,0xcd,0x7b,0xbd,0x2d,0x74,0xd0,0x12,0xb8,0xe5,0xb4,0xb0},
{0x89,0x69,0x97,0x4a,0x0c,0x96,0x77,0x7e,0x65,0xb9,0xf1,0x09,0xc5,0x6e,0xc6,0x84},
{0x18,0xf0,0x7d,0xec,0x3a,0xdc,0x4d,0x20,0x79,0xee,0x5f,0x3e,0xd7,0xcb,0x39,0x48}
};

/* System parameter */
static const unsigned long FK[4] = {0xa3b1bac6,0x56aa3350,0x677d9197,0xb27022dc};

/* fixed parameter */
static const unsigned long CK[32] =
{
0x00070e15,0x1c232a31,0x383f464d,0x545b6269,
0x70777e85,0x8c939aa1,0xa8afb6bd,0xc4cbd2d9,
0xe0e7eef5,0xfc030a11,0x181f262d,0x343b4249,
0x50575e65,0x6c737a81,0x888f969d,0xa4abb2b9,
0xc0c7ced5,0xdce3eaf1,0xf8ff060d,0x141b2229,
0x30373e45,0x4c535a61,0x686f767d,0x848b9299,
0xa0a7aeb5,0xbcc3cad1,0xd8dfe6ed,0xf4fb0209,
0x10171e25,0x2c333a41,0x484f565d,0x646b7279
};


/*
 * private function:
 * look up in SboxTable and get the related value.
 * args:    [in] inch: 0x00~0xFF (8 bits unsigned value).
 */
static unsigned char sm4Sbox(unsigned char inch)
{
    /* View the 16x16 table as a flat 256-entry array without discarding const. */
    const unsigned char *pTable = (const unsigned char *)SboxTable;
    unsigned char retVal = pTable[inch];
    return retVal;
}

/*
 * private F(Lt) function:
 * "T algorithm" == "L algorithm" + "t algorithm".
 * args:    [in] a: a is a 32 bits unsigned value;
 * return: c: c is calculated with line algorithm "L" and nonline algorithm "t"
 */
static unsigned long sm4Lt(unsigned long ka)
{
    unsigned long bb = 0;
    unsigned long c = 0;
    unsigned char a[4];
	unsigned char b[4];
    PUT_ULONG_BE(ka,a,0)
    b[0] = sm4Sbox(a[0]);
    b[1] = sm4Sbox(a[1]);
    b[2] = sm4Sbox(a[2]);
    b[3] = sm4Sbox(a[3]);
	GET_ULONG_BE(bb,b,0)
    c =bb^(ROTL(bb, 2))^(ROTL(bb, 10))^(ROTL(bb, 18))^(ROTL(bb, 24));
    return c;
}

/*
 * private F function:
 * Calculating and getting encryption/decryption contents.
 * args:    [in] x0: original contents;
 * args:    [in] x1: original contents;
 * args:    [in] x2: original contents;
 * args:    [in] x3: original contents;
 * args:    [in] rk: encryption/decryption key;
 * return the contents of encryption/decryption contents.
 */
static unsigned long sm4F(unsigned long x0, unsigned long x1, unsigned long x2, unsigned long x3, unsigned long rk)
{
    return (x0^sm4Lt(x1^x2^x3^rk));
}


/* private function:
 * Calculating round encryption key.
 * args:    [in] a: a is a 32 bits unsigned value;
 * return: sk[i]: i{0,1,2,3,...31}.
 */
static unsigned long sm4CalciRK(unsigned long ka)
{
    unsigned long bb = 0;
    unsigned long rk = 0;
    unsigned char a[4];
    unsigned char b[4];
    PUT_ULONG_BE(ka,a,0)
    b[0] = sm4Sbox(a[0]);
    b[1] = sm4Sbox(a[1]);
    b[2] = sm4Sbox(a[2]);
    b[3] = sm4Sbox(a[3]);
	GET_ULONG_BE(bb,b,0)
    rk = bb^(ROTL(bb, 13))^(ROTL(bb, 23));
    return rk;
}

static void sm4_setkey( unsigned long SK[32], unsigned char key[16] )
{
    unsigned long MK[4];
    unsigned long k[36];
    unsigned long i = 0;

    GET_ULONG_BE( MK[0], key, 0 );
    GET_ULONG_BE( MK[1], key, 4 );
    GET_ULONG_BE( MK[2], key, 8 );
    GET_ULONG_BE( MK[3], key, 12 );
    k[0] = MK[0]^FK[0];
    k[1] = MK[1]^FK[1];
    k[2] = MK[2]^FK[2];
    k[3] = MK[3]^FK[3];
    for(; i<32; i++)
    {
        k[i+4] = k[i] ^ (sm4CalciRK(k[i+1]^k[i+2]^k[i+3]^CK[i]));
        SK[i] = k[i+4];
	}

}

/*
 * SM4 standard one round processing
 *
 */
static void sm4_one_round( unsigned long sk[32],
                    unsigned char input[16],
                    unsigned char output[16] )
{
    unsigned long i = 0;
    unsigned long ulbuf[36];

    memset(ulbuf, 0, sizeof(ulbuf));
    GET_ULONG_BE( ulbuf[0], input, 0 )
    GET_ULONG_BE( ulbuf[1], input, 4 )
    GET_ULONG_BE( ulbuf[2], input, 8 )
    GET_ULONG_BE( ulbuf[3], input, 12 )
    while(i<32)
    {
        ulbuf[i+4] = sm4F(ulbuf[i], ulbuf[i+1], ulbuf[i+2], ulbuf[i+3], sk[i]);
// #ifdef _DEBUG
//        	printf("rk(%02d) = 0x%08x,  X(%02d) = 0x%08x \n",i,sk[i], i, ulbuf[i+4] );
// #endif
	    i++;
    }
	PUT_ULONG_BE(ulbuf[35],output,0);
	PUT_ULONG_BE(ulbuf[34],output,4);
	PUT_ULONG_BE(ulbuf[33],output,8);
	PUT_ULONG_BE(ulbuf[32],output,12);
}

/*
 * SM4 key schedule (128-bit, encryption)
 */
void sm4_setkey_enc( sm4_context *ctx, unsigned char key[16] )
{
    ctx->mode = SM4_ENCRYPT;
	sm4_setkey( ctx->sk, key );
}

/*
 * SM4 key schedule (128-bit, decryption)
 */
void sm4_setkey_dec( sm4_context *ctx, unsigned char key[16] )
{
    int i;
    ctx->mode = SM4_DECRYPT;
    sm4_setkey( ctx->sk, key );
    for( i = 0; i < 16; i ++ )
    {
        SWAP( ctx->sk[ i ], ctx->sk[ 31-i] );
    }
}


/*
 * SM4-ECB block encryption/decryption
 */

void sm4_crypt_ecb( sm4_context *ctx,
				   int mode,
				   int length,
				   unsigned char *input,
                   unsigned char *output)
{
    while( length > 0 )
    {
        sm4_one_round( ctx->sk, input, output );
        input  += 16;
        output += 16;
        length -= 16;
    }

}

/*
 * SM4-CBC buffer encryption/decryption
 */
void sm4_crypt_cbc( sm4_context *ctx,
                    int mode,
                    int length,
                    unsigned char iv[16],
                    unsigned char *input,
                    unsigned char *output )
{
    int i;
    unsigned char temp[16];

    if( mode == SM4_ENCRYPT )
    {
        while( length > 0 )
        {
            for( i = 0; i < 16; i++ )
                output[i] = (unsigned char)( input[i] ^ iv[i] );

            sm4_one_round( ctx->sk, output, output );
            memcpy( iv, output, 16 );

            input  += 16;
            output += 16;
            length -= 16;
        }
    }
    else /* SM4_DECRYPT */
    {
        while( length > 0 )
        {
            memcpy( temp, input, 16 );
            sm4_one_round( ctx->sk, input, output );

            for( i = 0; i < 16; i++ )
                output[i] = (unsigned char)( output[i] ^ iv[i] );

            memcpy( iv, temp, 16 );

            input  += 16;
            output += 16;
            length -= 16;
        }
    }
}

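1. A problem with the digest

Solution: while establishing a connection, OpenSSL digests the handshake messages and stores the results in the handshake_dgst[] array. Each index in the array corresponds to one digest algorithm, and a digest is computed and stored for every supported algorithm.

2. A problem with big-endian and little-endian order

During signing and verification, the verification never passed, even though none of the intermediate parameters I printed were wrong. The cause finally turned out to be that the signature returned by the CSP interface implemented by the USB key is stored in little-endian order, while the server implements a verification algorithm that expects big-endian storage. After reversing the byte order of r and s separately in the signature (r,s) returned by the CSP interface, verification passed.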
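
The byte-order fix described above can be sketched in a few lines of C. The layout assumed here (the signature buffer holds r followed by s, each `half_len` bytes long) and both function names are hypothetical; the actual CSP interface may return the two values differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse a byte buffer in place, which converts a little-endian big integer
 * to big-endian order (or back again). */
static void reverse_bytes(uint8_t *buf, size_t len)
{
    if (len < 2)
        return;
    for (size_t i = 0, j = len - 1; i < j; i++, j--) {
        uint8_t t = buf[i];
        buf[i] = buf[j];
        buf[j] = t;
    }
}

/* Assumed layout: sig = r || s, each half half_len bytes long.
 * r and s are reversed separately, not as one combined buffer. */
static void signature_to_big_endian(uint8_t *sig, size_t half_len)
{
    reverse_bytes(sig, half_len);             /* r */
    reverse_bytes(sig + half_len, half_len);  /* s */
}
```

Reversing the two halves separately matters: reversing the whole buffer at once would also swap r and s, which would still fail verification.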

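Code hosting

Summary of last week's exam mistakes
Answers have not been published yet

Pairing and peer review

Review template:

Things worth learning from, or problems in, the blog:
xxx
xxx
...
Things worth learning from, or problems in, the code:
xxx
xxx
...

Other

This week's pair study

  • Pair study content
    Chapter 4
    Other (reflections, thoughts, etc., optional)

This week I studied Chapter 4 and downloaded the Y86 simulator. The simulator can even show how the stack changes as a program runs, which is quite amazing. Overall, the material this week was fairly new to me, and my study efficiency was reasonable.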

Learning progress

| | Lines of code (new/cumulative) | Blog posts (new/cumulative) | Study time (new/cumulative) | Key growth |
| -------- | :----------------: | :----------------: | :---------------: | :-----: |
| Goal | 5000 lines | 30 posts | 400 hours | |
| Week 1 | 10/10 | 1/1 | 10/10 | |
| Week 2 | 40/70 | 2/4 | 18/38 | |
| Week 3 | 150/200 | 3/7 | 15/60 | |
| Week 4 | 160/210 | 6/8 | 23/70 | |

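I am trying to record both "planned study time" and "actual study time" to see, by the end of the term, whether I can improve my planning ability. This matters a lot in work and study, and is very useful.
Formula for estimating time spent:
Y = X + X/N, Y = X - X/N; with more practice, X and Y converge.

References: "Why is software estimation so hard" (软件工程软件的估计为什么这么难) and "Software engineering estimation methods" (软件工程 估计方法)

Planned study time: 20 hours

Actual study time: 23 hours

Improvements:

(When you have time, look through the Modern Software Engineering (现代软件工程) course slides
and the software engineer ability self-assessment form)

References
Study guide for Computer Systems: A Programmer's Perspective, 3rd edition (《深入理解计算机系统V3》学习指导)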