【MIT CS6.828】Lab 1: Booting a PC - Part 3: The Kernel

Part 3: The Kernel

1. Mapping between physical and virtual addresses

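As mentioned in 5.3, the kernel image is loaded at physical address 0x100000, which is mapped to virtual address 0xF0100000. In fact, in Lab 1 JOS maps the first 4MB of physical memory in the same way, using the hand-written, statically initialized page directory and page table in kern/entrypgdir.c. Lab 2 goes into the details; for now the question is when the kernel turns this mapping on.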

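kern/entry.S turns the mapping on by setting the PG bit in the CPU control register CR0 (recall that the PE bit, which switches from real mode to protected mode, is also in CR0).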

# Turn on paging.
movl	%cr0, %eax
orl	$(CR0_PE|CR0_PG|CR0_WP), %eax
movl	%eax, %cr0

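Once paging is on, virtual addresses 0xf0000000 through 0xf0400000 translate to physical addresses 0x00000000 through 0x00400000 (the first 4MB), and virtual addresses 0x00000000 through 0x00400000 translate to the same physical range.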
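The two ranges above can be modeled with a few lines of C. This is only a sketch of the arithmetic, not how the MMU or kern/entrypgdir.c works internally; the function name `entry_translate` and the constant `MAPPED_SZ` are made up for illustration, and `KERNBASE` is assumed to be 0xF0000000 as the addresses in the text imply.

```c
#include <assert.h>
#include <stdint.h>

#define KERNBASE   0xF0000000u  /* assumed virtual base of the high mapping */
#define MAPPED_SZ  0x00400000u  /* 4MB covered by the Lab 1 page table */

/* Hypothetical helper: returns 1 and stores the physical address in *pa
 * if va falls in one of the two mapped ranges, otherwise returns 0. */
int entry_translate(uint32_t va, uint32_t *pa)
{
    assert(pa != 0);
    if (va < MAPPED_SZ) {                                /* low range [0, 4MB) */
        *pa = va;
        return 1;
    }
    if (va >= KERNBASE && va - KERNBASE < MAPPED_SZ) {   /* high range */
        *pa = va - KERNBASE;
        return 1;
    }
    return 0;                                            /* not mapped in Lab 1 */
}
```

With this model, both 0x00100000 and 0xf0100000 resolve to physical 0x00100000, which is why the two `x/4b` dumps in Exercise 7 match once paging is enabled.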

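Exercise 7. Use QEMU and GDB to trace into the JOS kernel and stop at movl %eax, %cr0. Examine memory at 0x00100000 and at 0xf0100000. Then single-step over that instruction with stepi and examine memory at 0x00100000 and 0xf0100000 again.

What is the first instruction after the new mapping is established that would fail to work properly if the mapping were not in place? Comment out the movl %eax, %cr0 line in kern/entry.S and trace with GDB again to check.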

(gdb) b *0x100025
(gdb) c
(gdb) x/4b 0x00100000
0x100000:	0x02	0xb0	0xad	0x1b
(gdb) x/4b 0xf0100000
0xf0100000 <_start-268435468>:	0x00	0x00	0x00	0x00
(gdb) stepi
=> 0x100028:	mov    $0xf010002f,%eax
0x00100028 in ?? ()
(gdb) x/4b 0x00100000
0x100000:	0x02	0xb0	0xad	0x1b
(gdb) x/4b 0xf0100000
0xf0100000 <_start-268435468>:	0x02	0xb0	0xad	0x1b

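Before the mapping is enabled, the contents at 0x00100000 and 0xf0100000 differ; afterwards they are identical.

After commenting out movl %eax, %cr0 in kern/entry.S (use # rather than ; for the comment), running the kernel makes QEMU report:

qemu: fatal: Trying to execute code outside RAM or ROM at 0xf010002c

This shows that 0xf010002c is outside the physical address range: with paging off, the address is used directly as a physical address. The failure occurs when the kernel executes jmp *%eax at f010002a, whose target is that high address.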

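2. Formatted printing to the console

With all of the initialization above done, JOS can finally do something simple, such as printing text to the console.

Functions like printf do not exist by default; they ultimately call routines provided by the OS. Here we have to implement those routines ourselves.

In JOS the relevant files are kern/printf.c, lib/printfmt.c, and kern/console.c. Rather than explaining them line by line, we work through Exercise 8 and build understanding along the way.

Exercise 8.

Find the omitted code fragment in these files that prints octal numbers with %o, and fill it in.

It is at line 207 of printfmt.c. The neighboring code for "%d" and "%x" is a good reference. The completed code: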
```
// (unsigned) octal
case 'o':
    // filled in
    num = getuint(&ap, lflag);
    base = 8;
    goto number;
    break;
```
make grade checks the code automatically; the printf item showing OK means it passes.
```
running JOS: (0.5s) 
  printf: OK 
```
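Explain the interface between printf.c and console.c. Specifically, what function does console.c export? How is this function used by printf.c?

putch() in printf.c calls cputchar(), which is defined in console.c. cputchar() in turn just calls cons_putc(), which reads and writes the I/O ports of the relevant hardware (via inline assembly) to output one character to the console.

Explain the following from console.c:

(Note: CGA is hardware emulated by QEMU, an old Color Graphics Adapter card from the 80386 era; CRT refers to the QEMU-emulated CRT display, the cathode-ray tube used before LCDs became common.)

The code implements scrolling: when the text cursor moves past the end of the screen, all text scrolls up one line and the cursor moves to the start of the new blank line at the bottom.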

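// What is the purpose of this?
if (crt_pos >= CRT_SIZE) { // the cursor has moved past the end of the screen
    // crt_pos is the current cursor position
    // CRT_SIZE is the size of the CRT display in character cells, 25*80
    int i;
    // crt_buf is the array holding the characters currently on screen
    memmove(crt_buf, crt_buf + CRT_COLS, (CRT_SIZE - CRT_COLS) * sizeof(uint16_t));
    // memmove works like memcpy but also handles overlapping regions, as is the case here
    // every row from the second one onward moves up one row (the first row leaves the screen)
    for (i = CRT_SIZE - CRT_COLS; i < CRT_SIZE; i++)
        crt_buf[i] = 0x0700 | ' ';
    // overwrite the duplicated last row with spaces to get a new blank row
    crt_pos -= CRT_COLS;
    // move the cursor back by one row so it lands on the new blank last row
}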
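The scrolling logic can be tried outside the kernel on a tiny buffer. The following is a sketch that mirrors the snippet above under made-up dimensions (3 rows by 4 columns instead of 25 by 80); the names `ROWS`, `COLS`, `SIZE`, and `scroll_if_needed` are hypothetical and not part of JOS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROWS 3                 /* tiny screen for illustration; JOS uses 25 */
#define COLS 4                 /* JOS uses 80 */
#define SIZE (ROWS * COLS)

/* Scroll the text buffer up one row if pos has run off the end.
 * Returns the (possibly adjusted) cursor position. */
int scroll_if_needed(uint16_t *buf, int pos)
{
    assert(buf != NULL && pos >= 0);
    if (pos >= SIZE) {
        /* Source and destination overlap, so memmove is needed, not memcpy. */
        memmove(buf, buf + COLS, (SIZE - COLS) * sizeof(uint16_t));
        for (int i = SIZE - COLS; i < SIZE; i++)
            buf[i] = 0x0700 | ' ';   /* blank cell with attribute byte 0x07 */
        pos -= COLS;                 /* cursor moves up one row */
    }
    return pos;
}
```

If the buffer holds the letters a through l and the cursor sits at `SIZE`, one call leaves e through l in the first two rows, blanks in the last row, and returns `SIZE - COLS`.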

For the following questions you might wish to consult the notes for Lecture 2. These notes cover GCC's calling convention on the x86.

Trace the execution of the following code step-by-step:

int x = 1, y = 3, z = 4;
cprintf("x %d, y %x, z %d\n", x, y, z);

To trace this code in JOS with GDB, copy it into monitor() in kern/monitor.c, for example:

void
monitor(struct Trapframe *tf)
{
    ...
	cprintf("Welcome to the JOS kernel monitor!\n");
	cprintf("Type 'help' for a list of commands.\n");

	// code to trace, inserted here
	int x = 1, y = 3, z = 4; 
	cprintf("x %d, y %x, z %d\n", x, y, z);
    ...
}

Then run make. obj/kern/kernel.asm shows that this code is at address 0xf01008ea; set a GDB breakpoint there and debug.

In the call to cprintf(), to what does fmt point? To what does ap point?

First, look at the definition of cprintf():

int
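cprintf(const char *fmt, ...) // variadic function in C
{
	va_list ap;
	int cnt;

	va_start(ap, fmt); // initialize ap to refer to the first unnamed argument after fmt
	cnt = vcprintf(fmt, ap); // format and print, taking one argument from ap for each %d, %x, ...
	va_end(ap); // done with the variadic arguments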

	return cnt;
}

fmt points to the format string "x %d, y %x, z %d\n", and ap points to the first of the variadic arguments (here x, followed by y and z).
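The same va_start / va_arg / va_end pattern can be seen in a small standalone function. This is a minimal sketch using the standard `<stdarg.h>` macros; `sum_ints` is a made-up helper, and passing the count explicitly is an assumption of the example (cprintf learns the count from the format string instead).

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: sums the `count` int arguments that follow `count`. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);            /* ap now refers to the first unnamed argument */
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch one int and advance ap */
    va_end(ap);

    return total;
}
```

Calling `sum_ints(3, 1, 3, 4)` walks over the same three values as the cprintf call being traced and returns their sum.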

List (in order of execution) each call to cons_putc, va_arg, and vcprintf. For cons_putc, list its argument as well. For va_arg, list what ap points to before and after the call. For vcprintf list the values of its two arguments.

When stepping into a function, GDB shows its arguments, so it is enough to set breakpoints at the call sites:

(gdb) b *0xf01008ea # the code being traced
(gdb) b *0xf0100a8d # cprintf
(gdb) b *0xf0100a62 # vcprintf
(gdb) b *0xf010037b # cons_putc
(gdb) b *0xf01011b2 # va_arg

The following is the GDB output, trimmed down:

cprintf (fmt=0xf0101da8 "x %d, y %x, z %d\n") 
vcprintf (fmt=0xf0101da8 "x %d, y %x, z %d\n", ap=0xf010ef54 "\001")
cons_putc (c=120) at kern/console.c:70 # x
cons_putc (c=32) at kern/console.c:70  # space

Hardware watchpoint 2: ap
Old value = (va_list) 0xf010ef54 "\001"
New value = (va_list) 0xf010ef58 "\003"

cons_putc (c=49) at kern/console.c:70  # 1
cons_putc (c=44) at kern/console.c:70  # ,
cons_putc (c=32) at kern/console.c:70  # space
cons_putc (c=121) at kern/console.c:70 # y
cons_putc (c=32) at kern/console.c:70  # space

Hardware watchpoint 2: ap
Old value = (va_list) 0xf010ef58 "\003"
New value = (va_list) 0xf010ef5c "\004"

cons_putc (c=51) at kern/console.c:70  # 3
cons_putc (c=44) at kern/console.c:70  # ,
cons_putc (c=32) at kern/console.c:70  # space
cons_putc (c=122) at kern/console.c:70 # z
cons_putc (c=32) at kern/console.c:70  # space

Hardware watchpoint 2: ap
Old value = (va_list) 0xf010ef5c "\004"
New value = (va_list) 0xf010ef60 "(\037", <incomplete sequence \360>)

cons_putc (c=52) at kern/console.c:70  # 4

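Full call sequence: cprintf() → vcprintf() → vprintfmt() → getint() or getuint() → va_arg() → printnum() → putch() → cputchar() → cons_putc(). The value returned by va_arg is converted to digits, and each digit is passed to putch.

The arguments to cons_putc are the ASCII codes of the characters in the printed text x 1, y 3, z 4.

First va_arg call: before, ap = 0xf010ef54 ("\001"); after, ap = 0xf010ef58 ("\003").

Second va_arg call: before, ap = 0xf010ef58 ("\003"); after, ap = 0xf010ef5c ("\004").

Third va_arg call: before, ap = 0xf010ef5c ("\004"); after, ap = 0xf010ef60 ("(\037", <incomplete sequence \360>).

The two arguments of vcprintf are fmt=0xf0101da8 "x %d, y %x, z %d\n" and ap=0xf010ef54 "\001".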
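To cross-check the cons_putc arguments in the trace, the character codes of the printed text can be computed directly. This is a small sketch; `char_codes` is a hypothetical helper, not a JOS function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies the character codes of s into out
 * (at most max entries) and returns how many were written. */
size_t char_codes(const char *s, int *out, size_t max)
{
    size_t n = 0;

    assert(s != NULL && out != NULL);
    while (n < max && s[n] != '\0') {
        out[n] = (unsigned char)s[n];   /* the value cons_putc would receive */
        n++;
    }
    return n;
}
```

For the text `x 1, y 3, z 4` the codes begin 120, 32, 49, 44, 32, 121, which matches the order of the c= values in the GDB output above.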

Run the following code.
```
unsigned int i = 0x00646c72;
cprintf("H%x Wo%s", 57616, &i);
```
What is the output? Explain how this output is arrived at in the step-by-step manner of the previous exercise. (An ASCII table is helpful here.)

The output is:
```
He110 World
```
The output is produced as follows:

After the calls cprintf() → vcprintf() → vprintfmt(), vprintfmt runs this code:
```
while ((ch = *(unsigned char *) fmt++) != '%') {
    if (ch == '\0')
        return;
    putch(ch, putdat);
}
```
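That is, it scans the string fmt = "H%x Wo%s" from the start; characters other than % are printed straight to the console, and a % ends the loop so the format specifier can be parsed.

The first format specifier is %x, handled by: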
```
// (unsigned) hexadecimal
case 'x':
    num = getuint(&ap, lflag);
    base = 16;
number:
    printnum(putch, putdat, num, base, width, padc);
    break;
```
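getuint calls va_arg, which returns the value ap currently points to (57616, the second argument of cprintf) and advances ap to the third argument. printnum converts 57616 to hexadecimal, giving e110, and prints it to the console.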
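The conversion step can be sketched as repeated division by the base. The helper below is not JOS's printnum, only an illustration of the same idea; `to_base` and its parameters are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not JOS's printnum): writes num in the given base
 * into out as a NUL-terminated string and returns the number of digits. */
size_t to_base(unsigned long long num, unsigned base, char *out, size_t cap)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[65];                 /* enough for 64 binary digits */
    size_t n = 0;

    assert(base >= 2 && base <= 16);
    do {                          /* digits come out least significant first */
        tmp[n++] = digits[num % base];
        num /= base;
    } while (num != 0);

    assert(n < cap);              /* caller must leave room for digits plus NUL */
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];  /* reverse into most-significant-first order */
    out[n] = '\0';
    return n;
}
```

With base 16, the value 57616 yields the four digits e, 1, 1, 0, which is where the `e110` in `He110` comes from; with base 8 the same routine covers the %o case from Exercise 8.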

The second format specifier is %s, handled by:
```
// string
case 's':
	if ((p = va_arg(ap, char *)) == NULL)
        p = "(null)";
    if (width > 0 && padc != '-')
        for (width -= strnlen(p, precision); width > 0; width--)
            putch(padc, putdat);
    for (; (ch = *p++) != '\0' && (precision < 0 || --precision >= 0); width--)
        if (altflag && (ch < ' ' || ch > '~'))
            putch('?', putdat);
        else
            putch(ch, putdat);
    for (; width > 0; width--)
        putch(' ', putdat);
    break;
```
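(This requires being comfortable with pointers.) To see how the string is produced, start with unsigned int i = 0x00646c72;. It places the bytes 72 6c 64 00 in the four consecutive memory locations starting at &i (x86 is little-endian, so the low byte is at the low address). In the code above, p is the &i that was passed in, and ch = *p++ reads the hex values 72 6c 64 00 one by one. These are the ASCII codes for r, l, d, \0, so the characters rld are printed.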

The output depends on that fact that the x86 is little-endian. If the x86 were instead big-endian what would you set i to in order to yield the same output? Would you need to change 57616 to a different value?

On a big-endian machine, i would have to be 0x726c6400. 57616 needs no change, because it is passed and parsed as a numeric value.
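The two byte orders can be compared without depending on the host CPU by laying the bytes out explicitly. This is a sketch; `store_le32` and `store_be32` are hypothetical helpers written for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v into out[0..3] least-significant byte first (the x86 layout). */
void store_le32(uint32_t v, unsigned char out[4])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(v >> (8 * i));
}

/* Store v into out[0..3] most-significant byte first (a big-endian layout). */
void store_be32(uint32_t v, unsigned char out[4])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(v >> (8 * (3 - i)));
}
```

Both `store_le32(0x00646c72, b)` and `store_be32(0x726c6400, b)` leave the bytes 'r', 'l', 'd', '\0' in `b`, which is the sequence the %s handler reads.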

In the following code, what is going to be printed after y=? (note: the answer is not a specific value.) Why does this happen?

cprintf("x=%d y=%d", 3);

From GDB:

# before va_arg fetches the argument 3, ap points to the argument 3
vprintfmt (putch=0xf0100a2d <putch>, putdat=0xf010ef2c, fmt=0xf0101da8 "x=%d y=%d", ap=0xf010ef64 "\003") at lib/printfmt.c:92
# after va_arg fetches 3, ap has moved from 0xf010ef64, where 3 is stored, to 0xf010ef68; no argument was passed there, so it holds unrelated stack contents
vprintfmt (putch=0xf0100a2d <putch>, putdat=0xf010ef2c, fmt=0xf0101da8 "x=%d y=%d", ap=0xf010ef68 "\230\357\020\360~\n\020\360-\n\020\360\214\357\020\360\234\032\020\360\310\357\020\360\234\032\020\360\244\357\020\360\270\357", <incomplete sequence \360>) at lib/printfmt.c:92

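After reading the argument 3, ap advances 4 bytes toward higher addresses to find the next argument (in stack terms, it moves past the 3 at the top and points at the next word). No argument was stored at that address, so vprintfmt interprets whatever is there as an int. In this run the bytes are 98 ef 10 f0, i.e. 0xf010ef98, which prints as a strange negative number.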

Let's say that GCC changed its calling convention so that it pushed arguments on the stack in declaration order, so that the last argument is pushed last. How would you have to change cprintf or its interface so that it would still be possible to pass it a variable number of arguments?

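Answering this requires understanding GCC's calling convention; a full treatment is too long for this post, so only the relevant part is covered.

With GCC on x86, arguments are pushed right to left: the last declared argument is pushed first. For example: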
```
int main()
{
    fun(1, 2, 3);   // declared as fun(a, b, c)
    return 0;
}
// Addresses of the parameters inside fun:
// &a = 0x0022FF50
// &b = 0x0022FF54
// &c = 0x0022FF58
```
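The stack grows from high addresses toward low addresses, so the first argument a=1 is pushed last; it is at the top of the stack and has the lowest address.

With variadic arguments, ap initially points to 0x0022FF50, the top element 1. Each call to va_arg returns the value ap currently points to and advances ap toward higher addresses, for example to 0x0022FF54. In other words, each va_arg call effectively pops the current top element and moves the top pointer to the next one. The arguments are read in the order 1→2→3.

The question assumes the push order becomes first argument 1, then second argument 2, then third argument 3. That is: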
```
&a = 0x0022FF58 
&b = 0x0022FF54 
&c = 0x0022FF50 
```
ap still points to 0x0022FF50, but the top element is now 3. The arguments are read in the order 3→2→1.
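The effect of the push order on the read order can be reproduced with a toy stack in ordinary C. This is only a model of the reasoning above, not of real GCC code generation; every name in it (`toy_stack`, `toy_push_args`, `toy_read_up`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_WORDS 8

/* A toy downward-growing stack: top is the index of the most recently
 * pushed word, and STACK_WORDS means the stack is empty. */
struct toy_stack {
    int words[STACK_WORDS];
    size_t top;
};

void toy_init(struct toy_stack *s) { s->top = STACK_WORDS; }

void toy_push(struct toy_stack *s, int v)
{
    assert(s->top > 0);
    s->words[--s->top] = v;   /* grows toward lower indices (addresses) */
}

/* Push args[0..n-1]; right_to_left != 0 mimics the order described above. */
void toy_push_args(struct toy_stack *s, const int *args, size_t n, int right_to_left)
{
    for (size_t i = 0; i < n; i++)
        toy_push(s, right_to_left ? args[n - 1 - i] : args[i]);
}

/* Read n words starting at the top and moving toward higher addresses,
 * the direction in which ap advances. */
void toy_read_up(const struct toy_stack *s, int *out, size_t n)
{
    assert(s->top + n <= STACK_WORDS);
    for (size_t i = 0; i < n; i++)
        out[i] = s->words[s->top + i];
}
```

Pushing {1, 2, 3} right to left and reading upward gives 1, 2, 3; pushing the same values left to right gives 3, 2, 1, as in the two address tables above.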

To make cprintf and its helpers still work in this case, they could be changed to:

(TODO)

Challenge: Enhance the console to allow text to be printed in different colors. The traditional way to do this is to make it interpret ANSI escape sequences embedded in the text strings printed to the console, but you may use any mechanism you like.

(TODO)

posted @ 2023-01-29 23:13 StreamAzure
