Some Thoughts on JOS and Preemptive Kernels

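A few years ago, when I interviewed at my current company, I was asked a classic question:

How does a preemptive kernel work?

At the time my understanding of OS scheduling was shallow, and I had no hands-on experience. I had read a few books on the Linux kernel without really understanding how the software and the hardware behave together.

Drawing only on CS 537 and a bit of an undergraduate OS course, I gave a vague answer built from terms like time slice, scheduler, and priority, and improvised the rest from the flow I imagined.

I still remember 谷雨, clearly unsatisfied with my answer, saying "That's not how it works." We were using an RTOS back then, and understanding process scheduling and kernel preemption matters a lot for that kind of development.

Thankfully, they still made me an offer and fished me out despite my weak grasp of operating systems. But the episode kept nagging at me, so when I later had the chance to work through JOS in 6.828 and build a round-robin scheduler and a preemptive kernel with my own hands, this was the topic I most wanted to understand.

Enough preamble. Here are some important details about preemptive multitasking in JOS, along with my own understanding: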

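JOS can only preempt while the CPU is running in user mode. On entry to the kernel, interrupts are disabled, so no preemption can be triggered there. (This is why the IDT entries must be registered as interrupt gates rather than trap gates.)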

The IF (interrupt-enable flag) controls the acceptance of external interrupts signalled via the INTR pin. When IF=0, INTR interrupts are inhibited; when IF=1, INTR interrupts are enabled. As with the other flag bits, the processor clears IF in response to a RESET signal. The instructions CLI and STI alter the setting of IF.

CLI (Clear Interrupt-Enable Flag) and STI (Set Interrupt-Enable Flag) explicitly alter IF (bit 9 in the flag register). These instructions may be executed only if CPL <= IOPL. A protection exception occurs if they are executed when CPL > IOPL.

The IF is also affected implicitly by the following operations:

Interrupts through interrupt gates automatically reset IF, disabling interrupts. (Interrupt gates are explained later in this chapter.)

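(From the i386 reference manual.)
We register the system call vector (T_SYSCALL, which is vector 48, i.e. int $0x30, in JOS) and all external interrupts through i386 interrupt gates, so the CPU automatically clears IF in EFLAGS whenever a system call or an external interrupt occurs.

Setting up the gates for the system call and for the preemption timer interrupt (the second argument is 0, meaning an interrupt gate):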

SETGATE(idt[T_SYSCALL], 0, GD_KT, t48_entry, 3); 
SETGATE(idt[IRQ_OFFSET + IRQ_TIMER], 0, GD_KT, irq_timer, 0);


// Set up a normal interrupt/trap gate descriptor.
// - istrap: 1 for a trap (= exception) gate, 0 for an interrupt gate.
//     see section 9.6.1.3 of the i386 reference: "The difference between
//     an interrupt gate and a trap gate is in the effect on IF (the
//     interrupt-enable flag). An interrupt that vectors through an
//     interrupt gate resets IF, thereby preventing other interrupts from
//     interfering with the current interrupt handler. A subsequent IRET
//     instruction restores IF to the value in the EFLAGS image on the
//     stack. An interrupt through a trap gate does not change IF."
// - sel: Code segment selector for interrupt/trap handler
// - off: Offset in code segment for interrupt/trap handler
// - dpl: Descriptor Privilege Level -
//     the privilege level required for software to invoke
//     this interrupt/trap gate explicitly using an int instruction.
#define SETGATE(gate, istrap, sel, off, dpl)                \
{                                                           \
    (gate).gd_off_15_0 = (uint32_t) (off) & 0xffff;         \
    (gate).gd_sel = (sel);                                  \
    (gate).gd_args = 0;                                     \
    (gate).gd_rsv1 = 0;                                     \
    (gate).gd_type = (istrap) ? STS_TG32 : STS_IG32;        \
    (gate).gd_s = 0;                                        \
    (gate).gd_dpl = (dpl);                                  \
    (gate).gd_p = 1;                                        \
    (gate).gd_off_31_16 = (uint32_t) (off) >> 16;           \
}
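
To make the interrupt gate / trap gate distinction concrete at the bit level, here is a minimal sketch that packs the same fields SETGATE fills in into a 64-bit descriptor. The type codes 0xE (32-bit interrupt gate) and 0xF (32-bit trap gate) are the architectural values behind STS_IG32 and STS_TG32. The helper names make_gate, gate_type, gate_dpl, and gate_clears_if are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Architectural type codes for 32-bit gates. */
enum { GATE_INTERRUPT32 = 0xE, GATE_TRAP32 = 0xF };

/* Pack a 32-bit gate descriptor into its 64-bit in-memory form.
 * Hypothetical helper that mirrors the fields set by SETGATE. */
static uint64_t make_gate(int istrap, uint16_t sel, uint32_t off, unsigned dpl)
{
    uint32_t type = istrap ? GATE_TRAP32 : GATE_INTERRUPT32;
    uint32_t lo = (off & 0xffffu)          /* bits 0-15: offset 15..0        */
                | ((uint32_t)sel << 16);   /* bits 16-31: segment selector   */
    uint32_t hi = (type << 8)              /* bits 8-11: type, S bit stays 0 */
                | ((dpl & 3u) << 13)       /* bits 13-14: DPL                */
                | (1u << 15)               /* bit 15: present                */
                | (off & 0xffff0000u);     /* bits 16-31: offset 31..16      */
    return ((uint64_t)hi << 32) | lo;
}

static unsigned gate_type(uint64_t g) { return (unsigned)((g >> 40) & 0xf); }
static unsigned gate_dpl(uint64_t g)  { return (unsigned)((g >> 45) & 0x3); }

/* Only an interrupt gate makes the CPU clear IF on entry. */
static int gate_clears_if(uint64_t g) { return gate_type(g) == GATE_INTERRUPT32; }
```

For example, make_gate(0, 0x08, handler, 3) describes a system call entry that user code may invoke with int (DPL 3) and that still runs with interrupts disabled, because the type field is 0xE.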

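Whenever control passes through a gate (whether from kernel mode or from user mode), the CPU saves the current EFLAGS on the stack before jumping to the ISR. This is i386 hardware behavior.
Leaving the kernel or an interrupt handler is done with IRET, which automatically pops EFLAGS back. If user mode had interrupts enabled before, they are enabled again at this point.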
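
The save-and-restore behavior described above can be sketched as a tiny simulation. This is not JOS code: struct cpu, enter_gate, do_iret, and interrupts_enabled are hypothetical names, and a single saved_eflags field stands in for the EFLAGS image on the stack, so nesting is not modeled. FL_IF is bit 9 of EFLAGS, as the manual excerpt states.

```c
#include <stdint.h>

#define FL_IF 0x00000200u   /* EFLAGS bit 9: interrupt enable */

/* Hypothetical one-level model of the CPU state we care about. */
struct cpu {
    uint32_t eflags;
    uint32_t saved_eflags;  /* stands in for the EFLAGS image on the stack */
};

/* Gate entry: EFLAGS is saved first, then IF is cleared for interrupt gates. */
static void enter_gate(struct cpu *c, int is_trap_gate)
{
    c->saved_eflags = c->eflags;
    if (!is_trap_gate)
        c->eflags &= ~FL_IF;
}

/* IRET: EFLAGS comes back from the saved image, including the old IF. */
static void do_iret(struct cpu *c)
{
    c->eflags = c->saved_eflags;
}

static int interrupts_enabled(const struct cpu *c)
{
    return (c->eflags & FL_IF) != 0;
}
```

Starting from a user-mode state with IF set, enter_gate(&c, 0) leaves interrupts off for the whole handler and do_iret(&c) turns them back on, which is the window in which the JOS timer can preempt.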

The code that returns from an interrupt or a system call:

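void
env_pop_tf(struct Trapframe *tf)
{
    // Record the CPU we are running on for user-space debugging
    curenv->env_cpunum = cpunum();

    // Point %esp at the saved trapframe, pop the registers in the
    // order they are laid out, then let iret restore eip/cs/eflags
    // (and esp/ss, since we are returning to user mode).
    asm volatile(
        "\tmovl %0,%%esp\n"
        "\tpopal\n"
        "\tpopl %%es\n"
        "\tpopl %%ds\n"
        "\taddl $0x8,%%esp\n" /* skip tf_trapno and tf_errcode */
        "\tiret\n"
        : : "g" (tf) : "memory");

    panic("iret failed"); /* mostly to placate the compiler */
}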
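
The pop sequence only works because it walks the trapframe in memory order. The sketch below restates the layout as I understand it from the JOS headers and checks that the bytes consumed before iret land exactly on the saved eip. Two assumptions: tf_eip and tf_esp are declared as uint32_t here (JOS uses uintptr_t on a 32-bit target) so the offsets also hold on a 64-bit host, and bytes_consumed_before_iret is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

struct PushRegs {                     /* restored by popal, in this order */
    uint32_t reg_edi, reg_esi, reg_ebp, reg_oesp;
    uint32_t reg_ebx, reg_edx, reg_ecx, reg_eax;
};

struct Trapframe {
    struct PushRegs tf_regs;          /* popal                        */
    uint16_t tf_es, tf_padding1;      /* popl %es                     */
    uint16_t tf_ds, tf_padding2;      /* popl %ds                     */
    uint32_t tf_trapno;               /* skipped by addl $0x8,%esp    */
    uint32_t tf_err;                  /* skipped by addl $0x8,%esp    */
    uint32_t tf_eip;                  /* iret starts popping here     */
    uint16_t tf_cs, tf_padding3;
    uint32_t tf_eflags;
    uint32_t tf_esp;                  /* popped only when returning   */
    uint16_t tf_ss, tf_padding4;      /* to a lower privilege level   */
};

/* Bytes consumed by popal + popl %es + popl %ds + addl $0x8. */
static size_t bytes_consumed_before_iret(void)
{
    return sizeof(struct PushRegs) + 4 + 4 + 8;
}

/* The iret frame must begin right where the pops stop. */
_Static_assert(offsetof(struct Trapframe, tf_eip) == 48,
               "iret frame must start at offset 48");
```

If a field were added or reordered in the struct without updating the assembly, iret would pop garbage into eip, which is why the layout and the pop order have to change together.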

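About IRET
1. IRET is a complex instruction. Its behavior depends on the VM and NT flags, both in the EFLAGS image pushed on the stack at interrupt entry and in the current EFLAGS.
2. The kinds of return IRET can perform:
  1. Return from virtual-8086 mode.
  2. Return to virtual-8086 mode.
  3. Intra-privilege level return.
  4. Inter-privilege level return.
  5. Return from nested task (task switch).
3. In JOS, interrupts are off in kernel mode, so every iret ends up in user mode. This is similar in spirit to Linux before 2.6, which had no kernel-mode preemption. Linux 2.6 added kernel preemption, and things became much more complicated.
4. At its core, IRET restores a handful of registers: EIP, CS, and EFLAGS, plus ESP and SS on an inter-privilege return.
5. From the hardware's point of view, an "interrupt context" just means that a few old register values have been pushed onto the kernel stack, CS and EIP have been switched (along with SS and ESP when the privilege level changes), and interrupts have been turned off. Exactly what gets pushed depends on whether the transfer crosses a privilege level. The interrupt context we talk about in practice (spinlocks only, no sleeping, and similar constraints) is behavior defined by the OS software. You could write a system that re-enables interrupts as soon as it enters an ISR (which would of course be silly).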
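
As a rough illustration of "what gets pushed depends on whether the transfer crosses a privilege level", the sketch below models the frame the CPU pushes on interrupt entry, without an error code. The names struct regs and push_interrupt_frame are hypothetical, and the stack is just an array of 32-bit words that grows downward.

```c
#include <stddef.h>
#include <stdint.h>

/* Register values at the moment of the interrupt (illustrative only). */
struct regs { uint32_t ss, esp, eflags, cs, eip; };

/* Push the hardware interrupt frame onto a downward-growing kernel stack.
 * Returns the number of 32-bit words pushed: 5 when coming from user mode
 * (privilege change), 3 when already in kernel mode. */
static size_t push_interrupt_frame(uint32_t *kstack_top,
                                   const struct regs *r, int from_user)
{
    uint32_t *sp = kstack_top;
    if (from_user) {        /* the old stack must be recorded too */
        *--sp = r->ss;
        *--sp = r->esp;
    }
    *--sp = r->eflags;
    *--sp = r->cs;
    *--sp = r->eip;         /* lowest address: first thing iret pops */
    return (size_t)(kstack_top - sp);
}
```

IRET undoes exactly this: it always pops EIP, CS, and EFLAGS, and pops ESP and SS only when the saved CS indicates a return to a less privileged level.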
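About Blocking Syscalls
1. JOS's preemptive kernel can only preempt in user mode. Control transfer between system calls is therefore cooperative rather than preemptive (terms used in LKD, Linux Kernel Development).
2. Cooperative here means that after a blocking system call marks the environment not runnable and yields, the second half of the work has to be done by a different system call.
3. Besides preparing the data, that second half has to prepare the return value (eax). Once the environment goes from not runnable back to runnable, JOS resumes it by returning straight to user mode.
4. This is because env_run calls env_pop_tf directly, and the trapframe it pops is the user-mode point at which the system call was entered.
5. In sys_ipc_recv, the code after the yield is never executed unless the stack frame is changed.
6. sys_ipc_send must set up the eax return value for the receiver's call. Otherwise the receiver gets the syscall number back, because its saved eax still holds the syscall number it passed in.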
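
The two-halves pattern in the list above can be sketched as a small simulation. This is not the JOS implementation: struct env here is a stripped-down stand-in, saved_eax models the eax slot of the saved trapframe, the sim_ functions and the syscall number 13 are hypothetical, and -1 replaces the real error code.

```c
#include <stdint.h>

enum env_status { ENV_RUNNABLE, ENV_NOT_RUNNABLE };

/* Hypothetical syscall number, chosen only for the illustration. */
enum { SIM_SYS_IPC_RECV = 13 };

struct env {
    enum env_status status;
    uint32_t saved_eax;    /* models the eax slot in the saved trapframe */
    int ipc_recving;
    uint32_t ipc_value;
};

/* First half, run in the receiver's syscall: block and (in JOS) yield.
 * Nothing after the yield would run, so nothing touches saved_eax here. */
static void sim_ipc_recv(struct env *e)
{
    e->ipc_recving = 1;
    e->status = ENV_NOT_RUNNABLE;
}

/* Second half, run in the sender's syscall: deliver the data, prepare
 * the receiver's return value, and make it runnable again. */
static int sim_ipc_send(struct env *dst, uint32_t value)
{
    if (!dst->ipc_recving)
        return -1;                 /* receiver is not waiting */
    dst->ipc_recving = 0;
    dst->ipc_value = value;
    dst->saved_eax = 0;            /* what the receiver's syscall returns */
    dst->status = ENV_RUNNABLE;
    return 0;
}

/* What user code sees in eax when the env is resumed from its trapframe. */
static uint32_t sim_resume(const struct env *e)
{
    return e->saved_eax;
}
```

If sim_ipc_send skipped the saved_eax assignment, sim_resume would still report 13, which is the "returns the syscall number" symptom from item 6.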
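How the Linux Kernel Preempts
1. Since 2.6 the kernel fully supports preemption in kernel mode. Once interrupts are enabled again and no locks are held (spinlocks disable preemption while held, and the irqsave variants also disable interrupts), a task can be preempted even in kernel mode.
2. The context switch on a yield also preserves the kernel stack. The switch goes through __switch_to, and part of the context (for example the return address) is saved implicitly by the call instruction pushing it onto the stack.
3. A signal wakes up a process that is sleeping in an interruptible wait, so use Mesa semantics: recheck the condition after every wakeup in case it was spurious. Follow the recommended wait and wake-up pattern.
4. With kernel preemption plus interruption by signals, reentrancy may still need attention.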
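
The Mesa-style recheck from item 3 can be sketched in a single-threaded simulation. This is not kernel code: sim_sleep, wait_for_condition, and the wake-up script are hypothetical, and a real interruptible wait in the kernel would usually also return early when a signal is pending. The sketch only shows why the condition is tested in a loop rather than once.

```c
#include <stdbool.h>

/* Why the simulated sleeper woke up. */
enum wake_reason { WAKE_SIGNAL, WAKE_CONDITION };

struct waiter {
    bool condition;   /* the event we are really waiting for      */
    int  wakeups;     /* how many times the sleeper was woken up  */
};

/* Simulated sleep: returns on the next scripted wake-up, which may or
 * may not have made the condition true. */
static void sim_sleep(struct waiter *w, const enum wake_reason *script,
                      int *step)
{
    if (script[*step] == WAKE_CONDITION)
        w->condition = true;
    (*step)++;
    w->wakeups++;
}

/* Mesa semantics: being woken is only a hint, so recheck every time. */
static int wait_for_condition(struct waiter *w,
                              const enum wake_reason *script)
{
    int step = 0;
    while (!w->condition)
        sim_sleep(w, script, &step);
    return w->wakeups;
}
```

With a script of two signal wake-ups followed by the real event, the loop goes back to sleep twice and only proceeds on the third wake-up. An if in place of the while would have continued after the first signal with the condition still false.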

That's all for now.

posted @ 2022-04-10 17:10 飞机云
