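Reading the Linux Kernel Source: System Boot (Part 1)
Few books on the Linux source code cover the code that runs between power-on and the moment the Linux kernel takes over the CPU. Yet what happens in this phase matters for understanding details such as memory management and process creation.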
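I. The BIOS stage
At the instant the computer is powered on, the hardware sets several key processor registers to fixed values (including cs and the instruction pointer eip), and the CPU starts executing the code at physical address 0xfffffff0. The hardware maps this address to a read-only, persistent memory chip, usually a ROM (read-only memory). That memory region holds the BIOS, a set of drivers for basic input/output devices. While the BIOS runs, the CPU is in real mode: a cs:ip pair is translated to a physical address as (cs shifted left by 4 bits) + ip, and none of the memory management concepts (paging, protected-mode segmentation) has been set up yet. Booting the Linux kernel relies on the BIOS to load the kernel image (strictly speaking, the BIOS loads a boot loader such as GRUB, which then loads the kernel). From power-on, the BIOS stage mainly does the following: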
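(1) POST: the power-on self-test runs, and several tables describing the hardware devices in the system are built.
(2) The BIOS searches for an operating system to boot. Depending on the BIOS settings, it may try to load one from different devices or media.
(3) As soon as a valid device is found, the contents of its first sector are copied into memory starting at 0x00007c00, and the BIOS jumps to that address to execute the code it has just loaded.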
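Step (3) can be modelled with a short Python sketch. It assumes the conventional legacy-BIOS validity check, namely that the last two bytes of the 512-byte sector are 0x55, 0xAA (the same value as the `boot_flag: .word 0xAA55` field in the kernel header). The function names and the simulated memory are hypothetical; real firmware differs in detail.

```python
SECTOR_SIZE = 512
LOAD_ADDR = 0x7C00  # physical address where the BIOS places the boot sector


def is_bootable(sector: bytes) -> bool:
    """Return True if the sector is 512 bytes long and ends with 0x55, 0xAA."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == b"\x55\xaa"


def load_boot_sector(memory: bytearray, sector: bytes) -> int:
    """Copy a bootable sector to 0x7C00 in simulated memory.

    Returns the address the BIOS would jump to next.
    """
    if not is_bootable(sector):
        raise ValueError("sector has no 0xAA55 boot signature")
    memory[LOAD_ADDR:LOAD_ADDR + SECTOR_SIZE] = sector
    return LOAD_ADDR


if __name__ == "__main__":
    ram = bytearray(0x10000)                   # 64 KiB of simulated low memory
    sector = bytes(510) + b"\x55\xaa"          # empty sector with a valid signature
    entry = load_boot_sector(ram, sector)
    print(hex(entry))
```

The word 0xAA55 is stored little-endian, which is why the byte at offset 510 is 0x55 and the byte at offset 511 is 0xAA.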
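II. The boot loader stage
On x86, the traditional Linux boot loader is LILO, although GRUB is more widely used today. GRUB is more capable than LILO because it recognizes several disk-based filesystems and can therefore read parts of the boot program from files. Taking LILO as the example: the BIOS loads its small first part into RAM starting at address 0x00007c00. This small program then moves itself to address 0x00096a00, sets up the real-mode stack, and loads the second part of LILO into RAM starting at address 0x00096c00. The second part in turn reads a map of the available operating systems from disk and shows the user a prompt. Once an operating system has been chosen, the boot loader starts loading its image into memory. The main events in this process are: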
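(1) A BIOS routine is called to load the initial part of the kernel image from disk: the first 512 bytes of the kernel image are placed in memory starting at address 0x00090000 (from kernel 2.6 on, boot loaders skip these 512 bytes).
(2) A BIOS routine is called to load the rest of the kernel image from disk. The image is loaded either low (starting at 0x00010000, for small zImage kernels) or high (starting at 0x00100000, for bzImage kernels).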
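Source analysis (kernel version 2.6.38.8, arch/x86/boot/header.S):
1. In the definitions below, BOOTSEG is the segment value of the address at which the MBR is loaded, as described above (0x07C0 << 4 = 0x00007c00).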
#include <asm/segment.h>
#include <generated/utsrelease.h>
#include <asm/boot.h>
#include <asm/e820.h>
#include <asm/page_types.h>
#include <asm/setup.h>
#include "boot.h"
#include "voffset.h"
#include "zoffset.h"

BOOTSEG         = 0x07C0                /* original address of boot-sector */
SYSSEG          = 0x1000                /* historical load address >> 4 */

#ifndef SVGA_MODE
#define SVGA_MODE ASK_VGA
#endif

#ifndef RAMDISK
#define RAMDISK 0
#endif

#ifndef ROOT_RDONLY
#define ROOT_RDONLY 1
#endif
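BOOTSEG and SYSSEG are segment values, not linear addresses. The following Python sketch applies the real-mode rule described earlier (segment shifted left by 4 bits, plus offset) to the values that appear in this article. The helper name `real_mode_address` is hypothetical and is not part of the kernel source.

```python
def real_mode_address(segment: int, offset: int = 0) -> int:
    """Translate a real-mode segment:offset pair into a physical address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return (segment << 4) + offset


BOOTSEG = 0x07C0  # segment of the boot sector
SYSSEG = 0x1000   # historical load segment of the kernel

if __name__ == "__main__":
    print(hex(real_mode_address(BOOTSEG)))         # boot sector
    print(hex(real_mode_address(SYSSEG)))          # historical (low) load address
    print(hex(real_mode_address(0xF000, 0xFFF0)))  # target of ljmp $0xf000,$0xfff0
```

Shifting left by 4 bits multiplies the segment by 16, so two adjacent segment values address memory 16 bytes apart, and a difference of 0x20 in the segment register corresponds to 512 bytes.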
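2. The 512 bytes mentioned above are the code under the labels bootsect_start and start2 plus the data that follows (lines 30-88 of the file); this part occupies exactly the first 512 bytes of the image. In the 2.6 kernel, bootsect_start simply performs a far jump to start2. This code does not boot anything: it only runs when the image is booted directly (historically from a floppy) instead of through a boot loader, in which case it prints an error message, waits for a key press and reboots. A modern boot loader such as GRUB no longer loads and runs these 512 bytes; it goes straight to the setup code below.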
        .code16                         /* 16-bit code */
        .section ".bstext", "ax"        /* define section .bstext; flags: 'a' = allocatable, 'x' = executable */

        .global bootsect_start
bootsect_start:

        ljmp    $BOOTSEG, $start2       /* far jump to start2; syntax: ljmp $segment, $offset */

start2:
        movw    %cs, %ax                /* cs now holds 0x07c0 */
        movw    %ax, %ds
        movw    %ax, %es
        movw    %ax, %ss
        xorw    %sp, %sp                /* clear sp */
        sti                             /* enable interrupts */
        cld                             /* clear the direction flag so that lodsb below increments si */

        movw    $bugger_off_msg, %si    /* load the address of bugger_off_msg into si (source index register) */

msg_loop:                               /* print bugger_off_msg one character at a time */
        lodsb                           /* load the byte at ds:si into al, then increment si */
        andb    %al, %al                /* sets ZF if al == 0 (end of string) */
        jz      bs_die                  /* if al is zero, jump to bs_die; otherwise fall through and print */
        movb    $0xe, %ah               /* ah = 0x0e: BIOS teletype output function */
        movw    $7, %bx                 /* bh = page 0, bl = attribute 7 */
        int     $0x10                   /* BIOS video interrupt: print the character in al */
        jmp     msg_loop                /* next character */

bs_die:
        # Allow the user to press a key, then reboot
        xorw    %ax, %ax                /* clear ax (ah = 0) */
        int     $0x16                   /* BIOS keyboard interrupt; with ah = 0 it waits for a key press */
        int     $0x19                   /* BIOS bootstrap interrupt: restart the boot process */

        # int 0x19 should never return.  In case it does anyway,
        # invoke the BIOS reset code...
        ljmp    $0xf000,$0xfff0         /* normally unreachable; jump to the BIOS reset code */

        .section ".bsdata", "a"
bugger_off_msg:
        .ascii  "Direct booting from floppy is no longer supported.\r\n"
        .ascii  "Please use a boot loader program instead.\r\n"
        .ascii  "\n"
        .ascii  "Remove disk and press any key to reboot . . .\r\n"
        .byte   0

        # Kernel attributes; used by setup.  This is part 1 of the
        # header, from the old boot sector.

        .section ".header", "a"
        .globl  hdr
hdr:
setup_sects:    .byte 0                 /* Filled in by build.c */
root_flags:     .word ROOT_RDONLY
syssize:        .long 0                 /* Filled in by build.c */
ram_size:       .word 0                 /* Obsolete */
vid_mode:       .word SVGA_MODE
root_dev:       .word 0                 /* Filled in by build.c */
boot_flag:      .word 0xAA55

        # offset 512, entry point
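3. Execution then continues at the _start label. This part of the listing is long, but the only thing it executes is a jump to start_of_setup; everything else is data definitions (the rest of the setup header).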
        .globl  _start                  /* directive making _start globally visible (C code would declare it extern) */
_start:
                # Explicitly enter this as bytes, or the assembler
                # tries to generate a 3-byte jump here, which causes
                # everything else to push off to the wrong offset.
                .byte   0xeb            # short (2-byte) jump
                                        /* 0xEB is the opcode of a short jmp; it is followed by one signed offset byte */
                .byte   start_of_setup-1f
                                        /* 1f is the next local label "1:" (the address right after this jump), */
                                        /* so this byte is the distance from there to start_of_setup */
1:

        # Part 2 of the header, from the old setup.S

                .ascii  "HdrS"          # header signature
                .word   0x020a          # header version number (>= 0x0105)
                                        # or else old loadlin-1.5 will fail)
                .globl realmode_swtch
realmode_swtch: .word   0, 0            # default_switch, SETUPSEG
start_sys_seg:  .word   SYSSEG          # obsolete and meaningless, but just
                                        # in case something decided to "use" it
                .word   kernel_version-512 # pointing to kernel version string
                                        # above section of header is compatible
                                        # with loadlin-1.5 (header v1.5). Don't
                                        # change it.

type_of_loader: .byte   0               # 0 means ancient bootloader, newer
                                        # bootloaders know to change this.
                                        # See Documentation/i386/boot.txt for
                                        # assigned ids

# flags, unused bits must be zero (RFU) bit within loadflags
loadflags:
LOADED_HIGH     = 1                     # If set, the kernel is loaded high
CAN_USE_HEAP    = 0x80                  # If set, the loader also has set
                                        # heap_end_ptr to tell how much
                                        # space behind setup.S can be used for
                                        # heap purposes.
                                        # Only the loader knows what is free
                .byte   LOADED_HIGH

setup_move_size: .word  0x8000          # size to move, when setup is not
                                        # loaded at 0x90000. We will move setup
                                        # to 0x90000 then just before jumping
                                        # into the kernel. However, only the
                                        # loader knows how much data behind
                                        # us also needs to be loaded.

code32_start:                           # here loaders can put a different
                                        # start address for 32-bit code.
                .long   0x100000        # 0x100000 = default for big kernel

ramdisk_image:  .long   0               # address of loaded ramdisk image
                                        # Here the loader puts the 32-bit
                                        # address where it loaded the image.
                                        # This only will be read by the kernel.
ramdisk_size:   .long   0               # its size in bytes

bootsect_kludge:
                .long   0               # obsolete

heap_end_ptr:   .word   _end+STACK_SIZE-512
                                        # (Header version 0x0201 or later)
                                        # space from here (exclusive) down to
                                        # end of setup code can be used by setup
                                        # for local heap purposes.

ext_loader_ver:
                .byte   0               # Extended boot loader version
ext_loader_type:
                .byte   0               # Extended boot loader type

cmd_line_ptr:   .long   0               # (Header version 0x0202 or later)
                                        # If nonzero, a 32-bit pointer
                                        # to the kernel command line.
                                        # The command line should be
                                        # located between the start of
                                        # setup and the end of low
                                        # memory (0xa0000), or it may
                                        # get overwritten before it
                                        # gets read.  If this field is
                                        # used, there is no longer
                                        # anything magical about the
                                        # 0x90000 segment; the setup
                                        # can be located anywhere in
                                        # low memory 0x10000 or higher.

ramdisk_max:    .long   0x7fffffff
                                        # (Header version 0x0203 or later)
                                        # The highest safe address for
                                        # the contents of an initrd
                                        # The current kernel allows up to 4 GB,
                                        # but leave it at 2 GB to avoid
                                        # possible bootloader bugs.
kernel_alignment:  .long CONFIG_PHYSICAL_ALIGN  #physical addr alignment
                                                #required for protected mode
                                                #kernel
#ifdef CONFIG_RELOCATABLE
relocatable_kernel:    .byte 1
#else
relocatable_kernel:    .byte 0
#endif
min_alignment:          .byte MIN_KERNEL_ALIGN_LG2      # minimum alignment
pad3:                   .word 0

cmdline_size:   .long   COMMAND_LINE_SIZE-1     #length of the command line,
                                                #added with boot protocol
                                                #version 2.06

hardware_subarch:       .long 0                 # subarchitecture, added with 2.07
                                                # default to 0 for normal x86 PC

hardware_subarch_data:  .quad 0

payload_offset:         .long ZO_input_data
payload_length:         .long ZO_z_input_len

setup_data:             .quad 0                 # 64-bit physical pointer to
                                                # single linked list of
                                                # struct setup_data

pref_address:           .quad LOAD_PHYSICAL_ADDR        # preferred load addr

#define ZO_INIT_SIZE    (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
#define VO_INIT_SIZE    (VO__end - VO__text)
#if ZO_INIT_SIZE > VO_INIT_SIZE
#define INIT_SIZE ZO_INIT_SIZE
#else
#define INIT_SIZE VO_INIT_SIZE
#endif
init_size:              .long INIT_SIZE         # kernel initialization size
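To make the header layout concrete, here is a minimal Python sketch that reads a few of the fields defined above and decodes the two-byte jump at _start. The offsets follow from the field sizes in the listing (hdr starts at 0x1F1 so that boot_flag ends exactly at offset 512). The image is a hand-built, hypothetical buffer rather than a real kernel, and the jump offset 0x66 in it is an arbitrary value chosen for illustration.

```python
import struct

LOADED_HIGH = 0x01  # bit in loadflags


def build_fake_image() -> bytes:
    """Build a hypothetical 1 KiB buffer laid out like the start of the image."""
    img = bytearray(1024)
    img[0x1F1] = 4                                 # setup_sects
    struct.pack_into("<H", img, 0x1FE, 0xAA55)     # boot_flag
    img[0x200] = 0xEB                              # short jmp opcode at _start
    img[0x201] = 0x66                              # rel8 offset (arbitrary here)
    img[0x202:0x206] = b"HdrS"                     # header signature
    struct.pack_into("<H", img, 0x206, 0x020A)     # header version
    img[0x211] = LOADED_HIGH                       # loadflags
    struct.pack_into("<I", img, 0x214, 0x100000)   # code32_start
    return bytes(img)


def parse_header(img: bytes) -> dict:
    """Extract a few setup-header fields and the target of the jump at _start."""
    (boot_flag,) = struct.unpack_from("<H", img, 0x1FE)
    magic = img[0x202:0x206]
    if boot_flag != 0xAA55 or magic != b"HdrS":
        raise ValueError("not a valid setup header")
    opcode, rel8 = struct.unpack_from("<Bb", img, 0x200)  # rel8 is signed
    (version,) = struct.unpack_from("<H", img, 0x206)
    (code32_start,) = struct.unpack_from("<I", img, 0x214)
    # A short jmp is relative to the address of the *next* instruction,
    # i.e. local label "1:" at 0x202 in the listing.
    jump_target = 0x200 + 2 + rel8 if opcode == 0xEB else None
    return {
        "setup_sects": img[0x1F1],
        "version": version,
        "loaded_high": bool(img[0x211] & LOADED_HIGH),
        "code32_start": code32_start,
        "jump_target": jump_target,
    }


if __name__ == "__main__":
    print(parse_header(build_fake_image()))
```

The decoding of the jump mirrors the expression `start_of_setup-1f` in the source: the stored byte is the distance from the label `1:` to start_of_setup, so adding it to 0x202 gives the offset of start_of_setup within the image.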
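4. The code then reaches the start_of_setup label. This part performs some initialization: it optionally resets the disk controller, makes %es equal to %ds, clears the direction flag, sets up a working stack, normalizes %cs, checks the setup signature and zeroes the BSS. Finally it calls the C function main.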
        .section ".entrytext", "ax"     /* declare a section */
start_of_setup:
#ifdef SAFE_RESET_DISK_CONTROLLER
# Reset the disk controller.
        movw    $0x0000, %ax            # Reset disk controller
        movb    $0x80, %dl              # All disks
        int     $0x13                   /* BIOS disk interrupt: reset the disk controller */
#endif

# Force %es = %ds
        movw    %ds, %ax                /* use ax as a temporary to set es = ds */
        movw    %ax, %es
        cld                             /* clear the direction flag (DF = 0) */

# Apparently some ancient versions of LILO invoked the kernel with %ss != %ds,
# which happened to work by accident for the old code.  Recalculate the stack
# pointer if %ss is invalid.  Otherwise leave it alone, LOADLIN sets up the
# stack behind its own code, so we can't blindly put it directly past the heap.

        movw    %ss, %dx                # copy the stack segment register into dx
        cmpw    %ax, %dx                # %ds == %ss?  (ax still holds the value of ds)
        movw    %sp, %dx                # dx = current stack pointer
        je      2f                      # -> assume %sp is reasonably set

        # Invalid %ss, make up a new stack (taken when ss != ds)
        movw    $_end, %dx              # start from _end, the end of the setup image
        testb   $CAN_USE_HEAP, loadflags
        jz      1f
        movw    heap_end_ptr, %dx
1:      addw    $STACK_SIZE, %dx
        jnc     2f
        xorw    %dx, %dx                # Prevent wraparound

2:      # Now %dx should point to the end of our stack space
        andw    $~3, %dx                # dword align (might as well...)
        jnz     3f
        movw    $0xfffc, %dx            # Make sure we're not zero
3:      movw    %ax, %ss
        movzwl  %dx, %esp               # Clear upper half of %esp
        sti                             # Now we should have a working stack
                                        # (sti sets the interrupt-enable flag)

# We will have entered with %cs = %ds+0x20, normalize %cs so
# it is on par with the other segments.
        pushw   %ds
        pushw   $6f
        lretw
6:

# Check signature at end of setup
        cmpl    $0x5a5aaa55, setup_sig
        jne     setup_bad

# Zero the bss
        movw    $__bss_start, %di
        movw    $_end+3, %cx
        xorl    %eax, %eax
        subw    %di, %cx
        shrw    $2, %cx
        rep; stosl

# Jump to C code (should not return)
        calll   main                    # execution continues in the C function main

# Setup corrupt somehow...
setup_bad:
        movl    $setup_corrupt, %eax
        calll   puts
        # Fall through...

        .globl  die
        .type   die, @function
die:
        hlt
        jmp     die

        .size   die, .-die

        .section ".initdata", "a"
setup_corrupt:
        .byte   7
        .string "No setup signature found...\n"
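The stack setup and BSS clearing above are compact but easy to misread, so the following Python sketch restates the same arithmetic on 16-bit values. It is a model for reading along, not kernel code. The function names are hypothetical, and STACK_SIZE = 512 is an assumption about the value defined in boot.h for this kernel version.

```python
CAN_USE_HEAP = 0x80  # bit in loadflags
STACK_SIZE = 512     # assumed value; the real one comes from boot.h


def choose_stack_pointer(ds, ss, sp, end, heap_end_ptr, loadflags,
                         stack_size=STACK_SIZE):
    """Mirror the computation of %dx (the new %sp) in start_of_setup."""
    if ss == ds:
        dx = sp                      # je 2f: keep the loader's stack pointer
    else:
        dx = end                     # movw $_end, %dx
        if loadflags & CAN_USE_HEAP:
            dx = heap_end_ptr        # movw heap_end_ptr, %dx
        dx += stack_size             # addw $STACK_SIZE, %dx
        if dx > 0xFFFF:              # carry set: the 16-bit addition overflowed
            dx = 0                   # xorw %dx, %dx (prevent wraparound)
    dx &= 0xFFFC                     # andw $~3, %dx (dword align)
    if dx == 0:
        dx = 0xFFFC                  # make sure the stack pointer is not zero
    return dx


def bss_dwords(bss_start, end):
    """Number of 32-bit stores executed by 'rep; stosl' when zeroing the BSS."""
    return ((end + 3 - bss_start) & 0xFFFF) >> 2


if __name__ == "__main__":
    # ss != ds, no heap information from the loader: stack goes above _end.
    print(hex(choose_stack_pointer(0x9000, 0x8000, 0x1234, 0x4000, 0, 0)))
    print(bss_dwords(0x3000, 0x3005))
```

Adding 3 to _end before shifting right by 2 rounds the byte count up to a whole number of dwords, so a BSS whose length is not a multiple of 4 is still cleared completely.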