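(Translation) Writing an x86 "Hello world" bootloader with assembly

Original article: http://50linesofco.de/post/2018-02-28-writing-an-x86-hello-world-bootloader-with-assembly

Summary (TL;DR)

(Translator's note: TL;DR can stand for "Too long; didn't read" or "Too long; don't read". It is commonly used as the heading for the summary of a long article.)

After power-on, the computer's BIOS reads 512 bytes from the boot device and, if it detects a two-byte "magic number" at the end of those 512 bytes, treats the data as code and runs it.

This kind of code is called a "bootloader" (or "boot sector"). We are going to write a little assembly code that makes a virtual machine boot and display "Hello world". A bootloader is also the very first stage of booting an operating system.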

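What happens when your x86 computer starts

You might have wondered what happens when you press the power button on your computer. Without going into too much detail: after getting the hardware ready and running the initial BIOS code that reads the settings and checks the system, the BIOS starts looking at the configured potential boot devices for something to execute.

It does that by reading the first 512 bytes from a boot device and checking whether the last two of those 512 bytes contain a magic number (the bytes 0x55 and 0xAA, in that order). If they do, the BIOS copies the 512 bytes to memory address 0x7C00 and treats whatever is at the beginning of the 512 bytes as code, the so-called bootloader. In this article we will write such a piece of code, have it print the text "Hello World!" and then go into an infinite loop.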
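The check described above can be sketched in a few lines of Python. This only illustrates the rule as the article states it and is not how any real BIOS is written; the helper name `is_bootable` is made up for this example.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = bytes([0x55, 0xAA])  # byte 510 is 0x55, byte 511 is 0xAA


def is_bootable(sector: bytes) -> bool:
    """Hypothetical helper: True if `sector` is 512 bytes and ends with the signature."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == BOOT_SIGNATURE


blank = bytes(SECTOR_SIZE)             # 512 zero bytes, no signature
signed = bytes(510) + BOOT_SIGNATURE   # zero padding followed by the signature

print(is_bootable(blank), is_bootable(signed))
```

A sector that is all zeroes is rejected, while the same padding followed by 0x55 0xAA is accepted.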

Real bootloaders usually load the actual operating system code into memory, switch the CPU into protected mode, and then run the operating system code.

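A primer on x86 assembly with the GNU assembler

To make our lives a little easier (sic!) and more fun, we will write our bootloader in x86 assembly language. This article uses the GNU assembler to create the binary from our code, and the GNU assembler uses "AT&T syntax" rather than the widely used "Intel syntax". At the end of the article I will repeat the example in Intel syntax.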

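Getting our code ready

So far we know that we need to create a 512-byte binary file that ends with 0x55AA. It is also worth noting that no matter whether your x86 processor is 32-bit or 64-bit, it runs in 16-bit real mode at boot time, so our program needs to deal with that.

Let's create a boot.s file for our assembly source code and tell the GNU assembler that we will use 16 bits:

.code16 # tell the assembler that we're using 16 bit mode

Next, we should give our program a starting point and make it available to the linker:

.code16
.global init # makes our label "init" available to the outside
init: # this is the beginning of our binary later.
  jmp init # jump to "init"

Note that you can call your label whatever you wish. The standard would be _start, but I chose init to illustrate that you really can call it anything.

Nice, now we even have an infinite loop, because we keep jumping to the label, then jump to the label again...

Time to turn our code into a binary by running the GNU assembler (as) and see what we get: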

as -o boot.o boot.s
ls -lh .
784 boot.o
152 boot.s

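Whoa, hold on! Our output is already 784 bytes? But we only have 512 bytes for our bootloader!

Well, most of the time developers want to create an executable file for the operating system they are targeting, i.e. an exe file (Windows) or an ELF file (Unix). These files have a header (read: additional, preceding bytes) and usually load a few system libraries to access operating system functionality.

Our case is different: we want none of that, just our code in binary form for the BIOS to execute at boot.

The assembler emits an object file (boot.o, in ELF format here) that wraps our code in headers and other metadata, so we need one additional step that strips this unwanted data. We can use the linker (GNU's linker is called ld) for this step.

The linker is normally used to combine various libraries and the binary output of other tools, such as compilers or assemblers, into one final file. In our case we want a "plain binary file", so we pass --oformat binary to ld when we run it. We also want to specify where our program starts, so we tell the linker to use the starting label in our code (I called it init) as the program's entry point by passing the -e init flag.

When we run that, we get a better result: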

as -o boot.o boot.s
ld -o boot.bin --oformat binary -e init boot.o
ls -lh .
  3 boot.bin
784 boot.o
152 boot.s

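Okay, three bytes sounds much better, but this won't boot yet, because it is missing the magic number 0x55AA at bytes 511 and 512 of our binary...

Making it bootable

Luckily, we can just fill our binary with zeroes and add the magic number at the end.

Let's start by adding zeroes until our binary file is 510 bytes long (because the last two bytes will be the magic number).

We can use the assembler directive .fill from as to do that. The syntax is .fill count, size, value: it emits count copies of a size-byte field holding value, at the place in boot.s where we write the directive.

But how do we know how many bytes we need to fill in? Conveniently, the assembler helps us again. We need 510 bytes in total, so we fill 510 - (byte size of our code) bytes with zeroes. But what is the "byte size of our code"? Luckily, as has a helper that tells us the current byte position within the generated binary: the symbol . (a single dot). We can get the position of labels, too. So our code size is the current position . after our code minus the position of the first statement in our code (which is the position of init). In other words, .-init yields the number of bytes our code occupies in the final binary file.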

.code16
.global init # makes our label "init" available to the outside

init: # this is the beginning of our binary later.
  jmp init # jump to "init"

.fill 510-(.-init), 1, 0 # add zeroes to make it 510 bytes long

as -o boot.o boot.s
ld -o boot.bin --oformat binary -e init boot.o
ls -lh .
 510 boot.bin
1.3k boot.o
 176 boot.s
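The arithmetic behind `.fill 510-(.-init), 1, 0` can be replayed in Python. This is a sketch, not part of the original build: the three bytes used for `jmp init` (E9 FD FF, a near jump back to offset 0) are an assumed encoding that matches the 3-byte file size reported above, and `pad_to` is a made-up helper name.

```python
code = bytes([0xE9, 0xFD, 0xFF])  # assumed 3-byte encoding of "jmp init"
TARGET = 510                      # everything before the two signature bytes


def pad_to(code: bytes, target: int = TARGET) -> bytes:
    """Hypothetical helper mimicking `.fill target-(.-init), 1, 0`."""
    fill_count = target - len(code)  # the value of 510-(.-init)
    if fill_count < 0:
        raise ValueError("code is larger than the target size")
    return code + bytes(fill_count)  # append fill_count zero bytes


image = pad_to(code)
print(len(code), TARGET - len(code), len(image))
```

With 3 bytes of code, 507 zero bytes are appended and the image ends up 510 bytes long, which matches the size of boot.bin in the listing.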

We're getting there. We are still missing the final two bytes, the magic number:

.code16
.global init # makes our label "init" available to the outside

init: # this is the beginning of our binary later.
  jmp init # jump to "init"

.fill 510-(.-init), 1, 0 # add zeroes to make it 510 bytes long
.word 0xaa55 # magic bytes that tell BIOS that this is bootable

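Oh wait... if the magic number is 0x55aa, why are we swapping the bytes here? That is because x86 is little-endian: the low byte of a word is stored first, so .word 0xaa55 ends up in the file as the bytes 0x55 0xaa.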
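The byte swap is easy to confirm with Python's `struct` module, which can encode a 16-bit value in either byte order. This is only an illustration of little-endian storage, not something the build needs.

```python
import struct

word = 0xAA55
on_disk = struct.pack("<H", word)    # "<H": little-endian unsigned 16-bit value
big_endian = struct.pack(">H", word)  # the same value in big-endian order, for contrast

print(on_disk.hex(), big_endian.hex())
```

The little-endian encoding puts 0x55 first and 0xaa second, which is exactly the order the BIOS expects at bytes 511 and 512.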

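Now, if we produce an updated binary file, it is 512 bytes long.

Booting our bootloader

In theory you could write this binary to the first 512 bytes of a USB drive, a floppy disk, or whatever else your computer is happy to boot from, but let's use a simple x86 emulator (it's like a virtual machine) instead.

I will use QEMU with an x86 system architecture for this: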

qemu-system-x86_64 boot.bin

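Running this command produces something relatively unspectacular.

The fact that QEMU stops looking for bootable devices means that our bootloader worked, but it doesn't do anything yet!

To prove that, we can cause a reboot loop instead of an infinite loop that does nothing by changing our assembly code to this: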

.code16
.global init # makes our label "init" available to the outside

init: # this is the beginning of our binary later.
  ljmpw $0xFFFF, $0 # jumps to the "reset vector", doing a reboot

.fill 510-(.-init), 1, 0 # add zeroes to make it 510 bytes long
.word 0xaa55 # magic bytes that tell BIOS that this is bootable

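This new instruction, ljmpw $0xFFFF, $0, jumps to the so-called reset vector.

This effectively means re-executing the first instruction the system runs after a reset, without actually power-cycling the machine. It is sometimes referred to as a "warm reboot".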
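In real mode, a far jump target such as `$0xFFFF, $0` is a segment:offset pair, and the CPU forms the physical address as segment times 16 plus offset. The sketch below works that out for the reset vector and for the boot sector's load address; `physical_address` is a made-up helper name.

```python
def physical_address(segment: int, offset: int) -> int:
    """Hypothetical helper: real-mode address translation, segment * 16 + offset."""
    return (segment << 4) + offset


reset_vector = physical_address(0xFFFF, 0x0000)  # target of ljmpw $0xFFFF, $0
boot_sector = physical_address(0x0000, 0x7C00)   # where the BIOS places our 512 bytes

print(hex(reset_vector), hex(boot_sector))
```

The jump therefore lands at physical address 0xffff0, sixteen bytes below the 1 MiB boundary.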

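Using the BIOS to print text

Okay, let's start with printing a single character.

We don't have an operating system or any libraries available, so we can't just call printf or one of its friends and be done.

Luckily, the BIOS is still around and reachable, so we can make use of its functions. These functions (along with a bunch of functions that different hardware provides) are available to us via so-called interrupts.

In Ralf Brown's interrupt list we can find the video interrupt 0x10.

A single interrupt can carry out many different functions, which are usually selected by setting the AX register (or its high byte, AH) to a specific value. In our case the function "Teletype" sounds like a good match: it prints the character given in al and automatically advances the cursor. Nifty! We select that function by setting ah to 0xe, put the ASCII code we want to print into al, and then call int 0x10: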

.code16
.global init # makes our label "init" available to the outside

init: # this is the beginning of our binary later.
  mov $0x0e41, %ax # sets AH to 0xe (function teletype) and al to 0x41 (ASCII "A")
  int $0x10 # call the function in ah from interrupt 0x10
  hlt # stops executing

.fill 510-(.-init), 1, 0 # add zeroes to make it 510 bytes long
.word 0xaa55 # magic bytes that tell BIOS that this is bootable

Now we load the necessary value into the ax register, call interrupt 0x10 and halt execution (using hlt).
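Loading 0x0e41 into ax sets both halves of the register at once, because ah is the high byte of ax and al is the low byte. A small Python sketch of that split:

```python
ax = 0x0E41

ah = (ax >> 8) & 0xFF  # high byte: the BIOS function number (teletype)
al = ax & 0xFF         # low byte: the ASCII code of the character to print

print(hex(ah), hex(al), chr(al))
```

The high byte is 0xe and the low byte is 0x41, the ASCII code for "A".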

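When we run as and ld to get our updated bootloader, QEMU shows the character "A" on screen.

We can even see the cursor blinking at the next position, so this function should be easy to use with longer messages, right?

Our final hello-world-bootloader

To display a full message, we need a way to store it in our binary. We can do that much like we stored the magic word at the end of our binary, but we will use a different directive than .word because we want to store a whole string. Luckily, as comes with .ascii and .asciz for strings. The difference between them is that .asciz automatically appends another byte that is set to zero. This will come in handy in a moment, so we choose .asciz for our data.

We will also use a label to give us access to the address: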

.code16
.global init # makes our label "init" available to the outside

init: # this is the beginning of our binary later.
  mov $0x0e, %ah # sets AH to 0xe (function teletype)
  mov $msg, %bx   # sets BX to the address of the first byte of our message
  mov (%bx), %al   # sets AL to the first byte of our message
  int $0x10 # call the function in ah from interrupt 0x10
  hlt # stops executing

msg: .asciz "Hello world!" # stores the string (plus a byte with value "0") and gives us access via $msg

.fill 510-(.-init), 1, 0 # add zeroes to make it 510 bytes long
.word 0xaa55 # magic bytes that tell BIOS that this is bootable

We have one new feature in there:

mov $msg, %bx
mov (%bx), %al

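The first line loads the address of the first byte of the message into the register bx (we use the entire register because addresses are 16 bits long).

The second line then loads the value stored at the address in bx into al, so the first character of the message ends up in al, because bx points to it.

But now we get an error when running ld: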

as -o boot.o boot.s
ld -o boot.bin --oformat binary -e init boot.o
boot.o: In function `init':
(.text+0x3): relocation truncated to fit: R_X86_64_16 against `.text'+a

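Dang, what does that mean?

It means that the address assigned to msg when boot.o is linked does not fit into our 16-bit address space. We can fix that by telling ld where our program's memory should start. The BIOS loads our code at address 0x7c00, so we make that our starting address by passing -Ttext 0x7c00 when we call the linker: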

as -o boot.o boot.s
ld -o boot.bin --oformat binary -e init -Ttext 0x7c00 boot.o
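Why the start address matters can be shown with plain arithmetic. The offset 0x0a comes from the linker error above (`.text'+a`); the default base of 0x400000 is an assumption about a typical x86-64 ELF link and may differ on your system, and `fits_in_16_bits` is a made-up helper name.

```python
MSG_OFFSET = 0x0A                # offset of msg inside .text, from the ld error message
BIOS_LOAD_ADDRESS = 0x7C00       # where the BIOS places the boot sector
ASSUMED_DEFAULT_BASE = 0x400000  # assumption: a typical default base for x86-64 ELF links


def fits_in_16_bits(address: int) -> bool:
    """Hypothetical helper: can `address` be stored in a 16-bit operand?"""
    return 0 <= address <= 0xFFFF


default_msg = ASSUMED_DEFAULT_BASE + MSG_OFFSET
fixed_msg = BIOS_LOAD_ADDRESS + MSG_OFFSET

print(hex(default_msg), fits_in_16_bits(default_msg))
print(hex(fixed_msg), fits_in_16_bits(fixed_msg))
```

With the assumed default base the address of msg needs more than 16 bits, which is the kind of value ld refuses to truncate. With -Ttext 0x7c00 it becomes 0x7c0a, which fits and also matches where the string really sits in memory at boot.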

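QEMU will now print "H", the first character of our message text.

We could now print the entire string by doing the following:

1. Put the address of the first byte of the string (i.e. msg) into a register other than ax (we need that one for the actual printing). In 16-bit code only bx, bp, si and di can hold the address of a memory operand, so say we use bx.
2. Load the byte at the address in bx into al.
3. Compare the value in al with 0 (the end of the string, thanks to .asciz).
4. If al contains 0, go to the end of our program.
5. Call interrupt 0x10.
6. Increment the address in bx by one.
7. Repeat from step 2.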
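The steps above can be sketched as a small Python simulation before writing them in assembly. Memory is modeled as a `bytes` object and the BIOS call is replaced by collecting characters; `bios_print` is a made-up name for this illustration.

```python
def bios_print(memory: bytes, address: int) -> str:
    """Hypothetical simulation: read bytes from `address` until a zero byte is found."""
    printed = []
    while True:
        al = memory[address]     # step 2: load the byte at the current address
        if al == 0:              # steps 3 and 4: a zero byte ends the string
            break
        printed.append(chr(al))  # step 5: stand-in for "int 0x10"
        address += 1             # step 6: advance to the next byte
    return "".join(printed)


memory = b"\xf4" + b"Hello world!\x00"  # one unrelated byte, then the .asciz data
print(bios_print(memory, 1))
```

The zero byte that .asciz appends is what lets the loop stop without knowing the length of the string in advance.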

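What is also useful is that x86 has a special register and a set of special instructions for dealing with strings.

To use these instructions, we load the address of our string (msg) into the special register si. That lets us use the convenient lodsb instruction, which loads the byte at the address si points to into al and increments the address in si at the same time.

Let's put it all together: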

.code16 # use 16 bits
.global init

init:
  mov $msg, %si # loads the address of msg into si
  mov $0xe, %ah # loads 0xe (function number for int 0x10) into ah
print_char:
  lodsb # loads the byte from the address in si into al and increments si
  cmp $0, %al # compares content in AL with zero
  je done # if al == 0, go to "done"
  int $0x10 # prints the character in al to screen
  jmp print_char # repeat with next byte
done:
  hlt # stop execution

msg: .asciz "Hello world!"

.fill 510-(.-init), 1, 0 # add zeroes to make it 510 bytes long
.word 0xaa55 # magic bytes that tell BIOS that this is bootable

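Running this new code in QEMU prints the complete message.

It does so by looping from print_char to jmp print_char until lodsb reads a zero byte (which sits right after the last character of our message) from the address in si. Once we find the zero byte, we jump to done and halt execution.

The Intel syntax edition and `nasm`

As promised, I will also show you how to do the same thing with nasm instead of the GNU assembler.

First things first: nasm can produce a raw binary by itself, and it uses Intel syntax:

operation target, source. I remember the order with "W, T, F": "What, To, From" 😉

So here is the nasm-compatible version of the previous code: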

[bits 16]    ; use 16 bits
[org 0x7c00] ; sets the start address

init: 
  mov si, msg  ; loads the address of "msg" into SI register
  mov ah, 0x0e ; sets AH to 0xe (function teletype)
print_char:
  lodsb     ; loads the current byte from SI into AL and increments the address in SI
  cmp al, 0 ; compares AL to zero
  je done   ; if AL == 0, jump to "done"
  int 0x10  ; print to screen using function 0xe of interrupt 0x10
  jmp print_char ; repeat with next byte
done:
  hlt ; stop execution

msg: db "Hello world!", 0 ; we need to explicitly put the zero byte here

times 510-($-$$) db 0           ; fill the output file with zeroes until 510 bytes are full
dw 0xaa55                       ; magic number that tells the BIOS this is bootable

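After saving it as boot.asm, it can be assembled by running nasm -o boot2.bin boot.asm.

Note that the order of the arguments to cmp is the opposite of the order that as uses, and that [org] in nasm and .org in as are not the same thing!

nasm does not take the extra step via an ELF file (boot.o), so it won't move our msg around in memory the way as and ld did.

Yet, if we forget to set the start address of our code to 0x7c00, the address that the binary uses for msg will still be wrong, because nasm assumes a different start address by default. When we explicitly set it to 0x7c00 (where the BIOS loads our code), the addresses in the binary are calculated correctly and the code works just like the other version does.

Original text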

Writing an x86 "Hello world" bootloader with assembly

http://50linesofco.de/post/2018-02-28-writing-an-x86-hello-world-bootloader-with-assembly


TL;DR

After booting, the BIOS of the computer reads 512 bytes from the boot devices and, if it detects a two-byte "magic number" at the end of those 512 bytes, loads the data from these 512 bytes as code and runs it.

This kind of code is called a "bootloader" (or "boot sector") and we're writing a tiny bit of assembly code to make a virtual machine run our code and display "Hello world" for the fun of it. Bootloaders are also the very first stage of booting an operating system.

What happens when your x86 computer starts

You might have wondered what happens when you press the "power" button on your computer. Well, without going into too much detail - after getting the hardware ready and launching the initial BIOS code to read the settings and check the system, the BIOS starts looking at the configured potential boot devices for something to execute.

It does that by reading the first 512 bytes from the boot devices and checks if the last two of these 512 bytes contain a magic number (0x55AA). If that's what these last two bytes are, the BIOS moves the 512 bytes to the memory address 0x7c00 and treats whatever was at the beginning of the 512 bytes as code, the so-called bootloader. In this article we will write such a piece of code, have it print the text "Hello World!" and then go into an infinite loop.
Real bootloaders usually load the actual operating system code into memory, change the CPU into the so-called protected mode and run the actual operating system code.

A primer on x86 assembly with the GNU assembler

To make our lives a little easier (sic!) and make it all more fun, we will use x86 assembly language for our bootloader. The article will use the GNU assembler to create the binary executable file from our code and the GNU assembler uses the "AT&T syntax" instead of the pretty widely-spread "Intel syntax". I will repeat the example in the Intel syntax at the end of the article.

For those of you who are not familiar with x86 assembly language and/or the GNU assembler, I created this description that explains just enough assembly to get you up to speed for the rest of this article. The assembly code within this article is also commented, so you should be able to glance over the code snippets without knowing much about the details of assembly.

Getting our code ready

Okay, so far we know: We need to create a 512 byte binary file that contains 0x55AA at its end. It's also worth mentioning that no matter if you have a 32 or 64 bit x86 processor, at boot time the processor will run in the 16 bit real mode, so our program needs to deal with that.

Let's create our boot.s file for our assembly sourcecode and tell the GNU assembler that we'll use 16 bits:

.code16 # tell the assembler that we're using 16 bit mode

Ah, this is going great! Next up we should give us a starting point for our program and make that available to the linker (more on that in a few moments):

.code16
.global init # makes our label "init" available to the outside

init: # this is the beginning of our binary later.
  jmp init # jump to "init"

Note: You can call your label whatever you wish. The standard would be _start but I chose init to illustrate that you can call it anything, really.

Nice, now we even got an infinite loop, because we keep jumping to the label, then jump to the label again...

Time to turn our code into some binary by running the GNU assembler (as) and see what we got:

as -o boot.o boot.s
ls -lh .
784 boot.o
152 boot.s

Woah, hold on! Our output is already 784 bytes? But we only have 512 bytes for our bootloader!

Well, most of the time developers are probably interested in creating an executable file for the operating system they are targeting, i.e. an exe (Windows) or ELF (Unix) file. These files have a header (read: additional, preceding bytes) and usually load a few system libraries to access operating system functionality.

Our case is different: We want none of that, just our code in binary for the BIOS to execute upon boot.

Usually, the assembler produces an ELF or EXE file that is ready to run but we need one additional step that strips the unwanted additional data in those files. We can use the linker (GNU's linker is called ld) for this step.

The linker is normally used to combine the various libraries and the binary executables from other tools such as compilers or assemblers into one final file. In our case we want to produce a "plain binary file", so we will pass --oformat binary to ld when we run it. We also want to specify where our program starts, so we tell the linker to use the starting label (I called it init) in our code as the program's entry point by using the -e init flag.

When we run that, we get a better result:

as -o boot.o boot.s
ld -o boot.bin --oformat binary -e init boot.o
ls -lh .
  3 boot.bin
784 boot.o
152 boot.s

(Typo spotted by xnumbersx)

Okay, three bytes sounds much better, but this won't boot up, because it is missing the magic number 0x55AA at bytes 511 and 512 of our binary...

Making it bootable

Luckily, we can just fill our binary with a bunch of zeroes and add the magic number as data at the end.
Let's start with adding zeroes until our binary file is 510 bytes long (because the last two bytes will be the magic number).

We can use the assembler directive .fill from as to do that. The syntax is .fill count, size, value: it emits count copies of a size-byte field holding value wherever we write this directive in our assembly code in boot.s.

But how do we know how many bytes we need to fill in? Conveniently, the assembler helps us again. We need a total number of 510 bytes so we will fill 510 - (byte size of our code) bytes with zeroes. But what is the "byte size of our code"? Luckily as has a helper that tells us the current byte position within the generated binary: the symbol . (a single dot). We can get the position of the labels, too. So our code size will be whatever the current position . is after our code minus the position of the first statement in our code (which is the position of init). So .-init returns the number of generated bytes of our code in the final binary file...

.code16
.global init # makes our label "init" available to the outside

init: # this is the beginning of our binary later.
  jmp init # jump to "init"

.fill 510-(.-init), 1, 0 # add zeroes to make it 510 bytes long

as -o boot.o boot.s
ld -o boot.bin --oformat binary -e init boot.o
ls -lh .
 510 boot.bin
1.3k boot.o
 176 boot.s

We're getting there - still missing the final two bytes of our magic word:

.code16
.global init # makes our label "init" available to the outside

init: # this is the beginning of our binary later.
  jmp init # jump to "init"

.fill 510-(.-init), 1, 0 # add zeroes to make it 510 bytes long
.word 0xaa55 # magic bytes that tell BIOS that this is bootable

Oh wait... if the magic bytes are 0x55aa, why are we swapping them here?
That is because x86 is little endian, so the bytes get swapped in memory.

Now if we produce an updated binary file, it is 512 bytes long.

Booting our bootloader

You could theoretically write this binary into the first 512 bytes on a USB drive, a floppy disk or whatever else your computer is happy booting from, but let's use a simple x86 emulator (it's like a virtual machine) instead.

I will use QEmu with an x86 system architecture for this:

qemu-system-x86_64 boot.bin

Running this command produces something relatively unspectacular:
[Screenshot: qemu-first-boot.png]

The fact that QEmu stops looking for bootable devices means that our bootloader worked - but it doesn't do anything yet!

To prove that, we can cause a reboot loop instead of an infinite loop that does nothing by changing our assembly code to this:

.code16
.global init # makes our label "init" available to the outside

init: # this is the beginning of our binary later.
  ljmpw $0xFFFF, $0 # jumps to the "reset vector", doing a reboot

.fill 510-(.-init), 1, 0 # add zeroes to make it 510 bytes long
.word 0xaa55 # magic bytes that tell BIOS that this is bootable

This new command ljmpw $0xFFFF, $0 jumps to the so-called reset vector.
This effectively means re-executing the first instruction after the system boots again without actually rebooting. It's sometimes referred to as a "warm reboot".

Using the BIOS to print text

Okay, let's start with printing a single character.
We don't have any operating system or libraries available, so we can't just call printf or one of its friends and be done.

Luckily, we have the BIOS still around and reachable, so we can make use of its functions. These functions (along with a bunch of functions that different hardware provides) are available to us via the so-called interrupts.

In Ralf Brown's interrupt list we can find the video interrupt 0x10.

A single interrupt can carry out many different functions which are usually selected by setting the AX register to a specific value. In our case the function "Teletype" sounds like a good match - it prints a character given in al and automatically advances the cursor. Nifty! We can select that function by setting ah to 0xe, put the ASCII code we want to print into al and then call int 0x10:

.code16
.global init # makes our label "init" available to the outside

init: # this is the beginning of our binary later.
  mov $0x0e41, %ax # sets AH to 0xe (function teletype) and al to 0x41 (ASCII "A")
  int $0x10 # call the function in ah from interrupt 0x10
  hlt # stops executing

.fill 510-(.-init), 1, 0 # add zeroes to make it 510 bytes long
.word 0xaa55 # magic bytes that tell BIOS that this is bootable

Now we're loading the necessary value into the ax register, call interrupt 0x10 and halt the execution (using hlt).

When we run as and ld to get our updated bootloader, QEmu shows us this:

We can even see that the cursor blinks at the next position, so this function should be easy to use with longer messages, right?

Our final hello-world-bootloader

To get a full message to display, we will need a way to store this information in our binary. We can do that similar to how we store the magic word at the end of our binary, but we'll use a different directive than .word as we want to store a full string. as luckily comes with .ascii and .asciz for strings. The difference between them is that .asciz automatically adds another byte that is set to zero. This will come in handy in a moment, so we choose .asciz for our data.
Also, we will use a label to give us access to the address:

.code16
.global init # makes our label "init" available to the outside

init: # this is the beginning of our binary later.
  mov $0x0e, %ah # sets AH to 0xe (function teletype)
  mov $msg, %bx   # sets BX to the address of the first byte of our message
  mov (%bx), %al   # sets AL to the first byte of our message
  int $0x10 # call the function in ah from interrupt 0x10
  hlt # stops executing

msg: .asciz "Hello world!" # stores the string (plus a byte with value "0") and gives us access via $msg

.fill 510-(.-init), 1, 0 # add zeroes to make it 510 bytes long
.word 0xaa55 # magic bytes that tell BIOS that this is bootable

(Typo spotted by xnumbersx)

We have one new feature in there:

mov $msg, %bx
mov (%bx), %al

The first line loads the address of the first byte into the register bx (we use the entire register because addresses are 16 bit long).

The second line then loads the value that is stored at the address from bx into al, so the first character of the message ends up in al, because bx points to its address.

But now we get an error when running ld:

as -o boot.o boot.s
ld -o boot.bin --oformat binary -e init boot.o
boot.o: In function `init':
(.text+0x3): relocation truncated to fit: R_X86_64_16 against `.text'+a

Dang, what does that mean?

Well it turns out that the address at which msg is moved in the ELF file (boot.o) doesn't fit in our 16 bit address space. We can fix that by telling ld where our program memory should start. The BIOS will load our code at address 0x7c00, so we will make that our starting address by specifying -Ttext 0x7c00 when we call the linker:

as -o boot.o boot.s
ld -o boot.bin --oformat binary -e init -Ttext 0x7c00 boot.o

QEmu will now print "H", the first character of our message text.

We could now print the entire string by doing the following:

1. Put the address of the first byte of the string (i.e. msg) into a register other than ax (because we use that for the actual printing). In 16-bit code only bx, bp, si and di can hold the address of a memory operand, so say we use bx.
2. Load the byte at the address in bx into al
3. Compare the value in al with 0 (end of string, thanks to .asciz)
4. If AL contains 0, go to the end of our program
5. Call interrupt 0x10
6. Increment the address in bx by one
7. Repeat from step 2

What is also useful is the fact that x86 has a special register and a bunch of special instructions to deal with strings.
In order to use these instructions, we will load the address of our string (msg) into the special register si which allows us to use the convenient lodsb instruction that loads a byte from the address that si points to into al and increments the address in si at the same time.

Let's put it all together:

.code16 # use 16 bits
.global init

init:
  mov $msg, %si # loads the address of msg into si
  mov $0xe, %ah # loads 0xe (function number for int 0x10) into ah
print_char:
  lodsb # loads the byte from the address in si into al and increments si
  cmp $0, %al # compares content in AL with zero
  je done # if al == 0, go to "done"
  int $0x10 # prints the character in al to screen
  jmp print_char # repeat with next byte
done:
  hlt # stop execution

msg: .asciz "Hello world!"

.fill 510-(.-init), 1, 0 # add zeroes to make it 510 bytes long
.word 0xaa55 # magic bytes that tell BIOS that this is bootable

Let's look at this new code in QEmu:

[Screenshot: qemu-final-bootloader.png]

🎉 Yay! 🎉

It prints our message by looping from print_char to jmp print_char until we hit a zero byte (which is right after the last character of our message) at the address in si. Once we find the zero byte, we jump to done and halt execution.

The Intel syntax edition and `nasm`

As promised, I will also show you the alternative way of using nasm instead of the GNU assembler.

First things first: nasm can produce a raw binary by itself and it uses the Intel Syntax:

operation target, source - I remember the order with "W,T,F" - "What, To, From" 😉

So here is the nasm-compatible version of the previous code:

[bits 16]    ; use 16 bits
[org 0x7c00] ; sets the start address

init: 
  mov si, msg  ; loads the address of "msg" into SI register
  mov ah, 0x0e ; sets AH to 0xe (function teletype)
print_char:
  lodsb     ; loads the current byte from SI into AL and increments the address in SI
  cmp al, 0 ; compares AL to zero
  je done   ; if AL == 0, jump to "done"
  int 0x10  ; print to screen using function 0xe of interrupt 0x10
  jmp print_char ; repeat with next byte
done:
  hlt ; stop execution

msg: db "Hello world!", 0 ; we need to explicitly put the zero byte here

times 510-($-$$) db 0           ; fill the output file with zeroes until 510 bytes are full
dw 0xaa55                       ; magic number that tells the BIOS this is bootable

(Thanks to Reddit user pahefu for pointing out a typo here!)
After saving it as boot.asm it can be compiled by running nasm -o boot2.bin boot.asm.

Note that the order of arguments for cmp is the opposite of the order that as uses, and that [org] in nasm and .org in as are not the same thing!

nasm does not do the extra step via the ELF file (boot.o), so it won't move our msg around in memory like as and ld did.

Yet, if we forget to set the start address of our code to 0x7c00, the address that the binary uses for msg will still be wrong, because nasm assumes a different start address by default. When we explicitly set it to 0x7c00 (where the BIOS loads our code), the addresses will be correctly calculated in the binary and the code works just like the other version does.

posted @ 2018-05-03 11:20 main_c
